English version -> medium.com
designateの各サービスのworker数をデフォルトの1から2以上にした時、service_statusが更新されないのでその原因を調べた。
まず、worker数が1の時の起動の流れを確認 すると、designate/service_status.HeartBeatEmitter.startが実行されているのがわかります。
workerが1の場合
2019-04-04 12:14:52.670 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.__init__: 96400 __init__ /usr/lib/python2.7/site-packages/designate/service_status.py:53 2019-04-04 12:14:52.730 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:14:52.730 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:14:52.731 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 96400 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:14:52.732 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:14:57.735 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:14:57.735 96400 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:15:02.732 96400 DEBUG designate.service_status [req-2b1269ed-5802-4787-a2f2-8ef583d01bd7 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:15:02.733 96400 DEBUG designate.service_status [req-2b1269ed-5802-4787-a2f2-8ef583d01bd7 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:15:07.731 96400 DEBUG designate.service_status [req-b3c89e7e-a321-4c4c-8a0b-d8bb880962bf - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 96400 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:15:07.732 96400 DEBUG designate.service_status [req-b3c89e7e-a321-4c4c-8a0b-d8bb880962bf - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
MariaDB [designate]> select * from service_statuses where service_name='api'\G
*************************** 1. row ***************************
id: 4e24db8d45be496f8119fda6eb706d96
created_at: 2019-04-04 03:14:57
updated_at: 2019-04-04 03:37:18
service_name: api
hostname: dns001.host
heartbeated_at: 2019-04-04 03:37:18
status: UP
stats: {}
capabilities: {}
1 row in set (0.00 sec)
ではworker数が2以上になった場合はどうなるのか。
わかりやすいように以下のようにデバック用のログが表示されるようにして起動します。
# diff -u /usr/lib/python2.7/site-packages/designate/service_status.py.org /usr/lib/python2.7/site-packages/designate/service_status.py
--- /usr/lib/python2.7/site-packages/designate/service_status.py.org 2019-04-03 17:48:46.892646687 +0900
+++ /usr/lib/python2.7/site-packages/designate/service_status.py 2019-04-03 17:47:35.032081139 +0900
@@ -12,6 +12,7 @@
# License for the specific language governing permissions and limitations
# under the License.
import abc
+import os
from oslo_config import cfg
from oslo_log import log as logging
@@ -49,6 +50,7 @@
def __init__(self, service, threadgroup, status_factory=None):
super(HeartBeatEmitter, self).__init__()
+ LOG.debug("designate/service_status.HeartBeatEmitter.__init__: %s", os.getpid())
self._service = service
self._hostname = CONF.host
@@ -70,6 +72,7 @@
"""
Returns Status, Stats, Capabilities
"""
+ LOG.debug("designate/service_status.HeartBeatEmitter._emit_heartbeat: %s", os.getpid())
if not self._running:
return
@@ -93,6 +96,7 @@
pass
def start(self):
+ LOG.debug("designate/service_status.HeartBeatEmitter.start: %s", os.getpid())
self._running = True
def stop(self):
そして、起動すると、以下のようなログになります。
2019-04-04 12:37:24.512 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.__init__: 98588 __init__ /usr/lib/python2.7/site-packages/designate/service_status.py:53 2019-04-04 12:37:24.533 98598 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98598 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:37:24.533 98598 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:37:24.536 98599 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98599 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:37:24.536 98599 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:37:24.537 98600 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98600 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:37:24.538 98600 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:37:24.563 98601 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: 98601 start /usr/lib/python2.7/site-packages/designate/service_status.py:100 2019-04-04 12:37:24.563 98601 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter.start: _running=True start /usr/lib/python2.7/site-packages/designate/service_status.py:102 2019-04-04 12:37:24.636 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 98588 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:37:24.636 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:37:29.637 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: 98588 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:37:29.637 98588 DEBUG designate.service_status [-] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=False _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
__init__ と _emit_heartbeat は process_idが 98588 で startは 98598, 98599, 98600, 98601になってます。
そして、各process idを確認すると、
# ps auxwwf | grep designate-api designa+ 98588 1.9 0.7 347336 62380 ? Ss 12:37 0:02 /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log designa+ 98598 0.2 0.8 392348 69792 ? S 12:37 0:00 \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log designa+ 98599 0.2 0.8 392344 69784 ? S 12:37 0:00 \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log designa+ 98600 0.2 0.8 392344 69784 ? S 12:37 0:00 \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log designa+ 98601 0.2 0.8 392348 69784 ? S 12:37 0:00 \_ /usr/bin/python2 /usr/bin/designate-api --config-file /etc/designate/designate.conf --log-file /var/log/designate/api.log
となってるので、__init__と_emit_heartbeatは親プロセスでstartは子プロセスということがわかります。
start メソッドでは _running という値をTrueに変更していて、これは defaultではFalseなので実際に確認を行っている親プロセスでは、False から変わることはありません。
def start(self):
self._running = True
それが原因で_emit_heartbeat の if not self._running の判定は毎回 Falseでreturnされるため、更新がされていません。
def _emit_heartbeat(self):
"""
Returns Status, Stats, Capabilities
"""
if not self._running:
return
status, stats, capabilities = self._get_status()
service_status = objects.ServiceStatus(
service_name=self._service,
hostname=self._hostname,
status=status,
stats=stats,
capabilities=capabilities,
heartbeated_at=timeutils.utcnow()
)
LOG.trace("Emitting %s", service_status)
self._transmit(service_status)
だから親プロセスの _runningフラグを書き換える必要があります。
# diff -u /usr/lib/python2.7/site-packages/designate/cmd/api.py.org /usr/lib/python2.7/site-packages/designate/cmd/api.py
--- /usr/lib/python2.7/site-packages/designate/cmd/api.py.org 2019-04-03 18:10:09.908918965 +0900
+++ /usr/lib/python2.7/site-packages/designate/cmd/api.py 2019-04-03 18:09:49.886038819 +0900
@@ -40,4 +40,5 @@
server = api_service.Service(threads=CONF['service:api'].threads)
service.serve(server, workers=CONF['service:api'].workers)
+ server.heartbeat_emitter.start()
service.wait()
こういう設定を他のサービス(central, pool_manager, zone_manager, mdns)にも修正を行うと、
以下のように止まってたservice_statusが更新されるようになります。
- before
MariaDB [designate]> select * from service_statuses where service_name='api'\G
*************************** 1. row ***************************
id: 4e24db8d45be496f8119fda6eb706d96
created_at: 2019-04-04 03:14:57
updated_at: 2019-04-04 03:37:18
service_name: api
hostname: dns001.host
heartbeated_at: 2019-04-04 03:37:18
status: UP
stats: {}
capabilities: {}
1 row in set (0.00 sec)
- after
MariaDB [designate]> select * from service_statuses where service_name='api'\G
*************************** 1. row ***************************
id: 4e24db8d45be496f8119fda6eb706d96
created_at: 2019-04-04 03:14:57
updated_at: 2019-04-04 03:42:23
service_name: api
hostname: dns001.host
heartbeated_at: 2019-04-04 03:42:23
status: UP
stats: {}
capabilities: {}
1 row in set (0.00 sec)
そしてlogも
2019-04-04 12:42:23.711 99088 DEBUG designate.service_status [req-8be27649-7819-4416-b936-9015b9515963 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 99088 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:42:23.712 99088 DEBUG designate.service_status [req-8be27649-7819-4416-b936-9015b9515963 - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76 2019-04-04 12:42:28.716 99088 DEBUG designate.service_status [req-286a4481-dad8-4fc5-802d-ad1d9e0b8d3b - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: 99088 _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:75 2019-04-04 12:42:28.717 99088 DEBUG designate.service_status [req-286a4481-dad8-4fc5-802d-ad1d9e0b8d3b - - - - -] designate/service_status.HeartBeatEmitter._emit_heartbeat: _running=True _emit_heartbeat /usr/lib/python2.7/site-packages/designate/service_status.py:76
と、しっかりTrueになっていることが確認できます。
プルリクエスト
このイシューについてはアップストリームに反映してマージされました。