Kunitoshi Otonari
otona****@yskne*****
Fri Nov 14 09:03:52 JST 2008
Yamauchi-san,

Hello, this is Otonari.
Thank you for your prompt reply.
I apologize for taking so long to respond.

> I have not run recent versions in compatibility mode myself, but I recall
> that Version 2.1.3 had a problem where stopping failed even in Version2 mode.

As you guessed, we are currently running in Version1 compatibility mode.
(Version2 mode looked difficult to configure, so we chose Version1.)
Since you say the problem also exists in Version2 mode, I will look a little
further into whether, for example, upgrading heartbeat would resolve it.

/var/log/ha-log and /etc/ha.cf follow.

■/var/log/ha-log
---------------------------------------------------------------
heartbeat[9262]: 2008/11/05_16:06:40 info: Daily informational memory statistics
heartbeat[9262]: 2008/11/05_16:06:40 info: MSG stats: 250/10599 ms age 0 [pid9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:06:40 info: cl_malloc stats: 7923/356932 578940/238325 [pid9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:06:40 info: RealMalloc stats: 584164 total malloc bytes. pid [9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:06:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:06:40 info: MSG stats: 0/2 ms age 9539820 [pid9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:06:40 info: cl_malloc stats: 347/418 33820/16329 [pid9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:06:40 info: RealMalloc stats: 35480 total malloc bytes. pid [9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:06:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:06:40 info: MSG stats: 0/0 ms age 9893074 [pid9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:06:40 info: cl_malloc stats: 373/11609 39364/20769 [pid9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:06:40 info: RealMalloc stats: 47952 total malloc bytes. pid [9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:06:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:06:40 info: MSG stats: 0/0 ms age 9893084 [pid9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:06:40 info: cl_malloc stats: 373/427 31236/16716 [pid9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:06:40 info: RealMalloc stats: 31320 total malloc bytes. pid [9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:06:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:06:40 info: These are nothing to worry about.
heartbeat[9262]: 2008/11/05_16:16:40 info: Daily informational memory statistics
heartbeat[9262]: 2008/11/05_16:16:40 info: MSG stats: 250/11261 ms age 0 [pid9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:16:40 info: cl_malloc stats: 7923/379200 578940/237998 [pid9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:16:40 info: RealMalloc stats: 584164 total malloc bytes. pid [9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:16:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:16:40 info: MSG stats: 0/2 ms age 10139820 [pid9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:16:40 info: cl_malloc stats: 347/418 33820/16329 [pid9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:16:40 info: RealMalloc stats: 35480 total malloc bytes. pid [9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:16:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:16:40 info: MSG stats: 0/0 ms age 10493074 [pid9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:16:40 info: cl_malloc stats: 373/12308 39364/20769 [pid9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:16:40 info: RealMalloc stats: 47952 total malloc bytes. pid [9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:16:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:16:40 info: MSG stats: 0/0 ms age 10493074 [pid9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:16:40 info: cl_malloc stats: 373/427 31236/16716 [pid9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:16:40 info: RealMalloc stats: 31320 total malloc bytes. pid [9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:16:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:16:40 info: These are nothing to worry about.
heartbeat[9262]: 2008/11/05_16:20:11 info: Link nsm2.ysk.com:eth1 up.
heartbeat[9262]: 2008/11/05_16:20:11 info: Status update for node nsm2.ysk.com: status init
heartbeat[9262]: 2008/11/05_16:20:11 info: Status update for node nsm2.ysk.com: status up
heartbeat[9262]: 2008/11/05_16:20:11 info: Managed write_hostcachedata process 4477 exited with return code 0.
harc[4476]: 2008/11/05_16:20:11 info: Running /etc/ha.d/rc.d/status status
heartbeat[9262]: 2008/11/05_16:20:11 info: Managed status process 4476 exited with return code 0.
harc[4493]: 2008/11/05_16:20:11 info: Running /etc/ha.d/rc.d/status status
heartbeat[9262]: 2008/11/05_16:20:11 info: Managed status process 4493 exited with return code 0.
heartbeat[9262]: 2008/11/05_16:20:11 info: all clients are now paused
heartbeat[9262]: 2008/11/05_16:20:11 info: all clients are now resumed
heartbeat[9262]: 2008/11/05_16:20:12 info: Status update for node nsm2.ysk.com: status active
heartbeat[9262]: 2008/11/05_16:20:12 WARN: T_STARTING received during takeover.
heartbeat[9262]: 2008/11/05_16:20:12 info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
harc[4511]: 2008/11/05_16:20:12 info: Running /etc/ha.d/rc.d/status status
heartbeat[9262]: 2008/11/05_16:20:12 info: Managed status process 4511 exited with return code 0.
heartbeat[9262]: 2008/11/05_16:20:12 info: other_holds_resources: 0
heartbeat[9262]: 2008/11/05_16:20:12 info: remote resource transition completed.
heartbeat[9262]: 2008/11/05_16:20:12 info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
heartbeat[9262]: 2008/11/05_16:20:12 info: other_holds_resources: 0
heartbeat[9262]: 2008/11/05_16:26:40 info: Daily informational memory statistics
heartbeat[9262]: 2008/11/05_16:26:40 info: MSG stats: 8/12445 ms age 0 [pid9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:26:40 info: cl_malloc stats: 664/418256 77568/38925 [pid9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:26:40 info: RealMalloc stats: 597636 total malloc bytes. pid [9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:26:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:26:40 info: MSG stats: 0/2 ms age 10739820 [pid9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:26:40 info: cl_malloc stats: 347/418 33820/16329 [pid9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:26:40 info: RealMalloc stats: 35480 total malloc bytes. pid [9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:26:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:26:40 info: MSG stats: 0/0 ms age 11093074 [pid9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:26:40 info: cl_malloc stats: 374/13051 39416/20793 [pid9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:26:40 info: RealMalloc stats: 47952 total malloc bytes. pid [9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:26:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:26:40 info: MSG stats: 0/0 ms age 11093074 [pid9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:26:40 info: cl_malloc stats: 374/1382 39448/20813 [pid9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:26:40 info: RealMalloc stats: 39860 total malloc bytes. pid [9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:26:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:26:40 info: These are nothing to worry about.
heartbeat[9262]: 2008/11/05_16:27:45 info: other_holds_resources: 0
heartbeat[9262]: 2008/11/05_16:27:54 WARN: node nsm2.ysk.com: is dead
heartbeat[9262]: 2008/11/05_16:27:54 info: Dead node nsm2.ysk.com gave up resources.
heartbeat[9262]: 2008/11/05_16:27:54 info: Link nsm2.ysk.com:eth1 dead.
heartbeat[9262]: 2008/11/05_16:31:58 info: Heartbeat restart on node nsm2.ysk.com
heartbeat[9262]: 2008/11/05_16:31:58 info: Link nsm2.ysk.com:eth1 up.
heartbeat[9262]: 2008/11/05_16:31:58 info: Status update for node nsm2.ysk.com: status active
harc[10840]: 2008/11/05_16:31:58 info: Running /etc/ha.d/rc.d/status status
heartbeat[9262]: 2008/11/05_16:31:58 info: Managed status process 10840 exited with return code 0.
heartbeat[9262]: 2008/11/05_16:36:40 info: Daily informational memory statistics
heartbeat[9262]: 2008/11/05_16:36:40 info: MSG stats: 3/13568 ms age 0 [pid9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:36:40 info: cl_malloc stats: 514/455311 67208/34813 [pid9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:36:40 info: RealMalloc stats: 597636 total malloc bytes. pid [9262/MST_CONTROL]
heartbeat[9262]: 2008/11/05_16:36:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:36:40 info: MSG stats: 0/2 ms age 11339830 [pid9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:36:40 info: cl_malloc stats: 347/418 33820/16329 [pid9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:36:40 info: RealMalloc stats: 35480 total malloc bytes. pid [9919/HBFIFO]
heartbeat[9262]: 2008/11/05_16:36:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:36:40 info: MSG stats: 0/0 ms age 11693084 [pid9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:36:40 info: cl_malloc stats: 373/13788 39364/20769 [pid9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:36:40 info: RealMalloc stats: 47952 total malloc bytes. pid [9920/HBWRITE]
heartbeat[9262]: 2008/11/05_16:36:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:36:40 info: MSG stats: 0/0 ms age 11693214 [pid9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:36:40 info: cl_malloc stats: 374/2224 39448/20813 [pid9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:36:40 info: RealMalloc stats: 39860 total malloc bytes. pid [9921/HBREAD]
heartbeat[9262]: 2008/11/05_16:36:40 info: Current arena value: 0
heartbeat[9262]: 2008/11/05_16:36:40 info: These are nothing to worry about.
heartbeat[9262]: 2008/11/05_16:36:40 WARN: Gmain_timeout_dispatch: Dispatch function for memory stats took too long to execute: 160 ms (> 100 ms) (GSource: 0x970aaa8)
heartbeat[9262]: 2008/11/05_16:37:26 WARN: Shutdown delayed until current resource activity finishes.
---------------------------------------------------------------

■/etc/ha.cf
---------------------------------------------------
logfile /var/log/ha-log
debugfile /var/log/ha-debug
debug 2
logfacility local0
keepalive 1
deadtime 9
warntime 3
initdead 60
udpport 694
ucast eth1 172.16.1.3
auto_failback off
node nsm2.ysk.com
node nsm1.ysk.dom
respawn root /ndp/HighAvailability/bin/ha_watch 5
---------------------------------------------------
* "/ndp/HighAvailability/bin/" is a path we created ourselves.

That is all. Thank you in advance.

In <HJEEL****@ybb*****>, "Re: [Linux-ha-jp] heartbeat stop processing never completes",
"HIDEO YAMAUCHI" <renay****@ybb*****> wrote:

> Otonari-san,
>
> Hello, my name is Yamauchi. Nice to meet you.
>
> Judging from your mail, Heartbeat is probably running in Version1
> compatibility mode.
> I have not run recent versions in compatibility mode myself, but I recall
> that Version 2.1.3 had a problem where stopping failed even in Version2 mode.
>
> If you could attach the logs and /etc/ha.cf, to whatever extent you are
> able to share them, I think I could understand the situation a little better.
>
> Would that be possible?
>
> Best regards.
>
> > -----Original Message-----
> > From: linux****@lists*****
> > [mailto:linux****@lists*****]On Behalf Of
> > Kunitoshi Otonari
> > Sent: Tuesday, November 11, 2008 7:06 PM
> > To: linux****@lists*****
> > Subject: [Linux-ha-jp] heartbeat stop processing never completes
> >
> > My name is Otonari.
> >
> > The environment is as follows.
> > - OS: CentOS
> > - Kernel: 2.6.9-67.0.15.ELsmp
> > - heartbeat: 2.1.3-3.el4.centos
> > - Number of machines: two
> >
> > The symptom is that when the systems are rebooted (or halted), the
> > shutdown hangs in the heartbeat stop processing.
> >
> > Steps to reproduce:
> > 1. Run "shutdown -h now" from the console of the first machine.
> > 2. Immediately after step 1, run "shutdown -h now" from the console of
> >    the second machine.
> >    (That is, start the second machine's heartbeat stop processing before
> >    the first machine's heartbeat stop processing has finished.)
> > 3. The machine on which step 2 was run hangs in its heartbeat stop
> >    processing.
> > * The problem also reproduces with "/etc/rc.d/init.d/heartbeat stop"
> >   instead of "shutdown -h now".
> >
> > I captured the cl_status results before and after running the steps above.
> >
> >                            | Before               | After
> > ------------------------------------------------------------------------
> > hbstatus                   | Heartbeat is running | Heartbeat is running
> >                            | on this machine      | on this machine
> > ------------------------------------------------------------------------
> > listnodes -p               | (no output)          | (no output)
> > ------------------------------------------------------------------------
> > listnodes -n               | node 1 FQDN          | node 1 FQDN
> >                            | node 2 FQDN          | node 2 FQDN
> > ------------------------------------------------------------------------
> > nodestatus <node 1 FQDN>   | active               | dead
> > ------------------------------------------------------------------------
> > nodestatus <node 2 FQDN>   | active               | active
> > ------------------------------------------------------------------------
> > nodetype <node 1 FQDN>     | normal               | normal
> > ------------------------------------------------------------------------
> > nodetype <node 2 FQDN>     | normal               | normal
> > ------------------------------------------------------------------------
> > listhblinks <node 1 FQDN>  | eth1                 | eth1
> > ------------------------------------------------------------------------
> > listhblinks <node 2 FQDN>  | eth1                 | eth1
> > ------------------------------------------------------------------------
> > hblinkstatus <node 1 FQDN> | up                   | dead
> > ------------------------------------------------------------------------
> > hblinkstatus <node 2 FQDN> | dead                 | dead
> > ------------------------------------------------------------------------
> > clientstatus               | offline              | offline
> > ------------------------------------------------------------------------
> > rscstatus                  | none                 | transition
> > ------------------------------------------------------------------------
> > hbparameter                | off                  | off
> > -p auto_failback           |                      |
> > ------------------------------------------------------------------------
> > * The data was collected by logging in via telnet to the machine that
> >   was hung in its heartbeat stop processing.
> >
> > Log output of concern:
> > ----------------------------------------------------------------
> > WARN: Shutdown delayed until current resource activity finishes.
> > ----------------------------------------------------------------
> > * This message is output by heartbeat.
> >
> > I searched the Internet for a solution but could not find one.
> > If anyone knows how to resolve this situation, please let me know.
> > Any information at all would be appreciated.
> >
> > That is all. Thank you in advance.
> >
> > --
> > (T開2) Kunitoshi Otonari
> > E-mail:otona****@yskne*****
> > Extension: 4011
> >
> > _______________________________________________
> > Linux-ha-japan mailing list
> > Linux****@lists*****
> > http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
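
The ha-log excerpt in this thread consists mostly of periodic memory-statistics entries, so the few WARN lines are easy to overlook. Below is a minimal Python sketch that keeps only the entries whose level is not "info". It assumes the line layout seen in that excerpt (process[pid]: YYYY/MM/DD_HH:MM:SS level: message); the names LOG_LINE, Entry, parse_entries, non_info, and SAMPLE are illustrative only and are not part of heartbeat.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Assumed line shape, inferred from the ha-log excerpt in this thread, e.g.:
#   heartbeat[9262]: 2008/11/05_16:37:26 WARN: Shutdown delayed until ...
LOG_LINE = re.compile(
    r"^(?P<proc>\w+)\[(?P<pid>\d+)\]: "
    r"(?P<date>\d{4}/\d{2}/\d{2})_(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+): (?P<msg>.*)$"
)


class Entry(NamedTuple):
    proc: str
    pid: int
    date: str
    time: str
    level: str
    msg: str


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per line that matches the assumed layout.

    Lines that do not match (for example wrapped continuation text) are skipped.
    """
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            yield Entry(m["proc"], int(m["pid"]), m["date"],
                        m["time"], m["level"], m["msg"])


def non_info(lines: Iterable[str]) -> List[Entry]:
    """Return every parsed entry whose level is something other than 'info'."""
    return [e for e in parse_entries(lines) if e.level != "info"]


# A few lines copied from the log excerpt in this thread.
SAMPLE = [
    "heartbeat[9262]: 2008/11/05_16:27:45 info: other_holds_resources: 0",
    "heartbeat[9262]: 2008/11/05_16:27:54 WARN: node nsm2.ysk.com: is dead",
    "harc[10840]: 2008/11/05_16:31:58 info: Running /etc/ha.d/rc.d/status status",
    "heartbeat[9262]: 2008/11/05_16:37:26 WARN: Shutdown delayed until current resource activity finishes.",
]

if __name__ == "__main__":
    for entry in non_info(SAMPLE):
        print(entry.date, entry.time, entry.level, entry.msg)
```

To run it against a real file, one could pass an open file object instead of SAMPLE, for example non_info(open("/var/log/ha-log")), since the functions only iterate over lines.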