Tuesday, May 6, 2014

Cluster Time Synchronisation Services (CTSS) and NTP

In 11gR2, Oracle integrated time synchronization into the Grid Infrastructure software itself through the Cluster Time Synchronization Service (ctssd). Previously you used NTP or another form of time sync. I had installed 11gR2 GI before with no issues in this regard: just disable ntpd before beginning the installation, use chkconfig to make sure it's off at all runlevels, and the OUI will configure CTSS when it sees that NTP is not active.

However, I recently installed 11gR2 (11.2.0.4) on OEL 6.5. Everything installed fine, but the cluster verification step at the end failed, and the installation log showed the following:

 INFO: Checking if CTSS Resource is running on all nodes...
 INFO: CTSS resource check passed
 INFO: Querying CTSS for time offset on all nodes...
 INFO: Query of CTSS for time offset passed
 INFO: Check CTSS state started...
 INFO: CTSS is in Observer state. Switching over to clock synchronization checks using NTP
 INFO: Starting Clock synchronization checks using Network Time Protocol(NTP)...
 INFO: NTP Configuration file check started...
 INFO: NTP Configuration file check passed
 INFO: Checking daemon liveness...
 INFO: Liveness check failed for "ntpd"
 INFO: Check failed on nodes:
 INFO:   ol6-112-rac2
 INFO: NTP daemon slewing option check failed on some nodes
 INFO: Check failed on nodes:
 INFO:   ol6-112-rac1
 INFO: PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"
 INFO: Clock synchronization check using Network Time Protocol(NTP) failed
 INFO: PRVF-9652 : Cluster Time Synchronization Services check failed

Hmm. I had already disabled ntpd before I installed. Never mind, we'll do it again to make sure.

Stopped on both nodes:

 [root@ol6-112-rac2 bin]# service ntpd stop

Check to make sure it's off in chkconfig:

 [root@ol6-112-rac2 ol6-112-rac2]# chkconfig --list | grep ntp
 ntpd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
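
If you'd rather not eyeball that output on every node, the check can be scripted. This is only a rough sketch: `all_levels_off` is a made-up helper name, and it assumes the `chkconfig --list` output format shown above.

```shell
# Hypothetical helper (not part of any Oracle or OS tooling).
# Reads `chkconfig --list` style output on stdin and succeeds only if
# the named service appears and is "off" at every runlevel.
all_levels_off() {
    awk -v svc="$1" '
        $1 == svc {
            found = 1
            # Fields 2..NF look like "0:off", "3:on", and so on.
            for (i = 2; i <= NF; i++)
                if ($i !~ /:off$/) bad = 1
        }
        END { exit (found && !bad) ? 0 : 1 }
    '
}
```

Something like `chkconfig --list | all_levels_off ntpd && echo "ntpd is off at all runlevels"` would then do the job on each node.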

And then I hit retry in the installer. I didn't get a screenshot, but the same error came up again.
I then turned to the trusty Oracle documentation, which states:

"If you have an NTP service on your server but you cannot use the service to synchronize time with a time server, then you must deactivate and deinstall the NTP to use Cluster Time Synchronization Service."
Well, it's already been deactivated, but not deinstalled.

 [root@ol6-112-rac2 ol6-112-rac2]# rpm -e ntp
 error: Failed dependencies:
    ntp is needed by (installed) system-config-date-1.9.60-2.0.1.el6.noarch
    ntp is needed by (installed) ipa-client-3.0.0-37.el6.x86_64

Yeeah, I didn't want to deal with the hassle of checking if I still needed the dependencies or not, so I scoured around on forums and the like. In passing, I read that you had to rename the ntp.conf file in /etc. Which is a bit weird, but looking at the log above, it does check for the existence of the ntp.conf file:

 INFO: NTP Configuration file check started...
 INFO: NTP Configuration file check passed
 INFO: Checking daemon liveness...
 INFO: Liveness check failed for "ntpd"

I didn't think it would be important, since it looks for the config file and then checks whether the service is alive, which I think would be the better indicator, but that's why I'm not an Oracle engineer. So, since I didn't want to uninstall:

 [root@ol6-112-rac2 ol6-112-rac2]# cd /etc
 [root@ol6-112-rac2 etc]# mv ntp.conf ntp.conf.orig

On both nodes again. At this point I had already exited the OUI and was bringing the cluster up and down manually, so after starting the cluster with crsctl start crs, the log showed:

 [ctssd(23630)]CRS-2401:The Cluster Time Synchronization Service started on host ol6-112-rac2.
 2014-05-06 11:26:01.867:
 [ctssd(23630)]CRS-2407:The new Cluster Time Synchronization Service reference node is host ol6-112-rac1.
 2014-05-06 11:26:03.460:
 [ohasd(23309)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
 2014-05-06 11:26:03.460:
 [ohasd(23309)]CRS-2769:Unable to failover resource 'ora.diskmon'.
 2014-05-06 11:26:09.034:
 [ctssd(23630)]CRS-2408:The clock on host ol6-112-rac2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Well would you look at that. Checking just the ctss component:

 [root@ol6-112-rac2 bin]# ./crsctl check ctss
 CRS-4701: The Cluster Time Synchronization Service is in Active mode.
 CRS-4702: Offset (in msec): 0
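
For a scripted version of that check, here is a small sketch. `ctss_is_active` is a hypothetical name, and it only matches the "Active mode" message text exactly as it appears in the output above; anything else (including Observer mode) is treated as not active.

```shell
# Hypothetical helper: reads `crsctl check ctss` output on stdin and
# succeeds only if it contains the Active-mode message shown above.
ctss_is_active() {
    grep -q 'Cluster Time Synchronization Service is in Active mode'
}
```

Run from the Grid home's bin directory, it would be used as `./crsctl check ctss | ctss_is_active && echo "CTSS is active"`.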

And there we go. Long story short, if you don't want to uninstall ntp but want to use CTSS, move or rename the ntp.conf file before installing GI.
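
The rename itself is a one-liner, but if you want something a bit more careful to run as root on each node, a sketch might look like this. `disable_ntp_conf` is a name I made up, the default path is the /etc/ntp.conf from this post, and the `.orig` suffix simply mirrors what I did by hand.

```shell
# Hypothetical helper: move an NTP config file out of the way by
# renaming it to <file>.orig. Defaults to /etc/ntp.conf.
disable_ntp_conf() {
    conf="${1:-/etc/ntp.conf}"
    if [ ! -f "$conf" ]; then
        echo "no $conf found, nothing to do"
        return 0
    fi
    # Refuse to clobber a backup left by an earlier run.
    if [ -e "$conf.orig" ]; then
        echo "$conf.orig already exists, not overwriting" >&2
        return 1
    fi
    mv "$conf" "$conf.orig" && echo "renamed $conf to $conf.orig"
}
```

Calling `disable_ntp_conf` with no argument acts on /etc/ntp.conf; passing a path lets you try it on a scratch copy first.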
