Wednesday, May 14, 2014

PRCA-1022 : ACFS file system resource already exists for disk group


I was recently playing around with ACFS and ADVM, trying different things on my test cluster. I had created an ASM volume and mounted it using ASMCA, and then deleted it and tried to re-create the same thing with the command line.

Creating the diskgroup and the volume worked fine through the CLI. However, when I tried to add the filesystem with srvctl using the following command:

srvctl add filesystem -d /dev/asm/dbhome-326 -g 'DBHOME' -v DBHOME -m /u01/app/oracle/product/11.2.0/dbhome_1 -u oracle

I got the following error:

PRCA-1022 : ACFS file system resource already exists for disk group dbhome and volume orahome

Hm. That's weird. So I tried running: 

srvctl remove filesystem -d /dev/asm/dbhome-326

And got an error saying the filesystem doesn't exist. The dbhome diskgroup did exist, but I couldn't find the link between the two.

So here's the real lesson for the day. I usually run PuTTY in little rectangular windows, so when I do a crsctl stat res -t, the output scrolls by, I see just the last 10 entries or so, and I assume everything above is fine. Except when it's not. Like in this case.

It looks like ASMCA doesn't clean up everything quite right: the ACFS resource was still listed in the registry, as evidenced by this entry in the crsctl stat output (truncated somewhat for readability):

...
ora.dbhome.dbhome.acfs 
 OFFLINE OFFLINE ol6-112-rac1 volume 
 OFFLINE OFFLINE ol6-112-rac2 volume

So it was just a matter of deleting the resource:

[root@ol6-112-rac1 bin]# ./crsctl delete resource ora.dbhome.dbhome.acfs

And upon deletion, the srvctl add filesystem command ran successfully. So: run your terminal windows longer vertically! (Or pay closer attention to the status output of crsctl stat.)
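One way to avoid missing a stale entry in a short terminal window is to filter the status output rather than scroll it. The following is a minimal sketch, not anything from Oracle's tooling: show_acfs_resources is a hypothetical helper name, and it assumes the layout shown above, where a resource name starts in the first column and its per-node state lines are indented.

```shell
# Hypothetical helper: read `crsctl stat res -t`-style output on stdin and
# print only the ACFS resources together with their per-node state lines.
# Assumes resource names start in column one and state lines are indented.
show_acfs_resources() {
    awk '
        # A line starting in column one is a resource name or a header:
        # keep printing only if it names an ACFS resource.
        /^[^[:space:]]/ { show = ($0 ~ /^ora\..*\.acfs[[:space:]]*$/) }
        # Indented (or blank) lines belong to the most recent name line.
        show { print }
    '
}
```

Usage would be crsctl stat res -t | show_acfs_resources, or simply crsctl stat res -t | less to page through everything.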

Tuesday, May 6, 2014

Cluster Time Synchronization Services (CTSS) and NTP

In 11gR2, Oracle integrated time synchronization into the Grid Infrastructure software itself, in the form of the Cluster Time Synchronization Service (ctssd). Previously you used NTP or another form of time sync, and I had installed 11gR2 GI before with no issues in this regard. Just disable ntpd before beginning the installation, use chkconfig to make sure it's off at all runlevels, and when the installation starts the OUI will configure CTSS once it recognizes that NTP is not active.

However, I recently installed 11gR2 (11.2.0.4) on OEL 6.5. Everything installed fine, but the cluster verification step at the end failed, and the installation log showed the following:

 INFO: Checking if CTSS Resource is running on all nodes...
 INFO: CTSS resource check passed
 INFO: Querying CTSS for time offset on all nodes...
 INFO: Query of CTSS for time offset passed
 INFO: Check CTSS state started...
 INFO: CTSS is in Observer state. Switching over to clock synchronization checks using NTP
 INFO: Starting Clock synchronization checks using Network Time Protocol(NTP)...
 INFO: NTP Configuration file check started...
 INFO: NTP Configuration file check passed
 INFO: Checking daemon liveness...
 INFO: Liveness check failed for "ntpd"
 INFO: Check failed on nodes:
 INFO:   ol6-112-rac2
 INFO: NTP daemon slewing option check failed on some nodes
 INFO: Check failed on nodes:
 INFO:   ol6-112-rac1
 INFO: PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"
 INFO: Clock synchronization check using Network Time Protocol(NTP) failed
 INFO: PRVF-9652 : Cluster Time Synchronization Services check failed

Hmm. I had already disabled ntpd before I installed. Never mind, we'll do it again to make sure.

Stopped on both nodes:

 [root@ol6-112-rac2 bin]# service ntpd stop

Check to make sure it's off in chkconfig:

 [root@ol6-112-rac2 ol6-112-rac2]# chkconfig --list | grep ntp
 ntpd            0:off   1:off   2:off   3:off   4:off   5:off   6:off

And then I hit retry in the installer. I didn't get a screenshot, but the same error came up again.

I then turned to the trusty Oracle documentation, which states:

"If you have an NTP service on your server but you cannot use the service to synchronize time with a time server, then you must deactivate and deinstall the NTP to use Cluster Time Synchronization Service."

Well, it had already been deactivated, but not deinstalled.

 [root@ol6-112-rac2 ol6-112-rac2]# rpm -e ntp
 error: Failed dependencies:
    ntp is needed by (installed) system-config-date-1.9.60-2.0.1.el6.noarch
    ntp is needed by (installed) ipa-client-3.0.0-37.el6.x86_64

Yeeah, I didn't want to deal with the hassle of checking if I still needed the dependencies or not, so I scoured around on forums and the like. In passing, I read that you had to rename the ntp.conf file in /etc. Which is a bit weird, but looking at the log above, it does check for the existence of the ntp.conf file:

 INFO: NTP Configuration file check started...
 INFO: NTP Configuration file check passed
 INFO: Checking daemon liveness...
 INFO: Liveness check failed for "ntpd"

I didn't think it would be important, since it looks for the ntp.conf file and then checks whether the service is alive, which I think would be the better indicator, but that's why I'm not an Oracle engineer. So, since I didn't want to uninstall:

 [root@ol6-112-rac2 ol6-112-rac2]# cd /etc
 [root@ol6-112-rac2 ol6-112-rac2]# mv ntp.conf ntp.conf.orig

On both nodes again. At this point I had already exited out of the OUI and was up-ing and down-ing the cluster manually. After starting the cluster with crsctl start crs, this showed up in the log:

 [ctssd(23630)]CRS-2401:The Cluster Time Synchronization Service started on host ol6-112-rac2.
 2014-05-06 11:26:01.867:
 [ctssd(23630)]CRS-2407:The new Cluster Time Synchronization Service reference node is host ol6-112-rac1.
 2014-05-06 11:26:03.460:
 [ohasd(23309)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
 2014-05-06 11:26:03.460:
 [ohasd(23309)]CRS-2769:Unable to failover resource 'ora.diskmon'.
 2014-05-06 11:26:09.034:
 [ctssd(23630)]CRS-2408:The clock on host ol6-112-rac2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Well would you look at that. Checking just the ctss component:

 [root@ol6-112-rac2 bin]# ./crsctl check ctss
 CRS-4701: The Cluster Time Synchronization Service is in Active mode.
 CRS-4702: Offset (in msec): 0
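
If you want to check this on several nodes without reading the messages by eye, the mode can be pulled out of the crsctl check ctss output. This is only a sketch: ctss_mode is a hypothetical helper name, and the "in Observer mode" wording is assumed to mirror the "in Active mode" message above.

```shell
# Hypothetical helper: read `crsctl check ctss` output on stdin and print
# "Active", "Observer", or "Unknown".
# Assumption: Observer state is reported with the same wording as the
# "in Active mode" message shown above.
ctss_mode() {
    awk '
        /in Active mode/   { mode = "Active" }
        /in Observer mode/ { mode = "Observer" }
        END { print (mode ? mode : "Unknown") }
    '
}
```

Usage would be ./crsctl check ctss | ctss_mode, run on each node.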

And there we go. Long story short, if you don't want to uninstall ntp but want to use CTSS, move/rename the ntp.conf file before installing GI.
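
The rename step can be wrapped in a small function so it is safe to run more than once on each node. This is a rough sketch rather than an official procedure: park_ntp_conf is a made-up name, and the path argument exists only so it can be tried against a scratch file first.

```shell
# Hypothetical helper: rename an NTP config file out of the way, keeping the
# original as <name>.orig so it can be restored later.
# $1: path to the config file (defaults to /etc/ntp.conf).
park_ntp_conf() {
    conf="${1:-/etc/ntp.conf}"
    if [ -f "$conf" ]; then
        mv "$conf" "$conf.orig"
        echo "renamed $conf to $conf.orig"
    else
        echo "$conf not found, nothing to do"
    fi
}

# Typical use as root on each node, after stopping and disabling ntpd:
#   service ntpd stop
#   chkconfig ntpd off
#   park_ntp_conf
```

Keeping the .orig copy means going back to NTP later is just a matter of renaming the file back and re-enabling the service.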