Pages

Wednesday, May 14, 2014

PRCA-1022 : ACFS file system resource already exists for disk group


I was recently playing around with ACFS and ADVM, trying different things on my test cluster. I had created an ASM volume and mounted it using ASMCA, and then deleted it and tried to re-create the same thing with the command line.

Creating the diskgroup, the volume, all worked fine through the CLI. However when trying to add with srvctl using the following command:

srvctl add filesystem -d /dev/asm/dbhome-326 -g 'DBHOME' -v DBHOME -m /u01/app/oracle/product/11.2.0/dbhome_1 -u oracle
I got the following error:

PRCA-1022 : ACFS file system resource already exists for disk group dbhome and volume orahome

Hm. That's weird. So I tried running: 

srvctl remove filesystem -d /dev/asm/dbhome-326

And get an error saying the filesystem doesn't exist. But the dbhome diskgroup exists, but I couldn't seem to find the link between the two.

So here's the real lesson for the day. I usually run Putty in little rectangular windows, and so when I do a crsctl stat res -t, the output scrolls by and I usually just get the last 10 entries or so, and assume everything above is fine. Except when it's not. Like in this case.

It looks like the ASMCA doesn't clean up everything quite right, as the ACFS resource was still listed in the registry, as evidenced by this entry in the crsctl stat output: (truncated somewhat for readability) 

...
ora.dbhome.dbhome.acfs 
 OFFLINE OFFLINE ol6-112-rac1 volume 
 OFFLINE OFFLINE ol6-112-rac2 volume
So it was just a matter of deleting the resource:

[root@ol6-112-rac1 bin]# ./crsctl delete resource ora.dbhome.dbhome.acfs
And upon deletion, the srvctl add filesystem command run successfully. So: Run your terminal windows longer vertically! (or pay closer attention to the status output of crsctl stat).

Tuesday, May 6, 2014

Cluster Time Synchonisation Services (CTSS) and NTP

In 11gR2 Oracle integrated time synchronization with the Grid Infra software itself, with the Cluster Time Synchronization Service (ctssd). Previously you used ntp or another form of time synch, and I had installed 11gR2 GI before with no issues with regards to this. Just disable ntpd before beginning installation, chkconfig to make sure it's off on all levels, and when starting the installation the OUI will configure ctss when it recognizes that ntp is not active.

However I did a recent install of 11gR2 (11.2.0.4) on OEL 6.5, everything installed fine but during the cluster verification step at the end the check would fail and the installation log would show the following:

 INFO: Checking if CTSS Resource is running on all nodes...
 INFO: CTSS resource check passed
 INFO: Querying CTSS for time offset on all nodes...
 INFO: Query of CTSS for time offset passed
 INFO: Check CTSS state started...
 INFO: CTSS is in Observer state. Switching over to clock synchronization checks using NTP
 INFO: Starting Clock synchronization checks using Network Time Protocol(NTP)...
 INFO: NTP Configuration file check started...
 INFO: NTP Configuration file check passed
 INFO: Checking daemon liveness...
 INFO: Liveness check failed for "ntpd"
 INFO: Check failed on nodes:
 INFO:   ol6-112-rac2
 INFO: NTP daemon slewing option check failed on some nodes
 INFO: Check failed on nodes:
 INFO:   ol6-112-rac1
 INFO: PRVF-5436 : The NTP daemon running on one or more nodes      lacks the slewing option "-x"
 INFO: Clock synchronization check using Network Time Protocol(NTP) failed
 INFO: PRVF-9652 : Cluster Time Synchronization Services check failed

Hmm. I had already disabled ntpd before I installed. Never mind, we'll do it again to make sure.

Stopped on both nodes:

 [root@ol6-112-rac2 bin]# service ntpd stop

Check to make sure it's off in chkconfig:

 [root@ol6-112-rac2 ol6-112-rac2]# chkconfig --list | grep ntp
 ntpd            0:off   1:off   2:off   3:off   4:off   5:off   6:off

And then hit retry in the installer. I didn't get a screenshot, but the same error ended up happening.
I then turned to the trusty Oracle documentation  which states:

"If you have an NTP service on your server but you cannot use the service to synchronize time with a time server, then you must deactivate and deinstall the NTP to use Cluster Time Synchronization Service."
Well it's already been deactived, but not deinstalled.

 [root@ol6-112-rac2 ol6-112-rac2]# rpm -e ntp
 error: Failed dependencies:
    ntp is needed by (installed) system-config-date-1.9.60-2.0.1.el6.noarch
    ntp is needed by (installed) ipa-client-3.0.0-37.el6.x86_64

Yeeah, I didn't want to deal with the hassle of checking if I still needed the dependencies or not, so I scoured around on forums and the like. In passing, I read that you had to rename the ntp.conf file in /etc. Which is a bit weird, but looking at the log above, it does check for the existence of the ntp.conf file:

 INFO: NTP Configuration file check started...
 INFO: NTP Configuration file check passed
 INFO: Checking daemon liveness...
 INFO: Liveness check failed for "ntpd"

I didn't think it would be important since it looks for the log file, then checks to see if the service is alive, which I think would be a better indicator, but that's why I'm not an Oracle engineer. So since I didn't want to uninstall:

 [root@ol6-112-rac2 ol6-112-rac2]# cd /etc
 [root@ol6-112-rac2 ol6-112-rac2]# mv ntp.conf ntp.conf.orig

On both nodes again. At this point I had already exited out of the OUI and was up-ing and down-ing the cluster manually, so after starting the cluster with crsctl start crs,

 [ctssd(23630)]CRS-2401:The Cluster Time Synchronization Service started on host ol6-112-rac2.
 2014-05-06 11:26:01.867:
 [ctssd(23630)]CRS-2407:The new Cluster Time Synchronization Service reference node is host ol6-112-rac1.
 2014-05-06 11:26:03.460:
 [ohasd(23309)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
 2014-05-06 11:26:03.460:
 [ohasd(23309)]CRS-2769:Unable to failover resource      'ora.diskmon'.
 2014-05-06 11:26:09.034:
 [ctssd(23630)]CRS-2408:The clock on host ol6-112-rac2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

Well would you look at that. Checking just the ctss component:

 [root@ol6-112-rac2 bin]# ./crsctl check ctss
 CRS-4701: The Cluster Time Synchronization Service is in Active mode.
 CRS-4702: Offset (in msec): 0

And there we go. Long story short, if you don't want to uninstall ntp but want to use CTSS, move/rename the ntp.conf  file before installing GI.

Thursday, March 27, 2014

Everything you wanted to know about SCNs but was too afraid to ask


So I did some reading about SCN's from various sources, i.e. mostly forums, blogs, and Tom Kyte's site. And there is A LOT of information out there about SCN, much more than would be of any practical use. And I don't want to reinvent the wheel, a lot of other people have explored the topic and are exponentially much more knowledgeable than I am. So I tried to compile the pertinent points which I found were most significant to my understanding about SCN's

What is it?

An SCN or System Change Number is an internal logical sequence assigned to transactions, used to provide a sequence to transactions and used to satisfy recovery and ACID requirements. (Doc ID 1376995.1)

A sequence? So it's like a ticket number given to each transaction?

Not necessarilly. One way to think of it as an internal timestamp. There is a very good analogy by Hans Forbrich in this forum post, where he compares it to the security camera at the airport. It captures data from multiple feeds, but are all linked together by the same timestamp. Say an auditor wanted to know what happened in the departure lounge, the check-in counter, and the luggage claim at 2:51 PM. The camera at each location would be different transactions within the database, and the timestamp of 2:51 PM would be the SCN.

But, then that would mean the same SCN would be assigned to multiple transactions?

YES! An SCN is NOT unique to a single transaction. Transactions in different sessions that commit at the same time CAN have the same SCN! (Doc ID 1376995.1)

So going back, you mentioned it's like a clock. Does that mean it is consistently moving?

Technically, no. An action is needed to move the SCN from it's current value to the next. This can be a transaction OR a query. Jonathan Lewis mentions that there are 3 actions that can cause the SCN to move forward, but I can't find the exact documentation that states this.
On the other hand, in a normally operating database it is always moving since Oracle background processes are always doing transactions, even if there are no user transactions being done. So in a way, it is consistently moving, but because of operations that are being performed that move it forward.

But when I query CURRENT_SCN from V$DATABASE, it keeps on increasing!

This is one of the operations that increments the SCN. As answered by Jonathan Lewis in this thread, when querying V$DATABASE it triggers an increment to the SCN. Oracle calls kcmgas (Get and Advance SCN) and kcmgss (Get Snapshot SCN). Side note - kcm stands for Kernel Cache Miscellaneous functions. Additionally, he also mentions that the functions dbms_cdc_utility.get_current_scn and dbms_flashback.get_system_change_number do not increase the SCN since they only call kcmgss.

I did a small experiment to try to prove this:

I have a short query that shows the stats for the session doing a select select on CURRENT_SCN, and the session doing a dbms_flashback.get_system_change_number.

select case when t.sid = 49 then 'dbms_flashback'
    else 'current_scn'
    end "OPERATION"
 ,
n.name, t.value
from v$statname n, v$sesstat t
where t.sid in ('42','49')
and t.statistic# in ('163','164')
and t.statistic# = n.statistic#
group by t.sid, n.name, t.value
order by t.sid;

OPERATION      NAME                                                    VALUE
-------------- -------------------------------------------------- ----------
current_scn    calls to get snapshot scn: kcmgss                          23
current_scn    calls to kcmgas                                             2
dbms_flashback calls to get snapshot scn: kcmgss                         319
dbms_flashback calls to kcmgas                                             1

Then ran the dbms_flashback function:

SQL> select dbms_flashback.get_system_change_number
  2  from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
                 1172005

And then ran the query again:

OPERATION      NAME                                                    VALUE
-------------- -------------------------------------------------- ----------
current_scn    calls to get snapshot scn: kcmgss                          23
current_scn    calls to kcmgas                                             2
dbms_flashback calls to get snapshot scn: kcmgss                         320
dbms_flashback calls to kcmgas                                             1


As you can see the kcmgss stat for the dbms_flashback session increased by 1 to 320, while the rest of the stats remain unchanged.
In another session I then selected current_scn from v$database:


SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    1172028

And then ran the query again;

OPERATION      NAME                                                    VALUE
-------------- -------------------------------------------------- ----------
current_scn    calls to get snapshot scn: kcmgss                          24
current_scn    calls to kcmgas                                             3
dbms_flashback calls to get snapshot scn: kcmgss                         320
dbms_flashback calls to kcmgas                                             1    

Reading about the SCN has pointed me in reading more about ITLs, which I had been putting off for a while. But for now it looks like a good topic for my next post.

Friday, March 21, 2014

SCNs


I started working with Oracle database in late 2006. It's 2014 now, which is roughly 7 and a half years later, and I am still finding things about Oracle that have totally demolished any pre-conceived notions that I have had for the entirety of my career.

Deeper Down the Rabbit Hole

Yesterday I decided to setup a Logical Standby, for the simple reason that I would soon be put in charge of managing one and I have never done so before. The initial step would be to create a physical standby, and I doing this, running commands my fingers haven't typed in over a year, was a really happy moment for me. But I digress.

Once I had setup my physical standby, I was doing checking and monitoring to ensure that the standby was being updated. And I remember a question a client had put to me before, how do they know at the most granular level that the standby is indeed being updated?

I explained the whole theory behind archivelog sequence numbers, SCNs and the like, which seemed to satisfy them. But thinking about it now, I never really satisfied my own quest for that answer.

That led me down the path of investigating of how to check at the most granular level, to see follow the trail of a transaction as it was committed on the primary, and transferred and applied on the standby. How, and could I track this through the SCN of the transaction? Which again, somehow led me to read about SCNs and what they are (and what they are not!).

I had always believed that the SCN of the database was a counter, a record for each transaction, which would only get incremented at the commit of each transaction. Shown in table form, this is how I pictured it in my mind:

Txn num SCN 
------- ------------- 
 1      99183 
 2      99184 
 3      99185 
 4      99186 
 5      99187

Makes sense right? But how come when we query CURRENT_SCN from v$database it keeps on increasing, even if there is nothing going on in the database? 

SQL> select current_scn from v$database; 

 CURRENT_SCN 
 ----------- 
 994651 

 SQL> / 

 CURRENT_SCN 
 ----------- 
 994669 

 SQL> / 

 CURRENT_SCN 
 ----------- 
 994671

My answer to this to satisfy clients (and to somewhat my degree myself) was that there was always some Oracle background processes doing work, so for each transaction they do, the SCN increases.

Then I stumbled upon this Ask Tom thread, which mentioned that to think of the SCN more of a clock. Always moving, and gets assigned to transaction when they are committed. 

Txn num SCN 
------- ------------- 
 1      99183 
 2      99188 
 3      99192 
 4      99193 
 5      99201 

So depending on when a transaction is committed, that would determine the SCN that is assigned to your transaction. This explanation BLEW MY MIND. It totally decimated whatever I had learnt and thought about the SCN for the PAST SEVEN YEARS. This also explains why the SCN in v$database keeps increasing right? I was so elated about this fact when I found out yesterday afternoon. I come into work this morning, and keep reading, and lo and behold, THIS IS NOT CORRECT.

Yes, the SCN can be thought of a counter. And yes, it is incremented by transaction that occur within the database. But the reason as to why the value of CURRENT_SCN keeps increasing is because THE SCN IS INCREASED WHENEVER WE QUERY THAT VIEW. There is truly nothing left of my mind as it has been continuously been blown apart by all these revelations, which I should've known about SIX OR SO YEARS AGO.

So my entire day has been dedicated to trying to find out as much as I could about the SCN. I'll try and collate everything in the posts to follow.

Friday, February 14, 2014

Viewing Hidden Parameters in Oracle Database


Well they wouldn't really be 'hidden parameters' if they were out in plain sight right? Right? (crickets).

Anyway, hidden parameters won't be available through a show parameter or from v$parameter, but can be selected from several x$ks% tables. Below is a query from RedShoreTech that output's it nicely. I added several column formatting options at the top, and you can whateverother conditional statements relevant to your purpose at the end, e.g. adding 'AND a.ksppinm like '%latch%'.

set linesize 200 
col parameter for a30 col "Session Value" for a8 
col "Instance Value" for a16 

SELECT a.ksppinm "Parameter", 
 b.ksppstvl "Session Value", 
 c.ksppstvl "Instance Value" 
FROM x$ksppi a, x$ksppcv b, x$ksppsv c 
WHERE a.indx = b.indx AND a.indx = c.indx 
AND a.ksppinm LIKE '/_%' escape '/' /