Thursday, April 14, 2011

OCR File and Voting Disk Administration by Example





Oracle Clusterware 10g, formerly known as Cluster Ready Services (CRS), is software that, when installed on servers running the same operating system, enables those servers to be bound together so that they operate and function as a single server, or cluster. This infrastructure simplifies the requirements for an Oracle Real Application Clusters (RAC) database by providing cluster software that is tightly integrated with the Oracle Database.
Oracle Clusterware requires two critical components, a voting disk to record node membership information and the Oracle Cluster Registry (OCR) to record cluster configuration information:

Voting Disk

The voting disk is a shared partition that Oracle Clusterware uses to verify cluster node membership and status. Oracle Clusterware uses the voting disk to determine which instances are members of a cluster by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The primary function of the voting disk is to manage node membership and prevent what is known as Split Brain Syndrome in which two or more instances attempt to control the RAC database. This can occur in cases where there is a break in communication between nodes through the interconnect.
The voting disk must reside on a shared disk(s) that is accessible by all of the nodes in the cluster. For high availability, Oracle recommends that you have multiple voting disks. Oracle Clusterware can be configured to maintain multiple voting disks (multiplexing) but you must have an odd number of voting disks, such as three, five, and so on. Oracle Clusterware supports a maximum of 32 voting disks. If you define a single voting disk, then you should use external mirroring to provide redundancy.
A node must be able to access more than half of the voting disks at any time. For example, if you have five voting disks configured, then a node must be able to access at least three of the voting disks at any time. If a node cannot access the minimum required number of voting disks it is evicted, or removed, from the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster.


What is a voting disk?
A voting disk is a file that manages information about node membership.
What are the administrative tasks involved with voting disk?
The following administrative tasks are performed with the voting disk:
1) Backing up voting disks
2) Recovering voting disks
3) Adding voting disks
4) Deleting voting disks
5) Moving voting disks
How do we backup voting disks?
1) Oracle recommends that you back up your voting disk after the initial cluster creation and after you complete any node addition or deletion procedures.
2) First, as the root user, stop Oracle Clusterware (with the crsctl stop crs command) on all nodes. Then, determine the current voting disk by issuing the following command:
         crsctl query css votedisk
3) Then, issue the dd or ocopy command to back up a voting disk, as appropriate.
What is the syntax for backing up a voting disk?
On Linux or UNIX systems:
dd if=voting_disk_name of=backup_file_name
where,
voting_disk_name is the name of the active voting disk
backup_file_name is the name of the file to which you want to back up the voting disk contents
On Windows systems, use the ocopy command:
ocopy voting_disk_name backup_file_name
What is the Oracle recommendation for backing up the voting disk?
Oracle recommends using the dd command to back up the voting disk with a minimum block size of 4 KB.
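For example, a minimal sketch using the voting disk and backup locations from the example environment described later in this article (substitute your own paths):

# As root, back up the voting disk reported by "crsctl query css votedisk".
dd if=/u02/oradata/racdb/CSSFile of=/u03/crs_backup/votebackup/CSSFile.bak bs=4k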
How do you restore a voting disk?
To restore a voting disk from a backup, use the dd command on Linux and UNIX systems or the ocopy command on Windows systems.
On Linux or UNIX systems:
dd if=backup_file_name of=voting_disk_name
On Windows systems, use the ocopy command:
ocopy backup_file_name voting_disk_name
where,
backup_file_name is the name of the voting disk backup file
voting_disk_name is the name of the active voting disk
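For example, a minimal sketch of a restore using the same paths as in the backup example above (run as root, with Oracle Clusterware stopped on all nodes):

crsctl stop crs
dd if=/u03/crs_backup/votebackup/CSSFile.bak of=/u02/oradata/racdb/CSSFile bs=4k
# Verify ownership and permissions on the restored file (oracle:oinstall, 0644),
# then restart Oracle Clusterware on all nodes.
crsctl start crs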
How can we add and remove multiple voting disks?
If we have multiple voting disks, then we can remove the voting disks and add them back into our environment using the following commands, where path is the complete path of the location where the voting disk resides:
crsctl delete css votedisk path
crsctl add css votedisk path

How do we stop Oracle Clusterware? When do we stop it?
Before making any modification to the voting disk, as root user, stop Oracle Clusterware using the crsctl stop crs command on all nodes.
How do we add voting disk?
To add a voting disk, issue the following command as the root user, replacing the path variable with the fully qualified path name for the voting disk we want to add:
crsctl add css votedisk path -force
How do we move voting disks?
To move a voting disk, delete it from its current location and add it at the new location by issuing the following commands as the root user, where path is the old location in the delete command and the new location in the add command:
crsctl delete css votedisk path -force
crsctl add css votedisk path -force

How do we remove voting disks?
To remove a voting disk, issue the following command as the root user, replacing the path variable with the fully qualified path name for the voting disk we want to remove:
crsctl delete css votedisk path -force
What should we do after modifying voting disks?
After modifying the voting disk, restart Oracle Clusterware using the crsctl start crs command on all nodes, and verify the voting disk location using the following command:
crsctl query css votedisk
When can we use the -force option?
If the cluster is down, then we can include the -force option to modify the voting disk configuration without interacting with active Oracle Clusterware daemons. However, using the -force option while any cluster node is active may corrupt the configuration.
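Putting these answers together, a sketch of moving a voting disk to a new location while the cluster is down (the paths shown are taken from the worked example later in this article; substitute your own):

# As root, stop Oracle Clusterware on all nodes.
crsctl stop crs

# Delete the voting disk from its old location and add it at the new one.
crsctl delete css votedisk /u02/oradata/racdb/CSSFile -force
crsctl add css votedisk /dev/raw/raw3 -force

# Restart Oracle Clusterware on all nodes and verify the new configuration.
crsctl start crs
crsctl query css votedisk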

Oracle Cluster Registry (OCR)

The OCR maintains cluster configuration information as well as configuration information about any cluster database within the cluster. It is the repository of configuration information for the cluster, managing details such as the cluster node list and instance-to-node mapping information. This information is used by many of the processes that make up CRS, as well as by other cluster-aware applications, which use the repository to share information among themselves. Some of the main components included in the OCR are:
·         Node membership information
·         Database instance, node, and other mapping information
·         ASM (if configured)
·         Application resource profiles such as VIP addresses, services, etc.
·         Service characteristics
·         Information about processes that Oracle Clusterware controls
·         Information about any third-party applications controlled by CRS (10g R2 and later)
The OCR stores configuration information in a series of key-value pairs within a directory tree structure. To view the contents of the OCR in a human-readable format, run the ocrdump command. This will dump the contents of the OCR into an ASCII text file in the current directory named OCRDUMPFILE.
The OCR must reside on a shared disk(s) that is accessible by all of the nodes in the cluster. Oracle Clusterware 10g Release 2 allows you to multiplex the OCR and Oracle recommends that you use this feature to ensure cluster high availability. Oracle Clusterware allows for a maximum of two OCR locations; one is the primary and the second is an OCR mirror. If you define a single OCR, then you should use external mirroring to provide redundancy. You can replace a failed OCR online, and you can update the OCR through supported APIs such as Enterprise Manager, the Server Control Utility (SRVCTL), or the Database Configuration Assistant (DBCA).

OCR SUMMARY

·         this is the central repository for the Oracle cluster: it holds all of the information about the cluster in real time;
·         CRS updates the OCR with information about node failures and reconfigurations;
·         CSS updates the OCR when a node is added or deleted;
·         NetCA, DBCA, and SRVCTL update the OCR with service information;
·         this is a binary file and cannot be edited;
·         the OCR information is cached on each node;
·         only one node (the master node) can update the OCR file, so the master node's OCR cache is kept up to date in real time;
·         the OCR file is automatically backed up every 4 hours to a location under the Clusterware home:

       cd $CRS_HOME/cdata/<cluster name>
        ls
        backup00.ocr backup01.ocr backup02.ocr day.ocr day_.ocr week.ocr

·         the OCR file can also be backed up manually by running the following command:
                  dd if=OCR_file_name of=backup_file_name

·         the OCR files are backed up for a week and overwritten in a circular manner;
·         because the OCR is a key component of the Oracle cluster, the OCR file must be mirrored;
·         the OCR file can be exported and imported with the ocrconfig command (see the sketch after this list);
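
A minimal sketch of working with these backups using the ocrconfig utility (run as root; the export file name below is an assumption):

# List the automatic OCR backups maintained by Oracle Clusterware.
ocrconfig -showbackup

# Take a logical (export) backup of the OCR.
ocrconfig -export /backup/oracle/ocr_export.dmp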


There are two methods for backing up the OCR (Oracle Cluster Registry):

1. Automatically generated OCR backup files under $CRS_HOME/cdata/crs
2. OCR export/logical backup


Oracle Clusterware automatically creates OCR backups:
- Every four hours: the last three copies
- At the end of the day: the last two copies
- At the end of the week: the last two copies

To back up the OCR file, copy the generated backup file from $CRS_HOME/cdata/crs to your backup directory (/backup/oracle).

You must run the backup as “root”.

Run the following command to take an OCR export (logical) backup.
# ocrconfig -export export_file_name
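The restore counterparts are sketched below (file names are assumptions; for ocrconfig -import and -restore, Oracle Clusterware should be stopped on all nodes first):

# Keep a copy of the most recent automatic backup in the backup directory (as root).
cp $CRS_HOME/cdata/crs/backup00.ocr /backup/oracle/

# Restore the OCR from a logical export backup.
ocrconfig -import /backup/oracle/ocr_export.dmp

# Or restore from an automatic (physical) backup.
ocrconfig -restore $CRS_HOME/cdata/crs/backup00.ocr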


The example configuration used in this article consists of a two-node RAC with a clustered database named racdb.idevelopment.info running Oracle RAC 10g Release 2 on the Linux x86 platform. The two node names are racnode1 and racnode2, each hosting a single Oracle instance named racdb1 and racdb2, respectively.

The example Oracle Clusterware environment is configured with a single voting disk and a single OCR file on an OCFS2 clustered file system. Note that the voting disk is owned by the oracle user in the oinstall group with 0644 permissions while the OCR file is owned by root in the oinstall group with 0640 permissions:
[oracle@racnode1 ~]$ ls -l /u02/oradata/racdb
total 16608
-rw-r--r-- 1 oracle oinstall 10240000 Aug 26 22:43 CSSFile
drwxr-xr-x 2 oracle oinstall     3896 Aug 26 23:45 dbs/
-rw-r----- 1 root   oinstall  6836224 Sep  3 23:47 OCRFile

Check Current OCR File

[oracle@racnode1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4660
         Available space (kbytes) :     257460
         ID                       :    1331197
         Device/File Name         : /u02/oradata/racdb/OCRFile
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded
Check Current Voting Disk
[oracle@racnode1 ~]$ crsctl query css votedisk
 0.     0    /u02/oradata/racdb/CSSFile

located 1 votedisk(s).



View OCR Configuration Information
Two methods exist to verify how many OCR files are configured for the cluster as well as their location. If the cluster is up and running, use the ocrcheck utility as either the oracle or root user account:
[oracle@racnode1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4660
         Available space (kbytes) :     257460
         ID                       :    1331197
         Device/File Name         : /u02/oradata/racdb/OCRFile  <-- OCR (primary)
                                    Device/File integrity check succeeded

                                    Device/File not configured  <-- OCR Mirror (not configured)

         Cluster registry integrity check succeeded
If CRS is down, you can still determine the location and number of OCR files by viewing the file ocr.loc, whose location is somewhat platform dependent. For example, on the Linux platform it is located at /etc/oracle/ocr.loc while on Sun Solaris it is located at /var/opt/oracle/ocr.loc:
[root@racnode1 ~]# cat /etc/oracle/ocr.loc
ocrconfig_loc=/u02/oradata/racdb/OCRFile
local_only=FALSE
To view the actual contents of the OCR in a human-readable format, run the ocrdump command. This command requires the CRS stack to be running. Running the ocrdump command will dump the contents of the OCR into an ASCII text file in the current directory named OCRDUMPFILE:
[root@racnode1 ~]# ocrdump
[root@racnode1 ~]# ls -l OCRDUMPFILE
-rw-r--r-- 1 root root 250304 Oct  2 22:46 OCRDUMPFILE
The ocrdump utility also allows for different output options:
#
# Write OCR contents to specified file name.
#
[root@racnode1 ~]# ocrdump /tmp/`hostname`_ocrdump_`date +%m%d%y:%H%M`


#
# Print OCR contents to the screen.
#
[root@racnode1 ~]# ocrdump -stdout -keyname SYSTEM.css


#
# Write OCR contents out to XML format.
#
[root@racnode1 ~]# ocrdump -stdout -keyname SYSTEM.css -xml > ocrdump.xml

Add an OCR File

Starting with Oracle Clusterware 10g Release 2 (10.2), users now have the ability to multiplex (mirror) the OCR. Oracle Clusterware allows for a maximum of two OCR locations; one is the primary and the second is an OCR mirror. To avoid simultaneous loss of multiple OCR files, each copy of the OCR should be placed on a shared storage device that does not share any components (controller, interconnect, and so on) with the storage devices used for the other OCR file.
Before attempting to add a mirrored OCR, determine how many OCR files are currently configured for the cluster as well as their location. If the cluster is up and running, use the ocrcheck utility as either the oracle or root user account:
[oracle@racnode1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4660
         Available space (kbytes) :     257460
         ID                       :    1331197
         Device/File Name         : /u02/oradata/racdb/OCRFile  <-- OCR (primary)
                                    Device/File integrity check succeeded

                                    Device/File not configured  <-- OCR Mirror (not configured yet)

         Cluster registry integrity check succeeded
If CRS is down, you can still determine the location and number of OCR files by viewing the file ocr.loc, whose location is somewhat platform dependent. For example, on the Linux platform it is located at /etc/oracle/ocr.loc while on Sun Solaris it is located at /var/opt/oracle/ocr.loc:
[root@racnode1 ~]# cat /etc/oracle/ocr.loc
ocrconfig_loc=/u02/oradata/racdb/OCRFile
local_only=FALSE
The results above indicate I have only one OCR file and that it is located on an OCFS2 file system. Since we are allowed a maximum of two OCR locations, I intend to create an OCR mirror and locate it on the same OCFS2 file system in the same directory as the primary OCR. Please note that I am doing this only for the sake of brevity. The OCR mirror should always be placed on a separate device from the primary OCR file to guard against a single point of failure.
Note that the Oracle Clusterware stack should be online and running on all nodes in the cluster while adding, replacing, or removing an OCR location; these operations therefore do not require any system downtime.
The operations performed in this section affect the OCR for the entire cluster. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running, so you should avoid shutting down nodes while modifying the OCR using the ocrconfig command. If, for any reason, any of the nodes in the cluster are shut down while the OCR is being modified with ocrconfig, you will need to perform a repair on the stopped node before it can be brought online to join the cluster. Please see the section "Repair an OCR File on a Local Node" for instructions on repairing the OCR file on the affected node.
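A sketch of such a repair (run as root on the node that was down, with Oracle Clusterware stopped on that node; the path shown is the OCR mirror used in this example):

# Update the local ocr.loc on the stopped node to match the rest of the cluster.
ocrconfig -repair ocrmirror /u02/oradata/racdb/OCRFile_mirror

# Then start Oracle Clusterware on that node and let it rejoin the cluster.
crsctl start crs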
You can add an OCR mirror after an upgrade or after completing the Oracle Clusterware installation. The Oracle Universal Installer (OUI) allows you to configure either one or two OCR locations during the installation of Oracle Clusterware. If you already mirror the OCR, then you do not need to add a new OCR location; Oracle Clusterware automatically manages two OCRs when you configure normal redundancy for the OCR. As previously mentioned, Oracle RAC environments do not support more than two OCR locations; a primary OCR and a secondary (mirrored) OCR.
Run the following command to add or relocate an OCR mirror using either destination_file or disk to designate the target location of the additional OCR:
ocrconfig -replace ocrmirror <destination_file>
ocrconfig -replace ocrmirror <disk>
You must be logged in as the root user to run the ocrconfig command.

Please note that ocrconfig -replace is the only way to add/relocate OCR files/mirrors. Attempting to copy the existing OCR file to a new location and then manually adding/changing the file pointer in the ocr.loc file is not supported and will actually fail to work.
For example:
#
# Verify CRS is running on node 1.
#
[root@racnode1 ~]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

#
# Verify CRS is running on node 2.
#
[root@racnode2 ~]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

#
# Configure the shared OCR destination_file/disk before
# attempting to create the new ocrmirror on it. This example
# creates a destination_file on an OCFS2 file system.
# Failure to pre-configure the new destination_file/disk
# before attempting to run ocrconfig will result in the
# following error:
#
#     PROT-21: Invalid parameter
#
[root@racnode1 ~]# cp /dev/null /u02/oradata/racdb/OCRFile_mirror
[root@racnode1 ~]# chown root /u02/oradata/racdb/OCRFile_mirror
[root@racnode1 ~]# chgrp oinstall /u02/oradata/racdb/OCRFile_mirror
[root@racnode1 ~]# chmod 640 /u02/oradata/racdb/OCRFile_mirror

#
# Add new OCR mirror.
#
[root@racnode1 ~]# ocrconfig -replace ocrmirror /u02/oradata/racdb/OCRFile_mirror
After adding the new OCR mirror, check that it can be seen from all nodes in the cluster:
#
# Verify new OCR mirror from node 1.
#
[root@racnode1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4668
         Available space (kbytes) :     257452
         ID                       :    1331197
         Device/File Name         : /u02/oradata/racdb/OCRFile
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/oradata/racdb/OCRFile_mirror  <-- New OCR Mirror
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded


[root@racnode1 ~]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device /u02/oradata/racdb/OCRFile_mirror
ocrconfig_loc=/u02/oradata/racdb/OCRFile
ocrmirrorconfig_loc=/u02/oradata/racdb/OCRFile_mirror


#
# Verify new OCR mirror from node 2.
#
[root@racnode2 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4668
         Available space (kbytes) :     257452
         ID                       :    1331197
         Device/File Name         : /u02/oradata/racdb/OCRFile
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/oradata/racdb/OCRFile_mirror  <-- New OCR Mirror
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded


[root@racnode2 ~]# cat /etc/oracle/ocr.loc
#Device/file  getting replaced by device /u02/oradata/racdb/OCRFile_mirror
ocrconfig_loc=/u02/oradata/racdb/OCRFile
ocrmirrorconfig_loc=/u02/oradata/racdb/OCRFile_mirror

1. To add an OCR device:
To add an OCR device, provide the full path, including the file name:
ocrconfig -replace ocr <filename>
To add an OCR mirror device, provide the full path, including the file name:
ocrconfig -replace ocrmirror <filename>
2. To remove an OCR device:
To remove the OCR device:
ocrconfig -replace ocr
To remove the OCR mirror device:
ocrconfig -replace ocrmirror
3. To replace or move the location of an OCR device:
To replace the OCR device with <filename>, provide the full path, including the file name:
ocrconfig -replace ocr <filename>
To replace the OCR mirror device with <filename>, provide the full path, including the file name:
ocrconfig -replace ocrmirror <filename>
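
For instance, a minimal sketch of removing the OCR mirror that was added earlier and verifying the result (run as root):

# Remove the OCR mirror; omitting the file name removes that OCR location.
ocrconfig -replace ocrmirror

# Confirm that only the primary OCR location remains configured.
ocrcheck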

Backing up the voting disk(s) is often performed on a regular basis by the DBA to guard the cluster against a single point of failure as the result of hardware failure or user error. Because the node membership information does not usually change, it is not a strict requirement that you back up the voting disk every day. At a minimum, however, your backup strategy should include procedures to back up all voting disks at the following times and make certain that the backups are stored in a secure location that is accessible from all nodes in the cluster in the event the voting disk(s) need to be restored:
  • After installing Oracle Clusterware
  • After adding nodes to or deleting nodes from the cluster
  • After performing voting disk add or delete operations
Oracle Clusterware 10g Release 1 (10.1) only allowed for one voting disk while Oracle Clusterware 10g Release 2 (10.2) lifted this restriction to allow for 32 voting disks. For high availability, Oracle recommends that Oracle Clusterware 10g R2 users configure multiple voting disks while keeping in mind that you must have an odd number of voting disks, such as three, five, and so on. To avoid simultaneous loss of multiple voting disks, each voting disk should be placed on a shared storage device that does not share any components (controller, interconnect, and so on) with the storage devices used for the other voting disks. If you define a single voting disk, then you should use external mirroring to provide redundancy.
To make a backup copy of the voting disk on UNIX/Linux, use the dd command:
dd if=<voting_disk_name> of=<backup_file_name> bs=<block_size>
Perform this operation on every voting disk where voting_disk_name is the name of the active voting disk (input file), backup_file_name is the name of the file to which you want to back up the voting disk contents (output file), and block_size is the value to set both the input and output block sizes. As a general rule on most platforms, including Linux and Sun, the block size for the dd command should be 4k to ensure that the backup of the voting disk gets complete blocks.
If your voting disk is stored on a raw device, use the device name in place of voting_disk_name. For example:
dd if=/dev/raw/raw3 of=/u03/crs_backup/votebackup/VotingDiskBackup.dmp bs=4k
When you use the dd command to make backups of the voting disk, the backup can be performed while the Cluster Ready Services (CRS) process is active; you do not need to stop the CRS daemons (namely, the crsd.bin process) before taking a backup of the voting disk.
The following is a sample UNIX shell script that can be scheduled in cron to back up the OCR file and the voting disks on a regular basis. It is a sketch only; the Clusterware home and backup directories used in it are placeholders and must be adjusted for your environment:
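#!/bin/bash
# crs_backup.sh -- sample cron script to back up the OCR and the voting disks.
# A sketch only: CRS_HOME, BACKUP_DIR, and the automatic backup path below are
# assumptions based on this article's example environment. Run as root.

CRS_HOME=/u01/app/crs                  # assumed Oracle Clusterware home
BACKUP_DIR=/u03/crs_backup             # assumed backup destination
DATE=$(date +%Y%m%d_%H%M)

mkdir -p $BACKUP_DIR/ocrbackup $BACKUP_DIR/votebackup

# 1. Logical (export) backup of the OCR.
$CRS_HOME/bin/ocrconfig -export $BACKUP_DIR/ocrbackup/OCRFile_export_${DATE}.dmp

# 2. Keep a copy of the most recent automatic OCR backup.
cp $CRS_HOME/cdata/crs/backup00.ocr $BACKUP_DIR/ocrbackup/backup00_${DATE}.ocr

# 3. Back up every voting disk reported by CSS using dd with a 4k block size.
#    (The dd backup can be taken while CRS is running.)
$CRS_HOME/bin/crsctl query css votedisk | awk '/^[ ]*[0-9]+\./ {print $3}' |
while read VDISK
do
    dd if=$VDISK of=$BACKUP_DIR/votebackup/$(basename $VDISK)_${DATE}.bak bs=4k
done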
For the purpose of this example, the current Oracle Clusterware environment is configured with three voting disks on an OCFS2 clustered file system that will be backed up to a local file system on one of the nodes in the cluster. For example:
#
# Query the location and number of voting disks.
#
[root@racnode1 ~]# crsctl query css votedisk
 0.     0    /u02/oradata/racdb/CSSFile
 1.     0    /u02/oradata/racdb/CSSFile_mirror1
 2.     0    /u02/oradata/racdb/CSSFile_mirror2

#
# Backup all three voting disks.
#
[root@racnode1 ~]# dd if=/u02/oradata/racdb/CSSFile of=/u03/crs_backup/votebackup/CSSFile.bak bs=4k
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.259862 seconds, 39.4 MB/s

[root@racnode1 ~]# dd if=/u02/oradata/racdb/CSSFile_mirror1 of=/u03/crs_backup/votebackup/CSSFile_mirror1.bak bs=4k
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.295964 seconds, 34.6 MB/s

[root@racnode1 ~]# dd if=/u02/oradata/racdb/CSSFile_mirror2 of=/u03/crs_backup/votebackup/CSSFile_mirror2.bak bs=4k
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.249039 seconds, 41.1 MB/s



The recommended way to recover from a lost or corrupt voting disk is to restore it from a previous good backup that was taken with the dd command.
There are actually very few steps required to restore the voting disks:
1.     Shut down CRS on all nodes in the cluster.
2.     List the current location of the voting disks.
3.     Restore each voting disk from a previous good backup using the dd command.
4.     Restart CRS on all nodes in the cluster.
For example:

[root@racnode1 ~]# crsctl stop crs
[root@racnode2 ~]# crsctl stop crs

[root@racnode1 ~]# crsctl query css votedisk

[root@racnode1 ~]# # Do this for all voting disks...
[root@racnode1 ~]# dd if=<backup_voting_disk> of=<voting_disk_name> bs=4k

[root@racnode1 ~]# crsctl start crs
[root@racnode2 ~]# crsctl start crs
The following is an example of what occurs on all RAC nodes when a voting disk is destroyed. This example will manually corrupt all voting disks in the cluster. After the Oracle RAC nodes reboot from the crash, we will follow up with the steps required to restore the lost/corrupt voting disk which will make use of the voting disk backups that were created in the previous section.
Although it should go without saying, DO NOT perform this recovery scenario on a critical system like production!
First, let's check the status of the cluster and all RAC components, list the current location of the voting disk(s), and finally list the voting disk backup that will be used to recover from:
[root@racnode1 ~]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.racdb.db   application    ONLINE    ONLINE    racnode2
ora....b1.inst application    ONLINE    ONLINE    racnode1
ora....b2.inst application    ONLINE    ONLINE    racnode2
ora....srvc.cs application    ONLINE    ONLINE    racnode2
ora....db1.srv application    ONLINE    ONLINE    racnode1
ora....db2.srv application    ONLINE    ONLINE    racnode2
ora....SM1.asm application    ONLINE    ONLINE    racnode1
ora....E1.lsnr application    ONLINE    ONLINE    racnode1
ora....de1.gsd application    ONLINE    ONLINE    racnode1
ora....de1.ons application    ONLINE    ONLINE    racnode1
ora....de1.vip application    ONLINE    ONLINE    racnode1
ora....SM2.asm application    ONLINE    ONLINE    racnode2
ora....E2.lsnr application    ONLINE    ONLINE    racnode2
ora....de2.gsd application    ONLINE    ONLINE    racnode2
ora....de2.ons application    ONLINE    ONLINE    racnode2
ora....de2.vip application    ONLINE    ONLINE    racnode2


[root@racnode1 ~]# crsctl query css votedisk
 0.     0    /u02/oradata/racdb/CSSFile
 1.     0    /u02/oradata/racdb/CSSFile_mirror1
 2.     0    /u02/oradata/racdb/CSSFile_mirror2

located 3 votedisk(s).


[root@racnode1 ~]# ls -l /u03/crs_backup/votebackup
total 30048
-rw-r--r-- 1 root root 10240000 Oct  8 21:24 CSSFile.bak
-rw-r--r-- 1 root root 10240000 Oct  8 21:24 CSSFile_mirror1.bak
-rw-r--r-- 1 root root 10240000 Oct  8 21:25 CSSFile_mirror2.bak
The next step is to simulate the corruption or loss of the voting disk(s).
Oracle RAC 10g R1 / R2 (not patched with 10.2.0.4)
If you are using Oracle RAC 10g R1 or Oracle RAC 10g R2 (not patched with 10.2.0.4), simply write zeros to one of the voting disks:
[root@racnode1 ~]# dd if=/dev/zero of=/u02/oradata/racdb/CSSFile

Both RAC servers are now stuck and will be rebooted by CRS...
Oracle RAC 11g or higher (including Oracle RAC 10g R2 patched with 10.2.0.4)
Starting with Oracle RAC 11g R1 (including Oracle RAC 10g R2 patched with 10.2.0.4), attempting to corrupt a voting disk using dd will result in all nodes being rebooted; however, Oracle Clusterware will reconstruct the corrupt voting disk and successfully bring up the RAC components. Because the voting disks do not contain persistent data, CSSD is able to fully reconstruct the voting disks so long as the cluster is running. This feature was introduced with Oracle Clusterware 11.1 and is also available with Oracle Clusterware 10.2 patched with 10.2.0.4.
This makes it a bit more difficult to corrupt a voting disk by simply writing zeros to it. You would need to find a way to dd the voting disks and stop the cluster before any of the voting disks could be automatically recovered by CSSD. Good luck with that! To simulate the corruption (actually the loss) of the voting disks and have both nodes crash, I'm simply going to delete all of the voting disks and then manually reboot the nodes:
Delete the voting disk...
[root@racnode1 ~]# rm /u02/oradata/racdb/CSSFile
[root@racnode1 ~]# rm /u02/oradata/racdb/CSSFile_mirror1
[root@racnode1 ~]# rm /u02/oradata/racdb/CSSFile_mirror2

Reboot both nodes to simulate the crash...
[root@racnode1 ~]# reboot
[root@racnode2 ~]# reboot
After the reboot, CRS will not come up and all RAC components will be down:
[root@racnode1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@racnode2 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
Ok, let's start the recovery process.
#
# Locate the voting disk backups that were taken in the
# previous section.
#
[root@racnode1 ~]# cd /u03/crs_backup/votebackup
[root@racnode1 votebackup]# ls -l *.bak
-rw-r--r-- 1 root root 10240000 Oct  8 21:24 CSSFile.bak
-rw-r--r-- 1 root root 10240000 Oct  8 21:24 CSSFile_mirror1.bak
-rw-r--r-- 1 root root 10240000 Oct  8 21:25 CSSFile_mirror2.bak

#
# Recover the voting disk (or voting disks) using the same
# dd command that was used to back it up, but with the input
# file and output file in reverse.
#
[root@racnode1 ~]# dd if=/u03/crs_backup/votebackup/CSSFile.bak of=/u02/oradata/racdb/CSSFile bs=4k
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.252425 seconds, 40.6 MB/s

[root@racnode1 ~]# dd if=/u03/crs_backup/votebackup/CSSFile_mirror1.bak of=/u02/oradata/racdb/CSSFile_mirror1 bs=4k
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.217645 seconds, 47.0 MB/s

[root@racnode1 ~]# dd if=/u03/crs_backup/votebackup/CSSFile_mirror2.bak of=/u02/oradata/racdb/CSSFile_mirror2 bs=4k
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.220051 seconds, 46.5 MB/s

#
# Verify the permissions on the recovered voting disk(s) are
# set appropriately.
#
[root@racnode1 ~]# chown oracle /u02/oradata/racdb/CSSFile
[root@racnode1 ~]# chgrp oinstall /u02/oradata/racdb/CSSFile
[root@racnode1 ~]# chmod 644 /u02/oradata/racdb/CSSFile

[root@racnode1 ~]# chown oracle /u02/oradata/racdb/CSSFile_mirror1
[root@racnode1 ~]# chgrp oinstall /u02/oradata/racdb/CSSFile_mirror1
[root@racnode1 ~]# chmod 644 /u02/oradata/racdb/CSSFile_mirror1

[root@racnode1 ~]# chown oracle /u02/oradata/racdb/CSSFile_mirror2
[root@racnode1 ~]# chgrp oinstall /u02/oradata/racdb/CSSFile_mirror2
[root@racnode1 ~]# chmod 644 /u02/oradata/racdb/CSSFile_mirror2

#
# With the recovered voting disk(s) in place, restart CRS
# on all Oracle RAC nodes.
#
[root@racnode1 ~]# crsctl start crs
[root@racnode2 ~]# crsctl start crs

If you have multiple voting disks, then you can remove the voting disks and add them back into your environment using the crsctl delete css votedisk path and crsctl add css votedisk path commands respectively, where path is the complete path of the location on which the voting disk resides.
After recovering the voting disk, run through several tests to verify that Oracle Clusterware is functioning correctly:
[root@racnode1 ~]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.racdb.db   application    ONLINE    ONLINE    racnode1
ora....b1.inst application    ONLINE    ONLINE    racnode1
ora....b2.inst application    ONLINE    ONLINE    racnode2
ora....srvc.cs application    ONLINE    ONLINE    racnode2
ora....db1.srv application    ONLINE    ONLINE    racnode1
ora....db2.srv application    ONLINE    ONLINE    racnode2
ora....SM1.asm application    ONLINE    ONLINE    racnode1
ora....E1.lsnr application    ONLINE    ONLINE    racnode1
ora....de1.gsd application    ONLINE    ONLINE    racnode1
ora....de1.ons application    ONLINE    ONLINE    racnode1
ora....de1.vip application    ONLINE    ONLINE    racnode1
ora....SM2.asm application    ONLINE    ONLINE    racnode2
ora....E2.lsnr application    ONLINE    ONLINE    racnode2
ora....de2.gsd application    ONLINE    ONLINE    racnode2
ora....de2.ons application    ONLINE    ONLINE    racnode2
ora....de2.vip application    ONLINE    ONLINE    racnode2

[root@racnode1 ~]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy



Move the OCR

#
# The new raw storage devices for OCR should be owned by the
# root user, must be in the oinstall group, and must have
# permissions set to 640. Provide at least 280MB of disk
# space for each OCR file and verify the raw storage devices
# can be seen from all nodes in the cluster.
#
[root@racnode1 ~]# ls -l /dev/raw/raw[12]
crw-r----- 1 root oinstall 162, 1 Oct  8 21:55 /dev/raw/raw1
crw-r----- 1 root oinstall 162, 2 Oct  8 21:55 /dev/raw/raw2

[root@racnode2 ~]# ls -l /dev/raw/raw[12]
crw-r----- 1 root oinstall 162, 1 Oct  8 21:54 /dev/raw/raw1
crw-r----- 1 root oinstall 162, 2 Oct  8 21:54 /dev/raw/raw2

#
# Use the dd command to zero out the devices and make sure
# no data is written to the raw devices.
#
[root@racnode1 ~]# dd if=/dev/zero of=/dev/raw/raw1
[root@racnode1 ~]# dd if=/dev/zero of=/dev/raw/raw2

#
# Verify CRS is running on node 1.
#
[root@racnode1 ~]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

#
# Verify CRS is running on node 2.
#
[root@racnode2 ~]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

#
# Query the current location and number of OCR files on
# the OCFS2 file system.
#
[root@racnode1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4676
         Available space (kbytes) :     257444
         ID                       : 1513888898
         Device/File Name         : /u02/oradata/racdb/OCRFile         <-- OCR (primary)
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/oradata/racdb/OCRFile_mirror  <-- OCR (mirror)
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

#
# Move OCR and OCR mirror to new storage location.
#
[root@racnode1 ~]# ocrconfig -replace ocr /dev/raw/raw1
[root@racnode1 ~]# ocrconfig -replace ocrmirror /dev/raw/raw2

#
# Verify OCR relocation from node 1.
#
[root@racnode1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4676
         Available space (kbytes) :     257444
         ID                       : 1513888898
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw2
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

#
# Verify OCR relocation from node 2.
#
[root@racnode2 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4676
         Available space (kbytes) :     257444
         ID                       : 1513888898
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw2
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

#
# Remove all deleted OCR files from the OCFS2 file system.
#
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile_mirror

Move the Voting Disk

#
# The new raw storage devices for the voting disks should be
# owned by the oracle user, must be in the oinstall group,
# and must have permissions set to 644. Provide at least
# 20MB of disk space for each voting disk and verify the raw
# storage devices can be seen from all nodes in the cluster.
#
[root@racnode1 ~]# ls -l /dev/raw/raw[345]
crw-r--r-- 1 oracle oinstall 162, 3 Oct  8 22:44 /dev/raw/raw3
crw-r--r-- 1 oracle oinstall 162, 4 Oct  8 22:45 /dev/raw/raw4
crw-r--r-- 1 oracle oinstall 162, 5 Oct  9 00:22 /dev/raw/raw5

[root@racnode2 ~]# ls -l /dev/raw/raw[345]
crw-r--r-- 1 oracle oinstall 162, 3 Oct  8 22:53 /dev/raw/raw3
crw-r--r-- 1 oracle oinstall 162, 4 Oct  8 22:54 /dev/raw/raw4
crw-r--r-- 1 oracle oinstall 162, 5 Oct  9 00:23 /dev/raw/raw5

#
# Use the dd command to zero out the devices and make sure
# no data is written to the raw devices.
#
[root@racnode1 ~]# dd if=/dev/zero of=/dev/raw/raw3
[root@racnode1 ~]# dd if=/dev/zero of=/dev/raw/raw4
[root@racnode1 ~]# dd if=/dev/zero of=/dev/raw/raw5

#
# Query the current location and number of voting disks on
# the OCFS2 file system. There needs to be at least two
# voting disks configured before attempting to perform the
# move.
#
[root@racnode1 ~]# crsctl query css votedisk
 0.     0    /u02/oradata/racdb/CSSFile
 1.     0    /u02/oradata/racdb/CSSFile_mirror1
 2.     0    /u02/oradata/racdb/CSSFile_mirror2

located 3 votedisk(s).

#
# Stop all application processes.
#
[root@racnode1 ~]# srvctl stop database -d racdb
[root@racnode1 ~]# srvctl stop asm -n racnode1
[root@racnode1 ~]# srvctl stop asm -n racnode2
[root@racnode1 ~]# srvctl stop nodeapps -n racnode1
[root@racnode1 ~]# srvctl stop nodeapps -n racnode2

#
# Verify all application processes are OFFLINE.
#
[root@racnode1 ~]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.racdb.db   application    OFFLINE   OFFLINE
ora....b1.inst application    OFFLINE   OFFLINE
ora....b2.inst application    OFFLINE   OFFLINE
ora....srvc.cs application    OFFLINE   OFFLINE
ora....db1.srv application    OFFLINE   OFFLINE
ora....db2.srv application    OFFLINE   OFFLINE
ora....SM1.asm application    OFFLINE   OFFLINE
ora....E1.lsnr application    OFFLINE   OFFLINE
ora....de1.gsd application    OFFLINE   OFFLINE
ora....de1.ons application    OFFLINE   OFFLINE
ora....de1.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   OFFLINE
ora....E2.lsnr application    OFFLINE   OFFLINE
ora....de2.gsd application    OFFLINE   OFFLINE
ora....de2.ons application    OFFLINE   OFFLINE
ora....de2.vip application    OFFLINE   OFFLINE

#
# Shut down CRS on node 1 and verify the CRS stack is not up.
#
[root@racnode1 ~]# crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

[root@racnode1 ~]# ps -ef | grep d.bin | grep -v grep

#
# Shut down CRS on node 2 and verify the CRS stack is not up.
#
[root@racnode2 ~]# crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

[root@racnode2 ~]# ps -ef | grep d.bin | grep -v grep

#
# Move all three voting disks to new storage location.
#
[root@racnode1 ~]# crsctl delete css votedisk /u02/oradata/racdb/CSSFile -force
successful deletion of votedisk /u02/oradata/racdb/CSSFile.

[root@racnode1 ~]# crsctl add css votedisk /dev/raw/raw3 -force
Now formatting voting disk: /dev/raw/raw3
successful addition of votedisk /dev/raw/raw3.

[root@racnode1 ~]# crsctl delete css votedisk /u02/oradata/racdb/CSSFile_mirror1 -force
successful deletion of votedisk /u02/oradata/racdb/CSSFile_mirror1.

[root@racnode1 ~]# crsctl add css votedisk /dev/raw/raw4 -force
Now formatting voting disk: /dev/raw/raw4
successful addition of votedisk /dev/raw/raw4.

[root@racnode1 ~]# crsctl delete css votedisk /u02/oradata/racdb/CSSFile_mirror2 -force
successful deletion of votedisk /u02/oradata/racdb/CSSFile_mirror2.

[root@racnode1 ~]# crsctl add css votedisk /dev/raw/raw5 -force
Now formatting voting disk: /dev/raw/raw5
successful addition of votedisk /dev/raw/raw5.

#
# Verify voting disk(s) relocation from node 1.
#
[root@racnode1 ~]# crsctl query css votedisk
 0.     0    /dev/raw/raw3
 1.     0    /dev/raw/raw4
 2.     0    /dev/raw/raw5

located 3 votedisk(s).

#
# Verify voting disk(s) relocation from node 2.
#
[root@racnode2 ~]# crsctl query css votedisk
 0.     0    /dev/raw/raw3
 1.     0    /dev/raw/raw4
 2.     0    /dev/raw/raw5

located 3 votedisk(s).

#
# Remove all deleted voting disk files from the OCFS2 file system.
#
[root@racnode1 ~]# rm /u02/oradata/racdb/CSSFile
[root@racnode1 ~]# rm /u02/oradata/racdb/CSSFile_mirror1
[root@racnode1 ~]# rm /u02/oradata/racdb/CSSFile_mirror2

#
# With all voting disks now located on raw storage devices,
# restart CRS on all Oracle RAC nodes.
#
[root@racnode1 ~]# crsctl start crs
[root@racnode2 ~]# crsctl start crs
