Monday, April 11, 2011

Oracle RAC Overview

AGENDA
  • What is Real Application Clusters (RAC)?
  • Why use RAC?
  • Single instance vs RAC
  • RAC Architecture
  • Sharing of Resources
  • Background processes
  • Internal structures and services
  • RAC Administration
  • ORACLE Clusterware
  • RAC Mechanism
  • Failover
  • Load Balancing
WHAT IS RAC?
Real Application Clusters (RAC), introduced in Oracle 9i, is a technology that enables a single database to be served by multiple instances running simultaneously on different nodes.

A RAC database requires three components:
  • cluster nodes
  • Shared storage
  • Oracle Clusterware
To determine whether an instance is part of a RAC database, either of the following can be used (see the sketch after this list):
  • Use the DBMS_UTILITY.IS_CLUSTER_DATABASE function
  • Show parameter CLUSTER_DATABASE
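A minimal SQL*Plus sketch of both checks; the PL/SQL block simply prints which mode the instance is running in:

        set serveroutput on
        begin
          if dbms_utility.is_cluster_database then
            dbms_output.put_line('Running as part of a RAC database');
          else
            dbms_output.put_line('Running as a single instance');
          end if;
        end;
        /
        show parameter cluster_database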
WHY USE RAC?
High Availability
Failover
Reliability
Scalability
 Manageability
Recoverability
Transparency
Row Locking
Error Detection
Buffer Cache Management
Continuous Operations
Load Balancing/Sharing
Reduction in total cost of ownership


SINGLE INSTANCE vs RAC




 RAC ARCHITECTURE:






Public Network: The network carrying the public IP addresses on which the listeners listen; clients contact the listeners on these public IPs.

Private Network (Interconnect): A network path used exclusively for inter-instance communication, dedicated to the server nodes of the cluster. It is used for the synchronization of resources and, in some cases, for the transfer of data. It requires high bandwidth and low latency.

Storage Network: The network that connects the instances to the shared database storage in RAC.


WHAT’S SHARED, WHAT’S NOT 
SHARED
Disk access
Resources that manage data
All instances access common data files and control files
NOT SHARED
1)  Each node has its own dedicated:
      • System memory
      • Operating system
      • Database instance
      • Application software
2)  Each instance has its own individual
      • Redo log files and
      • Rollback segments


 BACKGROUND PROCESSES:






1. Global Cache Service Processes (LMSn)
LMSn handles block transfers between the holding instance's buffer cache and the foreground process on the requesting instance.
LMS maintains read consistency by rolling back any uncommitted transactions for blocks that are requested by a remote instance.
The value of 'n' (0-9) varies with the amount of messaging traffic among the nodes in the cluster; by default, one LMS process is started per pair of CPUs.

2. Global Enqueue Service Monitor (LMON)
It handles the reconfiguration of locks and global resources when a node joins or leaves the cluster. Its services are also known as Cluster Group Services (CGS).

3. Global Enqueue Service Daemon (LMD)
It manages lock manager service requests for GCS resources and sends them to a service queue to be handled by the LMSn process. The LMD process also handles global deadlock detection and remote resource requests (remote resource requests are requests originating from another instance).

4. Lock Process (LCK)
LCK manages non-cache fusion resource requests such as library and row cache requests and lock requests that are local to the server. Because the LMS process handles the primary function of lock management, only a single LCK process exists in each instance.

5. Diagnosability Daemon (DIAG)
This background process monitors the health of the instance and captures diagnostic data about process failures within instances. The operation of this daemon is automated and updates an alert log file to record the activity that it performs.

NOTE: In a RAC environment, each instance maintains its own alert log.


6. Global Service Daemon (GSD)
This is a component in RAC that receives requests from the SRVCTL control utility to execute administrative tasks like startup or shutdown. The command is executed locally on each node and the results are returned to SRVCTL. The GSD is installed on the nodes by default.
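To see which of these background processes are currently running on an instance, a query against V$BGPROCESS (a sketch) can be used:

        select name, description
        from   v$bgprocess
        where  paddr <> '00'
        order  by name;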

INTERNAL STRUCTURES AND SERVICES

Global Resource Directory (GRD)
Records current state and owner of each resource
Contains convert and write queues
Distributed across all instances in cluster
Maintained by GCS and GES

Global Cache Services (GCS)
Implements cache coherency for the database
Coordinates access to database blocks for instances

Global Enqueue Services (GES)
Controls access to other resources (locks) including library cache and dictionary cache
Performs deadlock detection

RAC ADMINISTRATION:
      
Parameter File
  • Using a shared SPFILE is recommended, although each instance can have its own dedicated parameter file as well.
  • The SPFILE should be placed on a shareable disk subsystem such as a raw device, a clustered file system, or Automatic Storage Management (ASM).
  • Oracle instance parameters for the RAC environment can be grouped into three major categories: unique, identical, and instance-specific parameters.
  • If a parameter appears more than once in a parameter file, the last specified value takes effect, unless the values appear on consecutive lines, in which case the values from the consecutive lines are concatenated.
  • To set the value for a parameter, the following syntax is used. An asterisk (*) or no value in place of <instance_name> means that the parameter value is valid for all instances.
          <instance_name>.<parameter_name>=<parameter_value>
          *.undo_management=auto
  • To set a parameter in the SPFILE, the following command can be used.
          alter system set <parameter>=<value> scope=<memory|spfile|both> comment='<comments>' deferred sid='<sid|*>';
  • When DBCA is used to create a RAC database, by default it creates an SPFILE on the shared disk subsystem used; otherwise one can be created manually. (A sample excerpt follows.)
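For illustration, a minimal SPFILE excerpt for a hypothetical two-instance database named prod (the instance names prod1 and prod2 match the examples used later in this post):

          *.cluster_database=true
          *.db_name=prod
          *.undo_management=auto
          prod1.instance_number=1
          prod2.instance_number=2
          prod1.thread=1
          prod2.thread=2
          prod1.undo_tablespace=undo_tbs1
          prod2.undo_tablespace=undo_tbs2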

Undo Management
  • Oracle stores the original values of data, called the before image, in undo segments to provide read consistency and to roll back uncommitted transactions.
  • In the Oracle RAC environment, each instance stores transaction undo data in its dedicated undo tablespace. For this, undo_tablespace must be set for each instance as follows.
          prod1.undo_tablespace=undo_tbs1
          prod2.undo_tablespace=undo_tbs2
  • Instances in RAC can use either automatic or manual undo management, but the undo_management parameter has to be the same across all the instances.
To increase the size of an undo tablespace, either of the following can be used (see the example after this list).
        1. Add another datafile to the undo tablespace.
        2. Increase the size of the existing datafile(s) belonging to the undo tablespace.
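A sketch of both options (the file names and sizes are hypothetical):

        alter tablespace undo_tbs1 add datafile '/u01/oradata/prod/undotbs1_02.dbf' size 500m;
        alter database datafile '/u01/oradata/prod/undotbs1_01.dbf' resize 2g;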

While using manual undo management, the following considerations apply.

1. Use manual undo management only if you have a very good reason for not using automatic undo management.
2. Do not create other objects, such as tables and indexes, in the tablespace used for rollback segments.
3. Create one rollback segment for every four concurrent transactions.

Temporary Tablespace 
  • In a RAC environment, all instances share the same temporary tablespace, with each instance creating a temporary segment in the tablespace it is using. The size should be at least equal to the maximum concurrent requirement of all the instances.
  • The default temporary tablespace cannot be dropped or taken offline; however, it can be changed, after which the original default temporary tablespace can be dropped or taken offline.
  • To determine the temporary tablespace usage of each instance, the GV$SORT_SEGMENT and GV$TEMPSEG_USAGE views can be queried on an INST_ID basis (see the query sketch after this list).
  • In a temporary tablespace group, a user will always use the same assigned temporary tablespace irrespective of the instance being used. An instance can reclaim the space used by other instances' temporary segments in that tablespace if required for large sorts.
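A sketch of the per-instance usage queries mentioned above:

        select inst_id, tablespace_name, total_blocks, used_blocks, free_blocks
        from   gv$sort_segment;

        select inst_id, tablespace, segtype, blocks
        from   gv$tempseg_usage;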
Online Redo Logs
  • In a RAC environment, each instance has its own set of online redo log files, called a thread.
  • The size of an instance's online redo logs is independent of other instances' redo log sizes and is determined by the local instance's workload and backup and recovery considerations.
  • Each instance has exclusive write access to its own online redo log files; however, it can read another instance's current online redo log file to perform instance recovery if needed. Thus, online redo logs need to be located on a shared storage device and cannot be on a local disk.
  • As in a single-instance database, the V$LOG and V$LOGFILE views can be used in RAC to get information about redo logs, as shown below.
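For example, adding a log file group to thread 2 and checking the result (the group number and size are hypothetical, and Oracle-managed files are assumed; otherwise a file name must be specified):

        alter database add logfile thread 2 group 5 size 100m;
        select thread#, group#, bytes, status from v$log order by thread#, group#;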
 Archive Logs

Steps to enable or disable archive log mode
1.Set cluster_database=false for the instance:
            alter system set cluster_database=false scope=spfile sid= 'prod1';
2. Shut down all the instances accessing the database:
           srvctl stop database -d prod
3. Mount the database using the local instance:
          startup mount
4. Enable archiving:
         alter database archivelog;    -- or: alter database noarchivelog;
5. Change the parameter cluster_database=true for the instance prod1:
        alter system set cluster_database=true scope=spfile sid='prod1';
6. Shut down the local instance:
        shutdown;
7. Bring up all the instances:
         srvctl start database -d prod
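After the restart, the log mode can be confirmed from any instance, for example:

         archive log list
         select log_mode from v$database;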

Flashback Area (Enable / Disable )
1. Set cluster_database=false for the instance to perform this operation.
              alter system set cluster_database=false scope=spfile sid= 'prod1';
2. Set DB_RECOVERY_FILE_DEST_SIZE and DB_RECOVERY_FILE_DEST. The DB_RECOVERY_FILE_DEST parameter should point to a shareable disk subsystem.
            alter system set db_recovery_file_dest_size=200m scope=spfile;
            alter system set db_recovery_file_dest='/ocfs2/flashback' scope=spfile;
3. Shut down all instances accessing the database:
            srvctl stop database -d prod
4. Mount the database using the local instance:
           startup mount
5. Enable the flashback by issuing the following command:
           alter database flashback on;    -- or: alter database flashback off;
6. Change back the parameter to cluster_database=true for the instance prod1:
           alter system set cluster_database=true scope=spfile sid='prod1';
7. Shut down the instance:
           shutdown;
8. Start all the instances:
          srvctl start database -d prod
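The flashback status can then be verified with, for example:

          select flashback_on from v$database;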

SRVCTL Utility

  • To start all instances associated with a database the following command can be executed from any of the nodes. The command also starts listeners on each node if not already running.
         srvctl start database -d <DB Name>
  • Similarly, to shut down all instances associated with the database, the stop command can be used; it does not stop the listeners, as they might be serving other database instances running on the same machine.
              srvctl stop database -d <DB Name>
  • Options specified by -o are passed directly to SQL*Plus as command-line options for the start/stop commands.
             srvctl stop database -d <DB Name> -o immediate
            srvctl start database -d <DB Name> -o force


  • To perform an operation at the individual instance level, the -i option can be used.
           srvctl stop instance -d <DB Name> -i <instance name>
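The current status and configuration can be checked with, for example:

           srvctl status database -d prod
           srvctl config database -d prod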


ORACLE CLUSTERWARE

  • It is a cluster manager integrated with the Oracle database to handle the cluster, including node membership, group services, global resource management, and high availability functions. It can also be used with a single-instance (non-cluster) database.

  • Names as it has evolved -
                     1. Oracle Cluster Management Services (OCMS) - 9.0.1 and 9.2
                     2. Cluster Ready Services (CRS) (generic, portable cluster manager) - 10.1
                     3. Oracle Clusterware (CRS renamed) - 10.2
  • Background processes -
            1. Cluster Synchronization Service (CSS) 
            2. Cluster Ready Services (CRS) 
            3. Event Manager (EVM) 

  • To administer Clusterware, the CRSCTL utility, located in $ORA_CRS_HOME/bin, can be used (see the sketch after this list).
  • Oracle Clusterware must be installed prior to installing the Oracle database software. The root user is needed during the installation process to perform various tasks requiring superuser privileges.
  • Being the first Oracle software to be installed on the system, it is susceptible to configuration errors, so it is recommended to run the Cluster Verification Utility (CLUVFY) on all nodes. CLUVFY was introduced in Oracle 10.2 but is backward compatible with 10.1.
  • Oracle Clusterware requires two files that must be located on shared storage for its operation.
             1. Oracle Cluster Registry (OCR)
             2. Voting Disk
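For example, basic health checks using CRSCTL and OCRCHECK (10.2 syntax):

             crsctl check crs
             crsctl query css votedisk
             ocrcheck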


Oracle Cluster Registry (OCR)
1) Located on shared storage and, in Oracle 10.2 and above, can be mirrored to a maximum of two copies.
2) Defines cluster resources including
              •  Databases and Instances ( RDBMS and ASM)
              •  Services and Node Applications (VIP, ONS, GSD)
              •  Listener Process

Voting Disk (Quorum Disk / File in Oracle 9i)
1)  Used to determine RAC instance membership and is located on shared storage accessible to all instances.
2)  Used to determine which instances take control of the cluster in case of node failure, to avoid split brain.
3)  In Oracle 10.2 and above, it can be mirrored, but only to an odd number of copies (1, 3, 5, etc.).

VIRTUAL IP (VIP)
  • To make applications highly available and to eliminate single points of failure (SPOF), Oracle 10g introduced a feature called cluster VIPs: virtual IP addresses, distinct from the cluster's standard IP addresses, that the outside world uses to connect to the database.
  • A VIP name and address must be registered in DNS along with the standard static IP information. Listeners are configured to listen on the VIPs instead of the public IPs.
  • When a node goes down, its VIP is automatically failed over to one of the other nodes. The node that takes over the VIP "re-ARPs" to the world, advertising the new MAC address for the VIP. Clients are sent an error message immediately rather than waiting for the TCP timeout.
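The node applications, including the VIP, can be inspected with srvctl (the node name is hypothetical):

           srvctl status nodeapps -n node1
           srvctl config nodeapps -n node1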
CACHE FUSION


1)  Underlying technology that enables RAC (introduced with 9i, with improved performance in 10g)
2)  Protocol that allows instances to combine their data caches into a shared global cache.
3)  Allows any node to get the most up-to-date data from the cache of any other node in the cluster without having to access the disk drives again.
4)  Needed when a dirty block of data is created:
  •  Data from disk is read into memory on a node
  • Data is updated on that node
  • Data hasn't been written to disk yet
  • Another node requests the data
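In 10g, Cache Fusion activity can be observed through the global cache ('gc') statistics, for example:

         select name, value
         from   v$sysstat
         where  name like 'gc%'
         order  by name;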
WHAT IS FAILOVER?
1)  If a node in the shared disk cluster fails, the system dynamically redistributes the workload among the surviving cluster nodes.
2)  RAC performs checks to detect node and network failures. A disk-based heartbeat mechanism uses the control file to monitor node membership, and the cluster interconnect is regularly checked to verify correct operation.
 
                         


3)  Enhanced failover reliability in 10g with the use of Virtual IP addresses (VIPs)
4)  If one node or instance fails, the node detecting the failure does the following.
             •  Read redo log of failed instance from last checkpoint
             •  Apply redo to datafiles including undo segments (roll forward)
             •  Rollback uncommitted transactions
             •  Cluster is frozen during part of this process


FAST APPLICATION NOTIFICATION (FAN)

1)  FAN is a method, introduced in Oracle 10.1, by which applications can be informed of changes in cluster status for -
       •  Fast node failure detection
       •  Workload balancing

2)  Advantageous by preventing applications from -
        •  Waiting for TCP/IP timeouts when a node fails
        •  Trying to connect to a database service that is currently down
        •  Processing data received from a failed node

3)  Can be notified using -
         •  Server-side callouts (see the sketch after this list)
         •  Fast Connection Failover (FCF)
         •  ONS API
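As an illustration of a server-side callout, a trivial shell script placed in $ORA_CRS_HOME/racg/usrco/ is executed each time a FAN event is posted (the script contents and log path are hypothetical):

         #!/bin/sh
         # Hypothetical FAN callout: append each event's arguments to a log file.
         echo "`date` $*" >> /tmp/fan_events.log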

ORACLE NOTIFICATION SERVICE (ONS)

1)  ONS, introduced in Oracle 10.1, is a publish/subscribe service used by Oracle Clusterware to propagate messages to:
             •  Nodes in cluster
             •  Middle-tier application servers
             •  Clients

2)  ONS is automatically launched for RAC on each node as part of the Oracle Clusterware installation process. However, it can also be configured to run on nodes hosting client or mid-tier applications.

3)  It is an underlying mechanism for Fast Application Notification (FAN).

4)  ONS runs independently of Transparent Application Failover (TAF).

TRANSPARENT APPLICATION FAILOVER (TAF)

1)  TAF is a client-side feature that allows for clients to reconnect to surviving databases in the event of a failure of a database instance.
2)  Masks failures to end users; they don’t need to log back into the system
3)  Applications and users are transparently reconnected to another node
4)  Applications and queries continue uninterrupted
5)  In-flight queries can fail over and be replayed (with TYPE=SELECT)
6)  Login context maintained
7)  DML transactions are rolled back
8)  Requires configuration in TNSNAMES.ORA
       RAC_FAILOVER =
         (DESCRIPTION =
           (ADDRESS_LIST =
             (FAILOVER = ON)
             (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
             (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
           )
           (CONNECT_DATA =
             (SERVICE_NAME = RAC)
             (SERVER = DEDICATED)
             (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
           )
         )
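Whether sessions have actually failed over can be verified from GV$SESSION, for example:

        select inst_id, username, failover_type, failover_method, failed_over
        from   gv$session
        where  username is not null;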



                  THANK YOU


