Chapter 7. Oracle 10gR2 Clusterware

7.1. Installing Oracle 10gR2 Clusterware (formerly 10gR1 CRS)

Although this is documented in the Oracle installation manuals, in Metalink notes, and elsewhere, it is consolidated here so that this manual can be used as the main reference for a successful installation. A good supplementary Oracle article on RAC installations can be found here:

http://www.oracle.com/technology/pub/articles/smiley_rac10g_install.html

7.1.1. RHEL Preparation

All four RAC nodes need to be up and running, and in the CS4 cluster. All GFS volumes that will be used for this Oracle install should be mounted on all four nodes. At a minimum, the GFS volume (/mnt/ohome) that will contain the shared installation must be mounted:

Filesystem           		1K-blocks      Used Available Use% Mounted on
/dev/mapper/redo1-log1            4062624        20   4062604   1% /mnt/log1
/dev/mapper/redo2-log2            4062368        20   4062348   1% /mnt/log2
/dev/mapper/redo3-log3            4062624        20   4062604   1% /mnt/log3
/dev/mapper/redo4-log4            4062368        20   4062348   1% /mnt/log4
/dev/mapper/common-ohome          6159232        20   6159212   1% /mnt/ohome
/dev/mapper/oradata-datafiles    50193856        40  50193816   1% /mnt/datafiles
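
To confirm that the same GFS volumes are mounted on all four RAC nodes, and not just the local one, a quick loop such as the following can be run from rac1 (a sketch only; it uses the node names from this example, and each ssh will prompt for a password until the ssh equivalence described in Section 7.1.1.4 is in place):

rac1 $ for n in rac1 rac2 rac3 rac4; do echo "== $n =="; ssh $n df -h /mnt/ohome /mnt/datafiles; done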

7.1.1.1. Map the shared raw partitions to RHEL rawdevices

The certified version of Oracle 10g on GFS requires that the two clusterware files be located on shared raw partitions and be visible to all RAC nodes in the cluster. The GULM lock server nodes do not need access to these files. These partitions are usually located on a small LUN that is not used for other purposes.

The LUN /dev/sda should be large enough to create two 256MB partitions. Using the fdisk command on /dev/sda, create two primary partitions:

rac1 # fdisk /dev/sda
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content will not be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/sda: 536 MB, 536870912 bytes
17 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 1037 * 512 = 530944 bytes

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1011, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1011, default 1011): +256M

Command (m for help): p

Disk /dev/sda: 536 MB, 536870912 bytes
17 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 1037 * 512 = 530944 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         483      250405   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (484-1011, default 484): 
Using default value 484
Last cylinder or +size or +sizeM or +sizeK (484-1011, default 1011): 
Using default value 1011

Command (m for help): p

Disk /dev/sda: 536 MB, 536870912 bytes
17 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 1037 * 512 = 530944 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         483      250405   83  Linux
/dev/sda2             484        1011      273768   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

If the other nodes were already up and running while you created these partitions, they must re-read the partition table from disk (blockdev --rereadpt /dev/sda).
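
For example, assuming root can ssh from rac1 to the other nodes (otherwise simply run the command locally on each node), a small loop takes care of this:

rac1 # for n in rac2 rac3 rac4; do ssh $n blockdev --rereadpt /dev/sda; done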

Make sure the rawdevices service is enabled on all four RAC nodes for the run level that will be used. This example enables it for run levels 3 and 5. Run:

rac1 # chkconfig --level 35 rawdevices on

The mapping is defined in the file /etc/sysconfig/rawdevices:

# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw1 /dev/sda1
/dev/raw/raw2 /dev/sda2
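
Once this file is in place on each node, the service can be restarted and the bindings verified. The output shown below is what would be expected for the two /dev/sda partitions (sda uses major number 8):

rac1 # service rawdevices restart
rac1 # raw -qa
/dev/raw/raw1:  bound to major 8, minor 1
/dev/raw/raw2:  bound to major 8, minor 2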

These raw device files must always be owned by the user that installs the Oracle software (oracle in this example). A 10 second delay is needed to ensure that the rawdevices service has a chance to configure the /dev/raw directory. Add these lines to the /etc/rc.local file. This file is symbolically linked to /etc/rc?.d/S99local.

echo "Sleep a bit first and then set the permissions on raw"
sleep 10
chown oracle:dba /dev/raw/raw?

Note

If, after you install Clusterware, you see a set of three /tmp/crsctl.<pid> trace files, then Clusterware did not start, and these files will contain an error message, usually complaining about permissions. Make sure the /dev/raw/raw? files are owned by the oracle user (in this example, oracle:dba).
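
A quick way to check this after a reboot (a simple sanity check, not part of the Oracle procedure):

rac1 $ ls -l /dev/raw/raw?    # both devices should show oracle dba as owner and group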

7.1.1.2. Configure /etc/sysctl.conf

All four RAC nodes should have the same settings.

#
#	Oracle specific settings
#  	x86 Huge Pages are 2MB
#
#vm.hugetlb_pool                = 3000
#
kernel.shmmax                   = 4047483648
kernel.shmmni                   = 4096
kernel.shmall                   = 1051168
kernel.sem                      = 250 32000 100 128
net.ipv4.ip_local_port_range    = 1024 65000
fs.file-max                     = 65536
#
# This is for Oracle RAC core GCS services
#
net.core.rmem_default           = 1048576
net.core.rmem_max               = 1048576
net.core.wmem_default           = 1048576
net.core.wmem_max               = 1048576

The parameter that most often needs to be modified to support larger SGAs is the shared memory setting, kernel.shmmax. Typically, 75% of the memory in a node should be allocated to the SGA. This assumes a modest number of Oracle foreground processes, which consume physical memory when allocating their PGA (Oracle Process Global Area). The PGA is typically used for sorting. On a 4GB system, a 3GB SGA is recommended. The amount of memory consumed by the SGA and the PGA is highly workload-dependent.
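
As a worked example of that sizing (a sketch only; the exact value is site-specific), a 3GB SGA needs kernel.shmmax of at least 3 * 1024 * 1024 * 1024 = 3221225472 bytes. The value can be tested at run time before committing it to /etc/sysctl.conf:

rac1 # sysctl -w kernel.shmmax=3221225472
rac1 # sysctl -p        # re-applies all of the settings in /etc/sysctl.conf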

Note

The maximum size of the SGA on the 64-bit version of RHEL4 is currently slightly less than 128GB. The maximum size of the SGA on the 32-bit version of RHEL4 varies. The standard limit is 1.7GB. If the oracle binary is lower-mapped, this maximum can be increased to 2.5GB on SMP kernels and 3.7GB on hugemem kernels. Lower mapping is an Oracle-approved linking technique that changes the address where the SGA attaches in the user address space; when it is lowered, more space is available for attaching a larger shared memory segment. See Metalink Doc 260152.1.

Another strategy for extending the SGA to 8GB and higher in a 32-bit environment is through the use of the /dev/shm filesystem, although this is not recommended. If you need this much SGA, then using the 64-bit version of Oracle and RHEL4 is a better strategy.

The net.core.* parameters establish the UDP buffers that will be used by the Oracle Global Cache Services (GCS) for heartbeats and inter-node communication (including the movement of Oracle buffers). For large SGAs (more than 16GB), the use of HugeTLBs is recommended.

Tip

TLBs, or Translation Lookaside Buffers, are the working end of a Page Table Entry (PTE). The hardware works with physical addresses, whereas processes running in user mode (including the SGA) work only with process virtual addresses. These addresses have to be translated, and modern CPUs provide TLB register space so that, during memory loads, the translation does not cause extra memory references.

By default, a page on x86 hardware is 4K. When configuring a large SGA (16GB or more), just mapping the SGA into a user process's address space requires roughly 4,000,000 4K PTEs (and the TLB slots that cache them). HugeTLBs are a mechanism in RHEL that permits the use of 2MB hardware pages, which dramatically reduces the number of PTEs required to map the SGA. The performance improvement grows with the size of the SGA and is typically between 10-30%.
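
As a rough illustration of the arithmetic (assuming 2MB huge pages, as noted above): a 16GB SGA mapped with 4K pages needs 16GB / 4K = 4,194,304 PTEs, whereas with 2MB huge pages it needs only 16GB / 2MB = 8,192. The commented vm.hugetlb_pool entry in the /etc/sysctl.conf example above is sized in megabytes, so a pool for a 16GB SGA would be configured roughly as follows (a sketch; size the pool to at least the SGA):

#
# vm.hugetlb_pool is in MB; 16384 MB provides 8192 x 2MB huge pages
#
vm.hugetlb_pool = 16384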

During RHEL installation, 4GB of swap was set up and the Oracle Installer will check for this minimum.

7.1.1.3. Create the oracle user

You have to create a user (typically oracle or oinstall). The user name is somewhat arbitrary, but the DBAs might insist that it be one of these two. However, the group must be dba. Configure the /etc/sudoers file so that oracle admin users can safely execute root commands, which is required during and after the install:

# User alias specification
User_Alias      SYSAD=oracle, oinstall
User_Alias      USERADM=oracle, oinstall

# User privilege specification
SYSAD   ALL=(ALL)       ALL
USERADM ALL=(root)      NOPASSWD:/usr/local/etc/yanis.client
root    ALL=(ALL) ALL
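
Edit /etc/sudoers only with visudo, which validates the syntax before the file is saved; the oracle user can then confirm the result (a quick check, not required by the installer):

rac1 # visudo         # run as root
rac1 $ sudo -l        # run as oracle; lists the commands this user may run via sudo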

7.1.1.4. Create a clean ssh connection environment

You have to ensure that whenever Clusterware talks to other nodes in the cluster, the ssh commands proceed unimpeded and without extraneous session dialog. To verify that all connection pathways are set up, run:

rac1 $ ssh rac2 date
Wed May 10 21:48:02 PDT 2006

The command must not return any extra strings or prompts, such as:

rac1 $ ssh rac2 date
oracle@rac2's password:
OR
The authenticity of host 'rac2 (192.168.1.151)' can't be established.
RSA key fingerprint is 48:e5:e0:84:63:62:03:84:c7:57:05:6b:58:7d:12:07.
Are you sure you want to continue connecting (yes/no)?

Create a ~/.ssh/authorized_keys file, distribute it to all four nodes, and then execute ssh hostname date against every host in the RAC cluster, in all combinations over both the primary and heartbeat interfaces. If you miss any one of them, the Oracle Clusterware installer will fail at the node verification step.

On rac1, log in as the oracle user and make sure $HOME/.ssh is empty. Do not supply a passphrase for the keygen command; just press Return. Run:

rac1 $ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_dsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/oracle/.ssh/id_dsa.
Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
The key fingerprint is:
9e:98:88:5c:17:bc:1f:dc:05:33:21:cf:04:99:23:e1 oracle@rac1

Repeat this step on all four RAC nodes (it is not required on the GULM lock servers), collect all four ~/.ssh/id_dsa.pub files (including rac1's own) into a single ~/.ssh/authorized_keys file on rac1, and distribute it to the other three nodes:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh rac2 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh rac3 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh rac4 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

scp ~/.ssh/authorized_keys rac2:~/.ssh
scp ~/.ssh/authorized_keys rac3:~/.ssh
scp ~/.ssh/authorized_keys rac4:~/.ssh

Run all combinations from all nodes for both the PUBLIC and PRIVATE networks, including the node where you are currently executing (a loop version is sketched after the listing):

rac1 $ ssh rac1 date
rac1 $ ssh rac1-priv date
rac1 $ ssh rac2 date
rac1 $ ssh rac2-priv date
rac1 $ ssh rac3 date
rac1 $ ssh rac3-priv date
rac1 $ ssh rac4 date
rac1 $ ssh rac4-priv date
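
The same matrix can be driven by a small loop, run as the oracle user on each of the four nodes in turn (a sketch assuming the host names above; answer yes to the one-time host key prompts so that ~/.ssh/known_hosts is populated):

for n in rac1 rac2 rac3 rac4; do
    ssh $n date
    ssh ${n}-priv date
done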


7.1.1.5. Download Oracle Installers

Download the Clusterware and Database installation materials from OTN (Oracle Technology Network), as this is where the current base releases for all platforms are located. These are gzip-compressed cpio archives. Create a local installer directory on node1 (/home/oracle/inst) and then expand the archives:

gunzip -c 10201_clusterware_linux_x86_64.cpio.gz | cpio -ivdm &>log1 &

gunzip -c 10201_database_linux_x86_64.cpio.gz | cpio -ivdm &>log2 &
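
When both background jobs have finished (check log1 and log2 for cpio errors), the extracted installer trees should be present under /home/oracle/inst; the clusterware directory is the one referenced in the remainder of this chapter, and the database archive unpacks alongside it:

rac1 $ jobs                      # both cpio jobs should have completed
rac1 $ ls /home/oracle/inst      # the clusterware and database trees should now be present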

The installer can be run from any filesystem mounted on node1.

7.1.1.6. Create shared home directories

The Clusterware can be installed locally on each node, or on the shared home. This is a production maintenance decision. A single shared Clusterware home is clearly less complex, but requires the entire cluster to shut down when you do a Clusterware upgrade. Node-local Clusterware gives you the ability to do rolling upgrades, but with some added maintenance cost. This sample cluster will perform a single shared Clusterware install, so the directories should be created and owned by the oracle user prior to running the installer:

rac1 $ sudo mkdir /mnt/ohome/oracle
rac1 $ sudo chown oracle:dba /mnt/ohome/oracle
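
A quick check that the directory is owned correctly and writable by the oracle user before the installer is launched (a simple sanity check, not part of the Oracle procedure):

rac1 $ ls -ld /mnt/ohome/oracle          # should show oracle dba as owner and group
rac1 $ touch /mnt/ohome/oracle/.w_test && rm /mnt/ohome/oracle/.w_test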

7.1.1.7. Verify X11 connectivity

In this example, the remote host where the X11 windows will appear is called adminws. For X11, xhost + must be executed on adminws from any session running on that system. A shell window on adminws will log in to rac1, and that session must have the DISPLAY environment variable set either upon login or in a profile:

rac1 $ export DISPLAY=adminws:0.0

Run xclock to make sure that the X11 clock program appears on the adminws desktop.

Although you can have ORACLE_BASE and ORACLE_HOME pre-set in the oracle user profile prior to running the installer, it is not mandatory. In this case, they point to the shared Oracle home location on the 6GB GFS volume. The installer will detect these values if they are set:

export ORACLE_BASE=/mnt/ohome/oracle/1010
export ORACLE_HOME=/mnt/ohome/oracle/1010/product/db

7.1.1.8. Clusterware rootpre.sh

The script /home/oracle/inst/clusterware/rootpre/rootpre.sh checks to see if a previous version of Clusterware has been installed. Once this script executes successfully, it is safe to start the Clusterware installer:

/home/oracle/inst/clusterware/runInstaller

********************************************************************************

Please run the script rootpre.sh as root on all machines/nodes. The script can be found at the top level of the local installer directory. Once you have run the script, please type Y to proceed

Answer 'y' if root has run 'rootpre.sh' so you can proceed with Oracle Clusterware installation.
Answer 'n' to abort installation and then ask root to run 'rootpre.sh'.

********************************************************************************

Has 'rootpre.sh' been run by root? [y/n] (n)
y
Starting Oracle Universal Installer...
Oracle Universal Installer: Welcome window

Figure 7.1. Oracle Universal Installer: Welcome window

Oracle Universal Installer: Specify Inventory Directory window

Figure 7.2. Oracle Universal Installer: Specify Inventory Directory window

Verify that $ORACLE_BASE/oraInventory is located on the shared GFS volume (/mnt/ohome). If you want an inventory on each node for CRS or the RDBMS, you would need to type in a node-local directory (/opt/oracle/1010/oraInventory), but you have to ensure the directory is created and owned by the oracle user before you click Next.
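
For that node-local alternative, the directory from the example path above would have to be created and given to the oracle user on every RAC node before clicking Next; for example:

rac1 $ sudo mkdir -p /opt/oracle/1010/oraInventory
rac1 $ sudo chown oracle:dba /opt/oracle/1010/oraInventory
# repeat on rac2, rac3 and rac4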

Oracle Universal Installer: Specify Home Details window

Figure 7.3. Oracle Universal Installer: Specify Home Details window

This screen's default path will need to be changed, as it defaults to placing the CRS home (CRSHOME) inside ORACLE_HOME. This install is a single, shared CRS install, so the path is on the shared GFS volume. The name was simplified to just crs. Click Next.

The prerequisite checks now run; since the preparation work in /etc/sysctl.conf has already been done, no errors or warnings are expected.

Oracle Universal Installer: Prerequisite Checks window

Figure 7.4. Oracle Universal Installer: Prerequisite Checks window

Click Next.

Oracle Universal Installer: Specify Cluster Configuration window

Figure 7.5. Oracle Universal Installer: Specify Cluster Configuration window

Click Next.

Next, the other three nodes need to be added to the cluster configuration. All of these hosts must be defined in /etc/hosts on all nodes.

Modify a Node dialog

Figure 7.6. Modify a Node dialog

Click OK.

The completed configuration screen should contain all four nodes.

Oracle Universal Installer: Specify Cluster Configuration window

Figure 7.7. Oracle Universal Installer: Specify Cluster Configuration window

Click Next.

This is the step that fails if any part of the ssh hostname date setup was not performed correctly.

If /etc/hosts, ~/.ssh/authorized_keys, and ~/.ssh/known_hosts are all properly set up, the installer should proceed to the next screen. Fully qualified hostnames can sometimes cause confusion, so the public network hostnames entered into the Clusterware installer must match the string returned by the hostname command. If the installer does not proceed, go back and verify the entire matrix of ssh hostname date calls to make sure all these paths are clean; the self-referential ones (ssh rac1 date from rac1 itself) are often missed.
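
A quick way to check the name matching ahead of time is to compare the names you plan to enter against what each node actually reports:

rac1 $ for n in rac1 rac2 rac3 rac4; do ssh $n hostname; done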

Oracle Universal Installer: Specify Network Interface Usage window

Figure 7.8. Oracle Universal Installer: Specify Network Interface Usage window

Edit the eth0 interface, change the interface type to Public, and click Next.

Edit Private Interconnect Type dialog

Figure 7.9. Edit Private Interconnect Type dialog

Click OK.

Oracle Universal Installer: Specify OCR Location window

Figure 7.10. Oracle Universal Installer: Specify OCR Location window

Click Next.

Assign the quorum voting and registry files. The option external redundancy is chosen as the files reside on a storage array that implements redundancy.

Oracle Universal Installer: Specify Voting Disk Location window

Figure 7.11. Oracle Universal Installer: Specify Voting Disk Location window

The quorum vote disk will be located on /dev/raw/raw2. Once again, external redundancy is chosen. Click Next.

Oracle Universal Installer: Summary window

Figure 7.12. Oracle Universal Installer: Summary window

The next screen is the Install Summary screen. Click Install.

The installer starts to install, link and copy. This process typically takes less than 10 minutes depending on the performance of the CPU and the filesystem.

Execute Configuration Scripts dialog

Figure 7.13. Execute Configuration Scripts dialog

This screen prompts for two sets of scripts to be run on all four nodes. Run the orainstRoot.sh script first on each node, in order.

rac1 $ sudo /mnt/ohome/oracle/1010/oraInventory/orainstRoot.sh
Password:
Changing permissions of /mnt/ohome/oracle/1010/oraInventory to 770.
Changing groupname of /mnt/ohome/oracle/1010/oraInventory to dba.
The execution of the script is complete

7.1.1.9. Instantiating Clusterware

The script /mnt/ohome/oracle/1010/product/crs/root.sh must be run on every node, one at a time, starting with rac1. You must wait until the script completes successfully on a given node before executing it on the next node. It can take several minutes per node, so be patient. The script configures RHEL to run the Oracle Clusterware services (including appending entries to /etc/inittab) and then starts them. Only the first execution of the script initializes the registry and quorum (voting) disk files.

rac1 $ sudo /mnt/ohome/oracle/1010/product/crs/root.sh
WARNING: directory '/mnt/ohome/oracle/1010/product' is not owned by root
WARNING: directory '/mnt/ohome/oracle/1010' is not owned by root
WARNING: directory '/mnt/ohome/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/mnt/ohome/oracle/1010/product' is not owned by root
WARNING: directory '/mnt/ohome/oracle/1010' is not owned by root
WARNING: directory '/mnt/ohome/oracle' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
node 3: rac3 rac3-priv rac3
node 4: rac4 rac4-priv rac4
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw2
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        rac1
CSS is inactive on these nodes.
        rac2
        rac3
        rac4
Local node checking complete.

Run /mnt/ohome/oracle/1010/product/crs/root.sh on the remaining nodes. As this script executes on the other nodes, the last few lines should change to indicate that more nodes are active. These last few lines are from the command crsctl check install:

CSS is active on these nodes.
        rac1
        rac2
CSS is inactive on these nodes.
        rac3
        rac4
Local node checking complete.

If successful, the completion of the script on the fourth node should indicate that CSS is running on all nodes:

CSS is active on these nodes.
        rac1
        rac2
        rac3
        rac4
CSS is active on all nodes.
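
The stack can also be checked directly with crsctl from any of the nodes; on a healthy 10gR2 installation the output resembles the following:

rac1 $ /mnt/ohome/oracle/1010/product/crs/bin/crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy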

Return to the main installer screen and click OK. Most of the verification and installation checks should pass.

Oracle Universal Installer: Configuration Assistants window

Figure 7.14. Oracle Universal Installer: Configuration Assistants window

Warning dialog

Figure 7.15. Warning dialog

If not, or if this pop-up occurs, then it is likely that the CRS application registration failed to start up. This is usually because the tool could not be found in the path, but it can be fixed by running the vipca utility from rac1 once you quit the installer. Click OK in the pop-up and Next on the Configuration Assistants screen.

Oracle Universal Installer: End of Installation window

Figure 7.16. Oracle Universal Installer: End of Installation window

The crs_stat command will display any registered CRS resources. There are currently none, so the vipca utility will need to be executed next.

rac1 $ crs_stat -t
CRS-0202: No resources are registered.

7.1.1.10. Registering Clusterware resources with VIPCA

The environment variable ORA_CRS_HOME should be added to the oracle user profile, and vipca must be run as root.

rac1 $ export ORA_CRS_HOME=/mnt/ohome/oracle/1010/product/crs
rac1 $ sudo $ORA_CRS_HOME/bin/vipca
VIP Configuration Assistant: Welcome window

Figure 7.17. VIP Configuration Assistant: Welcome window

Click Next on this window and the next one. Then the hostnames mapping window appears:

VIP Configuration Assistant: Virtual IPs for Cluster Nodes window

Figure 7.18. VIP Configuration Assistant: Virtual IPs for Cluster Nodes window

Fill in the first IP Alias name and press Tab. The tool should fill in the rest.

VIP Configuration Assistant: Virtual IPs for Cluster Nodes window

Figure 7.19. VIP Configuration Assistant: Virtual IPs for Cluster Nodes window

Click Next and a summary screen appears. Click OK.

VIP Configuration Assistant: Progress dialog

Figure 7.20. VIP Configuration Assistant: Progress dialog

The final window should be:

Configuration Results window

Figure 7.21. Configuration Results window

Click Exit and then rerun the status command.

rac1 $ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.rac1.gsd   application    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    ONLINE    ONLINE    rac1        
ora.rac1.vip   application    ONLINE    ONLINE    rac1        
ora.rac2.gsd   application    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    ONLINE    ONLINE    rac2        
ora.rac2.vip   application    ONLINE    ONLINE    rac2        
ora.rac3.gsd   application    ONLINE    ONLINE    rac3        
ora.rac3.ons   application    ONLINE    ONLINE    rac3        
ora.rac3.vip   application    ONLINE    ONLINE    rac3        
ora.rac4.gsd   application    ONLINE    ONLINE    rac4        
ora.rac4.ons   application    ONLINE    ONLINE    rac4        
ora.rac4.vip   application    ONLINE    ONLINE    rac4

Note: This documentation is provided {and copyrighted} by Red Hat®, Inc. and is released via the Open Publication License. The copyright holder has added the further requirement that Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder. The CentOS project redistributes these original works (in their unmodified form) as a reference for CentOS-4 because CentOS-4 is built from publicly available, open source SRPMS. The documentation is unmodified to be compliant with upstream distribution policy. Neither CentOS-4 nor the CentOS Project are in any way affiliated with or sponsored by Red Hat®, Inc.