Drop Down MenusCSS Drop Down MenuPure CSS Dropdown Menu

Monday 9 November 2015

Resolving Gaps in Data Guard Apply Using Incremental RMAN BAckup

Recently, we had a glitch on a Data Guard (physical standby database) on infrastructure. This is not a critical database; so the monitoring was relatively lax. And that being done by an outsourcer does not help it either. In any case, the laxness resulted in a failure remaining undetected for quite some time and it was eventually discovered only when the customer complained. This standby database is usually opened for read only access from time totime.This time, however, the customer saw that the data was significantly out of sync with primary and raised a red flag. Unfortunately, at this time it had become a rather political issue.


Since the DBA in charge couldn’t resolve the problem, I was called in. In this post, I will describe the issue and how it was resolved. In summary, there are two parts of the problem:

(1) What happened
(2) How to fix it

What Happened

Let’s look at the first question – what caused the standby to lag behind. First, I looked for the current SCN numbers of the primary and standby databases. On the primary:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447102

On the standby:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1301571

Clearly there is a difference. But this by itself does not indicate a problem; since the standby is expected to lag behind the primary (this is an asynchronous non-real time apply setup). The real question is how much it is lagging in the terms of wall clock. To know that I used the scn_to_timestamp function to translate the SCN to a timestamp:

SQL> select scn_to_timestamp(1447102) from dual;

SCN_TO_TIMESTAMP(1447102)
-------------------------------
18-DEC-09 08.54.28.000000000 AM

I ran the same query to know the timestamp associated with the SCN of the standby database as well (note, I ran it on the primary database, though; since it will fail in the standby in a mounted mode):

SQL> select scn_to_timestamp(1301571) from dual;

SCN_TO_TIMESTAMP(1301571)
-------------------------------
15-DEC-09 07.19.27.000000000 PM

This shows that the standby is two and half days lagging! The data at this point is not just stale; it must be rotten.

The next question is why it would be lagging so far back in the past. This is a 10.2 database where FAL server should automatically resolved any gaps in archived logs. Something must have happened that caused the FAL (fetch archived log) process to fail. To get that answer, first, I checked the alert log of the standby instance. I found these lines that showed the issue clearly:


Fri Dec 18 06:12:26 2009
Waiting for all non-current ORLs to be archived...
Media Recovery Waiting for thread 1 sequence 700
Fetching gap sequence in thread 1, gap sequence 700-700
… 
Fri Dec 18 06:13:27 2009
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 700-700
DBID 846390698 branch 697108460
FAL[client]: All defined FAL servers have been attempted.

Going back in the alert log, I found these lines:

Tue Dec 15 17:16:15 2009
Fetching gap sequence in thread 1, gap sequence 700-700
Error 12514 received logging on to the standby
FAL[client, MRP0]: Error 12514 connecting to DEL1 for fetching gap sequence
Tue Dec 15 17:16:15 2009
Errors in file /opt/oracle/admin/DEL2/bdump/del2_mrp0_18308.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Tue Dec 15 17:16:45 2009
Error 12514 received logging on to the standby
FAL[client, MRP0]: Error 12514 connecting to DEL1 for fetching gap sequence
Tue Dec 15 17:16:45 2009
Errors in file /opt/oracle/admin/DEL2/bdump/del2_mrp0_18308.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

This clearly showed the issue. On December 15th at 17:16:15, the Managed Recovery Process encountered an error while receiving the log information from the primary. The error was ORA-12514 “TNS:listener does not currently know of service requested in connect descriptor”. This is usually the case when the TNS connect string is incorrectly specified. The primary is called DEL1 and there is a connect string called DEL1 in the standby server.

The connect string works well. Actually, right now there is no issue with the standby getting the archived logs; so there connect string is fine - now. The standby is receiving log information from the primary. There must have been some temporary hiccups causing that specific archived log not to travel to the standby. If that log was somehow skipped (could be an intermittent problem), then it should have been picked by the FAL process later on; but that never happened. Since the sequence# 700 was not applied, none of the logs received later – 701, 702 and so on – were applied either. This has caused the standby to lag behind since that time.

So, the fundamental question was why FAL did not fetch the archived log sequence# 700 from the primary. To get to that, I looked into the alert log of the primary instance. The following lines were of interest:


Tue Dec 15 19:19:58 2009
Thread 1 advanced to log sequence 701 (LGWR switch)
Current log# 2 seq# 701 mem# 0: /u01/oradata/DEL1/onlinelog/o1_mf_2_5bhbkg92_.log
Tue Dec 15 19:20:29 2009Errors in file /opt/oracle/product/10gR2/db1/admin/DEL1/bdump/del1_arc1_14469.trc:
ORA-00308: cannot open archived log '/u01/oraback/1_700_697108460.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Tue Dec 15 19:20:29 2009
FAL[server, ARC1]: FAL archive failed, see trace file.
Tue Dec 15 19:20:29 2009
Errors in file /opt/oracle/product/10gR2/db1/admin/DEL1/bdump/del1_arc1_14469.trc:
ORA-16055: FAL request rejected
ARCH: FAL archive failed.
Archiver continuing
Tue Dec 15 19:20:29 2009
ORACLE Instance DEL1 - Archival Error. Archiver continuing.

These lines showed everything clearly. The issue was:

ORA-00308: cannot open archived log '/u01/oraback/1_700_697108460.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory


The archived log simply was not available. The process could not see the file and couldn’t get it across to the standby site.

Upon further investigation I found that the DBA actually removed the archived logs to make some room in the filesystem without realizing that his action has removed the most current one which was yet to be transmitted to the remote site. The mystery surrounding why the FAL did not get that log was finally cleared.

Solution

Now that I know the cause, the focus was now on the resolution. If the archived log sequence# 700 was available on the primary, I could have easily copied it over to the standby, registered the log file and let the managed recovery process pick it up. But unfortunately, the file was gone and I couldn’t just recreate the file. Until that logfile was applied, the recovery will not move forward. So, what are my options?

One option is of course to recreate the standby - possible one but not technically feasible considering the time required. The other option is to apply the incremental backup of primary from that SCN number. That’s the key – the backup must be from a specific SCN number. I have described the process since it is not very obvious. The following shows the step by step approach for resolving this problem. I have shown where the actions must be performed – [Standby] or [Primary].

1. [Standby] Stop the managed standby apply process:

SQL> alter database recover managed standby database cancel;

Database altered.

2. [Standby] Shutdown the standby database

3. [Primary] On the primary, take an incremental backup from the SCN number where the standby has been stuck:

RMAN> run {
2> allocate channel c1 type disk format '/u01/oraback/%U.rmb';
3> backup incremental from scn 1301571 database;
4> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: sid=139 devtype=DISK

Starting backup at 18-DEC-09
channel c1: starting full datafile backupset
channel c1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/oradata/DEL1/datafile/o1_mf_system_5bhbh59c_.dbf
… 
piece handle=/u01/oraback/06l16u1q_1_1.rmb tag=TAG20091218T083619 comment=NONE
channel c1: backup set complete, elapsed time: 00:00:06
Finished backup at 18-DEC-09
released channel: c1

4. [Primary] On the primary, create a new standby controlfile:

SQL> alter database create standby controlfile as '/u01/oraback/DEL1_standby.ctl';

Database altered.

5. [Primary] Copy these files to standby host:

oracle@oradba1 /u01/oraback# scp *.rmb *.ctl oracle@oradba2:/u01/oraback
oracle@oradba2's password:
06l16u1q_1_1.rmb 100% 43MB 10.7MB/s 00:04
DEL1_standby.ctl 100% 43MB 10.7MB/s 00:04 

6. [Standby] Bring up the instance in nomount mode:

SQL> startup nomount

7. [Standby] Check the location of the controlfile:

SQL> show parameter control_files

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_files string /u01/oradata/standby_cntfile.ctl

8. [Standby] Replace the controlfile with the one you just created in primary.

9. $ cp /u01/oraback/DEL1_standby.ctl /u01/oradata/standby_cntfile.ctl
10.[Standby] Mount the standby database:

SQL> alter database mount standby database;
11.[Standby] RMAN does not know about these files yet; so you must let it know – by a process called cataloging. Catalog these files:

$ rman target=/

Recovery Manager: Release 10.2.0.4.0 - Production on Fri Dec 18 06:44:25 2009

Copyright (c) 1982, 2007, Oracle. All rights reserved.

connected to target database: DEL1 (DBID=846390698, not open)
RMAN> catalog start with '/u01/oraback';

using target database control file instead of recovery catalog
searching for all files that match the pattern /u01/oraback

List of Files Unknown to the Database
=====================================
File Name: /u01/oraback/DEL1_standby.ctl
File Name: /u01/oraback/06l16u1q_1_1.rmb

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/oraback/DEL1_standby.ctl
File Name: /u01/oraback/06l16u1q_1_1.rmb

12.Recover these files:

RMAN> recover database;

Starting recover at 18-DEC-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /u01/oradata/DEL2/datafile/o1_mf_system_5lptww3f_.dbf
...
channel ORA_DISK_1: reading from backup piece /u01/oraback/05l16u03_1_1.rmb
channel ORA_DISK_1: restored backup piece 1
piece handle=/u01/oraback/05l16u03_1_1.rmb tag=TAG20091218T083619
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07

starting media recovery

archive log thread 1 sequence 8012 is already on disk as file /u01/oradata/1_8012_697108460.dbf
archive log thread 1 sequence 8013 is already on disk as file /u01/oradata/1_8013_697108460.dbf
… 

13. After some time, the recovery fails with the message:

archive log filename=/u01/oradata/1_8008_697108460.dbf thread=1 sequence=8009
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/18/2009 06:53:02
RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile '/u01/oradata/1_8008_697108460.dbf'
ORA-00310: archived log contains sequence 8008; sequence 8009 required
ORA-00334: archived log: '/u01/oradata/1_8008_697108460.dbf'

This happens because we have come to the last of the archived logs. The expected archived log with sequence# 8008 has not been generated yet.

14.At this point exit RMAN and start managed recovery process:

SQL> alter database recover managed standby database disconnect from session;

Database altered.

15.Check the SCN’s in primary and standby:

[Standby] SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447474
[Primary] SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1447478

Now they are very close to each other. The standby has now caught up.

Applying OPATCH on single instance database

What is OPATCH?

Opatch is a set of perl scripts and java classes which allow the application and rolling back of interim (one-off) patches currently for the Oracle RDBMS product. The program requires that the perl language interpreter version 5.005_03 or greater to run but version 5.6 or greater is
recommended. The main shell of OPatch is called from opatch, which calls other Perl modules and subsequently java classes as well. Under a normal Oracle home installation the JRE is used from the Oracle home and the Perl installation as well which is installed with the Apache
software in $ORACLE_HOME/Apache/perl .
In single line you can say,  OPatch is the oracle database’s Interim (one-off) Patch Installer.

Where can I download it from ?
If OPatch is not installed into your Oracle Home ($ORACLE_HOME/OPatch), you may need to download it from Metalink.
Download latest OPatch (Patch 6880880 ) for the Oracle software release.
Follow README.txt for the OPatch utility installation.
https://updates.oracle.com/Orion/Services/download?type=readme&aru=13116058

or you can follow the steps as follows:
% mv p6880880_101000_SOLARIS64.zip $ORACLE_HOME
% cd $ORACLE_HOME
(If a former OPatch directory already exist, then rename it)                                                                        
% mv Opatch OPatch_100057_OLD                                                                                                   
% unzip p6880880_101000_SOLARIS64.zip
Archive:  p6880880_101000_SOLARIS64.zip
   creating: OPatch/
   creating: OPatch/docs/
  inflating: OPatch/docs/FAQ        
  inflating: OPatch/docs/README.txt 
  inflating: OPatch/docs/Users_Guide.txt 
  inflating: OPatch/docs/bt1.txt    
  inflating: OPatch/docs/bt2.txt    
  inflating: OPatch/docs/tmp        
  inflating: OPatch/emdpatch.pl     
   creating: OPatch/jlib/
  inflating: OPatch/jlib/opatch.jar 
  inflating: OPatch/opatch          
  inflating: OPatch/opatch.bat      
  inflating: OPatch/opatch.pl       
   creating: OPatch/perl_modules/
  inflating: OPatch/perl_modules/Apply.pm 
  inflating: OPatch/perl_modules/AttachHome.pm 
  inflating: OPatch/perl_modules/Command.pm 
  inflating: OPatch/perl_modules/LsInventory.pm 
  inflating: OPatch/perl_modules/Query.pm 
  inflating: OPatch/perl_modules/RollBack.pm 
  inflating: OPatch/perl_modules/Version.pm 
  inflating: OPatch/perl_modules/XML.pm 
  inflating: OPatch/perl_modules/opatchIO.pm 
  inflating: OPatch/README.txt      
   creating: OPatch/ocm/
 extracting: OPatch/ocm/ocm.zip     
   creating: OPatch/ocm/doc/
  inflating: OPatch/ocm/doc/license.txt 
   creating: OPatch/ocm/lib/
  inflating: OPatch/ocm/lib/emocmutl.jar 
   creating: OPatch/ocm/bin/
  inflating: OPatch/ocm/bin/emocmrsp

What is Oracle Database Inventory and where it is located?
Oracle Inventory is the location or the place where all the information about an Oracle Home is stored and maintained. When ever we install an RDBMS Oracle Home, a new inventory gets created. Applying any new patch to Oracle Home will update the inventory for that Oracle Home and record the details for the patch applied. This inventory is in the form of XML files.
The location of inventory is defined in a file called oraInst.loc. The path for this file is provided while installing Oracle Home. If we dont supply any path, the is file will be present at central location /etc/oraInst.loc. Also we can have a central inventory if its not used by any other oracle installation. If the central inventory is used by previous Oracle installation we can create local inventory. The content of oraInst.loc file is as shown below.
[oracle@bcfprddb|BCFPRD /oracle/products/apps]$cd /etc    
[oracle@bcfprddb|BCFPRD /etc]$cat oraInst.loc
inventory_loc=/oracle/oraInventory
inst_group=dba
[oracle@bcfprddb|BCFPRD /etc]$
Here “inventory_loc” gives the location of Inventory directory where as inst_group gives the name of the group which is the owner of this inventory.

How to create Local Inventory?
You can create local inventory at any location while installing Oracle Home. You need to use -invPtrLoc variable along with runInstaller command and give the location of oraInst.loc. Otherwise the default path assumed for oraInst.loc is /etc/oraInst.loc.
./runInstaller -invPtrLoc /slot/ems2029/appmgr/apps/tech_st/10.1.2/oraInst.loc
After the installation starts it will ask for location of oraInventory directory and group which should own it. It will make entry of these into oraInst.loc file.

Opatch help
You can get all the options using opatch -help
[oracle@bcfprddb|BCFPRD /oracle/products/apps]$cd $ORACLE_HOME/OPatch
[oracle@bcfprddb|BCFPRD /oracle/products/apps/10.2/OPatch]$
[oracle@bcfprddb|BCFPRD /oracle/products/apps/10.2/OPatch]$
[oracle@bcfprddb|BCFPRD /oracle/products/apps/10.2/OPatch]$opatch -help
Invoking OPatch 10.2.0.4.9

Oracle Interim Patch Installer version 10.2.0.4.9
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

 Usage: opatch [ -help ] [ -r[eport] ] [ command ]

            command := apply
                       lsinventory
                       napply
                       nrollback
                       rollback
                       query
                       version
                       prereq
                       util

 <global_arguments> := -help       Displays the help message for the command.
                       -report     Print the actions without executing.

 example:
   'opatch -help'
   'opatch auto -help'
   'opatch apply -help'
   'opatch lsinventory -help'
   'opatch napply -help'
   'opatch nrollback -help'
   'opatch rollback -help'
   'opatch prereq -help'
   'opatch util -help'

OPatch succeeded.
You can get specific help for any command of opatch using opatch <command> -help. Example opatch apply -help.

How to Applying the OPATCH?
Check the opatch version
[oracle@bcfprddb|BCFPRD /oracle/products/apps/10.2/OPatch]$opatch version
Invoking OPatch 10.2.0.4.9

OPatch Version: 10.2.0.4.9

OPatch succeeded.
Check the command opatch lsinventory command is working properly.
oracle@bcfprddb|BCFPRD /oracle/products/apps/10.2/OPatch]$opatch lsinventory
Invoking OPatch 10.2.0.4.9

Oracle Interim Patch Installer version 10.2.0.4.9
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /oracle/products/apps/10.2
Central Inventory : /oracle/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 10.2.0.4.9
OUI version       : 10.2.0.5.0
OUI location      : /oracle/products/apps/10.2/oui
Log file location : /oracle/products/apps/10.2/cfgtoollogs/opatch/opatch2011-09-13_00-12-48AM.log

Patch history file: /oracle/products/apps/10.2/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /oracle/products/apps/10.2/cfgtoollogs/opatch/lsinv/lsinventory2011-09-13_00-12-48AM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (3): 

Oracle Database 10g                                                  10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 2                            10.2.0.3.0
Oracle Database 10g Release 2 Patch Set 4                            10.2.0.5.0
There are 3 products installed in this Oracle Home.

There are no Interim patches installed in this Oracle Home.

--------------------------------------------------------------------------------

OPatch succeeded.

shut down the database.
[oracle@bcfprddb|BCFPRD /oracle/products/apps/10.2/OPatch]$sqlplus "/ as sysdba"

SQL*Plus: Release 10.2.0.5.0 - Production on Tue Sep 13 00:30:17 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shut immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

stop the listener.
[oracle@bcfprddb|BCFPRD /oraclebck/10010310]$lsnrctl stop

LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.5.0 - Production on 13-SEP-2011 00:30:04

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=bcfprddb)(PORT=1521)))
The command completed successfully

Stop the DBCONSOLE
[oracle@bcfprddb|BCFPRD /oracle/products/apps]$emctl stop dbconsole
Oracle Enterprise Manager 10g Database Control Release 10.2.0.5.0  
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
http://bcfprddb:1158/em/console/aboutApplication
Stopping Oracle Enterprise Manager 10g Database Control ...

Stop isqlplus
[oracle@bcfprddb|BCFPRD /oracle/products/apps]$isqlplusctl stop
iSQL*Plus 10.2.0.5.0
Copyright (c) 2003, 2010, Oracle.  All Rights Reserved.
iSQL*Plus instance on port 5560 is not running ...

Crosscheck the database & listener is not running.
[oracle@bcfprddb|BCFPRD /oracle/products/apps]$lsnrctl status
 LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.5.0 - Production on 13-SEP-2011 00:34:31
 Copyright (c) 1991, 2010, Oracle.  All rights reserved.
 Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=bcfprddb)(PORT=1521)))
  TNS-12541: TNS:no listener
  TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
  IBM/AIX RISC System/6000 Error: 79: Connection refused
oracle@bcfprddb|BCFPRD /oracle/products/apps]$ps -ef|grep pmon_

Unzip the downloaded opatch,
make a copy of the OPatch folder, copy the etc & files folders of the patch to $ORACLE_HOME/OPatch directory,
Run the command opatch apply command from $ORACLE_HOME/OPatch location
I am applying the 10010310 opatch on the database.

[oracle@bcfprddb|BCFPRD /oracle/products/apps/10.2/OPatch]$opatch apply
Invoking OPatch 10.2.0.4.9

Oracle Interim Patch Installer version 10.2.0.4.9
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /oracle/products/apps/10.2
Central Inventory : /oracle/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 10.2.0.4.9
OUI version       : 10.2.0.5.0
OUI location      : /oracle/products/apps/10.2/oui
Log file location : /oracle/products/apps/10.2/cfgtoollogs/opatch/opatch2011-09-13_00-35-57AM.log

Patch history file: /oracle/products/apps/10.2/cfgtoollogs/opatch/opatch_history.txt

ApplySession applying interim patch '10010310' to OH '/oracle/products/apps/10.2'

Running prerequisite checks...

OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/oracle/products/apps/10.2')

Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch '10010310' for restore. This might take a while...
Backing up files affected by the patch '10010310' for rollback. This might take a while...

Patching component oracle.rdbms, 10.2.0.5.0...
Updating archive file "/oracle/products/apps/10.2/lib/libserver10.a"  with "lib/libserver10.a/ksu.o"
Running make for target ioracle
ApplySession adding interim patch '10010310' to inventory

Verifying the update...
Inventory check OK: Patch ID 10010310 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 10010310 are present in Oracle Home.

The local system has been patched and can be restarted.

OPatch succeeded.

Check the Patch is listed in the ouptput of opatch lsinventory command: 10010310 should be listed in the above command

[oracle@bcfprddb|BCFPRD /oracle/products/apps/10.2/OPatch]$opatch lsinventory
Invoking OPatch 10.2.0.4.9
 Oracle Interim Patch Installer version 10.2.0.4.9
Copyright (c) 2009, Oracle Corporation.  All rights reserved.
 Oracle Home       : /oracle/products/apps/10.2
Central Inventory : /oracle/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 10.2.0.4.9
OUI version       : 10.2.0.5.0
OUI location      : /oracle/products/apps/10.2/oui
Log file location : /oracle/products/apps/10.2/cfgtoollogs/opatch/opatch2011-09-13_00-39-11AM.log
 Patch history file: /oracle/products/apps/10.2/cfgtoollogs/opatch/opatch_history.txt
 Lsinventory Output file location : /oracle/products/apps/10.2/cfgtoollogs/opatch/lsinv/lsinventory2011-09-13_00-39-11AM.txt
 --------------------------------------------------------------------------------
Installed Top-level Products (3):
 Oracle Database 10g                                                  10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 2                            10.2.0.3.0
Oracle Database 10g Release 2 Patch Set 4                            10.2.0.5.0
There are 3 products installed in this Oracle Home.
 Interim patches (1) :
 Patch  10010310     : applied on Tue Sep 13 00:36:22 GMT+05:30 2011
Unique Patch ID:  13113841
   Created on 4 Nov 2010, 13:11:42 hrs PST8PDT
   Bugs fixed:
     10010310
 --------------------------------------------------------------------------------
 OPatch succeeded.


Opatch is successfully applied, Start the listener & the database.
Master note for opatch on metalink : Master Note For OPatch [ID 293369.1]

Sunday 8 November 2015

How to correct & fix the corrupted Oracle (9i, 10g, 11g) inventory using runInstaller & opatch?

Inventory is the index for the oracle installation. All the details about the ORACLE_HOME including different utility versions, patch details etc. If you do a fresh Oracle installation the inventory will be updated. But if you copy the ORACLE binaries from some other location and if you are not updating the inventory, we cannot read the inventory using opatch utility.
It will give error. Here I am showing the steps to fix the oracle inventory. Please see.
Step 1: checking the inventory of a corrupted database using opatch lsinventory utility and the result is an error.

$ /oracle/milprod/oracle/product/11.1.0/OPatch/opatch lsinventory -invPtrLoc  ./oraInst.loc
Invoking OPatch 11.1.0.6.9
Oracle Interim Patch Installer version 11.1.0.6.9
Copyright (c) 2009, Oracle Corporation.  All rights reserved.
Oracle Home       : /oracle/milprod/oracle/product/11.1.0
Central Inventory : /oracle/milprod/oracle/product/11.1.0/inventory
   from           : ./oraInst.loc
OPatch version    : 11.1.0.6.9
OUI version       : 11.1.0.7.0
OUI location      : /oracle/milprod/oracle/product/11.1.0/oui
Log file location : /oracle/milprod/oracle/product/11.1.0/cfgtoollogs/opatch/opatch2011-06-15_01-40-34AM.log
Patch history file: /oracle/milprod/oracle/product/11.1.0/cfgtoollogs/opatch/opatch_history.txt
List of Homes on this system:
Inventory load failed... OPatch cannot load inventory for the given Oracle Home.
Possible causes are:
   Oracle Home dir. path does not exist in Central Inventory
   Oracle Home is a symbolic link
   Oracle Home inventory is corrupted
LsInventorySession failed: OracleHomeInventory gets null oracleHomeInfo
OPatch failed with error code 73
oracle@prodserv(230) milprod /oracle/milprod/oracle/product/11.1.0

Step 2
For safer side take a backup or move the current inventory to a new location.
mv -i inventory inventory.old

Step 3
Create a oraInst.loc (if not present already) file with following content in ORACLE_HOME directory
inventory_loc=/oracle/milprod/oracle/product/11.1.0/inventory
inst_group=dba

Step 4
Run the following command to fix the inventory in Unix prompt from ORACLE_HOME direcotry
$ORACLE_HOME/oui/bin/runInstaller -silent -invPtrLoc "/oracle/milprod/oracle/product/11.1.0/oraInst.loc" -attachHome ORACLE_HOME="/oracle/milprod/oracle/product/11.1.0" ORACLE_HOME_NAME="OraDb11g_home"
$ ORACLE_HOME/oui/bin/runInstaller -silent -invPtrLoc "/oracle/milprod/oracle/product/11.1.0/oraInst.loc" -attachHome ORACLE_HOME="/oracle/milprod/oracle/product/11.1.0" ORACLE_HOME_NAME="OraDb11g_home"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 21312 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2011-06-15_01-46-42AM. Please wait ...oracle@prodserv(241) milprod /oracle/milprod/oracle/product/11.1.0/oui/bin
$ The inventory pointer is located at /oracle/milprod/oracle/product/11.1.0/oraInst.loc
The inventory is located at /oracle/milprod/oracle/product/11.1.0/inventory
'AttachHome' was successful.

Step 5
Now your corrupted Oracle inventory is fixed. Once again you check your inventory with opatch command.
$ /oracle/milprod/oracle/product/11.1.0/OPatch/opatch lsinventory -invPtrLoc  /oracle/milprod/oracle/product/11.1.0/or>
Invoking OPatch 11.1.0.6.9
Oracle Interim Patch Installer version 11.1.0.6.9
Copyright (c) 2009, Oracle Corporation.  All rights reserved.
Oracle Home       : /oracle/milprod/oracle/product/11.1.0
Central Inventory : /oracle/milprod/oracle/product/11.1.0/inventory
   from           : /oracle/milprod/oracle/product/11.1.0/oraInst.loc
OPatch version    : 11.1.0.6.9
OUI version       : 11.1.0.7.0
OUI location      : /oracle/milprod/oracle/product/11.1.0/oui
Log file location : /oracle/milprod/oracle/product/11.1.0/cfgtoollogs/opatch/opatch2011-06-15_01-49-21AM.log
Patch history file: /oracle/milprod/oracle/product/11.1.0/cfgtoollogs/opatch/opatch_history.txt
Lsinventory Output file location : /oracle/milprod/oracle/product/11.1.0/cfgtoollogs/opatch/lsinv/lsinventory2011-06-15_01-49-21AM.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2):
Oracle Database 11g                                                  11.1.0.6.0
Oracle Database 11g Patch Set 1                                      11.1.0.7.0
There are 2 products installed in this Oracle Home.
Interim patches (3) :
Patch  9352179      : applied on Tue Aug 24 14:11:35 PDT 2010
Unique Patch ID:  12335561
   Created on 1 Apr 2010, 01:22:01 hrs PST8PDT
   Bugs fixed:
     7378322, 9068088, 8348481, 9054253, 6851110, 7206858, 7497788, 8974548
     7331867, 8251486, 8851675, 8211920, 9352179, 7013124, 7643188, 7135702
     7529174, 7196532, 7515779, 7705669, 8402548, 8608377, 7119382, 7510766
     9001453, 8364676, 9066130, 7424804, 7628387, 7408621, 7426336, 7553884
     8856696, 6843972, 7694979, 8565708, 6972189, 6598432, 6768362, 6501490
     8836375, 8216875, 7527650, 7719143, 8402551, 7454752, 8290478, 7412296
     8402555, 7719148, 8284633, 8318050, 7639602, 8539335, 8613137, 7650993
     8940197, 6970731, 7589862, 9170608, 7131291, 9109536, 7586451, 7446163
     8199266, 7460818, 9114072, 6618461, 7451927, 7373196, 8230457, 7336031
     7420394, 8413059, 8402562, 8914979, 8402637, 7393804, 8876094, 6196748
     7627743, 7348847, 8408887, 7183523, 8531282, 7441663, 7720494, 7719668
     8645846, 8539923, 9458811, 8534338, 8339404, 8236851, 9458814, 7602341
     9458816, 7036453, 7610362, 8855553, 7384419, 7690421, 9458819, 8419383
     8855559, 8543737, 7447559, 6772911, 7175513, 7341598, 7706138, 8243648
     8367827, 8365141, 7350127, 9027691, 8549480, 8462173, 7356443, 7593835
     8483871, 8242410, 7044551, 7572069, 7639121, 9458829, 8855565, 8825048
     7253531, 8328853, 8341623, 8409848, 6851669, 7318276, 8257122, 7013817
     8860821, 7309458, 8450529, 8306933, 8306934, 6840740, 9458831, 6981690
     8304329, 8281906, 7480809, 8855570, 8339352, 7340448, 8499600, 7393258
     8588540, 8790767, 8855575, 6599920, 7630416, 7426959, 8855577, 6980601
     8342506, 8717461, 6452375, 8607693, 6407486, 7653579, 7416901, 7281382
     8599477, 7535429, 8582594, 7475055, 8217795, 7409110, 7432514, 8362693
     7436152, 7680907, 7013835, 7708340, 8499043, 8361398, 6784747, 7524944
     7496908, 7662620, 8224083, 7385253, 9189647, 7225720, 6941717, 7122161
     8898852, 8363210, 7477246, 7377810, 7299153, 9118620, 8213302, 8909984
     9118622, 9209238, 8702276, 7497640, 6991626, 7630874, 7311909, 7432601
     7213937, 7462112, 7352414, 7516867, 8199107, 7296258, 6812439, 6870937
     7219752, 7263842, 8287680, 6900214, 8870559, 6882739, 8813366, 8990527
     8352304, 7500792, 8352309, 7538000, 8834636, 8324760, 7652888, 7330434
     7113299, 7307972, 8487273, 7462709, 7486595, 8650661, 9032717, 8658581
     8244217, 7515145, 7499911, 7411865, 7276960, 8890026, 7432556, 7613481
     7522002, 8674263, 7452373, 8221425, 9188010, 7438445, 7334226, 9197917
     8284438, 7494333, 7318049, 8250643, 7834195, 8214576, 6679303, 8815639
     8277580, 7715244, 7675269, 7516536, 8490879, 6980597, 8301559, 7499353
     8496830, 7357609, 7462589, 6647480, 8416414, 7830065, 7189645, 7203349
     8702535, 7436280, 6955744, 7506785, 7366290, 6977167, 8570572, 7702085
     8348464, 7185872, 8625762, 6059178, 7257038, 8542307, 8833297, 8578132
     7606362, 7330611, 8226397, 7628866, 6903819, 8391256, 7676737, 7258928
     7829321, 7311601, 8433270, 7511040, 7434194, 8369094, 8563941, 8563942
     8563943, 8563944, 8563945, 8563946, 7716219, 8563947, 7345904, 8563948
     7556778, 8220734, 6870994, 7597354, 9135679, 7523787
Patch  7019661      : applied on Tue Aug 24 13:39:21 PDT 2010
Unique Patch ID:  11132692.1
   Created on 6 Apr 2009, 07:43:52 hrs US/Pacific
   Bugs fixed:
     7019661
Patch  7394790      : applied on Wed Feb 24 11:48:28 PST 2010
Unique Patch ID:  11678836
   Created on 18 Aug 2009, 15:46:40 hrs PST8PDT
   Bugs fixed:
     7394790
--------------------------------------------------------------------------------
OPatch succeeded.
oracle@prodserv(242) milprod /oracle/milprod/oracle/product/11.1.0/oui/bin

Friday 6 November 2015

Configuring SHMMAX and SHMALL for Oracle in Linux

The logic behind finding the SHMMAX value for your system

If you read the release and install and configuration documentation for Oracle 11g, Oracle recommends you set shmmax to half of your physical memory or 4 GB less 1 byte. Choose whichever is lower.  The interesting 4 GB less 1byte memory setting is based on the virtual memory addressing limitations of the 32 bit (x86) processors. All 64 bit processors can address much larger memory addresse space and is not limited like it’s 32 bit brethren. So much larger shared memory segments can be permitted and created and used.
When Oracle first starts up, after having read it’s configuration files it will know how much memory it needs for it’s assorted SGA’s.  The best fit for an Oracle SGA shared memory segment is when the SGA fits into one shared memory segment.  As I mentioned earlier, Oracle can use non-contiguous shared memory segments, but its not Oracle’s chosen memory model.  Therefore by defenition, if the SGA is larger than the value of SHMMAX, then Oracle has to use one of the two other remaining shared memory models.  Contiguous or non-contiguous memory segments.

I try to set the SHMMAX to the size of the largest SGA defined on your database server.  If your system has 6 Oracle instances configured, 2 require 2 GB SGA’s, 2 require 1.5GB SGA’s and 2 require 1 GB SGA’s, you would set the SHMMAX to 2GB.

Edit your sysctl.conf and add the SHMMAX value to 2 GB in bytes
# vi /etc/sysctl.conf
kernel.shmmax = 2147483648
 
Or, as this is Unix and there are many ways of doing things, use sysctl -w argument=xyz which writes to the sysctl.conf file.
# sysctl -w kernel.shmmax=2147483648
 
Now reload the parameters by asking sysctl politely to to reread it’s configuration
# sysctl –p
 
Now that you have configured shmmax and shmall, and all the other steps required to configure you system for Oracle, I will jump ahead very fast here and install Oracle and create a test db instance.
# su - oracle
$ cd /tmp/2wherever/install/files/are
$ ./runInstaller
….
# service oracle start
Create a test databse
$ dbca
etc…..
# ipcs -a ------ Shared Memory Segments -------- key        shmid      owner      perms      bytes      nattch     status 0x6c6c6536 0          root       600        4096       0 0x2270894c 2359297    oracle     660        1512046592 30 ------ Semaphore Arrays -------- key        semid      owner      perms      nsems 0x00000000 0          root       600        1 0x00000000 65537      root       600        1 0x00000000 294914     apache     600        1 0x00000000 327683     apache     600        1 0x396b9144 983044     oracle     660        154 ------ Message Queues -------- key        msqid      owner      perms      used-bytes   messages
The last few steps have been exceedingly oversimplied, but the point here is to show the shared memory use.  You will also see that user oracle has grabbed 154 semaphores.  This isn’t a huge number, and in this case it’s only a test db instance so no real traffic.

============================


SHMALL sets the total amount of shared memory pages that can be used system wide, in pages.

SHMMAX is the maximum size of a single shared memory segment set in bytes.

The SHMMAX parameter is a safeguard parameter that sets the upper limit of how much shared memory a process can possibly request. The maximum supported value of SHMMAX on a 32-bit system is 4 GB - 1 byte.

SHMALL is the division of SHMMAX/PAGE_SIZE, e.g:. 1073741824/4096=262144.

The default PAGE_SIZE in Linux, which also applies to /dev/shm (POSIX) shared memory is 4 KB (4096 bytes). With the introduction of the Linux kernel 2.6, the system can also be configured to use Huge Pages, which uses a page size of 2 MB (2048 kilobyte). Kernel Huge Pages, unlike standard shared memory pages are non-swapable and therefore set and reserved at system startup.

Oracle 11g database uses Automatic Memory Management (AMM) by default, which relies on /dev/shm. /dev/shm (POSIX shared memory) uses 4 KB pages and is set to 50 % of phsyical memory (RAM), by default. For performance reasons, which can be very drastic depending on how much memory the system has to manage, any Oracle database that uses more than 4 GB of SGA should use kernel Huge Pages.


=================================

======
SHMMAX and SHMALL are two key shared memory parameters that directly impact’s the way by which Oracle creates an SGA. Shared memory is nothing but part of Unix IPC System (Inter Process Communication) maintained by kernel where multiple processes share a single chunk of memory to communicate with each other.
While trying to create an SGA during a database startup, Oracle chooses from one of the 3 memory management models a) one-segment or b) contiguous-multi segment or c) non-contiguous multi segment. Adoption of any of these models is dependent on the size of SGA and values defined for the shared memory parameters in the linux kernel, most importantly SHMMAX.
SHMMAX and SHMALL -

SHMMAX is the maximum size of a single shared memory segment set in “bytes”.

silicon:~ # cat /proc/sys/kernel/shmmax536870912SHMALL is the total size of Shared Memory Segments System wide set in “pages”.silicon:~ # cat /proc/sys/kernel/shmall1415577
The key thing to note here is the value of SHMMAX is set in "bytes" but the value of SHMMALL is set in "pages".


What’s the optimal value for SHMALL?

As SHMALL is the total size of Shard Memory Segments System wide, it should always be less than the Physical Memory on the System and should also be less than sum of SGA’s of all the oracle databases on the server. Once this value (sum of SGA’s) hit the limit, i.e. the value of shmall, then any attempt to start a new database (or even an existing database with a resized SGA) will result in an “out of memory” error (below). This is because there won’t be any more shared memory segments that Linux can allocate for SGA.

ORA-27102: out of memory

Linux-x86_64 Error: 28: No space left on device.
So above can happen for two reasons. Either the value of shmall is not set to an optimal value or you have reached the threshold on this server.
Setting the value for SHMALL to optimal is straight forward. All you want to know is how much “Physical Memory” (excluding Cache/Swap) you have on the system and how much of it should be set aside for Linux Kernel and to be dedicated to Oracle Databases.
For e.g. Let say the Physical Memory of a system is 6GB, out of which you want to set aside 1GB for Linux Kernel for OS Operations and dedicate the rest of 5GB to Oracle Databases. Then here’s how you will get the value for SHMALL.
Convert this 5GB to bytes and divide by page size. Remember SHMALL should be set in “pages” not “bytes”.
So here goes the calculation.


Determine Page Size first, can be done in two ways. In my case it’s 4096 and that’s the recommended and default in most cases which you can keep the same.

silicon :~ # getconf PAGE_SIZE
4096or
silicon:~ # cat /proc/sys/kernel/shmmni4096

Convert 5GB into bytes and divide by page size, I used the linux calc to do the math.
silicon:~ # echo "( 5 * 1024 * 1024 * 1024 ) / 4096 " | bc -l1310720.00000000000000000000

Reset shmall and load it dynamically into kernel
silicon:~ # echo "1310720" > /proc/sys/kernel/shmallsilicon:~ # sysctl –p Verify if the value has been taken into effect.
silicon:~ # sysctl -a | grep shmallKernel.shmall = 1310720
Another way to look this up is
silicon:~ # ipcs -lm
------ Shared Memory Limits --------max number of segments = 4096 /* SHMMNI */max seg size (kbytes) = 524288 /* SHMMAX */max total shared memory (kbytes) = 5242880 /* SHMALL */min seg size (bytes) = 1


To keep the value effective after every reboot, add the following line to /etc/sysctl.conf

echo “kernel.shmall = 1310720” >> /etc/sysctl.conf
Also verify if sysctl.conf is enabled or will be read during boot.
silicon:~ # chkconfig boot.sysctlboot.sysctl on
If returns “off”, means it’s disabled. Turn it on by running
silicon:~ # chkconfig boot.sysctl onboot.sysctl on
What’s the optimal value for SHMMAX?
Oracle makes use of one of the 3 memory management models to create the SGA during database startup and it does this in following sequence. First Oracle attempts to use the one-segment model and if this fails, it proceeds with the next one which's the contiguous multi-segment model and if that fails too, it goes with the last option which is the non-contiguous multi-segment model.
So during startup it looks for shmmax parameter and compares it with the initialization parameter *.sga_target. If shmmax > *.sga_target, then oracle goes with one-segment model approach where the entire SGA is created within a single shared memory segment.
But the above attempt (one-segment) fails if SGA size otherwise *.sga_target > shmmax, then Oracle proceeds with the 2nd option – contiguous multi-segment model. Contiguous allocations, as the name indicates are a set of shared memory segments which are contiguous within the memory and if it can find such a set of segments then entire SGA is created to fit in within this set.


But if cannot find a set of contiguous allocations then last of the 3 option’s is chosen – non-contiguous multi-segment allocation and in this Oracle has to grab the free memory segments fragmented between used spaces.

So let’s say if you know the max size of SGA of any database on the server stays below 1GB, you can set shmmax to 1 GB. But say if you have SGA sizes for different databases spread between 512MB to 2GB, then set shmmax to 2Gigs and so on.
Like SHMALL, SHMMAX can be defined by one of these methods..
Dynamically reset and reload it to the kernel..


silicon:~ # echo "536870912" > /proc/sys/kernel/shmmaxsilicon:~ # sysctl –p -- Dynamically reload the parameters.Or use sysctl to reload and reset ..
silicon:~ # sysctl -w kernel.shmmax=536870912

To permanently set so it’s effective in reboots…
silicon:~ # echo "kernel.shmmax=536870912" >> /etc/systctl.conf

Install doc for 11g recommends the value of shmmax to be set to "4GB – 1byte" or half the size of physical memory whichever is lower. I believe “4GB – 1byte” is related to the limitation on the 32 bit (x86) systems where the virtual address space for a user process can only be little less than 4GB. As there’s no such limitation for 64bit (x86_64) bit systems, you can define SGA’s larger than 4 Gig’s. But idea here is to let Oracle use the efficient one-segment model and for this shmmax should stay higher than SGA size of any individual database on the system.

####################################################################################################################################

PURPOSE
Shared memory and semaphores are two important resources for an Oracle
instance on Unix. An instance cannot start if it is unable to allocate what
it needs. This paper primarily discusses the process Oracle goes through to
allocate shared memory and semaphores at instance startup. Other important
points unrelated to startup as well as some troubleshooting information will
be touched upon.


SCOPE & APPLICATION
Understanding Oracle and Shared Memory/Semaphores.




Unix Semaphores and Shared Memory Explained

===============================

General
=======


Shared memory is exactly that - a memory region that can shared between
different processes. Oracle uses shared memory for implementing the
SGA, which needs to be visible to all database sessions. Shared memory
is also used in the implementation of the SQL*Net V1 Fast driver as a
means of communicating between the application and shadow process. On
the RS/6000, each shadow process stores its PGA in a shared memory
segment (however, only the shadow attaches this segment). In the
latter two cases, Oracle allocates the shared memory dynamically as
opposed to the allocation of the SGA, which occurs at instance startup.
This allocation will not be discussed in this paper.


Semaphores can be thought of as flags (hence their name, semaphores).
They are either on or off. A process can turn on the flag or turn it off.
If the flag is already on, processes who try to turn on the flag will
sleep until the flag is off. Upon awakening, the process will
reattempt to turn the flag on, possibly suceeding or possibly sleeping
again. Such behaviour allows semaphores to be used in implementing a
post-wait driver - a system where processes can wait for events (i.e.
wait on turning on a semphore) and post events (i.e. turning of a
semaphore). This mechanism is used by Oracle to maintain concurrency
control over the SGA, since it is writeable by all processes attached.
Also, for the same reasons, use of the Fast Driver requires additional
semaphores. However, these semaphores will be allocated dynamically
instead of at instance startup. This allocation will not be discussed in
this paper.


Instance startup
================


On instance startup, the first things that the instance does is:

-Read the "init<SID>.ora"

-Start the background processes

-Allocate the shared memory and semphores required

The size of the SGA will be calculated from various "init.ora" parameters.
This will be the amount of shared memory required. The SGA is broken into 4
sections - the fixed portion, which is constant in size, the variable portion,
which varies in size depending on "init.ora" parameters, the redo block
buffer, which has its size controlled by log_buffers, and the db
block buffer, which has its size controlled by db_block_buffers.


The size of the SGA is the sum of the sizes of the 4 portions.
There is unfortunately no simple formula for determining the size
of the variable portion. Generally, the shared pool dominates all
other parts of the variable portion, so as a rule of thumb, one can
estimate the size as the value of shared_pool_size (in v6, one can
ignore the size of the variable portion).


The number of semphores required is much simpler to determine. Oracle will
need exactly as many semaphores as the value of the processes "init.ora"
parameter.


Note that the recommended kernel parameter values in the ICG are enough
to support the default database (4M SGA, 50 processes), but may be
insufficient to run a larger instance. With the above estimations and the
information which follows, a DBA should be able to build a kernel with
appropriate settings to support the instance.



Shared memory allocation
========================


Oracle has 3 different possible models for the SGA - one-segment,
contiguous multi-segment, and non-contiguous multi-segment.
When attempting to allocate and attach shared memory for the SGA, it
will attempt each one, in the above order, until one succeeds or raises
an ORA error. On other, non-fatal, errors, Oracle simply cleans up and
tries again using the next memory model. The entire SGA must fit into
shared memory, so the total amount of shared memory allocated under any
model will be equal to the size of the SGA. This calculated value will
be referred to below as SGASIZE.


The one-segment model is the simplest and first model tried. In this
model, the SGA resides in only one shared memory segment. Oracle attempts
to allocate and attach one shared memory segement of size equal to total
size of the SGA. However, if the SGASIZE is larger than the configured
SHMMAX, this will obviously fail (with EINVAL). In this case, the SGA will
need to be placed in multiple shared memory segments, and Oracle proceeds
to the next memory model for the SGA. If an error other than EINVAL occurs
when allocating the shared memory with shmget(), Oracle will raise an
ORA-7306. If the segment was received (i.e. if SHMMAX > SGASIZE), Oracle
attempts to attach it at the start address defined in ksms.o. An error
on the attach will raise an ORA-7307.


With multiple segments there are two possibilities. The segments
can be attached contiguously, so that it appears to be one large
shared memory segment, or non-contiguously, with gaps between the
segments. The former wastes less space that could be used for the stack
or heap, but depending on alignment requirements for shared memory
(defined by SHMLBA in the kernel), it may not be possible.


At this point, Oracle needs to determine SHMMAX so it can determine how many
segments will be required. This is done via a binary search
algorithm over the range [1...SGASIZE] (since Oracle is trying this
model and not the one segment model it must be that SHMMAX<SGASIZE)
The value of SHMMAX calculated will then be rounded to an even page size
(on some machines, possibly to an even 2 or 4 page block).


In the contiguous segment model, Oracle simply divides the SGA into
SGASIZE/SHMMAX (rounded down) segments of size SHMMAX plus another segment
of size SGASIZE modulo SHMMAX. If more than SS_SEG_MAX segments are
required total, an ORA-7329 is raised. It then allocates and attaches
one segment at a time, attaching the first segment at the start address
defined in "ksms.o". Subsequent segments are attached at an address equal
to the previous segment's attach address plus the size of the previous
segment so that they are contiguous in memory.


For example, if SHMMAX is 2M, SGASIZE is 5M, and the start address is
0xd0000000, there would be 3 segments, 2 of 2M and 1 of 1M. They would be
attached at 0xd0000000, 0xd0000800 (0xd0000000+2M), and 0xd0001000
(0xd0000800+2M). If Oracle receives an error allocating a shared memory
segment, an ORA-7336 is raised.


If an error is raised on attaching a shared memory segement, Oracle checks
the system error returned. If it is EINVAL, the attach address used is most
likely badly aligned (not a mulitple of SHMLBA). In this case, Oracle tries
the next model for SGA allocation, non-contiguous segments. Otherwise, an
ORA-7337 is raised.


The last model Oracle will try is the non-contiguous model. Here,
things become a bit more complicated. After calculating SHMMAX, Oracle
first checks to see if it can put the fixed and variable portion into
one shared memory segment just large enough to hold the two portions
If it can, it allocates a segment just big enough to hold both portions.
If it cannot, it will put them each into their own seperate segment just
large enough to hold each portion. If the fixed portion is larger than
SHMMAX, an ORA-7330 will be raised. If the variable portion is larger
than SHMMAX, an ORA-7331 will be raised. Then Oracle computes the number
of redo block buffers it can fit in a segment (rounded down to an
integral number of buffers - buffers cannot overlap segments). An ORA-7332
is raised is SHMMAX is smaller than the size of a redo block.


Similarly, the number of db block buffers per segment is calculated, with an
ORA-7333 raised if SHMMAX is too small to hold one db block. Then Oracle can
compute the total number of segments required for both the redo and database
block buffers. This will be buffers/buffers per segment (rounded down) segments
and one (if necessary) of buffers modulo buffers per segment size, calculated
seperately for both the redo and db block buffers. These segments will be
of a size just large enough to hold the buffers (so no space is wasted).


The total number of segments allocated will then be the number needed for
the fixed and variable portions (1 or 2) plus the number needed for the
redo block buffers plus the number of segments needed for the database block
buffers. If this requires more than SS_SEG_MAX segments, an ORA-7334 is
raised.


Once the number of segments and their sizes is determined, Oracle
then allocates and attaches the segments one at a time; first the fixed
and variable portion segment(s), then the redo block buffer segment(s),
then the db block buffer segment(s). They will be attached non-contiguously,
with the first segment attached at the start address in "ksms.o" and following
segments being attached at the address equal to the attach address of the
previous segment plus the size of the previous segment, rounded up to a
mulitple of SHMBLA.


If Oracle receives an error allocating a shared memory segment, an ORA-7336 is
raised. If an error is raised on attaching a shared memory segement, Oracle
checks the system error returned. If it is EINVAL, normally another model
would be tried, but as there are no more models to try, an ORA-7310 is raised.
Other attach errors raise an ORA-7337.


At this point, we have either attached the entire SGA or returned an
ORA error. The total size of segments attached is exactly SGASIZE;
no space is wasted. Once Oracle has the shared memory attached, Oracle
proceeds to allocating the semaphores it requires.


Semaphore allocation
====================


Semaphore allocation is much simpler than shared memory. Oracle just
needs to allocate a number of semaphores equal to the processes parameter
in "init.ora". PROCESSES will be used to refer to this value. Note on
machines with a post-wait kernel extension, Oracle does not need to allocate
semaphores (because it doesn't need to implement its own post-wait mechanism).


Oracle uses semaphores to control concurrency between all the
background processes (pmon, smon, dbwr, lgwr, and oracle shadows).
Semaphores are also used to control two-task communication between
the user process and shadow process if the fast (shared memory)
driver is used. And in the Unix ports based on MIPS RISC
processors, Oracle uses a special semaphore to perform basic
test & set functions that are not provided by the processor.


Typing "ipcs -sb" will show you what semaphores are allocated to
your system at the moment. This will display all the semaphore
sets allocated, their identifying number, the owner, the number
of semaphores in each set, and more.


Occasionally, unexpected termination of Oracle processes will
leave semaphore resources locked. If your database is not
running, but "ipcs -sb" shows that semaphore sets owned by
oracle are still in use, then you need to deallocate (free)
them. If you don't do this, then you may not be able to allocate
enough semaphores later to restart your database.


Freeing semaphore sets is done with the "ipcrm" command. For
each set that oracle has allocated, type "ipcrm -s ID" where ID
is the set number you see from the "ipcs" output. Semaphores can
also be freed by rebooting the system.


ORA-7250, ORA-7279, ORA-27146
If the environment variable ORANSEMS is set, Oracle will use that value
as the number it will allocate per set. Oracle will attempt to allocate
one set of size ORANSEMS. If this fails, an ORA-7250 is raised. If
ORANSEMS is not set, Oracle tries to determine the maximum number of
semaphores allowed per set (SEMMSL). It does this by first trying to
allocate a set of PROCESSES semaphores. If this fails with EINVAL, it
tries again, this time trying to get one fewer semaphore. If this fails
an ORA-7279 or ORA-27146 on 8.1.X or higher is raised. This process
continues until either the semget() succeeds, or when the
number of semaphores Oracle is attempting to allocate drops to zero.
Increase the kernel parameter SEMMNS if an ORA-7279 or ORA-27146 is
generated.


ORA-7251
If the latter case occurs, an ORA-7251 will be raised. Now Oracle begins
allocating sets of size SEMMSL (or ORANSEMS, as the case may be) until it has
at least PROCESSES semaphores.


ORA-7252, ORA-7339
All semaphore sets will be the same size, so if PROCESSES is not a multiple
of SEMMSL (or ORANSEMS), there will be additional semaphores allocated that
will not be used (or in other words, PROCESSES/SEMMSL, rounded up, sets of
SEMMSL semaphores will be allocated). Should an error occur trying to
allocate a semaphore set, ORA-7252 will be raised. If more than SS_SEM_MAX
semaphore sets would be required, an ORA-7339 occurs.


At this point, Oracle has either allocated at least PROCESSES semaphores
or returned an ORA error. All IPC resources required by Oracle on Unix
have been allocated and the related information can be written into the
sgadef file for this instance for later use by other processes which connect
to the instance.


Connecting to an instance
=========================


All shadow processes, when starting, attempt to attach the SGA. Shadows
will be started whenever there is a logon attempt (the connect command
includes an implicit logoff, so it produces a new shadow). The only
exception is SQL*Dba in version 7 which immediately spawns a shadow process
and where connect commands do not spawn a new shadow. Also, since SQL*Dba
is used to start up the database, errors encountered in attempting to attach
the SGA will be discarded because the SGA may not have been allocated yet.
When a startup command is issued later, the SGA and semaphores will be
allocated. Note that this applies only to version 7 and sqldba.


What Oracle does when attempting to connect to the SGA depends on the
version of Oracle. In version 6, the "sgadef<SID>.dbf" file is used to
get the necessary information. In version 7, the SGA itself contains
the information about the shared memory and semaphores (how the
bootstrap works will be explained later). In either case, the
information stored is the same - the key, id, size, and attach
address of each shared memory segment and the key, id, and size of
each semaphore set. Note that we need not do anything special to
initialize the semaphores. We can use them with the data structure
we read in on connecting.


The version 6, approach is rather simple. It first tries to open the
"sgadef<SID>.dbf" file. If it cannot, an ORA-7318 is raised. Once
opened, the data written earlier on startup is read. If an error
occurs for some reason on the read, an ORA-7319 occurs. Once all the
data is read in, Oracle attaches each segment in turn.


First, it generates what it believes the key for the segment should be. It
then gets that segment, returning ORA-7429 if it fails. The key used
and the key stored are then compared. They should be equal, but if
not, an ORA-7430 occurs. Once the key is verified, the segment is
attached. A failure to attach the segment raises an ORA-7320. If
the segment is attached, but not at the address we requested, an
ORA-7321 occurs. This process is repeated for all segments until the
entire SGA is attached.


Version 7 differs only in the first part, when the shared memory and
semaphore data is read. Once that data is read in, Oracle proceeds in
the same manner. To fetch this data, Oracle generates what it thinks
should be the key for the first segment of the SGA and attaches it
as if it were the only segment. Once it is attached, the data is
copied from the SGA. With this data, Oracle attaches any remaining
segments for the SGA.


There is one possible problem. If somehow two instances have a key
collision (i.e. they both generate the same key for their first segment), it
is possible to only have one of the two instances up at a time! Connection
attempts to either one will connect a user to whichever instance is up.
This is rare, but can happen. Development is currently working on a better
key generation algorithm.


Note: See NOTE:399261.1 for information regarding 10g and newer releases
as a new feature known as NUMA optimization.



Attaching shared memory
=======================


As seen in previous sections, shared memory must be received (this may
mean allocating the shared memory, but not necessarily) and then
attached, to be used. Attaching shared memory brings the shared
memory into the process' memory space. There are some important
things about attach addresses. For one thing, they may need to be
alligned on some boundary (generally defined by SHMLBA). More
importantly, shared memory must mapped to pages in the process'
memory space which are unaccounted for. Every process already has a
text, a data, and a stack segment laid out as follows (in general):


+---------+ high addresses
| stack |
|---------| -+
| | | |
| v | |
|---------| |
| shm seg | |- unused portion
|---------| | These are valid pages for shared memory
| ^ | | Pages are allocated from this area
| | | | as both the stack and heap(data) grow
|---------| -+
| data |
|---------|
| text |
+---------+ low addresses


So, valid attach addresses lie in the unused region between the stack
and the data segments (a shared memory segment is drawn in the
diagram to aid in visualization - not every process has shared memory
attached!). Of course, the validity also depends on the
size of the segment, since it cannot overlap another segment. Note
that both the stack and data segments can grow during the life of a
process. Because segments must be contiguous and overlapping is not
allowed, this is of some importance.


Attaching shared memory creates a limit on how much the stack or data segment
can grow. Limiting the stack is typically not a problem, except when running
deeply recursive code. Neither is limiting the data segment, but this does
restrict the amount memory that can be dynamically allocated by a
program. It is possible (but seldom) that some applications
running against the database may hit this limit in the shadow (since
the shadow has the SGA attached). This is the cause of ORA-7324 and
ORA-7325 errors. How to deal with these is discussed in the
troubleshooting section.


The SGA is attached, depending on the allocation model used, more or
less contiguously (there may be gaps, but those can be treated as if
they were part of the shared memory). So where the beginning of the
SGA can be attached depends on the SGA's size. The default address
which is chosen by Oracle is generally sufficient for most SGAs.
However, it may be necessary to relocate the SGA for very large
sizes. It may also need to be changed if ORA-7324 or ORA-7325 errors
are occuring. The beginning attach address is defined in the file
"ksms.s". Changing the attach address requires recompilation of the
Oracle kernel and should not be done without first consulting Oracle
personnel. Unfortunately, there is no good way to determine what a good
attach address will be.


When changing the address to allow a larger SGA, a good rule of thumb is
taking the default attach address in "ksms.s" and subtracting the size of
the SGA. The validity of an attach address can be tested with the Oracle
provided tstshm executable. Using:


tstshm -t <size of SGA> -b <new attach address>

will determine if the address is usable or not.


Troubleshooting
===============


Errors which might have multiple causes are discussed in this
sections. Errors not mentioned here generally have only one cause
which has a typically obvious solution.


ORA-7306, ORA-7336, ORA-7329
Oracle received a system error on a shmget() call. The system error
should be reported. There are a few possibilities:


1) There is insufficient shared memory available. This is
indicated by the operating system error ENOSPC. Most likely, SHMMNI
is too small. Alternatively, there may shared memory already
allocated; if it is not attached, perhaps it can be freed. Maybe
shared memory isn't configured in the kernel.


2) There is insufficient memory available. Remember, shared memory
needs pages of virtual memory. The system error ENOMEM indicates there
is insufficient virtual memory. Swap needs to be increased, either by
adding more or by freeing currently used swap (i.e. free other shared
memory, kill other processes)


3) The size of the shared memory segment requested is invalid. In this
case, EINVAL is returned by the system. This should be very rare - however,
it is possible. This can occur if SHMMAX is not a mulitple of page
size and Oracle is trying a multi-segment model. Remember that Oracle
rounds its calculation of SHMMAX to a page boundary, so it may have
rounded it up past the real SHMMAX! (Whether this is a bug is
debatable).


4) The shared memory segment does not exist. This would be indicated
by the system error ENOENT. This would never happen on startup; it
only would happen on connects. The shared memory most likely has been
removed unexpectedly by someone or the instance is down.


ORA-7307, ORA-7337, ORA-7320
Oracle received a system error on a shmat() call. The system should be
reported. There a a few possibilities:


1) The attach address is bad. If this is the cause, EINVAL is returned
by the system. Refer to the section on the attach address to see why
the attach address might be bad. This may happen after enlarging the
SGA.


2) The permissions on the segment do not allow the process to attach
it. The operating system error will be EACCES. Generally the cause of
this is either the setuid bit is not turned on for the oracle
executable, or root started the database (and happens to own the shared
memory). Normally, this would be seen only on connects.


3) The process cannot attach any more shared memory segments. This
would be accompanieed by the system error EMFILE. SHMSEG is too
small. Note that as long as SHMSEG is greater than SS_SEG_MAX, you
should never see this happen.


ORA-7329, ORA-7334
Oracle has determined the SGA needs too many shared memory segments. Since you
can't change the limit on the number of segments, you should instead increase
SHMMAX so that fewer segments are required.


ORA-7339
Oracle has determined it needs too many semaphore sets. Since you
can't change the limit on the number of semaphore sets, you should
increase SEMMSL so fewer sets are required.


ORA-7250, ORA-7279, ORA-7252, ORA-27146
Oracle received a system error on a semget() call. The system error should be
reported. There should be only one system error ever returned with
this, ENOSPC. This can mean one of two things. Either the system
limit on sempahore sets has been reached or the system limit on the
total number of semaphores has been reached. Raise SEMMNI or SEMMNS,
as is appropriate, or perhaps there are some semaphore sets which can
be released. In the case of ORA-7250, ORANSEMS may be set too high
(>SEMMSL). If it is, raise SEMMSL or decrease ORANSEMS.


ORA-7251
Oracle failed to allocate even a semaphore set of only one semaphore. It is
likely that semaphores are not configured in the kernel.


ORA-7318
Oracle could not open the sgadef file. The system error number will be
returned. There are a few possible causes:


1) The file doesn't exist. In this case, the system error ENOENT is
returned. Maybe ORACLE_SID or ORACLE_HOME is set wrong so that Oracle
is looking in the wrong place. Possibly the file does not exist (in this
case, a restart is necessary to allow connections again).


2) The file can't be accessed for reading. The operating system error returned
with this is EACCES. The permissions on the file (or maybe
directories) don't allow an open for reading of the sgadef file. It
might not be owned by the oracle owner. The setuid bit might not be
turned on for the oracle executable.


ORA-7319
Oracle did not find all the data it expected when reading the
sgadef<SID>.dbf file. Most likely the file has been truncated. The
only recovery is to restart the instance.


ORA-7430
Oracle expected a key to be used for the segment which does not match the
key stored in the shared memory and semaphore data structure. This probably
indicates a corruption of the sgadef file (in version 6) or
the data in the first segment of the SGA (in version 7). A restart of
the instance is probably necessary to recover in that case. It may
also be a key collision problem and Oracle is attached to the wrong
instance.


ORA-7321
Oracle was able to attach the segment, but not at the address it
requested. In most cases, this would be caused by corrupted data in
the sgadef file (in version 6) or the first segment of the SGA (in
version 7). A restart of the database may be necessary to recover.


ORA-7324, ORA-7325
Oracle was unable to allocate memory. Most likely, the heap (data
segment) has grown into the bottom of the SGA. Relocating the SGA to a
higher attach address may help, but there may be other causes. Memory
leaks can cause this error. The init.ora parameter sort_area_size may be
too large, decreasing it may resolve the error. The init.ora parameter
context_incr may also be too large, decreasing it may resolve this


ORA-7264, ORA-7265
Oracle was unable to decrement/increment a semaphore. This generally
is accompanied by the system error EINVAL and a number which is the
identifier of the semaphore set. This is almost always because the
semaphore set was removed, but the shadow process was not aware of it
(generally due to a shutdown abort or instance crash). This error
is usually ignorable.


System Parameters
=================


SHMMAX - kernel parameter controlling maximum size of one shared memory
segment
SHMMNI - kernel parameter controlling maximum number of shared memory segments
in the system
SHMSEG - kernel parameter controlling maximum number of shared memory segments
a process can attach
SEMMNS - kernel parameter controlling maximum number of semphores in
the system
SEMMNI - kernel parameter controlling maximum number of semaphore
sets. Semphores in Unix are allocated in sets of 1 to SEMMSL.
SEMMSL - kernel parameter controlling maximum number of semaphores in a
semphore set.
SHMLBA - kernel parameter controlling alignment of shared memory
segments; all segments must be attached at multiples of this value.
Typically, non-tunable.



System errors
=============


ENOENT - No such file or directory, system error 2
ENOMEM - Not enough core, system error 12
EACCES - Permission denied, system error number 13
EINVAL - Invalid argument, system error number 22
EMFILE - Too many open files, system error number 24
ENOSPC - No space left on device, system error number 28


Oracle parameters
=================

SS_SEG_MAX - Oracle parameter specified at compile time (therefore,
unmodifiable without an Oracle patch) which defines maximum
number of segements the SGA can reside in. Normally set to 20.
SS_SEM_MAX - Oracle parameter specified at compile time (therefore,
unmodifiable without an Oracle patch) which defined maximum
number of semaphore sets oracle will allocate. Normally set to 1



=================