Log Gap Detection and Resolution
by Arif Malik, Sr. Database Administrator
Contents
- Introduction
- Primary Database Heartbeat
- Determining if an Archive Gap Exists
- Using FAL_CLIENT and FAL_SERVER
- Further Reading
- About the Author
Introduction
Archive gaps are simply a range of archived redo logs that were created at a time when the standby database was not available to receive them as they were being generated by the primary database. These archive gaps occur most often during network unavailability between the primary and standby database. During network outages like this, the standby database fails to receive any redo data from the primary database. When the network failure has been resolved, automatic transmission of redo data from the primary to the standby database is resumed. The missing archived redo logs signify the gap.
Primary Database Heartbeat
In general, it is the primary database that first discovers there is a gap. Every one minute, the primary database polls all of its standby databases to determine if there is a gap in the sequence of archived redo logs. This polling between the primary and standby database is often referred to as the heartbeat. This heartbeat is performed by the primary database serially. The DBA can always determine if there is a gap in the archived redo logs by querying the V$ARCHIVE_GAP view (from the physical standby database) as described in the next section.
Determining if an Archive Gap Exists
As mentioned in the previous section, the DBA can determine if there is a there is a gap in the archived redo logs by querying the V$ARCHIVE_GAP view from the physical standby database. Let's walk through a quick example of how to manually check and resolve a gap in the archived redo logs.
We start with the following assumptions:
- The Data Guard configuration consists of a physical primary database and one physical standby database.
- The Data Guard configuration is set for Maximum Performance.
- The name of the physical primary database is TESTDB.IDEVELOPMENT.INFO and resides on the node vmlinux3.idevelopment.info.
- The name of the physical standby database is TESTDB.IDEVELOPMENT.INFO and resides on the node vmlinux4.idevelopment.info.
- The primary database is configured to send redo to the physical standby database by defining: log_archive_dest_2='service=TESTDB_VMLINUX4 optional reopen=15'
- Neither FAL_CLIENT nor FAL_SERVER is defined on the physical standby database.
- The standby database is in managed recovery mode and is current (all archived redo logs have been applied) with the primary database. The current log sequence on the primary database is 16.
With the above assumptions, perform a query of the V$ARCHIVE_GAP view from the physical standby database:
SQL> select * from v$archive_gap;
no rows selected
|
From the output of the above query, we can conclude that there are no gaps in the archived redo logs on the physical standby database!
Now, let's simulate a network failure by shutting down the Oracle TNS Listener process and physical standby database on vmlinux4.idevelopment.info:
$ lsnrctl stop
$ sqlplus "/ as sysdba" <
|
At this point, the physical standby database is completely down and unavailable to the primary database. One of the first things to notice is a series of RFS errors in the alert.log for the primary database.
Errors in alert.log from the Primary Database
...
Sat Jul 22 23:19:04 2006
ARC1: Evaluating archive log 3 thread 1 sequence 16
ARC1: Beginning to archive log 3 thread 1 sequence 16
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
Creating archive destination LOG_ARCHIVE_DEST_1: '/u02/oraarchive/TESTDB/arch_t1_s16.dbf'
ARC1: Completed archiving log 3 thread 1 sequence 16
Sat Jul 22 23:21:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
Sat Jul 22 23:22:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
Sat Jul 22 23:23:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
...
|
We can then further investigate the trace file (testdb_arc1_3495.trc) written to the alert.log above.
Errors in testdb_arc1_3495.trc from the Primary Database
/u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc
Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
ORACLE_HOME = /u01/app/oracle/product/9.2.0
System name: Linux
Node name: vmlinux3
Release: 2.6.9-22.EL
Version: #1 Sat Oct 8 17:48:27 CDT 2005
Machine: i686
Instance name: TESTDB
Redo thread mounted by this instance: 0
|
Now let's perform a few log switches from the primary database:
SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
|
Next, bring up the Oracle TNS Listener process and mount the physical standby database:
$ lsnrctl start
$ sqlplus "/ as sysdba"
SQL> startup nomount
SQL> alter database mount standby database;
SQL> alter database recover managed standby database disconnect from session;
|
After a minute has passed, the primary database (the ARCH1 process) will wake up and attempt to ping the physical standby database. Once it is determined that the physical standby database is mounted (and the TNS Listener process is up), it will start to push any archived redo logs to the standby database. This can be see in the alert.log from the primary database:
RFS Process Recovery in alert.log from the Primary Database
...
Sat Jul 22 23:28:26 2006
ARC1: Begin FAL archive (thread 1 sequence 17 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 17 destination TESTDB_VMLINUX4)
ARC1: Begin FAL archive (thread 1 sequence 18 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 18 destination TESTDB_VMLINUX4)
ARC1: Begin FAL archive (thread 1 sequence 19 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 19 destination TESTDB_VMLINUX4)
...
|
Finally, let's query the V$ARCHIVE_GAP view from the physical standby database to determine if there are any gaps:
SQL> select * from v$archive_gap;
no rows selected
|
As we can see, there are no gaps in the archive redo logs being reported!
Using FAL_CLIENT and FAL_SERVER
As mentioned throughout this article, we can see that Data Guard provides gap detection and resolution automatically by functionality provided in the primary database. The primary database polls its standby databases using a heartbeat every one minute. No extra configuration settings are required and no intervention is needed by the DBA to detect and resolve these gaps. In addition to this built-in functionality; however, it is also possible to configurelog apply services to take part in automatically resolving archive gaps as they occur on the physical standby database side.
FAL_CLIENT and FAL_SERVER are initialization parameters used to configure log gap detection and resolution at the standby database side of a physical database configuration. This functionality is provided by log apply servicesand is used by the physical standby database to manage the detection and resolution of archived redo logs.
- FAL_CLIENTSpecifies the TNS network services name for the standby database (which is sent to the FAL server process by log apply services) that should be used by the FAL server process to connect to the standby database. The syntax would be:
FAL_CLIENT=
- FAL_SERVERSpecifies the TNS network service name that the standby database should use to connect to the FAL server process. The syntax would be:
FAL_SERVER=
Note that the value provided for FAL_SERVER is not limited to defining only one FAL server; it can define multiple FAL servers (possible a FAL server that is running on another standby database) that are separated by commas as in the following example:FAL_SERVER='primary_db','standby_db1','standby_db2','standby_db3'
FAL server is a background process that is generally located on the primary database server (however it can also be found on another standby database). FAL server is responsible for servicing incoming requests from the FAL client.
No comments:
Post a Comment