DataGuard
Data Guard Cheatsheet
Terminology
Log Files
Data Guard Broker
There are a number of commands that you can use to change the state of the database
Redo Processing
Logical Standby
Monitoring
Switchover, Failover and FSFO
Primary database | A production database |
Standby database | A database that can become the primary database, should the primary fail |
EOR | End Of Redo |
LWGR | Log Writer process |
LNS | Log Network Server |
ORL | Online Redo Log |
RFS | Remote File Server |
SRL | Standby Redo Log file |
SYNC and ASYNC | Synchronous and Asynchronous |
DG alert Log | drc<db_unique_name>.log |
Alert Log | alert_<SID>.log |
Logfile locations | # change the instance name to reflect the one you have choosen and the path you installed oracle prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.logprod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log prod1dr (DG log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/drcPROD1DR.log |
identify log files | ## You can get the log locations from the below view col name for a25 col value for a65; select name, value from v$diag_info; |
Create base configuration | # Primary Database server DGMGRL> create configuration prod1 as > primary database is prod1 > connect identifier is prod1; Configuration "prod1" created with primary database "prod1" |
Add the standby database |
# Primary Database server - if you have setup db_unique_name, tnsname and log_archive_dest_n
DGMGRL> add database prod1dr; # Primary Database server - the full command set DGMGRL> connect sys/password DGMGRL> add database prod1dr > as connect identifier is prod1dr > maintained as physical;
Database "prod1dr" added
|
Display configuration | DGMGRL> show configuration |
Display Database | DGMGRL> show database verbose prod1 |
Enabling the configuration | # Primary Database server DGMGRL> enable configuration Enabled. |
Edit configuration | EDIT CONFIGURATION SET PROPERTY <name>=<value> EDIT DATABASE <db_name> SET PROPERTY <name>=<value> EDIT INSTANCE <in_name> SET PROPERTY <name>=value> There are many options see the broker section for more information |
Troubleshooting (Monitoring commands and log files)
| |
configuration | DGMGRL> show configuration; |
database | DGMGRL> show database prod1; DGMGRL> show database prod1dr; # There are a number of specific information commands, here are the most used DGMGRL> show database prod1 statusreport; DGMGRL> show database prod1 inconsistentProperties; DGMGRL> show database prod1 inconsistentlogxptProps; DGMGRL> show database prod1 logxptstatus; DGMGRL> show database prod1 latestlog; |
Logfiles | # change the instance name to reflect the one you have choosen prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.logprod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log prod1dr (DG log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/drcPROD1DR.log |
turn off/on the redo transport service for all standby databases |
Primary
| DGMGRL> edit database prod1 set state=transport-off; DGMGRL> edit database prod1 set state=transport-on; |
turn off/on the apply state |
Standby
| DGMGRL> edit database prod1dr set state=apply-off; DGMGRL> edit database prod1dr set state=apply-on; |
put a database into a real-time query mode |
Standby
| DGMGRL> edit database prod1dr set state=apply-off; sql> alter database open read only; DGMGRL> edit database prod1dr set state=apply-on; |
change the protection mode |
Primary
| # Choose what level of protection you require sql> alter database set standby to maximize performance; sql> alter database set standby to maximize availability; sql> alter database set standby to maximize protection; # display the configuration DGMGRL> show configuration |
Redo Processes (Primary and Standby Databases)
| |
Processes | There are a number of Oracle background processes that play a key role, first the primary database
|
Real-Time Apply
| |
Enable real-time apply | sql> alter database recover managed standby database using current logfile disconnect; |
Determine if real-time apply is enabled | sql> select recovery_mode from v$archive_dest_status where dest_id = 2; RECOVERY_MODE -------------------------- MANAGED REAL-TIME APPLY |
Tools and views to monitor redo
| |
Background processes | select process, client_process, thread#, sequence#, status from v$managed_standby; ## primary (example) PROCESS CLIENT_P THREAD# SEQUENCE# STATUS --------- -------- ---------- ---------- ------------ ARCH ARCH 1 58 CLOSING ARCH ARCH 0 0 CONNECTED ARCH ARCH 1 59 CLOSING ARCH ARCH 1 56 CLOSING LNS LNS 1 60 WRITING LNS LNS 1 60 WRITING ## physical standby (example) PROCESS CLIENT_P THREAD# SEQUENCE# STATUS --------- -------- ---------- ---------- ------------ ARCH ARCH 0 0 CONNECTED ARCH ARCH 1 55 CLOSING ARCH ARCH 0 0 CONNECTED ARCH ARCH 1 59 CLOSING RFS N/A 0 0 IDLE RFS UNKNOWN 0 0 IDLE RFS UNKNOWN 0 0 IDLE RFS LGWR 1 60 IDLE MRP0 N/A 1 60 APPLYING_LOG ## Logical standby (example) PROCESS CLIENT_P THREAD# SEQUENCE# STATUS --------- -------- ---------- ---------- ------------ ARCH ARCH 1 55 CLOSING ARCH ARCH 1 10 CLOSING ARCH ARCH 0 0 CONNECTED ARCH ARCH 0 0 CONNECTED RFS UNKNOWN 0 0 IDLE RFS LGWR 1 60 IDLE RFS UNKNOWN 0 0 IDLE RFS UNKNOWN 0 0 IDLE |
Information on Redo Data | select * from v$dataguard_stats; Note: this indirectly shows how much redo data could be lost if the primary db crashes |
Redo apply rate | select to_char(snapshot_time, 'dd-mon-rr hh24:mi:ss') snapshot_time, thread#, sequence#, applied_scn, apply_rate from v$standby_apply_snapshot; Note: this command can only run when the database is open |
Recovery operations | select to_char(start_time, 'dd-mon-rr hh24:mi:ss') start_time, item, round(sofar/1024,2) "MB/Sec" from v$recovery_progress where (item='Active Apply Rate' or item='Average Apply Rate'); |
schema that are not maintained by SQL apply | select owner from dba_logstdby_skip where statement_opt = 'INTERNAL SCHEMA' order by owner; Note: system and sys schema are not replicated so don't go creating tables in these schemas, the above command should return about 17 schemas (Oracle 11g) that are replicated. |
Check tables with unsupported data types | select distinct owner, table_name from dba_logstdby_unsupported; select owner, table_name from logstdby_unsupported_tables; |
skip replication of tables | ## Syntax dbms_logstdby.skip ( stmt in varchar2, schema_name in varchar2 default null, object_name in varchar2 default null, proc_name in varchar2 default null, use_like in boolean default true, esc in char1 default null ); ## Examples execute dbms_logstdby.skip(stmt => 'DML', schema_name => 'HR', object_name => 'EMPLOYEE'); execute dbms_logstdby.skip(stmt => 'SCHEMA_DDL', schema_name => 'HR', object_name => 'EMPLOYEE'); # skip all DML operations execute dbms_logstdby.skip(stmt => 'DML', schema_name => 'HR', object_name => '%'); |
revoke a skipped table | stop SQL apply execute dbms_logstdby.instantiate_table(schema_name => 'HR', table_name => 'EMPLOYEE', DBLINK => 'INSTANTIATE_TABLE_LINK'); execute dbms_logstdby.skip(stmt => 'DML', schema_name => 'HR', object_name => 'EMPLOYEE'); start SQL apply Note: the dblink should point to the primary database, we have to stop SQL apply as the instantiate table procedure uses Oracle's data pump network interface to lock the source table to obtain the SCN at the primary database, it then releases the lock and gets a consistent snapshot of the table from the primary database, it remembers the SCN associated with the consistent snapshot. |
display what tables are being skipped | select owner, name, use_like, esc from dba_logstdby_skip where statement_opt = 'DML'; |
setting the guard on a database | alter database guard standby; |
Inside SQL Apply
| |
List the above processes | select * from v$logstdby_process |
Increase the LCR cache size | # Set the cache size to 200MB execute dbms_logstdby.apply_set('MAX_SGA', 200); |
How much LCR cache is being used | select used_memory_size from v$logmnr_session where session_id = (select value from v$logstdby_stats where name = 'SESSION_ID'); |
setting SQL apply mode for the application | execute dbms_logstdby.apply_set (name => 'PRESERVE_COMMIT_ORDER', value => FALSE); |
Determine the number of DDL statements since the last restart | select name, value from v$logstdby_stats where name = 'DDL TXNS DELIVERED'; NAME VALUE ------------------------------------------------------------------------ DDL TXNS DELIVERED 510 |
displaying the barrier | select status_code as sc, status from v$logstdby_process where type = 'BUILDER'; sc status ------------------------------------------------------------------------------------- 44604 BARRIER SYNCHRONIZATION ON DDL WITH XID 1.15.256 (WAITING ON 17 TRANSACTIONS) |
Tuning SQL Apply
| |
MAX_SERVERS | # Set the MAX_SERVERS to 8 x the number of cores execute dbms_logstdby.apply_set ('MAX_SERVERS', 64); |
MAX_SGA | # Set the MAX_SGA to 200MB execute dbms_logstdby.apply_set ('MAX_SGA', 200); |
_HASH_TABLE_SIZE | # Set the Hash table size to 10 million execute dbms_logstdby.apply_set ('_HASH_TABLE_SIZE', 10000000); |
DDL | defer DDLs to off-peak hours |
Preserve commit order | # Set the PERSERVE_COMMIT_ORDER to false execute dbms_logstdby.apply_set (name => 'PRESERVE_COMMIT_ORDER', value => FALSE); |
lagging SQL Apply | # apply lag: indicates how current the replicated data at the logical standby is # transport lag: indicates how much redo data that has already been generated is missing at the logical # standby in term of redo records select name, value, unit from v$dataguard_stats; |
SQL Apply component bottleneck | select name, value from v$logstdby_stats where name like 'TRASNACTIONS%'; Name Value ----------------------------------------------------------------------------------------------------- TRANSACTIONS APPLIED 3764 TRANSACTIONS MINED 4985 The mined transactions should be about twice the applied transaction, if this decreases or staying at a low value you need to start looking at the mining engine. |
Make sure all preparers are busy | select count(1) as idle_preparers from v$logstdby_process where type = 'PREPARER' and STATUS_CODE = 16166; IDLE_PREPARER ---------------------------- 0 |
Make sure the peak size is well below the amount allocated | select used_memory_size from v$logstdby_session where session_id = (select value from v$logstdby_stats where name = 'LOGMINER SESSION ID'); USED_MEMORY_SIZE ---------------------------- 32522244 |
verify that the preparer does not have enough work for the applier processes | select (available_txn - pinned_txn) as pipleline_depth from v$logstdby_session where session_id (select value from v$lostdby_stats where name = 'LOGMINER SESSION ID'); PIPELINE_DEPTH ---------------------------- 8 select count(*) as applier_count from v$logstdby_process where type = 'APPLIER'; APPLIER_COUNT ---------------------------- 20 |
Setting max_servers and preparers | execute dbms_logstdby.apply_set('MAX_SERVERS', 36); execute dbms_logstdby.apply_set('PREPARE_SERVERS', 3); |
display the pageout activity | ## Run this first select name, value from v$logstdby_stats where name line '%PAGE%' or name like '%UPTIME' or name like '%IDLE%'; ## Run the second time about 10 mins later select name, value from v$logstdby_stats where name line '%PAGE%' or name like '%UPTIME' or name like '%IDLE%'; Now subtract one from the other and work out the percentage rate, if pageout has increase above 5% then increase the MAX_SERVERS |
unassigned large transactions | ## By default SQL apply should be one-sixth of the number of applier processes select (available_txn - pinned_txn) as pipleline_depth from v$logstdby_session where session_id (select value from v$lostdby_stats where name = 'LOGMINER SESSION ID'); PIPELINE_DEPTH ---------------------------- 256 select count(1) as idle_applier from v$logstdby_process where type = 'APPLIER' and statuscode = 16166; IDLE_APPLIER --------------------------- 12 ## Now look for the unassigned large transactions select value from v$logstdby_stats where name = 'LARGE TXNS WAITING TO BE ASSIGNED'; VALUE --------------------------- 12 |
archive gap logs | # Use the thread# when using RAC an detect missing sequences select thread#, low_sequence#, high_sequence# from v$archive_gap; |
delays in redo transport | select max(sequence#), thread# from v$archived_log group by thread#; ## you can use the dg_archivelog_monitor.sh script, which accepts three parameters, primary, physical ## and the archive log threshold (# of archive logs) dg_archivelog_monitor.sh <primary> <standby> <threshold> |
Identify the missing logs on the primary | ## On the primary run the below select L.thread#, L.sequence# from (select thread#, sequence# from v$archived_log where dest_id=1) L where L.sequence# not in (select sequence# from v$archived_log where dest_id=2 and thread# = L.thread#); |
apply rate and active monitoring | select to_char(start_time, 'DD-MON-RR HH24:MI:SS') start_time, item , sofar from v$recovery_progress where item in ('Active Apply Rate', 'Average Apply Rate', 'Redo Applied'); Note: the redo applied is measured in megabytes, while the average apply rate and the active apply rate is measured in kilobytes. |
transport and apply lag | col name for a13 col value for a13 col unit for a30 set lines 132 select name, value, unit, time_computed from v$dataguard_stats where name in ('transport lag', 'apply lag'); ## use the dg_time_lag.ksh script dg_time_lag.ksh |
Viewing the status of the managed recovery process | col client_pid for a10; select pid, process, status, client_process, client_pid, thread#, sequence#, block#, blocks from v$managed_standby; |
Complete Switchover | ## Start the switcover on the original primary alter database commit to switchover to standby; ## On the new primary complete the switchover alter database commit to switchover to primary; ## Now open the database on the new primary alter database open; |
Complete Failover | ## Start the failover alter database commit to switchover to primary; # Change the level of protection that you require sql> alter database set standby to maximize performance; sql> alter database set standby to maximize availability; sql> alter database set standby to maximize protection; |
Broker switchover | DGMGRL> switchover to prod1lr |
Action |
Step
| Commands |
check redo has been received |
1
| ## check the syn status, it should say yes (run on the standby) sql> select db_unique_name, protection_mode, synchronization_status, synchronized from v$archive_dest_status ## if it says NO then lets make further checks (run on the standby) sql> select client_process, process, sequence#, status from v$managed_standby; ## now check on the primary we should be one in front (run on the primary) sql> select thread#, sequence#, status from v$log; Note: if using a RAC environment make sure you check each instance |
check that redo has been applied (physical) |
2
| ## check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you need to ## start the apply process, you also may see the status of wait_for_gap so wait until the gap have been ## resolved first sql> select client_process, process, sequence#, status from v$managed_standby; |
check that redo has been applied (logical) |
3
| ## if you are using a logical standby then you need to check the following to confirm the redo has been ## applied sql> select applied_scn, latest_scn, mining_scn from v$logstdby_progress; ## if the mining scn is behind you may have a gap check this by using the following sql> select status from v$logstdby_process where type = 'READER'; |
show any running jobs or backups |
4
| sql> select process, operation, r.status, mbytes_processed pct, s.status from v$rman_status r, v$session s where r.sid=s.sid; |
increase logging level (if required) |
5
| sql> alter system set log_archive_trace=8129; ## to turn it off again sql> alter system set log_archive_trace=0; |
check for active sessions |
6
| ## Display the active sessions sql> select program, type from v$session where type='USER'; |
check the switchover status |
7
| ## make sure the status is "to standby", if you get "sessions active", then stop those sessions (see above command to identify ## sessions) sql> select switchover_status from v$database; |
tail the log alert log file |
8
| tail alert??.log |
switchover (primary) |
9
| ## on the primary, after this command completes you will have two physical standbys sql> alter database commit to switchover to physical standby with session shutdown; Note: at this point if you want to rollback this switchover see my troubleshooting section to get ot back to normal |
check the switchover status |
10
| sql> select switchover_status from v$database; |
complete the switchover (physical) |
11
| sql> alter database commit to switchover to primary with session shutdown; |
open the new primary |
12
| sql> alter database open; |
finish off the old primary |
13
| sql> shutdown immediate; sql> startup mount; sql> alter database recover managed standby database using current logfile disconnect; |
Action |
Step
| Commands |
check redo has been received |
1
| ## check the syn status, it should say yes (run on the standby) sql> select db_unique_name, protection_mode, synchronization_status, synchronized from v$archive_dest_status ## if it says NO then lets make further checks (run on the standby) sql> select client_process, process, sequence#, status from v$managed_standby; ## now check on the primary we should be one in front (run on the primary) sql> select thread#, sequence#, status from v$log; Note: if using a RAC environment make sure you check each instance |
check that redo has been applied (physical) |
2
| ## check that MRP (applying_log) matches the RFS process, if the MRP line is missing then you need to ## start the apply process, you also may see the status of wait_for_gap so wait until the gap have been ## resolved first sql> select client_process, process, sequence#, status from v$managed_standby; |
check that redo has been applied (logical) |
3
| ## if you are using a logical standby then you need to check the following to confirm the redo has been ## applied sql> select applied_scn, latest_scn, mining_scn from v$logstdby_progress; ## if the mining scn is behind you may have a gap check this by using the following sql> select status from v$logstdby_process where type = 'READER'; |
show any running jobs or backups |
4
| sql> select process, operation, r.status, mbytes_processed pct, s.status from v$rman_status r, v$session s where r.sid=s.sid; |
increase logging level (if required) |
5
| sql> alter system set log_archive_trace=8129; ## to turn it off again sql> alter system set log_archive_trace=0; |
check for active sessions |
6
| ## Display the active sessions sql> select program, type from v$session where type='USER'; |
check the switchover status |
7
| ## make sure the status is "to standby", if you get "sessions active", then stop those sessions (see above command to identify ## sessions) sql> select switchover_status from v$database; |
tail the log alert log file |
8
| tail alert??.log |
Prepare the primary standby |
9
| sql> alter database prepare to switchover to logical standby; ## confirm that the prepare has started to happen, you should now see "preparing switchover" sql> select switchover_status from v$database; |
Prepare the logical standby |
10
| sql> alter database prepare to switchover to primary; ## confirm that the prepare has started to happen, you should see "preparing dictionary" sql> select switchover_status from v$database; ## wait a while until the dictionary is built and sent and you should see "preparing switchover" sql> select switchover_status from v$database; |
Check primary database state |
11
| ## you should now see its in the state of "to logical standby" sql> select switchover_status from v$database; |
the last chance to CANCEL the switchover (no going back after this) |
12
| ## On the primary sql> alter database prepare to switchover cancel; ## on the logical sql> alter database prepare to switchover cancel; |
switchover the primary to a logical standby |
13
| sql> alter database commit to switchover to logical standby; |
switchover the logical standby to a primary |
14
| ## check that its ready to become the primary, you should see "to primary" sql> select switchover_status from v$database ## Complete the switchover sql> alter database commit to standby to primary; |
start the apply process |
15
| sql> alter database start logical standby apply immediate; |
Action |
Step
| Commands |
Check redo applied |
1
| ## This will tell you the lag time select name, value, time_computed from v$dataguard_stats where name like '%lag%'; ## You can also use the SCN number select thread#, sequence#, last_change#, last_time from v$standby_log; |
the failover process (physical standby) |
2
| ## Start by telling the apply process that this standby is going to be the new primary, and to apply all ## the redo that it has alter database recover managed standby database cancel; alter database recover managed standby database finish; ## At this point the protection mode is lowered select protection_mode from v$database; ## Now issue the switchover command and then open the database alter database commit to switchover to primary with session shutdown; alter database open; ## Startup the other RAC instances if using RAC ## You can then raise the protection mode (if desired) set standby database to maximum protection; |
the failover process (logical standby) |
2
| alter database activate logical standby database finish apply; |
Action |
Step
| Commands |
bring back the old primary (physical standby) |
1
| ## Since redo is applied by SCN we need he failover SCN from the new primary select to_char(standby_became_primary_scn) failover_scn from v$database; FAILOVER_SCN ----------------------------------------------- 7658841 ## Now flashback the old primary to this SCN and start in mount mode startup mount; flashback database to scn 7658841; alter database convert to physical standby; shutdown immediate; startup mount; ## hopefully the old primary will start to resolve any gap issues at the next log switch, which means we can start the MRP ## process to get this standby going to catchup as fast as possible alter database recover managed standby database using current logfile disconnect; ## eventually the missing redos will be sent to the standby and applied, bring us back to synchronization again. |
bring back the old primary (logical standby) |
2
| ## again we need to obtained the SCN select merge_change# as flashback_scn, processed_change# as recovery_scn from dba_logstdby_history where stream_sequence# = (select max(stream_sequence#)-1 from dba_logstdby_history); flashback_scn recovery_scn --------------------------------------------------------- 7658941 7659568 ## Now flashback the old primary to this SCN and start in mount mode startup mount; flashback database to scn 7658841; alter database convert to physical standby; shutdown immediate; startup mount; ## Now we need to hand feed the archive logs from the primary to the standby (old primary) into the MRP ## process, so lets get those logs (run on the primary) select file_name from dba_logstdby_log where first_changed# <= recovery_scn and next_change# > flashback_scn; ## Now you will hopefully have a short list of the files you need, now you need to register them with ## the standby database (old primary) alter database register logfile '<files from above list>'; ## Now you can recover up to the SCN but not including the one you specify recover managed standby database until change 7659568; ## Now the standby database becomes a logical standby as up to this point it has been a physical one. alter database active standby database; ## Lastly you need tell your new logical standby to ask the primary for a new copy of the dictionary and ## all the redo in between. The SQL Apply will connect to the new primary using the database link and ## retrieve the LogMiner dictionary, once the dictionary has been built, SQL Apply will apply all the ## redo sent from the new primary and get itself synchronized create public database link reinstatelogical connect to system identified by password using 'service_name_of_new_primary_database'; alter database start logical standby apply new primary reinstatelogical; |
Use the Broker to bring back the old Primary
| ||
Use the broker to do it all for you |
n/a
| DGMGRL> failover to prod1dr; DGMGRL> reinstate database prod1; |
Fast Start Failover (FSFO)
| |
Monitor a specific condition via the Broker | DGMGRL> enable fast_start failover condition "Corrupted Controlfile"; DGMGRL> enable fast_start failover condition "Datafile Offline"; |
Display conditions that are be monitored | DGMGRL> show fast_start failover; |
Select the standby to become the primary | DGMGRL> edit database prod1 set property FastStartFailoverTarget = 'prod1dr'; DGMGRL> edit database prod1dr set property FastStartFailoverTarget = 'prod1'; |
change threshold | DGMGRL> edit configuration set property FastStartFailoverTargetThreshold = 45; |
lag limit | DGMGRL> edit configuration set property FastStartFailoverLagLimit = 60; |
abort primary if in a hung state | DGMGRL>edit configuration set property FastStartFailoverPmyShutdown = true; |
reinstate primary after a failover | DGMGRL>edit configuration set property FastStartFailoverAutoReinstate = true; |
Enable FSFO | DGMGRL> enable fast_start failover; ## Display the configuration DGMGRL> show fast_start failover; |
No comments:
Post a Comment