Anotomy of an RMAN TSPITR restore which I didn’t do

Wow, that’s a handful — “restore which I didn’t do“, alright then why mention it? Well, here’s why. Imagine you get an email with a title “PROJECT-NAME – QA Schema Rollback?” You quickly read through it and realize that a developer messed up a very important heavily used schema on a QA system for a project that’s about to go live. A quick analysis of the situation reveals that a delete was issued on three tables setup with “cascade constraints” clause and it wiped out data from 11 detail tables. Why would you do that is another question, which I took up with the development team but I digress …

What do you do?

First thing that came to my mind was “flashback“! Sounds good, but what do you flashback? A schema? No such thing. A table? Maybe, but what about referential integrity in 11 detail tables? That’s OK it can be solved like this — I flashback the master tables first and then the 11 detail tables, it might work … Lets check the pre-reqs — undo_management=AUTO (check), undo_retention = 900 (not good — it’s been over 15 minutes). Alright, forget about flashing back anything (mental note to myself — increase undo_retention to few hours).

What’s next?

The next thing is point in time recovery. Quick check shows that the schema is fully contained in a single tablespace — very good, we can do a tablespace level recovery instead of the whole database. And since we are on 10gR2 how about fully automated RMAN TSPITR? Sounds good, lets see what we need for this to work:

  • The target instance, containing the tablespace to be recovered
  • The Recovery Manager client
  • The control file and (optional) recovery catalog, used for the RMAN repository records of backup activity
  • Archived redo logs and backup sets from the target database, which are the source of the reconstructed tablespace.
  • The auxiliary instance, an Oracle database instance used in the recovery process to perform the actual work of recovery.
  • The target time, the point in time or SCN that the tablespace will be left at after TSPITR
  • The recovery set, which consists of the datafiles containing the tablespaces to be recovered;
  • The auxiliary set, which includes datafiles required for TSPITR of the recovery set which are not themselves part of the recovery set. The auxiliary set typically includes:
  • The auxiliary destination, an optional location on disk which can be used to store any of the auxiliary set datafiles, control files and online logs of the auxiliary instance during TSPITR. Files stored here can be deleted after TSPITR is complete.

A handful list, but doable. Alright, suppose I have all this (I did) and suppose I want to go through it, how does it work? Here’s how:

To perform TSPITR of the recovery set using RMAN and an automated auxiliary instance, you carry out the preparations for TSPITR described in “Planning and Preparing for TSPITR”, and then issue the RECOVER TABLESPACE command, specifying, at a minimum, the tablespaces of the recovery set and the target time for the point-in-time recovery, and, if desired, an auxiliary destination as well. RMAN then carries out the following steps:

  1. If there is no connection to an auxiliary instance, RMAN creates the auxiliary instance, starts it up and connects to it.
  2. Takes the tablespaces to be recovered offline in the target database
  3. Restores a backup control file from a point in time before the target time to the auxiliary instance
  4. Restores the datafiles from the recovery set and the auxiliary set to the auxiliary instance. Files are restored either in locations you specify for each file, or the original location of the file (for recovery set files) or in the auxiliary destination (for auxiliary set files, if you used the AUXILIARY DESTINATION argument of RECOVER TABLESPACE)
  5. Recovers the restored datafiles in the auxiliary instance to the specified time
  6. Opens the auxiliary database with the RESETLOGS option
  7. Exports the dictionary metadata about objects in the recovered tablespaces to the target database
  8. Shuts down the auxiliary instance
  9. Issues SWITCH commands on the target instance, so that the target database control file now points to the datafiles in the recovery set that were just recovered at the auxiliary instance.
  10. Imports the dictionary metadata from the auxiliary instance to the target instance, allowing the recovered objects to be accessed.
  11. Deletes all auxiliary set files.

At that point the TSPITR process is complete. The recovery set datafiles are returned to their contents at the specified point in time, and belong to the target database.

The first thing that should jump at you here is item# 2 — “Takes the tablespaces to be recovered offline in the target database” because that’s when the real downtime will begin. Why? Because realize that while some of the schema’s data might have been lost, at least in this particular case it’s not caused a compete outage yet, we still have a website connected to it and functioning. But as soon as I take the tablespace offline it will be a true outage and I will be under the gun to deliver it ASAP. ASAP == interruptions and stress, which as you know inevitably leads to mistakes. And mistakes made while performing RMAN’s TSPITR could cause quite serious issues, much bigger then you’ve started with.

Here’s one scenario where it’s very easy to make a mistake. Lest say the developer tells you that the schema was corrupted at t1. You go through with RMAN TSPITR and realize that he was wrong and that you need to redo RMAN TSPITR using t1-15 minutes. Can you do that? ………. The answer is Not unless you are using recovery catalog, here’s why:

Assume that you run TSPITR on a tablespace, and then bring the tablespace online at time t. Backups of the tablespace created before time t are no longer usable for recovery with a current control file. You cannot run TSPITR again on this tablespace to recover it to any time less than or equal to time t, nor can you use the current control file to recover the database to any time less than or equal to t. Therefore, you must back up the recovered tablespace as soon as TSPITR is complete.

It is extremely important that you choose the right target time or SCN for your TSPITR. As noted already, once you bring a tablespace online after TSPITR, you cannot use any backup from a time earlier than the moment you brought the tablespace online. In practice, this means that you cannot make a second attempt at TSPITR if you choose the wrong target time the first time, unless you are using a recovery catalog. (If you have a recovery catalog, however, you can perform repeated TSPITRs to different target times.)

For example, assume that you are not using a recovery catalog, and you run TSPITR on a tablespace, and then bring the tablespace online at 5PM on Friday. Backups of the tablespace created before 5PM Friday are no longer usable for recovery with a current control file. You cannot run TSPITR again on this tablespace with a target time earlier than 5PM Friday, nor can you use the current control file to recover the database to any time earlier than 5PM Friday. Your only option will be point-in-time recovery of your entire database using a restored control file.

Now that’s a bit of a problem isn’t it? What this means is that you might actually have just ONE SHOOT at getting it right. You miss it and you might be done, at least with using RMAN TSPITR, you can still recover full database using backup control file from before time t.

Considering all of these nuances it quickly became apparent to me that perhaps RMAN TSPITR is not the right solution in this case. It sure sounded great on paper. What’s not to like here, just one command like this:

RECOVER TABLESPACE users, tools 
     UNTIL LOGSEQ 1300 THREAD 1
      AUXILIARY DESTINATION '/disk1/auxdest';

Solves your issue! But it’s not as simple, it’s not just one command, there’s a lot more to this and even if you are willing to take the risk there are other limitations you need to be aware of. If any of the following exists in the tablespace to be recovered, then RMAN TSPITR will not work:

  • tables with VARRAY
  • nested tables
  • external tables
  • Snapshot logs and snapshot tables
  • undo or rollback segments
  • objects owned by SYS

You’d know if you had any Snapshot logs or tables because DBA’s typically set those up, the same thing applies to RBS/UNDO/SYS objects. Checking the rest would be easy:

SQL> col data_type format a30
SQL> r
  1  select count(*),DATA_TYPE from dba_tab_columns where owner='SCHEMA-IN-TABLESPACE'
  2* group by DATA_TYPE

  COUNT(*) DATA_TYPE
---------- ------------------------------
         2 LONG RAW
        72 NUMBER
         1 CLOB
        20 DATE
       200 VARCHAR2
         2 BLOB

6 rows selected.

SQL> select count(*) from dba_nested_tables where owner='SCHEMA-IN-TABLESPACE'
  2  ;

  COUNT(*)
----------
         0

SQL> select count(*) from dba_external_tables where owner='SCHEMA-IN-TABLESPACE'
  2  ;

  COUNT(*)
----------
         0

SQL>

You also have to consider that all the while you are thinking and researching, the database is still being accessed, and it’s entirely possible that changes are being made to the schema in question. If you restore it to the time in the past all of these changes will lost, so you’d have to actually take a quick backup of the schema as it is now before you perform RMAN TSPITR just in case you’ll need anything from it …

Also realize that all of this information you have to either know my heart or sift through very quickly because a decision needs to be made as to how to solve this issue. And unless you have recently practiced RMAN TSPITR on this specific tablespace in this specific database a conservative DBA, which all DBAs should be, would never make a decision to use RMAN TSPITR. I didn’t — I used RMAN backup set to recover the database on another host, exported the schema in question and imported it as another schema into the original database so that developers could reconstruct lost rows from it. My solution, while not the fastest was the sure thing and that’s what our job is all about.

October 7, 2009

Posted In: Operations, RMAN

Tags: ,

EVNT install notes for Oracle 10.2.0.4

You are probably wondering what is EVNT?

EVNT is an Oracle monitoring framework I wrote and have been using since 2002. My then boss encouraged me to devote few months of my time to automating all our monitoring needs. We analyzed our requirements, designed the framework and I wrote all the code in few months. In retrospect, this was the best time investment we’ve made — it’s still serving us well after all these years and numerous upgrades — we started with 8.1.7 and now are on 10gR2. Which brings me to why I started this post — I just wrote up EVNT install notes for 10.2.0.4.

I also heavily use OEM — it’s good for lots of things and not so good for others. Ironically, I still don’t trust OEM for critical database up-time monitoring. And if you too, feel that OEM is missing in some areas I encourage you to give EVNT a shoot — DOWNLOAD: EVNT – Event Monitoring Framework for Oracle. EVNT is especially useful to those DBAs that rely heavily on sqlplus/shell scripts for database monitoring.

September 30, 2009

Posted In: Operations

Tags: ,

Exadata Version 2 – final blow to Oracle/HP relationship?

It took Oracle buying SUN to finally come out with something worthy this relationship — Exadata Version 2. Oracle has killed two birds with one stone capitalizing on their superior Oracle Enterprise Linux and SUN’s x64 hardware, once again proving that it’s bet is on OEL not Solaris.

The pricing model allows incremental growth using the following stages:

Basic System – $110,000
Quarter Rack – $350,000
Half Rack – $650,000
Full Rack – $1.15M

Exadata Version 2 Pricing

Key Internals are as follows (per Full RACK):
8 Sun Fire X4170 DB Nodes per RACK
14 Sun Fire X4275 Storage Nodes per RACK
Combo of multiple Sun Quad Data Rate (QDR) Datacenter InfiniBand Switches 36 capable of 40Gb/sec

Oracle/SUN claim that by using X4275 they were able to significantly simplify storage solution eliminating complex SAN architectures. Here’s a brief overview of the architecture – Sun Oracle Database Machine and Exadata Storage Server. And here’s the presentation Larry Ellison gave at The Sun Oracle Database Machine Announcement.

So is Exadata Version 2 a final blow to Oracle/HP relationship? Time will tell, all I can say is that we couldn’t be happier with our HP hardware running Oracle Application Servers, Oracle RAC and various middle-tier solutions on Oracle EL.

September 22, 2009

Posted In: Operations

Tags: , , , , , ,

annoyance with Oracle Patch Set Update PSU 8576156

I was just applying PSU 8576156 on a 10.2.0.4 installation under Solaris and I was surprised to learn that it requires you to provide your metalink username/email address and optionally password, under disguise of “to be informed of security issues“. Once you provide your email/password it immediately tries to validate the information by contacting oracle’s servers via HTTP — it doesn’t bother to check if you have a proxy it just goes out there and hangs for few minutes eventually failing to connect at which point you are asked if there’s an HTTP PROXY on your network. Only after failing to connect, you are finally given a chance to OPT-Out of this ridiculous practice of information gathering by specifying “NONE” in the “Proxy specification“:

box.SID-> /u01/app/oracle/product/10.2.0/db_1/OPatch/opatch apply
Invoking OPatch 10.2.0.4.7

Oracle Interim Patch Installer version 10.2.0.4.7
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oracle/product/10.2.0/db_1
Central Inventory : /u01/app/oracle/oraInventory
   from           : /var/opt/oracle/oraInst.loc
OPatch version    : 10.2.0.4.7
OUI version       : 10.2.0.4.0
OUI location      : /u01/app/oracle/product/10.2.0/db_1/oui
Log file location : /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch2009-09-21_17-12-50PM.log

Patch history file: /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch_history.txt

ApplySession applying interim patch '8576156' to OH '/u01/app/oracle/product/10.2.0/db_1'

Running prerequisite checks...
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: your.metalink-email@your-company.com
Password (optional):
Unable to establish a network connection to Oracle. If your systems require a
proxy server for outbound Internet connections, enter the proxy server details
in this format:
[@][:]
If you want to remain uninformed of critical security issues in your
configuration, enter NONE
Proxy specification: NONE

Enjoy the all new Patch Set Update!

September 21, 2009

Posted In: Operations

Tags: , ,

Quick How-To on vncserver for Oracle DBA

I don’t normally use vncserver, I prefer SSH X11 tunneling/forwarding due to it’s inherited security, but sometimes there have been issues with X11/SSH and it’s nice to know an alternative. This quick how-to is meant for an Oracle DBA looking for a quick way to run GUI tools from the Oracle stack, such as runInstaller, dbca, dbua, etc.. It is not meant to serve as a step by step guide to setup a secure, reusable vncserver setup in your environment, I do however provide some additional references where you can find additional information (see the end of this post).

Lets get to it then — the good news is that vncserver should already be installed on your RHEL / OEL. All you have to do is start it up and tell it the display number like so:

[root@rac1 ~]# su - oracle
rac1.XRACP1-> vncserver :12

You will require a password to access your desktops.

Password:
Verify:

New 'rac1.mydomain.com:12 (oracle)' desktop is rac1.mydomain.com:12

Creating default startup script /u01/app/oracle/.vnc/xstartup
Starting applications specified in /u01/app/oracle/.vnc/xstartup
Log file is/u01/app/oracle/.vnc/rac1.mydomain.com:12.log

rac1.XRACP1-> ps -ef | grep vnc
oracle   31123     1  0 13:04 pts/1    00:00:00 Xvnc :12 -desktop rac1.mydomain.com:12 (oracle) -httpd /usr/share/vnc/classes -auth /u01/app/oracle/.Xauthority -geometry 1024x768 -depth 16 -rfbwait 30000 -rfbauth /u01/app/oracle/.vnc/passwd -rfbport 5912 -pn
oracle   31158     1  0 13:04 pts/1    00:00:00 vncconfig -iconic
oracle   32309 31078  0 13:05 pts/1    00:00:00 grep vnc
rac1.XRACP1->

If you want to kill vncserver use “vncserver -kill :12” command where :12 is the display number you specified when starting it:

vncserver -kill :12

rac1.XRACP1-> ps -ef | grep vnc
oracle    9597     1  1 17:50 pts/3    00:00:00 Xvnc :12 -desktop rac1.mydomain.com:12 (oracle) -httpd /usr/share/vnc/classes -auth /u01/app/oracle/.Xauthority -geometry 1024x768 -depth 16 -rfbwait 30000 -rfbauth /u01/app/oracle/.vnc/passwd -rfbport 5912 -pn
oracle   10065     1  0 17:50 pts/3    00:00:00 vncconfig -iconic
oracle   10303  9514  0 17:50 pts/3    00:00:00 grep vnc

rac1.XRACP1-> vncserver -kill :12
Killing Xvnc process ID 9597

rac1.XRACP1-> ps -ef | grep vnc
oracle   14396  9514  0 17:52 pts/3    00:00:00 grep vnc
rac1.XRACP1->

That’s all to it, next you can access the display via http://host:58XX where XX is the screen number you specified when starting the vncserver:

http://rac1.mydomain.com:5812

REFERENCE:
Note: 551711.1 Linux OS Service ‘vncserver’
Note: 735767.1 How to Setup VNC Server with Clipboard Support on RHEL/OEL
APRESS Book: Linux Recipes for Oracle DBAs

September 16, 2009

Posted In: Linux, Operations

Tags:

11gR2 clients connect to the database using SCANs

If you’ve extended your RAC cluster on a set of new nodes you already know how painful it can be to have to go through the list of your clients and make sure their SQL*Net configuration is up to date. 11gR2 solves this problem using Single Client Access Name (SCAN).

The single client access name (SCAN) is a hostname used to provide service access for clients to the cluster. Because the SCAN is associated with the cluster as a whole, rather than to a particular node, the SCAN makes it possible to add or remove nodes from the cluster without needing to reconfigure clients. It also adds location independence for the databases, so that client configuration does not have to depend on which nodes are running a particular database. Clients can continue to access the cluster in the same way as with previous releases, but Oracle recommends that clients accessing the cluster use the SCAN.

Reference: 1.3.2.2 IP Address Requirements

How is SCAN implemented?

For high availability purposes the SCAN name should be associated with at least three IP addresses using DNS round-robin resolution. If you opt to use Grid Naming Service then GNS can also be used to manage the SCAN name.

SCAN is configured at a cluster level not at the node level, that’s what makes it so flexible — no mater how many nodes your clusters consists of, your clients can continue to use SCAN to access the services of your cluster utilizing all nodes even if you add or delete them:

The SCAN is a virtual IP name, similar to the names used for virtual IP addresses, such as node1-vip. However, unlike a virtual IP, the SCAN is associated with the entire cluster, rather than an individual node, and associated with multiple IP addresses, not just one address.

SCAN works as an independent handler for the entire cluster — it acts on client’s behalf during connection request since it knows all cluster services and it’s available, least loaded nodes:

The SCAN works by being able to resolve to multiple IP addresses reflecting multiple listeners in the cluster handling public client connections. When a client submits a request, the SCAN listener listening on a SCAN IP address and the SCAN port is contracted on a client’s behalf. Because all services on the cluster are registered with the SCAN listener, the SCAN listener replies with the address of the local listener on the least-loaded node where the service is currently being offered. Finally, the client establishes connection to the service through the listener on the node where service is offered. All of these actions take place transparently to the client without any explicit configuration required in the client.

Bottom line – use SCAN – it simplifies cluster management:

Because the SCAN addresses resolve to the cluster, rather than to a node address in the cluster, nodes can be added to or removed from the cluster without affecting the SCAN address configuration.

Reference: D.1.3.5 About the SCAN

If you found this article helpful and would like to receive more like it as soon as I release them make sure to sign up to my newsletter below:

SUBSCRIBE

September 10, 2009

Posted In: RAC

Tags: , , ,

11gR2 – raw and block devices – no longer supported

I was just reading up on the 11gR2 documentation for Grid Infrastructure Installation and finally we have a closure on the topic of RAW and BLOCK devices for OCR and VOTING disks:

With this release, OUI no longer supports installation of Oracle Clusterware files on block or raw devices. Install Oracle Clusterware files either on Automatic Storage Management diskgroups, or in a supported shared file system.

For new installations, OCR and voting disk files can be placed either on ASM, or on a cluster file system or NFS system. Installing Oracle Clusterware files on raw or block devices is no longer supported, unless an existing system is being upgraded.

REFERENCE: What’s New in Oracle Grid Infrastructure Installation and Configuration?

Perfect timing! I was just mulling over what to do with OCR/VOTING on my upcoming SAN-based RAC install — now it’s clear — use 11gR2 and store them on ASM.

September 10, 2009

Posted In: RAC

Tags: , ,

11gR2 is here!

In case you didn’t get this via oracle.com news updates — 11gR2 is out and available for download from here ….

And as I have predicted, the first distro is Linux x86 and Linux x86-64. So if you are a die-hard Solaris fan it should now be obvious (if it wasn’t already) that Linux is now officially the preferred platform (as Solaris once was but no longer is).

I actually wasn’t expecting this in September — I thought it was going to get announced closer to Oracle Open World but I am not complaining here :)

September 10, 2009

Posted In: Linux

Tags: , , ,

Installing Oracle instantclient basic and instantclient sqlplus on win32

Sometimes basic things like installing the latest Oracle instantclient on the PCs of all of your developers can take considerable time. I typically setup a dedicated DBA Portal website wherever I work and then write up instructions for repetitive things like these. It’s all about – do it once and forget it. Here’s the copy of the writeup I did to install the latest 11g Oracle instantclient basic and instantclient sqlplus on win32:

Download Software

Go to Instant Client Downloads for Microsoft Windows (32-bit) download page:

http://www.oracle.com/technology/software/tech/oci/instantclient/htdocs/winsoft.html

And download basic-win32 and sqlplus-win32 files to your PC (for example):

  • instantclient-basic-win32-11.1.0.7.0.zip
  • instantclient-sqlplus-win32-11.1.0.7.0.zip

Create Oracle Directory

Create a C:\oracle directory on your C drive (if you don’t already have one) and move both files into C:\oracle.

Unzip both files

  1. first unzip instantclient-basic-win32-11.1.0.7.0.zip
  2. then unzip instantclient-sqlplus-win32-11.1.0.7.0.zip

to unzip:

  • right click on file
  • point to WinZip
  • Click Extract to here …

End result should be a new folder called instantclient_11_1 in your C:\oracle:

instantclient_11_1

Setup your system %PATH% and %TNS_ADMIN% variables

  • In Windows explorer LEFT pane, right click on My Computer icon and click Properties
  • Click Advanced tab
  • Click Environment variables button:

Environment variables button

In the System Variables panel select Path variable and click Edit button:

System Variables/Path

In the Variable Value field paste the following: C:\oracle\instantclient_11_1. Make sure it’s before all other path values that point to your OLD Oracle clients, then clickOk

Variable Value

Now find a variable called TNS_ADMIN – it could be in either panel (System / User), if you don’t have such variable, create it in the User panel by clicking New:

Variable Name: TNS_ADMIN
Variable Value: C:\oracle\instantclient_11_1

Click Ok

TNS_ADMIN Variable Win

  • Now click Ok on the Environment Variables screen
  • and click Ok on the System Properties window to close it

Setup tnsnames.ora file

If you already have an existing copy of tnsnames.ora file then place it to:

C:\oracle\instantclient_11_1

If you don’t – then create a new tnsnames.ora file, for example:

XRACQ_MYSERVICE_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vip-qarac1)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vip-qarac2)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vip-qarac4)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vip-qarac3)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = XRACQ_MYSERVICE_TAF.domain.com)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

Test 11g client by running sqlplus

  • Open a new CMD window by going to Start->Run
  • In the Run Window type in: cmd and press Ok:

in the CMD window’s prompt – enter the following: sqlplus user/pass@TNS_ALIAS where:

user:      is the database username given to you by the DBA
pass:      is the database password given to you by the DBA
TNS_ALIAS: The alias you setup in the tnsnames.ora file

example:

C:\>sqlplus xxx/xxxxxxxx@XRACQ_MYSERVICE_TAF

SQL*Plus: Release 11.1.0.7.0 - Production on Wed Aug 26 16:24:23 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL>

Known Errors:

If you get the ORA-12705 error while running sqlplus:

C:\>sqlplus xxxxxx/xxxxxxxx@xxxxxxxxxxx

SQL*Plus: Release 11.1.0.7.0 - Production on Wed Aug 26 16:58:33 2009

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

ERROR:
ORA-12705: Cannot access NLS data files or invalid environment specified

Enter user-name:

Here’s the solution:

To fix the ORA-12705: Cannot access NLS data files or invalid environment specified error, go to your registry editor (run regedit from Start->Run) and then:

  • Navigate to Windows Registry in \HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE
  • Remove the NLS_LANG from there (select it and delete)

Re-run the sqlplus test to make sure it works (you will need to open a new CMD window for the registry changes to take place).

Bonus PDF

If you’d like to get a copy of this article in an easy to share PDF – please sign up for my newsletter – Confessions of an Oracle DBA where I share tips, scripts and tricks I’ve learned during almost two decades in the tech field as an Oracle DBA:

SUBSCRIBE

Additional Resources

  1. If you have any additional questions please join our Oracle DBA Community and post them there.
  2. And if you need any further Oracle DBA help – I can be reached at: http://www.hashjoin.com/contact

End.

August 26, 2009

Posted In: Operations

Tags: , , ,

Using Flashback Database to strengthen Data Guard Setup

MAA recommends to enable Flashback Database on the primary prior to FailOver:

Enable Flashback Database to reinstate failed production databases after a
failover operation has completed. Flashback Database provides a second
very significant function, enabling fast point in time recovery if needed.

See: MAA_WP_10gR2_SwitchoverFailoverBestPractices.pdf

Enabling Flashback Database involves setting up flash recovery area and setting a flashback retention target which specifies how far back you want to be able to restore your database using the Flashback Database feature.

Once Flashback Database is setup, the database starts to copy images of each altered block into the flashback logs — this works for all datafiles. When it’s time to Flashback the database, the copies of the blocks from flashback logs are used to reconstruct the datafiles to a state just prior to the desired flashback time, the redo/arch logs are then used to bring a datafiles to a consistent state.

WARNING::
Redo logs must be available for the entire time period spanned by the
flashback logs, whether on tape or on disk. (In practice, however, redo
logs are generally needed much longer than the flashback retention target
to support point-in-time recovery.)

There are also a number of operations you can perform on your database, such
as dropping a tablespace or shrinking a datafile, which cannot be reversed
with Flashback Database. After such an operation, the flashback database
window begins at the time immediately following that operation.

One thing to consider is that the only way to guarantee a database can be returned to a specific point in time is to use guaranteed restore points. In other words, don’t use “normal” restore points for this purpose and don’t rely on Flashback Database alone. The only constraint to how far you can go back, is the size of your disk space in the flash recovery area.

Again, WARNING:
Limitations that apply to Flashback Database also apply to guaranteed
restore points. For example, shrinking a datafile or dropping a tablespace
can prevent flashing back the affected datafiles to the guaranteed restore
point.

Creating a guaranteed restore point without having enough sufficient free space in the flash recovery area (FRA) will cause the FRA to fill completely, that’s because “No file in the flash recovery area is eligible for deletion if it is required to satisfy the guarantee [restore point]“. In many circumstances, this will cause your database to halt.

To save on space in FRA you can disable the Flashback Database and still create a guaranteed restore point. In this case, the first time a datafile block is modified, an image of this block before the modification is stored in the flashback logs. This saves on space because only one-time copy of every changed data block is stored there, but subsequent modifications to the same block do not cause the block contents to be logged again. This method works really well and it’s more efficient as long as your primary only needs to be able to return to the specific point in time at which the guaranteed restore point was created, such as to a before-state of a failed application upgrade that might have made changes to a database.

See: 5.1.1 About Flashback Database

That’s theory, how about some practice time? Follow along. First, we verify our setup (in this case no FRA [db_recovery_file_dest] is setup:

rac1.XRACP1-> sqlplus /nolog

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Jun 12 19:25:07 2009

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

SQL> connect / as sysdba
Connected.
SQL>    select * from v$recovery_file_dest;

no rows selected

SQL> show parameter recovery

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string
db_recovery_file_dest_size           big integer 0
recovery_parallelism                 integer     0
SQL> select * from v$flash_recovery_area_usage;

no rows selected

SQL>

Lets create FRA (NOTE: since this is a RAC database, I am creating FRA on a clustered FS (OCFS2) /u02):

[root@rac1 log]# mkdir -p /u02/oradata/rcv_area
[root@rac1 log]# chown -R oracle:dba /u02/oradata

rac1.XRACP1-> df -k /u02
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/2000b080023002235p1
                      37899136   1971200  35927936   6% /u02
rac1.XRACP1->


SQL> alter system set db_recovery_file_dest_size=28g scope=both sid='*';

System altered.

SQL> alter system set db_recovery_file_dest='/u02/oradata/rcv_area' scope=both sid='*';

System altered.

SQL>

SQL> set lines 132
SQL> col name format a35
SQL> set trims on
SQL> select * from v$recovery_file_dest;

NAME                                SPACE_LIMIT SPACE_USED SPACE_RECLAIMABLE NUMBER_OF_FILES
----------------------------------- ----------- ---------- ----------------- ---------------
/u02/oradata/rcv_area                3.0065E+10          0                 0               0

SQL> select * from v$flash_recovery_area_usage;

FILE_TYPE    PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- ---------------
CONTROLFILE                   0                         0               0
ONLINELOG                     0                         0               0
ARCHIVELOG                    0                         0               0
BACKUPPIECE                   0                         0               0
IMAGECOPY                     0                         0               0
FLASHBACKLOG                  0                         0               0

6 rows selected.

SQL>

So far we setup the FRA, now lets try to create guaranteed restore point. One thing to remember here though is this — if flashback database is not enabled (to save on space), then the database must be mounted, not open, when creating the first guaranteed restore point (or if all previously created guaranteed restore points have been dropped). If you attempt to create a guaranteed restore point when the database is opened you get ORA-38787 error:

SQL> CREATE RESTORE POINT gr_01 GUARANTEE FLASHBACK DATABASE;
CREATE RESTORE POINT gr_01 GUARANTEE FLASHBACK DATABASE
*
ERROR at line 1:
ORA-38784: Cannot create restore point 'GR_01'.
ORA-38787: Creating the first guaranteed restore point requires mount mode when flashback database is off.

A normal restore point however, works, but it will not do anything since flashback database is off, here’s an example:

SQL> !ls -l /u02/oradata/rcv_area
total 0

SQL> CREATE RESTORE POINT nr_01;

Restore point created.

SQL> !ls -l /u02/oradata/rcv_area
total 0

SQL> create table xyz(t number) tablespace tools;

Table created.

SQL> !ls -l /u02/oradata/rcv_area
total 0


SQL> insert into xyz values (1000);

1 row created.

SQL> commit;

Commit complete.

SQL> !ls -l /u02/oradata/rcv_area
total 0

SQL>

As you can see nothing happened … even though a normal restore point is created. Lets drop it (this syntax works for both NORMAL and GUARANTEED RPs):

SQL> drop restore point nr_01;

Restore point dropped.


SQL> SELECT NAME, SCN, TIME, DATABASE_INCARNATION#,
           GUARANTEE_FLASHBACK_DATABASE,STORAGE_SIZE
           FROM V$RESTORE_POINT;
  2    3
no rows selected

Now lets create GUARANTEED restore point WITHOUT enabling FLASHBACK DATABASE which requires database to be mounted only, and since this is RAC database all other instances must be shutdown first as well:

rac1.XRACP1-> srvctl stop database -d XRACP
rac1.XRACP1-> sqlplus /nolog

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Jun 12 20:01:53 2009

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

SQL> connect / as sysdba
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2084296 bytes
Variable Size             822084152 bytes
Database Buffers          771751936 bytes
Redo Buffers               14692352 bytes
Database mounted.
SQL> CREATE RESTORE POINT gr_01 GUARANTEE FLASHBACK DATABASE;

Restore point created.

SQL> shutdown immediate;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
rac1.XRACP1-> srvctl start database -d XRACP
rac1.XRACP1-> srvctl start service -d XRACP
rac1.XRACP1->

Now lets see what we can gather from V$RESTORE_POINT:

SQL> col name format a15
col time format a35
set lines 132
set trims on
SQL> SQL> SQL> SQL>
SQL>
SQL>
SQL>
SQL> SELECT NAME, SCN, TIME, DATABASE_INCARNATION#,
           GUARANTEE_FLASHBACK_DATABASE,STORAGE_SIZE
           FROM V$RESTORE_POINT;
  2    3
NAME                   SCN TIME                                DATABASE_INCARNATION# GUA STORAGE_SIZE
--------------- ---------- ----------------------------------- --------------------- --- ------------
GR_01           9.0317E+12 12-JUN-09 08.02.19.000000000 PM                         1 YES     63766528

SQL>
   

rac1.XRACP1-> find /u02/oradata/rcv_area -ls
5161314    4 drwxr-xr-x   3 oracle   dba          4096 Jun 12 20:02 /u02/oradata/rcv_area
5161316    4 drwxr-x---   3 oracle   oinstall     4096 Jun 12 20:02 /u02/oradata/rcv_area/XRACP
5161317    4 drwxr-x---   2 oracle   oinstall     4096 Jun 12 20:03 /u02/oradata/rcv_area/XRACP/flashback
5161318 15576 -rw-r-----   1 oracle   oinstall 15949824 Jun 12 20:04 /u02/oradata/rcv_area/XRACP/flashback/o1_mf_5365ov16_.flb
237846 15576 -rw-rw----   1 oracle   oinstall 15949824 Jun 12 20:03 /u02/oradata/rcv_area/XRACP/flashback/o1_mf_5365qm48_.flb
4128929 15576 -rw-rw----   1 oracle   oinstall 15949824 Jun 12 20:03 /u02/oradata/rcv_area/XRACP/flashback/o1_mf_5365qtxy_.flb
7227425 15576 -rw-rw----   1 oracle   oinstall 15949824 Jun 12 20:03 /u02/oradata/rcv_area/XRACP/flashback/o1_mf_5365qtyd_.flb
rac1.XRACP1->

Much better now — it is working, and a few days later:

SQL> !date;ls -l /u02/oradata/rcv_area/XRACP/flashback
Mon Jun 15 15:02:17 PDT 2009
total 358248
-rw-r-----  1 oracle oinstall 15949824 Jun 13 04:00 o1_mf_5365ov16_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 12 20:03 o1_mf_5365qm48_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 12 20:03 o1_mf_5365qtxy_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 12 20:03 o1_mf_5365qtyd_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 12 22:00 o1_mf_536dm0jm_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 12 22:01 o1_mf_536dmnxj_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 13 02:00 o1_mf_536dntxs_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 12 23:03 o1_mf_536j9b8b_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 12 23:28 o1_mf_536j9jvj_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 13 06:00 o1_mf_536kqnmf_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 13 15:10 o1_mf_536tnpky_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 13 21:00 o1_mf_5371or32_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 13 23:03 o1_mf_5378qrx2_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 14 12:00 o1_mf_537m8tox_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 14 07:33 o1_mf_5388yvsd_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 14 19:00 o1_mf_538xgm1v_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 14 02:00 o1_mf_5394pdx7_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 14 23:04 o1_mf_539h0tcs_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 15 04:00 o1_mf_53b2lorr_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 15 15:00 o1_mf_53bl5v8y_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 15 15:00 o1_mf_53cbs5hs_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 15 15:00 o1_mf_53cs3y00_.flb
-rw-rw----  1 oracle oinstall 15949824 Jun 15 15:00 o1_mf_53dbfmx5_.flb

SQL> SELECT NAME, SCN, TIME, DATABASE_INCARNATION#,
              GUARANTEE_FLASHBACK_DATABASE,STORAGE_SIZE
              FROM V$RESTORE_POINT;  2    3

NAME                   SCN TIME                                DATABASE_INCARNATION# GUA STORAGE_SIZE
--------------- ---------- ----------------------------------- --------------------- --- ------------
GR_01           9.0317E+12 12-JUN-09 08.02.19.000000000 PM                         1 YES    366657536

SQL> select * from v$flash_recovery_area_usage;

FILE_TYPE    PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- ---------------
CONTROLFILE                   0                         0               0
ONLINELOG                     0                         0               0
ARCHIVELOG                    0                         0               0
BACKUPPIECE                   0                         0               0
IMAGECOPY                     0                         0               0
FLASHBACKLOG               1.22                         0              23

6 rows selected.

SQL>

And that’s how you create a guaranteed restore point. But what do you use it for, with regards to STANDBY FAILOVER? You really shouldn’t, because you don’t know when a FAILOVER occurs so you can’t create a point right before it. You should however, use a guaranteed restore point during a SWITCHOVER because during a switchover you are in full control of both primary and standby databases. In a FAILOVER scenario, enabling Flashback Database [ ALTER DATABASE FLASHBACK ON; ] is more effective than trying to create guaranteed restore points.

Still have questions? Here are pointers to additional reading with excerpts:

From Note:565535.1 titled “Flashback Database Best Practices & Performance”:

Manual primary database reinstate- if Data Guard is being used without the
fast-start failover feature and a Data Guard failover is necessary, then
flashback database can be used to manually reinstate the failed primary
database. This is documented in the Data Guard Administration and Concepts
guide – see “12.4 Using Flashback Database After a Failover” here:

And from 12.4 Using Flashback Database After a Failover:

After a failover occurs, the original primary database can no longer
participate in the Data Guard configuration until it is repaired and
established as a standby database in the new configuration. To do
this, you can use the Flashback Database feature to recover the
failed primary database to a point in time before the failover
occurred, and then convert it into a physical or logical standby
database in the new configuration.

August 22, 2009

Posted In: Data Guard, RAC

Tags: , , , , ,