Detecting and Recovering from database Corruption :-
Block Corruption :-
Block may be corrupted in two ways
- Media corruption
- Logical Corruption.
Media Corruption :-
- A media corrupt block is a block where the contents of the block make no sense whatsoever: its contents do not match the format expected by oracle, according to the formatting rules for the tablespace and the objects within it.
- When a datafile is created, it is formatted into Oracle blocks ,of whatever block size was chosen for the tablespace.This is the first level of formatting.
- A second level occurs when the blcok is actually used.When an object is allocated an extent, the blocks in the extent are not further formatted, but as the high water mark of the object is advanced into the new extent, then the blocks receive a second level of formatting, as they take on the chrateristics required for that particular segment.
- If,because of damage to the disks, this formatting is lost, then the block will be ‘media corrupt’.
Logical corruption :-
- Where the Oracle formatting is correct – the block does have the correct header area and a data area – but the contents of the block are internally inconsistent.
- For example, the block header of a table block includes a row directory, stating where each row begins.
- If when attempting to find a row, there isn’t one, this will be a logical corruption – an Oracle internal Error-rather that a media problem.
Parameter Relating to Block Corruption :-
DB_BLOCK_CHECKSUM
DB_BLOCK_CHECKING
- DB_BLOCK_CHECKSUM on TRUE,whenever the DBWn process writes a block to disk it will compute a checksum for the block and include it in the block header.
- when a server reads a block, if the checksum is present it will recalculate it and compare.
- This will mean that any damage occuring in the time between the DBWn writing the block and its being read back will be detected.
- Even when this parameter is on false, checksumming is still enabled for the system tablespace.
- DB_BLOCK_CHECKING to TRUE will have impact on performance,
and Oracle advises leaving it on default unless there are specific problems related to corruptions occuring in memory. - When set to TRUE, the ORacle block will check the block for consistency every time the buffer containing the block is accessed.
- This will mean that if corruptions are occuring in memory, they will be detected immediately but the price is high-perhaps as much as 10 percent of processing cappacity.
- Oracle Corporation advises leaving both these parameters on default, unless instructed otherwise by Oracle Support.
Detecting Block Corruptions :-
- A corrupt block may be detected by your users when they try to read it.
- Example :-
- Windows based, Tablespace called DAMAGED with a datafile DAM.DBF. and a table CORRUPT_TAB, created within it. and a priamry key constraint added. Thsi caused the creation of an index, called CORR_PK, on the constrained column.
- The file was deliberately damaged by using an editor to change a few bytes of some blocks of the CORRUPT_TAB table. May be of either media or logical corruption the file still exists and is of the correct size, but the contents have been damaged externally to ORacle. the index, however is not directly affected.
The Corrupted Block Error Message :-
Sql>select count(*) from corrupt_tab;
count(*)
——-
50
sql>select * from corrupt_tab;
select * from corrupt_tab
*
ERROR at line 1:
ORA-01578:ORACLE data block corrupted(file # 7, block # 5)
ORA-01110:data file 7: ‘C:\ORACLE\ORADATA\DAM.DBF’
sql>
- The first query succeeds. This is because although the query appears to request a full table scan, in fact Oracle can execute the query without reading the table atall.
- There is a primary key constraint on the table, which means that Oracle can count the number of rows by scanning the index, not the table. The second query which requires reading the table itself,fails witha na ORA_01578 error message that lists the address of the corrupted block and the messate returned to the session that detected teh problem, but there is also this message in the alert log UDUMP directory.you can identify which object id affect in the trace file.
Ex: – objd=52077
sql>select owner,object_name,object_type from dba_objects where object_id=52077;
An alternative Query to identify the object would be to match the data block address against the DBA_EXTENTS view.
sql>select owner,segment_name,segment_type from dba_extents where file_id=7 and 5 between block_id and block_id + blocks;
owner segment_name segment_type
——- ———— ————-
HR CORRUPT_TAB TABLE
Unless the corruption is in the header blocks of the datafile, it will only be detected when the block is read by a server process.
DBVERIFY Utility :-
- An external utility run from an operating system prompt to verify datafiles.
- It can be run against files while they are in use, or aginst image copies made with user managed backups or by rman.
- It cannot verify online or archive redo logs, nor the contrlfile nor rman backup sets.
- the files to be verified can exist on a conventional file system or on Automatic storage management disks or on a raw devices.
Executable –> Unix – $oracle_home/bin/dbv
windows – %oracle_home%\bin\dbv.exe
- If “Total pages influx” is greater than zero, it is not an indication of any problem; it merely shows that while DBVERIFY was running against an open file,it encountered a block that was currently being written to by the DBWn process.
- When that happens, it will re-read the block until it gets a consistent image. “Total Pages Influx is the number of times it had to do this.
- If you are using rman to backup your datafiles, DBVERIFY is not needed, because RMAN will perform its own verification.
- DBVERIFY is essential if you are using user managed backups, in order to determine that the backups are actullay usable.
C:\Documents and Settings\mazar>dbv file=D:\oracle\product\10.2.0\oradata\riyadh
\USERS01.DBF
DBVERIFY: Release 10.2.0.1.0 – Production on Tue Feb 9 15:22:25 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
DBVERIFY – Verification starting : FILE = D:\oracle\product\10.2.0\oradata\riyad
h\USERS01.DBF
DBVERIFY – Verification complete
Total Pages Examined : 640
Total Pages Processed (Data) : 35
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 2
Total Pages Failing (Index): 0
Total Pages Processed (Other): 53
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 550
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1685733 (0.1685733)
C:\Documents and Settings\mazar>
The ANALYZE command :-
sql>analyze table validate structure;
sql>analyze index validate struture;
sql>analyze table validate struture cascade;
- When validating the structure of a table, analyze verifies the integrity of each of data blocks and rows. When validating the structure of an index, it veriifes the integrity of the data blocks and the index keys.
- The cascade option will verify both the table and any associated indexes.
- CASCADE can be a very time consuming process for large objects because it checks to ensure each index entry points to a valid table row.
ANALYZE..VALID STRUTURE are that for partitioned tables it will check that rows are in the correct partition ,and for indexes it will compute the ratio of space devoted to active index entries and space wasted because the index entries refer to deleted rows. VIEW INDEX_STATS.
DBMS_REPAIR Package :-
DBMS_REPAIR is a set of procedures that will check objects for problems and make the objects usable again.
DBMS_REPAIR Procedures
The following table lists the procedures included in the DBMS_REPAIR
package.
Comparison of Corruption Detection Methods
Step 1: Create a repair table
BEGIN
DBMS_REPAIR.ADMIN_TABLES (
TABLE_NAME => ‘REPAIR_TABLE’,
TABLE_TYPE => dbms_repair.repair_table,
ACTION => dbms_repair.create_action,
TABLESPACE => ‘USERS’);
END;
/
desc repair_table;
query select tablespace_id, schema_name, object_name, block_id from repair_Table;
SET SERVEROUTPUT ON
DECLARE num_corrupt INT;
BEGIN
num_corrupt := 0;
DBMS_REPAIR.CHECK_OBJECT (
SCHEMA_NAME => ‘SCOTT’,
OBJECT_NAME => ‘DEPT’,
REPAIR_TABLE_NAME => ‘REPAIR_TABLE’,
CORRUPT_COUNT => num_corrupt);
DBMS_OUTPUT.PUT_LINE(‘number corrupt: ‘ || TO_CHAR (num_corrupt));
END;
/
SELECT OBJECT_NAME, BLOCK_ID, CORRUPT_TYPE, MARKED_CORRUPT,
CORRUPT_DESCRIPTION, REPAIR_DESCRIPTION
FROM REPAIR_TABLE;
Fixing corrupt blocks
SET SERVEROUTPUT ON
DECLARE num_fix INT;
BEGIN
num_fix := 0;
DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
SCHEMA_NAME => ‘SCOTT’,
OBJECT_NAME=> ‘DEPT’,
OBJECT_TYPE => dbms_repair.table_object,
REPAIR_TABLE_NAME => ‘REPAIR_TABLE’,
FIX_COUNT=> num_fix);
DBMS_OUTPUT.PUT_LINE(‘num fix: ‘ || TO_CHAR(num_fix));
END;
/
Skipping Corrupt Blocks
BEGIN
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
SCHEMA_NAME => ‘SCOTT’,
OBJECT_NAME => ‘DEPT’,
OBJECT_TYPE => dbms_repair.table_object,
FLAGS => dbms_repair.skip_flag);
END;
/
Querying scott’s tables using the DBA_TABLES view shows that SKIP_CORRUPT is enabled for table scott.dept.
SELECT OWNER, TABLE_NAME, SKIP_CORRUPT FROM DBA_TABLES WHERE OWNER = ‘SCOTT’;
…………
Recovering Corrupt Blocks with RMAN
–> If you are making user-managed backups, you cannot perform a block recovery becuase the granularity of a user managed restore is the datafile
–> If one block of a file many gigabytes big is corrupted, you must restore and recover the entire datafile through the usual.
- Take the damaged file offline.
- Restore the file from a backup made before the corruption occured.
- Recover the file completely
- Bring the recovered file online.
This can be done with zero loss, but the impact on your users ,in terms of downtime , may be enormous.
RMAN backups open the possibilty of block level restore and recovery, perhaps with no downtime at all.
Note :- Block media recovery is applicable only to datafile blocks. If a block of an online log file, an archive log file or a controlfile is coruupted. you must have a multiplexed copy available if you ever need to recover.
Detection Of Corrupt Blocks :-
–> RMAN ,being an Oracle aware tool, will verify the contents of data blocks as it reads them; unless instructed otherwise,it wil lterminate the backup as soon as it hits a corrupt block.
–>If you wish, you can run RMAN backups that specify a tolerance for corrupted blocks. If this is done ,then rather than throwing an error and terminating the backup immediately when aa corruption is detected, RMAN will continue to backup the datafile but will record the address of any corruptions it encounters in its repository.
Example :-
Rman to continue a backup as long as no more than one hundred corrupt blocks are encountered.
RMAN>run {
set maxcorrupt for datafile 7 to 100;
backup datafile 7;}
the details of Corrupt blocks are visible in two places.
V$DATABASE_BLOCK_CORRUPTION –> the address of the cause of the problem. the datafile and block number.
V$BACKUP_CORRUPTION –> for backup set backups
V$COPY_CORRUPTION –>backup were to an image copy.
RMAN will always check for physical corruption, known as “media corruption” in the non RMAN world.
Example : this would be a block that oracle cannot process at all: an invalid checksum, or a block full of zeros.
RMAN can also be instructed to check for logical corruption, also known as “SOftware Corruption” as well.
These checks will occur whenever a file is backed up, whether as an image copy or into a backup set.
RMAN>backup nochecksum datafile 7;
will not check for physical corruption and
RMAN>backup check logical datafile 6;
Overview of BLock Media Recovery : BMR
–> BMR changes the granularity of a restore and recovery operation from the datafile to just the damaged blocks.
Advantage :-
–>The file does not have to be taken offline ; normal DML can continue.
–> The mean time to recover is much reduced, since only the damaged blocks are involved in the operation, not the whole file. The only downtime that will occur is if a session happens to hit a block that is actully damaged and has not yet been recovered.
Process :
- Rman will extract backups of these blocks from a backup set or an image copy and write them to the datafile.
- Then RMAN will pass through the archive logs generated since the backup and extract redo records relevant to the restored blocks and apply them.
- The recovery will always be complete – it would be logically impossible to do an incomplete recovery; incomplete recovery of just one block would leave the database in an inconsistent state.
- If a session hits a corruopted block before the BMR process has completed, then it will still receive an ORA-01578 error but it is quite possible that the BMR operation will be complete before any users are aware of the problem.
- BMR can be applied to any data block whatsoever: Unlike DBMS_REPAIR, it is not restircted to tables and indexes; LOB and UNDO segments can also be block recovered.
Block Recover Command :
The block recover command always specifies a list of one or more blocks to be restored and recoverd.
RMAN>blockrecover datafile 7 block 5;
RMAN to restore and recover the one specified block from the most recent backup set or image copy of the file
RMAN>blockrecover datafile 7 block 5,6,7 datafile 9 block 21,25;
There may be doubt regarding 7 block 5 from backupset 1093;
Will restore from the nominated backup set, which could also be specified by a tag:
RMAN>blockrecover datafile 7 block 5 from tag monthly_whole;
Instructs RMAn to restore and recover every block that has been discovered to be damaged by a previous backup operation, using only backups made at least one week ago.
Note :- BMR context, the keyword UNTIL does not denote an incomplete recovery. It means that the restore must be from a backup made before a particular date ( or sequence number or SCN).
Nice article! keep the good work.
Thanks Sir.