Sunday, April 26, 2009

RMAN: All Backups Are Not Created Equal

I came across an interesting behavior exhibited by RMAN in a recent test I was conducting, and I thought I would share it here. The scenario is explained with a sample below.

The test was conducted in an Oracle 11g single-instance environment running on OEL 4, but I verified that the behavior is the same in 10g as well.

Here is a simple question: Will the size of a level 1 incremental be bigger than a FULL backup?

Answer: It depends. If the volume of change after the full backup is greater than the size of the full backup, then the incremental backup will be bigger, even though the incremental is a "delta".

But if I were to ask the question a little differently (will a "LEVEL 1 INCREMENTAL" backup without a prior level 0 backup be equal in size to a "FULL" backup?), then the answer most probably would be: YES.

The assumption is that a FULL backup is nothing but a LEVEL 0 backup, and that in the absence of a LEVEL 0 backup, an incremental LEVEL 1 backup acts as a LEVEL 0 backup and so exhibits the same behavior. In other words, the "FULL BACKUP" and the "INCREMENTAL LEVEL 1" should be of the same size.

I was surprised to learn that this is not the case *always*.

We all know that the behavior of incremental backups changes based on the compatibility setting (>= 10.0 or lower). Essentially, a "FULL" backup is not very useful in an incremental backup strategy, so it is important to take level 0 and level 1 backups to maintain a viable incremental backup and recovery strategy. A level 1 backup without a level 0 is akin to a FULL backup, with the advantage of fitting into an implementable incremental backup strategy. However, a lack of understanding of the behavior of the incremental backup options can have serious consequences at implementation time.
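As a sketch, a minimal viable incremental strategy looks like this (the tags and schedule are illustrative assumptions, run from an RMAN session connected to the target):

```rman
# Periodic base backup (the only valid parent for subsequent level 1 backups):
BACKUP INCREMENTAL LEVEL 0 DATABASE TAG 'weekly_lvl0';

# Differential level 1: blocks changed since the most recent level 0 or 1:
BACKUP INCREMENTAL LEVEL 1 DATABASE TAG 'daily_lvl1';

# Cumulative level 1: all blocks changed since the last level 0:
BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE TAG 'daily_lvl1_cum';
```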

Check this out.

create tablespace test_5mb_tbs
datafile '/oradata2/OSI11/tbs_tbs_tbs_1.dbf' size 5M
extent management local;

11:12:07 SQL> create table rman_bkup_test(col1 number)
11:12:54 2 tablespace test_5mb_tbs;

Table created.

select count(extent_id)ext_id ,segment_name
from dba_extents
where segment_name='RMAN_BKUP_TEST' group by segment_name
11:16:41 SQL> /

EXT_ID SEGMENT_NAME
------ ---------------
1 RMAN_BKUP_TEST

11:16:41 SQL> declare
11:17:00 2 begin
11:17:02 3 for x in 1..50000 loop
11:17:07 4 insert into rman_bkup_test(col1) values (x);
11:17:16 5 end loop;
11:17:18 6 end;
11:17:20 7 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:07.50
11:17:28 SQL> get afiedt.buf
1 select count(extent_id)ext_id ,segment_name
2 from dba_extents
3* where segment_name='RMAN_BKUP_TEST' group by segment_name
11:17:41 SQL> /

EXT_ID SEGMENT_NAME
------ ---------------
11 RMAN_BKUP_TEST

Running the PL/SQL block above fills the table out to 11 extents.

*All the RMAN output is edited for conciseness*

RMAN> backup full tablespace test_5mb_tbs tag='full_backup';

Starting backup at 24-APR-09
piece handle=/oradata3/OSI11/BACKUP/17kd94q0_1_1 tag=FULL_BACKUP

RMAN> backup incremental level 1 tablespace test_5mb_tbs tag='inc_lvl1';

piece handle=/oradata3/OSI11/BACKUP/18kd94sr_1_1 tag=INC_LVL1

[ora11rac@linux1 BACKUP]$ ls -lrt
total 1636
-rw-r--r-- 1 ora11rac dbarac11 113 Apr 16 16:47 afiedt.buf
-rw-r----- 1 ora11rac dbarac11 843776 Apr 24 11:19 17kd94q0_1_1
-rw-r----- 1 ora11rac dbarac11 819200 Apr 24 11:21 18kd94sr_1_1

As we can see, the full backup and a level 1 incremental backup without a level 0 base result in backup pieces of approximately the same size.
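Rather than eyeballing ls output, the piece sizes can also be read from the controlfile views; a sketch, assuming the standard V$BACKUP_PIECE columns:

```sql
-- Compare backup piece sizes recorded in the controlfile (sketch).
SELECT tag, handle, bytes, completion_time
FROM   v$backup_piece
WHERE  status = 'A'
ORDER  BY completion_time;
```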

11:35:39 SQL> delete from rman_bkup_test where rownum <48001;
11:35:58 SQL> commit;

RMAN> backup full tablespace test_5mb_tbs tag='full_aft_del';

piece handle=/oradata3/OSI11/BACKUP/19kd96ep_1_1 tag=FULL_AFT_DEL comment=NONE

About 48K rows get deleted and a full backup is taken again. Before each backup, I delete the backup sets from the previous iteration; I do this for every iteration throughout this exercise.

[ora11rac@linux1 BACKUP]$ ls -lt
total 832
-rw-r----- 1 ora11rac dbarac11 843776 Apr 24 11:47 19kd96ep_1_1

12:02:31 SQL> alter table rman_bkup_test enable row movement;
12:02:43 SQL> alter table rman_bkup_test shrink space;
RMAN> backup full tablespace test_5mb_tbs tag='full_aft_shrink';

piece handle=/oradata3/OSI11/BACKUP/1akd97e9_1_1

[ora11rac@linux1 BACKUP]$ ls -lrt
total 632
-rw-r--r-- 1 ora11rac dbarac11 113 Apr 16 16:47 afiedt.buf
-rw-r----- 1 ora11rac dbarac11 638976 Apr 24 12:04 1akd97e9_1_1

After the shrink, the full backup results in a 638,976-byte backup piece instead of the 843,776 bytes from before.

RMAN> backup incremental level 1 tablespace test_5mb_tbs tag='level1_aft_shrink ';

piece handle=/oradata3/OSI11/BACKUP/1bkd97hd_1_1
[ora11rac@linux1 BACKUP]$ ls -lrt
total 1436
-rw-r--r-- 1 ora11rac dbarac11 113 Apr 16 16:47 afiedt.buf
-rw-r----- 1 ora11rac dbarac11 638976 Apr 24 12:04 1akd97e9_1_1
-rw-r----- 1 ora11rac dbarac11 819200 Apr 24 12:06 1bkd97hd_1_1

However, when I take an incremental level 1 instead, even though it is effectively a full backup with the added advantage of qualifying as part of an incremental backup strategy, its size does not change from the first test. Notice that this size matches the size of the initial level 1 (acting as a level 0), taken before the 48K rows were deleted.
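To dig into why the sizes differ, the per-datafile block counts are worth comparing; a sketch against the standard V$BACKUP_DATAFILE view:

```sql
-- Blocks read versus blocks written for each backup of the datafile (sketch).
SELECT file#, incremental_level, blocks_read, blocks, datafile_blocks,
       completion_time
FROM   v$backup_datafile
WHERE  file# > 0
ORDER  BY completion_time;
```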

RMAN> backup incremental level 0 tablespace test_5mb_tbs tag='level0_aft_shrink';

piece handle=/oradata3/OSI11/BACKUP/1ckd9e5s_1_1 tag=LEVEL0_AFT_SHRINK comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 24-APR-09

[ora11rac@linux1 BACKUP]$ ls -l
total 632
-rw-r----- 1 ora11rac dbarac11 638976 Apr 24 13:59 1ckd9e5s_1_1

An incremental level 0 backup produces a backup piece of a size similar to the FULL backup taken after the shrink.

To summarize: though a level 1 without a level 0 is similar to a full backup and can be part of an incremental backup strategy, various factors influence the size of, and the time taken to perform, the level 1 backup. Understanding the implications of the various backup types is critical when implementing a backup strategy.

It is important to implement an incremental backup strategy with a level 0 base backup, even though a level 1 could (indirectly) be used as a starting point.


As always, comments/suggestions are appreciated.

Tuesday, February 24, 2009

A Surprising Little Change in 11g NetCA

I was working on Oracle 11g, trying to troubleshoot some weird X Windows issues that were totally unrelated to the actual project I was involved in.

The issue was that the Oracle binaries had been moved from LOC-A to LOC-B. Things kept working as usual, as this is normally no big deal. But after a few days we wanted to run NetCA, and the GUI would not start. Since the environment is handled entirely via remote VNC servers, it was initially thought that the issue had to do with the DISPLAY setup and such. Troubleshooting the DISPLAY settings turned up nothing.

Then came the idea to look into the netca script itself.

Here is where I found an interesting change.

In 11g, the ORACLE_HOME variable is hard-coded based on the initial installation location of the binaries, like this:

Open the netca script in vi and you will see:

ORACLE_HOME=/apps/ora11rac/product/11.1/db_1

However, I checked a 10g binary installation, and that one goes after the $ORACLE_HOME environment variable instead, which is good:


10g


case $ORACLE_HOME in
"") echo "****ORACLE_HOME environment variable not set!"
echo " ORACLE_HOME should be set to the main"
echo " directory that contains Oracle products."
echo " Set and export ORACLE_HOME, then re-run."
exit 1;;
esac

Friday, February 20, 2009

Row Exclusive in Oracle Database 11g. Interesting Changes.

While doing some random testing in 11g (11.1.0.6), I came across an interesting situation. I am still investigating and looking for answers, but I thought it was worth discussing here. Perhaps someone can shed some light on this change.

We know how row exclusive mode works up until 10g; to illustrate, I have a sample test case below.


SID 136 is the blocker that holds table "my_all_table" in exclusive mode.

SID 132 and 130 are requesting exclusive access and are waiting for 136 to release its lock.

1* lock table my_all_table in row exclusive mode
SQL> /

Table(s) Locked.

SQL> select sys_context('USERENV','SID'),sys_context('USERENV','session_user') from dual;

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
SYS_CONTEXT('USERENV','SESSION_USER')
--------------------------------------------------------------------------------
136
GVAIDHYA

SQL> select * from v$lock where sid in (132,136,130);


ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
53132010 53132028 136 TM 52553 0 6 0 2797 1
531320BC 531320D4 132 TM 52553 0 0 3 2614 0
53132168 53132180 130 TM 52553 0 0 3 81 0

SQL> select sid,serial#,blocking_session,username from v$session where blocking_session is not null;


SID SERIAL# BLOCKING_SESSION USERNAME
---------- ---------- ---------------- ------------------------------
130 68 137 GVAIDHYA
132 362 137 GVAIDHYA


The output of the above SQL shows the blocking session as 137, one higher than the actual blocker (136); this is due to a bug in 10gR2 that is fixed in 10.2.0.4. My 10g test instance is a 2-node RAC. The bug details are here.


Now, the same scenario was played out in an 11g database. This is a single-instance database, but that is irrelevant to our discussion.


SQL> @mysid

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
132

SQL> lock table my_all_table in row exclusive mode;

Table(s) Locked.

SQL> sho user
USER is "GVAIDHYA"
SQL> select sysdate from dual;

SYSDATE
---------
20-FEB-09


SQL> lock table my_all_table in row exclusive mode;

Table(s) Locked.

SQL> @mysid

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
129


My SIDs, blocker and waiter, are 132 and 129 respectively. However, the difference in 11g compared to 10g is that the second session is not waiting! In other words, you could not tell which one is the blocker and which the waiter, as the two sessions are moving along just fine.


SQL> select * from v$Lock where sid in (select sid from v$session where username='GVAIDHYA');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
6A7CA568 6A7CA594 129 AE 99 0 4 0 886 0
6A7CA9F0 6A7CAA1C 132 AE 99 0 4 0 850 0
B7E2464C B7E2467C 132 TM 57448 0 3 0 822 0
B7E2464C B7E2467C 129 TM 57448 0 3 0 871 0

SQL> select sid,serial#,blocking_session from v$session where username = 'GVAIDHYA';

SID SERIAL# BLOCKING_SESSION
---------- ---------- ----------------
129 18
132 38

1* select object_name, object_id from dba_objects where object_id in (99,57448)
SQL> /

OBJECT_NAME OBJECT_ID
------------------------------ ----------
ORA$BASE 99
MY_ALL_TABLE 57448


A look at v$lock shows that both sessions have locked the object (my_all_table, object_id 57448) in Row-X mode (LMODE=3). That is a big change from 10g, where the blocker held it in LMODE=6 and the waiters were just waiting (LMODE=0). Interestingly, there is a new twist: there are two more entries for the same SIDs, pointing to object_id 99, which is held in share mode. Also notice that the "BLOCK" column is all zeroes.

What is this change and why?

There is a new lock TYPE called "AE". I could not find much detail on it, but it is associated with object ID 99, and looking up that object name points to a new object called ORA$BASE. Which raises an interesting question: what is ORA$BASE? Based on an article on Alex's blog, it seems this object maintains version/patchset-related data dictionary details, which sounds intriguing. I had wondered how Oracle was able to pull off "online patching", though it is said this only applies to one-off and diagnostic patches. Based on the hint from Alex's blog, it seems Oracle maintains multiple versions (probably 2) of the dictionary views: one for the pre-patch state and one for the post-patch state.

Though OBJECT_TYPE lists the same values as in the previous release's documentation, there is a new entry called "EDITION" for ORA$BASE. There is also a new column called "EDITION_NAME", which is said to be reserved for future use.
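The edition objects can be listed straight from the dictionary; a sketch (on my 11.1.0.6 instance I would expect ORA$BASE to be the only one):

```sql
-- List edition objects known to the dictionary (11g).
SELECT owner, object_name, object_id, status
FROM   dba_objects
WHERE  object_type = 'EDITION';
```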

Yet the question remains: why is my row exclusive lock not showing up as LMODE 6 in v$lock, and how is it that changes are allowed from both sessions to the same object while I clearly had it locked in exclusive mode? I am still trying to find the answer. If you have any ideas, please share them with me.

Friday, January 23, 2009

SQL Tuning & Troubleshooting

Recently, an interesting performance issue cropped up at one of my clients' sites. I thought I would share the issue, the troubleshooting, and the takeaways with you.

It was stated that a job that used to run exceptionally well was taking a long time, which was surprising considering that it is a simple DELETE statement. The interesting part was that the COMMIT issued after the DELETE was what took a long time.

Below is a screenshot.

08:35:30 SQL> DELETE FROM apc.prep
08:35:38 2 WHERE event_id = '122608T09' AND lead_or_assoc = 'A';
0 rows deleted.
Elapsed: 00:00:00.03
08:35:40 SQL> commit;
Commit complete.
Elapsed: 00:00:32.89

That is 32 seconds for a commit of essentially "0" rows! Something was up.

Tanel Poder discussed a while ago the discipline of methodical, common-sense tuning and troubleshooting; he even has a nice presentation on the topic. This is one of those classic examples.

Instead of looking into I/O issues, redo logs, log file sync waits, and so on, I decided to try a simple delete/commit on a sample table. It worked just fine. So I concluded that something about this particular object was the root cause.

Looking to see whether there were any triggers on the object, I did find one. Great; perhaps this was the reason for the slowdown. But upon looking at the code for the trigger...

CREATE OR REPLACE TRIGGER APC.prep_sheet_audit
BEFORE INSERT OR UPDATE
ON apc.prep
FOR EACH ROW
BEGIN
IF INSERTING
THEN
:NEW.added_by := SUBSTR (USER, 1, 10);
:NEW.date_added := SYSDATE;
:NEW.date_last_modified := SYSDATE;
:NEW.last_modified_by := SUBSTR (USER, 1, 10);
:NEW.status_flag := 'A';
END IF;
END;

...it fires only on inserts and updates, and we are doing a delete. So I decided to take a 10046 trace to see what was happening.
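For reference, the trace was taken along these lines (a sketch; the tracefile identifier and tkprof options are illustrative assumptions):

```sql
-- Tag the trace file so it is easy to find, then enable extended SQL trace
-- (10046 level 12 = waits + binds) for this session only.
ALTER SESSION SET tracefile_identifier = 'slow_commit';
ALTER SESSION SET events '10046 trace name context forever, level 12';

DELETE FROM apc.prep
 WHERE event_id = '122608T09' AND lead_or_assoc = 'A';
COMMIT;

ALTER SESSION SET events '10046 trace name context off';
-- Then, from the OS: tkprof <udump trace file> out.prf sys=no sort=exeela
```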

Upon reviewing the tkprof'd trace, I came across some interesting statements.

delete from "APC"."VW_APC_LEAD_STYLE"

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      7.07      15.39       1464       8703     515264     120497
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      7.07      15.39       1464       8703     515264     120497

INSERT /*+ */ INTO "APC"."VW_APC_LEAD_STYLE"("EVENT_ID","MARKET_CODE",
"PAGE_ID","PAGE_SECTION_CODE","STYLE_NUM","LEAD_OR_ASSOC"
) SELECT
<.....sniped .....>
"PREP_SHEET"."DESCRIPTION",MAX("PREP_SHEET"."BLOCK_NUMBER") FROM
"APC"."PREP_SHEET" "PREP_SHEET" WHERE "PREP_SHEET"."LEAD_OR_ASSOC"<>'A'
GROUP BY "PREP_SHEET"."EVENT_ID","PREP_SHEET"."MARKET_CODE",
"PREP_SHEET"."PAGE_ID","PREP_SHEET"."PAGE_SECTION_CODE",
<...>

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.02       0.00          0         34          0          0
Execute      1      7.68      17.72      63327      64462     388792     120497
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      7.70      17.72      63327      64496     388792     120497

The total elapsed time of these two statements is approximately 32 seconds.

Recall the elapsed time of the commit (pasted again below):

08:35:30 SQL> DELETE FROM apc.prep
08:35:38 2 WHERE event_id = '122608T09' AND lead_or_assoc = 'A';
0 rows deleted.
Elapsed: 00:00:00.03
08:35:40 SQL> commit;
Commit complete.
Elapsed: 00:00:32.89

So these are the culprits. But a view (based on the VW prefix in the name)? What is this? Where did it come from?

The delete followed by an insert gave me an idea that this may not be an ordinary view but perhaps a Materialized View.

Upon checking the DBA_OBJECTS,

SQL> select owner, object_name,object_id,object_type from dba_objects where object_name='VW_APC_LEAD_STYLE';

OWNER  OBJECT_NAME        OBJECT_ID  OBJECT_TYPE
------ ------------------ ---------- -------------------
APC    VW_APC_LEAD_STYLE       50243  TABLE
APC    VW_APC_LEAD_STYLE       50245  MATERIALIZED VIEW

With the existence of a materialized view confirmed, the last piece of the puzzle was to make sure that the materialized view was created on the base table "PREP_SHEET". Though this definitely had to be the case, we needed proof.

So upon checking the DDL for the materialized view,

CREATE MATERIALIZED VIEW APC.VW_APC_LEAD_STYLE
BUILD IMMEDIATE
USING INDEX
TABLESPACE APC_DATA
REFRESH COMPLETE ON COMMIT
WITH PRIMARY KEY
AS
select EVENT_ID,
<....sniped...>
max(BLOCK_NUMBER) as BLOCK_NUMBER
from apc.prep_sheet
where LEAD_OR_ASSOC != 'A'
GROUP BY
<......sniped.....>
DESCRIPTION;

It was now proved that the reason the commit took all that time, with no rows getting deleted, was not a database performance problem but the work done behind the scenes for objects built on the table.

Later on, this issue started throwing ORA-603 errors like the one below.

ksedmp: internal or fatal error
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4080], [1], [131], [], [], [], [], []
Current SQL statement for this session:
COMMIT
Checking the alert.log confirmed
Errors in file /dkha3090/ora01/app/oracle/admin/apcu/udump/apcu_ora_10641626.trc:
ORA-00600: internal error code, arguments: [4080], [1], [131], [], [], [], [], []
Mon Jan 19 16:54:37 2009
Following on-commit snapshots not refreshed :
APC.VW_APC_LEAD_STYLE
Error 600 trapped in 2PC on transaction 4.31.61701. Cleaning up.
Error stack returned to user:
ORA-00600: internal error code, arguments: [4080], [1], [131], [], [], [], [], []
Mon Jan 19 16:54:37 2009
Errors in file /dkha3090/ora01/app/oracle/admin/apcu/udump/apcu_ora_10641626.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4080], [1], [131], [], [], [], [], []
Mon Jan 19 16:54:41 2009
DISTRIB TRAN APCU.WORLD.b6b63999.4.31.61701
is local tran 4.31.61701 (hex=04.1f.f105)
insert pending collecting tran, scn=83987547675 (hex=13.8e0c461b)
Mon Jan 19 16:54:42 2009
DISTRIB TRAN APCU.WORLD.b6b63999.4.31.61701
is local tran 4.31.61701 (hex=04.1f.f105))
delete pending collecting tran, scn=83987547675 (hex=13.8e0c461b)

which led us to note Bug 1385495

With the materialized view changed to refresh ON DEMAND, the problem was resolved.
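The fix itself is essentially a one-liner, with the refresh moved out of the OLTP path; a sketch:

```sql
-- Stop refreshing at commit time; refresh on our own schedule instead.
ALTER MATERIALIZED VIEW apc.vw_apc_lead_style REFRESH ON DEMAND;

-- Later, outside the OLTP path ('C' = complete refresh, as the MV is defined):
BEGIN
  DBMS_MVIEW.REFRESH('APC.VW_APC_LEAD_STYLE', method => 'C');
END;
/
```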

Takeaways:

1. A simple, logical approach to performance troubleshooting always works.

2. Naming conventions are very important. Generally, VW is used to refer to views, though this is not written in stone. A prefix of MVIEW would have helped us instantly recognize that this was a materialized view rather than a view, though it was identified quickly anyway.

3. Someone once said that a 10046 trace is a treasure trove. That still holds.

Wednesday, January 14, 2009

OBI Repository - Incremental Change Maintenance

I recently came across an interesting challenge: incrementally merging changes into the OBI repository. Since this is part of ongoing production support and maintenance, the scenario is all too common: migrate changes from DEV -> PROD (I'm skipping the QA/TEST environments for brevity). I'm sure you get the point.

PROD is live and well. Changes are being made to the development environment as part of ongoing maintenance and enhancements, and now the DEV changes need to be rolled into production without disrupting the production repository.

I browsed around to see whether this had been addressed in forums and blogs. Though I came across a couple of articles, each had some issue or other. One suggested a way to add only totally new information. In another, the delta resulted in duplicate entries of previously existing information, requiring a manual cleanup; that is far from ideal, especially if your repository (rpd) is big and complex.

I found a tip in the OBIEE Forums where Madan has an elegant solution that works, though Oracle warns that the option is deprecated. That solution has been explained with nice screenshots here.

Here I address the problem using the supported "merge repository" feature. Hopefully you find this helpful in some way. I did the test in a fairly complex setting, but for illustration purposes I show a very simple example scenario below.

We are well aware of the MERGE feature in the OBI Administration tool, and of the Original, Modified, and Current repositories and what they stand for. In my opinion, these terms are confusing and do not convey clearly what they really mean. For instance, is the Original repository the one I am "currently" using in production, or is that the Current repository? Does the Current repository refer to the freshly updated DEV repository? Or is that the Original repository, since it has all the updates? Or should we say the "true original", since the changes not yet reflected in PROD disqualify it from being the "Original"? I am going to coin new names for the repositories in question that I think are instantly recognizable.

Current Repository – hereafter referred to as the "Target Repository": the repository that is actively being used in production. In other words, the target repository is the one that needs to be incrementally updated.

Original Repository – hereafter referred to as the "Updated Repository": the repository (usually in DEV/QA) that has all the release updates that should eventually be applied to production (i.e., the Target Repository).

Modified Repository – hereafter referred to as the "Copy of Updated Repository": a plain copy of the Updated Repository. It is this copy that I use as the modified repository, which actually makes sense.

In my example, I take the simplest case of a target repository in which a table SINGLE_TABLE_TEST exists with one column (COL1). A change has to be made by adding a DESCRIPTION column to the table. Once this has been done in the Updated Repository, the Target Repository needs to be updated.






Open the Target Repository in Offline mode.






This screenshot indicates that the Target Repository currently has just COL1 as part of the SINGLE_TABLE_TEST structure.



Select the Merge option from the File menu and open the "Updated Repository" (sh_obiee_usage_track – Copy.rpd in my example).




In the Merge Repositories window, select the copy of the Updated Repository for the "Modified Repository" option. Once you are done, you will see the screen below.




This screen shows the pieces in all three repository copies. In the middle pane, the default description is "Deleted from Current". Select "Current" as the decision option.




Once you click "Merge", you will see that the merged repository has just the incremental updates added to the target repository. By simply backing up the target repository and renaming the merged repository to the target name, we now have the incremental updates in production. In my testing, even the session and environment variables remained unchanged.

If you have any updates/comments/links to enhance this note please feel free to let me know.

Friday, January 9, 2009

RMAN Improvements Over Versions

Back in the days of 9iR2, RMAN was getting very popular, but there were features where end-user requirements were not met to their fullest extent. Over the next releases Oracle improved on these issues, and RMAN has become a favorite of mine these days.

One of the biggest improvements in RMAN came in Oracle 10g with the introduction of block change tracking. However, there was another feature that left us longing for more: the KEEP FOREVER feature that lets us keep a backup forever (or for as long as we desire), so to speak.
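Block change tracking itself is enabled with a single statement; a sketch (the tracking-file path is an illustrative assumption):

```sql
-- Track changed blocks so level 1 backups scan only what has changed.
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
  USING FILE '/oradata2/OSI11/bct.chg';

-- Verify it is active:
SELECT status, filename FROM v$block_change_tracking;
```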

In Oracle 9i (9.2.0.7 to be precise), I tried to use this feature to retain my Oracle ERP 11i backups until critical phases of the project were complete, say CRP II or UAT. In 9i, when retention was set to a number of redundant copies and a copy of the database backup was meant to be kept forever, we ran into an issue where the newest nth backup would not actually happen.
In other words, if the redundancy was set to, say, 5, and the first backup was "kept forever", then the 6th backup was not actually retained for restore; instead, the backup commands completed successfully without the physical backup taking place. The only option, as I recall, was to keep incrementing the redundancy until the "keep forever" backup expired, then do catalog maintenance to clean up subsequent backups before returning to the original redundancy requirement. To say the least, this was not very helpful.

Then in Oracle 10g, I was delighted to see this issue resolved, but one last nagging issue still remained with long-term backups: the requirement that the backup be in a consistent state in order to have the archived logs deleted. If the backup was taken in an inconsistent state, Oracle kept every single archived log for recovery. I used to wonder why this requirement was enforced, since an inconsistent backup, when used for a restore, will be restored to a point in time or SCN and opened with RESETLOGS. In that case, there is no need to keep "all" the archives right up to the present minute.

Finally, this critical issue has been answered in Oracle 11g.

Oracle 11g has introduced a new feature, a "restore point" in RMAN backups, wherein a restore point that tracks the last SCN of a long-term inconsistent backup is maintained. Gone is the "keep forever logs|nologs" syntax; instead we use "keep forever restore point XXX". This gives us the ability to delete all the archived logs generated after the long-term backup, saving a lot of space that would otherwise be needed to retain all those archived logs.

So with 11g, we can now take an inconsistent long-term backup with no extra storage requirement for all the archived logs generated. That's great!

Below is the RMAN output for a 10g and an 11g "keep forever" backup, respectively.


10g Keep Forever Backup

RMAN> backup full database tag='full_keep_forever' keep forever logs;

Starting backup at 04-DEC-08
using channel ORA_DISK_1
using channel ORA_DISK_2
backup will never be obsolete
archived logs required to recover from this backup will expire when this backup expires
skipping datafile 6; already backed up 2 time(s)
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00004 name=C:\ORACLE\PRODUCT\ORADATA\O10203\USERS01.DBF
input datafile fno=00008 name=C:\ORACLE\PRODUCT\ORADATA\O10203\RMAN_TEST_TBS01.DBF
input datafile fno=00005 name=C:\ORACLE\PRODUCT\ORADATA\O10203\USERS02.DBF
channel ORA_DISK_1: starting piece 1 at 04-DEC-08
channel ORA_DISK_2: starting full datafile backupset
channel ORA_DISK_2: specifying datafile(s) in backupset
input datafile fno=00001 name=C:\ORACLE\PRODUCT\ORADATA\O10203\SYSTEM01.DBF
input datafile fno=00002 name=C:\ORACLE\PRODUCT\ORADATA\O10203\UNDOTBS01.DBF
input datafile fno=00003 name=C:\ORACLE\PRODUCT\ORADATA\O10203\SYSAUX01.DBF
input datafile fno=00007 name=C:\ORACLE\PRODUCT\ORADATA\O10203\NEW_RW_TBS01.DBF
channel ORA_DISK_2: starting piece 1 at 04-DEC-08
channel ORA_DISK_2: finished piece 1 at 04-DEC-08
piece handle=F:\RMANBACKUP\E3K1DTCM_1_1 tag=FULL_KEEP_FOREVER comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:03:21
channel ORA_DISK_1: finished piece 1 at 04-DEC-08
piece handle=F:\RMANBACKUP\E2K1DTCK_1_1 tag=FULL_KEEP_FOREVER comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:03:38
Finished backup at 04-DEC-08

Starting Control File and SPFILE Autobackup at 04-DEC-08
piece handle=C:\ORACLE\PRODUCT\DB102\DATABASE\C-4123959065-20081204-01 comment=NONE
Finished Control File and SPFILE Autobackup at 04-DEC-08

11g Keep Forever Backup

RMAN> backup full database tag='full_keep_forever' keep forever restore point abc;

Starting backup at 04-DEC-08
current log archived

using channel ORA_DISK_1
backup will never be obsolete
archived logs required to recover from this backup will be backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=C:\ORACLE\PRODUCT\ORADATA\ORA11\SYSAUX01.DBF
input datafile file number=00001 name=C:\ORACLE\PRODUCT\ORADATA\ORA11\SYSTEM01.DBF
input datafile file number=00003 name=C:\ORACLE\PRODUCT\ORADATA\ORA11\UNDOTBS01.DBF
input datafile file number=00004 name=C:\ORACLE\PRODUCT\ORADATA\ORA11\USERS01.DBF
input datafile file number=00005 name=C:\ORACLE\PRODUCT\ORADATA\ORA11\EISCAT01.DBF
input datafile file number=00006 name=C:\ORACLE\PRODUCT\ORADATA\ORA11\HYPUSER01.DBF
input datafile file number=00007 name=C:\ORACLE\PRODUCT\ORADATA\ORA11\PROD_SRC01.DBF
channel ORA_DISK_1: starting piece 1 at 04-DEC-08
channel ORA_DISK_1: finished piece 1 at 04-DEC-08
piece handle=C:\ORACLE\PRODUCT\11G\DB_1\DATABASE\11K1DTMI_1_1 tag=FULL_KEEP_FOREVER comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:02:58


current log archived
using channel ORA_DISK_1
backup will never be obsolete
archived logs required to recover from this backup will be backed up
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=191 RECID=31 STAMP=672593814
channel ORA_DISK_1: starting piece 1 at 04-DEC-08
channel ORA_DISK_1: finished piece 1 at 04-DEC-08
piece handle=C:\ORACLE\PRODUCT\11G\DB_1\DATABASE\12K1DTSQ_1_1 tag=FULL_KEEP_FOREVER comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

using channel ORA_DISK_1
backup will never be obsolete
archived logs required to recover from this backup will be backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 04-DEC-08
channel ORA_DISK_1: finished piece 1 at 04-DEC-08
piece handle=C:\ORACLE\PRODUCT\11G\DB_1\DATABASE\13K1DTT9_1_1 tag=FULL_KEEP_FOREVER comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

using channel ORA_DISK_1
backup will never be obsolete
archived logs required to recover from this backup will be backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 04-DEC-08
channel ORA_DISK_1: finished piece 1 at 04-DEC-08
piece handle=C:\ORACLE\PRODUCT\11G\DB_1\DATABASE\14K1DTTN_1_1 tag=FULL_KEEP_FOREVER comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 04-DEC-08

RMAN>