Voltar ao site

Online purging historical data efficiently using a single command in Oracle Database

23 de fevereiro de 2020

Eventually, we are requested to delete some historical data from tables and if you are not aware of a few tricks you are probably in a huge mess. Deleting rows from a table generates a lot of undo and redo data. Undo data to support the rollback operation and redo data to guarantee the durability of your data and the undo data just generated for any recovery scenarios. When a lot of redo data is being generated means your database, that is expected to be in archive log mode, will generate a lot of archive log files. You might not be prepared for the amount of space required and so on... The space required for the archive log files will, obviously, depend on the amount of data you are deleting.

My point here is that, depending on your level of expertise, you probably know that, for example in a scenario you are keeping only 30% of the original data, you can CTAS just the portion of data you are keeping, drop the former table and rename it back to its original name. This way is much better and faster than a delete statement. I will not demonstrate that here as you can easily find articles on the web about that. If you do not know what I am talking about please read this one: Deleting many rows from a big table. Another option is dropping partitions if your table is partitioned, but partitioning requires Enterprise Edition and extra licensing. I am presenting here an operation that can be executed in any Oracle Database Enterprise Edition version 12.2 or above without the need of any extra license. Maybe by removing the "online" clause from the statement you can also run this in Standard Edition, but please check the licensing guide first and test.

First of all, let's connect to our database. Here I am using my Autonomous Transaction Processing Database TESTDB:

Let us check the size of the table we are working with:

A 12 GB table might be small comparing to your environment, but is large enough for my test here. It contains 128 million rows:

The data in it is just for testing purposes:

Let us say we want to keep only the current year data which is 2020 in the time of writing. We can check how many rows we have per year by grouping the data set by the order_date column and trunc the result by year:

If we had this table partitioned by year it would be easy to drop all the partitions older than 2020, but in this case the table is not partitioned.

With a simple command introduced in Oracle Database 12c Release 2 you can accomplish your task easily, online, with no need to rebuild the indexes:

By executing the same previous query we can confirm that only data from 2020 is kept:

Just one more count to confirm that this is all the data that is left in this table:

If you deleted the rows, despite all the undo data, redo data and archive log files that would be generated, the table would have all the 12GB still allocated, so this is a huge gain by running only one simple command:

Bear in mind that to perform any move operation you must have the free space required to accommodate the table that is being moved and all its indexes since Oracle copies the data to temporary tables and indexes in the background. If you are not aware of what I am talking about it is worth to read this one I wrote a couple of years ago (in Portuguese, but easily readable in English using Chrome translation feature): https://loredata.app/2017/08/10/o-que-acontece-quando-voce-faz-rebuild-online-do-indice/

I hope you liked it. Have fun on the next time you have to purge data, but remember to test it in a test environment before going to production! It is also worth to take a look at MOS for bugs and Oracle Docs related to that feature.

See you next time!

Franky