Recursive SQL for Data Normalisation


Recursive SQL can be awesome, although a bit hard to read in its SQL standard beauty. Let’s assume you have some aggregated data with dates and a number of events per date:

|                           DATE | COUNT |
|--------------------------------|-------|
| October, 01 2013 00:00:00+0000 |     2 |
| October, 02 2013 00:00:00+0000 |     1 |
| October, 03 2013 00:00:00+0000 |     3 |
| October, 04 2013 00:00:00+0000 |     4 |
| October, 05 2013 00:00:00+0000 |     2 |
| October, 06 2013 00:00:00+0000 |     0 |
| October, 07 2013 00:00:00+0000 |     2 |

Now let’s assume you want to normalise or “unaggregate” this data, generating “COUNT” records per date. The desired output is this:

|                           DATE | EVENT_NUMBER |
|--------------------------------|--------------|
| October, 01 2013 00:00:00+0000 |            1 |
| October, 01 2013 00:00:00+0000 |            2 |
| October, 02 2013 00:00:00+0000 |            1 |
| October, 03 2013 00:00:00+0000 |            1 |
| October, 03 2013 00:00:00+0000 |            2 |
| October, 03 2013 00:00:00+0000 |            3 |
| October, 04 2013 00:00:00+0000 |            1 |
| October, 04 2013 00:00:00+0000 |            2 |
| October, 04 2013 00:00:00+0000 |            3 |
| October, 04 2013 00:00:00+0000 |            4 |
| October, 05 2013 00:00:00+0000 |            1 |
| October, 05 2013 00:00:00+0000 |            2 |
| October, 07 2013 00:00:00+0000 |            1 |
| October, 07 2013 00:00:00+0000 |            2 |

As you may have noticed, there are no records for those dates with zero events (October 06). With recursive SQL, this is rather simple to achieve.

with recursive

-- Data could also be a regular table containing
-- the actual data
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
),

-- This is the recursive common table expression
-- It starts with all data where count > 0
-- ... and then recurses by subtracting one
recurse(date, count) as (
  select date, count
  from data
  where count > 0
  union all
  select date, count - 1
  from recurse
  where count > 1
)
select date, count event_number from recurse
order by date asc, event_number asc;

See also this SQLFiddle to see the above CTE in action.

Incredibly, Oracle’s CONNECT BY clause doesn’t seem to be an option here. I challenge you to find a better solution, though! For instance, this beautiful solution that works with PostgreSQL:

with recursive
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
)
select date, generate_series(1, count) event_number
from data
where count > 0
order by date asc, event_number asc;

Tags: , , , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,844 other followers

%d bloggers like this: