Converting Columns to Rows

cols2rowsSometimes data is entered in columns but what you really want is to rearrange that data into rows instead.

There is a method to transpose data in Vertica using SQL, but it is a bit tedious.

Say we have the following data in a Vertica database table named TEST:

dbadmin=> SELECT * FROM test;
name | c1 | c2 | c3
-----+----+----+----
PENG |  1 |  5 |  7
TOM  |  5 |  3 |  2
JIM  |  2 |  4 |  6
(3 rows)

How would we calculate the average value of columns C1, C2 and C3 for each row in the table? Maybe something like the following query will help…

dbadmin=> SELECT AVG(C1, C2, C3)
dbadmin->   FROM test
dbadmin->  GROUP
dbadmin->     BY name;
ERROR 3457:  Function AVG(int, int, int) does not exist, or permission is denied for AVG(int, int, int)
HINT:  No function matches the given name and argument types. You may need to add explicit type casts

Oops, the AVG() function can only accept one argument as it works to average row values, not multiple column values.

So we need to somehow convert the columns C1, C2 and C3 into rows so that the AVG() function can do its thing!

Oracle 11g has a great feature in the UNPIVOT operator that can be used to convert column-based data into separate rows.

Example from Oracle:

SQL> SELECT * FROM test;
NAME          C1         C2         C3
----- ---------- ---------- ----------
JIM            2          4          6
PENG           1          5          7
TOM            5          3          2
3 rows selected.
SQL> SELECT name,
2           AVG(value)
3      FROM test
4      UNPIVOT
5        EXCLUDE NULLS
6         (VALUE FOR PROPERTY IN (c1, c2, c3))
7     GROUP
8        BY name
9     ORDER
10       BY name;
NAME  AVG(VALUE)
----- ----------
JIM            4
PENG  4.33333333
TOM   3.33333333
3 rows selected.

But how can we pull this feat off in Vertica where the UNPIVOT operator is non-existent? Well, it’s a little more tedious, but it is possible.

Example in Vertica:

dbadmin=> SELECT * FROM test;
name | c1 | c2 | c3
------+----+----+----
PENG |  1 |  5 |  7
TOM  |  5 |  3 |  2
JIM  |  2 |  4 |  6
(3 rows)
dbadmin=> SELECT foo3.name,
dbadmin->        foo2.c_avg
dbadmin->   FROM (SELECT name, rn, AVG(c) c_avg
dbadmin(>           FROM (SELECT name,
dbadmin(>                        ROW_NUMBER()OVER
dbadmin(>                          (ORDER BY c1, c2, c3) rn,
dbadmin(>                        c1 c
dbadmin(>                   FROM test
dbadmin(>                  UNION ALL
dbadmin(>                 SELECT name,
dbadmin(>                        ROW_NUMBER()OVER
dbadmin(>                          (ORDER BY c1, c2, c3) rn,
dbadmin(>                        c2 c
dbadmin(>                   FROM test
dbadmin(>                  UNION ALL
dbadmin(>                 SELECT name,
dbadmin(>                        ROW_NUMBER()OVER
dbadmin(>                          (ORDER BY c1, c2, c3) rn,
dbadmin(>                        c3 c
dbadmin(>                   FROM test) foo
dbadmin(>           GROUP BY name,
dbadmin(>                    rn) foo2
dbadmin->   JOIN (SELECT name,
dbadmin->                ROW_NUMBER() OVER
dbadmin->                  (ORDER BY c1, c2, c3) rn,
dbadmin->                c1, c2, c3
dbadmin->           FROM test) foo3
dbadmin->     ON foo3.name = foo2.name
dbadmin->    AND foo3.rn = foo2.rn
dbadmin->  ORDER
dbadmin->     BY foo3.name,
dbadmin->        foo3.rn;
name |      c_avg
------+------------------
JIM  |                4
PENG | 4.33333333333333
TOM  | 3.33333333333333
(3 rows)

The core of the column to row conversion is handled by the virtual table aliased as FOO. Each SELECT represents one of the columns to be converted to a row, and they are glued together via the UNION ALL statements. If we add a new column to the TEST table and want it included in the average calculation, we’ll have to add another SELECT to the FOO table.

Instead of retyping that long SQL command every time the data changes and we need to recalculate the average, we can create a database VIEW.

dbadmin=> CREATE OR REPLACE VIEW test_col_avg_vw AS
dbadmin=>  SELECT foo3.name,
dbadmin->         foo2.c_avg
dbadmin->    FROM (SELECT name, rn, AVG(c) c_avg
dbadmin(>            FROM (SELECT name,
dbadmin(>                         ROW_NUMBER()OVER
dbadmin(>                           (ORDER BY c1, c2, c3) rn,
dbadmin(>                         c1 c
dbadmin(>                    FROM test
dbadmin(>                   UNION ALL
dbadmin(>                  SELECT name,
dbadmin(>                         ROW_NUMBER()OVER
dbadmin(>                           (ORDER BY c1, c2, c3) rn,
dbadmin(>                         c2 c
dbadmin(>                    FROM test
dbadmin(>                   UNION ALL
dbadmin(>                  SELECT name,
dbadmin(>                         ROW_NUMBER()OVER
dbadmin(>                           (ORDER BY c1, c2, c3) rn,
dbadmin(>                         c3 c
dbadmin(>                    FROM test) foo
dbadmin(>            GROUP BY name,
dbadmin(>                     rn) foo2
dbadmin->    JOIN (SELECT name,
dbadmin->                 ROW_NUMBER() OVER
dbadmin->                   (ORDER BY c1, c2, c3) rn,
dbadmin->                 c1, c2, c3
dbadmin->            FROM test) foo3
dbadmin->      ON foo3.name = foo2.name
dbadmin->     AND foo3.rn = foo2.rn
dbadmin->   ORDER
dbadmin->      BY foo3.name,
dbadmin->         foo3.rn;
CREATE VIEW

Querying the VIEW is a lot simpler!

dbadmin=> INSERT INTO test VALUES ('PATRICK', 4, 9, 3);
OUTPUT
--------
1
(1 row)
dbadmin=> SELECT * FROM test_col_avg_vw;
name    | c1 | c2 | c3 |     col_avg
--------+----+----+----+------------------
JIM     |  2 |  4 |  6 |                4
PATRICK |  4 |  9 |  3 | 5.33333333333333
PENG    |  1 |  5 |  7 | 4.33333333333333
TOM     |  5 |  3 |  2 | 3.33333333333333
(4 rows)

Have fun!

Twitter Digg Delicious Stumbleupon Technorati Facebook Email

No comments yet... Be the first to leave a reply!

Leave a Reply

You must be logged in to post a comment.