Converting Columns to Rows
Sometimes data is entered in columns but what you really want is to rearrange that data into rows instead.
There is a method to transpose data in Vertica using SQL, but it is a bit tedious.
Say we have the following data in a Vertica database table named TEST:
name | c1 | c2 | c3
-----+----+----+----
PENG | 1 | 5 | 7
TOM | 5 | 3 | 2
JIM | 2 | 4 | 6
(3 rows)
How would we calculate the average value of columns C1, C2 and C3 for each row in the table? Maybe something like the following query will help…
dbadmin-> FROM test
dbadmin-> GROUP
dbadmin-> BY name;
ERROR 3457: Function AVG(int, int, int) does not exist, or permission is denied for AVG(int, int, int)
HINT: No function matches the given name and argument types. You may need to add explicit type casts
Oops, the AVG() function can only accept one argument as it works to average row values, not multiple column values.
So we need to somehow convert the columns C1, C2 and C3 into rows so that the AVG() function can do its thing!
Oracle 11g has a great feature in the UNPIVOT operator that can be used to convert column-based data into separate rows.
Example from Oracle:
NAME C1 C2 C3
----- ---------- ---------- ----------
JIM 2 4 6
PENG 1 5 7
TOM 5 3 2
3 rows selected.
SQL> SELECT name,
2 AVG(value)
3 FROM test
4 UNPIVOT
5 EXCLUDE NULLS
6 (VALUE FOR PROPERTY IN (c1, c2, c3))
7 GROUP
8 BY name
9 ORDER
10 BY name;
NAME AVG(VALUE)
----- ----------
JIM 4
PENG 4.33333333
TOM 3.33333333
3 rows selected.
But how can we pull this feat off in Vertica where the UNPIVOT operator is non-existent? Well, it’s a little more tedious, but it is possible.
Example in Vertica:
name | c1 | c2 | c3
------+----+----+----
PENG | 1 | 5 | 7
TOM | 5 | 3 | 2
JIM | 2 | 4 | 6
(3 rows)
dbadmin=> SELECT foo3.name,
dbadmin-> foo2.c_avg
dbadmin-> FROM (SELECT name, rn, AVG(c) c_avg
dbadmin(> FROM (SELECT name,
dbadmin(> ROW_NUMBER()OVER
dbadmin(> (ORDER BY c1, c2, c3) rn,
dbadmin(> c1 c
dbadmin(> FROM test
dbadmin(> UNION ALL
dbadmin(> SELECT name,
dbadmin(> ROW_NUMBER()OVER
dbadmin(> (ORDER BY c1, c2, c3) rn,
dbadmin(> c2 c
dbadmin(> FROM test
dbadmin(> UNION ALL
dbadmin(> SELECT name,
dbadmin(> ROW_NUMBER()OVER
dbadmin(> (ORDER BY c1, c2, c3) rn,
dbadmin(> c3 c
dbadmin(> FROM test) foo
dbadmin(> GROUP BY name,
dbadmin(> rn) foo2
dbadmin-> JOIN (SELECT name,
dbadmin-> ROW_NUMBER() OVER
dbadmin-> (ORDER BY c1, c2, c3) rn,
dbadmin-> c1, c2, c3
dbadmin-> FROM test) foo3
dbadmin-> ON foo3.name = foo2.name
dbadmin-> AND foo3.rn = foo2.rn
dbadmin-> ORDER
dbadmin-> BY foo3.name,
dbadmin-> foo3.rn;
name | c_avg
------+------------------
JIM | 4
PENG | 4.33333333333333
TOM | 3.33333333333333
(3 rows)
The core of the column to row conversion is handled by the virtual table aliased as FOO. Each SELECT represents one of the columns to be converted to a row, and they are glued together via the UNION ALL statements. If we add a new column to the TEST table and want it included in the average calculation, we’ll have to add another SELECT to the FOO table.
Instead of retyping that long SQL command every time the data changes and we need to recalculate the average, we can create a database VIEW.
dbadmin=> SELECT foo3.name,
dbadmin-> foo2.c_avg
dbadmin-> FROM (SELECT name, rn, AVG(c) c_avg
dbadmin(> FROM (SELECT name,
dbadmin(> ROW_NUMBER()OVER
dbadmin(> (ORDER BY c1, c2, c3) rn,
dbadmin(> c1 c
dbadmin(> FROM test
dbadmin(> UNION ALL
dbadmin(> SELECT name,
dbadmin(> ROW_NUMBER()OVER
dbadmin(> (ORDER BY c1, c2, c3) rn,
dbadmin(> c2 c
dbadmin(> FROM test
dbadmin(> UNION ALL
dbadmin(> SELECT name,
dbadmin(> ROW_NUMBER()OVER
dbadmin(> (ORDER BY c1, c2, c3) rn,
dbadmin(> c3 c
dbadmin(> FROM test) foo
dbadmin(> GROUP BY name,
dbadmin(> rn) foo2
dbadmin-> JOIN (SELECT name,
dbadmin-> ROW_NUMBER() OVER
dbadmin-> (ORDER BY c1, c2, c3) rn,
dbadmin-> c1, c2, c3
dbadmin-> FROM test) foo3
dbadmin-> ON foo3.name = foo2.name
dbadmin-> AND foo3.rn = foo2.rn
dbadmin-> ORDER
dbadmin-> BY foo3.name,
dbadmin-> foo3.rn;
CREATE VIEW
Querying the VIEW is a lot simpler!
OUTPUT
--------
1
(1 row)
dbadmin=> SELECT * FROM test_col_avg_vw;
name | c1 | c2 | c3 | col_avg
--------+----+----+----+------------------
JIM | 2 | 4 | 6 | 4
PATRICK | 4 | 9 | 3 | 5.33333333333333
PENG | 1 | 5 | 7 | 4.33333333333333
TOM | 5 | 3 | 2 | 3.33333333333333
(4 rows)
Have fun!
No comments yet... Be the first to leave a reply!