To detect duplicates before loading data into HeavyDB, perform the following steps. For this example, the files are labeled A, B, C ... Z.
- Load file A into the table MYTABLE.
- Run the following query.
select uniqueCol, count(*) as dups from MYTABLE group by uniqueCol having count(*) > 1;
There should be no rows returned; if rows are returned, file A contains duplicate values in uniqueCol.
- Load file B into the table TEMPTABLE.
- Run the following query.
select t1.uniqueCol from MYTABLE t1 join TEMPTABLE t2 on t1.uniqueCol = t2.uniqueCol;
There should be no rows returned if file B does not duplicate anything already in MYTABLE. If rows are returned, use the values listed to fix file B.
- Load the fixed B file into MYTABLE.
- Drop table TEMPTABLE.
- Repeat steps 3-6 for each remaining file in the set before loading its data into the real MYTABLE instance.
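
The steps above can be sketched end to end as follows. This is a minimal illustration, not HeavyDB-specific code: it uses Python's built-in sqlite3 module as a stand-in database so that it runs anywhere. The column names uniqueCol and payload, the helper function names, and the sample rows are all assumptions made for the example. Each file is staged in TEMPTABLE, checked for duplicates within itself and against MYTABLE, and copied across only when both checks return no rows.

```python
import sqlite3


def internal_duplicates(conn, table, col):
    """Return values of `col` that appear more than once in `table`."""
    # Table and column names are trusted identifiers in this sketch.
    sql = (f"SELECT {col} FROM {table} "
           f"GROUP BY {col} HAVING COUNT(*) > 1 ORDER BY {col}")
    return [row[0] for row in conn.execute(sql)]


def overlap(conn, main, temp, col):
    """Return values of `col` that appear in both `main` and `temp`."""
    sql = (f"SELECT DISTINCT t1.{col} FROM {main} t1 "
           f"JOIN {temp} t2 ON t1.{col} = t2.{col} ORDER BY t1.{col}")
    return [row[0] for row in conn.execute(sql)]


def stage_and_check(conn, rows):
    """Stage one file in TEMPTABLE and load it into MYTABLE only if clean.

    Returns the sorted list of offending key values (empty if loaded).
    """
    conn.execute("CREATE TABLE TEMPTABLE (uniqueCol INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO TEMPTABLE VALUES (?, ?)", rows)
    problems = sorted(
        set(internal_duplicates(conn, "TEMPTABLE", "uniqueCol"))
        | set(overlap(conn, "MYTABLE", "TEMPTABLE", "uniqueCol"))
    )
    if not problems:
        conn.execute("INSERT INTO MYTABLE SELECT * FROM TEMPTABLE")
    conn.execute("DROP TABLE TEMPTABLE")
    return problems


# Hypothetical sample data standing in for files A and B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (uniqueCol INTEGER, payload TEXT)")
file_a = [(1, "a"), (2, "b")]
file_b = [(2, "x"), (3, "y")]  # key 2 collides with file A

print("file A problems:", stage_and_check(conn, file_a))
print("file B problems:", stage_and_check(conn, file_b))
```

A file that reports problems is not loaded, so it can be corrected and passed to stage_and_check again, which mirrors the "fix B, then load the fixed B file" steps.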