Friday, February 26, 2016

Hive: Auto Increment column


In this post I will explain how I generated a unique, incrementing ID in Hive.

Hive lets you create custom UDFs (user-defined functions) to solve problems of this type. As Hive is written in Java, UDFs need to be written in Java as well.

While researching Hive, I found that there is a jar called hive-contrib-1.1.0.jar that provides a row sequence UDF. To use it, you need to add that jar to your Hive session.

In my case, I copied the jar into HDFS and then loaded it in Hive like this:

add jar hdfs:///user/jars/hive-contrib-1.1.0.jar;
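As a rough sketch of the copy step, the Hive CLI can run HDFS commands itself through its dfs command. The local path /tmp/hive-contrib-1.1.0.jar is an assumption made for illustration; the jar typically ships in the lib directory of the Hive installation, so adjust the path to your setup.

```sql
-- Run from the Hive CLI before the ADD JAR statement.
-- The local source path below is hypothetical.
dfs -mkdir -p /user/jars;
dfs -put /tmp/hive-contrib-1.1.0.jar /user/jars/;

-- Confirm the jar is where ADD JAR expects it.
dfs -ls /user/jars;

-- After ADD JAR has run, list the jars registered in the current session.
LIST JARS;
```

If the jar does not show up in the LIST JARS output, the function registration in the next step will fail because the class cannot be found.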

Now you can register the UDF as a row_sequence() function, which will generate the auto-increment ID:

CREATE TEMPORARY FUNCTION row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
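A temporary function only exists in the session that created it, so both the jar and the function have to be registered again in every new session. The sketch below shows one way to check the registration and, as an alternative to the temporary function on Hive 0.13 or later, to register a permanent function instead. It assumes the jar is still at the HDFS path used above.

```sql
-- Show the description bundled with the UDF class.
DESCRIBE FUNCTION row_sequence;

-- Alternative (Hive 0.13+): a permanent function stored in the metastore.
-- Use this instead of CREATE TEMPORARY FUNCTION, not in addition to it.
CREATE FUNCTION row_sequence
  AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence'
  USING JAR 'hdfs:///user/jars/hive-contrib-1.1.0.jar';
```

A permanent function belongs to the database that was current when it was created, so from other databases it has to be called with the database name as a prefix.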

You can then call the function in a query to generate the unique identifier:

CREATE TABLE IF NOT EXISTS users
(
ID int,
name  varchar(60)
) row format delimited fields terminated by '\073' stored as textfile;

INSERT OVERWRITE TABLE users SELECT row_sequence(), name FROM users;
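One thing worth checking afterwards: row_sequence() keeps its counter inside each task, so if the job is split across several mappers the generated values may repeat. The queries below are a minimal sketch of how to verify the result, followed by an alternative that uses Hive's built-in ROW_NUMBER() windowing function (Hive 0.11 or later) and does not need the contrib jar. Ordering by name is an arbitrary choice made for the example.

```sql
-- total_rows and distinct_ids should match if the generated IDs are unique.
SELECT COUNT(*)           AS total_rows,
       COUNT(DISTINCT id) AS distinct_ids,
       MIN(id)            AS first_id,
       MAX(id)            AS last_id
FROM users;

-- Alternative sketch: number the rows with a window function instead.
INSERT OVERWRITE TABLE users
SELECT ROW_NUMBER() OVER (ORDER BY name) AS id, name
FROM users;
```

The window function numbers all rows in a single global ordering, which avoids duplicates but can be slow on very large tables.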


If you found this post useful, you may also be interested in this related post: Analyse Tweets using Flume, Hadoop and Hive

 
