Sunday, January 15, 2017

Hadoop - Installing Hive with Derby

Summary


This post summarizes the steps I followed to install Hive with Apache Derby as its metastore.

Prerequisites


  • Download Hive from Apache (I am using Hive 2.0.0)
  • Download Derby from Apache (I am using Derby 10.12.1.1)
  • Make sure Java 1.7 or later is installed
  • Make sure Hadoop is configured and working (I am using Hadoop 2.7.1)
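
A quick way to confirm the prerequisites before starting is to check that the required tools are on the PATH. The sketch below is only illustrative: have_cmd is a hypothetical helper name, and it assumes java and hadoop are the command names on your system.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command on the PATH.
# (have_cmd is a hypothetical helper, not part of Hive or Hadoop.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report each prerequisite instead of stopping at the first missing one.
for tool in java hadoop; do
    if have_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done

# When the tools are present, these print their versions:
#   java -version
#   hadoop version
```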

Installing Hive


1. Move the Hive installer to a directory. For this example, I created the folder /usr/local/hive for Hive.

cp apache-hive-2.0.0-bin.tar.gz /usr/local/hive

2. Unpack Hive

cd /usr/local/hive
tar -xzvf apache-hive-2.0.0-bin.tar.gz

3. Set the Hive environment variables

You will need to set the following variables in the environment:

export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH

To export these variables automatically whenever a user logs in, do the following:

a. Create /etc/profile.d/hive.sh

sudo vi /etc/profile.d/hive.sh

b. Add the following to /etc/profile.d/hive.sh

export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH

c. Source this file, or log out and back in, to set up the environment.

4. Next, install Apache Derby.
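
Step 3 can also be done without opening an editor. The following is a minimal sketch, assuming the same install path as above. write_hive_profile is a hypothetical helper name, and the target file is passed in so the function can be tried on a scratch file before touching /etc/profile.d.

```shell
#!/bin/sh
# write_hive_profile FILE: write the two Hive exports from step 3 into FILE.
# (Hypothetical helper; the install path matches the one used in this post.)
write_hive_profile() {
    # The quoted 'EOF' keeps $HIVE_HOME and $PATH unexpanded in the file,
    # so they are evaluated at login time instead of now.
    cat > "$1" <<'EOF'
export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH
EOF
}

# Try it on a scratch file first and inspect the result.
scratch=$(mktemp)
write_hive_profile "$scratch"
cat "$scratch"
```

Once the content looks right, it can be copied into place with readable permissions, for example with sudo install -m 644 "$scratch" /etc/profile.d/hive.sh.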

Installing the Hive Metastore - Apache Derby


In this example, I use Apache Derby as the Hive metastore.

1. Move the Derby installer to a directory. For this example, I created the folder /usr/local/derby for Derby.

cp db-derby-10.12.1.1-bin.tar.gz /usr/local/derby

2. Unpack Derby

cd /usr/local/derby
tar -zxvf db-derby-10.12.1.1-bin.tar.gz

3. Set the Derby environment variables

You will need to set the following variables in the environment:

export DERBY_HOME=/usr/local/derby/db-derby-10.12.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH
export DERBY_OPTS="-Dderby.system.home=$DERBY_HOME/data"

To export these variables automatically whenever a user logs in, do the following:

a. Create /etc/profile.d/derby.sh

sudo vi /etc/profile.d/derby.sh

b. Add the following to /etc/profile.d/derby.sh

export DERBY_HOME=/usr/local/derby/db-derby-10.12.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH
export DERBY_OPTS="-Dderby.system.home=$DERBY_HOME/data"


c. Source this file, or log out and back in, to set up the environment.

4. Create a Metastore directory

Create a data directory to hold the Metastore

mkdir $DERBY_HOME/data

5. Derby configuration is complete. The next section explains how to start and stop Derby.
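
Before starting Derby, it can help to confirm that the variables from step 3 are actually set in the current shell and that the data directory from step 4 exists. A small sketch follows; require_env is a hypothetical helper, not a Derby tool.

```shell
#!/bin/sh
# require_env NAME...: succeed only if every named variable is set and
# non-empty. (Hypothetical helper for illustration.)
require_env() {
    status=0
    for name in "$@"; do
        # eval reads the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            status=1
        fi
    done
    return "$status"
}

if require_env DERBY_HOME DERBY_OPTS; then
    # Check the metastore directory created in step 4.
    if [ -d "$DERBY_HOME/data" ]; then
        echo "data directory present: $DERBY_HOME/data"
    else
        echo "data directory missing: $DERBY_HOME/data"
    fi
fi
```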


Start and Stop Derby


By default, Derby creates databases in the directory it was started from. That means if you start Derby from /tmp, it uses /tmp as the Derby system home and creates the metastore there. In this example, DERBY_OPTS is already set with -Dderby.system.home=$DERBY_HOME/data, so Derby can be started from any directory and will still use $DERBY_HOME/data as the system home.
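
To see which system home a given DERBY_OPTS value points at, the option can be pulled out of the string. This is a sketch only: derby_system_home is a hypothetical helper, and it assumes the path contains no spaces.

```shell
#!/bin/sh
# derby_system_home: print the value of -Dderby.system.home found in
# $DERBY_OPTS, or nothing if the option is absent.
# (Hypothetical helper; assumes the path has no embedded spaces.)
derby_system_home() {
    printf '%s\n' "${DERBY_OPTS:-}" |
        sed -n 's/.*-Dderby\.system\.home=\([^ ]*\).*/\1/p'
}

home=$(derby_system_home)
if [ -n "$home" ]; then
    echo "Derby will create databases under: $home"
else
    echo "derby.system.home is not set; Derby will use the directory it is started from"
fi
```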

Now you can start Derby with:

nohup startNetworkServer -h 0.0.0.0 &

To stop Derby, run:

stopNetworkServer

Once Derby starts successfully, the next step is to configure Hive to talk to it.
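
Because the server is started in the background, it is not immediately obvious when it is ready. One way to check is Derby's NetworkServerControl ping command. The retry wrapper below is a sketch: wait_until is a hypothetical helper, and localhost and port 1527 (Derby's default) are assumptions about where the server listens.

```shell
#!/bin/sh
# wait_until TRIES CMD...: run CMD up to TRIES times, one second apart,
# and succeed as soon as CMD succeeds. (Hypothetical helper.)
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # Pause only if another attempt remains.
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Example: wait up to 10 attempts for the Derby network server to answer.
# Left commented out so the snippet is safe to paste on a machine without Derby.
#   wait_until 10 NetworkServerControl ping -h localhost -p 1527 \
#       && echo "Derby is up"
```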

Configure Hive with Derby

1. Go to the Hive configuration folder and create hive-site.xml

cd $HIVE_HOME/conf
cp hive-default.xml.template hive-site.xml

2. Set the following in hive-site.xml. During my installation, these properties already existed in hive-default.xml.template, so search for them and edit their values.
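
Based on the jpox.properties values in step 4, the properties to look for are most likely javax.jdo.option.ConnectionURL and javax.jdo.option.ConnectionDriverName; treat that as an assumption rather than a definitive list. Since hive-site.xml is large, a small search helper makes them easier to find. show_property is a hypothetical name, and the two lines of trailing context assume the value sits directly below the name element.

```shell
#!/bin/sh
# show_property FILE NAME: print the numbered line holding the <name>
# element for NAME in FILE, plus the two lines after it, where the
# <value> element normally sits. (Hypothetical helper for illustration.)
show_property() {
    # -F treats the pattern as a fixed string, so dots in NAME are literal.
    grep -n -F -A 2 "<name>$2</name>" "$1"
}

# Usage, once hive-site.xml exists:
#   show_property "$HIVE_HOME/conf/hive-site.xml" javax.jdo.option.ConnectionURL
#   show_property "$HIVE_HOME/conf/hive-site.xml" javax.jdo.option.ConnectionDriverName
```

The values to set are presumably the same ones that appear in jpox.properties: the jdbc:derby:// URL for the Derby server and org.apache.derby.jdbc.ClientDriver as the driver.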

3. Create $HIVE_HOME/conf/jpox.properties

vi $HIVE_HOME/conf/jpox.properties

4. Add the following to jpox.properties, replacing hostname with the name of the host that runs the Derby network server

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://hostname:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine

5. Copy the following files to the Hive library folder

cp $DERBY_HOME/lib/derbyclient.jar $HIVE_HOME/lib
cp $DERBY_HOME/lib/derbytools.jar $HIVE_HOME/lib

6. Hive configuration is complete. Next, set up the HDFS directories that Hive uses.
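
Two quick checks can catch common slips in the steps above: a key that appears twice in jpox.properties (in a Java properties file the later entry silently wins) and a client jar that did not get copied. The sketch below uses hypothetical helper names, duplicate_keys and missing_files, and assumes simple key=value lines without continuation backslashes.

```shell
#!/bin/sh
# duplicate_keys FILE: print each key that occurs more than once in a
# key=value properties file, ignoring comments and blank lines.
# (Hypothetical helper.)
duplicate_keys() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
        cut -d= -f1 | sort | uniq -d
}

# missing_files DIR NAME...: print every NAME that is not a regular file
# in DIR. (Hypothetical helper.)
missing_files() {
    dir=$1
    shift
    for name in "$@"; do
        [ -f "$dir/$name" ] || echo "$name"
    done
}

# Usage, once the files from steps 3 to 5 are in place:
#   duplicate_keys "$HIVE_HOME/conf/jpox.properties"
#   missing_files "$HIVE_HOME/lib" derbyclient.jar derbytools.jar
```

Both commands print nothing when everything is in order.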

Configure Hadoop HDFS for Hive


Hive needs the following HDFS folders in order to run. To create them, run:

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
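
Re-running a plain -mkdir on a directory that already exists reports an error, so a repeatable version of these commands checks first. This is a sketch under assumptions: ensure_hdfs_dir is a hypothetical helper, and HADOOP is an illustrative variable that defaults to the launcher path used above.

```shell
#!/bin/sh
# HADOOP is an illustrative variable (not from the original steps) naming
# the hadoop launcher; it defaults to the path used in the commands above.
HADOOP="${HADOOP:-${HADOOP_HOME:-}/bin/hadoop}"

# ensure_hdfs_dir DIR: create DIR in HDFS if it is absent, then make it
# group-writable. (Hypothetical helper.)
ensure_hdfs_dir() {
    if "$HADOOP" fs -test -d "$1"; then
        echo "exists: $1"
    else
        "$HADOOP" fs -mkdir -p "$1" || return 1
        echo "created: $1"
    fi
    "$HADOOP" fs -chmod g+w "$1"
}

# Usage on a machine with a running HDFS:
#   ensure_hdfs_dir /tmp
#   ensure_hdfs_dir /user/hive/warehouse
```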


Test and Run Hive

All configuration should now be complete, and Hive is ready to test.
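
A minimal smoke test, assuming hive is on the PATH and the Derby server is running, is to ask Hive to list its databases and look for the built-in default database. This is only a sketch of one possible check; the helper name output_has_line is hypothetical.

```shell
#!/bin/sh
# output_has_line WORD CMD...: run CMD and succeed if its standard output
# contains a line that is exactly WORD. (Hypothetical helper.)
output_has_line() {
    word=$1
    shift
    # Fail early if the command itself fails.
    out=$("$@") || return 1
    printf '%s\n' "$out" | grep -q -x -F "$word"
}

# Usage, with Derby running and hive on the PATH:
#   output_has_line default hive -e 'SHOW DATABASES;' && echo "Hive is working"
```

If the check fails, running hive -e 'SHOW DATABASES;' directly shows the error messages, which usually point at the metastore connection.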




Reference

1. https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
2. https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode
3. https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive

