Hadoop - Installing Hive with Derby
Summary
These are my personal notes and summary of the steps to install Hive with Derby.
Prerequisites
- Download Hive from Apache (I am using Hive 2.0.0)
- Download Derby from Apache (I am using Derby 10.12.1.1)
- Make sure Java 1.7 or later is installed
- Hadoop is configured and working (I am using Hadoop 2.7.1)
Installing Hive
1. Move the Hive tarball to a directory. For this example, I created a folder /usr/local/hive for Hive
cp apache-hive-2.0.0-bin.tar.gz /usr/local/hive
2. Unpack Hive
cd /usr/local/hive
tar -xzvf apache-hive-2.0.0-bin.tar.gz
3. Set the Hive environment variables
You will need to set the following in the environment:
export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH
To export the Hive variables to the environment whenever a user logs in, do the following:
a. Create a /etc/profile.d/hive.sh
sudo vi /etc/profile.d/hive.sh
b. Add the following in /etc/profile.d/hive.sh
export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH
c. Source this file or re-login to set up the environment.
4. Next step, we will need to install Apache Derby
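Before moving on, a quick sanity check (assuming the profile script above has been created) confirms the Hive environment is in place:

```shell
# Load the profile script into the current shell and verify the variables.
source /etc/profile.d/hive.sh
echo "$HIVE_HOME"   # should print /usr/local/hive/apache-hive-2.0.0-bin
which hive          # should resolve to $HIVE_HOME/bin/hive
```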
Install Hive Metastore - Apache Derby
In this example, I will use Apache Derby as the Hive metastore.
1. Move the Derby tarball to a directory. For this example, I created a folder /usr/local/derby for Derby
cp db-derby-10.12.1.1-bin.tar.gz /usr/local/derby
2. Unpack Derby
cd /usr/local/derby
tar -xzvf db-derby-10.12.1.1-bin.tar.gz
3. Set the Derby environment variables
You will need to set the following:
export DERBY_HOME=/usr/local/derby/db-derby-10.12.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH
To export the Derby variables to the environment whenever a user logs in, do the following:
a. Create a /etc/profile.d/derby.sh
sudo vi /etc/profile.d/derby.sh
b. Add the following in /etc/profile.d/derby.sh
export DERBY_HOME=/usr/local/derby/db-derby-10.12.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH
export DERBY_OPTS="-Dderby.system.home=$DERBY_HOME/data"
c. Source this file or re-login to set up the environment.
4. Create a Metastore directory
Create a data directory to hold the Metastore
mkdir $DERBY_HOME/data
5. Derby configuration is complete. The next section shows how to start and stop Derby.
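Before starting the server, Derby's bundled sysinfo tool is a quick way to confirm the installation and the Java environment:

```shell
# sysinfo ships in $DERBY_HOME/bin, which is now on the PATH.
# It prints the Java and Derby versions plus the classpath in use.
sysinfo
```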
Start and Stop Derby
By default, Derby creates databases in the directory it was started from. That means if you start Derby in /tmp, it will use /tmp as the Derby system home and create the metastore there. For this example, we already set DERBY_OPTS with -Dderby.system.home=$DERBY_HOME/data, so we can start the Derby server from any directory and it will still use $DERBY_HOME/data as the system home.
Now you can start up Derby with
nohup startNetworkServer -h 0.0.0.0 &
To stop Derby, do
stopNetworkServer
Once Derby starts up successfully, we need to configure Hive to talk to it.
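Before configuring Hive, you can confirm the network server is reachable with Derby's NetworkServerControl tool:

```shell
# Ping the Derby network server; it prints a confirmation message if the
# server is up and listening on the given host and port.
NetworkServerControl ping -h localhost -p 1527
```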
Configure Hive with Derby
1. Go to the Hive configuration folder and create a hive-site.xml
cd $HIVE_HOME/conf
cp hive-default.xml.template hive-site.xml
2. Set the following properties in hive-site.xml (the same JDBC connection values used in jpox.properties below, with hostname replaced by the host running the Derby network server). During my installation, these properties already existed in hive-default.xml.template, so search for them and update the values.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hostname:1527/metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
</property>
3. Create $HIVE_HOME/conf/jpox.properties
vi $HIVE_HOME/conf/jpox.properties
4. Add the following to jpox.properties (replace hostname with the host running the Derby network server)
javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://hostname:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine
5. Copy the following Derby client libraries to the Hive library folder
cp $DERBY_HOME/lib/derbyclient.jar $HIVE_HOME/lib
cp $DERBY_HOME/lib/derbytools.jar $HIVE_HOME/lib
6. Hive configuration is complete. Now, we need to set up HDFS for Hive to use.
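With the client jars in place, Derby's ij tool (also in $DERBY_HOME/bin) gives a quick way to confirm the metastore database is reachable over JDBC; replace hostname with the host running the Derby network server:

```shell
# Open a JDBC connection to the metastore database and list open connections.
ij <<'EOF'
connect 'jdbc:derby://hostname:1527/metastore_db;create=true';
show connections;
exit;
EOF
```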
Configure Hadoop HDFS for HIVE
Hive needs the following HDFS folders in order to run. To create them, do the following:
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
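You can verify the directories and their permissions afterwards:

```shell
# List the directories themselves (-d) to confirm they exist and
# carry the group write bit set above.
$HADOOP_HOME/bin/hadoop fs -ls -d /tmp /user/hive/warehouse
```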
Test and Run Hive
All configuration is now complete. You can test Hive by starting the Hive CLI and running a simple query.
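A minimal smoke test, as a sketch: on Hive 2.x the metastore schema may need to be initialized once with schematool before first use (depending on whether schema auto-creation is enabled), after which the Hive CLI should be able to create and list a table (hive_test below is just an illustrative table name):

```shell
# One-time schema initialization for the Derby metastore (Hive 2.x).
$HIVE_HOME/bin/schematool -dbType derby -initSchema

# Create a throwaway table and list tables to confirm Hive, the
# metastore, and HDFS are all working together.
$HIVE_HOME/bin/hive -e "CREATE TABLE IF NOT EXISTS hive_test (id INT); SHOW TABLES;"
```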
Reference
1. https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
2. https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode
3. https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive