Thursday, March 23, 2017

Hive - Hive metastore database is not initialized

Problem


I encountered the following error when I tried to start up Hive

Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)

Solution


a. Delete the existing $DERBY_HOME/data/metastore_db directory with the rm command

b. Stop the Derby server

c. Run schematool -initSchema -dbType derby
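Putting those steps together, a minimal sketch (assuming Derby is running as a network server and the metastore lives at $DERBY_HOME/data/metastore_db, as in my setup below):

rm -rf $DERBY_HOME/data/metastore_db      # remove the stale metastore
stopNetworkServer                          # stop the Derby network server
schematool -initSchema -dbType derby       # recreate the metastore schema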

Tuesday, February 7, 2017

Hive - Permission denied when trying to start Hive with other users

Problem


I had installed Hive, and the Hive CLI worked for the user who installed it. However, when I tried to start the Hive CLI with another user, it failed with:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
    ... 8 more
Caused by: java.io.IOException: Permission denied
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createTempFile(File.java:2001)
    at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)

Solution


Usually, this is due to a permission issue with the Hive scratch directory (hive.exec.local.scratchdir) defined in hive-site.xml. You could use chmod to change the directory permission directly, or configure hive.scratch.dir.permission in hive-site.xml to another value (the default is 700).
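For example, a sketch of both options (the /tmp/hive path and the 733 mode are assumptions; use whatever your hive.exec.local.scratchdir resolves to and a permission appropriate for your environment):

# Option 1: open up the existing local scratch directory directly
chmod 733 /tmp/hive

# Option 2: set hive.scratch.dir.permission (default 700) in hive-site.xml,
# e.g. to 733, so Hive creates per-user scratch directories other users can use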

Sunday, January 15, 2017

Hadoop - Installing Hive with Derby

Summary


This is my personal experience and a summary of the steps to install Hive with Derby.

Pre-requisite


  • Download HIVE from Apache (I am using Hive 2.0.0)
  • Download Derby from Apache (I am using Derby 10.12.1.1)
  • Make sure Java 1.7 is installed
  • Hadoop is configured and working (I am using Hadoop 2.7.1)

Installing Hive


1. Move the Hive installer to a directory. For this example, I created a folder /usr/local/hive for Hive

cp apache-hive-2.0.0-bin.tar.gz /usr/local/hive

2. Unpack Hive

tar -xzvf apache-hive-2.0.0-bin.tar.gz

3. Set Hive environment variable

You will need to set the following in the environment

export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH

So, you can do the following to export the Hive variables to the environment when a user logs in

a. Create /etc/profile.d/hive.sh

sudo vi /etc/profile.d/hive.sh

b. Add the following in /etc/profile.d/hive.sh

export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH

c. Source this file or re-login to set up the environment.

4. Next, we will need to install Apache Derby

Install Hive Metastore - Apache Derby


In this example, I will use Apache Derby as the Hive metastore

1. Move the Derby installer to a directory. For this example, I created a folder /usr/local/derby for Derby

cp db-derby-10.12.1.1-bin.tar.gz /usr/local/derby

2. Unpack Derby

tar -zxvf db-derby-10.12.1.1-bin.tar.gz

3. Set Derby environment variable

You will need to set the following
export DERBY_HOME=/usr/local/derby/db-derby-10.12.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH

So, you can do the following to export the Derby variables to the environment when a user logs in

a. Create /etc/profile.d/derby.sh

sudo vi /etc/profile.d/derby.sh

b. Add the following in /etc/profile.d/derby.sh

export DERBY_HOME=/usr/local/derby/db-derby-10.12.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH
export DERBY_OPTS="-Dderby.system.home=$DERBY_HOME/data"


c. Source this file or re-login to set up the environment.

4. Create a Metastore directory

Create a data directory to hold the Metastore

mkdir $DERBY_HOME/data

5. Derby configuration is complete. The next section shows how to start and stop Derby.


Start and Stop Derby


By default, Derby creates databases in the directory it was started from; that means if you start Derby from /tmp, it will use /tmp as the Derby system home and create the metastore there. For this example, we have already set DERBY_OPTS with -Dderby.system.home=$DERBY_HOME/data. This means we can start the Derby server from any directory and it will still use $DERBY_HOME/data as the system home.

Now you can start up Derby with
nohup startNetworkServer -h 0.0.0.0 &
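To confirm that the server is actually listening (1527 is Derby's default port), you can ping it with the NetworkServerControl script that ships in the Derby bin directory:

NetworkServerControl ping -p 1527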

To stop Derby, do

stopNetworkServer

Once you are able to start up Derby, we need to configure Hive to talk to it.

Configure Hive with Derby

1. Go to the Hive configuration folder and create hive-site.xml

$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml

2. Add the following to hive-site.xml. During my installation, these variables already existed in hive-default.xml.template, so search for them and edit them in place.
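The exact property values from the original post did not survive formatting. For a Derby network-server metastore like this one, the properties typically involved are the JDO connection settings and the warehouse directory; a sketch (replace hostname with your Derby host, and keep the values consistent with jpox.properties below):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hostname:1527/metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>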










3. Create $HIVE_HOME/conf/jpox.properties

vi  $HIVE_HOME/conf/jpox.properties

4. Add the following to jpox.properties

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://hostname:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine

5. Copy the following files to the Hive library folder

cp $DERBY_HOME/lib/derbyclient.jar $HIVE_HOME/lib
cp $DERBY_HOME/lib/derbytools.jar $HIVE_HOME/lib

6. Hive configuration is complete. Now, we need to set up HDFS for Hive to use.

Configure Hadoop HDFS for HIVE


Hive needs the following HDFS folders in order to run. To create them, do the following:

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse


Test and Run Hive

All configuration should now be complete. You can test Hive with the following.
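The test commands from the original post did not survive formatting. A minimal sketch of a first test (the schema initialization is the one-time schematool step mentioned at the top of this blog; the table name is arbitrary):

$HIVE_HOME/bin/schematool -initSchema -dbType derby   # one-time metastore schema creation
$HIVE_HOME/bin/hive                                   # start the Hive CLI
hive> CREATE TABLE test_hive (id INT);
hive> SHOW TABLES;
hive> DROP TABLE test_hive;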




Reference

1. https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
2. https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode
3. https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive


Sunday, November 27, 2016

Linux - Locking a file

You can use the following commands to lock a file on Linux:


#Open myfile as file descriptor 2
$ exec 2>myfile
#lock the file descriptor 2
$ flock -x 2

And release the lock with the following

$ exec 2>&-

Then, you can show which processes are holding the file open with the following command

lsof | grep myfile

or

lsof myfile

It could return something like

bash        132  userX    2wW   REG    8,2    0 135612 myfile

The following is an excerpt from the lsof man page describing the FD column


FD
is the File Descriptor number of the file or:
cwd current working directory; Lnn library references (AIX); err FD information error (see NAME column); jld jail directory (FreeBSD); ltx shared library text (code and data); Mxx hex memory-mapped type number xx. m86 DOS Merge mapped file; mem memory-mapped file; mmap memory-mapped device; pd parent directory; rtd root directory; tr kernel trace file (OpenBSD); txt program text (code and data); v86 VP/ix mapped file;
FD is followed by one of these characters, describing the mode under which the file is open:
r for read access;
w for write access;
u for read and write access;
space if mode unknown and no lock character follows;
'-' if mode unknown and lock character follows.
The mode character is followed by one of these lock characters, describing the type of lock applied to the file:
N for a Solaris NFS lock of unknown type;
r for read lock on part of the file;
R for a read lock on the entire file;
w for a write lock on part of the file;
W for a write lock on the entire file;
u for a read and write lock of any length;
U for a lock of unknown type;
x for an SCO OpenServer Xenix lock on part of the file;
X for an SCO OpenServer Xenix lock on the entire file;
space if there is no lock.

So, if the output is 2wW, it means File Descriptor 2 is open for write access and has a write lock on the entire file.
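As a side note, a more common pattern is to take the lock on a dedicated descriptor inside a subshell rather than reusing stderr; a sketch (the lock file path and descriptor number are arbitrary choices):

(
  flock -x 9 || exit 1        # block until we hold an exclusive lock on fd 9
  echo "doing protected work" # critical section runs while the lock is held
) 9>/tmp/myfile.lock          # fd 9 refers to the lock file for the whole subshell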

Sunday, October 16, 2016

Oracle - Unable to extend index SYS.I_OBJ1 by 8 in tablespace SYSTEM

If you see the following similar error message

unable to extend index SYS.I_OBJ1 by 8 in tablespace SYSTEM

It means that the tablespace is out of space, and you should increase its size if possible.

You can issue the following command to check the current size of the tablespace

select * from dba_data_files where tablespace_name='SYSTEM';

It will return the path and the current size of the tablespace



The above shows that I have only about 650 MB in the tablespace.

Then, depending on your current size, you could issue the following:

alter database datafile '/u01/app/oracle/oradata/XE/system.dbf' resize 1024M;

Note: The above example resizes my datafile to 1024M. You should modify the path and size according to your needs.
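Alternatively, if you would rather not keep resizing manually, the datafile can be set to grow on demand (a sketch; the increment and maximum size are examples only):

alter database datafile '/u01/app/oracle/oradata/XE/system.dbf'
  autoextend on next 100M maxsize 2048M;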

Reference:

1. http://www.markcallen.com/oracle/ora-01654-unable-to-extend-index/

Thursday, September 22, 2016

Tomcat - Setting up SSL with self-signed certificate

The Tomcat documentation (SSL How To) provides a detailed explanation of how to set up Tomcat with SSL using a self-signed certificate.

Below are the essential steps required for Linux

1. Prepare the certificate key store

Run the command

$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA

This command will create a new file, in the home directory of the user under which you run it, named ".keystore".

If you want to create a keystore somewhere else, you can use

$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA \
  -keystore /path/to/my/keystore

2. Edit the Tomcat server.xml to something similar to the following
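The connector snippet from the original post did not survive formatting; below is a minimal sketch in the spirit of the Tomcat SSL How To (the port, keystore path, and password are placeholders you should adjust):

<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
           clientAuth="false" sslProtocol="TLS"
           keystoreFile="/path/to/my/keystore" keystorePass="changeit" />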



The above uses the JSSE implementation of SSL.

Tuesday, August 23, 2016

Hadoop - How to setup a Hadoop Cluster

Below is the step-by-step guide I used to set up a Hadoop cluster.

Scenario


3 VMs involved:

1) NameNode, ResourceManager - Host name: NameNode.net
2) DataNode 1 - Host name: DataNode1.net
3) DataNode 2 - Host name: DataNode2.net


Pre-requisite


1) You could create a new Hadoop user or use an existing user, but make sure the user has access to the Hadoop installation on ALL nodes

2) Install Java. In this guide, Java is installed at /usr/java/latest

3) Download a stable version of Hadoop from Apache Mirrors

This guide is based on Hadoop 2.7.1 and assumes that we have created a user called hadoop.


Setup Passphraseless SSH from the NameNode to all Nodes.


1) Run the command

ssh-keygen

This command will ask you a series of questions, and accepting the defaults is fine. It will create a private key (id_rsa) and a public key (id_rsa.pub) in the user's .ssh directory (/home/hadoop/.ssh)

2) Copy the public key to all Nodes with the following

ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub NameNode.net
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub DataNode1.net
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub DataNode2.net

3) Test the passphraseless SSH connection from the NameNode with

ssh (hostname)


Install Hadoop on all Nodes


1) Un-tar the downloaded Hadoop distribution to a location where the hadoop user has access

For this guide, I created /usr/local/hadoop and un-tarred the distribution into this folder, as shown below. The full path of the Hadoop installation is /usr/local/hadoop/hadoop-2.7.1
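A sketch of those steps (assuming the tarball is in the current directory and the user is called hadoop):

sudo mkdir -p /usr/local/hadoop
sudo tar -xzvf hadoop-2.7.1.tar.gz -C /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop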


Setup Environment Variables


1) It is best that the Hadoop variables are exported to the environment when a user logs in. To do so, run the following command on the NameNode

sudo vi /etc/profile.d/hadoop.sh

2) Add the following in /etc/profile.d/hadoop.sh

export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH

3) Source this file or re-login to set up the environment.

4) (OPTIONAL) Set up the above for all Nodes.


Setup NameNode & ResourceManager


1) Make a directory to hold NameNode data

mkdir /usr/local/hadoop/hdfs_namenode

2) Setup $HADOOP_HOME/etc/hadoop/hdfs-site.xml




Note: dfs.datanode.data.dir value must be a URI
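The property listing from the original post did not survive formatting. A minimal sketch of what the NameNode's hdfs-site.xml typically contains for this layout (the replication factor of 2 matches the two DataNodes and is an assumption):

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs_namenode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>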

3) Setup $HADOOP_HOME/etc/hadoop/core-site.xml
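Again, the original snippet is missing; a sketch (port 9000 is a common choice and an assumption here):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode.net:9000</value>
  </property>
</configuration>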





4) (OPTIONAL) Setup $HADOOP_HOME/etc/hadoop/mapred-site.xml (We are using NameNode as ResourceManager)
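The original snippet is missing; when YARN is used as the execution framework, mapred-site.xml typically only needs the following (a sketch):

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>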



5) (OPTIONAL) Setup $HADOOP_HOME/etc/hadoop/yarn-site.xml (We are using NameNode as ResourceManager)
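The original snippet is missing; a sketch pointing the NodeManagers at the ResourceManager on NameNode.net (the aux-services entry is the usual value needed for the MapReduce shuffle):

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>NameNode.net</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>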


6) Setup $HADOOP_HOME/etc/hadoop/slaves

First, remove localhost from the file, then add the following
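Based on the scenario above, the slaves file simply lists the two DataNodes:

DataNode1.net
DataNode2.net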



Setup DataNodes


1) Make a directory to hold DataNode data

mkdir /usr/local/hadoop/hdfs_datanode

2) Setup $HADOOP_HOME/etc/hadoop/hdfs-site.xml



Note: dfs.datanode.data.dir value must be a URI
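The original snippet is missing; a sketch using the directory created above (note the file:// URI form):

<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs_datanode</value>
  </property>
</configuration>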

3) Setup $HADOOP_HOME/etc/hadoop/core-site.xml
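The original snippet is missing; it should match the NameNode's fs.defaultFS so the DataNodes know where to register (port 9000 is the same assumption as above):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode.net:9000</value>
  </property>
</configuration>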




Format NameNode


The above settings should be enough to set up the Hadoop cluster. Next, for the first time only, you will need to format the NameNode. Use the following command to format the NameNode

hdfs namenode -format

Example output is



Note: the same command can be used to reformat an existing NameNode, but remember to clean up the DataNodes' HDFS data folders as well.


Start NameNode


You can start HDFS with the provided script

start-dfs.sh

Example output is










Stop NameNode


You can stop HDFS with the provided script

stop-dfs.sh

Example output is



Start ResourceManager


You can start the ResourceManager (in this case, YARN) with the provided script

start-yarn.sh

Example output is




Stop ResourceManager


You can stop the ResourceManager (in this case, YARN) with the provided script

stop-yarn.sh

Example output is



Show status of Hadoop


You can use the following command to show the status of the Hadoop daemons

jps

Example output is











Complete Testing


You can also do the following to perform a complete test to ensure Hadoop is running fine.
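The original test commands did not survive formatting. A typical end-to-end check is to run one of the bundled MapReduce examples against HDFS (a sketch; the input and output paths are arbitrary):

hadoop fs -mkdir -p /user/hadoop/input
hadoop fs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hadoop/input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
    grep /user/hadoop/input /user/hadoop/output 'dfs[a-z.]+'
hadoop fs -cat /user/hadoop/output/*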






















You could access the Hadoop Resource Manager information at http://NameNode_hostname:8088



You could also access the Hadoop cluster summary at http://NameNode_hostname:50070. You should be able to see the number of DataNodes set up for the cluster.


Reference


1. http://www.server-world.info/en/note?os=CentOS_7&p=hadoop
2. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html