Tuesday, February 7, 2017

Hive - Permission denied when trying to start Hive with other users


I installed Hive, and the Hive CLI works for the user who installed it. However, when I tried to start the Hive CLI as another user, it failed with:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
    ... 8 more
Caused by: java.io.IOException: Permission denied
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createTempFile(File.java:2001)
    at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)


Usually, this is due to a permission issue on the Hive local scratch directory (hive.exec.local.scratchdir) defined in hive-site.xml. You can either use chmod to change the directory permission directly, or set hive.scratch.dir.permission in hive-site.xml to a value other than the default (700).
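For instance, if the scratch directory is /tmp/hive (an assumption; check the actual value of hive.exec.local.scratchdir in your hive-site.xml), a quick chmod-based fix could be:

```shell
# Hypothetical fix, assuming hive.exec.local.scratchdir = /tmp/hive
# (check your hive-site.xml for the real path).
mkdir -p /tmp/hive
chmod 1777 /tmp/hive   # world-writable with the sticky bit, like /tmp itself
```

This lets every user create their own session files under the scratch directory, while the sticky bit stops users from deleting each other's files.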

Sunday, January 15, 2017

Hadoop - Installing Hive with Derby


This is my personal experience and a summary of the steps to install Hive and Derby


  • Download HIVE from Apache (I am using Hive 2.0.0)
  • Download Derby from Apache (I am using Derby
  • Make sure Java 1.7 is installed
  • Hadoop is configured and working (I am using Hadoop 2.7.1)

Installing Hive

1. Move the Hive installer to a directory. For this example, I created the folder /usr/local/hive for Hive

cp apache-hive-2.0.0-bin.tar.gz /usr/local/hive

2. Unpackage Hive

tar -xzvf apache-hive-2.0.0-bin.tar.gz

3. Set Hive environment variable

You will need to set the following in the environment

export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH

So, you can do the following to export the Hive variables to the environment when a user logs in

a. Create /etc/profile.d/hive.sh

sudo vi /etc/profile.d/hive.sh

b. Add the following in /etc/profile.d/hive.sh

export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH

c. Source this file or re-login to set up the environment.

4. Next step, we will need to install Apache Derby

Install Hive Metastore - Apache Derby

In this example, I will use Apache Derby as Hive metastore

1. Move the Derby installer to a directory. For this example, I created the folder /usr/local/derby for Derby

cp db-derby- /usr/local/derby

2. Unpackage Derby

tar -zxvf db-derby-

3. Set Derby environment variable

You will need to set the following
export DERBY_HOME=/usr/local/derby/db-derby-

So, you can do the following to export the Derby variables to the environment when a user logs in

a. Create /etc/profile.d/derby.sh

sudo vi /etc/profile.d/derby.sh

b. Add the following in /etc/profile.d/derby.sh

export DERBY_HOME=/usr/local/derby/db-derby-
export DERBY_OPTS="-Dderby.system.home=$DERBY_HOME/data"

c. Source this file or re-login to set up the environment.

4. Create a Metastore directory

Create a data directory to hold the Metastore

mkdir $DERBY_HOME/data

5. Derby configuration is complete. The next section describes how to start and stop Derby.

Start and Stop Derby

By default, Derby creates databases in the directory it was started from; that means if you start Derby in /tmp, it will use /tmp as the Derby system home and create the metastore there. For this example, we have already set DERBY_OPTS with -Dderby.system.home=$DERBY_HOME/data. This means we can start the Derby server from any directory and it will still use $DERBY_HOME/data as the system home.

Now you can start up Derby with
nohup startNetworkServer -h &

To stop Derby, use the companion script:

stopNetworkServer

Once you are able to start up Derby, we need to configure Hive to talk to it.

Configure Hive with Derby

1. Go to Hive configuration folder and create a hive-site.xml

$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml

2. Add the following properties in hive-site.xml. During my installation, these variables already existed in hive-default.xml.template, so search for them and edit them in place.
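In Derby network server mode, the key properties to change are the metastore connection settings; they might look roughly like this sketch (host, port, and database name are assumptions; adjust to your setup):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
</property>
```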

3. Create $HIVE_HOME/conf/jpox.properties

vi $HIVE_HOME/conf/jpox.properties

4. Add the following to jpox.properties
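A minimal jpox.properties for a Derby network server might look like this sketch (every value here is an assumption; match the connection URL and credentials to your own setup):

```properties
javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://localhost:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine
```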

5. Copy the following files to the Hive library folder

cp $DERBY_HOME/lib/derbyclient.jar $HIVE_HOME/lib
cp $DERBY_HOME/lib/derbytools.jar $HIVE_HOME/lib

6. Hive configuration is complete. Now, we need to set up HDFS for Hive to use.

Configure Hadoop HDFS for HIVE

Hive needs the following HDFS folders in order to run. To create them, do the following

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

Test and Run Hive

All configuration should now be complete. You can test Hive by starting the CLI:

$HIVE_HOME/bin/hive

1. https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
2. https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode
3. https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive

Sunday, November 27, 2016

Linux - Locking a file

You could use the following commands to lock a file on Linux

#Open myfile as file descriptor 2
$ exec 2>myfile
#lock the file descriptor 2
$ flock -x 2

And release the lock with the following

$ exec 2>&-
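The same lock/unlock cycle can be put into one runnable sketch (FD 9 and the path /tmp/locktest are arbitrary choices; using a high descriptor avoids clobbering stderr):

```shell
# Illustrative sketch of serializing a critical section with flock(1).
exec 9>/tmp/locktest    # open (create) the lock file on FD 9
flock -x 9              # block until the exclusive lock is held
echo "lock acquired"    # critical section goes here
exec 9>&-               # closing the descriptor releases the lock
```

Two copies of this script run against the same lock file will execute their critical sections one at a time.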

Then, you can show which process is holding the file descriptor with the following command

lsof | grep myfile

or

lsof myfile

It could return something like

bash        132  userX    2wW   REG    8,2    0 135612 myfile

The following is from the lsof man page:

FD is the File Descriptor number of the file or:
cwd current working directory; Lnn library references (AIX); err FD information error (see NAME column); jld jail directory (FreeBSD); ltx shared library text (code and data); Mxx hex memory-mapped type number xx. m86 DOS Merge mapped file; mem memory-mapped file; mmap memory-mapped device; pd parent directory; rtd root directory; tr kernel trace file (OpenBSD); txt program text (code and data); v86 VP/ix mapped file;
FD is followed by one of these characters, describing the mode under which the file is open:
r for read access;
w for write access;
u for read and write access;
space if mode unknown and no lock
character follows;
'-' if mode unknown and lock
character follows.
The mode character is followed by one of these lock characters, describing
the type of lock applied to the file:
N for a Solaris NFS lock of unknown type;
r for read lock on part of the file;
R for a read lock on the entire file;
w for a write lock on part of the file;
W for a write lock on the entire file;
u for a read and write lock of any length;
U for a lock of unknown type;
x for an SCO OpenServer Xenix lock on part
of the file;
X for an SCO OpenServer Xenix lock on the
entire file;
space if there is no lock.

So, if the output is 2wW, it means File Descriptor 2 is open for write access and has a write lock on the entire file.

Sunday, October 16, 2016

Oracle - Unable to extend index SYS.I_OBJ1 by 8 in tablespace SYSTEM

If you see the following or a similar error message

unable to extend index SYS.I_OBJ1 by 8 in tablespace SYSTEM

It means that your tablespace is out of space and you should increase it if possible.

You can issue the following command to check the current size of the tablespace

select * from dba_data_files where tablespace_name='SYSTEM';

It will return the path and the current size of the tablespace.

The above shows that I have only about 650M of tablespace.

Then, depending on your current size, you could issue the following

alter database datafile '/u01/app/oracle/oradata/XE/system.dbf' resize 1024M;

Note: the above example resizes my datafile to 1024M. You should modify the path and size according to your needs.


1. http://www.markcallen.com/oracle/ora-01654-unable-to-extend-index/

Thursday, September 22, 2016

Tomcat - Setting up SSL with self-signed certificate

The Tomcat documentation (SSL How To) provides a detailed explanation of how to set up Tomcat with SSL (self-signed)

Below are the essential steps required for Linux

1. Prepare the certificate key store

Run the command

$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA

This command will create a new file, in the home directory of the user under which you run it, named ".keystore".

If you want to create a keystore somewhere else, you can use

$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA
  -keystore /path/to/my/keystore

2. Edit the Tomcat server.xml to add something similar to the following
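A JSSE HTTPS connector might look like this sketch (the port, protocol class, and keystore password are assumptions; keystoreFile should point at the keystore created above):

```xml
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
           clientAuth="false" sslProtocol="TLS"
           keystoreFile="/path/to/my/keystore" keystorePass="changeit" />
```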

The above will use the JSSE implementation of SSL.

Tuesday, August 23, 2016

Hadoop - How to setup a Hadoop Cluster

Below is the step-by-step guide which I used to set up a Hadoop cluster


3 VMs involved:

1) NameNode, ResourceManager - Host name: NameNode.net
2) DataNode 1 - Host name: DataNode1.net
3) DataNode 2 - Host name: DataNode2.net


1) You could create a new Hadoop user or use an existing user. But make sure the user has access to the Hadoop installation on ALL nodes

2) Install JAVA. Refer here for a good version. In this guide, Java is installed at /usr/java/latest

3) Download a stable version of Hadoop from Apache Mirrors

This guide is based on Hadoop 2.7.1 and assumes that we have created a user called hadoop

Setup Passphraseless SSH from NameNode to all Nodes

1) Run the command

ssh-keygen -t rsa

This command will ask you a set of questions, and accepting the defaults is fine. Eventually, it will create a private key (id_rsa) and a public key (id_rsa.pub) in the user's .ssh directory (/home/hadoop/.ssh)

2) Copy the public key to all Nodes with the following

ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub NameNode.net
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub DataNode1.net
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub DataNode2.net

3) Test the passphraseless SSH connection from the NameNode with

ssh (hostname)

Install Hadoop on all Nodes

1) Unzip the downloaded Hadoop distribution to a location where the Hadoop user has access

For this guide, I created /usr/local/hadoop and un-tarred the distribution in this folder. The full path of the Hadoop installation is /usr/local/hadoop/hadoop-2.7.1

Setup Environment Variables

1) It is best that the Hadoop variables are exported to the environment when a user logs in. To do so, run this command on the NameNode

sudo vi /etc/profile.d/hadoop.sh

2) Add the following in /etc/profile.d/hadoop.sh

export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.1
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

3) Source this file or re-login to set up the environment.

4) (OPTIONAL) Set up the above for all Nodes.

Setup NameNode & ResourceManager

1) Make a directory to hold NameNode data

mkdir /usr/local/hadoop/hdfs_namenode

2) Setup $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Note: the dfs.namenode.name.dir value must be a URI
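For step 2, a minimal hdfs-site.xml on the NameNode might look like this sketch (the replication factor of 2 matches the two DataNodes, but is an assumption):

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs_namenode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```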

3) Setup $HADOOP_HOME/etc/hadoop/core-site.xml
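For step 3, a minimal core-site.xml might look like this sketch (the port 9000 is an assumption):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode.net:9000</value>
  </property>
</configuration>
```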

4) (OPTIONAL) Setup $HADOOP_HOME/etc/hadoop/mapred-site.xml (We are using NameNode as ResourceManager)

5) (OPTIONAL) Setup $HADOOP_HOME/etc/hadoop/yarn-site.xml (We are using NameNode as ResourceManager)
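For steps 4 and 5, minimal sketches of the two optional files might look like this (all values besides the hostname are assumptions):

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml: point the NodeManagers at the ResourceManager host -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>NameNode.net</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```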

6) Setup $HADOOP_HOME/etc/hadoop/slaves

First, remove localhost from the file, then add the following

DataNode1.net
DataNode2.net

Setup DataNodes

1) Make a directory to hold DataNode data

mkdir /usr/local/hadoop/hdfs_datanode

2) Setup $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Note: dfs.datanode.data.dir value must be a URI
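For step 2, a minimal hdfs-site.xml on a DataNode might look like this sketch:

```xml
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs_datanode</value>
  </property>
</configuration>
```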

3) Setup $HADOOP_HOME/etc/hadoop/core-site.xml

Format NameNode

The above settings should be enough to set up the Hadoop cluster. Next, for the first time only, you will need to format the NameNode. Use the following command to format the NameNode

hdfs namenode -format

Example output is

Note: the same command can be used to reformat an existing NameNode. But remember to clean up your DataNodes' HDFS folders as well.

Start NameNode

You can start HDFS (the NameNode and all DataNodes) with the provided script

$HADOOP_HOME/sbin/start-dfs.sh

Stop NameNode

You can stop HDFS with the provided script

$HADOOP_HOME/sbin/stop-dfs.sh

Start ResourceManager

You can start the ResourceManager, in this case YARN, with the provided script

$HADOOP_HOME/sbin/start-yarn.sh

Stop ResourceManager

You can stop the ResourceManager, in this case YARN, with the provided script

$HADOOP_HOME/sbin/stop-yarn.sh

Show status of Hadoop

You can use the following command to show the status of the HDFS cluster

$HADOOP_HOME/bin/hdfs dfsadmin -report

Complete Testing

You can also do the following to perform a complete test to ensure Hadoop is running fine.

You could access the Hadoop Resource Manager information at http://NameNode_hostname:8088

You could also access the Hadoop cluster summary at http://NameNode_hostname:50070. You should be able to see the number of DataNodes set up for the cluster.


1. http://www.server-world.info/en/note?os=CentOS_7&p=hadoop
2. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

Tuesday, August 9, 2016

JAVA - _JAVA_OPTIONS and JAVA_TOOL_OPTIONS environment variable

JAVA_TOOL_OPTIONS and _JAVA_OPTIONS are two useful environment variables that allow a user to set JVM options as environment variables rather than on the command line. But they have slight differences.

1. Precedence - From my testing, the precedence (order of evaluation) is

JAVA_TOOL_OPTIONS  <  command-line options  <  _JAVA_OPTIONS

With this, there are different use-cases for _JAVA_OPTIONS and JAVA_TOOL_OPTIONS.

For _JAVA_OPTIONS, you could use it to overwrite JVM options that have already been defined on the command line.

For JAVA_TOOL_OPTIONS, you could use it to supply additional JVM options that the command line can still override.

2. Documentation - JAVA_TOOL_OPTIONS is well documented, but _JAVA_OPTIONS is not. So, _JAVA_OPTIONS may not be officially supported.

3. Support - _JAVA_OPTIONS is Oracle-specific. The IBM Java equivalent is IBM_JAVA_OPTIONS. JAVA_TOOL_OPTIONS is standard across JVM vendors.


3. http://stackoverflow.com/questions/28327620/difference-between-java-options-java-tool-options-and-java-opts