Tuesday, August 23, 2016

Hadoop - How to setup a Hadoop Cluster

Below is the step-by-step guide I used to set up a Hadoop cluster


3 VMs involved:

1) NameNode, ResourceManager - Host name: NameNode.net
2) DataNode 1 - Host name: DataNode1.net
3) DataNode 2 - Host name: DataNode2.net


1) You could create a new Hadoop user or use an existing user. But make sure the user has access to the Hadoop installation on ALL nodes

2) Install Java. In this guide, Java is installed at /usr/java/latest

3) Download a stable version of Hadoop from Apache Mirrors

This guide is based on Hadoop 2.7.1 and assumes that we have created a user called hadoop

Setup Passphraseless SSH from NameNode to all Nodes.

1) Run the command

ssh-keygen -t rsa

This command will ask you a set of questions, and accepting the defaults is fine. Eventually, it will create a private key (id_rsa) and a public key (id_rsa.pub) in the user's .ssh directory (/home/hadoop/.ssh)

2) Copy the public key to all Nodes with the following

ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub NameNode.net
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub DataNode1.net
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub DataNode2.net

3) Test the passphraseless SSH connection from the NameNode with

ssh (hostname)

Install Hadoop on all Nodes

1) Untar the downloaded Hadoop distribution to a location where the hadoop user has access

For this guide, I created /usr/local/hadoop and untarred the distribution into this folder. The full path of the Hadoop installation is /usr/local/hadoop/hadoop-2.7.1

Setup Environment Variables

1) It is best that the Hadoop variables are exported to the environment when a user logs in. To do so, run this command on the NameNode

sudo vi /etc/profile.d/hadoop.sh

2) Add the following in /etc/profile.d/hadoop.sh

export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

3) Source this file or re-login to set up the environment.

4) (OPTIONAL) Set up the above for all Nodes.

Setup NameNode & ResourceManager

1) Make a directory to hold NameNode data

mkdir /usr/local/hadoop/hdfs_namenode

2) Setup $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Note: the dfs.namenode.name.dir value must be a URI
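The XML body for this step did not survive in the post; below is a minimal sketch of what the NameNode's hdfs-site.xml typically contains. The directory path comes from this guide; the replication factor of 2 (matching the two DataNodes) is my assumption.

```xml
<!-- Sketch of hdfs-site.xml on the NameNode; replication factor is an assumption -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs_namenode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```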

3) Setup $HADOOP_HOME/etc/hadoop/core-site.xml
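The core-site.xml body is also missing; a minimal sketch follows. The hostname comes from this guide, but the port 9000 is an arbitrary choice of mine, not something the original post confirms.

```xml
<!-- Sketch of core-site.xml; port 9000 is an assumption -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode.net:9000</value>
  </property>
</configuration>
```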

4) (OPTIONAL) Setup $HADOOP_HOME/etc/hadoop/mapred-site.xml (We are using NameNode as ResourceManager)
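The mapred-site.xml body is not shown; the usual minimal content when running MapReduce on YARN is a sketch like this.

```xml
<!-- Sketch of mapred-site.xml: run MapReduce jobs on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```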

5) (OPTIONAL) Setup $HADOOP_HOME/etc/hadoop/yarn-site.xml (We are using NameNode as ResourceManager)
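The yarn-site.xml body is likewise missing; a minimal sketch for a setup where the NameNode host doubles as the ResourceManager could look like this.

```xml
<!-- Sketch of yarn-site.xml; NameNode.net doubles as the ResourceManager host -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>NameNode.net</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```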

6) Setup $HADOOP_HOME/etc/hadoop/slaves

First, remove localhost from the file, then add the following

DataNode1.net
DataNode2.net

Setup DataNodes

1) Make a directory to hold DataNode data

mkdir /usr/local/hadoop/hdfs_datanode

2) Setup $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Note: dfs.datanode.data.dir value must be a URI
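The DataNode's hdfs-site.xml body is missing from the post; a minimal sketch, using the directory created in step 1 as a file:// URI, would be:

```xml
<!-- Sketch of hdfs-site.xml on each DataNode -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs_datanode</value>
  </property>
</configuration>
```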

3) Setup $HADOOP_HOME/etc/hadoop/core-site.xml
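The DataNode's core-site.xml body is also missing; it would point at the NameNode's filesystem URI. The hostname comes from this guide; port 9000 is my assumption.

```xml
<!-- Sketch of core-site.xml on each DataNode; port 9000 is an assumption -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode.net:9000</value>
  </property>
</configuration>
```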

Format NameNode

The above settings should be enough to set up the Hadoop cluster. Next, for the first time, you will need to format the NameNode. Use the following command to format the NameNode

hdfs namenode -format

Example output is

Note: the same command can be used to reformat your existing NameNode. But remember to clean up your DataNodes' hdfs folders as well.

Start NameNode

You can start Hadoop with the given script

$HADOOP_HOME/sbin/start-dfs.sh
Example output is

Stop NameNode

You can stop Hadoop with the given script

$HADOOP_HOME/sbin/stop-dfs.sh
Example output is

Start ResourceManager

You can start the ResourceManager, in this case YARN, with the given script

$HADOOP_HOME/sbin/start-yarn.sh
Example output is

Stop ResourceManager

You can stop the ResourceManager, in this case YARN, with the given script

$HADOOP_HOME/sbin/stop-yarn.sh
Example output is

Show status of Hadoop

You can use the jps command to show the status of the Hadoop daemons running on each node

jps
Example output is

Complete Testing

You can also do the following to perform a complete test to ensure Hadoop is running fine.

You could access the Hadoop Resource Manager information at http://NameNode_hostname:8088

You could also access the Hadoop cluster summary at http://NameNode_hostname:50070. You should be able to see the number of DataNodes set up for the cluster.


1. http://www.server-world.info/en/note?os=CentOS_7&p=hadoop
2. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

Tuesday, August 9, 2016

JAVA - _JAVA_OPTIONS and JAVA_TOOL_OPTIONS environment variable

JAVA_TOOL_OPTIONS and _JAVA_OPTIONS are two useful environment variables that allow users to set JVM options via the environment rather than on the command line. But they have slight differences

1. Precedence - From my testing, the precedence (order of evaluation) is

_JAVA_OPTIONS > command-line options > JAVA_TOOL_OPTIONS
With this, there are different use cases for _JAVA_OPTIONS and JAVA_TOOL_OPTIONS

For _JAVA_OPTIONS, you could use it to overwrite JVM options that have been defined on the command line.

For JAVA_TOOL_OPTIONS, you could use it to add JVM options on top of the predefined command line.

2. Documentation - JAVA_TOOL_OPTIONS is well documented, but _JAVA_OPTIONS is not. So, _JAVA_OPTIONS may not be officially supported.

3. Support - _JAVA_OPTIONS is Oracle specific. The IBM Java equivalent is IBM_JAVA_OPTIONS. JAVA_TOOL_OPTIONS is vendor independent.
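As a minimal sketch of the mechanics: both are ordinary environment variables, so they can be set in the shell before launching any java process (the -Xmx values below are arbitrary examples, not recommendations).

```shell
# Both variables are plain environment variables read by the JVM at startup.
export JAVA_TOOL_OPTIONS="-Xmx512m"  # HotSpot prints "Picked up JAVA_TOOL_OPTIONS: ..." when it reads this
export _JAVA_OPTIONS="-Xmx1g"        # Oracle JVMs print "Picked up _JAVA_OPTIONS: ..."

# Any subsequent "java ..." invocation in this shell inherits both;
# since _JAVA_OPTIONS overwrites even command-line options, -Xmx1g wins here.
echo "$JAVA_TOOL_OPTIONS"
echo "$_JAVA_OPTIONS"
```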


3. http://stackoverflow.com/questions/28327620/difference-between-java-options-java-tool-options-and-java-opts

Monday, July 18, 2016

Oracle - Enabling Oracle XE APEX

I had been trying to figure out where and why I could not log into my Oracle 11g XE APEX. The following are the steps I used to fix my APEX

1. Find out which port APEX is bound to. For this, you can use the lsnrctl command

$ lsnrctl status | grep HTTP

The above shows that my APEX is bound to port 12345. So, the URL will be http://hostname:12345/apex/apex_admin

2. Next, when I access the URL from the browser, it requires me to provide a username and password in an authentication dialog, and this dialog always displays either "the server says xdb" or "the server says APEX"

For me, this is telling me that my Oracle has not been set up for remote access. To fix this, I log into the Oracle database instance from the command line and issue a procedure call as follows (DBMS_XDB.SETLISTENERLOCALACCESS is the standard XE switch for allowing remote HTTP access)

$ sqlplus / as sysdba
Connected to:
Oracle Database 11g Express Edition Release - 64bit Production

SQL> EXEC DBMS_XDB.SETLISTENERLOCALACCESS(FALSE);
SQL> exit

After the above, you may need to restart your Oracle database. Then, try to log into APEX again. It should bring you to the APEX login page.

3. Now, I had also forgotten my APEX password, so I had to reset it with the following on the command line

$sqlplus / as sysdba
Connected to:
Oracle Database 11g Express Edition Release - 64bit Production

SQL> @apxxepwd admin
SQL> exit

The above will reset the default Administrator account (username: admin) with the password: admin

You can now log into the APEX with

Username: admin
Password: admin

The APEX will then ask you to change the Administrator password.

There you go, you can start using APEX.

Wednesday, May 18, 2016

LDAP - Create encrypted user password

In the person object class, there is a userPassword attribute, and LDAP servers usually use this to store user passwords.

To add an encrypted password to the LDAP userPassword attribute, you could

1. Use ldappasswd command

ldappasswd -xv -D "cn=Manager,dc=example,dc=com" -w secret -S "cn=user1,dc=example,dc=com"


-x --> use simple authentication
-v --> run in verbose mode
-D --> bind DN
-w --> password for simple authentication
-S --> prompt for the new password

2. Use slappasswd command

$ /usr/local/sbin/slappasswd
New password:
Re-enter new password:

It will ask for your password and generate an SSHA hash. Copy the output and put it in the userPassword attribute in your LDIF file.
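For example, a hypothetical LDIF entry using the slappasswd output (the {SSHA} value below is a placeholder, not a real hash):

```
dn: cn=user1,dc=example,dc=com
objectClass: person
cn: user1
sn: user1
userPassword: {SSHA}xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```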

Wednesday, April 27, 2016

LDAP - Adding inetOrgPerson schema to OpenLDAP

By default, inetOrgPerson schema is not included in the slapd.conf. You will need the following steps to add inetOrgPerson schema to OpenLDAP

1. Edit slapd.conf with any editor. slapd.conf is usually located at /usr/local/etc/openldap and requires sudo access to edit.

2. Add  the following

include    /usr/local/etc/openldap/schema/cosine.schema
include    /usr/local/etc/openldap/schema/inetorgperson.schema

to the top section of the slapd.conf file. Please note that the order is important. cosine.schema is required because attribute type audio is defined in cosine.schema.

3. Restart slapd with "su root -c /usr/local/libexec/slapd"

Tuesday, March 15, 2016

Oracle - ORA-01034: ORACLE not available

When you see the following error

ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0

and your ORACLE_HOME is set correctly, you may want to check whether your ORACLE_SID is spelled with the wrong case.

ORACLE_SID is case sensitive on Linux

I saw this https://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:318216852435

The SID is a site identifier. It plus the Oracle_home are hashed together in Unix to create a unique key name for attaching an SGA. If your Oracle_sid or Oracle_home is not set correctly, you'll get "oracle not available" since we cannot attach to a shared memory segment that is identified by magic key.

From Oracle website, the basic format of tnsnames.ora is
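The format block itself is missing from the post; from memory of the Oracle Net documentation, the basic shape is roughly the following (the host, port, and service name are placeholders; 1521 is the default listener port):

```
net_service_name =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = myhost)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = my_service)
    )
  )
```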


net_service_name can be used when connecting with sqlplus

CONNECT username@net_service_name

If no net_service_name is given, it will use ORACLE_SID

Connectivity Concepts from Oracle is a good read about Service name and SID.

Tuesday, February 23, 2016

Linux - Adding a swap file to RHEL

It is very possible that you may want to increase swap space after installation. In my case, my server was running out of memory, both physical and virtual, so as a workaround I tried the following to increase my swap space. Although a swap partition is recommended, I chose a swap file because it is less disruptive to users (i.e., no reboot needed).

To add a swap file:

Determine the size of the new swap file in megabytes and multiply by 1024 to determine the block count. For example, the block count of an 8 GB (8000 MB) swap file is 8192000.
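The arithmetic behind the 8192000 figure is simply megabytes times 1024, since dd is run with bs=1024 (1 KiB blocks):

```shell
# block count for an 8 GB swap file, treating 8 GB as 8000 MB as this guide does
echo $((8000 * 1024))   # prints 8192000
```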

1. At a shell prompt as root, type the following command with count being equal to the desired block size:

dd if=/dev/zero of=/swapfile bs=1024 count=8192000

The command output is

$ dd if=/dev/zero of=/swapfile bs=1024 count=8192000
8192000+0 records in
8192000+0 records out
8388608000 bytes (8.4 GB) copied, 25.3272 s, 331 MB/s

2. Setup the swap file with the command:

mkswap /swapfile

The command output is

$ mkswap /swapfile
mkswap: /swapfile: warning: don't erase bootbits sectors
        on whole disk. Use -f to force.
Setting up swapspace version 1, size = 8191996 KiB
no label, UUID=7s1166af-ls99-e938-lpos-p0a9000e1234

3. To enable the swap file immediately but not automatically at boot time:

swapon /swapfile

This command does not have any output

4. To enable it at boot time, edit /etc/fstab to include:

/swapfile               swap                    swap    defaults        0 0

The next time the system boots, it enables the new swap file.

This command does not have any output

5. After adding the new swap file and enabling it, verify it is enabled by viewing the output of the command cat /proc/swaps or free.

You could also use the top or htop command to see the additional swap space. Or, you could run ls -al / to see the swap file

$ ls -al /
-rw-r--r--    1 root root 8388608000 Jul  5 22:21 swapfile

The above is reference from RedHat System Administrator Guide with my own experience added.