Installation
- Install hadoop: https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation
- (optional) Links for installing Hive/Pig/HBase are at the bottom of the page linked in step 2.
Configuration
- Hive: by default it does not allow multiple concurrent users, because it uses a local Derby metastore. If you wish to have multiple users connect to it, install MySQL and configure /etc/hive/conf/hive-site.xml. See the Hive configuration section of: https://ccp.cloudera.com/display/CDHDOC/Hive+Installation
Also see: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin. You also have to set JAVA_HOME.
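A minimal sketch of the MySQL metastore properties to add to hive-site.xml. The JDBC URL, database name, username, and password below are placeholders; substitute your own values:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>
```

You will also need the MySQL JDBC driver jar on Hive's classpath; see the Cloudera page above for details.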
- There are several files you need to set up in order for HDFS/Hive to work, all located, by default, in /etc/hadoop/conf:
- core-site.xml: You need to set hadoop.tmp.dir and fs.default.name. For example:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hdfs-tmp/${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
-
- hdfs-site.xml: You need to set dfs.name.dir and dfs.data.dir. For example:
-
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/data,/home/hadoop/data1</value>
</property>
-
- mapred-site.xml: You need to set mapred.local.dir, mapred.tmp.dir, and mapred.job.tracker. For example:
-
<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/mapred/local</value>
</property>
<property>
  <name>mapred.tmp.dir</name>
  <value>/home/hadoop/mapred/temp</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
-
- Make sure the permissions of the directories specified in the XML configuration files above are set correctly:
-
- hadoop.tmp.dir can have permission 1777 (sticky bit) and be owned by the hdfs user
- dfs.name.dir and dfs.data.dir can be owned by the hdfs user and hadoop group with permission 700.
- mapred.local.dir and mapred.tmp.dir can belong to the mapred user and hadoop group.
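For example, assuming the paths from the configuration snippets above (adjust them to your own layout), the directories could be created and permissioned like this:

```shell
# Create the directories from the example configs.
sudo mkdir -p /home/hadoop/hdfs-tmp /home/hadoop/name /home/hadoop/data /home/hadoop/data1
sudo mkdir -p /home/hadoop/mapred/local /home/hadoop/mapred/temp

# hadoop.tmp.dir: world-writable with the sticky bit, owned by hdfs
sudo chown hdfs:hadoop /home/hadoop/hdfs-tmp
sudo chmod 1777 /home/hadoop/hdfs-tmp

# dfs.name.dir and dfs.data.dir: owned by hdfs:hadoop, mode 700
sudo chown -R hdfs:hadoop /home/hadoop/name /home/hadoop/data /home/hadoop/data1
sudo chmod 700 /home/hadoop/name /home/hadoop/data /home/hadoop/data1

# mapred.local.dir and mapred.tmp.dir: owned by mapred:hadoop
sudo chown -R mapred:hadoop /home/hadoop/mapred
```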
-
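Once the directories and permissions are in place, format the namenode and start the daemons. The init-script names below assume CDH3's hadoop-0.20 packages; adjust if your package names differ:

```shell
# Format HDFS (only once, as the hdfs user; this erases any existing HDFS metadata)
sudo -u hdfs hadoop namenode -format

# Start the daemons (CDH3 package service names assumed)
sudo service hadoop-0.20-namenode start
sudo service hadoop-0.20-datanode start
sudo service hadoop-0.20-jobtracker start
sudo service hadoop-0.20-tasktracker start
```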
Test installation
Some quick tests you can run to verify the installation:
Hive:
mkdir /tmp/hivetest
echo "a b c d" > /tmp/hivetest/hivetest.txt
hive
create external table wkstable1 (one string, two string, three string, four string) row format delimited fields terminated by ' ' stored as textfile location '/tmp/hivetest';
show tables;
select two from wkstable1;
Pig:
mkdir /tmp/pig
cp /etc/passwd /tmp/pig
cd /tmp/pig
pig
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
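HDFS itself can be smoke-tested with a round trip through the filesystem shell. The paths below are illustrative; `-rmr` is the recursive-delete flag used by Hadoop 0.20-era releases:

```shell
# Write a local file, copy it into HDFS, read it back, then clean up.
echo "hello" > /tmp/hdfstest-local.txt
hadoop fs -mkdir /tmp/hdfstest
hadoop fs -put /tmp/hdfstest-local.txt /tmp/hdfstest/
hadoop fs -cat /tmp/hdfstest/hdfstest-local.txt
hadoop fs -rmr /tmp/hdfstest
```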