7/16/2015

Building Hadoop 2.4.0 on Mac OS X Yosemite 10.10.3 with native components

Install prerequisites

We'll need these for the actual build.

sudo port install cmake gmake gcc48 zlib gzip maven32 apache-ant
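A quick sanity check that the toolchain ended up on the PATH (optional, just to catch problems early):

cmake --version
mvn -version
ant -version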

Install protobuf 2.5.0

As the latest version in MacPorts is 2.6.x, we need to stick to an earlier revision of the port:

cd ~/tools
svn co http://svn.macports.org/repository/macports/trunk/dports/devel/protobuf-cpp -r 105333
cd protobuf-cpp/
sudo port install

To verify:

protoc --version
# libprotoc 2.5.0

Acquire sources

As I needed an exact version to reproduce an issue at work, I'll go with version 2.4.0 for now. I suppose some of the fixes will work with earlier or later versions as well. Look around in the tags folder for other versions.

cd ~/dev
svn co http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-2.4.0
cd hadoop-2.4.0
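If you want to look around in the tags folder for other versions, something like this should list them:

svn ls http://svn.apache.org/repos/asf/hadoop/common/tags/ | grep release-2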

Fix sources

We need to patch JniBasedUnixGroupsNetgroupMapping.c: on OS X setnetgrent() returns void, so the upstream comparison against its return value doesn't even compile (the exact error is quoted near the end of this post):

patch -p0 <<EOF
--- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.orig 2015-07-16 17:14:20.000000000 +0200
+++ hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c 2015-07-16 17:17:47.000000000 +0200
@@ -74,7 +74,7 @@
   // endnetgrent)
   setnetgrentCalledFlag = 1;
 #ifndef __FreeBSD__
-  if(setnetgrent(cgroup) == 1) {
+  setnetgrent(cgroup); {
 #endif
     current = NULL;
     // three pointers are for host, user, domain, we only care

EOF
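If the patch applied cleanly, the comparison against the (void) return value is gone; a quick way to double-check:

grep -n 'setnetgrent(cgroup)' hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c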

As well as container-executor.c, which relies on a few Linux/glibc-specific pieces that don't exist on OS X: the LOGIN_NAME_MAX constant, the five-argument mount() call, and the fcloseall() function (the latter shows up as a linker error, also quoted further down):

patch -p0 <<EOF
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c.orig 2015-07-16 17:49:15.000000000 +0200
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c 2015-07-16 18:13:03.000000000 +0200
@@ -498,7 +498,7 @@
   char **users = whitelist;
   if (whitelist != NULL) {
     for(; *users; ++users) {
-      if (strncmp(*users, user, LOGIN_NAME_MAX) == 0) {
+      if (strncmp(*users, user, 64) == 0) {
         free_values(whitelist);
         return 1;
       }
@@ -1247,7 +1247,7 @@
               pair);
     result = -1; 
   } else {
-    if (mount("none", mount_path, "cgroup", 0, controller) == 0) {
+    if (mount("none", mount_path, "cgroup", 0) == 0) {
       char *buf = stpncpy(hier_path, mount_path, strlen(mount_path));
       *buf++ = '/';
       snprintf(buf, PATH_MAX - (buf - hier_path), "%s", hierarchy);
@@ -1274,3 +1274,21 @@
   return result;
 }
 
+int fcloseall(void)
+{
+    int succeeded; /* return value */
+    FILE *fds_to_close[3]; /* the size being hardcoded to '3' is temporary */
+    int i; /* loop counter */
+    succeeded = 0;
+    fds_to_close[0] = stdin;
+    fds_to_close[1] = stdout;
+    fds_to_close[2] = stderr;
+    /* max iterations being hardcoded to '3' is temporary: */
+    for ((i = 0); (i < 3); i++) {
+ succeeded += fclose(fds_to_close[i]);
+    }
+    if (succeeded != 0) {
+ succeeded = EOF;
+    }
+    return succeeded;
+}

EOF
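Again, a quick check that the patch went in (the appended fcloseall() helper should now show up at the end of the file):

grep -n 'int fcloseall' hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c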

Install Oracle JDK 1.7

You'll need to install "Java SE Development Kit 7 (Mac OS X x64)" from Oracle. Then let's symlink tools.jar to the place where the build expects to find it:

export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
sudo mkdir $JAVA_HOME/Classes
sudo ln -s $JAVA_HOME/lib/tools.jar $JAVA_HOME/Classes/classes.jar
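To confirm the symlink points where the build will look for it:

ls -l $JAVA_HOME/Classes/classes.jar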

Build Hadoop 2.4.0

Sooner or later we were bound to get here, right?

mvn package -Pdist,native -DskipTests -Dtar
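A small time-saver while iterating on the native patches (not part of the original recipe, just a Maven shortcut): you can build only hadoop-common and the modules it depends on, instead of the whole tree:

mvn package -Pnative -DskipTests -pl hadoop-common-project/hadoop-common -am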

If all goes well:

main:
     [exec] $ tar cf hadoop-2.4.0.tar hadoop-2.4.0
     [exec] $ gzip -f hadoop-2.4.0.tar
     [exec] 
     [exec] Hadoop dist tar available at: /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-2.4.0.tar.gz
     [exec] 
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-dist-2.4.0-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................ SUCCESS [1.177s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [1.548s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [3.394s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.277s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [1.765s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [3.143s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [2.498s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [3.265s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [2.074s]
[INFO] Apache Hadoop Common .............................. SUCCESS [1:26.460s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [4.527s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.032s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [2:09.326s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [14.876s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [5.814s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [2.941s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.034s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.034s]
[INFO] hadoop-yarn-api ................................... SUCCESS [57.713s]
[INFO] hadoop-yarn-common ................................ SUCCESS [20.985s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.040s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [6.935s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [12.889s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [2.362s]
[INFO] hadoop-yarn-server-applicationhistoryservice ...... SUCCESS [4.059s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [11.368s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.467s]
[INFO] hadoop-yarn-client ................................ SUCCESS [4.109s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.043s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [2.123s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [1.902s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.030s]
[INFO] hadoop-yarn-project ............................... SUCCESS [3.828s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.069s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [19.507s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [13.039s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [2.232s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [7.625s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [6.198s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [5.440s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [1.534s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [4.577s]
[INFO] hadoop-mapreduce .................................. SUCCESS [2.903s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [3.509s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [6.723s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [1.705s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [4.460s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [3.330s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [2.585s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [2.361s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [9.603s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [3.797s]
[INFO] Apache Hadoop Client .............................. SUCCESS [6.102s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.091s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [3.251s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [5.068s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.032s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [24.974s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8:54.425s
[INFO] Finished at: Thu Jul 16 18:22:12 CEST 2015
[INFO] Final Memory: 173M/920M
[INFO] ------------------------------------------------------------------------

Using it

First we'll extract the result of our build. Then there's actually a little bit of configuration needed, even for a single-node setup. Don't worry, I'll copy it here for your convenience ;-)

tar -xvzf /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-2.4.0.tar.gz -C ~/tools

The contents of ~/tools/hadoop-2.4.0/etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

The contents of ~/tools/hadoop-2.4.0/etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
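If the start scripts later complain that JAVA_HOME is not set, you can also hard-wire it in ~/tools/hadoop-2.4.0/etc/hadoop/hadoop-env.sh (optional, only needed when the environment variable isn't picked up):

export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)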

Passwordless SSH

From the official docs:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
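Now check that you can indeed ssh to localhost without a passphrase (on OS X you may first have to enable Remote Login under System Preferences > Sharing):

ssh localhost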

Starting up

Let's see what we've done. This is an almost raw copy from the official docs.
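All the relative paths below (bin/hdfs, sbin/start-dfs.sh, ...) assume your current directory is the extracted distribution:

cd ~/tools/hadoop-2.4.0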

  1. Format the filesystem:
    bin/hdfs namenode -format
    
  2. Start NameNode daemon and DataNode daemon:
    sbin/start-dfs.sh
    

    The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

  3. Browse the web interface for the NameNode; by default it is available at http://localhost:50070/

  4. Make the HDFS directories required to execute MapReduce jobs:
    bin/hdfs dfs -mkdir /user
    bin/hdfs dfs -mkdir /user/<username>
    
  5. Copy the input files into the distributed filesystem:
    bin/hdfs dfs -put etc/hadoop input
    

    Check if they are there at http://localhost:50070/explorer.html#/

  6. Run some of the examples provided (that's actually one line...):
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
    
  7. Examine the output files:

    Copy the output files from the distributed filesystem to the local filesystem and examine them:

    bin/hdfs dfs -get output output
    cat output/*
    

    or

    View the output files on the distributed filesystem:

    bin/hdfs dfs -cat output/*
    
  8. When you're done, stop the daemons with:
    sbin/stop-dfs.sh
    

Possible errors without the fixes & tweaks above

This list is an excerpt from my efforts during the build. It's meant to drive you here via Google ;-) Apply the procedure above and all of these errors will be fixed for you.

Without ProtoBuf

If you don't have protobuf, you'll get the following error:

[INFO] --- hadoop-maven-plugins:2.4.0:protoc (compile-protoc) @ hadoop-common ---
[WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program "protoc": error=2, No such file or directory
[ERROR] stdout: []

Wrong version of ProtoBuf

If you don't have the correct version of protobuf, you'll get

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.4.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]

CMAKE missing

If you don't have cmake, you'll get

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native"): error=2, No such file or directory
[ERROR] around Ant part ...... @ 4:132 in /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/antrun/build-main.xml

JAVA_HOME missing

If you don't have JAVA_HOME correctly set, you'll get

     [exec] -- Detecting CXX compiler ABI info
     [exec] -- Detecting CXX compiler ABI info - done
     [exec] -- Detecting CXX compile features
     [exec] -- Detecting CXX compile features - done
     [exec] CMake Error at /opt/local/share/cmake-3.2/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
     [exec]   Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY
     [exec]   JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH)
     [exec] Call Stack (most recent call first):
     [exec]   /opt/local/share/cmake-3.2/Modules/FindPackageHandleStandardArgs.cmake:374 (_FPHSA_FAILURE_MESSAGE)
     [exec]   /opt/local/share/cmake-3.2/Modules/FindJNI.cmake:287 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
     [exec]   JNIFlags.cmake:117 (find_package)
     [exec]   CMakeLists.txt:24 (include)
     [exec] 
     [exec] 
     [exec] -- Configuring incomplete, errors occurred!
     [exec] See also "/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native/CMakeFiles/CMakeOutput.log".

JniBasedUnixGroupsNetgroupMapping.c patch missing

If you don't have the patch for JniBasedUnixGroupsNetgroupMapping.c above, you'll get

     [exec] [ 38%] Building C object CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o
     [exec] /Library/Developer/CommandLineTools/usr/bin/cc  -Dhadoop_EXPORTS -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native/javah -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native -I/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/include/darwin -I/opt/local/include -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util    -o CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o   -c /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c
     [exec] /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c:77:26: error: invalid operands to binary expression ('void' and 'int')
     [exec]   if(setnetgrent(cgroup) == 1) {
     [exec]      ~~~~~~~~~~~~~~~~~~~ ^  ~
     [exec] 1 error generated.
     [exec] make[2]: *** [CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o] Error 1
     [exec] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2
     [exec] make: *** [all] Error 2

fcloseall patch missing

Without applying the fcloseall patch above, you might get the following error:

     [exec] Undefined symbols for architecture x86_64:
     [exec]   "_fcloseall", referenced from:
     [exec]       _launch_container_as_user in libcontainer.a(container-executor.c.o)
     [exec] ld: symbol(s) not found for architecture x86_64
     [exec] collect2: error: ld returned 1 exit status
     [exec] make[2]: *** [target/usr/local/bin/container-executor] Error 1
     [exec] make[1]: *** [CMakeFiles/container-executor.dir/all] Error 2
     [exec] make: *** [all] Error 2

Symlink missing

Without the "export JAVA_HOME=`/usr/libexec/java_home -v 1.7`;sudo mkdir $JAVA_HOME/Classes;sudo ln -s $JAVA_HOME/lib/tools.jar $JAVA_HOME/Classes/classes.jar" line creating the symlinks above, you'll get

Exception in thread "main" java.lang.AssertionError: Missing tools.jar at: /Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/Classes/classes.jar. Expression: file.exists()
 at org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:395)
 at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:683)
 at org.codehaus.mojo.jspc.CompilationMojoSupport.findToolsJar(CompilationMojoSupport.groovy:371)
 at org.codehaus.mojo.jspc.CompilationMojoSupport.this$4$findToolsJar(CompilationMojoSupport.groovy)
...

References:

http://java-notes.com/index.php/hadoop-on-osx

https://issues.apache.org/jira/secure/attachment/12602452/HADOOP-9350.patch

http://www.csrdu.org/nauman/2014/01/23/geting-started-with-hadoop-2-2-0-building/

https://developer.apple.com/library/mac/documentation/Porting/Conceptual/PortingUnix/compiling/compiling.html

https://github.com/cooljeanius/libUnixToOSX/blob/master/fcloseall.c

http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html


Comments:

  1. Thanks Doma,
    This post really helped me to build hadoop 2.2.

    I faced a few issues, as follows:
    1. Got 'MAX_PATH' undefined error in the native code in the project hadoop-hdfs build.

    The file '.../hadoop-hdfs-project/hadoop-hdfs/src/main/native/util/posix_util.c' is missing the following include statement
    #include

    2. Need to satisfy the maven dependencies by running
    mvn install -Pnative -DskipTests

    My usual method of build is
    mvn compile -Pnative -DskipTests
    which is not sufficient.

    Replies
    1. Thanks for the feedback Kandy. Are you sure the #include line is just by itself, not specifying what it is including? Next time I need to build Hadoop with native components on OS X, I'll check them out ;-)
