If you run into problems setting up Apache Hadoop on your Mac, this article may help you get past them. Apache Hadoop was initially developed at Yahoo, and the project has become widely known for its success in implementing a multi-server distributed computing system for handling huge amounts of data.
Hadoop performs best on a cluster of multiple nodes/servers; however, it runs perfectly well on a single machine, even a Mac, so we can use it for development. Also, Spark is a popular tool for processing data in Hadoop. The purpose of this blog is to show you the steps to install Hadoop and Spark on a Mac.
Operating System: Mac OS X El Capitan (10.11.3)
Hadoop Version 2.7.2
Spark 1.6.1
Pre-requisites
1. Install Java
Open a terminal window to check what Java version is installed.
$ java -version
If Java is not installed, go to https://java.com/en/download/ to download and install the latest JDK. If Java is installed, use the following command in a terminal window to find the Java home path:
$ /usr/libexec/java_home
Next we need to set the JAVA_HOME environment variable on the Mac:
$ echo 'export JAVA_HOME=$(/usr/libexec/java_home)' >> ~/.bash_profile
$ source ~/.bash_profile
2. Enable SSH as Hadoop requires it.
Go to System Preferences -> Sharing -> and check "Remote Login".
Generate SSH Keys
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Open a terminal window and make sure you can ssh to localhost:
$ ssh localhost
Download Hadoop Distribution
Download the latest Hadoop distribution (2.7.2 at the time of writing):
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Create Hadoop Folder
Open a new terminal window, go to the download folder (let's use "~/Downloads"), and find hadoop-2.7.2.tar.gz
$ cd ~/Downloads
$ tar xzvf hadoop-2.7.2.tar.gz
$ mv hadoop-2.7.2 /usr/local/hadoop
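Moving the unpacked folder straight to /usr/local/hadoop works fine. A common alternative, sketched below with illustrative temp-directory paths, is to keep the versioned release directory and point a stable symlink at it, so upgrading later only means re-pointing the link:

```shell
# Sketch with illustrative paths: the guide's /usr/local/hadoop would be the link.
demo=$(mktemp -d)                            # stand-in for /usr/local
mkdir -p "$demo/hadoop-2.7.2"                # the unpacked distribution
ln -sfn "$demo/hadoop-2.7.2" "$demo/hadoop"  # stable name -> versioned dir
readlink "$demo/hadoop"                      # shows which version is active
```

With this layout, installing Hadoop 2.7.3 later is just another `ln -sfn`, and every config file and PATH entry that references the symlink keeps working.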
Hadoop Configuration Files
Go to the directory where your hadoop distribution is installed.
$ cd /usr/local/hadoop
Then edit the following files:
$ vi etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
$ vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
$ vi etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
$ vi etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Start Hadoop Services
Format HDFS
$ cd /usr/local/hadoop
$ bin/hdfs namenode -format
Start HDFS
$ sbin/start-dfs.sh
Start YARN
$ sbin/start-yarn.sh
Validation
Check HDFS file Directory
$ bin/hdfs dfs -ls /
If you don't want to type the bin/ prefix every time you run a Hadoop command, you can do the following:
$ vi ~/.bash_profile
Append this line to the end of the file: export PATH=$PATH:/usr/local/hadoop/bin
$ source ~/.bash_profile
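One caveat: appending blindly to ~/.bash_profile adds a duplicate PATH entry every time you re-run it. A small helper (a sketch; `add_to_path` is our own name, not a built-in) keeps the append idempotent:

```shell
# Append a directory to PATH only if it is not already present.
# add_to_path is a helper name of our own, not a shell built-in.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                      # already on PATH: do nothing
    *) PATH="$PATH:$1" ;;
  esac
}
add_to_path /usr/local/hadoop/bin
add_to_path /usr/local/hadoop/bin     # second call changes nothing
echo "$PATH"
```

The same helper works for the Spark and Scala bin folders added later in this guide.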
Now try to add the following two folders in HDFS that are needed for MapReduce jobs, but this time don't include the bin/.
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/{your username}
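The /user/{your username} folder is where MapReduce jobs such as the classic word count read and write by default. The word-count model itself can be sketched with plain shell filters; this toy pipeline runs locally with pipes, and the same mapper/reducer idea is what a MapReduce job distributes across the cluster:

```shell
# Word count as a map -> shuffle -> reduce pipeline, using shell filters.
printf 'hello world\nhello hadoop\n' |  # sample input
  tr -s ' ' '\n' |                      # map: emit one word per line
  sort |                                # shuffle: group identical words
  uniq -c                               # reduce: count each group
```

This prints a count next to each distinct word (hadoop 1, hello 2, world 1).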
You can also open a browser and access Hadoop by using the following URL
http://localhost:50070/
Next: Spark
Installing Spark is a little easier. You can download the latest Spark here:
http://spark.apache.org/downloads.html
It's a little tricky choosing the package type. We want the "pre-built with user-provided Hadoop [can use with most Hadoop distributions]" type, and the downloaded file name is spark-1.6.1-bin-without-hadoop.tgz
After spark is downloaded, we need to untar it. Open a terminal window and do the following:
$ cd ~/Downloads
$ tar xzvf spark-1.6.1-bin-without-hadoop.tgz
$ mv spark-1.6.1-bin-without-hadoop /usr/local/spark
Add spark bin folder to PATH
$ vi ~/.bash_profile
Append this line to the end of the file: export PATH=$PATH:/usr/local/spark/bin
$ source ~/.bash_profile
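Because we downloaded the "without Hadoop" build, Spark does not bundle Hadoop's jars and needs to be told where to find them. Per Spark's documentation on its Hadoop-free build, that is done with the SPARK_DIST_CLASSPATH variable; a fragment for ~/.bash_profile, using the paths from this guide:

```shell
# Add to ~/.bash_profile: point the Hadoop-free Spark build at Hadoop's jars.
export HADOOP_HOME=/usr/local/hadoop
export SPARK_DIST_CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath)
```

Without this, spark-shell can fail at startup with Hadoop-related ClassNotFoundException errors.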
What about Scala?
Spark is written in Scala, so even though we can use Java to write Spark code, we want to install Scala as well.
Download Scala from here: http://www.scala-lang.org/download/
Choose the first link to download Scala in binary form; the downloaded file is scala-2.11.8.tgz
Untar Scala and move it to a dedicated folder
$ cd ~/Downloads
$ tar xzvf scala-2.11.8.tgz
$ mv scala-2.11.8 /usr/local/scala
Add Scala bin folder to PATH
$ vi ~/.bash_profile
Append this line to the end of the file: export PATH=$PATH:/usr/local/scala/bin
$ source ~/.bash_profile
Now you should be able to do the following to access Spark shell for Scala
$ spark-shell
That’s it! Happy coding!
The Hadoop Development Tools (HDT) is a set of plugins for the Eclipse IDE for developing against the Hadoop platform.
The plugin provides the following features within the Eclipse IDE:
- Wizards for creation of Hadoop Based Projects
- Wizards for creating Java Classes for Mapper/Reducer/Driver etc.
- Launching Map-Reduce programs on a Hadoop cluster
- Listing running Jobs on MR Cluster
- Browsing/inspecting HDFS nodes
- Browsing/inspecting Zookeeper nodes
The tool allows you to work with multiple versions (1.1 and 2.2) of Hadoop from within one IDE.
This project is currently a member of the Apache Incubator, so check back for updates, or come join us at [email protected].
Apache Hadoop Development Tools is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.