Thursday, 15 November 2018

Spark Installation & Configuration

Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.


Prerequisites:

Java 8
Hadoop 2.6
Scala (the prebuilt Spark package ships with Scala and is built against Hadoop)

Downloading Apache Spark

Download Spark 1.6.1 into the /usr/local directory using the commands below.
$ cd /usr/local
$ sudo wget https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
Extract the Spark tar file
$ sudo tar -xvf spark-1.6.1-bin-hadoop2.6.tgz
$ sudo mv spark-1.6.1-bin-hadoop2.6 spark

Set Environment Variables

First we need to set the environment variables for Spark. Edit the ~/.bashrc file.
$ sudo nano ~/.bashrc
Append the following lines at the end of the file and save it.
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Change the ownership and permissions of the directory /usr/local/spark
$ sudo chown -R hdfs:hdfs /usr/local/spark
$ sudo chmod -R 755 /usr/local/spark
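Before opening a new shell, the two exports can be sanity-checked in isolation. The sketch below writes them to a throwaway file instead of touching ~/.bashrc; the paths are the ones assumed in this guide.

```shell
# Sketch: verify the SPARK_HOME export and PATH ordering using a
# temporary file rather than editing ~/.bashrc directly.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
EOF
. "$tmp"
echo "SPARK_HOME=$SPARK_HOME"
# Spark's bin directory should now be the first PATH entry.
case "$PATH" in
  /usr/local/spark/bin:*) echo "PATH ok" ;;
  *) echo "PATH wrong" ;;
esac
rm -f "$tmp"
```

If "PATH wrong" is printed, the export lines were appended incorrectly (a common culprit is a stray space, e.g. `$ SPARK_HOME` instead of `$SPARK_HOME`).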
For spark-sql, copy hive-site.xml file to /usr/local/spark/conf folder.
$ sudo cp /usr/local/hive/conf/hive-site.xml /usr/local/spark/conf/

Edit hive-site.xml and add the following property to the file.
$ sudo nano /usr/local/spark/conf/hive-site.xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
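A quick grep confirms the property actually landed in the file. The sketch below is self-contained for illustration (it writes a sample copy to a temp directory); on a real node, point CONF at /usr/local/spark/conf/hive-site.xml instead.

```shell
# Sketch: confirm hive.metastore.uris is present in hive-site.xml.
# A sample file is created in a temp dir here so the check is
# self-contained; swap CONF for the real path in practice.
dir=$(mktemp -d)
CONF="$dir/hive-site.xml"
cat > "$CONF" <<'EOF'
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
EOF
if grep -q 'hive.metastore.uris' "$CONF"; then
  found=yes
  echo "metastore uri configured"
else
  found=no
  echo "metastore uri missing"
fi
rm -rf "$dir"
```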

Start the Spark Services

Start the Spark services using the following commands.
$ cd /usr/local/spark/sbin
$ ./start-all.sh
Open a spark-shell prompt using the following commands.
$ cd /usr/local/spark/bin
$ ./spark-shell
Open a spark-sql prompt using the following commands.
$ cd /usr/local/spark/bin
$ ./spark-sql
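Once start-all.sh has run, the standalone master's web UI should be listening (port 8080 is the standalone-mode default; adjust if your deployment differs). A small hedged helper checks a port from bash without needing netcat:

```shell
# Sketch: report whether a TCP host:port is listening, using bash's
# built-in /dev/tcp redirection (bash-only, not POSIX sh).
check_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "$1:$2 listening"
  else
    echo "$1:$2 not listening"
  fi
}
# Port 8080 is the assumed Spark standalone master web UI port.
master_ui=$(check_port localhost 8080)
echo "$master_ui"
```

The same helper works for any of the web UIs mentioned in this post (e.g. Drill on 8047, the HBase master on 60010).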

Monday, 8 October 2018

Drill 1.10.0 Installation & Configuration

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Apache Drill is the first schema-free SQL engine. Unlike Hive, it does not run MapReduce jobs internally, and unlike most distributed query engines, it does not depend on Hadoop.





Prerequisites

Java 8
ZooKeeper quorum

Download and Installing Drill

Download Drill 1.10.0 into the /usr/local directory using the commands below.
$ cd /usr/local
$ sudo wget http://www-eu.apache.org/dist/drill/drill-1.10.0/apache-drill-1.10.0.tar.gz
Unpack the compressed tar file using this command.
$ sudo tar -xvf apache-drill-1.10.0.tar.gz
Rename the apache-drill-1.10.0 directory to drill in /usr/local using the following command.
$ sudo mv apache-drill-1.10.0 drill

Setting up Drill Environment Variables

First we need to set the environment variables for Drill. Edit the ~/.bashrc file.
$ sudo nano ~/.bashrc
Append following values at end of file and save the file.
export DRILL_HOME=/usr/local/drill
export PATH=$DRILL_HOME/bin:$PATH
Change the ownership and permissions of the directory /usr/local/drill
$ sudo chown -R hdfs:hdfs /usr/local/drill
$ sudo chmod -R 755 /usr/local/drill
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc

Start Drill services

To start Drill in embedded mode, use the following command.
$ drill-embedded
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Jul 20, 2017 7:19:59 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
apache drill 1.10.0
"say hello to my little drill"
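After the banner, Drill drops into its SQL shell. A quick way to try it is to query the sample data bundled on Drill's classpath; a session sketch (the `cp.` storage plugin and `employee.json` sample ship with Drill, and the prompt shown is embedded mode's):

```
0: jdbc:drill:zk=local> SELECT full_name, salary FROM cp.`employee.json` LIMIT 3;
```

Type `!quit` to leave the shell.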

Drill Web Interfaces

Access your Drill Web Interfaces on port 8047 in your favorite web browser.


Wednesday, 22 August 2018

HBase 0.98.4 Installation & Configuration


HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). HBase can host very large tables, with billions of rows and millions of columns, and can provide real-time, random read/write access to Hadoop data. HBase scales linearly across very large datasets and easily combines data sources with different structures and schemas.



Prerequisites

Java
Hadoop 

Download HBase File

Download HBase 0.98.4 into the /usr/local directory using the commands below.
$ cd /usr/local
$ sudo wget http://archive.apache.org/dist/hbase/hbase-0.98.4/hbase-0.98.4-hadoop2-bin.tar.gz
Unpack the hbase-0.98.4-hadoop2-bin.tar.gz file.
$ sudo tar -xvf hbase-0.98.4-hadoop2-bin.tar.gz
Rename the extracted hbase-0.98.4-hadoop2 directory to hbase.
$ sudo mv hbase-0.98.4-hadoop2 hbase

Setting up environment for HBase

Edit the ~/.bashrc file to set up the HBase environment by appending the following lines.

$ sudo nano ~/.bashrc 
export HBASE_HOME=/usr/local/hbase
export PATH=$HBASE_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Edit hbase-env.sh file.
$ cd /usr/local/hbase/conf
$ sudo nano hbase-env.sh
Set values for the following environment variables.
export JAVA_HOME=/usr/local/jdk
export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
export HBASE_MANAGES_ZK=true
Change the ownership and permissions of the directory /usr/local/hbase
$ sudo chown -R hdfs:hdfs /usr/local/hbase
$ sudo chmod -R 755 /usr/local/hbase

Edit Configuration Files

Edit hbase-site.xml and place the following properties inside the file.
$ cd /usr/local/hbase/conf
$ sudo nano hbase-site.xml
<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
 </property>
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
 </property>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
 </property>
 <property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/hduser/hbase/zookeeper</value>
 </property>
</configuration>

Check Version

Check the version of HBase
$ hbase version

2017-04-27 14:25:17,790 INFO  [main] util.VersionInfo: HBase 0.98.4-hadoop2
2017-04-27 14:25:17,791 INFO  [main] util.VersionInfo: Subversion git://acer/usr/src/hbase -r 890e852ce1c51b71ad180f626b71a2a1009246da
2017-04-27 14:25:17,791 INFO  [main] util.VersionInfo: Compiled by apurtell on Mon Jul 14 19:45:06 PDT 2014

Start and verify the Services

Start HBase Server
$ start-hbase.sh
Verify that the services have started using the following command.
$ jps
The output should include the following services.
15230 HMaster
15441 HRegionServer
Check whether HBase is installed properly.
$ hbase shell
It should show the HBase prompt.
hbase(main):001:0>
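From this prompt, a quick smoke test is to create a throwaway table and read it back; a session sketch (the table name t1 and column family cf are arbitrary):

```
hbase(main):001:0> create 't1', 'cf'
hbase(main):002:0> put 't1', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 't1'
hbase(main):004:0> disable 't1'
hbase(main):005:0> drop 't1'
```

`scan` should print row1 with the value just written; `disable` is required before `drop`.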
To stop the HBase server:
$ stop-hbase.sh

HBase Web Interfaces

Access the HBase Master web interface on port 60010 in your favorite web browser.
Access the HBase RegionServer web interface on port 60030 in your favorite web browser.

Wednesday, 11 July 2018

Zookeeper 3.4.6 Installation & Configuration


Apache ZooKeeper is a distributed coordination service for managing large sets of hosts. It is essentially a centralized service that exposes a hierarchical key-value store, used to provide distributed configuration, synchronization, and naming registry services for large distributed systems.



Prerequisites

Java 8

Downloading and Installing Zookeeper

Download ZooKeeper 3.4.6 into the /usr/local directory using the commands below.
$ cd /usr/local
$ sudo wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
Unpack the compressed tar file using this command.
$ sudo tar -xvzf zookeeper-3.4.6.tar.gz
Rename the zookeeper-3.4.6 directory to zookeeper in /usr/local using the following command.
$ sudo mv zookeeper-3.4.6 zookeeper

Setting up Zookeeper Environment Variables

First we need to set the environment variables for ZooKeeper. Edit the ~/.bashrc file.
$ sudo nano ~/.bashrc
Append following values at end of file and save the file.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Change the ownership and permissions of the directory /usr/local/zookeeper
$ sudo chown -R hdfs:hdfs /usr/local/zookeeper
$ sudo chmod -R 755 /usr/local/zookeeper
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc

Start zookeeper services

Start Zookeeper service using this command.
$ zkServer.sh start
The output shows ZooKeeper starting up.
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
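Beyond the startup banner, ZooKeeper's four-letter-word commands give a quick health check: a running server replies `imok` to `ruok`. A hedged sketch using bash's /dev/tcp (port 2181 is the default assumed throughout):

```shell
# Sketch: send ZooKeeper the "ruok" four-letter command and read the
# reply. "imok" means the server is serving requests; a connection
# failure leaves the reply empty. Uses bash's /dev/tcp (bash-only).
zk_status=$(
  { exec 3<>"/dev/tcp/localhost/2181" &&
    printf 'ruok' >&3 &&
    cat <&3
  } 2>/dev/null
)
if [ "$zk_status" = "imok" ]; then
  echo "zookeeper is serving"
else
  echo "zookeeper not reachable on 2181"
fi
```

On a live node, `zkServer.sh status` reports similar information more verbosely.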

Wednesday, 6 June 2018

Flume 1.4.0 Installation & Configuration


Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Apache Flume is not restricted to log data aggregation: it can also transport massive quantities of event data, including network traffic data, social media data, and email messages.



Prerequisites

Java
Hadoop

Download flume tar file

Download Flume 1.4.0 into the /usr/local directory using the commands below.
$ cd /usr/local
$ sudo wget http://archive.apache.org/dist/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
Unpack the apache-flume-1.4.0-bin.tar.gz file.
$ sudo tar -xvf apache-flume-1.4.0-bin.tar.gz
Rename apache-flume-1.4.0-bin folder to flume
$ sudo mv apache-flume-1.4.0-bin flume 

Setting up environment for flume

Edit the ~/.bashrc file to set up the Flume environment by appending the following lines.
$ sudo nano ~/.bashrc 

export FLUME_HOME=/usr/local/flume
export PATH=$FLUME_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Rename flume-env.sh.template file to flume-env.sh.
$ cd /usr/local/flume/conf
$ sudo cp flume-env.sh.template flume-env.sh
Edit the flume-env.sh file
$ sudo nano flume-env.sh
Set values for the JAVA_HOME and JAVA_OPTS environment variables with the Java installation directory. (By default these variables are commented out; uncomment them and set the values as below.)
JAVA_HOME=/usr/local/jdk
JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
Note: If we are going to use memory channels when setting up Flume agents, it is preferable to increase the memory limits in the JAVA_OPTS variable. By default, the minimum and maximum heap values are 100 MB and 200 MB respectively; it is better to increase these limits to 500 MB and 1000 MB respectively.

Change the ownership and permissions of the directory /usr/local/flume
$ sudo chown -R hdfs:hdfs /usr/local/flume
$ sudo chmod -R 755 /usr/local/flume
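As an illustration of the memory channel mentioned in the note above, a minimal single-node agent definition could look like the following (a hypothetical file, e.g. /usr/local/flume/conf/example.conf; the agent and component names a1/r1/c1/k1 are arbitrary):

```
# name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# netcat source listening on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# in-memory channel; larger capacities benefit from the bigger heap set above
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# logger sink prints events to the Flume log
a1.sinks.k1.type = logger

# wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Such an agent would be started with `flume-ng agent -n a1 -c /usr/local/flume/conf -f /usr/local/flume/conf/example.conf`.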

Check version

Check the version of flume.
$ flume-ng version
Flume 1.4.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 756924e96ace470289472a3bdb4d87e273ca74ef
Compiled by mpercy on Mon Jun 24 18:22:14 PDT 2013
From source with checksum f7db4bb30c2114d0d4fde482f183d4fe

Saturday, 2 June 2018

Sqoop 1.4.4 Installation & Configuration



Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and relational databases. Sqoop is used to import data from an RDBMS into the Hadoop Distributed File System or Hadoop ecosystem components such as Hive and HBase. Similarly, Sqoop can also be used to export data from Hadoop to relational databases like Oracle and MySQL.

Prerequisites:

Hadoop
MySQL Database

Download Sqoop 1.4.4 into the /usr/local directory using the commands below.

$ cd /usr/local
$ sudo wget http://archive.apache.org/dist/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
Unpack the sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz file.
$ sudo tar -xvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
Rename sqoop-1.4.4.bin__hadoop-2.0.4-alpha to sqoop.
$ sudo mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha sqoop

Setting up environment for Sqoop

Edit the ~/.bashrc file to set up the Sqoop environment by appending the following lines.
$ sudo nano ~/.bashrc 

export SQOOP_HOME=/usr/local/sqoop

export PATH=$SQOOP_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Sqoop can connect to various types of databases over JDBC, and it needs the JDBC driver for each database it connects to. In this setup we use the MySQL connector to connect to a MySQL database.
Download the below jar files into the /usr/local/sqoop path.
mysql-connector-java-5.1.25.jar
sqljdbc4.jar
Copy the above jar files from /usr/local/sqoop to the /usr/local/sqoop/lib/ folder.
$ sudo cp /usr/local/sqoop/mysql-connector-java-5.1.25.jar /usr/local/sqoop/lib/
$ sudo cp /usr/local/sqoop/sqljdbc4.jar /usr/local/sqoop/lib/
Change the ownership and permissions of the directory /usr/local/sqoop
$ sudo chown -R hdfs:hdfs /usr/local/sqoop

$ sudo chmod -R 755 /usr/local/sqoop
Check the version of sqoop.
$ sqoop version

Sqoop 1.4.4

git commit id 050a2015514533bc25f3134a33401470ee9353ad

Compiled by vasanthkumar on Mon Jul 22 20:06:06 IST 2013
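With the driver in place, a typical first command is an import from MySQL into HDFS. A sketch (the database `testdb`, table `emp`, user `dbuser`, and target directory are placeholders):

```
$ sqoop import \
    --connect jdbc:mysql://localhost/testdb \
    --username dbuser -P \
    --table emp \
    --target-dir /user/hdfs/emp \
    -m 1
```

`-P` prompts for the password instead of putting it on the command line, and `-m 1` runs a single map task, which avoids needing a split-by column on tables without a primary key.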

