Thursday 15 November 2018

Spark 1.6.1 Installation & Configuration

Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.


Prerequisites:

Java 8
Hadoop 2.6
Scala (the prebuilt Spark package bundles Scala and is built against Hadoop 2.6)

Downloading Apache Spark

Download Spark 1.6.1 into the /usr/local directory using the following commands.
$ cd /usr/local
$ sudo wget https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
Extract the Spark tar file and rename the extracted directory to spark.
$ sudo tar -xvf spark-1.6.1-bin-hadoop2.6.tgz
$ sudo mv spark-1.6.1-bin-hadoop2.6 spark

Set Environment Variables

First, we need to set the environment variables for Spark. Edit the ~/.bashrc file.
$ sudo nano ~/.bashrc
Append the following values at the end of the file and save it.
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Change the ownership and permissions of the directory /usr/local/spark
$ sudo chown -R hdfs:hdfs /usr/local/spark
$ sudo chmod -R 755 /usr/local/spark
For spark-sql, copy the hive-site.xml file to the /usr/local/spark/conf folder.
$ sudo cp /usr/local/hive/conf/hive-site.xml /usr/local/spark/conf/

Edit hive-site.xml and add the following property to the file.
$ sudo nano /usr/local/spark/conf/hive-site.xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
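This property points Spark SQL at the Hive metastore's Thrift service, so the metastore must be running before spark-sql is started. Assuming the metastore listens on the default port 9083, it can be started with:
$ hive --service metastore &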

Start the Spark Services

Start the Spark services using the following commands.
$ cd /usr/local/spark/sbin
$ ./start-all.sh
Open the spark-shell prompt using the following commands.
$ cd /usr/local/spark/bin
$ ./spark-shell
Open the spark-sql prompt using the following commands.
$ cd /usr/local/spark/bin
$ ./spark-sql
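As a quick smoke test of the installation (a minimal sketch; the computation itself is arbitrary), run a small job from the spark-shell prompt:
scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0
Similarly, running a simple statement such as show databases; from the spark-sql prompt confirms that the metastore connection is working.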

Monday 8 October 2018

Drill 1.10.0 Installation & Configuration

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Apache Drill is the first schema-free SQL query engine. Unlike Hive, it does not run MapReduce jobs internally, and unlike most distributed query engines, it does not depend on Hadoop.





Prerequisites

Java 8
ZooKeeper quorum

Download and Installing Drill

Download Drill 1.10.0 into the /usr/local directory using the following commands.
$ cd /usr/local
$ sudo wget http://www-eu.apache.org/dist/drill/drill-1.10.0/apache-drill-1.10.0.tar.gz
Unpack the compressed tar file using this command.
$ sudo tar -xvf apache-drill-1.10.0.tar.gz
Rename the apache-drill-1.10.0 directory to drill in the /usr/local directory using the following command.
$ sudo mv apache-drill-1.10.0 drill

Setting up Drill Environment Variables

First, we need to set the environment variables for Drill. Edit the ~/.bashrc file.
$ sudo nano ~/.bashrc
Append the following values at the end of the file and save it.
export DRILL_HOME=/usr/local/drill
export PATH=$DRILL_HOME/bin:$PATH
Change the ownership and permissions of the directory /usr/local/drill
$ sudo chown -R hdfs:hdfs /usr/local/drill
$ sudo chmod -R 755 /usr/local/drill
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc

Start Drill services

To start Drill in embedded mode, use the following command.
$ drill-embedded
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Jul 20, 2017 7:19:59 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
apache drill 1.10.0
"say hello to my little drill"

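Drill ships with sample data on its classpath, so the install can be verified straight from this prompt. A minimal sketch, using the bundled employee.json file via the cp storage plugin:
0: jdbc:drill:zk=local> SELECT employee_id, full_name FROM cp.`employee.json` LIMIT 3;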
Drill Web Interfaces

Access the Drill web interface on port 8047 in your favorite web browser.


Wednesday 22 August 2018

HBase 0.98.4 Installation & Configuration


HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). HBase can host very large tables (billions of rows, millions of columns) and can provide real-time, random read/write access to Hadoop data. HBase scales linearly across very large datasets and easily combines data sources with different structures and schemas.



Prerequisites

Java
Hadoop 

Download HBase File

Download HBase 0.98.4 into the /usr/local directory using the following command.
$ sudo wget -P /usr/local http://archive.apache.org/dist/hbase/hbase-0.98.4/hbase-0.98.4-hadoop2-bin.tar.gz
Unpack the hbase-0.98.4-hadoop2-bin.tar.gz file.
$ cd /usr/local
$ sudo tar -xvf hbase-0.98.4-hadoop2-bin.tar.gz
Rename the hbase-0.98.4-hadoop2 directory to hbase.
$ sudo mv hbase-0.98.4-hadoop2 hbase

Setting up environment for HBase

Edit the ~/.bashrc file to set up the HBase environment by appending the following lines.

$ sudo nano ~/.bashrc 
export HBASE_HOME=/usr/local/hbase
export PATH=$HBASE_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Edit hbase-env.sh file.
$ cd /usr/local/hbase/conf
$ sudo nano hbase-env.sh
Set values for the following environment variables.
export JAVA_HOME=/usr/local/jdk
export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
export HBASE_MANAGES_ZK=true
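Note that HBASE_MANAGES_ZK=true tells HBase to start and stop its own bundled ZooKeeper instance along with the HBase services, so a separately installed ZooKeeper quorum is not required for this setup.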
Change the ownership and permissions of the directory /usr/local/hbase
$ sudo chown -R hdfs:hdfs /usr/local/hbase
$ sudo chmod -R 755 /usr/local/hbase

Edit Configuration Files

Edit hbase-site.xml and place the following properties inside the file.
$ cd /usr/local/hbase/conf
$ sudo nano hbase-site.xml
<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
 </property>
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
 </property>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
 </property>
 <property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/hduser/hbase/zookeeper</value>
 </property>
</configuration>
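The hbase.rootdir value must match the NameNode address configured in Hadoop's core-site.xml (fs.defaultFS, here assumed to be hdfs://localhost:9000), and HDFS must be running before HBase starts. A quick check that HDFS is reachable:
$ hdfs dfs -ls /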

Check Version

Check the version of HBase.
$ hbase version

2017-04-27 14:25:17,790 INFO  [main] util.VersionInfo: HBase 0.98.4-hadoop2
2017-04-27 14:25:17,791 INFO  [main] util.VersionInfo: Subversion git://acer/usr/src/hbase -r 890e852ce1c51b71ad180f626b71a2a1009246da
2017-04-27 14:25:17,791 INFO  [main] util.VersionInfo: Compiled by apurtell on Mon Jul 14 19:45:06 PDT 2014

Start and verify the Services

Start the HBase server.
$ start-hbase.sh
To verify whether the services have started, use the following command.
$ jps
It should list the following services.
15230 HMaster
15441 HRegionServer
Check whether HBase is properly installed by opening the HBase shell.
$ hbase shell
It shows the HBase prompt.
hbase(main):001:0>
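From this prompt you can run a quick sanity check. A minimal sketch using a throwaway table (the table and column family names are arbitrary):
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 'test'
hbase(main):004:0> disable 'test'
hbase(main):005:0> drop 'test'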
To stop the HBase server:
$ stop-hbase.sh

HBase Web Interfaces

Access the HBase Master web UI on port 60010 in your favorite web browser.
Access the HBase RegionServer web UI on port 60030 in your favorite web browser.

Wednesday 11 July 2018

Zookeeper 3.4.6 Installation & Configuration


Apache Zookeeper is a distributed coordination service for managing a large set of hosts. It is essentially a centralized service that provides distributed systems with a hierarchical key-value store, used for distributed configuration, synchronization, and naming registry in large distributed systems.



Prerequisites

Java 8

Downloading and Installing Zookeeper

Download zookeeper-3.4.6 into the /usr/local directory using the following commands.
$ cd /usr/local
$ sudo wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
Unpack the compressed tar file using this command.
$ sudo tar -xvzf zookeeper-3.4.6.tar.gz
Rename the zookeeper-3.4.6 directory to zookeeper in the /usr/local directory using the following command.
$ sudo mv zookeeper-3.4.6 zookeeper

Setting up Zookeeper Environment Variables

First, we need to set the environment variables for Zookeeper. Edit the ~/.bashrc file.
$ sudo nano ~/.bashrc
Append the following values at the end of the file and save it.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Change the ownership and permissions of the directory /usr/local/zookeeper
$ sudo chown -R hdfs:hdfs /usr/local/zookeeper
$ sudo chmod -R 755 /usr/local/zookeeper
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
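Zookeeper ships only a sample configuration file, and zkServer.sh expects conf/zoo.cfg to exist, so create it from the template before starting the service (the defaults give a minimal single-node setup; adjust dataDir if needed):
$ cd /usr/local/zookeeper/conf
$ sudo cp zoo_sample.cfg zoo.cfg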

Start zookeeper services

Start the Zookeeper service using this command.
$ zkServer.sh start
The output shows Zookeeper starting.
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
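To verify that the service is running, check its status or connect with the bundled CLI (assuming the default client port 2181):
$ zkServer.sh status
$ zkCli.sh -server localhost:2181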

Wednesday 6 June 2018

Flume 1.4.0 Installation & Configuration


Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Apache Flume is not restricted to log data aggregation; it can also transport massive quantities of event data, including network traffic data, social media data, and email messages.



Prerequisites

Java
Hadoop

Download Flume tar file

Download Flume 1.4.0 into the /usr/local directory using the following command.
$ sudo wget -P /usr/local http://archive.apache.org/dist/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
Unpack the apache-flume-1.4.0-bin.tar.gz file.
$ cd /usr/local
$ sudo tar -xvf apache-flume-1.4.0-bin.tar.gz
Rename the apache-flume-1.4.0-bin directory to flume.
$ sudo mv apache-flume-1.4.0-bin flume

Setting up environment for flume

Edit the ~/.bashrc file to set up the Flume environment by appending the following lines.
$ sudo nano ~/.bashrc 

export FLUME_HOME=/usr/local/flume
export PATH=$FLUME_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Copy the flume-env.sh.template file to flume-env.sh.
$ cd /usr/local/flume/conf
$ sudo cp flume-env.sh.template flume-env.sh
Edit the flume-env.sh file.
$ sudo nano flume-env.sh
Set the JAVA_HOME and JAVA_OPTS environment variables to match the Java installation directory (by default these variables are commented out; uncomment them and set the values as below).
JAVA_HOME=/usr/opt/jdk
JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
Note: If you are going to use memory channels when setting up Flume agents, it is preferable to increase the memory limits in the JAVA_OPTS variable. By default, the minimum and maximum heap sizes are 100 MB and 200 MB respectively; increasing them to 500 MB and 1000 MB respectively works better.

Change the ownership and permissions of the directory /usr/local/flume
$ sudo chown -R hdfs:hdfs /usr/local/flume
$ sudo chmod -R 755 /usr/local/flume

Check version

Check the version of Flume.
$ flume-ng version
Flume 1.4.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 756924e96ace470289472a3bdb4d87e273ca74ef
Compiled by mpercy on Mon Jun 24 18:22:14 PDT 2013
From source with checksum f7db4bb30c2114d0d4fde482f183d4fe
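As a quick functional test, the classic single-node example from the Flume user guide works well: a netcat source feeding a logger sink through a memory channel. A minimal sketch (the agent name a1 and the config file path are arbitrary):
$ sudo nano /usr/local/flume/conf/example.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent, then send it a line of text from another terminal:
$ flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/example.conf --name a1 -Dflume.root.logger=INFO,console
$ echo hello | nc localhost 44444
The event should appear in the agent's console log.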

Saturday 2 June 2018

Sqoop 1.4.4 Installation & Configuration



Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and relational databases. Sqoop is used to import data from an RDBMS into the Hadoop Distributed File System or Hadoop ecosystem components like Hive and HBase. Similarly, Sqoop can also be used to extract data from Hadoop into relational databases such as Oracle and MySQL.

Prerequisites

  • Hadoop
  • MySQL Database

Download Sqoop 1.4.4 into the /usr/local directory using the following command.
$ sudo wget -P /usr/local http://archive.apache.org/dist/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
Unpack the sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz file.
$ cd /usr/local
$ sudo tar -xvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
Rename sqoop-1.4.4.bin__hadoop-2.0.4-alpha to sqoop
$ sudo mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha sqoop

Setting up environment for Sqoop

Edit the ~/.bashrc file to set up the Sqoop environment by appending the following lines.
$ sudo nano ~/.bashrc
export SQOOP_HOME=/usr/local/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Sqoop connects to various types of databases using JDBC, so it needs the JDBC driver for each database it talks to. In this setup we use the MySQL Connector/J driver to connect to a MySQL database (sqljdbc4.jar is the equivalent driver for SQL Server).
Download the following jar files into the /usr/local/sqoop path:
mysql-connector-java-5.1.25.jar
sqljdbc4.jar
Copy the above jar files from /usr/local/sqoop to the /usr/local/sqoop/lib/ folder.
$ sudo cp /usr/local/sqoop/mysql-connector-java-5.1.25.jar /usr/local/sqoop/lib/
$ sudo cp /usr/local/sqoop/sqljdbc4.jar /usr/local/sqoop/lib/
Change the ownership and permissions of the directory /usr/local/sqoop
$ sudo chown -R hdfs:hdfs /usr/local/sqoop
$ sudo chmod -R 755 /usr/local/sqoop
Check the version of Sqoop.
$ sqoop version
Sqoop 1.4.4
git commit id 050a2015514533bc25f3134a33401470ee9353ad
Compiled by vasanthkumar on Mon Jul 22 20:06:06 IST 2013
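As a quick check that the MySQL driver is picked up, list the databases on the server, or run a small import. A minimal sketch with hypothetical connection details, table name, and target directory:
$ sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root -P
$ sqoop import --connect jdbc:mysql://localhost:3306/testdb --username root -P --table employees --target-dir /user/hdfs/employees -m 1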

