Wednesday, 6 June 2018

Flume 1.4.0 Installation & Configuration


Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Flume is not restricted to log data aggregation: it can also transport large volumes of event data such as network traffic data, social media data, and email messages.



Prerequisites

Java
Hadoop
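
Before continuing, it can help to confirm that both prerequisites are actually on the PATH. The snippet below is only a minimal sketch: missing_commands is a hypothetical helper name (not part of Flume or Hadoop), and it assumes the prerequisites expose the usual java and hadoop commands.

```shell
# Hypothetical helper: print each named command that is NOT found on the PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Prints nothing when both prerequisites are installed and on the PATH.
missing_commands java hadoop
```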

Download flume tar file

Download Flume 1.4.0 into the /usr/local directory using the command below. The -P option tells wget which directory to save the file in.
$ sudo wget -P /usr/local http://archive.apache.org/dist/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
Unpack apache-flume-1.4.0-bin.tar.gz file
$ cd /usr/local
$ sudo tar -xvf apache-flume-1.4.0-bin.tar.gz
Rename apache-flume-1.4.0-bin folder to flume
$ sudo mv apache-flume-1.4.0-bin flume 
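
The unpack and rename steps can also be combined into one. The following is a sketch, assuming a tar that supports --strip-components (GNU tar does); unpack_as is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: unpack_as <tarball> <target-dir>
# Extracts a gzip-compressed tarball into <target-dir>, dropping the archive's
# top-level folder (e.g. apache-flume-1.4.0-bin/) so no rename is needed.
unpack_as() {
  tarball=$1
  target=$2
  mkdir -p "$target" &&
    tar -xzf "$tarball" -C "$target" --strip-components=1
}

# Example (run from a root shell, since /usr/local is normally root-owned):
# unpack_as /usr/local/apache-flume-1.4.0-bin.tar.gz /usr/local/flume
```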

Setting up environment for flume

Edit the ~/.bashrc file to set up the Flume environment by appending the following lines. The file belongs to the current user, so sudo is not needed.
$ nano ~/.bashrc

export FLUME_HOME=/usr/local/flume
export PATH=$FLUME_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
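
If this setup is scripted rather than done in an editor, re-running the script should not add the same export lines twice. A minimal sketch of one way to do that follows; append_once is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: append_once <file> <line>
# Appends <line> to <file> only if an identical line is not already there.
append_once() {
  file=$1
  line=$2
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $FLUME_HOME and $PATH unexpanded in the file):
# append_once ~/.bashrc 'export FLUME_HOME=/usr/local/flume'
# append_once ~/.bashrc 'export PATH=$FLUME_HOME/bin:$PATH'
```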
Copy the flume-env.sh.template file to flume-env.sh.
$ cd /usr/local/flume/conf
$ sudo cp flume-env.sh.template flume-env.sh
Edit the flume-env.sh file
$ sudo nano flume-env.sh
Set the JAVA_HOME and JAVA_OPTS variables; JAVA_HOME must point to the Java installation directory. (By default both variables are commented out. Uncomment them and set the values as shown below.)
JAVA_HOME=/usr/opt/jdk
JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
Note: If you plan to use memory channels when setting up Flume agents, it is advisable to raise the memory limits in JAVA_OPTS. By default, the minimum (-Xms) and maximum (-Xmx) heap sizes are 100 MB and 200 MB respectively; the values above raise them to 500 MB and 1000 MB.
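
Options copied from web pages sometimes carry typographic dashes instead of plain hyphens, which the JVM does not recognise as flags. As a rough sanity check, the sketch below pulls the heap sizes out of a JAVA_OPTS string; heap_opt is a hypothetical helper name, and it only understands values written in megabytes with an "m" suffix.

```shell
# Hypothetical helper: heap_opt <Xms|Xmx> <options-string>
# Prints the size in MB of the named heap option, or nothing if the option
# is absent or not written as a plain ASCII "-Xms<N>m" / "-Xmx<N>m" flag.
heap_opt() {
  printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^-$1\([0-9][0-9]*\)m\$/\1/p"
}

# Example: an empty result for Xmx suggests a mistyped flag.
# heap_opt Xmx "$JAVA_OPTS"
```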

Change the ownership and permissions of the directory /usr/local/flume
$ sudo chown -R hdfs:hdfs /usr/local/flume
$ sudo chmod -R 755 /usr/local/flume
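
The chown command fails if the hdfs account does not exist on the machine. A small sketch for checking this first follows; user_exists is a hypothetical helper name, and it checks only the user, not the hdfs group.

```shell
# Hypothetical helper: user_exists <name>
# Succeeds (exit status 0) if the named user account exists.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# Example:
# user_exists hdfs || echo "create the hdfs user before running chown"
```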

Check version

Check the version of flume.
$ flume-ng version
Flume 1.4.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 756924e96ace470289472a3bdb4d87e273ca74ef
Compiled by mpercy on Mon Jun 24 18:22:14 PDT 2013
From source with checksum f7db4bb30c2114d0d4fde482f183d4fe
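
In a provisioning script the same check can be automated. The sketch below assumes the first line of the output has the form "Flume <version>", as in the output above; flume_version is a hypothetical helper name.

```shell
# Hypothetical helper: read `flume-ng version` output on stdin and print
# only the version number taken from the "Flume <version>" line.
flume_version() {
  sed -n 's/^Flume \([0-9][0-9.]*\)$/\1/p'
}

# Example:
# [ "$(flume-ng version | flume_version)" = "1.4.0" ] && echo "Flume 1.4.0 is installed"
```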

Saturday, 2 June 2018

Sqoop 1.4.4 Installation & Configuration



Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and relational databases. Sqoop is used to import data from an RDBMS into the Hadoop Distributed File System (HDFS) or into Hadoop ecosystem components such as Hive and HBase. Similarly, Sqoop can export data from Hadoop to relational databases such as Oracle and MySQL.

Prerequisite:

  • Hadoop
  • MySQL database

Download Sqoop 1.4.4 into the /usr/local directory using the command below. The -P option tells wget which directory to save the file in.
$ sudo wget -P /usr/local http://archive.apache.org/dist/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
Unpack sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz file
$ cd /usr/local
$ sudo tar -xvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
Rename sqoop-1.4.4.bin__hadoop-2.0.4-alpha to sqoop
$ sudo mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha sqoop

Setting up environment for Sqoop

Edit the ~/.bashrc file to set up the Sqoop environment by appending the following lines. The file belongs to the current user, so sudo is not needed.
$ nano ~/.bashrc

export SQOOP_HOME=/usr/local/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
Reload the configuration file ~/.bashrc with the following command.
$ source ~/.bashrc
Sqoop can connect to various types of databases using JDBC, and it needs the JDBC driver for each database it connects to. In this setup we use the MySQL connector to connect to a MySQL database.
Download the jar files below into the /usr/local/sqoop path
mysql-connector-java-5.1.25.jar

sqljdbc4.jar
Copy the above jar files from /usr/local/sqoop to the /usr/local/sqoop/lib/ folder
$ sudo cp /usr/local/sqoop/mysql-connector-java-5.1.25.jar /usr/local/sqoop/lib/
$ sudo cp /usr/local/sqoop/sqljdbc4.jar /usr/local/sqoop/lib/
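
With the MySQL connector jar in the lib folder, Sqoop addresses a MySQL database through a JDBC URL of the form jdbc:mysql://host:port/database. The sketch below builds such a URL; mysql_jdbc_url is a hypothetical helper name, and the host, port, database name, and user in the example are assumptions rather than values from this setup.

```shell
# Hypothetical helper: mysql_jdbc_url <host> <port> <database>
# Prints the JDBC URL that Sqoop's --connect option expects for MySQL.
mysql_jdbc_url() {
  printf 'jdbc:mysql://%s:%s/%s\n' "$1" "$2" "$3"
}

# Example (assumes a MySQL server on localhost:3306 with a database named
# testdb; -P makes Sqoop prompt for the password):
# sqoop list-tables --connect "$(mysql_jdbc_url localhost 3306 testdb)" --username root -P
```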
Change the ownership and permissions of the directory /usr/local/sqoop
$ sudo chown -R hdfs:hdfs /usr/local/sqoop
$ sudo chmod -R 755 /usr/local/sqoop
Check the version of sqoop.
$ sqoop version
Sqoop 1.4.4
git commit id 050a2015514533bc25f3134a33401470ee9353ad
Compiled by vasanthkumar on Mon Jul 22 20:06:06 IST 2013
