12.4. Flume | Hands-On Demo On CloudxLab

Let's do a hands-on exercise with Flume. We will read data from a port and push it to HDFS. Log in to the CloudxLab Linux console in two different terminals. On the first terminal we will run the flume agent, and on the second terminal we will use netcat to send it the data to read.

Copy the Flume configuration from HDFS to the Linux console. It is located at /data/flume/conf on HDFS. Open flume.properties.
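As a sketch, the copy can be done with the HDFS shell (copying into the current directory here is just an assumption; any local directory works):

    hadoop fs -copyToLocal /data/flume/conf .
    cat conf/flume.properties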
We've defined the configuration for agent a1 in this file.
We can define configurations for multiple agents (a1, a2, a3, and so on) in the same file. While running Flume, we specify the name of the agent we want to run on that machine.
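As an illustration, every line in a Flume properties file is prefixed with its agent's name, so agent a1 declares its components like this (r1, k1, and c1 are just conventional component names, not anything fixed by Flume):

    # Components of agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1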

We have specified the source type as netcat.
netcat is a quick way to create a server that listens on a specified port.
Let's change the port number to 44444.
If port 44444 is already in use by another user when you run the flume agent, Flume will throw an "Address already in use" error. In that case, change the port to some other number, such as 44445 or 44446, in the Flume configuration file.
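The source section of the configuration should then look roughly like this (assuming the conventional source name r1):

    # netcat source: listens for lines of text on localhost:44444
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444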

The sink type is HDFS.
Change the HDFS sink path to your home directory in HDFS.
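The sink section might look like the sketch below, where <your_username> is a placeholder for your own HDFS home directory:

    # HDFS sink: events are written under this path
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /user/<your_username>/flume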

Also note that we are specifying the channel type as memory, which buffers events in memory.
Finally, bind the source and sink to the channel.
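Putting it together, the channel definition and the bindings could look like this (capacity is an optional tuning knob; note that the source property is the plural "channels" while the sink property is the singular "channel"):

    # memory channel buffering up to 1000 events
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000

    # wire the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1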

Let's run the flume agent on the first terminal. Note that we specify the agent name as a1. Port 44444 is used by another process, so let's change the port to 44445 and run the flume agent again. Port 44445 is also taken, so change the port to 44443. Run the agent again, and this time it starts successfully.
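The command used to start the agent looks something like this; the --conf directory and the properties file path depend on where you copied the configuration:

    flume-ng agent --conf conf --conf-file conf/flume.properties --name a1 -Dflume.root.logger=INFO,console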

Now let’s produce some data. Go to the second terminal and type nc localhost 44443. Type in some data and see if it gets pushed to HDFS in the sink path.
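For example, every line you type becomes one Flume event, and the netcat source acknowledges each accepted line with an OK:

    nc localhost 44443
    hello flume
    this line should land in HDFS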

Log in to Hue and check the path in the File Browser. Here we can see the data in binary format.
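You can also inspect the output from the command line. The HDFS sink writes SequenceFiles by default, which is why the data looks binary, and FlumeData is its default file prefix; hdfs dfs -text can decode such files. The path below reuses the placeholder from earlier:

    hdfs dfs -ls /user/<your_username>/flume
    hdfs dfs -text /user/<your_username>/flume/FlumeData.*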

This Big Data tutorial will help you learn HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume, Sqoop, Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, and GraphX from scratch. Everything in this course is explained with relevant examples, so you will actually know how to implement the topics you learn.

Let us know in the comments below if you find it helpful.

In order to claim the certificate from E&ICT Academy, IIT Roorkee, visit https://bit.ly/cxlyoutube

________

Website ► https://www.cloudxlab.com
Facebook ► /cloudxlab
Instagram ► /cloudxlab
Twitter ► /cloudxlab
