Apache Hadoop Flume Tutorial

What is Apache Flume?

Apache Flume is a tool for moving data from one place to another. It is a distributed system that transports data reliably, and it is an important part of the Hadoop ecosystem. In Flume, each unit of data is treated as an event. A typical use is collecting log data from many web servers and delivering it to HDFS.

Features of Apache Flume:

  • Its main feature is collecting data from multiple web servers.
  • It can ingest the large volumes of data produced by sites such as Facebook and Twitter.
  • It supports fan-in and fan-out flows and a wide range of source and destination types.
  • It collects data from multiple sources and moves it to a destination.

Main Components of Apache Flume:

Event:

Every unit of data in Apache Flume is an event. The purpose of an event is to transport data from a source to a destination, and it stores its payload as a byte array. Each event carries a single unit of data to the destination.
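To make the idea of an event concrete, here is a minimal Python sketch. It is only an illustration and not Flume's own Java class: the `Event` dataclass, the `event_from_log_line` helper, and the `host` header are names assumed for this example. Real Flume events also carry a set of optional string headers next to the byte-array body, which is why the sketch includes a `headers` field.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Illustrative stand-in for a Flume event (not the real Flume class)."""
    body: bytes                                             # payload as a byte array
    headers: Dict[str, str] = field(default_factory=dict)   # optional string metadata


def event_from_log_line(line: str, host: str) -> Event:
    """Wrap one log line in an event and record a hypothetical 'host' header."""
    return Event(body=line.encode("utf-8"), headers={"host": host})


event = event_from_log_line("GET /index.html 200", host="web01")
print(event.headers["host"], event.body)
```

One log line becomes one event, which matches the rule that each event carries a single unit of data.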

Agent:

An agent receives data from a client or from another agent and forwards it to the next destination.

Source:

A source is a component of an agent. It receives data from data generators and passes it to one or more channels. Flume supports many types of sources. The source is the entry point through which data enters Flume.

Sink:

Flume also provides many types of sinks. A sink removes events from a channel and delivers them to the next destination, which can be another agent or a final store such as HDFS.

Channel:

A channel holds the events it receives from sources until sinks remove them. A channel can work with any number of sources and sinks.
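The source, channel, and sink roles described above can be sketched as a tiny in-process pipeline. This is a simplified model under stated assumptions, not Flume code: `MemoryChannel`, `run_source`, and `run_sink` are hypothetical names, and the channel is just a bounded queue.

```python
from collections import deque


class MemoryChannel:
    """Toy in-memory channel that buffers events between a source and a sink."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event: bytes) -> None:
        if len(self._queue) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


def run_source(lines, channel: MemoryChannel) -> None:
    """Source role: turn incoming lines into events and put them on the channel."""
    for line in lines:
        channel.put(line.encode("utf-8"))


def run_sink(channel: MemoryChannel, destination: list) -> None:
    """Sink role: remove events from the channel and deliver them."""
    while (event := channel.take()) is not None:
        destination.append(event)


channel = MemoryChannel()
delivered = []
run_source(["login ok", "page view"], channel)
run_sink(channel, delivered)
print(delivered, len(channel))
```

The channel decouples the two sides: the source can keep writing while the sink drains at its own pace, up to the channel's capacity.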

Client:

Clients create events and send them to agents.

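Channel Selectors:

When a source is connected to more than one channel, a channel selector determines which channel or channels receive each event. There are two types of channel selectors:

Default channel selector – sends every event to all channels. It is also known as the replicating channel selector.

Multiplexing channel selector – sends each event to a channel chosen according to a value in the event's header.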
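The difference between the two selector types can be sketched in a few lines of Python. This is an illustrative model only, not Flume's implementation: the function names, the `region` header, and the channel names `c1` and `c2` are assumptions made for the example.

```python
def replicating_select(event, channel_names):
    """Default behaviour: every configured channel receives the event."""
    return list(channel_names)


def multiplexing_select(event, header, mapping, default):
    """Choose one channel by looking up a header value in a mapping."""
    value = event["headers"].get(header)
    return [mapping.get(value, default)]


event = {"headers": {"region": "eu"}, "body": b"order placed"}
mapping = {"eu": "c1", "us": "c2"}   # hypothetical header value -> channel name

print(replicating_select(event, ["c1", "c2"]))
print(multiplexing_select(event, "region", mapping, default="c2"))
```

With the replicating selector the event reaches both channels; with the multiplexing selector it reaches only the channel mapped to its header value, or the default channel when the value has no mapping.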

Data Flow in Flume:

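Flume supports the following types of data flow, together with a failure handling mechanism:

1. Multi-hop Flow:

A Flume deployment can contain many agents. When an event passes through more than one agent before reaching its final destination, the flow is called a multi-hop flow.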
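As a rough illustration of a multi-hop flow, the sketch below passes one event through two agents in sequence. The `hop` function, the agent names, and the `path` field are hypothetical and exist only to make the route visible; they are not part of Flume.

```python
def hop(agent_name, events):
    """One agent hop: receive events and forward copies tagged with this agent."""
    return [
        {"body": e["body"], "path": e["path"] + [agent_name]}
        for e in events
    ]


events = [{"body": b"log line", "path": []}]
for agent_name in ["web-agent", "collector-agent"]:
    events = hop(agent_name, events)

print(events[0]["path"])
```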

2. Fan-out Flow:

When data flows from one source to multiple channels, it is called a fan-out flow. There are two types of fan-out flow:

  • Replicating flow: the event is copied to every configured channel
  • Multiplexing flow: the event is sent only to selected channels

3. Fan-in Flow:

When data flows from multiple sources into a single channel, it is called a fan-in flow.

4. Failure Handling:

Each event delivery involves two transactions, one at the sender and one at the receiver. The sender sends the event to the receiver. Once the receiver has received it, the receiver commits its own transaction and sends a "received" signal back to the sender. The sender commits its transaction only after it receives this signal. If no signal arrives, the sender does not commit its transaction.
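The handshake described above can be sketched as follows. This is a simplified, single-process model written for illustration, not Flume's transaction API: `Receiver`, `send_pending`, and the `available` flag are assumed names, and the boolean return value stands in for the "received" signal.

```python
from collections import deque


class Receiver:
    """Hypothetical receiving side of a hop (for example, the next agent)."""

    def __init__(self, available: bool = True):
        self.available = available
        self.stored = []

    def receive(self, event: bytes) -> bool:
        if not self.available:
            return False            # no "received" signal is sent
        self.stored.append(event)   # receiver commits its own transaction
        return True                 # "received" signal


def send_pending(pending: deque, receiver: Receiver) -> int:
    """Deliver events in order, removing each one only after it is acknowledged."""
    delivered = 0
    while pending:
        event = pending[0]          # peek without removing
        if not receiver.receive(event):
            break                   # no signal: keep the event for a retry
        pending.popleft()           # signal received: sender commits
        delivered += 1
    return delivered


pending = deque([b"event-1", b"event-2"])
first_try = send_pending(pending, Receiver(available=False))
receiver = Receiver()
second_try = send_pending(pending, receiver)
print(first_try, second_try, receiver.stored)
```

Because the sender removes an event only after the acknowledgement, a failed attempt leaves both events queued, and the retry delivers them in their original order.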