Command Line Installation

Installing Flume

Flume is included in the HDP repository, but it is not installed automatically as part of the standard HDP installation process. Hortonworks recommends that administrators not install Flume agents on any node in a Hadoop cluster. The following image depicts a sample topology with six Flume agents:

  • Agents 1, 2, and 4 installed on web servers in Data Centers 1 and 2.

  • Agents 3 and 5 installed on separate hosts in Data Centers 1 and 2 to collect and forward server data in Avro format.

  • Agent 6 installed on a separate host on the same network as the Hadoop cluster in Data Center 3 to write all Avro-formatted data to HDFS.

Note

It is possible to run multiple Flume agents on the same host. The sample topology represents only one potential data flow.

Note

Hortonworks recommends that administrators use a separate configuration file for each Flume agent. In the diagram above, agents 1, 2, and 4 may have identical configuration files with matching Flume sources, channels, and sinks. The same is true of agents 3 and 5. Although it is possible to use one large configuration file that specifies all the Flume components needed by all the agents, this is not typical of production deployments. See Configuring Flume for more information about configuring Flume agents.
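As a rough illustration of a per-agent configuration file, the following sketch writes a minimal configuration for a stand-in for Agent 6 (an Avro source feeding an HDFS sink through a memory channel). The helper name write_agent6_conf, the agent and component names, the port, and the HDFS URL are all assumptions made for this example. The property keys follow the standard Flume properties format, but the values should be checked against Configuring Flume before use.

```shell
# Sketch only: write a standalone configuration file for one Flume agent.
# write_agent6_conf, agent6, avroSrc, memCh, hdfsSink, port 4141, and the
# HDFS URL are illustrative assumptions, not values from the HDP documentation.
write_agent6_conf() {
    conf_file="$1"
    cat > "$conf_file" <<'EOF'
# Components owned by the agent named "agent6"
agent6.sources = avroSrc
agent6.channels = memCh
agent6.sinks = hdfsSink

# Avro source: receives events forwarded by upstream agents
agent6.sources.avroSrc.type = avro
agent6.sources.avroSrc.bind = 0.0.0.0
agent6.sources.avroSrc.port = 4141
agent6.sources.avroSrc.channels = memCh

# Memory channel: buffers events between the source and the sink
agent6.channels.memCh.type = memory
agent6.channels.memCh.capacity = 10000

# HDFS sink: writes buffered events to HDFS
agent6.sinks.hdfsSink.type = hdfs
agent6.sinks.hdfsSink.hdfs.path = hdfs://namenode.example.com:8020/flume/events
agent6.sinks.hdfsSink.channel = memCh
EOF
}
```

Calling the function with a target path, for example write_agent6_conf ./agent6.conf, produces one file that describes only this agent, which matches the one-file-per-agent recommendation.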

For additional information regarding Flume, see the Apache Flume Component Guide.

Prerequisites

  1. You must have at least core Hadoop on your system. See Configuring the Remote Repositories for more information.

  2. Verify the HDP repositories are available:

    yum list flume

    The output should list at least one Flume package similar to the following:

    flume.noarch 1.5.2.2.2.6.0-2800.el6 HDP-2.6

    If yum responds with "Error: No matching package to list" as shown below, yum cannot locate a matching RPM. This can happen if the repository hosting the HDP RPMs is unavailable or has been disabled. Follow the instructions at Configuring the Remote Repositories to configure a private repository before proceeding.

    Error: No matching package to list.
  3. You must have set the JAVA_HOME environment variable as appropriate for your operating system. See JDK Requirements for instructions on installing the JDK.

    export JAVA_HOME=/path/to/java
  4. The following Flume components have HDP component dependencies. You cannot use these Flume components if the dependencies are not installed.

    Table 18.1. Flume 1.5.2 Dependencies

    Flume Component    HDP Component Dependencies
    HDFS Sink          Hadoop 2.5
    HBase Sink         HBase 0.98.0
    Hive Sink          Hive 0.13.0, HCatalog 0.13.0, and Hadoop 2.5
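The JAVA_HOME prerequisite above can be checked before installation with a small shell function. This is a minimal sketch: check_java_home is a hypothetical helper name, and the test for an executable bin/java under JAVA_HOME is an assumption about a typical JDK layout rather than an HDP requirement.

```shell
# Sketch only: verify that JAVA_HOME is set and points at something that
# looks like a JDK. check_java_home is a hypothetical helper, not part of HDP.
check_java_home() {
    # Fail if the variable is unset or empty
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    # Assumption: a usable JDK has an executable at $JAVA_HOME/bin/java
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "No executable java found under $JAVA_HOME/bin"
        return 1
    fi
    echo "JAVA_HOME looks usable: $JAVA_HOME"
    return 0
}
```

Running check_java_home in the same shell that will run the installation gives an early warning if the export step was skipped or points at the wrong directory.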


Installation

Verify the HDP repositories are available for your Flume installation by entering yum list flume. See Prerequisites for more information.

To install Flume from a terminal window, type:

  • For RHEL or CentOS:

    yum install flume

    yum install flume-agent #This installs init scripts

  • For SLES:

    zypper install flume

    zypper install flume-agent #This installs init scripts

  • For Ubuntu and Debian:

    apt-get install flume

    apt-get install flume-agent #This installs init scripts
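The per-platform commands above differ only in the package manager, which the following sketch makes explicit. The function name print_flume_install and the OS family keywords it accepts are hypothetical, chosen for this example. The function only prints the commands and does not run them.

```shell
# Sketch only: print the Flume install commands for a given OS family.
# print_flume_install and the family keywords are illustrative assumptions.
print_flume_install() {
    case "$1" in
        rhel|centos)   pm="yum" ;;
        sles)          pm="zypper" ;;
        ubuntu|debian) pm="apt-get" ;;
        *)
            echo "unsupported OS family: $1" >&2
            return 1
            ;;
    esac
    # The flume package provides Flume; flume-agent installs the init scripts
    echo "$pm install flume"
    echo "$pm install flume-agent"
}
```

For example, print_flume_install sles prints the two zypper commands listed for SLES, which can then be reviewed and run manually with the appropriate privileges.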

The main Flume files are located in /usr/hdp/current/flume-server. The main configuration files are located in /etc/flume/conf.