Workflow Management
Also available as:
PDF
loading table of contents...

Create the Sqoop Action to Extract Data

You must add a Sqoop action to the ETL workflow to extract the data from the database.

Steps

  1. Click Create>Workflow.

  2. Click the connector between the Start and End nodes, then click the + icon.

  3. With the workflow graph displayed in the workspace, click the Name field and enter a descriptive name for the workflow.

    For example, ETL workflow.

    The name in this field is the name that displays for the workflow job in the WFM dashboard.

  4. Click the Sqoop icon to add an action node to the workflow.

    This Sqoop action will extract data from the MySQL database.

  5. Click the Sqoop node in the workflow graph and rename it using a descriptive name.

    For example, name the node sqoop-extract. Spaces are not allowed in node names.

    This is necessary because there will be two Sqoop actions in this workflow, and each node in a workflow must have a unique name. Having descriptive node names is also helpful when identifying what a node is intended to do, especially in more complicated workflows.

    [Tip]Tip

    Spaces are not allowed in action node names.

  6. Click the Sqoop node again and then click the Action Settings gear icon.

  7. In the Sqoop action dialog box, select Command.

    In the Sqoop action settings, you can choose to use commands or arguments with a job-XML element to run the job. This example uses a command.

  8. In the Command field, enter a command to extract data.

    For example:

    import --connect jdbc:mysql://wfmgr-5.openstacklocal/customer-data --username wfm --password-file /user/wfm/.password
    --table marketing --split-by rowkey --hive-import -m 1 

    This Sqoop command imports the MySQL data from the database customer-data into a Hive table called marketing. The password for user wfm is called from a password file.

  9. Expand the Advanced Properties section and do the following:

    1. Browse to the directory that contains the Hive and Tez configuration files you copied into a lib directory and add those resources to the File fields.

      For example:

      /user/wfm/oozie/apps/lib/lib_$TIMESTAMP/hive/hive-conf.xml

      /user/wfm/oozie/apps/lib/lib_$TIMESTAMP/tez/tez-conf.xml

    2. In the Prepare section, select delete, and then browse for or type the path to be deleted.

      Selecting delete ensures that if a job is interrupted prior to completion, any files that were created will be deleted prior to re-executing the job, otherwise the rerun cannot complete.

      You can optionally include the delete option in the Command field.

  10. Use the default settings for the remaining fields and options.

  11. Click Save and close the dialog box.

More Information

Apache Sqoop Action

Apache Sqoop User Guide