Data Access

Chapter 6. Orchestrating SQL and APIs with Apache Phoenix

Apache Phoenix is a SQL abstraction layer for interacting with Apache HBase and other Hadoop components. Phoenix lets you create and interact with tables in the form of typical DDL/DML statements via its standard JDBC API. With the driver APIs, Phoenix translates SQL to native HBase API calls. Consequently, Phoenix provides a SQL skin for working with data and objects stored in the NoSQL schema of HBase.
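For example, the following illustrative statements (the table and column names are hypothetical) show the kind of DDL and DML that Phoenix translates into native HBase calls:

    -- Create a table; Phoenix maps it to an HBase table of the same name.
    CREATE TABLE IF NOT EXISTS us_population (
        state CHAR(2) NOT NULL,
        city VARCHAR NOT NULL,
        population BIGINT
        CONSTRAINT pk PRIMARY KEY (state, city));

    -- Phoenix uses UPSERT rather than INSERT.
    UPSERT INTO us_population VALUES ('NY', 'New York', 8143197);

    SELECT state, city, population FROM us_population WHERE state = 'NY';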

This Phoenix documentation focuses on interoperability with HBase. For more information about Phoenix capabilities, see the Apache Phoenix website.

Enabling Phoenix and Interdependent Components

If you have a Hortonworks Data Platform installation with Ambari, then no separate installation is required for Phoenix.

To enable Phoenix with Ambari:

  1. Open Ambari.

  2. Select Services tab > HBase > Configs tab.

  3. Scroll down to the Phoenix SQL settings.

  4. (Optional) Reset the Phoenix Query Timeout.

  5. Click the Enable Phoenix slider button.

If you installed Hortonworks Data Platform manually and did not include the Phoenix component, see Installing Apache Phoenix.

Important

Your Phoenix installation must be the same version as the one that is packaged with the distribution of the HDP stack version that is deployed across your cluster.

Thin Client Connectivity with Phoenix Query Server

The Phoenix Query Server (PQS) is a component of the Apache Phoenix distribution. PQS provides an alternative to connecting directly through the Phoenix JDBC thick client: it is a stand-alone server that converts custom API calls from "thin clients" into HTTP requests that make use of Phoenix capabilities. This topology offloads most computation to PQS and requires a smaller client-side footprint. The PQS client protocol is based on the Avatica component of Apache Calcite.
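As a minimal sketch, a thin client can open a SQL prompt against PQS with the sqlline-thin.py utility that ships with the Phoenix client; the hostname below is illustrative, and 8765 is the default PQS port:

    $ /usr/hdp/current/phoenix-client/bin/sqlline-thin.py http://pqs-host.example.com:8765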

Securing Authentication on the Phoenix Query Server

You can enable Kerberos-based authentication on PQS with Ambari. If you chose to install HDP manually instead, see Configuring Phoenix Query Server to enable the Kerberos protocol.

Selecting and Obtaining a Client Driver

You have two options to develop an application that works with Phoenix, depending on the client-server architecture:

Without Phoenix Query Server: If your environment does not have a PQS layer, applications that connect to Phoenix must use the Phoenix JDBC client driver.

With Phoenix Query Server: PQS is an abstraction layer that enables other languages, such as Python and Go, to work with Phoenix. The layer provides a protocol buffer-based HTTP wrapper around Phoenix JDBC. You might prefer a non-Java client driver for various reasons, such as to avoid the JVM footprint on the client or to develop with a different application framework.

To obtain the appropriate driver for application development:

JDBC Driver

Use the /usr/hdp/current/phoenix-client/phoenix-client.jar file in the Hortonworks Phoenix server-client repository. If you use the repository, download the JAR file corresponding to your installed HDP version. With Ambari, you can determine the HDP version by using the Versions tab. Alternatively, run the hadoop version command to print the HDP version.
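For a quick check that the thick client works, you can open a SQL prompt with the sqlline.py utility and the ZooKeeper quorum; the hostname below is illustrative:

    $ /usr/hdp/current/phoenix-client/bin/sqlline.py zk01.example.com:2181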

JDBC Driver as a Maven dependency

See Download the HDP Maven Artifacts for Maven artifact repositories that are available for HDP.
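As a sketch, a pom.xml dependency on the Phoenix client libraries might look like the following; the version value is a placeholder that must match the Phoenix version packaged with your HDP stack:

    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-core</artifactId>
        <!-- Replace with the Phoenix version packaged with your HDP stack -->
        <version>${phoenix.hdp.version}</version>
    </dependency>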

Microsoft .NET Driver

Download and install a NuGet package for the Microsoft .NET Driver for Apache Phoenix and Phoenix Query Server. Note: Operability with this driver is a Hortonworks Technical Preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact Support by logging a case on the Hortonworks Support Portal.

Other non-Java drivers

Other non-JDBC drivers for Phoenix are available as HDP add-ons and on other websites, but they are not currently supported by Hortonworks. You can find compatible client drivers by constructing a web search string consisting of "avatica" and the name of an application programming language that you want to use. Example: avatica python.

Creating and Using User-Defined Functions (UDFs) in Phoenix

With a user-defined function (UDF), you can extend the functionality of your SQL statements by creating scalar functions that operate on a specific tenant. For details about creating, dropping, and using UDFs in Phoenix, see User-defined functions on the Apache website.
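As an illustration of the UDF DDL described there, the following sketch registers, uses, and drops a scalar function; the function, class, and JAR names are hypothetical, and the class must exist in the referenced JAR:

    -- Register the function from a JAR that contains the implementation class.
    CREATE FUNCTION my_reverse(VARCHAR) RETURNS VARCHAR
        AS 'com.example.udf.MyReverseFunction'
        USING JAR 'hdfs://namenode:8020/apps/phoenix/lib/my-udf.jar';

    -- Use it like a built-in function.
    SELECT my_reverse(city) FROM us_population;

    DROP FUNCTION my_reverse;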

Mapping Phoenix Schemas to HBase Namespaces

You can map a Phoenix schema to an HBase namespace to gain multitenancy features in Phoenix.

HBase, which is often the underlying storage engine for Phoenix, has namespaces to support multitenancy features. Multitenancy helps an HBase user or administrator perform access control and quota management tasks. Also, namespaces enable tighter control of where a particular data set is stored on RegionServers. See Enabling Multitenancy with Namespaces for further information.

Prior to HDP 2.5, Phoenix tables could not be associated with a namespace other than the default namespace.

Enabling Namespace Mapping

Important

After you set the properties to enable the mapping of Phoenix schemas to HBase namespaces, reverting the property settings renders the Phoenix database unusable. Test or carefully plan the Phoenix to HBase namespace mappings before implementing them.

To enable Phoenix schema mapping to a non-default HBase namespace:

  1. Set the phoenix.schema.isNamespaceMappingEnabled property to true in the hbase-site.xml file of both the client and the server.

  2. Restart the HBase Master and RegionServer processes.

Note

You might not want to map Phoenix system tables to namespaces because of compatibility issues with your current applications. In this case, set the phoenix.schema.mapSystemTablesToNamespace property in the hbase-site.xml file to false.

Namespace Mapping Properties in the hbase-site.xml File

phoenix.schema.isNamespaceMappingEnabled

Enables mapping of tables of a Phoenix schema to a non-default HBase namespace. To enable mapping of a schema to a non-default namespace, set the value of this property to true. The default setting for this property is false.

phoenix.schema.mapSystemTablesToNamespace

With true setting (default): After namespace mapping is enabled with the other property, all system tables, if any, are migrated to a namespace called system.

With false setting: System tables are associated with the default namespace.
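Taken together, a fragment of the hbase-site.xml file that enables both behaviors might look like this (set it on both the client and the server, then restart HBase):

    <property>
        <name>phoenix.schema.isNamespaceMappingEnabled</name>
        <value>true</value>
    </property>
    <property>
        <name>phoenix.schema.mapSystemTablesToNamespace</name>
        <value>true</value>
    </property>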

Creating New Schemas and Tables with Namespace Mapping

You can use the following DDL statements for managing schemas:

  • CREATE SCHEMA

  • USE SCHEMA

  • DROP SCHEMA

You must have admin privileges in HBase to run CREATE SCHEMA or DROP SCHEMA.

See the Apache Phoenix Grammar reference page for how you can use these DDL statements.

As you create physical tables, views, and indexes, you can associate them with a schema. If the schema already has namespace mapping enabled, the newly created objects automatically become part of the HBase namespace. The directory of the HBase namespace that maps to the Phoenix schema inherits the schema name. For example, if the schema name is store1, then the full path to the namespace is $hbase.rootdir/data/store1. See the "F.A.Q." section of Apache Phoenix Namespace Mapping for more information.
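For example, the following sketch creates the store1 schema from the preceding example and a table inside it; the table and column names are illustrative:

    CREATE SCHEMA IF NOT EXISTS store1;
    USE store1;
    -- With namespace mapping enabled, this table is created
    -- in the HBase namespace that corresponds to store1.
    CREATE TABLE orders (
        order_id BIGINT NOT NULL PRIMARY KEY,
        amount DECIMAL(10,2));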

Associating Tables of a Schema to a Namespace

After you enable namespace mapping on a Phoenix schema that already has tables, you can migrate the tables to an HBase namespace. The namespace directory that contains the migrated tables inherits the schema name. For example, if the schema name is store1, then the full path to the namespace is $hbase.rootdir/data/store1.

System tables are migrated to the namespace automatically during the first connection after enabling namespace properties.

Associating in a Noncustomized Environment without Kerberos

Run the following command to associate a table:

$ bin/psql.py {ZooKeeper_hostname} -m {schema_name.table_name}
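For example, with an illustrative ZooKeeper hostname and the store1 schema from earlier (the table name is hypothetical):

    $ bin/psql.py zk01.example.com -m store1.orders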

Associating in a Customized Kerberos Environment

Prerequisite: In a Kerberos-secured environment, you must have admin privileges (user hbase) to complete the following task.

  1. Navigate to the Phoenix home directory. The default location is /usr/hdp/current/phoenix-client/.

  2. Run a command to migrate a table of a schema to a namespace, using the following command syntax for the options that apply to your environment:

    $ bin/psql.py {{ZooKeeper_hostnames:2181} | [:zookeeper.znode.parent] | [:HBase_headless_keytab_location] | [:principal_name] | [;TenantId=tenant_Id] | [;CurrentSCN=current_SCN]}
    -m {schema_name.table_name}

Additional information for valid command parameters:

ZooKeeper_hostnames

Enter the ZooKeeper hostname or hostnames that compose the ZooKeeper quorum. If you enter multiple hostnames, enter them as comma-separated values. This parameter is required. You must append the colon and ZooKeeper port number if you invoke the other security parameters in the command. The default port number is 2181.

zookeeper.znode.parent

This setting is defined in the hbase-site.xml file.

-m schema_name.table_name

The -m argument is required. There is a space before and after the -m option.
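Putting these parameters together, an invocation on a Kerberos-secured cluster might look like the following sketch; the quorum, znode parent, keytab path, principal, and table name are all hypothetical:

    $ bin/psql.py zk01.example.com:2181:/hbase-secure:/etc/security/keytabs/hbase.headless.keytab:hbase@EXAMPLE.COM -m store1.orders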

Phoenix Repair Tool

Important

The Phoenix repair tool of HDP is a technical preview and considered under development. Do not use this feature in your production systems. If you have questions regarding this feature, contact Support by logging a case on the Hortonworks Support Portal.

Apache Phoenix depends on the SYSTEM.CATALOG table for metadata information, such as table structure and index location, to function correctly. Use the Phoenix repair tool to validate the data integrity of the SYSTEM.CATALOG table. If a Phoenix client is not functioning as expected and throwing exceptions such as ArrayIndexOutOfBound or TableNotFound, this tool can help identify the problem and fix it.

The repair tool is designed to flag flagrant trouble spots and to fix SYSTEM.CATALOG problems in a way that does not radically affect your Phoenix system. The tool prompts you to confirm changes before the SYSTEM.CATALOG table is modified.

Do not use the Phoenix repair tool for an upgrade. The tool is designed to function only with the current version of the system catalog and to use the HBase API directly.

Running the Phoenix Repair Tool

Tip

Run the HDFS fsck and HBase hbck tools before running the Phoenix repair tool. Checking the condition of HDFS and HBase is highly recommended because the Phoenix repair tool does not check or repair HDFS and HBase, both of which must be in working order for the repair tool to fix Phoenix problems.
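For example, the preliminary checks might be run as follows (both commands are standard; the HDFS path shown checks the entire file system):

    # Check HDFS integrity.
    $ hdfs fsck /
    # Check HBase table and region consistency.
    $ hbase hbck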

About this Task

  • The Phoenix repair tool looks for table records in the system catalog and collects all corresponding information about columns and indexes. If certain inconsistencies are detected, then the tool prompts you to verify that it should proceed with fixing the problems. The tool can fix the following problems:

    • Missing or disabled physical table

    • Incorrect number of columns in table metadata information

    • Table record has columns with an internal index that is out of range

  • The tool performs a cross-reference check between user tables and indexes. If a user table has an index that is missing its physical table, the tool offers to delete the link to this index as well as to delete the index table record from the system catalog. If the physical table is disabled, the tool asks whether it needs to be enabled.

  • If you allow the Phoenix repair tool to fix an issue, the tool creates a snapshot of the SYSTEM.CATALOG table. The snapshot is created in case you want to roll back the repair operation.

Prerequisites

Verify that no other instance of the Phoenix repair tool launches or runs while you run the tool. Also, ensure that no other clients modify the system catalog data while the tool runs.

Steps

  1. Run the psql.py utility with the -r option:

    /usr/hdp/current/phoenix-client/bin/psql.py -r
  2. If the tool detects previously stored snapshots on the system, respond to the restore prompt:

    • Respond whether the tool should delete or retain the previously recorded snapshots.

    • Indicate whether the tool should proceed with the integrity check or restore a table from one of the snapshots.

Result and Next Step

After the tool completes the check, you can consider the SYSTEM.CATALOG table as validated. You can proceed with SQL operations in the Phoenix CLI.