Using Apache Phoenix to store and access data

Limitations of the Apache Phoenix-Spark connector

Be aware of the following limitations when using the Apache Phoenix-Spark connector:

  • The DataSource API provides only basic support for column and predicate pushdown.
  • The DataSource API does not support passing custom Phoenix settings in the configuration. If you need fine-grained configuration, you must create the DataFrame or RDD directly.
  • Aggregate and distinct queries are not supported, but you can perform any operation on the RDDs or DataFrames created after reading data from Phoenix.
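
The second and third limitations can be worked around together, as in the following minimal sketch. It assumes a Phoenix table named TABLE1 with columns ID and COL1, and a ZooKeeper quorum at zk-host:2181; these names, and the timeout value, are hypothetical placeholders and do not come from this guide. The sketch creates the DataFrame directly so that a custom Configuration can be passed, then runs the aggregation in Spark after the data is loaded.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.phoenix.spark._

object PhoenixDirectLoadSketch {
  def main(args: Array[String]): Unit = {
    // Custom settings go into a Hadoop Configuration, because the
    // DataSource API does not accept them.
    val configuration = new Configuration()
    configuration.set("hbase.zookeeper.quorum", "zk-host:2181") // hypothetical quorum address
    configuration.set("phoenix.query.timeoutMs", "120000")      // example Phoenix setting

    val sc = new SparkContext("local", "phoenix-direct-load")
    val sqlContext = new SQLContext(sc)

    // Create the DataFrame directly; TABLE1, ID, and COL1 are hypothetical names.
    val df = sqlContext.phoenixTableAsDataFrame(
      "TABLE1", Seq("ID", "COL1"), conf = configuration
    )

    // The aggregation runs in Spark on the loaded data, not in Phoenix.
    df.groupBy("COL1").count().show()

    sc.stop()
  }
}
```

Because the aggregation is evaluated by Spark, the rows that match the selected columns and any predicate are read from Phoenix before they are grouped.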

    The Phoenix JDBC driver normalizes column names, but the Phoenix-Spark integration does not perform this operation when loading data from a Phoenix table. Therefore, when you create DataFrames or RDDs from a Phoenix table (sparkContext.phoenixTableAsRDD or sqlContext.phoenixTableAsDataFrame), you must specify column names exactly as they were defined when the Phoenix table was created. When you persist a DataFrame to Phoenix, however, the integration normalizes column names that are not double-quoted by default. You can turn this behavior off by setting the skipNormalizingIdentifier parameter to true:

    df.saveToPhoenix(<tableName>, zkUrl = Some(quorumAddress), skipNormalizingIdentifier = true)
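
To make the effect of normalization concrete, the following sketch re-implements the rule in plain Scala for illustration only. It is a simplified stand-in, not the connector's own code: the object name, the normalize function, and the sample column names are hypothetical. It assumes the usual Phoenix convention that unquoted identifiers are upper-cased, while double-quoted identifiers keep their case and lose the surrounding quotes.

```scala
import java.util.Locale

// Illustrative stand-in for identifier normalization; not the connector's code.
object IdentifierNormalizationSketch {

  // A name counts as quoted only if it starts with a double quote and the
  // next double quote is the last character.
  private def isQuoted(name: String): Boolean =
    name.length >= 2 && name.charAt(0) == '"' && name.indexOf('"', 1) == name.length - 1

  // Quoted names keep their case and lose the quotes; all others are upper-cased.
  def normalize(name: String): String =
    if (isQuoted(name)) name.substring(1, name.length - 1)
    else name.toUpperCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    val columns = Seq("id", "col1", "\"mixedCase\"")
    columns.foreach(c => println(s"$c -> ${normalize(c)}"))
  }
}
```

Under this rule, a DataFrame column named col1 is written to the Phoenix column COL1 unless skipNormalizingIdentifier is set to true, in which case the name is passed through unchanged.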