Limitations of the Apache Phoenix-Spark connector
You should be aware of the following limitations when using the Apache Phoenix-Spark connector:
- The DataSource API provides only basic support for column and predicate pushdown.
- The DataSource API does not support passing custom Phoenix settings in its configuration. If you need fine-grained configuration, create the DataFrame or RDD directly.
- Aggregate and distinct queries are not supported, but you can perform any operation on the RDDs or DataFrames formed after reading data from Phoenix.
Note
The Phoenix JDBC driver normalizes column names, but the Phoenix-Spark integration does not perform this operation while loading data from a Phoenix table. Therefore, when creating DataFrames or RDDs from a Phoenix table (sparkContext.phoenixTableAsRDD or sqlContext.phoenixTableAsDataFrame), you must specify column names exactly as they were defined when the Phoenix table was created. However, when persisting a DataFrame to Phoenix, column names that are not double quoted are normalized by default. You can turn off this normalization by setting the skipNormalizingIdentifier parameter to true:
df.saveToPhoenix(<tableName>, zkUrl = Some(quorumAddress), skipNormalizingIdentifier = true)
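The read side of the points above can be sketched as follows. This is a minimal, hedged example, not a definitive implementation: the table name TABLE1, the column names ID and COL1, and the ZooKeeper quorum address zkHost:2181 are assumptions you must replace with values from your own cluster, and the code requires a live Phoenix/HBase deployment to run.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.count
import org.apache.phoenix.spark._

// Assumed names: a Phoenix table TABLE1 with columns ID and COL1,
// and a ZooKeeper quorum at zkHost:2181. Adjust for your environment.
val sc = new SparkContext("local", "phoenix-read-example")
val sqlContext = new SQLContext(sc)

// Column names must be written exactly as defined in the Phoenix table;
// the connector does not normalize them on load.
val df = sqlContext.phoenixTableAsDataFrame(
  "TABLE1", Seq("ID", "COL1"), zkUrl = Some("zkHost:2181"))

// Aggregate queries cannot be pushed down to Phoenix, but once the data
// is loaded you can run any operation on the resulting DataFrame.
df.agg(count("ID")).show()
```

Because the aggregation runs in Spark after the load, all rows selected by the column and predicate pushdown are transferred from Phoenix first; only then does Spark compute the count.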