Known Issues in Iceberg

Learn about the known issues in Iceberg, their impact on functionality, and the available workarounds.

Concurrent compactions and modify statements can corrupt Iceberg tables
Hive or Impala DELETE/UPDATE/MERGE operations on Iceberg V2 tables can corrupt the tables if there is a concurrent table compaction from Spark. The issue occurs when the compaction and the modify statement run in parallel and the compaction job commits before the modify statement. In that case, the modify statement’s position delete files still point to the old data files that the compaction replaced. The results for DELETE and for UPDATE / MERGE are as follows:
  • DELETE

    Delete records pointing to old files have no effect.

  • UPDATE / MERGE

    Delete records pointing to old files have no effect. The table also contains the newly added data records, so both the old and the new versions of the affected records are active.

Use one of the following workarounds:
  • Do not run compactions and DELETE/UPDATE/MERGE statements in parallel.
  • Do not compact the table via Iceberg’s RewriteFiles operation. For example, do not use Spark’s rewriteDataFiles.
CDPD-57551: Performance issues can occur on reads after writes to Iceberg tables
Hive might generate too many small files, which causes performance degradation.
Maintain a relatively small number of data files under the Iceberg table or partition directory to keep reads efficient. To alleviate poor performance caused by too many small files, run the following queries, where <preTruncateSnapshotId> is the ID of the snapshot that was current before the TRUNCATE:
TRUNCATE TABLE target;
INSERT OVERWRITE TABLE target SELECT * FROM target FOR SYSTEM_VERSION AS OF <preTruncateSnapshotId>;
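
The snapshot ID has to be recorded before the table is truncated. As a rough sketch, assuming the table lives in a database named default (a placeholder, not taken from this page) and that your Hive version supports querying Iceberg metadata tables, the most recent entry in the table history can be looked up first:

```sql
-- Assumption: the table is default.target; replace "default" with your database.
-- Run this BEFORE the TRUNCATE and note the returned snapshot_id.
SELECT snapshot_id, made_current_at
FROM default.target.history
ORDER BY made_current_at DESC
LIMIT 1;
```

Substitute the returned snapshot_id for <preTruncateSnapshotId> in the INSERT OVERWRITE statement.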