Troubleshooting DAS Installation
Also available as:
PDF

Replication failure in the DAS Event Processor

DAS uses replication as a way to copy database and table metadata information from Hive to DAS Postgres database. If the replication fails, then you may not be able to see database or table information in DAS.

The Hive metadata replication process occours in the following two phases:
  • Bootstrap dump
  • Incremental dump

When the DAS Event Processor is started for the first time, the entire Hive database and table metadata is copied in to DAS. This is known as the bootstrap dump. After this phase, only the differences are copied in to DAS at one-minute intervals, from the time the last successful dump was run. This is known as an incremental dump.

If the bootstrap dump never succeeded, then you will not see any database or table information in DAS. If the bootstrap dump fails, then the information regarding the failure is captured in the most recent Event Processor log.

If an incremental dump fails, then you will not see any new changes to the databases and tables in DAS. The incremental dump relies on events stored in the Hive metastore, because these events take up a lot of space and are only used for replicating data. The events are removed from Hive metastore daily, which can affect DAS.

Fixing incremental dump failures

If you see the message “Notification events are missing in the meta store”, then:
  1. Stop the DAS Event Processor.
  2. Log in to the DAS Postgres database.
  3. Re-initiate the bootstrap dump by running the following commands:
    update das.databases set creation_source = 'EVENT_PROCESSOR';
     update das.tables set creation_source = 'EVENT_PROCESSOR';
     update das.columns set creation_source = 'EVENT_PROCESSOR';
     delete from das.db_replication_info;
  4. Restart the DAS Event Processor.
Note
Note

The error message "Notification events are missing in the meta store" is a symptom and not the cause of an issue, at most times. You can usually fix this by resetting the database. However, to determine the actual cause of the failure, we need to look at the first repl dump failure. Older Event Processor logs may contain the information about the actual cause, but they are purged from the system at a set frequency. Therefore, if you hit this issue, we recommend that you first copy all the Event Processor logs that are available; else it might be too late to diagnose the actual issue.

If the first exception that you hit is an SQLException, then it is a Hive-side failure. Save the HiveServer and the Hive Metastore logs for the time when the exception occoured.

File a bug with Cloudera Support along with the above-mentioned logs.