Repair partitions using MSCK repair
The MSCK REPAIR TABLE command was designed to manually add partitions that are added to or removed from the file system, such as HDFS or S3, but are not present in the metastore.
This task assumes you created a partitioned external table named
emp_part that stores partitions outside the warehouse. You
remove one of the partition directories on the file system. This action renders the
metastore inconsistent with the file system. You repair the discrepancy manually to
synchronize the metastore with the file system, HDFS for example.
List the partition directories on HDFS.
hdfs dfs -ls /user/hive/dataload/employee Found 4 items drwxr-xr-x - hive hdfs 0 2019-06-09 00:31 /user/hive/dataload/employee/dept=finance drwxr-xr-x - hive hdfs 0 2019-06-09 00:31 /user/hive/dataload/employee/dept=sales drwxr-xr-x - hive hdfs 0 2019-06-09 00:31 /user/hive/dataload/employee/dept=service -rw-r--r-- 3 hdfs hdfs 147 2019-06-08 23:54 /user/hive/dataload/employee/employees.csv
hdfs dfs -rm -r /user/hive/dataload/employee/dept=sales; 19/06/13 00:52:16 INFO fs.TrashPolicyDefault: Moved: 'hdfs://max.com:8020/user/hive/dataload/employee/dept=sales' to trash at:...This action also removes the data file 000000_0 in the dept=sales directory.
Launch Beeline, and in the Hive shell, look at the
SHOW PARTITIONS emp_part;The list of partitions is stale; it still includes the dept=sales directory.
+----------------+ | partition | +----------------+ | dept=finance | | dept=sales | | dept=service | +----------------+
Repair the partition manually.
MSCK REPAIR TABLE emp_part DROP PARTITIONS;