[ERROR] 2021-03-31 13:18:54,023 [pool-3-thread-1] infoworks.discovery.export.filesystem.FilesystemDestination:148 :: Error while initializing data directories
com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: Directory 'ingestion/full/csv' present in the metadata but not s3
listStatus: No Amazon S3 object for metadata item /S3_bucket/dir/object
getFileStatus: Key dir/file is present in metadata but not Amazon S3
For every Amazon S3 operation, EMRFS checks the metadata for information about the set of objects in consistent view. If EMRFS finds that Amazon S3 is inconsistent during one of these operations, it retries the operation according to parameters defined in the emrfs-site configuration properties. After EMRFS exhausts the retries, it either throws a ConsistencyException or logs the exception and continues the workflow.
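The retry behaviour described above is set through the emrfs-site classification. As a minimal sketch, the following writes such a classification to a local file that could be supplied when a cluster is created. The file name emrfs-config.json and the values shown are illustrative assumptions, not recommendations, and the property names should be checked against the EMR release in use.

```shell
#!/usr/bin/env bash
# Write an emrfs-site classification to a local file.
# The file name and all values below are illustrative assumptions.
#   retryPeriodSeconds            - wait between consistency retries
#   retryCount                    - number of retries before giving up
#   throwExceptionOnInconsistency - "false" logs the exception and continues
#                                   instead of throwing ConsistencyException
cat > emrfs-config.json <<'EOF'
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent.retryPeriodSeconds": "10",
      "fs.s3.consistent.retryCount": "5",
      "fs.s3.consistent.throwExceptionOnInconsistency": "false"
    }
  }
]
EOF
```

Setting throwExceptionOnInconsistency to "false" only hides the symptom. The metadata still has to be repaired with the emrfs commands.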
The consistency problem is usually caused by the retry logic in Spark and Hadoop. When a process fails while creating a file on S3, the entry for that file may already have been written to DynamoDB. When Hadoop restarts the process, it finds the entry in DynamoDB but no matching object in S3, so it throws a consistency error.
To delete the metadata stored in DynamoDB for S3 objects that have already been removed:
emrfs delete s3://path
To import the metadata for objects that are physically present in S3 into DynamoDB:
emrfs import s3://path
To sync the metadata with the data in S3:
emrfs sync s3://path
To check whether a particular object is present in both S3 and the metadata:
emrfs diff s3://path
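The commands above can be combined into a simple check-then-repair sequence. The following is a sketch only: check_and_sync is a hypothetical helper name, s3://path is the same placeholder used above, and nothing is assumed about the exit code that emrfs diff returns when it finds differences.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inspect a path, reconcile the EMRFS metadata with S3,
# then inspect it again so the result can be confirmed by eye.
check_and_sync() {
  local target="$1"
  if [ -z "$target" ]; then
    echo "usage: check_and_sync s3://bucket/prefix" >&2
    return 2
  fi

  # Show what differs between the metadata and S3 before changing anything.
  # The exit status is ignored on purpose (its meaning is not assumed here).
  emrfs diff "$target" || true

  # Reconcile the metadata with what is actually present in S3.
  emrfs sync "$target" || return 1

  # Diff again; after a successful sync no differences are expected.
  emrfs diff "$target" || true
}
```

For example, check_and_sync s3://path runs diff, sync, and diff in that order for the given path. Running diff first keeps a record of what was inconsistent before the metadata is changed.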