Error:


[ERROR] 2021-03-31 13:18:54,023 [pool-3-thread-1] infoworks.discovery.export.filesystem.FilesystemDestination:148 :: Error while initializing data directories Directory 'ingestion/full/csv' present in the metadata but not s3



listStatus: No Amazon S3 object for metadata item /S3_bucket/dir/object

getFileStatus: Key dir/file is present in metadata but not Amazon S3

Root cause: 


For every Amazon S3 operation, EMRFS checks the metadata for information about the set of objects in consistent view. If EMRFS finds that Amazon S3 is inconsistent during one of these operations, it retries the operation according to parameters defined in the emrfs-site configuration properties. After EMRFS exhausts the retries, it either throws a ConsistencyException or logs the exception and continues the workflow.
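As an illustration of the retry parameters mentioned above, the following sketch writes an emrfs-site classification file that could be supplied when configuring a cluster. The file name (CONFIG_FILE) and all values are assumptions chosen for the example, not recommendations; verify the property names and defaults against the documentation for your EMR release.

```shell
#!/usr/bin/env bash
# Sketch: write an emrfs-site classification with consistent-view retry settings.
# CONFIG_FILE and every value below are illustrative assumptions.
CONFIG_FILE="${CONFIG_FILE:-emrfs-site.json}"

cat > "$CONFIG_FILE" <<'EOF'
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent": "true",
      "fs.s3.consistent.retryPolicyType": "exponential",
      "fs.s3.consistent.retryPeriodSeconds": "10",
      "fs.s3.consistent.retryCount": "5",
      "fs.s3.consistent.throwExceptionOnInconsistency": "true"
    }
  }
]
EOF

echo "wrote $CONFIG_FILE"
```

With throwExceptionOnInconsistency set to "true", EMRFS throws a ConsistencyException once the retries are exhausted; with "false", it logs the exception and continues.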


In most cases the consistency problem is caused by retry logic in Spark and Hadoop. If a process fails while creating a file on S3 after the entry has already been written to DynamoDB, the metadata and S3 no longer agree. When Hadoop restarts the process, it finds the entry already present in DynamoDB and throws a consistency error.




To delete the metadata stored in DynamoDB for S3 objects that have already been removed:

emrfs delete s3://path

To import the metadata for the objects that are physically present in S3 into DynamoDB:


emrfs import s3://path

To sync the metadata with the objects in S3:

emrfs sync s3://path

To check whether a particular object is present in both S3 and the metadata:

emrfs diff s3://path
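
The commands above can be combined into a check, repair, re-check sequence. The following is a minimal sketch, assuming the emrfs CLI is on the PATH of the node where it runs. S3_PATH, DRY_RUN, run, and repair_metadata are hypothetical names introduced for the example; with DRY_RUN=1 (the default here) the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: inspect, then repair, EMRFS metadata for one S3 prefix.
# S3_PATH and DRY_RUN are illustrative names, not part of the emrfs CLI.
S3_PATH="${S3_PATH:-s3://path}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

repair_metadata() {
  # 1. Report where the metadata and S3 disagree.
  run emrfs diff "$1"
  # 2. Reconcile the metadata with what is actually in S3.
  run emrfs sync "$1"
  # 3. Run the comparison again to confirm nothing is left over.
  run emrfs diff "$1"
}

repair_metadata "$S3_PATH"
```

Running the diff before and after the sync makes it easier to see which entries were stale. Set DRY_RUN=0 only after reviewing the printed commands.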

Ref :