Error:

 

[ERROR] 2021-03-31 13:18:54,023 [pool-3-thread-1] infoworks.discovery.export.filesystem.FilesystemDestination:148 :: Error while initializing data directories

com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: Directory 'ingestion/full/csv' present in the metadata but not s3

 

 

The same condition can also be reported as:

listStatus: No Amazon S3 object for metadata item /S3_bucket/dir/object

getFileStatus: Key dir/file is present in metadata but not Amazon S3


Root cause: 

 

For every Amazon S3 operation, EMRFS checks its metadata for the set of objects tracked in consistent view. If EMRFS finds that Amazon S3 is inconsistent with that metadata during one of these operations, it retries the operation according to the parameters defined in the emrfs-site configuration properties. After EMRFS exhausts the retries, it either throws a ConsistencyException or logs the exception and continues the workflow.
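
To see which retry settings are in effect, you can look at the consistent-view properties in emrfs-site.xml (for example fs.s3.consistent.retryCount, fs.s3.consistent.retryPeriodSeconds, fs.s3.consistent.retryPolicyType and fs.s3.consistent.throwExceptionOnInconsistency). The following is a minimal sketch, not a definitive procedure. The function name show_consistency_settings is hypothetical, the default file path is an assumption about a typical EMR node, and the grep assumes each <value> line directly follows its <name> line.

```shell
#!/usr/bin/env bash
# Sketch: list the consistent-view properties set in an emrfs-site.xml file.
# show_consistency_settings is a hypothetical helper name.
# The default path below is an assumption; pass your own file as the first argument.
show_consistency_settings() {
  local conf="${1:-/usr/share/aws/emr/emrfs/conf/emrfs-site.xml}"
  # Print each matching <name> line together with the line that follows it,
  # which is assumed to hold the corresponding <value>.
  grep -A1 '<name>fs\.s3\.consistent' "$conf"
}
```

Usage: run show_consistency_settings on the master node, or pass the path to a copy of the file.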


Solution:


In most cases the consistency problem is caused by the retry logic in Spark and Hadoop. If a process fails while creating a file on S3 after the entry has already been written to DynamoDB, the metadata refers to an object that does not exist in S3. When Hadoop restarts the process, it finds the entry already present in DynamoDB and throws a consistency error.


To delete the S3 metadata stored in DynamoDB for objects that have already been removed:


emrfs delete s3://path


To import into DynamoDB the metadata for the objects that are physically present in S3:

 

emrfs import s3://path


To sync the metadata with the objects in S3:


emrfs sync s3://path


To check whether a particular object is present in both S3 and the metadata:


emrfs diff s3://path
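
The diff and sync commands above can be combined into a small wrapper that inspects a path first and then repairs it. This is only a sketch under stated assumptions: the names run, repair_emrfs_metadata and DRY_RUN are hypothetical, s3://path is the same placeholder used above, and the script defaults to printing the commands instead of running them so it can be reviewed safely.

```shell
#!/usr/bin/env bash
# Sketch: show the differences for one S3 path, then sync the EMRFS metadata.
# run, repair_emrfs_metadata and DRY_RUN are hypothetical names for illustration.

run() {
  # In dry-run mode (the default) only print the command; otherwise execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

repair_emrfs_metadata() {
  local s3_path="$1"
  if [ -z "$s3_path" ]; then
    echo "usage: repair_emrfs_metadata s3://path" >&2
    return 2
  fi
  # Step 1: report what differs between the metadata and S3.
  run emrfs diff "$s3_path"
  # Step 2: bring the metadata in line with what is actually in S3.
  run emrfs sync "$s3_path"
}
```

Usage: call repair_emrfs_metadata s3://path to preview the commands, then set DRY_RUN=0 on a node where the emrfs CLI is installed to execute them.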


Ref: http://docs.aws.amazon.com/emr/latest/ManagementGuide/emrfs-cli-reference.html