When running incremental ingestion jobs or pipeline jobs, you may see the following broadcast-join error in the failed Databricks job logs.
org.apache.spark.sql.execution.OutOfMemorySparkException: Size of broadcasted table far exceeds estimates and exceeds limit of spark.driver.maxResultSize=4294967296. You can disable broadcasts for this query using set spark.sql.autoBroadcastJoinThreshold=-1
As the exception message suggests, there are two options here: either increase the driver's maximum result size (spark.driver.maxResultSize) or disable broadcast joins (spark.sql.autoBroadcastJoinThreshold=-1).
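As a sketch, the two options can be expressed as Spark configuration properties. Both key names come from the exception message above; the size value below is an illustrative example, not a recommendation, and should be tuned to your workload and driver memory.

```
# Option 1: raise the driver's result-size cap above the current 4g limit
# (example value only; must fit within the driver's available memory)
spark.driver.maxResultSize 8g

# Option 2: disable broadcast joins entirely, as suggested in the exception
spark.sql.autoBroadcastJoinThreshold -1
```

Disabling broadcast joins forces Spark to fall back to a shuffle-based join (such as a sort-merge join), which avoids the driver-side memory pressure at the cost of additional shuffle I/O.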
You can set the advanced configurations below at the table level or at the pipeline level.
For ingestion tables,
Applicable Infoworks Data Foundry Versions