Please note that the recommendations below are based solely on our own testing of the GCP-provided Spark BigQuery connector. We encourage you to contact GCP support for their official recommendation on load options for the connector; they can provide further guidance on choosing the best loading approach for your specific use cases.
After running several internal tests with both the Direct load and Indirect load options, we observed the following performance results:
1. For datasets smaller than 10 GB, the Direct load option loaded data approximately 15% faster than the Indirect load option.
2. For datasets larger than 10 GB, the Indirect load option loaded data approximately 40% faster than the Direct load option.
Based on these findings, we recommend the Direct load option for smaller tables with data sizes below 10 GB, where it produced the faster loads in our tests.
Conversely, for larger tables exceeding 10 GB, we recommend the Indirect load option, which loaded data roughly 40% faster in our tests and is therefore the preferred choice for these cases.
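The 10 GB rule of thumb can be expressed as a small helper that sets the connector's `writeMethod` option. In the Spark BigQuery connector, `writeMethod=direct` writes through the BigQuery Storage Write API, while `writeMethod=indirect` stages data in GCS and then runs a BigQuery load job, which needs a staging bucket (`temporaryGcsBucket`). The helper names, the reading of "10 GB" as 10 GiB, and the bucket and table names in the usage comment are hypothetical, and estimating the dataset size is left to the caller. This is a sketch under those assumptions, not an official GCP recommendation.

```python
from typing import Dict, Optional

# Assumption: "10 GB" from our tests read as 10 GiB; adjust if you use decimal GB.
SIZE_THRESHOLD_BYTES = 10 * 1024**3


def choose_write_method(estimated_size_bytes: int) -> str:
    """Hypothetical helper: "direct" below the threshold, "indirect" otherwise."""
    if estimated_size_bytes < 0:
        raise ValueError("estimated_size_bytes must be non-negative")
    return "direct" if estimated_size_bytes < SIZE_THRESHOLD_BYTES else "indirect"


def bigquery_write_options(
    estimated_size_bytes: int, temporary_gcs_bucket: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical helper: build writer options for the Spark BigQuery connector."""
    method = choose_write_method(estimated_size_bytes)
    opts = {"writeMethod": method}
    # The indirect path stages data in GCS before the load job. If no bucket is
    # passed here, it must be configured elsewhere (e.g. in the Spark session config).
    if method == "indirect" and temporary_gcs_bucket is not None:
        opts["temporaryGcsBucket"] = temporary_gcs_bucket
    return opts


# Usage with PySpark (hypothetical bucket and table names):
# opts = bigquery_write_options(estimated_size, "my-staging-bucket")
# df.write.format("bigquery").options(**opts).mode("append").save("my_dataset.my_table")
```

The boundary case of exactly 10 GB was not covered by our tests; the helper sends it to the Indirect path, which is an arbitrary choice.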