Discover how adjusting the `max_parallel_workers_per_gather` setting in PostgreSQL affects query performance and why it may lead to slower execution times in some scenarios.
---
This video is based on the question https://stackoverflow.com/q/73041265/ asked by the user 'Dyaksa Hanindito' ( https://stackoverflow.com/u/12209343/ ) and on the answer https://stackoverflow.com/a/73044420/ provided by the user 'jjanes' ( https://stackoverflow.com/u/1721239/ ) on the Stack Overflow website. Thanks to these great users and the Stack Exchange community for their contributions.
Visit these links for the original content and further details, such as alternate solutions, the latest updates on the topic, comments, and revision history. For example, the original title of the Question was: Postgresql - is setting max_parallel_workers_per_gather higher could even make query slower?
Also, content (except music) is licensed under CC BY-SA https://meta.stackexchange.com/help/l...
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license.
If anything seems off to you, please feel free to write me at vlogize [AT] gmail [DOT] com.
---
The Curious Case of max_parallel_workers_per_gather in PostgreSQL
When it comes to optimizing queries in PostgreSQL, one might think that allowing more workers to process a query concurrently will always yield better performance. However, as one user discovered, increasing the max_parallel_workers_per_gather setting can sometimes lead to slower execution times. In this post, we will explore why this happens and how it demonstrates the complexities of query optimization in relational databases.
Understanding the Problem
Imagine you are working on a PostgreSQL query that requires significant processing power. While testing, you decide to modify the max_parallel_workers_per_gather parameter, which sets the maximum number of parallel worker processes that a single Gather or Gather Merge node in the query plan can use (the number actually launched is further capped by max_parallel_workers and max_worker_processes). With the parameter set to 2, you observe a relatively quick execution time of 3.5 seconds.
However, when you increase it to 8, expecting better performance, the execution time balloons to 15 seconds. This unexpected slowdown raises a critical question:
Is it possible that lowering max_parallel_workers_per_gather may actually improve query speed?
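Because max_parallel_workers_per_gather is only an upper bound, the number of workers a query actually gets can be smaller. The Python sketch below is a simplified model, loosely based on the table-size heuristic in the PostgreSQL planner's source (compute_parallel_worker()), of how many workers a parallel sequential scan might plan and then launch. It is an assumption-laden illustration: it ignores index sizes, the parallel_workers storage parameter, and other inputs the real planner uses, and the workers_already_busy parameter is a hypothetical stand-in for workers held by other sessions.

```python
def planned_workers(table_pages: int,
                    max_parallel_workers_per_gather: int = 2,
                    min_parallel_table_scan_size_pages: int = 1024) -> int:
    """Workers the planner might request for a parallel scan of one table.

    Simplified model of PostgreSQL's table-size heuristic (not the real code):
    tables smaller than min_parallel_table_scan_size (8MB, i.e. 1024 pages
    of 8kB, by default) get no workers, one worker is planned at that size,
    and each tripling of the table adds one more.
    """
    threshold = max(min_parallel_table_scan_size_pages, 1)
    if table_pages < threshold:
        return 0  # too small to be worth a parallel scan
    workers = 1
    while table_pages >= threshold * 3:
        workers += 1
        threshold *= 3
    # The per-Gather setting only caps the heuristic; it never raises it.
    return min(workers, max_parallel_workers_per_gather)


def launched_workers(planned: int,
                     max_parallel_workers: int = 8,
                     workers_already_busy: int = 0) -> int:
    """Workers actually started at execution time (simplified).

    Launches are limited by the cluster-wide max_parallel_workers pool
    (itself bounded by max_worker_processes). `workers_already_busy` is a
    hypothetical stand-in for workers other sessions are holding.
    """
    available = max(max_parallel_workers - workers_already_busy, 0)
    return min(planned, available)


if __name__ == "__main__":
    for setting in (2, 8):
        planned = planned_workers(100_000, setting)  # roughly 800 MB table
        print(setting, planned, launched_workers(planned))
```

Under this model, a 100,000-page table (roughly 800 MB) gets 2 planned workers with the setting at 2 but 5 with the setting at 8, so raising the limit can substantially change the plan the optimizer costs out, even though no single query is guaranteed to use all 8.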
The Answer: Yes, It Can Happen
It can. Several factors determine whether parallel execution speeds a query up or slows it down. Below are the primary considerations:
1. I/O Capacity and Parallel Execution
When you increase max_parallel_workers_per_gather, you allow PostgreSQL to use more processes, and therefore more concurrent I/O, for a single query. The setting does not tell PostgreSQL whether your storage can actually sustain that load. In practice:
If your system lacks sufficient I/O capacity, the extra workers end up contending with each other, and with other sessions, for the same disks.
Raising the limit can also change which plan the planner picks, because parallel plans look cheaper when more workers are allowed. If the new plan relies heavily on random I/O, that I/O can become the bottleneck and slow the query down.
In this specific case, the performance degradation when increasing the workers may be linked to the database system being unable to efficiently handle the additional load, especially if other queries or background processes are already using significant I/O resources.
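To see why more workers can hurt once storage is the bottleneck, here is a deliberately crude toy model in Python. It is not how PostgreSQL estimates cost: the CPU and I/O figures, the per-worker startup overhead, the number of concurrent readers the device handles well (device_streams), and the contention factor are all made-up assumptions chosen for illustration. The only PostgreSQL behavior it borrows is that, by default, the leader process also executes the parallel plan (parallel_leader_participation).

```python
def estimated_runtime(workers: int,
                      cpu_s: float,
                      io_s: float,
                      startup_s_per_worker: float = 0.02,
                      device_streams: int = 2,
                      contention: float = 0.15) -> float:
    """Toy estimate of query runtime in seconds; all parameters are hypothetical."""
    if workers < 0:
        raise ValueError("workers must be non-negative")
    processes = workers + 1  # the leader also runs the plan by default
    cpu_time = cpu_s / processes  # optimistic: CPU work splits evenly
    # I/O does not get faster with more readers. Past `device_streams`
    # concurrent readers, each extra one is assumed to add `contention`
    # (15% by default) to the total I/O time.
    extra_streams = max(processes - device_streams, 0)
    io_time = io_s * (1 + contention * extra_streams)
    # The slower resource dominates, and launching workers has a cost too.
    return max(cpu_time, io_time) + startup_s_per_worker * workers


def best_worker_count(max_workers: int, cpu_s: float, io_s: float) -> int:
    """Worker count in 0..max_workers with the lowest toy runtime."""
    return min(range(max_workers + 1),
               key=lambda w: estimated_runtime(w, cpu_s, io_s))


if __name__ == "__main__":
    for w in (0, 2, 8):
        print(w, round(estimated_runtime(w, cpu_s=6.0, io_s=1.5), 3))
    print("best:", best_worker_count(8, cpu_s=6.0, io_s=1.5))
```

With these invented inputs, 8 workers come out slower than 2 because the extra readers inflate I/O time faster than they save CPU time. The numbers only illustrate the shape of the trade-off; they are not a model of the 3.5 s versus 15 s result in the question.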
2. Query Execution Plans Matter
The execution plan determines how the database engine fetches and processes data. In your scenario, the plan chosen with 2 workers was efficient. With 8 workers, however, the planner switched to a different plan in which an Index Scan node ran more than 2 million loops, meaning it was executed over 2 million separate times. A plan change this drastic can have a large impact on performance:
Understanding Execution Plans: When testing, compare the execution plans (for example, the EXPLAIN ANALYZE output) produced under each setting. PostgreSQL may choose an entirely different execution path depending on how many workers it is allowed to request.
Cost of Random Access: Each of those loops is a separate index lookup, so the jump in loops suggests the query was performing many random reads rather than sequential reads, which are typically more efficient.
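When comparing plans, keep in mind that in EXPLAIN ANALYZE output the actual time and rows figures for a node are averages per execution; the PostgreSQL documentation says to multiply by the loops value to get the total time spent in the node. The small Python helper below does that for one line of text-format EXPLAIN ANALYZE output. The plan line in the usage example is invented for illustration (the question's actual plan is not reproduced here). In parallel plans, loops also counts each participating process, so the product is total work across processes rather than wall-clock time.

```python
import re

# Matches the "(actual time=START..END rows=N loops=L)" part of a
# text-format EXPLAIN ANALYZE line. Newer PostgreSQL versions may print
# rows with decimals, so rows is parsed as a float.
ACTUAL_RE = re.compile(
    r"actual time=(?P<start>\d+(?:\.\d+)?)\.\.(?P<end>\d+(?:\.\d+)?)"
    r" rows=(?P<rows>\d+(?:\.\d+)?) loops=(?P<loops>\d+)"
)


def node_totals(plan_line: str) -> tuple[float, float]:
    """Return (total_ms, total_rows) for one plan node, summed over all loops."""
    match = ACTUAL_RE.search(plan_line)
    if match is None:
        raise ValueError("line has no 'actual time=... loops=...' section")
    per_loop_ms = float(match.group("end"))     # average time per execution
    per_loop_rows = float(match.group("rows"))  # average rows per execution
    loops = int(match.group("loops"))
    return per_loop_ms * loops, per_loop_rows * loops


if __name__ == "__main__":
    # Hypothetical plan line: an inner index scan repeated two million times.
    line = ("->  Index Scan using items_pkey on items "
            "(cost=0.43..8.45 rows=1 width=16) "
            "(actual time=0.004..0.005 rows=1 loops=2000000)")
    total_ms, total_rows = node_totals(line)
    print(f"{total_ms / 1000:.1f} s across {total_rows:,.0f} rows")
```

A lookup that costs only a few microseconds per loop still adds up to about 10 seconds when it runs two million times, which is why a high loops count deserves a close look.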
3. Testing and Cache Considerations
In your tests, you cleared the cache between runs, allowing for a fair comparison. Note that PostgreSQL has no query-result cache: what matters is whether the table and index pages are already in PostgreSQL's shared buffers or the operating system's page cache. Clearing those is an important step toward clean tests, but query performance usually improves once data is cached, so each execution can behave differently depending on what is already in memory.
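Because caching changes timings so much, it can help to record the first (colder) run separately from repeated warm runs instead of averaging them together. Below is a minimal Python timing harness under that assumption. run_query is a hypothetical callable you would supply, for example one that runs SET max_parallel_workers_per_gather = 2 and then the query through your database driver; the harness itself does not clear any cache.

```python
import statistics
import time
from typing import Callable


def time_runs(run_query: Callable[[], object], warm_repeats: int = 5) -> dict[str, float]:
    """Time one first run plus `warm_repeats` follow-up runs of `run_query`.

    The first run usually pays for reading data into shared buffers and the
    OS page cache; later runs mostly reuse it, so they are reported separately.
    """
    if warm_repeats < 1:
        raise ValueError("warm_repeats must be at least 1")
    timings = []
    for _ in range(warm_repeats + 1):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {
        "first_run_s": timings[0],
        "warm_median_s": statistics.median(timings[1:]),
    }


if __name__ == "__main__":
    calls = []

    def fake_query() -> None:
        # Stand-in for a real query: slow on the first call only.
        time.sleep(0.05 if not calls else 0.005)
        calls.append(1)

    print(time_runs(fake_query, warm_repeats=3))
```

Running the same harness once per setting (2 and 8 workers, say) gives a cold and a warm figure for each, which makes it clearer whether a slowdown comes from the plan itself or from where the data happened to be.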
Conclusion: Finding the Right Balance
The case you encountered highlights an essential lesson in database query optimization: more is not always better. When configuring PostgreSQL parameters like max_parallel_workers_per_gather, it's critical to balance resource availability against the expected workload.