Optimize Black's Multiprocessing For Speed
Hey guys! Today, I want to chat about a cool optimization trick that could seriously boost the performance of Black, the uncompromising Python code formatter, when it's running in multiprocessing mode. This isn't just some random idea; it's based on a real-world observation and a bit of experimentation. So, buckle up, and let's dive into how we can make Black even faster!
The Idea: Schedule Slow Things First
So, the core concept here revolves around smartly scheduling files for Black to format. Imagine you've got a bunch of files, some small and some massive. If you schedule the big, slow files last, you might end up with a situation where your cores are mostly idle while waiting for that one behemoth to finish. This is a classic case of inefficient bin packing.
The main idea is simple: prioritize the "slower" tasks first. By tackling the larger files upfront, the scheduler has a better chance to efficiently distribute the smaller files later on. This is essentially the classic "longest processing time first" (LPT) heuristic from scheduling theory, and it can significantly reduce overall wall-clock time, especially when dealing with a mix of file sizes.
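To see why this helps, here's a tiny simulation (not part of Black at all, just a toy greedy scheduler with made-up task durations) comparing the two orderings:

```python
import heapq

def makespan(durations, workers):
    """Total wall-clock time when a greedy scheduler hands each task
    to whichever worker frees up first."""
    finish = [0.0] * workers  # min-heap of worker finish times
    heapq.heapify(finish)
    for d in durations:
        heapq.heappush(finish, heapq.heappop(finish) + d)
    return max(finish)

# Made-up formatting times: twenty 1-second files and one 10-second behemoth.
tasks = [1.0] * 20 + [10.0]

print(makespan(tasks, workers=4))                        # 15.0 (big file last)
print(makespan(sorted(tasks, reverse=True), workers=4))  # 10.0 (big file first)
```

With the big file scheduled last, three of the four cores sit idle for its entire 10-second run; scheduled first, the small files pack in neatly around it.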
I initially stumbled upon this potential improvement while investigating a report from the FreeCAD project. They were experiencing slow execution times with Black in their pre-commit.ci setup. After digging around, I pinpointed two key factors: the use of a pre-compiled version of Black (which is now being rolled out automatically) and the presence of several very large files that were hogging most of the processing time. It became clear that optimizing how Black handles these large files could lead to substantial performance gains.
Experimenting with Black and File Scheduling
To test this idea, I ran an experiment using Black on a virtual machine with 12 cores. The goal was to see if prioritizing larger files could indeed improve performance. I used a specific commit of FreeCAD as the codebase and applied a small diff to .pre-commit-config.yaml to isolate the Black execution.
Here's the diff I used:
```diff
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 101f719e..8e277c8d 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -44,7 +44,6 @@
 src/Gui/3Dconnexion/navlib|
 src/Gui/QSint|
 src/Gui/Quarter|
- src/Mod/Fem/femexamples|
 src/Mod/Import/App/SCL|
 src/Mod/Import/App/SCL_output|
 src/Mod/Mesh/App/TestData|
@@ -70,3 +69,13 @@
 rev: 719856d56a62953b8d2839fb9e851f25c3cfeef8  # frozen: v21.1.2
 hooks:
 - id: clang-format
+- repo: local
+  hooks:
+  - id: black-local
+    name: black
+    description: "Black: The uncompromising Python code formatter"
+    entry: black --line-length 100
+    language: system
+    minimum_pre_commit_version: 2.9.2
+    require_serial: true
+    types_or: [python, pyi]
```
And here's the shell script I used to run the tests:
```bash
#!/usr/bin/env bash
set -euxo pipefail
pip uninstall -yqq black
pip install -qq /tmp/black --no-deps
rm -rf ~/.cache/black
pre-commit run black-local --all-files --verbose
```
Important Notes: I used a non-precompiled version of Black for this experiment. While I suspect the results would be similar with the pre-compiled version, the performance difference might be less pronounced. Also, I didn't cherry-pick a particularly terrible set of files. If I wanted a worst-case scenario, I could have intentionally scheduled the slowest files last.
Baseline Performance
First, I established a baseline by running Black without any modifications. After a few runs, I got a representative time:
```console
$ bash t.sh
+ pip uninstall -yqq black
+ pip install -qq /tmp/black --no-deps
+ rm -rf /home/asottile/.cache/black
+ pre-commit run black-local --all-files --verbose
black....................................................................Passed
- hook id: black-local
- duration: 32.82s

All done! ✨ 🍰 ✨
1002 files left unchanged.
```
As you can see, the baseline execution time was around 32.82 seconds.
Applying the Optimization
Next, I applied a patch to Black that sorts the files by size before scheduling them for formatting. The idea is that larger files are likely to take longer, so we want to process them first.
```diff
diff --git a/src/black/concurrency.py b/src/black/concurrency.py
index f6a2b8a..fd8db5f 100644
--- a/src/black/concurrency.py
+++ b/src/black/concurrency.py
@@ -164,7 +164,7 @@
                 executor, format_file_in_place, src, fast, mode, write_back, lock
             )
         ): src
-        for src in sorted(sources)
+        for src in sorted(sources, key=lambda f: (-f.stat().st_size, f))
     }
     pending = tasks.keys()
     try:
```
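If you're curious how that sort key behaves, here's a quick standalone check (the file names and sizes are made up for illustration):

```python
import tempfile
from pathlib import Path

# Create some hypothetical files of different sizes in a temp directory.
tmp = Path(tempfile.mkdtemp())
for name, size in {"small.py": 10, "big.py": 5000, "medium.py": 300}.items():
    (tmp / name).write_bytes(b"#" * size)

# The patched sort key: largest first, with the path as a stable tie-breaker.
ordered = sorted(tmp.iterdir(), key=lambda f: (-f.stat().st_size, f))
print([p.name for p in ordered])  # ['big.py', 'medium.py', 'small.py']
```

Negating `st_size` sorts descending by size while keeping the secondary path sort ascending, so the ordering stays deterministic even when two files are the same size.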
With this patch in place, I ran the same test again:
```console
$ bash t.sh
+ pip uninstall -yqq black
+ pip install -qq /tmp/black --no-deps
+ rm -rf /home/asottile/.cache/black
+ pre-commit run black-local --all-files --verbose
black....................................................................Passed
- hook id: black-local
- duration: 26.94s

All done! ✨ 🍰 ✨
1002 files left unchanged.
```
The result? A significant improvement! The execution time dropped to around 26.94 seconds. While this is just anecdotal evidence from a few runs, it strongly suggests that there's potential for optimization here. It is important to remember that more rigorous benchmarking is necessary to validate these findings.
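For a rough sense of scale, the two measurements work out to about an 18% improvement:

```python
baseline, optimized = 32.82, 26.94  # seconds, from the two runs above
saved = baseline - optimized
print(f"{saved:.2f}s saved, {saved / baseline:.1%} faster")
# 5.88s saved, 17.9% faster
```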
Potential and Further Exploration
So, what does this all mean? It means that by intelligently scheduling files for Black, we can potentially squeeze out some extra performance. Prioritizing larger files, which are likely to be slower, allows the scheduler to better pack the smaller files later on, leading to more efficient core utilization.
Of course, this is just a starting point. There are many other factors that could influence Black's performance, and further exploration is needed. For example:
- More sophisticated scheduling algorithms: Instead of simply sorting by file size, we could use more advanced algorithms that take into account file complexity, code structure, or even historical formatting times.
- Dynamic scheduling: We could dynamically adjust the scheduling based on the current load on the cores. If a core is idle, we could immediately assign it the next largest file.
- Integration with pre-commit.ci: By incorporating this optimization directly into pre-commit.ci, we could potentially improve the performance of Black across a wide range of projects.
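As a sketch of that first idea, here's what a sort key based on historical formatting times might look like. Everything here is hypothetical: Black has no such timing cache today, and `load_history` and `schedule` are names I made up for illustration.

```python
import json
from pathlib import Path

def load_history(cache: Path) -> dict:
    """Read a hypothetical {path: seconds} timing cache; empty if missing."""
    try:
        return json.loads(cache.read_text())
    except (OSError, ValueError):
        return {}

def schedule(sources, history):
    """Order files slowest-first: known timings beat size estimates."""
    def key(f: Path):
        seen = history.get(str(f))
        if seen is not None:
            return (0, -seen, str(f))          # previously timed files first
        return (1, -f.stat().st_size, str(f))  # fall back to size as a proxy
    return sorted(sources, key=key)
```

The tuple key ranks files with recorded timings ahead of unknown ones, then sorts each group slowest-first, which degrades gracefully to the size heuristic on a cold cache.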
In conclusion, optimizing Black's multiprocessing capabilities by scheduling larger files first shows promise for improving performance. While further research and testing are needed, this simple change could lead to significant time savings, especially for projects with a mix of large and small files. Keep an eye on the Black project for future updates and potential implementations of this optimization!
For more information on Black and its capabilities, check out the official Black documentation on Read the Docs. This will give you a deeper understanding of how Black works and how you can use it to improve your Python code.