feat: CodSpeed Benchmarks #4243
base: main
Conversation
CODSPEED_FORCE_OPTIMIZATION: true
…bundled with vite
```js
export default defineConfig({
  plugins: [codspeedPlugin()],
  // ...
});
```
Basic config for now; it will need adjustment if we decide to do testing with Vitest.
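For reference, a minimal sketch of a benchmark file this config would pick up when running `vitest bench` (the file name and the benchmarked call are illustrative, not part of this PR):

```ts
// src/__benchmarks__/parser.bench.ts (hypothetical file)
import { bench, describe } from 'vitest';

import { parse } from '../language/parser';

describe('parser', () => {
  // With the CodSpeed plugin active, each `bench` becomes one
  // instrumented measurement reported on the PR.
  bench('parse a small query', () => {
    parse('{ hero { name friends { name } } }');
  });
});
```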
.github/workflows/ci.yml
Outdated
```sh
IGNORED_FILES_UNPROCESSED=$(git ls-files --cached --ignored --exclude-from=all.gitignore)
IGNORED_FILES=$(grep -v -F "patches/@codspeed+core+3.1.0.patch" <<< "$IGNORED_FILES_UNPROCESSED" || true)

echo "IGNORED_FILES: $IGNORED_FILES"
```
We can revert this as soon as CodSpeed no longer requires the patch.
patches/@codspeed+core+3.1.0.patch
Outdated
```diff
diff --git a/node_modules/@codspeed/core/dist/index.cjs.js b/node_modules/@codspeed/core/dist/index.cjs.js
index 1c40cda..4a5d588 100644
--- a/node_modules/@codspeed/core/dist/index.cjs.js
+++ b/node_modules/@codspeed/core/dist/index.cjs.js
@@ -26,7 +26,10 @@ const getV8Flags = () => {
     "--no-opt",
     "--predictable",
     "--predictable-gc-schedule",
-    "--interpreted-frames-native-stack"
+    "--interpreted-frames-native-stack",
+    // "--jitless",
+    '--no-concurrent-sweeping',
+    '--max-old-space-size=4096',
   ];
   if (nodeVersionMajor < 18) {
     flags.push("--no-randomize-hashes");
```
As recommended by the CodSpeed maintainers.
For the existing benchmark suite, we use the following Node options:

```js
'--predictable',
'--no-concurrent-sweeping',
'--no-minor-gc-task',
'--min-semi-space-size=1024', // 1GB
'--max-semi-space-size=1024', // 1GB
'--trace-gc', // no GC calls should happen during the benchmark, so trace them
```

Do we want to use similar flags? Are they equivalent? Asking from a place of ignorance here.
Tiny nit on uniform quoting for the options (double quotes vs. single quotes), just because I can't help myself. :)
Completely missed this comment. For now, I'd prefer to keep the current options, as my tests have shown an acceptable level of variance between runs with them. I have, however, sent these options to the CodSpeed maintainers, and they're testing whether they see any improvements. As mentioned below, since CodSpeed and our benchmark differ in terms of instrumentation, we may also need other flags.
And regarding the quotes, since this is a patch applied to codspeed-core, I tried to align with the code style in that folder.
```
- src/__benchmarks__/github-schema.json
- src/__benchmarks__/github-schema.graphql
```
We could merge these two with the existing benchmark resource files into a shared folder.
```diff
@@ -51,14 +51,17 @@
     "build:deno": "node --loader ts-node/esm resources/build-deno.ts",
     "diff:npm": "node --loader ts-node/esm resources/diff-npm-package.ts",
     "gitpublish:npm": "bash ./resources/gitpublish.sh npm npmDist",
-    "gitpublish:deno": "bash ./resources/gitpublish.sh deno denoDist"
+    "gitpublish:deno": "bash ./resources/gitpublish.sh deno denoDist",
+    "postinstall": "patch-package"
```
For patching CodSpeed with the added stability flags.
Thanks so much for working on this, @erikwrede !!!
Just taking a quick look, this is my biggest point of concern. Is there anything strange about our benchmarks that leads to unusual variance? Is there any suggestion that we might eventually be able to migrate all benchmarks to CodSpeed? It would be great to be able to deprecate the old benchmarks entirely with a solution that provides similar coverage.
❌ Deploy Preview for compassionate-pike-271cb3 failed.
I can fully relate to your concerns. While I'd also love to see CodSpeed become the only benchmarking solution long-term, the non-deterministic nature of the JIT can cause performance differences in some of the benchmarks: https://codspeed.io/erikwrede/graphql-js/benchmarks

For now, I'd suggest keeping all benchmarks and just ignoring the unstable cases. With CodSpeed, we can freely choose our regression threshold, but too many false positives for regressions or improvements will certainly degrade the experience. Once we see an improvement in stability for the ignored benchmarks, we can re-evaluate.

My take: let's try it. If it proves unreliable, even for the benchmarks we thought were stable, we can always remove it again and fall back to our other benchmarking solution.
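One way to ignore the unstable cases without deleting them, assuming we stay on Vitest's bench API, is its built-in skip modifier (a sketch; the benchmark names are made up):

```ts
import { bench } from 'vitest';

// Stable under CodSpeed's instrumentation: measured on every PR.
bench('stable case', () => {
  // ...benchmark body...
});

// Too JIT/GC-sensitive for now: kept in the file, but excluded
// from CodSpeed runs until stability improves.
bench.skip('unstable case', () => {
  // ...benchmark body...
});
```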
I guess I'm not totally understanding the big picture here. From a place of ignorance, I will try to ask some more specific questions:
In my head, I am comparing this proposed solution to setting up a non-shared, privately hosted GitHub Actions runner with a dedicated CPU at a cost of about $20 a month, and trying to understand the differences. Does our current benchmarking solution have the same variance problem for some of the benchmarks between runs, but get around it by always rerunning the pre-change and post-change benchmarks right away? As you might be able to tell from the above, I am a bit uncertain as to the trade-offs here; feel free to enlighten as much of my ignorance as you can with the time you have available. @JoviDeCroock might also have some thoughts, of course!
(cherry picked from commit 57d1f7f4bbcf55b22758f516911e335528b42cc6)
(cherry picked from commit 9eacdea7e2ddfc10eb9d17d026bc1d8fd1a3dc59)
First of all, let me say I don't consider your questions as coming from a place of ignorance, but rather from a desire for rigor, which is what a project like this needs ;) Let me reiterate the rationale behind suggesting CodSpeed and why I use it for other open-source projects:
I consider CodSpeed to be a "linter" for performance. It will catch regressions you might not think about when making code changes. And when things improve, it will give you a satisfying comment about the improvements. Continuously seeing that pushes you to think of other cases to benchmark and to be performance-conscious. The USP of CodSpeed is its instrumentation, enabling it to run on GitHub Actions runners or just about any hardware while providing consistent results.

However, to ensure perfect code quality, more than linting is required; a more rigorous review would be best. In terms of benchmarking GraphQL-js, that is what our current DIY benchmarking solution is: it tests the built package and includes memory usage. For now, I don't see CodSpeed replacing it, but supplementing it.

Could we still host a $20/month Hetzner bare-metal machine and run our DIY script there? Sure! A custom solution will always best suit our purpose. However, to get the same benefit, we'd also need reporting, a dashboard, and a way to extract flame graphs. If we want to invest in this and build our own solution, I wouldn't oppose it.

So, now that we've got the use case straight, let's take a deeper look at performance:
CodSpeed's instrumentation excludes syscalls from its measurements; the DIY solution ignores runs with too many context switches. The remaining problems are mostly GC-related and JIT-related: we want our benchmarks to run on optimized hot code, and we don't want GC to interrupt them at different points in time.

So, please see this as a PoC and feel free to discuss the tradeoffs. I put this on the table as an option, and I'm curious to see what you think.
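For the hot-code concern specifically, one knob we have on the Vitest side (a sketch, not something this PR configures) is explicit warmup, so measured iterations only start after the JIT has had a chance to optimize:

```ts
import { bench } from 'vitest';

bench(
  'execute on optimized hot code',
  () => {
    // ...benchmark body...
  },
  {
    warmupIterations: 20, // run unmeasured first so the JIT can optimize
    iterations: 100, // then take the measured samples
  },
);
```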
I very much appreciate the depth of the response! From a maintenance perspective, if we are confident in the full suite of benchmarks with CodSpeed, it seems like a better long-term solution compared to maintaining our own self-hosted GitHub Actions runner.
# Conflicts: # package.json
Summary
As discussed in the Discord #wg channel, this is the prototype for CodSpeed benchmarks using Vitest.
Some of the benchmarks we currently have may not be suitable for CodSpeed's instrumentation and may still show variance in results. CodSpeed, for now, is just meant to supplement the fully-fledged benchmark suite: it prevents accidental regressions and gives a quick impact overview on each PR. We can always remove certain benchmarks from the CodSpeed suite and keep them in the more powerful main benchmark suite.
Additionally, the introduction of Vitest for benchmarking provides a path forward to using Vitest for the tests, too.
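To illustrate that path, a unit test under Vitest would look very close to our current Mocha style (a sketch; the assertion uses Vitest's bundled expect):

```ts
import { describe, expect, it } from 'vitest';

import { parse } from '../language/parser';

describe('parser', () => {
  it('parses a simple query without throwing', () => {
    expect(() => parse('{ field }')).not.toThrow();
  });
});
```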
A sample run of CodSpeed on my fork can be found here: erikwrede/graphql-js#3 (comment)
Changes in this PR
@types/chai: pinned because Vitest bundles it; no other way I'm aware of to fix this, unfortunately. No impact on development.

Administrative steps before merging