
feat: CodSpeed Benchmarks #4243

Open · wants to merge 24 commits into base: main

Conversation


@erikwrede erikwrede commented Oct 17, 2024

Summary

As discussed in the Discord #wg channel, this is the prototype for CodSpeed benchmarks using Vitest.

Some of the benchmarks we currently have may not be suitable for CodSpeed's instrumentation and may still show variance in results. For now, CodSpeed is just meant to supplement the fully-fledged benchmark suite, to prevent accidental regressions and give a quick impact overview on each PR. We can always remove individual benchmarks from the CodSpeed suite and keep them in the more powerful main benchmark suite.

Additionally, the introduction of Vitest for benchmarking provides a path forward to using Vitest for the tests, too.

A sample run of CodSpeed on my fork can be found here: erikwrede/graphql-js#3 (comment)

Changes in this PR

  • Add CodSpeed
  • Add Vitest
  • Remove @types/chai because Vitest bundles it; no other way I'm aware of to fix this, unfortunately. No impact on development.
  • Refactor all benchmarks as CodSpeed + Vitest benchmarks (see the sketch after this list)
  • Add GitHub workflows
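
For illustration, here is a minimal sketch of what one of the refactored CodSpeed + Vitest benchmark files could look like. The file name and query are made up, and inside the repo the benchmarks would import from the local source rather than the published graphql package:

import { bench, describe } from 'vitest';
import { parse } from 'graphql';

const queryText = `
  query {
    __schema {
      types {
        name
      }
    }
  }
`;

describe('parser', () => {
  // With the CodSpeed plugin enabled in vitest.config.ts, this bench is
  // measured via CodSpeed's instrumentation instead of wall-clock timing.
  bench('parse a small introspection-style query', () => {
    parse(queryText);
  });
});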

Administrative steps before merging

@erikwrede erikwrede requested a review from a team as a code owner October 17, 2024 16:24
@erikwrede erikwrede changed the base branch from 16.x.x to main October 17, 2024 16:24
Copy link

linux-foundation-easycla bot commented Oct 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

Comment on lines +4 to +7
export default defineConfig({
  plugins: [codspeedPlugin()],
  // ...
});
erikwrede (Author):

Basic config for now; it will need adjustment if we decide to also run the tests with Vitest.
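
For reference, a sketch of what the full vitest.config.ts around the quoted lines might look like, assuming the plugin is imported from @codspeed/vitest-plugin (the imports are not part of the quoted snippet):

import { defineConfig } from 'vitest/config';
import codspeedPlugin from '@codspeed/vitest-plugin';

export default defineConfig({
  // Only the CodSpeed plugin is wired up for now; test-related options would
  // be added here if the test suite moves to Vitest later.
  plugins: [codspeedPlugin()],
});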

Comment on lines 59 to 62
IGNORED_FILES_UNPROCESSED=$(git ls-files --cached --ignored --exclude-from=all.gitignore)
IGNORED_FILES=$(grep -v -F "patches/@codspeed+core+3.1.0.patch" <<< "$IGNORED_FILES_UNPROCESSED" || true)

echo "IGNORED_FILES: $IGNORED_FILES"
erikwrede (Author):

We can revert this as soon as CodSpeed no longer requires the patch.

Comment on lines 1 to 16
diff --git a/node_modules/@codspeed/core/dist/index.cjs.js b/node_modules/@codspeed/core/dist/index.cjs.js
index 1c40cda..4a5d588 100644
--- a/node_modules/@codspeed/core/dist/index.cjs.js
+++ b/node_modules/@codspeed/core/dist/index.cjs.js
@@ -26,7 +26,10 @@ const getV8Flags = () => {
     "--no-opt",
     "--predictable",
     "--predictable-gc-schedule",
-    "--interpreted-frames-native-stack"
+    "--interpreted-frames-native-stack",
+    // "--jitless",
+    '--no-concurrent-sweeping',
+    '--max-old-space-size=4096',
   ];
   if (nodeVersionMajor < 18) {
     flags.push("--no-randomize-hashes");
erikwrede (Author):

As recommended by the CodSpeed maintainers.

Contributor:

For the existing benchmark suite, we have the following Node options:

  '--predictable',
  '--no-concurrent-sweeping',
  '--no-minor-gc-task',
  '--min-semi-space-size=1024', // 1GB
  '--max-semi-space-size=1024', // 1GB
  '--trace-gc', // no gc calls should happen during benchmark, so trace them

Do we want to use similar flags? Are they equivalent? Asking from a place of ignorance here.

Tiny nit in terms of uniform quoting for the options, double quote vs single quote, just because I can't help myself. :)

erikwrede (Author):

Completely missed this comment. For now, I'd prefer to keep the current options, as my tests have shown an acceptable level of variance between runs with them. I have, however, sent these options to the CodSpeed maintainers, and they're testing whether they see any improvements. As mentioned below, since CodSpeed and our benchmark suite differ in terms of instrumentation, we may also need different flags.

erikwrede (Author):

And regarding the quotes: since this is a patch applied to @codspeed/core, I tried to align with the code style in that folder.

Comment on lines +11 to +12
- src/__benchmarks__/github-schema.json
- src/__benchmarks__/github-schema.graphql
erikwrede (Author):

We could merge these two with the existing benchmark resource files into a shared folder (see the loading sketch below).
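
For context, a hedged sketch of how a benchmark might load these resource files; the paths and the import from the published graphql package are assumptions, and in-repo code would use relative imports instead:

import { readFileSync } from 'node:fs';
import { buildSchema } from 'graphql';

// Load the GitHub schema SDL used by some of the benchmarks.
const sdl = readFileSync('src/__benchmarks__/github-schema.graphql', 'utf8');
const schema = buildSchema(sdl);

// Quick sanity check that the schema was built.
console.log(Object.keys(schema.getTypeMap()).length);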

@@ -51,14 +51,17 @@
     "build:deno": "node --loader ts-node/esm resources/build-deno.ts",
     "diff:npm": "node --loader ts-node/esm resources/diff-npm-package.ts",
     "gitpublish:npm": "bash ./resources/gitpublish.sh npm npmDist",
-    "gitpublish:deno": "bash ./resources/gitpublish.sh deno denoDist"
+    "gitpublish:deno": "bash ./resources/gitpublish.sh deno denoDist",
+    "postinstall": "patch-package"
erikwrede (Author):

For patching CodSpeed with added stability

@yaacovCR (Contributor):

Thanks so much for working on this, @erikwrede !!!

Some of the benchmarks we currently have may not be suitable for CodSpeed's instrumentation and may still show variance in results.

Just taking a quick look, this is my biggest point of concern. Is there anything strange about our benchmarks that leads to unusual variance? Is there any suggestion that we might be able to eventually migrate all benchmarks to CodSpeed? It would be great to be able to deprecate the old benchmarks entirely with a solution that provides similar coverage.


netlify bot commented Oct 18, 2024

Deploy Preview for compassionate-pike-271cb3 failed.

Latest commit: 0574c6b
Latest deploy log: https://app.netlify.com/sites/compassionate-pike-271cb3/deploys/67163557bdab330008e9b1cc

@erikwrede (Author):

I can fully relate to your concerns. While I'd also love to see CodSpeed become the only benchmarking setup long-term, the nondeterministic nature of the JIT can cause performance differences in some of the benchmarks:

https://codspeed.io/erikwrede/graphql-js/benchmarks

For now, I'd suggest keeping all benchmarks and simply ignoring the unstable cases. With CodSpeed, we can freely choose our regression threshold, but too many false positives for regressions or improvements will certainly degrade the experience. Once we see an improvement in stability over the ignored benchmarks, we can re-evaluate.

My take: let's try it. If it proves unreliable, even for the benchmarks we thought were stable, we can always remove it again and fall back to our other benchmarking solution.

@yaacovCR (Contributor):

For now, I'd suggest keeping all benchmarks and simply ignoring the unstable cases.
Once we see an improvement in stability over the ignored benchmarks, we can re-evaluate.

I guess I'm not totally understanding the big picture here. From a place of ignorance, I will try to ask some more specific questions:

  1. How do we tell from the codspeed UI which benches are unstable and should be ignored?
  2. When you write "once we see an improvement in stability over the ignored benchmarks", do you mean that the unstable benchmarks are expected to become stable? Or that we can write better benchmarks that are more stable? Or that the usefulness of the stable ones is so great that the fact that there are unstable ones is not going to bother us?

In my head, I am comparing this proposed solution to setting up a non-shared, privately hosted GitHub Actions runner with a dedicated CPU at a cost of about $20 a month, and trying to understand the differences.

Does our current benchmarking solution have the same variance problem for some of the benchmarks between runs, but get around this by always rerunning the pre-change and post-change right away?

As you might be able to tell from the above, I am a bit uncertain as to the trade-offs here => feel free to enlighten as much of my ignorance as you can with the time you have available; @JoviDeCroock might also have some thoughts, of course!

@erikwrede (Author):

First of all, let me say I don't consider your questions to come from a place of ignorance but rather from a desire for rigor, which is what a project like this needs ;)

Let me reiterate the rationale behind suggesting CodSpeed and why I use it for other open-source projects:

  • It always runs in your pipeline.
  • Brief reporting.
  • Comes with flame graphs -> quick overview of what changed.
  • Monitoring of what changed over time.

I consider CodSpeed to be a "linter" for performance. It will catch regressions you might not think about when making code changes, and when things improve, it will give you a satisfying comment about the improvement. Continuously seeing that pushes you to think of other cases to benchmark and to be performance-conscious. The USP of CodSpeed is its instrumentation, which enables it to run on GitHub Actions runners or just about any hardware while providing consistent results.

However, to ensure perfect code quality, more than linting is required; a more rigorous review would be best. In terms of benchmarking graphql-js, that is what our current DIY benchmarking solution provides: it tests the built package and also covers memory usage. For now, I don't see CodSpeed replacing it, but supplementing it.

Could we still host a $20/month Hetzner bare-metal machine and run our DIY script there? Sure! A custom solution will always best suit our purpose. However, to get the same benefits, we'd also need reporting, a dashboard, and a way to extract flame graphs. If we want to invest in this and build our own solution, I wouldn't oppose it.

So, now that we have the use case straight, let's take a deeper look at performance.
In Node benchmarking, a couple of factors affect benchmark consistency:

  • Garbage Collection
  • JIT
  • Syscalls, Context Switching

CodSpeed's instrumentation excludes syscalls from its measurements; the DIY solution ignores runs with too many context switches. The remaining problems are mostly GC- and JIT-related: we want our benchmarks to run on optimized hot code, and we don't want GC to interrupt them at different points in time.
I saw some variance between runs of the same codebase in earlier versions of this PR, but I made some changes to the CodSpeed patch today, and now everything seems stable. If, over time, a benchmark turns out to be flaky on CodSpeed, we can always exclude it from monitoring. Flaky benchmarks diminish the value of any solution, because they lead to ignoring the results in the long run. Whatever solution we choose should have measures in place to avoid flakiness and variance.

So, please see this as a PoC and feel free to discuss the tradeoffs. I put this on the table as an option, and I'm curious to see what you think.

@erikwrede (Author):

And to be very specific about your questions:

  1. How do we tell from the codspeed UI which benches are unstable and should be ignored?

By looking at changes in benchmark performance over time and seeing that a benchmark changed even though the corresponding code wasn't modified. For example, for this benchmark over multiple readme-only commits:
[screenshot: benchmark performance history across readme-only commits]
Ideally, we would want a near-straight line here.

  2. When you write "once we see an improvement in stability over the ignored benchmarks", do you mean that the unstable benchmarks are expected to become stable? Or that the usefulness of the stable ones is so great that the fact that there are unstable ones is not going to bother us?

CodSpeed for Node is still relatively new, hence my patches to it in this PR. Future changes could help bring down variance on GitHub Actions runners and make the tests stable. Reiterating my previous comment: I don't see any unexpected variance between commits anymore; it seems my newest patch fixed this problem.

Or that we can write better benchmarks that are more stable?

At the current moment, I don't see a way for us to write benchmarks that are more stable on CodSpeed. The ones where I saw variance are, as of now, less compatible with CodSpeed's instrumented approach.

Does our current benchmarking solution have the same variance problem for some of the benchmarks between runs, but get around this by always rerunning the pre-change and post-change right away?

Our current benchmarking solution takes a different approach to measurement, so it's not directly comparable. CodSpeed instruments the measurements, filtering out noise like context switches; we run the benchmark several times and drop any results that contain GC pauses or context switches.
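
To make the contrast concrete, here is an illustrative sketch of the rerun-and-filter idea; it is not the actual implementation in this repo, and all names are hypothetical:

import { performance } from 'node:perf_hooks';

interface Sample {
  durationMs: number;
  noisy: boolean; // e.g. GC ran, or too many context switches were observed
}

function measureOnce(run: () => void): Sample {
  const start = performance.now();
  run();
  const durationMs = performance.now() - start;
  // Hypothetical: a real harness would feed GC / context-switch detection in
  // here (for example from --trace-gc output or OS counters).
  return { durationMs, noisy: false };
}

function collectCleanSamples(run: () => void, wanted: number): number[] {
  const clean: number[] = [];
  while (clean.length < wanted) {
    const sample = measureOnce(run);
    if (!sample.noisy) {
      clean.push(sample.durationMs);
    }
  }
  return clean;
}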

@yaacovCR (Contributor) left a comment:

I very much appreciate the depth of the response! From a maintenance perspective, if we are confident in the full suite of benchmarks with CodSpeed, it seems like a better long-term solution than maintaining our own self-hosted runner.

package.json review comment: outdated, resolved.