Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse typechecking results - stage 1 #17898

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

psfinaki
Copy link
Member

@psfinaki psfinaki commented Oct 18, 2024

First part of the implementation of reusing typecheck results, as per the design doc.

This does the following:

  • adds the flag --reusetypecheckingresults, with all the hooking
  • detects if there is a case to reuse typechecking results

Nothing happens - yet - we only report related activities though.

Cache invalidation scenarios I looked at:

  • changed sources
  • changed references
  • changed command line

Copy link
Contributor

github-actions bot commented Oct 18, 2024

❗ Release notes required


✅ Found changes and release notes in following paths:

Warning

No PR link found in some release notes, please consider adding it.

Change path Release notes path Description
src/Compiler docs/release-notes/.FSharp.Compiler.Service/9.0.200.md No current pull request URL (#17898) found, please consider adding it

@psfinaki psfinaki changed the title Reuse typechecking results - stage 1 [WIP] Reuse typechecking results - stage 1 Oct 18, 2024
src/Compiler/Driver/fsc.fs Outdated Show resolved Hide resolved
@psfinaki psfinaki requested a review from T-Gro October 18, 2024 16:31
@psfinaki
Copy link
Member Author

@T-Gro PTAL, this is an early stage so I mostly just wanted to discuss the questions I raised in the comments, so that we go in the agreed direction.

@edgarfgp
Copy link
Contributor

It seems like this is going to rely/build on top of graph checking have couple questions:

@psfinaki
Copy link
Member Author

@edgarfgp so we'll need to discuss this - personally I'd like to turn graph-checking on. Now (like, before Xmas) is a rather good time - we can catch some bugs there before the next big release.

It's somewhat scaring but, well, just this probably shouldn't be stopping us 👀 Graph-checking and reusing typechecking results can test each other and I'd say we should use this benefit here.

I am not speaking for the whole team just yet - only sharing some thoughts. We will discuss and prioritize this.

@psfinaki psfinaki marked this pull request as ready for review November 6, 2024 14:00
@psfinaki psfinaki requested a review from a team as a code owner November 6, 2024 14:00
@psfinaki psfinaki changed the title [WIP] Reuse typechecking results - stage 1 Reuse typechecking results - stage 1 Nov 6, 2024
@psfinaki psfinaki force-pushed the reuse-typecheck-a branch 2 times, most recently from ce9c28e to 3275457 Compare November 8, 2024 10:18
Comment on lines +80 to +83
let getThisCompilationReferences =
List.map (fun (r: AssemblyReference) -> r.Text, getContentHash r.Text)
>> List.map (fun (name, hash) -> $"{name}: {hash}")
>> List.toArray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is A LOT of reading into memory, what's the memory consumption for the whole BCL and runtime assemblies? Can we check MVID and/or date? Also, in this case we gonna be reading them twice? One for reuse tc and in normal compilation. There should be some cache introduced for them if we'll have to read them twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do those things. Checking hash helps in the scenarios where one changes a dependency, reverts it immediately (ctrl + Z) and then triggers the recompilation. If we only look at the date, this will look like a new recompilation.

We can different processes for system and non-system dependencies, do you think it's worth it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just straight up use MVID for assemblies. If it's good for deterministic builds and msbuild, it should be good enough for us. This way you'll only be reading header, and not calculate anything.

Comment on lines +86 to +118
if FileSystem.DirectoryExistsShim tcDataFolderName then
use _ = Activity.start Activity.Events.reuseTcResultsCachePresent []

let prevCompilationCmdLine = getPrevCompilationCmdLine ()
let thisCompilationCmdLine = getThisCompilationCmdLine tcConfig.cmdLineArgs

if prevCompilationCmdLine = thisCompilationCmdLine then

let prevCompilationReferences = getPrevCompilationReferences ()
let thisCompilationReferences = getThisCompilationReferences tcConfig.referencedDLLs

if prevCompilationReferences = thisCompilationReferences then

let prevCompilationGraph = getPrevCompilationGraph ()
let thisCompilationGraph = getThisCompilationGraph inputs

if prevCompilationGraph = thisCompilationGraph then
use _ = Activity.start Activity.Events.reuseTcResultsCacheHit []

() // do nothing, yet
else
use _ = Activity.start Activity.Events.reuseTcResultsCacheMissed []

writeThisCompilationGraph thisCompilationGraph
else
use _ = Activity.start Activity.Events.reuseTcResultsCacheMissed []

writeThisCompilationReferences thisCompilationReferences
else
use _ = Activity.start Activity.Events.reuseTcResultsCacheMissed []

writeThisCompilationCmdLine thisCompilationCmdLine
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can cognitive load be reduced here somehow? Not to have a bunch of nested conditions?

let (++) a b = Path.Combine(a, b)

[<Sealed>]
type ReuseTcResultsDriver(tcConfig: TcConfig) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be named differently, something along the lines of CachingDriver.

Comment on lines +43 to +45
let writeThisCompilationCmdLine = writeThisTcData cmdLineFilePath
let writeThisCompilationGraph = writeThisTcData graphFilePath
let writeThisCompilationReferences = writeThisTcData referencesFilePath
Copy link
Member

@vzarytovskii vzarytovskii Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All writers should use buffer and flush it in the very end to avoid partial data being in the cached file.

Comment on lines +24 to +26
let cmdLineFilePath = tcDataFolderName ++ "cmdline"
let graphFilePath = tcDataFolderName ++ "graph"
let referencesFilePath = tcDataFolderName ++ "references"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think those should be separate files, to avoid partial and incorrect data, it should be as atomic as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my first idea, but than I thought - granularity adds some flexibility here: if one of those things changes, we won't need to rewrite all of them, saving some time and memory. Multiple files are also a bit easier to work with - there is no need to set up some file structure here (what follows what, empty lines and so on).

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It risks mixing things, I would put as we do with sigdata or assembly - all in one resource. It's fine rewriting all of them, it's not a lot of data, but it is easier to verify that all data is valid.

let graphFilePath = tcDataFolderName ++ "graph"
let referencesFilePath = tcDataFolderName ++ "references"

let writeThisTcData path content =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some sort of checksum mechanism should also be incorporated into the file, so it's immediately apparent that both format is correct as well as file is well written.

@psfinaki psfinaki force-pushed the reuse-typecheck-a branch 2 times, most recently from 478fde0 to 3f0d13c Compare November 12, 2024 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants