
RFC 2: Skeleton for ExecutionContext #15350

Open · wants to merge 6 commits into base: master

Conversation

ysbaddaden
Contributor

Integrates the skeleton as per RFC #0002.

  • Add the ExecutionContext module;
  • Add the ExecutionContext::Scheduler module;
  • Add the execution_context compile-time flag.

When the execution_context flag is set:

  • Don't load Crystal::Scheduler;
  • Plug ExecutionContext instead of Crystal::Scheduler in spawn, Fiber, ...

This is only the skeleton: there are no implementations (yet). Trying to compile anything with -Dexecution_context will obviously fail until the follow-up pull requests implementing at least the ST and/or MT contexts are merged.
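The flag-based switch described above could look roughly like this (a sketch: the require paths are assumptions, not necessarily the PR's actual file layout):

```crystal
# Hypothetical sketch of the conditional require wiring. When the
# execution_context flag is set, Crystal::Scheduler isn't loaded at all
# and ExecutionContext is plugged in instead.
{% if flag?(:execution_context) %}
  require "./execution_context"
{% else %}
  require "./crystal/scheduler"
{% end %}
```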

refs #15342

The current ST and MT schedulers use a distinct pool per thread, which
means we only need thread safety for execution contexts that will
share a single pool across a whole context.
The point is to avoid parallel enqueues while running the event loop, so
we get better control over where and how the runnable fibers are enqueued;
for example all at once instead of one by one (which may not be as
effective as it sounds).
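The "all at once" idea could be sketched as follows (all names here are hypothetical illustrations, not the PR's actual API):

```crystal
# Hypothetical: while the event loop runs, ready fibers are collected
# into a local buffer instead of being enqueued one by one; the whole
# batch is then handed to the scheduler in a single operation, so other
# threads never enqueue in parallel with the running event loop.
runnables = Array(Fiber).new

event_loop.run do |fiber|
  runnables << fiber             # local append, no synchronization
end

scheduler.enqueue_all(runnables) # one lock acquisition, one wakeup
```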

More importantly for Execution Contexts: it avoids parallel enqueues
while the event loop is running, which sometimes leads to confusing
behavior; for example, when deciding to wake up a scheduler/thread we
mustn't interrupt the event loop (obviously).

This is working correctly for the Polling (Epoll, Kqueue) and IOCP event
loop implementations. I'm less confident with the libevent one where the
external library executes arbitrary callbacks.
src/concurrent.cr (resolved)
src/crystal/system/unix/signal.cr (outdated, resolved)
src/crystal/system/thread.cr (outdated, resolved)

{% raise "ERROR: execution contexts require the `preview_mt` compilation flag" unless flag?(:preview_mt) %}

module ExecutionContext
Member

thought: I'm wondering:

  • Should this be in the top-level namespace or rather in Crystal::ExecutionContext?
  • Should this be a publicly documented module?

For comparison, Scheduler is in the Crystal namespace and nodoc.

Contributor Author

Being the module that holds all the execution contexts, and the public interface definition for all the implementations, I'd say it had better be documented?

Contributor Author

@ysbaddaden Jan 17, 2025

Or maybe its methods could be :nodoc: for starters, but the module itself must appear.

I think having both ExecutionContext and Crystal::ExecutionContext would just be terribly confusing.

Member

Even if documented, it could make some sense to move it into the Crystal namespace because it's a component of the Crystal runtime. Like Crystal::EventLoop.

Contributor Author

I understand the Crystal namespace is to keep internal runtime details, which the event loop currently is: none of its interface is public.

I'd move the global Thread to become Crystal::Thread for example, for the same reason.

But execution contexts have a public interface —at least constructors and #spawn— so taking the global ExecutionContext namespace makes sense to me.

Member

Crystal is not just for internal details. It can also serve for public details of the runtime.
I think it's generally a good idea to collect implicit types of the runtime in the Crystal namespace instead of spreading them out in the top level namespace. Of course it's up to interpretation which types should be considered for that.
Anyway, we may not be able to change this for existing types, but we have the option when introducing a new one and should make a good decision.

Contributor Author

I guess this is one of the first times we've introduced something that's really on the edge between private and public.

Member

I'm inclined to Crystal:: whether the api is public or not.

ExecutionContext is a generic name that might clash with users' code.

I agree with @straight-shoota that the Crystal namespace is not necessarily internal things.

Contributor Author

Alright, let's go for the Crystal namespace.

I'm starting to believe that we should have put things under the Fiber namespace; these are tools to handle fibers after all:

  • Fiber::Channel
  • Fiber::Mutex
  • Fiber::WaitGroup
  • Fiber::ExecutionContext

This is what we did with Thread::Mutex for example.

Member

Fiber:: works as well for me, it has the same benefits as using Crystal. And is clearly related to the runtime.

src/execution_context/scheduler.cr (outdated, resolved)
ysbaddaden and others added 3 commits January 17, 2025 14:09
We need this because we don't load crystal/scheduler anymore when
execution contexts are enabled.
Comment on lines +75 to +76
# that being said, we can still trust the *current_fiber* local variable
# (it's the only exception)
Member

question: Is this note relevant? Even current_fiber isn't used anywhere after the context swap.

Contributor Author

@ysbaddaden Jan 18, 2025

Yes, I believe so. It's a note to the future selves.

Explaining that we switched context and we can't trust the local variables, including self (local var) and ivars (accessed through self), is useful knowledge. Doubly so because I think it used to work: the same thread/scheduler kept resuming the same fibers, but that's no longer true.

It will prevent someone from using @dead_fiber instead of Thread.current.dead_fiber for example, or from trying to cache Thread.current or trusting #thread —yes, I fell for the first one at least 🤕

Member

That mostly concerns the whole comment. I'm particularly wondering about the second paragraph referring to current_fiber.

Comment on lines +61 to +63
{% unless flag?(:interpreted) %}
thread.dead_fiber = current_fiber if current_fiber.dead?
{% end %}
Member

issue: Having two checks for dead fiber cleanup on every single context swap seems like it might be a bit wasteful.
I reckon that in the vast majority of use cases involving any remotely significant amount of IO (or other frequent fiber-swap points), dead fibers appear only at a very low percentage of context swaps. But these checks impact the performance of every single one of them. They're relatively simple, but it still adds up. Fast context swaps are important for efficient concurrency.

There's also another effect: when there actually is a dead fiber to clean up, handling it during the context swap might delay the swap significantly (depending on the release procedure).

This could also cause unnecessary performance penalties. When a thread picks up a fiber, but before actually resuming it, it notices it has a dead fiber's stack to clean up and occupies itself with that. This delays executing the fiber because the thread has already reserved it, while another thread might be ready to run it already.

Overall, dead fiber cleanup shouldn't be time critical, really. The only critical property is that it must be delayed until after the fiber has really terminated (i.e. after we have swapped away from its stack).
So we could do it at opportune, controlled moments instead of checking for it on every stack swap. For example, when a thread is idle, or when spawning a new fiber and the stack pool is empty (maybe we can reclaim the stack of a dead fiber).

Member

I suppose the dead marking could perhaps be moved into the cleanup of Fiber#run so it really only happens when a fiber is dead, avoiding the check on every swapcontext.

Contributor Author

I can indeed move the before-swap check to Fiber#run.

As for the after-swap... I don't see alternatives 😢

  • a queue of dying fibers in the stack pool? we'd still have to set the fiber's state as truly dead after swapcontext;

  • GC finalizers? That would delay the recycling and freeing until the GC decides to collect, and stacks would end up in whatever stack pool (unless we share a single pool for the whole process). Each stack being 8MB of virtual memory, it quickly adds up and we can easily reach OOM.

Maybe we can optimize passing the dead fiber?

  • we can't use @dead_fiber as explained in the note below (a fiber may be resumed by any thread) 😢

  • use a @@dead_fiber thread local instead of having to resolve the @@current thread local on Thread and then check Thread#dead_fiber? 🤔

  • pass the dead fiber as a local variable through assembly... there could be a setcontext function that wouldn't save the current context (why bother?) but would instead set the dead fiber in some register, while swapcontext wouldn't bother (but would still need to tell where to set the dead fiber)... the after-swap check would then be a local compare/jump 🤔

NOTE: we want to recycle the stacks —I measured and it's much faster to avoid the mmap syscall on every spawn— and we'd still need to safely unmap the stacks anyway.

NOTE: Go doesn't have this issue because it allocates a 2KB stack on the GC heap and reallocates when needed; it eventually lets the GC collect it.
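The @@dead_fiber thread-local option above might be sketched like this (hypothetical names, and it assumes a @[ThreadLocal] class var is usable at this spot; the real swapcontext is assembly):

```crystal
# Hypothetical sketch of the thread-local alternative: the dying fiber
# registers itself right before the swap, and whichever fiber resumes on
# this thread releases the stack without resolving Thread.current first.
@[ThreadLocal]
@@dead_fiber : Fiber?

def self.swapcontext(current : Fiber, to : Fiber) : Nil
  @@dead_fiber = current if current.dead?
  # ... the actual assembly context swap happens here ...
  # past this point we may be running on another thread: only the
  # thread-local @@dead_fiber is trustworthy, not ivars nor `current`
  if fiber = @@dead_fiber
    @@dead_fiber = nil
    fiber.execution_context.stack_pool.release(fiber.@stack)
  end
end
```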


{% unless flag?(:interpreted) %}
if fiber = Thread.current.dead_fiber?
fiber.execution_context.stack_pool.release(fiber.@stack)
Contributor Author

Since a fiber can only be resumed by its owning execution context, and #stack_pool is shared by the context, we could merely call stack_pool.release(fiber.@stack) here 🤔

Status: Review

3 participants