wasm2c: update memory/table operations to use u64 + harmonize checks #2506

Open
wants to merge 3 commits into
base: main


keithw
Member

@keithw keithw commented Nov 11, 2024

The PR updates the bulk memory operations (memory.fill, memory.copy, table.fill, etc.) to support 64-bit addresses and counts, and standardizes on a 64-bit version of RANGE_CHECK everywhere.

Previously we were only taking u32's for these arguments, even with memory64 enabled. (I don't think the memory64 tests check the ability to use memory.copy or the other operations beyond the first 4 GiB of a memory -- I wonder if there would be a way to add this as an "intensive" test if people don't mind having to allocate >4 GiB to run the test.)

This is a stepping-stone to being able to mix software-bounds-checked i64 memories and "guard-page-checked" i32 memories in the same module (#2507) and supporting custom-page-sizes (#2508).
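To illustrate why u32 arguments are not enough once memory64 is enabled, here is a minimal sketch. The two function names are hypothetical stand-ins and not part of the wasm2c runtime; they only show what a callee receives when its parameter is 32 bits wide versus 64 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not the wasm2c runtime API. */

/* Old-style signature: the address parameter is only 32 bits wide. */
static uint64_t addr_seen_by_u32_api(uint32_t d) { return d; }

/* New-style signature: the full 64-bit address survives the call. */
static uint64_t addr_seen_by_u64_api(uint64_t d) { return d; }
```

With an address of 4 GiB + 100, the u32 version sees 100 after narrowing, so an operation meant for the upper part of the memory would silently land in the first 4 GiB.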

src/template/wasm2c.declarations.c (outdated)
return _addcarry_u64(0, a, b, resptr);
}
#endif

#define RANGE_CHECK(mem, offset, len) \
Collaborator

@shravanrn shravanrn Nov 11, 2024


While defaulting to the 64-bit RANGE_CHECK is fine for memcpy, tables, etc. (it's unlikely to affect performance there), my concern is that it will slow down accesses to 32-bit linear memories in bounds-checked wasm2c. Firefox uses bounds-checked wasm2c for Wasm on 32-bit devices, so it is sensitive to the performance of this path.

I don't know if this is addressed in a future PR, but this particular PR on its own would be a perf problem for the Firefox use case.

  • If you believe future PRs you are landing will give us the property "bounds checks on 32-bit memories are not slowed down", then I don't have any concerns (though I'd prefer landing this PR and the PR that fixes it in quick succession). I'll look through the other PRs next to see if this is resolved by them.

  • If you believe this is not addressed in future PRs, we may need to specialize the bounds check depending on the type of memory, which may in turn require specializing i32_load etc. on the type of memory.

  • An alternate approach would be to make the current PR about changing the RANGE_CHECK on the memory_fill style operations only, but leaving the RANGE_CHECKs on memory ops as is, i.e., it checks depending on SUPPORT_MEMORY64

Edit: I see that this might possibly be addressed in the next PR. If yes, please disregard the concern

Member Author


Thanks for these thoughtful (and well-taken) comments. I believe #2507 will nail this for you (by preserving the current RANGE_CHECK on 32-bit, default-page-size memories), so how about we wait to get alignment on both #2506 and #2507 and then land them at the same time?

I should say that even the current RANGE_CHECK uses 64-bit arithmetic:

#define RANGE_CHECK(mem, offset, len)               \
  if (UNLIKELY(offset + (uint64_t)len > mem->size)) \
    TRAP(OOB);

... but the difference is that RANGE_CHECK64 does an explicit check for 64-bit overflow. I wish I had the benchmarking infrastructure to promise you it won't affect performance on 32-bit x86 but... safer to wait for #2507 which lets you keep the same code.
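As a rough illustration of that difference, the sketch below contrasts a plain 64-bit add with an overflow-aware one. The helper names are hypothetical and this is not the PR's actual macro; it assumes GCC or Clang for `__builtin_add_overflow` (the diff context above suggests the MSVC path uses `_addcarry_u64` instead).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names; only the arithmetic mirrors the discussion. */

/* Plain 64-bit add, as in the existing RANGE_CHECK. This is safe when
   offset and len each fit in 32 bits, because the sum cannot wrap. */
static bool oob_plain(uint64_t offset, uint64_t len, uint64_t mem_size) {
  return offset + len > mem_size;
}

/* Overflow-aware variant: a sum that wraps past 2^64 counts as out of
   bounds. __builtin_add_overflow is a GCC/Clang builtin (assumption). */
static bool oob_checked(uint64_t offset, uint64_t len, uint64_t mem_size) {
  uint64_t end;
  if (__builtin_add_overflow(offset, len, &end)) {
    return true; /* offset + len wrapped around */
  }
  return end > mem_size;
}
```

With a one-page (65536-byte) memory, an offset of `UINT64_MAX` and a length of 2 wraps to 1 in the plain version and is wrongly treated as in bounds, while the checked version reports it as out of bounds. For operands that fit in 32 bits the two agree, which is why the explicit overflow test only matters once 64-bit offsets and counts are accepted.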

@shravanrn
Collaborator

shravanrn commented Nov 11, 2024

> I wonder if there would be a way to add this as an "intensive" test if people don't mind having to allocate >4 GiB to run the test.

Is the concern that the test-suite may not run on small machines?

One thought I had: we could run at least simple tests by relying on lazy memory allocation on Linux-like OSes. If tests are of a form such as the one below:

create wasm memory of 8 GB
memcpy 100 bytes at index 100
memcpy 100 bytes at index 4GB + 100

this should end up allocating only two physical pages for the heap. So apart from the large virtual memory footprint, the test should run fine even on small machines?
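A minimal sketch of that idea in C, under stated assumptions: a 64-bit POSIX host where anonymous mappings are backed lazily (as on Linux), and a hypothetical helper name `sparse_copy_test` that is not part of wabt. It reserves the memory directly with `mmap` rather than going through a wasm2c-generated module.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical test helper. Returns 1 on success, 0 on a data mismatch,
   and -1 if the host refuses to reserve the address space. */
static int sparse_copy_test(void) {
  const size_t size = (size_t)8 << 30;         /* 8 GiB of address space */
  const size_t high = ((size_t)4 << 30) + 100; /* index 4 GiB + 100 */
  uint8_t* mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (mem == MAP_FAILED) {
    return -1;
  }

  uint8_t src[100];
  memset(src, 0xAB, sizeof src);
  memcpy(mem + 100, src, sizeof src);  /* touches the first page only */
  memcpy(mem + high, src, sizeof src); /* touches one page past 4 GiB */

  /* Check both copies, plus one untouched byte beside each (same pages). */
  int ok = memcmp(mem + 100, src, sizeof src) == 0 &&
           memcmp(mem + high, src, sizeof src) == 0 &&
           mem[0] == 0 && mem[high + 100] == 0;
  munmap(mem, size);
  return ok ? 1 : 0;
}
```

Both copies and the extra reads stay within two pages, so only those pages should need physical backing. The -1 return lets a test runner skip the case on hosts with strict overcommit settings instead of failing.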

@keithw keithw force-pushed the w2c-harmonize-types branch 2 times, most recently from beb8809 to 64a616d on November 12, 2024 05:09
#elif defined(_MSC_VER)
static inline bool add_overflow(uint64_t a, uint64_t b, uint64_t* resptr) {
return _addcarry_u64(0, a, b, resptr);
}
Member


Maybe use #define here for consistency with the above? Or use a static inline function above?

Member Author


Done.

@keithw
Member Author

keithw commented Nov 12, 2024

> > I wonder if there would be a way to add this as an "intensive" test if people don't mind having to allocate >4 GiB to run the test.
>
> Is the concern that the test-suite may not run on small machines?

Yeah, and also that a test could be really slow to run (e.g. memory.fill with an "n" greater than UINT32_MAX).


3 participants