Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track original wasm instructions location #71

Merged
merged 1 commit into from
Oct 23, 2019

Conversation

yurydelendik
Copy link
Contributor

Work toward #67

  • when decoding, store all offsets in the ExprId -> offset map
  • when encoding, remember all ExprId -> offset
  • on finalize of writing, merge information to get source offset -> encoded offset transform

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the idea here that we would use the resulting offsets map to modify the DWARF in place outside of walrus the same way a linker might?

A few comments/questions/suggestions in line below.

src/module/functions/local_function/mod.rs Outdated Show resolved Hide resolved
src/module/functions/local_function/mod.rs Outdated Show resolved Hide resolved
src/module/functions/local_function/mod.rs Outdated Show resolved Hide resolved
src/module/mod.rs Outdated Show resolved Hide resolved
@yurydelendik
Copy link
Contributor Author

So is the idea here that we would use the resulting offsets map to modify the DWARF in place outside of walrus the same way a linker might?

The wasm DWARF will need the code_transform (probably even more in the future) to be properly transformed. There is https://github.com/yurydelendik/wtmaps-utils/tree/master/wdwarf-cp project -- I'm planing to extend it to accept this data and/or be make it as a library to be used with Module.generate_dwarf option.

@yurydelendik yurydelendik marked this pull request as ready for review May 21, 2019 17:30
@yurydelendik
Copy link
Contributor Author

@fitzgen additional info (based on standup meeting), DWARF transform logic is currently located at https://github.com/yurydelendik/wtmaps-utils in the wdwarf library -- the provided by this PR data will be used for final DWARF sections generation.

@fitzgen
Copy link
Member

fitzgen commented May 22, 2019

So I whipped up a smoke test to check that inserting (drop (i32.const 0)) at the start of a function results in the expected code transformations.

That is, going from input.wasm to output.wasm:

$ wasm-objdump -d crates/tests/input.wasm crates/tests/output.wasm 

input.wasm:	file format wasm 0x1

Code Disassembly:

00002e func[0] <f>:
 00002f: 41 b9 0a                   | i32.const 1337
 000032: 0b                         | end

output.wasm:	file format wasm 0x1

Code Disassembly:

00002e func[0] <f>:
 00002f: 41 00                      | i32.const 0
 000031: 1a                         | drop
 000032: 41 b9 0a                   | i32.const 1337
 000035: 0b                         | end

The i32.const 1337 has moved from offset 0x2f to offset 0x32, however the code_transform contains [(0x32, 0x31)].

I rebased this branch on top of master and added a commit that contains this failing test:

@yurydelendik could you look into why the test's assertions about the code transforms are failing?

@RReverser
Copy link
Member

Is this PR / feature planned to be revived?

Currently wasm-pack supports dwarf-debug-info = true in the config, but it doesn't appear very useful since all the locations are off, and can't be used for source mapping.

@fitzgen
Copy link
Member

fitzgen commented Oct 9, 2019

@RReverser yes this feature is somethign we definitely want, but this PR is quite bit rotted at this point.

yurydelendik added a commit to yurydelendik/walrus that referenced this pull request Oct 21, 2019
@yurydelendik yurydelendik force-pushed the debuginfo branch 2 times, most recently from ac129c4 to 61fe12e Compare October 22, 2019 16:06
@yurydelendik
Copy link
Contributor Author

Rebased and changed logic to use InstrLocId instead of ExprId. InstrLocId is just a debug location token, though for wasm purposes (and by default) it can be used to store bytecode offset.

@yurydelendik yurydelendik requested a review from fitzgen October 22, 2019 16:08
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for including that smoke test, Yury.

So the idea is that InstrLocId is not necessarily a code section offset, and users can customize its representation with config.on_instr_pos? Why would someone want to use something else? What would we be losing by only supporting code section offsets?

@yurydelendik
Copy link
Contributor Author

Why would someone want to use something else?

If tools can read/write debug location format, they can allow better integration.

What would we be losing by only supporting code section offsets?

At this point nothing. Right now it is module bytecode offset (I would prefer to extend bytecode to entire usize type (instead of u32), if we are planing to support only that). Or, do we need to switch InstrLocId to code section offsets format, which will require to offset bytecode position to code section start?

@fitzgen
Copy link
Member

fitzgen commented Oct 22, 2019

Gotcha.

Can you fix the fuzz targets? Then we can merge this 👍

https://travis-ci.org/rustwasm/walrus/jobs/601361918#L950

  --> fuzz_targets/raw.rs:11:22
   |
7  |     let module = match walrus::Module::from_buffer(data) {
   |         ------ help: consider changing this to be mutable: `mut module`
...
11 |     let serialized = module.emit_wasm();
   |                      ^^^^^^ cannot borrow as mutable
error[E0596]: cannot borrow `module` as mutable, as it is not declared as mutable
  --> fuzz_targets/raw.rs:14:24
   |
12 |     let module =
   |         ------ help: consider changing this to be mutable: `mut module`
13 |         walrus::Module::from_buffer(&serialized).expect("we should only emit valid Wasm data");
14 |     let reserialized = module.emit_wasm();
   |                        ^^^^^^ cannot borrow as mutable

Just the raw target is showing up, presumably because it is the first we try to run, but the others may need to be updated as well.

@fitzgen fitzgen merged commit 7d2007d into rustwasm:master Oct 23, 2019
@fitzgen
Copy link
Member

fitzgen commented Oct 23, 2019

Thanks!

@RReverser
Copy link
Member

Woot, this was fast. Thanks!

@RReverser
Copy link
Member

@fitzgen Do you plan to make a wasm-bindgen release with these changes?

@fitzgen
Copy link
Member

fitzgen commented Oct 23, 2019

@RReverser we can publish a new walrus and use that in wasm-bindgen, but I suspect what you're really interested in is using this to fix up llvm's emitted DWARF, right? If so, then that requires writing a little bit more than what this PR provided.

That code could theoretically go in either walrus or wasm-bindgen. I guess I would prefer it live in walrus, since it isn't really bindings-specific, and any sort of wasm transform will need this functionality to preserve debug info.

We will need CustomSection implementations for each .debug_foo section that we wish to preserve (can probably start with just a DebugLines struct for .debug_lines) that use the new origin-tracking infra introduced in this PR. These should probably be behind a cargo feature and live in src/dwarf.rs. Then we would have a config option for enabling DWARF support, and when set we would create the DebugLines instance from the .debug_lines custom section if it exists, and add it to the module's custom section registry so that when we write the wasm back out it updates all code offsets. Once that is working, we would do the same for each of the other DWARF sections.

@RReverser
Copy link
Member

but I suspect what you're really interested in is using this to fix up llvm's emitted DWARF, right?

I guess I'm confused, I thought that's what this PR is doing? If not:

We will need CustomSection implementations for each .debug_foo section ...

I had an idea that these operations could be done in gimli::write::Dwarf instead - it already accepts a callback that converts any addresses in constructor, and, while it's not the original goal, same callback could be used for returning an adjusted address in the new binary.

Then the rest of it is as easy as reading DWARF custom sections of Wasm into read::Dwarf and re-emitting them back immediately after construction of write::Dwarf with such callback.

@yurydelendik
Copy link
Contributor Author

yurydelendik commented Oct 23, 2019

I had an idea that these operations could be done in gimli::write::Dwarf instead

@RReverser There is a project started at https://github.com/yurydelendik/wtmaps-utils that will demonstrate the scope of the work needs to accomplish the task -- it is not trivial.

@fitzgen
Copy link
Member

fitzgen commented Oct 23, 2019

Then the rest of it is as easy as reading DWARF custom sections of Wasm into read::Dwarf and re-emitting them back immediately after construction of write::Dwarf with such callback.

This is basically what those CustomSection implementations would be doing.

@philipc
Copy link

philipc commented Oct 23, 2019

Looking at convert.rs in wdwarf, a lot of it is copied from gimli. Is there a way we can adapt gimli's convert to do what you need? I had originally thought that transformations would work by calling gimli's convert, and then mutating the resulting write::Dwarf, but maybe that is not feasible? I would like to avoid the duplication of logic somehow though.

@yurydelendik
Copy link
Contributor Author

Is there a way we can adapt gimli's convert to do what you need?

@philipc yes, that's an ideal situation. This project is created to identify what needs to be done for this use case (on gimli or user side). I'm planning to resume a work on it soon.

Let's create an issue at wdwarf or gimli repo to continue this discussion. I'll just mention that this is a problem not just translation of address, but changed data to still has be a valid DWARF:

  • keep .debug_line data in order
  • keep .debug_info ranges valid
  • (in the future) modify DWARF expressions if wasm locals/globals were changed

@RReverser
Copy link
Member

I'll just mention that this is a problem not just translation of address, but changed data to still has be a valid DWARF:
keep .debug_line data in order
keep .debug_info ranges valid

AFAIK the things listed here are already taken care of by convert_address callback in write::Dwarf, hence the suggestion above to reuse it.

@RReverser
Copy link
Member

RReverser commented Oct 29, 2019

@fitzgen I've looked into what you meant by implementing CustomSection + apply_code_transform btw. I think I see how it could work, but this API doesn't feel ideal, as it allows to only process one section at a time, whereas DWARF info is emitted across a bunch of different section.

If walrus could just produce a map of original -> final locations, then it seems much easier to reuse read::Dwarf + write::Dwarf that can read and update all the DWARF info at once (and extend support to more sections over time), rather than handling each section manually in wasm-bindgen itself.

I've tried to experiment with this in a wasm-bindgen branch, but looks like there are some incompatibilities with new walrus, particularly in anyhow::Error vs failure::Error used in two projects.

@fitzgen
Copy link
Member

fitzgen commented Oct 29, 2019

I've tried to experiment with this in a wasm-bindgen branch

As I mentioned above, I think walrus is the place this should live, not wasm-bindgen.

If walrus could just produce a map of original -> final locations, then it seems much easier to reuse read::Dwarf + write::Dwarf that can read and update all the DWARF info at once (and extend support to more sections over time), rather than handling each section manually in wasm-bindgen itself.

We could do that, but I also don't think that doing each section one at a time is bad, because then we can easily do each section in parallel.

@RReverser
Copy link
Member

because then we can easily do each section in parallel

It feels like something that should be done inside gimli, as it 1) knows more about sections and can parallelise work better and 2) could use such parallelisation on other platforms too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants