Add op access for `OpCodeInfo` and `Code` #614

susitsm · 2024-09-20T11:04:24Z

Iced provides OpAccess through the InstructionInfoFactory interface which does advanced analysis based on concrete operands of an instruction. This merge request aims to add a simpler OpAccess API which uses data derivable from the Code enum and does not need full instructions.

The code compiles but it is a WORK IN PROGRESS.

Unresolved issues:

Some code generator generated code was modified in 40f17c3. Those modifications should be moved to the code generator
Some new functions are missing documentation
Add documentation to InstructionInfo about how it simplifies operand accesses.
Move functions to where they belong: most new code was put into src/rust/iced-x86/src/encoder/code_flags.rs, some of it might be better placed somewhere else.
Fix the remaining todo!()-s in Code::op0_access. I didn't know how some of them should be converted. I would appreciate any feedback/help you can give me.

…ums::{OP_ACCESS_1, OP_ACCESS_2}` from `pub(super) static` to `pub(crate) const`

…ut the need of using unsafe blocks

… to code.rs

…ng the generator up to date with the changes in 40f17c3

wtfsck · 2024-09-25T16:35:30Z

Makes sense to add those functions to Code/OpCodeInfo.

Not sure if OpCodeInfo needs the extra op_accesses field since it can just call the Code functions.

One thing missing is tests for all new APIs. Could be a lot more work so maybe I'll add that sometime. Would require (C# code, see generator code) generating a new txt file for at least the Code test that would include the Code value and the OpAccess values, write a parser for it (should be easy, split the line), add the test code somewhere.

susitsm · 2024-09-29T06:29:01Z

Removed op access from OpCodeInfo, there is no need to do the same thing multiple ways. I also added a miniscule amount of docs about the optimizations done by InstructionInfoFactory.

There is one more thing I am not sure about, and that is OpInfo0::WriteMem_ReadWriteReg. Currently, I convert it to OpAccess::ReadWrite, but I am thinking that maybe we should create a CodeOpAccess enum, and just leave it as-is. Similarly to OpCodeOperandKind and OpKind.

Also, I could not find any info about OpInfo0::CondWrite32_ReadWrite64. We might also want to leave that as its own kind.

wtfsck · 2024-10-22T17:06:50Z

Sorry, forgot to reply.

This is the reason I didn't add those functions to Code etc, because it requires extra input, eg. bitness.

I don't like a new enum, that reveals too much about the implementation.

The correct way is to add the required input, I think that's only bitness, if it's more than that, this could get ugly.

susitsm · 2024-10-22T18:17:04Z

We need at most 2 inputs to disambiguate: bitness and whether the instruction has memory operands. Both are for operand 0.

First, there are the 32 bit CMOVcc-s Cmovo_r32_rm32, Cmovno_r32_rm32, Cmovb_r32_rm32, Cmovae_r32_rm32, Cmove_r32_rm32, Cmovne_r32_rm32, Cmovbe_r32_rm32, Cmova_r32_rm32, Cmovs_r32_rm32, Cmovns_r32_rm32, Cmovp_r32_rm32, Cmovnp_r32_rm32, Cmovl_r32_rm32, Cmovge_r32_rm32, Cmovle_r32_rm32, Cmovg_r32_rm32
These have OpInfo0::CondWrite32_ReadWrite64. In the InstructionInfoFactory this is then converted based on bitness.
To be honest I don't understand why that is. If the reason is that in 64 bit mode the upper 32 bits are always zeroed, that isn't really the first operand, is it? The first operand is the lower 32 bits.

After giving it some thought, I think this should be CondWrite for this API, and we do not need bitness as an input. Anybody dealing with x86-64 should know that the upper 32 bits are always zeroed, and has to handle it as a special case for all instructions. If the reason for requiring bitness here is not the 32 upper bits, please correct me.

Second, there are Movss_xmm_xmmm32, Movsd_xmm_xmmm64, Movss_xmmm32_xmm, Movsd_xmmm64_xmm. These have OpInfo0::WriteMem_ReadWriteReg.
Relevant part from the intel manual for movss:

Legacy version: When the source and destination operands are XMM registers, bits (MAXVL-1:32) of the corresponding destination register are unmodified. When the source operand is a memory location and destination operand is an XMM registers, Bits (127:32) of the destination operand is cleared to all 0s, bits MAXVL:128 of the destination operand remains unchanged.

To support this, we would need an argument, has_memory_operand: bool.

To summarize: the proposal is to add has_memory_operand as an argument to op0_access and op_access.

wtfsck · 2024-10-26T15:09:01Z

After giving it some thought, I think this should be CondWrite for this API, and we do not need bitness as an input. Anybody dealing with x86-64 should know that the upper 32 bits are always zeroed, and has to handle it as a special case for all instructions. If the reason for requiring bitness here is not the 32 upper bits, please correct me.

The upper 32 bits are cleared in 64-bit mode if operand size is 32 bits.

Operation
temp := SRC
IF condition TRUE
    THEN DEST := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN DEST[63:32] := 0;
FI;

So the the CPU actually does dest = cond ? temp : dest and this always clears the upper 32 bits of dest if operand size is 32.

To support this, we would need an argument, has_memory_operand: bool.

To summarize: the proposal is to add has_memory_operand as an argument to op0_access and op_access.

It would need two args, and it's now getting very ugly since the user has to pass in bitness and check if the instruction has a memory operand.

susitsm · 2024-10-31T10:50:35Z

Operation
temp := SRC
IF condition TRUE
    THEN DEST := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN DEST[63:32] := 0;
FI;

I know that this is from the manual, I'm arguing that this code is not really correct when you actually try to implement it.
Let's look at an example cmovz eax, ebx. DEST could mean 2 different things there, neither produces correct code. First, if I say DEST is eax

temp := ebx
IF condition TRUE
    THEN eax := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN eax[63:32] := 0;
FI;

What is eax[63:32]?
If we say that DEST is rax

temp := ebx
IF condition TRUE
    THEN rax := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN rax[63:32] := 0;
FI;

then what is rax := temp? temp is 32 bits, rax is 64.
This is why I wrote that when someone wants to work with x86-64, handling the zeroing of the upper bits will always be something they need to handle as a special case.
If we define DEST as eax, the access for eax is a conditional write, regardless of the bitness. We will need one less argument, giving a cleaner API and letting the user handle the zero extend as they see fit.

Nevertheless, I'm not here to argue. If you still think that the 2 argument version is better, that works too. I won't make more of a case against it and finish the PR 😄
Also, I can give it a single struct as argument to make the API more future-proof.

wtfsck · 2024-11-15T17:05:25Z

Hopefully I didn't misunderstand you. Let me explain again.

In 16/32-bit mode, Cmovo_r32_rm32 should be CondWrite because it's not possible to observe the CPU writing to the upper 32 bits of a 64-bit GPR.

So we can ignore the upper 32 bits of the 64-bit GPR and just assume the CPU does this: dest = cond ? temp : dest which is a CondWrite

In 64-bit mode, Cmovo_r32_rm32 is not CondWrite, it would have to be ReadWrite since the CPU always clears the upper 64 bits of the 64-bit destination GPR.

In 64-bit mode, the CPU does this: dest-64 = ZX(cond ? temp-32 : dest-32) which always clears the upper 32-bits of dest-64. This either completely overwrites dest-64 with temp-32 or it writes the low 32 bits of dest (dest-32) to dest-64, i.e., zero extends dest-64 from 32 bits to 64 bits. So in 64-bit mode it's either Write or Zero-extend (read-then-write), i.e., ReadWrite.

So we would need a bitness argument.

However, in this case we could just return ReadWrite (NOT CondWrite) for both so we won't have to use an extra bitness arg, but then the caller has to check if it's a CMOVcc if it's 16/32-bit code.

So we're back to just one arg again, the 'is mem op' argument. That's acceptable, 2 args would be 1 arg too much.

susitsm added 8 commits September 20, 2024 12:42

convert info::info_table::TABLE from static to const, `info::en…

40f17c3

…ums::{OP_ACCESS_1, OP_ACCESS_2}` from `pub(super) static` to `pub(crate) const`

make info::info_table::TABLE data accessible inside the crate witho…

e6c5218

…ut the need of using unsafe blocks

add OpAccess to OpCodeInfo

167ccb8

use core::mem instead of std::mem

2e98397

move InfoFlags1 and InfoFlags2 to info/info_flags.rs, methods of Code…

187f6bf

… to code.rs

implement TODOs in op0_access

bec1baa

fix warnings

e462d76

fix clippy warnings

0916d0e

susitsm force-pushed the opcode-op-access branch from 70d3794 to d5d4411 Compare September 21, 2024 08:02

put the new functions behind the instr_info feature

c1d5e8d

susitsm force-pushed the opcode-op-access branch from d5d4411 to c1d5e8d Compare September 21, 2024 08:14

susitsm added 2 commits September 21, 2024 10:18

cargo fmt

0ff8ffd

revert info::enums::{OP_ACCESS_1, OP_ACCESS_2} to pub(super), bri…

f0a0c81

…ng the generator up to date with the changes in 40f17c3

susitsm added 3 commits September 29, 2024 08:12

add some docs to InstructionInfoFactory about the OpAccess optimizations

c8a0969

remove op_access from OpCodeInfo

a7a17ac

fix clippy error

2c5b88c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add op access for `OpCodeInfo` and `Code` #614

Add op access for `OpCodeInfo` and `Code` #614

susitsm commented Sep 20, 2024 •

edited

Loading

wtfsck commented Sep 25, 2024 •

edited

Loading

susitsm commented Sep 29, 2024 •

edited

Loading

wtfsck commented Oct 22, 2024

susitsm commented Oct 22, 2024

wtfsck commented Oct 26, 2024

susitsm commented Oct 31, 2024

wtfsck commented Nov 15, 2024 •

edited

Loading

Add op access for OpCodeInfo and Code #614

Are you sure you want to change the base?

Add op access for OpCodeInfo and Code #614

Conversation

susitsm commented Sep 20, 2024 • edited Loading

wtfsck commented Sep 25, 2024 • edited Loading

susitsm commented Sep 29, 2024 • edited Loading

wtfsck commented Oct 22, 2024

susitsm commented Oct 22, 2024

wtfsck commented Oct 26, 2024

susitsm commented Oct 31, 2024

wtfsck commented Nov 15, 2024 • edited Loading

Add op access for `OpCodeInfo` and `Code` #614

Add op access for `OpCodeInfo` and `Code` #614

susitsm commented Sep 20, 2024 •

edited

Loading

wtfsck commented Sep 25, 2024 •

edited

Loading

susitsm commented Sep 29, 2024 •

edited

Loading

wtfsck commented Nov 15, 2024 •

edited

Loading