Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add op access for OpCodeInfo and Code #614

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

susitsm
Copy link
Contributor

@susitsm susitsm commented Sep 20, 2024

Iced provides OpAccess through the InstructionInfoFactory interface which does advanced analysis based on concrete operands of an instruction. This merge request aims to add a simpler OpAccess API which uses data derivable from the Code enum and does not need full instructions.

The code compiles but it is a WORK IN PROGRESS.

Unresolved issues:

  • Some code generator generated code was modified in 40f17c3. Those modifications should be moved to the code generator
  • Some new functions are missing documentation
  • Add documentation to InstructionInfo about how it simplifies operand accesses.
  • Move functions to where they belong: most new code was put into src/rust/iced-x86/src/encoder/code_flags.rs, some of it might be better placed somewhere else.
  • Fix the remaining todo!()-s in Code::op0_access. I didn't know how some of them should be converted. I would appreciate any feedback/help you can give me.

@wtfsck
Copy link
Member

wtfsck commented Sep 25, 2024

Makes sense to add those functions to Code/OpCodeInfo.

Not sure if OpCodeInfo needs the extra op_accesses field since it can just call the Code functions.

One thing missing is tests for all new APIs. Could be a lot more work so maybe I'll add that sometime. Would require (C# code, see generator code) generating a new txt file for at least the Code test that would include the Code value and the OpAccess values, write a parser for it (should be easy, split the line), add the test code somewhere.

@susitsm
Copy link
Contributor Author

susitsm commented Sep 29, 2024

Removed op access from OpCodeInfo, there is no need to do the same thing multiple ways. I also added a miniscule amount of docs about the optimizations done by InstructionInfoFactory.

There is one more thing I am not sure about, and that is OpInfo0::WriteMem_ReadWriteReg. Currently, I convert it to OpAccess::ReadWrite, but I am thinking that maybe we should create a CodeOpAccess enum, and just leave it as-is. Similarly to OpCodeOperandKind and OpKind.

Also, I could not find any info about OpInfo0::CondWrite32_ReadWrite64. We might also want to leave that as its own kind.

@wtfsck
Copy link
Member

wtfsck commented Oct 22, 2024

Sorry, forgot to reply.

This is the reason I didn't add those functions to Code etc, because it requires extra input, eg. bitness.

I don't like a new enum, that reveals too much about the implementation.

The correct way is to add the required input, I think that's only bitness, if it's more than that, this could get ugly.

@susitsm
Copy link
Contributor Author

susitsm commented Oct 22, 2024

We need at most 2 inputs to disambiguate: bitness and whether the instruction has memory operands. Both are for operand 0.

First, there are the 32 bit CMOVcc-s Cmovo_r32_rm32, Cmovno_r32_rm32, Cmovb_r32_rm32, Cmovae_r32_rm32, Cmove_r32_rm32, Cmovne_r32_rm32, Cmovbe_r32_rm32, Cmova_r32_rm32, Cmovs_r32_rm32, Cmovns_r32_rm32, Cmovp_r32_rm32, Cmovnp_r32_rm32, Cmovl_r32_rm32, Cmovge_r32_rm32, Cmovle_r32_rm32, Cmovg_r32_rm32
These have OpInfo0::CondWrite32_ReadWrite64. In the InstructionInfoFactory this is then converted based on bitness.
To be honest I don't understand why that is. If the reason is that in 64 bit mode the upper 32 bits are always zeroed, that isn't really the first operand, is it? The first operand is the lower 32 bits.

After giving it some thought, I think this should be CondWrite for this API, and we do not need bitness as an input. Anybody dealing with x86-64 should know that the upper 32 bits are always zeroed, and has to handle it as a special case for all instructions. If the reason for requiring bitness here is not the 32 upper bits, please correct me.

Second, there are Movss_xmm_xmmm32, Movsd_xmm_xmmm64, Movss_xmmm32_xmm, Movsd_xmmm64_xmm. These have OpInfo0::WriteMem_ReadWriteReg.
Relevant part from the intel manual for movss:

Legacy version: When the source and destination operands are XMM registers, bits (MAXVL-1:32) of the corresponding destination register are unmodified. When the source operand is a memory location and destination operand is an XMM registers, Bits (127:32) of the destination operand is cleared to all 0s, bits MAXVL:128 of the destination operand remains unchanged.

To support this, we would need an argument, has_memory_operand: bool.

To summarize: the proposal is to add has_memory_operand as an argument to op0_access and op_access.

@wtfsck
Copy link
Member

wtfsck commented Oct 26, 2024

After giving it some thought, I think this should be CondWrite for this API, and we do not need bitness as an input. Anybody dealing with x86-64 should know that the upper 32 bits are always zeroed, and has to handle it as a special case for all instructions. If the reason for requiring bitness here is not the 32 upper bits, please correct me.

The upper 32 bits are cleared in 64-bit mode if operand size is 32 bits.

Operation
temp := SRC
IF condition TRUE
    THEN DEST := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN DEST[63:32] := 0;
FI;

So the the CPU actually does dest = cond ? temp : dest and this always clears the upper 32 bits of dest if operand size is 32.

To support this, we would need an argument, has_memory_operand: bool.

To summarize: the proposal is to add has_memory_operand as an argument to op0_access and op_access.

It would need two args, and it's now getting very ugly since the user has to pass in bitness and check if the instruction has a memory operand.

@susitsm
Copy link
Contributor Author

susitsm commented Oct 31, 2024

Operation
temp := SRC
IF condition TRUE
    THEN DEST := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN DEST[63:32] := 0;
FI;

I know that this is from the manual, I'm arguing that this code is not really correct when you actually try to implement it.
Let's look at an example cmovz eax, ebx. DEST could mean 2 different things there, neither produces correct code. First, if I say DEST is eax

temp := ebx
IF condition TRUE
    THEN eax := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN eax[63:32] := 0;
FI;

What is eax[63:32]?
If we say that DEST is rax

temp := ebx
IF condition TRUE
    THEN rax := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN rax[63:32] := 0;
FI;

then what is rax := temp? temp is 32 bits, rax is 64.
This is why I wrote that when someone wants to work with x86-64, handling the zeroing of the upper bits will always be something they need to handle as a special case.
If we define DEST as eax, the access for eax is a conditional write, regardless of the bitness. We will need one less argument, giving a cleaner API and letting the user handle the zero extend as they see fit.

Nevertheless, I'm not here to argue. If you still think that the 2 argument version is better, that works too. I won't make more of a case against it and finish the PR 😄
Also, I can give it a single struct as argument to make the API more future-proof.

@wtfsck
Copy link
Member

wtfsck commented Nov 15, 2024

Hopefully I didn't misunderstand you. Let me explain again.

In 16/32-bit mode, Cmovo_r32_rm32 should be CondWrite because it's not possible to observe the CPU writing to the upper 32 bits of a 64-bit GPR.

So we can ignore the upper 32 bits of the 64-bit GPR and just assume the CPU does this: dest = cond ? temp : dest which is a CondWrite

In 64-bit mode, Cmovo_r32_rm32 is not CondWrite, it would have to be ReadWrite since the CPU always clears the upper 64 bits of the 64-bit destination GPR.

In 64-bit mode, the CPU does this: dest-64 = ZX(cond ? temp-32 : dest-32) which always clears the upper 32-bits of dest-64. This either completely overwrites dest-64 with temp-32 or it writes the low 32 bits of dest (dest-32) to dest-64, i.e., zero extends dest-64 from 32 bits to 64 bits. So in 64-bit mode it's either Write or Zero-extend (read-then-write), i.e., ReadWrite.

So we would need a bitness argument.

However, in this case we could just return ReadWrite (NOT CondWrite) for both so we won't have to use an extra bitness arg, but then the caller has to check if it's a CMOVcc if it's 16/32-bit code.

So we're back to just one arg again, the 'is mem op' argument. That's acceptable, 2 args would be 1 arg too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants