Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More optimizations #101

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

More optimizations #101

wants to merge 2 commits into from

Conversation

turol
Copy link
Contributor

@turol turol commented Aug 31, 2024

GCC was putting a conditional move in the copy loop for some reason. The second one is less certain and seems to be highly sensitive to changes.

    N           Min           Max        Median           Avg        Stddev
x  30      18039044      18336971      18278984      18267936     61730.367
+  30      18453057      18728267      18672697      18658935     56689.819
Difference at 95.0% confidence
	390999 +/- 30634.3
	2.14036% +/- 0.167694%
	(Student's t, pooled s = 59263.7)
Instructions/second, higher is better

for (opIndex = 0;
opIndex < ((Size < ND_MAX_INSTRUCTION_LENGTH) ? Size : ND_MAX_INSTRUCTION_LENGTH);
opIndex++)
if (Size < ND_MAX_INSTRUCTION_LENGTH)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just moves the condition from the for outside. Modern compilers should be able to deal with it as it is (for example, MSVC unrolls the entire loop, so introducing this if actually makes things worse).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very likely GCC doing something stupid. I'll test clang too and see what happens.

@@ -3813,22 +3813,27 @@ NdGetEffectiveAddrAndOpMode(
static const ND_UINT8 szLut[3] = { ND_SIZE_16BIT, ND_SIZE_32BIT, ND_SIZE_64BIT };
ND_BOOL w64, f64, d64, has66;

if ((ND_CODE_64 != Instrux->DefCode) && !!(Instrux->Attributes & ND_FLAG_IWO64))
// Branchless form of (ND_CODE_64 != Instrux->DefCode) && !!(Instrux->Attributes & ND_FLAG_IWO64)
if (((ND_CODE_64 ^ Instrux->DefCode) * (Instrux->Attributes & ND_FLAG_IWO64)) != 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the branch predictors in modern CPUs to be quite competent at predicting the branches here, since they operate on INSTRUX state that will very often be the same, allowing for a very high prediction rate.
I understand that these changes may remove some conditional moves or branches, but is it really worth it? At some point, one has to decide whether the (small) increase in performance is worth the less readable code. I don't think this is the case here. In addition, depending on the architecture, the multiplication may be a new source of overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiplication pipelines easier than branches. I wasn't entirely happy with this commit either but it did make it faster.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler may even ditch the multiplication altogether, and rely on conditional instructions (such as CMOVcc or SETcc), but this is not the point. The point is that it shouldn't really matter, since modern branch prediction should easily handle these branches with minimal overhead. These statements are not a performance bottleneck.

@vlutas
Copy link
Collaborator

vlutas commented Sep 2, 2024

On Windows (MSVC compiler), the performance with these changes is essentially the same.

@turol
Copy link
Contributor Author

turol commented Sep 2, 2024

Does your CI pipeline upload the MSVC -built binaries somewhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants