-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix #105969 #106218
fix #105969 #106218
Conversation
Build analysis is green, all failures are known. |
src/coreclr/jit/lowerxarch.cpp
Outdated
// Avx512F.Insert/ExtractVector256 accepts inputs with base type smaller that 64 | ||
// bits. it makes operSize wrong in those case. | ||
if (op2->AsHWIntrinsic()->GetHWIntrinsicId() == NI_AVX512F_InsertVector256 || | ||
op2->AsHWIntrinsic()->GetHWIntrinsicId() == NI_AVX512F_ExtractVector256) | ||
{ | ||
operSize = 8; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fix the issue, but I'm not sure its entirely the "right" fix (and may miss some other similar cases).
We have a few instructions (namely those listed here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/instr.cpp#L110) where they effectively operate "bitwise" without masking but which need to consider the actual operand size when masking (or broadcast, if supported) is used.
So we probably need to ensure all of these are handled and are picking the "right" operand size (generally 4 when the input is of a "small type").
In the particular case of Insert/ExtractVector256
we should probably be using vinserti32x8
instead of vinserti64x4
for the small types and for int
/uint
. Those are technically AVX512DQ
, but the JIT relies on all of F+BW+CD+DQ+VL
being available together, so it should be fine to just adjust them in hwintrinsiclistxarch.h
and get the "better" codegen. Some of the other intrinsics that need special handling may be in the same boat.
-- Alternatively we could just explicitly lower cases like NI_AVX512F_InsertVector256
to be NI_AVX512DQ_InsertVector256
and change the base type to int/uint
for byte/ubyte
, short/ushort
, and int/uint
. We could similarly lower cases like NI_AVX512F_And
to normalize the base type to int/uint
, which would likely also fix the issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fix the issue, but I'm not sure its entirely the "right" fix (and may miss some other similar cases).
We have a few instructions (namely those listed here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/instr.cpp#L110) where they effectively operate "bitwise" without masking but which need to consider the actual operand size when masking (or broadcast, if supported) is used.
Thanks for the inputs,
At a quick glance on the instructions on the list, and/andn/or/xor/insert/extract
may hit the same bug, movudqx/round/broadcast
does not have EmbMask support yet. Do we consider fix them all? or leave those unsupported instructions for now (those instructions intrinsically have the support for embedded masking).
So we probably need to ensure all of these are handled and are picking the "right" operand size (generally 4 when the input is of a "small type").
In the particular case of
Insert/ExtractVector256
we should probably be usingvinserti32x8
instead ofvinserti64x4
for the small types and forint
/uint
. Those are technicallyAVX512DQ
, but the JIT relies on all ofF+BW+CD+DQ+VL
being available together, so it should be fine to just adjust them inhwintrinsiclistxarch.h
and get the "better" codegen. Some of the other intrinsics that need special handling may be in the same boat. -- Alternatively we could just explicitly lower cases likeNI_AVX512F_InsertVector256
to beNI_AVX512DQ_InsertVector256
and change the base type toint/uint
forbyte/ubyte
,short/ushort
, andint/uint
. We could similarly lower cases likeNI_AVX512F_And
to normalize the base type toint/uint
, which would likely also fix the issue
I would try to use the first option, as this is how those logical instructions (And/Or/AndN/Xor) are designed as well and based on how Avx512 ISA family is being checked, it should safe to use DQ instructions. These might still need to be handled together during lowering to normalize the base data type to int/uint
unless it is a long type. (did some extensive testing, And
with base type to be byte will have the same bug.)
|
…atches with the data type implied by the intrinsics.
Going to give this another passover later tonight before approving/merging. Want to double check the tables myself and make sure we didn't miss any other obvious edge cases here. The changes look generally good/correct, however, so it's more just an extra audit I'm wanting to do for completeness |
@Ruihan-Yin, is this ready for final review? |
Yes, changes are ready for review, thanks for the help from @tannergooding |
@Ruihan-Yin, there is an assertion failure. PTAL. Pipeline: runtime-coreclr jitstress-isas-avx512/20240813.7
|
Wanted to call out that - RWD00 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh, 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
+ RWD00 dd 00FF00FFh So while the encoding used for There also appears to be a disassembly quirk in that its showing |
#106405 fixes the disassembly and the Edit: Changed to include the fix directly in this PR so they can make the cutoff |
There's still a remaining failure caused by this change. Validating the fix locally and should have it pushed up shortly. |
…sted AND having an embedded broadcast
GenTree* nestedOp1 = op1Intrinsic->Op(1); | ||
GenTree* nestedOp2 = op1Intrinsic->Op(2); | ||
|
||
if (nestedOp2->isContained() && nestedOp2->OperIsHWIntrinsic()) | ||
{ | ||
GenTreeHWIntrinsic* nestedIntrin = nestedOp2->AsHWIntrinsic(); | ||
NamedIntrinsic nestedIntrinId = nestedIntrin->GetHWIntrinsicId(); | ||
|
||
if ((nestedIntrinId == NI_SSE3_MoveAndDuplicate) || | ||
(nestedIntrinId == NI_AVX2_BroadcastScalarToVector128) || | ||
(nestedIntrinId == NI_AVX2_BroadcastScalarToVector256) || | ||
(nestedIntrinId == NI_AVX512F_BroadcastScalarToVector512)) | ||
{ | ||
// We need to rewrite the embedded broadcast back to a regular constant | ||
// so that the subsequent containment check for ptestm can determine | ||
// if the embedded broadcast is still relevant | ||
|
||
GenTree* broadcastOp = nestedIntrin->Op(1); | ||
|
||
if (broadcastOp->OperIsHWIntrinsic(NI_Vector128_CreateScalarUnsafe)) | ||
{ | ||
BlockRange().Remove(broadcastOp); | ||
broadcastOp = broadcastOp->AsHWIntrinsic()->Op(1); | ||
} | ||
|
||
assert(broadcastOp->OperIsConst()); | ||
|
||
GenTree* vecCns = | ||
comp->gtNewSimdCreateBroadcastNode(simdType, broadcastOp, | ||
op1Intrinsic->GetSimdBaseJitType(), simdSize); | ||
|
||
BlockRange().InsertAfter(broadcastOp, vecCns); | ||
nestedOp2 = vecCns; | ||
|
||
BlockRange().Remove(broadcastOp); | ||
BlockRange().Remove(nestedIntrin); | ||
} | ||
} | ||
|
||
node->Op(1) = nestedOp1; | ||
node->Op(2) = nestedOp2; | ||
|
||
// Make sure we aren't contained since ptestm will do its own containment check | ||
node->Op(2)->ClearContained(); | ||
nestedOp2->ClearContained(); | ||
|
||
if (varTypeIsSmall(simdBaseType)) | ||
{ | ||
// Fixup the base type so embedded broadcast and the mask size checks still work | ||
node->NormalizeJitBaseTypeToInt(testIntrinsicId, simdBaseType); | ||
|
||
simdBaseJitType = node->GetSimdBaseJitType(); | ||
simdBaseType = node->GetSimdBaseType(); | ||
|
||
maskBaseJitType = simdBaseJitType; | ||
maskBaseType = simdBaseType; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the normalization, we could now more easily encounter something that looked like (x & y) == 0
where (x & y)
were TYP_INT
or TYP_UINT
but tmp == 0
was still a small type. -- This was also possible before the fix, so an existing bug, but less likely since it wouldn't have been typical for a user to do something like (x & y).AsByte() == 0
In such a case, (x & y)
may have also already made y
an `embedded broadcast and that would cause a disconnect.
The simplest fix would've been to simply not do this optimization if op2
of the AND
operation was already contained, but that had some fairly significant size regressions and a measurable perf hit to some core APIs, including GetIndexOfFirstNonAsciiChar_Vector
as the inner loop codegen was no longer doing a simple check and branch.
So I went with the more verbose "proper" fix that instead ensures the type of tmp == 0
is fixed up and the containment of op2
is properly resolved so that the rewritten PTESTM
can properly contain the operation itself.
@TIHan is checking spmi asmdiffs test failure. |
The diffs were failing to acquire the mch files, so it didn't run anything. It's re-running now and has been for about 45 min. |
The last set of spmi failures were all due to the downloads failing, generally that's due to a JIT/EE version change going in (like the one we had earlier today: https://github.com/dotnet/runtime/commits/9d9af3d22f4b9fdd453790eecab63795f896ea1a/src/coreclr/inc/jiteeversionguid.h) The current jitstress-isas-avx512 failure is unrelated, the history shows its been failing off and on for a while now: https://dev.azure.com/dnceng-public/public/_build/results?buildId=776127&view=ms.vss-test-web.build-test-results-tab&runId=19865742&resultId=118465&paneView=history |
I ran the full PMI diffs locally ( |
/ba-g runtime queue completed but results were not reported due to GitHub outage: https://dev.azure.com/dnceng-public/public/_build/results?buildId=776122&view=results |
Thanks for the investigation! is there any other blocker before merging? |
Some spmi-diffs jobs failed to acquire the MCH files again, due to the JIT/EE version guid change Local runs of both SPMI diffs and full PMI diffs are passing. Just waiting on SPMI replay to finish |
SPMI replay has also finished, only failures are also MCH acquisition issues due to the JIT/EE version change. They remain passing locally. |
attempt to resolve #105969