fix #105969 #106218

Ruihan-Yin · 2024-08-10T00:12:44Z

attempt to resolve #105969

Ruihan-Yin · 2024-08-10T06:35:24Z

Build analysis is green, all failures are known.
@JulieLeeMSFT @tannergooding PTAL.

tannergooding · 2024-08-10T19:21:00Z

src/coreclr/jit/lowerxarch.cpp

+                                    // Avx512F.Insert/ExtractVector256 accept inputs with a base type smaller than 64
+                                    // bits, which makes operSize wrong in those cases.
+                                    if (op2->AsHWIntrinsic()->GetHWIntrinsicId() == NI_AVX512F_InsertVector256 ||
+                                        op2->AsHWIntrinsic()->GetHWIntrinsicId() == NI_AVX512F_ExtractVector256)
+                                    {
+                                        operSize = 8;
+                                    }


This will fix the issue, but I'm not sure it's entirely the "right" fix (and it may miss some other similar cases).

We have a few instructions (namely those listed here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/instr.cpp#L110) that effectively operate "bitwise" when unmasked, but which need to consider the actual operand size when masking (or broadcast, if supported) is used.

So we probably need to ensure all of these are handled and are picking the "right" operand size (generally 4 when the input is of a "small type").

In the particular case of Insert/ExtractVector256 we should probably be using vinserti32x8 instead of vinserti64x4 for the small types and for int/uint. Those are technically AVX512DQ, but the JIT relies on all of F+BW+CD+DQ+VL being available together, so it should be fine to just adjust them in hwintrinsiclistxarch.h and get the "better" codegen. Some of the other intrinsics that need special handling may be in the same boat.
-- Alternatively we could just explicitly lower cases like NI_AVX512F_InsertVector256 to be NI_AVX512DQ_InsertVector256 and change the base type to int/uint for byte/ubyte, short/ushort, and int/uint. We could similarly lower cases like NI_AVX512F_And to normalize the base type to int/uint, which would likely also fix the issue
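To illustrate why the operand size only matters once masking is involved, here is a minimal scalar sketch in C++. It is a model, not JIT code: `Vec128` and `maskedAnd` are hypothetical names, the vector is shortened to 16 bytes for readability, and the `uint32_t`/`uint64_t` instantiations stand in for the 32-bit and 64-bit forms of a bitwise instruction (such as `vpandd` and `vpandq`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec128 = std::array<uint8_t, 16>;

// Scalar model of a merge-masked bitwise AND over a 16-byte vector.
// ElemT picks the lane width that each mask bit controls:
// uint32_t models a 32-bit form, uint64_t models a 64-bit form.
template <typename ElemT>
Vec128 maskedAnd(const Vec128& a, const Vec128& b, const Vec128& passthru, unsigned mask)
{
    Vec128 result = passthru; // lanes whose mask bit is clear keep the pass-through value
    constexpr std::size_t lanes = 16 / sizeof(ElemT);

    for (std::size_t i = 0; i < lanes; i++)
    {
        if (((mask >> i) & 1u) == 0)
        {
            continue;
        }

        ElemT x, y;
        std::memcpy(&x, a.data() + i * sizeof(ElemT), sizeof(ElemT));
        std::memcpy(&y, b.data() + i * sizeof(ElemT), sizeof(ElemT));

        ElemT r = x & y;
        std::memcpy(result.data() + i * sizeof(ElemT), &r, sizeof(ElemT));
    }
    return result;
}
```

With every mask bit set, both instantiations produce identical bytes, which is the "bitwise" behaviour. With only mask bit 0 set, the 32-bit form writes 4 bytes and the 64-bit form writes 8, so picking the wrong operand size changes the result.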

> This will fix the issue, but I'm not sure it's entirely the "right" fix (and it may miss some other similar cases).
>
> We have a few instructions (namely those listed here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/instr.cpp#L110) that effectively operate "bitwise" when unmasked, but which need to consider the actual operand size when masking (or broadcast, if supported) is used.

Thanks for the input.

At a quick glance over the instructions on that list, and/andn/or/xor/insert/extract may hit the same bug, while movudqx/round/broadcast do not have EmbMask support yet. Should we fix them all, or leave the unsupported instructions for now? (Those instructions intrinsically support embedded masking.)

> So we probably need to ensure all of these are handled and are picking the "right" operand size (generally 4 when the input is of a "small type").
>
> In the particular case of Insert/ExtractVector256 we should probably be using vinserti32x8 instead of vinserti64x4 for the small types and for int/uint. Those are technically AVX512DQ, but the JIT relies on all of F+BW+CD+DQ+VL being available together, so it should be fine to just adjust them in hwintrinsiclistxarch.h and get the "better" codegen. Some of the other intrinsics that need special handling may be in the same boat. -- Alternatively we could just explicitly lower cases like NI_AVX512F_InsertVector256 to be NI_AVX512DQ_InsertVector256 and change the base type to int/uint for byte/ubyte, short/ushort, and int/uint. We could similarly lower cases like NI_AVX512F_And to normalize the base type to int/uint, which would likely also fix the issue

I would try the first option, as this is how the logical instructions (And/Or/AndN/Xor) are designed as well, and based on how the Avx512 ISA family is checked, it should be safe to use the DQ instructions. These might still need to be handled together during lowering to normalize the base type to int/uint unless it is a long type. (I did some extensive testing: And with a byte base type has the same bug.)
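The normalization discussed here can be sketched as a small mapping. This is an assumption-laden illustration rather than the JIT's implementation: `BaseType`, `normalizeForBitwise`, and `elementSize` are hypothetical stand-ins, and keeping the signedness (byte/short to int, ubyte/ushort to uint) is an assumption the thread does not spell out.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's SIMD base types (not the real enum).
enum class BaseType { Byte, UByte, Short, UShort, Int, UInt, Long, ULong, Float, Double };

// Map small integer base types to a 32-bit type so that a bitwise operation
// reports a 4-byte element size; all other types (including long/ulong) are unchanged.
BaseType normalizeForBitwise(BaseType type)
{
    switch (type)
    {
        case BaseType::Byte:
        case BaseType::Short:
            return BaseType::Int; // assumption: signedness is preserved

        case BaseType::UByte:
        case BaseType::UShort:
            return BaseType::UInt; // assumption: signedness is preserved

        default:
            return type;
    }
}

// Element size in bytes, used to pick the operand size for masking or broadcast.
unsigned elementSize(BaseType type)
{
    switch (type)
    {
        case BaseType::Byte:
        case BaseType::UByte:
            return 1;
        case BaseType::Short:
        case BaseType::UShort:
            return 2;
        case BaseType::Int:
        case BaseType::UInt:
        case BaseType::Float:
            return 4;
        default:
            return 8; // Long, ULong, Double
    }
}
```

After normalization, `elementSize(normalizeForBitwise(t))` is 4 for every small type, matching the "generally 4 when the input is of a small type" guidance quoted above, while 64-bit types keep their 8-byte size.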

JulieLeeMSFT · 2024-08-12T17:34:56Z

@TIHan for code review for .NET 9.

tannergooding · 2024-08-12T22:17:43Z

Going to give this another pass later tonight before approving/merging.

I want to double-check the tables myself and make sure we didn't miss any other obvious edge cases here. The changes look generally good/correct, however, so it's more just an extra audit I want to do for completeness.

src/coreclr/jit/lowerxarch.cpp

src/coreclr/jit/hwintrinsic.cpp

JulieLeeMSFT · 2024-08-14T16:39:30Z

@Ruihan-Yin, is this ready for final review?
Rerunning failed tests.

Ruihan-Yin · 2024-08-14T17:07:01Z

> @Ruihan-Yin, is this ready for final review? Rerunning failed tests.

Yes, changes are ready for review, thanks for the help from @tannergooding

JulieLeeMSFT · 2024-08-14T17:35:33Z

@Ruihan-Yin, there is an assertion failure. PTAL.

Pipeline: runtime-coreclr jitstress-isas-avx512/20240813.7
Log:

  Starting:    profiler.elt.XUnitWrapper (parallel test collections = on [4 threads], stop on fail = off)
    profiler\elt\slowpatheltenter\slowpatheltenter.cmd [FAIL]
      
      Assert failure(PID 4496 [0x00001190], Thread: 5920 [0x1720]): Assertion failed 'IS_ALIGNED(addr, byteSize)' in 'System.Text.Ascii:GetIndexOfFirstNonAsciiChar_Vector(uint,uint):uint' during 'Emit code' (IL size 810; hash 0x3c71a05c; FullOpts)
      
          File: D:\a\_work\1\s\src\coreclr\jit\emitxarch.cpp:14542
          Image: C:\h\w\B3E509A2\p\CoreRun.exe
      
      Unhandled exception. System.Exception: Profiler tests are expected to contain the text 'PROFILER TEST PASSES' in the console output of the profilee app to indicate a passing test. Usually it is printed from the Shutdown() method of the profiler implementation. This text was not found in the output above. Profilee returned exit code -1073740286.
         at Profiler.Tests.ProfilerTestRunner.FailFastWithMessage(String error)
         at Profiler.Tests.ProfilerTestRunner.Run(String profileePath, String testName, Guid profilerClsid, String profileeArguments, ProfileeOptions profileeOptions, Dictionary`2 envVars, String reverseServerName, Boolean loadAsNotification, Int32 notificationCopies)
         at SlowPathELTTests.SlowPathELTEnter.Main(String[] args)

tannergooding · 2024-08-14T17:36:51Z

Wanted to call out that spmi-diffs shows a size regression, but that's mainly because it doesn't account for data constant size at all. With the base type being int/uint, it now recognizes cases like this:

- RWD00  	dq	00FF00FF00FF00FFh, 00FF00FF00FF00FFh, 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
+ RWD00  	dd	00FF00FFh

So while the encoding used for vpand is 2 bytes larger due to the embedded broadcast, we actually have 28 bytes of size savings in the data constant.
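The arithmetic behind the constant shrinking can be checked with a short C++ sketch. The helper names (`isBroadcastOf`, `makeOriginalConstant`) are hypothetical; the code only verifies that the 32-byte constant from the diff is a 4-byte pattern repeated, which is what allows it to be stored once and broadcast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns true when `data` is the same `elemSize`-byte pattern repeated end to end.
bool isBroadcastOf(const std::vector<uint8_t>& data, std::size_t elemSize)
{
    if (elemSize == 0 || data.empty() || (data.size() % elemSize) != 0)
    {
        return false;
    }

    for (std::size_t offset = elemSize; offset < data.size(); offset += elemSize)
    {
        if (std::memcmp(data.data(), data.data() + offset, elemSize) != 0)
        {
            return false;
        }
    }
    return true;
}

// The original constant: four qwords of 0x00FF00FF00FF00FF in little-endian byte order.
std::vector<uint8_t> makeOriginalConstant()
{
    std::vector<uint8_t> bytes;
    for (int i = 0; i < 32; i++)
    {
        bytes.push_back((i % 2 == 0) ? 0xFF : 0x00);
    }
    return bytes;
}

// Bytes of data saved by storing one element instead of the full vector.
std::size_t dataBytesSaved(std::size_t vectorBytes, std::size_t elemSize)
{
    return vectorBytes - elemSize;
}
```

Here `dataBytesSaved(32, 4)` is 28: the four `dq` entries occupy 32 bytes and the single `dd` entry occupies 4.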

There also appears to be a disassembly quirk in that it's showing {1to1} when it should say {1to4} or {1to8}. This looks to be because emitDispEmbBroadcastCount is using emitGetBaseMemOpSize, which takes broadcast size into account, whereas broadcast size should only be taken into account for emitGetMemOpSize. This is unrelated to this PR, so I'll get a separate fix up for it.
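A rough model of the symptom, assuming the displayed count is simply the full operand size divided by the element size. `broadcastCount` is a hypothetical helper, not the emitter's code; it only shows how feeding an already-reduced (broadcast) size into the division collapses the count to 1.

```cpp
#include <cassert>

// Hypothetical helper: the N in {1toN} is how many elements the broadcast replicates,
// i.e. the full vector operand size divided by the element size.
unsigned broadcastCount(unsigned operandBytes, unsigned elementBytes)
{
    assert(elementBytes != 0 && (operandBytes % elementBytes) == 0);
    return operandBytes / elementBytes;
}
```

For a 4-byte element, `broadcastCount(16, 4)` gives 4 and `broadcastCount(32, 4)` gives 8. If the first argument is mistakenly the size of the broadcast memory operand itself (4 bytes), the result is `broadcastCount(4, 4) == 1`, matching the `{1to1}` text described above.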

tannergooding · 2024-08-14T17:54:36Z

#106405 fixes the disassembly and the IS_ALIGNED assert

Edit: Changed to include the fix directly in this PR so they can make the cutoff

tannergooding · 2024-08-14T19:17:04Z

There's still a remaining failure caused by this change. Validating the fix locally and should have it pushed up shortly.

tannergooding · 2024-08-14T20:42:13Z

src/coreclr/jit/lowerxarch.cpp

+                        GenTree* nestedOp1 = op1Intrinsic->Op(1);
+                        GenTree* nestedOp2 = op1Intrinsic->Op(2);
+
+                        if (nestedOp2->isContained() && nestedOp2->OperIsHWIntrinsic())
+                        {
+                            GenTreeHWIntrinsic* nestedIntrin   = nestedOp2->AsHWIntrinsic();
+                            NamedIntrinsic      nestedIntrinId = nestedIntrin->GetHWIntrinsicId();
+
+                            if ((nestedIntrinId == NI_SSE3_MoveAndDuplicate) ||
+                                (nestedIntrinId == NI_AVX2_BroadcastScalarToVector128) ||
+                                (nestedIntrinId == NI_AVX2_BroadcastScalarToVector256) ||
+                                (nestedIntrinId == NI_AVX512F_BroadcastScalarToVector512))
+                            {
+                                // We need to rewrite the embedded broadcast back to a regular constant
+                                // so that the subsequent containment check for ptestm can determine
+                                // if the embedded broadcast is still relevant
+
+                                GenTree* broadcastOp = nestedIntrin->Op(1);
+
+                                if (broadcastOp->OperIsHWIntrinsic(NI_Vector128_CreateScalarUnsafe))
+                                {
+                                    BlockRange().Remove(broadcastOp);
+                                    broadcastOp = broadcastOp->AsHWIntrinsic()->Op(1);
+                                }
+
+                                assert(broadcastOp->OperIsConst());
+
+                                GenTree* vecCns =
+                                    comp->gtNewSimdCreateBroadcastNode(simdType, broadcastOp,
+                                                                       op1Intrinsic->GetSimdBaseJitType(), simdSize);
+
+                                BlockRange().InsertAfter(broadcastOp, vecCns);
+                                nestedOp2 = vecCns;
+
+                                BlockRange().Remove(broadcastOp);
+                                BlockRange().Remove(nestedIntrin);
+                            }
+                        }
+
+                        node->Op(1) = nestedOp1;
+                        node->Op(2) = nestedOp2;

                        // Make sure we aren't contained since ptestm will do its own containment check
-                        node->Op(2)->ClearContained();
+                        nestedOp2->ClearContained();
+
+                        if (varTypeIsSmall(simdBaseType))
+                        {
+                            // Fixup the base type so embedded broadcast and the mask size checks still work
+                            node->NormalizeJitBaseTypeToInt(testIntrinsicId, simdBaseType);
+
+                            simdBaseJitType = node->GetSimdBaseJitType();
+                            simdBaseType    = node->GetSimdBaseType();
+
+                            maskBaseJitType = simdBaseJitType;
+                            maskBaseType    = simdBaseType;
+                        }


With the normalization, we could now more easily encounter something that looked like (x & y) == 0 where (x & y) was TYP_INT or TYP_UINT but tmp == 0 was still a small type. -- This was also possible before the fix, so it's an existing bug, but it was less likely since it wouldn't have been typical for a user to do something like (x & y).AsByte() == 0

In such a case, (x & y) may have also already made y an embedded broadcast, and that would cause a disconnect.

The simplest fix would've been to simply not do this optimization if op2 of the AND operation was already contained, but that had some fairly significant size regressions and a measurable perf hit to some core APIs, including GetIndexOfFirstNonAsciiChar_Vector as the inner loop codegen was no longer doing a simple check and branch.

So I went with the more verbose "proper" fix that instead ensures the type of tmp == 0 is fixed up and the containment of op2 is properly resolved so that the rewritten PTESTM can properly contain the operation itself.
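As a sanity check on why fixing up the base type of the comparison is harmless for this pattern, here is a scalar C++ sketch. It assumes the comparison being rewritten is a whole-vector `(x & y) == 0` check; `Vec128`, `testMask`, and `andIsZero` are hypothetical names, and the per-lane loop only models a ptestm-style test rather than reproducing the JIT's lowering.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec128 = std::array<uint8_t, 16>;

// Scalar model of a ptestm-style test: mask bit i is set when lane i of (x & y) is non-zero.
template <typename ElemT>
unsigned testMask(const Vec128& x, const Vec128& y)
{
    unsigned mask = 0;
    constexpr std::size_t lanes = 16 / sizeof(ElemT);

    for (std::size_t i = 0; i < lanes; i++)
    {
        ElemT xl, yl;
        std::memcpy(&xl, x.data() + i * sizeof(ElemT), sizeof(ElemT));
        std::memcpy(&yl, y.data() + i * sizeof(ElemT), sizeof(ElemT));

        if ((xl & yl) != 0)
        {
            mask |= 1u << i;
        }
    }
    return mask;
}

// Whole-vector check: (x & y) == 0 holds exactly when no lane sets a mask bit.
template <typename ElemT>
bool andIsZero(const Vec128& x, const Vec128& y)
{
    return testMask<ElemT>(x, y) == 0;
}
```

The per-lane masks differ with the lane width (a non-zero byte at index 5 sets bit 5 for byte lanes but bit 1 for 32-bit lanes), yet `andIsZero<uint8_t>` and `andIsZero<uint32_t>` always agree. What must stay consistent is the element size used by the test, its mask, and any embedded broadcast operand.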

JulieLeeMSFT · 2024-08-14T22:35:16Z

@TIHan is checking spmi asmdiffs test failure.

TIHan · 2024-08-14T22:56:29Z

The diffs were failing to acquire the mch files, so it didn't run anything. It's re-running now and has been for about 45 min.

tannergooding · 2024-08-14T22:58:45Z

The last set of spmi failures were all due to the downloads failing; generally that's due to a JIT/EE version change going in (like the one we had earlier today: https://github.com/dotnet/runtime/commits/9d9af3d22f4b9fdd453790eecab63795f896ea1a/src/coreclr/inc/jiteeversionguid.h)

The current jitstress-isas-avx512 failure is unrelated; the history shows it's been failing off and on for a while now: https://dev.azure.com/dnceng-public/public/_build/results?buildId=776127&view=ms.vss-test-web.build-test-results-tab&runId=19865742&resultId=118465&paneView=history

tannergooding · 2024-08-14T23:01:26Z

I ran the full PMI diffs locally (jit-diff.exe diff --diff --pmi --tests, jit-diff.exe diff --diff --pmi --frameworks, and jit-diff.exe diff --diff --pmi --benchmarks), and there are no failures compared to existing dotnet/main (there have historically been a few, such as for FSharp.Core and a few interop tests).

tannergooding · 2024-08-14T23:59:47Z

/ba-g runtime queue completed but results were not reported due to GitHub outage: https://dev.azure.com/dnceng-public/public/_build/results?buildId=776122&view=results

Issues are known: #100558 and #106428

Ruihan-Yin · 2024-08-15T00:04:40Z

Thanks for the investigation! Is there any other blocker before merging?

tannergooding · 2024-08-15T00:04:41Z

Some spmi-diffs jobs failed to acquire the MCH files again, due to the JIT/EE version guid change

Local runs of both SPMI diffs and full PMI diffs are passing. Just waiting on SPMI replay to finish

tannergooding · 2024-08-15T00:09:15Z

SPMI replay has also finished, only failures are also MCH acquisition issues due to the JIT/EE version change. They remain passing locally.

fix dotnet#105969

78d6f56

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 10, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 10, 2024

This was referenced Aug 10, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

System.Threading.ThreadPools.Tests.ThreadPoolTests.IOCompletionPortCountConfigVarTest failure - time out, unexpected exit code #106206

Closed

formatting

e17c891

tannergooding reviewed Aug 10, 2024

JulieLeeMSFT assigned Ruihan-Yin and TIHan Aug 12, 2024

JulieLeeMSFT added this to the 9.0.0 milestone Aug 12, 2024

reset the base type of some intrinsics when the actual base type mismatches with the data type implied by the intrinsics.

4c63c15

This was referenced Aug 12, 2024

msbuild crashes with "MSB0001: Internal MSBuild Error: must be valid" dotnet/dnceng#3304

Open

TimeProviderTests.TestProviderTimer failed in CI #103459

Closed

Merge branch 'main' into runtime_105969

9c74ed4

tannergooding reviewed Aug 13, 2024

Ruihan-Yin added 3 commits August 13, 2024 13:40

Move the normalization to import.

cfa61e5

clean up

1a60820

bug fix

f62ef56

build-analysis bot mentioned this pull request Aug 13, 2024

MSBuild crashing in the build #92290

Open

Ruihan-Yin added 2 commits August 13, 2024 14:02

formatting.

98f7f1d

bug fix

b158fb5

tannergooding closed this Aug 14, 2024

tannergooding reopened this Aug 14, 2024

tannergooding reviewed Aug 14, 2024

resolve comments.

9ab0d58

JulieLeeMSFT requested a review from TIHan August 14, 2024 16:02

tannergooding approved these changes Aug 14, 2024

Ensure that we're computing the correct memory operand size for disassembly

c8682a4

tannergooding mentioned this pull request Aug 14, 2024

Ensure that we're computing the correct memory operand size for disassembly #106405

Closed

TIHan approved these changes Aug 14, 2024

Merge remote-tracking branch 'dotnet/main' into runtime_105969

c2e4a82

Ensure that we correctly handled rewriting PTESTM to account for a nested AND having an embedded broadcast

9bd684f

tannergooding reviewed Aug 14, 2024

TIHan approved these changes Aug 14, 2024

tannergooding merged commit ab3c7da into dotnet:main Aug 15, 2024
107 of 121 checks passed

build-analysis bot mentioned this pull request Aug 15, 2024

Miscellaneous.CopyCtor.ValidateCopyConstructorAndDestructorCalled failing with "Values Differ" #106428

Closed

DrewScoggins mentioned this pull request Aug 20, 2024

[Perf] Windows/x64: 12 Regressions on 8/15/2024 12:39:36 AM #106706

Closed

This was referenced Aug 20, 2024

[Perf] Linux/x64: 47 Improvements on 8/15/2024 12:39:36 AM dotnet/perf-autofiling-issues#40105

Closed

[Perf] Windows/x64: 25 Improvements on 8/15/2024 12:39:36 AM dotnet/perf-autofiling-issues#40073

Closed

github-actions bot locked and limited conversation to collaborators Sep 14, 2024
