xe: gemm: fix stride interface #5068

Open
rjoursler wants to merge 1 commit into main from rjoursle/gemm_stride

@rjoursler
Contributor

Avoids incorrect offset calculations for batched GEMM. Fixes the following issue reported in MFDNN-14479:

```
$ ./tests/benchdnn/benchdnn --matmul --engine=gpu --dt=f32:f32:f32 --stag=acbd --wtag=adbc --dtag=abcd --attr-post-ops=binary_mul:f32:0 --attr-scratchpad=user 4x3x16413x16:4x3x16x16413
Segmentation fault from GPU at 0xff0000000000f000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 1 (PDE), access: 1 (Write), banned: 1, aborting.
Segmentation fault from GPU at 0xff0000000000f000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 1 (PDE), access: 1 (Write), banned: 1, aborting.
Abort was called at 306 line in file:
./shared/source/os_interface/linux/drm_neo.cpp
Aborted (core dumped)
```

@rjoursler rjoursler requested a review from a team as a code owner April 22, 2026 17:38
@github-actions github-actions Bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Apr 22, 2026
@rjoursler
Contributor Author

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb

@rjoursler
Contributor Author

make test_ov

```cpp
arg_list.set(argn++, pd()->scale_stride(i, eff_b_arg));
}
if (problem->hasCMXScale()) {
arg_list.set(argn++, stride_c / problem->cqGroupM);
```
Contributor

This is now 64-bit, but the kernel interface uses 32-bit. I guess we don't have any type checks, but in theory that could lead to a similar overflow issue.

Contributor Author

Good points. I will try updating these as well and check whether any issues come up.

@Simonsays095
Contributor

Could we do something similar to what we do with the opencl dispatcher, and use 64-bit strides/dimensions only if required by the problem?

@rjoursler
Contributor Author

rjoursler commented Apr 24, 2026

> Could we do something similar to what we do with the opencl dispatcher, and use 64-bit strides/dimensions only if required by the problem?

We definitely can, but I only intend to do this if we encounter a performance regression. If the offset calculation is not important for performance (which is generally the case for GEMM), then adding this control has no benefit; it only creates an extra point of failure.

Avoids incorrect offset calculations for batched GEMM.
@rjoursler rjoursler force-pushed the rjoursle/gemm_stride branch from c6ddc5f to 053f482 April 27, 2026 22:39
@rjoursler
Contributor Author

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb

@rjoursler
Contributor Author

make test_ov

6 participants