Aymen Qader
3d282c93c5
Add Arm®v9-A architecture SME SGEMM kernels
Add implementation of SGEMM based on the Arm®v9-A architecture Scalable
Matrix Extension (SME) [1], using the Arm C Language Extensions (ACLE)
[2].
Add SME2 compute & packing kernels for SGEMM and enable them under the
ARMV9SME target.
The compute kernel performs outer products on panels of A and B,
accumulating into 2x2 inner blocks of C via the SME two-dimensional
architectural register, ZA.
The non-transpose packing kernel performs a copy into a contiguous
buffer using SVE loads & stores in Streaming SVE mode. Streaming SVE is
an execution mode introduced by SME that supports execution of SVE code
with the SME defined vector length, known as the Streaming SVE vector
length (SVL).
The transpose packing kernel performs on-the-fly transposition by
utilizing horizontal & vertical tile slice access to the SME ZA
register.
Includes an update to the driver to account for the expanded inner block.
Note: this places the ARMV9SME target in WIP state. It is functional for
SGEMM, and all GEMM tests are passing. Other BLAS3 routines have not
been updated to match the larger kernel size, so SYMM/TRMM tests are
currently expected to fail in this WIP state.
[1] https://developer.arm.com/documentation/109246/0100/SME-Overview/SME-and-SME2
[2] https://arm-software.github.io/acle/main/acle.html
1 year ago
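The outer-product accumulation described in the SME SGEMM commit above can be illustrated with a scalar reference in plain C. This is only a sketch and not the kernel from the commit: the function name `sgemm_outer_product_ref`, the packed panel layout (one column of A and one row of B per step of k), and the row-major C block are assumptions made for the illustration. The real kernel performs each rank-1 update with SME instructions that accumulate into ZA tiles.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scalar model of outer-product accumulation (hypothetical helper).
 *   a_panel: k steps, each holding m values (one column of A per step)
 *   b_panel: k steps, each holding n values (one row of B per step)
 *   c:       m x n block, row-major with leading dimension ldc,
 *            updated in place (C += A * B)
 */
void sgemm_outer_product_ref(size_t m, size_t n, size_t k,
                             const float *a_panel, const float *b_panel,
                             float *c, size_t ldc)
{
    assert(ldc >= n);
    for (size_t p = 0; p < k; ++p) {
        const float *a_col = a_panel + p * m; /* column p of A */
        const float *b_row = b_panel + p * n; /* row p of B */
        /* Rank-1 update: C += a_col * b_row^T */
        for (size_t i = 0; i < m; ++i) {
            for (size_t j = 0; j < n; ++j) {
                c[i * ldc + j] += a_col[i] * b_row[j];
            }
        }
    }
}
```

Each iteration of the outer loop corresponds to one outer product of a column of A with a row of B, which is why both operands are packed so that those slices are contiguous in memory.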
Aymen Qader
b036235f37
Add Arm®v9-A architecture SME target
Add a new target, ARMV9SME, for Arm®v9-A architecture systems that
support the Scalable Matrix Extension (SME) [1].
Initially inherits ARMV8SVE settings with updated compiler flags. This
target can only be built with an SME-capable toolchain such as GCC 14 or
LLVM 19.
Includes some initial FEAT_SME2 feature detection on Linux targets via
hwcaps. Target is disabled in DYNAMIC_ARCH builds by default.
This is intended as a base target for SME2 kernels.
[1] https://developer.arm.com/documentation/109246/0100/SME-Overview/SME-and-SME2
1 year ago
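The hwcaps-based feature detection mentioned in the SME target commit above might look roughly like the following on Linux. This is a minimal sketch, not the code from the commit: the helper names `hwcap2_has_sme2` and `cpu_has_sme2` are hypothetical, and the bit position is an assumption taken from the Linux arm64 `HWCAP2_SME2` definition, so it should be checked against the kernel headers in use.

```c
#include <assert.h>
#include <stdbool.h>
#if defined(__linux__)
#include <sys/auxv.h>
#endif

/* Assumption: mirrors HWCAP2_SME2 from the Linux arm64 <asm/hwcap.h>. */
#define SKETCH_HWCAP2_SME2 (1ULL << 37)

/* Pure helper: tests the SME2 bit in an AT_HWCAP2 value. */
bool hwcap2_has_sme2(unsigned long long hwcap2)
{
    return (hwcap2 & SKETCH_HWCAP2_SME2) != 0;
}

/* Queries the running kernel; reports false on non-Linux or non-AArch64. */
bool cpu_has_sme2(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(AT_HWCAP2)
    return hwcap2_has_sme2(getauxval(AT_HWCAP2));
#else
    return false;
#endif
}
```

Separating the bit test from the `getauxval` call keeps the check testable on machines that do not expose the capability.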
Martin Kroeker
89f02ed394
Merge pull request #5014 from martin-frbg/issue5013
Add some missed lapack 3.11+ symbols to gensymbol
1 year ago
Martin Kroeker
61d5aec7c1
remove typo
1 year ago
Martin Kroeker
5aea097df0
add missing lapack 3.11+ symbols
1 year ago
Martin Kroeker
72f7b7011c
Merge pull request #5009 from martin-frbg/pybenchdoc
DOCS, pybench : Add build notes for Windows and flang from gh Discussion 5008
1 year ago
Martin Kroeker
0f8ff82592
Add build notes for Windows and flang from gh Discussion 5008
1 year ago
Martin Kroeker
81666de4ef
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
1 year ago
Martin Kroeker
230e665bca
Merge pull request #4996 from iha-taisei/sdgemv_sve_unroll
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
1 year ago
Martin Kroeker
3345007d8f
retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN
1 year ago
Martin Kroeker
5fe983db29
retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies
1 year ago
Martin Kroeker
5dc4d7dd7e
Merge pull request #5005 from martin-frbg/evbarm
Improve support for NetBSD on arm64 (evbarm)
1 year ago
Martin Kroeker
4ba471dd5a
Merge pull request #5003 from mathomp4/bugfix/nag-pic
Fixes for NAG Compiler
1 year ago
Martin Kroeker
a791912cbb
handle uname returning evbarm on NetBSD
1 year ago
Martin Kroeker
1a6ecda398
utilize /proc/cpuinfo on NetBSD too
1 year ago
Matthew Thompson
c4e8bac5a5
Fix indent
1 year ago
Matthew Thompson
d3b2036d49
Move to use ERROR STOP instead of ABORT
1 year ago
Matthew Thompson
35334ed2ea
Fixes for Fortran Standards violations for lapack-netlib
1 year ago
Matthew Thompson
be19966d3b
Fixes for NAG CMake
1 year ago
Martin Kroeker
9c5d20187b
Merge pull request #4999 from dg0yt/macro-failed
Fix redefinition of FAILED
1 year ago
Matthew Thompson
2eaf285de5
Use F_COMPILER name
1 year ago
Matthew Thompson
a8b1705dbd
CMake build has wrong PIC flag for NAG
1 year ago
Martin Kroeker
5f65846691
Merge pull request #4998 from dg0yt/arm-type-function
arm: Declare symbols as .type function
1 year ago
Kai Pastor
93eb42fdc8
Fix redefinition of FAILED
1 year ago
Kai Pastor
dc905636d1
arm: Declare symbols as .type function
1 year ago
Iha, Taisei
4918beecbe
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
1 year ago
Martin Kroeker
0578a89afd
Merge pull request #4993 from martin-frbg/issue4991
Translate CMAKE_SYSTEM_NAME in compilations on or for IOS
1 year ago
Martin Kroeker
57a51d74c9
translate CMAKE_SYSTEM_NAME in compilations on or for IOS
1 year ago
Martin Kroeker
35f2e6afe6
Merge pull request #4992 from mmuetzel/ci-msys2
CI (MinGW): Remove CLANG32 environment from build matrix.
1 year ago
Markus Mützel
f5e6b5b5c9
CI (MinGW): Remove CLANG32 environment from build matrix.
The CLANG32 environment is currently being removed from MSYS2:
https://www.msys2.org/news/#2024-09-23-starting-to-drop-the-clang32-environment
Remove it from the build matrix ahead of its complete removal from MSYS2.
1 year ago
Martin Kroeker
8e8003a2d1
Merge pull request #4180 from mmuetzel/cmake
CI (MinGW): Remove work-around needed for old versions of LLVM Flang
1 year ago
Martin Kroeker
71963a7bc4
Merge pull request #4985 from CheryDan/RISCV/sched
added optimizations for RISC-V YIELDING
1 year ago
Markus Mützel
7452af4471
CI (MinGW): Remove work-around with NO_AVX512 that was needed for older versions of LLVM Flang.
2 years ago
Martin Kroeker
82088cb266
Merge pull request #4986 from martin-frbg/readme_compilers
Add compiler version notes and mention the f2c fallback LAPACK in the README
1 year ago
Martin Kroeker
8481301f1a
Merge pull request #4987 from martin-frbg/issue3973
Update build instructions for WoA (use LLVM19 and its flang-new)
1 year ago
Martin Kroeker
009c1e0387
fix download link for the current WoA binary of LLVM
1 year ago
Martin Kroeker
760a5371f3
Update build instructions for WoA (use LLVM19 and its flang-new)
1 year ago
Martin Kroeker
3a63bbabd1
Add compiler version notes and mention the f2c fallback LAPACK
1 year ago
Martin Kroeker
c520ed1916
Merge pull request #4984 from rgommers/docs-link
doc: update README to link to the html docs and fix links
1 year ago
daichengrong
0b3db03d4b
added optimizations for RISC-V YIELDING
1 year ago
Ralf Gommers
a0131e56e0
doc: update README to link to the html docs and fix links
Also some minor formatting improvements and linking the home page.
1 year ago
Martin Kroeker
18014b04c8
Merge pull request #4979 from martin-frbg/issue4978-2
Remove any optimization flags from DEBUG builds on POWER architecture
1 year ago
Martin Kroeker
9db51f790a
Remove any optimization flags from DEBUG builds on POWER architecture
1 year ago
Martin Kroeker
e334b79b47
Merge pull request #4977 from martin-frbg/issue4973
Add dummy implementations of openblas_get/set_affinity for OpenMP builds
1 year ago
Martin Kroeker
4060dd43e3
Add dummy implementations of openblas_get/set_affinity
1 year ago
Martin Kroeker
2e2f952bfb
Merge pull request #4975 from martin-frbg/fixup4974
Update Cray compiler options and calling convention in CMake
1 year ago
Martin Kroeker
cea9df3643
Update Cray compiler options and calling convention
1 year ago
Martin Kroeker
3e7e312d7d
Merge pull request #4974 from cenewcombe/develop
Corrections for Cray and Nvidia Fortran compiler calling conventions
1 year ago
Caroline Newcombe
10cf06dce1
Merge branch 'OpenMathLib:develop' into develop
1 year ago
Caroline Newcombe
760bf7aa37
Update Fortran return for complex data types (Cray and Nvidia compilers)
1 year ago