Martin Kroeker
22c6607db9
Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS
9 months ago
Martin Kroeker
ca22e28ca1
Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S
9 months ago
Martin Kroeker
9c43301b6d
Merge pull request #5421 from reibax-marcus/develop
fix: broken cblas installation when using makefile based builds
9 months ago
Martin Kroeker
9d6df1dd3e
Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking
Add and use vectorized packing in ZVL128B and ZVL256B for RISCV
9 months ago
Martin Kroeker
f3b2a15fad
Merge pull request #5420 from yuanjia111/develop
Move the value assignment of vector x in gemv_n_sve.c to the outermos…
9 months ago
Chip Kerchner
64401b4417
Disable vectorized packing for DGEMM - since it is slower than scalar.
9 months ago
Martin Kroeker
5e43ba948c
Merge pull request #5419 from Mousius/bgemm-optimisation
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
9 months ago
Chip Kerchner
c00afc86a6
Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions.
9 months ago
Xabier Marquiegui
3a6b79c50f
fix: broken cblas installation when using makefile based builds
Fix cblas.h missing from target directory if NO_CBLAS is defined but has
a value that indicates you do want cblas built and installed.
9 months ago
yuanjia
803e8d4838
Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval.
1. Verify correctness using BLAS-Tester.
2. Verify performance using the built-in benchmark: float and double performance improved by about 60% and about 40% respectively. The test commands are:
export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100
export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100
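As a rough illustration of the idea described in this commit (not the actual SVE kernel in gemv_n_sve.c), the scalar sketch below reads and scales each element of x once in the outermost loop and reuses it across the whole column. The function name `gemv_n_sketch`, its signature, and the column-major layout with leading dimension `lda` are assumptions made for this example only.

```c
#include <stddef.h>

/* Hypothetical scalar sketch of y += alpha * A * x for a column-major
 * m-by-n matrix A with leading dimension lda. This is an illustration
 * under stated assumptions, not the OpenBLAS SVE implementation. */
void gemv_n_sketch(size_t m, size_t n, float alpha,
                   const float *a, size_t lda,
                   const float *x, float *y)
{
    for (size_t j = 0; j < n; j++) {
        /* x[j] is loaded and scaled once per column, in the outermost loop,
         * instead of being re-read for every block of rows. */
        const float xj = alpha * x[j];
        const float *col = a + j * lda;

        for (size_t i = 0; i < m; i++) {
            y[i] += col[i] * xj;
        }
    }
}
```

With this loop order the inner loop touches only the matrix column and y, so the value taken from x stays in a register for the duration of the column.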
9 months ago
Chris Sidebottom
5f47b872f1
Remove older kernels for BGEMM on NEOVERSEV1
9 months ago
Chris Sidebottom
114316f361
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
This changes the kernels to pack full SVE vectors and reduces the
overall complexity of the inner GEMM loop.
9 months ago
Martin Kroeker
75c6ab4036
CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411)
* Update to 20.1.8
* fix PATH to avoid the obsolete LLVM19 that appeared in the preinstalled msvc folder hierarchy
9 months ago
Martin Kroeker
5c5f852ee3
Merge pull request #5415 from martin-frbg/Fixum-5399
Fix compilation of the NeoverseN2 SBGEMM kernel
9 months ago
Martin Kroeker
f1ee61ea30
Include NEON header for the bfloat conversion functions
10 months ago
Martin Kroeker
b3ffd5524a
Include NEON header for the bfloat conversion functions
10 months ago
Martin Kroeker
d23680b81d
Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1
Multi-thread Performance Improvement of GEMM on NeoverseV1 with DIVIDE_RATE=1
10 months ago
Martin Kroeker
b4cc4be2ce
Merge pull request #5410 from martin-frbg/issue5404
Adjust multithreading threshold in S/DGER and add an intermediate step
10 months ago
Martin Kroeker
0968dddf1a
Merge pull request #5409 from martin-frbg/issue5372
Work around gcc15.1 on POWER misoptimizing DGEMV at -O3
10 months ago
Martin Kroeker
eddfe1e6b3
Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings
Fix bad vector zero initializer and other compiler warnings for RISC-V.
10 months ago
Martin Kroeker
30d11bc92c
Adjust multithreading threshold and add an intermediate step
10 months ago
Martin Kroeker
a3b9c933c5
mark xbuffer as volatile to work around gcc15.1 optimizer bug
10 months ago
Chip Kerchner
72f082f31d
Fix bad vector zero initializer and other compiler warnings for RISC-V.
10 months ago
Masato Nakagawa
7e29f11396
Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)
10 months ago
Martin Kroeker
9a64b32b44
Merge pull request #5406 from martin-frbg/fixbgemmtest
Fix building of bgemm tests on GEMM3M-capable (x86) targets
10 months ago
Martin Kroeker
b66a01f909
Fix building of bgemm tests on GEMM3M-capable (x86) targets
10 months ago
Martin Kroeker
a5e7c0e3e0
Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16
ARM64: Enable bfloat16 kernels by default
10 months ago
abhishek-fujitsu
6356190d06
fix gfortran link path in dynamic_arch.yml
10 months ago
abhishek-fujitsu
4c8dcb3a8f
Darwin/arm64: disable SVE/SME and fix gfortran link path
10 months ago
Martin Kroeker
33b50548eb
Merge pull request #5403 from martin-frbg/issue5402
Introduce a (crude) threshold to multithreading in STRMV/DTRMV
10 months ago
Martin Kroeker
c504aedca1
Merge pull request #5400 from Mousius/neoversev2-target
Add NEOVERSEV2 target support
10 months ago
Martin Kroeker
b9e107932a
add NeoverseV2
10 months ago
Martin Kroeker
2f89a5970e
fix NeoverseV2 typo
10 months ago
Martin Kroeker
a9e8fa06bf
Introduce a (crude) threshold to multithreading
10 months ago
Martin Kroeker
b4c2b34a45
Merge pull request #5401 from martin-frbg/followup-5397
Include float-bfloat conversion functions in ONLY_CBLAS builds as well
10 months ago
Martin Kroeker
c9204f7b6f
Merge pull request #5399 from Mousius/bgemm-8x4
Add optimized BGEMM for NEOVERSEN2 target
10 months ago
Martin Kroeker
a55e65dba9
Merge pull request #5391 from martin-frbg/issue5387
Use OpenBLAS_ROOT_DIR in OpenBLASConfig.cmake generation only if set
10 months ago
abhishek-fujitsu
0bc79da587
add neon header
10 months ago
abhishek-fujitsu
720a4743b9
update contribution list
10 months ago
abhishek-fujitsu
05fc88180c
ARM64: Enable bfloat16 kernels by default
1 year ago
Martin Kroeker
965463f177
Include float-bfloat conversion functions in ONLY_CBLAS builds as well
10 months ago
Martin Kroeker
4272cf8c7f
Merge pull request #5398 from martin-frbg/fixup-5394
Update ?GEMM-to-?GEMV forwarding settings for CMake
10 months ago
Chris Sidebottom
87247daadc
Add NEOVERSEV2 target support
Did a quick run around to make `TARGET=NEOVERSEV2` build successfully.
Fixes #5385
10 months ago
Chris Sidebottom
ea2faf0c9a
Add optimized BGEMM for NEOVERSEN2 target
This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
10 months ago
Martin Kroeker
a5b55f6fe3
remove CBLAS restriction on GEMM_GEMV forwarding
10 months ago
Martin Kroeker
a4f4662459
Merge pull request #5397 from omegacoleman/fix-cblas-bgemm
Fix cmake building with cblas_bgemm
10 months ago
Martin Kroeker
82954ba4ca
Update ?GEMM-to-?GEMV forwarding settings
10 months ago
Martin Kroeker
392d38168e
Merge pull request #5394 from Mousius/optimize-bgemv
Optimized BGEMV for NEOVERSEV1 target
10 months ago
youcai
41f9701ebc
Fix cmake building with cblas_bgemm
10 months ago
Martin Kroeker
f4caa61e47
Merge pull request #5395 from martin-frbg/fixloongsonCI
Fix libffi6 download in the Loongarch64_clang CI job (for now)
10 months ago