Martin Kroeker
1ee8879c78
Add VORTEXM4
5 months ago
Martin Kroeker
edaa73fd24
Hide the local 2VLx2VL symbol, as static is insufficient for this with gcc
5 months ago
Martin Kroeker
501728a354
adjust register 20 accesses to 21 after moving x18
5 months ago
Martin Kroeker
107c883c8a
Update SME-related kernels
5 months ago
Martin Kroeker
05dbb54362
Delete misplaced file
5 months ago
Martin Kroeker
4609732e69
Relax version number requirement for AppleClang
5 months ago
Martin Kroeker
bf98e448eb
Add VORTEXM4 to DYNAMIC_ARCH list
5 months ago
Martin Kroeker
0bc19a1335
Update SME kernel details
5 months ago
Martin Kroeker
426b5f23ed
Add compiler options for VORTEXM4
5 months ago
Martin Kroeker
4328c91e27
relax requirements in compiler SME capability check
5 months ago
Martin Kroeker
c794d0a4ce
Add VORTEXM4
5 months ago
Martin Kroeker
a4f5fec46e
Add compiler options for VORTEXM4
5 months ago
Martin Kroeker
ca542f319f
Add VORTEXM4
5 months ago
Martin Kroeker
18f9582f3e
Add VORTEXM4
5 months ago
Martin Kroeker
4e2a8c18e5
Split VORTEXM4 from VORTEX target due to SME support
5 months ago
Martin Kroeker
30970460b8
Add VORTEXM4 target
5 months ago
Martin Kroeker
b0a00fbd62
Add minimal compiler flags for VORTEXM4
5 months ago
Martin Kroeker
ccfd0170fb
Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list
5 months ago
Martin Kroeker
ef0b883dff
Add sgemm_direct_performant for ARM64
5 months ago
Martin Kroeker
e76c39099a
Add sgemm_direct_performant for ARM64
5 months ago
Martin Kroeker
202a7a0e2a
Separate VORTEXM4 from VORTEX and ARMV9SME
5 months ago
Martin Kroeker
de91afd2ae
Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64
5 months ago
Martin Kroeker
0203657f40
Add sgemm_direct_performant for ARM64
5 months ago
Martin Kroeker
e82bcd2740
Update ARM64 sgemm_direct object generation
5 months ago
Martin Kroeker
731f4dd686
Add VORTEXM4 settings
5 months ago
Martin Kroeker
53d3bb50cc
Get symbol name from build system; change b.first to b.mi for AppleClang compatibility
5 months ago
Martin Kroeker
08a00326a4
Build symbol name from build system variables
5 months ago
Martin Kroeker
89898fc499
Add sgemm_direct_performant for switching between direct and regular kernels
5 months ago
Martin Kroeker
22c6607db9
Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS
5 months ago
Martin Kroeker
ca22e28ca1
Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S
5 months ago
Martin Kroeker
9c43301b6d
Merge pull request #5421 from reibax-marcus/develop
fix: broken cblas installation when using makefile based builds
5 months ago
Martin Kroeker
9d6df1dd3e
Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking
Add and use vectorized packing in ZVL128B and ZVL256B for RISCV
5 months ago
Martin Kroeker
f3b2a15fad
Merge pull request #5420 from yuanjia111/develop
Move the value assignment of vector x in gemv_n_sve.c to the outermos…
5 months ago
Chip Kerchner
64401b4417
Disable vectorized packing for DGEMM - since it is slower than scalar.
5 months ago
Martin Kroeker
5e43ba948c
Merge pull request #5419 from Mousius/bgemm-optimisation
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
5 months ago
Chip Kerchner
c00afc86a6
Add and use vectorized packing in ZVL128B and ZVL256B. Up to 3x or more faster than the generic scalar functions.
5 months ago
Xabier Marquiegui
3a6b79c50f
fix: broken cblas installation when using makefile based builds
Fix cblas.h missing from the target directory when NO_CBLAS is defined but
has a value indicating that cblas should be built and installed.
5 months ago
yuanjia
803e8d4838
Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval.
1. Verified correctness using BLAS-Tester.
2. Verified performance with the built-in benchmark: float and double performance improved by about 60% and 40%, respectively. Test commands:
export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100
export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100
5 months ago
Chris Sidebottom
5f47b872f1
Remove older kernels for BGEMM on NEOVERSEV1
5 months ago
Chris Sidebottom
114316f361
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
This changes the kernels to pack full SVE vectors and reduces the
overall complexity of the inner GEMM loop.
5 months ago
Martin Kroeker
75c6ab4036
CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411)
* Update to 20.1.8
* Fix PATH to avoid the obsolete LLVM19 that appeared in the preinstalled MSVC folder hierarchy
5 months ago
Martin Kroeker
5c5f852ee3
Merge pull request #5415 from martin-frbg/Fixum-5399
Fix compilation of the NeoverseN2 SBGEMM kernel
5 months ago
Martin Kroeker
f1ee61ea30
Include NEON header for the bfloat conversion functions
6 months ago
Martin Kroeker
b3ffd5524a
Include NEON header for the bfloat conversion functions
6 months ago
Martin Kroeker
d23680b81d
Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1
Multi-thread Performance Improvement of GEMM on NeoverseV1 with DIVIDE_RATE=1
6 months ago
Martin Kroeker
b4cc4be2ce
Merge pull request #5410 from martin-frbg/issue5404
Adjust multithreading threshold in S/DGER and add an intermediate step
6 months ago
Martin Kroeker
0968dddf1a
Merge pull request #5409 from martin-frbg/issue5372
Work around gcc15.1 on POWER misoptimizing DGEMV at -O3
6 months ago
Martin Kroeker
eddfe1e6b3
Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings
Fix bad vector zero initializer and other compiler warnings for RISC-V.
6 months ago
Martin Kroeker
30d11bc92c
Adjust multithreading threshold and add an intermediate step
6 months ago
Martin Kroeker
a3b9c933c5
mark xbuffer as volatile to work around gcc15.1 optimizer bug
6 months ago