Martin Kroeker
|
ef8a44d981
|
Merge 2b5d8c789d into 06c09deee9
|
5 months ago |
Martin Kroeker
|
06c09deee9
|
Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve
Implementing SVE in `[SD]AXPY` Kernels for `A64FX` and `Graviton3E`
|
5 months ago |
Martin Kroeker
|
da7d0f4a38
|
Merge pull request #5427 from yuanjia111/develop
Optimize the gemv_t_vector.c kernel for RISCV64_ZVL256B target
|
5 months ago |
Martin Kroeker
|
2b5d8c789d
|
remove debugging printout
|
5 months ago |
Martin Kroeker
|
1b88c9c742
|
remove debugging printouts
|
5 months ago |
Martin Kroeker
|
b4fc09e9e1
|
Add registers d8 to d15 to clobber lists as the code does not expressly save them
|
5 months ago |
Martin Kroeker
|
8e50b8d525
|
Add d8 to d15 to clobber lists as the code does not expressly save them
|
5 months ago |
Martin Kroeker
|
7f89c6f353
|
SME-based direct sgemm currently requires leading dimensions to be the same as the matrix dimensions
|
5 months ago |
yuanjia
|
c2cc7a3602
|
riscv64: optimize gemv_t_vector.c
|
5 months ago |
h-motoki
|
e23f9c6642
|
Merge remote-tracking branch 'upstream/develop' into issue5417_axpy_sve
|
5 months ago |
Martin Kroeker
|
b3f247ae5a
|
Merge pull request #5425 from martin-frbg/fixup5389
Increase L2 defaults for RISCV X280 / ZVL256B and ARM SVE targets in CMake cross-compilation
|
5 months ago |
h-motoki
|
855945befb
|
Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E
|
5 months ago |
Martin Kroeker
|
7c1839899e
|
Increase assumed L2 sizes for RISCV X280 / ZVL256B and for SVE-capable ARM64
|
5 months ago |
Martin Kroeker
|
1ee8879c78
|
Add VORTEXM4
|
5 months ago |
Martin Kroeker
|
edaa73fd24
|
Hide the local 2VLx2VL symbol as static is insufficient for this with gcc
|
5 months ago |
Martin Kroeker
|
501728a354
|
adjust register 20 accesses to 21 after moving x18
|
5 months ago |
Martin Kroeker
|
107c883c8a
|
Update SME-related kernels
|
5 months ago |
Martin Kroeker
|
05dbb54362
|
Delete misplaced file
|
5 months ago |
Martin Kroeker
|
4609732e69
|
Relax version number requirement for AppleClang
|
5 months ago |
Martin Kroeker
|
bf98e448eb
|
Add VORTEXM4 to DYNAMIC_ARCH list
|
5 months ago |
Martin Kroeker
|
0bc19a1335
|
Update SME kernel details
|
5 months ago |
Martin Kroeker
|
426b5f23ed
|
Add compiler options for VORTEXM4
|
5 months ago |
Martin Kroeker
|
4328c91e27
|
relax requirements in compiler SME capability check
|
5 months ago |
Martin Kroeker
|
c794d0a4ce
|
Add VORTEXM4
|
5 months ago |
Martin Kroeker
|
a4f5fec46e
|
Add compiler options for VORTEXM4
|
5 months ago |
Martin Kroeker
|
ca542f319f
|
Add VORTEXM4
|
5 months ago |
Martin Kroeker
|
18f9582f3e
|
Add VORTEXM4
|
5 months ago |
Martin Kroeker
|
4e2a8c18e5
|
Split VORTEXM4 from VORTEX target due to SME support
|
5 months ago |
Martin Kroeker
|
30970460b8
|
Add VORTEXM4 target
|
5 months ago |
Martin Kroeker
|
b0a00fbd62
|
Add minimal compiler flags for VORTEXM4
|
5 months ago |
Martin Kroeker
|
ccfd0170fb
|
Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list
|
5 months ago |
Martin Kroeker
|
ef0b883dff
|
Add sgemm_direct_performant for ARM64
|
5 months ago |
Martin Kroeker
|
e76c39099a
|
Add sgemm_direct_performant for ARM64
|
5 months ago |
Martin Kroeker
|
202a7a0e2a
|
Separate VORTEXM4 from VORTEX and ARMV9SME
|
5 months ago |
Martin Kroeker
|
de91afd2ae
|
Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64
|
5 months ago |
Martin Kroeker
|
0203657f40
|
Add sgemm_direct_performant for ARM64
|
5 months ago |
Martin Kroeker
|
e82bcd2740
|
Update ARM64 sgemm_direct object generation
|
5 months ago |
Martin Kroeker
|
731f4dd686
|
Add VORTEXM4 settings
|
5 months ago |
Martin Kroeker
|
53d3bb50cc
|
Get symbol name from build system; change b.first to b.mi for AppleClang compatibility
|
5 months ago |
Martin Kroeker
|
08a00326a4
|
Build symbol name from build system variables
|
5 months ago |
Martin Kroeker
|
89898fc499
|
Add sgemm_direct_performant for switching between direct and regular kernels
|
5 months ago |
Martin Kroeker
|
22c6607db9
|
Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS
|
5 months ago |
Martin Kroeker
|
ca22e28ca1
|
Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S
|
5 months ago |
Martin Kroeker
|
9c43301b6d
|
Merge pull request #5421 from reibax-marcus/develop
fix: broken cblas installation when using makefile based builds
|
5 months ago |
Martin Kroeker
|
9d6df1dd3e
|
Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking
Add and use vectorized packing in ZVL128B and ZVL256B for RISCV
|
6 months ago |
Martin Kroeker
|
f3b2a15fad
|
Merge pull request #5420 from yuanjia111/develop
Move the value assignment of vector x in gemv_n_sve.c to the outermos…
|
6 months ago |
Chip Kerchner
|
64401b4417
|
Disable vectorized packing for DGEMM - since it is slower than scalar.
|
6 months ago |
Martin Kroeker
|
5e43ba948c
|
Merge pull request #5419 from Mousius/bgemm-optimisation
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
|
6 months ago |
Chip Kerchner
|
c00afc86a6
|
Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions.
|
6 months ago |
Xabier Marquiegui
|
3a6b79c50f
|
fix: broken cblas installation when using makefile based builds
Fix cblas.h missing from target directory if NO_CBLAS is defined but has
a value that indicates you do want cblas built and installed.
|
6 months ago |