27304fb29
(refs/pull/5432/merge)
Merge 7fcad02dc2 into c31861ea62 by
2025-09-03 13:36:58 -0400
9fd0a04e5
(refs/pull/5187/merge)
Merge b537c1be49 into c31861ea62 by
2025-09-02 18:57:29 -0600
01ab5e059
(refs/pull/5434/merge)
Merge ed6c223105 into c31861ea62 by
2025-09-02 18:44:21 -0400
e42106ab7
(refs/pull/5431/merge)
Merge ce79fe12fd into c31861ea62 by
2025-09-02 23:12:13 +0200
c31861ea6
(HEAD -> develop)
Merge pull request #5435 from martin-frbg/update_rvv_ci by
2025-09-02 14:11:16 -0700
9b28fed6a
(gh-pages)
deploy: 6d070820fc by
2025-09-02 19:28:31 +0000
57c2936a4
(refs/pull/5435/head)
Merge branch 'OpenMathLib:develop' into update_rvv_ci by
2025-09-02 12:09:30 -0700
6d070820f
Merge pull request #5436 from martin-frbg/update_osx_ci by
2025-09-02 12:09:09 -0700
1c7251ca2
(refs/pull/5436/head)
remove the -llto_library option for any osx fortran compiler by
2025-09-02 18:36:02 +0200
728c1e7c3
(refs/pull/3748/merge)
Merge c1a5a71d1c into 06c09deee9 by
2025-09-02 11:18:14 -0400
a1331406a
drop (re)installation of cmake on osx runners by
2025-09-02 15:39:08 +0200
c42fccccb
Drop installation of cmake by
2025-09-02 15:36:32 +0200
4c1a4e60a
Update toolchain to its latest nightly build by
2025-09-02 14:54:08 +0200
ed6c22310
(refs/pull/5434/head)
CMake: Improve the wording of the OpenMP mixed linkage check by
2025-09-01 22:33:52 -0400
fd8f0d4f8
CMake: Demote the OpenMP mixed linkage check to NOTICE by
2025-08-31 22:21:01 -0400
cef77f8e3
(refs/pull/4833/merge)
Merge 5f8744d4e4 into 06c09deee9 by
2025-08-29 18:09:13 +0800
96210915c
(refs/pull/4080/merge)
Merge 806073ccbc into 06c09deee9 by
2025-08-29 18:07:43 +0800
5a27c6cf6
(refs/pull/5256/merge)
Merge 8e47512286 into 06c09deee9 by
2025-08-29 06:06:28 -0400
488ecb444
(refs/pull/5326/merge)
Merge ed457343d5 into 06c09deee9 by
2025-08-29 18:05:46 +0800
8f0fbfe79
(refs/pull/5413/merge)
Merge 52792f6da7 into 06c09deee9 by
2025-08-29 10:24:11 +0200
7e9193509
(refs/pull/5318/merge)
Merge 2049628f22 into 06c09deee9 by
2025-08-29 14:50:47 +0900
1e581357f
(refs/pull/4270/merge)
Merge 4da1a0b1da into 06c09deee9 by
2025-08-29 14:50:21 +0900
a80d1d3cc
(refs/pull/5010/merge)
Merge dc68a48ddd into 06c09deee9 by
2025-08-28 10:52:09 -0400
7fcad02dc
(refs/pull/5432/head)
fix RVV 1.0 detection code by
2025-08-28 14:08:46 +0000
ce79fe12f
(refs/pull/5431/head)
disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1 by
2025-08-27 10:15:09 +0000
bcbfbde10
(refs/pull/5418/merge)
Merge b04ac31f6e into 06c09deee9 by
2025-08-27 22:34:20 +0530
59b189099
(refs/pull/5338/merge)
Merge fe783000d8 into 06c09deee9 by
2025-08-27 17:52:09 +0530
2d406ebde
(refs/pull/5393/merge)
Merge 06ced6da16 into 06c09deee9 by
2025-08-27 09:23:12 +0100
ef8a44d98
(refs/pull/5423/merge)
Merge 2b5d8c789d into 06c09deee9 by
2025-08-26 16:30:37 +0530
d2ae5fe70
(refs/pull/1752/merge)
Merge 8450c13fb1 into 06c09deee9 by
2025-08-26 17:15:44 +0900
620cc8daf
deploy: 06c09deee9 by
2025-08-26 08:10:46 +0000
06c09deee
Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve by
2025-08-26 01:10:14 -0700
60ef0f758
deploy: da7d0f4a38 by
2025-08-25 13:46:13 +0000
da7d0f4a3
Merge pull request #5427 from yuanjia111/develop by
2025-08-25 06:45:44 -0700
2b5d8c789
(refs/pull/5423/head)
remove debugging printout by
2025-08-24 13:50:08 -0700
1b88c9c74
remove debugging printouts by
2025-08-24 13:48:22 -0700
b4fc09e9e
Add registers d8 to d15 to clobber lists as the code does not expressly save them by
2025-08-23 14:39:27 -0700
8e50b8d52
Add d8 to d15 to clobber lists as the code does not expressly save them by
2025-08-23 14:36:49 -0700
7f89c6f35
SME-based direct sgemm currently requires leading dimensions to be the same as the matrix dimensions by
2025-08-23 14:20:15 -0700
765853c88
(refs/pull/4313/merge)
Merge d67a534b9e into b3f247ae5a by
2025-08-23 22:33:47 +0800
c2cc7a360
(refs/pull/5427/head)
riscv64: optimize gemv_t_vector.c by
2025-08-22 16:14:14 +0800
e23f9c664
(refs/pull/5426/head)
Merge remote-tracking branch 'upstream/develop' into issue5417_axpy_sve by
2025-08-21 22:16:28 +0900
af13d97bc
deploy: b3f247ae5a by
2025-08-21 12:14:03 +0000
b3f247ae5
Merge pull request #5425 from martin-frbg/fixup5389 by
2025-08-21 05:13:34 -0700
855945bef
Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E by
2025-08-21 20:56:58 +0900
7c1839899
(refs/pull/5425/head)
Increase assumed L2 sizes for RISCV X280 / ZVL256B and for SVE-capable ARM64 by
2025-08-21 11:57:07 +0200
1ee8879c7
Add VORTEXM4 by
2025-08-20 09:59:32 -0700
edaa73fd2
Hide the local 2VLx2VL symbol as static is insufficient for this with gcc by
2025-08-20 06:33:28 -0700
501728a35
adjust register 20 accesses to 21 after moving x18 by
2025-08-20 06:24:38 -0700
107c883c8
Update SME-related kernels by
2025-08-19 05:13:28 -0700
05dbb5436
Delete misplaced file by
2025-08-19 05:12:09 -0700
4609732e6
Relax version number requirement for AppleClang by
2025-08-18 14:54:20 -0700
bf98e448e
Add VORTEXM4 to DYNAMIC_ARCH list by
2025-08-18 14:43:08 -0700
0bc19a133
Update SME kernel details by
2025-08-18 14:38:16 -0700
426b5f23e
Add compiler options for VORTEXM4 by
2025-08-18 14:35:36 -0700
4328c91e2
relax requirements in compiler SME capability check by
2025-08-18 14:34:51 -0700
c794d0a4c
Add VORTEXM4 by
2025-08-18 14:33:24 -0700
a4f5fec46
Add compiler options for VORTEXM4 by
2025-08-18 14:32:07 -0700
ca542f319
Add VORTEXM4 by
2025-08-18 08:41:38 -0700
3bbee42dc
(refs/pull/4054/merge)
Merge 6d05b63bce into 9c43301b6d by
2025-08-18 20:34:24 +0500
18f9582f3
Add VORTEXM4 by
2025-08-18 01:54:09 -0700
4e2a8c18e
Split VORTEXM4 from VORTEX target due to SME support by
2025-08-18 01:53:04 -0700
30970460b
Add VORTEXM4 target by
2025-08-18 01:52:05 -0700
b0a00fbd6
Add minimal compiler flags for VORTEXM4 by
2025-08-18 01:51:10 -0700
ccfd0170f
Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list by
2025-08-18 01:50:13 -0700
ef0b883df
Add sgemm_direct_performant for ARM64 by
2025-08-18 01:48:08 -0700
e76c39099
Add sgemm_direct_performant for ARM64 by
2025-08-18 01:47:17 -0700
202a7a0e2
Separate VORTEXM4 from VORTEX and ARMV9SME by
2025-08-18 01:45:40 -0700
de91afd2a
Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direct_performant for ARM64 by
2025-08-18 01:44:21 -0700
0203657f4
Add sgemm_direct_performant for ARM64 by
2025-08-18 01:42:32 -0700
e82bcd274
Update ARM64 sgemm_direct object generation by
2025-08-18 01:41:13 -0700
731f4dd68
Add VORTEXM4 settings by
2025-08-18 01:39:35 -0700
53d3bb50c
Get symbol name from build system; change b.first to b.mi for AppleClang compatibility by
2025-08-18 01:37:50 -0700
08a00326a
Build symbol name from build system variables by
2025-08-18 01:35:41 -0700
89898fc49
Add sgemm_direct_performant for switching between direct and regular kernels by
2025-08-18 01:31:40 -0700
22c6607db
Use ASMNAME to get symbol name from build system; leave x18 unused as reserved on MacOS by
2025-08-18 01:30:10 -0700
ca22e28ca
Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S by
2025-08-18 01:25:44 -0700
c9814eb96
deploy: 9c43301b6d by
2025-08-17 10:03:37 +0000
9c43301b6
Merge pull request #5421 from reibax-marcus/develop by
2025-08-17 03:03:05 -0700
9d6df1dd3
Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking by
2025-08-16 13:45:35 -0700
003c8d1aa
deploy: f3b2a15fad by
2025-08-16 19:07:26 +0000
f3b2a15fa
Merge pull request #5420 from yuanjia111/develop by
2025-08-16 12:06:53 -0700
64401b441
(refs/pull/5422/head)
Disable vectorized packing for DGEMM - since it is slower than scalar. by
2025-08-13 13:41:12 +0000
b37ea80dd
deploy: 5e43ba948c by
2025-08-13 09:39:28 +0000
5e43ba948
Merge pull request #5419 from Mousius/bgemm-optimisation by
2025-08-13 02:10:20 -0700
c00afc86a
Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. by
2025-08-12 17:18:56 +0000
3a6b79c50
(refs/pull/5421/head)
fix: broken cblas installation when using makefile based builds by
2025-08-12 14:40:15 +0200
803e8d483
(refs/pull/5420/head)
Move the assignment of vector x in gemv_n_sve.c to the outermost loop to reduce repeated data retrieval. 1. Verified correctness using BLAS-Tester. 2. Verified performance using the built-in benchmark: float and double performance improved by about 60% and about 40% respectively. The test commands are: export OMP_NUM_THREADS=1; numactl -C 10 -l ./sgemv.goto 3000 4000 100 and export OMP_NUM_THREADS=1; numactl -C 10 -l ./dgemv.goto 3000 4000 100 by
2025-08-12 18:03:16 +0800
5f47b872f
(refs/pull/5419/head)
Remove older kernels for BGEMM on NEOVERSEV1 by
2025-08-10 17:56:05 +0000
114316f36
Optimize SBGEMM / BGEMM for NEOVERSEV1 further by
2025-08-10 16:29:03 +0000
50965e597
deploy: 75c6ab4036 by
2025-08-09 10:28:54 +0000
75c6ab403
CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411) by
2025-08-09 03:28:24 -0700
fbefd2a52
(refs/pull/5411/head)
Update windows_arm64.yml by
2025-08-08 14:34:09 +0200
1aed477ee
Update windows_arm64.yml by
2025-08-08 14:06:20 +0200
b04ac31f6
(refs/pull/5418/head)
add level3 defaults for x86 by
2025-08-08 13:14:56 +0300
599e4e372
add modules search path by
2025-08-08 11:55:27 +0200
c77aec697
Update windows_arm64.yml by
2025-08-08 11:04:35 +0200
35fd2c4b8
deploy: 5c5f852ee3 by
2025-08-04 11:29:54 +0000
5c5f852ee
Merge pull request #5415 from martin-frbg/Fixum-5399 by
2025-08-04 04:29:26 -0700
f1ee61ea3
(refs/pull/5415/head)
Include NEON header for the bfloat conversion functions by
2025-08-04 00:21:39 -0700