Martin Kroeker
874744976c
fix dimension used in nancheck (Reference-LAPACK PR 1135)
10 months ago
Srangrang
9f13b2c6ac
style: modify HALF to BFLOAT16 in benchmark folder
10 months ago
Srangrang
ec14e1648c
fix: resolve build failure on non-RISCV hosts
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions
Related to PR#5290
Co-authored-by Martin
10 months ago
Martin Kroeker
0ea173ec8c
Merge pull request #5304 from martin-frbg/fixgemmtr_if
fix source file used for sbgemmt/sbgemmtr in CMake builds
10 months ago
Martin Kroeker
5e393f207c
fix source file used for sbgemmt/sbgemmtr
10 months ago
Martin Kroeker
162591af47
Merge branch 'OpenMathLib:develop' into issue5289
10 months ago
Martin Kroeker
dbd5643d37
Merge pull request #5302 from martin-frbg/zscal_mips_3
mips64 SICORTEX: temporarily change default C/ZSCAL to the non-asm implementation
10 months ago
Martin Kroeker
31ef2cbbb3
Exit if memory allocation keeps failing, instead of looping forever
10 months ago
Martin Kroeker
e338d34ce1
fix path
10 months ago
Martin Kroeker
d36093d084
temporarily change default C/ZSCAL to the non-asm implementation
10 months ago
Martin Kroeker
cc4b04a684
Merge pull request #5301 from martin-frbg/zscal_mips_2
kernel/mips(64): Fix cscal and zscal
10 months ago
Martin Kroeker
b3c90564d7
resync with the generic arm version for inf/nan handling
10 months ago
Martin Kroeker
6bdc7f9eb7
Merge pull request #5300 from martin-frbg/fixup5296
kernel/riscv64: Fix cscal/zscal for riscv64_generic
10 months ago
Martin Kroeker
63272b6c82
Merge pull request #5299 from martin-frbg/x86_64-ssezscal
Disable the default SSE kernels for x86_64 CSCAL/ZSCAL for now
10 months ago
Martin Kroeker
73af02b89f
use dummy2 as Inf/NAN handling flag
10 months ago
Martin Kroeker
549a9f1dbb
Disable the default SSE kernels for CSCAL/ZSCAL for now
10 months ago
Martin Kroeker
ca1ce84ee5
Merge pull request #5298 from martin-frbg/fixup5281
Fix PR5281 "kernel/arm64: fix cscal/zscal"
10 months ago
Martin Kroeker
58eeb9041c
fix handling of dummy2
11 months ago
Martin Kroeker
7c77537b25
Merge pull request #5297 from martin-frbg/zscal_x86_sparc
kernel/(x86|sparc): Fix cscal and zscal by reverting to the generic C kernels
11 months ago
Martin Kroeker
63287e1855
Merge pull request #5296 from martin-frbg/zscal_riscv
kernel/riscv64: Fix cscal and zscal
11 months ago
Martin Kroeker
d2855d3dab
Merge pull request #5285 from martin-frbg/zscal_zarch
kernel/zarch: Fix cscal and zscal
11 months ago
Martin Kroeker
1408be5fe0
Merge pull request #5282 from martin-frbg/zscal_power
kernel/power: Fixed cscal and zscal
11 months ago
Martin Kroeker
1589d0b21e
Merge pull request #5281 from martin-frbg/zscal_arm64
kernel/arm64: fixed cscal and zscal
11 months ago
Martin Kroeker
a86419fb66
Merge pull request #5280 from martin-frbg/zscal_x86_64
kernel/x86_64: fixed cscal and zscal
11 months ago
Martin Kroeker
11ff18bb0f
Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
kernel/generic: Fixed cscal and zscal
11 months ago
Martin Kroeker
2e2691b34b
Merge pull request #5078 from XiWeiGu/la64_fixed_cscal_zscal
LoongArch64: fixed cscal and zscal
11 months ago
Martin Kroeker
f4194fc65f
Merge branch 'develop' into la64_fixed_cscal_zscal
11 months ago
Martin Kroeker
e12132abd4
Use generic C/ZSCAL kernels to address inf/nan handling for now
11 months ago
Martin Kroeker
1cefbea7ea
Use generic SCAL kernels to address inf/nan handling for now
11 months ago
Sharif Inamdar
8279e68805
Optimize gemv_n_sve_v1x3 kernel
- Calculate predicate outside the loop
- Divide matrix in blocks of 3
11 months ago
Martin Kroeker
f18b7a46bf
add dummy2 flag handling for inf/nan agnostic zeroing
11 months ago
Martin Kroeker
fe220a0d7d
Merge pull request #5291 from guoyuanplct/develop
kernel/riscv64: fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
11 months ago
Martin Kroeker
bbdc265798
Merge pull request #5294 from arnej27959/arnej/fix-arm64-register
Accumulate results in output register explicitly
11 months ago
Arne Juul
5442aff218
Accumulate results in output register explicitly
11 months ago
guoyuanplct
83fcab7578
Merge branch 'develop' of https://github.com/guoyuanplct/OpenBLAS into develop
11 months ago
guoyuanplct
2ae019161a
fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
11 months ago
Srangrang
fb89820f20
Merge branch 'develop' of https://github.com/Srangrang/OpenBLAS into develop
11 months ago
Srangrang
4e1a381e5b
fix: resolve the compilation failure when the zfh instructions are unavailable
- modify the macro conditions in Makefile.system
- delete development test code
Related to issue#5279
11 months ago
Linjin Li
fa2b08b378
Merge pull request #1 from gkdddd/riscv_shgemm
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 fo…
11 months ago
gkdddd
670ec6f757
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0
Related to issue #5279
Co-authored-by Linjin Li <linjin_li@163.com>
11 months ago
Martin Kroeker
02267d86f5
Merge pull request #5288 from guoyuanplct/develop
kernel/riscv64: Optimized the implementation of axpby on TARGET=RISCV64_ZVL256B.
11 months ago
guoyuanplct
d2003dc886
del lines
11 months ago
guoyuanplct
45fd2d9b07
Optimized the axpby function.
11 months ago
Srangrang
0a967797a1
Add FP16 support for RISCV
11 months ago
Martin Kroeker
fb8dc8ff5c
Add dummy2 flag handling
11 months ago
Srangrang
2996c25c94
add shgemm for RISCV_ZVL128B
11 months ago
Martin Kroeker
cf06250d36
add handling of dummy2 flag
11 months ago
Martin Kroeker
28f8fdaf0f
support flag for NaN/Inf handling and fix scaling of NaN/Inf values
11 months ago
Martin Kroeker
669c847ceb
support extra flag for NaN handling
11 months ago
Martin Kroeker
0163143fdd
Merge pull request #5278 from martin-frbg/fixup5276
Fix compilation with pre-C99 compilers
11 months ago