Martin Kroeker
e338d34ce1
fix path
10 months ago
Martin Kroeker
d36093d084
temporarily change default C/ZSCAL to the non-asm implementation
10 months ago
Martin Kroeker
b3c90564d7
resync with the generic arm version for inf/nan handling
10 months ago
Martin Kroeker
6bdc7f9eb7
Merge pull request #5300 from martin-frbg/fixup5296
kernel/riscv64: Fix cscal/zscal for riscv64_generic
10 months ago
Martin Kroeker
73af02b89f
use dummy2 as Inf/NAN handling flag
10 months ago
Martin Kroeker
549a9f1dbb
Disable the default SSE kernels for CSCAL/ZSCAL for now
10 months ago
Martin Kroeker
58eeb9041c
fix handling of dummy2
11 months ago
Martin Kroeker
7c77537b25
Merge pull request #5297 from martin-frbg/zscal_x86_sparc
kernel/(x86|sparc): Fix cscal and zscal by reverting to the generic C kernels
11 months ago
Martin Kroeker
63287e1855
Merge pull request #5296 from martin-frbg/zscal_riscv
kernel/riscv64: Fix cscal and zscal
11 months ago
Martin Kroeker
d2855d3dab
Merge pull request #5285 from martin-frbg/zscal_zarch
kernel/zarch: Fix cscal and zscal
11 months ago
Martin Kroeker
1408be5fe0
Merge pull request #5282 from martin-frbg/zscal_power
kernel/power: Fixed cscal and zscal
11 months ago
Martin Kroeker
1589d0b21e
Merge pull request #5281 from martin-frbg/zscal_arm64
kernel/arm64: fixed cscal and zscal
11 months ago
Martin Kroeker
a86419fb66
Merge pull request #5280 from martin-frbg/zscal_x86_64
kernel/x86_64: fixed cscal and zscal
11 months ago
Martin Kroeker
11ff18bb0f
Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
kernel/generic: Fixed cscal and zscal
11 months ago
Martin Kroeker
f4194fc65f
Merge branch 'develop' into la64_fixed_cscal_zscal
11 months ago
Martin Kroeker
e12132abd4
Use generic C/ZSCAL kernels to address inf/nan handling for now
11 months ago
Martin Kroeker
1cefbea7ea
Use generic SCAL kernels to address inf/nan handling for now
11 months ago
Martin Kroeker
f18b7a46bf
add dummy2 flag handling for inf/nan agnostic zeroing
11 months ago
Martin Kroeker
fe220a0d7d
Merge pull request #5291 from guoyuanplct/develop
kernel/riscv64:fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
11 months ago
Arne Juul
5442aff218
Accumulate results in output register explicitly
11 months ago
guoyuanplct
2ae019161a
fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
11 months ago
guoyuanplct
d2003dc886
del lines
11 months ago
guoyuanplct
45fd2d9b07
Optimized the axpby function.
11 months ago
Martin Kroeker
fb8dc8ff5c
Add dummy2 flag handling
11 months ago
Martin Kroeker
cf06250d36
add handling of dummy2 flag
11 months ago
Martin Kroeker
28f8fdaf0f
support flag for NaN/Inf handling and fix scaling of NaN/Inf values
11 months ago
Martin Kroeker
669c847ceb
support extra flag for NaN handling
11 months ago
Martin Kroeker
0b0bb9951d
Merge pull request #5265 from guoyuanplct/develop
kernel/riscv64:Added support for omatcopy on RISCV64_ZVL256B
11 months ago
guoyuanplct
be9f7550b5
Format Code
11 months ago
guoyuanplct
4d213653d8
kernel/riscv64:Added support for omatcopy on riscv64.
11 months ago
Martin Kroeker
8afddc1a81
Merge pull request #5262 from guoyuanplct/develop
kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:
11 months ago
guoyuanplct
9a7e3f102b
kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:
11 months ago
pengxu
a978ad3180
Loongarch64: add C functions of zgemm_ncopy_16
1 year ago
pengxu
0ccb050583
Loongarch64: fixed cgemm_ncopy_16_lasx
1 year ago
Martin Kroeker
5141a90993
Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222)
* Fix ARMV9SME target and add support_sme1 code for MacOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
1 year ago
Martin Kroeker
151b74284e
Merge pull request #5203 from quic/fix-sgemmdirect-sme1
Add vector registers to clobber list to prevent compiler optimization.
1 year ago
Martin Kroeker
cba32d001a
Merge pull request #5245 from guoyuanplct/develop
Optimized RVV_ZVL256B Implementation of zgemv_n
1 year ago
pengxu
f19e72c402
Loongarch64: fixed swap_lasx
1 year ago
pengxu
b471fa337b
Loongarch64: fixed snrm2_lasx
1 year ago
pengxu
57bb46bedf
Loongarch64: fixed rot_lasx
1 year ago
pengxu
6dc4ca2391
Loongarch64: fixed icamax_lasx
1 year ago
pengxu
b528b1b8ea
Loongarch64: fixed iamax_lasx
1 year ago
pengxu
ba9569e382
Loongarch64: fixed dot_lasx
1 year ago
pengxu
dc5fa29851
Loongarch64: fixed cscal_lasx
1 year ago
pengxu
a98dd6d911
Loongarch64: fixed copy_lasx
1 year ago
pengxu
d49319c2d2
Loongarch64: fixed cnrm2_lasx
1 year ago
pengxu
74c97ef814
Loongarch64: fixed cdot_lasx
1 year ago
pengxu
be525521ad
Loongarch64: fixed asum_lasx
1 year ago
pengxu
0cd5ca5527
Loongarch64: fixed amax_lasx
1 year ago
guoyuanplct
11ffc8680e
Format the code
1 year ago