Martin Kroeker
fb99fc2e6e
fix type conversion warnings
2 years ago
Martin Kroeker
d4db6a9f16
Separate the interface for SBGEMMT from GEMMT due to differences in GEMV arguments
2 years ago
Martin Kroeker
e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
2 years ago
Martin Kroeker
a4fde2c5ac
Merge pull request #4451 from martin-frbg/overflow_reset
Reset "buffer management structure overflowed" state and free auxiliary struct on blas_shutdown
2 years ago
Martin Kroeker
b537528feb
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
LoongArch64: Fixed {s/d}amin LSX optimization
2 years ago
Martin Kroeker
bc7154a80d
Merge pull request #4482 from martin-frbg/issue4476
Fix missing NO_AVX2 fallback for SapphireRapids in DYNAMIC_ARCH
2 years ago
Martin Kroeker
6d8a273cca
Handle zero increment(s) in C910V ?AXPBY (#4483)
* Handle zero increment(s)
2 years ago
Martin Kroeker
dbcf4f8b7d
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
Loongarch opt axpby
2 years ago
Martin Kroeker
dc802dd637
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
2 years ago
Martin Kroeker
e307675222
Merge pull request #4478 from martin-frbg/issue4475
Fix incompatible pointer type in BFLOAT16 GEMMT
2 years ago
Martin Kroeker
033168cdf0
Merge pull request #4481 from martin-frbg/cpuid_riscv
Update lowercase cpunames for RISC-V
2 years ago
Martin Kroeker
a29f91ae9a
Merge pull request #4471 from ChipKerchner/fixMakefileAIXOpenMP
Fix Makefiles to support OpenMP on AIX for xlc (clang) with xlf.
2 years ago
Martin Kroeker
e61d96303d
Fix missing NO_AVX2 fallback for SapphireRapids
2 years ago
Martin Kroeker
d02c61e82e
Update lowercase cpunames for RISC-V
2 years ago
Martin Kroeker
7228c708d7
Merge pull request #4461 from markdryan/cpuid_riscv64_crash
Fix two issues with cpuid_riscv64.c
2 years ago
gxw
adde725321
LoongArch64: Fixed {s/d}amin LSX optimization
2 years ago
gxw
7bc93d95a1
LoongArch64: Opt {c/z}axpby
2 years ago
gxw
1e1f487dc7
LoongArch64: Fixed {s/d}axpby
2 years ago
gxw
3597827c93
utest: add axpby
2 years ago
Martin Kroeker
68d354814f
Fix incompatible pointer type in BFLOAT16 mode
2 years ago
Martin Kroeker
3848d4e9f4
Merge pull request #4477 from martin-frbg/c910caxpy
Temporarily disable the CAXPY/ZAXPY kernels for C910V to work around a CI hang
2 years ago
Martin Kroeker
4d8dee508c
temporarily disable the CAXPY/ZAXPY kernels
2 years ago
Martin Kroeker
27816fa929
Merge pull request #4472 from sergei-lewis/dev/slewis/merge-from-riscv
Merge risc-v branch into develop
2 years ago
Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
2 years ago
Sergei Lewis
3ffd6868d7
Merge branch 'develop' into dev/slewis/merge-from-riscv
2 years ago
Sergei Lewis
a3b0ef6596
Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling
2 years ago
Martin Kroeker
ec74dcd213
Merge pull request #4470 from martin-frbg/issue4455
Add CBLAS interfaces for BLAS extensions ?AMIN/?AMAX and C/ZAXPYC
2 years ago
Chip Kerchner
61c8e19f95
Fix Makefile to support OpenMP on AIX for xlc (clang) with xlf.
2 years ago
Martin Kroeker
47bd064763
Fix names in build rules
2 years ago
Martin Kroeker
a7d004e820
Fix CBLAS prototype
2 years ago
Martin Kroeker
b54cda8490
Unify creation of CBLAS interfaces for ?AMIN/?AMAX and C/ZAXPYC between gmake and cmake builds
2 years ago
Martin Kroeker
1a6fdb0353
Add prototypes for extensions ?AMIN/?AMAX and CAXPYC/ZAXPYC
2 years ago
Martin Kroeker
d1343302bd
Merge pull request #4465 from XiWeiGu/utest-zscal
utest: Add tests for zscal
2 years ago
gxw
969601a1dc
X86_64: Fixed bug in zscal
Fixed handling of NAN and INF arguments when
inc is greater than 1.
2 years ago
Martin Kroeker
98c9ff3194
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
LoongArch64: Handle NAN and INF
2 years ago
Martin Kroeker
9f0630187a
Merge pull request #4463 from XiWeiGu/loongarch64-zamax-zamin
Loongarch64: amax and amin
2 years ago
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
2 years ago
gxw
bb043a021f
utest: Add tests for zscal
2 years ago
gxw
83ce97a4ca
LoongArch64: Handle NAN and INF
2 years ago
gxw
3d4dfd0085
Benchmark: Rename the executable file names for {sc/dz}a{min/max}
There is no interface named {c/z}a{min/max}; keeping these names
would cause ambiguity.
2 years ago
gxw
a79d117405
LoongArch64: Fixed bug for {s/d}amin
2 years ago
gxw
519ea6e87a
utest: Add utest for the {sc/dz}amax and {s/d/sc/dz}amin
2 years ago
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
2 years ago
Martin Kroeker
8892121130
Merge pull request #4462 from martin-frbg/issue4449
Use +sve in arch declarations of the fallback paths for SVE targets
2 years ago
Martin Kroeker
48a4c4d454
Use +sve in arch declarations of the fallback paths for SVE targets
2 years ago
Mark Ryan
e0b610d01f
Harmonize riscv64 LIBNAME for forced and non-forced targets
The forced values for LIBNAME were either riscv64_generic or c910v
while the non-forced value of LIBNAME was always riscv64.
2 years ago
Mark Ryan
ec2aa32eb0
Fix crash in cpuid_riscv64.c
The crash is reproducible when building OpenBLAS without forcing a
target in a riscv64 container running on an X86_64 machine with an
older version of QEMU, e.g., 7.0.0, registered with binfmt_misc to
run riscv64 binaries. With this setup, cat /proc/cpuinfo in the
container returns the cpu information for the host, which contains a
"model name" string, and we execute the buggy code. The code in
question is searching in an uninitialised buffer for the ':' character
and doesn't check to see whether it was found or not. This can result
in pmodel containing the pointer value 1 and a crash when pmodel is
dereferenced. The algorithm to detect the C910V CPU has not been
modified, merely fixed to prevent the crash.
A few additional checks for NULL pointers are added to improve the
robustness of the code and a whitespace error is corrected.
2 years ago
Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2 years ago
Martin Kroeker
4e2a32ff51
Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
2 years ago
gxw
276e3ebf9e
LoongArch64: Add dzamax and dzamin opt
2 years ago