Martin Kroeker
fa220b2969
Merge pull request #4382 from Mousius/sve-dot-again
Tweak SVE dot kernel
2 years ago
Martin Kroeker
3f46d0c79a
Merge pull request #4381 from darshanp4/issue_4323
Update GEMM param for NEOVERSEV1
2 years ago
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2 years ago
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2 years ago
Darshan Patel
dab0da8243
Update GEMM param for NEOVERSEV1
2 years ago
Martin Kroeker
3b520a56a9
Merge pull request #4378 from martin-frbg/issue3871
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists
2 years ago
Martin Kroeker
563daadc92
Merge pull request #4379 from barracuda156/ppc970
PPC970: drop -mcpu=970 which seems to produce faulty code
2 years ago
barracuda156
8c143331b0
PPC970: drop -mcpu=970 which seems to produce faulty code
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4376
2 years ago
Martin Kroeker
d2f1594bca
Merge pull request #4368 from martin-frbg/issue4073
Add complex type definitions for MSVC in Reference-LAPACK's lapack.h
2 years ago
Martin Kroeker
544cb86300
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists
2 years ago
Martin Kroeker
8793601e86
Merge pull request #4375 from martin-frbg/issue4352
Retire the GotoBLAS gemv_t kernel still used as fallback on x86_64
2 years ago
Martin Kroeker
f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one
2 years ago
Martin Kroeker
293131d6b9
Merge pull request #4370 from barracuda156/unbreak_powerpc
macOS PowerPC: fix CMake build
2 years ago
barracuda156
981e315b30
cc.cmake: use -force_cpusubtype_ALL for Darwin PPC
2 years ago
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2 years ago
Martin Kroeker
302ca7edc7
Merge pull request #4371 from barracuda156/970
cc.cmake: add optflags for G5 and G4 kernels
2 years ago
barracuda156
a8d3619f65
cc.cmake: add optflags for G5 and G4 kernels
2 years ago
Martin Kroeker
aa46f1e4e7
revert addition of MSVC-compatible complex (moved to lapacke_config.h)
2 years ago
Martin Kroeker
dcdc351272
Add MSVC-compatible complex types
2 years ago
Martin Kroeker
55a0718f72
Merge pull request #4369 from ChipKerchner/power10Copies
Replace two vector loads with one vector pair load.
2 years ago
Chip-Kerchner
93747fb377
Merge remote-tracking branch 'origin/develop' into power10Copies
2 years ago
Martin Kroeker
dcf6999c4e
remove extraneous endif
2 years ago
Martin Kroeker
330101e0b3
Add complex type definitions for MSVC
2 years ago
Martin Kroeker
d9f1478068
Merge pull request #4367 from barracuda156/unbreak_powerpc
Fix arch detection with CMake build for PowerPC
2 years ago
barracuda156
9dbc8129b3
cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case
2 years ago
barracuda156
c732f275a2
system_check.cmake: fix arch detection for Darwin PowerPC
2 years ago
Martin Kroeker
e60fb0f397
Merge pull request #4359 from mseminatore/win_perf
Improve Windows threading performance scaling
2 years ago
Mark Seminatore
efa9515a23
Merge branch 'OpenMathLib:develop' into win_perf
2 years ago
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
2 years ago
Mark Seminatore
edac80d7e8
some cleanup, dynamically scale threads, add missing WIN_CASE defn
2 years ago
Martin Kroeker
5b09833b1c
Merge pull request #4019 from uniontech-lilinjie/develop
fix typo
2 years ago
Martin Kroeker
3193aa9c7e
Merge pull request #4362 from yinshiyou/la-dev
Add 15 level1 optimizations for LoongArch.
2 years ago
yancheng
d32f38fb37
loongarch64: Add optimizations for nrm2.
2 years ago
yancheng
f9b468990e
loongarch64: Add optimizations for rot.
2 years ago
yancheng
c80e7e27d1
loongarch64: Add optimizations for sum and asum.
2 years ago
yancheng
d4c96a35a8
loongarch64: Add optimizations for axpy and axpby.
2 years ago
yancheng
360acc0a41
loongarch64: Add optimizations for swap.
2 years ago
yancheng
174c25766b
loongarch64: Add optimizations for copy.
2 years ago
yancheng
49829b2b7d
loongarch64: Add optimizations for iamin.
2 years ago
yancheng
be83f5e4e0
loongarch64: Add optimizations for iamax.
2 years ago
yancheng
e3fb2b5afa
loongarch64: Add optimizations for imin.
2 years ago
yancheng
e46b48e372
loongarch64: Add optimizations for imax.
2 years ago
yancheng
702fc1d56d
loongarch64: Add optimization for min.
2 years ago
yancheng
346b384d1c
loongarch64: Add optimization for max.
2 years ago
yancheng
ff2ecc6cda
loongarch64: Add optimization for amin.
2 years ago
yancheng
265b5f2e80
loongarch64: Add optimizations for amax.
2 years ago
yancheng
993ede7c70
loongarch64: Add optimizations for scal.
2 years ago
Mark Seminatore
4ebf814b42
fix bug where a task was not marked as finished.
2 years ago
Mark Seminatore
5f51811728
try at new threading model
2 years ago
Martin Kroeker
a8cb611157
Merge pull request #4358 from martin-frbg/lapack954
Fix keyword used to count successful tests (Reference-LAPACK PR 954)
2 years ago