Author | Commit | Message | Age
Hao Chen | edabb93668 | loongarch64: Refine axpby optimization functions. | 2 years ago
Hao Chen | 1ec5dded43 | loongarch64: Add c/zrot optimization functions. (Signed-off-by: Hao Chen <chenhao@loongson.cn>) | 2 years ago
Hao Chen | 3c53ded315 | loongarch64: Add c/znrm2 optimization functions. | 2 years ago
Hao Chen | fbd612f8c4 | loongarch64: Add ic/zamin optimization functions. | 2 years ago
Hao Chen | d97272cb35 | loongarch64: Add c/zdot optimization functions. | 2 years ago
Hao Chen | 65a0aeb128 | loongarch64: Add c/zcopy optimization functions. (Signed-off-by: Hao Chen <chenhao@loongson.cn>) | 2 years ago
Hao Chen | 2a34fb4b80 | loongarch64: Add and refine scal optimization functions. (Signed-off-by: Hao Chen <chenhao@loongson.cn>) | 2 years ago
Hao Chen | 8785e948b5 | loongarch64: Add camin optimization function. | 2 years ago
Hao Chen | 0753848e03 | loongarch64: Refine and add axpy optimization functions. (Signed-off-by: Hao Chen <chenhao@loongson.cn>) | 2 years ago
Hao Chen | 06fd5b5995 | loongarch64: Add and Refine asum optimization functions. | 2 years ago
guxiwei | e771be185e | Optimize copy functions with lsx. (Signed-off-by: Hao Chen <chenhao@loongson.cn>) | 2 years ago
Hao Chen | 179ed51d3b | Add dgemm_kernel_8x4.S file. | 2 years ago
Hao Chen | 173a65d4e6 | loongarch64: Add and refine iamax optimization functions. | 2 years ago
zhoupeng | ea70e165c7 | loongarch64: Refine rot optimization. | 2 years ago
zhoupeng | 116aee7527 | loongarch64: Refine imin optimization. | 2 years ago
zhoupeng | 8be2654193 | loongarch64: Refine imax optimization. | 2 years ago
zhoupeng | 154baad454 | loongarch64: Refine iamin optimization. | 2 years ago
Shiyou Yin | 36c12c4971 | loongarch64: Refine copy,swap,nrm2,sum optimization. | 2 years ago
Shiyou Yin | c6996a80e9 | loongarch64: Refine amax,amin,max,min optimization. | 2 years ago
Martin Kroeker | 3b520a56a9 | Merge pull request #4378 from martin-frbg/issue3871 (Mention C906V instruction set limitation and update DYNAMIC_ARCH lists) | 2 years ago
Martin Kroeker | 563daadc92 | Merge pull request #4379 from barracuda156/ppc970 (PPC970: drop -mcpu=970 which seems to produce faulty code) | 2 years ago
barracuda156 | 8c143331b0 | PPC970: drop -mcpu=970 which seems to produce faulty code (Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4376) | 2 years ago
Martin Kroeker | d2f1594bca | Merge pull request #4368 from martin-frbg/issue4073 (Add complex type definitions for MSVC in Reference-LAPACK's lapack.h) | 2 years ago
Martin Kroeker | 544cb86300 | Mention C906V instruction set limitation and update DYNAMIC_ARCH lists | 2 years ago
Martin Kroeker | 8793601e86 | Merge pull request #4375 from martin-frbg/issue4352 (Retire the GotoBLAS gemv_t kernel still used as fallback on x86_64) | 2 years ago
Martin Kroeker | f06b535566 | Use C kernel for dgemv_t due to limitations of the old assembly one | 2 years ago
Martin Kroeker | 293131d6b9 | Merge pull request #4370 from barracuda156/unbreak_powerpc (macOS PowerPC: fix CMake build) | 2 years ago
barracuda156 | 981e315b30 | cc.cmake: use -force_cpusubtype_ALL for Darwin PPC | 2 years ago
barracuda156 | d9653af018 | KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing (Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366) | 2 years ago
Martin Kroeker | 302ca7edc7 | Merge pull request #4371 from barracuda156/970 (cc.cmake: add optflags for G5 and G4 kernels) | 2 years ago
barracuda156 | a8d3619f65 | cc.cmake: add optflags for G5 and G4 kernels | 2 years ago
Martin Kroeker | aa46f1e4e7 | revert addition of MSVC-compatible complex (moved to lapacke_config.h) | 2 years ago
Martin Kroeker | dcdc351272 | Add MSVC-compatible complex types | 2 years ago
Martin Kroeker | 55a0718f72 | Merge pull request #4369 from ChipKerchner/power10Copies (Replace two vector loads with one vector pair load.) | 2 years ago
Chip-Kerchner | 93747fb377 | Merge remote-tracking branch 'origin/develop' into power10Copies | 2 years ago
Martin Kroeker | dcf6999c4e | remove extraneous endif | 2 years ago
Martin Kroeker | 330101e0b3 | Add complex type definitions for MSVC | 2 years ago
Martin Kroeker | d9f1478068 | Merge pull request #4367 from barracuda156/unbreak_powerpc (Fix arch detection with CMake build for PowerPC) | 2 years ago
barracuda156 | 9dbc8129b3 | cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case | 2 years ago
barracuda156 | c732f275a2 | system_check.cmake: fix arch detection for Darwin PowerPC | 2 years ago
Martin Kroeker | e60fb0f397 | Merge pull request #4359 from mseminatore/win_perf (Improve Windows threading performance scaling) | 2 years ago
Mark Seminatore | efa9515a23 | Merge branch 'OpenMathLib:develop' into win_perf | 2 years ago
Chip-Kerchner | 4e738e561a | Replace two vector loads with one vector pair load and fix endianess of stores. | 2 years ago
Mark Seminatore | edac80d7e8 | some cleanup, dynamically scale threads, add missing WIN_CASE defn | 2 years ago
Martin Kroeker | 5b09833b1c | Merge pull request #4019 from uniontech-lilinjie/develop (fix typo) | 2 years ago
Martin Kroeker | 3193aa9c7e | Merge pull request #4362 from yinshiyou/la-dev (Add 15 level1 optimizations for LoongArch.) | 2 years ago
yancheng | d32f38fb37 | loongarch64: Add optimizations for nrm2. | 2 years ago
yancheng | f9b468990e | loongarch64: Add optimizations for rot. | 2 years ago
yancheng | c80e7e27d1 | loongarch64: Add optimizations for sum and asum. | 2 years ago
yancheng | d4c96a35a8 | loongarch64: Add optimizations for axpy and axpby. | 2 years ago