Martin Kroeker
903589f84b
Update zscal.c
2 years ago
Martin Kroeker
711433fcf0
Update zscal.c
2 years ago
Martin Kroeker
d3d99c34f2
Fix handling of NAN and INF
2 years ago
Martin Kroeker
519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
2 years ago
pengxu
a5d0d21378
loongarch64: Add zgemm and cgemm optimization
2 years ago
gxw
546f13558c
loongarch64: Add {c/z}swap and {c/z}sum optimization
2 years ago
Hao Chen
edabb93668
loongarch64: Refine axpby optimization functions.
2 years ago
Hao Chen
1ec5dded43
loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2 years ago
Hao Chen
3c53ded315
loongarch64: Add c/znrm2 optimization functions.
2 years ago
Hao Chen
fbd612f8c4
loongarch64: Add ic/zamin optimization functions.
2 years ago
Hao Chen
d97272cb35
loongarch64: Add c/zdot optimization functions.
2 years ago
Hao Chen
65a0aeb128
loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2 years ago
Hao Chen
2a34fb4b80
loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2 years ago
Hao Chen
8785e948b5
loongarch64: Add camin optimization function.
2 years ago
Hao Chen
0753848e03
loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2 years ago
Hao Chen
06fd5b5995
loongarch64: Add and refine asum optimization functions.
2 years ago
guxiwei
e771be185e
Optimize copy functions with lsx.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2 years ago
Hao Chen
179ed51d3b
Add dgemm_kernel_8x4.S file.
2 years ago
Hao Chen
173a65d4e6
loongarch64: Add and refine iamax optimization functions.
2 years ago
zhoupeng
ea70e165c7
loongarch64: Refine rot optimization.
2 years ago
zhoupeng
116aee7527
loongarch64: Refine imin optimization.
2 years ago
zhoupeng
8be2654193
loongarch64: Refine imax optimization.
2 years ago
zhoupeng
154baad454
loongarch64: Refine iamin optimization.
2 years ago
Shiyou Yin
36c12c4971
loongarch64: Refine copy,swap,nrm2,sum optimization.
2 years ago
Shiyou Yin
c6996a80e9
loongarch64: Refine amax,amin,max,min optimization.
2 years ago
Chris Sidebottom
ecae1389df
Reduce duplication in kernel definitions
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2 years ago
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2 years ago
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2 years ago
Martin Kroeker
f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one
2 years ago
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2 years ago
Chip-Kerchner
93747fb377
Merge remote-tracking branch 'origin/develop' into power10Copies
2 years ago
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
2 years ago
yancheng
d32f38fb37
loongarch64: Add optimizations for nrm2.
2 years ago
yancheng
f9b468990e
loongarch64: Add optimizations for rot.
2 years ago
yancheng
c80e7e27d1
loongarch64: Add optimizations for sum and asum.
2 years ago
yancheng
d4c96a35a8
loongarch64: Add optimizations for axpy and axpby.
2 years ago
yancheng
360acc0a41
loongarch64: Add optimizations for swap.
2 years ago
yancheng
174c25766b
loongarch64: Add optimizations for copy.
2 years ago
yancheng
49829b2b7d
loongarch64: Add optimizations for iamin.
2 years ago
yancheng
be83f5e4e0
loongarch64: Add optimizations for iamax.
2 years ago
yancheng
e3fb2b5afa
loongarch64: Add optimizations for imin.
2 years ago
yancheng
e46b48e372
loongarch64: Add optimizations for imax.
2 years ago
yancheng
702fc1d56d
loongarch64: Add optimization for min.
2 years ago
yancheng
346b384d1c
loongarch64: Add optimization for max.
2 years ago
yancheng
ff2ecc6cda
loongarch64: Add optimization for amin.
2 years ago
yancheng
265b5f2e80
loongarch64: Add optimizations for amax.
2 years ago
yancheng
993ede7c70
loongarch64: Add optimizations for scal.
2 years ago
Martin Kroeker
39bf8ece20
Merge pull request #4340 from yinshiyou/la-dev
Add some refinements and optimizations for LoongArch.
2 years ago
Shiyou Yin
9fe07d82fd
loongarch: Add LSX optimization for dot.
2 years ago
Shiyou Yin
13b8c44b44
loongarch: Add optimization for dsdot kernel.
2 years ago