guxiwei
e771be185e
Optimize copy functions with lsx.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2 years ago
Hao Chen
179ed51d3b
Add dgemm_kernel_8x4.S file.
2 years ago
Hao Chen
173a65d4e6
loongarch64: Add and refine iamax optimization functions.
2 years ago
zhoupeng
ea70e165c7
loongarch64: Refine rot optimization.
2 years ago
zhoupeng
116aee7527
loongarch64: Refine imin optimization.
2 years ago
zhoupeng
8be2654193
loongarch64: Refine imax optimization.
2 years ago
zhoupeng
154baad454
loongarch64: Refine iamin optimization.
2 years ago
Shiyou Yin
36c12c4971
loongarch64: Refine copy,swap,nrm2,sum optimization.
2 years ago
Shiyou Yin
c6996a80e9
loongarch64: Refine amax,amin,max,min optimization.
2 years ago
Martin Kroeker
21564bde2c
Merge pull request #4394 from martin-frbg/dyn_vortex
Add Apple M as NeoverseN1 in ARM64 DYNAMIC_ARCH runtime detection
2 years ago
Martin Kroeker
e9c32ed165
Merge pull request #4384 from yetist/develop
Fix: build failed on LoongArch
2 years ago
Martin Kroeker
e7a895e714
Add Apple M as NeoverseN1
2 years ago
Martin Kroeker
474ce0ace9
Merge pull request #4393 from martin-frbg/pr4389-2
Remove redundant targets from the default ARM64 DYNAMIC_ARCH list
2 years ago
Martin Kroeker
1106460bb3
remove redundant targets from the default ARM64 DYNAMIC_ARCH list
2 years ago
Martin Kroeker
236acee706
Merge pull request #4389 from Mousius/reduce-dynamic-targets
Use functionally equivalent dynamic targets
2 years ago
Xiaotian Wu
d2f4f1b28a
CI: update toolchains for LoongArch64
2 years ago
Wu Xiaotian
0baf462dbc
Fix: build failed on LoongArch
According to the documentation at https://github.com/loongson/la-abi-specs/blob/release/lapcs.adoc#the-base-abi-variants , valid -mabi parameters are lp64s, lp64f, lp64d, ilp32s, ilp32f and ilp32d.
2 years ago
Martin Kroeker
63a83939a1
Merge pull request #4390 from Mousius/reduce-kernel-duplication
Reduce duplication in kernel definitions
2 years ago
Martin Kroeker
dba404055d
Merge pull request #4392 from martin-frbg/lapack959
Fix issues related to the ?GEDMD functions (Reference-LAPACK PR 959)
2 years ago
Martin Kroeker
c6fa921027
Add tests for ?GEDMD (Reference-LAPACK PR 959)
2 years ago
Martin Kroeker
283713e4c5
Add tests for ?GEDMD (Reference-LAPACK PR 959)
2 years ago
Martin Kroeker
201f22f49a
Fix issues related to ?GEDMD (Reference-LAPACK PR 959)
2 years ago
Martin Kroeker
05dde8ef04
Merge pull request #4391 from martin-frbg/lapack942
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2 years ago
Martin Kroeker
45ef0d7361
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2 years ago
Martin Kroeker
c082669ad4
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2 years ago
Martin Kroeker
29d6024ec5
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2 years ago
Martin Kroeker
0814491d96
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2 years ago
Martin Kroeker
5c11b2ff41
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2 years ago
Martin Kroeker
8ce44c18a0
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2 years ago
Chris Sidebottom
dc20a78188
Use functionally equivalent dynamic targets
Similar to `drivers/other/dynamic.c`, I've looked for functionally
equivalent targets and mapped them in the default DYNAMIC_ARCH build.
Users can still build specific cores using DYNAMIC_LIST.
2 years ago
Chris Sidebottom
ecae1389df
Reduce duplication in kernel definitions
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2 years ago
Martin Kroeker
68ef2328eb
Merge pull request #4388 from martin-frbg/issue4387
Add lower limit for multithreading in the reimplemented LAPACK ?GESV
2 years ago
Martin Kroeker
a7ed60bfe9
Add lower limit for multithreading
2 years ago
Martin Kroeker
67779177b9
Merge pull request #4383 from martin-frbg/fixlapatest
Restore OpenBLAS-specific changes to the LAPACK test framework
2 years ago
Martin Kroeker
e67a0eaaf9
Restore OpenBLAS-specific build rule changes
2 years ago
Martin Kroeker
bb8b91e9f2
restore OpenBLAS-specific test paths
2 years ago
Martin Kroeker
fa220b2969
Merge pull request #4382 from Mousius/sve-dot-again
Tweak SVE dot kernel
2 years ago
Martin Kroeker
3f46d0c79a
Merge pull request #4381 from darshanp4/issue_4323
Update GEMM param for NEOVERSEV1
2 years ago
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2 years ago
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2 years ago
Darshan Patel
dab0da8243
Update GEMM param for NEOVERSEV1
2 years ago
Martin Kroeker
3b520a56a9
Merge pull request #4378 from martin-frbg/issue3871
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists
2 years ago
Martin Kroeker
563daadc92
Merge pull request #4379 from barracuda156/ppc970
PPC970: drop -mcpu=970 which seems to produce faulty code
2 years ago
barracuda156
8c143331b0
PPC970: drop -mcpu=970 which seems to produce faulty code
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4376
2 years ago
Martin Kroeker
d2f1594bca
Merge pull request #4368 from martin-frbg/issue4073
Add complex type definitions for MSVC in Reference-LAPACK's lapack.h
2 years ago
Martin Kroeker
544cb86300
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists
2 years ago
Martin Kroeker
8793601e86
Merge pull request #4375 from martin-frbg/issue4352
Retire the GotoBLAS gemv_t kernel still used as fallback on x86_64
2 years ago
Martin Kroeker
f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one
2 years ago
Martin Kroeker
293131d6b9
Merge pull request #4370 from barracuda156/unbreak_powerpc
macOS PowerPC: fix CMake build
2 years ago
barracuda156
981e315b30
cc.cmake: use -force_cpusubtype_ALL for Darwin PPC
2 years ago