d2b7f0f9c
define CGEMM INCOPY/ITCOPY kernels by
2023-12-30 20:49:53 +0100
2327b13b3
define CGEMM INCOPY/ITCOPY kernels by
2023-12-30 20:48:40 +0100
0f648ebcd
(refs/pull/4399/head)
use alternate download for the CLFS cross-compiler package by
2023-12-30 20:31:32 +0100
519b40fad
Merge pull request #4398 from yinshiyou/la-dev by
2023-12-30 19:51:08 +0100
a5d0d2137
(refs/pull/4398/head)
loongarch64: Add zgemm and cgemm optimization by
2023-12-29 15:10:01 +0800
546f13558
loongarch64: Add {c/z}swap and {c/z}sum optimization by
2023-12-29 11:03:53 +0800
edabb9366
loongarch64: Refine axpby optimization functions. by
2023-12-29 15:08:10 +0800
1ec5dded4
loongarch64: Add c/zrot optimization functions. by
2023-12-28 21:23:59 +0800
3c53ded31
loongarch64: Add c/znrm2 optimization functions. by
2023-12-28 20:26:01 +0800
fbd612f8c
loongarch64: Add ic/zamin optimization functions. by
2023-12-28 20:07:58 +0800
d97272cb3
loongarch64: Add c/zdot optimization functions. by
2023-12-28 19:09:18 +0800
65a0aeb12
loongarch64: Add c/zcopy optimization functions. by
2023-12-28 17:45:17 +0800
2a34fb4b8
loongarch64: Add and refine scal optimization functions. by
2023-12-27 18:17:51 +0800
8785e948b
loongarch64: Add camin optimization function. by
2023-12-27 17:04:46 +0800
0753848e0
loongarch64: Refine and add axpy optimization functions. by
2023-12-27 16:54:01 +0800
06fd5b599
loongarch64: Add and Refine asum optimization functions. by
2023-12-27 10:44:02 +0800
e771be185
Optimize copy functions with lsx. by
2023-12-21 14:28:06 +0800
179ed51d3
Add dgemm_kernel_8x4.S file. by
2023-12-21 14:18:39 +0800
173a65d4e
loongarch64: Add and refine iamax optimization functions. by
2023-12-25 15:11:04 +0800
ea70e165c
loongarch64: Refine rot optimization. by
2023-12-28 20:07:59 +0800
116aee752
loongarch64: Refine imin optimization. by
2023-12-28 15:17:28 +0800
8be265419
loongarch64: Refine imax optimization. by
2023-12-28 10:24:24 +0800
154baad45
loongarch64: Refine iamin optimization. by
2023-12-27 16:04:33 +0800
36c12c497
loongarch64: Refine copy,swap,nrm2,sum optimization. by
2023-12-27 11:30:17 +0800
c6996a80e
loongarch64: Refine amax,amin,max,min optimization. by
2023-12-08 16:06:17 +0800
21564bde2
Merge pull request #4394 from martin-frbg/dyn_vortex by
2023-12-28 13:35:55 +0100
75fe9c21e
(refs/pull/4397/head)
Scale P and Q with L2 cache size for SVE by
2023-12-27 17:52:19 +0000
e9c32ed16
Merge pull request #4384 from yetist/develop by
2023-12-27 14:05:01 +0100
e7a895e71
(refs/pull/4394/head)
Add Apple M as NeoverseN1 by
2023-12-25 12:36:05 +0100
474ce0ace
Merge pull request #4393 from martin-frbg/pr4389-2 by
2023-12-25 12:30:56 +0100
1106460bb
(refs/pull/4393/head)
remove redundant targets from the default ARM64 DYNAMIC_ARCH list by
2023-12-25 12:29:56 +0100
236acee70
Merge pull request #4389 from Mousius/reduce-dynamic-targets by
2023-12-25 12:27:42 +0100
d2f4f1b28
(refs/pull/4384/head)
CI: update toolchains for LoongArch64 by
2023-12-20 14:13:04 +0800
0baf462db
Fix: build failed on LoongArch by
2023-12-20 10:34:47 +0800
63a83939a
Merge pull request #4390 from Mousius/reduce-kernel-duplication by
2023-12-24 18:04:26 +0100
dba404055
Merge pull request #4392 from martin-frbg/lapack959 by
2023-12-24 10:44:15 +0100
c6fa92102
(refs/pull/4392/head)
Add tests for ?GEDMD (Reference-LAPACK PR 959) by
2023-12-23 23:39:53 +0100
283713e4c
Add tests for ?GEDMD (Reference-LAPACK PR 959) by
2023-12-23 23:32:45 +0100
201f22f49
Fix issues related to ?GEDMD (Reference-LAPACK PR 959) by
2023-12-23 23:27:38 +0100
05dde8ef0
Merge pull request #4391 from martin-frbg/lapack942 by
2023-12-23 23:11:46 +0100
45ef0d736
(refs/pull/4391/head)
Handle corner cases of LWORK (Reference-LAPACK PR 942) by
2023-12-23 20:16:33 +0100
c082669ad
Handle corner cases of LWORK (Reference-LAPACK PR 942) by
2023-12-23 20:05:03 +0100
29d6024ec
Handle corner cases of LWORK (Reference-LAPACK PR 942) by
2023-12-23 19:44:11 +0100
0814491d9
Handle corner cases of LWORK (Reference-LAPACK PR 942) by
2023-12-23 19:37:03 +0100
5c11b2ff4
Handle corner cases of LWORK (Reference-LAPACK PR 942) by
2023-12-23 19:27:20 +0100
8ce44c18a
Handle corner cases of LWORK (Reference-LAPACK PR 942) by
2023-12-23 19:24:10 +0100
dc20a7818
(refs/pull/4389/head)
Use functionally equivalent dynamic targets by
2023-12-23 12:19:33 +0000
ecae1389d
(refs/pull/4390/head)
Reduce duplication in kernel definitions by
2023-12-23 12:21:48 +0000
68ef2328e
Merge pull request #4388 from martin-frbg/issue4387 by
2023-12-21 22:21:44 +0100
a7ed60bfe
(refs/pull/4388/head)
Add lower limit for multithreading by
2023-12-21 20:05:23 +0100
67779177b
Merge pull request #4383 from martin-frbg/fixlapatest by
2023-12-20 14:01:59 +0100
e67a0eaaf
(refs/pull/4383/head)
Restore OpenBLAS-specific build rule changes by
2023-12-19 23:15:11 +0100
bb8b91e9f
restore OpenBLAS-specific test paths by
2023-12-19 23:13:02 +0100
fa220b296
Merge pull request #4382 from Mousius/sve-dot-again by
2023-12-19 18:46:18 +0100
3f46d0c79
Merge pull request #4381 from darshanp4/issue_4323 by
2023-12-19 16:53:53 +0100
60e66725e
(refs/pull/4382/head)
Use numeric labels to allow repeated inlining by
2023-12-19 13:11:06 +0000
7a4fef4f6
Tweak SVE dot kernel by
2023-12-15 12:50:48 +0000
dab0da824
(refs/pull/4381/head)
Update GEMM param for NEOVERSEV1 by
2023-12-19 13:56:55 +0530
5bdde6299
(refs/pull/4380/head)
test loading numpy/openblas on neoversen1 by
2023-12-17 18:42:30 +0100
3b520a56a
Merge pull request #4378 from martin-frbg/issue3871 by
2023-12-15 21:58:56 +0100
563daadc9
Merge pull request #4379 from barracuda156/ppc970 by
2023-12-15 20:03:44 +0100
8c143331b
(refs/pull/4379/head)
PPC970: drop -mcpu=970 which seems to produce faulty code by
2023-12-15 22:55:52 +0800
d2f1594bc
Merge pull request #4368 from martin-frbg/issue4073 by
2023-12-15 14:49:52 +0100
544cb8630
(refs/pull/4378/head)
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists by
2023-12-15 14:03:59 +0100
8793601e8
Merge pull request #4375 from martin-frbg/issue4352 by
2023-12-15 13:35:18 +0100
f06b53556
(refs/pull/4375/head)
Use C kernel for dgemv_t due to limitations of the old assembly one by
2023-12-15 09:58:44 +0100
293131d6b
Merge pull request #4370 from barracuda156/unbreak_powerpc by
2023-12-14 10:30:03 +0100
981e315b3
(refs/pull/4370/head)
cc.cmake: use -force_cpusubtype_ALL for Darwin PPC by
2023-12-14 12:01:31 +0800
d9653af01
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing by
2023-12-13 19:23:50 +0800
302ca7edc
Merge pull request #4371 from barracuda156/970 by
2023-12-13 14:32:37 +0100
a8d3619f6
(refs/pull/4371/head)
cc.cmake: add optflags for G5 and G4 kernels by
2023-12-13 19:42:56 +0800
aa46f1e4e
(refs/pull/4368/head)
revert addition of MSVC-compatible complex (moved to lapacke_config.h) by
2023-12-12 23:07:48 +0100
dcdc35127
Add MSVC-compatible complex types by
2023-12-12 23:06:22 +0100
55a0718f7
Merge pull request #4369 from ChipKerchner/power10Copies by
2023-12-12 18:49:21 +0100
93747fb37
(refs/pull/4369/head)
Merge remote-tracking branch 'origin/develop' into power10Copies by
2023-12-12 09:32:49 -0600
dcf6999c4
remove extraneous endif by
2023-12-12 11:27:17 +0100
6bd7c54af
introduce MT_TRACE to clean up SMP_DEBUG code by
2023-12-11 15:13:04 -0800
330101e0b
Add complex type definitions for MSVC by
2023-12-11 21:52:00 +0100
d9f147806
Merge pull request #4367 from barracuda156/unbreak_powerpc by
2023-12-11 21:38:32 +0100
9dbc8129b
(refs/pull/4367/head)
cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case by
2023-12-11 21:09:06 +0800
c732f275a
system_check.cmake: fix arch detection for Darwin PowerPC by
2023-12-11 21:05:31 +0800
e60fb0f39
Merge pull request #4359 from mseminatore/win_perf by
2023-12-09 23:40:26 +0100
efa9515a2
(refs/pull/4359/head)
Merge branch 'OpenMathLib:develop' into win_perf by
2023-12-09 10:09:49 -0800
4e738e561
Replace two vector loads with one vector pair load and fix endianness of stores. by
2023-12-08 12:36:08 -0600
1332f8a82
Merge pull request #4159 from OMaghiarIMG/risc-v-tail-policy by
2023-12-08 10:25:41 +0100
edac80d7e
some cleanup, dynamically scale threads, add missing WIN_CASE defn by
2023-12-07 14:59:27 -0800
2d316c292
Merge pull request #4125 from OMaghiarIMG/risc-v by
2023-12-07 14:50:58 +0100
5b09833b1
Merge pull request #4019 from uniontech-lilinjie/develop by
2023-12-07 14:46:17 +0100
3193aa9c7
Merge pull request #4362 from yinshiyou/la-dev by
2023-12-07 09:15:15 +0100
d32f38fb3
(refs/pull/4362/head)
loongarch64: Add optimizations for nrm2. by
2023-12-07 13:15:55 +0800
f9b468990
loongarch64: Add optimizations for rot. by
2023-12-07 13:12:29 +0800
c80e7e27d
loongarch64: Add optimizations for sum and asum. by
2023-12-07 13:08:03 +0800
d4c96a35a
loongarch64: Add optimizations for axpy and axpby. by
2023-12-07 13:02:03 +0800
360acc0a4
loongarch64: Add optimizations for swap. by
2023-12-07 12:57:05 +0800
174c25766
loongarch64: Add optimizations for copy. by
2023-12-07 12:15:46 +0800
49829b2b7
loongarch64: Add optimizations for iamin. by
2023-12-07 12:11:30 +0800
be83f5e4e
loongarch64: Add optimizations for iamax. by
2023-12-07 12:07:30 +0800
e3fb2b5af
loongarch64: Add optimizations for imin. by
2023-12-07 12:01:05 +0800
e46b48e37
loongarch64: Add optimizations for imax. by
2023-12-07 11:56:41 +0800
702fc1d56
loongarch64: Add optimization for min. by
2023-12-07 11:51:19 +0800