Martin Kroeker
522aaf53bf
Break out of potentially infinite rescaling loop in LAPACK xLARGV/xLARTG/xLARTGP
Reference-LAPACK issue 411
6 years ago
Martin Kroeker
0f9a935a5a
Merge pull request #62 from xianyi/develop
rebase
6 years ago
Martin Kroeker
79cd69fea4
Merge pull request #2644 from martin-frbg/cmake-maxstack
Add CMAKE support for MAX_STACK_ALLOC setting
6 years ago
Martin Kroeker
bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Windows
6 years ago
Martin Kroeker
32c1c1e125
Update azure-pipelines.yml
6 years ago
Martin Kroeker
f1953b8b81
Update azure-pipelines.yml
6 years ago
Martin Kroeker
6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting
6 years ago
Martin Kroeker
729303e5ed
Merge pull request #2643 from craft-zhang/cortex-a53
Improve performance of SGEMM on Arm Cortex-A53
6 years ago
Martin Kroeker
547965530f
Merge pull request #2638 from leezu/actions
Add GitHub Actions test for DYNAMIC_ARCH builds on Linux and macOS
6 years ago
ZhangDanfeng
9b7877ccf1
sgemm copy source init
Signed-off-by: ZhangDanfeng <467688405@qq.com>
6 years ago
ZhangDanfeng
f82fa802d1
Insert prefetch
Signed-off-by: ZhangDanfeng <467688405@qq.com>
6 years ago
Martin Kroeker
3eda3d34c3
Merge pull request #2641 from martin-frbg/ppcg4
Work around PPC G4 test failures
6 years ago
Martin Kroeker
a8f42ae85c
set cmake build type to Release
6 years ago
Martin Kroeker
e6e2e531bc
revert clang pragma
6 years ago
Martin Kroeker
456dc04441
Update sgemm_kernel_16x4_skylakex_3.c
6 years ago
Martin Kroeker
89323458a9
preset optimization level for apple clang
6 years ago
Martin Kroeker
e153bdeb70
Update dynamic_arch.yml
6 years ago
Martin Kroeker
c2001f7756
Make cmake build verbose to see options in use
6 years ago
Martin Kroeker
c2b3f0b3f6
Revert "keep Apple Clang from optimizing this"
6 years ago
Martin Kroeker
f16e39554d
Change PPCG4 CGEMM_M to match kernel change
6 years ago
Martin Kroeker
b1ee81228a
Change complex DOT and ROT to generic kernels and switch CGEMM
in response to test failures seen in #2628 and BLAS-Tester
6 years ago
Martin Kroeker
9f7358d7dc
Keep Apple Clang from optimizing this
6 years ago
Martin Kroeker
54fa90fb25
Keep apple clang 11.0.3 from trying to optimize this (and running out of registers)
6 years ago
Leonard Lausen
5a709b8340
Print CPU info in output
6 years ago
Leonard Lausen
b31a68b835
Add GitHub Actions test for DYNAMIC_ARCH builds
6 years ago
Martin Kroeker
a349d48d89
Merge pull request #2636 from martin-frbg/issue2634
Fix CMAKE build Issues on OS X
6 years ago
Martin Kroeker
4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux)
6 years ago
Martin Kroeker
909897f13b
Document option USE_LOCKING
6 years ago
Martin Kroeker
e79245acd9
Merge pull request #2635 from ilayn/patch-1
BUG: Fix the loop range in ZHEEQUB.f
6 years ago
Ilhan Polat
76d2612e0c
BUG: Fix the loop range in ZHEEQUB.f
6 years ago
Martin Kroeker
dd7a650792
Merge pull request #59 from xianyi/develop
rebase
6 years ago
Martin Kroeker
4a4c50a7ce
Merge pull request #2627 from pkubaj/patch-1
Add powerpc (32-bit)
6 years ago
Martin Kroeker
d069780e63
Merge pull request #2626 from docularxu/working-gcc-version-detections
make GCC version detection OS-independent
6 years ago
pkubaj
33c8790603
Add powerpc (32-bit)
Only powerpc64 is present.
6 years ago
Guodong Xu
06387ac0e6
make GCC version detection OS-independent
The previous design put GCC version detection inside of OSNAME 'WINNT'.
However, such detection is required for 'Linux' and possibly other
OSes as well. For example, Makefile.arm64 makes use of the GCC
version. When compiling on a Linux machine with the previous
design, Makefile.arm64 would not know the correct GCC version.
The fix is to move GCC version detection into the common part, not
wrapped by anything.
Signed-off-by: Guodong Xu <guodong.xu@linaro.com>
6 years ago
Martin Kroeker
f1a18d245b
Merge pull request #2618 from craft-zhang/cortex-A53
Improve performance of SGEMM and STRMM on Arm Cortex-A53
6 years ago
张丹枫
2a3aa91354
update CONTRIBUTORS.md, adding myself
6 years ago
张丹枫
ea5bdc3f72
split cortex-a53 param to match 8x8 kernel
6 years ago
张丹枫
9df79ae9a3
update sgemm and strmm kernel selecting strategy
6 years ago
张丹枫
a1fc6041cd
use general registers to speed up
6 years ago
张丹枫
edb423d772
align general register usage with strmm_kernel_8x8
6 years ago
zhangdanfeng
0e6eb8c247
sgemm kernel use sgemm_kernel_8x8_cortexa53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
6 years ago
zhangdanfeng
d475db29c6
optimized for cortex-a53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
6 years ago
Martin Kroeker
729ac6bd4a
Merge pull request #2623 from mhillenibm/zarch_dgemm_z14
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 (+ small cleanup)
6 years ago
Marius Hillenbrand
89fe17f20e
s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14
Apply our new GEMM kernel implementation, written in C with vector intrinsics,
also for DGEMM and DTRMM on Z14 and newer (i.e., architectures with FP32 SIMD
instructions). As a result, we gain around 10% in performance on z15, in
addition to improving maintainability.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
6 years ago
Marius Hillenbrand
bdd795ed03
s390x/GEMM: replace 0-init with peeled first iteration
... since it gains another ~2% of SGEMM and DGEMM performance on z15;
also, the code just called for that cleanup.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
6 years ago
Martin Kroeker
e1038ea836
Merge pull request #2622 from martin-frbg/issue2619
Improve declaration of LAPACKE_get_nancheck
6 years ago
Martin Kroeker
6baa9a778d
Improve declaration of LAPACKE_get_nancheck
6 years ago
Martin Kroeker
cf46c9f84e
Merge pull request #2617 from martin-frbg/issue2616
Add workaround for unhandled gmake jobserver flags in c_check/f_check
6 years ago
Martin Kroeker
55602fce56
Ignore spurious all-numeric library names derived from mishandled jobserver flags
6 years ago