Martin Kroeker
bd906e3410
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
5 years ago
Martin Kroeker
3dbb32c734
Merge pull request #11 from xianyi/develop
rebase
5 years ago
Martin Kroeker
00880c720a
Merge pull request #3087 from martin-frbg/lapack477
Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ
5 years ago
Martin Kroeker
856bc36533
Add exceptional shift to fix rare convergence problems
5 years ago
Martin Kroeker
fe71887b68
Merge pull request #10 from xianyi/develop
rebase
5 years ago
Martin Kroeker
10094bd885
Merge pull request #3076 from martin-frbg/dyn-thunderx
Add CI job for ARM64/gcc10 DYNAMIC_ARCH
5 years ago
Martin Kroeker
eea0c0f2ed
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
5 years ago
Martin Kroeker
85be43e0df
Merge pull request #3083 from martin-frbg/develop
Add DYNAMIC_LIST support for ARM64
5 years ago
Martin Kroeker
0cb9e9fc8d
Remove the VORTEX support bits again for now
5 years ago
Martin Kroeker
cb61d3b46b
Add DYNAMIC_LIST support for ARM64
5 years ago
Alex Henrie
113840da12
Fix null pointer check in blas_memory_alloc
5 years ago
Martin Kroeker
deb2e66bcc
Add DYNAMIC_LIST support for ARM64
5 years ago
Martin Kroeker
9b2d69aa80
Add DYNAMIC_LIST option for ARM64
5 years ago
Martin Kroeker
e3ff4cdd23
Merge pull request #9 from xianyi/develop
rebase
5 years ago
Martin Kroeker
0745ba43a4
Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
5 years ago
Rajalakshmi Srinivasaraghavan
3ede843d50
Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Martin Kroeker
69a5558203
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
5 years ago
Martin Kroeker
d6905403e3
Merge pull request #3068 from alexhenrie/scan-build
scan-build fixes
5 years ago
Martin Kroeker
411926b572
Merge pull request #3079 from RajalakshmiSR/rotp10
Optimize s/drot function for POWER10
5 years ago
Rajalakshmi Srinivasaraghavan
439b93f6d2
Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Martin Kroeker
d6cf67778c
Merge pull request #3075 from martin-frbg/issue3074
Fix DYNAMIC_ARCH compilation on POWER with gcc <11
5 years ago
Martin Kroeker
b94dab5250
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
5 years ago
Martin Kroeker
6178974cd9
Update .drone.yml
5 years ago
Martin Kroeker
0b9e4d1278
Add gcc10/arm64 DYNAMIC_ARCH build
5 years ago
Martin Kroeker
63fa3c3f8f
Require gcc 11 for builtin_cpu_is(power10)
fixes #3074
5 years ago
Martin Kroeker
3612d9a57a
Merge pull request #8 from xianyi/develop
rebase
5 years ago
Martin Kroeker
16dddb760e
Merge pull request #3070 from RajalakshmiSR/cdot
Optimize cdot function for POWER10
5 years ago
Rajalakshmi Srinivasaraghavan
eff7c9166e
Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Alex Henrie
f1bf2603e6
Remove dead assignment to dflag in rotmg functions
5 years ago
Alex Henrie
6f32991eae
Don't define the mode variable when not needed in gemm functions
5 years ago
Alex Henrie
202fc9e8ed
Fix uninitialized argument value in dasum_k
5 years ago
Martin Kroeker
e378b24487
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
5 years ago
Martin Kroeker
3628b22d49
Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
5 years ago
Martin Kroeker
af2b0d0205
Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
5 years ago
Martin Kroeker
4bf988959a
Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
5 years ago
Martin Kroeker
a0e4fb3a28
Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
5 years ago
Martin Kroeker
2c445be8ba
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
5 years ago
Albert Ziegenhagel
e3f4063683
Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches cannot cause this problem again.
5 years ago
Martin Kroeker
6bbe6d5b92
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor
5 years ago
Martin Kroeker
89ae305e11
Workaround for cmake having its own C_COMPILER variable
5 years ago
Martin Kroeker
da8d7f09f1
try to work around gcc update problems
5 years ago
Martin Kroeker
25c986db5a
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG
5 years ago
Martin Kroeker
a8f249458d
Build CBLAS interfaces for CROTG and ZROTG as well
5 years ago
Martin Kroeker
bc5b35367f
restore Makefile after accidental overwrite
5 years ago
Martin Kroeker
930aff2c2e
Build CBLAS interfaces for CROTG and ZROTG as well
5 years ago
Martin Kroeker
ac3e2a3fdd
Add CBLAS interfaces for csrot and zdrot
5 years ago
Martin Kroeker
9ccb12b031
Add prototypes for cblas_csrot and cblas_zdrot
5 years ago
Martin Kroeker
e18a2c22db
Merge pull request #3060 from martin-frbg/dyn_arm64
Label the assembly part of the ARMV8 dynamic arch detection as volatile
5 years ago
Martin Kroeker
b716c0ef01
Add workaround for NVIDIA HPC
5 years ago
Martin Kroeker
2efa3b70dc
Add workaround for NVIDIA HPC
5 years ago