Martin Kroeker
2663e44724
Update version to 0.3.14 for release
4 years ago
Martin Kroeker
6f2900c164
Merge pull request #3149 from martin-frbg/changelog14
Update Changelog for 0.3.14
4 years ago
Martin Kroeker
7888b5127c
Update Changelog for 0.3.14
4 years ago
Martin Kroeker
8808c291b9
Merge pull request #3148 from martin-frbg/issue3145
Add workaround for older gcc on big-endian ppc64 not supporting casts in defines
4 years ago
Martin Kroeker
8cdf0825de
Add workaround for older gcc on ppc64be not supporting casts in defines
4 years ago
Martin Kroeker
9e0dbe8e59
Merge pull request #18 from xianyi/develop
rebase
4 years ago
Martin Kroeker
52f99d3944
Merge pull request #3147 from martin-frbg/issue3146
Fix DYNAMIC_ARCH builds with CLANG on ppc64
4 years ago
Martin Kroeker
186368ddc3
Fix compilation with CLANG
4 years ago
Martin Kroeker
c0b94ae1df
Merge pull request #3143 from martin-frbg/fix3088
Resolve circular dependency between common.h and param.h
4 years ago
Martin Kroeker
ddd86309a1
Merge pull request #3144 from xoviat/fix-test
disable openmp
4 years ago
xoviat
e9d453b623
disable openmp
4 years ago
Martin Kroeker
ecb4babcf4
remove inclusion of common.h again to avoid circular dependency
4 years ago
Martin Kroeker
34753eaebb
Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies
4 years ago
Martin Kroeker
efa72a631b
Merge pull request #17 from xianyi/develop
rebase
4 years ago
Martin Kroeker
30d835168a
Merge pull request #3088 from xoviat/msvc
add misc fixes.
4 years ago
Martin Kroeker
8f6a744807
Merge pull request #3141 from martin-frbg/nagfor-2
Leave out ARM64 march/mtune options when compiling with nagfor
4 years ago
Martin Kroeker
6726771645
Support compilation with NAG fortran
4 years ago
Martin Kroeker
a51cae6b2e
Merge pull request #3140 from martin-frbg/issue3139
Fix compilation on older x86_64 targets with old compilers that lack intrinsics support
4 years ago
Martin Kroeker
d30b943251
Merge pull request #3138 from martin-frbg/nagfor
Add support for compilation with the NAG Fortran compiler
4 years ago
Martin Kroeker
0934568d9c
Move includes under the ifdef for compilers w/o intrinsics support
4 years ago
Martin Kroeker
697e64bbb6
Fix syntax
4 years ago
Martin Kroeker
bffb9b0e95
Merge pull request #3136 from austinpagan/Gemm.PQ
Modifying a couple parameters in the "POWER10"-specific section of pa…
4 years ago
Martin Kroeker
6ae7af78a3
Support compilation with nagfor
4 years ago
Martin Kroeker
041a26fd79
Support compilation with nagfor
4 years ago
Martin Kroeker
3c356b1a1f
Support compilation with the NAG Fortran compiler
4 years ago
Martin Kroeker
b1215f2f8c
Merge pull request #16 from xianyi/develop
rebase
4 years ago
Martin Kroeker
0b73041b16
Merge pull request #3137 from RajalakshmiSR/zscal_p10
Optimize zscal function for POWER10
4 years ago
austinpagan
9579bd47e5
Modifying a couple parameters in the "POWER10"-specific section of param.h, for performance enhancements for SGEMM and DGEMM.
5 years ago
Rajalakshmi Srinivasaraghavan
09d47af2c0
Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Martin Kroeker
ef0238ba2b
Merge pull request #3130 from martin-frbg/issue3128
Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard
5 years ago
Martin Kroeker
a9f6f7ad39
Remove spurious AVX512 requirement and add AVX2/FMA3 guard
5 years ago
Martin Kroeker
1d254d321b
Merge pull request #3129 from RajalakshmiSR/asum_p10
Optimize s/dasum function for POWER10
5 years ago
Rajalakshmi Srinivasaraghavan
41646ed006
Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Martin Kroeker
3679781872
Merge pull request #3126 from martin-frbg/m1bench
Support timing Apple M1 in the benchmarks
5 years ago
Martin Kroeker
38dcf3454b
Support timing Apple M1
5 years ago
Martin Kroeker
e34d57ca90
Merge pull request #3125 from martin-frbg/issue3123
Fix AMD AOCC compiler detection
5 years ago
Martin Kroeker
20f492c298
Fix AMD AOCC compiler detection
5 years ago
Martin Kroeker
c7c82be1c3
Merge pull request #3122 from martin-frbg/xeigtstz
Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)
5 years ago
Martin Kroeker
9564f688c4
Adjust build rules for ?chkee.F
5 years ago
Martin Kroeker
90c1776c86
Adjust build rules for ?chkee.F
5 years ago
Martin Kroeker
9cf861e8fa
Add rewritten cchkee.F from Reference-LAPACK PR335
5 years ago
Martin Kroeker
9b7b1da133
Add rewritten dchkee.F from Reference-LAPACK PR335
5 years ago
Martin Kroeker
a5ab891292
Add rewritten schkee.F from Reference-LAPACK PR335
5 years ago
Martin Kroeker
90bb4ac821
Add rewritten zchkee.F from Reference-LAPACK PR335
5 years ago
Martin Kroeker
23a0d1bc1f
Delete zchkee.f
5 years ago
Martin Kroeker
0e96c378fd
Delete schkee.f
5 years ago
Martin Kroeker
ee16efff3c
Delete dchkee.f
5 years ago
Martin Kroeker
0197519dd7
Delete cchkee.f
5 years ago
Martin Kroeker
865829cfac
Merge pull request #3121 from RajalakshmiSR/mmarename
POWER10: Rename mma builtins
5 years ago
Rajalakshmi Srinivasaraghavan
0571c3187b
POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch makes the
corresponding changes in the dgemm kernel. It also changes the
inputs to those builtins to avoid some potential typecasting issues.
Reference GCC commit id: 77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
5 years ago