Martin Kroeker
697e64bbb6
Fix syntax
4 years ago
Martin Kroeker
6ae7af78a3
Support compilation with nagfor
4 years ago
Martin Kroeker
041a26fd79
Support compilation with nagfor
4 years ago
Martin Kroeker
3c356b1a1f
Support compilation with the NAG Fortran compiler
4 years ago
Martin Kroeker
b1215f2f8c
Merge pull request #16 from xianyi/develop
rebase
4 years ago
Martin Kroeker
0b73041b16
Merge pull request #3137 from RajalakshmiSR/zscal_p10
Optimize zscal function for POWER10
4 years ago
Rajalakshmi Srinivasaraghavan
09d47af2c0
Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Martin Kroeker
ef0238ba2b
Merge pull request #3130 from martin-frbg/issue3128
Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard
5 years ago
Martin Kroeker
a9f6f7ad39
Remove spurious AVX512 requirement and add AVX2/FMA3 guard
5 years ago
Martin Kroeker
1d254d321b
Merge pull request #3129 from RajalakshmiSR/asum_p10
Optimize s/dasum function for POWER10
5 years ago
Rajalakshmi Srinivasaraghavan
41646ed006
Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Martin Kroeker
3679781872
Merge pull request #3126 from martin-frbg/m1bench
Support timing Apple M1 in the benchmarks
5 years ago
Martin Kroeker
38dcf3454b
Support timing Apple M1
5 years ago
Martin Kroeker
e34d57ca90
Merge pull request #3125 from martin-frbg/issue3123
Fix AMD AOCC compiler detection
5 years ago
Martin Kroeker
20f492c298
Fix AMD AOCC compiler detection
5 years ago
Martin Kroeker
c7c82be1c3
Merge pull request #3122 from martin-frbg/xeigtstz
Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)
5 years ago
Martin Kroeker
9564f688c4
Adjust build rules for ?chkee.F
5 years ago
Martin Kroeker
90c1776c86
Adjust build rules for ?chkee.F
5 years ago
Martin Kroeker
9cf861e8fa
Add rewritten cchkee.F from Reference-LAPACK PR335
5 years ago
Martin Kroeker
9b7b1da133
Add rewritten dchkee.F from Reference-LAPACK PR335
5 years ago
Martin Kroeker
a5ab891292
Add rewritten schkee.F from Reference-LAPACK PR335
5 years ago
Martin Kroeker
90bb4ac821
Add rewritten zchkee.F from Reference-LAPACK PR335
5 years ago
Martin Kroeker
23a0d1bc1f
Delete zchkee.f
5 years ago
Martin Kroeker
0e96c378fd
Delete schkee.f
5 years ago
Martin Kroeker
ee16efff3c
Delete dchkee.f
5 years ago
Martin Kroeker
0197519dd7
Delete cchkee.f
5 years ago
Martin Kroeker
865829cfac
Merge pull request #3121 from RajalakshmiSR/mmarename
POWER10: Rename mma builtins
5 years ago
Rajalakshmi Srinivasaraghavan
0571c3187b
POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair, respectively. This patch makes the
corresponding changes in the dgemm kernel and also changes the inputs to
those built-ins to avoid potential typecasting issues.
Reference GCC commit ID: 77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
5 years ago
Martin Kroeker
d12a2d0d04
Merge pull request #3120 from martin-frbg/3118-x
Fix use of undefined CC variable in f_check
5 years ago
Martin Kroeker
2d369bd916
fix undefined CC variable
5 years ago
Martin Kroeker
93843c55b6
Merge pull request #15 from xianyi/develop
rebase
5 years ago
Martin Kroeker
e3a6132e12
Merge pull request #3119 from xianyi/revert-3118-issue3018-2
Revert "Fix undefined CC in f_check (again)"
5 years ago
Martin Kroeker
736f0146c3
Revert "Fix undefined CC in f_check (again)"
5 years ago
Martin Kroeker
897fc2b6ef
Merge pull request #3118 from martin-frbg/issue3018-2
Fix undefined CC in f_check (again)
5 years ago
Martin Kroeker
441c116105
fix undefined CC again
5 years ago
Martin Kroeker
8ecd80a34a
Merge pull request #14 from xianyi/develop
rebase
5 years ago
Martin Kroeker
4ba53db0da
Merge pull request #3117 from haampie/fix-perl
use /usr/bin/env perl
5 years ago
Martin Kroeker
6c365ff648
Merge pull request #3114 from martin-frbg/issue3113
Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64
5 years ago
Martin Kroeker
e33bcdbb7b
Merge pull request #3115 from martin-frbg/issue2532
Replace unoptimized OMATCOPY_RT with 4x4 blocked version
5 years ago
Harmen Stoppels
ec6b354c32
use /usr/bin/env perl
5 years ago
Martin Kroeker
292d1af1a0
Update omatcopy_rt.c
5 years ago
Martin Kroeker
325b398e3c
Update omatcopy_rt.c
5 years ago
Martin Kroeker
6f5667b4d4
Enable optimized S/D OMATCOPY_RT
5 years ago
Martin Kroeker
cceeee7806
Add optimized omatcopy_rt
5 years ago
Martin Kroeker
0a4546b742
Typo fix
5 years ago
Martin Kroeker
b1eed27a54
Replace naive omatcopy_rt with 4x4 blocked implementation
as suggested by MigMuc in issue 2532
5 years ago
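As a rough illustration of what a "4x4 blocked" out-of-place scaled transpose can look like, here is a minimal sketch. It is not the code from this commit: the function name scaled_transpose_blocked, the TILE macro, the row-major layout, and the use of double are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* tile edge; 4 mirrors the "4x4" in the commit message */

/* Hypothetical helper, not the OpenBLAS kernel:
 * computes b = alpha * transpose(a) for row-major matrices.
 * a is rows x cols with leading dimension lda,
 * b is cols x rows with leading dimension ldb. */
void scaled_transpose_blocked(size_t rows, size_t cols, double alpha,
                              const double *a, size_t lda,
                              double *b, size_t ldb)
{
    assert(lda >= cols && ldb >= rows);

    /* Walk the source in TILE x TILE tiles so that both the reads from a
     * and the writes to b stay within a small working set. */
    for (size_t i0 = 0; i0 < rows; i0 += TILE) {
        size_t i1 = (i0 + TILE < rows) ? i0 + TILE : rows;
        for (size_t j0 = 0; j0 < cols; j0 += TILE) {
            size_t j1 = (j0 + TILE < cols) ? j0 + TILE : cols;
            /* Partial tiles at the right and bottom edges are handled
             * by the clamped bounds i1 and j1. */
            for (size_t i = i0; i < i1; i++) {
                for (size_t j = j0; j < j1; j++) {
                    b[j * ldb + i] = alpha * a[i * lda + j];
                }
            }
        }
    }
}
```

A naive version would run the two inner loops over the whole matrix, which makes the writes to b stride through memory by ldb on every step. Tiling keeps each group of strided writes confined to a few rows of b at a time.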
Martin Kroeker
1a3ad4b670
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64
5 years ago
Martin Kroeker
86a5f98e4a
Merge pull request #13 from xianyi/develop
rebase
5 years ago
Martin Kroeker
1caa44bea9
Merge pull request #3111 from hawkinsp/forkrace
Fix race in blas_thread_shutdown.
5 years ago
Peter Hawkins
dbbf92c1d1
Fix race in blas_thread_shutdown.
blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
5 years ago
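The pattern described in this commit message, checking the "server available" flag only while holding the lock, can be sketched as follows. This is a simplified illustration under stated assumptions, not the OpenBLAS source: the names server_avail, teardown_count, and shutdown_once are hypothetical, and the actual teardown work is reduced to a counter.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the state named in the commit message. */
static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;
static int server_avail = 1;    /* 1 while the worker pool is running */
static int teardown_count = 0;  /* counts how often teardown really ran */

/* Returns 1 if this call performed the shutdown, 0 if it was already done.
 * The flag is read and cleared under the same lock, so concurrent callers
 * cannot both observe server_avail == 1 and tear down twice. */
int shutdown_once(void)
{
    int did_shutdown = 0;

    pthread_mutex_lock(&server_lock);
    if (server_avail) {
        /* Real code would stop and join worker threads here. */
        teardown_count++;
        server_avail = 0;
        did_shutdown = 1;
    }
    pthread_mutex_unlock(&server_lock);

    assert(teardown_count <= 1);
    return did_shutdown;
}
```

Reading the flag before taking the lock would reopen the window the commit closes: two callers could both see the flag set, and both would proceed to the teardown.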