Martin Kroeker
5fdf9ad24f
Merge pull request #2228 from martin-frbg/issue2227
Add Intel Goldmont Plus CPUID
6 years ago
Martin Kroeker
2fe967c542
Merge branch 'develop' into issue2227
6 years ago
Martin Kroeker
6d8595351c
Add Intel Goldmont Plus CPUID
fixes #2227
6 years ago
Martin Kroeker
f40200f559
Merge pull request #2223 from martin-frbg/getarch-pgi
Make getarch compile with PGI
6 years ago
Martin Kroeker
a95a5e52b8
Fix PGI compiler detection for getarch
6 years ago
Martin Kroeker
e3d846ab57
Do not use -march=native with the PGI compiler
6 years ago
Martin Kroeker
8506386d82
Merge pull request #1 from xianyi/develop
rebase
6 years ago
Martin Kroeker
9ef96b32a6
Add multithreading support to the x86_64 zdot kernel (#2222)
* Add multithreading support
copied from the ThunderX2T99 kernel. For #2221
6 years ago
Martin Kroeker
b48c025974
Merge pull request #2218 from martin-frbg/issue2215
Make the new DGEMM regression test properly depend on CBLAS and LAPACKE
6 years ago
Martin Kroeker
a1fce67743
Make the new DGEMM regression test properly depend on CBLAS and LAPACKE
fixes #2215
6 years ago
Martin Kroeker
103b32fdb7
Merge pull request #2216 from martin-frbg/issue2214
Remove case-sensitivity in x86 LSAME on (AMD) cpus without CMOV
6 years ago
Martin Kroeker
aef9804089
Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV
The problem was already noticed some years ago in #238, but back then it was only corrected in one of the #ifdef branches.
Fixes #2214
6 years ago
Martin Kroeker
303869f572
Update with changes from 0.3.7
6 years ago
Martin Kroeker
02d9203981
Increment version to 0.3.8.dev
6 years ago
Martin Kroeker
7b6808b69c
Increment version to 0.3.8.dev
6 years ago
Martin Kroeker
321288597c
Merge pull request #2212 from martin-frbg/nofort-nolib
Avoid spurious dependency on the fortran runtime despite NOFORTRAN=1
6 years ago
Martin Kroeker
be147a9f28
Avoid adding a spurious dependency on the fortran runtime despite NOFORTRAN=1
for cases where a fortran compiler is present but not wanted (e.g. not fully functional)
6 years ago
Martin Kroeker
c275290ea6
Merge pull request #2211 from martin-frbg/arm64_gcc_trivial
Silence two nuisance warnings from gcc
6 years ago
Martin Kroeker
b7bbb02447
Silence two nuisance warnings from gcc
6 years ago
Martin Kroeker
bf1430f7d7
Merge pull request #2208 from martin-frbg/munmap-debug
Provide more information on mmap/munmap failure
6 years ago
Martin Kroeker
dccff2e785
Merge pull request #2206 from martin-frbg/zen-dtrmm
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
6 years ago
Martin Kroeker
5c3458a6e7
Merge pull request #2199 from martin-frbg/zen-dtrsm
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
6 years ago
Martin Kroeker
1776ad82c0
Add files via upload
6 years ago
Martin Kroeker
4e2f81cfa1
Provide more information on mmap/munmap failure
for #2207
6 years ago
Martin Kroeker
acf6002ab2
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
6 years ago
Martin Kroeker
96a794e9fd
Merge pull request #2198 from martin-frbg/icelake
Update CPUID recognition for Intel Ice Lake
6 years ago
Martin Kroeker
3d36c45116
Add CPUID identification of Intel Ice Lake
6 years ago
Martin Kroeker
648491e1aa
Autodetect Intel Ice Lake (as SKYLAKEX target)
6 years ago
Martin Kroeker
2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186
6 years ago
Martin Kroeker
4c153ec9da
Merge pull request #2196 from wjc404/develop
Add vbroadcastsd kernel to dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
7eecd8e39c
Add files via upload
6 years ago
Martin Kroeker
f0406a7708
Merge pull request #2112 from ffontaine/develop
Makefile.arm: remove -march flags
6 years ago
Martin Kroeker
561f3fd995
Merge pull request #2193 from martin-frbg/makeutest
Override special make variables
6 years ago
Martin Kroeker
30efed14d1
Unset special make variables in ctest Makefile as well
6 years ago
Martin Kroeker
af2e7f28fc
Override special make variables
as seen in https://github.com/xianyi/OpenBLAS/issues/1912#issuecomment-514183900, any external setting of TARGET_ARCH (which could result from building OpenBLAS as part of a larger project that actually uses this variable) would cause the utest build to fail.
(Other subtargets appear to be unaffected, as they do not use implicit make rules.)
6 years ago
Martin Kroeker
4250e6ed64
Merge pull request #2191 from tylerjereddy/conditional_updates
MAINT: remove legacy CMake endif()
6 years ago
Martin Kroeker
7b0b7c11d2
Merge pull request #2190 from martin-frbg/zdot-zen
Replace vpermpd with vpermilpd in the Haswell/Zen zdot microkernel
6 years ago
Martin Kroeker
d14cf1ccf4
Merge pull request #2189 from wjc404/develop
Update dgemm_kernel_4x8_haswell.S for reducing cache misses
6 years ago
Tyler Reddy
3f6ab1582a
MAINT: remove legacy CMake endif()
* clean up a case where CMake endif() contained the conditional used in the if(), which is no longer needed and is discouraged, since our minimum required CMake version supports the modern syntax
6 years ago
Martin Kroeker
28e96458e5
Replace vpermpd with vpermilpd
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180)
6 years ago
wjc404
95fb98f556
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
4801c6d36b
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
9440fa607d
Add files via upload
6 years ago
wjc404
94db259e5b
Add files via upload
6 years ago
wjc404
f49f8047ac
Add files via upload
6 years ago
wjc404
825777faab
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
9c89757562
Add files via upload
6 years ago
Martin Kroeker
b0b7600bef
Merge pull request #2186 from wjc404/develop
Update "dgemm_kernel_4x8_haswell.S" for improving performance on zen2 chips
6 years ago
wjc404
9b04baeaee
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
8a074b3965
Update dgemm_kernel_4x8_haswell.S
6 years ago