Martin Kroeker
ae1d1f74f7
Query AVX2 and AVX512 capability for runtime cpu selection
7 years ago
Martin Kroeker
20d1aad13f
Fix missing quotes around thunderx targets
7 years ago
TiborGY
d11554c88f
Validate user supplied TARGET ( #1941 )
the build will now abort with an error message when an undefined build TARGET is named
Fixes #1938
7 years ago
Martin Kroeker
ed704185ab
Increment version to 0.3.6.dev
7 years ago
Martin Kroeker
2940798ea7
Increment version to 0.3.6.dev
7 years ago
Martin Kroeker
1c75b65d53
Merge branch 'release-0.3.0' into develop
7 years ago
Martin Kroeker
13d006339b
Update ChangeLog.txt with changes from 0.3.5
7 years ago
Martin Kroeker
bf76162635
Merge pull request #1944 from hartzell/patch-1
Typo: Skyalke -> Skylake
7 years ago
George Hartzell
0d52aefc6b
Typo: Skyalke -> Skylake
Worth fixing; it gets in the way of searching.
7 years ago
Martin Kroeker
a6787b0f81
Merge pull request #1939 from TiborGY/patch-2
Fix typo in UNKNOWN core name
7 years ago
Martin Kroeker
8643521127
Merge pull request #1943 from martin-frbg/issue1748
Re-enable loop unrolling in trmv and remove the scary warning
7 years ago
Martin Kroeker
5a720cf9ca
Re-enable loop unrolling in trmv and remove the scary warning
fixes #1748 as that half of the fix for #1332 appears to have been an overreaction on my part.
7 years ago
Martin Kroeker
ccd5945d38
Merge pull request #1942 from martin-frbg/issue1720
Delete the pthread key on cleanup in TLS mode
7 years ago
Martin Kroeker
9f80e0f5fc
Remove stray include of complex.h
already provided conditionally by common.h via openblas_utest.h
Unconditional inclusion breaks older Android and similar platforms that use OPENBLAS_COMPLEX_STRUCT
7 years ago
Martin Kroeker
bba1e67269
Delete the pthread key on cleanup in TLS mode
to avoid a crash when OpenBLAS was loaded via dlopen and libc tries to clean up the leaked TLS after dlclose
Fixes #1720
7 years ago
Martin Kroeker
93240f489e
Fix wrong case in TARGET setting for Alpine
7 years ago
TiborGY
7cbc2c37d6
Update cpuid_mips64.c
7 years ago
TiborGY
c329de2931
Update Makefile
7 years ago
TiborGY
187233953c
Update cpuid_mips.c
7 years ago
TiborGY
09170268a3
Update cpuid_arm.c
7 years ago
TiborGY
211120c508
Fix typo in UNKNOWN core name
Should be of no consequence, right?
7 years ago
Martin Kroeker
9e4d190f4f
Merge pull request #1932 from martin-frbg/issue1915
Add -fPIC to provided CFLAGS/FFLAGS if required
7 years ago
Martin Kroeker
fe02ba86a4
Remove unnecessary change again
7 years ago
Martin Kroeker
284fb00971
Merge pull request #1934 from fenrus75/betagoof
Fix thinko in skylake beta handling
7 years ago
Arjan van de Ven
795285c587
Fix thinko in skylake beta handling
Casting to int is cheaper, but it has a rounding effect rather than a memory-casting (bit reinterpretation) effect, resulting in
an invalid outcome
7 years ago
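The message above contrasts two ways of looking at the scaling factor beta through an integer type. A minimal Python sketch of why only the bit-level view gives an exact zero test; the helper names are hypothetical, and this illustrates the pitfall rather than reproducing the skylakex beta kernel itself:

```python
import struct


def beta_is_zero_value_cast(beta: float) -> bool:
    # Value conversion, like (long)beta in C: truncates toward zero,
    # so any beta with |beta| < 1 (for example 0.5) compares equal to 0.
    return int(beta) == 0


def beta_is_zero_bit_cast(beta: float) -> bool:
    # Reinterpret the IEEE-754 double's 64 bits as an integer, like
    # memcpy(&bits, &beta, 8) in C. Only +0.0 has an all-zero pattern;
    # -0.0 has the sign bit set, so it is not matched.
    (bits,) = struct.unpack("<Q", struct.pack("<d", beta))
    return bits == 0


for beta in (0.0, 0.5, 1.0):
    print(beta, beta_is_zero_value_cast(beta), beta_is_zero_bit_cast(beta))
```

The value cast maps 0.5 to 0, so code relying on it could take a "beta == 0" shortcut for a nonzero beta; the bit-pattern test accepts only +0.0.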
Martin Kroeker
d6818777d1
Make sure that -fPIC is present if needed
7 years ago
Martin Kroeker
5bd21ab6e1
Make sure that -fPIC is present when needed
override user-provided FFLAGS if necessary
7 years ago
Martin Kroeker
e1eab96502
Merge pull request #1931 from martin-frbg/pr1921
Add -mavx2 to TARGET=HASWELL builds
7 years ago
Martin Kroeker
76b4b8980f
Use -dumpversion with gcc only
7 years ago
Martin Kroeker
49e0f485da
Add -mavx2 for TARGET=HASWELL if compiler supports and requires it
7 years ago
Martin Kroeker
43c2b0eb55
Add -mavx2 to TARGET=HASWELL builds
to leverage improvements from PR#1921
7 years ago
Martin Kroeker
942e229ed5
Merge pull request #1930 from martin-frbg/issue1908
Reflect ARMV8 target definition changes from PR1876
7 years ago
Martin Kroeker
26a3402773
Reflect ARMV8 target definition changes from PR1876
and create config target directory for cross-compiles.
7 years ago
Martin Kroeker
20033f992a
Merge pull request #1929 from martin-frbg/issue1924
Avoid taking the root of a negative number in simple threaded syrk
7 years ago
Martin Kroeker
f343ed65b5
Avoid taking the root of a negative number
Fixes #1924 where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.
7 years ago
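The fix guards a square root used when dividing a triangular SYRK update among threads. A rough Python sketch of that kind of guard, assuming an equal-work split of an upper-triangular update; `upper_chunk_width` and `partition` are hypothetical helpers, not OpenBLAS's actual partitioning code:

```python
import math


def upper_chunk_width(i: int, n: int, nthreads: int) -> int:
    """Hypothetical helper: width of the row block starting at row i of an
    upper-triangular n x n update, sized to about 1/nthreads of the total work.

    Row r touches n - r columns, so rows i..n-1 hold about (n - i)**2 / 2 work.
    Leaving (n - i)**2 / 2 - n*n / (2 * nthreads) for later rows means
    n - i - w = sqrt((n - i)**2 - n*n / nthreads).
    """
    remaining = n - i
    radicand = remaining * remaining - n * n / nthreads
    if radicand <= 0.0:
        # Less than one share is left: take every remaining row instead of
        # calling sqrt on a negative number (ValueError here, FE_INVALID in C).
        return remaining
    # max(1, ...) guarantees progress if sqrt rounds up to `remaining`.
    return max(1, remaining - int(math.sqrt(radicand)))


def partition(n: int, nthreads: int) -> list:
    """Row boundaries [0, ..., n] produced by repeated upper_chunk_width calls."""
    bounds, i = [0], 0
    while i < n:
        i += upper_chunk_width(i, n, nthreads)
        bounds.append(i)
    return bounds


print(partition(100, 4))  # [0, 14, 31, 53, 100]
```

Without the guard, the last block (i = 53) would evaluate sqrt(47**2 - 2500), a negative argument; in C that returns NaN and sets the FE_INVALID flag that NumPy then reports.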
Martin Kroeker
a5a1118527
Merge pull request #1 from xianyi/develop
rebase
7 years ago
Martin Kroeker
e23366e860
Merge pull request #1921 from fenrus75/haswelldgemm
Replicate some of the SKYLAKEX dgemm improvements also to HASWELL
7 years ago
Arjan van de Ven
b28f75cd7e
set GEMM_PREFERED_SIZE for HASWELL
Haswell benefits from a GEMM_PREFERED_SIZE of 16, which makes the threading code split the work
into chunks that are a nice multiple of the SIMD kernel size
7 years ago
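A rough Python sketch of how a preferred size can shape the per-thread split, assuming each thread's share of rows is rounded up to a multiple of that size; `split_rows` and its `preferred` parameter are hypothetical stand-ins for the threading code and GEMM_PREFERED_SIZE:

```python
from typing import List


def split_rows(m: int, nthreads: int, preferred: int = 16) -> List[int]:
    """Hypothetical helper: divide m rows among up to nthreads threads,
    rounding each share up to a multiple of `preferred` (a stand-in for
    GEMM_PREFERED_SIZE) so chunks line up with the SIMD kernel width."""
    chunks = []
    remaining, threads_left = m, nthreads
    while remaining > 0 and threads_left > 0:
        width = -(-remaining // threads_left)       # even share, rounded up
        width = -(-width // preferred) * preferred  # round up to a multiple of preferred
        width = min(width, remaining)               # last chunk takes what is left
        chunks.append(width)
        remaining -= width
        threads_left -= 1
    return chunks


print(split_rows(100, 4))               # [32, 32, 32, 4]
print(split_rows(100, 4, preferred=1))  # [25, 25, 25, 25]
```

An even split of 100 rows gives 25 per thread, which leaves ragged edges for a 16-wide kernel; rounding trades a smaller last chunk for full kernel blocks elsewhere. For small m this rounding can leave some threads without work.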
Arjan van de Ven
d321448a63
dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium sized matrices
7 years ago
Arjan van de Ven
c43331ad0a
dgemm: Use the skylakex beta function also for haswell
it's more efficient for certain tall/skinny matrices
7 years ago
Martin Kroeker
e8ca5a59a9
Merge pull request #1919 from fenrus75/haswelltuning
(sgemm) Apply some of the SKYLAKEX optimizations also to HASWELL
7 years ago
Martin Kroeker
c4e23dd016
Update Makefile
7 years ago
Martin Kroeker
cfc4acc221
typo
7 years ago
Martin Kroeker
545c2b1bbb
Add -mavx2 on Haswell only if the compiler supports it
7 years ago
Arjan van de Ven
69d206440a
Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support
7 years ago
Martin Kroeker
3843e3e017
Use -mavx2 on Haswell
7 years ago
Martin Kroeker
fbcb14a74b
should be core-avx2
7 years ago
Martin Kroeker
2a3190dc76
fix elseifeq and use older option core2-avx for compatibility
7 years ago
Martin Kroeker
1ebe5c0f49
Add -march=haswell to HASWELL part of DYNAMIC_ARCH build
7 years ago
Arjan van de Ven
0586899a10
Use sgemm_ncopy_4_skylakex.c also for Haswell
sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the
real perf win happens; this also works great for Haswell.
This gives double-digit percentage gains on small and skinny matrices
7 years ago