oon3m0oo
bdb29242a3
Merge ba586c3d16 into 4dd70d98d7
8 years ago
Martin Kroeker
4dd70d98d7
Merge pull request #1667 from xianyi/revert-1642-develop
Revert "Rewrite &= -> = and simplify the initial blocking phase."
8 years ago
Martin Kroeker
504310eeb9
Merge pull request #1665 from martin-frbg/cpuid-ryzen2
Add cpuid for AMD Ryzen 2
8 years ago
Martin Kroeker
ea1f39518f
Merge pull request #1663 from martin-frbg/issue1641
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
8 years ago
Martin Kroeker
5f2a3c05cd
Revert "Rewrite &= -> = and simplify the initial blocking phase."
8 years ago
Martin Kroeker
d0ec4325cf
Add cpuid for AMD Ryzen 2
8 years ago
Martin Kroeker
3f73e8b8cf
Add cpuid for AMD Ryzen 2
for #1664
8 years ago
Martin Kroeker
a83f01e0ee
Merge pull request #1662 from martin-frbg/cmake-avx512
Add -march=skylake-avx512 to AVX512 compile check and suppress its ou…
8 years ago
Martin Kroeker
a49203b48c
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
for #1641
8 years ago
Craig Donner
ba586c3d16
Ensure that the gotoblas lookup table is always initialized.
It is possible to build a program that calls a non-GEMM OpenBLAS routine from
a static initializer. Since the order of initialization is undefined, and even
less defined when using __attribute__((constructor)) in one TU and a C++ static
initializer in another TU, it can happen (and does, unfortunately) that
gotoblas_init is not called before the first BLAS routine. This results in a
segfault when trying to index into the gotoblas table.
The solution I have here is indirection: rather than using the table directly,
use an inlined function that first checks whether it has been initialized. Since
the table can be uninitialized at most once, hopefully branch prediction still
keeps things fast.
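The indirection described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual OpenBLAS change: `kernel_table_t`, `table_init`, `get_table`, and `blas_like_add` are hypothetical names standing in for the gotoblas table, `gotoblas_init`, and a BLAS entry point, and the sketch makes no attempt at thread safety.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU kernel table (not the real gotoblas layout). */
typedef struct {
    int (*add)(int, int);
} kernel_table_t;

static int add_generic(int a, int b) { return a + b; }

static kernel_table_t generic_table = { add_generic };

/* Normally filled in by a constructor; stays NULL if a caller runs first. */
static kernel_table_t *table_ptr = NULL;

/* Counts initializations so the "only once" behaviour can be observed. */
static int init_calls = 0;

/* Hypothetical analogue of gotoblas_init: picks a table and publishes it. */
static void table_init(void) {
    init_calls++;
    table_ptr = &generic_table;
}

/* Every lookup goes through this accessor instead of reading table_ptr directly.
   The branch is taken only while the table is still unset. */
static inline kernel_table_t *get_table(void) {
    if (table_ptr == NULL) {
        table_init();
    }
    assert(table_ptr != NULL);
    return table_ptr;
}

/* A routine that may be called from a static initializer before any constructor ran. */
int blas_like_add(int a, int b) {
    return get_table()->add(a, b);
}
```

Calling `blas_like_add` before any explicit initialization triggers `table_init` on the first call only; later calls pay for one predictable branch.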
8 years ago
Martin Kroeker
b74aef2816
Add -march=skylake-avx512 to AVX512 compile check and suppress its output
8 years ago
Martin Kroeker
a9fa805007
Merge pull request #1660 from martin-frbg/issue1659
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
8 years ago
Martin Kroeker
9d15a3bd16
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
fixes 1659
8 years ago
Martin Kroeker
bbf2124970
set version number to 0.3.2.dev
8 years ago
Martin Kroeker
1392eba488
set version number to 0.3.2.dev
8 years ago
Martin Kroeker
61659f8765
Merge pull request #1648 from martin-frbg/nofort
Handle NOFORTRAN=0
8 years ago
Martin Kroeker
3d3c19717c
Merge pull request #1655 from martin-frbg/issue1641
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
8 years ago
Martin Kroeker
24e344038d
Merge pull request #1654 from martin-frbg/avx512check
Add compiler option to avx512 test and hide test output
8 years ago
Martin Kroeker
4e9c34018e
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
fixes #1641
8 years ago
Martin Kroeker
f5243e8e1f
Add compiler option to avx512 test and hide test output
8 years ago
Martin Kroeker
ba8388cee0
Merge pull request #1651 from martin-frbg/avx512-nodgemm
Disable the 16x2 DTRMM kernel on SkylakeX as well
8 years ago
Martin Kroeker
6e54b0a027
Disable the 16x2 DTRMM kernel on SkylakeX as well
8 years ago
Martin Kroeker
40c8cbc3bf
Merge pull request #1650 from martin-frbg/avx512-nodgemm
Disable the AVX512 DGEMM kernel for now
8 years ago
Martin Kroeker
d3c9eb4c7d
Merge pull request #1639 from martin-frbg/dyn_list
Add DYNAMIC_LIST option for user-defined list of dynamic targets
8 years ago
Martin Kroeker
f0a8dc2eec
Disable the AVX512 DGEMM kernel for now
due to #1643
8 years ago
Martin Kroeker
cc92257ea6
Update Makefile
8 years ago
Martin Kroeker
2aba1b1658
Merge branch 'develop' into nofort
8 years ago
Martin Kroeker
8396e9e777
Handle NOFORTRAN=0
8 years ago
Martin Kroeker
bfad307ed7
Merge pull request #1647 from martin-frbg/armv7-dot
Remove premature exits from ARMV7 xdot codes
8 years ago
Martin Kroeker
b83e4c60c7
Remove premature exit for INC_X or INC_Y zero
8 years ago
Martin Kroeker
e344db269b
Remove premature exit for INC_X or INC_Y zero
8 years ago
Martin Kroeker
545b82efd3
Remove premature exit for INC_X or INC_Y zero
8 years ago
Martin Kroeker
e322a951fe
Remove premature exit for INC_X or INC_Y zero
8 years ago
Martin Kroeker
ff2f171036
Merge pull request #1644 from martin-frbg/revert-filterout
Revert changes to NOFORTRAN handling in Makefile
8 years ago
Martin Kroeker
092175cfec
Revert changes to NOFORTRAN handling from 952541e
8 years ago
Martin Kroeker
750162a05f
Try gradual fallback for cores not in the dynamic core list
8 years ago
Martin Kroeker
e6d93f20f1
Merge pull request #2 from martin-frbg/develop
merge develop
8 years ago
Martin Kroeker
c38c65eb65
Merge pull request #1 from xianyi/develop
Merge xianyi:develop into develop
8 years ago
Martin Kroeker
ce3651516f
Merge pull request #1642 from oon3m0oo/develop
Rewrite &= -> = and simplify the initial blocking phase.
8 years ago
Craig Donner
0144068537
Rewrite &= -> = and simplify the initial blocking phase.
8 years ago
Martin Kroeker
1833a67071
Add support for a user-defined list of dynamic targets
8 years ago
Martin Kroeker
0b2b83d9ed
Add support for a user-defined list of dynamic targets
8 years ago
Martin Kroeker
62cf769aa6
Merge pull request #1638 from martin-frbg/issue1637
Expose the CBLAS interface to the IxAMIN functions and have make build it
8 years ago
Martin Kroeker
eb71d61c7c
Expose CBLAS interface to BLAS extensions iXamin
8 years ago
Martin Kroeker
9cf22b7d91
Build cblas_iXamin interfaces
8 years ago
Martin Kroeker
cc66743b66
Merge pull request #1634 from oon3m0oo/develop
Fix data races reported by TSAN.
8 years ago
oon3m0oo
2aa0a5804e
Use BLAS rather than CBLAS in test_fork.c (#1626)
This is handy for people not using lapack.
8 years ago
Craig Donner
28c28ed275
Fix data races reported by TSAN.
8 years ago
oon3m0oo
a399d00425
Further improvements to memory.c. (#1625)
- Compiler TLS is now used only when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
over-allocating; this saves having external structures do the bookkeeping, and
reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
my fault)
- Now that there are fewer tracking structures I think this is a bit easier to
read than before
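The over-allocation idea from the list above, where bookkeeping lives inside the allocation itself, might look something like the sketch below. It is an assumption-laden illustration rather than the code in memory.c: `alloc_header_t`, `tracked_alloc`, `tracked_size`, and `tracked_free` are hypothetical names, and plain `malloc` stands in for whatever allocator the library actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed in front of each user buffer.
   The union with max_align_t keeps the user region suitably aligned. */
typedef union {
    struct {
        size_t size;   /* bytes requested by the caller */
        int in_use;    /* 1 while allocated, cleared on release */
    } info;
    max_align_t align;
} alloc_header_t;

/* Over-allocate by one header and hand back the memory just past it. */
void *tracked_alloc(size_t size) {
    alloc_header_t *h = malloc(sizeof(alloc_header_t) + size);
    if (h == NULL) {
        return NULL;
    }
    h->info.size = size;
    h->info.in_use = 1;
    return (void *)(h + 1);
}

/* Recover the header from a user pointer; no external lookup table is needed. */
static alloc_header_t *header_of(void *p) {
    return (alloc_header_t *)p - 1;
}

size_t tracked_size(void *p) {
    return header_of(p)->info.size;
}

/* Release: the bookkeeping is found from the pointer alone. */
void tracked_free(void *p) {
    if (p == NULL) {
        return;
    }
    alloc_header_t *h = header_of(p);
    assert(h->info.in_use);
    h->info.in_use = 0;
    free(h);
}
```

Because the header travels with the buffer, freeing needs neither a search through tracking structures nor, in this sketch, any thread-local state.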
8 years ago
Martin Kroeker
f66b9c8826
Merge pull request #1630 from martin-frbg/x86-march
Add -march=skylake-avx512 to flags if target is skylake x
8 years ago