Zhiyong Dang
3716267124
Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
8 years ago
Martin Kroeker
20c6c38e51
Merge branch 'develop' into atomic
8 years ago
Martin Kroeker
8ec28ff461
Remove unguarded use of _Atomic and fix tabbing
8 years ago
Martin Kroeker
bb9876db33
Fix thread races and infinite looping on systems with many cpus
On systems with more than 64 CPUs, blas_quickdivide will sometimes return zero, which creates bogus workloads when used for the stride calculation. This then leads to threads spinning incessantly, waiting for a status change that never happens, as seen in #1497.
This patch also fixes several data races that were found by helgrind and/or tsan while debugging the issue.
8 years ago
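A minimal sketch of the failure mode described in bb9876db33, not the actual patch. It assumes a table-driven divide that only covers divisors up to 64; table_divide and chunk_width are hypothetical names, and the out-of-range result is simulated as zero. The point is that a zero width must never reach the stride calculation, so the sketch falls back to plain integer division.

```c
#include <assert.h>

/* Hypothetical stand-in for a table-driven divide whose table only
 * covers divisors up to 64. Beyond that it returns 0, which imitates
 * the bogus result described in the commit message. */
static long table_divide(long x, long y) {
    if (y <= 0) return 0;
    if (y > 64) return 0;   /* simulated out-of-table result */
    return x / y;
}

/* Per-thread chunk width: ceil(n / nthreads), never zero for n > 0. */
long chunk_width(long n, long nthreads) {
    long width;
    assert(n > 0 && nthreads > 0);
    width = table_divide(n + nthreads - 1, nthreads);
    if (width <= 0) {
        /* Fall back to plain division so the stride stays positive. */
        width = (n + nthreads - 1) / nthreads;
    }
    return width;
}
```

With a zero width, a loop that advances by the width would never make progress, which is consistent with the spinning threads reported in the commit.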
Martin Kroeker
40160ff3c1
Use _Atomic instead of volatile for thread safety where C11 is supported
8 years ago
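A hedged sketch of what guarding _Atomic behind a C11 check can look like, relating to 40160ff3c1, 8ec28ff461 and 3716267124. The macro and function names (SKETCH_ATOMIC, sketch_flag_t, sketch_publish, sketch_read) are invented for illustration and are not taken from the source file. Note that the standard macro is spelled __STDC_VERSION__, with two leading underscores.

```c
#include <assert.h>
#include <stddef.h>

/* Use _Atomic only when the compiler reports C11 and does not opt out
 * of atomics; otherwise fall back to volatile, which gives no atomicity
 * guarantee. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) && \
    !defined(__STDC_NO_ATOMICS__)
#define SKETCH_ATOMIC _Atomic
#else
#define SKETCH_ATOMIC volatile
#endif

typedef struct {
    SKETCH_ATOMIC long working;   /* status flag shared between threads */
} sketch_flag_t;

/* With _Atomic, plain assignment is a sequentially consistent store. */
void sketch_publish(sketch_flag_t *flag, long value) {
    assert(flag != NULL);
    flag->working = value;
}

/* With _Atomic, a plain read is a sequentially consistent load. */
long sketch_read(sketch_flag_t *flag) {
    assert(flag != NULL);
    return flag->working;
}
```

A misspelled version macro would never be defined, so the guard would silently select the volatile fallback on every compiler.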
Andrew
d602b99386
LAPACK helpers in C that need care too
8 years ago
Ashwin Sekhar T K
3918d17025
LAPACK: Fix lapack-test errors in ARM64 threaded version
9 years ago
Werner Saar
c81dc6322f
Prepared lapack/potrf functions for UNROLL values that are not a power of two
9 years ago
Werner Saar
3e1bbd6b5f
Prepared lapack/getrf functions for UNROLL values that are not a power of two
9 years ago
Werner Saar
956be69e1d
optimized getrf_single.c for POWER8
10 years ago
Werner Saar
6a2bde7a2d
optimized dgemm and dgetrf for POWER8
10 years ago
Hank Anderson
e74462a3f5
Moved declarations to start of functions to satisfy MSVC C89 implementation.
11 years ago
Hank Anderson
056ba26755
Changed a number of inline calls to use __inline.
MSVC doesn't implement C99, so it can't use the inline keyword. __inline
appears to work in MSVC and GCC.
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
12 years ago
Zhang Xianyi
5048a80032
Refs #283. Fixed the incorrect usage of the long data type on 64-bit Windows.
12 years ago
Zhang Xianyi
32d2ca3035
Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on Windows.
13 years ago
Zhang Xianyi
5d3312142a
Refs #221, #246. Fixed the stack overflow bug in multithreaded BLAS3.
This occurs when NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:
typedef struct {
volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
The job array then occupies 8MB.
Thus, we use malloc instead of stack allocation.
13 years ago
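The size arithmetic in 5d3312142a can be sketched as follows. The constants are assumptions chosen to reproduce the 8MB figure quoted in the message (256 CPUs, CACHE_LINE_SIZE 8, DIVIDE_RATE 2, an 8-byte element type); all sketch_ names are hypothetical and this is not the code from the commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed values, picked so that the total matches the quoted 8MB. */
#define SKETCH_MAX_CPU     256
#define SKETCH_CACHE_LINE  8
#define SKETCH_DIVIDE_RATE 2

typedef struct {
    /* long long is used so the element is 8 bytes on 64-bit Windows too */
    volatile long long working[SKETCH_MAX_CPU]
                              [SKETCH_CACHE_LINE * SKETCH_DIVIDE_RATE];
} sketch_job_t;

/* 256 jobs * 256 rows * 16 entries * 8 bytes = 8 MiB in total. */
size_t sketch_job_bytes(void) {
    return sizeof(sketch_job_t) * SKETCH_MAX_CPU;
}

/* Heap allocation (zero-initialised) instead of a local array, since
 * 8 MiB does not fit into a 1MB thread stack. */
sketch_job_t *sketch_job_alloc(void) {
    return (sketch_job_t *)calloc(SKETCH_MAX_CPU, sizeof(sketch_job_t));
}

/* Bounds-checked write into one slot of the job table. */
void sketch_job_set(sketch_job_t *job, int cpu, int row, int col,
                    long long value) {
    assert(job != NULL);
    assert(cpu >= 0 && cpu < SKETCH_MAX_CPU);
    assert(row >= 0 && row < SKETCH_MAX_CPU);
    assert(col >= 0 && col < SKETCH_CACHE_LINE * SKETCH_DIVIDE_RATE);
    job[cpu].working[row][col] = value;
}

void sketch_job_free(sketch_job_t *job) {
    free(job);
}
```

The caller has to check the returned pointer and free it on every exit path, which is the cost of moving the array off the stack.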
Zhang Xianyi
1b056c5328
Refs #130. Prevent reading the ipiv array beyond its bounds in ?laswp. Use laswp instead of laswp_oncopy in getrf.
13 years ago
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version code.
15 years ago