Martin Kroeker
9d43140d61
Improve check for conflicting config_kernel.h
5 years ago
Martin Kroeker
8ef600f1a3
Merge pull request #95 from xianyi/develop
rebase
5 years ago
Martin Kroeker
88928650c4
Merge pull request #2883 from martin-frbg/issue2872
Minor CMAKE fixes
5 years ago
Martin Kroeker
fbda20c856
Merge pull request #94 from xianyi/develop
rebase
5 years ago
Martin Kroeker
82a497ec5d
restore PRESCOTT default for DYNAMIC_LIST
5 years ago
Martin Kroeker
de27e4f5fb
Stop DYNAMIC_ARCH build if the toplevel source contains a stray config_kernel.h from a gmake build
This is unlikely to happen in practice, but if it does, the rogue file gets included instead of the dynamically generated version for each target_core. That leads to very confusing errors such as "invalid operands (undefined UND and ABS sections)" when compiling the assembly kernels, because macros like PREFETCH remain undefined.
5 years ago
Martin Kroeker
e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
Optimize the performance of dot by using universal intrinsics in X86/ARM
5 years ago
Qiyu8
f32d34a015
add sse3 compiler flag
5 years ago
Martin Kroeker
599777ecb7
Merge pull request #2879 from martin-frbg/issue2839
Default BLAS3_MEM_ALLOC_THRESHOLD on all platforms to 32
5 years ago
Martin Kroeker
a5feea6611
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows
5 years ago
Martin Kroeker
dc8e4e1959
Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable
5 years ago
Martin Kroeker
cccd1438da
Merge pull request #93 from xianyi/develop
rebase
5 years ago
Martin Kroeker
f032d8966e
Merge pull request #2874 from Flamefire/memory_fixes
Avoid out of bounds access on invalid memory free
5 years ago
Martin Kroeker
f6e4cf2f9d
Merge pull request #2876 from Flamefire/omp_fork_fix
Lazily reinit threads after a fork in OMP mode
5 years ago
Martin Kroeker
9828343e12
Merge pull request #2878 from brada4/asms
fix clang std=c18 compilation on aarch64
5 years ago
User User-User
d2333e7842
aarch64 fix std=c18 compilation
5 years ago
Alexander Grund
3094fc6c83
Lazily reinit threads after a fork in OMP mode
This initializes the per-thread memory buffers, which get
cleared/released on a fork via pthread_atfork. Not doing so leads to
each thread calling blas_memory_alloc on almost every execution, which
slows down the code significantly as the threads race for the memory
allocation, whose locks serialize them.
5 years ago
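The fork fix above follows a common "release on fork, reinitialize lazily" pattern. Below is a minimal single-threaded sketch of that pattern, assuming POSIX `pthread_atfork`, `fork` and `waitpid`. The buffer array and every function name (`init_buffers`, `release_buffers_in_child`, `ensure_buffers`, `child_init_count_after_fork`) are hypothetical illustrations, not OpenBLAS's actual memory code.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for per-thread scratch buffers
   (not OpenBLAS's real data structures). Single-threaded illustration. */
#define NUM_BUFFERS 4
#define BUFFER_SIZE 4096

static void *buffers[NUM_BUFFERS];
static int buffers_ready = 0; /* 0 = must (re)allocate before next use */
static int init_count = 0;    /* counts (re)initializations, for illustration */

static void init_buffers(void) {
    for (int i = 0; i < NUM_BUFFERS; i++)
        buffers[i] = malloc(BUFFER_SIZE);
    buffers_ready = 1;
    init_count++;
}

/* Runs in the child right after fork(): release the inherited buffers and
   mark them stale instead of rebuilding them immediately. */
static void release_buffers_in_child(void) {
    for (int i = 0; i < NUM_BUFFERS; i++) {
        free(buffers[i]);
        buffers[i] = NULL;
    }
    buffers_ready = 0;
}

static void install_fork_handler(void) {
    pthread_atfork(NULL, NULL, release_buffers_in_child);
}

/* Called at the start of each "parallel region": reinitialize once after a
   fork, then reuse the buffers instead of allocating on every call. */
static void ensure_buffers(void) {
    if (!buffers_ready)
        init_buffers();
}

/* Forks, runs two regions in the child and returns the child's init_count
   (passed back through its exit status), or -1 on error. */
static int child_init_count_after_fork(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ensure_buffers(); /* reallocates once after the fork */
        ensure_buffers(); /* reuses the buffers */
        _exit(init_count);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Deferring the rebuild to first use keeps the atfork handler cheap. It also means a child that never calls into the library never pays for reallocation.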
Alexander Grund
3c05f54df8
Avoid out of bounds access on invalid memory free
5 years ago
Alexander Grund
dee7c49938
Fix TABs and trailing space
5 years ago
Martin Kroeker
d3c0d6811b
Merge pull request #2873 from martin-frbg/issue2871
Check for __linux rather than linux in cpuid code and benchmarks
5 years ago
Martin Kroeker
9637cd1fd1
Merge pull request #2865 from thisch/backticks
Consolidate usage of backticks for build options
5 years ago
Martin Kroeker
2367726578
Remove redundant status message
5 years ago
Martin Kroeker
5464eb13ea
Change ifdef linux to __linux for C11 compatibility
5 years ago
Martin Kroeker
e1574cbc83
Change ifdef linux to __linux for C11 compatibility
and add a fallback for unsupported operating systems in detect()
5 years ago
Martin Kroeker
0b2bb5696a
Change ifdef linux to __linux for C11 compatibility
5 years ago
Martin Kroeker
a7d5d0078d
Change ifdef linux to __linux for C11 compatibility
5 years ago
Martin Kroeker
be40440ec5
Change ifdef linux to __linux for C11 compatibility
5 years ago
Martin Kroeker
2bf70c8e3b
Change ifdef linux to __linux for C11 compatibility
5 years ago
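The `__linux` commits above rely on a compiler detail. When a strict ISO mode such as `-std=c11` is requested, GCC and Clang suppress system-specific predefined macros outside the reserved namespace, so plain `linux` is no longer defined. The reserved spellings `__linux` and `__linux__` remain. The sketch below shows the guarded check with a fallback for unrecognized systems; `detect_os` and `os_is` are hypothetical helpers, not OpenBLAS's own `detect()`.

```c
#include <string.h>

/* Hypothetical helper, not OpenBLAS's detect().
   Under -std=c11 the non-reserved macro `linux` is not predefined, so test
   the reserved spellings `__linux` / `__linux__` instead. */
static const char *detect_os(void) {
#if defined(__linux) || defined(__linux__)
    return "linux";
#else
    /* Fallback for operating systems this sketch does not recognize. */
    return "unknown";
#endif
}

/* Returns nonzero if the build-time OS matches `name`. */
static int os_is(const char *name) {
    return strcmp(detect_os(), name) == 0;
}
```

Replacing the first condition with `#ifdef linux` and compiling on Linux with `-std=c11` instead of `-std=gnu11` would silently select the fallback branch, which is the breakage those commits avoid.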
Qiyu8
60e6c68e38
Adapt to ARM architecture
5 years ago
Martin Kroeker
64629cb5c7
Merge pull request #91 from xianyi/develop
rebase
5 years ago
Qiyu8
1b1a757f5f
Optimize the performance of dot by using universal intrinsics in X86/ARM
5 years ago
Martin Kroeker
0d98ce202c
Merge pull request #2866 from RajalakshmiSR/p10_dcopy
Optimize dcopy/zcopy for POWER10
5 years ago
Rajalakshmi Srinivasaraghavan
2df4235e00
Optimize dcopy/zcopy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in the simulator with no new failures.
5 years ago
Thomas Hisch
fe8cd5ae7e
Consolidate usage of backticks for build options
There were some build options in the README that were not
highlighted. Now all are highlighted.
5 years ago
Martin Kroeker
ba31c8f5f9
Merge pull request #2853 from Qiyu8/usimd-daxpy
Optimize the performance of daxpy by using universal intrinsics
5 years ago
Martin Kroeker
e961d4d609
Merge pull request #2864 from martin-frbg/lapack445
Fix underflow/rounding errors in LAPACK (S,D)LANV2
5 years ago
Martin Kroeker
7ed25e9e10
Fix underflow/rounding errors in LAPACK (S,D)LANV2
Reference-LAPACK PR 445, fixing their issue 263
5 years ago
Martin Kroeker
7b169379e0
Merge pull request #2863 from martin-frbg/readmefixes
Readmefixes
5 years ago
Martin Kroeker
7f539fb850
Update cpu list, outline cmake build, clarify scope of set_num_threads extension
5 years ago
Martin Kroeker
caf7a12295
Merge pull request #90 from xianyi/develop
rebase
5 years ago
Martin Kroeker
72b5b73647
Merge pull request #2850 from xiaojiayuan111/develop
fix a bug in trmm
5 years ago
Qiyu8
881c15179f
remove default support for FMA4 on zen architecture
5 years ago
Martin Kroeker
dfaafd3b55
Merge pull request #2854 from martin-frbg/travis-graviton
Add an AWS-Graviton2 build to Travis CI
5 years ago
Martin Kroeker
f2e9a24e1a
Add AWS Graviton2 build
5 years ago
Martin Kroeker
61fae59298
Merge pull request #88 from xianyi/develop
rebase
5 years ago
Martin Kroeker
33d22f99f1
Merge pull request #2851 from martin-frbg/travis-xcode12
Add an OSX build with xcode12
5 years ago
Martin Kroeker
5ba01dd1a8
Add an OSX build with xcode12
5 years ago
Qiyu8
14f7dad3b7
performance improved
5 years ago
y00512012
06cf73a239
fix a bug in trmm
5 years ago
Qiyu8
325b539c26
Optimize the performance of daxpy by using universal intrinsics
5 years ago