Martin Kroeker
d64cc2be81
Add early returns
5 years ago
Martin Kroeker
c9b67141f0
Add early returns
5 years ago
Martin Kroeker
6797a3a1e0
Add early returns
5 years ago
Martin Kroeker
936966a42c
Make ILAENV and xGETRF2 functions available
5 years ago
Martin Kroeker
5c6c2cd4f6
Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
Fix OMP num specify issue
5 years ago
Martin Kroeker
e54be4ba1c
Merge pull request #2792 from pkubaj/patch-1
Add aliases for armv6, armv7
5 years ago
pkubaj
48a1364e10
Add aliases for armv6, armv7
FreeBSD uses those names for 32-bit ARM variants.
5 years ago
Chen, Guobing
0c1c903f1e
Fix OMP num specify issue
In the current code, no matter how many threads are specified, all
available CPUs are used when invoking OpenMP, which leads to very poor
performance when the workload is small but the number of available CPUs
is large: much time is wasted on inter-thread synchronization. Fix this
by actually using the thread count specified by the variable 'num' from
the calling API.
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
5 years ago
Martin Kroeker
a073fa870e
Merge pull request #2791 from martin-frbg/issue2787
Fix crashes in parallelized x86_64 ZDOT, particularly on Windows
5 years ago
Martin Kroeker
b2053239fc
Fix missing dummy parameter (imag part of alpha) of zdot_thread_function
5 years ago
Martin Kroeker
b11bb6e728
Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
5 years ago
Martin Kroeker
1840bc5b52
Add OpenMP dependency to pkgconfig file if needed
5 years ago
Martin Kroeker
7c0977c267
Add OpenMP dependency to pkgconfig file if needed
5 years ago
Martin Kroeker
fb3d80c42a
Merge pull request #78 from xianyi/develop
rebase
5 years ago
Martin Kroeker
9ee21a0a39
Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
5 years ago
Martin Kroeker
bd3207b4b4
Update system.cmake
5 years ago
Martin Kroeker
b8ebfc9335
Update system.cmake
5 years ago
Martin Kroeker
7c1986640b
fallback from cooperlake to skylake if gcc<10
5 years ago
Martin Kroeker
71d33c952d
Typo fix
5 years ago
Martin Kroeker
6a3c074786
-march=cooperlake requires gcc10
5 years ago
Martin Kroeker
430f741b30
-march=cooperlake requires gcc10
5 years ago
Martin Kroeker
6f4dc7445d
Fix typo
5 years ago
Martin Kroeker
81fbe8d088
-march=cooperlake only available in gcc >= 10
5 years ago
Martin Kroeker
bb9cf766f5
make march=cooperlake option conditional on gcc >= 10.1
5 years ago
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
5 years ago
Martin Kroeker
2c72972570
Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
Do not require pkg-config to generate the *.pc file
5 years ago
Albert Ziegenhagel
6b731d917f
Do not require pkg-config to generate the *.pc file
Generating the pkg-config file does not actually depend on pkg-config being available.
5 years ago
Martin Kroeker
5dcf47cd97
Merge pull request #2784 from martin-frbg/issue2783
Add fallback typedef for bfloat16 to openblas_config.h template
5 years ago
Martin Kroeker
aa286e301b
Add typedef for bfloat16 if needed
5 years ago
Martin Kroeker
9f0ef9cdfc
Merge pull request #77 from xianyi/develop
rebase
5 years ago
Martin Kroeker
6bfc66663c
revert
5 years ago
Martin Kroeker
a8c6fb9e1c
revert
5 years ago
Martin Kroeker
5ec8f716cf
revert
5 years ago
Martin Kroeker
82f8a0aeba
Update .drone.yml
5 years ago
Martin Kroeker
d57d503c15
Update Makefile
5 years ago
Martin Kroeker
37ac23e8a3
Add simple MT sgemm precision test and INTERFACE64 build
5 years ago
Martin Kroeker
6a93e3b2ba
Add simple sgemm precision test
5 years ago
Martin Kroeker
47ce1dd08f
Update gemm64.cpp
5 years ago
Martin Kroeker
f5fcc5baec
Add trivial gemm test for multithread consistency
5 years ago
Martin Kroeker
597010a968
Fix incorrect argument to SLASET
Reference-LAPACK issue 425 (and 318)
5 years ago
Martin Kroeker
d64f1ef26b
Fix incorrect argument to SLASET
Reference-LAPACK issue 425 (and 318)
5 years ago
Martin Kroeker
c62aad62e5
Fix incorrect calls to DLASET
Reference-LAPACK issue 429
5 years ago
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all ISAs supported by SKYLAKEX plus avx512bf16, so all the
SKYLAKEX-specific kernels/drivers and related code are now extended
to be active on COOPERLAKE as well. In addition, new BF16-related
kernels are enabled for this target.
5 years ago
Martin Kroeker
efdd237a91
Add a dedicated POWER9 build to the Travis CI (#2774)
* Add dedicated POWER9 build (using new syntax to ensure it runs as a P9-only containerized job rather than a VM that
might end up on P8 hardware half of the time)
* Bump gcc version for POWER9 build
5 years ago
Martin Kroeker
4573cb2f43
Merge pull request #2765 from martin-frbg/issue2760
Add memory barrier to the PPC blas_lock implementation for Linux
5 years ago
Martin Kroeker
2a4bb797db
Merge pull request #2773 from martin-frbg/issue2770
Fix Makefiles still mishandling NO_CBLAS=0 and NO_LAPACKE=0
5 years ago
Martin Kroeker
cbbe38bb88
Merge pull request #2772 from mhillenibm/s390x_gemm_tuning
s390x: GEMM tuning for z14
5 years ago
Martin Kroeker
619343278d
Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0
5 years ago
Martin Kroeker
fee361ae64
fix another source of NO_CBLAS=0 surprise
5 years ago
Martin Kroeker
62f4c84f27
Merge pull request #76 from xianyi/develop
rebase
5 years ago