Wangyang Guo
7d3ecc92cb
Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel
5 years ago
Wangyang Guo
96a4f950ef
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrices case
5 years ago
Wangyang Guo
754dc9ffb9
Small Matrix: skylakex: add sgemm tn kernel
5 years ago
Wangyang Guo
967df074b7
Small Matrix: skylakex: sgemm nt: optimize for M < 12
5 years ago
Wangyang Guo
fdd2d0fc7b
Small Matrix: skylakex: add sgemm nt kernel
5 years ago
Wangyang Guo
5f91668904
Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4
5 years ago
Wangyang Guo
0ecaa99fc2
Small Matrix: skylakex: sgemm nn: fix error when beta not zero
5 years ago
Wangyang Guo
a1835c8ca2
Small Matrix: skylakex: sgemm nn: add n6 to improve performance
5 years ago
Wangyang Guo
50bd888c73
Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time
5 years ago
Wangyang Guo
95912941ca
Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time
5 years ago
Wangyang Guo
8a4bb07453
Small Matrix: skylakex: sgemm nn: clean up unused code
5 years ago
Wangyang Guo
04ac9c7a13
Small Matrix: skylakex: sgemm_nn: optimize for M <= 8
5 years ago
Wangyang Guo
20befbb2f9
Optimize M < 16 using AVX512 mask
5 years ago
Wangyang Guo
c3e4c4db47
small matrix: SkylakeX: add SGEMM NN kernel
5 years ago
Zhang Xianyi
77460ac255
Fix gemm_batch bug for SMALL_MATRIX_OPT=1.
5 years ago
Zhang Xianyi
88e6806e3f
Init cblas_?gemm_batch implementation.
5 years ago
Xianyi Zhang
4130d1732e
Refs #2587 fix small matrix c/zgemm bug.
5 years ago
Xianyi Zhang
255b6dd0fa
Merge branch 'develop' into small_matrices
5 years ago
Xianyi Zhang
741d6c5cb8
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
5 years ago
Martin Kroeker
514a3d7d63
Merge pull request #2798 from kadler/aix-cpuid
Fix compile error on AIX cpuid detection
5 years ago
Kevin Adler
085aae8bdb
Fix compile error on AIX cpuid detection
In 589c74a the cpuid detection was changed to use systemcfg, but a copy
and paste error was introduced during some refactoring that caused
POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist)
instead of CPUTYPE_POWER6.
5 years ago
Xianyi Zhang
712ca43069
Change a1b0 gemm to b0 gemm.
5 years ago
Martin Kroeker
5c6c2cd4f6
Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
Fix OMP num specify issue
5 years ago
Martin Kroeker
e54be4ba1c
Merge pull request #2792 from pkubaj/patch-1
Add aliases for armv6, armv7
5 years ago
pkubaj
48a1364e10
Add aliases for armv6, armv7
FreeBSD uses those names for 32-bit ARM variants.
5 years ago
Chen, Guobing
0c1c903f1e
Fix OMP num specify issue
In the current code, no matter how many threads are specified, all
available CPUs are used when invoking OMP, which leads to very bad
performance when the workload is small and the CPU count is large.
A lot of time is wasted on inter-thread sync. Fix this issue by actually
using the number specified by the variable 'num' from the calling API.
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
5 years ago
Martin Kroeker
a073fa870e
Merge pull request #2791 from martin-frbg/issue2787
Fix crashes in parallelized x86_64 ZDOT particularly on Windows
5 years ago
Martin Kroeker
b2053239fc
Fix missing dummy parameter (imag part of alpha) of zdot_thread_function
5 years ago
Martin Kroeker
b11bb6e728
Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
5 years ago
Martin Kroeker
1840bc5b52
Add OpenMP dependency to pkgconfig file if needed
5 years ago
Martin Kroeker
7c0977c267
Add OpenMP dependency to pkgconfig file if needed
5 years ago
Martin Kroeker
fb3d80c42a
Merge pull request #78 from xianyi/develop
rebase
5 years ago
Martin Kroeker
9ee21a0a39
Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
5 years ago
Martin Kroeker
bd3207b4b4
Update system.cmake
5 years ago
Martin Kroeker
b8ebfc9335
Update system.cmake
5 years ago
Martin Kroeker
7c1986640b
fallback from cooperlake to skylake if gcc<10
5 years ago
Martin Kroeker
71d33c952d
Typo fix
5 years ago
Martin Kroeker
6a3c074786
-march=cooperlake requires gcc10
5 years ago
Martin Kroeker
430f741b30
-march=cooperlake requires gcc10
5 years ago
Martin Kroeker
6f4dc7445d
Fix typo
5 years ago
Martin Kroeker
81fbe8d088
-march=cooperlake only available in gcc >= 10
5 years ago
Martin Kroeker
bb9cf766f5
make march=cooperlake option conditional on gcc >= 10.1
5 years ago
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
5 years ago
Martin Kroeker
2c72972570
Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
Do not require pkg-config to generate the *.pc file
5 years ago
Albert Ziegenhagel
6b731d917f
Do not require pkg-config to generate the *.pc file
Generating the pkg-config file does not actually depend on pkg-config being available.
5 years ago
Martin Kroeker
5dcf47cd97
Merge pull request #2784 from martin-frbg/issue2783
Add fallback typedef for bfloat16 to openblas_config.h template
5 years ago
Martin Kroeker
aa286e301b
Add typedef for bfloat16 if needed
5 years ago
Martin Kroeker
9f0ef9cdfc
Merge pull request #77 from xianyi/develop
rebase
5 years ago
Martin Kroeker
6bfc66663c
revert
5 years ago
Martin Kroeker
a8c6fb9e1c
revert
5 years ago