Wangyang Guo
|
fee5abd84b
|
Small Matrix: support cmake build
|
4 years ago |
Wangyang Guo
|
478d1086c1
|
Small Matrix: support DYNAMIC_ARCH build
|
4 years ago |
Wangyang Guo
|
6b58bca18b
|
Small Matrix: disable low performance default kernel
|
4 years ago |
Wangyang Guo
|
fa777f5517
|
Small Matrix: skylakex: add DGEMM_SMALL_M_PERMIT and tune for TN kernel
|
4 years ago |
Wangyang Guo
|
8592c21af4
|
Small Matrix: skylakex: dgemm nn: fix typo in idx load
|
4 years ago |
Wangyang Guo
|
3e79f6d89a
|
Small Matrix: skylakex: add dgemm tn kernel
|
4 years ago |
Wangyang Guo
|
323d7da4f7
|
Small Matrix: skylakex: add dgemm tt kernel
|
4 years ago |
Wangyang Guo
|
f57fc932ac
|
Small Matrix: skylakex: add dgemm nt kernel
|
4 years ago |
Wangyang Guo
|
91ec21202b
|
Small Matrix: skylakex: add dgemm nn kernel
|
4 years ago |
Wangyang Guo
|
72e070539c
|
Small Matrix: skylakex: add sgemm tt kernel
|
4 years ago |
Wangyang Guo
|
02c6e764f2
|
Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel
|
4 years ago |
Wangyang Guo
|
5dc7c3c8e5
|
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrices case
|
4 years ago |
Wangyang Guo
|
642c393879
|
Small Matrix: skylakex: add sgemm tn kernel
|
4 years ago |
Wangyang Guo
|
ae3f5c737c
|
Small Matrix: skylakex: sgemm nt: optimize for M < 12
|
4 years ago |
Wangyang Guo
|
0d72d75bf9
|
Small Matrix: skylakex: add sgemm nt kernel
|
4 years ago |
Wangyang Guo
|
ca7682e3a3
|
Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4
|
4 years ago |
Wangyang Guo
|
9967e61abb
|
Small Matrix: skylakex: sgemm nn: fix error when beta not zero
|
4 years ago |
Wangyang Guo
|
a87736346f
|
Small Matrix: skylakex: sgemm nn: add n6 to improve performance
|
4 years ago |
Wangyang Guo
|
4c9d9940fd
|
Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time
|
4 years ago |
Wangyang Guo
|
13b32f69b7
|
Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time
|
4 years ago |
Wangyang Guo
|
3d8c6d9607
|
Small Matrix: skylakex: sgemm nn: clean up unused code
|
4 years ago |
Wangyang Guo
|
49b61a3f30
|
Small Matrix: skylakex: sgemm_nn: optimize for M <= 8
|
4 years ago |
Wangyang Guo
|
f88470323b
|
Optimize M < 16 using AVX512 mask
|
4 years ago |
Wangyang Guo
|
9186456a12
|
small matrix: SkylakeX: add SGEMM NN kernel
|
4 years ago |
Xianyi Zhang
|
6022e5629c
|
Refs #2587 fix small matrix c/zgemm bug.
|
5 years ago |
Xianyi Zhang
|
57ed58cefe
|
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
|
5 years ago |
Xianyi Zhang
|
17d32a4a82
|
Change a1b0 gemm to b0 gemm.
|
5 years ago |
Xianyi Zhang
|
59cb5de46b
|
Refs #2587 Fix typos.
|
5 years ago |
Xianyi Zhang
|
be3349405d
|
Add alpha=1.0 beta=0.0 for small gemm.
|
5 years ago |
Xianyi Zhang
|
0a2077901c
|
Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
|
5 years ago |
gxw
|
0b8f7c8c10
|
Add cmake support for LOONGARCH64
|
4 years ago |
gxw
|
af0a69f355
|
Add support for LOONGARCH64
|
4 years ago |
Martin Kroeker
|
49bbf330ca
|
Empirical workaround for numpy SVD NaN problem from issue 3318
|
4 years ago |
Martin Kroeker
|
5b4b385ecf
|
Temporarily disable the SkylakeX sgemv_t microkernel due to LAPACK testsuite failures
|
4 years ago |
User User-User
|
39ef0880ae
|
copy conf
|
4 years ago |
Martin Kroeker
|
c4b464cac6
|
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
Power10: Fix for SBGEMM
|
4 years ago |
Gordon Fossum
|
e6dd44d989
|
Power10: Fix for SBGEMM
While testing bfloat16 sbgemm kernel, there are some failures for odd value inputs due to updating result for additional bytes.
|
4 years ago |
Gilles Gouaillardet
|
9d292d37b2
|
arm64: add the missing d9 register to the clobber list
Refs. numpy/numpy#18422
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
|
4 years ago |
Martin Kroeker
|
2e8ff4a781
|
Merge pull request #3266 from martin-frbg/powerparam
Remove spurious casts from PPC parameters and fix compilation for older targets
|
4 years ago |
Martin Kroeker
|
dbba381dc3
|
Merge pull request #3260 from intelmy/sgemv_t_opt
Optimized sgemv_t for small N based on AVX512
|
4 years ago |
Martin Kroeker
|
efdbdd8f82
|
Add prefetch values for power3
|
4 years ago |
Martin Kroeker
|
3906ef3b0f
|
Add prefetch values for power3
|
4 years ago |
Martin Kroeker
|
8adf0971d8
|
Add prefetch values for power3
|
4 years ago |
Martin Kroeker
|
08e2e60762
|
Add prefetch values for power3
|
4 years ago |
Martin Kroeker
|
fb9e678235
|
Fix caxpy/zaxpy for big-endian
|
4 years ago |
Martin Kroeker
|
dc4fcb48df
|
Fix inverted conditional for caxpy/zaxpy
|
4 years ago |
Martin Kroeker
|
7a48247761
|
fix c/zrot and sgemv for POWER5
|
4 years ago |
Rajalakshmi Srinivasaraghavan
|
cbb70438df
|
POWER10: Fixes for sbgemm kernel
While testing bfloat16 sbgemm kernel, there are some failures
for odd value inputs due to array access beyond the boundary.
|
4 years ago |
Ma, Yu
|
706a08d4a0
|
Optimized sgemv_t for small N based on AVX512
|
4 years ago |
Zhaofeng Li
|
590be3fae3
|
riscv64: Add Makefile
|
4 years ago |