Chip Kerchner
|
915a6d6e44
|
Add casting.
|
1 year ago |
Chip Kerchner
|
7ec3c16d82
|
Remove beta from optimized functions.
|
1 year ago |
Chip Kerchner
|
7cc00f68c9
|
Remove more duplicate.
|
1 year ago |
Chip Kerchner
|
e238a68c03
|
Remove duplicate.
|
1 year ago |
Chip Kerchner
|
32095b0cbb
|
Remove parameter.
|
1 year ago |
Chip Kerchner
|
c8788208c8
|
Fixing block issue with transpose version.
|
1 year ago |
Chip Kerchner
|
d7c0d87cd1
|
Small changes.
|
1 year ago |
Chip Kerchner
|
eb6f3a05ef
|
Common MMA code.
|
1 year ago |
Chip Kerchner
|
fb287d17fc
|
Common code.
|
1 year ago |
Chip Kerchner
|
8ab6245771
|
Small change.
|
1 year ago |
Chip Kerchner
|
df19375560
|
Almost final code for MMA.
|
1 year ago |
Chip Kerchner
|
05aa63e738
|
More MMA BF16 GEMV code.
|
1 year ago |
Chip Kerchner
|
c9ce37d527
|
Force vector pairs in clang.
|
1 year ago |
Chip Kerchner
|
89a12fa083
|
MMA BF16 GEMV code.
|
1 year ago |
Chip Kerchner
|
7947970f9d
|
Move common code.
|
1 year ago |
Chip Kerchner
|
72216d28c2
|
Fix bug with inc_y adding results twice.
|
1 year ago |
Chip Kerchner
|
2f142ee857
|
More common code.
|
1 year ago |
Chip Kerchner
|
39fd29f1de
|
Minor improvement and turn off BF16 GEMV forwarding by default.
|
1 year ago |
Chip Kerchner
|
8541b25e1d
|
Special case beta is one.
|
1 year ago |
Chip Kerchner
|
76227e2948
|
Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one dimension matrices. Updated unit test to support inc_x != 1 or inc_y for GEMV.
|
1 year ago |
Martin Kroeker
|
e05d98d00a
|
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros
|
1 year ago |
Chip Kerchner
|
a0aeba631d
|
Merge branch 'develop' into betterPowerGEMVTail
|
1 year ago |
Chip Kerchner
|
083faf7556
|
Merge branch 'develop' into betterPowerGEMVTail
|
1 year ago |
Chip Kerchner
|
75472b830a
|
Merge branch 'develop' into betterPowerGEMVTail
|
1 year ago |
Henry Chen
|
ef94b96530
|
Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
|
1 year ago |
Martin Kroeker
|
7ca835a82c
|
address clang array overflow warning
|
1 year ago |
Martin Kroeker
|
46e331a917
|
remove the unworkable GEMM3M restriction from GENERIC again
|
1 year ago |
Martin Kroeker
|
ccc23338d7
|
have the dummy GEMM3M kernel at least forward to regular GEMM
|
1 year ago |
Martin Kroeker
|
f1c9803f9a
|
add proper return statement
|
1 year ago |
Martin Kroeker
|
60abcc3991
|
add proper return statement
|
1 year ago |
Chip Kerchner
|
1a7b8c650d
|
Merge branch 'develop' into betterPowerGEMVTail
|
1 year ago |
Martin Kroeker
|
9afd0c8afd
|
Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
|
1 year ago |
Martin Kroeker
|
edbf093c98
|
Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
|
1 year ago |
Chris Sidebottom
|
ba2e989c67
|
Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
|
1 year ago |
Martin Kroeker
|
a875304eb0
|
fix inverted conditional for NAN handling
|
1 year ago |
Martin Kroeker
|
24acdd6bbb
|
correct offset
|
1 year ago |
Martin Kroeker
|
fb7c53c5e5
|
Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
|
1 year ago |
Martin Kroeker
|
15c53dd2e0
|
Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fixed numpy ci test failures
|
1 year ago |
Martin Kroeker
|
a4e56e0452
|
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
|
1 year ago |
yamazaki-mitsufumi
|
88caf02f62
|
Fix ambiguous error on Mac OS
|
1 year ago |
Martin Kroeker
|
b613754143
|
Update scal..c
|
1 year ago |
Martin Kroeker
|
f5d04318e3
|
Merge branch 'OpenMathLib:develop' into scalfixes
|
1 year ago |
Martin Kroeker
|
73f8866ffb
|
make NAN handling depend on DUMMY2 parameter
|
1 year ago |
Martin Kroeker
|
dfbc2348a8
|
fix NAN handling
|
1 year ago |
Martin Kroeker
|
c064319ecb
|
fix alpha=NAN case
|
1 year ago |
Martin Kroeker
|
c2ffd90e8c
|
make NAN handling depend on dummy2 parameter
|
1 year ago |
Chris Sidebottom
|
ea4ab3b310
|
Better header guard around bridge
|
1 year ago |
Chris Sidebottom
|
7311d93016
|
Unroll TT further
|
1 year ago |
Martin Kroeker
|
a815594fd1
|
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
|
1 year ago |
Martin Kroeker
|
dd6c33d34d
|
make NAN handling depend on dummy2 parameter
|
1 year ago |