Chris Sidebottom
|
7a6fa699f2
|
Small GEMM for AArch64
This is a fairly conservative addition of small matrix kernels using
SVE.
|
1 year ago |
pengxu
|
6546600342
|
Optimized ssymv and dsymv kernel LASX for LoongArch
|
1 year ago |
Chip-Kerchner
|
99384933ff
|
Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea1555159d0928a6aa2db740c042c7e8f0dd3, reversing
changes made to b925353006.
|
1 year ago |
Martin Kroeker
|
577d480c62
|
Merge pull request #4529 from ErnstPeng/feature-branch
Optimized sgemv and dgemv kernel LSX for LoongArch
|
1 year ago |
pengxu
|
b2db064285
|
Optimized sgemv and dgemv kernel LSX for LoongArch
|
1 year ago |
Martin Kroeker
|
cfbb701497
|
Merge pull request #4536 from XiWeiGu/loongarch64-cgemv-zgemv-opt
Loongarch64 cgemv zgemv opt
|
1 year ago |
gxw
|
8e05c053be
|
LoongArch64: Fixed the failed test cases test_{c/z}gemv_n in test_extensions
|
1 year ago |
gxw
|
3f22fc2233
|
LoongArch64: Add zgemv LSX opt
|
1 year ago |
gxw
|
c508a10cf2
|
LoongArch64: Add cgemv LSX opt
|
1 year ago |
Martin Kroeker
|
accea15551
|
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
|
1 year ago |
Martin Kroeker
|
8e872a91a9
|
Fix erroneous mapping of SUM kernels to ASUM
|
1 year ago |
Martin Kroeker
|
6699227d45
|
Merge pull request #4525 from XiWeiGu/loongarch64_fixed_kernel_regress_skx_avx
LoongArch64: Fixed utest kernel_regress:skx_avx
|
1 year ago |
gxw
|
8dea25ffff
|
LoongArch64: Fixed utest kernel_regress:skx_avx
|
1 year ago |
Martin Kroeker
|
7d506984fa
|
fix assignment of default CSUM kernel
|
1 year ago |
Martin Kroeker
|
12787775d9
|
add csum/zsum kernels (trivially derived from the asum ones)
|
1 year ago |
Martin Kroeker
|
8f8ef3492a
|
Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts)
|
1 year ago |
Martin Kroeker
|
be5e18c6f9
|
Add kernel definitions for CSUM and ZSUM
|
1 year ago |
gxw
|
990507e3b8
|
LoongArch64: Opt zgemv with LASX
|
1 year ago |
gxw
|
d51ffec3a2
|
LoongArch64: Opt cgemv with LASX
|
1 year ago |
pengxu
|
4787a55c64
|
Optimized cgemm kernel 16x4 LASX for LoongArch
|
1 year ago |
Sergei Lewis
|
ba17758c02
|
fix axpy implementations where y has a stride of 0
|
1 year ago |
Dmitry Mikushin
|
d0f5dc763b
|
Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500
|
1 year ago |
Sergei Lewis
|
ff1523163f
|
Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V.
|
2 years ago |
pengxu
|
fe3da43b7d
|
Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch
|
2 years ago |
Martin Kroeker
|
e5d2725e5a
|
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
|
2 years ago |
Martin Kroeker
|
b537528feb
|
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
LoongArch64: Fixed {s/d}amin LSX optimization
|
2 years ago |
Martin Kroeker
|
6d8a273cca
|
Handle zero increment(s) in C910V ?AXPBY (#4483)
* Handle zero increment(s)
|
2 years ago |
Martin Kroeker
|
dbcf4f8b7d
|
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
Loongarch opt axpby
|
2 years ago |
Martin Kroeker
|
dc802dd637
|
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
|
2 years ago |
gxw
|
adde725321
|
LoongArch64: Fixed {s/d}amin LSX optimization
|
2 years ago |
gxw
|
7bc93d95a1
|
LoongArch64: Opt {c/z}axpby
|
2 years ago |
gxw
|
1e1f487dc7
|
LoongArch64: Fixed {s/d}axpby
|
2 years ago |
Martin Kroeker
|
4d8dee508c
|
temporarily disable the CAXPY/ZAXPY kernels
|
2 years ago |
austinpagan
|
87ba528d8b
|
Changed C files to straighten out indentation. Removed commented lines from other file.
|
2 years ago |
austinpagan
|
461cf9083c
|
Merge remote-tracking branch 'origin/develop' into cgemm_zgemm_c_code
|
2 years ago |
austinpagan
|
ddac75e0ef
|
Adding .C versions of CGEMM and ZGEMM
|
2 years ago |
Chip Kerchner
|
2bb7ea64a1
|
Only vectorize 64-bit version for Power8.
|
2 years ago |
Sergei Lewis
|
3ffd6868d7
|
Merge branch 'develop' into dev/slewis/merge-from-riscv
|
2 years ago |
Sergei Lewis
|
a3b0ef6596
|
Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling
|
2 years ago |
Martin Kroeker
|
d1343302bd
|
Merge pull request #4465 from XiWeiGu/utest-zscal
utest: Add tests for zscal
|
2 years ago |
gxw
|
969601a1dc
|
X86_64: Fixed bug in zscal
Fixed handling of NAN and INF arguments when
inc is greater than 1.
|
2 years ago |
Martin Kroeker
|
98c9ff3194
|
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
LoongArch64: Handle NAN and INF
|
2 years ago |
Chip Kerchner
|
09bb48d1b9
|
Vectorize in-copy packing/copying for SGEMM - 4X faster.
|
2 years ago |
gxw
|
83ce97a4ca
|
LoongArch64: Handle NAN and INF
|
2 years ago |
gxw
|
a79d117405
|
LoongArch64: Fixed bug for {s/d}amin
|
2 years ago |
Sergei Lewis
|
1093def0d1
|
Merge branch 'risc-v' into develop
|
2 years ago |
Martin Kroeker
|
889c5d026a
|
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
|
2 years ago |
Martin Kroeker
|
4e2a32ff51
|
Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
|
2 years ago |
gxw
|
276e3ebf9e
|
LoongArch64: Add dzamax and dzamin opt
|
2 years ago |
Martin Kroeker
|
a21b2fa5e4
|
Merge pull request #4452 from kseniyazaytseva/riscv-generic
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
|
2 years ago |