Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
1 year ago
gxw
73c6a28073
x86_64: opt somatcopy_ct with AVX
1 year ago
Ayappan Perumal
020cce1068
Fix build issues with gcc compiler as well
1 year ago
Ayappan Perumal
b6ec73e77c
Fix AIX build
1 year ago
Martin Kroeker
016bdb9b0b
Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
1 year ago
Chip Kerchner
ab71a1edf2
Better VSX.
1 year ago
gxw
bb31bbef52
LoongArch64: Opt somatcopy_ct with LASX
1 year ago
gxw
b37129341b
LoongArch64: Opt somatcopy_cn with LASX
1 year ago
gxw
acf6cab304
LoongArch64: Opt somatcopy_rn with LASX
1 year ago
gxw
15edb441bf
LoongArch64: Opt somatcopy_rt with LASX
1 year ago
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
1 year ago
Martin Kroeker
e52d9b4cf1
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
1 year ago
Gordon Fossum
0b7fb5c791
CGEMM & ZGEMM using C code.
1 year ago
Martin Kroeker
9783dd07ab
Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC
1 year ago
Martin Kroeker
c9e92348a6
Handle inf/nan if dummy2 flag is set
1 year ago
Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
1 year ago
Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
1 year ago
Martin Kroeker
e05d98d00a
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros
1 year ago
Chip Kerchner
a0aeba631d
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Chip Kerchner
083faf7556
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Chip Kerchner
75472b830a
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Henry Chen
ef94b96530
Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
1 year ago
Martin Kroeker
7ca835a82c
address clang array overflow warning
1 year ago
Martin Kroeker
46e331a917
remove the unworkable GEMM3M restriction from GENERIC again
1 year ago
Martin Kroeker
ccc23338d7
have the dummy GEMM3M kernel at least forward to regular GEMM
1 year ago
Martin Kroeker
f1c9803f9a
add proper return statement
1 year ago
Martin Kroeker
60abcc3991
add proper return statement
1 year ago
Chip Kerchner
1a7b8c650d
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Martin Kroeker
9afd0c8afd
Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
1 year ago
Martin Kroeker
edbf093c98
Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
1 year ago
Chris Sidebottom
ba2e989c67
Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
1 year ago
Martin Kroeker
a875304eb0
fix inverted conditional for NAN handling
1 year ago
Martin Kroeker
24acdd6bbb
correct offset
1 year ago
Martin Kroeker
fb7c53c5e5
Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
1 year ago
Martin Kroeker
15c53dd2e0
Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fixed numpy ci test failures
1 year ago
Martin Kroeker
a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
1 year ago
yamazaki-mitsufumi
88caf02f62
Fix ambiguous error on Mac OS
1 year ago
Martin Kroeker
b613754143
Update scal..c
1 year ago
Martin Kroeker
f5d04318e3
Merge branch 'OpenMathLib:develop' into scalfixes
1 year ago
Martin Kroeker
73f8866ffb
make NAN handling depend on DUMMY2 parameter
1 year ago
Martin Kroeker
dfbc2348a8
fix NAN handling
1 year ago
Martin Kroeker
c064319ecb
fix alpha=NAN case
1 year ago
Martin Kroeker
c2ffd90e8c
make NAN handling depend on dummy2 parameter
1 year ago
Chris Sidebottom
ea4ab3b310
Better header guard around bridge
1 year ago
Chris Sidebottom
7311d93016
Unroll TT further
1 year ago
Martin Kroeker
a815594fd1
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
1 year ago
Martin Kroeker
dd6c33d34d
make NAN handling depend on dummy2 parameter
1 year ago