Martin Kroeker
e12d474780
Eliminate uses of CREAL on left-hand side of assignments
3 years ago
Martin Kroeker
9e29598575
workaround fault with ssq=inf,scale=0
3 years ago
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
3 years ago
Honglin Zhu
bc3728475f
format code
3 years ago
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
3 years ago
Honglin Zhu
04593bb27c
neoverse n2 sbgemm: init file
3 years ago
Martin Kroeker
be5500e704
Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
3 years ago
Martin Kroeker
92275a7902
Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
3 years ago
VFerrari
cac634fce3
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
with the flag USE_THREAD set to 0.
The function `num_cpu_avail` is only available when USE_THREAD=1,
that is, when SMP is defined.
3 years ago
Martin Kroeker
9283c7c0b5
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
3 years ago
Rajalakshmi Srinivasaraghavan
f191bc652b
POWER10: Fix ZGEMM testcase failures
This patch fixes storing and restoring of non-volatile registers
in the zgemm POWER10 kernel.
3 years ago
Rajalakshmi Srinivasaraghavan
8419d538ff
POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches the performance
of the assembly version. Tested with clang14 and gcc12.
3 years ago
Xianyi Zhang
5e9a912591
Merge branch 'develop' into risc-v
3 years ago
Xianyi Zhang
968e1f51d8
Update RISC-V Intrinsic API.
3 years ago
Nursultan Zarlyk
1bb7993a97
Fix MSVC ARM64 build. Add generic kernel for ARM64
3 years ago
Martin Kroeker
dc49edd4e6
Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH"
3 years ago
Rajalakshmi Srinivasaraghavan
b62173c5a0
POWER10: Changing store instructions for Level1 functions
This patch changes 32-byte stores to two 16-byte stores
to fix a recent degradation due to 32-byte stores.
3 years ago
Martin Kroeker
84cb58b7fb
Fix generator rules for ?laswp_ncopy and ?neg_tcopy
3 years ago
Martin Kroeker
05dcfa176e
fix undefined prefetchsizes
3 years ago
Martin Kroeker
2bbb9f05c7
fix undefined prefetchsize
3 years ago
Martin Kroeker
115bc9b98f
CortexX1 is ARMV8 like A7x
3 years ago
Martin Kroeker
b3b4672c30
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
3 years ago
Martin Kroeker
40302558ed
Remove extraneous (and wrong) definition of sbgemm_r on x86_64
3 years ago
Caroline Newcombe
5cc1111383
fix unsafe read of Y in assembly kernel
3 years ago
Xianyi Zhang
45786b05da
Merge branch 'develop' into risc-v
3 years ago
Wangyang Guo
225683218c
Small Matrix: use proper inline asm input constraint for AVX512 mask
3 years ago
Martin Kroeker
9c626e466e
really fix definition of SHUFFLE_MAGIC_NO
3 years ago
Martin Kroeker
0698212c8c
Remove stray $
3 years ago
Martin Kroeker
9d7429406f
Declare SHUFFLE_MAGIC_NO as const to placate clang
3 years ago
Martin Kroeker
d9894f45d3
Define sbgemm_r to fix DYNAMIC_ARCH builds
3 years ago
Martin Kroeker
522f809825
Merge pull request #3542 from martin-frbg/issue3540
Fix compilation for CooperLake on Windows/clang
3 years ago
Mosè Giordano
abbc947edb
Fix compilation of Skylake AVX512 kernels with GCC 6
3 years ago
Martin Kroeker
c62f8e2c01
Prevent compiler attempts to use k0 as mask register
3 years ago
Martin Kroeker
80eb581c83
Fix non-portable u_int64_t
3 years ago
Martin Kroeker
73ffabe6ba
Guard uses of _mm512_reduce_add_p?
3 years ago
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
4 years ago
Martin Kroeker
addc2a7aaa
Add proper defaults for IMIN/IMAX
4 years ago
Martin Kroeker
299d4d70a3
Add default KERNEL file for Elbrus E2K arch
4 years ago
Martin Kroeker
3492bea602
Create Makefile
4 years ago
Martin Kroeker
898cf5faf3
Add Elbrus e2k architecture support
4 years ago
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
4 years ago
Bine Brank
19d435b1b3
update armv8sve + contributors
4 years ago
Bine Brank
f158d59087
adapt CMake
4 years ago
Bine Brank
b6a445cfd8
adapt Makefile for SVE trsm
4 years ago
Bine Brank
0fb6cc07bf
fix ztrsm lt/ut copy
4 years ago
Bine Brank
f1315288a8
add sve ztrsm
4 years ago
Bine Brank
aaa2b1a861
fix sve dtrsm kernels
4 years ago
Bine Brank
8071e179f1
add remaining sve trsm copy kernels
4 years ago
Bine Brank
f87468ac91
trsm_lncopy_sve
4 years ago
Bine Brank
e8939b3d30
sve trsmRN and trsmRT
4 years ago