Martin Kroeker
454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
Add dgemm kernel for arm64 SVE
4 years ago
Bine Brank
f4da23dcb6
reduced dgemm_unroll_m to work with 128-bit sve
4 years ago
Bine Brank
9388f05a3c
configure SVE Makefile
4 years ago
Martin Kroeker
52a3f004a0
Fix unintended reversion of recent CortexA53 changes
4 years ago
Martin Kroeker
19ccef5fb1
Add generic MIPS32 target
4 years ago
Jia-Chen
302f22693a
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
4 years ago
Martin Kroeker
46947efb83
Ignore compiler support for MIPS MSA if the cpu lacks this capability
4 years ago
Bine Brank
ab7917910d
add v2x8 kernel + fix sve dtrmm
4 years ago
Bine Brank
7093372e32
add ARMV8SVE target
4 years ago
Wangyang Guo
7b2f5cb3b7
sbgemm: spr: enlarge P to 256 for performance
4 years ago
Wangyang Guo
0abbcd19c1
sbgemm: spr: tuning for blocking params
4 years ago
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
4 years ago
Martin Kroeker
24233b7c49
Use "big arm server" GEMM defaults for Vortex
4 years ago
kavanabhat
fe3c778c51
AIX changes for P10 with GNU Compiler
4 years ago
Wangyang Guo
8356a604f0
sbgemm: cooperlake: tuning for block params
4 years ago
Niyas Sait
7cddbf99b1
Make explicit conversion condition on _WIN64 flag
4 years ago
Niyas Sait
d1ed72fa87
[win/arm64]: Explicit casting for GEMM_DEFAULT_ALIGN to create 64-bit value
Win64 uses LLP64 datamodel and unsigned long is only 32-bit. For 64-bit
architecture we need 64-bit mask to correctly generate address
4 years ago
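
The LLP64 issue described in the commit above can be illustrated with a minimal C sketch. It is not the code from param.h: the function names, the mask value `0x0ffff`, and the sample address are hypothetical, and `uint32_t` is used only to model a 32-bit `unsigned long` portably. It shows how a mask built in a 32-bit type drops the upper half of a 64-bit address, while a mask built in a 64-bit type does not.

```c
#include <stdint.h>

/* Hypothetical helper: align an address down using a mask that was
 * built in a 32-bit unsigned type (as `unsigned long` is under LLP64). */
static uint64_t align_down_32bit_mask(uint64_t addr) {
    uint32_t mask = ~(uint32_t)0x0ffff;  /* 0xffff0000 */
    /* The mask is zero-extended to 64 bits, so the upper 32 address
     * bits are cleared along with the low 16 bits. */
    return addr & mask;
}

/* Hypothetical helper: same operation with the mask built in 64 bits. */
static uint64_t align_down_64bit_mask(uint64_t addr) {
    uint64_t mask = ~(uint64_t)0x0ffff;  /* 0xffffffffffff0000 */
    return addr & mask;                  /* only the low 16 bits are cleared */
}
```

With an address above 4 GiB, the first helper returns a truncated value, which is the kind of wrong address the explicit 64-bit cast is meant to avoid.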
gxw
af0a69f355
Add support for LOONGARCH64
4 years ago
Martin Kroeker
a6351e32f0
Remove BLASLONG casts from SPARC entries
in response to https://github.com/xianyi/OpenBLAS/pull/3266#issuecomment-878637675
4 years ago
User User-User
b7da75e4fd
WiP CORTEX A55 support
4 years ago
Martin Kroeker
7dfc45e840
Remove casts for PPC/POWER and complete parameters for POWER3/4
4 years ago
Gordon Fossum
198adea961
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
4 years ago
Martin Kroeker
8cdf0825de
Add workaround for older gcc on ppc64be not supporting casts in defines
4 years ago
Martin Kroeker
ecb4babcf4
remove inclusion of common.h again to avoid circular dependency
4 years ago
Martin Kroeker
30d835168a
Merge pull request #3088 from xoviat/msvc
add misc fixes.
4 years ago
austinpagan
9579bd47e5
Modifying a couple of parameters in the "POWER10"-specific section of param.h, for performance enhancements for SGEMM and DGEMM.
4 years ago
Rajalakshmi Srinivasaraghavan
63fa6c832e
Fix build issue on POWER8 with DYNAMIC_ARCH
Running make DYNAMIC_ARCH=1 on POWER8 BE with gcc 10.2 gives
the following error due to the difference in UNROLL_M/N:
'No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel'
5 years ago
xoviat
457ccc42c9
Merge branch 'develop' into msvc
5 years ago
Gordon Fossum
ed652d8136
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.
5 years ago
Martin Kroeker
83de62c20d
Merge pull request #3026 from martin-frbg/revert747
Revert PR747 - SYRK parameter changes for Haswell and related targets
5 years ago
gxw
4b548857d6
Add msa support for loongson
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
5 years ago
Martin Kroeker
d71fe4ed4e
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)
5 years ago
Martin Kroeker
b0b14f4e9b
Change comments to C style for compatibility
5 years ago
Rajalakshmi Srinivasaraghavan
41fe6e864e
POWER10: Update param.h
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q helps
in improving performance ~10% for DGEMM.
5 years ago
Xianyi Zhang
fc35b72ae1
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
5 years ago
Xianyi Zhang
913cc9a4ca
Merge branch 'develop' into risc-v
5 years ago
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf
POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
5 years ago
Zhang Xianyi
d7ba7679b6
Merge branch 'develop' into risc-v
5 years ago
damonyu
ef8e7d0279
Add the support for RISC-V Vector.
Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266
5 years ago
Martin Kroeker
ca31c32693
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
5 years ago
Marius Hillenbrand
e115c97e05
s390x/SGEMM: adjust default P and Q to multiples of M
We recently changed the register blocking for SGEMM on s390x to 16x4.
However, we did not adjust Q to a multiple of 16 and thus fell back to
the 8x4 kernel at each block's margin, without need. Adjust P and Q to
multiples of 16 to employ the faster 16x4 kernel for complete full-sized
blocks.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
5 years ago
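
The reasoning in the s390x commit above (a block size that is not a multiple of the register-blocking width leaves a margin for a narrower kernel) can be sketched in C as follows. The helper names and the block size 456 used in the usage note are hypothetical and are not the values in param.h; only the width of 16 comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical helper: elements of a block that remain after tiling
 * it with full panels of width `unroll`. A nonzero result means the
 * margin has to be handled by a narrower kernel. */
static size_t leftover(size_t block, size_t unroll) {
    return block % unroll;
}

/* Hypothetical helper: smallest multiple of `unroll` that is >= value. */
static size_t round_up_to_multiple(size_t value, size_t unroll) {
    return ((value + unroll - 1) / unroll) * unroll;
}
```

For example, a block of 456 with a width of 16 leaves a margin of 8, whereas rounding it up to 464 leaves none, so every panel of a full-sized block can use the 16-wide kernel.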
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
5 years ago
Martin Kroeker
bd2498c886
Use POWER6 GEMM parameters on 32bit POWER8
5 years ago
Rajalakshmi Srinivasaraghavan
d23419accc
powerpc: Optimized SHGEMM kernel for POWER10
This patch introduces new optimized version of SHGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.
Tested on simulator and there are no new test failures.
5 years ago
Rajalakshmi Srinivasaraghavan
9fe930f205
powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
5 years ago
Martin Kroeker
f16e39554d
Change PPCG4 CGEMM_M to match kernel change
5 years ago
张丹枫
ea5bdc3f72
split cortex-a53 param to match 8x8 kernel
5 years ago
Marius Hillenbrand
1b0b4349a1
s390x/Z14: Change register blocking for SGEMM to 16x4
Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Actually make KERNEL.Z14 more flexible, so that the
change in param.h suffices. As a result, performance for SGEMM improves
by around 30% on z15.
On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the amount of elements of C that are held in
registers in an SGEMM kernel.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
5 years ago
Martin Kroeker
03ff213c51
Increase POWER8 ZGEMM_R and use same R values for POWER9
fixes lapack-test zger failures seen in #2299 after application of my PR #2551
5 years ago