Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
2 years ago
Martin Kroeker
04bc801999
(Re)apply fixes for supporting only a subset of precision types from PR 3915
2 years ago
Martin Kroeker
9019bc4945
Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well
2 years ago
Martin Kroeker
3bfa4d4dcc
Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE
2 years ago
Rajalakshmi Srinivasaraghavan
980f702f72
POWER: AIX: Make use of power10 optimization
POWER10 optimizations were disabled when using the default AIX assembler.
As many issues have been fixed recently, enable the optimization path
for the default assembler.
2 years ago
Rajalakshmi Srinivasaraghavan
9f42570e33
POWER: Increase macro size limit for AIX
This patch increases the macro size limit from 4096 to 16384 to
allow compiling larger assembly files in AIX.
Tested with GCC and IBM Open XL C.
2 years ago
Martin Kroeker
9f49aef91b
Merge pull request #4255 from RajalakshmiSR/AIX-P10
POWER10: Fix compilation issues with Open XL C
2 years ago
Martin Kroeker
e7d05402e0
Fix up S/D GEMM copy function definitions after #4009
2 years ago
Rajalakshmi Srinivasaraghavan
71d733e5f7
POWER: Avoid m4 conversions for C files
This patch removes the intermediate m4 conversions used in sbgemm
compilation, as they are not needed for .c files.
Tested on AIX with gcc and IBM Open XL C.
2 years ago
Rajalakshmi Srinivasaraghavan
82fc29a57a
POWER10: Fall back to POWER8 functions
As the cgemm and zgemm kernels are not optimized for big endian, fall
back to the POWER8 versions. Tested on AIX using gcc and Open XL C.
2 years ago
Rajalakshmi Srinivasaraghavan
db0805906b
powerpc: Fix build errors with Open XL C
This patch fixes errors when using Open XL C compiler on AIX.
Tested with gcc/xlf and ibm-clang/xlf compiler combinations.
2 years ago
Martin Kroeker
675cd551da
fix improper function prototypes (empty parentheses)
2 years ago
gxw
d15e0a055c
LoongArch64: Fix compilation issues when DYNAMIC_ARCH is enabled
2 years ago
gxw
4670eb1462
LoongArch64: Add dtrsm kernel
2 years ago
gxw
f2cf929374
LoongArch64: Add sgemv kernel
2 years ago
Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2 years ago
gxw
394a1fd1bf
LoongArch64: Compatible with early internal toolchain
__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the
general and floating-point register widths. However, previous versions did
not have them, requiring additional checks to be added.
2 years ago
Martin Kroeker
9c4ae4d4fb
Merge pull request #4206 from martin-frbg/issue4201-2
Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler
2 years ago
Martin Kroeker
88435104c8
Merge pull request #4204 from martin-frbg/llvm17-2
Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM
2 years ago
Martin Kroeker
fc8894dd98
Work around miscompilation by NVIDIA nvc
2 years ago
Martin Kroeker
7a6203ffa1
restore default Neoverse SVE build instructions for non-NVIDIA compilers
2 years ago
Martin Kroeker
2c3034ff7f
Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well
2 years ago
Martin Kroeker
8794544b43
Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler
2 years ago
gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2 years ago
Martin Kroeker
12ede72ab7
Merge pull request #4192 from imciner2/im/clangfix
Fix cooperlake and sapphire rapids march flags on clang
2 years ago
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2 years ago
TGY
b5ba95a6c0
Modernize obsolete inline order
2 years ago
Ian McInerney
8a8a8479be
Fix cooperlake and sapphire rapids march flags on clang
The march=cooperlake and march=sapphirerapids flags were never
added when building with Clang targeting those architectures. Instead,
the build fell back to the skylake AVX512 implementation.
Clang added support for these two architectures in Clang 9 and Clang 12
respectively, so introduce new checks for those versions to enable the
appropriate march flag, and fall back to skylake otherwise.
2 years ago
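The version gating described in the commit above can be sketched roughly as follows. This is only an illustration, not the project's build logic: the helper name `select_march`, the target strings, and the use of `skylake-avx512` as the fallback value are assumptions made for the example. Only the version thresholds (Clang 9 for cooperlake, Clang 12 for sapphirerapids) come from the commit message.

```c
#include <string.h>

/* Hypothetical helper: pick a -march value for a target name and a
 * Clang major version. Thresholds follow the commit message; the
 * target strings and the fallback value are assumptions. */
const char *select_march(const char *target, int clang_major)
{
    /* cooperlake is accepted from Clang 9 onward */
    if (strcmp(target, "COOPERLAKE") == 0 && clang_major >= 9)
        return "cooperlake";
    /* sapphirerapids is accepted from Clang 12 onward */
    if (strcmp(target, "SAPPHIRERAPIDS") == 0 && clang_major >= 12)
        return "sapphirerapids";
    /* older compilers (or other targets here) fall back to Skylake AVX512 */
    return "skylake-avx512";
}
```

With these assumptions, a Sapphire Rapids target built with Clang 11 would still get the Skylake flag, while Clang 12 or newer would get the dedicated one.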
Martin Kroeker
34da1a067d
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
07e32c4cb8
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
c211da0688
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
a34a0a7abc
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
7dd441d5db
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
f692178792
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
d15ffb7fdf
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
a2d867f4d1
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
afdc56a421
Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel
LoongArch64: Update dgemm kernel
2 years ago
gxw
e8b571d245
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2
2 years ago
gxw
71fcee6eef
LoongArch64: Update dgemm kernel
2 years ago
Martin Kroeker
0f521ece25
Merge pull request #4183 from martin-frbg/issue4181
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC in gmake builds
2 years ago
Martin Kroeker
41c31bc1d4
Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S"
2 years ago
Martin Kroeker
61d803547a
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC
2 years ago
Martin Kroeker
f8ee309402
Merge pull request #4153 from XiWeiGu/dgemv
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2 years ago
gxw
ec1e96aac8
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2 years ago
gxw
d46772e037
LoongArch64: Add compiler feature checks
2 years ago
Martin Kroeker
4664b57e6e
use shortcut only when both incx and incy are zero
2 years ago
Martin Kroeker
09131f79a6
Merge pull request #4164 from martin-frbg/issue4162
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2 years ago
Martin Kroeker
6a428b5629
Update casum_microk_skylakex-2.c
2 years ago
Martin Kroeker
ebb447e32e
Update zasum_microk_skylakex-2.c
2 years ago