Martin Kroeker
f73cfb7e2c
change line endings from CRLF to LF
3 years ago
Martin Kroeker
1688c7da43
change line endings from CRLF to LF
3 years ago
Bart Oldeman
6c1043eb41
Add [cz]scal microkernels for SKYLAKEX
These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.
Note that before this change SKYLAKEX+ used generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by
commit 086d87a30). However, #3799 now disables FMAs (in turn enabled
by `-march=skylake-avx512`) in the plain C code, which fixes excessive
LAPACK test failures more nicely.
3 years ago
Martin Kroeker
c9d78dc3b2
Remove excess initializer (leftover from rework of PR 3793)
3 years ago
Martin Kroeker
65338a9493
Merge pull request #3799 from bartoldeman/cscal-zscal-no-fma
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
3 years ago
Honglin Zhu
79066b6bf3
Change file name to match the norm and delete useless code.
3 years ago
Bart Oldeman
e7e3aa2948
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.
Moreover using FMAs for multiplication of complex numbers can give surprising
results, see 22aa81f for more information.
This uses the same syntax as used in 22aa81f for zarch (s390x).
3 years ago
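The effect described in this commit can be sketched in plain C. The commit message does not show the syntax it uses, so the pragmas below are an assumption: they are the usual GCC and Clang ways of turning off floating-point contraction for a translation unit. The function name `zscal_sketch` and the interleaved real/imaginary layout are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: disable FP contraction (fused multiply-add) for this file,
 * so each product below is rounded on its own before the add/subtract. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize ("fp-contract=off")
#endif
#if defined(__clang__)
#pragma clang fp contract(off)
#endif

/* Hypothetical helper: scale n complex doubles, stored as interleaved
 * (real, imaginary) pairs in x, by the complex factor (ar + i*ai). */
void zscal_sketch(size_t n, double ar, double ai, double *x)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        double re = x[2 * i];
        double im = x[2 * i + 1];
        x[2 * i]     = ar * re - ai * im;  /* real part */
        x[2 * i + 1] = ar * im + ai * re;  /* imaginary part */
    }
}
```

With contraction enabled, a compiler may fuse one product of each line into the add or subtract, so the two products are rounded differently. That asymmetry is the kind of inconsistency with the microkernels that the commit refers to.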
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
3 years ago
Honglin Zhu
843e9fd0b9
Fix typo
3 years ago
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a power of 4.
3 years ago
Martin Kroeker
f6f35a4288
fix copyobj declarations to work with DYNAMIC_ARCH
3 years ago
Martin Kroeker
b1d69fb3ac
Add MIPS64_GENERIC as a copy of GENERIC
3 years ago
gxw
edea1bcfaf
MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500
3 years ago
Martin Kroeker
101a2c77c3
Fix warnings
3 years ago
Martin Kroeker
23d59baaf1
Add -mfma to -mavx2 for Apple clang, and set AVX2 options for Zen as well
3 years ago
gxw
365936ae1b
MIPS64: Using the macro MTC rather than MTC1
3 years ago
Martin Kroeker
739c3c44a7
Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745)
Add pragma to disable the gcc tree-optimizer for some x86_64 S and Z kernels with gcc12 on OSX or Windows
3 years ago
Martin Kroeker
bd30120ba7
Merge pull request #3720 from FlyGoat/mips64
Make it work on general MIPS64 processors
3 years ago
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Jiaxun Yang
50c4eeb97d
alpha: Remove include of version.h
It will be defined by a preprocessor argument.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Ivan Pribec
802e71bf05
Add const attribute to lsame
3 years ago
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
Martin Kroeker
cd8e57040c
Merge pull request #3691 from martin-frbg/issue3679-sparc
SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow
3 years ago
Martin Kroeker
6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow
3 years ago
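The commit messages for the DNRM2 fixes do not describe the change itself. As background only, the classic way to keep a Euclidean norm from overflowing or underflowing in the intermediate squares is the scaled sum-of-squares recurrence, sketched below. The names `sumsq_t` and `sumsq_scaled` are hypothetical, and this is not the SPARC kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: the sum of squares of x equals scale^2 * ssq. */
typedef struct {
    double scale;  /* largest absolute value seen so far */
    double ssq;    /* sum of (x[i] / scale)^2 */
} sumsq_t;

/* Accumulate a scaled sum of squares so that no x[i]^2 is ever formed. */
sumsq_t sumsq_scaled(size_t n, const double *x)
{
    assert(x != NULL || n == 0);
    sumsq_t s = { 0.0, 1.0 };
    for (size_t i = 0; i < n; i++) {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        if (a == 0.0)
            continue;
        if (s.scale < a) {
            /* New maximum: rescale the running sum to the new scale. */
            double r = s.scale / a;
            s.ssq = 1.0 + s.ssq * r * r;
            s.scale = a;
        } else {
            double r = a / s.scale;
            s.ssq += r * r;
        }
    }
    return s;
}
```

The norm is then `scale * sqrt(ssq)`. For an all-zero input the pair stays at `scale = 0`, `ssq = 1`, which gives a norm of zero rather than a NaN or infinity.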
Martin Kroeker
c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
3 years ago
Martin Kroeker
b7c65d08cb
Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
POWER10: dgemv builtin rename
3 years ago
Martin Kroeker
06ef015234
fix DNRM2 returning INF instead of zero due to intermediate overflow
3 years ago
Rajalakshmi Srinivasaraghavan
a612e78a97
POWER: Fix complex dot function failures
Some complex dot function tests fail when compiling with gcc12.
The machine constraints currently used do not update all four elements in the
expected result array. This is fixed by reducing the optimization level.
The change does not affect performance numbers, and the code will be converted to C in the future.
3 years ago
Rajalakshmi Srinivasaraghavan
432fd99445
POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
3 years ago
gxw
4dd05e526b
LoongArch64: Fix dnrm2_tiny testcase failure
3 years ago
gxw
cce4b1d956
MIPS64: Fix dnrm2_tiny testcase failure
3 years ago
Martin Kroeker
e12d474780
Eliminate uses of CREAL on left-hand side of assignments
3 years ago
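For context, standard C's `creal()` returns a value, not an lvalue, so assigning through it is not portable. How the commit rewrote the affected assignments is not shown here. The sketch below is one portable alternative, under the assumption that only the real part needs to change. It relies on the C99 guarantee that a complex type is laid out like an array of two reals. The helper name `set_real_part` is hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: overwrite the real part of *z and leave the
 * imaginary part untouched. C99 guarantees that a double complex has the
 * layout of double[2], with element 0 real and element 1 imaginary. */
void set_real_part(double complex *z, double re)
{
    assert(z != NULL);
    ((double *)z)[0] = re;
}
```

Writing through the array view avoids compiler-specific lvalue extensions and does not recompute the imaginary part, so infinities and NaNs in it are preserved.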
Martin Kroeker
9e29598575
Work around fault with ssq=inf, scale=0
3 years ago
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
3 years ago
Honglin Zhu
bc3728475f
format code
3 years ago
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
3 years ago
Honglin Zhu
04593bb27c
neoverse n2 sbgemm: init file
3 years ago
Martin Kroeker
be5500e704
Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
3 years ago
Martin Kroeker
92275a7902
Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
3 years ago
VFerrari
cac634fce3
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue that occurs when OpenBLAS is compiled for
TARGET=POWER10 with USE_THREAD set to 0.
The function `num_cpu_avail` is only available when USE_THREAD=1,
i.e. when SMP is defined.
3 years ago
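The kind of guard this commit describes can be sketched as follows. Everything here is illustrative: `query_available_threads`, `choose_thread_count`, and `MT_THRESHOLD` are hypothetical names, and the stub stands in for a threading query that exists only in SMP builds. It is not the OpenBLAS code.

```c
#include <assert.h>

/* Hypothetical stand-in for a thread-count query that is only compiled
 * in threaded (SMP) builds. */
#ifdef SMP
int query_available_threads(void) { return 4; }
#endif

/* Hypothetical problem size above which threading is considered. */
#define MT_THRESHOLD 10000L

/* Pick a thread count without referencing the SMP-only function when
 * the library is built single-threaded. */
int choose_thread_count(long n)
{
    assert(n >= 0);
#ifdef SMP
    if (n > MT_THRESHOLD)
        return query_available_threads();
#endif
    return 1;
}
```

Keeping the call inside the `#ifdef SMP` block means a build without threading never references the missing symbol and always takes the single-threaded path.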
Martin Kroeker
9283c7c0b5
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
3 years ago
Rajalakshmi Srinivasaraghavan
f191bc652b
POWER10: Fix ZGEMM testcase failures
This patch fixes the saving and restoring of non-volatile registers
in the zgemm POWER10 kernel.
3 years ago
Rajalakshmi Srinivasaraghavan
8419d538ff
POWER10: convert dgemv inline assembly
This patch uses compiler builtins and matches the performance of the
assembly version. Tested with clang14 and gcc12.
3 years ago
Xianyi Zhang
5e9a912591
Merge branch 'develop' into risc-v
3 years ago
Xianyi Zhang
968e1f51d8
Update RISC-V Intrinsic API.
3 years ago
Nursultan Zarlyk
1bb7993a97
Fix MSVC ARM64 build. Add generic kernel for ARM64
3 years ago
Martin Kroeker
dc49edd4e6
Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH"
3 years ago
Rajalakshmi Srinivasaraghavan
b62173c5a0
POWER10: Changing store instructions for Level1 functions
This patch changes 32-byte stores to two 16-byte stores
to fix a recent performance degradation caused by 32-byte stores.
3 years ago
Martin Kroeker
84cb58b7fb
Fix generator rules for ?laswp_ncopy and ?neg_tcopy
3 years ago
Martin Kroeker
05dcfa176e
fix undefined prefetchsizes
3 years ago