Martin Kroeker
2a5fe97e3b
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN
1 year ago
Martin Kroeker
7f8f037a36
handle INF and NAN in input
1 year ago
Martin Kroeker
f1248b849d
handle INF and NAN in input
1 year ago
Rajalakshmi Srinivasaraghavan
e112191b54
POWER: Fix issues in zscal to address lapack failures
This patch fixes following lapack failures with clang compiler on POWER.
zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold
zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold
1 year ago
Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix regression SAXPY when compiler with OpenXL compiler.
1 year ago
Chip Kerchner
3a1417671a
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
1 year ago
Amrita H S
87b3d9054f
Fix regression SAXPY when compiler with OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64 element loop and
to it the remainder loop is the main loop. It vectorizes
and interleaves the remainder to be a 48 elements per
iteration loop. With a max of 63 iterations, a 48 element
loop is mostly not going to get executed, so the 1 element
scalar loop that is the remainder after that is probably
mostly what gets executed.
This can be fixed by adding a pragma, loop interleave_count(2)
which will result in 8 element loop.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
1 year ago
Chip-Kerchner
99384933ff
Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea1555159d0928a6aa2db740c042c7e8f0dd3, reversing
changes made to b925353006 .
1 year ago
Martin Kroeker
accea15551
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
1 year ago
austinpagan
87ba528d8b
Changed C files to straighten out indentation. Removed commented lines from other file.
2 years ago
austinpagan
ddac75e0ef
Adding .C versions of CGEMM and ZGEMM
2 years ago
Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
2 years ago
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
2 years ago
Chip-Kerchner
058dd2a4cb
Replace two vector loads with one vector pair load and fix endianess of stores - DGEMM versions.
2 years ago
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2 years ago
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianess of stores.
2 years ago
Rajalakshmi Srinivasaraghavan
980f702f72
POWER: AIX: Make use of power10 optimization
POWER10 optimizations are disabled when using default AIX assembler.
As we have fixed many issues recently, enabling optimization path
for default assembler.
2 years ago
Rajalakshmi Srinivasaraghavan
82fc29a57a
POWER10: Fallback to POWER8 functions
As cgemm and zgemm kernels are not optimized for big endian falling
back to POWER8 versions. Tested on AIX using gcc and Open XL C.
2 years ago
Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2 years ago
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2 years ago
TGY
b5ba95a6c0
Modernize obsolete inline order
2 years ago
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Manjul Mohan
58b88aa5f0
POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.
Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2 years ago
Martin Kroeker
1688c7da43
change line endings from CRLF to LF
3 years ago
Martin Kroeker
6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow
3 years ago
Martin Kroeker
c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
3 years ago
Rajalakshmi Srinivasaraghavan
a612e78a97
POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints used now do not update all the four elements in the
expected result array. Fixing this with a reduced level of optimization.
This is not changing any performance numbers but will be converted to C code in future.
3 years ago
Rajalakshmi Srinivasaraghavan
432fd99445
POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
3 years ago
VFerrari
cac634fce3
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.
The function `num_cpu_avail` is only available when USE_THREAD=1,
so SMP is defined.
3 years ago
Martin Kroeker
9283c7c0b5
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
3 years ago
Rajalakshmi Srinivasaraghavan
f191bc652b
POWER10: Fix ZGEMM testcase failures
This patch fixes storing and restoring non volatile registers
in zgemm POWER10 kernel.
3 years ago
Rajalakshmi Srinivasaraghavan
8419d538ff
POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches with assembly
performance. Tested with clang14 and gcc12.
3 years ago
Rajalakshmi Srinivasaraghavan
b62173c5a0
POWER10: Changing store instructions for Level1 functions
This patch changes 32 bytes stores to two 16 bytes stores
to fix a recent degradation due to 32 bytes stores.
3 years ago
Martin Kroeker
05dcfa176e
fix undefined prefetchsizes
3 years ago
Martin Kroeker
2bbb9f05c7
fix undefined prefetchsize
3 years ago
Rafael Cardoso Fernandes Sousa
c78fdcc80d
[POWER] Add support for SMALL_MATRIX_OPT
4 years ago
kavanabhat
9cc95e5657
AIX changes for P10 with GNU Compiler
4 years ago
kavanabhat
fe3c778c51
AIX changes for P10 with GNU Compiler
4 years ago
Rafael Cardoso Fernandes Sousa
b751edf624
Fix unused variable warnings on Power
4 years ago
Rajalakshmi Srinivasaraghavan
b06880c2cd
POWER10: Improving dasum performance
Unrolling a loop in dasum micro code to help in improving
POWER10 performance.
4 years ago
Martin Kroeker
c4b464cac6
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
Power10: Fix for SBGEMM
4 years ago
Gordon Fossum
e6dd44d989
Power10: Fix for SBGEMM
While testing bfloat16 sbgemm kernel, there are some failures for odd value inputs due to updating result for
additional bytes.
4 years ago
Martin Kroeker
2e8ff4a781
Merge pull request #3266 from martin-frbg/powerparam
Remove spurious casts from PPC parameters and fix compilation for older targets
4 years ago
Martin Kroeker
efdbdd8f82
Add prefetch values for power3
4 years ago
Martin Kroeker
3906ef3b0f
Add prefetch values for power3
4 years ago
Martin Kroeker
8adf0971d8
Add prefetch values for power3
4 years ago
Martin Kroeker
08e2e60762
Add prefetch values for power3
4 years ago
Martin Kroeker
fb9e678235
Fix caxpy/zaxpy for big-endian
4 years ago
Martin Kroeker
dc4fcb48df
Fix inverted conditional for caxpy/zaxpy
4 years ago
Martin Kroeker
7a48247761
fix c/zrot and sgemv for POWER5
4 years ago