Martin Kroeker
2d0929fa7c
Move the test for zero incx,incy in ARMV7 ROT
to pass the related utest (see #1469)
7 years ago
Martin Kroeker
125343cc88
Drop test for zero incx,incy in armv7 AXPY
...to pass the related utest (see #1469)
7 years ago
Martin Kroeker
22167170b3
Merge pull request #1477 from quickwritereader/develop
Power8 blas3 copy-pack routines
8 years ago
Ashwin Sekhar T K
fa9ca65c0e
ARM64: Fix utest dsdot errors
8 years ago
Martin Kroeker
719b68f077
Merge pull request #1473 from martin-frbg/p2align
Replace .align with .p2align in dscal.c and the Nehalem microkernels as well
8 years ago
Martin Kroeker
fe9f15f2d8
Merge pull request #1472 from martin-frbg/utest-fixes
Fix limited DSDOT precision on arm,aarch64 and zarch
8 years ago
Martin Kroeker
497f0c3d8a
Replace .align with .p2align in the Nehalem microkernels
8 years ago
Martin Kroeker
ea37db828e
Convert .align to .p2align for OSX compatibility
8 years ago
Martin Kroeker
6e70287776
Use generic/dot.c for DSDOT on ARMV5 and above
The default arm/dot.c is less precise when used for DSDOT, as shown by the utest
8 years ago
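The precision gap behind these DSDOT commits can be illustrated with a small sketch. Reference BLAS specifies DSDOT as the inner product of two single-precision vectors with accumulation in double precision. The sketch contrasts that with a loop that rounds its running sum back to single precision after every step.

It is written in Python and emulates IEEE single-precision rounding through `struct`. The helper names `to_f32`, `dot_f32_accum` and `dsdot_f64_accum` are hypothetical. This is not the code in arm/dot.c or generic/dot.c, only one plausible way such a precision loss can arise.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_f32_accum(x, y):
    """Hypothetical single-precision dot: the running sum is rounded to float after each step."""
    acc = 0.0
    for a, b in zip(x, y):
        acc = to_f32(acc + to_f32(a * b))
    return acc


def dsdot_f64_accum(x, y):
    """Hypothetical DSDOT-style dot: float inputs, double-precision accumulation."""
    acc = 0.0
    for a, b in zip(x, y):
        # The product of two single-precision values (24-bit significands) fits
        # exactly in a double's 53-bit significand, so only the sum is rounded.
        acc += a * b
    return acc


if __name__ == "__main__":
    # 2**24 + 1 is the smallest positive integer not representable in single precision.
    x = [to_f32(2.0**24), to_f32(1.0)]
    y = [to_f32(1.0), to_f32(1.0)]
    print(dot_f32_accum(x, y))    # 16777216.0: the +1 is lost to rounding
    print(dsdot_f64_accum(x, y))  # 16777217.0
```

Accumulating in double keeps the low-order contribution that the single-precision sum rounds away. A DSDOT unit test with mixed-magnitude inputs would be expected to flag exactly this kind of discrepancy.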
Martin Kroeker
58f236ad73
Use generic/dot.c for DSDOT on zarch
8 years ago
Martin Kroeker
e207107150
Use generic/dot.c for DSDOT on z13
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
8 years ago
Martin Kroeker
c9d408064a
Use dot.S also for DSDOT on CORTEXA57
8 years ago
Martin Kroeker
288d1a3f6e
Use dot.S also for DSDOT on ARMV8
8 years ago
Martin Kroeker
7c1925acec
Use .p2align instead of .align for compatibility on Sandybridge as well
8 years ago
Martin Kroeker
2359c7c1a9
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance
as observed in #730, #901 and most recently #1470
8 years ago
Martin Kroeker
e7366a4161
Restore the remaining utests (#1462)
* Restore the remaining utests
* Try fork test on Cygwin and Linux only; it hangs on at least ARMv8/Android as well
* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest
* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably a coding error in the initialization of the complex numbers, or the wrong choice of zdotu API)
8 years ago
the mslm
2c0a008281
dgemm_ncopy_4_ save/restore
8 years ago
the mslm
c5425daa6b
power8 ?gemm_tcopy save/restore
8 years ago
Martin Kroeker
b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
ref #1439
8 years ago
Abdelrauf
60596a1abc
Merge branch 'develop' into develop
8 years ago
Abdelrauf
afd514c25d
small fix inside ifdef z13mvc (z13mvc code is not used in production)
8 years ago
Martin Kroeker
f45776ec1f
Merge pull request #1440 from quickwritereader/develop
small corrections
8 years ago
Martin Kroeker
e388459a27
Merge pull request #1419 from brada4/develop
Initialize uninitialized values for repeated calls
8 years ago
Abdelrauf
f653e7a18d
small fix
small fix inside ifdef z13mvc (z13mvc code is not used in production)
8 years ago
the mslm
f946a89432
zscal (case: real alpha=0) microkernel shift & mem fix, da_i as input reg, small typo fixes
8 years ago
Martin Kroeker
485df77612
Make USE_TRMM depend on TARGET_CORE not TARGET
Fixes #1432 (and possibly other DTRMM-related failures on Haswell and related architectures when built with cmake)
8 years ago
Martin Kroeker
e4c71a799a
Merge pull request #1426 from quickwritereader/develop
(Z13) Blas1 microkernels can be inlined by gcc. Refactoring, fixes, tunings
8 years ago
the mslm
2619ad7ea5
Blas1 microkernels can be inlined by gcc. Refactoring (symbolic operand names). Some fixes and tunings
8 years ago
Andrew
e5cc3d72c0
core.IdenticalExpr clang501 checker
8 years ago
Andrew
4938faa822
core.IdenticalExpr clang501 checker
8 years ago
Andrew
9fa986337d
add missing brackets to silence indentation warnings gcc721
8 years ago
Andrew
3eed97f6b9
Initialize values to silence cppcheck
8 years ago
Andrew
13e137fbc9
Initialize uninitialized variables (cppcheck)
8 years ago
Martin Kroeker
3d23f45107
Merge pull request #1415 from quickwritereader/develop
(Z systems Z13) small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) microkernels…
8 years ago
Abdelrauf
87669d1c0a
small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) microkernels can be inlined
8 years ago
Martin Kroeker
42285d8e70
Merge pull request #1410 from brada4/develop
Address warnings #1357
8 years ago
Andrew
d602b99386
LAPACK helpers in C that need care too
8 years ago
Andrew
4d0b005e5b
Eliminate remaining unused results in kernels (clang5 analyzer)
8 years ago
Martin Kroeker
b81656936f
Merge pull request #1409 from martin-frbg/issue1292-2
Tag %1 and %2 as both input and output operands
8 years ago
Martin Kroeker
b973990df2
Tag %1 and %2 as both input and output operands
fix from #1292 extended to the other gemv microkernels
8 years ago
Martin Kroeker
1e31124eb0
Merge pull request #1406 from martin-frbg/issue1292
Tag %1 and %2 as both input and output
8 years ago
Martin Kroeker
cc9500db41
Merge pull request #1403 from brada4/develop
Address few more warnings
8 years ago
Martin Kroeker
723f396a20
Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
8 years ago
Andrew
03e5ff0687
initialize potentially uninitialized variables (clang5)
8 years ago
Andrew
47deec2c1a
fix a couple of dead assignment warnings
8 years ago
Martin Kroeker
43c0622e7b
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
related to issue #1332
8 years ago
Martin Kroeker
0623636c98
Use Sandybridge daxpy kernel on Haswell and Zen for now
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with
any of the other Intel x86_64 microkernels.
8 years ago
Andrew
281a2b952f
warning cleanup (#1380)
* dead increments in driver/level2
* dead increments in kernel/generic
* some of the dead increments in kernel/x86_64
8 years ago
Martin Kroeker
8213385ab8
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels
8 years ago
Martin Kroeker
db00a51e6b
Merge pull request #1371 from martin-frbg/develop
Add trivially optimized DSDOT for POWER8
8 years ago