Martin Kroeker
6ebcce229f
Work around current conda/tqdm auto-update problem
4 years ago
Martin Kroeker
1a8b6134c2
Merge pull request #3278 from brada4/A55
Add CORTEXA55 cpuid 0xd05 support
4 years ago
Martin Kroeker
f0b822a709
Update cpuid_arm64.c
4 years ago
User User-User
130327e9af
OK
4 years ago
User User-User
750719528a
bugz
4 years ago
User User-User
91e2b11d3c
add to cmake listings too
4 years ago
User User-User
548aa522e5
remove misplaced file
4 years ago
User User-User
6423b282a1
dynamic_arch
4 years ago
User User-User
9335d42740
add gcc8 version matching
4 years ago
User User-User
39ef0880ae
copy conf
4 years ago
User User-User
b7da75e4fd
WiP CORTEX A55 support
4 years ago
Martin Kroeker
a7627c5afd
Merge pull request #3276 from martin-frbg/issue3274
Add workaround for another macro name collision with Windows 10 SDK winnt.h
4 years ago
Martin Kroeker
9499ab0d45
Merge pull request #3275 from martin-frbg/lapack580
Fix missing EXTERNAL declarations in LAPACK TESTING (LAPACK PR 580)
4 years ago
Martin Kroeker
307c4c0786
Fix typo
4 years ago
Martin Kroeker
e83df93975
Work around another recent macro name collision with winnt.h
4 years ago
Martin Kroeker
13fa9f737d
Modify defines for CR and RC to work around name collision on Windows
4 years ago
Martin Kroeker
5958ffc9b6
Declare DZASUM as EXTERNAL
4 years ago
Martin Kroeker
cd0e4aadb1
Declare ZDROT as EXTERNAL
4 years ago
Martin Kroeker
e2621ef93a
Declare SROT as EXTERNAL
4 years ago
Martin Kroeker
9e1b43ea9b
Declare DROT as EXTERNAL
4 years ago
Martin Kroeker
5269348178
Declare CSROT as EXTERNAL
4 years ago
Martin Kroeker
92e024bbb3
Declare SCASUM as EXTERNAL
4 years ago
Martin Kroeker
c4b464cac6
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
Power10: Fix for SBGEMM
4 years ago
Gordon Fossum
e6dd44d989
Power10: Fix for SBGEMM
While testing the bfloat16 sbgemm kernel, there were some failures for odd-value inputs caused by updating the result for
additional bytes.
4 years ago
Martin Kroeker
baf03a0937
Merge pull request #3252 from martin-frbg/more_shortcuts
Further shortcuts for (small) cases that do not need buffer allocation
4 years ago
Martin Kroeker
7aab5e826c
Merge pull request #3250 from martin-frbg/gemv-shortcut
Add shortcut for small-size S/D GEMV_N with increments of one
4 years ago
Martin Kroeker
29417adf4c
Merge pull request #3270 from ggouaillardet/topic/dznrm2_tx2
arm64: add the missing d9 register to the clobber list
4 years ago
Gilles Gouaillardet
9d292d37b2
arm64: add the missing d9 register to the clobber list
Refs. numpy/numpy#18422
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
4 years ago
Martin Kroeker
2e8ff4a781
Merge pull request #3266 from martin-frbg/powerparam
Remove spurious casts from PPC parameters and fix compilation for older targets
4 years ago
Martin Kroeker
dbba381dc3
Merge pull request #3260 from intelmy/sgemv_t_opt
Optimized sgemv_t for small N based on AVX512
4 years ago
Martin Kroeker
f61991d439
Merge pull request #3264 from RajalakshmiSR/sbgemmp10
POWER10: Fixes for sbgemm kernel
4 years ago
Martin Kroeker
efdbdd8f82
Add prefetch values for power3
4 years ago
Martin Kroeker
3906ef3b0f
Add prefetch values for power3
4 years ago
Martin Kroeker
8adf0971d8
Add prefetch values for power3
4 years ago
Martin Kroeker
08e2e60762
Add prefetch values for power3
4 years ago
Martin Kroeker
fb9e678235
Fix caxpy/zaxpy for big-endian
4 years ago
Martin Kroeker
dc4fcb48df
Fix inverted conditional for caxpy/zaxpy
4 years ago
Martin Kroeker
7a48247761
fix c/zrot and sgemv for POWER5
4 years ago
Martin Kroeker
7dfc45e840
Remove casts for PPC/POWER and complete parameters for POWER3/4
4 years ago
Rajalakshmi Srinivasaraghavan
cbb70438df
POWER10: Fixes for sbgemm kernel
While testing the bfloat16 sbgemm kernel, there were some failures
for odd-value inputs caused by array access beyond the boundary.
4 years ago
Ma, Yu
706a08d4a0
Optimized sgemv_t for small N based on AVX512
4 years ago
Zhang Xianyi
9f3d903817
Merge pull request #3259 from zhaofengli/riscv64-fixes
riscv64 fixes
4 years ago
Zhaofeng Li
590be3fae3
riscv64: Add Makefile
4 years ago
Zhaofeng Li
3521cd48cb
RISCV64_GENERIC: Use generic kernel for DSDOT for better precision
The implementation in `riscv64/dot.c` fails the `test_dsdot` test, and
the generic kernel seems to have better precision. Tested on SiFive
FU740 (HiFive Unmatched) and QEMU.
Also see #1469.
4 years ago
Zhaofeng Li
1e0192a5cc
riscv64/imin: Fix wrong comparison
Same as #1990.
4 years ago
Martin Kroeker
fe9aff17fe
Merge pull request #3258 from martin-frbg/hbaction
revert "try to work around gcc update problems" in Homebrew workflow
4 years ago
Martin Kroeker
8c25b440a0
revert "try to work around gcc update problems"
...as Homebrew has dropped at least gcc8 now
4 years ago
Martin Kroeker
f84197c1a7
Add shortcuts for (small) cases that do not need expensive buffer allocation
4 years ago
Martin Kroeker
734bd265a8
revert symv changes for now
4 years ago
Martin Kroeker
1217eb910d
Fix copy-paste errors in variables used
4 years ago