maamountki
dc4d3bccd5
[ZARCH] Fix icamax/icamin
7 years ago
maamountki
c7143c1019
[ZARCH] Fix iamax/imax single precision
7 years ago
maamountki
04873bb174
[ZARCH] Undo the last commit
7 years ago
maamountki
c8ef9fb220
[ZARCH] Fix bug in iamax/iamin/imax/imin
7 years ago
maamountki
b111829226
[ZARCH] Update max/min functions
7 years ago
maamountki
b815a04c87
[ZARCH] fix a bug in max/min functions
7 years ago
maamountki
1a7925b3a3
[ZARCH] Update dgemv_n_4.c
7 years ago
maamountki
406f835f00
[ZARCH] update cgemv_n_4.c
7 years ago
maamountki
621dedb37b
[ZARCH] Update cgemv_t_4.c
7 years ago
maamountki
b731e8246f
Update sgemv_t_4.c
7 years ago
maamountki
ecc31b743f
Update dgemv_t_4.c
7 years ago
maamountki
5d89d6b143
[ZARCH] fix sgemv_n_4.c
7 years ago
maamountki
67432b23c2
[ZARCH] fix cgemv_n_4.c
7 years ago
maamountki
be66f5d5c2
[ZARCH] fix data prefetch type in sdot
7 years ago
maamountki
c2ffef8156
[ZARCH] fix data prefetch type in ddot
7 years ago
maamountki
e7455f500c
[ZARCH] fix dsdot.c
7 years ago
maamountki
3eafcfa650
[ZARCH] fix cgemv_n_4.c
7 years ago
maamountki
94cd946b96
[ZARCH] fix cgemv_n_4.c
7 years ago
maamountki
1aa840a0a2
[ZARCH] fix sgemv_t_4.c
7 years ago
maamountki
e6c0e39492
Optimize Zgemv
7 years ago
maamountki
23229011db
[ZARCH] Z14 support, BLAS 1/2 single precision implementations, Some missing double precision implementations, Gemv optimization
7 years ago
Martin Kroeker
4e103c822c
typo fix
7 years ago
Martin Kroeker
d2142760e0
Fix precision problem in DSDOT
7 years ago
Martin Kroeker
2fbfc64da8
Use C kernels for default c/zAXPY, xROT, c/zSWAP
7 years ago
Martin Kroeker
ba8388cee0
Merge pull request #1651 from martin-frbg/avx512-nodgemm
Disable the 16x2 DTRMM kernel on SkylakeX as well
7 years ago
Martin Kroeker
6e54b0a027
Disable the 16x2 DTRMM kernel on SkylakeX as well
7 years ago
Martin Kroeker
40c8cbc3bf
Merge pull request #1650 from martin-frbg/avx512-nodgemm
Disable the AVX512 DGEMM kernel for now
7 years ago
Martin Kroeker
f0a8dc2eec
Disable the AVX512 DGEMM kernel for now
due to #1643
7 years ago
Martin Kroeker
b83e4c60c7
Remove premature exit for INC_X or INC_Y zero
7 years ago
Martin Kroeker
e344db269b
Remove premature exit for INC_X or INC_Y zero
7 years ago
Martin Kroeker
545b82efd3
Remove premature exit for INC_X or INC_Y zero
7 years ago
Martin Kroeker
e322a951fe
Remove premature exit for INC_X or INC_Y zero
7 years ago
Martin Kroeker
c628c6fa59
Merge pull request #1612 from oon3m0oo/cpus
Fixed a few more unnecessary calls to num_cpu_avail.
7 years ago
Martin Kroeker
6f71c0fce4
Return a somewhat sane default value for L2 cache size if cpuid retur… (#1611)
* Return a somewhat sane default value for L2 cache size if cpuid returned something unexpected
Fixes #1610, the KVM hypervisor on Google Chromebooks returning zero for CPUID 0x80000006, causing DYNAMIC_ARCH
builds of OpenBLAS to hang
7 years ago
Craig Donner
c2545b0fd6
Fixed a few more unnecessary calls to num_cpu_avail.
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
7 years ago
Arjan van de Ven
89372e0993
Use AVX512 also for DGEMM
this required switching to the generic gemm_beta code (which is faster anyway on SKX)
for both DGEMM and SGEMM
The performance gain for the not-yet-retuned version is in the 30% range
7 years ago
Martin Kroeker
0023515733
Typo fix (misplaced parenthesis)
7 years ago
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
7 years ago
Martin Kroeker
8562d5787a
Merge pull request #1583 from martin-frbg/issue1575
Handle INCX=0,INCY=0 case
7 years ago
Martin Kroeker
7df8c4f76f
typo fix
7 years ago
Martin Kroeker
2fc748bf72
Restore optimized swap kernel now that we have a proper fix
7 years ago
Martin Kroeker
d1b7be14aa
Handle INCX=0,INCY=0 case
Fixes #1575 (sswap/dswap failing the swap utest on x86) as suggested by atsampson.
7 years ago
Martin Kroeker
961d25e9c7
Use the new zrot.c on POWER8 for crot as well
fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)
7 years ago
Martin Kroeker
f5959f2543
Merge pull request #1567 from martin-frbg/mipstrmm
Revert "Switch mips32 target to USE_TRMM to fix complex TRMM"
7 years ago
Martin Kroeker
82012b960b
Revert "Switch mips32 target to USE_TRMM to fix complex TRMM"
... as it was just a silly workaround for the issue seen in #1563, caused by #1419
7 years ago
Martin Kroeker
8dd3515fa2
Merge pull request #1565 from martin-frbg/mipstypo
Remove extraneous brace from previous commit of mips dsdot fix
7 years ago
Martin Kroeker
95f7f0229c
Remove extraneous brace from previous commit
7 years ago
Martin Kroeker
5082fe4306
Merge pull request #1564 from martin-frbg/issue1563
Revert changes from PR#1419
7 years ago
Martin Kroeker
7a7619af6d
Revert changes from PR#1419
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
7 years ago
Martin Kroeker
893b535540
Use correct data type for initializers of v2f64, v4f32
Fixes #1561
7 years ago