Kavana Bhat
3938e59569
AIX changes for Power8
6 years ago
Kavana Bhat
3dc6b26eff
AIX changes for Power8
6 years ago
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
6 years ago
AbdelRauf
853a18bc17
power9 makefile. dgemm based on power8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23 gflops. dtrmm cases were added into dgemm itself
6 years ago
Martin Kroeker
5b95534afc
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
for issue #2048
6 years ago
Martin Kroeker
885a3c4350
USE_TRMM on Z14
from patch provided by aarnez in #991
7 years ago
Martin Kroeker
f3fd44a731
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic
fixes #1743
7 years ago
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
7 years ago
Martin Kroeker
82012b960b
Revert "Switch mips32 target to USE_TRMM to fix complex TRMM"
... as it was just a silly workaround for the issue seen in #1563 , caused by #1419
7 years ago
Martin Kroeker
018f2dad27
Switch mips32 target to USE_TRMM to fix complex TRMM
7 years ago
Martin Kroeker
9c5518319a
Revert "Fix 32bit HASWELL builds"
7 years ago
Martin Kroeker
0e2cf102e1
Fix 32bit HASWELL
8 years ago
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
8 years ago
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
Conflicts:
CONTRIBUTORS.md
9 years ago
Zhang Xianyi
864e202afd
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
9 years ago
Kaustubh Raste
c8a7860eb3
STRSM optimized
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
9 years ago
Werner Saar
b752858d6c
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8
10 years ago
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
Conflicts:
driver/others/memory.c
10 years ago
Martin Koehler
711ca33bc6
Improved Ximatcopy when lda==ldb.
The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change.
10 years ago
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
10 years ago
Werner Saar
9bd962f655
modified haswell parameter dgemm_unroll_n
10 years ago
Zhang Xianyi
ea7f9dacf4
Refs #509 . Fixed geadd building bug with DYNAMIC_ARCH=1.
11 years ago
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
11 years ago
Zhang Xianyi
a85c2785ae
Refs #467 . Added generic kernel file for x86_64.
11 years ago
wernsaar
e80b144932
enabled compiling of *3M functions
11 years ago
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
wernsaar
cee257f384
Ref #51 : added blas extensions zomatcopy and comatcopy
11 years ago
wernsaar
7bfb3011e8
Ref #51 : added blas extension somatcopy
11 years ago
wernsaar
8c8f596238
Ref #51 : added blas extension domatcopy as a non-optimized reference
11 years ago
wernsaar
ffe70b1fdc
modified Makefile.L3
12 years ago
wernsaar
cff70a666d
added generic trmm kernels and modified Makefile.L3
12 years ago
wernsaar
d854b30ae6
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3
12 years ago
Wang Qian
8e53b57bb2
Appending gemmkernel and trmmkernel C code in kernel/generic; this code can be used on a new platform which does not have an optimized assembly kernel.
14 years ago
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
15 years ago