Martin Kroeker
fee361ae64
fix another source of NO_CBLAS=0 surprise
5 years ago
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
5 years ago
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC: Add half-precision gemm for bfloat16 in OpenBLAS
This patch adds a matrix multiplication kernel for the bfloat16 data type.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed per architecture, as is done for
SGEMM; for now, 8 and 4 are used for M and N. The size of ncopy/tcopy can be
changed per architecture requirement; for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested on powerpc64le and
powerpc64. For reference, added a small test, compare_sgemm_shgemm.c, to compare
sgemm and shgemm output.
This patch does not cover the OpenBLAS test, benchmark, and LAPACK tests for shgemm.
A complex type implementation can be discussed and added once this is approved.
5 years ago
Guillaume Horel
af9ac0898a
fix Makefile
6 years ago
Guillaume Horel
9b2f0323d6
update Makefile
6 years ago
Martin Kroeker
79cfc24a62
Add interface for ?sum (derived from ?asum)
6 years ago
Martin Kroeker
3d1e36d4cb
Build CBLAS interfaces for I?MIN and I?MAX
6 years ago
Martin Kroeker
9cf22b7d91
Build cblas_iXamin interfaces
7 years ago
Martin Kroeker
7f546f54fa
Add cblas_xerbla
8 years ago
Werner Saar
ae4ac6f984
removed obj-files that were moved to lapack 3.7.0
9 years ago
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
11 years ago
wernsaar
9e829ce98f
enabled cblas gemm3m functions
11 years ago
wernsaar
d49fd33885
disabled SYMM3M and HEMM3M functions because of segmentation violations
11 years ago
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
11 years ago
Martin Koehler
a057e5434d
add CBLAS interface for s/d/c/zimatcopy
11 years ago
Martin Köhler
7794766d3c
Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them.
11 years ago
wernsaar
cedc1f4b14
Ref #410: disabled optimized potri functions (single-threading bug)
11 years ago
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
wernsaar
faeab93df0
Ref #51: added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy
11 years ago
wernsaar
cee257f384
Ref #51: added blas extensions zomatcopy and comatcopy
11 years ago
wernsaar
7bfb3011e8
Ref #51: added blas extension somatcopy
11 years ago
wernsaar
8c8f596238
Ref #51: added blas extension domatcopy as non-optimized reference
11 years ago
wernsaar
faf3ac0aad
Ref #285: added axpby kernels
11 years ago
wernsaar
89da450800
enabled and tested optimized potri lapack functions
11 years ago
wernsaar
c26bbee489
enabled and tested optimized trtri lapack functions
11 years ago
wernsaar
a748d3a75d
enabled optimized trti2 lapack functions again
11 years ago
wernsaar
a5ab231ad4
enabled optimized complex lauum lapack functions again
11 years ago
wernsaar
dbaeea7b59
enabled lauu2 and lauum lapack functions again
11 years ago
wernsaar
0d75f3b6a2
enabled and tested optimized gesv lapack functions
11 years ago
wernsaar
2ff66e661d
enabled and tested optimized laswp lapack function
11 years ago
wernsaar
ebc95e6f11
enabled and tested optimized potf2 lapack functions
11 years ago
wernsaar
61a2c50e8e
enabled and tested optimized getf2 lapack functions
11 years ago
wernsaar
4f98f8c9b3
enabled and tested optimized potrf lapack functions
11 years ago
wernsaar
536875d463
enabled and tested optimized getrs lapack functions
11 years ago
wernsaar
65f2fba4c3
enabled and tested optimized cgetrf lapack function
11 years ago
wernsaar
eea6f51df9
enabled and tested optimized sgetrf lapack function
11 years ago
wernsaar
6fc4646709
enabled and tested optimized zgetrf lapack function
11 years ago
wernsaar
ac029f81b3
enabled and tested optimized dgetrf function
11 years ago
wernsaar
189ca1bcee
removed lapack objects from interface/Makefile
11 years ago
Zhang Xianyi
c92ae012a6
Refs #279. Provide ONLY_CBLAS flag. If you only need CBLAS without
a Fortran compiler, please try make ONLY_CBLAS=1.
This mode compiles only CBLAS, without the BLAS Fortran interface and LAPACK.
12 years ago
Jameson Nash
d0e731e8b8
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
13 years ago
Xianyi Zhang
722dd08703
Ref #80. On a P4 CPU with 32-bit Windows XP, Octave crashed with OpenBLAS. Workaround: use the netlib reference gemv instead of OpenBLAS's own functions.
For example, make USE_NETLIB_GEMV=1
14 years ago
Xianyi Zhang
8f1090d32a
Support NO_LAPACK=1 to build the lib without LAPACK functions.
15 years ago
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version code.
15 years ago