Martin Kroeker
bd0752444a
Merge pull request #2894 from RajalakshmiSR/bf16_packing
POWER10: Change the packing format for bfloat16
5 years ago
Rajalakshmi Srinivasaraghavan
0826d68f93
POWER10: Change the packing format for bfloat16
As the new MMA instructions need the bfloat16 inputs in 4x2 order,
change the format in the copy/packing code. This avoids permute
instructions in the GEMM kernel's inner loop.
5 years ago
Martin Kroeker
602a0c7a69
Merge pull request #2892 from RajalakshmiSR/bf16_make
Fix build issues with bfloat16
5 years ago
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
This patch fixes compilation errors with BUILD_BFLOAT16 caused by the
recent renaming from SH to SB.
5 years ago
Martin Kroeker
d85b968424
Merge pull request #2891 from martin-frbg/fix-2886
Fix several bugs and omissions from the BFLOAT16 rename
5 years ago
Martin Kroeker
9dca578c79
Cleanup
5 years ago
Martin Kroeker
1e7eb7b7a9
Fix typos in currently unused sections
5 years ago
Martin Kroeker
84949754a0
Fix bfloat16 conditional
5 years ago
Martin Kroeker
2ae8785603
Add a POWER9 build with BFLOAT16 enabled
5 years ago
Martin Kroeker
e05af6575e
Fix some overlooked "SHBLAS" entries
5 years ago
Martin Kroeker
c1643006ae
Merge pull request #97 from xianyi/develop
rebase
5 years ago
Martin Kroeker
08929430cd
Merge pull request #2886 from martin-frbg/issue_2767
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
5 years ago
Martin Kroeker
0c84ffe05f
Merge pull request #2881 from mattip/fninit
add fninit to reset fpu registers before assembler routines
5 years ago
Martin Kroeker
cb4274e3ad
Merge pull request #2888 from Qiyu8/usimd-sum
Optimize the performance of sum by using universal intrinsics
5 years ago
Matti Picus
403eb513a0
use emms instead, add WIN guards
5 years ago
Martin Kroeker
cb839575ed
Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme
5 years ago
Qiyu8
0ed1f07660
Optimize the performance of sum by using universal intrinsics
5 years ago
Martin Kroeker
bb74dd29db
Restore -msse3
5 years ago
Martin Kroeker
629c497b6c
common_sh.h renamed to common_sb.h
5 years ago
Martin Kroeker
2c552f1074
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
7ae9e8960e
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
006c7f6671
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
85154c2e18
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
ae1ab5bfdf
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
052f31bc3c
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
3aecafad80
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
756062afa5
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
2061f7fdff
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
dc8a1afa63
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
32733ded04
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
3bc8e8c334
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
573508f0ee
Rename common_sh.h to common_sb.h
5 years ago
Martin Kroeker
ca31c32693
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
5800758b43
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
924fd806d0
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
4db09c6cec
Rename compare_sgemm_shgemm.c to compare_sgemm_sbgemm.c
5 years ago
Martin Kroeker
fd94236042
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
68ce719fac
Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c
5 years ago
Martin Kroeker
d7dd9b396c
Rename shdot.c to sbdot.c
5 years ago
Martin Kroeker
9ae80490e0
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
d314d1f49f
Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c
5 years ago
Martin Kroeker
f0883740e4
Merge pull request #96 from xianyi/develop
rebase
5 years ago
Martin Kroeker
1c0b03efb4
Merge branch 'develop' into develop
5 years ago
Martin Kroeker
c589c3e2a1
Merge pull request #2882 from martin-frbg/issue2709
Use generic C for (D/Z)NRM2 on Windows x86_64
5 years ago
Martin Kroeker
ec638a82bf
Merge pull request #2852 from martin-frbg/issue2588-cmake
Support building only a subset of variable types
5 years ago
Martin Kroeker
caa0d757ca
repair TABs
5 years ago
Martin Kroeker
6154f72d6d
Copy BUILD_ settings to the LAPACK make.inc
5 years ago
Martin Kroeker
ae8b0d257a
Set BUILD_ options to 1 instead of just defining them
5 years ago
Martin Kroeker
1da32cc1fc
Add cblas_xerbla interface
5 years ago