Gordon Fossum
e6dd44d989
Power10: Fix for SBGEMM
While testing the bfloat16 sbgemm kernel, some failures occurred for odd-valued inputs because the result
was updated for additional bytes.
4 years ago
Rajalakshmi Srinivasaraghavan
cbb70438df
POWER10: Fixes for sbgemm kernel
While testing the bfloat16 sbgemm kernel, some failures occurred
for odd-valued inputs due to array accesses beyond the boundary.
4 years ago
Rajalakshmi Srinivasaraghavan
0826d68f93
POWER10: Change the packing format for bfloat16
The new MMA instructions need bfloat16 inputs in 4x2 order, so
change the format in the copy/packing code. This avoids permute instructions
in the gemm kernel inner loop.
5 years ago
Martin Kroeker
9ae80490e0
rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
d314d1f49f
Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c
5 years ago