s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior)

gcc's default setting for floating-point expression contraction is
"fast", which allows the compiler to emit fused multiply-adds (computing
a*b+c with a single rounding) instead of separate multiplies and adds,
among other contractions. Fused multiply-adds, which assembly kernels
typically use, also give the C implementation of matrix-matrix
multiplication on s390x a significant performance advantage. To get that
advantage in builds with clang as well, add -ffp-contract=fast to the
compiler options.
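To illustrate the kind of code this affects, here is a minimal sketch of a naive row-major matrix-matrix multiply. The function name `gemm_ref` is hypothetical and this is not OpenBLAS's actual C kernel. The inner statement `acc += a * b` is a multiply followed by an add; under `-ffp-contract=fast` the compiler is allowed to fuse the two into one FMA instruction.

```c
#include <stddef.h>

/* Hypothetical reference kernel (not OpenBLAS code): computes C += A * B
   for square, row-major n x n matrices of doubles. */
void gemm_ref(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = c[i * n + j];
            for (size_t k = 0; k < n; k++) {
                /* Multiply then add: with -ffp-contract=fast the compiler
                   may turn this into a single fused multiply-add, which
                   rounds once instead of twice. */
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

Building the same file with and without `-ffp-contract=fast` and comparing the generated assembly is one way to check whether contraction happened. For integer-valued inputs such as A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] both variants give exactly [[19, 22], [43, 50]]; for general data the fused version can differ in the last bits, because the intermediate product is not rounded.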

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
tags/v0.3.11^2
Marius Hillenbrand, 5 years ago
commit 095f4e6964
1 changed file with 6 additions and 0 deletions
Makefile.zarch

@@ -8,3 +8,9 @@ ifeq ($(CORE), Z14)
 CCOMMON_OPT += -march=z14 -mzvector -O3
 FCOMMON_OPT += -march=z14 -mzvector
 endif
+
+# Enable floating-point expression contraction for clang, since it is the
+# default for gcc
+ifeq ($(C_COMPILER), CLANG)
+CCOMMON_OPT += -ffp-contract=fast
+endif
