You can end up in a situation where the HAVE_x/NO_x flags are used when compiling param.h but are missing
from Makefile.conf, which leads to further inconsistencies. This reuses the existing logic for parsing ARCHCONFIG and
ensures the flags end up in the generated Makefile.conf whenever FORCE is enabled.
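A minimal sketch of the kind of transformation involved, assuming ARCHCONFIG is a space-separated list of `-DNAME` or `-DNAME=VALUE` tokens. The function name `archconfig_to_makefile`, the example string and the `NAME=1` convention for bare flags are all hypothetical illustrations, not the actual getarch code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example string; real ARCHCONFIG values come from getarch. */
#define EXAMPLE_ARCHCONFIG "-DNEOVERSEN1 -DHAVE_NEON -DL1_DATA_SIZE=65536"

/* Hypothetical helper: convert each "-DNAME" or "-DNAME=VALUE" token in cfg
 * into a Makefile assignment ("NAME=1" or "NAME=VALUE"), one per line.
 * Returns the number of assignments written, or -1 if out is too small. */
int archconfig_to_makefile(const char *cfg, char *out, size_t outlen) {
    size_t used = 0;
    int count = 0;
    const char *p = cfg;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    while (*p) {
        while (*p == ' ')            /* skip separators between tokens */
            p++;
        size_t toklen = strcspn(p, " ");
        if (toklen > 2 && p[0] == '-' && p[1] == 'D') {
            const char *name = p + 2;
            size_t len = toklen - 2;
            int n;
            if (memchr(name, '=', len))  /* already NAME=VALUE */
                n = snprintf(out + used, outlen - used, "%.*s\n", (int)len, name);
            else                         /* bare flag becomes NAME=1 */
                n = snprintf(out + used, outlen - used, "%.*s=1\n", (int)len, name);
            if (n < 0 || (size_t)n >= outlen - used)
                return -1;
            used += (size_t)n;
            count++;
        }
        p += toklen;
    }
    return count;
}
```

Called with `EXAMPLE_ARCHCONFIG`, this writes `NEOVERSEN1=1`, `HAVE_NEON=1` and `L1_DATA_SIZE=65536` on separate lines, so both param.h and the Makefile side would see the same set of flags.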
This was gated behind GOTOBLAS_MAKEFILE, which looks like it's intended
to skip the much more expensive run of `Makefile.prebuild` rather than
to prevent the configuration from being read. Moving this `endif` means that files
such as `dynamic_<X>.c` correctly get the architecture flags.
This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM)
V1, which has 256-bit SVE. Testing revealed that the SVE kernel was
actually worse than the existing kernel in some cases, which seemed odd;
with these prefetches removed, the underlying architecture seems to do a better job
😸
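A rough C illustration of what removing software prefetches means. This is not the SVE kernel itself: it is a plain dot-product loop, and `PREFETCH_DIST` is a hypothetical distance chosen only for the example. `__builtin_prefetch` is a GCC/Clang builtin and is only a hint, so both versions compute the same result; the difference is whether the hardware prefetcher is left to work alone.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; real kernels tune this per core. */
#define PREFETCH_DIST 64

/* Dot product that issues explicit prefetch hints a fixed distance ahead.
 * The bounds check keeps the prefetch address inside the arrays. */
double dot_with_prefetch(size_t n, const double *x, const double *y) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n) {
            __builtin_prefetch(&x[i + PREFETCH_DIST], 0, 3); /* read, keep in cache */
            __builtin_prefetch(&y[i + PREFETCH_DIST], 0, 3);
        }
        acc += x[i] * y[i];
    }
    return acc;
}

/* The same loop with the hints removed, leaving all prefetching to the hardware. */
double dot_no_prefetch(size_t n, const double *x, const double *y) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

Whether the hints help depends on the core: a fixed distance tuned for one microarchitecture can compete with, rather than assist, a capable hardware prefetcher.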
This PR patches the f_check scripts to detect the ifort compiler. Without
this, F_COMPILER is set to G77, which causes the link phase to
use icc rather than ifort, so libifcore is missing from the link.
I incorrectly added `+sve` to the GCC parameters for Neoverse(TM) N1 CPUs,
which don't support SVE. This results in failed builds when using a
compiler that doesn't support `-mtune=neoverse-n1`; compilers that do
support it take a different path, which appears to have hidden the mistake.
Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S.
This code follows dscal as closely as possible, except for the inc_x > 1 path,
which uses a plain C loop much like the one in cscal.c instead of an
adaptation of the SSE2 asm code from dscal.c (I tried, but the performance was no
better than the plain C loop).
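A minimal sketch of a plain C loop for the strided (inc_x > 1) case. The name `sscal_strided` and its signature are hypothetical and do not reproduce the actual OpenBLAS kernel interface; the point is that with a stride, consecutive elements are not contiguous in memory, so packed SIMD loads gain little and a simple loop is competitive.

```c
/* Hypothetical helper: scale n single-precision elements of x, spaced
 * inc_x apart, by alpha. j tracks the strided index so no multiply is
 * needed per element. */
void sscal_strided(long n, float alpha, float *x, long inc_x) {
    long j = 0;
    for (long i = 0; i < n; i++) {
        x[j] *= alpha;
        j += inc_x;
    }
}
```

For example, with `x = {1, 2, 3, 4, 5, 6}`, `sscal_strided(3, 2.0f, x, 2)` scales only the elements at indices 0, 2 and 4.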
Using 256-bit registers in dscal makes this microkernel consistent with
cscal and zscal, and generally doubles performance if the vector fits in
L1 cache.
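To make the register-width point concrete, a sketch using GCC/Clang vector extensions as a stand-in for the real microkernel (which this does not reproduce): a 256-bit vector holds four doubles, so each multiply scales twice as many elements as a 128-bit SSE2 register would. The type name `v4df` and the function `dscal_256` are illustrative assumptions; without AVX enabled (e.g. no `-mavx`), the compiler lowers each `v4df` operation to narrower instructions.

```c
#include <stddef.h>
#include <string.h>

/* A 256-bit vector of four doubles (GCC/Clang vector extension). With AVX
 * enabled the compiler can keep one of these in a single ymm register. */
typedef double v4df __attribute__((vector_size(32)));

/* Scale x[0..n-1] by alpha, four doubles per 256-bit vector, then finish
 * the remaining n % 4 elements with a scalar loop. */
void dscal_256(size_t n, double alpha, double *x) {
    v4df va = {alpha, alpha, alpha, alpha};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4df vx;
        memcpy(&vx, &x[i], sizeof vx);  /* load that tolerates unaligned x */
        vx *= va;
        memcpy(&x[i], &vx, sizeof vx);  /* store back */
    }
    for (; i < n; i++)
        x[i] *= alpha;
}
```

The doubling only shows up when the data is already in L1; once the vector spills to L2 or memory, bandwidth rather than arithmetic width becomes the limit.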
This adds an SVE implementation of sdot/ddot where available, falling back to the previous Advanced SIMD kernel for targets without an SVE implementation of the kernel.
All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation, so I've renamed it to better fit the feature detection.
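A sketch of an SVE ddot written with the Arm C Language Extensions (ACLE) intrinsics, plus a fallback. Both are assumptions for illustration: OpenBLAS selects kernels per target in its build system rather than with this `#if`, the name `ddot_kernel` is hypothetical, and the fallback is a plain C loop standing in for the Advanced SIMD kernel. The SVE loop is vector-length agnostic: `svwhilelt_b64` builds a predicate that disables lanes past `n`, so the same code runs on 256-bit SVE (as on Neoverse V1) or any other vector length without a scalar tail.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* SVE dot product: each iteration processes svcntd() doubles, with the
 * predicate pg masking off lanes beyond n on the final iteration. */
double ddot_kernel(int64_t n, const double *x, const double *y) {
    svfloat64_t acc = svdup_f64(0.0);
    for (int64_t i = 0; i < n; i += (int64_t)svcntd()) {
        svbool_t pg = svwhilelt_b64(i, n);
        svfloat64_t vx = svld1_f64(pg, &x[i]);
        svfloat64_t vy = svld1_f64(pg, &y[i]);
        acc = svmla_f64_m(pg, acc, vx, vy);  /* acc += vx * vy on active lanes */
    }
    return svaddv_f64(svptrue_b64(), acc);  /* horizontal sum of all lanes */
}
#else
/* Fallback when SVE is not enabled at compile time; a plain C loop
 * standing in for the Advanced SIMD kernel. */
double ddot_kernel(int64_t n, const double *x, const double *y) {
    double acc = 0.0;
    for (int64_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
#endif
```

Note that the SVE version sums in a different order from the scalar loop, so results can differ in the last bits for inputs that are not exactly representable.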