Using intrinsics makes the code a bit easier to read (at least the non-math part)
and also lets the compiler do a better job of register allocation and of optimizing the
non-math (loop/setup) code.
It also allows the code to honor the "no fma" flag if the user so desires.
The result of this change is a 15% performance increase (measured for a size of 16).
It is also a step towards being able to add an AVX512 version of the code.
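
As a rough illustration of the points above, and not the kernel touched by this change, here is a minimal sketch of an intrinsics-based loop. The function name axpy_sketch and the operation it performs (y += alpha * x) are hypothetical and chosen only for the example. It assumes a GCC- or Clang-style compiler that defines __AVX2__ and __FMA__ when the matching target options are enabled, so building with FMA disabled falls back to a separate multiply and add.

```c
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical example kernel: y[i] += alpha * x[i] for i in [0, n). */
static void axpy_sketch(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;

#if defined(__AVX2__)
    /* Broadcast alpha once; the compiler picks the registers. */
    const __m256d valpha = _mm256_set1_pd(alpha);

    /* Plain C loop control: the compiler is free to unroll or reschedule it. */
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);
        __m256d vy = _mm256_loadu_pd(y + i);
#if defined(__FMA__)
        /* FMA enabled at build time: one fused multiply-add. */
        vy = _mm256_fmadd_pd(valpha, vx, vy);
#else
        /* FMA disabled (e.g. -mno-fma): separate multiply and add. */
        vy = _mm256_add_pd(vy, _mm256_mul_pd(valpha, vx));
#endif
        _mm256_storeu_pd(y + i, vy);
    }
#endif

    /* Scalar tail, and the full loop when AVX2 is not available. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Because the loop and setup are ordinary C, only the math lines would need to change for a wider vector type, which is what would make a later AVX512 variant easier to add.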