* Protect align directives in assembly files that are currently problematic with LLVM on WoA
* Use the armv8 zdot on WoA to work around other LLVM issues
I don't have as many benchmarks for these as for gemm, but it should still make a difference for small matrices.