Martin Kroeker
96d80801bc
Reinstate the CooperLake microkernel
1 year ago
Martin Kroeker
f5bc97c37e
Merge pull request #5227 from zanpeeters/develop
Wrong output from getarch on Apple M4
1 year ago
Martin Kroeker
050c3b26ae
Merge pull request #5236 from ywwry66/apple_workaround
Follow-up to #5233, fixing "Argument list too long"
1 year ago
Ruiyang Wu
9aa7a0b2a7
Follow-up to d659f3c
1 year ago
Martin Kroeker
94fceaeac5
Merge pull request #5233 from ywwry66/apple_workaround
Fix "Argument list too long" compilation error for Intel macOS
1 year ago
Ruiyang Wu
d659f3c3f6
Fix "Argument list too long" compilation error for Intel macOS
1 year ago
Martin Kroeker
2e4309315c
Merge pull request #5219 from martin-frbg/sbgemvn_cooper
Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
1 year ago
Martin Kroeker
afc1dc69cd
Merge pull request #5234 from RevySR/bump-xuantie-qemu
Bump xuantie qemu for c910v
1 year ago
Han Gao
1f687b2f60
Bump xuantie qemu for c910v
Signed-off-by: Han Gao <rabenda.cn@gmail.com>
1 year ago
Martin Kroeker
dd38b4e811
Merge pull request #5225 from annop-w/gemv_n
Improve performance for SGEMVN on NEOVERSEN1
1 year ago
Martin Kroeker
3a088de2d1
Merge pull request #5228 from martin-frbg/cmakecrossarm
Update and amend parameters for Neoverse cpus in CMake crossbuilds
1 year ago
Martin Kroeker
0241d516f6
Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
1 year ago
Martin Kroeker
afb664527f
Merge pull request #5221 from tetsuzo-usui/tune_symv_for_arm64
Add AArch64-optimized SYMV kernels
1 year ago
Annop Wongwathanarat
d535728803
Improve performance for SGEMVN on NEOVERSEN1
1 year ago
Martin Kroeker
d9369bda1e
Update and amend parameters for Neoverse cpus
1 year ago
zanpeeters
acef78c778
Reset buffer length before every call to sysctlbyname.
1 year ago
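The commit title above concerns the in/out length argument of `sysctlbyname`: on entry it gives the buffer capacity, and on return it holds the number of bytes actually written. The following is a minimal sketch of why the length should be reset before each call, not the code from the commit. To keep it portable it uses `mock_query`, a hypothetical stand-in that only imitates that in/out convention; the key names `"small"` and `"large"` and their values are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sysctlbyname(): copies a value into buf.
 * *len is in/out: buffer capacity on entry, bytes written on return.
 * Returns -1 with errno = ENOMEM when the buffer is too small. */
static int mock_query(const char *name, void *buf, size_t *len)
{
    int32_t small = 128;               /* made-up 4-byte value */
    int64_t large = 4 * 1024 * 1024;   /* made-up 8-byte value */
    const void *src;
    size_t need;

    if (strcmp(name, "small") == 0) {
        src = &small;
        need = sizeof small;
    } else if (strcmp(name, "large") == 0) {
        src = &large;
        need = sizeof large;
    } else {
        errno = ENOENT;
        return -1;
    }

    if (*len < need) {
        errno = ENOMEM;
        return -1;
    }
    memcpy(buf, src, need);
    *len = need;                       /* report bytes written */
    return 0;
}

/* Stale length: the first call shrinks len to 4, so the 8-byte query fails. */
static int read_both_stale(int64_t *out_large)
{
    int64_t value = 0;
    size_t len = sizeof value;

    if (mock_query("small", &value, &len) != 0)
        return -1;
    value = 0;
    if (mock_query("large", &value, &len) != 0)   /* len is 4 here */
        return -1;
    *out_large = value;
    return 0;
}

/* Reset length: len is set back to the buffer size before every call. */
static int read_both_reset(int64_t *out_large)
{
    int64_t value = 0;
    size_t len = sizeof value;

    if (mock_query("small", &value, &len) != 0)
        return -1;
    value = 0;
    len = sizeof value;                /* the reset that the stale version lacks */
    if (mock_query("large", &value, &len) != 0)
        return -1;
    *out_large = value;
    return 0;
}
```

With this mock, `read_both_stale` fails on its second query while `read_both_reset` succeeds. The real `sysctlbyname` may differ in details such as partial copies on a short buffer.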
zanpeeters
d1c2528aed
Add L1_DATA_LINESIZE for ifdef __APPLE__
1 year ago
zanpeeters
7b66330dea
hw.perflevel[01].cpusperl changed to hw.perflevel[01].cpusperl2
1 year ago
Usui, Tetsuzo
d711906e3e
Add symv kernels for arm64
1 year ago
Iha, Taisei
f1e628b889
Further performance improvements to [SD]GEMV.
1 year ago
Martin Kroeker
39718cd28e
Merge pull request #5218 from martin-frbg/lapacke_mangling
lapacke_mangling.h is no longer generated, so don't delete on make clean
1 year ago
Martin Kroeker
211dfd0754
disable the CooperLake microkernel as it produces wrong results
1 year ago
Martin Kroeker
fd3afef122
lapacke_mangling.h is no longer generated, so don't delete on make clean
1 year ago
Martin Kroeker
b30dc9701f
Merge pull request #5215 from annop-w/gemv_t
Use SVE kernel for S/DGEMVT for SVE machines
1 year ago
Martin Kroeker
2893d0add4
Merge pull request #5211 from guoyuanplct/develop
Optimizing the Implementation of GEMV on the RISC-V V Extension
1 year ago
Martin Kroeker
ed1e470663
Merge pull request #5217 from haampie/hs/fix/darwin-gcc
test_potrs.c: do not use GCC pragma on darwin-aarch64
1 year ago
Harmen Stoppels
3d6d026fe1
no-gcse when loongarch64
1 year ago
Harmen Stoppels
51ba70f47b
test_potrs.c: remove pragma darwin-aarch64 support
Using GCC 14.2.0 on Darwin, the pragma ultimately causes a linker error
"ld: invalid r_symbolnum=". The current workaround is to use the old
linker, but (a) it's deprecated and (b) it can produce libraries that
are subsequently not linkable with the newer linker in dependents: the
new ld64 does not link to libraries with duplicate rpaths created by the
classic linker.
1 year ago
Annop Wongwathanarat
ec146157d3
Use SVE kernel for S/DGEMVT for SVE machines
1 year ago
Martin Kroeker
de2380e5a6
Merge pull request #5214 from martin-frbg/issue5200
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
1 year ago
Martin Kroeker
a34b487f22
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
1 year ago
Martin Kroeker
1b3e7cc491
Merge pull request #5212 from martin-frbg/lapack1119
Fix incomplete error message in EIG test (Reference-LAPACK PR 1119)
1 year ago
Martin Kroeker
4270d5bc43
Merge pull request #5204 from martin-frbg/issue4692
Repeat the libs target's "ln" in the all target to ensure completeness of copy on Windows
1 year ago
Martin Kroeker
880e43ee54
Merge pull request #5198 from martin-frbg/woadlldebug
Fix pdb file creation in debug dll builds with CMake on Windows/WoA
1 year ago
Martin Kroeker
70865a894e
Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
1 year ago
Martin Kroeker
f0f274725d
Merge pull request #5207 from martin-frbg/issue5202
Fix macOS compilation with xcode16.3/clang17/gcc14
1 year ago
Martin Kroeker
94fb7033a4
Fix incomplete error message (Reference-LAPACK PR 1119)
1 year ago
lglglglgy
1ff303f36e
Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
1 year ago
Martin Kroeker
fc8090b607
Move additional omp dependency to EXTRALIB
1 year ago
Martin Kroeker
1c5d0d5539
move libomp to extralib
1 year ago
Martin Kroeker
67c5bdd639
Azure CI: Update flang call in OSX_LLVM_flangnew job (#5208)
* Update flang call in OSX_LLVM_flangnew job
1 year ago
Martin Kroeker
1ed962d259
Fix compilation with xcode16.3/clang17/gcc14
1 year ago
Martin Kroeker
f0008f50cc
Merge pull request #5206 from ColumbusAI/develop
Update zsum.c -- fixed spelling error to successfully compile
1 year ago
ColumbusAI
7bf848454d
Update zsum.c -- fixed spelling error to successfully compile
spelling error where zsum_kernel is used and it should be zasum_kernel. Will not compile without fix.
1 year ago
Martin Kroeker
0aa5ef29ec
Repeat the libs target's "ln" in the all target to ensure completeness
1 year ago
Martin Kroeker
f90eff306d
Merge pull request #5197 from e4t/z-arch-exec-stack
On zarch don't produce objects from assembler with a writable stack s…
1 year ago
Martin Kroeker
3fc15ad81c
Fix pdb file creation in debug dll builds with CMake on Windows/WoA
1 year ago
Egbert Eich
61b9339d3a
getarch/cpuid.S: Fix warning about executable stack
When using the GNU toolchain, a warning is printed about an executable
stack:
/usr/lib64/gcc/.../x86_64-suse-linux/bin/ld: warning: /tmp/ccyG3xBB.o: missing .note.GNU-stack section implies executable stack
[ 15s] /usr/lib64/gcc/.../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
To prevent this warning, add:
```
.section .note.GNU-stack,"",@progbits
```
Signed-off-by: Egbert Eich <eich@suse.com>
1 year ago
Egbert Eich
ea6515c4b3
On zarch don't produce objects from assembler with a writable stack section
On z-series, the current version of the GNU toolchain produces warnings
such as:
```
/usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies
executable stack
/usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```
To prevent this message and make sure we are future-proof, add
```
.section .note.GNU-stack,"",@progbits
```
Also add the `.size` bit to give the asm defined functions a proper size
in the symbol table.
Signed-off-by: Egbert Eich <eich@suse.com>
1 year ago
Martin Kroeker
f33943d73e
Merge pull request #5196 from martin-frbg/issue5193
Fix misinterpretation of NO_LAPACK=0 and SPMV settings in CMake builds
1 year ago