zanpeeters
acef78c778
Reset buffer length before every call to sysctlbyname.
1 year ago
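The log does not show the code behind the `sysctlbyname` commit above, so the following is only a minimal sketch of the pattern it names. `sysctlbyname` takes the buffer length as an in/out parameter and overwrites it with the number of bytes written, so a length left over from a 4-byte query can make a later 8-byte query fail. The wrapper name `read_sysctl_i64` is hypothetical, and the non-Apple stand-in, with its made-up values, is an assumption added so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#else
/* Stand-in for non-Apple hosts (illustration only): it mimics the in/out
 * length contract of sysctlbyname and returns made-up values. */
static int sysctlbyname(const char *name, void *oldp, size_t *oldlenp,
                        void *newp, size_t newlen) {
    int32_t ncpu = 8;        /* made-up value */
    int64_t linesize = 128;  /* made-up value */
    const void *src;
    size_t need;
    (void)newp;
    (void)newlen;
    if (strcmp(name, "hw.ncpu") == 0) {
        src = &ncpu;
        need = sizeof ncpu;
    } else if (strcmp(name, "hw.cachelinesize") == 0) {
        src = &linesize;
        need = sizeof linesize;
    } else {
        return -1;
    }
    if (*oldlenp < need) return -1;  /* caller's length is too small */
    memcpy(oldp, src, need);
    *oldlenp = need;                 /* length is overwritten on return */
    return 0;
}
#endif

/* Hypothetical helper: read a 4- or 8-byte integer sysctl value.
 * The length is re-initialised on every call, so a shorter length left
 * behind by an earlier query can never leak into this one. */
static int read_sysctl_i64(const char *name, int64_t *out) {
    unsigned char buf[sizeof(int64_t)];
    size_t len = sizeof buf;  /* reset before every call */

    assert(name != NULL && out != NULL);
    if (sysctlbyname(name, buf, &len, NULL, 0) != 0) return -1;

    if (len == sizeof(int32_t)) {
        int32_t v;
        memcpy(&v, buf, sizeof v);
        *out = v;
    } else if (len == sizeof(int64_t)) {
        memcpy(out, buf, sizeof *out);
    } else {
        return -1;  /* unexpected size */
    }
    return 0;
}
```

Keeping the length local to the helper is one way to make the reset impossible to forget; code that shares a single length variable across several queries has to reassign it before each call instead.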
zanpeeters
d1c2528aed
Add L1_DATA_LINESIZE for ifdef __APPLE__
1 year ago
zanpeeters
7b66330dea
hw.perflevel[01].cpusperl changed to hw.perflevel[01].cpusperl2
1 year ago
Usui, Tetsuzo
d711906e3e
Add symv kernels for arm64
1 year ago
Iha, Taisei
f1e628b889
Further performance improvements to [SD]GEMV.
1 year ago
Martin Kroeker
39718cd28e
Merge pull request #5218 from martin-frbg/lapacke_mangling
lapacke_mangling.h is no longer generated, so don't delete on make clean
1 year ago
Martin Kroeker
211dfd0754
disable the CooperLake microkernel as it produces wrong results
1 year ago
Martin Kroeker
fd3afef122
lapacke_mangling.h is no longer generated, so don't delete on make clean
1 year ago
Martin Kroeker
b30dc9701f
Merge pull request #5215 from annop-w/gemv_t
Use SVE kernel for S/DGEMVT for SVE machines
1 year ago
Martin Kroeker
2893d0add4
Merge pull request #5211 from guoyuanplct/develop
Optimizing the Implementation of GEMV on the RISC-V V Extension
1 year ago
Martin Kroeker
ed1e470663
Merge pull request #5217 from haampie/hs/fix/darwin-gcc
test_potrs.c: do not use GCC pragma on darwin-aarch64
1 year ago
Harmen Stoppels
3d6d026fe1
no-gcse when loongarch64
1 year ago
Harmen Stoppels
51ba70f47b
test_potrs.c: remove pragma darwin-aarch64 support
Using GCC 14.2.0 on Darwin, the pragma ultimately causes a linker error
"ld: invalid r_symbolnum=". The current workaround is to use the old
linker, but (a) it's deprecated and (b) it can produce libraries that
are subsequently not linkable with the newer linker in dependents: the
new ld64 does not link to libraries with duplicate rpaths created by the
classic linker.
1 year ago
Annop Wongwathanarat
ec146157d3
Use SVE kernel for S/DGEMVT for SVE machines
1 year ago
Martin Kroeker
de2380e5a6
Merge pull request #5214 from martin-frbg/issue5200
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
1 year ago
Martin Kroeker
a34b487f22
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
1 year ago
Martin Kroeker
1b3e7cc491
Merge pull request #5212 from martin-frbg/lapack1119
Fix incomplete error message in EIG test (Reference-LAPACK PR 1119)
1 year ago
Martin Kroeker
4270d5bc43
Merge pull request #5204 from martin-frbg/issue4692
Repeat the libs target's "ln" in the all target to ensure completeness of copy on Windows
1 year ago
Martin Kroeker
880e43ee54
Merge pull request #5198 from martin-frbg/woadlldebug
Fix pdb file creation in debug dll builds with CMake on Windows/WoA
1 year ago
Martin Kroeker
70865a894e
Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
1 year ago
Martin Kroeker
f0f274725d
Merge pull request #5207 from martin-frbg/issue5202
Fix MacOS compilation with xcode16.3/clang17/gcc14
1 year ago
Martin Kroeker
94fb7033a4
Fix incomplete error message (Reference-LAPACK PR 1119)
1 year ago
lglglglgy
1ff303f36e
Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
1 year ago
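The three ideas named in the RISC-V GEMV commit above (specialized scenarios, loop unrolling, fewer multiplications) can be illustrated with a plain scalar sketch. This is not the RVV kernel from the commit, whose code is not shown in this log. The function name `gemv_n_sketch`, the column-major layout, and the unroll factor of 4 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar sketch: y := y + alpha * A * x, where A is an
 * m-by-n column-major matrix with leading dimension lda. */
static void gemv_n_sketch(size_t m, size_t n, double alpha,
                          const double *a, size_t lda,
                          const double *x, double *y) {
    assert(lda >= m);

    /* Specialized scenario: nothing to add, so skip all the work. */
    if (m == 0 || n == 0 || alpha == 0.0) return;

    for (size_t j = 0; j < n; j++) {
        /* Fewer multiplications: scale x[j] by alpha once per column
         * instead of multiplying by alpha for every matrix element. */
        const double t = alpha * x[j];
        const double *col = a + j * lda;
        size_t i = 0;

        /* Loop unrolling: four rows per iteration. */
        for (; i + 4 <= m; i += 4) {
            y[i]     += t * col[i];
            y[i + 1] += t * col[i + 1];
            y[i + 2] += t * col[i + 2];
            y[i + 3] += t * col[i + 3];
        }
        /* Remainder rows when m is not a multiple of 4. */
        for (; i < m; i++) {
            y[i] += t * col[i];
        }
    }
}
```

A vectorized kernel would replace the unrolled body with vector loads and fused multiply-adds, but the hoisting of `alpha * x[j]` and the early exit carry over unchanged.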
Martin Kroeker
fc8090b607
Move additional omp dependency to EXTRALIB
1 year ago
Martin Kroeker
1c5d0d5539
move libomp to extralib
1 year ago
Martin Kroeker
67c5bdd639
Azure CI: Update flang call in OSX_LLVM_flangnew job (#5208)
* Update flang call in OSX_LLVM_flangnew job
1 year ago
Martin Kroeker
1ed962d259
Fix compilation with xcode16.3/clang17/gcc14
1 year ago
Martin Kroeker
f0008f50cc
Merge pull request #5206 from ColumbusAI/develop
Update zsum.c -- fixed spelling error to successfully compile
1 year ago
ColumbusAI
7bf848454d
Update zsum.c -- fixed spelling error to successfully compile
Fix a spelling error: zsum_kernel was used where zasum_kernel was intended. The file does not compile without this fix.
1 year ago
Martin Kroeker
0aa5ef29ec
Repeat the libs target's "ln" in the all target to ensure completeness
1 year ago
Martin Kroeker
f90eff306d
Merge pull request #5197 from e4t/z-arch-exec-stack
On zarch don't produce objects from assembler with a writable stack s…
1 year ago
Vaisakh K V
04915be829
Add vector registers to clobber list to prevent compiler optimization.
The SME-based SGEMMDIRECT kernel uses the vector registers (z); adding them
to the clobber list informs the compiler not to optimize around these registers.
1 year ago
Martin Kroeker
3fc15ad81c
Fix pdb file creation in debug dll builds with CMake on Windows/WoA
1 year ago
Egbert Eich
61b9339d3a
getarch/cpuid.S: Fix warning about executable stack
When using the GNU toolchain, a warning is printed about an executable
stack:
/usr/lib64/gcc/.../x86_64-suse-linux/bin/ld: warning: /tmp/ccyG3xBB.o: missing .note.GNU-stack section implies executable stack
/usr/lib64/gcc/.../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
To prevent this warning, add:
```
.section .note.GNU-stack,"",@progbits
```
Signed-off-by: Egbert Eich <eich@suse.com>
1 year ago
Egbert Eich
ea6515c4b3
On zarch don't produce objects from assembler with a writable stack section
On z-series, the current version of the GNU toolchain produces warnings
such as:
```
/usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies executable stack
/usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```
To prevent this message and make sure we are future-proof, add
```
.section .note.GNU-stack,"",@progbits
```
Also add the `.size` bit to give the asm defined functions a proper size
in the symbol table.
Signed-off-by: Egbert Eich <eich@suse.com>
1 year ago
Martin Kroeker
f33943d73e
Merge pull request #5196 from martin-frbg/issue5193
Fix misinterpretation of NO_LAPACK=0 and SPMV settings in CMake builds
1 year ago
Ruiyang Wu
251c3f857d
gh m1: fix mixed linkage when built with OpenMP and clang+gfortran
1 year ago
Ruiyang Wu
1b0c0f00e9
CMake: Avoid mixed OpenMP linkage
1 year ago
Ruiyang Wu
02fd1df10b
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
1 year ago
Martin Kroeker
8b35534201
Merge pull request #5195 from martin-frbg/update-gensymbolpl
Re-synchronize gensymbol.pl with the posix shell version
1 year ago
Martin Kroeker
51c1fb1f93
Fix ?spmv build and misinterpretation of NO_LAPACK=0
1 year ago
Martin Kroeker
3ca1ba1be3
resynchronize with the posix shell version
1 year ago
Martin Kroeker
72f0abeed5
Merge pull request #5191 from Harishmcw/CMake_Symbol_Fix
Fix DLL symbol name pre/postfixing in CMake builds on Windows
1 year ago
Harishmcw
1724b3f104
DLL symbol pre/postfixing in CMake builds
1 year ago
Harishmcw
c2e7ab5351
DLL symbol pre/postfixing in CMake builds
1 year ago
Martin Kroeker
200771078f
Merge pull request #5190 from Harishmcw/develop
Fix missing commas in gensymbol.pl and DLL symbol pre/postfixing in CMake builds
1 year ago
Martin Kroeker
4e3afa7beb
Merge pull request #5175 from shubhamsvc/dgemv_thread_throttling
Add thread throttling profile for DGEMV on NEOVERSEV1
1 year ago
Harishmcw
c0a5c9655e
Fix missing commas in gensymbol.pl
1 year ago
shubham.chaudhari
8e289ecddc
Simplified thread throttling function in gemv
1 year ago
shubham.chaudhari
189dbbc04f
Add thread throttling for dynamic arch neoversev1
1 year ago