Martin Kroeker
f90eff306d
Merge pull request #5197 from e4t/z-arch-exec-stack
On zarch don't produce objects from assembler with a writable stack s…
10 months ago
Egbert Eich
61b9339d3a
getarch/cpuid.S: Fix warning about executable stack
When using the GNU toolchain, a warning is printed about an executable
stack:
/usr/lib64/gcc/.../x86_64-suse-linux/bin/ld: warning: /tmp/ccyG3xBB.o: missing .note.GNU-stack section implies executable stack
/usr/lib64/gcc/.../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
To prevent this warning, add:
```
.section .note.GNU-stack,"",@progbits
```
Signed-off-by: Egbert Eich <eich@suse.com>
10 months ago
Egbert Eich
ea6515c4b3
On zarch don't produce objects from assembler with a writable stack section
On z-series, the current version of the GNU toolchain produces warnings
such as:
```
/usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies executable stack
/usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```
To prevent this message and make sure we are future-proof, add
```
.section .note.GNU-stack,"",@progbits
```
Also add the `.size` directive to give the asm-defined functions a proper size
in the symbol table.
Signed-off-by: Egbert Eich <eich@suse.com>
10 months ago
Martin Kroeker
f33943d73e
Merge pull request #5196 from martin-frbg/issue5193
Fix misinterpretation of NO_LAPACK=0 and SPMV settings in CMake builds
10 months ago
Martin Kroeker
8b35534201
Merge pull request #5195 from martin-frbg/update-gensymbolpl
Re-synchronize gensymbol.pl with the posix shell version
10 months ago
Martin Kroeker
51c1fb1f93
Fix ?spmv build and misinterpretation of NO_LAPACK=0
10 months ago
Martin Kroeker
3ca1ba1be3
resynchronize with the posix shell version
10 months ago
Martin Kroeker
72f0abeed5
Merge pull request #5191 from Harishmcw/CMake_Symbol_Fix
Fix DLL symbol name pre/postfixing in CMake builds on Windows
10 months ago
Harishmcw
1724b3f104
DLL symbol pre/postfixing in CMake builds
10 months ago
Harishmcw
c2e7ab5351
DLL symbol pre/postfixing in CMake builds
10 months ago
Martin Kroeker
200771078f
Merge pull request #5190 from Harishmcw/develop
Fix missing commas in gensymbol.pl and DLL symbol pre/postfixing in CMake builds
10 months ago
Martin Kroeker
4e3afa7beb
Merge pull request #5175 from shubhamsvc/dgemv_thread_throttling
Add thread throttling profile for DGEMV on NEOVERSEV1
10 months ago
Harishmcw
c0a5c9655e
Fix missing commas in gensymbol.pl
10 months ago
shubham.chaudhari
8e289ecddc
Simplified thread throttling function in gemv
10 months ago
shubham.chaudhari
189dbbc04f
Add thread throttling for dynamic arch neoversev1
11 months ago
shubham.chaudhari
b6cb5ece58
Add thread throttling profile for DGEMV on NEOVERSEV1
11 months ago
Martin Kroeker
51c244a098
Merge pull request #5184 from taoye9/fix_sbgemv_n_bug
fix bugs in aarch64 sbgemv_n kernel
10 months ago
Ye Tao
f27ba5efd1
fix bugs in aarch64 sbgemv_n kernel
10 months ago
Martin Kroeker
e9fbe0a838
Merge pull request #5183 from annop-w/fix_sbgemv_t
Fix bug in ARM64 sbgemv_t
10 months ago
Annop Wongwathanarat
edef2e4441
Fix bug in ARM64 sbgemv_t
10 months ago
Martin Kroeker
b55ca71d5b
Merge pull request #5182 from annop-w/sgemm_ncopy
Optimize aarch64 sgemm_ncopy
10 months ago
Martin Kroeker
2f778554b8
Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16
Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
10 months ago
Martin Kroeker
66e0f1e621
Merge pull request #5178 from martin-frbg/lapack_cplx_dummy
Add dummy implementations of make_complex_(float/double) to simplify Windows DLL linking
10 months ago
Annop Wongwathanarat
9807f56580
Optimize aarch64 sgemm_ncopy
10 months ago
Martin Kroeker
1ba02656e6
Merge pull request #5177 from martin-frbg/cmakelapacke
Fix omission of LAPACKE interfaces for cgesvdq, strsyl3 and deprecated functions in CMake builds
10 months ago
Martin Kroeker
8a418b1aab
Add dummy implementations for the LAPACK_COMPLEX_CUSTOM case
10 months ago
Martin Kroeker
b34235ca66
Fix inclusion of deprecated interfaces and cgesvdq/strsyl3
10 months ago
Martin Kroeker
37b854769b
Merge pull request #5173 from nakagawa-fj/gemm_load_imbalance
Improving Load Imbalance in Thread-Parallel GEMM
10 months ago
Martin Kroeker
a3e7b16072
Merge pull request #5157 from manaalmj/feature
Optimize gemv_n_sve kernel
10 months ago
Martin Kroeker
8865850496
Merge pull request #5176 from annop-w/fix_sbgemv_t
Fix aarch64 sbgemv_t compilation error for GCC < 13
10 months ago
Ye Tao
4c00099ed6
Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
10 months ago
Annop Wongwathanarat
a085b6c9ec
Fix aarch64 sbgemv_t compilation error for GCC < 13
10 months ago
Masato Nakagawa
80d3c2ad95
Improve load imbalance in thread-parallel GEMM
10 months ago
manjam01
5c4e38ab17
Optimize gemv_n_sve kernel
11 months ago
Martin Kroeker
39eb43d441
Improve thread safety of pthreads builds that rely on C11 atomic operations for locking (#5170)
* Tighten memory orders for C11 atomic operations
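For readers unfamiliar with what "tightening memory orders" means for a lock, here is a minimal sketch of a spinlock built on C11 `atomic_flag` with explicit acquire/release ordering. It is an illustration only and is not taken from the OpenBLAS sources; the names `demo_lock_t`, `demo_lock`, `demo_try_lock` and `demo_unlock` are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not an OpenBLAS identifier. */
typedef struct {
    atomic_flag held;
} demo_lock_t;

/* Spin until the flag is acquired. memory_order_acquire keeps reads and
 * writes inside the critical section from being reordered before the
 * point where the lock is taken. */
static inline void demo_lock(demo_lock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait; a real implementation would likely yield or pause */
    }
}

/* Single attempt: returns 1 if the lock was acquired, 0 if it was busy. */
static inline int demo_try_lock(demo_lock_t *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* memory_order_release makes writes done inside the critical section
 * visible to the next thread that acquires the lock. */
static inline void demo_unlock(demo_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock object would be initialized with `demo_lock_t l = { ATOMIC_FLAG_INIT };`. Using relaxed ordering for either operation would not order the protected data accesses against the lock, which is the kind of problem that stricter memory orders are generally meant to rule out.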
10 months ago
Martin Kroeker
1d5ed5c46b
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
11 months ago
Martin Kroeker
7338a473a7
Merge pull request #5150 from Harishmcw/WoA-Experiments
Redefined threading logic for GESV and GEMV on WoA
11 months ago
Martin Kroeker
5f200dca54
Merge pull request #5166 from martin-frbg/issue5158
Expose the option to build without LAPACKE to ccmake
11 months ago
Martin Kroeker
8b98db13e3
Merge pull request #5167 from taoye9/fix_sbgemv_n_kernel_typo
Fix minor issue of redeclaration of float x0,x1 in sbgemv_n_neon.c
11 months ago
Ye Tao
6b8b35cdf2
Fix minor issue of redeclaration of float x0,x1 in sbgemv_n_neon.c
11 months ago
Ye Tao
38ee7c9301
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
11 months ago
Martin Kroeker
217324d880
Merge pull request #5162 from taoye9/add_sbgemv_tests
add beta and alpha testcase for sbgemv
11 months ago
Martin Kroeker
e4630ed15a
Merge pull request #5160 from taoye9/sbgemv_n_neon
Add SBGEMVN Kernel for ARM64
11 months ago
Martin Kroeker
35914aa9a2
Expose the option to build without LAPACKE to ccmake
11 months ago
Martin Kroeker
2b941c44b5
Merge branch 'develop' into sbgemv_n_neon
11 months ago
Martin Kroeker
c797e27a1c
Merge pull request #5159 from annop-w/sbgemv_t_bfdot
Add sbgemv_t_bfdot kernel for ARM64
11 months ago
Ye Tao
4346b91559
add beta and alpha testcase for sbgemv
11 months ago
Ye Tao
35bdbca153
Add sbgemv_n_neon kernel for arm64.
11 months ago
Annop Wongwathanarat
edaf51dd99
Add sbgemv_t_bfdot kernel for ARM64
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].
11 months ago
Martin Kroeker
ef9e3f7159
Merge pull request #5149 from martin-frbg/fixup5077-5088
Make the Neoverse GEMM/GEMV throttling code conditional on SMP
11 months ago