Martin Kroeker
52792f6da7
Revert "CMake: Pass `OpenMP` compiler and linker flags through CMake targets"
9 months ago
Martin Kroeker
d23680b81d
Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1
Multi-thread Performance Improvement of GEMM on NeoverseV1 with DIVIDE_RATE=1
9 months ago
Martin Kroeker
b4cc4be2ce
Merge pull request #5410 from martin-frbg/issue5404
Adjust multithreading threshold in S/DGER and add an intermediate step
9 months ago
Martin Kroeker
0968dddf1a
Merge pull request #5409 from martin-frbg/issue5372
Work around gcc15.1 on POWER misoptimizing DGEMV at -O3
9 months ago
Martin Kroeker
eddfe1e6b3
Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings
Fix bad vector zero initializer and other compiler warnings for RISC-V.
9 months ago
Martin Kroeker
30d11bc92c
Adjust multithreading threshold and add an intermediate step
9 months ago
Martin Kroeker
a3b9c933c5
mark xbuffer as volatile to work around gcc15.1 optimizer bug
9 months ago
Chip Kerchner
72f082f31d
Fix bad vector zero initializer and other compiler warnings for RISC-V.
9 months ago
Masato Nakagawa
7e29f11396
Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)
9 months ago
Martin Kroeker
9a64b32b44
Merge pull request #5406 from martin-frbg/fixbgemmtest
Fix building of bgemm tests on GEMM3M-capable (x86) targets
9 months ago
Martin Kroeker
b66a01f909
Fix building of bgemm tests on GEMM3M-capable (x86) targets
9 months ago
Martin Kroeker
a5e7c0e3e0
Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16
ARM64: Enable bfloat16 kernels by default
9 months ago
abhishek-fujitsu
6356190d06
fix gfortran link path in dynamic_arch.yml
9 months ago
abhishek-fujitsu
4c8dcb3a8f
Darwin/arm64: disable SVE/SME and fix gfortran link path
9 months ago
Martin Kroeker
33b50548eb
Merge pull request #5403 from martin-frbg/issue5402
Introduce a (crude) threshold to multithreading in STRMV/DTRMV
9 months ago
Martin Kroeker
c504aedca1
Merge pull request #5400 from Mousius/neoversev2-target
Add NEOVERSEV2 target support
9 months ago
Martin Kroeker
b9e107932a
add NeoverseV2
9 months ago
Martin Kroeker
2f89a5970e
fix NeoverseV2 typo
9 months ago
Martin Kroeker
a9e8fa06bf
Introduce a (crude) threshold to multithreading
9 months ago
Martin Kroeker
b4c2b34a45
Merge pull request #5401 from martin-frbg/followup-5397
Include float-bfloat conversion functions in ONLY_CBLAS builds as well
9 months ago
Martin Kroeker
c9204f7b6f
Merge pull request #5399 from Mousius/bgemm-8x4
Add optimized BGEMM for NEOVERSEN2 target
9 months ago
Martin Kroeker
a55e65dba9
Merge pull request #5391 from martin-frbg/issue5387
Use OpenBLAS_ROOT_DIR in OpenBLASConfig.cmake generation only if set
9 months ago
abhishek-fujitsu
0bc79da587
add neon header
9 months ago
abhishek-fujitsu
720a4743b9
update contribution list
10 months ago
abhishek-fujitsu
05fc88180c
ARM64: Enable bfloat16 kernels by default
1 year ago
Martin Kroeker
965463f177
Include float-bfloat conversion functions in ONLY_CBLAS builds as well
9 months ago
Martin Kroeker
4272cf8c7f
Merge pull request #5398 from martin-frbg/fixup-5394
Update ?GEMM-to-?GEMV forwarding settings for CMake
9 months ago
Chris Sidebottom
87247daadc
Add NEOVERSEV2 target support
Did a quick run around to make `TARGET=NEOVERSEV2` build successfully.
Fixes #5385
9 months ago
Chris Sidebottom
ea2faf0c9a
Add optimized BGEMM for NEOVERSEN2 target
This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
10 months ago
Martin Kroeker
a5b55f6fe3
remove CBLAS restriction on GEMM_GEMV forwarding
9 months ago
Martin Kroeker
a4f4662459
Merge pull request #5397 from omegacoleman/fix-cblas-bgemm
Fix cmake building with cblas_bgemm
9 months ago
Martin Kroeker
82954ba4ca
Update ?GEMM-to-?GEMV forwarding settings
9 months ago
Martin Kroeker
392d38168e
Merge pull request #5394 from Mousius/optimize-bgemv
Optimized BGEMV for NEOVERSEV1 target
9 months ago
youcai
41f9701ebc
Fix cmake building with cblas_bgemm
10 months ago
Martin Kroeker
f4caa61e47
Merge pull request #5395 from martin-frbg/fixloongsonCI
Fix libffi6 download in the Loongarch64_clang CI job (for now)
10 months ago
Martin Kroeker
444d03db9c
switch to another site that still has libffi6 (for now)
10 months ago
Chris Sidebottom
2c3cdaf74e
Optimized BGEMV for NEOVERSEV1 target
- Adds bgemv T based off of sbgemv T kernel
- Adds bgemv N, which is slightly altered to not use Y as an
accumulator, because the output is bf16 and accumulating in it
would result in loss of precision
- Enables BGEMM_GEMV_FORWARD to proxy BGEMM to BGEMV with new kernels
10 months ago
Martin Kroeker
7d908564fe
Use OpenBLAS_ROOT_DIR in CMake config file generation only if set
10 months ago
Martin Kroeker
2f81d6e60c
Merge pull request #5390 from martin-frbg/issue5388-2
Declare the "small" complex DOT and AXPY kernels for RISCV-ZVL256B static in addition to inline
10 months ago
Martin Kroeker
e2d941e9af
Declare the "small" kernel static in addition to inline
10 months ago
Martin Kroeker
8214700930
Declare the "small" kernel static in addition to inline
10 months ago
Martin Kroeker
4ae8707b54
Merge pull request #5389 from martin-frbg/issue5388
Add cross-compilation parameters for RISCV64 targets in CMake
10 months ago
Martin Kroeker
b24212f5df
fix numbers
10 months ago
Martin Kroeker
6ff06f5483
Add cross-compilation data for RISCV64 targets
10 months ago
Martin Kroeker
d92f151634
Merge pull request #5386 from martin-frbg/issue5384
Fixes for some gcc warnings
10 months ago
Martin Kroeker
30dbca5051
fix misleading indentation to silence a gcc warning
10 months ago
Martin Kroeker
38e6999295
format cleanup
10 months ago
Martin Kroeker
3df503cafd
portability fix and cleanup
10 months ago
Martin Kroeker
39c90f9859
Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta
SME1 based direct kernel (with alpha and beta) for cblas_sgemm level 3
10 months ago
Rajendra Prasad Matcha
eae0abfdb6
SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API.
10 months ago