Martin Kroeker
11ff18bb0f
Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
kernel/generic: Fixed cscal and zscal
11 months ago
Martin Kroeker
2e2691b34b
Merge pull request #5078 from XiWeiGu/la64_fixed_cscal_zscal
LoongArch64: fixed cscal and zscal
11 months ago
Martin Kroeker
f4194fc65f
Merge branch 'develop' into la64_fixed_cscal_zscal
11 months ago
Martin Kroeker
fe220a0d7d
Merge pull request #5291 from guoyuanplct/develop
kernel/riscv64:fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
11 months ago
Martin Kroeker
bbdc265798
Merge pull request #5294 from arnej27959/arnej/fix-arm64-register
Accumulate results in output register explicitly
11 months ago
Arne Juul
5442aff218
Accumulate results in output register explicitly
11 months ago
guoyuanplct
83fcab7578
Merge branch 'develop' of https://github.com/guoyuanplct/OpenBLAS into develop
11 months ago
guoyuanplct
2ae019161a
fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
11 months ago
Martin Kroeker
02267d86f5
Merge pull request #5288 from guoyuanplct/develop
kernel/riscv64:Optimized the implementation of axpby on TARGET=RISCV64_ZVL256B.
11 months ago
guoyuanplct
d2003dc886
del lines
11 months ago
guoyuanplct
45fd2d9b07
Optimized the axpby function.
11 months ago
Martin Kroeker
0163143fdd
Merge pull request #5278 from martin-frbg/fixup5276
Fix compilation with pre-C99 compilers
11 months ago
Martin Kroeker
20f2ba0141
Move declaration of i for pre-C99 compilers
11 months ago
Martin Kroeker
e2e6a4d90a
Merge pull request #5276 from nakagawa-fj/gemm_2d_thread_partitioning
Improvement of 2D thread-partitioned GEMM for M << N case
11 months ago
Martin Kroeker
9ef5995c22
Merge pull request #5277 from martin-frbg/fixmingw32
Fix building with mingw32-gcc15
11 months ago
Martin Kroeker
42b7d1f897
Fix addressing of alpha in CBLAS
11 months ago
Martin Kroeker
bd573a9d38
Expand mingw32 gfortran workaround to all versions after 14.1
11 months ago
Masato Nakagawa
2351a98005
Update 2D thread-partitioned GEMM for M << N case.
11 months ago
Martin Kroeker
a5f701c4ab
Merge pull request #5274 from martin-frbg/issue5247
Expressly provide a shared libs option in CMakeLists.txt
11 months ago
Martin Kroeker
4ca76d9de4
Expressly provide a shared libs option
1 year ago
Martin Kroeker
846a5436e7
Merge pull request #5273 from martin-frbg/issue5259
CMAKE: Do not suffix the library with a 64 if LIBNAMESUFFIX already contains it
1 year ago
Martin Kroeker
8779eac3b8
Do not add a 64 suffix to the library name if the user-provided suffix already contains it
1 year ago
Martin Kroeker
3473118213
Merge pull request #5272 from martin-frbg/issue5271
Fix compiler options for NeoverseN1 and CortexX2/A?10 in CMake builds
1 year ago
Martin Kroeker
f2022c23ac
Remove sve capability from NeoverseN1 and specify CortexX2/A?10 as arm8.4a
1 year ago
Martin Kroeker
b5456c1b41
Merge pull request #5260 from taoye9/enable_bf16_gemm_gemv_forward_on_arm64
enable sbgemm to be forwarded to sbgemv on arm64
1 year ago
Martin Kroeker
5a322f21af
Merge pull request #5268 from martin-frbg/fix-dyn-sgemmdirect
Fix conditional inclusion of SGEMM_KERNEL_DIRECT
1 year ago
Martin Kroeker
6680e0592f
Fix conditional inclusion of SGEMM_KERNEL_DIRECT
1 year ago
Martin Kroeker
0b0bb9951d
Merge pull request #5265 from guoyuanplct/develop
kernel/riscv64:Added support for omatcopy on RISCV64_ZVL256B
1 year ago
guoyuanplct
7732a55200
Add retry mechanism after deadlock timeout for c910v.
1 year ago
guoyuanplct
be9f7550b5
Format Code
1 year ago
guoyuanplct
4d213653d8
kernel/riscv64:Added support for omatcopy on riscv64.
1 year ago
Martin Kroeker
8afddc1a81
Merge pull request #5262 from guoyuanplct/develop
kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:
1 year ago
guoyuanplct
9a7e3f102b
kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv and some c/zgbmv tests:
1 year ago
Martin Kroeker
5366902f9d
Merge pull request #5261 from ErnstPeng/fix-lasx
Fix cgemm_ncopy_16_lasx function for lapack-test and add its C function
1 year ago
pengxu
a978ad3180
Loongarch64: add C functions of zgemm_ncopy_16
1 year ago
pengxu
0ccb050583
Loongarch64: fixed cgemm_ncopy_16_lasx
1 year ago
Ye Tao
7321444660
enable sbgemm to be forwarded to sbgemv on arm64
1 year ago
Martin Kroeker
cf9e34c1f4
Merge pull request #5258 from martin-frbg/issue5255
Fix empty prototypes in files converted from Fortran (fixes compilation with GCC15)
1 year ago
Martin Kroeker
5141a90993
Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222)
* Fix ARMV9SME target and add support_sme1 code for MacOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
1 year ago
Martin Kroeker
2320e0b757
Merge pull request #5244 from chitao1234/develop
allow the use of LAPACK_COMPLEX_CPP when using MSVC compiler
1 year ago
Martin Kroeker
0d69a2930d
Fix empty prototypes of select/selctg
1 year ago
Martin Kroeker
ebbe682f7d
Fix function prototypes
1 year ago
Martin Kroeker
151b74284e
Merge pull request #5203 from quic/fix-sgemmdirect-sme1
Add vector registers to clobber list to prevent compiler optimization.
1 year ago
Martin Kroeker
3c878f3e70
Cirrus CI: Update xcode version in the Apple crossbuilds (#5254)
* Update xcode version in the Apple crossbuilds
1 year ago
Martin Kroeker
3e961c2771
Merge pull request #5251 from martin-frbg/issue5250
Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errors/invalid inputs
1 year ago
Martin Kroeker
0ea9205a6c
Merge pull request #5249 from scottt/fix-build-on-intel-arrow-lake
cpuid_x86: improve Intel Arrow Lake detection
1 year ago
Martin Kroeker
cba32d001a
Merge pull request #5245 from guoyuanplct/develop
Optimized RVV_ZVL256B Implementation of zgemv_n
1 year ago
Martin Kroeker
5c958dfe1e
Avoid out of bounds accesses in SCAL when INFO<0
1 year ago
Martin Kroeker
4c0445aed1
Avoid out of bounds accesses in SCAL when INFO <0
1 year ago
Martin Kroeker
d48a2fc469
Avoid out of bounds accesses in SCAL when INFO<0
1 year ago