| @@ -1,4 +1,134 @@ | |||||
| OpenBLAS ChangeLog | OpenBLAS ChangeLog | ||||
| ==================================================================== | |||||
| Version 0.3.30 | |||||
| 16-Jun-2025 | |||||
| general: | |||||
| - fixed an installation problem with the thread safety test in gmake builds | |||||
| - fixed spurious overwriting of an input array in complex GEMMT/GEMMTR | |||||
| - fixed naming of GEMMTR in error messages from XERBLA | |||||
| - fixed compilation of SBGEMMT/SBGEMMTR in CMake builds | |||||
| - fixed the implementation of ?NRM2 to handle INCX=0 correctly | |||||
| - removed tests for CSROT and ZDROT that relied on unspecified behavior | |||||
| - fixed a performance regression in multithreaded GEMM that was particularly | |||||
| serious on POWER targets | |||||
| - fixed linking issues when using LLVM's flang-new with gmake | |||||
| - fixed a potential thread safety problem with C11 atomic operations | |||||
| - further improved the workload partitioning in parallel GEMM | |||||
| - fixed omission of LAPACKE interfaces for CGESVDQ, CTRSYL3 and ?GEQPF in | |||||
| CMake builds | |||||
| - fixed mishandling of setting NO_LAPACK to FALSE, and incorrect dependencies | |||||
| for LAPACK function SPMV in CMake builds | |||||
| - added explicit CMake options for building LAPACKE and shared libraries | |||||
| - simplified and improved handling of OpenMP options in CMake builds | |||||
| - reworked Windows DLL generation in CMake builds to ensure correct symbol | |||||
| renaming (pre/postfixing) and optional generation of PDB files for debugging | |||||
| - updated the Perl script version of the gensymbol utility for use with | |||||
| Windows-on-Arm | |||||
| - fixed building with (MinGW) gmake on Windows to ensure completeness of the | |||||
| LAPACK included in the static library (potential race condition due to the | |||||
| Windows version of the "ln" utility creating snapshot copies rather than links) | |||||
| - fixed unwanted deletion of the lapacke_mangling.h file by "make clean" | |||||
| - fixed potential duplication of a _64 suffix on library names in CMake builds | |||||
| - fixed compilation of the C fallback copies of the LAPACK code with GCC 15 | |||||
| - included fixes from the Reference-LAPACK project: | |||||
| - fixed a truncated error message in the EIG part of the testsuite | |||||
| (Reference-LAPACK PR 1119) | |||||
| - fixed an overly strict check in LAPACKE_?gesdd_work (PR #1126) | |||||
| - fixed memory corruption when calling ?GEEV with non-finite data (PR #1128) | |||||
| - fixed missing initialization of a variable in C/GEQP3RK (PR #1131) | |||||
| - fixed the 2nd dimension chosen in the C/ZUNMLQ transposition operation (PR #1135) | |||||
| x86_64: | |||||
| - fixed an error in the SBGEMV kernel for Cooper Lake/Sapphire Rapids | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| - improved the compiler identification code for flang-new | |||||
| - fixed a potential build issue in the ZSUM kernel | |||||
| - fixed "argument list too long" errors when building on macOS | |||||
| - added cpu autodetection support for several new Arrow Lake models | |||||
| - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH | |||||
| - fixed compilation with the MinGW build of GCC 15 | |||||
| arm64: | |||||
| - added an optimized SBGEMM kernel for NEOVERSEV1 | |||||
| - improved 1xN SBGEMM performance by forwarding to SBGEMV | |||||
| - introduced a stepwise increase of the thread count used for | |||||
| SGEMM and SGEMV on NEOVERSEV1/V2 in relation to problem size | |||||
| - introduced a stepwise increase of the thread count used for | |||||
| DGEMV on NEOVERSEV1 in relation to problem size | |||||
| - introduced a stepwise increase of the thread count used for | |||||
| SDOT and DDOT on NEOVERSEV1 in relation to problem size | |||||
| - worked around assembler limitations in LLVM for Windows-on-Arm | |||||
| - enabled cpu type autodetection from the registry on Windows-on-Arm | |||||
| - improved multithreading threshold for GEMV and GESV on Windows-on-Arm | |||||
| - fixed overoptimization issues with LLVM's flang on Windows-on-Arm | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| - added a fast path SGEMM kernel for small workloads on SME capable targets | |||||
| - improved performance of SGEMM and DGEMM kernels for small workloads | |||||
| - improved performance of SGEMV and DGEMV on SVE-capable targets | |||||
| - improved performance of SGEMV on NEOVERSEN1 and Apple M | |||||
| - added optimized SSYMV and DSYMV kernels for NEOVERSEN1, Apple M and all | |||||
| SVE capable targets | |||||
| - added optimized SBGEMV kernels for NEOVERSEV1/V2/N2 | |||||
| - improved performance of SGEMM through faster NCOPY kernels | |||||
| - added compiler options for the NVIDIA HPC Compiler Suite | |||||
| - fixed compilation on macOS with Xcode 16.3 and later | |||||
| - fixed cpu core type and cache size detection on Apple M4 | |||||
| - updated GEMM parameter settings for Neoverse cpus in cross-builds with CMake | |||||
| - fixed default compiler options for NEOVERSEN1 and CORTEXX2 in CMake builds | |||||
| - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH | |||||
| - fixed potential miscompilation of the non-SVE SDOT kernel | |||||
| riscv64: | |||||
| - added optimized SROTM and DROTM kernels for x280 | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| - improved performance of GEMM_TCOPY on RVV1.0 targets with | |||||
| VLEN of 128 or 256 | |||||
| - improved performance of OMATCOPY on targets with VLEN 256 | |||||
| - greatly improved performance of SGEMV/DGEMV | |||||
| - improved performance of CGEMV and ZGEMV on C910V and all RVV targets | |||||
| with VLEN 256 | |||||
| - improved performance of SAXPBY and DAXPBY on C910V and all RVV targets | |||||
| with VLEN 256 | |||||
| - improved performance of AXPY and DOT on C910V and ZVL256B targets by | |||||
| falling back to non-vectorized code for very small N (thereby fixing | |||||
| poor performance of CHBMV/ZHBMV for very small K) | |||||
| - fixed CMake build failures of the TRMM kernels | |||||
| loongarch64: | |||||
| - improved performance of the LSX versions of SSYMV/DSYMV | |||||
| - made the LASX versions of the DSYMV and SSYMV kernels | |||||
| compatible with hardware changes in LA664 and future targets | |||||
| - fixed inaccuracies in several LASX kernels | |||||
| - improved compatibility of LSX kernels with LA264 targets | |||||
| - fixed handling of deprecated target names in CMake builds | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| power: | |||||
| - fixed building for PPCG4 with CMake | |||||
| - fixed SSCAL/DSCAL on PPC970 running FreeBSD | |||||
| - fixed a potential alignment issue in the POWER8 SGEMV kernel | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| zarch: | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| - fixed unwanted generation of object files with a writable stack | |||||
| x86: | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| arm: | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| sparc: | |||||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||||
| alpha: | |||||
| - fixed a build failure caused by spurious Windows-only typecasts | |||||
| cell: | |||||
| - fixed a probable build issue caused by spurious Windows-only typecasts | |||||
| ==================================================================== | ==================================================================== | ||||
| Version 0.3.29 | Version 0.3.29 | ||||
| 12-Jan-2025 | 12-Jan-2025 | ||||