| @@ -1,4 +1,134 @@ | |||
| OpenBLAS ChangeLog | |||
| ==================================================================== | |||
| Version 0.3.30 | |||
| 16-Jun-2025 | |||
| general: | |||
| - fixed an installation problem with the thread safety test in gmake builds | |||
| - fixed spurious overwriting of an input array in complex GEMMT/GEMMTR | |||
| - fixed naming of GEMMTR in error messages from XERBLA | |||
| - fixed compilation of SBGEMMT/SBGEMMTR in CMake builds | |||
| - fixed the implementation of ?NRM2 to handle INCX=0 correctly | |||
| - removed tests for CSROT and ZDROT that relied on unspecified behavior | |||
| - fixed a performance regression in multithreaded GEMM that was particularly | |||
| serious on POWER targets | |||
| - fixed linking issues when using LLVM's flang-new with gmake | |||
| - fixed a potential thread safety problem with C11 atomic operations | |||
| - further improved the workload partitioning in parallel GEMM | |||
| - fixed omission of LAPACKE interfaces for CGESVDQ, CTRSYL3 and ?GEQPF in | |||
| CMake builds | |||
| - fixed mishandling of setting NO_LAPACK to FALSE, and incorrect dependencies | |||
| for LAPACK function SPMV in CMake builds | |||
| - added explicit CMake options for building LAPACKE and shared libraries | |||
| - simplified and improved handling of OpenMP options in CMake builds | |||
| - reworked Windows DLL generation in CMake builds to ensure correct symbol | |||
| renaming (pre/postfixing) and optional generation of PDB files for debugging | |||
| - updated the Perl script version of the gensymbol utility for use with | |||
| Windows-on-Arm | |||
| - fixed building with (MinGW) gmake on Windows to ensure completeness of the | |||
| LAPACK included in the static library (potential race condition due to the | |||
| Windows version of the "ln" utility creating snapshot copies rather than links) | |||
| - fixed unwanted deletion of the lapacke_mangling.h file by "make clean" | |||
| - fixed potential duplication of a _64 suffix on library names in CMake builds | |||
| - fixed compilation of the C fallback copies of the LAPACK code with GCC 15 | |||
| - included fixes from the Reference-LAPACK project: | |||
| - fixed a truncated error message in the EIG part of the testsuite | |||
| (Reference-LAPACK PR 1119) | |||
| - fixed an overly strict check in LAPACKE_?gesdd_work (PR #1126) | |||
| - fixed memory corruption when calling ?GEEV with non-finite data (PR #1128) | |||
| - fixed missing initialization of a variable in C/GEQP3RK (PR #1131) | |||
| - fixed 2nd dimension chosen in C/ZUNMLQ transposition operation (PR #1135) | |||
| x86_64: | |||
| - fixed an error in the SBGEMV kernel for Cooper Lake/Sapphire Rapids | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| - improved the compiler identification code for flang-new | |||
| - fixed a potential build issue in the ZSUM kernel | |||
| - fixed "argument list too long" errors when building on macOS | |||
| - added cpu autodetection support for several new Arrow Lake models | |||
| - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH | |||
| - fixed compilation with the MinGW build of GCC 15 | |||
| arm64: | |||
| - added an optimized SBGEMM kernel for NEOVERSEV1 | |||
| - improved 1xN SBGEMM performance by forwarding to SBGEMV | |||
| - introduced a stepwise increase of the thread count used for | |||
| SGEMM and SGEMV on NEOVERSEV1/V2 in relation to problem size | |||
| - introduced a stepwise increase of the thread count used for | |||
| DGEMV on NEOVERSEV1 in relation to problem size | |||
| - introduced a stepwise increase of the thread count used for | |||
| SDOT and DDOT on NEOVERSEV1 in relation to problem size | |||
| - worked around assembler limitations in LLVM for Windows-on-Arm | |||
| - enabled cpu type autodetection from the registry on Windows-on-Arm | |||
| - improved multithreading threshold for GEMV and GESV on Windows-on-Arm | |||
| - fixed overoptimization issues with LLVM's flang on Windows-on-Arm | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| - added a fast path SGEMM kernel for small workloads on SME capable targets | |||
| - improved performance of SGEMM and DGEMM kernels for small workloads | |||
| - improved performance of SGEMV and DGEMV on SVE-capable targets | |||
| - improved performance of SGEMV on NEOVERSEN1 and Apple M | |||
| - added optimized SSYMV and DSYMV kernels for NEOVERSEN1, Apple M and all | |||
| SVE capable targets | |||
| - added optimized SBGEMV kernels for NEOVERSEV1/V2/N2 | |||
| - improved performance of SGEMM through faster NCOPY kernels | |||
| - added compiler options for the NVIDIA HPC Compiler Suite | |||
| - fixed compilation on macOS with Xcode 16.3 and later | |||
| - fixed cpu core type and cache size detection on Apple M4 | |||
| - updated GEMM parameter settings for Neoverse cpus in cross-builds with CMake | |||
| - fixed default compiler options for NEOVERSEN1 and CORTEXX2 in CMake builds | |||
| - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH | |||
| - fixed potential miscompilation of the non-SVE SDOT kernel | |||
| riscv64: | |||
| - added optimized SROTM and DROTM kernels for x280 | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| - improved performance of GEMM_TCOPY on RVV1.0 targets with | |||
| VLEN of 128 or 256 | |||
| - improved performance of OMATCOPY on targets with VLEN 256 | |||
| - greatly improved performance of SGEMV/DGEMV | |||
| - improved performance of CGEMV and ZGEMV on C910V and all RVV targets | |||
| with VLEN 256 | |||
| - improved performance of SAXPBY and DAXPBY on C910V and all RVV targets | |||
| with VLEN 256 | |||
| - improved performance of AXPY and DOT on C910V and ZVL256B targets by | |||
| falling back to non-vectorized code for very small N (thereby fixing | |||
| poor performance of CHBMV/ZHBMV for very small K) | |||
| - fixed CMake build failures of the TRMM kernels | |||
| loongarch64: | |||
| - improved performance of the LSX versions of SSYMV/DSYMV | |||
| - made the LASX versions of the DSYMV and SSYMV kernels | |||
| compatible with hardware changes in LA664 and future targets | |||
| - fixed inaccuracies in several LASX kernels | |||
| - improved compatibility of LSX kernels with LA264 targets | |||
| - fixed handling of deprecated target names in CMake builds | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| power: | |||
| - fixed building for PPCG4 with CMake | |||
| - fixed SSCAL/DSCAL on PPC970 running FreeBSD | |||
| - fixed a potential alignment issue in the POWER8 SGEMV kernel | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| zarch: | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| - fixed unwanted generation of object files with a writable stack | |||
| x86: | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| arm: | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| sparc: | |||
| - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL | |||
| alpha: | |||
| - fixed build failure caused by spurious Windows-only typecasts | |||
| cell: | |||
| - fixed probable build issue caused by spurious Windows-only typecasts | |||
| ==================================================================== | |||
| Version 0.3.29 | |||
| 12-Jan-2025 | |||