| @@ -1,4 +1,98 @@ | |||||
| OpenBLAS ChangeLog | OpenBLAS ChangeLog | ||||
| ==================================================================== | |||||
| Version 0.3.29 | |||||
| 12-Jan-2025 | |||||
| general: | |||||
| - fixed a potential NULL pointer dereference in multithreaded builds | |||||
| - added function aliases for GEMMT using its new name GEMMTR adopted by Reference-BLAS | |||||
| - fixed a build failure when building without LAPACK_DEPRECATED functions | |||||
| - the minimum required CMake version for CMake-based builds was raised to 3.16.0 in order | |||||
| to remove many compatibility and deprecation warnings | |||||
| - added more detailed CMake rules for OpenMP builds (mainly to support recent LLVM) | |||||
| - fixed the behavior of the recently added CBLAS_?GEMMT functions with row-major data | |||||
| - improved thread scaling of multithreaded SBGEMV | |||||
| - improved thread scaling of multithreaded TRTRI | |||||
| - fixed compilation of the CBLAS testsuite with gcc14 (and no Fortran compiler) | |||||
| - added support for option handling changes in flang-new from LLVM18 onwards | |||||
| - added support for recent calling convention changes in Cray and NVIDIA compilers | |||||
| - added support for compilation with the NAG Fortran compiler | |||||
| - fixed placement of the -fopenmp flag and libsuffix in the generated pkgconfig file | |||||
| - improved the CMakeConfig file generated by the Makefile build | |||||
| - fixed const-correctness of cblas_?geadd in cblas.h | |||||
| - fixed a potential inaccuracy in multithreaded BLAS3 calls | |||||
| - fixed empty implementations of get/set_affinity that print a warning in OpenMP builds | |||||
| - fixed function signatures for TRTRS in the converted C version of LAPACK | |||||
| - fixed omission of several single-precision LAPACK symbols in the shared library | |||||
| - improved build instructions for the provided "pybench" benchmarks | |||||
| - improved documentation, including added build instructions for WoA and HarmonyOS | |||||
| - added a separate "make install_tests" target for use with cross-compilations | |||||
| - integrated improvements and corrections from Reference-LAPACK: | |||||
| - removed a comparison in LAPACKE ?tpmqrt that is always false (LAPACK PR 1062) | |||||
| - fixed the leading dimension for B in tests for GGEV (LAPACK PR 1064) | |||||
| - replaced the ?LARFT functions with a recursive implementation (LAPACK PR 1080) | |||||
| arm: | |||||
| - fixed build with recent versions of the NDK (missing .type declaration of symbols) | |||||
| arm64: | |||||
| - fixed a long-standing bug in the (generic) c/zgemm_beta kernel that could lead to | |||||
| reads and writes outside the array bounds in some circumstances | |||||
| - rewrote cpu autodetection to scan all cores and return the highest performing type | |||||
| - improved the DGEMM performance for SVE targets and small matrix sizes | |||||
| - improved dimension criteria for forwarding from GEMM to GEMV kernels | |||||
| - added SVE kernels for ROT and SWAP | |||||
| - improved SVE kernels for SGEMV and DGEMV on A64FX and NEOVERSEV1 | |||||
| - added support for using the "small matrix" kernels with CMake as well | |||||
| - fixed compilation on Windows on Arm | |||||
| - improved compile-time detection of SVE capability | |||||
| - added cpu autodetection and initial support for Apple M4 | |||||
| - added support for compilation on systems running iOS | |||||
| - added support for compilation on NetBSD ("evbarm" architecture) | |||||
| - fixed NRM2 implementations for generic SVE targets and the Neoverse N2 | |||||
| - fixed compilation for SVE-capable targets with the NVIDIA compiler | |||||
| x86_64: | |||||
| - fixed a wrong storage size in the SBGEMV kernel for Cooper Lake | |||||
| - added cpu autodetection for Intel Granite Rapids | |||||
| - added cpu autodetection for AMD Ryzen 5 series | |||||
| - added optimized SOMATCOPY_CT for AVX-capable targets | |||||
| - fixed the fallback implementation of GEMM3M in GENERIC builds | |||||
| - tentatively re-enabled builds with the EXPRECISION option | |||||
| - worked around a miscompilation of tests with mingw32-gfortran14 | |||||
| - added support for compilation with the Intel oneAPI 2025.0 compiler on Windows | |||||
| power: | |||||
| - fixed multithreaded SBGEMM | |||||
| - fixed a CMake build problem on POWER10 | |||||
| - improved the performance of SGEMV | |||||
| - added vectorized implementations of SBGEMV and support for forwarding 1xN SBGEMM to them | |||||
| - fixed illegal instructions and potential memory overflow in SGEMM on PPCG4 | |||||
| - fixed handling of NaN and Inf arguments in SSCAL and DSCAL on PPC440, G4 and 970 | |||||
| - added improved CGEMM and ZGEMM kernels for POWER10 | |||||
| - added Makefile logic to remove all optimization flags in DEBUG builds | |||||
| mips64: | |||||
| - fixed compilation with gcc14 | |||||
| - fixed GEMM parameter selection for the MIPS64_GENERIC target | |||||
| - fixed a potential build failure when compiling with OpenMP | |||||
| loongarch64: | |||||
| - fixed compilation for Loongson3 with recent versions of gmake | |||||
| - fixed a potential loss of precision in Loongson3A GEMM | |||||
| - fixed a potential build failure when compiling with OpenMP | |||||
| - added optimized SOMATCOPY for LASX-capable targets | |||||
| - introduced a new cpu naming scheme while retaining compatibility | |||||
| - added support for cross-compiling LoongArch64 targets with CMake | |||||
| - added support for compilation with LLVM | |||||
| riscv64: | |||||
| - removed thread yielding overhead caused by sched_yield | |||||
| - replaced some non-standard intrinsics with their official names | |||||
| - fixed and sped up the implementations of CGEMM/ZGEMM TCOPY for vector lengths 128 and 256 | |||||
| - improved the performance of SNRM2/DNRM2 for RVV1.0 targets | |||||
| - added optimized ?OMATCOPY_CN kernels for RVV1.0 targets | |||||
| ==================================================================== | ==================================================================== | ||||
| Version 0.3.28 | Version 0.3.28 | ||||
| 8-Aug-2024 | 8-Aug-2024 | ||||