| @@ -1,4 +1,115 @@ | |||
| OpenBLAS ChangeLog | |||
| ==================================================================== | |||
| Version 0.3.2 | |||
| 30-Jul-2018 | |||
| common: | |||
| * fixes for regressions caused by the rewrite of the thread | |||
| initialization code in 0.3.1 | |||
| POWER: | |||
| * fixed cpu autodetection for the BSDs | |||
| MIPS64: | |||
| * fixed utest errors in AXPY, DSDOT, ROT and SWAP | |||
| x86_64: | |||
| * added autodetection of AMD Ryzen 2 | |||
| * fixed build with older versions of MSVC | |||
| ==================================================================== | |||
| Version 0.3.1 | |||
| 01-Jul-2018 | |||
| common: | |||
| * rewritten thread initialization code with significantly reduced overhead | |||
| * added CBLAS interfaces to the IxAMIN BLAS extension functions | |||
| * fixed the lapack-test target | |||
| * CMake builds now create an OpenBLASConfig.cmake file | |||
| * ZAXPY now uses a single thread for small input sizes | |||
| * the LAPACK code was updated from Reference-LAPACK/lapack#253 | |||
| (fixing LAPACKE interfaces to Aasen's functions) | |||
| POWER: | |||
| * corrected CROT and ZROT behaviour with zero INC_X | |||
| ARMV7: | |||
| * corrected xDOT behaviour with zero INC_X or INC_Y | |||
| x86_64: | |||
| * made some older DYNAMIC_ARCH targets optional, building them only with the new option DYNAMIC_OLDER; | |||
| this affects PENRYN, DUNNINGTON, OPTERON, OPTERON_SSE3, BOBCAT, ATOM and NANO | |||
| (which will still be supported via the slower PRESCOTT kernels when this option is not set) | |||
| * added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) makes it possible to | |||
| specify the list of x86_64 targets to include. Any target not on the list will be supported | |||
| by the Sandybridge or Nehalem kernels if available, or by Prescott. | |||
| * improved SWITCH_RATIO on Haswell for increased GEMM throughput | |||
| * added initial support for Intel Skylake X, including an AVX512 SGEMM kernel | |||
| * added autodetection of Intel Cannon Lake series as Skylake X | |||
| * added a default L2 cache size for hypervisors that return zero here (Chromebook) | |||
| * fixed a name clash with recent Windows 10 headers that broke the build with (at least) | |||
| recent mingw from MSYS2 | |||
| * fixed a link error in mixed clang/gfortran builds with OpenMP | |||
| * updated the OSX deployment target to 10.8 | |||
| * switched on parallel make for builds on MS Windows by default | |||
| x86: | |||
| * fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y | |||
| ==================================================================== | |||
| Version 0.3.0 | |||
| 23-May-2018 | |||
| common: | |||
| * fixed some more thread race and locking bugs | |||
| * added preliminary support for calling an OpenMP build of the library from multiple threads | |||
| * removed the performance impact on OpenMP code of the thread locks added in 0.2.20 | |||
| * general code cleanup | |||
| * optimized DSDOT implementation | |||
| * improved thread distribution for GEMM | |||
| * corrected IMATCOPY/OMATCOPY implementation | |||
| * fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations | |||
| * CMake build improvements | |||
| * pkgconfig file now contains build options | |||
| * openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build | |||
| * corrections and improvements for systems with more than 64 cpus | |||
| * LAPACK code updated to 3.8.0 including later fixes | |||
| * added ReLAPACK, a recursive implementation of several LAPACK functions | |||
| * rewrote ROTMG to handle cases that the netlib code failed to address | |||
| * disabled (broken) multithreading code for xTRMV | |||
| * corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard | |||
| * shared memory access failures on startup are now handled more gracefully | |||
| * restored utests from earlier releases (and made them pass on all affected systems) | |||
| SPARC: | |||
| * several fixes for cpu autodetection | |||
| POWER: | |||
| * corrected vector register overwriting in several Power8 kernels | |||
| * optimized additional BLAS functions | |||
| ARM: | |||
| * added support for Cortex-A53 and Cortex-A72 | |||
| * added autodetection for ThunderX2T99 | |||
| * made most optimized kernels the default for generic ARMv8 targets | |||
| x86_64: | |||
| * parallelized DDOT kernel for Haswell | |||
| * changed alignment directives in assembly kernels to boost performance on OSX | |||
| * fixed register handling in the GEMV microkernels (bug exposed by gcc7) | |||
| * added support for building on OpenBSD and DragonFly BSD | |||
| * updated compiler options to work with Intel release 2018 | |||
| * added support for fully optimized builds with clang/flang on Microsoft Windows | |||
| * fixed building on AIX | |||
| IBM Z: | |||
| * added optimized BLAS 1/2 functions | |||
| MIPS: | |||
| * fixed cpu autodetection helper code | |||
| * added mips32 1004K cpu (MediaTek MT7621 and similar SoC) | |||
| * added mips64 I6500 cpu | |||
| ==================================================================== | |||
| Version 0.2.20 | |||
| 24-Jul-2017 | |||