Update with the changes from 0.3.4

360374be62
--- a/Changelog.txt
+++ b/Changelog.txt
@@ -1,4 +1,77 @@
 OpenBLAS ChangeLog
 ====================================================================
 Version 0.3.4
 02-Dec-2018

 common:
 	* The new, experimental thread-local memory allocation had
 	  inadvertently been left enabled for gmake builds in 0.3.3,
 	  contrary to the release announcement. It is now disabled by
 	  default, and single-threaded builds will keep using the old
 	  allocator even if the USE_TLS option is turned on.
 	* OpenBLAS will now provide enough buffer space for at least 50
 	  threads by default.
 	* The output of openblas_get_config() now contains the version
 	  number.
 	* A serious thread-safety bug in the GEMV operation with small M
 	  and large N has been fixed.
 	* The code will now automatically call blas_thread_init after a
 	  fork, if needed, before handling a call to openblas_set_num_threads.
 	* Accesses to parallelized level3 functions from multiple callers
 	  are now serialized to avoid thread races (unless using OpenMP).
 	  This should provide better performance than the known-threadsafe
 	  (but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
 	* When building LAPACK with gfortran, -frecursive is now (again)
 	  enabled by default to ensure correct behaviour.
 	* The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and
 	  CBLAS_LAYOUT as the name of the matrix row/column order option.
 	* Externally set LDFLAGS are now passed through to the final compile/link
 	  steps to facilitate setting platform-specific linker flags.
 	* A potential race condition during the build of LAPACK (that would 
 	  usually manifest itself as a failure to build TESTING/MATGEN) has been 
 	  fixed.
 	* xHEMV has been changed to stay single-threaded for small input sizes,
 	  where the overhead of multithreading exceeds any possible gains.
 	* CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or
 	  ThunderX hardware with sizable input.
 	* Linker flags for the PGI compiler have been updated.
 	* Behaviour of AXPY with zero increments is now handled in the C interface,
 	  correcting the result on at least Intel Atom.
 	* The result matrix from calling SGELSS with an all-zero input matrix is 
 	  now zeroed completely.
 	  
 x86_64:
 	* Autodetection of AMD Ryzen2 has been fixed (again).
 	* CMake builds now support labeling an INTERFACE64=1 build of
 	  the library with the _64 suffix.
 	* An AVX512 version of DGEMM has been added, and the AVX512 SGEMM
 	  kernel has been sped up by rewriting it with C intrinsics.
 	* Compilation on RHEL5/CentOS5 has been fixed (issue with typename
 	  __WAIT_STATUS).
 	
 POWER:
 	* Support for building on AIX (with gcc and GNU tools from the AIX
 	  Toolbox) has been added.
 	* CPU type detection has been implemented for AIX.
 	* CPU type detection has been fixed for NetBSD.
 	
 MIPS64:
 	* AXPY on LOONGSON3A has been corrected to pass the "zero increment" utest.
 	* DSDOT on LOONGSON3A has been fixed.
 	* The SGEMM microkernel has been hardened against potential data loss.
 	
 ARMV8:
 	* DYNAMIC_ARCH support is now available for 64-bit ARM.
 	* Cross-compiling for ARMV8 under iOS now works.
 	* CPU-specific code has been rearranged to make better use of both
 	  hardware commonalities and model-specific compiler optimizations.
 	* XGENE1 has been removed as a TARGET, superseded by the improved generic
 	  ARMV8 support.
 	
 ARMV7:
 	* Older assembly mnemonics have been converted to UAL form to allow
 	  building with clang 7.0.
 	* Cross-compiling LAPACKE for Android has been fixed again (it was
 	  broken by the update to LAPACK 3.7.0 some time ago).
 	  
 ====================================================================
 Version 0.3.3
 31-Aug-2018