You can not select more than 25 topics Topics must start with a chinese character,a letter or number, can include dashes ('-') and can be up to 35 characters long.

Makefile.system 26 kB

rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for #2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For #2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009 * Correct range_n limiting same bug as seen in #1388, somehow missed in corresponding PR #1389 * Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes #2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for #2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422 * init * address warning introed with #1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for #2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug #2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for #2048 * Add Intel Denverton * Change 64-bit detection as explained in #2056 * Trivial typo fix as suggested in #2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in #1955 and #2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for #2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes #2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see https://github.com/Reference-LAPACK/lapack/issues/333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes #2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955 * Disable DGEMMINCOPY as well for now #1955 * init * Fix errors in cpu enumeration with glibc 2.6 for #2114 * Change two http links to https Closes #2109 * remove redundant code #2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of #2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix #2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for #1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for #2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b5 introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for #2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for #2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for #2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes #2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
6 years ago
7 years ago
rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for #2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For #2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009 * Correct range_n limiting same bug as seen in #1388, somehow missed in corresponding PR #1389 * Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes #2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for #2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422 * init * address warning introed with #1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for #2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug #2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for #2048 * Add Intel Denverton * Change 64-bit detection as explained in #2056 * Trivial typo fix as suggested in #2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in #1955 and #2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for #2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes #2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see https://github.com/Reference-LAPACK/lapack/issues/333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes #2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955 * Disable DGEMMINCOPY as well for now #1955 * init * Fix errors in cpu enumeration with glibc 2.6 for #2114 * Change two http links to https Closes #2109 * remove redundant code #2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of #2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix #2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for #1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for #2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b5 introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for #2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for #2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for #2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes #2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
6 years ago
rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for #2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For #2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009 * Correct range_n limiting same bug as seen in #1388, somehow missed in corresponding PR #1389 * Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes #2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for #2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422 * init * address warning introed with #1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for #2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug #2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for #2048 * Add Intel Denverton * Change 64-bit detection as explained in #2056 * Trivial typo fix as suggested in #2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in #1955 and #2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for #2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes #2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see https://github.com/Reference-LAPACK/lapack/issues/333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes #2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955 * Disable DGEMMINCOPY as well for now #1955 * init * Fix errors in cpu enumeration with glibc 2.6 for #2114 * Change two http links to https Closes #2109 * remove redundant code #2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of #2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix #2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for #1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for #2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b5 introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for #2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for #2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for #2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes #2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
6 years ago
rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for #2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For #2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009 * Correct range_n limiting same bug as seen in #1388, somehow missed in corresponding PR #1389 * Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes #2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for #2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422 * init * address warning introed with #1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for #2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug #2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for #2048 * Add Intel Denverton * Change 64-bit detection as explained in #2056 * Trivial typo fix as suggested in #2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in #1955 and #2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for #2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes #2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see https://github.com/Reference-LAPACK/lapack/issues/333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes #2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955 * Disable DGEMMINCOPY as well for now #1955 * init * Fix errors in cpu enumeration with glibc 2.6 for #2114 * Change two http links to https Closes #2109 * remove redundant code #2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of #2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix #2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for #1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for #2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b5 introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for #2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for #2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for #2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes #2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
6 years ago
13 years ago
rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for #2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For #2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009 * Correct range_n limiting same bug as seen in #1388, somehow missed in corresponding PR #1389 * Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes #2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for #2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422 * init * address warning introed with #1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for #2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug #2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for #2048 * Add Intel Denverton * Change 64-bit detection as explained in #2056 * Trivial typo fix as suggested in #2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in #1955 and #2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for #2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes #2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see https://github.com/Reference-LAPACK/lapack/issues/333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes #2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955 * Disable DGEMMINCOPY as well for now #1955 * init * Fix errors in cpu enumeration with glibc 2.6 for #2114 * Change two http links to https Closes #2109 * remove redundant code #2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of #2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix #2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for #1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for #2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b5 introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for #2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for #2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for #2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes #2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
6 years ago
rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for #2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For #2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009 * Correct range_n limiting same bug as seen in #1388, somehow missed in corresponding PR #1389 * Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes #2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for #2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422 * init * address warning introed with #1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for #2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug #2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for #2048 * Add Intel Denverton * Change 64-bit detection as explained in #2056 * Trivial typo fix as suggested in #2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in #1955 and #2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for #2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes #2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see https://github.com/Reference-LAPACK/lapack/issues/333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes #2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955 * Disable DGEMMINCOPY as well for now #1955 * init * Fix errors in cpu enumeration with glibc 2.6 for #2114 * Change two http links to https Closes #2109 * remove redundant code #2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of #2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix #2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for #1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for #2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b5 introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for #2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for #2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for #2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes #2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
6 years ago
rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for #2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For #2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009 * Correct range_n limiting same bug as seen in #1388, somehow missed in corresponding PR #1389 * Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes #2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for #2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422 * init * address warning introed with #1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for #2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug #2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for #2048 * Add Intel Denverton * Change 64-bit detection as explained in #2056 * Trivial typo fix as suggested in #2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in #1955 and #2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for #2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes #2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see https://github.com/Reference-LAPACK/lapack/issues/333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes #2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955 * Disable DGEMMINCOPY as well for now #1955 * init * Fix errors in cpu enumeration with glibc 2.6 for #2114 * Change two http links to https Closes #2109 * remove redundant code #2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of #2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix #2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for #1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for #2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b5 introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for #2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for #2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for #2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes #2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
6 years ago
rebase? (#1) * With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for #2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For #2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For #2009 * Correct range_n limiting same bug as seen in #1388, somehow missed in corresponding PR #1389 * Allow multithreading TRMV again revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes #2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for #2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422 * init * address warning introed with #1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for #2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug #2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for #2048 * Add Intel Denverton * Change 64-bit detection as explained in #2056 * Trivial typo fix as suggested in #2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in #1955 and #2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for #2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes #2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see https://github.com/Reference-LAPACK/lapack/issues/333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes #2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955 * Disable DGEMMINCOPY as well for now #1955 * init * Fix errors in cpu enumeration with glibc 2.6 for #2114 * Change two http links to https Closes #2109 * remove redundant code #2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of #2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix #2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for #1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for #2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b5 introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for #2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for #2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for #2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes #2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
6 years ago
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268368468568668768868969069169269369469569669769869970070170270370470570670770870971071171271371471571671771871972072172272372472572672772872973073173273373473573673773873974074174274374474574674774874975075175275375475575675775875976076176276376476576676776876977077177277377477577677777877978078178278378478578678778878979079179279379479579679779879980080180280380480580680780880981081181281381481581681781881982082182282382482582682782882983083183283383483583683783883984084184284384484584684784884985085185285385485585685785885986086186286386486586686786886987087187287387487587687787887988088188288388488588688788888989089189289389489589689789889990090190290390490590690790890991091191291391491591691791891992092192292392492592692792892993093193293393493593693793893994094194294394494594694794894995095195295395495595695795895996096196296396496596696796896997097197297397497597697797897998098198298398498598698798898999099199299399499599699799899910001001100210031004100510061007100810091010101110121013101410151016101710181019102010211022102310241025102610271028102910301031103210331034103510361037103810391040104110421043104410451046104710481049105010511052105310541055105610571058105910601061106210631064106510661067106810691070107110721073107410751076107710781079108010811082108310841085108610871088108910901091109210931094109510961097109810991100110111021103110411051106110711081109111011111112111311141115111611171118111911201121112211231124112511261127112811291130113111321133113411351136113711381139114011411142114311441145114611471148114911501151115211531154115511561157115811591160116111621163116411651166116711681169117011711172117311741175117611771178117911801181118211831184118511861187118811891190119111921193119411951196119711981199120012011202120312041205120612071208120912101211121212131214121512161217121812191220122112221223122412251226122712281229123012311232123312341235123612371238123912401241124212431244124512461247124812491250125112521253125412551256125712581259126012611262126312641265126612671268126912701271127212731274127512761277127812791280128112821283128412851286128712881289129012911292129312941295129612971298129913001301130213031304130513061307130813091310131113121313131413151316131713181319132013211322132313241325132613271328132913301331133213331334133513361337133813391340134113421343134413451346134713481349135013511352135313541355135613571358135913601361136213631364136513661367136813691370137113721373137413751376137713781379138013811382138313841385138613871388138913901391139213931394139513961397139813991400
  1. #
  2. # Include user definition
  3. #
  4. # TO suppress recursive includes
  5. INCLUDED = 1
  6. ifndef TOPDIR
  7. TOPDIR = .
  8. endif
  9. # If ARCH is not set, we use the host system's architecture.
  10. ifndef ARCH
  11. ARCH := $(shell uname -m)
  12. endif
  13. # Catch conflicting usage of ARCH in some BSD environments
  14. ifeq ($(ARCH), amd64)
  15. override ARCH=x86_64
  16. else ifeq ($(ARCH), powerpc64)
  17. override ARCH=power
  18. else ifeq ($(ARCH), i386)
  19. override ARCH=x86
  20. else ifeq ($(ARCH), aarch64)
  21. override ARCH=arm64
  22. endif
  23. NETLIB_LAPACK_DIR = $(TOPDIR)/lapack-netlib
  24. # Default C compiler
  25. # - Only set if not specified on the command line or inherited from the environment.
  26. # - CC is an implicit variable so neither '?=' or 'ifndef' can be used.
  27. # http://stackoverflow.com/questions/4029274/mingw-and-make-variables
  28. # - Default value is 'cc' which is not always a valid command (e.g. MinGW).
  29. ifeq ($(origin CC),default)
  30. # Check if $(CC) refers to a valid command and set the value to gcc if not
  31. ifneq ($(findstring cmd.exe,$(SHELL)),)
  32. ifeq ($(shell where $(CC) 2>NUL),)
  33. CC = gcc
  34. endif
  35. else # POSIX-ish
  36. ifeq ($(shell command -v $(CC) 2>/dev/null),)
  37. ifeq ($(shell uname -s),Darwin)
  38. CC = clang
  39. # EXTRALIB += -Wl,-no_compact_unwind
  40. else
  41. CC = gcc
  42. endif # Darwin
  43. endif # CC exists
  44. endif # Shell is sane
  45. endif # CC is set to default
  46. # Default Fortran compiler (FC) is selected by f_check.
  47. ifndef MAKEFILE_RULE
  48. include $(TOPDIR)/Makefile.rule
  49. else
  50. include $(TOPDIR)/$(MAKEFILE_RULE)
  51. endif
  52. #
  53. # Beginning of system configuration
  54. #
  55. ifndef HOSTCC
  56. HOSTCC = $(CC)
  57. endif
  58. ifdef TARGET
  59. GETARCH_FLAGS := -DFORCE_$(TARGET)
  60. GETARCH_FLAGS += -DUSER_TARGET
  61. endif
  62. # Force fallbacks for 32bit
  63. ifeq ($(BINARY), 32)
  64. ifeq ($(TARGET), HASWELL)
  65. GETARCH_FLAGS := -DFORCE_NEHALEM
  66. endif
  67. ifeq ($(TARGET), SKYLAKEX)
  68. GETARCH_FLAGS := -DFORCE_NEHALEM
  69. endif
  70. ifeq ($(TARGET), SANDYBRIDGE)
  71. GETARCH_FLAGS := -DFORCE_NEHALEM
  72. endif
  73. ifeq ($(TARGET), BULLDOZER)
  74. GETARCH_FLAGS := -DFORCE_BARCELONA
  75. endif
  76. ifeq ($(TARGET), PILEDRIVER)
  77. GETARCH_FLAGS := -DFORCE_BARCELONA
  78. endif
  79. ifeq ($(TARGET), STEAMROLLER)
  80. GETARCH_FLAGS := -DFORCE_BARCELONA
  81. endif
  82. ifeq ($(TARGET), EXCAVATOR)
  83. GETARCH_FLAGS := -DFORCE_BARCELONA
  84. endif
  85. ifeq ($(TARGET), ZEN)
  86. GETARCH_FLAGS := -DFORCE_BARCELONA
  87. endif
  88. ifeq ($(TARGET), ARMV8)
  89. GETARCH_FLAGS := -DFORCE_ARMV7
  90. endif
  91. endif
  92. #TARGET_CORE will override TARGET which is used in DYNAMIC_ARCH=1.
  93. #
  94. ifdef TARGET_CORE
  95. GETARCH_FLAGS := -DFORCE_$(TARGET_CORE)
  96. endif
  97. # Force fallbacks for 32bit
  98. ifeq ($(BINARY), 32)
  99. ifeq ($(TARGET_CORE), HASWELL)
  100. GETARCH_FLAGS := -DFORCE_NEHALEM
  101. endif
  102. ifeq ($(TARGET_CORE), SKYLAKEX)
  103. GETARCH_FLAGS := -DFORCE_NEHALEM
  104. endif
  105. ifeq ($(TARGET_CORE), SANDYBRIDGE)
  106. GETARCH_FLAGS := -DFORCE_NEHALEM
  107. endif
  108. ifeq ($(TARGET_CORE), BULLDOZER)
  109. GETARCH_FLAGS := -DFORCE_BARCELONA
  110. endif
  111. ifeq ($(TARGET_CORE), PILEDRIVER)
  112. GETARCH_FLAGS := -DFORCE_BARCELONA
  113. endif
  114. ifeq ($(TARGET_CORE), STEAMROLLER)
  115. GETARCH_FLAGS := -DFORCE_BARCELONA
  116. endif
  117. ifeq ($(TARGET_CORE), EXCAVATOR)
  118. GETARCH_FLAGS := -DFORCE_BARCELONA
  119. endif
  120. ifeq ($(TARGET_CORE), ZEN)
  121. GETARCH_FLAGS := -DFORCE_BARCELONA
  122. endif
  123. endif
  124. # On x86_64 build getarch with march=native. This is required to detect AVX512 support in getarch.
  125. ifeq ($(ARCH), x86_64)
  126. ifneq ($(C_COMPILER), PGI)
  127. GETARCH_FLAGS += -march=native
  128. endif
  129. endif
  130. ifdef INTERFACE64
  131. ifneq ($(INTERFACE64), 0)
  132. GETARCH_FLAGS += -DUSE64BITINT
  133. endif
  134. endif
  135. ifndef GEMM_MULTITHREAD_THRESHOLD
  136. GEMM_MULTITHREAD_THRESHOLD=4
  137. endif
  138. GETARCH_FLAGS += -DGEMM_MULTITHREAD_THRESHOLD=$(GEMM_MULTITHREAD_THRESHOLD)
  139. ifeq ($(NO_AVX), 1)
  140. GETARCH_FLAGS += -DNO_AVX
  141. endif
  142. ifeq ($(BINARY), 32)
  143. GETARCH_FLAGS += -DNO_AVX -DNO_AVX2 -DNO_AVX512
  144. NO_AVX512 = 1
  145. endif
  146. ifeq ($(NO_AVX2), 1)
  147. GETARCH_FLAGS += -DNO_AVX2
  148. endif
  149. ifeq ($(NO_AVX512), 1)
  150. GETARCH_FLAGS += -DNO_AVX512
  151. endif
  152. ifeq ($(DEBUG), 1)
  153. GETARCH_FLAGS += -g
  154. endif
  155. ifeq ($(QUIET_MAKE), 1)
  156. MAKE += -s
  157. endif
  158. ifndef NO_PARALLEL_MAKE
  159. NO_PARALLEL_MAKE=0
  160. endif
  161. GETARCH_FLAGS += -DNO_PARALLEL_MAKE=$(NO_PARALLEL_MAKE)
  162. ifdef MAKE_NB_JOBS
  163. GETARCH_FLAGS += -DMAKE_NB_JOBS=$(MAKE_NB_JOBS)
  164. endif
  165. ifeq ($(HOSTCC), loongcc)
  166. GETARCH_FLAGS += -static
  167. endif
  168. #if don't use Fortran, it will only compile CBLAS.
  169. ifeq ($(ONLY_CBLAS), 1)
  170. NO_LAPACK = 1
  171. else
  172. ONLY_CBLAS = 0
  173. endif
  174. # This operation is expensive, so execution should be once.
  175. ifndef GOTOBLAS_MAKEFILE
  176. export GOTOBLAS_MAKEFILE = 1
  177. # Generating Makefile.conf and config.h
  178. DUMMY := $(shell $(MAKE) -C $(TOPDIR) -f Makefile.prebuild CC="$(CC)" FC="$(FC)" HOSTCC="$(HOSTCC)" CFLAGS="$(GETARCH_FLAGS)" BINARY=$(BINARY) USE_OPENMP=$(USE_OPENMP) TARGET_CORE=$(TARGET_CORE) ONLY_CBLAS=$(ONLY_CBLAS) TARGET=$(TARGET) all)
  179. ifndef TARGET_CORE
  180. include $(TOPDIR)/Makefile.conf
  181. else
  182. include $(TOPDIR)/Makefile_kernel.conf
  183. endif
  184. endif
  185. ifndef NUM_PARALLEL
  186. NUM_PARALLEL = 1
  187. endif
  188. ifndef NUM_THREADS
  189. NUM_THREADS = $(NUM_CORES)
  190. endif
  191. ifeq ($(NUM_THREADS), 1)
  192. override USE_THREAD = 0
  193. override USE_OPENMP = 0
  194. endif
  195. ifdef USE_THREAD
  196. ifeq ($(USE_THREAD), 0)
  197. SMP =
  198. else
  199. SMP = 1
  200. endif
  201. else
  202. ifeq ($(NUM_THREAD), 1)
  203. SMP =
  204. else
  205. SMP = 1
  206. endif
  207. endif
  208. ifeq ($(SMP), 1)
  209. USE_LOCKING =
  210. endif
  211. ifndef NEED_PIC
  212. NEED_PIC = 1
  213. endif
  214. ARFLAGS =
  215. CPP = $(COMPILER) -E
  216. AR = $(CROSS_SUFFIX)ar
  217. AS = $(CROSS_SUFFIX)as
  218. LD = $(CROSS_SUFFIX)ld
  219. RANLIB = $(CROSS_SUFFIX)ranlib
  220. NM = $(CROSS_SUFFIX)nm
  221. DLLWRAP = $(CROSS_SUFFIX)dllwrap
  222. OBJCOPY = $(CROSS_SUFFIX)objcopy
  223. OBJCONV = $(CROSS_SUFFIX)objconv
  224. # For detect fortran failed, only build BLAS.
  225. ifeq ($(NOFORTRAN), 1)
  226. NO_LAPACK = 1
  227. endif
  228. #
  229. # OS dependent settings
  230. #
  231. ifeq ($(OSNAME), Darwin)
  232. ifndef MACOSX_DEPLOYMENT_TARGET
  233. export MACOSX_DEPLOYMENT_TARGET=10.8
  234. endif
  235. MD5SUM = md5 -r
  236. endif
  237. ifneq (,$(findstring $(OSNAME), FreeBSD OpenBSD DragonFly))
  238. MD5SUM = md5 -r
  239. endif
  240. ifeq ($(OSNAME), NetBSD)
  241. MD5SUM = md5 -n
  242. endif
  243. ifeq ($(OSNAME), Linux)
  244. EXTRALIB += -lm
  245. NO_EXPRECISION = 1
  246. endif
  247. ifeq ($(OSNAME), Android)
  248. EXTRALIB += -lm
  249. endif
  250. ifeq ($(OSNAME), AIX)
  251. EXTRALIB += -lm
  252. endif
  253. ifeq ($(OSNAME), WINNT)
  254. NEED_PIC = 0
  255. NO_EXPRECISION = 1
  256. EXTRALIB += -defaultlib:advapi32
  257. SUFFIX = obj
  258. PSUFFIX = pobj
  259. LIBSUFFIX = a
  260. ifeq ($(C_COMPILER), CLANG)
  261. CCOMMON_OPT += -DMS_ABI
  262. endif
  263. ifeq ($(C_COMPILER), GCC)
  264. #Test for supporting MS_ABI
  265. GCCVERSIONGTEQ4 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \>= 4)
  266. GCCVERSIONGT4 := $(shell expr `$(CC) -dumpversion | cut -f1 -d.` \> 4)
  267. GCCMINORVERSIONGTEQ7 := $(shell expr `$(CC) -dumpversion | cut -f2 -d.` \>= 7)
  268. ifeq ($(GCCVERSIONGT4), 1)
  269. # GCC Majar version > 4
  270. # It is compatible with MSVC ABI.
  271. CCOMMON_OPT += -DMS_ABI
  272. endif
  273. ifeq ($(GCCVERSIONGTEQ4), 1)
  274. ifeq ($(GCCMINORVERSIONGTEQ7), 1)
  275. # GCC Version >=4.7
  276. # It is compatible with MSVC ABI.
  277. CCOMMON_OPT += -DMS_ABI
  278. endif
  279. endif
  280. endif
  281. # Ensure the correct stack alignment on Win32
  282. # http://permalink.gmane.org/gmane.comp.lib.openblas.general/97
  283. ifeq ($(ARCH), x86)
  284. CCOMMON_OPT += -mincoming-stack-boundary=2
  285. FCOMMON_OPT += -mincoming-stack-boundary=2
  286. endif
  287. endif
  288. ifeq ($(OSNAME), Interix)
  289. NEED_PIC = 0
  290. NO_EXPRECISION = 1
  291. INTERIX_TOOL_DIR = /opt/gcc.3.3/i586-pc-interix3/bin
  292. endif
  293. ifeq ($(OSNAME), CYGWIN_NT)
  294. NEED_PIC = 0
  295. NO_EXPRECISION = 1
  296. OS_CYGWIN_NT = 1
  297. endif
  298. ifneq ($(OSNAME), WINNT)
  299. ifneq ($(OSNAME), CYGWIN_NT)
  300. ifneq ($(OSNAME), Interix)
  301. ifneq ($(OSNAME), Android)
  302. ifdef SMP
  303. EXTRALIB += -lpthread
  304. endif
  305. endif
  306. endif
  307. endif
  308. endif
  309. # ifeq logical or
  310. ifeq ($(OSNAME), $(filter $(OSNAME),WINNT CYGWIN_NT Interix))
  311. OS_WINDOWS=1
  312. endif
  313. ifdef QUAD_PRECISION
  314. CCOMMON_OPT += -DQUAD_PRECISION
  315. NO_EXPRECISION = 1
  316. endif
  317. ifneq ($(ARCH), x86)
  318. ifneq ($(ARCH), x86_64)
  319. NO_EXPRECISION = 1
  320. endif
  321. endif
  322. ifdef UTEST_CHECK
  323. CCOMMON_OPT += -DUTEST_CHECK
  324. SANITY_CHECK = 1
  325. endif
  326. ifdef SANITY_CHECK
  327. CCOMMON_OPT += -DSANITY_CHECK -DREFNAME=$(*F)f$(BU)
  328. endif
  329. MAX_STACK_ALLOC ?= 2048
  330. ifneq ($(MAX_STACK_ALLOC), 0)
  331. CCOMMON_OPT += -DMAX_STACK_ALLOC=$(MAX_STACK_ALLOC)
  332. endif
  333. ifdef USE_LOCKING
  334. ifneq ($(USE_LOCKING), 0)
  335. CCOMMON_OPT += -DUSE_LOCKING
  336. endif
  337. endif
  338. #
  339. # Architecture dependent settings
  340. #
  341. ifeq ($(ARCH), x86)
  342. ifndef BINARY
  343. NO_BINARY_MODE = 1
  344. endif
  345. ifeq ($(CORE), generic)
  346. NO_EXPRECISION = 1
  347. endif
  348. ifndef NO_EXPRECISION
  349. ifeq ($(F_COMPILER), GFORTRAN)
  350. # ifeq logical or. GCC or LSB
  351. ifeq ($(C_COMPILER), $(filter $(C_COMPILER),GCC LSB))
  352. EXPRECISION = 1
  353. CCOMMON_OPT += -DEXPRECISION -m128bit-long-double
  354. FCOMMON_OPT += -m128bit-long-double
  355. endif
  356. ifeq ($(C_COMPILER), CLANG)
  357. EXPRECISION = 1
  358. CCOMMON_OPT += -DEXPRECISION
  359. FCOMMON_OPT += -m128bit-long-double
  360. endif
  361. endif
  362. endif
  363. endif
  364. ifeq ($(ARCH), x86_64)
  365. ifeq ($(CORE), generic)
  366. NO_EXPRECISION = 1
  367. endif
  368. ifndef NO_EXPRECISION
  369. ifeq ($(F_COMPILER), GFORTRAN)
  370. # ifeq logical or. GCC or LSB
  371. ifeq ($(C_COMPILER), $(filter $(C_COMPILER),GCC LSB))
  372. EXPRECISION = 1
  373. CCOMMON_OPT += -DEXPRECISION -m128bit-long-double
  374. FCOMMON_OPT += -m128bit-long-double
  375. endif
  376. ifeq ($(C_COMPILER), CLANG)
  377. EXPRECISION = 1
  378. CCOMMON_OPT += -DEXPRECISION
  379. FCOMMON_OPT += -m128bit-long-double
  380. endif
  381. endif
  382. endif
  383. endif
  384. ifeq ($(C_COMPILER), INTEL)
  385. CCOMMON_OPT += -wd981
  386. endif
  387. ifeq ($(USE_OPENMP), 1)
  388. #check
  389. ifeq ($(USE_THREAD), 0)
  390. $(error OpenBLAS: Cannot set both USE_OPENMP=1 and USE_THREAD=0. The USE_THREAD=0 is only for building single thread version.)
  391. endif
  392. # ifeq logical or. GCC or LSB
  393. ifeq ($(C_COMPILER), $(filter $(C_COMPILER),GCC LSB))
  394. CCOMMON_OPT += -fopenmp
  395. endif
  396. ifeq ($(C_COMPILER), CLANG)
  397. CCOMMON_OPT += -fopenmp
  398. endif
  399. ifeq ($(C_COMPILER), INTEL)
  400. CCOMMON_OPT += -fopenmp
  401. endif
  402. ifeq ($(C_COMPILER), PGI)
  403. CCOMMON_OPT += -mp
  404. endif
  405. ifeq ($(C_COMPILER), OPEN64)
  406. CCOMMON_OPT += -mp
  407. CEXTRALIB += -lstdc++
  408. endif
  409. ifeq ($(C_COMPILER), PATHSCALE)
  410. CCOMMON_OPT += -mp
  411. endif
  412. endif
  413. ifeq ($(DYNAMIC_ARCH), 1)
  414. ifeq ($(ARCH), x86)
  415. DYNAMIC_CORE = KATMAI COPPERMINE NORTHWOOD PRESCOTT BANIAS \
  416. CORE2 PENRYN DUNNINGTON NEHALEM ATHLON OPTERON OPTERON_SSE3 BARCELONA BOBCAT ATOM NANO
  417. endif
  418. ifeq ($(ARCH), x86_64)
  419. DYNAMIC_CORE = PRESCOTT CORE2
  420. ifeq ($(DYNAMIC_OLDER), 1)
  421. DYNAMIC_CORE += PENRYN DUNNINGTON
  422. endif
  423. DYNAMIC_CORE += NEHALEM
  424. ifeq ($(DYNAMIC_OLDER), 1)
  425. DYNAMIC_CORE += OPTERON OPTERON_SSE3
  426. endif
  427. DYNAMIC_CORE += BARCELONA
  428. ifeq ($(DYNAMIC_OLDER), 1)
  429. DYNAMIC_CORE += BOBCAT ATOM NANO
  430. endif
  431. ifneq ($(NO_AVX), 1)
  432. DYNAMIC_CORE += SANDYBRIDGE BULLDOZER PILEDRIVER STEAMROLLER EXCAVATOR
  433. endif
  434. ifneq ($(NO_AVX2), 1)
  435. DYNAMIC_CORE += HASWELL ZEN
  436. endif
  437. ifneq ($(NO_AVX512), 1)
  438. ifneq ($(NO_AVX2), 1)
  439. DYNAMIC_CORE += SKYLAKEX
  440. endif
  441. endif
  442. endif
  443. ifdef DYNAMIC_LIST
  444. override DYNAMIC_CORE = PRESCOTT $(DYNAMIC_LIST)
  445. XCCOMMON_OPT = -DDYNAMIC_LIST -DDYN_PRESCOTT
  446. XCCOMMON_OPT += $(foreach dcore,$(DYNAMIC_LIST),-DDYN_$(dcore))
  447. CCOMMON_OPT += $(XCCOMMON_OPT)
  448. #CCOMMON_OPT += -DDYNAMIC_LIST='$(DYNAMIC_LIST)'
  449. endif
  450. ifeq ($(ARCH), arm64)
  451. DYNAMIC_CORE = ARMV8
  452. DYNAMIC_CORE += CORTEXA57
  453. DYNAMIC_CORE += THUNDERX
  454. DYNAMIC_CORE += THUNDERX2T99
  455. endif
  456. ifeq ($(ARCH), power)
  457. DYNAMIC_CORE = POWER6
  458. DYNAMIC_CORE += POWER8
  459. DYNAMIC_CORE += POWER9
  460. endif
  461. # If DYNAMIC_CORE is not set, DYNAMIC_ARCH cannot do anything, so force it to empty
  462. ifndef DYNAMIC_CORE
  463. override DYNAMIC_ARCH=
  464. endif
  465. endif
  466. ifeq ($(ARCH), ia64)
  467. NO_BINARY_MODE = 1
  468. BINARY_DEFINED = 1
  469. ifeq ($(F_COMPILER), GFORTRAN)
  470. ifeq ($(C_COMPILER), GCC)
  471. # EXPRECISION = 1
  472. # CCOMMON_OPT += -DEXPRECISION
  473. endif
  474. endif
  475. endif
  476. ifeq ($(ARCH), $(filter $(ARCH),mips64 mips))
  477. NO_BINARY_MODE = 1
  478. endif
  479. ifeq ($(ARCH), alpha)
  480. NO_BINARY_MODE = 1
  481. BINARY_DEFINED = 1
  482. endif
  483. ifeq ($(ARCH), arm)
  484. NO_BINARY_MODE = 1
  485. BINARY_DEFINED = 1
  486. CCOMMON_OPT += -marm
  487. FCOMMON_OPT += -marm
  488. # If softfp abi is mentioned on the command line, force it.
  489. ifeq ($(ARM_SOFTFP_ABI), 1)
  490. CCOMMON_OPT += -mfloat-abi=softfp
  491. FCOMMON_OPT += -mfloat-abi=softfp
  492. endif
  493. ifeq ($(OSNAME), Android)
  494. ifeq ($(ARM_SOFTFP_ABI), 1)
  495. EXTRALIB += -lm
  496. else
  497. EXTRALIB += -Wl,-lm_hard
  498. endif
  499. endif
  500. endif
  501. ifeq ($(ARCH), arm64)
  502. NO_BINARY_MODE = 1
  503. BINARY_DEFINED = 1
  504. endif
  505. #
  506. # C Compiler dependent settings
  507. #
  508. # ifeq logical or. GCC or CLANG or LSB
  509. # http://stackoverflow.com/questions/7656425/makefile-ifeq-logical-or
  510. ifeq ($(C_COMPILER), $(filter $(C_COMPILER),GCC CLANG LSB))
  511. CCOMMON_OPT += -Wall
  512. COMMON_PROF += -fno-inline
  513. NO_UNINITIALIZED_WARN = -Wno-uninitialized
  514. ifeq ($(QUIET_MAKE), 1)
  515. CCOMMON_OPT += $(NO_UNINITIALIZED_WARN) -Wno-unused
  516. endif
  517. ifdef NO_BINARY_MODE
  518. ifeq ($(ARCH), $(filter $(ARCH),mips64))
  519. ifdef BINARY64
  520. CCOMMON_OPT += -mabi=64
  521. else
  522. CCOMMON_OPT += -mabi=n32
  523. endif
  524. BINARY_DEFINED = 1
  525. else ifeq ($(ARCH), $(filter $(ARCH),mips))
  526. CCOMMON_OPT += -mabi=32
  527. BINARY_DEFINED = 1
  528. endif
  529. ifeq ($(CORE), LOONGSON3A)
  530. CCOMMON_OPT += -march=mips64
  531. FCOMMON_OPT += -march=mips64
  532. endif
  533. ifeq ($(CORE), LOONGSON3B)
  534. CCOMMON_OPT += -march=mips64
  535. FCOMMON_OPT += -march=mips64
  536. endif
  537. ifeq ($(CORE), 1004K)
  538. CCOMMON_OPT += -mips32r2 $(MSA_FLAGS)
  539. FCOMMON_OPT += -mips32r2 $(MSA_FLAGS)
  540. endif
  541. ifeq ($(CORE), P5600)
  542. CCOMMON_OPT += -mips32r5 -mnan=2008 -mtune=p5600 $(MSA_FLAGS)
  543. FCOMMON_OPT += -mips32r5 -mnan=2008 -mtune=p5600 $(MSA_FLAGS)
  544. endif
  545. ifeq ($(CORE), I6400)
  546. CCOMMON_OPT += -mips64r6 -mnan=2008 -mtune=i6400 $(MSA_FLAGS)
  547. FCOMMON_OPT += -mips64r6 -mnan=2008 -mtune=i6400 $(MSA_FLAGS)
  548. endif
  549. ifeq ($(CORE), P6600)
  550. CCOMMON_OPT += -mips64r6 -mnan=2008 -mtune=p6600 $(MSA_FLAGS)
  551. FCOMMON_OPT += -mips64r6 -mnan=2008 -mtune=p6600 $(MSA_FLAGS)
  552. endif
  553. ifeq ($(CORE), I6500)
  554. CCOMMON_OPT += -mips64r6 -mnan=2008 -mtune=i6400 $(MSA_FLAGS)
  555. FCOMMON_OPT += -mips64r6 -mnan=2008 -mtune=i6400 $(MSA_FLAGS)
  556. endif
  557. ifeq ($(OSNAME), AIX)
  558. BINARY_DEFINED = 1
  559. endif
  560. endif
  561. ifndef BINARY_DEFINED
  562. ifneq ($(OSNAME), AIX)
  563. ifdef BINARY64
  564. CCOMMON_OPT += -m64
  565. else
  566. CCOMMON_OPT += -m32
  567. endif
  568. endif
  569. endif
  570. endif
  571. ifeq ($(C_COMPILER), PGI)
  572. ifdef BINARY64
  573. CCOMMON_OPT += -tp p7-64
  574. else
  575. CCOMMON_OPT += -tp p7
  576. endif
  577. endif
  578. ifeq ($(C_COMPILER), PATHSCALE)
  579. ifdef BINARY64
  580. CCOMMON_OPT += -m64
  581. else
  582. CCOMMON_OPT += -m32
  583. endif
  584. endif
  585. #
  586. # Fortran Compiler dependent settings
  587. #
  588. ifeq ($(F_COMPILER), FLANG)
  589. CCOMMON_OPT += -DF_INTERFACE_FLANG
  590. ifdef BINARY64
  591. ifdef INTERFACE64
  592. ifneq ($(INTERFACE64), 0)
  593. FCOMMON_OPT += -i8
  594. endif
  595. endif
  596. FCOMMON_OPT += -Wall
  597. else
  598. FCOMMON_OPT += -Wall
  599. endif
  600. ifeq ($(USE_OPENMP), 1)
  601. FCOMMON_OPT += -fopenmp
  602. endif
  603. endif
  604. ifeq ($(F_COMPILER), G77)
  605. CCOMMON_OPT += -DF_INTERFACE_G77
  606. FCOMMON_OPT += -Wall
  607. ifndef NO_BINARY_MODE
  608. ifneq ($(OSNAME), AIX)
  609. ifdef BINARY64
  610. FCOMMON_OPT += -m64
  611. else
  612. FCOMMON_OPT += -m32
  613. endif
  614. endif
  615. endif
  616. endif
  617. ifeq ($(F_COMPILER), G95)
  618. CCOMMON_OPT += -DF_INTERFACE_G95
  619. FCOMMON_OPT += -Wall
  620. ifneq ($(OSNAME), AIX)
  621. ifndef NO_BINARY_MODE
  622. ifdef BINARY64
  623. FCOMMON_OPT += -m64
  624. else
  625. FCOMMON_OPT += -m32
  626. endif
  627. endif
  628. endif
  629. endif
  630. ifeq ($(F_COMPILER), GFORTRAN)
  631. CCOMMON_OPT += -DF_INTERFACE_GFORT
  632. FCOMMON_OPT += -Wall
  633. # make single-threaded LAPACK calls thread-safe #1847
  634. FCOMMON_OPT += -frecursive
  635. # work around ABI problem with passing single-character arguments
  636. FCOMMON_OPT += -fno-optimize-sibling-calls
  637. #Don't include -lgfortran, when NO_LAPACK=1 or lsbcc
  638. ifneq ($(NO_LAPACK), 1)
  639. EXTRALIB += -lgfortran
  640. endif
  641. ifdef NO_BINARY_MODE
  642. ifeq ($(ARCH), $(filter $(ARCH),mips64))
  643. ifdef BINARY64
  644. FCOMMON_OPT += -mabi=64
  645. else
  646. FCOMMON_OPT += -mabi=n32
  647. endif
  648. else ifeq ($(ARCH), $(filter $(ARCH),mips))
  649. FCOMMON_OPT += -mabi=32
  650. endif
  651. else
  652. ifdef BINARY64
  653. ifneq ($(OSNAME), AIX)
  654. FCOMMON_OPT += -m64
  655. endif
  656. ifdef INTERFACE64
  657. ifneq ($(INTERFACE64), 0)
  658. FCOMMON_OPT += -fdefault-integer-8
  659. endif
  660. endif
  661. else
  662. ifneq ($(OSNAME), AIX)
  663. FCOMMON_OPT += -m32
  664. endif
  665. endif
  666. endif
  667. ifeq ($(USE_OPENMP), 1)
  668. FCOMMON_OPT += -fopenmp
  669. endif
  670. endif
  671. ifeq ($(F_COMPILER), INTEL)
  672. CCOMMON_OPT += -DF_INTERFACE_INTEL
  673. ifdef INTERFACE64
  674. ifneq ($(INTERFACE64), 0)
  675. FCOMMON_OPT += -i8
  676. endif
  677. endif
  678. ifeq ($(USE_OPENMP), 1)
  679. FCOMMON_OPT += -fopenmp
  680. endif
  681. endif
  682. ifeq ($(F_COMPILER), FUJITSU)
  683. CCOMMON_OPT += -DF_INTERFACE_FUJITSU
  684. ifeq ($(USE_OPENMP), 1)
  685. FCOMMON_OPT += -openmp
  686. endif
  687. endif
  688. ifeq ($(F_COMPILER), IBM)
  689. CCOMMON_OPT += -DF_INTERFACE_IBM
  690. # FCOMMON_OPT += -qarch=440
  691. ifdef BINARY64
  692. FCOMMON_OPT += -q64
  693. ifdef INTERFACE64
  694. ifneq ($(INTERFACE64), 0)
  695. FCOMMON_OPT += -qintsize=8
  696. endif
  697. endif
  698. else
  699. FCOMMON_OPT += -q32
  700. endif
  701. ifeq ($(USE_OPENMP), 1)
  702. FCOMMON_OPT += -openmp
  703. endif
  704. endif
  705. ifeq ($(F_COMPILER), PGI)
  706. CCOMMON_OPT += -DF_INTERFACE_PGI
  707. COMMON_PROF += -DPGICOMPILER
  708. ifdef BINARY64
  709. ifdef INTERFACE64
  710. ifneq ($(INTERFACE64), 0)
  711. FCOMMON_OPT += -i8
  712. endif
  713. endif
  714. FCOMMON_OPT += -tp p7-64
  715. else
  716. FCOMMON_OPT += -tp p7
  717. endif
  718. ifeq ($(USE_OPENMP), 1)
  719. FCOMMON_OPT += -mp
  720. endif
  721. endif
  722. ifeq ($(F_COMPILER), PATHSCALE)
  723. CCOMMON_OPT += -DF_INTERFACE_PATHSCALE
  724. ifdef BINARY64
  725. ifdef INTERFACE64
  726. ifneq ($(INTERFACE64), 0)
  727. FCOMMON_OPT += -i8
  728. endif
  729. endif
  730. endif
  731. ifeq ($(USE_OPENMP), 1)
  732. FCOMMON_OPT += -mp
  733. endif
  734. endif
  735. ifeq ($(F_COMPILER), OPEN64)
  736. CCOMMON_OPT += -DF_INTERFACE_OPEN64
  737. ifdef BINARY64
  738. ifdef INTERFACE64
  739. ifneq ($(INTERFACE64), 0)
  740. FCOMMON_OPT += -i8
  741. endif
  742. endif
  743. endif
  744. ifeq ($(ARCH), $(filter $(ARCH),mips64 mips))
  745. ifndef BINARY64
  746. FCOMMON_OPT += -n32
  747. else
  748. FCOMMON_OPT += -n64
  749. endif
  750. ifeq ($(CORE), LOONGSON3A)
  751. FCOMMON_OPT += -loongson3 -static
  752. endif
  753. ifeq ($(CORE), LOONGSON3B)
  754. FCOMMON_OPT += -loongson3 -static
  755. endif
  756. else
  757. ifndef BINARY64
  758. FCOMMON_OPT += -m32
  759. else
  760. FCOMMON_OPT += -m64
  761. endif
  762. endif
  763. ifeq ($(USE_OPENMP), 1)
  764. FEXTRALIB += -lstdc++
  765. FCOMMON_OPT += -mp
  766. endif
  767. endif
  768. ifeq ($(C_COMPILER), OPEN64)
  769. ifeq ($(ARCH), $(filter $(ARCH),mips64 mips))
  770. ifndef BINARY64
  771. CCOMMON_OPT += -n32
  772. else
  773. CCOMMON_OPT += -n64
  774. endif
  775. ifeq ($(CORE), LOONGSON3A)
  776. CCOMMON_OPT += -loongson3 -static
  777. endif
  778. ifeq ($(CORE), LOONGSON3B)
  779. CCOMMON_OPT += -loongson3 -static
  780. endif
  781. else
  782. ifndef BINARY64
  783. CCOMMON_OPT += -m32
  784. else
  785. CCOMMON_OPT += -m64
  786. endif
  787. endif
  788. endif
  789. ifeq ($(C_COMPILER), SUN)
  790. CCOMMON_OPT += -w
  791. ifeq ($(ARCH), x86)
  792. CCOMMON_OPT += -m32
  793. else
  794. FCOMMON_OPT += -m64
  795. endif
  796. endif
  797. ifeq ($(F_COMPILER), SUN)
  798. CCOMMON_OPT += -DF_INTERFACE_SUN
  799. ifeq ($(ARCH), x86)
  800. FCOMMON_OPT += -m32
  801. else
  802. FCOMMON_OPT += -m64
  803. endif
  804. ifeq ($(USE_OPENMP), 1)
  805. FCOMMON_OPT += -xopenmp=parallel
  806. endif
  807. endif
  808. ifeq ($(F_COMPILER), COMPAQ)
  809. CCOMMON_OPT += -DF_INTERFACE_COMPAQ
  810. ifeq ($(USE_OPENMP), 1)
  811. FCOMMON_OPT += -openmp
  812. endif
  813. endif
  814. ifdef BINARY64
  815. ifdef INTERFACE64
  816. ifneq ($(INTERFACE64), 0)
  817. CCOMMON_OPT +=
  818. #-DUSE64BITINT
  819. endif
  820. endif
  821. endif
  822. ifeq ($(NEED_PIC), 1)
  823. ifeq ($(C_COMPILER), IBM)
  824. CCOMMON_OPT += -qpic=large
  825. else
  826. CCOMMON_OPT += -fPIC
  827. endif
  828. ifeq ($(F_COMPILER), SUN)
  829. FCOMMON_OPT += -pic
  830. else
  831. FCOMMON_OPT += -fPIC
  832. endif
  833. endif
  834. ifeq ($(DYNAMIC_ARCH), 1)
  835. CCOMMON_OPT += -DDYNAMIC_ARCH
  836. endif
  837. ifeq ($(DYNAMIC_OLDER), 1)
  838. CCOMMON_OPT += -DDYNAMIC_OLDER
  839. endif
  840. ifeq ($(NO_LAPACK), 1)
  841. CCOMMON_OPT += -DNO_LAPACK
  842. #Disable LAPACK C interface
  843. NO_LAPACKE = 1
  844. endif
  845. ifeq ($(NO_LAPACKE), 1)
  846. CCOMMON_OPT += -DNO_LAPACKE
  847. endif
  848. ifeq ($(NO_AVX), 1)
  849. CCOMMON_OPT += -DNO_AVX
  850. endif
  851. ifeq ($(ARCH), x86)
  852. CCOMMON_OPT += -DNO_AVX
  853. endif
  854. ifeq ($(NO_AVX2), 1)
  855. CCOMMON_OPT += -DNO_AVX2
  856. endif
  857. ifeq ($(NO_AVX512), 1)
  858. CCOMMON_OPT += -DNO_AVX512
  859. endif
  860. ifdef SMP
  861. CCOMMON_OPT += -DSMP_SERVER
  862. ifeq ($(ARCH), mips64)
  863. ifneq ($(CORE), LOONGSON3B)
  864. USE_SIMPLE_THREADED_LEVEL3 = 1
  865. endif
  866. endif
  867. ifeq ($(USE_OPENMP), 1)
  868. # USE_SIMPLE_THREADED_LEVEL3 = 1
  869. # NO_AFFINITY = 1
  870. CCOMMON_OPT += -DUSE_OPENMP
  871. endif
  872. ifeq ($(BIGNUMA), 1)
  873. CCOMMON_OPT += -DBIGNUMA
  874. endif
  875. endif
  876. ifeq ($(NO_WARMUP), 1)
  877. CCOMMON_OPT += -DNO_WARMUP
  878. endif
  879. ifeq ($(CONSISTENT_FPCSR), 1)
  880. CCOMMON_OPT += -DCONSISTENT_FPCSR
  881. endif
  882. # Only for development
  883. # CCOMMON_OPT += -DPARAMTEST
  884. # CCOMMON_OPT += -DPREFETCHTEST
  885. # CCOMMON_OPT += -DNO_SWITCHING
  886. # USE_PAPI = 1
  887. ifdef USE_PAPI
  888. CCOMMON_OPT += -DUSE_PAPI
  889. EXTRALIB += -lpapi -lperfctr
  890. endif
  891. ifdef DYNAMIC_THREADS
  892. CCOMMON_OPT += -DDYNAMIC_THREADS
  893. endif
  894. CCOMMON_OPT += -DMAX_CPU_NUMBER=$(NUM_THREADS)
  895. CCOMMON_OPT += -DMAX_PARALLEL_NUMBER=$(NUM_PARALLEL)
  896. ifdef USE_SIMPLE_THREADED_LEVEL3
  897. CCOMMON_OPT += -DUSE_SIMPLE_THREADED_LEVEL3
  898. endif
  899. ifeq ($(USE_TLS), 1)
  900. CCOMMON_OPT += -DUSE_TLS
  901. endif
  902. CCOMMON_OPT += -DVERSION=\"$(VERSION)\"
  903. ifndef SYMBOLPREFIX
  904. SYMBOLPREFIX =
  905. endif
  906. ifndef SYMBOLSUFFIX
  907. SYMBOLSUFFIX =
  908. endif
  909. ifndef LIBNAMESUFFIX
  910. LIBNAMEBASE = $(SYMBOLPREFIX)openblas$(SYMBOLSUFFIX)
  911. else
  912. LIBNAMEBASE = $(SYMBOLPREFIX)openblas$(SYMBOLSUFFIX)_$(LIBNAMESUFFIX)
  913. endif
  914. ifeq ($(OSNAME), CYGWIN_NT)
  915. LIBPREFIX = cyg$(LIBNAMEBASE)
  916. else
  917. LIBPREFIX = lib$(LIBNAMEBASE)
  918. endif
  919. KERNELDIR = $(TOPDIR)/kernel/$(ARCH)
  920. include $(TOPDIR)/Makefile.$(ARCH)
  921. CCOMMON_OPT += -DASMNAME=$(FU)$(*F) -DASMFNAME=$(FU)$(*F)$(BU) -DNAME=$(*F)$(BU) -DCNAME=$(*F) -DCHAR_NAME=\"$(*F)$(BU)\" -DCHAR_CNAME=\"$(*F)\"
  922. ifeq ($(CORE), PPC440)
  923. CCOMMON_OPT += -DALLOC_QALLOC
  924. endif
  925. ifeq ($(CORE), PPC440FP2)
  926. STATIC_ALLOCATION = 1
  927. endif
  928. ifneq ($(OSNAME), Linux)
  929. NO_AFFINITY = 1
  930. endif
  931. ifneq ($(ARCH), x86_64)
  932. ifneq ($(ARCH), x86)
  933. ifneq ($(CORE), LOONGSON3B)
  934. NO_AFFINITY = 1
  935. endif
  936. endif
  937. endif
  938. ifdef NO_AFFINITY
  939. CCOMMON_OPT += -DNO_AFFINITY
  940. endif
  941. ifdef FUNCTION_PROFILE
  942. CCOMMON_OPT += -DFUNCTION_PROFILE
  943. endif
  944. ifdef HUGETLB_ALLOCATION
  945. CCOMMON_OPT += -DALLOC_HUGETLB
  946. endif
  947. ifdef HUGETLBFILE_ALLOCATION
  948. CCOMMON_OPT += -DALLOC_HUGETLBFILE -DHUGETLB_FILE_NAME=$(HUGETLBFILE_ALLOCATION)
  949. endif
  950. ifdef STATIC_ALLOCATION
  951. CCOMMON_OPT += -DALLOC_STATIC
  952. endif
  953. ifdef DEVICEDRIVER_ALLOCATION
  954. CCOMMON_OPT += -DALLOC_DEVICEDRIVER -DDEVICEDRIVER_NAME=\"/dev/mapper\"
  955. endif
  956. ifdef MIXED_MEMORY_ALLOCATION
  957. CCOMMON_OPT += -DMIXED_MEMORY_ALLOCATION
  958. endif
  959. ifeq ($(OSNAME), SunOS)
  960. TAR = gtar
  961. PATCH = gpatch
  962. GREP = ggrep
  963. AWK = nawk
  964. else
  965. TAR = tar
  966. PATCH = patch
  967. GREP = grep
  968. AWK = awk
  969. endif
  970. ifndef MD5SUM
  971. MD5SUM = md5sum
  972. endif
  973. REVISION = -r$(VERSION)
  974. MAJOR_VERSION = $(word 1,$(subst ., ,$(VERSION)))
  975. ifeq ($(DEBUG), 1)
  976. COMMON_OPT += -g
  977. endif
  978. ifeq ($(DEBUG), 1)
  979. FCOMMON_OPT += -g
  980. endif
  981. ifndef COMMON_OPT
  982. COMMON_OPT = -O2
  983. endif
  984. ifndef FCOMMON_OPT
  985. FCOMMON_OPT = -O2 -frecursive
  986. endif
  987. override CFLAGS += $(COMMON_OPT) $(CCOMMON_OPT) -I$(TOPDIR)
  988. override PFLAGS += $(COMMON_OPT) $(CCOMMON_OPT) -I$(TOPDIR) -DPROFILE $(COMMON_PROF)
  989. override FFLAGS += $(COMMON_OPT) $(FCOMMON_OPT)
  990. override FPFLAGS += $(FCOMMON_OPT) $(COMMON_PROF)
  991. #MAKEOVERRIDES =
  992. ifdef NEED_PIC
  993. ifeq (,$(findstring PIC,$(FFLAGS)))
  994. override FFLAGS += -fPIC
  995. endif
  996. endif
  997. #For LAPACK Fortran codes.
  998. #Disable -fopenmp for LAPACK Fortran codes on Windows.
  999. ifdef OS_WINDOWS
  1000. LAPACK_FFLAGS := $(filter-out -fopenmp -mp -openmp -xopenmp=parallel,$(FFLAGS))
  1001. LAPACK_FPFLAGS := $(filter-out -fopenmp -mp -openmp -xopenmp=parallel,$(FPFLAGS))
  1002. else
  1003. LAPACK_FFLAGS := $(FFLAGS)
  1004. LAPACK_FPFLAGS := $(FPFLAGS)
  1005. endif
  1006. LAPACK_CFLAGS = $(CFLAGS)
  1007. LAPACK_CFLAGS += -DHAVE_LAPACK_CONFIG_H
  1008. ifdef INTERFACE64
  1009. ifneq ($(INTERFACE64), 0)
  1010. LAPACK_CFLAGS += -DLAPACK_ILP64
  1011. endif
  1012. endif
  1013. ifdef OS_WINDOWS
  1014. LAPACK_CFLAGS += -DOPENBLAS_OS_WINDOWS
  1015. endif
  1016. ifeq ($(C_COMPILER), LSB)
  1017. LAPACK_CFLAGS += -DLAPACK_COMPLEX_STRUCTURE
  1018. endif
  1019. ifndef SUFFIX
  1020. SUFFIX = o
  1021. endif
  1022. ifndef PSUFFIX
  1023. PSUFFIX = po
  1024. endif
  1025. ifndef LIBSUFFIX
  1026. LIBSUFFIX = a
  1027. endif
  1028. ifneq ($(DYNAMIC_ARCH), 1)
  1029. ifndef SMP
  1030. LIBNAME = $(LIBPREFIX)_$(LIBCORE)$(REVISION).$(LIBSUFFIX)
  1031. LIBNAME_P = $(LIBPREFIX)_$(LIBCORE)$(REVISION)_p.$(LIBSUFFIX)
  1032. else
  1033. LIBNAME = $(LIBPREFIX)_$(LIBCORE)p$(REVISION).$(LIBSUFFIX)
  1034. LIBNAME_P = $(LIBPREFIX)_$(LIBCORE)p$(REVISION)_p.$(LIBSUFFIX)
  1035. endif
  1036. else
  1037. ifndef SMP
  1038. LIBNAME = $(LIBPREFIX)$(REVISION).$(LIBSUFFIX)
  1039. LIBNAME_P = $(LIBPREFIX)$(REVISION)_p.$(LIBSUFFIX)
  1040. else
  1041. LIBNAME = $(LIBPREFIX)p$(REVISION).$(LIBSUFFIX)
  1042. LIBNAME_P = $(LIBPREFIX)p$(REVISION)_p.$(LIBSUFFIX)
  1043. endif
  1044. endif
  1045. LIBDLLNAME = $(LIBPREFIX).dll
  1046. IMPLIBNAME = lib$(LIBNAMEBASE).dll.a
  1047. ifneq ($(OSNAME), AIX)
  1048. LIBSONAME = $(LIBNAME:.$(LIBSUFFIX)=.so)
  1049. else
  1050. LIBSONAME = $(LIBNAME:.$(LIBSUFFIX)=.a)
  1051. endif
  1052. LIBDYNNAME = $(LIBNAME:.$(LIBSUFFIX)=.dylib)
  1053. LIBDEFNAME = $(LIBNAME:.$(LIBSUFFIX)=.def)
  1054. LIBEXPNAME = $(LIBNAME:.$(LIBSUFFIX)=.exp)
  1055. LIBZIPNAME = $(LIBNAME:.$(LIBSUFFIX)=.zip)
  1056. LIBS = $(TOPDIR)/$(LIBNAME)
  1057. LIBS_P = $(TOPDIR)/$(LIBNAME_P)
  1058. LIB_COMPONENTS = BLAS
  1059. ifneq ($(NO_CBLAS), 1)
  1060. LIB_COMPONENTS += CBLAS
  1061. endif
  1062. ifneq ($(NO_LAPACK), 1)
  1063. LIB_COMPONENTS += LAPACK
  1064. ifneq ($(NO_LAPACKE), 1)
  1065. LIB_COMPONENTS += LAPACKE
  1066. endif
  1067. ifeq ($(BUILD_RELAPACK), 1)
  1068. LIB_COMPONENTS += ReLAPACK
  1069. endif
  1070. endif
  1071. ifeq ($(ONLY_CBLAS), 1)
  1072. LIB_COMPONENTS = CBLAS
  1073. endif
  1074. export OSNAME
  1075. export ARCH
  1076. export CORE
  1077. export LIBCORE
  1078. export PGCPATH
  1079. export CONFIG
  1080. export CC
  1081. export FC
  1082. export BU
  1083. export FU
  1084. export NEED2UNDERSCORES
  1085. export USE_THREAD
  1086. export NUM_THREADS
  1087. export NUM_CORES
  1088. export SMP
  1089. export MAKEFILE_RULE
  1090. export NEED_PIC
  1091. export BINARY
  1092. export BINARY32
  1093. export BINARY64
  1094. export F_COMPILER
  1095. export C_COMPILER
  1096. export USE_OPENMP
  1097. export CROSS
  1098. export CROSS_SUFFIX
  1099. export NOFORTRAN
  1100. export NO_FBLAS
  1101. export EXTRALIB
  1102. export CEXTRALIB
  1103. export FEXTRALIB
  1104. export HAVE_SSE
  1105. export HAVE_SSE2
  1106. export HAVE_SSE3
  1107. export HAVE_SSSE3
  1108. export HAVE_SSE4_1
  1109. export HAVE_SSE4_2
  1110. export HAVE_SSE4A
  1111. export HAVE_SSE5
  1112. export HAVE_AVX
  1113. export HAVE_VFP
  1114. export HAVE_VFPV3
  1115. export HAVE_VFPV4
  1116. export HAVE_NEON
  1117. export HAVE_MSA
  1118. export MSA_FLAGS
  1119. export KERNELDIR
  1120. export FUNCTION_PROFILE
  1121. export TARGET_CORE
  1122. export NO_AVX512
  1123. export SGEMM_UNROLL_M
  1124. export SGEMM_UNROLL_N
  1125. export DGEMM_UNROLL_M
  1126. export DGEMM_UNROLL_N
  1127. export QGEMM_UNROLL_M
  1128. export QGEMM_UNROLL_N
  1129. export CGEMM_UNROLL_M
  1130. export CGEMM_UNROLL_N
  1131. export ZGEMM_UNROLL_M
  1132. export ZGEMM_UNROLL_N
  1133. export XGEMM_UNROLL_M
  1134. export XGEMM_UNROLL_N
  1135. export CGEMM3M_UNROLL_M
  1136. export CGEMM3M_UNROLL_N
  1137. export ZGEMM3M_UNROLL_M
  1138. export ZGEMM3M_UNROLL_N
  1139. export XGEMM3M_UNROLL_M
  1140. export XGEMM3M_UNROLL_N
  1141. ifdef USE_CUDA
  1142. export CUDADIR
  1143. export CUCC
  1144. export CUFLAGS
  1145. export CULIB
  1146. endif
  1147. .SUFFIXES: .$(PSUFFIX) .$(SUFFIX) .f
  1148. .f.$(SUFFIX):
  1149. $(FC) $(FFLAGS) -c $< -o $(@F)
  1150. .f.$(PSUFFIX):
  1151. $(FC) $(FPFLAGS) -pg -c $< -o $(@F)
  1152. ifdef BINARY64
  1153. PATHSCALEPATH = /opt/pathscale/lib/3.1
  1154. PGIPATH = /opt/pgi/linux86-64/7.1-5/lib
  1155. else
  1156. PATHSCALEPATH = /opt/pathscale/lib/3.1/32
  1157. PGIPATH = /opt/pgi/linux86/7.1-5/lib
  1158. endif
  1159. ACMLPATH = /opt/acml/4.3.0
  1160. ifneq ($(OSNAME), Darwin)
  1161. MKLPATH = /opt/intel/mkl/10.2.2.025/lib
  1162. else
  1163. MKLPATH = /Library/Frameworks/Intel_MKL.framework/Versions/10.0.1.014/lib
  1164. endif
  1165. ATLASPATH = /opt/atlas/3.9.17/opteron
  1166. FLAMEPATH = $(HOME)/flame/lib
  1167. ifneq ($(OSNAME), SunOS)
  1168. SUNPATH = /opt/sunstudio12.1
  1169. else
  1170. SUNPATH = /opt/SUNWspro
  1171. endif