 rebase? (#1)
* With the Intel compiler on Linux, prefer ifort for the final link step
icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956
* Rename operands to put lda on the input/output constraint list
* Fix wrong constraints in inline assembly
for #2009
* Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
* Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
* Fix inline assembly constraints
* Fix inline assembly constraints
* Fix inline assembly constraints in Bulldozer TRSM kernels
rework indices to allow marking i, as and bs as both input and output (marked operand n1 as well for simplicity). For #2009
* Correct range_n limiting
same bug as seen in #1388, somehow missed in corresponding PR #1389
* Allow multithreading TRMV again
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)
* Fix error introduced during cleanup
* Reduce list of kernels in the dynamic arch build
to make compilation complete reliably within the 1h limit again
* init
* move fix to right place
* Fix missing -c option in AVX512 test
* Fix AVX512 test always returning false due to missing compiler option
* Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
fixes #2033
* Keep xcode8.3 for osx BINARY=32 build
as xcode10 deprecated i386
* Make sure that AVX512 is disabled in 32bit builds
for #2033
* Improve handling of NO_STATIC and NO_SHARED
to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422
* init
* address warning introduced with #1814 et al
* Restore locking optimizations for OpenMP case
restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461
* HiSilicon tsv110 CPUs optimization branch
add HiSilicon tsv110 CPUs optimization branch
* add TARGET support for HiSilicon tsv110 CPUs
* add TARGET support for HiSilicon tsv110 CPUs
* add TARGET support for HiSilicon tsv110 CPUs
* Fix module definition conflicts between LAPACK and ReLAPACK
for #2043
* Do not compile in AVX512 check if AVX support is disabled
the xgetbv function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway
* ctest.c : add __POWERPC__ for PowerMac
* Fix crash in sgemm SSE/nano kernel on x86_64
Fix bug #2047.
Signed-off-by: Celelibi <celelibi@gmail.com>
* param.h : enable defines for PPC970 on DarwinOS
fixes:
gemm.c: In function 'sgemm_':
../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function)
#define SGEMM_P SGEMM_DEFAULT_P
^
* common_power.h: force DCBT_ARG 0 on PPC970 Darwin
without this, we see
../kernel/power/gemv_n.S:427:Parameter syntax error
and many more similar entries
that relates to this assembly command
dcbt 8, r24, r18
this change makes the DCBT_ARG = 0
and openblas builds through to completion on PowerMac 970
Tests pass
* Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
for issue #2048
* make DYNAMIC_ARCH=1 package work on TSV110.
* make DYNAMIC_ARCH=1 package work on TSV110
* Add Intel Denverton
for #2048
* Add Intel Denverton
* Change 64-bit detection as explained in #2056
* Trivial typo fix
as suggested in #2022
* Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
* Use POSIX getenv on Cygwin
The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().
* Fix for #2063: The DllMain used in Cygwin did not run the thread memory
pool cleanup upon THREAD_DETACH which is needed when compiled with
USE_TLS=1.
* Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles.
* AIX asm syntax changes needed for shared object creation
* power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself
* Expose CBLAS interfaces for I?MIN and I?MAX
* Build CBLAS interfaces for I?MIN and I?MAX
* Add declarations for ?sum and cblas_?sum
* Add interface for ?sum (derived from ?asum)
* Add ?sum
* Add implementations of ssum/dsum and csum/zsum
as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure
* Add ARM implementations of ?sum
(trivial copies of the respective ?asum with the fabs calls removed)
* Add ARM64 implementations of ?sum
as trivial copies of the respective ?asum kernels with the fabs calls removed
* Add ia64 implementation of ?sum
as trivial copy of asum with the fabs calls removed
* Add MIPS implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
* Add MIPS64 implementation of ?sum
as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
* Add POWER implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
* Add SPARC implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
* Add x86 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
* Add x86_64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
* Add ZARCH implementation of ?sum
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
* Detect 32bit environment on 64bit ARM hardware
for #2056, using same approach as #2058
* Add cmake defaults for ?sum kernels
* Add ?sum
* Add ?sum definitions for generic kernel
* Add declarations for ?sum
* Add -lm and disable EXPRECISION support on *BSD
fixes #2075
* Add in runtime CPU detection for POWER.
* snprintf define consolidated to common.h
* Support INTERFACE64=1
* Add support for INTERFACE64 and fix XERBLA calls
1. Replaced all instances of "int" with "blasint"
2. Added string length as "hidden" third parameter in calls to fortran XERBLA
* Correct length of name string in xerbla call
* Avoid out-of-bounds accesses in LAPACK EIG tests
see https://github.com/Reference-LAPACK/lapack/issues/333
* Correct INFO=4 condition
* Disable reallocation of work array in xSYTRF
as it appears to cause memory management problems (seen in the LAPACK tests)
* Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF
due to crashes in LAPACK tests
* sgemm/strmm
* Update Changelog with changes from 0.3.6
* Increment version to 0.3.7.dev
* Increment version to 0.3.7.dev
* Misc. typo fixes
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
* Correct argument of CPU_ISSET for glibc <2.5
fixes #2104
* conflict resolve
* Revert reference/ fixes
* Revert Changelog.txt typos
* Disable the SkyLakeX DGEMMITCOPY kernel as well
as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955
* Disable DGEMMINCOPY as well for now
#1955
* init
* Fix errors in cpu enumeration with glibc 2.6
for #2114
* Change two http links to https
Closes #2109
* remove redundant code #2113
* Set up CI with Azure Pipelines
[skip ci]
* TST: add native POWER8 to CI
* add native POWER8 testing to
Travis CI matrix with ppc64le
os entry
* Update link to IBM MASS library, update cpu support status
* first try migrating one of the arm builds from travis
* fix tabbing in azure commands
* Update azure-pipelines.yml
take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines file)
* Update azure-pipelines.yml
* Update azure-pipelines.yml
* Update azure-pipelines.yml
* Update azure-pipelines.yml
* DOC: Add Azure CI status badge
* Add ARMV6 build to azure CI setup (#2122)
using aytekinar's Alpine image and docker script from the Travis setup
[skip ci]
* TST: Azure manylinux1 & clean-up
* remove some of the steps & comments
from the original Azure yml template
* modify the trigger section to use
develop since OpenBLAS primarily uses
this branch; use the same batching
behavior as downstream projects NumPy/
SciPy
* remove Travis emulated ARMv6 gcc build
because this now happens in Azure
* use documented Ubuntu vmImage name for Azure
and add in a manylinux1 test run to the matrix
[skip appveyor]
* Add NO_AFFINITY to available options on Linux, and set it to ON
to match the gmake default. Fixes second part of #2114
* Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125)
* Mark iamax_sse.S as unsuitable for MIN due to issue #2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116
* Move ARMv8 gcc build from Travis to Azure
* Move ARMv8 gcc build from Travis to Azure
* Update .travis.yml
* Test drone CI
* install make
* remove sudo
* Install gcc
* Install perl
* Install gfortran and add a clang job
* gfortran->gcc-gfortran
* Switch to ubuntu and parallel jobs
* apt update
* Fix typo
* update yes
* no need of gcc in clang build
* Add a cmake build as well
* Add cmake builds and print options
* build without lapack on cmake
* parallel build
* See if ubuntu 19.04 fixes the ICE
* Remove qemu armv8 builds
* arm32 build
* Fix typo
* TST: add SkylakeX AVX512 CI test
* adapt the C-level reproducer code for some
recent SkylakeX AVX512 kernel issues, provided
by Isuru Fernando and modified by Martin Kroeker,
for usage in the utest suite
* add an Intel SDE SkylakeX emulation utest run to
the Azure CI matrix; a custom Docker build was required
because Ubuntu image provided by Azure does not support
AVX512VL instructions
* Add option USE_LOCKING for single-threaded build with locking support
for calling from concurrent threads
* Add option USE_LOCKING for single-threaded build with locking support
* Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds
* Add option USE_LOCKING but keep default settings intact
* Remove unrelated change
* Do not try ancient PGI hacks with recent versions of that compiler
should fix #2139
* Build and run utests in any case, they do their own checks for fortran availability
* Add softfp support in min/max kernels
fix for #1912
* Revert "Add softfp support in min/max kernels"
* Separate implementations of AMAX and IAMAX on arm
As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register
* Ensure correct output for DAMAX with softfp
* Use generic kernels for complex (I)AMAX to support softfp
* improved zgemm power9 based on power8
* upload thread safety test folder
* hook up c++ thread safety test (main Makefile)
* add c++ thread test option to Makefile.rule
* Document NO_AVX512
for #2151
* sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
* Fix detection of AVX512 capable compilers in getarch
21eda8b5 introduced a check in getarch.c to test if the compiler is capable of
AVX512. This check currently fails, since the used __AVX2__ macro is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway.
* c_check: Unlink correct file
* power9 zgemm ztrmm optimized
* conflict resolve
* Add gfortran workaround for ABI violations in LAPACKE
for #2154 (see gcc bug 90329)
* Add gfortran workaround for ABI violations
for #2154 (see gcc bug 90329)
* Add gfortran workaround for potential ABI violation
for #2154
* Update fc.cmake
* Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway.
* Avoid unintentional activation of TLS code via USE_TLS=0
fixes #2149
* Do not force gcc options on non-gcc compilers
fixes compile failure with pgi 18.10 as reported on OpenBLAS-users
* Update Makefile.x86_64
* Zero ecx with a mov instruction
PGI assembler does not like the initialization in the constraints.
* Fix mov syntax
* new sgemm 8x16
* Update dtrmm_kernel_16x4_power8.S
* PGI compiler does not like -march=native
* Fix build on FreeBSD/powerpc64.
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
* Fix build for PPC970 on FreeBSD pt. 1
FreeBSD needs DCBT_ARG=0 as well.
* Fix build for PPC970 on FreeBSD pt.2
FreeBSD needs those macros too.
* cgemm/ctrmm power9
* Utest needs CBLAS but not necessarily FORTRAN
* Add mingw builds to Appveyor config
* Add getarch flags to disable AVX on x86
(and other small fixes to match Makefile behaviour)
* Make disabling DYNAMIC_ARCH on unsupported systems work
needs to be unset in the cache for the change to have any effect
* Mingw32 needs leading underscore on object names
(also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
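
The ?sum entries in the log above describe the new kernels as copies of the corresponding ?asum kernels with the fabs calls removed. The difference can be sketched in plain C as follows. This is only an illustration, not OpenBLAS kernel code: the names `ref_dasum` and `ref_dsum` are hypothetical, and the early return for non-positive `n` or `incx` is an assumption borrowed from the reference BLAS ?asum convention.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical reference loop: sum of absolute values, as ?asum does for real types. */
double ref_dasum(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    double acc = 0.0;
    if (n <= 0 || incx <= 0)
        return 0.0;               /* assumed convention: nothing to sum */
    assert(x != NULL);
    for (ptrdiff_t i = 0; i < n; i++)
        acc += fabs(x[i * incx]); /* absolute value of each strided element */
    return acc;
}

/* Hypothetical reference loop: same structure with the fabs call dropped, giving a plain sum. */
double ref_dsum(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    double acc = 0.0;
    if (n <= 0 || incx <= 0)
        return 0.0;               /* assumed convention: nothing to sum */
    assert(x != NULL);
    for (ptrdiff_t i = 0; i < n; i++)
        acc += x[i * incx];       /* signed value of each strided element */
    return acc;
}
```

For the vector {1, -2, 3, -4} with unit stride, `ref_dasum` yields 10 while `ref_dsum` yields -2, which is the whole behavioural difference the log entries refer to.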
- # Functions to help with the OpenBLAS build
-
- # Reads string from getarch into CMake vars. Format of getarch vars is VARNAME=VALUE
- function(ParseGetArchVars GETARCH_IN)
- string(REGEX MATCHALL "[0-9_a-zA-Z]+=[0-9_a-zA-Z]+" GETARCH_RESULT_LIST "${GETARCH_IN}")
- foreach (GETARCH_LINE ${GETARCH_RESULT_LIST})
- # split the line into var and value, then assign the value to a CMake var
- string(REGEX MATCHALL "[0-9_a-zA-Z]+" SPLIT_VAR "${GETARCH_LINE}")
- list(GET SPLIT_VAR 0 VAR_NAME)
- list(GET SPLIT_VAR 1 VAR_VALUE)
- set(${VAR_NAME} ${VAR_VALUE} PARENT_SCOPE)
- endforeach ()
- endfunction ()
-
- # Reads a Makefile into CMake vars.
- macro(ParseMakefileVars MAKEFILE_IN)
- message(STATUS "Reading vars from ${MAKEFILE_IN}...")
- file(STRINGS ${MAKEFILE_IN} makefile_contents)
- foreach (makefile_line ${makefile_contents})
- string(REGEX MATCH "([0-9_a-zA-Z]+)[ \t]*=[ \t]*(.+)$" line_match "${makefile_line}")
- if (NOT "${line_match}" STREQUAL "")
- set(var_name ${CMAKE_MATCH_1})
- set(var_value ${CMAKE_MATCH_2})
- # check for Makefile variables in the string, e.g. $(TSUFFIX)
- string(REGEX MATCHALL "\\$\\(([0-9_a-zA-Z]+)\\)" make_var_matches ${var_value})
- foreach (make_var ${make_var_matches})
- # strip out Makefile $() markup
- string(REGEX REPLACE "\\$\\(([0-9_a-zA-Z]+)\\)" "\\1" make_var ${make_var})
- # now replace the instance of the Makefile variable with the value of the CMake variable (note the double quote)
- string(REPLACE "$(${make_var})" "${${make_var}}" var_value ${var_value})
- endforeach ()
- set(${var_name} ${var_value})
- else ()
- string(REGEX MATCH "include \\$\\(KERNELDIR\\)/(.+)$" line_match "${makefile_line}")
- if (NOT "${line_match}" STREQUAL "")
- ParseMakefileVars(${KERNELDIR}/${CMAKE_MATCH_1})
- endif ()
- endif ()
- endforeach ()
- endmacro ()
-
- # Returns all combinations of the input list, as a list with colon-separated combinations
- # E.g. input of A B C returns the empty combination (stored as a single space), then A B A:B C A:C B:C A:B:C
- # N.B. The input is meant to be a list, and to pass a list to a function in CMake you must quote it (e.g. AllCombinations("${LIST_VAR}")).
- # @param absent_codes_in codes to use when an element is absent from a combination. For example, if you have TRANS;UNIT;UPPER you may want the code to be NNL when nothing is present.
- # @returns LIST_OUT a list of combinations
- # CODES_OUT a list of codes corresponding to each combination, with N meaning the item is not present, and the first letter of the list item meaning it is present
- function(AllCombinations list_in absent_codes_in)
- list(LENGTH list_in list_count)
- set(num_combos 1)
- # subtract 1 since we will iterate from 0 to num_combos
- math(EXPR num_combos "(${num_combos} << ${list_count}) - 1")
- set(LIST_OUT "")
- set(CODES_OUT "")
- foreach (c RANGE 0 ${num_combos})
-
- set(current_combo "")
- set(current_code "")
-
- # this is a little ridiculous just to iterate through a list w/ indices
- math(EXPR last_list_index "${list_count} - 1")
- foreach (list_index RANGE 0 ${last_list_index})
- math(EXPR bit "1 << ${list_index}")
- math(EXPR combo_has_bit "${c} & ${bit}")
- list(GET list_in ${list_index} list_elem)
- if (combo_has_bit)
- if (current_combo)
- set(current_combo "${current_combo}:${list_elem}")
- else ()
- set(current_combo ${list_elem})
- endif ()
- string(SUBSTRING ${list_elem} 0 1 code_char)
- else ()
- list(GET absent_codes_in ${list_index} code_char)
- endif ()
- set(current_code "${current_code}${code_char}")
- endforeach ()
-
- if (current_combo STREQUAL "")
- list(APPEND LIST_OUT " ") # Empty set is a valid combination, but CMake isn't appending the empty string for some reason, use a space
- else ()
- list(APPEND LIST_OUT ${current_combo})
- endif ()
- list(APPEND CODES_OUT ${current_code})
-
- endforeach ()
-
- set(LIST_OUT ${LIST_OUT} PARENT_SCOPE)
- set(CODES_OUT ${CODES_OUT} PARENT_SCOPE)
- endfunction ()
-
- # generates object files for each of the sources, using the BLAS naming scheme to pass the function name as a preprocessor definition
- # @param sources_in the source files to build from
- # @param defines_in (optional) preprocessor definitions that will be applied to all objects
- # @param name_in (optional) if this is set this name will be used instead of the filename. Use a * to indicate where the float character should go, if no star the character will be prepended.
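- # e.g. with DOUBLE set, "i*max" will generate the name "idmax", and "max" will be "dmax"
- # @param use_cblas (optional) if true, prefixes the object name with cblas_ and adds the CBLAS define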
- # @param replace_last_with replaces the last character in the filename with this string (e.g. symm_k should be symm_TU)
- # @param append_with appends the filename with this string (e.g. trmm_R should be trmm_RTUU or some other combination of characters)
- # @param no_float_type turns off the float type define for this build (e.g. SINGLE/DOUBLE/etc)
- # @param complex_filename_scheme some routines have separate source files for complex and non-complex float types.
- # 0 - compiles for all types
- # 1 - compiles the sources for non-complex types only (SINGLE/DOUBLE)
- # 2 - compiles for complex types only (COMPLEX/DOUBLE COMPLEX)
- # 3 - compiles for all types, but changes source names for complex by prepending z (e.g. axpy.c becomes zaxpy.c)
- # 4 - compiles for complex types only, but changes source names for complex by prepending z (e.g. hemv.c becomes zhemv.c)
- # STRING - compiles only the given type (e.g. DOUBLE)
- function(GenerateNamedObjects sources_in)
-
- if (DEFINED ARGV1)
- set(defines_in ${ARGV1})
- endif ()
-
- if (DEFINED ARGV2 AND NOT "${ARGV2}" STREQUAL "")
- set(name_in ${ARGV2})
- # strip off extension for kernel files that pass in the object name.
- get_filename_component(name_in ${name_in} NAME_WE)
- endif ()
-
- if (DEFINED ARGV3)
- set(use_cblas ${ARGV3})
- else ()
- set(use_cblas false)
- endif ()
-
- if (DEFINED ARGV4)
- set(replace_last_with ${ARGV4})
- endif ()
-
- if (DEFINED ARGV5)
- set(append_with ${ARGV5})
- endif ()
-
- if (DEFINED ARGV6)
- set(no_float_type ${ARGV6})
- else ()
- set(no_float_type false)
- endif ()
-
- if (no_float_type)
- set(float_list "DUMMY") # still need to loop once
- else ()
- set(float_list "${FLOAT_TYPES}")
- endif ()
-
- set(real_only false)
- set(complex_only false)
- set(mangle_complex_sources false)
- if (DEFINED ARGV7 AND NOT "${ARGV7}" STREQUAL "")
- if (${ARGV7} EQUAL 1)
- set(real_only true)
- elseif (${ARGV7} EQUAL 2)
- set(complex_only true)
- elseif (${ARGV7} EQUAL 3)
- set(mangle_complex_sources true)
- elseif (${ARGV7} EQUAL 4)
- set(mangle_complex_sources true)
- set(complex_only true)
- elseif (NOT ${ARGV7} EQUAL 0)
- set(float_list ${ARGV7})
- endif ()
- endif ()
-
- if (complex_only)
- list(REMOVE_ITEM float_list "SINGLE")
- list(REMOVE_ITEM float_list "DOUBLE")
- elseif (real_only)
- list(REMOVE_ITEM float_list "COMPLEX")
- list(REMOVE_ITEM float_list "ZCOMPLEX")
- endif ()
-
- set(float_char "")
- set(OBJ_LIST_OUT "")
- foreach (float_type ${float_list})
- foreach (source_file ${sources_in})
-
- if (NOT no_float_type)
- string(SUBSTRING ${float_type} 0 1 float_char)
- string(TOLOWER ${float_char} float_char)
- endif ()
-
- if (NOT name_in)
- get_filename_component(source_name ${source_file} NAME_WE)
- set(obj_name "${float_char}${source_name}")
- else ()
- # replace * with float_char
- if (${name_in} MATCHES "\\*")
- string(REPLACE "*" ${float_char} obj_name ${name_in})
- else ()
- set(obj_name "${float_char}${name_in}")
- endif ()
- endif ()
-
- if (replace_last_with)
- string(REGEX REPLACE ".$" ${replace_last_with} obj_name ${obj_name})
- else ()
- set(obj_name "${obj_name}${append_with}")
- endif ()
-
- # now add the object and set the defines
- set(obj_defines ${defines_in})
-
- if (use_cblas)
- set(obj_name "cblas_${obj_name}")
- list(APPEND obj_defines "CBLAS")
- elseif (NOT "${obj_name}" MATCHES "${ARCH_SUFFIX}")
- set(obj_name "${obj_name}${ARCH_SUFFIX}")
- endif ()
-
- list(APPEND obj_defines "ASMNAME=${FU}${obj_name};ASMFNAME=${FU}${obj_name}${BU};NAME=${obj_name}${BU};CNAME=${obj_name};CHAR_NAME=\"${obj_name}${BU}\";CHAR_CNAME=\"${obj_name}\"")
- if (${float_type} STREQUAL "DOUBLE" OR ${float_type} STREQUAL "ZCOMPLEX")
- list(APPEND obj_defines "DOUBLE")
- endif ()
- if (${float_type} STREQUAL "COMPLEX" OR ${float_type} STREQUAL "ZCOMPLEX")
- list(APPEND obj_defines "COMPLEX")
- if (mangle_complex_sources)
- # add a z to the filename
- get_filename_component(source_name ${source_file} NAME)
- get_filename_component(source_dir ${source_file} DIRECTORY)
- string(REPLACE ${source_name} "z${source_name}" source_file ${source_file})
- endif ()
- endif ()
-
- if (VERBOSE_GEN)
- message(STATUS "${obj_name}:${source_file}")
- message(STATUS "${obj_defines}")
- endif ()
-
- # create a copy of the source to avoid duplicate obj filename problem with ar.exe
- get_filename_component(source_extension ${source_file} EXT)
- set(new_source_file "${CMAKE_CURRENT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/${obj_name}${source_extension}")
- if (IS_ABSOLUTE ${source_file})
- set(old_source_file ${source_file})
- else ()
- set(old_source_file "${CMAKE_CURRENT_LIST_DIR}/${source_file}")
- endif ()
-
- string(REPLACE ";" "\n#define " define_source "${obj_defines}")
- string(REPLACE "=" " " define_source "${define_source}")
- file(WRITE ${new_source_file}.tmp "#define ${define_source}\n#include \"${old_source_file}\"")
- configure_file(${new_source_file}.tmp ${new_source_file} COPYONLY)
- file(REMOVE ${new_source_file}.tmp)
- list(APPEND SRC_LIST_OUT ${new_source_file})
-
- endforeach ()
- endforeach ()
-
- list(APPEND OPENBLAS_SRC ${SRC_LIST_OUT})
- set(OPENBLAS_SRC ${OPENBLAS_SRC} PARENT_SCOPE)
- endfunction ()
-
- # generates object files for each of the sources for each of the combinations of the preprocessor definitions passed in
- # @param sources_in the source files to build from
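- # @param defines_in the preprocessor definitions that will be combined to create the object files
- # @param absent_codes_in codes to use when a define is absent from a combination (see AllCombinations)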
- # @param all_defines_in (optional) preprocessor definitions that will be applied to all objects
- # @param replace_scheme If 1, replace the "k" in the filename with the define combo letters. E.g. symm_k.c with TRANS and UNIT defined will be symm_TU.
- # If 0, it will simply append the code, e.g. symm_L.c with TRANS and UNIT will be symm_LTU.
- # If 2, it will append the code with an underscore, e.g. symm.c with TRANS and UNIT will be symm_TU.
- # If 3, it will insert the code *around* the last character with an underscore, e.g. symm_L.c with TRANS and UNIT will be symm_TLU (required by BLAS level2 objects).
- # If 4, it will insert the code before the last underscore. E.g. trtri_U_parallel with TRANS will be trtri_UT_parallel
- # @param alternate_name replaces the source name as the object name (define codes are still appended)
- # @param no_float_type turns off the float type define for this build (e.g. SINGLE/DOUBLE/etc)
- # @param complex_filename_scheme see GenerateNamedObjects
- function(GenerateCombinationObjects sources_in defines_in absent_codes_in all_defines_in replace_scheme)
-
- set(alternate_name_in "")
- if (DEFINED ARGV5)
- set(alternate_name_in ${ARGV5})
- endif ()
-
- set(no_float_type false)
- if (DEFINED ARGV6)
- set(no_float_type ${ARGV6})
- endif ()
-
- set(complex_filename_scheme "")
- if (DEFINED ARGV7)
- set(complex_filename_scheme ${ARGV7})
- endif ()
-
- AllCombinations("${defines_in}" "${absent_codes_in}")
- set(define_combos ${LIST_OUT})
- set(define_codes ${CODES_OUT})
-
- list(LENGTH define_combos num_combos)
- math(EXPR num_combos "${num_combos} - 1")
-
- foreach (c RANGE 0 ${num_combos})
-
- list(GET define_combos ${c} define_combo)
- list(GET define_codes ${c} define_code)
-
- foreach (source_file ${sources_in})
-
- set(alternate_name ${alternate_name_in})
-
- # replace colon separated list with semicolons, this turns it into a CMake list that we can use foreach with
- string(REPLACE ":" ";" define_combo ${define_combo})
-
- # now add the object and set the defines
- set(cur_defines ${define_combo})
- if ("${cur_defines}" STREQUAL " ")
- set(cur_defines ${all_defines_in})
- else ()
- list(APPEND cur_defines ${all_defines_in})
- endif ()
-
- set(replace_code "")
- set(append_code "")
- if (replace_scheme EQUAL 1)
- set(replace_code ${define_code})
- else ()
- if (replace_scheme EQUAL 2)
- set(append_code "_${define_code}")
- elseif (replace_scheme EQUAL 3)
- if ("${alternate_name}" STREQUAL "")
- string(REGEX MATCH "[a-zA-Z]\\." last_letter ${source_file})
- else ()
- string(REGEX MATCH "[a-zA-Z]$" last_letter ${alternate_name})
- endif ()
- # first extract the last letter
- string(SUBSTRING ${last_letter} 0 1 last_letter) # remove period from match
- # break the code up into the first letter and the remaining (should only be 2 anyway)
- string(SUBSTRING ${define_code} 0 1 define_code_first)
- string(SUBSTRING ${define_code} 1 -1 define_code_second)
- set(replace_code "${define_code_first}${last_letter}${define_code_second}")
- elseif (replace_scheme EQUAL 4)
- # insert code before the last underscore and pass that in as the alternate_name
- if ("${alternate_name}" STREQUAL "")
- get_filename_component(alternate_name ${source_file} NAME_WE)
- endif ()
- set(extra_underscore "")
- # check if filename has two underscores, insert another if not (e.g. getrs_parallel needs to become getrs_U_parallel not getrsU_parallel)
- string(REGEX MATCH "_[a-zA-Z]+_" underscores ${alternate_name})
- string(LENGTH "${underscores}" underscores)
- if (underscores EQUAL 0)
- set(extra_underscore "_")
- endif ()
- string(REGEX REPLACE "(.+)(_[^_]+)$" "\\1${extra_underscore}${define_code}\\2" alternate_name ${alternate_name})
- else()
- set(append_code ${define_code}) # replace_scheme should be 0
- endif ()
- endif ()
-
- GenerateNamedObjects("${source_file}" "${cur_defines}" "${alternate_name}" false "${replace_code}" "${append_code}" "${no_float_type}" "${complex_filename_scheme}")
- endforeach ()
- endforeach ()
-
- set(OPENBLAS_SRC ${OPENBLAS_SRC} PARENT_SCOPE)
- endfunction ()
-
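
The AllCombinations helper in the removed file above enumerates every subset of its input list by counting a bitmask from 0 to 2^n - 1. For each mask it builds a colon-separated combination plus a code string that takes the first letter of each present item and the matching absent code otherwise. A minimal C sketch of that per-mask step follows. It is a re-expression for illustration only, not part of the build system, and the function name `combo_for_mask` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the combination and code for one bitmask.
 * names/absent are parallel arrays of length n; bit i of mask selects names[i].
 * combo receives the colon-separated names, code receives n characters plus '\0',
 * so code must have room for at least n + 1 bytes. */
void combo_for_mask(unsigned mask, const char *const *names, const char *absent,
                    int n, char *combo, size_t combo_size, char *code)
{
    assert(combo_size > 0);
    combo[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (mask & (1u << i)) {
            /* item is present: append it, separated by ':' after the first one */
            if (combo[0] != '\0')
                strncat(combo, ":", combo_size - strlen(combo) - 1);
            strncat(combo, names[i], combo_size - strlen(combo) - 1);
            code[i] = names[i][0];
        } else {
            /* item is absent: use the caller-supplied code character */
            code[i] = absent[i];
        }
    }
    code[n] = '\0';
}
```

With the names TRANS, UNIT, UPPER and the absent codes N, N, L, mask 0 gives an empty combination with code NNL, and mask 5 gives TRANS:UPPER with code TNU. The CMake version stores the empty combination as a single space, because appending an empty string to a CMake list adds no element.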