From 970e48e9e5530ba7d6289708c21e880a485d5e54 Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 15:35:21 +0100 Subject: [PATCH 01/10] docs: improve readability of the Build system page This only fixes Markdown syntax, and adds a few headers to bring some structure into the long list of variables that influence the build. It does not add or remove variables. --- docs/build_system.md | 113 ++++++++++++++++++++++++++----------------- 1 file changed, 69 insertions(+), 44 deletions(-) diff --git a/docs/build_system.md b/docs/build_system.md index 3de220580..872553749 100644 --- a/docs/build_system.md +++ b/docs/build_system.md @@ -1,7 +1,10 @@ -This page describes the Make-based build, which is the default/authoritative -build method. Note that the OpenBLAS repository also supports building with -CMake (not described here) - that generally works and is tested, however there -may be small differences between the Make and CMake builds. +!!! info "Supported build systems" + + This page describes the Make-based build, which is the + default/authoritative build method. Note that the OpenBLAS repository also + supports building with CMake (not described here) - that generally works + and is tested, however there may be small differences between the Make and + CMake builds. !!! warning This page is made by someone who is not the developer and should not be considered as an official documentation of the build system. For getting the full picture, it is best to read the Makefiles and understand them yourself. @@ -49,56 +52,78 @@ Makefile ## Important Variables -Most of the tunable variables are found in [Makefile.rule](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.rule), along with their detailed descriptions.
-Most of the variables are detected automatically in [Makefile.prebuild](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.prebuild), if they are not set in the environment. +Most of the tunable variables are found in +[Makefile.rule](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.rule), +along with their detailed descriptions. -### CPU related -``` -ARCH - Target architecture (eg. x86_64) -TARGET - Target CPU architecture, in case of DYNAMIC_ARCH=1 means library will not be usable on less capable CPUs -TARGET_CORE - TARGET_CORE will override TARGET internally during each cpu-specific cycle of the build for DYNAMIC_ARCH -DYNAMIC_ARCH - For building library for multiple TARGETs (does not lose any optimizations, but increases library size) -DYNAMIC_LIST - optional user-provided subset of the DYNAMIC_CORE list in Makefile.system -``` +Most of the variables are detected automatically in +[Makefile.prebuild](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.prebuild), +if they are not set in the environment. -### Toolchain related -``` -CC - TARGET C compiler used for compilation (can be cross-toolchains) -FC - TARGET Fortran compiler used for compilation (can be cross-toolchains, set NOFORTRAN=1 if used cross-toolchain has no fortran compiler) -AR, AS, LD, RANLIB - TARGET toolchain helpers used for compilation (can be cross-toolchains) -HOSTCC - compiler of build machine, needed to create proper config files for target architecture -HOST_CFLAGS - flags for build machine compiler -``` +### CPU related -### Library related -``` -BINARY - 32/64 bit library +- `ARCH`: target architecture (e.g., `x86_64`). +- `DYNAMIC_ARCH`: For building library for multiple `TARGET`s (does not lose any + optimizations, but increases library size). +- `DYNAMIC_LIST`: optional user-provided subset of the `DYNAMIC_CORE` list in + [Makefile.system](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system). +- `TARGET`: target CPU architecture. 
In case of `DYNAMIC_ARCH=1`, it means that + the library will not be usable on less capable CPUs. +- `TARGET_CORE`: override `TARGET` internally during each CPU-specific cycle of + the build for `DYNAMIC_ARCH`. -BUILD_SHARED - Create shared library -BUILD_STATIC - Create static library -QUAD_PRECISION - enable support for IEEE quad precision [ largely unimplemented leftover from GotoBLAS, do not use ] -EXPRECISION - Obsolete option to use float80 of SSE on BSD-like systems -INTERFACE64 - Build with 64bit integer representations to support large array index values [ incompatible with standard API ] +### Toolchain related -BUILD_SINGLE - build the single-precision real functions of BLAS [and optionally LAPACK] -BUILD_DOUBLE - build the double-precision real functions -BUILD_COMPLEX - build the single-precision complex functions -BUILD_COMPLEX16 - build the double-precision complex functions -(all four types are included in the build by default when none was specifically selected) +- `CC`: `TARGET` C compiler used for compilation (can be cross-toolchains). +- `FC`: `TARGET` Fortran compiler used for compilation (can be cross-toolchains, + set `NOFORTRAN=1` if the used cross-toolchain has no Fortran compiler). +- `AR`, `AS`, `LD`, `RANLIB`: `TARGET` toolchain helpers used for compilation + (can be cross-toolchains). +- `HOSTCC`: compiler of build machine, needed to create proper config files for + the target architecture. +- `HOST_CFLAGS`: flags for the build machine compiler. 
-BUILD_BFLOAT16 - build the "half precision brainfloat" real functions - -USE_THREAD - Use a multithreading backend (default to pthread) -USE_LOCKING - implement locking for thread safety even when USE_THREAD is not set (so that the singlethreaded library can - safely be called from multithreaded programs) -USE_OPENMP - Use OpenMP as multithreading backend -NUM_THREADS - define this to the maximum number of parallel threads you expect to need (defaults to the number of cores in the build cpu) -NUM_PARALLEL - define this to the number of OpenMP instances that your code may use for parallel calls into OpenBLAS (default 1,see below) -``` +### Library related +#### Library kind and bitness options + +- `BINARY`: whether to build a 32-bit or 64-bit library (default is `64`, set + to `32` on a 32-bit platform). +- `BUILD_SHARED`: create a shared library +- `BUILD_STATIC`: create a static library +- `INTERFACE64`: build with 64-bit (ILP64) integer representations to support + large array index values (incompatible with the standard 32-bit integer (LP64) API). + +#### Data type options + +- `BUILD_SINGLE`: build the single-precision real functions of BLAS and (if + it's built) LAPACK +- `BUILD_DOUBLE`: build the double-precision real functions +- `BUILD_COMPLEX`: build the single-precision complex functions +- `BUILD_COMPLEX16`: build the double-precision complex functions +- `BUILD_BFLOAT16`: build the "half precision brainfloat" real functions +- `EXPRECISION`: obsolete option to use float80 of SSE on BSD-like systems +- `QUAD_PRECISION`: enable support for IEEE quad precision (largely + unimplemented leftover from GotoBLAS, do not use) + +By default, the single- and double-precision real and complex floating-point +functions are included in the build, while the half- and extended-precision +functions are not. + +#### Threading options + +- `USE_THREAD`: Use a multithreading backend (defaults to `pthreads`). 
+- `USE_LOCKING`: implement locking for thread safety even when `USE_THREAD` is + not set (so that the single-threaded library can safely be called from + multithreaded programs). +- `USE_OPENMP`: Use OpenMP as multithreading backend +- `NUM_THREADS`: define this to the maximum number of parallel threads you + expect to need (defaults to the number of cores in the build CPU). +- `NUM_PARALLEL`: define this to the number of OpenMP instances that your code + may use for parallel calls into OpenBLAS (the default is `1`, see below). OpenBLAS uses a fixed set of memory buffers internally, used for communicating and compiling partial results from individual threads. For efficiency, the From d4addc0688b0d12f91b15d6420b5ea966802e8b4 Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 16:02:34 +0100 Subject: [PATCH 02/10] docs: improve description of library, data type and toolchain build variables --- docs/build_system.md | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/docs/build_system.md b/docs/build_system.md index 872553749..9ceed1365 100644 --- a/docs/build_system.md +++ b/docs/build_system.md @@ -79,6 +79,13 @@ if they are not set in the environment. - `CC`: `TARGET` C compiler used for compilation (can be cross-toolchains). - `FC`: `TARGET` Fortran compiler used for compilation (can be cross-toolchains, set `NOFORTRAN=1` if the used cross-toolchain has no Fortran compiler). +- `COMMON_OPT`: flags to add to all invocations of the target C and Fortran compilers + (overrides `CFLAGS`/`FFLAGS` - prefer using `COMMON_OPT`) +- `CCOMMON_OPT`: flags to add to all invocations of the target C compiler + (overrides `CFLAGS`) +- `FCOMMON_OPT`: flags to add to all invocations of the target Fortran compiler + (overrides `FFLAGS`) +- `LDFLAGS`: flags to add to all target linker invocations - `AR`, `AS`, `LD`, `RANLIB`: `TARGET` toolchain helpers used for compilation (can be cross-toolchains). 
- `HOSTCC`: compiler of build machine, needed to create proper config files for @@ -92,11 +99,13 @@ if they are not set in the environment. - `BINARY`: whether to build a 32-bit or 64-bit library (default is `64`, set to `32` on a 32-bit platform). -- `BUILD_SHARED`: create a shared library -- `BUILD_STATIC`: create a static library - `INTERFACE64`: build with 64-bit (ILP64) integer representations to support large array index values (incompatible with the standard 32-bit integer (LP64) API). +Note that both shared and static libraries will be built with the Make-based +build. The CMake build provides `BUILD_SHARED_LIBS`/`BUILD_STATIC_LIBS` +variables to allow building only one of the two. + #### Data type options - `BUILD_SINGLE`: build the single-precision real functions of BLAS and (if @@ -105,9 +114,8 @@ if they are not set in the environment. - `BUILD_COMPLEX`: build the single-precision complex functions - `BUILD_COMPLEX16`: build the double-precision complex functions - `BUILD_BFLOAT16`: build the "half precision brainfloat" real functions -- `EXPRECISION`: obsolete option to use float80 of SSE on BSD-like systems -- `QUAD_PRECISION`: enable support for IEEE quad precision (largely - unimplemented leftover from GotoBLAS, do not use) +- `EXPRECISION`: (do not use, this is a work in progress) option to use `long + double` functions By default, the single- and double-precision real and complex floating-point functions are included in the build, while the half- and extended-precision From c526b10b6897bfa7099e9e00060fb35a1bbbc3b5 Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 16:18:26 +0100 Subject: [PATCH 03/10] docs: add library and symbol name build variables --- docs/build_system.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/docs/build_system.md b/docs/build_system.md index 9ceed1365..aa5d1fe12 100644 --- a/docs/build_system.md +++ b/docs/build_system.md @@ -151,3 +151,17 @@ same time, then only one of them will be 
able to make progress while all the rest of them spin-wait for the one available buffer. Setting `NUM_PARALLEL` to the upper bound on the number of OpenMP runtimes that you can have in a process ensures that there are a sufficient number of buffer sets available. + +#### Library and symbol name options + +- `FIXED_LIBNAME`: if set to `1`, uses a non-versioned name for the library and + no symbolic linking to variant names (default is `0`) +- `LIBNAMEPREFIX`: prefix that, if given, will be inserted in the library name + before `openblas` (e.g., `xxx` will result in `libxxxopenblas.so`) +- `LIBNAMESUFFIX`: suffix that, if given, will be inserted in the library name + after `openblas`, separated by an underscore (e.g., `yyy` will result in + `libopenblas_yyy.so`) +- `SYMBOLPREFIX`: prefix that, if given, will be added to all symbol names + *and* to the library name +- `SYMBOLSUFFIX`: suffix that, if given, will be added to all symbol names + *and* to the library name From ed114150d13a2e3203fb0bffc8587330d33896a7 Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 16:28:31 +0100 Subject: [PATCH 04/10] docs: add the build variables for BLAS/LAPACK functionality --- docs/build_system.md | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/docs/build_system.md b/docs/build_system.md index aa5d1fe12..c8b8f36ea 100644 --- a/docs/build_system.md +++ b/docs/build_system.md @@ -60,6 +60,8 @@ Most of the variables are detected automatically in [Makefile.prebuild](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.prebuild), if they are not set in the environment. +The most commonly used variables are documented below. There are more options +though - please read the linked Makefiles if you want to see all variables. ### CPU related @@ -101,10 +103,8 @@ if they are not set in the environment. to `32` on a 32-bit platform). 
- `INTERFACE64`: build with 64-bit (ILP64) integer representations to support large array index values (incompatible with the standard 32-bit integer (LP64) API). - -Note that both shared and static libraries will be built with the Make-based -build. The CMake build provides `BUILD_SHARED_LIBS`/`BUILD_STATIC_LIBS` -variables to allow building only one of the two. +- `NO_STATIC`: if set to `1`, don't build a static library (default is `0`) +- `NO_SHARED`: if set to `1`, don't build a shared library (default is `0`) #### Data type options @@ -165,3 +165,18 @@ ensures that there are a sufficient number of buffer sets available. *and* to the library name - `SYMBOLSUFFIX`: suffix that, if given, will be added to all symbol names *and* to the library name + +#### BLAS and LAPACK options + +By default, the Fortran and C interfaces to BLAS and LAPACK are built, +including deprecated functions, while +[ReLAPACK](https://github.com/HPAC/ReLAPACK) is not. + +- `NO_CBLAS`: if set to `1`, don't build the CBLAS interface (default is `0`) +- `ONLY_CBLAS`: if set to `1`, only build the CBLAS interface (default is `0`) +- `NO_LAPACK`: if set to `1`, don't build LAPACK (default is `0`) +- `NO_LAPACKE`: if set to `1`, don't build the LAPACKE interface (default is `0`) +- `BUILD_LAPACK_DEPRECATED`: if set to `0`, don't build deprecated LAPACK + functions (default is `1`) +- `BUILD_RELAPACK`: if set to `1`, build Recursive LAPACK on top of LAPACK + (default is `0`) From 5aa1845a43e2bbe7a4d269de54dac05916eb5613 Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 16:55:43 +0100 Subject: [PATCH 05/10] docs: fix two broken links related to MSVC The doc build is now clean of warnings again. 
--- docs/faq.md | 2 +- docs/install.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/faq.md b/docs/faq.md index 1a3505ca9..93d76c67f 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -99,7 +99,7 @@ Here is the result of the DGEMM subroutine's performance on Intel Core i5-2500K ### How can I call an OpenBLAS function in Microsoft Visual Studio? -Please read [this page](install.md#visual-studio). +Please read [this page](install.md#visual-studio-native-windows-abi). ### How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)? diff --git a/docs/install.md b/docs/install.md index b7d8a3616..55ebc35c1 100644 --- a/docs/install.md +++ b/docs/install.md @@ -505,7 +505,7 @@ In your shell, move to this directory: `cd exports`. incompatibility in the C ABI would be a bug). The import libraries of MSVC have the suffix `.lib`. They are generated - from a `.def` file using MSVC's `lib.exe`. See [the MSVC instructions](use_visual_studio.md#generate-import-library-before-0210-version). + from a `.def` file using MSVC's `lib.exe`. === "MinGW" From f764d76a4a0306517727abac4c5ec4f924629666 Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 18:10:41 +0100 Subject: [PATCH 06/10] docs: improve the Makefile dependency graph Uses Mermaid to render it as a diagram in the html docs. 
--- .github/workflows/docs.yml | 2 +- docs/build_system.md | 65 +++++++++++++++----------------------- mkdocs.yml | 7 +++- 3 files changed, 33 insertions(+), 41 deletions(-) diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index da40b853f..391183d1c 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -23,7 +23,7 @@ jobs: python-version: "3.10" - name: Install MkDocs and doc theme packages - run: pip install mkdocs mkdocs-material mkdocs-git-revision-date-localized-plugin + run: pip install mkdocs mkdocs-material mkdocs-git-revision-date-localized-plugin mkdocs-mermaid2-plugin - name: Build docs site run: mkdocs build diff --git a/docs/build_system.md b/docs/build_system.md index c8b8f36ea..f26bfb917 100644 --- a/docs/build_system.md +++ b/docs/build_system.md @@ -9,47 +9,34 @@ !!! warning This page is made by someone who is not the developer and should not be considered as an official documentation of the build system. For getting the full picture, it is best to read the Makefiles and understand them yourself. -## Makefile dep graph - -``` -Makefile -| -|----- Makefile.system # !!! this is included by many of the Makefiles in the subdirectories !!! -| | -| |===== Makefile.prebuild # This is triggered (not included) once by Makefile.system -| | | # and runs before any of the actual library code is built. 
-| | | # (builds and runs the "getarch" tool for cpu identification, -| | | # runs the compiler detection scripts c_check and f_check) -| | | -| | ----- (Makefile.conf) [ either this or Makefile_kernel.conf is generated ] -| | | { Makefile.system#L243 } -| | ----- (Makefile_kernel.conf) [ temporary Makefile.conf during DYNAMIC_ARCH builds ] -| | -| |----- Makefile.rule # defaults for build options that can be given on the make command line -| | -| |----- Makefile.$(ARCH) # architecture-specific compiler options and OpenBLAS buffer size values -| -|~~~~~ exports/ -| -|~~~~~ test/ -| -|~~~~~ utest/ -| -|~~~~~ ctest/ -| -|~~~~~ cpp_thread_test/ -| -|~~~~~ kernel/ -| -|~~~~~ ${SUBDIRS} -| -|~~~~~ ${BLASDIRS} -| -|~~~~~ ${NETLIB_LAPACK_DIR}{,/timing,/testing/{EIG,LIN}} -| -|~~~~~ relapack/ +## Makefile dependency graph + + + +```mermaid +flowchart LR + A[Makefile] -->|included by many of the Makefiles in the subdirectories!| B(Makefile.system) + B -->|triggered, not included, once by Makefile.system, and runs before any of the actual library code is built. 
builds and runs the 'getarch' tool for cpu identification, runs the compiler detection scripts c_check/f_check| C{Makefile.prebuild} + C -->|either this or Makefile_kernel.conf is generated| D[Makefile.conf] + C -->|temporary Makefile.conf during DYNAMIC_ARCH builds| E[Makefile_kernel.conf] + B -->|defaults for build options that can be given on the make command line| F[Makefile.rule] + B -->|architecture-specific compiler options and OpenBLAS buffer size values| G[Makefile.$ARCH] + A --> exports + A -->|directories: test, ctest, utest, cpp_thread_test| H(test directories) + A --> I($BLASDIRS) + I --> interface + I --> driver/level2 + I --> driver/level3 + I --> driver/others + A -->|for each target in DYNAMIC_CORE if DYNAMIC_ARCH=1| kernel + A -->|subdirs: timing, testing, testing/EIG, testing/LIN| J($NETLIB_LAPACK_DIR) + A --> relapack ``` + ## Important Variables Most of the tunable variables are found in diff --git a/mkdocs.yml b/mkdocs.yml index 374b03e39..6e2b33be2 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -26,13 +26,18 @@ theme: plugins: - search + - mermaid2 - git-revision-date-localized: enable_creation_date: true markdown_extensions: - admonition - pymdownx.details - - pymdownx.superfences + - pymdownx.superfences: + custom_fences: + - name: mermaid + class: mermaid + format: !!python/name:mermaid2.fence_mermaid_custom - footnotes - pymdownx.tabbed: alternate_style: true From c0bf48fbf32da2197fa5093f0cc4a30f0b05238f Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 18:13:40 +0100 Subject: [PATCH 07/10] docs: remove warning on the Build system page Content is reviewed fairly carefully, and should be up to the same standard as the rest of the docs now. 
--- docs/build_system.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/build_system.md b/docs/build_system.md index f26bfb917..d5d76cc46 100644 --- a/docs/build_system.md +++ b/docs/build_system.md @@ -6,8 +6,6 @@ and is tested, however there may be small differences between the Make and CMake builds. -!!! warning - This page is made by someone who is not the developer and should not be considered as an official documentation of the build system. For getting the full picture, it is best to read the Makefiles and understand them yourself. ## Makefile dependency graph From 1833e68bee0bc2fee5dcc7f8b45580bd29269606 Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 20:55:39 +0100 Subject: [PATCH 08/10] docs: improve rendering of "Runtime variables" page --- docs/runtime_variables.md | 45 +++++++++++++++++++++++++-------------- mkdocs.yml | 1 + 2 files changed, 30 insertions(+), 16 deletions(-) diff --git a/docs/runtime_variables.md b/docs/runtime_variables.md index a43b98cac..f1ffb791f 100644 --- a/docs/runtime_variables.md +++ b/docs/runtime_variables.md @@ -1,25 +1,38 @@ -## Runtime variables - OpenBLAS checks the following environment variables on startup: -* **OPENBLAS_NUM_THREADS=** the number of threads to use (for non-OpenMP-builds of OpenBLAS) -* **OMP_NUM_THREADS=** the number of threads to use (for OpenMP builds - note that setting this may also affect any other OpenMP code) -* **OPENBLAS_DEFAULT_NUM_THREADS=** the number of threads to use, irrespective if OpenBLAS was built for OpenMP or pthreads +* `OPENBLAS_NUM_THREADS`: the number of threads to use (for non-OpenMP builds + of OpenBLAS) +* `OMP_NUM_THREADS`: the number of threads to use (for OpenMP builds - note + that setting this may also affect any other OpenMP code) +* `OPENBLAS_DEFAULT_NUM_THREADS`: the number of threads to use, irrespective if + OpenBLAS was built for OpenMP or pthreads + +* `OPENBLAS_MAIN_FREE=1`: this can be used to disable automatic assignment of + cpu 
affinity in OpenBLAS builds that have it enabled by default +* `OPENBLAS_THREAD_TIMEOUT`: this can be used to define the length of time + that idle threads should wait before exiting +* `OMP_ADAPTIVE=1`: this can be used in OpenMP builds to actually remove any + surplus threads when the number of threads is decreased -* **OPENBLAS_MAIN_FREE=1**" this can be used to disable automatic assignment of cpu affinity in OpenBLAS builds that have it enabled by default -* **OPENBLAS_THREAD_TIMEOUT=** this can be used to define the length of time that idle threads should wait before exiting -* **OMP_ADAPTIVE=1** this can be used in OpenMP builds to actually remove any surplus threads when the number of threads is decreased +`DYNAMIC_ARCH` builds also accept the following: -DYNAMIC_ARCH builds also accept the following: -* **OPENBLAS_VERBOSE=** set this to "1" to enable a warning when there is no exact match for the detected cpu in the library - set this to "2" to make OpenBLAS print the name of the cpu target it autodetected -* **OPENBLAS_CORETYPE=** set this to one of the supported target names to override autodetection, e.g. 
OPENBLAS_CORETYPE=HASWELL -* **OPENBLAS_L2_SIZE=** set this to override the autodetected size of the L2 cache where it is not reported correctly (in virtual environments) +* `OPENBLAS_VERBOSE`: + - set this to `1` to enable a warning when there is no exact match for the + detected cpu in the library + - set this to `2` to make OpenBLAS print the name of the cpu target it + autodetected + +* `OPENBLAS_CORETYPE`: set this to one of the supported target names to + override autodetection, e.g., `OPENBLAS_CORETYPE=HASWELL` +* `OPENBLAS_L2_SIZE`: set this to override the autodetected size of the L2 + cache where it is not reported correctly (in virtual environments) Deprecated variables still recognized for compatibilty: -* **GOTO_NUM_THREADS=** equivalent to **OPENBLAS_NUM_THREADS** -* **GOTOBLAS_MAIN_FREE** equivalent to **OPENBLAS_MAIN_FREE** -* **OPENBLAS_BLOCK_FACTOR** this applies a scale factor to the GEMM "P" parameter of the block matrix code, see file driver/others/parameter.cen + +* `GOTO_NUM_THREADS`: equivalent to `OPENBLAS_NUM_THREADS` +* `GOTOBLAS_MAIN_FREE`: equivalent to `OPENBLAS_MAIN_FREE` +* `OPENBLAS_BLOCK_FACTOR`: this applies a scale factor to the GEMM "P" + parameter of the block matrix code, see file `driver/others/parameter.c` diff --git a/mkdocs.yml b/mkdocs.yml index 6e2b33be2..333344fe3 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -51,6 +51,7 @@ nav: - extensions.md - developers.md - build_system.md + - runtime_variables.md - distributing.md - ci.md - about.md From eda80f436a35491078e226ae6a471c419e8fda7a Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 21:10:43 +0100 Subject: [PATCH 09/10] docs: improve rendering of Windows on Arm instructions --- docs/install.md | 59 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 39 insertions(+), 20 deletions(-) diff --git a/docs/install.md b/docs/install.md index 55ebc35c1..7ac85e82d 100644 --- a/docs/install.md +++ b/docs/install.md @@ -443,28 +443,43 @@ A fully 
functional native OpenBLAS for WoA that can be built as both a static an (Note that you can use the free "Visual Studio 2022 Community Edition" for this task. In principle it would be possible to build with VisualStudio alone, but using the LLVM toolchain enables native compilation of the Fortran sources of LAPACK and of all the optimized assembly files, which VisualStudio cannot handle on its own) -1. Clone OpenBLAS to your local machine and checkout to latest release of OpenBLAS (unless you want to build the latest development snapshot - here we are using the 0.3.28 release as the example, of course this exact version may be outdated by the time you read this) +1. Clone OpenBLAS to your local machine and checkout to latest release of + OpenBLAS (unless you want to build the latest development snapshot - here we + are using the 0.3.28 release as the example, of course this exact version + may be outdated by the time you read this) - ```cmd - git clone https://github.com/OpenMathLib/OpenBLAS.git - cd OpenBLAS - git checkout v0.3.28 - ``` + ```cmd + git clone https://github.com/OpenMathLib/OpenBLAS.git + cd OpenBLAS + git checkout v0.3.28 + ``` 2. Install Latest LLVM toolchain for WoA: -Download the Latest LLVM toolchain for WoA from [the Release page](https://github.com/llvm/llvm-project/releases/tag/llvmorg-19.1.5). At the time of writing, this is version 19.1.5 - be sure to select the latest release for which you can find a precompiled package whose name ends in "-woa64.exe" (precompiled packages -usually lag a week or two behind their corresponding source release). -Make sure to enable the option “Add LLVM to the system PATH for all the users” -Note: Make sure that the path of LLVM toolchain is at the top of Environment Variables section to avoid conflicts between the set of compilers available in the system path + Download the Latest LLVM toolchain for WoA from [the Release + page](https://github.com/llvm/llvm-project/releases/tag/llvmorg-19.1.5). 
 At + the time of writing, this is version 19.1.5 - be sure to select the + latest release for which you can find a precompiled package whose name ends + in "-woa64.exe" (precompiled packages usually lag a week or two behind their + corresponding source release). Make sure to enable the option + *“Add LLVM to the system PATH for all the users”*. + + Note: Make sure that the path of LLVM toolchain is at the top of Environment + Variables section to avoid conflicts between the set of compilers available + in the system path 3. Launch the Native Command Prompt for Windows ARM64: -From the start menu search for “ARM64 Native Tools Command Prompt for Visual Studio 2022” -Alternatively open command prompt, run the following command to activate the environment: -"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsarm64.bat" + From the start menu search for *"ARM64 Native Tools Command Prompt for Visual + Studio 2022"*. Alternatively, open a command prompt and run the following command to + activate the environment: + + ```cmd + "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsarm64.bat" + ``` -Navigate to the OpenBLAS source code directory and start building OpenBLAS by invoking Ninja: +4. Navigate to the OpenBLAS source code directory and start building OpenBLAS + by invoking Ninja: ```cmd cd OpenBLAS @@ -476,14 +491,18 @@ Navigate to the OpenBLAS source code directory and start building OpenBLAS by in ninja -j16 ``` -Note: You might want to include additional options in the cmake command here. For example, the default configuration only generates a static.lib version of the library. If you prefer a DLL, you can add -DBUILD_SHARED_LIBS=ON. - -Note that it is also possible to use the same setup to build OpenBLAS with Make, if you prepare Makefiles over the CMake build for some reason: + Note: You might want to include additional options in the cmake command + here. 
For example, the default configuration only generates a + `static.lib` version of the library. If you prefer a DLL, you can add + `-DBUILD_SHARED_LIBS=ON`. - ```cmd - $ make CC=clang-cl FC=flang-new AR="llvm-ar" TARGET=ARMV8 ARCH=arm64 RANLIB="llvm-ranlib" MAKE=make - ``` + Note that it is also possible to use the same setup to build OpenBLAS + with Make, if you prefer Makefiles over the CMake build for some + reason: + ```cmd + $ make CC=clang-cl FC=flang-new AR="llvm-ar" TARGET=ARMV8 ARCH=arm64 RANLIB="llvm-ranlib" MAKE=make + ``` #### Generating an import library From f697cfe0d0023afd96bae6bc1026b0d451e1ce6e Mon Sep 17 00:00:00 2001 From: Ralf Gommers Date: Sat, 4 Jan 2025 21:18:07 +0100 Subject: [PATCH 10/10] docs: improve the rendering of the HarmonyOS build instructions --- docs/install.md | 35 +++++++++++++++++++++++------------ 1 file changed, 23 insertions(+), 12 deletions(-) diff --git a/docs/install.md b/docs/install.md index 7ac85e82d..a3174202f 100644 --- a/docs/install.md +++ b/docs/install.md @@ -711,25 +711,36 @@ to the minimum iOS version you want to target and execute this file to build the ### HarmonyOS -For this target you will need the cross-compiler toolchain package by Huawei, which contains solutions for both Windows and Linux. Only the Linux-based -toolchain has been tested so far, but the following instructions may apply similarly to Windows: - -Download https://repo.huaweicloud.com/harmonyos/os/4.1.1-Release/ohos-sdk-windows_linux-public.tar.gz (or whatever newer version may be available in the future). Use tar xvf ohos-sdk-windows_linux_public.tar.gz to unpack it somewhere on your system. This will create a folder named "ohos-sdk" with subfolders "linux" and "windows". In the linux one you will find a ZIP archive named "native-linux-x64-4.1.7.8-Release.zip" - you need to unzip this where you want to -install the cross-compiler, for example in /opt/ohos-sdk. 
+For this target you will need the cross-compiler toolchain package by Huawei, +which contains solutions for both Windows and Linux. Only the Linux-based +toolchain has been tested so far, but the following instructions may apply +similarly to Windows: + +Download [this HarmonyOS 4.1.1 SDK](https://repo.huaweicloud.com/harmonyos/os/4.1.1-Release/ohos-sdk-windows_linux-public.tar.gz) +(or whatever newer version may be available in the future). Use `tar -xvf +ohos-sdk-windows_linux-public.tar.gz` to unpack it somewhere on your system. +This will create a folder named "ohos-sdk" with subfolders "linux" and +"windows". In the linux one you will find a ZIP archive named +`native-linux-x64-4.1.7.8-Release.zip` - you need to unzip this where you want +to install the cross-compiler, for example in `/opt/ohos-sdk`. In the directory where you unpacked OpenBLAS, create a build directory for cmake, and change into it : -``` +```bash mkdir build cd build ``` -Use the version of `cmake` that came with the SDK, and specify the location of its toolchain file as a cmake option. Also set the build target for OpenBLAS to ARMV8 and specify NOFORTRAN=1 (at least as of version 4.1.1, the SDK contains no Fortran compiler): -``` -/opt/ohos-sdk/linux/native/build-tools/cmake/bin/cmake -DCMAKE_TOOLCHAIN_FILE=/opt/ohos-sdk/linux/native/build/cmake/ohos.toolchain.cmake \ +Use the version of `cmake` that came with the SDK, and specify the location of +its toolchain file as a cmake option. Also set the build target for OpenBLAS to +`ARMV8` and specify `NOFORTRAN=1` (at least as of version 4.1.1, the SDK +contains no Fortran compiler): +```bash +/opt/ohos-sdk/linux/native/build-tools/cmake/bin/cmake \ + -DCMAKE_TOOLCHAIN_FILE=/opt/ohos-sdk/linux/native/build/cmake/ohos.toolchain.cmake \ -DOHOS_ARCH="arm64-v8a" -DTARGET=ARMV8 -DNOFORTRAN=1 .. ``` -Additional other OpenBLAS build options like USE_OPENMP=1 or DYNAMIC_ARCH=1 will probably work too. 
-Finally do the build: -``` +Additional other OpenBLAS build options like `USE_OPENMP=1` or `DYNAMIC_ARCH=1` +will probably work too. Finally do the build: +```bash /opt/ohos-sdk/linux/native/build-tools/cmake/bin/cmake --build . ```
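
Review note: the HarmonyOS configure and build steps documented in the last patch could be wrapped in a small script. The following is only a dry-run sketch that prints the two commands instead of running them, assuming the SDK was unpacked to `/opt/ohos-sdk` as in the example above. The names `OHOS_SDK` and `ohos_cmake_commands` are hypothetical and are not part of OpenBLAS or the SDK.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the HarmonyOS configure and build commands.
# OHOS_SDK and ohos_cmake_commands are illustrative names (assumptions).
OHOS_SDK="${OHOS_SDK:-/opt/ohos-sdk}"

ohos_cmake_commands() {
    # Paths below follow the layout described in the instructions.
    local cmake_bin="${OHOS_SDK}/linux/native/build-tools/cmake/bin/cmake"
    local toolchain="${OHOS_SDK}/linux/native/build/cmake/ohos.toolchain.cmake"
    # Configure step, with the same options as in the documented command.
    printf '%s\n' "${cmake_bin} -DCMAKE_TOOLCHAIN_FILE=${toolchain} -DOHOS_ARCH=arm64-v8a -DTARGET=ARMV8 -DNOFORTRAN=1 .."
    # Build step.
    printf '%s\n' "${cmake_bin} --build ."
}

ohos_cmake_commands
```

Run from inside the `build` directory, the printed lines can be inspected and then executed by hand; setting `OHOS_SDK` to another location adjusts both paths together.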