| @@ -1,70 +1,174 @@ | |||
| ## Compile the library | |||
| This user manual covers compiling OpenBLAS itself, linking your code to OpenBLAS, | |||
| example code for using the C (CBLAS) and Fortran (BLAS) APIs, and some troubleshooting | |||
| tips. Compiling OpenBLAS is optional, since you may be able to install it with a | |||
| package manager. | |||
| !!! Note "BLAS API reference documentation" | |||
| The OpenBLAS documentation does not contain API reference documentation for | |||
| BLAS or LAPACK, since these are standardized APIs, the documentation for | |||
| which can be found in other places. If you want to understand every BLAS | |||
| function and definition, we recommend reading the | |||
| [Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation) | |||
| or the [Netlib BLAS documentation](http://netlib.org/blas/). | |||
| OpenBLAS does contain a limited number of non-standard functions; | |||
| these are documented at [OpenBLAS extension functions](extensions.md). | |||
| ## Compiling OpenBLAS | |||
| ### Normal compile | |||
| * type `make` to detect the CPU automatically. | |||
| or | |||
| * type `make TARGET=xxx` to set target CPU, e.g. `make TARGET=NEHALEM`. The full target list is in file TargetList.txt. | |||
| ### Cross compile | |||
| Please set `CC` and `FC` with the cross toolchains. Then, set `HOSTCC` with your host C compiler. At last, set `TARGET` explicitly. | |||
| The default way to build and install OpenBLAS from source is with Make: | |||
| ``` | |||
| make # add `-j4` to compile in parallel with 4 processes | |||
| make install | |||
| ``` | |||
| Examples: | |||
| By default, the CPU architecture is detected automatically when invoking | |||
| `make`, and the build is optimized for the detected CPU. To override the | |||
| autodetection, use the `TARGET` flag: | |||
| * On x86 box, compile the library for ARM Cortex-A9 linux. | |||
| ``` | |||
| # `make TARGET=xxx` sets the target CPU, e.g. for an Intel Nehalem CPU: | |||
| make TARGET=NEHALEM | |||
| ``` | |||
| The full list of known target CPU architectures can be found in | |||
| `TargetList.txt` in the root of the repository. | |||
| Install only gnueabihf versions. Please check https://github.com/xianyi/OpenBLAS/issues/936#issuecomment-237596847 | |||
| ### Cross compile | |||
| make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9 | |||
| For a basic cross-compilation with Make, three steps need to be taken: | |||
| * On X86 box, compile this library for loongson3a CPU. | |||
| - Set the `CC` and `FC` environment variables to select the cross toolchains | |||
| for C and Fortran. | |||
| - Set the `HOSTCC` environment variable to select the host C compiler (i.e. the | |||
| regular C compiler for the machine on which you are invoking the build). | |||
| - Set `TARGET` explicitly to the CPU architecture on which the produced | |||
| OpenBLAS binaries will be used. | |||
| #### Cross-compilation examples | |||
| Compile the library for ARM Cortex-A9 Linux on an x86-64 machine | |||
| _(note: install only `gnueabihf` versions of the cross toolchain - see | |||
| [this issue comment](https://github.com/OpenMathLib/OpenBLAS/issues/936#issuecomment-237596847) | |||
| for why_): | |||
| ``` | |||
| make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A | |||
| make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9 | |||
| ``` | |||
| * On X86 box, compile this library for loongson3a CPU with loongcc (based on Open64) compiler. | |||
| Compile OpenBLAS for a Loongson 3A CPU on an x86-64 machine: | |||
| ``` | |||
| make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A | |||
| ``` | |||
| Compile OpenBLAS for a Loongson 3A CPU with the `loongcc` (based on Open64) compiler on an x86-64 machine: | |||
| ``` | |||
| make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32 | |||
| ``` | |||
| ### Debug version | |||
| ### Building a debug version | |||
| make DEBUG=1 | |||
| Add `DEBUG=1` to your build command, e.g.: | |||
| ``` | |||
| make DEBUG=1 | |||
| ``` | |||
| ### Install to the directory (optional) | |||
| ### Install to a specific directory | |||
| Example: | |||
| !!! note | |||
| make install PREFIX=your_installation_directory | |||
| Installing to a directory is optional; it is also possible to use the shared or static | |||
| libraries directly from the build directory. | |||
| The default directory is /opt/OpenBLAS. Note that any flags passed to `make` during build should also be passed to `make install` to circumvent any install errors, i.e. some headers not being copied over correctly. | |||
| Use `make install` with the `PREFIX` flag to install to a specific directory: | |||
| For more information, please read [Installation Guide](install.md). | |||
| ``` | |||
| make install PREFIX=/path/to/installation/directory | |||
| ``` | |||
| The default directory is `/opt/OpenBLAS`. | |||
| !!! important | |||
| Any flags passed to `make` during the build should also be passed to | |||
| `make install` to avoid install errors, e.g. some headers not being | |||
| copied over correctly. | |||
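One way to follow this advice is to keep the build flags in a single shell variable and reuse it for both steps. The sketch below is only an illustration: the flag values are examples taken from this page, the install prefix is a placeholder, and the commands are printed rather than executed so they can be reviewed before being run from the root of the OpenBLAS source tree.

```shell
# Flags shared by the build and install steps (example values, adjust as needed).
BUILD_FLAGS="TARGET=NEHALEM DEBUG=1"
# Placeholder install location, not a recommended path.
INSTALL_PREFIX="/path/to/installation/directory"

# Both commands are derived from the same variable, so they cannot drift apart.
BUILD_CMD="make $BUILD_FLAGS"
INSTALL_CMD="make install $BUILD_FLAGS PREFIX=$INSTALL_PREFIX"

# Print the commands for review instead of running them.
echo "$BUILD_CMD"
echo "$INSTALL_CMD"
```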
| ## Link the library | |||
| For more detailed information on building/installing from source, please read | |||
| the [Installation Guide](install.md). | |||
| * Link shared library | |||
| ## Linking to OpenBLAS | |||
| OpenBLAS can be used as a shared or a static library. | |||
| ### Link a shared library | |||
| The shared library is normally called `libopenblas.so`, but note that the name | |||
| may be different as a result of build flags used or naming choices by a distro | |||
| packager (see [distributing.md](distributing.md) for details). To link a shared library named | |||
| `libopenblas.so`, the flag `-lopenblas` is needed. To find the OpenBLAS headers, | |||
| an `-I/path/to/includedir` flag is needed. Unless the library is installed in a | |||
| directory that the linker searches by default, `-L` and `-Wl,-rpath` flags | |||
| are needed as well. For a source file `test.c` (e.g., the example code under _Call | |||
| CBLAS interface_ further down), the shared library can then be linked with: | |||
| ``` | |||
| gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas | |||
| ``` | |||
| The `-Wl,-rpath,/your_path/OpenBLAS/lib` option to linker can be omitted if you ran `ldconfig` to update linker cache, put `/your_path/OpenBLAS/lib` in `/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a location that is part of the `ld.so` default search path (usually /lib,/usr/lib and /usr/local/lib). Alternatively, you can set the environment variable LD_LIBRARY_PATH to point to the folder that contains libopenblas.so. Otherwise, linking at runtime will fail with a message like `cannot open shared object file: no such file or directory` | |||
| The `-Wl,-rpath,/your_path/OpenBLAS/lib` linker flag can be omitted if you | |||
| ran `ldconfig` to update the linker cache, put `/your_path/OpenBLAS/lib` in | |||
| `/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a | |||
| location that is part of the `ld.so` default search path (usually `/lib`, | |||
| `/usr/lib` and `/usr/local/lib`). Alternatively, you can set the environment | |||
| variable `LD_LIBRARY_PATH` to point to the folder that contains `libopenblas.so`. | |||
| Otherwise, the build may succeed, but loading the library at runtime will fail | |||
| with a message like: | |||
| ``` | |||
| cannot open shared object file: No such file or directory | |||
| ``` | |||
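If the `LD_LIBRARY_PATH` route is chosen, a minimal sketch could look like the following. The directory is the same placeholder used in the linking command above. The `${VAR:+...}` expansion is used so that no empty entry is created when `LD_LIBRARY_PATH` was previously unset, since the dynamic loader treats an empty entry as the current working directory.

```shell
# Placeholder: the directory that contains libopenblas.so.
OPENBLAS_LIBDIR="/your_path/OpenBLAS/lib"

# Prepend the directory; a ":" separator is only added if the variable
# already has a value, so no empty path entry is created.
export LD_LIBRARY_PATH="${OPENBLAS_LIBDIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```

This only affects programs started from the same shell session, whereas the `-Wl,-rpath` flag records the path in the executable itself.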
| If the library is multithreaded, please add `-lpthread`. If the library contains LAPACK functions, please add `-lgfortran` or other Fortran libs, although if you only make calls to LAPACKE routines, i.e. your code has `#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`, `-lgfortran` is not needed. | |||
| More flags may be needed, depending on how OpenBLAS was built: | |||
| * Link static library | |||
| - If `libopenblas` is multi-threaded, add `-lpthread`. | |||
| - If the library contains LAPACK functions (which is usually the case), add | |||
| `-lgfortran` (other Fortran libraries may also be needed, e.g. `-lquadmath`). | |||
| Note that if you only make calls to LAPACKE routines, i.e. your code has | |||
| `#include "lapacke.h"` and makes calls to functions like `LAPACKE_dgeqrf`, | |||
| then `-lgfortran` is not needed. | |||
| !!! tip "Use pkg-config" | |||
| Usually a pkg-config file (e.g., `openblas.pc`) is installed together | |||
| with a `libopenblas` shared library. pkg-config is a tool that will | |||
| tell you the exact flags needed for linking. For example: | |||
| ``` | |||
| $ pkg-config --cflags openblas | |||
| -I/usr/local/include | |||
| $ pkg-config --libs openblas | |||
| -L/usr/local/lib -lopenblas | |||
| ``` | |||
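In a build script, the pkg-config output can be used directly, with the manual flags as a fallback. The sketch below assumes the pkg-config module is named `openblas` (the name can differ between distributions) and reuses the placeholder paths from the shared-library linking command. It prints the resulting compile command instead of running it.

```shell
# Ask pkg-config for the flags if it knows about OpenBLAS; otherwise fall back
# to the manually specified flags (placeholder paths).
if pkg-config --exists openblas 2>/dev/null; then
    OPENBLAS_FLAGS="$(pkg-config --cflags --libs openblas)"
else
    OPENBLAS_FLAGS="-I/your_path/OpenBLAS/include -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas"
fi

# Print the resulting compile command for review.
echo "gcc -o test test.c ${OPENBLAS_FLAGS}"
```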
| ### Link a static library | |||
| Linking a static library is simpler: add the path to the static OpenBLAS | |||
| library to the compile command: | |||
| ``` | |||
| gcc -o test test.c /your/path/libopenblas.a | |||
| ``` | |||
| You can download `test.c` from https://gist.github.com/xianyi/5780018 | |||
| ## Code examples | |||
| ### Call CBLAS interface | |||
| This example shows calling cblas_dgemm in C. https://gist.github.com/xianyi/6930656 | |||
| This example shows calling `cblas_dgemm` in C: | |||
| <!-- Source: https://gist.github.com/xianyi/6930656 --> | |||
| ```c | |||
| #include <cblas.h> | |||
| #include <stdio.h> | |||
| @@ -83,14 +187,17 @@ void main() | |||
| } | |||
| ``` | |||
| To compile this file, save it as `test_cblas_dgemm.c` and then run: | |||
| ``` | |||
| gcc -o test_cblas_open test_cblas_dgemm.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran | |||
| gcc -o test_cblas_open test_cblas_dgemm.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran | |||
| ``` | |||
| This will produce a `test_cblas_open` executable. | |||
| ### Call BLAS Fortran interface | |||
| This example shows calling dgemm Fortran interface in C. https://gist.github.com/xianyi/5780018 | |||
| This example shows calling the `dgemm` Fortran interface in C: | |||
| <!-- Source: https://gist.github.com/xianyi/5780018 --> | |||
| ```c | |||
| #include "stdio.h" | |||
| #include "stdlib.h" | |||
| @@ -158,22 +265,41 @@ int main(int argc, char* argv[]) | |||
| } | |||
| ``` | |||
| To compile this file, save it as `time_dgemm.c` and then run: | |||
| ``` | |||
| gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread | |||
| ./time_dgemm <m> <n> <k> | |||
| ``` | |||
| You can then run it as `./time_dgemm <m> <n> <k>`, where `m`, `n`, and `k` are the | |||
| input parameters passed to the `time_dgemm` executable. | |||
| ## Troubleshooting | |||
| !!! note | |||
| * Please read [Faq](faq.md) at first. | |||
| * Please use gcc version 4.6 and above to compile Sandy Bridge AVX kernels on Linux/MingW/BSD. | |||
| * Please use Clang version 3.1 and above to compile the library on Sandy Bridge microarchitecture. The Clang 3.0 will generate the wrong AVX binary code. | |||
| * The number of CPUs/Cores should less than or equal to 256. On Linux x86_64(amd64), there is experimental support for up to 1024 CPUs/Cores and 128 numa nodes if you build the library with BIGNUMA=1. | |||
| * OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting the line NO_AFFINITY=1 in Makefile.rule. But this may cause [the conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html). | |||
| * On Loongson 3A. make test would be failed because of pthread_create error. The error code is EAGAIN. However, it will be OK when you run the same testcase on shell. | |||
| When calling the Fortran interface from C, you have to deal with symbol name | |||
| differences caused by compiler conventions. That is why the `dgemm_` function | |||
| call in the example above has a trailing underscore. This is what it looks like | |||
| when using `gcc`/`gfortran`; such details may differ for other | |||
| compilers, so portable code requires extra support code. The CBLAS interface may be | |||
| more portable when writing C code. | |||
| ## BLAS reference manual | |||
| When writing code that needs to be portable and work across different | |||
| platforms and compilers, the above code example is not recommended. | |||
| Instead, we advise looking at how OpenBLAS (or BLAS in general, since | |||
| this problem isn't specific to OpenBLAS) functions are called in widely | |||
| used projects like Julia, SciPy, or R. | |||
| If you want to understand every BLAS function and definition, please read [Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation) or [netlib.org](http://netlib.org/blas/) | |||
| Here are [OpenBLAS extension functions](extensions.md) | |||
| ## Troubleshooting | |||
| * Please read the [FAQ](faq.md) first; your problem may be described there. | |||
| * Please ensure you are using a recent enough compiler that supports the | |||
| features your CPU provides (for example, GCC versions before 4.6 were known | |||
| not to support AVX kernels, and versions before 6.1 not to support AVX512CD kernels). | |||
| * The number of CPU cores supported by default is <=256. On Linux x86-64, there | |||
| is experimental support for up to 1024 cores and 128 NUMA nodes if you build | |||
| the library with `BIGNUMA=1`. | |||
| * OpenBLAS does not set processor affinity by default. On Linux, you can enable | |||
| processor affinity by commenting out the line `NO_AFFINITY=1` in | |||
| `Makefile.rule`. | |||
| * On Loongson 3A, `make test` is known to fail with a `pthread_create` error | |||
| and an `EAGAIN` error code. However, the same test case passes when run | |||
| directly in a shell. | |||
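The compiler-version advice in the list above can be checked with a small script before building. This is a rough sketch, not part of the OpenBLAS build system: `version_at_least` is a hypothetical helper that relies on `sort -V` from GNU coreutils, and the thresholds are the GCC versions quoted above (4.6 for AVX, 6.1 for AVX512CD).

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v gcc >/dev/null 2>&1; then
    # Commonly used idiom: newer GCC answers -dumpfullversion, while older
    # releases are expected to fall back to -dumpversion.
    GCC_VERSION="$(gcc -dumpfullversion -dumpversion)"
    if version_at_least "$GCC_VERSION" "6.1"; then
        echo "gcc $GCC_VERSION: at or above the 6.1 threshold quoted for AVX512CD kernels"
    elif version_at_least "$GCC_VERSION" "4.6"; then
        echo "gcc $GCC_VERSION: at or above 4.6 (AVX), below 6.1 (AVX512CD)"
    else
        echo "gcc $GCC_VERSION: below the 4.6 threshold quoted for AVX kernels"
    fi
else
    echo "gcc not found in PATH"
fi
```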