@@ -1481,7 +1501,7 @@ Here is the result of the DGEMM subroutine's performance on Intel Core i5-2500K
<h2 id="os-and-compiler">OS and Compiler</h2>
<h3 id="how-can-i-call-an-openblas-function-in-microsoft-visual-studio"><a name="MSVC"></a>How can I call an OpenBLAS function in Microsoft Visual Studio?</h3>
<h3 id="how-can-i-use-cblas-and-lapacke-without-c99-complex-number-support-eg-in-visual-studio"><a name="C99_complex_number"></a>How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?</h3>
<p>Zaheer has fixed this bug. You can now use the structure instead of C99 complex numbers. Please read <a href="http://github.com/xianyi/OpenBLAS/issues/95">this issue page</a> for details.</p>
<p><a href="https://github.com/OpenMathLib/OpenBLAS/issues/305">This issue</a> is for using LAPACKE in Visual Studio.</p>
@@ -1667,7 +1687,7 @@ from the main thread of your program (i.e. outside of omp parallel constructs),
@@ -1320,7 +1340,10 @@ use the <a href="https://github.com/rainers/cv2pdb">cv2pdb</a> tool to do so.</p
the LLVM toolchain enables native compilation of the Fortran sources of LAPACK and of all the optimized assembly files, which Visual Studio cannot handle on its own)</p>
<ol>
<li>
<p>Clone OpenBLAS to your local machine and check out the latest release of OpenBLAS (unless you want to build the latest development snapshot - here we are using the 0.3.28 release as the example; of course, this exact version may be outdated by the time you read this)</p>
<p>Clone OpenBLAS to your local machine and check out the latest release of
OpenBLAS (unless you want to build the latest development snapshot - here we
are using the 0.3.28 release as the example, of course this exact version
<p>Note: You might want to include additional options in the cmake command
here. For example, the default configuration only generates a
<code>static.lib</code> version of the library. If you prefer a DLL, you can add
<code>-DBUILD_SHARED_LIBS=ON</code>.</p>
<p>Note that it is also possible to use the same setup to build OpenBLAS
with Make, if you prefer Makefiles over the CMake build for some
reason:</p>
<div class="highlight"><pre><span></span><code>$ make CC=clang-cl FC=flang-new AR="llvm-ar" TARGET=ARMV8 ARCH=arm64 RANLIB="llvm-ranlib" MAKE=make
</code></pre></div>
</li>
</ol>
<p>Download the latest LLVM toolchain for WoA from <a href="https://github.com/llvm/llvm-project/releases/tag/llvmorg-19.1.5">the Release page</a>. At the time of writing, this is version 19.1.5 - be sure to select the latest release for which you can find a precompiled package whose name ends in "-woa64.exe" (precompiled packages
usually lag a week or two behind their corresponding source release).<br />
Make sure to enable the option “Add LLVM to the system PATH for all the users”.<br />
Note: Make sure that the path of the LLVM toolchain is at the top of the Environment Variables section, to avoid conflicts between the compilers available in the system path.</p>
<ol>
<li>Launch the Native Command Prompt for Windows ARM64:</li>
</ol>
<p>From the Start menu, search for “ARM64 Native Tools Command Prompt for Visual Studio 2022”.
Alternatively, open a command prompt and run the following command to activate the environment:
<p>Note: You might want to include additional options in the cmake command here. For example, the default configuration only generates a static.lib version of the library. If you prefer a DLL, you can add -DBUILD_SHARED_LIBS=ON.</p>
<p>Note that it is also possible to use the same setup to build OpenBLAS with Make, if you prefer Makefiles over the CMake build for some reason:</p>
<pre><code>$ make CC=clang-cl FC=flang-new AR="llvm-ar" TARGET=ARMV8 ARCH=arm64 RANLIB="llvm-ranlib" MAKE=make
</code></pre>
<h4 id="generating-an-import-library">Generating an import library</h4>
<p>Microsoft Windows has this thing called "import libraries". You need it for
MSVC; you don't need it for MinGW because the <code>ld</code> linker is smart enough -
@@ -1373,7 +1408,7 @@ In your shell, move to this directory: <code>cd exports</code>.</p>
MSVC and MinGW are actually identical, so linking is actually okay (any
incompatibility in the C ABI would be a bug).</p>
<p>The import libraries of MSVC have the suffix <code>.lib</code>. They are generated
from a <code>.def</code> file using MSVC's <code>lib.exe</code>. See <a href="use_visual_studio.md#generate-import-library-before-0210-version">the MSVC instructions</a>.</p>
from a <code>.def</code> file using MSVC's <code>lib.exe</code>.</p>
</div>
<div class="tabbed-block">
<p>MinGW import libraries have the suffix <code>.a</code>, just like static libraries.
Adjust <code>MIN_IOS_VERSION</code> as necessary for your installation. E.g., change the version number
to the minimum iOS version you want to target and execute this file to build the library.</p>
<h3 id="harmonyos">HarmonyOS</h3>
<p>For this target you will need the cross-compiler toolchain package by Huawei, which contains solutions for both Windows and Linux. Only the Linux-based
toolchain has been tested so far, but the following instructions may apply similarly to Windows:</p>
<p>Download https://repo.huaweicloud.com/harmonyos/os/4.1.1-Release/ohos-sdk-windows_linux-public.tar.gz (or whatever newer version may be available in the future). Use <code>tar xvf ohos-sdk-windows_linux-public.tar.gz</code> to unpack it somewhere on your system. This will create a folder named "ohos-sdk" with subfolders "linux" and "windows". In the "linux" subfolder you will find a ZIP archive named "native-linux-x64-4.1.7.8-Release.zip"; unzip this where you want to
install the cross-compiler, for example in <code>/opt/ohos-sdk</code>.</p>
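<p>The download and unpack steps described above could be scripted roughly as follows. This is only a sketch: the function name <code>fetch_and_unpack_sdk</code> and the variable names are made up for illustration, it assumes <code>curl</code>, <code>tar</code> and <code>unzip</code> are installed, and writing to <code>/opt/ohos-sdk</code> may require elevated permissions on your system.</p>

```shell
#!/bin/sh
# SDK location and file names as given in the text; adjust for newer releases.
SDK_URL="https://repo.huaweicloud.com/harmonyos/os/4.1.1-Release/ohos-sdk-windows_linux-public.tar.gz"
# Derive the local archive name from the URL so the two cannot drift apart.
SDK_ARCHIVE="$(basename "$SDK_URL")"
# Path of the native toolchain ZIP inside the unpacked "ohos-sdk" folder.
NATIVE_ZIP="ohos-sdk/linux/native-linux-x64-4.1.7.8-Release.zip"
# Example install location from the text.
INSTALL_DIR="/opt/ohos-sdk"

# Hypothetical helper: download, unpack, and install the cross-compiler.
# It is only defined here; call it explicitly when you want to run the steps.
fetch_and_unpack_sdk() {
    curl -L -O "$SDK_URL" &&
    tar xvf "$SDK_ARCHIVE" &&
    unzip "$NATIVE_ZIP" -d "$INSTALL_DIR"
}

echo "SDK archive: $SDK_ARCHIVE"
echo "Install dir: $INSTALL_DIR"
```

<p>Run <code>fetch_and_unpack_sdk</code> from a directory with enough free space for both the archive and its unpacked contents.</p>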
<p>For this target you will need the cross-compiler toolchain package by Huawei,
which contains solutions for both Windows and Linux. Only the Linux-based
toolchain has been tested so far, but the following instructions may apply
Use the version of <code>cmake</code> that came with the SDK, and specify the location of its toolchain file as a cmake option. Also set the build target for OpenBLAS to ARMV8 and specify NOFORTRAN=1 (at least as of version 4.1.1, the SDK contains no Fortran compiler):
<p>OpenBLAS checks the following environment variables on startup:</p>
<ul>
<li><strong>OPENBLAS_NUM_THREADS=</strong> the number of threads to use (for non-OpenMP-builds of OpenBLAS)</li>
<li><strong>OMP_NUM_THREADS=</strong> the number of threads to use (for OpenMP builds - note that setting this may also affect any other OpenMP code)</li>
<li><code>OPENBLAS_NUM_THREADS</code>: the number of threads to use (for non-OpenMP builds
of OpenBLAS)</li>
<li><code>OMP_NUM_THREADS</code>: the number of threads to use (for OpenMP builds - note
that setting this may also affect any other OpenMP code)</li>
<li>
<p><strong>OPENBLAS_DEFAULT_NUM_THREADS=</strong> the number of threads to use, irrespective of whether OpenBLAS was built for OpenMP or pthreads</p>
<p><code>OPENBLAS_DEFAULT_NUM_THREADS</code>: the number of threads to use, irrespective of
whether OpenBLAS was built for OpenMP or pthreads</p>
</li>
<li>
<p><strong>OPENBLAS_MAIN_FREE=1</strong> this can be used to disable automatic assignment of cpu affinity in OpenBLAS builds that have it enabled by default</p>
<p><code>OPENBLAS_MAIN_FREE=1</code>: this can be used to disable automatic assignment of
cpu affinity in OpenBLAS builds that have it enabled by default</p>
</li>
<li><strong>OPENBLAS_THREAD_TIMEOUT=</strong> this can be used to define the length of time that idle threads should wait before exiting</li>
<li><strong>OMP_ADAPTIVE=1</strong> this can be used in OpenMP builds to actually remove any surplus threads when the number of threads is decreased</li>
<li><code>OPENBLAS_THREAD_TIMEOUT</code>: this can be used to define the length of time
that idle threads should wait before exiting</li>
<li><code>OMP_ADAPTIVE=1</code>: this can be used in OpenMP builds to actually remove any
surplus threads when the number of threads is decreased</li>
</ul>
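<p>As a small illustration of the variables listed above, the following sketch sets them in a POSIX shell before starting a program. The thread count of 4 and the program name <code>./my_blas_app</code> are placeholders, not recommendations; only one of <code>OPENBLAS_NUM_THREADS</code> and <code>OMP_NUM_THREADS</code> is relevant for a given build, depending on whether it uses OpenMP.</p>

```shell
#!/bin/sh
# Thread count for non-OpenMP builds of OpenBLAS (4 is an arbitrary example).
export OPENBLAS_NUM_THREADS=4
# Thread count for OpenMP builds; this may also affect any other OpenMP code.
export OMP_NUM_THREADS=4
# Disable automatic CPU affinity assignment in builds that enable it by default.
export OPENBLAS_MAIN_FREE=1

# Show the settings that a program started from this shell would inherit.
env | grep -E '^(OPENBLAS|OMP)_' | sort

# Hypothetical application linked against OpenBLAS; replace with your own.
# ./my_blas_app
```

<p>Because OpenBLAS checks these variables on startup, they have to be set before the program is launched.</p>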
<p><code>DYNAMIC_ARCH</code> builds also accept the following:</p>
<ul>
<li>
<p><code>OPENBLAS_VERBOSE</code>:</p>
<ul>
<li>set this to <code>1</code> to enable a warning when there is no exact match for the
detected cpu in the library</li>
<li>set this to <code>2</code> to make OpenBLAS print the name of the cpu target it
autodetected</li>
</ul>
</li>
<li>
<p><code>OPENBLAS_CORETYPE</code>: set this to one of the supported target names to
override autodetection, e.g. <code>OPENBLAS_CORETYPE=HASWELL</code></p>
</li>
<li><code>OPENBLAS_L2_SIZE</code>: set this to override the autodetected size of the L2
cache where it is not reported correctly (in virtual environments)</li>
</ul>
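<p>For a <code>DYNAMIC_ARCH</code> build, the variables above can be applied to a single command without changing the rest of the shell session. The helper below is a sketch; <code>run_as_target</code> is a made-up name, and <code>HASWELL</code> is used only because it is the example target mentioned for <code>OPENBLAS_CORETYPE</code>.</p>

```shell
#!/bin/sh
# Hypothetical helper: run a command with a forced OpenBLAS cpu target.
# Usage: run_as_target TARGET command [args...]
run_as_target() {
    target="$1"
    shift
    # OPENBLAS_VERBOSE=2 asks OpenBLAS to print the cpu target it uses;
    # OPENBLAS_CORETYPE overrides autodetection for this one command only.
    env OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE="$target" "$@"
}

# Demonstration with a plain shell command instead of a real BLAS program:
run_as_target HASWELL sh -c 'echo "core type: $OPENBLAS_CORETYPE"'
```

<p>Using <code>env</code> keeps the override local to the launched command, so later commands in the same shell still use autodetection.</p>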
<p>Deprecated variables still recognized for compatibility:</p>
<ul>
<li><code>GOTO_NUM_THREADS</code>: equivalent to <code>OPENBLAS_NUM_THREADS</code></li>
<li><code>GOTOBLAS_MAIN_FREE</code>: equivalent to <code>OPENBLAS_MAIN_FREE</code></li>
<li><code>OPENBLAS_BLOCK_FACTOR</code>: this applies a scale factor to the GEMM "P"
parameter of the block matrix code, see file <code>driver/others/parameter.c</code></li>
</ul>
<p>DYNAMIC_ARCH builds also accept the following:</p>
<ul>
<li><strong>OPENBLAS_VERBOSE=</strong> set this to "1" to enable a warning when there is no exact match for the detected cpu in the library; set this to "2" to make OpenBLAS print the name of the cpu target it autodetected</li>
<li><strong>OPENBLAS_CORETYPE=</strong> set this to one of the supported target names to override autodetection, e.g. OPENBLAS_CORETYPE=HASWELL</li>
<li><strong>OPENBLAS_L2_SIZE=</strong> set this to override the autodetected size of the L2 cache where it is not reported correctly (in virtual environments)</li>
</ul>
<p>Deprecated variables still recognized for compatibility:</p>
<ul>
<li><strong>GOTO_NUM_THREADS=</strong> equivalent to <strong>OPENBLAS_NUM_THREADS</strong></li>
<li><strong>GOTOBLAS_MAIN_FREE</strong> equivalent to <strong>OPENBLAS_MAIN_FREE</strong></li>
<li><strong>OPENBLAS_BLOCK_FACTOR</strong> this applies a scale factor to the GEMM "P" parameter of the block matrix code, see file driver/others/parameter.c</li>
</ul>