index.html 25 kB

  1. <!doctype html>
  2. <html lang="en" class="no-js">
  3. <head>
  4. <meta charset="utf-8">
  5. <meta name="viewport" content="width=device-width,initial-scale=1">
  6. <link rel="canonical" href="https://openblas.net/docs/developers/">
  7. <link rel="prev" href="../extensions/">
  8. <link rel="next" href="../build_system/">
  9. <link rel="icon" href="../logo.svg">
  10. <meta name="generator" content="mkdocs-1.6.0, mkdocs-material-9.5.22">
  11. <title>Developer manual - OpenBLAS</title>
  12. <link rel="stylesheet" href="../assets/stylesheets/main.732c4fb1.min.css">
  13. <link rel="stylesheet" href="../assets/stylesheets/palette.06af60db.min.css">
  14. <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  15. <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,300i,400,400i,700,700i%7CRoboto+Mono:400,400i,700,700i&display=fallback">
  16. <style>:root{--md-text-font:"Roboto";--md-code-font:"Roboto Mono"}</style>
  17. <script>__md_scope=new URL("..",location),__md_hash=e=>[...e].reduce((e,_)=>(e<<5)-e+_.charCodeAt(0),0),__md_get=(e,_=localStorage,t=__md_scope)=>JSON.parse(_.getItem(t.pathname+"."+e)),__md_set=(e,_,t=localStorage,a=__md_scope)=>{try{t.setItem(a.pathname+"."+e,JSON.stringify(_))}catch(e){}}</script>
  18. </head>
  19. <body dir="ltr" data-md-color-scheme="default" data-md-color-primary="grey" data-md-color-accent="indigo">
  20. <input class="md-toggle" data-md-toggle="drawer" type="checkbox" id="__drawer" autocomplete="off">
  21. <input class="md-toggle" data-md-toggle="search" type="checkbox" id="__search" autocomplete="off">
  22. <label class="md-overlay" for="__drawer"></label>
  23. <div data-md-component="skip">
  24. <a href="#developer-manual" class="md-skip">
  25. Skip to content
  26. </a>
  27. </div>
  28. <div data-md-component="announce">
  29. </div>
  30. <header class="md-header md-header--shadow" data-md-component="header">
  31. <nav class="md-header__inner md-grid" aria-label="Header">
  32. <a href=".." title="OpenBLAS" class="md-header__button md-logo" aria-label="OpenBLAS" data-md-component="logo">
  33. <img src="../logo.svg" alt="logo">
  34. </a>
  35. <label class="md-header__button md-icon" for="__drawer">
  36. <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M3 6h18v2H3V6m0 5h18v2H3v-2m0 5h18v2H3v-2Z"/></svg>
  37. </label>
  38. <div class="md-header__title" data-md-component="header-title">
  39. <div class="md-header__ellipsis">
  40. <div class="md-header__topic">
  41. <span class="md-ellipsis">
  42. OpenBLAS
  43. </span>
  44. </div>
  45. <div class="md-header__topic" data-md-component="header-topic">
  46. <span class="md-ellipsis">
  47. Developer manual
  48. </span>
  49. </div>
  50. </div>
  51. </div>
  52. <label class="md-header__button md-icon" for="__search">
  53. <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M9.5 3A6.5 6.5 0 0 1 16 9.5c0 1.61-.59 3.09-1.56 4.23l.27.27h.79l5 5-1.5 1.5-5-5v-.79l-.27-.27A6.516 6.516 0 0 1 9.5 16 6.5 6.5 0 0 1 3 9.5 6.5 6.5 0 0 1 9.5 3m0 2C7 5 5 7 5 9.5S7 14 9.5 14 14 12 14 9.5 12 5 9.5 5Z"/></svg>
  54. </label>
  55. <div class="md-search" data-md-component="search" role="dialog">
  56. <label class="md-search__overlay" for="__search"></label>
  57. <div class="md-search__inner" role="search">
  58. <form class="md-search__form" name="search">
  59. <input type="text" class="md-search__input" name="query" aria-label="Search" placeholder="Search" autocapitalize="off" autocorrect="off" autocomplete="off" spellcheck="false" data-md-component="search-query" required>
  60. <label class="md-search__icon md-icon" for="__search">
  61. <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M9.5 3A6.5 6.5 0 0 1 16 9.5c0 1.61-.59 3.09-1.56 4.23l.27.27h.79l5 5-1.5 1.5-5-5v-.79l-.27-.27A6.516 6.516 0 0 1 9.5 16 6.5 6.5 0 0 1 3 9.5 6.5 6.5 0 0 1 9.5 3m0 2C7 5 5 7 5 9.5S7 14 9.5 14 14 12 14 9.5 12 5 9.5 5Z"/></svg>
  62. <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M20 11v2H8l5.5 5.5-1.42 1.42L4.16 12l7.92-7.92L13.5 5.5 8 11h12Z"/></svg>
  63. </label>
  64. <nav class="md-search__options" aria-label="Search">
  65. <button type="reset" class="md-search__icon md-icon" title="Clear" aria-label="Clear" tabindex="-1">
  66. <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M19 6.41 17.59 5 12 10.59 6.41 5 5 6.41 10.59 12 5 17.59 6.41 19 12 13.41 17.59 19 19 17.59 13.41 12 19 6.41Z"/></svg>
  67. </button>
  68. </nav>
  69. </form>
  70. <div class="md-search__output">
  71. <div class="md-search__scrollwrap" data-md-scrollfix>
  72. <div class="md-search-result" data-md-component="search-result">
  73. <div class="md-search-result__meta">
  74. Initializing search
  75. </div>
  76. <ol class="md-search-result__list" role="presentation"></ol>
  77. </div>
  78. </div>
  79. </div>
  80. </div>
  81. </div>
  82. </nav>
  83. </header>
  84. <div class="md-container" data-md-component="container">
  85. <main class="md-main" data-md-component="main">
  86. <div class="md-main__inner md-grid">
  87. <div class="md-sidebar md-sidebar--primary" data-md-component="sidebar" data-md-type="navigation" >
  88. <div class="md-sidebar__scrollwrap">
  89. <div class="md-sidebar__inner">
  90. <nav class="md-nav md-nav--primary" aria-label="Navigation" data-md-level="0">
  91. <label class="md-nav__title" for="__drawer">
  92. <a href=".." title="OpenBLAS" class="md-nav__button md-logo" aria-label="OpenBLAS" data-md-component="logo">
  93. <img src="../logo.svg" alt="logo">
  94. </a>
  95. OpenBLAS
  96. </label>
  97. <ul class="md-nav__list" data-md-scrollfix>
  98. <li class="md-nav__item">
  99. <a href=".." class="md-nav__link">
  100. <span class="md-ellipsis">
  101. Home
  102. </span>
  103. </a>
  104. </li>
  105. <li class="md-nav__item">
  106. <a href="../install/" class="md-nav__link">
  107. <span class="md-ellipsis">
  108. Install OpenBLAS
  109. </span>
  110. </a>
  111. </li>
  112. <li class="md-nav__item">
  113. <a href="../user_manual/" class="md-nav__link">
  114. <span class="md-ellipsis">
  115. User manual
  116. </span>
  117. </a>
  118. </li>
  119. <li class="md-nav__item">
  120. <a href="../extensions/" class="md-nav__link">
  121. <span class="md-ellipsis">
  122. Extensions
  123. </span>
  124. </a>
  125. </li>
  126. <li class="md-nav__item md-nav__item--active">
  127. <input class="md-nav__toggle md-toggle" type="checkbox" id="__toc">
  128. <label class="md-nav__link md-nav__link--active" for="__toc">
  129. <span class="md-ellipsis">
  130. Developer manual
  131. </span>
  132. <span class="md-nav__icon md-icon"></span>
  133. </label>
  134. <a href="./" class="md-nav__link md-nav__link--active">
  135. <span class="md-ellipsis">
  136. Developer manual
  137. </span>
  138. </a>
  139. <nav class="md-nav md-nav--secondary" aria-label="Table of contents">
  140. <label class="md-nav__title" for="__toc">
  141. <span class="md-nav__icon md-icon"></span>
  142. Table of contents
  143. </label>
  144. <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
  145. <li class="md-nav__item">
  146. <a href="#source-codes-layout" class="md-nav__link">
  147. <span class="md-ellipsis">
  148. Source codes Layout
  149. </span>
  150. </a>
  151. </li>
  152. <li class="md-nav__item">
  153. <a href="#optimizing-gemm-for-a-given-hardware" class="md-nav__link">
  154. <span class="md-ellipsis">
  155. Optimizing GEMM for a given hardware
  156. </span>
  157. </a>
  158. </li>
  159. <li class="md-nav__item">
  160. <a href="#run-openblas-test" class="md-nav__link">
  161. <span class="md-ellipsis">
  162. Run OpenBLAS Test
  163. </span>
  164. </a>
  165. </li>
  166. <li class="md-nav__item">
  167. <a href="#benchmarking" class="md-nav__link">
  168. <span class="md-ellipsis">
  169. Benchmarking
  170. </span>
  171. </a>
  172. </li>
  173. <li class="md-nav__item">
  174. <a href="#adding-autodetection-support-for-a-new-revision-or-variant-of-a-supported-cpu" class="md-nav__link">
  175. <span class="md-ellipsis">
  176. Adding autodetection support for a new revision or variant of a supported cpu
  177. </span>
  178. </a>
  179. </li>
  180. <li class="md-nav__item">
  181. <a href="#adding-dedicated-support-for-a-new-cpu-model" class="md-nav__link">
  182. <span class="md-ellipsis">
  183. Adding dedicated support for a new cpu model
  184. </span>
  185. </a>
  186. </li>
  187. <li class="md-nav__item">
  188. <a href="#adding-support-for-an-entirely-new-architecture" class="md-nav__link">
  189. <span class="md-ellipsis">
  190. Adding support for an entirely new architecture
  191. </span>
  192. </a>
  193. </li>
  194. </ul>
  195. </nav>
  196. </li>
  197. <li class="md-nav__item">
  198. <a href="../build_system/" class="md-nav__link">
  199. <span class="md-ellipsis">
  200. Build system
  201. </span>
  202. </a>
  203. </li>
  204. <li class="md-nav__item">
  205. <a href="../distributing/" class="md-nav__link">
  206. <span class="md-ellipsis">
  207. Redistributing OpenBLAS
  208. </span>
  209. </a>
  210. </li>
  211. <li class="md-nav__item">
  212. <a href="../ci/" class="md-nav__link">
  213. <span class="md-ellipsis">
  214. CI jobs
  215. </span>
  216. </a>
  217. </li>
  218. <li class="md-nav__item">
  219. <a href="../about/" class="md-nav__link">
  220. <span class="md-ellipsis">
  221. About
  222. </span>
  223. </a>
  224. </li>
  225. <li class="md-nav__item">
  226. <a href="../faq/" class="md-nav__link">
  227. <span class="md-ellipsis">
  228. FAQ
  229. </span>
  230. </a>
  231. </li>
  232. </ul>
  233. </nav>
  234. </div>
  235. </div>
  236. </div>
  237. <div class="md-sidebar md-sidebar--secondary" data-md-component="sidebar" data-md-type="toc" >
  238. <div class="md-sidebar__scrollwrap">
  239. <div class="md-sidebar__inner">
  240. <nav class="md-nav md-nav--secondary" aria-label="Table of contents">
  241. <label class="md-nav__title" for="__toc">
  242. <span class="md-nav__icon md-icon"></span>
  243. Table of contents
  244. </label>
  245. <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
  246. <li class="md-nav__item">
  247. <a href="#source-codes-layout" class="md-nav__link">
  248. <span class="md-ellipsis">
  249. Source codes Layout
  250. </span>
  251. </a>
  252. </li>
  253. <li class="md-nav__item">
  254. <a href="#optimizing-gemm-for-a-given-hardware" class="md-nav__link">
  255. <span class="md-ellipsis">
  256. Optimizing GEMM for a given hardware
  257. </span>
  258. </a>
  259. </li>
  260. <li class="md-nav__item">
  261. <a href="#run-openblas-test" class="md-nav__link">
  262. <span class="md-ellipsis">
  263. Run OpenBLAS Test
  264. </span>
  265. </a>
  266. </li>
  267. <li class="md-nav__item">
  268. <a href="#benchmarking" class="md-nav__link">
  269. <span class="md-ellipsis">
  270. Benchmarking
  271. </span>
  272. </a>
  273. </li>
  274. <li class="md-nav__item">
  275. <a href="#adding-autodetection-support-for-a-new-revision-or-variant-of-a-supported-cpu" class="md-nav__link">
  276. <span class="md-ellipsis">
  277. Adding autodetection support for a new revision or variant of a supported cpu
  278. </span>
  279. </a>
  280. </li>
  281. <li class="md-nav__item">
  282. <a href="#adding-dedicated-support-for-a-new-cpu-model" class="md-nav__link">
  283. <span class="md-ellipsis">
  284. Adding dedicated support for a new cpu model
  285. </span>
  286. </a>
  287. </li>
  288. <li class="md-nav__item">
  289. <a href="#adding-support-for-an-entirely-new-architecture" class="md-nav__link">
  290. <span class="md-ellipsis">
  291. Adding support for an entirely new architecture
  292. </span>
  293. </a>
  294. </li>
  295. </ul>
  296. </nav>
  297. </div>
  298. </div>
  299. </div>
  300. <div class="md-content" data-md-component="content">
  301. <article class="md-content__inner md-typeset">
  302. <h1 id="developer-manual">Developer manual</h1>
  303. <h2 id="source-codes-layout">Source codes Layout</h2>
  304. <div class="highlight"><pre><span></span><code>OpenBLAS/
  305. ├── benchmark        Benchmark codes for BLAS
  306. ├── cmake            CMakefiles
  307. ├── ctest            Test codes for CBLAS interfaces
  308. ├── driver           Implemented in C
  309. │   ├── level2
  310. │   ├── level3
  311. │   ├── mapper
  312. │   └── others       Memory management, threading, etc
  313. ├── exports          Generate shared library
  314. ├── interface        Implement BLAS and CBLAS interfaces (calling driver or kernel)
  315. │   ├── lapack
  316. │   └── netlib
  317. ├── kernel           Optimized assembly kernels for CPU architectures
  318. │   ├── alpha        Original GotoBLAS kernels for DEC Alpha
  319. │   ├── arm          ARMV5, V6, V7 kernels (including generic C codes used by other architectures)
  320. │   ├── arm64        ARMV8
  321. │   ├── generic      General kernel codes written in plain C, parts used by many architectures
  322. │   ├── ia64         Original GotoBLAS kernels for Intel Itanium
  323. │   ├── mips
  324. │   ├── mips64
  325. │   ├── power
  326. │   ├── riscv64
  327. │   ├── simd         Common code for Universal Intrinsics, used by some x86_64 and arm64 kernels
  328. │   ├── sparc
  329. │   ├── x86
  330. │   ├── x86_64
  331. │   └── zarch
  332. ├── lapack           Optimized LAPACK codes (replacing those in regular LAPACK)
  333. │   ├── getf2
  334. │   ├── getrf
  335. │   ├── getrs
  336. │   ├── laswp
  337. │   ├── lauu2
  338. │   ├── lauum
  339. │   ├── potf2
  340. │   ├── potrf
  341. │   ├── trti2
  342. │   ├── trtri
  343. │   └── trtrs
  344. ├── lapack-netlib    LAPACK codes from netlib reference implementation
  345. ├── reference        BLAS Fortran reference implementation (unused)
  346. ├── relapack         Elmar Peise&#39;s recursive LAPACK (implemented on top of regular LAPACK)
  347. ├── test             Test codes for BLAS
  348. └── utest            Regression test
  349. </code></pre></div>
  350. <p>The call tree for <code>dgemm</code> is as follows:</p>
  351. <div class="highlight"><pre><span></span><code>interface/gemm.c
  352. driver/level3/level3.c
  353. gemm assembly kernels at kernel/
  354. </code></pre></div>
  355. <p>To find the kernel currently used for a particular supported cpu, please check the corresponding <code>kernel/$(ARCH)/KERNEL.$(CPU)</code> file.</p>
  356. <p>Here is an excerpt from <code>kernel/x86_64/KERNEL.HASWELL</code>:</p>
  357. <p><div class="highlight"><pre><span></span><code>...
  358. DTRMMKERNEL = dtrmm_kernel_4x8_haswell.c
  359. DGEMMKERNEL = dgemm_kernel_4x8_haswell.S
  360. ...
  361. </code></pre></div>
  362. According to the above <code>KERNEL.HASWELL</code> excerpt, the OpenBLAS dgemm kernel for Haswell is <code>dgemm_kernel_4x8_haswell.S</code>.</p>
  363. <h2 id="optimizing-gemm-for-a-given-hardware">Optimizing GEMM for a given hardware</h2>
  364. <p>Read the Goto paper to understand the algorithm.</p>
  365. <p>Goto, Kazushige; van de Geijn, Robert A. (2008). <a href="http://delivery.acm.org/10.1145/1360000/1356053/a12-goto.pdf?ip=155.68.162.54&amp;id=1356053&amp;acc=ACTIVE%20SERVICE&amp;key=A79D83B43E50B5B8%2EF070BBE7E45C3F17%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&amp;__acm__=1517932837_edfe766f1e295d9a7830812371e1d173">"Anatomy of High-Performance Matrix Multiplication"</a>. ACM Transactions on Mathematical Software 34 (3): Article 12
  366. (The above link is accessible only to ACM members, but this paper and many related ones are also available on the pages
  367. of van de Geijn's FLAME project, http://www.cs.utexas.edu/~flame/web/FLAMEPublications.html)</p>
  368. <p><code>driver/level3/level3.c</code> implements Goto's algorithm. As a simple starting point, you can also look at <code>kernel/generic/gemmkernel_2x2.c</code>, which is a naive GEMM kernel in C with <code>2x2</code> register blocking.</p>
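<p>To make the idea of <code>2x2</code> register blocking concrete, the following is a minimal sketch in C. It is an illustration under simplifying assumptions (column-major, unpacked operands, even <code>m</code> and <code>n</code>); the function name <code>gemm_2x2_sketch</code> is hypothetical, and the code is not the contents of <code>kernel/generic/gemmkernel_2x2.c</code>.</p>
<div class="highlight"><pre><span></span><code>#include &lt;stdio.h&gt;

/* Illustrative only: computes C += A * B for column-major matrices whose
 * dimensions m and n are multiples of 2. The name and the interface are
 * assumptions made for this sketch, not the OpenBLAS kernel interface. */
static void gemm_2x2_sketch(int m, int n, int k,
                            const double *a, int lda,
                            const double *b, int ldb,
                            double *c, int ldc)
{
    for (int j = 0; j &lt; n; j += 2) {
        for (int i = 0; i &lt; m; i += 2) {
            /* The 2x2 block of C is accumulated in four scalars,
             * which the compiler can keep in registers. */
            double c00 = 0.0, c10 = 0.0, c01 = 0.0, c11 = 0.0;
            for (int p = 0; p &lt; k; p++) {
                double a0 = a[i + p * lda];
                double a1 = a[i + 1 + p * lda];
                double b0 = b[p + j * ldb];
                double b1 = b[p + (j + 1) * ldb];
                c00 += a0 * b0;
                c10 += a1 * b0;
                c01 += a0 * b1;
                c11 += a1 * b1;
            }
            c[i + j * ldc] += c00;
            c[i + 1 + j * ldc] += c10;
            c[i + (j + 1) * ldc] += c01;
            c[i + 1 + (j + 1) * ldc] += c11;
        }
    }
}

int main(void)
{
    /* Column-major 2x2 example: A = [1 3; 2 4], B = identity, so C = A. */
    double a[4] = {1, 2, 3, 4};
    double b[4] = {1, 0, 0, 1};
    double c[4] = {0, 0, 0, 0};
    gemm_2x2_sketch(2, 2, 2, a, 2, b, 2, c, 2);
    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]); /* prints "1 2 3 4" */
    return 0;
}
</code></pre></div>
<p>Each element loaded from A and B is used twice inside the <code>p</code> loop, which is the point of register blocking: larger blocks raise the ratio of arithmetic to memory accesses until the available registers are exhausted.</p>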
  369. <p>Then:</p>
  370. <ul><li>Write optimized assembly kernels, taking into account the instruction pipeline, the available registers, and memory/cache access patterns.</li>
  371. <li>Tune the cache block sizes <code>Mc</code>, <code>Kc</code>, and <code>Nc</code>.</li></ul>
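<p>As a rough illustration of what the block sizes <code>Mc</code>, <code>Kc</code>, and <code>Nc</code> control, the sketch below partitions a column-major GEMM into blocks in the loop order described in the Goto paper. It makes several simplifying assumptions: the constants <code>MC</code>, <code>KC</code>, <code>NC</code> and the function name <code>gemm_blocked_sketch</code> are hypothetical, the tiny values are chosen only to exercise the edge cases, and the packing of blocks into contiguous buffers that Goto's algorithm relies on is omitted.</p>
<div class="highlight"><pre><span></span><code>#include &lt;stdio.h&gt;

/* Placeholder block sizes for illustration only; real values are
 * tuned per cpu so that the blocks fit the cache hierarchy. */
#define MC 4
#define KC 3
#define NC 5

static int min_int(int a, int b) { return a &lt; b ? a : b; }

/* C += A * B for column-major matrices, processed block by block. */
static void gemm_blocked_sketch(int m, int n, int k,
                                const double *a, int lda,
                                const double *b, int ldb,
                                double *c, int ldc)
{
    for (int jc = 0; jc &lt; n; jc += NC) {         /* Nc-wide panel of B and C */
        int nb = min_int(NC, n - jc);
        for (int pc = 0; pc &lt; k; pc += KC) {     /* Kc-deep slice of A and B */
            int kb = min_int(KC, k - pc);
            for (int ic = 0; ic &lt; m; ic += MC) { /* Mc-tall block of A and C */
                int mb = min_int(MC, m - ic);
                /* Stand-in for the optimized kernel: a plain triple loop. */
                for (int j = 0; j &lt; nb; j++) {
                    for (int p = 0; p &lt; kb; p++) {
                        double bpj = b[(pc + p) + (jc + j) * ldb];
                        for (int i = 0; i &lt; mb; i++)
                            c[(ic + i) + (jc + j) * ldc] +=
                                a[(ic + i) + (pc + p) * lda] * bpj;
                    }
                }
            }
        }
    }
}

int main(void)
{
    enum { M = 7, N = 6, K = 5 };
    double a[M * K], b[K * N], c[M * N], ref[M * N];
    /* Small integer values keep the floating-point sums exact. */
    for (int i = 0; i &lt; M * K; i++) a[i] = (double)(i % 7 - 3);
    for (int i = 0; i &lt; K * N; i++) b[i] = (double)(i % 5 - 2);
    for (int i = 0; i &lt; M * N; i++) { c[i] = 0.0; ref[i] = 0.0; }

    /* Unblocked reference: ref += A * B. */
    for (int j = 0; j &lt; N; j++)
        for (int p = 0; p &lt; K; p++)
            for (int i = 0; i &lt; M; i++)
                ref[i + j * M] += a[i + p * M] * b[p + j * K];

    gemm_blocked_sketch(M, N, K, a, M, b, K, c, M);

    int mismatches = 0;
    for (int i = 0; i &lt; M * N; i++)
        if (c[i] != ref[i]) mismatches++;
    printf("mismatches: %d\n", mismatches); /* prints "mismatches: 0" */
    return 0;
}
</code></pre></div>
<p>Blocking changes only the order in which the products are accumulated, not the result, so block sizes can be tuned freely for performance and checked against an unblocked reference as done here.</p>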
  372. <p>Note that not all of the cpu-specific parameters in <code>param.h</code> are actively used in algorithms. <code>DNUMOPT</code> only appears as a scale factor in the profiling output of the level3 syrk interface code, while its counterpart <code>SNUMOPT</code> (aliased as <code>NUMOPT</code> in <code>common.h</code>) is not used anywhere at all.
  373. <code>SYMV_P</code> is only used in the generic kernels for the symv and chemv/zhemv functions. At least some of those are usually overridden by cpu-specific implementations, so if you start by cloning the existing implementation for a related cpu, check its KERNEL file to see whether tuning <code>SYMV_P</code> would have any effect at all.
  374. <code>GEMV_UNROLL</code> is only used by some older x86_64 kernels, so not all sections in <code>param.h</code> define it.
  375. Similarly, not all of the cpu parameters such as L2 or L3 cache sizes are necessarily used in the current kernels for a given model; by all indications, the cpu identification code was originally imported from some other project.</p>
  376. <h2 id="run-openblas-test">Run OpenBLAS Test</h2>
  377. <p>OpenBLAS is tested with the netlib BLAS test, the CBLAS test, and the LAPACK test. In addition, we use <a href="https://github.com/xianyi/BLAS-Tester">BLAS-Tester</a>, a modified test tool from ATLAS.</p>
  378. <ul>
  379. <li>Run the <code>test</code> and <code>ctest</code> suites in the OpenBLAS source tree, e.g. <code>make test</code> or <code>make ctest</code>.</li>
  380. <li>Run the regression test suite <code>utest</code> in the OpenBLAS source tree.</li>
  381. <li>Run the LAPACK test, e.g. <code>make lapack-test</code>.</li>
  382. <li>Clone <a href="https://github.com/xianyi/BLAS-Tester">BLAS-Tester</a>, which can compare OpenBLAS results with those of the netlib reference BLAS.</li>
  383. </ul>
  384. <p>The project uses several Continuous Integration (CI) services integrated with GitHub to automatically check that the code compiles on a number of platforms.
  385. Lastly, the test suites included with "numerically heavy" projects like Julia, NumPy, Octave or QuantumEspresso can be used for regression testing.</p>
  386. <h2 id="benchmarking">Benchmarking</h2>
  387. <p>Several simple C benchmarks for performance testing individual BLAS functions are available in the <code>benchmark</code> folder, and its <code>scripts</code> subdirectory contains corresponding versions for Python, Octave and R.
  388. Other options include</p>
  389. <ul>
  390. <li>https://github.com/RoyiAvital/MatlabJuliaMatrixOperationsBenchmark (various matrix operations in Julia and Matlab)</li>
  391. <li>https://github.com/mmperf/mmperf/ (single-core matrix multiplication)</li>
  392. </ul>
  393. <h2 id="adding-autodetection-support-for-a-new-revision-or-variant-of-a-supported-cpu">Adding autodetection support for a new revision or variant of a supported cpu</h2>
  394. <p>Especially on x86_64, a new cpu model may be a "refresh" (die shrink and/or different number of cores) within an existing
  395. model family without significant changes to its instruction set (e.g. Intel Skylake, Kaby Lake etc. are still fundamentally Haswell, and
  396. low-end Goldmont etc. are Nehalem). In this case, compilation with the appropriate older TARGET will already lead to a satisfactory build.</p>
  397. <p>To achieve autodetection of the new model, its CPUID (or an equivalent identifier) needs to be added to the <code>cpuid_&lt;architecture&gt;.c</code> file
  398. relevant for its general architecture, with the returned name for the new type set appropriately. For x86, which has the most complex
  399. cpuid file, two functions need to be edited: <code>get_cpuname()</code> to return e.g. <code>CPUTYPE_HASWELL</code>, and <code>get_corename()</code> for the (broader)
  400. core family, returning e.g. <code>CORE_HASWELL</code>. (This information ends up in the <code>Makefile.conf</code> and <code>config.h</code> files generated by <code>getarch</code>. Failure to
  401. set either will typically lead to a missing definition of the <code>GEMM_UNROLL</code> parameters later in the build, as <code>getarch_2nd</code> will be unable to
  402. find a matching parameter section in <code>param.h</code>.)</p>
  403. <p>For architectures where <code>DYNAMIC_ARCH</code> builds are supported, a similar but simpler code section for the corresponding runtime detection of the cpu exists in <code>driver/others/dynamic.c</code> (for x86) and <code>driver/others/dynamic_&lt;arch&gt;.c</code> (for other architectures).<br />
  404. Note that for x86 the CPUID is compared after splitting it into its family, extended family, model and extended model parts, so the single decimal
  405. number returned by Linux in <code>/proc/cpuinfo</code> for the model has to be converted back to hexadecimal before splitting it into its constituent
  406. digits, e.g. 142 = 0x8E, which translates to extended model 8, model 14.</p>
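<p>The conversion described above amounts to taking the high and low hexadecimal digits of the decimal model number. A minimal sketch in C follows; the helper names <code>ext_model_of</code> and <code>model_of</code> are hypothetical and do not appear in the OpenBLAS sources, and the input is assumed to be the <code>model</code> value as printed in <code>/proc/cpuinfo</code>.</p>
<div class="highlight"><pre><span></span><code>#include &lt;stdio.h&gt;

/* Hypothetical helpers for illustration: split the decimal "model" value
 * reported in /proc/cpuinfo into its two hexadecimal digits. */
static unsigned ext_model_of(unsigned cpuinfo_model)
{
    return (cpuinfo_model &gt;&gt; 4) &amp; 0xF; /* high hex digit */
}

static unsigned model_of(unsigned cpuinfo_model)
{
    return cpuinfo_model &amp; 0xF;        /* low hex digit */
}

int main(void)
{
    unsigned cpuinfo_model = 142; /* example value from the text above */
    /* Prints "0x8E -&gt; extended model 8, model 14". */
    printf("0x%X -&gt; extended model %u, model %u\n",
           cpuinfo_model, ext_model_of(cpuinfo_model), model_of(cpuinfo_model));
    return 0;
}
</code></pre></div>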
  407. <h2 id="adding-dedicated-support-for-a-new-cpu-model">Adding dedicated support for a new cpu model</h2>
  408. <p>Usually it will be possible to start from an existing model, clone its KERNEL configuration file to the new name to use for this TARGET, and eventually replace individual kernels with versions better suited to the peculiarities of the new cpu model. In addition, it is necessary to add
  409. (or clone at first) the corresponding section of <code>GEMM_UNROLL</code> parameters in the toplevel <code>param.h</code>, and possibly to add definitions such as <code>USE_TRMM</code>
  410. (governing whether TRMM functions use the respective GEMM kernel or a separate source file) to the Makefiles (and <code>CMakeLists.txt</code>) in the kernel
  411. directory. The new cpu name needs to be added to <code>TargetList.txt</code>, and the cpu autodetection code used by the <code>getarch</code> helper program, contained in
  412. the <code>cpuid_&lt;architecture&gt;.c</code> file, needs to be amended to include the CPUID (or equivalent) information processing required (see the preceding section).</p>
  413. <h2 id="adding-support-for-an-entirely-new-architecture">Adding support for an entirely new architecture</h2>
  414. <p>This endeavour is best started by cloning the entire support structure for 32bit ARM, and within that the ARMV5 cpu in particular as this is implemented through plain C kernels only. An example providing a convenient "shopping list" can be seen in pull request #1526.</p>
  415. </article>
  416. </div>
  417. <script>var target=document.getElementById(location.hash.slice(1));target&&target.name&&(target.checked=target.name.startsWith("__tabbed_"))</script>
  418. </div>
  419. </main>
  420. <footer class="md-footer">
  421. <div class="md-footer-meta md-typeset">
  422. <div class="md-footer-meta__inner md-grid">
  423. <div class="md-copyright">
  424. Made with
  425. <a href="https://squidfunk.github.io/mkdocs-material/" target="_blank" rel="noopener">
  426. Material for MkDocs
  427. </a>
  428. </div>
  429. </div>
  430. </div>
  431. </footer>
  432. </div>
  433. <div class="md-dialog" data-md-component="dialog">
  434. <div class="md-dialog__inner md-typeset"></div>
  435. </div>
  436. <script id="__config" type="application/json">{"base": "..", "features": [], "search": "../assets/javascripts/workers/search.b8dbb3d2.min.js", "translations": {"clipboard.copied": "Copied to clipboard", "clipboard.copy": "Copy to clipboard", "search.result.more.one": "1 more on this page", "search.result.more.other": "# more on this page", "search.result.none": "No matching documents", "search.result.one": "1 matching document", "search.result.other": "# matching documents", "search.result.placeholder": "Type to start searching", "search.result.term.missing": "Missing", "select.version": "Select version"}}</script>
  437. <script src="../assets/javascripts/bundle.5cfa9459.min.js"></script>
  438. </body>
  439. </html>

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.