allocator.cpp 26 kB

[WIP] vulkan compute (#618) * vulkan infrastructure * vkallocator and vkmat * layer interface for vulkan compute * wip... * default vulkan device, command wrapper, upload model weight in load_model to simplify layer interface * simplify command api, vkmat holds staging buffer, relu works * initialize specialization constant, simplify command dispatch, fix staging buffer copy with different shape, convolution works * init extension functions * dynamic local size and group count * group count=1 is invalid * regard device max workgroup size limit * fix relu oooops * decouple command record and staging allocation * create result blob * add pooling shader * buffer is faster than image :) * fix pooling shader * add innerproduct shader * readonly writeonly decoration * simplify buffer creation * decouple command and layer, VK_KHR_descriptor_update_template extension makes descriptor binding update easy :D * fix vulkan building issues in visual studio (#1) * fix building issues on visual studio * ignore benchmark * cancel changes * ... ... * decouple paramdict and vulkandevice * fix staging buffer destroy in model loading * remove vkdev member in option * add padding shader * simplify vulkan layer creation, simplify convolution and pooling shader for no padding, less debug output * add convolutiondepthwise and softmax shader * specialization float type, add leakyrelu * add dropout shader * add batchnorm shader * split vulkan forward * add scale shader * push constant type can be int or float * set_optimal_local_size_xyz * add eltwise shader * concat vulkan forward * fix convolution without bias * add dummy shader for concat and split, more fix ... 
* optional VK_KHR_descriptor_update_template and VK_KHR_push_descriptor * check VK_KHR_push_descriptor for vkCmdPushDescriptorSetWithTemplateKHR * binaryop and unaryop shader * hide raw command buffer * simple vkbenchncnn benchmark * create device with transfer queue * rename command to vkcompute, add vktransfer and layer upload_model interface * external VkMat, copy and map wrt buffer offset * command copy respect offset and size * decouple weight upload and load, simplify upload weight api, use one big staging buffer for uploading weights * fix build on android * binding count can not vary :( * barrier check state, fix sub-op destruction * declare local_size_xyz constant, fix crash on radv * fix local_size_xyz, second try * more barrier and state fix * fix softmax * reconstruct buffer memory allocator, reuse blob buffer, less verbose output * find unified memory type index * weight staging buffer allocator and weight buffer allocator, respect descriptor buffer offset alignment * use VK_KHR_descriptor_update_template for faster descriptor update if available, multithread pipeline creation * find more useful vulkan extensions and enable them * fix msvc build * respect VK_KHR_dedicated_allocation for weight buffer allocation * fix android build * fix bias name conflicts with metal * decouple pipeline and layer, building shader sources into shader module, dedicated create_pipeline api, simplify pipeline recording * drop dummy shader, inplace softmax, multiple shader module works * fix unique queue family index error * flatten support vulkan * mnasnet run * find shader module by name, each entry point per shader module, fix attribute/id conflict on moltenvk * some minor changes * add some high level api * use dedicated transfer queue to upload weight model * prefer mappable buffer on unified memory * global pooling and convolution fc, reuse staging buffer * implement ring-buffer style blob allocator, add VkBufferMemory capacity * use blob allocator for workspace blob, 
it works fine :) * vulkan option off * Update layer.cpp * fix build with vulkan off * less verbose output, fix crash on vulkan_compute off * merge benchncnn tool * allocator clear api, use new weight buffer allocator per net * add default locked allocator * mapped mat ptr api, persistent mapped memory works generally :) * travis ci linux vulkan * travis ci vulkan wip ... * more gpu wip ... * more gpu wip ... * wip... * wip... * wip... ... * wip... ios vulkan build... * find glslangValidator on ios build * use dynamic moltenvk library * travis ci wip ... * ios simulator does not support metal at all * fix cpu only extractor * optimize workgroup size, first try * optimize workgroup size, second try * conv1x1s1d1 vec4 * revert build system * fix ncnn2mem build * fix ncnn2mem build
7 years ago
// Tencent is pleased to support the open source community by making ncnn available.
//
// Copyright (C) 2018 THL A29 Limited, a Tencent company. All rights reserved.
//
// Licensed under the BSD 3-Clause License (the "License"); you may not use this file except
// in compliance with the License. You may obtain a copy of the License at
//
// https://opensource.org/licenses/BSD-3-Clause
//
// Unless required by applicable law or agreed to in writing, software distributed
// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.

#include "allocator.h"

#include <stdio.h>
#include <algorithm>

#include "gpu.h"

namespace ncnn {

Allocator::~Allocator()
{
}

PoolAllocator::PoolAllocator()
{
    size_compare_ratio = 192;// 0.75f * 256
}

PoolAllocator::~PoolAllocator()
{
    clear();

    if (!payouts.empty())
    {
        fprintf(stderr, "FATAL ERROR! pool allocator destroyed too early\n");
        std::list< std::pair<size_t, void*> >::iterator it = payouts.begin();
        for (; it != payouts.end(); it++)
        {
            void* ptr = it->second;
            fprintf(stderr, "%p still in use\n", ptr);
        }
    }
}

void PoolAllocator::clear()
{
    budgets_lock.lock();

    std::list< std::pair<size_t, void*> >::iterator it = budgets.begin();
    for (; it != budgets.end(); it++)
    {
        void* ptr = it->second;
        ncnn::fastFree(ptr);
    }
    budgets.clear();

    budgets_lock.unlock();
}

void PoolAllocator::set_size_compare_ratio(float scr)
{
    if (scr < 0.f || scr > 1.f)
    {
        fprintf(stderr, "invalid size compare ratio %f\n", scr);
        return;
    }

    size_compare_ratio = (unsigned int)(scr * 256);
}
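set_size_compare_ratio stores the ratio in 8-bit fixed point (scaled by 256), which is why the constructor's default of 0.75 appears as 192. Writing s for the float argument, r for size_compare_ratio, b for the size of a cached block and n for the requested size (these symbols are introduced here only for the derivation), the reuse test in fastMalloc below amounts to the following, ignoring possible overflow of b·r for very large blocks:

$$
r = \left\lfloor 256\, s \right\rfloor,
\qquad
\text{reuse block} \iff b \ge n \ \text{and}\ \left\lfloor \frac{b\, r}{2^{8}} \right\rfloor \le n
$$

So a cached block is reused roughly when n / b ≥ r / 256. At the extremes, s = 1 gives r = 256 and the test collapses to b = n (exact size only), while s = 0 gives r = 0 and any block with b ≥ n qualifies.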
void* PoolAllocator::fastMalloc(size_t size)
{
    budgets_lock.lock();

    // find free budget
    std::list< std::pair<size_t, void*> >::iterator it = budgets.begin();
    for (; it != budgets.end(); it++)
    {
        size_t bs = it->first;

        // reuse a cached block only if it is big enough but not much bigger:
        // bs >= size and bs * size_compare_ratio / 256 <= size
        if (bs >= size && ((bs * size_compare_ratio) >> 8) <= size)
        {
            void* ptr = it->second;

            budgets.erase(it);

            budgets_lock.unlock();

            payouts_lock.lock();
            payouts.push_back(std::make_pair(bs, ptr));
            payouts_lock.unlock();

            return ptr;
        }
    }

    budgets_lock.unlock();

    // new
    void* ptr = ncnn::fastMalloc(size);

    payouts_lock.lock();
    payouts.push_back(std::make_pair(size, ptr));
    payouts_lock.unlock();

    return ptr;
}
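Pulling the reuse condition out of fastMalloc into a standalone function makes it easy to check with concrete sizes. The name can_reuse_block is hypothetical; only the boolean expression is taken from the code above.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in (not an ncnn function) for the test inside
// PoolAllocator::fastMalloc: a cached block of `bs` bytes may serve a
// request of `size` bytes only if it is large enough and at most about
// 256 / ratio times larger than the request.
static bool can_reuse_block(size_t bs, size_t size, unsigned int ratio)
{
    assert(ratio <= 256); // set_size_compare_ratio only produces 0..256
    return bs >= size && ((bs * ratio) >> 8) <= size;
}
```

With the default ratio of 192, a 1000-byte block can serve an 800-byte request (1000 · 192 / 256 = 750 ≤ 800) but not a 700-byte one, so a small request does not tie up a block much larger than it needs. The search is first-fit: fastMalloc returns the first qualifying block in list order, not the tightest one.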
void PoolAllocator::fastFree(void* ptr)
{
    payouts_lock.lock();

    // return to budgets
    std::list< std::pair<size_t, void*> >::iterator it = payouts.begin();
    for (; it != payouts.end(); it++)
    {
        if (it->second == ptr)
        {
            size_t size = it->first;

            payouts.erase(it);

            payouts_lock.unlock();

            budgets_lock.lock();
            budgets.push_back(std::make_pair(size, ptr));
            budgets_lock.unlock();

            return;
        }
    }

    payouts_lock.unlock();

    fprintf(stderr, "FATAL ERROR! pool allocator get wild %p\n", ptr);
    ncnn::fastFree(ptr);
}
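Taken together, fastMalloc and fastFree implement a caching pool: freed blocks move from `payouts` (blocks handed out) to `budgets` (blocks available for reuse) and are only released to the system by clear() or the destructor. The class below is a condensed, hypothetical sketch of that design, not ncnn's API: MiniPool, alloc, release and cached_count are invented names, it uses std::malloc/std::free in place of ncnn::fastMalloc/ncnn::fastFree, and it guards both lists with one std::mutex instead of the separate budgets_lock and payouts_lock above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical condensed pool (not ncnn's class) mirroring PoolAllocator:
// `payouts` tracks blocks in use, `budgets` holds freed blocks for reuse.
class MiniPool
{
public:
    explicit MiniPool(unsigned int ratio256 = 192) : ratio(ratio256)
    {
        assert(ratio256 <= 256); // same range set_size_compare_ratio allows
    }

    ~MiniPool()
    {
        // Release cached blocks; blocks still handed out are not touched
        // (ncnn only reports them as still in use).
        for (auto& b : budgets)
            std::free(b.second);
    }

    MiniPool(const MiniPool&) = delete;
    MiniPool& operator=(const MiniPool&) = delete;

    void* alloc(size_t size)
    {
        std::lock_guard<std::mutex> guard(lock);
        // First fit: take the first cached block that passes the size test.
        for (auto it = budgets.begin(); it != budgets.end(); ++it)
        {
            size_t bs = it->first;
            if (bs >= size && ((bs * ratio) >> 8) <= size)
            {
                void* ptr = it->second;
                payouts.push_back(*it); // keep the block's real size, not `size`
                budgets.erase(it);
                return ptr;
            }
        }
        // Nothing suitable cached: allocate a fresh block.
        void* ptr = std::malloc(size);
        payouts.push_back(std::make_pair(size, ptr));
        return ptr;
    }

    void release(void* ptr)
    {
        std::lock_guard<std::mutex> guard(lock);
        for (auto it = payouts.begin(); it != payouts.end(); ++it)
        {
            if (it->second == ptr)
            {
                budgets.push_back(*it); // cache the block instead of freeing it
                payouts.erase(it);
                return;
            }
        }
        std::free(ptr); // pointer not from this pool: free it directly
    }

    size_t cached_count()
    {
        std::lock_guard<std::mutex> guard(lock);
        return budgets.size();
    }

private:
    unsigned int ratio;
    std::mutex lock;
    std::list<std::pair<size_t, void*> > budgets;
    std::list<std::pair<size_t, void*> > payouts;
};
```

A single lock keeps the sketch short. The original instead takes budgets_lock and payouts_lock one at a time and never holds both, which keeps each critical section small and rules out a lock-ordering deadlock between the two lists.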
UnlockedPoolAllocator::UnlockedPoolAllocator()
{
    size_compare_ratio = 192;// 0.75f * 256
}

UnlockedPoolAllocator::~UnlockedPoolAllocator()
{
    clear();

    if (!payouts.empty())
    {
        fprintf(stderr, "FATAL ERROR! unlocked pool allocator destroyed too early\n");
        std::list< std::pair<size_t, void*> >::iterator it = payouts.begin();
        for (; it != payouts.end(); it++)
        {
            void* ptr = it->second;
            fprintf(stderr, "%p still in use\n", ptr);
        }
    }
}

void UnlockedPoolAllocator::clear()
{
    std::list< std::pair<size_t, void*> >::iterator it = budgets.begin();
    for (; it != budgets.end(); it++)
    {
        void* ptr = it->second;
        ncnn::fastFree(ptr);
    }
    budgets.clear();
}

void UnlockedPoolAllocator::set_size_compare_ratio(float scr)
{
    if (scr < 0.f || scr > 1.f)
    {
        fprintf(stderr, "invalid size compare ratio %f\n", scr);
        return;
    }

    size_compare_ratio = (unsigned int)(scr * 256);
}

void* UnlockedPoolAllocator::fastMalloc(size_t size)
{
    // find free budget
    std::list< std::pair<size_t, void*> >::iterator it = budgets.begin();
    for (; it != budgets.end(); it++)
    {
        size_t bs = it->first;

        // reuse a cached block only if it is big enough but not much bigger:
        // bs >= size and bs * size_compare_ratio / 256 <= size
        if (bs >= size && ((bs * size_compare_ratio) >> 8) <= size)
        {
            void* ptr = it->second;

            budgets.erase(it);

            payouts.push_back(std::make_pair(bs, ptr));

            return ptr;
        }
    }

    // new
    void* ptr = ncnn::fastMalloc(size);

    payouts.push_back(std::make_pair(size, ptr));

    return ptr;
}

void UnlockedPoolAllocator::fastFree(void* ptr)
{
    // return to budgets
    std::list< std::pair<size_t, void*> >::iterator it = payouts.begin();
    for (; it != payouts.end(); it++)
    {
        if (it->second == ptr)
        {
            size_t size = it->first;

            payouts.erase(it);

            budgets.push_back(std::make_pair(size, ptr));

            return;
        }
    }

    fprintf(stderr, "FATAL ERROR! unlocked pool allocator get wild %p\n", ptr);
    ncnn::fastFree(ptr);
}

#if NCNN_VULKAN
VkAllocator::VkAllocator(const VulkanDevice* _vkdev) : vkdev(_vkdev)
{
    mappable = false;
}

VkBuffer VkAllocator::create_buffer(size_t size, VkBufferUsageFlags usage)
{
    VkBufferCreateInfo bufferCreateInfo;
    bufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufferCreateInfo.pNext = 0;
    bufferCreateInfo.flags = 0;
    bufferCreateInfo.size = size;
    bufferCreateInfo.usage = usage;
    bufferCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
    bufferCreateInfo.queueFamilyIndexCount = 0;
    bufferCreateInfo.pQueueFamilyIndices = 0;

    VkBuffer buffer;
    VkResult ret = vkCreateBuffer(vkdev->vkdevice(), &bufferCreateInfo, 0, &buffer);
    if (ret != VK_SUCCESS)
    {
        fprintf(stderr, "vkCreateBuffer failed %d\n", ret);
        return 0;
    }

    return buffer;
}

VkDeviceMemory VkAllocator::allocate_memory(size_t size, uint32_t memory_type_index)
{
    VkMemoryAllocateInfo memoryAllocateInfo;
    memoryAllocateInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    memoryAllocateInfo.pNext = 0;
    memoryAllocateInfo.allocationSize = size;
    memoryAllocateInfo.memoryTypeIndex = memory_type_index;

    VkDeviceMemory memory = 0;
    VkResult ret = vkAllocateMemory(vkdev->vkdevice(), &memoryAllocateInfo, 0, &memory);
    if (ret != VK_SUCCESS)
    {
        fprintf(stderr, "vkAllocateMemory failed %d\n", ret);
    }

    return memory;
}

VkDeviceMemory VkAllocator::allocate_dedicated_memory(size_t size, uint32_t memory_type_index, VkBuffer buffer)
{
    VkMemoryAllocateInfo memoryAllocateInfo;
    memoryAllocateInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    memoryAllocateInfo.pNext = 0;
    memoryAllocateInfo.allocationSize = size;
    memoryAllocateInfo.memoryTypeIndex = memory_type_index;
    VkMemoryDedicatedAllocateInfoKHR memoryDedicatedAllocateInfo;
    memoryDedicatedAllocateInfo.sType = VK_STRUCTURE_TYPE_MEMORY_DEDICATED_ALLOCATE_INFO_KHR;
    memoryDedicatedAllocateInfo.pNext = 0;
    memoryDedicatedAllocateInfo.image = 0;
    memoryDedicatedAllocateInfo.buffer = buffer;
    memoryAllocateInfo.pNext = &memoryDedicatedAllocateInfo;
    VkDeviceMemory memory = 0;
    VkResult ret = vkAllocateMemory(vkdev->vkdevice(), &memoryAllocateInfo, 0, &memory);
    if (ret != VK_SUCCESS)
    {
        fprintf(stderr, "vkAllocateMemory failed %d\n", ret);
    }
    return memory;
}

static inline size_t least_common_multiple(size_t a, size_t b)
{
    if (a == b)
        return a;
    if (a > b)
        return least_common_multiple(b, a);
    size_t lcm = b;
    while (lcm % a != 0)
    {
        lcm += b;
    }
    return lcm;
}

VkUnlockedBlobBufferAllocator::VkUnlockedBlobBufferAllocator(const VulkanDevice* _vkdev) : VkAllocator(_vkdev)
{
    mappable = vkdev->info.device_local_memory_index == vkdev->info.unified_memory_index;
    buffer_offset_alignment = vkdev->info.buffer_offset_alignment;
    if (mappable)
    {
        // least common multiple for memory_map_alignment and buffer_offset_alignment
        size_t memory_map_alignment = vkdev->info.memory_map_alignment;
        buffer_offset_alignment = least_common_multiple(buffer_offset_alignment, memory_map_alignment);
    }
    block_size = alignSize(16 * 1024 * 1024, buffer_offset_alignment);// 16M
}

VkUnlockedBlobBufferAllocator::~VkUnlockedBlobBufferAllocator()
{
    clear();
}

void VkUnlockedBlobBufferAllocator::set_block_size(size_t _block_size)
{
    block_size = _block_size;
}

void VkUnlockedBlobBufferAllocator::clear()
{
    // fprintf(stderr, "VkUnlockedBlobBufferAllocator %lu\n", buffer_blocks.size());
    for (size_t i=0; i<buffer_blocks.size(); i++)
    {
        VkBufferMemory* ptr = buffer_blocks[i];
        // std::list< std::pair<size_t, size_t> >::iterator it = budgets[i].begin();
        // while (it != budgets[i].end())
        // {
        //     fprintf(stderr, "VkUnlockedBlobBufferAllocator budget %p %lu %lu\n", ptr->buffer, it->first, it->second);
        //     it++;
        // }
        if (mappable)
            vkUnmapMemory(vkdev->vkdevice(), ptr->memory);
        vkDestroyBuffer(vkdev->vkdevice(), ptr->buffer, 0);
        vkFreeMemory(vkdev->vkdevice(), ptr->memory, 0);
        delete ptr;
    }
    buffer_blocks.clear();
    budgets.clear();
}

VkBufferMemory* VkUnlockedBlobBufferAllocator::fastMalloc(size_t size)
{
    size_t aligned_size = alignSize(size, buffer_offset_alignment);
    const int buffer_block_count = buffer_blocks.size();
    // find first spare space in buffer_blocks
    for (int i=0; i<buffer_block_count; i++)
    {
        std::list< std::pair<size_t, size_t> >::iterator it = budgets[i].begin();
        while (it != budgets[i].end())
        {
            size_t budget_size = it->second;
            if (budget_size < aligned_size)
            {
                it++;
                continue;
            }
            // return sub buffer
            VkBufferMemory* ptr = new VkBufferMemory;
            ptr->buffer = buffer_blocks[i]->buffer;
            ptr->offset = it->first;
            ptr->memory = buffer_blocks[i]->memory;
            ptr->capacity = aligned_size;
            ptr->mapped_ptr = buffer_blocks[i]->mapped_ptr;
            // adjust budgets
            if (budget_size == aligned_size)
            {
                budgets[i].erase(it);
            }
            else
            {
                it->first += aligned_size;
                it->second -= aligned_size;
            }
            // fprintf(stderr, "VkUnlockedBlobBufferAllocator M %p +%lu %lu\n", ptr->buffer, ptr->offset, ptr->capacity);
            return ptr;
        }
    }
    size_t new_block_size = std::max(block_size, aligned_size);
    // create new block
    VkBufferMemory* block = new VkBufferMemory;
    block->buffer = create_buffer(new_block_size, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT);
    block->offset = 0;
    // TODO respect VK_KHR_dedicated_allocation ?
    VkMemoryRequirements memoryRequirements;
    vkGetBufferMemoryRequirements(vkdev->vkdevice(), block->buffer, &memoryRequirements);
    block->memory = allocate_memory(memoryRequirements.size, vkdev->info.device_local_memory_index);
    vkBindBufferMemory(vkdev->vkdevice(), block->buffer, block->memory, 0);
    block->mapped_ptr = 0;
    if (mappable)
    {
        vkMapMemory(vkdev->vkdevice(), block->memory, 0, new_block_size, 0, &block->mapped_ptr);
    }
    buffer_blocks.push_back(block);
    // return sub buffer
    VkBufferMemory* ptr = new VkBufferMemory;
    ptr->buffer = block->buffer;
    ptr->offset = 0;
    ptr->memory = block->memory;
    ptr->capacity = aligned_size;
    ptr->mapped_ptr = block->mapped_ptr;
    // adjust budgets
    std::list< std::pair<size_t, size_t> > budget;
    if (new_block_size > aligned_size)
    {
        budget.push_back(std::make_pair(aligned_size, new_block_size - aligned_size));
    }
    budgets.push_back(budget);
    // fprintf(stderr, "VkUnlockedBlobBufferAllocator M %p +%lu %lu\n", ptr->buffer, ptr->offset, ptr->capacity);
    return ptr;
}

void VkUnlockedBlobBufferAllocator::fastFree(VkBufferMemory* ptr)
{
    // fprintf(stderr, "VkUnlockedBlobBufferAllocator F %p +%lu %lu\n", ptr->buffer, ptr->offset, ptr->capacity);
    const int buffer_block_count = buffer_blocks.size();
    int block_index = -1;
    for (int i=0; i<buffer_block_count; i++)
    {
        if (buffer_blocks[i]->buffer == ptr->buffer && buffer_blocks[i]->memory == ptr->memory)
        {
            block_index = i;
            break;
        }
    }
    if (block_index == -1)
    {
        fprintf(stderr, "FATAL ERROR! unlocked VkUnlockedBlobBufferAllocator get wild %p\n", ptr->buffer);
        delete ptr;
        return;
    }
    // merge
    std::list< std::pair<size_t, size_t> >::iterator it_merge_left = budgets[block_index].end();
    std::list< std::pair<size_t, size_t> >::iterator it_merge_right = budgets[block_index].end();
    std::list< std::pair<size_t, size_t> >::iterator it = budgets[block_index].begin();
    for ( ; it != budgets[block_index].end(); it++)
    {
        if (it->first + it->second == ptr->offset)
        {
            it_merge_left = it;
        }
        else if (ptr->offset + ptr->capacity == it->first)
        {
            it_merge_right = it;
        }
    }
    if (it_merge_left != budgets[block_index].end() && it_merge_right != budgets[block_index].end())
    {
        it_merge_left->second = it_merge_right->first + it_merge_right->second - it_merge_left->first;
        budgets[block_index].erase(it_merge_right);
    }
    else if (it_merge_left != budgets[block_index].end())
    {
        it_merge_left->second = ptr->offset + ptr->capacity - it_merge_left->first;
    }
    else if (it_merge_right != budgets[block_index].end())
    {
        it_merge_right->second = it_merge_right->first + it_merge_right->second - ptr->offset;
        it_merge_right->first = ptr->offset;
    }
    else
    {
        if (ptr->offset == 0)
        {
            // chain leading block
            budgets[block_index].push_front(std::make_pair(ptr->offset, ptr->capacity));
        }
        else
        {
            budgets[block_index].push_back(std::make_pair(ptr->offset, ptr->capacity));
        }
    }
    delete ptr;
}

VkBlobBufferAllocator::VkBlobBufferAllocator(const VulkanDevice* _vkdev) : VkUnlockedBlobBufferAllocator(_vkdev)
{
}

VkBlobBufferAllocator::~VkBlobBufferAllocator()
{
    clear();
}

void VkBlobBufferAllocator::clear()
{
    MutexLockGuard guard(budgets_lock);
    VkUnlockedBlobBufferAllocator::clear();
}

VkBufferMemory* VkBlobBufferAllocator::fastMalloc(size_t size)
{
    MutexLockGuard guard(budgets_lock);
    return VkUnlockedBlobBufferAllocator::fastMalloc(size);
}

void VkBlobBufferAllocator::fastFree(VkBufferMemory* ptr)
{
    MutexLockGuard guard(budgets_lock);
    return VkUnlockedBlobBufferAllocator::fastFree(ptr);
}

VkWeightBufferAllocator::VkWeightBufferAllocator(const VulkanDevice* _vkdev) : VkAllocator(_vkdev)
{
    mappable = vkdev->info.device_local_memory_index == vkdev->info.unified_memory_index;
    buffer_offset_alignment = vkdev->info.buffer_offset_alignment;
    if (mappable)
    {
        // least common multiple for memory_map_alignment and buffer_offset_alignment
        size_t memory_map_alignment = vkdev->info.memory_map_alignment;
        buffer_offset_alignment = least_common_multiple(buffer_offset_alignment, memory_map_alignment);
    }
    block_size = alignSize(8 * 1024 * 1024, buffer_offset_alignment);// 8M
}

VkWeightBufferAllocator::~VkWeightBufferAllocator()
{
    clear();
}

void VkWeightBufferAllocator::set_block_size(size_t _block_size)
{
    block_size = _block_size;
}

void VkWeightBufferAllocator::clear()
{
    // fprintf(stderr, "VkWeightBufferAllocator %lu %lu\n", buffer_blocks.size(), dedicated_buffer_blocks.size());
    buffer_block_free_spaces.clear();
    for (size_t i=0; i<buffer_blocks.size(); i++)
    {
        VkBufferMemory* ptr = buffer_blocks[i];
        if (mappable)
            vkUnmapMemory(vkdev->vkdevice(), ptr->memory);
        vkDestroyBuffer(vkdev->vkdevice(), ptr->buffer, 0);
        vkFreeMemory(vkdev->vkdevice(), ptr->memory, 0);
        delete ptr;
    }
    buffer_blocks.clear();
    for (size_t i=0; i<dedicated_buffer_blocks.size(); i++)
    {
        VkBufferMemory* ptr = dedicated_buffer_blocks[i];
        if (mappable)
            vkUnmapMemory(vkdev->vkdevice(), ptr->memory);
        vkDestroyBuffer(vkdev->vkdevice(), ptr->buffer, 0);
        vkFreeMemory(vkdev->vkdevice(), ptr->memory, 0);
        delete ptr;
    }
    dedicated_buffer_blocks.clear();
}

VkBufferMemory* VkWeightBufferAllocator::fastMalloc(size_t size)
{
    // fprintf(stderr, "VkWeightBufferAllocator fastMalloc %lu\n", size);
    size_t aligned_size = alignSize(size, buffer_offset_alignment);
    const int buffer_block_count = buffer_blocks.size();
    // find first spare space in buffer_blocks
    int block_index = -1;
    size_t block_offset = 0;
    for (int i=0; i<buffer_block_count; i++)
    {
        size_t free_size = buffer_block_free_spaces[i];
        if (free_size >= aligned_size)
        {
            block_index = i;
            block_offset = block_size - free_size;
            break;
        }
    }
    if (block_index != -1)
    {
        // return sub buffer
        VkBufferMemory* ptr = new VkBufferMemory;
        ptr->buffer = buffer_blocks[block_index]->buffer;
        ptr->offset = block_offset;
        ptr->memory = buffer_blocks[block_index]->memory;
        ptr->capacity = aligned_size;
        ptr->mapped_ptr = buffer_blocks[block_index]->mapped_ptr;
        buffer_block_free_spaces[block_index] -= aligned_size;
        return ptr;
    }
    size_t new_block_size = std::max(block_size, aligned_size);
    // create new block
    VkBufferMemory* block = new VkBufferMemory;
    block->buffer = create_buffer(new_block_size, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT);
    block->offset = 0;
    if (vkdev->info.support_VK_KHR_get_memory_requirements2 && vkdev->info.support_VK_KHR_dedicated_allocation)
    {
        VkBufferMemoryRequirementsInfo2KHR bufferMemoryRequirementsInfo2;
        bufferMemoryRequirementsInfo2.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_REQUIREMENTS_INFO_2_KHR;
        bufferMemoryRequirementsInfo2.pNext = 0;
        bufferMemoryRequirementsInfo2.buffer = block->buffer;
        VkMemoryRequirements2KHR memoryRequirements2;
        memoryRequirements2.sType = VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2_KHR;
        memoryRequirements2.pNext = 0;
        VkMemoryDedicatedRequirementsKHR memoryDedicatedRequirements;
        memoryDedicatedRequirements.sType = VK_STRUCTURE_TYPE_MEMORY_DEDICATED_REQUIREMENTS_KHR;
        memoryDedicatedRequirements.pNext = 0;
        memoryRequirements2.pNext = &memoryDedicatedRequirements;
        vkdev->vkGetBufferMemoryRequirements2KHR(vkdev->vkdevice(), &bufferMemoryRequirementsInfo2, &memoryRequirements2);
        bool dedicatedAllocation = memoryDedicatedRequirements.requiresDedicatedAllocation || memoryDedicatedRequirements.prefersDedicatedAllocation;
        if (dedicatedAllocation)
        {
            block->memory = allocate_dedicated_memory(memoryRequirements2.memoryRequirements.size, vkdev->info.device_local_memory_index, block->buffer);
            vkBindBufferMemory(vkdev->vkdevice(), block->buffer, block->memory, 0);
            block->mapped_ptr = 0;
            if (mappable)
            {
                vkMapMemory(vkdev->vkdevice(), block->memory, 0, new_block_size, 0, &block->mapped_ptr);
            }
            dedicated_buffer_blocks.push_back(block);
            // return sub buffer
            VkBufferMemory* ptr = new VkBufferMemory;
            ptr->buffer = block->buffer;
            ptr->offset = 0;
            ptr->memory = block->memory;
            ptr->capacity = new_block_size;
            ptr->mapped_ptr = block->mapped_ptr;
            return ptr;
        }
    }
    VkMemoryRequirements memoryRequirements;
    vkGetBufferMemoryRequirements(vkdev->vkdevice(), block->buffer, &memoryRequirements);
    block->memory = allocate_memory(memoryRequirements.size, vkdev->info.device_local_memory_index);
    vkBindBufferMemory(vkdev->vkdevice(), block->buffer, block->memory, 0);
    // fprintf(stderr, "VkWeightBufferAllocator M %p\n", block->buffer);
    block->mapped_ptr = 0;
    if (mappable)
    {
        vkMapMemory(vkdev->vkdevice(), block->memory, 0, new_block_size, 0, &block->mapped_ptr);
    }
    buffer_blocks.push_back(block);
    buffer_block_free_spaces.push_back(new_block_size - aligned_size);
    // return sub buffer
    VkBufferMemory* ptr = new VkBufferMemory;
    ptr->buffer = block->buffer;
    ptr->offset = 0;
    ptr->memory = block->memory;
    ptr->capacity = aligned_size;
    ptr->mapped_ptr = block->mapped_ptr;
    return ptr;
}

void VkWeightBufferAllocator::fastFree(VkBufferMemory* ptr)
{
    // fprintf(stderr, "VkWeightBufferAllocator F %p\n", ptr->buffer);
    delete ptr;
}

VkUnlockedStagingBufferAllocator::VkUnlockedStagingBufferAllocator(const VulkanDevice* _vkdev) : VkAllocator(_vkdev)
{
    mappable = true;
    memory_type_index = vkdev->info.unified_memory_index;
    if (memory_type_index == -1)
        memory_type_index = vkdev->info.host_visible_memory_index;
    size_compare_ratio = 192;// 0.75f * 256
}

VkUnlockedStagingBufferAllocator::~VkUnlockedStagingBufferAllocator()
{
    clear();
}

void VkUnlockedStagingBufferAllocator::set_size_compare_ratio(float scr)
{
    if (scr < 0.f || scr > 1.f)
    {
        fprintf(stderr, "invalid size compare ratio %f\n", scr);
        return;
    }
    size_compare_ratio = (unsigned int)(scr * 256);
}

void VkUnlockedStagingBufferAllocator::clear()
{
    // fprintf(stderr, "VkUnlockedStagingBufferAllocator %lu\n", budgets.size());
    std::list<VkBufferMemory*>::iterator it = budgets.begin();
    for (; it != budgets.end(); it++)
    {
        VkBufferMemory* ptr = *it;
        // fprintf(stderr, "VkUnlockedStagingBufferAllocator F %p\n", ptr->buffer);
        vkUnmapMemory(vkdev->vkdevice(), ptr->memory);
        vkDestroyBuffer(vkdev->vkdevice(), ptr->buffer, 0);
        vkFreeMemory(vkdev->vkdevice(), ptr->memory, 0);
        delete ptr;
    }
    budgets.clear();
}

VkBufferMemory* VkUnlockedStagingBufferAllocator::fastMalloc(size_t size)
{
    // find free budget
    std::list<VkBufferMemory*>::iterator it = budgets.begin();
    for (; it != budgets.end(); it++)
    {
        VkBufferMemory* ptr = *it;
        size_t capacity = ptr->capacity;
        // size_compare_ratio ~ 100%
        if (capacity >= size && ((capacity * size_compare_ratio) >> 8) <= size)
        {
            budgets.erase(it);
            // fprintf(stderr, "VkUnlockedStagingBufferAllocator M %p %lu reused %lu\n", ptr->buffer, size, capacity);
            return ptr;
        }
    }
    VkBufferMemory* ptr = new VkBufferMemory;
    ptr->buffer = create_buffer(size, VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT);
    ptr->offset = 0;
    VkMemoryRequirements memoryRequirements;
    vkGetBufferMemoryRequirements(vkdev->vkdevice(), ptr->buffer, &memoryRequirements);
    ptr->memory = allocate_memory(memoryRequirements.size, memory_type_index);
    vkBindBufferMemory(vkdev->vkdevice(), ptr->buffer, ptr->memory, 0);
    ptr->capacity = size;
    vkMapMemory(vkdev->vkdevice(), ptr->memory, 0, size, 0, &ptr->mapped_ptr);
    // fprintf(stderr, "VkUnlockedStagingBufferAllocator M %p %lu\n", ptr->buffer, size);
    return ptr;
}

void VkUnlockedStagingBufferAllocator::fastFree(VkBufferMemory* ptr)
{
    // fprintf(stderr, "VkUnlockedStagingBufferAllocator F %p\n", ptr->buffer);
    // return to budgets
    budgets.push_back(ptr);
}

VkStagingBufferAllocator::VkStagingBufferAllocator(const VulkanDevice* _vkdev) : VkUnlockedStagingBufferAllocator(_vkdev)
{
}

VkStagingBufferAllocator::~VkStagingBufferAllocator()
{
    clear();
}

void VkStagingBufferAllocator::clear()
{
    MutexLockGuard guard(budgets_lock);
    VkUnlockedStagingBufferAllocator::clear();
}

VkBufferMemory* VkStagingBufferAllocator::fastMalloc(size_t size)
{
    MutexLockGuard guard(budgets_lock);
    return VkUnlockedStagingBufferAllocator::fastMalloc(size);
}

void VkStagingBufferAllocator::fastFree(VkBufferMemory* ptr)
{
    MutexLockGuard guard(budgets_lock);
    VkUnlockedStagingBufferAllocator::fastFree(ptr);
}

VkWeightStagingBufferAllocator::VkWeightStagingBufferAllocator(const VulkanDevice* _vkdev) : VkAllocator(_vkdev)
{
    mappable = true;
    memory_type_index = vkdev->info.host_visible_memory_index;
}

VkWeightStagingBufferAllocator::~VkWeightStagingBufferAllocator()
{
}

VkBufferMemory* VkWeightStagingBufferAllocator::fastMalloc(size_t size)
{
    VkBufferMemory* ptr = new VkBufferMemory;
    ptr->buffer = create_buffer(size, VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT);
    ptr->offset = 0;
    VkMemoryRequirements memoryRequirements;
    vkGetBufferMemoryRequirements(vkdev->vkdevice(), ptr->buffer, &memoryRequirements);
    ptr->memory = allocate_memory(memoryRequirements.size, memory_type_index);
    vkBindBufferMemory(vkdev->vkdevice(), ptr->buffer, ptr->memory, 0);
    ptr->capacity = size;
    vkMapMemory(vkdev->vkdevice(), ptr->memory, 0, size, 0, &ptr->mapped_ptr);
    // fprintf(stderr, "VkWeightStagingBufferAllocator M %p %lu\n", ptr->buffer, size);
    return ptr;
}

void VkWeightStagingBufferAllocator::fastFree(VkBufferMemory* ptr)
{
    // fprintf(stderr, "VkWeightStagingBufferAllocator F %p\n", ptr->buffer);
    vkUnmapMemory(vkdev->vkdevice(), ptr->memory);
    vkDestroyBuffer(vkdev->vkdevice(), ptr->buffer, 0);
    vkFreeMemory(vkdev->vkdevice(), ptr->memory, 0);
    delete ptr;
}

#endif // NCNN_VULKAN

} // namespace ncnn
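Both `UnlockedPoolAllocator::fastMalloc` and `VkUnlockedStagingBufferAllocator::fastMalloc` apply the same 8-bit fixed-point reuse test. A cached buffer of capacity `bs` serves a request of `size` bytes only when `bs >= size` and `(bs * size_compare_ratio) >> 8 <= size`. In other words, the request must fill at least `size_compare_ratio / 256` of the cached buffer. The host-only sketch below isolates that predicate. `ratio_to_fixed` and `can_reuse` are hypothetical helper names and are not part of ncnn.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring set_size_compare_ratio(): maps a ratio in
// [0, 1] onto the 0..256 fixed-point scale the allocators store.
inline unsigned int ratio_to_fixed(float scr)
{
    return (unsigned int)(scr * 256);
}

// Hypothetical helper mirroring the reuse test in fastMalloc(): a cached
// buffer of `capacity` bytes is reused for a request of `size` bytes only if
// it is large enough and capacity * (ratio / 256) does not exceed the request.
inline bool can_reuse(size_t capacity, size_t size, unsigned int size_compare_ratio)
{
    return capacity >= size && ((capacity * size_compare_ratio) >> 8) <= size;
}
```

The staging allocator's default ratio is 192 (0.75). At that ratio, a cached 1024-byte buffer is reused for requests of 768 to 1024 bytes. A 767-byte request gets a fresh buffer instead, so more than a quarter of the cached buffer is never wasted.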
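When `device_local_memory_index` equals `unified_memory_index`, the blob and weight allocators treat their blocks as mappable. Every sub-buffer offset must then satisfy both `buffer_offset_alignment` and `memory_map_alignment`. To achieve this, the constructors take the least common multiple of the two and round the default block size up to it. The sketch below reproduces `least_common_multiple` and pairs it with `align_up`, a hypothetical division-based helper standing in for ncnn's `alignSize`. Unlike a bit-mask round-up, `align_up` does not need the alignment to be a power of two. `combined_alignment` is likewise an illustrative name, and all inputs are assumed to be non-zero.

```cpp
#include <cassert>
#include <cstddef>

// Same iterative algorithm as least_common_multiple() above: step through
// multiples of the larger value until the smaller one divides it.
static inline size_t least_common_multiple(size_t a, size_t b)
{
    if (a == b)
        return a;
    if (a > b)
        return least_common_multiple(b, a);
    size_t lcm = b;
    while (lcm % a != 0)
        lcm += b;
    return lcm;
}

// Hypothetical stand-in for alignSize(): round sz up to a multiple of n.
static inline size_t align_up(size_t sz, size_t n)
{
    return (sz + n - 1) / n * n;
}

// Hypothetical: the sub-allocation alignment chosen by the constructors.
static inline size_t combined_alignment(size_t buffer_offset_alignment, size_t memory_map_alignment, bool mappable)
{
    return mappable ? least_common_multiple(buffer_offset_alignment, memory_map_alignment)
                    : buffer_offset_alignment;
}
```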
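`VkUnlockedBlobBufferAllocator` keeps a list of free `(offset, size)` ranges for each block. `fastMalloc` takes the front of the first range that fits. `fastFree` merges the returned range with an adjacent free range on either side; only when no neighbour is free does it insert a new entry. The sketch below models that bookkeeping for a single block on the host, with no Vulkan calls. `BlobBudgetModel` is a hypothetical name. Returning `SIZE_MAX` marks the point where the real allocator would create a new block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <utility>

// Hypothetical host-only model of one block's budget list (offset, size).
class BlobBudgetModel
{
public:
    explicit BlobBudgetModel(size_t block_size)
    {
        budgets.push_back(std::make_pair((size_t)0, block_size));
    }

    // First fit: carve aligned_size bytes off the front of the first range that fits.
    size_t allocate(size_t aligned_size)
    {
        for (Iter it = budgets.begin(); it != budgets.end(); ++it)
        {
            if (it->second < aligned_size)
                continue;
            size_t offset = it->first;
            if (it->second == aligned_size)
            {
                budgets.erase(it);
            }
            else
            {
                it->first += aligned_size;
                it->second -= aligned_size;
            }
            return offset;
        }
        return SIZE_MAX; // no room: the real allocator opens a new block here
    }

    // Return a range, coalescing with free neighbours on the left and/or right.
    void release(size_t offset, size_t size)
    {
        Iter left = budgets.end();
        Iter right = budgets.end();
        for (Iter it = budgets.begin(); it != budgets.end(); ++it)
        {
            if (it->first + it->second == offset)
                left = it;
            else if (offset + size == it->first)
                right = it;
        }
        if (left != budgets.end() && right != budgets.end())
        {
            left->second = right->first + right->second - left->first;
            budgets.erase(right);
        }
        else if (left != budgets.end())
        {
            left->second = offset + size - left->first;
        }
        else if (right != budgets.end())
        {
            right->second = right->first + right->second - offset;
            right->first = offset;
        }
        else if (offset == 0)
        {
            budgets.push_front(std::make_pair(offset, size));
        }
        else
        {
            budgets.push_back(std::make_pair(offset, size));
        }
    }

    size_t free_range_count() const { return budgets.size(); }

private:
    typedef std::list< std::pair<size_t, size_t> >::iterator Iter;
    std::list< std::pair<size_t, size_t> > budgets;
};
```

Freeing the middle range of three adjacent allocations first creates a separate free entry. Freeing its neighbours afterwards coalesces everything back into one range that covers the whole block.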
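`VkWeightBufferAllocator` never reclaims individual weight buffers; its `fastFree` only deletes the `VkBufferMemory` wrapper. Its non-dedicated path is therefore a per-block bump allocator. The next offset is `block_size - free_size`, and a request that fits nowhere opens a new block of `max(block_size, aligned_size)`. The host-only sketch below models that path. `WeightBumpModel` is a hypothetical name, and the `VK_KHR_dedicated_allocation` branch is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-only model: one remaining-free-space counter per block.
class WeightBumpModel
{
public:
    explicit WeightBumpModel(size_t _block_size) : block_size(_block_size) {}

    // Returns (block index, offset) for an already aligned request.
    std::pair<size_t, size_t> allocate(size_t aligned_size)
    {
        for (size_t i = 0; i < free_spaces.size(); i++)
        {
            if (free_spaces[i] >= aligned_size)
            {
                size_t offset = block_size - free_spaces[i];
                free_spaces[i] -= aligned_size;
                return std::make_pair(i, offset);
            }
        }
        // No block has room: open a new one, at least as large as the request.
        size_t new_block_size = std::max(block_size, aligned_size);
        free_spaces.push_back(new_block_size - aligned_size);
        return std::make_pair(free_spaces.size() - 1, (size_t)0);
    }

    size_t block_count() const { return free_spaces.size(); }

private:
    size_t block_size;
    std::vector<size_t> free_spaces;
};
```

An oversized request gets a block of exactly its own size with no free space left. As a result, the `block_size - free_size` offset formula is never applied to that larger block for any non-zero request.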