84 Commits (1469bc8b19b83d44206f36abfa3dc7377feeef69)

Author SHA1 Message Date
  nihui 7d1eec3d5d the use_bf16_storage option 6 years ago
  xieydd b760e22da2 fix requant relu6 bug (#1590) 6 years ago
  nihui 52ce59e672 fix build with requant option on 6 years ago
  nihui 0f7e7bca02 shader shape specialization constant and basic local group size partition (#1523) 6 years ago
  nihui e2bd4eae6e write shape as 4-number tuple 6 years ago
  nihui 6cefaad957 ncnnoptimize shape inference, load shape hint 6 years ago
  nihui a718129d76 shader pack8 option works 6 years ago
  nihui 6f2ef1932d int8 code refactoring wip, add int8 test 6 years ago
  Anton Kochkov 07170542c9 Fix GCC 9.x warnings (#1462) 6 years ago
  Sungmann Cho 9bfc554bc9 Fix warnings on Visual Studio (#1431) 6 years ago
  nihuini 3c9b3074e4 reclaim local vulkan allocator after blob_mats_gpu clear, fix random crash in multithread gpu inference without explicit per-thread allocator set 6 years ago
  nihuini 50e8b5e4e8 multiple transfers may run concurrently if there is no dependency with each other, do not share staging buffer memory to fix potential data race 6 years ago
  nihuini 33956cbfc3 pretty error info 6 years ago
  nihuini a170ef1acf remove the default option usage in layer interface, fix write out of range in cast arm pack4, handle fp16p conversion on cpu/gpu transfer 6 years ago
  nihuini e73b06bbb8 fix build with NCNN_STRING=OFF 6 years ago
  nihuini 64333429bb data reader wrapper, fix #1325 6 years ago
  nihui 8c1b87b1a2 fallback to cpu if no vulkan device found 6 years ago
  Natsu 637d96c1d2 Fix gcc 9 compilation failure (#1189) 6 years ago
  nihui ff62e7eed9 use_packing_layout option works 6 years ago
  nihui b4c388a72a Mat misc function accept option parameter, deconvolution pack4 arm neon 6 years ago
  nihui 8c53706987 net vkdev getter api 6 years ago
  BUG1989 bcfe9f453f initial the ncnn post training quantization tools (#1067) 7 years ago
  nihuini b25f76833a restore per extractor allocator setters, partially revert e09607bc22 7 years ago
  nihuini 21b5508c96 shared locked vkallocator cannot prevent concurrent accessing during actual gpu inference, use separated vkallocator for each queue 7 years ago
  nihuini 040a8d2427 set vulkan device by gpu index 7 years ago
  nihui 21f79b8546 prefer cpu fp16 casting to reduce upload/download overhead on discrete gpu 7 years ago
  nihuini e09607bc22 add option to upload model function, pipeline creation honors option use flags, setting allocator per extractor do not make much sense 7 years ago
  BUG1989 d9f269fa3d use sgemm fp32 on arm platform,optimize conv1x1s2 (#1031) 7 years ago
  nihuini 838c5df839 option api changes 7 years ago
  nihuini 7f7bbf12e5 new api for getting the default gpu device 7 years ago
  nihuini cd7559c639 more fix for fp16p, still disabled by default 7 years ago
  nihui 25b9736f82 shader fp16 packed 7 years ago
  nihuini 738fb6bb14 print gpu per layer benchmark 7 years ago
  nihuini c9a9486307 merge command submit and wait, expose queue_count, concurrent queue submission shall work 7 years ago
  nihuini 7a8f68aca6 move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works 7 years ago
  nihuini b81e1f3906 get rid of the old workaround :) 7 years ago
  nihuini 4729ea3505 bottom blob memory never alias, reuse blob memory more elegantly relying on refcount 7 years ago
  nihui 8724440c59 bind wait barrier count member to memory, fix #932 7 years ago
  nihui 162c46647d do not create fp16 shader module on unsupported platform 7 years ago
  nihui d753fe2589 upload fp16 weight, enable fp16 storage and arithmetic 7 years ago
  Gemfield add8c73922 Fix the return value of load_param and load_model (#855) 7 years ago
  Gemfield 573c2bcd93 Fix crash issue during load_model (#848) 7 years ago
  nihui caeb85d6cd multithreaded pipeline creation and destruction may cause driver crash :( 7 years ago
  nihuini b2e41bf83d fallback convolution to cpu path for pad -233 7 years ago
  nihuini d999f43b87 fix vulkan initialization using memory loading 7 years ago
  nihuini d263cd507c gpu packing and unpacking 7 years ago
  nihuini d3a11eb6c9 one codepath for unified and discrete device 7 years ago
  nihuini 433a92401a auto barrier in pipeline and copy command 7 years ago
  nihuini 1f4bdd91b5 uint32_t typed workgroup size 7 years ago
  BUG1989 df3d224484 new int8 implement,better accuracy (#749) 7 years ago