384 Commits (b49cb56ad9f4d3d2ca83bf7eec4e808a534cce72)

Author SHA1 Message Date
  nihui b49cb56ad9 constify vulkan device handle, use default local vulkan device if not specified 7 years ago
  nihui 5e07749a4a do not emit upload transfer on unified memory 7 years ago
  nihui 9ebac3fe9e dedicated reference counter for staging data 7 years ago
  nihui 68afd1fa17 reset fence 7 years ago
  nihui 81ee56b209 copy buffer has offset alignment limit, re-implement concat as compute pipeline 7 years ago
  nihuini 83efa73cf6 fallback to cpu forward if layer not support vulkan, automatically! 7 years ago
  nihuini bdd305638d command reset 7 years ago
  nihuini 10a088397e concat interleave image row 7 years ago
  nihuini 1ace8068e3 zero detected is not error 7 years ago
  nihuini 14efdd8e00 reorg shader 7 years ago
  nihui b62e9c4b1e shufflechannel shader 7 years ago
  nihuini bb04055e80 permute shader 7 years ago
  nihui 24f423b0c6 fix build on msvc 7 years ago
  nihui cc4376d8e6 do not upload unnecessary pack1 weight, reduce gpu memory usage 7 years ago
  nihui 0ad0c07526 drop duplicated weight data in convolution-fc, use the more light-weight pipelines 7 years ago
  nihuini 43c4b57201 group deconvolution packing family 7 years ago
  nihuini 8547864b6f group convolution packing family 7 years ago
  nihuini 675fcc72a5 interp vulkan 7 years ago
  nihuini 37413ea95c implement depthwise deconvolution vulkan, fix top blob state 7 years ago
  nihuini 468516879f implement deconvolution vulkan family support 7 years ago
  nihuini e213605cd4 reduce memory usage of weight packing 7 years ago
  nihuini 7312887671 transfer command hold data context 7 years ago
  nihuini 4a57f88c3c vkcompute auto begin end, use proper alignment for vktransfer staging buffer offset 7 years ago
  nihuini 39f2c71d5b fix name conflict on ios 7 years ago
  nihui f4e12101c0 fix convolution typed innerproduct pack4 7 years ago
  nihui 0acdbebf3b merge refcount into buffer memory cookie 7 years ago
  nihui 960ffa1a50 optimize workgroup size for convolution depthwise and innerproduct pack4 7 years ago
  nihui a15b389d86 fix innerproduct pack1to4 pack4to1 weight upload 7 years ago
  Emmanuel Benazera a8fd79e1bc fixed cell initialization in LSTM layer 7 years ago
  nihui 62543f9b1e flatten pack1to4 7 years ago
  nihui 9480dcbc36 fix innerproduct out packing 7 years ago
  nihui f9dc551081 add innerproduct pack1to4 pack4to1 glue code 7 years ago
  nihui 3f91d6b529 add innerproduct pack1to4 pack4to1 shader 7 years ago
  nihui cd7f120250 lrn norm across channel pack4, rename member name with pipeline prefix 7 years ago
  nihui 7ee3216fff add convolution pack1to4 pack4to1 7 years ago
  nihui 9d2b345eab lrn region within channel pack4 7 years ago
  nihui ad68e1e0e6 enable googlenet alexnet vulkan benchmark, fix build on msvc 7 years ago
  nihui 559183904b fix random crash on dedicated allocation 7 years ago
  nihui f9ea621305 pooling full padding 7 years ago
  nihui ee59f14900 add lrn shader 7 years ago
  nihui 1792fe79ec drop deprectaed softmax shader, destory softmax pipeline 7 years ago
  nihui 9e2b327c17 packing shader for 3-dim blob 7 years ago
  nihuini 9a805b045e innerproduct receive flattened blob 7 years ago
  nihui c60773bde4 add transfer-transfer barrier, concat pack4 7 years ago
  nihui 303996af4c auto flatten before innerproduct 7 years ago
  nihuini ba723706bb add flatten pack4 7 years ago
  nihui f0b4933eac massive simd optimize in compute shader (#772) 7 years ago
  nihui 8e5674363b element packing (#770) 7 years ago
  mzpan 777f3f98d9 add w=1 h=1 op (#765) 7 years ago
  Eric Liu e6b1412217 Increase a few performance of yolov3 and change tab to space (#767) 7 years ago