nihui
b49cb56ad9
constify vulkan device handle, use default local vulkan device if not specified
7 years ago
nihui
5e07749a4a
do not emit upload transfer on unified memory
7 years ago
nihui
9ebac3fe9e
dedicated reference counter for staging data
7 years ago
nihui
68afd1fa17
reset fence
7 years ago
nihui
81ee56b209
copy buffer has offset alignment limit, re-implement concat as compute pipeline
7 years ago
nihuini
83efa73cf6
automatically fall back to cpu forward if layer does not support vulkan
7 years ago
nihuini
bdd305638d
command reset
7 years ago
nihuini
10a088397e
concat interleave image row
7 years ago
nihuini
1ace8068e3
zero detected is not error
7 years ago
nihuini
14efdd8e00
reorg shader
7 years ago
nihui
b62e9c4b1e
shufflechannel shader
7 years ago
nihuini
bb04055e80
permute shader
7 years ago
nihui
24f423b0c6
fix build on msvc
7 years ago
nihui
cc4376d8e6
do not upload unnecessary pack1 weight, reduce gpu memory usage
7 years ago
nihui
0ad0c07526
drop duplicated weight data in convolution-fc, use the more light-weight pipelines
7 years ago
nihuini
43c4b57201
group deconvolution packing family
7 years ago
nihuini
8547864b6f
group convolution packing family
7 years ago
nihuini
675fcc72a5
interp vulkan
7 years ago
nihuini
37413ea95c
implement depthwise deconvolution vulkan, fix top blob state
7 years ago
nihuini
468516879f
implement deconvolution vulkan family support
7 years ago
nihuini
e213605cd4
reduce memory usage of weight packing
7 years ago
nihuini
7312887671
transfer command hold data context
7 years ago
nihuini
4a57f88c3c
vkcompute auto begin end, use proper alignment for vktransfer staging buffer offset
7 years ago
nihuini
39f2c71d5b
fix name conflict on ios
7 years ago
nihui
f4e12101c0
fix convolution typed innerproduct pack4
7 years ago
nihui
0acdbebf3b
merge refcount into buffer memory cookie
7 years ago
nihui
960ffa1a50
optimize workgroup size for convolution depthwise and innerproduct pack4
7 years ago
nihui
a15b389d86
fix innerproduct pack1to4 pack4to1 weight upload
7 years ago
Emmanuel Benazera
a8fd79e1bc
fixed cell initialization in LSTM layer
7 years ago
nihui
62543f9b1e
flatten pack1to4
7 years ago
nihui
9480dcbc36
fix innerproduct out packing
7 years ago
nihui
f9dc551081
add innerproduct pack1to4 pack4to1 glue code
7 years ago
nihui
3f91d6b529
add innerproduct pack1to4 pack4to1 shader
7 years ago
nihui
cd7f120250
lrn norm across channel pack4, rename member name with pipeline prefix
7 years ago
nihui
7ee3216fff
add convolution pack1to4 pack4to1
7 years ago
nihui
9d2b345eab
lrn region within channel pack4
7 years ago
nihui
ad68e1e0e6
enable googlenet alexnet vulkan benchmark, fix build on msvc
7 years ago
nihui
559183904b
fix random crash on dedicated allocation
7 years ago
nihui
f9ea621305
pooling full padding
7 years ago
nihui
ee59f14900
add lrn shader
7 years ago
nihui
1792fe79ec
drop deprecated softmax shader, destroy softmax pipeline
7 years ago
nihui
9e2b327c17
packing shader for 3-dim blob
7 years ago
nihuini
9a805b045e
innerproduct receive flattened blob
7 years ago
nihui
c60773bde4
add transfer-transfer barrier, concat pack4
7 years ago
nihui
303996af4c
auto flatten before innerproduct
7 years ago
nihuini
ba723706bb
add flatten pack4
7 years ago
nihui
f0b4933eac
massive simd optimize in compute shader ( #772 )
* init vec4 shader
* more vec4 shader ...
* convolutiondepthwise is depthwise
* pooling pack4, fix global pooling
* dropout pack4, relu pack4
* softmax pack4
* more shader vec4 ..
* fix staging remap, remove layer pipeline member, add destroy_pipeline interface, add pack4 glue code
* eltwise pack4 glue code
* add binary pack4, unary pack4
* add binaryop unaryop pack4 glue code
7 years ago
nihui
8e5674363b
element packing ( #770 )
* mat packing
* packing layer
* packing works
* convert_packing function
7 years ago
mzpan
777f3f98d9
add w=1 h=1 op ( #765 )
* add w=1 h=1 op
7 years ago
Eric Liu
e6b1412217
Slightly improve performance of yolov3 and change tabs to spaces ( #767 )
* Fixed a yolov3 resolution bug
* Set yolo default mean to 1.0
* Fix coding style and slightly improve performance
* Update mobilenet yolov3 benchmark param
7 years ago