nihui
dd83284cee
prelu shader
7 years ago
nihui
e50b339f04
clip shader
7 years ago
nihui
69788b0467
reshape shader family
7 years ago
nihui
c41bcd98a3
priorbox shader, fix permute order 1 on image, fix potential staging memory leak
7 years ago
nihuini
a88b6bfbb8
update softmax version
7 years ago
nihuini
0956d26df1
add absval sigmoid tanh shader
7 years ago
nihuini
b5faa0e519
respect pad param in deconv vulkan
7 years ago
BUG1989
8e337d440e
fix the bug with convdw7x7 op working on int8 mode (#818)
7 years ago
nihuini
4bc543d85c
crop shader
7 years ago
nihuini
9787625e4b
warn users about the old wrong softmax behavior on axis not zero
7 years ago
nihuini
c54e57ed6f
Merge branch 'master' of https://github.com/Tencent/ncnn
7 years ago
nihuini
85a28959e4
fix binaryop shader binding, use shared buffer state, fix blob copy in non-light mode, fix #817
7 years ago
BUG1989
8ff831f7cd
fix the segmentation fault when loading int8 model (#811)
7 years ago
nihuini
ff0e8c85c5
binding the same pipeline may cause the driver to incorrectly optimize into one, use two pipelines to always change the current one
7 years ago
BUG1989
df3d224484
new int8 implementation, better accuracy (#749)
* add the armv7a conv3x3s1 implementation without overflow, remove old code
* fix the bug of conv3x3s2 packed int8
* new int8 implementation, per-channel weight quantization, better accuracy
* fix the bug of conv3x3s1 packed int8 neon
* add the naive c fp32 and int8 winograd F(2,3)
* add the neon intrinsic int8 winograd F(2,3)
* optimize the armv7a int8 winograd F(2,3) with neon assembly
* optimize the armv7a int8 winograd F(2,3) input transform with assembly.
* add the requantize layer and int8 relu implementation.
* add graph optimization conv1x1s2 -> conv1x1s1, begin optimizing int8 aarch64.
* fix int8 bugs
* add the c naive im2col with sgemm
* add aarch64 int8 winograd f23, conv3x3s2 naive implementation
* add the int8 sgemm conv7x7s2 on x86/armv7a platform
* optimize the int8 sgemm by neon intrinsic and packed kernel
* optimize the int8 sgemm with packed data
* optimize the int8 sgemm with armv7a neon assembly
* add the int8 sgemm on arm64-v8a platform
* prepare to merge latest code from master
* add the int8 param files
* add the fuse_network method to class Net
7 years ago
nihuini
4ac56d3c1c
unified memory index is not mandatory, sanity check
7 years ago
nihuini
5f0ee22a33
treat as unified memory architecture if memory heap is same
7 years ago
nihui
253bef2f7b
example now runs on gpu when vulkan enabled
7 years ago
nihui
d85775fbcd
fix softmax axis order on 3-dim, fix caffe reshape conversion, regenerate ssd param
7 years ago
nihui
979ed57487
packing param for identity packing when padding disabled, auto packing conversion between cpu and gpu blob
7 years ago
nihui
b49cb56ad9
constify vulkan device handle, use default local vulkan device if not specified
7 years ago
nihui
5e07749a4a
do not emit upload transfer on unified memory
7 years ago
nihui
182c340b3a
enable ssd vulkan benchmark
7 years ago
nihui
9ebac3fe9e
dedicated reference counter for staging data
7 years ago
nihui
68afd1fa17
reset fence
7 years ago
nihui
81ee56b209
copy buffer has offset alignment limit, re-implement concat as compute pipeline
7 years ago
nihuini
f162de7263
drop deprecated hack
7 years ago
nihuini
83efa73cf6
automatically fall back to cpu forward if layer does not support vulkan
7 years ago
nihuini
bdd305638d
command reset
7 years ago
nihuini
10a088397e
concat interleave image row
7 years ago
nihuini
1ace8068e3
zero detected is not error
7 years ago
nihuini
ab4c94aea9
fix cpu-only build
7 years ago
nihuini
b54e115f6e
enable mobilenet-yolo mobilenet-yolov3 vulkan benchmark
7 years ago
nihuini
14efdd8e00
reorg shader
7 years ago
nihui
723d326760
enable shufflenet and vgg16 in vulkan benchmark
7 years ago
nihui
b62e9c4b1e
shufflechannel shader
7 years ago
nihuini
bb04055e80
permute shader
7 years ago
nihui
24f423b0c6
fix build on msvc
7 years ago
nihui
cc4376d8e6
do not upload unnecessary pack1 weight, reduce gpu memory usage
7 years ago
nihui
0ad0c07526
drop duplicated weight data in convolution-fc, use the more light-weight pipelines
7 years ago
nihuini
43c4b57201
group deconvolution packing family
7 years ago
nihuini
8547864b6f
group convolution packing family
7 years ago
nihuini
675fcc72a5
interp vulkan
7 years ago
nihuini
37413ea95c
implement depthwise deconvolution vulkan, fix top blob state
7 years ago
nihuini
468516879f
implement deconvolution vulkan family support
7 years ago
nihuini
e213605cd4
reduce memory usage of weight packing
7 years ago
nihuini
7312887671
transfer command hold data context
7 years ago
nihuini
4a57f88c3c
vkcompute auto begin end, use proper alignment for vktransfer staging buffer offset
7 years ago
nihuini
39f2c71d5b
fix name conflict on ios
7 years ago
nihui
f4e12101c0
fix convolution typed innerproduct pack4
7 years ago