nihuini
d933f384b6
bump engine version
7 years ago
nihuini
038389fa63
blacklist known buggy driver
7 years ago
nihuini
d999f43b87
fix vulkan initialization using memory loading
7 years ago
nihuini
d263cd507c
gpu packing and unpacking
7 years ago
nihuini
806911a549
packing vec and image shader
7 years ago
BUG1989
2f4c4a8202
fix the compile error when using armv7a without neon ( #835 )
7 years ago
nihuini
6f9ffca7e3
fix crop on channel dim only, fix #797 , fix #831
7 years ago
nihuini
76638aebf9
fix build on msvc
7 years ago
nihuini
d9301c4f59
convert mxnet crop slice step1, convert onnx slice step1, fix reduction dims 2, fix #441 , fix #498 , fix #519
7 years ago
BUG1989
ff38053321
[WIP] arm64-v8a int8 optimization ( #823 )
* requantize layer arm64-v8a neon implement
* convdw3x3s1 arm64-v8a neon implement
* convdw3x3s2 arm64-v8a neon implement
* conv1x1s1 arm64-v8a is optimized by neon assembly
* conv sgemm int8 optimized with neon assembly,kernel transform is offline
* conv winograd int8 optimized with neon assembly,fix ci build failed
* conv3x3s2 int8 arm64-v8a optimized with neon assembly,remove old codes.
7 years ago
nihuini
d3a11eb6c9
one codepath for unified and discrete device
7 years ago
nihuini
433a92401a
auto barrier in pipeline and copy command
7 years ago
nihuini
2672cd437f
add layer type index member
7 years ago
nihuini
ce65edcc84
fix flatten pack1to4
7 years ago
nihuini
5646b7d2c2
flatten image
7 years ago
nihuini
1f4bdd91b5
uint32_t typed workgroup size
7 years ago
nihuini
2e939fab0f
fix memleak
7 years ago
nihuini
532054b453
expose more device info
7 years ago
nihui
dd83284cee
prelu shader
7 years ago
nihui
e50b339f04
clip shader
7 years ago
nihui
69788b0467
reshape shader family
7 years ago
nihui
c41bcd98a3
priorbox shader, fix permute order 1 on image, fix potential staging memory leak
7 years ago
nihuini
0956d26df1
add absval sigmoid tanh shader
7 years ago
nihuini
b5faa0e519
respect pad param in deconv vulkan
7 years ago
BUG1989
8e337d440e
fix the bug with convdw7x7 op working on int8 mode ( #818 )
7 years ago
nihuini
4bc543d85c
crop shader
7 years ago
nihuini
9787625e4b
warn users about the old wrong softmax behavior on axis not zero
7 years ago
nihuini
c54e57ed6f
Merge branch 'master' of https://github.com/Tencent/ncnn
7 years ago
nihuini
85a28959e4
fix binaryop shader binding, use shared buffer state, fix blob copy in non-light mode, fix #817
7 years ago
BUG1989
8ff831f7cd
fix the segmentation fault when load int8 model ( #811 )
7 years ago
nihuini
ff0e8c85c5
binding the same pipeline may cause the driver to incorrectly optimize into one, use two pipelines to always change the current one
7 years ago
BUG1989
df3d224484
new int8 implement,better accuracy ( #749 )
* add the armv7a conv3x3s1 implement without overflow,remove old codes
* fix the bug of conv3x3s2 packed int8
* new int8 implement,weight quant by per-channel,better accuracy~
* fix the bug of conv3x3s1 packed int8 neon
* add the naive c fp32 and int8 winograd F(2,3)
* add the neon intrinsic int8 winograd F(2,3)
* optimize the armv7a int8 winograd F(2,3) with neon assembly
* optimize the armv7a int8 winograd F(2,3) input transform with assembly.
* add the requantize layer and int8 relu implement.
* add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64.
* fix int8 bugs
* add the c naive im2col with sgemm
* add aarch64 int8 winograd f23, conv3x3s2 naive implement
* add the int8 sgemm conv7x7s2 on x86/armv7a platform
* optimize the int8 sgemm by neon intrinsic and packed kernel
* optimize the int8 sgemm with packed data
* optimize the int8 sgemm with armv7a neon assembly
* add the int8 sgemm on arm64-v8a platform
* prepare to merge latest codes from master
* add the int8 param files
* In the Class Net,add the fuse_network method
7 years ago
nihuini
4ac56d3c1c
unified memory index is not mandatory, sanity check
7 years ago
nihuini
5f0ee22a33
treat as unified memory architecture if memory heap is same
7 years ago
nihui
d85775fbcd
fix softmax axis order on 3-dim, fix caffe reshape conversion, regenerate ssd param
7 years ago
nihui
979ed57487
packing param for identity packing when padding disabled, auto packing conversion between cpu and gpu blob
7 years ago
nihui
b49cb56ad9
constify vulkan device handle, use default local vulkan device if not specified
7 years ago
nihui
5e07749a4a
do not emit upload transfer on unified memory
7 years ago
nihui
9ebac3fe9e
dedicated reference counter for staging data
7 years ago
nihui
68afd1fa17
reset fence
7 years ago
nihui
81ee56b209
copy buffer has offset alignment limit, re-implement concat as compute pipeline
7 years ago
nihuini
83efa73cf6
fallback to cpu forward if layer not support vulkan, automatically!
7 years ago
nihuini
bdd305638d
command reset
7 years ago
nihuini
10a088397e
concat interleave image row
7 years ago
nihuini
1ace8068e3
zero detected is not error
7 years ago
nihuini
14efdd8e00
reorg shader
7 years ago
nihui
b62e9c4b1e
shufflechannel shader
7 years ago
nihuini
bb04055e80
permute shader
7 years ago
nihui
24f423b0c6
fix build on msvc
7 years ago
nihui
cc4376d8e6
do not upload unnecessary pack1 weight, reduce gpu memory usage
7 years ago