nihuini
cd4be6d0fa
call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks
6 years ago
nihuini
81a028547a
fix bus error on armv7
6 years ago
nihuini
19d75955d6
arm neon assembly optimization for conv3x3s1 winograd pack4to1
6 years ago
bindog
04b4b02324
[WIP] add reduce op support for onnx ( #1308 )
* [WIP] add reduce op support for onnx
* extend reduction to support 1,2-dim reduction and keepdims
* fix compile error
* split type to 3 flags && split keepdims to another function
6 years ago
nihuini
22a2be4e6c
fix crop pack4 with reference blob
6 years ago
nihuini
6a8e5c58da
fix build on armv7
6 years ago
nihuini
e63e2449fd
arm neon assembly optimization for conv7x7s2 pack1to4
6 years ago
nihui
56fd26a2da
arm neon assembly optimization for conv1x1s1 pack4to1
6 years ago
nihui
7ad514917b
fix potential out-of-bounds write on unroll 12 remainder
6 years ago
nihuini
15e86dc8e9
reduce pack4 weight memory usage for specialized kernel, reduce runtime memory usage in conv3x3s1 winograd
6 years ago
nihuini
581a06d471
innerproduct pack4 always consumes a flattened blob whose layout is the same as in the pack1 branch, so reuse the pack1 implementation to reduce memory usage
6 years ago
nihuini
c5f1dc3fe4
arm neon assembly optimization for conv3x3s1 pack4to1
6 years ago
nihui
2f8b31c3b4
unroll outch 2 for conv3x3s1 pack1to4
6 years ago
nihui
e0f6e3f669
pre-interleave 8-channel weight data on aarch64, conv1x1s1 version
6 years ago
nihuini
d11bf14d44
pre-interleave 8-channel weight data on aarch64
6 years ago
nihui
7173b6e38e
arm neon assembly optimization for conv3x3s2 pack4
6 years ago
nihuini
cf0c49dd71
arm neon assembly optimization for conv5x5s1 pack4 and conv5x5s2 pack4
6 years ago
nihui
9e529354fb
arm neon optimization for conv1x1s2 pack4
6 years ago
nihuini
f8f3b0b5aa
shufflechannel pack4
6 years ago
nihuini
50d5896ce7
reshape pack4
6 years ago
nihuini
624291e2b2
use subop optimization for group convolution deconvolution pack4 family
6 years ago
nihui
48e3e7d49c
move neon activation into a wrapper function
6 years ago
nihui
b37ecab630
auto flatten before innerproduct pack4
6 years ago
nihui
afd1f08194
arm neon assembly optimization for pooling2x2s2 max pack4
6 years ago
nihui
e19b7097df
arm neon assembly optimization for conv3x3s1 pack1to4
6 years ago
nihui
3ac6335ba3
hardsigmoid and hardswish pack4
6 years ago
nihui
21e74487b4
arm neon optimization for convdw5x5 pack4
6 years ago
volvet
ecd64fb36b
Fixed lots of compile warnings ( #1286 )
* Fixed lots of compile warnings
* refine the unused warning change
6 years ago
nihui
3e1bad4880
arm neon assembly optimization for pooling3x3s2 max pack4
6 years ago
nihui
08a97c169f
arm neon assembly optimization for relu pack4
6 years ago
nihui
a1bd88fb4a
arm neon assembly optimization for padding constant pack4
6 years ago
nihui
17f343e7e4
convdw3x3 pack4 arm neon assembly optimization
6 years ago
nihui
6703286831
the very long ld1, one less load
6 years ago
nihui
22a3ade6ce
unroll size 12 for conv1x1s1 and conv3x3s1 winograd pack4 on aarch64
6 years ago
nihui
3a452f734a
arm neon assembly optimization for conv3x3s2 pack1to4
6 years ago
nihui
6edd42f566
arm neon assembly for conv1x1s1 and conv3x3s1 winograd pack4
6 years ago
nihuini
c0a4ffcf66
convolution pad_value param
6 years ago
nihuini
587a67eb51
the noop layer
6 years ago
nihuini
b7085ceec0
deconvolution apply output adj first, then crop the padding
6 years ago
tpoisonooo
8dbafe7764
constrain input value to [-127, +127] ( #1258 )
* constrain input value to [-127, +127]
* keep new line at the end
6 years ago
nihui
e56fcc77c5
optimize dot memory layout
6 years ago
nihuini
8a7b4b035e
radv crash with large local group size, workaround
6 years ago
nihuini
80f898b079
unaryop tanh vulkan
6 years ago
nihuini
91ef4eea4f
fix unaryop arm, fix #1241
6 years ago
nihuini
3e3189736c
fix msvc build, fix #1237
6 years ago
Xu Yang
31cf7f3c5b
fix ConvolutionDepthWise int8_requantize ( #1233 )
6 years ago
nihuini
c4bebc6371
x86 conv3x3s1 winograd43 produces wrong results, revert to the good old winograd23 version
6 years ago
CnybTseng
d11c4c1d42
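Modify several files under the ncnn/src/layer/vulkan/shader directory of the latest version to adapt them to the latest glslang; this change has been verified on the DJI Manifold2-G platform ( #1231 )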
6 years ago
nihui
46e7ac76ab
apply sgemm-like dot in winograd pack4 neon
6 years ago
nihui
d6860d93f2
fix batchnorm pack4 neon multithreaded
6 years ago