nihuini
c4f23ae8ad
rename Mat packing to elempack
7 years ago
nihui
7655b9e4e9
fix build on armv7 again ...
7 years ago
nihui
a97439988f
fix build on armv7
7 years ago
nihuini
81a5dfe76b
general convolution and convolutiondepthwise arm neon pack4, wip
7 years ago
tpoisonooo
1ca4387c9c
Auto choose conv implementation ( #1085 )
* add relative README_CN.md;
* obtain time cost with op->forward().
7 years ago
BUG1989
bcfe9f453f
initial the ncnn post training quantization tools ( #1067 )
* initial the ncnn post training quantization tools
* clear some comments of tools
* fix the Travis ci compiler error
7 years ago
BUG1989
d9f269fa3d
use sgemm fp32 on arm platform,optimize conv1x1s2 ( #1031 )
7 years ago
nihuini
4de4078779
move platform includes out of namespace
7 years ago
nihui
3e003ffd98
fuse sigmoid
7 years ago
nihuini
7a8f68aca6
move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works
7 years ago
nihuini
3f85cafc08
fuse relu leakyrelu clip into convolution/deconvolution/innerproduct
7 years ago
BUG1989
93a34a897d
add int8 winograd F(4,3) with neon assembly optimization ( #891 )
* add the implement of int8 winograd F(4,3)
* add int8 winograd F(4,3) naive c to arm64-v8a platform
* optimize int8 winograd F(4,3) with neon
* merge dequant op into int8 winograd F(4,3)
* enable int8 wino F(4,3) case with all size
7 years ago
BUG1989
780c7d9a72
merge de/requantize op, optimize some int8 conv layer on arm64-v8a ( #867 )
* optimize the conv sgemm int8 on arm64-v8a platform
* optimize int8 arm64-v8a with sadalp ins
* merge requantize op into latest conv layer
* merge requantize op into conv-int8 op
* update the mobilenet.param in the benchmark
* Update README.md
update Kirin970 and RK3399
* try to fix the travis build error
7 years ago
BUG1989
2f4c4a8202
fix the compile error when using armv7a without neon ( #835 )
7 years ago
BUG1989
ff38053321
[WIP] arm64-v8a int8 optimization ( #823 )
* requantize layer arm64-v8a neon implement
* convdw3x3s1 arm64-v8a neon implement
* convdw3x3s2 arm64-v8a neon implement
* conv1x1s1 arm64-v8a is optimized by neon assembly
* conv sgemm int8 optimized with neon assembly,kernel transform is offline
* conv winograd int8 optimized with neon assembly, fix ci build failure
* conv3x3s2 int8 arm64-v8a optimized with neon assembly,remove old codes.
7 years ago
BUG1989
8e337d440e
fix the bug with convdw7x7 op working on int8 mode ( #818 )
7 years ago
BUG1989
df3d224484
new int8 implement,better accuracy ( #749 )
* add the armv7a conv3x3s1 implement without overflow,remove old codes
* fix the bug of conv3x3s2 packed int8
* new int8 implement, weight quant per channel, better accuracy~
* fix the bug of conv3x3s1 packed int8 neon
* add the naive c fp32 and int8 winograd F(2,3)
* add the neon intrinsic int8 winograd F(2,3)
* optimize the armv7a int8 winograd F(2,3) with neon assembly
* optimize the armv7a int8 winograd F(2,3) input transform with assembly.
* add the requantize layer and int8 relu implement.
* add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64.
* fix int8 bugs
* add the c naive im2col with sgemm
* add aarch64 int8 winograd f23, conv3x3s2 naive implement
* add the int8 sgemm conv7x7s2 on x86/armv7a platform
* optimize the int8 sgemm by neon intrinsic and packed kernel
* optimize the int8 sgemm with packed data
* optimize the int8 sgemm with armv7a neon assembly
* add the int8 sgemm on arm64-v8a platform
* prepare to merge latest codes from master
* add the int8 param files
* In the Class Net,add the fuse_network method
7 years ago
BUG1989
229f8fd8db
add the armv7a conv3x3s2, convdw3x3s1/s2 int8 implement without overflow
7 years ago
BUG1989
4289c64090
add the armv7a conv1x1s1 sgemm int8 implement without overflow : )
7 years ago
nihuini
19ad4cf284
fix build without neon
7 years ago
nihuini
8526a69777
packed int8 convolution 3x3 stride 1 for armv7, 7%~25% faster than vanilla one, but God knows how hard I try :|
7 years ago
nihuini
6f1b0b0a61
quantized padding in convolution, use range sweets
7 years ago
nihui
72411b7a6c
restore the old conv3x3s2 as reference, fast dilation convolution fails on striding
7 years ago
nihui
1f20eb4e8c
pack weight and more unroll makes improvement, ~20% faster for conv3x3s2
7 years ago
nihui
fe14037777
more sub op preload
7 years ago
nihui
2fe7ada4d8
add arm int8 convolution stub, preload group op for x86
7 years ago
nihui
5d04a3a45c
layer holds bottom blob scale, depthwise convolution read group scales
7 years ago
nihuini
da352916fe
fix pd using flag condition
8 years ago
nihuini
e34aa7786a
armv7 int8 quantize/dequantize and conv1x1s1
8 years ago
nihui
a169cec363
core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table
8 years ago
nihui
9706cd1447
implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469
8 years ago
nihui
5879cb4d15
sgemm outperform direct conv on large channel
8 years ago
nihui
56a667472a
sgemm is always faster on common channel size
8 years ago
nihuini
d172a34329
direct assembly port, enable convolution 1x1 sgemm on armv7
8 years ago
nihuini
0fdb8da60e
sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq
8 years ago
nihui
72bb261e7a
switch to winograd5
8 years ago
Hyungsuk Yoon
8f56e00b4b
make convolution with dilation fast
8 years ago
nihui
7d1e49584d
call Innerproduct for convolution on flattened blob
8 years ago
nihui
03c1f63c2e
switch to winograd4
8 years ago
nihui
a181d25098
new model load api, fix #215
8 years ago
nihui
bdb70a2010
padding w h in convolution and deconvolution
8 years ago
nihui
44b4519307
non-square convolution and deconvolution kernel stride dilation
8 years ago
nihuini
964040fe3c
more runtime decisions for winograd path
8 years ago
nihui
c77ca16468
enable conv3x3s1 winograd optimization, two paths for small image on armv7 and all for aarch64
8 years ago
nihui
790829bc62
partition dot tiles and reuse kernel register, over 20% improvement for tiny image
8 years ago
nihui
eea3ca577a
disable winograd atm ...
8 years ago
nihui
0385d8e8ad
implement winograd64 optimization for convolution 3x3s1
8 years ago
nihuini
47218db6e5
fix minus padding SAME, fix #116
8 years ago
nihuini
23630b14b9
implement tensorflow style padding SAME type for convolution and pooling, second try
9 years ago
nihuini
b7db8be4f6
add ncnn source qwq
9 years ago