nihui
4136de3b8d
arm optimization for convolution int8 packed unified elempack ( #5147 )
2 years ago
nihui
4494aadd74
deconvolution dynamic weight ( #5119 )
2 years ago
nihui
80b3b9c6f0
arm optimization for convolution int8 winograd unified elempack ( #5087 )
* enable out elempack 8 for winograd and sgemm
2 years ago
nihui
c8662cce5e
arm optimization for convolution int8 gemm unified elempack ( #5016 )
2 years ago
nihui
5ac17df797
arm optimization for packed convolution unified elempack ( #4590 )
3 years ago
nihui
c777bf09dc
arm convolution sgemm unified elempack ( #4572 )
* fuse im2col and packb tile
3 years ago
nihui
dabc4c065f
arm convolution winograd unified elempack ( #4556 )
* update f43 coeffs
* arm convolution winograd unified elempack
* disable bf16s test atm
* test gnu inline asm off
3 years ago
nihui
9c6f1107d2
fix #4315 ( #4316 )
3 years ago
nihui
9b8272e86d
arm edsp and arm neon optimization for convolution int8 winograd ( #4017 )
3 years ago
nihui
20a14bf5ae
arm convolution winograd dot function, adjust arm convolution winograd strategy ( #3915 )
4 years ago
nihui
ca0ba4b25f
fine grained winograd options, adjust x86 convolution winograd strategy ( #3908 )
* fine grained winograd options
* x86 optimization for convolution winograd f23 pack4/pack8/pack16
* fix avx512 and t4 ci
* fix fast direct conv path
* winograd63 is actually slower than winograd43 on very large channel
4 years ago
Evgeny Proydakov
85e483e6ba
Fixed several compile warnings for ios build: ( #3885 )
4 years ago
nihui
7886e90c65
split arm82 source for smaller binary and memory footprint ( #3877 )
* split arm82 source, wip
* check compiler arm82 only for arm 64bit target
* drop arm82 registry
* strict check compiler support arm82
4 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
nihui
bf64d8f1ec
fix winograd function name ( #3820 )
4 years ago
nihui
3a827434a9
optimize arm sgemm convolution condition ( #3806 )
4 years ago
nihui
9298d05e86
split convolution winograd transform input output ( #3688 )
4 years ago
nihui
3f2799d706
always build tightly packed weight, fix #3545 ( #3547 )
4 years ago
nihui
c0a94cd9ca
fix armv7 without neon ( #3514 )
4 years ago
nihui
6941ec8fc9
arm neon optimization for general packed convolution ( #3426 )
4 years ago
nihui
878cb713d5
optional arm82 dot source ( #3415 )
4 years ago
nihui
999e640d43
dynamic convolution weight ( #3408 )
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
zhiliu6
814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct ( #3233 )
* add general fused activation
* add NCNN_FORCE_INLINE option
4 years ago
nihui
cdf45a6512
cmake option NCNN_BF16 ( #3068 )
4 years ago
Evgeny Proydakov
e01e965c68
Fixed compile warnings for clang compiler on MacOS ARM. [-Wunused-variable] ( #3000 )
5 years ago
nihuini
d6b2ea5aac
arm neon optimization for convolution 3x3 on small channels
5 years ago
nihuini
68468dccbd
arm neon assembly optimization for padding int8 pack8, convolution int8 out elempack 4
5 years ago
nihui
7e1aaa5828
cmake option NCNN_INT8 ( #2839 )
5 years ago
nihuini
01f5dcb700
arm neon optimization for convolution sgemm pack1 pack8to1 int8
5 years ago
nihuini
e9ab1acf27
arm neon optimization for convolution sgemm pack1to8 int8
5 years ago
nihuini
e975de1f36
better condition for arm82 conv3x3s1 winograd
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
nihui
3c92a1184b
arm neon optimization for general convolution im2col sgemm ( #2668 )
* arm neon optimization for conv3x3s1 winograd42
* better condition
* Update test_convolution.cpp
* Update test_convolution.cpp
* more proper conditions
* arm neon optimization for general im2col sgemm pack4
* add sgemm
* wip
* wip
* fix armv7 build
* more conditions blah blah
* code format
* fix convolution
* move packed convolution to separate header source
* unify weight data bf16
* proper conditions
* conv3x3s2 sgemm pack4 test
5 years ago
nihui
ab56083ca5
arm neon optimization for conv3x3s1 winograd42 ( #2664 )
5 years ago
Zhuo Zhang
a1e9993616
fix convolution_arm.cpp shadowed variables warning ( #2448 )
5 years ago
nihui
bf09af21be
exp arm fp16sa neon optimization
5 years ago
nihui
39bbb34ffc
conv1x1s1 conv3x3s1 pack4 pack8to4 arm fp16sa neon assembly optimization
5 years ago
nihuini
440db2c8fc
conv1x1 pack4 arm fp16sa
5 years ago
nihuini
b5be1449d9
conv3x3s1 winograd pack8to4 arm fp16sa
5 years ago
nihuini
d17c26e925
conv1x1s1 pack4to8 pack8to4 arm fp16sa
5 years ago
nihui
db5f05c6f0
conv1x1s1 conv3x3s1 winograd pack8to1 arm fp16sa
5 years ago
nihuini
9c33b6c1c8
conv1x1s1 arm fp16sa
5 years ago
nihuini
30ff3800d8
conv5x5s1 conv5x5s2 pack8 arm fp16sa neon assembly optimization
5 years ago
nihuini
62c453b16d
conv7x7s2 pack1to8 arm fp16sa neon assembly optimization
5 years ago
nihuini
b5b486fbfa
conv3x3s2 pack8 arm fp16sa neon assembly optimization
5 years ago
nihui
d8e9fc1443
conv3x3s1 conv3x3s2 pack1to8, padding pack8, relu pack8 arm neon fp16sa assembly optimization
5 years ago
nihuini
f6d808b090
crop pack8 arm fp16s, conv3x3s2 pack1to8 arm fp16sa intrinsic
5 years ago
nihuini
5d5a3d1434
conv1x1s1 conv1x1s2 conv3x3s1 winograd pack8 arm fp16sa
5 years ago