nihui
db035d602d
update ncnnoptimize layers, lightmode=false keeps original weight ( #5414 )
2 years ago
nihui
556b79ce4d
create layer decoupled ( #5258 )
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts
2 years ago
nihui
4494aadd74
deconvolution dynamic weight ( #5119 )
2 years ago
nihui
a12cd7c212
mips msa and loongson mmi optimization for convolution int8 winograd f43 ( #4014 )
3 years ago
nihui
e7ca89853e
mips convolution winograd dot function and strategy ( #3925 )
3 years ago
nihui
ca0ba4b25f
fine grained winograd options, adjust x86 convolution winograd strategy ( #3908 )
* fine grained winograd options
* x86 optimization for convolution winograd f23 pack4/pack8/pack16
* fix avx512 and t4 ci
* fix fast direct conv path
* winograd63 is actually slower than winograd43 on very large channel counts
3 years ago
nihui
06a36e9c1f
discard weight memory for mips ( #3869 )
4 years ago
nihui
c3adbcf9f3
mips optimization for convolution sgemm ( #3853 )
* mips optimization for convolution sgemm
* mips optimization for general convolution int8 gemm
* mips optimization for convolution winograd pack1
* preload magic
4 years ago
nihui
bf64d8f1ec
fix winograd function name ( #3820 )
4 years ago
nihui
9298d05e86
split convolution winograd transform input output ( #3688 )
4 years ago
nihui
c09d7b3591
mips msa optimization for convolution int8 ( #3675 )
* basic mips msa optimization for convolution int8
* mips msa optimization for convolution int8 gemm
* mips msa optimization for convolution int8 winograd pack8to4/pack8to1
* mention msa maddv/msubv intrinsics bug
4 years ago
nihui
3f2799d706
always build tightly packed weight, fix #3545 ( #3547 )
4 years ago
nihui
6941ec8fc9
arm neon optimization for general packed convolution ( #3426 )
4 years ago
nihui
999e640d43
dynamic convolution weight ( #3408 )
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
zhiliu6
814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct ( #3233 )
* add general fused activation
* add NCNN_FORCE_INLINE option
4 years ago
nihui
927e34278c
mips msa optimization part2 ( #3063 )
* mips msa optimization for convolution 3x3 pack1to4, 7x7s2 pack1to4, dropout, eltwise, hardsigmoid, hardswish, packing, flatten, prelu
* prefetch convdw3x3 and convdw5x5
* fix mips convolution sgemm pack4to1
* more prefetch helps
4 years ago
nihui
49cda73420
mips msa optimization for convolution sgemm and winograd ( #3055 )
* mips msa optimization for convolution sgemm and 3x3s1 winograd, use msa fmadd
* mips msa optimization for convolution sgemm pack4to1
* mips msa optimization for swish, improve sgemm kernel
* unroll 12x4, use prefetch
4 years ago
nihui
1f49fc4b67
mips msa optimization for padding, packing, flatten, innerproduct, convolution, convolutiondepthwise general cases
4 years ago