nihui
db035d602d
update ncnnoptimize layers, lightmode=false keeps original weight ( #5414 )
2 years ago
Sophon
294e786d36
convolution_x86: Fix typo in logging ( #5310 )
Signed-off-by: Xilin Wu <wuxilin123@gmail.com>
2 years ago
nihui
556b79ce4d
create layer decoupled ( #5258 )
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts
2 years ago
nihui
4494aadd74
deconvolution dynamic weight ( #5119 )
2 years ago
nihui
7b02425246
x86 optimization for convolution int8 winograd unified elempack ( #5054 )
2 years ago
nihui
9ecf6a61be
x86 optimization for convolution int8 gemm unified elempack ( #4881 )
2 years ago
nihui
55709708e9
x86 optimization for convolution int8 packed unified elempack ( #4861 )
2 years ago
nihui
2e3e680d77
x86 optimization for packed convolution unified elempack ( #4469 )
3 years ago
nihui
bd5bbe3f2c
x86 optimization for winograd unified elempack part2 ( #4470 )
* improve gemm packb threading
* optimize tile size
* profile winograd condition
* handle threads changes
3 years ago
nihui
88274827da
x86 optimization for winograd unified elempack ( #4456 )
3 years ago
nihui
1f1981052c
convolution deconvolution and deformableconv2d x86 use sgemm ( #4414 )
* drop old sgemm code
* fix convdw test
* fix avx512 gemm
* optimize prefer sgemm condition
3 years ago
nihui
8eab5ea0ea
x86 sse2/avx2 optimization for convolution sgemm/winograd int8 family ( #4286 )
3 years ago
nihui
20a14bf5ae
arm convolution winograd dot function, adjust arm convolution winograd strategy ( #3915 )
3 years ago
nihui
ca0ba4b25f
fine grained winograd options, adjust x86 convolution winograd strategy ( #3908 )
* fine grained winograd options
* x86 optimization for convolution winograd f23 pack4/pack8/pack16
* fix avx512 and t4 ci
* fix fast direct conv path
* winograd63 is actually slower than winograd43 on very large channel
3 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
nihui
02a7e64e18
optimize x86 winograd input transform transpose ( #3818 )
* optimize x86 winograd input transform transpose
* x86 sse2/avx optimization for convolution winograd23/43 pack1
4 years ago
nihui
bf64d8f1ec
fix winograd function name ( #3820 )
4 years ago
nihui
131f3d1323
x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count ( #3694 )
* convolution winograd pack16to1
* x86 deconvolution and deconvolutiondepthwise
* simpleomp allow 32 arguments
* drop shadow variable workaround
* less winograd test error
4 years ago
nihui
3d169b3237
x86 avx512 optimization ( #3691 )
* convolution sgemm pack16to1
* convolution sgemm pack4to16
* eltwise avx512
4 years ago
nihui
9298d05e86
split convolution winograd transform input output ( #3688 )
4 years ago
nihui
dadc640c66
x86 avx512 optimization ( #3581 )
* unified relu avx512
* unified clip avx512
* unaryop avx512
* sigmoid avx512
* binaryop avx512
* padding convolution avx512
* convolutiondepthwise avx512
* innerproduct avx512
* reshape avx512
* slice avx512
* hardsigmoid hardswish avx512
* swish avx512
* pooling avx512
* crop avx512
* convolution sgemm pack16
* convolution 3x3 winograd pack16
* interp avx512
* convolution sgemm pack1to16
* convolution sgemm pack16to8
* convolution sgemm pack8to16
* convolution sgemm pack16to4
* fix vulkan permute pack8
* fix vulkan convolution gemm pack8to1
4 years ago
nihui
920aa79f04
drop x86 avx2 fp16 ( #3568 )
4 years ago
nihuini
57a7101fc6
fix ci, second try
4 years ago
nihuini
cfedcfdc57
fix ci, first try
4 years ago
nihui
3f2799d706
always build tightly packed weight, fix #3545 ( #3547 )
4 years ago
nihui
139554b36e
rewrite convolution x86 sgemm pack1 ( #3544 )
4 years ago
nihui
fb6283c8b0
x86 avx fma optimization ( #3543 )
4 years ago
nihui
de77b669c4
x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 ( #3538 )
* x86 sse2 optimization for conv1x1 conv3x3 pack4 and general sgemm pack4/pack4to1
* x86 sse2 optimization for conv3x3s1 pack4to1 and general sgemm convolution pack4to1, use aligned load/store
* enforce explicit alignment
4 years ago
nihui
d95213a005
x86 convolution int8 optimization third stage ( #3506 )
* avx-vnni and avx512-vnni optimization for convolution int8 gemm and 3x3 winograd pack8to4/pack8to1
4 years ago
nihui
c2896bcd4d
x86 convolution int8 optimization second stage ( #3495 )
* some sse 4.1 optimization
* sse2/avx2 optimization for convolution 3x3 winograd42 int8 pack8to4/pack8to1
4 years ago
nihui
e9b8f0a6ef
x86 avx2 optimization for convolution gemm int8 ( #3489 )
4 years ago
nihui
6941ec8fc9
arm neon optimization for general packed convolution ( #3426 )
4 years ago
nihui
999e640d43
dynamic convolution weight ( #3408 )
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
Tijmen Verhulsdonck
ac5dc23ccc
added a number of optimized sse layers ( #3302 )
* added a number of optimized sse layers, specifically to increase performance of mobilenet style networks
4 years ago
zhiliu6
a08f700775
Optimize avx convolution activation ( #3299 )
* use general fmadd
* forceinline x86 fmadd for better performance
* fix msvc compile warning
* simplify swish implementation
* Use activation layer for better performance
* Optimize x86 ConvolutionDepthWise activation
4 years ago
zhiliu6
814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct ( #3233 )
* add general fused activation
* add NCNN_FORCE_INLINE option
4 years ago
Tijmen Verhulsdonck
4270b5c502
Fix broken codepaths with AVX only ( #3254 )
* Fix codepaths for fp16 weights when only AVX is enabled
* Disable opt overrides
* Update SDK url
* Update vulkan SDK download version
* Debugging riscv pad
* apply code-format changes
* fix padding test
* fix mips slice test
* fix lrn test
* implement mish swish image shader, fix pooling adaptive image storage support, drop debug output
* update ci ubuntu 18.04
Co-authored-by: nihui <shuizhuyuanluo@126.com>
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
Evgeny Proydakov
9245cdca42
Fixed compile warnings for clang compiler on MacOS. [-Wunused-parameter] ( #2998 )
5 years ago
nihuini
687cc857b1
some x86 sse2 optimization for convolution int8
5 years ago
nihui
7e1aaa5828
cmake option NCNN_INT8 ( #2839 )
5 years ago
nihui
1ea8bfbd2e
x86 avx2 conv3x3s1 pack8 direct optimization, fix #2789
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
zhiliu6
57397c418d
Optimize general AVX2 convolution. ( #2714 )
5 years ago
nihui
82c4acc187
conv1x1s1 and packing pack4 x86 optimization, fix #2510 fix #2509
5 years ago
Zhuo Zhang
f13035794a
fix convolution_x86*.cpp-shadowed-variables-warning ( #2444 )
5 years ago
nihuini
1a3191e245
fix libncnn build with gcc-4.8 and gcc-4.4, fix #2388
5 years ago
zhiliu6
25b224479c
optimize left over x86 convolution ( #2378 )
5 years ago
nihui
a071637064
optional sse2 ( #2373 )
5 years ago