nihui/ncnn - ncnn - Open-source collaborative cloud-brain ecosystem support system

Commit Graph

Author	SHA1	Message	Date
nihui	db035d602d	update ncnnoptimize layers, lightmode=false keeps original weight (#5414)	2 years ago
Sophon	294e786d36	convolution_x86: Fix typo in logging (#5310) Signed-off-by: Xilin Wu <wuxilin123@gmail.com>	2 years ago
nihui	556b79ce4d	create layer decoupled (#5258) * create layer decoupled * no more virtual public * allow build test with shared library * decouple cpu vulkan * drop old scripts	2 years ago
nihui	4494aadd74	deconvolution dynamic weight (#5119)	2 years ago
nihui	7b02425246	x86 optimization for convolution int8 winograd unified elempack (#5054)	2 years ago
nihui	9ecf6a61be	x86 optimization for convolution int8 gemm unified elempack (#4881)	2 years ago
nihui	55709708e9	x86 optimization for convolution int8 packed unified elempack (#4861)	2 years ago
nihui	2e3e680d77	x86 optimization for packed convolution unified elempack (#4469)	3 years ago
nihui	bd5bbe3f2c	x86 optimization for winograd unified elempack part2 (#4470) * improve gemm packb threading * optimize tile size * profile winograd condition * handle threads changes	3 years ago
nihui	88274827da	x86 optimization for winograd unified elempack (#4456)	3 years ago
nihui	1f1981052c	convolution deconvolution and deformableconv2d x86 use sgemm (#4414) * drop old sgemm code * fix convdw test * fix avx512 gemm * optimize prefer sgemm condition	3 years ago
nihui	8eab5ea0ea	x86 sse2/avx2 optimization for convolution sgemm/winograd int8 family (#4286)	3 years ago
nihui	20a14bf5ae	arm convolution winograd dot function, adjust arm convolution winograd strategy (#3915)	3 years ago
nihui	ca0ba4b25f	fine grained winograd options, adjust x86 convolution winograd strategy (#3908) * fine grained winograd options * x86 optimization for convolution winograd f23 pack4/pack8/pack16 * fix avx512 and t4 ci * fix fast direct conv path * winograd63 is actually slower than winograd43 on very large channel	3 years ago
nihui	241524ffce	discard weight memory for x86 arm vulkan (#3865) * discard weight memory for x86 and vulkan * drop arm innerproduct weight * drop arm convolution weight * drop arm convolutiondepthwise weight * drop x86 vulkan deconvolution deconvolutiondepthwise weight * drop arm deconvolution deconvolutiondepthwise weight * arm neon assembly optimization for innerproduct pack4	4 years ago
nihui	02a7e64e18	optimize x86 winograd input transform transpose (#3818) * optimize x86 winograd input transform transpose * x86 sse2/avx optimization for convolution winograd23/43 pack1	4 years ago
nihui	bf64d8f1ec	fix winograd function name (#3820)	4 years ago
nihui	131f3d1323	x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) * convolution winograd pack16to1 * x86 deconvolution and deconvolutiondepthwise * simpleomp allow 32 arguments * drop shadow variable workaround * less winograd test error	4 years ago
nihui	3d169b3237	x86 avx512 optimization (#3691) * convolution sgemm pack16to1 * convolution sgemm pack4to16 * eltwise avx512	4 years ago
nihui	9298d05e86	split convolution winograd transform input output (#3688)	4 years ago
nihui	dadc640c66	x86 avx512 optimization (#3581) * unified relu avx512 * unifed clip avx512 * unaryop avx512 * sigmoid avx512 * binaryop avx512 * padding convolution avx512 * convolutiondepthwise avx512 * innerproduct avx512 * reshape avx512 * slice avx512 * hardsigmoid hardswish avx512 * swish avx512 * pooling avx512 * crop avx512 * convolution sgemm pack16 * convolution 3x3 winograd pack16 * interp avx512 * convolution sgemm pack1to16 * convolution sgemm pack16to8 * convolution sgemm pack8to16 * convolution sgemm pack16to4 * fix vulkan permute pack8 * fix vulkan convolution gemm pack8to1	4 years ago
nihui	920aa79f04	drop x86 avx2 fp16 (#3568)	4 years ago
nihuini	57a7101fc6	fix ci, second try	4 years ago
nihuini	cfedcfdc57	fix ci, first try	4 years ago
nihui	3f2799d706	always build tightly packed weight, fix #3545 (#3547)	4 years ago
nihui	139554b36e	rewrite convolution x86 sgemm pack1 (#3544)	4 years ago
nihui	fb6283c8b0	x86 avx fma optimization (#3543)	4 years ago
nihui	de77b669c4	x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) * x86 sse2 optimization for conv1x1 conv3x3 pack4 and general sgemm pack4/pack4to1 * x86 sse2 optimization for conv3x3s1 pack4to1 and general sgemm convolution pack4to1, use aligned load/store * enforce explicit alignment	4 years ago
nihui	d95213a005	x86 convolution int8 optimization third stage (#3506) * avx-vnni and avx512-vnni optimization for convolution int8 gemm and 3x3 winograd pack8to4/pack8to1	4 years ago
nihui	c2896bcd4d	x86 convolution int8 optimization second stage (#3495) * some sse 4.1 optimization * sse2/avx2 optimization for convolution 3x3 winograd42 int8 pack8to4/pack8to1	4 years ago
nihui	e9b8f0a6ef	x86 avx2 optimization for convolution gemm int8 (#3489)	4 years ago
nihui	6941ec8fc9	arm neon optimization for general packed convolution (#3426)	4 years ago
nihui	999e640d43	dynamic convolution weight (#3408)	4 years ago
nihui	24fbb6e8cb	honor thread setting on load and vulkan command, ci avx512 t4 (#3391)	4 years ago
Tijmen Verhulsdonck	ac5dc23ccc	added a number of optimized sse layers (#3302) * added a number of optimized sse layers, specifically to increase performance of mobilenet style networks	4 years ago
zhiliu6	a08f700775	Optimize avx convolution activation (#3299) * use general fmadd * forceline x86 fmadd for better performance * fix msvc compile warning * simplify swish implementation * Use activation layer for better performance * Optimize x86 ConvolutionDepthWise activation	4 years ago
zhiliu6	814f89ef1a	Fuse HardSwish activation into Convolution and InnerProduct (#3233) * add general fused activation * add NCNN_FORCE_INLINE option	4 years ago
Tijmen Verhulsdonck	4270b5c502	Fix broken codepaths with AVX only (#3254) * Fix codepaths for fp16 weights when only AVX is enabled * Disable opt overrides * Update SDK url * Update vulkan SDK download version * Debugging risv pad * apply code-format changes * fix padding test * fix mips slice test * fix lrn test * implement mish swish image shader, fix pooling adaptive image storage support, drop debug output * update ci ubuntu 18.04 Co-authored-by: nihui <shuizhuyuanluo@126.com>	4 years ago
Tijmen Verhulsdonck	eaa7e24db6	Added ability to switch AVX/AVX2 during runtime (#3076)	4 years ago
Evgeny Proydakov	9245cdca42	Fixed compile warnings for clang compiler on MacOS. [-Wunused-parameter] (#2998)	5 years ago
nihuini	687cc857b1	some x86 sse2 optimization for convolution int8	5 years ago
nihui	7e1aaa5828	cmake option NCNN_INT8 (#2839)	5 years ago
nihui	1ea8bfbd2e	x86 avx2 conv3x3s1 pack8 direct optimization, fix #2789	5 years ago
nihui	5fe75f19ef	architecture changes for int8 packing (#2771) * quantize and dequantize tests * unify activation and usability function * drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build * benchmark use requantize int8 model	5 years ago
zhiliu6	57397c418d	Optimize general AVX2 convolution. (#2714)	5 years ago
nihui	82c4acc187	conv1x1s1 and packing pack4 x86 optimization, fix #2510 fix #2509	5 years ago
Zhuo Zhang	f13035794a	fix convolution_x86*.cpp-shadowed-variables-warning (#2444)	5 years ago
nihuini	1a3191e245	fix libncnn build with gcc-4.8 and gcc-4.4, fix #2388	5 years ago
zhiliu6	25b224479c	optimize left over x86 convolution (#2378)	5 years ago
nihui	a071637064	optional sse2 (#2373)	5 years ago


112 Commits (db035d602de6ec0cd3bdd191cb21f4b73e7599be)