nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihuini	d6b2ea5aac	arm neon optimization for convolution 3x3 on small channels	5 years ago
nihuini	6a397716ca	arm neon optimization for instancenorm	5 years ago
nihuini	687cc857b1	some x86 sse2 optimization for convolution int8	5 years ago
zhiliu6	ec0f904c16	improve x86 1x1 pack8 convolution performance (#2852 )	5 years ago
nihuini	68468dccbd	arm neon assembly optimization for padding int8 pack8, convolution int8 out elempack 4	5 years ago
nihuini	31d436c627	more verbose load failure, ncnn2int8 write int8 data properly	5 years ago
nihuini	05d457c78f	innerproduct int8 support all fused activation types	5 years ago
nihuini	1bc0126302	fix crash when input cpu blob and extract the same from gpu, update vgg16 int8 model	5 years ago
nihui	7e1aaa5828	cmake option NCNN_INT8 (#2839 )	5 years ago
nihui	66455c1b95	implement 2823 binary broadcasting type (#2827 )	5 years ago
nihuini	85efe132ff	unroll inch 4 for convolution sgemm int8	5 years ago
nihui	c6cd5e8628	fix armv7 no-neon build	5 years ago
nihuini	e38a5fcbe6	fix build	5 years ago
nihuini	01f5dcb700	arm neon optimization for convolution sgemm pack1 pack8to1 int8	5 years ago
zhiliu6	c4700c52ca	optimize x86 1x1 pack8 convolution (#2820 )	5 years ago
nihui	0d1d5b66c5	fix arm64 asm build	5 years ago
nihuini	e9ab1acf27	arm neon optimization for convolution sgemm pack1to8 int8	5 years ago
nihuini	e975de1f36	better condition for arm82 conv3x3s1 winograd	5 years ago
nihuini	41a4bea954	unroll size 8 for conv3x3s1 pack8to1 int8 arm64	5 years ago
nihuini	3631c1933d	non-inlined addref and release slows down overall speed, move them to header	5 years ago
nihui	e9cc637573	arm neon optimization for int8 packing kernels (#2809 )	5 years ago
nihui	d7cbc055f3	fix illegal instruction on pi4 when NCNN_ARM82 enabled: the compiler may compile inline member functions as noinline blocks for different architectures, and the linker may pick the newer arch, which results in illegal instructions on old hardware	5 years ago
nihuini	256754bff9	fix build with old gcc, fix #2805	5 years ago
zhiliu6	61cd9da55b	optimize x86 3x3 pack8 leftover (#2797 )	5 years ago
nihuini	912e81d086	fix tanh neon, fix #2751	5 years ago
nihui	32b48f0157	fix int8 auto pack layout	5 years ago
nihui	1ea8bfbd2e	x86 avx2 conv3x3s1 pack8 direct optimization, fix #2789	5 years ago
nihui	5fe75f19ef	architecture changes for int8 packing (#2771 ) * quantize and dequantize tests * unify activation and usability function * drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build * benchmark use requantize int8 model	5 years ago
nihui	d4a7abc218	fix onnx2ncnn clip without max blob, fix #2788	5 years ago
nihui	eaee64c782	export vkcommand and vkcompute	5 years ago
nihuini	e86799e95f	fix get_big_cpu_count return zero on smp cpu	5 years ago
nihui	7c079b853e	default to big cpu count	5 years ago
restyled-io[bot]	5f00ba89d2	feat(ncnnoptimize): replace denormals to zero on layers with weights (#2690 ) * feat(ncnnoptimize): replace denormals to zero on layers with weights Co-authored-by: youngsoo.lee <youngsoo15.lee@gmail.com> Co-authored-by: Restyled.io <commits@restyled.io>	5 years ago
nihui	67e24e0703	use local pool allocator (#2736 ) * use local pool allocator * detach extract feat from local allocator * fix test	5 years ago
nihuini	15d63ec0f5	fuse onnx multiheadattention with same qkv blob	5 years ago
Cai Shanli	f5b307689b	fix net and extractor destroy order when use vulkan (#2732 )	5 years ago
RBelogorodtsevFBase	1212ed6e94	implements gelu activation (#2749 )	5 years ago
nihui	b58cd14678	fix non arm-neon build	5 years ago
nihui	0870bf45b1	optimize warpaffine family	5 years ago
nihuini	c17eb4e208	multiheadattention layer	5 years ago
nihuini	b51959802c	fix buffer2host copy, fix #2725	5 years ago
nihuini	7ac23ab34d	fuse onnx layernorm, fix 2-dim layernorm implementation, add test	5 years ago
zhiliu6	57397c418d	Optimize general AVX2 convolution. (#2714 )	5 years ago
Xu Yang	fd634e9a58	remove unnecessary mat clone when NCNN_BENCHMARK enabled (#2708 )	5 years ago
Dahan Gong	cbd410c237	fix broken inplace forward (#2709 )	5 years ago
restyled-io[bot]	8c9bea2322	Restyle faster bbox calculation by background score (#2693 ) * faster bbox calculation by background score * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Qoo <r97922153@gmail.com> Co-authored-by: Restyled.io <commits@restyled.io>	5 years ago
zylo117	41fba71fa0	fix adaptive avg pooling accumulation overflow in vulkan using fp16 arithmetic (#2698 )	5 years ago
nihui	3c92a1184b	arm neon optimization for general convolution im2col sgemm (#2668 ) * arm neon optimization for conv3x3s1 winograd42 * better condition * Update test_convolution.cpp * Update test_convolution.cpp * more proper conditions * arm neon optimization for general im2col sgemm pack4 * add sgemm * wip * wip * fix armv7 build * more conditions blah blah * code format * fix convolution * move packed convolution to seperated header source * unify weight data bf16 * proper conditions * conv3x3s2 sgemm pack4 test	5 years ago
zylo117	65d71d8f23	support adaptive_pooling in vulkan implementation (#2681 )	5 years ago
Youngsoo Lee	b9bed8d993	feat: add denormal options (#2656 ) * feat: add denormal options Flush-To-Zero(FTZ) and Denormals-Are-Zero(DAZ) are modes that bypass IEEE754 methods of dealing with denormal floating-point numbers on x86_64 and some x86 CPUs. * feat: Integrate `flush_denormals` into `Extractor::extract` * chore: replace global variable with `ThreadLocalStorage`	5 years ago

1257 Commits (d6b2ea5aacee36141bf4597d8d68a2aafe472559)