nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
BUG1989	1b0e33460d	add armv7 int8 conv3x3s1,using vaddw to replace vadd and vmovl	7 years ago
nihui	72411b7a6c	restore the old conv3x3s2 as reference, fast dilation convolution fails on striding	7 years ago
nihui	1f20eb4e8c	pack weight and more unroll makes improvement, ~20% faster for conv3x3s2	7 years ago
chensy	30cc738309	fix asm "invalid operand" error for target iOS armv7 on file dequantize_arm.cpp	7 years ago
Diego Gomes	4d73407df8	fix gettid call for glibc	7 years ago
Diego Gomes	534f38ed87	fix auxv read for elf64	7 years ago
nihuini	2dbaf6f7b7	store int8 scale in binary	7 years ago
nihui	fe14037777	more sub op preload	7 years ago
nihui	2fe7ada4d8	add arm int8 convolution stub, preload group op for x86	7 years ago
nihui	eac7c66a97	fix fp32 group convolution on x86	7 years ago
nihui	5d04a3a45c	layer holds bottom blob scale, depthwise convolution read group scales	7 years ago
nihui	354b95256c	bump param version, backward compatible	7 years ago
nihuini	2bc504925e	fix int8_scales from multiple blobs, fix #512	7 years ago
nihuini	da352916fe	fix pd using flag condition	8 years ago
nihuini	6b536701c3	sub-mat shall be allocator-aware	8 years ago
nihuini	e34aa7786a	armv7 int8 quantize/dequantize and conv1x1s1	8 years ago
nihuini	4be27a0a89	int8 inference on x86	8 years ago
nihui	a169cec363	core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table	8 years ago
nihui	b6b90c888f	high resolution timestamp on windows	8 years ago
nihui	af49e2cada	install allocator.h	8 years ago
nihui	7e1f358084	fix build on msvc	8 years ago
nihui	9706cd1447	implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469	8 years ago
nihui	5879cb4d15	sgemm outperform direct conv on large channel	8 years ago
nihuini	4b8101e7fc	Revert "optimize interleave section for load first, about 5%~10% speed gain" This reverts commit `1e4eaeeacd`.	8 years ago
nihui	56a667472a	sgemm is always faster on common channel size	8 years ago
nihui	1e4eaeeacd	optimize interleave section for load first, about 5%~10% speed gain	8 years ago
nihui	6895cbf810	single vldm is faster than two vld1 on armv7, and some pipeline optimize	8 years ago
nihuini	05d7562a5d	reorder kernel weight, pipeline friendly ;)	8 years ago
nihui	b8f4f024a4	implement reorg yolodetectionoutput layer from caffe-yolov2	8 years ago
nihuini	ee98817446	proper first row/col handling in resize family, fix #429	8 years ago
nihuini	511baa6718	optional image pixel api, fix #434	8 years ago
nihuini	2368d29a1e	more explicit alignment on armv7	8 years ago
nihuini	d172a34329	direct assembly port, enable convolution 1x1 sgemm on armv7	8 years ago
nihuini	b3e24cafc3	openmp++	8 years ago
nihuini	0fdb8da60e	sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq	8 years ago
nihuini	2b20bf940c	drop armv7 vaddvq_f32 hack	8 years ago
nihui	72bb261e7a	switch to winograd5	8 years ago
nihuini	a234e9240d	fix concat on height	8 years ago
nihuini	003873c55b	crop on channel and crop by param	8 years ago
Chang, Hui-Tang	dc2a689d10	fix proposal roi_score_blob bug (#430)	8 years ago
nihuini	99a343ce70	allocate after permute, reduce peak memory usage	8 years ago
nihuini	0ce0c11851	load sub-op in advance for group convolution	8 years ago
nihuini	86f4264c7c	arm neon assembly for winograd5	8 years ago
kyuusaku	d2416187dc	fix parameter check for interp (#425)	8 years ago
nihuini	90643630c2	apple a10/a11 is armv8.2-a	8 years ago
nihuini	50e1f0e531	const for to_pixels family	8 years ago
nihuini	ce74836e2a	yet another winograd convolution implementation, unroll outch 8 tiles 4 inch 4, about 22% faster, more optimization may comes soon :>	8 years ago
nihui	30b6cc4ecd	rdiv binaryop	8 years ago
nihui	2f90a794ad	rsub binaryop	8 years ago
nihuini	a341e7465c	reject to load model with empty network, fix #392	8 years ago

257 Commits (837e6b047e8cf18650cb9ffe27dffa170a2401e2)