nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
daquexian	d38871bbfc	load bin in a single pass (#4966 ) Signed-off-by: daquexian <daquexian566@gmail.com>	2 years ago
Upliner Mikhalych	e8645e9117	Don't silently ignore errors in VkCompute::submit_and_wait (#4828 )	2 years ago
nihui	903ec7c2c9	fix overwrite builtin layer destruction (#4732 ) * fix overwrite builtin layer destruction * make modelbin class copyable * test++	3 years ago
nihui	1b4a8fd4b2	fix warnings and code clean (#4729 )	3 years ago
jason_w	48f9bcfce2	place the `if` statement outside the `for` loop (#4707 )	3 years ago
nihui	db628b1b99	allow overwriting built-in layer with custom layer (#4616 )	3 years ago
nihui	15761fc1a6	arm vfpv4 asimdhp asimdfhm optimization for gemm (#4432 )	3 years ago
nihui	0b591b0d1f	implement layer feature disabled bit (#4278 )	3 years ago
LinHe	9426e21166	Memory Pool Improvement For Variadic Sized Inputs (#4190 ) * Simple miss count for better space efficiency * Simple double ended greedy; * Add size drop threshold setter; * set workspace allocator cr to zero as we had some sort of recycling capability :P Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com> Co-authored-by: nihuini <nihuini@tencent.com>	3 years ago
nihui	dadc640c66	x86 avx512 optimization (#3581 ) * unified relu avx512 * unified clip avx512 * unaryop avx512 * sigmoid avx512 * binaryop avx512 * padding convolution avx512 * convolutiondepthwise avx512 * innerproduct avx512 * reshape avx512 * slice avx512 * hardsigmoid hardswish avx512 * swish avx512 * pooling avx512 * crop avx512 * convolution sgemm pack16 * convolution 3x3 winograd pack16 * interp avx512 * convolution sgemm pack1to16 * convolution sgemm pack16to8 * convolution sgemm pack8to16 * convolution sgemm pack16to4 * fix vulkan permute pack8 * fix vulkan convolution gemm pack8to1	4 years ago
nihui	559e5b23f9	vulkan tensorcore optimization (#3628 ) * query and enable cooperative matrix * fix build with old vulkan sdk * implement cooperative matrix optimization * add nvidia-t4 coverage * adjust test option for more coverage	4 years ago
nihui	cfcb1cffa9	massive vulkan optimization part2 (#3621 ) * vulkan local memory optimization for conv1x1 pack4 and winograd on dgpu * unified innerproduct pipeline creation * reorder deconvolution weight layout * flexible local memory data type * more local memory optimization for conv/deconv gemm	4 years ago
nihui	3a83704c38	binary4d, unary4d (#3443 )	4 years ago
nihui	aa9753b2f0	detach mat from local blob allocator so net instance could be destroyed much earlier (#3287 )	4 years ago
nihuini	affbefe311	some space cleanup, blob clone from allocator	4 years ago
nihui	cdf45a6512	cmake option NCNN_BF16 (#3068 )	4 years ago
Tijmen Verhulsdonck	eaa7e24db6	Added ability to switch AVX/AVX2 during runtime (#3076 )	4 years ago
nihui	3a77b09c31	fix test failure	5 years ago
nihuini	9b5cb959b9	auto convert int8 to fp32 on extract	5 years ago
nihui	ad37c34d25	disable NCNN_ARM82DOT whenever NCNN_ARM82 disabled	5 years ago
Cai Shanli	8cc8cd716a	Add get input and output names (#2890 )	5 years ago
nihui	17936e9f54	fix packing risc-v test, add cpu_riscv_vlenb()	5 years ago
nihui	11958424c2	runtime riscv v and zfh dispatch, riscv v optimization for cast	5 years ago
nihui	1c26291757	more verbose hint for find_blob_index_by_name failure	5 years ago
nihuini	34bd5ef161	update eq quant info	5 years ago
nihuini	72ef77a469	fix build with NCNN_STRING off and NCNN_VULKAN on	5 years ago
zhiliu6	fb9d529487	fix compile error when NCNN_STRING is disabled (#2874 )	5 years ago
nihuini	31d436c627	more verbose load failure, ncnn2int8 write int8 data properly	5 years ago
nihuini	1bc0126302	fix crash when input cpu blob and extract the same from gpu, update vgg16 int8 model	5 years ago
nihui	e9cc637573	arm neon optimization for int8 packing kernels (#2809 )	5 years ago
nihui	32b48f0157	fix int8 auto pack layout	5 years ago
nihui	5fe75f19ef	architecture changes for int8 packing (#2771 ) * quantize and dequantize tests * unify activation and usability function * drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build * benchmark use requantize int8 model	5 years ago
nihui	d4a7abc218	fix onnx2ncnn clip without max blob, fix #2788	5 years ago
nihui	67e24e0703	use local pool allocator (#2736 ) * use local pool allocator * detach extract feat from local allocator * fix test	5 years ago
Cai Shanli	f5b307689b	fix net and extractor destroy order when use vulkan (#2732 )	5 years ago
nihuini	b51959802c	fix buffer2host copy, fix #2725	5 years ago
Xu Yang	fd634e9a58	remove unnecessary mat clone when NCNN_BENCHMARK enabled (#2708 )	5 years ago
Dahan Gong	cbd410c237	fix broken inplace forward (#2709 )	5 years ago
Youngsoo Lee	b9bed8d993	feat: add denormal options (#2656 ) * feat: add denormal options Flush-To-Zero(FTZ) and Denormals-Are-Zero(DAZ) are modes that bypass IEEE754 methods of dealing with denormal floating-point numbers on x86_64 and some x86 CPUs. * feat: Integrate `flush_denormals` into `Extractor::extract` * chore: replace global variable with `ThreadLocalStorage`	5 years ago
nihui	9fd4d371ae	bridge image for adreno image upload and download (#2658 ) * add bridge image for adreno image storage upload and download * enable sbn1, print bugbilz flag * blacklist old adreno * let user choose use_image_storage option even when bug_storage_buffer_no_l1	5 years ago
nihuini	2a57ca4942	reduce memory usage in lightmode, handle upload image allocation failure properly	5 years ago
nihuini	bd68ee487b	fallback to cpu when image allocation failed, fix #2648	5 years ago
nihui	af7d8184aa	handle image allocation failure properly	5 years ago
nihui	09b2bf6213	Break down forward_layer (#2577 )	5 years ago
nihui	54c0a13b9f	build shared library (#2525 ) * build shared lib and enable lto * reserved for layer and option * allocator pimpl * datareader pimpl * paramdict pimpl, disable copy assign for allocator and datareader * modelbin pimpl * net extractor pimpl * gpu pimpl * disable copy assign vulkandevice, code format * command pimpl, dummy image readonly * pipeline pipelinecache pimpl, export platform class * code format, export simple family * update ci * disable lto on android armv7, merge webassembly ci * link libgcc, fix macos dylib version * pipeline pimpl, gpu info pimpl * destroy gpu info after vulkan device * ignore msvc stl class warning * fix ncnn_paramdict_get_float return type * fix vktransfer upload fp16 without flatten, add command test	5 years ago
nihui	1040f40c8b	update c api for custom allocator datareader modelbin and layer registration, add cookie userdata to layer	5 years ago
nihui	79efe33fdc	cmake option for platform api uses (#2502 ) * cmake option for platform api uses * android gpu ci does not rely on glslangvalidator, add android termux ci	5 years ago
nihui	343bc3b7dc	single blob consumer (#2493 )	5 years ago
Zhuo Zhang	3c99287da5	fix src/net.cpp missing-field-initializers warning (#2494 )	5 years ago
maxfy1992	0f325d7910	add decrease unpack pack overhead (#2489 ) Co-authored-by: yangfengmax <yangfengmax@didichuxing.com>	5 years ago

163 Commits (d38871bbfc38048d904efe50099bc2b1b7901bc1)