nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	b853b3d132	get_physical_cpu_count api family (#4302) * get_physical_cpu_count api family * set default to physical big cpu * always treat smt core as big core * is_smt_cpu * get max freq mhz on windows * windows thread affinity	3 years ago
LinHe	9426e21166	Memory Pool Improvement For Variadic Sized Inputs (#4190 ) * Simple miss count for better space efficiency * Simple double ended greedy; * Add size drop threshold setter; * set workspace allocator cr to zero as we had some sort of recylcing capability :P Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com> Co-authored-by: nihuini <nihuini@tencent.com>	3 years ago
nihui	1d0917c83b	fix build with very old gcc (#4048) * clear bom marker, avoid vector data function	3 years ago
xuehao.ma	962a49069a	add the param file of fastestdet in benchmark (#4026)	3 years ago
tpoisonooo	6fd801b6d7	feat(src/layer): add vision_transformer benchmark (#3730 ) * feat(src/layer): add vision_transformer benchmark and relative layer * refactor(testutil.h): add para for RandomMat	4 years ago
dog-qiuqiu	009d607a15	add the param file of yolo-fastest in benchmark (#3470)	4 years ago
BUG1989	2112a4d7c3	add the param file of nanodet_m in benchmark (#3047 ) * add the param file of nanodet_m in benchmark * add Noop layer into nanodet_m param for benchmark * update nanodet_m.param	4 years ago
nihuini	26dc9820e4	custom mlir ncnn optimize pass, add efficientnetv2_b0 benchmark	5 years ago
Cai Shanli	8cc8cd716a	Add get input and output names (#2890)	5 years ago
nihui	e4a4b51d27	openmp on webassembly (#2234 ) * openmp on webasm works * fix compile flags * dynamic kmp runtime initialization * clang simpleomp ci * fix dispatch on unique cpu	5 years ago
Evgeny Proydakov	8b0c46c45d	A single approach was used to suppress the msvc C4996 compiler warning [_CRT_SECURE_NO_WARNINGS] (#2208)	5 years ago
fawdlstty	1d1cb29869	Fixed compile warning due to default cast (#2201)	5 years ago
nihui	11cffce114	armv8.2 infrastructure (#1856 ) * runtime cpu dispatch * force thread one * disable openmp for coverage * simplify test layer * print NCNN_TARGET_ARCH * less ci build variants * weight fp16 storage option * test convdw int8 * apple a12 a13 * ncnn_add_layer ncnn_add_shader cmake macro	5 years ago
nihui	fe6bc1ed4d	Ci rv64gcv and rv64gc (#1936)	5 years ago
zhiliu6	cdbff653b8	Add yolov4 example option. (#1913 ) * Add yolov4 example option. Add yolov4-tiny for benchmark. * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io>	5 years ago
nihui	164273de61	online pipeline cache (#1792 ) * online pipeline cache wip * device-wide pipeline cache * enable model-wide pipeline cache * drop pre-created shader modules * always use pipeline cache * use implicit model-wide pipeline cache, code format * code clean	5 years ago
nihuini	9bb06e46cf	implicit gpu instance creation, fix #1849	6 years ago
nihui	3ef995ed1e	format code style and setup restyled.io (#1840)	6 years ago
Tijmen Verhulsdonck	da09e5e7f1	Adding channel padding support for blazeface model. (#1826 ) * Add channel padding and blazeface model support. * remove python binding * remove std::min usage * fix reference blob usage * Increased padding test coverage * implement requested changes	6 years ago
nihuini	03a5378651	efficientnet-b0 and regnety-400m benchmark	6 years ago
nihuini	054ec09195	adreno device blacklist	6 years ago
nihuini	d232272db0	lower end gpu friendly	6 years ago
nihuini	b71f22d074	report adreno info, benchncnn enable image storage on adreno	6 years ago
nihui	9a9a618229	image storage is mandatory, less options makes life easier	6 years ago
nihui	62da1228e1	adreno image shader + fp16 + fp16a (#1714 ) * wip * wip * fix * image and imageview can not be destroyed until command execution ends * fast copy path for tightly packed data * wip * texture load works * 1d 3d image * record clone image, multiple commands share one image reference * upload download image * layer forward accept vkimagemat * vkimagemat graph works * staging vkimagemat for passing dynamic parameters, macro for fp32+image shader, padding image shader * vkimagemat elemsize * convolution test pass * conv1x1s1 image shader * fast staging image allocator from host memory, pooling image shader * convolutiondepthwise image shader * innerproduct image shader * packing image shader * crop deconvolution image shader * resolve spirv binding types * image fp16 and fp16a, cast image shader * eltwise image shader * wip * absval image shader * deconvolutiondepthwise image shader * concat image shader, squeezenet works * noop split image shader * uniform precision hint * layer support_image_storage * wip * vulkan device utility operator * command is storage and packing option aware * fallback to cpu on image allocation failed, mobilenetssd works * flatten image shader, enable more test * ci test * check imgfp32 imgfp16 imgfp16a features * fix ci test * fix ci test * upgrade swiftshader * wip * opt aggressive * imgfp16p * opt none * convolution winograd image shader * fix flush range, fast copy path for continous buffer * minor fix * fix innerproduct * wip ... * wip * cast fix * packing test * wip * image fp16p is fp16p * wip * silence * more line info * code clean * softmax image shader	6 years ago
kalcohol	06e129d259	add skip cooling down option (#1566) * add option for cooling down * add option for cooling down add the option usage add benchmark of RTX2080 * add benchmark of rtx2060	6 years ago
nihui	6f2ef1932d	int8 code refactoring wip, add int8 test	6 years ago
Sungmann Cho	447b1369f5	Fix warnings on Visual Studio (#1422 ) * Change DataReader::read()'s signature to fix warning C4267 This CL fixes lots of warning "C4267: 'initializing': conversion from 'size_t' to 'int'" in our codebase by matching DataReader::read()'s signature to fread(). * Fix warnings C4244 and C4267 in tools/ncnnoptimize.cpp C4244: 'initializing': conversion from 'double' to 'float', possible loss of data C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data * Fix warning C4244 in src\layer\selu.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data * Fix warning C4244 in src\layer\cast.cpp C4244: '=': conversion from 'double' to 'float', possible loss of data C4244: 'return': conversion from 'float' to 'signed char', possible loss of data * Fix warning C4244 in src\layer\psroipooling.cpp C4244: 'initializing': conversion from 'double' to 'float', possible loss of data C4244: 'initializing': conversion from 'double' to 'int', possible loss of data	6 years ago
nihuini	2e59da35a9	fill input and weight data with zero	6 years ago
nihuini	02b07b3e43	update qcom810 and iphone5s benchmark	6 years ago
nihuini	64333429bb	data reader wrapper, fix #1325	6 years ago
nihuini	567e2bd501	a dirty hack for resolving int8 pack4 crash	6 years ago
nihuini	f8caef7691	add shufflenet_v2 benchmark	6 years ago
nihuini	9d4255a4a4	add mobilenet_v3 benchmark	6 years ago
Eric Liu	f5eee84185	Update mobilenetv2-yolov3 (#1165) * Add mobilenetv2_yolov3 ,and remove old model * Recovery yolov2 * Update link	6 years ago
Natsu	6d1944f2c3	CMake improvement (#1115 ) * CMake improvement * Fix bugs * Fix typo * Propagate vulkan dependency * import vulkan * add config files, now exported target cmake should be able to find packages * Propagate no-rtti and no-exception * Provide a option to control rtti and exception in mobile platform * Make cmake clean * Resolve conflicts * Update CMake PIE is propagated by INTERFACE_POSITION_INDEPENDENT_CODE * Remove bad things	6 years ago
BUG1989	bcfe9f453f	initial the ncnn post training quantization tools (#1067) * initial the ncnn post training quantization tools * clear some comments of tools * fix the Travis ci compiler error	7 years ago
nihuini	21b5508c96	shared locked vkallocator cannot prevent concurrent accessing during actual gpu inference, use seperated vkallocator for each queue	7 years ago
nihuini	040a8d2427	set vulkan device by gpu index	7 years ago
nihuini	838c5df839	option api changes	7 years ago
nihuini	9b33e647bd	use fixed blob names for benchmark	7 years ago
nihuini	7a8f68aca6	move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works	7 years ago
nihuini	20fb006282	coverage never works without proper unittest	7 years ago
nihuini	d263cd507c	gpu packing and unpacking	7 years ago
BUG1989	2f4c4a8202	fix the compile error when using armv7a without neon (#835)	7 years ago
nihuini	1f4bdd91b5	uint32_t typed workgroup size	7 years ago
BUG1989	df3d224484	new int8 implement,better accuracy (#749 ) * add the armv7a conv3x3s1 implement without overflow,remove old codes * fix the bug of conv3x3s2 packed int8 * new int8 implement,weight quant by perchanel,better accuracy~ * fix the bug of conv3x3s1 packed int8 neon * add the naive c fp32 and int8 winograd F(2,3) * add the neon intrinsic int8 winograd F(2,3) * optimize the armv7a int8 winograd F(2,3) with neon assembly * optimize the armv7a int8 winograd F(2,3) input transform with assembly. * add the requantize layer and int8 relu implement. * add graph optimize conv1x1s2 -> conv1x1s1,begin optimize int8 aarch64. * fix int8 bugs * add the c naive im2col with sgemm * add aarch64 int8 winograd f23, conv3x3s2 naive implement * add the int8 sgemm conv7x7s2 on x86/armv7a platform * optimize the int8 sgemm by neon intrinsic and packed kernel * optimize the int8 sgemm with packed data * optimize the int8 sgemm with armv7a neon assembly * add the int8 sgemm on arm64-v8a platform * perpare to merge latest codes from master * add the int8 param files * In the Class Net,add the fuse_network method	7 years ago
nihui	182c340b3a	enable ssd vulkan benchmark	7 years ago
nihuini	f162de7263	drop deprecated hack	7 years ago
nihuini	83efa73cf6	fallback to cpu forward if layer not support vulkan, automatically!	7 years ago

66 Commits (057b5bb515d551fa64decdb7350422c19feba447)