nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	804ac3421d	infrastructure and optimization for a53 and a55 (#4596) * new api for detecting arm midr and a53 a55 arch info wrapper * let a35 be a53 :P * a53 bf16s * detect running core	3 years ago
nihui	b853b3d132	get_physical_cpu_count api family (#4302) * get_physical_cpu_count api family * set default to physical big cpu * always treat smt core as big core * is_smt_cpu * get max freq mhz on windows * windows thread affinity	3 years ago
nihui	ca0ba4b25f	fine grained winograd options, adjust x86 convolution winograd strategy (#3908) * fine grained winograd options * x86 optimization for convolution winograd f23 pack4/pack8/pack16 * fix avx512 and t4 ci * fix fast direct conv path * winograd63 is actually slower than winograd43 on very large channel	3 years ago
nihui	559e5b23f9	vulkan tensorcore optimization (#3628) * query and enable cooperative matrix * fix build with old vulkan sdk * implement cooperative matrix optimization * add nvidia-t4 coverage * adjust test option for more coverage	4 years ago
nihui	cfcb1cffa9	massive vulkan optimization part2 (#3621) * vulkan local memory optimization for conv1x1 pack4 and winograd on dgpu * unified innerproduct pipeline creation * reorder deconvolution weight layout * flexible local memory data type * more local memory optimization for conv/deconv gemm	4 years ago
nihui	920aa79f04	drop x86 avx2 fp16 (#3568)	4 years ago
nihui	7c079b853e	default to big cpu count	5 years ago
nihui	67e24e0703	use local pool allocator (#2736) * use local pool allocator * detach extract feat from local allocator * fix test	5 years ago
Youngsoo Lee	b9bed8d993	feat: add denormal options (#2656) * feat: add denormal options Flush-To-Zero(FTZ) and Denormals-Are-Zero(DAZ) are modes that bypass IEEE754 methods of dealing with denormal floating-point numbers on x86_64 and some x86 CPUs. * feat: Integrate `flush_denormals` into `Extractor::extract` * chore: replace global variable with `ThreadLocalStorage`	5 years ago
nihui	54c0a13b9f	build shared library (#2525) * build shared lib and enable lto * reserved for layer and option * allocator pimpl * datareader pimpl * paramdict pimpl, disable copy assign for allocator and datareader * modelbin pimpl * net extractor pimpl * gpu pimple * disable copy assign vulkandevice, code format * command pimpl, dummy image readonly * pipeline pipelinecache pimpl, export platform class * code format, export simple family * update ci * disable lto on android armv7, merge webassembly ci * link libgcc, fix macos dylib version * pipeline pimpl, gpu info pimpl * destroy gpu info after vulkan device * ignore msvc stl class warning * fix ncnn_paramdict_get_float return type * fix vktransfer upload fp16 without flatten, add command test	5 years ago
nihui	b9296c259d	bring up vulkan 1.1 (#2191) * query subgroup features * compile spirv 1.3 * drop offline spirv build * do not build tests for android and ios, as they are never tested anyway * code style	5 years ago
nihuini	6238bfd77f	roi variant for from pixels family, enable fp16a by default	5 years ago
nihuini	4e4f0baa73	set openmp blocktime 20 for reducing power consumption, blocktime option	5 years ago
nihui	11cffce114	armv8.2 infrastructure (#1856) * runtime cpu dispatch * force thread one * disable openmp for coverage * simplify test layer * print NCNN_TARGET_ARCH * less ci build variants * weight fp16 storage option * test convdw int8 * apple a12 a13 * ncnn_add_layer ncnn_add_shader cmake macro	5 years ago
nihui	164273de61	online pipeline cache (#1792) * online pipeline cache wip * device-wide pipeline cache * enable model-wide pipeline cache * drop pre-created shader modules * always use pipeline cache * use implicit model-wide pipeline cache, code format * code clean	5 years ago
nihui	3ef995ed1e	format code style and setup restyled.io (#1840)	6 years ago
nihui	9a9a618229	image storage is mandatory, less options makes life easier	6 years ago
nihui	62da1228e1	adreno image shader + fp16 + fp16a (#1714) * wip * wip * fix * image and imageview can not be destroyed until command execution ends * fast copy path for tightly packed data * wip * texture load works * 1d 3d image * record clone image, multiple commands share one image reference * upload download image * layer forward accept vkimagemat * vkimagemat graph works * staging vkimagemat for passing dynamic parameters, macro for fp32+image shader, padding image shader * vkimagemat elemsize * convolution test pass * conv1x1s1 image shader * fast staging image allocator from host memory, pooling image shader * convolutiondepthwise image shader * innerproduct image shader * packing image shader * crop deconvolution image shader * resolve spirv binding types * image fp16 and fp16a, cast image shader * eltwise image shader * wip * absval image shader * deconvolutiondepthwise image shader * concat image shader, squeezenet works * noop split image shader * uniform precision hint * layer support_image_storage * wip * vulkan device utility operator * command is storage and packing option aware * fallback to cpu on image allocation failed, mobilenetssd works * flatten image shader, enable more test * ci test * check imgfp32 imgfp16 imgfp16a features * fix ci test * fix ci test * upgrade swiftshader * wip * opt aggressive * imgfp16p * opt none * convolution winograd image shader * fix flush range, fast copy path for continous buffer * minor fix * fix innerproduct * wip ... * wip * cast fix * packing test * wip * image fp16p is fp16p * wip * silence * more line info * code clean * softmax image shader	6 years ago
nihui	d85b2bc285	enable packing_layout by default	6 years ago
nihuini	4c6bf24205	explicit cpu thread affinity	6 years ago
nihui	7d1eec3d5d	the use_bf16_storage option	6 years ago
nihui	a718129d76	shader pack8 option works	6 years ago
nihuini	834224fea8	new option use_packing_layout	6 years ago
nihuini	73911492d7	fix validation warning on querypool destruction, enable fp16p by default	7 years ago
nihuini	838c5df839	option api changes	7 years ago

25 Commits (c41aa2fdfdeb80753b02b0ce80b8a0271d49fe1e)