nihui/ncnn - ncnn - Open-Source Collaborative Cloud Brain Ecosystem Support System

Author	SHA1	Message	Date
Asd-g	bbf2e5d533	create_gpu_instance: do not perform destroy_gpu_instance() (#5437 ) When performing destroy_gpu_instance(), g_instance.created is always 0.	1 year ago
張小凡	3b048d1923	destroy_gpu_instance() function wait for all devices to be idle before destroy (#4763 ) * destroy_gpu_instance() will internally ensure that all vulkan devices are idle before proceeding with destruction.	2 years ago
Shatyuka	e7748e5311	Fix `destroy_gpu_instance` crash (#5353 ) * Fix `destroy_gpu_instance` crash * Additional check and clear	2 years ago
nihui	05b4dcb06c	report vulkan cm 8x8x16 config, enable fp16a cm (#5298 )	2 years ago
nihui	5329d32e74	check vulkan fp16 uniform support and implement lfp conversion without fp16u (#5287 )	2 years ago
nihui	556b79ce4d	create layer decoupled (#5258 ) * create layer decoupled * no more virtual public * allow build test with shared library * decouple cpu vulkan * drop old scripts	2 years ago
nihui	ded0b78bb2	fix nvidia vulkan crash on exit (#5234 )	2 years ago
nihui	8c4fc5e2a0	enable uniform 16bit and 8bit when available, fix validation error in fp16sa shader (#5233 )	2 years ago
nihui	b4f26237cb	in-house vulkan loader (#5130 ) * vulkan-driver-loader.md * static vulkan on apple	2 years ago
邓实诚	a1e3ebf8e5	implement simplemath (#4905 ) * complete abs, fmod and sin function in simplemath.h * remove some unused variables in simplemath.cpp * modify test-coverage.yml and add some functions to simplemath.cpp * modify erf.cpp which included math.h * include platform.h for NCNN_SIMPLEMATH definition * move utility constants and functions in simplemath.h to simplemath.cpp * guard simplemath functions with extern "C" * add NCNN_EXPORT macro in simplemath.h * include plateform.h and guard all declarations with NCNN_SIMPLEMATH * clean unused code in test_unaryop.cpp * guard #include <vector> with NCNN_SIMPLEMATH in benchncnn.cpp * add 'static' to guard functions that not declarated in header file * modify sin and cos with better implementation --------- Co-authored-by: HonestDeng <HonestDeng@users.noreply.github.com>	2 years ago
nihui	e80fcbca8f	prefer faster and larger device local only memory on amd integrated graphics, heap budget value follows the same strategy as blob allocator (#4936 )	2 years ago
nihui	c45c01c7c1	enable VK_KHR_cooperative_matrix (#4823 ) * enable VK_KHR_cooperative_matrix * add khr cm shader * update glslang * print matrix info	2 years ago
Upliner Mikhalych	e8645e9117	Don't silently ignore errors in VkCompute::submit_and_wait (#4828 )	2 years ago
nihui	15cf81c40d	workaround multiheadattention vulkan nan issue on nvidia gpu (#4682 ) * fix vulkan validation error, prefer VK_KHR_buffer_device_address over VK_EXT_buffer_device_address * enable validation extension features	3 years ago
nihui	72a3e5141f	fix vulkan validation error, prefer VK_KHR_buffer_device_address over VK_EXT_buffer_device_address (#4680 )	3 years ago
nihui	e006aa8007	fix extension not present error (#4655 )	3 years ago
nihui	a2106f840f	setup more extension entrypoint (#4636 )	3 years ago
張小凡	d87e895a1f	Add get_gpu_instance() function and Organized the instance class codes. (#4630 )	3 years ago
張小凡	772b13a1d1	Add three extension capability support check (#4626 ) * Add some extension capability for vma	3 years ago
nihui	254eb8d0d4	blacklist fp16a on old adreno driver (#4587 )	3 years ago
weirdseed	503a8b921f	fix uninitialized gpu bug_buffer_image_load_zero value (#4493 )	3 years ago
ws	643285a08c	fix macos vulkan instance create failed when vulkan sdk version >= 1.… (#4472 ) * enable VK_KHR_portability_subset extension if device support it Co-authored-by: w1ndseeker <w1ndseeker@users.noreply.github.com>	3 years ago
nihui	c16cac2678	update glslang, fix system glslang include path (#3819 )	4 years ago
nihui	50fa6d39c0	enable fp16a for mali t760 v2	4 years ago
nihui	7600270430	create uop in spirv-1 mode for vulkan 1.0 compatibility (#3721 )	4 years ago
nihui	9826f3dbf8	shader include vulkan activation, workaround for moltenvk tanh half4 issue (#3711 )	4 years ago
nihui	559e5b23f9	vulkan tensorcore optimization (#3628 ) * query and enable cooperative matrix * fix build with old vulkan sdk * implement cooperative matrix optimization * add nvidia-t4 coverage * adjust test option for more coverage	4 years ago
nihui	3ddd65e18c	massive vulkan optimization part3 (#3632 ) * implicit gemm * unroll direct conv by 2x2x2	4 years ago
nihui	cfcb1cffa9	massive vulkan optimization part2 (#3621 ) * vulkan local memory optimization for conv1x1 pack4 and winograd on dgpu * unified innerproduct pipeline creation * reorder deconvolution weight layout * flexible local memory data type * more local memory optimization for conv/deconv gemm	4 years ago
nihui	8f25ba0cab	enable fp16a on mali-g31	4 years ago
nihui	30e106b185	add another mali g52 device id	4 years ago
nihui	5f62fdec87	allow more concurrent gpu submits on device with low queue count	5 years ago
nihui	81be8e235c	workaround macos intel dummy image readonly issue, fix #2548 (#2864 )	5 years ago
nihui	9fd4d371ae	bridge image for adreno image upload and download (#2658 ) * add bridge image for adreno image storage upload and download * enable sbn1, print bugbilz flag * blacklist old adreno * let user choose use_image_storage option even when bug_storage_buffer_no_l1	5 years ago
nihuini	3bf03379d7	fix pipeline compilation error on image store fp16sa	5 years ago
nihuini	f437bcdd4c	enable fp16s and int8s on newer adreno/mali, actually enable int8 tests	5 years ago
nihui	80499bd64a	enable VK_LAYER_KHRONOS_validation layer in modern vulkan sdk	5 years ago
nihuini	9b949d65b3	fuse onnx lstm, codeformat exclude pybind11, fix #2562	5 years ago
nihui	54c0a13b9f	build shared library (#2525 ) * build shared lib and enable lto * reserved for layer and option * allocator pimpl * datareader pimpl * paramdict pimpl, disable copy assign for allocator and datareader * modelbin pimpl * net extractor pimpl * gpu pimple * disable copy assign vulkandevice, code format * command pimpl, dummy image readonly * pipeline pipelinecache pimpl, export platform class * code format, export simple family * update ci * disable lto on android armv7, merge webassembly ci * link libgcc, fix macos dylib version * pipeline pimpl, gpu info pimpl * destroy gpu info after vulkan device * ignore msvc stl class warning * fix ncnn_paramdict_get_float return type * fix vktransfer upload fp16 without flatten, add command test	5 years ago
nihuini	5650b77054	fix gpu extension conditions	5 years ago
nihui	1f44e5c6a3	enable ios arm64e (#2475 ) * enable ios arm64e * fix build with old vulkan sdk * link vulkan loader on macos, fix ios moltenvk library path * there is no moltenvk arm64e library atm, link moltenvk directly for macos-arm64	5 years ago
nihui	2b0b2fa388	enable more vulkan extensions, set subgroup size per vendor	5 years ago
nihui	cf3cf83cd3	unified image shader storage type (#2231 ) * drop bug_layout_binding_id_alias flag	5 years ago
nihui	9be3f074a9	ci ndk-r16b (#2104 ) * reset * fix build with old vulkan header	5 years ago
nihui	b9296c259d	bring up vulkan 1.1 (#2191 ) * query subgroup features * compile spirv 1.3 * drop offline spirv build * do not build tests for android and ios, as they are never tested anyway * code style	5 years ago
nihui	4463c3b455	disable image shader on adreno until a better workaround figured out	5 years ago
youzainn	1c5af3d83c	add device_name field for class GpuInfo (#2122 )	5 years ago
nihuini	a334513b5e	fp16a option fix	5 years ago
nihuini	9047741129	always disable fp16/int8 arithmetic for gpu uop	5 years ago
nihui	9f5b660483	compile spirv	5 years ago

127 Commits (bbf2e5d533e3755af782bc459dff03480b54b3d2)