Asd-g
bbf2e5d533
create_gpu_instance: do not perform destroy_gpu_instance() ( #5437 )
When performing destroy_gpu_instance(), g_instance.created is always 0.
1 year ago
張小凡
3b048d1923
destroy_gpu_instance() waits for all devices to be idle before destroying ( #4763 )
* destroy_gpu_instance() will internally ensure that all vulkan devices are idle before proceeding with destruction.
2 years ago
Shatyuka
e7748e5311
Fix `destroy_gpu_instance` crash ( #5353 )
* Fix `destroy_gpu_instance` crash
* Additional check and clear
2 years ago
nihui
05b4dcb06c
report vulkan cm 8x8x16 config, enable fp16a cm ( #5298 )
2 years ago
nihui
5329d32e74
check vulkan fp16 uniform support and implement lfp conversion without fp16u ( #5287 )
2 years ago
nihui
556b79ce4d
create layer decoupled ( #5258 )
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts
2 years ago
nihui
ded0b78bb2
fix nvidia vulkan crash on exit ( #5234 )
2 years ago
nihui
8c4fc5e2a0
enable uniform 16bit and 8bit when available, fix validation error in fp16sa shader ( #5233 )
2 years ago
nihui
b4f26237cb
in-house vulkan loader ( #5130 )
* vulkan-driver-loader.md
* static vulkan on apple
2 years ago
邓实诚
a1e3ebf8e5
implement simplemath ( #4905 )
* complete abs, fmod and sin functions in simplemath.h
* remove some unused variables in simplemath.cpp
* modify test-coverage.yml and add some functions to simplemath.cpp
* modify erf.cpp which included math.h
* include platform.h for NCNN_SIMPLEMATH definition
* move utility constants and functions in simplemath.h to simplemath.cpp
* guard simplemath functions with extern "C"
* add NCNN_EXPORT macro in simplemath.h
* include platform.h and guard all declarations with NCNN_SIMPLEMATH
* clean unused code in test_unaryop.cpp
* guard #include <vector> with NCNN_SIMPLEMATH in benchncnn.cpp
* add 'static' to guard functions that are not declared in the header file
* modify sin and cos with better implementation
---------
Co-authored-by: HonestDeng <HonestDeng@users.noreply.github.com>
2 years ago
nihui
e80fcbca8f
prefer faster and larger device local only memory on amd integrated graphics, heap budget value follows the same strategy as blob allocator ( #4936 )
2 years ago
nihui
c45c01c7c1
enable VK_KHR_cooperative_matrix ( #4823 )
* enable VK_KHR_cooperative_matrix
* add khr cm shader
* update glslang
* print matrix info
2 years ago
Upliner Mikhalych
e8645e9117
Don't silently ignore errors in VkCompute::submit_and_wait ( #4828 )
2 years ago
nihui
15cf81c40d
workaround multiheadattention vulkan nan issue on nvidia gpu ( #4682 )
* fix vulkan validation error, prefer VK_KHR_buffer_device_address over VK_EXT_buffer_device_address
* enable validation extension features
3 years ago
nihui
72a3e5141f
fix vulkan validation error, prefer VK_KHR_buffer_device_address over VK_EXT_buffer_device_address ( #4680 )
3 years ago
nihui
e006aa8007
fix extension not present error ( #4655 )
3 years ago
nihui
a2106f840f
setup more extension entrypoint ( #4636 )
3 years ago
張小凡
d87e895a1f
Add get_gpu_instance() function and organize the instance class code ( #4630 )
3 years ago
張小凡
772b13a1d1
Add three extension capability support check ( #4626 )
* Add some extension capability for vma
3 years ago
nihui
254eb8d0d4
blacklist fp16a on old adreno driver ( #4587 )
3 years ago
weirdseed
503a8b921f
fix uninitialized gpu bug_buffer_image_load_zero value ( #4493 )
3 years ago
ws
643285a08c
fix macos vulkan instance create failed when vulkan sdk version >= 1.… ( #4472 )
* enable VK_KHR_portability_subset extension if the device supports it
Co-authored-by: w1ndseeker <w1ndseeker@users.noreply.github.com>
3 years ago
nihui
c16cac2678
update glslang, fix system glslang include path ( #3819 )
4 years ago
nihui
50fa6d39c0
enable fp16a for mali t760 v2
4 years ago
nihui
7600270430
create uop in spirv-1 mode for vulkan 1.0 compatibility ( #3721 )
4 years ago
nihui
9826f3dbf8
shader include vulkan activation, workaround for moltenvk tanh half4 issue ( #3711 )
4 years ago
nihui
559e5b23f9
vulkan tensorcore optimization ( #3628 )
* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
4 years ago
nihui
3ddd65e18c
massive vulkan optimization part3 ( #3632 )
* implicit gemm
* unroll direct conv by 2x2x2
4 years ago
nihui
cfcb1cffa9
massive vulkan optimization part2 ( #3621 )
* vulkan local memory optimization for conv1x1 pack4 and winograd on dgpu
* unified innerproduct pipeline creation
* reorder deconvolution weight layout
* flexible local memory data type
* more local memory optimization for conv/deconv gemm
4 years ago
nihui
8f25ba0cab
enable fp16a on mali-g31
4 years ago
nihui
30e106b185
add another mali g52 device id
4 years ago
nihui
5f62fdec87
allow more concurrent gpu submits on device with low queue count
5 years ago
nihui
81be8e235c
workaround macos intel dummy image readonly issue, fix #2548 ( #2864 )
5 years ago
nihui
9fd4d371ae
bridge image for adreno image upload and download ( #2658 )
* add bridge image for adreno image storage upload and download
* enable sbn1, print bugbilz flag
* blacklist old adreno
* let user choose use_image_storage option even when bug_storage_buffer_no_l1
5 years ago
nihuini
3bf03379d7
fix pipeline compilation error on image store fp16sa
5 years ago
nihuini
f437bcdd4c
enable fp16s and int8s on newer adreno/mali, actually enable int8 tests
5 years ago
nihui
80499bd64a
enable VK_LAYER_KHRONOS_validation layer in modern vulkan sdk
5 years ago
nihuini
9b949d65b3
fuse onnx lstm, codeformat exclude pybind11, fix #2562
5 years ago
nihui
54c0a13b9f
build shared library ( #2525 )
* build shared lib and enable lto
* reserved for layer and option
* allocator pimpl
* datareader pimpl
* paramdict pimpl, disable copy assign for allocator and datareader
* modelbin pimpl
* net extractor pimpl
* gpu pimpl
* disable copy assign vulkandevice, code format
* command pimpl, dummy image readonly
* pipeline pipelinecache pimpl, export platform class
* code format, export simple family
* update ci
* disable lto on android armv7, merge webassembly ci
* link libgcc, fix macos dylib version
* pipeline pimpl, gpu info pimpl
* destroy gpu info after vulkan device
* ignore msvc stl class warning
* fix ncnn_paramdict_get_float return type
* fix vktransfer upload fp16 without flatten, add command test
5 years ago
nihuini
5650b77054
fix gpu extension conditions
5 years ago
nihui
1f44e5c6a3
enable ios arm64e ( #2475 )
* enable ios arm64e
* fix build with old vulkan sdk
* link vulkan loader on macos, fix ios moltenvk library path
* there is no moltenvk arm64e library atm, link moltenvk directly for macos-arm64
5 years ago
nihui
2b0b2fa388
enable more vulkan extensions, set subgroup size per vendor
5 years ago
nihui
cf3cf83cd3
unified image shader storage type ( #2231 )
* drop bug_layout_binding_id_alias flag
5 years ago
nihui
9be3f074a9
ci ndk-r16b ( #2104 )
* reset
* fix build with old vulkan header
5 years ago
nihui
b9296c259d
bring up vulkan 1.1 ( #2191 )
* query subgroup features
* compile spirv 1.3
* drop offline spirv build
* do not build tests for android and ios, as they are never tested anyway
* code style
5 years ago
nihui
4463c3b455
disable image shader on adreno until a better workaround is figured out
5 years ago
youzainn
1c5af3d83c
add device_name field for class GpuInfo ( #2122 )
5 years ago
nihuini
a334513b5e
fp16a option fix
5 years ago
nihuini
9047741129
always disable fp16/int8 arithmetic for gpu uop
5 years ago
nihui
9f5b660483
compile spirv
5 years ago