nihui
b284dbd0f4
discover VK_KHR_shader_non_semantic_info, checked convolution imagestore ( #5955 )
1 year ago
nihui
eed257df1f
ci update llvmpipe ( #5954 )
* check image fp16
1 year ago
nihui
bf13c30210
define device feature macros for glslang, discover VK_EXT_shader_atomic_float and VK_EXT_shader_atomic_float2 ( #5949 )
1 year ago
nihui
8211930a6f
discover VK_KHR_shader_subgroup_rotate ( #5948 )
1 year ago
nihui
1b6485fa17
discover VK_KHR_zero_initialize_workgroup_memory ( #5947 )
1 year ago
nihui
40f7b4e527
discover all subgroup features and VK_KHR_shader_subgroup_extended_types ( #5946 )
1 year ago
nihui
0b9925cfef
integrate VK_EXT_subgroup_size_control features and properties ( #5940 )
1 year ago
Upliner Mikhalych
cbd17cd062
Fix #5741 don't crash when vkCreateDevice fails ( #5742 )
1 year ago
nihui
bd1f39ed82
blacklist mesa vulkan cooperative matrix feature ( #5739 )
ref https://gitlab.freedesktop.org/mesa/mesa/-/issues/10847
1 year ago
張小凡
3b048d1923
destroy_gpu_instance() waits for all devices to be idle before destroying ( #4763 )
* destroy_gpu_instance() will internally ensure that all vulkan devices are idle before proceeding with destruction.
2 years ago
nihui
05b4dcb06c
report vulkan cm 8x8x16 config, enable fp16a cm ( #5298 )
2 years ago
nihui
5329d32e74
check vulkan fp16 uniform support and implement lfp conversion without fp16u ( #5287 )
2 years ago
nihui
b4f26237cb
in-house vulkan loader ( #5130 )
* vulkan-driver-loader.md
* static vulkan on apple
2 years ago
nihui
c45c01c7c1
enable VK_KHR_cooperative_matrix ( #4823 )
* enable VK_KHR_cooperative_matrix
* add khr cm shader
* update glslang
* print matrix info
2 years ago
nihui
15cf81c40d
workaround multiheadattention vulkan nan issue on nvidia gpu ( #4682 )
* fix vulkan validation error, prefer VK_KHR_buffer_device_address over VK_EXT_buffer_device_address
* enable validation extension features
3 years ago
nihui
a2106f840f
setup more extension entrypoint ( #4636 )
3 years ago
張小凡
d87e895a1f
Add get_gpu_instance() function and organize the instance class code. ( #4630 )
3 years ago
張小凡
772b13a1d1
Add three extension capability support check ( #4626 )
* Add some extension capability for vma
3 years ago
ws
643285a08c
fix macos vulkan instance create failed when vulkan sdk version >= 1.… ( #4472 )
* enable VK_KHR_portability_subset extension if device support it
Co-authored-by: w1ndseeker <w1ndseeker@users.noreply.github.com>
3 years ago
nihui
559e5b23f9
vulkan tensorcore optimization ( #3628 )
* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
4 years ago
nihui
9fd4d371ae
bridge image for adreno image upload and download ( #2658 )
* add bridge image for adreno image storage upload and download
* enable sbn1, print bugbilz flag
* blacklist old adreno
* let user choose use_image_storage option even when bug_storage_buffer_no_l1
5 years ago
nihui
54c0a13b9f
build shared library ( #2525 )
* build shared lib and enable lto
* reserved for layer and option
* allocator pimpl
* datareader pimpl
* paramdict pimpl, disable copy assign for allocator and datareader
* modelbin pimpl
* net extractor pimpl
* gpu pimpl
* disable copy assign vulkandevice, code format
* command pimpl, dummy image readonly
* pipeline pipelinecache pimpl, export platform class
* code format, export simple family
* update ci
* disable lto on android armv7, merge webassembly ci
* link libgcc, fix macos dylib version
* pipeline pimpl, gpu info pimpl
* destroy gpu info after vulkan device
* ignore msvc stl class warning
* fix ncnn_paramdict_get_float return type
* fix vktransfer upload fp16 without flatten, add command test
5 years ago
nihui
1f44e5c6a3
enable ios arm64e ( #2475 )
* enable ios arm64e
* fix build with old vulkan sdk
* link vulkan loader on macos, fix ios moltenvk library path
* there is no moltenvk arm64e library atm, link moltenvk directly for macos-arm64
5 years ago
nihui
2b0b2fa388
enable more vulkan extensions, set subgroup size per vendor
5 years ago
nihui
cf3cf83cd3
unified image shader storage type ( #2231 )
* drop bug_layout_binding_id_alias flag
5 years ago
nihui
b9296c259d
bring up vulkan 1.1 ( #2191 )
* query subgroup features
* compile spirv 1.3
* drop offline spirv build
* do not build tests for android and ios, as they are never tested anyway
* code style
5 years ago
youzainn
1c5af3d83c
add device_name field for class GpuInfo ( #2122 )
5 years ago
nihui
9f5b660483
compile spirv
5 years ago
nihuini
bf279dcf17
workaround corrupted pipeline cache on old qcom adreno
6 years ago
nihui
193e08e834
lazy initialize utility operator, fix #1923
6 years ago
nihui
164273de61
online pipeline cache ( #1792 )
* online pipeline cache wip
* device-wide pipeline cache
* enable model-wide pipeline cache
* drop pre-created shader modules
* always use pipeline cache
* use implicit model-wide pipeline cache, code format
* code clean
6 years ago
nihui
3ef995ed1e
format code style and setup restyled.io ( #1840 )
6 years ago
nihuini
aeba24b371
enable implicit fp16a on arm mali variants, add bug tag for layout binding id alias
6 years ago
nihuini
6788384595
query gpu heap budget api
6 years ago
nihui
17c445480f
runtime spir-v compilation with libglslang ( #1779 )
6 years ago
nihuini
b71f22d074
report adreno info, benchncnn enable image storage on adreno
6 years ago
SunTY
705dd36a31
simplestl is an alternative std vector string implementation ( #1762 )
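* remove the dependency on stl
* fix header file name and push_back
* remove delegated construction
* this seems to be just fiddling around
* data returns a pointer instead of a reference to a pointer
* fix a mistake in resize
* stdint
* add c_str
* rename files to lowercase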
* NCNN_SIMPLESTL option
* simplestl default to OFF
* Update linux-x64-cpu-gcc.yml
* Update linux-x64-cpu-gcc.yml
* Update linux-x64-cpu-clang.yml
* drop functional header
* arm32 arm64 simplestl ci
* fix a memory leak, remove compiler warnings
* fix default value bug in resize
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
6 years ago
nihuini
cefe8d38c3
dynamic image storage support from shape hint
6 years ago
nihui
9a9a618229
image storage is mandatory, less options makes life easier
6 years ago
nihui
e8688b042f
fuse packing cast storage, binaryop image shader, dummy buffer and image, device-wide utility packing converter operators, fix multi-blob layer test
6 years ago
nihui
62da1228e1
adreno image shader + fp16 + fp16a ( #1714 )
* wip
* wip
* fix
* image and imageview can not be destroyed until command execution ends
* fast copy path for tightly packed data
* wip
* texture load works
* 1d 3d image
* record clone image, multiple commands share one image reference
* upload download image
* layer forward accept vkimagemat
* vkimagemat graph works
* staging vkimagemat for passing dynamic parameters, macro for fp32+image shader, padding image shader
* vkimagemat elemsize
* convolution test pass
* conv1x1s1 image shader
* fast staging image allocator from host memory, pooling image shader
* convolutiondepthwise image shader
* innerproduct image shader
* packing image shader
* crop deconvolution image shader
* resolve spirv binding types
* image fp16 and fp16a, cast image shader
* eltwise image shader
* wip
* absval image shader
* deconvolutiondepthwise image shader
* concat image shader, squeezenet works
* noop split image shader
* uniform precision hint
* layer support_image_storage
* wip
* vulkan device utility operator
* command is storage and packing option aware
* fallback to cpu on image allocation failed, mobilenetssd works
* flatten image shader, enable more test
* ci test
* check imgfp32 imgfp16 imgfp16a features
* fix ci test
* fix ci test
* upgrade swiftshader
* wip
* opt aggressive
* imgfp16p
* opt none
* convolution winograd image shader
* fix flush range, fast copy path for continuous buffer
* minor fix
* fix innerproduct
* wip ...
* wip
* cast fix
* packing test
* wip
* image fp16p is fp16p
* wip
* silence
* more line info
* code clean
* softmax image shader
6 years ago
nihui
7365bb80a2
vkmat and command api breaks ( #1689 )
* vkmat and command api breaks
* always use compute queue for compute buffer transfer
* no barrier for readonly weight buffer
* record clone, drop queue_owner
* bring back layer forward
* fix validation errors
* lifecycle inside command makes life easier
* update doc
* record_import_android_hardware_buffer
6 years ago
nihuini
1ea9de3bdf
create shader pipeline by type index, resolve binding count and push constant count from spirv. since we don't create compound shader module for macos and ios compatibility, it is enough to use fixed main as the shader entry point
6 years ago
nihui
bbaa4dcce2
compile fp16pa, optimize shader for size, enable implicit fp16 arithmetic for qcom855 and qcom855plus
6 years ago
nihui
038666e049
the initial auto test ( #1464 )
* cpu test
* wip
* ci run test
* travis ci for arm64
* arm64 ctest
* copy vulkan loader
* wip
* run
* Update ccpp.yml
* gpu test
* swiftshader
* cache macos swiftshader
* try MoltenVK
* try vulkaninfo
* give swiftshader another try
* disable failed macos gpu test
* more conv test, fix conv3x3s1 gpu test fail
* fix deconvolution test
* dilation test
* cmake option to build tests
* ncnn_add_layer_test macro
* host barrier before upload and after download, handle packing layout option
* test packing layout
* wip
* wip
* merge deconvolution packing and non-packing code
* merge convolution packing and non-packing code
* pass top_blob_count param
* fix build
* take care of non-coherent mappable memory
6 years ago
nihui
8a87f0267a
workaround local workgroup size specialization constant bug for old arm mali vulkan driver, fix #1424
6 years ago
nihui
a867d96822
dynamic memory type querying, respect memory requirement memory type bits
6 years ago
nihui
7e68c5e1e9
enable ycbcr conversion feature, get graphics queue
6 years ago
nihui
cb41b00e6e
setup VK_KHR_bind_memory2 functions
6 years ago
nihui
b29e8b0e09
check and enable more vulkan extensions
6 years ago