nihuini
c38d304369
the implicit gpu instance makes life easier :)
6 years ago
nihuini
187a3e672d
implicit gpu instance destruction, fix #1849
6 years ago
nihuini
9bb06e46cf
implicit gpu instance creation, fix #1849
6 years ago
nihuini
fd7d87e098
allow linking with external glslang
6 years ago
nihui
3ef995ed1e
format code style and setup restyled.io ( #1840 )
6 years ago
nihuini
554890cda8
fp16p and fp16s cannot be both enabled in shader source
6 years ago
nihuini
1a3a99d7c9
old qcom driver cannot handle binding id alias
6 years ago
nihuini
f87f21779f
resolve cast from type properly, no more fp16p to/from fp16s conversion
6 years ago
nihuini
bb56b5439f
fix vkmat download on integrated gpu, workaround priorbox fp16s with online spirv, fix #1700 fix #1805
6 years ago
nihui
8fec0038ba
fix ci test
6 years ago
nihuini
aeba24b371
enable implicit fp16a on arm mali variants, add bug tag for layout binding id alias
6 years ago
nihuini
054ec09195
adreno device blacklist
6 years ago
nihuini
765003a615
fix build with old vulkan sdk
6 years ago
nihuini
6788384595
query gpu heap budget api
6 years ago
nihui
17c445480f
runtime spir-v compilation with libglslang ( #1779 )
6 years ago
nihuini
b71f22d074
report adreno info, benchncnn enable image storage on adreno
6 years ago
nihuini
c94d1b39ad
force disable image storage on macos and ios, fix #1738
6 years ago
SunTY
705dd36a31
simplestl is an alternative std vector string implementation ( #1762 )
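* remove the dependency on stl
* fix header file name and push_back
* remove constructor delegation
* this seems to be needless fiddling
* data returns a pointer instead of a reference to a pointer
* fix a mistake in resize
* stdint
* add c_str
* rename files to lowercase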
* NCNN_SIMPLESTL option
* simplestl default to OFF
* Update linux-x64-cpu-gcc.yml
* Update linux-x64-cpu-gcc.yml
* Update linux-x64-cpu-clang.yml
* drop functional header
* arm32 arm64 simplestl ci
* fix a memory leak, remove compiler warnings
* fix the default value bug in resize
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
6 years ago
Naiyang Lin
ceef2470a5
Add logger.h ( #1753 )
6 years ago
nihuini
6682cd1638
image fp16pa, mark some bugihfa todo
6 years ago
nihuini
cefe8d38c3
dynamic image storage support from shape hint
6 years ago
nihuini
1e4a0752b4
fix interp ci test
6 years ago
nihui
9a9a618229
image storage is mandatory, fewer options make life easier
6 years ago
nihui
e8688b042f
fuse packing cast storage, binaryop image shader, dummy buffer and image, device-wide utility packing converter operators, fix multi-blob layer test
6 years ago
nihui
62da1228e1
adreno image shader + fp16 + fp16a ( #1714 )
* wip
* wip
* fix
* image and imageview can not be destroyed until command execution ends
* fast copy path for tightly packed data
* wip
* texture load works
* 1d 3d image
* record clone image, multiple commands share one image reference
* upload download image
* layer forward accept vkimagemat
* vkimagemat graph works
* staging vkimagemat for passing dynamic parameters, macro for fp32+image shader, padding image shader
* vkimagemat elemsize
* convolution test pass
* conv1x1s1 image shader
* fast staging image allocator from host memory, pooling image shader
* convolutiondepthwise image shader
* innerproduct image shader
* packing image shader
* crop deconvolution image shader
* resolve spirv binding types
* image fp16 and fp16a, cast image shader
* eltwise image shader
* wip
* absval image shader
* deconvolutiondepthwise image shader
* concat image shader, squeezenet works
* noop split image shader
* uniform precision hint
* layer support_image_storage
* wip
* vulkan device utility operator
* command is storage and packing option aware
* fallback to cpu on image allocation failed, mobilenetssd works
* flatten image shader, enable more test
* ci test
* check imgfp32 imgfp16 imgfp16a features
* fix ci test
* fix ci test
* upgrade swiftshader
* wip
* opt aggressive
* imgfp16p
* opt none
* convolution winograd image shader
* fix flush range, fast copy path for continuous buffer
* minor fix
* fix innerproduct
* wip ...
* wip
* cast fix
* packing test
* wip
* image fp16p is fp16p
* wip
* silence
* more line info
* code clean
* softmax image shader
6 years ago
nihuini
5580da4525
bump engine version
6 years ago
nihui
7365bb80a2
vkmat and command api breaks ( #1689 )
* vkmat and command api breaks
* always use compute queue for compute buffer transfer
* no barrier for readonly weight buffer
* record clone, drop queue_owner
* bring back layer forward
* fix validation errors
* lifecycle inside command makes life easier
* update doc
* record_import_android_hardware_buffer
6 years ago
nihuini
1ea9de3bdf
create shader pipeline by type index, resolve binding count and push constant count from spirv. since we don't create compound shader module for macos and ios compatibility, it is enough to use fixed main as the shader entry point
6 years ago
nihui
f972bf49d1
enable bugihfa on rk3288 and rk3399
6 years ago
nihui
7c97142524
old qcom adreno driver seems to have the same bug as mali does
6 years ago
nihui
bbaa4dcce2
compile fp16pa, optimize shader for size, enable implicit fp16 arithmetic for qcom855 and qcom855plus
6 years ago
nihuini
b361b24832
do not enforce coherent memory type, queue transfer after uploading model weight
6 years ago
nihui
038666e049
the initial auto test ( #1464 )
* cpu test
* wip
* ci run test
* travis ci for arm64
* arm64 ctest
* copy vulkan loader
* wip
* run
* Update ccpp.yml
* gpu test
* swiftshader
* cache macos swiftshader
* try MoltenVK
* try vulkaninfo
* give swiftshader another try
* disable failed macos gpu test
* more conv test, fix conv3x3s1 gpu test fail
* fix deconvolution test
* dilation test
* cmake option to build tests
* ncnn_add_layer_test macro
* host barrier before upload and after download, handle packing layout option
* test packing layout
* wip
* wip
* merge deconvolution packing and non-packing code
* merge convolution packing and non-packing code
* pass top_blob_count param
* fix build
* take care of non-coherent mappable memory
6 years ago
nihuini
a477aee0ba
print graphics queue info, const++
6 years ago
nihui
8a87f0267a
workaround local workgroup size specialization constant bug for old arm mali vulkan driver, fix #1424
6 years ago
nihui
a867d96822
dynamic memory type querying, respect memory requirement memory type bits
6 years ago
nihui
7e68c5e1e9
enable ycbcr conversion feature, get graphics queue
6 years ago
nihui
cb41b00e6e
setup VK_KHR_bind_memory2 functions
6 years ago
nihui
b29e8b0e09
check and enable more vulkan extensions
6 years ago
nihuini
21b5508c96
shared locked vkallocator cannot prevent concurrent access during actual gpu inference, use a separate vkallocator for each queue
7 years ago
nihuini
e9ffdb5bdd
16bit storage on arm mali is buggy
7 years ago
nihuini
040a8d2427
set vulkan device by gpu index
7 years ago
nihuini
5fdffbcaac
destroy_gpu_instance is not threadsafe anyway, fix deadlock on exit
7 years ago
nihuini
838c5df839
option api changes
7 years ago
nihuini
7f7bbf12e5
new api for getting the default gpu device
7 years ago
nihuini
cd7559c639
more fix for fp16p, still disabled by default
7 years ago
nihui
25b9736f82
shader fp16 packed
7 years ago
nihuini
4b50a97e31
implement vulkan winograd23
7 years ago
nihuini
37e150162a
do not retrieve timestamp availability bits
7 years ago
nihuini
8e2fb2e710
expose timestamp_period and timestamp_valid_bits
7 years ago