* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
* vulkan local memory optimization for conv1x1 pack4 and winograd on dgpu
* unified innerproduct pipeline creation
* reorder deconvolution weight layout
* flexible local memory data type
* more local memory optimization for conv/deconv gemm
* add bridge image for adreno image storage upload and download
* enable sbn1, print bugbilz flag
* blacklist old adreno
* let user choose use_image_storage option even when bug_storage_buffer_no_l1
* enable ios arm64e
* fix build with old vulkan sdk
* link vulkan loader on macos, fix ios moltenvk library path
* there is no moltenvk arm64e library atm, link moltenvk directly for macos-arm64
* query subgroup features
* compile spirv 1.3
* drop offline spirv build
* do not build tests for android and ios, as they are never tested anyway
* code style
* [build] add toolchain file w/o stdcxx dependency
* [build] link m and gcc lib explicitly
* [ncnn] complete simple stl impl
* [ncnn] adapt for ncnn simplestl
* [test] adapt for ncnn simplestl
* [ncnn] fix missing algorithm and list when simplestl disabled
* [ncnn] fix guard for operator new and delete
* [style] fix the code style
* [build] fix build failed on darwin and emscripten
* [ci] do not import cxx to avoid operator conflict
* [ncnn] add temporary partial_sort impl using bubble sort
heap sort should be used for better perf.
* [ncnn] add std greater and less function
* [ncnn] fix placement new operator overload
* [ncnn] add operator delete with size info
* [build] disable exception, rtti, example and tools when simplestl on
* [build] add toolchain for arm simplestl
* [build] add toolchain for aarch64 simplestl
* [ncnn] move initializer to constructor
* [ncnn] use deteiled type instead of auto
* [ncnn] use plain lib name in target_link_libraries
* runtime cpu dispatch
* force thread one
* disable openmp for coverage
* simplify test layer
* print NCNN_TARGET_ARCH
* less ci build variants
* weight fp16 storage option
* test convdw int8
* apple a12 a13
* ncnn_add_layer ncnn_add_shader cmake macro