nihui
1fcad0e765
loongson mmi optional layer
4 years ago
nihui
457e066eb5
x86 f16c infrastructure ( #3577 )
4 years ago
nihui
4654030541
decouple x86 fma avx2 ( #3560 )
4 years ago
nihuini
51ecc33d9d
check avx512vl extension for discarding old-slow avx512 chips, enable avx512 option by default
4 years ago
nihui
672daa7e04
xop infrastructure and optimization ( #3541 )
4 years ago
nihui
930c36ebe2
avx512 infrastructure ( #3407 )
4 years ago
nihui
878cb713d5
optional arm82 dot source ( #3415 )
4 years ago
nihuini
11794675f3
apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path
4 years ago
nihuini
affbefe311
some space cleanup, blob clone from allocator
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
Tijmen Verhulsdonck
a7f301a99d
Add clang compatibility ( #3071 )
* Add clang compatibility
Add ability to build NCNN lib on windows with clang GNU
* Restyled/pull 3071 (#16 )
* [skip ci] Restyled by clang-format
* [skip ci] Restyled by astyle
* [skip ci] Restyled by clang-format
* [skip ci] Restyled by astyle
Co-authored-by: Restyled.io <commits@restyled.io>
4 years ago
nihui
1c31ac2549
runtime cpu dispatch for mips msa and loongson mmi
4 years ago
nihui
2f70343aec
cmake clean ( #3032 )
4 years ago
nihui
bcbb55f033
apple device always has armv8.2 dot ( #2963 )
5 years ago
nihuini
afc02d57f9
runtime detect armv8.2 dotprod
5 years ago
nihui
11958424c2
runtime riscv v and zfh dispatch, riscv v optimization for cast
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
3ed6c21565
find threads in cmake config
5 years ago
nihui
14d319db36
include arm82 on native macos arm64
5 years ago
nihui
54c0a13b9f
build shared library ( #2525 )
* build shared lib and enable lto
* reserved for layer and option
* allocator pimpl
* datareader pimpl
* paramdict pimpl, disable copy assign for allocator and datareader
* modelbin pimpl
* net extractor pimpl
* gpu pimpl
* disable copy assign vulkandevice, code format
* command pimpl, dummy image readonly
* pipeline pipelinecache pimpl, export platform class
* code format, export simple family
* update ci
* disable lto on android armv7, merge webassembly ci
* link libgcc, fix macos dylib version
* pipeline pimpl, gpu info pimpl
* destroy gpu info after vulkan device
* ignore msvc stl class warning
* fix ncnn_paramdict_get_float return type
* fix vktransfer upload fp16 without flatten, add command test
5 years ago
nihui
1040f40c8b
update c api for custom allocator datareader modelbin and layer registration, add cookie userdata to layer
5 years ago
Cai Shanli
a9df4f6c59
add custom layer destroyer ( #2481 )
* add custom layer destroyer
* set default layer destroyer with 0
5 years ago
nihui
e93ad408c5
Ci release ( #2440 )
* package openmp and glslang together
* find glslang targets in lib64
* define version string
* update moltenvk
5 years ago
nihui
f4f790ca1f
ci macos arm64 ( #2321 )
5 years ago
nihui
b9296c259d
bring up vulkan 1.1 ( #2191 )
* query subgroup features
* compile spirv 1.3
* drop offline spirv build
* do not build tests for android and ios, as they are never tested anyway
* code style
5 years ago
youzainn
3b1b41ec0b
Add some compile options, add vulkan dependency export ( #2062 )
* vulkan cmake export template
* 1) vulkan cmake dependency export. 2) support opencv_world import. 3) add BUILD_WITH_STATIC_CRT option
* Threads dependency
* NCNN_BUILD_WITH_STATIC_CRT option
* we do not support cmake before version 3.15 for option NCNN_BUILD_WITH_STATIC_CRT
5 years ago
nihui
bb5bfe3841
avx2 infrastructure ( #1943 )
5 years ago
nihui
11cffce114
armv8.2 infrastructure ( #1856 )
* runtime cpu dispatch
* force thread one
* disable openmp for coverage
* simplify test layer
* print NCNN_TARGET_ARCH
* less ci build variants
* weight fp16 storage option
* test convdw int8
* apple a12 a13
* ncnn_add_layer ncnn_add_shader cmake macro
5 years ago
nihui
fe6bc1ed4d
Ci rv64gcv and rv64gc ( #1936 )
5 years ago
nihuini
f3b182da1f
fix ci build
6 years ago
nihuini
989b0f70cc
convert shader source to hex data at build time
6 years ago
nihuini
b5f85eee13
fix image1d_xx8 macro, normalize image shader
6 years ago
nihuini
6682cd1638
image fp16pa, mark some bugihfa todo
6 years ago
nihui
e8688b042f
fuse packing cast storage, binaryop image shader, dummy buffer and image, device-wide utility packing converter operators, fix multi-blob layer test
6 years ago
nihui
62da1228e1
adreno image shader + fp16 + fp16a ( #1714 )
* wip
* wip
* fix
* image and imageview can not be destroyed until command execution ends
* fast copy path for tightly packed data
* wip
* texture load works
* 1d 3d image
* record clone image, multiple commands share one image reference
* upload download image
* layer forward accept vkimagemat
* vkimagemat graph works
* staging vkimagemat for passing dynamic parameters, macro for fp32+image shader, padding image shader
* vkimagemat elemsize
* convolution test pass
* conv1x1s1 image shader
* fast staging image allocator from host memory, pooling image shader
* convolutiondepthwise image shader
* innerproduct image shader
* packing image shader
* crop deconvolution image shader
* resolve spirv binding types
* image fp16 and fp16a, cast image shader
* eltwise image shader
* wip
* absval image shader
* deconvolutiondepthwise image shader
* concat image shader, squeezenet works
* noop split image shader
* uniform precision hint
* layer support_image_storage
* wip
* vulkan device utility operator
* command is storage and packing option aware
* fallback to cpu on image allocation failed, mobilenetssd works
* flatten image shader, enable more test
* ci test
* check imgfp32 imgfp16 imgfp16a features
* fix ci test
* fix ci test
* upgrade swiftshader
* wip
* opt aggressive
* imgfp16p
* opt none
* convolution winograd image shader
* fix flush range, fast copy path for continuous buffer
* minor fix
* fix innerproduct
* wip ...
* wip
* cast fix
* packing test
* wip
* image fp16p is fp16p
* wip
* silence
* more line info
* code clean
* softmax image shader
6 years ago
nihuini
1ea9de3bdf
create shader pipeline by type index, resolve binding count and push constant count from spirv. since we don't create compound shader module for macos and ios compatibility, it is enough to use fixed main as the shader entry point
6 years ago
nihui
999da7158f
old glslang reject -Os option, as optimizing for size does not make a big difference, drop it for now, fix #1544
6 years ago
nihui
bbaa4dcce2
compile fp16pa, optimize shader for size, enable implicit fp16 arithmetic for qcom855 and qcom855plus
6 years ago
nihui
0f7e7bca02
shader shape specialization constant and basic local group size partition ( #1523 )
* use Mat class for Shape description
* shape specialization constant in compute shader
* wip
* wip
* test forward_inplace, add binaryop unaryop sigmoid test
* fix arm unaryop test
* fix arm binaryop test
* make shape hint optional, cast int8 to fp32, add cast test
* wip
* follow the good and old local size setting for conv1x1
* the optimal local size rewrite
* fix build on msvc
* add permute shader for all packing layout, add permute test
* concat and slice partial shape constant, slice test
* fix slice test
* interp test
* add lrn test, test packing layout implicitly
* add eltwise test
* add normalize test
* add instancenorm test
* reorg shape constant
* simple local group size partition
* add shape constant param
6 years ago
nihui
33b16811ce
reimplement sfp afp conversion macro as function style buffer load store, drop lds shader for the moment
6 years ago
nihui
5042d14d7d
define sfpvec8 afpvec8 macro, use modern glsl extension for fp16 arithmetic, fix padding aarch64 build
6 years ago
nihuini
628989770b
return values correctly
6 years ago
nihuini
eb9326002f
cmake ncnn_generate_shader_spv_header function
6 years ago
Natsu
637d96c1d2
Fix gcc 9 compilation failure ( #1189 )
* Fix gcc 9 compilation failure
* Fix compilation failure on linux gcc
* Fix compilation failure on old gcc
* Remove C++11 requirement
6 years ago
Natsu
6d1944f2c3
CMake improvement ( #1115 )
* CMake improvement
* Fix bugs
* Fix typo
* Propagate vulkan dependency
* import vulkan
* add config files, now exported target cmake should be able to find packages
* Propagate no-rtti and no-exception
* Provide an option to control rtti and exception in mobile platform
* Make cmake clean
* Resolve conflicts
* Update CMake
PIE is propagated by INTERFACE_POSITION_INDEPENDENT_CODE
* Remove bad things
6 years ago