nihui
9f832c19c1
vulkan int8 packing quantize dequantize requantize ( #3731 )
* add int8 definitions
* packing vulkan int8/int32, quantize vulkan
* vulkan dequantize
* requantize vulkan
11 months ago
nihui
24a3b99f1f
drop layer support_image_storage and option use_image_storage ( #6126 )
* fix pyncnn build
1 year ago
nihui
211e238639
drop layer forward vkimagemat ( #6124 )
vkimagemat was originally used as a mat storage in the hope of improving performance on old adreno gpus, but in fact it is slower than the cpu in most cases and is no longer suitable for the latest adreno architecture and large shapes
1 year ago
nihui
1d9eddff8f
reset extractor local vulkan allocator ( #6002 )
resolve reclaim_blob_allocator get wild allocator warning when clear() called multiple times
1 year ago
nihui
8dbcfee5ec
option owns vulkan device index ( #5973 )
1 year ago
nihui
eed257df1f
ci update llvmpipe ( #5954 )
* check image fp16
1 year ago
nihui
bf13c30210
define device feature macros for glslang, discover VK_EXT_shader_atomic_float and VK_EXT_shader_atomic_float2 ( #5949 )
1 year ago
nihui
8211930a6f
discover VK_KHR_shader_subgroup_rotate ( #5948 )
1 year ago
nihui
40f7b4e527
discover all subgroup features and VK_KHR_shader_subgroup_extended_types ( #5946 )
1 year ago
nihui
19caca3140
port rvv intrinsic 1.0+ ( #5642 )
* zfh zvfh xtheadvector infra
* dispatch for rvv and xtheadvector
* dispatch for non-vector zfh
* port xtheadvector recp rsqrt trunc
* general rvv gemm
* c906 and c910 ci
* old tuple code clean
* update riscv64 ci
* update build doc
* drop old th1520 toolchain
1 year ago
nihui
8fe62812c9
arm neon optimization for layernorm fp32/bf16s/fp16s ( #5746 )
1 year ago
Upliner Mikhalych
cbd17cd062
Fix #5741 don't crash when vkCreateDevice fails ( #5742 )
1 year ago
nihui
3752d71200
fix potential fp16s bf16s conflicts on arm vfpv4 ( #5578 )
* fix potential fp16s bf16s conflicts on armv7 vfpv4
* but prefer fp16 on armv8.2
1 year ago
nihui
b4379630fb
x86 handle allocation failures ( #5489 )
2 years ago
nihui
0fd25d6c70
fix arm riscv build with NCNN_BF16=OFF ( #5422 )
2 years ago
nihui
824b79a314
fix rvv extract blob with fp16 enabled, fix #5360 ( #5398 )
2 years ago
nihui
984d6dd844
promote vfpv4 for auto fp16 storage conversion ( #5325 )
* promote vfpv4 for auto fp16 storage conversion
* always report neon and vfpv4 for arm64
2 years ago
nihui
5329d32e74
check vulkan fp16 uniform support and implement lfp conversion without fp16u ( #5287 )
2 years ago
nihui
c222208cc9
feat mask for disable threading, make some extractor setter no-op, update doc ( #5270 )
2 years ago
nihui
a31f66203b
do not cache temporary blob for uploading weight ( #5266 )
2 years ago
nihui
556b79ce4d
create layer decoupled ( #5258 )
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts
2 years ago
Justin Fung
465debe9bb
Add print statements for 4 dimensions benchmark ( #5148 )
2 years ago
nihui
39bc71c941
support big endian platform, add powerpc ci ( #5121 )
2 years ago
nihui
80b3b9c6f0
arm optimization for convolution int8 winograd unified elempack ( #5087 )
* enable out elempack 8 for winograd and sgemm
2 years ago
daquexian
d38871bbfc
load bin in a single pass ( #4966 )
Signed-off-by: daquexian <daquexian566@gmail.com>
2 years ago
Upliner Mikhalych
e8645e9117
Don't silently ignore errors in VkCompute::submit_and_wait ( #4828 )
2 years ago
nihui
903ec7c2c9
fix overwrite builtin layer destruction ( #4732 )
* fix overwrite builtin layer destruction
* make modelbin class copyable
* test++
3 years ago
nihui
1b4a8fd4b2
fix warnings and code clean ( #4729 )
3 years ago
jason_w
48f9bcfce2
place the `if` statement outside the `for` loop ( #4707 )
3 years ago
nihui
db628b1b99
allow overwriting built-in layer with custom layer ( #4616 )
3 years ago
nihui
15761fc1a6
arm vfpv4 asimdhp asimdfhm optimization for gemm ( #4432 )
3 years ago
nihui
0b591b0d1f
implement layer feature disabled bit ( #4278 )
3 years ago
LinHe
9426e21166
Memory Pool Improvement For Variadic Sized Inputs ( #4190 )
* Simple miss count for better space efficiency
* Simple double ended greedy;
* Add size drop threshold setter;
* set workspace allocator cr to zero as we had some sort of recycling capability :P
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
nihui
dadc640c66
x86 avx512 optimization ( #3581 )
* unified relu avx512
* unified clip avx512
* unaryop avx512
* sigmoid avx512
* binaryop avx512
* padding convolution avx512
* convolutiondepthwise avx512
* innerproduct avx512
* reshape avx512
* slice avx512
* hardsigmoid hardswish avx512
* swish avx512
* pooling avx512
* crop avx512
* convolution sgemm pack16
* convolution 3x3 winograd pack16
* interp avx512
* convolution sgemm pack1to16
* convolution sgemm pack16to8
* convolution sgemm pack8to16
* convolution sgemm pack16to4
* fix vulkan permute pack8
* fix vulkan convolution gemm pack8to1
4 years ago
nihui
559e5b23f9
vulkan tensorcore optimization ( #3628 )
* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
4 years ago
nihui
cfcb1cffa9
massive vulkan optimization part2 ( #3621 )
* vulkan local memory optimization for conv1x1 pack4 and winograd on dgpu
* unified innerproduct pipeline creation
* reorder deconvolution weight layout
* flexible local memory data type
* more local memory optimization for conv/deconv gemm
4 years ago
nihui
3a83704c38
binary4d, unary4d ( #3443 )
4 years ago
nihui
aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier ( #3287 )
4 years ago
nihuini
affbefe311
some space cleanup, blob clone from allocator
4 years ago
nihui
cdf45a6512
cmake option NCNN_BF16 ( #3068 )
5 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
5 years ago
nihui
3a77b09c31
fix test failure
5 years ago
nihuini
9b5cb959b9
auto convert int8 to fp32 on extract
5 years ago
nihui
ad37c34d25
disable NCNN_ARM82DOT whenever NCNN_ARM82 disabled
5 years ago
Cai Shanli
8cc8cd716a
Add get input and output names ( #2890 )
5 years ago
nihui
17936e9f54
fix packing risc-v test, add cpu_riscv_vlenb()
5 years ago
nihui
11958424c2
runtime riscv v and zfh dispatch, riscv v optimization for cast
5 years ago
nihui
1c26291757
more verbose hint for find_blob_index_by_name failure
5 years ago
nihuini
34bd5ef161
update eq quant info
5 years ago
nihuini
72ef77a469
fix build with NCNN_STRING off and NCNN_VULKAN on
5 years ago