nihui
debc33fee2
arm handle allocation failures ( #5490 )
1 year ago
nihui
db035d602d
update ncnnoptimize layers, lightmode=false keeps original weight ( #5414 )
2 years ago
nihui
556b79ce4d
create layer decoupled ( #5258 )
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts
2 years ago
nihui
4494aadd74
deconvolution dynamic weight ( #5119 )
2 years ago
zhiliu6
125b9f2baf
reduce double usage ( #4671 )
3 years ago
nihui
c471826da1
fix arm bfloat2float float2bfloat oops ( #4439 )
3 years ago
nihui
dd86cebab8
armv8.6 ci and coverage ( #4025 )
* asimdfhm in fc
* move neon bf16 conversion function to arm_usability header
* fix cmake option
* fix build with newer gcc
* arm84 coverage
* arm asimdfhm optimization for innerproduct gemm fp16s
3 years ago
nihui
7886e90c65
split arm82 source for smaller binary and memory footprint ( #3877 )
* split arm82 source, wip
* check compiler arm82 only for arm 64bit target
* drop arm82 registry
* strict check compiler support arm82
4 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
nihui
c0a94cd9ca
fix armv7 without neon ( #3514 )
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
nihui
adfc8b25bc
fix deconv output pad ( #3337 )
4 years ago
nihui
cdf45a6512
cmake option NCNN_BF16 ( #3068 )
4 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
nihui
bf09af21be
exp arm fp16sa neon optimization
5 years ago
nihui
72a27d4776
utility wrapper for neon float32 bfloat16 conversion, deconvolution deconvolutiondepthwise arm fp16s fp16sa bf16s
5 years ago
nihui
b5e288b521
layer creator function is not necessary for built-in layers
5 years ago
nihui
01b8b79ed2
packing layout option respect support_packing property
6 years ago
nihui
3ef995ed1e
format code style and setup restyled.io ( #1840 )
6 years ago
nihui
57bedd59fa
fix build without neon
6 years ago
nihui
038666e049
the initial auto test ( #1464 )
* cpu test
* wip
* ci run test
* travis ci for arm64
* arm64 ctest
* copy vulkan loader
* wip
* run
* Update ccpp.yml
* gpu test
* swiftshader
* cache macos swiftshader
* try MoltenVK
* try vulkaninfo
* give swiftshader another try
* disable failed macos gpu test
* more conv test, fix conv3x3s1 gpu test fail
* fix deconvolution test
* dilation test
* cmake option to build tests
* ncnn_add_layer_test macro
* host barrier before upload and after download, handle packing layout option
* test packing layout
* wip
* wip
* merge deconvolution packing and non-packing code
* merge convolution packing and non-packing code
* pass top_blob_count param
* fix build
* take care of non-coherent mappable memory
6 years ago
nihuini
336d1c1edd
remove the ncnn namespace for in source Option
6 years ago
nihuini
cd4be6d0fa
call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks
6 years ago
nihuini
624291e2b2
use subop optimization for group convolution deconvolution pack4 family
6 years ago
nihui
48e3e7d49c
move neon activation into a wrapper function
6 years ago
nihuini
b7085ceec0
deconvolution apply output adj first, then crop the padding
6 years ago
nihuini
296e0022df
deconvolution output adj and output shape
6 years ago
nihuini
9a6ee37eef
asymmetric padding parameter for convolution and deconvolution family
6 years ago
nihui
394f6786b9
neon enable support_packing
6 years ago
nihui
cf42e7c254
deconvolutiondepthwise pack4 arm neon
6 years ago
nihui
b4c388a72a
Mat misc function accept option parameter, deconvolution pack4 arm neon
6 years ago
BUG1989
d9f269fa3d
use sgemm fp32 on arm platform, optimize conv1x1s2 ( #1031 )
7 years ago
nihuini
4de4078779
move platform includes out of namespace
7 years ago
nihui
3e003ffd98
fuse sigmoid
7 years ago
nihuini
7a8f68aca6
move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works
7 years ago
nihuini
c6e075cef7
fuse deconv/innerproduct relu arm
7 years ago
nihuini
a76a07eb3f
fix null sub group elemsize/allocator in depthwise deconvolution, fix #539
7 years ago
nihuini
6f1b0b0a61
quantized padding in convolution, use range sweets
7 years ago
nihui
5d04a3a45c
layer holds bottom blob scale, depthwise convolution read group scales
7 years ago
nihuini
6b536701c3
sub-mat shall be allocator-aware
7 years ago
nihui
9706cd1447
implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469
7 years ago
nihuini
aac70893f8
fix build on gcc
8 years ago
nihuini
9ac305e160
create 3-dim sub blob for group convolution, fix #315
8 years ago
nihui
6c4c810fda
decouple modelbin of different input types, simplify timestamp function
8 years ago
nihuini
76a55693a6
decouple convolutiondepthwise and convolution, reduce binary size by 10%, fix #254
8 years ago
nihuini
a84ba8fc0f
element type storage support in Mat, move data member the first so that a pointer to Mat is a pointer to data, convenient index access for float vector
8 years ago
nihui
bdb70a2010
padding w h in convolution and deconvolution
8 years ago
nihui
44b4519307
non-square convolution and deconvolution kernel stride dilation
8 years ago
huyn
8b9365a68c
fix top_blob not set ( #199 )
8 years ago
tedder59
4d59d0afda
Add depthwise Deconvolution. ( #187 )
* add depthwise deconvolution.
* fix some syntax errors and unnecessary modifications
8 years ago