daquexian
d38871bbfc
load bin in a single pass ( #4966 )
Signed-off-by: daquexian <daquexian566@gmail.com>
2 years ago
Upliner Mikhalych
e8645e9117
Don't silently ignore errors in VkCompute::submit_and_wait ( #4828 )
2 years ago
nihui
903ec7c2c9
fix overwrite builtin layer destruction ( #4732 )
* fix overwrite builtin layer destruction
* make modelbin class copyable
* test++
3 years ago
nihui
1b4a8fd4b2
fix warnings and code clean ( #4729 )
3 years ago
jason_w
48f9bcfce2
place the `if` statement outside the `for` loop ( #4707 )
3 years ago
nihui
db628b1b99
allow overwriting built-in layer with custom layer ( #4616 )
3 years ago
nihui
15761fc1a6
arm vfpv4 asimdhp asimdfhm optimization for gemm ( #4432 )
3 years ago
nihui
0b591b0d1f
implement layer feature disabled bit ( #4278 )
3 years ago
LinHe
9426e21166
Memory Pool Improvement For Variadic Sized Inputs ( #4190 )
* Simple miss count for better space efficiency
* Simple double ended greedy;
* Add size drop threshold setter;
* set workspace allocator cr to zero as we had some sort of recycling capability :P
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
nihui
dadc640c66
x86 avx512 optimization ( #3581 )
* unified relu avx512
* unified clip avx512
* unaryop avx512
* sigmoid avx512
* binaryop avx512
* padding convolution avx512
* convolutiondepthwise avx512
* innerproduct avx512
* reshape avx512
* slice avx512
* hardsigmoid hardswish avx512
* swish avx512
* pooling avx512
* crop avx512
* convolution sgemm pack16
* convolution 3x3 winograd pack16
* interp avx512
* convolution sgemm pack1to16
* convolution sgemm pack16to8
* convolution sgemm pack8to16
* convolution sgemm pack16to4
* fix vulkan permute pack8
* fix vulkan convolution gemm pack8to1
4 years ago
nihui
559e5b23f9
vulkan tensorcore optimization ( #3628 )
* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
4 years ago
nihui
cfcb1cffa9
massive vulkan optimization part2 ( #3621 )
* vulkan local memory optimization for conv1x1 pack4 and winograd on dgpu
* unified innerproduct pipeline creation
* reorder deconvolution weight layout
* flexible local memory data type
* more local memory optimization for conv/deconv gemm
4 years ago
nihui
3a83704c38
binary4d, unary4d ( #3443 )
4 years ago
nihui
aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier ( #3287 )
4 years ago
nihuini
affbefe311
some space cleanup, blob clone from allocator
4 years ago
nihui
cdf45a6512
cmake option NCNN_BF16 ( #3068 )
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
nihui
3a77b09c31
fix test failure
5 years ago
nihuini
9b5cb959b9
auto convert int8 to fp32 on extract
5 years ago
nihui
ad37c34d25
disable NCNN_ARM82DOT whenever NCNN_ARM82 disabled
5 years ago
Cai Shanli
8cc8cd716a
Add get input and output names ( #2890 )
5 years ago
nihui
17936e9f54
fix packing risc-v test, add cpu_riscv_vlenb()
5 years ago
nihui
11958424c2
runtime riscv v and zfh dispatch, riscv v optimization for cast
5 years ago
nihui
1c26291757
more verbose hint for find_blob_index_by_name failure
5 years ago
nihuini
34bd5ef161
update eq quant info
5 years ago
nihuini
72ef77a469
fix build with NCNN_STRING off and NCNN_VULKAN on
5 years ago
zhiliu6
fb9d529487
fix compile error when NCNN_STRING is disabled ( #2874 )
5 years ago
nihuini
31d436c627
more verbose load failure, ncnn2int8 write int8 data properly
5 years ago
nihuini
1bc0126302
fix crash when input cpu blob and extract the same from gpu, update vgg16 int8 model
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
32b48f0157
fix int8 auto pack layout
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
nihui
d4a7abc218
fix onnx2ncnn clip without max blob, fix #2788
5 years ago
nihui
67e24e0703
use local pool allocator ( #2736 )
* use local pool allocator
* detach extract feat from local allocator
* fix test
5 years ago
Cai Shanli
f5b307689b
fix net and extractor destroy order when use vulkan ( #2732 )
5 years ago
nihuini
b51959802c
fix buffer2host copy, fix #2725
5 years ago
Xu Yang
fd634e9a58
remove unnecessary mat clone when NCNN_BENCHMARK enabled ( #2708 )
5 years ago
Dahan Gong
cbd410c237
fix broken inplace forward ( #2709 )
5 years ago
Youngsoo Lee
b9bed8d993
feat: add denormal options ( #2656 )
* feat: add denormal options
Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ) are modes that bypass the IEEE 754 handling of denormal floating-point numbers on x86_64 and some x86 CPUs.
* feat: Integrate `flush_denormals` into `Extractor::extract`
* chore: replace global variable with `ThreadLocalStorage`
5 years ago
nihui
9fd4d371ae
bridge image for adreno image upload and download ( #2658 )
* add bridge image for adreno image storage upload and download
* enable sbn1, print bugbilz flag
* blacklist old adreno
* let user choose use_image_storage option even when bug_storage_buffer_no_l1
5 years ago
nihuini
2a57ca4942
reduce memory usage in lightmode, handle upload image allocation failure properly
5 years ago
nihuini
bd68ee487b
fallback to cpu when image allocation failed, fix #2648
5 years ago
nihui
af7d8184aa
handle image allocation failure properly
5 years ago
nihui
09b2bf6213
Break down forward_layer ( #2577 )
5 years ago
nihui
54c0a13b9f
build shared library ( #2525 )
* build shared lib and enable lto
* reserved for layer and option
* allocator pimpl
* datareader pimpl
* paramdict pimpl, disable copy assign for allocator and datareader
* modelbin pimpl
* net extractor pimpl
* gpu pimpl
* disable copy assign vulkandevice, code format
* command pimpl, dummy image readonly
* pipeline pipelinecache pimpl, export platform class
* code format, export simple family
* update ci
* disable lto on android armv7, merge webassembly ci
* link libgcc, fix macos dylib version
* pipeline pimpl, gpu info pimpl
* destroy gpu info after vulkan device
* ignore msvc stl class warning
* fix ncnn_paramdict_get_float return type
* fix vktransfer upload fp16 without flatten, add command test
5 years ago
nihui
1040f40c8b
update c api for custom allocator datareader modelbin and layer registration, add cookie userdata to layer
5 years ago
nihui
79efe33fdc
cmake option for platform api uses ( #2502 )
* cmake option for platform api uses
* android gpu ci does not rely on glslangvalidator, add android termux ci
5 years ago
nihui
343bc3b7dc
single blob consumer ( #2493 )
5 years ago
Zhuo Zhang
3c99287da5
fix src/net.cpp missing-field-initializers warning ( #2494 )
5 years ago
maxfy1992
0f325d7910
add decrease unpack pack overhead ( #2489 )
Co-authored-by: yangfengmax <yangfengmax@didichuxing.com>
5 years ago