nihui
aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier ( #3287 )
4 years ago
nihuini
affbefe311
some space cleanup, blob clone from allocator
4 years ago
nihui
cdf45a6512
cmake option NCNN_BF16 ( #3068 )
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
nihui
3a77b09c31
fix test failure
4 years ago
nihuini
9b5cb959b9
auto convert int8 to fp32 on extract
5 years ago
nihui
ad37c34d25
disable NCNN_ARM82DOT whenever NCNN_ARM82 disabled
5 years ago
Cai Shanli
8cc8cd716a
Add get input and output names ( #2890 )
5 years ago
nihui
17936e9f54
fix packing risc-v test, add cpu_riscv_vlenb()
5 years ago
nihui
11958424c2
runtime riscv v and zfh dispatch, riscv v optimization for cast
5 years ago
nihui
1c26291757
more verbose hint for find_blob_index_by_name failure
5 years ago
nihuini
34bd5ef161
update eq quant info
5 years ago
nihuini
72ef77a469
fix build with NCNN_STRING off and NCNN_VULKAN on
5 years ago
zhiliu6
fb9d529487
fix compile error when NCNN_STRING is disabled ( #2874 )
5 years ago
nihuini
31d436c627
more verbose load failure, ncnn2int8 write int8 data properly
5 years ago
nihuini
1bc0126302
fix crash when input cpu blob and extract the same from gpu, update vgg16 int8 model
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
32b48f0157
fix int8 auto pack layout
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
nihui
d4a7abc218
fix onnx2ncnn clip without max blob, fix #2788
5 years ago
nihui
67e24e0703
use local pool allocator ( #2736 )
* use local pool allocator
* detach extract feat from local allocator
* fix test
5 years ago
Cai Shanli
f5b307689b
fix net and extractor destroy order when use vulkan ( #2732 )
5 years ago
nihuini
b51959802c
fix buffer2host copy, fix #2725
5 years ago
Xu Yang
fd634e9a58
remove unnecessary mat clone when NCNN_BENCHMARK enabled ( #2708 )
5 years ago
Dahan Gong
cbd410c237
fix broken inplace forward ( #2709 )
5 years ago
Youngsoo Lee
b9bed8d993
feat: add denormal options ( #2656 )
* feat: add denormal options
Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ) are modes that bypass the IEEE 754 handling of denormal floating-point numbers on x86_64 and some x86 CPUs.
* feat: Integrate `flush_denormals` into `Extractor::extract`
* chore: replace global variable with `ThreadLocalStorage`
5 years ago
nihui
9fd4d371ae
bridge image for adreno image upload and download ( #2658 )
* add bridge image for adreno image storage upload and download
* enable sbn1, print bugbilz flag
* blacklist old adreno
* let user choose use_image_storage option even when bug_storage_buffer_no_l1
5 years ago
nihuini
2a57ca4942
reduce memory usage in lightmode, handle upload image allocation failure properly
5 years ago
nihuini
bd68ee487b
fallback to cpu when image allocation failed, fix #2648
5 years ago
nihui
af7d8184aa
handle image allocation failure properly
5 years ago
nihui
09b2bf6213
Break down forward_layer ( #2577 )
5 years ago
nihui
54c0a13b9f
build shared library ( #2525 )
* build shared lib and enable lto
* reserved for layer and option
* allocator pimpl
* datareader pimpl
* paramdict pimpl, disable copy assign for allocator and datareader
* modelbin pimpl
* net extractor pimpl
* gpu pimpl
* disable copy assign vulkandevice, code format
* command pimpl, dummy image readonly
* pipeline pipelinecache pimpl, export platform class
* code format, export simple family
* update ci
* disable lto on android armv7, merge webassembly ci
* link libgcc, fix macos dylib version
* pipeline pimpl, gpu info pimpl
* destroy gpu info after vulkan device
* ignore msvc stl class warning
* fix ncnn_paramdict_get_float return type
* fix vktransfer upload fp16 without flatten, add command test
5 years ago
nihui
1040f40c8b
update c api for custom allocator datareader modelbin and layer registration, add cookie userdata to layer
5 years ago
nihui
79efe33fdc
cmake option for platform api uses ( #2502 )
* cmake option for platform api uses
* android gpu ci does not rely on glslangvalidator, add android termux ci
5 years ago
nihui
343bc3b7dc
single blob consumer ( #2493 )
5 years ago
Zhuo Zhang
3c99287da5
fix src/net.cpp missing-field-initializers warning ( #2494 )
5 years ago
maxfy1992
0f325d7910
decrease unpack/pack overhead ( #2489 )
Co-authored-by: yangfengmax <yangfengmax@didichuxing.com>
5 years ago
Cai Shanli
a9df4f6c59
add custom layer destroyer ( #2481 )
* add custom layer destroyer
* set default layer destroyer with 0
5 years ago
Martin Han
b441f738bd
Extract on CPU without pack/fp16fp32 ( #2288 )
* Add readme for keras2ncnn
* Add supported model variants
* Fix supported model variants
* Add extract without convert pack/fp16fp32
5 years ago
PENGUINLIONG
8f8f2de4d0
SSE2 optimization pack ( #2123 )
* SSE2: BatchNorm
* Fixed batch norm in AVX configuration
* Optimized register size switch
* Attempt to pass CI
* Attempt to pass CI
* Bias op
* Element wise ops
* Support packing on x86 by default
* Fixed macro range in bias
* Use aligned read for packed data
* Update testutil.h
* Update pooling_x86.cpp
* Support wasm SIMD
* Fix emscripten compiler flags
* fix build
* more ci fix
* concat x86 pack4
* flatten x86 pack4
* more x86 pack4
* ci pass
* fix
* enable sse2 mathfun
* enable --experimental-wasm-simd
Co-authored-by: nihui <shuizhuyuanluo@126.com>
Co-authored-by: nihuini <nihuini@tencent.com>
5 years ago
nihui
cf3cf83cd3
unified image shader storage type ( #2231 )
* drop bug_layout_binding_id_alias flag
5 years ago
nihuini
b766c8cd9e
fix potential divide by zero fault when bf16s / fp16s enabled, fix #2125
5 years ago
nihuini
a334513b5e
fp16a option fix
5 years ago
nihuini
e841ae73c6
fix arm fp16s feat output, fix #2003
5 years ago
nihui
54e79a62d7
fix crash on non-arm82 build
5 years ago
nihui
c173d51c9b
mish sigmoid swish tanh arm fp16s
5 years ago
nihui
71f86af8a6
fix non-arm82 ci
5 years ago
nihui
9a2e2a6937
convert fp32 blobs for layers with fp16 storage support
5 years ago
nihui
308145254e
mask bf16 option in layer forward, disable gpu when bf16 enabled, fix #1962
5 years ago
nihui
71dc13625f
disable bf16 storage for int8 inference
5 years ago