nihuini
3631c1933d
non-inlined addref and release slow down overall speed, move them to header
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
d7cbc055f3
fix illegal instruction on pi4 when NCNN_ARM82 enabled
compiler may compile inline member functions as noinline blocks for different architectures, and linker may pick the newer arch, which results in illegal instructions on old hardware
5 years ago
teng
cc8e7a13fa
Update operation-param-weight-table ( #2811 )
5 years ago
DC Technology
86e53b35db
add Apple M1 benchmark ( #2810 )
* add Apple M1 benchmark
* Update README.md
5 years ago
nihuini
256754bff9
fix build with old gcc, fix #2805
5 years ago
teng
a90a31d340
doc spelling mistakes ( #2804 )
5 years ago
zhiliu6
61cd9da55b
optimize x86 3x3 pack8 leftover ( #2797 )
5 years ago
nihuini
912e81d086
fix tanh neon, fix #2751
5 years ago
nihuini
6c41822d35
do not remove optional hidden output for lstm/gru/rnn
5 years ago
nihuini
5e85f447e6
convert mxnet channel padding, fix some lgtm warnings
5 years ago
Cai Shanli
8011aea8c8
Update release-python.yml ( #2794 )
remove build wheel for ppc64le
5 years ago
nihui
32b48f0157
fix int8 auto pack layout
5 years ago
nihui
1ea8bfbd2e
x86 avx2 conv3x3s1 pack8 direct optimization, fix #2789
5 years ago
ncnnnnn
6e6cb9f4f3
simple sort ncnn_add_layer_test ( #2790 )
for obsessive
5 years ago
nihui
a48bf43ef7
test conv/fc int8 with activation
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
nihui
d4a7abc218
fix onnx2ncnn clip without max blob, fix #2788
5 years ago
nihui
eaee64c782
export vkcommand and vkcompute
5 years ago
nihuini
e86799e95f
fix get_big_cpu_count returning zero on smp cpu
5 years ago
nihui
7c079b853e
default to big cpu count
5 years ago
restyled-io[bot]
5f00ba89d2
feat(ncnnoptimize): replace denormals to zero on layers with weights ( #2690 )
* feat(ncnnoptimize): replace denormals to zero on layers with weights
Co-authored-by: youngsoo.lee <youngsoo15.lee@gmail.com>
Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
nihui
19c183cacc
fix ncnnoptimize default cut range
5 years ago
restyled-io[bot]
5e565cfa8a
Restyle add net Cut Function ( #2763 )
* add ncnnoptimise cut net function
* add ncnnoptimise cut net function
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
Co-authored-by: chenty <admin@chenty.com>
Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
Cai Shanli
e9a61b3e64
fix python return value policy ( #2756 )
5 years ago
nihui
67e24e0703
use local pool allocator ( #2736 )
* use local pool allocator
* detach extract feat from local allocator
* fix test
5 years ago
Cai Shanli
64ef781a9c
add test python vulkan ( #2754 )
* add test python vulkan
* add python vulkan ci on linux with 3.6 and 3.8
* fix ci vulkan icd
5 years ago
nihuini
15d63ec0f5
fuse onnx multiheadattention with same qkv blob
5 years ago
Cai Shanli
f5b307689b
fix net and extractor destroy order when use vulkan ( #2732 )
5 years ago
RBelogorodtsevFBase
1212ed6e94
implements gelu activation ( #2749 )
5 years ago
PENGUINLIONG
2d31868389
Troubleshooting guide for Vulkan on NVIDIA GPUs ( #2731 )
5 years ago
nihui
b58cd14678
fix non arm-neon build
5 years ago
nihui
0870bf45b1
optimize warpaffine family
5 years ago
nihuini
e449435dbe
fix mlir2ncnn warning, prettier alignment :)
5 years ago
nihuini
c8ccccf045
adapt mlir changes
5 years ago
nihuini
c17eb4e208
multiheadattention layer
5 years ago
Zhiqiang Wang
2c370b2c58
Fix typo in docs ( #2727 )
5 years ago
nihuini
b51959802c
fix buffer2host copy, fix #2725
5 years ago
nihuini
b0d16325b1
fuse onnx binaryop with scalar
5 years ago
Cai Shanli
2edb3ed7a4
nanodet python demo ( #2723 )
* nanodet python demo
* add clip
* fix clip wh
* remove nms package requirement
5 years ago
nihuini
f7cbcaa72b
fix onnx normalize expand ghost shape
5 years ago
nihuini
c910574b5b
fuse onnx multiheadattention
5 years ago
teng
c3466a7798
fix array index out of bounds in examples/yolact.cpp ( #2722 )
5 years ago
RangiLyu
ecf1f413b4
fix duplicate variable name in examples/nanodet.cpp ( #2719 )
5 years ago
nihuini
f2a5ea7678
fix layernorm ghost input without affine
5 years ago
nihuini
7ac23ab34d
fuse onnx layernorm, fix 2-dim layernorm implementation, add test
5 years ago
zhiliu6
57397c418d
Optimize general AVX2 convolution. ( #2714 )
5 years ago
Xu Yang
fd634e9a58
remove unnecessary mat clone when NCNN_BENCHMARK enabled ( #2708 )
5 years ago
Dahan Gong
cbd410c237
fix broken inplace forward ( #2709 )
5 years ago
restyled-io[bot]
8c9bea2322
Restyle faster bbox calculation by background score ( #2693 )
* faster bbox calculation by background score
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
Co-authored-by: Qoo <r97922153@gmail.com>
Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago