Galasnow
5b5c1fdb8f
Fix build error with NDK r27 ( #5615 )
Enable policy CMP0057 for cmake version >= 3.3
1 year ago
佰阅
391152f500
c_api support set_vulkan_device ( #5610 )
1 year ago
quink
92e0b8253b
arm/convolution_3x3_pack1to8_fp16s: prefer ldr/str over ld1/st1 ( #5603 )
Depending on the architecture, ldr/str can be faster than ld1/st1, especially
when loading into a single lane. For example, on Cortex A75,
1. the execution latency of 'ldr q0' and 'ldr h0' is 5
2. the execution latency of 'ld1 {v0.16b}' is 6
3. the execution latency of 'ld1 {v0.h}[0]' is 8
On Cortex X3,
1. the execution latency of 'ldr q0' and 'ldr h0' is 6
2. the execution latency of 'ld1 {v0.16b}' is 6
3. the execution latency of 'ld1 {v0.h}[0]' is 8
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
張小凡
051b04ffb4
Updated use-ncnn-with-pytorch-or-onnx document ( #5557 )
1 year ago
lll143653
d355b6dc5b
Add warning and recommend to use pnnx ( #5588 )
1 year ago
nihui
3ee5c18f84
pnnx logaddexp ( #5598 )
1 year ago
nihui
e82015878c
Update modelwriter.h for mha scale param
1 year ago
nihui
f825d3a23c
pnnx fuse onnx sdpa pattern and ncnn qdim mha fusion ( #5589 )
1 year ago
nihui
997c8926d7
use ruapu detection only on windows arm, enable cpu powerinfo with mingw compiler ( #5593 )
1 year ago
zhangyang2057
081a9c39c8
Fix tanh typo for rvv. ( #5584 )
* Fix tanh typo for rvv.
* Fix tanh for rvv fp16.
1 year ago
nihui
569617f212
pnnx convert onnx expand/permute/repeat/reshape/select/slice/cat/ceil/chunk/flatten/floor/maximum/minimum/split/squeeze/stack/transpose/unbind/unsqueeze ( #5583 )
1 year ago
nihui
e7cae68a22
pnnx convert onnx logsoftmax/logsigmoid/mish/selu/sigmoid/silu/softmin/softplus/softshrink/softsign/tanh/tanhshrink ( #5581 )
1 year ago
nihui
1c40615b2d
pnnx convert onnx sdpa reduce min/max/mean/sum/prod ( #5579 )
* pnnx convert onnx sdpa
* test reduce
1 year ago
nihui
3752d71200
fix potential fp16s bf16s conflicts on arm vfpv4 ( #5578 )
* fix potential fp16s bf16s conflicts on armv7 vfpv4
* but prefer fp16 on armv8.2
1 year ago
nihui
c59885aeac
pnnx convert onnx multiheadattention ( #5575 )
* pnnx convert onnx multiheadattention
* onnx reducemean reducesum
* reducemax reducemin reduceprod
* mask buggy torch
* avoid shadow output
1 year ago
nihui
854678b5f3
pnnx convert onnx prelu gelu elu leakyrelu relu6 celu hardshrink hardsigmoid hardswish clip ( #5572 )
1 year ago
luxincn
02327ba96f
add esp32 build document and ci, refs #5536 ( #5567 )
1 year ago
UOPiceman
74037b49f8
Add Axera AX630C benchmark ( #5559 )
1 year ago
inspireMeNow
1225793e56
benchmark: add Snapdragon 765G and CVITEK SG2000 ( #5555 )
1 year ago
青菜萝 卜冬瓜
9f62738e7c
Add RaspberryPi 5 CPU Overclock benchmark. ( #5561 )
1 year ago
nihui
d264b6353a
pnnx convert onnx rnn lstm gru ( #5553 )
1 year ago
quink
a05c113f80
Add wasi support ( #5534 )
* benchmark: Don't use std::thread when NCNN_THREADS is OFF
* Add wasi support
    cmake -B build -DCMAKE_BUILD_TYPE=Release \
        --toolchain ${wasi_sdk}/share/cmake/wasi-sdk.cmake \
        -DNCNN_RUNTIME_CPU=OFF \
        -DNCNN_DISABLE_EXCEPTION=ON \
        -DNCNN_THREADS=OFF
    cmake --build build
After the build, you can run benchncnn from the command line with wasmtime:
wasmtime --dir . benchncnn
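The first bullet of this commit (no std::thread when NCNN_THREADS is OFF) can be illustrated with a minimal sketch. This is not the actual benchncnn change: `run_job` and `run_warmup_once` are hypothetical names, and `NCNN_THREADS` is assumed to be supplied by the build configuration, defaulting to 0 here so the snippet compiles on its own.

```cpp
#include <cassert>

// Assumption: NCNN_THREADS normally comes from the build configuration.
// It defaults to 0 here so the sketch compiles standalone.
#ifndef NCNN_THREADS
#define NCNN_THREADS 0
#endif

#if NCNN_THREADS
#include <thread>
#endif

// Hypothetical helper: run a job on a worker thread when threads are
// available, otherwise call it inline on the current thread.
template<typename F>
static void run_job(F job)
{
#if NCNN_THREADS
    std::thread t(job);
    t.join();
#else
    job();
#endif
}

// Hypothetical usage: run one job and report how many times it executed.
static int run_warmup_once()
{
    int runs = 0;
    run_job([&runs]() { runs++; });
    return runs;
}
```

With the guard in place, a target without thread support, such as the WASI build configured above with -DNCNN_THREADS=OFF, never includes <thread>, while the job still runs exactly once.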
1 year ago
py1066
317635b2fb
add OrangePi CM4 ( #5551 )
1 year ago
TianZer
fc6b753d31
Add mingw ci and building document ( #5547 )
1 year ago
inspireMeNow
e48c8aa21c
benchmark: add OrangePi 5 Plus ( #5550 )
1 year ago
青菜萝 卜冬瓜
5d4f0213ce
Add Snapdragon 888 benchmark. ( #5548 )
1 year ago
nihui
74d3eb2345
pnnx convert onnx layernorm instancenorm groupnorm ( #5533 )
* pnnx convert onnx layernorm
* fuse early
* skip layernorm affine false test for torch 2.1
* pnnx convert onnx layernorm instancenorm groupnorm
* take num_features from input shape for instancenorm module
* torch < 1.10 cannot handle track_running_stats=True
1 year ago
nihui
98de53920e
pnnx convert onnx lrn, fuse if with constant condition, canonicalize and check foldable inside subgraph ( #5532 )
1 year ago
nihui
755f8e1a6b
pnnx convert onnx batchnorm ( #5529 )
1 year ago
nihui
109ed2665c
pnnx convert onnx avgpool maxpool ( #5527 )
* pnnx convert onnx avgpool
* fuse gather indices
* generalized conv convtranspose
* skip maxpool dilation test for torch < 1.12
* ceil mode for opset 9
1 year ago
sakria9
c8e6e18918
benchmark: add EPYC 7742 and V100 ( #5528 )
1 year ago
nihui
4c3debae2d
multiheadattention scale param ( #5526 )
* update swiftshader
* skip vs2017 swiftshader
1 year ago
nihui
f2a34ee7ae
update pybind11 to 2.12, support numpy 2 ( #5525 )
1 year ago
nihui
f56b18aaf0
pnnx convert onnx resize upsample ( #5522 )
1 year ago
Xyzhao
fbd6690d6c
fix: add NCNN_PLATFORM_API macro for VkAndroidHardwareBufferImageAllocator ( #5521 )
1 year ago
nihui
8235cad999
mha allow qdim differs from embed_dim ( #5519 )
* test mha oom
1 year ago
nihui
2828e7ae96
pnnx reset onnx input shape, convert torch.tile torch.where ( #5517 )
* pnnx reset onnx input shape
* eliminate noop cast
1 year ago
CharlieYu
b786af56f8
benchmark: add raspberry pi 5 benchmark after GPU overclock ( #5518 )
1 year ago
nihui
ffb2fe60ee
pnnx convert onnx pad linear sigmoid softmax relu ( #5516 )
* pnnx convert onnx pad
* pnnx convert onnx linear sigmoid softmax relu
* old onnx softmax
1 year ago
nihui
21babb7eed
pnnx convert onnx conv convtranspose ( #5515 )
1 year ago
nihui
e389a15846
update qcom855plus benchmark
1 year ago
nihui
39c27de47b
test concat oom ( #5502 )
1 year ago
nihui
093c516898
test slice oom ( #5501 )
1 year ago
Wei Wu
bb54d575a0
Update ruapu.h to the latest version. ( #5499 )
The updated ruapu adds support for more architectures, such as RISC-V, MIPS, and Loongson, and can detect more Arm features.
The latest version is 10b02b3755.
1 year ago
nihui
da7d1a10f7
test x86 arm convolution oom ( #5492 )
* skip mips loongarch riscv oom test for now
* test softmax oom
1 year ago
nihui
03ca9053c1
Update linux-x64-cpu-gcc.yml
1 year ago
nihui
102e98970f
fix unexpected abs error on powerpc vsx ( #5498 )
1 year ago
nihui
19ea54f266
more x86 vnni optimization for lstm ( #5496 )
* workaround vs2019 crash
1 year ago
nihui
debc33fee2
arm handle allocation failures ( #5490 )
1 year ago
nihui
b4379630fb
x86 handle allocation failures ( #5489 )
1 year ago