* zfh zvfh xtheadvector infra
* dispatch for rvv and xtheadvector
* dispatch for non-vector zfh
* port xtheadvector recp rsqrt trunc
* general rvv gemm
* c906 and c910 ci
* clean up old tuple code
* update riscv64 ci
* update build doc
* drop old th1520 toolchain
* only supports hann, hamming and all-one window
* inverse spectrogram does not support length parameter
* spectrogram always returns torch.view_as_real(out), as ncnn does not support complex-typed mat yet
* inverse spectrogram always accepts torch.view_as_complex(in), as ncnn does not support complex-typed mat yet
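The real-valued layout mentioned in the last two items can be sketched in C as follows. This is only an illustration of the memory layout that torch.view_as_real exposes (a trailing dimension of size 2 holding the real part, then the imaginary part). The type complex_sample and the helpers pack_as_real and unpack_as_complex are hypothetical names, not part of ncnn or PyTorch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex sample type, for illustration only;
 * it is not an ncnn or PyTorch type. */
typedef struct
{
    float re;
    float im;
} complex_sample;

/* Pack n complex samples into 2*n floats stored as (re, im) pairs.
 * This mirrors the layout of torch.view_as_real, where the last
 * dimension has size 2. */
static void pack_as_real(const complex_sample* src, size_t n, float* dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
    {
        dst[2 * i] = src[i].re;
        dst[2 * i + 1] = src[i].im;
    }
}

/* Reverse direction: read (re, im) pairs back into complex samples,
 * which is the reinterpretation torch.view_as_complex performs. */
static void unpack_as_complex(const float* src, size_t n, complex_sample* dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
    {
        dst[i].re = src[2 * i];
        dst[i].im = src[2 * i + 1];
    }
}
```

A buffer packed this way can be held in an ordinary float tensor, which is why a real-valued view is enough until a complex element type is available.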
Depending on the architecture, ldr/str can be faster than ld1/st1,
especially in the single-lane load form. For example, on Cortex-A75:
1. the execution latency of 'ldr q0' and 'ldr h0' is 5
2. the execution latency of 'ld1 {v0.16b}' is 6
3. the execution latency of 'ld1 {v0.h}[0]' is 8
On Cortex-X3:
1. the execution latency of 'ldr q0' and 'ldr h0' is 6
2. the execution latency of 'ld1 {v0.16b}' is 6
3. the execution latency of 'ld1 {v0.h}[0]' is 8
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* benchmark: Don't use std::thread when NCNN_THREADS is OFF
* Add WASI support
    cmake -B build -DCMAKE_BUILD_TYPE=Release \
        --toolchain ${wasi_sdk}/share/cmake/wasi-sdk.cmake \
        -DNCNN_RUNTIME_CPU=OFF \
        -DNCNN_DISABLE_EXCEPTION=ON \
        -DNCNN_THREADS=OFF
    cmake --build build
After building, you can run benchncnn from the command line with wasmtime:
    wasmtime --dir . benchncnn
The updated ruapu adds support for multiple architectures such as RISC-V, MIPS, and Loongson, and can detect more Arm features.
The latest version is 10b02b3755.
There is a big difference between C and C++ here. In C, foo() declares
a function that takes an unspecified number of arguments, while
foo(void) declares a function that takes no arguments.
Fix -Wstrict-prototypes warning.
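A minimal sketch of the prototype style this warning asks for. The names answer, no_arg_fn and call are hypothetical and do not come from the ncnn sources. It assumes a compiler following a C standard before C23; from C23 onward, empty parentheses mean the same as (void).

```c
#include <assert.h>
#include <stddef.h>

/* Strict prototype: (void) states that the function takes no arguments,
 * so a call such as answer(1) is rejected at compile time.
 * Writing `static int answer();` instead would leave the parameters
 * unspecified before C23 and trigger -Wstrict-prototypes. */
static int answer(void);

/* The same rule applies to function pointer types. */
typedef int (*no_arg_fn)(void);

static int answer(void)
{
    return 42;
}

/* Call a no-argument function through a strictly typed pointer. */
static int call(no_arg_fn fn)
{
    assert(fn != NULL);
    return fn();
}
```

With the (void) form, the compiler checks every call site against the declaration, which is the behaviour C++ already gives for foo().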
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>