peerless2012
1d1ad06459
Fix get_elf_hwcap always returning 0 on HarmonyOS NEXT ( #5951 )
1 year ago
Yexuan Wu
3571d7e8ec
Support a better API to detect big.LITTLE cores on Windows after Win7 ( #5927 )
1 year ago
nihui
19caca3140
port rvv intrinsic 1.0+ ( #5642 )
* zfh zvfh xtheadvector infra
* dispatch for rvv and xtheadvector
* dispatch for non-vector zfh
* port xtheadvector recp rsqrt trunc
* general rvv gemm
* c906 and c910 ci
* old tuple code clean
* update riscv64 ci
* update build doc
* drop old th1520 toolchain
1 year ago
nihui
9cefe9a624
avx vnni int8, avx vnni int16, avx ne convert infrastructure ( #5749 )
1 year ago
nihui
8105c75120
improve compatibility of harmonyos cpu topology abi ( #5740 )
1 year ago
nihui
997c8926d7
use ruapu detection only on windows arm, enable cpu powerinfo with mingw compiler ( #5593 )
1 year ago
quink
a05c113f80
Add wasi support ( #5534 )
* benchmark: Don't use std::thread when NCNN_THREADS is OFF
* Add wasi support
    cmake -B build -DCMAKE_BUILD_TYPE=Release \
        --toolchain ${wasi_sdk}/share/cmake/wasi-sdk.cmake \
        -DNCNN_RUNTIME_CPU=OFF \
        -DNCNN_DISABLE_EXCEPTION=ON \
        -DNCNN_THREADS=OFF
    cmake --build build
After build, you can run benchncnn on cmdline with wasmtime:
    wasmtime --dir . benchncnn
1 year ago
nihui
5a8f79f7c7
add apple A17 and M3 family macro ( #5405 )
2 years ago
Shatyuka
5a11c383a2
Support LLVM OpenMP runtime for MSVC ( #5370 )
with `/openmp:llvm` compile option
2 years ago
nihui
4797d19873
ruapu cpu isa detection ( #5341 )
2 years ago
nihui
984d6dd844
promote vfpv4 for auto fp16 storage conversion ( #5325 )
* promote vfpv4 for auto fp16 storage conversion
* always report neon and vfpv4 for arm64
2 years ago
nihui
5b536af234
fix uwp build ( #5328 )
2 years ago
nihui
87d7165848
disable signal based detectisa if being debugged ( #5280 )
2 years ago
nihui
d1d9aa2edb
fix some cpu.cpp warning ( #5244 )
2 years ago
nihui
b7f70cfe4e
initialize cpu thread affinity mask all to all cores ( #5231 )
calling omp_set_num_threads with zero num_threads is implementation-defined
2 years ago
nihui
1138312f1e
detect avx512 isa with signal action on macos ( #5185 )
2 years ago
nihui
058aa0ad37
enable arm neon intrinsics for msvc build ( #5151 )
2 years ago
nihui
3eb2969db9
fix build with ohos toolchain ( #5105 )
2 years ago
Shirui Cao
7646392d3e
Fix clang-cl.exe compatibility by using the correct cpuid() built-in function ( #4738 )
3 years ago
nihui
1b4a8fd4b2
fix warnings and code clean ( #4729 )
3 years ago
nihui
2b87dc2cf7
force global cpu info initialization ( #4725 )
* force global cpu info initialization
* sanitize zero nT
3 years ago
nihui
8c7c21b5fb
fix fp resource leak in cpu.cpp ( #4704 )
3 years ago
nihui
804ac3421d
infrastructure and optimization for a53 and a55 ( #4596 )
* new api for detecting arm midr and a53 a55 arch info wrapper
* let a35 be a53 :P
* a53 bf16s
* detect running core
3 years ago
nihui
06b97d7e69
fix exynos 9810 isa detection ( #4585 )
3 years ago
nihui
bbc770079e
silence fopen error on sysfs cache files
3 years ago
nihui
6869c81ed3
find cpu cache size from sysfs ( #4502 )
* find cpu cache size from sysfs
* android l3
* make g_thread_affinity_mask singleton
* global mask
3 years ago
nihui
17197b3c45
ci build with musl libc ( #4499 )
3 years ago
nihui
18fbaebe68
get cpu l2 cache size and resolve gemm tile size ( #4411 )
* get cpu l2 cache size and resolve gemm tile size
* optimize constant tile K
* fix per-core l2 cache detection, better macos cpu cluster topology discovery
3 years ago
nihui
c934c6e94a
fix openmp affinity abort when cpu goes offline ( #4370 )
3 years ago
junchao-loongson
279222c2c9
add vector optimization for loongarch64 ( #4242 )
3 years ago
nihui
b853b3d132
get_physical_cpu_count api family ( #4302 )
* get_physical_cpu_count api family
* set default to physical big cpu
* always treat smt core as big core
* is_smt_cpu
* get max freq mhz on windows
* windows thread affinity
3 years ago
nihui
512e584a6a
general cpu feature detection on macos/ios, enable bf16 and i8mm on a15 a16 and m2 ( #4300 )
3 years ago
nihui
5725c028c0
arm dsp infrastructure and optimization for convolution gemm int8 ( #4011 )
4 years ago
nihui
b85bfb6085
armv8.2 asimdfhm and armv8.4 bf16 i8mm and armv8.6 sve sve2 compiler flags and runtime detection functions ( #3964 )
4 years ago
nihui
1377acf945
avx512 bf16 fp16 infrastructure ( #3926 )
4 years ago
nihui
a061871c1c
use getauxval since android api 18 ( #3718 )
4 years ago
nihui
32560f47de
detect more baseline avx512 flags ( #3687 )
4 years ago
nihui
72c467d1d9
mips msa optimization for quantize dequantize requantize ( #3672 )
4 years ago
nihui
457e066eb5
x86 f16c infrastructure ( #3577 )
4 years ago
nihui
4654030541
decouple x86 fma avx2 ( #3560 )
4 years ago
nihuini
51ecc33d9d
check avx512vl extension for discarding old-slow avx512 chips, enable avx512 option by default
4 years ago
nihui
672daa7e04
xop infrastructure and optimization ( #3541 )
4 years ago
nihui
2d46994d2e
wrap avxvnni and avx512vnni build options over cpu feature detector
4 years ago
nihuini
9f7f491885
use the old-style __cpuid_count for old compiler compatibility, fix #3510
4 years ago
nihui
930c36ebe2
avx512 infrastructure ( #3407 )
4 years ago
nihui
f4f7fabe27
fix python wheel, parallel 3 for macos tasks ( #3396 )
4 years ago
nihui
ac3d32aa0d
get_elf_hwcap_from_getauxval ( #3301 )
4 years ago
Zhuo Zhang
492297d2f6
add A15 and M1 macro definitions ( #3263 )
4 years ago
nihuini
11794675f3
apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
5 years ago