nihui
171b9d1bba
use spdx license header, copyright Tencent ( #6152 )
11 months ago
nihui
19caca3140
port rvv intrinsic 1.0+ ( #5642 )
* zfh zvfh xtheadvector infra
* dispatch for rvv and xtheadvector
* dispatch for non-vector zfh
* port xtheadvector recp rsqrt trunc
* general rvv gemm
* c906 and c910 ci
* old tuple code clean
* update riscv64 ci
* update build doc
* drop old th1520 toolchain
1 year ago
nihui
9cefe9a624
avx vnni int8, avx vnni int16, avx ne convert infrastructure ( #5749 )
1 year ago
nihui
997c8926d7
use ruapu detection only on windows arm, enable cpu powerinfo with mingw compiler ( #5593 )
1 year ago
nihui
804ac3421d
infrastructure and optimization for a53 and a55 ( #4596 )
* new api for detecting arm midr and a53 a55 arch info wrapper
* let a35 be a53 :P
* a53 bf16s
* detect running core
3 years ago
nihui
18fbaebe68
get cpu l2 cache size and resolve gemm tile size ( #4411 )
* get cpu l2 cache size and resolve gemm tile size
* optimize constant tile K
* fix per-core l2 cache detection, better macos cpu cluster topology discovery
3 years ago
junchao-loongson
279222c2c9
add vector optimization for loongarch64 ( #4242 )
3 years ago
nihui
b853b3d132
get_physical_cpu_count api family ( #4302 )
* get_physical_cpu_count api family
* set default to physical big cpu
* always treat smt core as big core
* is_smt_cpu
* get max freq mhz on windows
* windows thread affinity
3 years ago
nihui
5725c028c0
arm dsp infrastructure and optimization for convolution gemm int8 ( #4011 )
3 years ago
nihui
b85bfb6085
armv8.2 asimdfhm and armv8.4 bf16 i8mm and armv8.6 sve sve2 compiler flags and runtime detection functions ( #3964 )
4 years ago
nihui
1377acf945
avx512 bf16 fp16 infrastructure ( #3926 )
4 years ago
nihui
32560f47de
detect more baseline avx512 flags ( #3687 )
4 years ago
nihui
457e066eb5
x86 f16c infrastructure ( #3577 )
4 years ago
nihui
4654030541
decouple x86 fma avx2 ( #3560 )
4 years ago
nihuini
51ecc33d9d
check avx512vl extension for discarding old-slow avx512 chips, enable avx512 option by default
4 years ago
nihui
672daa7e04
xop infrastructure and optimization ( #3541 )
4 years ago
nihui
930c36ebe2
avx512 infrastructure ( #3407 )
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
nihui
1c31ac2549
runtime cpu dispatch for mips msa and loongson mmi
5 years ago
nihuini
afc02d57f9
runtime detect armv8.2 dotprod
5 years ago
nihui
17936e9f54
fix packing risc-v test, add cpu_riscv_vlenb()
5 years ago
nihui
11958424c2
runtime riscv v and zfh dispatch, riscv v optimization for cast
5 years ago
nihui
45bf3cd779
add runtime riscv v detection function, the initial c906 riscv linux toolchain
5 years ago
Youngsoo Lee
b9bed8d993
feat: add denormal options ( #2656 )
* feat: add denormal options
Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ) are modes that bypass the IEEE 754 handling of denormal floating-point numbers on x86_64 and some x86 CPUs.
* feat: Integrate `flush_denormals` into `Extractor::extract`
* chore: replace global variable with `ThreadLocalStorage`
5 years ago
nihui
54c0a13b9f
build shared library ( #2525 )
* build shared lib and enable lto
* reserved for layer and option
* allocator pimpl
* datareader pimpl
* paramdict pimpl, disable copy assign for allocator and datareader
* modelbin pimpl
* net extractor pimpl
* gpu pimpl
* disable copy assign vulkandevice, code format
* command pimpl, dummy image readonly
* pipeline pipelinecache pimpl, export platform class
* code format, export simple family
* update ci
* disable lto on android armv7, merge webassembly ci
* link libgcc, fix macos dylib version
* pipeline pimpl, gpu info pimpl
* destroy gpu info after vulkan device
* ignore msvc stl class warning
* fix ncnn_paramdict_get_float return type
* fix vktransfer upload fp16 without flatten, add command test
5 years ago
Zhuo Zhang
0bade9e6d0
fix typo in cpu.h's comment ( #2538 )
5 years ago
nihui
1184404fbf
support Apple M1 Silicon ( #2335 )
* recognize apple M1
* macos ios cpuset and thread affinity
* big little cpu topology for macos and ios
* silence affinity not supported error
5 years ago
Zhuo Zhang
f05999e792
get_little_cpu_count() and get_big_cpu_count() ( #2247 )
5 years ago
tpoisonooo
2c8288555c
fix(cpu): cpu number bigger than 64 ( #2083 )
Co-authored-by: nihui <shuizhuyuanluo@126.com>
5 years ago
nihuini
4e4f0baa73
set openmp blocktime 20 for reducing power consumption, blocktime option
5 years ago
nihui
bb5bfe3841
avx2 infrastructure ( #1943 )
5 years ago
nihuini
4c6bf24205
explicit cpu thread affinity
6 years ago
nihui
c819b4d839
fix build without openmp
6 years ago
nihuini
b7db8be4f6
add ncnn source qwq
9 years ago