nihui
8d2ac57824
fix missing asimdfhm target macro in ndk-r21 ( #5804 )
1 year ago
nihui
9cefe9a624
avx vnni int8, avx vnni int16, avx ne convert infrastructure ( #5749 )
1 year ago
nihui
1c7af00499
gemm int8 quantization ( #5706 )
* quantize gemm
* write gemm quantize scales
* update doc
* less openmp args
* x86 riscv fallback
* skip gemm vulkan int8
* fix noint8 test, fix arm bf16 test
* enable vfpv4 on neon build only
* fix gemm vulkan without C
* fp16 pack8 output
* enable elempack=8 only for asimdhp+
* tiled gemm int8 test
* opt arm64 tiles, fix asimdhp dispatch
1 year ago
nihui
c4a007406d
windows clang ci ( #5469 )
* windows clang ci
* clang msvc use x86intrin.h for xop
* test arm64 compiler features
2 years ago
nihui
08b7d99a75
rnn/lstm/gru dynamic quantization ( #5435 )
2 years ago
nihui
fafb897ff7
update ios toolchain, add visionos ci, update watchos, ncnn target ilp32 ( #5399 )
2 years ago
nihui
556b79ce4d
create layer decoupled ( #5258 )
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts
2 years ago
nihui
058aa0ad37
enable arm neon intrinsics for msvc build ( #5151 )
2 years ago
nihui
b4f26237cb
in-house vulkan loader ( #5130 )
* vulkan-driver-loader.md
* static vulkan on apple
2 years ago
nihui
9ecf6a61be
x86 optimization for convolution int8 gemm unified elempack ( #4881 )
2 years ago
nihui
6c21b08727
check loongarch lasx and enable ( #4820 )
2 years ago
nihui
7fb16be32a
fix aarch64 build without fp16 conversion intrinsics ( #4713 )
* fix aarch64 build without fp16 conversion intrinsics
* vfpv4 always implies neon
3 years ago
nihui
d2d012dce5
x86 bfloat16 cast functions ( #4491 )
* simplify cast fp16 avx512 dispatch
* define sse4.1 macro on msvc avx+
3 years ago
nihui
15761fc1a6
arm vfpv4 asimdhp asimdfhm optimization for gemm ( #4432 )
3 years ago
nihui
7b3261dace
gemm arm optimization ( #4426 )
* cmake determine target 32bit and 64bit
* include opt source with non-runtime cpu
* check compiler support gnu style inline assembly
3 years ago
nihui
c934c6e94a
fix openmp affinity abort when cpu goes offline ( #4370 )
3 years ago
nihui
f527fe88ee
update glslang ( #4361 )
3 years ago
junchao-loongson
279222c2c9
add vector optimization for loongarch64 ( #4242 )
3 years ago
Xavier Hsinyuan
e7eadca6c1
RVV: use new interface for segment load/store & change word_type to size_t & add clang ci (part #4100 ) ( #4118 )
* RVV: use size_t for vl
* RVV: replace vsseg.v tuple type by using regex
-----
search:
vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);
substitute by:
vsseg$1e$2_v_$3$2m$4($5, $6, vl);
* RVV: replace vssseg.v tuple types by using regex
---
search:
vssseg([1-9])e(8|16|32)_v_f\2m1x\1\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);
substitute by:
vssseg$1e$2_v_f$2m1($3, $4, vl);
* RVV: replace vlseg.v tuple types in load/store
* RVV: replace vloxseg2ei32.v tuple types
* RVV: add a wrapper for old compilers
* RVV: add segment load/store wrapper in packing
* RVV: fix cmake test
* RVV: make clang happy by dropping VLAs in sgemm
* RVV: add clang cmake toolchain configure
* RVV: add clang ci, riscv64-unknown-linux-gnu
Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
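The first search/substitute pair recorded in commit e7eadca6c1 (the `vsseg` tuple-type rewrite) can be tried out with a short script. This is a minimal sketch, assuming Python's `re` module as the regex engine; the commit does not say which tool was actually used. The function name `rewrite_vsseg` and the sample operands are hypothetical.

```python
import re

# Search pattern copied verbatim from the commit message.
# Group 1: segment count, group 2: element width, group 3: type letter,
# group 4: LMUL, group 5: pointer expression, group 6: vcreate_ arguments.
VSSEG_SEARCH = re.compile(
    r"vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), "
    r"vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);"
)

# The commit writes group references as $1..$6; Python's re uses \g<n>.
VSSEG_REPLACE = r"vsseg\g<1>e\g<2>_v_\g<3>\g<2>m\g<4>(\g<5>, \g<6>, vl);"


def rewrite_vsseg(source: str) -> str:
    """Drop the tuple type and the vcreate_ wrapper from vsseg store calls."""
    return VSSEG_SEARCH.sub(VSSEG_REPLACE, source)


if __name__ == "__main__":
    # Hypothetical input line; ptr, a and b are placeholder operands.
    old = "vsseg2e32_v_f32m1x2(ptr, vcreate_f32m1x2(a, b), vl);"
    print(rewrite_vsseg(old))
```

Because the character class `[ -~]` covers only printable ASCII and excludes newlines, the pattern rewrites calls that fit on a single line and leaves everything else untouched.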
nihui
b4ba207c18
more strict compiler rvv checks, drop rvv-071 support ( #4094 )
3 years ago
nihui
76849cede4
armv8.4 i8mm optimization for convolution gemm int8 ( #4034 )
3 years ago
nihui
dd86cebab8
armv8.6 ci and coverage ( #4025 )
* asimdfhm in fc
* move neon bf16 conversion function to arm_usability header
* fix cmake option
* fix build with newer gcc
* arm84 coverage
* arm asimdfhm optimization for innerproduct gemm fp16s
3 years ago
nihui
b85bfb6085
armv8.2 asimdfhm and armv8.4 bf16 i8mm and armv8.6 sve sve2 compiler flags and runtime detection functions ( #3964 )
3 years ago
nihui
440bfdd2cc
x86 f16c optimization for innerproduct ( #3944 )
3 years ago
nihui
1fd7138d2f
armv7 vfpv4 infrastructure ( #3929 )
* armv7 vfpv4 infrastructure
* optional fp16 format ieee
* arm neon assembly optimization for cast fp16/bf16
3 years ago
nihui
1377acf945
avx512 bf16 fp16 infrastructure ( #3926 )
3 years ago
nihui
7886e90c65
split arm82 source for smaller binary and memory footprint ( #3877 )
* split arm82 source, wip
* check compiler arm82 only for arm 64bit target
* drop arm82 registry
* strict check compiler support arm82
4 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
Xavier Hsinyuan
29b6a32ac0
RVV: follow intrinsic doc, replace vfredsum_* with vfredusum_* ( #3790 )
* RVV: follow intrinsic doc, vfredsum -> vfredusum
* C906: change toolchains for vfredusum
* RVV: test compiler for vfredusum_vs_*
4 years ago
nihui
9826f3dbf8
shader include vulkan activation, workaround for moltenvk tanh half4 issue ( #3711 )
4 years ago
nihui
dadc640c66
x86 avx512 optimization ( #3581 )
* unified relu avx512
* unified clip avx512
* unaryop avx512
* sigmoid avx512
* binaryop avx512
* padding convolution avx512
* convolutiondepthwise avx512
* innerproduct avx512
* reshape avx512
* slice avx512
* hardsigmoid hardswish avx512
* swish avx512
* pooling avx512
* crop avx512
* convolution sgemm pack16
* convolution 3x3 winograd pack16
* interp avx512
* convolution sgemm pack1to16
* convolution sgemm pack16to8
* convolution sgemm pack8to16
* convolution sgemm pack16to4
* fix vulkan permute pack8
* fix vulkan convolution gemm pack8to1
4 years ago
nihui
a9c59bb93c
add -mavx512bw flag for avx512 build ( #3671 )
4 years ago
nihui
4eb279ce26
add loongson mmi compiler header, less msa prefetch distance ( #3678 )
4 years ago
nihui
1fcad0e765
loongson mmi optional layer
4 years ago
nihui
457e066eb5
x86 f16c infrastructure ( #3577 )
4 years ago
nihui
4654030541
decouple x86 fma avx2 ( #3560 )
4 years ago
nihuini
51ecc33d9d
check avx512vl extension for discarding old-slow avx512 chips, enable avx512 option by default
4 years ago
nihui
672daa7e04
xop infrastructure and optimization ( #3541 )
4 years ago
nihui
930c36ebe2
avx512 infrastructure ( #3407 )
4 years ago
nihui
878cb713d5
optional arm82 dot source ( #3415 )
4 years ago
nihuini
11794675f3
apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path
4 years ago
nihuini
affbefe311
some space cleanup, blob clone from allocator
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
Tijmen Verhulsdonck
a7f301a99d
Add clang compatibility ( #3071 )
* Add clang compatibility
Add ability to build NCNN lib on windows with clang GNU
* Restyled/pull 3071 (#16 )
* [skip ci] Restyled by clang-format
* [skip ci] Restyled by astyle
* [skip ci] Restyled by clang-format
* [skip ci] Restyled by astyle
Co-authored-by: Restyled.io <commits@restyled.io>
4 years ago
nihui
1c31ac2549
runtime cpu dispatch for mips msa and loongson mmi
4 years ago
nihui
2f70343aec
cmake clean ( #3032 )
4 years ago
nihui
bcbb55f033
apple device always has armv8.2 dot ( #2963 )
5 years ago
nihuini
afc02d57f9
runtime detect armv8.2 dotprod
5 years ago
nihui
11958424c2
runtime riscv v and zfh dispatch, riscv v optimization for cast
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago