nihui
9ecf6a61be
x86 optimization for convolution int8 gemm unified elempack ( #4881 )
2 years ago
nihui
6c21b08727
check loongarch lasx and enable ( #4820 )
2 years ago
nihui
7fb16be32a
fix aarch64 build without fp16 conversion intrinsics ( #4713 )
* fix aarch64 build without fp16 conversion intrinsics
* vfpv4 always implies neon
3 years ago
nihui
d2d012dce5
x86 bfloat16 cast functions ( #4491 )
* simplify cast fp16 avx512 dispatch
* define sse4.1 macro on msvc avx+
3 years ago
nihui
15761fc1a6
arm vfpv4 asimdhp asimdfhm optimization for gemm ( #4432 )
3 years ago
nihui
7b3261dace
gemm arm optimization ( #4426 )
* cmake determine target 32bit and 64bit
* include opt source with non-runtime cpu
* check compiler support gnu style inline assembly
3 years ago
nihui
c934c6e94a
fix openmp affinity abort when cpu goes offline ( #4370 )
3 years ago
nihui
f527fe88ee
update glslang ( #4361 )
3 years ago
junchao-loongson
279222c2c9
add vector optimization for loongarch64 ( #4242 )
3 years ago
Xavier Hsinyuan
e7eadca6c1
RVV: use new interface for segment load/store & change word_type to size_t&add clang ci (part #4100 ) ( #4118 )
* RVV: use size_t for vl
* RVV: replace vsseg.v tuple type by using regex
-----
search:
vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);
substitute by:
vsseg$1e$2_v_$3$2m$4($5, $6, vl);
* RVV: replace vssseg.v tuple types by using regex
---
search:
vssseg([1-9])e(8|16|32)_v_f\2m1x\1\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);
substitute by:
vssseg$1e$2_v_f$2m1($3, $4, vl);
* RVV: replace vlseg.v tuple types in load/store
* RVV: replace vloxseg2ei32.v tuple types
* RVV: add a wrapper for old compilers
* RVV: add segment load/store wrapper in packing
* RVV: fix cmake test
* RVV: make clang happy by dropping VLAs in sgemm
* RVV: add clang cmake toolchain configure
* RVV: add clang ci, riscv64-unknown-linux-gnu
Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
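The two search/substitute pairs quoted in the RVV commit above can be tried on a single line of source with a short script. This is only a sketch: it assumes the patterns translate to Python's `re` syntax (with `$1` written as `\g<1>`), and the helper name `migrate_segment_store` and the sample call sites are hypothetical, not taken from the ncnn tree.

```python
import re

# Patterns copied from the commit message. "[ -~]+" matches a run of
# printable ASCII characters (space through tilde).
VSSEG = re.compile(
    r"vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1"
    r"\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);"
)
VSSEG_SUB = r"vsseg\g<1>e\g<2>_v_\g<3>\g<2>m\g<4>(\g<5>, \g<6>, vl);"

VSSSEG = re.compile(
    r"vssseg([1-9])e(8|16|32)_v_f\2m1x\1"
    r"\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);"
)
VSSSEG_SUB = r"vssseg\g<1>e\g<2>_v_f\g<2>m1(\g<3>, \g<4>, vl);"


def migrate_segment_store(line: str) -> str:
    """Rewrite tuple-type segment stores to the non-tuple form (hypothetical helper)."""
    line = VSSEG.sub(VSSEG_SUB, line)
    line = VSSSEG.sub(VSSSEG_SUB, line)
    return line


if __name__ == "__main__":
    # Hypothetical call sites, for illustration only.
    print(migrate_segment_store(
        "vsseg2e32_v_f32m1x2(outptr, vcreate_f32m1x2(v0, v1), vl);"))
    print(migrate_segment_store(
        "vssseg2e32_v_f32m1x2(ptr, stride, vcreate_f32m1x2(a, b), vl);"))
```

Lines that match neither pattern pass through unchanged, so the helper can be mapped over a whole file.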
nihui
b4ba207c18
more strict compiler rvv checks, drop rvv-071 support ( #4094 )
3 years ago
nihui
76849cede4
armv8.4 i8mm optimization for convolution gemm int8 ( #4034 )
3 years ago
nihui
dd86cebab8
armv8.6 ci and coverage ( #4025 )
* asimdfhm in fc
* move neon bf16 conversion function to arm_usability header
* fix cmake option
* fix build with newer gcc
* arm84 coverage
* arm asimdfhm optimization for innerproduct gemm fp16s
3 years ago
nihui
b85bfb6085
armv8.2 asimdfhm and armv8.4 bf16 i8mm and armv8.6 sve sve2 compiler flags and runtime detection functions ( #3964 )
3 years ago
nihui
440bfdd2cc
x86 f16c optimization for innerproduct ( #3944 )
3 years ago
nihui
1fd7138d2f
armv7 vfpv4 infrastructure ( #3929 )
* armv7 vfpv4 infrastructure
* optional fp16 format ieee
* arm neon assembly optimization for cast fp16/bf16
3 years ago
nihui
1377acf945
avx512 bf16 fp16 infrastructure ( #3926 )
3 years ago
nihui
7886e90c65
split arm82 source for smaller binary and memory footprint ( #3877 )
* split arm82 source, wip
* check compiler arm82 only for arm 64bit target
* drop arm82 registry
* strict check compiler support arm82
4 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
Xavier Hsinyuan
29b6a32ac0
RVV: follow intrinsic doc, replace vfredsum_* with vfredusum_* ( #3790 )
* RVV: follow intrinsic doc, vfredsum -> vfredusum
* C906: change toolchains for vfredusum
* RVV: test compiler for vfredusum_vs_*
4 years ago
nihui
9826f3dbf8
shader include vulkan activation, workaround for moltenvk tanh half4 issue ( #3711 )
4 years ago
nihui
dadc640c66
x86 avx512 optimization ( #3581 )
* unified relu avx512
* unified clip avx512
* unaryop avx512
* sigmoid avx512
* binaryop avx512
* padding convolution avx512
* convolutiondepthwise avx512
* innerproduct avx512
* reshape avx512
* slice avx512
* hardsigmoid hardswish avx512
* swish avx512
* pooling avx512
* crop avx512
* convolution sgemm pack16
* convolution 3x3 winograd pack16
* interp avx512
* convolution sgemm pack1to16
* convolution sgemm pack16to8
* convolution sgemm pack8to16
* convolution sgemm pack16to4
* fix vulkan permute pack8
* fix vulkan convolution gemm pack8to1
4 years ago
nihui
a9c59bb93c
add -mavx512bw flag for avx512 build ( #3671 )
4 years ago
nihui
4eb279ce26
add loongson mmi compiler header, less msa prefetch distance ( #3678 )
4 years ago
nihui
1fcad0e765
loongson mmi optional layer
4 years ago
nihui
457e066eb5
x86 f16c infrastructure ( #3577 )
4 years ago
nihui
4654030541
decouple x86 fma avx2 ( #3560 )
4 years ago
nihuini
51ecc33d9d
check avx512vl extension for discarding old-slow avx512 chips, enable avx512 option by default
4 years ago
nihui
672daa7e04
xop infrastructure and optimization ( #3541 )
4 years ago
nihui
930c36ebe2
avx512 infrastructure ( #3407 )
4 years ago
nihui
878cb713d5
optional arm82 dot source ( #3415 )
4 years ago
nihuini
11794675f3
apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path
4 years ago
nihuini
affbefe311
some space cleanup, blob clone from allocator
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
Tijmen Verhulsdonck
a7f301a99d
Add clang compatibility ( #3071 )
* Add clang compatibility
Add ability to build NCNN lib on windows with clang GNU
* Restyled/pull 3071 (#16 )
* [skip ci] Restyled by clang-format
* [skip ci] Restyled by astyle
* [skip ci] Restyled by clang-format
* [skip ci] Restyled by astyle
Co-authored-by: Restyled.io <commits@restyled.io>
4 years ago
nihui
1c31ac2549
runtime cpu dispatch for mips msa and loongson mmi
4 years ago
nihui
2f70343aec
cmake clean ( #3032 )
4 years ago
nihui
bcbb55f033
apple device always has armv8.2 dot ( #2963 )
5 years ago
nihuini
afc02d57f9
runtime detect armv8.2 dotprod
5 years ago
nihui
11958424c2
runtime riscv v and zfh dispatch, riscv v optimization for cast
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
3ed6c21565
find threads in cmake config
5 years ago
nihui
14d319db36
include arm82 on native macos arm64
5 years ago
nihui
54c0a13b9f
build shared library ( #2525 )
* build shared lib and enable lto
* reserved for layer and option
* allocator pimpl
* datareader pimpl
* paramdict pimpl, disable copy assign for allocator and datareader
* modelbin pimpl
* net extractor pimpl
* gpu pimpl
* disable copy assign vulkandevice, code format
* command pimpl, dummy image readonly
* pipeline pipelinecache pimpl, export platform class
* code format, export simple family
* update ci
* disable lto on android armv7, merge webassembly ci
* link libgcc, fix macos dylib version
* pipeline pimpl, gpu info pimpl
* destroy gpu info after vulkan device
* ignore msvc stl class warning
* fix ncnn_paramdict_get_float return type
* fix vktransfer upload fp16 without flatten, add command test
5 years ago
nihui
1040f40c8b
update c api for custom allocator datareader modelbin and layer registration, add cookie userdata to layer
5 years ago
Cai Shanli
a9df4f6c59
add custom layer destroyer ( #2481 )
* add custom layer destroyer
* set default layer destroyer with 0
5 years ago
nihui
e93ad408c5
Ci release ( #2440 )
* package openmp and glslang together
* find glslang targets in lib64
* define version string
* update moltenvk
5 years ago
nihui
f4f790ca1f
ci macos arm64 ( #2321 )
5 years ago
nihui
b9296c259d
bring up vulkan 1.1 ( #2191 )
* query subgroup features
* compile spirv 1.3
* drop offline spirv build
* do not build tests for android and ios, as they are never tested anyway
* code style
5 years ago
youzainn
3b1b41ec0b
Add some compile options, add vulkan dependency export ( #2062 )
* vulkan cmake export template
* 1) vulkan cmake dependency export. 2) support opencv_world import. 3) add BUILD_WITH_STATIC_CRT option
* Threads dependency
* NCNN_BUILD_WITH_STATIC_CRT option
* we do not support cmake before version 3.15 for option NCNN_BUILD_WITH_STATIC_CRT
5 years ago