nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	8d2ac57824	fix missing asimdfhm target macro in ndk-r21 (#5804 )	1 year ago
nihui	9cefe9a624	avx vnni int8, avx vnni int16, avx ne convert infrastructure (#5749 )	1 year ago
nihui	1c7af00499	gemm int8 quantization (#5706 ) * quantize gemm * write gemm quantize scales * update doc * less openmp args * x86 riscv fallback * skip gemm vulkan int8 * fix noint8 test, fix arm bf16 test * enable vfpv4 on neon build only * fix gemm vulkan without C * fp16 pack8 output * enable elempack=8 only for asimdhp+ * tiled gemm int8 test * opt arm64 tiles, fix asimdhp dispatch	1 year ago
nihui	c4a007406d	windows clang ci (#5469 ) * windows clang ci * clang msvc use x86intrin.h for xop * test arm64 compiler features	2 years ago
nihui	08b7d99a75	rnn/lstm/gru dynamic quantization (#5435 )	2 years ago
nihui	fafb897ff7	update ios toolchain, add visionos ci, update watchos, ncnn target ilp32 (#5399 )	2 years ago
nihui	556b79ce4d	create layer decoupled (#5258 ) * create layer decoupled * no more virtual public * allow build test with shared library * decouple cpu vulkan * drop old scripts	2 years ago
nihui	058aa0ad37	enable arm neon intrinsics for msvc build (#5151 )	2 years ago
nihui	b4f26237cb	in-house vulkan loader (#5130 ) * vulkan-driver-loader.md * static vulkan on apple	2 years ago
nihui	9ecf6a61be	x86 optimization for convolution int8 gemm unified elempack (#4881 )	2 years ago
nihui	6c21b08727	check loongarch lasx and enable (#4820 )	2 years ago
nihui	7fb16be32a	fix aarch64 build without fp16 conversion intrinsics (#4713 ) * fix aarch64 build without fp16 conversion intrinsics * vfpv4 always implies neon	3 years ago
nihui	d2d012dce5	x86 bfloat16 cast functions (#4491 ) * simplify cast fp16 avx512 dispatch * define sse4.1 macro on msvc avx+	3 years ago
nihui	15761fc1a6	arm vfpv4 asimdhp asimdfhm optimization for gemm (#4432 )	3 years ago
nihui	7b3261dace	gemm arm optimization (#4426 ) * cmake determine target 32bit and 64bit * include opt source with non-runtime cpu * check compiler support gnu style inline assembly	3 years ago
nihui	c934c6e94a	fix openmp affinity abort when cpu goes offline (#4370 )	3 years ago
nihui	f527fe88ee	update glslang (#4361 )	3 years ago
junchao-loongson	279222c2c9	add vector optimization for loongarch64 (#4242 )	3 years ago
Xavier Hsinyuan	e7eadca6c1	RVV: use new interface for segment load/store & change word_type to size_t&add clang ci (part #4100 ) (#4118 ) * RVV: use size_t for vl * RVV: replace vsseg.v tuple type by using regex ----- search: vsseg([1-9])e(8\|16\|32)_v_(f\|i\|u)\2m(1\|2\|4\|8)x\1$([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)$, vl\); substitute by: vsseg$1e$2_v_$3$2m$4($5, $6, vl); * RVV: replace vssseg.v tuple types by using regex --- search: vssseg([1-9])e(8\|16\|32)_v_f\2m1x\1$([ -~]+), vcreate_f\2m1x\1\(([ -~]+)$, vl\); substitute by: vssseg$1e$2_v_f$2m1($3, $4, vl); * RVV: replace vlseg.v tuple types in load/store * RVV: replace vloxseg2ei32.v tuple types * RVV: add a wrapper for old compilers * RVV: add segment load/store wrapper in pakcing * RVV: fix cmake test * RVV: make clang happy by dropping VLAs in sgemm * RVV: add clang cmake toolchain configure * RVV: add clang ci, riscv64-unknown-linux-gnu Co-authored-by: thelastlin <thelastlin@users.noreply.github.com> Co-authored-by: nihui <shuizhuyuanluo@126.com>	3 years ago
nihui	b4ba207c18	more strict compiler rvv checks, drop rvv-071 support (#4094 )	3 years ago
nihui	76849cede4	armv8.4 i8mm optimization for convolution gemm int8 (#4034 )	3 years ago
nihui	dd86cebab8	armv8.6 ci and coverage (#4025 ) * asimdfhm in fc * move neon bf16 conversion function to arm_usability header * fix cmake option * fix build with newer gcc * arm84 coverage * arm asimdfhm optimization for innerproduct gemm fp16s	3 years ago
nihui	b85bfb6085	armv8.2 asimdfhm and armv8.4 bf16 i8mm and armv8.6 sve sve2 compiler flags and runtime detection functions (#3964 )	3 years ago
nihui	440bfdd2cc	x86 f16c optimization for innerproduct (#3944 )	3 years ago
nihui	1fd7138d2f	armv7 vfpv4 infrastructure (#3929 ) * armv7 vfpv4 infrastructure * optional fp16 format ieee * arm neon assembly optimization for cast fp16/bf16	3 years ago
nihui	1377acf945	avx512 bf16 fp16 infrastructure (#3926 )	3 years ago
nihui	7886e90c65	split arm82 source for smaller binary and memory footprint (#3877 ) * split arm82 source, wip * check compiler arm82 only for arm 64bit target * drop arm82 registery * strict check compiler support arm82	4 years ago
nihui	241524ffce	discard weight memory for x86 arm vulkan (#3865 ) * discard weight memory for x86 and vulkan * drop arm innerproduct weight * drop arm convolution weight * drop arm convolutiondepthwise weight * drop x86 vulkan deconvolution deconvolutiondepthwise weight * drop arm deconvolution deconvolutiondepthwise weight * arm neon assembly optimization for innerproduct pack4	4 years ago
Xavier Hsinyuan	29b6a32ac0	RVV: follow intrinsic doc, replace vfredsum_* with vfredusum_* (#3790 ) * RVV: follow intrinsic doc, vfredusum -> vfredsum * C906: change toolchains for vfredusum * RVV: test compiler for vfredusum_vs_*	4 years ago
nihui	9826f3dbf8	shader include vulkan activation, workaround for moltenvk tanh half4 issue (#3711 )	4 years ago
nihui	dadc640c66	x86 avx512 optimization (#3581 ) * unified relu avx512 * unifed clip avx512 * unaryop avx512 * sigmoid avx512 * binaryop avx512 * padding convolution avx512 * convolutiondepthwise avx512 * innerproduct avx512 * reshape avx512 * slice avx512 * hardsigmoid hardswish avx512 * swish avx512 * pooling avx512 * crop avx512 * convolution sgemm pack16 * convolution 3x3 winograd pack16 * interp avx512 * convolution sgemm pack1to16 * convolution sgemm pack16to8 * convolution sgemm pack8to16 * convolution sgemm pack16to4 * fix vulkan permute pack8 * fix vulkan convolution gemm pack8to1	4 years ago
nihui	a9c59bb93c	add -mavx512bw flag for avx512 build (#3671 )	4 years ago
nihui	4eb279ce26	add loongson mmi compiler header, less msa prefetch distance (#3678 )	4 years ago
nihui	1fcad0e765	loongson mmi optional layer	4 years ago
nihui	457e066eb5	x86 f16c infrastructure (#3577 )	4 years ago
nihui	4654030541	decouple x86 fma avx2 (#3560 )	4 years ago
nihuini	51ecc33d9d	check avx512vl extension for discarding old-slow avx512 chips, enable avx512 option by default	4 years ago
nihui	672daa7e04	xop infrastructure and optimization (#3541 )	4 years ago
nihui	930c36ebe2	avx512 infrastructure (#3407 )	4 years ago
nihui	878cb713d5	optional arm82 dot source (#3415 )	4 years ago
nihuini	11794675f3	apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path	4 years ago
nihuini	affbefe311	some space cleanup, blob clone from allocator	4 years ago
Tijmen Verhulsdonck	eaa7e24db6	Added ability to switch AVX/AVX2 during runtime (#3076 )	4 years ago
Tijmen Verhulsdonck	a7f301a99d	Add clang compatiblity (#3071 ) * Add clang compatiblity Add ability to build NCNN lib on windows with clang GNU * Restyled/pull 3071 (#16) * [skip ci] Restyled by clang-format * [skip ci] Restyled by astyle * [skip ci] Restyled by clang-format * [skip ci] Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> Co-authored-by: Restyled.io <commits@restyled.io>	4 years ago
nihui	1c31ac2549	runtime cpu dispatch for mips msa and loongson mmi	4 years ago
nihui	2f70343aec	cmake clean (#3032 )	4 years ago
nihui	bcbb55f033	apple device always has armv8.2 dot (#2963 )	5 years ago
nihuini	afc02d57f9	runtime detect armv8.2 dotprod	5 years ago
nihui	11958424c2	runtime riscv v and zfh dispatch, riscv v optimization for cast	5 years ago
nihui	e9cc637573	arm neon optimization for int8 packing kernels (#2809 )	5 years ago


78 Commits (f1bdc87478c64e0dfdba3d679e6e0dfd4b84df80)