78 Commits (f1bdc87478c64e0dfdba3d679e6e0dfd4b84df80)

Author SHA1 Message Date
  nihui 8d2ac57824
fix missing asimdfhm target macro in ndk-r21 (#5804) 1 year ago
  nihui 9cefe9a624
avx vnni int8, avx vnni int16, avx ne convert infrastructure (#5749) 1 year ago
  nihui 1c7af00499
gemm int8 quantization (#5706) 1 year ago
  nihui c4a007406d
windows clang ci (#5469) 2 years ago
  nihui 08b7d99a75
rnn/lstm/gru dynamic quantization (#5435) 2 years ago
  nihui fafb897ff7
update ios toolchain, add visionos ci, update watchos, ncnn target ilp32 (#5399) 2 years ago
  nihui 556b79ce4d
create layer decoupled (#5258) 2 years ago
  nihui 058aa0ad37
enable arm neon intrinsics for msvc build (#5151) 2 years ago
  nihui b4f26237cb
in-house vulkan loader (#5130) 2 years ago
  nihui 9ecf6a61be
x86 optimization for convolution int8 gemm unified elempack (#4881) 2 years ago
  nihui 6c21b08727
check loongarch lasx and enable (#4820) 2 years ago
  nihui 7fb16be32a
fix aarch64 build without fp16 conversion intrinsics (#4713) 3 years ago
  nihui d2d012dce5
x86 bfloat16 cast functions (#4491) 3 years ago
  nihui 15761fc1a6
arm vfpv4 asimdhp asimdfhm optimization for gemm (#4432) 3 years ago
  nihui 7b3261dace
gemm arm optimization (#4426) 3 years ago
  nihui c934c6e94a
fix openmp affinity abort when cpu goes offline (#4370) 3 years ago
  nihui f527fe88ee
update glslang (#4361) 3 years ago
  junchao-loongson 279222c2c9
add vector optimization for loongarch64 (#4242) 3 years ago
  Xavier Hsinyuan e7eadca6c1
RVV: use new interface for segment load/store & change word_type to size_t&add clang ci (part #4100) (#4118) 3 years ago
  nihui b4ba207c18
more strict compiler rvv checks, drop rvv-071 support (#4094) 3 years ago
  nihui 76849cede4
armv8.4 i8mm optimization for convolution gemm int8 (#4034) 3 years ago
  nihui dd86cebab8
armv8.6 ci and coverage (#4025) 3 years ago
  nihui b85bfb6085
armv8.2 asimdfhm and armv8.4 bf16 i8mm and armv8.6 sve sve2 compiler flags and runtime detection functions (#3964) 3 years ago
  nihui 440bfdd2cc
x86 f16c optimization for innerproduct (#3944) 3 years ago
  nihui 1fd7138d2f
armv7 vfpv4 infrastructure (#3929) 3 years ago
  nihui 1377acf945
avx512 bf16 fp16 infrastructure (#3926) 3 years ago
  nihui 7886e90c65
split arm82 source for smaller binary and memory footprint (#3877) 4 years ago
  nihui 241524ffce
discard weight memory for x86 arm vulkan (#3865) 4 years ago
  Xavier Hsinyuan 29b6a32ac0
RVV: follow intrinsic doc, replace vfredsum_* with vfredusum_* (#3790) 4 years ago
  nihui 9826f3dbf8
shader include vulkan activation, workaround for moltenvk tanh half4 issue (#3711) 4 years ago
  nihui dadc640c66
x86 avx512 optimization (#3581) 4 years ago
  nihui a9c59bb93c
add -mavx512bw flag for avx512 build (#3671) 4 years ago
  nihui 4eb279ce26
add loongson mmi compiler header, less msa prefetch distance (#3678) 4 years ago
  nihui 1fcad0e765
loongson mmi optional layer 4 years ago
  nihui 457e066eb5
x86 f16c infrastructure (#3577) 4 years ago
  nihui 4654030541
decouple x86 fma avx2 (#3560) 4 years ago
  nihuini 51ecc33d9d
check avx512vl extension for discarding old-slow avx512 chips, enable avx512 option by default 4 years ago
  nihui 672daa7e04
xop infrastructure and optimization (#3541) 4 years ago
  nihui 930c36ebe2
avx512 infrastructure (#3407) 4 years ago
  nihui 878cb713d5
optional arm82 dot source (#3415) 4 years ago
  nihuini 11794675f3
apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path 4 years ago
  nihuini affbefe311
some space cleanup, blob clone from allocator 4 years ago
  Tijmen Verhulsdonck eaa7e24db6
Added ability to switch AVX/AVX2 during runtime (#3076) 4 years ago
  Tijmen Verhulsdonck a7f301a99d
Add clang compatibility (#3071) 4 years ago
  nihui 1c31ac2549
runtime cpu dispatch for mips msa and loongson mmi 4 years ago
  nihui 2f70343aec
cmake clean (#3032) 4 years ago
  nihui bcbb55f033
apple device always has armv8.2 dot (#2963) 5 years ago
  nihuini afc02d57f9
runtime detect armv8.2 dotprod 5 years ago
  nihui 11958424c2
runtime riscv v and zfh dispatch, riscv v optimization for cast 5 years ago
  nihui e9cc637573
arm neon optimization for int8 packing kernels (#2809) 5 years ago