271 Commits (da44ec5b140bebd565d029b60f2959f02f96ebf2)

Author SHA1 Message Date
  nihui c5640a16c3
gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only (#4407) 3 years ago
  nihui fd1ac3c7a0
x86 optimization for gemm unified elempack (#4387) 3 years ago
  nihui 0736c5b658
Fix c api allocator (#4360) 3 years ago
  nihui 057b5bb515
split tests (#4354) 3 years ago
  nihui eceac35a7f
implement MultiheadAttention kdim vdim (#4347) 3 years ago
  nihui 498ca7341b
squeeze and expanddims 4d (#4346) 3 years ago
  Lry89757 6a47f8d15c
gridsample op support (#4288) 3 years ago
  nihui 5b28c1730e
implement ncnn fold and unfold (#4326) 3 years ago
  nihui 6e49fa30dc
groupnorm 1d/2d/4d (#4312) 3 years ago
  Fangjun Kuang 5281d51535
implement GLU and pnnx conversion (#4283) 3 years ago
  nihui 0b591b0d1f
implement layer feature disabled bit (#4278) 3 years ago
  miemie2013 b13c2a16ce
Optimize x86 DeformableConv2D (#4128) 3 years ago
  nihui 77eda4c19f
implement lstm proj_size (#4263) 3 years ago
  Lry89757 5eb56b2ea5
[Gelu x86] Finish intrinsic with elempack merged(fast version) (#4144) 3 years ago
  Lry89757 9f59711338
[Prelu x86] Finish intrinsic with elempack merged (#4177) 3 years ago
  Lry89757 9278f90114
[Elu x86] Finish intrinsic with elempack merged (#4153) 3 years ago
  LinHe 03f2ad38ce
Layer Norm x86 SIMD Optimizations (#4065) 3 years ago
  miemie2013 720f3c9aab
Add DeformableConv2D (#4070) 3 years ago
  Lry89757 13a9533984
[BatchNorm Optimize x86] AVX512 intrinsic (#4061) 3 years ago
  nihui 76849cede4
armv8.4 i8mm optimization for convolution gemm int8 (#4034) 3 years ago
  nihui dd86cebab8
armv8.6 ci and coverage (#4025) 3 years ago
  nihui 706831f8a9
arm vfpv4 optimization for innerproduct (#3950) 4 years ago
  nihui 440bfdd2cc
x86 f16c optimization for innerproduct (#3944) 4 years ago
  nihui 067e8e1d92
mips unified elempack for elementwise layers (#3928) 4 years ago
  nihui 1377acf945
avx512 bf16 fp16 infrastructure (#3926) 4 years ago
  nihui 20a14bf5ae
arm convolution winograd dot function, adjust arm convolution winograd strategy (#3915) 4 years ago
  nihui ca0ba4b25f
fine grained winograd options, adjust x86 convolution winograd strategy (#3908) 4 years ago
  Evgeny Proydakov 184c479b64
Added a simple unittest for Power layer. (#3893) 4 years ago
  nihui 241524ffce
discard weight memory for x86 arm vulkan (#3865) 4 years ago
  tpoisonooo 6fd801b6d7
feat(src/layer): add vision_transformer benchmark (#3730) 4 years ago
  NaLan ZeYu 5388f9f312
test: fix printf arguments mismatch (#3774) 4 years ago
  nihui f9c1787de9
implement einsum layer and pnnx conversion (#3768) 4 years ago
  nihui ee6402553c
layernorm for vector and mat along w, pnnx convnext end2end test (#3764) 4 years ago
  jasonZhang e62d674e5d
Add unittest and SSE&AVX optimized for BNLL (#3759) 4 years ago
  nihui 308965b7e9
sanitize cooperative matrix option in tests 4 years ago
  nihui 0ea327b557
x86 sse/avx/avx512 optimization for softmax (#3712) 4 years ago
  nihui 131f3d1323
x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
  nihui dadc640c66
x86 avx512 optimization (#3581) 4 years ago
  nihui 559e5b23f9
vulkan tensorcore optimization (#3628) 4 years ago
  nihui 002c07d4ec
mix vulkan winograd f23 and f43 (#3639) 4 years ago
  nihui d42e048b56
pnnx convert torch.addmm (#3634) 4 years ago
  nihui 6e19ab26ba
massive vulkan optimization (#3602) 4 years ago
  nihui 2880eff264
deconv1d deconv3d (#3584) 4 years ago
  nihui 920aa79f04
drop x86 avx2 fp16 (#3568) 4 years ago
  nihui d452eca28f
convert torch.matmul, eliminate noop pad and identity op, fuse transpose matmul, fuse select to unbind (#3554) 4 years ago
  Yuzhong Yan 681141ff42
[YZ] Fix bug in unit test (#3556) 4 years ago
  nihui 33e225f173
fix c api test 4 years ago
  nihui c5d7f963b9
layer tile (#3491) 4 years ago
  Xiaohan Liu 3daabd515d
add missing doffset (#3475) 4 years ago
  nihui 922f8b33c1
reduction4d, merge keepdims arg, add test (#3469) 4 years ago