1427 Commits (340b4e673eb3c5264e1d97d3a5a696eb231dc400)

Author SHA1 Message Date
  Kagurazaka Kotori 08ecc94d63
x86: Use _mm_cvtsi128_si{32,64} in float2int8 (#3536) 4 years ago
  teng 3ff9ae707f
simplify macro (#3530) 4 years ago
  Kagurazaka Kotori 5c078016c2
x86/avx_mathfun.h: Remove fallback warnings (#3527) 4 years ago
  nihui 2d46994d2e
wrap avxvnni and avx512vnni build options over cpu feature detector 4 years ago
  nihui bae2ee375f
simplify c api layer forward_n output array type 4 years ago
  nihui c0a94cd9ca
fix armv7 without neon (#3514) 4 years ago
  nihui 4e4e0b9cf8
do not link libgcc as we no longer rely on builtin support cpu feature intrinsics now 4 years ago
  nihui d95213a005
x86 convolution int8 optimization third stage (#3506) 4 years ago
  nihuini 9f7f491885
use the old-style __cpuid_count for old compiler compatibility, fix #3510 4 years ago
  nihui 930c36ebe2
avx512 infrastructure (#3407) 4 years ago
  nihui c2896bcd4d
x86 convolution int8 optimization second stage (#3495) 4 years ago
  teng 13a51fbcf8
add else (#3494) 4 years ago
  nihui e9b8f0a6ef
x86 avx2 optimization for convolution gemm int8 (#3489) 4 years ago
  nihui c5d7f963b9
layer tile (#3491) 4 years ago
  nihui 922f8b33c1
reduction4d, merge keepdims arg, add test (#3469) 4 years ago
  nihui 713e712ba6
fix slow fp32/int32 crop on arm82 (#3462) 4 years ago
  nihui 2d98c86ecd
branch less mat channel (#3452) 4 years ago
  nihui 3a83704c38
binary4d, unary4d (#3443) 4 years ago
  nihuini f02b259a15
convert some pnnx reduction family 4 years ago
  nihui 6941ec8fc9
arm neon optimization for general packed convolution (#3426) 4 years ago
  tpoisonooo dddcdb97b2
feat(src): add mat default constructor (#3427) 4 years ago
  nihui 76f2dddc37
cmake option for c api (#3423) 4 years ago
  nihui 878cb713d5
optional arm82 dot source (#3415) 4 years ago
  nihui 999e640d43
dynamic convolution weight (#3408) 4 years ago
  nihui 426e564b6e
general simd optimization for convolution1d (#3404) 4 years ago
  nihui 14c3802de3
c api for 4d mat (#3403) 4 years ago
  nihui f98c396e6b
crop4d (#3402) 4 years ago
  nihui cf20dbc0bd
relu3d, batchnorm3d, reshape4d, flatten4d, permute4d (#3397) 4 years ago
  nihui f10cc6dd93
initial data structure changes for 3dcnn, conv3d, pooling3d (#3378) 4 years ago
  nihui f4f7fabe27
fix python wheel, parallel 3 for macos tasks (#3396) 4 years ago
  nihui 24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 (#3391) 4 years ago
  nihui ed1fb210ea
arm neon optimization for innerproduct int8 gemm (#3367) 4 years ago
  nihui ac3d32aa0d
get_elf_hwcap_from_getauxval (#3301) 4 years ago
  nihui 4566ad529a
fix rounding on x86 avx and armv7 neon (#3365) 4 years ago
  源源球球✨ f66110b7e0
delete useless variables (#3360) 4 years ago
  nihui f433f86874
fix squeeze expanddims axes and add test (#3359) 4 years ago
  nihui 525df8bcc5
rnn/lstm/gru with unequal input output (#3352) 4 years ago
  nihui f448a8f595
implement interp-1d on 2d blob (#3349) 4 years ago
  nihui 5eb4a2ccd0
implement convolutiondepthwise1d (#3342) 4 years ago
  nihui b3a521981b
implement interp cubic aligncorner (#3338) 4 years ago
  nihui adfc8b25bc
fix deconv output pad (#3337) 4 years ago
  nihui f86a307ab5
silence code scanning 4 years ago
  nihui 6a52e8e5f2
fix potential integer type overflow 4 years ago
  nihui a5b846371a
fix adaptive pooling vulkan, second try 4 years ago
  Wang Kangyu 53040f566b
use int ceil/floor div (#3333) 4 years ago
  nihuini 514770a37e
fix adaptive pooling vulkan 4 years ago
  nihuini 80f2138b0c
fix adaptive pooling 4 years ago
  nihuini 6c2cee8186
fix mat clone with atypical source cstep 4 years ago
  nihuini a490f8a533
fix layernorm with affine 4 years ago
  Tijmen Verhulsdonck ac5dc23ccc
added a number of optimized sse layers (#3302) 4 years ago