1399 Commits (2bc77e7487d07a40667fcf9f8fffa17ca75e0523)

Author SHA1 Message Date
  nihui f10cc6dd93
initial data structure changes for 3dcnn, conv3d, pooling3d (#3378) 4 years ago
  nihui f4f7fabe27
fix python wheel, parallel 3 for macos tasks (#3396) 4 years ago
  nihui 24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 (#3391) 4 years ago
  nihui ed1fb210ea
arm neon optimization for innerproduct int8 gemm (#3367) 4 years ago
  nihui ac3d32aa0d
get_elf_hwcap_from_getauxval (#3301) 4 years ago
  nihui 4566ad529a
fix rounding on x86 avx and armv7 neon (#3365) 4 years ago
  源源球球✨ f66110b7e0
delete useless variables (#3360) 4 years ago
  nihui f433f86874
fix squeeze expanddims axes and add test (#3359) 4 years ago
  nihui 525df8bcc5
rnn/lstm/gru with unequal input output (#3352) 4 years ago
  nihui f448a8f595
implement interp-1d on 2d blob (#3349) 4 years ago
  nihui 5eb4a2ccd0
implement convolutiondepthwise1d (#3342) 4 years ago
  nihui b3a521981b
implement interp cubic aligncorner (#3338) 4 years ago
  nihui adfc8b25bc
fix deconv output pad (#3337) 4 years ago
  nihui f86a307ab5
silence code scanning 4 years ago
  nihui 6a52e8e5f2
fix potential integer type overflow 4 years ago
  nihui a5b846371a
fix adaptive pooling vulkan, second try 4 years ago
  Wang Kangyu 53040f566b
use int ceil/floor div (#3333) 4 years ago
  nihuini 514770a37e
fix adaptive pooling vulkan 4 years ago
  nihuini 80f2138b0c
fix adaptive pooling 4 years ago
  nihuini 6c2cee8186
fix mat clone with atypical source cstep 4 years ago
  nihuini a490f8a533
fix layernorm with affine 4 years ago
  Tijmen Verhulsdonck ac5dc23ccc
added a number of optimized sse layers (#3302) 4 years ago
  zhiliu6 a08f700775
Optimize avx convolution activation (#3299) 4 years ago
  Tijmen Verhulsdonck e3aa893dfb
move custom_layer_to_index to public (#3294) 4 years ago
  nihui aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier (#3287) 4 years ago
  zhiliu6 814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct (#3233) 4 years ago
  nihui 4313d23355
update ci swiftshader 20211002 (#2366) 4 years ago
  Tijmen Verhulsdonck 4270b5c502
Fix broken codepaths with AVX only (#3254) 4 years ago
  Zhuo Zhang 492297d2f6
add A15 and M1 macro definitions (#3263) 4 years ago
  nihui 57ad2c138c
fix build on c906, fix #3230 4 years ago
  nihui ceec22cd46
Update convolution_riscv.cpp (#3050) 4 years ago
  Xavier Hsinyuan a2f89e7392
RVV fp16/fp32 optimized Dropout, GRU and Softmax (#3200) 4 years ago
  TianZer c44a6c7f47
Remove two potential warnings for VisualStudio (#3188) 4 years ago
  nihui 5a35c2b11e
load model memory by reference (#3179) 4 years ago
  nihui e9b5bbcd2d
fix armv7 roundmode (#3176) 4 years ago
  yaobyPerfxlab ec561736a5
Riscv64 c906 d1 (#3159) 4 years ago
  nihui 51652a2280
NCNN_MALLOC_OVERREAD for potential kernel reading data out of allocation size (#3155) 4 years ago
  nihuini a99cc13611
ncnnoptimize generate proper weight for int8 scales, fix non-neon innerproduct int8, fix #3157 4 years ago
  nihui c6cda8d07c
arm neon optimization for requantize leakyrelu (#3144) 4 years ago
  nihuini 169614f732
fix build with NCNN_STDIO off 4 years ago
  nihui da7b64b833
fix build on c906 4 years ago
  nihuini 11794675f3
apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path 4 years ago
  nihuini b9460c6e8f
fix armv7 requantize relu, fix #3122 4 years ago
  Xavier Hsinyuan 2a5c672787
Add unittest and RVV optimized for SELU (#3114) 4 years ago
  Xavier Hsinyuan d78add2acd
RVV optimized PReLU, with fp16 support (#3113) 4 years ago
  nihui f9a16ea1ec
fix build on c906 4 years ago
  Xavier Hsinyuan 9933cc776a
RVV optimized HardSwish and HardSigmoid (#3108) 4 years ago
  nihui d91cccfb55
apply code-format changes 4 years ago
  Xavier Hsinyuan 99440e67f7
RVV optimized binaryop, with fp16 support (#3097) 4 years ago
  nihui 2c4ae09604
fix #2961 (#3095) 4 years ago