483 Commits (cd4be6d0fadd6d01635a4fd3934d97e90e6f71ff)

Author SHA1 Message Date
  nihuini cd4be6d0fa call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks 6 years ago
  nihuini 81a028547a fix bus error on armv7 6 years ago
  nihuini 19d75955d6 arm neon assembly optimization for conv3x3s1 winograd pack4to1 6 years ago
  bindog 04b4b02324 [WIP] add reduce op support for onnx (#1308) 6 years ago
  nihuini 22a2be4e6c fix crop pack4 with reference blob 6 years ago
  nihuini 6a8e5c58da fix build on armv7 6 years ago
  nihuini e63e2449fd arm neon assembly optimization for conv7x7s2 pack1to4 6 years ago
  nihui 56fd26a2da arm neon assembly optimization for conv1x1s1 pack4to1 6 years ago
  nihui 7ad514917b fix potential out of write on unroll 12 remainder 6 years ago
  nihuini 15e86dc8e9 reduce pack4 weight memory usage for specialized kernel, reduce runtime memory usage in conv3x3s1 winograd 6 years ago
  nihuini 581a06d471 since innerproduct pack4 always consumes flattened blob, which layout is same as pack1 branch, so reuse pack1 implementation to reduce memory usage 6 years ago
  nihuini c5f1dc3fe4 arm neon assembly optimization for conv3x3s1 pack4to1 6 years ago
  nihui 2f8b31c3b4 unroll outch 2 for conv3x3s1 pack1to4 6 years ago
  nihui e0f6e3f669 pre-interleave 8-channel weight data on aarch64, conv1x1s1 version 6 years ago
  nihuini d11bf14d44 pre-interleave 8-channel weight data on aarch64 6 years ago
  nihui 7173b6e38e arm neon assembly optimization for conv3x3s2 pack4 6 years ago
  nihuini cf0c49dd71 arm neon assembly optimization for conv5x5s1 pack4 and conv5x5s2 pack4 6 years ago
  nihui 9e529354fb arm neon optimization for conv1x1s2 pack4 6 years ago
  nihuini f8f3b0b5aa shufflechannel pack4 6 years ago
  nihuini 50d5896ce7 reshape pack4 6 years ago
  nihuini 624291e2b2 use subop optimization for group convolution deconvolution pack4 family 6 years ago
  nihui 48e3e7d49c move neon activation into a wrapper function 6 years ago
  nihui b37ecab630 auto flatten before innerproduct pack4 6 years ago
  nihui afd1f08194 arm neon assembly optimization for pooling2x2s2 max pack4 6 years ago
  nihui e19b7097df arm neon assembly optimization for conv3x3s1 pack1to4 6 years ago
  nihui 3ac6335ba3 hardsigmoid and hardswish pack4 6 years ago
  nihui 21e74487b4 arm neon optimization for convdw5x5 pack4 6 years ago
  volvet ecd64fb36b Fixed lots of compile warnings (#1286) 6 years ago
  nihui 3e1bad4880 arm neon assembly optimization for pooling3x3s2 max pack4 6 years ago
  nihui 08a97c169f arm neon assembly optimization for relu pack4 6 years ago
  nihui a1bd88fb4a arm neon assembly optimization for padding constant pack4 6 years ago
  nihui 17f343e7e4 convdw3x3 pack4 arm neon assembly optimization 6 years ago
  nihui 6703286831 the very long ld1, one less load 6 years ago
  nihui 22a3ade6ce unroll size 12 for conv1x1s1 and conv3x3s1 winograd pack4 on aarch64 6 years ago
  nihui 3a452f734a arm neon assembly optimization for conv3x3s2 pack1to4 6 years ago
  nihui 6edd42f566 arm neon assembly for conv1x1s1 and conv3x3s1 winograd pack4 6 years ago
  nihuini c0a4ffcf66 convolution pad_value param 6 years ago
  nihuini 587a67eb51 the noop layer 6 years ago
  nihuini b7085ceec0 deconvolution apply output adj first, then crop the padding 6 years ago
  tpoisonooo 8dbafe7764 constraint input value to [-127, +127] (#1258) 6 years ago
  nihui e56fcc77c5 optimize dot memory layout 6 years ago
  nihuini 8a7b4b035e radv crash with large local group size, workaround 6 years ago
  nihuini 80f898b079 unaryop tanh vulkan 6 years ago
  nihuini 91ef4eea4f fix unaryop arm, fix #1241 6 years ago
  nihuini 3e3189736c fix msvc build, fix #1237 6 years ago
  Xu Yang 31cf7f3c5b fix ConvolutionDepthWise int8_requantize (#1233) 6 years ago
  nihuini c4bebc6371 x86 conv3x3s1 winograd43 produce wrong result, revert to the good-old winograd23 version 6 years ago
  CnybTseng d11c4c1d42 modify several files under ncnn/src/layer/vulkan/shader in the latest ncnn to adapt to the latest glslang; this change has been verified on the DJI Manifold2-G platform (#1231) 6 years ago
  nihui 46e7ac76ab apply sgemm-like dot in winograd pack4 neon 6 years ago
  nihui d6860d93f2 fix batchnorm pack4 neon multithreaded 6 years ago