157 Commits (4136de3b8d2bfaba0f758af0c9b7d1c726bdcef2)

Author SHA1 Message Date
  nihui 4dc98ffaab conv1x1s1 and conv3x3s1 winograd pack4 neon optimization, first try 6 years ago
  nihuini 296e0022df deconvolution output adj and output shape 6 years ago
  nihuini e4b44d293e more autopad SAME_LOWER 6 years ago
  nihuini 9a6ee37eef asymmetric padding parameter for convolution and deconvolution family 6 years ago
  nihui 394f6786b9 neon enable support_packing 6 years ago
  nihui b4c388a72a Mat misc function accept option parameter, deconvolution pack4 arm neon 6 years ago
  tpoisonooo 1a0459cffe Update convolution_arm.cpp 6 years ago
  nihuini c4f23ae8ad rename Mat packing to elempack 6 years ago
  nihui 7655b9e4e9 fix build on armv7 again ... 6 years ago
  nihui a97439988f fix build on armv7 6 years ago
  nihuini 81a5dfe76b general convolution and convolutiondepthwise arm neon pack4, wip 6 years ago
  tpoisonooo 1ca4387c9c Auto choose conv implementation (#1085) 6 years ago
  BUG1989 bcfe9f453f initial the ncnn post training quantization tools (#1067) 7 years ago
  BUG1989 d9f269fa3d use sgemm fp32 on arm platform,optimize conv1x1s2 (#1031) 7 years ago
  nihuini 4de4078779 move platform includes out of namespace 7 years ago
  nihui 3e003ffd98 fuse sigmoid 7 years ago
  nihuini 7a8f68aca6 move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works 7 years ago
  nihuini 3f85cafc08 fuse relu leakyrelu clip into convolution/deconvolution/innerproduct 7 years ago
  BUG1989 93a34a897d add int8 winograd F(4,3) with neon assembly optimization (#891) 7 years ago
  BUG1989 780c7d9a72 merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867) 7 years ago
  BUG1989 2f4c4a8202 fix the compile error when using armv7a without neon (#835) 7 years ago
  BUG1989 ff38053321 [WIP] arm64-v8a int8 optimization (#823) 7 years ago
  BUG1989 8e337d440e fix the bug with convdw7x7 op working on int8 mode (#818) 7 years ago
  BUG1989 df3d224484 new int8 implement,better accuracy (#749) 7 years ago
  BUG1989 229f8fd8db add the armv7a conv3x3s2, convdw3x3s1/s2 int8 implement without overflow 7 years ago
  BUG1989 4289c64090 add the armv7a conv1x1s1 sgemm int8 implement without overflow : ) 7 years ago
  nihuini 19ad4cf284 fix build without neon 7 years ago
  nihuini 8526a69777 packed int8 convolution 3x3 stride 1 for armv7, 7%~25% faster than vanilla one, but God knows how hard I try :| 7 years ago
  nihuini 6f1b0b0a61 quantized padding in convolution, use range sweets 7 years ago
  nihui 72411b7a6c restore the old conv3x3s2 as reference, fast dilation convolution fails on striding 7 years ago
  nihui 1f20eb4e8c pack weight and more unroll makes improvement, ~20% faster for conv3x3s2 7 years ago
  nihui fe14037777 more sub op preload 7 years ago
  nihui 2fe7ada4d8 add arm int8 convolution stub, preload group op for x86 7 years ago
  nihui 5d04a3a45c layer holds bottom blob scale, depthwise convolution read group scales 7 years ago
  nihuini da352916fe fix pd using flag condition 7 years ago
  nihuini e34aa7786a armv7 int8 quantize/dequantize and conv1x1s1 7 years ago
  nihui a169cec363 core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table 7 years ago
  nihui 9706cd1447 implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469 8 years ago
  nihui 5879cb4d15 sgemm outperform direct conv on large channel 8 years ago
  nihui 56a667472a sgemm is always faster on common channel size 8 years ago
  nihuini d172a34329 direct assembly port, enable convolution 1x1 sgemm on armv7 8 years ago
  nihuini 0fdb8da60e sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq 8 years ago
  nihui 72bb261e7a switch to winograd5 8 years ago
  Hyungsuk Yoon 8f56e00b4b make convolution with dilation fast 8 years ago
  nihui 7d1e49584d call Innerproduct for convolution on flattened blob 8 years ago
  nihui 03c1f63c2e switch to winograd4 8 years ago
  nihui a181d25098 new model load api, fix #215 8 years ago
  nihui bdb70a2010 padding w h in convolution and deconvolution 8 years ago
  nihui 44b4519307 non-square convolution and deconvolution kernel stride dilation 8 years ago
  nihuini 964040fe3c more runtime decisions for winograd path 8 years ago