93 Commits (90e6be457b8094fcb63d219691cdf0c41fe01fc0)

Author SHA1 Message Date
  nihuini 4de4078779 move platform includes out of namespace 7 years ago
  nihui 3e003ffd98 fuse sigmoid 7 years ago
  nihuini 7a8f68aca6 move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works 7 years ago
  nihuini 3f85cafc08 fuse relu leakyrelu clip into convolution/deconvolution/innerproduct 7 years ago
  BUG1989 93a34a897d add int8 winograd F(4,3) with neon assembly optimization (#891) 7 years ago
  BUG1989 780c7d9a72 merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867) 7 years ago
  BUG1989 2f4c4a8202 fix the compile error when using armv7a without neon (#835) 7 years ago
  BUG1989 ff38053321 [WIP] arm64-v8a int8 optimization (#823) 7 years ago
  BUG1989 8e337d440e fix the bug with convdw7x7 op working on int8 mode (#818) 7 years ago
  BUG1989 df3d224484 new int8 implement,better accuracy (#749) 7 years ago
  BUG1989 229f8fd8db add the armv7a conv3x3s2, convdw3x3s1/s2 int8 implement without overflow 7 years ago
  BUG1989 4289c64090 add the armv7a conv1x1s1 sgemm int8 implement without overflow : ) 7 years ago
  nihuini 19ad4cf284 fix build without neon 7 years ago
  nihuini 8526a69777 packed int8 convolution 3x3 stride 1 for armv7, 7%~25% faster than vanilla one, but God knows how hard I try :| 7 years ago
  nihuini 6f1b0b0a61 quantized padding in convolution, use range sweets 7 years ago
  nihui 72411b7a6c restore the old conv3x3s2 as reference, fast dilation convolution fails on striding 7 years ago
  nihui 1f20eb4e8c pack weight and more unroll makes improvement, ~20% faster for conv3x3s2 7 years ago
  nihui fe14037777 more sub op preload 7 years ago
  nihui 2fe7ada4d8 add arm int8 convolution stub, preload group op for x86 7 years ago
  nihui 5d04a3a45c layer holds bottom blob scale, depthwise convolution read group scales 7 years ago
  nihuini da352916fe fix pd using flag condition 7 years ago
  nihuini e34aa7786a armv7 int8 quantize/dequantize and conv1x1s1 7 years ago
  nihui a169cec363 core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table 7 years ago
  nihui 9706cd1447 implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469 8 years ago
  nihui 5879cb4d15 sgemm outperform direct conv on large channel 8 years ago
  nihui 56a667472a sgemm is always faster on common channel size 8 years ago
  nihuini d172a34329 direct assembly port, enable convolution 1x1 sgemm on armv7 8 years ago
  nihuini 0fdb8da60e sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq 8 years ago
  nihui 72bb261e7a switch to winograd5 8 years ago
  Hyungsuk Yoon 8f56e00b4b make convolution with dilation fast 8 years ago
  nihui 7d1e49584d call Innerproduct for convolution on flattened blob 8 years ago
  nihui 03c1f63c2e switch to winograd4 8 years ago
  nihui a181d25098 new model load api, fix #215 8 years ago
  nihui bdb70a2010 padding w h in convolution and deconvolution 8 years ago
  nihui 44b4519307 non-square convolution and deconvolution kernel stride dilation 8 years ago
  nihuini 964040fe3c more runtime decisions for winograd path 8 years ago
  nihui c77ca16468 enable conv3x3s1 winograd optimization, two paths for small image on armv7 and all for aarch64 8 years ago
  nihui 790829bc62 partition dot tiles and reuse kernel register, over 20% improvement for tiny image 8 years ago
  nihui eea3ca577a disable winograd atm ... 8 years ago
  nihui 0385d8e8ad implement winograd64 optimization for convolution 3x3s1 8 years ago
  nihuini 47218db6e5 fix minus padding SAME, fix #116 8 years ago
  nihuini 23630b14b9 implement tensorflow style padding SAME type for convolution and pooling, second try 8 years ago
  nihuini b7db8be4f6 add ncnn source qwq 9 years ago