409 Commits (af49e2cada56cdae5879686e35033e14dc42ea25)
 

Author SHA1 Message Date
  nihui af49e2cada install allocator.h 7 years ago
  nihui ae467fee25 project-wide NOMINMAX on msvc 7 years ago
  nihui 7e1f358084 fix build on msvc 7 years ago
  nihui 9706cd1447 implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469 7 years ago
  nihui 5879cb4d15 sgemm outperform direct conv on large channel 7 years ago
  nihui 20c0794b36 Update README.md 7 years ago
  nihuini 4b8101e7fc Revert "optimize interleave section for load first, about 5%~10% speed gain" 7 years ago
  nihui 56a667472a sgemm is always faster on common channel size 8 years ago
  nihui 1e4eaeeacd optimize interleave section for load first, about 5%~10% speed gain 8 years ago
  Qu Xiaofeng / 曲晓峰 d0cad77a15 Fixed two typos (#466) 8 years ago
  nihui 6895cbf810 single vldm is faster than two vld1 on armv7, and some pipeline optimize 8 years ago
  nihuini 05d7562a5d reorder kernel weight, pipeline friendly ;) 8 years ago
  nihuini 0bbdbf4ff8 add mobilenet-yolo 8 years ago
  nihuini 543d764674 fix yolo preprocess, comment about mobilenet-yolo 8 years ago
  nihui 5c6ef31e07 -x 8 years ago
  nihui eb089c0b32 add yolov2 example 8 years ago
  nihui a94e5adfd1 fix debug build 8 years ago
  nihui 0b6791e2ba convert BN ReLU6 Reorg YoloDetectionOutput Embed LSTM 8 years ago
  nihui b8f4f024a4 implement reorg yolodetectionoutput layer from caffe-yolov2 8 years ago
  kalcohol 8491f2b6a3 fix error C2059 and C2589 when using std::min and std::max. (#456) 8 years ago
  BUG1989 b3965e26cb Update README.md (#452) 8 years ago
  nihuini ee98817446 proper first row/col handling in resize family, fix #429 8 years ago
  nihuini 511baa6718 optional image pixel api, fix #434 8 years ago
  nihui 74d1c1470f update qcom810 iphone5s benchmark result 8 years ago
  nihui 8275a08950 update qcom410 i.mx7 benchmark result 8 years ago
  nihuini 2368d29a1e more explicit alignment on armv7 8 years ago
  nihuini d172a34329 direct assembly port, enable convolution 1x1 sgemm on armv7 8 years ago
  nihuini b3e24cafc3 openmp++ 8 years ago
  nihuini 0fdb8da60e sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq 8 years ago
  nihuini 2b20bf940c drop armv7 vaddvq_f32 hack 8 years ago
  nihui 72bb261e7a switch to winograd5 8 years ago
  nihuini a234e9240d fix concat on height 8 years ago
  nihuini 588487a8a0 convert caffe crop layer with three offset, fix #165 8 years ago
  nihuini 003873c55b crop on channel and crop by param 8 years ago
  nihui 184cea1ced Update README.md 8 years ago
  nihuini fd9ef5716a fix parsing inputs list in multiple lines 8 years ago
  Chang, Hui-Tang dc2a689d10 fix proposal roi_score_blob bug (#430) 8 years ago
  nihuini 99a343ce70 allocate after permute, reduce peak memory usage 8 years ago
  nihuini 0ce0c11851 load sub-op in advance for group convolution 8 years ago
  nihuini 86f4264c7c arm neon assembly for winograd5 8 years ago
  kyuusaku d2416187dc fix parameter check for interp (#425) 8 years ago
  nihuini 90643630c2 apple a10/a11 is armv8.2-a 8 years ago
  nihuini 5dc35f2860 w h c order 8 years ago
  nihuini babbb604e1 fix deconvolution weight order 8 years ago
  nihuini 50e1f0e531 const for to_pixels family 8 years ago
  nihuini b89851c6b6 convert sigmoid 8 years ago
  nihuini ce74836e2a yet another winograd convolution implementation, unroll outch 8 tiles 4 inch 4, about 22% faster, more optimization may comes soon :> 8 years ago
  nihui 18d7b3c3d8 Update README.md 8 years ago
  唐琦@异构计算 ba2fa28268 Update README.md (#420) 8 years ago
  nihui 94d9f393f6 ncnn pixel art 8 years ago