132 Commits (a84ba8fc0fe4665047fdce1c2d84b0e8a52aa441)

Author SHA1 Message Date
  nihuini a84ba8fc0f element type storage support in Mat, move data member the first so that a pointer to Mat is a pointer to data, convenient index access for float vector 8 years ago
  nihuini 8773035891 another implementation for winograd 3x3, about 15%~30% speed gains on small images 8 years ago
  nihui 2a62a98e1a allow constructing paramdict and modelbin from userside 8 years ago
  nihui 10b86c2af5 create layer from type name 8 years ago
  nihui 118e037f33 arm neon optimize for mat fill 8 years ago
  nihui 7a43c45e80 remove deprecated code 8 years ago
  nihui a181d25098 new model load api, fix #215 8 years ago
  nihuini b84ba31c23 enable light mode by default 8 years ago
  peng 5ac2de8963 fix shufflechannel 8 years ago
  nihuini df5e04260a fix conv1x1s1 bug 8 years ago
  nihuini 9280a068fe unroll outch for convolution 3x3 winograd64, reduce memory usage 8 years ago
  nihui 1f5c646ee0 pipeline optimize 8 years ago
  nihuini 0564021afc fix armv7 assembly 8 years ago
  nihuini 55ec189998 unroll outch for convolution 1x1 stride 1 8 years ago
  nihuini 57df1076ff neon optimize for depthwise convolution 3x3, about 20%~35% speed gain 8 years ago
  wind19870521 822214269d fix pooling2x2s2_max_neon stride bug 8 years ago
  zengping a54f14feca [fix-compile-warnings] fix compiler warnings, and add werror in CMakeLists.txt (#217) 8 years ago
  nihui 0f52418023 change input param order to w h c, replace caffe MemoryData to Input 8 years ago
  nihui 62913964d6 non-square kernel stride and padding w h for pooling 8 years ago
  nihui bdb70a2010 padding w h in convolution and deconvolution 8 years ago
  nihui 44b4519307 non-square convolution and deconvolution kernel stride dilation 8 years ago
  HustCoderHu 23a3254a7b fix memcpy error 8 years ago
  Zexin, Hu 81fb3818a5 add ShuffleChannel layer, only cpp, no arm yet (#210) 8 years ago
  nihuini 964040fe3c more runtime decisions for winograd path 8 years ago
  nihui c77ca16468 enable conv3x3s1 winograd optimization, two paths for small image on armv7 and all for aarch64 8 years ago
  nihui f2f7ecd2ec fix winograd neon2 for aarch64 8 years ago
  nihui 26303615a6 memcpy for concat 8 years ago
  nihuini a4d28107f4 check clone empty 8 years ago
  nihuini 25f19c2009 implement external scale blob, support SENet 8 years ago
  nihui 15ad4dfb9f forward reuse forward_inplace routine, reduce binary size with little memcpy overhead in non-light mode 8 years ago
  nihui 32cd5f2a5c use mul for the first multiply, drop accumulator clear instructions, about 5% speed performance gains 8 years ago
  nihuini d5da0e84ba fix deconv4x4s2, fix #202 8 years ago
  wind19870521 429e98c91c fix unaryop bug (#200) 8 years ago
  huyn 8b9365a68c fix top_blob not set (#199) 8 years ago
  azrael0fog f232c1a6c5 Update relu_arm.cpp (#189) 8 years ago
  tedder59 4d59d0afda Add depthwise Deconvolution. (#187) 8 years ago
  nihui 790829bc62 partition dot tiles and reuse kernel register, over 20% improvement for tiny image 8 years ago
  nihuini a3be17eb7e special path for 1x1xc innerproduct 8 years ago
  nihuini 50d591cb50 softmax inplace 8 years ago
  peng 39445b5233 no memcpy for small size copy_cut_border/copy_make_border 8 years ago
  彭 a86cc8f620 memcpy optimize copy_cut_border/copy_make_border (#179) 8 years ago
  nihuini d99f9d9ac3 implement softmax on vector and image 8 years ago
  liuchang ac3b4768aa fix the missing header file for visual studio. 8 years ago
  nihuini ff3c03cfb1 q9 is useless 8 years ago
  nihuini 8cfd02d633 Merge branch 'master' of https://github.com/Tencent/ncnn 8 years ago
  nihuini 9a55404c72 fix dot on aarch64, still needs improvement ... 8 years ago
  nihui eea3ca577a disable winograd atm ... 8 years ago
  nihui 0385d8e8ad implement winograd64 optimization for convolution 3x3s1 8 years ago
  nihui 20b1330cdb fix lrn within channel 8 years ago
  nihui 8e490d4b68 fix array parsing, first try 8 years ago