185 Commits (394bca8dbb36d3384edb089646aec7ec70fcc12d)

Author SHA1 Message Date
  nihuini 394bca8dbb Merge branch 'master' of https://github.com/Tencent/ncnn 8 years ago
  nihuini 9ac305e160 create 3-dim sub blob for group convolution, fix #315 8 years ago
  Howave 415bfbdfa7 added arm layer compilation for arm-linux system (#316) 8 years ago
  nihuini 318d3abe66 bind register explicitly, fix #306, fix #310, fix #312 8 years ago
  Yantao Xie 2e9da1b95b Add the epsilon parameter to the BatchNorm layer. (fix #303) (#311) 8 years ago
  nihuini 231a52e469 fix build on aarch64 with gcc, fix #309 8 years ago
  BUG1989 af7019d3fc fix compile error (#305) 8 years ago
  nihui 875a188d10 pre interleave kernel memory for winograd4, about 3%~20% speed gains 8 years ago
  dong 6ea09ebf2c Use aarch64 assembly to replace arm intrinsics 8 years ago
  820169199 656de48631 add "#include <float.h>" 8 years ago
  Dong Xu 28154dcb29 fix vst1.f32 of coeff sum at eltwise_arm layer 8 years ago
  nihui 0fd701112e load LRN bias from param 8 years ago
  nihui 7d1e49584d call Innerproduct for convolution on flattened blob 8 years ago
  harhar539 9a8486a823 1.fix pad tail bug in commit d1ea2a3 at pooling layer 8 years ago
  nihui b1aec69ff9 d31 is useless 8 years ago
  nihuini 5e484a47ef fix build, second try 8 years ago
  nihui 5f0fa95f61 fix build 8 years ago
  nihui d1ea2a34b4 rewrite pooling pad scheme, global pooling return continous blob 8 years ago
  nihui 6c4c810fda decouple modelbin of different input types, simplify timestamp function 8 years ago
  nihui 2d4ae30508 fallback to all cores 8 years ago
  nihui 03c1f63c2e switch to winograd4 8 years ago
  nihui bc99d5123b set smp cpu affinity to all cores 8 years ago
  nihuini 098fff355c implement spatial norm, convert L2Normalization 8 years ago
  nihui 5ff6a1808a emmmm, yet another implementation for winograd 3x3, unroll aggressively for aarch64 8 years ago
  YQZ1990 6f13cc5185 slice (#269) 8 years ago
  nihuini bd705d5bdb inplace binaryop with scalar 8 years ago
  nihuini 5f4ac776d1 implement instancenorm 8 years ago
  nihuini db5e805eff padding_mode for Pooling, fix #261 8 years ago
  nihui 2d9410742b concat slice shufflechannel honor elemsize 8 years ago
  nihui 8ccae1d4fd prevent reuse of param array, fix #258 8 years ago
  nihuini 75218953cc aarch64 assembly for conv1x1s1, unroll outch inch as 8x8 8 years ago
  nihuini 76a55693a6 decouple convolutiondepthwise and convolution, reduce binary size by 10%, fix #254 8 years ago
  nihuini 3ffb502bc6 reuse if the same shape 8 years ago
  nihuini c6506d6ecd remaining inch for winograd neon3 8 years ago
  nihui c12fab569f fix convdw3x3s1 on aarch64 8 years ago
  nihui f133729c78 code style changes 8 years ago
  nihuini 03621aa7f9 more x86 stub for convolution and convolutiondepthwise 8 years ago
  Lamply 6612178960 correct arm convolution depthwise mistakes (#246) 8 years ago
  nihui 848c9a1ea7 code clean 8 years ago
  nihui 80fb28de90 unroll outch for convolution 3x3s1, about 10%~20% speed gain 8 years ago
  nihui df218110be unroll num_output for innerproduct, about 60% speed gain 8 years ago
  nihui aaa1ffcef0 emmmm, prefer w h 8 years ago
  nihui d68eb4cd15 wrap benchmark gettimeofday 8 years ago
  Linghan Cheung 811b6ba1b6 print benchmark information for every layer, especially for CONVOLUTION (#241) 8 years ago
  nihuini d2ee4e7d27 ld1 and st1 handle data endian mode per element 8 years ago
  nihui 08e261f423 innerproduct produce continous blob, fix #236 8 years ago
  nihui 682b0d3c0d prelu on vector and image 8 years ago
  nihui 14a2e23407 enable embed layer 8 years ago
  nihui c9789fb879 slice dim 8 years ago
  nihuini 67b80183dd fix param load using external memory 8 years ago