Lamply
6612178960
correct arm convolution depthwise mistakes ( #246 )
8 years ago
nihui
31985b18f8
do not convert to depthwise if group is one
8 years ago
nihui
848c9a1ea7
code clean
8 years ago
nihui
80fb28de90
unroll outch for convolution 3x3s1, about 10%~20% speed gain
8 years ago
nihui
df218110be
unroll num_output for innerproduct, about 60% speed gain
8 years ago
nihui
aaa1ffcef0
emmmm, prefer w h
8 years ago
nihui
d68eb4cd15
wrap benchmark gettimeofday
8 years ago
Linghan Cheung
811b6ba1b6
print benchmark information for every layer, especially for CONVOLUTION ( #241 )
* print benchmark information for every layer, especially for CONVOLUTION, for cross-platform.
* move the function implementation to cpp file to avoid multiple definitions
8 years ago
nihuini
d2ee4e7d27
ld1 and st1 handle data endian mode per element
8 years ago
nihui
08e261f423
innerproduct produce continuous blob, fix #236
8 years ago
nihui
682b0d3c0d
prelu on vector and image
8 years ago
yetiancai
5e358b831d
add softmax activation
8 years ago
nihui
14a2e23407
enable embed layer
8 years ago
nihui
49df53dd7e
mxnet elemwise op is binary op
8 years ago
nihui
c9789fb879
slice dim
8 years ago
nihui
0415f16650
fix split on subncnn blob, convert Embedding
8 years ago
nihui
634c6568ff
parse input sub index, convert element_mul SliceChannel
8 years ago
nihui
80d14dd252
parse attrs
8 years ago
sheen
539440c4f3
add bitcode setting
8 years ago
nihuini
67b80183dd
fix param load using external memory
8 years ago
nihuini
7fc23025d4
unroll outch for convolution 1x1 stride 2, about 15%~55% speed gain
8 years ago
nihuini
ccbb94d835
fix build
8 years ago
nihuini
e471028f53
fix avg pooling in tail pad
8 years ago
nihuini
a84ba8fc0f
element type storage support in Mat, move data member the first so that a pointer to Mat is a pointer to data, convenient index access for float vector
8 years ago
nihuini
8773035891
another implementation for winograd 3x3, about 15%~30% speed gains on small images
8 years ago
nihui
2a62a98e1a
allow constructing paramdict and modelbin from userside
8 years ago
nihui
10b86c2af5
create layer from type name
8 years ago
nihui
118e037f33
arm neon optimize for mat fill
8 years ago
nihui
7a43c45e80
remove deprecated code
8 years ago
nihui
a181d25098
new model load api, fix #215
8 years ago
nihuini
b84ba31c23
enable light mode by default
8 years ago
peng
5ac2de8963
fix shufflechannel
8 years ago
nihuini
df5e04260a
fix conv1x1s1 bug
8 years ago
nihuini
9280a068fe
unroll outch for convolution 3x3 winograd64, reduce memory usage
8 years ago
nihui
1f5c646ee0
pipeline optimize
8 years ago
nihuini
0564021afc
fix armv7 assembly
8 years ago
nihuini
55ec189998
unroll outch for convolution 1x1 stride 1
8 years ago
nihuini
57df1076ff
neon optimize for depthwise convolution 3x3, about 20%~35% speed gain
8 years ago
wind19870521
822214269d
fix pooling2x2s2_max_neon stride bug
error when the image width is odd and stride is 2
8 years ago
vsooda
e85bebbf48
mxnet non-square convolution
8 years ago
nihuini
9e36e2ba0e
strict Werror may cause unexpected compile error
8 years ago
nihuini
a240678299
warning--
8 years ago
vsooda
02cff845ea
fix mobilenet: add depthwise, fix batch norm ( #218 )
* use has_attr to check whether eps exists
8 years ago
zengping
a54f14feca
[fix-compile-warnings] fix compiler warnings, and add werror in CMakeLists.txt ( #217 )
* [fix-compile-warnings] fix compiler warnings, remove ycm_extra_conf.py
8 years ago
nihui
0f52418023
change input param order to w h c, replace caffe MemoryData to Input
8 years ago
nihui
abfd3ea6c8
convert non-square convolution pooling param
8 years ago
nihui
62913964d6
non-square kernel stride and padding w h for pooling
8 years ago
nihui
bdb70a2010
padding w h in convolution and deconvolution
8 years ago
nihui
44b4519307
non-square convolution and deconvolution kernel stride dilation
8 years ago
HustCoderHu
23a3254a7b
fix memcpy error
8 years ago