nihui
6c4c810fda
decouple modelbin of different input types, simplify timestamp function
8 years ago
nihui
2d4ae30508
fallback to all cores
8 years ago
nihui
03c1f63c2e
switch to winograd4
8 years ago
nihui
bc99d5123b
set smp cpu affinity to all cores
8 years ago
nihuini
098fff355c
implement spatial norm, convert L2Normalization
8 years ago
nihui
5ff6a1808a
emmmm, yet another implementation for winograd 3x3, unroll aggressively for aarch64
8 years ago
YQZ1990
6f13cc5185
slice (#269)
* fix slice dim3
8 years ago
nihuini
bd705d5bdb
inplace binaryop with scalar
8 years ago
nihuini
5f4ac776d1
implement instancenorm
8 years ago
nihuini
db5e805eff
padding_mode for Pooling, fix #261
8 years ago
nihui
2d9410742b
concat slice shufflechannel honor elemsize
8 years ago
nihui
8ccae1d4fd
prevent reuse of param array, fix #258
8 years ago
nihuini
75218953cc
aarch64 assembly for conv1x1s1, unroll outch inch as 8x8
8 years ago
nihuini
76a55693a6
decouple convolutiondepthwise and convolution, reduce binary size by 10%, fix #254
8 years ago
nihuini
3ffb502bc6
reuse if the same shape
8 years ago
nihuini
c6506d6ecd
remaining inch for winograd neon3
8 years ago
nihui
c12fab569f
fix convdw3x3s1 on aarch64
8 years ago
nihui
f133729c78
code style changes
8 years ago
nihuini
03621aa7f9
more x86 stub for convolution and convolutiondepthwise
8 years ago
Lamply
6612178960
correct arm convolution depthwise mistakes (#246)
8 years ago
nihui
848c9a1ea7
code clean
8 years ago
nihui
80fb28de90
unroll outch for convolution 3x3s1, about 10%~20% speed gain
8 years ago
nihui
df218110be
unroll num_output for innerproduct, about 60% speed gain
8 years ago
nihui
aaa1ffcef0
emmmm, prefer w h
8 years ago
nihui
d68eb4cd15
wrap benchmark gettimeofday
8 years ago
Linghan Cheung
811b6ba1b6
print benchmark information for every layer, especially for CONVOLUTION (#241)
* print benchmark information for every layer, especially for CONVOLUTION
* print benchmark information for every layer, especially for CONVOLUTION, for cross-platform.
* move the function implementation to cpp file to avoid multiple definitions
8 years ago
nihuini
d2ee4e7d27
ld1 and st1 handle data endian mode per element
8 years ago
nihui
08e261f423
innerproduct produce continuous blob, fix #236
8 years ago
nihui
682b0d3c0d
prelu on vector and image
8 years ago
nihui
14a2e23407
enable embed layer
8 years ago
nihui
c9789fb879
slice dim
8 years ago
nihuini
67b80183dd
fix param load using external memory
8 years ago
nihuini
7fc23025d4
unroll outch for convolution 1x1 stride 2, about 15%~55% speed gain
8 years ago
nihuini
ccbb94d835
fix build
8 years ago
nihuini
e471028f53
fix avg pooling in tail pad
8 years ago
nihuini
a84ba8fc0f
element type storage support in Mat, move the data member first so that a pointer to Mat is a pointer to data, convenient index access for float vector
8 years ago
nihuini
8773035891
another implementation for winograd 3x3, about 15%~30% speed gains on small images
8 years ago
nihui
2a62a98e1a
allow constructing paramdict and modelbin from userside
8 years ago
nihui
10b86c2af5
create layer from type name
8 years ago
nihui
118e037f33
arm neon optimize for mat fill
8 years ago
nihui
7a43c45e80
remove deprecated code
8 years ago
nihui
a181d25098
new model load api, fix #215
8 years ago
nihuini
b84ba31c23
enable light mode by default
8 years ago
peng
5ac2de8963
fix shufflechannel
8 years ago
nihuini
df5e04260a
fix conv1x1s1 bug
8 years ago
nihuini
9280a068fe
unroll outch for convolution 3x3 winograd64, reduce memory usage
8 years ago
nihui
1f5c646ee0
pipeline optimize
8 years ago
nihuini
0564021afc
fix armv7 assembly
8 years ago
nihuini
55ec189998
unroll outch for convolution 1x1 stride 1
8 years ago
nihuini
57df1076ff
neon optimize for depthwise convolution 3x3, about 20%~35% speed gain
8 years ago