nihuini
394bca8dbb
Merge branch 'master' of https://github.com/Tencent/ncnn
8 years ago
nihuini
9ac305e160
create 3-dim sub blob for group convolution, fix #315
8 years ago
Howave
415bfbdfa7
added arm layer compilation for arm-linux system (#316)
8 years ago
nihuini
318d3abe66
bind register explicitly, fix #306, fix #310, fix #312
8 years ago
Yantao Xie
2e9da1b95b
Add the epsilon parameter to the BatchNorm layer. (fix #303) (#311)
* Add the epsilon parameter to the BatchNorm layer. (fix #303)
* Move the eps into the sqrt.
8 years ago
nihuini
231a52e469
fix build on aarch64 with gcc, fix #309
8 years ago
BUG1989
af7019d3fc
fix compile error (#305)
8 years ago
nihui
875a188d10
pre interleave kernel memory for winograd4, about 3%~20% speed gains
8 years ago
dong
6ea09ebf2c
Use aarch64 assembly to replace arm intrinsics
8 years ago
820169199
656de48631
add "#include <float.h>"
8 years ago
Dong Xu
28154dcb29
fix vst1.f32 of coeff sum at eltwise_arm layer
In line 414, "vmla.f32 q1, q0, %q6 \n", the destination register is q1, not q0, so replace the {d0-d1} on line 416 with {d2-d3}.
8 years ago
nihui
0fd701112e
load LRN bias from param
8 years ago
nihui
7d1e49584d
call Innerproduct for convolution on flattened blob
8 years ago
harhar539
9a8486a823
fix pad tail bug in commit d1ea2a3 at pooling layer
8 years ago
nihui
b1aec69ff9
d31 is useless
8 years ago
nihuini
5e484a47ef
fix build, second try
8 years ago
nihui
5f0fa95f61
fix build
8 years ago
nihui
d1ea2a34b4
rewrite pooling pad scheme, global pooling return continuous blob
8 years ago
nihui
6c4c810fda
decouple modelbin of different input types, simplify timestamp function
8 years ago
nihui
2d4ae30508
fallback to all cores
8 years ago
nihui
03c1f63c2e
switch to winograd4
8 years ago
nihui
bc99d5123b
set smp cpu affinity to all cores
8 years ago
nihuini
098fff355c
implement spatial norm, convert L2Normalization
8 years ago
nihui
5ff6a1808a
emmmm, yet another implementation for winograd 3x3, unroll aggressively for aarch64
8 years ago
YQZ1990
6f13cc5185
slice (#269)
* fix slice dim3
8 years ago
nihuini
bd705d5bdb
inplace binaryop with scalar
8 years ago
nihuini
5f4ac776d1
implement instancenorm
8 years ago
nihuini
db5e805eff
padding_mode for Pooling, fix #261
8 years ago
nihui
2d9410742b
concat slice shufflechannel honor elemsize
8 years ago
nihui
8ccae1d4fd
prevent reuse of param array, fix #258
8 years ago
nihuini
75218953cc
aarch64 assembly for conv1x1s1, unroll outch inch as 8x8
8 years ago
nihuini
76a55693a6
decouple convolutiondepthwise and convolution, reduce binary size by 10%, fix #254
8 years ago
nihuini
3ffb502bc6
reuse if the same shape
8 years ago
nihuini
c6506d6ecd
remaining inch for winograd neon3
8 years ago
nihui
c12fab569f
fix convdw3x3s1 on aarch64
8 years ago
nihui
f133729c78
code style changes
8 years ago
nihuini
03621aa7f9
more x86 stub for convolution and convolutiondepthwise
8 years ago
Lamply
6612178960
correct arm convolution depthwise mistakes (#246)
8 years ago
nihui
848c9a1ea7
code clean
8 years ago
nihui
80fb28de90
unroll outch for convolution 3x3s1, about 10%~20% speed gain
8 years ago
nihui
df218110be
unroll num_output for innerproduct, about 60% speed gain
8 years ago
nihui
aaa1ffcef0
emmmm, prefer w h
8 years ago
nihui
d68eb4cd15
wrap benchmark gettimeofday
8 years ago
Linghan Cheung
811b6ba1b6
print benchmark information for every layer, especially for CONVOLUTION (#241)
* print benchmark information for every layer, especially for CONVOLUTION
* print benchmark information for every layer, especially for CONVOLUTION, for cross-platform.
* move the function implementation to cpp file to avoid multiple definitions
8 years ago
nihuini
d2ee4e7d27
ld1 and st1 handle data endian mode per element
8 years ago
nihui
08e261f423
innerproduct produce continuous blob, fix #236
8 years ago
nihui
682b0d3c0d
prelu on vector and image
8 years ago
nihui
14a2e23407
enable embed layer
8 years ago
nihui
c9789fb879
slice dim
8 years ago
nihuini
67b80183dd
fix param load using external memory
8 years ago