Howave
|
415bfbdfa7
|
added arm layer compilation for arm-linux system (#316)
|
8 years ago |
nihuini
|
318d3abe66
|
bind register explicitly, fix #306, fix #310, fix #312
|
8 years ago |
Yantao Xie
|
2e9da1b95b
|
Add the epsilon parameter to the BatchNorm layer. (fix #303) (#311)
* Add the epsilon parameter to the BatchNorm layer. (fix #303)
* Move the eps into the sqrt.
|
8 years ago |
nihuini
|
231a52e469
|
fix build on aarch64 with gcc, fix #309
|
8 years ago |
BUG1989
|
4ebab2725d
|
Update benchmark ReadMe (#308)
|
8 years ago |
BUG1989
|
af7019d3fc
|
fix compile error (#305)
|
8 years ago |
nihui
|
d7e31987fa
|
Update README.md
|
8 years ago |
nihui
|
004632d9e5
|
Update README.md
|
8 years ago |
nihui
|
ce9eeaba9a
|
Update README.md
|
8 years ago |
nihui
|
875a188d10
|
pre interleave kernel memory for winograd4, about 3%~20% speed gains
|
8 years ago |
nihui
|
d2c01019aa
|
fix convert depthwise deconvolution, fix #300
|
8 years ago |
dong
|
6ea09ebf2c
|
Use aarch64 assembly to replace arm intrinsics
|
8 years ago |
nihui
|
0fe4c6a757
|
Update README.md
|
8 years ago |
820169199
|
656de48631
|
add "#include <float.h>"
|
8 years ago |
Dong Xu
|
28154dcb29
|
fix vst1.f32 of coeff sum at eltwise_arm layer
In line 414, "vmla.f32 q1, q0, %q6 \n", the destination register is q1, not q0, so replace the {d0-d1} on line 416 with {d2-d3}.
|
8 years ago |
nihui
|
57f89a0245
|
convert MatMul
|
8 years ago |
nihui
|
4d6fa6cc79
|
convert Constant
|
8 years ago |
nihui
|
875a042da8
|
convert LRN bias
|
8 years ago |
nihui
|
0fd701112e
|
load LRN bias from param
|
8 years ago |
nihui
|
7d1e49584d
|
call Innerproduct for convolution on flattened blob
|
8 years ago |
nihui
|
caf105abc5
|
convert BinaryOp
|
8 years ago |
harhar539
|
9a8486a823
|
1.fix pad tail bug in commit d1ea2a3 at pooling layer
|
8 years ago |
nihui
|
b1aec69ff9
|
d31 is useless
|
8 years ago |
nihuini
|
5e484a47ef
|
fix build, second try
|
8 years ago |
nihui
|
5f0fa95f61
|
fix build
|
8 years ago |
nihui
|
9c05e48e87
|
Update README.md
|
8 years ago |
nihui
|
ecaadb20c6
|
fix result blob in squeezenet example
|
8 years ago |
nihui
|
68f016936d
|
convert Dropout Sum and InnerProduct-like Gemm, inception_v1 works :P
|
8 years ago |
nihui
|
d1ea2a34b4
|
rewrite pooling pad scheme, global pooling return continuous blob
|
8 years ago |
nihui
|
5cdcf33cfc
|
convert softmax, squeezenet model works :D
|
8 years ago |
nihui
|
a335ae840f
|
add mobilenet v2 result
|
8 years ago |
nihui
|
0d26710f3d
|
add mobilenet_v2 benchmark
|
8 years ago |
nihui
|
232da265cb
|
parse node attr
|
8 years ago |
nihui
|
405faed2ab
|
readme for benchncnn
|
8 years ago |
nihui
|
e245f1f761
|
benchmark routine :)
|
8 years ago |
nihui
|
6c4c810fda
|
decouple modelbin of different input types, simplify timestamp function
|
8 years ago |
nihui
|
2d4ae30508
|
fallback to all cores
|
8 years ago |
nihui
|
cace367080
|
prefer the toolchain file bundled with android ndk
|
8 years ago |
nihui
|
03c1f63c2e
|
switch to winograd4
|
8 years ago |
nihui
|
bc99d5123b
|
set smp cpu affinity to all cores
|
8 years ago |
nihuini
|
f27b9f7791
|
convert expand_dims
|
8 years ago |
nihuini
|
098fff355c
|
implement spatial norm, convert L2Normalization
|
8 years ago |
nihui
|
5ff6a1808a
|
emmmm, yet another implementation for winograd 3x3, unroll aggressively for aarch64
|
8 years ago |
YQZ1990
|
6f13cc5185
|
slice (#269)
* fix slice dim3
|
8 years ago |
nihuini
|
bd705d5bdb
|
inplace binaryop with scalar
|
8 years ago |
nihuini
|
513a5fad73
|
convert Deconvolution and InstanceNorm, fix elu leaky slope
|
8 years ago |
nihuini
|
5f4ac776d1
|
implement instancenorm
|
8 years ago |
nihuini
|
f4f5a96b9f
|
fix default concat dim
|
8 years ago |
nihuini
|
e2b11f3b91
|
convert more op ...
|
8 years ago |
nihuini
|
eea00ec8ec
|
parse onnx graph
|
8 years ago |