nihui
|
8275a08950
|
update qcom410 i.mx7 benchmark result
|
8 years ago |
nihuini
|
2368d29a1e
|
more explicit alignment on armv7
|
8 years ago |
nihuini
|
d172a34329
|
direct assembly port, enable convolution 1x1 sgemm on armv7
|
8 years ago |
nihuini
|
b3e24cafc3
|
openmp++
|
8 years ago |
nihuini
|
0fdb8da60e
|
sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq
|
8 years ago |
nihuini
|
2b20bf940c
|
drop armv7 vaddvq_f32 hack
|
8 years ago |
nihui
|
72bb261e7a
|
switch to winograd5
|
8 years ago |
nihuini
|
a234e9240d
|
fix concat on height
|
8 years ago |
nihuini
|
588487a8a0
|
convert caffe crop layer with three offsets, fix #165
|
8 years ago |
nihuini
|
003873c55b
|
crop on channel and crop by param
|
8 years ago |
nihui
|
184cea1ced
|
Update README.md
|
8 years ago |
nihuini
|
fd9ef5716a
|
fix parsing inputs list in multiple lines
|
8 years ago |
Chang, Hui-Tang
|
dc2a689d10
|
fix proposal roi_score_blob bug (#430)
|
8 years ago |
nihuini
|
99a343ce70
|
allocate after permute, reduce peak memory usage
|
8 years ago |
nihuini
|
0ce0c11851
|
load sub-op in advance for group convolution
|
8 years ago |
nihuini
|
86f4264c7c
|
arm neon assembly for winograd5
|
8 years ago |
kyuusaku
|
d2416187dc
|
fix parameter check for interp (#425)
|
8 years ago |
nihuini
|
90643630c2
|
apple a10/a11 is armv8.2-a
|
8 years ago |
nihuini
|
5dc35f2860
|
w h c order
|
8 years ago |
nihuini
|
babbb604e1
|
fix deconvolution weight order
|
8 years ago |
nihuini
|
50e1f0e531
|
const for to_pixels family
|
8 years ago |
nihuini
|
b89851c6b6
|
convert sigmoid
|
8 years ago |
nihuini
|
ce74836e2a
|
yet another winograd convolution implementation, unroll outch 8 tiles 4 inch 4, about 22% faster, more optimizations may come soon :>
|
8 years ago |
nihui
|
18d7b3c3d8
|
Update README.md
|
8 years ago |
唐琦@异构计算
|
ba2fa28268
|
Update README.md (#420)
Add the benchmark of Rockchip RK3399
|
8 years ago |
nihui
|
94d9f393f6
|
ncnn pixel art
|
8 years ago |
nihuini
|
307a77f04b
|
convert LogisticRegressionOutput
|
8 years ago |
唐琦@异构计算
|
0fa92b5e0a
|
update benchmark with q820 and hisi3519 (#407)
* Update README.md
Add HiSilicon Hi3519V101 benchmark, only use the big core :)
|
8 years ago |
nihuini
|
643f2a671b
|
convert _maximum_scalar _minimum_scalar _power_scalar
|
8 years ago |
nihui
|
d169af7bd3
|
convert Elu
|
8 years ago |
nihui
|
99c243183b
|
convert PRelu
|
8 years ago |
nihui
|
e3cb03a408
|
convert Pad, fix #398
|
8 years ago |
nihui
|
8a5b35e47d
|
convert more elemwise operator
|
8 years ago |
nihui
|
30b6cc4ecd
|
rdiv binaryop
|
8 years ago |
nihui
|
2f90a794ad
|
rsub binaryop
|
8 years ago |
nihuini
|
3b8a3f6764
|
convert depthwiseconvolution from yonghenglh6 branch
|
8 years ago |
nihuini
|
d6e1b8207a
|
convert _minus_scalar and _mul_scalar
|
8 years ago |
nihuini
|
a341e7465c
|
reject to load model with empty network, fix #392
|
8 years ago |
Neo
|
4905dc81d8
|
fix bench sqz-ssd and mobile-ssd
squeezenet-ssd and mobilenet-ssd input shape should be 300x300 not 227x227
|
8 years ago |
nihuini
|
356d018771
|
implement Clip and converter support
|
8 years ago |
nihuini
|
b560252af1
|
handle batchnorm scale_factor zero, fix #302
|
8 years ago |
Hyungsuk Yoon
|
8f56e00b4b
|
make convolution with dilation fast
|
8 years ago |
nihuini
|
d7c179f39d
|
armv7 assembly implementation for winograd input output transform, compiler is childish
|
8 years ago |
nihuini
|
3a27cb715f
|
skip background class in gathering
|
8 years ago |
nihui
|
182e92a331
|
Update benchmark.cpp
|
8 years ago |
nihui
|
bc1a84dee5
|
fix mingw32 build
|
8 years ago |
Hyungsuk Yoon
|
b2794ba118
|
rename test_convlution to test_convolution
|
8 years ago |
Hyungsuk Yoon
|
354f515596
|
scanf with width specifier needs (width + 1) spaces
|
8 years ago |
nihui
|
47f21dbff2
|
fix travis build (#367)
* fix travis build
* build protobuf source
* Update .travis.yml
* Update .travis.yml
* parallel make
|
8 years ago |
Yantao Xie
|
89c7aa26f8
|
support scientific notation when parsing layer parameters.
|
8 years ago |