nihui | af49e2cada | install allocator.h | 7 years ago |
nihui | ae467fee25 | project-wide NOMINMAX on msvc | 7 years ago |
nihui | 7e1f358084 | fix build on msvc | 7 years ago |
nihui | 9706cd1447 | implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469 | 7 years ago |
nihui | 5879cb4d15 | sgemm outperform direct conv on large channel | 7 years ago |
nihui | 20c0794b36 | Update README.md | 7 years ago |
nihuini | 4b8101e7fc | Revert "optimize interleave section for load first, about 5%~10% speed gain" This reverts commit 1e4eaeeacd. | 7 years ago |
nihui | 56a667472a | sgemm is always faster on common channel size | 8 years ago |
nihui | 1e4eaeeacd | optimize interleave section for load first, about 5%~10% speed gain | 8 years ago |
Qu Xiaofeng / 曲晓峰 | d0cad77a15 | Fixed two typos (#466) | 8 years ago |
nihui | 6895cbf810 | single vldm is faster than two vld1 on armv7, and some pipeline optimize | 8 years ago |
nihuini | 05d7562a5d | reorder kernel weight, pipeline friendly ;) | 8 years ago |
nihuini | 0bbdbf4ff8 | add mobilenet-yolo | 8 years ago |
nihuini | 543d764674 | fix yolo preprocess, comment about mobilenet-yolo | 8 years ago |
nihui | 5c6ef31e07 | -x | 8 years ago |
nihui | eb089c0b32 | add yolov2 example | 8 years ago |
nihui | a94e5adfd1 | fix debug build | 8 years ago |
nihui | 0b6791e2ba | convert BN ReLU6 Reorg YoloDetectionOutput Embed LSTM | 8 years ago |
nihui | b8f4f024a4 | implement reorg yolodetectionoutput layer from caffe-yolov2 | 8 years ago |
kalcohol | 8491f2b6a3 | fix error C2059 and C2589 when using std::min and std::max. (#456) | 8 years ago |
BUG1989 | b3965e26cb | Update README.md (#452) | 8 years ago |
nihuini | ee98817446 | proper first row/col handling in resize family, fix #429 | 8 years ago |
nihuini | 511baa6718 | optional image pixel api, fix #434 | 8 years ago |
nihui | 74d1c1470f | update qcom810 iphone5s benchmark result | 8 years ago |
nihui | 8275a08950 | update qcom410 i.mx7 benchmark result | 8 years ago |
nihuini | 2368d29a1e | more explicit alignment on armv7 | 8 years ago |
nihuini | d172a34329 | direct assembly port, enable convolution 1x1 sgemm on armv7 | 8 years ago |
nihuini | b3e24cafc3 | openmp++ | 8 years ago |
nihuini | 0fdb8da60e | sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq | 8 years ago |
nihuini | 2b20bf940c | drop armv7 vaddvq_f32 hack | 8 years ago |
nihui | 72bb261e7a | switch to winograd5 | 8 years ago |
nihuini | a234e9240d | fix concat on height | 8 years ago |
nihuini | 588487a8a0 | convert caffe crop layer with three offset, fix #165 | 8 years ago |
nihuini | 003873c55b | crop on channel and crop by param | 8 years ago |
nihui | 184cea1ced | Update README.md | 8 years ago |
nihuini | fd9ef5716a | fix parsing inputs list in multiple lines | 8 years ago |
Chang, Hui-Tang | dc2a689d10 | fix proposal roi_score_blob bug (#430) | 8 years ago |
nihuini | 99a343ce70 | allocate after permute, reduce peak memory usage | 8 years ago |
nihuini | 0ce0c11851 | load sub-op in advance for group convolution | 8 years ago |
nihuini | 86f4264c7c | arm neon assembly for winograd5 | 8 years ago |
kyuusaku | d2416187dc | fix parameter check for interp (#425) | 8 years ago |
nihuini | 90643630c2 | apple a10/a11 is armv8.2-a | 8 years ago |
nihuini | 5dc35f2860 | w h c order | 8 years ago |
nihuini | babbb604e1 | fix deconvolution weight order | 8 years ago |
nihuini | 50e1f0e531 | const for to_pixels family | 8 years ago |
nihuini | b89851c6b6 | convert sigmoid | 8 years ago |
nihuini | ce74836e2a | yet another winograd convolution implementation, unroll outch 8 tiles 4 inch 4, about 22% faster, more optimization may comes soon :> | 8 years ago |
nihui | 18d7b3c3d8 | Update README.md | 8 years ago |
唐琦@异构计算 | ba2fa28268 | Update README.md (#420) Add the benchmark of Rockchip RK3399 | 8 years ago |
nihui | 94d9f393f6 | ncnn pixel art | 8 years ago |