BUG1989 | 1b0e33460d | add armv7 int8 conv3x3s1,using vaddw to replace vadd and vmovl | 7 years ago
nihui | 72411b7a6c | restore the old conv3x3s2 as reference, fast dilation convolution fails on striding | 7 years ago
nihui | 1f20eb4e8c | pack weight and more unroll makes improvement, ~20% faster for conv3x3s2 | 7 years ago
chensy | 30cc738309 | fix asm "invalid operand" error for target iOS armv7 on file dequantize_arm.cpp | 7 years ago
Diego Gomes | 4d73407df8 | fix gettid call for glibc | 7 years ago
Diego Gomes | 534f38ed87 | fix auxv read for elf64 | 7 years ago
nihuini | 2dbaf6f7b7 | store int8 scale in binary | 7 years ago
nihui | fe14037777 | more sub op preload | 7 years ago
nihui | 2fe7ada4d8 | add arm int8 convolution stub, preload group op for x86 | 7 years ago
nihui | eac7c66a97 | fix fp32 group convolution on x86 | 7 years ago
nihui | 5d04a3a45c | layer holds bottom blob scale, depthwise convolution read group scales | 7 years ago
nihui | 354b95256c | bump param version, backward compatible | 7 years ago
nihuini | 2bc504925e | fix int8_scales from multiple blobs, fix #512 | 7 years ago
nihuini | da352916fe | fix pd using flag condition | 8 years ago
nihuini | 6b536701c3 | sub-mat shall be allocator-aware | 8 years ago
nihuini | e34aa7786a | armv7 int8 quantize/dequantize and conv1x1s1 | 8 years ago
nihuini | 4be27a0a89 | int8 inference on x86 | 8 years ago
nihui | a169cec363 | core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table | 8 years ago
nihui | b6b90c888f | high resolution timestamp on windows | 8 years ago
nihui | af49e2cada | install allocator.h | 8 years ago
nihui | 7e1f358084 | fix build on msvc | 8 years ago
nihui | 9706cd1447 | implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469 | 8 years ago
nihui | 5879cb4d15 | sgemm outperform direct conv on large channel | 8 years ago
nihuini | 4b8101e7fc | Revert "optimize interleave section for load first, about 5%~10% speed gain" This reverts commit 1e4eaeeacd. | 8 years ago
nihui | 56a667472a | sgemm is always faster on common channel size | 8 years ago
nihui | 1e4eaeeacd | optimize interleave section for load first, about 5%~10% speed gain | 8 years ago
nihui | 6895cbf810 | single vldm is faster than two vld1 on armv7, and some pipeline optimize | 8 years ago
nihuini | 05d7562a5d | reorder kernel weight, pipeline friendly ;) | 8 years ago
nihui | b8f4f024a4 | implement reorg yolodetectionoutput layer from caffe-yolov2 | 8 years ago
nihuini | ee98817446 | proper first row/col handling in resize family, fix #429 | 8 years ago
nihuini | 511baa6718 | optional image pixel api, fix #434 | 8 years ago
nihuini | 2368d29a1e | more explicit alignment on armv7 | 8 years ago
nihuini | d172a34329 | direct assembly port, enable convolution 1x1 sgemm on armv7 | 8 years ago
nihuini | b3e24cafc3 | openmp++ | 8 years ago
nihuini | 0fdb8da60e | sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq | 8 years ago
nihuini | 2b20bf940c | drop armv7 vaddvq_f32 hack | 8 years ago
nihui | 72bb261e7a | switch to winograd5 | 8 years ago
nihuini | a234e9240d | fix concat on height | 8 years ago
nihuini | 003873c55b | crop on channel and crop by param | 8 years ago
Chang, Hui-Tang | dc2a689d10 | fix proposal roi_score_blob bug (#430) | 8 years ago
nihuini | 99a343ce70 | allocate after permute, reduce peak memory usage | 8 years ago
nihuini | 0ce0c11851 | load sub-op in advance for group convolution | 8 years ago
nihuini | 86f4264c7c | arm neon assembly for winograd5 | 8 years ago
kyuusaku | d2416187dc | fix parameter check for interp (#425) | 8 years ago
nihuini | 90643630c2 | apple a10/a11 is armv8.2-a | 8 years ago
nihuini | 50e1f0e531 | const for to_pixels family | 8 years ago
nihuini | ce74836e2a | yet another winograd convolution implementation, unroll outch 8 tiles 4 inch 4, about 22% faster, more optimization may comes soon :> | 8 years ago
nihui | 30b6cc4ecd | rdiv binaryop | 8 years ago
nihui | 2f90a794ad | rsub binaryop | 8 years ago
nihuini | a341e7465c | reject to load model with empty network, fix #392 | 8 years ago