nihuini
bffb2af2ff
fix build on armv7 without neon
6 years ago
nihuini
dc589351c1
link android
6 years ago
nihuini
a170ef1acf
remove the default option usage in layer interface, fix write out of range in cast arm pack4, handle fp16p conversion on cpu/gpu transfer
6 years ago
nihuini
336d1c1edd
remove the ncnn namespace for in source Option
6 years ago
nihui
90f04bab26
minor index optimization
6 years ago
nihuini
e73b06bbb8
fix build with NCNN_STRING=OFF
6 years ago
zyvv
bbcd5db817
Improvement for winograd43 int8 convolution ( #1354 )
* Update convolution_3x3_int8.h
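* Minor improvements for winograd43 int8 convolution
1. Use non-uniform quantization coefficients, which fixes the overflow in the weight transform
2. Assembly optimization of the output transform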
* Update convolution_3x3_int8.h
* Update convolution_3x3_int8.h
* Update convolution_3x3_int8.h
6 years ago
andywangII
66d7ac2463
arm-hisim100-linux make error convolutiondepthwise_3x3_int8.h for asm ( #1310 )
* arm-himix100-linux make error convolutiondepthwise_3x3_int8.h
* arm-hisim100-linux make error convolutiondepthwise_3x3_int8.h for asm
* convolutiondepthwise_3x3_int8.h f
6 years ago
Natsu
f11b772dd1
Fix min/max macro issue ( #1346 )
* Fix min/max macro issue
* Make NOMINMAX public; since benchmark is built under CMake, it will no longer be affected by the min/max macros
6 years ago
Joshua
96a2499051
fix spp::copy_make_border ( #1345 )
6 years ago
Christopher
2f26697aea
fix err in priorbox ( #1343 )
6 years ago
nihui
a4a162e36d
work around validation layer complaint "Cannot form constants of 8- or 16-bit types", caused by specialization constants conversion
6 years ago
nihuini
b2758bd9a0
fix deconvolution vulkan output crop
6 years ago
yehao
60d4ff09d0
add mobilenetv3-ssd ( #1335 )
* update
* Update mobilenetv3ssdlite.cpp
* Update mobilenetv3ssdlite.cpp
* Update mobilenetv3ssdlite.cpp
6 years ago
ShuangLiu1992
ecfc09bbc8
position independent code ( #1333 )
* web
* add disable pic option
6 years ago
nihui
2905639890
fix null fp handling
6 years ago
Guocai He
bbbaedda57
fix Crop bugs ( #1326 )
* fix crop bugs
* fix crop_arm and crop_vulkan bugs in w_offset, h_offset, c_offset
6 years ago
nihuini
64333429bb
data reader wrapper, fix #1325
6 years ago
nihuini
567e2bd501
a dirty hack for resolving int8 pack4 crash
6 years ago
bindog
9dfd1b05d3
fix bug in reduction ( #1321 )
6 years ago
nihuini
e8bb88830d
convert mxnet squeeze expanddims, convert onnx squeeze unsqueeze
6 years ago
nihuini
a6c60068e6
convert numpy style slice to crop
6 years ago
nihuini
009c4d9a75
convert mxnet reduce axis and keepdims, pad reflect, fix #739
6 years ago
nihuini
c4b84262d9
fix arm neon sgemm, fix #1283
6 years ago
nihuini
65ce6bccfd
faster weight transform for optimized kernel
6 years ago
BUG1989
69e2693c87
fix bug where cpu powersave was not supported on SMP
6 years ago
nihuini
cd4be6d0fa
call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks
6 years ago
nihuini
81a028547a
fix bus error on armv7
6 years ago
nihuini
19d75955d6
arm neon assembly optimization for conv3x3s1 winograd pack4to1
6 years ago
bindog
04b4b02324
[WIP] add reduce op support for onnx ( #1308 )
* [WIP] add reduce op support for onnx
* extend reduction to support 1,2-dim reduction and keepdims
* fix compile error
* split type to 3 flags && split keepdims to another function
6 years ago
nihuini
22a2be4e6c
fix crop pack4 with reference blob
6 years ago
nihuini
6a8e5c58da
fix build on armv7
6 years ago
nihuini
e63e2449fd
arm neon assembly optimization for conv7x7s2 pack1to4
6 years ago
nihui
56fd26a2da
arm neon assembly optimization for conv1x1s1 pack4to1
6 years ago
nihui
7ad514917b
fix potential out-of-bounds write on unroll 12 remainder
6 years ago
nihuini
15e86dc8e9
reduce pack4 weight memory usage for specialized kernel, reduce runtime memory usage in conv3x3s1 winograd
6 years ago
nihuini
581a06d471
innerproduct pack4 always consumes a flattened blob whose layout is the same as in the pack1 branch, so reuse the pack1 implementation to reduce memory usage
6 years ago
nihuini
c5f1dc3fe4
arm neon assembly optimization for conv3x3s1 pack4to1
6 years ago
nihui
2f8b31c3b4
unroll outch 2 for conv3x3s1 pack1to4
6 years ago
nihui
e0f6e3f669
pre-interleave 8-channel weight data on aarch64, conv1x1s1 version
6 years ago
nihuini
d11bf14d44
pre-interleave 8-channel weight data on aarch64
6 years ago
nihui
7173b6e38e
arm neon assembly optimization for conv3x3s2 pack4
6 years ago
nihuini
cf0c49dd71
arm neon assembly optimization for conv5x5s1 pack4 and conv5x5s2 pack4
6 years ago
nihui
9e529354fb
arm neon optimization for conv1x1s2 pack4
6 years ago
nihuini
f8f3b0b5aa
shufflechannel pack4
6 years ago
nihuini
50d5896ce7
reshape pack4
6 years ago
nihuini
624291e2b2
use subop optimization for group convolution deconvolution pack4 family
6 years ago
nihui
48e3e7d49c
move neon activation into a wrapper function
6 years ago
nihui
8c1b87b1a2
fallback to cpu if no vulkan device found
6 years ago
nihui
b37ecab630
auto flatten before innerproduct pack4
6 years ago