tpoisonooo
acbaaa665b
fix compile warnings for unused parameter ( #4131 )
3 years ago
Lry89757
00c08d7bda
[Batchnorm x86] Merge the multiple elempack ( #4085 )
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
LinHe
03f2ad38ce
Layer Norm x86 SIMD Optimizations ( #4065 )
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
nihui
b4ba207c18
more strict compiler rvv checks, drop rvv-071 support ( #4094 )
3 years ago
nihui
0666143513
fix vulkan winograd weight layout with cooperative matrix enabled ( #4093 )
3 years ago
miemie2013
720f3c9aab
Add DeformableConv2D ( #4070 )
* Add DeformableConv2D
* add unittest and docs
* pnnx torchvision deformconv2d conversion
Co-authored-by: miemie2013 <miemie2013@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
nihui
4f414c1806
implement 4d memorydata ( #4074 )
* implement 4d memorydata
* fix ncnnoptimize memorydata 4d
3 years ago
Lry89757
13a9533984
[BatchNorm Optimize x86] AVX512 intrinsic ( #4061 )
* Add the test samples for elempack==16
* Add the AVX512 Support for batchnorm
3 years ago
nihui
30ab31cc41
add address sanitizer ci, fix potential memory leak reported by asan ( #4058 )
3 years ago
nihui
0ea7a672fa
fix undefined reference to vkGetAndroidHardwareBufferPropertiesANDROID, add android-29 shared ci ( #4056 )
3 years ago
nihui
4bc4a5ed0b
check mat create oom ( #4054 )
3 years ago
nihui
1d0917c83b
fix build with very old gcc ( #4048 )
* clear bom marker, avoid vector data function
3 years ago
nihui
b0c40fa644
unified arm eltwise elempack ( #4040 )
3 years ago
nihui
76849cede4
armv8.4 i8mm optimization for convolution gemm int8 ( #4034 )
3 years ago
nihui
dd86cebab8
armv8.6 ci and coverage ( #4025 )
* asimdfhm in fc
* move neon bf16 conversion function to arm_usability header
* fix cmake option
* fix build with newer gcc
* arm84 coverage
* arm asimdfhm optimization for innerproduct gemm fp16s
3 years ago
nihui
f1ea792b26
fix too many microtask error in old libomp runtime ( #4002 )
3 years ago
nihui
9b8272e86d
arm edsp and arm neon optimization for convolution int8 winograd ( #4017 )
3 years ago
nihui
a12cd7c212
mips msa and loongson mmi optimization for convolution int8 winograd f43 ( #4014 )
3 years ago
nihui
5725c028c0
arm dsp infrastructure and optimization for convolution gemm int8 ( #4011 )
3 years ago
nihui
ef216f732e
armv5 optimization for convolution gemm int8 ( #4010 )
3 years ago
nihui
0a12f81a2d
fix data race in arm rnn/gru/lstm ( #4008 )
3 years ago
nihui
a5fb92db51
optimize innerproduct fp16s transform kernel ( #3994 )
3 years ago
sodo
3b3605eec4
add pkgconfig ( #3984 )
Signed-off-by: sodo <djdisodo@gmail.com>
3 years ago
nihui
8dbedf8a19
use cmake gnuinstalldirs for install destination ( #3968 )
3 years ago
nihui
b85bfb6085
armv8.2 asimdfhm and armv8.4 bf16 i8mm and armv8.6 sve sve2 compiler flags and runtime detection functions ( #3964 )
3 years ago
Yoh
a4ccad3325
Fix gcc4 simd conflict ( #3957 )
* fix mat::fill gcc4 avx sse conflict bug
* fix build and crash with gcc 4.4.0
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
nihui
27dc780005
mips msa optimization for innerproduct fp16s ( #3953 )
4 years ago
nihui
706831f8a9
arm vfpv4 optimization for innerproduct ( #3950 )
4 years ago
nihui
440bfdd2cc
x86 f16c optimization for innerproduct ( #3944 )
4 years ago
nihui
1fd7138d2f
armv7 vfpv4 infrastructure ( #3929 )
* armv7 vfpv4 infrastructure
* optional fp16 format ieee
* arm neon assembly optimization for cast fp16/bf16
4 years ago
nihui
067e8e1d92
mips unified elempack for elementwise layers ( #3928 )
4 years ago
nihui
d4ee0853a5
mips msa optimization for cast fp16 ( #3927 )
4 years ago
nihui
1377acf945
avx512 bf16 fp16 infrastructure ( #3926 )
4 years ago
nihui
8c06103132
riscv convolution winograd dot function and strategy ( #3921 )
* riscv convolution dot function
* move rvv ci to centos
* apply code-format changes
* fix gcov path
* move newlib riscv ci to centos
4 years ago
nihui
e7ca89853e
mips convolution winograd dot function and strategy ( #3925 )
4 years ago
nihui
20a14bf5ae
arm convolution winograd dot function, adjust arm convolution winograd strategy ( #3915 )
4 years ago
nihui
ca0ba4b25f
fine grained winograd options, adjust x86 convolution winograd strategy ( #3908 )
* fine grained winograd options
* x86 optimization for convolution winograd f23 pack4/pack8/pack16
* fix avx512 and t4 ci
* fix fast direct conv path
* winograd63 is actually slower than winograd43 on very large channel counts
4 years ago
Evgeny Proydakov
1c8f1ba7c6
Fixed LGTM warnings ( #3891 )
src/stb_image.h
src/layer/detectionoutput.cpp
tools/quantize/imreadwrite.cpp
4 years ago
Evgeny Proydakov
86a785c4aa
Fixed linux-gcc noint8t build ( #3888 )
4 years ago
Evgeny Proydakov
4033670a6b
Fixed compile warning in linux-gcc-arm64 build arm82 ( #3889 )
4 years ago
Evgeny Proydakov
85e483e6ba
Fixed several compile warnings for ios build ( #3885 )
4 years ago
nihui
9376ba71c1
less unroll for unaryop arm, fix padding arm warning
4 years ago
nihui
7886e90c65
split arm82 source for smaller binary and memory footprint ( #3877 )
* split arm82 source, wip
* check compiler arm82 only for arm 64bit target
* drop arm82 registry
* strict check compiler support arm82
4 years ago
nihui
c1f9b03c0b
unified arm absval clip relu dropout hardsigmoid hardswish sigmoid swish unaryop ( #3876 )
4 years ago
nihui
40a69a2dd3
discard riscv weight memory ( #3874 )
* discard riscv innerproduct weight
* drop riscv conv convdw weight
* drop riscv deconv deconvdw weight
4 years ago
nihui
06a36e9c1f
discard weight memory for mips ( #3869 )
4 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
nihui
d2e87a8264
mips general optimization for convdw3x3 ( #3859 )
4 years ago
nihui
48fb166a48
mips loongson mmi optimization for convolution gemm int8 ( #3855 )
4 years ago
nihui
667be10fb0
riscv general optimization for convolution sgemm and winograd and innerproduct ( #3857 )
* riscv general optimization for convolution sgemm and winograd pack1
* riscv general optimization for innerproduct
* riscv general optimization for convdw3x3
4 years ago