Lry89757
5eb56b2ea5
[Gelu x86] Finish intrinsic with elempack merged(fast version) ( #4144 )
* Finish the gelu x86 intrinsics
* Finish the fast tanh x86 simd impl
3 years ago
Lry89757
9f59711338
[Prelu x86] Finish intrinsic with elempack merged ( #4177 )
3 years ago
luqiang guo
5148224516
optimize softmax arm neon ( #4171 )
3 years ago
Menci
479a73a62a
remove duplicated newline ( #4188 )
3 years ago
Molly Sophia
1d7b2172cc
remove duplicated newline ( #4187 )
3 years ago
Lry89757
9278f90114
[Elu x86] Finish intrinsic with elempack merged ( #4153 )
3 years ago
nanjoin
3c0096c548
fix ConvolutionDepthwise allocator not updated ( #4173 )
3 years ago
tpoisonooo
acbaaa665b
fix compile warnings for unused parameter ( #4131 )
3 years ago
Lry89757
00c08d7bda
[Batchnorm x86] Merge the multiple elempack ( #4085 )
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
LinHe
03f2ad38ce
Layer Norm x86 SIMD Optimizations ( #4065 )
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
nihui
b4ba207c18
more strict compiler rvv checks, drop rvv-071 support ( #4094 )
3 years ago
nihui
0666143513
fix vulkan winograd weight layout with cooperative matrix enabled ( #4093 )
3 years ago
miemie2013
720f3c9aab
Add DeformableConv2D ( #4070 )
* Add DeformableConv2D
* add unittest and docs
* pnnx torchvision deformconv2d conversion
Co-authored-by: miemie2013 <miemie2013@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
nihui
4f414c1806
implement 4d memorydata ( #4074 )
* implement 4d memorydata
* fix ncnnoptimize memorydata 4d
3 years ago
Lry89757
13a9533984
[BatchNorm Optimize x86] AVX512 intrinsic ( #4061 )
* Add the test samples for elempack==16
* Add the AVX512 Support for batchnorm
3 years ago
nihui
30ab31cc41
add address sanitizer ci, fix potential memory leak shouted by asan ( #4058 )
3 years ago
nihui
0ea7a672fa
fix undefined reference to vkGetAndroidHardwareBufferPropertiesANDROID, add android-29 shared ci ( #4056 )
3 years ago
nihui
4bc4a5ed0b
check mat create oom ( #4054 )
3 years ago
nihui
1d0917c83b
fix build with very old gcc ( #4048 )
* clear bom marker, avoid vector data function
3 years ago
nihui
b0c40fa644
unified arm eltwise elempack ( #4040 )
3 years ago
nihui
76849cede4
armv8.4 i8mm optimization for convolution gemm int8 ( #4034 )
3 years ago
nihui
dd86cebab8
armv8.6 ci and coverage ( #4025 )
* asimdfhm in fc
* move neon bf16 conversion function to arm_usability header
* fix cmake option
* fix build with newer gcc
* arm84 coverage
* arm asimdfhm optimization for innerproduct gemm fp16s
3 years ago
nihui
f1ea792b26
fix too many microtask error in old libomp runtime ( #4002 )
3 years ago
nihui
9b8272e86d
arm edsp and arm neon optimization for convolution int8 winograd ( #4017 )
3 years ago
nihui
a12cd7c212
mips msa and loongson mmi optimization for convolution int8 winograd f43 ( #4014 )
3 years ago
nihui
5725c028c0
arm dsp infrastructure and optimization for convolution gemm int8 ( #4011 )
3 years ago
nihui
ef216f732e
armv5 optimization for convolution gemm int8 ( #4010 )
3 years ago
nihui
0a12f81a2d
fix data race in arm rnn/gru/lstm ( #4008 )
3 years ago
nihui
a5fb92db51
optimize innerproduct fp16s transform kernel ( #3994 )
3 years ago
sodo
3b3605eec4
add pkgconfig ( #3984 )
Signed-off-by: sodo <djdisodo@gmail.com>
3 years ago
nihui
8dbedf8a19
use cmake gnuinstalldirs for install destination ( #3968 )
3 years ago
nihui
b85bfb6085
armv8.2 asimdfhm and armv8.4 bf16 i8mm and armv8.6 sve sve2 compiler flags and runtime detection functions ( #3964 )
3 years ago
Yoh
a4ccad3325
Fix gcc4 simd conflict ( #3957 )
* fix mat::fill gcc4 avx sse conflict bug
* fix build and crash with gcc 4.4.0
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
nihui
27dc780005
mips msa optimization for innerproduct fp16s ( #3953 )
3 years ago
nihui
706831f8a9
arm vfpv4 optimization for innerproduct ( #3950 )
3 years ago
nihui
440bfdd2cc
x86 f16c optimization for innerproduct ( #3944 )
3 years ago
nihui
1fd7138d2f
armv7 vfpv4 infrastructure ( #3929 )
* armv7 vfpv4 infrastructure
* optional fp16 format ieee
* arm neon assembly optimization for cast fp16/bf16
3 years ago
nihui
067e8e1d92
mips unified elempack for elementwise layers ( #3928 )
3 years ago
nihui
d4ee0853a5
mips msa optimization for cast fp16 ( #3927 )
3 years ago
nihui
1377acf945
avx512 bf16 fp16 infrastructure ( #3926 )
3 years ago
nihui
8c06103132
riscv convolution winograd dot function and strategy ( #3921 )
* riscv convolution dot function
* move rvv ci to centos
* apply code-format changes
* fix gcov path
* move newlib riscv ci to centos
3 years ago
nihui
e7ca89853e
mips convolution winograd dot function and strategy ( #3925 )
3 years ago
nihui
20a14bf5ae
arm convolution winograd dot function, adjust arm convolution winograd strategy ( #3915 )
3 years ago
nihui
ca0ba4b25f
fine grained winograd options, adjust x86 convolution winograd strategy ( #3908 )
* fine grained winograd options
* x86 optimization for convolution winograd f23 pack4/pack8/pack16
* fix avx512 and t4 ci
* fix fast direct conv path
* winograd63 is actually slower than winograd43 on very large channel
3 years ago
Evgeny Proydakov
1c8f1ba7c6
Fixed LGTM warnings ( #3891 )
src/stb_image.h
src/layer/detectionoutput.cpp
tools/quantize/imreadwrite.cpp
4 years ago
Evgeny Proydakov
86a785c4aa
Fixed linux-gcc noint8t build: ( #3888 )
4 years ago
Evgeny Proydakov
4033670a6b
Fixed compile warning in linux-gcc-arm64 build arm82: ( #3889 )
4 years ago
Evgeny Proydakov
85e483e6ba
Fixed several compile warnings for ios build: ( #3885 )
4 years ago
nihui
9376ba71c1
less unroll for unaryop arm, fix padding arm warning
4 years ago
nihui
7886e90c65
split arm82 source for smaller binary and memory footprint ( #3877 )
* split arm82 source, wip
* check compiler arm82 only for arm 64bit target
* drop arm82 registry
* strict check compiler support arm82
4 years ago