Lry89757
5eb56b2ea5
[Gelu x86] Finish intrinsic with elempack merged (fast version) ( #4144 )
* Finish the gelu x86 intrinsics
* Finish the fast tanh x86 simd impl
3 years ago
Lry89757
9f59711338
[Prelu x86] Finish intrinsic with elempack merged ( #4177 )
3 years ago
Lry89757
9278f90114
[Elu x86] Finish intrinsic with elempack merged ( #4153 )
3 years ago
LinHe
03f2ad38ce
Layer Norm x86 SIMD Optimizations ( #4065 )
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
miemie2013
720f3c9aab
Add DeformableConv2D ( #4070 )
* Add DeformableConv2D
* add unittest and docs
* pnnx torchvision deformconv2d conversion
Co-authored-by: miemie2013 <miemie2013@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
Lry89757
13a9533984
[BatchNorm Optimize x86] AVX512 intrinsic ( #4061 )
* Add the test samples for elempack==16
* Add the AVX512 Support for batchnorm
3 years ago
nihui
76849cede4
armv8.4 i8mm optimization for convolution gemm int8 ( #4034 )
3 years ago
nihui
dd86cebab8
armv8.6 ci and coverage ( #4025 )
* asimdfhm in fc
* move neon bf16 conversion function to arm_usability header
* fix cmake option
* fix build with newer gcc
* arm84 coverage
* arm asimdfhm optimization for innerproduct gemm fp16s
3 years ago
nihui
706831f8a9
arm vfpv4 optimization for innerproduct ( #3950 )
3 years ago
nihui
440bfdd2cc
x86 f16c optimization for innerproduct ( #3944 )
3 years ago
nihui
067e8e1d92
mips unified elempack for elementwise layers ( #3928 )
3 years ago
nihui
1377acf945
avx512 bf16 fp16 infrastructure ( #3926 )
3 years ago
nihui
20a14bf5ae
arm convolution winograd dot function, adjust arm convolution winograd strategy ( #3915 )
3 years ago
nihui
ca0ba4b25f
fine grained winograd options, adjust x86 convolution winograd strategy ( #3908 )
* fine grained winograd options
* x86 optimization for convolution winograd f23 pack4/pack8/pack16
* fix avx512 and t4 ci
* fix fast direct conv path
* winograd63 is actually slower than winograd43 on very large channel
4 years ago
Evgeny Proydakov
184c479b64
Added a simple unittest for Power layer. ( #3893 )
4 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
tpoisonooo
6fd801b6d7
feat(src/layer): add vision_transformer benchmark ( #3730 )
* feat(src/layer): add vision_transformer benchmark and relative layer
* refactor(testutil.h): add para for RandomMat
4 years ago
NaLan ZeYu
5388f9f312
test: fix printf arguments mismatch ( #3774 )
4 years ago
nihui
f9c1787de9
implement einsum layer and pnnx conversion ( #3768 )
4 years ago
nihui
ee6402553c
layernorm for vector and mat along w, pnnx convnext end2end test ( #3764 )
4 years ago
jasonZhang
e62d674e5d
Add unittest and SSE&AVX optimized for BNLL ( #3759 )
4 years ago
nihui
308965b7e9
sanitize cooperative matrix option in tests
4 years ago
nihui
0ea327b557
x86 sse/avx/avx512 optimization for softmax ( #3712 )
4 years ago
nihui
131f3d1323
x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count ( #3694 )
* convolution winograd pack16to1
* x86 deconvolution and deconvolutiondepthwise
* simpleomp allow 32 arguments
* drop shadow variable workaround
* less winograd test error
4 years ago
nihui
dadc640c66
x86 avx512 optimization ( #3581 )
* unified relu avx512
* unified clip avx512
* unaryop avx512
* sigmoid avx512
* binaryop avx512
* padding convolution avx512
* convolutiondepthwise avx512
* innerproduct avx512
* reshape avx512
* slice avx512
* hardsigmoid hardswish avx512
* swish avx512
* pooling avx512
* crop avx512
* convolution sgemm pack16
* convolution 3x3 winograd pack16
* interp avx512
* convolution sgemm pack1to16
* convolution sgemm pack16to8
* convolution sgemm pack8to16
* convolution sgemm pack16to4
* fix vulkan permute pack8
* fix vulkan convolution gemm pack8to1
4 years ago
nihui
559e5b23f9
vulkan tensorcore optimization ( #3628 )
* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
4 years ago
nihui
002c07d4ec
mix vulkan winograd f23 and f43 ( #3639 )
* mix vulkan winograd f23 and f43
* larger epsilon for winograd optimization test
4 years ago
nihui
d42e048b56
pnnx convert torch.addmm ( #3634 )
4 years ago
nihui
6e19ab26ba
massive vulkan optimization ( #3602 )
* vulkan deconvolution sgemm col2im
* vulkan convolution winograd43
* improve fp16s numeric stability
* vulkan convolution im2col sgemm
* check squeezenet top2, as top3 vs top4 score too close..
4 years ago
nihui
2880eff264
deconv1d deconv3d ( #3584 )
* fix sigmoid returns nan with very large input
4 years ago
nihui
920aa79f04
drop x86 avx2 fp16 ( #3568 )
4 years ago
nihui
d452eca28f
convert torch.matmul, eliminate noop pad and identity op, fuse transpose matmul, fuse select to unbind ( #3554 )
4 years ago
Yuzhong Yan
681141ff42
[YZ] Fix bug in unit test ( #3556 )
4 years ago
nihui
33e225f173
fix c api test
4 years ago
nihui
c5d7f963b9
layer tile ( #3491 )
4 years ago
Xiaohan Liu
3daabd515d
add missing doffset ( #3475 )
4 years ago
nihui
922f8b33c1
reduction4d, merge keepdims arg, add test ( #3469 )
4 years ago
nihui
3a83704c38
binary4d, unary4d ( #3443 )
4 years ago
nihui
6941ec8fc9
arm neon optimization for general packed convolution ( #3426 )
4 years ago
nihui
999e640d43
dynamic convolution weight ( #3408 )
4 years ago
nihui
f98c396e6b
crop4d ( #3402 )
4 years ago
nihui
cf20dbc0bd
relu3d, batchnorm3d, reshape4d, flatten4d, permute4d ( #3397 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: nihui <nihui@users.noreply.github.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
Co-authored-by: Zr2223 <Zr2223@users.noreply.github.com>
4 years ago
nihui
f10cc6dd93
initial data structure changes for 3dcnn, conv3d, pooling3d ( #3378 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
nihui
f433f86874
fix squeeze expanddims axes and add test ( #3359 )
4 years ago
nihui
0b664ec438
fix potential out of range read in test with int8 inputs ( #3357 )
4 years ago
nihui
525df8bcc5
rnn/lstm/gru with unequal input output ( #3352 )
4 years ago
nihui
f448a8f595
implement interp-1d on 2d blob ( #3349 )
4 years ago
nihui
5eb4a2ccd0
implement convolutiondepthwise1d ( #3342 )
4 years ago
nihui
b3a521981b
implement interp cubic aligncorner ( #3338 )
4 years ago