nihui
6987efd950
fix scale avx512 ( #4580 )
3 years ago
nihui
dabc4c065f
arm convolution winograd unified elempack ( #4556 )
* update f43 coeffs
* arm convolution winograd unified elempack
* disable bf16s test atm
* test gnu inline asm off
3 years ago
WuJinxuan
ff80ac2955
[ARM] Multiheadattention ( #4463 )
3 years ago
nihui
d0c2738043
update riscv winograd f43 coeffs and fix some warnings ( #4537 )
* update winograd f43 coeffs
* rvv tanh rework
* fix warnings
* rebuild qemu
3 years ago
WuJinxuan
6572da3533
[x86] GroupNorm ( #4471 )
Co-authored-by: EdVince <EdVince@users.noreply.github.com>
3 years ago
nihui
1832da8292
concat 4d ( #4528 )
3 years ago
nihui
fb9cf7982d
eltwise 4d ( #4529 )
3 years ago
nihui
32e2de015e
slice 4d ( #4525 )
3 years ago
nihui
fc6ce4a641
copyto operator ( #4522 )
3 years ago
nihui
242e775d21
pnnx convert torch log10, pow 2 as square ( #4518 )
3 years ago
nihui
246e71c526
implement atan2 ( #4516 )
3 years ago
Fangjun Kuang
92e75105c9
Support torch.cumsum ( #4505 )
3 years ago
nihui
ab4cfbf5b0
enrich ncnn binary broadcast rules ( #4513 )
3 years ago
nihui
dfbcd3e69b
improve vulkan winograd f43 fp16 numerical stability ( #4492 )
3 years ago
nihui
fed99fd35b
gemm output transpose, prepack c ( #4479 )
* mha is now permute and reshape free
* gemm user defined tile mnk param
3 years ago
nihui
2e3e680d77
x86 optimization for packed convolution unified elempack ( #4469 )
3 years ago
nihui
88274827da
x86 optimization for winograd unified elempack ( #4456 )
3 years ago
nihui
15761fc1a6
arm vfpv4 asimdhp asimdfhm optimization for gemm ( #4432 )
3 years ago
nihui
c5640a16c3
gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only ( #4407 )
* gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only
* relax mnk multiple restrictions
* make square tiles in each thread
* sanitize num_threads changes
3 years ago
nihui
fd1ac3c7a0
x86 optimization for gemm unified elempack ( #4387 )
3 years ago
nihui
0736c5b658
Fix c api allocator ( #4360 )
* add some c_api interfaces related to allocator setup.
* fix errors in allocator parameters in c_api.
* test c api allocator
Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>
3 years ago
nihui
057b5bb515
split tests ( #4354 )
3 years ago
nihui
eceac35a7f
implement MultiheadAttention kdim vdim ( #4347 )
3 years ago
nihui
498ca7341b
squeeze and expanddims 4d ( #4346 )
3 years ago
Lry89757
6a47f8d15c
gridsample op support ( #4288 )
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
nihui
5b28c1730e
implement ncnn fold and unfold ( #4326 )
3 years ago
nihui
6e49fa30dc
groupnorm 1d/2d/4d ( #4312 )
3 years ago
Fangjun Kuang
5281d51535
implement GLU and pnnx conversion ( #4283 )
3 years ago
nihui
0b591b0d1f
implement layer feature disabled bit ( #4278 )
3 years ago
miemie2013
b13c2a16ce
Optimize x86 DeformableConv2D ( #4128 )
3 years ago
nihui
77eda4c19f
implement lstm proj_size ( #4263 )
3 years ago
Lry89757
5eb56b2ea5
[Gelu x86] Finish intrinsic with elempack merged (fast version) ( #4144 )
* Finish the gelu x86 intrinsics
* Finish the fast tanh x86 simd impl
3 years ago
Lry89757
9f59711338
[Prelu x86] Finish intrinsic with elempack merged ( #4177 )
3 years ago
Lry89757
9278f90114
[Elu x86] Finish intrinsic with elempack merged ( #4153 )
3 years ago
LinHe
03f2ad38ce
Layer Norm x86 SIMD Optimizations ( #4065 )
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
miemie2013
720f3c9aab
Add DeformableConv2D ( #4070 )
* Add DeformableConv2D
* add unittest and docs
* pnnx torchvision deformconv2d conversion
Co-authored-by: miemie2013 <miemie2013@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
Lry89757
13a9533984
[BatchNorm Optimize x86] AVX512 intrinsic ( #4061 )
* Add the test samples for elempack==16
* Add the AVX512 Support for batchnorm
3 years ago
nihui
76849cede4
armv8.4 i8mm optimization for convolution gemm int8 ( #4034 )
3 years ago
nihui
dd86cebab8
armv8.6 ci and coverage ( #4025 )
* asimdfhm in fc
* move neon bf16 conversion function to arm_usability header
* fix cmake option
* fix build with newer gcc
* arm84 coverage
* arm asimdfhm optimization for innerproduct gemm fp16s
3 years ago
nihui
706831f8a9
arm vfpv4 optimization for innerproduct ( #3950 )
3 years ago
nihui
440bfdd2cc
x86 f16c optimization for innerproduct ( #3944 )
3 years ago
nihui
067e8e1d92
mips unified elempack for elementwise layers ( #3928 )
4 years ago
nihui
1377acf945
avx512 bf16 fp16 infrastructure ( #3926 )
4 years ago
nihui
20a14bf5ae
arm convolution winograd dot function, adjust arm convolution winograd strategy ( #3915 )
4 years ago
nihui
ca0ba4b25f
fine grained winograd options, adjust x86 convolution winograd strategy ( #3908 )
* fine grained winograd options
* x86 optimization for convolution winograd f23 pack4/pack8/pack16
* fix avx512 and t4 ci
* fix fast direct conv path
* winograd63 is actually slower than winograd43 on very large channel
4 years ago
Evgeny Proydakov
184c479b64
Added a simple unittest for Power layer. ( #3893 )
4 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
tpoisonooo
6fd801b6d7
feat(src/layer): add vision_transformer benchmark ( #3730 )
* feat(src/layer): add vision_transformer benchmark and relative layer
* refactor(testutil.h): add para for RandomMat
4 years ago
NaLan ZeYu
5388f9f312
test: fix printf arguments mismatch ( #3774 )
4 years ago
nihui
f9c1787de9
implement einsum layer and pnnx conversion ( #3768 )
4 years ago