nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
tpoisonooo
6fd801b6d7
feat(src/layer): add vision_transformer benchmark ( #3730 )
* feat(src/layer): add vision_transformer benchmark and related layers
* refactor(testutil.h): add para for RandomMat
4 years ago
NaLan ZeYu
5388f9f312
test: fix printf arguments mismatch ( #3774 )
4 years ago
nihui
f9c1787de9
implement einsum layer and pnnx conversion ( #3768 )
4 years ago
nihui
ee6402553c
layernorm for vector and mat along w, pnnx convnext end2end test ( #3764 )
4 years ago
jasonZhang
e62d674e5d
Add unittest and SSE&AVX optimized for BNLL ( #3759 )
4 years ago
nihui
308965b7e9
sanitize cooperative matrix option in tests
4 years ago
nihui
0ea327b557
x86 sse/avx/avx512 optimization for softmax ( #3712 )
4 years ago
nihui
131f3d1323
x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count ( #3694 )
* convolution winograd pack16to1
* x86 deconvolution and deconvolutiondepthwise
* simpleomp allow 32 arguments
* drop shadow variable workaround
* less winograd test error
4 years ago
nihui
dadc640c66
x86 avx512 optimization ( #3581 )
* unified relu avx512
* unified clip avx512
* unaryop avx512
* sigmoid avx512
* binaryop avx512
* padding convolution avx512
* convolutiondepthwise avx512
* innerproduct avx512
* reshape avx512
* slice avx512
* hardsigmoid hardswish avx512
* swish avx512
* pooling avx512
* crop avx512
* convolution sgemm pack16
* convolution 3x3 winograd pack16
* interp avx512
* convolution sgemm pack1to16
* convolution sgemm pack16to8
* convolution sgemm pack8to16
* convolution sgemm pack16to4
* fix vulkan permute pack8
* fix vulkan convolution gemm pack8to1
4 years ago
nihui
559e5b23f9
vulkan tensorcore optimization ( #3628 )
* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
4 years ago
nihui
002c07d4ec
mix vulkan winograd f23 and f43 ( #3639 )
* mix vulkan winograd f23 and f43
* larger epsilon for winograd optimization test
4 years ago
nihui
d42e048b56
pnnx convert torch.addmm ( #3634 )
4 years ago
nihui
6e19ab26ba
massive vulkan optimization ( #3602 )
* vulkan deconvolution sgemm col2im
* vulkan convolution winograd43
* improve fp16s numeric stability
* vulkan convolution im2col sgemm
* check squeezenet top2, as top3 and top4 scores are too close
4 years ago
nihui
2880eff264
deconv1d deconv3d ( #3584 )
* fix sigmoid returns nan with very large input
4 years ago
nihui
920aa79f04
drop x86 avx2 fp16 ( #3568 )
4 years ago
nihui
d452eca28f
convert torch.matmul, eliminate noop pad and identity op, fuse transpose matmul, fuse select to unbind ( #3554 )
4 years ago
Yuzhong Yan
681141ff42
[YZ] Fix bug in unit test ( #3556 )
4 years ago
nihui
33e225f173
fix c api test
4 years ago
nihui
c5d7f963b9
layer tile ( #3491 )
4 years ago
Xiaohan Liu
3daabd515d
add missing doffset ( #3475 )
4 years ago
nihui
922f8b33c1
reduction4d, merge keepdims arg, add test ( #3469 )
4 years ago
nihui
3a83704c38
binary4d, unary4d ( #3443 )
4 years ago
nihui
6941ec8fc9
arm neon optimization for general packed convolution ( #3426 )
4 years ago
nihui
999e640d43
dynamic convolution weight ( #3408 )
4 years ago
nihui
f98c396e6b
crop4d ( #3402 )
4 years ago
nihui
cf20dbc0bd
relu3d, batchnorm3d, reshape4d, flatten4d, permute4d ( #3397 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: nihui <nihui@users.noreply.github.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
Co-authored-by: Zr2223 <Zr2223@users.noreply.github.com>
4 years ago
nihui
f10cc6dd93
initial data structure changes for 3dcnn, conv3d, pooling3d ( #3378 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
nihui
f433f86874
fix squeeze expanddims axes and add test ( #3359 )
4 years ago
nihui
0b664ec438
fix potential out of range read in test with int8 inputs ( #3357 )
4 years ago
nihui
525df8bcc5
rnn/lstm/gru with unequal input output ( #3352 )
4 years ago
nihui
f448a8f595
implement interp-1d on 2d blob ( #3349 )
4 years ago
nihui
5eb4a2ccd0
implement convolutiondepthwise1d ( #3342 )
4 years ago
nihui
b3a521981b
implement interp cubic aligncorner ( #3338 )
4 years ago
nihui
aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier ( #3287 )
4 years ago
zhiliu6
814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct ( #3233 )
* add general fused activation
* add NCNN_FORCE_INLINE option
4 years ago
Tijmen Verhulsdonck
4270b5c502
Fix broken codepaths with AVX only ( #3254 )
* Fix codepaths for fp16 weights when only AVX is enabled
* Disable opt overrides
* Update SDK url
* Update vulkan SDK download version
* Debugging riscv pad
* apply code-format changes
* fix padding test
* fix mips slice test
* fix lrn test
* implement mish swish image shader, fix pooling adaptive image storage support, drop debug output
* update ci ubuntu 18.04
Co-authored-by: nihui <shuizhuyuanluo@126.com>
4 years ago
zhiliu6
80699dd3f9
fix hardswish test beta param ( #3214 )
4 years ago
nihui
c6cda8d07c
arm neon optimization for requantize leakyrelu ( #3144 )
* arm neon optimization for requantize leakyrelu
* add missing changes
* Update test_requantize.cpp
* more test coverage
4 years ago
Xavier Hsinyuan
2a5c672787
Add unittest and RVV optimized for SELU ( #3114 )
4 years ago
nihuini
f1533667ff
fix test_c_api net instance destroyed earlier than blob destruction
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
nihui
b413fd3a3d
auto code-format bot and disable restyled ( #3075 )
4 years ago
DaydreamCoding
f42d0e5dc9
fix warpaffine_bilinear_yuv420sp uv matrix ( #3048 )
4 years ago
nihui
4f135e07bf
implement convolution1d and pooling1d ( #3035 )
* implement convolution1d and pooling1d
* add conv1d pool1d test
* fuse convolution1d activation
* update operator doc
* fix vulkan adaptive pooling
4 years ago
nihuini
12eaa6f9ba
update concat test
4 years ago
nihuini
a180bf7bdc
update concat test for larger channels
4 years ago
nihui
c1ce8ea84d
add more test
5 years ago
nihuini
07fa2e1fe3
prefer large channels for int8 operator tests
5 years ago