nihui
559e5b23f9
vulkan tensorcore optimization ( #3628 )
* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
4 years ago
nihui
002c07d4ec
mix vulkan winograd f23 and f43 ( #3639 )
* mix vulkan winograd f23 and f43
* larger epsilon for winograd optimization test
4 years ago
nihui
d42e048b56
pnnx convert torch.addmm ( #3634 )
4 years ago
nihui
6e19ab26ba
massive vulkan optimization ( #3602 )
* vulkan deconvolution sgemm col2im
* vulkan convolution winograd43
* improve fp16s numeric stability
* vulkan convolution im2col sgemm
* check squeezenet top2, as top3 vs top4 scores are too close
4 years ago
nihui
2880eff264
deconv1d deconv3d ( #3584 )
* fix sigmoid returns nan with very large input
4 years ago
nihui
920aa79f04
drop x86 avx2 fp16 ( #3568 )
4 years ago
nihui
d452eca28f
convert torch.matmul, eliminate noop pad and identity op, fuse transpose matmul, fuse select to unbind ( #3554 )
4 years ago
Yuzhong Yan
681141ff42
[YZ] Fix bug in unit test ( #3556 )
4 years ago
nihui
33e225f173
fix c api test
4 years ago
nihui
c5d7f963b9
layer tile ( #3491 )
4 years ago
Xiaohan Liu
3daabd515d
add missing doffset ( #3475 )
4 years ago
nihui
922f8b33c1
reduction4d, merge keepdims arg, add test ( #3469 )
4 years ago
nihui
3a83704c38
binary4d, unary4d ( #3443 )
4 years ago
nihui
6941ec8fc9
arm neon optimization for general packed convolution ( #3426 )
4 years ago
nihui
999e640d43
dynamic convolution weight ( #3408 )
4 years ago
nihui
f98c396e6b
crop4d ( #3402 )
4 years ago
nihui
cf20dbc0bd
relu3d, batchnorm3d, reshape4d, flatten4d, permute4d ( #3397 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: nihui <nihui@users.noreply.github.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
Co-authored-by: Zr2223 <Zr2223@users.noreply.github.com>
4 years ago
nihui
f10cc6dd93
initial data structure changes for 3dcnn, conv3d, pooling3d ( #3378 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
nihui
f433f86874
fix squeeze expanddims axes and add test ( #3359 )
4 years ago
nihui
0b664ec438
fix potential out of range read in test with int8 inputs ( #3357 )
4 years ago
nihui
525df8bcc5
rnn/lstm/gru with unequal input output ( #3352 )
4 years ago
nihui
f448a8f595
implement interp-1d on 2d blob ( #3349 )
4 years ago
nihui
5eb4a2ccd0
implement convolutiondepthwise1d ( #3342 )
4 years ago
nihui
b3a521981b
implement interp cubic aligncorner ( #3338 )
4 years ago
nihui
aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier ( #3287 )
4 years ago
zhiliu6
814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct ( #3233 )
* add general fused activation
* add NCNN_FORCE_INLINE option
4 years ago
Tijmen Verhulsdonck
4270b5c502
Fix broken codepaths with AVX only ( #3254 )
* Fix codepaths for fp16 weights when only AVX is enabled
* Disable opt overrides
* Update SDK url
* Update vulkan SDK download version
* Debugging riscv pad
* apply code-format changes
* fix padding test
* fix mips slice test
* fix lrn test
* implement mish swish image shader, fix pooling adaptive image storage support, drop debug output
* update ci ubuntu 18.04
Co-authored-by: nihui <shuizhuyuanluo@126.com>
4 years ago
zhiliu6
80699dd3f9
fix hardswish test beta param ( #3214 )
4 years ago
nihui
c6cda8d07c
arm neon optimization for requantize leakyrelu ( #3144 )
* arm neon optimization for requantize leakyrelu
* add missing changes
* Update test_requantize.cpp
* more test coverage
4 years ago
Xavier Hsinyuan
2a5c672787
Add unittest and RVV optimized for SELU ( #3114 )
4 years ago
nihuini
f1533667ff
fix test_c_api net instance destroyed earlier than blob destruction
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
nihui
b413fd3a3d
auto code-format bot and disable restyled ( #3075 )
4 years ago
DaydreamCoding
f42d0e5dc9
fix warpaffine_bilinear_yuv420sp uv matrix ( #3048 )
5 years ago
nihui
4f135e07bf
implement convolution1d and pooling1d ( #3035 )
* implement convolution1d and pooling1d
* add conv1d pool1d test
* fuse convolution1d activation
* update operator doc
* fix vulkan adaptive pooling
5 years ago
nihuini
12eaa6f9ba
update concat test
5 years ago
nihuini
a180bf7bdc
update concat test for larger channels
5 years ago
nihui
c1ce8ea84d
add more test
5 years ago
nihuini
07fa2e1fe3
prefer large channels for int8 operator tests
5 years ago
nihui
3a77b09c31
fix test failure
5 years ago
nihuini
fef61c5296
fix arm build
5 years ago
nihuini
934a1a8e32
test flatten packing padding int8
5 years ago
nihui
49f3e1ea09
drawing api and stb_image ( #2913 )
* drawing api
* add drawing test
* yuv420sp drawing
* enable simpleocv in webassembly build
5 years ago
nihui
17936e9f54
fix packing risc-v test, add cpu_riscv_vlenb()
5 years ago
nihui
a61f03ec76
arm neon optimization for pixelshuffle scale 2
5 years ago
nihuini
d6b2ea5aac
arm neon optimization for convolution 3x3 on small channels
5 years ago
nihui
7e1aaa5828
cmake option NCNN_INT8 ( #2839 )
5 years ago
nihui
66455c1b95
implement 2823 binary broadcasting type ( #2827 )
5 years ago
nihuini
41a4bea954
unroll size 8 for conv3x3s1 pack8to1 int8 arm64
5 years ago