nihui
6941ec8fc9
arm neon optimization for general packed convolution ( #3426 )
4 years ago
nihui
999e640d43
dynamic convolution weight ( #3408 )
4 years ago
nihui
f98c396e6b
crop4d ( #3402 )
4 years ago
nihui
cf20dbc0bd
relu3d, batchnorm3d, reshape4d, flatten4d, permute4d ( #3397 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: nihui <nihui@users.noreply.github.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
Co-authored-by: Zr2223 <Zr2223@users.noreply.github.com>
4 years ago
nihui
f10cc6dd93
initial data structure changes for 3dcnn, conv3d, pooling3d ( #3378 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
nihui
f433f86874
fix squeeze expanddims axes and add test ( #3359 )
4 years ago
nihui
0b664ec438
fix potential out of range read in test with int8 inputs ( #3357 )
4 years ago
nihui
525df8bcc5
rnn/lstm/gru with unequal input output ( #3352 )
4 years ago
nihui
f448a8f595
implement interp-1d on 2d blob ( #3349 )
4 years ago
nihui
5eb4a2ccd0
implement convolutiondepthwise1d ( #3342 )
4 years ago
nihui
b3a521981b
implement interp cubic aligncorner ( #3338 )
4 years ago
nihui
aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier ( #3287 )
4 years ago
zhiliu6
814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct ( #3233 )
* add general fused activation
* add NCNN_FORCE_INLINE option
4 years ago
Tijmen Verhulsdonck
4270b5c502
Fix broken codepaths with AVX only ( #3254 )
* Fix codepaths for fp16 weights when only AVX is enabled
* Disable opt overrides
* Update SDK url
* Update vulkan SDK download version
* Debugging riscv pad
* apply code-format changes
* fix padding test
* fix mips slice test
* fix lrn test
* implement mish swish image shader, fix pooling adaptive image storage support, drop debug output
* update ci ubuntu 18.04
Co-authored-by: nihui <shuizhuyuanluo@126.com>
4 years ago
zhiliu6
80699dd3f9
fix hardswish test beta param ( #3214 )
4 years ago
nihui
c6cda8d07c
arm neon optimization for requantize leakyrelu ( #3144 )
* arm neon optimization for requantize leakyrelu
* add missing changes
* Update test_requantize.cpp
* more test coverage
4 years ago
Xavier Hsinyuan
2a5c672787
Add unittest and RVV optimized for SELU ( #3114 )
4 years ago
nihuini
f1533667ff
fix test_c_api net instance destroyed earlier than blob destruction
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
nihui
b413fd3a3d
auto code-format bot and disable restyled ( #3075 )
4 years ago
DaydreamCoding
f42d0e5dc9
fix warpaffine_bilinear_yuv420sp uv matrix ( #3048 )
4 years ago
nihui
4f135e07bf
implement convolution1d and pooling1d ( #3035 )
* implement convolution1d and pooling1d
* add conv1d pool1d test
* fuse convolution1d activation
* update operator doc
* fix vulkan adaptive pooling
4 years ago
nihuini
12eaa6f9ba
update concat test
5 years ago
nihuini
a180bf7bdc
update concat test for larger channels
5 years ago
nihui
c1ce8ea84d
add more test
5 years ago
nihuini
07fa2e1fe3
prefer large channels for int8 operator tests
5 years ago
nihui
3a77b09c31
fix test failure
5 years ago
nihuini
fef61c5296
fix arm build
5 years ago
nihuini
934a1a8e32
test flatten packing padding int8
5 years ago
nihui
49f3e1ea09
drawing api and stb_image ( #2913 )
* drawing api
* add drawing test
* yuv420sp drawing
* enable simpleocv in webassembly build
5 years ago
nihui
17936e9f54
fix packing risc-v test, add cpu_riscv_vlenb()
5 years ago
nihui
a61f03ec76
arm neon optimization for pixelshuffle scale 2
5 years ago
nihuini
d6b2ea5aac
arm neon optimization for convolution 3x3 on small channels
5 years ago
nihui
7e1aaa5828
cmake option NCNN_INT8 ( #2839 )
5 years ago
nihui
66455c1b95
implement 2823 binary broadcasting type ( #2827 )
5 years ago
nihuini
41a4bea954
unroll size 8 for conv3x3s1 pack8to1 int8 arm64
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
1ea8bfbd2e
x86 avx2 conv3x3s1 pack8 direct optimization, fix #2789
5 years ago
ncnnnnn
6e6cb9f4f3
simple sort ncnn_add_layer_test ( #2790 )
for obsessive
5 years ago
nihui
a48bf43ef7
test conv/fc int8 with activation
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
nihuini
15d63ec0f5
fuse onnx multiheadattention with same qkv blob
5 years ago
RBelogorodtsevFBase
1212ed6e94
implements gelu activation ( #2749 )
5 years ago
nihuini
c17eb4e208
multiheadattention layer
5 years ago
nihuini
7ac23ab34d
fuse onnx layernorm, fix 2-dim layernorm implementation, add test
5 years ago
nihui
3c92a1184b
arm neon optimization for general convolution im2col sgemm ( #2668 )
* arm neon optimization for conv3x3s1 winograd42
* better condition
* Update test_convolution.cpp
* Update test_convolution.cpp
* more proper conditions
* arm neon optimization for general im2col sgemm pack4
* add sgemm
* wip
* wip
* fix armv7 build
* more conditions blah blah
* code format
* fix convolution
* move packed convolution to separated header source
* unify weight data bf16
* proper conditions
* conv3x3s2 sgemm pack4 test
5 years ago
nihui
ab56083ca5
arm neon optimization for conv3x3s1 winograd42 ( #2664 )
5 years ago
nihuini
f437bcdd4c
enable fp16s and int8s on newer adreno/mali, actually enable int8 tests
5 years ago
nihui
74451897cb
handle gemm in innerproduct ( #2607 )
5 years ago