nihui
f10cc6dd93
initial data structure changes for 3dcnn, conv3d, pooling3d ( #3378 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
4 years ago
nihui
f4f7fabe27
fix python wheel, parallel 3 for macos tasks ( #3396 )
4 years ago
nihui
24fbb6e8cb
honor thread setting on load and vulkan command, ci avx512 t4 ( #3391 )
4 years ago
nihui
ed1fb210ea
arm neon optimization for innerproduct int8 gemm ( #3367 )
4 years ago
nihui
ac3d32aa0d
get_elf_hwcap_from_getauxval ( #3301 )
4 years ago
nihui
4566ad529a
fix rounding on x86 avx and armv7 neon ( #3365 )
4 years ago
源源球球✨
f66110b7e0
delete useless variables ( #3360 )
4 years ago
nihui
f433f86874
fix squeeze expanddims axes and add test ( #3359 )
4 years ago
nihui
525df8bcc5
rnn/lstm/gru with unequal input output ( #3352 )
4 years ago
nihui
f448a8f595
implement interp-1d on 2d blob ( #3349 )
4 years ago
nihui
5eb4a2ccd0
implement convolutiondepthwise1d ( #3342 )
4 years ago
nihui
b3a521981b
implement interp cubic aligncorner ( #3338 )
4 years ago
nihui
adfc8b25bc
fix deconv output pad ( #3337 )
4 years ago
nihui
f86a307ab5
silence code scanning
4 years ago
nihui
6a52e8e5f2
fix potential integer type overflow
4 years ago
nihui
a5b846371a
fix adaptive pooling vulkan, second try
4 years ago
Wang Kangyu
53040f566b
use int ceil/floor div ( #3333 )
4 years ago
nihuini
514770a37e
fix adaptive pooling vulkan
4 years ago
nihuini
80f2138b0c
fix adaptive pooling
4 years ago
nihuini
6c2cee8186
fix mat clone with atypical source cstep
4 years ago
nihuini
a490f8a533
fix layernorm with affine
4 years ago
Tijmen Verhulsdonck
ac5dc23ccc
added a number of optimized sse layers ( #3302 )
* added a number of optimized sse layers, specifically to increase performance of mobilenet style networks
4 years ago
zhiliu6
a08f700775
Optimize avx convolution activation ( #3299 )
* use general fmadd
* forceinline x86 fmadd for better performance
* fix msvc compile warning
* simplify swish implementation
* Use activation layer for better performance
* Optimize x86 ConvolutionDepthWise activation
4 years ago
Tijmen Verhulsdonck
e3aa893dfb
move custom_layer_to_index to public ( #3294 )
4 years ago
nihui
aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier ( #3287 )
4 years ago
zhiliu6
814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct ( #3233 )
* add general fused activation
* add NCNN_FORCE_INLINE option
4 years ago
nihui
4313d23355
update ci swiftshader 20211002 ( #2366 )
* build macos vulkan
* drop rt
* load swiftshader
* workaround swiftshader fltmax fp16 nan
Co-authored-by: zhuo@mbp <imzhuo@foxmail.com>
4 years ago
Tijmen Verhulsdonck
4270b5c502
Fix broken codepaths with AVX only ( #3254 )
* Fix codepaths for fp16 weights when only AVX is enabled
* Disable opt overrides
* Update SDK url
* Update vulkan SDK download version
* Debugging riscv pad
* apply code-format changes
* fix padding test
* fix mips slice test
* fix lrn test
* implement mish swish image shader, fix pooling adaptive image storage support, drop debug output
* update ci ubuntu 18.04
Co-authored-by: nihui <shuizhuyuanluo@126.com>
4 years ago
Zhuo Zhang
492297d2f6
add A15 and M1 macro definitions ( #3263 )
4 years ago
nihui
57ad2c138c
fix build on c906, fix #3230
4 years ago
nihui
ceec22cd46
Update convolution_riscv.cpp ( #3050 )
4 years ago
Xavier Hsinyuan
a2f89e7392
RVV fp16/fp32 optimized Dropout, GRU and Softmax ( #3200 )
* RVV optimized DropOut
* RVV optimized GRU, fp32
* RVV optimized GRU, fp16
* RVV optimized Softmax
4 years ago
TianZer
c44a6c7f47
Remove two potential warnings for VisualStudio ( #3188 )
4 years ago
nihui
5a35c2b11e
load model memory by reference ( #3179 )
4 years ago
nihui
e9b5bbcd2d
fix armv7 roundmode ( #3176 )
4 years ago
yaobyPerfxlab
ec561736a5
Riscv64 c906 d1 ( #3159 )
* Use RVV spec 0.7.1 for C906.
* Fix code style issue.
* Update convolution_sgemm_packn_fp16s.h
RVV_SPEC_0_7 update
* apply code-format changes
Co-authored-by: Zhang Xianyi <xianyi@perfxlab.com>
Co-authored-by: yaobyPerfxlab <yaobyPerfxlab@users.noreply.github.com>
4 years ago
nihui
51652a2280
NCNN_MALLOC_OVERREAD for potential kernel reading data out of allocation size ( #3155 )
4 years ago
nihuini
a99cc13611
ncnnoptimize generate proper weight for int8 scales, fix non-neon innerproduct int8, fix #3157
4 years ago
nihui
c6cda8d07c
arm neon optimization for requantize leakyrelu ( #3144 )
* arm neon optimization for requantize leakyrelu
* add missing changes
* Update test_requantize.cpp
* more test coverage
4 years ago
nihuini
169614f732
fix build with NCNN_STDIO off
4 years ago
nihui
da7b64b833
fix build on c906
4 years ago
nihuini
11794675f3
apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path
4 years ago
nihuini
b9460c6e8f
fix armv7 requantize relu, fix #3122
4 years ago
Xavier Hsinyuan
2a5c672787
Add unittest and RVV optimized for SELU ( #3114 )
4 years ago
Xavier Hsinyuan
d78add2acd
RVV optimized PReLU, with fp16 support ( #3113 )
4 years ago
nihui
f9a16ea1ec
fix build on c906
4 years ago
Xavier Hsinyuan
9933cc776a
RVV optimized HardSwish and HardSigmoid ( #3108 )
* RVV optimized HardSwish, with fp16 support
* RVV optimized HardSigmoid, with fp16 support
* apply code-format changes
Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
4 years ago
nihui
d91cccfb55
apply code-format changes
4 years ago
Xavier Hsinyuan
99440e67f7
RVV optimized binaryop, with fp16 support ( #3097 )
4 years ago
nihui
2c4ae09604
fix #2961 ( #3095 )
4 years ago