nihuini
440db2c8fc
conv1x1 pack4 arm fp16sa
5 years ago
nihui
b0457e3ecd
conv1x1 pack8to1 arm fp16sa unroll size 4
5 years ago
nihuini
b5be1449d9
conv3x3s1 winograd pack8to4 arm fp16sa
5 years ago
nihuini
d17c26e925
conv1x1s1 pack4to8 pack8to4 arm fp16sa
5 years ago
nihuini
f5c5b79293
layernorm layer
5 years ago
nihuini
4ad5230c2a
groupnorm layer
5 years ago
nihuini
332399c05f
add gemm layer
5 years ago
nihuini
b35b5d7a3c
gemm layer
5 years ago
Zhuo Zhang
686a426935
fix #2012, cmake find vulkan issue (#2017)
* give hint for VULKAN_SDK on Windows
note:
1. tested with vulkan sdk 1.2.148.0, the .exe installer will automatically set VULKAN_SDK env var in System Config after installation.
2. only tested with VS2017. cmake 3.6.3 does not support VS2017 since it was
released in 2016.
* fix cmake find vulkan on windows
* fix macos cmake find vulkan
Co-authored-by: Restyled.io <commits@restyled.io>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
5 years ago
zhiliu6
952faf6668
Add yolov3detectionoutput test and AVX optimization (#1994)
5 years ago
nihui
036c94499e
fix conv1x1s1 vulkan
5 years ago
nihui
db5f05c6f0
conv1x1s1 conv3x3s1 winograd pack8to1 arm fp16sa
5 years ago
nihui
f6fb757672
fix convdw3x3s1 pack8 arm fp16sa
5 years ago
nihuini
bf0eb69bfe
convdw3x3s2 pack8 arm fp16sa neon assembly optimization
5 years ago
nihuini
9c33b6c1c8
conv1x1s1 arm fp16sa
5 years ago
nihuini
30ff3800d8
conv5x5s1 conv5x5s2 pack8 arm fp16sa neon assembly optimization
5 years ago
nihuini
62c453b16d
conv7x7s2 pack1to8 arm fp16sa neon assembly optimization
5 years ago
nihuini
b5b486fbfa
conv3x3s2 pack8 arm fp16sa neon assembly optimization
5 years ago
Zhuo Zhang
418047661c
fix #1984 & fix cmake (#2000)
5 years ago
ncnnnnn
e2557c1678
fix UINT64_MAX not declared #2009 (#2010)
5 years ago
nihui
5a9c99ce00
convdw5x5s1 pack8 arm neon fp16sa assembly optimization
5 years ago
nihuini
e841ae73c6
fix arm fp16s feat output, fix #2003
5 years ago
nihuini
8df3a02391
unroll 12 for conv1x1s1 and conv3x3s1 winograd pack8 arm fp16sa
5 years ago
nihui
d8e9fc1443
conv3x3s1 conv3x3s2 pack1to8, padding pack8, relu pack8 arm neon fp16sa assembly optimization
5 years ago
kingdeviljin
ac9cbaca56
#1993 resize_bilinear_c4 fix (#1999)
5 years ago
nihuini
b53f4072ce
convdw3x3s1 convdw3x3s2 pack8 arm fp16sa
5 years ago
nihuini
f6d808b090
crop pack8 arm fp16s, conv3x3s2 pack1to8 arm fp16sa intrinsic
5 years ago
nihuini
bc05a71a7c
conv1x1s1 conv3x3s1 winograd arm fp16sa neon assembly optimization
5 years ago
nihuini
5d5a3d1434
conv1x1s1 conv1x1s2 conv3x3s1 winograd pack8 arm fp16sa
5 years ago
nihuini
20a0fc8628
packing honor thread count
5 years ago
nihui
54e79a62d7
fix crash on non-arm82 build
5 years ago
nihui
c173d51c9b
mish sigmoid swish tanh arm fp16s
5 years ago
nihui
72a27d4776
utility wrapper for neon float32 bfloat16 conversion, deconvolution deconvolutiondepthwise arm fp16s fp16sa bf16s
5 years ago
nihui
e644164873
reshape arm bf16s fp16s, flatten api
5 years ago
nihui
aa68246dc7
more test coverage
5 years ago
nihui
e7abc5fbd7
concat slice arm fp16sa pack8
5 years ago
nihui
aa1a9e90c5
interp shufflechannel arm fp16sa pack8
5 years ago
nihuini
c6d7525367
convolutiondepthwise arm fp16sa pack8
5 years ago
nihuini
bc3822acc3
convolution flatten arm fp16sa pack8
5 years ago
nihuini
dbb761b9a4
binaryop eltwise padding pooling arm fp16sa pack8
5 years ago
nihuini
91d91ba556
hardsigmoid hardswish arm fp16s fp16sa
5 years ago
nihuini
f23122bb3f
since fp16 storage option is on by default, upper-level function may pass fp32 storage with default option, guard with element bits checking
5 years ago
nihuini
a18a9fd8c5
eltwise arm fp16s fp16sa
5 years ago
nihuini
8385d81afa
interp arm fp16s fp16sa
5 years ago
nihuini
5a4243e44e
binaryop arm fp16s
5 years ago
nihuini
301abe657c
relu arm fp16s
5 years ago
nihui
03c9ed11d2
pooling arm fp16s fp16sa
5 years ago
nihui
71f86af8a6
fix non-arm82 ci
5 years ago
nihui
9a2e2a6937
convert fp32 blobs for layers with fp16 storage support
5 years ago
nihui
d4d501a7fe
fix innerproduct fp16sa
5 years ago