nihuini
4ad5230c2a
groupnorm layer
5 years ago
nihuini
b35b5d7a3c
gemm layer
5 years ago
zhiliu6
952faf6668
Add yolov3detectionoutput test and AVX optimization (#1994)
5 years ago
nihui
036c94499e
fix conv1x1s1 vulkan
5 years ago
nihui
db5f05c6f0
conv1x1s1 conv3x3s1 winograd pack8to1 arm fp16sa
5 years ago
nihui
f6fb757672
fix convdw3x3s1 pack8 arm fp16sa
5 years ago
nihuini
bf0eb69bfe
convdw3x3s2 pack8 arm fp16sa neon assembly optimization
5 years ago
nihuini
9c33b6c1c8
conv1x1s1 arm fp16sa
5 years ago
nihuini
30ff3800d8
conv5x5s1 conv5x5s2 pack8 arm fp16sa neon assembly optimization
5 years ago
nihuini
62c453b16d
conv7x7s2 pack1to8 arm fp16sa neon assembly optimization
5 years ago
nihuini
b5b486fbfa
conv3x3s2 pack8 arm fp16sa neon assembly optimization
5 years ago
nihui
5a9c99ce00
convdw5x5s1 pack8 arm neon fp16sa assembly optimization
5 years ago
nihuini
8df3a02391
unroll 12 for conv1x1s1 and conv3x3s1 winograd pack8 arm fp16sa
5 years ago
nihui
d8e9fc1443
conv3x3s1 conv3x3s2 pack1to8, padding pack8, relu pack8 arm neon fp16sa assembly optimization
5 years ago
nihuini
b53f4072ce
convdw3x3s1 convdw3x3s2 pack8 arm fp16sa
6 years ago
nihuini
f6d808b090
crop pack8 arm fp16s, conv3x3s2 pack1to8 arm fp16sa intrinsic
6 years ago
nihuini
bc05a71a7c
conv1x1s1 conv3x3s1 winograd arm fp16sa neon assembly optimization
6 years ago
nihuini
5d5a3d1434
conv1x1s1 conv1x1s2 conv3x3s1 winograd pack8 arm fp16sa
6 years ago
nihuini
20a0fc8628
packing honor thread count
6 years ago
nihui
c173d51c9b
mish sigmoid swish tanh arm fp16s
6 years ago
nihui
72a27d4776
utility wrapper for neon float32 bfloat16 conversion, deconvolution deconvolutiondepthwise arm fp16s fp16sa bf16s
6 years ago
nihui
e644164873
reshape arm bf16s fp16s, flatten api
6 years ago
nihui
aa68246dc7
more test coverage
6 years ago
nihui
e7abc5fbd7
concat slice arm fp16sa pack8
6 years ago
nihui
aa1a9e90c5
interp shufflechannel arm fp16sa pack8
6 years ago
nihuini
c6d7525367
convolutiondepthwise arm fp16sa pack8
6 years ago
nihuini
bc3822acc3
convolution flatten arm fp16sa pack8
6 years ago
nihuini
dbb761b9a4
binaryop eltwise padding pooling arm fp16sa pack8
6 years ago
nihuini
91d91ba556
hardsigmoid hardswish arm fp16s fp16sa
6 years ago
nihuini
f23122bb3f
since fp16 storage option is on by default, upper-level function may pass fp32 storage with default option, guard with element bits checking
6 years ago
nihuini
a18a9fd8c5
eltwise arm fp16s fp16sa
6 years ago
nihuini
8385d81afa
interp arm fp16s fp16sa
6 years ago
nihuini
5a4243e44e
binaryop arm fp16s
6 years ago
nihuini
301abe657c
relu arm fp16s
6 years ago
nihui
03c9ed11d2
pooling arm fp16s fp16sa
6 years ago
nihui
d4d501a7fe
fix innerproduct fp16sa
6 years ago
nihui
e9c71a1ead
innerproduct arm fp16s fp16sa
6 years ago
nihui
1a57600bd7
fix ci crash
6 years ago
nihuini
11f5033249
convolutiondepthwise arm fp16s fp16sa
6 years ago
nihuini
6ab284bc3a
convolution arm fp16s fp16sa
6 years ago
nihuini
6d2c0e5683
flatten fp16s
6 years ago
nihuini
47ae0c151a
some shared arm bf16s fp16s implementation
6 years ago
zchrissirhcz
b80b84fda5
fix #1542; fix avx2 uint16_t including (#1968)
* fix #1542; fix avx2 uint16_t including
for #1542, the fix is for compatibility with opencv 2.x, such as the apt-installed opencv on ubuntu 16.04
6 years ago
nihuini
8700985540
yet another workaround for nexus6p gpu
6 years ago
nihui
21762e09e5
fix dilated convolution (#1956)
6 years ago
nihuini
4d2d625432
fix avx2 build, second try, fix #1953
6 years ago
nihuini
8b0890999a
fix avx2 build, fix #1953
6 years ago
nihui
88367f4164
Ci enable mips msa (#1949)
6 years ago
nihui
bb5bfe3841
avx2 infrastructure (#1943)
6 years ago
nihui
11cffce114
armv8.2 infrastructure (#1856)
* runtime cpu dispatch
* force thread one
* disable openmp for coverage
* simplify test layer
* print NCNN_TARGET_ARCH
* less ci build variants
* weight fp16 storage option
* test convdw int8
* apple a12 a13
* ncnn_add_layer ncnn_add_shader cmake macro
6 years ago