nihui
72a27d4776
utility wrapper for neon float32 bfloat16 conversion, deconvolution deconvolutiondepthwise arm fp16s fp16sa bf16s
5 years ago
nihuini
c6d7525367
convolutiondepthwise arm fp16sa pack8
5 years ago
nihuini
bc3822acc3
convolution flatten arm fp16sa pack8
5 years ago
nihuini
f23122bb3f
since fp16 storage option is on by default, upper-level function may pass fp32 storage with default option, guard with element bits checking
5 years ago
nihuini
11f5033249
convolutiondepthwise arm fp16s fp16sa
5 years ago
nihuini
6ab284bc3a
convolution arm fp16s fp16sa
5 years ago
nihui
21762e09e5
fix dilated convolution (#1956)
5 years ago
nihuini
0d6cc01d55
innerproduct handle mish activation, fix naive C testing, fix #1930
5 years ago
nihui
b5e288b521
layer creator function is not necessary for built-in layers
5 years ago
nihui
01b8b79ed2
packing layout option respect support_packing property
6 years ago
nihui
3ef995ed1e
format code style and setup restyled.io (#1840)
6 years ago
zhiliu6
3bfabf1d6a
Add fused convolution and mish layer support. (#1761)
6 years ago
Naiyang Lin
ceef2470a5
Add logger.h (#1753)
6 years ago
nihuini
b2d9325c0d
test activation fusion
6 years ago
nihui
90e6be457b
conv1x1s1 bf16s neon kernel
6 years ago
nihuini
1984cad0e1
conv5x5s2 bf16s neon kernel
6 years ago
nihuini
17577775ae
conv5x5s1 bf16s neon kernel
6 years ago
nihui
6561334c5f
conv7x7s2 pack1to4 bf16s neon kernel
6 years ago
nihui
b7c82fcc45
code clean, concat bf16s
6 years ago
nihui
c3f966f7e7
conv3x3s1 pack4to1 bf16s neon kernel
6 years ago
nihuini
c6ebd13afb
conv1x1 pack4to1 bf16s neon kernel
6 years ago
nihui
7d1eec3d5d
the use_bf16_storage option
6 years ago
nihui
c819b4d839
fix build without openmp
6 years ago
nihui
e14716dfef
convolution and pooling make padding helper, flatten innerproduct pooling bf16s neon
6 years ago
nihui
57bedd59fa
fix build without neon
6 years ago
nihui
719d9f48ae
im2col wrt sgemm convolution option
6 years ago
nihui
25eb060b7c
fix potential crash on int8 convolution with no bias
6 years ago
tpoisonooo
2e51b026ce
fix some boring compile warnings (#1510)
6 years ago
tpoisonooo
7168829f06
Fix int8 requant (#1499)
* skip compile caffe tools
* add convolution_int8 requant test
* fix
* test int8requant
* revert code
* clean code
* fix CI error
* resolve review advices
* Update convolution_arm.cpp
Co-authored-by: nihui <shuizhuyuanluo@126.com>
6 years ago
nihui
6f2ef1932d
int8 code refactoring wip, add int8 test
6 years ago
nihui
038666e049
the initial auto test (#1464)
* cpu test
* wip
* ci run test
* travis ci for arm64
* arm64 ctest
* copy vulkan loader
* wip
* run
* Update ccpp.yml
* gpu test
* swiftshader
* cache macos swiftshader
* try MoltenVK
* try vulkaninfo
* give swiftshader another try
* disable failed macos gpu test
* more conv test, fix conv3x3s1 gpu test fail
* fix deconvolution test
* dilation test
* cmake option to build tests
* ncnn_add_layer_test macro
* host barrier before upload and after download, handle packing layout option
* test packing layout
* wip
* wip
* merge deconvolution packing and non-packing code
* merge convolution packing and non-packing code
* pass top_blob_count param
* fix build
* take care of non-coherent mappable memory
6 years ago
tpoisonooo
d702052449
Add assembly int8 gemm (#1307)
* add assembly gemm
* m1/m2/m4 assembler-local label
* clean useless code
* Update gemm_symm_int8.h
add license
6 years ago
nihuini
336d1c1edd
remove the ncnn namespace for in source Option
6 years ago
nihuini
567e2bd501
a dirty hack for resolving int8 pack4 crash
6 years ago
nihuini
65ce6bccfd
faster weight transform for optimized kernel
6 years ago
nihuini
cd4be6d0fa
call vulkan create_pipeline on the vkdev condition, drop opt_cpu hacks
6 years ago
nihuini
19d75955d6
arm neon assembly optimization for conv3x3s1 winograd pack4to1
6 years ago
nihuini
e63e2449fd
arm neon assembly optimization for conv7x7s2 pack1to4
6 years ago
nihui
56fd26a2da
arm neon assembly optimization for conv1x1s1 pack4to1
6 years ago
nihuini
15e86dc8e9
reduce pack4 weight memory usage for specialized kernel, reduce runtime memory usage in conv3x3s1 winograd
6 years ago
nihuini
c5f1dc3fe4
arm neon assembly optimization for conv3x3s1 pack4to1
6 years ago
nihui
e0f6e3f669
pre-interleave 8-channel weight data on aarch64, conv1x1s1 version
6 years ago
nihui
7173b6e38e
arm neon assembly optimization for conv3x3s2 pack4
6 years ago
nihuini
cf0c49dd71
arm neon assembly optimization for conv5x5s1 pack4 and conv5x5s2 pack4
6 years ago
nihui
9e529354fb
arm neon optimization for conv1x1s2 pack4
6 years ago
nihui
48e3e7d49c
move neon activation into a wrapper function
6 years ago
nihui
e19b7097df
arm neon assembly optimization for conv3x3s1 pack1to4
6 years ago
nihui
3a452f734a
arm neon assembly optimization for conv3x3s2 pack1to4
6 years ago
nihui
6edd42f566
arm neon assembly for conv1x1s1 and conv3x3s1 winograd pack4
6 years ago
nihuini
c0a4ffcf66
convolution pad_value param
6 years ago