nihui
f22e5d4a6d
fix build
6 years ago
nihui
3cd7a30172
shufflechannel bf16s
6 years ago
nihui
90e6be457b
conv1x1s1 bf16s neon kernel
6 years ago
nihui
7a89ce6223
slice bf16s
6 years ago
nihui
e23d5038ab
clip sigmoid tanh bf16s
6 years ago
nihuini
9ce0ad78ff
hardswish bf16s
6 years ago
nihuini
3b243cc7d5
hardsigmoid bf16s
6 years ago
nihuini
867ff7ae97
binaryop bf16s
6 years ago
nihuini
d2f7fc5a76
fix dwconv5x5s1 pack4 bf16s on aarch64
6 years ago
nihui
efaa1a4af1
dwconv5x5s1 pack4 bf16s neon kernel
6 years ago
nihui
ec40b4dbd7
test bf16s ( #1644 )
* wip
* wip
* wip
* fix avx2 test
6 years ago
Michael Grad
1dea8774b9
fix aarch64 build ( #1633 )
Co-authored-by: mgrad <mgrad@meraki.com>
6 years ago
nihuini
5255d2c328
dwconv5x5s2 pack4 bf16s neon kernel
6 years ago
nihui
d023137426
test fp16 packed and shader pack8 option ( #1636 )
* wip
* fix slice pack8 test
* fix flatten pack8 test
* fix binaryop pack8 test
* fix interp pack8 test
* rewrite cast test for different blob type and packing
6 years ago
nihuini
1984cad0e1
conv5x5s2 bf16s neon kernel
6 years ago
Xu Yang
dbd9cbab4a
fix layer innerproduct when build with requant option on ( #1624 )
6 years ago
Leo
63be294d81
Add mips layers ( #1496 )
* Add mips softmax layer
* Fix bias value error in bias_layer
* Fix max and min value error in clip_layer
* Remove unused elempack variable
* Add sigmoid mips layer
* Add mips tanh math func
* Add mips tanh layer
* Add msa_fill_w_f32 for load float type data
* Remove conv layer header
6 years ago
nihuini
17577775ae
conv5x5s1 bf16s neon kernel
6 years ago
nihui
14b8b68c2c
onnx lstm ( #1613 )
* lstm with direction
* convert onnx lstm
* lstm works
6 years ago
nihui
6561334c5f
conv7x7s2 pack1to4 bf16s neon kernel
6 years ago
nihui
b7c82fcc45
code clean, concat bf16s
6 years ago
nihui
c3f966f7e7
conv3x3s1 pack4to1 bf16s neon kernel
6 years ago
nihuini
c6ebd13afb
conv1x1 pack4to1 bf16s neon kernel
6 years ago
nihui
7d1eec3d5d
the use_bf16_storage option
6 years ago
nihui
beea536e22
relu pack4 bf16s neon assembly optimization
6 years ago
nihui
5be84ef76d
relu bf16s neon
6 years ago
nihui
c819b4d839
fix build without openmp
6 years ago
nihui
e14716dfef
convolution and pooling make padding helper, flatten innerproduct pooling bf16s neon
6 years ago
nihui
f8abe645bb
crop arm bf16
6 years ago
nihui
2c3b70f8b2
else if is ugly :D
6 years ago
nihui
d599791f59
padding pack4 bf16s neon kernel
6 years ago
xieydd
b760e22da2
fix requant relu6 bug ( #1590 )
* fix requant relu6 bug
* fix
* delete pipeline change in forward/forward_inplace to avoid race in multithreading
6 years ago
nihui
15145be72b
convdw3x3 pack4 bf16s neon kernel
6 years ago
nihui
1f2aafce8a
conv3x3 pack1to4 bf16s neon kernel
6 years ago
nihuini
a5525a0587
fix group convolution 3x3s1 with size hit winograd bound, fix #1581
6 years ago
nihui
8ffcb2f973
add conv1x1 conv3x3 pack4 bf16s neon kernel
6 years ago
nihui
44eb28fadc
fix cast arm packing test
6 years ago
nihui
f214883203
cast between float32 and bfloat16
6 years ago
Cai Shanli
0281e51fe5
named all enum types ( #1570 )
6 years ago
nihui
9929d52885
less duplicated code for crop layer, slice axes starts from 0
6 years ago
nihui
57bedd59fa
fix build without neon
6 years ago
nihui
82fa27967e
priorbox shader fix
6 years ago
zhiliu6
2606d2a2c9
Fix AVX convolution problem. ( #1549 )
6 years ago
nihui
bdc6681d44
replace nullptr with 0, fix build on old toolchain, fix #1541
6 years ago
zhiliu6
01bfdb0ce5
Optimize FP32/FP16 conversion with AVX intrinsic. ( #1545 )
* optimize yolov3 output extraction speed.
* Optimize FP32/FP16 conversion with AVX intrinsics.
6 years ago
nihui
cd8e0d8045
fix crop arm without packing, fix #1546
6 years ago
nihui
362100936d
fix scale pack4
6 years ago
nihui
61956069ae
fix pack8 condition
6 years ago
nihui
9821be7a42
create pack8 pipeline only if shader_pack8 is enabled
6 years ago
nihui
bbaa4dcce2
compile fp16pa, optimize shader for size, enable implicit fp16 arithmetic for qcom855 and qcom855plus
6 years ago