Howave
123ca35e00
fix compile warnings ( #1042 )
7 years ago
nihuini
bade132589
comment++
7 years ago
nihuini
81be8c86ae
fix bus error in resize_bilinear_c2 on armv7
7 years ago
nihuini
17d63a1491
fix bus error in resize_bilinear_c3 on armv7
7 years ago
nihuini
e9ffdb5bdd
16bit storage on arm mali is buggy
7 years ago
nihuini
73911492d7
fix validation warning on querypool destruction, enable fp16p by default
7 years ago
nihuini
040a8d2427
set vulkan device by gpu index
7 years ago
nihui
21f79b8546
prefer cpu fp16 casting to reduce upload/download overhead on discrete gpu
7 years ago
nihui
721abe91a8
packed mat is handy
7 years ago
nihui
afcfe0936f
fix false warnings
7 years ago
nihuini
e56f0d47cc
fix out of range load and store in bilinear resize c2/c3 neon block
7 years ago
BUG1989
c2022f4501
optimize conv sgemm with sse on intel platform ( #1035 )
* optimize conv sgemm with sse
* Update convolution_x86.cpp
7 years ago
nihuini
e09607bc22
add option to upload model function, pipeline creation honors option use flags, setting allocator per extractor does not make much sense
7 years ago
nihuini
e09d11f936
rough fix build without arm neon
7 years ago
nihuini
5fdffbcaac
destroy_gpu_instance is not threadsafe anyway, fix deadlock on exit
7 years ago
BUG1989
d9f269fa3d
use sgemm fp32 on arm platform, optimize conv1x1s2 ( #1031 )
7 years ago
nihuini
838c5df839
option api changes
7 years ago
nihuini
7f7bbf12e5
new api for getting the default gpu device
7 years ago
nihuini
4de4078779
move platform includes out of namespace
7 years ago
BUG1989
b53541e8f9
fix arm winograd int8, optimize winograd x86 ( #1025 )
7 years ago
BUG1989
01b3804828
optimize the x86 convolution layer with avx2 ( #1019 )
* add the "Tu Fa" conv sgemm fp32 with avx2 for x86
* add avx2 cmake option
* fix some bugs of avx2 pull request
7 years ago
nihui
fe4b00f7a2
unroll outh 4 for winograd gemm
7 years ago
nihuini
74276314bb
unroll size 4 for conv1x1s1 pack4
7 years ago
nihuini
cd7559c639
more fix for fp16p, still disabled by default
7 years ago
nihuini
4b6bffa560
Mat row should be elemsize-aware
7 years ago
harhar539
5e317b98c5
fix illegal memory access at conv layer of vulkan ( #1011 )
* 1.fix pad tail bug in commit d1ea2a3 at pooling layer
* fix illegal memory access at conv layer of vulkan
fix illegal memory access at conv layer of vulkan when bias term is 0
7 years ago
nihui
25b9736f82
shader fp16 packed
7 years ago
nihuini
4b50a97e31
implement vulkan winograd23
7 years ago
nihuini
37e150162a
do not retrieve timestamp availability bits
7 years ago
nihuini
738fb6bb14
print gpu per layer benchmark
7 years ago
nihuini
8e2fb2e710
expose timestamp_period and timestamp_valid_bits
7 years ago
nihuini
c9a9486307
merge command submit and wait, expose queue_count, concurrent queue submission shall work
7 years ago
nihuini
2b21cf9e02
move mutex class family to platform.h
7 years ago
nihuini
aa94e77e68
fix pipeline object leak
7 years ago
nihui
3e003ffd98
fuse sigmoid
7 years ago
nihui
5adfa290a5
1x1s1d1_lds_4_4_4 is non-optimal, delete it
7 years ago
nihuini
8ac300c3a2
mat4 type in shared memory makes some driver unhappy ..
7 years ago
nihuini
f5ba97e7c6
lds optimize for conv3x3s1, conv1x1s1 and fc
7 years ago
nihuini
8322a14964
set fixed local size
7 years ago
nihuini
7a8f68aca6
move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works
7 years ago
nihuini
c6e075cef7
fuse deconv/innerproduct relu arm
7 years ago
nihui
be81ecf1f6
fix build on msvc
7 years ago
nihuini
528fe8e9e3
gpu convolution/deconvolution/innerproduct fuse activation
7 years ago
nihuini
3f85cafc08
fuse relu leakyrelu clip into convolution/deconvolution/innerproduct
7 years ago
nihuini
7984ffcb4d
ncnnoptimize tool
7 years ago
nihuini
b81e1f3906
get rid of the old workaround :)
7 years ago
791136190
e2e8e1b9d7
mxnet2ncnn tool support symbol.softmax op ( #938 )
* [CHG] when using HybridBlock to call F.softmax, the softmax op name is "softmax" (mxnet_version: 10301)
* [CHG] Remove type mismatch error when using static code detection tool
7 years ago
nihuini
5d86014d9c
add missing barrier for transfer dst, fix softmax pack4, fix #932
7 years ago
nihuini
4729ea3505
bottom blob memory never alias, reuse blob memory more elegantly relying on refcount
7 years ago
nihui
274392eb80
convolution padding same on gpu
7 years ago