Howave
123ca35e00
fix compile warnings (#1042)
7 years ago
nihuini
bade132589
comment++
7 years ago
nihuini
81be8c86ae
fix bus error in resize_bilinear_c2 on armv7
7 years ago
nihuini
17d63a1491
fix bus error in resize_bilinear_c3 on armv7
7 years ago
nihuini
e9ffdb5bdd
16bit storage on arm mali is buggy
7 years ago
nihui
1273d69c20
update qcom410 imx7d benchmark
7 years ago
nihuini
73911492d7
fix validation warning on querypool destruction, enable fp16p by default
7 years ago
nihuini
040a8d2427
set vulkan device by gpu index
7 years ago
nihuini
9f9ac56538
update qcom810 and iphone5s benchmark
7 years ago
nihui
21f79b8546
prefer cpu fp16 casting to reduce upload/download overhead on discrete gpu
7 years ago
nihui
af950819cd
convert add_n and ElementWiseSum, fix #1008
7 years ago
nihui
721abe91a8
packed mat is handy
7 years ago
nihui
afcfe0936f
fix false warnings
7 years ago
nihuini
e56f0d47cc
fix out of range load and store in bilinear resize c2/c3 neon block
7 years ago
BUG1989
c2022f4501
optimize conv sgemm with sse on intel platform (#1035)
* optimize conv sgemm with sse
* Update convolution_x86.cpp
7 years ago
nihuini
e09607bc22
add option to upload model function, pipeline creation honors option use flags, setting allocator per extractor do not make much sense
7 years ago
nihuini
83d7154be8
adapt option api changes
7 years ago
nihuini
e09d11f936
rough fix build without arm neon
7 years ago
nihuini
5fdffbcaac
destroy_gpu_instance is not threadsafe anyway, fix deadlock on exit
7 years ago
BUG1989
d9f269fa3d
use sgemm fp32 on arm platform, optimize conv1x1s2 (#1031)
7 years ago
nihuini
838c5df839
option api changes
7 years ago
nihuini
7f7bbf12e5
new api for getting the default gpu device
7 years ago
nihuini
4de4078779
move platform includes out of namespace
7 years ago
BUG1989
b53541e8f9
fix arm winograd int8, optimize winograd x86 (#1025)
7 years ago
nihui
3aae0748e3
Update README.md
7 years ago
BUG1989
01b3804828
optimize the x86 convolution layer with avx2 (#1019)
* add the "Tu Fa" conv sgemm fp32 with avx2 for x86
* add avx2 cmake option
* fix some bugs of avx2 pull request
7 years ago
nihuini
9b33e647bd
use fixed blob names for benchmark
7 years ago
nihuini
8cb107e78c
apply model optimize
7 years ago
nihui
fe4b00f7a2
unroll outh 4 for winograd gemm
7 years ago
nihuini
74276314bb
unroll size 4 for conv1x1s1 pack4
7 years ago
nihuini
cd7559c639
more fix for fp16p, still disabled by default
7 years ago
nihuini
4b6bffa560
Mat row should be elemsize-aware
7 years ago
harhar539
5e317b98c5
fix illegal memory access at conv layer of vulkan (#1011)
* 1.fix pad tail bug in commit d1ea2a3 at pooling layer
* fix illegal memory access at conv layer of vulkan
fix illegal memory access at conv layer of vulkan when bias term is 0
7 years ago
nihui
25b9736f82
shader fp16 packed
7 years ago
nihuini
4b50a97e31
implement vulkan winograd23
7 years ago
nihuini
37e150162a
do not retrieve timestamp availability bits
7 years ago
nihuini
738fb6bb14
print gpu per layer benchmark
7 years ago
nihuini
8e2fb2e710
expose timestamp_period and timestamp_valid_bits
7 years ago
nihuini
c9a9486307
merge command submit and wait, expose queue_count, concurrent queue submission shall work
7 years ago
nihuini
2b21cf9e02
move mutex class family to platform.h
7 years ago
nihuini
aa94e77e68
fix pipeline object leak
7 years ago
kalcohol
a6aab42f95
add himix200 toolchain for Hi3516CV500, Hi3516DV300, Hi3519AV100. (#989)
7 years ago
nihui
07260527fc
fix activation params
7 years ago
nihui
3e003ffd98
fuse sigmoid
7 years ago
nihui
5adfa290a5
1x1s1d1_lds_4_4_4 is non-optimal, delete it
7 years ago
nihuini
8ac300c3a2
mat4 type in shared memory makes some driver unhappy ..
7 years ago
nihuini
f5ba97e7c6
lds optimize for conv3x3s1, conv1x1s1 and fc
7 years ago
nihuini
8322a14964
set fixed local size
7 years ago
nihuini
e46a3e428a
cmake warning--
7 years ago
nihuini
7a8f68aca6
move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works
7 years ago