nihuini
5fdffbcaac
destroy_gpu_instance is not threadsafe anyway, fix deadlock on exit
7 years ago
BUG1989
d9f269fa3d
use sgemm fp32 on arm platform, optimize conv1x1s2 ( #1031 )
7 years ago
nihuini
838c5df839
option api changes
7 years ago
nihuini
7f7bbf12e5
new api for getting the default gpu device
7 years ago
nihuini
4de4078779
move platform includes out of namespace
7 years ago
BUG1989
b53541e8f9
fix arm winograd int8, optimize winograd x86 ( #1025 )
7 years ago
nihui
3aae0748e3
Update README.md
7 years ago
BUG1989
01b3804828
optimize the x86 convolution layer with avx2 ( #1019 )
* add the "Tu Fa" conv sgemm fp32 with avx2 for x86
* add avx2 cmake option
* fix some bugs of avx2 pull request
7 years ago
nihuini
9b33e647bd
use fixed blob names for benchmark
7 years ago
nihuini
8cb107e78c
apply model optimize
7 years ago
nihui
fe4b00f7a2
unroll outh 4 for winograd gemm
7 years ago
nihuini
74276314bb
unroll size 4 for conv1x1s1 pack4
7 years ago
nihuini
cd7559c639
more fix for fp16p, still disabled by default
7 years ago
nihuini
4b6bffa560
Mat row should be elemsize-aware
7 years ago
harhar539
5e317b98c5
fix illegal memory access at conv layer of vulkan ( #1011 )
* 1.fix pad tail bug in commit d1ea2a3 at pooling layer
* fix illegal memory access at conv layer of vulkan
fix illegal memory access at conv layer of vulkan when bias term is 0
7 years ago
nihui
25b9736f82
shader fp16 packed
7 years ago
nihuini
4b50a97e31
implement vulkan winograd23
7 years ago
nihuini
37e150162a
do not retrieve timestamp availability bits
7 years ago
nihuini
738fb6bb14
print gpu per layer benchmark
7 years ago
nihuini
8e2fb2e710
expose timestamp_period and timestamp_valid_bits
7 years ago
nihuini
c9a9486307
merge command submit and wait, expose queue_count, concurrent queue submission shall work
7 years ago
nihuini
2b21cf9e02
move mutex class family to platform.h
7 years ago
nihuini
aa94e77e68
fix pipeline object leak
7 years ago
kalcohol
a6aab42f95
add himix200 toolchain for Hi3516CV500, Hi3516DV300, Hi3519AV100. ( #989 )
7 years ago
nihui
07260527fc
fix activation params
7 years ago
nihui
3e003ffd98
fuse sigmoid
7 years ago
nihui
5adfa290a5
1x1s1d1_lds_4_4_4 is non-optimal, delete it
7 years ago
nihuini
8ac300c3a2
mat4 type in shared memory makes some driver unhappy ..
7 years ago
nihuini
f5ba97e7c6
lds optimize for conv3x3s1, conv1x1s1 and fc
7 years ago
nihuini
8322a14964
set fixed local size
7 years ago
nihuini
e46a3e428a
cmake warning--
7 years ago
nihuini
7a8f68aca6
move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works
7 years ago
nihui
d46c5989e1
mention practical-pytorch-to-onnx-to-ncnn
7 years ago
nihui
8a3955dde7
delete tensorflow2ncnn/pytorch2ncnn, it never works
7 years ago
nihuini
7c5fd53855
parse tensor info
7 years ago
nihuini
dbef56df47
write graph
7 years ago
nihuini
ed2e441a1d
fix write fp16 weight with tag
7 years ago
nihuini
c6e075cef7
fuse deconv/innerproduct relu arm
7 years ago
nihuini
81fda4d6d7
ncnnoptimize save fp16 storage weight
7 years ago
nihuini
01807482f2
parse pt file
7 years ago
nihuini
60e8261812
input param whc order
7 years ago
nihuini
48a1cb4052
more optimize routines ...
7 years ago
ShuangLiu1992
98bb0bb243
Update onnx2ncnn.cpp ( #948 )
* Update onnx2ncnn.cpp
num_filter for ConvTranspose is wrong (line 1075), should be the same as Conv, which reads the number of filters from dim(0) (line 998)
int num_filter = W.dims(0);
* Update onnx2ncnn.cpp
7 years ago
Cocoa Oikawa
102a68d193
adding detailed how to build for jetson ( #953 )
* adding how to build for jetson
* update build instruction for NVIDIA Jetson
7 years ago
nihui
be81ecf1f6
fix build on msvc
7 years ago
nihui
a75d45fa9a
chmod -x
7 years ago
nihui
2fe769f314
update fused param files, enable ncnnoptimize tool build
7 years ago
nihui
99b81b2ee9
eliminate dropout
7 years ago
nihuini
528fe8e9e3
gpu convolution/deconvolution/innerproduct fuse activation
7 years ago
nihuini
3f85cafc08
fuse relu leakyrelu clip into convolution/deconvolution/innerproduct
7 years ago