786 Commits (5fdffbcaac47c8afe79b7cdc5bc7aab76be07f87)
Author SHA1 Message Date
  nihuini 5fdffbcaac destroy_gpu_instance is not threadsafe anyway, fix deadlock on exit 7 years ago
  BUG1989 d9f269fa3d use sgemm fp32 on arm platform,optimize conv1x1s2 (#1031) 7 years ago
  nihuini 838c5df839 option api changes 7 years ago
  nihuini 7f7bbf12e5 new api for getting the default gpu device 7 years ago
  nihuini 4de4078779 move platform includes out of namespace 7 years ago
  BUG1989 b53541e8f9 fix arm winograd int8,optimize winograd x86 (#1025) 7 years ago
  nihui 3aae0748e3 Update README.md 7 years ago
  BUG1989 01b3804828 optimization the x86 convolution layer with avx2 (#1019) 7 years ago
  nihuini 9b33e647bd use fixed blob names for benchmark 7 years ago
  nihuini 8cb107e78c apply model optimize 7 years ago
  nihui fe4b00f7a2 unroll outh 4 for winograd gemm 7 years ago
  nihuini 74276314bb unroll size 4 for conv1x1s1 pack4 7 years ago
  nihuini cd7559c639 more fix for fp16p, still disabled by default 7 years ago
  nihuini 4b6bffa560 Mat row should be elemsize-aware 7 years ago
  harhar539 5e317b98c5 fix illegal memory access at conv layer of vulkan (#1011) 7 years ago
  nihui 25b9736f82 shader fp16 packed 7 years ago
  nihuini 4b50a97e31 implement vulkan winograd23 7 years ago
  nihuini 37e150162a do not retrieve timestamp availabitliy bits 7 years ago
  nihuini 738fb6bb14 print gpu per layer benchmark 7 years ago
  nihuini 8e2fb2e710 expose timestamp_period and timestamp_valid_bits 7 years ago
  nihuini c9a9486307 merge command submit and wait, expose queue_count, concurrent queue submission shall work 7 years ago
  nihuini 2b21cf9e02 move mutex class family to platform.h 7 years ago
  nihuini aa94e77e68 fix pipeline object leak 7 years ago
  kalcohol a6aab42f95 add himix200 toolchain for Hi3516CV500, Hi3516DV300, Hi3519AV100. (#989) 7 years ago
  nihui 07260527fc fix activation params 7 years ago
  nihui 3e003ffd98 fuse sigmoid 7 years ago
  nihui 5adfa290a5 1x1s1d1_lds_4_4_4 is non-optimal, delete it 7 years ago
  nihuini 8ac300c3a2 mat4 type in shared memory makes some driver unhappy .. 7 years ago
  nihuini f5ba97e7c6 lds optimize for conv3x3s1, conv1x1s1 and fc 7 years ago
  nihuini 8322a14964 set fixed local size 7 years ago
  nihuini e46a3e428a cmake warning-- 7 years ago
  nihuini 7a8f68aca6 move vulkan code to subdir, new layer interface create_pipeline and destroy_pipeline for post-loading works 7 years ago
  nihui d46c5989e1 mention practical-pytorch-to-onnx-to-ncnn 7 years ago
  nihui 8a3955dde7 delete tensorflow2ncnn/pytorch2ncnn, it never works 7 years ago
  nihuini 7c5fd53855 parse tensor info 7 years ago
  nihuini dbef56df47 write graph 7 years ago
  nihuini ed2e441a1d fix write fp16 weight with tag 7 years ago
  nihuini c6e075cef7 fuse deconv/innerproduct relu arm 7 years ago
  nihuini 81fda4d6d7 ncnnoptimize save fp16 storage weight 7 years ago
  nihuini 01807482f2 parse pt file 7 years ago
  nihuini 60e8261812 input param whc order 7 years ago
  nihuini 48a1cb4052 more optimize routines ... 7 years ago
  ShuangLiu1992 98bb0bb243 Update onnx2ncnn.cpp (#948) 7 years ago
  Cocoa Oikawa 102a68d193 adding detailed how to build for jetson (#953) 7 years ago
  nihui be81ecf1f6 fix build on msvc 7 years ago
  nihui a75d45fa9a chmod -x 7 years ago
  nihui 2fe769f314 update fused param files, enable ncnnoptimize tool build 7 years ago
  nihui 99b81b2ee9 eliminate dropout 7 years ago
  nihuini 528fe8e9e3 gpu convolution/deconvolution/innerproduct fuse activation 7 years ago
  nihuini 3f85cafc08 fuse relu leakyrelu clip into convolution/deconvolution/innerproduct 7 years ago