257 Commits (837e6b047e8cf18650cb9ffe27dffa170a2401e2)

Author SHA1 Message Date
  BUG1989 1b0e33460d add armv7 int8 conv3x3s1,using vaddw to replace vadd and vmovl 7 years ago
  nihui 72411b7a6c restore the old conv3x3s2 as reference, fast dilation convolution fails on striding 7 years ago
  nihui 1f20eb4e8c pack weight and more unroll makes improvement, ~20% faster for conv3x3s2 7 years ago
  chensy 30cc738309 fix asm "invalid operand" error for target iOS armv7 on file dequantize_arm.cpp 7 years ago
  Diego Gomes 4d73407df8 fix gettid call for glibc 7 years ago
  Diego Gomes 534f38ed87 fix auxv read for elf64 7 years ago
  nihuini 2dbaf6f7b7 store int8 scale in binary 7 years ago
  nihui fe14037777 more sub op preload 7 years ago
  nihui 2fe7ada4d8 add arm int8 convolution stub, preload group op for x86 7 years ago
  nihui eac7c66a97 fix fp32 group convolution on x86 7 years ago
  nihui 5d04a3a45c layer holds bottom blob scale, depthwise convolution read group scales 7 years ago
  nihui 354b95256c bump param version, backward compatible 7 years ago
  nihuini 2bc504925e fix int8_scales from multiple blobs, fix #512 7 years ago
  nihuini da352916fe fix pd using flag condition 8 years ago
  nihuini 6b536701c3 sub-mat shall be allocator-aware 8 years ago
  nihuini e34aa7786a armv7 int8 quantize/dequantize and conv1x1s1 8 years ago
  nihuini 4be27a0a89 int8 inference on x86 8 years ago
  nihui a169cec363 core int8 inference, quantize and dequantize, net using flag, caffe2ncnn reads int8 scale table 8 years ago
  nihui b6b90c888f high resolution timestamp on windows 8 years ago
  nihui af49e2cada install allocator.h 8 years ago
  nihui 7e1f358084 fix build on msvc 8 years ago
  nihui 9706cd1447 implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469 8 years ago
  nihui 5879cb4d15 sgemm outperform direct conv on large channel 8 years ago
  nihuini 4b8101e7fc Revert "optimize interleave section for load first, about 5%~10% speed gain" 8 years ago
  nihui 56a667472a sgemm is always faster on common channel size 8 years ago
  nihui 1e4eaeeacd optimize interleave section for load first, about 5%~10% speed gain 8 years ago
  nihui 6895cbf810 single vldm is faster than two vld1 on armv7, and some pipeline optimize 8 years ago
  nihuini 05d7562a5d reorder kernel weight, pipeline friendly ;) 8 years ago
  nihui b8f4f024a4 implement reorg yolodetectionoutput layer from caffe-yolov2 8 years ago
  nihuini ee98817446 proper first row/col handling in resize family, fix #429 8 years ago
  nihuini 511baa6718 optional image pixel api, fix #434 8 years ago
  nihuini 2368d29a1e more explicit alignment on armv7 8 years ago
  nihuini d172a34329 direct assembly port, enable convolution 1x1 sgemm on armv7 8 years ago
  nihuini b3e24cafc3 openmp++ 8 years ago
  nihuini 0fdb8da60e sgemm convolution 1x1 wip, about 20%~75% faster on aarch64, while armv7 compiler is foolish qaq 8 years ago
  nihuini 2b20bf940c drop armv7 vaddvq_f32 hack 8 years ago
  nihui 72bb261e7a switch to winograd5 8 years ago
  nihuini a234e9240d fix concat on height 8 years ago
  nihuini 003873c55b crop on channel and crop by param 8 years ago
  Chang, Hui-Tang dc2a689d10 fix proposal roi_score_blob bug (#430) 8 years ago
  nihuini 99a343ce70 allocate after permute, reduce peak memory usage 8 years ago
  nihuini 0ce0c11851 load sub-op in advance for group convolution 8 years ago
  nihuini 86f4264c7c arm neon assembly for winograd5 8 years ago
  kyuusaku d2416187dc fix parameter check for interp (#425) 8 years ago
  nihuini 90643630c2 apple a10/a11 is armv8.2-a 8 years ago
  nihuini 50e1f0e531 const for to_pixels family 8 years ago
  nihuini ce74836e2a yet another winograd convolution implementation, unroll outch 8 tiles 4 inch 4, about 22% faster, more optimization may comes soon :> 8 years ago
  nihui 30b6cc4ecd rdiv binaryop 8 years ago
  nihui 2f90a794ad rsub binaryop 8 years ago
  nihuini a341e7465c reject to load model with empty network, fix #392 8 years ago