77 Commits (f1bdc87478c64e0dfdba3d679e6e0dfd4b84df80)

Author SHA1 Message Date
  nihui da7d1a10f7 test x86 arm convolution oom (#5492) 1 year ago
  nihui 556b79ce4d create layer decoupled (#5258) 2 years ago
  nihui ded0b78bb2 fix nvidia vulkan crash on exit (#5234) 2 years ago
  邓实诚 a1e3ebf8e5 implement simplemath (#4905) 2 years ago
  nihui c45c01c7c1 enable VK_KHR_cooperative_matrix (#4823) 2 years ago
  nihui 85991e2e0e test custom option, update ci (#4609) 3 years ago
  nihui fed99fd35b gemm output transpose, prepack c (#4479) 3 years ago
  nihui 15761fc1a6 arm vfpv4 asimdhp asimdfhm optimization for gemm (#4432) 3 years ago
  nihui 706831f8a9 arm vfpv4 optimization for innerproduct (#3950) 3 years ago
  nihui 067e8e1d92 mips unified elempack for elementwise layers (#3928) 3 years ago
  nihui 241524ffce discard weight memory for x86 arm vulkan (#3865) 4 years ago
  tpoisonooo 6fd801b6d7 feat(src/layer): add vision_transformer benchmark (#3730) 4 years ago
  nihui 308965b7e9 sanitize cooperative matrix option in tests 4 years ago
  nihui dadc640c66 x86 avx512 optimization (#3581) 4 years ago
  nihui 559e5b23f9 vulkan tensorcore optimization (#3628) 4 years ago
  nihui 920aa79f04 drop x86 avx2 fp16 (#3568) 4 years ago
  Yuzhong Yan 681141ff42 [YZ] Fix bug in unit test (#3556) 4 years ago
  nihui 3a83704c38 binary4d, unary4d (#3443) 4 years ago
  nihui 6941ec8fc9 arm neon optimization for general packed convolution (#3426) 4 years ago
  nihui 999e640d43 dynamic convolution weight (#3408) 4 years ago
  nihui f10cc6dd93 initial data structure changes for 3dcnn, conv3d, pooling3d (#3378) 4 years ago
  nihui 24fbb6e8cb honor thread setting on load and vulkan command, ci avx512 t4 (#3391) 4 years ago
  nihui 0b664ec438 fix potential out of range read in test with int8 inputs (#3357) 4 years ago
  Tijmen Verhulsdonck 4270b5c502 Fix broken codepaths with AVX only (#3254) 4 years ago
  Tijmen Verhulsdonck eaa7e24db6 Added ability to switch AVX/AVX2 during runtime (#3076) 4 years ago
  nihui 3a77b09c31 fix test failure 5 years ago
  nihuini fef61c5296 fix arm build 5 years ago
  nihuini 934a1a8e32 test flatten packing padding int8 5 years ago
  nihui 17936e9f54 fix packing risc-v test, add cpu_riscv_vlenb() 5 years ago
  nihui a61f03ec76 arm neon optimization for pixelshuffle scale 2 5 years ago
  nihui 5fe75f19ef architecture changes for int8 packing (#2771) 5 years ago
  nihui 21dc650eb3 check layer support (#2564) 5 years ago
  tpoisonooo baf49574c4 innerproduct aarch64 use gemm (#2521) 5 years ago
  nihui 54c0a13b9f build shared library (#2525) 5 years ago
  PENGUINLIONG 8f8f2de4d0 SSE2 optimization pack (#2123) 5 years ago
  maxfy1992 a106baa3b8 add interp param align_corner (#2236) 5 years ago
  Leo 5afd318b86 Support remove libstdc++ denpendency (#2030) 5 years ago
  nihuini d3f0b9f993 try smaller random values 5 years ago
  nihuini 5d5a3d1434 conv1x1s1 conv1x1s2 conv3x3s1 winograd pack8 arm fp16sa 5 years ago
  nihui aa1a9e90c5 interp shufflechannel arm fp16sa pack8 5 years ago
  nihuini df5a7f32d4 enable arm82 fp16sa pack8 test 5 years ago
  nihuini 47ae0c151a some shared arm bf16s fp16s implementation 5 years ago
  nihui bb5bfe3841 avx2 infrastructure (#1943) 5 years ago
  nihui 11cffce114 armv8.2 infrastructure (#1856) 5 years ago
  nihui 3ff40b0679 Ci rv32imc (#1940) 5 years ago
  nihuini 0d6cc01d55 innerproduct handle mish activation, fix naive C testing, fix #1930 5 years ago
  Tijmen Verhulsdonck 3325cf94f8 Added AVX swish/lrn/batchnorm (#1897) 5 years ago
  Tijmen Verhulsdonck 73aa99e83c LSTM arm/x86 + fp16 innerproduct arm (#1881) 5 years ago
  nihui 12ce58074e some code clean 5 years ago
  Tijmen Verhulsdonck 66618340ac x86 fp16 weight storage optimizations (#1871) 5 years ago