ice
7a0c19c856
feat: pipe & spv cache
9 months ago
Copilot
4644540ea4
Add Windows XP support merging PRs #6176 and #6177 ( #6204 )
Co-authored-by: Sugar-Baby <87747602+Sugar-Baby@users.noreply.github.com>
Co-authored-by: AtomAlpaca <66774326+AtomAlpaca@users.noreply.github.com>
10 months ago
nihui
fe509e9bc1
flexible coopmat mnk and unified elempack for vulkan deconvolution gemm ( #6199 )
10 months ago
nihui
0cfe201b3c
fix vulkan absval fp16 ( #6167 )
* fix 1d 2d cstep
* fix ranged cstep
10 months ago
nihui
171b9d1bba
use spdx license header, copyright Tencent ( #6152 )
10 months ago
nihui
9f832c19c1
vulkan int8 packing quantize dequantize requantize ( #3731 )
* add int8 definitions
* packing vulkan int8/int32, quantize vulkan
* vulkan dequantize
* requantize vulkan
11 months ago
nihui
bd0b111775
vulkan tight fp16p pack1 ( #6127 )
11 months ago
nihui
24a3b99f1f
drop layer support_image_storage and option use_image_storage ( #6126 )
* fix pyncnn build
11 months ago
nihui
211e238639
drop layer forward vkimagemat ( #6124 )
vkimagemat was originally used as mat storage in the hope of improving performance on old Adreno GPUs, but in most cases it is slower than the CPU, and it no longer suits the latest Adreno architecture or large shapes
11 months ago
nihui
b9f98f0d3a
always allocate aligned size for 1d/2d mat and vkmat ( #6104 )
* fix sub mat cstep
* fix embed
* rnn/lstm/gru int8 test without rounding diversity
11 months ago
nihui
4c4ecdf118
dequantize pack8 for all datatypes, fix convdw int8 dequant pack8 ( #6109 )
11 months ago
hanzh
78b2e68728
arm unified elempack optimization for groupnorm ( #4080 )
Co-authored-by: mmyyy22 <mmyyy22@users.noreply.github.com>
Co-authored-by: nihui <nihuini@tencent.com>
11 months ago
nihui
8363040cb4
pnnx ncnn gelu fast mode, fix interp 2d resize ( #5999 )
1 year ago
nihui
ef0b0e631c
interp output size expression ( #5994 )
1 year ago
nihui
39c055d7f2
crop axes starts ends expression ( #5976 )
* skip dynamic tensor index
* handle clone oom
1 year ago
nihui
eed257df1f
ci update llvmpipe ( #5954 )
* check image fp16
1 year ago
nihui
07267f2618
softmax 4d test and vulkan, softmax unified elempack optimization for x86 arm riscv ( #5931 )
1 year ago
nihui
6396a732ef
reshape shape expression, drop reshape permute, test reshape oom ( #5918 )
1 year ago
Yexuan Wu
3571d7e8ec
Support a better API to detect big/little cores on Windows after Win7 ( #5927 )
1 year ago
erquren
c9e0c877f9
add missing license header ( #5925 )
1 year ago
nihui
1e3fcb9dda
paramdict value string type, natural array representation ( #5915 )
1 year ago
nihui
23890900c2
x86 optimization for convolution int8 gemm ( #5874 )
* cmake check compiler test cannot be optimized out
* drop requant pack4
1 year ago
nihui
4a70be45ed
fix requantize pack4to8 ( #5893 )
1 year ago
nihui
ff5b554003
restrict one dim quantize scale size, test quantize oom ( #5892 )
* restrict one dim quantize scale size
* sse2 requantize pack8
1 year ago
nihui
956bccd295
restrict one dim requantize scale bias size ( #5888 )
1 year ago
nihui
48e1260a6f
restrict one dim dequantize scale bias size ( #5886 )
1 year ago
nihui
21a71d3673
slim x86 dequantize ( #5879 )
* remove dequantize pack8 test, seems to be useless
1 year ago
nihui
a13958ef47
optimize ncnn test building time ( #5867 )
1 year ago
nihui
39cf4f6018
slim reduction ( #5866 )
1 year ago
nihui
44e0d95c0d
x86 sse2/xop/avx/avx2/avx512/vnni/vnniint8 optimization for gemm int8 ( #5763 )
* skip round problem
* sde on ubuntu24
1 year ago
nihui
a9553fcc15
skip unaryop round halfway cases for powerpc ( #5814 )
1 year ago
nihui
19caca3140
port rvv intrinsic 1.0+ ( #5642 )
* zfh zvfh xtheadvector infra
* dispatch for rvv and xtheadvector
* dispatch for non-vector zfh
* port xtheadvector recp rsqrt trunc
* general rvv gemm
* c906 and c910 ci
* old tuple code clean
* update riscv64 ci
* update build doc
* drop old th1520 toolchain
1 year ago
nihui
0734b657d9
spectrogram and inverse spectrogram ( #5779 )
* only supports hann, hamming and all-one window
* inverse spectrogram does not support length parameter
* spectrogram always returns torch.view_as_real(out) as ncnn does not support complex typed mat yet
* inverse spectrogram always accepts torch.view_as_complex(in) as ncnn does not support complex typed mat yet
1 year ago
nihui
e7602a206b
fix gemm arm int8 scales descales offset ( #5750 )
1 year ago
nihui
8fe62812c9
arm neon optimization for layernorm fp32/bf16s/fp16s ( #5746 )
1 year ago
nihui
66b54cbea2
multiheadattention int8 quantization ( #5733 )
* x86 vulkan fallback
* comment about bf16s
1 year ago
nihui
1c7af00499
gemm int8 quantization ( #5706 )
* quantize gemm
* write gemm quantize scales
* update doc
* less openmp args
* x86 riscv fallback
* skip gemm vulkan int8
* fix noint8 test, fix arm bf16 test
* enable vfpv4 on neon build only
* fix gemm vulkan without C
* fp16 pack8 output
* enable elempack=8 only for asimdhp+
* tiled gemm int8 test
* opt arm64 tiles, fix asimdhp dispatch
1 year ago
nihui
5df5413c81
embed int8 quantization and add embed test ( #5667 )
1 year ago
nihui
fdf0df3079
RMSNorm ( #5630 )
1 year ago
nihui
3752d71200
fix potential fp16s bf16s conflicts on arm vfpv4 ( #5578 )
* fix potential fp16s bf16s conflicts on armv7 vfpv4
* but prefer fp16 on armv8.2
1 year ago
nihui
4c3debae2d
multiheadattention scale param ( #5526 )
* update swiftshader
* skip vs2017 swiftshader
1 year ago
nihui
8235cad999
mha allow qdim differs from embed_dim ( #5519 )
* test mha oom
1 year ago
nihui
39c27de47b
test concat oom ( #5502 )
1 year ago
nihui
093c516898
test slice oom ( #5501 )
1 year ago
nihui
da7d1a10f7
test x86 arm convolution oom ( #5492 )
* skip mips loongarch riscv oom test atm
* test softmax oom
1 year ago
nihui
08b7d99a75
rnn/lstm/gru dynamic quantization ( #5435 )
2 years ago
nihui
9ce7930413
x86 optimization for convolution tiled gemm ( #5426 )
2 years ago
nihui
e3758fdd19
fix test reduction warning ( #5397 )
2 years ago
nihui
984d6dd844
promote vfpv4 for auto fp16 storage conversion ( #5325 )
* promote vfpv4 for auto fp16 storage conversion
* always report neon and vfpv4 for arm64
2 years ago
nihui
5329d32e74
check vulkan fp16 uniform support and implement lfp conversion without fp16u ( #5287 )
2 years ago