dependabot[bot]
f1bdc87478
Bump pypa/cibuildwheel from 2.20.0 to 2.22.0 ( #5794 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.20.0 to 2.22.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.20.0...v2.22.0)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1 year ago
nihui
8d2ac57824
fix missing asimdfhm target macro in ndk-r21 ( #5804 )
1 year ago
nihui
1172b04355
pnnx pass level2 priority ( #5791 )
* constantlist is constant
* f pad value none
* skip ncnn istft center=False test for old torch
* fix fuse pad conv1d
1 year ago
nihui
0734b657d9
spectrogram and inverse spectrogram ( #5779 )
* only supports hann, hamming and all-one window
* inverse spectrogram does not support length parameter
* spectrogram always returns torch.view_as_real(out) as ncnn does not support complex typed mat yet
* inverse spectrogram always accepts torch.view_as_complex(in) as ncnn does not support complex typed mat yet
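The real/complex layout described in the notes above can be sketched in plain Python. This is only an illustration under stated assumptions, not ncnn's or pnnx's implementation: the helper names (`hann_window`, `frame_spectrum`, `view_as_real`, `view_as_complex`) are hypothetical, the DFT is the naive O(n^2) form, and the window follows the periodic Hann convention.

```python
import cmath
import math


def hann_window(n):
    # Periodic Hann window (assumed convention for this sketch).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def frame_spectrum(frame, window):
    # Naive one-sided DFT of a single windowed frame (illustration only).
    n = len(frame)
    x = [s * w for s, w in zip(frame, window)]
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def view_as_real(spectrum):
    # Store each complex bin as a [real, imag] pair, i.e. a trailing
    # dimension of size 2, so no complex-typed container is needed.
    return [[z.real, z.imag] for z in spectrum]


def view_as_complex(pairs):
    # Inverse of view_as_real: rebuild complex bins from [real, imag] pairs.
    return [complex(re, im) for re, im in pairs]
```

A spectrogram output stored this way has one extra trailing axis of length 2, and the inverse step only needs to reinterpret those pairs before transforming back.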
1 year ago
Joson
c043612e91
disable pnnx debug model ( #5784 )
1 year ago
nihui
9cefe9a624
avx vnni int8, avx vnni int16, avx ne convert infrastructure ( #5749 )
1 year ago
nihui
e71fdf8e51
pnnx write implicit int conversion in python script ( #5767 )
1 year ago
nihui
c32442aa09
disable x86 auto recip optimization for potential precision loss ( #5762 )
1 year ago
nihui
6077adc6bc
pnnx do not fold tensor with dynamic shape, use fp32 module by default ( #5755 )
1 year ago
nihui
e7602a206b
fix gemm arm int8 scales descales offset ( #5750 )
1 year ago
nihui
c1f9e959f5
pnnx torch 2.5 ( #5748 )
1 year ago
nihui
8fe62812c9
arm neon optimization for layernorm fp32/bf16s/fp16s ( #5746 )
1 year ago
Upliner Mikhalych
cbd17cd062
Fix #5741 don't crash when vkCreateDevice fails ( #5742 )
1 year ago
nihui
73d3519326
layernorm x86 optimization, re ( #5745 )
1 year ago
nihui
bd1f39ed82
blacklist mesa vulkan cooperative matrix feature ( #5739 )
ref https://gitlab.freedesktop.org/mesa/mesa/-/issues/10847
1 year ago
nihui
8105c75120
improve compatibility of harmonyos cpu topology abi ( #5740 )
1 year ago
nihui
f856011297
pnnx drop onnx weight-like graph input ( #5736 )
1 year ago
nihui
121b1fecd5
apply code-format changes
1 year ago
nihui
66b54cbea2
multiheadattention int8 quantization ( #5733 )
* x86 vulkan fallback
* comment about bf16s
1 year ago
nihui
1c7af00499
gemm int8 quantization ( #5706 )
* quantize gemm
* write gemm quantize scales
* update doc
* less openmp args
* x86 riscv fallback
* skip gemm vulkan int8
* fix noint8 test, fix arm bf16 test
* enable vfpv4 on neon build only
* fix gemm vulkan without C
* fp16 pack8 output
* enable elempack=8 only for asimdhp+
* tiled gemm int8 test
* opt arm64 tiles, fix asimdhp dispatch
1 year ago
Ankush Goel
9b5f6a39b4
fix: typo ( #5709 )
1 year ago
nihui
80c78a0e40
pnnx fuse t5-layernorm as rmsnorm ( #5675 )
1 year ago
nihui
21e54d8c7a
update modelwriter for rmsnorm ( #5676 )
1 year ago
nihui
204583ba52
x86 sse2/avx/avx512 optimization for rmsnorm ( #5672 )
1 year ago
nihui
8077d340a9
arm neon optimization for rmsnorm ( #5668 )
1 year ago
nihui
5df5413c81
embed int8 quantization and add embed test ( #5667 )
1 year ago
nihui
5e2d56d025
pnnx fuse mobilevit style selfattention, onnx2pnnx handle more general gemm ( #5659 )
1 year ago
nihui
25a22e0c0c
update release download
1 year ago
張小凡
a6d3ef5a0b
Fixed bug #5637 ( #5640 )
1 year ago
nihui
27f64a1382
Update README.md
1 year ago
Joey Ballentine
a0c9e7783d
Add python binding for loading bin from memory ( #5164 )
1 year ago
nihui
4de536951a
onnx2pnnx do not fold single constant for gemm weight ( #5634 )
1 year ago
nihui
789d8686c7
pnnx functionize do not create shadow op for identity consumers ( #5632 )
1 year ago
nihui
70310e951e
fix out of range read in convolution im2col aarch64 ( #5631 )
1 year ago
Kelun Lei
07196eee2e
benchmark: add Kunpeng 920 7260 ( #5606 )
1 year ago
張小凡
e550419508
Add yolov8 ncnn example ( #5506 )
1 year ago
nihui
fdf0df3079
RMSNorm ( #5630 )
1 year ago
nihui
abad90cb1c
pnnx drop torch.max torch.min indice node if not used ( #5629 )
1 year ago
nihui
eb6e084c2d
pnnx convert nn.RMSNorm F.rms_norm ( #5628 )
1 year ago
nihui
c46278d0bb
pnnx convert onnx resize with roi, torch.max torch.min with dim returns tuple ( #5627 )
* pnnx convert onnx resize with roi, torch.max torch.min with dim returns tuple
* torch max min only support single dim
1 year ago
nihui
ae17e5e177
ci release ubuntu2404, major release yml refactor ( #5624 )
* release ubuntu 24.04 package, major release yml refactor
* update macos vulkan sdk
* set MACOSX_DEPLOYMENT_TARGET
1 year ago
nihui
ecfd88a11b
pnnx2ncnn convert torch.roll with one or two shifts ( #5623 )
1 year ago
nihui
f3cd4c2e91
pnnx2ncnn handle F.maxpool without dilation param ( #5622 )
1 year ago
nihui
b9debee8fb
pnnx ci for torch 2.4 ( #5618 )
* update onnx proto
1 year ago
nihui
60823a8de3
pnnx handles sdpa batch index ( #5617 )
1 year ago
dependabot[bot]
03cf161dbd
Bump pypa/cibuildwheel from 2.17.0 to 2.20.0 ( #5613 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.20.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.17.0...v2.20.0)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1 year ago
Galasnow
5b5c1fdb8f
Fix build error with NDK r27 ( #5615 )
Enable policy CMP0057 for cmake version >= 3.3
1 year ago
佰阅
391152f500
c_api support set_vulkan_device ( #5610 )
1 year ago
quink
92e0b8253b
arm/convolution_3x3_pack1to8_fp16s: prefer ldr/str over ld1/st1 ( #5603 )
Depending on the arch, ldr/str can be faster than ld1/st1, especially
when loading a single lane. For example, on Cortex A75,
1. the execution latency of 'ldr q0' and 'ldr h0' is 5
2. the execution latency of 'ld1 {v0.16b}' is 6
3. the execution latency of 'ld1 {v0.h}[0]' is 8
On Cortex X3,
1. the execution latency of 'ldr q0' and 'ldr h0' is 6
2. the execution latency of 'ld1 {v0.16b}' is 6
3. the execution latency of 'ld1 {v0.h}[0]' is 8
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
張小凡
051b04ffb4
Updated use-ncnn-with-pytorch-or-onnx document ( #5557 )
1 year ago