nihui
a46edcf720
x86 optimization for interp ( #3546 )
4 years ago
nihui
139554b36e
rewrite convolution x86 sgemm pack1 ( #3544 )
4 years ago
Yoh
d2999b8d53
Optimize scale x86 ( #3540 )
Co-authored-by: Yoh-Z <Yoh-Z@users.noreply.github.com>
4 years ago
nihui
fb6283c8b0
x86 avx fma optimization ( #3543 )
4 years ago
nihui
3a43cc7015
update efficientnetv2_b0 param for reduction axes changes
4 years ago
nihui
3181616439
treat old reduction axes param as failure
4 years ago
nihui
672daa7e04
xop infrastructure and optimization ( #3541 )
4 years ago
nihui
9d0c36358c
add z8350 and n5105 benchmark
4 years ago
nihui
de77b669c4
x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 ( #3538 )
* x86 sse2 optimization for conv1x1 conv3x3 pack4 and general sgemm pack4/pack4to1
* x86 sse2 optimization for conv3x3s1 pack4to1 and general sgemm convolution pack4to1, use aligned load/store
* enforce explicit alignment
4 years ago
nihui
6422e6acd3
fix x86 sgemm convolution int8 weight shuffle
4 years ago
nihui
340b4e673e
pnnx fold constant ( #3521 )
4 years ago
Kagurazaka Kotori
08ecc94d63
x86: Use _mm_cvtsi128_si{32,64} in float2int8 ( #3536 )
This patch uses _mm_cvtsi128_si{32,64} intrinsics when returning value
in float2int8() to reduce unnecessary memory accesses.
Resolves TODO "use _mm_cvtsi128_si64 on 64bit target".
Signed-off-by: Kagurazaka Kotori <kagurazakakotori@gmail.com>
4 years ago
nihui
1d0b78f9b6
Update README.md
4 years ago
nihui
a356d152bb
Update README.md
4 years ago
Joson
70795c6548
Create README.md ( #3532 )
4 years ago
teng
3ff9ae707f
simplify macro ( #3530 )
4 years ago
Kagurazaka Kotori
5c078016c2
x86/avx_mathfun.h: Remove fallback warnings ( #3527 )
* x86/avx_mathfun.h: Remove fallback warnings
This patch removes warning messages indicating falling back to SSE2
when AVX2 support is disabled as suggested. Also reorders non-AVX2
macros for readability and faster preprocessing.
Suggested-by: nihui <shuizhuyuanluo@126.com>
Signed-off-by: Kagurazaka Kotori <kagurazakakotori@gmail.com>
* apply code-format changes
Co-authored-by: kagurazakakotori <kagurazakakotori@users.noreply.github.com>
4 years ago
nihui
2d46994d2e
wrap avxvnni and avx512vnni build options over cpu feature detector
4 years ago
nihui
33e225f173
fix c api test
4 years ago
nihui
bae2ee375f
simplify c api layer forward_n output array type
4 years ago
nihuini
1be043aad5
convert torch mean/sum/prod reduction with no args
4 years ago
nihuini
b4a755495c
convert pnnx zeros roll remainder
4 years ago
nihui
c0a94cd9ca
fix armv7 without neon ( #3514 )
4 years ago
nihuini
4ba1eb6d2f
assign unique names for all pnnx operator and operand names. fix #3493
4 years ago
nihuini
457f7d1c63
fix use-after-free, fix #3492
4 years ago
nihui
b07ad54320
add zynq-7020 benchmark
4 years ago
nihui
4e4e0b9cf8
do not link libgcc as we no longer rely on builtin support cpu feature intrinsics now
4 years ago
nihui
71f377e9e9
update benchmark from Q-engineering
4 years ago
nihui
d95213a005
x86 convolution int8 optimization third stage ( #3506 )
* avx-vnni and avx512-vnni optimization for convolution int8 gemm and 3x3 winograd pack8to4/pack8to1
4 years ago
nihuini
9f7f491885
use the old-style __cpuid_count for old compiler compatibility, fix #3510
4 years ago
nihui
930c36ebe2
avx512 infrastructure ( #3407 )
4 years ago
nihui
c2896bcd4d
x86 convolution int8 optimization second stage ( #3495 )
* some sse 4.1 optimization
* sse2/avx2 optimization for convolution 3x3 winograd42 int8 pack8to4/pack8to1
4 years ago
teng
13a51fbcf8
add else ( #3494 )
4 years ago
nihui
e9b8f0a6ef
x86 avx2 optimization for convolution gemm int8 ( #3489 )
4 years ago
nihui
c5d7f963b9
layer tile ( #3491 )
4 years ago
dependabot[bot]
d25388c938
Bump pypa/gh-action-pypi-publish from 1.4.2 to 1.5.0 ( #3490 )
Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.4.2 to 1.5.0.
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.4.2...v1.5.0)
---
updated-dependencies:
- dependency-name: pypa/gh-action-pypi-publish
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 years ago
Xiaoyang Chen
4d31c46532
[pnnx] Update README.md ( #3487 )
4 years ago
nihui
7d3503c06a
pnnx Tensor index ( #3483 )
* pnnx Tensor index
* add test
4 years ago
nihuini
1db16ce9fc
pnnx torch norm stack test
4 years ago
nihuini
23d3340017
pnnx norm stack
4 years ago
nihuini
e33bdd16e8
pnnx fuse conv1d-bn convtranspose1d-bn
4 years ago
nihuini
f8ca1e7585
fix pnnx crash on unsupported expression
4 years ago
nihui
7c60dc2db7
pnnx roialign ( #3478 )
4 years ago
nihui
143258e317
pnnx torchvision deformconv2d ( #3459 )
4 years ago
Xiaohan Liu
3daabd515d
add missing doffset ( #3475 )
4 years ago
nihui
7b222a19af
update benchmark ( #3465 )
* update qcom855+ benchmark
* Update README.md
* Update README.md
* add rock3a, update imx.7d benchmark
* update raspberrypi3b+ benchmark
* update
4 years ago
dog-qiuqiu
009d607a15
add the param file of yolo-fastest in benchmark ( #3470 )
4 years ago
nihuini
014387dfae
update operators doc
4 years ago
nihuini
de436f9e26
pnnx arange matmul zeros_like expand_as
4 years ago
nihui
922f8b33c1
reduction4d, merge keepdims arg, add test ( #3469 )
4 years ago