nihui
2a00a74c1d
add loongarch ci build status
4 years ago
nihui
5b7268d95f
loongarch64 ci ( #3455 )
4 years ago
nihui
131f3d1323
x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count ( #3694 )
* convolution winograd pack16to1
* x86 deconvolution and deconvolutiondepthwise
* simpleomp allow 32 arguments
* drop shadow variable workaround
* less winograd test error
4 years ago
_0Mirror
2dcd85ca71
docs: fix docs about 'Build for iOS on macOS with xcode' ( #3696 )
4 years ago
nihui
3d169b3237
x86 avx512 optimization ( #3691 )
* convolution sgemm pack16to1
* convolution sgemm pack4to16
* eltwise avx512
4 years ago
nihui
9298d05e86
split convolution winograd transform input output ( #3688 )
4 years ago
nihui
32560f47de
detect more baseline avx512 flags ( #3687 )
4 years ago
dependabot[bot]
a0621487ac
Bump actions/download-artifact from 2 to 3 ( #3686 )
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 2 to 3.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v2...v3)
---
updated-dependencies:
- dependency-name: actions/download-artifact
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 years ago
dependabot[bot]
c7808d2c6a
Bump actions/upload-artifact from 2 to 3 ( #3685 )
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2 to 3.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v2...v3)
---
updated-dependencies:
- dependency-name: actions/upload-artifact
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 years ago
nihui
dadc640c66
x86 avx512 optimization ( #3581 )
* unified relu avx512
* unified clip avx512
* unaryop avx512
* sigmoid avx512
* binaryop avx512
* padding convolution avx512
* convolutiondepthwise avx512
* innerproduct avx512
* reshape avx512
* slice avx512
* hardsigmoid hardswish avx512
* swish avx512
* pooling avx512
* crop avx512
* convolution sgemm pack16
* convolution 3x3 winograd pack16
* interp avx512
* convolution sgemm pack1to16
* convolution sgemm pack16to8
* convolution sgemm pack8to16
* convolution sgemm pack16to4
* fix vulkan permute pack8
* fix vulkan convolution gemm pack8to1
4 years ago
nihui
462b80052f
define NOMINMAX for pnnx windows build
4 years ago
nihui
a9c59bb93c
add -mavx512bw flag for avx512 build ( #3671 )
4 years ago
dependabot[bot]
97f0fbea01
Bump codecov/codecov-action from 2.1.0 to 3 ( #3680 )
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 2.1.0 to 3.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v2.1.0...v3)
---
updated-dependencies:
- dependency-name: codecov/codecov-action
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 years ago
nihui
be7cae2bef
mips msa optimization for convolutiondepthwise innerproduct int8 ( #3679 )
4 years ago
nihui
4eb279ce26
add loongson mmi compiler header, less msa prefetch distance ( #3678 )
4 years ago
nihui
c09d7b3591
mips msa optimization for convolution int8 ( #3675 )
* basic mips msa optimization for convolution int8
* mips msa optimization for convolution int8 gemm
* mips msa optimization for convolution int8 winograd pack8to4/pack8to1
* mention msa maddv/msubv intrinsics bug
4 years ago
dependabot[bot]
1a0bf1f517
Bump pypa/cibuildwheel from 2.3.1 to 2.4.0 ( #3674 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.3.1 to 2.4.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.3.1...v2.4.0)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 years ago
nihui
72c467d1d9
mips msa optimization for quantize dequantize requantize ( #3672 )
4 years ago
WuJinxuan
e984f9f40d
add multiheadattention arm ( #3667 )
* add:multiheadattention_arm
* pass the test in local
* add omp
* return naive
Co-authored-by: EdVince <EdVince@users.noreply.github.com>
4 years ago
nihui
1fcad0e765
loongson mmi optional layer
4 years ago
nihui
0d83ad99f8
link pnnx with pthread, fix minmax issue on windows build
4 years ago
nihui
559e5b23f9
vulkan tensorcore optimization ( #3628 )
* query and enable cooperative matrix
* fix build with old vulkan sdk
* implement cooperative matrix optimization
* add nvidia-t4 coverage
* adjust test option for more coverage
4 years ago
nihui
4302f78f55
less specialization constant for vulkan conv1x1s1d1 shaders ( #3657 )
4 years ago
Guo Haria
67f52ba73c
Update yolov5.py ( #3656 )
fix rect position bug.
4 years ago
Shangxin
94beeaf000
fix word case ( #3655 )
4 years ago
nihui
b934fd53e7
fix vs2019 packaging
4 years ago
tpoisonooo
aab476f5b6
fix build warning ( #3651 )
4 years ago
nihui
9c92df814f
better condition for mixing vulkan winograd f23 and f43
4 years ago
nihui
944829838b
vulkan conv1x1s1d1 for any packing ( #3646 )
4 years ago
nihui
62a872bad3
vulkan winograd for any packing ( #3645 )
4 years ago
dependabot[bot]
aac71352fd
Bump actions/cache from 2.1.7 to 3 ( #3643 )
Bumps [actions/cache](https://github.com/actions/cache) from 2.1.7 to 3.
- [Release notes](https://github.com/actions/cache/releases)
- [Commits](https://github.com/actions/cache/compare/v2.1.7...v3)
---
updated-dependencies:
- dependency-name: actions/cache
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 years ago
nihui
9e3cc1c5df
two stage vulkan innerproduct ( #3642 )
4 years ago
nihui
677d54d496
fuse vulkan winograd pad and crop ( #3640 )
4 years ago
nihui
002c07d4ec
mix vulkan winograd f23 and f43 ( #3639 )
* mix vulkan winograd f23 and f43
* larger epsilon for winograd optimization test
4 years ago
nihui
d42e048b56
pnnx convert torch.addmm ( #3634 )
4 years ago
nihui
3ddd65e18c
massive vulkan optimization part3 ( #3632 )
* implicit gemm
* unroll direct conv by 2x2x2
4 years ago
nihui
cfcb1cffa9
massive vulkan optimization part2 ( #3621 )
* vulkan local memory optimization for conv1x1 pack4 and winograd on dgpu
* unified innerproduct pipeline creation
* reorder deconvolution weight layout
* flexible local memory data type
* more local memory optimization for conv/deconv gemm
4 years ago
tpoisonooo
edd3a78ffe
style(src/layer): remove unused para and var ( #3623 )
Co-authored-by: MegEngine <megengine@megvii.com>
Co-authored-by: tpoisonooo <tpoisonooo@users.noreply.github.com>
4 years ago
tpoisonooo
6e12647985
docs(how-to-build.md): update jetson description ( #3622 )
Co-authored-by: MegEngine <megengine@megvii.com>
4 years ago
tpoisonooo
b8f36c258e
Update how-to-build.md ( #3619 )
4 years ago
nihui
f9663d7726
pnnx support torch 1.11.0 ( #3617 )
* adapt torch 1.11.0 api changes
* find python library for torchvision linking
4 years ago
nihui
6e19ab26ba
massive vulkan optimization ( #3602 )
* vulkan deconvolution sgemm col2im
* vulkan convolution winograd43
* improve fp16s numeric stability
* vulkan convolution im2col sgemm
* check squeezenet top2, as top3 and top4 scores are too close
4 years ago
nihui
8f25ba0cab
enable fp16a on mali-g31
4 years ago
Kenji Mouri
8e29c42080
Improve SSE2 implementations in x86 targets. ( #3605 )
* Fix some typos for SSE2 floor.
* Improve the implementation of SSE2 abs.
* Improve the implementation of SSE2 ceil.
4 years ago
Kenji Mouri
2b4a2125e6
Add SSE2 implementation of floor and ceil in x86 targets. ( #3595 )
* Add SSE2 implementation of floor and ceil in x86 targets.
* apply code-format changes
* Update the SSE2 floor implementation.
Co-authored-by: MouriNaruto <MouriNaruto@users.noreply.github.com>
4 years ago
MouriNaruto
f4f4cfd784
apply code-format changes
4 years ago
nihui
8af8e52cb0
add nanodetplus pnnx example
4 years ago
Kenji Mouri
3ba5d9765f
Add arm and arm64 targets support for MSVC. ( #3592 )
4 years ago
Kenji Mouri
e4f6b118a2
Fix comment typo because it should be itm[6][6]. ( #3591 )
4 years ago
nihuini
b053a8c6d5
fix unlocked pool allocator destroyed too early issue in gpu convdw and deconvdw inference
4 years ago