nihui
e80fcbca8f
prefer faster and larger device local only memory on amd integrated graphics, heap budget value follows the same strategy as blob allocator ( #4936 )
2 years ago
佰阅
75e10c6e61
Support mac platform static library compilation ( #4859 )
2 years ago
tpoisonooo
a24787b32b
feat(benchmark/benchncnn.cpp): support user defined case ( #4782 )
2 years ago
dependabot[bot]
ffe1510c2f
Bump pypa/cibuildwheel from 2.13.1 to 2.15.0 ( #4926 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.13.1 to 2.15.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.13.1...v2.15.0)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2 years ago
nihui
4abadd2ffb
binaryop implicit broadcast B with 1 dimension rank for outer axis ( #4930 )
2 years ago
nihui
285d0793d4
pnnx fuse expression for scalar-like attribute and unbind chain ( #4928 )
2 years ago
nihui
60fedae38b
fix pnnx ghost reshape shape expression inputs, fix intmax overflow on fuse/eval expression ( #4923 )
2 years ago
JeremyRand
0a8cf31a05
Add POWER8 VSX toolchains ( #4853 )
* Add POWER8 VSX toolchains
POWER8, though slower than POWER9, is still used in the wild; these
toolchains should still be much faster on POWER8 than POWER8 without VSX
optimizations.
* VSX toolchains: set -cpu arg in QEMU CI tests
2 years ago
mizu-bai
4c861a0d1a
Add Building with Intel oneAPI ( #4920 )
2 years ago
Han Gao
974d1122b2
Add TH1520 gpu benchmark ( #4917 )
2 years ago
nihui
e02b6e8521
fix pnnx slice copy shape type, inplace op link output ( #4914 )
2 years ago
nihui
e13fbe2a8a
pnnx add missing Tensor.to pattern ( #4908 )
2 years ago
nihui
759d55d555
pnnx convert torch cross and t ( #4896 )
2 years ago
Joson
c6b191c3df
add RDK X3 Module benchmark ( #4895 )
2 years ago
Han Gao
412960c027
Update 3A6000 data ( #4894 )
2 years ago
nihui
c45c01c7c1
enable VK_KHR_cooperative_matrix ( #4823 )
* enable VK_KHR_cooperative_matrix
* add khr cm shader
* update glslang
* print matrix info
2 years ago
nihui
07b840087b
code clean for mha arm opt ( #4613 )
2 years ago
nihui
669ee2f2ff
pnnx update ( #4870 )
Tensor.fill
Tensor.index_put
Tensor.to
Tensor.type_as
torch.topk
fmod
call Tensor member functions with inputnames
static shape_as_tensor
nn.Linear dynamic bias
eliminate noop type_as
convert two-dim nn.Linear to ncnn gemm
convert torch.stack to ncnn concat+reshape
ignore torch einsum path input
2 years ago
nihui
55709708e9
x86 optimization for convolution int8 packed unified elempack ( #4861 )
2 years ago
ฅ'ω'ฅ
2303b77ac1
Update how-to-build.md ( #4872 )
2 years ago
Mek101
9f29a1737c
c_api return null on null layer ( #4865 )
2 years ago
Mek101
411a098d5e
Expose layer_to_index in c-api ( #4860 )
2 years ago
Mek101
a9a7be0e0a
c_api: expose Mat border processing api ( #4855 )
2 years ago
Upliner Mikhalych
e8645e9117
Don't silently ignore errors in VkCompute::submit_and_wait ( #4828 )
2 years ago
nihui
810bfbac6e
pnnx eliminate noop expand and expand_as ( #4850 )
2 years ago
JeremyRand
472244420e
VSX toolchains: check for SSE2 support ( #4845 )
Improves compatibility with Clang 11.
Also rename NCNN_SSE* options to NCNN_VSX_SSE* to avoid conflict between
x86 and POWER (went unnoticed before because x86 doesn't have an option
for toggling SSE 4.1).
Co-authored-by: Jeremy Rand <jeremyrand@danwin1210.de>
2 years ago
nihui
a87be24795
pnnx convert conv with non-zero padding mode ( #4849 )
2 years ago
dependabot[bot]
f1943fd847
Bump pypa/cibuildwheel from 2.13.0 to 2.13.1 ( #4796 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.13.0 to 2.13.1.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.13.0...v2.13.1)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2 years ago
Zhang Geng
5e50270e05
Update KunPeng 920 Platform ( #4847 )
2 years ago
nihui
91090d793b
pnnx fix build, prepend batch for broadcast reshape ( #4841 )
* fix build, prepend batch for broadcast reshape
* sanitize filename
* do not fuse to eltwise if broadcast
2 years ago
JeremyRand
47e0daf4a1
Translate x86_64 SSE to ppc64le VSX intrinsics ( #4807 )
* Add POWER9 VSX toolchains
Translating x86_64 SSE to ppc64le VSX intrinsics yields a quite large
speedup on POWER9. See this article for background:
https://www.talospace.com/2019/07/easier-power-vectorizing-for-fun-and.html
* Add power9le docs
* power9le clang toolchain: Document Clang 13+ requirement
---------
Co-authored-by: Jeremy Rand <jeremyrand@danwin1210.de>
2 years ago
Kin Yu Shek
e8d8042b90
Fix a mistake in docs/faq ( #4837 )
2 years ago
未知时光
dee2e0dc0c
fix ios-simulator-gpu badge ( #4836 )
2 years ago
張小凡
1e0d70af8c
Add translated document: glsl-extension.zh.md ( #4818 )
2 years ago
nihui
4b97730b0d
x86 packed convolution transform kernel avx2/avx512 optimization ( #4819 )
* fix non-sse non-neon weight pack
2 years ago
nihui
6c21b08727
check loongarch lasx and enable ( #4820 )
2 years ago
nihui
43aba6badb
Update glsl-extension.md
2 years ago
nihui
172b748c74
add ncnn glsl extension doc ( #4817 )
2 years ago
nihui
1283a19305
pnnx convert torch round trunc ( #4813 )
* update riscv qemu
* c906 test on qemu
* fix qemu aarch64
2 years ago
nihui
3a74ae4d3d
update rpi3b+ benchmark data
2 years ago
nihui
8c40a59216
pnnx insert reshape for ncnn global pooling ( #4812 )
2 years ago
nihui
9022b7162a
implement all explicit binaryop broadcast types ( #4809 )
* simplify binaryop
* less gpu test
* update binaryop broadcast doc
* do not test atan2 zero
2 years ago
nihui
cc37c10997
update rpi4b benchmark
2 years ago
Zhenjia Guo
d9e45ec703
fix pnnx PermissionError ( #4801 )
2 years ago
Zhang Geng
4a78b6d457
Update HUAWEI KunPeng 920 platform ( #4795 )
2 years ago
nihui
e112461d30
write shape, fuse sam image encoder attention ( #4792 )
* write shape, fuse sam image encoder attention
* set more dynamic shape as static
* less warning for constant tensor node
2 years ago
nihui
b8cf8cb73e
pnnx rewrite multiple ops ( #4780 )
fuse F.scaled_dot_product_attention
3 years ago
張小凡
ec0a8503c5
Fix function recursion errors under some low-version c++ linux compilers ( #4768 )
3 years ago
Justin62628
9dc581e490
Fix pnnx index out of range in eval expression ( #4765 )
3 years ago
dependabot[bot]
cb104e31ee
Bump pypa/cibuildwheel from 2.12.3 to 2.13.0 ( #4759 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.12.3 to 2.13.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.12.3...v2.13.0)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 years ago