nihui
9376ba71c1
less unroll for unaryop arm, fix padding arm warning
4 years ago
li mengyang
3a2ac84e3c
add benchmark for amd 5700g ( #3878 )
4 years ago
nihui
7886e90c65
split arm82 source for smaller binary and memory footprint ( #3877 )
* split arm82 source, wip
* check compiler arm82 only for arm 64bit target
* drop arm82 registry
* strict check compiler support arm82
4 years ago
nihui
c1f9b03c0b
unified arm absval clip relu dropout hardsigmoid hardswish sigmoid swish unaryop ( #3876 )
4 years ago
nihui
c5fed063f5
pnnx fuse expression skip foldable ( #3875 )
4 years ago
nihui
40a69a2dd3
discard riscv weight memory ( #3874 )
* discard riscv innerproduct weight
* drop riscv conv convdw weight
* drop riscv deconv deconvdw weight
4 years ago
nihui
7441d8a7c8
eliminate noop cat for single input ( #3871 )
4 years ago
nihui
06a36e9c1f
discard weight memory for mips ( #3869 )
4 years ago
nihui
241524ffce
discard weight memory for x86 arm vulkan ( #3865 )
* discard weight memory for x86 and vulkan
* drop arm innerproduct weight
* drop arm convolution weight
* drop arm convolutiondepthwise weight
* drop x86 vulkan deconvolution deconvolutiondepthwise weight
* drop arm deconvolution deconvolutiondepthwise weight
* arm neon assembly optimization for innerproduct pack4
4 years ago
nihui
d2e87a8264
mips general optimization for convdw3x3 ( #3859 )
4 years ago
nihui
d373407bcb
add c906 c910 v240 toolchain
4 years ago
nihui
48fb166a48
mips loongson mmi optimization for convolution gemm int8 ( #3855 )
4 years ago
nihui
667be10fb0
riscv general optimization for convolution sgemm and winograd and innerproduct ( #3857 )
* riscv general optimization for convolution sgemm and winograd pack1
* riscv general optimization for innerproduct
* riscv general optimization for convdw3x3
4 years ago
nihui
c3adbcf9f3
mips optimization for convolution sgemm ( #3853 )
* mips optimization for convolution sgemm
* mips optimization for general convolution int8 gemm
* mips optimization for convolution winograd pack1
* preload magic
4 years ago
nihui
a5bcc8895f
armv5 optimization for convolution sgemm ( #3852 )
4 years ago
nihui
50d04dee30
optimize sgemm and winograd remain size register layout ( #3851 )
4 years ago
nihui
0a4f50dbf4
arm neon assembly optimization for innerproduct unroll outch 4 ( #3848 )
4 years ago
nihui
569ee37c52
arm neon optimization for pooling bf16s, fix some bf16s packing issue, relax winograd transform intrinsic order ( #3847 )
4 years ago
dependabot[bot]
781145f15b
Bump pypa/cibuildwheel from 2.5.0 to 2.6.0 ( #3844 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel ) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases )
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md )
- [Commits](https://github.com/pypa/cibuildwheel/compare/2.5.0...2.6.0 )
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4 years ago
nihui
e49f0226e1
multi-threading rnn/lstm/gru with openmp ( #3834 )
4 years ago
FeiGeChuanShu
02107e0fbf
fix yolox input shape w!=h ( #3839 )
4 years ago
nihui
e7a664c6e5
convert pnnx torch.index_select and torch.scatter_add ( #3842 )
4 years ago
nihui
02a7e64e18
optimize x86 winograd input transform transpose ( #3818 )
* optimize x86 winograd input transform transpose
* x86 sse2/avx optimization for convolution winograd23/43 pack1
4 years ago
Yoh
2a05c69cdd
fix x86 unaryop bug on gcc-4.4 ( #3838 )
4 years ago
nihui
4c7965781f
add pnnx ncnn pass for chain dict output ( #3836 )
* chain dict output
* recognize sub_
4 years ago
nihui
f48d45209b
binaryop type specialization ( #3830 )
4 years ago
nihui
026a04f220
convert torch.norm to ncnn, fix F.normalize vector ( #3828 )
4 years ago
tripleMu
6f4b444fe5
Fix .gitignore ( #3824 )
4 years ago
nihui
bf64d8f1ec
fix winograd function name ( #3820 )
4 years ago
陸 言
2161ab2a0c
Edit _bias128 in scale_x86.cpp for useless if ( #3821 )
4 years ago
nihui
c16cac2678
update glslang, fix system glslang include path ( #3819 )
4 years ago
nihui
9e11dac7d1
simpleomp with libgomp abi ( #3816 )
4 years ago
BUG1989
c2fb93b6ff
Add the benchmark of AX620A ( #3813 )
4 years ago
nihui
f79073c182
update how-to-build doc for raspberrypi and d1
4 years ago
FeiGeChuanShu
617d23f6ce
Add RK3588 benchmark ( #3808 )
4 years ago
nihui
3a827434a9
optimize arm sgemm convolution condition ( #3806 )
4 years ago
nihui
0daad605e0
fix make slice expression with dynamic parameters
4 years ago
wzyforgit
46a6d9b422
Add Loongson and Sunway benchmark data ( #3802 )
4 years ago
nihui
5cdb7f6617
make compiler optimization happy with loop ( #3799 )
4 years ago
tpoisonooo
6fd801b6d7
feat(src/layer): add vision_transformer benchmark ( #3730 )
* feat(src/layer): add vision_transformer benchmark and relative layer
* refactor(testutil.h): add para for RandomMat
4 years ago
nihui
817bd1fdc4
fix vs2022 ci ( #3792 )
4 years ago
Xavier Hsinyuan
29b6a32ac0
RVV: follow intrinsic doc, replace vfredsum_* with vfredusum_* ( #3790 )
* RVV: follow intrinsic doc, vfredsum -> vfredusum
* C906: change toolchains for vfredusum
* RVV: test compiler for vfredusum_vs_*
4 years ago
nihui
2dc1ae45fe
pnnx bitwise and compare op ( #3791 )
* bitwise and compare op
* masked_fill and gather test
4 years ago
nihui
d476191ff1
pnnx export_onnx function ( #3784 )
4 years ago
nihui
0aba8af1d3
pnnx swin transformer ( #3783 )
4 years ago
村长大人
c697e988b0
fixbug: linux-arm-hisiv500 load mem "Bus error" ( #3779 )
4 years ago
Zhiqiang Wang
a7aa0fe70d
Update README.md ( #3778 )
4 years ago
Zhuo Zhang
3dfc10647c
docs: update QQ group info ( #3777 )
4 years ago
NaLan ZeYu
5388f9f312
test: fix printf arguments mismatch ( #3774 )
4 years ago
jasonZhang
663b42e0d2
add tanh avx512 optimize ( #3770 )
4 years ago