nihui
4abadd2ffb
binaryop implicit broadcast B with 1 dimension rank for outer axis ( #4930 )
2 years ago
JeremyRand
0a8cf31a05
Add POWER8 VSX toolchains ( #4853 )
* Add POWER8 VSX toolchains
POWER8, though slower than POWER9, is still used in the wild; on POWER8 these
toolchains should still be much faster than builds without VSX
optimizations.
* VSX toolchains: set -cpu arg in QEMU CI tests
2 years ago
mizu-bai
4c861a0d1a
Add Building with Intel oneAPI ( #4920 )
2 years ago
ฅ'ω'ฅ
2303b77ac1
Update how-to-build.md ( #4872 )
2 years ago
JeremyRand
47e0daf4a1
Translate x86_64 SSE to ppc64le VSX intrinsics ( #4807 )
* Add POWER9 VSX toolchains
Translating x86_64 SSE to ppc64le VSX intrinsics yields a quite large
speedup on POWER9. See this article for background:
https://www.talospace.com/2019/07/easier-power-vectorizing-for-fun-and.html
* Add power9le docs
* power9le clang toolchain: Document Clang 13+ requirement
---------
Co-authored-by: Jeremy Rand <jeremyrand@danwin1210.de>
2 years ago
Kin Yu Shek
e8d8042b90
Fix a mistake in docs/faq ( #4837 )
2 years ago
張小凡
1e0d70af8c
Add translated document: glsl-extension.zh.md ( #4818 )
2 years ago
nihui
43aba6badb
Update glsl-extension.md
2 years ago
nihui
172b748c74
add ncnn glsl extension doc ( #4817 )
2 years ago
nihui
9022b7162a
implement all explicit binaryop broadcast types ( #4809 )
* simplify binaryop
* less gpu test
* update binaryop broadcast doc
* do not test atan2 zero
2 years ago
nihui
6b5ca0f70d
add doc for building for qnx ( #4709 )
3 years ago
nihui
c28c8c04a1
multiheadattention attn mask ( #4668 )
3 years ago
nihui
b640574b88
rough vulkan gemm and multiheadattention ( #4618 )
3 years ago
He Yang
f9180330e2
update how-to-build.md and delete obsolete tutorials in docs ( #4660 )
3 years ago
張小凡
868ea52bea
update faq.md about gpu performance ( #4614 )
3 years ago
Zhuo Zhang
a124c2a839
fix typos in citation and benchmark docs ( #4604 )
3 years ago
inisis
f7de5a7dc2
update faq.md ( #4584 )
3 years ago
inisis
37042b2174
update build doc for Centos users ( #4583 )
3 years ago
nihui
6f661f9bc4
Update FAQ-ncnn-throw-error.md
3 years ago
nihui
afc9310c62
update new operators for modelwriter ( #4540 )
3 years ago
nihui
47ea2877ed
stb and emsdk update ( #4536 )
* stb_image_write 1.16
* stb_image v2.28
* update emsdk 3.1.28
* enable stb arm neon
* update doc
Co-authored-by: ncnnnnn <67086033+ncnnnnn@users.noreply.github.com>
3 years ago
nihui
fc6ce4a641
copyto operator ( #4522 )
3 years ago
nihui
242e775d21
pnnx convert torch log10, pow 2 as square ( #4518 )
3 years ago
nihui
246e71c526
implement atan2 ( #4516 )
3 years ago
Fangjun Kuang
92e75105c9
Support torch.cumsum ( #4505 )
3 years ago
nihui
ab4cfbf5b0
enrich ncnn binary broadcast rules ( #4513 )
3 years ago
Hitesh Kumar
add0a7bac4
fix : minor typo readme ( #4486 )
3 years ago
nihui
fed99fd35b
gemm output transpose, prepack c ( #4479 )
* mha is now permute and reshape free
* gemm user defined tile mnk param
3 years ago
WuJinxuan
10e9d91576
Add x86 MultiHeadAttention ( #4443 )
* fix doc, sync x86 gemm fix
Co-authored-by: EdVince <EdVince@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
nihui
fd1ac3c7a0
x86 optimization for gemm unified elempack ( #4387 )
3 years ago
nihui
eceac35a7f
implement MultiheadAttention kdim vdim ( #4347 )
3 years ago
Lry89757
6a47f8d15c
gridsample op support ( #4288 )
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
Fangjun Kuang
5281d51535
implement GLU and pnnx conversion ( #4283 )
3 years ago
nihui
77eda4c19f
implement lstm proj_size ( #4263 )
3 years ago
MisakaBit
bbbe17c5b5
docs: disable fp16 when wrong results encountered caused by overflow ( #4248 )
3 years ago
Lry89757
b16f8ca921
[docs] Fix typo ( #4201 )
3 years ago
miemie2013
720f3c9aab
Add DeformableConv2D ( #4070 )
* Add DeformableConv2D
* add unittest and docs
* pnnx torchvision deformconv2d conversion
Co-authored-by: miemie2013 <miemie2013@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
Zhouzhou
4158e63668
docs:add sse optimized zh ( #4053 )
Signed-off-by: Zhouzhou <1197236910@qq.com>
3 years ago
tpoisonooo
207ca0e0bb
Improve protobuf FAQ doc ( #3973 )
3 years ago
nihui
f8c76e730a
fix ci release optimization with cmake >= 3.21 and ndk23 ( #3976 )
* Update release.yml
* Update how-to-build.md
3 years ago
nihui
14588023b5
release ubuntu 22.04 package, fix ndk debug flag for r23+ ( #3972 )
3 years ago
dankernel
01655613d1
Create build-for-VisualStudio.en.md ( #3956 )
- Translated Chinese documents into English
- Updated to VS2022 version
3 years ago
Jianbo-Ning
4fad760a64
add faq.en.md ( #3901 )
3 years ago
nihui
f79073c182
update how-to-build doc for raspberrypi and d1
4 years ago
Lry89757
ca9abd1c4a
Update the add-custom-layer.zh.md ( #3741 )
1. 🐛 Fix bug of float and int: fixed the mismatch between the int and float arguments passed to std::max()
2. 👀 The structure of ncnn changed: the ncnn file layout was reorganized, and all testlayer.cpp files now live in the tests/ folder
4 years ago
nihui
ae75a093fa
update print 4d mat, remove deprecated content
4 years ago
nihui
49e70c81a6
update linking glslang libraries
4 years ago
_0Mirror
2dcd85ca71
docs: fix docs about 'Build for iOS on macOS with xcode' ( #3696 )
4 years ago
nihui
c09d7b3591
mips msa optimization for convolution int8 ( #3675 )
* basic mips msa optimization for convolution int8
* mips msa optimization for convolution int8 gemm
* mips msa optimization for convolution int8 winograd pack8to4/pack8to1
* mention msa maddv/msubv intrinsics bug
4 years ago
tpoisonooo
6e12647985
docs(how-to-build.md): update jetson description ( #3622 )
Co-authored-by: MegEngine <megengine@megvii.com>
4 years ago