nihui
0734b657d9
spectrogram and inverse spectrogram ( #5779 )
* only supports hann, hamming and all-one window
* inverse spectrogram does not support length parameter
* spectrogram always returns torch.view_as_real(out) as ncnn does not support complex typed mat yet
* inverse spectrogram always accepts torch.view_as_complex(in) as ncnn does not support complex typed mat yet
1 year ago
nihui
66b54cbea2
multiheadattention int8 quantization ( #5733 )
* x86 vulkan fallback
* comment about bf16s
1 year ago
nihui
1c7af00499
gemm int8 quantization ( #5706 )
* quantize gemm
* write gemm quantize scales
* update doc
* less openmp args
* x86 riscv fallback
* skip gemm vulkan int8
* fix noint8 test, fix arm bf16 test
* enable vfpv4 on neon build only
* fix gemm vulkan without C
* fp16 pack8 output
* enable elempack=8 only for asimdhp+
* tiled gemm int8 test
* opt arm64 tiles, fix asimdhp dispatch
1 year ago
Ankush Goel
9b5f6a39b4
fix: typo ( #5709 )
1 year ago
nihui
5df5413c81
embed int8 quantization and add embed test ( #5667 )
1 year ago
nihui
fdf0df3079
RMSNorm ( #5630 )
1 year ago
張小凡
051b04ffb4
Updated use-ncnn-with-pytorch-or-onnx document ( #5557 )
1 year ago
luxincn
02327ba96f
add esp32 build document and ci Refs #5536 ( #5567 )
1 year ago
TianZer
fc6b753d31
Add mingw ci and building document ( #5547 )
1 year ago
nihui
4c3debae2d
multiheadattention scale param ( #5526 )
* update swiftshader
* skip vs2017 swiftshader
1 year ago
村长大人
1e75a2df21
add harmonyos how to build with vulkan ( #5475 )
2 years ago
RoachZhao
d4292e9a65
Update vulkan-notes.md ( #5472 )
`compute_queue_count` is a function.
2 years ago
nihui
08b7d99a75
rnn/lstm/gru dynamic quantization ( #5435 )
2 years ago
Tabbleman
be15dbe421
add riscv-gnu-toolchain build guide;-) ( #5446 )
2 years ago
nihui
db035d602d
update ncnnoptimize layers, lightmode=false keeps original weight ( #5414 )
2 years ago
lll143653
342faf2e79
Update how-to-build.md ( #5389 )
modify the "build for macOS" section
2 years ago
Galasnow
964ed7a56a
Add implementation of build for protobuf>=22.0 on Windows ( #5359 )
And fix a missing word.
2 years ago
luqiang-guo
8ddc85f4dd
fix doc dual issue ( #5342 )
2 years ago
afredooo
96d073d541
Some typo fixes ( #5339 )
2 years ago
hugo-syn
7d8019d577
chore: add markdown code highlight ( #5302 )
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
2 years ago
hugo-syn
f35eb4b3b8
chore: Fix multiple typos ( #5301 )
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
2 years ago
nihui
5329d32e74
check vulkan fp16 uniform support and implement lfp conversion without fp16u ( #5287 )
2 years ago
nihui
c222208cc9
feat mask for disable threading, make some extractor setter no-op, update doc ( #5270 )
2 years ago
Ikko Eltociear Ashimine
5581d27d4d
docs: update FAQ-ncnn-vulkan.md ( #5268 )
plase -> please
2 years ago
JeremyRand
ed22eb44cc
Document libomp-dev dependency ( #5228 )
Co-authored-by: Jeremy Rand <jeremyrand@danwin1210.de>
2 years ago
lll143653
d4dcb3a2f0
Update FAQ-ncnn-produce-wrong-result.md ( #5220 )
2 years ago
JeremyRand
c1d952da7e
Fix "Rasberry" typo ( #5182 )
Co-authored-by: Jeremy Rand <jeremyrand@danwin1210.de>
2 years ago
JeremyRand
765ac7aef6
Update Vulkan dependency docs ( #5178 )
Vulkan deps are optional. vulkan-utils is replaced with vulkan-tools
since Debian 10.
Co-authored-by: Jeremy Rand <jeremyrand@danwin1210.de>
2 years ago
JeremyRand
d1f6193250
Update POWER Clang version docs ( #5174 )
Clang prior to 13 no longer fails to build ncnn since #4845.
Clang 18 fixes SSE4.1 translation, which yields a major speedup.
Co-authored-by: Jeremy Rand <jeremyrand@danwin1210.de>
2 years ago
張小凡
2ecaf37a3e
Fix find GPU driver dll path in windows ( #5141 )
2 years ago
nihui
b4f26237cb
in-house vulkan loader ( #5130 )
* vulkan-driver-loader.md
* static vulkan on apple
2 years ago
nihui
4494aadd74
deconvolution dynamic weight ( #5119 )
2 years ago
Justin Fung
009f5eae97
Add description of build for Nintendo 3DS homebrew launcher ( #5116 )
2 years ago
nihui
14e14a9ae8
slice with indices ( #5103 )
2 years ago
Yoh
3f437d3f3d
Grid sample op ( #4373 )
* pnnx support grid_sample op
* complete the permute and gridsample operator fusion
* split calculation into two stages and support permute fusion
2 years ago
nihui
26a70c9b05
fix build with vanilla c906 toolchain ( #5048 )
2 years ago
Amir Ramezani
7e5fa3ade3
shrink operator ( #5022 )
2 years ago
Beq Jal
bcfec1da33
Celu layer and export to ncnn ( #5019 )
2 years ago
Beq Jal
c851231832
add diag layer and its converter ( #4935 )
2 years ago
nihui
cd1c0d6eab
fix build with new protobuf target ( #4955 )
2 years ago
nihui
4abadd2ffb
binaryop implicit broadcast B with 1 dimension rank for outer axis ( #4930 )
2 years ago
JeremyRand
0a8cf31a05
Add POWER8 VSX toolchains ( #4853 )
* Add POWER8 VSX toolchains
POWER8, though slower than POWER9, is still used in the wild; these
toolchains should still be much faster on POWER8 than POWER8 without VSX
optimizations.
* VSX toolchains: set -cpu arg in QEMU CI tests
2 years ago
mizu-bai
4c861a0d1a
Add Building with Intel oneAPI ( #4920 )
2 years ago
ฅ'ω'ฅ
2303b77ac1
Update how-to-build.md ( #4872 )
2 years ago
JeremyRand
47e0daf4a1
Translate x86_64 SSE to ppc64le VSX intrinsics ( #4807 )
* Add POWER9 VSX toolchains
Translating x86_64 SSE to ppc64le VSX intrinsics yields a quite large
speedup on POWER9. See this article for background:
https://www.talospace.com/2019/07/easier-power-vectorizing-for-fun-and.html
* Add power9le docs
* power9le clang toolchain: Document Clang 13+ requirement
---------
Co-authored-by: Jeremy Rand <jeremyrand@danwin1210.de>
2 years ago
Kin Yu Shek
e8d8042b90
Fix a mistake in docs/faq ( #4837 )
2 years ago
張小凡
1e0d70af8c
Add translated document: glsl-extension.zh.md ( #4818 )
2 years ago
nihui
43aba6badb
Update glsl-extension.md
2 years ago
nihui
172b748c74
add ncnn glsl extension doc ( #4817 )
2 years ago
nihui
9022b7162a
implement all explicit binaryop broadcast types ( #4809 )
* simplify binaryop
* less gpu test
* update binaryop broadcast doc
* do not test atan2 zero
2 years ago