Yexuan Wu
e2d93a482e
Unified elempack activation function vulkan shader (#6175)
10 months ago
WANG KE
4d8220ce22
Skip int8 model on GPU and merge from upstream (#6174)
10 months ago
GIBEREZ
44982d0d23
Update the GLSL documentation now that the image functions are deprecated (#6173)
10 months ago
Yexuan Wu
11b6990a9e
vulkan sigmoid unified elempack shader (#6170)
10 months ago
Willaaaaaaa
04341120d4
feat: add cmake print ver support (#6165)
10 months ago
Yexuan Wu
7abf84eb74
Fix win32 GetLogicalProcessorInformationEx API (#6169)
* fix crash on NT kernels that lack GetLogicalProcessorInformationEx
10 months ago
nihui
0cfe201b3c
fix vulkan absval fp16 (#6167)
* fix 1d 2d cstep
* fix ranged cstep
10 months ago
Christopher
260f493ada
add cross-compilation toolchain cmake configs for AK3918 (AK) and SS928 (hisi) (#6164)
10 months ago
AtomAlpaca
f2970319d3
loongarch LASX optimization for math/trigonometric functions and sigmoid (#6163)
* Add lasx optimization for loongarch_usability.h
* Add _LOONGARCH_FLOAT_CONST_PS256 to avoid variable redefinition
* Add lasx optimization for math functions
* Add lasx optimization for sigmoid
* add lsx optimization for trigonometric functions
* add lasx optimization for trigonometric functions
10 months ago
chri321
4c72f52954
docs: update Chinese glsl-extension documentation (#6162)
- synchronize the latest English content to the Chinese documentation
- correct spelling errors in the English version of glsl-extension
- Fix spelling 'enable_validation_layer' in src/gpu.cpp
10 months ago
dependabot[bot]
cae9e636d0
Bump stefanzweifel/git-auto-commit-action from 5 to 6 (#6115)
Bumps [stefanzweifel/git-auto-commit-action](https://github.com/stefanzweifel/git-auto-commit-action) from 5 to 6.
- [Release notes](https://github.com/stefanzweifel/git-auto-commit-action/releases)
- [Changelog](https://github.com/stefanzweifel/git-auto-commit-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/stefanzweifel/git-auto-commit-action/compare/v5...v6)
---
updated-dependencies:
- dependency-name: stefanzweifel/git-auto-commit-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
10 months ago
nihui
171b9d1bba
use spdx license header, copyright Tencent (#6152)
11 months ago
nihui
075d07ede2
compute-only vulkan (#6131)
11 months ago
zhuzeitou
10912b3b58
Use platform-specific APIs for environment variables (#6147)
* Use platform-specific APIs for environment variables
The previous patch used `putenv` as a quick fix for Windows compatibility. However, `putenv` is a legacy API and not the recommended choice.
This commit replaces the single `putenv` call with the most appropriate function for each platform:
- On Windows, it now uses the modern and secure `_putenv_s`.
- On Unix-like systems, it uses the standard `setenv`.
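A minimal sketch of the per-platform dispatch described above, assuming a hypothetical helper named `set_env_var` (not an ncnn function) that always overwrites an existing value. Both `_putenv_s` and `setenv` copy their arguments, whereas POSIX `putenv` inserts the caller's string itself into the environment, so that string must stay alive and unmodified afterwards.

```c
/* Sketch: set an environment variable with the API each platform recommends.
 * set_env_var is a hypothetical helper, not part of ncnn. */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L /* expose setenv under strict ISO C modes */
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 on failure; overwrites any existing value. */
int set_env_var(const char* name, const char* value)
{
    assert(name != NULL && value != NULL);

    /* Reject names that setenv would also reject (empty or containing '='). */
    if (name[0] == '\0' || strchr(name, '=') != NULL)
        return -1;

#if defined(_WIN32)
    /* _putenv_s copies both strings into the CRT environment; 0 means success. */
    return _putenv_s(name, value) == 0 ? 0 : -1;
#else
    /* POSIX setenv copies both strings; the last argument (1) overwrites an existing value. */
    return setenv(name, value, 1);
#endif
}
```

The compile-time `#if defined(_WIN32)` branch keeps a single call site for callers while each toolchain only sees the function it actually provides.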
---------
Co-authored-by: nihui <shuizhuyuanluo@126.com>
11 months ago
nihui
9f832c19c1
vulkan int8 packing quantize dequantize requantize (#3731)
* add int8 definitions
* packing vulkan int8/int32, quantize vulkan
* vulkan dequantize
* requantize vulkan
11 months ago
J. Zow
ed6dcd0c81
Fix CI error by updating linux-riscv64.yml (#6133)
11 months ago
nihui
7557f5c208
vulkan absval unified elempack shader (#6132)
11 months ago
nihui
626d9d0910
vulkan packing code cleanup, drop image storage type, unify fp16p and fp16s packing (#6128)
11 months ago
nihui
bd0b111775
vulkan tight fp16p pack1 (#6127)
11 months ago
nihui
24a3b99f1f
drop layer support_image_storage and option use_image_storage (#6126)
* fix pyncnn build
11 months ago
nihui
f168962a74
update glslang for fp8, fix fp8 enum (#6125)
11 months ago
nihui
abf0de4488
update ruapu to detect zfh zvfh xtheadvector (#5841)
* always prefer xtheadvector
* update ci toolchain
11 months ago
nihui
211e238639
drop layer forward vkimagemat (#6124)
vkimagemat was originally introduced as mat storage in the hope of improving performance on older Adreno GPUs, but in practice it is slower than the CPU in most cases and is no longer a good fit for the latest Adreno architectures or for large shapes.
11 months ago
Yexuan Wu
1cd5373483
instancenorm x86 simd optimization (#6097)
11 months ago
nihui
cc40332804
discover VK_KHR_vulkan_memory_model (#6121)
11 months ago
nihui
6e3052a88d
print gpu matrix property info (#6114)
11 months ago
nihui
8998a13d06
discover VK_EXT_shader_float8 (#6120)
11 months ago
nihui
ef5e79d80e
add missing macros for VK_NV_cooperative_vector and VK_NV_cooperative_matrix2 (#6119)
11 months ago
nihui
12f57fb3d1
discover VK_NV_cooperative_matrix2 (#6118)
11 months ago
nihui
510b461e9a
discover VK_NV_cooperative_vector (#6117)
11 months ago
nihui
9cdc02bb7a
unified vulkan khr/nv cooperative matrix shader (#6116)
11 months ago
nihui
b9f98f0d3a
always allocate aligned size for 1d/2d mat and vkmat (#6104)
* fix sub mat cstep
* fix embed
* rnn/lstm/gru int8 test without rounding diversity
11 months ago
nihui
8a2eab1114
set localsize as multiple of subgroup size (#2483)
* fix innerproduct gemm vulkan
11 months ago
nihui
87e8b5f4c1
use combine_x for sse/avx vector combination (#6113)
1 year ago
nihui
23b64c9cf9
fix vulkan validation error, do not enforce local_size_x be multiple of subgroup size (#6111)
1 year ago
nihui
4c4ecdf118
dequantize pack8 for all datatypes, fix convdw int8 dequant pack8 (#6109)
1 year ago
hanzh
78b2e68728
arm unified elempack optimization for groupnorm (#4080)
Co-authored-by: mmyyy22 <mmyyy22@users.noreply.github.com>
Co-authored-by: nihui <nihuini@tencent.com>
1 year ago
nihui
6f175fb622
move ci gpu swiftshader to ubuntu25 (#6107)
1 year ago
nihui
1d84e987ce
x86 unified elempack optimization for groupnorm, simplify c groupnorm (#6106)
1 year ago
nihui
1365eed05e
simplify c layernorm (#6105)
1 year ago
zhuzeitou
7b63f5d682
Switch to putenv to fix compilation with llvm-mingw (#6101)
Replaces `setenv` with `putenv` because `setenv` is not available on
some toolchains (e.g., llvm-mingw, clang-cl), leading to compilation
errors.
1 year ago
nihui
ebc041cc56
force subgroup 32 for cooperative matrix shader atm (#6100)
1 year ago
Lfalive
2ef954b672
unaryop avx512 mask optimization (#6098)
1 year ago
nihui
6af6e8a96b
ppocr-v5 example (#6092)
* warn puttext
* split dict
1 year ago
Lfalive
7fd167fd25
tanh avx512 mask optimization (#6096)
1 year ago
nihui
5cd76533cc
pnnx generate test_inference with valid input shapes anyway (#6095)
1 year ago
Yexuan Wu
73d8500988
sigmoid avx512 mask optimization (#6093)
1 year ago
tpoisonooo
25824dc554
add a new FAQ link for ncnn deepwiki (#6083)
1 year ago
nihui
ff11080db6
pnnx convert onnx MaxPool auto_pad same, torch reshape_as (#6089)
1 year ago
tpoisonooo
dddfc50282
raise max gpu count to 32 (#6084)
1 year ago