futz12
c6ab155a67
conv based speed stft
1 year ago
nihui
02fcca2a1e
guard VK_USE_PLATFORM_ANDROID_KHR with ifdef ( #5986 )
1 year ago
nihui
33d03e625c
drop ncnn glsl string literal macros ( #5975 )
glsl has no string type anyway
1 year ago
nihui
8dbcfee5ec
option owns vulkan device index ( #5973 )
1 year ago
nihui
84970eed4d
vulkan validation layer enables NCNN_LOGE in shader source ( #5963 )
* NCNN_LOGE in glsl
* Update glsl-extension.md
1 year ago
nihui
b284dbd0f4
discover VK_KHR_shader_non_semantic_info, checked convolution imagestore ( #5955 )
1 year ago
nihui
eed257df1f
ci update llvmpipe ( #5954 )
* check image fp16
1 year ago
nihui
ad30c7f6fb
clean vulkan shader common extension ( #5952 )
* clean vulkan shader common extension
* macro suffix makes glslang unhappy
1 year ago
nihui
bf13c30210
define device feature macros for glslang, discover VK_EXT_shader_atomic_float and VK_EXT_shader_atomic_float2 ( #5949 )
1 year ago
peerless2012
1d1ad06459
Fix get_elf_hwcap always returning 0 on HarmonyOS NEXT. ( #5951 )
1 year ago
nihui
8211930a6f
discover VK_KHR_shader_subgroup_rotate ( #5948 )
1 year ago
nihui
1b6485fa17
discover VK_KHR_zero_initialize_workgroup_memory ( #5947 )
1 year ago
nihui
40f7b4e527
discover all subgroup features and VK_KHR_shader_subgroup_extended_types ( #5946 )
1 year ago
nihui
0b9925cfef
integrate VK_EXT_subgroup_size_control features and properties ( #5940 )
1 year ago
nihui
07267f2618
softmax 4d test and vulkan, softmax unified elempack optimization for x86 arm riscv ( #5931 )
1 year ago
nihui
6396a732ef
reshape shape expression, drop reshape permute, test reshape oom ( #5918 )
1 year ago
Yexuan Wu
3571d7e8ec
Support better API to detect big little core in windows after win7 ( #5927 )
1 year ago
nihui
1e3fcb9dda
paramdict value string type, natural array representation ( #5915 )
1 year ago
nihui
23890900c2
x86 optimization for convolution int8 gemm ( #5874 )
* cmake check compiler test cannot be optimized out
* drop requant pack4
1 year ago
nihui
4a70be45ed
fix requantize pack4to8 ( #5893 )
1 year ago
nihui
ff5b554003
restrict one dim quantize scale size, test quantize oom ( #5892 )
* restrict one dim quantize scale size
* sse2 requantize pack8
1 year ago
nihui
956bccd295
restrict one dim requantize scale bias size ( #5888 )
1 year ago
nihui
48e1260a6f
restrict one dim dequantize scale bias size ( #5886 )
1 year ago
nihui
21a71d3673
slim x86 dequantize ( #5879 )
* remove dequantize pack8 test, seems to be useless
1 year ago
nihui
39cf4f6018
slim reduction ( #5866 )
1 year ago
nihui
a024d801fa
fix ci woa test, disable woa svml optimization ( #5846 )
1 year ago
nihui
66cd40e934
fix clang avx512bf16 build ( #5842 )
* check compiler supports isa with optimization enabled
1 year ago
nihui
44e0d95c0d
x86 sse2/xop/avx/avx2/avx512/vnni/vnniint8 optimization for gemm int8 ( #5763 )
* skip round problem
* sde on ubuntu24
1 year ago
nihui
19caca3140
port rvv intrinsic 1.0+ ( #5642 )
* zfh zvfh xtheadvector infra
* dispatch for rvv and xtheadvector
* dispatch for non-vector zfh
* port xtheadvector recp rsqrt trunc
* general rvv gemm
* c906 and c910 ci
* old tuple code clean
* update riscv64 ci
* update build doc
* drop old th1520 toolchain
1 year ago
nihui
8d2ac57824
fix missing asimdfhm target macro in ndk-r21 ( #5804 )
1 year ago
nihui
0734b657d9
spectrogram and inverse spectrogram ( #5779 )
* only supports hann, hamming and all-one window
* inverse spectrogram does not support length parameter
* spectrogram always returns torch.view_as_real(out) as ncnn does not support complex typed mat yet
* inverse spectrogram always accepts torch.view_as_complex(in) as ncnn does not support complex typed mat yet
1 year ago
nihui
9cefe9a624
avx vnni int8, avx vnni int16, avx ne convert infrastructure ( #5749 )
1 year ago
nihui
c32442aa09
disable x86 auto recip optimization for potential precision loss ( #5762 )
1 year ago
nihui
e7602a206b
fix gemm arm int8 scales descales offset ( #5750 )
1 year ago
nihui
8fe62812c9
arm neon optimization for layernorm fp32/bf16s/fp16s ( #5746 )
1 year ago
Upliner Mikhalych
cbd17cd062
Fix #5741 don't crash when vkCreateDevice fails ( #5742 )
1 year ago
nihui
73d3519326
layernorm x86 optimization, re ( #5745 )
1 year ago
nihui
bd1f39ed82
blacklist mesa vulkan cooperative matrix feature ( #5739 )
ref https://gitlab.freedesktop.org/mesa/mesa/-/issues/10847
1 year ago
nihui
8105c75120
improve compatibility of harmonyos cpu topology abi ( #5740 )
1 year ago
nihui
121b1fecd5
apply code-format changes
1 year ago
nihui
66b54cbea2
multiheadattention int8 quantization ( #5733 )
* x86 vulkan fallback
* comment about bf16s
1 year ago
nihui
1c7af00499
gemm int8 quantization ( #5706 )
* quantize gemm
* write gemm quantize scales
* update doc
* less openmp args
* x86 riscv fallback
* skip gemm vulkan int8
* fix noint8 test, fix arm bf16 test
* enable vfpv4 on neon build only
* fix gemm vulkan without C
* fp16 pack8 output
* enable elempack=8 only for asimdhp+
* tiled gemm int8 test
* opt arm64 tiles, fix asimdhp dispatch
1 year ago
nihui
204583ba52
x86 sse2/avx/avx512 optimization for rmsnorm ( #5672 )
1 year ago
nihui
8077d340a9
arm neon optimization for rmsnorm ( #5668 )
1 year ago
nihui
5df5413c81
embed int8 quantization and add embed test ( #5667 )
1 year ago
nihui
70310e951e
fix out of range read in convolution im2col aarch64 ( #5631 )
1 year ago
nihui
fdf0df3079
RMSNorm ( #5630 )
1 year ago
佰阅
391152f500
c_api support set_vulkan_device ( #5610 )
1 year ago
quink
92e0b8253b
arm/convolution_3x3_pack1to8_fp16s: prefer ldr/str over ld1/st1 ( #5603 )
Depending on the arch, ldr/str can be faster than ld1/st1, especially
for the single-lane load form. For example, on Cortex A75,
1. execution latency of 'ldr q0' and 'ldr h0' are 5
2. execution latency of 'ld1 {v0.16b}' is 6
3. execution latency of 'ld1 {v0.h}[0]' is 8
On Cortex X3,
1. execution latency of 'ldr q0' and 'ldr h0' are 6
2. execution latency of 'ld1 {v0.16b}' is 6
3. execution latency of 'ld1 {v0.h}[0]' is 8
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
1 year ago
nihui
997c8926d7
use ruapu detection only on windows arm, enable cpu powerinfo with mingw compiler ( #5593 )
1 year ago