nihui
0734b657d9
spectrogram and inverse spectrogram ( #5779 )
* only supports hann, hamming and all-one window
* inverse spectrogram does not support length parameter
* spectrogram always returns torch.view_as_real(out) as ncnn does not support complex typed mat yet
* inverse spectrogram always accepts torch.view_as_complex(in) as ncnn does not support complex typed mat yet
1 year ago
nihui
e7602a206b
fix gemm arm int8 scales descales offset ( #5750 )
1 year ago
nihui
8fe62812c9
arm neon optimization for layernorm fp32/bf16s/fp16s ( #5746 )
1 year ago
nihui
66b54cbea2
multiheadattention int8 quantization ( #5733 )
* x86 vulkan fallback
* comment about bf16s
1 year ago
nihui
1c7af00499
gemm int8 quantization ( #5706 )
* quantize gemm
* write gemm quantize scales
* update doc
* less openmp args
* x86 riscv fallback
* skip gemm vulkan int8
* fix noint8 test, fix arm bf16 test
* enable vfpv4 on neon build only
* fix gemm vulkan without C
* fp16 pack8 output
* enable elempack=8 only for asimdhp+
* tiled gemm int8 test
* opt arm64 tiles, fix asimdhp dispatch
1 year ago
nihui
5df5413c81
embed int8 quantization and add embed test ( #5667 )
1 year ago
nihui
fdf0df3079
RMSNorm ( #5630 )
1 year ago
nihui
3752d71200
fix potential fp16s bf16s conflicts on arm vfpv4 ( #5578 )
* fix potential fp16s bf16s conflicts on armv7 vfpv4
* but prefer fp16 on armv8.2
1 year ago
nihui
4c3debae2d
multiheadattention scale param ( #5526 )
* update swiftshader
* skip vs2017 swiftshader
1 year ago
nihui
8235cad999
mha allow qdim differs from embed_dim ( #5519 )
* test mha oom
1 year ago
nihui
39c27de47b
test concat oom ( #5502 )
1 year ago
nihui
093c516898
test slice oom ( #5501 )
1 year ago
nihui
da7d1a10f7
test x86 arm convolution oom ( #5492 )
* skip mips loongarch riscv oom test atm
* test softmax oom
1 year ago
nihui
08b7d99a75
rnn/lstm/gru dynamic quantization ( #5435 )
2 years ago
nihui
9ce7930413
x86 optimization for convolution tiled gemm ( #5426 )
2 years ago
nihui
e3758fdd19
fix test reduction warning ( #5397 )
2 years ago
nihui
984d6dd844
promote vfpv4 for auto fp16 storage conversion ( #5325 )
* promote vfpv4 for auto fp16 storage conversion
* always report neon and vfpv4 for arm64
2 years ago
nihui
5329d32e74
check vulkan fp16 uniform support and implement lfp conversion without fp16u ( #5287 )
2 years ago
nihui
556b79ce4d
create layer decoupled ( #5258 )
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts
2 years ago
nihui
ded0b78bb2
fix nvidia vulkan crash on exit ( #5234 )
2 years ago
nihui
eea3fc9b41
optimize vulkan global pooling ( #5191 )
Co-authored-by: nihui <nihui@users.noreply.github.com>
Co-authored-by: michaelcai <michaelcai@tencent.com>
2 years ago
nihui
4136de3b8d
arm optimization for convolution int8 packed unified elempack ( #5147 )
2 years ago
nihui
4494aadd74
deconvolution dynamic weight ( #5119 )
2 years ago
nihui
14e14a9ae8
slice with indices ( #5103 )
2 years ago
邓实诚
a1e3ebf8e5
implement simplemath ( #4905 )
* complete abs, fmod and sin functions in simplemath.h
* remove some unused variables in simplemath.cpp
* modify test-coverage.yml and add some functions to simplemath.cpp
* modify erf.cpp which included math.h
* include platform.h for NCNN_SIMPLEMATH definition
* move utility constants and functions in simplemath.h to simplemath.cpp
* guard simplemath functions with extern "C"
* add NCNN_EXPORT macro in simplemath.h
* include platform.h and guard all declarations with NCNN_SIMPLEMATH
* clean unused code in test_unaryop.cpp
* guard #include <vector> with NCNN_SIMPLEMATH in benchncnn.cpp
* add 'static' to guard functions that are not declared in the header file
* modify sin and cos with better implementation
---------
Co-authored-by: HonestDeng <HonestDeng@users.noreply.github.com>
2 years ago
Yoh
3f437d3f3d
Grid sample op ( #4373 )
* pnnx support grid_sample op
* complete the permute and gridsample operator fusion
* split calculation into two stages and support permute fusion
2 years ago
nihui
7b02425246
x86 optimization for convolution int8 winograd unified elempack ( #5054 )
2 years ago
FhqTreap
1d7720efe8
fix test conv1d ( #5049 )
2 years ago
nihui
78aca88d67
elu 4d and selu 4d ( #5047 )
2 years ago
Beq Jal
019176c6b2
selu and shufflechannel on x86 ( #5017 )
2 years ago
Amir Ramezani
7e5fa3ade3
shrink operator ( #5022 )
2 years ago
nihui
c8662cce5e
arm optimization for convolution int8 gemm unified elempack ( #5016 )
2 years ago
Amir Ramezani
0ea587b8c7
celu activation vulkan and onnx conversion ( #5018 )
2 years ago
Beq Jal
bcfec1da33
Celu layer and export to ncnn ( #5019 )
2 years ago
Beq Jal
c851231832
add diag layer and its converter ( #4935 )
2 years ago
Amir Ramezani
695f770eab
erf implementation ( #5012 )
* added erf implementation
* added testcase for erf
* added onnx2ncnn support of erf
2 years ago
nihui
4abadd2ffb
binaryop implicit broadcast B with 1 dimension rank for outer axis ( #4930 )
2 years ago
nihui
c45c01c7c1
enable VK_KHR_cooperative_matrix ( #4823 )
* enable VK_KHR_cooperative_matrix
* add khr cm shader
* update glslang
* print matrix info
2 years ago
nihui
55709708e9
x86 optimization for convolution int8 packed unified elempack ( #4861 )
2 years ago
nihui
1283a19305
pnnx convert torch round trunc ( #4813 )
* update riscv qemu
* c906 test on qemu
* fix qemu aarch64
2 years ago
nihui
9022b7162a
implement all explicit binaryop broadcast types ( #4809 )
* simplify binaryop
* less gpu test
* update binaryop broadcast doc
* do not test atan2 zero
2 years ago
nihui
903ec7c2c9
fix overwrite builtin layer destruction ( #4732 )
* fix overwrite builtin layer destruction
* make modelbin class copyable
* test++
3 years ago
nihui
f893d2440d
innerproduct allow 1 height gemm ( #4730 )
3 years ago
nihui
249b264336
workaround moltenvk error on spec const composite op ( #4714 )
* workaround moltenvk error on spec const composite op
* workaround moltenvk crying on binding image with memory offset
3 years ago
nihui
a37a83d850
clip gelu mish tanh 4d ( #4695 )
3 years ago
nihui
cd5a6098a2
sigmoid and swish 4d ( #4692 )
3 years ago
nihui
c28c8c04a1
multiheadattention attn mask ( #4668 )
3 years ago
nihui
b640574b88
rough vulkan gemm and multiheadattention ( #4618 )
3 years ago
nihui
db628b1b99
allow overwriting built-in layer with custom layer ( #4616 )
3 years ago
nihui
1133a18ca8
x86 and arm optimization for convolution1d packed unified elempack ( #4615 )
3 years ago