nihui 40a69a2dd3 discard riscv weight memory (#3874) 4 years ago
avx512_mathfun.h Add SSE&AVX optimized for tan (#3765) 4 years ago
avx_mathfun.h Add SSE&AVX optimized for tan (#3765) 4 years ago
batchnorm_x86.cpp x86 avx512 optimization (#3581) 4 years ago
batchnorm_x86.h Added AVX swish/lrn/batchnorm (#1897) 6 years ago
bias_x86.cpp relu3d, batchnorm3d, reshape4d, flatten4d, permute4d (#3397) 4 years ago
bias_x86.h LSTM arm/x86 + fp16 innerproduct arm (#1881) 6 years ago
binaryop_x86.cpp binaryop type specialization (#3830) 4 years ago
binaryop_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
bnll_x86.cpp Add unittest and SSE&AVX optimized for BNLL (#3759) 4 years ago
bnll_x86.h Add unittest and SSE&AVX optimized for BNLL (#3759) 4 years ago
cast_fp16.h x86 f16c infrastructure (#3577) 4 years ago
cast_x86.cpp x86 f16c infrastructure (#3577) 4 years ago
cast_x86.h Optimize FP32/FP16 conversion with AVX intrinsic. (#1545) 6 years ago
cast_x86_f16c.cpp x86 f16c infrastructure (#3577) 4 years ago
clip_x86.cpp x86 avx512 optimization (#3581) 4 years ago
clip_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
concat_x86.cpp x86 avx512 optimization (#3581) 4 years ago
concat_x86.h SSE2 optimization pack (#2123) 5 years ago
convolution1d_x86.cpp x86 avx512 optimization (#3581) 4 years ago
convolution1d_x86.h dynamic convolution weight (#3408) 4 years ago
convolution_1x1.h discard weight memory for x86 arm vulkan (#3865) 4 years ago
convolution_1x1_int8.h x86 avx2 optimization for convolution gemm int8 (#3489) 4 years ago
convolution_1x1_pack1to4.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
convolution_1x1_pack1to4_int8.h x86 avx2 optimization for convolution gemm int8 (#3489) 4 years ago
convolution_1x1_pack1to8.h x86 avx fma optimization (#3543) 4 years ago
convolution_1x1_pack1to16.h x86 avx512 optimization (#3581) 4 years ago
convolution_1x1_pack4.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
convolution_1x1_pack4to1.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
convolution_1x1_pack4to8.h x86 avx fma optimization (#3543) 4 years ago
convolution_1x1_pack4to16.h x86 avx512 optimization (#3691) 4 years ago
convolution_1x1_pack8.h x86 avx fma optimization (#3543) 4 years ago
convolution_1x1_pack8to1.h x86 avx fma optimization (#3543) 4 years ago
convolution_1x1_pack8to1_int8.h x86 avx2 optimization for convolution gemm int8 (#3489) 4 years ago
convolution_1x1_pack8to4.h always build tightly packed weight, fix #3545 (#3547) 4 years ago
convolution_1x1_pack8to4_int8.h x86 avx2 optimization for convolution gemm int8 (#3489) 4 years ago
convolution_1x1_pack8to16.h x86 avx512 optimization (#3581) 4 years ago
convolution_1x1_pack16.h x86 avx512 optimization (#3581) 4 years ago
convolution_1x1_pack16to1.h x86 avx512 optimization (#3691) 4 years ago
convolution_1x1_pack16to4.h x86 avx512 optimization (#3581) 4 years ago
convolution_1x1_pack16to8.h x86 avx512 optimization (#3581) 4 years ago
convolution_2x2_pack8.h Added ability to switch AVX/AVX2 during runtime (#3076) 5 years ago
convolution_3x3.h optimize x86 winograd input transform transpose (#3818) 4 years ago
convolution_3x3_int8.h initial data structure changes for 3dcnn, conv3d, pooling3d (#3378) 4 years ago
convolution_3x3_pack1to4.h added a number of optimized sse layers (#3302) 4 years ago
convolution_3x3_pack1to4_int8.h x86 avx2 optimization for convolution gemm int8 (#3489) 4 years ago
convolution_3x3_pack1to8.h Added ability to switch AVX/AVX2 during runtime (#3076) 5 years ago
convolution_3x3_pack4.h fix winograd function name (#3820) 4 years ago
convolution_3x3_pack4to1.h fix winograd function name (#3820) 4 years ago
convolution_3x3_pack8.h fix winograd function name (#3820) 4 years ago
convolution_3x3_pack8to1.h fix winograd function name (#3820) 4 years ago
convolution_3x3_pack8to1_int8.h fix winograd function name (#3820) 4 years ago
convolution_3x3_pack8to4_int8.h fix winograd function name (#3820) 4 years ago
convolution_3x3_pack16.h fix winograd function name (#3820) 4 years ago
convolution_3x3_pack16to1.h fix winograd function name (#3820) 4 years ago
convolution_5x5.h rewrite convolution x86 sgemm pack1 (#3544) 4 years ago
convolution_7x7_pack1to4_int8.h x86 avx2 optimization for convolution gemm int8 (#3489) 4 years ago
convolution_int8.h architecture changes for int8 packing (#2771) 5 years ago
convolution_pack1to4.h x86 avx fma optimization (#3543) 4 years ago
convolution_pack1to4_int8.h some x86 sse2 optimization for convolution int8 5 years ago
convolution_pack1to8.h x86 avx fma optimization (#3543) 4 years ago
convolution_pack1to16.h x86 avx512 optimization (#3581) 4 years ago
convolution_pack4.h x86 avx fma optimization (#3543) 4 years ago
convolution_pack4to1.h x86 avx fma optimization (#3543) 4 years ago
convolution_pack4to8.h x86 avx fma optimization (#3543) 4 years ago
convolution_pack4to16.h x86 avx512 optimization (#3581) 4 years ago
convolution_pack8.h x86 avx fma optimization (#3543) 4 years ago
convolution_pack8to1.h x86 avx fma optimization (#3543) 4 years ago
convolution_pack8to1_int8.h some x86 sse2 optimization for convolution int8 5 years ago
convolution_pack8to4.h x86 avx fma optimization (#3543) 4 years ago
convolution_pack8to4_int8.h some x86 sse2 optimization for convolution int8 5 years ago
convolution_pack8to16.h x86 avx512 optimization (#3581) 4 years ago
convolution_pack16.h x86 avx512 optimization (#3581) 4 years ago
convolution_pack16to1.h fix build x86 avx512 source with old gcc (#3705) 4 years ago
convolution_pack16to4.h x86 avx512 optimization (#3581) 4 years ago
convolution_pack16to8.h x86 avx512 optimization (#3581) 4 years ago
convolution_sgemm.h rewrite convolution x86 sgemm pack1 (#3544) 4 years ago
convolution_sgemm_int8.h x86 f16c infrastructure (#3577) 4 years ago
convolution_sgemm_pack1to4.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
convolution_sgemm_pack1to4_int8.h x86 f16c infrastructure (#3577) 4 years ago
convolution_sgemm_pack1to8.h x86 avx fma optimization (#3543) 4 years ago
convolution_sgemm_pack1to16.h x86 avx512 optimization (#3581) 4 years ago
convolution_sgemm_pack4.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
convolution_sgemm_pack4to1.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
convolution_sgemm_pack4to8.h always build tightly packed weight, fix #3545 (#3547) 4 years ago
convolution_sgemm_pack4to16.h x86 avx512 optimization (#3691) 4 years ago
convolution_sgemm_pack8.h always build tightly packed weight, fix #3545 (#3547) 4 years ago
convolution_sgemm_pack8to1.h x86 avx fma optimization (#3543) 4 years ago
convolution_sgemm_pack8to1_int8.h x86 f16c infrastructure (#3577) 4 years ago
convolution_sgemm_pack8to4.h always build tightly packed weight, fix #3545 (#3547) 4 years ago
convolution_sgemm_pack8to4_int8.h x86 f16c infrastructure (#3577) 4 years ago
convolution_sgemm_pack8to16.h x86 avx512 optimization (#3581) 4 years ago
convolution_sgemm_pack16.h x86 avx512 optimization (#3581) 4 years ago
convolution_sgemm_pack16to1.h fix build x86 avx512 source with old gcc (#3705) 4 years ago
convolution_sgemm_pack16to4.h x86 avx512 optimization (#3691) 4 years ago
convolution_sgemm_pack16to8.h x86 avx512 optimization (#3581) 4 years ago
convolution_winograd_transform.h optimize x86 winograd input transform transpose (#3818) 4 years ago
convolution_winograd_transform_pack4.h fix winograd function name (#3820) 4 years ago
convolution_winograd_transform_pack8.h fix winograd function name (#3820) 4 years ago
convolution_winograd_transform_pack16.h fix winograd function name (#3820) 4 years ago
convolution_x86.cpp discard weight memory for x86 arm vulkan (#3865) 4 years ago
convolution_x86.h discard weight memory for x86 arm vulkan (#3865) 4 years ago
convolution_x86_avx2.cpp fix winograd function name (#3820) 4 years ago
convolution_x86_avx512vnni.cpp fix winograd function name (#3820) 4 years ago
convolution_x86_avxvnni.cpp fix winograd function name (#3820) 4 years ago
convolution_x86_xop.cpp fix winograd function name (#3820) 4 years ago
convolutiondepthwise_3x3.h format code style and setup restyled.io (#1840) 6 years ago
convolutiondepthwise_3x3_int8.h architecture changes for int8 packing (#2771) 5 years ago
convolutiondepthwise_3x3_pack4.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
convolutiondepthwise_3x3_pack8.h x86 avx fma optimization (#3543) 4 years ago
convolutiondepthwise_3x3_pack16.h x86 avx512 optimization (#3581) 4 years ago
convolutiondepthwise_5x5_pack4.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
convolutiondepthwise_5x5_pack8.h x86 avx fma optimization (#3543) 4 years ago
convolutiondepthwise_5x5_pack16.h x86 avx512 optimization (#3581) 4 years ago
convolutiondepthwise_x86.cpp discard riscv weight memory (#3874) 4 years ago
convolutiondepthwise_x86.h discard weight memory for x86 arm vulkan (#3865) 4 years ago
crop_x86.cpp x86 avx512 optimization (#3581) 4 years ago
crop_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
deconvolution_pack1to4.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack1to8.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack1to16.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack4.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack4to1.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack4to8.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack4to16.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack8.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack8to1.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack8to4.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack8to16.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack16.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack16to1.h fix build x86 avx512 source with old gcc (#3705) 4 years ago
deconvolution_pack16to4.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_pack16to8.h x86 avx512 optimization for convolution winograd pack16to1 and deconvolution family, increase simpleomp argv count (#3694) 4 years ago
deconvolution_x86.cpp discard weight memory for x86 arm vulkan (#3865) 4 years ago
deconvolution_x86.h discard weight memory for x86 arm vulkan (#3865) 4 years ago
deconvolutiondepthwise_x86.cpp discard weight memory for x86 arm vulkan (#3865) 4 years ago
deconvolutiondepthwise_x86.h discard weight memory for x86 arm vulkan (#3865) 4 years ago
dequantize_x86.cpp x86 avx512 optimization (#3581) 4 years ago
dequantize_x86.h architecture changes for int8 packing (#2771) 5 years ago
dropout_x86.cpp x86 avx512 optimization (#3581) 4 years ago
dropout_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
eltwise_x86.cpp x86 avx512 optimization (#3691) 4 years ago
eltwise_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
flatten_x86.cpp x86 avx512 optimization (#3581) 4 years ago
flatten_x86.h architecture changes for int8 packing (#2771) 5 years ago
hardsigmoid_x86.cpp x86 avx512 optimization (#3581) 4 years ago
hardsigmoid_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
hardswish_x86.cpp x86 avx512 optimization (#3581) 4 years ago
hardswish_x86.h avx2 infrastructure (#1943) 6 years ago
innerproduct_x86.cpp discard weight memory for x86 arm vulkan (#3865) 4 years ago
innerproduct_x86.h discard weight memory for x86 arm vulkan (#3865) 4 years ago
interp_bicubic.h x86 optimization for interp (#3546) 4 years ago
interp_bicubic_pack4.h x86 optimization for interp (#3546) 4 years ago
interp_bicubic_pack8.h x86 optimization for interp (#3546) 4 years ago
interp_bicubic_pack16.h x86 avx512 optimization (#3581) 4 years ago
interp_bilinear.h x86 optimization for interp (#3546) 4 years ago
interp_bilinear_pack4.h x86 optimization for interp (#3546) 4 years ago
interp_bilinear_pack8.h x86 optimization for interp (#3546) 4 years ago
interp_bilinear_pack16.h x86 avx512 optimization (#3581) 4 years ago
interp_x86.cpp x86 avx512 optimization (#3581) 4 years ago
interp_x86.h x86 optimization for interp (#3546) 4 years ago
lrn_x86.cpp x86 avx512 optimization (#3581) 4 years ago
lrn_x86.h Added AVX swish/lrn/batchnorm (#1897) 6 years ago
lstm_x86.cpp multi-threading rnn/lstm/gru with openmp (#3834) 4 years ago
lstm_x86.h drop x86 avx2 fp16 (#3568) 4 years ago
mish_x86.cpp x86 avx512 optimization (#3581) 4 years ago
mish_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
packing_x86.cpp x86 avx512 optimization (#3581) 4 years ago
packing_x86.h architecture changes for int8 packing (#2771) 5 years ago
padding_pack4.h x86 sse2 optimization for conv1x1/3x3 pack4 and general sgemm pack4/pack4to1 (#3538) 4 years ago
padding_pack8.h x86 avx fma optimization (#3543) 4 years ago
padding_pack8_int8.h test flatten packing padding int8 5 years ago
padding_pack16.h x86 avx512 optimization (#3581) 4 years ago
padding_x86.cpp x86 avx512 optimization (#3581) 4 years ago
padding_x86.h architecture changes for int8 packing (#2771) 5 years ago
pooling_2x2.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
pooling_2x2_pack4.h added a number of optimized sse layers (#3302) 4 years ago
pooling_2x2_pack8.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
pooling_2x2_pack16.h x86 avx512 optimization (#3581) 4 years ago
pooling_3x3_pack4.h added a number of optimized sse layers (#3302) 4 years ago
pooling_3x3_pack8.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
pooling_3x3_pack16.h x86 avx512 optimization (#3581) 4 years ago
pooling_x86.cpp x86 avx512 optimization (#3581) 4 years ago
pooling_x86.h support PyTorch AdaptiveAvgPool2d and AdaptiveMaxPool2d (#2546) 5 years ago
prelu_x86.cpp x86 avx512 optimization (#3581) 4 years ago
prelu_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
quantize_x86.cpp x86 avx512 optimization (#3581) 4 years ago
quantize_x86.h architecture changes for int8 packing (#2771) 5 years ago
relu_x86.cpp x86 avx512 optimization (#3581) 4 years ago
relu_x86.h architecture changes for int8 packing (#2771) 5 years ago
requantize_x86.cpp x86 avx512 optimization (#3581) 4 years ago
requantize_x86.h architecture changes for int8 packing (#2771) 5 years ago
reshape_x86.cpp x86 avx512 optimization (#3581) 4 years ago
reshape_x86.h reshape x86 pack4 5 years ago
roialign_x86.cpp fix roialign_x86.cpp integer multiply may overflow (#2211) 5 years ago
roialign_x86.h Improve ROIAlign (accelerate ROIAlign, support sampling ratio and aligned ROIAlign) (#1820) 6 years ago
scale_x86.cpp Edit _bias128 in scale_x86.cpp for useless if (#3821) 4 years ago
scale_x86.h LSTM arm/x86 + fp16 innerproduct arm (#1881) 6 years ago
sigmoid_x86.cpp x86 avx512 optimization (#3581) 4 years ago
sigmoid_x86.h X86 Elempack 8 AVX implementations. (#1853) 6 years ago
slice_x86.cpp x86 avx512 optimization (#3581) 4 years ago
slice_x86.h x86 slice pack4 5 years ago
softmax_x86.cpp pnnx save ncnn bin with fp16 storage (#3715) 4 years ago
softmax_x86.h x86 sse/avx/avx512 optimization for softmax (#3712) 4 years ago
sse_mathfun.h Add SSE&AVX optimized for tan (#3765) 4 years ago
swish_x86.cpp x86 avx512 optimization (#3581) 4 years ago
swish_x86.h Added AVX swish/lrn/batchnorm (#1897) 6 years ago
tanh_x86.cpp add tanh avx512 optimize (#3770) 4 years ago
tanh_x86.h LSTM arm/x86 + fp16 innerproduct arm (#1881) 6 years ago
unaryop_x86.cpp fix x86 unaryop bug on gcc-4.4 (#3838) 4 years ago
unaryop_x86.h Add unaryop x86 (#3579) 4 years ago
x86_activation.h fix build x86 avx512 source with old gcc (#3705) 4 years ago
x86_usability.h x86 sse/avx/avx512 optimization for softmax (#3712) 4 years ago
yolov3detectionoutput_x86.cpp style(src/layer/x86): fix build warning (#3699) 4 years ago
yolov3detectionoutput_x86.h Add yolov3detectionoutput test and AVX optimization (#1994) 5 years ago