nihui/ncnn - ncnn - Open-Source Collaborative Cloud-Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	da7d1a10f7	test x86 arm convolution oom (#5492) * skip mips loongarch riscv oom test atm * test softmax oom	1 year ago
nihui	556b79ce4d	create layer decoupled (#5258) * create layer decoupled * no more virtual public * allow build test with shared library * decouple cpu vulkan * drop old scripts	2 years ago
nihui	ded0b78bb2	fix nvidia vulkan crash on exit (#5234)	2 years ago
邓实诚	a1e3ebf8e5	implement simplemath (#4905) * complete abs, fmod and sin function in simplemath.h * remove some unused variables in simplemath.cpp * modify test-coverage.yml and add some functions to simplemath.cpp * modify erf.cpp which included math.h * include platform.h for NCNN_SIMPLEMATH definition * move utility constants and functions in simplemath.h to simplemath.cpp * guard simplemath functions with extern "C" * add NCNN_EXPORT macro in simplemath.h * include plateform.h and guard all declarations with NCNN_SIMPLEMATH * clean unused code in test_unaryop.cpp * guard #include <vector> with NCNN_SIMPLEMATH in benchncnn.cpp * add 'static' to guard functions that not declarated in header file * modify sin and cos with better implementation --------- Co-authored-by: HonestDeng <HonestDeng@users.noreply.github.com>	2 years ago
nihui	c45c01c7c1	enable VK_KHR_cooperative_matrix (#4823) * enable VK_KHR_cooperative_matrix * add khr cm shader * update glslang * print matrix info	2 years ago
nihui	85991e2e0e	test custom option, update ci (#4609) * early return for cpu test * make nvidia driver happy * fix gemm x86 threading	3 years ago
nihui	fed99fd35b	gemm output transpose, prepack c (#4479) * mha is now permute and reshape free * gemm user defined tile mnk param	3 years ago
nihui	15761fc1a6	arm vfpv4 asimdhp asimdfhm optimization for gemm (#4432)	3 years ago
nihui	706831f8a9	arm vfpv4 optimization for innerproduct (#3950)	3 years ago
nihui	067e8e1d92	mips unified elempack for elementwise layers (#3928)	3 years ago
nihui	241524ffce	discard weight memory for x86 arm vulkan (#3865) * discard weight memory for x86 and vulkan * drop arm innerproduct weight * drop arm convolution weight * drop arm convolutiondepthwise weight * drop x86 vulkan deconvolution deconvolutiondepthwise weight * drop arm deconvolution deconvolutiondepthwise weight * arm neon assembly optimization for innerproduct pack4	4 years ago
tpoisonooo	6fd801b6d7	feat(src/layer): add vision_transformer benchmark (#3730) * feat(src/layer): add vision_transformer benchmark and relative layer * refactor(testutil.h): add para for RandomMat	4 years ago
nihui	308965b7e9	sanitize cooperative matrix option in tests	4 years ago
nihui	dadc640c66	x86 avx512 optimization (#3581) * unified relu avx512 * unifed clip avx512 * unaryop avx512 * sigmoid avx512 * binaryop avx512 * padding convolution avx512 * convolutiondepthwise avx512 * innerproduct avx512 * reshape avx512 * slice avx512 * hardsigmoid hardswish avx512 * swish avx512 * pooling avx512 * crop avx512 * convolution sgemm pack16 * convolution 3x3 winograd pack16 * interp avx512 * convolution sgemm pack1to16 * convolution sgemm pack16to8 * convolution sgemm pack8to16 * convolution sgemm pack16to4 * fix vulkan permute pack8 * fix vulkan convolution gemm pack8to1	4 years ago
nihui	559e5b23f9	vulkan tensorcore optimization (#3628) * query and enable cooperative matrix * fix build with old vulkan sdk * implement cooperative matrix optimization * add nvidia-t4 coverage * adjust test option for more coverage	4 years ago
nihui	920aa79f04	drop x86 avx2 fp16 (#3568)	4 years ago
Yuzhong Yan	681141ff42	[YZ] Fix bug in unit test (#3556)	4 years ago
nihui	3a83704c38	binary4d, unary4d (#3443)	4 years ago
nihui	6941ec8fc9	arm neon optimization for general packed convolution (#3426)	4 years ago
nihui	999e640d43	dynamic convolution weight (#3408)	4 years ago
nihui	f10cc6dd93	initial data structure changes for 3dcnn, conv3d, pooling3d (#3378) Co-authored-by: ElvisYu <elvisyuovo@gmail.com> Co-authored-by: 余浩文 <m18107220188@163.com> Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>	4 years ago
nihui	24fbb6e8cb	honor thread setting on load and vulkan command, ci avx512 t4 (#3391)	4 years ago
nihui	0b664ec438	fix potential out of range read in test with int8 inputs (#3357)	4 years ago
Tijmen Verhulsdonck	4270b5c502	Fix broken codepaths with AVX only (#3254) * Fix codepaths for fp16 weights when only AVX is enabled * Disable opt overrides * Update SDK url * Update vulkan SDK download version * Debugging risv pad * apply code-format changes * fix padding test * fix mips slice test * fix lrn test * implement mish swish image shader, fix pooling adaptive image storage support, drop debug output * update ci ubuntu 18.04 Co-authored-by: nihui <shuizhuyuanluo@126.com>	4 years ago
Tijmen Verhulsdonck	eaa7e24db6	Added ability to switch AVX/AVX2 during runtime (#3076)	4 years ago
nihui	3a77b09c31	fix test failure	5 years ago
nihuini	fef61c5296	fix arm build	5 years ago
nihuini	934a1a8e32	test flatten packing padding int8	5 years ago
nihui	17936e9f54	fix packing risc-v test, add cpu_riscv_vlenb()	5 years ago
nihui	a61f03ec76	arm neon optimization for pixelshuffle scale 2	5 years ago
nihui	5fe75f19ef	architecture changes for int8 packing (#2771) * quantize and dequantize tests * unify activation and usability function * drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build * benchmark use requantize int8 model	5 years ago
nihui	21dc650eb3	check layer support (#2564)	5 years ago
tpoisonooo	baf49574c4	innerproduct aarch64 use gemm (#2521) * perf(innerproduct-arm): add aarch64 gemm * fix(innerproduct): fix compilation errror * fix(armv7-innerproduct): fix armv7 compilation error * fix(innerproduct): fix gemm param * fix(int8): update mock scales and fix runtime error * fix(compilation): fix compilation error	5 years ago
nihui	54c0a13b9f	build shared library (#2525) * build shared lib and enable lto * reserved for layer and option * allocator pimpl * datareader pimpl * paramdict pimpl, disable copy assign for allocator and datareader * modelbin pimpl * net extractor pimpl * gpu pimple * disable copy assign vulkandevice, code format * command pimpl, dummy image readonly * pipeline pipelinecache pimpl, export platform class * code format, export simple family * update ci * disable lto on android armv7, merge webassembly ci * link libgcc, fix macos dylib version * pipeline pimpl, gpu info pimpl * destroy gpu info after vulkan device * ignore msvc stl class warning * fix ncnn_paramdict_get_float return type * fix vktransfer upload fp16 without flatten, add command test	5 years ago
PENGUINLIONG	8f8f2de4d0	SSE2 optimization pack (#2123) * SSE2: BatchNorm * Fixed batch norm in AVX configuration * Optimized register size switch * Attempt to pass CI * Attempt to pass CI * Bias op * Element wise ops * Support packing on x86 by default * Fixed macro range in bias * Use aligned read for packed data * Update testutil.h * Update pooling_x86.cpp * Support wasn SIMD * Fix emscripten compiler flags * fix build * more ci fix * concat x86 pack4 * flatten x86 pack4 * more x86 pack4 * ci pass * fix * enable sse2 mathfun * enable --experimental-wasm-simd Co-authored-by: nihui <shuizhuyuanluo@126.com> Co-authored-by: nihuini <nihuini@tencent.com>	5 years ago
maxfy1992	a106baa3b8	add interp param align_corner (#2236) * add interp param align_corner add check support_vulkan after create_pipeline for tests * code style Co-authored-by: yangfengmax <yangfengmax@didichuxing.com>	5 years ago
Leo	5afd318b86	Support remove libstdc++ denpendency (#2030) * [build] add toolchain file w/o stdcxx dependency * [build] link m and gcc lib explicitly * [ncnn] complete simple stl impl * [ncnn] adapt for ncnn simplestl * [test] adapt for ncnn simplestl * [ncnn] fix missing algorithm and list when simplestl disabled * [ncnn] fix guard for operator new and delete * [style] fix the code style * [build] fix build failed on darwin and emscripten * [ci] do not import cxx to avoid operator conflict * [ncnn] add temporary partial_sort impl using bubble sort heap sort should be used for better perf. * [ncnn] add std greater and less function * [ncnn] fix placement new operator overload * [ncnn] add operator delete with size info * [build] disable exception, rtti, example and tools when simplestl on * [build] add toolchain for arm simplestl * [build] add toolchain for aarch64 simplestl * [ncnn] move initializer to constructor * [ncnn] use deteiled type instead of auto * [ncnn] use plain lib name in target_link_libraries	5 years ago
nihuini	d3f0b9f993	try smaller random values	5 years ago
nihuini	5d5a3d1434	conv1x1s1 conv1x1s2 conv3x3s1 winograd pack8 arm fp16sa	5 years ago
nihui	aa1a9e90c5	interp shufflechannel arm fp16sa pack8	5 years ago
nihuini	df5a7f32d4	enable arm82 fp16sa pack8 test	5 years ago
nihuini	47ae0c151a	some shared arm bf16s fp16s implementation	5 years ago
nihui	bb5bfe3841	avx2 infrastructure (#1943)	5 years ago
nihui	11cffce114	armv8.2 infrastructure (#1856) * runtime cpu dispatch * force thread one * disable openmp for coverage * simplify test layer * print NCNN_TARGET_ARCH * less ci build variants * weight fp16 storage option * test convdw int8 * apple a12 a13 * ncnn_add_layer ncnn_add_shader cmake macro	5 years ago
nihui	3ff40b0679	Ci rv32imc (#1940)	5 years ago
nihuini	0d6cc01d55	innerproduct handle mish activation, fix naive C testing, fix #1930	5 years ago
Tijmen Verhulsdonck	3325cf94f8	Added AVX swish/lrn/batchnorm (#1897) * added avx versions of commonly used layers * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Unroll loops and increase test coverage * Restyled/pull 1897 (#13) * Restyle Co-authored-by: Restyled.io <commits@restyled.io> * cleanup variable declaration * fix ctest * Restyled/pull 1897 (#14) Restyle Co-authored-by: Restyled.io <commits@restyled.io> * Fix ctest for convolutions without AVX. Co-authored-by: Restyled.io <commits@restyled.io>	5 years ago
Tijmen Verhulsdonck	73aa99e83c	LSTM arm/x86 + fp16 innerproduct arm (#1881) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * added ability for storing state in lstm layer * added avx lstm * added arm lstm * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * commit before switch * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * More x86 optimized implementations of common layers. Added LSTM layers for arm and x86 + a ctest to verify the layer accuracy Added fp16 innerproduct for arm * fix non avx build * Add fp16 arm compiler and cpu checks. Remove statefullness from LSTM implementation. * Fix build check for fp16 arm * Bypass lstm_fp16 if not supported * Build order was incorrect * fix std::min missing in windows build * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * attempting to fix gnu build by enabling: -mfp16-format=ieee to fix the missing __fp16 type * remove double "fix" * Specify ieee fp16 format * implement requested changes * fix arm non-fp16 build * fix arm lstm * Restyled/pull 1881 (#15) * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io> * Check blob size on arm lstm * fix styling Co-authored-by: Restyled.io <commits@restyled.io>	5 years ago
nihui	12ce58074e	some code clean	5 years ago
Tijmen Verhulsdonck	66618340ac	x86 fp16 weight storage optimizations (#1871) * added fp16 weight storage version * Small changes * Fixed fp16 weight storage layers * fix innerproduct * fix loop error * Fix windows build. Disable fp 16 conversion when detecting int8 weights. Implement requested changes. * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * Update option.cpp Set fp16 storage based on vulkan being used or not. * fix innerproduct activation location and add 4 parallel channel version * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle * revert arm file * implement requested changes * Restyled by clang-format * Restyled by astyle * Restyled by clang-format * Restyled by astyle Co-authored-by: Restyled.io <commits@restyled.io>	5 years ago

77 Commits (f1bdc87478c64e0dfdba3d679e6e0dfd4b84df80)