* quantize gemm
* write gemm quantize scales
* update doc
* less openmp args
* x86 riscv fallback
* skip gemm vulkan int8
* fix noint8 test, fix arm bf16 test
* enable vfpv4 on neon build only
* fix gemm vulkan without C
* fp16 pack8 output
* enable elempack=8 only for asimdhp+
* tiled gemm int8 test
* opt arm64 tiles, fix asimdhp dispatch

* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts

* mha is now permute and reshape free
* gemm user defined tile mnk param