Initialize registers with an immediate value instead of by multiplication, because a register that already holds a NaN stays NaN after the multiplication.
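
The NaN hazard can be sketched in plain C. This is only an illustration, not the kernel code from this change: the helper names `clear_by_multiply` and `clear_by_immediate` are hypothetical, and an array of doubles stands in for the vector registers. Under IEEE 754 arithmetic (without fast-math options), NaN * 0.0 is NaN, so only the direct store reliably yields zero.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: "clears" the accumulators by multiplying by zero.
 * Any element that holds a NaN stays NaN, because NaN * 0.0 is NaN. */
static void clear_by_multiply(double *acc, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] *= 0.0;
}

/* Hypothetical helper: clears the accumulators by storing an immediate zero.
 * The previous contents are never read, so a stale NaN cannot survive. */
static void clear_by_immediate(double *acc, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] = 0.0;
}
```

Calling `clear_by_multiply` on a buffer that contains a NaN leaves the NaN in place, while `clear_by_immediate` overwrites it with 0.0.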
Add a beta == 0 branch; in that case the C matrix does not need to be loaded.

Signed-off-by: Xuqiang Chen <chenxuqiang3@hisilicon.com>