nihui/ncnn - ncnn - Open Source Collaborative Cloud Brain Ecosystem Support System

Commit Graph

Author	SHA1	Message	Date
nihui	6c4c810fda	decouple modelbin of different input types, simplify timestamp function	8 years ago
nihui	2d4ae30508	fallback to all cores	8 years ago
nihui	03c1f63c2e	switch to winograd4	8 years ago
nihui	bc99d5123b	set smp cpu affinity to all cores	8 years ago
nihuini	098fff355c	implement spatial norm, convert L2Normalization	8 years ago
nihui	5ff6a1808a	emmmm, yet another implementation for winograd 3x3, unroll aggressively for aarch64	8 years ago
YQZ1990	6f13cc5185	slice (#269) * fix slice dim3	8 years ago
nihuini	bd705d5bdb	inplace binaryop with scalar	8 years ago
nihuini	5f4ac776d1	implement instancenorm	8 years ago
nihuini	db5e805eff	padding_mode for Pooling, fix #261	8 years ago
nihui	2d9410742b	concat slice shufflechannel honor elemsize	8 years ago
nihui	8ccae1d4fd	prevent reuse of param array, fix #258	8 years ago
nihuini	75218953cc	aarch64 assembly for conv1x1s1, unroll outch inch as 8x8	8 years ago
nihuini	76a55693a6	decouple convolutiondepthwise and convolution, reduce binary size by 10%, fix #254	8 years ago
nihuini	3ffb502bc6	reuse if the same shape	8 years ago
nihuini	c6506d6ecd	remaining inch for winograd neon3	8 years ago
nihui	c12fab569f	fix convdw3x3s1 on aarch64	8 years ago
nihui	f133729c78	code style changes	8 years ago
nihuini	03621aa7f9	more x86 stub for convolution and convolutiondepthwise	8 years ago
Lamply	6612178960	correct arm convolution depthwise mistakes (#246)	8 years ago
nihui	848c9a1ea7	code clean	8 years ago
nihui	80fb28de90	unroll outch for convolution 3x3s1, about 10%~20% speed gain	8 years ago
nihui	df218110be	unroll num_output for innerproduct, about 60% speed gain	8 years ago
nihui	aaa1ffcef0	emmmm, prefer w h	8 years ago
nihui	d68eb4cd15	wrap benchmark gettimeofday	8 years ago
Linghan Cheung	811b6ba1b6	print benchmark information for every layer, especially for CONVOLUTION (#241) * print benchmark information for every layer, especially for CONVOLUTION * print benchmark information for every layer, especially for CONVOLUTION, for cross-platform. * move the function implementation to cpp file to avoid multiple definitions	8 years ago
nihuini	d2ee4e7d27	ld1 and st1 handle data endian mode per element	8 years ago
nihui	08e261f423	innerproduct produce continuous blob, fix #236	8 years ago
nihui	682b0d3c0d	prelu on vector and image	8 years ago
nihui	14a2e23407	enable embed layer	8 years ago
nihui	c9789fb879	slice dim	8 years ago
nihuini	67b80183dd	fix param load using external memory	8 years ago
nihuini	7fc23025d4	unroll outch for convolution 1x1 stride 2, about 15%~55% speed gain	8 years ago
nihuini	ccbb94d835	fix build	8 years ago
nihuini	e471028f53	fix avg pooling in tail pad	8 years ago
nihuini	a84ba8fc0f	element type storage support in Mat, move data member the first so that a pointer to Mat is a pointer to data, convenient index access for float vector	8 years ago
nihuini	8773035891	another implementation for winograd 3x3, about 15%~30% speed gains on small images	8 years ago
nihui	2a62a98e1a	allow constructing paramdict and modelbin from userside	8 years ago
nihui	10b86c2af5	create layer from type name	8 years ago
nihui	118e037f33	arm neon optimize for mat fill	8 years ago
nihui	7a43c45e80	remove deprecated code	8 years ago
nihui	a181d25098	new model load api, fix #215	8 years ago
nihuini	b84ba31c23	enable light mode by default	8 years ago
peng	5ac2de8963	fix shufflechannel	8 years ago
nihuini	df5e04260a	fix conv1x1s1 bug	8 years ago
nihuini	9280a068fe	unroll outch for convolution 3x3 winograd64, reduce memory usage	8 years ago
nihui	1f5c646ee0	pipeline optimize	8 years ago
nihuini	0564021afc	fix armv7 assembly	8 years ago
nihuini	55ec189998	unroll outch for convolution 1x1 stride 1	8 years ago
nihuini	57df1076ff	neon optimize for depthwise convolution 3x3, about 20%~35% speed gain	8 years ago

167 Commits (405faed2abb1e6e6c9fa1fdb43e4c1cbb715da7c)