
how-to-write-a-neon-optimized-op-kernel.md

# benchmark

op

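A minimal timing sketch. It assumes a hypothetical op that applies a per-channel scale and bias in place (`x = x * scale[c] + bias[c]`, roughly what an inference-time BatchNorm does). The names `scale_bias_ref`, `scale_bias_fn` and `bench_scale_bias` are made up for illustration.

The harness does three things:

- It restores the same input before every run.
- It discards one warm-up run.
- It reports the fastest of the timed runs.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* hypothetical op used in these sketches: in-place x = x * scale[c] + bias[c] */
typedef void (*scale_bias_fn)(float* data, int channels, int size,
                              const float* scale, const float* bias);

/* hypothetical plain reference version, also handy for checking optimized outputs */
void scale_bias_ref(float* data, int channels, int size,
                    const float* scale, const float* bias)
{
    for (int q = 0; q < channels; q++)
        for (int i = 0; i < size; i++)
            data[(size_t)q * size + i] = data[(size_t)q * size + i] * scale[q] + bias[q];
}

/* milliseconds from a monotonic clock */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* one warm-up run plus `loops` timed runs, each on a fresh copy of `input`
   placed in `work`; returns the fastest timed run in ms, or -1.0 if loops <= 0 */
double bench_scale_bias(scale_bias_fn op, const float* input, float* work,
                        int channels, int size,
                        const float* scale, const float* bias, int loops)
{
    size_t bytes = (size_t)channels * size * sizeof(float);
    double best = -1.0;
    for (int i = 0; i <= loops; i++)
    {
        memcpy(work, input, bytes); /* identical input for every run, not timed */
        double t0 = now_ms();
        op(work, channels, size, scale, bias);
        double t = now_ms() - t0;
        if (i == 0)
            continue; /* warm-up: caches, page faults, thread start-up */
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}
```

Any optimized variant with the same signature can be passed as `op`. Its `work` buffer can then be compared against the output of `scale_bias_ref`.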
# naive C with openmp

three nested for loops

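A sketch of the naive version, for the same hypothetical per-channel scale-and-bias op. The blob layout (`channels x h x w`, contiguous floats) and the name `scale_bias_naive` are assumptions.

Channels are independent, so only the outermost loop is parallelized with OpenMP. When the code is built without `-fopenmp`, the pragma is ignored and the code runs single-threaded.

```c
#include <stddef.h>

/* hypothetical naive version: channel, row, column;
   in-place x = x * scale[q] + bias[q] */
void scale_bias_naive(float* data, int channels, int h, int w,
                      const float* scale, const float* bias)
{
    /* each thread gets whole channels, so no two threads touch the same element */
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                float* p = data + ((size_t)q * h + y) * w + x;
                *p = *p * scale[q] + bias[q];
            }
        }
    }
}
```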
# unroll, first try

h

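A first unrolling sketch for the same hypothetical op (the function name is made up). Because the op is elementwise inside a channel, the `h` and `w` loops are merged into one loop over `size = h * w`. That loop is unrolled by 4, with a scalar loop for the leftover elements.

Note that `scale[q]` and `bias[q]` are still read through pointers here. Since `data` may alias them, the compiler may have to reload them after every store.

```c
#include <stddef.h>

/* hypothetical op, unrolled by 4 inside each channel; size = h * w */
void scale_bias_unroll4(float* data, int channels, int size,
                        const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        int i = 0;
        for (; i + 3 < size; i += 4)
        {
            ptr[i]     = ptr[i]     * scale[q] + bias[q];
            ptr[i + 1] = ptr[i + 1] * scale[q] + bias[q];
            ptr[i + 2] = ptr[i + 2] * scale[q] + bias[q];
            ptr[i + 3] = ptr[i + 3] * scale[q] + bias[q];
        }
        /* leftover 0..3 elements */
        for (; i < size; i++)
            ptr[i] = ptr[i] * scale[q] + bias[q];
    }
}
```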
# register allocation

kernels

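A sketch of making register allocation explicit in C, for the same hypothetical op (the function name is made up). Two changes are made:

- The per-channel constants are copied into locals once per channel.
- Each group of four is loaded into locals before any store.

Both changes let the values stay in registers. Whether the compiler actually keeps them there is its decision; reading the generated assembly (for example with `-S`) is the reliable check.

```c
#include <stddef.h>

/* hypothetical op with constants and loaded values held in locals */
void scale_bias_regs(float* data, int channels, int size,
                     const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        /* loaded once per channel: no reload needed after stores to ptr */
        const float a = scale[q];
        const float b = bias[q];
        int i = 0;
        for (; i + 3 < size; i += 4)
        {
            /* all loads first ... */
            float x0 = ptr[0];
            float x1 = ptr[1];
            float x2 = ptr[2];
            float x3 = ptr[3];
            /* ... then all arithmetic and stores */
            ptr[0] = x0 * a + b;
            ptr[1] = x1 * a + b;
            ptr[2] = x2 * a + b;
            ptr[3] = x3 * a + b;
            ptr += 4;
        }
        for (; i < size; i++)
        {
            *ptr = *ptr * a + b;
            ptr++;
        }
    }
}
```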
# unroll, second try

simd

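A second unrolling sketch for the same hypothetical op (the function name is made up), shaped like SIMD code. The loop body works on a 4-float group, the width of one 128-bit NEON register. It is split into load, multiply-add and store phases, and an `nn`/`remain` split separates full groups from leftovers.

GCC and Clang may auto-vectorize a body like this at higher optimization levels. That is worth checking before moving to intrinsics.

```c
#include <stddef.h>

#define LANES 4 /* 4 x float32 = one 128-bit NEON q register */

/* hypothetical op: in-place x = x * scale[q] + bias[q], size = h * w */
void scale_bias_lanes(float* data, int channels, int size,
                      const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        const float a = scale[q];
        const float b = bias[q];
        int nn = size / LANES;     /* number of full 4-float groups */
        int remain = size % LANES; /* 0..3 leftover floats */
        for (; nn > 0; nn--)
        {
            float v[LANES];
            for (int k = 0; k < LANES; k++) /* "load" */
                v[k] = ptr[k];
            for (int k = 0; k < LANES; k++) /* "multiply-add" */
                v[k] = v[k] * a + b;
            for (int k = 0; k < LANES; k++) /* "store" */
                ptr[k] = v[k];
            ptr += LANES;
        }
        for (; remain > 0; remain--)
        {
            *ptr = *ptr * a + b;
            ptr++;
        }
    }
}
```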
# neon intrinsics

optional

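An intrinsics sketch for the same hypothetical op (the function name is made up), using `vld1q_f32`, `vmlaq_f32` and `vst1q_f32` from `<arm_neon.h>`. `vmlaq_f32(a, b, c)` computes `a + b * c` per lane.

The NEON path is compiled only where the compiler defines `__ARM_NEON`. Elsewhere only the scalar loop runs, which keeps the sketch buildable on any machine.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* hypothetical op: in-place x = x * scale[q] + bias[q] */
void scale_bias_intrin(float* data, int channels, int size,
                       const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        const float a = scale[q];
        const float b = bias[q];
        int remain = size;
#if defined(__ARM_NEON)
        float32x4_t _a = vdupq_n_f32(a); /* broadcast scale to 4 lanes */
        float32x4_t _b = vdupq_n_f32(b); /* broadcast bias to 4 lanes */
        for (; remain >= 4; remain -= 4)
        {
            float32x4_t _x = vld1q_f32(ptr);
            _x = vmlaq_f32(_b, _x, _a); /* _b + _x * _a */
            vst1q_f32(ptr, _x);
            ptr += 4;
        }
#endif
        for (; remain > 0; remain--) /* scalar tail, or everything without NEON */
        {
            *ptr = *ptr * a + b;
            ptr++;
        }
    }
}
```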
# naive neon assembly with pld

asm

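A naive inline-assembly sketch for AArch64, for the same hypothetical op (the function name is made up). It assumes GCC/Clang extended asm. AArch64 has no `pld` mnemonic; its prefetch is `prfm pldl1keep`, while 32-bit ARM uses `pld`.

Each iteration does the following:

1. Prefetch 128 bytes ahead.
2. Load 4 floats.
3. Copy the bias into an accumulator.
4. Do one `fmla`.
5. Store with a post-increment.

On other targets only the scalar loop is compiled.

```c
#include <stddef.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* hypothetical op: in-place x = x * scale[q] + bias[q] */
void scale_bias_asm(float* data, int channels, int size,
                    const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        const float a = scale[q];
        const float b = bias[q];
        int remain = size;
#if defined(__aarch64__)
        int nn = size >> 2; /* 4 floats per iteration */
        remain = size & 3;
        float32x4_t _a = vdupq_n_f32(a);
        float32x4_t _b = vdupq_n_f32(b);
        if (nn > 0)
        {
            asm volatile(
                "0:                                 \n"
                "prfm   pldl1keep, [%1, #128]       \n" /* prefetch ahead */
                "ld1    {v0.4s}, [%1]               \n" /* load 4 floats */
                "mov    v1.16b, %3.16b              \n" /* v1 = bias */
                "fmla   v1.4s, v0.4s, %2.4s         \n" /* v1 += x * scale */
                "subs   %w0, %w0, #1                \n"
                "st1    {v1.4s}, [%1], #16          \n" /* store, ptr += 4 */
                "bne    0b                          \n"
                : "+r"(nn), "+r"(ptr)
                : "w"(_a), "w"(_b)
                : "cc", "memory", "v0", "v1");
        }
#endif
        for (; remain > 0; remain--)
        {
            *ptr = *ptr * a + b;
            ptr++;
        }
    }
}
```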
# pipeline optimization, first try

load more registers, then mla

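A sketch of this step for AArch64 only, for the same hypothetical op (the function name is made up). Each iteration handles 16 floats: a single `ld1` fills four q registers, then four independent `fmla`s follow, so none of them waits on the result of another.

Registers v0-v7 are listed as clobbers, so the compiler will not place `_a` or `_b` in them.

```c
#include <stddef.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* hypothetical op: in-place x = x * scale[q] + bias[q], 16 floats per iteration */
void scale_bias_asm16(float* data, int channels, int size,
                      const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        const float a = scale[q];
        const float b = bias[q];
        int remain = size;
#if defined(__aarch64__)
        int nn = size >> 4;
        remain = size & 15;
        float32x4_t _a = vdupq_n_f32(a);
        float32x4_t _b = vdupq_n_f32(b);
        if (nn > 0)
        {
            asm volatile(
                "0:                                              \n"
                "prfm   pldl1keep, [%1, #256]                    \n"
                "ld1    {v0.4s, v1.4s, v2.4s, v3.4s}, [%1]       \n" /* 4 loads in one go */
                "mov    v4.16b, %3.16b                           \n" /* accumulators = bias */
                "mov    v5.16b, %3.16b                           \n"
                "mov    v6.16b, %3.16b                           \n"
                "mov    v7.16b, %3.16b                           \n"
                "fmla   v4.4s, v0.4s, %2.4s                      \n" /* 4 independent mla */
                "fmla   v5.4s, v1.4s, %2.4s                      \n"
                "fmla   v6.4s, v2.4s, %2.4s                      \n"
                "fmla   v7.4s, v3.4s, %2.4s                      \n"
                "subs   %w0, %w0, #1                             \n"
                "st1    {v4.4s, v5.4s, v6.4s, v7.4s}, [%1], #64  \n" /* ptr += 16 */
                "bne    0b                                       \n"
                : "+r"(nn), "+r"(ptr)
                : "w"(_a), "w"(_b)
                : "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7");
        }
#endif
        for (; remain > 0; remain--)
        {
            *ptr = *ptr * a + b;
            ptr++;
        }
    }
}
```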
# pipeline optimization, second try

interleave loads with mla

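A sketch of interleaving for AArch64 only, for the same hypothetical op (the function name is made up). Instead of issuing all loads and then all `fmla`s, each `fmla` is placed a few instructions after the load it depends on, with other loads and moves in between. This lets load latency overlap with arithmetic.

How much this helps depends on the core. In-order cores such as Cortex-A53 usually care more about instruction order than out-of-order ones.

```c
#include <stddef.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* hypothetical op: in-place x = x * scale[q] + bias[q], loads interleaved with mla */
void scale_bias_interleave(float* data, int channels, int size,
                           const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        const float a = scale[q];
        const float b = bias[q];
        int remain = size;
#if defined(__aarch64__)
        int nn = size >> 4;
        remain = size & 15;
        float32x4_t _a = vdupq_n_f32(a);
        float32x4_t _b = vdupq_n_f32(b);
        if (nn > 0)
        {
            asm volatile(
                "0:                                              \n"
                "prfm   pldl1keep, [%1, #256]                    \n"
                "ldr    q0, [%1]                                 \n"
                "ldr    q1, [%1, #16]                            \n"
                "mov    v4.16b, %3.16b                           \n" /* work while q0/q1 load */
                "mov    v5.16b, %3.16b                           \n"
                "ldr    q2, [%1, #32]                            \n"
                "fmla   v4.4s, v0.4s, %2.4s                      \n" /* uses q0 */
                "ldr    q3, [%1, #48]                            \n"
                "fmla   v5.4s, v1.4s, %2.4s                      \n" /* uses q1 */
                "mov    v6.16b, %3.16b                           \n"
                "mov    v7.16b, %3.16b                           \n"
                "fmla   v6.4s, v2.4s, %2.4s                      \n"
                "fmla   v7.4s, v3.4s, %2.4s                      \n"
                "subs   %w0, %w0, #1                             \n"
                "st1    {v4.4s, v5.4s, v6.4s, v7.4s}, [%1], #64  \n" /* ptr += 16 */
                "bne    0b                                       \n"
                : "+r"(nn), "+r"(ptr)
                : "w"(_a), "w"(_b)
                : "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7");
        }
#endif
        for (; remain > 0; remain--)
        {
            *ptr = *ptr * a + b;
            ptr++;
        }
    }
}
```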
# pipeline optimization, third try

loop tail

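A sketch of loop-tail handling for the same hypothetical op (the function name is made up). The leftovers after a 16-float main loop are handled in two steps: a 4-float NEON loop, then a scalar loop for the last 0..3 floats.

The main loop is written with intrinsics here only to keep the sketch short; in an assembly kernel it would be the hand-scheduled asm loop. Without `__ARM_NEON`, only the scalar loop runs.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* hypothetical op: in-place x = x * scale[q] + bias[q] with a two-stage tail */
void scale_bias_tail(float* data, int channels, int size,
                     const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        const float a = scale[q];
        const float b = bias[q];
        int remain = size;
#if defined(__ARM_NEON)
        float32x4_t _a = vdupq_n_f32(a);
        float32x4_t _b = vdupq_n_f32(b);
        /* main loop: 16 floats per iteration */
        for (; remain >= 16; remain -= 16)
        {
            float32x4_t x0 = vld1q_f32(ptr);
            float32x4_t x1 = vld1q_f32(ptr + 4);
            float32x4_t x2 = vld1q_f32(ptr + 8);
            float32x4_t x3 = vld1q_f32(ptr + 12);
            vst1q_f32(ptr,      vmlaq_f32(_b, x0, _a));
            vst1q_f32(ptr + 4,  vmlaq_f32(_b, x1, _a));
            vst1q_f32(ptr + 8,  vmlaq_f32(_b, x2, _a));
            vst1q_f32(ptr + 12, vmlaq_f32(_b, x3, _a));
            ptr += 16;
        }
        /* tail 1: remaining full 4-float groups */
        for (; remain >= 4; remain -= 4)
        {
            vst1q_f32(ptr, vmlaq_f32(_b, vld1q_f32(ptr), _a));
            ptr += 4;
        }
#endif
        /* tail 2: last 0..3 floats, one at a time */
        for (; remain > 0; remain--)
        {
            *ptr = *ptr * a + b;
            ptr++;
        }
    }
}
```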
# usual practice, load/save

233

# usual practice, unroll

233

# usual practice, save register

233

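A sketch of the register-saving rule on AArch64, assuming the standard AAPCS64 calling convention:

- v0-v7 and v16-v31 can be clobbered freely.
- The low 64 bits of v8-v15 (d8-d15) must be preserved across calls.

In inline asm, naming v8-v11 in the clobber list is enough for the compiler to save and restore d8-d11 in the function prologue and epilogue. A standalone `.S` kernel would have to save them itself, for example with `stp d8, d9, [sp, #-16]!`. The kernel is a hypothetical 16-float loop for the same op that deliberately uses v8-v11 as accumulators; the function name is made up.

```c
#include <stddef.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* hypothetical op: in-place x = x * scale[q] + bias[q], accumulators in v8-v11 */
void scale_bias_v8(float* data, int channels, int size,
                   const float* scale, const float* bias)
{
    #pragma omp parallel for
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + (size_t)q * size;
        const float a = scale[q];
        const float b = bias[q];
        int remain = size;
#if defined(__aarch64__)
        int nn = size >> 4;
        remain = size & 15;
        float32x4_t _a = vdupq_n_f32(a);
        float32x4_t _b = vdupq_n_f32(b);
        if (nn > 0)
        {
            asm volatile(
                "0:                                               \n"
                "ld1    {v0.4s, v1.4s, v2.4s, v3.4s}, [%1]        \n"
                "mov    v8.16b, %3.16b                            \n"
                "mov    v9.16b, %3.16b                            \n"
                "mov    v10.16b, %3.16b                           \n"
                "mov    v11.16b, %3.16b                           \n"
                "fmla   v8.4s, v0.4s, %2.4s                       \n"
                "fmla   v9.4s, v1.4s, %2.4s                       \n"
                "fmla   v10.4s, v2.4s, %2.4s                      \n"
                "fmla   v11.4s, v3.4s, %2.4s                      \n"
                "subs   %w0, %w0, #1                              \n"
                "st1    {v8.4s, v9.4s, v10.4s, v11.4s}, [%1], #64 \n"
                "bne    0b                                        \n"
                : "+r"(nn), "+r"(ptr)
                : "w"(_a), "w"(_b)
                /* v8-v11 are callee-saved (low 64 bits): listing them here
                   makes the compiler save and restore d8-d11 for us */
                : "cc", "memory", "v0", "v1", "v2", "v3", "v8", "v9", "v10", "v11");
        }
#endif
        for (; remain > 0; remain--)
        {
            *ptr = *ptr * a + b;
            ptr++;
        }
    }
}
```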