* Add another memory barrier in memory.c to prevent races in memory slot allocation * Add an all-core test on Drone.io's ThunderX platform and modify dgemm_tester to use all 96 cores