PULP Platform

Open hardware, the way it should be!

Siracusa

PULPing Up Extended Reality with At-MRAM Computing

Extended Reality (XR) is believed by many to be the “next” generation of advanced human-machine interfaces, after the desktop paradigm that enabled personal computing and the touch-screen that characterizes our modern, smartphone-dominated world.

From our perspective as computer architects and chip designers, this raises many questions: what should a computer architecture for Extended Reality look like? It must do many things we are already capable of, such as processing video and sensor streams at high rate with Artificial Intelligence (AI) techniques – but it must do so within the constraints of a lightweight wearable device, such as classic eyeglasses that are non-stigmatizing and fashionable. This is a challenge that requires much more aggressive miniaturization than what smartphones offer, and much better performance and power efficiency than a smartwatch. In other words, if we want to realize XR devices that feel truly natural to use, we need to become drastically more efficient at processing data with embedded AI than we are today.

What are the key tools at our disposal to realize these massive gains in efficiency? One of the key ideas is to compute right next to the sensors themselves: instead of shuttling data around, wasting time and energy, we perform as much computation as possible near-sensor and only move data that has already been pre-processed. But near-sensor processing also brings new challenges: how can we deliver enough throughput in a device that is small and consumes little power? How can we fit large AI workloads, such as Deep Neural Networks, without relying on a large external Flash memory?

Our latest System-on-Chip, Siracusa, offers new answers to these questions as the result of a 3-year research project conducted as a collaboration between us at the PULP Platform project and Meta Reality Labs. Siracusa is built in TSMC’s 16nm technology and introduces the new concept of At-MRAM computing. It integrates a 4MB non-volatile memory that, instead of the more common Flash technology, uses an innovative magnetoresistive RAM (MRAM): a memory that holds values in resistive elements whose resistance can be changed by applying appropriate magnetic fields. Integrating such memories in a SoC is not a new endeavor, but Siracusa’s key innovation is to directly couple this MRAM with a fully digital, configurable hardware accelerator for AI tasks called N-EUREKA. Together with the MRAM, N-EUREKA forms a joint At-MRAM compute unit, which is further integrated with one of our PULP RISC-V clusters to form what we call a heterogeneous cluster.


Fig.: The 16 mm² Siracusa System-on-Chip with the architecture of the heterogeneous cluster

The Siracusa At-MRAM computing strategy holds many advantages. Like any innovative non-volatile memory integration strategy, it dramatically reduces the energy cost of bringing neural network weights on-chip: weights are instead programmed into the chip itself, and can be accessed at just a fraction of the time and power cost. At the same time, At-MRAM computing also shares some of the advantages of in-memory computing techniques, recently popularized in academic research – without the related drawbacks. First, the close memory/compute coupling saves further energy and power compared to more conventional non-volatile memory integration techniques employed in SoCs such as microcontrollers. Second, the fully digital nature of the At-MRAM AI acceleration unit means that analog noise does not degrade the accuracy of neural networks.

The Siracusa prototype we tested achieves up to 700 GOPS of performance and up to 2.7 TOPS/W of energy efficiency at the highest N-EUREKA precision setting (8-bit weights); most importantly, it demonstrates that the At-MRAM technique enables substantial gains in average power for the AI workloads typical of XR. But there remains much to be done: deeper integration between compute SoCs and sensors, towards on-sensor computing; scaling to higher performance to support more of the functions required in XR; and better support for complex software pipelines executed on-sensor. The future is exciting for PULP-based XR computing!
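As a back-of-the-envelope sketch of what these figures mean for a wearable power budget: dividing peak throughput by peak efficiency gives an implied power draw. Note that the two peaks are not necessarily reached at the same operating point, so this is only a rough estimate, not a measured number from the chip.

```python
# Rough estimate: implied power when running near the reported peaks.
# Figures from the text: 700 GOPS peak throughput, 2.7 TOPS/W efficiency
# (8-bit weights). These peaks may occur at different operating points.
peak_gops = 700           # giga-operations per second
efficiency_tops_w = 2.7   # tera-operations per second per watt

# Power (W) = throughput / efficiency; convert GOPS to TOPS first.
power_w = (peak_gops / 1000) / efficiency_tops_w
print(f"Implied power near peak: {power_w * 1000:.0f} mW")  # ~259 mW
```

A few hundred milliwatts for AI compute is the kind of envelope that a glasses-sized battery can plausibly sustain, which is what makes these numbers relevant for XR.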

More information can be found in the following resources:



© PULP Platform, 2024