Practical Implementations & Cooperation for Reliable Architectures Using Open-Source Hardware

European Test Symposium (ETS) 2023
Special Session on dependable RISC-V

Michael Rogenmoser  michaero@iis.ee.ethz.ch

PULP Platform
Open Source Hardware, the way it should be!
Soft-Error Tolerance – Required in Different Domains

Space Environment

Automotive

Particle Accelerator

And many more…
Open-Source Microcontroller – PULPissimo

This is the top-level project for the PULPissimo Platform. It instantiates a PULPissimo open-source system with a PULP SoC domain, but no cluster.

github.com/pulp-platform/pulpissimo
PULPissimo Architecture

Block diagram:
- RISC-V Core
- Memory Banks
- Tightly Coupled Data Memory Interconnect
- Peripheral Interconnect: SoC Control, Timer, GPIO, Debug
- µDMA
- Interrupt Control
- I/O interfaces: JTAG, SPI, I²C, UART, SDIO
How do we tackle reliability of the RISC-V core?
Reliable Processing Cores

• Replicate Core
  → Triple-Core Lockstep

• Identical Inputs

• Voted Outputs
  • Directly corrects any soft error

• Configurable for performance if reliability is not needed
  • 2.96x speedup
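
The voting step can be illustrated with a bitwise 2-out-of-3 majority function. This is only a minimal software sketch of the idea, not the hardware voter used in the design; the function names `vote` and `faulty_cores` are hypothetical.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three equally wide output words."""
    return (a & b) | (b & c) | (a & c)


def faulty_cores(a: int, b: int, c: int) -> list[int]:
    """Indices of the cores whose output differs from the voted result."""
    voted = vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


golden = 0x0000_1A2B
upset = golden ^ (1 << 7)          # single bit flip in the output of core 1

voted = vote(golden, upset, golden)
mismatch = faulty_cores(golden, upset, golden)
```

With identical inputs, a single upset core is outvoted by the other two, so the voted output stays correct while the mismatch flags which core needs recovery.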
Software Recovery of Triple Modular Redundant Cores

- Radiation causes a soft error
- Error detected by voter
- Core state (RF, PC, CSRs) stored to memory
  - Corrected by voter
- Core state loaded back into cores
- Total procedure in ~600 cycles
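
The store-vote-reload sequence above can be sketched in software as follows. This is a simplified model under stated assumptions: each core's state is represented as a flat list of words (PC, register file, CSRs), and `majority` and `recover` are hypothetical names, not taken from the actual recovery routine.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (b & c) | (a & c)


def recover(cores: list[list[int]]) -> list[int]:
    """Store the voted core state to memory, then reload it into all cores.

    `cores` holds three flat state vectors (PC, RF, CSRs), one per core.
    """
    # Step 1: every word passes through the voter on its way to memory,
    # so the saved copy is already corrected.
    saved = [majority(a, b, c) for a, b, c in zip(*cores)]
    # Step 2: the corrected state is loaded back into all three cores.
    for core in cores:
        core[:] = saved
    return saved


golden_state = [0x1C00_8080] + list(range(31))   # PC followed by registers
cores = [list(golden_state) for _ in range(3)]
cores[2][5] ^= 1 << 12                           # soft error in one register of core 2

saved_state = recover(cores)
```

After `recover`, all three state vectors are identical again and the cores can re-enter lockstep.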
Trikarenos – PULPissimo with Reliability

- RISC-V Core
- Memory Bank
- Tightly Coupled Data Memory Interconnect
- Peripheral Interconnect
- µDMA
- JTAG
- SPI
- I²C
- UART
- SDIO
- SoC Control
- Timer
- GPIO
- Debug
- Interrupt Control
- RISC-V Core
- RISC-V Core
- RISC-V Core
How do we tackle reliability of the System Memory?
Byte-addressable Memory – ECC Load and Store

Byte Store

Byte Store with ECC
ECC Load-and-Store – Performance Impact

- 32-bit Hsiao ECC word protection

- Grant the store immediately
  - Stall the following transaction rather than the current one, to shift and reduce the impact

- Results: <1% cycle increase
  - Various tests, such as 8-bit Matrix-Matrix Multiplication
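
Why a byte store becomes a load-and-store can be seen in a small model: the check bits cover the whole 32-bit word, so the word has to be read, corrected, modified, and re-encoded. The sketch below assumes an odd-weight-column SEC-DED code in the spirit of Hsiao (7 check bits, weight-3 columns for the data bits); the exact parity-check matrix of the design is not given in the slides, and `encode`, `decode`, and `store_byte` are hypothetical names.

```python
from itertools import combinations

DATA_BITS = 32
CHECK_BITS = 7

# Assumption: each data bit gets a distinct 7-bit column of weight 3.
DATA_COLS = [sum(1 << i for i in combo)
             for combo in combinations(range(CHECK_BITS), 3)][:DATA_BITS]
# Check bits use the weight-1 (identity) columns.
CHECK_COLS = [1 << i for i in range(CHECK_BITS)]


def encode(data: int) -> int:
    """Compute the 7 check bits for a 32-bit data word."""
    check = 0
    for bit, col in enumerate(DATA_COLS):
        if (data >> bit) & 1:
            check ^= col
    return check


def decode(data: int, check: int) -> tuple[int, str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = encode(data) ^ check
    if syndrome == 0:
        return data, "ok"
    if syndrome in DATA_COLS:                 # single error in a data bit
        return data ^ (1 << DATA_COLS.index(syndrome)), "corrected"
    if syndrome in CHECK_COLS:                # single error in a check bit
        return data, "corrected"
    return data, "uncorrectable"              # e.g. even-weight syndrome


def store_byte(mem: list[tuple[int, int]], addr: int, value: int) -> str:
    """Byte store as read-modify-write on an ECC-protected 32-bit word."""
    index, shift = addr // 4, 8 * (addr % 4)
    data, status = decode(*mem[index])        # load and correct the full word
    if status == "uncorrectable":
        return status                         # leave the word untouched
    data = (data & ~(0xFF << shift)) | ((value & 0xFF) << shift)
    mem[index] = (data, encode(data))         # re-encode and store
    return status


word = 0xDEADBEEF
mem = [(word ^ (1 << 5), encode(word))]       # stored word with one flipped bit
status = store_byte(mem, addr=1, value=0x42)
```

In this model the single-bit error is corrected as a side effect of the byte store, since the full word is decoded before the byte is merged in.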
ECC Scrubber

- Multiple errors in a single word lead to unrecoverable errors
- Scan Memory Bank
- Re-write faulty word if error is detected
- Give priority to external accesses
- Log all corrections (and uncorrectable words)
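
The scrubbing loop can be sketched as a small state machine that checks one word per idle cycle. This is an illustrative model only: the `Scrubber` class and its interface are hypothetical, and the demo uses a deliberately simple stand-in code (three copies of each word) in place of the real ECC so that the example stays short.

```python
class Scrubber:
    """Walks a memory bank one word per idle cycle and rewrites faulty words."""

    def __init__(self, bank, encode, decode):
        self.bank = bank
        self.encode = encode
        self.decode = decode
        self.next_index = 0
        self.log = []                       # (index, status) for every non-ok word

    def step(self, external_access: bool) -> None:
        if external_access:
            return                          # external accesses have priority
        index = self.next_index
        data, status = self.decode(self.bank[index])
        if status == "corrected":
            self.bank[index] = self.encode(data)   # rewrite the faulty word
        if status != "ok":
            self.log.append((index, status))
        self.next_index = (index + 1) % len(self.bank)


# Stand-in code for the demo (assumption, not the Hsiao ECC): word triplication.
def encode3(data):
    return (data, data, data)


def decode3(stored):
    a, b, c = stored
    if a == b == c:
        return a, "ok"
    if a == b or a == c:
        return a, "corrected"
    if b == c:
        return b, "corrected"
    return a, "uncorrectable"


bank = [encode3(w) for w in (0x11, 0x22, 0x33, 0x44)]
bank[2] = (0x33, 0x37, 0x33)                # one faulty copy: correctable
bank[3] = (0x40, 0x45, 0x46)                # all copies differ: uncorrectable

scrubber = Scrubber(bank, encode3, decode3)
scrubber.step(external_access=True)         # bank busy, the scrubber waits
for _ in range(len(bank)):
    scrubber.step(external_access=False)
```

Scrubbing regularly keeps single errors from accumulating into multi-bit errors in the same word, which is the failure case named in the first bullet.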
Trikarenos – PULPissimo with Reliability

Block diagram:
- RISC-V Cores (instr, data)
- Memory Banks, each with ECC
- Tightly Coupled Data Memory Interconnect
- Peripheral Interconnect
- µDMA
- JTAG, SPI, I²C, UART, SDIO
- SoC Control, Timer, GPIO, Debug
- Interrupt Control


Practical Dependable RISC-V - Michael Rogenmoser - ETS23 - 24.5.23
Trikarenos – ASIC implementation

- TSMC 28HPC+
  - Shown to have high TID tolerance

- 2 mm² @ 250 MHz

- 3 separate Ibex cores

- 256 KiB Memory in 8 word-interleaved banks
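
As a worked example of word interleaving, the sketch below maps a byte address to a bank and a row within that bank. It assumes 32-bit words, a memory base address of 0, and that the low-order word-address bits select the bank; the slides do not state the actual address mapping, and `locate` is a hypothetical name.

```python
WORD_BYTES = 4                    # assumption: 32-bit words
NUM_BANKS = 8
MEM_BYTES = 256 * 1024            # 256 KiB in total
WORDS_PER_BANK = MEM_BYTES // WORD_BYTES // NUM_BANKS


def locate(addr: int) -> tuple[int, int]:
    """Map a byte address to (bank, row) for word-interleaved banks."""
    word = addr // WORD_BYTES
    return word % NUM_BANKS, word // NUM_BANKS


first = locate(0x00)              # bank 0, row 0
second = locate(0x04)             # next word lands in bank 1
wrapped = locate(0x20)            # ninth word wraps around to bank 0, row 1
last = locate(MEM_BYTES - WORD_BYTES)
```

Under these assumptions each bank holds 8192 words (32 KiB), and consecutive words fall into different banks, so sequential accesses are spread across all eight banks.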
Internal structure

- Spatially separated cores with a keepout zone

- Legend:
  - Cores
  - HMR Unit
  - Memory (w/ ECC en-/decode)
  - Interconnect
  - Debugger
  - Logging & control registers, ROM, ...
  - I/O
Power Consumption at max Frequency

Figure: power consumption vs. core voltage for four core states (Parallel Cores, Locked Core, Single Core, Idle). Power consumption rises with higher voltage and with parallel cores; the lockstep configurations show a 3x increase in power consumption compared to the single core.
Open points – PULP is a playground

Block diagram:
- RISC-V Cores (instr, data)
- Memory Banks, each with ECC
- Tightly Coupled Data Memory Interconnect
- Peripheral Interconnect
- SoC Control, Timer, GPIO, Debug
- µDMA
- JTAG, SPI, I²C, UART, SDIO
- Interrupt Control
Open points – PULP is a playground

Block diagram: same components as on the previous slide, with additional ECC labels.

Luca Benini, Ahmad Mirsalari, Alessandro Capotondi, Alessandro Nadalini, Alessandro Ottaviano, Alessio Burrello, Alfio Di Mauro, Andrea Borghesi, Andrea Cossettini, Angelo Garofalo, Arpan Prasad, Chi Zhang, Corrado Bonfanti, Cristian Cioflan, Cyril Koenig, Daniele Palossi, Davide Rossi, Fabio Montagna, Florian Glaser, Francesco Conti, Georg Rutishauser, Germain Haugou, Gianna Paulin, Giuseppe Tagliavini, Hanna Müller, Jannis Schoenleber, Lorenzo Lamberti, Luca Bertaccini, Luca Colagrande, Luca Valente, Maicol Caini, Manuel Eggimann, Manuele Rusci, Marco Bertuletti, Marco Guermandi, Matheus Cavalcante, Matteo Perotti, Mattia Sinigaglia, Michael Rogenmoser, Moritz Scherer, Moritz Schneider, Nazareno Bruschi, Nils Wistoff, Paul Scheffler, Philipp Mayer, Robert Balas, Samuel Riedel, Sergio Mazzola, Sergei Vostrikov, Simone Benatti, Thomas Benz, Thorir Ingolfsson, Tim Fischer, Victor Javier Kartsch Morinigo, Victor Jung, Viviane Potocnik, Vlad Niculescu, Xiaying Wang, Yichao Zhang, Yvan Tortorella, Frank K. Gürkaynak, all our past collaborators and many more that we forgot to mention

http://pulp-platform.org @pulp_platform