# **AXI-PACK: Near-Memory Bus Packing for Bandwidth-Efficient Irregular Workloads**

Chi Zhang<sup>1</sup>, Paul Scheffler<sup>1</sup>, Thomas Benz<sup>1</sup>, Matteo Perotti<sup>1</sup>, Luca Benini<sup>1,2</sup>

<sup>1</sup>Integrated Systems Laboratory, ETH Zurich; <sup>2</sup>DEI, University of Bologna

### 1 Challenge: irregular memory accesses

### Applications

- Graph analytics
- Fluid dynamics
- Recommender systems

Strided access: `for (i = 0; i < 8; i++) var += Data[Stride * i];`

`for (i = 0; i < 8; i++)`
- Challenge to processors
  - Poor bus utilization
  - Cache thrashing
  - Long latencies
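
To make the bus-utilization problem concrete, the following minimal sketch estimates how much of a wide data bus carries useful data for the strided loop above. The 32-bit element size, the 256-bit bus width, and the stride of 8 elements are illustrative assumptions rather than figures from this poster, and the model assumes aligned elements and one full bus beat per distinct bus-aligned word.

```python
def strided_addresses(base: int, stride: int, count: int, elem_bytes: int) -> list:
    """Byte addresses touched by `var += Data[stride * i]` for i in range(count)."""
    return [base + stride * i * elem_bytes for i in range(count)]


def bus_utilization(addresses: list, elem_bytes: int, bus_bytes: int) -> float:
    """Useful bytes divided by bytes moved, assuming one full bus beat is
    transferred for every distinct bus-aligned word that holds an element."""
    beats = {addr // bus_bytes for addr in addresses}
    return len(addresses) * elem_bytes / (len(beats) * bus_bytes)


# Assumed parameters: 32-bit elements, 256-bit (32-byte) bus, stride of 8 elements.
addrs = strided_addresses(base=0, stride=8, count=8, elem_bytes=4)
print(bus_utilization(addrs, elem_bytes=4, bus_bytes=32))  # 0.125
```

Under these assumptions every element lands in its own bus word, so only one eighth of the transferred data is useful; a unit stride would reach full utilization in the same model.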

### 3 Proposal: AXI-Pack on-chip protocol

Extends the Advanced eXtensible Interface 4 (AXI4)<sup>3</sup> on-chip protocol


- Core side issues pattern-aware requests
- Memory side responds with a densely packed stream
- Features
  - End-to-end irregular streaming
  - Process-In-Memory
  - **Bus-packing**
  - **Standard based**
  - **Backward compatible**
  - Transparent
  - **Scalable**
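
The request and response flow described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the AXI-Pack signal definition: the `StridedRequest` class, its field names, and the zero-padding of the last beat are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StridedRequest:
    """Hypothetical pattern-aware read request (field names are illustrative)."""
    base: int          # byte address of the first element
    stride_bytes: int  # distance between consecutive elements
    elem_bytes: int    # size of one element
    num_elems: int     # number of elements requested


def pack_strided(memory: bytes, req: StridedRequest, bus_bytes: int) -> List[bytes]:
    """Gather the requested elements at the memory side and return them as
    densely packed bus beats (the last beat is zero-padded)."""
    stream = b"".join(
        memory[req.base + i * req.stride_bytes:
               req.base + i * req.stride_bytes + req.elem_bytes]
        for i in range(req.num_elems)
    )
    stream += bytes(-len(stream) % bus_bytes)  # pad to a whole number of beats
    return [stream[i:i + bus_bytes] for i in range(0, len(stream), bus_bytes)]


memory = bytes(range(256))
beats = pack_strided(memory, StridedRequest(0, 32, 4, 8), bus_bytes=32)
print(len(beats))  # 1
```

In this model, eight 4-byte elements spaced 32 bytes apart return in a single 32-byte beat, whereas fetching each element with its own narrow access would occupy eight beats.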

*Figure: cores connected to memory over AXI-Pack (labels: addr, size, len, speed up).*

### 2 State-of-the-art solutions

- Core-side stream ISA extensions<sup>1</sup>
  - + Decouple computation from memory access to hide latency
  - − Inherent inefficiency of narrow bus accesses
  - − High index-fetching overhead and bus traffic
- Memory-side extensions<sup>2</sup>
  - + Prefetch and pack irregular elements at the memory side
  - − Not well co-integrated with the core side
  - − Non-standard solutions

### 4 Design work

- Define AXI4 user extension (7% more bits)
- Extend a RISC-V vector processor
- Design an AXI-Pack adapter for banked memory
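
As a rough illustration of what a memory-side adapter has to do for indirect accesses, the sketch below reads an index array and gathers the indexed elements into packed beats. It is a behavioral model only: the function name, the 32-bit little-endian indices and elements, and the 32-byte bus are assumptions made here, and bank arbitration is not modeled.

```python
from typing import List


def pack_indirect(memory: bytes, data_base: int, idx_base: int, num_elems: int,
                  elem_bytes: int = 4, idx_bytes: int = 4,
                  bus_bytes: int = 32) -> List[bytes]:
    """Hypothetical memory-side gather: fetch indices, then pack Data[Index[i]]."""
    # Step 1: read the index array where it lives, next to the memory.
    indices = [
        int.from_bytes(memory[idx_base + i * idx_bytes:
                              idx_base + (i + 1) * idx_bytes], "little")
        for i in range(num_elems)
    ]
    # Step 2: gather the indexed elements into one dense byte stream.
    stream = b"".join(
        memory[data_base + k * elem_bytes: data_base + (k + 1) * elem_bytes]
        for k in indices
    )
    # Step 3: zero-pad and split into bus beats.
    stream += bytes(-len(stream) % bus_bytes)
    return [stream[i:i + bus_bytes] for i in range(0, len(stream), bus_bytes)]


def words(values) -> bytes:
    """Encode integers as 32-bit little-endian words."""
    return b"".join(v.to_bytes(4, "little") for v in values)


data = words(range(0, 160, 10))   # 16 elements: 0, 10, ..., 150 (64 bytes)
index = words([3, 0, 7, 1])       # index array stored at byte offset 64
memory = data + index
beats = pack_indirect(memory, data_base=0, idx_base=64, num_elems=4)
gathered = [int.from_bytes(beats[0][i:i + 4], "little") for i in range(0, 16, 4)]
print(gathered)  # [30, 0, 70, 10]
```

Because the indices are consumed at the memory side in this model, neither the index array nor the sparse element fetches cross the bus; only the packed result does.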

*Figure: AR/AW user signal layout (field labels: pack, indir, idx size, stride, idx base offs).*



### 5 Results

- Speed up irregular workloads
  - 5.4x (stride)
  - 2.4x (indirect)
- Lightweight and scalable
  - 6.2% extension area
- Improve energy efficiency
  - 5.3x (stride)
  - 2.1x (indirect)



*Figure: R utilization compared with the baseline (base).*

#### References

1. Domingos, Joao Mario, et al. "Unlimited vector extension with data streaming support." ISCA. 2021.
2. Tanabe, Noboru, et al. "A memory accelerator with gather functions for bandwidth-bound irregular applications." Proceedings of the 1st Workshop on Irregular Applications: Architectures and Algorithms. 2011.
3. Arm. "AMBA AXI and ACE Protocol Specification." https://developer.arm.com/documentation/ihi0022/hc.