Working with RISC-V
from open ISA to open Architecture to open Hardware

Part 1 of 5: Introduction to RISC-V ISA

Luca Benini <luca.benini@unibo.it>
Davide Rossi <davide.rossi@unibo.it>
Summary

- Part 1 – Introduction to RISC-V ISA
  - What RISC-V is about
  - Description of ISA, and basic principles
  - Simple 32b implementation (Ibex by lowRISC)
  - How to extend the ISA (CV32E40P by OpenHW Group)

- Part 2 – Advanced RISC-V Architectures

- Part 3 – PULP concepts

- Part 4 – PULP Extensions and Accelerators

- Part 5 – PULP based chips
RISC-V Instruction Set Architecture

- Started by UC-Berkeley in 2010
- Contract between SW and HW
  - Partitioned into user and privileged spec
  - External Debug
- Standard governed by RISC-V foundation
  - ETHZ is a founding member of the foundation
  - Necessary for the continuity
- Defines 32, 64 and 128 bit ISA
  - No implementation, just the ISA
  - Different implementations (both open and closed source)
- At ETHZ+UNIBO we specialize in efficient implementations of RISC-V cores
RISC-V essentially maintains a PDF document

Please note, RISC-V ISA and related specifications are developed, ratified and maintained by RISC-V International contributing members within the RISC-V International Technical Committee. Operating details of the Technical Committee can be found in the RISC-V International Tech Group. Work on the specification is performed on GitHub and the GitHub issue mechanism can be used to provide input into the specification.

**ISA Specification**

The specifications shown below represent the current, ratified releases:

- Volume 1, Unprivileged Spec v. 20191213 [PDF] [GitHub (latest)]
- Volume 2, Privileged Spec v. 20190608 [PDF] [GitHub (latest)]

**Debug Specification**

- External Debug Support v. 0.13.2 [PDF]
The ISA defines the instructions that a processor uses

C++ program translated to RISC-V instructions defined by ISA.

This will run on ANY RISC-V implementation

Screen shot from the excellent Compiler Explorer by Matt Godbolt
https://godbolt.org/
RISC-V Ecosystem

- Binutils – upstream
- GCC – upstream
- LLVM – upstream
- Simulator:
  - "Spike" (reference)
  - QEMU, gem5
- OpenOCD
- OS
  - Linux, seL4, FreeRTOS, Zephyr
- Runtimes
  - Jikes, OCaml, Go
- SW maintained by different parties
  - Binutils and GCC by SiFive, a Berkeley start-up

See https://github.com/riscv/riscv-wiki/wiki/RISC-V-Software-Status for an updated list
RISC-V ISA is divided into extensions

- Kept very simple and extendable
  - Wide range of applications from IoT to HPC
- RV + word-width + extensions
  - RV32IMC: 32bit, integer, multiplication, compressed
- User specification:
  - Separated into extensions, only I is mandatory
- Privileged Specification (WIP):
  - Governs OS functionality: Exceptions, Interrupts
  - Virtual Addressing
  - Privilege Levels

<table>
<thead>
<tr>
<th>Extension</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>I</td>
<td>Integer instructions (frozen)</td>
</tr>
<tr>
<td>E</td>
<td>Reduced number of registers</td>
</tr>
<tr>
<td>M</td>
<td>Multiplication and Division (frozen)</td>
</tr>
<tr>
<td>A</td>
<td>Atomic instructions (frozen)</td>
</tr>
<tr>
<td>F</td>
<td>Single-Precision Floating-Point (frozen)</td>
</tr>
<tr>
<td>D</td>
<td>Double-Precision Floating-Point (frozen)</td>
</tr>
<tr>
<td>C</td>
<td>Compressed Instructions (frozen)</td>
</tr>
<tr>
<td>X</td>
<td>Non Standard Extensions</td>
</tr>
</tbody>
</table>
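
The naming scheme "RV + word-width + extensions" can be sketched as a small parser. This is only an illustration: `parse_isa_string` and `KNOWN_EXTENSIONS` are hypothetical names, and the sketch assumes single-letter standard extensions only (a non-standard suffix such as `Xpulp` is deliberately rejected).

```python
import re

# Single-letter standard extensions from the table above.
KNOWN_EXTENSIONS = {
    "I": "Integer instructions",
    "E": "Reduced number of registers",
    "M": "Multiplication and Division",
    "A": "Atomic instructions",
    "F": "Single-Precision Floating-Point",
    "D": "Double-Precision Floating-Point",
    "C": "Compressed Instructions",
}


def parse_isa_string(name):
    """Split a name such as 'RV32IMC' into (32, ['I', 'M', 'C']).

    Hypothetical helper: it only understands single-letter extensions.
    """
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a simple RISC-V ISA string: {name!r}")
    width = int(match.group(1))
    letters = list(match.group(2))
    unknown = [c for c in letters if c not in KNOWN_EXTENSIONS]
    if unknown:
        raise ValueError(f"unknown extension letters: {unknown}")
    if letters[0] not in ("I", "E"):
        raise ValueError("the first letter must be the base: I or E")
    return width, letters


if __name__ == "__main__":
    print(parse_isa_string("RV32IMC"))  # (32, ['I', 'M', 'C'])
```

With this sketch, `RV32EC` parses to a 32-bit core with the reduced register set and compressed instructions.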
Work continues on new RISC-V extensions

- Foundation members work in task-groups
- Dedicated task-groups
  - Formal specification
  - Memory Model
  - Marketing
  - External Debug Specification
- ETH Zurich also contributes
  - Bit manipulation
  - Packed SIMD, DSP

<table>
<thead>
<tr>
<th>Extension</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Q</td>
<td>Quad-precision Floating-Point</td>
</tr>
<tr>
<td>L</td>
<td>Decimal Floating Point</td>
</tr>
<tr>
<td>B</td>
<td>Bit Manipulation</td>
</tr>
<tr>
<td>T</td>
<td>Transactional Memory</td>
</tr>
<tr>
<td>P</td>
<td>Packed SIMD</td>
</tr>
<tr>
<td>J</td>
<td>Dynamically Translated Languages</td>
</tr>
<tr>
<td>V</td>
<td>Vector Operations</td>
</tr>
<tr>
<td>N</td>
<td>User-Level Interrupts</td>
</tr>
</tbody>
</table>
What is so special about RISC-V

- It is FREE
- Everybody can build, sell, and make RISC-V cores available
- It is a modern design, no historical baggage
- Some of the more common ISAs (ARM, Intel...) have been around for 20+ years
- Newer implementations still need to be compatible with older designs.
- RISC-V benefited from the mistakes made by others, cleaner design
- Major design decisions have been properly motivated and explained
- Reserved space for extensions, modular
- Open standard, you can help decide how it is developed

RISC-V base ISAs have either little-endian or big-endian memory systems, with the privileged architecture further defining bi-endian operation. Instructions are stored in memory as a sequence of 16-bit little-endian parcels, regardless of memory system endianness. Parcels forming one instruction are stored at increasing halfword addresses, with the lowest-addressed parcel holding the lowest-numbered bits in the instruction specification.

We originally chose little-endian byte ordering for the RISC-V memory system because little-endian systems are currently dominant commercially (all x86 systems; iOS, Android, and Windows for ARM). A minor point is that we have also found little-endian memory systems to be more natural for hardware designers. However, certain application areas, such as IP networking,
The FREEDOM in RISC-V is implementation

- You can access all ISAs without (many) restrictions
  - SW tools need to be developed so that they can generate code for that ISA
- Most ISAs are closed. Only specific vendors can implement it
  - To use a core that implements an ISA, you have to license/buy it from vendor
  - Open source SW (for the ISA) is possible but building HW is not allowed
Are RISC-V processors better than XYZ?

- Actual performance depends on the implementation
  - RISC-V does not specify implementation details (on purpose)

- Modern design, should deliver comparable performance
  - If implemented well, it should perform as well as other modern ISA implementations
  - In our experiments, we see no major weaknesses when compared to other ISAs
  - It also is not magically 2x better

- High-end processor performance is not so much about ISA
  - Implementation “details” like microarchitecture, memory hierarchy, target technology, power management are more important.
What is not so good about RISC-V?

- Still in development
  - Some standards (privilege, vector, debug etc.) still being refined, adjusted.
  - Tools and development environments need to catch up.

- No canonical implementation ("the" RISC-V core)
  - It is free to implement, so many people did so, resulting in many cores

- Higher end (out of order, superscalar) cores not yet mature
  - In theory there is nothing to prevent a RISC-V based Linux laptop.
  - It will take some more time until RISC-V implementations can compete with other commercial processors (which needed hundreds of man months of work)
  - Getting there (Alibaba XT910, SiFive P550, Esperanto ET-Maxion, Semidynamics Avispado, Rivos ??? and more coming every day!)
Reduced Instruction Set: all in one page

[Figure: the RISC-V Reference Card on a single page, with color-coded panels for Basic Instructions (I), Privilege Mode, Compressed Instructions (C), Atomic Extensions (A), Multiply/Divide (M), and Floating Point Extensions.]

**Source:** RISC-V Reference Card.
RISC-V Architectural State

- There are 32 registers, each 32 / 64 / 128 bits long
  - Named x0 to x31
  - x0 is hard wired to zero
  - There is a standard ‘E’ extension that uses only 16 registers (RV32E)
- In addition one program counter (PC)
  - Byte-based addressing; the program counter advances by the instruction length (4 bytes for a standard 32-bit instruction, 2 for a compressed one)
- For floating point operation 32 additional FP registers
- Additional Control Status Registers (CSRs)
  - Encoding space for up to 4,096 CSRs is reserved; not all are used.
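
The integer state described above can be modelled in a few lines. The class below is a toy sketch, not taken from any real simulator; `RegisterFile` and its method names are made up for illustration, and it assumes a 32-bit register width by default.

```python
class RegisterFile:
    """Toy model of the integer architectural state: x0..x31 plus the PC."""

    def __init__(self, xlen=32):
        self.mask = (1 << xlen) - 1  # keeps values within the register width
        self.x = [0] * 32
        self.pc = 0

    def read(self, index):
        return self.x[index]

    def write(self, index, value):
        # Writes to x0 are discarded, so x0 always reads as zero.
        if index != 0:
            self.x[index] = value & self.mask


if __name__ == "__main__":
    regs = RegisterFile()
    regs.write(0, 123)
    regs.write(5, -1)
    print(regs.read(0), hex(regs.read(5)))  # 0 0xffffffff
```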
RISC-V instructions: four basic types

- **R** register to register operations
- **I** operations with immediate/constant values
- **S / SB** operations with two source registers
- **U / UJ** operations with large immediate/constant value

### R-type

<table>
<thead>
<tr>
<th>31:25</th>
<th>24:20</th>
<th>19:15</th>
<th>14:12</th>
<th>11:7</th>
<th>6:0</th>
</tr>
</thead>
<tbody>
<tr>
<td>funct7</td>
<td>rs2</td>
<td>rs1</td>
<td>funct3</td>
<td>rd</td>
<td>opcode</td>
</tr>
</tbody>
</table>

### I-type

<table>
<thead>
<tr>
<th>31:20</th>
<th>19:15</th>
<th>14:12</th>
<th>11:7</th>
<th>6:0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>funct3</td>
<td>rd</td>
<td>opcode</td>
</tr>
</tbody>
</table>

### S-type

<table>
<thead>
<tr>
<th>31:25</th>
<th>24:20</th>
<th>19:15</th>
<th>14:12</th>
<th>11:7</th>
<th>6:0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm[11:5]</td>
<td>rs2</td>
<td>rs1</td>
<td>funct3</td>
<td>imm[4:0]</td>
<td>opcode</td>
</tr>
</tbody>
</table>

### U-type

<table>
<thead>
<tr>
<th>31:12</th>
<th>11:7</th>
<th>6:0</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm[31:12]</td>
<td>rd</td>
<td>opcode</td>
</tr>
</tbody>
</table>
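
Because the register fields sit at fixed bit positions, decoding is a matter of shifts and masks. The sketch below splits an R-type word into its fields (opcode in bits 6:0, rd in 11:7, funct3 in 14:12, rs1 in 19:15, rs2 in 24:20, funct7 in 31:25); `decode_r_type` is a hypothetical helper name.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,          # bits 6:0
        "rd": (word >> 7) & 0x1F,       # bits 11:7
        "funct3": (word >> 12) & 0x7,   # bits 14:12
        "rs1": (word >> 15) & 0x1F,     # bits 19:15
        "rs2": (word >> 20) & 0x1F,     # bits 24:20
        "funct7": (word >> 25) & 0x7F,  # bits 31:25
    }


if __name__ == "__main__":
    # 0x007302B3 is the encoding of `add x5, x6, x7`.
    print(decode_r_type(0x007302B3))
```

For `add x5, x6, x7` this yields rd = 5, rs1 = 6, rs2 = 7, with funct3 and funct7 both zero and opcode 0x33.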
Encoding of the instructions, main groups

- **Reserved** opcodes for standard extensions
- Rest of opcodes free for **custom** implementations
- Standard extensions will be frozen/not change in the future

<table>
<thead>
<tr>
<th>inst[6:5] \ inst[4:2]</th>
<th>000</th>
<th>001</th>
<th>010</th>
<th>011</th>
<th>100</th>
<th>101</th>
<th>110</th>
<th>111 (&gt; 32b)</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>LOAD</td>
<td>LOAD-FP</td>
<td>custom-0</td>
<td>MISC-MEM</td>
<td>OP-IMM</td>
<td>AUIPC</td>
<td>OP-IMM-32</td>
<td>48b</td>
</tr>
<tr>
<td>01</td>
<td>STORE</td>
<td>STORE-FP</td>
<td>custom-1</td>
<td>AMO</td>
<td>OP</td>
<td>LUI</td>
<td>OP-32</td>
<td>64b</td>
</tr>
<tr>
<td>10</td>
<td>MADD</td>
<td>MSUB</td>
<td>NMSUB</td>
<td>NMADD</td>
<td>OP-FP</td>
<td>reserved</td>
<td>custom-2/rv128</td>
<td>48b</td>
</tr>
<tr>
<td>11</td>
<td>BRANCH</td>
<td>JALR</td>
<td>reserved</td>
<td>JAL</td>
<td>SYSTEM</td>
<td>reserved</td>
<td>custom-3/rv128</td>
<td>≥ 80b</td>
</tr>
</tbody>
</table>
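
The opcode map can be read as a two-dimensional lookup: inst[6:5] selects the row and inst[4:2] the column. A minimal sketch, assuming a 32-bit instruction word as input; `MAJOR_OPCODES` and `major_opcode` are illustrative names.

```python
# Rows are inst[6:5] = 00, 01, 10, 11; columns are inst[4:2] = 000 .. 110.
MAJOR_OPCODES = [
    ["LOAD", "LOAD-FP", "custom-0", "MISC-MEM", "OP-IMM", "AUIPC", "OP-IMM-32"],
    ["STORE", "STORE-FP", "custom-1", "AMO", "OP", "LUI", "OP-32"],
    ["MADD", "MSUB", "NMSUB", "NMADD", "OP-FP", "reserved", "custom-2/rv128"],
    ["BRANCH", "JALR", "reserved", "JAL", "SYSTEM", "reserved", "custom-3/rv128"],
]


def major_opcode(word):
    """Return the opcode-map entry for an instruction word."""
    if word & 0b11 != 0b11:
        return "compressed (16-bit)"
    column = (word >> 2) & 0b111
    if column == 0b111:
        return "longer than 32 bits"
    row = (word >> 5) & 0b11
    return MAJOR_OPCODES[row][column]


if __name__ == "__main__":
    print(major_opcode(0x007302B3))  # OP (this word is `add x5, x6, x7`)
```

The custom-0 to custom-3 entries are the slots a non-standard (X) extension would decode into.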
RISC-V is a load/store architecture

- All operations are on internal registers
  - Can not manipulate data in memory directly
- Load instructions to copy from memory to registers
- R-type or I-type instructions to operate on them
- Store instructions to copy from registers back to memory
- Branch and Jump instructions
Constants (Immediates) in Instructions

- In 32-bit instructions, it is not possible to embed a full 32-bit constant
  - Constants are distributed across instruction fields, and then sign-extended
  - The Load Upper Immediate (`lui`) instruction sets the upper 20 bits, so larger constants are assembled in two steps

- Instruction types according to immediate encoding
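
As a worked example of assembling a constant, the sketch below splits a 32-bit value into the 20-bit part for `lui` and the 12-bit part for a following `addi`. Since `addi` sign-extends its immediate, the upper part has to be bumped by one whenever the lower 12 bits have their top bit set. The function names are hypothetical, and RV32 wrap-around is assumed.

```python
def split_constant(value):
    """Return (upper20, lower12) so that lui + addi rebuild a 32-bit value."""
    value &= 0xFFFFFFFF
    lower = value & 0xFFF
    if lower >= 0x800:
        lower -= 0x1000  # addi sign-extends, so treat the low part as negative
    upper = ((value - lower) >> 12) & 0xFFFFF
    return upper, lower


def rebuild(upper, lower):
    """What `lui rd, upper` followed by `addi rd, rd, lower` computes on RV32."""
    return ((upper << 12) + lower) & 0xFFFFFFFF


if __name__ == "__main__":
    upper, lower = split_constant(0xDEADBEEF)
    print(hex(upper), lower)  # 0xdeadc -273
```

For 0xDEADBEEF the upper part becomes 0xDEADC rather than 0xDEADB, which compensates for the negative low part.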
Load from memory (ld), how immediates work

`ld x9, 64(x22)`

- Not possible to fit a 32b address in 32b encoding directly
  - Take the content in source (rs1), add the immediate (imm) to it. This is the address
  - Read from this address in the memory and load into the destination (rd) register

- RISC-V tries to minimize number of instructions
  - The ld instruction seems overly complicated, but you can use this for everything
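
A short sketch of the two steps for `ld x9, 64(x22)`: packing the instruction into an I-type word, and computing the address the load will access. The opcode and funct3 values are the standard ones for `ld`; the helper names and the base address used in the example are assumptions for illustration.

```python
LOAD_OPCODE = 0b0000011  # major opcode LOAD
FUNCT3_LD = 0b011        # doubleword load


def encode_ld(rd, rs1, imm):
    """Encode `ld rd, imm(rs1)` as a 32-bit I-type word."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediates are 12-bit signed values")
    return (((imm & 0xFFF) << 20) | (rs1 << 15) | (FUNCT3_LD << 12)
            | (rd << 7) | LOAD_OPCODE)


def effective_address(base, imm):
    """Address accessed by the load: content of rs1 plus the immediate (RV64)."""
    return (base + imm) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    print(hex(encode_ld(rd=9, rs1=22, imm=64)))  # 0x40b3483
    # Assuming x22 holds 0x1000, the load reads from 0x1040.
    print(hex(effective_address(0x1000, 64)))    # 0x1040
```

Setting the immediate to zero gives a plain register-indirect load, and using x0 as the base gives an absolute address in the lowest or highest 2 KiB, which is why one instruction covers all these cases.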
Branching, how addresses come together

`bne x10, x11, 2000` // if x10 ≠ x11, jump 2000 ahead

- Similar problem: how to encode the jump address in branches
  - Branch on Equal (`beq`) and Branch on Not Equal (`bne`)
  - They use the B-type (SB) format and need two source registers

- Jumps are relative to the Program Counter (PC)
  - The immediate (constant) shows how far we have to jump (PC-relative addressing)
  - Works for targets within about ±4 KiB of the PC. To branch further, we need several instructions.
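
The branch offset is scattered over the instruction word, which the following sketch makes explicit for `bne`. It assumes the standard B-type layout (imm[12] in bit 31, imm[10:5] in bits 30:25, imm[4:1] in bits 11:8, imm[11] in bit 7); bit 0 of the offset is never stored because targets are 2-byte aligned. The function names are illustrative.

```python
BRANCH_OPCODE = 0b1100011
FUNCT3_BNE = 0b001


def encode_bne(rs1, rs2, offset):
    """Encode `bne rs1, rs2, offset` with a PC-relative byte offset."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("branch offset must be even and within -4096..4094")
    imm = offset & 0x1FFF
    return ((((imm >> 12) & 0x1) << 31)    # imm[12]
            | (((imm >> 5) & 0x3F) << 25)  # imm[10:5]
            | (rs2 << 20)
            | (rs1 << 15)
            | (FUNCT3_BNE << 12)
            | (((imm >> 1) & 0xF) << 8)    # imm[4:1]
            | (((imm >> 11) & 0x1) << 7)   # imm[11]
            | BRANCH_OPCODE)


def decode_branch_offset(word):
    """Recover the signed byte offset from a B-type instruction word."""
    imm = ((((word >> 31) & 0x1) << 12)
           | (((word >> 7) & 0x1) << 11)
           | (((word >> 25) & 0x3F) << 5)
           | (((word >> 8) & 0xF) << 1))
    return imm - 0x2000 if imm & 0x1000 else imm


if __name__ == "__main__":
    word = encode_bne(10, 11, 2000)
    print(hex(word), decode_branch_offset(word))  # 0x7cb51863 2000
```

An offset of 4096 or more is rejected, which is the case where a longer instruction sequence is needed.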
RISC-V Instruction Length is Encoded

- The least-significant bits of the instruction tell how long the instruction is
- Supports instructions of 16, 32, 48, 64, 80, 96, …, 320 bits
  - Allows RISC-V to have Compressed instructions

```
Byte address:  base+4           base+2           base

                                xxxxxxxxxxxxxxaa   16-bit (aa ≠ 11)
               xxxxxxxxxxxxxxxx xxxxxxxxxxxbbb11   32-bit (bbb ≠ 111)
.....xxxx      xxxxxxxxxxxxxxxx xxxxxxxxxx011111   48-bit
.....xxxx      xxxxxxxxxxxxxxxx xxxxxxxxx0111111   64-bit
.....xxxx      xxxxxxxxxxxxxxxx xxxxxnnnn1111111   (80+16*nnnn)-bit, nnnn ≠ 1111
.....xxxx      xxxxxxxxxxxxxxxx xxxxx11111111111   Reserved for ≥320-bits
```
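
A fetch unit only needs the lowest 16-bit parcel to find the instruction length. The sketch below handles the 16-, 32-, 48- and 64-bit cases and returns `None` for anything longer, so it makes no assumption about how the longest encodings are allocated. `instruction_length` is a hypothetical helper name.

```python
def instruction_length(first_parcel):
    """Return the instruction length in bits, given its lowest 16-bit parcel.

    Returns None for the longer (80-bit and up) or reserved encodings.
    """
    if first_parcel & 0b11 != 0b11:
        return 16  # lowest two bits are not 11: compressed instruction
    if first_parcel & 0b11100 != 0b11100:
        return 32  # bits 4:2 are not 111: standard 32-bit instruction
    if first_parcel & 0b111111 == 0b011111:
        return 48
    if first_parcel & 0b1111111 == 0b0111111:
        return 64
    return None


if __name__ == "__main__":
    print(instruction_length(0x0001))  # 16
    print(instruction_length(0x02B3))  # 32
```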
Compressed Instruction extension ‘C’

- Use 16-bit instructions for common operations
  - Code size reduction by 34%
  - Compressed instructions increase fetch-bandwidth
  - Allow for macro-op fusion of common patterns

- x86-64: 3.71 bytes / instruction
- RV64IC: 3.00 bytes / instruction
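
To relate the bytes-per-instruction figure to the share of compressed instructions, a back-of-the-envelope sketch helps. It assumes the code contains only 16-bit and 32-bit instructions; both function names are made up for this example.

```python
def average_instruction_bytes(compressed_fraction):
    """Mean instruction size when the given fraction of instructions is 16-bit."""
    if not 0.0 <= compressed_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return 2.0 * compressed_fraction + 4.0 * (1.0 - compressed_fraction)


def compressed_fraction_for(average_bytes):
    """Inverse relation: share of 16-bit instructions implied by a mean size."""
    return (4.0 - average_bytes) / 2.0


if __name__ == "__main__":
    print(compressed_fraction_for(3.00))  # 0.5
```

Under that assumption, 3.00 bytes per instruction corresponds to half of the instructions being compressed.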
So, how to build RISC-V cores?

- **RISC-V ISA tells you the function**
  - You know which instructions are supported
  - How they are encoded
  - What they are supposed to do

- **It does not tell you any implementation details**
  - Pipeline stages, memory hierarchy, computation units, in-order or out-of-order
  - Everyone is free to figure out how to best implement these

- **Need to come up with a micro-architecture to implement it**
  - Determine which standard extensions are supported, how
  - Choose a micro-architecture that fits performance requirements
What are the Performance Metrics

- **Area**
  - in kGE equivalent (# of simple logic gates) or mm² (technology dependent)

- **Frequency:**
  - Depends on # of gates on longest path

- **Power:**
  - Strongly depends on the above metrics
  - **Leakage:** dissipated even when not working (Area)
  - **Dynamic Power:** dissipated on logic transitions (frequency and area)

- **CPU Design:**
  - **IPC** (Instructions per cycle)
    - IPC implicitly measured in commonly used benchmarks (Coremark, Dhrystone, SpecInt)
  - **Energy Efficiency:** OPs/Joule

- **Hardware Designer**
  - Tries to find a good balance
  - Application dependent
    - IoT and HPC have different requirements
    - One size does not fit all
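
The metrics above are linked by simple relations, sketched here under idealized assumptions (constant IPC, constant power while running). All function names and the numbers in the example are hypothetical and not measurements of any core.

```python
def runtime_seconds(instructions, ipc, frequency_hz):
    """Execution time: instruction count divided by instructions per second."""
    return instructions / (ipc * frequency_hz)


def energy_joules(power_watts, seconds):
    """Energy is average power multiplied by the time spent running."""
    return power_watts * seconds


def ops_per_joule(operations, power_watts, seconds):
    """Energy efficiency: useful operations per unit of energy."""
    return operations / energy_joules(power_watts, seconds)


if __name__ == "__main__":
    # Made-up workload: 55 million instructions at IPC 1.0 and 55 MHz.
    t = runtime_seconds(55_000_000, ipc=1.0, frequency_hz=55_000_000)
    print(t)  # 1.0
```

A core with lower power but much lower IPC can still lose on energy, because it has to stay active for longer.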
RISC-V cores developed at ETH Zurich

<table>
<thead>
<tr>
<th>32 bit</th>
<th>64 bit</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Low Cost Core</strong></td>
<td><strong>Linux capable Core</strong></td>
</tr>
<tr>
<td>- Zero-riscy RV32-ICM</td>
<td>- Ariane RV64-IC(IMA)</td>
</tr>
<tr>
<td>- Micro-riscy RV32-EC</td>
<td>- Full privileged specification</td>
</tr>
<tr>
<td><strong>DSP Enhanced Core</strong></td>
<td><strong>CV64A by OpenHW</strong></td>
</tr>
<tr>
<td>- RI5CY RV32-ICMFX</td>
<td></td>
</tr>
<tr>
<td>- SIMD Hardware instructions Fixed Point</td>
<td></td>
</tr>
<tr>
<td><strong>Streaming Compute Core</strong></td>
<td></td>
</tr>
<tr>
<td>- Snitch RV32-ICMDFX</td>
<td></td>
</tr>
</tbody>
</table>

- **Ibex by lowRISC**
- **CV32E40P by OpenHW**
Zero-riscy / Ibex, small core for control applications

- 2-stage pipeline
- Optimized for area
  - Area:  
    - 19 kGE (Zero-riscy)
    - 12 kGE (Micro-riscy)
  - Critical path:  
    - ~ 30 logic levels
- New name: Ibex
  - lowRISC took over Zero-riscy and Micro-riscy in 2019

- Two Configurations:
  - **Zero-riscy**: RV32IMC (2.44 Coremark/MHz)
    - 32 registers, hardware multiplier
  - **Micro-riscy**: RV32EC (0.91 Coremark/MHz)
    - 16 registers (E), software emulated multiplier

---

Ibex continues to grow with lowRISC

Ibex is a small and efficient, 32-bit, in-order RISC-V core with a 2-stage (or optionally 3-stage) pipeline that implements the RV32IMCB instruction set architecture.

Since being contributed to lowRISC by ETH Zürich, it has seen substantial investment of development effort.
Roadmap of Ibex (lowRISC)

**Stabilisation** 19Q3-19Q4
- RISC-V specification conformance
- Code clean up and refactoring (~50% LoC changed)
- CI & DV (riscv-dv, Google)

**Perf phase 1** 20Q1
- Branch target ALU
- Third pipeline stage
- Single-cycle MUL
- I$ prototype

**Perf phase 2** 20Q2
- Finalise I$
- Static branch predictor
- Bitmanip ISA extension

**Security hardening phase 1** 20Q2 and **phase 2** 20Q3
- Randomised execution time
- Non-data-dependent fixed execution time
- Parity checks
- Bus scrambling
- CFI (TBD)
- Shadow PMP regs
- Conformance to OT secure coding guidelines
Growth of Ibex measured with Coremark/MHz

[Figure: bar chart of Coremark/MHz as features are added. Past work: 2.43, 2.55. Today: 2.92, 3.09. Future: 3.19.]

Features added along the way:

- Branch Target ALU
- Third Pipeline Stage
- Single Cycle Multiply
- Static Branch Prediction
- Bit Manipulation ISA Extension
RI5CY / CV32E40P our main 32bit RISC-V core

- Zero-riscy / Ibex is suitable for simple applications
  - Control applications, book-keeping

- For number crunching, we need more capable cores
  - Mainly used in clusters for signal processing / machine learning applications

- Tuned for energy efficiency
  - Not necessarily lowest power

- Make use of custom extensions
  - The Xpulp extensions enhance the capabilities
  - Several Xpulp extensions in discussions for ratification
Simplified pipeline for RI5CY / CV32E40P
RI5CY: Our 32-bit workhorse

- 4-stage pipeline
  - 41 kGE
  - Coremark/MHz 3.19
- Includes Xpulp extensions
  - SIMD
  - Fixed point
  - Bit manipulations
  - HW loops
- Different Options:
  - FPU: IEEE 754 single precision
    - Including hardware support for FDIV, FSQRT, FMAC, FMUL
  - Privilege support:
    - Supports privilege mode M and U

RISC-V has space for custom instructions (X)

- There is a reserved decoding space for custom instructions
  - Allows everyone to add new instructions to the core
  - The opcode space is reserved; it will not be used by future standard extensions
  - Implementations supporting custom instructions will be compatible with standard ISA
    - Code compiled for standard RISC-V will run without issues
  - The user has to provide support to take advantage of the additional instructions
    - Compiler that generates code for the custom instructions

- We make heavy use of this degree of freedom
  - Great tool for exploring
  - The goal is to help ratify these extensions as standards through working groups
Our extensions to RI5CY & support in GCC, LLVM

- Post-incrementing load/store instructions
- Hardware Loops (`lp.start`, `lp.end`, `lp.count`)
- ALU instructions
  - Bit manipulation (count, set, clear, leading bit detection)
  - Fused operations: (add/sub-shift)
  - Immediate branch instructions
- Multiply Accumulate (32x32 bit and 16x16 bit)
- SIMD instructions (2x16 bit or 4x8 bit) with scalar replication option
  - add, min/max, dotproduct, shuffle, pack (copy), vector comparison

For 8-bit values the following can be executed in a single cycle (`pv.dotup.b`)

\[ Z = D_1 \times K_1 + D_2 \times K_2 + D_3 \times K_3 + D_4 \times K_4 \]
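
The dot product above can be modelled in software to see what a single SIMD instruction replaces. This is a behavioural sketch only, assuming four unsigned 8-bit lanes packed into each 32-bit operand with the lowest byte as lane 1; `unpack_bytes` and `dotup_b` are illustrative names, not part of any toolchain.

```python
def unpack_bytes(word):
    """Split a 32-bit word into four unsigned 8-bit lanes, lowest byte first."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def dotup_b(d_word, k_word):
    """Unsigned 4x8-bit dot product: Z = D1*K1 + D2*K2 + D3*K3 + D4*K4."""
    lanes = zip(unpack_bytes(d_word), unpack_bytes(k_word))
    return sum(d * k for d, k in lanes) & 0xFFFFFFFF


if __name__ == "__main__":
    # D = (1, 2, 3, 4), K = (1, 1, 1, 1)
    print(dotup_b(0x04030201, 0x01010101))  # 10
```

Without the extension, the same result takes four multiplications, three additions, and the shifts and masks needed to extract each byte.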
RI5CY ISA extensions improve performance

```c
for (i = 0; i < 100; i++)
    d[i] = a[i] + b[i];
```

### Baseline
- `li x5, 0`
- `li x4, 100`
- Lstart:
  - `lb x2, 0(x10)`
  - `lb x3, 0(x11)`
  - `addi x10, x10, 1`
  - `addi x11, x11, 1`
  - `add x2, x3, x2`
  - `sb x2, 0(x12)`
  - `addi x4, x4, -1`
  - `addi x12, x12, 1`
  - `bne x4, x5, Lstart`
- 11 cycles/output

### Auto-incr load/store
- `li x5, 0`
- `li x4, 100`
- Lstart:
  - `p.lb x2, 1(x10!)`
  - `p.lb x3, 1(x11!)`
  - `addi x4, x4, -1`
  - `add x2, x3, x2`
  - `p.sb x2, 1(x12!)`
  - `bne x4, x5, Lstart`
- 8 cycles/output

### HW Loop
- `lp.setupi 100, Lend`
- `p.lb x2, 1(x10!)`
- `p.lb x3, 1(x11!)`
- `add x2, x3, x2`
- Lend: `p.sb x2, 1(x12!)`
- 5 cycles/output

### Packed-SIMD
- `lp.setupi 25, Lend`
- `p.lw x2, 4(x10!)`
- `p.lw x3, 4(x11!)`
- `pv.add.b x2, x3, x2`
- Lend: `p.sw x2, 4(x12!)`
- 1.25 cycles/output
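
Taking the cycles-per-output figures of the four variants at face value, the overall gain for the 100-element loop is simple arithmetic. The sketch below ignores loop setup cycles, which is an assumption; the dictionary and function names are made up for this example.

```python
# Cycles per output element for each variant of the loop.
CYCLES_PER_OUTPUT = {
    "Baseline": 11.0,
    "Auto-incr load/store": 8.0,
    "HW Loop": 5.0,
    "Packed-SIMD": 1.25,
}


def total_cycles(variant, outputs=100):
    """Cycles for the whole loop, ignoring the setup instructions."""
    return CYCLES_PER_OUTPUT[variant] * outputs


def speedup_over_baseline(variant):
    """How many times faster a variant is than the baseline loop."""
    return CYCLES_PER_OUTPUT["Baseline"] / CYCLES_PER_OUTPUT[variant]


if __name__ == "__main__":
    for name in CYCLES_PER_OUTPUT:
        print(f"{name}: {total_cycles(name):.0f} cycles, "
              f"{speedup_over_baseline(name):.2f}x")
```

The packed-SIMD figure follows from processing four bytes per iteration: 5 cycles per iteration divided by 4 outputs gives 1.25 cycles per output.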
Runtime for three different applications

[Figure: bar chart of runtime for three applications: 2D Convolution, EEMBC Coremark, and a Scheduler Application. Chart annotations: "Extensions have more effect" and "Better".]
Different cores for different area budgets

![Bar chart showing area budget comparison between RV32IMCXpulp, RV32IMC, and RV32EC.]
Different cores for different power budgets

[Figure: power comparison of RV32IMCXpulp, RV32IMC, and RV32EC, with the annotations "x2.4" and "x2.7" and an arrow marking "Better".]
Energy Efficiency: 2D-Convolution @55MHz, 0.8V

[Figure: energy comparison of RV32IMC, RV32EC, and RV32IMCXpulp across event rates: fast events (41.6 ms), a good trade-off (649 ms), and slow events (31 s). An arrow marks "More frequent events/processing".]

This was a short overview of basics of RISC-V

- Tomorrow, more advanced cores
  - 64bit RISC-V core
  - Discussion on performance
  - Vector processing

- On Wednesday-Friday, we learn about PULP systems
  - Cores alone can not do much, they need a system around
  - Many core systems
  - Managing Data
  - Acceleration
  - Actual Integrated Circuits from the PULP group
Luca Benini, Davide Rossi, Andrea Borghesi, Michele Magno, Simone Benatti, Francesco Conti, Francesco Beneventi, Daniele Palossi, Giuseppe Tagliavini, Antonio Pullini, Germain Haugou, Manuele Rusci, Florian Glaser, Fabio Montagna, Bjoern Forsberg, Pasquale Davide Schiavone, Alfio Di Mauro, Victor Javier Kartsch Morinigo, Tommaso Polonelli, Fabian Schuiki, Stefan Mach, Andreas Kurth, Florian Zaruba, Manuel Eggimann, Philipp Mayer, Marco Guermandi, Xiaying Wang, Michael Hersche, Robert Balas, Antonio Mastrandrea, Matheus Cavalcante, Angelo Garofalo, Alessio Burrello, Gianna Paulin, Georg Rutishauser, Andrea Cossettini, Luca Bertaccini, Maxim Mattheeuws, Samuel Riedel, Sergei Vostrikov, Vlad Niculescu, Hanna Mueller, Matteo Perotti, Nils Wistoff, Thorir Ingulfsson, Thomas Benz, Paul Scheffler, Moritz Scherer, Matteo Spallanzani, Andrea Bartolini, Frank K. Gurkaynak, and many more that we forgot to mention

http://pulp-platform.org  @pulp_platform