The PULP Microcontroller Software Interface Standard (PMSIS) provides the Board Support Package (BSP), the Application Programming Interface (API), and the drivers for running applications on PULP-based Microcontrollers (MCUs). It was developed as an extension of the older pulp-rt, which was used, e.g., for the Mr. Wolf processor.
The GCC and LLVM compilers used for PULP are based on GNU GCC and LLVM, respectively. They support the PULP ISA, which consists of the RISC-V standard ISA plus specific extensions such as Xpulpv0, Xpulpv1, Xpulpv2, and XpulpNN, each with different features and application domains.
PULPOS is an optimized software library for operating system functionalities such as tasking, memory management, and interrupts. Alternatively, FreeRTOS has also been ported to PULP, including drivers. The Hardware Abstraction Layer (HAL) is a set of functions that hides the register-level details of the memory map and provides common programming entry points for typical hardware modules.
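To make the idea of a HAL concrete, here is a minimal sketch in C. The timer peripheral, its register offsets, and every function name (`hal_read32`, `hal_write32`, `timer_enable`, `timer_is_enabled`, `timer_count`) are hypothetical and chosen only for illustration; they are not taken from the PULP HAL. The point is that application code calls `timer_enable()` and never touches offsets or bit masks directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register map of a timer peripheral (byte offsets). */
#define TIMER_CFG_OFFSET 0x00u
#define TIMER_CNT_OFFSET 0x04u
#define TIMER_CFG_ENABLE (1u << 0)

/* Low-level accessors: the only code that deals with raw offsets. */
static inline void hal_write32(volatile uint32_t *base, uint32_t offset,
                               uint32_t value) {
    assert(offset % 4u == 0u); /* registers are word-aligned in this sketch */
    base[offset / 4u] = value;
}

static inline uint32_t hal_read32(volatile uint32_t *base, uint32_t offset) {
    assert(offset % 4u == 0u);
    return base[offset / 4u];
}

/* HAL entry points: what a driver or application would call. */
static inline void timer_enable(volatile uint32_t *timer_base) {
    uint32_t cfg = hal_read32(timer_base, TIMER_CFG_OFFSET);
    hal_write32(timer_base, TIMER_CFG_OFFSET, cfg | TIMER_CFG_ENABLE);
}

static inline int timer_is_enabled(volatile uint32_t *timer_base) {
    return (hal_read32(timer_base, TIMER_CFG_OFFSET) & TIMER_CFG_ENABLE) != 0u;
}

static inline uint32_t timer_count(volatile uint32_t *timer_base) {
    return hal_read32(timer_base, TIMER_CNT_OFFSET);
}
```

On real hardware, `timer_base` would point at the peripheral's address in the memory map. For a quick host-side check, it can point at an ordinary two-word array that stands in for the registers.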
Let’s now take a user’s point of view. If you’d like to develop an application using machine learning or digital signal processing algorithms, you can start with PULP SDK if you intend to use mostly integer operations, or with Snitch Runtime for optimized floating-point operations. If you wish to use Linux, CVA6 will be your choice. If you intend to develop hardware (HW), e.g., a HW accelerator, and would like to quickly test some simple software code, then you can go with PULP Runtime. Finally, if you prefer using FreeRTOS, then you can go with PULP FreeRTOS.
PULP SDK includes the fundamental libraries, tools, and scripts to develop applications for PULP chips, such as platform descriptions, operating system libraries, drivers, and simulators.
It includes the GVSoC virtual platform, which offers high accuracy by modeling all PULP hardware IPs, such as cores, cluster, interconnect, cache, and uDMA. GVSoC is an event-based, cycle-accurate simulator, which makes simulations fast, and it allows agile reconfiguration thanks to JSON-based platform description files and Python generators.
The virtual platform can dump architecture events to help developers debug their applications by showing more clearly what is happening in the system. For example, it can show instructions being executed, DMA transfers, events generated, memory accesses, and so on. The generated traces can be visualized using GTKWave.
PULP FreeRTOS provides FreeRTOS and drivers for the development of real-time applications on PULP-based systems. Programs can be run using RTL simulation (simulating the hardware design), e.g., with QuestaSim, or using the GVSoC virtual platform (software emulation of the hardware design). A book about FreeRTOS can be found here and the official documentation is available on this website. It has been tested on Pulpissimo, pulp-open, and ControlPULP.
PULP Runtime provides a minimal way to run a bare-bones program on PULP architectures. Programs can be run using RTL simulation, e.g., with QuestaSim. It is useful, e.g., when you develop a new piece of HW, such as a HW accelerator. It has been tested on Pulpissimo, pulp-open, ControlPULP, and Marsellus.
Snitch Runtime provides a fundamental, bare-metal runtime for Snitch systems. It exposes a minimal API to manage the execution of code across the available cores and clusters, to query information about a thread's context, and to coordinate and exchange data with other threads.
It includes an instruction-accurate, LLVM-based binary translation simulator for Snitch systems, called banshee, which can emulate the custom instruction set extensions.
For Snitch, traces are visualized with Trace-viewer or Catapult.
Support for the DSP, NN, DORY, and QuantLab workflows is currently under development.
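As an illustration of what managing execution across the available cores means in a bare-metal setting, the sketch below splits a loop evenly among cores. It is plain C and deliberately does not call the Snitch Runtime: `core_chunk` and `partial_sum` are hypothetical helpers, and `core_idx` and `n_cores` stand in for values that a runtime would provide when a thread queries its context.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) assigned to one core. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_t;

/* Split n items across n_cores as evenly as possible.
 * The first (n % n_cores) cores receive one extra item. */
static chunk_t core_chunk(size_t n, size_t core_idx, size_t n_cores) {
    assert(n_cores > 0 && core_idx < n_cores);
    size_t base = n / n_cores;
    size_t extra = n % n_cores;
    size_t begin = core_idx * base + (core_idx < extra ? core_idx : extra);
    size_t len = base + (core_idx < extra ? 1u : 0u);
    chunk_t c = {begin, begin + len};
    return c;
}

/* Work done by a single core: sum only its own slice of the data. */
static long partial_sum(const int *data, size_t n, size_t core_idx,
                        size_t n_cores) {
    chunk_t c = core_chunk(n, core_idx, n_cores);
    long sum = 0;
    for (size_t i = c.begin; i < c.end; i++) {
        sum += data[i];
    }
    return sum;
}
```

In a real program every core would call `partial_sum` with its own index, and the partial results would then be combined after a synchronization step between the threads.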
CVA6 SDK targets CVA6, a 6-stage, single-issue, in-order CPU that implements the 64-bit RISC-V instruction set. You can simulate CVA6 in QuestaSim, VCS, or Verilator (the Verilator output can be visualized with GTKWave), and you can emulate CVA6 on FPGAs.
PULP DSP provides optimized functions for digital signal processing, such as dot product, matrix multiplication, convolution, and fast Fourier transform, for various data types (8-, 16-, and 32-bit integer and fixed-point, and single-precision floating-point). The optimized implementations exploit SIMD instructions, hardware loops, the parallel cluster, and more. It has been tested on Mr. Wolf, featuring Ibex and CV32E40P cores, and on pulp-open. It can also be run on GWT GAP8, featuring CV32E40P cores. For more details, please visit the repository and refer to the documentation, which also explains how to use the library and gives advice on how to optimize code on PULP.
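For readers unfamiliar with fixed-point kernels, the following portable C sketch shows what a 16-bit fixed-point (Q15) dot product computes. It is a reference implementation written for this page, not PULP DSP code, and the name `dot_q15_ref` is hypothetical. The loop handles two elements per iteration only to suggest how a two-lane 16-bit SIMD instruction would consume the data; an optimized library version would rely on actual SIMD instructions and hardware loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference Q15 dot product.
 * Inputs are Q15 values (int16_t, value = raw / 2^15).
 * The result is returned unscaled in Q30 (value = raw / 2^30),
 * accumulated in 64 bits so long vectors cannot overflow. */
static int64_t dot_q15_ref(const int16_t *a, const int16_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    int64_t acc = 0;
    size_t i = 0;

    /* Main loop: two products per iteration, like a 2x16-bit SIMD lane pair. */
    for (; i + 1 < n; i += 2) {
        acc += (int32_t)a[i] * b[i];
        acc += (int32_t)a[i + 1] * b[i + 1];
    }

    /* Tail: one leftover element when n is odd. */
    if (i < n) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```

Returning the raw Q30 accumulator keeps the sketch free of rounding decisions; a caller that needs a Q15 result would scale and saturate it according to its own conventions.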
PULP NN is a multicore computing library for Quantized Neural Network (QNN) inference on PULP clusters of RISC-V-based processors. It includes optimized kernels such as convolution, matrix multiplication, pooling, normalization, and other common state-of-the-art QNN kernels. It fully exploits the Xpulp ISA extension and the cluster's parallelism to achieve high performance and high energy efficiency on PULP-based devices. It has been tested on GWT GAP8.
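The core arithmetic of a QNN kernel can be sketched in a few lines of C: 8-bit inputs and weights are multiplied and accumulated in 32 bits, and the accumulator is then rescaled and clipped back to 8 bits. The multiplier-and-shift scheme below is a generic one assumed for illustration; the text above does not specify PULP NN's quantization scheme, and `requantize` and `qnn_neuron` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rescale a 32-bit accumulator to int8: acc * multiplier / 2^shift,
 * rounded toward zero, then clipped to [-128, 127]. */
static int8_t requantize(int32_t acc, int32_t multiplier, unsigned shift) {
    assert(shift < 31u);
    int64_t scaled = ((int64_t)acc * multiplier) / ((int64_t)1 << shift);
    if (scaled > INT8_MAX) return INT8_MAX;
    if (scaled < INT8_MIN) return INT8_MIN;
    return (int8_t)scaled;
}

/* One output value of a quantized linear layer. */
static int8_t qnn_neuron(const int8_t *x, const int8_t *w, size_t n,
                         int32_t bias, int32_t multiplier, unsigned shift) {
    assert(x != NULL && w != NULL);
    int32_t acc = bias;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * w[i]; /* 8-bit x 8-bit product, 32-bit sum */
    }
    return requantize(acc, multiplier, shift);
}
```

A convolution or matrix multiplication kernel repeats this pattern for every output element, which is why such kernels parallelize well across the cores of a cluster.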
DORY (Deployment Oriented to memoRY) is an automatic tool to deploy DNNs on PULP platforms. DORY abstracts the DNN tiling problem as a Constraint Programming (CP) problem: it maximizes the L1 memory utilization under the topological constraints imposed by each DNN layer. Then, it generates ANSI C code to orchestrate off- and on-chip transfers and computation phases. Furthermore, to maximize speed, DORY augments the CP formulation with heuristics promoting performance-effective tile sizes based on PULP-NN or other custom DNN backends. For more details, visit here.
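The tiling objective described above can be illustrated with a toy search in C. This sketch is not DORY's formulation: it replaces the CP solver with a brute-force loop, models only a stride-1, unpadded convolution with 8-bit data, tiles only over output rows and output channels, and ignores double buffering and the performance heuristics. The types `layer_t` and `tiling_t` and the functions `tile_bytes` and `best_tiling` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convolution layer: output size, channels, and square kernel size k. */
typedef struct {
    int h_out, w_out, c_in, c_out, k;
} layer_t;

/* Chosen tile: output rows, output channels, and L1 bytes it occupies. */
typedef struct {
    int tile_h, tile_c_out;
    long bytes;
} tiling_t;

/* L1 footprint of one tile: input patch + weights + output (1 byte each). */
static long tile_bytes(const layer_t *l, int tile_h, int tile_c_out) {
    long in_rows = tile_h + l->k - 1;   /* rows needed to produce tile_h rows */
    long in_cols = l->w_out + l->k - 1; /* full width is kept in this sketch */
    long input = in_rows * in_cols * l->c_in;
    long weights = (long)l->k * l->k * l->c_in * tile_c_out;
    long output = (long)tile_h * l->w_out * tile_c_out;
    return input + weights + output;
}

/* Exhaustively pick the tile that fills the most L1 without exceeding it.
 * Returns a tiling with bytes == 0 if no tile fits. */
static tiling_t best_tiling(const layer_t *l, long l1_budget) {
    assert(l != NULL && l1_budget >= 0);
    tiling_t best = {0, 0, 0};
    for (int th = 1; th <= l->h_out; th++) {
        for (int tc = 1; tc <= l->c_out; tc++) {
            long bytes = tile_bytes(l, th, tc);
            if (bytes <= l1_budget && bytes > best.bytes) {
                best.tile_h = th;
                best.tile_c_out = tc;
                best.bytes = bytes;
            }
        }
    }
    return best;
}
```

A constraint solver explores the same kind of search space far more efficiently and can add further constraints and objective terms, which is where backend-specific heuristics for performance-effective tile sizes would enter.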
Check these out on GitHub:
PULP SDK
Do you need help with PULP?
Most of the discussions are currently held on individual GitHub pages. We are also preparing an online forum. You may find the following helpful, as well:
Mailing list | FAQ | Contact