μClinux on Ibex or CV32E40P?
#1
Hello, 

I am exploring the pros and cons of adapting the 32-bit cores Ibex and CV32E40P to run μClinux (without an MMU). If its power efficiency is not much worse, I can see the benefits of using CVA6, which can run full-featured Linux (with an MMU), but I am curious what the power consumption of CVA6 is. I do not have access to the paper "Slow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications."

However, I did read: "Micro-riscy is 1.6× smaller than Zero-riscy (∼11.6 kgates in UMC 65nm), has a power envelope of just 100μW at 160MHz and it is 1.4× more energy efficient than Zero-riscy on pure control code."

I would also like access to "Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices", whose abstract states: "In a low-power 28-nm FD-SOI process, a peak efficiency of 193 MOPS/mW (40 MHz and 1 mW) can be achieved."

From reading the abstracts, it appears that Ibex can run at 100 μW and that the core in the second article runs at 1 mW. Is that CV32E40P?

The reason I am asking is that I would like to build on a research paper, the "Battery-Free Game Boy" (Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 2020, Article No. 111), which uses an Ambiq Micro Apollo3 board running at sub-threshold voltage, similar to the near-threshold voltage of the RISC-V core mentioned above. The idea is that if μClinux could be adapted to run on Ibex or CV32E40P, using external memory for RAM, an iPod, an Android phone, or even a laptop could be built around it and powered by amorphous solar panels and an ultra-low-power e-ink display, with battery backup. Thank you.
#2
Hello,

Great question. Our entire research is based on developing systems that can do more with less power/energy.

First of all, it is important to differentiate between power and energy. Power is all about how much current your circuit draws in an instant. But it does not really tell you how much work your circuit does. The story is a bit more complicated, but basically if you run a circuit at half the speed (half the MHz), the power will also halve. Obviously your circuit will also take twice as long to finish the same job. This is where energy comes in. Energy is Power x Time. So in the above example (grossly simplified), power would be half, but the energy would be the same.
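The frequency-halving example above can be checked with a few lines of arithmetic. This is a minimal sketch assuming only dynamic power, which scales linearly with clock frequency at a fixed voltage; the job size and the 1 µW/MHz coefficient are hypothetical numbers chosen for illustration, not measurements of any PULP core.

```python
# Dynamic-power-only model: power scales linearly with clock frequency
# (fixed voltage). All numbers are hypothetical, for illustration only.
POWER_PER_HZ_W = 1e-12   # hypothetical 1 uW per MHz
CYCLES = 10_000_000      # hypothetical job length in clock cycles

def run_job(freq_hz):
    """Return (power W, time s, energy J) for running CYCLES cycles at freq_hz."""
    power_w = POWER_PER_HZ_W * freq_hz   # P proportional to f
    time_s = CYCLES / freq_hz            # t = cycles / f
    energy_j = power_w * time_s          # E = P * t
    return power_w, time_s, energy_j

full = run_job(100e6)
half = run_job(50e6)
for label, (p, t, e) in (("100 MHz", full), (" 50 MHz", half)):
    print(f"{label}: P = {p * 1e6:.0f} uW, t = {t * 1e3:.0f} ms, E = {e * 1e6:.1f} uJ")
```

Halving the clock halves the power and doubles the run time, so the energy for the job is unchanged, as in the example. This only holds while static power can be neglected.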

They are both important, but (again, much simplified) if you are looking to optimize the lifetime of a battery-operated device, energy is the bigger concern, as the battery only stores a certain amount of energy. When you talk about solar panels (energy harvesting), there is a certain power that you can obtain from a panel, and you have to make sure that you stay below that level.
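The two budgets can be sketched separately: a power check against what the panel delivers, and an energy check for how long a battery lasts. The panel output, battery size, active and sleep power, and duty cycle below are all hypothetical placeholders, assuming a device that is briefly active and otherwise sleeping.

```python
# All values are hypothetical placeholders, not measurements.
HARVEST_W = 200e-6                # assumed average panel output under indoor light
BATTERY_J = 3.7 * 0.100 * 3600    # assumed 100 mAh cell at 3.7 V, in joules

def average_power(active_w, sleep_w, duty):
    """Average power when active for a fraction `duty` of the time, asleep otherwise."""
    return duty * active_w + (1.0 - duty) * sleep_w

def battery_life_hours(battery_j, avg_w):
    """Energy view: hours of operation from stored energy at a given average power."""
    return battery_j / avg_w / 3600.0

p_avg = average_power(active_w=5e-3, sleep_w=10e-6, duty=0.02)
print(f"average power: {p_avg * 1e6:.1f} uW")
print("power view, sustainable from the panel alone:", p_avg <= HARVEST_W)
print(f"energy view, battery-only lifetime: {battery_life_hours(BATTERY_J, p_avg):.0f} h")
```

The power check decides whether the panel alone can keep the device running; the energy check decides how long the battery bridges periods without light.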

The tricky part is that you can optimize circuits for low power or for energy-efficient operation, but the optimizations are not always the same. We identify two different sources of power consumption. One is dynamic (when the circuit is doing something useful), and the other is static (whenever the circuit is powered, there is some unwanted current flow). Normally the static power consumption is really small and you can ignore it, but as you save more and more power, static power starts to dominate, so some of the tricks that worked well for you stop working after a while. Reducing the operating voltage is one of them. Once you reach near-threshold (or sub-threshold) operation, you basically reach a point where dynamic and static power consumption are roughly equal.
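A toy model makes the static-power effect visible. It assumes the textbook approximations P_dyn = C_eff·V²·f and P_static = V·I_leak at a fixed supply voltage, with hypothetical values for C_eff, V, and I_leak. It does not model how lowering V also lowers the maximum frequency and changes leakage, so treat it only as an illustration of why static energy per job grows as a job runs longer.

```python
# Toy CMOS model at a fixed supply voltage (hypothetical values throughout):
#   dynamic power  P_dyn    = C_EFF * V**2 * f
#   static power   P_static = V * I_LEAK
C_EFF = 10e-12       # hypothetical switched capacitance per cycle, farads
V = 0.6              # hypothetical supply voltage, volts
I_LEAK = 20e-6       # hypothetical leakage current, amperes
CYCLES = 1_000_000   # hypothetical job length, clock cycles

def energy_per_job(freq_hz):
    """Return (dynamic, static) energy in joules to run CYCLES cycles at freq_hz."""
    t = CYCLES / freq_hz
    e_dyn = C_EFF * V**2 * freq_hz * t   # equals C_EFF * V**2 * CYCLES: independent of f
    e_static = V * I_LEAK * t            # grows linearly with run time
    return e_dyn, e_static

for f_mhz in (100, 10, 1):
    e_dyn, e_static = energy_per_job(f_mhz * 1e6)
    share = e_static / (e_dyn + e_static)
    print(f"{f_mhz:>3} MHz: dynamic {e_dyn * 1e6:.2f} uJ, "
          f"static {e_static * 1e6:.2f} uJ, static share {share:.0%}")

# Frequency at which static and dynamic energy per job are equal:
# V * I_LEAK * CYCLES / f == C_EFF * V**2 * CYCLES  =>  f = I_LEAK / (C_EFF * V)
f_equal = I_LEAK / (C_EFF * V)
print(f"static = dynamic at about {f_equal / 1e6:.1f} MHz")
```

With these made-up numbers, slowing the clock from 100 MHz to 1 MHz leaves the dynamic energy per job unchanged but multiplies the static energy by 100, so the total energy per job rises. That is one way a power-saving trick stops saving energy once static consumption dominates.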

So what can you do for low power / higher energy efficiency?
* First of all, it is a technology question. Newer technologies (7nm, 22nm) have smaller and faster transistors than older ones (180nm).
* Since static power has become so important, newer technologies do not have just ONE transistor; there are usually 10 or more transistor variants, which may differ by as much as 100x in static power. Take a look at https://www.synopsys.com/designware-ip/t...-16ff.html for a longer description of the different flavors in ONE technology
* Smaller area means fewer transistors and, everything else being equal, less static power. So if you are going for POWER (again simplified), smaller is better. Our smaller micro/zeroRISCY cores (now Ibex) are optimized for this reason.
* Energy efficiency means that you can finish a given program quicker: parallelization, and making sure that all units are kept busy and none sits idle (wasting power), are key tricks to increase efficiency. RI5CY (now CV32E40P) was adapted for this. It is much bigger (more kGE), but it is more ENERGY efficient than its smaller brothers.
* Ariane (or CVA6) was built just to run Linux and is not particularly optimized for either goal. Since it is a 64-bit core, it is much larger (about 4x RI5CY), so it is not really low power.
* Whatever the core, the memory needed is actually MUCH MORE important. Roughly speaking, 1 kilobyte of memory takes 5-10 kGE. Basically, 8 kByte of memory is the size of one RI5CY core (or 2 kByte of memory is one Ibex core).
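The memory rule of thumb in the last bullet (5-10 kGE per kilobyte, so about 8 kB per RI5CY core and 2 kB per Ibex core) can be turned into a quick estimator. Extending it linearly to megabyte-sized memories, as a μClinux-class system would likely need, is an assumption, but it shows why memory, not the core, dominates the area budget.

```python
# Rule of thumb from the post: 1 kB of memory occupies roughly 5-10 kGE.
KGE_PER_KB = (5, 10)
KB_PER_RI5CY = 8   # "8 kByte memory is one RI5CY core"
KB_PER_IBEX = 2    # "2 kByte memory is one Ibex core"

def memory_kge(size_kb):
    """Return a (low, high) area estimate in kGE for size_kb kilobytes of memory."""
    lo, hi = KGE_PER_KB
    return size_kb * lo, size_kb * hi

def core_equivalents(size_kb):
    """How many RI5CY and Ibex cores the same area would hold, per the rule of thumb."""
    return size_kb / KB_PER_RI5CY, size_kb / KB_PER_IBEX

for size_kb in (2, 8, 64, 1024):   # 1024 kB = 1 MB is a linear extrapolation
    lo, hi = memory_kge(size_kb)
    ri5cy, ibex = core_equivalents(size_kb)
    print(f"{size_kb:>5} kB: {lo}-{hi} kGE, about {ri5cy:g} RI5CY or {ibex:g} Ibex cores")
```

Even at the low end of the range, a single megabyte of on-chip memory would take the area of over a hundred RI5CY cores, which helps explain why external memory is attractive for a μClinux-class system.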

All these are gross simplifications and approximations, but our entire research is based on developing architectures that are better.

Practically all our presentations are online at https://pulp-platform.org/conferences.html and contain all the information in the papers that you may not have access to.

I hope this helps,
Visit pulp-platform.org and follow us on twitter @pulp_platform
#3
(01-15-2021, 06:37 AM) kgf wrote: "Hello,

Great question. Our entire research is based on developing systems that can do more with less power/energy.

First of all, it is important to differentiate between power and energy. Power is all about how much current your circuit draws in an instant. But it does not really tell you how much work your circuit does. The story is a bit more complicated, but basically if you run a circuit at half the speed (half the MHz), the power will also halve. Obviously your circuit will also take twice as long to finish the same job. This is where energy comes in. Energy is Power x Time. So in the above example (grossly simplified), power would be half, but the energy would be the same."

Thank you for clarifying this. Some of my responses are placed between your quotes rather than outside the quote box, so I have put quotation marks around each of your statements.

Some IoT devices, like smartwatches, are not speed-dependent and can run at a very low clock speed for basic features: telling the time, checking the weather over 6LoWPAN. Since I believe this could be accomplished on an Ibex (micro/zeroRISCY) or an M0+, an RTOS would appear to be a good fit for Ibex.

"They are both important, (again much simplified) but if you are looking to optimize the life time of a battery operated device, it is energy that is more of a concern (as the battery only stores a certain amount of energy), but when you talk about solar panels (energy harvesting), there is a certain power that you can obtain from a panel, and you have to make sure that you stay below that level."

Thank you for addressing this. I want to design a laptop whose thermal design power (TDP) is lower than what the energy harvester can collect under indoor lighting, similar to the TI-30Xa Solar, although that was a static processor. What I find fascinating is how long solar calculators have been around (since 1976), and that transistors at 10nm or below still do not seem capable of (or designed for) running a Linux OS on ambient light.

"The tricky part is, you can optimize circuits for low power or more energy efficient operation, but the optimizations are not always the same. We identify two different sources of power consumption. One is dynamic (when the circuit is doing something useful), and the other one is static (when the circuit is powered, there is some unwanted current flow). Normally the static power consumption is really small, and you ignore it, but as you save more and more power, the static power consumption starts to dominate, so some of the tricks that worked well for you, stop working after a while. Reducing the operating voltage is one of them. Once you reach near threshold (or sub threshold) you basically reach a point where dynamic and static power consumption are roughly equal.

So what can you do for low power / higher energy efficiency
* First of all it is a technology question. Newer technologies (7nm, 22nm) are smaller and faster transistors than older (180nm). "

I read that the Samsung Exynos 9110 uses two 10nm Arm Cortex-A53 cores (that run at up to 2 GHz) and a Mali 430 GPU, and runs Tizen (a Linux-based OS). The Galaxy Watch uses it and gets a battery life of 4 days. It is capable of playing YouTube. Considering how small that battery is, this suggests a laptop could have an even longer battery life.

The niche of applications I have in mind does not necessarily need a fully featured multimedia operating system, because I understand a solar panel may not be able to keep up with the power consumption of video acceleration. However, I am surprised that so few mobile products offer a keyboard for productivity while leaving out multimedia features. One exception is the Freewrite: https://getfreewrite.com/
This uses an e-ink screen, has a full, physical keyboard, and allows writers to use a display for basic word processing.

The advantage of the Freewrite is that it is optimized for typewriting, and a RISC-V core could possibly be used for something like this without needing a more energy-hungry Linux.

The disadvantage is that it isn't useful for anything except typewriting. Many other laptop applications do not require heavy CPU utilization, such as LibreOffice, AbiWord, and even some light web browsers: https://www.falkon.org/ https://astian.org/en/midori-browser/
I would not mind disabling video if it would otherwise drain the battery faster than energy harvesting can replenish it.

"* Since static power has become so important, newer technologies do not have ONE transistor, there are usually 10 or more variations of transistors that change may have 100x difference in their static power. Take a look at https://www.synopsys.com/designware-ip/t...-16ff.html for a longer description of different flavors in ONE technology
* Smaller area means less transistors, everything else being equal, less static power. So if you are going for POWER, (again simplified) smaller is better. Our smaller micro/zeroRISCY (now Ibex) are optimized for this reason
* Energy efficiency means that you can finish a given program quicker, parallelization and making sure that all units are kept busy and none is waiting doing nothing (and wasting power) are key tricks to increase the efficiency. RI5CY (now CV32E40P) was adapted for this. It is much bigger (more kGE) but it is more ENERGY efficient than its smaller brothers.
* Whereas Ariane (or CVA6) was built just to run Linux, it is not particularly optimized for one or the other. Since it is a 64 bit core, it is much larger (about 4x RI5CY), it is not really low power
* Whatever the core, the memory needed is MUCH MORE important actually. roughly speaking 5-10 kGEs is the size of 1 kilobyte memory. Basically 8kByte memory is one RI5CY core. (or 2kByte memory is one Ibex core)."

Thank you for this information. I do not know what a kGE is. Is it a unit of power or area? Also, considering that memory is a significant factor in power consumption, what types of memory technology (DDR4L/SRAM/NAND) could be adapted to run Linux at lower power? I apologize for the many questions; as you can see, I want the best of both worlds, low-power microcontrollers and application microprocessors. I am willing to settle for something in between, which is why RI5CY [CV32E40P] appears to be a good fit.

While you were typing your response, I was updating my post with an edit to add new information I found:

"Update: I have found some of my answer from this 2018 post: https://pulp-platform.org/community/show...d=86#pid86

"RI5CY [CV32E40P] is something that would compare to ARM M4(F) in terms of capability. (There is a FPU option in RI5CY)
Zero/Micro Riscy [Ibex] is more like the ARM M0/1/3 core
Ariane [CVA6] is a 64bit core, more similar to a ARMV8-A architecture, a bit like A53 and A57, but implements a simpler architecture (single issue, in order) "

Knowing this, and that the Ambiq Micro Apollo3 uses an ARM M4F, it would appear that RI5CY [CV32E40P] could run uClinux, with an FPU. It seems like application processors require one, but I know I am oversimplifying here. I am curious whether https://riot-os.org/ could be modified to run more user-space applications. Some features: "An experimental Rust API is also available.[4] It has full multithreading and real-time abilities.[5] SSL/TLS is supported by popular libraries such as wolfSSL.[6]"
"RIOT provides multiple network stacks,[8] including IPv6, 6LoWPAN, or Content centric networking and standard protocols such as RPL,[9] User Datagram Protocol (UDP), Transmission Control Protocol (TCP), and CoAP."

Some more info: "Linux, on the other hand, uses a scheduler, which guarantees a fair distribution of processing time. The programming models in Contiki and TinyOS are based on the event-driven model, in a way that all tasks are executed within the same context, although they offer partial multi-threading support.
Comparison: These two, Contiki OS and TinyOS, are suitable for IoT applications. But RIOT fares better when it comes to memory usage and support." https://medium.com/@manjunathperiyapatna...cbf1005baf"
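The event-driven model mentioned in that quote can be illustrated with a minimal run-to-completion event loop. This is a generic Python sketch, not Contiki's, TinyOS's, or RIOT's actual API; the class and event names are hypothetical. Every handler runs in the same context and must return before the next event is processed, unlike the time-sliced scheduling the quote attributes to Linux.

```python
from collections import deque

class EventLoop:
    """Run-to-completion event loop: all handlers share one context, no preemption."""

    def __init__(self):
        self.queue = deque()   # pending (event, data) pairs, FIFO order
        self.handlers = {}     # event name -> list of handler functions

    def on(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def post(self, event, data=None):
        self.queue.append((event, data))

    def run(self):
        while self.queue:
            event, data = self.queue.popleft()
            for handler in self.handlers.get(event, []):
                handler(self, data)  # must return before anything else can run
        # Queue empty: on a real microcontroller this is where the core would
        # sleep until an interrupt posts the next event.

log = []

def on_tick(loop, t):
    log.append(f"tick {t}")
    loop.post("redraw", t)   # defer work by posting another event

def on_redraw(loop, t):
    log.append(f"redraw {t}")

loop = EventLoop()
loop.on("tick", on_tick)
loop.on("redraw", on_redraw)
loop.post("tick", 1)
loop.post("tick", 2)
loop.run()
print(log)
```

Because each redraw is queued behind the second tick, the log shows both ticks before either redraw; a long-running handler would block everything else in the same way, which is the main trade-off against a preemptive scheduler.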

CV32E40P/RI5CY seems to have many of the features of a Linux-capable core, except for an MMU.

"All these are gross simplifications and approximations, but our entire research is based on developing architectures that are better.

Practically all our presentations are online under : https://pulp-platform.org/conferences.html These contain all the information in the papers that you may not have access to.

I hope this helps,"

Thank you very much, kgf.

I started with a Raspberry Pi Zero and was able to run it on solar power. I have written about that here: https://www.raspberrypi.org/forums/viewt...1#p1785481
I am also writing about e-ink screens here: https://forum.ei2030.org/t/e-ink-low-pow...ame-lid/82

I would be happy to contribute any research I can to your project.

Thank you
#4
(01-15-2021, 06:37 AM) kgf wrote: * Whereas Ariane (or CVA6) was built just to run Linux, it is not particularly optimized for one or the other. Since it is a 64 bit core, it is much larger (about 4x RI5CY), it is not really low power
* Whatever the core, the memory needed is MUCH MORE important actually. roughly speaking 5-10 kGEs is the size of 1 kilobyte memory. Basically 8kByte memory is one RI5CY core. (or 2kByte memory is one Ibex core).

What is the lowest-power RAM that could be integrated with an application processor?

I found a relatively fast one: https://www.cypress.com/file/444201/download (20 MHz)
SRAM seems like the go-to way to boot an RTOS, but for many years its capacity has been too small for application processors: https://www.emcraft.com/stm32f429discove...-footprint

Could I put 8 of those 8 Mbit FRAM chips together on a board or chip? Could 6 MB boot Linux? This one has 6 SPI ports: https://www.top-electronicsusa.com/ama3b...17907.html

Or would I need to use a different interface, such as a parallel one? https://www.mouser.com/Semiconductors/Me.../_/N-4bzpt Some run at around 1 MHz, but I do not know if that is too slow to run Linux.
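The capacity and interface questions above come down to simple arithmetic. Memory densities are binary, so one 8 Mbit chip holds 1 MiB: six chips (assuming one per SPI port) give 6 MiB and eight chips give 8 MiB. The bandwidth part is a sketch under stated assumptions: one transfer of the bus width per clock per port, no command or address overhead, all six SPI ports usable concurrently, the "1 MHz" parallel parts read as 1 million 16-bit accesses per second, and a hypothetical 4 MiB image to copy. Real throughput will be lower.

```python
MIB = 2**20

def capacity_bytes(chips, mbit_per_chip=8):
    """Total capacity; memory densities are binary, so 8 Mbit = 1 MiB."""
    return chips * mbit_per_chip * MIB // 8

def peak_bytes_per_s(clock_hz, bus_width_bits, ports=1):
    """Upper bound: one bus-width transfer per clock per port, no command/address overhead."""
    return clock_hz * bus_width_bits / 8 * ports

for chips in (6, 8):
    print(f"{chips} x 8 Mbit = {capacity_bytes(chips) // MIB} MiB")

IMAGE_BYTES = 4 * MIB   # hypothetical image size to copy into RAM
cases = [
    ("1 SPI port, 1-bit, 20 MHz", peak_bytes_per_s(20e6, 1)),
    ("6 SPI ports, 1-bit, 20 MHz each", peak_bytes_per_s(20e6, 1, ports=6)),
    ("16-bit parallel, 1 MHz", peak_bytes_per_s(1e6, 16)),
]
for name, bw in cases:
    print(f"{name}: {bw / 1e6:.1f} MB/s peak, at least {IMAGE_BYTES / bw:.2f} s for 4 MiB")
```

If code ran directly from such external memory without a cache, every instruction fetch would also go through these links, so the peak rates bound execution speed as well as boot time.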

For IoT, it seems like speed is less of an issue:
https://eepower.com/news/sram-with-lowes...84ns-read/  

Recent products: 
https://www.cypress.com/file/451661/download
https://www.renesas.com/us/en/products/m...ower-srams

Edit: I realize integrated memory is probably more efficient than external memory, although I do not know whether the SPI ports are low-power enough to serve as a natural extension of the IC. Also, the Cypress FRAM appears to require high-speed SPI, and I'm not sure how many such ports the Apollo3 board has. Finally, the transistors in the add-on memory are probably larger than those in the DDR5 modules currently being made.