Posts: 6
Threads: 1
Joined: Nov 2024
Hello, this is my first time working with PULPissimo, which I am using for my research group’s project, and a few questions have come up:
- Why does PULPissimo seem to lack an L1 cache?
- If both data and instructions rely on the L2 cache for caching, wouldn’t that lead to potential conflicts?
Additionally, I’ve noticed that other PULP platforms seem to include an L1 cache. Could you explain the reasoning behind this design choice in PULPissimo? If there are any misunderstandings in my interpretation, I’d greatly appreciate it if you could point them out. Thank you very much!
Posts: 152
Threads: 0
Joined: Oct 2018
I think the naming might be a bit confusing. Our main research projects do not use a traditional data cache. Instead, we use a small local memory that is able to respond within a single cycle. The higher levels of the hierarchy are then managed differently: transfers go through a DMA, i.e. they are software controlled. In short, L1 refers to the level-1 memory and should not necessarily be read as "L1 cache". Note that the instruction cache in these systems works in the traditional way.
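To make the "software-controlled" part concrete, here is a minimal toy model in Python (not PULP code). It assumes a hypothetical `dma_copy` helper and a made-up L1 size of 8 words; neither comes from the PULP SDK. The point is only that the program, not a cache controller, decides which tile of L2 data sits in L1 at any time.

```python
L1_WORDS = 8  # hypothetical scratchpad capacity, in words


def dma_copy(src, src_off, dst, dst_off, n):
    """Hypothetical stand-in for a DMA transfer: copy n words from src to dst."""
    dst[dst_off:dst_off + n] = src[src_off:src_off + n]


def sum_via_scratchpad(l2_data, l1_words=L1_WORDS):
    """Sum a large L2 array by staging it tile by tile through a small L1 buffer."""
    l1 = [0] * l1_words
    total = 0
    transfers = 0
    for base in range(0, len(l2_data), l1_words):
        n = min(l1_words, len(l2_data) - base)
        dma_copy(l2_data, base, l1, 0, n)  # software chooses what lives in L1
        transfers += 1
        total += sum(l1[:n])               # the core only ever reads L1
    return total, transfers


result = sum_via_scratchpad(list(range(20)))
```

With a cache, the staging step would happen implicitly on a miss. Here every transfer is an explicit call, so its timing is known to the programmer.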
Conflicts between instruction and data memory reads in such systems are handled by using multiple physical banks: the logarithmic (TCDM) interconnect allows two reads to proceed in parallel as long as they target different memory banks. The conflicts are reduced by placing the code and data into physically distinct memory blocks. In some systems that also contain accelerators, this leads to more elaborate banking designs. These differ from application to application, so it is not a generic method; we have some projects where we are more aggressive with these tricks and others where we are less so.
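The effect of placement can be illustrated with a small toy model in Python. It assumes one instruction fetch and one data access per cycle, and contiguous banks of a made-up size (0x400 bytes); the addresses are illustrative and do not correspond to the real PULPissimo memory map. A conflict is counted whenever both accesses in a cycle land in the same bank.

```python
def contiguous(bank_bytes):
    """Return a mapping from byte address to bank index for contiguous banks."""
    return lambda addr: addr // bank_bytes


def count_conflicts(instr_addrs, data_addrs, bank_of):
    """Count cycles in which the fetch and the data access hit the same bank."""
    return sum(
        1
        for ia, da in zip(instr_addrs, data_addrs)
        if bank_of(ia) == bank_of(da)
    )


bank_of = contiguous(0x400)                      # hypothetical bank size

instr = [0x000 + 4 * i for i in range(16)]       # sequential instruction fetches
data_same = [0x040 + 4 * i for i in range(16)]   # data placed in the same bank as code
data_far = [0x400 + 4 * i for i in range(16)]    # data placed in a different bank

conflicts_same = count_conflicts(instr, data_same, bank_of)  # 16: every cycle collides
conflicts_far = count_conflicts(instr, data_far, bank_of)    # 0: accesses run in parallel
```

In this sketch, moving the data into its own bank removes every conflict. On real hardware the linker script plays the role of the address lists here.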
Our Ariane/CVA6-based systems use traditional L1 data caches.
Hope that clarifies some of your questions.
Posts: 6
Threads: 1
Joined: Nov 2024
11-27-2024, 12:57 AM
(This post was last modified: 11-27-2024, 01:19 AM by jsen_che11.)
(11-26-2024, 06:44 PM)kgf Wrote: I think the naming might be a bit confusing. Our main research projects do not use a traditional data cache. Instead, we use a small local memory that is able to respond within a single cycle. The higher levels of the hierarchy are then managed differently: transfers go through a DMA, i.e. they are software controlled. In short, L1 refers to the level-1 memory and should not necessarily be read as "L1 cache". Note that the instruction cache in these systems works in the traditional way.
Thank you for your reply! I understand that "the conflicts are reduced by placing the code and data into physically distinct memory blocks." However, I still have a few questions that I’d like to confirm with you:
- In the image below, is the I$ on the left (in the riscy core) the smaller local memory you described for holding data, while Ibuf represents the traditional instruction cache? And do these two paths communicate with the L2 memory via the TCDM interconnect, corresponding to the s_lint_fc_data_bus and s_lint_fc_instr_bus signals in the code?
- Unlike PULPissimo, does the multicore system on the right have a dedicated data memory (TCDM) and an instruction cache, and do these two then exchange data with the L2 memory via the SoC bus? Is my understanding correct?
I would greatly appreciate it if you could reply.
Posts: 6
Threads: 1
Joined: Nov 2024
- I think I’ve figured out some of it. In the diagram on the left, I$ / Ibuf refer to the traditional instruction cache, while the data-side memory is not shown because it is just a single-cycle register.
- Unlike PULPissimo, does the multicore system on the right have a dedicated data memory (TCDM) and an instruction cache, and do these two then exchange data with the L2 memory via the SoC bus?
Is my current understanding correct? I sincerely hope to receive your response.