Sharing data between PULP and HWPE on HERO
#1
Hi,

My goal is to run an application on PULP that passes data to my PULP accelerator (HWPE) and vice versa. (I use the ARM standalone application only as a way of launching my PULP application; is that necessary at all? In any case, I don't want to share data between the ARM host and PULP, but between PULP and my HWPE.)
When I simulated my accelerator on PULPissimo, my PULP application stored data in the TCDM simply by accessing a known offset (there was no virtualization), and the accelerator could access the same data with the appropriate address. I want to have the same functionality with HERO, so how do I do it? If I use addresses like in PULPissimo, what are the proper addresses? (I was advised to use 0x1b201000, but it didn't work for me) If it's more complicated than just accessing some offset, how do I do it otherwise, and which Makefile do I use that will work with both the HWPE API and the data sharing API?

In addition, I have another question regarding compiling a PULP application (not accelerator related). I want to use the riscv-blas library in my PULP application. This library includes Fortran code. The riscv-toolchain in the HERO environment doesn't have gfortran, so I tried to build another riscv toolchain with gfortran in order to compile the library. However, when I try to link libgfortran.a (generated with the second toolchain) into the PULP application using the HERO scripts, I get a lot of link errors about standard C functions that are missing. So I was wondering, what is the proper way of compiling Fortran in a PULP application? Alternatively, if you're familiar with another way of using BLAS on PULP, I'd appreciate it if you could refer me to it.

Thanks,
Adi
#2
Hi Adi,


Quote:I use the ARM standalone application only as a way of launching my PULP application; is that necessary at all?
Yes, on HERO you need to use the standalone application to launch execution on PULP from the ARM host.

Quote:When I simulated my accelerator on PULPissimo, my PULP application stored data in the TCDM simply by accessing a known offset (there was no virtualization), and the accelerator could access the same data with the appropriate address. I want to have the same functionality with HERO, so how do I do it?
The TCDM of PULP is physically addressed on HERO, just like it is on PULPissimo. You should be able to use the same offset if it does not conflict with other data stored in the TCDM (e.g., by the RTE).
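For illustration, here is a minimal sketch of what this can look like from the PULP application's side. The TCDM base address and the buffer offset are placeholders; take the real values from your memory map (e.g., archi/memory_map.h) and make sure the offset does not collide with data placed in the TCDM by the runtime.
Code:
#include <stdint.h>

/* Placeholder values; replace them with the actual TCDM base of your
 * configuration and a free offset in the TCDM. */
#define TCDM_BASE        0x10000000u
#define SHARED_BUF_OFFS  0x00001000u

static volatile uint32_t *const shared_buf =
    (volatile uint32_t *)(TCDM_BASE + SHARED_BUF_OFFS);

void write_operands_for_hwpe(const uint32_t *src, unsigned n)
{
    /* The core writes the operands to the fixed TCDM location ... */
    for (unsigned i = 0; i < n; i++)
        shared_buf[i] = src[i];
    /* ... and the HWPE, programmed with the same physical address,
     * reads them back through its TCDM master ports. */
}
The same works in the opposite direction: the HWPE writes its results to an agreed-upon TCDM offset, and the cores read them from there.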

Quote:If I use addresses like in PULPissimo, what are the proper addresses? (I was advised to use 0x1b201000, but it didn't work for me)
Proper addresses for what? 0x1b201000 would be in the peripherals, and I think it should go to the HWPE. What do you mean by it did not work? What happens in an RTL simulation of bigPULP with your accelerator at the HWPE when you access that address?
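To see whether your accesses reach the accelerator at all, you could start from a sketch like the one below and watch the resulting transactions in the waveform. The base address is the one discussed above; the register offsets are purely illustrative and have to be replaced with the layout of your own HWPE register file.
Code:
#include <stdint.h>

#define HWPE_CFG_BASE  0x1b201000u
#define HWPE_REG(off)  (*(volatile uint32_t *)(HWPE_CFG_BASE + (off)))

/* Hypothetical register offsets; use your accelerator's register map. */
#define HWPE_TRIGGER_OFFS  0x00u
#define HWPE_STATUS_OFFS   0x04u

static inline void hwpe_start(void)
{
    HWPE_REG(HWPE_TRIGGER_OFFS) = 1;    /* write: should show up on the periph port */
}

static inline uint32_t hwpe_status(void)
{
    return HWPE_REG(HWPE_STATUS_OFFS);  /* read: r_data must not be x in the response */
}
If these accesses do not arrive at the periph interface of your wrapper in simulation, trace them upwards through the cluster hierarchy to see where they get lost.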

Quote:which Makefile do I use that will work with both the HWPE API and the data sharing API?
What data sharing API? Not HERO's, since you mentioned that you do not want to share data between host and PULP, correct? What does not work with the Makefile from the PULP SDK I sent you a while ago?

Quote:I tried to build another riscv toolchain with gfortran in order to compile the library. However, when I try to link libgfortran.a (generated with the second toolchain) into the PULP application using the HERO scripts, I get a lot of link errors about standard C functions that are missing.
What second toolchain did you use? The PULP SDK deliberately only implements a subset of the standard C library, so unless the other toolchain and the Fortran library are similarly specialized, a lot of mismatches can happen. Do you need the functions that are missing, or could you reduce the Fortran library to the required minimum and thus work around the missing functions?


Quote:So I was wondering, what is the proper way of compiling Fortran in a PULP application? Alternatively, if you're familiar with another way of using BLAS on PULP, I'd appreciate it if you could refer me to it.
Personally I do not have experience with Fortran and BLAS on PULP, but maybe other members of the team do.
#3
(01-08-2019, 09:57 AM)akurth Wrote:
Quote:If I use addresses like in PULPissimo, what are the proper addresses? (I was advised to use 0x1b201000, but it didn't work for me)
Proper addresses for what? 0x1b201000 would be in the peripherals, and I think it should go to the HWPE. What do you mean by it did not work? What happens in an RTL simulation of bigPULP with your accelerator at the HWPE when you access that address?

I followed your advice and ran an RTL simulation. I traced the relevant periph signals, and I see that data is written to some address which I didn't write any data to (I did read from it, but did not write to it). In addition, I noticed that the r_valid signal is active when the r_data contains x, so I think I didn't connect my HWPE to the cluster in the proper way. I changed xne_wrap.sv, created a top wrap for my accelerator, and used its instance instead of xne_top_wrap. This is the code for my wrap. I also set XNE_PRESENT to 1.
Do you know what the problem could be? Alternatively, is there a working example of an HWPE integrated into bigPULP?

Thanks,
Adi
#4
Quote:I see that data is written to some address which I didn't write any data to (I did read from it, but did not write to it)


A TCDM address? An address in your peripheral? Where does the write enable `data_we_o` get flipped from low to high so that a read becomes a write? Is it a read when it comes out of a RI5CY core?

Quote:I noticed that the r_valid signal is active when the r_data contains x

That's how completed writes are signaled in the protocol (see RI5CY Manual rev 2.2, section 3.2), so it is fine if this happens in response to a write. It is not okay as a read response, of course.

Quote:This is the code for my wrap.

I did not find obvious mistakes in this code. What do you observe on the `periph` port of `i_dp_top` when you try to read or write it from a core? If nothing happens there, what happens at the next higher hierarchy level? If you have problems accessing your peripheral, they can usually be solved by tracing accesses from the core to the peripheral.

Quote:Alternatively, is there a working example of an HWPE integrated into bigPULP?

Not yet, but I think you are well on your way to creating one, right? :-)
#5
Hi Andreas,

After some debugging, I managed to find the problem.
First, in xne_wrap.sv I believe it's supposed to be: 
.periph_wen       (hwacc_cfg_slave.wen    ) 
And not:
.periph_wen       (~hwacc_cfg_slave.wen    )

In addition, in cluster_clock_gating.sv the clock gating is not really implemented, and I understand this is because you can't do it on the FPGA. However, with the existing code, the functionality in hwpe_ctrl_regfile_latch.sv seems to be broken, since it looks like it relies on clock gating.
The consequence is that when I write some value to a certain register, all the registers get this value.
To make it work, I used the clock gating code from PULPissimo. However, I believe this works only in simulation and not on the FPGA. How do I fix the problem?

Edit
For now, I fixed it by using the original cluster_clock_gating.sv file (with no clock gating), and changed the condition in the latch_wdata process in hwpe_ctrl_regfile_latch.sv to:
Code:
if ((ClocksxC[k][l] == 1'b1) && (WAddrOneHotxD[k][l]))
Instead of:
Code:
if( ClocksxC[k][l] == 1'b1)
This worked for me in simulation; I haven't tried it on the FPGA yet.

Thanks,
Adi
#6
Hi Adi,

Great that you figured this out and thanks for sharing your solution!

I will look at this in more detail but am very busy right now. It's solved for you at the moment, correct?
#7
(01-25-2019, 08:23 PM)akurth Wrote: I will look at this in more detail but am very busy right now. It's solved for you at the moment, correct?

Yes, thanks.
#8
Hi Andreas,

How do I change the TCDM size in HERO?

Thanks,
Adi
#9
(01-31-2019, 01:01 PM)Adi Wrote: How do I change the TCDM size in HERO?

You have to change the TCDM_SIZE in pulp_soc_defines.sv. If it should exceed 2 MiB, you additionally have to adapt the address map of the cluster bus in cluster_bus_defines.sv. In that case, you will have to update address offsets in the SDK (archi/memory_map.h).

If you are using the PULP SDK for allocating memory in the TCDM, you should change ARCHI_L1_SIZE in archi/properties.h.
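As a rough sketch, the SDK-side change could look like this; the value is a placeholder for a 512 KiB TCDM and must agree with whatever you set TCDM_SIZE to in the RTL.
Code:
/* archi/properties.h (PULP SDK), illustrative value only:
 * must match TCDM_SIZE in pulp_soc_defines.sv. */
#define ARCHI_L1_SIZE 0x00080000   /* e.g., 512 KiB */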

Make sure to resynthesize and implement all three HW projects (cluster, SoC, and bigpulp) and rebuild the PULP SDK and all your applications.
#10
(02-04-2019, 07:46 PM)akurth Wrote: You have to change the TCDM_SIZE in pulp_soc_defines.sv. If it should exceed 2 MiB, you additionally have to adapt the address map of the cluster bus in cluster_bus_defines.sv. In that case, you will have to update address offsets in the SDK (archi/memory_map.h).

Thanks!
How do I do the same thing in PULPissimo? I see that there is no such constant in pulp_soc_defines.sv in PULPissimo. Do I change TCDM_END_ADDRESS in soc_interconnect.sv?

