Modifications of the HERO platform
#1
Hello,

I'm very interested in the HERO platform on the ZC706 FPGA board. I tried to modify bigPULP to define 2 clusters of 3 cores each. To do this, I changed the number of clusters and cores in the bigpulp/fe/rtl/includes/pulp_soc_defines.sv file and in the hero_sdk/pulp_sdk/pulp_configs/configs/systems/hero-z-7045.json file.

To generate the bitstream, I followed the steps described at https://github.com/pulp-platform/bigpulp. To generate the SDK, I followed the steps described at https://pulp-platform.org/hero/doc/software/.

However, I was not able to run the "hero-openmp-examples" programs. In the case of the mm-large program, the matrix multiplication executes correctly on the ARM host, but the part that should be offloaded to bigPULP hangs.

Did I modify the right files for the bigPULP part and the HERO SDK part?
If so, do you have any idea what the problem could be?



Thanks in advance.
#2
Hello Olivier,

The configuration for the ZC706 does not natively have multiple clusters, but the `JUNO` configuration does.  I suggest you compare those two configurations and import the pieces related to multiple clusters from the `JUNO` configuration.

Additionally, I suggest you verify the multi-cluster configuration step by step.  First, open an instance of your favorite RTL simulator and check whether all clusters and cores are connected properly.  In particular, I think we never tested a non-power-of-two number of cores, so that might cause problems.  Second, once you are confident in this, modify the SDK and create a simple binary where all cores print something, and run it in RTL simulation.  Third, synthesize in Vivado and check in the post-implementation schematic that the connections are correct.  Fourth, generate a bitstream and do manual memory accesses on the board to check whether both clusters with their cores are there.  Fifth, modify the host libraries and driver and compile them with maximum debug level, then try to offload simple programs.  Finally, try to offload a more realistic application, such as `mm-large`.
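
For the "all cores print something" step, a minimal sketch could look like the one below. It is plain C with OpenMP and is not taken from the HERO SDK: the function name `count_reporting_cores` and the bound `MAX_CORES` are made up for illustration, and on the PULP side you would replace `omp_get_thread_num()` with whatever core/cluster ID query your runtime provides. Built without OpenMP, it falls back to a sequential loop so that it still compiles and runs on a workstation.

```c
#include <stdio.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_CORES 16 /* hypothetical upper bound, only for this sketch */

/* Let every core (thread) print one line and record that it reported in.
 * Returns the number of distinct IDs seen, or -1 for an invalid request.
 * A result smaller than expected_cores means some cores never showed up. */
int count_reporting_cores(int expected_cores)
{
    int seen[MAX_CORES];
    int count = 0;

    if (expected_cores < 1 || expected_cores > MAX_CORES)
        return -1;
    memset(seen, 0, sizeof seen);

#ifdef _OPENMP
    #pragma omp parallel num_threads(expected_cores)
    {
        int id = omp_get_thread_num();
        printf("hello from core %d of %d\n", id, omp_get_num_threads());
        seen[id] = 1; /* each thread writes only its own slot */
    }
#else
    /* No OpenMP available: emulate the cores one after another. */
    for (int id = 0; id < expected_cores; id++) {
        printf("hello from core %d of %d\n", id, expected_cores);
        seen[id] = 1;
    }
#endif

    for (int i = 0; i < expected_cores; i++)
        count += seen[i];
    return count;
}
```

Calling `count_reporting_cores(3)` from `main` and comparing the return value with 3 gives a quick pass/fail signal in addition to the printed lines, which is handy when you only have a simulation log to look at.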
#3
Hello,

Thanks for these pointers. I chose the configuration with 2 clusters of 3 cores because the bitstream could not be generated for the configuration with 2 clusters of 4 cores (all the BRAM was used).
I will try to follow these steps, and to generate the configuration with 2 clusters of 4 cores more carefully.

Best.

