By "userland", do you mean the thread running on the ARM under Linux?  If so, you can use standard GDB.  The multi-core PULP accelerator currently does not have a debug interface, so applications running there have to be debugged with `printf`s or in simulation.
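
For the `printf` route, a small tracing macro can make the output easier to follow. The following is only a minimal sketch in plain C: the names `TRACE`, `DEBUG_TRACE`, and `sum_array` are hypothetical and not part of any PULP SDK, and it assumes `printf` output from the accelerator reaches a console you can read.

```c
#include <stdio.h>

/* Hypothetical compile-time switch: build with -DDEBUG_TRACE=0 to silence the traces. */
#ifndef DEBUG_TRACE
#define DEBUG_TRACE 1
#endif

/* Prefix each message with file and line so traces from different
 * code paths can be told apart. Takes a format string plus at least
 * one argument. */
#define TRACE(fmt, ...)                                                   \
    do {                                                                  \
        if (DEBUG_TRACE)                                                  \
            printf("[%s:%d] " fmt "\n", __FILE__, __LINE__, __VA_ARGS__); \
    } while (0)

/* Hypothetical example kernel: sums an array and traces the running total. */
static int sum_array(const int *data, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++) {
        acc += data[i];
        TRACE("i=%d acc=%d", i, acc);
    }
    return acc;
}
```

Calling `sum_array` on a small array prints one trace line per iteration, which is often enough to see where a computation goes wrong. Keeping the traces behind a single switch lets you remove them without touching the code being debugged.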


