After some investigation I finally worked out how most of the gdb-stub works and how to get it running. I also found out what doesn't work, and there are still a few things I don't fully understand.
Once I had a stub that I could download to the target's ROM (in this case an EPROM emulator) and that communicated properly with the gdb running on the host, I built a C runtime assembly file (crt0.s) and a linker command to place the program in the available RAM. Since I had already done similar work for the stub running in ROM, it was only a matter of changing some parameters and some basic code in the startup routine.
The test program was very simple. I didn't want to test any advanced features, just simple program flow with loops.
When I added a coded breakpoint to the program, everything worked consistently: I could continue without a worry, and control flow would be transferred back to the stub when the breakpoint was hit.
But things went wrong as soon as I added a gdb breakpoint. What I expected was that gdb(host) would read the instruction at the breakpoint address and replace it with a breakpoint opcode (TRAP). From reading the gdb-stub code and most internet references, a breakpoint should be a TRAP #1 instruction, so I had set up the gdb-stub to intercept TRAP #1 instructions and report them to gdb(host) as breakpoints. Yet every time I placed a breakpoint I got a privilege violation at the address following the breakpoint.
The problem was that gdb(host), instead of inserting a TRAP #1 (0x4e41), inserted a TRAP #F (0x4e4f) (!!!), as the exchange below shows:
I set a breakpoint at line 20 of the code. gdb(host) tries a Z0 command, but the stub doesn't implement it; gdb(host) acknowledges that, then reads the code at the breakpoint address (m4074,2 with reply 2039), and then writes a trap instruction to that address (M4074,2:4e4f).
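To make the write packet above easier to read, here is a minimal sketch of how a stub could split the payload of an M packet into address, length and data bytes. This is not the code from my stub; the names `mem_write`, `hex_value` and `parse_mem_write` are made up for illustration, and the packet framing ($...#checksum) is assumed to have been stripped already.

```c
#include <stdlib.h>

/* Fields of a GDB remote "write memory" payload: M<addr>,<length>:<hex bytes> */
struct mem_write {
    unsigned long addr;
    unsigned long length;
    unsigned char data[16];   /* plenty for a 2-byte breakpoint write */
};

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse an M payload (hypothetical helper). Returns 0 on success, -1 if malformed. */
static int parse_mem_write(const char *payload, struct mem_write *out)
{
    char *end;
    const char *p;
    unsigned long i;

    if (payload[0] != 'M') return -1;

    out->addr = strtoul(payload + 1, &end, 16);      /* "4074" */
    if (end == payload + 1 || *end != ',') return -1;

    p = end + 1;
    out->length = strtoul(p, &end, 16);              /* "2" */
    if (end == p || *end != ':') return -1;
    if (out->length > sizeof out->data) return -1;

    p = end + 1;                                     /* "4e4f" */
    for (i = 0; i < out->length; i++) {
        int hi, lo;
        hi = hex_value(p[2 * i]);
        if (hi < 0) return -1;                       /* also stops at end of string */
        lo = hex_value(p[2 * i + 1]);
        if (lo < 0) return -1;
        out->data[i] = (unsigned char)((hi << 4) | lo);
    }
    return 0;
}
```

Fed with "M4074,2:4e4f", this yields address 0x4074, length 2 and the two bytes 0x4e 0x4f, which is exactly the opcode gdb(host) wrote over my instruction.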
After several tries I decided to check "everything" that went in and out of the stub, and saw that the instruction is in fact a TRAP #F and not a TRAP #1 as I had been led to believe. Naturally I had not set up the gdb-stub to catch this exception, so the stub would return to the address following the trap, which is not an instruction, hence the privilege violation.
This problem can probably be solved with some gdb configuration setting, but I haven't found it yet. If I do, I'll post it back here.
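For reference, the two opcodes differ only in their low four bits: on the 68000, TRAP #n is encoded as 0x4E40 + n and uses exception vector 32 + n. A small sketch of how a stub could check which trap it is looking at follows; the helper names `trap_number` and `trap_vector` are mine, not something from the gdb-stub sources.

```c
#include <stdint.h>

#define TRAP_OPCODE_BASE 0x4E40u  /* opcode of TRAP #0 */
#define TRAP_VECTOR_BASE 32       /* exception vector number of TRAP #0 */

/* Return the trap number (0..15) if opcode is a TRAP #n, otherwise -1. */
static int trap_number(uint16_t opcode)
{
    if ((opcode & 0xFFF0u) != TRAP_OPCODE_BASE)
        return -1;
    return opcode & 0x000F;
}

/* Exception vector number taken by TRAP #n, for n in 0..15. */
static int trap_vector(int n)
{
    return TRAP_VECTOR_BASE + n;
}
```

So 0x4e41 decodes to trap 1 and 0x4e4f to trap 15 (vector 47), which is the vector the stub has to hook if gdb(host) keeps inserting TRAP #F.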
The second problem only shows up once these breakpoints work. After the breakpoint the PC points to breakpoint+2, so before you continue you need to roll the PC back by two. This can be done either by the stub or by gdb itself. I chose to do it in the stub, but it should be possible to do it in gdb(host) (there must be some configuration option for that too...). The catch with either approach is that once a breakpoint is hit, it must be disabled before execution continues.
Just before you continue, gdb(host) replaces the instruction at the breakpoint address with a trap, and then execution continues. When the breakpoint is hit, control is given back to the stub and the trap at the breakpoint address is replaced by the original instruction. If the PC is rolled back by two, then at the next continue the same breakpoint will be hit again without executing anything... so we are stuck. One option would be to control everything from gdb(host): before continuing, single-step one instruction (the one at the breakpoint), then replace the breakpoint instruction with a trap and continue. I will have to investigate a bit further to see whether this is already implemented in gdb, but it seems (as Mythbusters would say) Plausible.
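The single-step idea can be sketched on a toy memory model like this. Everything here is an assumption made for illustration: the `breakpoint` struct, the `memory` array, and especially `toy_step`, which stands in for a real single step and pretends every instruction is one word long. It only shows the order of operations, not how gdb actually does it.

```c
#include <stdint.h>

#define TRAP_15 0x4E4Fu                 /* opcode gdb(host) wrote in my case */

struct breakpoint {
    uint32_t addr;                      /* address of the patched instruction */
    uint16_t saved;                     /* original opcode word */
    int inserted;                       /* non-zero while the trap is in memory */
};

static uint16_t memory[0x8000];         /* toy target memory: 64 KiB of words */
static uint16_t last_executed;          /* opcode seen by the toy stepper */

static uint16_t *word_at(uint32_t addr) { return &memory[addr >> 1]; }

/* Save the original word and patch in the trap. */
static void bp_insert(struct breakpoint *bp)
{
    if (bp->inserted) return;
    bp->saved = *word_at(bp->addr);
    *word_at(bp->addr) = TRAP_15;
    bp->inserted = 1;
}

/* Put the original word back. */
static void bp_remove(struct breakpoint *bp)
{
    if (!bp->inserted) return;
    *word_at(bp->addr) = bp->saved;
    bp->inserted = 0;
}

/* Toy stand-in for a single step: record the opcode at pc, advance one word. */
static uint32_t toy_step(uint32_t pc)
{
    last_executed = *word_at(pc);
    return pc + 2;
}

/* Resume after a stop: if we are sitting on the breakpoint, step over the
 * original instruction first, and only then re-arm the trap. Returns the pc
 * from which the program can run freely. */
static uint32_t resume_from(struct breakpoint *bp, uint32_t pc)
{
    if (pc == bp->addr) {
        bp_remove(bp);                  /* make sure the real instruction is there */
        pc = toy_step(pc);              /* execute just that instruction */
    }
    bp_insert(bp);                      /* breakpoint is armed again */
    return pc;
}
```

The point of the ordering is that the trap is never in memory while the PC sits on the breakpoint address, so the rolled-back PC no longer hits the same breakpoint immediately.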
If you are also curious why the coded breakpoint (TRAP #0) above is different from the gdb-inserted breakpoint (TRAP #1 or #F), it is because gdb-inserted breakpoints must roll back the PC, while coded ones do not.
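In the stub this distinction boils down to a couple of lines. The sketch below assumes TRAP #0 for coded breakpoints and TRAP #F for the ones gdb(host) inserts, as in my setup; the constant names and `reported_pc` are hypothetical, and `stacked_pc` is the PC saved by the exception, which already points past the two-byte trap.

```c
#include <stdint.h>

enum {
    CODED_BREAK_TRAP = 0,   /* TRAP #0 compiled into the program */
    GDB_BREAK_TRAP   = 15   /* TRAP #F patched in by gdb(host) */
};

/* PC value the stub should report to gdb(host) and resume from. */
static uint32_t reported_pc(int trap_no, uint32_t stacked_pc)
{
    if (trap_no == GDB_BREAK_TRAP)
        return stacked_pc - 2;  /* back onto the replaced instruction */
    return stacked_pc;          /* coded trap: carry on after it */
}
```

For the breakpoint at 0x4074 the exception stacks 0x4076, and the stub reports 0x4074; a coded TRAP #0 at the same place would simply resume at 0x4076.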