BigEd wrote: ↑
Sun May 17, 2020 8:55 am
Ah, by slow path I was thinking of a conventional instruction-at-a-time emulator.
I had another thought: outside of copy protection, what forms of self-modifying code do we see? I would expect it's mostly a tactic used to save a few cycles, by patching operands to avoid indirection, possibly doing it in zero page to save a few more.
Sprite loops tend to use a lot of self-modification. As you note, hosting in the zero page is a useful trick! I hit that with both Wizadore and Stryker's Run (same author) and that was annoying to implement in the JIT.
But it's not always so simple. If we take Exile, which is optimized to within an inch of its life, there's a large variety of self-modification, including a lot of opcode self-modification.
A search for "self mod" in the Exile annotated code is a bit of an eye-opener: http://www.level7.org.uk/miscellany/exi ... sembly.txt
Would it be possible to look at these routines macroscopically, and perform the actions they perform, separating out the incidental write to the code stream from the important writes to memory or peripheral?
Yeah, it's probably a productive direction. If we look at the start of the Galaforce sprite routine at $0B00, it goes wild with self-modifying stores:
(6502db) d b00
[ITRP] 0B00: STA $0B4F
[ITRP] 0B03: STA $0B60
[ITRP] 0B06: STA $0B71
[ITRP] 0B09: STA $0B82
[ITRP] 0B0C: STA $0B93
[ITRP] 0B0F: STA $0BA4
[ITRP] 0B12: STX $0B56
[ITRP] 0B15: STX $0B67
[ITRP] 0B18: STX $0B78
[ITRP] 0B1B: STX $0B89
[ITRP] 0B1E: STX $0B9A
[ITRP] 0B21: STX $0BAB
(a bunch more follow too!)
By far the most common case is for self-modifying stores to use the plain abs mode, so the JIT engine has a fully resolved address at compile time. It knows whether x64 code has been compiled at any given address and can act accordingly, including optimizing away the "self-modification stamp" for stores to non-code addresses.
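As a rough illustration of that compile-time decision (a sketch only, not beebjit's actual code; the class, method names and operation labels below are all made up for the example), the choice for a single store might look like this:

```python
MEM_SIZE = 0x10000  # 64 KiB 6502 address space


class StorePlanner:
    """Hypothetical model of a JIT deciding what to emit for a 6502 store."""

    def __init__(self):
        # Assumed bookkeeping: one flag per 6502 address saying whether
        # host code has been compiled from the byte at that address.
        self.has_code = [False] * MEM_SIZE

    def mark_compiled(self, start, length):
        # Record that host code now exists for [start, start + length).
        for addr in range(start, min(start + length, MEM_SIZE)):
            self.has_code[addr] = True

    def plan_store(self, mode, addr):
        # Indexed and indirect modes: the target is only known at run time,
        # so the emitted code has to check it there.
        if mode != "abs":
            return ["store", "check_and_invalidate"]
        # Absolute mode: the target is fully resolved at compile time.
        if self.has_code[addr]:
            return ["store", "invalidate"]
        # No compiled code at the target: the stamp can be left out.
        return ["store"]


planner = StorePlanner()
planner.mark_compiled(0x0B00, 0x100)      # the sprite routine's page
print(planner.plan_store("abs", 0x0B4F))  # store into the routine itself
print(planner.plan_store("abs", 0x7C00))  # plain data store
```

The "store only" answer is only valid for as long as nothing gets compiled at the target address later, so a real engine would need some way to revisit that decision.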
At a minimum, coalescing the cache invalidations would be doable. I do worry that a more complicated scheme would be prone to bugs. For example, if we emit compiled code that contains assumptions about what is or is not at a given store's target address, those assumptions are themselves prone to invalidation. And then those invalidations might cascade across blocks, and so on.
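To make the coalescing idea concrete, here is a minimal sketch. It assumes a 64-byte cache line and uses a hypothetical helper name; the Galaforce store targets are used as stand-in addresses, whereas a real JIT would be flushing the host addresses of the affected compiled code.

```python
def coalesce(addresses, line_size=64):
    """Merge dirty addresses into the fewest [start, end) ranges of whole
    cache lines, so each range needs only one invalidation."""
    # Collapse addresses to distinct cache line indices, in ascending order.
    lines = sorted({addr // line_size for addr in addresses})
    ranges = []
    for line in lines:
        start = line * line_size
        end = start + line_size
        if ranges and ranges[-1][1] == start:
            # This line directly follows the previous range: extend it.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


# Targets of the six STA instructions at the top of the sprite routine.
pending = [0x0B4F, 0x0B60, 0x0B71, 0x0B82, 0x0B93, 0x0BA4]
for start, end in coalesce(pending):
    print(f"invalidate ${start:04X}-${end - 1:04X}")
```

Under those assumptions the six stores fall in two adjacent lines, so they collapse into a single invalidation.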
Actually, having just written all that, when running Galaforce, beebjit reaches a steady state where there are no compilations occurring. This is because all self-modified operands are spotted and recompiled to just load the latest operand from 6502 memory. So although there's a lot of self-modifying going on, there are no recompiles (or potential ARM cache invalidations) going on. In the short term, I'm going to work on getting even "difficult" cases like Exile running with no recompiles in the steady state.
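A toy model of that steady state, for anyone following along (again a sketch under assumptions: the class and its fields are invented for illustration, and it only handles a single LDA #imm instruction):

```python
class OperandJit:
    """Toy JIT: bakes an operand in until it is seen to be modified,
    then recompiles once to fetch it from 6502 memory instead."""

    def __init__(self, memory):
        self.mem = memory
        self.dynamic = set()     # operand addresses known to be self-modified
        self.compiled = {}       # 6502 pc -> compiled callable
        self.baked_owner = {}    # baked operand address -> owning pc
        self.compile_count = 0

    def compile_lda_imm(self, pc):
        # Compile LDA #imm at pc (opcode at pc, operand at pc + 1).
        self.compile_count += 1
        operand_addr = pc + 1
        if operand_addr in self.dynamic:
            # Known self-modified: always load the latest operand.
            self.compiled[pc] = lambda: self.mem[operand_addr]
        else:
            # Not seen modified yet: bake the current value in.
            baked = self.mem[operand_addr]
            self.compiled[pc] = lambda: baked
            self.baked_owner[operand_addr] = pc

    def write(self, addr, value):
        self.mem[addr] = value
        pc = self.baked_owner.pop(addr, None)
        if pc is not None:
            # A baked operand was overwritten: remember it and invalidate.
            self.dynamic.add(addr)
            del self.compiled[pc]

    def run(self, pc):
        if pc not in self.compiled:
            self.compile_lda_imm(pc)
        return self.compiled[pc]()


mem = bytearray(0x10000)
mem[0x0B00], mem[0x0B01] = 0xA9, 1   # LDA #$01
jit = OperandJit(mem)
jit.run(0x0B00)                      # first compile, operand baked in
for value in (2, 3, 4):
    jit.write(0x0B01, value)         # self-modifying store to the operand
    assert jit.run(0x0B00) == value
print(jit.compile_count)
```

After the first modification the block is recompiled once; every later write to the operand is picked up without any further compilation.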
That is to say, a self-modifying sprite routine could be recognised and special-cased to run at full speed. A self-modifying decryption routine will be emulated conventionally.