The scrolling in XeO3 takes a long time. Every game cycle I do this:
ldx #39
ScrollLoop
lda BackBuffer,x
sta HWScreen2+$400+(40*00),x
lda BackBuffer,x
sta HWScreen2+$400+(40*01),x
lda BackBuffer,x
sta HWScreen2+$400+(40*02),x
lda BackBuffer,x
sta HWScreen2+$400+(40*03),x
lda BackBuffer,x
sta HWScreen2+$400+(40*04),x
lda BackBuffer,x
sta HWScreen2+$400+(40*05),x
lda BackBuffer,x
sta HWScreen2+$400+(40*06),x
lda BackBuffer,x
sta HWScreen2+$400+(40*07),x
lda BackBuffer,x
sta HWScreen2+$400+(40*08),x
lda BackBuffer,x
sta HWScreen2+$400+(40*09),x
lda BackBuffer,x
sta HWScreen2+$400+(40*10),x
lda BackBuffer,x
sta HWScreen2+$400+(40*11),x
lda BackBuffer,x
sta HWScreen2+$400+(40*12),x
lda BackBuffer,x
sta HWScreen2+$400+(40*13),x
lda BackBuffer,x
sta HWScreen2+$400+(40*14),x
lda BackBuffer,x
sta HWScreen2+$400+(40*15),x
lda BackBuffer,x
sta HWScreen2+$400+(40*16),x
lda BackBuffer,x
sta HWScreen2+$400+(40*17),x
lda BackBuffer,x
sta HWScreen2+$400+(40*18),x
lda BackBuffer,x
sta HWScreen2+$400+(40*19),x
lda BackBuffer,x
sta HWScreen2+$400+(40*20),x
dex
jpl ScrollLoop
rts
This code is self-modified to address the new location of the back buffer, and I have to use a jpl (macro) since a normal branch is just out of reach, so this takes (40*21*9)+(40*7) = 7840 cycles. (This is approximate, as there are also page boundary crossings hidden in here.)
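The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 4 and 5 cycle costs for `lda abs,x` and `sta abs,x` are the usual 6502 timings with no page crossing, the 7 cycles of loop overhead per pass is the figure used in the post, and the name `scroll_cycles_6502` is made up for this example.

```python
COLUMNS = 40        # bytes per screen line, one loop pass per column
ROWS = 21           # unrolled lda/sta pairs per pass
LDA_ABS_X = 4       # lda abs,x with no page crossing
STA_ABS_X = 5       # sta abs,x
LOOP_OVERHEAD = 7   # dex + jpl macro, as counted in the post


def scroll_cycles_6502(columns=COLUMNS, rows=ROWS):
    """Rough cycle count for the unrolled scroll loop, ignoring page crossings."""
    copy = columns * rows * (LDA_ABS_X + STA_ABS_X)
    overhead = columns * LOOP_OVERHEAD
    return copy + overhead


print(scroll_cycles_6502())  # 7840
```

Every page crossing on an `lda abs,x` adds one cycle on top of this, which is why the figure is only approximate.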
Now in 65816, I can do exactly the same, but being 16 bit, the loop is half as long, and although we add a couple more cycles for LDA/STA, it's still much quicker. So the loop is now (20*21*11)+(40*7) = 4900 cycles.
And lastly, the 65816 has the block transfer instructions MVN and MVP, which are like the Z80's LDIR instruction, which means (BEST case) it's now (20*21*7) = 2940 cycles. Although the block transfer would be broken up a little more (mainly to do lines), it's still only going to be around 3000. So not only is it more than twice the speed of the 6502 version, but we have the new 20MHz clock as well.
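To put the three figures side by side, here is a small Python sketch that takes the post's totals at face value (they are estimates, not measurements) and works out the speedup over the 6502 loop and the time each 65816 version would take at 20MHz. The dictionary keys and function names are invented for this example.

```python
# Cycle totals exactly as estimated in the post (not measured).
estimates = {
    "6502 byte loop": 40 * 21 * 9 + 40 * 7,
    "65816 16-bit loop": 20 * 21 * 11 + 40 * 7,
    "65816 block move (best case)": 20 * 21 * 7,
}

CLOCK_HZ = 20_000_000  # the 20MHz clock mentioned in the post


def speedup(name, baseline="6502 byte loop"):
    """How many times fewer cycles `name` needs than the baseline."""
    return estimates[baseline] / estimates[name]


def microseconds(name, clock_hz=CLOCK_HZ):
    """Time for one scroll pass at the given clock rate."""
    return estimates[name] / clock_hz * 1_000_000


for name in ("65816 16-bit loop", "65816 block move (best case)"):
    print(f"{name}: {estimates[name]} cycles, "
          f"{speedup(name):.2f}x fewer cycles, "
          f"{microseconds(name):.0f} us at 20MHz")
```

The speedup here is counted in cycles only; the faster clock comes on top of that.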
..............Bitmap blitting suddenly becomes REALLY interesting!!