=============================================================================
 Anomie's Register Doc
 $Revision: 1157 $
 $Date: 2007-07-12 16:39:41 -0400 (Thu, 12 Jul 2007) $
 <anomie@users.sourceforge.net>
=============================================================================


REGISTERS
=========

Addr rw?fvha Name
        bits

        Explanation

"Addr" is the address this register is mapped into the SNES memory space.
"Name" is the official and unofficial name of the register
"bits" is either 8 or 16 characters explicating the bitfields in this register.

The flags are:
rw?fvha
||||||+--> '+' if it can be read/written at any time, '-' otherwise
|||||+---> '+' if it can be read/written during H-Blank
||||+----> '+' if it can be read/written during V-Blank
|||+-----> '+' if it can be read/written during force-blank
||+------> Read/Write style: 'b'     => byte
||                           'h'/'l' => read/write high/low byte of a word
||                           'w'     => word read/write twice low then high
|+-------> 'w' if the register is writable for an effect
+--------> 'r' if the register is readable for a value or effect (i.e. not
            open bus).

To find the entry for a particular register, search for the register number
(e.g. '2100') at the very beginning of the line. Note that the DMA registers
are combined, so e.g. to find $4300, $4310, $4320, $4330, $4340, $4350, $4360,
or $4370 you'd search for '43x0'.

For most registers (and most undefined bits of readable registers), the
returned value is Open Bus, that is the last value read over the main bus from
the ROM (typically part of the opcode arguments or the indirect base address).

Registers matching $21x4-6 or $21x8-A (where x is 0-2) return the last value
read from any of the PPU1 registers $2134-6, $2138-A, or $213E. This is known
as PPU1 Open Bus. Similarly, PPU2 Open Bus involves reading registers $213B-D
or $213F (NOT $21xB-D though).

Note that it may be possible to write registers anytime even if marked '-', but
until we have proof, '-' is a better guess.

--------

2100  wb++++ INIDISP - Screen Display
        x---bbbb

        x    = Force blank on when set.
        bbbb = Screen brightness, F=max, 0="off".

        Note that force blank CAN be disabled mid-scanline. However, this can
        result in glitched graphics on that scanline, as the internal rendering
        buffers will not have been updated during force blank. Current theory
        is that BGs will be glitched for a few tiles (depending on how far in
        advance the PPU operates), and OBJ will be glitched for the entire
        scanline.

        Also, writing this register on the first line of V-Blank (225 or 240,
        depending on overscan) when force blank is currently active causes the
        OAM Address Reset to occur.
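
        To illustrate the bit layout above, here is a minimal C sketch that
        builds a value for $2100. The helper name inidisp_value is
        hypothetical and not taken from any SDK.

```c
#include <stdint.h>

/* Build a value for $2100 (layout x---bbbb).
 * force_blank: nonzero sets bit 7.
 * brightness:  0 ("off") through 15 (max); higher bits are discarded. */
uint8_t inidisp_value(int force_blank, unsigned brightness)
{
    return (uint8_t)((force_blank ? 0x80 : 0x00) | (brightness & 0x0F));
}
```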


2101  wb++?- OBSEL - Object Size and Chr Address
        sssnnbbb

        sss  = Object size:
            000 =  8x8  and 16x16 sprites
            001 =  8x8  and 32x32 sprites
            010 =  8x8  and 64x64 sprites
            011 = 16x16 and 32x32 sprites
            100 = 16x16 and 64x64 sprites
            101 = 32x32 and 64x64 sprites
            110 = 16x32 and 32x64 sprites ('undocumented')
            111 = 16x32 and 32x32 sprites ('undocumented')

        nn   = Name Select
        bbb  = Name Base Select (Addr>>14)
            See the section "SPRITES" below for details.
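
        The size table above can be written as a lookup, sketched here in C.
        The names ObjSize and obsel_size are hypothetical, and the assumption
        that the first size listed is the "small" one and the second the
        "large" one follows the order of the table.

```c
#include <stdint.h>

typedef struct { uint8_t w, h; } ObjSize;

/* Indexed by sss (bits 5-7 of $2101): [sss][0] = small, [sss][1] = large. */
static const ObjSize obj_sizes[8][2] = {
    {{ 8,  8}, {16, 16}},
    {{ 8,  8}, {32, 32}},
    {{ 8,  8}, {64, 64}},
    {{16, 16}, {32, 32}},
    {{16, 16}, {64, 64}},
    {{32, 32}, {64, 64}},
    {{16, 32}, {32, 64}},   /* 'undocumented' */
    {{16, 32}, {32, 32}},   /* 'undocumented' */
};

/* Return the sprite size selected by an OBSEL value. */
ObjSize obsel_size(uint8_t obsel, int large)
{
    return obj_sizes[(obsel >> 5) & 7][large ? 1 : 0];
}
```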


2102  wl++?- OAMADDL - OAM Address low byte
2103  wh++?- OAMADDH - OAM Address high bit and Obj Priority
        p------b aaaaaaaa

        p    = Obj Priority activation bit
            When this bit is set, an Obj other than Sprite 0 may be given
            priority. See the section "SPRITES" below for details.

        b aaaaaaaa = OAM address
            This can be thought of in two ways, depending on your conception of
            OAM. If you consider OAM as a 544-byte table, baaaaaaaa is the word
            address into that table. If you consider OAM to be a 512-byte table
            and a 32-byte table, b is the table selector and aaaaaaaa is the
            word address in the table. See the section "SPRITES" below for
            details.
        
        The internal OAM address is invalidated when scanlines are being
        rendered. This invalidation is deterministic, but we do not know how
        it is determined. Thus, the last value written to these registers is
        reloaded into the internal OAM address at the beginning of V-Blank if
        that occurs outside of a force-blank period. This is known as 'OAM
        reset'. 'OAM reset' also occurs on certain writes to $2100.

        Writing to either $2102 or $2103 resets the entire internal OAM Address
        to the values last written to this register. E.g., if you set $104 to
        this register, write 4 bytes, then write $1 to $2103, the internal OAM
        address will point to word 4, not word 6.


2104  wb++-- OAMDATA - Data for OAM write
        dddddddd

        Note that OAM writes are done in an odd manner, in particular
        the low table of OAM is not affected until the high byte of a
        word is written (however, the high table is affected
        immediately). Thus, if you set the address, then alternate writes and
        reads, OAM will never be affected until you reach the high table!

        Similarly, if you set the address to 0, then write 1, 2, read, then
        write 3, OAM will end up as "01 02 01 03", rather than "01 02 xx 03" as
        you might expect.

        Technically, this register CAN be written during H-blank (and probably
        mid-scanline as well). However, due to OAM address invalidation the
        actual OAM byte written will probably not be what you expect. Note that
        writing during force-blank will only work as expected if that
        force-blank was begun during V-Blank, or (probably) if $2102/3 have
        been reset during that force-blank period.

        See the section "SPRITES" below for details.
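
        The write behavior described above can be modeled roughly as follows.
        This is a sketch in C, not a verified emulation: the Oam struct and
        function names are hypothetical, the internal address is treated as a
        byte address, address invalidation is ignored, and wrapping at 544
        bytes is a simplifying assumption.

```c
#include <stdint.h>

typedef struct {
    uint8_t  mem[544];  /* 512-byte low table followed by 32-byte high table */
    uint16_t addr;      /* internal byte address, 0..543 */
    uint8_t  latch;     /* pending low byte of a low-table word */
} Oam;

/* Model a write to $2104. */
void oam_write(Oam *o, uint8_t v)
{
    if (o->addr >= 512) {
        o->mem[o->addr] = v;                /* high table: immediate */
    } else if ((o->addr & 1) == 0) {
        o->latch = v;                       /* low byte: latched only */
    } else {
        o->mem[o->addr - 1] = o->latch;     /* high byte: commit the word */
        o->mem[o->addr]     = v;
    }
    o->addr = (uint16_t)((o->addr + 1) % 544);
}

/* Model a read from $2138. */
uint8_t oam_read(Oam *o)
{
    uint8_t v = o->mem[o->addr];
    o->addr = (uint16_t)((o->addr + 1) % 544);
    return v;
}
```

        With this model, starting at address 0 and doing write 1, write 2,
        read, write 3 leaves "01 02 01 03" in the first four bytes, matching
        the example above.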


2105  wb+++- BGMODE - BG Mode and Character Size
        DCBAemmm

        A/B/C/D = BG character size for BG1/BG2/BG3/BG4
            If the bit is set, then the BG is made of 16x16 tiles. Otherwise,
            8x8 tiles are used. However, note that Modes 5 and 6 always use
            16-pixel wide tiles, and Mode 7 always uses 8x8 tiles. See the
            section "BACKGROUNDS" below for details.

        mmm  = BG Mode
        e    = Mode 1 BG3 priority bit
            Mode     BG depth  OPT  Priorities
                     1 2 3 4        Front -> Back
            -=-------=-=-=-=----=---============---
             0       2 2 2 2    n    3AB2ab1CD0cd
             1       4 4 2      n    3AB2ab1C 0c
                        * if e set: C3AB2ab1  0c
             2       4 4        y    3A 2B 1a 0b
             3       8 4        n    3A 2B 1a 0b
             4       8 2        y    3A 2B 1a 0b
             5       4 2        n    3A 2B 1a 0b
             6       4          y    3A 2  1a 0
             7       8          n    3  2  1a 0
             7+EXTBG 8 7        n    3  2B 1a 0b

            "OPT" means "Offset-per-tile mode". For the priorities, numbers
            mean sprites with that priority. Letters correspond to BGs (A=1,
            B=2, etc), with upper/lower case indicating tile priority 1/0. See
            the section "BACKGROUNDS" below for details.

            Mode 7's EXTBG mode allows you to enable BG2, which uses the same
            tilemap and character data as BG1 but interprets bit 7 of the pixel
            data as a priority bit. BG2 also has some oddness to do with some
            of the per-BG registers below. See the Mode 7 section under
            BACKGROUNDS for details.


2106  wb+++- MOSAIC - Screen Pixelation
        xxxxDCBA

        A/B/C/D = Affect BG1/BG2/BG3/BG4

        xxxx = pixel size, 0=1x1, F=16x16
            The mosaic filter goes over the BG and covers each x-by-x square
            with the upper-left pixel of that square, with the top of the first
            row of squares on the 'starting scanline'. If this register is set
            during the frame, the 'starting scanline' is the current scanline,
            otherwise it is the first visible scanline of the frame. E.g. if
            even scanlines are completely red and odd scanlines are completely
            blue, setting xxxx=1 mid-frame will make the rest of the screen
            either completely red or completely blue depending on whether you
            set xxxx on an even or an odd scanline.

            XXX: It seems that writing the same value to this register does not
            reset the 'starting scanline', but which changes do reset it?

            Note that mosaic is applied after scrolling, but before any clip
            windows, color windows, or math. So the XxX block can be partially
            clipped, and it can be mathed as normal with a non-mosaiced BG. But
            scrolling can't make it partially one color and partially another.

            Modes 5-6 should 'double' the expansion factor to expand
            half-pixels. This actually makes xxxx=0 have a visible effect,
            since the even half-pixels (usually on the subscreen) hide the odd
            half-pixels. The same thing happens vertically with interlace mode.

            Mode 7, of course, is weird. BG1 mosaics about like normal, as long
            as you remember that the Mode 7 transformations have no effect on
            the XxX blocks. BG2 uses bit A to control 'vertical mosaic' and bit
            B to control 'horizontal mosaic', so you could be expanding over
            1xX, Xx1, or XxX blocks. This can get really interesting as BG1
            still uses bit A as normal, so you could have the BG1 pixels
            expanded XxX with high-priority BG2 pixels expanded 1xX on top of
            them.

        See the section "BACKGROUNDS" below for details.
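
        For the ordinary (non-hires, non-Mode 7) case, the block selection
        described above might be sketched like this in C. The function name
        mosaic_source is hypothetical, and it assumes that blocks are aligned
        to x=0 horizontally and that y is not above the starting scanline.

```c
#include <stdint.h>

/* Find the pixel whose color fills (x, y) when mosaic is active.
 * mosaic:     value of $2106 (size in the high nibble, 0 = 1x1).
 * start_line: the 'starting scanline'; requires y >= start_line. */
void mosaic_source(uint8_t mosaic, unsigned start_line,
                   unsigned x, unsigned y,
                   unsigned *src_x, unsigned *src_y)
{
    unsigned size = (unsigned)(mosaic >> 4) + 1;

    *src_x = x - (x % size);                 /* left edge of the block */
    *src_y = y - ((y - start_line) % size);  /* top edge of the block */
}
```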


2107  wb++?- BG1SC - BG1 Tilemap Address and Size
2108  wb++?- BG2SC - BG2 Tilemap Address and Size
2109  wb++?- BG3SC - BG3 Tilemap Address and Size
210a  wb++?- BG4SC - BG4 Tilemap Address and Size
        aaaaaayx

        aaaaaa = Tilemap address in VRAM (Addr>>10)
        x    = Tilemap horizontal mirroring
        y    = Tilemap vertical mirroring
            All tilemaps are 32x32 tiles. If x and y are both unset, there is
            one tilemap at Addr. If x is set, a second tilemap follows the
            first that should be considered "to the right of" the first. If y
            is set, a second tilemap follows the first that should be
            considered "below" the first. If both are set, then a second
            follows "to the right", then a third "below", and a fourth "below
            and to the right".

        See the section "BACKGROUNDS" below for more details.
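
        The tilemap arrangement above can be expressed as an offset
        calculation. The C sketch below returns the offset in tilemap entries
        relative to Addr; the name tilemap_entry_offset is hypothetical, and
        the row-major layout of entries within each 32x32 tilemap is an
        assumption not stated in this entry.

```c
#include <stdint.h>

/* Offset, in tilemap entries, from the tilemap base for tile (tx, ty).
 * Bit 5 of tx/ty selects the neighbouring 32x32 tilemap when the matching
 * x/y bit of BGnSC is set; otherwise the coordinate simply wraps. */
uint16_t tilemap_entry_offset(uint8_t bgsc, unsigned tx, unsigned ty)
{
    int x = bgsc & 0x01;
    int y = bgsc & 0x02;
    uint16_t off = (uint16_t)(((ty & 31) << 5) | (tx & 31));

    if (x && (tx & 32))
        off += 0x400;               /* tilemap "to the right" */
    if (y && (ty & 32))
        off += x ? 0x800 : 0x400;   /* tilemap "below" */
    return off;
}
```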


210b  wb++?- BG12NBA - BG1 and 2 Chr Address
210c  wb++?- BG34NBA - BG3 and 4 Chr Address
        bbbbaaaa

        aaaa = Base address for BG1/3 (Addr>>13)
        bbbb = Base address for BG2/4 (Addr>>13)
            See the section "BACKGROUNDS" below for details.


210d  ww+++- BG1HOFS - BG1 Horizontal Scroll
      ww+++- M7HOFS  - Mode 7 BG Horizontal Scroll
210e  ww+++- BG1VOFS - BG1 Vertical Scroll
      ww+++- M7VOFS  - Mode 7 BG Vertical Scroll
        ------xx xxxxxxxx
        ---mmmmm mmmmmmmm

        x = The BG offset, 10 bits.
        m = The Mode 7 BG offset, 13 bits two's-complement signed.

        These are actually two registers in one (or would that be "4 registers
        in 2"?). Anyway, writing $210d will write both BG1HOFS which works
        exactly like the rest of the BGnxOFS registers below ($210f-$2114), and
        M7HOFS which works with the M7* registers ($211b-$2120) instead.

        Modes 0-6 use BG1xOFS and ignore M7xOFS, while Mode 7 uses M7xOFS and
        ignores BG1xOFS. See the appropriate sections below for details, and
        note the different formulas for BG1HOFS versus M7HOFS.


210f  ww+++- BG2HOFS - BG2 Horizontal Scroll
2110  ww+++- BG2VOFS - BG2 Vertical Scroll
2111  ww+++- BG3HOFS - BG3 Horizontal Scroll
2112  ww+++- BG3VOFS - BG3 Vertical Scroll
2113  ww+++- BG4HOFS - BG4 Horizontal Scroll
2114  ww+++- BG4VOFS - BG4 Vertical Scroll
        ------xx xxxxxxxx

        Note that these are "write twice" registers, first the low byte is
        written then the high. Current theory is that writes to the register
        work like this:
          BGnHOFS = (Current<<8) | (Prev&~7) | ((Reg>>8)&7);
          Prev = Current;
            or
          BGnVOFS = (Current<<8) | Prev;
          Prev = Current;

        Note that there is only one Prev shared by all the BGnxOFS registers.
        This is NOT shared with the M7* registers (not even M7xOFS and
        BG1xOFS).

        x = The BG offset, at most 10 bits (some modes effectively use as few
            as 8).

        Note that all BGs wrap if you try to go past their edges. Thus, the
        maximum offset value in BG Modes 0-6 is 1023, since you have at most 64
        tiles (if x/y of BGnSC is set) of 16 pixels each (if the appropriate
        bit of BGMODE is set).

        Horizontal scrolling scrolls in units of full pixels no matter if we're
        rendering a 256-pixel wide screen or a 512-half-pixel wide screen.
        However, vertical scrolling will move in half-line increments if
        interlace mode is active.

        See the section "BACKGROUNDS" below for details.
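
        The "current theory" formulas above translate directly into C. The
        following is a sketch of that theory only; the BgScroll struct and
        function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint16_t hofs[4];   /* BG1..BG4 horizontal offset registers */
    uint16_t vofs[4];   /* BG1..BG4 vertical offset registers */
    uint8_t  prev;      /* the single Prev shared by all BGnxOFS */
} BgScroll;

/* Model one byte written to BGnHOFS (bg = 0..3). */
void write_bg_hofs(BgScroll *s, int bg, uint8_t current)
{
    s->hofs[bg] = (uint16_t)((current << 8) | (s->prev & ~7)
                             | ((s->hofs[bg] >> 8) & 7));
    s->prev = current;
}

/* Model one byte written to BGnVOFS (bg = 0..3). */
void write_bg_vofs(BgScroll *s, int bg, uint8_t current)
{
    s->vofs[bg] = (uint16_t)((current << 8) | s->prev);
    s->prev = current;
}

/* Only the low 10 bits of the register form the offset. */
uint16_t bg_offset(uint16_t reg)
{
    return reg & 0x03FF;
}
```

        Writing the low byte and then the high byte to the same register
        yields the expected 16-bit value; interleaving writes to different
        BGnxOFS registers does not, because Prev is shared.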


2115  wb++?- VMAIN - Video Port Control
        i---mmii

        i    = Address increment mode:
                0 => increment after writing $2118/reading $2139
                1 => increment after writing $2119/reading $213a
            Note that a word write stores low first, then high. Thus, if you're
            storing a word value to $2118/9, you'll probably want to set 1
            here.

        ii = Address increment amount
            00 = Normal increment by 1
            01 = Increment by 32
            10 = Increment by 128
            11 = Increment by 128
        
        mm = Address remapping
            00 = No remapping
            01 = Remap addressing aaaaaaaaBBBccccc => aaaaaaaacccccBBB
            10 = Remap addressing aaaaaaaBBBcccccc => aaaaaaaccccccBBB
            11 = Remap addressing aaaaaaBBBccccccc => aaaaaacccccccBBB

            The "remap" modes basically implement address translation. If
            $2116/7 are set to #$0003, then word address #$0018 will be written
            instead, and $2116/7 will be incremented to $0004.
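
        The increment and remapping tables above can be sketched in C as
        follows. The function names are hypothetical; the bit shuffles are
        taken from the remap patterns listed above.

```c
#include <stdint.h>

/* Address increment amount selected by bits 0-1 of $2115. */
unsigned vmain_increment(uint8_t vmain)
{
    static const unsigned step[4] = { 1, 32, 128, 128 };
    return step[vmain & 3];
}

/* Apply the remapping selected by bits 2-3 of $2115 to a word address. */
uint16_t vmain_remap(uint8_t vmain, uint16_t addr)
{
    switch ((vmain >> 2) & 3) {
    case 1:  /* aaaaaaaaBBBccccc => aaaaaaaacccccBBB */
        return (uint16_t)((addr & 0xFF00) | ((addr & 0x001F) << 3)
                          | ((addr >> 5) & 7));
    case 2:  /* aaaaaaaBBBcccccc => aaaaaaaccccccBBB */
        return (uint16_t)((addr & 0xFE00) | ((addr & 0x003F) << 3)
                          | ((addr >> 6) & 7));
    case 3:  /* aaaaaaBBBccccccc => aaaaaacccccccBBB */
        return (uint16_t)((addr & 0xFC00) | ((addr & 0x007F) << 3)
                          | ((addr >> 7) & 7));
    default: /* no remapping */
        return addr;
    }
}
```

        For instance, vmain_remap(0x04, 0x0003) gives 0x0018, as in the
        example above.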


2116  wl++?- VMADDL - VRAM Address low byte
2117  wh++?- VMADDH - VRAM Address high byte
        aaaaaaaa aaaaaaaa

        This sets the address for $2118/9 and $2139/a. Note that this is a word
        address, not a byte address!

        See the sections "BACKGROUNDS" and "SPRITES" below for details.


2118  wl++-- VMDATAL - VRAM Data Write low byte
2119  wh++-- VMDATAH - VRAM Data Write high byte
        xxxxxxxx xxxxxxxx

        This writes data to VRAM. The writes take effect immediately(?), even
        if no increment is performed. The address is incremented when one of
        the two bytes is written; which one depends on the setting of bit 7 of
        register $2115. Keep in mind the address translation bits of $2115 as
        well.

        The interaction between these registers and $2139/a is unknown.

        See the sections "BACKGROUNDS" and "SPRITES" below for details.


211a  wb++?- M7SEL - Mode 7 Settings
        rc----yx

        r    = Playing field size: When clear, the playing field is 1024x1024
            pixels (so the tilemap completely fills it). When set, the playing
            field is much larger, and the 'empty space' fill is controlled by
            bit 6.

        c    = Empty space fill, when bit 7 is set:
               0 = Transparent.
               1 = Fill with character 0. Note that the fill is matrix
                   transformed like all other Mode 7 tiles.
        
        x/y  = Horizontal/Vertical mirroring. If the bit is set, flip the
               256x256 pixel 'screen' in that direction.

        See the section "BACKGROUNDS" below for details.


211b  ww+++- M7A - Mode 7 Matrix A (also used with $2134/6)
211c  ww+++- M7B - Mode 7 Matrix B (also used with $2134/6)
211d  ww+++- M7C - Mode 7 Matrix C
211e  ww+++- M7D - Mode 7 Matrix D
        aaaaaaaa aaaaaaaa

        Note that these are "write twice" registers, first the low byte is
        written then the high. Current theory is that writes to the register
        work like this:
          Reg = (Current<<8) | Prev;
          Prev = Current;
        
        Note that there is only one Prev shared by all these registers. This
        Prev is NOT shared with the BGnxOFS registers, but it IS shared with
        the M7xOFS registers.

        These set the matrix parameters for Mode 7. The values are an 8-bit
        fixed point, i.e. the value should be divided by 256.0 when used in
        calculations. See below for more explanation.

        The product A*(B>>8) may be read from registers $2134/6. There is
        supposedly no important delay. It may not be operative during Mode 7
        rendering.

        See the section "BACKGROUNDS" below for details.


211f  ww+++- M7X - Mode 7 Center X
2120  ww+++- M7Y - Mode 7 Center Y
        ---xxxxx xxxxxxxx

        Note that these are "write twice" registers, like the other M7*
        registers. See above for the write semantics. The value is 13 bit
        two's-complement signed.

        The matrix transformation formula is:

        [ X ]   [ A B ]   [ SX + M7HOFS - CX ]   [ CX ]
        [   ] = [     ] * [                  ] + [    ]
        [ Y ]   [ C D ]   [ SY + M7VOFS - CY ]   [ CY ]

        Note: SX/SY are screen coordinates. X/Y are coordinates in the playing
        field from which the pixel is taken. If $211a bit 7 is clear, the
        result is then restricted to 0<=X<=1023 and 0<=Y<=1023. If $211a bits 6
        and 7 are both set and X or Y is less than 0 or greater than 1023, use
        the low 3 bits of each to choose the pixel from character 0.

        The bit-accurate formula seems to be something along the lines of:
          #define CLIP(a) (((a)&0x2000)?((a)|~0x3ff):((a)&0x3ff))

          X[0,y] = ((A*CLIP(HOFS-CX))&~63)
                 + ((B*y)&~63) + ((B*CLIP(VOFS-CY))&~63)
                 + (CX<<8)
          Y[0,y] = ((C*CLIP(HOFS-CX))&~63)
                 + ((D*y)&~63) + ((D*CLIP(VOFS-CY))&~63)
                 + (CY<<8)

          X[x,y] = X[x-1,y] + A
          Y[x,y] = Y[x-1,y] + C

        (In all cases, X[] and Y[] are fixed point with 8 bits of fraction)

        See the section "BACKGROUNDS" below for details.
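
        The "bit-accurate" formula above can be written out as compilable C.
        This is only a transcription of that formula, which is itself stated
        as approximate; the Mode7 struct and function names are hypothetical,
        and the 13-bit registers are assumed to be sign-extended already.
        CX<<8 is written as a multiplication to avoid shifting a negative
        value.

```c
#include <stdint.h>

typedef struct {
    int16_t a, b, c, d;   /* M7A..M7D, 8 bits of fraction */
    int16_t hofs, vofs;   /* M7HOFS/M7VOFS, sign-extended */
    int16_t cx, cy;       /* M7X/M7Y, sign-extended */
} Mode7;

/* CLIP from the formula: sign-extend if bit 13 is set, else keep 10 bits. */
static int32_t m7_clip(int32_t a)
{
    return (a & 0x2000) ? (a | ~0x3ff) : (a & 0x3ff);
}

/* Playing-field coordinates (8 bits of fraction) for screen pixel (sx, sy). */
void m7_transform(const Mode7 *m, int sx, int sy, int32_t *x, int32_t *y)
{
    int32_t h = m7_clip(m->hofs - m->cx);
    int32_t v = m7_clip(m->vofs - m->cy);

    /* X[0,y] and Y[0,y] */
    int32_t x0 = ((m->a * h) & ~63) + ((m->b * sy) & ~63)
               + ((m->b * v) & ~63) + m->cx * 256;
    int32_t y0 = ((m->c * h) & ~63) + ((m->d * sy) & ~63)
               + ((m->d * v) & ~63) + m->cy * 256;

    /* X[x,y] = X[x-1,y] + A, applied sx times (likewise Y with C) */
    *x = x0 + sx * m->a;
    *y = y0 + sx * m->c;
}
```

        With the identity matrix (A=D=$0100, B=C=0) and zero offsets, screen
        pixel (10, 20) maps to X=10*256 and Y=20*256, i.e. the same pixel of
        the playing field.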


2121  wb+++- CGADD - CGRAM Address
        cccccccc

        This sets the word address (i.e. color) which will be affected by $2122
        and $213b.


2122  ww+++- CGDATA - CGRAM Data write
        -bbbbbgg gggrrrrr

        This writes to CGRAM, effectively setting the palette colors.
        
        Accesses to CGRAM are handled just like accesses to the low table of
        OAM, see $2104 for details.

        Note that the color values are stored in BGR order.


2123  wb+++- W12SEL - Window Mask Settings for BG1 and BG2
2124  wb+++- W34SEL - Window Mask Settings for BG3 and BG4
2125  wb+++- WOBJSEL - Window Mask Settings for OBJ and Color Window
        ABCDabcd

        c    = Enable window 1 for BG1/BG3/OBJ
        a    = Enable window 2 for BG1/BG3/OBJ
        C/A  = Enable window 1/2 for BG2/BG4/Color
            When the bit is set, the corresponding window will affect the
            corresponding background (subject to the settings of $212e/f).

        d    = Window 1 Inversion for BG1/BG3/OBJ
        b    = Window 2 Inversion for BG1/BG3/OBJ
        D/B  = Window 1/2 Inversion for BG2/BG4/Color
            When the bit is set, "W" should be replaced by "~W" (not-W) in the
            window combination formulae below.

        See the section "WINDOWS" below for more details.


2126  wb+++- WH0 - Window 1 Left Position
2127  wb+++- WH1 - Window 1 Right Position
2128  wb+++- WH2 - Window 2 Left Position
2129  wb+++- WH3 - Window 2 Right Position
        xxxxxxxx

        These set the offset of the appropriate edge of the appropriate window.
        Note that if the left edge is greater than the right edge, the window
        is considered to have no range at all (and thus "W" always is false).
        See the section "WINDOWS" below for more details.


212a  wb+++- WBGLOG - Window mask logic for BGs
        44332211
212b  wb+++- WOBJLOG - Window mask logic for OBJs and Color Window
        ----ccoo

        11/22/33/44/oo/cc = Mask logic for BG1/BG2/BG3/BG4/OBJ/Color
            This specifies the window combination method, using standard
            boolean operators:
              00 = OR
              01 = AND
              10 = XOR
              11 = XNOR

            Consider two variables, W1 and W2, which are true for pixels
            between the appropriate left and right bounds as set in
            $2126-$2129 and false otherwise. Then, you have the following
            possibilities: (replace "W#" with "~W#", depending on the Inversion
            settings of $2123-$2125)
              Neither window enabled => nothing masked.
              One window enabled     => Either W1 or W2, as appropriate.
              Both windows enabled   => W1 op W2, where "op" is as above.
            Where the function is true, the BG will be masked.

        See the section "WINDOWS" below for more details.
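
        The combination rules above might be sketched in C as follows. The
        Window struct and function names are hypothetical, and treating both
        the left and right bounds as inclusive is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t left, right;    /* WH0/WH1 or WH2/WH3 */
    bool enabled, inverted; /* enable and inversion bits from $2123-$2125 */
} Window;

enum { WOP_OR = 0, WOP_AND = 1, WOP_XOR = 2, WOP_XNOR = 3 };

/* W (or ~W if inverted) for one window at pixel column x. */
static bool window_w(const Window *w, uint8_t x)
{
    bool in = (w->left <= x) && (x <= w->right); /* empty if left > right */
    return w->inverted ? !in : in;
}

/* True where the layer is masked at pixel column x. */
bool window_masked(const Window *w1, const Window *w2, int op, uint8_t x)
{
    if (!w1->enabled && !w2->enabled)
        return false;                   /* neither enabled: nothing masked */
    if (!w2->enabled)
        return window_w(w1, x);
    if (!w1->enabled)
        return window_w(w2, x);

    bool a = window_w(w1, x);
    bool b = window_w(w2, x);
    switch (op & 3) {
    case WOP_OR:  return a || b;
    case WOP_AND: return a && b;
    case WOP_XOR: return a != b;
    default:      return a == b;        /* XNOR */
    }
}
```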


212c  wb+++- TM - Main Screen Designation
212d  wb+++- TS - Subscreen Designation
        ---o4321

        1/2/3/4/o = Enable BG1/BG2/BG3/BG4/OBJ for display
                    on the main (or sub) screen.

        See the section "BACKGROUNDS" below for details.


212e  wb+++- TMW - Window Mask Designation for the Main Screen
212f  wb+++- TSW - Window Mask Designation for the Subscreen
        ---o4321

        1/2/3/4/o = Enable window masking for BG1/BG2/BG3/BG4/OBJ on the
                    main (or sub) screen.

        See the section "BACKGROUNDS" below for details.


2130  wb+++- CGWSEL - Color Addition Select
        ccmm--sd

        cc = Clip colors to black before math
            00 => Never
            01 => Outside Color Window only
            10 => Inside Color Window only
            11 => Always

        mm = Prevent color math
            00 => Never
            01 => Outside Color Window only
            10 => Inside Color Window only
            11 => Always

        s     = Add subscreen (instead of fixed color)

        d     = Direct color mode for 256-color BGs

        See the sections "BACKGROUNDS", "WINDOWS", and "RENDERING THE
        SCREEN" below for details.


2131  wb+++- CGADSUB - Color math designation
        shbo4321

        s    = Add/subtract select
            0 => Add the colors
            1 => Subtract the colors

        h    = Half color math. When set, the result of the color math is
            divided by 2 (except when $2130 bit 1 is set and the fixed color is
            used, or when color is clipped).

        4/3/2/1/o/b = Enable color math on BG1/BG2/BG3/BG4/OBJ/Backdrop

        See the sections "BACKGROUNDS", "WINDOWS", and "RENDERING THE
        SCREEN" below for details.


2132  wb+++- COLDATA - Fixed Color Data
        bgrccccc

        b/g/r = Which color plane(s) to set the intensity for.
        ccccc = Color intensity.

        So basically, to set an orange you'd do something along the lines of:
            LDA #$3f
            STA $2132
            LDA #$4f
            STA $2132
            LDA #$80
            STA $2132

        See the sections "BACKGROUNDS" and "WINDOWS" below for details.
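
        The effect of successive writes can be modeled with a few lines of C.
        The FixedColor struct and coldata_write are hypothetical names; the
        behavior follows the bgrccccc layout above.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } FixedColor;

/* Model a write of v to $2132: each plane whose bit is set takes ccccc. */
void coldata_write(FixedColor *c, uint8_t v)
{
    uint8_t level = v & 0x1F;

    if (v & 0x20) c->r = level;
    if (v & 0x40) c->g = level;
    if (v & 0x80) c->b = level;
}
```

        Feeding it the three values from the orange example ($3f, $4f, $80)
        leaves red at 31, green at 15 and blue at 0.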


2133  wb+++- SETINI - Screen Mode/Video Select
        se--poIi

        s    = "External Sync". Used for superimposing "sfx" graphics, whatever
            that means. Usually 0. Not much is known about this bit.
            Interestingly, the S-PPU1 chip has a pin named "EXTSYNC" (or
            not-EXTSYNC, since it has a bar over it) which is tied to Vcc.

        e    = Mode 7 EXTBG ("Extra BG"). When this bit is set, you may enable
            BG2 on Mode 7. BG2 uses the same tile and character data as BG1,
            but interprets the high bit of the color data as a priority for the
            pixel.

            Various sources report additional effects for this bit, possibly
            related to bit 7. For example, "Enable the Data Supplied From the
            External Lsi.", whatever that means. Of course, maybe that's a
            typo and it's supposed to apply to bit 7 instead.

        p    = Enable pseudo-hires mode. This creates a 512-pixel horizontal
            resolution by taking pixels from the subscreen for the
            even-numbered pixels (zero based) and from the main screen for the
            odd-numbered pixels. Color math behaves just as with Mode 5/6
            hires. The interlace bit still has no effect. Mosaic operates as
            normal (not like Mode 5/6). The 'subscreen' pixel is clipped (by
            windows) when the main-screen pixel to the LEFT is clipped, not
            when the one to the RIGHT is clipped as you'd expect. What happens
            with pixel column 0 is unknown.

            Enabling this bit in Modes 5 or 6 has no effect.

        o    = Overscan mode. When set, 239 lines will be displayed instead of
            the normal 224. This also means V-Blank will occur that
            much later, and be shorter. All that happens is that extra lines
            get added to the display, and it seems the TV will like to move
            the display up 8 pixels. See below for more details.

        I    = OBJ Interlace. When set regardless of BG mode, the OBJ will be
            interlaced (see bit 0 below), and thus will appear half-height.

            Note that this only controls whether obj are drawn as normal or
            not; the interlace signal is only output to the TV based on bit 0
            below.

        i    = Screen interlace. When set in BG mode 5 (and probably 6), the
            effective screen height will be 448 (or 478) pixels, rather than
            224 (or 239). When set in any other mode, the screen will just get
            a bit jumpy. However, toggling the tilemap each field would
            simulate the increased screen height (much like pseudo-hires
            simulates hires).

            In hardware, setting this bit makes the SNES output a normal
            interlace signal rather than always forcing one frame.

        See the sections "BACKGROUNDS" and "SPRITES" below for details.

        Overscan: The bit only matters at the very end of the frame. If you
        change the setting on line 0xE0 before the normal NMI trigger point,
        then it's the same as if you had used that setting for the whole
        frame. Note that this affects both the NMI trigger point and when HDMA
        stops for the frame.

        If you turn the bit off at the very beginning of scanline X (for
        0xE1<=X<=0xF0), NMI will occur on line X and the last HDMA transfer
        will occur on line X-1. However, on my TV at least, the display will
        remain in the normal no-overscan position for lines E1-EC, it will
        move up only one pixel for line ED, and it will lose vertical sync
        for lines EF-F4!

        Turning the bit on, only line E1 gives any effect: NMI will occur on
        line E2, although the last HDMA will still occur on line E0.
        Anything else acts like you left the bit off the whole time. Note,
        however, that if you wait too long after the beginning of the
        scanline then you will get no effect.

        Even if there is no visible effect, the overscan setting still
        affects VRAM writes. In particular, executing "LDA #'-' / STA $2118
        / LDA $2133 / STA $2133 / LDA #'+' / STA $2118" during the E1-F0
        period will write only + or only - to VRAM, depending on whether the
        overscan bit was set to 0 or 1.

        
2134 r l+++? MPYL - Multiplication Result low byte
2135 r m+++? MPYM - Multiplication Result middle byte
2136 r h+++? MPYH - Multiplication Result high byte
        xxxxxxxx xxxxxxxx xxxxxxxx

        This is the 2's complement product of the 16-bit value written to $211b
        and the 8-bit value most recently written to $211c. There is supposedly
        no important delay. It may not be operative during Mode 7 rendering.
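
        A C sketch of the value these three registers would return, assuming
        both operands are signed as described above. The name mpy_result is
        hypothetical.

```c
#include <stdint.h>

/* 24-bit product of the 16-bit M7A value and the 8-bit value most recently
 * written to $211c. Bits 0-7 are MPYL, 8-15 MPYM, 16-23 MPYH. */
uint32_t mpy_result(int16_t m7a, int8_t m7b_byte)
{
    int32_t product = (int32_t)m7a * (int32_t)m7b_byte;
    return (uint32_t)product & 0xFFFFFF;
}
```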


2137   b++++ SLHV - Software Latch for H/V Counter
        --------

        When read, the H/V counter (as read from $213c and $213d) will be
        latched to the current X and Y position if bit 7 of $4201 is set. The
        data actually read is open bus.


2138 r w++?- OAMDATAREAD* - Data for OAM read
        xxxxxxxx

        OAM reads are straightforward: the current byte as set in $2102/3 and
        incremented by reads from this register and writes to $2104 will be
        returned. Note that writes to the low table do not behave so
        straightforwardly. See register $2104 and the section "SPRITES" below
        for details.

        Also, note that OAM address invalidation probably affects the address
        read by this register as well.


2139 r l++?- VMDATALREAD* - VRAM Data Read low byte
213a r h++?- VMDATAHREAD* - VRAM Data Read high byte
        xxxxxxxx xxxxxxxx

        Simply, this reads data from VRAM. The address is incremented when
        either $2139 or $213a is read, depending on the setting of bit 7 of
        $2115.

        Actually, the reading is more complex. When either of these registers
        is read, the appropriate byte from a word-sized buffer is returned. A
        word from VRAM is loaded into this buffer just *before* the VRAM
        address is incremented. The actual data read and the amount of the
        increment depend on the low 4 bits of $2115. The effect of this is
        that a 'dummy read' is required after setting $2116-7 before you start
        getting the actual data.

        The interaction between these registers and $2118/9 is unknown.

        See the sections "BACKGROUNDS" and "SPRITES" below for details.


213b r w++?- CGDATAREAD* - CGRAM Data read
        -bbbbbgg gggrrrrr

        This reads from CGRAM.

        Accesses to CGRAM are handled just like accesses to the low table of
        OAM, see $2138 for details.

        Note that the color values are stored in BGR order. The '-' bit is PPU2
        Open Bus.


213c r w++++ OPHCT - Horizontal Scanline Location
213d r w++++ OPVCT - Vertical Scanline Location
        -------x xxxxxxxx

        These values are latched by reading $2137 when bit 7 of $4201 is set,
        or by clearing-and-setting bit 7 of $4201 either by writing $4201 or by
        pin 6 of Controller Port 2 (the latch occurs on the 1->0 transition).

        Note that the value read is only 9 bits: bits 1-7 of the high byte are
        PPU2 Open Bus. Each register keeps separate track of whether to
        return the low or high byte. The high/low selector is reset to 'low'
        when $213f is read (the selector is NOT reset when the counter is
        latched).

        H Counter values range from 0 to 339, with 22-277 being visible on the
        screen. V Counter values range from 0 to 261 in NTSC mode (262 is
        possible every other frame when interlace is active) and 0 to 311 in
        PAL mode (312 in interlace?), with 1-224 (or 1-239(?) if overscan is
        enabled) visible on the screen.
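
        Since only 9 bits of the two reads are meaningful, the open-bus bits
        need to be masked off when combining them. A minimal C sketch, with
        the hypothetical name counter_value, assuming the selector was at
        'low' before the first read:

```c
#include <stdint.h>

/* Combine two consecutive reads of $213c (or $213d) into a 9-bit counter.
 * Bits 1-7 of the second (high) read are PPU2 Open Bus and are dropped. */
uint16_t counter_value(uint8_t first_read, uint8_t second_read)
{
    return (uint16_t)(((second_read & 0x01) << 8) | first_read);
}
```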


213e r b++++ STAT77 - PPU Status Flag and Version
        trm-vvvv

        t    = Time Over Flag. If more than 34 sprite-tiles (e.g. a 16x16
            sprite has 2 sprite-tiles) were encountered on a single line, this
            flag will be set. The flag is reset at the end of V-Blank. See the
            section "SPRITES" below for details.

        r    = Range Over Flag. If more than 32 sprites were encountered on a
            single line, this flag will be set. The flag is reset at the end of
            V-Blank. See the section "SPRITES" below for details.

            Note that the above two flags are set whether or not OBJ are
            actually enabled at the time.

        m    = "Master/slave mode select". Little is known about this bit.
            Current theory is that it indicates the status of the "MASTER" pin
            on the S-PPU1 chip, which in the normal SNES is always Gnd. We
            always seem to read back 0 here.

        vvvv = 5C77 chip version number. So far, we've only encountered version
            1.

        The '-' bit is PPU1 Open Bus.


213f r b++++ STAT78 - PPU Status Flag and Version
        fl-pvvvv

        f    = Interlace Field. This will toggle every V-Blank.

        l    = External latch flag. When the PPU counters are latched, this
            flag gets set. The flag is reset on read, but only when $4201 bit 7
            is set. 

        p    = NTSC/PAL Mode. If this is a PAL SNES, this bit will be set,
            otherwise it will be clear.

        vvvv = 5C78 chip version number. So far, we've encountered at least 2
            and 3. Possibly 1 as well.

        The '-' bit is PPU2 Open Bus.

        Note: as a side effect of reading this register, the high/low byte
        selector for $213c/d is reset to 'low'.
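
        For reference, a small Python sketch that pulls the fields out of a
        value read from this register. The function name and the dictionary
        keys are arbitrary choices for the example.

```python
def decode_stat78(value):
    """Decode fl-pvvvv. Bit 5 is PPU2 Open Bus and is ignored."""
    return {
        "field": (value >> 7) & 1,    # f: interlace field
        "latched": (value >> 6) & 1,  # l: external latch flag
        "pal": (value >> 4) & 1,      # p: set on a PAL SNES
        "version": value & 0x0F,      # vvvv: 5C78 version
    }
```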


2140 rwb++++ APUIO0 - APU I/O register 0
2141 rwb++++ APUIO1 - APU I/O register 1
2142 rwb++++ APUIO2 - APU I/O register 2
2143 rwb++++ APUIO3 - APU I/O register 3
        xxxxxxxx

        These registers are used in communication with the SPC700. Note that
        the value written here is not the value read back. Rather, the value
        written shows up in the SPC700's registers $f4-7, and the values
        written to those registers by the SPC700 are what you read here.

        If the SPC700 writes the register during a read, the value read will
        be the logical OR of the old and new values. The exact cycles during
        which the 'read' actually occurs are not known, although a good guess
        would be some portion of the final 3 master cycles of the 6-cycle
        memory access.
        
        Note that these registers are mirrored throughout the range
        $2140-$217f.
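
        A minimal sketch of the two behaviors above, in Python. It assumes
        the mirroring repeats every four bytes (so the low two address bits
        select the port), which is the natural reading but is not spelled
        out above. The function names are hypothetical.

```python
def apuio_port(addr):
    """Map an address in $2140-$217f to an APU I/O port number 0-3."""
    if not 0x2140 <= addr <= 0x217F:
        raise ValueError("not an APU I/O address")
    return addr & 3


def read_during_spc_write(old_value, new_value):
    """Value seen if the SPC700 writes the port during the CPU's read."""
    return (old_value | new_value) & 0xFF
```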


2180 rwb++++ WMDATA - WRAM Data read/write
        xxxxxxxx

        This register reads from or writes to the WRAM address set in $2181-3.
        The address is then incremented. The effect of mixed reads and writes
        is unknown, but it is suspected that they are handled logically.

        Note that attempting a DMA from WRAM to this register will not work:
        WRAM will not be written. Attempting a DMA from this register to
        WRAM will similarly not work: the value written is (initially) the Open
        Bus value. In either case, the address in $2181-3 is not incremented.


2181  wl++++ WMADDL - WRAM Address low byte
2182  wm++++ WMADDM - WRAM Address middle byte
2183  wh++++ WMADDH - WRAM Address high bit
        -------x xxxxxxxx xxxxxxxx

        This is the address that will be read or written by accesses to $2180.
        Note that WRAM is also mapped in the SNES memory space from $7E:0000 to
        $7F:FFFF, and from $0000 to $1FFF in banks $00 through $3F and $80
        through $BF.

        Various docs indicate that these registers may be read as well as
        written. However, they are wrong. These registers are open bus.

        DMA from WRAM to these registers has no effect. Otherwise, however, DMA
        writes them as normal. This means you could use DMA mode 4 to $2180 and
        a table in ROM to write any sequence of RAM addresses.

        The value does not wrap at page boundaries on increment.
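
        The following Python sketch models the $2180-3 port pair for plain
        CPU accesses. The class is hypothetical, the DMA restrictions above
        are not modelled, and wrapping from the top of WRAM back to 0 is an
        assumption of the sketch rather than something documented here.

```python
WRAM_SIZE = 0x20000  # 128 KiB, i.e. $7E:0000-$7F:FFFF


class WramPort:
    def __init__(self):
        self.ram = bytearray(WRAM_SIZE)
        self.addr = 0  # 17-bit address from $2181-3

    def write_address(self, low, mid, high):
        # Only bit 0 of $2183 is significant.
        self.addr = ((high & 1) << 16) | ((mid & 0xFF) << 8) | (low & 0xFF)

    def read(self):
        value = self.ram[self.addr]
        self._step()
        return value

    def write(self, value):
        self.ram[self.addr] = value & 0xFF
        self._step()

    def _step(self):
        # The increment carries across page and bank boundaries.
        # Wrapping at 17 bits is an assumption.
        self.addr = (self.addr + 1) & (WRAM_SIZE - 1)
```

        Writing two bytes starting at $0FFFF leaves the address at $10001,
        showing the carry into the high bit.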


4016 rwb++++ JOYSER0 - NES-style Joypad Access Port 1
        Rd: ------ca
        Wr: -------l
4017 r?b++++ JOYSER1 - NES-style Joypad Access Port 2
        ---111db

        These registers basically have a direct connection to the controller
        ports on the front of the SNES. 

        l    = Writing this bit controls the Latch line of both controller
            ports. When 1 is set, the Latch goes high (or is it low? At any
            rate, whichever one makes the pads latch their state). When
            cleared, the Latch goes the other way.
        
        a/b  = These bits return the state of the Data1 line.
        c/d  = These bits return the state of the Data2 line.
            Reading $4016 drives the Clock line of Controller Port 1 low.
            The SNES then reads the Data1 and Data2 lines, and Clock is set
            back to high. $4017 does the same for Port 2.
        
        Note the 1-bits in $4017: the CPU chip has pins for these bits, but
        these pins are tied to Gnd and thus always 1.

        Data for normal joypads is returned in the order: B, Y, Select,
        Start, Up, Down, Left, Right, A, X, L, R, 0, 0, 0, 0, then ones
        until latched again.

        Note that Auto-Joypad Read (see register $4200) will effectively write
        1 then 0 to bit 'l', then read 16 times from both $4016 and $4017. The
        'a' bits will end up in $4218/9, with the first bit read (e.g. the B
        button) in bit 15 of the word. Similarly, the 'b' bits end up in
        $421a/b, the 'c' bits in $421c/d, and the 'd' bits in $421e/f. Any
        further bits the device may return may be read from $4016/$4017 as
        normal.
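
        To make the bit ordering concrete, here is a Python sketch that
        packs 16 serially-read bits into a word the way Auto-Joypad Read is
        described above, then lists the buttons for a normal joypad. The
        names are invented for the example, and it assumes a 1 bit means
        "pressed".

```python
# Serial order for a normal joypad, first bit read first.
BUTTON_ORDER = ["B", "Y", "Select", "Start", "Up", "Down",
                "Left", "Right", "A", "X", "L", "R"]


def assemble_joy_word(bits):
    """Pack 16 bits in read order; the first bit read lands in bit 15."""
    if len(bits) != 16:
        raise ValueError("expected exactly 16 bits")
    word = 0
    for bit in bits:
        word = (word << 1) | (bit & 1)
    return word


def pressed_buttons(word):
    """Names of the buttons whose bit is set (1 = pressed is assumed)."""
    return [name for i, name in enumerate(BUTTON_ORDER)
            if word & (0x8000 >> i)]
```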

        The effect of reading these during auto-joypad read is unknown.

        See the section "CONTROLLERS" below for details.


4200  wb+++? NMITIMEN - Interrupt Enable Flags
        n-yx---a

        n    = Enable NMI. If clear, NMI will not occur. If set, NMI will fire
            just after the start of V-Blank.

            NMI fires shortly after the V Counter reaches $E1 (or presumably
            $F0 if overscan is enabled, see register $2133).

        x/y  = IRQ enable.
            0/0 => No IRQ will occur
            0/1 => An IRQ will occur sometime just after the V Counter reaches
                   the value set in $4209/a.
            1/0 => An IRQ will occur sometime just after the H Counter reaches
                   the value set in $4207/8.
            1/1 => An IRQ will occur sometime just after the H Counter reaches
                   the value set in $4207/8 when V Counter equals the value set
                   in $4209/a.

        a    = Auto-Joypad Read Enable. When set, the registers $4218-$421f
            will be updated at about V Counter = $E3 (or presumably $F2).

        Some games try to read this register. However, they work only because
        open bus behavior gives them values they expect.

        This register is initialized to $00 on power on or reset.


4201  wb++++ WRIO - Programmable I/O port (out-port)
        abxxxxxx

        This is basically just an 8-bit I/O Port. 'b' is connected to pin 6 of
        Controller Port 1. 'a' is connected to pin 6 of Controller Port 2, and
        to the PPU Latch line. Thus, writing a 0 then a 1 to bit 'a' will latch
        the H and V Counters much like reading $2137 (the latch happens on the
        transition to 0). When bit 'a' is 0, no latching can occur.

        Any other effects of this register are unknown. See $4213 for the
        input half of the I/O Port.

        Note that the IO Port is initialized as if this register were written
        with all 1-bits at power up, unchanged on reset(?).


4202  wb++++ WRMPYA - Multiplicand A
4203  wb++++ WRMPYB - Multiplicand B
        mmmmmmmm

        Write $4202, then $4203. 8 "machine cycles" (probably 48 master cycles)
        after $4203 is set, the product may be read from $4216/7. $4202 will
        not be altered by this process, thus a new value may be written to
        $4203 to perform another multiplication without resetting $4202.

        The multiplication is unsigned.

        $4202 holds the value $ff on power on and is unchanged on reset.


4204  wl++++ WRDIVL - Dividend C low byte
4205  wh++++ WRDIVH - Dividend C high byte
        dddddddd dddddddd
4206  wb++++ WRDIVB - Divisor B
        bbbbbbbb

        Write $4204/5, then $4206. 16 "machine cycles" (probably 96 master
        cycles) after $4206 is set, the quotient may be read from $4214/5, and
        the remainder from $4216/7. Presumably, $4204/5 are not altered by this
        process, much like $4202.
        
        The division is unsigned. Division by 0 gives a quotient of $FFFF and a
        remainder of C.

        WRDIV holds the value $ffff on power on and is unchanged on reset.
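
        The arithmetic results (not the timing) of the multiply and divide
        units can be sketched in Python as below. The function names are
        hypothetical. The divide-by-zero case follows the description above.

```python
def multiply(a, b):
    """Unsigned 8x8 multiply; returns the 16-bit product seen in $4216/7."""
    return (a & 0xFF) * (b & 0xFF)


def divide(c, b):
    """Unsigned 16/8 divide; returns (quotient, remainder).

    The quotient is what $4214/5 would hold, the remainder $4216/7.
    """
    c &= 0xFFFF
    b &= 0xFF
    if b == 0:
        return 0xFFFF, c  # quotient $FFFF, remainder = dividend C
    return c // b, c % b
```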


4207  wl++++ HTIMEL - H Timer low byte
4208  wh++++ HTIMEH - H Timer high byte
        -------h hhhhhhhh

        If bit 4 of $4200 is set and bit 5 is clear, an IRQ will fire every
        scanline when the H Counter reaches the value set here. If bits 4 and 5
        are both set, the IRQ will fire only when the V Counter equals the
        value set in $4209/a.
        
        Note that the H Counter ranges from 0 to 339, thus greater values will
        result in no IRQ firing.

        HTIME is initialized to $1ff on power on, unchanged on reset.


4209  wl++++ VTIMEL - V Timer low byte
420a  wh++++ VTIMEH - V Timer high byte
        -------v vvvvvvvv

        If bit 5 of $4200 is set and bit 4 is clear, an IRQ will fire just
        after the V Counter reaches the value set here. If bits 4 and 5 are
        both set, the IRQ will fire instead when the V Counter equals the value
        set here and the H Counter reaches the value set in $4207/8.
        
        Note that the V Counter ranges from 0 to 261 in NTSC mode (262 is
        possible every other frame when interlace is active) and 0 to 311 in
        PAL mode (312 in interlace?), thus greater values will result in no IRQ
        firing.

        VTIME is initialized to $1ff on power on, unchanged on reset.


420b  wb++++ MDMAEN - DMA Enable
        76543210

        7/6/5/4/3/2/1/0 = Enable the selected DMA channels. The CPU will be
            paused until all DMAs complete. DMAs will be executed in order from
            0 to 7 (?).

        See registers $43x0-$43xA for more details.

        If HDMA (init or transfer) occurs while a DMA is in progress, the DMA
        will be paused for the duration. If the HDMA happens to involve the
        current DMA channel, the DMA will be immediately terminated and the
        HDMA will progress using the then-current values of the registers.
        Other DMA channels will be unaffected.

        This register is initialized to $00 on power on or reset.

        See the section "DMA AND HDMA" below for more information.


420c  wb++++ HDMAEN - HDMA Enable
        76543210

        7/6/5/4/3/2/1/0 = Enable the selected HDMA channels. HDMAs will be
            executed in order from 0 to 7 (?).

        See registers $43x0-$43xA for more details.

        If HDMA (init or transfer) occurs while a DMA is in progress, the DMA
        will be paused for the duration. If the HDMA happens to involve the
        current DMA channel, the DMA will be immediately terminated and the
        HDMA will progress using the then-current values of the registers.
        Other DMA channels will be unaffected.

        Note that enabling a channel mid-frame will begin HDMA at the next HDMA
        point. However, the HDMA register initialization only occurs before the
        HDMA point on scanline 0, so those registers will have to be
        initialized by hand before enabling HDMA. A channel that has already
        terminated for the frame cannot be restarted in this manner.

        Writing 0 to a bit will pause an ongoing HDMA; the transfer may be
        continued by writing 1 to the bit.

        This register is initialized to $00 on power on or reset.

        See the section "DMA AND HDMA" below for more information.


420d  wb++++ MEMSEL - ROM Access Speed
        -------f

        f    = FastROM select. The SNES uses a master clock running at
            about 21.477 MHz (current theory is 1.89e9/88 Hz). By default, the
            SNES takes 8 master cycles for each ROM access. If this bit is set
            and ROM is accessed via banks $80-$FF, only 6 master cycles will be
            used.

        This register is initialized to $00 on power on (or reset?).

        See my memory map and timing doc (memmap.txt) for more details.
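
        As a quick worked example of what the two access speeds mean in
        wall-clock terms, assuming the 1.89e9/88 Hz theory above holds
        (the names below are just for this sketch):

```python
MASTER_CLOCK_HZ = 1.89e9 / 88  # about 21.477 MHz, per the current theory


def access_time_ns(master_cycles):
    """Duration of an access of the given length, in nanoseconds."""
    return master_cycles / MASTER_CLOCK_HZ * 1e9


slow_ns = access_time_ns(8)  # default ROM access, roughly 372.5 ns
fast_ns = access_time_ns(6)  # FastROM access, roughly 279.4 ns
```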


4210 r b++++ RDNMI - NMI Flag and 5A22 Version
        n---vvvv

        n    = NMI Flag. This bit is set at the start of V-Blank (at the
            moment, we suspect when H-Counter is somewhere between $28 and
            $4E), and cleared on read or at the end of V-Blank. Supposedly, it
            is required that this register be read during NMI.

            Note that this bit is not affected by bit 7 of $4200.

        vvvv = 5A22 chip version number. So far, we've encountered at least 2,
            maybe 1 as well.

        NMI is cleared on power on or reset.

        The '-' bits are open bus.


4211 r b++++ TIMEUP - IRQ Flag
        i-------

        i    = IRQ Flag. This bit is set just after an IRQ fires (at the
            moment, it seems to have the same delay as the NMI Flag of $4210
            has following NMI), and is cleared on read or write. Supposedly, it
            is required that this register be read during the IRQ handler. If
            this really is the case, then I suspect that that read is what
            actually clears the CPU's IRQ line.

        This register is marked read/write in another doc, with no explanation.

        IRQ is cleared on power on or reset.
        
        The '-' bits are open bus.


4212 r b++++ HVBJOY - PPU Status
        vh-----a

        v    = V-Blank Flag. If we're currently in V-Blank, this flag is set,
            otherwise it is clear. The setting seems to occur at H Counter
            about $16-$17 when V Counter is $E1, and the clearing at about $1E
            with V Counter 0.

        h    = H-Blank Flag. If we're currently in H-Blank, this flag is set,
            otherwise it is clear. The setting seems to occur at H Counter
            about $121-$122, and the clearing at about $12-$18.

        a    = Auto-Joypad Status. This is set while Auto-Joypad Read is in
            progress, and cleared when complete. It typically turns on at
            the start of V-Blank, and completes 3 scanlines later.

        This register is marked read/write in another doc, with no explanation.


4213 r b++++ RDIO - Programmable I/O port (in-port)
        abxxxxxx

        Reading this register reads data from the I/O Port. The way the
        I/O Port works, any bit set to 0 in $4201 will be 0 here. Any bit
        set to 1 in $4201 may be 1 or 0 here, depending on whether any other
        device connected to the I/O Port has set a 0 to that bit.

        Bit 'b' is connected to pin 6 of Controller Port 1. Bit 'a' is
        connected to pin 6 of Controller Port 2, and to the PPU Latch line.

        See register $4201 for the O side of the I/O Port.


4214 r l++++ RDDIVL - Quotient of Divide Result low byte
4215 r h++++ RDDIVH - Quotient of Divide Result high byte
        qqqqqqqq qqqqqqqq

        Write $4204/5, then $4206. 16 "machine cycles" (probably 96 master
        cycles) after $4206 is set, the quotient may be read from these
        registers, and the remainder from $4216/7.
        
        The division is unsigned.


4216 r l++++ RDMPYL - Multiplication Product or Divide Remainder low byte
4217 r h++++ RDMPYH - Multiplication Product or Divide Remainder high byte
        xxxxxxxx xxxxxxxx

        Write $4202, then $4203. 8 "machine cycles" (probably 48 master cycles)
        after $4203 is set, the product may be read from these registers.

        Write $4204/5, then $4206. 16 "machine cycles" (probably 96 master
        cycles) after $4206 is set, the quotient may be read from $4214/5, and
        the remainder from these registers.

        The multiplication and division are both unsigned.


4218 r l++++ JOY1L - Controller Port 1 Data1 Register low byte
4219 r h++++ JOY1H - Controller Port 1 Data1 Register high byte
421a r l++++ JOY2L - Controller Port 2 Data1 Register low byte
421b r h++++ JOY2H - Controller Port 2 Data1 Register high byte
421c r l++++ JOY3L - Controller Port 1 Data2 Register low byte
421d r h++++ JOY3H - Controller Port 1 Data2 Register high byte
421e r l++++ JOY4L - Controller Port 2 Data2 Register low byte
421f r h++++ JOY4H - Controller Port 2 Data2 Register high byte
        byetUDLR axlr0000

        The bitmap above only applies for joypads, obviously. More
        generically, Auto Joypad Read effectively sets 1 then 0 to $4016,
        then reads $4016/7 16 times to get the bits for these registers.
        
        a/b/x/y/l/r/e/t = A/B/X/Y/L/R/Select/Start button status.

        U/D/L/R = Up/Down/Left/Right control pad status. Note that only one of
            L/R and only one of U/D may be set, due to the pad hardware.

        These registers are only updated when the Auto-Joypad Read bit (bit 0)
        of $4200 is set. They are being updated while the Auto-Joypad Status
        bit (bit 0) of $4212 is set. Reading during this time will return
        incorrect values.

        See the section "CONTROLLERS" below for details.


43x0 rwb++++ DMAPx - DMA Control for Channel x (x=0-7)
        da-ifttt

        d    = Transfer Direction. When clear, data will be read from the CPU
            memory and written to the PPU register. When set, vice versa.
            
            Contrary to previous belief, this bit DOES affect HDMA! Indirect
            mode is more useful: it will read the table as normal and write
            from Bus B to the Bus A address specified. Direct mode will work as
            expected though: it will read counts from the table and try to
            write the data values into the table.

        a    = HDMA Addressing Mode. When clear, the HDMA table contains the
            data to transfer. When set, the HDMA table contains pointers to the
            data. This bit does not affect DMA.

        i    = DMA Address Increment. When clear, the DMA address will be
            incremented for each byte. When set, the DMA address will be
            decremented. This bit does not affect HDMA.

        f    = DMA Fixed Transfer. When set, the DMA address will not be
            adjusted. When clear, the address will be adjusted as specified by
            bit 4. This bit does not affect HDMA.

        ttt  = Transfer Mode.
            000 => 1 register write once             (1 byte:  p               )
            001 => 2 registers write once            (2 bytes: p, p+1          )
            010 => 1 register write twice            (2 bytes: p, p            )
            011 => 2 registers write twice each      (4 bytes: p, p,   p+1, p+1)
            100 => 4 registers write once            (4 bytes: p, p+1, p+2, p+3)
            101 => 2 registers write twice alternate (4 bytes: p, p+1, p,   p+1)
            110 => 1 register write twice            (2 bytes: p, p            )
            111 => 2 registers write twice each      (4 bytes: p, p,   p+1, p+1)

        The effect of writing this register during HDMA to the associated
        channel is unknown. Most likely, the change takes effect for the
        next HDMA transfer.

        This register is set to $ff on power on, and is unchanged on reset.

        See the section "DMA AND HDMA" below for more information.


43x1 rwb++++ BBADx - DMA Destination Register for Channel x (x=0-7)
        pppppppp

        This specifies the Bus B address to access. Considering the standard
        CPU memory space, this specifies which address $00:2100-$00:21ff to
        access, with two- and four-register modes wrapping $21ff->$2100, not
        $2200.

        The effect of writing this register during HDMA to the associated
        channel is unknown. Most likely, the change takes effect for the
        next transfer.

        This register is set to $ff on power on, and is unchanged on reset.

        See the section "DMA AND HDMA" below for more information.


43x2 rwl++++ A1TxL - DMA Source Address for Channel x low byte (x=0-7)
43x3 rwh++++ A1TxH - DMA Source Address for Channel x high byte (x=0-7)
43x4 rwb++++ A1Bx - DMA Source Address for Channel x bank byte (x=0-7)
        bbbbbbbb hhhhhhhh llllllll

        This specifies the starting Bus A address for the DMA transfer,
        or the beginning of the HDMA table for HDMA transfers. Note that Bus A
        does not access the Bus B registers, so pointing this address at say
        $00:2100 results in open bus.

        The effect of writing this register during HDMA to the associated
        channel is unknown. However, current theory is that only $43x4 will
        affect the transfer. The changes will take effect at the next HDMA
        init.

        During DMA, $43x2/3 will be incremented or decremented as specified by
        $43x0. However $43x4 will NOT be adjusted. These registers will not be
        affected by HDMA.

        This register is set to $ff on power on, and is unchanged on reset.
        
        See the section "DMA AND HDMA" below for more information.


43x5 rwl++++ DASxL - DMA Size/HDMA Indirect Address low byte (x=0-7)
43x6 rwh++++ DASxH - DMA Size/HDMA Indirect Address high byte (x=0-7)
43x7 rwb++++ DASBx - HDMA Indirect Address bank byte (x=0-7)
        bbbbbbbb hhhhhhhh llllllll

        For DMA, $43x5/6 indicate the number of bytes to transfer. Note that
        this is a strict limit: if this is set to 1 then only 1 byte will be
        written, even if the transfer mode specifies 2 or 4 registers (and if
        this is 5, all 4 registers would be written once, then only the first
        would be written a second time). Note, however, that writing $0000 to
        this register actually results in a transfer of $10000 bytes, not 0.

        $43x5/6 are decremented during DMA, and thus typically end up set to 0
        when DMA is complete.
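
        Putting the size rule together with the transfer modes of $43x0 and
        the Bus B wrapping of $43x1, the sequence of registers written by a
        general DMA can be sketched in Python as follows. The table and
        function names are invented for this example. Only the Bus B side
        is modelled.

```python
# Offsets from the Bus B base register for each transfer mode (ttt).
TRANSFER_PATTERNS = {
    0: [0],
    1: [0, 1],
    2: [0, 0],
    3: [0, 0, 1, 1],
    4: [0, 1, 2, 3],
    5: [0, 1, 0, 1],
    6: [0, 0],
    7: [0, 0, 1, 1],
}


def dma_b_addresses(mode, bbad, das):
    """List the $21xx address hit by each byte of a DMA transfer.

    mode: ttt bits of $43x0, bbad: $43x1, das: 16-bit value of $43x5/6.
    """
    count = das & 0xFFFF
    if count == 0:
        count = 0x10000  # $0000 means $10000 bytes, not 0
    pattern = TRANSFER_PATTERNS[mode & 7]
    # The register number wraps $21ff -> $2100, never into $2200.
    return [0x2100 | ((bbad + pattern[i % len(pattern)]) & 0xFF)
            for i in range(count)]
```

        With mode 4, $43x1 = $18 and a size of 5, this yields $2118, $2119,
        $211a, $211b and then $2118 again, matching the description above.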

        For HDMA, $43x7 specifies the bank for indirect addressing mode. The
        indirect address is copied into $43x5/6 and incremented appropriately.
        For direct HDMA, these registers are not used or altered.

        Writes to $43x7 during indirect HDMA will take effect for the next
        transfer. Writes to $43x5/6 during indirect HDMA will also take effect
        for the next HDMA transfer, however this is only noticeable during
        repeat mode (for normal mode, a new indirect address will be read from
        the table before the transfer). For a direct transfer, presumably
        nothing will happen.

        This register is set to $ff on power on, and is unchanged on reset.

        See the section "DMA AND HDMA" below for more information.


43x8 rwl++++ A2AxL - HDMA Table Address low byte (x=0-7)
43x9 rwh++++ A2AxH - HDMA Table Address high byte (x=0-7)
        aaaaaaaa aaaaaaaa

        At the beginning of the frame $43x2/3 are copied into this register for
        all active HDMA channels, and then this register is updated as the
        table is read. Thus, if a game wishes to start HDMA mid-frame (or
        change tables mid-frame), this register must be written. Writing this
        register mid-frame changes the table address for the next scanline.

        This register is not used for DMA.

        This register is set to $ff on power on, and is unchanged on reset.

        See the section "DMA AND HDMA" below for more information.


43xa rwb++++ NLTRx - HDMA Line Counter (x=0-7)
        rccccccc

        r    = Repeat Select. When set, the HDMA transfer will be performed
            every line, rather than only when this register is loaded from the
            table. However, this byte (and the indirect HDMA address) will only
            be reloaded from the table when the counter reaches 0.

        ccccccc = Line count. This is decremented every scanline. When it
            reaches 0, a byte is read from the HDMA table into this register
            (and the indirect HDMA address is read into $43x5/6 if applicable).

        One oddity: the register is decremented before being checked for r
        status or c==0. Thus, setting a value of $80 is really "128 lines with
        no repeat" rather than "0 lines with repeat". Similarly, a value of $00
        will be "128 lines with repeat" when it doesn't mean "terminate the
        channel".
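
        The decrement-before-check oddity can be illustrated with a small
        Python loop. This is a simplified sketch with a made-up function
        name: it only models the counter byte itself, and does not model
        the case where a $00 byte loaded from the table terminates the
        channel.

```python
def interpret_line_counter(value):
    """Return (lines, repeat) for a byte placed in $43xa.

    The whole byte is decremented first, then r (bit 7) and the
    7-bit count are examined.
    """
    value &= 0xFF
    lines = 0
    while True:
        value = (value - 1) & 0xFF  # decrement happens before the checks
        lines += 1
        if value & 0x7F == 0:
            return lines, bool(value & 0x80)
```

        Here interpret_line_counter(0x80) returns (128, False) and
        interpret_line_counter(0x00) returns (128, True), as described.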

        This register is initialized at the end of V-Blank for every active
        HDMA channel. Note that if a game wishes to begin HDMA during the
        frame, it will most likely have to initialize this register. Writing
        this mid-transfer will similarly change the count and repeat to take
        effect next scanline. Remember though that 'repeat' won't take effect
        until after the next transfer period.

        This register is set to $ff on power on, and is unchanged on reset.

        See the section "DMA AND HDMA" below for more information.

43xb rwb++++ ????x - Unknown (x=0-7)
43xf rwb++++ ????x - Unknown (x=0-7)
        ????????

        The effects of these registers (if any) are unknown. $43xf and $43xb
        are really aliases for the same register.

        This register is set to $ff on power on, and is unchanged on reset.



SPRITES
=======

The SNES has 128 independent sprites. The sprite definitions are stored in
Object Attribute Memory, or OAM.

OAM
---

OAM consists of 544 bytes, organized into a low table of 512 bytes and a high
table of 32 bytes. Both tables are made up of 128 records. OAM is accessed by
setting the word address in register $2102, the "table select" in bit 0 of
$2103, then writing to $2104 or reading from $2138. Since the high table is
only 32 bytes long, only the low 4 bits of $2102 are significant for indexing
this table.

The internal OAM address is invalidated during the rendering of a scanline;
this invalidation is deterministic, but we do not know how or when the value is
determined. Current theory is that it is invalidated more-or-less continuously
and has something to do with the current OAM address and possibly which sprites
are on the current scanline. The internal OAM address is reloaded from $2102/3
at the beginning of V-Blank, if this occurs outside of a force-blank period.
The reload also occurs on a 1->0 transition of $2100.7.

Each read/write increments the address by one byte (the internal address has 10
bits, with bit 9 selecting the table and bits 0-8 indexing). Reads simply read
the current byte. Writes to the low table go into a word-sized buffer, which is
written to the appropriate word of OAM when the high byte of the word is
written. Thus, if alternating reads and writes occur such that the high byte of
the word is always read instead of written, none of the writes will actually
affect OAM. If the alternation happens such that the writes always occur to the
high byte, not only the high bytes but whatever garbage is left in the low byte
will be written as well!

Pictorially: Start OAM filled with all zeros. Write 1, read, read, write 2,
read, write 3 => OAM is 00 00 01 02 01 03, rather than 01 00 00 02 00 03 as
you might expect.

Writes to the high table, on the other hand, work exactly as expected.
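
The buffered-write behavior can be reproduced with a small Python model. This
is a sketch under simplifying assumptions: the class and method names are
invented, address invalidation during rendering is ignored, and high-table
writes are assumed not to touch the low-byte buffer.

```python
class OamPort:
    """Toy model of OAM access through $2102/3, $2104 and $2138."""

    def __init__(self):
        self.oam = bytearray(544)  # 512-byte low table + 32-byte high table
        self.addr = 0              # 10-bit byte address, bit 9 = high table
        self.buffer = 0            # latch holding the last low byte written

    def set_address(self, word_addr, high_table=False):
        self.addr = (0x200 if high_table else 0) | ((word_addr & 0xFF) << 1)

    def _index(self):
        if self.addr & 0x200:
            return 0x200 + (self.addr & 0x1F)  # high table is 32 bytes
        return self.addr

    def _step(self):
        self.addr = (self.addr + 1) & 0x3FF

    def read(self):
        value = self.oam[self._index()]
        self._step()
        return value

    def write(self, value):
        value &= 0xFF
        if self.addr & 0x200:
            self.oam[self._index()] = value      # high table: direct write
        elif self.addr & 1:
            self.oam[self.addr - 1] = self.buffer  # low byte from the latch
            self.oam[self.addr] = value
        else:
            self.buffer = value                  # low byte only buffered
        self._step()
```

Running the sequence "write 1, read, read, write 2, read, write 3" on a fresh
OamPort leaves the first six bytes as 00 00 01 02 01 03.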

The record format for the low table is 4 bytes:
  byte OBJ*4+0: xxxxxxxx
  byte OBJ*4+1: yyyyyyyy
  byte OBJ*4+2: cccccccc
  byte OBJ*4+3: vhoopppN

The record format for the high table is 2 bits:
  bit 0/2/4/6 of byte OBJ/4: X
  bit 1/3/5/7 of byte OBJ/4: s

The values are:
  Xxxxxxxxx = X position of the sprite. Basically, consider this signed but see
      below.
  yyyyyyyy  = Y position of the sprite. Values 0-239 are on-screen. -63 through
      -1 are "off the top", so the bottom part of the sprite comes in at the
      top of the screen. Note that this implies a really big sprite can go off
      the bottom and come back in the top.
  cccccccc  = First tile of the sprite. See below for the calculation of the
      VRAM address. Note that this could also be considered as 'rrrrcccc'
      specifying the row and column of the tile in the 16x16 character table.
  N         = Name table of the sprite. See below for the calculation of the
      VRAM address.
  ppp       = Palette of the sprite. The first palette index is 128+ppp*16.
  oo        = Sprite priority. See below for details.
  h/v       = Horizontal/Vertical flip flags. Note this flips the whole sprite,
      not just the individual tiles. However, the rectangular sprites are
      flipped vertically as if they were two square sprites (i.e. rows
      "01234567" flip to "32107654", not "76543210").
  s         = Sprite size flag. See below for details.

The sprite size is controlled by bits 5-7 of $2101, and the Size bit of OAM.
$2101 determines the two possible sizes for all sprites. If the OAM Size flag
is 0, the sprite uses the smaller size, otherwise it uses the larger size.
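
As an illustration of the two record formats, the following Python sketch
decodes one sprite from a 544-byte OAM image (low table first, then the high
table). The function name and dictionary keys are arbitrary, and X is simply
treated as a signed 9-bit value.

```python
def decode_sprite(oam, n):
    """Decode sprite n (0-127) from a 544-byte OAM dump."""
    x_low, y, tile, attr = oam[n * 4:n * 4 + 4]
    high = oam[512 + n // 4]       # 2 bits per sprite, 4 sprites per byte
    shift = (n % 4) * 2
    x9 = (((high >> shift) & 1) << 8) | x_low
    palette = (attr >> 1) & 7
    return {
        "x": x9 - 512 if x9 & 0x100 else x9,  # signed 9-bit X
        "y": y,
        "tile": tile,                         # cccccccc
        "name_table": attr & 1,               # N
        "palette": palette,                   # ppp
        "cgram_base": 128 + palette * 16,     # first palette index
        "priority": (attr >> 4) & 3,          # oo
        "hflip": bool(attr & 0x40),           # h
        "vflip": bool(attr & 0x80),           # v
        "large": bool((high >> (shift + 1)) & 1),  # s
    }
```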


Palettes
--------

There are 8 16-color palettes available to sprites, starting at CGRAM index
128. Thus, the palette number 'ppp' in OAM indicates that colors 128+ppp*16
through 128+ppp*16+15 are available to this sprite. However, the first of these
is always considered transparent, to allow for non-rectangular shaped sprites.

Only sprites with palettes 4-7 participate in color math.


Character table in VRAM
-----------------------

Sprites have two 16x16 tile character tables in VRAM. Wrapping on these works
much like for BG tilemaps: tile 0 is to the right of tile $0F and below tile
$F0, tile $10 is below tile 0 and to the right of tile $1F, tile $FF is to the
left of tile $F0 and above tile $0F, and so on. Which character table a sprite
uses is determined by the N bit in OAM. So if you specify Tile=$ff, your 16x16
sprite is made of tiles $ff, $f0, $0f, and $00.

The first table is at the address specified by the Name Base bits of $2101, and
the offset of the second is determined by the Name bits of $2101. The word
address in VRAM of a sprite's first tile may be calculated as:
  ((Base<<13) + (cccccccc<<4) + (N ? ((Name+1)<<12) : 0)) & 0x7fff
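
The wrapping rule and the address formula can be written out in Python as
below. The function names are made up for this sketch. 'base' and 'name' stand
for the Name Base and Name fields of $2101, and the address function is only
claimed for a sprite's first tile, as in the formula above.

```python
def sprite_tile_numbers(first_tile, tiles_wide, tiles_high):
    """Tile numbers making up a sprite, as rows of columns.

    Rows and columns wrap independently within the 16x16 character table.
    """
    rows = []
    for r in range(tiles_high):
        row_bits = (first_tile + (r << 4)) & 0xF0
        rows.append([row_bits | ((first_tile + c) & 0x0F)
                     for c in range(tiles_wide)])
    return rows


def sprite_tile_vram_word_address(base, name, tile, n):
    """Word address in VRAM of the sprite's first tile."""
    second_table = ((name + 1) << 12) if n else 0
    return ((base << 13) + (tile << 4) + second_table) & 0x7FFF
```

For a 16x16 sprite with Tile=$ff, sprite_tile_numbers(0xFF, 2, 2) gives
[[0xFF, 0xF0], [0x0F, 0x00]], the same four tiles listed above.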

See the section "BACKGROUNDS" below for details on the format of the character
data.


Sprite Priority
---------------

There are two 'priority' concepts applicable to sprites. First, there are
the priority bits in OAM, which control the priority of the sprites relative
to the BGs. See the section "BACKGROUNDS" for more details on this.

The second is the priority with relation to the other sprites. This is
completely controlled by the sprite's index and the priority rotation
setting.

Priority rotation is set by bit 7 of $2103. If the bit is unset, Sprite 0 is
always the first sprite. Otherwise, take the current internal OAM word
address (not affected by OAM Address Invalidation) and give priority to the
sprite number (OAMAddr&0xFE)>>1. So if you set $2102/3 to $104, then write 4
bytes, sprite 3 will have priority for the next frame. However, OAM Address
Reset will reset the internal OAM address to word $104, so sprite 2 will
have priority for subsequent frames.

There is one major oddity: if you set $2102/3=A, then write 4n+2*(A&1)+1
bytes (e.g. so the next byte written would go to the last byte in the 4-byte
sprite record), sprite ((OAMAddr>>1)+Y)&0x7F has priority (where Y is the
current line as addressed by sprites). Thus, if you put all 128 8x8 sprites
at Y=63, write $8000 to $2102/3, then read 3 bytes from $2138, you will see
sprites 63-70 having priority on successive scanlines.

FirstSprite ends up on top of all other sprites, regardless of the priority
bits in OAM. FirstSprite+1 is on top of FirstSprite+2 is on top of
FirstSprite+3 and so on until FirstSprite+127 (wrapping of course from sprite
127 to sprite 0). Note that only the priority of the topmost sprite is
considered relative to the backgrounds. Thus, if FirstSprite+3 and
FirstSprite+4 are identical except FirstSprite+3 has priority 0 and
FirstSprite+4 has priority 3, they will both be hidden by any backgrounds that
hide priority 0 sprites. This may seem counterintuitive, since FirstSprite+4
would normally go in front of these BGs, but many games depend on this
behavior.


Drawing the Sprites
-------------------

As with everything else on the SNES, sprites are drawn per-scanline. The
process is basically as follows:

 0) If any OBJ is at X=256 (or X=-256, same difference), consider it as being
    at X=0 when considering Range and Time. Note that this doesn't mean you
    actually draw it at X=0.

 1) Range: Starting with the FirstSprite, determine the first 32 sprites on
    this scanline. Only those sprites with -size < X < 256 are considered in
    Range. If there are more than 32 sprites on the scanline, set bit 6 of
    register $213e.

 2) Time: Starting with the last sprite in Range, load up to 34 8x8 tiles (from
    left-to-right, after flipping). If there are more than 34 tiles in Range,
    set bit 7 of $213e. Only those tiles with -8 < X < 256 are counted.
 
 3) Associate with each tile in Range and Time its true X position (256/-256
    should not be set to 0), palette, and priority for drawing.

See the section "RENDERING THE SCREEN" below for details.
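
A rough sketch of steps 0 and 1 (Range only) follows. It is an illustration
under assumptions, not a description of the hardware: 'xs' holds each sprite's
X as a signed value from -256 to 255, 'widths' holds each sprite's width in
pixels, and 'on_line' is a hypothetical callback that says whether a sprite
covers the current scanline vertically.

```python
def sprites_in_range(first_sprite, xs, widths, on_line):
    """Return (selected sprite indices, overflow flag) for one scanline."""
    selected = []
    overflow = False                  # would be reflected in bit 6 of $213e
    for i in range(128):
        n = (first_sprite + i) & 0x7F
        if not on_line(n):
            continue
        x = xs[n]
        if x == -256:                 # step 0: X=256/-256 counts as X=0 here
            x = 0
        if not (-widths[n] < x < 256):
            continue
        if len(selected) == 32:       # a 33rd sprite was found on this line
            overflow = True
            break
        selected.append(n)
    return selected, overflow
```

The true X position is kept untouched in 'xs', in line with step 3: only the
Range test treats 256/-256 as 0.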


BACKGROUNDS
===========

BG Modes
--------

The SNES has 8 background modes (0-7), two of which have major variations. The
modes are selected by bits 0-2 of register $2105. The variation of Mode 1 is
selected by bit 3 of $2105, and the variation of Mode 7 is selected by bit 6 of
$2133.

Mode    # Colors for BG
         1   2   3   4
======---=---=---=---=
0        4   4   4   4
1       16  16   4   -
2       16  16   -   -
3      256  16   -   -
4      256   4   -   -
5       16   4   -   -
6       16   -   -   -
7      256   -   -   -
7EXTBG 256 128   -   -

In all modes and for all BGs, color 0 in any palette is considered transparent.


Tile Maps and Character Maps
----------------------------

Each BG has two regions of VRAM associated with it: one for the tilemap, and
one for the character data.

The tilemap address is selected by bits 2-7 of registers $2107-a, and the
tilemap size is selected by bits 0-1 of that same register. All tilemaps are
32x32; bits 0-1 simply select the number of 32x32 tilemaps and how they're
laid out in memory:
  00  32x32   AA
              AA
  01  64x32   AB
              AB
  10  32x64   AA
              BB
  11  64x64   AB
              CD
Starting at the tilemap address, the first $800 bytes are for tilemap A. Then
come the $800 bytes for B, then C then D. Of course, if only A is required
something else could be stuck in the empty space.

Each entry in the tilemap is 2 bytes, formatted as (high low):
  vhopppcc cccccccc

  v/h  = Vertical/Horizontal flip this tile.
  o    = Tile priority.
  ppp  = Tile palette. The number of entries in the palette depends on the Mode
      and the BG.
  cccccccccc = Tile number.
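
For illustration, a tilemap entry could be unpacked as below. The function
name and dictionary keys are arbitrary choices; 'word' is the 16-bit entry
with the high byte in bits 8-15.

```python
def decode_tilemap_entry(word):
    """Split a 16-bit tilemap entry (vhopppcc cccccccc) into its fields."""
    return {
        "vflip": bool(word & 0x8000),     # v
        "hflip": bool(word & 0x4000),     # h
        "priority": (word >> 13) & 1,     # o
        "palette": (word >> 10) & 7,      # ppp
        "tile": word & 0x3FF,             # cccccccccc
    }
```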

To find the tilemap word address for a particular tile (X and Y), you'd use a
formula something like this:
  (Addr<<10) + ((Y&0x1f)<<5) + (X&0x1f) + 
     (SY ? ((Y&0x20)<<(SX ? 6 : 5)) : 0) + (SX ? ((X&0x20)<<5) : 0)
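
The offset part of that formula can be sketched as follows. To keep the
example independent of how the register bits are scaled, 'base' is assumed to
be the word address of tilemap A, already derived from the address bits; 'x'
and 'y' are tile coordinates, and 'sx'/'sy' say whether the map is 64 tiles
wide/tall. All names are hypothetical.

```python
def tilemap_word_address(base, x, y, sx, sy):
    """Word address of the tilemap entry for tile (x, y)."""
    addr = base + ((y & 0x1F) << 5) + (x & 0x1F)    # position inside a 32x32 map
    if sy:
        # Skip one map (32x64) or two maps (64x64) for the lower half.
        addr += (y & 0x20) << (6 if sx else 5)
    if sx:
        addr += (x & 0x20) << 5                      # skip one map for the right half
    return addr
```

For a 64x64 layout, tile (32, 32) lands $C00 words past the base, which is the
start of tilemap D.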

The tile character data is stored at the address pointed to by registers
$210b-c, starting at byte address:
  (Base<<13) + (TileNumber * 8*NumBitplanes)
Each tile is (normally) 8x8 pixels. The data is stored in bitplanes.
Each row of the tile fills 1 byte, with the leftmost pixel being in bit 7. For
4-color tiles, bitplanes 0 and 1 are stored in the low and high bytes of a
word, with 8 words making up the tile. For a 16-color tile, bitplanes 0 and 1
are stored as for a 4-color tile, followed by bitplanes 2 and 3 in the same
format. A 256-color tile is stored in the same way as 2 16-color tiles
(bitplanes 0-3, followed by bitplanes 4-7).
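
A minimal de-bitplaning sketch, assuming each further pair of bitplanes
follows the previous pair as a block of 16 bytes (the layout described above
for 16-color tiles, extended to 8 bitplanes). 'data' is the raw character
data for one 8x8 tile; the function name is made up.

```python
def decode_tile(data, bitplanes):
    """Return an 8x8 list of color indices from bitplaned character data."""
    assert bitplanes in (2, 4, 8)
    assert len(data) >= 8 * bitplanes
    rows = []
    for y in range(8):
        row = []
        for x in range(8):
            bit = 7 - x                   # leftmost pixel is bit 7
            color = 0
            for plane in range(bitplanes):
                # Planes come in pairs: low/high byte of each of 8 words.
                byte = data[(plane // 2) * 16 + y * 2 + (plane & 1)]
                color |= ((byte >> bit) & 1) << plane
            row.append(color)
        rows.append(row)
    return rows
```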

If the appropriate bit of $2105 is set, each "tile" of the tilemap actually
corresponds to a 16x16 pixel block consisting of Tile, Tile+1, Tile+16, and
Tile+17. In this case, the 32x32 tile tilemap codes for a 512x512 pixel screen
rather than a 256x256 pixel screen as normal. Thus, using both 16x16 tiles and
the 64x64 tilemap each BG can be up to 1024x1024 pixels. There is no wrapping
like there is for 16x16 sprites: if you specify Tile=$2ff, you'll get $2ff,
$300, $30f, and $310 (as opposed to $2ff, $2f0, $20f, and $200 you might
otherwise expect). $3ff goes to $000, of course. Flipping in this mode flips
the whole 16x16 tile, not just the individual 8x8 tiles.
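
As a sketch, the four 8x8 tile numbers making up one 16x16 BG tile could be
computed like this. The function name and the [[top-left, top-right],
[bottom-left, bottom-right]] return layout are arbitrary; each 8x8 tile must
additionally be drawn flipped itself.

```python
def bg_16x16_subtiles(tile, hflip=False, vflip=False):
    """Tile numbers for each quadrant of a 16x16 BG tile."""
    # Plain 10-bit addition: no per-nibble wrapping as with 16x16 sprites.
    quads = [[tile & 0x3FF, (tile + 1) & 0x3FF],
             [(tile + 16) & 0x3FF, (tile + 17) & 0x3FF]]
    if hflip:
        quads = [row[::-1] for row in quads]   # swap left and right columns
    if vflip:
        quads = quads[::-1]                    # swap top and bottom rows
    return quads
```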


BG Scrolling
------------

Of course, depending on the BG mode and the interlace setting, Modes 0-6 have
an actual display of 256x224 or 256x239 pixels. The BG scroll registers
$210d-$2114 control the offset of the displayed area within that possible
256x256 to 1024x1024 pixel BG.

The display can never fall outside the BG: if that would seem to be the case,
simply wrap around back to 0 (or 'tile' the BG to fill the full 1024x1024,
however you like to think of it).

The registers $210d-$2114 are all write-twice to set the 16-bit value. The way
this works, the last write to any of these registers is stored in a buffer.
When a new byte is written to any register, the current register value, the
previous byte written to any of the 8 registers, and the new byte written
are combined as follows:
  For BGnHOFS: (NewByte<<8) | (PrevByte&~7) | ((CurrentValue>>8)&7)
  For BGnVOFS: (NewByte<<8) | PrevByte
For the most part, the details don't really matter as most games always write
two bytes to one of these registers. However, some games write only one byte,
or they do other odd things.
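
The write behavior can be sketched with a small class. This simply
transcribes the two formulas above; the class and attribute names are
invented, and no masking to the registers' actual width is applied.

```python
class ScrollRegs:
    """Sketch of the shared write buffer for the BGnHOFS/BGnVOFS registers."""

    def __init__(self):
        self.prev = 0           # last byte written to any scroll register
        self.hofs = [0] * 4     # index 0-3 stands for BG1-BG4
        self.vofs = [0] * 4

    def write_hofs(self, bg, byte):
        cur = self.hofs[bg]
        self.hofs[bg] = (byte << 8) | (self.prev & ~7) | ((cur >> 8) & 7)
        self.prev = byte

    def write_vofs(self, bg, byte):
        self.vofs[bg] = (byte << 8) | self.prev
        self.prev = byte
```

With the usual low-then-high pair of writes to one register, the result is
the expected 16-bit value; the odd cases only show up when a game writes a
single byte or interleaves writes to different registers.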

Thus, the tilemap entry for a particular X and Y position on the screen may be
calculated as follows:
  Size = 8 or 16 depending on the appropriate bit of $2105
  TileX = (X + BGnHOFS)/Size
  TileY = (Y + BGnVOFS)/Size
  Look up the tile at TileX and TileY as described above.

Note that many games will set their vertical scroll values to -1 rather than 0.
This is because the SNES loads OBJ data for each scanline during the previous
scanline. The very first line, though, wouldn't have any OBJ data loaded! So
the SNES doesn't actually output scanline 0, although it does everything to
render it. These games want the first line of their tilemap to be the first
line output, so they set their VOFS registers in this manner. Note that an
interlace screen needs -2 rather than -1 to properly correct for the missing
line 0 (and an emulator would need to add 2 instead of 1 to account for this).


Direct Color Mode
-----------------

For the 256-color BGs of Modes 3, 4, and 7, $2130 bit 0 when set enables direct
color mode. In this mode, instead of ignoring ppp and using the character data
as the palette index, you treat the character data as expressing a color
BBGGGRRR, and use the 3 bits of ppp as bgr to make the color
  Red=RRRr0, Green=GGGg0, Blue=BBb00
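
Written out as a sketch (the function name is made up), with 'pixel' being
the 8-bit character data BBGGGRRR and 'ppp' the palette bits read as bgr:

```python
def direct_color(pixel, ppp):
    """Return 5-bit (red, green, blue) for a direct color pixel."""
    red = ((pixel & 0x07) << 2) | ((ppp & 1) << 1)                    # RRRr0
    green = (((pixel >> 3) & 0x07) << 2) | (((ppp >> 1) & 1) << 1)    # GGGg0
    blue = (((pixel >> 6) & 0x03) << 3) | (((ppp >> 2) & 1) << 2)     # BBb00
    return red, green, blue
```

The transparency test on pixel value 0 is not part of this function and still
has to be done by the caller.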

In direct color mode you cannot have a black pixel, since any pixel with
character data = 0 is still considered transparent. Use one of the almost-black
colors instead (01, 08 or 09 are good choices).


Mode 0
------

In Mode 0, you have 4 BGs of 4 colors each. To calculate the starting palette
entry for a particular tile, you calculate:
  ppp*4 + (BG#-1)*32

The background priority is (from 'front' to 'back'):
  Sprites with priority 3
  BG1 tiles with priority 1
  BG2 tiles with priority 1
  Sprites with priority 2
  BG1 tiles with priority 0
  BG2 tiles with priority 0
  Sprites with priority 1
  BG3 tiles with priority 1
  BG4 tiles with priority 1
  Sprites with priority 0
  BG3 tiles with priority 0
  BG4 tiles with priority 0


Mode 1
------

In Mode 1, you have 2 BGs of 16 colors and 1 BG of 4 colors. To calculate the
starting palette entry, calculate:
  ppp*ncolors

The background priority varies depending on the setting of bit 3 of $2105.
The priority is (from 'front' to 'back'):
  BG3 tiles with priority 1 if bit 3 of $2105 is set
  Sprites with priority 3
  BG1 tiles with priority 1
  BG2 tiles with priority 1
  Sprites with priority 2
  BG1 tiles with priority 0
  BG2 tiles with priority 0
  Sprites with priority 1
  BG3 tiles with priority 1 if bit 3 of $2105 is clear
  Sprites with priority 0
  BG3 tiles with priority 0


Mode 2
------

In Mode 2, you have 2 BGs of 16 colors each. To calculate the starting palette
index, calculate:
  ppp*16

The priority is (from 'front' to 'back'):
  Sprites with priority 3
  BG1 tiles with priority 1
  Sprites with priority 2
  BG2 tiles with priority 1
  Sprites with priority 1
  BG1 tiles with priority 0
  Sprites with priority 0
  BG2 tiles with priority 0

Note the change from Modes 0 and 1.

Mode 2 is the first of the Offset-Per-Tile Modes. In this mode, the 'tile
data' for BG3 actually encodes a (possible) replacement HOffset and/or
VOffset value for each tile of BG1 and/or BG2.

Consider a visible scanline. Normally, you'd get the pixels something like
this:

  HOFS = X + BGnHOFS
  VOFS = Y + BGnVOFS
  Pixel[X,Y] = GetPixel(GetTile(BGn, HOFS, VOFS), HOFS, VOFS)

With offset-per-tile, the formula is a little more complicated:
  
  HOFS = X + BGnHOFS
  VOFS = Y + BGnVOFS
  ValidBit = 0x2000 for BG1, or 0x4000 for BG2
  if (!IsFirst8x8Tile(BGn, HOFS)) {
    /* Hopefully these calculations are right... */
    Hval = GetTile(BG3, (HOFS&7)|(((X-8)&~7)+(BG3HOFS&~7)), BG3VOFS)
    Vval = GetTile(BG3, (HOFS&7)|(((X-8)&~7)+(BG3HOFS&~7)), BG3VOFS + 8)
    if (Hval&ValidBit) HOFS = (HOFS&7) | ((X&~7) + (Hval&~7))
    if (Vval&ValidBit) VOFS = Y + Vval
  }
  Pixel[X,Y] = GetPixel(Get8x8Tile(BGn, HOFS, VOFS), HOFS, VOFS)

In other words, number the visible tiles in BGn from 0-32, and the 'visible'
tiles in BG3 the same way. BGn tile 0 is offset as normal, then for 1<=T<33
BGn tile T gets the offset data from BG3 tile T-1. It doesn't matter whether
or not the tiles actually align in any way.

Note that the leftmost visible tile is done as normal in all cases (although
as little as 1 pixel may be visible, and if that still bothers you then use
a clip window to hide it), and the next tile uses the tilemap entry for what
would be BG3's leftmost tile. Note also that the 'new' offset completely
overrides the BGnVOFS register, but the lower 3 bits of the BGnHOFS offset
are still used. And note that the current Y position on the screen does not
affect which row of the BG3 tilemap to reference, it's as if Y were always 0.

On the other hand, note that even if BGn is 16x16 tiles, BG3 can specify the
offset for each 8x8 subtile. And if BG3 is 16x16, the offsets will apply to
all the corresponding 8x8 subtiles on BGn. Also note that if BG3 is 16x16, we
may end up using the same tile for Hval and Vval.


Mode 3
------

In Mode 3, you have one 256-color BG and one 16-color BG. To calculate the
starting palette index, calculate:
  BG1: 0
  BG2: ppp*16

The priority is (from 'front' to 'back'):
  Sprites with priority 3
  BG1 tiles with priority 1
  Sprites with priority 2
  BG2 tiles with priority 1
  Sprites with priority 1
  BG1 tiles with priority 0
  Sprites with priority 0
  BG2 tiles with priority 0

Note that register $2130 may enable Direct Color Mode on BG1.


Mode 4
------

In Mode 4, you have one 256-color BG and one 4-color BG. To calculate the
starting palette index, calculate:
  BG1: 0
  BG2: ppp*4

The priority is (from 'front' to 'back'):
  Sprites with priority 3
  BG1 tiles with priority 1
  Sprites with priority 2
  BG2 tiles with priority 1
  Sprites with priority 1
  BG1 tiles with priority 0
  Sprites with priority 0
  BG2 tiles with priority 0

Note that register $2130 may enable Direct Color Mode on BG1.

Mode 4 is the second of the Offset-Per-Tile Modes. It operates much like
Mode 2, however the SNES doesn't have time to load two offset values.
Instead, it does this:
    Val = GetTile(BG3, ...)
    if (Val&0x8000) {
      Hval = 0
      Vval = Val
    } else {
      Hval = Val
      Vval = 0
    }


Mode 5
------

In Mode 5, you have one 16-color BG and one 4-color BG. To calculate the
starting palette index, calculate:
  ppp*ncolors

The priority is (from 'front' to 'back'):
  Sprites with priority 3
  BG1 tiles with priority 1
  Sprites with priority 2
  BG2 tiles with priority 1
  Sprites with priority 1
  BG1 tiles with priority 0
  Sprites with priority 0
  BG2 tiles with priority 0

Mode 5 is rather different from the previous modes. Instead of using an 8/16
pixel wide tile as normal, it always takes a 16 pixel wide tile (the height
may still be 8 or 16) and only uses half the pixels (zero-based, the even
pixels for subscreen tiles and the odd pixels for mainscreen tiles). Then it
forces pseudo-hires on to render a 512-pixel wide scanline. Also, if
Interlace mode is on (see bit 0 of $2133), the screen is 448 or 478
half-lines high instead of 224 or 239. Either the odd half-lines or the even
half-lines are drawn each frame, as indicated by bit 7 of $213f.

Note that this means you must set $212c and $212d to the same value to get the
'expected' display.


Mode 6
------

In Mode 6, you have only one 16-color BG. To calculate the starting palette
index, calculate:
  ppp*ncolors

The priority is (from 'front' to 'back'):
  Sprites with priority 3
  BG1 tiles with priority 1
  Sprites with priority 2
  Sprites with priority 1
  BG1 tiles with priority 0
  Sprites with priority 0

Mode 6 has the same oddities as Mode 5. In addition, it is an offset per tile
mode! That part works just like Mode 2. However, remember that Mode 6 always
uses 8 pixel (16 half-pixel) wide tiles; this applies to BG3 as well as BG1.
You can't apply the offset to an 8-half-pixel tile nor to a 16-pixel wide area
(except by using two offset values for the two 8-pixel areas).


Mode 7
------

Mode 7 is extremely different from all the modes before. You have one BG of 256
colors. However, the tilemap and character map are laid out completely
differently.

The tilemap and charactermap are interleaved, with the character data being in
the high byte of each word and the tilemap data being in the low byte (note
that in hardware, VRAM is set up such that odd bytes are in one RAM chip and
even in another, and each RAM chip has a separate address bus. The Mode 7
renderer probably accesses the two chips independently). The tilemap is 128x128
entries of one byte each, with that one byte being simply a character map
index. The character data is stored packed pixel rather than bitplaned, with
one pixel per byte. Thus, to calculate the tilemap entry byte address for an X
and Y position in the playing field, you'd calculate:
  (((Y&~7)<<4) + (X>>3))<<1

To find the byte address of the pixel, you'd calculate:
  (((TileData<<6) + ((Y&7)<<3) + (X&7))<<1) + 1
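
Both address calculations can be combined into a small lookup sketch. The
function names are hypothetical, 'vram' is assumed to be a 64K byte array,
and 'x'/'y' are playing field coordinates (0-1023) after any matrix
transformation.

```python
def mode7_tilemap_byte_address(x, y):
    """Byte address of the tilemap entry (low byte of a word)."""
    return (((y & ~7) << 4) + (x >> 3)) << 1


def mode7_pixel_byte_address(tile, x, y):
    """Byte address of the pixel (high byte of a word)."""
    return (((tile << 6) + ((y & 7) << 3) + (x & 7)) << 1) + 1


def mode7_pixel(vram, x, y):
    """Fetch the 8-bit pixel value for a playing field position."""
    tile = vram[mode7_tilemap_byte_address(x, y)]
    return vram[mode7_pixel_byte_address(tile, x, y)]
```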
 
Note that bits 4-7 of $2105 are ignored, as are $2107-$210c. They can be
considered to be always 0.

The next odd thing about Mode 7 is that you have full matrix transformation
abilities. With creative use of HDMA, you can even change the matrix per
scanline. See registers $211b-$2120 for details on the matrix transformation
formula. The entire screen can be flipped with bits 0-1 of $211a.

And finally, the playing field can actually be made larger than the tilemap. If
bit 7 of $211a is set, bit 6 of $211a controls what is seen filling the space
surrounding the map.

The background priorities are:
  Sprites with priority 3
  Sprites with priority 2
  Sprites with priority 1
  BG1
  Sprites with priority 0

When bit 6 of $2133 is set, you get a related mode known as Mode 7 EXTBG. In
this mode, you get a BG2 with 128 colors, which uses the same tilemap and
character data as BG1 but interprets the high bit of the pixel as a priority
bit. The priority map is:
  Sprites with priority 3
  Sprites with priority 2
  BG2 pixels with priority 1
  Sprites with priority 1
  BG1
  Sprites with priority 0
  BG2 pixels with priority 0

Note that the BG1 pixels (if BG1 is enabled) will usually completely obscure
the low-priority BG2 pixels.

BG2 uses the Mode 7 scrolling registers ($210d-e) rather than the 'normal' BG2
ones ($210f-10). Subscreen, pseudo-hires, math, and clip windows work as
normal; keep in mind OBJ and that you can do things like enable BG1 on main and
BG2 on sub if you so desire. Mosaic is somewhat weird, see the section on
Mosaic below.

Note that BG1, being a 256-color BG, can do Direct Color mode (in this case, of
course, there is no palette value so you're limited to 256 colors instead of
2048). BG2 does not do direct color mode, since it is only 7-bit.


Rendering the BGs
-----------------

Rendering a BG is simple.

1) Get your H and V offsets (either by reading the appropriate registers or by
   doing the offset-per-tile calculation).
2) Use those to translate the screen X and Y into playing field X and Y
   - Note this is rather complicated for Mode 7
3) Look up the tilemap for those coordinates
4) Use that to find the character data
5) If necessary, de-bitplane it and stick it in a buffer.

See the section "RENDERING THE SCREEN" below for more details.

Unresolved Issues
-----------------

1) What happens to the very first pixel on the scanline in Hires Math?
2) Various registers still need to know when writing to them is effective.


WINDOWS
=======

The masking windows are pretty simple. The windows can be used to mask off a
portion of any BG on the scanline. With HDMA, they can be adjusted per
scanline. They can be combined in various ways, per BG. Each can be used to
select either the region of the BG to keep, or the region of the BG to hide,
per BG. All that's left is to see the registers above and the section
"RENDERING THE SCREEN" below for details.

The Color Window
----------------

The color window is rather different. The color window itself can be set
to clip the colors of pixels to black (before math, so it's almost the same
effect you'd get by setting all entries in the palette to black, then fixing
them before you do subscreen addition--the only difference is that half math
will not occur), and to prevent all color math effects from occurring. These
can be applied never, always, inside the "clip" windows specified for the color
window, or outside the "clip" window.

Bits 6-7 of register $2130 control whether the pixel colors (and half-math)
will be clipped inside the window, outside the window, never, or always. Bits
4-5 do the same for preventing color math.

Consider the main screen set up so BGs 1 and 2 are visible in an 8x8
checkerboard pattern, with all the BG1 pixels red and all the BG2 pixels blue.
The subscreen is filled with a green BG, and color math is enabled on BG 1
only. You'll end up with a yellow and blue checkerboard. Turn on the color
window to clip colors, and you'll get a green and black checkerboard since
the subscreen is only added (to a black pixel) where BG1 would be visible. If
you clip math instead, you'll get the same display you'd get with color math
disabled on all BGs.

In hires modes, we use the previous main-screen pixel to determine whether the
color window effect should be applied to a subscreen pixel. See "Color Math"
below for details.


RENDERING THE SCREEN
====================

Mosaic
------

The mosaic filter is applied after the BG is rendered and scrolled but before
it is clipped, combined with other BGs, pseudo-hiresed, or mathed. Each XxX
block of pixels is replaced with the upper-leftmost pixel of the block. The
'blocks' are such that the upper-leftmost block is at the left edge of the
screen at the scanline where $2106 was written (or the first visible scanline
if it was not written this frame).
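
A sketch of the basic (non-hires, non-EXTBG) case, with invented function
names: 'size' is the block size X in pixels, 'line' is a rendered scanline as
a list of pixels, and 'start_line' is the scanline the blocks are aligned to.

```python
def mosaic_source_row(y, start_line, size):
    """Scanline whose pixels are shown on scanline y (y >= start_line)."""
    return start_line + ((y - start_line) // size) * size


def mosaic_line(line, size):
    """Replace each pixel with the leftmost pixel of its block."""
    return [line[x - (x % size)] for x in range(len(line))]
```

To render scanline y, one would take the BG pixels of
mosaic_source_row(y, ...) and pass them through mosaic_line().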

Modes 5/6 Hires work slightly differently: they use a 2XxX block of
half-pixels. Similarly, Modes 5/6 interlaced use a 2Xx2X block of half-pixels.
So if you set $2106 to $0F ("1x1" blocks), the even half-pixels will be
expanded to cover the odd half-pixels. $1F would cover the next even-and-odd
pixel over as well. An example: put a single red pixel at line #1 pixel #0 of
Mode 5 BG1, and a single blue pixel in the same place on BG2. Enable BG1 on
main and BG2 on sub, you'll see the blue pixel only. Set $2106=$03, and you'll
suddenly see both the blue and red pixels. Set $2106=$13, and you'll see
"BRBR" on two lines.

Mode 7's matrix transformations do not affect the mosaic block positions, so
BG1 can be mosaiced about as normal. BG2 in EXTBG mode is weird, though: it
uses bit 0 of $2106 to control "vertical mosaic" and bit 1 to control
"horizontal mosaic". So if $2106 is $F1, BG2 will expand with 1x16 blocks. $F2
will give 16x1 blocks, and only $F3 will give the expected 16x16 blocks. Note
that BG1 still uses bit 0 as usual, so you can have BG1 expanded with 16x16
blocks and the high-priority BG2 pixels expanded with 1x16 blocks on top of it.
Or you could have BG1 rendered as normal, but with the high-priority pixels
from BG2 expanded 16x1 on top of it.


Color Math
----------

Each main-screen BG (and the color-0 backdrop, and the sprites (although
sprites with palettes 0-3 never participate)) may be marked in register $2131
to participate in color math. If the visible pixel is from a layer/OBJ
participating in color math, we perform one of 8 operations on the pixel,
depending on $2130 bit 1 and $2131 bits 6-7.

  0 00: Add the fixed color. R, G, and B are added separately, and clipped to
        the max.
  0 01: Add the fixed color, and divide the result by 2 before clipping (unless
        the Color Window is clipping colors here).
  0 10: Subtract the fixed color from the pixel. For example, if the pixel is
        (31,31,0) and the fixed color is (0,16,16), the result is (31,15,0).
  0 11: Subtract the fixed color, and divide the result by 2 (unless CW etc).
  1 00: Add the corresponding subscreen pixel, or the fixed color if it's the
        subscreen backdrop.
  1 01: Add the subscreen pixel and divide by 2 (unless CW etc), or add the
        fixed color with no division.
  1 10: Subtract the subscreen pixel/fixed color.
  1 11: Subtract the subscreen pixel and divide by 2 (unless CW etc), or sub
        the fixed color with no division.
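
The arithmetic itself can be sketched as below. Choosing the operand
(subscreen pixel or fixed color) and deciding whether the halving applies are
left to the caller; the function and parameter names are made up, and colors
are (R, G, B) tuples of 5-bit values.

```python
def color_math(pixel, operand, subtract, half):
    """Add or subtract 'operand' per channel, optionally halve, then clip."""
    result = []
    for a, b in zip(pixel, operand):
        value = a - b if subtract else a + b
        if half:
            value //= 2           # halve before clipping
        result.append(max(0, min(31, value)))
    return tuple(result)
```

For the subtraction example above, color_math((31, 31, 0), (0, 16, 16), True,
False) gives (31, 15, 0).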

In hires modes, color math is applied to the visible subscreen pixels as well.
Choosing the math operation is simple: look at the previous main-screen pixel
(i.e. if we're at pixel #6 on the 512-pixel screen (which is taken from pixel
#4 on the subscreen), we look at pixel #5 (#3 on the main screen)). If no math
was applied to that pixel, don't math this subscreen pixel either. If the
fixed color was added/subtracted, add/subtract the fixed color. And if a pixel
from the subscreen was added/subtracted, add/subtract that main-screen pixel
(the original value before math). What happens to the subscreen pixel at the
left edge of the screen is unknown.

This is really important with color subtraction: normally, if you have
a block of cyan (#00ffff) on main and a block of magenta (#ff00ff) on sub,
subtraction would give a block of green. Hires math will give you a block of
alternating green and red, which will probably appear yellow on your TV. If
you've set $2131 bit 6 and this block is sitting alone in the middle of the
backdrop, you'll have a bright line at the left edge where the fixed color was
subtracted from the subscreen pixel and no 1/2 was applied (because the
previous main pixel had the fixed color subtracted and no 1/2 applied).


Rendering the Screen
--------------------

Note that this may be inaccurate.

 1) Go down the priority list to find the first BG/OBJ layer that is enabled on
    main, not clipped, and has a non-transparent pixel here. You'll always
    bottom out on the backdrop (color 0) if not before.
 2) If the color window clips colors here, set the color of that pixel to 0.
 3) If color math is applicable and the color window doesn't clip math here, do
    math.

Hires modes (BG modes 5 and 6 or any mode 0-4 with bit 3 of $2133 set) should
process the visible subscreen pixels as described above.



CONTROLLERS
===========

The SNES has 2 controller ports on the front of the unit, and an "expansion
port" on the bottom (which AFAIK was only used by a few things released only
in Japan). Little is known about the expansion port.

A number of peripherals could be plugged into the controller ports:
 * Joypads
 * The Multitap (aka MP5), into which up to 4 joypads may be plugged.
 * A Mouse, with 2 buttons.
 * The SuperScope, a bazooka-like light gun.
 * The Konami Justifiers, a normal style gun into which a second gun could
   be plugged.

There are probably others, these are just the ones I know anything about.


Generic
-------

Each controller port of the SNES has 7 pins, laid out something like this:
   _________________ ____________
  |                 |            \
  | (1) (2) (3) (4) | (5) (6) (7) |
  |_________________|____________/

The pins are:
 1: +5v (power)
 2: Clock
 3: Latch
 4: Data1
 5: Data2
 6: IOBit
 7: Ground

Latch is written through bit 0 of register $4016. Writing 1 to this bit
results in Latch going to whatever state means 'latch' to a joypad.

Clock of Port 1 is connected to the 'read' signal of $4016, in that reading
$4016 causes Clock to transition. Data1 and Data2 are then read, and Clock
transitions back (at this point, the pad is expected to stick its next bits
of data on Data1 and Data2). Clock of Port 2 is connected to $4017.

Data1 and Data2 are read through bits 0 and 1 (respectively) of $4016 and
$4017 (for Ports 1 and 2, respectively). Thus, you must read both bits at
once, you can't choose to read only Data1 and leave Data2 for later.

IOBit is connected to the I/O Port (which is accessed through registers
$4201 and $4213). Port 1's IOBit is connected to bit 6 of the I/O Port, and
Port 2's IOBit is connected to bit 7. Note that, since bit 7 of the I/O Port
is connected to the PPU Counter Latch, anything plugged into Port 2 may
latch the H and V Counters by setting IOBit to 0.

Auto Joypad Read, when enabled by bit 0 of $4200, effectively does the
following (in pseudo-ASM):
  LDA $4212
  ORA #$01
  STA $4212  ; pretend it's writable
  
  LDA #$01
  STA $4016
  ; There may be a delay here
  STZ $4016
  
  LDX #$0010
  loop:
      LDA $4016
      REP #$20
      LSR
      ROL $4218
      LSR
      ROL $421C
      SEP #$20

      LDA $4017
      REP #$20
      LSR
      ROL $421A
      LSR
      ROL $421E
      SEP #$20
      
      DEX
  BNE loop
  
  LDA $4212
  AND #$7E
  STA $4212  ; pretend it's writable again
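
For emulator authors, the shifting loop of that pseudo-ASM might be sketched
in Python as follows. 'read_4016' and 'read_4017' are hypothetical callbacks
that return the value a read of that register would give; the latch strobe
and the $4212 busy flag are left out.

```python
def auto_joypad_read(read_4016, read_4017):
    """Return the 16-bit values that end up in $4218, $421A, $421C, $421E."""
    regs = {0x4218: 0, 0x421A: 0, 0x421C: 0, 0x421E: 0}

    def shift_in(addr, bit):
        # Same effect as LSR followed by a 16-bit ROL of the register.
        regs[addr] = ((regs[addr] << 1) | bit) & 0xFFFF

    for _ in range(16):
        port1 = read_4016()
        shift_in(0x4218, port1 & 1)          # Port 1 Data1
        shift_in(0x421C, (port1 >> 1) & 1)   # Port 1 Data2
        port2 = read_4017()
        shift_in(0x421A, port2 & 1)          # Port 2 Data1
        shift_in(0x421E, (port2 >> 1) & 1)   # Port 2 Data2
    return regs
```

The first bit read from a pad ends up in bit 15 of the corresponding
register.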


"Open Port"
-----------

If nothing is plugged into a port (or the thing plugged in doesn't connect to
the pin), the SNES will read zeros from Data1 and Data2.


Joypads
-------

The joypads return 16 bits of data out Data1, then 1 bits until latched
again. The data is:
  byetUDLRaxlr0000

b/y/a/x/l/r are the similarly named buttons. 'e' is select. 't' is start.
U/D/L/R are the pad directions. Note that the standard joypad can only
return either U or D set, and either L or R set. Some games will crash or
exhibit other odd behavior if both U and D and/or both L and R are set.

Data2 is not even connected, nor is IOBit.
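
A small decoding sketch, assuming 'word' holds the 16 bits with the first bit
read in bit 15. The function name and the spelled-out button names are
arbitrary.

```python
BUTTONS = ["b", "y", "select", "start", "up", "down", "left", "right",
           "a", "x", "l", "r"]


def decode_joypad(word):
    """Map each button name to True if its bit is set in the 16-bit value."""
    return {name: bool(word & (0x8000 >> i)) for i, name in enumerate(BUTTONS)}
```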


Mouse
-----

The mouse returns 32 bits of data out Data1, and 1 bits thereafter. The data
is:
  00000000rlss0001 YyyyyyyyXxxxxxxx

l/r are the two mouse buttons. 'ss' are the "speed bits", which are
incremented mod 3 if Clock cycles while Latch is active. Y/X are the
direction bits (set is up/left), and yyyyyyy/xxxxxxx are the distance
traveled in the appropriate direction.

Supposedly, the 'speed bits' may not match the internal speed setting when
the mouse first receives power. The speed setting controls the delta curve
of the mouse, with 0 giving a flat curve and 2 giving the greatest delta
response.

Data2 and IOBit are presumably not connected, but this is not known for
sure.
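
A decoding sketch for the 32 bits, assuming they have been collected into one
integer with the first bit read in bit 31. The names are hypothetical, and
returning negative values for up/left is just one possible convention.

```python
def decode_mouse(bits):
    """Decode 32 mouse bits; returns None if the fixed bits don't match."""
    high = (bits >> 16) & 0xFFFF
    if high & 0xFF0F != 0x0001:       # expect 00000000 ....0001
        return None
    dy = (bits >> 8) & 0x7F
    if bits & 0x8000:                 # Y set: moving up
        dy = -dy
    dx = bits & 0x7F
    if bits & 0x0080:                 # X set: moving left
        dx = -dx
    return {
        "right": bool(high & 0x80),
        "left": bool(high & 0x40),
        "speed": (high >> 4) & 3,
        "dx": dx,
        "dy": dy,
    }
```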


SuperScope
----------

The SuperScope returns 8 bits of data out Data1, and 1 bits thereafter. The
data is:
  fctp00on

'f' is Fire, 'c' is Cursor, 't' is Turbo, 'p' is Pause, 'o' is Offscreen,
and 'n' is Noise.

The SuperScope has two modes of operation: normal mode and turbo mode. The
current mode is controlled by a switch on the unit, and is indicated by the
't' bit. Note however that the 't' bit is only updated when the Fire button
is pressed (i.e. the 'f' bit is set). Thus, when you turn turbo on the 't'
bit remains clear until you shoot, and similarly when turbo is deactivated
the bit remains set until you fire.

In either mode, the Pause bit will be set for the first strobe after the
pause button is pressed, and then will be clear for subsequent strobes until
the button is pressed again. However, the pause button is ignored if either
cursor or fire are down(?).

In either mode, the Cursor bit will be set while the Cursor button is pressed.

In normal mode, the Fire bit operates like Pause: it is on for only one strobe.
In turbo mode, it remains set as long as the button is held down.

When Fire/Cursor are set, Offscreen will be set if the gun did not latch during
the previous strobe and cleared otherwise (Offscreen is not altered when
Fire/Cursor are both clear).

Noise is set if there is interference in the infrared transmission from the
Scope to the receiver.

If the Fire button is being held when turbo mode is activated, the gun sets the
Fire bit and begins latching. If the Fire button is being held when turbo mode
is deactivated, the next poll will have Fire clear but the Turbo bit will not
be updated until the next fire (i.e. FcTp => turbo off => fcTp, not fctp).

The PPU latch operates as follows: When Fire or Cursor is set, IOBit is set
to 0 when the gun sees the TV's electron gun, and left a 1 otherwise. Thus,
if the SNES also leaves it one (bit 7 of $4201), the PPU Counters will be
latched at that point. This would also imply that bit 7 of $4213 will be 0
at the moment the SuperScope sees the electron gun.

Since the gun depends on the latching behavior of IOBit, it will only
function properly when plugged into Port 2. If plugged into Port 1 instead,
everything will work except that there will be no way to tell where on the
screen the gun is pointing.

When creating graphics for the SuperScope, note that the color red is not
detected. For best results, use colors with the blue component over 75% and/or
the green component over 50%.

Data2 is presumably not connected, but this is not known for sure.


Justifiers
----------

The Justifier returns 48 bits of data out Data1. Presumably it returns
one bits after (if so, it really only returns 32 bits), but this is not
known. The data is:
  0000000000001110 01010101TtSsl000 1111111111111111

T/t are the trigger states for guns 1 and 2. S/s are the start button states
for guns 1 and 2. 'l' indicates which gun was connected to IOBit: 1 means
gun 1, 0 means gun 2. Note that 'l' toggles even when gun 2 is not connected.

IOBit is used just like for the SuperScope. However, since two guns may be
plugged into one port, which gun is actually connected to IOBit changes each
time Latch cycles. Also note, the Justifier does not wait for the trigger to
be pulled before attempting to latch, it will latch every time it sees the
electron gun. Bit 6 of $213F may be used to determine if the Justifier was
pointed at the screen or not.

Data2 is presumably not connected, but this is not known for sure.


MP5
---

The MP5 plugs into one Controller Port on the SNES (typically Port 2), and
has 4 ports for controllers to be plugged into it (labeled 2 through 5). It
also has an override switch which makes it pass through Pad 2 and ignore
everything else.

If IOBit is 1, Clock is passed through to Pad 2 and Pad 3, Data1 is
connected to Data1 on Pad 2, and Data2 is connected to Data1 on Pad 3. If
IOBit is 0, Pads 4 and 5 are used instead of 2 and 3, respectively. In
either case, Latch is passed through to all pads, and IOBit is presumably
not passed through at all.

Note that Clock is only passed through to the pads that are actually being
passed through. Thus, you can read the first two pads (or let Auto-Joypad
Read do it), then toggle IOBit and read the other two pads manually. Most
games requiring more than 3 players do exactly this.
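
The routing described above is simple enough to model in a few lines of
Python. This is only a sketch of the selection logic, not of the actual serial
reads; the function names are invented for the example.

```python
def mp5_route(iobit):
    """Return (pad on Data1, pad on Data2) for the given IOBit level."""
    return (2, 3) if iobit else (4, 5)


def read_order():
    """Pads in the order a game sees them: IOBit=1 first, then IOBit=0."""
    order = []
    for iobit in (1, 0):
        order.extend(mp5_route(iobit))
    return order
```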

Also note that there is nothing preventing the MP5 from functioning
perfectly when plugged in to Port 1, except that the game must use bit 6 of
$4201 instead of bit 7 to set IOBit and must use the Port 1 registers
instead of the Port 2 registers. With 2 MP5 units, one could actually create
an 8-player game!

When Latch is active, 1s will be read from Data2 and 0s from Data1. This is
sometimes used to detect the presence of an MP5 unit. The override switch
disables this behavior.

There are reports that the MP5 does not react immediately when IOBit is
transitioned from 0 to 1. Thus, reading 2&3 then 4&5 will probably work
better than vice versa.


DMA AND HDMA
============

DMA, or "direct memory access", is found in a number of computer systems, not
just the Super Nintendo. It's basically a way for a peripheral or
coprocessor to read data directly from memory, instead of requiring the main
CPU to do a number of reads and writes. This is typically faster, if only
because it lets the system skip the opcode fetch-and-decode. In the SNES, the
CPU is paused during DMA since the address busses are in use for the transfer.

HDMA is similar in concept, though rather different in execution: instead of
transferring a block of memory all at once, it transfers a few bytes during
the H-Blank period of each scanline. This is extremely helpful, as H-Blank is
the only window in which most PPU registers may be changed mid-frame (at least
without glitching).

The SNES has 8 channels (numbered 0-7) that can be used for either DMA or HDMA.
HDMA takes priority over DMA if both are to occur at once, pausing all DMA and
terminating a conflicting DMA immediately. Lower-numbered channels take
priority over higher-numbered channels.


DMA
---

A DMA transfer has three main variables, and a number of setting bits. These
are: (those marked '*' must be set up before starting DMA)
* Direction (bit 7 of $43x0): Read from PPU or write to PPU?
* Fixed (bit 3 of $43x0): Adjust Address?
* Increment (bit 4 of $43x0): Direction to adjust Address?
* Mode (bits 0-2 of $43x0): See below...
* Port (register $43x1): If this is 'xx', the register accessed will be $21xx.
* AAddress (registers $43x2-4): Any CPU address, just like you'd use with
                                the Absolute Long addressing mode.
* Count (registers $43x5-6): The number of bytes to transfer.

See register $43x0 for the correspondence between the Mode bits and the
transfer mode. Note that One Register Write Once and One Register Write
Twice end up being the exact same thing, and Two Registers Write Once
and Two Registers Write Twice Alternate are the same, but that Two Registers
Write Once and Two Registers Write Twice Each are different.
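
To see why some modes coincide for DMA, it helps to write out which register
each successive byte lands on. The Python sketch below does that. The offset
patterns are my reading of the $43x0 entry (modes 6 and 7 are left out), and
the names MODE_OFFSETS and dma_ports are just for this example.

```python
# Port offsets for one pass through each Transfer Mode (bits 0-2 of $43x0).
MODE_OFFSETS = {
    0: (0,),          # One Register Write Once
    1: (0, 1),        # Two Registers Write Once
    2: (0, 0),        # One Register Write Twice
    3: (0, 0, 1, 1),  # Two Registers Write Twice Each
    4: (0, 1, 2, 3),  # Four Registers Write Once
    5: (0, 1, 0, 1),  # Two Registers Write Twice Alternate
}


def dma_ports(mode, port, count):
    """List the $21xx register hit by each of `count` DMA bytes, in order."""
    offsets = MODE_OFFSETS[mode]
    return [0x2100 | ((port + offsets[i % len(offsets)]) & 0xFF)
            for i in range(count)]
```

Because DMA just keeps cycling through the pattern until Count runs out, the
patterns (0) and (0, 0) give the same sequence, as do (0, 1) and (0, 1, 0, 1),
while (0, 1) and (0, 0, 1, 1) do not.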

DMA transfers take 8 master cycles per byte transferred, no matter the
FastROM setting. There is also an overhead of 8 master cycles per channel, and
an overhead of 12-24 cycles for the whole transfer.
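
Putting those numbers together, a rough cost estimate might look like the
following Python sketch. The function name is invented, and the byte counts
passed in are the number of bytes actually transferred (so a Count of 0 would
be passed as 65536).

```python
def dma_master_cycles(byte_counts):
    """Estimate (min, max) master cycles for one write to $420b.

    `byte_counts` holds the number of bytes moved by each enabled channel.
    """
    per_channel = sum(8 + 8 * n for n in byte_counts)  # overhead + bytes
    return per_channel + 12, per_channel + 24          # whole-transfer cost
```

For example, a single channel moving $800 bytes comes out at somewhere between
16404 and 16416 master cycles.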

The basic process seems to be:
 1. Get byte and write it to the destination.
    - The DMA seems to take advantage of the SNES's two address busses with one
      shared data bus. AAddress is pushed out Bus A, Port is pushed out bus B,
      and the read/write signals are sent according to Direction. The bus
      marked read obligingly puts data on the bus, while the bus marked write
      obligingly writes that value.
    - Thus, since the PPU/APU/WRAM registers are only accessible via Bus B,
      attempts to access them via AAddress will result in Open Bus accesses.
    - Attempts to access WRAM via both Bus A and Bus B (registers 2180-3) will
      fail, with the 2180-3 access being Open Bussed.
    - Also, DMA cannot access the $4300-$437f registers nor $420b nor $420c.
      Writes will have no effect, and reads will return Open Bus.
 2. Adjust AAddress.
    - If Fixed is set, do nothing. Else if Increment is set, subtract one,
      else add one.
    - Note that the bank byte is not modified.
 3. Decrement Count. If count is not zero, then go to step 1.
    - Thus, if Count is initially zero, it wraps to 65535 before being
      tested. So you end up transferring 65536 bytes.

Note that Count ($43x5-6) always ends up 0, unless a conflicting HDMA
terminates the transfer early.
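
The three steps can be sketched in Python as below. This only models the
address and count bookkeeping, not the bus accesses themselves, and it ignores
HDMA interruptions. The `decrement` argument stands for the bit this document
calls Increment (bit 4 of $43x0); the function name and return shape are made
up for the example.

```python
def run_dma(a_addr, count, fixed=False, decrement=False):
    """Model AAddress/Count over one DMA transfer.

    Returns (A-bus addresses accessed, final AAddress, final Count).
    """
    bank = a_addr & 0xFF0000       # the bank byte is never modified
    offset = a_addr & 0xFFFF
    accessed = []
    while True:
        accessed.append(bank | offset)                 # step 1
        if not fixed:                                  # step 2
            offset = (offset + (-1 if decrement else 1)) & 0xFFFF
        count = (count - 1) & 0xFFFF                   # step 3
        if count == 0:
            break
    return accessed, bank | offset, count
```

Running this with an AAddress of $7EFFFE and a Count of 4 shows the wrap within
the bank: the last two bytes come from $7E0000 and $7E0001, not from bank $7F.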


HDMA
----

HDMA has 4 flags and 5 variables. Again, those marked '*' are required
before starting HDMA. In addition, those marked '+' are required if HDMA is
to be started mid-frame.
* Addressing Mode (bit 6 of $43x0): If clear, Direct, else Indirect.
* Transfer Mode (bits 0-2 of $43x0): See below...
* Port ($43x1): As for DMA.
* AAddress ($43x2-4): Pointer to the HDMA Table. Not really 'required' for
  starting mid-frame, but unless you're going to stop it before the next
  init...
- Indirect Address ($43x5-6): Used with Indirect Bank. See below...
* Indirect Bank ($43x7): Used with Indirect Address. See below...
+ Address ($43x8-9): See below...
+ Repeat (bit 7 of $43xA): Whether to write every scanline or not
+ Line Counter (bits 0-6 of $43xA): See below...
- DoTransfer: Used internally.

Modes are the same as for DMA. However, note that only one cycle through the
mode is done per scanline, so One Register Write Once will write 1 byte per
scanline, while One Register Write Twice will write two.

For each scanline during which HDMA is active (i.e. at least one channel is not
paused and has not terminated yet for the frame), there are ~18 master cycles
overhead. Each active channel incurs another 8 master cycles overhead (during
which time $43xA is presumably loaded if necessary) for every scanline, whether
or not a transfer actually occurs. If a new indirect address is required, 16
master cycles are taken to load it. Then 8 cycles per byte transferred are
used. Thus, HDMA takes a maximum of 466 master cycles per scanline (if all 8
channels are active, require an indirect address load, and transfer 4 bytes).
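
The 466 figure can be checked with a few lines of Python. This is just the
arithmetic from the paragraph above; the function name and the tuple layout
are assumptions made for the example.

```python
def hdma_scanline_cycles(channels):
    """Approximate master cycles HDMA uses on one scanline.

    `channels` has one (bytes_transferred, loads_indirect_address) tuple
    per active channel.
    """
    if not channels:
        return 0                       # no active channel, no overhead
    total = 18                         # per-scanline overhead (approximate)
    for nbytes, indirect_load in channels:
        total += 8                     # per-channel overhead
        if indirect_load:
            total += 16                # new indirect address
        total += 8 * nbytes            # the bytes themselves
    return total
```

With all 8 channels transferring 4 bytes and loading an indirect address, this
gives 18 + 8 * (8 + 16 + 32) = 466.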

The basic process has two sections. First, at the beginning of the frame (V=0
H=approx 6), for all active HDMA channels (see register $420c):
 1. Copy AAddress into Address.
 2. Load $43xA (Line Counter and Repeat) from the table. I believe $00 will
    terminate this channel immediately.
 3. Load Indirect Address, if necessary.
 4. Set DoTransfer to true.

The CPU is paused during this time. Overhead is ~18 master cycles, plus 8
master cycles for each channel set for direct HDMA and 24 master cycles for
each channel set for indirect HDMA.

If you are starting HDMA mid-frame, you must basically do the init process
manually by setting $43x8-A, and $43x5-6 for indirect channels. Note though
that there is no way to perform step 4, so no transfer will be done the first
transfer period. Also, note that a channel that has already terminated for the
frame cannot be restarted.
XXX: Or does it automatically do Step 4 when you enable the channel?

Then, for each scanline from V=0 to V=$e0 (or V=$ef if overscan is enabled) at
about H=$116:
 1. If DoTransfer is false, skip to step 3.
 2. For the number of bytes (1, 2, or 4) required for this Transfer Mode...
    a. Read a byte from Address or Indirect Address, and increment.
    b. Write the byte to Port, Port+1, Port+2, or Port+3, depending on the
       Transfer Mode and which byte we're on.
    - The same notes regarding DMA from PPU to PPU or RAM to RAM via $2180
      apply here as well.
 3. Decrement $43xA.
 4. Set DoTransfer to the value of Repeat.
 5. If Line Counter is zero...
    a. Read the next byte from Address into $43xA (thus, into both Line
       Counter and Repeat).
    b. If Addressing Mode is Indirect, read two bytes from Address into
       Indirect Address (and increment Address by two bytes).
       - One oddity: if $43xA is 0 and this is the last active HDMA channel for
         this scanline, only load one byte for Address, and use the
         $00 for the low byte. So Address ends up incremented one less than
         otherwise expected, and one less CPU Cycle is used.
    c. If $43xA is zero, terminate this HDMA channel for this frame. The bit in
       $420c is not cleared, though, so it may be automatically restarted next
       frame.
    d. Set DoTransfer to true.
 6. Continue with Step 1 next scanline.
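
Here is a Python sketch of the init and per-scanline steps for a single Direct
HDMA channel. It is a simplified model under several assumptions: Indirect
mode and the last-channel oddity are left out, Address is kept as an index
into the table rather than a real bus address, and the table must end with a
$00 entry. The offset patterns are my reading of the $43x0 entry, and all
names are invented for the example.

```python
MODE_OFFSETS = {0: (0,), 1: (0, 1), 2: (0, 0), 3: (0, 0, 1, 1),
                4: (0, 1, 2, 3), 5: (0, 1, 0, 1)}


def run_direct_hdma(table, mode, port, scanlines=225):
    """Simulate one Direct HDMA channel for a frame (V=0 to V=$e0).

    Returns a list of (scanline, register, value) writes.
    """
    writes = []
    addr = 0                    # Address ($43x8-9), as an index into table
    line_ctl = table[addr]      # $43xA: Repeat (bit 7) + Line Counter
    addr += 1
    if line_ctl == 0:
        return writes           # terminated during init
    do_transfer = True
    for v in range(scanlines):
        if do_transfer:                                # steps 1-2
            for off in MODE_OFFSETS[mode]:
                reg = 0x2100 | ((port + off) & 0xFF)
                writes.append((v, reg, table[addr]))
                addr += 1
        line_ctl = (line_ctl - 1) & 0xFF               # step 3
        do_transfer = bool(line_ctl & 0x80)            # step 4
        if line_ctl & 0x7F == 0:                       # step 5
            line_ctl = table[addr]                     # 5a
            addr += 1
            if line_ctl == 0:                          # 5c
                break
            do_transfer = True                         # 5d
    return writes
```

Feeding it the table $02 $AA $82 $11 $22 $00 in mode 0 gives one write on
scanline 0, nothing on scanline 1, then one write each on scanlines 2 and 3.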

HDMA does not occur during V-Blank, as any writes it might perform are
likely to have no visible effect anyway. The start-of-frame processing then
resets all active channels at the end of V-Blank. This allows updating of the
HDMA registers during V-Blank without worrying about the transfer beginning
immediately and scribbling on the PPU state.

Note how the above implicitly defines the format of the HDMA table.
Explicitly, the format is a series of entries. Each entry begins with a line
count and repeat flag. If repeat is false, there is one scanline worth of
data following and the count is the number of scanlines to wait before
processing the next entry. If it's true, the line count is the number of
scanlines worth of data following. The data following is either a pointer to
the data (for Indirect HDMA), or the data itself (for Direct HDMA).
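
As a sketch of that format, the Python helper below builds entries for a
Direct HDMA table. The name direct_entry is made up, and limiting the line
count to 1-127 is a conservative choice of this example, not a statement
about the hardware.

```python
def direct_entry(lines, data, repeat=False):
    """Build one Direct HDMA table entry as bytes.

    repeat=False: `data` is one scanline's worth of bytes, and `lines`
                  scanlines pass before the next entry is processed.
    repeat=True:  `data` holds `lines` scanlines' worth of bytes.
    """
    if not 1 <= lines <= 127:
        raise ValueError("line count must be 1-127 in this sketch")
    return bytes([(0x80 if repeat else 0x00) | lines]) + bytes(data)


# One byte held for 2 scanlines, then 2 scanlines of fresh data, then the
# $00 line count that terminates the channel for the frame.
table = (direct_entry(2, [0xAA])
         + direct_entry(2, [0x11, 0x22], repeat=True)
         + b"\x00")
```

The resulting table is $02 $AA $82 $11 $22 $00. The caller is responsible for
making `data` the right length for the Transfer Mode in use.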

Looking at the above, it's clear why Address and Repeat/Line Counter must
be initialized by hand when starting HDMA mid-frame: they're only
automatically initialized at the start of the frame. Note how AAddress is
not affected by HDMA, though Address and Repeat/Line Counter are.


HISTORY
=======

In the beginning... Well, ok, somewhere in the middle is where I came in. In
my beginning, there was Yoshi's register doc (the one with bananas) and
snesmap.txt (the one missing all those appendices). Both good register docs,
complete as far as bits go, but sorely lacking in the explanations. And there
was the snes9x source, with more clear semantics but many errors. I began
writing test ROMs for others to run, to discover the real behavior of the
SNES. Soon, someone lent me a device to test the ROMs myself.

I discovered many things, and lamented the errors in the documentation
available. Finally one day I decided to sit down and write out everything I
had discovered. An early version of this was the result. Along the way, and
continuing since, I've revised this document with new findings. I believe this
is the most accurate SNES PPU/graphics document available today. Enjoy!