Alive 6

3.) The Blitter
(The Blitter in the STE is, at least from the programmer's view,
 identical to the Blitter in the Mega ST. Hardware-wise, it is


  Halftone RAM:
    $FFFF8A00  Halftone RAM, Word 0    (16 Words in total)
    $FFFF8A1E  Halftone RAM, Word 15

  Halftone RAM is a fast 32 Byte Blitter-exclusive RAM
  that can be used for lightning-quick manipulations of copied
  data. Its main purpose was to combine monochrome picture data
  with (16 x 16 pixel) patterns, usually to make them a bit
  darker (halftone).

  Source X Increment Register:
    $FFFF8A20  X X X X X X X X X X X X X X X 0
  Source Y Increment Register:
    $FFFF8A22  X X X X X X X X X X X X X X X X

    These registers encode by how many bytes the Blitter advances the
    source address after each copied word ($FFFF8A20) or after each line
    ($FFFF8A22). The increments have to be even since the Blitter only
    works on a Word-basis and cannot access single Bytes.

  Source Address Register:
    $FFFF8A24  (32 Bit, occupies $FFFF8A24 to $FFFF8A27)

    The 32-Bit address of the source, meaning the Blitter will start
    reading from this address. This address has to be even as the
    Blitter cannot access single Bytes. The Blitter actually accepts
    real 32-Bit addresses, but the MMU filters the upper 10 bits

  Endmask Registers
    $FFFF8A28  X X X X X X X X X X X X X X X X  Endmask 1
    $FFFF8A2A  X X X X X X X X X X X X X X X X  Endmask 2
    $FFFF8A2C  X X X X X X X X X X X X X X X X  Endmask 3

    The Endmask is a Bitmask that is applied to the copied data
    word by word. Endmask 1 is applied to the first word copied in
    each row, Endmask 3 to the last word of the row and Endmask 2
    to all words in between. Clever usage of these registers allows
    copies to start at basically any bit in memory.
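
    How a mask lets a copy start at any bit can be sketched in C.
    This is a model for any machine, not code for the ST. It assumes
    that a "1" bit in the Endmask lets the new data through and a "0"
    bit keeps the old destination bit, and that bit 15 is the leftmost
    pixel. The function names are made up for this example.

```c
#include <stdint.h>

/* Merge new data into a destination word: mask bits set to 1 take the
   new data, mask bits set to 0 keep the old destination (assumption). */
uint16_t apply_endmask(uint16_t mask, uint16_t new_data, uint16_t old_dest)
{
    return (uint16_t)((new_data & mask) | (old_dest & ~mask));
}

/* Endmask 1 for a block whose left edge is at pixel x: the x & 15
   leftmost pixels of the first word are protected. */
uint16_t left_edge_mask(unsigned x)
{
    return (uint16_t)(0xFFFFu >> (x & 15u));
}
```

    With a left edge at pixel 4, left_edge_mask gives $0FFF, so the
    four leftmost pixels of the first destination word stay untouched.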

  Destination X Increment Register:
    $FFFF8A2E  X X X X X X X X X X X X X X X X
  Destination Y Increment Register:
    $FFFF8A30  X X X X X X X X X X X X X X X X

    Similar to the Source X/Y Increment Register. These two denote
    how many Bytes after each copied word/line the Blitter proceeds.

  Destination Address Register:
    $FFFF8A32  (32 Bit, occupies $FFFF8A32 to $FFFF8A35)

    This contains the address the Blitter writes all the data to
    that it computes. A real 32-Bit address that has to be even.

  X Count Register:
    $FFFF8A36  X X X X X X X X X X X X X X X X
  Y Count Register:
    $FFFF8A38  X X X X X X X X X X X X X X X X

    These two registers contain the information about how the 2D
    bitblocks the Blitter copies are shaped. The X Count Register
    contains how many words (!) a line of this rectangular block
    has, the Y-Count how many lines the bitblock has in total.
    This does not include the skipped words, only those the
    Blitter really copies (hence the name count).

  Blit HOP (Halftone OPeration) Register:
    $FFFF8A3A  0 0 0 0 0 0 X X

    How to combine Halftone-Data and copied data is given here.
    A "00" means all copied bits will be set to "1" (blind copy),
    "01" means ONLY halftone content will be copied, "10" implies
    that ONLY source content will be copied (1:1 copy). "11" makes
    the halftone-pattern work as intended and does a copy
    "Halftone AND source".
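
    The four HOP values can be written down as a small C model of
    what happens to one word. This is only a sketch for any machine;
    the function name blit_hop is made up for this example.

```c
#include <stdint.h>

/* Combine one source word and one halftone word according to the
   2-bit HOP value described above. */
uint16_t blit_hop(unsigned hop, uint16_t source, uint16_t halftone)
{
    switch (hop & 3u) {
    case 0:  return 0xFFFFu;                        /* all bits set to "1" */
    case 1:  return halftone;                       /* halftone only       */
    case 2:  return source;                         /* source only (1:1)   */
    default: return (uint16_t)(source & halftone);  /* halftone AND source */
    }
}
```

    The result of this step is what the logical operation (OP) then
    combines with the target.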

  Blit OP (logical OPeration) Register:
    $FFFF8A3B  0 0 0 0 X X X X

    The Blitter can carry out logical operations on source and
    target at no extra cost (0 cycles). The table of possible
    values follows:
    0 0 0 0    - Target will be zeroed out (blind copy)
    0 0 0 1    - Source AND Target         (inverse copy)
    0 0 1 0    - Source AND NOT Target     (mask copy)
    0 0 1 1    - Source only               (replace copy)
    0 1 0 0    - NOT Source AND Target     (mask copy)
    0 1 0 1    - Target unchanged          (null copy)
    0 1 1 0    - Source XOR Target         (xor copy)
    0 1 1 1    - Source OR Target          (combine copy)
    1 0 0 0    - NOT Source AND NOT Target (complex mask copy)
    1 0 0 1    - NOT Source XOR Target     (complex combine copy)
    1 0 1 0    - NOT Target                (reverse, no copy)
    1 0 1 1    - Source OR NOT Target      (mask copy)
    1 1 0 0    - NOT Source                (reverse direct copy)
    1 1 0 1    - NOT Source OR Target      (reverse combine)
    1 1 1 0    - NOT Source OR NOT Target  (complex reverse copy)
    1 1 1 1    - Target is set to "1"      (blind copy)
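
    The table has a regular structure: each of the 4 bits switches on
    one combination of source and target. A small C model (a sketch
    for any machine; the name blit_op is made up for this example)
    reproduces all 16 rows this way:

```c
#include <stdint.h>

/* Apply the 4-bit OP value to a source word s and a target word d.
   Each OP bit enables one term; the enabled terms are OR-ed together. */
uint16_t blit_op(unsigned op, uint16_t s, uint16_t d)
{
    uint16_t r = 0;
    if (op & 1u) r |= (uint16_t)( s &  d);   /* bit 0: Source AND Target         */
    if (op & 2u) r |= (uint16_t)( s & ~d);   /* bit 1: Source AND NOT Target     */
    if (op & 4u) r |= (uint16_t)(~s &  d);   /* bit 2: NOT Source AND Target     */
    if (op & 8u) r |= (uint16_t)(~s & ~d);   /* bit 3: NOT Source AND NOT Target */
    return r;
}
```

    For example, OP 0110 enables bits 1 and 2, which together give
    Source XOR Target, and OP 0011 enables bits 0 and 1, which leaves
    the Source only.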

  Blitter Control Register:
    $FFFF8A3C  X X X 0 X X X X

    This register serves multiple purposes.
    The lowest 4 bits represent the number of the line in the
    Halftone pattern to use on all blits of this line.
    The upper 3 bits feature extended options of the Blitter.
    Bit 5  -  Smudge-mode
              Which line of the halftone pattern to be used is read
              from the lowest 4 bits of the source buffer when the
              copy starts
    Bit 6  -  Blit-Mode Register
              Decides whether to copy in BLIT Mode (0) or in
              HOG Mode (1). In Blit Mode (also known as cooperative),
              CPU and Blitter get 64 clockcycles in turns, in Hog
              Mode, the Blitter reserves and hogs the bus for as long
              as the copy takes, CPU and DMA get no Bus access.
    Bit 7  -  Busy Bit
              Turns on the Blitter activity and stays "1" until the
              copy is finished
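
    Putting the byte for $FFFF8A3C together could look like this in
    C. It is only a sketch of the bit layout given above; the function
    and constant names are made up for this example.

```c
#include <stdint.h>

enum {
    BLIT_SMUDGE = 0x20,   /* Bit 5 */
    BLIT_HOG    = 0x40,   /* Bit 6 */
    BLIT_BUSY   = 0x80    /* Bit 7 */
};

/* Build the value for the Blitter Control Register. */
uint8_t blit_control(int busy, int hog, int smudge, unsigned halftone_line)
{
    uint8_t v = (uint8_t)(halftone_line & 0x0Fu);  /* lowest 4 bits */
    if (smudge) v |= BLIT_SMUDGE;
    if (hog)    v |= BLIT_HOG;
    if (busy)   v |= BLIT_BUSY;
    return v;
}
```

    Starting a copy in Blit Mode with halftone line 0 gives $80, the
    same copy in Hog Mode gives $C0.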

  Blitter Skew Register:
    $FFFF8A3D  X X 0 0 X X X X

    The lowest 4 bits of this register allow shifting the data by up
    to 15 bits to the right while copying. The upper 2 bits are
    Bit 6  -  NFSR (No Final Source Read)
    Bit 7  -  FXSR (Force eXtra Source Read).
    NFSR means the last source word of a line is not read anymore.
    This is only sensible with certain Endmask and skew values.
    FXSR is the opposite and forces the Blitter to read one more
    word at the beginning of a line. It, too, is only sensible with
    certain Endmask/Skew combinations.
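
    The byte for $FFFF8A3D can be put together the same way as any
    other bitfield. A sketch in C of the layout given above, with
    made-up names:

```c
#include <stdint.h>

enum {
    BLIT_NFSR = 0x40,   /* Bit 6: No Final Source Read  */
    BLIT_FXSR = 0x80    /* Bit 7: Force eXtra Source Read */
};

/* Build the value for the Blitter Skew Register. */
uint8_t blit_skew(unsigned skew, int nfsr, int fxsr)
{
    uint8_t v = (uint8_t)(skew & 0x0Fu);  /* shift to the right, 0..15 */
    if (nfsr) v |= BLIT_NFSR;
    if (fxsr) v |= BLIT_FXSR;
    return v;
}
```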

So much for the theory. Unfortunately, the Blitter is a lovely but
also pretty stubborn little chip. What went wrong this time ?

? After feeding the Blitter values and activating it, the STE
  totally crashes.
! All the address-related auxiliary registers such as X-Count/Y-Count,
  X/Y-Increments etc. are signed values. In other words, the Blitter
  can go backwards in memory as well as forward. Please check if your
  values are correct.

? I am trying a simple and direct copy and set all the important
  registers, but it does not work as I planned.
! The Blitter is a chip and not software, meaning it does not know
  any default values. Especially when starting to learn "Blitter" it
  is important to ALWAYS set EVERY Register correctly.
  Especially Endmask, Smudge, Skew and OP-Register can lead to very
  funny results if not set correctly. So set ALL the registers at
  least once, for all subsequent copies you do not need to set them
  ALL anymore. Registers modified by a copy are Source and Target
  addresses and the X- and Y-count registers. If you are subsequently
  copying blocks of same size and shape, you will only have to reinit
  these registers.

? The copy appears at the right spot, but is scrambled.
! Make sure your X/Y-Increments are correct for both Source and
  Destination. Especially if you are copying a "tight" block
  (like a 32x32 pixel compact block) to a larger area (like the
  screen) you definitely need to watch the increment registers.
! Also note that after the last word of a line has been copied,
  the Blitter does NOT add the X-increment but only the
  Y-increment. A sensible Y-increment is therefore usually at
  least as large as the X-increment plus the rest of the offset.
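
  The rule above can be turned into a little arithmetic. The C
  sketch below is a model for any machine and the function names are
  made up. The example numbers assume ST Lowres, where a screen line
  is 160 bytes long and two words of the same bitplane are 8 bytes
  apart.

```c
/* Y-increment needed so that two consecutive lines start `pitch`
   bytes apart. The X-increment is added only between the words of a
   line, i.e. x_count - 1 times; the Y-increment covers the rest. */
int blit_y_inc(int pitch, int x_count, int x_inc)
{
    return pitch - (x_count - 1) * x_inc;
}

/* Byte offset (relative to the start address) of word x in line y,
   following the same stepping rule. */
long blit_offset(int x, int y, int x_count, int x_inc, int y_inc)
{
    long line_step = (long)(x_count - 1) * x_inc + y_inc;
    return (long)y * line_step + (long)x * x_inc;
}
```

  For a 32x32 pixel block in one bitplane (2 words per line) this
  gives X-increment 2 and Y-increment 2 for the "tight" source, but
  X-increment 8 and Y-increment 152 for the screen.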

? Now the first copy works, but even though I am copying blocks of
  identical size, just setting addresses does not work.
! No, the Blitter uses a few of the registers accessible by the CPU
  for its own counting. Set Addresses, X and Y-Count Registers
  every time you do a copy in any case. If the shape of the blocks
  you copy changes, also change X- and Y-Source/Destination

? So I set all the registers, but the copies are incomplete when
  I do multiple copies.
! Before feeding the Blitter new values, make sure it has finished
  its task already by checking the Busy-Bit.
  Do not write new values into the Blitter's registers as long as
  it is still operating.

? It looks like the copy itself works, but it flickers. And I was
  using the Blitter to speed things up, not to make them flicker.
! After feeding the Blitter all the values and activating it, the
  CPU is done and can do other tasks, the Blitter however has just
  started. If the Blitter does critical things in your program make
  sure the "Blit Busy" has returned from "1" to "0" before your CPU
  proceeds when using the Blitter in Blit-mode.

? To make it even faster, I turned the Blitter into Hog-mode.
  But now my program behaves oddly and crashes sometimes at
! The Hog-Mode of the Blitter does not allow the CPU to access the
  bus while the Blitter is active - Not even for interrupts. Make
  sure that your software does not require the CPU to react to
  an interrupt immediately - Otherwise, the STE will crash.
  This might turn out especially ugly when using interrupts that
  are critically timed, for example for screen swapping, music
  driving or maybe even Module-replay. Never ever try to use the
  Blitter in hog mode for larger copies under these conditions.

? Is there a way to make the Blitter faster in Blit-mode ?
! Yes, there is. Atari used this to speed up the Blitter in GEM
  without taking the risks of Hog-mode:
  Check the Busy-Bit. The CPU cannot access the bus, and therefore
  not the Busy-Bit, while the Blitter is "active". Once the CPU can
  finally check the Busy-Bit, the Blitter has "paused" and will wait
  for 64 clockcycles. Now if the Busy-Bit is 0, the Blitter is done
  and you can leave. If not, set it to "1" manually and do a NOP.
  Writing the Busy-Bit will relaunch the Blitter immediately,
  but the Blitter needs a few clockcycles to reserve the bus
  (around 7), so the NOP is carried out in any case.
  This gives about 90% of the speed of the HOG-mode without losing
  the option to execute interrupts after the next 64 clockcycles.
  Here's an extract from the ST Profibook:
     Loop:    bset.b #7,$FFFF8A3C  ;test and set Busy-Bit
              nop                  ;do a NOP in any case
              bne.s Loop           ;if Busy-Bit was "1", go to Loop
! For copying little blocks (like 16x16 pixels), it is usually
  sufficient to restart the Blitter just once by using a bset.b #7
  instruction. This will save a few buscycles for the CPU. Some
  experiments are recommended.

? Huh ? My program does not work on the TT ?
! No, it does not. The TT does not have a Blitter.

? I am disappointed by the Blitter speed for the way I am using it.
  When is it sensible to use the Blitter at all ?
! In fact, the Blitter does not reveal its true potential on small
  blocks. If you are copying let's say 32x16 pixel blocks in 1 or 2
  bitplanes (64 or 128 bytes), the Blitter will not outspeed the
  68000 of the STE in a direct copy and since preshifted blocks of
  that size do not cost a lot of memory, it is also no problem to
  store preshifted blocks of that size. Therefore it is not really
  sensible to use the Blitter on anything smaller than that.
  However, the larger the blocks are you are copying, the more sense
  it will make to rely on the Blitter.

? I am coding the Blitter on the Falcon to reduce CPU usage a bit
  but the program has slowed down even more.
! Unfortunately, the Falcon Blitter is rather useless since the
  68030 is, when doing a simple 1:1 copy, about a factor of 4 to 5
  faster than the Blitter in the Falcon is, even though the Falcon
  Blitter is running at 16 MHz.
  On the Falcon, the Blitter can become useful if you plan to
  heavily use Halftone-pattern, bitwise-shifts and logical operations.
  Otherwise, use the CPU instead.

? I was trying to use the shift-operations of the Blitter to have
  my objects on screen (ST Lowres) move pixelwise, but instead,
  Bitplanes are being screwed up.
! Please bear in mind the interleaved bitplane structure of the
  ST Low resolution. Trying to copy and shift all bitplanes at
  once will make the Blitter shift single bits from bitplane X
  to bitplane Y. Copy bitplane by bitplane and it will work.
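
  Where the word of one bitplane sits, and how far to shift, can be
  computed like this. The C sketch is a model for any machine with
  made-up function names. It assumes the usual ST Lowres layout:
  160 bytes per screen line, groups of 4 words (one per bitplane)
  for every 16 pixels.

```c
/* Byte offset from the screen start of the word that holds pixel
   (x, y) in the given bitplane (0..3) in ST Lowres. */
long st_low_offset(int x, int y, int plane)
{
    return (long)y * 160      /* 160 bytes per line              */
         + (long)(x / 16) * 8 /* 8 bytes per group of 16 pixels  */
         + plane * 2;         /* 2 bytes per bitplane in a group */
}

/* Skew value that moves a word-aligned block to pixel x. */
unsigned st_low_skew(int x)
{
    return (unsigned)x & 15u;
}
```

  The destination address for plane 0 to 3 differs by 2 bytes each,
  the skew value and the increments are the same for all four copies.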

? Trying to shift a 16x16 pixel block in one bitplane to the right
  does not work. Why ?
! The Blitter will always do a copy, meaning, it will always read
  a word to write a word. If you have a 16x16 pixel block you want
  to shift to the right by one pixel (=bit), the Blitter will need
  to write 2 words to screen, the first word will have a zero
  shifted in to the very left, and the second word will contain the
  rightmost bit of the first word when it was unshifted. To write
  this word however, the Blitter will also read a word, meaning, the
  next line of your 16x16 pixel block.
  The easiest solution to this problem is to use a 32x16 pixel block
  instead and copy 2 words each line.
? Can't I copy 2 words, but use "No-Final-Source-Read" on the second
  word each line ?
! Unfortunately not. The flag "No-Final-Source-Read" will mean that
  the Blitter does absolutely no source operations, meaning, it will
  neither skew nor clear the source buffer. This way, the word
  previously written to the screen will be written again.
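
  Why the 32x16 pixel block with an empty second word works can be
  seen in a small C model of the shift. This is a simplified sketch
  for any machine, not the real chip: it assumes a 32-bit buffer
  that takes in one source word per written word and starts each
  line empty. The name skew_line is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift one line of n source words to the right by skew bits (0..15).
   One word is read for every word written, as the Blitter does. */
void skew_line(const uint16_t *src, uint16_t *dst, size_t n, unsigned skew)
{
    uint32_t buffer = 0;  /* previous word in the upper half, new word below */
    for (size_t i = 0; i < n; i++) {
        buffer = (buffer << 16) | src[i];
        dst[i] = (uint16_t)(buffer >> (skew & 15u));
    }
}
```

  With the line $8001, $0000 and a skew of 1, the first word written
  is $4000 and the second is $8000: the rightmost bit of the first
  word arrives in the second word, and the empty second source word
  gives the Blitter something harmless to read.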

? So I can do shifts to the right. Can I also do shifts to the
  left ?
! Yes, but it is a bit more complicated since you will have to
  rely on sensible ENDMASK-settings, skew values, the FXSR-switch
  and in some cases even the NFSR-flag. Then copy from the right
  to the left.

? When copying less than 3 words, in what way are the ENDMASKs
  used ?
! If copying just one word, only ENDMASK1 will be applied.
  Copying 2 words a line involves ENDMASK1 on the first and
  ENDMASK3 on the second and therefore last word in each line.
  Copying 3 words and more will mean that ENDMASK1 is applied on
  the first word of each line, ENDMASK3 on the last word and
  ENDMASK2 on each word in between.
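
  The same rule as a few lines of C. This is only a model with a
  made-up function name; word_index counts the words of a line from
  0, x_count is the value of the X Count Register.

```c
#include <stdint.h>

/* Pick the Endmask that applies to word number word_index of a line
   that is x_count words long. */
uint16_t select_endmask(unsigned word_index, unsigned x_count,
                        uint16_t endmask1, uint16_t endmask2,
                        uint16_t endmask3)
{
    if (word_index == 0)
        return endmask1;          /* first word, also if it is the only one */
    if (word_index == x_count - 1)
        return endmask3;          /* last word of a line with 2 or more words */
    return endmask2;              /* everything in between */
}
```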

? Copying and shifting blocks with the Blitter works now, but
  sometimes, a few bits get lost.
! In some cases, depending on the Endmask- and the Skew-registers,
  the Blitter needs to read one word more than planned. Try
  the FXSR-flag under these conditions.

? I heard somewhere, that the Blitter can be used for generating
  software sprites all by itself. Is that true ?
! Yes, you can have software sprites using the Blitter, that
  can be freely positioned (pixel-perfect) without any other
  interference of the CPU than just feeding values into the
  Blitter registers. However, the Blitter cannot produce a
  4 bitplane software-sprite in 1 go.
  The simplest and most convenient way is to generate a 1 bitplane
  mask for all sprites you are going to use. This does not mean to
  preshift them, but to generate the mask for all bitplanes. This
  can easily be done by either CPU or Blitter by logically
  or-combining all 4 bitplanes. Now for software sprites, you use
  the Blitter to shift and logically combine NOT Mask AND screen
  content for all 4 bitplanes, then to copy Sprite OR screen
  for all 4 bitplanes.
  There are ways of doing this faster, but this is very easy to
  program and yet pretty quick, especially for large sprites.
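
  What the two passes do to the screen can be modelled in C for one
  group of 16 pixels. This is a sketch for any machine with made-up
  function names; shifting is left out and the two passes are done
  plane by plane, which gives the same result.

```c
#include <stdint.h>

/* 1 bitplane mask: logical OR of all 4 bitplanes of the sprite. */
uint16_t sprite_mask(const uint16_t sprite[4])
{
    return (uint16_t)(sprite[0] | sprite[1] | sprite[2] | sprite[3]);
}

/* Draw one group of 16 sprite pixels onto 4 screen bitplanes. */
void draw_sprite_word(uint16_t screen[4], const uint16_t sprite[4])
{
    uint16_t mask = sprite_mask(sprite);
    for (int p = 0; p < 4; p++) {
        /* Pass 1, mask as source: NOT Source AND Target (OP 0100) */
        screen[p] = (uint16_t)(~mask & screen[p]);
        /* Pass 2, sprite as source: Source OR Target (OP 0111) */
        screen[p] = (uint16_t)(sprite[p] | screen[p]);
    }
}
```

  The first pass cuts a hole of the sprite's shape into all four
  bitplanes, the second one fills the hole with the sprite's colours.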

? I program the Falcon in true-colour mode and I would like to
  take advantage of the Blitter.
! Even though of course the Blitter works well in TC-mode, its
  special features, bitwise shifts, extremely fast logical
  operations, masks for bitwise copy and the halftone pattern,
  are basically useless and for a direct copy, the CPU is a lot

? I do not understand the sense of the Halftone-pattern and the
  smudge register ?
! These registers are not being used very regularly and it can be
  assumed that they have been implemented mainly for compatibility
  purposes since the "BitBLT"-algorithm is well defined.
  However, for monochrome patterns, the Halftone-pattern can be
  used for easily applying fill patterns on blocks or for scaling
  the brightness of blocks.
  The smudge register was intended for introducing some kind of a
  random function of the Blitter without involving any math. It
  can be used also for applying a certain line of the halftone-RAM
  to one whole line of the bitblock by putting a value which line
  of the halftone-RAM to use on the beginning of each line of your
  bitblock, but this is already advanced stuff and will not be
  discussed any further.
