3.) The Blitter
---------------

(From the programmer's point of view, the Blitter in the STE is identical to the Blitter in the Mega ST. Hardware-wise, it is not.)

Registers:

Halftone RAM:
$FFFF8A00  Halftone RAM, Word 0 (16 Words in total)
...
$FFFF8A1E  Halftone RAM, Word 15

Halftone RAM is a fast 32 Byte Blitter-exclusive RAM that can be used for lightning-quick manipulations of copied data. Its main purpose was to combine monochrome picture data with (16 x 16 pixel) patterns, usually to make them a bit darker (halftone).

Source X Increment Register:
$FFFF8A20  X X X X X X X X X X X X X X X 0
Source Y Increment Register:
$FFFF8A22  X X X X X X X X X X X X X X X X

These registers encode by how many bytes the Blitter advances the source address after each copied word ($FFFF8A20) or after each line ($FFFF8A22). Source Y Inc has to be even since the Blitter only works on a Word-basis and cannot access single Bytes.

Source Address Register:
$FFFF8A24  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXX0

The 32-Bit address of the source, meaning the Blitter will start reading from this address. This address has to be even as the Blitter cannot access single Bytes. The Blitter actually accepts real 32-Bit addresses, but the MMU filters the upper 10 bits out.

Endmask Registers:
$FFFF8A28  X X X X X X X X X X X X X X X X  Endmask 1
$FFFF8A2A  X X X X X X X X X X X X X X X X  Endmask 2
$FFFF8A2C  X X X X X X X X X X X X X X X X  Endmask 3

The Endmask is a bitmask that is applied to the copied data word by word. Endmask 1 is applied to the first word copied in each row, Endmask 3 to the last word of the row, and Endmask 2 to all words in between. Clever use of these registers allows copies to start at basically any bit in memory.

Destination X Increment Register:
$FFFF8A2E  X X X X X X X X X X X X X X X X
Destination Y Increment Register:
$FFFF8A30  X X X X X X X X X X X X X X X X

Similar to the Source X/Y Increment Register.
These two denote by how many bytes the Blitter advances the destination address after each copied word/line.

Destination Address Register:
$FFFF8A32  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXX0

This contains the address to which the Blitter writes the data it computes. A real 32-Bit value that has to be even.

X Count Register:
$FFFF8A36  X X X X X X X X X X X X X X X X
Y Count Register:
$FFFF8A38  X X X X X X X X X X X X X X X X

These two registers describe the shape of the 2D bit block the Blitter copies. The X Count Register contains how many words (!) a line of this rectangular block has, the Y Count Register how many lines the block has in total. This does not include the skipped words, only those the Blitter really copies (hence the name count).

Blit HOP (Halftone OPeration) Register:
$FFFF8A3A  0 0 0 0 0 0 X X

This register selects how Halftone data and source data are combined:
  0 0 - All bits are set to "1" (blind copy)
  0 1 - ONLY halftone content is copied
  1 0 - ONLY source content is copied (1:1 copy)
  1 1 - The halftone pattern works as intended: "Halftone AND source"

Blit OP (logical OPeration) Register:
$FFFF8A3B  0 0 0 0 X X X X

The Blitter can combine source and target with a logical operation at no extra cycle cost.
The table of possible values follows:
  0 0 0 0 - Target will be zeroed out (blind copy)
  0 0 0 1 - Source AND Target (inverse copy)
  0 0 1 0 - Source AND NOT Target (mask copy)
  0 0 1 1 - Source only (replace copy)
  0 1 0 0 - NOT Source AND Target (mask copy)
  0 1 0 1 - Target unchanged (null copy)
  0 1 1 0 - Source XOR Target (xor copy)
  0 1 1 1 - Source OR Target (combine copy)
  1 0 0 0 - NOT Source AND NOT Target (complex mask copy)
  1 0 0 1 - NOT Source XOR Target (complex combine copy)
  1 0 1 0 - NOT Target (reverse, no copy)
  1 0 1 1 - Source OR NOT Target (mask copy)
  1 1 0 0 - NOT Source (reverse direct copy)
  1 1 0 1 - NOT Source OR Target (reverse combine)
  1 1 1 0 - NOT Source OR NOT Target (complex reverse copy)
  1 1 1 1 - Target is set to "1" (blind copy)

Blitter Control Register:
$FFFF8A3C  X X X 0 X X X X

This register serves multiple purposes. The lowest 4 bits select the line of the Halftone pattern to use for all words of the current line. The upper 3 bits control extended options of the Blitter.

Bit 5 - Smudge-mode
  The line of the halftone pattern to use is read from the lowest 4 bits of the source buffer when the copy starts.
Bit 6 - Blit-Mode
  Decides whether to copy in BLIT Mode (0) or in HOG Mode (1). In Blit Mode (also known as cooperative), CPU and Blitter get 64 clockcycles in turns. In Hog Mode, the Blitter reserves and hogs the bus for as long as the copy takes; CPU and DMA get no bus access.
Bit 7 - Busy Bit
  Turns on the Blitter activity and stays "1" until the copy is finished.

Blitter Skew Register:
$FFFF8A3D  X X 0 0 X X X X

The lowest 4 bits of this register allow the data to be shifted to the right by up to 15 bits while copying. The upper 2 bits are:
Bit 6 - NFSR (No Final Source Read)
Bit 7 - FXSR (Force eXtra Source Read)

NFSR means the last word of the source line is not read anymore. This is only sensible with certain Endmask and skew values. FXSR is the opposite and forces the Blitter to read one more word at the beginning of a line.
Also only sensible with certain Endmask/Skew combinations.

So much for the theory. Unfortunately, the Blitter is a lovely but also pretty stubborn little chip.

What went wrong this time ?

? After feeding the Blitter values and activating it, the STE totally crashes.
! The address-related auxiliary registers, namely the Source and Destination X/Y Increments, are signed values. In other words, the Blitter can go backwards in memory as well as forwards. The X and Y Count registers, in contrast, are unsigned. Please check that your values are correct.

? I am trying a simple and direct copy and set all the important registers, but it does not work as I planned.
! The Blitter is a chip and not software, meaning it does not know any default values. Especially when starting to learn the Blitter, it is important to ALWAYS set EVERY register correctly. The Endmask, Smudge, Skew and OP registers in particular can lead to very funny results if not set correctly. So set ALL the registers at least once; for subsequent copies you do not need to set them all again. Registers modified by a copy are the Source and Target addresses and the X and Y Count registers. If you are subsequently copying blocks of the same size and shape, you only have to reinitialise these registers.

? The copy appears at the right spot, but is scrambled.
! Make sure your X/Y Increments are correct for both Source and Destination. Especially if you are copying a "tight" block (like a compact 32x32 pixel block) to a larger area (like the screen), you definitely need to watch the increment registers.
! Also note that after the last word of a line has been copied, the Blitter does NOT add the X-increment but only the Y-increment. A sensible Y-increment is therefore usually at least as large as the X-increment plus the rest of the offset.

? Now the first copy works, but even though I am copying blocks of identical size, just setting addresses does not work.
! No, the Blitter uses a few of the registers accessible by the CPU for its own counting.
Set the Addresses and the X and Y Count registers every time you do a copy in any case. If the shape of the blocks you copy changes, also change the X and Y Source/Destination Increments.

? So I set all the registers, but the copies are incomplete when I do multiple copies.
! Before feeding the Blitter new values, make sure it has already finished its task by checking the Busy-Bit. Do not write new values into the Blitter's registers as long as it is still operating.

? It looks like the copy itself works, but it flickers. And I was using the Blitter to speed things up, not to make them flicker.
! After feeding the Blitter all the values and activating it, the CPU is done and can do other tasks; the Blitter, however, has just started. If the Blitter does critical things in your program, make sure the "Blit Busy" bit has returned from "1" to "0" before your CPU proceeds when using the Blitter in Blit-mode.

? To make it even faster, I turned the Blitter into Hog-mode. But now my program behaves oddly and crashes sometimes at random.
! The Hog-Mode of the Blitter does not allow the CPU to access the bus while the Blitter is active, not even for interrupts. Make sure that your software does not require the CPU to react to an interrupt immediately. Otherwise, the STE will crash. This might turn out especially ugly when using interrupts that are critically timed, for example for screen swapping, music driving or maybe even Module-replay. Never ever try to use the Blitter in Hog-mode for larger copies under these conditions.

? Is there a way to make the Blitter faster in Blit-mode ?
! Yes, there is. Atari used this to speed up the Blitter in GEM without risking Hog-mode: Check the Busy-Bit. The CPU cannot access the bus, and therefore not the Busy-Bit, while the Blitter is "active". If the CPU can finally check the Busy-Bit, the Blitter has "paused" and will wait for 64 clockcycles. Now if the Busy-Bit is 0, the Blitter is done and you can leave. If not, set it to "1" manually and do a NOP.
Writing the Busy-Bit will relaunch the Blitter immediately, but the Blitter needs a few clockcycles (around 7) to reserve the bus, so the NOP is carried out in any case. This gives about 90% of the speed of Hog-mode without losing the option to execute interrupts after the next 64 clockcycles. Here's an extract from the ST Profibook:

Loop:   bset.b  #7,$FFFF8A3C    ;test and set Busy-Bit (Blitter Control Register)
        nop                     ;do a NOP in any case
        bne.s   Loop            ;if Busy-Bit was "1", go to Loop

! For copying little blocks (like 16x16 pixels), it is usually sufficient to restart the Blitter just once by using a bset.b #7 instruction. This will save a few buscycles for the CPU. Some experiments are recommended.

? Huh ? My program does not work on the TT ?
! No, it does not. The TT does not have a Blitter.

? I am disappointed by the Blitter speed for the way I am using it. When is it sensible to use the Blitter at all ?
! In fact, the Blitter does not reveal its true potential on small blocks. If you are copying, let's say, 32x16 pixel blocks in 1 or 2 bitplanes (64 or 128 bytes), the Blitter will not outspeed the 68000 of the STE in a direct copy, and since preshifted blocks of that size do not cost a lot of memory, storing them preshifted is no problem either. Therefore it is not really sensible to use the Blitter on anything smaller than that. However, the larger the blocks you are copying, the more sense it makes to rely on the Blitter.

? I am coding the Blitter on the Falcon to reduce CPU usage a bit, but the program has slowed down even more.
! Unfortunately, the Falcon Blitter is rather useless: when doing a simple 1:1 copy, the 68030 is about a factor of 4 to 5 faster than the Blitter in the Falcon, even though the Falcon Blitter is running at 16 MHz. On the Falcon, the Blitter can become useful if you plan to make heavy use of the Halftone pattern, bitwise shifts and logical operations. Otherwise, use the CPU instead.

? I was trying to use the shift operations of the Blitter to have my objects on screen (ST Lowres) move pixelwise, but instead, bitplanes are being screwed up.
! Please bear in mind the interleaved bitplane structure of the ST Low resolution. Trying to copy and shift all bitplanes at once will make the Blitter shift single bits from bitplane X to bitplane Y. Copy bitplane by bitplane and it will work.

? Trying to shift a 16x16 pixel block in one bitplane to the right does not work. Why ?
! The Blitter will always do a copy, meaning it will always read a word to write a word. If you have a 16x16 pixel block you want to shift to the right by one pixel (=bit), the Blitter will need to write 2 words to screen: the first word will have a zero shifted in at the very left, and the second word will contain the rightmost bit of the unshifted first word. To write this second word, however, the Blitter will also read a word, namely the first word of the next line of your 16x16 pixel block. The easiest solution to this problem is to use a 32x16 pixel block instead and copy 2 words each line.

? Can't I copy 2 words, but use "No-Final-Source-Read" on the second word each line ?
! Unfortunately not. The flag "No-Final-Source-Read" means that the Blitter does absolutely no source operations for that word, meaning it will neither skew nor clear the source buffer. This way, the word previously written to the screen will be written again.

? So I can do shifts to the right. Can I also do shifts to the left ?
! Yes, but it is a bit more complicated since you will have to rely on sensible ENDMASK settings, skew values, the FXSR switch and in some cases even the NFSR flag. Then copy from the right to the left.

? When copying less than 3 words, in what way are the ENDMASKs used ?
! If copying just one word, only ENDMASK1 will be applied. Copying 2 words a line involves ENDMASK1 on the first and ENDMASK3 on the second and therefore last word in each line.
Copying 3 words and more will mean that ENDMASK1 is applied to the first word of each line, ENDMASK3 to the last word and ENDMASK2 to every word in between.

? Copying and shifting blocks with the Blitter works now, but sometimes, a few bits get lost.
! In some cases, depending on the Endmask and Skew registers, the Blitter needs to read one word more than planned. Try the FXSR flag under these conditions.

? I heard somewhere that the Blitter can be used for generating software sprites all by itself. Is that true ?
! Yes, you can have software sprites using the Blitter that can be freely positioned (pixel-perfect) without any interference of the CPU other than feeding values into the Blitter registers. However, the Blitter cannot produce a 4 bitplane software sprite in 1 go. The simplest and most convenient way is to generate a 1 bitplane mask for all sprites you are going to use. This does not mean preshifting them, but generating one mask that covers all bitplanes. This can easily be done by either CPU or Blitter by logically OR-combining all 4 bitplanes. Now for software sprites, you use the Blitter to shift and logically combine NOT Mask AND screen content for all 4 bitplanes, then to copy Sprite OR screen for all 4 bitplanes. There are ways of doing this faster, but this is very easy to program and yet pretty quick, especially for large sprites.

? I program the Falcon in true-colour mode and I would like to take advantage of the Blitter.
! Even though the Blitter of course works in TC-mode, its special features (bitwise shifts, extremely fast logical operations, masks for bitwise copy and the halftone pattern) are basically useless there, and for a direct copy the CPU is a lot faster.

? I do not understand the sense of the Halftone pattern and the smudge register ?
! These registers are not used very regularly and it can be assumed that they have been implemented mainly for compatibility purposes since the "BitBLT" algorithm is well defined. However, for monochrome patterns, the Halftone pattern can be used for easily applying fill patterns to blocks or for scaling the brightness of blocks. The smudge register was intended for introducing some kind of random function into the Blitter without involving any math. It can also be used for applying a certain line of the halftone RAM to one whole line of the bit block: put the number of the halftone RAM line to use at the beginning of each line of your bit block. But this is already advanced stuff and will not be discussed any further.
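
The Blit OP table and the two-pass sprite recipe from this chapter can be tried out away from the hardware. The following is only a minimal host-side sketch in Python, not Blitter code: it models one 16-bit word of one bitplane and ignores skew, Endmasks and Halftone entirely. The names BLIT_OPS, blit_word, make_mask and draw_sprite_word are made up for this illustration.

```python
# Host-side model of the Blit OP logic, one 16-bit word at a time.
# Assumption: skew, Endmasks and Halftone are NOT modelled here.
WORD = 0xFFFF

# OP value -> operation, transcribed from the Blit OP table
# (s = source word, d = target word).
BLIT_OPS = {
    0x0: lambda s, d: 0,          # Target zeroed out
    0x1: lambda s, d: s & d,      # Source AND Target
    0x2: lambda s, d: s & ~d,     # Source AND NOT Target
    0x3: lambda s, d: s,          # Source only
    0x4: lambda s, d: ~s & d,     # NOT Source AND Target
    0x5: lambda s, d: d,          # Target unchanged
    0x6: lambda s, d: s ^ d,      # Source XOR Target
    0x7: lambda s, d: s | d,      # Source OR Target
    0x8: lambda s, d: ~s & ~d,    # NOT Source AND NOT Target
    0x9: lambda s, d: ~s ^ d,     # NOT Source XOR Target
    0xA: lambda s, d: ~d,         # NOT Target
    0xB: lambda s, d: s | ~d,     # Source OR NOT Target
    0xC: lambda s, d: ~s,         # NOT Source
    0xD: lambda s, d: ~s | d,     # NOT Source OR Target
    0xE: lambda s, d: ~s | ~d,    # NOT Source OR NOT Target
    0xF: lambda s, d: WORD,       # Target set to all "1"
}


def blit_word(op, source, target):
    """Combine one source word with one target word using OP value 0..15."""
    return BLIT_OPS[op](source, target) & WORD


def make_mask(planes):
    """OR-combine the words of all bitplanes into a single mask word."""
    mask = 0
    for plane in planes:
        mask |= plane
    return mask & WORD


def draw_sprite_word(screen, sprite, mask):
    """Two passes for one bitplane word: clear the hole, then OR the sprite in."""
    cleared = blit_word(0x4, mask, screen)    # pass 1: NOT Mask AND screen
    return blit_word(0x7, sprite, cleared)    # pass 2: Sprite OR screen


if __name__ == "__main__":
    planes = [0x0A50, 0x0500, 0x00A0, 0x0000]   # example sprite, 4 bitplanes
    mask = make_mask(planes)
    print(hex(mask), hex(draw_sprite_word(0xF0F0, planes[0], mask)))
```

On the real machine the two passes correspond to two blits per bitplane, the first with the mask as source and OP = 4, the second with the sprite plane as source and OP = 7, both with the same skew value.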