68k Coding
10 Years later
Hello everybody, yes this is indeed an article by Dbug (the
coder from NeXT) and not by D-Bug (the menu group), as some of
you might have expected :). Now that the identity part is clear,
I can start with the core of this article.
In 2005 DHS organised a megademo compo for the 20th birthday of
our beloved Atari ST. I couldn't resist participating in such an
event, especially considering that it had been 10 years since I
last coded anything in 68k assembly. (In 1995, when I got my
first job at Adeline Software, I stopped working on my Falcon
painter "Rembrandt".)
Additional motivation was generated by moving to Norway. Being
Nerve's neighbour it seemed logical to team up for coding with
the only active Norwegian Atari coder :D and so I joined
Creators.
My involvement in this intro actually started one month before
the compo was announced. I was invited to Nerve's place, and
then Frequent joined us for a micro ST party. Listening to YM
music, Nerve started to code some 3D rasterizing stuff, and I
was preparing logos and sprites for the intro part. Well
honestly, I didn't do much, just a half-finished logo and some
lame sprite animations using CrackArt.
Some time later, the 20th birthday compo was announced, and it
seemed like a perfect motivation for finishing the intro. We
decided to divide the work between us. Nerve would focus on the
3D part, while I would work on the scroller. After talking with
a few other people, we got additional help offered by Ptoing and
Stingray. Ptoing supplied some pictures and fonts and Stingray
helped optimize some code. (Tero and Proteque were also
preparing some stuff, but due to time constraints we were not
able to include it before we hit the deadline.)
In the end we managed to meet the deadline (by a mere few
minutes) with a somewhat finished version, and the nice Atari
people lifted us up to 3rd place in the intro compo. It really
amazed me, especially since the intro we delivered was far from
containing all the stuff we had originally planned. There is
actually code (e.g. the axis rotation stuff) that has been
disabled in the "final" version because it makes the scroller
almost unreadable.
Additional material
Normally, if everything worked out fine, you should find the
source code and data of this intro, so you should be able to
assemble the executable yourself using Devpac. Throughout the
article I will refer to some parts of the code, and I hope it is
understandable enough for anybody with a bit of Atari ST
programming experience.
Hardware and Software
Let's get back to the beginning. The first thing I did was to
try to get my real Atari machine up and running. So I unpacked
my Mega STE, plugged it in ... and cried. The hard drive wasn't
recognised anymore; obviously the machine took some damage
during its journey from France to Norway. Even worse, I was
unable to find my VME graphics card, meaning that I would be
obliged to use Devpac in 640x200. Yuck! This made me use my PC
with an emulator for development. I had a "lot of fun" trying to
get a decent STEEM configuration, but after some tweaks here and
there - especially setting some shortcuts to be able to do fast
loads and resets of the virtual machine - I was able to code and
test immediately on the virtual Atari. The final result is far
from being as smooth as what I use when coding Oric demos, but
it's still kind of usable :)
One of my biggest problems was the fact that Devpac is a
seriously buggy application, which easily bombs for no reason at
all. Other problems were created by the fact that STEEM is way
more tolerant than a real Atari STE. This meant that my
fullscreen code, which worked perfectly well in STEEM, wouldn't
work on a real ST. (Since then it has been tested on STF, STE
and MSTE, and worked fine everywhere, so I'm quite happy with
this part.)
Another problem was my lack of documentation. Since all my
Atari/68k related books are still waiting in France, I had to
find a replacement first. Thanks to all the kind Atari-people on
#atari.fr and #atariscne for the tables of clock cycles, and to
Andre, who let me use his M68000 developer book :)
And since we are already talking about problems, let's talk
about problematic tools: XnView sucks! It's very difficult to
use this program if the values of the RGB components and the
order of colours in the palette are important to you. In the end
I had to hack a special Atari ST mode into my own picture
converter program...
For all the data, I used Builder scripts ("Builder" is the name
of a tool I developed for Eden games), allowing me to tweak and
modify values of tables and pre-generated data very easily. Also
having native support for big endian data is a plus :)
Description of the intro
Basically, my intro contains just an introduction text with some
fades. Then it displays a three-bitplane sinus scroller, in left
and right overscan (this works on 8 MHz M68k machines only, of
course) with an eight-colour animated background.
This intro has been tested, and appears to work fine on these
machines:
- STF
- STE
- Mega STE
- 68030 Falcon (accelerated or not)
Unfortunately it does not work very well on 68060 processors,
because I used a lot of movep for the scroller. During coding I
didn't know that this instruction is emulated on these machines
:)
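
For readers who have not met movep before: it stores the bytes
of a register into every other byte of memory, which happens to
match the interleaved bitplane layout of the ST's low resolution
screen. The following is only a small model of that behaviour,
not code from the intro; the function name and the bytearray
standing in for screen memory are made up for illustration.

```python
def movep_l_write(memory, address, value):
    """Model of the 68000 'movep.l Dn,d(An)' store: the four bytes of a
    32-bit value go to address, address+2, address+4 and address+6,
    most significant byte first."""
    for i in range(4):
        memory[address + 2 * i] = (value >> (24 - 8 * i)) & 0xFF


# 8 bytes hold 16 low-resolution pixels as four interleaved plane words.
screen = bytearray(8)
movep_l_write(screen, 0, 0x11223344)
# The high byte of each plane word now holds one byte of the register,
# i.e. 8 pixels were updated in all four planes with one instruction.
```

This is why the instruction is so convenient for 8-pixel wide
scroller columns, and why a machine that has to emulate it in
software suffers.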
Falcon support comes from having two code paths. On an 8 MHz
machine the main loop is sync-coded; on other machines the
rasters are displayed using Timer B.
About the text: yes, it's using 3 bitplanes, and it's fully
masked with the background. The fact that it almost looks like 2
bitplanes is another problem :p In the original design the font
was supposed to use 7 colours, and the 8th colour should have
been a cast shadow. Unfortunately I didn't have enough time to
do that, so in the end I have overly complicated code that could
have been done in a totally different and more efficient way. So
keep in mind when looking at the code that it was supposed to do
more than what it is used for :)
Source Code
This source code has not been commented in any special way; it's
just the way I code, so the comments may be sparse here and
there. I will try to explain the less obvious details.
Macros
I generally make heavy use of the conditional assembly and macro
features in Devpac. Here is a set of useful macros. Their
behaviour generally depends on the values of some equates at the
beginning of the source file.
PAUSE <duration>
This macro generates a delay equal to the given parameter (in
nops). In its simple and fast mode it just generates a bunch of
nops, but since this takes a lot of room there is also a "slow"
version that tries to use slower instructions in order to get
the same overall delay using less memory. I used the same trick
in the STE screen of the Phaleon demo, using traps to generate
various common delay values. These instructions are supposedly
neutral, but in practice they modify the value of D0.
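
To make the idea concrete, here is a rough model of what such a
macro has to decide, written in Python rather than as a Devpac
macro. The instruction table is an assumption chosen for the
example (a nop takes 4 cycles, a long register shift
"lsr.l #n,d0" takes 8+2n cycles); the real macro may well pick
different instructions.

```python
# Hypothetical table of (cost in nops, instruction). Every entry is a
# one-word instruction, so fewer instructions means fewer bytes.
SLOW_INSTRUCTIONS = [
    (6, "lsr.l #8,d0"),   # 8 + 2*8 = 24 cycles
    (5, "lsr.l #6,d0"),   # 8 + 2*6 = 20 cycles
    (4, "lsr.l #4,d0"),   # 8 + 2*4 = 16 cycles
    (3, "lsr.l #2,d0"),   # 8 + 2*2 = 12 cycles
    (1, "nop"),           # 4 cycles
]
COST = {instruction: cost for cost, instruction in SLOW_INSTRUCTIONS}


def pause(nops, compact=True):
    """Return a list of instructions that wait for 'nops' nop slots."""
    if not compact:
        return ["nop"] * nops            # fast to assemble, large in memory
    code = []
    remaining = nops
    for cost, instruction in SLOW_INSTRUCTIONS:
        count, remaining = divmod(remaining, cost)
        code.extend([instruction] * count)
    return code


def total_nops(code):
    """Delay of an instruction list, expressed in nop slots."""
    return sum(COST[instruction] for instruction in code)
```

With this table, pause(20) needs 5 instructions instead of 20,
and like the real macro it trashes D0 along the way.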
STEEMBREAK
Well, self explanatory. If you use that one in the debug build
of STEEM, this will trigger the internal debugger.
COLORHELP <colour>
I use this to set the colour of the border to see how much CPU
time is taken by the various sections of the code.
COLORSWAP
Same kind of usage as the previous one, but this one just
inverts the current colour value.
BOUNDCHECK <value> <min> <max>
This macro performs a range check, and if the value is out of
range it triggers a real error that can be caught by the
debugging code. Particularly practical if you want to assert
that A0 actually points into the screen buffer.
MAKERGB <red> <green> <blue>
Given three components between 0 and 255, generates an STE
compatible 12-bit colour. I used that a lot to generate palettes
easily, especially when working with people who use 24-bit paint
packages and are not able to mentally convert a colour to the
very wicked STE internal representation!
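
The "wicked" part is that, to stay compatible with the ST's
3 bits per component, the STE stores the least significant bit
of each 4-bit component in bit 3 of its nibble. Here is a small
Python sketch of the conversion. It is not the Devpac macro
itself, and it assumes the 8-bit input is simply truncated to
4 bits (the real macro could round instead).

```python
def ste_nibble(component):
    """Convert an 8-bit component (0-255) to the STE hardware nibble."""
    level = component >> 4                     # keep the 4 top bits (0-15)
    return (level >> 1) | ((level & 1) << 3)   # bit 0 moves up to bit 3


def make_rgb(red, green, blue):
    """Build the 12-bit STE palette word from 8-bit components."""
    return (ste_nibble(red) << 8) | (ste_nibble(green) << 4) | ste_nibble(blue)


white = make_rgb(255, 255, 255)    # 0xFFF
darkest_red = make_rgb(16, 0, 0)   # level 1 is stored as nibble 8: 0x800
```

Note how the darkest red ends up as $800 and not $100, which is
exactly the kind of thing nobody wants to compute by hand.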
Code
Just after the macro definitions you will find the code that
simply calls the XBIOS function to enter supervisor mode. There
is nothing special here. It begins by clearing the BSS (I'm used
to doing that, because some depackers and/or demo engines don't
do it, and I don't want to bother setting each and every
variable to zero; flushing everything at the beginning is simple
and fast). Then I store the supervisor stack pointer and set my
own stack. This way you make sure you don't trash the system
stack, and besides it offers you an easy way to get back to the
system from anywhere within your code, including exception
handlers. You just need to restore the hardware registers, set
the SSP value, and perform a PTERM.
Afterwards you'll find the code for machine detection. It's a
mix between my own code and the DHS demo shell code. Their code
sets Videl parameters on the fly, but I don't really like to mix
up detection and setup, so mine just collects information and
puts it in some variables inside the BSS. This information is
later used to set the configuration of the machine in separate
routines.
Based on what was found out during hardware detection, the next
routine decides if it can afford to use overscan or not (the two
different code paths). To handle that with a minimum amount of
redundant code, I just set up a bunch of values which are later
used in generic routines. Basically these are values for where
the visible screen area begins, how many scanlines there are,
the scanline offset, the number of columns, etc. Technically I
could have handled Falcon hardware overscan with just an
additional set of parameters.
These values (line width) are also used to compute premultiplied
sinus table values.
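
As an illustration of why the line width matters, here is a
Python sketch of a premultiplied sinus table. The parameter sets
and names are invented for the example: 160 bytes per line is
the standard low resolution width, and 230 bytes is the usual
line length for left and right overscan, but the actual values
live in the source code.

```python
import math

# Hypothetical parameter sets selected after hardware detection.
SCREEN_MODES = {
    "standard": {"bytes_per_line": 160},
    "overscan": {"bytes_per_line": 230},
}


def premultiplied_sinus(bytes_per_line, amplitude, entries=256):
    """Sine table whose entries are byte offsets of whole scanlines, so
    the display code can add them to a screen address without any
    multiplication."""
    table = []
    for i in range(entries):
        line = round(amplitude * math.sin(2 * math.pi * i / entries))
        table.append(line * bytes_per_line)
    return table


mode = SCREEN_MODES["overscan"]
offsets = premultiplied_sinus(mode["bytes_per_line"], amplitude=20)
```

The same generic routine then works in both code paths; only
the table contents change.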
Then come the bog standard saving and setting code for hardware
registers, some code to empty the keyboard queue (to avoid a
bouncing effect if somebody pressed the space bar too long), and
then a call to the "screen choc" routine. Since I got comments
about the screen "jumping", I have to give an explanation here.
ScreenChoc is a routine that switches the frequency to 60 Hz and
then back to 50 Hz, waiting a few vertical blanks in between, to
restore the bitplane positions in case the shifter got fucked
up. Probably everybody using Devpac has experienced an intro
crashing and then getting back to Devpac with weird colours
because the bitplanes are not set correctly. This happens when
the screen resolution is changed at the wrong time, and in
theory the routine should not be necessary in a normal demo that
boots in low resolution. Unfortunately, when I tested the sample
loader code I noticed that, when run from medium resolution, the
intro was sometimes launched with shifted planes, so I decided
to play safe and keep the routine at the beginning.
OK, let's continue to the "main" routine. It starts with a fade
to black, but that routine is not that simple because it
actually bothers fading whatever is currently on screen by
performing a real fade, not a standard "let's assume it was
white at the beginning" routine. It actually reuses the routine
used later in the intro to compute the cross fades between all
the background palettes.
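
A "real" fade of this kind boils down to interpolating every
colour component from its current value to its target value.
The Python sketch below shows the principle only. It treats the
three nibbles of a palette word as plain 0-15 levels (ignoring
the STE bit order), and the function names are invented.

```python
def split(colour):
    """Split a 12-bit palette word into its three 4-bit components."""
    return (colour >> 8) & 0xF, (colour >> 4) & 0xF, colour & 0xF


def blend(source, target, step, steps):
    """Colour at 'step' (0..steps) on the way from source to target."""
    red, green, blue = (
        s + (t - s) * step // steps
        for s, t in zip(split(source), split(target))
    )
    return (red << 8) | (green << 4) | blue


def cross_fade(source_palette, target_palette, steps):
    """All intermediate palettes, from the source to the target one."""
    return [
        [blend(s, t, step, steps) for s, t in zip(source_palette, target_palette)]
        for step in range(steps + 1)
    ]


# Fading whatever is on screen to black is just a cross fade whose
# target palette is all zeroes.
current = [0xFFF, 0x740, 0x003]
frames = cross_fade(current, [0x000] * len(current), steps=8)
```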
Afterwards Crazy Q's Zak is played. (Initially it had a SID
voice due to a communication problem: he did not know that this
was not very compatible with sync-coded overscan code.) Then the
intro sequence, made from various pictures containing text
fading in and out, is displayed. (There is also some conditional
assembly to skip the intro.) The duration of the Atari Fuji logo
is actually the time taken by the computation of the main
palette animation and gradient.
And finally we reach the text scroller part.
The code starts by extracting information from the font bitmap
picture. I did that at runtime because it made it possible to
integrate new versions of the font from Ptoing very simply and
quickly. Basically I just have a list of character information,
and I precompute some data that I extract from the bitmap. All
this information is stored inside the DATA section within a
table called "font_info". Instead of storing all that using
dc.b, I use a complicated macro called LETTER which takes a
bunch of parameters and generates the dc.b values from them. The
obvious advantage is that I can change the data format just by
changing things around in one single macro implementation
instead of playing search and replace for each single character.
By the way, it seems that Devpac is a bit buggy in the macro
parameter passing code, because it never accepted ","; instead I
had to use the direct ASCII value (44). Well, I can live with
that :)
Among the precalculated information, I use three bytes for each
bitplane, padded with an additional byte containing the inverted
mask.
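
The role of the inverted mask can be shown with a few lines of
Python. This sketch reads the description above as one byte per
bitplane for each 8-pixel row, followed by the mask byte; that
reading, the 4-byte layout and the function names are
assumptions, and the real font_info format may differ.

```python
def pack_font_row(plane0, plane1, plane2):
    """Pack one 8-pixel row: three bitplane bytes plus the inverted mask."""
    mask = (plane0 | plane1 | plane2) & 0xFF   # 1 where the letter has a pixel
    inverted_mask = mask ^ 0xFF                # 1 where the background shows
    return bytes([plane0, plane1, plane2, inverted_mask])


def draw_row(background, row):
    """Masked draw: punch a hole in each background plane, then OR the
    letter's plane byte into it."""
    inverted_mask = row[3]
    return [(bg & inverted_mask) | plane for bg, plane in zip(background, row[:3])]


row = pack_font_row(0b11110000, 0b00110000, 0b00000000)
result = draw_row([0xFF, 0xFF, 0xFF], row)
```

Precomputing the inverted mask once per row saves the OR and
NOT operations every time the letter is drawn.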
After the font is extracted, it's time to set the patterns in
the background. This is done by the DrawBackgroundPattern
routine, which starts off by clearing the first 160 bytes (in
case we are using overscan), and then fills the rest of the
screen with binary patterns that look like 8-pixel wide vertical
columns (a bit like vertical rasters). Then some additional
passes are done: one to copy a pre-computed "parallax" effect to
the top of the screen (where nothing moves :p), and then the
Creators logo is masked on top of it :)
Finally, after some variable initialization, the scroller code
is launched, either in fullscreen or not depending on the
detected hardware. From this point we have two different code
paths, and a slightly different look on screen (the non overscan
code path takes less time to execute, so there is an additional
mirror effect at the bottom of the screen).
In order to avoid too much redundant code, the scroller code was
split into three different routines:
One routine inserts a new character into the buffers
(ManageNextLetter). Another one (BlitScrollBufferSinusPush)
computes the source (text buffer) and destination (screen)
addresses for each screen column. The third one uses those
addresses later for the text display. In the non overscan code
path this third routine is called BlitScrollBufferSinusPop, but
in the overscan code path the code is directly integrated in the
fullscreen code.
The buffer management part is basically just a circular buffer
with a read and a write pointer. Each time the read pointer
arrives dangerously close to the write one, a new letter column
is decoded and inserted in the buffer, and the write pointer is
moved by 8 pixels. I chose this approach because it made it
possible to get a perfectly variable moving speed for the text
without too much hassle. The insertion of a new column is done
by the ScrollerNextFrame routine, which I admit is a bit
complicated, since it is also responsible for handling the
various effects that can be applied during the intro.
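
The read/write pointer scheme can be modelled in a few lines of
Python. This is a simplified sketch with invented names, not a
transcription of ScrollerNextFrame: it only shows how a variable
scroll speed falls out of refilling the buffer whenever the read
position gets too close to the write position.

```python
class ScrollBuffer:
    COLUMN_WIDTH = 8  # pixels added per inserted letter column

    def __init__(self, columns, visible_pixels):
        self.slots = [None] * columns         # circular storage
        self.visible_pixels = visible_pixels  # width of the displayed window
        self.read_pixel = 0                   # left edge of the window
        self.write_pixel = 0                  # first pixel not yet decoded

    def advance(self, speed, next_column):
        """Scroll by 'speed' pixels and decode new columns as needed."""
        self.read_pixel += speed
        while self.write_pixel < self.read_pixel + self.visible_pixels:
            slot = (self.write_pixel // self.COLUMN_WIDTH) % len(self.slots)
            self.slots[slot] = next_column()
            self.write_pixel += self.COLUMN_WIDTH


# Hypothetical usage: the "columns" are just numbers here.
feed = iter(range(1000))
scroller = ScrollBuffer(columns=8, visible_pixels=32)
scroller.advance(3, lambda: next(feed))   # decodes 5 columns
scroller.advance(5, lambda: next(feed))   # nothing new is needed yet
```

Because the speed is just a number added to the read position,
it can change every frame without touching the decoding code.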
Technically the routine writes the new column into 8 buffers,
each shifted by one more pixel. This explains the complicated
shifting and masking code, because it also needs to keep the
pre-computed colours for the background to avoid further masking
later in the code.
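
Here is the pre-shifting idea reduced to one bitplane and one
pixel row, as a Python sketch. It leaves out the background
colour handling mentioned above, and the names and buffer layout
are assumptions made for the example.

```python
def insert_column(buffers, column, pattern):
    """Write an 8-pixel pattern at byte 'column' into 8 row buffers,
    where buffers[k] holds the same row shifted right by k pixels."""
    for shift, row in enumerate(buffers):
        left = pattern >> shift                    # part that stays in this byte
        right = (pattern << (8 - shift)) & 0xFF    # part spilling into the next
        keep = (0xFF << (8 - shift)) & 0xFF        # bits owned by the previous column
        row[column] = (row[column] & keep) | left
        row[column + 1] = right


# One spare byte at the end receives the spill of the last column.
buffers = [bytearray(3) for _ in range(8)]
insert_column(buffers, 0, 0xFF)
insert_column(buffers, 1, 0x81)
```

At display time the code only has to pick the buffer matching
the current pixel position modulo 8, and no shifting is needed
any more.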
In comparison, the push and pop routines are quite simple. The
only complication is that the display of the 24-pixel high
characters is split in three blocks. The reason is that in the
overscan routine I need three scanlines to display one single
character column. So the non overscan routine is also split in
three, but for no particular reason other than it was pointless
to optimise the routine that is already the fastest to execute.
The Push routine just computes the vertical position of each
column of the scroller based on a global scroller sinus value
plus a local sinus bending value. The three pea instructions
just help point at the areas of the code that need to be
modified. I use self-modifying code because I was kind of
running out of registers in the overscan section, so each of the
24 movep instructions points directly at the right scanline.
This is actually totally useless now that we have disabled the
rotation effect, but you can play with inserting the rotate
effect in the scroll text, and then you will understand why each
line can point at any other line.
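
The address computation itself is easy to picture. The Python
sketch below computes one destination offset per 8-pixel column
from a global sinus and a per-column bend. All the parameter
names and default values are invented, and the self-modifying
part is not modelled at all.

```python
import math


def column_byte_offset(column):
    """Byte offset of an 8-pixel column inside a low resolution line:
    16 pixels occupy 8 bytes, and odd columns use the low byte of each
    plane word."""
    return (column // 2) * 8 + (column % 2)


def push_addresses(frame, columns, bytes_per_line, base_line=100,
                   global_amp=40, bend_amp=10, bend_step=8, table_size=256):
    """Screen offset of the top of each scroller column for one frame."""
    sinus = [math.sin(2 * math.pi * i / table_size) for i in range(table_size)]
    global_y = global_amp * sinus[frame % table_size]
    offsets = []
    for column in range(columns):
        bend_y = bend_amp * sinus[(frame + column * bend_step) % table_size]
        line = base_line + round(global_y + bend_y)
        offsets.append(line * bytes_per_line + column_byte_offset(column))
    return offsets


offsets = push_addresses(frame=0, columns=4, bytes_per_line=160)
```

A list like this is what the "pop" side then consumes, one
column at a time.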
What follows is just some more code that is not used in the
final version, for the sake of readability. Please note that in
theory the display code is able to support horizontal distortion
(I wanted an X/Y distorter), but I never managed to find time to
work on it. It kind of worked at some point, but it would have
required even more work, so I ditched it.
Some of you might be interested in the overscan routine, because
it looks suspiciously like the old routine I wrote for the
Phaleon demo: It vsyncs once, then cuts all interrupts, and runs
in a never ending loop :)
The display of the scroller is split into two sections, because
there is a partial palette change between the Creators logo and
the text. Also the routine needs to change 8 colours per line
really fast, to avoid any glitches at the borders. In the
comments you can see how the clock cycles are counted, nothing
special really.
Everything else is just a bunch of utility functions to display
a picture on screen, erase buffers, compute gradients and
perform fades and delays, or things like the Timer B routine
used to display the background in the non overscan code path.
(Just note that I use the user stack pointer as a fast scratch
pad to keep the current palette pointer.)
So well, I guess that's all that has to be said about the code
included in this intro.
There would be a lot more to say about what we wanted to put
into it.
If I had not been so rusty, and hadn't forgotten so much about
how all of this works, I could probably have helped Nerve with
his 3D part. This would have been a great improvement for the
whole intro. The sad truth is that he probably created more
optimized 3D rendering code than I ever made on the ST myself.
In the end I can say that working with everybody in Creators was
quite a nice experience, and I really wish one day we will be
able to do some more things on the Atari ST :)
Dbug for Alive Magazine, 2006-06-18