Alive 13 - dbug


68k Coding

10 Years later



        Hello everybody, yes this is indeed an article by Dbug (the
        coder from NeXT) and not by D-Bug (the menu group), as some of
        you might have expected :). Well, now that the identity part is
        clear, I can start with the core of this article.

        In 2005 DHS organised a megademo compo for the 20th birthday
        of our beloved Atari ST. I couldn't resist participating in such
        an event, especially considering that it had been 10 years
        since I last coded something in 68k assembly. (In 1995, when I
        got my first job at Adeline Software, I stopped working on my
        Falcon painter "Rembrandt").

        Additional motivation was generated by moving to Norway. Being
        Nerve's neighbour it seemed logical to team up for coding with
        the only active Norwegian Atari coder :D and so I joined
        Creators.

        My involvement in this intro actually started one month before
        the compo was announced. I was invited to Nerve's place, and
        then Frequent joined us for a micro ST party. Listening to YM
        music, Nerve started to code some 3D rasterizing stuff, and I
        was preparing logos and sprites for the intro part. Well
        honestly, I didn't do much, just a half-finished logo and
        some lame sprite animations using CrackArt.

        Some time later, the 20th birthday compo was announced,
        and it seemed like a perfect motivation for finishing the intro.
        We decided to divide the work between us. Nerve would focus on
        the 3D part, while I would work on the scroller. After talking
        with a few other people, we got additional help offered by
        Ptoing and Stingray. Ptoing supplied some pictures and fonts and
        Stingray helped optimise some code. (Tero and Proteque were
        also preparing some stuff, but due to time constraints we were
        not able to include it before we hit the deadline.)

        In the end we managed to meet the deadline (by a mere few
        minutes) with a somewhat finished version, and the nice Atari
        people lifted us up to 3rd place in the intro compo. It
        really amazed me, especially since the intro we delivered was
        far from containing all the stuff we had originally planned.
        There is actually code (e.g. the axis rotation stuff) that has
        been disabled in the "final" version because it makes the
        scroller almost unreadable.

Additional material


        If everything worked out fine, you should find the source code
        and data of this intro, so you should be able to assemble the
        executable yourself using Devpac. During the article I will
        refer to some parts of the code, and I hope it is
        understandable enough for anybody with a bit of Atari ST
        programming experience.

Hardware and Software


        Let's get back to the beginning. The first thing I did was to
        try to get my real Atari machine up and running. So I unpacked
        my Mega STE, plugged it in ... and cried. The hard drive wasn't
        recognised anymore; obviously the machine took some damage
        during its journey from France to Norway. Even worse, I was
        unable to find my VME graphic card, meaning that I would be
        obliged to use Devpac in 640x200. Yuck! This made me use my PC
        with an emulator for development. I had a "lot of fun" trying
        to get a decent STEEM configuration, but after some tweaks here
        and there - especially setting some shortcuts to be able to do
        fast loads and resets of the virtual machine - I was able to
        code and test immediately on the virtual Atari. The final
        result is far from being as smooth as what I use when coding
        Oric demos, but it's still kind of usable :)

        One of my biggest problems was the fact that Devpac is a
        seriously bugged application, which easily bombs for no reason
        at all. Other problems were created by the fact that STEEM is
        way more tolerant than a real Atari STE. This means that my
        fullscreen code, which worked perfectly well in STEEM, wouldn't
        work on a real ST. (Since then it has been tested on STF, STE,
        MSTE, and worked fine everywhere, so I'm quite happy with this
        part).

        Another problem was my lack of documentation. Since all my
        Atari/68k related books are still waiting in France, I had to
        find a replacement first. Thanks to all the kind Atari people on
        #atari.fr and #atariscne for the tables of clock cycles, and to
        Andre, who let me use his M68000 developer book :)

        And since we are already talking about problems, let's talk
        about problematic tools: XnView sucks! It's very difficult to
        use this program if the values of the RGB components and the
        order of colours in the palette matter to you. In the end I had
        to
        hack a special Atari ST mode in my own picture converter
        program...

        For all the data, I used Builder scripts ("Builder" is the name
        of a tool I developed for Eden games), allowing me to tweak and
        modify values of tables and pre-generated data very easily. Also
        having native support for big endian data is a plus :)

Description of the intro


        Basically, my intro contains just an introduction text with some
        fades. Then it displays a three-bitplane sinus scroller in
        left and right overscan (this works on 8 MHz M68k machines only,
        of course) with an eight-colour animated background.

        This intro has been tested, and appears to work fine on these
        machines:

        - STF
        - STE
        - Mega STE
        - 68030 Falcon (accelerated or not)

        Unfortunately it does not work very well on 68060 processors,
        because I used a lot of movep for the scroller. During coding I
        didn't know that this instruction is emulated on these machines
        :)

        Falcon support comes from the fact that I have two code paths.
        On an 8 MHz machine the main loop is sync coded; on other
        machines the rasters are displayed using Timer B.

        About the text, yes it's using 3 bitplanes, and it's fully
        masked with the background. The fact that it almost looks like 2
        bitplanes is another problem :p In the original design the font
        was supposed to use 7 colours, and the 8th colour should have
        been a cast shadow. Unfortunately I didn't have enough time to
        do that, so in the end I have overly complicated code that
        could have been done in a totally different and more efficient
        way. So well, keep in mind when looking at the code that it was
        supposed to do more than what it is used for :)

Source Code


        This source code has not been commented in any special way. It's
        just the way I code, so the comments may be sparse here and
        there, and I will try to explain the less obvious details.

Macros


        I generally make heavy use of the conditional assembly and macro
        facilities in Devpac. Here is a set of useful macros. Their
        behaviour generally depends on the values of some equates at the
        beginning of the source file.

        PAUSE <duration>
        This macro generates a delay equal to the given parameter (in
        nops). In its simple and fast mode it just generates a bunch of
        nops, but since this takes a lot of room there is also a "slow"
        version that tries to use slower instructions in order to get
        the same overall delay using less memory. I used the same trick
        in the STE screen of the Phaleon, using traps to generate
        various common delay values. These instructions are supposedly
        neutral, but in practice they modify the value of D0.
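
        To make the "slow" mode more concrete, here is a small Python
        sketch (not the Devpac macro itself) that picks instructions
        for a given delay. It assumes the usual 68000 timings: a nop
        takes 4 cycles and a long register shift by n takes 8+2n
        cycles. The names pause, duration and SLOW_FILLERS are made up
        for this example, and the choice of lsl.l as filler is only an
        assumption that happens to match the "modifies D0" remark.

```python
# Hypothetical sketch of a "slow" PAUSE expansion, not the macro from the intro.
# Costs are expressed in nop units (1 nop = 4 cycles on a 68000).
# Assumption: a long register shift by n costs 8+2n cycles and is 2 bytes long,
# so one shift can replace several nops. These shifts clobber D0.
SLOW_FILLERS = {
    6: "lsl.l #8,d0",   # 24 cycles
    5: "lsl.l #6,d0",   # 20 cycles
    4: "lsl.l #4,d0",   # 16 cycles
    3: "lsl.l #2,d0",   # 12 cycles
}


def pause(nops, slow=True):
    """Return a list of instructions that together last `nops` nop units."""
    if nops < 0:
        raise ValueError("duration must not be negative")
    if not slow:
        return ["nop"] * nops          # fast to assemble, but 2 bytes per nop
    code = []
    remaining = nops
    while remaining >= 3:
        step = min(remaining, 6)       # longest filler that still fits
        code.append(SLOW_FILLERS[step])
        remaining -= step
    code.extend(["nop"] * remaining)   # 0, 1 or 2 nops left over
    return code


def duration(code):
    """Total duration of an instruction list, in nop units."""
    cost = {text: n for n, text in SLOW_FILLERS.items()}
    cost["nop"] = 1
    return sum(cost[instruction] for instruction in code)
```

        With these assumptions a 100 nop delay shrinks from 100
        instructions to 17. The greedy choice is not always the
        shortest possible sequence, but the total delay is preserved.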

        STEEMBREAK
        Well, self explanatory. If you use that one in the debug build
        of STEEM, this will trigger the internal debugger.

        COLORHELP <colour>
        I use this to set the colour of the border to see how much cpu
        time is taken by the various sections of the code.

        COLORSWAP
        Same kind of usage as the previous one, but this one just
        inverts the current colour value.

        BOUNDCHECK <value> <min> <max>
        This macro performs a range check, and if the value is out of
        range it triggers a real error that can be caught by the
        debugging code. Particularly practical if you want to assert
        that A0 actually points into the screen buffer.

        MAKERGB <red> <green> <blue>
        Given three components between 0 and 255, generates an STE
        compatible 12-bit colour. I used that a lot to generate
        palettes easily, especially when working with people who use
        24-bit paint packages and are not able to mentally convert a
        colour to the very wicked STE internal representation!
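
        For those wondering why the STE representation is "wicked":
        the lowest bit of each 4-bit component is stored in bit 3 of
        its nibble, so that old ST palettes keep working. The Python
        sketch below shows the conversion such a macro has to do. The
        function names are made up for this example and it is only an
        illustration, not the macro from the source.

```python
def ste_nibble(component):
    """Convert a 0-255 component to the 4-bit value stored by the STE."""
    level = component >> 4                      # keep the 4 most significant bits
    return (level >> 1) | ((level & 1) << 3)    # lowest bit moves up to bit 3


def make_rgb(red, green, blue):
    """Build a 12-bit STE palette word from three 0-255 components."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("components must be between 0 and 255")
    return (ste_nibble(red) << 8) | (ste_nibble(green) << 4) | ste_nibble(blue)
```

        For example make_rgb(224, 224, 224) gives $777, the classic ST
        white, while make_rgb(255, 255, 255) gives $FFF.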

Code


        Just after the macro definitions you will find the code that
        simply calls the xbios function to enter supervisor mode. There
        is nothing special here. It begins with clearing the BSS (I'm
        used to doing that, because some depackers and/or demo engines
        don't do it, and I don't want to bother setting each and every
        variable to zero. Flushing everything at the beginning is
        simple and fast). Then I store the supervisor stack pointer and
        set my own stack. This way you make sure you don't trash the
        system stack, and besides it offers an easy way to get back to
        the system from anywhere within your code, including exception
        handlers. You just need to restore the hardware registers, set
        the ssp value, and perform a PTERM.

        Afterwards you'll find the code for machine detection. It's a
        mix between my own code and the DHS demo shell code. Their code
        sets the Videl parameters on the fly, but I don't really like
        to mix up detection and setup, so mine just collects
        information and stores it in some variables inside the BSS.
        This information is later used to set the configuration of the
        machine in separate routines.

        So based on what was found out during hardware detection, the
        next routine decides whether it can afford to use overscan or
        not (the two different code paths). To handle that with a
        minimum amount of redundant code, I just set up a bunch of
        values which are later used in generic routines. Basically it
        contains values for where the visible screen area begins, how
        many scanlines there are, the scanline offset, the number of
        columns, etc. Technically I could have handled Falcon hardware
        overscan with just an additional set of parameters.

        These values (line width) are also used to compute the
        premultiplied sinus table values.
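
        The idea of a premultiplied table is that each entry already
        contains "line number times bytes per scanline", so the display
        code can add it to a screen address without doing a multiply.
        Here is a minimal Python sketch of that idea. The function
        name, the amplitude and the table size are assumptions for the
        example; 160 and 230 are the usual line widths for a normal and
        a left/right overscan ST screen.

```python
import math


def premultiplied_sinus(line_width, amplitude, entries=256):
    """Build a sine table whose entries are already scanline byte offsets."""
    table = []
    for i in range(entries):
        # Vertical displacement in scanlines for this table position.
        line = round(amplitude * math.sin(2 * math.pi * i / entries))
        # Premultiply so the display code only has to add the value.
        table.append(line * line_width)
    return table


# One table per code path: the only difference is the line width.
standard_table = premultiplied_sinus(160, 20)
overscan_table = premultiplied_sinus(230, 20)
```

        Since the table depends on the line width, it has to be
        computed after the code path has been chosen.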

        Then comes the bog standard saving and setting code for hardware
        registers, some code to empty the keyboard queue (to avoid a
        bouncing effect if somebody pressed the space bar too long), and
        then a call to the "screen choc" routine. Since I got comments
        about the screen "jumping", I have to give an explanation here.

        ScreenChoc is a routine that switches the frequency to 60 Hz
        and then back to 50 Hz, waiting a few vertical blanks in
        between, to restore the bitplane positions in case the shifter
        got fucked up. Probably everybody using Devpac has experienced
        an intro crashing, and then getting back to Devpac with weird
        colours because the bitplanes are not set correctly. This
        happens when the screen resolution is changed at the wrong
        time, and in theory the routine should not be necessary in a
        normal demo that boots in low resolution. Unfortunately, when I
        tested the sample loader code I noticed that, when run from
        medium resolution, the intro was sometimes launched with
        shifted planes, so I decided to play safe and keep the routine
        at the beginning.

        OK, let's continue to the "main" routine. It starts with a fade
        to black, but that routine is not that simple, because it
        actually bothers to fade whatever is currently on screen by
        performing a real fade, not a standard "let's assume it was
        white at the beginning" routine. It actually reuses the routine
        used later in the intro to compute the cross fades between all
        the background palettes.
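
        A "real" fade simply means interpolating every colour
        component from its current value to the target value. The
        Python sketch below shows the principle on plain ST palette
        words (3 bits per component); the STE nibble order is left out
        to keep it short. All names are made up for this example, and
        the routine in the intro may well work differently.

```python
def split(colour):
    """Split a $0RGB palette word into its three 3-bit components."""
    return (colour >> 8) & 7, (colour >> 4) & 7, colour & 7


def fade_colour(start, end, step, steps):
    """Colour at position `step` out of `steps` between start and end."""
    mixed = 0
    for a, b in zip(split(start), split(end)):
        level = a + (b - a) * step // steps    # linear interpolation
        mixed = (mixed << 4) | level
    return mixed


def fade_palette(start, end, step, steps):
    """Cross fade a whole palette; both lists must have the same length."""
    return [fade_colour(s, e, step, steps) for s, e in zip(start, end)]
```

        Fading to black is then just a cross fade where the target
        palette contains only zeros, which is why the same routine can
        serve both purposes.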

        Afterwards Crazy Q's Zak is played (initially it had a SID
        voice due to a communication problem: he did not know that this
        is not very compatible with sync coded overscan code), and then
        the intro sequence, made of various pictures containing text
        fading in and out, is displayed. (There is also some
        conditional assembly to skip the intro.) The duration of the
        Atari Fuji logo is actually the time taken by the computation
        of the main palette animation and gradient.

        And finally we reach the text scroller part.

        The code starts by extracting information from the font bitmap
        picture. I did that at runtime because it made it possible to
        integrate new versions of the font from Ptoing very simply and
        quickly. Basically I just have a list of character information,
        and I precompute some data that I extract from the bitmap. All
        this information is stored inside the DATA section within a
        table called "font_info". Instead of storing all that using
        dc.b, I use a complicated macro called LETTER which takes a
        bunch of parameters and generates the dc.b values from them.
        The obvious advantage is that I can change the data format just
        by changing things around in one single macro implementation,
        instead of playing search and replace for each single
        character. By the way, it seems that Devpac is a bit buggy in
        the macro parameter passing code, because it never accepted
        ",", so I had to use the direct ASCII value (44) instead. Well,
        I can live with that :)

        Among the precalculated information, I use three bytes (one for
        each bitplane), padded with an additional byte containing the
        inverted mask.
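
        As an illustration, here is how such a four byte entry could be
        built and used, written as a Python sketch. It assumes one byte
        per bitplane for each 8-pixel slice of a letter, and it assumes
        that "inverted mask" means a byte with bits set where the
        background stays visible. Both function names are made up.

```python
def pack_font_slice(plane0, plane1, plane2):
    """Pack one 8-pixel slice: three plane bytes plus the inverted mask."""
    used = plane0 | plane1 | plane2        # pixels covered by the letter
    inverted_mask = ~used & 0xFF           # 1 where the background shows
    return bytes([plane0, plane1, plane2, inverted_mask])


def draw_slice(background, packed):
    """Combine one packed slice with three background plane bytes."""
    mask = packed[3]
    # Clear the letter's pixels in the background, then OR the letter in.
    return [(bg & mask) | plane for bg, plane in zip(background, packed[:3])]
```

        Four bytes per slice is a convenient size, because it is
        exactly one long word.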

        After the font is extracted, it's time to set the patterns in
        the background. This is done by the DrawBackgroundPattern
        routine, which starts off by clearing the first 160 bytes (in
        case we are using overscan), and then fills the rest of the
        screen with binary patterns that look like 8-pixel-wide
        vertical columns (a bit like vertical rasters). Then some
        additional passes are done, one to copy a pre-computed
        "parallax" effect to the top of the screen (where nothing moves
        :p), and then the Creators logo is masked on top of it :)

        Finally, after some variable initialisation, the scroller code
        is launched, either in fullscreen or not depending on the
        detected hardware. From this point we have two different code
        paths, and a slightly different look on screen (the non overscan
        code path takes less time to execute, so there is an additional
        mirror effect at the bottom of the screen).

        In order to avoid too much redundant code, the scroller code was
        split into three different routines:

        One routine inserts a new character into the buffers
        (ManageNextLetter). Another one computes the source (text
        buffer) and destination (screen) addresses for each screen
        column (BlitScrollBufferSinusPush). The third one uses those
        addresses later to actually display the text. In the non
        overscan code path this third routine is called
        BlitScrollBufferSinusPop, but in the overscan code path the
        code is directly integrated in the fullscreen code.

        The buffer management part is basically just a circular buffer
        with a read and a write pointer. Each time the read pointer
        gets dangerously close to the write one, a new letter column
        is decoded and inserted in the buffer, and the write pointer is
        moved by 8 pixels. I chose this approach because it made it
        possible to get a perfectly variable moving speed for the text
        without too much hassle. The insertion of a new column is done
        by the ScrollerNextFrame routine, which I admit is a bit
        complicated, since it is also responsible for handling the
        various effects that can be applied during the intro.
        Technically the routine writes the new column into 8 buffers,
        each shifted by one more pixel. This explains the complicated
        shifting and masking code, because it also needs to keep the
        pre-computed colours for the background to avoid further masking
        later in the code.
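
        The read/write pointer logic can be sketched in a few lines of
        Python. This is only a model of the idea, with made-up names:
        it uses ever-growing pixel counters instead of wrapped
        pointers, stores one object per 8-pixel column, and leaves out
        the 8 pre-shifted copies (the sub-pixel shift it returns is
        what would select one of them).

```python
class ScrollBuffer:
    """Ring of 8-pixel columns with pixel-based read and write positions."""

    def __init__(self, columns, visible_pixels, next_column):
        self.columns = [None] * columns
        self.visible = visible_pixels     # width shown on screen, in pixels
        self.next_column = next_column    # callback that decodes one column
        self.read = 0                     # total pixels scrolled so far
        self.write = 0                    # total pixels decoded so far

    def advance(self, speed):
        """Scroll by `speed` pixels, decoding columns when running short."""
        self.read += speed
        # Keep at least one spare column beyond the visible window.
        while self.write - self.read < self.visible + 8:
            slot = (self.write // 8) % len(self.columns)
            self.columns[slot] = self.next_column()
            self.write += 8               # the write pointer moves by 8 pixels

    def column_at(self, x):
        """Return the column under screen pixel x and its sub-pixel shift."""
        position = self.read + x
        slot = (position // 8) % len(self.columns)
        return self.columns[slot], position % 8
```

        Because the read position is counted in pixels while columns
        are inserted 8 pixels at a time, the speed can be any value per
        frame, as long as the ring is larger than the visible width
        plus the distance scrolled in one frame.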

        In comparison, the push and pop routines are quite simple. The
        only complication is that the display of the 24-pixel-high
        characters is split into three blocks. The reason is that in
        the overscan routine I need three scanlines to display one
        single character column. So the non overscan routine is also
        split in three, but for no particular reason other than that it
        was pointless to optimise the routine that is already the
        fastest to execute.

        The Push routine just computes the vertical position of each
        column of the scroller based on a global scroller sinus value,
        plus a local sinus bending value. The three pea instructions
        just help to point at the areas of the code that need to be
        modified. I use self-modifying code because I was kind of
        running out of registers in the overscan section, so each of
        the 24 movep instructions points directly at the right
        scanline. This is actually totally useless now since we
        disabled the rotation effect, but you can play with inserting
        the rotate effect in the scroll text, and then you will
        understand why each line can point to any other line.
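
        The address computation itself can be pictured with this Python
        sketch. The amplitudes, phase speeds and base line are invented
        for the example, and so are the function names. The byte offset
        of an 8-pixel column follows from the interleaved ST low
        resolution layout, where every 16 pixels take 8 bytes, which is
        also what makes movep a natural fit for writing one column.

```python
import math


def column_byte_offset(column):
    """Byte offset of an 8-pixel column inside a scanline (ST low res)."""
    # Each 16-pixel group is 8 bytes; the odd column is the low byte.
    return (column // 2) * 8 + (column % 2)


def column_offsets(frame, columns, line_width, base_line=100,
                   global_amp=30, local_amp=10):
    """Return one screen byte offset per 8-pixel scroller column."""
    # Global movement: the whole scroller goes up and down together.
    global_y = global_amp * math.sin(frame * 0.05)
    offsets = []
    for column in range(columns):
        # Local bending: each column gets its own phase along the wave.
        local_y = local_amp * math.sin(frame * 0.1 + column * 0.3)
        line = base_line + round(global_y + local_y)
        offsets.append(line * line_width + column_byte_offset(column))
    return offsets
```

        In the real thing the sine values would come from premultiplied
        tables rather than from calls to a sine function every frame.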

        What follows is just some more code that is not used in the
        final version, for the sake of readability. Please note that in
        theory the display code is able to support horizontal
        distortion (I wanted an X/Y distorter), but I never managed to
        find time to work on it. It kind of worked at some point, but
        it would have required even more work, so I ditched it.

        Some of you might be interested in the overscan routine, because
        it looks suspiciously like the old routine I wrote for the
        Phaleon demo: It vsyncs once, then cuts all interrupts, and runs
        in a never ending loop :)

        The display of the scroller is split into two sections, because
        there is a partial palette change between the Creators logo and
        the text. Also the routine needs to change 8 colours per line
        really fast, to avoid any glitches at the borders. In the
        comments you can see how the clock cycles are counted, nothing
        special really.

        Everything else is just a bunch of utility functions to display
        a picture on screen, erase buffers, compute gradients and
        perform fades and delays, or things like the Timer B routine
        used to display the background in the non overscan code path.
        (Just note that I use the user stack pointer as a fast scratch
        pad to keep the current palette pointer.)

        So well, I guess that's all that has to be said about the code
        included in this intro.

        There would be a lot more to say about what we wanted to put
        into it.

        If I had not been so rusty, and hadn't forgotten so much about
        how all of this works, I would probably have been able to help
        Nerve on his 3D part. This would have been a great improvement
        for the whole intro. The sad truth is that he probably created
        more optimized 3D rendering code than I ever made on the ST
        myself.

        In the end I can say that working with everybody in Creators was
        quite a nice experience, and I really hope that one day we will
        be able to do some more things on the Atari ST :)

                                     Dbug for Alive Magazine, 2006-06-18
