-------------------------------------------------------------------------------
                  QUICK GUIDE TO UNDERSTANDING FALCON DEMOS
-------------------------------------------------------------------------------

This article is meant to give you some guidelines on how the Falcon should be programmed if you want your production to be compatible with accelerated machines. Hopefully I can clear up some misunderstandings regarding FPU and memory usage.

Since its release in late 92 we have seen numerous great demos for our beloved bird; who can forget System by eKo or Sono by Avena, to name only two. These and many other demos were great and still are, but there is one problem: they are not very compatible with accelerated Falcons. In fact most of them refuse to run at all on anything but a 16MHz system. This wasn't a big problem back in the mid 90s since accelerators were not that common, and if you had one you could usually turn it off, either with a piece of software or a switch on the back of your machine. Today, 10 years after its release, we still see a lot of demos released for standard Falcons only, which can be a bit frustrating if you have a really fast machine but no decent software to show what it can do.

For some reason accelerators never really caught on in the Atari world, at least nothing like among Amiga users, where an 060 board is pretty much standard these days. This is a bit unfortunate for us since demos tend to be either hardcoded for standard Falcons or designed with any Falcon in mind. The latter usually look a bit "weaker" when running on a slow machine since they cannot take advantage of fixed framerate and things like that. It is a lot easier to program a demo when you know that it will look the same on all machines, and you can pregenerate a lot more data if you don't have to consider variable framerate.

Fixed framerate is when you have, say, a sprite being drawn and updated at the same time: you update the position and then you draw the sprite in your mainloop. This worked fine on STs and to some extent on Falcons. The problem comes when somebody with a machine twice as fast as yours tries to run your demo. The mainloop with update and draw will be executed twice as many times per second as it would on yours, so the sprite reaches the desired position in half the time (in seconds), and it usually looks bad.

The solution to this is to have all the "updates" run on a timer. If you set up a timer to trigger 100 times per second you will get an interrupt at the same rate on every machine, no matter how fast it is, and that is where you update your sprite position. You keep the drawing part in your mainloop, and this will execute as many times per second as your Falcon can manage. Using this method your demos will run better on accelerated Falcons and you can be sure the sprite will move as expected across the screen.

Things like the above example have to be considered when programming, and hopefully this document can help you understand why some programmers prefer hardcoding their demos while others do things in a more compatible way.


1) FPU

The 68882 is usually clocked at 16MHz and is connected to the CPU via a 16bit wide interface, which means that it is anything but fast. It should never be used for innerloop calculations but rather for table generators or small operations in the mainloop. The good thing is that it can handle very big numbers with up to 96bit precision, though 32bit is usually enough.

Apart from a few control registers the FPU has got 8 data registers, fp0-fp7, each up to 96 bits wide. They are used for all internal operations. Since the 68882 has got built-in trig functions such as sine/cosine, pregenerated fixedpoint sinetables are no longer needed and can be replaced with either runtime instructions or tables generated with great accuracy at the beginning of the program.

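The table generator idea can be sketched as follows. This is only an illustration in Python rather than 68k assembler, so the logic can be checked on any machine; on a real Falcon the sine values would come from the FPU at startup. The function name, the table size of 1024 entries and the 1.15 fixedpoint format are assumptions chosen for the example, not something prescribed by the hardware.

```python
import math

def make_sine_table(entries=1024, frac_bits=15):
    """Build a fixedpoint sine table once, at program start.

    entries   -- number of steps per full circle (assumed: 1024)
    frac_bits -- fractional bits per entry (assumed: 15, fits a signed word)
    """
    scale = (1 << frac_bits) - 1  # 32767, so +1.0 still fits in 16 bits
    table = []
    for i in range(entries):
        angle = 2.0 * math.pi * i / entries
        # The float sine stands in for what the FPU would compute.
        table.append(round(math.sin(angle) * scale))
    return table
```

The innerloop then only does integer table lookups, so the slow FPU is touched once during init and never while drawing.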
Deriving the sine of a number has never been easier, simply do:

    fsin    fp0,fp1

where fp0 is your number and fp1 will become sine(fp0).

On the Falcon the FPU is optional, and even though not many demos make use of it this will more than likely change when the CT60 becomes available with its fast built-in FPU. The reasons why democoders would use the FPU are accuracy and laziness, I'd say. Fixedpoint is usually enough for most demos, but when you start doing heavier calculations such as normalizing vectors, the FPU becomes invaluable with the great accuracy it provides. As for the laziness, it is simply so much easier to use fsin and fsqrt than to use tables.

As mentioned above, the 040 and 060 both come in models with an integrated FPU. This is however a cut-down version of the 68882, and even though it features most of the instructions, some are left out, which means emulation is used to maintain compatibility. On the 060 the 64bit version of mul/div is also left out, which means either emulation of those missing instructions or the use of FPU multiplications when doing high precision calculations. As a result most 060 applications will choke a standard Falcon, not only because of the faster CPU but because FPU instructions are used to replace CPU instructions that are slower on the 060. On the 060 a normal muls.l d0,d1 takes 2 clockcycles and an FPU multiplication takes 3 clockcycles; the difference is usually negligible considering the extra accuracy you get with the FPU version.


2) DSP

The Falcon DSP is a beast, no doubt. It makes it possible to replay mp3s on a plain 16MHz Falcon as well as calculating fractals and everything else that requires a lot of multiplications, because that is really what it is good at: multiplying numbers. This is especially true when concatenating matrices or applying them to vectors (vertices), since you have the MAC instruction, Multiply and Accumulate, which basically multiplies two numbers and adds the result to a third number (the accumulator).
Throw in a "round" and "negate" together with parallel moves and you end up with code quick enough to make even the most overclocked 030 jealous. This is why demos which employ DSP code for both music and graphics can do so many more polygons and rotations than most "normal" demos can.

However this comes at a price: compatibility. There is a reason why demos like Sono and Hmmm don't work on accelerated Falcons. It is difficult to keep demos compatible with faster Falcons when trying to squeeze every possible ounce of performance from the machine. Keeping DSP-host transfers in sync can be very difficult when you have to consider faster machines; it is easier to hardcode things.

When programming graphical effects you are most likely to use the host port for communication between the DSP and the host CPU. This interface is an 8bit bus acting like a 24bit one, made of three 8bit parts: high, middle and low byte. You can either read/write the last two bytes (middle and low), giving you 16 bits, or all three bytes plus the unused top byte, giving a full 32bit longword. The top byte is always ignored, reads as zero and will not cause a bus error.

This is all nice and easy to use but there is one problem: speed. Even though the DSP itself is very quick, the interface to the host is not. Its speed is about half that of STram, or roughly 2.5MB/second. In other words a single DSP transfer costs the CPU about as much as two STram reads or writes, so you will only benefit from DSP code when the work it saves is worth more than that per transferred value. This is of course a very rough figure but it gives you an idea of how slow the interface really is.

An obvious advantage of using the DSP is parallel processing. Since the DSP is a completely separate processor it should be used accordingly, i.e. calculating data while the host is busy drawing or clearing the screen. When the host is finished you transfer all the data, and that way you get the most out of it.

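To get a feeling for what the host port figure means in practice, here is a small back-of-the-envelope calculation, written in Python for convenience. Only the 2.5MB/second rate comes from the text above. The function name, the 50 frames per second target and the reading of 1MB as 1,000,000 bytes are assumptions made for this sketch, so treat the results as rough orders of magnitude.

```python
def hostport_words_per_frame(bytes_per_second=2_500_000,
                             frames_per_second=50,
                             bytes_per_word=4):
    """Estimate how many DSP words fit through the host port per frame.

    bytes_per_second  -- rough host port rate from the article
    frames_per_second -- assumed target framerate
    bytes_per_word    -- 4 for longword access, 2 for 16bit access
    """
    return bytes_per_second // (frames_per_second * bytes_per_word)
```

With these assumptions a frame has room for about 12500 longword transfers, or 25000 if 16 bits per value are enough, and that is before the CPU has drawn anything.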

3) Memory

A standard Falcon comes with 1, 4 or 14MB of RAM (STram), but I doubt there are any Falcon users out there with only 1MB in their machines. Upgrades to 14MB are available from a few places and are well worth the money. Together with an FPU I consider 14MB the best upgrade you can do, since a lot of applications and demos require more than 4MB. For one reason or another, some 14MB boards tend to be very sensitive to overclocking, with problems ranging from slight pixel flicker to very unstable machines. CT2 machines especially have proven unstable with the wrong memory, and most homemade 14MB boards fail to work properly.

A standard Falcon can only use STram, but there are quite a few boards out there with TTram support. The difference between STram and TTram is that audio/screen buffers can _only_ be placed in STram, whereas code can be placed in both STram and TTram. TTram is also usually a lot faster and the board can hold more memory. The speed difference is significant, especially on the Afterburner040 and the upcoming CT60. For the ab040, TTram read is about 35MB/second while STram is only half that of a standard Falcon, about 3MB/second (due to technical issues with 040+ processors; this problem has been solved on the CT60 according to Rodolphe Czuba).

Use of TTram is controlled by 2 bits in the header of an executable: TTram-mem which, if set, defaults all memory allocated with Malloc() to TTram, and TTram-load which, if set, forces the program to load into TTram. If no TTram is available, STram will be used. Of course you can allocate both STram and TTram in your program using Mxalloc(), where you can specify what memory type you want.

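As an illustration of what tools that edit these bits do, the sketch below sets both TTram flags in a program header. It is written in Python and works on a copy of the file contents, not on a real Falcon. The 28 byte header layout and the bit values (0x02 for TTram-load, 0x04 for TTram-mem, 0x01 for fastload) are taken from my recollection of the GEMDOS documentation and should be checked against it; the function and constant names are made up for this example.

```python
import struct

# Assumed GEMDOS program header layout, big-endian, 28 bytes:
# magic, text, data, bss, symbol sizes, reserved, flags, absflag.
HEADER_FORMAT = ">HIIIIIIH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
PRG_MAGIC = 0x601A

PF_TTRAMLOAD = 0x02  # assumed bit: load program into TTram
PF_TTRAMMEM = 0x04   # assumed bit: Malloc() defaults to TTram

def set_ttram_flags(program: bytes) -> bytes:
    """Return a copy of `program` with both TTram flags set."""
    fields = list(struct.unpack(HEADER_FORMAT, program[:HEADER_SIZE]))
    if fields[0] != PRG_MAGIC:
        raise ValueError("not a GEMDOS executable header")
    fields[6] |= PF_TTRAMLOAD | PF_TTRAMMEM  # other flag bits are kept
    return struct.pack(HEADER_FORMAT, *fields) + program[HEADER_SIZE:]
```

Only the flags longword changes; the rest of the header and the program body are passed through untouched, which is why existing bits such as fastload survive.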
When programming with compatibility in mind, and to make your application run as fast as possible, you should always try to load the program into TTram by setting the appropriate bit in the program header (this can be done using for example the fileflag CPX or the Thing desktop). For screen/audio buffers you would have to use Mxalloc(), as mentioned above, to allocate STram.

Another, less obvious pitfall is hardware registers. Not only does the Falcon lack the shadow registers available on the ST, but since some add-on boards are fully 32bit, accessing $ffxxxx will cause problems. Use the full 32bit address, $ffffxxxx, to avoid problems with expansion boards.

For demos there used to be a magic 4MB memory limit, but recently we have seen more and more productions requiring 14MB. This is mostly due to music: the Falcon is able to replay mp2 music at high quality using the DSP in demos, but there is a flipside to this coin. A 4 minute mp2 at 96kBit/32kHz takes up almost 3MB, so unless you stream the music from harddrive you have to load it all into STram, and this along with 500kB for screen buffers leaves you little or no memory for code and graphics on a 4MB machine. This means that most demos with mp2 music are 14MB only. Please keep in mind what I mentioned above and use TTram wherever possible, otherwise we end up with 14MB STram-only productions.


4) Blitter

On the STE the blitter is really nice and has got a big advantage over the CPU when it comes to copying and shifting data around. On the Falcon however it is a different matter. Even though the blitter is running at 16MHz compared to 8MHz on the STE, it is simply too slow to be useful and is probably only present for compatibility reasons. And since it can only access STram it is even more useless, so my advice is to leave the blitter alone completely on the Falcon.


5) Videl

Videl is the graphics processor in the Falcon, so unless you are using an external graphics card of some kind, this is where all the graphical output comes from.
As mentioned in the memory section, Videl can only use STram for framebuffers, and its resolution can be set either by using XBIOS or by writing to the hardware registers themselves. The latter gives you more control over the resolution and is the preferred method of many coders since there is an excellent tool for this job, Screens Pain. It allows you to set any resolution the Videl is capable of, as well as the number of colours and the frequencies.

On the topic of frequencies, speeders usually replace the 32MHz Videl clock with a 50MHz one, which means that resolutions fed by the 32MHz clock no longer work on accelerated machines and should be avoided. Use 25MHz instead for both RGB and VGA. Unfortunately this means that a 320 pixels wide screen on RGB becomes stretched and the leftmost and rightmost sections are no longer fully visible unless you can adjust your monitor/TV, but this is usually not a problem.

When resetting the resolution you might end up with a corrupt resolution or a slightly offset screen. This is due to a bug in XBIOS. There are ways of restoring the Videl, and although you might still occasionally get an offset screen, this is rare, and setting and resetting the resolution again usually fixes the problem.


Summary

The most important thing for staying compatible with faster Falcons is to not use hardcoded DSP transfers, since they will not work on anything but the machine you made them for. The TTram vs. STram problem can be solved by clearing the flags in the program header, but this has to be considered bad programming since it is very easy to allocate the right memory for your purposes.

So,

- keep updates on timers but render the frame in the mainloop.
- allocate STram for screen and audio buffers using Mxalloc().
- don't use the blitter (at all!).
- avoid hardcoded DSP transfers (never assume that the CPU and DSP frequency ratio is 1:2 etc).

If you follow these simple rules you can be fairly certain that your demo will work properly on any Falcon.

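The first rule, updates on a timer and rendering in the mainloop, can be sketched as a small simulation. It is written in Python purely to show the arithmetic; a real demo would install a timer interrupt in 68k assembler. The function names, the 100Hz tick rate and the idea of modelling machine speed as mainloop passes per second are assumptions made for the example.

```python
def update_in_mainloop(seconds, loops_per_second, step=1):
    """Hardcoded style: the sprite moves once per mainloop pass."""
    position = 0
    for _ in range(seconds * loops_per_second):
        position += step  # update tied to machine speed
    return position

def update_on_timer(seconds, loops_per_second, tick_hz=100, step=1):
    """Timer style: return (position, frames drawn) after `seconds`."""
    position = 0
    frames = 0
    ticks_done = 0
    for loop in range(1, seconds * loops_per_second + 1):
        # Timer ticks that have fired up to this point in time.
        ticks_due = loop * tick_hz // loops_per_second
        while ticks_done < ticks_due:
            position += step  # update runs at a fixed rate
            ticks_done += 1
        frames += 1  # drawing runs as often as the machine allows
    return position, frames
```

Running both for 2 seconds on a "slow" machine (30 passes per second) and a "fast" one (60 passes per second) shows the difference: the mainloop version ends at position 60 on the slow machine and 120 on the fast one, while the timer version ends at position 200 on both and only the number of drawn frames changes (60 versus 120).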
If you have any questions regarding this matter (or anything else) don't hesitate to contact me, either by email or on IRC, #atariscne.

-------------------------------------------------------------------------------
For more info, download Alive #1 from http://alive.atari.org and read Evil's coding tutorial. You can also download his demosys, which includes init/restore routines as well as a timer-based effect system. Get it from:

http://alive.atari.org/stuff/extra.zip

This archive also includes a small test demo as an introduction.
-------------------------------------------------------------------------------

Fredrik Egeberg (deez of mind-design)
deez@algonet.se