Alive
News Team Current issue History Online Support Download Forum @Pouet

01 - 02 - SE - 03 - 04 - 05 - 06 - 07 - 08 - 09 - 10 - 11 - 12 - 13 - 14

Alive 12
An Approach to very fast
Mandelbrot Iterations
                                                                                
        Hi once more, it's ray again and here comes my second tech 
        article for this issue of Alive. I've recently been inspired to 
        revisit the Mandelbrot iteration loop after meeting Charon / 
        Escape at the Paracon 7 party where he came up with the idea of 
        implementing a Mandelbrot fractal on his FPGA board using 16 
        hardware-multiplicators to compute 8 pixels in parallel in one 
        pixel clock tick AFAIR (amazing stuff, he can now do a 640x480 
        fractal in 60Hz realtime! A note to Charon: It was really fun 
        chatting with you, trading ideas and learning some VHDL ;).

        You probably know calculating the Mandelbrot set is a rather
        complex task in terms of the algorithms time consumption on a
        computer (what else). Hence calculating and drawing Mandelbrot 
        fractals was always (or at least has been) a common benchmark 
        for testing the performance and speed of a machine. Besides the
        fact that the set's fringe holds an infinite amount of
        fascinating patterns and shapes only limited by the accuracy of
        the numbers used to draw the fractal, which is inherently
        fascinating. It's always fun to tweak algorithms that people
        claim to know inside out.

        Let me try explaining why calculating the Mandelbrot set is so
        consumptive in the world of digital computing. Here is the
        sequence that is being used to decide whether a point (Cx, Cy)
        is in- or outside the Mandelbrot set, simple math coming up:

        z(n+1) = z(n)^2 + c , c = (Cx, Cy), z(0) = 0+i*0

        Note: The Julia set is computed in a very similar manner only
        that c = const., z(0) = (Cx, Cy), so the whole explanation goes
        for Julia fractals, too basically.

        (Cx, Cy) will usually be the screen coordinates of the pixel to
        be mapped rescaled to fit the section considered in our fractal.
        You will start out at n=0 and iterate until |z| rises beyond a
        maximum value (2 usually) indicating that the sequence diverges
        at (Cx, Cy), i.e. the point doesn't lie inside the set. It turns
        out that computing |z|^2 is easier due to the lack of a square
        root calculation which makes 2^2 = 4 the upper limit to check
        against. You can use the number of iterations (n) to colour the 
        pixel so it is of course necessary to limit n to your maximum
        amount of colours.

        Here is a bit of pseudo code of how the set is usually
        calculated:

        XMIN     =  -2.5;     // the common section
        XMAX     =  +0.8;
        YMIN     =  -1.25;
        YMAX     =  +1.25;

        MAXABSZ  =   2;
        MAXITER  =  32;       // 32 colours
      
        XRES     = 320;
        YRES     = 200;

        float       Cx, Cy = YMIN;


        for (int y=0;y<YRES;y++) {

              Cx = XMIN;
              for (int x=0;x<XRES;x++) {

                    float a=0, b=0, a_new;  // z := a + i*b = 0
                    int n=0;

                    // Enter the costy iteration loop
                    do {
                          a_new = 2*a*b+Cx; // Z(n+1) = z(n)^2+c
                          b = a*a-b*b+Cy;
                          a = a_new;
                    } while ((n++ < MAXITER-1)&&(a*a+b*b < MAXABSZ*MAXABSZ));
                    *screen++ = n;          // Write out pixel value
                   Cx += (XMAX-XMIN)/XRES;
                    }

              Cy += (YMAX-YMIN)/YRES;
              }        


        Now, this is as unoptimized as it gets, 3 multiplications per
        pixel (assuming we are doing fixed point operations, caching the
        a^2, b^2 values in data-registers and naturally not counting the
        2* which equals a shift or an add respectively).

        The next step would be eliminating muls by using |a|+|b| instead
        of a^2+b^2 to approximate |z|^2 (changing the fractals look a
        bit) and expressing a^2-b^2 as (a+b)*(a-b) (binomial formula)
        cutting the calculus to 2 multiplications per iteration. So your
        loop becomes:

        do {
          a_new = 2*a*b+Cx;
          b = (a+b)*(a-b)+Cy;
          a = a_new;
        } while ((n++ < MAXITER-1)&&(abs(a)+abs(b) < MAXABSZ*2));


        Well here's and now here comes the core idea of my whole
        article: Completely remove the multiplications replacing them by
        square lookups. The equations resulting by the complex squaring
        of z are binomials and thus ideally suited to be used in
        interconnection with an old trick that was used to accelerate or
        perform multiplications on machines with slow multiplications
        (68000) or machines lacking a mul instruction (6502 and so on),
        some math coming up again:

        I.         (a+b)^2 = a^2 + 2*a*b + b^2
        II.   =>   2*a*b = (a+b)^2 - a^2 - b^2
        III.  =>   a*b = ((a+b)^2 - a^2 - b^2) / 2

        That is 3 lookups, 1 add, 2 subs and a shift to perform a mul.
        You can even cut it down to 2 lookups by preceding the method a
        bit but this isn't the issue here.

        Take a look at formula II and notice the terms 2*a*b and a^2,
        b^2 which are being reused during the calculations (on the one
        hand to compute b(n+1) and to compute |z|^2 on the other hand).
        This idea enables us to make up a pretty tight iteration loop,
        at least on machines where memory to register reads and
        adds/subs are faster than multiplications as it is the case on
        the 68000 and 68030. Working with fixed point arithmetic you can
        directly renormalize your square tables, I've found that working
        with 5.11 fixed point tables already gives a neat band to play
        with and it costs me 64k words, but of course you can always
        increase accuracy trading memory.

        In a 128x128 window (16 colours) I can almost zoom into the 
        fractal in realtime on my TT with satisfied me enough to write
        up this little tutorial and release my fractal explorer and its
        source along with this issue of Alive -> "filez/mandel.exp/", it
        plots a rather highres set in 48 colours and is supposed to run
        on both the TT and Falcon (also CT60).

        Note: Computing z' length is now exact again, here's the loop:

        // Fixed point arithmetic from here on

        MAXINT      =  65535;
        FRACBITS    =     11;               // fractional bits

        // Set up the square table

        unsigned int sqr[MAXINT+1];
        for (unsigned long i=0;i<=MAXINT;i++) sqr[i] = (i*i) >> FRACBITS;


        int a=0, b=0;  // Cx, Cy must be suiting fixed point numbers now

        (...)

        // Enter the optimized innerloop
        for(n=0;n<MAXITER;n++) {

        asqr = sqr[a];                      // |z| > MAXABS?
        bsqr = sqr[b];
        if (asqr+bsqr > MAXABSZ*MAXABSZ) break;

        // Chose a's and b's range carefully to avoid a carry in a+b
        a = sqr[a+b]-asqr-bsqr+Cx;
        b = asqr-bsqr+Cy
        }

        *screen++ = n;

        (...)


        Have fun with it; maybe make up a Julia fractal or something. In
        case you are about to extend the idea or optimize it further
        please drop me a line via ray(at)tscc(dot)de, I might be
        interested ;).

        Keep it real, stay Atari.
Ray for Alive, 2005-12-22
Alive 12