Alive 11 - frstclip

01 - 02 - SE - 03 - 04 - 05 - 06 - 07 - 08 - 09 - 10 - 11 - 12 - 13 - 14
Alive 11
       ----------------------------------------------------------------
       Low demand frusctrum Culling, fast Triangle-Clipping in 2D Space
                 effient Facesorting and misc. Topics
       ----------------------------------------------------------------
                                                                                
        I'm having some leisure time over here at Milliways 2k5 in
        kradd/.tSCc.'s place so I decided to write another coding gem
        hopefully helpful to one or another.

        So what's it gonna be about? at the time I wrote the 3d engine
        used in our eil entry 'beams' (sorry for the delay once again)
        sometime back in 2001 I had to come up with some ideas on how to
        perform fast triangle clipping in view-space and fast frustrum
        culling for the large screens that had a high face-count. now,
        the engine used in 'beams' is by no means optimized as can be,
        but however, many many sophisticated methods to perform the
        mentioned tasks are already known nowadays, some of which even
        found their way in hardware implementations. Thus I decided to
        share a bit of experience I gathered while writing my engine in
        order to give a little advice in choosing from the palette of
        algorithms to those of you who are yet faced with the task of
        coding their own engine on a 'low-tech' system like the 16/32bit
        range of Atari machines.

        Let me start off with the topic of sorting your geometry's faces
        which is required if you aim to achieve a correct visual result
        before rendering them into the screen (i.e. they have to be
        drawn in back to front order a method commonly referred to as
        the 'painter's algorithm') there are dozens of superior
        algorithms like the z-buffer (mostly found in hardware solutions
        and not reasonable on any Atari due to its high demand), or the
        s-buffer, BSP-trees and other solutions that do not suffer from
        problems like rendering errors on cyclic overlapping faces
        unlike the painter's algorithm for instance but in which the
        trade-off between implementation effort and speed gain doesn't
        pay off IMHO (especially in using 68k assembler as a language).

        So I decided to go for a plain face depth-sorting method which
        of course requires your sorting to be as quick as possible so of
        course you wouldn't go for any worst case O(n^2) algorithms like
        bubble-, selection sort or alike things. e.g. quick-sort is able
        to perform with O(n*log n) complexity in its best case but there
        is however another simple method which requires O(n) to sort the
        given set of objects outperforming quick-sort even with
        relatively low face counts (few hundreds of below) when realized
        carefully, namely the 'radix sort'. The idea of radix sort is
        basically to put your source array's entries into the
        appropriate places of a destination array of the same size
        according to a radix or key (in our case the face's depth) right
        away. At this point I would like to mention that it is generally
        recommended to use the average of the face's vertexes' z-
        coordinate as its depth and if your engine is limited to a
        constant vertex count (like 3 in my case) it's of course
        sufficient to only sum up these z-values without averaging.

        Normally it would take at least three passes to perform the task
        of sorting your array which are 1st: counting the radixes of a
        single class for all classes, meaning counting all faces of the
        same depth, 2nd: summing up the obtained values in order to
        transform them into indexes into the destination array and 3rd:
        using those indexes to put the source elements into the proper
        positions in the destination array.

        Using an array of linked lists as a destination array you can
        even omit the first two passes and put your source arrays faces
        into the proper position of the destination array, which I call
        the 'render queue'. If you aim at beating O(n*log n) type
        sorting algorithms (and of course you do ;) you'll have to
        reduce the solution's constant timing amount which gets mainly
        determined by the number of different classes. This means that
        we have to keep our z-granularity as coarse as possible but fine
        enough to still make a visual result that is free of glitches.
        I've found reserving 9 bits for the z-resolution to fulfil this
        demand. Here comes some pseudo code to clarify the idea:


// Data class definitions

ZGRANULAR = 9                    // Bits for the depth resolution
ZCLASSES = 1<<ZGRANULAR;

struct {
       ...
       int normal_z; // z coordinate of the face's normal
       int z;        // the face's 'depth'
} face_t;

face_t face[NUMFACES];                   // Face array
face_t * RenderQueue[ZCLASSES] = {NULL}; // Visibility list


// Sorting pass

for (int n=0;n<FACES;n++) {
    if (face[n].normal_z>0) {
       // Add face to the destination list if it is facing us
       face[n].next = RenderQueue[face[n].z];
       RenderQueue[face[n].z] = &face[n];
    }
}


// Rendering pass (back to front)

for (int z=ZCLASSES-1;z!=0;z--) {
    while(RenderQueue[z] != NULL) {  // Render faces with current depth
         RenderFace(RenderQueue[z]);
         RenderQueue[z] = RenderQueue[z]->next; // Next face
    }
}


        I hope you get the idea. The next paragraphs deal with the topic
        of discardable faces falling outside of your viewing frustrum (a
        cube in which segments of your geometry will be visible in our
        case) which is feasible when it comes to rendering scenes with
        many polygons some of which are probably completely or partly
        invisible.

        Culling faces that trivially fall completely into the space in
        the front of and behind of your viewing frustrum can be done
        with very low demand by extending the 2nd pass of the sort
        introduced above. imagine you'd first of all cut out a vertical
        slice of z-space directly in front of the viewer's omitting all
        faces that fall outside.

        Here comes the extension:

// Vertex type

struct {
       x,y,z    : int;  // Original coordinates
       tx,ty,tz : int;  // Transformed coodinates
       ...              // Additional info...
} vertx_t


// Vertex array

vertx_t vertx[NUMVERTEXES];

ZMIN = 0;
ZMAX = ZCLASSES;

// Sorting pass, leaving faces outside z-space

for (int n=0;n<FACES;n++) {
    if (face[n].normal_z>0 &&
        vertx[face[n].vertex1].tz>ZMIN && vertx[face[n].vertex1].tz<ZMAX &&
        vertx[face[n].vertex2].tz>ZMIN && vertx[face[n].vertex2].tz<ZMAX &&
        vertx[face[n].vertex3].tz>ZMIN && vertx[face[n].vertex3].tz<ZMAX
        ) {

       // Add face to the destination list if it is visible
       face[n].next = RenderQueue[face[n].z];
       RenderQueue[face[n].z] = &face[n];
    }
}


        With the convention met above (ZMIN=0 and ZMAX=ZCLASSES) it's
        possible to rescale your z-coordinates to maintain even more
        visual accuracy. You will now have culled trivial cases falling
        completely outside your z-space but what about the ones which
        are only partly outside?

        Well, concerning the z-clipping there's a quick and dirty but
        yet very effective solution that produces reasonable visual
        results: i.e. saturate your vertexes' z-value to the range of
        (ZMIN,ZMAX) prior to transforming from object into screen-space
        but save the non-clipped coordinate for the sorting pass,
        nonetheless. Check it out, it looks ok. Pseudo code ahead:


ZSCALE  =       8

for (int n=0;n<VERTEXES;n++) {
    // Transform vertexes, assume (tx,ty,tz)
    // to hold your transformed vertex
    ...

    vertex[n].z = tz;

    // Clamp the z-value
    if (tz > ZMAX) tz = ZMAX;
    if (tz < ZMIN) tz = ZMIN;

    // Translate into screenspace
    vertex[n].x = (tx << ZSCALE) / tz;
    vertex[n].y = (ty << ZSCALE) / tz;
}


        Ok, let's change to the Next topic, clipping triangles obscured
        by the frustrum's horizontal and vertical boundaries. This can
        be performed in screen space. Although it seems decent to keep
        as much calculus in registers as possible when for instance
        performing a time-critical action like rasterization, I reached
        the conclusion that using an edge-buffer in the rasterizer's
        scan-convertor has many advantages when it comes to clipping
        your triangles (or polygons of higher vertex count). It seems to
        run a bit off-topic now but you'll see what I'm aiming at later
        on. let me introduce a hint at the line clipping algorithm
        developed by Cohen and Sutherland. It subdivides screen-space
        into 9 areas assigned with the following bit-codes:

                         |                 |
               1010      |      1000       |      1001
                         |                 |
        -----------------+-----------------+-----------------
                         |                 |
               0010      |      0000       |      0001
                         |                 |
        -----------------+-----------------+-----------------
                         |                 |
               0110      |      0100       |      0101
                         |                 |

        Each of the bitfield's four flags has a certain meaning
        regarding the placement of each vertex (with '0000' meaning
        'completely inside', bit #0: outside the right border, bit #1:
        outside the left border, bit #2: below the bottom border, bit
        #3: above the top border). Maintaining a clipping code for each
        vertex after translation into view space has obvious advantages
        when it comes to sorting out non-trivial and trivial clipping
        cases, as you may read on. Ahead of everything we will try to
        use these so called 'Cohen-Sutherland' clipping codes' to reject
        trivial cases that have all vertexes outside a certain screen
        boundary, thus falling out of the view-frustrum completely. This
        can be achieved by looking at bits each of the (three) vertexes
        have in common, i.e. check if their clipping code's logical and
        is different from zero. This will mark a trivial case so the
        face can be culled. Pseudo code:

// Extended vertex type

struct {
       x,y,z    : int;  // Original coordinates
       tx,ty,tz : int;  // Transformed coodinates
       cs_code  : char; // Clipping flags
       ...              // Additional info...
} vertx_t


// Sorting pass, expanded to reject trivial cases from the viewing frustrum
// We assume the vertex clipping codes have already been computed before
// (during your coordinate transform loop for instance)

for (int n=0;n<FACES;n++) {

    // Sort out the trivial cases
    if (face[n].normal_z>0 &&
        vertx[face[n].vertex1].tz>ZMIN && vertx[face[n].vertex1].tz<ZMAX &&
        vertx[face[n].vertex2].tz>ZMIN && vertx[face[n].vertex2].tz<ZMAX &&
        vertx[face[n].vertex3].tz>ZMIN && vertx[face[n].vertex3].tz<ZMAX &&
       (vertx[face[n].vertex1].cs_code &
        vertx[face[n].vertex2].cs_code &
        vertx[face[n].vertex3].cs_code) == 0
        ) {

       // Add face to the destination list if it is visible
       face[n].next = RenderQueue[face[n].z];
       RenderQueue[face[n].z] = &face[n];
    }
}


        Still with me so far? Since now it's time to get slightly more
        complicated: Non trivial cases and partly triangle clamping in
        2d space. Here you will see why it makes sense to use an edge-
        buffer; I will start with vertical clamping. This is easy: as
        you might know each triangle (face) section (i.e. edge) will be
        scan-converted separately. It's a common method to use two edge-
        buffers (one for the 'left' and one for 'right' face sections)
        directly obtaining the offset and length for each scanline to be
        rasterized. whether a section is to be placed into the left or
        right buffer is easy with triangles once you've vertically
        sorted the triangle's vertexes, i.e. just compute the
        'direction' of the longest row of the triangle to select between
        one of the two possible configurations (2nd vertex to the left
        or 2nd vertex to the right of the triangle).

        Ok, back to the topic of vertical clamping. Imagine a section to
        be scan-converted, call its vertexes v1 and v2 and assume those
        to be vertically sorted, with clipping codes already assigned,
        of course (as you probably remember bits #3 and #2 of the
        clipping codes will inform you about vertical boundary
        penetrations), there are 6 out of 16 possible situations:

Case
   #1  '0000'   #2 '1000'   #3 '0001'   #4 '1001'   #5 '1010'   #6 '0101'

              |           |           |           |  x v1 (10)|
              |           |           |           | /         |
              |  (10) x v1|           |v1 (10) x  |x v2 (10)  |
   -----------+------/----+-----------+-------/---+-----------+-----------
   (00) v1 x  |     /     | x v1 (00) |      /    |           |
          /   |    /      |  \        |     /     |           |
         /    |   /       |   \       |    /      |           |
    v2  x (00)|  x v2 (00)|    \      |   /       |           |
   -----------+-----------+-----\-----+--/--------+-----------+-----------
              |           | (01) x v2 | x v2 (01) |           |x v1 (01)
              |           |           |           |           | \
              |           |           |           |           |  x v2 (01)


        Packing the clip-codes into a four bit field as above you can
        instantly use the codes as indexes into 16 entry jump-table
        pointing to 6 routines that do either trivially accept (case
        #1), omit (cases #5, #6) or clamp (cases #2, #3, #4) the given
        edge.

        Be careful when it comes to vertically clipping the triangle's
        longer edge, its clipped height will give you the possibly new
        number of visible scanlines that need to be rasterized, take
        this into account to set up your scanline loop counter.

        Once you have cut off invisible vertical parts you can once
        again use a jump-table for the 6 different horizontal setups
        (which are basically the same as the vertical possibilities).
        This time, however, you do not need to completely reject the
        edge-parts that fall outside of the frustrum but rather set
        those slices to the minimum/maximum horizontal coordinate along
        their vertical height depending on where which boundary is being
        crossed. and of course use bits #1 and #0 of the accordant clip-
        codes this time.

        I think you get the idea, the nice thing about it: piss-easy
        implementation the main deal will be writing the loop setups for
        2*6 cases in your scan-conversion routine and you don't even
        need to change your rasterization loop, at all. Another
        advantage, it's also very easy to extend the method to different
        rasterization methods (lightning, texture-mapping etc.). A minor
        disadvantage, the method will leave out few trivial rejection
        cases and render those into clipped cases which will however be
        clamped off vertically or horizontally, respectively, thus
        costing some unnecessary bus transfers.

        I hope you got the general idea and you maybe found this little
        doc useful, if you have any questions feel free to drop me a
        mail via ray(at)tscc(dot)de.

        Keep it real, stay Atari! ;)

                                        ray//.tSCc. for Alive, 2005-10-16
Alive 11