

 Low demand frusctrum Culling, fast TriangleClipping in 2D Space effient Facesorting and misc. Topics  I'm having some leisure time over here at Milliways 2k5 in kradd/.tSCc.'s place so I decided to write another coding gem hopefully helpful to one or another. So what's it gonna be about? at the time I wrote the 3d engine used in our eil entry 'beams' (sorry for the delay once again) sometime back in 2001 I had to come up with some ideas on how to perform fast triangle clipping in viewspace and fast frustrum culling for the large screens that had a high facecount. now, the engine used in 'beams' is by no means optimized as can be, but however, many many sophisticated methods to perform the mentioned tasks are already known nowadays, some of which even found their way in hardware implementations. Thus I decided to share a bit of experience I gathered while writing my engine in order to give a little advice in choosing from the palette of algorithms to those of you who are yet faced with the task of coding their own engine on a 'lowtech' system like the 16/32bit range of Atari machines. Let me start off with the topic of sorting your geometry's faces which is required if you aim to achieve a correct visual result before rendering them into the screen (i.e. they have to be drawn in back to front order a method commonly referred to as the 'painter's algorithm') there are dozens of superior algorithms like the zbuffer (mostly found in hardware solutions and not reasonable on any Atari due to its high demand), or the sbuffer, BSPtrees and other solutions that do not suffer from problems like rendering errors on cyclic overlapping faces unlike the painter's algorithm for instance but in which the tradeoff between implementation effort and speed gain doesn't pay off IMHO (especially in using 68k assembler as a language). So I decided to go for a plain face depthsorting method which of course requires your sorting to be as quick as possible so of course you wouldn't go for any worst case O(n^2) algorithms like bubble, selection sort or alike things. e.g. quicksort is able to perform with O(n*log n) complexity in its best case but there is however another simple method which requires O(n) to sort the given set of objects outperforming quicksort even with relatively low face counts (few hundreds of below) when realized carefully, namely the 'radix sort'. The idea of radix sort is basically to put your source array's entries into the appropriate places of a destination array of the same size according to a radix or key (in our case the face's depth) right away. At this point I would like to mention that it is generally recommended to use the average of the face's vertexes' z coordinate as its depth and if your engine is limited to a constant vertex count (like 3 in my case) it's of course sufficient to only sum up these zvalues without averaging. Normally it would take at least three passes to perform the task of sorting your array which are 1st: counting the radixes of a single class for all classes, meaning counting all faces of the same depth, 2nd: summing up the obtained values in order to transform them into indexes into the destination array and 3rd: using those indexes to put the source elements into the proper positions in the destination array. Using an array of linked lists as a destination array you can even omit the first two passes and put your source arrays faces into the proper position of the destination array, which I call the 'render queue'. If you aim at beating O(n*log n) type sorting algorithms (and of course you do ;) you'll have to reduce the solution's constant timing amount which gets mainly determined by the number of different classes. This means that we have to keep our zgranularity as coarse as possible but fine enough to still make a visual result that is free of glitches. I've found reserving 9 bits for the zresolution to fulfil this demand. Here comes some pseudo code to clarify the idea: // Data class definitions ZGRANULAR = 9 // Bits for the depth resolution ZCLASSES = 1<<ZGRANULAR; struct { ... int normal_z; // z coordinate of the face's normal int z; // the face's 'depth' } face_t; face_t face[NUMFACES]; // Face array face_t * RenderQueue[ZCLASSES] = {NULL}; // Visibility list // Sorting pass for (int n=0;n<FACES;n++) { if (face[n].normal_z>0) { // Add face to the destination list if it is facing us face[n].next = RenderQueue[face[n].z]; RenderQueue[face[n].z] = &face[n]; } } // Rendering pass (back to front) for (int z=ZCLASSES1;z!=0;z) { while(RenderQueue[z] != NULL) { // Render faces with current depth RenderFace(RenderQueue[z]); RenderQueue[z] = RenderQueue[z]>next; // Next face } } I hope you get the idea. The next paragraphs deal with the topic of discardable faces falling outside of your viewing frustrum (a cube in which segments of your geometry will be visible in our case) which is feasible when it comes to rendering scenes with many polygons some of which are probably completely or partly invisible. Culling faces that trivially fall completely into the space in the front of and behind of your viewing frustrum can be done with very low demand by extending the 2nd pass of the sort introduced above. imagine you'd first of all cut out a vertical slice of zspace directly in front of the viewer's omitting all faces that fall outside. Here comes the extension: // Vertex type struct { x,y,z : int; // Original coordinates tx,ty,tz : int; // Transformed coodinates ... // Additional info... } vertx_t // Vertex array vertx_t vertx[NUMVERTEXES]; ZMIN = 0; ZMAX = ZCLASSES; // Sorting pass, leaving faces outside zspace for (int n=0;n<FACES;n++) { if (face[n].normal_z>0 && vertx[face[n].vertex1].tz>ZMIN && vertx[face[n].vertex1].tz<ZMAX && vertx[face[n].vertex2].tz>ZMIN && vertx[face[n].vertex2].tz<ZMAX && vertx[face[n].vertex3].tz>ZMIN && vertx[face[n].vertex3].tz<ZMAX ) { // Add face to the destination list if it is visible face[n].next = RenderQueue[face[n].z]; RenderQueue[face[n].z] = &face[n]; } } With the convention met above (ZMIN=0 and ZMAX=ZCLASSES) it's possible to rescale your zcoordinates to maintain even more visual accuracy. You will now have culled trivial cases falling completely outside your zspace but what about the ones which are only partly outside? Well, concerning the zclipping there's a quick and dirty but yet very effective solution that produces reasonable visual results: i.e. saturate your vertexes' zvalue to the range of (ZMIN,ZMAX) prior to transforming from object into screenspace but save the nonclipped coordinate for the sorting pass, nonetheless. Check it out, it looks ok. Pseudo code ahead: ZSCALE = 8 for (int n=0;n<VERTEXES;n++) { // Transform vertexes, assume (tx,ty,tz) // to hold your transformed vertex ... vertex[n].z = tz; // Clamp the zvalue if (tz > ZMAX) tz = ZMAX; if (tz < ZMIN) tz = ZMIN; // Translate into screenspace vertex[n].x = (tx << ZSCALE) / tz; vertex[n].y = (ty << ZSCALE) / tz; } Ok, let's change to the Next topic, clipping triangles obscured by the frustrum's horizontal and vertical boundaries. This can be performed in screen space. Although it seems decent to keep as much calculus in registers as possible when for instance performing a timecritical action like rasterization, I reached the conclusion that using an edgebuffer in the rasterizer's scanconvertor has many advantages when it comes to clipping your triangles (or polygons of higher vertex count). It seems to run a bit offtopic now but you'll see what I'm aiming at later on. let me introduce a hint at the line clipping algorithm developed by Cohen and Sutherland. It subdivides screenspace into 9 areas assigned with the following bitcodes:   1010  1000  1001   ++   0010  0000  0001   ++   0110  0100  0101   Each of the bitfield's four flags has a certain meaning regarding the placement of each vertex (with '0000' meaning 'completely inside', bit #0: outside the right border, bit #1: outside the left border, bit #2: below the bottom border, bit #3: above the top border). Maintaining a clipping code for each vertex after translation into view space has obvious advantages when it comes to sorting out nontrivial and trivial clipping cases, as you may read on. Ahead of everything we will try to use these so called 'CohenSutherland' clipping codes' to reject trivial cases that have all vertexes outside a certain screen boundary, thus falling out of the viewfrustrum completely. This can be achieved by looking at bits each of the (three) vertexes have in common, i.e. check if their clipping code's logical and is different from zero. This will mark a trivial case so the face can be culled. Pseudo code: // Extended vertex type struct { x,y,z : int; // Original coordinates tx,ty,tz : int; // Transformed coodinates cs_code : char; // Clipping flags ... // Additional info... } vertx_t // Sorting pass, expanded to reject trivial cases from the viewing frustrum // We assume the vertex clipping codes have already been computed before // (during your coordinate transform loop for instance) for (int n=0;n<FACES;n++) { // Sort out the trivial cases if (face[n].normal_z>0 && vertx[face[n].vertex1].tz>ZMIN && vertx[face[n].vertex1].tz<ZMAX && vertx[face[n].vertex2].tz>ZMIN && vertx[face[n].vertex2].tz<ZMAX && vertx[face[n].vertex3].tz>ZMIN && vertx[face[n].vertex3].tz<ZMAX && (vertx[face[n].vertex1].cs_code & vertx[face[n].vertex2].cs_code & vertx[face[n].vertex3].cs_code) == 0 ) { // Add face to the destination list if it is visible face[n].next = RenderQueue[face[n].z]; RenderQueue[face[n].z] = &face[n]; } } Still with me so far? Since now it's time to get slightly more complicated: Non trivial cases and partly triangle clamping in 2d space. Here you will see why it makes sense to use an edge buffer; I will start with vertical clamping. This is easy: as you might know each triangle (face) section (i.e. edge) will be scanconverted separately. It's a common method to use two edge buffers (one for the 'left' and one for 'right' face sections) directly obtaining the offset and length for each scanline to be rasterized. whether a section is to be placed into the left or right buffer is easy with triangles once you've vertically sorted the triangle's vertexes, i.e. just compute the 'direction' of the longest row of the triangle to select between one of the two possible configurations (2nd vertex to the left or 2nd vertex to the right of the triangle). Ok, back to the topic of vertical clamping. Imagine a section to be scanconverted, call its vertexes v1 and v2 and assume those to be vertically sorted, with clipping codes already assigned, of course (as you probably remember bits #3 and #2 of the clipping codes will inform you about vertical boundary penetrations), there are 6 out of 16 possible situations: Case #1 '0000' #2 '1000' #3 '0001' #4 '1001' #5 '1010' #6 '0101'     x v1 (10)     /   (10) x v1 v1 (10) x x v2 (10)  +/++/++ (00) v1 x  /  x v1 (00)  /   /  /  \  /   /  /  \  /   v2 x (00) x v2 (00) \  /   ++\+/++   (01) x v2  x v2 (01)  x v1 (01)      \      x v2 (01) Packing the clipcodes into a four bit field as above you can instantly use the codes as indexes into 16 entry jumptable pointing to 6 routines that do either trivially accept (case #1), omit (cases #5, #6) or clamp (cases #2, #3, #4) the given edge. Be careful when it comes to vertically clipping the triangle's longer edge, its clipped height will give you the possibly new number of visible scanlines that need to be rasterized, take this into account to set up your scanline loop counter. Once you have cut off invisible vertical parts you can once again use a jumptable for the 6 different horizontal setups (which are basically the same as the vertical possibilities). This time, however, you do not need to completely reject the edgeparts that fall outside of the frustrum but rather set those slices to the minimum/maximum horizontal coordinate along their vertical height depending on where which boundary is being crossed. and of course use bits #1 and #0 of the accordant clip codes this time. I think you get the idea, the nice thing about it: pisseasy implementation the main deal will be writing the loop setups for 2*6 cases in your scanconversion routine and you don't even need to change your rasterization loop, at all. Another advantage, it's also very easy to extend the method to different rasterization methods (lightning, texturemapping etc.). A minor disadvantage, the method will leave out few trivial rejection cases and render those into clipped cases which will however be clamped off vertically or horizontally, respectively, thus costing some unnecessary bus transfers. I hope you got the general idea and you maybe found this little doc useful, if you have any questions feel free to drop me a mail via ray(at)tscc(dot)de. Keep it real, stay Atari! ;) ray//.tSCc. for Alive, 20051016 
