Fast 3-d Animation : Meet X-Sharp

 Fast 3-d Animation: meet X-Sharp

 Journal:   Dr. Dobb's Journal  March 1992 v17 n3 p119(7)
 -----------------------------------------------------------------------------
 Title:     Fast 3-D animation: meet X-Sharp. (Graphics Programming) (Column)
 Author:    Abrash, Michael.
 AttFile:    Program:  GP-MAR92.ASC  Source code listing.

 Abstract:  Three elements come into play to enable 12 three-dimensional
            graphics cubes to rotate at an update rate of about 15 frames per
            second on a 20-MHz 386-based machine with a slow video graphics
            array.  One of them is fixed-point arithmetic, which provides the
            graphics with an immediate order-of-magnitude performance boost.
            The second element is the use of the 386-based machine's 32-bit
            multiply and divide instructions.  The third performance
            enhancement key is to maintain and operate on only the relevant
            sections of the transformation coordinates and matrices.  The
            techniques are very powerful and can cause some problems.  For
            example, since points outside of the 64K by 64K by 64K space
            cannot be handled, multiple matrix concatenations may turn out to
            be imprecise.
 -----------------------------------------------------------------------------
 Descriptors..
 Topic:     Three-Dimensional Graphics
            Rendering
            Tutorial
            Performance Improvement
            Animation
            Programming.
 Feature:   illustration
            program.

 -----------------------------------------------------------------------------
 Full Text:

 Across the lake from Vermont, a few miles into upstate New York, the Ausable
 River has carved out a fairly impressive gorge known as "Ausable Chasm."
 Impressive for the East, anyway; you might think of it as the poor man's
 Grand Canyon.  This summer, I did the tour with my wife and five-year-old,
 and it was fun, although I confess that I didn't loosen my grip on my
 daughter's hand until we were on the bus and headed for home; that gorge is
 deep, and the railings tend to be of the single-bar, rusted-out variety.

 New Yorkers can drive straight to this wonder of nature, but Vermonters must
 take their cars across on the ferry; the alternative is driving three hours
 around the south end of Lake Champlain.  No problem; the ferry ride is an
 hour well spent on a beautiful lake.  Or, rather, no problem -- once you're
 on the ferry.  Getting to New York is easy, but, as we found out, the line of
 cars waiting to come back from Ausable Chasm gets lengthy about
 mid-afternoon.  The ferry can hold only so many cars, and we wound up
 spending an unexpected hour exploring the wonders of the ferry docks.  Not a
 big deal, with a good-natured kid and an entertaining mom; we got ice cream,
 explored the beach, looked through binoculars, and told stories.  It was a
 fun break, actually, and before we knew it, the ferry was steaming back to
 pick us up.

 A friend of mine, an elementary-school teacher, helped take 65 sixth graders
 to Ausable Chasm.  Never mind the potential for trouble with 65 kids loose on
 a ferry.  Never mind what it was like trying to herd that group around a
 gorge that looks like it was designed to swallow children and small animals
 without a trace.  The hard part was getting back to the docks and finding
 they'd have to wait an hour for the next ferry.  As my friend put it, "Let me
 tell you, an hour is an eternity with 65 sixth grades screaming the song 'You
 Are My Sunshine."

 Apart from reminding you how lucky you are to be working in a quiet,
 air-conditioned room, in front of a gently humming computer, free to think
 deep thoughts and eat Cheetos to your heart's content, this story provides a
 useful perspective on the malleable nature of time.  An hour isn't just an
 hour -- it can be forever, or it can be the wink of an eye.  Just think of
 the last hour you spent working under a deadline; I bet it went past in a
 flash.  Which is not to say, mind you, that I recommend working in a bus full
 of screaming kids in order to make time pass more slowly; there are quality
 issues here, as well.

 In our 3-D animation work so far, we've used floating-point arithmetic.
 Floating-point arithmetic, even with a floating-point processor but
 especially without, is the microcomputer animation equivalent of working in a
 school bus: It takes forever to do anything, and you just know you're never
 going to accomplish as much as you want to.  This month, it's time for
 fixed-point arithmetic, which will give us an instant order-of-magnitude
 performance boost.  We'll also give our 3-D animation code a much more
 powerful and extensible framework, making it easy to add new and different
 sorts of objects.  Taken together, these alterations will let us start to do
 some really interesting animation.  Unfortunately, they take a lot of code,
 so I'll have to keep the text short.  Therefore, without further ado, I give
 you real real-time animation.

 Fixed Point, Native 386, and More

 As of last month, we were at the point where we could rotate, move, and draw
 a solid cube in real time.  This month's program, shown in Listings One
 through Ten (pages 134 through 138), goes a bit further, rotating 12 solid
 cubes at an update rate of about 15 frames per second (fps) on a 20-MHz 386
 with a slow VGA.  That's 12 transformation matrices, 72 polygons, and 96
 vertices being handled in real time; not Star Wars, granted, but a giant step
 beyond a single cube.  Run the program if you get a chance; you may be
 surprised at just how effective this level of animation is.  I'd like to
 point out, in case anyone missed it, that this is fully general 3-D.  I'm not
 using any shortcuts or tricks, like prestoring coordinates or pregenerating
 bitmaps; if you were to feed in different rotations or vertices, the
 animation would change accordingly.

 The keys to this month's performance are three.  The first key is fixed-point
 arithmetic.  In the last two months, we've worked with floating-point
 coordinates and transformation matrices.  Those values are now stored as
 32-bit fixed-point numbers, in the form 16.16 (16 bits of whole number, 16
 bits of fraction).  32-bit fixed-point numbers allow sufficient precision for
 3-D animation, but can be manipulated with fast integer operations, rather
 than slow floating-point processor operations or excruciatingly slow
 floating-point emulator operations.  Although the speed advantage of
 fixed-point varies depending on the operation, the processor, and whether a
 coprocessor is present, fixed-point multiplication can be as much as 100
 times faster than the emulated floating-point equivalent.  (I'd like to take
 a moment to thank Chris Hecker for his invaluable input in this area.)

 The second performance key is the use of the 386's native 32-bit multiply and
 divide instructions.  Real-mode C compilers, such as Borland C++ and
 Microsoft C, call library routines to perform multiplications and divisions
 involving 32-bit values, and those library functions are fairly slow,
 especially for division.  On a 386, 32-bit multiplication and division can be
 handled with the bit of code in Listing Nine -- and most of even that code is
 only for rounding.

 The third performance key is maintaining and operating on only the relevant
 portions of transformation matrices and coordinates.  The bottom row of every
 transformation matrix we'll use (at least for the near future) is [0 0 0 1],
 so why bother using or recalculating it when concatenating transforms and
 transforming points?  Likewise for the fourth element of a 3-D vector in
 homogeneous coordinates, which is always 1.  Basically, transformation
 matrices are treated as consisting of a 3 X 3 rotation matrix and a 3 X 1
 translation vector, and coordinates are treated as 3 X 1 vectors.  This saves
 a great many multiplications in the course of transforming each point.

 Just for fun, I reimplemented the animation of Listings One through Ten with
 floating-point instructions.  Together, the preceeding optimizations improve
 the performance of the entire animation -- including drawing time and
 overhead, not just math -- by more than ten times over the code that uses the
 floating-point emulator.  Amazing what one can accomplish with a few dozen
 lines of assembler and a switch in number format, isn't it?  Note that no
 assembly code other than the native 386 multiply and divide is used in
 Listings One through Ten, although the polygon fill code is of course mostly
 in assembler; we've achieved 12 cubes animated at 15 fps while doing the 3-D
 work almost entirely in Borland C++ -- and we're still doing sine and cosine
 via the floating-point emulator.  Happily, we're still nowhere near the upper
 limit on the animation potential of the PC.

 Drawbacks

 The techniques we've used to turbo-charge 3-D animation are very powerful,
 but there's a dark side to them as well.  Obviously, native 386 instructions
 won't work on 8088 and 286 machines.  That's rectifiable; equivalent
 multiplication and division routines could be implemented for real mode (and
 I may just do that one of these months, especially if enough of you give me a
 hard time for taking the easy way out with 386 instructions), and performance
 would still be reasonable.  It sure is nice to be able to plug in a 32-bit
 IMUL or DIV and be done with it, though.  More importantly, 32-bit
 fixed-point arithmetic has limitations in range and accuracy.  Points outside
 a 64K X 64K X 64K space can't be handled, imprecision tends to creep in over
 the course of multiple matrix concatenations, and it's quite possible to
 generate the dreaded divide by 0 interrupt if Z coordinates with absolute
 values less than one are used.  I don't have space to discuss these issues in
 detail now, but here are some brief thoughts.  The working 64K X 64K X 64K
 fixed-point space can be paged into a larger virtual space.  Imprecision of a
 pixel or two rarely matters in terms of display quality, and deterioration of
 concatenated rotations can be corrected by restoring orthogonality, for
 example by periodically calculating one row of the matrix as the
 cross-product of the other two (forcing it to be perpendicular to both).  3-D
 clipping with a front clip plane of -1 or less can prevent divide overflow.

 A New Animation Framework

 Listings One through Ten represent not merely faster animation, but also a
 mostly complete, extensible, data-driven animation framework.  Where earlier
 animation code was hardwired to demonstrate certain concepts, this month's
 code is intended to serve as the basis for a solid animation package.
 Objects are stored, in their entirety, in customizable structures; new
 structures can be devised for new sorts of objects.  Drawing, preparing for
 drawing, and moving are all vectored functions, so that variations such as
 shading or texturing, or even radically different sorts of graphics objects,
 such as scaled bitmaps, could be supported.  The cube initialization is
 entirely data driven; more or different cubes, or other sorts of convex
 polyhedrons, could be added by simply changing the initialization data in
 Listing Eight.

 The animation framework is not yet complete.  Movement is supported only
 along the Z axis, and then in a non-general fashion.  More interesting
 movement isn't supported at this point because of the one gaping hole in the
 package: hidden-surface removal.  Until this is implemented -- and it will
 be, soon -- nothing can safely overlap.  It would actually be easy enough to
 perform hidden-surface removal by keeping the cubes in different Z bands and
 drawing them back to front, but this gets into sorting and list issues, and
 is not a complete solution -- and I've crammed as much as will fit into this
 month's code, anyway.

 Where the Time Goes

 The distribution of execution time in the animation code is no longer wildly
 biased toward transformation, but sine and cosine are certainly still sucking
 up cycles.  Likewise, the overhead in the calls to FixedMul() and FixedDiv()
 are costly.  Much of this is correctable with a little carefully crafted
 assembly language and a lookup table; expect that soon.  When all that is
 firmly in place, we'll take a look at the number of pixels being drawn versus
 the bandwidth of display memory; that'll give us an idea of how close we are
 to the theoretical limit of VGA animation.  Probably not too close, even with
 those optimizations; a faster 2-D clipping approach and still faster
 polygon-fill code will most likely be in order.  (Yes, Virginia, there is an
 even faster way to fill polygons!)

 Regardless, this month we have made the critical jump to a usable level of
 performance and a serviceable framework.  From here on out, it's the fun
 stuff.

 X-Sharp

 Three-dimensional animation is a complicated business, and it takes an
 astonishing amount of functionality just to get off the launching pad: page
 flipping, polygon filling, clipping, transformations, list management, and so
 forth.  There's no way all of that could fit in a single column, and in fact
 I've been building toward a critical mass of animation functionality since my
 very first column in DDJ.  This month builds on code from no less than five
 previous columns.  The code that's required in order to link this month's
 animation program is: Listing One from January (draw clipped line list);
 Listings One and Six from July 1991 (mode X mode set, rectangle fill);
 Listing Six from September 1991; Listing Four in March 1991 (polygon edge
 scan); and the FillConvexPolygon() function from Listing One from February
 1991.  The struct keywords in FillConvexPolygon() must be removed to reflect
 the switch to typedefs in the animation header file.

 This is the last time I'm going to list all the code needed to build the
 animation package, and I am not going to print every change to every module
 from now on.  The code is simply getting too large to show every bit of it,
 and the scope is only going to grow as I add functions such as shading, 3-D
 clipping, and hidden surfaces; also, I'm going to be tinkering with stuff
 such as converting key code to assembler and modifying functions slightly as
 structures change, and I hate to take up valuable space in DDJ for what's
 basically fine-tuning and housekeeping.

 I think we've reached the point where we can call this an ongoing project and
 give it a name.  In the spirit of Al Stevens' wonderful D-Flat, I hereby dub
 the animation package X-Sharp.  (X for mode X, sharp because who wants a flat
 animation package?)  From now on, I will make the full source for X-Sharp
 available, complete with make files, online, and otherwise.  It will be
 available in the file XSHARPn.ARC in the DDJ Forum on CompuServe and on M&T
 Online.  Alternatively, you can send me a 360K or 720K formatted diskette and
 an addressed, stamped diskette mailer, care of DDJ (411 Borel Ave., San
 Mateo, CA 94403-3522), and I'll send you the latest copy of X-Sharp.  There's
 no charge, but, in the spirit of Al Stevens's "careware," it'd be very much
 appreciated if you'd slip in a dollar or so for the folks at the Vermont
 Association for the Blind and Visually Impaired, who help the visually
 impaired build productive, self-sufficient lives, and are amazingly
 successful at it.  Imagine for a moment trying to do your work if you lost
 your sight (and it can be done; I got a request for the text of my book Zen
 of Assembly Language in computer-readable form from a blind programmer the
 other day)--heck, imagine trying to cross the street--and I suspect you'll
 understand why it matters so much.  As Al says, it's purely voluntary, but
 both you and I will feel good.

 I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user
 name MABRASH in both cases).

 [BACK] Back
Discuss this article in the forums
Date this article was posted to GameDev.net: 7/16/1999
(Note that this date does not necessarily correspond to the date the article was written)
See Also:
Michael Abrash's Articles