Raw Speed and more (X-sharp 3-d prog.)
Journal: Dr. Dobb's Journal April 1992 v17 n4 p139(7)
-----------------------------------------------------------------------------
Title: Raw speed and more. (Graphics Programming column; X-Sharp 3-D
animation program) (Tutorial)
Author: Abrash, Michael.
AttFile: Program: GP-APR92.ASC Source code listing.
Abstract: X-Sharp three-dimensional (3-D) software has assembly language
added for performance improvement, and for matrix math functions.
Several functions can be eliminated by converting them to pure
assembly language, such as the function to multiply a matrix by a
vector, and the function to concatenate matrices. Depth sorting
is useful for handling hidden surfaces, however it does not
address the issue of nonconvex objects. Convex polyhedrons are
addressed. Fast microcomputer-based animation is not seriously
affected by problems with depth sorting, problems that can affect
photo-realistic rendering to a greater extent. Features that are
still to be added to X-Sharp include shading, antialiasing,
support for non-convex objects, 3-D clipping, texture mapping and
performance improvement.
-----------------------------------------------------------------------------
Descriptors..
Topic: Assembly Language
Tutorial
Programming Instruction
Graphics Software
Optimization
Performance Improvement
Software Design
Animation.
Feature: illustration
cartoon
program.
-----------------------------------------------------------------------------
Full Text:
As usual, there's an awful lot to cover this month, so I'll have to make this
quick. That's a shame, because this absolutely true story is even lovelier
in the long version, as told by--well, I promised I wouldn't even hint at who
he is, for reasons that will soon be obvious.
Years ago, this friend of mine--let's call him Bert--went to Hawaii with
three other fellows to celebrate their graduation from high school. This was
an unchaperoned trip, and they behaved pretty much as responsibly as you'd
expect four teenages to behave, which is to say, not; there's a story about a
rental car that, to this day, Bert can't bring himself to tell. They had a
good time, though, save for one thing: no girls.
By and by, they met a group of girls by the pool, but the boys couldn't get
past the hi-howya-doin stage, so they retired to their hotel room to plot a
better approach. This being the early '70s, and them being slightly tipsy
teenagers with raging hormones and the effective combined IQ of four
eggplants, it took them no time at all to come up with a brilliant plan:
streaking. The girls had mentioned their room number, so the boys piled into
the elevator, pushed the button for the girls' floor, shucked their clothes
as fast as they could, and sprinted to the girls' door. They knocked on the
door and ran on down the hall. As the girls opened their door, Bert and his
crew raced past, toward the elevator, laughing hysterically.
Bert was by far the fastest of them all. He whisked between the elevator
doors just as they started to close; by the time his friends got there, it
was too late, and the doors slid shut in their faces. As the elevator began
to move, Bert could hear the frantic pounding of six fists thudding on the
closed doors. As Bert stood among the clothes littering the elevator floor,
the thought of his friends stuck in the hall, naked as jaybirds, was just too
much, and he doubled over with helpless laughter, tears streaming down his
face. The universe had blessed him with one of those exceedingly rare
moments of perfect timing and execution.
The universe wasn't done with Bert quite yet, though. He was still contorted
with laughter--and still quite thoroughly undressed--when the elevator doors
opened again. On the lobby.
And with that, we come to this month's topics, raw speed and hidden surfaces.
Raw Speed, Part I: Assembly Language
I would like to state, here and for the record, that I am not an assembly
language fanatic. Frankly, I prefer programming in C; assembly language is
hard work, and I can get a whole lot more done with fewer hassles in C.
However, I am a performance fanatic, performance being defined as having
programs be as nimble as possible in those areas where the user wants fast
response. And, in the course of pursuing performance, there are times when a
little assembly language goes a long way.
We're now in the fourth month of working on the X-Sharp #-D animation
package. In real-time animation, performance is sine qua non--Latin for
"Make it fast or find another line of work"--so some judiciously applied
assembly language is in order. Last month, we got up to a serviceable
performance level by switching to fixed-point math, then implementing the
fixed-point multiplication and division functions in assembler in order to
take advantage of the 386's 32-bit capabilities. There's another area of the
program that fairly cries out for assembly language: matrix math. The
function to multiply a matrix by a vector (XformVec()) and the function to
concatenate matrices (ConcatXforms()) both loop heavily around calls to
FixedMul(); a lot of calling and looping can be eliminated by converting
these functions to pure assembly language.
Listing One, page 157, is the module FIXED.ASM from the current version of
X-Sharp, with XformVec() and ConcatXforms() implemented in assembly language.
The code is heavily optimized, to the extent of completely unrolling the
loops via macros so that looping is eliminated althogether. FIXED.ASM is
highly effective; the time taken for matrix math is now down to the point
where it's a fairly minor component of execution time, representing less than
ten percent of the total. It's time to turn our optimization sights
elsewhere.
Raw Speed, Part II: Look it Up
It's a funny thing about Turbo Profiler: Time spent in the Borland C++ 80x87
emulator doesn't show up directly anywhere that I can see in the timing
results. The only way to detect it is by way of the line that reports what
percent of total time is represented by all the areas that were profiled; if
you're profiling all areas, whatever's not explicitly accounted for seems to
be the floating-point emulator time. This quirk fooled me for a while,
leading me to think sine and and cosine weren't major drags on performance,
because the sin() and cos() functions spend most of their time in the
emulator, and that time doesn't show up in Turbo Profiler's statistics on
those functions. Once I figured out what was going on, it turned out that
not only were sin() and cos() major drags, they were taking up over half the
total execution time by themselves.
The solution is a lookup table. Listing One contains a function called
CosSin() that calculates both the sine and cosine of an angle, via a lookup
table. The function accepts agnles in tenths of degrees; I decided to use
tenths of degrees rather than radians because that way it's always possible
to look up the sine and cosine of the exact angle requested, rather than
approximating, as would be required with radians. Tenths of degrees should
be fine enough control for most purposes; if not, it's easy to alter CosSin()
for finer gradations yet. GENCOS.C, the program used to generate the lookup
table (COSTABLE.INC), included in Listing One, can be found in the X-Sharp
archive. GENCOS.C can generate a cosine table with any integral number of
steps per degree.
FIXED.ASM speeds X-Sharp up quite a bit, and it changes the performance
balance a great deal. When we started out with 3-D animation, calculation
time was the dragon we faced; more than 90 percent of the total time was
spent doing matrix and projection math. Additional optimizations in the area
of math could still be made (using 32-bit multiplies in the backface-removal
code, for example), but fixed-point math, the sine and cosine lookup, and
selective assembler optimizations have done a pretty good job already. The
bulk of the time taken by X-Sharp is now spent drawing polygons, drawing
rectangles (to erase objects), and waiting for the page to flip. In other
words, we've slain the dragon of 3-D math, or at least wounded it grievously;
now we're back to the dragon of polygon filling. We'll address faster
polygon filling soon, but for the moment, we have more than enough horsepower
to have some fun with. First, though, we need one more feature: hidden
surfaces.
Hidden Surfaces
So far, we've made a number of simplifying assumptions in order to get the
animation to look good; for example, all objects must currently be convex
polyhedrons. We'll deal with that one down the road a little, but first we
have to address a still more fundamental limitation, which is that right now,
objects can never pass behind or in front of each other. In short, it's time
to have a look at hidden surfaces.
There are a passel of ways to do hidden surfaces. Way off at one end (the
slow end) of the spectrum is z-buffering, whereby each pixel of each polygon
is checked as it's drawn to see whether it's the frontmost version of the
pixel at those coordinates. At the other end is the technique of simply
drawing the objects in back-to-front order, so that nearer objects are drawn
on top of farther objects. The latter approach, depth sorting, is the one
we'll take today. (Actually, true depth sorting involves detecting and
resolving possible ambiguities when objects overlap in z; simply sorting on
z, which we'll be doing this month, is more precisely known as the "painter's
algorithm.")
Depth sorting is fast but less than perfect. For one thing, it doesn't
address the issue of nonconvex objects, so we'll have to stick with convex
polyhedrons for now. For another, there's the question of what part of each
object to use as the sorting key; the nearest point, the center, and the
farthest point are all possibilities--and, whichever point is used, depth
sorting doesn't handle some overlap cases properly. Figure 1 illustrates one
case in which back-to-front sorting doesn't work, regardless of what point is
used as the sorting key.
For photo-realistic rendering, these are serious problems.
For fast PC-based animation, however, they're manageable. Choose objects
that aren't too elongated; arrange their paths of travel so they don't
intersect in problematic ways; and, if they do overlap incorrectly, trust
that the glitch will be lost in the speed of the animation and the complexity
of the screen.
Listing Two, page 159, shows the key routines for depth sorting, from the
X-Sharp file OLIST.C. Objects are now stored in a linked list. The initial,
empty list, created by InitializeObjectList(), consists of a sentinel entry
at either end, one at the farthest possible z coordinate, and one at the
nearest. New entries are inserted by AddObject() in z-sorted order. Each
time the objects are moved, before they're drawn at their new locations,
SortObjects() is called to z-sort the object list, so that drawing wil
proceed from back to front. The z-sorting is done on the basis of the
objects' center points; a center-point filed has been added to the object
structure to support this, and the center point for each object is now
transformed along with the vertices. That's really all there is to depth
sorting--and now we can have objects that overlap in x and y.
I wish I could reproduce the lastes X-Sharp demonstration program here; I
really do. It's a lot of fun, with 11 spinning cubes and a whirling faceted
ball bouncing back and forth between the screen and deep space at varying
speeds. The number of faces has nearly doubled since last month, to 138, and
the number of vertices is up to 150, but thanks to FIXED.ASM, the animation
is snappier than ever, and of course there's a much-enhanced sense of depth,
thanks to the visual cues made possible by depth sorting. We've reached the
point of genuinely dynamic animation on a 386, even with a slowpoke VGA; my
daughter thinks it's a little scary having those things hurtling toward her
so fast. On a 486 with a fast (Tseng ET-4000) VGA, the animation is too
fast; the illusion of reality is broken. We should all have such problems.
I'd like to list the demo program in its entirety, but I can't. As I
explained last month, X-Sharp is way too large, over 2000 lines now; even the
changes since last month would blow my page budget out of the water.
Instead, I've made the full source available in the file XSHARPn.ARC in the
DDJ Forum on CompuServe, on M&T Online, and in the graphic.disp conference on
Bix. Alternatively, you can send me a 360K or 720K formatted diskette and an
addressed, stamped diskette mailer, care of DDJ, 411 Borel Ave., San Mateo,
CA 94402, and I'll send you the latest copy of X-Sharp. There's no charge,
but it'd be very much appreciated if you'd slip in a dollar or so to help out
the folks at the Vermont Association for the Blind and Visually Impaired.
It's not every day that you can make a difference so easily--and, believe me,
their work does make a difference.
I'm available on a daily basis to discuss X-Sharp on M&T Online and Bix (user
name mabrash in both cases).
Rounding
FIXED.ASM containes the equate ROUNDING[underscore]ON. When this equate is
1, the results of multiplications and divisions are rounded to the nearest
fixed-point values; when it's 0, the results are truncated. The difference
between the results produced by the two approaches is, at most, [2.sup.-16];
you wouldn't think that would make much difference, now, would you? But it
does. When the animation is run with rounding disabled, the cubes start to
distort visibly after a few minutes, and after a few minutes more they look
like they've been run over. In contrast, I've never seen any significant
distortion with rounding on, even after a half-hour or so. I think the
difference with rounding is not that it's so much more accurate, but rather
that the errors are evenly distributed; with truncation, the errors are
biased, and biased errors become very visible when they're applied to
right-angle objects. Even with rounding, though, errors will eventually
creep in, and reorthogonalization will become necessary at some point. We'll
have a look at that soon.
The performance cost of rounding is small, and the benefits are highly
visible. Still, truncation errors become significant only when they
accumulate over time, as, for example, when rotation matrices are repeatedly
concatenated over the course of many transformations. Some time could be
saved by rounding only in such cases. For example, division is performed
only in the course of projection, and the results do not accumulate over
time, so it would be reasonable to disable rounding for division.
Having a Ball
For three months, we've had nothing to look at but triangles and cubes. It's
time for something a little more visually appealing, so the demonstration
program now features a 72-sided ball. What's particularly interesting about
this ball is that it's created by the GENBALL.C program in the BALL
subdirectory of X-Sharp, and both the size of the ball and the number of
bands of faces are programmable. GENBALL.C spits out to a file all the
arrays of vertices and faces needed to create the ball, ready for inclusion
in INITBALL.C. True, if you change the number of bands, you must change the
Color array in INITBALL.C to match, but that's a tiny detail; by and large,
the process of generating a ballshaped object is now automated. In fact,
we're not limited to ball-shaped objects; substitute a different vertex and
face generation program for GENBALL.C, and you can make whatever convex
polyhedron you want; again, all you have to do is change the Color array
correspondingly. You can easily create multiple versions of the base object,
too; INITCUBE.C is an example of this, creating 11 different cubes.
What we have here is the first glimmer of an object-editing system.
GENBALL.C is the prototype for object definition, and INITBALL.C is the
prototype for general-purpose object instantiation. Certainly, it would be
nice to have an interactive 3-D object editing tool and resource management
setup, and perhaps we'll do that someday. We have our hands full with the
drawing end of things at the moment, though, and for now it's enough to be
able to create objects in a semiautomated way.
Eating Crow and Other Delicacies
Back in November, I said, "Hicolor programming unavoidably requires handling
broken rasters" (Hicolor mode features 32K colors, by means of the Sierra
Hicolor DAC combined with a Hicolor-capable Super-VGA.) Any absolute
assertion on my part seems to be the cue for readers to come swarming out of
the woodworks with examples to the contrary. In this case, M&T Online user
rfredrick was quick to point out that setting the bitmap width--th offset
from the start of one line to the start of the next--to 1928 bytes in 640X480
Hicolor mode eliminates broken rasters (displayed lines that span banks). A
bitmap width of 1928 is selected by setting the Row Offset register (CRTC
register 13h) to 241, like so: outpw(0x3d4, 0x13 \ (241 << 8));. Once the
width is set, all you have to do is use 1928 for the offset from one row to
the next, rather than 1280, and you're all set. Actually, the breaks are
still there, but they can be ignored because they happen off to the right of
the displayed portion of the bitmap. To get the Hicolor drawing code from
the November 1991 column to support 1928-wide Hicolor bitmaps, just change
BitmapWidthInBytes from 640*2 to 1928.
rfrederick credits this information to the folks at Everex. Thanks to all
involved for passing it along.
Coming Up
There are a lot of wonderful things to add to X-Sharp, including shading of
many sorts, antialiasing, support for non-convex objects, faster polygon
drawing and clipping, 3-D clipping, texture mapping, still better
performance--you get the idea. Personally, I'm excited as all get-out about
this stuff, and I'll certainly cover more of it in the near future. Graphics
is a broad field, though, and you folks have diverse interests; I'd like to
hear what you're interested in seeing in this space. More 3-D animation?
More animation of other types? Programming techniques for PC graphics
hardware, such as 24-bpp VGAs, the S3 Super-VGA accelerator, and XGA? JPEG?
Graphics operations such as seed fills and fast (and I do mean fast!) line
drawing? Color mapping to a 256-color palette? Dithering? Something else?
It's a big list, because this is a big--and tremendously exciting--field. So
long as there's code to be written for a topic (after all, this is the
Graphics Programming column), I'm game. Let me know what you'd like to see.
[BACK] Back
Discuss this article in the forums
See Also: © 1999-2011 Gamedev.net. All rights reserved. Terms of Use Privacy Policy
|