PLATO Empire - Squeezing out the Speed

Techniques for Squeezing the Speed

NOTE: I apologize for the long text with few graphics. I plan on making up some useful pictures and animations to help these ideas.

When the central system is providing processing power in terms of thousands-of-instructions-per-second optimization is key. Your desktop or laptop computer has a clock cycle in terms of gigahertz (BILLIONS of instructions per second).

To be fair, the Control Data Cyber was not a slouch in processing, especially for its time. It used 60-bit words for memory; our PCs used to be one or two bytes for a word and most now are four bytes (32-bits). The big systems today are just in the past few years handling 64-bit words. One difference,though, is we're talking about a computer that took up a lot of floor space and enough power to almost require a substation (another bit of hyperbole) and your PC is on your desk or sitting on your lap.

As mentioned elsewhere, user foreground processing was set at 30 (or 50?) TIPS in the early years and periodically reduced to 10 TIPS. If you ran in background mode you weren't even guaranteed to get any time, but could get a lot higher than the throttle of 10 TIPS.

Also, think about the communications: PLATO talked to its terminals at 1200 BAUD, that's 1200 bits of information per second. (It actually was a bit higher than 1200 baud, like 1280.) Today's low-end DSL pipe to the Internet is 120K or 256K or 100 times faster or more. Display to the terminal needed to be kept optimum as well.

We're in the Proximity of "collision" Detection

The single biggest use of computing time by Empire was those inner most loops of the most-used sections of code, the collision detection. Collision between what? Torpedoes and ships, for example, but also whether your view of space can "see" another object.

For EACH object the location of each other object needed to be compared to see if they were in the same space or proximity to each other. Our goal was to make this as fast as possible, trying to be down below a second per update.

Each time your screen updated, your view of space had to be checked against every other object to determine if it should be displayed.

Empire 3 had up to fifty (50) players each with their own ship. Each player could shoot up to three torpedoes. That is a possibility of 150 torpedoes, 50 ships, and XX planets to compare against.

If you have N objects, including yourself, you need (N-1)+(N-2)+...+ 2 + 1 comparisons because once we've checked object 1 against objects 2 through N, we don't have to check object 2 against object 1; that comparison has already been made.

This series works out to (N - 1) * (N / 2).

N Total Comparisons
1045
100450
200900

So if I'm comparing against 200 objects that's almost 1,000 comparisons, which works out to a lot of instructions and when you only have TEN TIPS it ends up way over a second. Need to reduce the total number of comparisons needed.

Cutting Down to Necessary Comparisons

One simplification: we don't care if two torpedoes are in the same "space". We also don't care if a torepedo goes through a planet. We can think of space as being kind-of three dimensional. So, instead of N being 200 or more, we're only comparing each ship against each torpedo or Ships * Torps rather than ((Ships + Torps)!.

BUT REMEMBER, an object is not represented by a single value, it's location has both and X and a Y value AND we're not the X of object1 to the X of object 2, it is an area around the object; in the case of a collision of a ship against another ship or torpedo, it is the location of the ship to the area AROUND the other object. In the case of a screen display update, the object would be the area around the X and Y coordinate at the center of the view space. (Whew!)

So, is X1 > (X2-offsetX) AND X1 < (X2+offsetX) AND Y1 > (Y2-offsetY) AND Y1 < (Y2+offsetY).

Since for any time through the comparisons the offsets can be considered constants, those four calculations should be done prior to the inner loop. [Modern compilers MAY do that optimization, but it is cleaner to be explicit about it.]

Note also the "AND". Here if any comparison is false there is no reason to continue doing the other comparisons. The object is not within range. Go on to the next object.

Sorry for the Math. Trust me, it was a LOT fewer comparisons.

And trust me, even with the inner loops optimized to the max, we were still running slow. We needed a faster way.

In PLATO TUTOR the loop would look like this (NOTE: the assignment arrow in TUTOR has been replaced by "=" for simplicity.):

Comparison Loop in PLATO TUTOR:

calc	 posx_lft=obx(targ) - proxlft $$ pos x,y a 'constant' for this loop
	 posx_rgt=obx(targ) - proxrgt
	 posy_bot=oby(targ) - proxbot
	 posy_top=oby(targ) - proxtop
	 i=0;

loop    (i < objhi)
.       calc    i=i+1
reloop  (obx(i) < posx_lft)
reloop  (obx(i) > posx_rgt)
reloop  (oby(i) < posy_bot)
reloop  (oby(i) > posy_top)
.	 $$ object is within proximity. BOOM.
endloop

If the comparison on the -reloop- command is TRUE, then it immediately returns to the top of the loop. Here, if any of the four comparisons is true, then the object is NOT within proximity.

You can not determine exactly how many comparisons are required because it depends on the always-changing locations of all the objects. The nature of the game tends to draw most of the players toward each other in order to have conflict, something like young soccer players all chasing the ball, so more comparisons will be likely. The maximum case will happen when ALL objects are within proximity or 4 * ((N-1) * (N/2)) comparisons; or 6 * ((N-1) * (N/2)) comparisons if you are working in 3D space, which is what I was trying to optimize for Empire S and Empire V.

Pipeline Processing -findall- or the Masked Bandit

One of the reasons I am writing this section in detail is not just to make a basic primer for computer game design or intro to computer science, this next particular technique I am going to detail potentially has significant use in many areas, now that processors are back into the 64-bit word sizes or larger.

I'm not sure of when the PLATO systems programmers came out with the -find- and -findall- commands, but I'm thinking it was spring 1975. Two reasons I think it was this date:

Seymour Cray designed some of the fastest supercomputers of his time. The Control Data Cybers were his design and later he formed his own company to make the Cray 1 and other supercomputers. On many computers some calculations, such as addition and subtraction, can take only one processing cycle. Other calculations, like multiplication and especially division, can take many cycles. Cray segmented each of these calculations into separate functional units, so if your calculation were to be to add the values of each element of two arrays of numbers together, then after the first cycle the result of adding the first two elements together could be stored, on the next cycle, the next two, and so on.

This is called pipelining.

The -findall- command made use of this design feature so you could do a comparison between a value and an array of values. It provided for the use of a "mask" so that specific bit elements could be seen and others masked off and the result would be an array indicating just the elements that matched the comparison.

Ok. You're thinking... So?

Recall from earlier sections ( Empire 3 Pieces of the Pie and Empire S Problem with the Segment") that we have the arrays of X and Y coordinates in a vertical array, both X and Y in the same word for each element. So, X(12) and Y(12) were both packed into the same 60-bit word.

Now do you see it?

The X and Y coordinate for each object are both stored together packed into a single word. The coordinates of the objects are an array of 60-bit words. ALL you've got to do is make the right target object and mask and do a single -findall- which will provide a list of objects that match. After some setup calculations one single -findall- command generates an array of all objects that are either what should be displayed or that need to blow up.

Oops. But what should that single target and mask look like? That's what took me the year of pondering to come up with and even after a lot of scribbling, drawings, long walks, sleep-deprived nights, it finally took me gathering the team together to break through. The mix of us participating varied, but consisted of myself, Gary Fritz, Chuck Miller, Mike Rodby (if I recall right), Jim Battin (I think?), and likely some others. Sorry if I missed anyone.

We spent many evenings drawing on all of the chalk boards in the room next to the terminal room in the Education building. At the end of the evening we'd take away many notes, think about the ideas for another day or two, reconvene, and talk more. If I recall right, we spent almost two weeks of evenings and through some weekends, off and on, talking about this idea.

Eventually, Chuck Miller came in with an idea that was well thought out and after a couple tweaks actually worked!

Chart drawn on PLATO summer of 1976 by Chuck Miller (I think).

-findall- grid-shift technique

This takes a little visualization and a lot of explanation to understand. Let me try to explain it as simply as possible, first, then maybe add more detail. If this isn't clear, send me an email.

This chart represents ONLY the X coordinate mask. We'll need to do the same thing for the Y coordinate.

Start by looking at the bottom grids. There are ships A, B, C, D, E, and F on the grid and think of them as being "close" on the Y-axis. We're just concerned here about thinking about the X axis. These represent that each ship is in a different sector of 512x512 (another power of two).

Chuck divided the larger grid into "fine-grid" and "gross-grid". Maybe he was drawing the whole thing out on one of those pre-printed grids of graph paper with the darker grid every eight or ten lines. Eight lines, of course, works out well as a power of two it that becomes important.

The location of the four ships (A to F) is important since it tests various conditions. A, B, and C are in Gross Grid #1. D, E, and F are in Gross Grid #2.

Which ships should each other ship be able to see?

If each ship is only looking at one sector, none.

Which should you see if the ship is looking at just each of the sectors around the sector it is in? A and B would see each other. E and F would see each other.

If you were using my mask idea, which was trying to mask sections of the actual X and Y coordinates, there would always be cases where C and D would NOT see each other and that's the reason I was having difficulty figuring out a way to use -findall-.

What Chuck did was use each bit to represent a whole sector. The problem is if you want space to be a lot more than 60 sectors wide, for example or to fit both X and Y target/masks into one word for -findall- (i.e. give 30-bits to X and 30 to Y), then you needed a little different representation.

So, Chuck used the Gross Grid to represent where in all of space the smaller representation of the Fine-Grid was located.

Said another way, go look at that sheet of grid paper. If you put an X in the third big grid down and second over, the Gross Grid would tell you it was in the 2 x 3 gross grid and the Fine Grid tells you where within that particular Gross Grid.

Note, though, on the Gross-Grid chart in the upper right and look back and forth between it and the layout of the ships in space below. See how ship A is in the left half of Gross Grid #1? That's why it shown as "1 A" in the top. (There actually would be two bits set to represent the location, not an "A" squished into that bit.)

Note that "B" is shown as "B 1" and so on. If you were scanning down through the Gross-Grid list looking for ships within a certain range, ship "A" would have B, C, D, and E show up in that part of the scan, but ship F would not show up because the mask only shows the two bits that ship A "might" see.

You have to refine it more, though, which is where the Fine-Grid side comes in. Chuck laid it out so that four bits on each side of your target bit would be set. Then, depending on your mask will determine how many objects you will actually see.

In this example, ship F is way far away from ship A, nine sectors. Each would see the other only if they were using their long-range scanners. (We had some discussions about alerting you if someone were using longer range scanners that your sensors would pick it up, whereas if they were using visual or closer range scanners, maybe no alert would happen. An aside to the main topic.)

Chuck also wanted to have the proximity size be variable. So that a photon torpedo would "see" only ships really close to it, but also so that a ship could use long-range sensors. You just need to put a certain number of bits in your mask to find objects within a certain view space (proximity).

Magic

It seems magical to have a single masked command to return a list of object candidates. In my opinion, this saved the playability of the game.

By the way, you are not required to represent X and Y (or XYZ) in this manner to get a lot of speed. One could just use the X (or Y or Z) to get a slice of space. The probability of a bunch of objects in that slice is a lot smaller. Then, you just check each of the found objects to ensure they are actually ones you need to contend with.

Sharing Time

Even with fantastic methods for collision detection and reducing the number of objects involved, Empire was still on a timesharing system. Occasional reports of odd activity would be experienced. Since we did not have control of the system we had to make do. We could have a minimum update time, say no more than once a second, to go through a loop to update object locations for movement. Another user could compare all of the object locations.

What, for example, happens when one of the users is running in background mode and gets frozen by the system? And, what happens if that user is frozen in the middle of one of the updates, whether in updating the movement of the objects or in the comparisons?

If one user has the luck of updating ALL of the object locations, but is frozen, then all objects stop moving.

If one user is doing collision detection and is frozen, then no objects will be indicated for collision (explosion).

The game sort of blurps along in spurts in some scenarios, but only occasionally.

We ended up having to split up the load, distributing it among the players. This meant that while Empire was running on a timesharing system, it was also timesharing within the program, which made for some complexities.

Later, after I moved to Minneapolis, someone (Jim Battin?) developed an option adding it to EmpHelp so that one or more logins could run in a tight, minimum display loop, taking some of the processing task. By having these runners (others may call these processes 'daemons') doing the update task, the players then could generally expect to have a smoother play.