# Speeding up OpenGL calls

A game with decent graphics (not even “good” by today’s standards) will crank through something on the order of 50,000 OpenGL calls per frame. Maintaining a 30 FPS framerate, then, would require 1,500,000 calls per second. According to my benchmark, on my box, Perl is capable of making 2,100,000 do-nothing sub calls per second (regardless of whether the call goes to Perl or C code). So in a program like that, 71% of the processing power will be spent simply making the OpenGL calls. That leaves 29% for everything else, including whatever processing OpenGL does within its functions on the main CPU, which simply won’t do.
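For anyone who wants to check that arithmetic or redo the measurement, here is a small sketch. The first half just recomputes the figures quoted above. The second half times an empty sub (the name `noop` is mine, not from the original benchmark) and will of course report a different rate on different hardware and Perl versions.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Figures quoted in the post.
my $calls_per_frame = 50_000;
my $frames_per_sec  = 30;
my $sub_calls_per_s = 2_100_000;    # the rate measured on the author's box

my $needed   = $calls_per_frame * $frames_per_sec;    # 1_500_000
my $fraction = $needed / $sub_calls_per_s;            # roughly 0.714

printf "calls needed per second: %d\n", $needed;
printf "share of CPU spent on call overhead: %.0f%%\n", 100 * $fraction;

# Rough re-measurement on the current machine (hypothetical helper name).
sub noop { }

my $n     = 1_000_000;
my $start = time;
noop() for 1 .. $n;
my $elapsed = (time - $start) || 1e-9;    # guard against a zero interval

printf "measured here: %.0f empty sub calls per second\n", $n / $elapsed;
```

Plug your own measured rate into `$sub_calls_per_s` to see how much headroom your machine leaves.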

So I’ve been thinking about ways to speed these up. Display lists are clearly one solution, but they’re not versatile enough. What if you wanted to cycle the colors or textures on a large structure you’re building (something a friend of mine turns out to be doing)? You can’t use a display list, since you need to set the colors in between the calls.

What we’re battling here is the symbol table lookup and the pure op dispatch speed. We can’t get away from the symbol table lookup as long as we’re in Perl code [1]. It’s possible to get op dispatch overhead really low, as Parrot has so kindly pointed out. So I was thinking about a pipeline display list structure. Let me explain by example:

```
my $list = pipeline {
    glcall qw{glPushMatrix};
    glcall qw{glTranslate 1 0 0};   # qw splits on whitespace, so no commas
    glcall qw{glColor $ $ $};       # extract three numbers from the pipeline
    prism_pipeline();               # might extract more numbers
    glcall qw{glPopMatrix};
};
my ($r, $g, $b) = @_;
while (1) {
    $r += STEP * 0.1;
    $g += STEP * 0.3;
    $b += STEP * 0.7;
    $list->call;
    feed sin $r, sin $g, sin $b;
    prism_feed();   # generate the numbers that prism_pipeline asked for
}
```

We’ve turned four calls into two, and with more complex figures, who-knows-how-many into who-knows-how-few. The pipeline code would be compiled into a quick CGP loop like Parrot uses (verging on the speed of a JIT), and it would just pull numbers out of the pipeline as it needed them. Perhaps we could even build in some simple branching instructions, and, well, maybe just use Parrot, but use a Parrot that’s ready for production.

But that’s the beauty of the interface. When Parrot does become available for production, we could just slap it in place of the original CGP loop, and then we have all of Parrot’s neato stuff along with it.
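To make the interface a bit more concrete, here is a rough pure-Perl sketch of how `pipeline`, `glcall` and `feed` might fit together. Everything in it is an assumption for illustration: the `%GL` table holds logging stand-ins rather than real OpenGL bindings, the `Pipeline` package name is made up, and values are fed before `call` rather than after it, which is simpler than the ordering in the loop above. It shows only the shape of the API. The replay loop is still ordinary Perl, so it gains no speed; the win would come from moving that loop into compiled code.

```perl
use strict;
use warnings;

# Stand-ins for GL entry points: they only log what they were given.
my @log;
my %GL = (
    glPushMatrix => sub { push @log, "glPushMatrix()" },
    glPopMatrix  => sub { push @log, "glPopMatrix()" },
    glTranslate  => sub { push @log, "glTranslate(@_)" },
    glColor      => sub { push @log, "glColor(@_)" },
);

our $building;    # op list being recorded, if a pipeline block is running
my @queue;        # numbers waiting to be consumed by '$' slots

# Record the call while a pipeline is being built; otherwise run it at once.
sub glcall {
    my ($name, @args) = @_;
    my $fn = $GL{$name} or die "unknown GL call: $name";
    return $fn->(@args) unless $building;
    push @$building, [$fn, @args];
    return;
}

# Run the block once to record its calls and return the recorded list.
sub pipeline(&) {
    my ($block) = @_;
    local $building = [];
    $block->();
    return bless [@$building], 'Pipeline';
}

# Queue up numbers for the next replay.
sub feed { push @queue, @_; return }

# Replay the recorded calls, filling each '$' slot from the queue.
sub Pipeline::call {
    my ($self) = @_;
    for my $op (@$self) {
        my ($fn, @args) = @$op;
        my @filled;
        for my $arg (@args) {
            if ($arg eq '$') {
                die "pipeline ran out of fed values" unless @queue;
                push @filled, shift @queue;
            }
            else {
                push @filled, $arg;
            }
        }
        $fn->(@filled);
    }
    return;
}

my $list = pipeline {
    glcall qw{glPushMatrix};
    glcall qw{glTranslate 1 0 0};
    glcall qw{glColor $ $ $};
    glcall qw{glPopMatrix};
};

feed 0.5, 0.25, 1;
$list->call;
print "$_\n" for @log;
```

The recorded list is plain data (a code ref plus its arguments per op), which is what would make it possible to swap the Perl replay loop for a faster backend later without touching the calling code.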

Now, one problem with the interface I just showed is that it lexically separates the pipeline creation from the pipeline feed, which is a maintenance nightmare for sure. I wonder if I could stick it in a heredoc and actually parse out the piped bits (or hey, maybe even a source filter…). Naturally, making an OpenGL call when there’s no pipeline in scope would just make the call immediately, rather than compiling anything. But I wonder if that would give people the impression that they could do more than they actually could. Perhaps I could make one conditional call like:

```
if ($BUILDING) {
    glcall qw{glPushMatrix};
    # ...
}
else {
    feed $r, $g, $b;
}
```

And that would just become an idiom. I could call `if ($BUILDING)` something else to make it less confusing, like:

```
draw {
    glcall qw{glPushMatrix};
    # ...
}
with {
    feed $r, $g, $b;
}
```

And the more you separate those, the more speed you get (I of course don’t want it that way, but o/~ you can’t always get what you want o/~). Then instead of making `draw` and `with` `sub(&)`s, I could source filter them out into the `if` structure above, since it’s faster (you see that Perl’s speed becomes a pain here). Of course, there’s another advantage to that, which is that I can just dynamically redefine the regular OpenGL names so it looks like regular GL code, instead of that `glcall qw{}` incantation.
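For reference, the `sub(&)` flavour of `draw`/`with` could look roughly like the sketch below. It is only a guess at one possible behaviour: the build block runs once per call site (keyed on file and line, which is my own choice) and the `with` block runs on every pass. The counters stand in for real GL recording and feeding. The source-filter variant is not shown.

```perl
use strict;
use warnings;

my %built;    # call site => true once its draw block has been recorded

# 'with' just passes its block through, so 'draw' receives two code refs.
sub with(&) { return $_[0] }

sub draw(&;@) {
    my ($build, $feeder) = @_;
    my $site = join ':', (caller)[1, 2];    # file:line of the draw statement

    # First visit to this call site: run the build block to record the calls.
    $build->() unless $built{$site}++;

    # Every visit: run the feed block to supply this frame's numbers.
    $feeder->() if $feeder;
    return;
}

# Demonstration with counters in place of GL calls.
my ($builds, $feeds) = (0, 0);
for my $frame (1 .. 3) {
    draw { $builds++ } with { $feeds++ };
}
print "built $builds time(s), fed $feeds time(s)\n";
```

Each pass still pays for two closure calls and a `caller` lookup, which is exactly the overhead that rewriting the construct into a plain `if` would avoid.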

Ideas? I don’t want to have to use Inline::C in my games just to get them to draw fast.

[1] Sure, I could use a source filter and turn the function calls into, say, opcode numbers, but then you still have to do the lookup on whatever function you’re using to send the numbers.