Hi ya'll... yes I'm sort of back... I took a break from the whole internet thing to
work on our own projects. Sometime soon I'll share a few new things.
I'm still swamped so my updates and hanging out will be sporadic until
we get some things under control.
This is going to be a somewhat technical post so bear with me.
In terms of software oscillators and maintaining timing through
the parallel port, it's actually pretty trivial under Linux, and almost as
trivial under Windows. Now keep in mind we are NOT talking about
real-time extensions/kernel mods, that'd be cheating (or, more specifically,
the CORRECT solution, depending on who you talk to).
For example, both the "butterfly" and the "flower" samples are done
using a parallel-port-based 8-bit board I hand-wired for tinyness.
The 8-bit board is uber-small so I carry it around with me along with
a mini 1.5" oscilloscope when I want to work on shows discreetly at
coffee shops and truck stops, etc.

http://scc.sc.wfinet.com/howdy/kson_butter.AVI

http://scc.sc.wfinet.com/howdy/kson_abstract.AVI
The first step is to visualize the upper and lower boundaries.
First we'll settle on a scan speed. In Pangolin, you generally do
abstracts at 8 or 12kpps to avoid stressing the scanners, just in case the
internal "math" suggests an aggressive sweep. If you're using
a pattern which you're confident about, 18k is acceptable. I'd
avoid running at 30k on abstracts since you'll almost definitely run into
compression issues if you have a large scan angle (as I generally do).
Either way, we will assume the worst-case scenario of 30kpps.
At 30,000 points per second, that gives you, the programmer, 33us
(1/30000 of a second) to display the next point.
Now note that we're feeding the points through the parallel port,
so sending each byte will cause a 1us block because of the ISA standard
(unless we're running in EPP or ECP mode, which we'll ignore since
Microsoft effectively killed EPP, and ECP can actually hurt our performance
because of the small block size). When you're sending X, Y and, say, RGB,
that's 5 bytes of data and at least 5 bytes of control. (Remember we're
doing worst case, so optimizing these numbers isn't what we want.)
Now, following the true parallel port standard, after each write you normally
need an inb(0x80) to insert a 1us hold, but since we send a status byte
after each data byte to trigger, you can almost always skip it for this
application.
So... 10 bytes transferred = 10us.
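To make the byte count concrete, here is a rough sketch of what sending one point might look like. The wire format is entirely made up for illustration (the post doesn't describe the board's protocol): each data byte goes to the LPT1 data register, followed by a control byte that selects the channel. `send_point`, `port_write_fn` and `count_write` are hypothetical names; the writer is passed in so the routine can be dry-run without hardware.

```c
#include <assert.h>
#include <stdint.h>

#define LPT_DATA    0x378            /* LPT1 data register (usual base address) */
#define LPT_CONTROL (LPT_DATA + 2)   /* LPT1 control register */

/* Any function that writes one byte to one I/O port. */
typedef void (*port_write_fn)(uint8_t value, uint16_t port);

/* Stand-in writer for dry runs: it only counts the writes. */
int mock_writes = 0;
void count_write(uint8_t value, uint16_t port)
{
    (void)value;
    (void)port;
    mock_writes++;
}

/* ASSUMED wire format: data byte, then a control byte whose value
   selects the channel (0=X, 1=Y, 2=R, 3=G, 4=B).
   Returns the number of port writes issued for the point. */
int send_point(port_write_fn write_port, uint8_t x, uint8_t y,
               uint8_t r, uint8_t g, uint8_t b)
{
    assert(write_port != 0);
    const uint8_t channels[5] = { x, y, r, g, b };
    int writes = 0;
    for (uint8_t ch = 0; ch < 5; ch++) {
        write_port(channels[ch], LPT_DATA);  /* value onto the data lines */
        write_port(ch, LPT_CONTROL);         /* trigger/latch that channel */
        writes += 2;
    }
    return writes;
}
```

Calling `send_point(count_write, 128, 128, 255, 0, 0)` issues 10 writes, which at roughly 1us each is the 10us figure above. On real hardware under Linux/x86 the writer would be a thin wrapper around `outb(value, port)` from `<sys/io.h>`, after getting port access with `ioperm()`.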
That leaves 23us to work with... In normal (non-loaded) operation, the very
high resolution timers have 1-2us of jitter (measured empirically)
on a standard 2GHz system... on a 700MHz system you will see 2-3us of
jitter. An interesting note is that the actual jitter on all the machines
we tested (barring interrupt servicing, which we'll talk about later) is
within 2us of the lower bound! Either way, we'll just use an
unlikely high number of, say, 10us... Now if you read the kernel notes
or all established documentation, you'll see the timing quoted as 10us to 50us.
This is just not true; it's more of an average that also takes into account
the gross pauses due to system interrupts, to make sure new
programmers do not build timing dependencies on the
expectation of fast servicing.
In conclusion, this leaves us 13us for the math. Before I talk more,
I'll go on a tangent.
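Before the tangent, the arithmetic above can be tallied in a few lines. This is just a sketch of the budget using the worst-case figures from this post (30kpps, 10 bytes at about 1us each, 10us allowed for jitter); `math_budget_us` and `print_budget` are names made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Time left for math per point, in microseconds. */
double math_budget_us(double pps, int bytes_per_point,
                      double us_per_byte, double jitter_us)
{
    assert(pps > 0.0);
    double point_period_us = 1e6 / pps;            /* 30 kpps -> ~33.3 us */
    double io_us = bytes_per_point * us_per_byte;  /* 10 bytes -> 10 us  */
    return point_period_us - io_us - jitter_us;
}

/* Prints the worst-case budget used in this post. */
void print_budget(void)
{
    printf("%.1f us left for math\n",
           math_budget_us(30000.0, 10, 1.0, 10.0));
}
```

`print_budget()` prints "13.3 us left for math"; the post rounds the point period down to 33us, which gives the 13us quoted above.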
The key reason most people give for not being able to do effects or
math on a parallel port based system is the servicing of interrupts.
(This generally shows up as hot spots when you move your mouse or,
in the extreme example, a 1-2 second pause if you close the top of your
Dell laptop :evil: ) The important thing to realize is that if your system
is going to service an interrupt, it doesn't matter what it's doing, it's
going to pause anyway! An extra 13us will be unnoticeable.
The real reason they see unacceptable delays when trying to do more
than just simple playback is because of the method they use for timing.
The first code most people use is...
Code:
usleep(10); // theoretically 10us of delay, but more like 20us as there is
// usually a 10-20us latency in servicing.
The reason this is "bad" is because if it takes, say, 30us (for what should
have been 10us) then you get some bad hotspotting... And the more CPU you
take for math, the worse it gets... The built-in sleep commands on all
major OSes are not suitable for high-resolution timing purposes.
The more advanced way of writing the above is...
Code:
for (int i=0; i<10; i++) { // delay for 10us
    inb(0x80); // a generally agreed "safe" address to read from.
               // A read operation on a low IO address will cause
               // a very close to 1us delay.
}
Both ways "work", and the second works better, but still requires manual
timing and tuning to get the "magic" numbers and these numbers have to
be tweaked from system to system for optimal operation.
Since the functions are "stupid" in the sense that they don't base off an
external clock, they will also add the problem of desynchronization if you
have two systems running next to each other. (i.e. wiggle the mouse on
one box and leave the other running... the one servicing the mouse will
run slower than the one not if you wiggle for a long enough time)
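The drift is easy to see in a toy simulation. The sketch below is pure arithmetic (no real timing): it compares a loop that always waits a fixed delay against one that waits for deadlines on a clock, with an interrupt-style stall injected every so often. The function names and the stall model are assumptions for illustration only.

```c
#include <assert.h>

/* Fixed delay per point, blind to the clock: every stall is
   added on top and never recovered. Returns finish time in us. */
long fixed_delay_finish_us(int points, long period_us,
                           int stall_every, long stall_us)
{
    assert(stall_every > 0);
    long t = 0;
    for (int i = 1; i <= points; i++) {
        if (i % stall_every == 0)
            t += stall_us;          /* interrupt steals some time */
        t += period_us;             /* then the usual fixed wait  */
    }
    return t;
}

/* Clock-based pacing: each point is due at a deadline, so a
   stall just shortens the wait that follows it. */
long deadline_finish_us(int points, long period_us,
                        int stall_every, long stall_us)
{
    assert(stall_every > 0);
    long t = 0, deadline = 0;
    for (int i = 1; i <= points; i++) {
        if (i % stall_every == 0)
            t += stall_us;          /* same stall as above */
        deadline += period_us;
        if (t < deadline)
            t = deadline;           /* wait only until the point is due */
    }
    return t;
}
```

For 30000 points at 33us with a 20us stall every 1000 points, the fixed-delay loop finishes at 990600us while the deadline loop finishes at 990000us. That 600us per second is the kind of gap that opens up between the box with the wiggling mouse and the one without.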
The better way to do this is to go under all the Windows and Linux user-level
timers and talk to the kernel timer.
I'll give a Linux example since it'll fit on one page.
(You'll want a bunch of headers. I'm not near code, so I'm typing this
off the top of my head and may have typos; I'm just trying to give an
idea of what I'm talking about.)
Code:
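#include <stddef.h>
#include <sys/time.h>

void sendpointtolaserfunction(void); // you supply this

// current time in seconds, with microsecond resolution
double get_clock_time(void) {
    struct timeval tv;
    gettimeofday( &tv, NULL );
    return (double) tv.tv_sec + (double) tv.tv_usec * 1.e-6;
}

int main(void) {
    double pointdelay = 33e-6; // 33 microseconds, in seconds like get_clock_time()
    double lasttick;
    double now;
    lasttick=get_clock_time();
    for (;;) { // just a loop
        // do math here
        do { // spin until the next point is due
            now=get_clock_time();
        } while (now<lasttick+pointdelay);
        lasttick=now; // do this vs. lasttick=get_clock_time() to stay synchronized
        sendpointtolaserfunction(); // insert your function here
    }
}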
This last code segment will still suffer from system interrupts. There's no
way around that except going to a fully buffered design (like my
own wee system) or to a standalone computer optimized for
point display (like Pangolin).
Well that's pretty much it... once you free up your 10us of system time
to do math or whatever, that gives you quite a bit of leeway in what
you can do... I can do the abstracts above along with 2 full 3D transforms
(one for rotation, one for geometric correction) in 5-7 microseconds.
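For anyone who hasn't done this kind of math per point, here is a generic sketch of a 3x3 transform applied to a point, plus folding two transforms into one matrix. This is textbook matrix code, not my actual routines; `vec3`, `mat3`, `mat3_apply` and `mat3_compose` are names made up for the example, and folding the rotation and correction together is just one option when both are plain linear transforms.

```c
#include <assert.h>

typedef struct { double x, y, z; } vec3;
typedef struct { double m[3][3]; } mat3;

/* Apply a 3x3 transform to one point: 9 multiplies, 6 adds. */
vec3 mat3_apply(const mat3 *a, vec3 p)
{
    assert(a != 0);
    vec3 out = {
        a->m[0][0] * p.x + a->m[0][1] * p.y + a->m[0][2] * p.z,
        a->m[1][0] * p.x + a->m[1][1] * p.y + a->m[1][2] * p.z,
        a->m[2][0] * p.x + a->m[2][1] * p.y + a->m[2][2] * p.z
    };
    return out;
}

/* Build the matrix that applies `first`, then `second`.
   Done once per frame, not once per point. */
mat3 mat3_compose(const mat3 *second, const mat3 *first)
{
    assert(second != 0 && first != 0);
    mat3 out;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            out.m[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                out.m[i][j] += second->m[i][k] * first->m[k][j];
        }
    }
    return out;
}
```

As a check: a 90 degree rotation about Z takes (1,0,0) to (0,1,0), and composing it with a correction matrix that doubles Y gives (0,2,0) from a single `mat3_apply` call.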
If people work on a framework for this stuff, I'd be happy to drop off algorithms
and code for it! I'm just too swamped on my own projects to work on it.
Next post is doing it with a framebuffer...