
4. High-resolution timing

4.1 Delays

First of all, I should say that user-mode processes cannot be guaranteed exact control of timing, because Linux is a multi-tasking system. Your process might be scheduled out at any time for anything from about 10 milliseconds to a few seconds (on a system with very high load). However, for most applications using I/O ports, this does not really matter. To minimise the effect, you may want to give your process a higher priority with nice (see the nice(2) manual page) or use real-time scheduling (see below).

If you want more precise timing than normal user-mode processes give you, there are some provisions for user-mode `real time' support. Linux 2.x kernels have soft real time support; see the manual page for sched_setscheduler(2) for details. There is a special kernel that supports hard real time; see http://luz.cs.nmt.edu/~rtlinux/ for more information on this.
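As a rough sketch of the soft real time route, the following asks for the SCHED_FIFO policy for the calling process. The helper name request_fifo_scheduling() and the choice of the lowest valid priority are illustrative assumptions, not something this document prescribes; the call normally needs root privileges and fails otherwise.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: ask for SCHED_FIFO at the lowest real-time priority.
 * Returns 0 on success, -1 on failure (errno is set by the system call). */
int request_fifo_scheduling(void)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = sched_get_priority_min(SCHED_FIFO);

    /* pid 0 means "the calling process" */
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

A program would typically call this once at startup and carry on with normal scheduling if it returns -1.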

Sleeping: sleep() and usleep()

Now, let me start with the easier timing calls. For delays of multiple seconds, your best bet is probably to use sleep(). For delays of at least tens of milliseconds (about 10 ms seems to be the minimum delay), usleep() should work. These functions give the CPU to other processes (``sleep''), so CPU time isn't wasted. See the manual pages sleep(3) and usleep(3) for details.

For delays of under about 50 milliseconds (depending on the speed of your processor and machine, and the system load), giving up the CPU takes too much time, because the Linux scheduler (for the x86 architecture) usually takes at least about 10-30 milliseconds before it returns control to your process. Because of this, for short delays usleep(3) usually delays somewhat longer than the amount you specify, and at least about 10 ms.
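To see how far a short usleep() overshoots on a particular machine, you can time it yourself. This is a minimal sketch; the helper name measure_usleep() is made up for the illustration, and the result depends on the kernel, the hardware, and the system load.

```c
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: sleep with usleep() and return the wall-clock time
 * that actually passed, in microseconds (or -1 if a call failed). */
long measure_usleep(unsigned int requested_us)
{
    struct timeval start, end;

    if (gettimeofday(&start, NULL) == -1)
        return -1;
    if (usleep(requested_us) == -1)
        return -1;              /* e.g. interrupted by a signal */
    if (gettimeofday(&end, NULL) == -1)
        return -1;

    return (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_usec - start.tv_usec);
}
```

Printing measure_usleep(1000) a few times shows the real delay you get for a requested 1 ms.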

nanosleep()

In the 2.0.x series of Linux kernels, there is a new system call, nanosleep() (see the nanosleep(2) manual page), that allows you to sleep or delay for short times (a few microseconds or more).

For delays <= 2 ms, if (and only if) your process is set to soft real time scheduling (using sched_setscheduler()), nanosleep() uses a busy loop; otherwise it sleeps, just like usleep().

The busy loop uses udelay() (an internal kernel function used by many kernel drivers), and the length of the loop is calculated using the BogoMips value (the speed of this kind of busy loop is one of the things that BogoMips measures accurately). See /usr/include/asm/delay.h for details on how it works.
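Calling nanosleep() from a program might look like the sketch below. The helper name sleep_ns() is an assumption for this example; it restarts the call when a signal interrupts it, using the remaining time that nanosleep() reports.

```c
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for `ns` nanoseconds (0 <= ns < 1000000000),
 * resuming if a signal interrupts the call. Returns 0 or -1. */
int sleep_ns(long ns)
{
    struct timespec req, rem;

    req.tv_sec = 0;
    req.tv_nsec = ns;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;          /* e.g. EINVAL for an out-of-range value */
        req = rem;              /* continue with the time still left */
    }
    return 0;
}
```

For example, sleep_ns(500000) asks for a delay of half a millisecond.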

Delaying with port I/O

Another way of delaying small numbers of microseconds is port I/O. Inputting or outputting any byte from/to port 0x80 (see above for how to do it) should wait for almost exactly 1 microsecond independent of your processor type and speed. You can do this multiple times to wait a few microseconds. The port output should have no harmful side effects on any standard machine (and some kernel drivers use it). This is how {in|out}[bw]_p() normally do the delay (see asm/io.h).

Actually, a port I/O instruction on most ports in the 0-0x3ff range takes almost exactly 1 microsecond, so if you're, for example, using the parallel port directly, just do additional inb()s from that port to delay.
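A port 0x80 delay could be sketched as follows, assuming glibc's <sys/io.h> on an x86 machine and a process that is allowed to call ioperm() (normally root). The helper name port80_delay() is hypothetical, and the figure of about one microsecond per access is the one given in the text above; the code does not verify it.

```c
#include <sys/io.h>   /* ioperm(), outb(); glibc on x86 only */

/* Hypothetical helper: delay roughly `count` microseconds by writing to
 * port 0x80. Returns 0 on success, -1 if port access was not granted. */
int port80_delay(unsigned int count)
{
    /* Ask the kernel for access to the single port 0x80. */
    if (ioperm(0x80, 1, 1) == -1)
        return -1;

    while (count-- > 0)
        outb(0, 0x80);      /* value first, then port */

    return 0;
}
```

In a real program you would call ioperm() once at startup rather than on every delay.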

Delaying with assembler instructions

If you know the processor type and clock speed of the machine the program will be running on, you can hard-code shorter delays by running certain assembler instructions (but remember, your process might be scheduled out at any time, so the delays might well be longer every now and then). For the table below, the internal processor speed determines the number of clock cycles taken; e.g., for a 50 MHz processor (e.g. 486DX-50 or 486DX2-50), one clock cycle takes 1/50000000 seconds (=20 nanoseconds).


Instruction   i386 clock cycles   i486 clock cycles
xchg %bx,%bx          3                   3
nop                   3                   1
or %ax,%ax            2                   1
mov %ax,%ax           2                   1
add %ax,0             2                   1

Clock cycles for Pentiums should be the same as for i486, except that on Pentium Pro/II, add %ax, 0 may take only half a clock cycle. It can sometimes be paired with another instruction (because of out-of-order execution, this need not even be the very next instruction in the instruction stream).

The instructions nop and xchg in the table should have no side effects. The rest may modify the flags register, but this shouldn't matter since gcc should detect it. xchg %bx, %bx is a safe choice for a delay instruction.

To use these, call asm("instruction") in your program. The syntax of the instructions is as in the table above; if you want multiple instructions in a single asm() statement, separate them with semicolons. For example, asm("nop ; nop ; nop ; nop") executes four nop instructions, delaying for four clock cycles on i486 or Pentium processors (or 12 clock cycles on an i386).

asm() is translated into inline assembler code by gcc, so there is no function call overhead.
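Put together, an inline delay and the matching arithmetic might look like this sketch. The names delay_four_nops() and cycles_to_ns() are illustrative only, and the conversion assumes you already know the cycle count from the table and the clock speed of the target machine.

```c
/* Hypothetical helper: four nop instructions emitted inline, so there is
 * no function call overhead once the compiler inlines it. */
static inline void delay_four_nops(void)
{
    /* volatile keeps gcc from removing or moving the statement */
    __asm__ volatile ("nop ; nop ; nop ; nop");
}

/* Convert a cycle count to nanoseconds for a given clock in MHz:
 * one cycle lasts 1000 / MHz nanoseconds. */
double cycles_to_ns(unsigned int cycles, double clock_mhz)
{
    return cycles * 1000.0 / clock_mhz;
}
```

With these figures, the four nops on a 50 MHz i486 come to cycles_to_ns(4, 50.0), that is 80 nanoseconds.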

Shorter delays than one clock cycle are impossible in the Intel x86 architecture.

rdtsc for Pentiums

For Pentiums, you can get the number of clock cycles elapsed since the last reboot with the following C code (which executes the CPU instruction named RDTSC):



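   /* Read the Pentium time-stamp counter (32-bit x86 only: the "=A"
      constraint returns the 64-bit result in the EDX:EAX pair). */
   static __inline__ unsigned long long int rdtsc(void)
   {
     unsigned long long int x;
     /* 0x0f, 0x31 is the opcode of the RDTSC instruction */
     __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
     return x;
   }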

You can poll this value in a busy loop to delay for as many clock cycles as you want.
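Such a polling loop could be sketched as below. This variant is not the fragment above: it uses the rdtsc mnemonic with separate EAX and EDX outputs so that it also compiles for 64-bit x86, and the names read_tsc() and delay_cycles() are made up for the example.

```c
/* Read the time-stamp counter; the two 32-bit halves come back in
 * EAX and EDX, which works on both 32-bit and 64-bit x86. */
static inline unsigned long long read_tsc(void)
{
    unsigned int lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

/* Hypothetical helper: spin until at least `cycles` counter ticks passed. */
void delay_cycles(unsigned long long cycles)
{
    unsigned long long start = read_tsc();

    while (read_tsc() - start < cycles)
        ;   /* busy-wait; the process can still be scheduled out */
}
```

The delay is a lower bound only: if the process is scheduled out during the loop, more cycles will have passed by the time it returns.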

4.2 Measuring time

For times accurate to one second, it is probably easiest to use time(). For more accurate times, gettimeofday() is accurate to about a microsecond (but see above about scheduling). For Pentiums, the rdtsc code fragment above is accurate to one clock cycle.

If you want your process to get a signal after some amount of time, use setitimer() or alarm(). See the manual pages of the functions for details.

