Notes

I'm trying to keep myself organized by publishing work-in-progress. The thought that someone other than me might actually see this stuff tends to encourage a certain coherency to the diffs and their associated notes.

Any patches that do not have a date as part of the filename or have a not-so-recent date are unlikely to apply cleanly to -current. I do have all these things still in one tree or another or they have been included in the official OpenBSD tree. The network and sparc64 pages are particularly out-of-date (as in, “half a decade”).

June 20 (2015)
More papers on random numbers have been added.
June 3 (2013)
Links to some of my other projects have been added.
Feb 18 (2008)
The kernel rnd and VIA RNG patches have been synced with -current.
Nov 9
The kernel rnd and VIA RNG patches have been synced with -current.
Sept 27
There is a new .NET wrapper for Makoto Matsumoto's Mersenne Twister called RandomSFMT.
Sept 26
The c7random program inside the NIST SP 800-90 CTR_DRBG archive now uses /dev/urandom to form part of the nonce.
Sept 5
Vista x64 and PasswordSafe do not get along perfectly. This should help.
Sept 3
NIST SP 800-90 CTR_DRBG compiles under VS2005 again.
Aug 28
An OpenSSL hack has been added to provide full entropy for the default RNG.
Aug 14
The minimal pr5205 diff was checked in.
Aug 9
It looks like there's a small diff that can fix pr5205.
Aug 7
c7random has been merged into the NIST SP 800-90 CTR_DRBG code. There have been some bugfixes, Rijndael known-answer tests have been added, and the code has been reorganized.
July 30
I've gathered some random number generation information onto an Entropy and Random Numbers page.
July 18
The minor effort to make a port for DieHarder has turned into a major patch.
July 17
DieHarder has come along to provide a GPLed alternative to DIEHARD.
June 22
I've updated the Kern rnd and VIA C3/C7 RNG patches to -current.
June 13
I've added some links to NIST's take on RNGs.

Old notes have been archived.

DieHarder (RNG Tester)

Robert G. Brown has put together DieHarder, a GPLed RNG test suite that includes and expands upon George Marsaglia's DIEHARD suite.

Here's a preliminary port that should get it compiled:

dieharder-port-20070716.tgz

It does not install any documentation or header files. Also, I'm no port, automake, or autoconf expert.

Alternatively, here's a patched version of dieharder-2.24.4.tgz that has been lightly tested on amd64/FreeBSD and on i386/OpenBSD.

Full distribution: dieharder-20070718.tar.bz2 (679k)

Patch: dieharder-2.24.4-20070718.diff.bz2 (395k)

VIA RNG fix (pr5205)

The VIA VT-310DP is a dual-processor Mini-ITX board that sometimes likes to panic under heavy load. As pr5205 notes, the problem goes away when entropy collection from the CPU's hardware RNG is disabled. Use of the AES hardware does not trigger this panic. Both require SSE, but the former uses SSE in a callout whereas the latter does so in a kernel thread. More troubling, the entropy collection callout enables SSE by directly manipulating the “Emulation” and “Task Switched” bits of CR0.

This patch moves the entropy collection into a kernel thread (without any CR0 twiddling): pr5205_20070215.diff.gz

An alternate solution can be developed based on a theory about the cause of this problem: Let's say that a process that uses the FPU has just run on CPU0 and is about to be run on CPU1. The FPU state for the process is still living on CPU0, so CPU1 sends an IPI to CPU0 to ask it to flush the FPU state to the appropriate PCB. Unfortunately, CPU0 is right in the middle of its entropy collection loop, which means that it has already sampled CR0. The npxdna_xmm() function gets called, clearing TS (through clts()). The entropy polling resumes, and when it completes, it sets CR0 back to what it was before npxdna_xmm() cleared TS. At this point “TS” is set when it should not be set. Exactly how this leads to the ensuing panic is beyond my understanding of the deferred FPU handling…

At any rate, here's a minimal diff that blocks IPIs while the entropy polling is running: pr5205_minimal_20070809.diff.gz (this was checked in to the OpenBSD tree on Aug 14.)
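The theory and the minimal fix can be illustrated with a toy model. The Python sketch below is not kernel code: the `Cpu` class, `poll_rng`, and the way the IPI handler is represented are hypothetical simplifications, and only the CR0 bit positions (EM is bit 2, TS is bit 3) come from the x86 architecture. It shows how restoring a CR0 value sampled before the IPI puts TS back after the handler cleared it, and how holding the IPI off until polling is done avoids that.

```python
CR0_EM = 1 << 2  # "Emulation" bit
CR0_TS = 1 << 3  # "Task Switched" bit


class Cpu:
    """Toy CPU: just a CR0 register and a queue of pending IPIs."""

    def __init__(self, cr0):
        self.cr0 = cr0
        self.pending_ipis = []

    def handle_fpu_flush_ipi(self):
        # Stand-in for the path that ends in clts(): TS is cleared.
        self.cr0 &= ~CR0_TS


def poll_rng(cpu, ipi_arrives, block_ipis):
    """Model the entropy callout: save CR0, clear EM/TS, poll, restore."""
    saved = cpu.cr0                      # CR0 sampled before polling
    cpu.cr0 &= ~(CR0_EM | CR0_TS)        # enable SSE for the RNG instruction
    if ipi_arrives:
        if block_ipis:
            cpu.pending_ipis.append(cpu.handle_fpu_flush_ipi)  # deferred
        else:
            cpu.handle_fpu_flush_ipi()   # runs in the middle of the loop
    cpu.cr0 = saved                      # restore the (possibly stale) sample
    while cpu.pending_ipis:              # deferred IPIs run after the restore
        cpu.pending_ipis.pop(0)()
    return bool(cpu.cr0 & CR0_TS)


# TS is set on entry because another process's FPU state is still on this CPU.
racy = poll_rng(Cpu(CR0_TS), ipi_arrives=True, block_ipis=False)
fixed = poll_rng(Cpu(CR0_TS), ipi_arrives=True, block_ipis=True)
print("TS after racy poll:   ", racy)   # True: TS is set although the handler cleared it
print("TS after blocked poll:", fixed)  # False: the handler's clear survives
```

In the racy case the handler's clear is overwritten by the restore; in the blocked case the handler runs last, so its view of TS is the one that remains.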

The minimal diff makes the assumption that while the VIA PadLock hardware uses the SSE datapath, it does not change the state of any of the SSE registers. The documentation seems a bit unclear on this point. If this assumption is not correct, then programs that use any of the FPU and/or SSE state that PadLock touches may occasionally find their math going wrong. The kernel thread solution should avoid this issue since a kernel thread is a process and the kernel already knows how to deal with processes that use SSE. The crypto acceleration is safe since it always runs in the context of the kernel's crypto thread.

The minimal diff does avoid the panics described in pr5205.

For reference, this is what I'm using in production.

kernel rnd

The OpenBSD kernel has a single arc4-based generator that supplies almost all the kernel's random numbers (from /dev/arandom, seeding userland's arc4-based generator, to selecting ephemeral network ports).

The entropy pool is derived from the Linux random driver written by Theodore Ts'o. (Perhaps a mixing function based on SFMT could inspire some changes?)

The paper Analysis of the Linux Random Number Generator by Zvi Gutterman and Benny Pinkas may also be applicable to OpenBSD's RNG given its common heritage. Barak and Halevi share their thoughts about an alternative architecture in An Architecture for Robust Pseudo-Random Generation and Applications to /dev/random.

There are some possible races in the kernel rnd driver (which supports /dev/Xrandom). This should help.

See this tech@ thread for details.

Feb 18
Yet again, I've regenerated the diff against -current.
Nov 9
Once again, I've regenerated the diff against -current.
June 22
I've regenerated the diff against -current and fixed two bugs in the old patch.

rnd_20080218.diff.gz

Enhanced VIA C3/C7 RNG Support

The VIA C3/C7 processors have internal random number generators. There is already support in OpenBSD for harvesting the entropy from these generators, but here are some improvements.

The current code polls the generator until it has delivered the requested number of bytes. This can take a while (some experiments suggest that a 1GHz C3 spends ~5% of the total available CPU time polling the RNG). If someone decides to shut off the RNG—or if it fails—the polling may never terminate without user intervention.
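One way to keep such polling from hanging is to bound the number of attempts. The Python sketch below illustrates that idea only; it is not the code from the diff. `poll_bounded`, `read_rng`, and the `max_tries` limit are hypothetical names, and the two fake sources stand in for the hardware.

```python
def poll_bounded(read_rng, want, max_tries=64):
    """Collect up to `want` bytes, giving up after `max_tries` reads.

    `read_rng` is a hypothetical callable returning whatever bytes the
    generator had ready (possibly none if it is off or broken).
    """
    out = bytearray()
    for _ in range(max_tries):
        if len(out) >= want:
            break
        out += read_rng()
    return bytes(out[:want])


def healthy():
    return b"\x5a" * 8   # fake source: always delivers 8 bytes


def disabled():
    return b""           # fake source: RNG shut off, never delivers


print(len(poll_bounded(healthy, 64)))   # 64
print(len(poll_bounded(disabled, 64)))  # 0, and the loop still terminates
```

The caller then has to cope with a short read, which is the price of guaranteed termination.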

Later versions of the core add a second entropy source. The diff makes sure that both sources are enabled. On the single C7 where it has been tested, the whitened output rate increased from ~21Mbit/s to ~35Mbit/s (WARNING: it turns out those numbers were collected with apmd -aC, so the CPU clock was changing).

The VIA PadLock Developer Center page has a link to the VIA C5J programming guide for Assembler.

The VIA PadLock Security Engine page has a link to VIA C5J PadLock Security Engine and to an evaluation whitepaper.

Feb 18
Yet again, I've regenerated the patch against -current.
Nov 9
Once again, I've regenerated the patch against -current.
June 22
I've regenerated the patch against -current.

viarng_20080218.diff.gz (This requires the kern rnd patch.)

rndstats Monitor

Here's a utility to monitor the kernel's random number generator/entropy pool (the rnd device).

The kernel entropy pool collects statistics about where bits come from, how much is exported, and so on. This can be monitored through “sysctl kern.random” which provides a great deal of information in a single line:

kern.random=102543780 537536 0 220972 6 1288 0 0 0 0 0 0 3337411 114221 1447 755 8806 108 162 271 363 494 732 1070 1478 1747 2012 2401 3599 3990 3576 4570 3811 2329 1100 733 412 288 156 66 17 12 0 1 0 0 3 3293104 3293104 8516 0 10570 3055 22166 0 0 102086224 0 0 140963 31278 286917 0 0

rndstats provides the same information in a slightly less compact format and will also display the rate of change for the various counters. For example, here's “rndstats -vw 5” on a 1.2GHz C7 (with apmd -aC running):

 total = 113208394 bits (0.113208 Gbit 31.7348 kbit/s)
  used = 539520 strong bits (0 bits/s)
 reads = 0 calls (0 calls/s)
ARC4
 reads = 226124 bytes (37.6 bytes/s)
nstirs = 7 calls (0 calls/s)
 stirs = 1544 bits used (0 bits/s)
Queue:
 waits = 0 (0 waits/s)
  enqs = 3683621 calls (1026 calls/s)
  deqs = 125122 calls (33.2 calls/s)
 drops = 1495 (0 drops/s)
drople = 846 (0 droples/s)
Sources:
    true = 3635512 calls, 112700872 bits (31 bits/call 1024.8 calls/s)
   timer = 8548 calls, 0 bits
   mouse = 0 calls, 0 bits
     tty = 11817 calls, 157519 bits
    disk = 3129 calls, 32346 bits (13.5 bits/call 0.4 calls/s)
     net = 24615 calls, 318422 bits (15.25 bits/call 0.8 calls/s)
   audio = 0 calls, 0 bits
   video = 0 calls, 0 bits
Entropy Histogram:
 0 bits = 8924 calls (0 calls/s 0 bits/s)
 1 bits = 121 calls (0 calls/s 0 bits/s)
 2 bits = 177 calls (0 calls/s 0 bits/s)
 3 bits = 301 calls (0 calls/s 0 bits/s)
 4 bits = 394 calls (0 calls/s 0 bits/s)
 5 bits = 546 calls (0 calls/s 0 bits/s)
 6 bits = 810 calls (0 calls/s 0 bits/s)
 7 bits = 1176 calls (0 calls/s 0 bits/s)
 8 bits = 1623 calls (0.2 calls/s 1.6 bits/s)
 9 bits = 1891 calls (0 calls/s 0 bits/s)
10 bits = 2163 calls (0 calls/s 0 bits/s)
11 bits = 2584 calls (0 calls/s 0 bits/s)
12 bits = 3887 calls (0.4 calls/s 4.8 bits/s)
13 bits = 4337 calls (0 calls/s 0 bits/s)
14 bits = 3994 calls (0 calls/s 0 bits/s)
15 bits = 5147 calls (0 calls/s 0 bits/s)
16 bits = 4328 calls (0 calls/s 0 bits/s)
17 bits = 2596 calls (0.2 calls/s 3.4 bits/s)
18 bits = 1252 calls (0 calls/s 0 bits/s)
19 bits = 812 calls (0.2 calls/s 3.8 bits/s)
20 bits = 457 calls (0.2 calls/s 4 bits/s)
21 bits = 312 calls (0 calls/s 0 bits/s)
22 bits = 169 calls (0 calls/s 0 bits/s)
23 bits = 72 calls (0 calls/s 0 bits/s)
24 bits = 19 calls (0 calls/s 0 bits/s)
25 bits = 13 calls (0 calls/s 0 bits/s)
26 bits = 0 calls (0 calls/s 0 bits/s)
27 bits = 1 calls (0 calls/s 0 bits/s)
28 bits = 0 calls (0 calls/s 0 bits/s)
29 bits = 0 calls (0 calls/s 0 bits/s)
30 bits = 3 calls (0 calls/s 0 bits/s)
31 bits = 3635512 calls (1024.8 calls/s 31768.8 bits/s)

Perhaps one should take those entropy estimates with a grain of salt (and perhaps that grain is large enough to make for a respectable bench press).
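The rate-of-change display amounts to sampling the counters twice and dividing the differences by the interval. The Python sketch below shows that calculation under some assumptions: `parse_kern_random` and `rates` are hypothetical helpers (not part of rndstats), the counters are treated as an unlabeled list of integers, and the two samples are made-up short lines rather than real “sysctl kern.random” output.

```python
def parse_kern_random(line):
    """Split a 'kern.random=n n n ...' line into a list of integers."""
    _, _, values = line.partition("=")
    return [int(v) for v in values.split()]


def rates(before, after, seconds):
    """Per-second rate of change for each counter between two samples."""
    return [(b - a) / seconds for a, b in zip(before, after)]


# Made-up samples, assumed to be taken 5 seconds apart.
s0 = parse_kern_random("kern.random=1000 200 0 50")
s1 = parse_kern_random("kern.random=1500 200 0 60")
print(rates(s0, s1, 5))  # [100.0, 0.0, 0.0, 2.0]
```

The figures in the output above are internally consistent in the same way: 31 bits per call at 1024.8 calls/s gives the 31768.8 bits/s shown on the 31-bit histogram line.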

rndstats_20070128.tgz

C7 Random Number Generator

The C7 has a built-in SHA256 engine in addition to a hardware random number generator. This utility uses the two together to produce what should be a pretty solid random stream on stdout. Each output block of 256 bits is generated from 448 bits generated by the hardware RNG. Enabling the “paranoid” mode (“-p” option) XORs the output of the hash with a fresh set of RNG bits and iterates the whole generation process N times before generating an output block (where N is the argument to the “-p” option).
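The construction can be sketched in a few lines of Python. This is one possible reading of the description, not the c7random source: `os.urandom` stands in for the hardware RNG, `hashlib.sha256` for the C7's SHA256 engine, and the way the N paranoid rounds are combined (XOR folding here) is an assumption, since the text above does not say how the rounds feed into one another. All function names are hypothetical.

```python
import hashlib
import os

SEED_BITS = 448   # RNG bits consumed per hash (from the description above)
BLOCK_BITS = 256  # SHA-256 output size


def hw_rng(nbytes):
    """Stand-in for the C7 hardware RNG; os.urandom is for illustration only."""
    return os.urandom(nbytes)


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def output_block(paranoid=0, rng=hw_rng):
    """Produce one 256-bit block; `paranoid` plays the role of N from "-p"."""
    block = bytes(BLOCK_BITS // 8)
    for _ in range(max(1, paranoid)):
        digest = hashlib.sha256(rng(SEED_BITS // 8)).digest()
        if paranoid:
            # Paranoid mode: mask the digest with fresh RNG bits.
            digest = xor_bytes(digest, rng(BLOCK_BITS // 8))
        # Assumption: rounds are folded together with XOR.
        block = xor_bytes(block, digest)
    return block


print(len(output_block()) * 8)             # 256 bits per output block
print(len(output_block(paranoid=4)) * 8)   # still 256 bits, more RNG bits consumed
print(SEED_BITS / BLOCK_BITS)              # 1.75 RNG bits in per output bit
```

Even without the paranoid option, each output bit is backed by 1.75 bits from the hardware RNG.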

There's a new version that adds a “-N” option to produce random output that is hopefully consistent with the “RBG” of NIST SP 800-90 Appendix D with the C3/C7's entropy source and an AES-256-based CTR_DRBG.

Here's a neat trick: since it isn't such a great idea to stall the kernel for long periods while seeding the entropy pool, we move the heavy lifting into userland. For use when the system is starting, build a static c7random binary (uncomment the LDSTATIC line in the Makefile, then “make clean ; make”) and copy the resulting binary to the root filesystem (e.g., /bin). Then edit /etc/rc to add two copies of /bin/c7random -p 4 -s 16384, one before the host.random file is read and one after host.random is rewritten. It should look something like:

mount -s /usr >/dev/null 2>&1
mount -s /var >/dev/null 2>&1

/bin/c7random -p 4 -s 16384 > /dev/urandom

# if there's no /var/db/host.random, make one through /dev/urandom
if [ ! -f /var/db/host.random ]; then
        dd if=/dev/urandom of=/var/db/host.random bs=1024 count=64 \
                >/dev/null 2>&1
        chmod 600 /var/db/host.random >/dev/null 2>&1
else
        dd if=/var/db/host.random of=/dev/urandom bs=1024 count=64 \
            > /dev/null 2>&1
        dd if=/var/db/host.random of=/dev/arandom bs=1024 count=64 \
            > /dev/null 2>&1
fi

# reset seed file, so that if a shutdown-less reboot occurs,
# the next seed is not a repeat
dd if=/dev/urandom of=/var/db/host.random bs=1024 count=64 \
    > /dev/null 2>&1

/bin/c7random -p 4 -s 16384 > /dev/urandom

# clean up left-over files

Finally, to get the kernel pool reseeded every ten minutes, add

*/10    *       *       *       *       /bin/c7random -p 4 -s 8192 > /dev/urandom

to root's crontab. The command shouldn't take more than 100ms to run, so one could run it more often. Increasing the size of the write would not be useful as the kernel pool is only 4096 bytes (the data is processed by the kernel as it adds it to the pool, so a 4096 byte write may not be enough to reach everything). The point is to completely reseed the entropy pool to make life more difficult for someone who has partial knowledge of the pool contents. Small incremental writes to the pool are easier for the attacker to guess and track (if they can get output between the updates).

In reality, the kernel's normal C7 RNG polling will refill the entire entropy pool once every two seconds or so… Then again, perhaps that means the pool needs to be larger?

Come to think of it, an entropy-deprived host could use the C7 as a remote entropy source (and c7random doesn't need any particular privileges):

$ ssh randomsource.host /bin/c7random -p 5 -s 32 | hexdump
0000000 1643 bef9 fedb a1b5 f5ab 8230 45f9 e8e8
0000010 fdc5 ca50 a4f3 8e88 39cc c5dd 3011 ae5f
0000020

Obviously, that is only as good as the link security. It could be useful for seeding a server running SSL over a local LAN—as long as the web server is sensible enough to grab new entropy every once in a while.

c7random doesn't check what kind of CPU it is running on. To use it on a CPU with the RNG but without SHA256, use the “-x” option.

For comparison, here are 1MB runs from c7random and from /dev/urandom on the same box (1.2GHz C7).

$ /bin/c7random | dd of=/dev/null bs=1k count=1k
1024+0 records in
1024+0 records out
1048576 bytes transferred in 0.429 secs (2443960 bytes/sec)
$ dd if=/dev/urandom of=/dev/null bs=1k count=1k
1024+0 records in
1024+0 records out
1048576 bytes transferred in 10.551 secs (99380 bytes/sec)

The former is consuming more high-quality entropy bits than it is generating output bits—so it should provide solid prediction resistance—at ~2.4Mbyte/s whereas the latter is producing 100kbyte/s from roughly 31kbit/s of entropy input (almost all from the C7's RNG).
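The arithmetic behind that comparison can be checked with a short Python calculation. It uses only figures quoted on this page (the dd timings, the 448-in/256-out ratio of c7random's default mode, and the roughly 31kbit/s entropy input); the variable names are just for illustration.

```python
MIB = 1048576  # bytes moved in each dd run

c7_rate = MIB / 0.429        # bytes/s from c7random
urandom_rate = MIB / 10.551  # bytes/s from /dev/urandom

# c7random: 448 RNG bits consumed per 256 output bits.
c7_rng_input = c7_rate * 8 * 448 / 256   # bits/s drawn from the hardware RNG

# /dev/urandom: output bits produced per bit of entropy input (~31 kbit/s).
urandom_expansion = urandom_rate * 8 / 31000

print(f"c7random output:   {c7_rate / 1e6:.2f} Mbyte/s")
print(f"c7random RNG draw: {c7_rng_input / 1e6:.1f} Mbit/s")
print(f"urandom output:    {urandom_rate / 1e3:.1f} kbyte/s")
print(f"urandom expansion: {urandom_expansion:.0f}x")
```

The RNG draw works out to about 34 Mbit/s, which is close to the raw rate via_rng reports for a 1.2GHz C7 with both sources enabled, so the hardware generator itself may well be the limiting factor. /dev/urandom, by contrast, stretches each bit of entropy input over roughly 26 output bits.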

Note that this does not require any of the above kernel changes.

The c7random code is now part of the NIST SP 800-90 CTR_DRBG distribution.

C3/C7 Random Number Benchmark (hack)

The C3/C7's internal random number generator has a few control and status bits that can sometimes be of interest. This little hack displays them as well as an estimate of the generator's output rate. The output of via_rng for a 1.2GHz C7 box looks like this:

CentaurHauls Type=0 Family=6 Model=10 Stepping=9
VIA Esther processor 1200MHz
RNG MSR 0x11b: 0x00000248 ( ENBL RNG-BOTH BIAS=0 )
Raw rate:
3.6 s total time
1.7795 us per 8 byte iteration
35.9652 Mbit/s
Kernel polling:
2.09 s total time
103.419 us per 64-byte iteration
1.03419% CPU

And here's the same box with only one of the RNG sources enabled:

CentaurHauls Type=0 Family=6 Model=10 Stepping=9
VIA Esther processor 1200MHz
RNG MSR 0x11b: 0x00000048 ( ENBL RNG-A BIAS=0 )
Raw rate:
6.28 s total time
3.07814 us per 8 byte iteration
20.7918 Mbit/s
Kernel polling:
3.75 s total time
184.049 us per 64-byte iteration
1.84049% CPU
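The derived figures in both runs follow from the per-iteration times. The Python sketch below reproduces them; the helper names are hypothetical, and the polling rate of 100 polls per second is an assumption inferred from the numbers (it is the rate that makes the reported CPU percentages come out), not something the output states.

```python
def mbit_per_s(us_per_iter, bytes_per_iter):
    """Throughput implied by one iteration's duration."""
    return bytes_per_iter * 8 / us_per_iter  # bits per microsecond == Mbit/s


def cpu_percent(us_per_iter, polls_per_s):
    """Share of one CPU spent polling at the given rate."""
    return us_per_iter * polls_per_s / 1e6 * 100


POLLS_PER_S = 100  # assumed polling rate; inferred, not stated in the output

for label, raw_us, poll_us in (("both", 1.7795, 103.419),
                               ("A only", 3.07814, 184.049)):
    print(label,
          round(mbit_per_s(raw_us, 8), 4), "Mbit/s,",
          round(cpu_percent(poll_us, POLLS_PER_S), 5), "% CPU")
```

Under that assumption, enabling the second source cuts the polling cost from about 1.8% to about 1.0% of the CPU for the same amount of data.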

via_rng_20070128.tgz