# Entropy gathering for cryptographic applications in AVR

# Qualification of WDT as entropy source

© 2006 Kasper Pedersen

Copy released under the GNU Lesser General Public License 2.0

#### Brief

In cryptographic applications, security often depends on the parties' ability to generate unpredictable session keys. Without unpredictable session keys, workhorse algorithms like Diffie-Hellman and RSA achieve strengths far below what would be expected from looking at their implementation, even to the point of being broken in seconds.

Unfortunately, randomness is hard. Microcontrollers and microprocessors are the pinnacle of deterministic machines, and if there's one thing they're really poor at, it's behaving in a non-deterministic fashion.

This problem cannot be solved in software. The needed randomness must come from a physical process – hardware – and it must not be possible for other parties to obtain (listen in on) or influence the data.

In the AVR family there are two possible internal sources of randomness: The A/D converter (for parts that have it), and the on-chip oscillators.

This implementation exploits the jitter in the on-chip oscillators. They have the benefit of being hard to observe from outside the chip.

# Obtaining entropy from the watchdog

In AVR, the watchdog is clocked from an on-chip RC oscillator, and this oscillator is different from the core clock oscillator.

To see if it was a possible source of randomness, a small program was written, and executed in a Mega168 on an STK500 with 16MHz external clock. Power was supplied from a 12V lead-acid battery to rule out external power supply noise.

write r24 to the serial port clear r24 arm the watchdog for 16 ms timeout count like mad in r24 (3 clocks/iteration) until the watchdog strikes

#### The program can be found in appendix 1

And the result is promising. 10.000 values were gathered and analyzed. What we get is a probability graph of 'when does the watchdog fire'. If the oscillator was mathematically perfect, we'd get a single narrow line. In reality the on-chip oscillator drifts and jitters: If you were to look at its clock edge on an analog scope, it would be 'thick and fuzzy' (jitter) and move around (drift).



Each bin is 3 clocks wide, and it looks like there are at least 8 well-populated bins, so we might be able to extract 3 bits per sample. Life isn't that good to us, however. The differential value (sample[n] – sample[n-1]) is a little worse:

File: ..\randcap6.bin Mode: second order histogram Note: offset 80h With goodwill this is 6 bins wide, so it's more like 2, or maybe 2.5, bits. The third order differential is going wide again, so this is probably realistic. But: There IS entropy to be obtained. So, let's distill it.

### A practical implementation

The first implementation – just collecting samples – goes a long way to qualifying the hardware, but actually making a seed buffer from it is somewhat more cumbersome. The basic algorithm goes:

On power-on or external hardware reset, start the watchdog On watchdog reset, get the count value, add it to an entropy buffer, and count like mad. And if it was an application watchdog, do not catch it.

This implementation has a simple circular buffer, and adds samples into the buffer. To improve the entropy gathering, it includes a tiny (and totally insecure) 8-bit substitution cipher to mix up the bits slightly. It's 8 bits wide because the analysis tool looks at 8 bit bytes. MD5 would be a lot better, but hiding a poor distribution from the analysis tool is not the goal, and so we can't use MD5 for the analysis phase. *The code can be found in appendix 2.* 

The entropygather entrypoint is to be called early, before any start up code destroys R24.

#### **Observed performance**

With a single pass of the buffer the result isn't good. This was predicted.

| - | В9 | D2 | В7 | В8 | В8 | В5 | В4 | В6 | В6 | В5 | В4 | В5 | В4 | В7 | В3 | В5 |
|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| - | В6 | D9 | ΒB | В6 | В1 | В1 | В2 | В4 | BЗ | BЗ | В5 | В7 | В7 | В5 | В4 | В6 |
| - | В5 | D5 | BЗ | В4 | В1 | BЗ | В4 | В4 | В6 | В2 | В5 | BЗ | BЗ | В9 | В6 | В6 |
| - | BЗ | D4 | В8 | В5 | BЗ | BЗ | В7 | В7 | В5 | В4 | В4 | В4 | В4 | В5 | В6 | В4 |
| - | В6 | D4 | В7 | В7 | В6 | В5 | BЗ | В2 | В4 | В1 | BО | В1 | BО | BО | В2 | В5 |
| - | В5 | D7 | BC | В6 | В7 | BЗ | В4 | В4 | В6 | В8 | В7 | В6 | В5 | В4 | В4 | BЗ |
| - | ΒA | D1 | В2 | В4 | В8 | В6 | В8 | В7 | В2 | В1 | В0 | В9 | В9 | ΒB | BC | BB |
| - | ΒB | D8 | В7 | В8 | В8 | В5 | В6 | В7 | В2 | В6 | В7 | ΒB | В7 | В9 | BC | BD |
| - | BЗ | D7 | BC | В7 | В9 | В8 | В7 | В8 | В7 | ΒA | В8 | BC | ΒB | В9 | В5 | BЗ |
| - | В2 | D1 | ΒB | В9 | В8 | В9 | В8 | В1 | В5 | В6 | В1 | В3 | В1 | AF | AF | В4 |

The entropy is there, and there's about two or three bits' worth. With 4 passes (as in the code given), it looks usable:

C0 31 E9 E6 06 4B EF 2D DF DC 00 7B 79 8E 78 33
E6 D5 06 E5 27 EF 30 78 72 EA 24 79 AD 35 DA 07
42 CF BF D7 E0 C6 35 8D 91 2E 8F C5 68 06 47 82
49 04 BB 81 27 8A 25 05 10 3C 69 B9 DF 65 E8 A2
83 DF B0 D8 E5 01 6E 96 5C 97 3D 27 90 49 49 42
58 23 2D 5A 5D 35 82 58 E5 9E 8F B4 5E 33 CA 3E
98 DA 6B 47 EE 94 80 8F F8 07 A3 AA 8B 49 E8 E0
59 77 C2 41 00 C0 21 C3 5E D0 FB 76 CD A4 1A 01
C0 D9 0D F0 F9 1B E6 96 4F 6A 55 6F 6C CC 05 48
6B 24 32 FB AA F9 5F CD 09 27 E7 08 34 FB 7C 78

And at this point encodings like Lempel-Ziv (zip) aren't able to compress it. It does shows some funny statistical anomalies though, the reason being that the mixing function is poor at best, which means that we probably need a better mixing function.



Not pretty at all.

# **Third implementation**

A single multiply-by-123 is added to the mixing function; This is lossless, and doesn't conceal things from the histogram plots. *The code can be found in appendix 3.* 

We rerun the experiment with the improved mixing function, using 192, 96, 64, 48, and 32 samples per 128 output bits. This is 0.7, 1.3, 2.0, 2.7, and 4.0 bits of entropy required from each sample. We already know that there isn't 4 bits of entropy available, so the last run should show structure. The 2.7-bits-of-entropy run is an unknown.

The following plots were all generated from 122000 bytes of output (7625 data sets) each, and scaling is fixed.



With 96 samples forming a 128 bit output:



# with 64 samples forming a 128 bit output:



All of these share one common characteristic: The probability that you'll get two identical bytes in a row is twice that of other combinations; For practical purposes it doesn't matter, and I suspect it's a bug in either the analysis tool or the data collection software. And all of them would be suitable for cryptographic applications.

With 48 samples forming a 128 bit output:



This is pretty good, and means that there is at least 2.66 bits of entropy to be obtained from each sample.



The 32-samples-in-128-bits version should, and does, fail:

These two are scaled to 50% vertically, as they would otherwise exceed the scale.

When there isn't enough entropy, because the mixing function's block size (one byte) is the same as the analysis' tool's, the histograms become sparse.

# Using the entropy buffer

A note to people new to using entropy buffers:

While it might seem like the thing to do, do not use the entropy in the buffer directly. The entropy is valuable in that it takes time to obtain (as in resetting the microcontroller 64 times), and there really isn't any reason for sending it to third parties.

Instead, use it as initialization for a pseudo-random number generator. If you are doing cryptography, chances are that you have a block cipher or hash. Block ciphers make excellent cryptographically secure pseudo-random number generators, look up CTR mode. The same goes for hashes (hash the buffer and a counter, but do search the literature to see if there are known vulnerabilities in your chosen hash).

#### Conclusion

The on-chip watchdog timer provides more than 2 bits per sample when compared against an external 16MHz clock every 16 ms (the fastest watchdog timeout available). If the clock frequency is only 8MHz, expect 1 bit of entropy per sample.

- With external 16MHz+ clock, 125 bits per second of entropy
- With external 8-16MHz clock, 63 bits per second of entropy
- The internal oscillator was not tested; Most likely it will work as well, but test!

The entropy mixing function used in the third implementation works, but for real life applications, replacing it with something else would be good from a trust perspective.

Second, the mixing function used here is fragile. Trying to optimize it by throwing out operations gave poor results, so if you modify it, and don't have a rock solid understanding of what it needs to do, do analyze it afterwards.

### Appendix 1: Feasibility test code

```
;
; Does the watchdog fire with a good random spread?
;
.INCLUDE "m168def.inc"
.org 0
rjmp mainp
.org 0x40
mainp:
     ldi r16,0xFF
     ldi r17,0x04
     out SPL, r16
     out SPH,r17
     ;set up uart
     ;16M/19200/16 = 52
     ldi r16,52-1
     sts UBRROL,r16 ;reload
     ldi r16,0x06
     sts UCSROC, r16 ;8 bit
     ldi r16,0x18
     sts UCSR0B,r16 ;enable
     in r16, MCUSR
     andi r16,0x0F
     cpi r16,0x08
     breq acquiremode
     ; power on, not acquisiton. so.
     clr r16
     out MCUSR, r16
     wdr
     ldi r16,0x08
     sts WDTCSR,r16
stopp: rjmp stopp;
acquiremode:
     clr r16
     out MCUCR, r16
     mov r16,r24
     sub r16,r25
                      ; to count increment distance
     mov r25,r24
     rcall sendc
     ldi r16,0x08
     sts WDTCSR,r16
acqloop:
     inc r24
     rjmp acqloop
```

```
sendc:
    lds r17,UCSR0A
    sbrs r17,5
    rjmp sendc
    sts UDR0,r16
    ret
```

# Appendix 2: CRNG 2 source (practical RNG test 1, poor mixing function)

```
UART code not included, identical to the code in appendix 3
.dseq
            .byte 16
eg seedbuf:
eg seedptr: .byte 1
eg_seedstate: .byte 1
.cseq
entropygather:
     ;Look at status register to decide what the situation is.
     in
         r30,MCUSR
     ; is this an external reset or power on?
     andi r30, (1<<BORF) | (1<<EXTRF) | (1<<PORF)
     brne eg init ; if so, we want to gather.
     ; it could also be watchdog
     in
         r30,MCUSR
     andi r30, (1<<WDRF)
     brne eg feed ;watchdog fired.
     ; and if not, someone called the reset vector
     ret
eg init:
     clr r30
                                ;Clear all the reset flags,
     sts eg_seedptr,r30
sts eg_seedstate,r30

                               ;null the feed counter,
                               ;and state
     out MCUSR,r30
                                ; and power-on flags.
     rcall eg_clear
                                ; only for testing
eg wait:
     clr r24 ;not needed in an actual implementation
     ; enable the watchdog and wait for it to fire.
     wdr
     ldi r30,0x08
     sts WDTCSR,r30
eg_wait2:
     inc r24
                               ; spinning around like mad.
     rjmp eg wait2
eg feed:
     ;was it an application watchdog, and not a sampling wdog?
     lds r30, eg seedstate
     or r30,r30
     brne eg done
     ;watchdog fired. accumulate data.
     rcall eg stir ;takes r24 as argument
     ;now we have ~15 ms until the next kick
     ; have enough in buffer?
     lds r30,eg seedptr
     cpi r30,64
                              ;four passes, please
     brne eg wait ; no..
```

```
eg done:
     sts eg seedstate,r30 ;all done.
     wdr
     ldi r30, (1<<WDCE) | (1<<WDE) ;remove watchdog</pre>
     ldi r31, 0
     out MCUSR, r31 ;M168:cannot disable wdog if WDRF is set.
     sts WDTCSR, r30
     sts WDTCSR,r31
     ret
; simple entropy gatherer.
; Yes, it's not much, but it's low-cost, so it can
; feed continuously from other places too..
eg stir:
     lds r30,eg seedptr
     inc r30
     sts eg seedptr,r30
     andi r30,15
                   ;wrap to buffer
     clr r31
     subi r30,low(-eg seedbuf) ;base of buffer
     sbci r31, high (-eg seedbuf)
     ld r16,Z
                             ;fetch
     swap r16
                             ;stir some
     add r16,r16
                             ;rotate
                            ;and feed in inverted carry.
     sbci r16,0
                            ;add in data
     add r16,r24
     st Z,r16
                             ;store
     ret
     ; this is purely to make it less random - TEST ONLY
eg clear:
     ldi r30,low(eg seedbuf) ;base of buffer
     ldi r31, high (eg_seedbuf)
     clr r24
eg_cloop:
     st Z+, r24
     cpi r30,low(eg seedbuf+16) ;done?
     brne eg_cloop
     ret
```

# Appendix 3: Real Use version complete with UART test/demo code

```
.INCLUDE "m168def.inc"
.dseg
eg_seedbuf: .byte 16
eg_seedptr: .byte 1
eg_seedstate: .byte 1
.cseg
.org 0
            rjmp mainp
mainp:
```

```
ldi r16,0xFF
     ldi r17,0x04
     out SPL, r16
     out SPH, r17
     rcall entropygather ; called early, once stack is okay.
     ;set up uart and report the frames
     ;16M/19200/16 = 52
     ldi r16,52-1
     sts UBRROL,r16 ;reload
     ldi r16,0x06
     sts UCSR0C,r16 ;8 bit
     ldi r16,0x18
     sts UCSR0B,r16 ;enable
     ldi r30,low(eg seedbuf) ;base of buffer
     ldi r31, high (eg seedbuf)
     clr r24
cloop:
     ld r16,Z+
     rcall sendc
     cpi r30,low(eg seedbuf+16) ;done?
     brne cloop
     ;rjmp eg init
                   obtain-forever-option for analysis
stopp: rjmp stopp;
sendc:
     lds r17,UCSR0A
     sbrs r17,5
     rjmp sendc
     sts UDR0,r16
     ret
; The entropy gatherer
entropygather:
     ;Look at watchdog to decide what the situation is.
     in
         r30,MCUSR
     ; is this an external reset or power on?
     andi r30, (1<<BORF) | (1<<EXTRF) | (1<<PORF)
     brne eq init ; if so, we want to gather.
     ; it could also be watchdog
     in r30, MCUSR
     andi r30, (1<<WDRF)
     brne eg_feed ;watchdog fired.
     ; and if not, someone called the reset vector, or it's watchdog.
     ret
eg init:
     ;right here you null out the buffer if you're analyzing
     ;the mixing function. We aren't any longer, so no code.
                                ;Clear all the reset flags,
     clr r30
     sts eg seedstate,r30
                                ; and state
     out MCUSR, r30
                                ; no more power-on flags.
     ; fall through to wait-for-data.
eg_wait:
     inc r30
```

```
sts eg seedptr,r30
                                     ;feed count: will be 1.
     ;enable the watchdog and wait for it to fire.
     ldi r30,0x08
     sts WDTCSR,r30
     wdr
eg_wait2:
     inc r24
     rjmp eg wait2
eg_feed:
     ; was it an application watchdog, and not a sampling wdog?
     lds r30, eg seedstate
     or r30,r30
     brne eg done
     ;watchdog fired. accumulate data.
     rcall eg stir
                         ;takes r24 as argument
     ; have enough in buffer?
     lds r30,eg_seedptr
     cpi r30,64
     brne eg wait ; no. still hungry.
eg done:
     sts eg seedstate,r30 ;all done.
     wdr
     ldi r30, (1<<WDCE) | (1<<WDE) ;remove watchdog</pre>
     ldi r31, 0
     out MCUSR, r31 ;M168: you cannot disable wdog if WDRF is set.
     sts WDTCSR,r30
     sts WDTCSR,r31
                     ; and back to main for business as usual
     ret
; simple entropy gatherer/compressor
; Tweak at your own peril
eg stir:
     lds r30,eg_seedptr
     inc r30
     sts eg seedptr,r30
     andi r30,15
                         ;wrap to buffer
     clr r31
     subi r30,low(-eg seedbuf) ;base of buffer
     sbci r31, high (-eg seedbuf)
     ld r16,Z
                             ;fetch
     swap r16
                             ;stir some
     add r16,r16
                             ;rotate
     sbci r16,0
                             ; and feed in inverted carry.
     add r16,r24
                             ;add in data
     mov r0,r16
                             ; the hastily added multiply
     ldi r24,123
     mul r0,r24
     st
         Z,rO
     ret
```