# High-Speed Computing & Co-Processing with FPGAs

21C3 - 21st Chaos Communication Congress, December 28th, 2004 - Berlin, DE

FPGAs (Field Programmable Gate Arrays) are steadily becoming more advanced and more practical as high-speed computing platforms. In this talk, David will provide an in-depth introduction to the guts and capabilities of modern-day FPGAs and show how you can take your current algorithms, convert them efficiently to gate logic, and run them on hardware. This presentation will also introduce a set of open source cores (jawn v1.0) that implement the basic functionality of John the Ripper on FPGAs and allow you to crack password hashes as fast as 100+ PCs using FPGA PCMCIA cards in your laptop.

#### David Hulton <dhulton@picocomputing.com>

- Founder, Dachb0den Labs
- Chairman, ToorCon Information Security Conference
- Embedded Systems Engineer, Pico Computing, Inc.



#### Disclaimer


- Educational purposes only
- Full disclosure
- I'm not a hardware guy



#### Goals


- This talk will cover:
  - Introduction to FPGAs
    - Verilog
    - Optimization Concepts
  - Cryptography
    - History
    - Password File Cracker (jawn v0.1)
  - Artificial Intelligence
    - Neural Networks



# **Introduction to FPGAs**

- Field Programmable Gate Array
  - Lets you prototype ICs
  - Code translates directly into circuit logic



# **Introduction to FPGAs**


- Configurable Logic Blocks (CLBs)
  - Registers (flip flops) for fast data storage
  - Logic Routing
- Input/Output Blocks (IOBs)
  - Basic pin logic (flip flops, muxes, etc.)
- Block RAM
  - Internal memory for data storage
- Digital Clock Managers (DCMs)
  - Clock distribution
- Programmable Routing Matrix
  - Intelligently connects all components together



#### **FPGA Pros / Cons**

- Pros
  - Common Hardware Benefits
    - Massively parallel
    - Pipelineable
  - Reprogrammable
    - Self-reconfiguration
- Cons
  - Size constraints / limitations
  - More difficult to code & debug



# **Introduction to FPGAs**


- Common Applications
  - Encryption / decryption
  - AI / Neural networks
  - Digital signal processing (DSP)
  - Software radio
  - Image processing
  - Communications protocol decoding
  - Matlab / Simulink code acceleration
  - Etc.



## **Types of FPGAs**


- Antifuse
  - Programmable only once
- Flash
  - Programmable many times
- SRAM
  - Programmable dynamically
  - Most common technology
  - Requires a loader (doesn't keep state after power-off)

# **Development Platform**


- ROAG
  - PCMCIA Form Factor
  - Virtex II-Pro (XC2VP4-5)
  - Embedded PowerPC 405
  - 128MB RAM
  - 32MB Flash
  - 10/100 Ethernet
  - Synchronous Serial Port
  - 2 RS232 Ports
  - CANBus
  - Satellite Radio Controller



#### **Development Platform**


- Virtex II-Pro (XC2VP4-5)
  - 6,768 Logic Cells
    - 12KB of Registers (Distributed RAM)
    - ~ 180,000 Gates
  - 64KB of Block RAM
  - PowerPC 405
    - 300MHz Max Clock Speed



### **Development Platform**


- FPGA Programming
  - PCMCIA
  - JTAG
- Embedded System
  - Xilinx's Microkernel
  - Linux
  - OpenBSD / NetBSD / etc ?



# **Creating Your Project**


- Tools
  - ISE 6.3i
  - ChipScope 6.3i
  - ModelSim 5.8c
  - EDK 6.3i
  - 60-day trials (counted from the installation date) available on xilinx.com



# Verilog

- Hardware Description Language
- Simple C-like Syntax
- Like the game of Go: easy to learn, difficult to master



#### Demonstration


- Interfacing with the PCMCIA bus
  - Creating your design
  - Building
  - Running



# **PCMCIA Bus**

| Lines    |           |        |
|----------|-----------|--------|
| Address  | 0x10C8000 |        |
| Data In  | 0xBEEF    |        |
| Data Out |           | 0x4110 |
| Read     |           |        |
| Write    |           |        |

- Example
  - Read in input from PCMCIA bus
  - Invert bits and return it
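
The behaviour of this example can be modelled in software before writing any Verilog. The sketch below is only a host-side model, not the actual design: the `invert_card` type and the `card_write` / `card_read` names are hypothetical, and a 16-bit data width is assumed from the `0xBEEF` / `0x4110` values in the table.

```c
#include <stdint.h>

/* Hypothetical software model of the demo design: the card latches a
 * 16-bit word written by the host and returns its bitwise inverse. */
typedef struct {
    uint16_t latched;   /* last word written over the bus */
} invert_card;

/* Host write cycle: the value on Data In is stored in a register. */
void card_write(invert_card *card, uint16_t data_in)
{
    card->latched = data_in;
}

/* Host read cycle: Data Out carries the inverted register contents. */
uint16_t card_read(const invert_card *card)
{
    return (uint16_t)~card->latched;
}
```

Writing `0xBEEF` and reading back yields `0x4110`, the same pair of values shown on the Data In and Data Out lines.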

# **Massively Parallel Example**

- PC
  - Speed scales with # of instructions & clock speed
- Hardware
  - Speed scales with FPGA's:
    - Size
    - Clock Speed



### **Pipeline Example**

- PC: (x * ~10 clock cycles?) @ 3.0GHz
  - `for(i = 0; i < x; i++) f[i] = a[i] + b[i] * c[i] - d[i] ^ e[i];`
  - Speed scales with # of instructions & clock speed
- Hardware: (x + 3 clock cycles) @ 300MHz
  - Stage 1 → Stage 2 → Stage 3 → Stage 4 → Out (diagram ticks: 1ns, 2ns, 3ns, 4ns)
  - Speed scales with FPGA's:
    - Size
    - Clock speed
    - Slowest operation in the pipeline
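
The loop in this example has four operations (multiply, add, subtract, XOR), which is presumably why the hardware estimate is x + 3 clocks: one result per clock once a four-stage pipeline has filled. The sketch below is a software model of that idea under stated assumptions, not the talk's Verilog: the `stage_reg` type and `pipeline_run` name are hypothetical, values are 32-bit unsigned, and the stage split follows C precedence, where the expression parses as `((a + b*c) - d) ^ e`.

```c
#include <stddef.h>
#include <stdint.h>

/* Register between two pipeline stages: a partial result plus the
 * index of the element it belongs to. */
typedef struct {
    int      valid;
    size_t   idx;
    uint32_t val;
} stage_reg;

/* Models f[i] = a[i] + b[i] * c[i] - d[i] ^ e[i] as a 4-stage pipeline.
 * Returns the number of simulated clock cycles. */
unsigned long pipeline_run(const uint32_t *a, const uint32_t *b,
                           const uint32_t *c, const uint32_t *d,
                           const uint32_t *e, uint32_t *f, size_t n)
{
    stage_reg s1 = {0, 0, 0}, s2 = {0, 0, 0}, s3 = {0, 0, 0};
    unsigned long clocks = 0;
    size_t issued = 0, done = 0;

    while (done < n) {
        /* Stages are evaluated back to front so that each one consumes
         * the value its predecessor produced on the previous clock. */
        if (s3.valid) {                      /* stage 4: XOR, write out */
            f[s3.idx] = s3.val ^ e[s3.idx];
            done++;
        }
        s3 = s2;                             /* stage 3: subtract */
        if (s3.valid)
            s3.val -= d[s3.idx];
        s2 = s1;                             /* stage 2: add */
        if (s2.valid)
            s2.val += a[s2.idx];
        if (issued < n) {                    /* stage 1: multiply */
            s1.valid = 1;
            s1.idx = issued;
            s1.val = b[issued] * c[issued];
            issued++;
        } else {
            s1.valid = 0;
        }
        clocks++;
    }
    return clocks;
}
```

For x elements the model takes x + 3 clocks: three clocks to fill the pipeline, then one finished element per clock.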

### **Self-Reconfiguration Example**


- PC
  - `data = MultiplyArrays(a, b);`
  - `RC4(key, data, len);`
  - `m = MD5(data, len);`
- Hardware

Copyright (c) Dachb0den Labs & Pico Computing, Inc. 2004.

# **History of FPGAs and Cryptography**

- Minimal Key Lengths for Symmetric Ciphers
  - Ronald L. Rivest (R in RSA)
  - Bruce Schneier (Blowfish, Twofish, etc)
  - Tsutomu Shimomura (Mitnick)
  - A bunch of other ad hoc cypherpunks




| Attacker             | Budget | Tool      | 40-bits    | 56-bits    | Recommended key length (bits) |
|----------------------|--------|-----------|------------|------------|-------------------------------|
| Pedestrian Hacker    | Tiny   | Computers | 1 week     | infeasible | 45                            |
|                      | \$400  | FPGA      | 5 hours    | 38 years   | 50                            |
| Small Company        | \$10K  | FPGA      | 12 min     | 556 days   | 55                            |
| Corporate Department | \$300K | FPGA      | 24 sec     | 19 days    | 60                            |
|                      |        | ASIC      | 0.18 sec   | 3 hrs      |                               |
| Big Company          | \$10M  | FPGA      | 0.7 sec    | 13 hrs     | 70                            |
|                      |        | ASIC      | 0.005 sec  | 6 min      |                               |
| Intelligence Agency  | \$300M | ASIC      | 0.0002 sec | 12 sec     | 75                            |



- 40-bit SSL is crackable by almost anyone
- 56-bit DES is crackable by companies
- Scared yet?

#### This paper was published in 1996

- 1998
  - The Electronic Frontier Foundation (EFF)
  - Cracked DES in < 3 days
  - Searched ~9,000,000,000 keys/second
  - Cost < \$250,000
- 2001
  - Richard Clayton & Mike Bond (University of Cambridge)
  - Cracked DES on IBM ATMs
  - Able to export all the DES and 3DES keys in ~ 20 minutes
  - Cost < \$1,000 using an FPGA evaluation board
- 2004
  - Philip Leong, Chinese University of Hong Kong
  - IDEA
    - 50Mb/sec on a P4 vs. 5,247Mb/sec on Pilchard
  - RC4
    - Cracked RC4 keys 58x faster than a P4
    - Parallelized 96 times on an FPGA
    - Cracks 40-bit keys in 50 hours
    - Cost < \$1,000 using a RAM FPGA (Pilchard)



#### **Password File Cracker**


- Design
  - Pipeline design
  - Internal cracking engine
    - password = des\_crack(hash, options);
  - Interface over PCMCIA
  - Can specify cracking options
    - Bits to search
      - e.g. Search 55-bits (instead of 56)
    - Offset to start search
      - e.g. First card gets offset 0, second card gets offset 2\*\*55
    - Typeable/printable characters
    - Alpha-numeric
    - Allows for basic distributed cracking & resume functionality
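
The "bits to search" and "offset" options are enough to split a keyspace across several cards. The sketch below shows one way a host program might compute them. It is an illustration only: the `crack_range` type and `card_range` function are hypothetical names, not part of jawn, and it assumes the number of cards is a power of two.

```c
#include <stdint.h>

/* Hypothetical description of the slice of keyspace given to one card. */
typedef struct {
    uint64_t offset;  /* first key the card tries */
    unsigned bits;    /* the card searches 2^bits keys starting at offset */
} crack_range;

/* Split a 2^total_bits keyspace evenly across 2^card_bits cards and
 * return the slice for card number `card` (0-based). */
crack_range card_range(unsigned total_bits, unsigned card_bits, unsigned card)
{
    crack_range r;
    r.bits = total_bits - card_bits;
    r.offset = (uint64_t)card << r.bits;
    return r;
}
```

With two cards and a 56-bit keyspace, `card_range(56, 1, 0)` starts at offset 0 and `card_range(56, 1, 1)` starts at 2\*\*55, each searching 55 bits, as in the slide's example. Storing the last offset tried would give the resume behaviour in the same way.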



#### **Password File Cracker**

- PC (3.0GHz P4 w/ john)
  - ~ 300,000 c/s
- Hardware (Low end FPGA w/ jawn)
  - 100MHz/25 = ~4,000,000 c/s
  - When timing issues are resolved it should run at 200MHz

| Type                 | P4     | ROAG  | 8 ROAGs |
|----------------------|--------|-------|---------|
| 56-bits              | 3808 Y | 292 Y | 36 Y    |
| Typeable / printable | 381 Y  | 28 Y  | 3.5 Y   |
| Alphanumeric         | 14 Y   | 1.1 Y | 50 D    |
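
The figures in this table appear to be average-case times, meaning half the keyspace divided by the crack rate. The sketch below reproduces that arithmetic. The function names are hypothetical, and the 365-day year and the "half the keyspace" convention are assumptions inferred from the numbers rather than stated on the slide.

```c
#include <stdint.h>

#define SECONDS_PER_YEAR (365.0 * 24.0 * 60.0 * 60.0)

/* Number of candidate passwords of exactly `length` characters drawn
 * from a set of `charset` symbols. */
double keyspace_chars(unsigned charset, unsigned length)
{
    double keys = 1.0;
    for (unsigned i = 0; i < length; i++)
        keys *= (double)charset;
    return keys;
}

/* Average time in years to find a key: on average half of the
 * keyspace has to be searched before the right key turns up. */
double average_years(double keyspace, double crypts_per_sec)
{
    return (keyspace / 2.0) / crypts_per_sec / SECONDS_PER_YEAR;
}
```

At 300,000 c/s, `average_years((double)(1ULL << 56), 300000.0)` comes to about 3808 years, matching the P4 column, and `keyspace_chars(96, 8)` at the same rate comes to about 381 years. At exactly 4,000,000 c/s the 56-bit case works out to roughly 286 years, slightly less than the 292 Y shown, so the table was presumably computed from a slightly lower measured rate.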



# Up & Coming

- Pico (PCMCIA)
  - 20k CLBs (~ 600k gates) @ ~ 350MHz
  - (3x250MHz)/25 = ~30m c/s
- Picomon (Compact Flash)
  - 30k CLBs (~ 1m gates) @ ~ 400MHz
  - (5x300MHz)/25 = ~60m c/s
- Nest (PCI)
  - 16 Picomons
  - 480k CLBs (~ 16m gates) @ ~ 400MHz
  - (16x5x300MHz)/25 = ~960m c/s
  - NOTE: Straight DES cracking is ~ 24b c/s (> 2.5x faster than the EFF DES cracker)
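
The throughput figures above all follow one formula: parallel engines times clock rate, divided by 25. A minimal sketch of that arithmetic follows. The function names are hypothetical, and reading the "/25" as the 25 DES iterations of a traditional crypt(3) hash is an assumption, although it is consistent with the straight-DES figure being 25 times the Nest's crypt rate.

```c
/* Clocks spent per password hash, taken from the "/25" on the slide
 * (assumed to be the 25 DES iterations of traditional crypt(3)). */
#define CLOCKS_PER_CRYPT 25.0

/* Password hashes per second for `engines` pipelined cores at clock_hz. */
double crypts_per_second(unsigned engines, double clock_hz)
{
    return (double)engines * clock_hz / CLOCKS_PER_CRYPT;
}

/* Single DES operations per second: one per engine per clock. */
double des_per_second(unsigned engines, double clock_hz)
{
    return (double)engines * clock_hz;
}
```

For the Nest, `crypts_per_second(16 * 5, 300e6)` gives 960 million c/s and `des_per_second(16 * 5, 300e6)` gives 24 billion, about 2.7 times the ~9 billion keys/second quoted earlier for the EFF machine.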



# **Up & Coming Real Performance**


| Type                 | Pico | Picomon | Nest  |
|----------------------|------|---------|-------|
| 56-bits              | 36Y  | 19Y     | 1.2Y  |
| Typeable / printable | 3.8Y | 1.9Y    | 43D   |
| Alphanumeric         | 54D  | 27D     | 41H   |
| Straight DES         | 1.5Y | 277D    | 17.4D |



# **Artificial Intelligence**


- Back Propagation Neural Network
- Applications
  - Handwriting Recognition
  - Character Recognition
  - Voice Recognition
  - FFTs
  - Automatic Protocol Emulation
  - Pattern Matching
  - Etc.



### **BP Neural Networks**


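```
/* Running: each neuron sums its weighted inputs and subtracts its threshold */
for (i = 0; i < NEURONS; i++) {
    for (j = 0, x = 0; j < LayerDimms[i]; j++)
        x += y[j] * w[j][i];
    y[i] = x - t[i];
}

/* Training: repeat passes until the error drops to ERRMIN or below */
do {
    e = Train(y, x);    /* error of the latest training pass */
} while (e > ERRMIN);
```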
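
The "Running" loop above can be written as a self-contained function for a single layer. This is a sketch under stated assumptions rather than the talk's implementation: `layer_forward` is a hypothetical name, the weights are stored row-major as `w[j * n_out + i]`, and, like the slide's loop, it applies no activation function.

```c
#include <stddef.h>

/* Forward pass for one fully connected layer: for each output neuron i,
 * sum the weighted inputs and subtract that neuron's threshold t[i]. */
void layer_forward(const double *in, size_t n_in,
                   const double *w, const double *t,
                   double *out, size_t n_out)
{
    for (size_t i = 0; i < n_out; i++) {
        double x = 0.0;
        for (size_t j = 0; j < n_in; j++)
            x += in[j] * w[j * n_out + i];   /* weight from input j to neuron i */
        out[i] = x - t[i];
    }
}
```

Each output neuron's sum is independent of the others, which is what makes this loop a natural fit for the massively parallel and pipelined designs described earlier.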




#### Feedback?

- What do you think?
- Possible Applications?
- Questions?

# **Conclusions / Shameful Plugs**

- ToorCon 7
  - End of September, 2005
  - San Diego, CA USA
  - http://www.toorcon.org
- ShmooCon
  - Super Bowl Weekend, 2005
  - Washington DC, USA
  - http://www.shmoocon.com
- LayerOne
  - June, 2005
  - Los Angeles, USA
  - http://www.layerone.info


# **Questions ? Suggestions ?**


#### David Hulton

- 0x31337@gmail.com
- h1kari@dachb0den.com (will be back up soon!)
- OpenCores
  - http://www.opencores.org
- Xilinx
  - ISE Foundation (Free 60-day trial)
- Pico Computing, Inc.
  - http://www.picocomputing.com
  - Products will be available around March, 2005