





## Developing Custom Instructions & Peripherals for Embedded Processing





- Embedded Systems
- MP3 Player Demonstration
- SOPC Builder & Avalon<sup>™</sup> Switch Fabric
- Custom Peripherals
- Developing Custom Instructions for the Nios<sup>®</sup> Embedded Processor
- Applications & Examples





### What Is an Embedded System?

- Special Purpose Computer
- Consists of Both Hardware & Software
  - Usually Contains At Least One Microprocessor
  - May Utilize An Operating System
  - All I/O Is Task-Specific
- Employs Several Function-Specific Blocks
- Used Where Full-Size Computers Are
  - Too Big
  - Too Expensive
  - Too Generic in Purpose





# **Traditional Design Method**

**Typical Order of Steps** 

1. Select System Controller (Processor)
2. Select Available Peripherals
3. Define Custom Logic
4. Adapt to Bus Standard
5. Develop Decode Logic
6. Multiplex Data Paths
7. Design Arbitration Logic
8. Create Interrupt Scheme
9. Develop Timing Logic





### **System Implementation**







### What If...







**Typical Order of Steps** 

1. Select System Controller (Processor)
2. Select Available Peripherals
3. Define Custom Logic
4. Adapt to Bus Standard
5. Develop Decode Logic
6. Multiplex Data Paths
7. Design Arbitration Logic
8. Create Interrupt Scheme
9. Develop Timing Logic







### **Build MP3 Player**

Demonstration





### What Did We Just Do?

### The Power of SOPC Builder



### **MP3 Components Needed**

### **Complete MP3 Player System**






# **The Nios Microprocessor**

- Soft-Core Microprocessor from Altera<sup>®</sup>
- Features
  - Basic RISC Processor
  - Harvard Architecture
  - Multi-Stage Pipeline
  - 16- or 32-Bit Data Path
  - 16-Bit Instruction
  - 64 Prioritized Interrupts
  - Custom Instructions
- Optimized for Altera FPGAs





# **SOPC Builder-Ready IP**

SOPC Builder-Ready Certification Requirements

- Avalon / AHB Compatible Interface
- OpenCore<sup>®</sup> Evaluation Support
- Evidence of Functional System Verification
- Successful Generation & Compilation of SOPC Builder System
- Plug-&-Play Compatibility with SOPC Builder
- Examples of What's Available
  - Processors
  - PCI, Ethernet & Communication Cores
  - Memory & Memory Controllers
  - USB, I2C, SPI

### It's Also Easy to Create Your Own







# **Custom PWM Peripheral**

- Audio PWM
- Verilog HDL
- Avalon Interface
  - Use Only Needed Signals
  - Provides Access to:
    - Period
    - Pulse Width
- External Interface
  - PLL Clock Input
  - PWM Output
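
The role of the period and pulse-width registers can be pictured with a small software model. This is a minimal sketch in C, not the Verilog source from the demonstration: it assumes a free-running counter that wraps at the period and an output that is high while the counter is below the pulse width. The type and function names (`pwm_model_t`, `pwm_step`, `pwm_high_cycles`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the two PWM registers; not the actual HDL. */
typedef struct {
    uint32_t period;      /* counter wraps after this many clocks (assumed) */
    uint32_t pulse_width; /* output is high for this many clocks per period */
    uint32_t counter;     /* free-running counter, 0 .. period - 1 */
} pwm_model_t;

/* Advance the model by one clock and return the output level (0 or 1). */
int pwm_step(pwm_model_t *pwm)
{
    assert(pwm->period > 0);
    int out = pwm->counter < pwm->pulse_width;
    pwm->counter = (pwm->counter + 1) % pwm->period;
    return out;
}

/* Count how many of the next `clocks` cycles the output is high. */
uint32_t pwm_high_cycles(pwm_model_t *pwm, uint32_t clocks)
{
    uint32_t high = 0;
    for (uint32_t i = 0; i < clocks; i++)
        high += (uint32_t)pwm_step(pwm);
    return high;
}
```

With a period of 8 and a pulse width of 2, the model's output is high for 2 of every 8 clocks, which is the 25% duty cycle software would request by writing those two values.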







### **Custom Instruction: fmul**

### Function Replaced with Hardware







### **MP3 Player Result**



### **SOPC Builder Creates All the Interconnect**







### What Can SOPC Builder Do for My System?

A Closer Look at the Tool



# **How SOPC Builder Helps**

- Automates Block-Based Design
  - System Definition
  - Component Integration
  - System Verification
  - Software Generation
- Fast & Easy



- Supports Design Reuse
  - Third-Party Intellectual Property (IP) Cores
  - Internally Developed Peripherals





### **SOPC Builder – System Integration**




### **Slave Side Arbitration**

[Figure: two panels, "Bus Arbitration" and "Slave Side Arbitration". Each shows Master 1 (CPU) and Master 2 (DMA) reaching Slave 1 (UART) and Slave 2 (Memory) through arbiter blocks.]

### **Higher System Throughput & Efficiency**
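
Why slave-side arbitration raises throughput can be shown with a small cycle model. The sketch below is an illustration in C under stated assumptions: each slave owns its own arbiter, and the round-robin policy, the two-master/two-slave sizes, and all names are hypothetical rather than taken from SOPC Builder's generated logic.

```c
#define NUM_MASTERS 2
#define NUM_SLAVES  2
#define NO_REQUEST  (-1)

/* One arbiter per slave. It remembers which master it served last
   (round-robin is an assumed policy for this sketch). */
typedef struct {
    int last_granted;
} slave_arbiter_t;

/*
 * request[m] is the slave index master m wants this cycle, or NO_REQUEST.
 * grant[m] is set to 1 if master m may proceed, 0 if it must wait.
 * Returns the number of masters granted in this cycle.
 */
int arbitrate_cycle(slave_arbiter_t arb[NUM_SLAVES],
                    const int request[NUM_MASTERS],
                    int grant[NUM_MASTERS])
{
    int granted = 0;
    for (int m = 0; m < NUM_MASTERS; m++)
        grant[m] = 0;

    for (int s = 0; s < NUM_SLAVES; s++) {
        /* Start searching just after the master this slave served last. */
        for (int i = 1; i <= NUM_MASTERS; i++) {
            int m = (arb[s].last_granted + i) % NUM_MASTERS;
            if (request[m] == s) {
                grant[m] = 1;
                arb[s].last_granted = m;
                granted++;
                break; /* at most one master per slave per cycle */
            }
        }
    }
    return granted;
}
```

When the CPU targets the UART while the DMA targets memory, both are granted in the same cycle. Masters wait only when they contend for the same slave.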





# **Dynamic Bus Sizing**

#### Narrow Slave

- Can Be Translated to Master's Width
- Or Upper Bits Can Be Masked
- Your Choice Transparent to Master
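
The two options for a narrow slave can be sketched in C. This is a behavioral illustration only: the 8-bit slave, the little-endian byte order, and every name here are assumptions made for the example, not a description of the generated fabric.

```c
#include <stdint.h>

/* Hypothetical 8-bit-wide slave: a small byte array stands in for the device. */
typedef struct {
    uint8_t bytes[16];
    unsigned reads; /* counts narrow accesses issued by the fabric model */
} narrow_slave_t;

/* One native 8-bit access to the slave. */
uint8_t slave_read8(narrow_slave_t *s, uint32_t byte_addr)
{
    s->reads++;
    return s->bytes[byte_addr % sizeof s->bytes];
}

/* Option 1: translate one 32-bit master read into four 8-bit slave reads
   and assemble the word (little-endian order assumed). */
uint32_t read32_translated(narrow_slave_t *s, uint32_t byte_addr)
{
    uint32_t word = 0;
    for (unsigned i = 0; i < 4; i++)
        word |= (uint32_t)slave_read8(s, byte_addr + i) << (8 * i);
    return word;
}

/* Option 2: issue a single narrow read and mask the upper 24 bits to zero. */
uint32_t read32_masked(narrow_slave_t *s, uint32_t reg_index)
{
    return (uint32_t)slave_read8(s, reg_index) & 0xFFu;
}
```

In both cases the master issues one ordinary 32-bit read. The difference is whether the fabric performs four narrow accesses or one.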









### **SOPC Builder - System Verification**

#### Automated Simulation Generation

- Generate Complete System Simulation Model
- Generate Testbenches
- Set Up Project Environment
- Immediate Simulation of Hardware & Software

[Screenshot: simulator waveform window. Signal groups shown: data master (`d_write`, `d_address`, `d_read`, `d_wait`, `d_byteenable`, `d_writedata`), UART (`chipselect` and address/data signals), and UART internals (`tx_ready`, `tx_data`, `rx_char_ready`). Time axis spans roughly 30250 ns to 30400 ns.]





# **SOPC Builder Software Support**

- Software Development Kit (SDK) Automatically Generates
  - Headers (INC)
    - Memory Map
    - Register Declarations
  - Libraries (LIB)
    - Runtime
  - Source (SRC)
    - Supplied by Peripherals
    - Examples for Processor
- Uses Software Compilers
  - Compile Runtime Libraries
  - Generate Memory Contents
  - Hardware & Software Simulation
- Advanced Software Components
  - Network Protocol Library
  - RTOS Components
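
To make "Memory Map" and "Register Declarations" concrete, here is a rough sketch in C of the kind of content such a header could carry. Everything in it is hypothetical: the macro names, the addresses, and the register layout are invented for illustration and are not copied from a generated SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory map entries; real names and addresses would come
   from the SDK generated for a particular system. */
#define EXAMPLE_UART_BASE 0x00000400u
#define EXAMPLE_PWM_BASE  0x00000420u

/* Hypothetical register declaration for a two-register PWM peripheral. */
typedef struct {
    volatile uint32_t period;      /* offset 0x0 */
    volatile uint32_t pulse_width; /* offset 0x4 */
} example_pwm_regs_t;

/* Program the PWM through a register pointer. On hardware the pointer would
   be (example_pwm_regs_t *)EXAMPLE_PWM_BASE; a host test can pass an
   ordinary struct instead. */
void example_pwm_set(example_pwm_regs_t *regs, uint32_t period,
                     uint32_t duty_percent)
{
    regs->period = period;
    regs->pulse_width = (uint32_t)(((uint64_t)period * duty_percent) / 100u);
}
```

Application code then refers to peripherals by name rather than by hard-coded address, so regenerating the system with a different memory map only requires a recompile.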



[Screenshot: SOPC Builder "More nios_0 Settings" tab. The Nios System Settings table has columns Function, Module, Offset, and Address, with rows for Reset Location (offset 0x0), Vector Table (256 bytes, offset 0x0), Program Memory, Data Memory, Primary Serial Port (printf, GERMS), Auxiliary Serial Port, and System Boot ID (25 chars max). The Software Components list includes the Altera Plugs TCP/IP Networking Library, described as lightweight and RTOS-independent.]





### **Nios Processor in SOPC Builder**

#### Allows You to Create A Custom Instruction







### **Custom Instruction - Performance**

Replace Library Call with Custom Instruction

`#define mad_f_mul(x,y) nm_fmul(x,y)`

Dramatically Accelerate Software Algorithms

| Category                                      | Cycles to Complete `mad_synth_frame()` | Logic Elements Used |
|-----------------------------------------------|----------------------------------------|---------------------|
| CPU with Hardware Multiplier                  | 1,279,000                              | n                   |
| CPU with fmul (Hardware Multiplier Removed)   | 293,000                                | n + 100             |

Entire Function Sees 4x Improvement Just from fmul Acceleration
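
For readers who want to see what is being replaced, the sketch below gives a software reference for a fixed-point multiply and shows the macro switch in the style of the slide's `#define`. It assumes a signed 32-bit format with 28 fractional bits, as used by libmad's `mad_fixed_t`. The helper names and the `USE_NIOS_FMUL` build flag are hypothetical, and the internals of the `fmul` hardware are not described in the source.

```c
#include <stdint.h>

#define FRAC_BITS 28 /* assumed Q4.28 layout, as in libmad's mad_fixed_t */

typedef int32_t fixed_t;

/* Software reference: full 64-bit product, then drop the extra fraction bits.
   Right-shifting a negative value is implementation-defined in C; this sketch
   assumes the usual arithmetic shift. */
fixed_t fmul_reference(fixed_t x, fixed_t y)
{
    return (fixed_t)(((int64_t)x * (int64_t)y) >> FRAC_BITS);
}

/* Conversion helpers used only for illustration. */
fixed_t fixed_from_double(double v)
{
    return (fixed_t)(v * (double)(1L << FRAC_BITS));
}

double fixed_to_double(fixed_t v)
{
    return (double)v / (double)(1L << FRAC_BITS);
}

/* Select the implementation at compile time. USE_NIOS_FMUL is a hypothetical
   build flag; nm_fmul exists only in a Nios build that includes the
   custom instruction. */
#ifdef USE_NIOS_FMUL
#define mad_f_mul(x, y) nm_fmul((x), (y))
#else
#define mad_f_mul(x, y) fmul_reference((x), (y))
#endif
```

Because callers only ever see `mad_f_mul`, the rest of the decoder is unchanged when the hardware version is swapped in. From the table, 1,279,000 / 293,000 is about 4.4, which is the "4x" quoted on the slide.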







### **Run MP3 Demo**

### System Created in Minutes





### **How Does It Work?**

Looking Under the Hood



# **Avalon Switch Fabric**

- Avalon SOPC Interface Standard
  - Backbone of SOPC Builder
  - Easy to Use Interface
  - Parameterized
  - Optimized for Altera FPGAs
  - Introduced in Fall 2000
    - Native Bus for Nios Processor
  - Has Since Expanded
    - Altera & AMPP<sup>SM</sup> IP Cores
    - Customer-Defined Peripherals
    - 100+ Cores Planned for 2003







### **Bus Interface Standards**

- Why Bus Standards Are Used
  - Flexibility
    - Provides Wide Range of Capabilities in One Package
    - Guarantees Compatibility
  - Bus Designed to Handle All Contingencies
- Pitfalls of Typical Bus Standards
  - Must Be Complex to Support Everything
  - Even Small Peripherals Must Fully Comply

### Sledgehammer Is Used for Every Size Nail





# **Avalon Switch Fabric Is Different**

- Fabric Custom-Generated for Peripherals
  - Contingencies on per-Peripheral Basis
  - System Is Not Burdened by Bus Complexity
- SOPC Builder Automatically Generates
  - Arbitration
  - Address Decoding
  - Data Path Multiplexing
  - Bus Sizing
  - Wait-State Generation
  - Interrupts
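
Address decoding, the second item in the list above, amounts to mapping a master address onto exactly one slave's chip select. The following C sketch models that lookup in software. The memory map values and all names are hypothetical, and real decode logic is generated as hardware rather than as a table walk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory map entry: one per slave port in the system. */
typedef struct {
    const char *name;
    uint32_t base; /* first byte address */
    uint32_t span; /* size in bytes */
} map_entry_t;

/* Return the index of the slave selected by `addr`,
   or -1 if no slave decodes that address. */
int decode_address(const map_entry_t *map, size_t count, uint32_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= map[i].base && addr - map[i].base < map[i].span)
            return (int)i;
    }
    return -1;
}
```

Because the table is built per system, a peripheral with two registers costs only the decode it actually needs.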





### **Traditional Bus Master / Slave**

- Must Comply Fully with Chosen Bus Standard
  - Bus Standard Adds Complexity
  - Consumes Resources
  - Designed in Reverse
    - Design Starts at Bus Interface
    - Back-End Adapted to Comply

### Result = Non-Optimal Implementation







### **Traditionally Designed System**





#### Large Amount of Engineering Overhead!



### **Avalon Slave**

- No Need to Worry about Bus Interface
- Use Interface Optimal for Nature of Peripheral
- Implement Only Signals Needed
- Avalon Switch Fabric Adapts to Peripherals
- Timing Automatically Handled
- Fabric Created for You
- Arbiters Generated for You

### **Concentrate Effort on Peripheral Functionality!**







### **Avalon System**





**Designer Only Needs to Worry About Peripherals** 



### **Example Avalon Peripherals**

- Master Peripheral that Can Write & Read
- Read-Only Slave Peripheral with waitrequest











# **Creating Custom Peripherals**

How Do I Develop My Own Hardware for Use in SOPC Builder?



# **Reasons for Custom Hardware**

- Acceleration
  - Replace Software with Hardware
- Proprietary Functions
  - Algorithms
  - Product Differentiation
  - Design Reuse
- Availability
  - No Such Ready-Made IP







# **Creating an Avalon Slave**

#### **Pulse Width Modulator**





# **Creating an Avalon Slave**

### PWM Peripheral

- Verilog HDL
- Only 9 Ports
- Dynamic Bus Sizing
  - You Pick Data Width
  - Avalon Switch Fabric Adapts
  - Register vs. Memory

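```
// Avalon slave port list for the PWM peripheral.
// Suffix _n is read here as active low, following the usual naming convention.
module avalon_pwm (
    clk,
    wr_data,
    byte_n,
    cs,
    wr_n,
    addr,
    clr_n,
    rd_data,
    pwm_out
);
    input         clk;      // system clock
    input  [31:0] wr_data;  // write data from the master
    input  [3:0]  byte_n;   // byte enables
    input         cs;       // chip select
    input         wr_n;     // write strobe
    input         addr;     // register select
    input         clr_n;    // clear
    output        rd_data;  // read data (width as shown on the slide)
    output        pwm_out;  // PWM output

    // Peripheral logic is not shown on the slide.
endmodule
```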





# **Creating an Avalon Master**

- Example
  - POR Controller
    - State Machine
    - Avalon Interface
  - Bring Up System

### Simple & Easy

- Avalon Master
- 4-State FSM
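
A four-state master of this kind can be modelled in a few lines of C. The sketch below is illustrative only: the slide does not name the states, so `ST_IDLE`, `ST_WRITE`, `ST_NEXT`, and `ST_DONE` are hypothetical, as is the idea of driving a table of initialization writes. The one Avalon behavior it relies on is that a master holds its address, data, and write signals while `waitrequest` is asserted.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ST_IDLE, ST_WRITE, ST_NEXT, ST_DONE } por_state_t;

/* One initialization write: where to write and what value. */
typedef struct {
    uint32_t address;
    uint32_t data;
} init_write_t;

typedef struct {
    por_state_t state;
    const init_write_t *table; /* initialization writes to perform */
    size_t count;
    size_t index;
    /* Outputs toward the fabric */
    int write;
    uint32_t address;
    uint32_t writedata;
} por_master_t;

/* Advance one clock. `waitrequest` is the fabric's stall input this cycle. */
void por_master_clock(por_master_t *m, int waitrequest)
{
    switch (m->state) {
    case ST_IDLE:
        m->index = 0;
        m->state = (m->count > 0) ? ST_WRITE : ST_DONE;
        break;
    case ST_WRITE:
        /* Hold address, data, and write until the slave accepts. */
        m->write = 1;
        m->address = m->table[m->index].address;
        m->writedata = m->table[m->index].data;
        if (!waitrequest)
            m->state = ST_NEXT;
        break;
    case ST_NEXT:
        m->write = 0;
        m->index++;
        m->state = (m->index < m->count) ? ST_WRITE : ST_DONE;
        break;
    case ST_DONE:
        m->write = 0;
        break;
    }
}
```

The master never needs to know how slow a given slave is. It simply stays in the write state until the fabric releases `waitrequest`.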







# **Interface to User Logic**

[Screenshot: "Interface to User Logic" dialog for `audio_pwm`, Ports tab. Design file: `pwm.v`; top module: `pwm`. The address span field reads 0x10000000 with 32 bits.]

| Port Name     | Width | Direction | Type       |
|---------------|-------|-----------|------------|
| wave_clk      | 1     | input     |            |
| reset_n       | 1     | input     | reset_n    |
| pwm_select    | 1     | input     | chipselect |
| reg_address   | 1     | input     | address    |
| write_n       | 1     | input     | write_n    |
| data_from_cpu | 16    | input     | writedata  |
| periodic_irq  | 1     | output    | irq        |
| wave_out      | 1     | output    | export     |

- Publish Custom Hardware As SOPC Builder Component
- Choose Interface Type:
  - Register Slave
  - Memory Slave
  - Avalon Master
- Add Design Files that Describe User Logic
- Automatically Define Port Table from Design Files
- Make Port Changes or Enter Ports Manually





# **Interface to User Logic**

- Specify Timing Requirements
  - Setup
  - Hold
  - Wait States
- Units
  - Time
  - Clock Cycles

[Screenshot: "Interface to User Logic" dialog for `audio_pwm`, Timing tab. Setup: 0, Wait: 0, Hold: 0, Units: ns. System clock: 66.66 MHz; timing granularity is system clock cycles (15 ns). Read and write waveform previews are shown.]
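
Since timing may be entered in either time units or clock cycles, a value in nanoseconds has to be expressed in whole cycles of the system clock at some point. The C sketch below shows one plausible conversion. Rounding up is an assumption chosen so a requirement is never shortened, and the function name is hypothetical.

```c
#include <stdint.h>

/* Round a timing requirement in nanoseconds up to whole clock cycles.
   clock_khz is the system clock in kHz (66.66 MHz is 66660). */
uint32_t ns_to_cycles(uint32_t time_ns, uint32_t clock_khz)
{
    /* cycles = ceil(time_ns * clock_khz / 1e6); 64-bit avoids overflow. */
    uint64_t scaled = (uint64_t)time_ns * clock_khz;
    return (uint32_t)((scaled + 999999u) / 1000000u);
}
```

At 66.66 MHz one cycle is about 15 ns, so a 20 ns wait requirement would occupy two cycles under this rounding.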







### **Creating Custom Instructions for Nios**

Augmenting Your Embedded Processor's Instruction Set



# **Custom Instruction - Definition**

- Dramatically Accelerates Software Algorithms Using Hardware
- Extends Nios Instruction Set
  - Up to Five Instructions
- SOPC Builder Development Tool
  - Automatically Adds User Logic to Nios ALU
  - Assigns Op-Code
  - Generates C & Assembly Macros







# **Custom Instruction - Software**

#### Code Macros (include excalibur.h)

- nm\_<macro\_name> (dataa, datab)
- nm\_<macro\_name>\_pfx (prefix, dataa, datab)
- Assembly Code
  - Use Opcodes or Assembly Macro

| Instruction           | Comment                                        |
|-----------------------|------------------------------------------------|
| `LD %r1,[%L6]`        | Load word at [%L6] into %r1                    |
| `LD %r0,[%L2]`        | Load word at [%L2] into %r0                    |
| `PFX 1`               | Only needed if using prefix                    |
| `nm_my_cust_inst %r1` | Macro calling a Rw opcode, r1 <= r1 "OP" r0    |
| `ST [%L4],%r1`        | %L4 is the pointer, %r1 is stored              |

Makes Your Custom Instruction Look Like a Normal C Function Call
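
The two macro shapes listed above can be tried on a host machine by standing in a C function for the hardware. This is a sketch under assumptions: `my_cust_inst` is a hypothetical instruction modelled here as a byte-order swap, and the meaning given to the prefix (selecting a half-word variant) is invented for the example. In a real Nios build the macros would come from `excalibur.h` and expand to the custom opcode instead.

```c
#include <stdint.h>

/* Host-side stand-in for a hypothetical custom instruction.
   prefix == 0: swap the byte order of the 32-bit dataa.
   prefix != 0: swap the bytes within each 16-bit half (assumed variant).
   datab is unused in this model. */
uint32_t my_cust_inst_model(uint32_t prefix, uint32_t dataa, uint32_t datab)
{
    (void)datab;
    if (prefix != 0)
        return ((dataa & 0x00FF00FFu) << 8) | ((dataa & 0xFF00FF00u) >> 8);
    return ((dataa & 0x000000FFu) << 24) |
           ((dataa & 0x0000FF00u) << 8)  |
           ((dataa & 0x00FF0000u) >> 8)  |
           ((dataa & 0xFF000000u) >> 24);
}

/* Same call shapes as the generated macros: plain and prefixed. */
#define nm_my_cust_inst(dataa, datab) \
    my_cust_inst_model(0u, (dataa), (datab))
#define nm_my_cust_inst_pfx(prefix, dataa, datab) \
    my_cust_inst_model((prefix), (dataa), (datab))
```

Code written against these macros reads like ordinary function calls, so an algorithm can be developed and tested on a host and later rebuilt for the target with the hardware-backed macros.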





### Custom Instruction - Integration

Import "fmul" into Nios CPU

[Screenshot: Altera Nios 3.0 wizard for `nios_0`, Custom Instructions tab. The library list includes Bit Swap and Endian Converter. `fmul` is assigned to opcode USR0 with a cycle count of 2; opcodes USR1 through USR4 are unassigned. The dialog points to custom_instruction_readme.txt, Application Note 188, and the Custom Instructions Tutorial.]

[Screenshot: "Interface to User Logic" dialog for `USR0_nios_0`, Ports tab. Bus interface type: Custom Instruction. Design file: `fmul.bdf`; top module: `fmul`.]

| Port Name | Width | Direction | Type   |
|-----------|-------|-----------|--------|
| dataa     | 32    | input     | dataa  |
| datab     | 32    | input     | datab  |
| start     | 1     | input     | start  |
| reset     | 1     | input     | reset  |
| clk_en    | 1     | input     | clk_en |
| clk       | 1     | input     | clk    |
| prefix    | 11    | input     | prefix |
| result    | 32    | output    | result |





# Which One Do I Use?

- Custom Instruction
  - Used for Low-Clock-Cycle Calculations
  - Provides Quick Access to Inputs/Output
  - Accessed Only by CPU
  - Stalls CPU

- Custom Peripheral
  - Used for Labor-Intensive Operations
  - Accessible through the Avalon Bus
  - Accessible by Other Masters (e.g., DMA)
  - CPU Independent







### **Applications & Examples**

#### **Real Customer Designs**



# **Example 1: SOPC Reality**

Top



**Bottom** 









The Accordion Stackup















### **Example 1: SOPC Reality**







#### Application

- Quality & Assurance During System Production
- Function
  - Intercept Digital-Image Data Stream
  - Display Image on VGA Monitor
  - Verify Image Integrity
- No Need for Processor





















## **Example 3: Mandelbrot Algorithm**

```
int float_mandelbrot(float cr, float ci, int max_iter)
{
    float xsqr = 0.0, ysqr = 0.0, x = 0.0, y = 0.0;
    int iter = 0;

    /* Iterate until the point escapes (|z|^2 >= 4) or the limit is reached. */
    while (((xsqr + ysqr) < 4.0) && (iter < max_iter))
    {
        xsqr = x * x;
        ysqr = y * y;
        y = (2 * x * y) + ci;
        x = xsqr - ysqr + cr;
        iter++;
    }

    return iter;
}
```



# **Example 3: Optimizations**

- Floating-Point Software in FPU Co-Processor
- Floating-Point Software in Integer Software
- Integer Software Done in Hardware
- Add DMA Transfer to Hardware Acceleration
- Parallelize Subsections of Display
- Simplify Control Master
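
The step from floating-point to integer software can be sketched by rewriting the escape-time loop with fixed-point arithmetic. The version below is a minimal illustration, not the code from the demonstration: the Q8.24 format, the helper `fix_mul`, and the function name are assumptions, and the inputs are expected to lie in [-2, 2] so that intermediate values stay in range.

```c
#include <stdint.h>

#define FIX_SHIFT 24                        /* assumed Q8.24 fixed-point format */
#define FIX_ONE   ((int32_t)1 << FIX_SHIFT)
#define FIX_FOUR  (4 * FIX_ONE)

/* Fixed-point multiply: 64-bit product, then rescale. Right-shifting a
   negative value is implementation-defined in C; an arithmetic shift is
   assumed here. */
static int32_t fix_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> FIX_SHIFT);
}

/* Integer-only version of the escape-time loop.
   cr and ci are Q8.24 values in the range [-2, 2]. */
int mandelbrot_fixed(int32_t cr, int32_t ci, int max_iter)
{
    int32_t xsqr = 0, ysqr = 0, x = 0, y = 0;
    int iter = 0;

    while ((xsqr + ysqr) < FIX_FOUR && iter < max_iter) {
        xsqr = fix_mul(x, x);
        ysqr = fix_mul(y, y);
        y = 2 * fix_mul(x, y) + ci;
        x = xsqr - ysqr + cr;
        iter++;
    }
    return iter;
}
```

Once the loop is expressed in integer multiplies, adds, and compares, each operation maps directly onto hardware, which is what makes the later steps in the list (hardware acceleration, DMA, parallel sections) practical.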







# **Example 3: Mandelbrot**

Demonstration



# Conclusion

- Altera Delivers System-Level Integration Solutions
  - SOPC Builder
  - Avalon
  - Nios
- SOPC Builder Accelerates Embedded System Design
  - Design Customization & IP Re-Use
  - Hardware Acceleration
  - Rapid Software Development







