









## **Observe: Profiling**

- Profiling is one of the simplest and most effective tools for observing simulation bottlenecks.
  - vcs +prof -Mupdate -PP +vcsd -f filelist -> vcs.prof

TOP LEVEL VIEW

| TYPE   | %Totaltime |
|--------|------------|
| PLI    | 0.10       |
| VCD    | 61.40      |
| KERNEL | 1.88       |
| DESIGN | 36.00      |

Rajesh Bawankule, Cisco Systems.

- This means that about 61% of the total time is spent dumping the VCD file. Someone forgot to turn off the dumping switch! When running regressions there is no need to keep dumping on. Just turning off the dumping gave us a 2x speedup.
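A check like this can be automated in a regression flow. The sketch below is a minimal illustration, not part of VCS: it assumes the top-level view has been copied into a plain two-column text block like the one on the slide (the real `vcs.prof` layout may differ), and the function names and the 10% threshold are hypothetical.

```python
def parse_top_level(text):
    """Parse 'TYPE %Totaltime' rows into a dict of {type: percent}."""
    shares = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            shares[name] = float(value)
        except ValueError:
            continue  # header row or any other non-numeric line
    return shares


def non_design_overheads(shares, threshold=10.0):
    """Return non-DESIGN categories at or above `threshold` percent, largest first."""
    hot = [(n, p) for n, p in shares.items() if n != "DESIGN" and p >= threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Assumed input: the top-level view from the slide, as plain text.
REPORT = """\
TYPE %Totaltime
PLI 0.10
VCD 61.40
KERNEL 1.88
DESIGN 36.00
"""

for name, pct in non_design_overheads(parse_top_level(REPORT)):
    print(f"{name} takes {pct}% of run time; check whether it is needed")
```

With the numbers above, only VCD is flagged, which is the dumping overhead described in the note.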



## **Observe: Profiling - II**

MODULE VIEW

| Module(index)      | %Totaltime | No of Instances | Definition                      |
|--------------------|------------|-----------------|---------------------------------|
| LIB_FF_T (1)       | 3.94       | 375             | /libpath/LIB_FF_T.v:45          |
| LIB_FF (2)         | 3.87       | 520             | /libpath/LIB_FF.v:42            |
| SubtractBlock (3)  | 2.84       | 128             | /rtl/SubtractBlock.v:14         |
| SPARES (4)         | 2.70       | 375             | /global/SPARES.v:7              |
| MVE16384066084 (5) | 2.67       | 16              | /macrocells/MVE16384066084.v:53 |
| IO_CELL (10)       | 1.17       | 87              | /rtl/IO_CELL.v:22               |

- 13-14% of the total time is spent in the 2 flops and the spare gates block. Since design (logic) simulation accounts for only about 40% of the total time (the DESIGN row of the top-level view, 36%, rounded up), around 35% of the logic simulation time is spent on these gates: (100/40) × 14 = 35. Well, at least for this test. Other tests will hammer some other portion of the logic.
- The spare gates as well as the IO macro cells contained instances of flops and gates that do not get optimized away after synthesis. I created a blank module for the spare gates and wrote a simpler model for the flip-flop to reduce run times. Combined with not dumping, I could get around a 4x speedup.
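The arithmetic behind these figures can be sketched as follows. This is a back-of-the-envelope estimate under a stated assumption: the cost of whatever is removed drops to zero, so the results are upper bounds rather than predictions. The constant and function names are illustrative.

```python
# Figures quoted in the notes and the top-level view (percent of total run time).
VCD_PCT = 61.40     # time spent dumping the VCD file
DESIGN_PCT = 40.0   # the DESIGN row is 36.00; the notes round it to 40
HOT_PCT = 14.0      # flops plus spare gates, upper end of the 13-14% range


def share_of_design(module_pct, design_pct):
    """Rescale a percent-of-total figure to a percent of design (logic) time."""
    return 100.0 * module_pct / design_pct


def speedup_bound(removed_pct):
    """Upper bound on speedup if `removed_pct` of the run time disappears."""
    return 100.0 / (100.0 - removed_pct)


print(share_of_design(HOT_PCT, DESIGN_PCT))        # 35.0
print(round(speedup_bound(VCD_PCT), 1))            # 2.6
print(round(speedup_bound(VCD_PCT + HOT_PCT), 1))  # 4.1
```

The bounds of roughly 2.6x (dumping off) and 4.1x (dumping off plus the hot modules) are in line with the 2x and 4x speedups reported in these notes. Real gains land below the bound because a simplified flop model still costs some time.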





## **Optimized compilations**

- Synopsys VCS has built-in Radiant technology, which can dramatically speed up simulation during long regression runs.
  - +rad or +rad+2 specifies Radiant level 2
  - +rad+1 specifies Radiant level 1
  - +optconfigfile applies optimizations to part of the design using a configuration file
- Radiant optimizations do not work with coverage tools or SDF back-annotated simulations.
- 4x to 10x performance boost














- Bit-blasted buses generate more simulation events and thus slow down the simulation. Carefully removing the IO cell wrapper from the design and the bit splicer from the simulation testbench will give you a substantial boost in simulation performance.
- I have seen 25-40% savings in run time by doing this. You still need to run a few key tests with the IO wrappers in place to have confidence in your setup.
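A toy model shows why bit blasting multiplies events. It is a simplification, not a description of how VCS schedules events: it assumes a bus kept as one vector produces one event per value change, while a bit-blasted bus produces one event per bit that toggles. The function names and the random 32-bit trace are illustrative.

```python
import random


def vector_events(values):
    """One event per change of the whole bus value."""
    return sum(1 for old, new in zip(values, values[1:]) if old != new)


def bit_blasted_events(values):
    """One event per individual bit that toggles between consecutive values."""
    return sum(bin(old ^ new).count("1") for old, new in zip(values, values[1:]))


# Hypothetical traffic: 1000 random values on a 32-bit bus.
rng = random.Random(0)
trace = [rng.getrandbits(32) for _ in range(1000)]

print("vector events:     ", vector_events(trace))
print("bit-blasted events:", bit_blasted_events(trace))
```

Under this model the bit-blasted count can never be lower than the vector count, and for wide buses with busy data it is many times higher.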





Switches that hinder performance (compile-time, runtime, and memory usage):

| Switch                  | Effect                                                          |
|-------------------------|-----------------------------------------------------------------|
| +cli                    | Turn on interactive debug on entire design                      |
| +cli+mod=#              | Turn on interactive debug just for module mod                   |
| -line                   | Allow line stepping in debugger                                 |
| +acc                    | Obsolete flag to allow global PLI access                        |
| -O0                     | Turn off optimizations                                          |
| -I                      | Obsolete flag for interactive GUI debug                         |
| -RI                     | Compile and run interactive debug with VirSim GUI               |
| -PP                     | Allow VirSim vcd+ binary dumping for post-processing debug      |
| -X*                     | Version-specific flags to work around specific bugs             |
| +race                   | Turn on race detection                                          |
| +prof                   | Turn on VCS profiling                                           |
| -gen_c                  | Force generation of C intermediate code instead of native object code |
| -gen_asm                | Force generation of assembly code instead of native object code |
| -S                      | The same as -gen_asm                                            |
| -P                      | A pli.tab which contains acc=rw,cbk:* globally turns on PLI access |
| +multisource_int_delays | Enables global PLI access visibility                            |
| +transport_int_delay    | Enables global PLI access visibility                            |
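A regression script can screen its own compile command for such switches before launching a long run. The sketch below is a hypothetical helper, not a VCS feature: it covers only a subset of the switches listed above, matches each switch either exactly or as a prefix followed by `+` (so `+cli+mod=#` is caught by `+cli`), and does not inspect the contents of a `-P` pli.tab file.

```python
import shlex

# Subset of the switches from the table, with a short reason for each.
SLOW_SWITCHES = {
    "+cli": "interactive debug",
    "-line": "line stepping in debugger",
    "+acc": "global PLI access",
    "-RI": "interactive debug with VirSim GUI",
    "-PP": "vcd+ dumping for post-processing debug",
    "+race": "race detection",
    "+prof": "profiling",
    "-gen_c": "C intermediate code instead of native code",
    "-gen_asm": "assembly code instead of native code",
    "-S": "same as -gen_asm",
}


def find_slow_switches(command):
    """Return (token, reason) pairs for performance-hindering switches in `command`."""
    hits = []
    for token in shlex.split(command):
        for switch, reason in SLOW_SWITCHES.items():
            if token == switch or token.startswith(switch + "+"):
                hits.append((token, reason))
                break
    return hits


for token, reason in find_slow_switches("vcs +prof -Mupdate -PP +vcsd -f filelist"):
    print(f"{token}: {reason}")
```

Run against the profiling command shown earlier in these slides, it reports `+prof` and `-PP`, both of which are fine for a profiling run but should not be left in a regression command line.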

















