HCMC UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY OF HIGH QUALITY TRAINING
FINAL PROJECT
Course name: HARDWARE/SOFTWARE CODESIGN
CREATING A PROCESSOR SYSTEM
Lecturer name:
List of members:
Ho Chi Minh City, 06/2022
LECTURER COMMENT
TT   Content                           Comment
1    Introduction
2    Creating a Processor System
3    Creating a Processor System Lab
4    Conclusion
General comment:
...........................................................................................................................................
...........................................................................................................................................
Lecturer's signature
ACKNOWLEDGEMENT
On completing the Hardware/Software Codesign course, we would like to express our
heartfelt gratitude to Assoc. Prof. Phan Van Ca, who enthusiastically guided us and
equipped us with the necessary knowledge this semester. He also supervised this
project directly and created favorable conditions for us throughout the work.
Because of the short implementation time and our limited knowledge of the topic,
the project's limitations and errors have not been completely overcome. We look
forward to receiving your advice and suggestions.
Student
CONTENT
ACKNOWLEDGEMENT
CONTENT
FIGURE LIST
PART 1. INTRODUCTION
1.1. Introduction
1.2. Purpose and requirements
1.3. Layout
PART 2. CREATING A PROCESSOR SYSTEM
2.1. Embedded System Design in Zynq using IP Integrator
2.1.1. Embedded Design Architecture in Zynq
2.1.2. The PS and the PL
2.1.3. Vivado
2.2. Creating IP-XACT Hardware Accelerator
2.2.1. Port-Level Interfaces
2.2.2. Interface Modes
2.2.3. Native AXI Slave Lite Interface
2.2.4. Controllable Register Maps in AXI4 Lite
2.2.5. Native AXI4 Master
2.2.6. Burst Accesses Inferred for AXI4 Master
2.2.7. Byte-Enable Accesses on AXI4 Master
2.2.8. AXI4 Port Bundling
2.2.9. AXI4 Stream Interface: Ease of Use
2.2.10. Generate the hardware accelerator
2.2.11. Generated impl Directory
2.3. Integrating the Hardware Accelerator in AXI System
PART 3. CREATING A PROCESSOR SYSTEM LAB
3.1. Create a New Project
3.2. Run C Simulation
3.3. Synthesize the Design
3.4. Run RTL/C CoSimulation
3.5. Setup IP-XACT Adapter
3.6. Generate IP-XACT Adapter
3.7. Create a Vivado Project
3.8. Export to SDK and create Application Project
3.9. Verify the Design in Hardware
REFERENCE
FIGURE LIST
Figure 1: The design under consideration
Figure 2: The header file
Figure 3: Initial part of the generated output in the Console view
Figure 4: Generated interface signals
Figure 5: Selecting the AXI4LiteS adapter and naming bundle
Figure 6: Applying bundle to assign y output to AXI4Lite adapter
Figure 7: Export RTL Dialog
Figure 8: IP-XACT adapter generated
Figure 9: Adapter's drivers directory
Figure 10: Block design made for Pynq
Figure 11: Setting path to IP Repositories
Figure 12: Generated design after IRQ_F2P interface enabled
Figure 13: Generated address map
PART 1. INTRODUCTION
1.1. Introduction
This project presents the process of using Vivado and IP Integrator to create a
complete Zynq ARM Cortex-A9 based processor system targeting the ZyBoard Zynq
development board. You will use the Block Design feature of IP Integrator to
configure the Zynq PS and add IP to create the hardware system, and SDK to create an
application that verifies the design functionality. It also guides you through the
process of profiling an application and analyzing the output.
1.2. Purpose and requirements
a) Purpose
This lab introduces a design flow to generate an IP-XACT adapter from a design
using Vivado HLS and to use the generated IP-XACT adapter in a processor system
using IP Integrator in Vivado.
b) Requirements
After completing this lab, you will be able to:
- Understand the steps and directives involved in creating an IP-XACT
adapter from a synthesized design in Vivado HLS
- Create a processor system using IP Integrator in Vivado
- Integrate the generated IP-XACT adapter into the created processor
system.
1.3. Layout
The report is divided into 4 parts:
Part 1. Introduction
Part 2. Creating a Processor System
Part 3. Creating a Processor System Lab
Part 4. Conclusion
PART 2. CREATING A PROCESSOR SYSTEM
2.1. Embedded System Design in Zynq using IP Integrator
2.1.1. Embedded Design Architecture in Zynq
Embedded design in Zynq is based on:
– Processor and peripherals
  • Dual ARM® Cortex™-A9 processors of Zynq-7000 AP SoC
  • AXI interconnect
  • AXI component peripherals
  • Reset, clocking, debug ports
– Software platform for processing system
  • Standalone OS
  • C language support
  • Processor services
  • C drivers for hardware
– User application
  • Interrupt service routines (optional)
2.1.2. The PS and the PL
The Zynq-7000 AP SoC architecture consists of two major sections:
– PS: Processing system
  • Dual ARM Cortex-A9 processor based (single-core versions available)
  • Multiple peripherals
  • Hard silicon core
– PL: Programmable logic
  • Uses the same 7 series programmable logic
    – Artix™-based devices: Z-7010, Z-7015, and Z-7020 (high-range I/O banks only)
    – Single-core versions: Z-7007S, Z-7012S, and Z-7014S
    – Kintex™-based devices: Z-7030, Z-7035, Z-7045, and Z-7100 (mix of high-range and high-performance I/O banks)
2.1.3. Vivado
What are Vivado, IP Integrator and SDK?
– Vivado is the tool suite for Xilinx FPGA design and includes capability for embedded system design
  • IP Integrator is part of Vivado and allows block-level design of the hardware part of an embedded system
  • Integrated into Vivado
  • Vivado includes all the tools, IP, and documentation that are required for designing systems with the Zynq-7000 AP SoC hard core and/or Xilinx MicroBlaze soft core processor
  • Vivado + IPI replaces ISE/EDK
– SDK is an Eclipse-based software design environment
  • Enables the integration of hardware and software components
  • Links from Vivado
– Vivado is the overall project manager and is used for developing non-embedded hardware and instantiating embedded systems
– Vivado/IP Integrator flow is recommended for developing Zynq embedded systems.
Embedded System Design using Vivado
2.2. Creating IP-XACT Hardware Accelerator
2.2.1. Port-Level Interfaces
The AXI4 interfaces supported by Vivado HLS include:
– The AXI4-Stream (axis)
  Specify on input arguments or output arguments only, not on input/output arguments.
– The AXI4 master (m_axi)
  Specify on arrays and pointers (and references in C++) only. You can group
  multiple arguments into the same AXI4 master interface using the bundle option.
– The AXI4-Lite (s_axilite)
  Specify on any type of argument except arrays. You can group multiple arguments into
  the same AXI4-Lite interface using the bundle option.
2.2.2. Interface Modes
– Native AXI Interfaces
  • AXI4 Slave Lite, AXI4 Master, AXI Stream supported by INTERFACE directive
  • Provided in RTL after Synthesis
  • Supported by C/RTL Co-simulation
  • Supported for Verilog and VHDL
– BRAM Memory Interface
  • Identical IO protocol to ap_memory
  • Bundled differently in IP Integrator
  • Provides easier integration to memories with BRAM interface
2.2.3. Native AXI Slave Lite Interface
Interface Mode: s_axilite
Supported with INTERFACE directive
Multiple ports may be grouped into the same Slave Lite interface
All ports which use the same bundle name are grouped
Grouped Ports
Default mode is ap_none for input ports
Default mode is ap_vld for output ports
Default mode is ap_ctrl_hs for function (return port)
Default mode can be changed with additional INTERFACE Directives.
2.2.4. Controllable Register Maps in AXI4 Lite
Assigning offset to array (RAM) interfaces
Specified value is offset to base of array
Array’s address space is always contiguous and linear
C Driver Files include offset information
In generated driver file xhls_sig_gen_bram2axis.h
2.2.5. Native AXI4 Master
Interface Mode: m_axi
Supported with INTERFACE directive
Options
Multiple ports may be grouped into the same AXI4 Master interface
All ports which use the same bundle name are grouped
Depth option is required for C/RTL co-simulation
Required for pointers, not arrays
Set to the number of values read/written
Option to support offset or base address
2.2.6. Burst Accesses Inferred for AXI4 Master
There are two types of accesses on an AXI Master: Single Access and Burst Access
Burst accesses are more efficient
Burst access has until now required the use of memcpy()
Burst Accesses are now inferred
From operations in a for-loop and from sequential operations in the code
However: there are some limitations
• Single for-loops only, no nested loops
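The loop shape described above can be sketched as follows. This is only an illustration under stated assumptions: the function name scale_burst, the length LEN, and the scaling operation are hypothetical and do not come from the lab. The pragmas use the Vivado HLS INTERFACE syntax; an ordinary C compiler ignores them. Whether a burst is actually inferred depends on the tool version and should be checked in the synthesis log.

```c
#include <stdint.h>

#define LEN 64  /* illustrative transfer length (assumption, not from the lab) */

/* Hypothetical kernel: reads LEN words through an AXI4 master port,
 * scales them, and writes them back through the same port. */
void scale_burst(int32_t *mem, int32_t factor)
{
/* depth matches the number of values read/written, as C/RTL co-simulation requires */
#pragma HLS INTERFACE m_axi port=mem depth=64
#pragma HLS INTERFACE s_axilite port=factor
#pragma HLS INTERFACE s_axilite port=return
    int32_t buf[LEN];
    int i;

    /* Single, non-nested loop over sequential addresses: a candidate read burst. */
    for (i = 0; i < LEN; i++)
        buf[i] = mem[i];

    /* Computation is kept on the local buffer, away from the bus accesses. */
    for (i = 0; i < LEN; i++)
        buf[i] *= factor;

    /* Single, non-nested loop over sequential addresses: a candidate write burst. */
    for (i = 0; i < LEN; i++)
        mem[i] = buf[i];
}
```

Keeping the read, compute, and write phases in separate single loops follows the limitation listed above (no nested loops) and plays the role that memcpy() had in earlier tool versions.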
2.2.7. Byte-Enable Accesses on AXI4 Master
Byte-Enable Accesses Support on AXI4 Master Interfaces
Single bytes are now written and read
Improved AXI4 Master performance
Improved Performance
This code uses 8-bit data
Previously, accessing this data required reading/writing a full 32-bit word
This implied a read-modify-write behavior, which impacted performance
– Similar performance improvement when accessing struct members
Also often implied read-modify-write behavior
Improved Port Bundling
• Variables of different sizes can be grouped into same AXI4 Master port.
2.2.8. AXI4 Port Bundling
AXI4 Master and Lite Port Bundling
The bundle option groups arguments into the same AXI4 port. For
example, 3 arguments can be grouped into the AXI4 port “ctrl”.
Arguments can be Bundled into AXI4 Master and AXI4 Lite ports
If no bundle name is used a default name is used for all arguments
All go into a single AXI4 Master or AXI4 Lite
Default name applied if no –bundle option is used
Group different sized variables into an AXI4 Master port
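A minimal sketch of the grouping described above, assuming the Vivado HLS pragma form of the INTERFACE directive. The function name bundle_example, the argument names a, b, c, and the addition are placeholders; only the bundle name ctrl is taken from the text.

```c
/* Hypothetical top-level function: all three arguments share one
 * AXI4-Lite port because they use the same bundle name. */
void bundle_example(int a, int b, int *c)
{
#pragma HLS INTERFACE s_axilite port=a bundle=ctrl
#pragma HLS INTERFACE s_axilite port=b bundle=ctrl
#pragma HLS INTERFACE s_axilite port=c bundle=ctrl
    *c = a + b;  /* placeholder computation */
}
```

The same bundle option applies to m_axi ports; arguments given different bundle names end up on separate AXI4 ports.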
2.2.9. AXI4 Stream Interface: Ease of Use
Native Support for AXI4 Stream Interfaces
Native = An AXI4 Stream can be specified with set_directive_interface
No longer required to set the interface then add a resource
This AXI4 Stream interface is part of the HDL after synthesis
This AXI4 Stream interface is simulated by RTL co-simulation
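As an illustration of the native support described above, the sketch below marks one input array and one output array as AXI4-Stream ports with a single INTERFACE pragma each. The function name stream_double, the length STREAM_LEN, and the doubling operation are assumptions made for the example.

```c
#define STREAM_LEN 16  /* illustrative stream length (assumption) */

/* Hypothetical kernel: each array is accessed strictly in order, so it
 * can be implemented as a stream. One argument is input only and the
 * other output only, as the axis mode requires. */
void stream_double(int in[STREAM_LEN], int out[STREAM_LEN])
{
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
    int i;

    for (i = 0; i < STREAM_LEN; i++)
        out[i] = in[i] * 2;  /* placeholder computation */
}
```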
2.2.10. Generate the hardware accelerator
Select Solution > Export RTL
Select IP Catalog, System Generator for Vivado or design checkpoint (dcp)
Click on Configuration… if you want to change the version number or other information
Default is v1_00_a
Click on OK
The directory (ip) will be generated under the impl folder under the current project directory and current solution
RTL code will be generated, both for Verilog and VHDL languages in their respective folders
2.2.11. Generated impl Directory
2.3. Integrating the Hardware Accelerator in AXI System
Create a new Vivado project, or open an existing project
Invoke IP Integrator
Construct (modify) the hardware portion of the embedded design by adding the IP-XACT
Create (update) top-level HDL wrapper
Synthesize any non-embedded components and implement in Vivado
Export the hardware description, and launch XSDK
Create a new software board support package and application projects in the XSDK
Compile the software with the GNU cross-compiler in XSDK
Download the programmable logic’s completed bitstream using Xilinx Tools > Program FPGA in XSDK
Use XSDK to download and execute the program (the ELF file).
PART 3. CREATING A PROCESSOR SYSTEM LAB
3.1. Create a New Project
Create a new project in Vivado HLS targeting the xc7z020clg400-1 device.
1. Select Start > Xilinx Design Tools > Vivado HLS 2017.4. A Getting Started GUI
will appear.
2. In the Getting Started section, click on Create New Project. The New Vivado HLS
Project wizard opens.
3. Click Browse… button of the Location field, browse to {labs}\lab4, and then click
OK.
4. For Project Name, type fir.prj and click Next.
5. In the Add/Remove Files for the source files, type fir as the function name (the
provided source file contains the function, to be synthesized, called fir).
6. Click the Add Files… button, select fir.c and fir_coef.dat files from the
{sources}\lab4 folder, and then click Open.
7. Click Next.
8. In the Add/Remove Files for the testbench, click the Add Files button, select
fir_test.c file from the {sources}\lab4 folder and click Open.
9. Click Next.
10. In the Solution Configuration page, leave the Solution Name field as solution1 and
make sure the clock period is 8. Leave the Uncertainty field blank.
11. Click on the Part’s Browse button and, using the Parts Specify option, select
xc7z020clg400-1.
12. Click Finish.
You will see the created project in the Explorer view. Expand various sub-folders to
see the entries under each sub-folder.
13. Double-click on the fir.c under the source folder to open its content in the
information pane.
Figure 1: The design under consideration
The FIR filter takes x as the input sample and a pointer y to the computed output
sample. Both are of data type data_t. The coefficients are loaded into array c of
type coef_t from the file fir_coef.dat located in the current directory. The
sequential algorithm is applied, and the accumulated value (the output sample) is
computed in variable acc of type acc_t.
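Since the listing in Figure 1 is not reproduced in the text, the structure just described can be sketched as below. This is not the lab's fir.c: the tap count TAPS, the coefficient values, the 64-bit stand-in for the 38-bit accumulator, and the absence of output scaling are all assumptions made to keep the example short and compilable with a standard C compiler.

```c
#include <stdint.h>

#define TAPS 4  /* illustrative; the lab design iterates over 59 taps */

typedef int16_t data_t;  /* 16-bit samples, as described in the report */
typedef int16_t coef_t;  /* 16-bit coefficients */
typedef int64_t acc_t;   /* stand-in for the 38-bit accumulator type */

/* Hypothetical coefficients; the lab loads them from fir_coef.dat. */
static const coef_t c[TAPS] = { 1, 2, 3, 4 };

/* One call consumes one input sample x and produces one output sample *y. */
void fir_sketch(data_t *y, data_t x)
{
    static data_t shift_reg[TAPS];  /* shift_reg[i] holds the sample from i calls ago */
    acc_t acc = 0;
    int i;

    /* Shift the delay line from the oldest element down and accumulate. */
    for (i = TAPS - 1; i > 0; i--) {
        shift_reg[i] = shift_reg[i - 1];
        acc += (acc_t)shift_reg[i] * (acc_t)c[i];
    }
    shift_reg[0] = x;
    acc += (acc_t)x * (acc_t)c[0];

    *y = (data_t)acc;  /* no scaling applied in this sketch */
}
```

Feeding an impulse (one sample of 1 followed by zeros) returns the coefficients 1, 2, 3, 4 in order, which is the same kind of check the lab's testbench performs with its impulse input.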
14. Double-click on the fir.h in the outline tab to open its content in the information
pane.
Figure 2: The header file
The header file includes ap_cint.h so that user-defined data widths (of arbitrary
precision) can be used. It also defines the number of taps (N), the number of samples
to be generated (in the testbench), and the data types coef_t, data_t, and acc_t. The
coef_t and data_t types are short (16 bits). Since the algorithm iterates (multiply
and accumulate) over 59 taps, the sum can grow by up to 6 bits beyond the 32-bit
width of a 16 x 16-bit product, and hence acc_t is defined as int38. Since acc_t is
wider than the sample and coefficient types, they have to be cast before being used
(as in lines 16, 18, and 21 of fir.c).
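The accumulator width can be checked with a small calculation. The helper names ceil_log2 and acc_width are hypothetical; the sketch assumes the usual conservative bound of product width plus ceil(log2(number of terms)).

```c
/* Smallest g such that 2^g >= n, i.e. ceil(log2(n)) for n >= 1. */
static int ceil_log2(unsigned n)
{
    int g = 0;

    while ((1u << g) < n)
        g++;
    return g;
}

/* Bits needed to accumulate `taps` products of data_bits x coef_bits operands. */
int acc_width(int data_bits, int coef_bits, unsigned taps)
{
    return data_bits + coef_bits + ceil_log2(taps);
}
```

For the values in the report, acc_width(16, 16, 59) gives 16 + 16 + 6 = 38, matching the int38 accumulator.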
15. Double-click on the fir_test.c under the testbench folder to open its content in the
information pane.
Notice that the testbench opens fir_impulse.dat in write mode, and sends an impulse
(the first sample being 0x8000).
3.2. Run C Simulation
Run C simulation to observe the expected output.
1. Select Project > Run C Simulation or click on the button in the toolbar,
and click OK in the C Simulation Dialog window.
The testbench will be compiled using the apcc compiler and the csim.exe file will be
generated. The csim.exe file will then be executed and the output will be displayed in
the console view.
Figure 3: Initial part of the generated output in the Console view
3.3. Synthesize the Design
Synthesize the design with the defaults. View the synthesis results and answer
the question listed in the detailed section of this step.
1. Select Solution > Run C Synthesis > Active Solution to start the synthesis process.
2. When synthesis is completed, several report files will become accessible and the
Synthesis Results will be displayed in the information pane.
3. The Synthesis Report shows the performance and resource estimates as well as
estimated latency in the design.
4. Using the scroll bar on the right, scroll down into the report and answer the
following questions.
Estimated clock period: 8 ns
Worst case latency: 175
Number of DSP48E used: 0
Number of BRAMs used: 3
Number of FFs used: 168
Number of LUTs used: 157
5. The report also shows the top-level interface signals generated by the tools.
Figure 4: Generated interface signals
You can see that the design expects the x input as a 16-bit scalar and outputs y via
a pointer to 16-bit data. It also has an ap_vld signal to indicate when the result is
valid.
Add PIPELINE directive to the loop and re-synthesize the design. View the
synthesis results.
1. Make sure that the fir.c is open in the information view.
2. Select the Directive tab, and apply the PIPELINE directive to the loop.
3. Select Solution > Run C Synthesis > Active Solution to start the synthesis process.
4. When synthesis is completed, the Synthesis Results will be displayed in the
information pane.
5. Note that the latency has been reduced to 63 clock cycles. The DSP48 and BRAM
consumption remains the same; however, LUT and FF consumption has slightly
increased.
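The PIPELINE directive applied through the Directive tab can also be written as a pragma inside the loop body. The sketch below shows the placement on a generic multiply-accumulate loop; the function name mac_pipelined, the length MAC_LEN, and the II=1 target are assumptions for illustration and are not taken from the lab's fir.c. The pragma changes only how Vivado HLS schedules the loop; the computed result is the same with or without it.

```c
#include <stdint.h>

#define MAC_LEN 8  /* illustrative loop length (assumption) */

/* Hypothetical multiply-accumulate loop with the PIPELINE pragma placed
 * as the first statement of the loop body. */
int32_t mac_pipelined(const int16_t a[MAC_LEN], const int16_t b[MAC_LEN])
{
    int32_t acc = 0;
    int i;

    for (i = 0; i < MAC_LEN; i++) {
#pragma HLS PIPELINE II=1
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```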
3.4. Run RTL/C CoSimulation
Run the RTL/C Co-simulation, selecting Verilog. Verify that the simulation
passes.
1. Select Solution > Run C/RTL Co-simulation or click on the button to open the
dialog box so the desired simulations can be run.
2. Select the Verilog option and click OK.
The Co-simulation will run, generating and compiling several files, and then
simulating the design. In the console window you can see the progress. When done the
RTL Simulation Report shows that it was successful and the latency reported was 62.
3.5. Setup IP-XACT Adapter
Add INTERFACE directive to create AXI4LiteS adapters so IP-XACT adapter
can be generated during the RTL Export step.
1. Make sure that fir.c file is open and in focus in the information view.
2. Select the Directive tab.
3. Right-click x, and click on Insert Directive….
4. In the Vivado HLS Directive Editor dialog box, select INTERFACE using the drop-
down button.
5. Click on the button beside mode (optional). Select s_axilite.
6. In the bundle (optional) field, enter fir_io and click OK.
Figure 5: Selecting the AXI4LiteS adapter and naming bundle
7. Similarly, apply the INTERFACE directive (including bundle) to the y output.
Figure 6: Applying bundle to assign y output to AXI4Lite adapter
8. Apply the INTERFACE directive to the top-level module fir to include ap_start,
ap_done, and ap_idle signals as part of bus adapter (the variable name shown will be
return). Include the bundle information too.
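The GUI steps above correspond to three INTERFACE pragmas when the directives are stored in the source file instead of the directives file. The sketch uses the bundle name fir_io from step 6; the function name fir_iface and its pass-through body are placeholders standing in for the real filter code.

```c
#include <stdint.h>

typedef int16_t data_t;  /* 16-bit sample type, as described in the report */

/* Placeholder body: only the pragma placement is of interest here. */
void fir_iface(data_t *y, data_t x)
{
#pragma HLS INTERFACE s_axilite port=x bundle=fir_io
#pragma HLS INTERFACE s_axilite port=y bundle=fir_io
#pragma HLS INTERFACE s_axilite port=return bundle=fir_io
    *y = x;
}
```

The port=return line is the one that moves the block-level control signals of the function into the same AXI4-Lite adapter.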
Note that the above steps will create address maps for x, y, ap_start, ap_valid,
ap_done, and ap_idle, which can be accessed via software. Alternatively, the ap_start,
ap_valid, ap_done, and ap_idle signals can be generated as separate ports on the core
by not applying the INTERFACE directive to the top-level module fir. These ports will
then have to be connected in a processor system using available GPIO IP.
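How software might use such an address map can be sketched with a small register-level mock. The control bits at offset 0x00 (bit 0 ap_start, bit 1 ap_done, bit 2 ap_idle) follow the layout Vivado HLS generates for ap_ctrl_hs; everything else is an assumption: the register indices for x and y, the mock_core_t type, and the pass-through behavior of mock_core_step are hypothetical. On real hardware the offsets come from the generated driver header and the polling loop reads the device instead of stepping a mock.

```c
#include <stdint.h>

/* Control register bits (word 0) for the ap_ctrl_hs block-level protocol. */
#define AP_START 0x1u
#define AP_DONE  0x2u
#define AP_IDLE  0x4u

/* Hypothetical register indices; real offsets come from the generated driver. */
enum { REG_CTRL = 0, REG_X = 1, REG_Y = 2, REG_COUNT = 3 };

/* Mock of the adapter's register file, standing in for memory-mapped I/O. */
typedef struct {
    uint32_t reg[REG_COUNT];
} mock_core_t;

/* Emulates the accelerator: when started, produce y (placeholder: y = x). */
static void mock_core_step(mock_core_t *core)
{
    if (core->reg[REG_CTRL] & AP_START) {
        core->reg[REG_Y] = core->reg[REG_X];
        core->reg[REG_CTRL] = AP_DONE | AP_IDLE;  /* start cleared, done and idle set */
    }
}

/* Software side: write the input, start the core, poll for done, read the output. */
int16_t run_once(mock_core_t *core, int16_t x)
{
    core->reg[REG_X] = (uint16_t)x;
    core->reg[REG_CTRL] = AP_START;
    while (!(core->reg[REG_CTRL] & AP_DONE))
        mock_core_step(core);  /* on hardware, this would just re-read the register */
    return (int16_t)core->reg[REG_Y];
}
```

The write, start, poll, and read sequence is the part that carries over to the real system; the generated driver functions wrap the same register accesses.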
3.6. Generate IP-XACT Adapter
Re-synthesize the design as directives have been added. Run the RTL Export to
generate the IP-XACT adapter.
1. Since the directives have been added, it is safe to re-synthesize the design. Select
Solution > Run C Synthesis > Active Solution
Check the Interface summary at the bottom of the Synthesis report to see the
interface that has been created.
2. Once the design is synthesized, select Solution > Export RTL to open the dialog
box so the desired IP can be generated. An Export RTL Dialog box will open.
Figure 7: Export RTL Dialog
3. Click OK to generate the IP-XACT adapter.
4. When the run is completed, expand the impl folder in the Explorer view and observe
various generated directories, such as ip, misc, verilog and vhdl.
Figure 8: IP-XACT adapter generated
Expand the ip directory and observe several files and sub-directories. One
sub-directory of interest is the drivers directory, which consists of header, c, tcl, mdd,

Preview text:

lOMoAR cPSD| 58675420
HCMC UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY OF HIGH QUALITY TRAINING FINAL PROJECT
Course name: HARDWARE/SOFTWARE CODESIGN
CREATING A PROCESSOR SYSTEM Lecturer name :
List of members:
Ho Chi Minh City, 06/2022
Downloaded by Lynh Nguyen (lynhn228@gmail.com) lOMoAR cPSD| 58675420 LECTURERCOMMENT TT Content Comment 1 Introduction 2 Creating a Processor System 3
Creating a Processor System Lab 4 Conclusion General comment:
........................................................................................................................................... ...
........................................................................................................................................... ...
........................................................................................................................................... ...
........................................................................................................................................... ...
........................................................................................................................................... ...
........................................................................................................................................... ... Lecturer’s signature lOMoAR cPSD| 58675420 ACKNOWLEDGEMENT
To complete the Hardware/Software Codesign subject, we would like to express our
heartfelt gratitude to Assoc. Prof. Phan Van Ca has enthusiastically guided and equipped
us with the necessary helpful knowledge this semester. Furthermore, he has directly
guided and created all conditions to help us during the process of Hardware/Software Codesign.
Due to the project's short implementation time, the topic's limited knowledge,
limitations, and errors have not been completely overcome. We look forward to
receiving your advice and suggestions. Student lOMoAR cPSD| 58675420 CONTENT
ACKNOWLEDGEMENT .............................................................................................. i
CONTENT ..................................................................................................................... ii
FIGURE LIST ............................................................................................................... iv
PART 1. INTRODUCTION ........................................................................................... 1
1.1. Introduction ......................................................................................................... 1
1.2. Purpose and requirements .................................................................................... 1
1.3. Layout .................................................................................................................. 1
PART 2. CREATING A PROCESSOR SYSTEM ......................................................... 1
2.1. Embedded System Design in Zynq using IP Integrator ...................................... 1
2.1.1. Embedded Design Architecture in Zynq .................................................. 1
2.1.2. The PS and the PL .................................................................................... 2
2.1.3. Vivado ...................................................................................................... 2
2.2. Creating IP-XACT Hardware Accelerator .......................................................... 3
2.2.1. Port-Level Interfaces ................................................................................ 3
2.2.2. Interface Modes ........................................................................................ 4
2.2.3. Native AXI Slave Lite Interface ............................................................... 4
2.2.4. Controllable Register Maps in AXI4 Lite ................................................ 4
2.2.5. Native AXI4 Master ................................................................................. 4
2.2.6. Burst Accesses Inferred for AXI4 Master ................................................ 5
2.2.7. Byte-Enable Accesses on AXI4 Master ................................................... 5
2.2.8. AXI4 Port Bundling ................................................................................. 5
2.2.9. AXI4 Stream Interface: Ease of Use ........................................................ 6
2.2.10. Generate the hardware accelerator ......................................................... 6
2.2.11. Generated impl Directory ....................................................................... 6
2.3. Integrating the Hardware Accelerator in AXI System ......................................... 6
PART 3. CREATING A PROCESSOR SYSTEM LAB ................................................ 7
3.1. Create a New Project ........................................................................................... 7
3.2. Run C Simulation ................................................................................................ 9
3.3. Synthesize the Design ........................................................................................ 10
3.4. Run RTL/C CoSimulation ..................................................................................11 lOMoAR cPSD| 58675420
3.5. Setup IP-XACT Adapter .................................................................................... 12
3.6. Generate IP-XACT Adapter .............................................................................. 13
3.7. Create a Vivado Project ..................................................................................... 15
3.8. Export to SDK and create Application Project .................................................. 19
3.9. Verify the Design in Hardware .......................................................................... 20
REFERENCE ............................................................................................................... 23 lOMoAR cPSD| 58675420 FIGURE LIST
Figure 1: The design under consideration ........................................................................9
Figure 2: The header file................................................................................................ 10
Figure 3: Initial part of the generated output in the Console view ................................11 Figure 4: Generated interface signals
.............................................................................12
Figure 5: Selecting the AXI4LiteS adapter and naming bundle ....................................13
Figure 6: Applying bundle to assign y output to AXI4Lite adapter ..............................14
Figure 7: Export RTL Dialog ......................................................................................... 15
Figure 8: IP-XACT adapter generated ........................................................................... 15
Figure 9: Adapter’s drivers directory .............................................................................16
Figure 10: Block design made for Pynq .........................................................................17
Figure 11: Setting path to IP Repositories ..................................................................... 18
Figure 12: Generated design after IRQ_F2P interface enabled .....................................19
Figure 13: Generated address map .................................................................................20 lOMoAR cPSD| 58675420 PART 1. INTRODUCTION 1.1. Introduction
This project will present you with the process of using Vivado and IP Integrator to
create a complete Zynq ARM Cortex-A9 based processor system targeting the ZyBoard
Zynq development board. You will use the Block Design feature of IP Integrator to
configure the Zynq PS and add IP to create the hardware system, and SDK to create an
application to verify the design functionality. It will also guide you through the process
of profiling an application and analyzing the output.
1.2. Purpose and requirements
a) Purpose
This lab introduces a design flow that generates an IP-XACT adapter from a design using Vivado HLS and then uses the generated IP-XACT adapter in a processor system built with IP Integrator in Vivado.
b) Requirements
After completing this lab, you will be able to:
- Understand the steps and directives involved in creating an IP-XACT adapter from a synthesized design in Vivado HLS
- Create a processor system using IP Integrator in Vivado
- Integrate the generated IP-XACT adapter into the created processor system.
1.3. Layout
The report is divided into 4 parts:
Part 1. Introduction
Part 2. Creating a Processor System
Part 3. Creating a Processor System Lab
Part 4. Conclusion
PART 2. CREATING A PROCESSOR SYSTEM
2.1. Embedded System Design in Zynq using IP Integrator
2.1.1. Embedded Design Architecture in Zynq
Embedded design in Zynq is based on:
– Processor and peripherals
• Dual ARM® Cortex™-A9 processors of Zynq-7000 AP SoC
• AXI interconnect
• AXI component peripherals
• Reset, clocking, debug ports
– Software platform for processing system
• Standalone OS
• C language support
• Processor services
• C drivers for hardware
– User application
• Interrupt service routines (optional)
2.1.2. The PS and the PL
The Zynq-7000 AP SoC architecture consists of two major sections:
– PS: Processing system
• Dual ARM Cortex-A9 processor based (single-core versions available)
• Multiple peripherals
• Hard silicon core
– PL: Programmable logic
• Uses the same 7 series programmable logic
– Artix™-based devices: Z-7010, Z-7015, and Z-7020 (high-range I/O banks only)
– Single-core versions: Z-7007S, Z-7012S, and Z-7014S
– Kintex™-based devices: Z-7030, Z-7035, Z-7045, and Z-7100 (mix of high-range and high-performance I/O banks)
2.1.3. Vivado
What are Vivado, IP Integrator and SDK?
– Vivado is the tool suite for Xilinx FPGA design and includes capability for embedded system design
• IP Integrator is part of Vivado and allows block-level design of the hardware part of an embedded system
• Integrated into Vivado
• Vivado includes all the tools, IP, and documentation that are required for designing systems with the Zynq-7000 AP SoC hard core and/or Xilinx MicroBlaze soft core processor
• Vivado + IPI replaces ISE/EDK
– SDK is an Eclipse-based software design environment
• Enables the integration of hardware and software components
• Links from Vivado
Vivado is the overall project manager and is used for developing non-embedded hardware and instantiating embedded systems
– Vivado/IP Integrator flow is recommended for developing Zynq embedded systems.
Embedded System Design using Vivado
2.2. Creating IP-XACT Hardware Accelerator
2.2.1. Port-Level Interfaces
The AXI4 interfaces supported by Vivado HLS include:
– The AXI4-Stream (axis)
• Specify on input arguments or output arguments only, not on input/output arguments
– The AXI4 master (m_axi)
• Specify on arrays and pointers (and references in C++) only. You can group multiple arguments into the same AXI4 master interface using the bundle option
– The AXI4-Lite (s_axilite)
• Specify on any type of argument except arrays. You can group multiple arguments into the same AXI4-Lite interface using the bundle option.
2.2.2. Interface Modes
Native AXI Interfaces
– AXI4 Slave Lite, AXI4 Master, AXI Stream supported by INTERFACE directive
• Provided in RTL after Synthesis
• Supported by C/RTL Co-simulation
• Supported for Verilog and VHDL
BRAM Memory Interface
– Identical IO protocol to ap_memory
– Bundled differently in IP Integrator
• Provides easier integration to memories with BRAM interface
2.2.3. Native AXI Slave Lite Interface
Interface Mode: s_axilite
– Supported with INTERFACE directive
– Multiple ports may be grouped into the same Slave Lite interface
• All ports which use the same bundle name are grouped
Grouped Ports
– Default mode is ap_none for input ports
– Default mode is ap_vld for output ports
– Default mode is ap_ctrl_hs for function (return port)
– Default mode can be changed with additional INTERFACE directives.
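The grouping described above can be sketched with the pragma form of the INTERFACE directive placed in the C source. This is only an illustrative sketch: the function name scale_offset, its arguments, and the bundle name ctrl_bus are hypothetical and do not come from the lab. A standard C compiler ignores the HLS pragmas, so the function can still be checked in software.

```c
/* Hypothetical top-level function, for illustration only.
 * All three ports and the block-level control signals (port=return)
 * use the same bundle name, so they are grouped into one
 * AXI4-Lite slave interface. */
void scale_offset(int a, int b, int *result)
{
#pragma HLS INTERFACE s_axilite port=a      bundle=ctrl_bus
#pragma HLS INTERFACE s_axilite port=b      bundle=ctrl_bus
#pragma HLS INTERFACE s_axilite port=result bundle=ctrl_bus
#pragma HLS INTERFACE s_axilite port=return bundle=ctrl_bus
    *result = 3 * a + b;
}
```

Using a different bundle name on one of the pragmas would place that port on a separate AXI4-Lite interface.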
2.2.4. Controllable Register Maps in AXI4 Lite
Assigning offset to array (RAM) interfaces
– Specified value is offset to base of array
– Array’s address space is always contiguous and linear
C Driver Files include offset information
– In generated driver file xhls_sig_gen_bram2axis.h
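Because the array's address space is contiguous and linear, software can compute the address of any element from the array offset. The sketch below illustrates only the arithmetic: the base address, the array offset, and all names are hypothetical placeholders, not values from the generated driver header, which is where the real offset would be found.

```c
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define ACCEL_BASE_ADDR 0x43C00000u /* assumed AXI4-Lite base address      */
#define ARRAY_OFFSET    0x00000400u /* assumed offset to the array's base  */

/* Byte address of element `index` of an array of 32-bit words.
 * Consecutive elements are 4 bytes apart because the mapping is
 * contiguous and linear. */
uint32_t array_element_addr(uint32_t index)
{
    return ACCEL_BASE_ADDR + ARRAY_OFFSET + index * (uint32_t)sizeof(uint32_t);
}
```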
2.2.5. Native AXI4 Master Interface
Interface Mode: m_axi
– Supported with INTERFACE directive
Options
– Multiple ports may be grouped into the same AXI4 Master interface
• All ports which use the same bundle name are grouped
– Depth option is required for C/RTL co-simulation
• Required for pointers, not arrays
• Set to the number of values read/written
– Option to support offset or base address
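A minimal sketch of these options in pragma form is shown below. The function name sum_words, the transfer length of 64, and the bundle names gmem and ctrl are assumptions made for illustration. The depth value matches the number of values read through the pointer, and offset=slave is used here as one way of supplying the base address through a register.

```c
#define N_WORDS 64 /* hypothetical transfer length */

/* Hypothetical function: sums N_WORDS values read through an AXI4 master port. */
int sum_words(const int *mem)
{
/* depth: number of values accessed through the pointer, needed by
 * C/RTL co-simulation. offset=slave: the base address is provided
 * through an AXI4-Lite register instead of being fixed. */
#pragma HLS INTERFACE m_axi     port=mem depth=64 offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=return bundle=ctrl
    int total = 0;
    for (int i = 0; i < N_WORDS; i++) {
        total += mem[i];
    }
    return total;
}
```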
2.2.6. Burst Accesses Inferred for AXI4 Master
There are two types of accesses on an AXI Master: Single Access and Burst Access
– Burst accesses are more efficient
– Burst access has until now required the use of memcpy()
Burst Accesses are now inferred
– From operations in a for-loop and from sequential operations in the code
– However: there are some limitations
• Single for-loops only, no nested loops
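The two access styles can be contrasted in a short sketch. The function name increment_block and the length of 32 are hypothetical. Whether the tool actually infers a burst for the second loop depends on the tool version and on the limitations listed above, so this should be read as the coding pattern only, not as a guarantee.

```c
#include <string.h>

#define LEN 32 /* hypothetical burst length */

/* Hypothetical kernel: reads LEN words, adds 1 to each, writes them back. */
void increment_block(int *mem)
{
#pragma HLS INTERFACE m_axi port=mem depth=32
    int buf[LEN];

    /* Explicit burst read: memcpy() from the AXI4 master port. */
    memcpy(buf, mem, LEN * sizeof(int));

    for (int i = 0; i < LEN; i++) {
        buf[i] += 1;
    }

    /* Candidate for an inferred burst write: a single, non-nested
     * for-loop with sequential accesses to the AXI4 master port. */
    for (int i = 0; i < LEN; i++) {
        mem[i] = buf[i];
    }
}
```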
2.2.7. Byte-Enable Accesses on AXI4 Master
Byte-Enable Accesses Support on AXI4 Master Interfaces
– Single bytes are now written and read
– Improved AXI4 Master performance
Improved Performance
– This code uses 8-bit data
• Previously, accessing this required reading/writing a full 32-bit word
• This implied a required read-modify-write behavior, which impacted performance
– Similar performance improvement when accessing struct members
• Also often implied read-modify-write behavior
– Improved Port Bundling
• Variables of different sizes can be grouped into the same AXI4 Master port.
2.2.8. AXI4 Port Bundling
AXI4 Master and Lite Port Bundling
– The bundle option groups arguments into the same AXI4 port
– For example, 3 arguments can be grouped into an AXI4 port named “ctrl”
Arguments can be bundled into AXI4 Master and AXI4 Lite ports
– If no bundle name is used, a default name is used for all arguments
• All go into a single AXI4 Master or AXI4 Lite
• Default name applied if no -bundle option is used
– Group different sized variables into an AXI4 Master port
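The bundling described above can be sketched as follows, with three arguments of different sizes placed on one AXI4 Master port named ctrl. The function name pack_sum and its arguments are hypothetical and serve only to illustrate the bundle option.

```c
/* Hypothetical function with three arguments of different sizes. */
void pack_sum(const unsigned char *a, const unsigned short *b, unsigned int *c)
{
/* The same bundle name places all three arguments on one AXI4 Master
 * port named "ctrl". Without bundle=, the tool's default name is used. */
#pragma HLS INTERFACE m_axi port=a depth=1 bundle=ctrl
#pragma HLS INTERFACE m_axi port=b depth=1 bundle=ctrl
#pragma HLS INTERFACE m_axi port=c depth=1 bundle=ctrl
    *c = (unsigned int)(*a) + (unsigned int)(*b);
}
```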
2.2.9. AXI4 Stream Interface: Ease of Use
Native Support for AXI4 Stream Interfaces
– Native = An AXI4 Stream can be specified with set_directive_interface
• No longer required to set the interface then add a resource
• This AXI4 Stream interface is part of the HDL after synthesis
• This AXI4 Stream interface is simulated by RTL co-simulation
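In the C source, the same request can be written as a pragma instead of a set_directive_interface command. The sketch below assumes a hypothetical function double_stream with a frame length of 8; each argument is used in one direction only, as AXI4-Stream ports require.

```c
#define FRAME 8 /* hypothetical number of samples per frame */

/* Hypothetical streaming function: doubles each input sample. */
void double_stream(const int in[FRAME], int out[FRAME])
{
/* in is read-only and out is write-only, so each can be an AXI4-Stream port. */
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
    for (int i = 0; i < FRAME; i++) {
        out[i] = 2 * in[i];
    }
}
```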
2.2.10. Generate the hardware accelerator
Select Solution > Export RTL
Select IP Catalog, System Generator for Vivado, or design checkpoint (dcp)
Click on Configuration… if you want to change the version number or other information
– Default is v1_00_a
Click on OK
The directory (ip) will be generated under the impl folder under the current project directory and current solution
– RTL code will be generated, both for Verilog and VHDL languages, in their respective folders
2.2.11. Generated impl Directory
2.3. Integrating the Hardware Accelerator in AXI System
Create a new Vivado project, or open an existing project
Invoke IP Integrator
Construct (modify) the hardware portion of the embedded design by adding the IP-XACT adapter
Create (update) the top-level HDL wrapper
Synthesize any non-embedded components and implement in Vivado
Export the hardware description, and launch XSDK
Create a new software board support package and application projects in XSDK
Compile the software with the GNU cross-compiler in XSDK
Download the programmable logic’s completed bitstream using Xilinx Tools > Program FPGA in XSDK
Use XSDK to download and execute the program (the ELF file).
PART 3. CREATING A PROCESSOR SYSTEM LAB
3.1. Create a New Project
Create a new project in Vivado HLS targeting the xc7z020clg400-1 device
1. Select Start > Xilinx Design Tools > Vivado HLS 2017.4. A Getting Started GUI will appear.
2. In the Getting Started section, click on Create New Project. The New Vivado HLS Project wizard opens.
3. Click the Browse… button of the Location field, browse to {labs}\lab4, and then click OK.
4. For Project Name, type fir.prj and click Next.
5. In the Add/Remove Files page for the source files, type fir as the function name (the provided source file contains the function to be synthesized, called fir).
6. Click the Add Files… button, select the fir.c and fir_coef.dat files from the {sources}\lab4 folder, and then click Open.
7. Click Next.
8. In the Add/Remove Files page for the testbench, click the Add Files… button, select the fir_test.c file from the {sources}\lab4 folder, and click Open.
9. Click Next.
10. In the Solution Configuration page, leave the Solution Name field as solution1 and make sure the clock period is 8. Leave the Uncertainty field blank.
11. Click on the Part’s Browse button and, using the Parts Specify option, select xc7z020clg400-1.
12. Click Finish.
You will see the created project in the Explorer view. Expand the various sub-folders to see the entries under each sub-folder.
13. Double-click on fir.c under the source folder to open its content in the information pane.
Figure 1: The design under consideration
The FIR filter takes x as the input sample and y as a pointer to the computed output sample. Both are of data type data_t. The coefficients are loaded into array c of type coef_t from the file fir_coef.dat located in the current directory. The sequential algorithm is applied and the accumulated value (the output sample) is computed in variable acc of type acc_t.
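The structure described above can be sketched in plain C as follows. This is not the lab's fir.c, only a reduced illustration under stated assumptions: 4 taps instead of 59, placeholder coefficients instead of the contents of fir_coef.dat, standard integer types instead of the arbitrary-precision types, and an assumed right shift by 15 to scale the result back to 16 bits.

```c
#include <stdint.h>

#define N 4 /* the lab uses 59 taps; 4 keeps the sketch short */

typedef int16_t data_t; /* 16-bit samples                          */
typedef int16_t coef_t; /* 16-bit coefficients                     */
typedef int64_t acc_t;  /* stand-in for the lab's 38-bit acc_t     */

/* Placeholder coefficients; the lab loads them from fir_coef.dat. */
static const coef_t c[N] = { 1000, 2000, 2000, 1000 };

void fir(data_t *y, data_t x)
{
    static data_t shift_reg[N]; /* delay line of previous samples */
    acc_t acc = 0;

    /* Walk from the oldest tap to the newest, shifting the delay
     * line by one position and accumulating sample * coefficient. */
    for (int i = N - 1; i >= 0; i--) {
        if (i == 0) {
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];
        }
        acc += (acc_t)shift_reg[i] * (acc_t)c[i];
    }
    *y = (data_t)(acc >> 15); /* assumed scaling of the result */
}
```

Feeding an impulse followed by zeros returns the coefficients one by one, scaled by the impulse amplitude, which is a convenient software check of the delay line.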
14. Double-click on fir.h in the outline tab to open its content in the information pane.
Figure 2: The header file
The header file includes ap_cint.h so that user-defined data widths (of arbitrary precision) can be used. It also defines the number of taps (N), the number of samples to be generated (in the testbench), and the data types coef_t, data_t, and acc_t. The coef_t and data_t types are short (16 bits). Each product of a 16-bit sample and a 16-bit coefficient needs 32 bits, and since the algorithm iterates (multiply and accumulate) over 59 taps, there is a possible bit growth of 6 bits (2^6 = 64 ≥ 59); hence acc_t is defined as int38. Since acc_t is wider than the sample and coefficient widths, they have to be cast before being used (as in lines 16, 18, and 21 of fir.c).
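The width calculation can be checked with a few lines of C. The helper names growth_bits and acc_width are hypothetical and are used only to illustrate the arithmetic: the accumulator width is the sum of the two operand widths plus the number of bits needed to count the taps.

```c
/* Smallest number of bits b such that 2^b >= n, i.e. ceil(log2(n)). */
unsigned growth_bits(unsigned n)
{
    unsigned bits = 0;
    while ((1u << bits) < n) {
        bits++;
    }
    return bits;
}

/* Accumulator width for a sum of `taps` products of two operands. */
unsigned acc_width(unsigned data_bits, unsigned coef_bits, unsigned taps)
{
    return data_bits + coef_bits + growth_bits(taps);
}
```

With 16-bit samples, 16-bit coefficients, and 59 taps, acc_width(16, 16, 59) gives 16 + 16 + 6 = 38 bits.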
15. Double-click on fir_test.c under the testbench folder to open its content in the information pane.
Notice that the testbench opens fir_impulse.dat in write mode, and sends an impulse (first sample being 0x8000).
3.2. Run C Simulation
Run C simulation to observe the expected output.
1. Select Project > Run C Simulation or click on the button in the toolbar, and click OK in the C Simulation Dialog window.
The testbench will be compiled using the apcc compiler and a csim.exe file will be generated. The csim.exe file will then be executed and the output will be displayed in the console view.
Figure 3: Initial part of the generated output in the Console view
3.3. Synthesize the Design
Synthesize the design with the defaults. View the synthesis results and answer
the question listed in the detailed section of this step.
1. Select Solution > Run C Synthesis > Active Solution to start the synthesis process.
2. When synthesis is completed, several report files will become accessible and the
Synthesis Results will be displayed in the information pane.
3. The Synthesis Report shows the performance and resource estimates as well as
estimated latency in the design.
4. Using the scroll bar on the right, scroll down into the report and answer the following questions.
Estimated clock period: 8ns
Worst case latency: 175
Number of DSP48E used: 0
Number of BRAMs used: 3
Number of FFs used: 168
Number of LUTs used: 157
5. The report also shows the top-level interface signals generated by the tools.
Figure 4: Generated interface signals
You can see that the design expects the x input as a 16-bit scalar and outputs y via a pointer to 16-bit data. It also has an ap_vld signal to indicate when the result is valid.
Add the PIPELINE directive to the loop and re-synthesize the design. View the synthesis results.
1. Make sure that the fir.c is open in the information view.
2. Select the Directive tab, and apply the PIPELINE directive to the loop.
3. Select Solution > Run C Synthesis > Active Solution to start the synthesis process.
4. When synthesis is completed, the Synthesis Results will be displayed in the information pane.
5. Note that the latency has been reduced to 63 clock cycles. The DSP48 and BRAM consumption remains the same; however, LUT and FF consumption has slightly increased.
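Applying the directive from the Directive tab has the same effect as placing a pragma inside the loop body. The sketch below shows that placement on a hypothetical multiply-accumulate loop. The function name dot_product, the label mac_loop, and the bound of 8 are assumptions for illustration and are not taken from fir.c.

```c
#define TAPS 8 /* hypothetical loop bound */

/* Hypothetical multiply-accumulate loop with the PIPELINE directive applied. */
int dot_product(const short a[TAPS], const short b[TAPS])
{
    int acc = 0;
    mac_loop:
    for (int i = 0; i < TAPS; i++) {
/* Asks the tool to overlap successive loop iterations. */
#pragma HLS PIPELINE
        acc += a[i] * b[i];
    }
    return acc;
}
```

The pragma changes only the generated hardware schedule. The C result is the same with or without it.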
3.4. Run RTL/C CoSimulation
Run the RTL/C Co-simulation, selecting Verilog. Verify that the simulation passes.
1. Select Solution > Run C/RTL Co-simulation or click on the button to open the
dialog box so the desired simulations can be run.
2. Select the Verilog option and click OK.
The co-simulation will run, generating and compiling several files, and then simulating the design. In the console window you can see the progress. When done, the RTL Simulation Report shows that it was successful and that the latency reported was 62.
3.5. Setup IP-XACT Adapter
Add the INTERFACE directive to create AXI4LiteS adapters so that the IP-XACT adapter can be generated during the RTL Export step.
1. Make sure that fir.c file is open and in focus in the information view.
2. Select the Directive tab.
3. Right-click x, and click on Insert Directive….
4. In the Vivado HLS Directive Editor dialog box, select INTERFACE using the drop-down button.
5. Click on the button beside mode (optional). Select s_axilite.
6. In the bundle (optional) field, enter fir_io and click OK.
Figure 5: Selecting the AXI4LiteS adapter and naming bundle
7. Similarly, apply the INTERFACE directive (including bundle) to the y output.
Figure 6: Applying bundle to assign y output to AXI4Lite adapter
8. Apply the INTERFACE directive to the top-level module fir to include the ap_start, ap_done, and ap_idle signals as part of the bus adapter (the variable name shown will be return). Include the bundle information too.
Note that the above steps will create address maps for x, y, ap_start, ap_valid, ap_done, and ap_idle, which can be accessed via software. Alternatively, the ap_start, ap_valid, ap_done, and ap_idle signals can be generated as separate ports on the core by not applying the INTERFACE directive to the top-level module fir. These ports will then have to be connected in a processor system using an available GPIO IP.
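Software access to such an address map can be sketched as a write, start, poll, read sequence. Everything specific in this sketch is an assumption: the register offsets for x and y, the function name fir_run_once, and the control bit positions are placeholders, and the real values must be taken from the generated driver header. The register block is passed in as a pointer so that the routine can be exercised on a host with an ordinary array.

```c
#include <stdint.h>

/* Assumed register layout, for illustration only. */
#define REG_CTRL      0x00u /* assumed: bit 0 = ap_start, bit 1 = ap_done */
#define REG_X_DATA    0x10u /* hypothetical offset of input x             */
#define REG_Y_DATA    0x18u /* hypothetical offset of output y            */
#define CTRL_AP_START 0x1u
#define CTRL_AP_DONE  0x2u

/* regs points to the adapter's register block viewed as 32-bit words.
 * Returns 0 and stores the result in *y, or -1 if ap_done was not
 * observed within max_polls reads of the control register. */
int fir_run_once(volatile uint32_t *regs, int16_t x, int16_t *y,
                 unsigned max_polls)
{
    regs[REG_X_DATA / 4] = (uint16_t)x;    /* write the input sample */
    regs[REG_CTRL / 4] |= CTRL_AP_START;   /* set ap_start           */

    for (unsigned i = 0; i < max_polls; i++) {
        if (regs[REG_CTRL / 4] & CTRL_AP_DONE) {
            *y = (int16_t)regs[REG_Y_DATA / 4]; /* read the result */
            return 0;
        }
    }
    return -1; /* ap_done never observed */
}
```

On the target, regs would point at the base address assigned to the adapter in the address map. The bounded poll avoids hanging the application if the core never reports completion.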
3.6. Generate IP-XACT Adapter
Re-synthesize the design as directives have been added. Run the RTL Export to
generate the IP-XACT adapter.
1. Since the directives have been added, it is safe to re-synthesize the design. Select Solution > Run C Synthesis > Active Solution.
Check the Interface summary at the bottom of the Synthesis report to see the interface that has been created.
2. Once the design is synthesized, select Solution > Export RTL to open the dialog
box so the desired IP can be generated. An Export RTL Dialog box will open.
Figure 7: Export RTL Dialog
3. Click OK to generate the IP-XACT adapter.
4. When the run is completed, expand the impl folder in the Explorer view and observe
various generated directories, such as ip, misc, verilog and vhdl.
Figure 8: IP-XACT adapter generated
Expand the ip directory and observe several files and sub-directories. One of the sub-directories of interest is the drivers directory, which consists of header, c, tcl, mdd,