XCELLENCE IN WIRELESS COMMS
RAM/ROM Utilization
Dual-port RAMs (RAM16x1D) 140
64x1 ROMs (ROM64X1) 12
256x1 ROMs (ROM256x1) 138
Number of block RAMs 211 of 336 62%
Logic Utilization
Number of slice flip-flops 52,943 out of 178,176 29%
Number of four-input LUTs 133,498 out of 178,176 74%
Logic Distribution
Number of occupied slices 83,217 out of 89,088 93%
Total number of four-input LUTs 137,384 out of 178,176 77%
Number of bonded IOBs 361 out of 960 37%
Number of BUFG/BUFGCTRLs 8 out of 32 25%
Number of FIFO16/RAMB16s 214 out of 336 63%
Number of DSP48s 17 out of 96 17%
Number of DCM_ADVs 1 out of 12 8%
Table 1 – WiLDSYS Virtex-4 LX 200 utilization report
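As a quick sanity check on Table 1, the sketch below recomputes each percentage from the used and available counts. The row labels are shortened here for illustration, and the observation that the reported figures match integer truncation (rather than rounding) is drawn from the numbers themselves, not from tool documentation.

```python
# (used, available, reported percent) copied from Table 1.
ROWS = {
    "Block RAMs": (211, 336, 62),
    "Slice flip-flops": (52_943, 178_176, 29),
    "Four-input LUTs (logic)": (133_498, 178_176, 74),
    "Occupied slices": (83_217, 89_088, 93),
    "Total four-input LUTs": (137_384, 178_176, 77),
    "Bonded IOBs": (361, 960, 37),
    "BUFG/BUFGCTRLs": (8, 32, 25),
    "FIFO16/RAMB16s": (214, 336, 63),
    "DSP48s": (17, 96, 17),
    "DCM_ADVs": (1, 12, 8),
}

def truncated_percent(used, available):
    """Percentage of a resource in use, truncated to a whole number."""
    return used * 100 // available

for name, (used, available, reported) in ROWS.items():
    computed = truncated_percent(used, available)
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{name:26s} {used:>7,} / {available:>7,} = {computed:2d}%  {status}")
```

Every row agrees with the table under truncation. The occupied-slice figure shows how little placement headroom was left on the device.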
but that was OK, since we still had sufficient
margin. We reported the modifications back
to the ASIC database after we met timing on
the FPGA and ran a full regression test on
the ASIC target. After the ASIC synthesis,
there was even an area benefit for the ASIC
implementation of the modified design,
since the relaxed timing allowed the ASIC
synthesis tool to use a lower number of
buffers and smaller combinational cells.
The clock controller is one of the most
critical ASIC blocks, and also one of the
most difficult to map on an FPGA.
Ultimately we had to create a unique block
just for the FPGA implementation. We set
up the synthesis scripts for ASIC and for
FPGA to select the right block for each target.
We used the Virtex-4 digital clock manager
(DCM) resources to replace the
phase-locked loop that would normally be
used in a WLAN ASIC design. The DCM
created the required 240-MHz frequency
from a 40-MHz or 50-MHz oscillator
input, exactly as in the ASIC.
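The frequency synthesis described above can be sketched as a search for a multiply/divide pair. This is an illustration only: it assumes the DCM synthesized output obeys f_out = f_in x M / D with M in 2..32 and D in 1..32, and it ignores the device's input and output frequency limits. The function name is hypothetical.

```python
from fractions import Fraction

# Assumed attribute ranges for the DCM frequency-synthesis output:
# multiplier M in 2..32 and divider D in 1..32.
M_RANGE = range(2, 33)
D_RANGE = range(1, 33)

def clkfx_settings(f_in_mhz, f_out_mhz):
    """Return every (M, D) pair with f_in * M / D == f_out, smallest M first."""
    target = Fraction(f_out_mhz, f_in_mhz)
    return [(m, d) for m in M_RANGE for d in D_RANGE if Fraction(m, d) == target]

for f_in in (40, 50):
    m, d = clkfx_settings(f_in, 240)[0]
    print(f"{f_in} MHz in: M={m}, D={d} -> {f_in * m // d} MHz")
```

Under these assumptions a 40-MHz oscillator needs M=6, D=1, while a 50-MHz oscillator has a single exact solution, M=24, D=5.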
We were targeting the WiLD IP at low-power
applications and therefore added
support for clock gating to the design,
along with separate power domains. We
knew we could not fully implement the
two features in the FPGA, but we wanted
to include at least the control and switching
of the clock and power domains. With
the eight-region free global clock buffers
of the Virtex-4 LX200 FPGA, we didn’t
have sufficient resources to cover all the
clock frequencies and clock-gating functionalities
of the ASIC. We chose to
implement in the FPGA just the clock
gating required for correct functionality of
the system, and not the clock gating aiming
only at saving power.
To simplify the clock tree, we used the
clock-gating conversion feature available
with the Synplify Pro FPGA synthesis tool.
The synthesizer removes the gating condition
from the clock line and places it on
the flip-flop enable input or on the data
input. We used BUFG and BUFGCE
buffers for clock gating so that the output
of this logic was still routed using the high-speed,
low-skew global clock lines of the
Virtex-4 FPGA. A BUFGMUX resource
provided glitch-free multiplexing between
active mode and low-power clocks.
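The idea behind gated-clock conversion can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions, not the synthesis tool's algorithm: it models one flip-flop, a latch-based clock gate, and one enable/data value per clock cycle, and it compares the gated-clock form against the converted form where the enable sits on the flip-flop's clock-enable input. All names are hypothetical.

```python
def simulate(enables, data):
    """Return (gated, converted) output traces, one sample per clock cycle."""
    q_gated = 0      # flip-flop clocked by the gated clock
    q_enable = 0     # flip-flop on the free-running clock with a clock enable
    trace_gated, trace_enable = [], []
    for en, d in zip(enables, data):
        # Low phase: the gating latch is transparent and the gated clock is low.
        en_latch = en
        prev_gclk = 0
        # High phase: the latch holds its value; gated clock = clk AND latch.
        gclk = 1 & en_latch
        if prev_gclk == 0 and gclk == 1:   # rising edge on the gated clock
            q_gated = d
        # Converted form: every rising edge of clk, but load only when enabled.
        if en:
            q_enable = d
        trace_gated.append(q_gated)
        trace_enable.append(q_enable)
    return trace_gated, trace_enable

gated, converted = simulate([1, 0, 0, 1, 1, 0], [1, 0, 1, 0, 1, 0])
print(gated, converted, gated == converted)
```

Both forms produce the same output sequence, but the converted flip-flop stays on the free-running global clock, so the gate no longer consumes a clock buffer or adds skew.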
We implemented several voltage domains
in the WiLD design that could be switched
on and off at different times. We validated
this complex and critical functionality with
the Virtex-4, even though the FPGA contains
only one power domain. A domain
that has been powered off must be reset at power-up.
On the FPGA, we emulated the power-down
state with a reset of the domain.
Although this method does not completely
validate the power domain connections and
must be complemented by other ASIC checks
and verifications, we were able to spot functional
errors in domain interconnections and
reset generation that the time-limited
ASIC simulations might otherwise have
missed.
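The emulation approach can be pictured with a toy model. It assumes that "powering down" a domain on the FPGA simply forces its registers to their reset values and ignores writes until power-up. The class and register names are hypothetical and do not come from the WiLD design.

```python
class EmulatedDomain:
    """Hypothetical model of one switchable power domain emulated on an FPGA."""

    def __init__(self, reset_values):
        self.reset_values = dict(reset_values)
        self.regs = dict(reset_values)
        self.powered = True

    def power_down(self):
        # The FPGA cannot remove power, so the domain is reset instead.
        self.powered = False
        self.regs = dict(self.reset_values)

    def power_up(self):
        # Registers reappear at their reset values, as after a real power cycle.
        self.powered = True

    def write(self, name, value):
        if self.powered:              # writes are ignored while powered down
            self.regs[name] = value

    def read(self, name):
        return self.regs[name]

domain = EmulatedDomain({"ctrl": 0})
domain.write("ctrl", 5)
domain.power_down()
domain.write("ctrl", 7)               # ignored: the domain is "off"
domain.power_up()
print(domain.read("ctrl"))
```

Logic in another domain that expects `ctrl` to keep its value across the power cycle would misbehave in this model, which is the kind of interconnection or reset error the emulation can expose.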
The WiLD platform follows the AHB
standard, with four bus masters sharing
access to peripherals such as memories and
the Advanced Peripheral Bus (APB) subsystem.
We discovered that the critical part
for timing closure was the connection of
the address and data lines. We overcame
this problem by mapping the CPU and
memories inside the FPGA, instead of
using the external devices. For the latter we
took advantage of the large built-in memory
blocks of the Virtex-4 device, drastically
reducing our routing delays. The
Synplify Pro tool also helped, since it was
able to extract timing data from the memory
netlist files (EDN), rather than seeing
them only as black boxes.
Signal processing designed for an ASIC
is not easy to map to an FPGA, because it
is optimized for area, with reuse of hardware
modules on each clock cycle when the
data rate is slower than the clock. FPGA
synthesis works around this constraint using
retiming and logic duplication. We used
the Xilinx PlanAhead™ floor-planning
tool to constrain critical blocks of the
OFDM modem to a given FPGA zone in
order to reduce the routing delays inside
and between these blocks.
External Interfaces
The WiLDSYS FPGA platform is flexible
and easy to adapt to changes in the external
interfaces, thanks to the Virtex-4’s ability
to support different interface drive
characteristics through its adapted pads.
36 Xcell Journal Fourth Quarter 2009