## **Phase-I FELIX Design Review**

hardware development

#### Kai Chen On behalf of the FELIX group

FELIX Design Review November 11, 2016









1. FELIX Hardware
2. Prototype design of the FELIX I/O module









- Server Linux PC
- Up to two PCIe interface cards with Xilinx Ultrascale FPGA, depending on bandwidth needed (for two cards: using 2× PCIe slots Gen3 ×8 lanes, leaving enough lanes for the NIC(s))
- NIC, 40 or 100 Gb/s Ethernet interfacing or InfiniBand
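
As a rough check of the lane budget above, the raw bandwidth of a PCIe Gen3 link follows from the 8 GT/s per-lane signalling rate and the 128b/130b line encoding. The sketch below is illustrative only: the function name is hypothetical, and protocol overhead (TLP headers, flow control) is ignored, so the achievable DMA throughput is lower than these figures.

```python
def pcie_gen3_raw_gbps(lanes: int) -> float:
    """Raw PCIe Gen3 bandwidth in Gb/s, one direction, before protocol overhead."""
    gt_per_s = 8.0            # Gen3 signalling rate per lane (GT/s)
    encoding = 128.0 / 130.0  # 128b/130b line encoding efficiency
    return lanes * gt_per_s * encoding


one_card = pcie_gen3_raw_gbps(8)  # one interface card in a Gen3 x8 slot
two_cards = 2 * one_card          # two cards, each in its own x8 slot

print(f"one x8 card : {one_card:.1f} Gb/s raw")
print(f"two x8 cards: {two_cards:.1f} Gb/s raw")
```

The two-card figure indicates why a 100 Gb/s NIC is a reasonable match for a fully loaded server.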



SuperMicro X10SRA-F used for development

- Broadwell CPU, e.g. E5-1650V4, 3.6 GHz
- PCIe Gen3 slots



Mellanox ConnectX-3 VPI

- 2× FDR/QDR InfiniBand
- 2× 10/40 GbE

- FLX-709: Xilinx VC-709 evaluation board
  - 4 channels
  - on-board jitter cleaner doesn't meet requirements
    - SI5324 doesn't support 0-delay mode
- FLX-710: Hitech Global HTG-710 evaluation board
  - 24 channels
  - only supports the on-board oscillator
- The TTCfx card is designed to interface with the TTC system and clean the clock
  - FLX-709: TTCfx outputs a clean clock to the transceivers via SMA cable
  - FLX-710: the hardware must be changed
- These boards can be used to develop the FELIX functions when custom boards are not yet available
  - FLX-709: targets detector and trigger system test setups
  - FLX-710: development without interfacing TTC system (mostly for software development nowadays)









### Design of custom PCIe card





- A PCIe card was designed for the LAr LTDB (LAr Trigger Digitizer Board) test setup at BNL

- Basic functions:
  - PCIe Gen3 ×16 lanes
  - 48 bidirectional optical links up to 14 Gb/s
  - 2× DDR4 SODIMM connectors: capacity up to 16 GB, speed up to 2.1 GT/s
  - circuitry to interface with the TTC system, plus an on-board jitter cleaner
  - FPGA resources (logic cells) are about twice those of the FLX-709 or FLX-710
- Meets the FELIX requirements. A new function is added for FELIX:
  - micro-controller to support FPGA reprogramming, and firmware update
  - important for detector operation & maintenance
- It is the baseline choice for the Phase-I FELIX prototype






| Device    | Jitter (ps) |
|-----------|-------------|
| SI5338    | 8.58        |
| SI5345    | 0.09        |
| SI5341    | 6.39        |
| CDCM6208  | 2.06        |
| LMK03200  | 5.91        |
| LMK03033  | 2.74        |
| CDCE62005 | 8.61        |

Jitter from 10 kHz to 1 MHz

- This survey was originally done to choose a clock device for a FrontEnd board. It also provided input for the selection of jitter cleaner for FELIX.
- The SI5345 showed the best performance.
  - it supports 0-delay mode.
  - it has 10 outputs.
  - it meets the transceiver requirements of both 7 Series and UltraScale FPGAs.

#### Clock distribution for the FLX-711 V1P5





- LMK03200 is a backup for SI5345. Both of them support 0-delay mode.
- Each bank can take its reference clock from either the SI5345 or the LMK03200.

The power-on sequence has three stages.

| Module   | Current (A) | Input (V) | Output (V) |
|----------|-------------|-----------|------------|
| LTM4630A | 18/36       | 4.5-15    | 0.6-5.3    |
| LTM4630  | 18/36       | 4.5-15    | 0.6-1.8    |
| LTM4620A | 13/26       | 4.5-16    | 0.6-5.3    |



- The LTM4630A will be used for all 5 DC-DC power modules.
  - same price.
  - pin compatible.
  - new generation.
- Worst case power dissipation:
  - FPGA: about 37 W.
  - others: about 27 W.





- The use of I2C switches eases the design of firmware and software (compatibility with flx-i2c). One firmware module can handle all I2C slaves.
- It also reduces the need for level translators.



- The flash can store 4 different bit files. The one to load is selected by software via the micro-controller. A golden version can be saved in one of them.
- The software can reprogram the FPGA via the micro-controller.
- The FPGA firmware can receive a bit file from software via the PCIe interface and update the chosen flash partition.
- This design originates from the C-RORC (ATLAS RobinNP) board.
- The program running in the micro-controller and the script that controls FPGA programming are adapted from the design by Heiko Engel of the ALICE group.
- The flash programming via PCIe is being tested.







- The layout is complicated:
  - board thickness (1.57 mm) limits the number of layers.
  - board height requirement causes the traces to be very dense.
  - special impedance requirement for DDR4.
- Two kinds of blind vias are used. 3 sequential laminations are needed when producing the PCB.
  - one (layer 1-6) is for the MiniPODs.
  - one (layer 1-12) is for DDR4 traces.



#### The mechanical design for the FLX-711









- Two harnesses are ordered; each has 4 × 12-channel fibers of different lengths.

- Some improvements are made to the first version board, after the testing of V1P0.
  - Mapping of the 48 transceivers:
    - make sure the transmitter and receiver of each transceiver have the same sequence number in the Tx and Rx MiniPODs.
    - make sure the 4 transceivers in each quad are connected to the same sub-group (channel 1-4, 5-8 or 9-12) in one MiniPOD.
  - Simplified clock distribution design: the SI5345 replaces the SI5338.
  - Power sensing of the DC-DC modules is used to provide accurate VCCINT, MGTAVCC and MGTAVTT to the FPGA.
  - I2C switches are used.
- Bug fixes
  - Current capacity of the 1.8V power for the PCIe switch PEX8732.
  - Use dedicated clock pins for the DDR system clock.
  - A switch is added between the flash and the FPGA, since the special FPGA bank 0 is unusable by firmware. These pins will be connected to another bank when the bit file in flash is updated.
  - PCIe lane bit order in each quad.
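
The two mapping rules listed above (same Tx/Rx sequence number, one sub-group per quad) can be expressed as a small consistency check. This is a sketch under stated assumptions: the linear link-to-MiniPOD mapping and all function names are hypothetical and do not reproduce the actual FLX-711 pin assignment.

```python
from typing import Callable, Tuple

LINKS = 48                 # transceivers on the card
CHANNELS_PER_MINIPOD = 12  # channels per MiniPOD
QUAD_SIZE = 4              # transceivers per FPGA quad

Mapping = Callable[[int], Tuple[int, int]]


def minipod_channel(link: int) -> Tuple[int, int]:
    """Hypothetical linear mapping: link 0..47 -> (MiniPOD index, channel 1..12)."""
    return link // CHANNELS_PER_MINIPOD, link % CHANNELS_PER_MINIPOD + 1


def sub_group(channel: int) -> int:
    """Sub-group 0, 1 or 2 for channels 1-4, 5-8 or 9-12."""
    return (channel - 1) // QUAD_SIZE


def check_mapping(tx_map: Mapping, rx_map: Mapping) -> bool:
    """Return True if both mapping rules hold for every link and quad."""
    # Rule 1: Tx and Rx of a transceiver use the same MiniPOD index and channel.
    for link in range(LINKS):
        if tx_map(link) != rx_map(link):
            return False
    # Rule 2: the 4 links of a quad share one MiniPOD and one sub-group.
    for quad in range(LINKS // QUAD_SIZE):
        members = [tx_map(quad * QUAD_SIZE + i) for i in range(QUAD_SIZE)]
        if len({(pod, sub_group(ch)) for pod, ch in members}) != 1:
            return False
    return True


def shifted(link: int) -> Tuple[int, int]:
    """A deliberately misaligned mapping, used only to show a failing case."""
    return minipod_channel((link + 2) % LINKS)


print(check_mapping(minipod_channel, minipod_channel))
print(check_mapping(shifted, shifted))
```

A check of this kind can be run against the real pin table before layout to catch quads that straddle two sub-groups.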





# **Test results of FLX-711 prototype**





MIG calibration report (Vivado hardware manager screenshot; stage names truncated in the screenshot are marked with …):

| Property          | Value                                |
|-------------------|--------------------------------------|
| Name              | MIG_1                                |
| MIG status        | CAL PASS                             |
| MicroBlaze status | PASS                                 |
| DQS gate status   | RUNNING                              |
| Message           | No errors detected during calibration. |

| Stage | Calibration stage                         | Status |
|-------|-------------------------------------------|--------|
| 1     | DQS Gate                                  | PASS   |
| 2     | DQS Gate Sanity Check                     | PASS   |
| 3     | Write Leveling                            | PASS   |
| 4     | Read Per-Bit D…                           | PASS   |
| 5     | Read Per-Bit D…                           | SKIP   |
| 6     | Read DQS Cen…                             | PASS   |
| 7     | Read Sanity Check                         | PASS   |
| 8     | Write DQS to DQ…                          | PASS   |
| 9     | Write DQS to DM…                          | PASS   |
| 10    | Write DQS to D…                           | PASS   |
| 11    | Write DQS to …                            | PASS   |
| 12    | Read DQS Centering DBI (Simple)           | SKIP   |
| 13    | Write Latency …                           | PASS   |
| 14    | Write Read Sa…                            | PASS   |
| 15    | Read DQS Centering (Complex)              | PASS   |
| 16    | Write Read Sanity Check 1                 | PASS   |
| 17    | Read VREF Training                        | SKIP   |
| 18    | Write Read Sanity Check 2                 | SKIP   |
| 19    | Write DQS to DQ (Complex)                 | PASS   |
| 20    | Write DQS to DM/DBI (Complex)             | SKIP   |
| 21    | Write Read Sanity Check 3                 | PASS   |
| 22    | Write VREF Training                       | SKIP   |
| 23    | Write Read Sanity Check 4                 | SKIP   |
| 24    | Read DQS Centering Multi Rank Adjustment  | SKIP   |
| 25    | Write Read Sanity Check 5                 | SKIP   |
| 26    | Multi Rank Adjustment and Checks          | SKIP   |
| 27    | Write Read Sanity Check 6                 | SKIP   |

- The two DDR4 modules work well at a speed of 2.11 GT/s.
  - calibration is OK.
  - passed 100 iterations of write and read checks.

- `82:00.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)`
- `83:08.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)`
- `83:09.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)`
- `83:09.0 Communication controller: Xilinx Corporation Device 7038`
- `85:00.0 Communication controller: Xilinx Corporation Device 7039`

Output of `./flx-throughput -b 100000`, run in parallel on the two endpoints (terminal screenshots; each report covers about 12.4 million blocks):

| Report | Endpoint 1 block rate (blocks/s) | Endpoint 1 DMA read (GiB/s) | Endpoint 2 block rate (blocks/s) | Endpoint 2 DMA read (GiB/s) |
|--------|----------------------------------|-----------------------------|----------------------------------|-----------------------------|
| 1      | 6185536.165939                   | 5.898987                    | 6196905.903473                   | 5.909830                    |
| 2      | 6185905.186155                   | 5.899339                    | 6185849.312970                   | 5.899286                    |
| 3      | 6183939.807421                   | 5.897465                    | 6184098.089583                   | 5.897616                    |
| 4      | 6185336.614234                   |                             | 6185390.978649                   | 5.898849                    |
| 5      | 6184295.636766                   | 5.897864                    | 6184691.823969                   | 5.898182                    |

- Two Xilinx PCIe endpoints are found by *lspci*.
- Reading of the two PCIe endpoints is done in parallel. In this example the firmware sends simple counter data.
- Total throughput is about 101.7 Gb/s.
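
The total quoted above can be cross-checked from the per-endpoint `flx-throughput` readings. The sketch below is an illustration, not part of the FELIX tools: the 1024-byte block size is an assumption inferred from the ratio of the reported block rate to the reported GiB/s, and the helper names are hypothetical. It uses one sample reading per endpoint and gives a total of about 101 Gb/s, in line with the quoted figure.

```python
GIB = 2**30         # bytes per GiB
BLOCK_BYTES = 1024  # assumption: inferred from block rate vs. GiB/s in the readout


def blocks_to_gib_per_s(blocks_per_s: float) -> float:
    """Convert a block rate to GiB/s for the assumed block size."""
    return blocks_per_s * BLOCK_BYTES / GIB


def gib_to_gbit_per_s(gib_per_s: float) -> float:
    """Convert GiB/s (binary prefix) to Gb/s (decimal prefix)."""
    return gib_per_s * GIB * 8 / 1e9


# One sample DMA read rate per endpoint, taken from the readout (GiB/s).
endpoint_gib_per_s = [5.898987, 5.909830]

total_gbit = sum(gib_to_gbit_per_s(rate) for rate in endpoint_gib_per_s)

print(f"block-rate check: {blocks_to_gib_per_s(6185536.165939):.6f} GiB/s")
print(f"total throughput: {total_gbit:.1f} Gb/s")
```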

[Screenshot: Vivado 2016.2 Serial I/O Links dashboard for the IBERT test. 48 links are found (MGT_X0Y8 to MGT_X1Y39). The legible rows show line rates of 12.795 to 12.804 Gb/s, PRBS 31-bit TX and RX patterns, 1.021E15 received bits, 0 errors and a BER of 9.79E-16. The last link (Found 47, MGT_X1Y39) shows a BER of 4.922E-13.]



- IBERT testing is done for all the 48 links at 12.8 Gb/s.
  - BER < 1E-15.
  - the last link: the RX\_P pin on the FPGA side is open due to an assembly issue.
  - the local clock & LMK03200 are used. The TTC clock will be used when the SI5345 configuration is ready.
- The typical eye diagram is shown above: open area is 5312.
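
The BER figure reported by the tool for an error-free link is simply the reciprocal of the number of received bits. The sketch below reproduces that number from the bit count in the screenshot and also shows the standard zero-error upper bound at 95% confidence, $-\ln(0.05)/N$, which is about three times larger. Variable names are illustrative, and the run time assumes the nominal 12.8 Gb/s line rate.

```python
import math

bits_received = 1.021e15  # bits per link, from the IBERT readout
line_rate_bps = 12.8e9    # nominal line rate (assumed constant over the run)

# With zero errors, the tool reports 1/N as the BER estimate.
ber_reported = 1.0 / bits_received

# Zero-error upper bound at 95% confidence: -ln(1 - 0.95) / N.
ber_upper_95 = -math.log(1.0 - 0.95) / bits_received

# Time needed to accumulate this many bits on one link.
run_hours = bits_received / line_rate_bps / 3600.0

print(f"reported BER  : {ber_reported:.2e}")
print(f"95% CL bound  : {ber_upper_95:.2e}")
print(f"run time      : {run_hours:.1f} h")
```

The run time of roughly 22 hours is consistent with a one-day test.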





- Firmware version control:
  - the software can configure the FPGA to load firmware from any one of the 4 bit files in flash, by communicating with the micro-controller.
- Power dissipation:
  - current for 48 $\times$  12.8 Gb/s IBERT project: VCCINT needs about 9A, MGTAVCC is 10A, MGTAVTT is 5A.
  - project with 4× GBT links, and 2× PCIe cores: VCCINT is 3A, MGTAVCC is 3A, MGTAVTT is 1A, 0.9V for PEX8732 is about 5A.
  - Summary: the power consumption is very close to the analysis done before board design. The whole board will dissipate <64W in the worst case.
- Cooling:
  - for the 48-channel 12.8 Gb/s IBERT project, the FPGA internal temperature is about 63.7 °C.
  - for the IBERT project at 4.8 Gb/s, the temperature inside the FPGA is about 53 °C and the outside temperature is about 38 °C.
  - the FPGA will use a fansink. We don't expect the MiniPODs (<2 W each) to need a heat sink; the air flow in a 2U server should be sufficient.
  - a one-day run with 12.8 Gb/s links showed no evident problem related to overheating.
  - the flx-tools will support monitoring temperature of FPGA, MiniPODs and PEX8732.
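
The measured rail currents above can be turned into approximate power figures and compared with the worst-case budget (37 W for the FPGA plus 27 W for the rest). This is a rough sketch: the 0.9 V PEX8732 rail is from the measurements, but the VCCINT, MGTAVCC and MGTAVTT voltages are assumed nominal values that should be checked against the FPGA data sheet, and only the rails listed in the measurements are included.

```python
# Rail voltages in volts. Only the 0.9 V PEX8732 rail is stated in the
# measurements; the other three are ASSUMED nominal values.
RAIL_VOLTS = {
    "VCCINT": 0.95,      # assumption
    "MGTAVCC": 1.0,      # assumption
    "MGTAVTT": 1.2,      # assumption
    "PEX8732_0V9": 0.9,  # from the measurements
}

# Measured currents in amperes.
ibert_48_links = {"VCCINT": 9, "MGTAVCC": 10, "MGTAVTT": 5}
gbt_4_links = {"VCCINT": 3, "MGTAVCC": 3, "MGTAVTT": 1, "PEX8732_0V9": 5}


def rail_power(currents: dict) -> float:
    """Sum V * I over the listed rails, in watts."""
    return sum(RAIL_VOLTS[rail] * amps for rail, amps in currents.items())


worst_case_budget_w = 37 + 27  # FPGA + others, from the pre-design analysis

print(f"IBERT, 48 x 12.8 Gb/s : {rail_power(ibert_48_links):.2f} W on listed rails")
print(f"4 x GBT + 2 x PCIe    : {rail_power(gbt_4_links):.2f} W on listed rails")
print(f"worst-case budget     : {worst_case_budget_w} W")
```

Both projects stay well inside the worst-case budget on the listed rails; rails not listed here add to the totals.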

### **Plan towards the pre-production design**





- Now: banks 224-227 are used for PCIe. This creates placement congestion in the logic around that area, and data crossing the SLR (Super Logic Region) boundary increases timing violations.
- Next version: banks 226/227, and 229/230 or 231/232. A complete study will be done to compare these options. If all of them work well, the one that is easier for PCB routing will be chosen.





- Remove the DDR4 modules:
  - the PCB routing will be much easier.
  - no blind vias will be used, which minimizes sequential lamination.
  - the board will be shorter. Space will be available for integration with TTC PON or White Rabbit modules.





- FELIX prototype development is progressing well.
  - the v1p0 board has been tested extensively and used in the FELIX integration test.
  - the v1p5 board testing is ongoing.
  - good progress after the issue with power chips assembly was resolved.
  - the DDR4, PCIe, transceivers and flash programming have been tested.
  - more boards will be assembled & tested in the coming month, then distributed to FELIX development institutes.
- FELIX pre-production design will be launched soon.
  - major improvements have been identified.
  - new features will be incorporated based on prototype V1P5 test results.
  - board is expected to be available in second half of 2017.