# AM-based Level-1 Track Trigger for CMS Phase II Upgrade

### Zhen Hu, Fermilab





Dec 23, 2017

CLHCP, Nanjing



### Contents

- Motivation
- System overview
  - Divide and conquer
  - AM introduction
  - PR core firmware introduction
- Hardware R&D
- Full system demonstration
  - Latency
  - Efficiency
- Summary and outlook





### HL-LHC Trigger Upgrade

- High Luminosity LHC in 2026
  - 40 MHz bunch crossing
  - Up to 200 pileup (PU)
  - Huge data volume
- Current Level-1 Trigger cannot cope
  - Maximum Level-1 accept rate
    - only 100 kHz
  - For HL-LHC, the current trigger system would give
    - EG rate @25 GeV  $\rightarrow$  100 kHz
    - Overall trigger rate needed to reach physics goals → 1000 kHz (unsustainable)
  - Raising trigger thresholds →
    lose the opportunity for new
    physics at low thresholds



- Upgrade trigger system
  - Must increase total bandwidth
  - Must increase trigger capabilities
  - Level-1 Tracking is a completely NEW handle







### Track trigger: advantages and challenges

- A silicon-based track trigger is crucial for the CMS Phase-2 upgrade
  - Sharp turn-on efficiency curve
  - Background rate reduction  $\rightarrow$ allows lower object thresholds
- Huge challenges
  - How to handle readout of the entire tracker?
    - 260M channels, 40 MHz, 100 Tbps of data (comparable to the total bandwidth of the world's submarine cables a few years ago)
  - ~5 μs latency, covering:
    - Data distribution, track reconstruction, track fitting, ...




# Solution: divide and conquer



- Space parallel
  - 6×8 = 48 trigger towers
  - 100 Tbps  $\rightarrow$  ~2 Tbps per tower





- Time parallel
  - 8× time multiplexing
  - $25\,\text{ns} \rightarrow 200\,\text{ns}$ between events on each board



Conventional vs. ATCA full mesh

Without ATCA, a huge amount of cabling work is required. In 2016, we were the only CMS group developing ATCA.
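The space and time factors above are simple arithmetic, which the Python sketch below reproduces. It rests on two simplifying assumptions of ours, not facts about the design: the 100 Tbps is split evenly across the 48 towers (in reality tower occupancy varies and towers overlap), and events are dealt to the 8 time-multiplexed boards in plain round-robin order by bunch crossing. All helper names are hypothetical.

```python
"""Divide-and-conquer arithmetic (illustrative sketch).

Assumptions, not design facts: the 100 Tbps is split evenly over the
towers, and events are assigned to boards round-robin by bunch crossing.
"""

TOTAL_BANDWIDTH_TBPS = 100.0  # full tracker readout (from the slide)
N_ETA, N_PHI = 6, 8           # 6x8 trigger towers
BX_SPACING_NS = 25.0          # 40 MHz bunch crossing
TM_FACTOR = 8                 # 8x time multiplexing


def bandwidth_per_tower(total_tbps: float, n_towers: int) -> float:
    """Average bandwidth per tower under an even split (assumption)."""
    return total_tbps / n_towers


def board_for_event(bx_index: int, tm_factor: int = TM_FACTOR) -> int:
    """Hypothetical round-robin mapping of a bunch crossing to a board."""
    return bx_index % tm_factor


def time_per_event_ns(bx_spacing_ns: float, tm_factor: int) -> float:
    """Interval between consecutive events arriving at the same board."""
    return bx_spacing_ns * tm_factor


if __name__ == "__main__":
    n_towers = N_ETA * N_PHI
    per_tower = bandwidth_per_tower(TOTAL_BANDWIDTH_TBPS, n_towers)
    print(f"{n_towers} towers, ~{per_tower:.1f} Tbps each")  # 48 towers, ~2.1 Tbps each
    print(f"{time_per_event_ns(BX_SPACING_NS, TM_FACTOR):.0f} ns per event per board")  # 200 ns
    print([board_for_event(bx) for bx in range(10)])  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```

The two factors compound: each board sees roughly 1/48 of the detector and only every 8th event.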







# Solution: Associative Memory












## AM Pattern and Bank

- A pattern is a low-resolution track
  - Made of one superstrip (SS) per layer
    - An SS is a group of adjacent strips
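The pattern idea can be sketched in a few lines of Python. Here the pattern bank is simply a list of patterns, and matching returns the addresses of patterns ("roads") whose superstrips are hit by the event's stubs. The superstrip width, the six-layer patterns and the `min_layers` threshold (which lets a pattern fire with a missing layer) are illustrative assumptions, not CMS parameters, and the sequential loop stands in for the chip's fully parallel comparison.

```python
"""Minimal sketch of associative-memory (AM) pattern matching.

Illustrative only: SS_WIDTH, the number of layers and min_layers are
hypothetical values. The AM chip compares all patterns in parallel.
"""

from typing import Dict, List, Sequence, Set, Tuple

SS_WIDTH = 32  # strips per superstrip: hypothetical value

Pattern = Tuple[int, ...]  # one superstrip ID per layer
Stub = Tuple[int, int]     # (layer, strip)


def superstrip_id(strip: int, ss_width: int = SS_WIDTH) -> int:
    """Group adjacent strips into one coarse superstrip."""
    return strip // ss_width


def fired_superstrips(stubs: Sequence[Stub]) -> Dict[int, Set[int]]:
    """Map each layer to the set of superstrips hit by the stubs."""
    hits: Dict[int, Set[int]] = {}
    for layer, strip in stubs:
        hits.setdefault(layer, set()).add(superstrip_id(strip))
    return hits


def match_roads(bank: Sequence[Pattern], stubs: Sequence[Stub],
                min_layers: int) -> List[int]:
    """Return addresses of patterns matched in at least `min_layers` layers."""
    hits = fired_superstrips(stubs)
    roads = []
    for address, pattern in enumerate(bank):
        matched = sum(1 for layer, ss in enumerate(pattern)
                      if ss in hits.get(layer, set()))
        if matched >= min_layers:
            roads.append(address)
    return roads


if __name__ == "__main__":
    bank = [(0, 1, 2, 3, 4, 5), (3, 3, 3, 3, 3, 3)]
    stubs = [(0, 5), (1, 40), (2, 70), (3, 100), (4, 130)]  # layer 5 has no stub
    print(match_roads(bank, stubs, min_layers=5))  # [0]
```

Coarser superstrips mean fewer patterns are needed to cover all tracks, at the cost of more fake matches that the downstream FPGA stages must reject.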






- More powerful AM => less demand on the FPGA
- More powerful FPGA => less demand on the AM





### Core firmware design







#### ATCA shelf












We tested the data transfer performance of the full-mesh backplane, the Pulsar2b and the RTM.

ATCA Processing Blade: Pulsar2b












We also tested the data transfer performance of the PRM.




PRM (Pattern Recognition Mezzanine Card)

IPMC (Intelligent Platform Management Controller)












### Excellent hardware performance

- ATCA shelf
  - 10 blades for parallel processing
  - Full mesh backplane is a natural solution for time multiplexing
    - All of the 56 bidirectional links among 8 Pulsar2b boards were tested at 10 Gbps
- Rear Transition Module
  - 10 QSFP bidirectional links
    - 10 Gbps per link achieved
- PRM performance
  - Communication between
    Pulsar2b and PRM FPGAs
    - 10 Gbps achieved
  - Link between two latest-generation Xilinx FPGAs
    - 16.3 Gbps achieved
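Why a full mesh suits time multiplexing can be illustrated with a simplified model, which is our assumption and not the actual firmware: every board holds a slice of every event, and each event is processed by one board chosen round-robin. Over any 8 consecutive events, every board then has to send to every other board. Counting each direction of the 28 board pairs separately gives 8×7 = 56 links, which is one reading of the 56 tested links. Function names are hypothetical.

```python
"""Sketch: why a full-mesh backplane suits time multiplexing.

Simplified model (assumption): each board holds a slice of every event and
forwards it to the board processing that event, chosen round-robin. Links
are counted per direction.
"""

from itertools import permutations
from typing import List, Tuple

N_BOARDS = 8

Link = Tuple[int, int]  # (source board, destination board)


def mesh_links(n_boards: int) -> List[Link]:
    """All directed board-to-board links of a full mesh."""
    return list(permutations(range(n_boards), 2))


def destination_board(event: int, n_boards: int = N_BOARDS) -> int:
    """Hypothetical round-robin choice of the board that processes `event`."""
    return event % n_boards


def transfers_for_event(event: int, n_boards: int = N_BOARDS) -> List[Link]:
    """Links used when every other board forwards its slice of `event`."""
    dst = destination_board(event, n_boards)
    return [(src, dst) for src in range(n_boards) if src != dst]


if __name__ == "__main__":
    print(len(mesh_links(N_BOARDS)))  # 56
    used = {link for ev in range(N_BOARDS) for link in transfers_for_event(ev)}
    print(used == set(mesh_links(N_BOARDS)))  # True
```

In this model every link is exercised once per 8-event cycle, so any missing link would stall one of the time slices; a backplane mesh provides all of them without external cabling.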








# Full system demonstration at Fermilab

- Using today's technology to demonstrate track trigger feasibility
- Two shelves fully loaded with Pulsar2b boards for one trigger tower



#### Pattern Recognition Board (PRB) shelf

- 10 Pulsar IIb
- Some boards with PRM Mezzanines

(Bandwidth between any pair of Pulsars is 20 Gbps)

#### Data Source Board (DSB) shelf

- Emulates the output of ~400 modules
- 10 Pulsar IIb
- 100 QSFP+ fibers (capable of sourcing up to 4.8 Tbps of data with a full shelf)







## **Demonstrator validation**



- Hardware and emulator perfectly matched
  - Output from each stage validated bit-by-bit
- With the full-chain demonstrator, we have measured:
  - Latency, FPGA resource usage, efficiency, resolution





## **Excellent latency achieved**







### Example events in Vivado



4×(ACB+LTF): 7191.666 ns to 7716.666 ns = 525 ns, i.e. 126 clock cycles @ 240 MHz. This is for a selected $t\bar{t}$ + PU200 event in the high tail of the combination distribution.
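The clock count follows from the measured window and the 240 MHz firmware clock quoted above; only the arithmetic is added here:

$$
\Delta t = 7716.666\,\text{ns} - 7191.666\,\text{ns} = 525\,\text{ns},
\qquad
N_{\text{clk}} = \Delta t \cdot f_{\text{clk}} = 525\,\text{ns} \times 240\,\text{MHz} = 525 \times 0.24 = 126 .
$$

Equivalently, one clock period is $1/(240\,\text{MHz}) \approx 4.17\,\text{ns}$, and $126 \times 4.17\,\text{ns} \approx 525\,\text{ns}$.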





# FPGA Resource Utilization (KU060)





PR stage only

PR + 2[FIFO +4(CB+TF)]

- Very lightweight design
- BlockRAM mainly used for DO
  - TF does not increase BlockRAM usage, leaving enough room for more TF copies
- Modest increase in registers and DSP blocks
  - Plenty of room for parallel copies of the fitter





# High efficiency up to PU250

- The system is robust against higher luminosity or increased stub occupancy
  - We demonstrate that the system reconstructs all tracks for events with PU250 within 2.5 $\mu$s (no truncation needed)
  - Truncation is needed only for high-p<sub>T</sub> jets, to meet the pipeline window







### Resolution

#### Excellent performance for L1 application



p<sub>T</sub> resolution

z<sub>0</sub> resolution







# Summary

- An open and flexible ATCA system architecture
  - Independent of the type of tracking algorithm
- Demonstrated with a vertical slice
  - Excellent performance with today's technology at high luminosity, up to PU250
  - Very low latency: reconstructs all tracks within 2.5 $\mu$s
  - Safety margin: 1.5 $\mu$s left for further processing
- Even with the AM ASIC doing the heaviest lifting, tracking within very high-p<sub>T</sub> jets remains challenging.
   However, there are many ways to improve this.







## AM in FPGA: Overview

- AM in FPGA: closely follows the AM ASIC (chip) design
  - The two silicon tiers of the ASIC map to two modules in the FPGA firmware
    - CAM tier -> a 2D array of Pattern Modules
    - I/O tier -> fired-road serialization and output
  - Pipelined operation
    - CAM tier: matches the stubs of the current event N against the stored patterns
    - I/O tier: simultaneously outputs the road addresses of event N-1
- CAM tier logic is optimized for the 7-Series/UltraScale FPGA architectures
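The two-tier pipelining can be modelled behaviourally in Python. This is a software sketch with hypothetical names, not the firmware: one loop iteration stands for one event slot, `match` stands in for the CAM-tier pattern matching, and the model only shows that each event's roads leave the I/O tier one slot after its stubs entered the CAM tier, while the next event is already being matched.

```python
"""Behavioural sketch of the two-tier AM-in-FPGA pipeline (hypothetical names).

In each event slot the CAM tier matches event N while the I/O tier
outputs the road addresses of event N-1.
"""

from typing import Callable, List, Optional, Sequence, Tuple

Road = int
Emitted = Tuple[int, int, List[Road]]  # (output slot, event index, road addresses)


def run_pipeline(events: Sequence[Sequence[int]],
                 match: Callable[[Sequence[int]], List[Road]]) -> List[Emitted]:
    """Return what the I/O tier emits, one event per slot."""
    output: List[Emitted] = []
    latched: Optional[Tuple[int, List[Road]]] = None  # CAM -> I/O hand-off
    slot = 0
    for slot, stubs in enumerate(events):
        # I/O tier: output the roads of event N-1, latched in the previous slot.
        if latched is not None:
            output.append((slot, latched[0], latched[1]))
        # CAM tier: in the same slot, match the stubs of event N.
        latched = (slot, match(stubs))
    # One extra slot drains the last event through the I/O tier.
    if latched is not None:
        output.append((slot + 1, latched[0], latched[1]))
    return output


if __name__ == "__main__":
    # Toy matcher (hypothetical): each distinct stub value is a road address.
    print(run_pipeline([[3, 1], [2], []], lambda s: sorted(set(s))))
    # [(1, 0, [1, 3]), (2, 1, [2]), (3, 2, [])]
```

Overlapping the tiers this way keeps the CAM array busy every slot, at the cost of one slot of extra latency per event.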







