





# Development of the Online Trigger System for the COMET Phase-I Experiment

#### M1 宮滝 雅己, Aoki Laboratory, Department of Physics, Graduate School of Science, Osaka University <u>m-miyataki@epp.phys.sci.osaka-u.ac.jp</u>

## **COMET Phase-I experiment**



- Purpose: Search for $\mu$-e conversion in an Al target
  - Signal: monoenergetic 105 MeV electron
- Single event sensitivity: $3.0 \times 10^{-15}$ (a factor of 100 better than the current sensitivity)
- Detector: Cylindrical detector system
  - Electron momentum and timing measurement

## Cylindrical detector system

#### CDC (Cylindrical Drift Chamber)

- Evaluate particle momentum
- Measurement of particle hit position
  - 4986 sense wires, 18 stereo layers
- Readout electronics: RECBE × 104

#### CTH (Cylindrical Trigger Hodoscope)

- Measure event timing
- Trigger counter
  - Cherenkov & Scintillation counter sets
- CTH trigger rate > 90 kHz
  - 4 fold coincidence
  - Accidental coincidence & low-E electron dominant





# Trigger system

#### Requirement

- Trigger rate: 13 kHz
  - Constraint from the DAQ
  - CTH trigger: 90 kHz
- Latency < 7 $\mu$s
  - Constraint from the RECBE buffer time
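
As a rough worked number, taking the CTH trigger rate at its quoted 90 kHz and the DAQ limit of 13 kHz, the online trigger has to reduce the rate by a factor of about

$$\frac{90\ \mathrm{kHz}}{13\ \mathrm{kHz}} \approx 6.9$$

that is, roughly six out of every seven CTH triggers have to be rejected while keeping the signal.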

#### **Expected performance**

- Signal efficiency: 96%
  - at the required trigger rate
- Latency: $3.2\ \mu$s

Improve by Neural Network!



\*GBDT (Gradient Boosted Decision Tree) is a machine learning algorithm.

### Why Neural Network?

- Efficient finding of signal electrons
  - Neural networks learn relationships between neurons in the form of weights
- FPGAs are a good match for NN calculations
  - A neural network computation is essentially an iterated inner product followed by a non-linear function, so energy-efficient hardware implementations and massively parallel computing are in heavy demand.
- Tools for building machine learning models on FPGAs have become available in recent years.



$$\mathbf{x}_{m} = g_{m} \left( \mathbf{W}_{m,m-1} \mathbf{x}_{m-1} + \mathbf{b}_{m} \right)$$
hls4ml

© Copyright 2021, Fast Machine Learning Lab.
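
The layer equation $\mathbf{x}_{m} = g_{m}(\mathbf{W}_{m,m-1}\mathbf{x}_{m-1} + \mathbf{b}_{m})$ can be illustrated with a minimal pure-Python sketch. The toy layer sizes, weights, and the choice of ReLU and sigmoid activations are hypothetical and are not taken from the trigger study; the point is only that each layer is a set of inner products followed by a non-linear function.

```python
import math


def relu(z):
    """Rectified linear unit, one possible choice for g_m."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic function, often used for a two-class output score."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(weights, bias, x_prev, activation):
    """Compute x_m = g_m(W x_{m-1} + b) for a single layer.

    weights: one row of weights per output neuron
    bias:    one bias value per output neuron
    """
    out = []
    for row, b in zip(weights, bias):
        # Inner product of one weight row with the previous layer's output.
        z = sum(w * x for w, x in zip(row, x_prev)) + b
        out.append(activation(z))
    return out


def forward(layers, x0):
    """Apply the layers in order, feeding each output into the next layer."""
    x = x0
    for weights, bias, activation in layers:
        x = dense_layer(weights, bias, x, activation)
    return x


# Hypothetical toy network: 3 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
toy_layers = [
    ([[0.5, -1.0, 0.25], [1.0, 1.0, -0.5]], [0.0, -0.5], relu),
    ([[1.0, -2.0]], [0.0], sigmoid),
]
score = forward(toy_layers, [1.0, 0.5, 2.0])
```

On an FPGA the inner loop of `dense_layer` is what gets unrolled into parallel multipliers, which is why the multiplication count drives the resource usage.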

## Hit classification

Remove wires with multiple hits to eliminate low-energy electrons staying in the same cell

Machine learning scores the hit information for each wire based on energy loss and local patterns

**Input features**: $\Delta E$ @ wire-of-interest, layer ID of the wires
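
The multi-hit removal step can be sketched as follows. This is a minimal illustration, assuming each hit is available as a `(wire_id, delta_e)` pair; the data layout and the function name `remove_multi_hit_wires` are hypothetical and do not describe the firmware implementation.

```python
from collections import Counter


def remove_multi_hit_wires(hits):
    """Keep only hits on wires that fired exactly once in the event.

    hits: list of (wire_id, delta_e) tuples (hypothetical layout).
    """
    counts = Counter(wire_id for wire_id, _ in hits)
    return [(wire_id, delta_e) for wire_id, delta_e in hits if counts[wire_id] == 1]


# Wire 102 fires three times (e.g. a low-energy electron staying in one cell),
# so all of its hits are dropped before hit classification.
hits = [(101, 0.8), (102, 1.1), (102, 0.9), (250, 1.0), (102, 1.3)]
clean = remove_multi_hit_wires(hits)
```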





### **Event Classification Algorithm**

#### Previous algorithm



### **Concept of the event-classification**



- The event-classification problem now becomes an image-classification problem.
  - Pattern recognition of the signal electron trajectory using hit scores.
- The neural network design study is ongoing.
  - → Feasibility check

How to implement your NN model on an FPGA? & Feasibility check

- How long is the latency?
- How large is the resource usage?


### HLS4ML concept

Workflow for translating a model into an FPGA implementation using hls4ml



\*Fast inference of deep neural networks in FPGAs for particle physics, arXiv:1804.06913v3 [physics.ins-det], 28 Jun 2018

#### Key metrics for an FPGA implementation

#### Latency

The total time required for the algorithm to complete

#### Resource usage

- BRAM: Block RAM
  - Hardened RAM resource
- DSP: Digital Signal Processor
  - Performs multiplication and other arithmetic in the FPGA
- FF: Flip-Flop
  - Registers data in time with the clock pulse
- LUT: Look-Up Table (logic)
  - Generic functions on small bit-width inputs

The limits on these metrics become design constraints.


## Neural Network on FPGA



To see how large a network could be built on the current COTTRI MB\*, I ran a test with the MNIST handwritten-digit dataset.

\*A Xilinx Kintex-7 FPGA (part number xc7k355tffg901-1) is installed on the COTTRI MB.

#### MNIST handwritten-digit test bench

Two-class classification: 0 and 1



## Efficient network design

- Compression
  - Reduce the number of synapses or neurons
- Quantization
  - Reduces the precision of the calculations (weights, biases, etc)
- Parallelization
  - Reduce resource utilization by reusing DSPs (tune how much to parallelize the multiplications required for a given layer computation)



\*Fast inference of deep neural networks in FPGAs for particle physics, arXiv:1804.06913v3 [physics.ins-det], 28 Jun 2018

#### **Parallelization and Quantization**

Parallelization

- Data precision <16,6>\*
- Reuse factor == 2

Quantization

- Inner-layer data precision <7,1>\*
- Reuse factor == 1

\*<X,Y>: X is the total number of bits; Y is the number of bits, including the sign, above the binary point.
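
To make the `<X,Y>` notation concrete, the following sketch quantizes a floating-point value to a signed fixed-point grid. It is an illustration only: the function name `quantize_fixed` is hypothetical, and the rounding (truncation toward minus infinity) and overflow handling (saturation) are assumptions chosen for readability. The default overflow mode of `ap_fixed` in Vivado HLS is wrap-around, so real firmware may behave differently at the range limits.

```python
import math


def quantize_fixed(value, total_bits, int_bits):
    """Quantize value to signed fixed-point <total_bits, int_bits>.

    int_bits counts the bits above the binary point, including the sign bit.
    Assumptions: truncate toward minus infinity, saturate on overflow.
    """
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits                # smallest representable increment
    lowest = -(2.0 ** (int_bits - 1))       # most negative representable value
    highest = 2.0 ** (int_bits - 1) - step  # most positive representable value
    quantized = math.floor(value / step) * step
    return min(max(quantized, lowest), highest)


# <16,6>: step 2^-10, range [-32, 32 - 2^-10]
# <7,1>:  step 2^-6,  range [-1, 1 - 2^-6]
coarse = quantize_fixed(0.3, 7, 1)
fine = quantize_fixed(0.3, 16, 6)
saturated = quantize_fixed(1.5, 7, 1)
```

With `<7,1>` the representable range is only [-1, 0.984375], so this precision suits inner-layer values that are already bounded, at the cost of a coarser step.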

## Comparison of the 3 cases

- Case 1: Benchmark
- Case 2: Reuse factor == 2
- Case 3: Inner-layer data precision <7,1>

| Case   | Latency                    | Reuse factor |
|--------|----------------------------|--------------|
| Case 1 | 0.170 µs (34 cycles)       | 1            |
| Case 2 | 0.310 µs (31 cycles × 2)   | 2            |
| Case 3 | 0.124 µs (25 cycles)       | 1            |

| Name                   | BRAM_18K | DSP48E | FF     | LUT    |
|------------------------|----------|--------|--------|--------|
| Available              | 1430     | 1440   | 445200 | 222600 |
| Case 1 utilization (%) | ~0       | 174    | 29     | 23     |
| Case 2 utilization (%) | ~0       | 90     | 21     | 39     |
| Case 3 utilization (%) | ~0       | 58     | 12     | 29     |

Reducing the data precision appears to be effective at reducing both the latency and the DSP usage.

## **Simulation Result**

Case 3 was synthesized with Vivado HLS and then simulated in Vivado.

The input was the signal for a handwritten 1.



The latency is two clock cycles longer, but the results are otherwise roughly as expected.

## Summary

- The basis for the trigger system is already in place.
- We are looking for more powerful event classification algorithms, and neural networks are a good candidate, so I am studying a simple MLP first.
- The feasibility of neural network implementations is being checked using the MNIST handwritten-digit dataset.

Backup

### **ADC-Sum compression**



## Hit characteristics



### Signal hit

- Continuous hits in the same layer
  - Spiral trajectory in the CDC
- Single hit on the same wire
- Does not reach the outer layers
- MIP-level energy loss

### Background hit

- Low-energy electrons
  - Helical trajectory in the same cell
  - Multiple hits on the same wire
- Protons
  - High momentum
  - Large energy loss

\*COMET Phase-I Technical Design Report, Fig. 43

Input features were chosen based on these characteristics.

### **GBDT input feature**

Hit classification by GBDT @ COTTRI FE

\*Multiple hits on the same wire are cut before hit classification.

#### ★ 2-bit Edep data

- Energy loss information
- 0 if the wire of interest has no hit
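
One way such a 2-bit energy-loss code could be formed is sketched below. Only the convention that 0 means "no hit" comes from the slide; the use of an ADC sum as input, the threshold values, and the function name `encode_edep_2bit` are hypothetical placeholders for illustration.

```python
from bisect import bisect_right

# Hypothetical ADC-sum thresholds separating three energy-loss bins.
# These values are placeholders, not the ones used in the experiment.
EDEP_THRESHOLDS = (50, 200)


def encode_edep_2bit(adc_sum, thresholds=EDEP_THRESHOLDS):
    """Map an ADC sum to a 2-bit code.

    0     : no hit on the wire (adc_sum is None)
    1..3  : increasing energy-loss bins defined by the two thresholds
    """
    if adc_sum is None:
        return 0
    return 1 + bisect_right(thresholds, adc_sum)


codes = [encode_edep_2bit(v) for v in (None, 10, 50, 199, 200, 1000)]
```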

#### ★ LayerID: radial distance from the CDC center

- The signal distribution has a peak, but the background distribution does not.
- Cut hits in the innermost layer and the 3 outermost layers
  - Signal electrons rarely reach the outer layers





[Figure: wire of interest, and hit distribution versus layer ID (inner to outer)]

### **GBDT output distribution**

Hit classification by GBDT @ COTTRI FE



This hit classifier assigns a hit score to each wire hit.



## **Event Classification flow**



#### How to make event-classification input data







[Figure panels: X vs Y]































## **FPGA Programming Flow**



#### Key metrics for an FPGA implementation

#### 1. Latency

The total time required for a single iteration of the algorithm to complete

#### 2. Initiation interval

 The number of clock cycles required before the algorithm may accept a new input
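
The two timing metrics translate into physical numbers once a clock period is fixed. The sketch below assumes a 5 ns clock, which is only inferred from the quoted 0.170 µs for 34 cycles and is not stated explicitly in the slides; the function names are hypothetical.

```python
def latency_us(cycles, clock_period_ns):
    """Latency in microseconds for a given number of clock cycles."""
    return cycles * clock_period_ns / 1000.0


def max_input_rate_mhz(initiation_interval, clock_period_ns):
    """Highest rate (MHz) at which new inputs can be accepted.

    A new input is accepted every `initiation_interval` clock cycles.
    """
    return 1000.0 / (initiation_interval * clock_period_ns)


CLOCK_PERIOD_NS = 5.0  # assumption inferred from 0.170 us / 34 cycles

benchmark_latency = latency_us(34, CLOCK_PERIOD_NS)
pipelined_rate = max_input_rate_mhz(1, CLOCK_PERIOD_NS)
```

Latency has to fit inside the trigger latency budget, while the initiation interval sets how often a new event can be fed in; the two are independent in a pipelined design.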

#### 3. Resource usage

- BRAM: Block RAM
  - Hardened RAM resource
- DSP: Digital Signal Processor
  - Performs multiplication and other arithmetic in the FPGA
- FF: Flip-Flop
  - Registers data in time with the clock pulse
- LUT: Look-Up Table (logic)
  - Generic functions on small bit-width inputs

The limits on these metrics become design constraints.



\*Fast inference of deep neural networks in FPGAs for particle physics, arXiv:1804.06913v3 [physics.ins-det], 28 Jun 2018

### **Neural Network**



The number of multiplications between layers $m-1$ and $m$ is also $N_m \times N_{m-1}$. The total number of multiplications is

$$N_{\mathrm{multiplication}} = \sum_{m=1}^{M} N_{m-1} \times N_m$$

These multiplications are carried out by the DSPs.
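
The sum over layers can be evaluated with a few lines of Python. The layer sizes below are hypothetical and are not the network used in the MNIST test. The multiplier estimate assumes an idealized mapping of one physical multiplier per multiplication, shared `reuse_factor` times; a real synthesis result also depends on bit width and on how the tool maps operations to DSPs or LUTs.

```python
def count_multiplications(layer_sizes):
    """Total multiplications: sum over m of N_{m-1} * N_m.

    layer_sizes: [N_0, N_1, ..., N_M], input layer first.
    """
    return sum(n_prev * n for n_prev, n in zip(layer_sizes, layer_sizes[1:]))


def multipliers_needed(layer_sizes, reuse_factor=1):
    """Idealized number of physical multipliers when each is reused
    `reuse_factor` times per layer (rounded up per layer)."""
    total = 0
    for n_prev, n in zip(layer_sizes, layer_sizes[1:]):
        total += -(-(n_prev * n) // reuse_factor)  # ceiling division
    return total


# Hypothetical fully connected network: 16 -> 64 -> 32 -> 2
sizes = [16, 64, 32, 2]
n_mult = count_multiplications(sizes)
n_dsp_reuse2 = multipliers_needed(sizes, reuse_factor=2)
```

Doubling the reuse factor roughly halves the multiplier count in this idealized picture, at the cost of a longer latency.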