

# Status of continuous-readout DAQ development for J-PARC hadron experiments

#### Outline

- Introduction
- Front-end electronics
- Clock/command/timing distribution
  - CDCM transceiver
  - SPDT protocol
  - Test results
- Summary





## Ryotaro Honda (本多良太郎) for the J-PARC E50 experiment



# Introduction



Introduction, J-PARC E50 experiment

#### Motivation

• Reveal the effective degrees of freedom of the baryon internal structure, the di-quark correlation, by introducing a heavy (*c*) quark.

#### Strategy

- Missing-mass spectroscopy via the $\pi^- p \rightarrow D^{*-} Y_c^*$ reaction.
- Measure production cross sections and decay branching ratios simultaneously.

#### Experimental setup of the J-PARC E50 experiment





#### Secondary $\pi^-$ beam

- 20 GeV/c
- 30 MHz (60 M/spill)
  - (2s duration for slow extraction)

#### Target

• Liquid $\mathrm{H}_2$, 4 g/cm<sup>2</sup>

#### Reaction

- Charmed-baryon production: ~1 nb/sr
- Background reaction: 2.4 mb/sr
- Total reaction rate: 1.5 MHz
- Charged-particle multiplicity

Trigger-less data-streaming-type DAQ system

Schematic of the DAQ system

FairMQ + Redis

- Process monitoring and control via an in-memory database.
- Automatic topology generation



Total data rate: ~12 GB/s (25 GB/spill) (E50 case)

Clock/command/timing distribution



# Front-end electronics



#### AMANEQ



[Figure: AMANEQ board photo. Labels: small mezzanine for I/O extension; XC7K160T-2FFG; mezzanine slot (compatible with HUL); DDR3-SDRAM (DDR3-800, 2 Gb); NIM I/O; RJ45 (Belle II trigger compatible); main input (differential signals); DC 30-35 V]

A main electronics for network oriented trigger-less data acquisition system (AMANEQ)

- VME 6U size, but without a VME bus
  - An unpowered VME crate is used as a housing
- Kintex7 with speed grade -2
  - Transceiver bandwidth up to 10Gbps
  - Can implement SiTCP-XG
    - Throughput: 9.12 Gbps (96% of payload limit)
- Main input ports compatible with HUL
- Has two mezzanine slots
  - Compatible with HUL
    - Hadron Universal Logic module (J-PARC hadron)
  - Mount HUL mezzanine HR-TDC
  - Mount DCR mezzanine for DC readout
- Belle II trigger port (master clock)
- Has a jitter cleaner (CDCE62002)
- DDR3-SDRAM as a de-randomizer
  - DDR3-1333 with 16-bit bus width.
  - 2 Gb
- Powered by the external power supply with DC 30-35V
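The quoted 96% can be cross-checked with a short calculation. The sketch below assumes a standard 1500-byte MTU, IPv4 and TCP headers without options, and the usual Ethernet per-frame overhead; none of these settings are stated in the slides, so treat the result as a plausibility check rather than the authors' definition of the payload limit.

```python
# Assumptions (not stated in the slides): 10 Gbps Ethernet line rate,
# 1500-byte MTU, IPv4 and TCP headers without options.
LINE_RATE_GBPS = 10.0
MTU = 1500                       # IP packet size in bytes
IP_HEADER = 20                   # IPv4 header without options
TCP_HEADER = 20                  # TCP header without options
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, inter-frame gap


def tcp_payload_limit_gbps(line_rate=LINE_RATE_GBPS, mtu=MTU):
    """Maximum TCP payload rate for back-to-back full-size frames."""
    payload = mtu - IP_HEADER - TCP_HEADER
    wire_bytes = mtu + ETH_OVERHEAD
    return line_rate * payload / wire_bytes


limit = tcp_payload_limit_gbps()
measured = 9.12  # Gbps, from the slide
print(f"payload limit: {limit:.2f} Gbps, measured/limit: {measured / limit:.1%}")
```

Under these assumptions the payload limit is about 9.49 Gbps, so 9.12 Gbps corresponds to roughly 96% of it.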

# AMANEQ



Electronics System Group



## CIRASAME



CITIROC based multi-MPPC readout electronics for continuous timing measurement (CIRASAME)

- On-detector FEE for scintillating fiber trackers and RICH.
- Two 8x8 MPPC arrays are directly attached. (128 ch)
- Four CITIROC chips read them.
- Kintex7 with speed grade -2
  - Can implement 10G SiTCP (SiTCP-XG)
- Belle II link port (master clock)
  - Has a jitter cleaner to clean up the master clock
- DDR3-SDRAM as a de-randomizer
  - DDR3-1333 with 16-bit bus width.
- Powered by the external power supply with DC 35V

Digital part is the same as that of AMANEQ.



# MIKUMARI (Clock/command/timing distribution)





# Clock-duty-cycle-modulation (CDCM) based transceiver



#### Background

#### Requirements

- Low jitter (better than 20 ps)
- Synchronous timing/command/tag distribution
- Controllable phase of the recovered clock
- Distribute a clock over 100 m
- As few transmission lines as possible

#### **Usual solution**



It actually works well, but

- Strongly depends on FPGA built-in blocks.
- Needs dedicated electronics for distributing clock/data via serial transceivers.

#### Develop a clock/timing distribution system that is independent of serial transceivers. Develop a protocol to send data and a synchronous pulse over it.





#### Principle of CDCM

Adopting clock-duty-cycle-modulation (CDCM) as a core technology

- CDCM is a data-on-clock type modulation. (8b10b is a clock-on-data type)
- Data bits are embedded in the trailing edges of the clock signal.



Denis Calvet, IEEE TNS (Volume: 67, Issue: 8, Aug. 2020)
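The idea of carrying data on the trailing edge can be illustrated with a small sketch. The pattern table below is hypothetical: the slides do not give the actual CDCM-10-2.5 bit patterns, so the mapping from 2 data bits to the number of high bits in a 10-bit word is chosen only for illustration. The only properties taken from the slides are that the leading edge stays fixed, that 2 bits are carried per clock cycle, and that IDLE is a 50% duty pattern.

```python
# Hypothetical mapping for illustration only: the number of high bits in each
# 10-bit SerDes word encodes 2 data bits; 5 high bits (50% duty) means IDLE.
WORD_BITS = 10
HIGH_BITS_FOR_DATA = {0b00: 3, 0b01: 4, 0b10: 6, 0b11: 7}
DATA_FOR_HIGH_BITS = {v: k for k, v in HIGH_BITS_FOR_DATA.items()}
IDLE_HIGH_BITS = 5


def make_word(high_bits):
    """One clock period as a bit list: leading edge at index 0,
    trailing edge after `high_bits` samples."""
    return [1] * high_bits + [0] * (WORD_BITS - high_bits)


def encode_byte(value):
    """Split a byte into four 2-bit symbols (MSB first), one per clock cycle."""
    symbols = [(value >> shift) & 0b11 for shift in (6, 4, 2, 0)]
    return [make_word(HIGH_BITS_FOR_DATA[s]) for s in symbols]


def decode_words(words):
    """Recover the byte from the trailing-edge position of each word."""
    value = 0
    for word in words:
        value = (value << 2) | DATA_FOR_HIGH_BITS[sum(word)]
    return value


words = encode_byte(0xFD)
# Every word starts high and ends low, so the leading edge never moves.
assert all(w[0] == 1 and w[-1] == 0 for w in words)
assert decode_words(words) == 0xFD
```

Because every word starts high and ends low, a PLL that locks to the leading edge sees an unmodulated clock, while the receiver reads the data from where the trailing edge falls.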

#### Advantages

- This modulated clock can be directly input to PLLs and MMCMs in FPGA and external jitter cleaner ICs.
  - Because the leading edge is used by the phase detector to control VCO, but the trailing edge is not.
- Output clock skews from MMCMs with respect to the input modulated clock are automatically adjusted by using the global clock network in the FPGA.
  - Automatic phase alignment among front-end electronics.
  - Recovered clock by MMCM can give a phase reference for a clock from the external PLL, which does not have a zero-delay mode.





#### Principle of CDCM



For skew adjustment between slow and fast clocks, the global clock buffer is necessary.

• Maximum transferable frequency: 125 MHz (142 MHz) for speed grade -1 (-2) FPGAs, limited by the BUFG performance.



[1] D. Calvet, IEEE TNS (Volume: 67, Issue: 8, Aug. 2020)



MIKUMARI: named after Mikumari-no-kami (水分神), the deity who presides over the distribution of water.





#### SerDes based CDCM transceiver






#### SerDes based CDCM transceiver










# Synchronous pulse and data transmission (SPDT) protocol






#### **SPDTP** packet structure

| Magic<br>(0xFD) | Data length +<br>Instruction | Pulse timing | Reserve | User data  | Checksum | IDLE   |
|-----------------|------------------------------|--------------|---------|------------|----------|--------|
| 1 byte          | 1 byte                       | 2 bytes      | 1 byte  | 0-16 bytes | 2 bytes  | 1 byte |

- CDCM data rate: 2 bits per clock cycle (CDCM-10-2.5)
- The encoder/decoder needs 4 clock cycles to encode/decode 1 byte of data.
- SPDTP packet size: 8 bytes + user data (0-16 bytes)
  - 32-96 clock cycles are necessary to send/receive a packet.
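The cycle counts above follow directly from the field sizes in the packet table. The sketch below reproduces that arithmetic; the 125 MHz default clock used for the time estimate is an assumption taken as an example value, and the function names are illustrative.

```python
# Fixed fields from the packet table: magic, length+instruction,
# pulse timing, reserve, checksum, IDLE.
FIXED_BYTES = 1 + 1 + 2 + 1 + 2 + 1
CYCLES_PER_BYTE = 4      # 8 bits at 2 bits per clock cycle
MAX_USER_BYTES = 16


def packet_bytes(user_bytes):
    """Total SPDTP packet size for a given user-data length."""
    if not 0 <= user_bytes <= MAX_USER_BYTES:
        raise ValueError("user data must be 0-16 bytes")
    return FIXED_BYTES + user_bytes


def packet_cycles(user_bytes):
    """Clock cycles needed to send or receive one packet."""
    return packet_bytes(user_bytes) * CYCLES_PER_BYTE


def packet_time_ns(user_bytes, clock_mhz=125.0):
    """Transfer time in ns; the 125 MHz default is an example value."""
    return packet_cycles(user_bytes) * 1e3 / clock_mhz


for n in (0, 16):
    print(n, packet_bytes(n), packet_cycles(n), packet_time_ns(n))
```

With no user data the packet occupies 32 clock cycles, and with the full 16 bytes it occupies 96.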




\*\*\*IDLE: 50% duty-cycle pattern.



# Test results



#### Test configuration

Open source consortium of Instrumentation


Master

Slave







Oscilloscopes used

- Keysight DSOS054A (Analog BW: 2.1 GHz, 20 GSPS)
  - For NIM signal measurement
- Tektronix DPO 7254 (Analog BW: 2.5 GHz, 40 GSPS)
  - For LVDS measurement



#### Demonstration






40T packets were transferred in this demo. No packet drops or checksum mismatches occurred.

Pulse transfer by SPDT protocol



## Jitter performance of master clock

Jitter distribution of master clock



| Clock source | Std. dev.  | Total jitter (Tj)<br>@ BER 1e-12 |
|--------------|------------|----------------------------------|
| CDCE62002    | 5.0<br>5.0 | 110<br>77                        |
| MMCM         | 7.0<br>1.9 | 190<br>106                       |
| PLL          | 8.8        | 297<br>114                       |

Unit: ps. Oscilloscopes: DPO7254, DSOS054A.

|                | CDCM linkup | SPDT data<br>transmission |
|----------------|-------------|---------------------------|
| CDCE62002      | Succeed     | Succeed                   |
| MMCM<br>(FPGA) | Succeed     | Succeed                   |
| PLL<br>(FPGA)  | Succeed     | Fail                      |

**Linkup**: Completion of the IDELAY adjustment and bit slip for the SERDES and encoder/decoder.

Firmware using the PLL detects a broken modulated pattern soon after communication starts. Transmission and/or reception of the modulated pattern does not work well because of the large clock jitter.







| Condition | Master | Slave | Data transmitted | Edge-to-edge std. dev.<br>(master vs. recovered clock) | Recovered clock jitter<br>std. dev. | Recovered clock jitter<br>Tj @ BER 1E-12 |
|-----------|--------|-------|------------------|--------------------------------------------------------|-------------------------------------|------------------------------------------|
| 1         | CDCE   | CDCE  | SPDTP            | 12.5                                                   | 8.9                                 | 158<br>70                                |
| 4         | MMCM   | MMCM  | SPDTP            | 17.7                                                   | 15.5                                | 240<br>168                               |
| 5         | CDCE   | CDCE  | IDLE             | 11.4                                                   | 8.8                                 | 162<br>71                                |
| 8         | MMCM   | MMCM  | IDLE             | 11.9                                                   | 7.6                                 | 213<br>121                               |

Unit: ps



Edge-to-Edge measurement reflects the structure of the main peak of the jitter distribution.

In order to take into account the long tail structure, additional jitter measurement using the spectrum fitting is necessary.
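For orientation, one common way to express a total jitter at a given bit error rate is the dual-Dirac convention, in which a Gaussian random component is scaled by a BER-dependent factor and added to a bounded deterministic component. The slides do not state which model or fitting procedure was used, so the sketch below is only an assumption about how a std. dev. and a Tj @ BER 1e-12 can relate; the function names are illustrative.

```python
from statistics import NormalDist


def q_factor(ber):
    """Number of Gaussian sigmas whose one-sided tail probability equals `ber`."""
    return -NormalDist().inv_cdf(ber)


def total_jitter_ps(rj_sigma_ps, dj_pp_ps, ber=1e-12):
    """Dual-Dirac estimate: Tj = DJ(peak-to-peak) + 2 * Q(BER) * RJ(sigma).
    Assumed convention; not necessarily the one used for the tables."""
    return dj_pp_ps + 2.0 * q_factor(ber) * rj_sigma_ps


print(f"Q(1e-12) = {q_factor(1e-12):.3f}")
print(f"Tj for 5.0 ps purely random jitter: {total_jitter_ps(5.0, 0.0):.1f} ps")
```

Under this convention a purely Gaussian jitter of 5.0 ps corresponds to a total jitter of about 70 ps at BER 1e-12; any tail or deterministic component adds to that.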



# Jitter performance of recovered clock



#### Jitter distribution of recovered clock (CDCE62002 - CDCE62002)



#### Jitter distribution of recovered clock (MMCM - MMCM)



Data dependent jitter is suppressed in the case of CDCE62002.

With MMCM, sending the IDLE pattern provides better jitter performance.



# Synchronize FPGA HR-TDC



## Test setup (HR-TDC)






**Timing resolution of synchronized HR-TDC** 

# FPGA based HR-TDC



HUL/AMANEQ mezzanine HR-TDC



32 ch FPGA based HR-TDC

- Both leading/trailing edges
- Intrinsic resolution: 15 ps ( $\sigma$ )





Precise synchronization of the sampling clock (500 MHz) is the key.

# Possible clock path





## Test results

The resolution when two signals were input to the same HR-TDC.

• $20.6 \pm 0.1 \text{ ps } (\sigma) = \sqrt{2}\,\sigma_{\text{intrinsic}}$



The performance is sufficient for the J-PARC E50 experiment. Pattern 3 is preferred because the 125 MHz clock phase is aligned to that of the input clock (the recovered clock from CDCM).

$$\sigma = \sqrt{2\sigma_{intrinsic}^2 + 2\sigma_{random}^2}$$

$\sigma_{random} \sim 8\text{-}10 \text{ ps}$, consistent with the results so far.
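The random contribution can be extracted from the formula above by using the same-TDC result, where the 20.6 ps resolution corresponds to $\sqrt{2}\,\sigma_{intrinsic}$. The sketch below does this for measured resolutions in the 24-26 ps range quoted in Summary2; the function name and the choice of input values are illustrative.

```python
import math


def sigma_random_ps(sigma_measured_ps, sigma_same_tdc_ps=20.6):
    """Solve sigma^2 = 2*sigma_intrinsic^2 + 2*sigma_random^2 for sigma_random,
    with 2*sigma_intrinsic^2 taken from the same-TDC resolution squared."""
    excess = sigma_measured_ps ** 2 - sigma_same_tdc_ps ** 2
    if excess < 0:
        raise ValueError("measured resolution is below the same-TDC resolution")
    return math.sqrt(excess / 2.0)


for measured in (24.0, 25.0):
    print(f"sigma = {measured} ps -> sigma_random = {sigma_random_ps(measured):.1f} ps")
```

For a measured resolution of 24 ps this gives a random contribution of roughly 9 ps, and for 25 ps about 10 ps.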




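- Clock/timing distribution based on CDCM should be a good tool for experiments that use SiTCP.
- Systems that already use GT transceivers for both data readout and synchronization gain no benefit.
- The key point is that the modulated clock is exchanged through general-purpose FPGA I/O with IOSERDES, so the signal standard does not matter. For short distances, a metal cable should also work.
- The achievable clock speed will probably be lower, but the same can be done with single-ended signals.
- With an extension board such as an AMANEQ mezzanine, MIKUMARI can be implemented easily.
- The only point to note is that the I/O speed differs among 7-series families and between HP and HR banks.
- The SFP modules used here are optical modules for 1000BASE-SR from FS.com.
- SFF-8431 specifies CML for the SFP+ TX and RX signal lines, but the differential amplitude (V<sub>OD</sub>) may range from 400 to 2000 mV. When measured, the FS.com module gives V<sub>OD</sub> of about 800 mV, which appears to be an ordinary CML signal.
- A proper CML-LVDS conversion is used here, but RX can also be connected directly to the FPGA (only V<sub>OD</sub> needs to be checked).
   SFP modules are normally AC-coupled, so the FPGA side has to set the input common-mode voltage by re-biasing or by using DCI.
- On the TX side the LVDS swing is small, so converting to CML with an IC is the safer choice.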



### Summary1

- The trigger-less data-streaming-type DAQ system is introduced in the J-PARC E50 experiment and is shared among the other experiments in the high-p beam line.
  - The total expected data rate is 12 GB/s for the E50 case.
- A main electronics for network oriented trigger-less data acquisition system (AMANEQ) was developed.
  - Two mezzanine slots, which are compatible with HUL, for function extension.
  - Two data links with speeds up to 10 Gbps. Realized 10 Gbps TCP communication by SiTCP-XG.
  - 2 Gb DDR3-SDRAM.
- The obtained throughput of SiTCP-XG was 9.12 Gbps, 96% of the TCP payload limit.
- CIRASAME was developed for readout for the scintillating fiber trackers and RICH detectors.
  - It has four CITIROC1A ASICs.
  - Digital part is the same as that of AMANEQ.
- Heartbeat method was developed for the continuous timing measurement without any external timing reference signal.




### Summary2

- AMANEQ will be used for the clock/timing distribution in the J-PARC experiments.
- In order to transmit the clock and data using a full-duplex optical transceiver, clock-duty-cycle-modulation was adopted.
  - The modulated clock can be directly fed into PLL.
- SerDes based CDCM transceiver and the synchronous pulse and data transmission (SPDT) protocol were developed and implemented in FPGA on AMANEQ.
  - The maximum clock frequency that can be transferred by this transceiver is 125 and 142 MHz for speed grade -1 and -2 FPGAs, respectively.
- The jitter performance of CDCE62002 is better than that of MMCM in FPGA.
  - 5.0 ps in std. dev., ~110 ps peak-to-peak (eye measurement).
- The PLL in the FPGA shows a long tail in the jitter distribution, and the clock from the PLL cannot drive the CDCM transceiver correctly.
- The two AMANEQs are synchronized within 12-13 ps in std. dev.
- Basically, sending the IDLE pattern except when the trigger is transmitted provides better jitter performance.
- Timing resolution between two FPGA based HR-TDCs synchronized by MIKUMARI is 24-26 ps ( $\sigma$ ).











#### **Requirement from E50**

Momentum analysis is essential to reduce the trigger rate to an acceptable level (~10 kHz).

#### Other physics program at this beam line

- Λp scattering
- Ξ\* spectroscopy
- Pion-induced Drell-Yan
- I=3 dibaryon search

The required trigger conditions are different.



The DAQ system must be flexible and scalable. Omit the hardware (FPGA) based trigger, and introduce the trigger-less data-streaming-type DAQ system.



#### Data transfer speed via SiTCP-XG



#### Test setup




## Throughput of DDR3-SDRAM



#### Firmware configuration



Data bus of SDRAM is bi-directional. Memory operation is determined by command.

#### Tested write/read pattern.






Obtained throughput (reference value)

- DDR3-800: ~4.8 Gbps (6.4 Gbps)
- DDR3-1333: ~7.9 Gbps (10.66 Gbps)

\*\*\*Access to the same memory bank. The overhead is larger when the bank address changes.
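The reference values can be compared with the raw bus bandwidth of a 16-bit DDR3 interface. The sketch below assumes that the reference value is half of the peak bus rate, on the grounds that the bi-directional data bus is shared between writing and reading the same data in a de-randomizer; this interpretation is not stated in the slides, although it reproduces the quoted numbers.

```python
def peak_bus_gbps(mega_transfers_per_s, bus_width_bits=16):
    """Raw data-bus bandwidth of the SDRAM interface in Gbps."""
    return mega_transfers_per_s * bus_width_bits / 1000.0


# name: (transfer rate in MT/s, measured Gbps, reference Gbps), from the slide
RESULTS = {
    "DDR3-800": (800, 4.8, 6.4),
    "DDR3-1333": (1333, 7.9, 10.66),
}

for name, (rate, measured, reference) in RESULTS.items():
    peak = peak_bus_gbps(rate)
    print(f"{name}: peak {peak:.2f} Gbps, reference/peak {reference / peak:.2f}, "
          f"measured/reference {measured / reference:.2f}")
```

In both configurations the measured throughput is about three quarters of the reference value.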

## CIRASAME







# Heartbeat method



Heartbeat method for the continuous timing measurement

We need the continuous timing measurement over 2 s (spill duration of J-PARC slow extraction)
Required dynamic range: ~10<sup>10</sup> (1 ns TDC case)

Introduce heartbeat method: a technique to reconstruct the time without a long-length time stamp.
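The slides do not describe the heartbeat method in detail. As a generic illustration of the idea, the sketch below assumes that periodic heartbeat markers are inserted into the data stream and that each hit carries only a short time stamp counted from the latest heartbeat. The period, the stream format and all names are hypothetical.

```python
HEARTBEAT_PERIOD_TICKS = 2 ** 16  # hypothetical heartbeat period, in TDC ticks


def reconstruct(stream):
    """Rebuild full time stamps from ('heartbeat', None) and ('hit', local_ticks)
    items, where local_ticks counts from the most recent heartbeat."""
    heartbeat_count = -1  # hits before the first heartbeat have no reference
    times = []
    for kind, local_ticks in stream:
        if kind == "heartbeat":
            heartbeat_count += 1
        elif kind == "hit" and heartbeat_count >= 0:
            if not 0 <= local_ticks < HEARTBEAT_PERIOD_TICKS:
                raise ValueError("local time stamp exceeds the heartbeat period")
            times.append(heartbeat_count * HEARTBEAT_PERIOD_TICKS + local_ticks)
    return times


example = [("heartbeat", None), ("hit", 5), ("heartbeat", None), ("hit", 7)]
print(reconstruct(example))
```

In this picture each hit only needs enough bits to cover one heartbeat period, and the long-range part of the time stamp is recovered by counting heartbeats in the stream.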



