Peer-to-peer streaming - Teledyne SP Devices


Learn how peer-to-peer (P2P) streaming can significantly improve the performance of a digitizer system. P2P streaming bypasses the host PC by enabling direct data transfers from the digitizer to graphics processing units (GPUs) or to storage. This is a major advantage over conventional solutions, which require data to be copied via the RAM of the host PC. With peer-to-peer, both the CPU and RAM are instead free for other tasks. Watch the videos above and read more below.

High-performance digitizers that combine high resolution with a high sampling rate produce massive amounts of data. For example, the ADQ7 combines 14-bit resolution with a 10 GSPS sampling rate, resulting in 20 Gbyte of data per second (which corresponds to two bytes per sample). This exceeds the capacity of the data link (interface) to the host PC, and data reduction is therefore crucial (fig. 1). Onboard field-programmable gate arrays (FPGAs) help address this problem. These powerful computational resources enable real-time signal processing that reduces the data rate so that it matches the link capacity.

Figure 1. The onboard FPGA helps reduce the data rate so that it matches the link capacity without loss of signal information.
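
The arithmetic behind the 20 Gbyte/s figure can be sketched as follows. This is a minimal illustration, assuming that each 14-bit sample is padded to whole bytes (which is what the stated numbers imply). The function names and the 5 Gbyte/s link capacity are hypothetical and chosen only for the example; they are not specifications of any particular interface.

```python
import math

def raw_data_rate_bytes(sample_rate_sps: float, bits_per_sample: int) -> float:
    """Raw output rate in bytes/s, assuming samples are padded to whole bytes."""
    bytes_per_sample = math.ceil(bits_per_sample / 8)  # 14 bits -> 2 bytes
    return sample_rate_sps * bytes_per_sample

def required_reduction(raw_rate: float, link_capacity: float) -> float:
    """Factor by which the data rate must shrink to fit the link (at least 1)."""
    return max(1.0, raw_rate / link_capacity)

adq7_rate = raw_data_rate_bytes(10e9, 14)   # 10 GSPS, 14-bit samples
print(adq7_rate / 1e9)                      # 20.0 (Gbyte/s)

# Hypothetical link capacity, an assumption for illustration only.
link_capacity = 5e9
print(required_reduction(adq7_rate, link_capacity))  # 4.0
```

Any data reduction scheme has to reach at least this factor for the stream to fit the link.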

The data reduction can be achieved in many ways, for example:

Triggered acquisition of a user-defined number of consecutive samples (so-called records). Data reduction is achieved by only transferring the records while the rest of the data is discarded. This functionality is supported by our firmware options FWDAQ, FWPD, FW2DDC and FWSDR.
In frequency-domain applications, digital down-conversion, which combines filtering and decimation, is commonly used to achieve data reduction. This is supported by FWSDR (for ADQ14) and FW2DDC (for ADQ7).
Application-specific data reduction based on known information about the acquired signal. Examples include real-time averaging of known repetitive signals and extracting signal characteristics from time-domain pulses. This is supported by the firmware options above as well as FWATD.
Custom data reduction can be implemented using the firmware development kit. This offers full flexibility and can either be implemented by the customer or by utilizing design services offered to OEM customers.
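
The first method in the list, triggered acquisition of records, can be illustrated in software. The sketch below is not the firmware implementation (the real processing runs in the FPGA); it is a simplified model assuming a rising-edge level trigger, and the function name, threshold and record length are hypothetical.

```python
from typing import List, Sequence

def extract_records(samples: Sequence[int], threshold: int,
                    record_length: int) -> List[List[int]]:
    """Keep fixed-length records starting at each rising-edge trigger.

    A trigger fires when a sample reaches `threshold` after the previous
    sample was below it. All samples outside the records are discarded.
    """
    records = []
    i = 1
    while i < len(samples):
        rising = samples[i - 1] < threshold <= samples[i]
        if rising and i + record_length <= len(samples):
            records.append(list(samples[i:i + record_length]))
            i += record_length  # do not re-trigger inside a record
        else:
            i += 1
    return records

stream = [0, 0, 9, 7, 3, 0, 0, 0, 0, 0, 8, 6, 2, 0, 0, 0, 0, 0, 0, 0]
records = extract_records(stream, threshold=5, record_length=3)
print(records)  # [[9, 7, 3], [8, 6, 2]]

kept = sum(len(r) for r in records)
print(f"kept {kept} of {len(stream)} samples")  # kept 6 of 20 samples
```

The reduction factor depends on how often triggers occur and how long each record is relative to the gap between events.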

Massive data reduction can be achieved via FPGA pre-processing. One example is real-time waveform averaging on ADQ7 using FWATD. This combination has been used by, for example, mass spectrometry customers to reduce the output rate from 20 Gbyte/s to 40 Mbyte/s, a 500-fold reduction without loss of signal properties/characteristics!
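
To make the idea concrete, the sketch below averages a set of equally long records sample by sample and checks the quoted reduction factor. It is a software illustration only and not a description of how FWATD works internally; the function names and sample values are hypothetical.

```python
from typing import List, Sequence

def average_records(records: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point average of equally long records of a repetitive signal."""
    if not records:
        raise ValueError("need at least one record")
    length = len(records[0])
    if any(len(r) != length for r in records):
        raise ValueError("all records must have the same length")
    sums = [0.0] * length
    for record in records:
        for k, sample in enumerate(record):
            sums[k] += sample
    return [s / len(records) for s in sums]

def reduction_factor(input_rate: float, output_rate: float) -> float:
    """Ratio between the input and output data rates."""
    return input_rate / output_rate

# Three repetitions of the same pulse with small variations (made-up values).
records = [[1.0, 5.0, 2.0], [3.0, 7.0, 2.0], [2.0, 6.0, 2.0]]
print(average_records(records))       # [2.0, 6.0, 2.0]

# 20 Gbyte/s in, 40 Mbyte/s out, as quoted in the text.
print(reduction_factor(20e9, 40e6))   # 500.0
```

Many input records collapse into a single output record, which is where the reduction comes from.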

FPGA pre-processing therefore allows for maximum flexibility in the mechanical design. Form factors such as USB 3.0, with seemingly limiting data transfer rates of a few hundred Mbyte/s, can still be fully utilized thanks to the data reduction. This in turn offers additional benefits, such as the ability to locate the digitizer close to the detector in order to minimize reflections.

Peer-to-peer streaming means that the data is sent to a computational node (for example a graphics processing unit (GPU) or disk storage) with little or no involvement of the central processing unit (CPU) or dynamic random-access memory (DRAM) on the host PC. There are three levels of data transfer:

In a conventional setup, each piece of hardware in the system has a separate driver which connects to the user’s application. The drivers are assigned separate memory spaces in PC DRAM, and therefore data transfer between the hardware units requires significant copying. This results in a heavy load on both CPU and DRAM in the host PC. For example, streaming data at 14 Gbyte/s results in a 56 Gbyte/s load on the PC DRAM (fig. 2).

Figure 2. Conventional streaming involves writing digitizer data to memory segment S1 in the PC's DRAM (arrows marked "a"), reading from segment S1 to CPU (arrow "b"), writing to segment S2 (arrow "c") and reading from S2 to GPU DRAM (arrows "d"). In total, this is 4 read/write operations, which require 4 x 14 = 56 Gbyte/s of DRAM bandwidth.

One way of improving the performance is to share the memory between the drivers. This method is called pinned buffer, and it effectively halves the required copying so that, for example, streaming at 14 Gbyte/s results in a 28 Gbyte/s load on the PC DRAM (fig. 3). This method can be adequate provided that the PC has sufficient memory bandwidth. However, a drawback is that all the hardware drivers (including third-party drivers) need to support shared memory, and that is not always the case.

Figure 3. With pinned buffer there is only a single memory segment in PC DRAM (denoted "S" above). This solution requires only one write and one read operation and hence 2 x 14 = 28 Gbyte/s of DRAM bandwidth.

The best performance is achieved using peer-to-peer (P2P) streaming. With this method, the data is sent directly between the digitizer and endpoints via a PCIe switch (or root complex) with little or no involvement of the CPU or DRAM on the host PC (fig. 4). This significantly reduces the workload on the CPU and DRAM. P2P streaming is currently supported on both Windows and Linux for ADQ7 and ADQ3-series, and on Windows only for ADQ14.

Figure 4. Best performance is achieved using peer-to-peer streaming.
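
The host DRAM load for the three levels of data transfer can be compared with a small model. This is a minimal sketch based on the operation counts given in the figure captions. The zero for peer-to-peer is an idealization of "little or no" involvement, and the dictionary and function names are hypothetical.

```python
# Host DRAM read/write operations per streamed byte for each method.
DRAM_ACCESSES_PER_BYTE = {
    "conventional": 4,   # write S1, read S1, write S2, read S2 (fig. 2)
    "pinned_buffer": 2,  # one write and one read of shared segment S (fig. 3)
    "peer_to_peer": 0,   # idealized: data bypasses host DRAM (fig. 4)
}

def host_dram_load(stream_rate_gbyte_s: float, method: str) -> float:
    """Host DRAM bandwidth (Gbyte/s) consumed when streaming at the given rate."""
    return stream_rate_gbyte_s * DRAM_ACCESSES_PER_BYTE[method]

for method in DRAM_ACCESSES_PER_BYTE:
    print(method, host_dram_load(14, method))
# conventional 56
# pinned_buffer 28
# peer_to_peer 0
```

At the same 14 Gbyte/s stream rate, the conventional path therefore needs twice the DRAM bandwidth of the pinned buffer, while peer-to-peer leaves the host DRAM essentially free.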

One advantage of PXIe is that it supports a large number of hardware units in a single chassis. Multiple high-speed data streams can be set up via the backplane, for example between several ADQ7 units and ADQDSU disk storage units. This ensures optimal performance without loading the PXIe controller (host PC).

Additional resources and downloads

Peer-to-peer streaming is currently supported on devices ADQ7DC, ADQ7WB, ADQ14, and ADQ3-Series. Please contact us if you want to know more about which GPU models from AMD and Nvidia are supported and under which operating systems.

Below is a list of relevant documentation grouped by type.