Latency Matters in High-Speed Imaging – But FPGAs Can Help

In high-speed imaging applications that require real-time feedback, latency cannot be tolerated, especially in mission-critical systems. In this video, Ray Hoare from Concurrent EDA explains what latency is and why it matters in systems that use high-speed or high-data-rate cameras. He also explains how placing an FPGA near the image sensor allows the system to handle the processing requirements at full frame rate and in real time.


Ray Hoare:
Welcome, Ray Hoare with Concurrent EDA. I want to give you a quick take on why latency matters, and we're going to look at two cameras in particular. Who are we? We're Concurrent EDA. We've been in business 19 years. We are an AMD Embedded partner at the elite certified level, which means we're a bunch of nerds, so that's good. I'm a nerd. One of the things that we do is we take algorithms and move them into electronics at 40 gigabits per second or faster. FPGAs can be found inside of cameras, inside of frame grabbers, or in embedded systems. We've done a variety of algorithms that we've put into these devices. But I want to explain one thing about latency. People talk about bandwidth. You hear 1 gig, 10 gig, 25 gig, 100 gig. But what is this latency, and why does it matter? So, I want to look at two different cameras. One camera is a high-speed camera. That's our GigaSens 2 megapixel camera. It's a smart camera with an FPGA inside, and it can do 2200 frames per second at full resolution, or, if you only need a small region of interest, over 200,000 frames per second. The nice thing about this camera is that FPGA inside of it, where we can do some processing at full frame rate. Now, on the other side of the equation, from SVS-Vistek as an example, is a 245 megapixel camera that's doing 12 frames per second.

Ray Hoare:
Also, 40 gigabits per second. So, both of these cameras are pushing 40 or 50 gigabits per second. Lots of pixels coming out. How do you process it? Why do you process it, and why does latency matter? In this example I'm going to highlight the Euresys frame grabber, which has the ability to have custom logic put into the FPGA on the card itself. So, just like the camera, you can put the processing inside an FPGA right next to the sensor. In this case it's not in the camera itself, but in the frame grabber, and so the frame grabber can do the processing. So, these are two very different cameras--high speed and low resolution versus crazy high resolution at 10 or 12 frames per second--but both are generating 40 gigabits per second of data. So, here is an example that we did a while back at Photonics West. I'm going to pause the video here. We have a laser beam on this cute little demo pointing over at a target. You can see it's right there, and it bounces around. Now, it's just a toy--we're just putting a laser in there. So, can we track it? How do we track it? And can we do this in real time? In this example, we have it in the frame grabber, and you can see that laser dot in there, and we put a bounding box around it. That was all done inside the FPGA on the frame grabber at full rate, and I'll just let that play for a little bit so you can see it.
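For readers who want to see the idea in code, here is a minimal software sketch of the laser-spot tracking described above. The actual demo ran as custom logic in the frame grabber's FPGA, so this NumPy version only illustrates the concept, and the brightness threshold is a placeholder assumption, not a value from the demo.

```python
# Minimal software sketch of the laser-tracking idea: threshold the bright
# laser spot and draw a bounding box. The real demo did this in FPGA logic
# as the pixels streamed in; this version works on a whole frame in memory.
import numpy as np

def track_laser(frame: np.ndarray, threshold: int = 900):
    """Return a bounding box (x0, y0, x1, y1) around the brightest region.

    `frame` is a 2-D array of 10-bit pixel values (0..1023).
    The threshold of 900 is an illustrative assumption.
    """
    mask = frame >= threshold          # pixels bright enough to be the laser spot
    if not mask.any():                 # no spot found: fall back to the global maximum
        y, x = np.unravel_index(np.argmax(frame), frame.shape)
        return x, y, x, y
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def draw_box(frame: np.ndarray, box, value: int = 1023) -> None:
    """Overlay the bounding box on the frame (in place), as in the demo video."""
    x0, y0, x1, y1 = box
    frame[y0, x0:x1 + 1] = value
    frame[y1, x0:x1 + 1] = value
    frame[y0:y1 + 1, x0] = value
    frame[y0:y1 + 1, x1] = value
```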

Ray Hoare:
And you can see that dancing laser over here, and it being displayed here. The computer is doing nothing in this case. That processing is being done on the frame grabber as the data comes through. So, let's talk about that for a second. What does that sort of data processing look like, from the photon to the control? The image sensor is the thing that grabs the photons, turns the photons into voltages, turns those into digital signals, and turns that into an image. That's done inside the image sensor. And there's always a camera FPGA--or maybe an ASIC in some cases, but usually it's an FPGA--talking to the image sensor, doing some preprocessing, getting that image a little bit better, and sending it down the cables. The frame grabber receives the sequences of pixels and assembles a frame. That's why it's called a frame grabber: it grabs that whole frame and moves it over the PCIe bus. This part on the right-hand side is all inside the computer. The frame moves over the PCIe bus into DRAM, the system memory, where the CPU or the GPU does the processing. So, what we have is that whole frame has to get buffered, then moved into DRAM, and then processed by the CPU.

Ray Hoare:
And then if it wants to do control, it sends something back out the PCIe bus to an I/O card. Now, this I/O card can be the same card as the frame grabber, but you're still traversing the PCIe bus twice. What this means is that I'm into milliseconds, tens of milliseconds, even hundreds of milliseconds of latency. So, if I'm thinking about controlling something, the photon hits the sensor and I go through this whole processing pipe to the I/O. That delay, that latency, means you can only control something so fast. If you're trying to control a laser, do additive manufacturing, or do motor control--anything where you're visually seeing something and controlling something--that latency really matters. All right. So, let's say we do the fancy camera, and we use the FPGA inside of the camera itself to do some of that processing. Well, why do we care? So what? Why not just put it in the PC? Well, in this example, we have the image sensor and the FPGA. Not only does the FPGA help turn that stream of pixels into a frame, but as that stream comes into the FPGA, we do processing. And that processing is where we're grabbing the pixels and computing some value that we're trying to extract. In this example, we used a laser beam, and we're trying to follow the laser, in which case we process the image.
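To make the "latency adds up" point concrete, here is a rough, purely illustrative model of the conventional path. Only the 518-microsecond frame-buffering figure comes from the talk; every other stage latency below is an assumed placeholder, not a measured number.

```python
# Rough model of where latency accumulates on the conventional path
# (sensor -> frame grabber -> PCIe -> DRAM -> CPU/GPU -> PCIe -> I/O card).
# All values except the first are illustrative assumptions; the point is
# that the stages add up to milliseconds before a control signal goes out.
STAGES_US = {
    "buffer full frame in grabber": 518,    # 2 MP x 10 bit at 40 Gb/s (from the talk)
    "DMA over PCIe into DRAM":      500,    # assumed
    "OS / driver / scheduling":     1000,   # assumed
    "CPU or GPU processing":        5000,   # assumed, algorithm dependent
    "PCIe back out to the I/O card": 500,   # assumed
}

total_us = sum(STAGES_US.values())
print(f"end-to-end latency ~ {total_us / 1000:.1f} ms")   # ~7.5 ms in this sketch
```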

Ray Hoare:
We found the high-intensity part of the image, and we found its coordinates. Then we said, oh, let's put our bounding box on that image. And we did that in-line as the pixels came in and filled up the frame buffer. So, we didn't have to wait until the whole image came in, and therefore our latency, our delay, went down to something really, really small. This is really useful when we're doing laser tracking, or additive manufacturing, where you're using a laser to melt the powder. It creates a little melt pool, and you're building the part up. Now, you don't want that melt pool to be too big, because then it just goes everywhere, and you don't want it to be too small, because then you haven't melted the powder. The same goes for welding control or other control applications you might want to do with a camera. So, that idea of latency from the image sensor to the control--it's great to have it in hardware. If you put it in software, well, then you also have an operating system involved, and other things going on. And if there are multiple steps in there, the latency adds, and adds, and adds. GPUs are great, but they also add latency. So, some math just to satisfy my inner nerd. If I look at the three cases here, the top one is where I'm doing it all in software on the CPU or GPU, and I'm going to look at these two cameras.
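Here is a small Python sketch of the "process as the rows stream in" idea, assuming a five-row line buffer and a stand-in 5x5 mean filter; the real implementation is FPGA logic, and the kernel would be whatever the application actually needs.

```python
# Sketch of streaming processing: keep only the last five rows in a line
# buffer and run a 5x5 operation as soon as each new row arrives, instead
# of waiting for the whole frame. This mirrors what on-chip line buffers
# do in the FPGA; the 5x5 mean filter is just a placeholder kernel.
from collections import deque
import numpy as np

def stream_rows(rows, kernel_size: int = 5):
    """`rows` yields one image row (1-D array) at a time, in raster order.

    Yields (row_index, result_row) as soon as enough rows have arrived,
    so processing overlaps with readout instead of following it.
    """
    window = deque(maxlen=kernel_size)        # the 5-row line buffer
    for i, row in enumerate(rows):
        window.append(np.asarray(row, dtype=np.float32))
        if len(window) == kernel_size:
            block = np.stack(window)          # 5 x W block of the newest rows
            w = block.shape[1]
            # 5x5 mean over each valid column position (stand-in for the
            # real convolution or detection kernel)
            result = np.array([block[:, j:j + kernel_size].mean()
                               for j in range(w - kernel_size + 1)])
            yield i, result
```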

Ray Hoare:
One is my 2 megapixel camera and the other is my 245 megapixel camera. If I do it all in software, it takes a while, and that GPU then has to process all 245 megapixels. That's a lot of processing, especially if you want to do a 5x5 convolution or run some AI on it. So, where do you put that? What we're saying is, hey, you can put that inside the frame grabber logic. The photon comes in, and maybe the processing isn't in the camera itself--maybe you put it in the frame grabber, and the processing is done as the data streams into the frame grabber. Then your CPU or GPU can do the more advanced AI. Maybe you want an ROI, maybe you want to do image improvement, flat field correction, color correction--all of this front-end processing--maybe extract the region of interest or gather statistics from it, and then have your CPU or GPU do the control applications, or flip the result right back out the door for your control application.
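The talk only names flat field correction as one candidate front-end step. For reference, here is the standard two-reference (dark frame plus flat frame) formula; a frame grabber implementation would typically precompute the per-pixel gain and apply it as pixels stream in.

```python
# Standard flat-field correction: corrected = (raw - dark) * mean(flat - dark) / (flat - dark)
# This is the textbook formula, shown here only to illustrate the kind of
# per-pixel front-end work that can live next to the sensor.
import numpy as np

def flat_field_correct(raw: np.ndarray, dark: np.ndarray, flat: np.ndarray) -> np.ndarray:
    gain = (flat - dark).astype(np.float32)
    gain[gain == 0] = 1.0                      # avoid divide-by-zero on dead pixels
    return (raw - dark).astype(np.float32) * gain.mean() / gain
```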

Ray Hoare:
And this bottom one is when we're in the smart camera. So, in this math--and I won't go too long, but I wanted to show all the calculations--I'm looking at the two cameras, horizontal and vertical resolution, and at two different pieces: one where I'm looking at the whole frame, and the next line where I'm only looking at five rows at a time. Let's look at how much time it takes to buffer up the whole frame versus five rows. And I picked five rows so you can do a 5x5 convolution, which is common in image processing. If I take 10 bits per pixel, I get about 20 million bits--2 megapixels times 10 bits--so it's roughly 20 megabits per frame. But if I'm only looking at five rows, the amount of data that comes through is much smaller. If I look at the speed at which I'm doing the transmission, it's 40 gigabits per second, and whether that 40 gigabits is going into the frame grabber or coming right off of that FPGA, it's the same 40 gigabits. That's our rate. How much latency? How much time does it take to grab that frame? In the case of an entire frame, we're at 518 microseconds, or about 0.5 milliseconds. However, if I'm only looking at five rows, I'm down at about two microseconds. So, in this case I'm starting the processing as the data comes in. As the data flows into the FPGA, we can do that processing, and as the last set of rows comes in, I'm still doing that processing. By the time the last pixel comes in, a short time later--about 2 microseconds--I'm done processing, if you can do it in the FPGA. So, your frame time and your compute time overlap, and that's beneficial if you want to do control. We can never get away from how much time it takes to get the pixels off the imager, but we can do processing as the pixels come off the imager. All right.
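The arithmetic above can be reproduced in a few lines. This sketch assumes a 1920 x 1080 sensor at 10 bits per pixel over a 40 Gb/s link, which is consistent with the 518-microsecond figure quoted in the talk; the five-row time works out to about 2.4 microseconds, which Ray rounds to 2.

```python
# Transfer time for a full frame versus five rows across a 40 Gb/s link.
# The 1920 x 1080 resolution is an assumption consistent with the quoted
# 518 us full-frame figure; 10 bits per pixel is from the talk.
LINK_GBPS = 40
BITS_PER_PIXEL = 10
H, V = 1920, 1080

def transfer_time_us(pixels: int) -> float:
    return pixels * BITS_PER_PIXEL / (LINK_GBPS * 1e9) * 1e6

print(f"full frame: {transfer_time_us(H * V):.0f} us")   # ~518 us
print(f"five rows:  {transfer_time_us(H * 5):.1f} us")   # ~2.4 us
```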

Ray Hoare:
Now let's look at the beefy camera, this beautiful 245 megapixel camera from SVS-Vistek. This is the SHR 811 with a CXP interface on it. It's about 19,000 pixels horizontally by 12,000 vertically. This is a beast--a wonderful camera. At 10 bits per pixel, we have a lot of bits in there, so we do the calculation. If we look at how long it takes a whole frame to come out, we're at 61 milliseconds, or 0.06 seconds. That's a fair amount of time, but it's a lot of pixels. However, if I look at just one horizontal band of 19,200 pixels times five rows, the amount of time it takes to come out is only 24 microseconds, which is really, really small. So, if we put that processing in the frame grabber logic, that latency is really small. By the time the whole frame has come into the frame grabber, 24 microseconds later, we're done. So, really, your frame time plus a small delta is your control time.
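The same arithmetic applies to the 245 megapixel camera. This sketch assumes roughly 19,200 x 12,800 pixels (about 245.8 MP) at 10 bits per pixel over the same 40 Gb/s link, which reproduces the 61 millisecond and 24 microsecond figures quoted above; the exact sensor resolution may differ slightly.

```python
# Full-frame versus five-row transfer time for the 245 MP camera.
# 19,200 x 12,800 is an assumed resolution consistent with the quoted numbers.
LINK_GBPS = 40
BITS_PER_PIXEL = 10
H, V = 19_200, 12_800

frame_ms = H * V * BITS_PER_PIXEL / (LINK_GBPS * 1e9) * 1e3
rows5_us = H * 5 * BITS_PER_PIXEL / (LINK_GBPS * 1e9) * 1e6
print(f"full frame: {frame_ms:.1f} ms")   # ~61.4 ms
print(f"five rows:  {rows5_us:.0f} us")   # 24 us
```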

Ray Hoare:
In this case, if you're doing control and you want to do processing, the key is that if you can put it in the FPGA, then you can get really fast frame rates and very low latency without modifying or loading the CPU. If instead we say, well, I don't want to do that, I'm going to use a CPU, and I take only three of these frame buffers' worth of latency, I'm at 184 milliseconds. So, almost 0.2 seconds before I've even done anything with the data, or maybe have touched it once. Now you want to do processing on top of that, and your control times are in fractions of seconds to seconds, so you can't really do any high-speed control. But if we do it in the frame grabber, I can do lots and lots of processing. So, latency matters. Processing matters. But if you can put it in the FPGA, then you're in good shape. Now, we obviously are FPGA nerds, so that's what we do. For the past 19 years we've been taking algorithms and putting them into FPGAs for a whole host of applications--in this case we talked about putting it in the camera or in the frame grabber. We also have other boards that have versatile FPGAs with Ryzen CPUs on one Mini-ITX, or a bunch of FPGA modules. If you have any questions, let me know. We're here to help. Thanks for listening. Bye.

Distributed by Concurrent EDA, LLC

Pricing, Availability and Ordering

Email us or contact us using the web form below!