Faster video processing

My Python skills are definitely lacking!

My DiddyBorg is fitted with a USB web camera mounted on a servo which enables it to look around, use OpenCV (cv2) to find a target across the room (a bulb in a jam jar wrapped with blue film at the moment) and direct Diddy towards it. It needs to do this several times as Diddy is also avoiding obstacles on the route using ultrasound sensors and needs to find the target after each avoidance maneuver. There is also a static CSI camera which is used, again using cv2, to identify animal shapes on cards around the course.

Although I have got this all working, it is a very slow process. Because PiCamera will not work with the web camera, I am using cv2 to capture the images for processing, as in cap = cv2.VideoCapture(0) ..... success, frame = cap.read().

I have been studying your code in diddyJoyBall, which uses several threads running at the same time to build a pool of images ready for processing, which I am guessing speeds up the processing. I believe it is based on the code for rapid capture and processing in the PiCamera documentation? Most of the code is about the threading aspects and I have been trying to write code which does the same thing using the images I get from cv2 rather than the images from "camera.capture_sequence(.....)" used in diddyJoyBall but, to be blunt, I am finding it very difficult.

I wondered if you have any ideas on how I can achieve this? I would be most grateful for a few pointers.

piborg:

If you are using either a Raspberry Pi B+, 2, or 3 then threading is definitely the way to go.
Without threading the script can only really make use of one of the four cores.
This means in an ideal situation the processing could run up to four times as fast :)

We are also using cv2.VideoCapture(0) now with Formula Pi.
The Pi camera support has improved a lot since we wrote the DiddyBorg scripts and we have found it is faster than the older PiCamera based code.
This also has the advantage that the same code works with both the Pi Camera and a USB web camera :)
The OpenCV support for the Pi Camera can be set up by running this command:

sudo modprobe bcm2835-v4l2
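
A quick way to check it worked is to open the camera with OpenCV and grab a single frame, something like this (assuming the camera shows up as device 0 - change the index if you have more than one camera attached):

import cv2

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print 'Could not open the camera - is the bcm2835-v4l2 module loaded?'
else:
    success, frame = cap.read()
    if success:
        print 'Grabbed a %dx%d frame' % (frame.shape[1], frame.shape[0])
    else:
        print 'Camera opened but did not return a frame'
    cap.release()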

The threading is a bit more complex as you have to create the threads by hand.
This is not too difficult with Python, but understanding threads can be a bit tricky.

Below is the basic skeleton code we use for getting image processing code to run in multiple threads.
The actual processing code goes into the ProcessImage function inside the ImageProcessor class.

The ImageCapture class is responsible for getting frames from the camera.
It runs in its own thread, but uses very little processing time in practice.
What it does is wait until there is at least one processing thread waiting (held in processorPool).
It takes the oldest thread out of this pool, gives it the next frame returned from calling cap.read(), and sends it an event notification.
This continues until the global running flag becomes False.
There should only be one of these threads running.

The ImageProcessor class gets passed the new frame and event notification each time we want it to process a frame from the ImageCapture thread.
Each runs in its own thread and adds itself back into the list of waiting threads (processorPool) after the ProcessImage function has finished.
We usually run four at the same time, but you should be able to have any number from one upwards :)

Our skeleton code for threaded image processing:

# Imports
import cv2
import sys
import time
import threading

# Camera settings go here
imageWidth = ...
imageHeight = ...
frameRate = ...
processingThreads = 4

# Shared values
global running
global cap
global frameLock
global processorPool
running = True

# Setup the camera
cap = cv2.VideoCapture(0) 
cap.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, imageWidth)
cap.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, imageHeight)
cap.set(cv2.cv.CV_CAP_PROP_FPS, frameRate)
if not cap.isOpened():
    cap.open(0)

# Image processing thread, self-starting
class ImageProcessor(threading.Thread):
    def __init__(self, name, autoRun = True):
        super(ImageProcessor, self).__init__()
        self.event = threading.Event()
        self.eventWait = (2.0 * processingThreads) / frameRate
        self.name = str(name)
        print 'Processor thread %s started with idle time of %.2fs' % (self.name, self.eventWait)
        self.start()

    def run(self):
        # This method runs in a separate thread
        global running
        global frameLock
        global processorPool
        while running:
            # Wait for an image to be written to the stream
            self.event.wait(self.eventWait)
            if self.event.isSet():
                if not running:
                    break
                try:
                    self.ProcessImage(self.nextFrame)
                finally:
                    # Reset the event
                    self.nextFrame = None
                    self.event.clear()
                    # Return ourselves to the pool at the back
                    with frameLock:
                        processorPool.insert(0, self)
        print 'Processor thread %s terminated' % (self.name)

    def ProcessImage(self, image):
        # Processing for each image goes here
        ### TODO ###
        pass

# Image capture thread, self-starting
class ImageCapture(threading.Thread):
    def __init__(self):
        super(ImageCapture, self).__init__()
        self.start()

    # Stream delegation loop
    def run(self):
        # This method runs in a separate thread
        global running
        global cap
        global processorPool
        global frameLock
        while running:
            # Grab the oldest unused processor thread
            with frameLock:
                if processorPool:
                    processor = processorPool.pop()
                else:
                    processor = None
            if processor:
                # Grab the next frame and send it to the processor
                success, frame = cap.read()
                if success:
                    processor.nextFrame = frame
                    processor.event.set()
                else:
                    print 'Capture stream lost...'
                    running = False
            else:
                # When the pool is starved we wait a while to allow a processor to finish
                time.sleep(0.01)
        print 'Capture thread terminated'

# Create some threads for processing and frame grabbing
processorPool = [ImageProcessor(i+1) for i in range(processingThreads)]
allProcessors = processorPool[:]
captureThread = ImageCapture()

# Main loop, basically waits until you press CTRL+C
# The captureThread gets the frames and passes them to an unused processing thread
try:
    print 'Press CTRL+C to quit'
    while running:
        time.sleep(1)
except KeyboardInterrupt:
    print '\nUser shutdown'
except:
    e = sys.exc_info()
    print
    print e
    print '\nUnexpected error, shutting down!'

# Cleanup all processing threads
running = False
while allProcessors:
    # Get the next running thread
    with frameLock:
        processor = allProcessors.pop()
    # Send an event and wait until it finishes
    processor.event.set()
    processor.join()

# Cleanup the capture thread
captureThread.join()

# Cleanup the camera object
cap.release()
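
To give an idea of what could go in ProcessImage, below is a rough sketch that picks out a blue target and works out how far it is from the centre of the image. The HSV limits are only a guess and would need tuning for your bulb, and it needs "import numpy" adding to the imports at the top of the script:

    def ProcessImage(self, image):
        # Example only: find the largest blue blob and report how far its
        # centre is from the middle of the frame (negative means to the left)
        hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
        lowerBlue = numpy.array([100,  80,  80])
        upperBlue = numpy.array([130, 255, 255])
        mask = cv2.inRange(hsv, lowerBlue, upperBlue)
        contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            largest = max(contours, key = cv2.contourArea)
            moments = cv2.moments(largest)
            if moments['m00'] > 0:
                targetX = int(moments['m10'] / moments['m00'])
                offset = targetX - (imageWidth / 2)
                print 'Thread %s: target is %d pixels from centre' % (self.name, offset)
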

The support you guys give to us users is amazing. That's exactly what I was looking for, thank you. I shall go away and cogitate!

Two questions if I may.

Looking through your skeleton code it looks as though "frameLock" is not defined. Should this be defined as a "threading.Lock()" object?

If so, and if I run two instances of "ImageCapture()", one for each camera, I assume I should use a separate "frameLock" for each which I could pass by reference, e.g. "def __init__(self, lock)", as a global would affect both instances. Similarly for "processorPool"?

Or have I completely got the wrong end of the stick as far as threading is concerned?

piborg:

Looks like I managed to lose a line somewhere; as you suspected, it should be

frameLock = threading.Lock()

at the bottom of the "shared values" section.

The frameLock object is used to protect the processorPool list from being altered in two threads at the same time.
Put simply it forces the threads to run one at a time when changing the processorPool list.

If you have different processing threads for handling the processing of each camera then you probably will want to create a second processorPool and corresponding frameLock object for the second camera.

On the other hand if the processing is using the same threads for both the first and second camera then the ImageCapture objects can share the processorPool and frameLock :)
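
If you do go down the separate-pool route, one way to do it is to pass the pool and lock into both classes instead of using the globals, so each processor puts itself back into the pool it came from. A very stripped-down sketch of the idea (the names and structure are just a suggestion, the frame handling is the same as in the skeleton above):

import cv2
import time
import threading

processingThreads = 4
running = True

class ImageProcessor(threading.Thread):
    def __init__(self, name, pool, lock):
        super(ImageProcessor, self).__init__()
        self.event = threading.Event()
        self.name = str(name)
        self.pool = pool                        # the pool this processor returns itself to
        self.lock = lock                        # the lock protecting that pool
        self.nextFrame = None
        self.start()

    def run(self):
        global running
        while running:
            self.event.wait(1.0)
            if self.event.isSet():
                if not running:
                    break
                try:
                    self.ProcessImage(self.nextFrame)
                finally:
                    self.nextFrame = None
                    self.event.clear()
                    with self.lock:
                        self.pool.insert(0, self)

    def ProcessImage(self, image):
        # Per-camera processing goes here
        pass

class ImageCapture(threading.Thread):
    def __init__(self, cap, pool, lock):
        super(ImageCapture, self).__init__()
        self.cap = cap                          # this capture thread's own camera
        self.pool = pool
        self.lock = lock
        self.start()

    def run(self):
        global running
        while running:
            with self.lock:
                processor = self.pool.pop() if self.pool else None
            if processor:
                success, frame = self.cap.read()
                if success:
                    processor.nextFrame = frame
                    processor.event.set()
                else:
                    print 'Capture stream lost...'
                    running = False
            else:
                time.sleep(0.01)

# One pool, lock, and capture thread per camera
lockA = threading.Lock()
lockB = threading.Lock()
poolA = []
poolB = []
for i in range(processingThreads):
    poolA.append(ImageProcessor('A%d' % (i + 1), poolA, lockA))
    poolB.append(ImageProcessor('B%d' % (i + 1), poolB, lockB))
captureA = ImageCapture(cv2.VideoCapture(0), poolA, lockA)
captureB = ImageCapture(cv2.VideoCapture(1), poolB, lockB)
# The main loop and shutdown are the same as the skeleton above,
# remembering to join both sets of processors and both capture threads.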

Thanks.

I have been experimenting with the image capture and image processing threads using v4l2 as you posted above. These work well and I can drive Diddy (although quite slowly) around and the web camera mounted on a servo will keep track of the target.
I initially ran the program with processingThreads = 4, as per your code. However, as far as I can tell there is absolutely no difference in performance between running only 1 thread, 4 threads or even 16 threads.
The offset between the centre of the image and the centre of the cv2 contours is passed to a global variable which is picked up by a function called from the main program (thus unthreaded) which then adjusts the servo by the required amount to keep the target as close to the centre as possible.
I have also tried including this centreTarget() function in the image processing thread but again I can detect no difference in performance. I wonder if the UltraBorg and servo are the bottleneck here. Any thoughts?

piborg:

I presume that by performance you are talking about how "fast" the code tracks the target.

My best guess is that the frame rate you have for the processing is low enough that a single thread can keep up with it. I would suggest increasing it and seeing what happens; I suspect the maximum for your web camera is something like 30 or 60 frames per second.

What you should find is that as long as the code keeps up, the higher frame rate will track things better. If the number is too high and the code cannot keep up, it will start to go out of sync and become more delayed the longer the script runs.
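
If you want to see whether the camera or the processing is the limit, a quick standalone test is to time how fast cap.read() on its own can deliver frames. This is only a rough sketch and should be run on its own rather than inside the threaded script:

import cv2
import time

# Measure how many frames per second the camera actually delivers over 5 seconds
cap = cv2.VideoCapture(0)
frames = 0
start = time.time()
while time.time() - start < 5.0:
    success, frame = cap.read()
    if not success:
        break
    frames += 1
cap.release()
print 'Camera delivered %.1f frames per second' % (frames / (time.time() - start))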

In layman's terms, does the above code increase the frame rate in the Web UI? If so, where would it go within the DiddyBorg Red web UI script?
If not, is there a way to increase the frame rate and reduce the lag?

cheers
John

piborg:

The frame rate was kept low so that things run properly even on a weak WiFi signal.

The settings are at the top of the script:

# Settings for the web-page
webPort = 80                            # Port number for the web-page, 80 is what web-pages normally use
imageWidth = 240                        # Width of the captured image in pixels
imageHeight = 180                       # Height of the captured image in pixels
frameRate = 10                          # Number of images to capture per second
displayRate = 2                         # Number of images to request per second
photoDirectory = '/home/pi'             # Directory to save photos to

The frameRate is how fast the camera is reading images on the Pi and the displayRate is how fast the WebUI gets new images. Start by increasing displayRate and see if things improve. To go above 10 you will also need to increase frameRate.

As for lag, that is entirely dependent on how quickly the camera reads images and how long the network takes to send it. You can increase the frameRate value to read from the camera faster, which might help. There is not much else you can really do to improve it.
