Computer vision with OpenCV
Join me on this exciting journey to apply advanced computer vision techniques to identify lane lines: camera calibration, undistortion, color thresholding, perspective transformation, lane detection and image annotation. Thanks to the Udacity Self-Driving Car Nanodegree for giving me the basic skill set to get there!
When we drive, we use our eyes to decide where to go. The lines on the road that show us where the lanes are act as our constant reference for where to steer the vehicle. Naturally, one of the first things we would like to do in developing a self-driving car is to detect lane lines automatically using an algorithm.
In this project we will detect lane lines in images using Python and OpenCV. OpenCV stands for “Open Source Computer Vision”, a library with many useful tools for analyzing images.
The following steps have been implemented:
- Computed the camera calibration matrix and distortion coefficients given a set of chessboard images.
- Applied a distortion correction to raw images.
- Used color transforms, gradients, etc., to create a thresholded binary image.
- Applied a perspective transform to rectify binary image (“birds-eye view”).
- Detected lane pixels and fit to find the lane boundary.
- Determined the curvature of the lane and vehicle position with respect to the center.
- Warped the detected lane boundaries back onto the original image.
- Output visual display of the lane boundaries and numerical estimation of lane curvature and vehicle position.
Ultimately we need to use the images to position the car and steer in the right direction, so it is important to make sure the images accurately represent the surroundings.
Cameras use curved lenses to form an image, and light rays often bend a little too much or too little at the edges of these lenses. This distorts the edges of images, so that lines or objects appear more or less curved than they actually are. There are two kinds of distortion: radial (the most important) and tangential. Distortion can change the apparent size and shape of an object, it can make an object’s appearance change depending on where it is in the field of view, and it can make objects appear closer or farther away than they actually are.
Images can be undistorted with a known formulation that maps distorted points to undistorted points. We take pictures of a known shape from several angles and distances, so that we are able to detect the image errors. A chessboard is a good choice since it has a regular, high-contrast pattern, which makes automated detection easy.
We use the OpenCV functions cv2.findChessboardCorners() and cv2.drawChessboardCorners() to automatically find and draw the chessboard pattern. Next, we use cv2.calibrateCamera() to compute the calibration and cv2.undistort() to undistort the images.
Road test images — Original:
Road test images — Undistorted:
Bird’s-Eye View Transformation
We want to measure the curvature of the lines, and to do that we need to transform the road image to a top-down view. To compute the perspective transform, M, given the source and destination points, we use cv2.getPerspectiveTransform(src, dst). To compute the inverse perspective transform we use cv2.getPerspectiveTransform(dst, src). Finally, we warp the image using the perspective transform with cv2.warpPerspective(img, M, img_size, flags=cv2.INTER_LINEAR).
We need to identify four source points for the perspective transform. In this case, I assumed that the road is a flat plane. This isn’t strictly true, but it can serve as an approximation for this project. We need to pick four points in a trapezoidal shape (similar to region masking) that would represent a rectangle when looking down on the road from above.
There are many ways to select them. For example, many perspective transform algorithms programmatically detect four source points in an image based on edge or corner detection, and analyze attributes like color and surrounding pixels. I selected a trapezoid by using ratios of the image dimensions as input. I found it to be a smart way to manually calibrate the pipeline and make sure it generalizes to different roads.
I have also implemented code to sort the four source points properly for the perspective transformation, just in case we change the way we come up with those points in the future. It is VERY IMPORTANT to feed them in the correct order; a wrong step here will mess everything up. The points need to be sorted “clockwise”, starting from the top-left. The method consists of computing the angle of each point and normalizing it into the [0, 2pi] range, which naturally sorts the points “counter-clockwise”. Then I invert the order before returning from the function.
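One possible way to write such a sort is sketched below. It is an illustration of the idea rather than my exact code: measuring the angles around the centroid of the points and the final rotation step that puts the top-left point first are assumptions of this sketch.

```python
import numpy as np


def sort_clockwise_from_top_left(points):
    """Order four (x, y) image points clockwise, starting at the top-left."""
    pts = np.asarray(points, dtype=np.float64)
    cx, cy = pts.mean(axis=0)
    # Image y grows downwards, so flip it to get the usual math convention,
    # then map the angles into [0, 2*pi).
    angles = np.arctan2(cy - pts[:, 1], pts[:, 0] - cx) % (2 * np.pi)
    ccw = pts[np.argsort(angles)]  # ascending angle = counter-clockwise
    cw = ccw[::-1]                 # reversed = clockwise
    # Rotate the list so it starts at the leftmost of the two topmost points.
    top_two = cw[np.argsort(cw[:, 1], kind="stable")[:2]]
    top_left = top_two[np.argmin(top_two[:, 0])]
    start = int(np.where((cw == top_left).all(axis=1))[0][0])
    return np.roll(cw, -start, axis=0)
```

For example, feeding the corners in the order bottom-right, top-left, bottom-left, top-right returns them as top-left, top-right, bottom-right, bottom-left.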
How to make sure we have a good transformation?
The easiest way to do this is to investigate an image where the lane lines are straight and find four points lying along the lines that, after perspective transform, make the lines look straight and vertical from a bird’s-eye view perspective.
I applied undistortion and then bird’s-eye transformation on a straight image of the road, and played with the trapezoid dimensions until getting this result:
Final trapezoid ratios and car’s hood cropping:
- bottom_width=0.4, fraction of image width
- top_width=0.092, fraction of image width
- height=0.4, fraction of image height
- car_hood=45, number of pixels cropped from the bottom to get rid of the car’s hood
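To show how ratios like these can be turned into pixel coordinates, here is a small sketch. The geometry is an assumption of the sketch (a trapezoid centered horizontally, with its bottom edge sitting just above the cropped hood), and the function name is hypothetical, so the numbers it produces are not necessarily the exact points used in the pipeline.

```python
import numpy as np


def trapezoid_src_points(img_w, img_h, bottom_width=0.4, top_width=0.092,
                         height=0.4, car_hood=45):
    """Return 4 source points: top-left, top-right, bottom-right, bottom-left."""
    y_bottom = img_h - car_hood          # skip the hood at the bottom
    y_top = y_bottom - height * img_h    # trapezoid height as image fraction
    cx = img_w / 2.0                     # assumed: centered horizontally
    half_top = top_width * img_w / 2.0
    half_bottom = bottom_width * img_w / 2.0
    return np.float32([
        [cx - half_top, y_top],
        [cx + half_top, y_top],
        [cx + half_bottom, y_bottom],
        [cx - half_bottom, y_bottom],
    ])
```

Because everything is expressed relative to the image size, the same ratios can be reused if the camera resolution changes.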
Here are the source (src) and destination (dst) points:
Image Thresholding, Binary Image
I tried out various combinations of color and gradient thresholds to generate a binary image where the lane lines are clearly visible. There is more than one way to achieve a good result, but I got the best results using a color threshold only. Running my pipeline on the challenge video confirmed that the color-only threshold was the best of my solutions.
While using gradient thresholds I found that the Sobel gradient in x (Sx) and the gradient magnitude (the square root of the sum of the squares of the individual x and y gradients) were the best approaches to make the lines clear. When the Sx operator is applied to a region of the image, it identifies values that are rising from left to right, so taking the gradient in the x direction emphasizes edges closer to vertical.
Calculate the derivative in the x direction (the 1, 0 at the end denotes x direction):
sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
Calculate the derivative in the y direction (the 0, 1 at the end denotes y direction):
sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
Calculate the absolute value of the x derivative:
abs_sobelx = np.absolute(sobelx)
Convert the absolute value image to 8-bit:
scaled_sobel = np.uint8(255*abs_sobelx/np.max(abs_sobelx))
Color thresholding was the best overall for me. I started by converting the image from RGB to HSL, then isolated the saturation (S) channel and applied the threshold range. Why the saturation channel? Because it best enhances the lines of interest:
My final color threshold approach was to mask the image with white and yellow colors, convert it to grayscale and create a binary image based on the non-zero pixels. The trade-off for better detection quality was the introduction of color constraints into the lane detection system.
The yellow-white color threshold that best generalized for the final pipeline was:
- Yellow: HSV [50,50,50] to HSV [110,255,255]
- White: RGB [200,200,200] to RGB [255,255,255]
Here are some of the thresholding combinations I tried:
Bird’s-eye view for Sobel absolute gradient X (scaled_sobel 15 to 100) and HSL (S channel 170 to 255) thresholding:
Bird’s-eye view for Sobel absolute gradient X (scaled_sobel 15 to 100), Yellow (HSV [90,100,100] to HSV [110,255,255]) and White (RGB 200 to RGB 255) thresholding:
Bird’s-eye view for Yellow (HSV [90,100,100] to HSV [110,255,255]) and White (RGB 200 to RGB 255) thresholding:
Main Pipeline Video
1. Identify Lane Lines
Next, locate the lane lines and fit a 2nd order polynomial:
- Identify the starting position of the left and right lines at the bottom of the image: Peaks in a Histogram method.
- Search for the biggest “accumulation of 1s” in horizontal “slices” of the binary image and define a window: Sliding Window method.
- Identify the nonzero pixels in x and y within the window and fit a second-order polynomial to each line: Polyfit.
1.1 Peaks in a Histogram Method
After applying calibration, thresholding, and a perspective transform to a road image, we have a binary image where the lane lines stand out clearly. However, we still need to decide explicitly which pixels are part of the lines, and which of those belong to the left line and which to the right line.
I first take a histogram along all the columns in the lower half of the image.
With this histogram, I am adding up the pixel values along each column in the image. In my thresholded binary image, pixels are either 0 or 1, so the two most prominent peaks in this histogram will be good indicators of the x-position of the base of the lane lines. I use that as a starting point for where to search for the lines.
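In code, the histogram step comes down to a column sum and two argmax calls. A minimal sketch, assuming a 0/1 bird’s-eye binary image and that the left and right lines fall on opposite sides of the image midpoint (the function name is hypothetical):

```python
import numpy as np


def lane_base_positions(binary_warped):
    """Return the x positions of the left and right histogram peaks."""
    h = binary_warped.shape[0]
    # Sum the pixel values of each column over the lower half of the image.
    histogram = np.sum(binary_warped[h // 2:, :], axis=0)
    midpoint = histogram.shape[0] // 2
    left_base = int(np.argmax(histogram[:midpoint]))
    right_base = int(np.argmax(histogram[midpoint:])) + midpoint
    return left_base, right_base
```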
1.2 Sliding Window Method
Next, we can use a sliding window, placed around the line centers, to find and follow the lines up to the top of the frame.
We basically search for the biggest “accumulation of 1s” in horizontal “slices” of the binary image and define a window.
Nine windows are stacked up along each lane line, i.e. the binary image is “sliced and searched” in 9 blocks from the bottom to the top of the image, but non-zero points are searched only within the delimited windows (right and left). The initial “centroid” of the windows for the first block is defined by the histogram method. A margin of 100 pixels is used to search for non-zero pixels within the window. In case the “accumulation” of non-zero points is bigger than 50, the “centroid” of the window is redefined accordingly.
I had no time to play with the window parameters (100, 50), but it would be interesting to come back later and try out some modifications. It may also be interesting to add one more control variable that limits the horizontal shift of the “centroid” with respect to the last window; this may avoid crazy window combinations and ultimately wrong lane detections.
Polyfit and drawing are applied next.
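For one lane line, the sliding window search followed by the polyfit can be sketched as below. This is an illustration under assumptions: the margin of 100 is treated as the half-width of each window, the function name is made up, and the fit is expressed as x = f(y) because the lines are close to vertical in the bird’s-eye view.

```python
import numpy as np


def sliding_window_fit(binary_warped, x_base, nwindows=9, margin=100, minpix=50):
    """Follow one line from bottom to top; return [A, B, C] of x = Ay^2+By+C."""
    img_h = binary_warped.shape[0]
    nonzero_y, nonzero_x = binary_warped.nonzero()
    window_height = img_h // nwindows
    x_current = x_base
    lane_inds = []
    for window in range(nwindows):
        y_low = img_h - (window + 1) * window_height
        y_high = img_h - window * window_height
        x_low, x_high = x_current - margin, x_current + margin
        good = ((nonzero_y >= y_low) & (nonzero_y < y_high) &
                (nonzero_x >= x_low) & (nonzero_x < x_high)).nonzero()[0]
        lane_inds.append(good)
        # Re-center the next window if enough pixels were found.
        if len(good) > minpix:
            x_current = int(np.mean(nonzero_x[good]))
    lane_inds = np.concatenate(lane_inds)
    if lane_inds.size < 3:
        return None  # not enough points for a second-order fit
    return np.polyfit(nonzero_y[lane_inds], nonzero_x[lane_inds], 2)
```

The function would be called twice per frame, once with the left base position from the histogram and once with the right one.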
1.3 Non-Sliding Window (Window Search)
The sliding window method is applied to the first frame only. After that we expect the next frame to have a very similar shape, so we can simply search within the last windows and adjust the “centroids” as necessary.
Polyfit and drawing are applied next.
The green shaded area shows where it searches for the lines. So, once you know where the lines are in one frame of video, you can do a highly targeted search for them in the next frame. This is equivalent to using a customized region of interest for each frame of video, which helps to track the lanes through sharp curves and tricky conditions.
I have not had time to implement a lost-track condition yet. If the search cannot find non-zero pixels within the last windows, which may happen for a sequence of frames, the program should go back to the sliding window search or another method to rediscover the lines.
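A common variant of this targeted search looks for pixels within a margin around the previous polynomial instead of re-using the windows themselves. The sketch below follows that variant, so it is not exactly what my code does; the function name and the margin of 100 are assumptions. Returning None when too few pixels are found is one possible hook for falling back to the full sliding window search.

```python
import numpy as np


def search_around_fit(binary_warped, prev_fit, margin=100):
    """Refit one line using only pixels close to the previous frame's fit."""
    nonzero_y, nonzero_x = binary_warped.nonzero()
    # x position predicted by the previous fit at each non-zero pixel's row.
    x_prev = np.polyval(prev_fit, nonzero_y)
    inside = np.abs(nonzero_x - x_prev) < margin
    if np.count_nonzero(inside) < 3:
        return None  # track lost: the caller can rerun the sliding windows
    return np.polyfit(nonzero_y[inside], nonzero_x[inside], 2)
```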
1.4 Good and bad polyfit frames
I have implemented two ways to keep the program from crashing and to discard bad polyfits. First, I discard the frame if the polyfit fails. Second, I keep track of the last line polyfit and calculate the difference between it and the current one. We expect the current frame to have a line shape very similar to the last one, so I have added tolerances on the polyfit coefficients.
How did I come up with the tolerances?
As a starting point, I fitted lines for several frames in different scenarios and calculated the min and max differences between those scenes. That gives us initial numbers, but the margin is still large: consecutive frames should be very similar, which would call for smaller tolerances. However, I also thought about cases in which a relatively long sequence of frames is discarded, so that the next frame that is not discarded may still need that wider range of tolerances.
We have a 2nd order polyfit, so we have 3 coefficients. The tolerances used are: 0.001, 0.4, 150.
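The tolerance check itself is a per-coefficient comparison. A minimal sketch, assuming the three tolerances apply to the coefficients in np.polyfit order (highest power first) and that a frame with no previous fit is always accepted; the names are hypothetical:

```python
import numpy as np

# Assumed order: quadratic, linear and constant coefficient.
TOLERANCES = np.array([0.001, 0.4, 150.0])


def is_good_fit(current_fit, last_fit, tolerances=TOLERANCES):
    """Accept the current fit only if it stays close to the last good one."""
    if current_fit is None:
        return False  # the polyfit failed for this frame
    if last_fit is None:
        return True   # nothing to compare against yet
    diff = np.abs(np.asarray(current_fit) - np.asarray(last_fit))
    return bool(np.all(diff <= tolerances))
```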
2. Radius of Curvature and Offset Position
2.1 Radius of curvature
Next, we’ll compute the radius of curvature of the fit and the car offset position with respect to lane center.
The radius of curvature is defined as follows:
The y values of the image increase from top to bottom. I chose to measure the radius of curvature closest to the vehicle, so we evaluate the formula above at the y value corresponding to the bottom of the image, or in Python, at yvalue = image.shape[0].
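For a second-order fit written as x = f(y) = Ay^2 + By + C (the coefficient names A, B and C are labels chosen here), the standard expression for the radius of curvature at a given y is:

```latex
R_{\text{curve}}
  = \frac{\left(1 + \left(\frac{dx}{dy}\right)^{2}\right)^{3/2}}{\left|\frac{d^{2}x}{dy^{2}}\right|}
  = \frac{\left(1 + (2Ay + B)^{2}\right)^{3/2}}{\left|2A\right|}
```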
If we calculate the radius of curvature based on pixel values, the radius will be in pixel space, which is not the same as real world space. So we first convert x and y values to real world space.
The conversion to real-world space could involve measuring how long and wide the section of the lane is that we are projecting in our warped image. We could do this in detail by measuring out the physical lane’s dimensions in the field of view of the camera, but for this project we assume the lane is about 30 meters long and 3.7 meters wide.
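A sketch of the computation in meters is shown below. The 30 m and 3.7 m figures come from the assumption above, but the pixel extents they are divided by (720 pixels of lane length and 700 pixels of lane width in the warped image) are placeholder assumptions that depend on the chosen destination points, and the function name is hypothetical.

```python
import numpy as np


def curvature_radius_m(ys_px, xs_px, y_eval_px,
                       ym_per_pix=30 / 720, xm_per_pix=3.7 / 700):
    """Radius of curvature in meters for lane pixels given in pixel units."""
    # Refit in real-world units so the coefficients are in meters.
    A, B, _ = np.polyfit(np.asarray(ys_px) * ym_per_pix,
                         np.asarray(xs_px) * xm_per_pix, 2)
    y_m = y_eval_px * ym_per_pix
    # R = (1 + (2Ay + B)^2)^(3/2) / |2A|; a perfectly straight line (A = 0)
    # has no finite radius.
    return (1 + (2 * A * y_m + B) ** 2) ** 1.5 / abs(2 * A)
```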
Here is an example of my result on a test image:
2.2 Offset position
For this project, we assume the camera is mounted at the center of the car, such that the lane center is the midpoint at the bottom of the image between the two lines we’ve detected. The offset of the lane center from the center of the image (converted from pixels to meters) is the car’s distance from the center of the lane.
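As a sketch, the offset follows directly from the two fits evaluated at the bottom row. The sign convention (positive when the car sits to the right of the lane center), the meters-per-pixel factor and the function name are assumptions of this example.

```python
import numpy as np


def vehicle_offset_m(left_fit, right_fit, img_w, img_h, xm_per_pix=3.7 / 700):
    """Signed distance in meters between the image center and the lane center."""
    y_bottom = img_h - 1
    left_x = np.polyval(left_fit, y_bottom)
    right_x = np.polyval(right_fit, y_bottom)
    lane_center = (left_x + right_x) / 2.0
    # Camera assumed centered on the car, so the image center is the car.
    return float((img_w / 2.0 - lane_center) * xm_per_pix)
```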
3. Average Frame (Smooth)
Even when everything is working, the line detections will jump around from frame to frame a bit and it is preferable to smooth over the last n frames of video to obtain a cleaner result. Each time we get a new high-confidence measurement, we append it to the list of recent measurements and then take an average over n past measurements to obtain the lane position we want to draw onto the image.
The good or bad frame selection is already implemented as described above. For frame smoothing and averaging I found a really helpful implementation by David A. Ventimiglia:
”Using a ring-buffer with the Python deque data structure along with the Numpy average function made it very easy to implement a weighted average over some number of previous frames. Not only did this smooth out the line detections, lane drawings, and distance calculations, it also had the added benefit of significantly increasing the robustness of the whole pipeline. Without buffering — and without a mechanism for identifying and discarding bad detections — the lane would often bend and swirl in odd directions as it became confused by spurious data from shadows, road discolorations, etc. With buffering almost all of that went away, even without discarding bad detections…”
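The idea in the quote can be sketched in a few lines. This is an illustration, not the quoted implementation: the class name, the buffer length of 5 and the linearly increasing weights (newer frames count more) are assumptions.

```python
from collections import deque

import numpy as np


class FitSmoother:
    """Weighted average of the last n polynomial fits, kept in a ring buffer."""

    def __init__(self, n=5):
        self.buffer = deque(maxlen=n)  # old fits drop out automatically

    def update(self, fit):
        """Add a new good fit and return the smoothed coefficients."""
        self.buffer.append(np.asarray(fit, dtype=float))
        # Weights 1, 2, ..., k so that the newest fit has the largest weight.
        weights = np.arange(1, len(self.buffer) + 1)
        return np.average(np.array(list(self.buffer)), axis=0, weights=weights)
```

Only fits that pass the good or bad frame check would be fed to update(); the smoothed coefficients are the ones used for drawing.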
Road Test Images
Road Test Videos!
I found the thresholding technique very challenging to generalize to poorly maintained roads or roads under construction, and to tracks with very sharp curves. I guess there are more advanced techniques out there for restraining and smoothing big detection variations. Regarding the sharp curves, I guess we are limited by the field of view of a single camera, but it may still be doable if we use the appropriate transformation matrix.
As described above:
- I had no time to play with the window margins, but it would be interesting to come back later and try out some modifications. It may also be interesting to add one more control variable that limits the horizontal shift of the “centroid” with respect to the last window; this may avoid crazy window combinations and ultimately wrong lane detections.
- I also had no time to implement a lost-track condition. If the search cannot find non-zero pixels within the windows, which may happen for a sequence of frames, the program should go back to the sliding window search or another method to rediscover the lines.
The pipeline does great on the project video and does well on the challenge videos, but it fails badly on the brutal “harder_challenge” video, which is not posted here. The project and challenge videos show wide, one-way, multi-lane roads, whereas the harder challenge is a much narrower two-way road with one lane per direction, and it also has very sharp curves that go out of the camera’s field of view.
I have not yet tried to improve the pipeline so that it generalizes to the harder challenge video. But I think the key is to implement a dynamic and automated way to define the transformation trapezoid and come up with the appropriate source points for the perspective transform. I would start by looking into the edge or corner detection options and analyzing attributes like color and surrounding pixels.