One of the best-known features of OpenCV is its functionality for detecting faces. Let's look at how this works by launching the ocvFaceDetect sample, located at blocks/openCV/samples/ocvFaceDetect. You should see an image from your webcam similar to the one below, assuming you are two young women standing underneath an umbrella in Amsterdam. I am not, so I used a photograph by Trey Ratcliff.
You'll note that the bounding rectangles of the two faces in the image are highlighted in yellow, and their eyes are marked with transparent blue circles. Let's take a look at how this is achieved, starting with the sample's setup() routine.
void ocvFaceDetectApp::setup()
{
#if defined( CINDER_MAC )
	mFaceCascade.load( getResourcePath( "haarcascade_frontalface_alt.xml" ) );
	mEyeCascade.load( getResourcePath( "haarcascade_eye.xml" ) );
#else
	mFaceCascade.load( getAppPath() + "../../resources/haarcascade_frontalface_alt.xml" );
	mEyeCascade.load( getAppPath() + "../../resources/haarcascade_eye.xml" );
#endif

	mCapture = Capture( 640, 480 );
	mCapture.start();
}
First, a bit of platform-specific code. In order to initialize our cv::CascadeClassifiers mFaceCascade and mEyeCascade, we need to pass a file path for the XML descriptors. The way Mac OS X handles resources makes this easy, since resources are simply files inside the application bundle; we just call app::App::getResourcePath(). Resources under Windows work differently, however: they are not individual files, but are instead baked as binary data directly into the .exe itself. So in the ocvFaceDetect sample we do not include these XML descriptors as true resources, instead determining their file path relative to the application using app::getAppPath().
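One caution that applies on either platform: cv::CascadeClassifier::load() returns false when the file can't be found or parsed, and a classifier that failed to load will simply never detect anything. The sample ignores the return value; in your own apps a check is cheap insurance. A minimal sketch:

// a minimal sketch: fail loudly if the cascade can't be loaded;
// load() returns false when the XML file is missing or unreadable
if( ! mFaceCascade.load( getResourcePath( "haarcascade_frontalface_alt.xml" ) ) ) {
	console() << "Failed to load haarcascade_frontalface_alt.xml" << std::endl;
	quit();
}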
Speaking of these XML descriptors, what exactly are they? Simply put, they are mathematical descriptions of what constitutes a feature of a certain variety - frontal faces in the first, and human eyes in the second. Both of these descriptors ship with OpenCV, and it is also possible to train your own. If you are interested in the specifics, you might read Face Detection using OpenCV, or the Wikipedia page on the Viola-Jones object detection framework. For now though, we just need to understand that Haar cascade classifiers are trained on a large database of images and can identify particular kinds of features.
Following the initialization of these classifiers, we fire up the webcam. Now let's move on to the sample's most interesting section, the updateFaces() function.
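Before we do, a quick note on how updateFaces() gets called. The sample's update() isn't reprinted here, but it presumably follows Cinder's usual Capture pattern, checking for a fresh frame each update and handing it to the detector. A minimal sketch of that pattern:

// a minimal sketch of the usual Cinder Capture pattern (assumed, not reprinted from the sample):
// each update, check whether the webcam has produced a new frame and pass it along
void ocvFaceDetectApp::update()
{
	if( mCapture.checkNewFrame() ) {
		updateFaces( mCapture.getSurface() );
	}
}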
void ocvFaceDetectApp::updateFaces( Surface cameraImage )
{
	const int calcScale = 2; // calculate the image at half scale

	// create a grayscale copy of the input image
	cv::Mat grayCameraImage( toOcv( cameraImage, CV_8UC1 ) );

	// scale it to half size, as dictated by the calcScale constant
	int scaledWidth = cameraImage.getWidth() / calcScale;
	int scaledHeight = cameraImage.getHeight() / calcScale;
	cv::Mat smallImg( scaledHeight, scaledWidth, CV_8UC1 );
	cv::resize( grayCameraImage, smallImg, smallImg.size(), 0, 0, cv::INTER_LINEAR );

	// equalize the histogram
	cv::equalizeHist( smallImg, smallImg );
cameraImage enters the function as a full-resolution color image - a frame from our webcam. First, we create a grayscale copy, grayCameraImage. By passing the optional second argument to toOcv(), we request an 8-bit, single-channel image, and the Cinder-OpenCV bridge automatically converts our color input to grayscale. Next, we allocate a cv::Mat named smallImg to hold a half-sized copy of the input image. Using this smaller image as the input to the face detection algorithm improves the app's performance at a relatively minor cost in precision; the scaling itself is performed by cv::resize(). Last, we run a process called histogram equalization on the image using cv::equalizeHist(). This is a contrast enhancement technique designed to improve the accuracy of the feature detection.
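Incidentally, if you were using OpenCV without the Cinder bridge, the conversion that toOcv( cameraImage, CV_8UC1 ) performs for us could be written explicitly with cv::cvtColor(). A rough equivalent, assuming the color cv::Mat's channels are RGB-ordered:

// a rough equivalent of the bridge's automatic grayscale conversion,
// assuming the color Mat's channels are RGB-ordered
cv::Mat colorImage( toOcv( cameraImage ) );
cv::Mat grayCameraImage;
cv::cvtColor( colorImage, grayCameraImage, CV_RGB2GRAY );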
	// clear out the previously detected faces & eyes
	mFaces.clear();
	mEyes.clear();

	// detect the faces and iterate them, appending them to mFaces
	vector<cv::Rect> faces;
	mFaceCascade.detectMultiScale( smallImg, faces );
	for( vector<cv::Rect>::const_iterator faceIter = faces.begin(); faceIter != faces.end(); ++faceIter ) {
		Rectf faceRect( fromOcv( *faceIter ) );
		faceRect *= calcScale;
		mFaces.push_back( faceRect );

		// detect eyes within this face and iterate them, appending them to mEyes
		vector<cv::Rect> eyes;
		mEyeCascade.detectMultiScale( smallImg( *faceIter ), eyes );
		for( vector<cv::Rect>::const_iterator eyeIter = eyes.begin(); eyeIter != eyes.end(); ++eyeIter ) {
			Rectf eyeRect( fromOcv( *eyeIter ) );
			eyeRect = eyeRect * calcScale + faceRect.getUpperLeft();
			mEyes.push_back( eyeRect );
		}
	}
}
In the second half of updateFaces(), we begin by clearing out the std::vector<>s of faces and eyes left over from the previous frame. We then allocate some temporary storage and make the sample's most important call, cv::CascadeClassifier::detectMultiScale(). This function takes a grayscale cv::Mat as input, smallImg in our case, followed by a vector<cv::Rect> for storing the objects it detects. Any detected objects are stored in faces, which we iterate in the for-loop that follows. For each face, we scale its rectangle back up by calcScale so that it's relative to our full-size input image, and append it to our mFaces variable. Then we search the bounding box of this face for any eyes it might contain, calling detectMultiScale() again, this time using our eye classifier. Notice the first parameter to this call, smallImg( *faceIter ). This makes use of cv::Mat's operator(), which accepts a cv::Rect and creates a sub-image of smallImg. This way we aren't searching the entire image for eyes - only the area of this particular face. Before we push each detected eye into our mEyes variable, we need to scale it up by calcScale, and since its location is relative to the face in faceIter, we also offset it by the upper-left corner of the current face. The sub-image trick deserves a closer look, as the sketch below shows.
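Because a cv::Mat created this way shares pixel data with its parent rather than copying it, setting up a per-face search region is essentially free. Here is a standalone sketch of the idea; the image size and coordinates are hypothetical:

// a standalone sketch of cv::Mat sub-images; sizes and coordinates are hypothetical
cv::Mat image( 480, 640, CV_8UC1, cv::Scalar( 0 ) );
cv::Rect faceBounds( 100, 80, 120, 120 );  // pretend this came from detectMultiScale()
cv::Mat faceRegion = image( faceBounds );  // shares pixels with 'image'; nothing is copied
faceRegion.setTo( cv::Scalar( 255 ) );     // writing to the sub-image writes into 'image' too

That's all there is to it - you've mastered an age-old OpenCV rite of passage, coding a face detector.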
1. cv::CascadeClassifier::detectMultiScale() accepts several additional parameters for which we are using the defaults. Check out the documentation for this function and experiment with these parameters. How do they affect the accuracy of the detector? What about performance?
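As a starting point for that experiment, here is a sketch of the same call with its optional parameters spelled out. The values are illustrative guesses, not recommendations:

// a sketch of detectMultiScale() with its optional parameters made explicit;
// the values here are illustrative, not tuned recommendations
vector<cv::Rect> faces;
mFaceCascade.detectMultiScale( smallImg, faces,
	1.2,                   // scaleFactor: bigger steps between scales run faster but can miss faces
	4,                     // minNeighbors: higher values reject more false positives
	0,                     // flags: a legacy parameter, unused by newer cascades
	cv::Size( 30, 30 ) );  // minSize: ignore candidate faces smaller than this

Broadly speaking, raising scaleFactor or minNeighbors trades recall for speed and precision, which is exactly the tradeoff this exercise asks you to measure.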