

Understanding how virtual make-up apps work

Virtual make-up is an interesting technology that can be used to help decide which cosmetics to buy, to enhance portraits, or just for fun.

Since the final color of an applied cosmetic depends both on the color of the cosmetic and on the skin color, most of the time people have to go to a store and try the products on themselves to see how they would look. With virtual make-up technology, this can be conveniently simulated on a computer or a mobile phone. The only thing needed is a photograph of the face looking towards the camera, and the software can simulate how a particular make-up would look on that particular skin color. The same idea can be applied to a live camera feed for a more realistic, real-time experience.

To create an application like this, the first step is to estimate where the faces are in the photograph. This can be solved with computer vision. In the general case, this problem is called object detection. You need to define what your object looks like, and then train an algorithm with many images of that object. Most computer vision algorithms that perform this task assume that the face appears in a roughly frontal view, with few or no objects covering it. This is because most faces have a similar structure when viewed from the front, whereas profile or back views of the head vary a lot from person to person because of hair styles, among other things.

In order to capture the facial structure, these algorithms are usually trained with features such as Haar-like features, LBP, or HOG. Once trained, the algorithm is able to detect faces in images.
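If you want to try this yourself, here is a rough sketch of the detection step using OpenCV's pretrained frontal-face Haar cascade (the input file name is just the example photo used in this post):

```python
# A minimal face detection sketch with OpenCV's pretrained Haar cascade.
import cv2

# Load a frontal-face cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("girl.jpg")                   # example input photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Each detection is an (x, y, width, height) region of interest.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```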

So, for example, let’s say that you start with an image like this:

[Image: example input photograph]

After detecting the face, you will end up with a region of interest on the image. Something like this:

[Image: detected face region marked with a bounding box]

Now, inside that region of interest, we need to detect specific facial landmarks. These landmarks represent the position of different parts of the face, such as the eyes, mouth, and eyebrows. This is, again, an object detection problem. You need to train an algorithm with many annotated images and a specific number of facial landmarks. Then, you can use the trained algorithm to detect those facial features in a new image. Just like this, for example:

[Image: detected facial landmarks overlaid on the face]
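One popular way to get these points is dlib's 68-point shape predictor. A minimal sketch (it assumes the pretrained shape_predictor_68_face_landmarks.dat model has been downloaded separately and sits next to the script):

```python
# Landmark detection sketch using dlib's 68-point shape predictor.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("girl.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

for rect in detector(gray):
    shape = predictor(gray, rect)
    # 68 (x, y) points covering eyebrows, eyes, nose, mouth and jawline.
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    for (x, y) in landmarks:
        cv2.circle(image, (x, y), 2, (0, 0, 255), -1)
```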

Once you have the position of those landmarks, you need to design your own make-up and align it to those features. You can then blend the original image together with the designed make-up. Here are some basic examples with a few different colors:

[Images: make-up applied in a few different colors]
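A very basic version of that blend can be a simple alpha blend over a soft mask built from the landmarks. Here is a sketch for a lipstick effect (the color, the blend strength, and the choice of the mouth points 48-67 of the 68-point layout are just choices for this example):

```python
# Alpha-blend sketch: fill the mouth region with a lipstick color and
# blend it softly into the photo.
import cv2
import numpy as np

def apply_lip_color(image, landmarks, color=(40, 40, 200), strength=0.4):
    """Blend a BGR color over the lip region defined by the landmarks."""
    mouth = np.array(landmarks[48:68], dtype=np.int32)
    mask = np.zeros(image.shape[:2], dtype=np.float32)
    cv2.fillConvexPoly(mask, cv2.convexHull(mouth), 1.0)
    # Soften the mask edges so the color fades out instead of cutting off.
    mask = cv2.GaussianBlur(mask, (15, 15), 0) * strength
    mask = mask[..., None]                       # broadcast over BGR channels

    overlay = np.full_like(image, color, dtype=np.uint8)
    blended = image * (1 - mask) + overlay * mask
    return blended.astype(np.uint8)
```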

The position of the detected landmarks and the design of the make-up are crucial for making the result appear realistic. On top of that, there are many different computer vision techniques that can be applied to blend the make-up into the face in a more realistic manner.
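For instance, one simple trick towards a more natural look is to blend in the Lab color space, keeping the lightness of the skin (its texture and shading) and only pulling the color channels towards the make-up. A sketch, taking a soft 0-1 mask like the one built in the previous example:

```python
# Texture-preserving tint sketch: blend only the a/b color channels in
# Lab space, leaving the lightness channel (skin texture) untouched.
import cv2
import numpy as np

def tint_preserving_texture(image, mask, color=(40, 40, 200)):
    """image: BGR uint8; mask: soft 0-1 float mask (H, W) or (H, W, 1)."""
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB).astype(np.float32)
    swatch = np.full_like(image, color, dtype=np.uint8)
    lab_color = cv2.cvtColor(swatch, cv2.COLOR_BGR2LAB).astype(np.float32)

    m = mask if mask.ndim == 3 else mask[..., None]
    # Blend only the a and b channels; L (index 0) stays as in the photo.
    lab[..., 1:] = lab[..., 1:] * (1 - m) + lab_color[..., 1:] * m
    return cv2.cvtColor(lab.astype(np.uint8), cv2.COLOR_LAB2BGR)
```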

Posted in Computer Vision, Photography.


Interfacing Intel RealSense 3D camera with Google Camera’s Lens Blur

There is an interesting mode in the Google Camera app, called Lens Blur, that allows you to refocus a picture after it has been taken, basically simulating what a light-field camera can do, but using just a regular smartphone.

To do this, the app uses an array of computer vision techniques such as Structure from Motion (SfM) and Multi-View Stereo (MVS) to create a depth map of the scene. Having this entire pipeline running on a smartphone is remarkable, to say the least. You can read more about how this is done here.

Once the depth map and the photo are acquired, the user can select the focus location and the desired depth of field. A thin lens model is then used to simulate a real lens matching that focus location and depth of field, generating a new image.
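To give a rough idea of what that means, here is a very simplified sketch: a per-pixel circle of confusion is derived from the depth map with the thin-lens formula, and the spatially varying blur is approximated with a small stack of Gaussian blurs. All parameter values are made up for illustration, and the app's actual rendering is certainly more sophisticated:

```python
# Simplified depth-of-field simulation from a depth map (thin-lens model).
import cv2
import numpy as np

def refocus(image, depth, focus_depth, focal_length=0.05, aperture=0.02):
    """image: BGR uint8; depth: per-pixel distance in metres (float32)."""
    # Thin-lens circle of confusion (diameter on the sensor):
    #   c = A * |d - d_f| / d * f / (d_f - f)
    coc = aperture * np.abs(depth - focus_depth) / np.maximum(depth, 1e-6)
    coc *= focal_length / (focus_depth - focal_length)

    # Map the circle of confusion to a blur radius in pixels
    # (the 2e4 scale factor is arbitrary, chosen for this demo).
    radius = np.clip(coc * 2e4, 0, 15)

    # Approximate the spatially varying blur with a stack of Gaussian blurs.
    result = image.astype(np.float32)
    for sigma in range(1, 16, 2):
        blurred = cv2.GaussianBlur(image, (0, 0), sigma).astype(np.float32)
        weight = np.clip((radius - (sigma - 1)) / 2.0, 0, 1)[..., None]
        result = result * (1 - weight) + blurred * weight
    return result.astype(np.uint8)
```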

In theory, any photograph with its corresponding depth map could be used with this technique. To test this, I decided to use an Intel RealSense F200 camera, and it worked. Here are the results:

Focused on the front:
[Image: refocused result, foreground in focus]

Focused on the back:
[Image: refocused result, background in focus]

Those two images were created on the smartphone using the Lens Blur feature of the Google Camera app. I created the input image externally, but the app was happy to process it since I used the same encoding that the app expects.

To do that, I first captured a color image and a depth image with the RealSense camera. Then, I projected the depth image into the color camera's frame:

Color photograph from RealSense F200:
[Image: color photograph]

Projected depth image from RealSense F200:
[Image: projected depth image]
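For reference, the projection boils down to the usual pinhole-camera math: deproject each depth pixel with the depth intrinsics, transform it with the depth-to-color extrinsics, and reproject it with the color intrinsics. A generic NumPy sketch (the intrinsics and extrinsics come from the camera calibration; this is not the RealSense SDK API, and it assumes both images have the same resolution for brevity):

```python
# Generic depth-to-color projection sketch with pinhole camera models.
import numpy as np

def project_depth_to_color(depth, K_depth, K_color, R, t):
    """depth: HxW distances in metres; K_*: 3x3 intrinsics; R, t: extrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Deproject: pixel + depth -> 3D point in the depth camera frame.
    x = (u - K_depth[0, 2]) / K_depth[0, 0] * depth
    y = (v - K_depth[1, 2]) / K_depth[1, 1] * depth
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    # Transform into the color camera frame and reproject.
    points = points @ R.T + t
    z = np.maximum(points[:, 2], 1e-6)
    uc = (points[:, 0] / z) * K_color[0, 0] + K_color[0, 2]
    vc = (points[:, 1] / z) * K_color[1, 1] + K_color[1, 2]

    # Scatter the depths onto the color image grid (nearest pixel).
    out = np.zeros((h, w), dtype=np.float32)
    valid = (uc >= 0) & (uc < w) & (vc >= 0) & (vc < h) & (depth.reshape(-1) > 0)
    out[vc[valid].astype(int), uc[valid].astype(int)] = points[valid, 2]
    return out
```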

The next step is to encode the depth image into a format that Google Camera understands, so I followed the encoding instructions from the documentation. The RangeLinear encoding of the previous depth map looks something like this:
[Image: RangeLinear-encoded depth map]
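The encoding itself is straightforward: depths are normalized linearly between the nearest and farthest values and stored as an 8-bit grayscale image. Roughly like this:

```python
# Sketch of the RangeLinear depth encoding and its inverse.
import numpy as np

def encode_range_linear(depth):
    """depth: float array of distances; zeros are treated as missing."""
    valid = depth > 0
    near, far = depth[valid].min(), depth[valid].max()
    dn = np.clip((depth - near) / (far - near), 0.0, 1.0)   # normalize to [0, 1]
    encoded = np.round(dn * 255).astype(np.uint8)
    return encoded, near, far              # near/far must go in the metadata

def decode_range_linear(encoded, near, far):
    """Inverse mapping: 8-bit value back to metric depth."""
    dn = encoded.astype(np.float32) / 255.0
    return dn * (far - near) + near
```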

The final step is to embed the encoded image into the metadata of the original color image and copy the resulting photo into the smartphone gallery. After that, you can open the app, select the image, and refocus it!
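For completeness, the pieces that end up in the photo's metadata look roughly like this. The property names below are taken from Google's depth map metadata documentation (double-check them there), and actually writing them into the JPEG's standard and extended XMP sections is left to an XMP-aware tool:

```python
# Sketch of the depth-map metadata fields that accompany the color photo.
# Property names follow Google's depth map metadata documentation; this
# only assembles the values, it does not write the XMP itself.
import base64

def build_gdepth_fields(encoded_png_bytes, near, far):
    """encoded_png_bytes: the RangeLinear depth map saved as a PNG."""
    return {
        "GDepth:Format": "RangeLinear",
        "GDepth:Near": str(near),
        "GDepth:Far": str(far),
        "GDepth:Mime": "image/png",
        "GDepth:Data": base64.b64encode(encoded_png_bytes).decode("ascii"),
    }
```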

Posted in Computer Vision, IoT, Open Source, Photography, Programming.