Deep Fakes – How to Spot Fake Images
A (fairly) new kind of neural network, the so-called Generative Adversarial Network (GAN), is nowadays capable of generating deceptively real images of people who do not actually exist. At first glance, these fake images are indistinguishable from real photos. Fortunately, you can still uncover them if you look closely – if you know what to look for!
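To make the adversarial idea concrete, here is a minimal, heavily simplified sketch in PyTorch: a generator turns random noise into images, a discriminator learns to tell real images from generated ones, and each network is trained against the other. All layer sizes and names here are illustrative assumptions – production face generators (such as the StyleGAN family) are far larger, convolutional models.

```python
# Minimal GAN sketch (illustrative only): a generator learns to map random
# noise to images while a discriminator learns to tell real from generated.
import torch
import torch.nn as nn

LATENT_DIM = 64          # size of the random noise vector (assumption)
IMG_PIXELS = 32 * 32     # tiny grayscale images, just for this sketch

# Generator: noise -> fake image
G = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_PIXELS), nn.Tanh(),   # pixel values in [-1, 1]
)

# Discriminator: image -> probability "real"
D = nn.Sequential(
    nn.Linear(IMG_PIXELS, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch: torch.Tensor) -> None:
    batch = real_batch.size(0)
    noise = torch.randn(batch, LATENT_DIM)
    fake = G(noise)

    # 1) Train the discriminator to separate real from fake.
    opt_d.zero_grad()
    loss_d = bce(D(real_batch), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # 2) Train the generator to fool the discriminator.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))  # pretend fakes are real
    loss_g.backward()
    opt_g.step()

# Example: one step on a batch of random stand-in "real" images.
train_step(torch.rand(16, IMG_PIXELS) * 2 - 1)
```

After many thousands of such steps the generator gets better and better at producing images the discriminator can no longer reject – which is exactly why the results look so convincing.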
http://www.whichfaceisreal.com/ demonstrates how good these results have become. In a quiz, you can test your skill at telling fake photos from real ones – without resorting to tricks like Google’s Reverse Image Search.
If you are familiar with the technology behind it, it is (still) rather easy to distinguish fake photos from real ones after a little practice. For a layperson, however, it is becoming increasingly difficult. In a “post-factual age” brimming with digital misinformation, conspiracy theories and fake news on social media, the ability to recognize these deepfakes is more important than ever. Using a few examples, we therefore want to explain how computer-generated images can be recognized.
At the same time, we can learn more about how the current generation of neural networks works.
But first things first: The technology in this area is advancing rapidly – it is quite possible that in the near future some of the weaknesses of GANs mentioned here will be overcome (or that GANs will be replaced by a completely different, new technology).
Fragments without context – AI and its flaws
The most important rule for recognizing generated images (and also generated text, speech or video): take a moment, take a deep breath and look closely – which is good advice for media consumption in general.
To get a clue how to “detect” an AI, let’s briefly recall the weaknesses of current AI. Current AI is not “strong AI”: it has no world knowledge, no common sense, no self-awareness. It doesn’t “know” what it’s doing. It does not even “know” what a face, a human being, top, bottom, left, right, glasses or an ear is. The AI has only “learned” to recognize the things it was trained on, by being shown thousands – better yet, millions – of examples. Unlike humans, it cannot generalize. It tries to imitate individual elements, such as facial features, as authentically as possible (which already works very, very well).
In this context it is good to know how one specific type of neural network works: the so-called CNN (Convolutional Neural Network, or ConvNet). CNNs are currently the state of the art when it comes to processing or generating images. We don’t need to go into detail here; it suffices to know that these networks do not develop any idea of the three-dimensional world. CNNs merely memorize vast amounts of image fragments (textures) and develop an intuition for how these snippets can be stitched together and interpolated (blended). They have no “concept” of a complete or final image. Since the GANs that generate the images are built on top of CNNs, they inherit these weaknesses.
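A tiny sketch (again PyTorch, with made-up sizes) illustrates why: a convolution only ever looks at a small local patch of the image, so the basic building blocks of a CNN are local textures, not objects.

```python
# Sketch: a convolutional layer only ever "sees" small local patches.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

image = torch.randn(1, 3, 256, 256)   # one RGB image (random stand-in data)
features = conv(image)                # shape: (1, 16, 256, 256)

# Each output value depends only on a 3x3 pixel neighbourhood of the input.
# Stacking many such layers grows this "receptive field", but the network
# never represents the scene as 3D objects - it matches and blends local
# texture patterns it has seen during training.
print(features.shape)  # torch.Size([1, 16, 256, 256])
```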
Which face is real? – 4 examples
Keeping this in mind, we will now try to unmask the fake pictures. (All of the following examples are from www.whichfaceisreal.com.)
So, which person in the following examples is real?
Example no. 1: Generation of textures, not objects
Both pictures look very real at first, but if you look closer, several details in the second picture seem strange:
- The background looks “real”, yet you can’t tell what kind of object it is supposed to be.
- The border between the child’s hair and the background looks like smudged, runny paint.
- The child’s teeth are noticeably crooked.
In the picture of the three women, however, nothing conspicuous can be found.
The solution: The picture of the child is artificially generated. The one of the women is real.
The confusing background in the child’s picture stems from the fact that current neural networks need many examples to learn a single thing. Since backgrounds in pictures do not follow any common pattern, the networks struggle to imitate them realistically. In short: faces have many similarities, backgrounds often do not.
The teeth and the transition between background and hair show that a CNN works with textures instead of thinking “geometrically”. It doesn’t “know” that teeth can’t stick to each other – yet the upper and lower incisors in the photo merge into each other without a clear border.
By the way: Did you notice the child’s chin?
The picture of the three women can (still) be recognized as real: hair and background are clearly distinguishable, and the earring of the woman on the right has a distinct, realistic shape. Most importantly: there are several people in the picture! Neural networks cannot count – a network that can generate images of one person cannot generate images of two people. It would have to be trained again on millions of images of two people – and even then it still could not produce images of three people, and so on.
Example no. 2: Missing details in computer-generated images
Let’s have a look at the picture of the Asian woman. Here, too, the background is obscure, but it is so blurred that it could simply be a picture taken outdoors at a large distance from the background. So the background gives us no hint this time.
However, we can make out individual hairs very well, which is a strong indicator of an authentic image. A Convolutional Neural Network does not know what a hair is and has trouble creating single hairs against arbitrary backgrounds, because it would have to learn each case from scratch: hair in front of a green background, hair in front of wallpaper, …
A clear sign that the picture is genuine is the woman’s mouth: it is slightly open, and we can see anatomically correct teeth and a tongue. A CNN doesn’t know what the inside of a mouth looks like – it would never get the “idea” to add fillings to the teeth.
The easily recognizable ear stud and the realistic folds in the fabric of the blue sweater are the final indicators that the photo is (most likely) genuine.
The face of the woman in the second picture is almost perfect; the image nearly passes as authentic. The teeth are anatomically plausible, the hair looks very realistic, and the background is plain – no clues there. However, a second face protrudes into the picture – and here you can tell at first glance that it cannot possibly be real. As already mentioned: just because a Convolutional Neural Network can create one face does not mean it can generalize to several faces – it doesn’t know what a face is. Another clear hint is the earring: on close inspection it has no definite shape – moreover, there is no hole in the earlobe and no hook, hanger or clip. What is holding the earring?
Example no. 3: Recognizing fake pictures by process of elimination
This example is particularly tricky. The fake can almost only be recognized because the real photo is so clearly genuine. We have learned that edges, backgrounds, mouths and accessories are good clues. The woman with the red cap is definitely genuine:
- The glasses are plausible, but in an unusual place – a CNN cannot yet “understand” that glasses are objects that are not necessarily located on the nose.
- The earrings are matching and definitely attached to the earlobe.
- The background is realistic, we see a realistically proportioned person in the blurred distance.
- The teeth are realistic.
Since the first photo is definitely real, the second one must be fake – but how can we tell?
The background is bright and monochrome and therefore gives us no clue. The teeth show no abnormalities. And since the background is bright, single strands of hair are no indicator either.
But there are three tell-tale clues:
- In the upper right corner there are two colored “smudges” that look like cells under a microscope – such artifacts still occur now and then in images generated by current neural networks.
- The woman’s eyelashes look blurred, the black of the lashes smeared onto her eyelids. This could be smudged make-up, but if you look closely you can see that the lashes seem to be drawn onto the skin. Here an “eyelash texture” has been fused with an “eyelid texture”.
- The tip of her nose is overly bright, and the whole nose appears unusually sharp compared to the rest of the picture.
Example no. 4: Here you almost need a magnifying glass
Here you can only recognize the fake photo with the greatest difficulty. If the picture of the young man didn’t sport a “cell artifact” (top right), it would be practically impossible. Only with a lot of effort do we notice that
- the lower left eyelid has a slight “dent”,
- the collar of the shirt on the bottom right has slightly “oily” colours,
- the neck has a much sharper edge on one side than on the other,
- the gums on the right side extend too far over the molar.
But all this can only be seen if you get very close to the picture.
Fortunately, the distinct earrings and the pattern on the strap of the top help us recognize the image of the woman as authentic – so we can indeed rule out the young man.
Checklist – unmasking fake photos
The last example in particular has shown how difficult it already is to recognize such fake images. It is to be expected that in the near future even the hints listed here will no longer help.
When trying to detect AI fakes, the following checklist can help (a small code sketch of a complementary, automated approach follows after the list):
- Small details that are obvious to us humans: Do both earrings look the same? Is the number of front teeth correct? Are the glasses realistic?
- Badges or patterns on clothes are very difficult for AIs to fake: Do the clothes look realistic? Are seams on shirts plausible?
- Edges and transitions in AI images tend to “blur” into each other; in other places, edges are sometimes unusually “hard”.
- Are the sharpness and illumination of the image consistent? Unlike conventional computer graphics, AI does not perform physical calculations (depth of field, cast shadows).
- Is the image logical in itself? Does any visible text make sense? Does a hand on a person’s shoulder have the correct number of fingers? Is everything physically plausible? The AI does not know the real world at all.
- Tell-tale artifacts (blobs of color, odd shapes, unusual colors)
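As promised above, here is a toy sketch of what an automated check might look like. One idea explored in research is frequency analysis: the upsampling layers of GANs can leave periodic artifacts that show up as unusual energy in the high frequencies of an image’s spectrum. Everything below – the function name, the radius, the very idea that a single number could suffice – is a simplifying assumption for illustration, not a reliable detector.

```python
# Toy sketch: measure how much of an image's spectral energy sits in the
# high frequencies, where GAN upsampling artifacts tend to show up.
import numpy as np
from PIL import Image

def high_freq_energy(path: str) -> float:
    """Fraction of spectral energy outside the low-frequency center."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2

    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    r = min(h, w) // 8                    # "low frequency" radius (arbitrary)
    y, x = np.ogrid[:h, :w]
    low_mask = (y - cy) ** 2 + (x - cx) ** 2 <= r ** 2

    return float(spectrum[~low_mask].sum() / spectrum.sum())

# Usage (hypothetical file name):
# print(high_freq_energy("suspect.jpg"))
# Only a comparison across many known-real and known-fake images is
# meaningful; a single value on its own proves nothing.
```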
With this list of hints and a little practice, you might be able to unmask the current generation of image generators – try it yourself! Have you noticed any other clues in the photos that can be used to catch a CNN? Or do you have questions? Let us know – here or via Twitter.