New software can be used to falsify moving images and create a new kind of ‘fake news’ on video. What are the implications for media literacy?
The phenomenon first appeared last year in the work of a programmer calling himself ‘deepfakes’. The word has since become a generic term for this kind of human image synthesis: a technique for combining images of people from different sources to make it appear that they are saying and doing things they never actually said or did.
This video gives a very brief demonstration of how the technology can effectively put words into the mouth of a public figure like former President Obama. And there’s an excellent public information video (also featuring Obama) by BuzzFeed’s Jonah Peretti and Get Out director Jordan Peele.
It seems that this doesn’t require complex or expensive technology: it can be done with basic software packages and open source tools like TensorFlow. At the moment (as in the Obama example), the results are a little clunky, but they’re likely to get a whole lot better very quickly, and the software is becoming much more widely available. One such package is FakeApp – a free application that takes the Snapchat craze for ‘face swapping’ into the world of moving images. Here’s a demonstration of a more sophisticated approach, called Deep Video Portraits.
In order to create a deepfake, you need a large body of video images of a given person. For most public figures (celebrities and politicians, for instance), this is pretty easy to obtain; but it is also increasingly true of many private individuals who post on social networking and video sharing sites. The software effectively ‘trains’ an algorithm (or lets it train itself) to identify consistent patterns – in gestures and facial expressions, for example – and then selects and manipulates these images to synchronise them with spoken words.
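For readers who want a more concrete sense of what this ‘training’ involves, here is a minimal, illustrative sketch in Python (using TensorFlow/Keras) of the shared-encoder, two-decoder autoencoder idea that reportedly underpinned the early face-swap tools. It is emphatically not the code of FakeApp or any real package: the image size, layer sizes and the random stand-in data are simplifying assumptions for demonstration only, and a genuine model would need thousands of aligned face crops and far longer training.

```python
# Minimal, illustrative sketch of the shared-encoder / two-decoder
# autoencoder idea behind early face-swap tools. NOT the code of FakeApp
# or any real package; shapes, layer sizes and the training loop are
# simplified assumptions for demonstration only.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG = 64  # assumed size of pre-aligned face crops

def build_encoder():
    inp = layers.Input((IMG, IMG, 3))
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)  # shared latent "face code"
    return Model(inp, x, name="encoder")

def build_decoder(name):
    inp = layers.Input((256,))
    x = layers.Dense(16 * 16 * 128, activation="relu")(inp)
    x = layers.Reshape((16, 16, 128))(x)
    x = layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x)
    return Model(inp, x, name=name)

encoder = build_encoder()
decoder_a = build_decoder("decoder_person_a")  # reconstructs person A's face
decoder_b = build_decoder("decoder_person_b")  # reconstructs person B's face

def autoencoder(decoder):
    inp = layers.Input((IMG, IMG, 3))
    return Model(inp, decoder(encoder(inp)))

ae_a = autoencoder(decoder_a)
ae_b = autoencoder(decoder_b)
ae_a.compile(optimizer="adam", loss="mae")
ae_b.compile(optimizer="adam", loss="mae")

# Stand-in data: in reality these would be thousands of aligned face
# crops extracted from video of each person.
faces_a = np.random.rand(32, IMG, IMG, 3).astype("float32")
faces_b = np.random.rand(32, IMG, IMG, 3).astype("float32")

# Training alternates between the two identities; because the encoder is
# shared, it learns pose and expression features common to both people.
for _ in range(2):  # a real run would need many thousands of steps
    ae_a.fit(faces_a, faces_a, epochs=1, verbose=0)
    ae_b.fit(faces_b, faces_b, epochs=1, verbose=0)

# The 'swap': encode person A's expression, decode with person B's decoder,
# producing B's face performing A's expression.
swapped = decoder_b.predict(encoder.predict(faces_a[:1]))
print(swapped.shape)  # (1, 64, 64, 3)
```

The key point of the design is that the encoder is shared between the two people: it ends up representing pose and expression in a way that either decoder can render in its ‘own’ face, which is what makes the swap possible.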
As with so many other developments on the internet, it was pornography that led the way. Deepfake was first used to create what is euphemistically called ‘involuntary pornography’: this can include revenge porn, but it also includes sequences where the heads of celebrities are pasted on to the bodies of performers in porn videos.
However, there is an obvious potential here for generating ‘fake news’. It would be easy to use this technology to create videos of politicians making false claims or inappropriate comments. Of course, this kind of misrepresentation isn’t new – and we’ve seen a good deal of it recently in the campaigns of the British press against the Labour leader Jeremy Corbyn. But the potential to fabricate moving images takes this to another level. It could obviously play a part in political campaigns, but it also has troubling implications for international security. For example, it wouldn’t be too hard for somebody to create a video of Donald Trump threatening to use nuclear weapons against North Korea (or even announcing that the weapons had been launched) and then to post it online.
As I’ve said, the results to date aren’t completely seamless. But the software is evolving very rapidly, and some experts are predicting that within a few months, it will be impossible to detect. Deepfakes have already been banned by platforms like Twitter and Reddit (assuming, of course, that they can identify them). Revenge porn is subject to laws on harassment. But the potential here is so broad that it’s hard to see how the phenomenon can be eradicated.
Of course, the ability to ‘falsify’ moving images is nothing new. It dates right back to the beginnings of cinema – for example, in the work of Georges Méliès – and we’re all very familiar with the idea of ‘special effects’. With the advent of digitisation, the potential to create ‘realistic’ sequences of moving images showing impossible events is entirely taken for granted.
However, when it comes to non-fictional material – to news and documentary – the situation is rather different. In the case of still images, there is a history of falsification, long predating the age of Photoshop – for example in advertising or celebrity culture. When it comes to moving images, we all know in principle that things can be selected, staged, edited or otherwise combined to create misleading impressions of what really happened (although we may not always be aware of this). The potential to ‘insert’ fictional characters into factual documentary material has also been exploited in films like Zelig and Forrest Gump – although here again, it’s assumed that the viewer understands what’s going on.
Scholars of documentary film have always resisted the familiar claim that ‘the camera never lies’ (see, for example, Brian Winston’s book Lies, Damn Lies and Documentaries). Indeed, we might even go so far as to say that the camera always lies – or at least that it only ever tells us part of the truth.
Nevertheless, I think we do relate to non-fictional material (or at least to material that we believe to be non-fictional) in a different way from fictional material. We may well know that things can be manipulated in all sorts of ways, both at the point of filming and in editing and post-production; but we are still inclined to regard non-fictional images (and especially moving images) as some kind of evidence of truth. In the case of news and documentary, we assume that some kind of ‘pro-filmic’ event has occurred, whether staged or not – in other words, that something ‘real’ happened in front of the camera, and that there are certain ways in which the camera does not, or even cannot lie.
This suggests that credibility isn’t necessarily inherent in the text itself. The text can try to establish its credibility in various ways: it can make ‘reality claims’ of various kinds, and this is something we can analyse. But credibility is ultimately something we as viewers grant to the text.
My own and other research has shown that even quite young children are preoccupied with the relationships between reality and fantasy. As they become more experienced media users, they develop more complex, multi-dimensional ways of identifying the differences between fiction and non-fiction. ‘Media literate’ viewers use very diverse sets of criteria for deciding what to trust and take seriously, and what not to. This isn’t just a rational process: it also has emotional dimensions (we tend to believe what we want to believe), as well as ideological ones.
However, the problem comes with texts that seem to cross the boundary, or to play with it. The most famous example of this was Orson Welles’s radio production of War of the Worlds, first broadcast in 1938. Many people believed that this fictional broadcast about an invasion from Mars was actually real – although research suggested that their responses varied according to their pre-existing beliefs (religious people were more likely to believe it was real, for example). One key issue was whether people heard and took account of the very start of the programme, which clearly flagged it as a theatrical production. Those who tuned in late were more likely to be fooled, and to panic.
I found similar things in my own research on children’s responses to Ghostwatch, a 1992 BBC programme – a kind of ‘mockumentary’ that is still described by some as the scariest programme ever broadcast. Ghostwatch was so scary because it used many of the devices and techniques of documentary and of what we would now call reality TV. (Significantly, both War of the Worlds and Ghostwatch went out at Halloween.)
This can cut both ways, though. A recent Channel 4 programme, Married to a Paedophile, featured actors lip-synching (very skilfully) to audio recordings of interviews with the real-life participants, not least in order to protect their identities. (This technique was also used in Clio Barnard’s extraordinary film The Arbor, about life on a working-class housing estate in Bradford.) Yet here again, it’s interesting to think about how viewers might respond if they did not know that the participants were played by actors.
Meanwhile, there have been several famous cases where the indexicality of video – that is, its status as evidence of real events – is contested. Back in 1991, an amateur video captured an African-American man called Rodney King being beaten by Los Angeles police; the footage was subsequently used in court, although (amazingly) the officers disputed whether it actually showed what it appeared to show. In fact, they used the video as evidence that King represented a threat to them, and were acquitted on that basis – a verdict that sparked days of rioting. Even in such an apparently open-and-shut case, it seemed that video evidence did not speak for itself: such evidence always has to be interpreted, and it can be interpreted in different ways.
All this takes us to a new place in the debate about ‘fake news’. It may be that we are seeing some kind of grand ‘collapse of reality’ – although perhaps that overstates the case. The more we become aware of the potential for manipulation, the more cynical we may become. Ultimately, we may reach a state where nobody feels they can trust anything.
At the very least, these developments reinforce the need for a critical perspective: media students need to be sceptical of all the ways in which media make claims to truth, including those which involve moving images. What deepfake shows is how much more urgent, but also how much more difficult, this is becoming. So, think twice before you upload that selfie…
(Thanks to Miroslav Suzara for first alerting me to this.)