Text-to-image generators are a handy way to produce striking images. What combinations of words produce images that are art, and which produce dull or banal ones?
Last month, the site DALL-E (named after the Spanish artist Salvador Dalí and WALL-E, the Pixar character from the 2008 film of the same name) reported that users were creating more than two million AI-generated images per day. The site added that it had refined its filters to reject violent or sexual content and other images that violate its policies.
But given how easy text-to-image generators are to access and how sophisticated they have become, many experts predict that it won't be long before the technology becomes one more weapon in the arsenal of those looking to spread misinformation and propaganda. The technology is already raising serious questions about copyright and the commercial use of artificially generated images.
Getty Images, for example, unlike some of its competitors, banned the sale of AI-generated artwork on its site in September because of uncertainty about the legality of such images. At the same time, it announced a partnership with a company whose similar technology enables substantial, creative editing of existing images. The distinction Getty draws is between image generation and image editing, even when the editing produces an entirely different image.
In a recently published report, Democracy Reporting International observed that this "combination of a text model and a synthetic image creator holds out the prospect of a shift in disinformation strategies, from manipulating existing content to creating new realities". For the researchers, the application of AI technology goes "beyond the manipulation of existing media" to the "production of fully synthetic content…ultimately enabling the quick and easy generation of false visual evidence to complement fake (news) stories".
Another significant concern, critics say, is that AI technology will replicate the stereotypes and biases that already exist in society, because it draws on existing images online when it generates pictures from text prompts. This would make it easier for those who wish to create visual "evidence" to accompany falsified narratives targeting marginalized communities.
Democracy Reporting International offers recommendations on how to prepare for and respond to the growing mass of AI-created content. The report argues that widespread digital literacy is essential if people are to recognize false narratives and misinformation. The researchers also recommend prebunking: countering manipulated images and text proactively, before they spread, rather than only reacting after the fact.
I spoke with Beatriz Almeida Saab, co-author of the report, about the threat posed by text-to-image generators and how best to mitigate the potential damage. This conversation has been edited for length and clarity.
When preparing the report, what did you find unexpected in your research?
The threat is not the technology itself, but access to it. The technology to manipulate media has always existed; what changes is how easily and how quickly you can do it. Moreover, we have seen people believe much less sophisticated manipulations. Our point is that we will reach a stage where malicious actors have easy access to this technology and using it will be effortless. This type of technology is open access, which means it is available to everyone. There are no regulations in place, so if we don't discuss it at the political level, how will we be prepared for the consequences?
What would be your nightmare scenario with text-to-image generation?
A malicious actor creates a fake headline, builds a story around it, and uses artificial intelligence (AI), specifically text-to-image generation models, to create an image that perfectly supports their fake narrative by fabricating realistic evidence. That false narrative is then harder to verify and debunk. People won't change their minds, because a piece of fabricated evidence supports the story, and there seems to be no room for questioning a picture.
How does text-to-image generation differ from the “deep fakes” that already exist?
"Deep fakes" is generally used as a catch-all description for all forms of audiovisual manipulation: video, audio, or both. These are highly sophisticated manipulations using AI-based technology, allowing those aiming to spread misinformation to make it look as if someone said or did something they didn't, or as if something happened that never actually happened. The main difference between deep fakes and images generated from text prompts is that deep fakes are sophisticated manipulations of existing audiovisual content. Text-to-image generation is new because it shifts from manipulating pre-existing media to generating entirely new media: creating an image that reflects the desired reality.
Who is most directly impacted by the implementation of this technology? What is the responsibility of the people on the front line of this new technology?
At some level, everyone is affected. The way we consume information, images and everything else online will change. We have to learn to tell what is true from what is false online, which is very difficult. A researcher we interviewed for the report pointed out that your brain processes information simply by consuming it, whether it's true or not. Your subconscious processes it and it stays with you. It also affects what we call provenance technology players, who verify the authenticity of media. That has an impact on how you debunk and how you fact-check. It involves all of these stakeholders, because creating false evidence to support a false narrative is very serious.