On a Monday afternoon in June of the year 2022 AD, six women on Zoom type increasingly bizarre descriptions into a search field.
- “A young woman’s hand with nail polish holding a cosmopolitan cocktail.”
- “A fashionable woman close up directed by Wes Anderson.”
- “A woman wearing an earring that’s a portal to another universe.”
The group—editors from Cosmopolitan, members of artificial-intelligence research lab OpenAI, and a digital artist, Karen X. Cheng, the first “real-world” person granted access to the computer system they’re all using—is working together, with this system, to try to create the world’s first magazine cover designed by artificial intelligence.
DALL-E 2’s vision of a young woman holding a cocktail.
Sure, there have been other stabs. AI has been around since the 1950s, and many publications have experimented with AI-created images as the technology has lurched and leaped forward over the past 70 years. Just last week, The Economist used an AI bot to generate an image for its report on the state of AI technology and featured that image as an inset on its cover.
This Cosmo cover is the first attempt to go the whole nine yards.
But the portal-to-another-universe-earring thing isn’t working. “It looks like Mary Poppins,” says Mallory Roynon, creative director of Cosmopolitan, who appears unruffled by the fact that she’s directing an algorithm to assist with one of the more important functions of her job. (Nor should she be ruffled—more on that later.)
Back to something more basic then. Cheng types a fresh request into the text box: “1960s fashionable woman close up, encyclopedia-style illustration.” The AI thinks for 20 seconds. And then: Six high-quality illustrations of women, each unique, appear on the screen.
Six images that didn’t exist until right now.
This technology is a creation of OpenAI called DALL-E 2. It’s an artificial intelligence that takes verbal requests from users and then, through its knowledge of hundreds of millions of images across all of human history, creates its own images—pixel by pixel—that are entirely new. Type “bear playing a violin on a stage” and DALL-E will make it for you, in almost any style you want. You can depict your ursine virtuoso “in watercolor,” “in the style of van Gogh,” or “in synthwave,” a style the Cosmo team favors for perhaps obvious reasons.
See: obvious reasons.
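In practice, “typing a request” is just a prompt string plus a couple of parameters. DALL-E 2 was invite-only and web-based during the period this article describes, but OpenAI later exposed the same kind of capability through an API. Below is a minimal sketch of submitting a prompt programmatically, assuming the later openai Python SDK; the key, prompt, and counts here are illustrative, not anything the Cosmo team used.

```python
# A minimal sketch of prompting an image generator programmatically.
# Assumes OpenAI's later public Images API (openai Python SDK, v0.x);
# the DALL-E 2 preview this article describes was web-only, so this
# whole snippet is illustrative.
import openai

openai.api_key = "sk-..."  # placeholder key

# Style modifiers are just more words in the prompt.
prompt = "bear playing a violin on a stage, synthwave digital art"

response = openai.Image.create(
    prompt=prompt,
    n=6,               # ask for several candidates, like DALL-E 2's interface
    size="1024x1024",
)

# Each entry is a URL to a freshly generated image that didn't exist before.
for item in response["data"]:
    print(item["url"])
```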
The results are shockingly good, which is why, since its limited release in April, DALL-E 2 has inspired both awe and trepidation in the people who have seen what it can do. The Verge declared that DALL-E “Could Power a Creative Revolution.” The Studio, a YouTube channel from tech reviewer Marques Brownlee, wondered, “Can AI Replace Our Graphic Designer?”
By the end of the Zoom meeting, a cover is close. It’s taken less than an hour. This is wild to witness. And, yes, a little scary. And it raises serious questions far beyond the scope of magazine design: about art, about ethics, about our future.
Watching it work though? It makes your jaw drop.
This is an exclusive first look at DALL-E 2’s yet-to-be-released “outpainting” feature, which allows users to extend DALL-E’s images…and allows DALL-E to imagine the world beyond their borders.
DALL-E’s creators don’t like to anthropomorphize it, and for good reason—contemplating AI as an autonomous entity freaks people out. Just see the recent news about Google engineer Blake Lemoine, who was placed on leave after claiming that his conversations with the company’s AI chatbot, LaMDA, proved it had a soul and should have to grant engineers permission before being experimented on. Most independent experts, as well as Google itself, were quick to dismiss the idea, pointing out that if AI seems human, it’s only because of the massive amounts of data that humans have fed it.
In fact, this kind of AI is fundamentally designed to imitate us. DALL-E is powered by a neural network, a type of algorithm that mimics the workings of the human brain. It “learns” what objects are and how they relate to each other by analyzing images and their human-written captions. DALL-E product manager Joanne Jang says it’s like showing a kid flash cards: If DALL-E sees a lot of pictures of koalas captioned “koala,” it learns what a koala looks like. And if you type “koala riding motorcycle,” DALL-E draws on what it knows about koalas, motorcycles, and the concept of riding to put together a logical interpretation. This understanding of relationships can be keen and contextual: Type “Darth Vader on a Cosmopolitan magazine cover” and DALL-E doesn’t just cut and paste a photo of Darth; it dresses him in a gown and gives him hot-pink lipstick.
The words are jumbled, by the way, because the current version of DALL-E was trained to prioritize artistry over language comprehension. For the real Cosmo cover, creative director Mallory Roynon placed the logo and coverlines herself.
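For the technically curious, the flash-card analogy can be made concrete with a toy sketch. What follows is emphatically not how DALL-E works (the real system is a deep neural network trained on hundreds of millions of image-caption pairs), but it shows the principle Jang describes: associate each caption word with the visual features it co-occurs with, then combine those associations to handle a pairing never seen whole in training. Every vector here is fabricated for illustration.

```python
# A toy illustration of the "flash card" idea: learn a vector for each
# caption word by averaging the (fake) image features it appears with,
# then compose an unseen combination. Not DALL-E's actual method.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

# Pretend each training image has already been reduced to a feature vector.
training_set = [
    (rng.normal(loc=1.0, size=8), "koala in a tree"),
    (rng.normal(loc=1.0, size=8), "koala eating leaves"),
    (rng.normal(loc=-1.0, size=8), "motorcycle on a road"),
    (rng.normal(loc=-1.0, size=8), "red motorcycle parked"),
]

# "Show the kid flash cards": tie every word to the image features
# it co-occurs with.
sums = defaultdict(lambda: np.zeros(8))
counts = defaultdict(int)
for features, caption in training_set:
    for word in caption.split():
        sums[word] += features
        counts[word] += 1
word_vectors = {w: sums[w] / counts[w] for w in sums}

# Compose a concept the "model" never saw as a whole.
prompt = "koala motorcycle"
composed = np.mean([word_vectors[w] for w in prompt.split()], axis=0)
print(composed)  # a feature vector between koala-ness and motorcycle-ness
```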
All this represents a major breakthrough in AI, says Drew Hemment, a researcher and lead for the AI & Arts program at the Alan Turing Institute in London. “It is phenomenal, what they have achieved,” he says. Others are following suit: Last month, Google released a similar AI called Imagen, and a comparable generator called Midjourney, which The Economist used for its aforementioned cover image, was released in beta around the same time as DALL-E 2. There’s even a DALL-E “lite,” now called Craiyon, made by the open-source community for public use.
That said, the technology is far from perfect. DALL-E is still in what OpenAI calls a “preview” phase, being released to just a thousand users a week as engineers continue to make tweaks. If you ask for something the model hasn’t seen before, for example, it’ll provide its best guess, which can be wacky. Despite the generally high quality of the images it renders, areas requiring finer details often turn out blurry or abstract. Perhaps most problematically, the majority of the people it renders, due to the biased data sets it’s seen, are white. And perhaps most surprisingly, it has a hard time figuring out how many fingers humans are supposed to have—to the machine, the number of fingers seems as arbitrary as the number of leaves on a tree.
But DALL-E is also imperfect by design. It’s intentionally bad at rendering photorealistic faces, generating wonky eyes or twisted lips on purpose in an effort to protect against the tech being used to make deepfakes or pornographic images, which disproportionately harm women.
These land mines are part of the reason OpenAI is releasing DALL-E slowly, so they can observe user behavior and refine its system of safeguards against misuse. For now, those safeguards include removing sexually explicit images from those hundreds of millions of images used to train the model, prohibiting and flagging the use of hate speech, and instituting a human review process. DALL-E also has a content policy that asks users to adhere to ethical guidelines, like not sharing any photorealistic faces DALL-E may accidentally generate and not removing the multicolored signature in the bottom right corner that indicates an image was made by AI. And there’s an ongoing effort to make the data set less biased and more diverse; in the month Cosmo spent poking around, results already started to yield more representative subjects.
Despite DALL-E’s limitations, intentional and otherwise, its small but growing number of users are forging ahead—playing around with DALL-E and its knockoffs and posting thousands of their results on social media at a fever pitch. OpenAI does eventually plan to monetize all this interest by charging users for access to its interface and intends to carefully position it as an artist’s tool, not her replacement—a “creative copilot,” as OpenAI’s Jang puts it. Codex, another of OpenAI’s innovations, writes software based on plain-language directives as opposed to coding lingo and has streamlined and democratized parts of the software development process as a result. In the same way, Jang says she sees DALL-E 2 streamlining essential parts of the creative process like mood-boarding and conceptualizing.
Experts I spoke to generally agreed that while fears of AI replacing visual artists are not totally unfounded, the technology will also create new opportunities and possibly entire new art forms. Independent UK-based AI art curator Luba Elliott says she also hopes it can bring more women to the field of AI-generated art, where they’re less represented.
Outtakes from the Cosmo cover, to the tune of “a strong female president astronaut warrior walking on the planet Mars, digital art synthwave.”
Cheng, the digital artist working with Cosmo, used DALL-E to make a music video for Nina Simone’s “Feeling Good” and is now using it to design a dress that bursts into geometric shapes when it’s viewed through an augmented reality filter. A video director by trade, Cheng says that in the past, she’s been limited as a visual artist because she can’t draw. “Now I have the power of all these different kinds of artists,” she says. DALL-E has become part of her day-to-day workflow and has drastically sped up her creative process.
“But I don’t want to sugarcoat it either,” Cheng wrote in an Instagram caption accompanying her music video. “With AI, we are about to enter a period of massive change in all fields, not just art. Many people will lose their jobs. At the same time, there will be an explosion of creativity and possibility and new jobs being created—many that we can’t even imagine right now.”
AI touches almost every part of our lives, from the electronic systems in our cars to the TikTok filters that give us Pamela Anderson eyebrows to our increasingly polarized social feeds and the proliferation of fake news. While AI itself is not new, “it is now a very powerful technology,” says Eduardo Alonso, director of the Artificial Intelligence Research Centre at City, University of London, and “we are starting to consider the ethical and legal impacts of what we are doing.” But technology tends to be a step ahead of the law, he says, so until the law catches up, it’s on the industry itself to set a code of conduct.
OpenAI’s stated mission is to work toward creating an artificial general intelligence (an AGI) that accomplishes two things. First, it should be able to perform any task, not just the ones it’s explicitly asked to. That’s why tech like DALL-E is such a big step—it’s an attempt at giving AGI the sense of sight. “For an AGI to fully understand the world, it needs vision,” says Jang. “Up to now, we’ve taught it to be good at reasoning, but now it can look at things and we can incorporate visual reasoning.” This could pave the path for other senses too, so that one day, Jang says, an AGI could process all the things a human can process. The second, and even loftier, goal? To create an AGI that “benefits all of humanity.” The experts I spoke with seem to believe the company is genuinely committed to deploying AI responsibly, but others say that any AGI could have dangerous or even catastrophic consequences, like becoming a surveillance tool for authoritarian governments or the operating system for autonomous weapons.
Ultimately, because a true AGI doesn’t yet exist, we still don’t and can’t know—but every day, whether we’re ready or not, we’re closer to finding out.
Back in the virtual conference room, the Cosmo and OpenAI group is tooling around with the cocktail cover idea, trying to put various miniature objects into the glass, like a sailboat or a tiny woman on a pool float. But the vision seems almost too surreal for DALL-E, which appears confused—“a woman taking a bubble bath in a martini glass” just generates a creepy face floating beneath the surface of the liquid.
Then DALL-E suggests a new idea everyone loves: putting a goldfish in the glass. It almost feels like the AI and the humans are riffing off each other.
DALL-E adds a goldfish.
An hour in, the team has stalled. The martini glass images look too clip-art-y to make for a satisfying Cosmo cover, and the deadline is nigh. (My deadline is nigher still, and I find myself wishing an AI would write my story.) When the group signs off, the fate of the cover feels uncertain.
The next morning, though, an email attachment in my inbox: an image of a decidedly feminine, decidedly fearless astronaut in an extraterrestrial landscape, striding toward the reader. It’s DALL-E’s interpretation of Cheng’s prompt from overnight, “wide-angle shot from below of a female astronaut with an athletic feminine body walking with swagger toward camera on Mars in an infinite universe, synthwave digital art,” and it’s stunning.
We have a winner.
The image encapsulates the reasons OpenAI wanted to work with Cosmo and Cosmo wanted to work with OpenAI—reasons Natalie Summers, an OpenAI communications rep who also runs its Artist Access program, put best in an email after seeing the cover: “I believe there will be women who see this and a door will open for them to consider going into the AI and machine learning fields—or even just to explore how AI tools can enhance their work and their lives. Women will be better equipped to lead in this next chapter of what it means to coexist with, and determine the course of, increasingly powerful technology. That badass woman astronaut is how I feel right now: swaggering on into a future I am excited to be a part of.”
Members of the team futz with the image in DALL-E over the next 24 hours—Cheng uses an impressive experimental feature, not yet available to users, that draws on the context of the image to “extend” it to the correct cover proportions—and by the next day, Cosmo has a cover.
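Once OpenAI later made image editing publicly available, the standard way to approximate this kind of outpainting was to place the original on a larger transparent canvas and let the model paint the empty pixels. A rough sketch under that assumption, using the later openai Python SDK; the filename, sizes, and prompt are illustrative, and this is not the experimental feature Cheng actually used:

```python
# A rough sketch of the outpainting idea using the image-edits endpoint
# OpenAI later made public (openai Python SDK, v0.x). Filename, sizes,
# and prompt are illustrative assumptions.
import openai
from PIL import Image

openai.api_key = "sk-..."  # placeholder key

# Shrink the square original and park it at the top of a transparent
# square canvas; the edits endpoint treats transparent pixels as
# "paint here," letting the model imagine the scene beyond the borders.
original = Image.open("astronaut.png").convert("RGBA").resize((768, 768))
canvas = Image.new("RGBA", (1024, 1024), (0, 0, 0, 0))
canvas.paste(original, (128, 0))  # centered horizontally, flush with the top
canvas.save("padded.png")

response = openai.Image.create_edit(
    image=open("padded.png", "rb"),
    prompt="a female astronaut striding across Mars, synthwave digital art",
    n=1,
    size="1024x1024",
)
print(response["data"][0]["url"])
```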
Observing this process, I think, This sure is a lot of human effort for an AI-generated magazine cover. My initial takeaway is that DALL-E truly is an artist’s tool—one that can’t create without the artist. Which might ultimately be the point.
At a family wedding in the midst of all this, I met a renowned backdrop painter, Sarah Oliphant. Over a 45-year career, Oliphant—an artist in the classical sense, one who paints with a brush and draws with a pen—has painted backdrops for some of the world’s most famous photographers, including Annie Leibovitz and Patrick Demarchelier. I told her about DALL-E and asked what she thought: When AI can write poetry and make art, what did that say about…well, art?
“All art is borrowed,” she said. “Every single thing that’s ever done in art—we’re all just mimicking and forging and copying the human experience.” Everything we’ve ever seen that’s been meaningful to us, she said, becomes the inspiration we draw from when we’re creating art.
A data set, if you will.
She showed me a painting she made as a wedding gift, a fantastically bizarre and lavish creation that depicted the groom as a plump baby sitting in a bird’s nest in a mystical forest, surrounded by sumptuous cakes and mischievous fairies, each of which has the face of the bride. To paint it, Oliphant referenced a photo of the man as a baby, a book of Victorian fairy paintings, and a photo of a bird’s nest—as well as, probably, every baby, picture of a fairy, and bird’s nest she’s seen in her life.
But what about the fact that DALL-E can generate art almost instantaneously? Does that make a difference?
Not to Oliphant. “It’s what art evokes in the viewer that makes it valuable, not how long it took,” she said. This painting took her nine months, but “if the computer can generate a piece of art that I look at and I’m overwhelmed by its beauty or what it evokes, or I see it as intrinsically fascinating, then that’s just as valuable.”
So she didn’t feel threatened?
Oliphant laughed. “I’d say to that computer, good luck,” she said. “Okay, computer, you try to paint a backdrop as beautiful as I can. Go for it.”
A few days after our conversation, I type into DALL-E’s text box: “baby wearing flower wreath in a mystical forest at night, surrounded by fairies and cakes.” I wait. I realize: I’m nervous.
When the images populate 20 seconds later, I actually say “whoa” out loud. They’re eerily similar in mood and composition to Oliphant’s, and all the elements are there…the baby’s rosy cheeks, the gossamer fairy wings. But they’re unsettling to see after the original, and I realize why: DALL-E is representing a different data set, different experiences, a different worldview.
To call it a tool understates its capabilities. It’s not merely a paintbrush that an artist can wield to express her whims directly—if it were, she’d be able to paint a woman taking a bubble bath in a martini glass. Instead, it brings something of its own.
I save one of the better versions of the baby with his head tilted skyward, hands outreached, and enlarge it on my screen. I lean in and search the image, trying to distinguish which part was imagined by the human and which part by the machine.
This article has been updated to reflect DALL-E Mini’s name change to Craiyon.