The Uncanny Failures of A.I.-Generated Hands


It’s a classic exercise in high-school art class: a student sits at her desk, charcoal pencil held in one hand, poised over a sheet of paper, while the other hand lies outstretched in front of her, palm up, fingers relaxed so that they curve inward. Then she uses one hand to draw the other. It’s a beginner’s assignment, but the task of depicting hands convincingly is one of the most notorious challenges in figurative art. I remember it being incredibly frustrating—getting the angles and proportion of each finger right, determining how the thumb connects to the palm, showing one finger overlapping another just so. Too often, I would end up with a bizarrely long pinky, or a thumb jutting out at an impossible angle like a broken bone. “That’s how students start learning how to draw: learning to look closely,” Kristi Soucie, my high-school art teacher, in Connecticut, told me when I called her up recently. “Everyone assumes they know what a hand looks like, but until you really do look at it you don’t understand.”

Artificial intelligence is facing a similar problem. Newly accessible tools such as Midjourney, Stable Diffusion, and dall-e are able to render a photorealistic landscape, copy a celebrity’s face, remix an image in any artist’s style, and seamlessly replace image backgrounds. Last September, an A.I.-generated image won first prize for digital art at the Colorado State Fair. But when confronted with a request to draw hands, the tools have spat out a range of nightmarish appendages: hands with a dozen fingers, hands with two thumbs, hands with more hands sprouting from them like some botanical mutant. The fingers have either too many joints or none at all. They look like diagrams in a medical textbook from an alien world. The machines’ ineptitude at this particular task has become a running joke about the shortcomings of A.I. As one person put it on Twitter, “Never ask a woman her age or an AI model why they’re hiding their hands.”

As others have reported, the hand problem has to do, in part, with the way the generators extrapolate information from the vast data sets of images they have been trained on. When a user types a text prompt into a generator, it draws on countless related images and replicates the patterns it has learned. But, like an archaeologist trying to translate Egyptian hieroglyphs from the Rosetta Stone, the machine can deduce only from its given material, and there are gaps in its knowledge, particularly when it comes to understanding complex organic shapes holistically. Flawed or incomplete data sets produce flawed outputs. As the linguist Noam Chomsky and his co-authors argued in a recent Times Op-Ed, machines and humans learn differently. “The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data,” they wrote. Instead, it “operates with small amounts of information; it seeks not to infer brute correlations among data points but to create explanations.”