On Sunday, a Reddit user named “Ugleh” posted an AI-generated image of a spiral-shaped medieval village that rapidly gained attention on social media for its remarkable geometric qualities. Follow-up posts garnered even more praise, including a tweet with over 145,000 likes. Ugleh created the images using Stable Diffusion and a guidance technique called ControlNet.
Reactions to the artwork online ranged from amazement to respect for achieving something novel in generative AI art. “Never seen pictures like this. Something new in the world of art,” wrote one X user. “Tbh, I’ve seen a LOT of ai art, been in this space a long long time, and this is one of the most awesome pieces I’ve ever seen. You did so good,” wrote AI artist Kali Yuga on X.
Perhaps most notably, Y Combinator co-founder and frequent social media tech commentator Paul Graham wrote, “This was the point where AI-generated art passed the Turing Test for me.”
Not everyone was impressed, of course, with some X users attempting to pick apart the compositional elements of the AI-generated spiral village. “It’s nice, but there are lots of decisions a human wouldn’t make,” wrote a graphic designer named Trent. “A lot of the shadows aren’t correct, and putting chimneys right above windows makes no sense. Zooming in there are also the tell-tale noise patterns of AI art.”
In June, we covered a technique that used the AI image synthesis model Stable Diffusion and ControlNet to create QR codes that look like rich artworks, including anime-inspired art. Ugleh took the same neural network optimized for creating those QR codes (which themselves are geometric shapes) and fed simple images of spirals and checkerboard patterns into it instead.
When guided by the prompt, “Medieval village scene with busy streets and castle in the distance (masterpiece:1.4), (best quality), (detailed),” ControlNet rendered scenes whose artistic elements matched the perceptual shapes of the spirals and checkerboards. In one image, the clouds arc overhead and people stand in a gentle curve to match the spiral guidance. In another, squares of clouds, hedges, building faces, and a wagon cart make up a checkerboard-shaped scene.
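Ugleh fed in simple spiral images as guidance, and a comparable conditioning image can be sketched in a few lines of Python with the Pillow imaging library. The function name and parameters below are illustrative, not Ugleh's actual setup:

```python
import math
from PIL import Image, ImageDraw

def make_spiral_control(size=512, turns=4, thickness=24):
    """Draw a white Archimedean spiral on a black background,
    suitable as a ControlNet conditioning image."""
    img = Image.new("L", (size, size), 0)  # grayscale, all black
    draw = ImageDraw.Draw(img)
    cx = cy = size / 2
    max_r = size / 2 - thickness  # leave a margin at the edges
    points = []
    steps = 2000
    for i in range(steps + 1):
        t = i / steps
        angle = t * turns * 2 * math.pi  # sweep `turns` full rotations
        r = t * max_r                    # radius grows linearly with angle
        points.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    draw.line(points, fill=255, width=thickness, joint="curve")
    return img
```

The resulting white-on-black image is what gets passed to ControlNet alongside the text prompt; the diffusion model then tries to satisfy both constraints at once, which is how the village's streets and clouds end up tracing a spiral.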
The magic of ControlNet
So how does it work? We’ve covered Stable Diffusion frequently before. It’s a neural network model trained on millions of images scraped from the Internet. But the key here is ControlNet, which first appeared in a research paper titled “Adding Conditional Control to Text-to-Image Diffusion Models” by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala in February 2023, and quickly became popular in the Stable Diffusion community.
Typically, a Stable Diffusion image is created using a text prompt (txt2img) or an image prompt (img2img). ControlNet introduces additional guidance that can take the form of information extracted from a source image, including pose detection, depth mapping, normal mapping, edge detection, and much more. Using ControlNet, someone generating AI artwork can far more closely replicate the shape or pose of a subject in an image.
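As a rough illustration of how this looks in code, here is a sketch using Hugging Face's diffusers library. The model IDs, parameter values, and function name are assumptions standing in for whatever settings Ugleh actually used; the QR-code ControlNet checkpoint named here is one community-trained model among several:

```python
def generate_spiral_village(control_image, prompt):
    """Sketch: run Stable Diffusion 1.5 with a QR-code-trained ControlNet,
    conditioned on `control_image` (e.g., a white-on-black spiral).
    Requires a CUDA GPU and downloads several GB of model weights."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # A community ControlNet trained on QR codes; this model ID is an
    # assumption, not necessarily the checkpoint Ugleh used.
    controlnet = ControlNetModel.from_pretrained(
        "monster-labs/control_v1p_sd15_qrcode_monster",
        torch_dtype=torch.float16,
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    return pipe(
        prompt,
        image=control_image,
        # How strongly the control image constrains the composition;
        # values around 1.0-1.5 are typical for this kind of effect.
        controlnet_conditioning_scale=1.1,
        num_inference_steps=30,
    ).images[0]
```

The key lever is `controlnet_conditioning_scale`: turn it down and the prompt dominates, producing an ordinary village; turn it up and the spiral shape overwhelms the scene.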
Using ControlNet and similar prompts, it’s easy to replicate Ugleh’s work, and others have done so to amusing effect, including checkerboard anime characters, an animation, medieval village “goatse” (surprisingly safe for work), and a medieval village version of “Girl with a Pearl Earring.”
Despite the massive attention and many offers to turn the artwork into NFTs, Ugleh has chosen to keep a low profile for now. On X, he said, “I appreciate all the positive feedback toward AI art, I do not plan on making money from my latest generations, and I will not be doing any official interviews. I am just a normal tech-savvy AI nerd who experimented with a new ControlNet technique.”
While the artwork is remarkable, current US copyright policy says that the images do not meet the standards to receive copyright protection, so technically they are in the public domain. While AI-generated artwork is still a contentious subject for many on ethical and legal grounds, enthusiasts continue to push the boundaries of what is possible for an unskilled or untrained practitioner using these new tools.