The Hitchhiker’s Guide to Stable Diffusion

Stable Diffusion and text-to-image generators are the latest technological phenomena pushing the limits of AI

by Prince Addo

Photo Credit:
An image generated by Stable Diffusion.

William Gibson, best known for his novel “Neuromancer,” a founding work of the cyberpunk genre, was quoted in The Economist in 2003 with one of his most famous lines: “The future is already here; it’s just not evenly distributed.”

These words are as true now as they were 19 years ago, and Stable Diffusion is a prime example.

Stable Diffusion is a machine learning model developed by Stability AI that generates images from text, a task known as text-to-image generation. It is an open-source model, so other programmers can build their own programs on top of the original model.

Stable Diffusion is based on the latent diffusion model (LDM), which improves upon earlier diffusion models (DMs) by running the diffusion process in a compressed latent space instead of directly on pixels.
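For readers curious what "diffusion" means in practice, here is a toy sketch of the forward noising step that diffusion models learn to reverse. It is only an illustration under stated assumptions, not Stable Diffusion's actual code: the function names, the linear noise schedule, and the four-number "latent" standing in for a compressed image are all hypothetical choices made for this example.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return the running product of (1 - beta_t) for a linear beta schedule.

    The schedule values are assumptions picked for illustration.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Mix the clean values x0 with Gaussian noise at a given noise level."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)

# Hypothetical stand-in for a compressed (latent) image.
latent = [1.0, -0.5, 0.25, 0.0]

slightly_noisy = add_noise(latent, alpha_bars[0], rng)   # early step: mostly signal
mostly_noise = add_noise(latent, alpha_bars[-1], rng)    # final step: almost pure noise
```

A model of this kind is trained to undo the noising one step at a time; generation then starts from random noise and works backward toward an image. Working on a small latent rather than on every pixel is what makes the latent variant cheaper to run.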

The name of the model is a quip at the instability of earlier DMs. The model itself was trained on a subset of the LAION-Aesthetics V2 dataset using 256 Nvidia A100 GPUs, at a cost of around $600,000.
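As a rough sanity check, that cost can be turned into an estimate of training time. The sketch below assumes a hypothetical cloud price of $4 per A100 GPU-hour; that rate is an illustrative assumption, not a figure from Stability AI, and the real price paid may have been different.

```python
# Back-of-the-envelope estimate of training time from the reported cost.
TOTAL_COST_USD = 600_000         # approximate cost cited in the article
NUM_GPUS = 256                   # number of A100s cited in the article
ASSUMED_USD_PER_GPU_HOUR = 4.0   # ASSUMPTION: hypothetical hourly rate

def estimate_training_time(total_cost, num_gpus, usd_per_gpu_hour):
    """Return (total GPU-hours, wall-clock hours, wall-clock days)."""
    gpu_hours = total_cost / usd_per_gpu_hour
    wall_clock_hours = gpu_hours / num_gpus
    return gpu_hours, wall_clock_hours, wall_clock_hours / 24

gpu_hours, hours, days = estimate_training_time(
    TOTAL_COST_USD, NUM_GPUS, ASSUMED_USD_PER_GPU_HOUR
)
print(f"{gpu_hours:,.0f} GPU-hours, about {days:.1f} days on {NUM_GPUS} GPUs")
```

Under that assumed rate, the budget works out to roughly three and a half weeks of continuous training on all 256 GPUs. A cheaper or pricier hourly rate would stretch or shrink that estimate proportionally.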

Stable Diffusion is not the only model in the text-to-image generation game. 

Prior to the release of Stable Diffusion, OpenAI, an AI research lab co-founded by Elon Musk, Sam Altman, and others, released DALL-E, a text-to-image model similar to Stable Diffusion. Google also has two text-to-image models, Imagen and Parti.

The difference between Stable Diffusion and the other text-to-image models is that Stable Diffusion is not locked behind bureaucracy, ill intentions, or a profit-driven company; it is completely open-source, so anyone can access the model.

The recent innovations in text-to-image generation have raised questions about the ethics of these models.

Some of the images that the Stable Diffusion model was trained on did not have a Creative Commons license, and that is probably true for most text-to-image generators, since these models require a lot of images. As a result, artists do not get a chance to consent to their images being used to train an AI that will learn from and copy their works.

Is it fair that AI research labs and large tech companies use images that do not have an open license to train their models? Many companies and organizations are discouraging or outright banning the use of AI text-to-image generators for ethical reasons.

These recent models also might mark the beginning of the end of the art profession. At this year’s Colorado State Fair, a man entered an AI-generated artwork in the digital art competition; he ended up winning first place, which sparked controversy about the effect of AI-generated artwork on the art industry.

“We’re watching the death of artistry unfold before our eyes,” one Twitter user stated. 

Although there is a real possibility that art could one day be generated entirely by AI from just a description, we are far from that point.

The model has several limitations, one of them being the generation of human limbs and faces. Full-body images of people often have missing limbs, and faces look blurred and inhuman, so these models cannot be used to generate accurate depictions of people or other subjects where details matter.

It is only adept at generating general images that focus less on accuracy. 

History often repeats itself, and these current events can be compared to ones that happened around 200 years ago. 

In the early 19th century, there was a revolution in the textile industry that allowed producers to make textiles at a remarkable pace and for very little money, but this disrupted the existing industry and forced textile workers to leave their trade.

To rebel against the textile mills, a group of textile workers organized themselves and started destroying the mills; these people were referred to as Luddites. The government soon crushed the resistance, and the textile industry kept improving and innovating.

This bit of history essentially showcases a group of people who rebelled against a technological advancement because it inflicted short-term damage upon their livelihoods. However, if one considers the long-term effects, it’s worth noting that the textile mills made textiles extremely cheap, which was a net positive for society. 

The old textile workers didn’t benefit from the new technology immediately (they actually suffered from it), but as time went on, they received far more benefits than they would have under the old system.

The same could be said about these new AIs, particularly Stable Diffusion. Such a system may have a negative effect on artists in the short term, but in the long run, this is a win for everyone in society, including artists. 

Since Stable Diffusion is open-source, there are many places you can try it out online, including an official site. Enjoy!


Categories: Arts
