This Stable Diffusion AI art generation thingy is seriously the most fun thing I've ever done on a computer.
You can find plenty of articles about it floating around. Go google it. Here's Wikipedia's page if you're interested.
A lot of people are worried about it. Artists think it'll take their jobs. Politicians think it'll make it easy for the people to generate deepfakes of people doing things they didn't do, which is the government's job.
But it's open source, and the genie's out of the bottle, and well, I got my copy and I'm having a rollicking good time and you can't stop me.
What really matters is: you type words in a box, and it generates art.
In about three seconds.
Sometimes photographs. Sometimes paintings. You can be as specific as you like. If you're not specific, you get some pretty gnarly results. It's a bit like talking to an autistic toddler who has the combined knowledge of every artist who ever lived somewhere in his brain, but is just as happy to splash around with fingerpaints.
And it really struggles with hands. And arms. And legs.
And people doing things.
It helps to tinker with settings. It helps to get really, oddly specific with your requests.
When presented with the infinite power to generate any image you like, it can be a challenge to come up with something.
Tonight I thought I'd generate an image of one of my heroes: Canadian pianist Glenn Gould.
I thought this would be a good idea because Stable Diffusion (SD from here on out) struggles with people doing things with things.
It's great with portraits (especially if they don't involve arms, legs, or hands). It's great with landscapes. Pretty decent with objects.
It's not so good with swordfights, or someone smoking a cigarette. It's kind of failed to be the infinite porn generator people were excited/worried about, unless you like basic pinups. But it was trained on a gorillion images floating around the web and I figured most of the pictures of Glenn Gould would involve a piano, so...
Here's Glenn Gould playing a piano.
His face isn't perfect. It looks kind of like him.
But again, this model didn't memorize all those billions of pictures. It applied some math to them until it had a pretty good idea of what a request for "Glenn Gould" might look like, and then it did its best to deliver the results. That's how all those pictures "fit" into a model that's only 4 GB in size, which I downloaded. And which you can, too. (This still seems unreal.)
It gets a little confused about the piano. But what the hell? Each of these pictures only took two seconds to generate, so why not make a ton of them and pick out the best?
So I think, okay, how can I get something interesting out of this prompt? What else might Glenn Gould play? Certainly he'd recorded some pipe organ pieces.
Not bad. It looks more like him, and the keyboards are all oriented in the proper, horizontal direction. It's even got something that looks like pipes above the keys.
So, now that I've got an idea that generates a person doing something, somewhere, let's run with it.
What else could Gould do with those magnificent hands?
Maybe he could drive a train.
No, that's not quite it. He's not so much a conductor as an unruly passenger, here. Keep it down, buddy. If I wanted to hear Bach on my morning commute I'd listen to Spotify!
A second attempt came out a little better.
At this point, the second whiskey started to kick in. (I don't have a lot of free time. I have to combine my hobbies.)
Let's think big. SD can paint anything. Why not send Gould to space?
Here's "Glenn Gould piloting a spaceship."
And it's quite good. He has mastery over those Apollo 13-era controls.
But I'm tired of this B&W vintage stuff. I like classical music and Sci-Fi and I want it rendered in HD for a modern audience. It's time to get a little more aggressive with our prompts.
I head over to Lexica to see what kind of prompts are getting good results, and hit Rentry's page for a visual list of artist styles that might tickle my aesthetic sensors. Then I lean down close to SD's autistic ears and whisper: "glenn gould piloting a spaceship, modern style, detailed face, by Brad Kunkle, digital painting, concept art, smooth, sharp focus illustration, trending on artstation"
(Props to Brad Kunkle, his prompt always gives me compelling results.)
SD gives me this:
Hmm. It's a closer approximation of his look. I like the way it's merged the idea of Gould's recording headphones with a space helmet. But we're back in basic portrait territory. He's not doing anything.
OK then, let's get specific about the background. "glenn gould piloting a spaceship, modern style, detailed face, by brad kunkle, in front of an insanely detailed spaceship interior, control panels, monitors, computers, digital painting, concept art, smooth, sharp focus illustration, trending on artstation"
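Prompts like this get long enough that it helps to assemble them from parts, so you can swap the background or the artist without retyping everything. Here's a minimal sketch in Python. The `build_prompt` helper and its argument names are made up for illustration and aren't part of SD or any front end; all it does is glue fragments together with commas.

```python
def build_prompt(subject, modifiers=(), artists=(), background=(), tags=()):
    """Join prompt fragments into one comma-separated string.

    Hypothetical helper: SD just receives the final text.
    """
    parts = [subject, *modifiers]
    parts += [f"by {name}" for name in artists]
    parts += list(background)
    parts += list(tags)
    # Drop empty fragments so stray commas don't sneak in.
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    subject="glenn gould piloting a spaceship",
    modifiers=["modern style", "detailed face"],
    artists=["brad kunkle"],
    background=[
        "in front of an insanely detailed spaceship interior",
        "control panels",
        "monitors",
        "computers",
    ],
    tags=[
        "digital painting",
        "concept art",
        "smooth",
        "sharp focus illustration",
        "trending on artstation",
    ],
)
print(prompt)
```

This reproduces the prompt above word for word. Changing one list entry gives you the next experiment.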
The first result is evocative, even if it's just another portrait. (And it missed the meaning of "interior.")
Another one looks more promising.
He's at the controls. He even has the right number of fingers on the left hand.
OK, so we're onto something good. At this point, it's time to get serious. Let's turn up the CFG scale from 7 to 11. This means that SD will interpret my prompt more strictly. I think. No one really knows. This stuff is indistinguishable from magic. But go too high with the CFG and the results come out looking a little burned in. Experimenting between 5 and 18 gives different results with different prompts, some better higher, some lower.
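For what it's worth, the textbook description of CFG (classifier-free guidance) is simple enough to sketch. At each step the model makes two noise predictions, one that ignores the prompt and one that uses it, and the scale says how far to push from the first toward (and past) the second. The toy below uses three made-up numbers in place of a real prediction; the function name and the values are illustrative assumptions, not anything pulled out of SD itself.

```python
def apply_guidance(uncond, cond, cfg_scale):
    """Classifier-free guidance on a toy 'prediction' (a list of floats).

    uncond: prediction made without the prompt
    cond:   prediction made with the prompt
    Result: uncond + cfg_scale * (cond - uncond), element by element.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up numbers standing in for a real noise prediction.
uncond = [0.0, 1.0, -1.0]
cond = [0.5, 1.0, -2.0]

for scale in (1, 7, 11):
    print(scale, apply_guidance(uncond, cond, scale))
# 1 [0.5, 1.0, -2.0]
# 7 [3.5, 1.0, -8.0]
# 11 [5.5, 1.0, -12.0]
```

At a scale of 1 you just get the prompt-conditioned prediction. Higher scales exaggerate whatever the prompt changed, which is one plausible reason very high values start to look overcooked.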
What does seem to improve things (again, sometimes, not always) is increasing the number of steps SD uses to process your prompt.
The way this works is, I think, probably, like: it starts from static, then gets closer and closer to its interpretation of your request with each step. More steps should mean a more...considered and detailed interpretation. But sometimes, too many steps produce horrors. Sometimes, fewer steps give a softer, more impressionistic or illustrative look. In this case, I had good luck.
Here's low CFG, 20 steps:
Not bad. But at high CFG with 60 steps:
I think it's actually quite good. He's playing those controls like a piano, which is what I wanted.
Of course, these extra steps come with an increase in demand for processing power. Instead of generating this image in two seconds, it takes seven. Sigh. [[Sips whiskey.]]
But more of the results were interesting.
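If you'd rather script this kind of comparison than click through a UI, the same two knobs exist in Hugging Face's `diffusers` library as `guidance_scale` and `num_inference_steps`. The sketch below is an assumption-heavy illustration, not the setup used for the images in this post: the model ID, the fixed seed, the helper names, and the guess that "low CFG" means 7 and "high CFG" means 11 are all mine. The `generate` function needs `torch`, `diffusers`, a CUDA GPU, and the downloaded weights, so nothing heavy runs until you call it.

```python
def pipeline_kwargs(cfg_scale, steps, width=512, height=512):
    """Map the UI-style names used in this post onto diffusers' argument names."""
    return {
        "guidance_scale": cfg_scale,
        "num_inference_steps": steps,
        "width": width,
        "height": height,
    }


# Assumed values for the two settings compared above.
LOW = pipeline_kwargs(cfg_scale=7, steps=20)
HIGH = pipeline_kwargs(cfg_scale=11, steps=60)


def generate(prompt, settings, seed=0, model_id="CompVis/stable-diffusion-v1-4"):
    """Render one image with diffusers (imports are lazy on purpose).

    The seed is fixed so that only the settings differ between runs.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)
    return pipe(prompt, generator=generator, **settings).images[0]
```

Something like `generate("glenn gould piloting a spaceship", HIGH).save("gould_high.png")` would then render one image. Reloading the pipeline on every call is wasteful, so for a real batch you'd load it once and reuse it.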
OK, now it's time to expand the canvas a little.
SD was trained on images of 512x512 pixels. Anything too far afield from that generates repeated figures and odd distortions. But I've had pretty good luck with a gentle landscape orientation.
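One way to stay close to the training size is to pin the short side at 512 and let only the long side grow. Here's a small sketch of that idea. The `canvas_size` helper is hypothetical, and snapping to multiples of 64 is a conservative assumption on my part (the v1 models want dimensions divisible by 8, and many front ends step their sliders in larger increments).

```python
def canvas_size(aspect_ratio, base=512, multiple=64):
    """Return (width, height) with the short side fixed at `base`.

    aspect_ratio is width / height. The long side is rounded to the
    nearest `multiple` so the size stays friendly to the model.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")

    def snap(value):
        return max(multiple, round(value / multiple) * multiple)

    if aspect_ratio >= 1:
        return snap(base * aspect_ratio), base
    return base, snap(base / aspect_ratio)


print(canvas_size(1.0))    # (512, 512)
print(canvas_size(3 / 2))  # (768, 512)
print(canvas_size(4 / 3))  # (704, 512)
```

A 3:2 or 4:3 canvas is about as "gentle" as landscape gets. Push the ratio much further and you're back in repeated-figure territory.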
At one point Gould came face to face with himself...in space!
I liked this one, which looks like it's out of Independence Day. Gould (instead of Will Smith) piloting a jet against the alien aggressors! While interpreting a Bach Fugue with MIDI precision with his goddamn eyes closed OMG!
Now that we've hit a real vein of exciting images, the true fun begins.
That means adding more artists to the mix and seeing how their styles change the results.
Just Brad Kunkle
Brad Kunkle and John Bauer
Brad Kunkle and John Bauer and Rebecca Guay
Brad Kunkle and John Bauer and Rebecca Guay and Hive's own Peter Gric!
And here is one just "by" Peter Gric (who I hope will forgive me, as I see he has been running his own experiments with generative art).
And a final one from Peter, who brings Gould back to The Keyboard, where he ultimately belongs; but keeps him in the skies, where I want him. (I mean holy f@ck!ng sh!t a spaceship with a piano keyboard in the dash? Sign me up for space force...)
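The artist-stacking experiment above is easy to script: keep the subject fixed and add one name at a time. A minimal sketch follows. Joining the names with "and" is just my assumption about one reasonable phrasing (a comma-separated list of "by" clauses is another), and the helper names are made up.

```python
ARTISTS = ["Brad Kunkle", "John Bauer", "Rebecca Guay", "Peter Gric"]


def artist_clause(names):
    """'by A and B and C' for a list of artist names."""
    return "by " + " and ".join(names)


def stacked_prompts(subject, artists):
    """One prompt per prefix of the list: one artist, then two, and so on."""
    return [
        f"{subject}, {artist_clause(artists[:n])}"
        for n in range(1, len(artists) + 1)
    ]


for p in stacked_prompts("glenn gould piloting a spaceship", ARTISTS):
    print(p)
```

Run each of the four prompts with the same seed and settings, and any change in the picture comes from the added name.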
There is so much potential with this technology that the mind reels.
I've been playing with this for the past month or so, and the world feels like a different place than it was in early September.
The Wife is excited about it too. As she has actual artistic talent, I suspect she'll make better use of this tool than I can. And there's so much more to experiment with: image-to-image prompts, inpainting, alternate and custom-trained models, upscaling...
Have you encountered this technology? Are you an artist? How do you feel about it?
I appreciate its ability, but as an image maker myself, I feel like this tech does sit me out. Rather, I must create an image in a different way than what I'm familiar with. I could see a few folks I know interested, though. @nonameslefttouse comes to mind.
I have plans to blend my media. I wonder if they'll have such impressive results. I still feel inspired to experiment more...
Could be fun to play with but could also send me down a path of dependence. Much like how I depend on a specific toolset to create now, as in a specific art program and personal brush settings for instance. Not sure how I'd feel about my style becoming dependent on things beyond my control. It is fascinating though.
This style of mine presented here over the years often depends on triggering the viewer to experience pareidolia as well. Not sure how AI would handle illusions like that.
Thanks for the mention. I'll reblog this. Nice post.