
For practical context, Stable Diffusion 2.X was trained on LAION-5B as opposed to LAION-400M for Stable Diffusion 1.X.

At the least, Stable Diffusion 2.X is better at pain points of image generation such as text legibility and hands, potentially due to having more data points.



The problem with hands is probability.

It's more probable that a finger has a finger on both sides of it than not, so the model diffuses lots of adjacent fingers.


But that's the same for everything that has structure. A small section of an arm is much more likely to have another small section of an arm next to it than to have a hand, yet SD's arms are usually well-proportioned.


There's a lot of loooooong necks, though.


No the problem with fingers is that they resemble hotdogs and the AI really likes hotdogs so you get a lot of fingers.

I can make things up too!


SD 2 also removed quite a lot of images of humans out of fear of people generating CSAM, so quality for anything resembling humans has actually gotten worse compared to SD 1.


2.0 removed too many of them due to a bug in the NSFW filter. 2.1+ should be better again.

But they’re harder to control without negative prompting.


This is incorrect. Stable Diffusion 1.x was trained on "laion-improved-aesthetics" (a subset of laion2B-en).


Double-checked, and both the initial comment and the correction are incorrect: the original v1.1 was trained on LAION-2B, then subsequent versions were fine-tuned on the aesthetics subset.

Either way, the main point is the same: more training data gives better results.

https://github.com/CompVis/stable-diffusion#weights


1.1 wasn’t public. Public releases were trained as I said.




