For practical context, Stable Diffusion 2.X was trained on LAION-5B as opposed to LAION-400M for Stable Diffusion 1.X.
At the least, Stable Diffusion 2.X is better at pain points of image generation such as text legibility and hands, potentially due to having more data points.
But that's the same for everything that has structure. A small section of an arm is much more likely to have another small section of an arm next to it than to have a hand, yet SD's arms are usually well-proportioned.
SD 2 also removed a lot of images of humans from its training data out of fear of people generating CSAM, so quality has actually gotten worse than SD 1 for anything resembling humans.
Double-checked, and both the initial comment and the correction are incorrect: the original v1.1 was trained on LAION-2B, then subsequent versions were finetuned on the aesthetics subset.
Either way, the main point is the same: more training data gives better results.