If you scroll down a bit there's a wireframe of the skeleton that's actually being animated, and you'll notice it lacks bones for the fingers and possibly even the hands. That's why the hands hold that weird pose throughout all the examples.
My gut says that the quality could be rapidly improved without changing the underlying design at all.
The real issue with this, I think, is that motion capture for humans is already widely available and provides much higher fidelity and control than text. Unless I'm misreading the paper badly, this model was trained on exactly such data. Blending between multiple animations through motion capture is also well-understood.
So while the results are impressive, the practical gains seem very marginal. Perhaps the equivalents of "inpainting" (as mentioned in the text) and "style transfer" would be the big gain here? If we could use this to quickly retarget animations to different body plans (child, adult, space monster), or for smarter interpolation between human-authored keyframes, I could see that becoming a much-desired tool.
I dunno, as an amateur animator and game developer this would be a huge help to me. I have a first gen Perception Neuron suit, and I even wrote an addon for Blender that retargets the Neuron output to the Rigify rig.
But it's cumbersome to put on, take off, and operate, especially when working alone. While I'm in pretty good shape, there are heaps of movements (e.g. martial arts, swordplay, firing/reloading a gun, etc.) that would probably look silly if I performed them. I can see this being very handy for prototyping animations, at the very least.
Replacing finger bone positions is pretty trivial in Blender as well using the Pose Library feature, so the lack of finger data isn't that big a deal.
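If you'd rather script it than click through the Pose Library UI, here's a minimal bpy sketch of the same idea. It assumes a generated Rigify armature named "rig", Rigify's default finger bone names, and a hand pose you've stored as per-bone quaternions; all of those are assumptions about your setup, not anything from the paper or its code.

    import bpy
    from mathutils import Quaternion

    rig = bpy.data.objects["rig"]  # assumed armature object name

    # A stored hand pose as per-bone rotations (values here are placeholders).
    relaxed_hand = {
        "f_index.01.L": Quaternion((0.98, 0.2, 0.0, 0.0)),
        "f_middle.01.L": Quaternion((0.98, 0.2, 0.0, 0.0)),
        # ... remaining finger bones of the stored pose
    }

    # Stamp the hand pose onto every frame of the imported mocap clip.
    for frame in range(1, 251):  # example frame range
        bpy.context.scene.frame_set(frame)
        for name, quat in relaxed_hand.items():
            pbone = rig.pose.bones[name]
            pbone.rotation_mode = "QUATERNION"
            pbone.rotation_quaternion = quat
            pbone.keyframe_insert("rotation_quaternion", frame=frame)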
The reason there are likely no finger joints is that a lot of motion capture data doesn't capture anything beyond the wrist.
So if they're training on the standard motion capture corpora that are available, and even mixing in their own, they likely won't have finger data to work from.
If you look very closely, the model does have wrists. (Most noticeable on the model's left arm, which is the viewer's right.)
Either way, the shoulder and elbow joints don’t move much during rope skipping, and no matter how it’s recorded, the motion capture data that was used for training should reflect that.
My best guess is that the model has picked up on the tiny arm motions that are present in rope skipping and wildly exaggerated them for some reason.
Today the issue with traditional animation is that all humans end up with the same height and the same proportions. In The Sims games, for example, all adults are exactly the same height even though you can choose different body shapes. If you use mods to change height or leg length, things no longer line up, like sitting on a chair. It also means chairs, beds, tables, and doors all have to be the same height.
I'm not sure this diffusion approach is the answer, but what's needed is some smarter way to extrapolate existing animations to bodies and objects that are 10-20% different and still look natural.
Which image/video are you referring to? As far as I can tell, the colors are not used to represent race, but to convey different kinds of information. For example, in Figures 1 and 3 of the paper, the color indicates different points in time. In the video, the colors indicate different motion generation methods. I would not classify "orange", "blue" and "purple" as Caucasian, but if you want even more colors, you can have a look at the original paper, where color coding was used to differentiate between different skinning methods (Figure 2).
Now that lighting and polygon counts and motion capture have become so crazy realistic in AAA games, I find it's actually motion (outside of cutscenes) that snaps me out of believability the most. RDR2 is the best I've ever seen, but boy there are still a lot of clunkers.
I never imagined ML or stable diffusion would be the answer here, but now I wonder if it will be? So much stable diffusion stuff has so far felt to me like just toys to play with, but modeling movement seems like it could make a gigantic difference in videogames and animation generally.
I wonder if you could just simulate a skeleton with realistic maximum torque values for the joints, factoring in the mass and center of mass of each segment. Obviously that doesn't tell you "how" to move each limb, but it could at least be used to train a network GAN-style so that the motion looks realistic. I feel like physically impossible movements are among the things that break immersion the most, and that's something you can model with a physics engine to put boundaries on a model trained on motion capture data.
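As a very rough sketch of the "physically impossible" check: estimate the torque a joint trajectory implies and flag frames that exceed a per-joint limit. The limits, inertias, and the single-joint tau = I * alpha model below are made-up assumptions for illustration (a real check would use published biomechanical values and proper inverse dynamics), not anything from the paper.

    import numpy as np

    # Assumed per-joint torque limits (N*m) and segment inertias (kg*m^2).
    TORQUE_LIMIT_NM = {"elbow": 70.0, "knee": 250.0}
    SEGMENT_INERTIA = {"elbow": 0.08, "knee": 0.35}

    def implausible_frames(joint_angles, joint, fps=60):
        """joint_angles: (T,) array of one joint's angle in radians."""
        dt = 1.0 / fps
        ang_acc = np.gradient(np.gradient(joint_angles, dt), dt)  # rad/s^2
        torque = SEGMENT_INERTIA[joint] * ang_acc                 # tau = I * alpha
        return np.where(np.abs(torque) > TORQUE_LIMIT_NM[joint])[0]

    # Example: a synthetic elbow trajectory sampled at 60 fps.
    elbow = np.deg2rad(30 * np.sin(np.linspace(0, 8 * np.pi, 240)))
    print(len(implausible_frames(elbow, "elbow")), "frames exceed the elbow limit")

The same count could then feed a penalty term, or a hard rejection, when scoring generated clips against the mocap training data.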
Try to find the email addresses of the authors of the scientific papers that go along with the models; there's usually a good chance that someone will answer if your request is reasonable.