The car is parked IF it is on the side of the road AND it hasn't moved in X time, AND ... But not if... etc.
Software 2.0:
The car is parked if the neural net says so.
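To make the contrast concrete, here is a toy sketch. Everything in it is made up for illustration: the two features (distance to the curb, minutes stationary), the thresholds, and the eight labeled examples. A single perceptron stands in for the neural net, which is far simpler than anything a real driving stack would use.

```python
# Software 1.0: the programmer writes the rule. Thresholds are invented.
def is_parked_rule(dist_to_curb_m, stationary_min):
    return dist_to_curb_m < 1.0 and stationary_min > 1.0

# Software 2.0: the programmer supplies labeled examples and a learner
# finds the rule. Each example is
# ((distance to curb in meters, minutes stationary), parked?).
EXAMPLES = [
    ((0.3, 5.0), True),
    ((0.5, 10.0), True),
    ((0.4, 3.0), True),
    ((0.2, 8.0), True),
    ((3.0, 0.1), False),
    ((0.4, 0.1), False),   # at the curb, but only just stopped
    ((3.5, 5.0), False),   # stationary for a while, but mid-road
    ((2.5, 0.5), False),
]

def score(weights, bias, features):
    return sum(w * x for w, x in zip(weights, features)) + bias

def train_perceptron(examples, max_epochs=1000):
    """Classic perceptron: nudge the weights on every misclassified example."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, parked in examples:
            target = 1.0 if parked else -1.0
            if target * score(weights, bias, features) <= 0:
                weights = [w + target * x for w, x in zip(weights, features)]
                bias += target
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias

def is_parked_learned(weights, bias, features):
    return score(weights, bias, features) > 0

weights, bias = train_perceptron(EXAMPLES)
```

In the first version the knowledge lives in the `if` condition; in the second it lives in the labeled examples, and the learned weights are only as good as those labels.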
Okay, if software 2.0 is all about thinking at a higher level and training the neural net to deal with the details, why is the focus on details like "is the car parked", "is it raining", or "where is the lane marker"? Why can't we train on "this is good driving" / "this is bad driving"?
As a software 1.0 programmer I can see how that seems completely unreasonable, but it does seem to follow the logical direction of the talk.
What you're describing is called "end-to-end learning". The trend has been that a lot of systems that used to be cobbled together from separate parts (e.g. speech synthesis, translation, image recognition) have been converted to models (mostly neural nets) that are learned end-to-end. Autonomous driving is not an exception, and they might still get there, but you have to put your pragmatic hat on and do whatever works right now.
Benefits of the current modular approach include more interpretable results and potentially improved safety guarantees (e.g. you can limit a failure to a single subsystem).
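A rough sketch of what "limit the failure to a subsystem" could look like in a modular pipeline. All names here (`Perception`, `run_subsystem`, `perceive`, `plan`, the action strings) are hypothetical and not taken from the talk or from any real driving stack; the detectors are placeholders for separately trained models.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Perception:
    # Named intermediate results that a human can inspect and log.
    is_raining: Optional[bool]
    lane_offset_m: Optional[float]
    parked_car_ahead: Optional[bool]

def run_subsystem(name, detector, frame, failures):
    """Run one detector; record a failure instead of crashing the pipeline."""
    try:
        return detector(frame)
    except Exception as exc:
        failures[name] = repr(exc)
        return None

def perceive(frame, detectors):
    failures = {}
    perception = Perception(
        is_raining=run_subsystem("rain", detectors["rain"], frame, failures),
        lane_offset_m=run_subsystem("lane", detectors["lane"], frame, failures),
        parked_car_ahead=run_subsystem("parked", detectors["parked"], frame, failures),
    )
    return perception, failures

def plan(perception):
    # Hand-written fallback: without a lane estimate, do the cautious thing.
    if perception.lane_offset_m is None:
        return "slow_down_and_alert"
    if perception.parked_car_ahead:
        return "steer_around"
    return "keep_lane"
```

If the lane detector throws, `failures` says exactly which part broke and the planner can fall back to a conservative action. A single end-to-end "good driving" model has no such named intermediate outputs to inspect.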
Most humans generally don't learn driving only end-to-end. They are taught how the various components of the car work, the traffic rules, conventions, etc.
Most people also do some experimentation and calibration: checking how much empty space is left after parking, driving in circles in a snowy parking lot until the car spins, etc.
One possible reason (from the video) might be that software still has to follow the law. Good driving might imply having to push the boundaries of the law from time to time, while self-driving machines cannot do that. So they have to hardcode those rules. On the other hand, they should also try to counterbalance human bias (protecting only the driver in an accident comes to mind).
You must start with labeled data. It is easier to label pictures of parked cars than it is to label pictures of good/bad driving. For labeled video, the dimensionality is out of reach of ML for now, and would add lag to your system.
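Back-of-the-envelope numbers for the dimensionality point. The resolution, frame rate, and clip length below are assumptions picked for illustration, not figures from the talk.

```python
# Assumed camera settings (illustrative only).
WIDTH, HEIGHT, CHANNELS = 1280, 720, 3
FPS, CLIP_SECONDS = 30, 10

# One still image, e.g. "is this car parked?"
values_per_frame = WIDTH * HEIGHT * CHANNELS

# One video clip, e.g. "was this good driving?"
frames_per_clip = FPS * CLIP_SECONDS
values_per_clip = values_per_frame * frames_per_clip

print(f"one frame: {values_per_frame:,} input values")
print(f"one clip:  {values_per_clip:,} input values")
```

Under these assumptions a single frame is about 2.8 million input values, while a 10-second clip is about 829 million, 300 times as many for one label. One reading of the lag point is that a model judging whole clips cannot answer until the clip has been recorded, so it is always at least `CLIP_SECONDS` behind.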