This comment chain appears to have a fundamental misconception of what constitut...

jacquesm · on Nov 13, 2023

There is a difference between what I with my layman-user cap on would consider unsafe (say, releasing purposefully harmful software into the environment or software with bugs like phantom braking) and what I with my embedded software programmer cap on (machine control, aerospace related) would consider to be unsafe. I spend a lot of my time reading root cause analysis on software failures and my only conclusion is that this is a very immature industry with a lot of players that should probably not be trusted as much as we do.

As for safety shims: for instance, watchdog timers are often used for this purpose, to bring a system that is exhibiting buggy behavior to behave more predictable. Personally I would consider any watchdog error that was not directly related to a hardware fault (say a bitflip) as a failure and I'd like to have my car report such failures.

Tesla being non-compliant is precisely my point: they get away with that because they don't have to open up their code to scrutiny. But I'll bet that if they would that their marketing department would not be able to spin stuff the way they do at present.

All of those would take into account the relevant context and I think that's where things go off the rails here: embedded software developers do not get to claim the high ground around what they consider safe or unsafe given the code from that world that I've had my eyes on. If anything it is amazing that it works as well as it does, the only two industries that get a pass based on my experience so far is medical embedded and avionics. Everybody else has the exact same problems as any other software project and would benefit from opening themselves up to scrutiny.

intern4tional · on Nov 13, 2023

> Tesla being non-compliant is precisely my point: they get away with that because they don't have to open up their code to scrutiny.

This is an inaccurate assumption. ASIL compliance is something that you can publicly state, and Tesla explicitly does not. Most automotive products that do follow such standards generally state such. (Example: https://www.aubass.com/products/cp_top.html - search for ASIL D)

Making something open source does not in any way make it safe, unless your enforcement plan is having lawyers look into it and in which case you end up with lawsuits and likely endless litigation that results in a closed platform again. Tesla's calculation is that no one will enforce safety controls or standards compliance on them, and to be fair to date they (Tesla) are right.

> Personally I would consider any watchdog error that was not directly related to a hardware fault (say a bitflip) as a failure and I'd like to have my car report such failures.

This is an opinion and does not properly account for the multitude of scenarios that must be dealt with in automotive. Automotive, unlike aerospace, rarely has failover hardware or in some cases even A/B partitions on the disk for failover in software. Add in the need to process signals in real time and you will encounter situations where use of a health check (watchdog timer) is an appropriate response, and the use of one should not need to be reported.

> embedded software developers do not get to claim the high ground around what they consider safe or unsafe given the code from that world that I've had my eyes on.

They (embedded software developers) don't make such claims; when something is safety certified, an external third party does the validation and verification and asserts that an implementation and processes are either safe or not. In the EU this is usually done by TUV (https://www.tuv.com/world/en/) or Horiba-Mira (https://www.horiba-mira.com/).

This gets extremely complex as this is often hard tied to the hardware (support for a safety island, how memory is managed on the SOC, etc) and the overall E/E architecture (selected messaging protocol for the backbone) and layout of the vehicle. Analyzing a system of systems to determine all the possible impacts and make sure that the chance of failure is small enough to be acceptable is a hard problem to solve and not one any single engineer does.

jacquesm · on Nov 13, 2023

>> Tesla being non-compliant is precisely my point: they get away with that because they don't have to open up their code to scrutiny.

> This is an inaccurate assumption. ASIL compliance is something that you can publicly state, and Tesla explicitly does not.

Sorry, but voluntary standards aren't standards, Tesla is incompliant with ASIL, if they were they'd definitely state that they are so you may as well assume that they're not. Personally I think any party that doesn't bother to state they are compliant should simply not be allowed to ship a vehicle because consumers are not going to be aware of the differences.

> Making something open source does not in any way make it safe, unless your enforcement plan is having lawyers look into it and in which case you end up with lawsuits and likely endless litigation that results in a closed platform again.

It does not guarantee safety. But it does more or less rest on the assumption that over the years at least some safety related bugs would be found and if there is anything that we've learned from open source over the last couple of decades then it is that if you look long and hard enough at even the most battle tested codebases that you will just uncover an ever lasting stream of bugs with the frequency reducing over time.

>> Personally I would consider any watchdog error that was not directly related to a hardware fault (say a bitflip) as a failure and I'd like to have my car report such failures.

> This is an opinion and does not properly account for the multitude of scenarios that must be dealt with in automotive.

Yes, it is my opinion as a software developer of many decades that if you rely on your watchdog timer to keep stuff running besides exceptional cases that you are doing it wrong. Imagine driving on the highway with one of your mirrors wedged against the guardrail for a close analogy of how I see this kind of 'engineering' practice.

A watchdog timer is the equivalent of ctrl-alt-del in case something stops working and while it is better than nothing and should definitely be present because it is still preferable from a system that is no longer responding at all (which is certainly going to be a safety issue) it should not be relied on for normal operation.

> Automotive, unlike aerospace, rarely has failover hardware or in some cases even A/B partitions on the disk for failover in software.

That's a cost decision, and with the cost of computation these days it is also absolute nonsense. A case could be made for this in the 80's but with hardware costing pennies this is simply no longer a valid excuse.

> Add in the need to process signals in real time and you will encounter situations where use of a health check (watchdog timer) is an appropriate response, and the use of one should not need to be reported.

I've been writing real time applications for a very long time and I highly doubt that such situations occur regularly but I'm open to having my mind changed, can you please explain exactly what kind of situation you have in mind where you think a watchdog timer expiring is an appropriate response?

For me a watchdog timer spells: the situation is such that we can no longer reliably function the safer option is to start all over again from a known set of defaults. It says that something unexpected has occurred that causes an operation that should have completed not to be completed and that this is outside of the design parameters that the software was originally specified with indicating that most likely the controller itself is at fault (and not the peripherals that it is attached to).

>> embedded software developers do not get to claim the high ground around what they consider safe or unsafe given the code from that world that I've had my eyes on.

> They (embedded software developers) don't make such claims; when something is safety certified, an external third party does the validation and verification and asserts that an implementation and processes are either safe or not.

Yes. And that process is anything but perfect. I've seen plenty of code that had passed certification that was so buggy it wasn't even funny. Including automotive. In an extreme cases someone thought it perfectly ok to do an OTA update on a vehicle in motion. I kid you not.

So let's not pretend certification is bullet proof even if it is useful it can miss glaringly obvious errors (time pressure, checkbox mentality).

> In the EU this is usually done by TUV (https://www.tuv.com/world/en/) or Horiba-Mira (https://www.horiba-mira.com/).

My experience is limited to the former. Let me recap that: I think their intentions are good but the bulk of the testing is limited to black box rather than in depth review and formal guarantees around performance. This has some interesting effects: it concentrates on the external manifestations of whatever makes the box tick and as long as the test parameters are exhaustive this will work very well. But for any device complex enough that the test parameters are only going to cover a fraction of the total parameter space you may end up with false confidence.

> This gets extremely complex as this is often hard tied to the hardware (support for a safety island, how memory is managed on the SOC, etc) and the overall E/E architecture (selected messaging protocol for the backbone) and layout of the vehicle.

Yes, again, I'm familiar with this and have some (but not complete) insight in how TUV operates when it comes to vehicle and component certification.

> Analyzing a system of systems to determine all the possible impacts and make sure that the chance of failure is small enough to be acceptable is a hard problem to solve and not one any single engineer does.

I think this is fundamentally borked. It will always be time and budget limited. Case in point: I recently reviewed some vehicle related stuff that had already been TUV certified that contained a glaring error in a complex control system, just looking at it from the outside gave me a fair idea of what I had to do to trip it up and sure enough it failed. TUV should have caught that (and the manufacturer too) if they were as safety conscious as they claim to be. I'm not saying that I'm outperforming TUV on a regular basis, I'm just saying that opening up this kind of code to more eyes, especially those that are more creative when it comes to breaking stuff, can - in my opinion - only be beneficial.

Edit: some more thinking about this: I think one of the reasons why I'm quite skeptical about for instance TUV is that in most countries that have large car manufacturers those manufacturers are 'too big to fail' and I would not be surprised at all if TUF (like BaFin) is not in a position strong enough to fail let's say a product line of a major manufacturer even if they find a massive error. It would immediately become a political football and in practice this gives manufacturers a lot of benefit of the doubt with respect to self regulation, besides the fact that such oversight entities are usually understaffed. TUV may well be with the best of intentions but the fact is that VW managed to bamboozle them in a way that any serious code audit including reproducible builds and something to verify that that was indeed what was shipped to customers should have caught.

But I don't see a massive undertaking to put all VW (and other manufacturers') code through the wringer beyond what was already uncovered simply because the only effect that uncovering such a scandal would would be to discredit the German car industry even further. So I don't think anybody is looking too hard.

intern4tional · on Nov 13, 2023

So, this will be my last response to this thread as I think it's run it's course.

> voluntary standards aren't standards

Most of the worlds standards work this way. They are standards, and it is up to various legislative bodies to decide how to enforce these things. In automotive, compliance with a standard is generally attested to a government and included in the package that is shared with other governments to allow import or sale of the car in their country. Tesla simply flaunts that.

> safety related bugs

This kind of thing isn't a thing if you understand automotive safety, or shouldn't be. You should have sufficient safety controls such that an unsafe condition will not occur. If this is a thing, you're talking about a bug then in the applied safety mechanism that allows an escape.

> watchdog timer expiring is an appropriate response?

Keys for SecOC get out of sync and throw an error. Not a safety problem per say, but your health check (since I consider watch dog timers an implementation of health and state management), you'd trigger a restart of the software to resync the keys.

> pretend certification is bullet proof even if it is useful it can miss glaringly obvious errors

I don't, but when it works it is sufficient. Open sourcing something adds nothing when it works. Importantly, usually TUV assumes liability for things they certify in many cases (not all, but generally that is how it works)

> limited to black box rather than in depth review and formal guarantees

We get the latter at my place from them, so I would poke at this area more if you think its black box only. This likely depends on the contractual terms, and who assumes liability.

> VW managed to bamboozle them in a way

The VW code is likely not safety relevant, so it wasn't reviewed as in depth. Most ECU code also isn't reproduceable even today.

> So I don't think anybody is looking too hard.

On this I generally agree as someone in this space. The amount of money invested in Pwn2Own is small given the barrier for entry: https://www.zerodayinitiative.com/blog/2023/8/28/revealing-t...

jacquesm · on Nov 13, 2023

> Keys for SecOC get out of sync and throw an error. Not a safety problem per say, but your health check (since I consider watch dog timers an implementation of health and state management), you'd trigger a restart of the software to resync the keys.

Ok, agreed in that case, though I'd prefer to see a forced reset rather than to rely on the watchdog timer as the mechanism to do it for you. You could just jump to the reset vector instead.