Hacker News
Resources about programming practices for writing safety-critical software (github.com/stanislaw)
208 points by AlexDenisov on March 6, 2017 | hide | past | favorite | 84 comments


The obsession with C/C++ here is really weird. Like, take the MCO failure. That's a classic, textbook problem that can be structurally guaranteed not to happen with use of even a basic type system. It should be literally impossible to confuse values of different types/units/dimensions like this in something described as "safety-critical".

It seems like all the resources here are concerned with trying to whittle C/C++ into an appropriate choice of tool, rather than choosing a different tool. It seems like a 1980s-1990s mindset.


As I understood the MCO failure, a better type system wouldn't have helped, as the issue was that another program expected metric input while the first program outputted US units.

Units can help verify that a formula within a program is correct. For example,

    velocity v = 0.5 * metric_acceleration * t * t; cout << v.to_metric();

won't compile, because the right-hand side has the dimensions of a distance, not a velocity.

But it won't help with Program 1:

    velocity_imperial v = 0.5 * metric_acceleration * t * t; cout << v;

Program 2:

    velocity_metric v; cin >> v; BurnFor(doSomeRocketScienceToCalculateEngineBurn(v));
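One way to honor that advice at a program boundary, sketched in C++ with hypothetical names (this is not how MCO's actual ground software worked): make the unit tag travel with the number, so the receiving program can check the contract instead of silently assuming metric.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch: the unit tag is serialized alongside the value,
// so the consumer can reject input in the wrong unit.
struct MetricVelocity { double mps; };  // meters per second

std::string serialize(MetricVelocity v) {
    std::ostringstream out;
    out << v.mps << " m/s";             // value plus explicit unit
    return out.str();
}

// Refuses input whose unit tag does not match the expected "m/s".
bool parse_metric(const std::string& text, MetricVelocity* out) {
    std::istringstream in(text);
    double value = 0.0;
    std::string unit;
    if (!(in >> value >> unit) || unit != "m/s") return false;
    out->mps = value;
    return true;
}
```

A consumer built this way would have loudly rejected a table of values tagged in lbf·s instead of silently ingesting them.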


Don't pass numbers between programs without units.


The issue wasn't units. It was a break in the (programming) "contract". Catching such bugs at runtime could introduce further (logic-related string-parsing) bugs.

A much more robust method (especially considering that this table was prepared in advance) would have been to have someone else independently duplicate the work and compare results.


Don't use non-SI units.


That doesn't solve everything. There are still textbooks published that use CGS instead of MKS units. Both of those are metric. Non-SI units are widely used in space (e.g., arcseconds).


People use non-SI units.


C/C++ is almost the only choice on most embedded systems, which is where most safety critical code lives.


That doesn't make any sense: Pascal, Ada, and the like were around in the 1980s and ran on far weaker hardware, which means the fundamental problem can't be performance on any modern embedded chip. Besides which, you don't trade away stability for performance on a safety-critical chip!


It's not a technical challenge, but a political/business one. C/C++ won out.

> you don't trade away stability for performance on a safety-critical chip!

In a sane world, no. But we don't live in a sane world. We live in one where (on one system I worked on) it turned out that a plane having an overheat (not a fire) on one engine would cause the overheat on the other engine to fail to report to the pilots (since corrected, or I'd name and shame). And that wasn't just an issue of language, but of sound (or in this case unsound) logic. No one sits down and develops these things correctly. 10k lines of code (at most) on that project, and most of it was just cobbled together in an ad hoc fashion.


> embedded systems, which is where most safety critical code lives

For how much longer?

The machines running self-driving cars aren't tiny little processors running single-threaded code. They're basically full server racks' worth of compute, with multi-core CPUs, GPUs, and who knows what else.

The current approach of "use crufty-but-trustworthy hardware and never do anything too complex" doesn't scale to the next generation of "embedded".


Self-driving machine code is inherently unsafe, as we don't have observability into the neural networks that run the most advanced models. The safety-critical portion of the code focuses on running the base functions of the car.

One of the major hurdles to real Level 5 systems is proving their correctness.


> The safety critical portion of the code focuses around running the base functions of the car.

There's the rub. They don't meet the standards of safety-critical code, but they are safety critical.

I'm not sure how this challenge will be addressed, but I doubt the answer is "write everything in C". That approach works when your code is relatively simple, but doesn't scale when the code is actually extremely complex.


But the deep networks and such that power a self-driving car aren't written in C anyway. Or are they?


C++ skills are strongly desired by self driving car companies

https://hackernoon.com/five-skills-self-driving-companies-ne...


They are but that doesn't make them analyzable... Dealing with C craziness AND dnn craziness is silly. And the latter craziness is essential.


They most likely are written in C/C++ because the training algorithms are computationally intensive.


A team of PhDs using modern control theory can prove stability of an aerospace control loop across all different operating conditions.

A team of software engineers can then implement that, with static analyzers and run-time checkers during development to give confidence in the implementation.

Can you prove stability of a deep learning system across all possible operating conditions it may encounter?
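For the linear time-invariant case, the kind of guarantee meant here can be written down concretely; this is the textbook Lyapunov condition, not anything specific to the systems under discussion:

```latex
% For \dot{x} = A x, the origin is globally asymptotically stable iff,
% for any Q \succ 0, the Lyapunov equation
A^{\mathsf{T}} P + P A = -Q
% has a solution P \succ 0. Then V(x) = x^{\mathsf{T}} P x satisfies
% \dot{V}(x) = -x^{\mathsf{T}} Q x < 0 for all x \neq 0, certifying
% stability for every operating point of the linearized loop.
```

No comparable certificate exists today for a deep network closing the same loop, which is the point of the question.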


Can you do the perception tasks that self driving platforms do using only control theory?


I'm not talking about self-driving at all. I took your earlier comment to mean that you thought the future of avionics control and other classical embedded problems would be using DNNs. Maybe that's not what you meant.


No. That would be crazy. Well, maybe not entirely, as planes become more autonomous and airspace more crowded (delivery drones and so on), but we are still a long way from that.

I was thinking more of self driving though.


There are options available, namely Basic, Pascal, Ada, Java, and Oberon compilers, all the way down to microcontrollers (PIC and Cortex classes).

The main problem is that C and C++ won the political war and many devs don't even bother to look elsewhere.

Actually, even C++, in spite of its safety improvements over C, has issues catering to embedded devs.

Thankfully the IoT of shame is already making people aware that another path has to be taken.


Interestingly there was a comment about an Oberon Embedded IDE today: https://news.ycombinator.com/item?id=13806367


How would a basic type system protect against incorrectly interpreting an imperial floating-point value as a metric floating-point value? That seems like an especially weak example, and it fundamentally falls under the realm of logic faults endemic to every possible programming language.

There are legitimate gripes about C/C++, especially in a space with hostile actors and unknown inputs, but that example was particularly weak.


> How would a basic type system protect against incorrectly interpreting an imperial floating point value as a metric floating point value

"Better" / richer / more refined type systems than C/C++/Java/C#/Go/etc. aim to make it ever more convenient, effortless, and free of cost to denote such different units as uniquely distinct types (yet 'compatible' when explicit conversion is expressly, visibly, and searchably called for), so that source code cannot mismatch them accidentally without the compiler catching it. However, the issue remains that we don't, and possibly can't, have type systems that also enforce such a style (rather than just relying on a developer's or team's own discipline to adhere to the convention even in the face of deadline and budget/schedule pressures) unless we actually forbid primitives like lone, semantically ambiguous ints and floats outright. Similar to "bool blindness", there's the general issue of "primitive/scalar-type blindness" always lingering. Probably some PhD candidate will write a Haskell extension for such a scenario some day soon.

Though of course the next thing the deadline-driven developer will do is construct a single "semantic" int type used for all the different semantic kinds of "ints"...


For what it's worth, the C++ type system (which is more powerful than most, actually) makes it simple to encode units (including composite units, exponents, and ratios) and to compute the exact units of complex expressions. See Boost.Units or the std::chrono design. And with zero-cost abstraction and little syntactic overhead, of course.
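A stripped-down sketch of the idea (not Boost.Units itself, which is far more general): dimension exponents live in the type, and arithmetic operators compute the result's dimensions at compile time, so mixing incompatible quantities fails to compile.

```cpp
// Minimal sketch of compile-time dimensional analysis.
// L and T are the exponents of meters and seconds: meters^L * seconds^T.
template <int L, int T>
struct Quantity {
    double value;
};

using Meters          = Quantity<1, 0>;
using Seconds         = Quantity<0, 1>;
using MetersPerSecond = Quantity<1, -1>;

// Multiplication adds exponents; the result type carries the derived unit.
template <int L1, int T1, int L2, int T2>
Quantity<L1 + L2, T1 + T2> operator*(Quantity<L1, T1> a, Quantity<L2, T2> b) {
    return {a.value * b.value};
}

// Division subtracts exponents, e.g. speed = distance / time.
template <int L1, int T1, int L2, int T2>
Quantity<L1 - L2, T1 - T2> operator/(Quantity<L1, T1> a, Quantity<L2, T2> b) {
    return {a.value / b.value};
}
```

With this in place, `MetersPerSecond v = Meters{100.0} / Seconds{9.58};` compiles, while `Meters m = Meters{1.0} * Seconds{1.0};` is a type error, all with zero run-time cost.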


The problem is getting embedded devs to use it, or when they do use it, not to write "C with C++ compiler" code.


Of course, but it would be even harder to get them to use any language other than C or C++.


Andrew Kennedy's PhD thesis was about extending Hindley-Milner type inference to support units of measure. It basically boils down to extending unification to support equations over free abelian groups, which had been solved earlier in an abstract context. The approach could be integrated into just about any HM-derived language, and is shipping as part of F#.



> How would a type system protect against unit confusion?

This is a mistake that is possible to make in any language, but nevertheless it is also a problem that could be caught by a sufficiently good type system, if it were put to sufficiently good use. In fact, part of the justification for allowing user-defined literals in C++ was precisely to make it easy and convenient to avoid mistakes like the Mars Climate Orbiter's. Bjarne Stroustrup himself used that as an example in at least one talk he gave about C++11.
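A sketch of that user-defined-literal approach (illustrative names, not a quote of Stroustrup's exact example): the unit is visible at the call site, and an interface that takes Meters cannot silently accept Feet.

```cpp
// Distinct wrapper types: no implicit path between them.
struct Meters { double value; };
struct Feet   { double value; };

// Literal operators make the unit part of the literal itself.
Meters operator"" _m(long double v)  { return {static_cast<double>(v)}; }
Feet   operator"" _ft(long double v) { return {static_cast<double>(v)}; }

// Conversion must be spelled out explicitly.
Meters to_meters(Feet f) { return {f.value * 0.3048}; }

// Accepts only Meters; passing 500.0_ft here is a compile-time error.
double descent_time_s(Meters altitude) { return altitude.value / 100.0; }
```

So `descent_time_s(500.0_ft)` fails to compile, while `descent_time_s(to_meters(500.0_ft))` makes the conversion visible and searchable.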


A simple example would be instead of this:

    int temperature_a;
    int temperature_b;
or even:

    Temperature a;
    Temperature b;
you would have:

    Celsius a;
    Fahrenheit b;
And an assignment between a and b would be an error, as Celsius and Fahrenheit are different types. Going along this path then, if you can extend the type system enough:

    Yards width;
    Yards length;
    Acre  size = width * length; /* this would be okay */

    Yards width;
    Yards length;
    Yards height;
    Acre  money_bin = width * length * height; /* error */
    /* as acre is a measure of area, not volume */
Think of the type of calculations that the Unix command 'units' does, but built into a language instead of an application.


The Frink[1] language has units like this built in. It's pretty cool what you can do with it. I definitely agree that doing that, or at least something like your code samples, is a good bug-avoidance idea.

[1] https://frinklang.org/


> How would a basic type system protect against incorrectly interpreting an imperial floating point value as a metric floating point value?

You could wrap your value in a typed data structure:

    enum LengthUnit {
      Feet(f64),
      Meters(f64),
    }
And provide conversion functions between them, then only operate on one type, `LengthUnit::Meters` and throw errors if `LengthUnit::Feet` is passed in.

I'm using Rust syntax here because it's fresher in my mind, but you could do the same with Haskell, OCaml, F#, etc. IIRC, OCaml would optimize away the outer structure, so you wouldn't take much, if any, performance hit. Presumably compilers for the other languages could/would do the same.

EDIT: for formatting and clarity.


Anecdotally, I personally have talked to several people in the last few years who do things like write guidance systems for rockets. My limited sample frequently worked either in C or in (a limited subset of) C++, albeit with a variety of tooling on top to automatically test for and catch common kinds of flaws.

So as weird as this may seem to you, that mindset is applicable much more recently than you might expect.


Most of them do use C due to the talent and tooling available. They often subset it. They're also very careful. A small niche of the industry uses Ada or embedded Java. A recent trend is toward model-driven development with tools such as Simulink, Stateflow, and Esterel SCADE.


This completely fits with the discussions that I had on the topic. However I have no idea how representative my random contacts were.


I have seen so many comments and references like this one, so I went and read the whole investigation report.

1. There was a spacecraft (MCO) and a module that was sending some data from the Earth.

2. The module was delivered late, when MCO had already been on its way for 4 (!) months; before that, staff calculated the needed data manually.

3. Some teams switched into "defensive mode", unwilling to communicate and fix the problem even when it was clear.


MCO?



I don't have all my resources on hand right now, but off the top of my head this book should be added:

https://mitpress.mit.edu/books/engineering-safer-world

This list is barely scratching the surface of safety-critical system engineering, but it's a start.


I'm halfway through this, and not only is the theory insightful and often unexpected, it's also incredibly engaging, surprisingly so for such an academic work.


There is a draft version that is freely available:

http://sunnyday.mit.edu/safer-world.pdf


Parent's MIT Press link has a link to the final PDF, down the page on the left.


Thanks for the link. The book has been added.


NP. I'll try to find more when I get home, no promises though, pretty busy these days with life crap and can't do as much there as I'd like on the technical side (day job doesn't give me time to focus on these issues either).


Other than the latest MISRA, I really enjoyed "Better Embedded System Software" by Phil Koopman.

Ideally you should read it before starting your project, since it deals with the product specification / requirements-gathering phase, which is your starting point in safety-critical systems.

[1] https://betterembsw.blogspot.com.br/2010/05/test-post.html


Does anyone know how software quality is handled in complex supply chains, e.g. automotive? From my point of view, software is a second-class citizen in areas dominated by manufacturing and classical engineering.

I guess testing an over-the-air update for a car that was built by an OEM and thousands of suppliers must be quite a task.


It's getting better, but hardware companies tend to view software as second-class. They think it's "easy", though they're finally accepting that it's not. It's taken decades of fatalities, cost overruns, and missed deadlines for them to realize this, but they're realizing it.


> fatalities

If someone dies because of a preventable bug in your software, shouldn't that be considered manslaughter?

Obviously you formed a corporation in order to shield yourself from legal action (among other things). Fine, so you personally don't get charged with manslaughter. But in that case the corporation should be charged, and if convicted should be sentenced to the corporate equivalent of 25 years in jail. That would be a strong enough incentive to care about software. Of course, it never works like that in real life.

Is this as ridiculous as it sounds to me or is my outrage misplaced somehow?


You are completely right.

As Hoare so elegantly described at his Turing award speech, regarding Algol compilers, back in 1981.

"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law. "

Full speech here:

http://www.labouseur.com/projects/codeReckon/papers/The-Empe...


It's as ridiculous as it sounds, but they go through a lot of effort on the corporate side to make sure they're in the clear. It's a lot of CYA paperwork and stuff demonstrating they've done what they could have. And then out-of-court civil suit settlements that are sealed so no one knows the details and can't form class action suits or coordinate well enough to initiate a criminal investigation (their family member's accident seems like a one-off to them, they don't know the extent of the problems).


Typically the software has to be developed according to some ISO standard, like https://en.wikipedia.org/wiki/ISO_26262, and the supplier has to have some proof, e.g. from UL or the German TÜV, that they followed the procedures.


MISRA, tons of testing (sometimes manual), and just a lot of people working on what is a somewhat simple problem if you ignore the safety requirement.

Also, people don't realize it, but by using a linter you basically don't write the code in C, but in "safe C". It's like a different language.
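A toy illustration of that "different language" feel, in C++ syntax (the exact rule set varies by MISRA edition; these two are merely representative of the style such checkers enforce). A checker would flag the commented-out forms; the compliant forms make intent explicit.

```cpp
#include <cstdint>

// Representative "safe C" style rules, shown as compliant/noncompliant pairs.

int32_t to_int32(int64_t big) {
    // return big;                      // noncompliant: implicit narrowing
    return static_cast<int32_t>(big);   // compliant: the conversion is visible
}

bool is_zero(int32_t x) {
    // if (x = 0) { ... }               // noncompliant: assignment in condition
    return x == 0;                      // compliant: comparison only
}
```

Neither noncompliant form is a compile error in plain C or C++, which is exactly why the linter, not the compiler, defines the dialect you actually write in.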


Hopefully, in a future not so far away, most safety-critical code will be formally verified, like http://sel4.systems/ for example.


Code like the Boeing 787's avionics package goes one better: the spec specifies what the register values should be after each step of execution, and there's a company that takes the code, puts the processor in single-step mode, and checks.


DO-178B has been replaced by DO-178C.


In addition to MISRA, I've found the safety checklist in Lutz's "Targeting Safety-Related Errors During Software Requirements Analysis" at https://trs.jpl.nasa.gov/bitstream/handle/2014/35179/93-0749... to be very useful.


Is there anything like this that specifically addresses reliability in a critical (but not "safety-critical") system?


Yes, Armstrong's thesis is a very good starting point:

http://erlang.org/download/armstrong_thesis_2003.pdf


The techniques in the conclusions and the appendix, starting at about page 200, are useful in any language.


Yes. This goes far beyond Erlang.


Looks great, thanks!


An interesting paper that effectively describes the hardware equivalent to Erlang is Jim Gray's "Why Do Computers Stop and What Can Be Done About It?"

http://www.hpl.hp.com/techreports/tandem/TR-85.7.pdf


I have read the JSF standard. I learned a lot from reading it.

However, the JSF project has been reported to have lots of software defects.


> the JSF project has been reported to have lots of software defects

I haven't read anything that differentiates between these two possible scenarios:

1) Poor engineering, execution, etc.

2) The bugs expected in a software project like this. When I think of it this way, I'm amazed it was ever completed (but maybe I'm thinking about it the wrong way):

* Meet the specifications of not only three U.S. military services but also militaries and other entities in multiple national governments (with all the politics, compromise and complexity that involves).

* Invent and implement technologies to provide capabilities so bleeding edge that few people will imagine some of them for years, if not decades. There are no prior designs; nothing like it has ever been done. Part of the point is to exceed competitors' engineering capabilities by as much as possible.

* Integrate these technologies into a massive system of systems, arguably the most complex system in the history of humankind.

* The system is human-rated.

* Performance is the highest priority; there is no making easy compromises of performance for safety: Human lives, the outcomes of battles, the fates of nations, and the course of history may depend on performance.

* Accomplish this in secret, greatly restricting your access to outside resources. Will this work? You can't publish a paper and get feedback, or make a presentation at a conference.

* Accomplish this in coordination with thousands of suppliers in many countries.

* Because it's hardware and very expensive, your ability to iterate is limited. My completely amateur guess based on the above is that it's a massive, decades-long waterfall-style project.


"Integrate these technologies into a massive system of systems, arguably the most complex system in the history of humankind."

You went a bit overboard there. There's plenty of systems probably more complex that work fine on a daily basis. They were usually designed centrally, though.


Maybe, but plenty? What are you thinking of?

Also, can you distinguish between those two scenarios (which was my main point)?


Huge systems with crazy amounts of code are what power most large companies. The budgets are usually way smaller than for defense contracts such as the one we're discussing, where the profits of the contractors are assured through corruption. The former have to get more done with less. There are thousands of those teams, too, in just the Global 2000. Maybe the Fortune 500, too.

Far as amazing examples, I'd go for the centralized, five-9's systems like NonStop or the decentralized ones such as OpenBSD before the failure we're discussing.


> OpenBSD

I think there is a miscommunication. OpenBSD is as complex as the F-35? I think the OpenBSD team would be very disappointed if that were true! OTOH, it would make the ~10 minutes it took to install it on the laptop to my right very impressive.

EDIT: What I meant to ask in the GP was, do you see a way to distinguish between whether the F-35 is poorly executed, or whether we're seeing/saw the normal bugs for a project as described above (even admitting hyperbole about complexity, which I'm not sure of, it's still quite a project).

> where the profits of the contractors are assured through corruption

Not a major point, but these issues are too important to let pass these days, IMHO: How much of whose profits are assured through what corruption? I'm sure some of that goes on, as in all large institutions (including the large companies mentioned above), but I'm not ready to assign corruption to all or most defense industry profits with a broad brush.

Also, I rarely hear much attention given to reducing it. For example, the GOP in Congress was working to take procurement authority out of a central DoD office, where it was put to prevent corruption and to put real procurement experts in charge, and into the hands of generals and admirals, people without procurement expertise (commanding fleets and fighting wars is a much different skill set) and with a bad track record regarding corruption in recent history (look up the Fat Leonard scandal, as one example off the top of my head). What I struggle to remember at this moment is whether that change passed; it was a pet project of McCain's.


Don't blame the tools, blame the carpenters (and the customers, and the customer's bosses).


That is awesome, thank you.


C++ is not considered safe for any RTOS; in fact, you won't find it used in aviation embedded devices (referring to the big 3). Tools, yes: there you can use higher-level languages to your heart's content.


Huh? The F-22A, F-35, P-8, and P-3 are all flying C++ code. Those are just the programs I have personally touched (not necessarily the code, though). Where did you get the idea that it "is not considered safe for any [real-time] system"?


The F-22 avionics are mostly Ada. F-35 is mostly C++ though. If you want a good face-palm go read up on the "decision-makers" advocating making the switch. I saw a quote by a general blaming the F-22's cost overruns on the "Ada Operating System".

But don't worry, C++ is a "COTS Industry Standard" so you can bet there were no overruns on the F-35. /s


> The F-22 avionics are mostly Ada.

Yes, including the piece I directly touched. It was deemed too minor to be worth the cost of rewriting. Hell, when I left the program we still had an arthritic VAX to build on, should we ever need to rebuild the code.

As for your last line, see my comment elsewhere about blaming the carpenters instead of the tools. :-)


The cost of the F-35 program shows how good that decision has been.


As far as I understand, all the vehicles you mentioned are not subject to DO-178 regulation. If they were, then C++ code would have been much less likely to be used, because C++ code is much more difficult to prove correct, either via 100% test-case coverage (DO-178B) or through formal methods (DO-178C). Please correct me if I am wrong; I am doing research on a similar topic.


The F-35 (JSF) is DO-178B. The RTOS itself (INTEGRITY) is mostly C and C++.

http://www.militaryaerospace.com/articles/2013/10/software-c...


Well, the article says that the F-35 runs an operating system compliant with DO-178B, right. But it does not say that the control logic (application software) was subject to DO-178B certification. Anyone can buy a license for a DO-178B RTOS; it is not very expensive, actually. But certifying your own avionics control algorithms is a whole different story. Agree?


I agree development of the avionics is different from the RTOS. But what makes you say that the avionics algorithms aren't DO-178B? That is something the customer (gov't) would specify to the contractor. I didn't ever work on the JSF avionics code to know the development process. Did you? A lack of press releases about it does not mean it is not DO-178B compliant.

Reading the JSF coding standard, it sounds like all libraries must be DO-178B: "All libraries used must be DO-178B level A certifiable or written in house and developed using the same software development processes required for all other safety-critical software. This includes both the run-time library functions as well as the C/C++ standard library functions. [10,11] Note that we expect certifiable versions of the C++ standard libraries to be available at some point in the future. These certifiable libraries would be allowed under this rule. "


Good point. As far as I know, there are no DO-178B-compliant C++ libraries available yet (the STL, for example). If you know of any, please point them out. My reasoning is as follows: while civil avionics must comply with regulatory safety standards, military avionics need not. So, economically speaking, there is little sense for them to do such costly certification just to ensure quality.


I can go on all day about this, but this comment has summarised it nicely. You have to understand that to be compliant with DO-178B/C you need to do an ungodly amount of V&V and other things that literally cost millions of dollars per quarter over the span of a single project. C is much easier to prove right and can easily achieve good test coverage; C++, on the other hand, is considered a "high-level language" and is used for instrument panels and such things. Further down the stack, however, you will see people using much more conservative languages.


What? The most popular choice for an aircraft RTOS seems to be INTEGRITY, which is written in C/C++ itself.

http://www.militaryaerospace.com/articles/2013/10/software-c...



