Simple rules for documenting scientific software

acidburnNSA · on Dec 23, 2018

I manage a team of power plant design engineers writing complex scientific HPC software, mostly in Python (which drives Fortran 77 codes behind the scenes, among other things). It's been a long haul but through the years we've learned a lot of good lessons and are pretty productive at what I consider at least moderately good code. Everyone starts off by reading Clean Code and taking basic proficiency training. We know to try to make code speak for itself because, as we've all seen many times, comments lie. We've slowly learned from the software engineering community how to set up things like Jenkins for CI, black for Python code auto-formatting, Phabricator for revision control policy and code review. We're writing docs in rst with sphinx, watching coverage, and getting pylint 10/10. We're waking up to the wonders of static types.

Mandatory code review has turned out to be a great way to train newbies, write better code, and make sure code is understandable to at least one other person. It's been an investment (it's slowww) but I still think it pays dividends. That's what I'd add to this advice. And I'd tone down the comments focus. Comment only when you fail to express yourself clearly in the code.

petschge · on Dec 23, 2018

One things that has worked well for us, is to give a very explicit invitation to every new code user to question every bit of the code. If something is not obvious, ask why it is there and why it is the way it is. Once the new user understand the code he then adds a comment to the code or explanation in the documentation. After four new users very little code is left where the design considerations is undocumented.

l0b0 · on Dec 23, 2018

One thing I've found even better than code review is pair programming. At the recommendation of the tech lead we all did it for the four years at my last job, and it was a fantastic way to spread knowledge around. For thorny prpblems we'd even "mob" on them, four or five of us sitting in front of one big monitor, rotating who used the keyboard and mouse, discussing and giving high-level instructions to the driver. We collectively figured we produced features about as fast as with individuals programming (because of the intensity and avoiding many dead ends) but that the quality was far superior. Reviewing the process honestly (without getting personal) is key, though, to address inefficiencies and interpersonal issues.

acidburnNSA · on Dec 23, 2018

I believe that. We pair up on super complex things but have been promising to do more of it having heard of successes like yours.

hprotagonist · on Dec 22, 2018

A really great idea: put DOIs in your function documentation.

Writing things like:

Implements equation 3.2 of Foo et. al. (2005), doi:10.2.3/baz

has saved me no end of pain in the past.

petschge · on Dec 22, 2018

This is indeed a helpful thing. And leave a note if you had to rename the variable called "s" in that paper to "x" to work with the 5 equations you pulled of that other paper.

killaken2000 · on Dec 23, 2018

I do this as well. I started doing it in college mainly as a citation but I find it helpful to do a similar thing for this reason as well as for any of the less mainstream algorithms or theorems that would provide further information.

hnuser355 · on Dec 22, 2018

Woah I’ve never considered this idea

timeu · on Dec 23, 2018

Some good basic suggestions that should be followed when doing softwaare engineering. But they forgot one aspect:

Don't be too smart for your own sake when packaging your software. As an HPC administrator I stopped counting how many hours I spent installing scientific software with broken Makefiles, interactive installers, simple github dumps, unresolvable dependencies. Sadly this is quite prevelant in life sciences (I am looking at you bioconductor)

gravypod · on Dec 22, 2018

Would be nice to point the scientific community in the direction of basic software engineering practices. Encouraging engineers to read Code Smells and Refactoring.

Most of the scientific software I've worked with is completely unmaintainable by anyone but the original makers. Buss factors of 1 are not sustainable.

danieltillett · on Dec 22, 2018

I write scientific code for a living and my code is very difficult for anyone else to maintain. It is not because my code is badly documented or written, it is because what it does is very complex. Every module is documented why it exists and what it does and the code straightforward to read, yet the interaction of all the modules is very complex as it reflects the underlying complexity of the problem the code is solving. Some code is hard to maintain because it is solving a complex problem that few people understand.

perfmode · on Dec 22, 2018

Managing complexity is one of the chief concerns of software engineering.

danieltillett · on Dec 22, 2018

Complexity can be managed, but the required domain knowledge is hard to design around. The problem my colleagues struggle with is the domain knowledge I have that they don't. The positive from this is there are few people in the world that can do my job :)

analog31 · on Dec 23, 2018

In my view, there's only a certain extent to which code can teach domain knowledge. People tend to freeze up when they see any kind of math, physics, or quantitative engineering. And managers want to believe that domain knowledge is worthless because it refutes treating people as interchangeable cogs.

The only solution is to make sure you keep someone on staff who has a hope of understanding the underlying technology, and failing to do so is a business risk like any other.

nomel · on Dec 23, 2018

I think you’re talking about code complexity. He’s seems be talking about problem complexity, which just is.

dekhn · on Dec 23, 2018

I don't buy it. I've worked with very complicated systems (ads at google) and it was pretty straightforward- it's the distributed systems communicating at RPC interfaces that is hard to understand. I've also worked with quantum chemistry codes and large supercomputer codes. The ones that were difficult to maintain were just badly written.

There was an article recently about Jeff Dean and Sanjay Ghemawat- one of Sanjay's skills is to make APIs that let you build very complex systems from components such that the resulting system is still comprehensible. This had huge long term positive effects for Google.

I strongly recommend reconsidering that your position; I've been able to drop into codes in tens of different domains, whenever they are well-written, regardless of the complexity.

danieltillett · on Dec 24, 2018

Well all I can say is I have worked with some pretty smart developers (much better than me) who have run slam bang straight into the domain knowledge problem and have got totally stuck. They have a habit of removing critical code that they think is not required or spending a huge amount of time solving trivia.

It is not that the code is really complex, but the underlying problem is complex. Some things are just hard problems and no matter how skilful a developer you are, if a problem requires 10 years of full time study in a particular area you will struggle if you don't have this background.

dekhn · on Dec 25, 2018

Why would a developer remove critical code. You have tests for them... right? Your design doc, code comments, and function call paths in the code all point to why the code is needed. And if you can remove it without breaking anything, then it wasn't needed in the first place.

I think you have bigger problems with your devs, that don't have to do with your "problem domain complexity".

Derbasti · on Dec 23, 2018

How would you make, say, a fourier transform or IIR filter design accessible to people who don't know any signal processing or complex numbers?

You can spend years studying these topics and their related math. Entire text books have been written on these subjects. Any one algorithm probably has multiple scientific publications, explaining how and why they work.

This much information can not fit into a few code comments and a README or two. Especially not in a format accessible to people without domain knowledge.

gravypod · on Dec 23, 2018

You don't need to understand FFTs to understand how to update good software. If there's a segfault generated by a program and I can run it though a debugger I should be able to fix the segfault without explicit domain knowledge of the code. My modifications should be testable via unit tests. There should also be a formalized process to incorporate my changes into the authoritative codebase. That process should contain a code reviewing process where one or more experts on the codebase critique my code.

These are all software engineering things that have nothing to do with domain knowledge.

dekhn · on Dec 23, 2018

Nothing about spectral transforms requires complicated software, beyond the need to optimize the algorithms. Sure, the inner components need specialists. No complaint there. But it's not like your stuff is "complicated". I guess the counter argument is something like Goto BLAS, where the author has an extremely complex internal mental model of Intel instruction scheduling and can emit incomprehensible, but absurdly fast instruction sequences. But of course, the BLAS API is dead simple and almost nobody using it needs to know the underlying solver complexity in detail.

Michaelanjello · on Dec 22, 2018

The way to manage this is to make your code more modular, to factor out subcomponents, to share and have them critiqued separately. For example, in machine learning frameworks, autograd is a separate package for automatic differentiation. I actively post questions and answers for subcomponents on Stack Overflow.

fredosega · on Dec 23, 2018

I'm a 'scientific programmer' who learned how to code basically trial by fire.

My background was mechanical engineering which didn't focus on coding at all other than needing to use/learn MATLAB for several classes. Went to grad school for HPC/CFD and there I was given access to our group's simulation code and was let loose to implement whatever routines I needed to simulate my problems. The shared components were the input/output system and the primary routine drivers (time-stepping and fluid dynamics algorithms and the like), and what I mostly worked with were different constitutive models which hooked into the system. Parallelization was implemented via MPI and was mostly complete, so my only job with respect to parallelization was to make sure that my algorithms would work in parallel.

I ended up taking several programming courses, but these were 100% focused on topics like parallelization with MPI, shared memory parallelization, and optimizing code, and a short stint in GPU programming. I learned nothing about code management or best practices. Oh, and this was all using Fortran and C, though now I'm working with C++ and Python, but that's because the newer libraries seem to be C++ and Python is just easy to glue everything together with.

My general programming knowledge isn't super great, but sometime this year I managed to download an open source iOS app and without any prior knowledge of Xcode or iOS programming/Swift was able to figure out how to implement something that was missing.

This is already a lot of words, but lately I've been thinking about getting out of academia and getting into actual software development cause I figure I kinda do that anyway. Obviously the easiest connections I could make would be to maybe work for companies like ANSYS that develop CFD software, but I feel like my programming knowledge is seriously lacking for that. You can give me a scientific paper that describes an algorithm to do a thing and I'd have no issue implementing it into some existing codebase but I read words written in this discussion like "code smell" and "CI" and I have no idea what these things are.

Anyway, can anyone recommend me some books to read and/or provide some advice and/or anecdotes on jumping ship from HPC/CFD/scientific programming into a general programming developer career?

sischoel · on Dec 23, 2018

Have you thought about contributing to some open source software? It depends on the project, but often after you submit your code, you will get a code review. And you probably will also learn something about testing.

a-dub · on Dec 23, 2018

11) Invariant: Someone should be able to go from a referenced paper or book to your code without scratching their head too much. This can be approached by either fortifying the documentation in the code or by specificity in a methods paper.

amelius · on Dec 22, 2018

Also don't forget to document all dependencies and their versions!

petschge · on Dec 22, 2018

If there is one thing I have about HN it is how smug they are about software engineering. The leading reason why scientific software stinks (and a fair fraction does) is not because scientists and software engineeres in that field suck, but because there is very strong incentives AGAINST writing better software.

Remember: This is not the 17th javascript framework, but software for problems that we don't understand going in. And often enough we have not understood the problem all that well even after a decade when we write the third code. These codes are research codes. Ongoing experiments. The main goal is NOT to produce long-term maintainable software, but to produce scientific understanding and build intuition about the systems that are modeled. The code is just another tool among experiements, analytic calculations and back-of-the-envelope discussions on a white board.

Rewriting the code every 5 years is an insane proposition in software engineering, but completely ok in some fields of science.

Would I like better language support to check SI units for me? Sure. Would I like highly performant libraries for vector fields, that work with gcc 4.6 on a top 500 machine? Sure. Would I like to be allowed to spend time on fixing yeah-I-guess-it-works code? You bet.

But would I like HN to just up about "scientists just need to learn to code"? Oh hell yes! Because -- believe it or not -- we often DO know better. But fixing code is not what the taxpayers, what YOU, pay us for. We are paided to understand nature. And until you are willing to pay higher taxes and spend more money and science and to invest more into fixing long term infrastructure you really do not get to be so damn condescending.

jasonpeacock · on Dec 23, 2018

By this argument, chemists should never wash glassware. Keep re-using it until it's too dirty, then through it out and make new beakers.

Your tools affect the quality of your work. Work with shitty tools, get shitty results.

I'll start believing your argument when scientists actually start publishing the code to go with their papers and make it reproducible.

petschge · on Dec 23, 2018

Two thirds of the codes in my field are on github. And for most others you can get a copy if you ask politely by email. That said I would appreciate it if journals not only had author, title, date and affiliation in the meta data, but also a git url and commit ID.

_wzsf · on Dec 23, 2018

Re-using code doesn't wear it out or contaminate it. This is one of the worst analogies I've heard here.

btrettel · on Dec 22, 2018

These incentives exist but are not a good excuse in my view. I'd recommend reading this blog post about how "incentives" are used to justify all sorts of bad behavior in academia: http://www.talyarkoni.org/blog/2018/10/02/no-its-not-the-inc...

Right now I'm nearly done a PhD in mechanical engineering. I've worked on several computational projects of varying levels of code quality from good (e.g., NIST's Fire Dynamics Simulator: https://github.com/firemodels/fds) to bad (won't give unambiguous examples for obvious reasons...).

I see clear incentives against uncertainty quantification and high quality model validation in my field. Those topics are treated superficially if at all, because people just want their model to look good, and if they do a rigorous job, that increases the chance that their model looks bad. As far as I'm concerned, if you don't do good quality model validation, you're not doing your job as a scientist.

A more specific example: Some codes are kept for a decade or longer if the software is sufficiently complex. One code I worked with was used by multiple graduate students over several PhDs and had very little if anything in terms of documentation. What did exist was far out of date. I spent a fair amount of time trying to understand and document the software, but other people treated me as if I was crazy for wanting to understand the black box I was provided. I don't think I ever actually understood that software fully despite the many hours I spent working on it. And I'm not certain that it provided correct results either. There were almost no tests, and the existing tests were hard to run and were consequently almost never run. The software produced several PhDs, though. The first few PhDs had it fairly easy. I had to pay their technical debt, and I couldn't pay it all back. It made me look bad as far as I'm concerned and did not save time in the long run. If anything, there's a long term incentive toward good software development practices.

Ultimately I think the solution would involve having minimum standard for scientific software quality, or else grant funding will be withdrawn. It will anger many people who have so far gotten away with writing low quality software, but beyond that I don't see many downsides.

petschge · on Dec 23, 2018

I agree that not enough benchmarking (comparision between different codes to check that they get compatible results), verification (checking that the code produces results consistent with the underlying model) and validation (checking that the code reproduces nature to a sufficient degree to be useful) it done.

I have actually written papers that try do to a bit of that and it was annoyingly hard to publish them. But unlike you and a lot of other posters here on HN I don't think the reason for insufficient validation is that we scientists are too stupid or too lazy. On the contrary, we would love to do more of it. But there is next to no funding to do it.

btrettel · on Dec 23, 2018

I appreciate your thoughts.

> I have actually written papers that try do to a bit of that and it was annoyingly hard to publish them.

I don't know what field you work in, but in my field (fluid dynamics) these papers are not uncommon, though often poorly done. There's a relatively new ASME journal on the subject too: http://verification.asmedigitalcollection.asme.org/journal.a...

> On the contrary, we would love to do more of it. But there is next to no funding to do it.

Perhaps this is the crux of the disagreement then. I don't need extra funding to do what I should be doing in the first place. It can be a big time sink in some cases (e.g., I probably spent 6+ months compiling existing data during my PhD), but often is not. It will probably make your model look worse than you'd like, but if everyone did it then science would progress faster.

petschge · on Dec 23, 2018

Kinetic plasma simulations in space plasma and astrophyscial scenarios. I have seen that journal and I think it is a step in the correct direction.

Regarding the funding: you kind DO need that extra funding. because if you don't get funding for that you get out-competed by the other research who skips that step and the next postdoc position or grant goes to that researcher who wrote two more papers instead of testing code that ultimately didn't find any major bugs.

btrettel · on Dec 23, 2018

> you kind DO need that extra funding. because if you don't get funding for that you get out-competed [...]

This seems to be a common view, but I don't think it's always true. In my own work, I've found that being careful has led me to write more papers, not less. Particularly given that I've found errors in previous work that I intend to publish or already have published. There's a delay involved (i.e., it takes time to find the problems) but I think I'm coming out ahead in total publications in the long run.

mturmon · on Dec 23, 2018

> ...I don't think it's always true...

I will have to re-evaluate my thinking on this.

My knee-jerk response is to say this is wrong. I have seen too many cases where people report initial results on some elaborate model as if they explain more aspects of the data, but then the results fail to generalize, and the theory, model or implementation turns out to be flawed or fragile. It gets a publication, but the state of the art does not advance.

But I notice that some younger investigators I have worked with tend to be more careful, with things like releasing code/data openly, being serious about code/model reuse, doing more careful verification/validation, being more rigorous about test/training data. It seems to be a spectrum of better research practices rather than one thing, like “better software engineering“.

Such a more principled approach should prove itself over time, because Nature cannot be fooled.

btrettel · on Dec 23, 2018

Yes, "being careful" refers to a variety of different techniques, not just better software engineering. What has ended up being publishable in my case wasn't related to better software engineering, rather, being more careful about model derivations and more rigorous about the available data. But it could have been software improvements.

To be clear, I don't think everyone needs to be as careful as I am. A few researchers in a particular subfield spotting errors can have large benefits for everyone. And those few may catch the vast majority of the errors. There are diminishing returns to adding more people focused on being rigorous.

petschge · on Dec 23, 2018

This is a very important point. Your budget of "being careful" might be best spent on better software engineering. But it is also possible that you should rather spend it on other things such as better mathematical model, better input data sets or something else. And a (sub)field prospers most if different groups and researchers spent there (very limited and expensive) time on different things. Trying to shame everybody into following the magic 27 rules of software engineering is a step back, not forward.

petschge · on Dec 23, 2018

Right. You do not always get outcompeted. But you run the risk. And yes, having good architecture in parts you need to change often is someting that pays of. As does having tests in section of the code that is brittle, hard to reason about or historically buggy. As does writing documentation on things that you had to spend an annoying amount of time on to understand. And I think that most researchers understand that and invest that time.

The thing is: This does not mean that either of us puts a lot of effort into having nice, good, modular architecture in parts we know that we will never change. Or write a lot of code coverage in module we understand very well and can reason about using analytic calculations. Or write a lot of documentation for things that are obvious to us (be it through familiarity or whatever reason). The next researcher however might want to change exactly that part we never want to change. Or might the analytic check we always used to make sure to code was not going of the deep end is not valid for his work. And he is near certainly going to need documentation on other spots of the code (and find the documentation we wrote and needed useless, because that topic is something glaringly obvious to him).

There is usually no advantage for us to make the code nice for that researcher. And THAT is very scientific software gets the bad name from. Because we have to make the trade-off, what is useful to us (and possibly our time). If we try to make the code nice and friendly for every potential user under the sun we usually DO get outcompeted.

btrettel · on Dec 23, 2018

I think I misunderstood your position earlier. My experience with academic codes seems to differ significantly from yours. It is not uncommon at all for scientific software to have no documentation at all in my experience, and very little if anything in terms of tests. I agree with you that it does not make sense to have detailed documentation and tests for all parts of scientific software. One must prioritize. What I'm arguing against is the (often implicit) attitude that documentation, testing, and other good software engineering practices are not necessary in science at all. I see now that you agree, and it's more of a question of the amount of software engineering practices which are optimal. I appear to prefer more than you do, but you clearly see the value of these practices. Correct me if I'm wrong.

petschge · on Dec 23, 2018

You are not wrong at all. And the amount of test coverage and documentation on needs depends a lot on the field, the numerical methods, the indended users, the type of the code (simulation vs analysis vs plotting) and so on.

The thing where I seem to have a totally different view from most of HN is that I think that we make valid tradeoffs, and that undocumented function and tests that only cover 40% of the code base is a perfectly fine state of affairs, instead of being caused by scientists that are too stupif for basic software engineering.

What a lot of software people also do not get at all is that variables such as x, v and a might be perfectly descriptive variable names.

mturmon · on Dec 23, 2018

You are making some good points here. It is expensive and exacting work to, for instance, set up validation experiments, or to quantify errors exactly enough so that whatever errors that do appear are explained by known effects. Agencies don’t like to pay for this.

One forcing function can be if high-stakes decisions rest on the conclusions from the code. In the geosciences, you start to see some of this care for air quality measurements and for climate science. (One relevant journal here is SIAM J. UQ., https://www.siam.org/Publications/Journals/SIAM-ASA-Journal-...)

The lab where I work has spent a lot of resources on validating thermal and fluid dynamical codes for atmospheric entry of Mars landers — an engineering example rather than science. They obviously are very motivated to avoid an unpleasant surprise from a model failure.

michaelwu · on Dec 22, 2018

In industry, we often also face an incredible amount of uncertainty in the form of unknown future functionality, UX, requirements, etc. As long as a business continues to adapt to changing market conditions, it's rare for its software to reach completeness. Some components' rate of change certainly slows down over time but newer components might rapidly grow in complexity or face major refactors. Successfully scaling to meet increased demand may also demand a pretty fundamental architectural shift. While not entirely the same, your evolving understanding of a problem and its effect on software requirements isn't too different from what we see regularly in industry.

It's very difficult to foresee what changes will be required 1-year's time, let alone 3. To that end, devs try to be extra thoughtful during the planning phase so that the end product is made up of well-behaved components. The extra time taken to come up with a sensible architecture goes a long way when business requirements eventually call for new or enhanced functionality and someone needs to understand the codebase in a reasonable amount of time. To be clear, this doesn't guarantee clean code -- bad architectural decisions happen, planning can occur without a clear requirements, no one held the quality bar during implementation, etc. -- but the outcome is always better than a laissez-faire design "strategy".

Perhaps scientific code churn is much more frequent than commercial code churn. In which case you would want to balance that by perhaps spending less time coming up with a good architecture -- and in the case of throwaway code, perhaps none at all. That being said, I would be surprised if new scientific knowledge always results in you having to discard every line and restart completely from scratch. I imagine that spending more time upfront architecting your software would instead allow you to more quickly react to new research results -- paying dividends on your time, in the long run.

BeetleB · on Dec 22, 2018

I'll just put this here:

https://news.ycombinator.com/item?id=18368455

This is normal. Poor sw practices lead to bugs, which lead to invalid science.

Also

https://physicstoday.scitation.org/do/10.1063/PT.6.1.2018082...

This got caught because it led to a big fight. Most other scientific software is not audited at all. Given my experience with those who write them, I tend not to trust most conjugational results coming from science.

btrettel · on Dec 23, 2018

The second link was fascinating and would make a good submission to HN.

> Given my experience with those who write them, I tend not to trust most conjugational results coming from science.

In computational fluid dynamics, there's a saying: No one believes computations aside from the person who ran them, and everyone believes experiments aside from the person who ran them.

To be fair, this is mostly because turbulence modeling can be very inaccurate, but bugs in the code are also a major concern of mine.

BeetleB · on Dec 23, 2018

>The second link was fascinating and would make a good submission to HN.

It was - that's where I got it from.

petschge · on Dec 22, 2018

Oh it is nothing new that HN is bashing code in science. But if anything that story is a reason for each group to build their own independent code instead of trying to enforce so much software engineering overhead until there is only one code by the best funded group is left because nobody else dares to (or can afford to) build their own code.

a-dub · on Dec 23, 2018

If you have "yeah I guess it works" code, you're likely producing "yeah I guess it's right" science, and that's not really the point it seems.

But yes, you're right. Perfecting things is at cross-purposes with the frenetic pace of research. Crappy untested code gets written under duress. Resist it. That needs to change, just as the whole funding model needs to reward verification more than new discovery... But back to the code, if the code is wrong, the science is likely to be as well.

Radim · on Dec 22, 2018

This is both wrong and dangerous.

I do hear this sentiment often: "Hey dudes, it's just a prototype! Why waste time making it pretty? Who cares? We're doing SCIENCE here!"

But prototypes (and beginnings in general) are precisely the time to be extra careful. Wrong turns and self-delusion are costlier, not cheaper, when you're the one paving the road for others.

In research, there are many ways to lead yourself astray, due to the inherent chaos of novelty, even without software bugs completely flipping the outcome. The idea that writing shitty code somehow "saves time" and is only worthwhile for the "17th javascript framework" has to go.

Articulating your thoughts into a sane logical structure (aka code), with sane names and motivation examples and conceptual units, saves you time even in the short run. Never mind 5 years. It also helps you avoid publishing unreliable, brittle, "SOTA" nonsense… of which there's sadly so much.

petschge · on Dec 22, 2018

There is a wide difference between code that is pretty and code that is careful and written in a way that will not lead you astray. And if you don't know or don't care to make that distinction I am not interested in your opinion.

akiselev · on Dec 22, 2018

> Articulating your thoughts into a sane logical structure (aka code), with sane names and motivation examples and conceptual units, saves you time even in the short run.

If you think he or she is only talking about pretty code, you don't understand the distinction to begin with.

chrisweekly · on Dec 23, 2018

SOTA -- State of the Art?

hyperpallium · on Dec 23, 2018

> Prior work has focused on various aspects of open software development [1–7], but documenting software has been underemphasized.

This article - like all software engineering - is completely unscientific. Scientific method:

> systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses

geoalchimista · on Dec 23, 2018

It is an editorial not a peer-reviewed research article. Mind the difference.