17 February 2011

CS education is important for scientists

Philip Wadler's recent blog post "Scientific programming does not compute" draws attention to a Nature article from last October: "...Error ...why scientific programming does not compute" (Nature 467, pp. 775-777, 14 October 2010).

This article has some wonderful examples of why CS education is important to students interested in pursuing a career in the sciences:
"Researchers are spending more and more time writing computer software to model biological structures, simulate the early evolution of the Universe and analyze past climate data, among other topics."
and
"As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software, say computer scientists."
I imagine, though, that the computer scientists actually said that researchers rarely release their "code". Why is it so difficult for writers (and Hollywood) to understand that computer programming "code" is a mass noun (like "water") and not a count noun (like "pumpkin")? I can only guess that people are confused by things like "secret codes" and think computer code works the same way. Anyway, I digress...

Continuing with highlights from the article: Greg Wilson, a Toronto-based computer scientist, conducted an online survey in 2008 about the programming habits of scientists. He received nearly 2000 responses and reported the following results:

  • 45% said scientists spend more time today developing software than five years ago.
  • 38% of scientists spend at least one fifth of their time developing software.
  • Only 47% of scientists have a good understanding of software testing.
  • Only 34% of scientists think that formal training in developing software is important.

Wilson also added:
"There are terrifying statistics showing that almost all of what scientists know about coding is self-taught," says Wilson. "They just don't know how bad they are."
Why does this matter? Well, this lack of proper programming skills can lead to data being misinterpreted and incorrect results being published:
"As a result, codes [grrr...] may be riddled with tiny errors that do not cause the program to break down, but may drastically change the scientific results that it spits out."
The article then provides a number of interesting real-world examples:

  • A structural biology group at a research center needed to retract five previously published papers when they found a bug in the program that analyzed their data.
  • A computational biologist wrote some code based on assumptions that were valid for his work, but another group re-used his code in a different scenario where these assumptions did not hold true.
  • The code used to analyze data from the Large Hadron Collider is so complex and convoluted that it is difficult for new researchers to come in and test or modify the code.
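
To make the second example a little more concrete, here is a minimal Python sketch of how an unstated assumption can silently produce a wrong answer. It is purely illustrative and not taken from the article: the function names (median_unchecked, median_checked) and the sample data are hypothetical. The first function assumes its input is already sorted; the second performs the same calculation but enforces that assumption, so misuse fails loudly instead of quietly changing the result.

```python
def median_unchecked(values):
    """Median of a list that the caller is assumed to have sorted already."""
    n = len(values)
    mid = n // 2
    if n % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def median_checked(values):
    """Same calculation, but the sorted-input assumption is enforced."""
    if not values:
        raise ValueError("need at least one value")
    if any(a > b for a, b in zip(values, values[1:])):
        raise ValueError("values must be sorted in ascending order")
    return median_unchecked(values)


# Hypothetical sample data: the same measurements, sorted and unsorted.
sorted_data = [1, 2, 3, 10, 50]
shuffled_data = [50, 1, 10, 2, 3]

print(median_unchecked(sorted_data))    # 3
print(median_unchecked(shuffled_data))  # 10: wrong, and no error is raised
```

Calling median_checked(shuffled_data) raises a ValueError instead, which is the kind of failure a second research group would actually notice.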

What does the article suggest we do about this problem?
"In the long term, though, [software developer Nick] Barnes says that there needs to be a change in the way that science students are trained."
and
"Science administrators also need to value programming skills more highly, says David Gavaghan, a computational biologist at the University of Oxford, UK."
In some sense, the article is calling for better software engineering skills (including documenting and testing) rather than raw programming skills. But the programming fundamentals need to come first for everyone.

The crisis in K-12 computer science education is really a crisis for all the sciences.