Science and Fluid Dynamics should have more open sources

Stéphane Zaleski

Revision 1.4,   October 9, 2001.

The most dynamic model in software engineering today is provided by the Open Source movement, which lies behind successful projects such as Linux and the egcs compiler project. However, it has found little echo in the community of scientists performing, for instance, advanced fluid-mechanical simulations. The situation is somewhat paradoxical: for almost a decade Linux has swept the university computing world, providing a cheap and reliable alternative to the workstations and supercomputers of the past. The compilers and operating systems that academic scientists use every day are now Open Source software. Yet most of the same scientists are reluctant even to let others see their code, let alone adopt a full-fledged Open Source development model for the software they produce. In this paper I explore the current state of affairs. I try to provide an explanation based on the different ways in which academics and programmers gain reputation. I then argue for a change of mind: science should be more open.

What is Open Source?

Open Source is a movement that is inspired by the phenomenal success of Linux as a complex, large-scale collaborative project producing Free Software: software that you are free to use, copy, modify and distribute.

One of the main advantages of free software is the creation of a large community of users who can provide feedback and help in debugging and improving the software. Software mistakes are difficult to find, and having a hundred users reading your source is like being able to reread your source a hundred times, with fresh eyes, exposing bugs that occur only in rare circumstances. A large community thus works as a "parallel debugging device". Hundreds of eyes will be better at spotting obscure bugs; thousands of users will explore many different ways to use the code.

Because more errors are caught in this way, the resulting software is often of better quality and more reliable than software developed and debugged by a small group. The quality also improves as users provide feedback on desirable features. Some open source specialists emphasize that opening the code forces developers to adhere better to good programming practice. There is a reward for adhering to the practice, for instance in the form of a higher probability of recruiting good collaborators. By contrast, in the closed-source world there is no reward for good practice, even if managers are aware of it.

Writers of successful open source software, in science or elsewhere, gain a form of reputation akin to the reputation gained by publication. They are known, cited and recognised as having made a contribution. This reputation is beneficial for the group having collaborated to produce the software as well as for the individual members.

The community of users as a whole obviously gains from the availability of open source software in financial terms, but also in the form of a wider choice of software.

However all these advantages do not fully convince scientists to open their code. To understand why, it is useful first to reflect on the Open Source movement in general.

Hackers' motivation for open source development

The word "hackers", sometimes debased to refer to computer vandals, originally referred to the first communities of programmers in the 1950s and 1960s, whose culture and work predate much of the current sociological debate on the Open Source movement. What motivates hackers beyond the hope of obtaining better software? And what could motivate scientists?

The first motivation of a hacker is a personal interest in a problem. The writer of a free software platform has a personal need for the platform. For instance a "computational fluid dynamicist" may need a powerful multigrid solver of some sort, and not be satisfied with anything he finds on the market. He will then start developing it, often starting from existing software. This is of course the same in the scientific world: we develop numerical methods because we wish to solve the problem at hand.

The second motivation is ego boosting. The writer of free software has the satisfaction of working for the common good, of having provided a useful product to sometimes millions of users, and of being recognised as an expert programmer. This can even translate into monetary gain, as recognised hackers start businesses built around free software (such as the various distributors of Linux). A scientist could be similarly motivated, if he felt that his recognition as a scientist would be enhanced by publishing his code. Here the question is the equivalence of publishing code and publishing papers and books, or speaking at conferences. I will return below to the issue of how recognition or reputation works for hackers and scientists.

The third motivation of Open Source hackers is political: to avoid "world domination" by a single for-profit entity. It was IBM in the past, it is Microsoft now. More deeply, Free Software and Open Source promoters fight the commercialisation of software, to preserve the right of programmers to observe and understand the working of the software components they work with, and to avoid the private ownership of basic software methods, deemed akin to mathematical ideas.

Software licences

Free software is covered by various types of licences, discussed in .

All open source licenses allow free copying, modification and redistribution of the code. They differ in what one is allowed to do with modified code. Under the GNU General Public License (GPL), the modified code, or "derived" work, must be distributed under a license similar to the GPL. Specifically, it is not possible to take GPL code and use it to construct a closed-source commercial code. For instance, a user interface library released under the GPL could not be used to build the user interface of a closed-source commercial program.

Most other licenses are more liberal. They allow freer usage of the code, including inclusion in commercial code. The Lesser GPL or LGPL, which permits linking a library with commercial code, is such a license. Other licenses give specific rights to the initial developer, such as an exclusive right to link the code with commercial code, as in the Netscape license.

Most scientists use different types of licensing agreement. Usually these forbid commercial use entirely, and in some cases the source is not even released to non-scientists.

How successful is the open source movement in the general software world?

The successes of Linux and then of Apache (the web server software most sites now use) were major events. Another major victory for the open source movement was the decision by Netscape to open the source of its browser. However, it is possible that a turning point has been reached. The strength of the Open Source movement was the topic of a recent Economist survey [1]. The Economist aptly summarised both the exaggerated hype and the sound support for Open Source:

"The Linux hype has undeniably crested. SuSE is not the only company that has had trouble building a viable business based on the free operating system. VA Linux, for example, a start-up which, in late 1999, enjoyed the most successful flotation ever -with shares gaining almost 700% on their first day of trading- recently said it would cut a quarter of its staff and take nine months longer than planned to achieve profitability. Only the
market leader, Red Hat, is doing better than expected. But this should not be read as a sign of the imminent demise of open-source software via Linux, its standard-bearer. Most people in the software industry believe that open-source is here to stay. Steve Ballmer, Microsoft's chief executive, recently called Linux "threat number one". Steven Milunovich, a leading analyst with Merrill Lynch, an investment bank, argues that open-source is a "disruptive technology" that could topple such industry heavyweights as Microsoft and Sun.

In fact, the open-source movement is less about "world domination", which hackers often joke about, and more about an industry which, thanks to the Internet, is learning that there is value in deep co-operation as well as in hard competition. 'Much more than a cause, the open-source movement is an effect of the Internet,' says Tim O'Reilly, head of an eponymous firm that publishes computer books, and a leading open-source thinker."

Why are Fluid Dynamics and Scientific Software not very open?

The closed nature of scientific software is a fact. To explain it, several types of scientific software should be distinguished: there are very small pieces, just a few lines, and huge fluid dynamics codes that may spread over hundreds of thousands of lines. There are also very specialised types of software, for instance a Direct Numerical Simulation (DNS) code used by just one researcher, and general-purpose Computational Fluid Dynamics (CFD) software that may be of interest to the worldwide community of engineers.

Most existing scientific software is of the first sort. A program will be developed to answer a single specific question and be used by a small number of programmer/users, typically numbering in the tens at most. In principle any dedicated academic could write his own simulation software. In practice the codes are typically developed by graduate students and professors, with various degrees of cooperation between them. In my experience, one to four people at most collaborate on such projects, although with time project histories may become much longer. The user base for these codes is very small, typically a dozen similar groups worldwide.

Most of this scientific software is not freely distributed, although some utility programs (computing special functions, linear algebra routines) are included in several freely available libraries such as the GSL or netlib. (Interestingly, the famous Numerical Recipes library is not at all free or open, but only what I later call "visible".) General-purpose code is not open source, with very few exceptions, for instance MOUSE [2]. There are rather potent reasons why academics and their institutions refrain from openly distributing their code.

I examine each in turn.

Additional work

The larger the user and developer base for a project, the more work per line of code there will be. All of us have had the experience of writing code that we would then use ourselves, which may be called a "basic working code". Such code must be debugged for the use we intend to make of it but no more. When such code is to be widely distributed it becomes necessary to provide an extensive set of test cases so that other users may verify that they have a working copy of the code. It is necessary to freeze versions and maintain a repository of the code. Obviously, documentation is necessary to teach distant users how to use the code. A mailing list server is also a useful thing to have at this stage.
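
To make the idea of a distributed test case concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the solver (an explicit scheme for one-dimensional diffusion), the function names diffuse and max_error_sine_decay, and the tolerance are all hypothetical and are not taken from any particular code. The test compares the computed decay of a sine profile with the known analytical solution, so that a distant user can check that his copy behaves as the author's does.

```python
import math


def diffuse(u, nu, dx, dt, steps):
    """Advance u_t = nu * u_xx with an explicit (FTCS) scheme.

    The two end values are held fixed (here they are zero).
    """
    r = nu * dt / dx**2
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for nu*dt/dx^2 > 0.5")
    u = list(u)
    for _ in range(steps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        u = new
    return u


def max_error_sine_decay(n=51, nu=1.0, t_end=0.05):
    """Return the maximum error against the exact solution
    u(x, t) = exp(-pi^2 * nu * t) * sin(pi * x) on the interval [0, 1]."""
    dx = 1.0 / (n - 1)
    dt = 0.25 * dx**2 / nu          # safely inside the stability limit
    steps = round(t_end / dt)
    x = [i * dx for i in range(n)]
    u0 = [math.sin(math.pi * xi) for xi in x]
    u = diffuse(u0, nu, dx, dt, steps)
    t = steps * dt
    decay = math.exp(-math.pi**2 * nu * t)
    exact = [decay * math.sin(math.pi * xi) for xi in x]
    return max(abs(a - b) for a, b in zip(u, exact))


if __name__ == "__main__":
    err = max_error_sine_decay()
    print("maximum error against analytical solution:", err)
    # The tolerance is an assumption chosen for this illustration.
    assert err < 1e-3, "working copy does not reproduce the reference case"
```

A real code would ship many such cases, each with a documented tolerance, so that a failure points to a specific feature rather than to the code as a whole.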

As collaboration with other developers starts, we need to organise this collaboration: specify a development plan, identify subproblems, and document the code in a new way: documentation for developers is not the same as documentation for users.

Brooks [3] estimates that turning a basic working program into a product costs about three times the work required for the basic working code. Moreover, Brooks estimates that when the program is also part of a larger system of interacting programs, the total rises to about nine times the original work for the basic program. In the scientific context, the amount of work probably obeys its own laws. I estimate that testing is much more difficult, because of the long turnaround time between two simulations of a complex system, and the extensive and complex nature of the testing necessary to verify the adherence of the code to the existing science. The resulting factor is probably rather large as well.

Scientists tend to shudder at this additional burden, and indeed it may be unnecessary in the first phases of a project. Many academic programs are written and used by a single graduate student, or a graduate student/professor combination. Documentation is not necessary as new collaborators learn by watching over the shoulder of the other programmers. Some scientific projects will grow, if sufficiently timely and interesting. Generation after generation of graduate students arrive. Commercial applications arise, and collaborations with other universities start. Then better programming practice becomes necessary.

However, at every stage of this process, code could be released without more work than is necessary for the needs of the academic project. If code is to be shared between two people, there is no harm in giving it to a third. He will perhaps be very helpful to you by finding an elusive bug. He may on the other hand bother you with requests for help in understanding the code: it is up to you, the author, to balance how much help you are willing to give. If you give more, he will be encouraged to work with your code, and will work more productively, eventually giving you useful feedback. If you give less, nothing bad will happen. (One fear, however, is that the disgruntled user may give you a bad reputation. I will return to this below.)

Most of the time in my career I have benefited from opening my code and giving it to others. The consequences were very rarely negative, and sometimes fantastically beneficial.
To summarise my advice:

Do the amount of additional work that you would have done anyway to grow your project, no more, no less.

This is the same law at work when we select our academic collaborations. We will not spend all our time in exchanges with colleagues. Sometimes it is beneficial to work alone; other times are for sharing and communicating. Quite independently of the issue of source availability, we decide whether or not to enlarge our collaboration on scientific projects.

Better practice in programming and documentation should be viewed as a tool in your project's interest, not a burden.

Another way to view this issue is that as scientists we wish and indeed need to tailor the size of our collaborations and contacts, not have it imposed on us from outside.

Lost revenue

Scientific software is very rarely fit for commercial use. However there are various ways in which revenue may be extracted. Obviously work derived from academic work and put into a more useful form could be sold commercially.

This is particularly true in fluid mechanics. Simply solving the equations is sometimes not very useful commercially. However, one may add some features of interest to engineers, such as chemical reaction databases, graphical user interfaces and complex meshing tools. That becomes interesting. Scientists are reluctant to abandon the hope of extracting some revenue, some day, from their code. Parent institutions [PI], which own some rights and also expect some of the revenue, share this concern.

The first reply is, if the work was done in an academic environment, that it is a bad idea to mix code development with attempts to gather additional income. I for instance would be extremely loath to collaborate with graduate students, colleagues from my and other institutions, technical staff and so on and then commercialise the code. I feel this is shortchanging all those who collaborated in the project. It creates an unhealthy atmosphere. It is contrary to a nice scientific tradition of sharing thoughts, tools and data.

Nevertheless, I respect colleagues who would like to earn some revenue anyway, and there is also the question of self-employed or non-academic people. And anybody would be upset to see some commercial entity add a Graphical User Interface (GUI) and start making money from their years of work.

Keeping control of the way the code is commercialised is a real issue, but it is compatible with making the source code available in several ways [GPL]. Let me describe three: 1) distribute the source code freely, but with a license that allows only academic or nonprofit use and may forbid redistribution; 2) distribute the code under the GPL, which in short requires that any derived work also be free. This allows use by anyone, not only academics, but limits the right to link with non-free software, as well as other rights; 3) distribute the code under a more liberal license such as the Library GPL.

The choice among the various options is a difficult and complex one. Moreover, the current discussion of these issues is clouded by political considerations and name calling. My advice is: choose whichever strategy suits you best, but in any case the fear of losing revenue should not prevent you from opening your code. Indeed, there are ways of obtaining revenue in all three cases. This is obvious in the first case. In cases 2) and 3) it is less obvious but nevertheless possible. Discussing it would, however, take us too far from the central point.

A potential fear, however, is stealing: what if I distribute my source code with a relatively closed license such as option 1) and somebody violates the license? Well, if you worry about this, you will always worry. What if your graduate student gives the code to a friend who markets it? What if somebody steals your portable computer? In my opinion this type of cheating is a marginal nuisance. But if you think otherwise, then you should balance the amount of lost revenue against the benefit of opening your code to anybody, including your closest collaborators.

Gain or lose reputation by distributing code?

The desire for recognition and reputation among program writers and scientists is real. However, it may have contradictory effects. In the hacker's world, publishing code is a step towards gaining reputation. In the academic world, the fear of losing reputation is an oft-cited reason to avoid revealing code.

Reputation actually seems involved in the behaviour of academics in two ways. There is the fear of losing reputation by being criticised for poorly written code, ill-chosen algorithms and so on. There is also a fear of diverting time and energy that could be used to gain reputation by publishing in scientific journals. It is quite possible that at some point, researchers feel that while they spend time preparing their code for free distribution, or after the distribution has been made, other academics will benefit from the distribution to write their own papers with the results of the code. This is why it is sometimes the custom, when giving code in the academic world, to require that the first paper published using the results of the code bears the name of the code-givers as authors.

The first fear, of lost reputation, is in sharp contrast with habits in the hacker's world. Raymond [4] notices that hackers very rarely criticise each other for sloppy programming:

For very similar reasons, attacking the author rather than the code is not done. ... Bug-hunting and criticism are always project-labelled, not person-labelled ...This makes an interesting contrast with many parts of academia, in which trashing putatively defective work by others is an important mode of gaining reputation. In the hacker culture, such behaviour is rather heavily tabooed ...

As a scientist myself, I recognise there is a grain of truth in this, but I claim that it need not be so. Everybody is better served by a relaxed, respectful atmosphere in the community of scientists. Perhaps the key is, as my mentor Ed Spiegel used to say, that "there are too many of us" [5].

However, to avoid severe criticisms even in the hacker's community, a code must do what it is advertised to do. The following is an excerpt from Raymond [4]:

There are consistent patterns in the way the hacker culture values contributions and returns peer esteem for them. It's not hard to observe the following rules:

1. If it doesn't work as well as I have been led to expect it will, it's no good -- no matter how clever and original it is. (2)

Note the `led to expect'. This rule is not a demand for perfection; beta and experimental software is allowed to have bugs. It's a demand that the user be able to accurately estimate risks from the stage of the project and the developers' representations about it.

This rule underlies the fact that open-source software tends to stay in beta for a long time, and not get even a 1.0 version number until the developers are very sure it will not hand out a lot of nasty surprises. In the closed-source world, Version 1.0 means ``Don't touch this if you're prudent.''; in the open-source world it reads something more like ``The developers are willing to bet their reputations on this.'' (3)

Scientific code is relatively buggy and scientists will of course value "clever and original work". This may explain why many scientists prefer to leave the code closed, perhaps releasing it only after having found the time to clean up the code. This may of course never happen. In my opinion, releasing the code with appropriate warnings would be a better solution. This is just the practice I advocate above: to release the code in whatever condition it is. There is a real need to understand and explain to the community the dynamics of releasing software in the scientific world, so relatively new, imperfect software is not unjustly criticised.

How scientists and hackers differ

The reputation issue is only an outcome of what is, in my opinion, a deeper difference between a programmer and a scientist. A programmer is primarily motivated by the production of a useful software tool, adapted to some conditions of use. He may develop algorithms but this is not his main goal. The program may use a wide variety of algorithms, old and new.

A scientist is in the business of conceiving and promoting a new idea. The new idea may involve a new algorithm and a computer program may be necessary to demonstrate the idea, but the program is not the goal. Rather, the program is one step in the argument that the new idea is valid or interesting. The new idea may lead to an algorithm that will be used in a lot of different contexts, by different programs.

Actually, non-specialists are often better at writing programs than the original authors of a new algorithm or concept, perhaps because they are in a better position to select the best algorithm and focus on the issue of implementing, rather than discovering algorithms.

To gain reputation, the main goal of the scientist is to promote his new ideas through scientific papers, conferences and books. It is sufficient for this to have a crude program that only he understands. Even partially buggy programs may serve his purpose, by demonstrating how things could be done. On the other hand, the goal of the programmer is to have the most useful program, with a large user base. He does not need to have any original idea in his program.

Still, things could be more open

Despite these barriers to the opening of scientific code, I believe there are very good reasons to distribute scientific code more freely.

At the very least, scientific code should be "visible source", that is, available for others to read even when it is not free to modify or redistribute. Making the source visible is akin to publishing the proof of a theorem for a mathematician or, for an experimentalist, opening his lab books for others to see. This is the only way to adhere to scientific standards.

Even when an algorithm is described in a very detailed way in a publication, the actual implementation of it often adds some unforeseen details. Missing these details makes it impossible to understand how the authors have actually performed their work. Only source code can reveal it.
This is the motto of an organisation such as OpenInformatics [OI].

For similar reasons, it is considered unscientific in the academic world to use commercial "black box" code when performing genuine scientific calculations. Often this is just a matter of practicality: commercial code is just not good enough to perform the required work. However, this is also a healthy reaction against performing work on an unknown basis. This reaction should demonstrate the benefit of opening the source code basis of one's simulation work.

Moreover, the gains from visible source, in the form of a large community of potential debuggers, are enormous. This is often not perceived by scientists who wish to remain secretive about their work. The fear of having one's idea "stolen" by others is not restricted to the world of source code: theorists also are sometimes loath to discuss their ideas before they are published. My personal experience, however, is that this is a counterproductive attitude: discussing one's ideas allows one to get rid of the bad ideas and to improve the good ones. In software language, if somebody has a conversation with me and then takes version 0.8 of my idea and publishes it, I may quickly thereafter publish a better version 0.9.

The actual taste for secrecy varies widely among scientists. Some scientists behave like ancient craftsmen or cooks, who like to keep some secrets and reveal them only to selected apprentices. Other scientists are more like artists, for instance painters or performers, who are willing and even eager to perform in public. The craftsman scientist is an introvert, the artist scientist an extrovert.

On selling code

Beyond the visible source stage, I would like to argue that it is not a good idea to try to sell code. It is probably better to distribute it freely under some sort of open source license, whether radically free such as the GPL or just open such as the Lesser GPL. It is inherently difficult to market scientific code, with a very small user base and very specialised applications. Moreover, the situation seems to be evolving towards ever more free or low-cost software.

The case for selling code is stronger in the case of complex, general-purpose CFD applications that may find a large market. However, even in that case selling code creates large difficulties. This is an area where public funding has been abundant in helping academic groups or national labs advance projects. Private firms have difficulty making money in the face of so much subsidised competition. As long as they can benefit from a flow of work from the universities they may remain afloat, but once they have to perform all the development and research work on their own they may become much less profitable. Commercial firms may then argue that their competitors have been unfairly subsidised by public funds.

It is much better to state from the start that the results of research will be made available free of charge to the general public. Moreover, this reduces the incentive for secrecy. If code is to be commercially distributed, the university group often has to enter an agreement with a company that will add a GUI and take care of marketing and maintenance. This company will often require that the source be hidden, even from other academics! [6]

How parent institutions and governments can help

If this analysis is right, code will become more widely available and distributed because it is in the interest of the science community itself. However, several institutional decisions may help this trend.

Helping the academic and engineering community by publishing code should be rewarded in a manner similar to the rewards a scientist gets for publishing papers. Ways to measure the usefulness of published code should be developed in the same way as the impact of scientific papers is measured by citation metrics. The results could and should be used for granting appointments, promotions and tenure.

Institutions should firmly stand behind their academic personnel to defend them against attacks from commercial companies, such as lawsuits over copyright or patent issues. Indeed, patent issues are a potentially severe blow to open source software. Software patents are not yet possible in Europe [7] but are already a dangerous practice in the US. There is some urgency in this matter since the EU plans to develop its legislation on software patents.

The European Union, in particular, should take notice of the advantage that some distribution practices give to the United States. A lot of "visible source" software is export-controlled: only US citizens may obtain and use these codes, often developed by national laboratories. The spread of Open Source CFD code in Europe is one of the remedies in a currently weak situation.

France in particular lags behind in open source [1]. The .fr domain which includes the French academic institutions lags behind others in the number of open source projects. French industry is extremely secretive and this behaviour affects the French academics who often rely on industry for their funding. The French public computer science institute, INRIA, has often preferred to commercialise its products through service companies.
However, initiatives have recently begun to arise in French universities to promote free software. See the Paris 7 initiative. An excellent report on the issues surrounding the development of free software in a university environment may be found there. One of the suggestions is that the university give blanket approval to all its employees to release source under a free software license. To avoid difficulties related to multiple ownership of copyright by collaborators from various universities, the conclusion of agreements between universities is suggested.

Practical steps towards the promotion of Open Source software are relatively low cost. It is not so much an issue of funding (although that always helps) as of attitude. A useful step would be the establishment of tools such as SourceForge dedicated to the scientific world. A potential ScienceWorks repository would couple more visibly the coding aspect to the conceptual aspect, the programming, and the pushing of new ideas. Codes could have links to refereed scientific papers, and codes or projects themselves could be refereed. Codes that pass a certain level of usefulness or quality could be distributed on CD-ROMs or ISO images in a way similar to the Linux distributions.

Further reading and notes

[1] .
The Open Source movement has also inspired Open Science
and Open Content:

[2] A good source of information on CFD codes is
A list of low-cost and free, CFD software may be found on

[3] Frederick Brooks, The Mythical Man-Month, Addison-Wesley. Absolutely recommended reading for anyone wishing to embark on managing a software project. The 20th anniversary edition (1995) contains four additional chapters. Among these "No Silver Bullet" is particularly instructive.

[4] Eric Raymond's writings are generally instructive and have inspired me a great deal.
His famous paper "The Cathedral and the Bazaar" explains how open source software combined with a frequent release policy helps produce better quality software. "Homesteading the Noosphere" discusses in detail the motivations of hackers and their practices in starting and managing projects. The gift economy of the hackers described there is in stark contrast with the practices in the community of scientists. On the other hand, in "The Magic Cauldron" he explains how one can make money in the open source model.

[5] I am indebted to Edward A. Spiegel for many insightful conversations about current and past practice in science. This quote is from a conversation with him where he argued that, in a distant golden age, the community of physicists used to be much more relaxed. Papers were refereed lightly and the scrutiny was strict only when the author was new to the community. Competition for jobs was much less severe, so scientists refrained from the kinds of aggressive behaviour described in this paper: attacking each other's reputations or "stealing" their ideas.

[6] Although this is a personal choice, I believe that the best type of license is a relatively liberal one such as the LGPL. It allows derived work to be distributed as proprietary software, which enlarges the potential community of users to the commercial world. Ideally, a kind of peaceful collaboration between the free-source programmers and the commercial entities should occur. Academic or free source programmers would refrain from viral licenses such as the GPL and commercial entities would refrain from using or enforcing software patents. An informal nonaggression agreement could be based on these lines.

[7] On the issue of software patents and the recent position of the French secretary for industry Christian Pierret, see: or

[8] Similar issues arise in the context of education and teaching activities. Some universities are trying to sell education over the web. By contrast, MIT has decided to release all its course material free of charge. The pros and cons of such decisions certainly have some points in common with those of releasing source code, but would require a separate debate.
More information on issues such as the free access to scientific journal archives may be found on

Additional discussion

Several colleagues have sent me comments that helped improve this paper.

[PI] Andy Manners pointed out to me  many practical problems with copyright and licenses such as the GPL.  The paragraph on "lost revenue" is nevertheless written from the point of view of the scientist. At least in the present stage, I believe that individual scientists should take the lead to argue for a particular policy. They should be motivated first, then lobby their universities to authorize particular modes of code distribution.

This being said, if the creation of software falls within the scope of the employment of the author, then the employer is the copyright owner.  This applies both in the US and France.  In France copyright of software is treated differently from other kinds of copyrights (see "code de la propriété industrielle" on )

This provision is more likely to apply in a university for research software than in some other contexts. For instance a privately employed engineer may be employed for certain tasks that do not include the development of software for, say, mesh generation. Then if he writes a mesh generator, he can claim copyright. On the other hand, if one's main research activity is CFD, it is more dubious, but not impossible, that writing Open Source CFD software falls outside the scope of one's employment.

[GPL] I am grateful to Hughes Talbot who very clearly illustrated how the copyright holder could still commercialize the code while distributing it under the GPL.

[OI] OpenInformatics has a petition to require that publicly funded work become open-source. Their motto is:
We believe that researchers supported by publicly-funded grant agencies should be required, as a condition on funding, to publish any source code under an Open Source or a Free Software license. Such licensing is the software equivalent of peer reviewed publication of research results.

Copyright © 2001   Stéphane Zaleski1
Verbatim copying of this document is permitted, in any medium.

Revision 1.4   October 9  2001
Opinions expressed in this paper are my own and do not necessarily reflect the position of the university or any branch of the French government.

Revision history:

Revision 1.3 June 22,  2001

Revision 1.2 June 20, 2001

various preliminary versions around June 13, 2001.


2 My emphasis

3 The recent debate about Red Hat 7.0 should perhaps dampen this enthusiasm.