Critique of Research Software Engineering

Wenqi He · Research Software Engineer, NCSA, University of Illinois

All human activities are embedded in social relations. Whenever we swallow a morsel of food, we enter into a web of relations with others, such as the farmers who grew it and the workers who processed and transported it, most of them strangers we might never meet. Whenever we put a foot on the pedal, we also enter into a web of relations with strangers: the mechanical engineers who designed the vehicle, the civil engineers who built the roads beneath it, the urban planners who laid out the city, among others. These relations, though impersonal, are the foundations of civilization.

As activities are embedded, context is everything. For instance, one can sing on stage, demonstrate to a student, or sing in the shower — and even when these involve the same technical execution, they are obviously different activities, each serving a different kind of social relation: with an audience, a student, and oneself (if we include this relation with oneself in our definition of “social relations”). Each type of relation carries its own purpose, intent, norms, and value judgements, shaping how we are expected to act, and consequently how we think and feel about what we are doing. Software engineering is no exception. What we do with software does not, on its own, define what we are doing. And yet our received language draws no distinction between the action in isolation and the activity that is socially grounded. It is precisely this omission that motivates the following discussion, which seeks to unpack what the term “research software engineering” fails to capture.

The computer is, in Aristotle’s (or rather, Aquinas’s) terms, a potentia prima: a machine that could become any mechanism we prescribe, whose significance lies entirely in what it could become. A program is at once an actus primus, the realization of the computer’s potential to become a specialized mechanism, and a potentia secunda, since it holds the capacity to transform inputs into outputs without yet having exercised it. A running process is the actus secundus: the realization of the program’s potential to actually transform inputs into outputs in the world.

Software mainly serves two kinds of purpose: instrumental and rhetorical, corresponding respectively to the social relations of producer and consumer, and that of author and audience.

We normally think of software as instrumental, its utility lying entirely in the output it produces from the inputs. The criteria are familiar: for simulations, correctness and reproducibility; for user-facing services, reliability, availability, performance, and security. But software is also an expressive medium, and thus has the capacity not only to transform data but to transmit ideas, as McLuhan and Kay argued.

It is useful, however, to make a further distinction between media and rhetoric. Rhetorical software is created to establish or support an argument. Sometimes the output is the argument — a visualization makes a claim about data, a simulation produces results that validate a theory. But sometimes the running itself is the argument: a prototype that executes convinces the observer, in a way no description could, that the proposed system is technically feasible, and that the underlying idea has practical merit. In both cases the sole criterion is the effectiveness of the argument. Other metrics such as reliability and performance matter only insofar as they contribute to the effectiveness of the argument. A demonstration that crashes before making its point has failed; one that makes its point and subsequently crashes has, for all practical purposes, succeeded.

Not all media are rhetorical. A video game made purely for entertainment is instrumental in the sense that it is produced to satisfy a need, i.e. the need to be entertained. But a game can also convey an intentional message (e.g. Metal Gear Solid 2), which makes it simultaneously a rhetorical device. Our relation with its creators is therefore simultaneously a producer-consumer and an author-audience relation. In this sense, research software is not unlike a video game: it can function both as an instrument that supports research and as a rhetorical device in the service of publications, conference presentations, grant proposals, and other forms of scholarly persuasion.

We could also analyze social relations by the human needs that power them. Some needs can be fulfilled temporarily, but no act of fulfillment extinguishes them, as they are conditions of our being: the need to eat, to sleep, to be entertained, to name but a few. Society gives rise to many more such needs beyond the merely biological — the need to carry out our duties, for instance, is never dissolved by carrying them out (or so we hope). Other needs are contingent: the very act of fulfilling them extinguishes them. The need to have a wedding, for example, is satisfied by simply having the wedding. Some software serves such contingent needs — a data processing script that cleans one particular dataset, a simulation run for a single research project — and once the need is fulfilled, the relation dissolves. We call software that serves contingent needs ephemeral, and software that serves recurrent needs persistent.

Combining these two dimensions gives us four kinds of software:

Ephemeral-rhetorical: e.g. one-off scripts and programs written to produce figures, renderings, interface screenshots, paper supplements.
Ephemeral-instrumental: e.g. migration scripts, data cleaning scripts, analysis pipelines, simulation runs.
Persistent-rhetorical: e.g. research websites, teaching tools, live demos, interactive visualizations, proof-of-concept systems.
Persistent-instrumental: e.g. shared libraries, data pipelines, user-facing platforms, cyberinfrastructure.

These categories are by no means a fixed taxonomy. As social relations shift over time, so too do the categories we might assign to a piece of software; moreover, multiple categories can apply at the same time. A simulation run in the course of research is ephemeral-instrumental; the same simulation, used to generate figures for a paper, is also ephemeral-rhetorical. A proof-of-concept system is persistent-rhetorical, but if people start to incorporate it into their workflows, it becomes, simultaneously, persistent-instrumental. We could even package an ephemeral-instrumental script as a persistent-instrumental service that runs it on demand, or combine ephemeral-instrumental and ephemeral-rhetorical components into a persistent-rhetorical system.
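For readers who think more easily in code, the two dimensions and their overlap can be sketched roughly as follows. This is only an illustration of the framework, not a tool: the names Purpose, Need, Relation, and Software are hypothetical and chosen for this sketch. The one point it is meant to show is that a piece of software holds a set of relations rather than a single fixed label.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import product


class Purpose(Enum):
    INSTRUMENTAL = "instrumental"  # producer-consumer relation
    RHETORICAL = "rhetorical"      # author-audience relation


class Need(Enum):
    EPHEMERAL = "ephemeral"    # serves a contingent need
    PERSISTENT = "persistent"  # serves a recurrent need


@dataclass(frozen=True)
class Relation:
    """One cell of the two-by-two grid (hypothetical name)."""
    need: Need
    purpose: Purpose

    def __str__(self) -> str:
        return f"{self.need.value}-{self.purpose.value}"


@dataclass
class Software:
    """A piece of software and the relations it currently serves."""
    name: str
    relations: set[Relation] = field(default_factory=set)

    def serve(self, need: Need, purpose: Purpose) -> None:
        # Relations accumulate; adding one does not replace the others.
        self.relations.add(Relation(need, purpose))

    def labels(self) -> list[str]:
        return sorted(str(r) for r in self.relations)


# The four kinds are simply the product of the two dimensions.
ALL_KINDS = [Relation(n, p) for n, p in product(Need, Purpose)]

# The same simulation run for research and then reused for paper figures.
simulation = Software("simulation")
simulation.serve(Need.EPHEMERAL, Purpose.INSTRUMENTAL)
simulation.serve(Need.EPHEMERAL, Purpose.RHETORICAL)
print(simulation.labels())

# A proof of concept that people start to rely on in their workflows.
poc = Software("proof-of-concept")
poc.serve(Need.PERSISTENT, Purpose.RHETORICAL)
poc.serve(Need.PERSISTENT, Purpose.INSTRUMENTAL)
print(poc.labels())
```

Modeling the categories as a set is a deliberate choice in this sketch: a single enum field would force exactly the kind of fixed classification that the paragraph above argues against.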

Having laid out this framework, we can now ask more precisely: what exactly distinguishes “research software engineering” from just “software engineering”? Is it merely software engineering in a research context — and if so, what does that really mean? To answer that, we must first understand the Sprachspiel, in Wittgenstein’s terms, that our language about software belongs to.

The software industry is, as a whole, mainly concerned with what we have categorized as persistent-instrumental software — and for good reason: the economic model is sustained by, and actively incentivizes, the cultivation of stable, long-term relations between producers and consumers (not always mutually beneficial, as the case of social media illustrates). This economic Basis and the Überbau of practices and values feed on each other, perpetuating the ideology of “reliability”, “security”, “availability”, “maintainability”, “user experience”, “growth”, “engagement”, and so on. We have internalized this set of values so thoroughly that we take them to be universal — and that is precisely what is insidious about ideology. These familiar words and values did not arise from nowhere, nor do they reflect universal necessity: they grew out of a capitalist mode of production like deep-sea fish, which cannot survive without the pressure and salinity of that particular environment. One could of course recreate those conditions, but simply throwing them into a tank, forgetting their origin, will surely kill them.

Unlike in industry, where the relation is largely fixed and enduring, we work across a disparate collection of short-term projects and grant cycles that span the full spectrum of relations, and those relations shift silently during the course of a project. Consider, for example, a one-off data pipeline that becomes the core dependency of a prototype, which in turn becomes a “production system” (or so it is called), temporarily extended for demo videos. In this particular sequence, nothing ever crosses the boundary from rhetorical software into the category of instrumental. “Production”, “release”, and the like are really misnomers carried over from the language game of industry. Languages do not reveal their own limitations, so without vigilance and scrutiny both the shift in human relations and intents and the misappropriation of vocabulary and values go unnoticed. We are then blindsided when we inevitably find ourselves burnt out by misaligned expectations, wasted effort, endless back-and-forth, and people talking past each other, each playing their own language game with a vocabulary that merely looks and sounds the same.

Let us zoom in to a particular crevice in our language. Take the word “user” in the context of rhetorical software whose aim is to convince a reviewer. Who exactly is the “user”, and what relation do we have with them? If we search outside of language, the so-called “user” is nowhere to be found — an improper definite description, in Russell’s terms, like the bald king of France: apparently meaningful, but with no real referent in the world. Following Russell’s analysis, “the king of France is bald” is false not because there are no bald people, but because there is no king of France. Similarly, “the user wants X” is false not because there is no person who wants X. It is certainly not true that no one wants X — in fact, there is at least one person who wants X: the very person who claims that “the user” wants X, much like a ventriloquist speaking through a dummy, or a politician who “represents” the electorate and makes policies “in their interests.” It is false because there is no user.

Surely the reviewer uses the software, and is therefore, by definition, a user? But singing in the shower does not make you a singer in any real sense of the word. In the language game of the software industry, “user” is never simply used in the sense found in a dictionary, but always refers to a real socioeconomic participant embedded in a bona fide producer-consumer relation. In contrast, when the aim of a piece of software is to convince a reviewer, it is rhetorical in nature, and the reviewer is really an audience rather than a consumer, and it is the consumer whom the industry calls a “user.” A reviewer might temporarily occupy the role of a “user” as a narrative device, much as an actor plays a part, but when the evaluation is complete, the performance ends and they return to their ordinary lives. A reviewer clicking a button is therefore only superficially the same action as a paying user clicking a button. The sociological and psychological forces at play are entirely different. The interaction between the human and the software may appear identical, but the interaction between humans through the software is of an entirely different kind. To equate the two is a category error.

One might object: where is the harm in a little creative imagination? And that is indeed a valid point. After all, the best king in all of history is one without subjects, for there is never a revolt, a protest, or even so much as the slightest grievance against him. And the most beautiful lady in the world is, as no knight would dare dispute, mi señora, la sin par Dulcinea del Toboso. But we should not be reassured by, or even proud of, how innocuous our fantasies are. We should ask instead: what good do they do? If all we do is pour our creativity into make-believe, then perhaps we should consider ourselves part of the entertainment industry.

In some languages there is no lexical distinction between blue and green, and speakers of those languages have been reported to draw that perceptual boundary less sharply than speakers of languages that do make it. This is precisely our predicament: the distinctions we need to clearly delineate our profession simply do not exist in the vocabulary we have inherited from the software industry. The foremost challenge of our profession is therefore not the mechanical execution of designing, coding, testing, and deploying software, nor the unexamined pursuit of virtues such as “security”, “maintainability”, “correctness”, or “user experience”. These virtues are real and important within the language game of the underlying economy, but their relevance is always relative to the actual social relations the software is grounded in. To properly invest our limited time and effort, we must develop the diagnostic capacity to ask, before anything else: what relation does this software serve, and what does that relation demand of us? In a nascent profession such as ours, the answers are not yet given, and so we must articulate them ourselves. And to articulate them, we must first invent the language, because, at the end of the day, as Wittgenstein said,

Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt. (The limits of my language mean the limits of my world.)