Free Software would be nothing without shared source code. The idea is built into the very name “Open Source,” and it is a requirement of all Free Software licenses that source code be open to view, not “welded shut.” Perhaps ironically, source code is the most material of the five components of Free Software; it is both an expressive medium, like writing or speech, and a tool that performs concrete actions. It is a mnemonic that translates between the illegible electron-speed doings of our machines and our lingering ability to partially understand and control them as human agents. Many Free Software programmers and advocates suggest that “information wants to be free” and that sharing is a natural condition of human life, but I argue something contrary: sharing produces its own kind of moral and technical order, that is, “information makes people want freedom” and how they want it is related to how that information is created and circulated. In this chapter I explore the [pg 119] twisted and contingent history of how source code and its sharing have come to take the technical, legal, and pedagogical forms they have today, and how the norms of sharing have come to seem so natural to geeks.
Source code is essential to Free Software because of the historically specific ways in which it has come to be shared, “ported,” and “forked.” Nothing about the nature of source code requires that it be shared, either by corporations for whom secrecy and jealous protection are the norm or by academics and geeks for whom source code is usually only one expression, or implementation, of a greater idea worth sharing. However, in the last thirty years, norms of sharing source code—technical, legal, and pedagogical norms—have developed into a seemingly natural practice. They emerged through attempts to make software into a product, such as IBM’s 1968 “unbundling” of software and hardware, through attempts to define and control it legally through trade secret, copyright, and patent law, and through attempts to teach engineers how to understand and to create more software.
The story of the norms of sharing source code is, not by accident, also the history of the UNIX operating system.
The proliferation of UNIX was also a hybrid commercial-academic undertaking: it was neither a “public domain” object shared solely among academics, nor was it a conventional commercial product. Proliferation occurred through novel forms of academic sharing as well as through licensing schemes constrained by the peculiar status of AT&T, a regulated monopoly forbidden to enter the computer and software industry before 1984. Thus proliferation was not mere replication: it was not the sale of copies of UNIX, but a complex web of shared and re-shared chunks of source code, and the reimplementation of an elegant and simple conceptual scheme. As UNIX proliferated, it was stabilized in multiple ways: by academics seeking to keep it whole and self-compatible through contributions of source code; by lawyers at AT&T seeking to define boundaries that mapped onto laws, licenses, versions, and regulations; and by professors seeking to define it as an exemplar of the core concepts of operating-system theory. In all these ways, UNIX was a kind of primal recursive public, drawing together people for whom the meaning of their affiliation was the use, modification, and stabilization of UNIX.
The obverse of proliferation is differentiation: forking. UNIX is admired for its integrity as a conceptual thing and despised (or marveled at) for its truly tangled genealogical tree of ports and forks: new versions of UNIX, some based directly on the source code, some not, some licensed directly from AT&T, some sublicensed or completely independent.
Forking, like random mutation, has had both good and bad effects; on the one hand, it ultimately created versions of UNIX that were not compatible with themselves (a kind of autoimmune response), but it also allowed the merger of UNIX and the Arpanet, creating a situation wherein UNIX operating systems came to be not only the paradigm of operating systems but also the paradigm of networked computers, through its intersection with the development of the TCP/IP protocols that are at the core of the Internet.
In the early days of computing machinery, there was no such thing as source code. Alan Turing purportedly liked to talk to the machine in binary. Grace Hopper, who invented an early compiler, worked as close to the Harvard Mark I as she could get: flipping switches and plugging and unplugging relays that made up the “code” of what the machine would do. Such mechanical and meticulous work hardly merits the terms reading and writing; there were no GOTO statements, no line numbers, only calculations that had to be translated from the pseudo-mathematical writing of engineers and human computers to a physical or mechanical configuration.
There is a certain irony about the computer, not often noted: the unrivaled power of the computer, if the ubiquitous claims are believed, rests on its general programmability; it can be made to do any calculation, in principle. The so-called universal Turing machine provides the mathematical proof.
In the good old days of computers-the-size-of-rooms, the languages that humans used to program computers were mnemonics; they did not exist in the computer, but on a piece of paper or a specially designed code sheet. The code sheet gave humans who were not Alan Turing a way to keep track of, to share with other humans, and to think systematically about the invisible light-speed calculations of a complicated device. Such mnemonics needed to be “coded” on punch cards or tape; if engineers conferred, they conferred over sheets of paper that matched up with wires, relays, and switches—or, later, printouts of the various machine-specific codes that represented program and data.
With the introduction of programming languages, the distinction between a “source” language and a “target” language entered the practice: source languages were “translated” into the illegible target language of the machine. Such higher-level source languages were still mnemonics of sorts—they were certainly easier for humans to read and write, mostly on yellowing tablets of paper or special code sheets—but they were also structured enough that a source language could be input into a computer and translated into a target language which the designers of the hardware had specified. Inputting commands and cards and source code required a series of actions specific to each machine: a particular card reader or, later, a keypunch with a particular “editor” for entering the commands. Properly input and translated source code provided the machine with an assembled binary program that would, in fact, run (calculate, operate, control). It was a separation, an abstraction that allowed for a certain division of labor between the ingenious human authors and the fast and mechanical translating machines.
Even after the invention of programming languages, programming “on” a computer—sitting at a glowing screen and hacking through the night—was still a long time in coming. For example, only by about 1969 was it possible to sit at a keyboard, write source code, instruct the computer to compile it, then run the program—all without leaving the keyboard—an activity that was all but unimaginable in the early days of “batch processing.”
The proliferation of different machines with different architectures drove a desire, among academics especially, for the standardization of programming languages, not so much because any single language was better than another, but because it seemed necessary to most engineers and computer users to share an emerging corpus of algorithms, solutions, and techniques of all kinds, necessary to avoid reinventing the wheel with each new machine. Algol, a streamlined language suited to algorithmic and algebraic representations, emerged in the early 1960s as a candidate for international standardization. Other languages competed on different strengths: FORTRAN and COBOL for general business use; LISP for symbolic processing. At the same time, the desire for a standard “higher-level” language necessitated a bestiary of translating programs: compilers, parsers, lexical analyzers, and other tools designed to transform the higher-level (human-readable) language into a machine-specific lower-level language, that is, machine language, assembly language, and ultimately the mystical zeroes and ones that course through our machines. The idea of a standard language and the necessity of devising specific tools for translation are the origin of the problem of portability: the ability to move software—not just good ideas, but actual programs, written in a standard language—from one machine to another.
A standard source language was seen as a way to counteract the proliferation of different machines with subtly different architectures. Portable source code would allow programmers to imagine their programs as ships, stopping in at ports of call, docking on different platforms, but remaining essentially mobile and unchanged by these port-calls. Portable source code became the Esperanto of humans who had wrought their own Babel of tribal hardware machines.
Meanwhile, for the computer industry in the 1960s, portable source code was largely a moot point. Software and hardware were [pg 124] two sides of single, extremely expensive coin—no one, except engineers, cared what language the code was in, so long as it performed the task at hand for the customer. Each new machine needed to be different, faster, and, at first, bigger, and then smaller, than the last. The urge to differentiate machines from each other was not driven by academic experiment or aesthetic purity, but by a demand for marketability, competitive advantage, and the transformation of machines and software into products. Each machine had to do something really well, and it needed to be developed in secret, in order to beat out the designs and innovations of competitors. In the 1950s and 1960s the software was a core component of this marketable object; it was not something that in itself was differentiated or separately distributed—until IBM’s famous decision in 1968 to “unbundle” software and hardware.
Before the 1970s, employees of a computer corporation wrote software in-house. The machine was the product, and the software was just an extra line-item on the invoice. IBM was not the first to conceive of software as an independent product with its own market, however. Two companies, Informatics and Applied Data Research, had explored the possibilities of a separate market in software.
IBM’s unbundling decision marked a watershed, the point at which “portable” source code became a conceivable idea, if not a practical reality, to many in the industry.
Portability in the business world meant something specific, however. Even if software could be made portable at a technical level—transferable between two different IBM machines—this was certainly no guarantee that it would be portable between customers. One company’s accounting program, for example, may not suit another’s practices. Portability was therefore hindered both by the diversity of machine architectures and by the diversity of business practices and organization. IBM and other manufacturers therefore saw no benefit to standardizing source code, as it could only provide an advantage to competitors.
Portability was thus not simply a technical problem—the problem of running one program on multiple architectures—but also a kind of political-economic problem. The meaning of product was not always the same as the meaning of hardware or software, but was usually some combination of the two. At that early stage, the outlines of a contest over the meaning of portable or shareable source code are visible, both in the technical challenges of creating high-level languages and in the political-economic challenges that corporations faced in creating distinctive proprietary products.
Set against this backdrop, the invention, success, and proliferation of the UNIX operating system seems quite monstrous, an aberration of both academic and commercial practice that should have failed in both realms, instead of becoming the most widely used portable operating system in history and the very paradigm of an “operating system” in general. The story of UNIX demonstrates how portability became a reality and how the particular practice of sharing UNIX source code became a kind of de facto standard in its wake.
UNIX was first written in 1969 by Ken Thompson and Dennis Ritchie at Bell Telephone Labs in Murray Hill, New Jersey. UNIX was the dénouement of the MIT project Multics, which Bell Labs had funded in part and to which Ken Thompson had been assigned. Multics was one of the earliest complete time-sharing operating systems, a demonstration platform for a number of early innovations in time-sharing (multiple simultaneous users on one computer).
The absence of an economic or corporate mandate for Thompson’s and Ritchie’s creativity and labor was not unusual for Bell Labs; researchers were free to work on just about anything, so long as it possessed some kind of vague relation to the interests of AT&T. However, the lack of funding for a more powerful machine did restrict the kind of work Thompson and Ritchie could accomplish. In particular, it influenced the design of the system, which was oriented toward a super-slim control unit (a kernel) that governed the basic operation of the machine and an expandable suite of small, independent tools, each of which did one thing well and which could be strung together to accomplish more complex and powerful tasks.
UNIX was unique for many technical reasons, but also for a specific economic reason: it was never quite academic and never quite commercial. Martin Campbell-Kelly notes that UNIX was a “non-proprietary operating system of major significance.”
Being on the border of business and academia meant that UNIX was, on the one hand, shielded from the demands of management and markets, allowing it to achieve the conceptual integrity that made it so appealing to designers and academics. On the other, it also meant that AT&T treated it as a potential product in the emerging software industry, which included new legal questions from a changing intellectual-property regime, novel forms of marketing and distribution, and new methods of developing, supporting, and distributing software.
Despite this borderline status, UNIX was a phenomenal success. The reasons why UNIX was so popular are manifold; it was widely admired aesthetically, for its size, and for its clever design and tools. But the fact that it spread so widely and quickly is testament also to the existing community of eager computer scientists and engineers (and a few amateurs) onto which it was bootstrapped, users for whom a powerful, flexible, low-cost, modifiable, and fast operating system was a revelation of sorts. It was an obvious alternative to the complex, poorly documented, buggy operating systems that routinely shipped standard with the machines that universities and research organizations purchased. “It worked,” in other words.
A key feature of the popularity of UNIX was the inclusion of the source code. When Bell Labs licensed UNIX, they usually provided a tape that contained the documentation (i.e., documentation that [pg 128] was part of the system, not a paper technical manual external to it), a binary version of the software, and the source code for the software. The practice of distributing the source code encouraged people to maintain it, extend it, document it—and to contribute those changes to Thompson and Ritchie as well. By doing so, users developed an interest in maintaining and supporting the project precisely because it gave them an opportunity and the tools to use their computer creatively and flexibly. Such a globally distributed community of users organized primarily by their interest in maintaining an operating system is a precursor to the recursive public, albeit confined to the world of computer scientists and researchers with access to still relatively expensive machines. As such, UNIX was not only a widely shared piece of quasi-commercial software (i.e., distributed in some form other than through a price-based retail market), but also the first to systematically include the source code as part of that distribution as well, thus appealing more to academics and engineers.
Throughout the 1970s, the low licensing fees, the inclusion of the source code, and its conceptual integrity meant that UNIX was ported to a remarkable number of other machines. In many ways, academics found it just as appealing, if not more, to be involved in the creation and improvement of a cutting-edge system by licensing and porting the software themselves, rather than by having it provided to them, without the source code, by a company. Peter Salus, for instance, suggests that people experienced the lack of support from Bell Labs as a kind of spur to develop and share their own fixes. The means by which source code was shared, and the norms and practices of sharing, porting, forking, and modifying source code were developed in this period as part of the development of UNIX itself—the technical design of the system facilitates and in some cases mirrors the norms and practices of sharing that developed: operating systems and social systems.
Over the course of 1974-77 the spread and porting of UNIX was phenomenal for an operating system that had no formal system of distribution and no official support from the company that owned it, and that evolved in a piecemeal way through the contributions [pg 129] of people from around the world. By 1975, a user’s group had developed: USENIX.
The manner in which the circulation of and contribution to UNIX occurred is not well documented, but it includes both technical and pedagogical forms of sharing. On the technical side, distribution took a number of forms, both in resistance to AT&T’s attempts to control it and facilitated by its unusually liberal licensing of the software. On the pedagogical side, UNIX quickly became a paradigmatic object for computer-science students precisely because it was a working operating system that included the source code and that was simple enough to explore in a semester or two.
In A Quarter Century of UNIX Salus provides a couple of key stories (from Ken Thompson and Lou Katz) about how exactly the technical sharing of UNIX worked, how sharing, porting, and forking can be distinguished, and how it was neither strictly legal nor deliberately illegal in this context. First, from Ken Thompson: “The first thing to realize is that the outside world ran on releases of UNIX (V4, V5, V6, V7) but we did not. Our view was a continuum. V5 was what we had at some point in time and was probably out of date simply by the activity required to put it in shape to export. After V6, I was preparing to go to Berkeley to teach for a year. I was putting together a system to take. Since it was almost a release, I made a diff with V6 [a tape containing only the differences between the last release and the one Ken was taking with him]. On the way to Berkeley I stopped by Urbana-Champaign to keep an eye on Greg Chesson. . . . I left the diff tape there and I told him that I wouldn’t mind if it got around.”
The need for a magnetic tape to “get around” marks the difference between the 1970s and the present: the distribution of software involved both the material transport of media and the digital copying of information. The desire to distribute bug fixes (the “diff ” tape) resonates with the future emergence of Free Software: the [pg 130] fact that others had fixed problems and contributed them back to Thompson and Ritchie produced an obligation to see that the fixes were shared as widely as possible, so that they in turn might be ported to new machines. Bell Labs, on the other hand, would have seen this through the lens of software development, requiring a new release, contract renegotiation, and a new license fee for a new version. Thompson’s notion of a “continuum,” rather than a series of releases also marks the difference between the idea of an evolving common set of objects stewarded by multiple people in far-flung locales and the idea of a shrink-wrapped “productized” software package that was gaining ascendance as an economic commodity at the same time. When Thompson says “the outside world,” he is referring not only to people outside of Bell Labs but to the way the world was seen from within Bell Labs by the lawyers and marketers who would create a new version. For the lawyers, the circulation of source code was a problem because it needed to be stabilized, not so much for commercial reasons as for legal ones—one license for one piece of software. Distributing updates, fixes, and especially new tools and additions written by people who were not employed by Bell Labs scrambled the legal clarity even while it strengthened the technical quality. Lou Katz makes this explicit.
A large number of bug fixes was collected, and rather than issue them one at a time, a collection tape (“the 50 fixes”) was put together by Ken [the same “diff tape,” presumably]. Some of the fixes were quite important, though I don’t remember any in particular. I suspect that a significant fraction of the fixes were actually done by non-Bell people. Ken tried to send it out, but the lawyers kept stalling and stalling and stalling. Finally, in complete disgust, someone “found a tape on Mountain Avenue” [the location of Bell Labs] which had the fixes. When the lawyers found out about it, they called every licensee and threatened them with dire consequences if they didn’t destroy the tape, after trying to find out how they got the tape. I would guess that no one would actually tell them how they came by the tape (I didn’t).
Distributing the fixes involved not just a power struggle between the engineers and management, but was in fact clearly motivated by the fact that, as Katz says, “a significant fraction of the fixes were done by non-Bell people.” This meant two things: first, that there was an obvious incentive to return the updated system to these [pg 131] people and to others; second, that it was not obvious that AT&T actually owned or could claim rights over these fixes—or, if they did, they needed to cover their legal tracks, which perhaps in part explains the stalling and threatening of the lawyers, who may have been buying time to make a “legal” version, with the proper permissions.
The struggle should be seen not as one between the rebel forces of UNIX development and the evil empire of lawyers and managers, but as a struggle between two modes of stabilizing the object known as UNIX. For the lawyers, stability implied finding ways to make UNIX look like a product that would meet the existing legal framework and the peculiar demands of being a regulated monopoly unable to freely compete with other computer manufacturers; the ownership of bits and pieces, ideas and contributions had to be strictly accountable. For the programmers, stability came through redistributing the most up-to-date operating system and sharing all innovations with all users so that new innovations might also be portable. The lawyers saw urgency in making UNIX legally stable; the engineers saw urgency in making UNIX technically stable and compatible with itself, that is, to prevent the forking of UNIX, the death knell for portability. The tension between achieving legal stability of the object and promoting its technical portability and stability is one that has repeated throughout the life of UNIX and its derivatives—and that has ramifications in other areas as well.
The identity and boundaries of UNIX were thus intricately formed through its sharing and distribution. Sharing produced its own form of moral and technical order. Troubling questions emerged immediately: were the versions that had been fixed, extended, and expanded still UNIX, and hence still under the control of AT&T? Or were the differences great enough that something else (not-UNIX) was emerging? If a tape full of fixes, contributed by non-Bell employees, was circulated to people who had licensed UNIX, and those fixes changed the system, was it still UNIX? Was it still UNIX in a legal sense or in a technical sense or both? While these questions might seem relatively scholastic, the history of the development of UNIX suggests something far more interesting: just about every possible modification has been made, legally and technically, but the concept of UNIX has remained remarkably stable.
Technical portability accounts for only part of UNIX’s success. As a pedagogical resource, UNIX quickly became an indispensable tool for academics around the world. As it was installed and improved, it was taught and learned. The fact that UNIX spread first to university computer-science departments, and not to businesses, government, or nongovernmental organizations, meant that it also became part of the core pedagogical practice of a generation of programmers and computer scientists; over the course of the 1970s and 1980s, UNIX came to exemplify the very concept of an operating system, especially time-shared, multi-user operating systems. Two stories describe the porting of UNIX from machines to minds and illustrate the practice as it developed and how it intersected with the technical and legal attempts to stabilize UNIX as an object: the story of John Lions’s Commentary on Unix 6th Edition and the story of Andrew Tanenbaum’s Minix.
The development of a pedagogical UNIX lent a new stability to the concept of UNIX as opposed to its stability as a body of source code or as a legal entity. The porting of UNIX was so successful that even in cases where a ported version of UNIX shares none of the same source code as the original, it is still considered UNIX. The monstrous and promiscuous nature of UNIX is most clear in the stories of Lions and Tanenbaum, especially when contrasted with the commercial, legal, and technical integrity of something like Microsoft Windows, which generally exists in only a small number of forms (NT, ME, XP, 95, 98, etc.), possessing carefully controlled source code, immured in legal protection, and distributed only through sales and service packs to customers or personal-computer manufacturers. While Windows is much more widely used than UNIX, it is far from having become a paradigmatic pedagogical object; its integrity is predominantly legal, not technical or pedagogical. Or, in pedagogical terms, Windows is to fish as UNIX is to fishing lessons.
Lions’s Commentary is also known as “the most photocopied document in computer science.” Lions was a researcher and senior lecturer at the University of New South Wales in the early 1970s; after reading the first paper by Ritchie and Thompson on UNIX, he convinced his colleagues to purchase a license from AT&T.
Lions’s commentary was a unique document in the world of computer science, containing a kind of key to learning about a central component of the computer, one that very few people would have had access to in the 1970s. It shows how UNIX was ported not only to machines (which were scarce) but also to the minds of young researchers and student programmers (which were plentiful). Several generations of both academic computer scientists and students who went on to work for computer or software corporations were trained on photocopies of UNIX source code, with a whiff of toner and illicit circulation: a distributed operating system in the textual sense.
Unfortunately, Commentary was also legally restricted in its distribution. AT&T and Western Electric, in hopes that they could maintain trade-secret status for UNIX, allowed only very limited circulation of the book. At first, Lions was given permission to distribute single copies only to people who already possessed a license for UNIX V6; later Bell Labs itself would distribute Commentary [pg 134] briefly, but only to licensed users, and not for sale, distribution, or copying. Nonetheless, nearly everyone seems to have possessed a dog-eared, nth-generation copy. Peter Reintjes writes, “We soon came into possession of what looked like a fifth generation photocopy and someone who shall remain nameless spent all night in the copier room spawning a sixth, an act expressly forbidden by a carefully worded disclaimer on the first page. Four remarkable things were happening at the same time. One, we had discovered the first piece of software that would inspire rather than annoy us; two, we had acquired what amounted to a literary criticism of that computer software; three, we were making the single most significant advancement of our education in computer science by actually reading an entire operating system; and four, we were breaking the law.”
Thus, these generations of computer-science students and academics shared a secret—a trade secret become open secret. Every student who learned the essentials of the UNIX operating system from a photocopy of Lions’s commentary, also learned about AT&T’s attempt to control its legal distribution on the front cover of their textbook. The parallel development of photocopying has a nice resonance here; together with home cassette taping of music and the introduction of the video-cassette recorder, photocopying helped drive the changes to copyright law adopted in 1976.
Thirty years later, and long after the source code in it had been completely replaced, Lions’s Commentary is still widely admired by geeks. Even though Free Software has come full circle in providing students with an actual operating system that can be legally studied, taught, copied, and implemented, the kind of “literary criticism” that Lions’s work represents is still extremely rare; even reading obsolete code with clear commentary is one of the few ways to truly understand the design elements and clever implementations that made the UNIX operating system so different from its predecessors and even many of its successors, few, if any of which have been so successfully ported to the minds of so many students.
Lions’s Commentary contributed to the creation of a worldwide community of people whose connection to each other was formed by a body of source code, both in its implemented form and in its textual, photocopied form. This nascent recursive public not only understood itself as belonging to a technical elite which was constituted by its creation, understanding, and promotion of a particular [pg 135] technical tool, but also recognized itself as “breaking the law,” a community constituted in opposition to forms of power that governed the circulation, distribution, modification, and creation of the very tools they were learning to make as part of their vocation. The material connection shared around the world by UNIX-loving geeks to their source code is not a mere technical experience, but a social and legal one as well.
Lions was not the only researcher to recognize that teaching the source code was the swiftest route to comprehension. The other story of the circulation of source code concerns Andrew Tanenbaum, a well-respected computer scientist and an author of standard textbooks on computer architecture, operating systems, and networking.
Minix became as widely used in the 1980s as a teaching tool as Lions’s source code had been in the 1970s. According to Tanenbaum, the Usenet group comp.os.minix had reached 40,000 members by the late 1980s, and he was receiving constant suggestions for changes and improvements to the operating system. His own commitment to teaching meant that he incorporated few of these suggestions, an effort to keep the system simple enough to be printed in a textbook and understood by undergraduates. Minix [pg 136] was freely available as source code, and it was a fully functioning operating system, even a potential alternative to UNIX that would run on a personal computer. Here was a clear example of the conceptual integrity of UNIX being communicated to another generation of computer-science students: Tanenbaum’s textbook is not called “UNIX Operating Systems”—it is called Operating Systems. The clear implication is that UNIX represented the clearest example of the principles that should guide the creation of any operating system: it was, for all intents and purposes, state of the art even twenty years after it was first conceived.
Minix was not commercial software, but nor was it Free Software. It was copyrighted and controlled by Tanenbaum’s publisher, Prentice Hall. Because it used no AT&T source code, Minix was also legally independent, a legal object of its own. The fact that it was intended to be legally distinct from, yet conceptually true to UNIX is a clear indication of the kinds of tensions that govern the creation and sharing of source code. The ironic apotheosis of Minix as the pedagogical gold standard for studying UNIX came in 1991-92, when a young Linus Torvalds created a “fork” of Minix, also rewritten from scratch, that would go on to become the paradigmatic piece of Free Software: Linux. Tanenbaum’s purpose for Minix was that it remain a pedagogically useful operating system—small, concise, and illustrative—whereas Torvalds wanted to extend and expand his version of Minix to take full advantage of the kinds of hardware being produced in the 1990s. Both, however, were committed to source-code visibility and sharing as the swiftest route to complete comprehension of operating-systems principles.
Tanenbaum’s need to produce Minix was driven by a desire to share the source code of UNIX with students, a desire AT&T was manifestly uncomfortable with and which threatened the trade-secret status of their property. The fact that Minix might be called a fork of UNIX is a key aspect of the political economy of operating systems and social systems. Forking generally refers to the creation of new, modified source code from an original base of source code, resulting in two distinct programs with the same parent. Whereas the modification of an engine results only in a modified engine, the [pg 137] modification of source code implies differentiation and reproduction, because of the ease with which it can be copied.
How could Minix—a complete rewrite—still be considered the same object? Considered solely from the perspective of trade-secret law, the two objects were distinct, though from the perspective of copyright there was perhaps a case for infringement, although AT&T did not rely on copyright as much as on trade secret. From a technical perspective, the functions and processes that the software accomplishes are the same, but the means by which they are coded to do so are different. And from a pedagogical standpoint, the two are identical—they exemplify certain core features of an operating system (file-system structure, memory paging, process management)—all the rest is optimization, or bells and whistles. Understanding the nature of forking requires also that UNIX be understood from a social perspective, that is, from the perspective of an operating system created and modified by user-developers around the world according to particular and partial demands. It forms the basis for the emergence of a robust recursive public.
One of the more important instances of the forking of UNIX’s perambulatory source code and the developing community of UNIX co-developers is the story of the Berkeley Software Distribution and its incorporation of the TCP/IP protocols. In 1975 Ken Thompson took a sabbatical in his hometown of Berkeley, California, where he helped members of the computer-science department with their installations of UNIX, arriving with V6 and the “50 bug fixes” diff tape. Ken had begun work on a compiler for the Pascal programming language that would run on UNIX, and this work was taken up by two young graduate students: Bill Joy and Chuck Hartley. (Joy would later co-found Sun Microsystems, one of the most successful UNIX-based workstation companies in the history of the industry.)
Joy, above nearly all others, enthusiastically participated in the informal distribution of source code. With a popular and well-built Pascal system, and a new text editor called ex (later vi), he created the Berkeley Software Distribution (BSD), a set of tools that could be used in combination with the UNIX operating system. They were extensions to the original UNIX operating system, but not a complete, rewritten version that might replace it. By all accounts, Joy served as a kind of one-man software-distribution house, making tapes and posting them, taking orders and cashing checks—all in [pg 138] addition to creating software.
According to Don Libes, Bell Labs allowed Berkeley to distribute its extensions to UNIX so long as the recipients also had a license from Bell Labs for the original UNIX (an arrangement similar to the one that governed Lions’s Commentary).
As BSD developed, it gained different kinds of functionality than the UNIX from which it was spawned. The most significant development was the inclusion of code that allowed it to connect computers to the Arpanet, using the TCP/IP protocols designed by Vinton Cerf and Robert Kahn. The TCP/IP protocols were a key feature of the Arpanet, overseen by the Information Processing and Techniques Office (IPTO) of the Defense Advanced Research Projects Agency (DARPA) from its inception in 1967 until about 1977. The goal of the protocols was to allow different networks, each with its own machines and administrative boundaries, to be connected to each other.
To achieve the benefits of TCP/IP, the resources needed to be implemented in all of the different operating systems that were connected to the Arpanet—whatever operating system and machine happened to be in use at each of the nodes. However, by 1977, the original machines used on the network were outdated and increasingly difficult to maintain and, according to Kirk McKusick, the greatest expense was that of porting the old protocol software to new machines. Hence, IPTO decided to pursue in part a strategy of achieving coordination at the operating-system level, and they chose UNIX as one of the core platforms on which to standardize. In short, they had seen the light of portability. In about 1978 IPTO granted a contract to Bolt, Beranek, and Newman (BBN), one of the original Arpanet contractors, to integrate the TCP/IP protocols into the UNIX operating system.
But then something odd happened, according to Salus: “An initial prototype was done by BBN and given to Berkeley. Bill [Joy] immediately started hacking on it because it would only run an Ethernet at about 56K/sec utilizing 100% of the CPU on a 750. . . . Bill lobotomized the code and increased its performance to on the order of 700KB/sec. This caused some consternation with BBN when they came in with their ‘finished’ version, and Bill wouldn’t accept it. There were battles for years after, about which version would be in the system. The Berkeley version ultimately won.”
Although it is not clear, it appears BBN intended to give Joy the code in order to include it in his BSD version of UNIX for distribution, and that Joy and collaborators intended to cooperate with Rob Gurwitz of BBN on a final implementation, but Berkeley insisted on “improving” the code to make it perform more to their needs, and BBN apparently dissented from this.
Forking, however, does not imply permanent divergence, and the continual improvement, porting, and sharing of software can have odd consequences when forks occur. On the one hand, there are particular pieces of source code: they must be identifiable and exact, and prepended with a copyright notice, as was the case of the Berkeley code, which was famously and vigorously policed by the University of California regents, who allowed for a very liberal distribution of BSD code on the condition that the copyright notice was retained. On the other hand, there are particular named collections of code that work together (e.g., UNIX™, or DARPA-approved UNIX, or later, Certified Open Source [sm]) and are often identified by a trademark symbol intended, legally speaking, to differentiate products, not to assert ownership of particular instances of a product.
The odd consequence is this: Bill Joy’s specific TCP/IP code was incorporated not only into BSD UNIX, but also into other versions of UNIX, including the UNIX distributed by AT&T (which had originally licensed UNIX to Berkeley) with the Berkeley copyright notice removed. This bizarre, tangled bank of licenses and code resulted in a famous suit and countersuit between AT&T and Berkeley, in which the intricacies of this situation were sorted out.
The BSD fork of UNIX (and the subfork of TCP/IP) was only one of many to come. By the early 1980s, a proliferation of UNIX forks had emerged and would be followed shortly by a very robust commercialization. At the same time, the circulation of source code started to slow, as corporations began to compete by adding features and creating hardware specifically designed to run UNIX (such as the Sun Sparc workstation and the Solaris operating system, the result of Joy’s commercialization of BSD in the 1980s). The question of how to make all of these versions work together eventually became the subject of the open-systems discussions that would dominate the workstation and networking sectors of the computer [pg 141] market from the early 1980s to 1993, when the dual success of Windows NT and the arrival of the Internet into public consciousness changed the fortunes of the UNIX industry.
A second, and more important, effect of the struggle between BBN and BSD was simply the widespread adoption of the TCP/IP protocols. An estimated 98 percent of computer-science departments in the United States and many such departments around the world incorporated the TCP/IP protocols into their UNIX systems and gained instant access to Arpanet.
The UNIX operating system is not just a technical achievement; it is the creation of a set of norms for sharing source code in an unusual environment: quasi-commercial, quasi-academic, networked, and planetwide. Sharing UNIX source code has taken three basic forms: porting source code (transferring it from one machine to another); teaching source code, or “porting” it to students in a pedagogical setting where the use of an actual working operating system vastly facilitates the teaching of theory and concepts; and forking source code (modifying the existing source code to do something new or different). This play of proliferation and differentiation is essential to the remarkably stable identity of UNIX, but that identity exists in multiple forms: technical (as a functioning, self-compatible operating system), legal (as a license-circumscribed version subject to intellectual property and commercial law), and pedagogical (as a conceptual exemplar, the paradigm of an operating system). Source code shared in this manner is essentially unlike any other kind of [pg 142] source code in the world of computers, whether academic or commercial. It raises troubling questions about standardization, about control and audit, and about legitimacy that haunts not only UNIX but the Internet and its various “open” protocols as well.
Sharing source code in Free Software looks the way it does today because of UNIX. But UNIX looks the way it does not because of the inventive genius of Thompson and Ritchie, or the marketing and management brilliance of AT&T, but because sharing produces its own kind of order: operating systems and social systems. The fact that geeks are wont to speak of “the UNIX philosophy” means that UNIX is not just an operating system but a way of organizing the complex relations of life and work through technical means; a way of charting and breaching the boundaries between the academic, the aesthetic, and the commercial; a way of implementing ideas of a moral and technical order. What’s more, as source code comes to include more and more of the activities of everyday communication and creation—as it comes to replace writing and supplement thinking—the genealogy of its portability and the history of its forking will illuminate the kinds of order emerging in practices and technologies far removed from operating systems—but tied intimately to the UNIX philosophy.
Copyright: 2008 Duke University Press
Printed in the United States of America on acid-free paper ∞
Designed by C. H. Westmoreland
Typeset in Charis (an Open Source font) by Achorn International
Library of Congress Cataloging-in-Publication data and republication acknowledgments appear on the last printed pages of this book.
License: Licensed under the Creative Commons Attribution-NonCommercial-Share Alike License, available at https://creativecommons.org/licenses/by-nc-sa/3.0/ or by mail from Creative Commons, 559 Nathan Abbott Way, Stanford, Calif. 94305, U.S.A. "NonCommercial" as defined in this license specifically excludes any sale of this work or any portion thereof for money, even if sale does not result in a profit by the seller or if the sale is by a 501(c)(3) nonprofit or NGO.
Duke University Press gratefully acknowledges the support of HASTAC (Humanities, Arts, Science, and Technology Advanced Collaboratory), which provided funds to help support the electronic interface of this book.
Two Bits is accessible on the Web at twobits.net.
≅ SiSU Spine ፨ (object numbering & object search)
(web 1993, object numbering 1997, object search 2002 ...) 2024