In Proceedings of the 12th ACM Conference on Hypertext and Hypermedia (August 2001, Aarhus, Denmark). New York: ACM Press: 175-184. Published here by the author on June 5, 2001, in accordance with ACM guidelines. © Copyright 2001 by ACM, Inc.


Hypertext and the Scholarly Archive: Intertexts, Paratexts and Metatexts at Work

 

Rune Dalgaard

University of Aarhus, Denmark
E-mail: runed@imv.au.dk

 

 

ABSTRACT

With the Web, hypertext has become the paradigmatic rhetorical structure of a global and distributed archive. This paper argues that the scholarly archive is going through a process of hypertextualization that is not adequately accounted for in theories on hypertext. A methodological approach based on Gerard Genette’s theory of transtextuality is proposed for a study of the hypertextualized archive. This involves a rejection of the reductionist opposition of hypertext and the fixed linear text, in favor of a study of the intertexts, paratexts and metatexts that work at the interface between texts and archive. I refer to this as second-order textuality.

KEYWORDS: Textuality, web, hypertext rhetoric, theory, paratext, metatext, intertextuality, scholarly and scientific communication, criticism, navigation.

INTRODUCTION

Scholarly and scientific texts [I] are increasingly published, archived, interlinked, indexed and categorized on the web, which is today the largest archive of scholarly texts in the world. Therefore we are now in a position to critically review some of the claims made by early hypertext theory, and carry their preoccupation with hypertext as a new textual paradigm further. The justifiable celebration of visionaries such as Vannevar Bush and Theodor Nelson should not prevent a critical testing of their ideas against the manifestly emerging scholarly archive [6][24]. Similarly, we can reexamine the notions of textuality formulated by subsequent hypertext theorists such as Jay Bolter and George Landow [4][19].

In the writings of Nelson, Bolter, and Landow, in particular, we find a shared contrasting of hypertext with the fixed linear printed text and all its perceived limitations. Rather than following this lead, I want to shift attention from the text to the relationship of text and archive. The potential of hypertext as a rhetorical structure of an archival system is a somewhat different issue than the intratextual relations at the narrative level of texts.

The opposition between uni-linear and multi-linear, it will be argued, is counterproductive to an understanding of the differences between the archival system of printed texts and the hypertextualized archive. Rather we are witnessing how the link, as a rhetorical device, is mechanizing and electrifying textual relations that were already prone to multi-linear reading in the first place. Those textual relations are the complex relations between secondary texts and texts-as-works. The secondary texts are the texts "between texts", "at the threshold of texts" and "around texts" – what we, inspired by Gerard Genette, can term intertexts, paratexts and metatexts.

In this paper, the focus will be on these secondary texts and their navigational function in the digital, interlinked scholarly archive. This implies a shift in focus: from the widespread focus on textuality in relation to narrative and reading, to textuality in relation to the archive and metareading. When we refer to notions such as browsing, navigating, and searching, we are referring to a form of metareading that works at the interface between texts and archive. I suggest that this ‘reading the archive’ corresponds to a dominant textual paradigm of the web, and refer to it as second-order textuality.

Hopefully, the approach suggested here is a step towards a better understanding of the textual transformations of the archive. It is also hoped that the approach is sensitive to how such transformations grow out of the meeting of a new medium and existing practices of communication.

HYPERTEXT – A NEW TEXTUALITY?

From the very beginning, hypertext has been thought of as a new form of textuality. For most hypertext theorists, this textuality is defined by a paradigmatic break with what is conceived as the fixed linear text of print. In Bolter’s words, "The computer is calling into question the idea of fixity: in place of the stable printed text, the computer offers us a fluid, interactive text. The computer promises, therefore, to reverse the qualities that Eisenstein identified in the printing revolution" [4, p. 22]. Landow, in turn, drawing on theories of text formulated by post-structuralists such as Barthes, Foucault and Derrida, claims that "hypertext fragments, disperses or atomizes text…by removing the linearity of print…[and it] destroys the notion of a fixed unitary text" [19, p. 64]. Thus, whether the argument bases itself on technology (Bolter) or literary theories (Landow), hypertext is assumed to bring about a dynamic, deconstructed, and dispersed text.

The opposition of linear and non-linear was already latent in the writing of Vannevar Bush, and was formulated explicitly by Theodor Nelson [6][24]. While it has been criticized and reformulated into the more precise vocabulary of uni- and multi-linearity [19], it has retained an almost axiomatic status in writings on hypertext. Yet, there are some obvious, serious problems with an account that locks the conception of hypertext into a dichotomous opposition to fixity and uni-linearity at the level of the text.

First, from its very inception, hypertext was thought of both at the level of the text and at the level of a networked archive. Yet, archives of text have always been read in a multi-linear fashion, i.e., collections represented in catalogues and other reference works are not supposed to be read from one end to the other. Therefore the notion of multi-linearity does not adequately characterize the transformation involved at this level.

Second, to refer to the printed text as uni-linear requires that we disregard many types of printed texts, since encyclopedias, newspapers, dictionaries, bibliographies and so on, are all intended for multi-linear reading [II]. Similarly, the flexibility possessed by all printed texts, often supported by tables of contents, indexes, and notes, which allows for multiple paths of reading, is usually downplayed. Thus, the variety of forms and flexibility of printed texts has been ignored for the convenient, but narrow, conception of the printed text as a linear straitjacket of thought.

Thus, Landow and Bolter’s theories, apart from their many insights and acute observations, have never been able to satisfactorily account for some obvious paradoxes. If, for example, as Niels Ole Finnemann suggests [8], post-structuralist theories were formulated in relation to printed texts, how can they at the same time be taken as characterizations of the division between print and digital texts? If non-linear (or multi-linear) texts have existed for millennia, why is the correlation between print and uni-linear texts seen as either causal or working behind our backs psychologically, instead of simply as a very functional (even the best?) form of written communication for many purposes?

With respect to these flaws, Aarseth correctly points out that the normative bias of early hypertext theories "must be read in light of the larger project within the hypertext community of trying to connect their technology-ideology of hypertext to various paradigms of textual theory" [1, p. 25]. The justification of the ideological claims made on behalf of hypertext often takes one of two forms: either a belief that hypertext possesses a new critical and reflexive potential (Landow, Bolter, Kolb [17]) - for some, hypertext is even explicitly connected to a particular philosophy or critical attitude [30] - or, alternatively, an idea, already present in Bush’s idea of the ‘Memex’, that hypertext is somehow closer to human associative thinking than printed text [6] [4]. Apart from the contradiction between these two ideas, critical distance being the opposite of associative thinking, they are similar in relying on belief more than on actual studies of hypertext. This supports Alex Pang’s observation that many assertions made by 1st-generation hypertext theorists such as Landow and Bolter are not based on actual technologies or widespread social practice [27][III]. Here we shall try to follow Pang’s suggestion for a 2nd-generation hypertext theory that deals with what actually happens, as opposed to what we would like to happen.

The criticism voiced here against 1st-generation hypertext theory does not aim to diminish the significance of early hypertext theories for bringing up the issue of textual transformations in relation to hypertext. Neither does it preclude seeking inspiration in the many insights and avenues opened by early hypertext theorists. Yet the aforementioned flaws should cause some skepticism concerning the inherent interpretation of hypertext in a larger cultural and historical perspective, and they also call for a rethinking of the relationship between hypertext and textuality.

MEDIUM, GENRE AND LINK

Strictly speaking, hypertext is not a medium, if by medium we understand some relatively fixed features of a communication technology [IV]. Rather, it is one rhetorical structure among others in the computer, not quite unlike the cut in film. As with the cut in film, we can use this rhetorical means of expression for a variety of purposes with respect to form and content. Unlike the cut, of course, hyperlinks are activated by the user/reader/viewer [V].

It is also possible to use the term hypertext in the sense of a formal genre, i.e., of narrative systems characterized by a certain structure. Such a notion of genre seems to underlie the view of hypertexts as systems using links to make multi-linear texts. Both Aarseth and Finnemann have suggested that we can identify a specific genre of hypertextual works at this level [1][8]. Such a definition would fit most literary attempts to create new narrative forms employing hypertext.

As a general theory of hypertext, however, such a theory has a number of shortcomings. First, we can easily conceive of a deterministic chain of links leaving only singular succession, in which one link leads exclusively to the next. In fact, such uni-linear links are commonly used in digital texts as the equivalent of flipping a page. Second, as noted above, we know of many multi-linear texts that are based on print or other media predating the computer. Thus multi-linearity, while an interesting use of links, can hardly qualify as a defining or even necessary criterion for digital hypertexts as opposed to printed texts.

Instead, we can suggest a more general definition of hypertext as characterized by the link as a rhetorical device. By this definition, the link is not connected to any particular narrative form or form of expression (pictures, sound, text), but understood simply as a mechanized jump from one node to another, activated by a user/reader/surfer. At the lower threshold of what can constitute a node, we have the utterance (understood as a verbal, pictorial or musical statement). Thus, hypertext is not necessarily a narrative, or even a text. For whatever purpose we use this rhetorical device, it is important to realize that such uses are not determined by the link – and therefore not answerable to some "true" form of hypertext – on the contrary, the link takes on meaning in relation to the semantic context in which it is mobilized. Consequently we must do away with the crude, reductionist opposition between the book and the hypertext as emblematic of two incompatible narrative forms.

Since a link is a rhetorical device that can be used in many semantic contexts and at different levels, we cannot move directly from a general definition to an understanding of its significance when employed for different purposes of communication. A more sensitive, modest approach to understanding the cultural significance of hypertext is to study if and how the link as a rhetorical device transforms specific forms of communication, and at what level such transformations occur.

An approach to the study of the effects of hypertext on scholarly communication will be sketched in the remainder of this paper. First we will focus on the scholarly text, the scholarly archive, and their hypertextualization.

SCHOLARLY TEXTS AND ARCHIVES

Here, I will focus on the scholarly text as a specific (super)genre, as opposed to literary texts, informative texts (such as manuals) and other (super)genres. On the whole, this distinction corresponds with Kolb [17], yet the texts produced in the sciences, the social sciences and the humanities are all included in the concept of scholarly text. In line with this distinction, ‘scholarly archives’ refers to collections of scholarly texts and the catalogues and reference works giving access to them.

Hypertext and the Scholarly Argument

Some years ago, there was a heated discussion about whether texts would move from print to digital form. If we look at scholarly texts, with the sole exception of the monograph, a significant proportion of journal papers, conference proceedings, preprints, and working papers are now archived and distributed on the web. Consequently there is a wealth of digital texts on the web today, allowing for empirical testing of hypotheses put forward with regard to hypertext and the narrative of scholarly argument.

Looking at the primary scholarly text, a general trend is discernible. While many visions have been put forth, few successful attempts at inventing new narrative forms exist. David Kolb has written interesting papers on the subject and has experimented with realizing some of his ideas, but they are not representative of any existing trend [17] [18]. Similarly, William J. Mitchell’s "City of Bits" is widely cited, but more for its form and experimental nature than for its content. As evidenced by the practices of scholars, even hypertext scholars, it is fair to say that a serious alternative to the uni-linear text-as-work for complex arguments has not yet been created.

No indifference is intended toward dialogic forms of writing such as can be found in chat-rooms or mailing lists, but it is important to realize that they perform different and more informal functions for scholarly communication than published papers or monographs. Also, even the uni-linear paper or monograph can be seen as an utterance in a larger dialogue within a field of research. The fixity of this genre, as Elizabeth Eisenstein has shown, was instrumental in bringing about modern criticism and science, which have proved to be very dynamic forms of communication [7]. This is evident, not at the level of the individual text, but in the ever-growing archives of texts, such as modern research libraries. That this slower form of dialogue should somehow possess an unquestionable authority is, as a hypothesis, a far cry from the experience of most scholarly authors, exposed as they are to systematic criticism from all sides. Failure to acknowledge this has led Bolter [4] and Pierre Lévy [20] to see in digital textuality a return to a new form of mediated orality, reminiscent of Walter Ong’s notion of ‘secondary orality’, which referred to analog electronic media such as radio, television and the telephone [26]. Yet neither the ‘secondary orality’ of analog electronic media nor dynamic digital texts have replaced the fixed text. People making such claims generally overlook the division of labor between different media and ways of using them.

Why not favor the most obvious conclusion, that the fixity and uni-linearity of scholarly texts are largely unaffected, whether they exist in digital or printed form, because those features are important for the genre? They are what enable the presentation of complex coherent arguments in a form that can be studied, criticized and become part of the citable scholarly record. This argument might explain why it appears that neither the writing nor the reading of scholarly texts has changed in the ways envisioned, not even in the hypertext community.

Something else has changed, however, namely the way we navigate the emerging digital scholarly archive on the web. As pointed out by Lyman & Kahle, hypertext is the rhetorical structure of the web, and therefore of this archive [21]. Although this might change, we can assume that hypertext will be part of any rhetorical structure fulfilling the function of the web in the foreseeable future. In any case, it is part of all significant initiatives in publication and archiving of scholarly texts today, and is therefore the reality we have to deal with.

Hypertext and the Scholarly Archive

With the web, the vision of an interlinked archive giving immediate access to the corpus of recorded knowledge has become at least a partial reality. Crudely put, the web is a hybrid between Vannevar Bush’s idea of the Memex as a personal hyperlinked archive, and Theodor Nelson’s idea of a Docuverse, a public but centrally managed hyperlinked archive. What we have is a public, but distributed and decentralized archive, so referring to the web as a scholarly archive necessitates a few explanatory remarks.

The web, properly speaking, is not one archive, but a distributed system of more or less connected collections of texts. It is one ‘cultural archive’, in the general sense that anything on the web in principle can be accessed from anywhere else on the web. Considered as a whole, this is a highly anarchistic network, which hardly qualifies for being described as a single metatext, as Landow suggests [19], and this is unlikely to change. Unlike the holdings of print-based libraries, the web cannot even be catalogued in its entirety, as evidenced by the difficulties of the search engines in keeping up with the speed of new contributions to this dynamic archive.

Not everything on the web belongs to the scholarly archive; in fact most texts and forms of expression on the web belong to other genres and forms of communication. Yet, even if we focus only on scholarly texts, we are still at the embryo stage of having a working integrated scholarly archive. What we do have are islands of more or less ordered and connected collections of texts (e-print archives, digital libraries, journal databases, single journals, subject gateways). On top of this, many parts of the emerging archive are not integrated, even at the level of forming collections (cf. self-published texts).

The Hypertextualized Archive

Still, whether we term it ‘Docuverse’ (Nelson), ‘hyper-library’ (Kolb) or ‘the metatext’ (Landow), few would probably contest that the web marks a new archival paradigm. I shall refer to this new archive as the hypertextualized archive. This term emphasizes two interwoven processes characterizing this archive as opposed to the print-based library.

The first is directly related to the link, and refers to the possibility of making semantic relations an active part of the archive. Included here are not only references between scholarly texts, but also indexes, catalogues, and reference works, which suddenly become much more powerful. Whereas a reference in the earlier system had to be mediated through the library catalogue, the link allows (in theory) direct access. This feature has been widely commented upon, but its overall effects have yet to be understood.

The second process concerns a more relative difference, and relates to what we in a general sense can describe as a textualization of the interface of the archive. Interaction with digital archives is primarily based on their symbolic representations on a screen (or some other interface). To access the archive of digital records, we do not enter the library in person, but through the web. In the scholarly archive on the web, the physical location of texts is irrelevant, while the semantic positioning through references, lists, catalogues, metadata, and links is not. In other words, the architecture of archival buildings such as libraries has become insignificant for navigation in the scholarly archive. This implies that a number of previously physically invariant features of the texts and archives are now represented/simulated on the interface only when they facilitate navigation in the archive or serve other important functions [VI]. Should we represent bookshelves, determine page formats, let the catalogue be determined by the in-house collection of texts, or find ways to represent librarians on the interface? [VII]

Some of these questions are handled at the level of the general "readers" (browsers) employed, which operate both at the level of the individual text and at the level of the archive. There is an independent textuality involved in the framing and navigational potentials that these browsers offer. Anja Rau has touched upon this subject, comparing this textuality with Genette’s notion of paratext [29]. While this is a legitimate move, it is to some degree at odds with a defining criterion of paratexts for Genette: that the author/publisher assumes responsibility for the paratext [10]. This is not the case with the navigational features of a browser, yet Rau is correct in pointing out that neither is the notion of metatext appropriate.

Terminological differences aside, this discussion contains a larger perspective, which is the predominantly textual nature of the digital scholarly archive. As scholars, practically all the navigational possibilities we confront on the net are represented to us as texts. Whether lists of works, search options, authors’ names, references, or links to other archives, we are confronted with texts. We experience a process of intensive textual cross-linking between archives, links from secondary texts to primary texts, citation links, links from comments (and reviews) to the texts commented upon, and links from search queries, to mention some examples.

These processes bear witness to a hypertextualization of the scholarly archive compared to the typographic system of knowledge, with its geographically scattered buildings, shelves of books, and human intermediaries (librarians). It is important to note, however, that we have long been accustomed to navigating in a corpus of texts by other texts such as catalogs, references, abstracts and annotations. The crucial question, which might eventually produce a satisfactory account of the digital paradigm of knowledge, is how these textual forms are reconfigured and transformed in the shift from print to computer.

I will refer to the textuality at work here as second-order textuality. It is related, but not identical, to the notion as employed by Francisco J. Ricardo [VIII]. The focus here is on the textual relations working at the interface of the texts-as-works and the archive. Studying this second-order textuality requires a different theoretical framework than the ones suggested in hypertext theory as discussed here. Such a framework will be suggested here, based on the French literary scholar Gerard Genette, and his notion of transtextuality, which has hardly been mentioned in theories on hypertext [IX].

GENETTE'S THEORY OF TRANSTEXTUALITY

Before presenting Genette’s framework and assessing its value for a study of the hypertextualized scholarly archive, some differences of focus are worth pointing out. First, while there is a universal ambition in the concept of transtextuality, Genette studies primarily literary fiction, which differs from scholarly communication in many respects (e.g. systematic references, abstracts, practices of quoting). Second, Genette deals exclusively with the printed book, whereas the significance of the medium is part of the investigation here. Finally, Genette’s interest in transtextual devices is their influence on the reading and reception of texts, whereas my interest is how they influence the reading of the archive, or in other words, navigation among texts.

Genette first laid out a sketch of his project on transtextuality in The Architext: An Introduction, stating "for the moment the text interests me (only) in its textual transcendence – namely, everything that brings it into relation (manifest or hidden) with other texts. I call that transtextuality…" [11, p. 81]. Now, this could perhaps be mistaken for just another name for intertextuality, but intertextuality is only one of five types in Genette’s framework, of which the four remaining are paratext, metatext, hypertext and architext [X].

Intertextuality is characterized as "the literal presence (more or less literal whether integral or not) of one text within another", a strict definition related to Kristeva’s ‘classical’ notion of intertextuality as opposed to broader, more inclusive definitions [11]. Citing another text is an explicit intertextual relation, but allusion and plagiarism also fall within the scope of Genette's definition. This category is quite straightforward, and Genette has never done any studies in this area, arguing that others have conducted extensive studies on intertextuality.

By paratexts Genette understands, as Macksey put it in his foreword, "those liminal devices and conventions, both within the book (peritext) and outside it (epitext), that mediate the book to the reader", of which titles, author’s names, forewords, notes and prefaces are some examples [10]. As indicated in the subtitle of Genette’s book on paratexts, thresholds of interpretation, paratexts exist at the borders of a text, constituting a "zone without any hard and fast boundary on either the inward side (turned toward the text) or the outward side (turned toward the world’s discourse about the text)." [10, p. 2]. In other words, paratext provides the textual interface or framing of a text for which the author takes direct responsibility.

The third type of transtextual relation is the metatext, which Genette defines as "the transtextual relation that links the commentary to the text it comments upon. All literary critics, for centuries, have been producing metatexts without knowing it" [11, p. 82]. Genette has never pursued this perspective, reasoning that it would imply a study of the vast amount of literary criticism throughout history. Yet, if we discount other primary texts, this category can be reserved for secondary texts that comment upon a text or place it within a larger context. Those types of text would, for example, be comments, reviews, catalogues, classifications, journals, tables of contents, subject indexes, bibliographies etc. Admittedly this category ranges from critical texts (comments, reviews), through classificatory texts, to purely arbitrary catalogues. It is, however, clearly separable from the intertextual relations between two primary texts. And it is also clearly separable from the paratext, for which the author of the primary text is responsible. Since the proposed definition adequately describes second-order texts, which ‘are about’ other texts, we will stick with it for the time being.

Hypertextual relations are defined in a way markedly different from computer-related definitions. They are "relationships of imitation and transformation, which pastiche and parody can give us an idea of". Hence, the superimposition of one text upon another that Genette speaks of is understood not literally, but in a metaphoric or semantic sense. For obvious reasons the use of the notion ‘hypertext’ in this sense today, while analytically legitimate, creates terminological confusion. Since this form of superimposition, analyzed in the book Palimpsests (1997), is also outside the scope of our investigation, we will simply disregard it in this context.

Finally an architextual relation is the one "that links each text to the various types of discourse it belongs to. Here we have the genres, with their determinations that we’ve already glimpsed: thematic, formal, modal and other (?)" [11, p. 82]. Neither will this attempt at (re)formulating a theory of genres be further discussed here.

There is an ascending level of abstraction in the types of transtextual relations as sketched here. While intertextuality and paratextuality can be exemplified with reference to specific textual forms, defined by their relation to primary texts (e.g. titles, references), metatextuality and hypertextuality refer to less concrete instances of relations between primary texts. And architextuality, even more abstract, refers not to the relations among individual texts, but to the theme and style of a text, which position it within the larger corpus of texts. This difference in the level of analysis might be the reason that there is a general ambiguity concerning the status of textual relations discussed by Genette. They are interchangeably referred to as relations or aspects of texts, on the one hand, and categories or types of text, on the other hand, as pointed out by Klaus Bruhn Jensen [16]. Instead of pursuing Jensen's critique of Genette’s typology, we can introduce the notion of embedding, which can deal with the overlapping categories. Thus we can speak of paratexts such as the title at the same time as acknowledging their potential intertextual and metatextual relations to other texts.

Most important, however, is the notion of the threshold presented by Genette, which can be taken to refer to a certain second-order textuality, operating at a different level than texts-as-works. Some of the most important texts embodying this second-order textuality are secondary texts; ‘secondary’ not because they are insignificant, but because they constitute a zone of semantic relations "around", "about" and "in-between" scholarly texts. Thus, secondary texts are defined by their relation to the primary text, and not, as has been the common assumption in hypertext theories, in opposition to its narrative structure. Genette uses ‘threshold’ primarily about paratexts, which in most cases exist in a very physical sense at the borders of books. Here the term is used about intertexts and metatexts as well, since all three types work at the interface of the text and the archive. These are the texts that allow us to move from one text to another, that frame a text and position a text among other texts – they are the most manifest texts operating in the zone of second-order textuality.

LINKING PARATEXTS, INTERTEXTS AND METATEXTS

The textual relations described here are the ones we navigate by, whether we term it surfing, browsing, searching, meta-reading or scanning. This second-order textuality is not new: titles, indexes and references are old phenomena. But it is the level of textuality that is most affected by being hyperlinked. In other words second-order textuality has been built into the archive in a much more powerful way than before. More generally, I propose that this second-order textuality can be considered the dominant textual paradigm of the web.

The activity of metareading corresponds to this second-order textuality. Metareading thus refers to the reading involved, for example, in the scanning of a table of contents of a journal, a subject-listing in a collection, references in a text, a title or an abstract [XI]. This is a highly complex form of reading, involving numerous different interpretative moves. As opposed to reading per se, metareading (which, like reading, can occur both in printed and digital texts) is characterized by the explicit presence of the choice – or we can say that the text is read with the choice in mind. The enormous potential of hypertext links is the ability to realize the effect of such choices immediately, without being limited by the collection available locally.

To exemplify this perspective we can briefly sketch some of the work being done by paratexts, intertexts and metatexts as a first step in understanding the reconfigurations and transformations of these textual forms in the hypertextualized scholarly archive on the web.

Paratexts

At first glance, the most notable feature of the paratext is perhaps the lack of attention this secondary text has received. Often, the interpretation of the paratext determines whether researchers ever proceed to the primary text itself, or disqualify it as irrelevant. Thus, it is hard to think of a text of more strategic importance for navigation in the archive than the paratext. We usually meet paratexts in the context of a metatext or a reference, in other words, as embedded in other texts. Thus, having browsed the categories of an archive, or performed a search on keywords, the text that confronts us is usually composed of paratexts.

As authors we are certainly aware of the significance of the paratext, judging by the effort we put into coming up with the right title. And researchers are probably some of the most critical and attentive readers of titles. Titles not only indicate the content of a text, but also position it in relation to other texts and to research traditions. They even govern the expectations we have of a given text. To paraphrase Genette, imagine how James Joyce’s novel would have been read if it had not been entitled Ulysses [10].

Also, researchers are well aware of the significance of the author’s name in the competition for attention from other researchers. The reason is that the name itself is one of the most important criteria for selecting texts, in the same way as the title of a prestigious journal is. As a marker of quality and previous work, the name of the author is now more important than ever, as is also evidenced by the fame of those who foresaw its disappearance, such as Barthes and Foucault.

Apart from the title and the author’s name, the most important paratexts in the scholarly archive are the abstract and the keywords, which are specific to scholarly texts. Once a title or an author’s name has caught our attention, we can choose to read the abstract and the keywords. Only if these seem promising do we proceed to the primary text itself [XII]. As experienced by most readers of scholarly texts, the way to a text often goes through its paratext, and quite often through its embedding in an intertext or metatext.

Intertexts

The prototypical examples of intertexts are, of course, references and quotations. The intertextual reference explicates an interpretative move of the citing author on the text of the cited author, and thereby places it in the context of the citing author’s argument. This is achieved by embedding paratexts of the cited text (author, title, publisher, date) as the building blocks of the reference, which serve both as identifiers and as texts to be interpreted themselves (cf. above). The modern formalized use of the reference, specific to scholarly texts, was invented after print made it possible to refer to a specific edition, knowing that all copies would be similar. After all, there is little reason to refer to something if you cannot be sure it stays the same for subsequent readers. References can be read in many ways. For a newcomer to a field, they might be a way into the texts on a given subject. To an expert within a field, they tell a lot about the given text and the framework of the author.

The most apparent and significant effect of hypertext on intertexts is the possibility of following the references directly to the sources. Earlier, intertextual relations were semantic only, which meant that any use of them required a significant effort and the intermediate use of catalogues at the library, unless by chance you were already in possession of the texts referred to. This rather straightforward difference has often been noted in hypertext theory, although the conclusion derived by Landow is grossly overstated: "As this scenario suggests, hypertext blurs the boundaries between reader and writer" [19]. This misunderstanding originated with Bush, who suggested that creating links between texts would somehow constitute new books, but as most authors know, writing takes a good deal more than that. The reconfiguration at play here is better described as a changed relation between reading and second-order reading. The ease with which we can switch between these two modes of reading and the reach provided by links are significant new qualities in the scholarly archive. Of course these qualities might also result in readers ‘getting lost in cyberspace’, but then again, who has not been lost in piles of books or in the library?

The possibility of intertextual links raises a number of difficulties for archives and publishers, who have to secure stable links. Intertextual relations, formerly the sole responsibility of authors, have now become a strategic issue for publishers and archives as well. While a pragmatic practice of intertextual linking has long been common among authors and born-digital journals on the web, work is also being done on creating large-scale linking of references and citations in pre-print archives and in the formal journal literature [XIII]. These initiatives attest that the network of intertextual references is being integrated in the scholarly archive on the web.

Metatexts

The category of metatexts is perhaps the most complex, since it comprises a great variety of classificatory, critical, indexical and editorial texts. While highly different in character, metatexts all work to establish boundaries and connections between fields of research, to position texts among each other with regard to quality and topic, and to make texts visible to relevant research communities. This is done in editorial metatexts through the logic of inclusion or exclusion, and in classificatory metatexts by placing texts here or there in the classificatory system. These different strategies of mediating between individual texts and the larger textual corpus are obviously (as are paratexts and intertexts) highly consequential for navigation in the scholarly archive.

While metatextual strategies of ordering are indispensable, they are not without problems. A case in point is the deficiencies of elaborate systems of classification. Nelson somewhat boldly stated, "There is nothing wrong with categorization. It is, however, by its nature transient: category systems have a half-life, and after a few years, categorizations begin to look fairly stupid." [24, p. 2:49]. Hyperbole aside, Nelson points to a long-recognized difficulty of general classificatory systems in addressing the specific needs of scholars. A hypertextualized archive is surely not the end of classification, but perhaps we will increasingly regard classifications as more or less "local" metatexts to an ever-growing dynamic archive of texts. The most heavily used scholarly metatexts on the web are not general classification schemes, but discipline- or subject-based metatexts, such as subject gateways, e-print archives and journals.

Subject gateways (portals, metaindexes) are metatexts par excellence, since in many cases they do not archive texts themselves but simply link to texts distributed on the web. An interesting point is that different subject gateways act as different editorial metatexts, operating on the same corpus of texts, using different categories and concepts for organization. Some early examples of this type are Alan Liu’s humanities gateway, Voice of the Shuttle, and Daniel Chandler’s Media and Communication Studies site. Not only do they partly overlap with respect to areas covered, but also with respect to specific texts – they even link to each other. Are the editors/compilers that create these metatexts related to the new profession of trailblazers that Bush had in mind? To the extent that they browse post-publication texts, we can suggest that they are. The texts they produce, however, are not "trails" of links forming new books, but metatextual subject indexes covering specific topics or disciplines.

Notably, linking makes such metatexts de facto channels of publication for authors, because they can simply submit links to self-archived texts. The value added by those different subject gateways is primarily that of visibility and positioning of the individual text. Also, depending on the editorial criteria and effort, they can provide quality assurance and commentaries/annotations guiding the metareader. Still, they are not quality filters in the strict sense that peer-reviewed journals are.

If hypertexts have enabled subject gateways to resemble a form of publication, journals themselves are becoming more metatextual. When published on the web, the metatextual qualities of journals, manifested in journal titles and tables of contents, are detached from the journal as a physical medium of distribution and preservation. Instead of distributing multiple print copies around the world and preserving them locally, one archive on the web, accessible from any other point on the net, can do the job. For this reason subscription to a digital journal usually means access to a database of articles, combined with e-mail notification of new issues. The mail contains the traditional table of contents, sometimes supplemented with abstracts, and usually links directly to the articles of the current issue. The issue, however, is just one principle of organization among other potential metatextual orders in the archive of the journal title. In thematic journals, each issue carries a higher degree of semantic unity than the journal as such, but in most peer-reviewed journals it corresponds simply to ‘recent additions’ to the archive of the journal title. The possibility of alternative orders is most evident when several journals are organized in journal databases. One example is the digital library of the ACM, which can be searched and browsed by numerous criteria (proceedings of SIGs, journal titles, year, authors, keywords, related articles etc.).

A more radical example of metatextual journals is the physics e-print archive at the Los Alamos National Laboratory [XIV]. This archive, set up by Paul Ginsparg in 1991, allows authors to self-submit pre-prints, organized by sub-disciplines, before, or at the same time as, submitting papers for journal publication. A number of physics journals have accepted this parallel form of publication. This, as has been noted by Hitchcock et al., is challenging the exclusivity of dissemination so far held by journals [15]. Ginsparg himself aptly speaks of journals as one among other forms of overlays to archives [12]. The journals in question have become metatextual overlays to archives which are also accessible by other means [XV].
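
The notion of several metatextual orders overlaying one archive can be illustrated with a minimal sketch. It does not describe the ACM digital library or the Los Alamos archive; the records, field names and the `overlay` function are hypothetical and chosen for illustration only. Each record holds nothing but paratexts (title, author, issue, keywords), and each overlay is simply a different ordering of the same identifiers.

```python
from collections import defaultdict

# Hypothetical archive: each record contains only paratexts of an article.
ARCHIVE = [
    {"id": "a1", "title": "Links and Citations", "author": "Smith",
     "issue": "1:1", "keywords": ["linking", "citation"]},
    {"id": "a2", "title": "Reading the Archive", "author": "Jones",
     "issue": "1:1", "keywords": ["archive"]},
    {"id": "a3", "title": "Citation Networks", "author": "Smith",
     "issue": "1:2", "keywords": ["citation", "archive"]},
]


def overlay(archive, key):
    """Build one metatextual order: a mapping from heading to article ids."""
    order = defaultdict(list)
    for record in archive:
        values = record[key]
        # A single-valued paratext (author, issue) is treated as a one-item list.
        if isinstance(values, str):
            values = [values]
        for value in values:
            order[value].append(record["id"])
    return dict(order)


# Three overlays on the same corpus; the issue is only one of them.
by_issue = overlay(ARCHIVE, "issue")
by_author = overlay(ARCHIVE, "author")
by_keyword = overlay(ARCHIVE, "keywords")
```

In this sketch the table of contents of an issue (`by_issue`) has no privileged status: it is one grouping among others over an unchanged collection, which is the sense in which a journal can be regarded as an overlay.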

A general formulation of this transformation is that the physical redundancy of print collections is being replaced by an increasing redundancy of metatextual overlays in the emerging digital system. Also, researchers will increasingly push for integration of collections and metatexts, in the sense that metatexts will be increasingly expected to provide direct access to the texts they refer to.

Some might believe that automation, (so-called) intelligent agents, and search engines will make metatexts less, not more, significant. But, while these can be helpful aids, they are not replacing metatexts – in fact the output of such techniques is always a metatext to be interpreted. Take as an example the text generated as the result of a search inquiry. This user-generated metatext differs from the other metatexts discussed in its lack of editorial organization. More precisely perhaps, we can consider the researcher as the ad hoc editor through the specification of search criteria. As for all the metatexts sketched here, user-generated metatexts are only the first step from the point of view of the researcher. They are metatexts requiring interpretation and further selection.

Second-order Textuality on the Web

The point I am trying to make is that navigation in the scholarly archive involves a second-order textuality, working through a complex web of intertexts, paratexts and metatexts. While the scholarly archive is becoming technically more seamless and efficient, its semantics are increasing in complexity. Thus, the zone of second-order textuality is expanding, and requires critical metareading more than ever.

Also, I have been trying to make a case for a perspective focusing on intertexts, paratexts and metatexts on (and off) the web, as a way of studying the transformation of the interface between the scholarly text and the scholarly archive in a time of transition. No pretense is made of having proven the case for this framework, but hopefully some indications of the possible fruitfulness of applying it have been given.

FURTHER PERSPECTIVES

While the results sketched here are preliminary and stem from an ongoing project, we can formulate some tentative conclusions concerning the emergence of a new archival paradigm. Texts are being digitized and hyperlinked, but this paradigm shift is not a break with the scholarly argument as we know it. Rather, the textuality that has already existed around, between, and about texts is being reconfigured in the new scholarly archive. The paradigm shift occurring at the archival level is better characterized as a shift from the text in the archive, where the archive is a classified collection of texts, to the archive as a network of texts. As a paradigm of archival organization, we can contrast the network with the hierarchical tree. The tree as an organizational structure of the archive, as Peter Burke has recently described it, has been known at least since the Middle Ages, but flourished and became a dominant paradigm after the printing press [5]. Nothing would be more wrong than to assume that the existence and usefulness of classificatory trees are threatened, but their status might be changing as the prototypical archive changes from being modeled on the ideal of a universal collection to being one among many integrated, overlapping, heterogeneous archives in a larger network. Also, at the intertextual level of reference and citation, linking, in a much more powerful way than printed references, allows readers to ignore classifications and move directly between texts when it is more convenient. The ‘Networks of scientific papers’ that Derek de Solla Price spoke of with regard to citations are thus becoming part of the archive itself [28], which supports the hypothesis of a significant shift in the scholarly archive and indicates the nature of this shift.

CONCLUDING REMARKS AND FURTHER WORK

This paper has attempted to formulate a framework adequate for studying the textual relations at play between text and archive in the scholarly archive. While the ideas in this paper are directed at scholarly communication, some might have relevance for other genres as well. The framework is conceived as a general one, though developed with scholarly communication in mind.

In scholarly communication the textuality at play here touches upon an issue of strategic importance. The most pressing problem confronting scholarly communication today is not a lack of critical thinking, almighty authors, or constraining linearity, but the problem of selection and navigation in an increasingly complex ecology of knowledge. The notion of second-order textuality points to the increased significance of knowledgeable navigation in the scholarly archive, to a perspective on textuality that might benefit from further study, and finally, as has been suggested, to a textual paradigm characteristic of the web.

ACKNOWLEDGMENTS

I wish to acknowledge researchers at the Dept. of Media- and Information Studies and researchers at the Center for Internet Research hosted by Aarhus University, Denmark (http://imv.au.dk/cfi/eng/index_eng.html). Also, I wish to acknowledge participants in the cross-disciplinary research project Cyborgs and Cyber space: between narration and sociotechnical reality (www.cyborgs.sdu.dk) of which my project is a part. Finally, I wish to thank my reviewers for thoughtful and helpful comments.

FOOTNOTES

  1. In the rest of this paper the word 'scholarly' refers to both the sciences and the arts, as in the German notion of 'Wissenschaft'. The unfortunate bias of 'scholarly' towards the humanities is preferable to the even more exclusive notion of 'scientific' in the Anglo-Saxon tradition.
  2. This has been noted by various observers. See for example James O'Donnell [25] and Espen Aarseth [1].
  3. Pang also includes theorists such as Myron Tuman and Richard Lanham. For a fine introduction to early hypertext theory, see Snyder [31].
  4. I am using a definition formulated by Joshua Meyrowitz, which distinguishes between medium, rhetorical structure and content as three different perspectives on media [22]. For a theory of the fixed features of the computer see a dissertation on the subject by Niels Ole Finnemann [9].
  5. I am indebted to the reviewer who mentioned the existence of a related argument put forward by Adrian Miles [23]. Miles compares the 'edit' and the 'link', arguing for "the cinematic nature of hypertext". The paper is interesting, but sidesteps a difference: the link has to be actualized (selected) by whoever 'reads' it, whereas the 'edit' in movies represents a predefined sequence for the viewer.
  6. This is an example of a more general aspect of digitalization. As Finnemann has noted, when we represent earlier media in the computer, "some of their former invariant properties become variable parameters", capable of manipulation at the level of the binary code [8, p. 14].
  7. I do not have any sort of advanced virtual reality representation or artificially intelligent cybrarian in mind, but simply want to note the fact that the mediator function requires librarians to be accessible through the net.
  8. Ricardo, although he uses the term 'paratext', focuses primarily on creating layers of text revealing semantic and structural relations within text-as-works [30].
  9. Landow and Aarseth both cite Genette's work on narrative theory, but not his books on transtextuality. One exception, however, is Luca Toschi's article on 'Hypertext and Authorship', that discusses the significance of paratexts [32].
  10. In The Architext: An Introduction, Genette operated with only four categories, which were subsequently expanded to five in Palimpsests. Macksey has summarized these clearly and precisely in the foreword to Paratexts, the English translation [10].
  11. For a related but also wider use of the notion of metareading see Patrick Bazin [2].
  12. I am disregarding in this context other means of selecting primary texts, such as personal recommendations, etc.
  13. See, for example, Hitchcock et al. [14] [15] and Harnad & Carr [13], describing their work on OpCit, a system for citation linking in e-print archives. Other notable examples are ResearchIndex (formerly CiteSeer), developed by Bollacker et al. [3], and CrossRef, which was started by a consortium of journal publishers but is today an independent initiative.
  14. See http://arxiv.org/
  15. To what degree this specific model can be generalized is a highly controversial topic, since many journal publishers oppose giving up copyrights, but in the long run publishers will have to offer archives with similar flexibility. For a discussion of this issue with many good references see the mail list "September98-Forum" moderated by Stevan Harnad: http://amsci-forum.amsci.org/archives/september98-forum.html

 

REFERENCES

  1. Aarseth, E. (1995) Cybertext – Perspectives on Ergodic Literature, Bergen, University of Bergen.
  2. Bazin, P. (1996) "Toward Metareading" in The Future of the Book, ed. Nunberg, G., Berkeley: University of California Press, pp. 153-168.
  3. Bollacker, K.D. et al. (1998) "CiteSeer: An Autonomous Web Agent for Automatic Retrieval and Identification of Interesting Publications" in Proceedings of the second international conference on Autonomous Agents, pp. 116-123.
  4. Bolter, J. D. (1991) Writing Space – the Computer, Hypertext, and the History of Writing, New Jersey: Lawrence Erlbaum Associates.
  5. Burke, P. (2000) A social history of knowledge - from Gutenberg to Diderot, Malden, MA: Polity Press.
  6. Bush, V. (1996) "As We May Think", in Druckrey, T., Electronic Culture, New York: Aperture, pp. 29-45. Originally published in The Atlantic Monthly (1945).
  7. Eisenstein, E.L. (1979) The Printing Press as an Agent of Change, vol. 1+2, Cambridge: Cambridge University Press.
  8. Finnemann, N.O. (1999) "Hypertext and the Representational Capacities of the Binary Alphabet", in working paper series, Aarhus, Centre for Cultural Research. http://www.hum.au.dk/ckulturf/pages/publications/nof/hypertext.htm
  9. Finnemann, N.O. (1994) Tanke, Sprog og Maskine – en teoretisk analyse af computerens symbolske egenskaber, Danmark: Akademisk Forlag. English tr. G. L. Puckering: http://www.au.dk/cfk/DOCS/PHP/finnemann.htm
  10. Genette, G. (1997) Paratexts – thresholds of interpretation, tr. J. E. Lewin, New York: Cambridge University Press, org. Seuils (1987).
  11. Genette, G. (1992) The Architext: An Introduction, tr. J. E. Lewin, Berkeley: University of California Press, org. Introduction à L’architexte (1979).
  12. Ginsparg, P. (1997) "First steps towards electronic research communication" in Dowler ed. Gateways to knowledge, Cambridge & London: MIT Press, pp. 43-58. Early version from 1994: http://arxiv.org/
  13. Harnad, S. & Carr, L. (2000) "Integrating, Navigating, and Analysing Open Eprint Archives through Open Citation Linking (the OpCit project)" in Current Science, 79:5, pp. 629-638.
  14. Hitchcock, S. et al. (1997) "Citation Linking: Improving Access to Online Journals" in Proceedings of the second ACM International conference on Digital Libraries, pp. 115-122.
  15. Hitchcock, S. et al. (2000) "Developing Services for Open Eprint Archives: Globalization, Integration and the Impact of Links" in Proceedings of the fifth ACM conference on Digital Libraries, pp. 143-151.
  16. Jensen, Klaus B. (1999) "Intertextualities and Intermedialities", in Sekvens 99: årbog for Film- og Medievidenskab, ed. I. Bondebjerg & H. K. Haastrup, Copenhagen, University of Copenhagen, pp. 63-86.
  17. Kolb, D. (1997) "Scholarly Hypertext: Self-Represented Complexity", in Proceedings of the eighth ACM conference on Hypertext, pp. 29-37.
  18. Kolb, D. (1994) "Socrates in the Labyrinth", in Hyper/Text/Theory, ed. G. Landow, Baltimore: Johns Hopkins University Press, pp. 323-344.
  19. Landow, G.P. (1997) Hypertext 2.0 - The Convergence of Contemporary Critical Theory and Technology, rev. ed., Baltimore: Johns Hopkins University Press.
  20. Lévy, P. (1998) Becoming Virtual – Reality in the Digital Age, tr. R. Bononno, New York, London: Plenum.
  21. Lyman, P. & Kahle, B. (1998) "Archiving Digital Cultural Artifacts", in D-Lib Magazine (July/August), at http://www.dlib.org/dlib/july98/07lyman.html
  22. Meyrowitz, J. (1993) "Images of Media - hidden ferment in the field", in Journal of Communication, 43:3, pp. 55-66.
  23. Miles, A. (1999) "Cinematic paradigms for hypertext" in Continuum: Journal of Media and Cultural Studies, 13:2, pp. 217-226.
  24. Nelson, T.H. (1981) Literary Machines, Swarthmore, Pa. version 93.1.
  25. O’Donnell, J.J. (1993) "St. Augustine to NREN: The Tree of Knowledge and How It Grows" in The Serials Librarian, 23:3/4, pp. 21-41.
  26. Ong, W.J. (1982) Orality and literacy – the technologizing of the word, London: Routledge.
  27. Pang, A.S. (1998) "Hypertext, the Next Generation: A Review and Research Agenda" in First Monday, 3:11. At http://www.firstmonday.org/issues/issue3_11/pang/
  28. Price, D.J.S. (1979) "Networks of Scientific Papers - The pattern of bibliographic references indicates the nature of the research front" in The Scientific Journal, ed. Meadows, A.J., London: ASLIB, pp. 157-162. First published in Science vol. 149 no. 3683, 30 July 1965, pp. 510-515.
  29. Rau, A. (1999) "Towards the recognition of the shell as an integral part of the digital text", in Proceedings of the tenth ACM conference on Hypertext and Hypermedia: Returning to our diverse roots, pp. 119-120.
  30. Ricardo, F.J. (1998) "Stalking the Paratext: Speculations on Hypertext Links as Second Order Text", in Proceedings of the ninth ACM conference on Hypertext and Hypermedia: Links, Objects, Time and Space-structure in Hypermedia Systems, ACM press, pp. 142-151.
  31. Snyder, I. (1996) Hypertext – The Electronic Labyrinth, Melbourne: Melbourne University Press.
  32. Toschi, L. (1996) "Hypertext and Authorship" in The Future of the Book, ed. Nunberg, G., Berkeley: University of California Press, pp. 169-207.

The preferred form of citation for this article is:
Dalgaard, Rune (2001) "Hypertext and the Scholarly Archive: Intertexts, Paratexts and Metatexts at Work" in Proceedings of the twelfth ACM conference on Hypertext and Hypermedia (August 14-18, Aarhus, Denmark). New York: ACM Press: 175-184.
Available online at www.acm.org or at author website: http://imv.au.dk/~runed/pub/pub.html

 

By Rune Dalgaard. Created June 5, 2001