The Concept of Citation Indexing
This is my first Current Contents (CC) essay under the rubric of Citation Comments. As discussed in last week’s CC, this new monthly feature will focus on the applications of the Institute for Scientific Information’s (now Clarivate Analytics) databases (1). An appropriate topic to launch this new series is perhaps the most rudimentary: the basic concept of citation indexing.
To start, it is important to clarify the terminological distinction between “citation” and “reference”. In his classic book Little Science, Big Science, Derek Price gave a clear definition of both terms. He said: “It seems to me a great pity to waste a good technical term by using the words citation and reference interchangeably. I therefore propose and adopt the convention that if Paper R contains a bibliographic footnote using and describing Paper C, then R contains a reference to C, and C has a citation from R. The number of references a paper has is measured by the number of items in its bibliography as endnotes, footnotes, etc., while the number of citations a paper has is found by looking it up [in a] citation index and seeing how many other papers mention it.” (2) (p. 284)
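Price's convention can be illustrated with a small sketch. The paper labels and the `references` mapping below are hypothetical toy data, and the function name `build_citation_index` is an assumption made for illustration; this is not how any production citation index is built. The point is only that references are read directly off a paper's bibliography, while citations are found by inverting those bibliographies.

```python
from collections import defaultdict

# Hypothetical toy data: each paper maps to the items in its own bibliography.
references = {
    "R1": ["C", "D"],
    "R2": ["C"],
    "C": ["D"],
}


def build_citation_index(references):
    """Invert paper -> references into cited paper -> list of citing papers."""
    citations = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            citations[cited].append(citing)
    return dict(citations)


citation_index = build_citation_index(references)

# References: the number of items in a paper's own bibliography.
reference_count = {paper: len(refs) for paper, refs in references.items()}

# Citations: the number of other papers that mention it, read from the index.
citation_count = {paper: len(citing) for paper, citing in citation_index.items()}
```

In this toy data, R1 contains two references, while C has two citations (from R1 and R2). R1 itself does not appear in the citation index at all, because nothing cites it.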
In a nutshell, citations symbolize the conceptual association of scientific ideas as recognized by publishing research authors (3). By the references they cite in their papers, authors make explicit linkages between their current research and prior work in the archive of scientific literature. These conceptual associations have been described by Robert Merton, Manfred Kochen, and other scholars as intellectual transactions, formal acknowledgments of “intellectual debt” to an earlier source of information (4,5). That is, explicit references imply that an author has found useful a particular published theory, method, or other finding.
Clarivate Analytics databases index these intellectual transactions by listing both the cited and citing works. (That is, the cited work is a paper or book that has been mentioned in the references of other works, while the citing work is the one that contains the references.) The citation indexes were originally designed primarily for information retrieval. Mainly but not exclusively through citation connections, the databases enable you to navigate the literature in unique ways. As a result, you are able to locate relevant papers independent of language, title words, or author keywords. A variety of citation-based search strategies are available, including bibliographic coupling or linking of papers through shared references (Related Records) (6), KeyWords Plus (7,8), and others.
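Bibliographic coupling, the linking of papers through shared references, can be sketched in a few lines. The data and the function names `coupling_strength` and `related_records` are hypothetical, and the ranking rule (more shared references first) is an assumption for illustration rather than a description of how the Related Records feature is actually implemented.

```python
# Hypothetical toy data: each paper maps to the set of works it references.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"E"},
}


def coupling_strength(refs_a, refs_b):
    """Number of references two papers have in common."""
    return len(refs_a & refs_b)


def related_records(paper, references):
    """Rank other papers by shared references with `paper`, strongest first."""
    scored = [
        (other, coupling_strength(references[paper], refs))
        for other, refs in references.items()
        if other != paper
    ]
    # Keep only papers that share at least one reference; break ties by name.
    coupled = [(other, n) for other, n in scored if n > 0]
    return sorted(coupled, key=lambda pair: (-pair[1], pair[0]))
```

Here P1 and P2 are coupled through two shared references (B and C), even if their titles and keywords have nothing in common, while P3 is coupled to neither.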
Unique Advantages of Citation Indexes
The Clarivate Analytics databases differ from traditional indexing and abstracting services in several ways. From the outset, the Science Citation Index (SCI), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI) have been multidisciplinary. They cover virtually all disciplines whereas traditional services are limited to a single field.
The advantages of a multidisciplinary index can be exemplified by the work of Nobelist Harold C. Urey. Published in Science in 1962, his paper “Lifelike forms in meteorites” described the chemical compounds contained in meteorites that, under the right conditions, were essential to the formation of life on earth (9). This paper deserved to be indexed in a variety of single-discipline databases. But more importantly, citations to this paper have appeared in a large variety of journals in astrophysics, biology, cosmology, chemistry, earth sciences, geochemistry, and so on.
Clarivate Analytics indexes are also comprehensive, providing complete coverage of all types of published source items–not just original research papers, review articles, and technical notes but also letters, corrections and retractions, editorials, and other items. Clarivate Analytics studies have shown that these latter items are important, have substantial impact, and provide useful links to scientific issues and controversies.
As stated at the outset and perhaps most importantly, Clarivate Analytics uniquely indexes the references cited by these source items. This gives you the ability to perform prospective as well as retrospective searches of the literature. Like other indexes, Clarivate Analytics databases allow you to move back in time to locate previously published papers. But Clarivate Analytics databases uniquely allow you to move forward in time—to determine who has subsequently cited an earlier work. Thus, by starting with a single paper or book, you can identify whatever additional papers have referred to it. And each retrieved paper, in turn, may provide a new list of references with which to continue the citation search.
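The difference between retrospective and prospective searching, and the way each retrieved paper supplies new references to follow, can be sketched as below. The paper labels, the `references` mapping, and the function names are all hypothetical; this is a minimal illustration of the idea, not a description of any actual database interface.

```python
# Hypothetical toy data: each paper maps to the list of works it references.
references = {
    "seed": ["old1"],
    "citing1": ["seed", "lead1"],
    "citing2": ["seed", "lead2", "lead1"],
    "lead1": [],
}


def search_backward(paper, references):
    """Retrospective search: the earlier works this paper refers to."""
    return list(references.get(paper, []))


def search_forward(paper, references):
    """Prospective search: the later papers whose references include this one."""
    return sorted(p for p, refs in references.items() if paper in refs)


def new_leads(paper, references):
    """Collect the other references of every paper that cites `paper`."""
    leads = set()
    for citing in search_forward(paper, references):
        leads.update(references[citing])
    leads.discard(paper)  # the starting paper is not a new lead
    return sorted(leads)
```

Starting from the single paper `seed`, the backward search finds `old1`, the forward search finds `citing1` and `citing2`, and the reference lists of those two citing papers supply `lead1` and `lead2` as new starting points for continuing the search.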
Authoritative, Timely, In-Depth Access to the Literature
It is important to stress that the citation-based associations and connections within the literature are made by authors themselves. Traditional indexes typically rely on human subject specialists to categorize and describe papers, usually using controlled vocabularies or thesauri.
A potential drawback of the latter method is illustrated in my early experience in compiling a list of references on “general adaptation syndrome.” Out of a sample of papers published in a five-year period, 23 had cited Hans Selye’s primordial paper (10). But even though all 23 were indexed in Index Medicus, not one was listed under the MeSH heading, “Adaptation.”
Another shortcoming of human indexing is that there is an inevitable delay due to the time required to read or scan the papers and make subjective judgments about relevant descriptors. In short, timeliness is reduced. In contrast, citation indexing does not involve this type of analysis, which enables the SCI, SSCI, and A&HCI to be virtually concurrent with the literature.
In addition, due to the expense of human indexing, traditional indexes limit the number of terms. But in Clarivate Analytics citation indexes, all cited references are indexed. Since the typical research paper today contains from 25 to 35 references, the resulting number of index entries is correspondingly high. Indeed, citing papers provide useful indexing “statements” or descriptors through the papers they cite.
Citations as Indexing Statements
Thanks to a suggestion by Chauncey Leake in the 1950s, I conducted a thorough analysis of review articles and their cited references. By doing what today would be called context analysis, I soon discovered that the sentences in the review articles were actually detailed, descriptive indexing statements about papers or books they cited.
Several years before ISI (now Clarivate Analytics) was founded, this basic notion was further developed with Robert L. Hayne when we both were at Smith, Kline and French Labs in the 1950s. Through large test samples, we concluded that the titles of papers cited in reviews and other articles were sufficient to add useful descriptive words and phrases to the citing paper. This was later confirmed in studies by A. J. Harley, as Irv Sher and I recently reported (11,12).
In 1990, ISI (now Clarivate Analytics) was able to introduce this citation-based method of derivative (algorithmic) subject indexing, called KeyWords Plus (7,8). KeyWords Plus supplies additional words and phrases that supplement title words, author-supplied keywords, and abstract words, thereby improving retrievability. These KeyWords Plus terms are derived from the titles of cited papers, which have been algorithmically processed to identify the most commonly recurring words and phrases.
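The general idea of deriving index terms from the titles of cited papers can be sketched as follows. This is not the actual KeyWords Plus algorithm, whose details are not given here. The stopword list, the `min_titles` threshold, the function names, and the sample titles are all assumptions made for illustration, and the sketch handles single words only, not phrases.

```python
import re
from collections import Counter

# Assumed minimal stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def title_words(title):
    """Lowercase a title and return its distinct non-stopword words."""
    words = re.findall(r"[a-z]+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def derive_keywords(cited_titles, min_titles=2):
    """Return words that recur in at least `min_titles` cited titles."""
    counts = Counter()
    for title in cited_titles:
        # Each title contributes a word at most once.
        counts.update(title_words(title))
    recurring = [(word, n) for word, n in counts.items() if n >= min_titles]
    # Most frequent first; ties broken alphabetically.
    recurring.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in recurring]


# Hypothetical titles of the papers cited by one citing paper.
cited_titles = [
    "Citation indexing in the sciences",
    "Algorithmic indexing of cited titles",
    "Citation analysis and retrieval",
]

keywords = derive_keywords(cited_titles)
```

With these sample titles, the words "citation" and "indexing" each recur in two cited titles, so they would be added as descriptors of the citing paper even if neither appears in its own title.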
In the space available, it is not possible to stress all the innovative advantages of citation indexing for information retrieval or to illustrate in detail the variety of search strategies it makes possible. While future Citation Comments will address these topics, it is perhaps more important to stress here why scientists should get into the habit of literature searching. One of the more obvious reasons is to avoid the unwitting duplication of research and the wasted time, effort, and funds this involves.
For example, in 1964, John Martyn, Aslib Research Department, London, showed how unintentional duplication is related to ignored or missed sources in the literature (13). He surveyed about 650 British scientists and asked if they had later discovered information in the literature they wished they had at the beginning of their projects. Twenty-two percent said yes and cited 245 specific instances. Of these, 18 percent involved unintentional research duplication. And in 43 percent of these instances, the researchers felt that time, money, or work was wasted.
I’ve always believed that authors should be held by journal editors to the same “due diligence” standards required of inventors by patent offices. That is, authors should formally assert and verify that their ideas are original and do not replicate discoveries already reported in the archives. Consequently, they should be required to acknowledge the “prior art” that directly or indirectly influenced their research.
In my opinion, the problem begins with teaching. Too few colleges require undergraduates to learn how to search the literature. But with proper mentoring, students should come to graduate schools already conditioned to do “prior art” searching—and practice these techniques throughout their careers, whether in academia or industry.
Dr. Eugene Garfield
Founder and Chairman Emeritus, ISI
- Garfield E. From Current Comments® to Citation Comments: continuing a 31-year series of Current Contents® essays with a new focus. Current Contents (51/52):3-5, 20-27 December 1993.
- Price D. J. D. Little science, big science…and beyond. New York: Columbia University Press, 1986. 301 p.
- Small H. G. Cited documents as concept symbols. Soc. Stud. Sci. 8:327-40, 1978.
- Merton R. K. Foreword. (Garfield E) Citation indexing—its theory and application in science, technology, and the humanities. Philadelphia: ISI Press®, 1983. p. vi.
- Kochen M. How do we acknowledge intellectual debts? J. Doc. 43:54-64, 1987.
- Garfield E. Announcing the SCI® Compact Disc Edition: CD-ROM gigabyte storage technology, novel software, and bibliographic coupling make desktop research and discovery a reality. Current Contents® (22):3-13, 30 May 1988. (Reprinted in: Essays of an information scientist: science literacy, policy, evaluation, and other essays. Philadelphia: ISI Press®, 1990. Vol. 11. p. 160-70.)
- Garfield E. KeyWords Plus®: ISI®’s breakthrough retrieval method. Part 1. Expanding your searching power on Current Contents on Diskette®. Current Contents (32):5-9, 6 August 1990. (Reprinted in: Ibid., 1991. Vol. 13. p. 295-9.)
- Garfield E. KeyWords Plus takes you beyond title words. Part 2. Expanded journal coverage for Current Contents on Diskette includes social and behavioral sciences. Current Contents (33):5-9, 13 August 1990. (Reprinted in: Ibid., 1991. Vol. 13. p. 300-4.)
- Urey H. C. Lifelike forms in meteorites. Science 137:623-8, 1962.
- Selye H. General adaptation syndrome. J. Clin. Endocrinol. 6:117-230, 1946.
- Gray W. A. & Harley A. J. Computer assisted indexing. Inform. Storage Retrieval 7:167-74, 1971.
- Garfield E. & Sher I. H. KeyWords Plus—algorithmic derivative indexing. J. Amer. Soc. Inform. Sci. 44:298-9, 1993.
- Martyn J. Unwitting duplication of research. New Sci. 21:338, 1964.