This is the first in a series of editorial comments, written especially for
those users of the Journal Citation Reports® (JCR)
with an interest in deepening their understanding of that product.
The topics addressed in this series all arise from the comments we receive
regarding JCR data and its appropriate use. Our goal is to answer
commonly asked questions, to provide new ideas to help our users develop cogent
journal evaluation methods, and to remind users and critics alike about the
many JCR applications.
Cited Title Unification or Making a Molehill out of a Mountain
Key to an understanding of the value of a citation index is the recognition
that the birth of citation indexing was closely tied to a desire to apply machine
automation to the organization and management of scientific literature. Machine-generated
indexing offered a way of reducing the amount of human intervention for subject
classification and indexing. The computer promised a reduction in the processing
time required for incorporation of a bibliographic record into a computerized
database as well. (For more on this, read our essay on the history of citation
indexing.)
But by virtue of its incorporation into a database or machine-readable file, the bibliographic record must be standardized in order to facilitate and maximize retrieval of results. A simple example of how standardization might be required can be seen in the various abbreviations for journal titles in records produced from a wide variety of journals and publishers. For instance, the journal title may be referenced as J CHROM or it may read J CHROMATOGR. Different disciplines, even different publishing organizations, have established their own bibliographic styles and practices over time. How does a database producer impose order on a widely divergent set of records?
Database producers, such as Thomson Scientific, generally have established processes, known as unification, for imposing standardization on such varied data. In particular, journal title unification may be defined as the process whereby a computer locates data strings that match known variant title spellings and updates them to an associated standard or preferred title spelling.
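The definition above can be sketched as a simple dictionary lookup. This is only a minimal illustration, not the Thomson Scientific system: the table `VARIANTS`, the helper `normalize`, and the function `unify_title` are hypothetical names, the variant J CHROMATOG is invented for the example, and J CHROMATOGR is arbitrarily assumed to be the preferred spelling.

```python
# Hypothetical variant table: cited spelling -> preferred spelling.
# The entries are illustrative assumptions, not actual dictionary data.
VARIANTS = {
    "J CHROM": "J CHROMATOGR",
    "J CHROMATOG": "J CHROMATOGR",
}


def normalize(cited: str) -> str:
    """Upper-case, turn periods into spaces, and collapse runs of whitespace."""
    return " ".join(cited.upper().replace(".", " ").split())


def unify_title(cited: str) -> str:
    """Return the preferred spelling for a known variant, else the cleaned input."""
    key = normalize(cited)
    return VARIANTS.get(key, key)
```

Under these assumptions, `unify_title("J. Chrom.")` and `unify_title("j chromatogr")` both resolve to the same preferred string, so citations to either form would be counted together.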
Consider that approximately 25 million citations are processed annually for inclusion in the Thomson Scientific database. About 14 million of these citations refer to one of the nearly 6500 journals listed in the SCI Journal Citation Reports® and the SSCI Journal Citation Reports®. The remaining 10-11 million citations are distributed over roughly 1.5 million titles, including cited journals and cited books as well as individually cited chapters, patents, conversations, etc.
For some products, such as the Journal Citation
Reports (JCR),
data unification represents a significant aspect
of their value to the user. In the absence of some
form of unification, citations that have some variant
form of a journal title could be passed over or
misattributed, thus affecting the reported citation
frequency of the journal and hence any subsequent
perception of the journal's influence. Yet ironically,
as we will see, the subtitle of this segment, Making
a Molehill Out of a Mountain, has two
meanings. Thomson Scientific editorial experts
must correlate tens of millions of cited titles
with a set of standard abbreviations, a mountain
of effort but an important aspect of database quality.
The retrieval of citations associated with a given
journal title and its associated abbreviation has
direct bearing on the ranking that journal may
receive in the JCR.
Yet we may go to great extremes to attribute a
handful of additional cites to JCR journals
only to see resulting averages, such as the impact
factor, altered by perhaps only a tenth of a percentage
point. That represents the molehill of our title.
Unification is necessary to the value of the
product, but its final effect on the numbers may be
smaller than the effort would seem to justify.
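A short worked example may make the scale concrete. The impact factor divides the citations a journal receives in a given year to items from its previous two years by the number of citable items it published in those two years. The figures below are invented purely for illustration and do not describe any real journal.

```python
def impact_factor(cites: int, citable_items: int) -> float:
    """Citations to the previous two years' items divided by the item count."""
    return cites / citable_items


# Hypothetical journal: 5000 cites to 2000 citable items.
before = impact_factor(5000, 2000)

# Unification recovers a handful (here, 5) of previously missed cites.
after = impact_factor(5005, 2000)

# Relative change in the impact factor, expressed as a percentage.
percent_change = (after - before) / before * 100
```

With these assumed figures the impact factor moves from 2.5 to 2.5025, a change of about a tenth of one percent.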
Because the JCR dataset is more narrowly circumscribed
than that of the citation indexes and because of the
statistical calculations that must be performed on
the data, the extent of unification in the JCR is
greater than in the Science Citation Index® and Social
Sciences Citation Index®.
The time and effort involved in normal system maintenance is compounded by the time and effort required to seek out those problems specific to this resource. The JCR unification dictionary lists approximately 200,000 cited variants for roughly 15,000 journals that either are currently covered or were covered at one time in Thomson Scientific products. There is an average of twenty cited variants per journal listed in this dictionary. (Some journals are listed with more than 100 variants.)
Any and all of the following can affect the standardization and unification of JCR data:
- bibliographic standards and accuracy in published references
- bibliographic standards in journal title presentation on covers, spines, copyright pages, headers or footers
- Thomson Scientific data entry guidelines for keying references
- Thomson Scientific title abbreviations in our source journal records
- Thomson Scientific policies on source journal record maintenance
- cited title unification during JCR production
- ambiguous cited titles
- the potential for human error in all phases of presenting and processing journal titles
It is easy to conceive how authors, journal editors
and Thomson Scientific data entry personnel could
make simple keying errors when processing citations,
but these kinds of errors are often anticipated and
caught during regular automated and manual checks
in our database load and JCR production systems. In fact, our input accuracy rate exceeds 98%. More significant unification problems tend to stem from exceptional situations and deviations from bibliographic standards. It seems that every possible exception that might occur in the presentation and citation of a journal title will occur at some point. Such exceptions can cause conflicts in JCR-related systems.
For example, to process cited titles, we use a twenty-character field. Occasionally, when standard abbreviation practices are followed, that field is too short to hold all the discriminating information for a longer title. Imagine a journal with the title International Journal of Manufacturing and Production Systems. Such a title might appear within the Thomson Scientific system as INT J MANUF PROD S. But imagine that another journal with a very similar title, the International Journal of Manufacturing and Production Services, is considered for inclusion within the database. It would be easy for the second title to be abbreviated identically to the first. On discovering such a conflict, however, we would slightly modify the first title's abbreviation within the system so that it would appear as INT J MANUF PROD SYS. Such handling might not perfectly meet ISO standards for such material, but it would still eliminate a great many of the variant citations.
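The conflict described above can be detected mechanically. The following is a minimal sketch, assuming abbreviations are stored in a twenty-character field as the text states; the function `find_conflicts` and the `titles` mapping are hypothetical and do not reflect the actual production system.

```python
from collections import defaultdict

FIELD_LENGTH = 20  # cited-title field width described in the text


def find_conflicts(abbreviations: dict) -> dict:
    """Group full titles by stored abbreviation; keep only shared abbreviations."""
    by_abbrev = defaultdict(list)
    for full_title, abbrev in abbreviations.items():
        # Anything beyond the field width would be lost on storage.
        by_abbrev[abbrev[:FIELD_LENGTH]].append(full_title)
    return {a: t for a, t in by_abbrev.items() if len(t) > 1}


SYSTEMS = "International Journal of Manufacturing and Production Systems"
SERVICES = "International Journal of Manufacturing and Production Services"

titles = {SYSTEMS: "INT J MANUF PROD S", SERVICES: "INT J MANUF PROD S"}
conflicts = find_conflicts(titles)

# Resolve by lengthening the first title's abbreviation, as in the text.
titles[SYSTEMS] = "INT J MANUF PROD SYS"
resolved = find_conflicts(titles)
```

Before the change, both titles share INT J MANUF PROD S and are reported as a conflict. Afterward, the modified abbreviation fills the twenty-character field exactly and no conflict remains.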
In another case, the Journal of General Microbiology changed its title to Microbiology, which brought it into conflict with the translation journal for the Russian Mikrobiologiya. To handle these types of difficulties, a new JCR production table, the JCR Homograph Table, will compare volume and publication year to confirm which would be the correct Thomson Scientific association. MICROBIOL-UK will be the Thomson Scientific designation for the former Journal of General Microbiology, while MICROBIOLOGY+ represents our designation for the translation journal. (Note: The plus sign on this last abbreviation indicates that cites to the original language journal are unified with cites to the translation title.)
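One way such a volume-and-year comparison could work is sketched below. The structure of the real JCR Homograph Table is not described here, so everything in this example is an assumption: the `HomographEntry` record, the `resolve` function, the one-volume-per-year rule, and the reference volumes, which are placeholder values rather than verified bibliographic data.

```python
from typing import NamedTuple, Optional


class HomographEntry(NamedTuple):
    designation: str
    reference_year: int
    reference_volume: int  # volume assumed to be published in reference_year


# Placeholder values for illustration only; not verified bibliographic data.
HOMOGRAPHS = {
    "MICROBIOLOGY": [
        HomographEntry("MICROBIOL-UK", reference_year=1994, reference_volume=140),
        HomographEntry("MICROBIOLOGY+", reference_year=1994, reference_volume=63),
    ],
}


def resolve(cited_title: str, year: int, volume: int, tolerance: int = 1) -> Optional[str]:
    """Pick the designation whose expected volume for `year` matches the cited volume.

    Assumes one volume per year. Returns None unless exactly one candidate fits.
    """
    matches = [
        entry.designation
        for entry in HOMOGRAPHS.get(cited_title, [])
        if abs(entry.reference_volume + (year - entry.reference_year) - volume) <= tolerance
    ]
    return matches[0] if len(matches) == 1 else None
```

Under these assumptions, a cite to MICROBIOLOGY dated 1996 with volume 142 would be assigned to MICROBIOL-UK, one with volume 65 to MICROBIOLOGY+, and a cite whose volume fits neither candidate would be left unresolved for manual review.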
Users have a role in our unification process as well. They frequently perform an invaluable service when they identify ongoing, new or potential problems with our methods of coping with irregularities — and we are deeply grateful when these are brought to our attention.
We are also appreciative when publishers consult us on title changes they may be considering. Such title changes may adversely affect JCR production by creating an unusual situation or by causing a conflict with other journal title abbreviations. By working with our users and with publishers, we can maintain the high level of precision and quality in our citation databases and in the data analysis that appears in the JCR.
This essay was prepared by: Janet Robertson, JCR
Project Coordinator, Thomson Scientific