Dime Novels and Penny Dreadfuls
Original Project Outline and Proposal
The Dime Novel and Story Paper Collection consists of over 8,000 individual
items, and includes long runs of the major dime novel series (Frank Leslie's
Boys of America, Happy Days, Beadle's New York Dime Library, etc.) and equally
strong holdings of story papers like the New York Ledger and Saturday Night.
Both genres flourished from the middle to the close of the 19th century in
America and England (where the novels were known as "penny dreadfuls"),
and benefited from three mutually reinforcing trends: the vastly increased
mechanization of printing, the growth of efficient rail and canal shipping, and
ever-growing rates of literacy.
The dime novels were aimed at youthful, working-class audiences and
distributed in massive editions at newsstands and dry goods stores. Though the
phrase conjures up stereotyped yarns of Wild West adventure, complete with lurid
cover illustration, many other genres were represented: tales of urban outlaws,
detective stories, working-girl narratives of virtue defended, and costume
Story papers, weekly eight-page tabloids, covered much the same ground, but
often combined material and themes to appeal to the whole family. The chief
among them had national circulations greater than any other newspaper or
magazine, some reaching 400,000 copies sold per issue. Unlike the dime novels,
which generally confine illustration to the cover, the story papers integrate
text and illustration (in the form of wood engravings) throughout.
Many dime novels were microfilmed by UMI as part of a large dime novel
collection it prepared about twenty years ago. Without a more extensive survey,
the uniqueness and research value of Stanford's collection, while significant,
These materials are in very fragile condition, and some form of rehousing
will be required before any form of access can be provided. Even with protective
housing, access to the original dime novels and penny dreadfuls will need
to be limited, due to their fragile condition. For this reason, we
propose rehousing the entire collection as part of this project.
Since the intellectual content of this type of material is of a limited
nature, we do not feel that the project needs to cover the entire collection at
the outset. Rather, we believe an approach that develops a model
digital representation of the entire collection would serve current
needs better and at the same time give us an approach that could be of general
use for other collections containing primary resources and mixed media. This
model digital representation will be designed and implemented in such a way as
to permit the entire collection to be linked in and digitized in the future if
we should so choose.
In this "model approach", access to the content of the
representative sample is required. The conjunction between visual and text is
very important in these materials, but it is not necessary to render every page
image. Page images of all or many of the covers and illustrations should be
provided. Most dime novels only have illustrations on the covers; penny
dreadfuls often have illustrations inside, as well, and an adequate selection
of these pages should likewise be rendered.
It has become clear through discussions with the curators that such a model
representation should be based on characteristics or salient features of the
collection that are not usually captured in a bibliographic description. Those
pursuing studies in cultural history, gender studies and literary studies, for
example, would be interested in knowing details such as the target audience,
genre focus, race and class representations, and so on. The curators will
therefore develop a template-based database of features (hereafter
features database) that will be captured as part of the
project, and linked to bibliographic items in the collection.
Therefore, to meet all of the intellectual access requirements for this
approach, we propose using a variety of technologies (scanning, OCR,
rekeying, SGML encoding, traditional cataloging, creation of a value-added
database of features) to create a model digital representation that
could serve as an introduction to the collection, be of sufficient size and
importance to serve as a teaching collection, and offer a framework for the
collection's complete reformatting at some future date.
- Prepare collection for rehousing
Small books (up to 5.5 x 8.5 inches). These will be
stored in mylar envelopes (6 x 9 inches) with the opening at the top. The bagged books
can then be stored in some form of document case or record storage box with a
central insert to allow a maximum number of books per box.
Midsized materials (up to 9 x 12 inches). Items will be
placed in polyethylene bags with archival board inserts for support, and
separated with document spacers.
Large format materials (up to 11 x 15 inches). These will be
stored in polyethylene zip-lock bags. These will be stored flat in one of the two
- Put collection, particularly serials and monographic series, in order
Worker (skill level, e.g. student or other, needs to be determined) identifies
runs split between boxes, using the collection inventory.
Going through the boxes in order, worker sets aside issues known to come
from split titles for later handling, and where all issues are present: removes
and discards plastic bags and tape; places each item in its new housing; packs
items into boxes and either labels boxes with contents, or annotates list with
new box number.
As complete runs of titles are assembled, they are then ready for curatorial review.
- Curatorial review of collection to determine which items need to be
captured for the features database
- Initial curatorial screening for inclusion in model digital representation
- Transfer of collection for technical processing
- Creation of features database record for identified items
- Creation of bibliographic record for bibliographic entities (no analytics
required; however, holdings statements will be created where appropriate)
Creation of Access Model
- Curatorial determination (developed with the help of relevant
curators/scholars/faculty team member(s)) of the items to be included in the
- Development of access model (on-going)
- Digitization of identified items (resolution, bit-depth, preservation and
access copy needs to be determined)
- OCR and/or rekeying of textual material
- SGML-based encoding of textual material
- Integration of digital images, encoded text into the access model
- Incremental Cost Summary
|Category||Known Cost||Worst Case Cost|
We assume that incremental costs are limited to those for which Stanford is
out of pocket. Resource requirements/costs for currently employed staff are
mentioned in the individual sections, but are not counted here. Digitization and
rekeying are given here as categories of costs which might be present, depending
on circumstances which are difficult at present to predict or estimate. Our
'Worst Case Cost' is based on needing to treat and outsource 500 images, and
outsource the rekeying of 300 pages of text.
- Material and Labor Costs
Material costs are based on the total number of items in the collection
(7,171) and their format. We should note that 653 are bound together and require
no further preservation treatment at this time.
Labor costs are calculated at the rate of one item per minute, with 6,518
items yielding 108 hours; to which is added 25% for the large amount of shifting
that will be needed, yielding 135 hours. At $10/hour, plus 26.2% benefits, the
total is as shown. More expert labor at a higher cost could drive this figure
up incrementally; we expect the total number of hours to remain roughly the same.
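The labor arithmetic above can be checked with a short sketch. The item counts, rates, and percentages come from the text; the decision to round down to whole hours (to match the quoted 108 hours) and the function name are assumptions made for illustration, and the resulting dollar figure is derived here rather than quoted from the proposal.

```python
# Figures taken from the text above; the rounding convention is an assumption.
ITEMS_TOTAL = 7171         # total items in the collection
ITEMS_BOUND = 653          # bound together, no treatment needed now
MINUTES_PER_ITEM = 1       # rehousing rate: one item per minute
SHIFTING_ALLOWANCE = 0.25  # extra time for shifting materials
HOURLY_RATE = 10.00        # dollars per hour
BENEFITS_RATE = 0.262      # benefits added on top of wages


def rehousing_labor(items_total=ITEMS_TOTAL, items_bound=ITEMS_BOUND):
    """Return (items to rehouse, labor hours, labor cost in dollars)."""
    items = items_total - items_bound
    # Whole hours only, matching the 108 hours quoted in the text (assumption).
    base_hours = (items * MINUTES_PER_ITEM) // 60
    hours = base_hours * (1 + SHIFTING_ALLOWANCE)
    cost = round(hours * HOURLY_RATE * (1 + BENEFITS_RATE), 2)
    return items, hours, cost


items, hours, cost = rehousing_labor()
print(f"{items} items, {hours:.0f} hours, ${cost:,.2f}")
```

Raising `HOURLY_RATE` models the "more expert labor" case while leaving the hour count unchanged.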
- Curatorial Review
Two FTE weeks at the outset will be necessary to complete the needed
identification of materials to appear in the features database and make some
preliminary judgements about the items to be digitized.
Some additional time may be necessary (but probably no more than one FTE
day) after the features database has been created to complete the selection.
- Technical Processing
We assume that the 7,100 or so pieces comprise roughly 500 bibliographic
items: 200 titles at the series level, and 300 individual monographs. For the
series items, a 10% sample evidenced a hit rate of 75%, when searching both OCLC
Using current salary, benefits, and production rates, the costs for
|Cataloging Type||Number of Items||Cost|
|Series - Copy/Variant Edition||150||$3,600|
|Series - Original|| || |
|Monographs - Copy||225||$2,363|
|Monographs - Original||75|| |
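As a rough check on the table, the copy/original split can be recomputed from the 75% hit rate. This is only a sketch: the hit rate was measured on a sample of series items, so applying it to monographs is an assumption (one that happens to agree with the 225/75 split shown), and the per-item costs below are merely implied by the two cost figures that survive in the table. The function names are hypothetical.

```python
def split_copy_original(total, hit_rate):
    """Split a count of bibliographic items into (copy, original) cataloging."""
    copy = round(total * hit_rate)
    return copy, total - copy


def implied_unit_cost(total_cost, count):
    """Per-item cost implied by a table row (derived, not stated in the text)."""
    return total_cost / count


HIT_RATE = 0.75  # from the 10% sample of series items

series_copy, series_original = split_copy_original(200, HIT_RATE)
mono_copy, mono_original = split_copy_original(300, HIT_RATE)

print("Series:", series_copy, "copy,", series_original, "original")
print("Monographs:", mono_copy, "copy,", mono_original, "original")
print("Implied series copy cost per item:", implied_unit_cost(3600, series_copy))
print("Implied monograph copy cost per item:",
      round(implied_unit_cost(2363, mono_copy), 2))
```

The split reproduces the item counts given for copy cataloging; costs for original cataloging are not stated in the table and are not estimated here.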
The features database will be created after curatorial review, during the
cataloging process, so that linkages between records may be easily maintained.
The database will be created by a student assistant currently employed by ATS;
failing that, a graduate student in either English or History will be sought out. The
position will be funded out of current ATS salary requisition dollars.
- Development of access model
Developing the access model will be an on-going project, whose exact
management and structure have yet to be determined. Our preference is to use an
iterative test-bed model where some basic design/access issues would be mapped
out in a couple of meetings with the project team, and then implemented using
rapid development tools, repeating as necessary until the structure emerges. We
will also explore the use of MediaWeaver tools for file management and rapid
We expect an elapsed time of 8 weeks, with at least three 2-hour team
meetings to develop, review and refine the model. No external or incremental
- Digitization of collection/Capture of digitization information
The number of items to be digitized will not be known until after the
curatorial review has been completed. We expect that number to represent the
entire collection to some degree. At a minimum one image from each bibliographic
item would be created, and page images from those items selected for content
capture. This would mean approximately 1,000 images (1 image per bibliographic
item and some 500 pages of textual image/content) at the low end of the scale
and 3,000+ images at the high end.
We expect that most, if not all, of the digitization will be done in house,
on flat-bed scanners located within ATS or Preservation, depending on the image
requirements. There are issues associated with digitization, the fragility of
the collection, and the use of this type of scanner that will need to be
addressed (see Unresolved Issues). There may also be
associated incremental costs (equipment, storage, maintenance related) of which
we are currently unaware.
Integrally related to the digitization of the items is the capture of
digitization information to be included or kept with
the digital image. Here there are no standards to follow, and few models. This
is an area ripe for cooperation/collaboration with others to gain experience and
test out some approaches. Potential partners, such as RLG and UC Berkeley, come
immediately to mind, and Connie's work with RLG will likely prove beneficial.
- OCR/Keyboarding/Encoding of selected content resources
We are not sure how well current OCR technology will be able to handle the
selected items, nor of the precise workflow that we will need to employ for fragile
items, as our current experience is based on starting from a good original, and then
disbinding and discarding it. This project will give us the
opportunity to gain some useful experience here. If keyboarding is necessary, we
will explore the possibility of outsourcing.
Our current processes suggest that 500 pages of text could be encoded using
the TEI Lite DTD subset at the rate of 15-20 pages an hour, or in 25-35 hours by
existing staff. It is likely that additional mark-up to map in links and image
files, which is not part of our usual procedure, would add a further 5
to 10 hours. One project goal here would be to develop specifications and tools
to automate as much of the encoding process as possible so that future projects
would benefit from greater productivity.
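The encoding estimate can be reproduced with a small calculation. This is a sketch under the stated figures only (500 pages, 15-20 pages an hour, 5 to 10 extra hours for links and image files); the function name is hypothetical, and the text's upper bound of 35 hours appears to be the computed figure rounded up.

```python
def encoding_hours(pages, slow_rate, fast_rate, extra_low=0.0, extra_high=0.0):
    """Return (best case, worst case) hours to encode `pages` pages.

    slow_rate and fast_rate are pages per hour; extra_low and extra_high
    are additional hours for mark-up outside the usual procedure.
    """
    best = pages / fast_rate + extra_low
    worst = pages / slow_rate + extra_high
    return best, worst


# Base encoding only: 500 pages at 15-20 pages an hour.
base_best, base_worst = encoding_hours(500, slow_rate=15, fast_rate=20)

# Including the additional 5 to 10 hours for links and image files.
full_best, full_worst = encoding_hours(500, 15, 20, extra_low=5, extra_high=10)

print(f"Base encoding: {base_best:.1f} to {base_worst:.1f} hours")
print(f"With link/image mark-up: {full_best:.1f} to {full_worst:.1f} hours")
```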
- Integration of images/encoded text into access model
If we are successful in developing a good access model through iterative
prototyping, then the integration of the pieces into the final 'product' will
depend only on how well that prototype withstands, or can be made to withstand,
a production environment. At this stage, the issues will largely be
production-oriented, for example:
- ease/facility of managing content-based searching with SGML and Web to
- ease/facility of managing links/large file structure;
- scalability of project for expansion both at the technical level and at the
- ease/facility of linking in other external, but related resources, as these
may be discovered;
- client software base presumed for delivery of the collection.
- Potential destruction of the original in the course of digitization
The fragility of much of the collection means that some items we may wish
to digitize may not survive the digitization process. There appear to be four
courses of action in such circumstances, each with attendant risks/costs. All
four could be used on a case-by-case basis. While this approach is attractive,
it is also the most complicated and expensive, as four separate decision points
and project flows would need to be created.
- Treat very fragile material with parylene before digitization
Parylene treatment would strengthen the object sufficiently to allow
digitization, and some handling thereafter. However, such treatment also changes
the object at the molecular level, and thus violates general principles of
artifactual preservation. Parylene treatment would also add an additional
preservation cost to the project for those items treated.
- Outsource items that could stand digitization by digital cameras
Some items which could not be digitized on a flatbed scanner might
withstand digitization by a digital camera. These items could be outsourced.
Here too additional costs, as well as security issues, would arise.
- Forgo digitizing images that would be damaged/destroyed by digitization
- Permit the destruction of some items.
- Lack of processing space in Special Collections
Special Collections has acute processing space problems. As
noted, such space issues might result in the suspension of other activities
during the preservation rehousing of the collection. An alternative might be to
transfer the collection to other, secure space for its rehousing if its
processing would otherwise impede the processing of other collections at the
- Prototype for delivering a model digital collection
- Production digitization/preservation model and process flow for fragile
- Beginning production model for capture/storage of digitization
- Accurate cost information for like future digitization projects
- Prototype access model (with related software/hardware requirements)
The features database will consist of the following items (and fixed values,
- Short bibliographic description of item (if record is not linked to
main bibliographic record)
- Condition of item
- Age of target audience
  - Young children
  - Young Adults
  - Middle-aged Adults
- Gender of target audience
- Broad genre focus
- US, but otherwise undifferentiated
- Other (specify)
- Salient graphic features
  - Engraving (wood or metal)
- Subject of graphics
  - Disasters and accidents (flood, fire, etc.)
  - Celebrations (Fourth of July, World's Fairs, etc.)
  - Boy's/Girl's adventures
- Subject of Content
- Presence, absence, placement of advertising
  - Scattered throughout
  - Grouped at the front
  - Grouped at the back
  - Back Cover
- Additional noteworthy features
At least one record will be created for each bibliographic item, and at
least one record will likewise be created for each piece that is digitized.
Depending on the outcome of curatorial review, additional records may be created
for serials or monographic series, particularly when these evidence changes in
one of the main categories over time.
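One possible shape for a features database record is sketched below. It is an illustration only: the class and field names are hypothetical, the fixed values are limited to those that appear in the list above, and fields whose values are not listed are left as free text. The final template is to be developed by the curators.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AudienceAge(Enum):
    YOUNG_CHILDREN = "Young children"
    YOUNG_ADULTS = "Young Adults"
    MIDDLE_AGED_ADULTS = "Middle-aged Adults"


class AdPlacement(Enum):
    SCATTERED = "Scattered throughout"
    FRONT = "Grouped at the front"
    BACK = "Grouped at the back"
    BACK_COVER = "Back Cover"


@dataclass
class FeatureRecord:
    # Link to the main bibliographic record, if one exists (hypothetical key).
    bib_record_id: Optional[str]
    # Required only when the record is not linked to a bibliographic record.
    short_description: Optional[str]
    condition: str
    audience_age: List[AudienceAge] = field(default_factory=list)
    audience_gender: Optional[str] = None
    genre_focus: Optional[str] = None
    graphic_features: List[str] = field(default_factory=list)
    graphic_subjects: List[str] = field(default_factory=list)
    content_subjects: List[str] = field(default_factory=list)
    ad_placement: List[AdPlacement] = field(default_factory=list)
    notes: str = ""

    def __post_init__(self):
        # An unlinked record must carry its own short description.
        if self.bib_record_id is None and not self.short_description:
            raise ValueError("unlinked record needs a short description")


# Example values are placeholders, not data from the collection.
record = FeatureRecord(
    bib_record_id=None,
    short_description="Placeholder description of an unlinked item",
    condition="fragile",
    audience_age=[AudienceAge.YOUNG_ADULTS],
    graphic_features=["Engraving (wood or metal)"],
    ad_placement=[AdPlacement.BACK_COVER],
)
print(record.audience_age[0].value, "/", record.ad_placement[0].value)
```

Keeping the fixed values in enumerations means a change to the curators' template touches one definition rather than every record.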
It is an open question whether any of the materials
selected for digitization will be candidates for additional preservation or
conservation after the digitization process. It is likely that many of these
items will be too badly damaged as a part of the process to be restored to a
usable condition; other items may be treated so that they will be sufficiently
strong to withstand digitization.
Due to the age of the material, the accuracy rate of
the OCR may be insufficiently high, and require rekeying of the material, or the
identification of more recent editions.
The desire to minimize handling of the material
means front-loading all the physical processing. We will therefore want those
135 hours of work accomplished in the shortest possible time. This might be best
accomplished by employing more than one full-time worker. In any event, the
paucity of space in special collections might require suspending other work
while this work goes on.
For example, some have argued that information about
the following items (by no means a complete list) needs to be included as
header information with the digital surrogate: 1) the state of the original
(treated? discarded? microfilmed first?); 2) compression information; 3)
scanning device and calibration information; 4) resolution; 5) density; 6)
manipulation (e.g. were coffee stains removed, was the image enhanced in any
way); 7) authentication for the "original" digital file, with
derivative versions pointing back to the "original"; 8) test pattern for gray
scale and color scale; 9) name of service agency that captured image; 10) life
history of image (date, update dates?, record of refreshing).
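Since there are no standards to follow here, the ten items above could be prototyped as a simple header record stored alongside each image. The sketch below is one hedged possibility: the class name, field names, JSON serialization, and all example values are assumptions for illustration, not an established format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DigitizationHeader:
    original_state: str                 # 1) treated? discarded? microfilmed?
    compression: str                    # 2) compression information
    scanning_device: str                # 3) scanning device
    calibration: str                    # 3) calibration information
    resolution_dpi: int                 # 4) resolution
    density: str                        # 5) density
    manipulations: List[str]            # 6) e.g. stain removal, enhancement
    authentication: str                 # 7) check value for the "original"
    derived_from: Optional[str]         # 7) pointer back to the "original"
    test_pattern: str                   # 8) gray scale / color scale target
    service_agency: str                 # 9) who captured the image
    history: List[str] = field(default_factory=list)  # 10) life history


def to_header_json(header):
    """Serialize a header so it can be kept with the digital image."""
    return json.dumps(asdict(header), indent=2, sort_keys=True)


# All values below are invented placeholders.
example = DigitizationHeader(
    original_state="rehoused, untreated",
    compression="none",
    scanning_device="flatbed scanner (model not specified)",
    calibration="placeholder calibration note",
    resolution_dpi=300,
    density="placeholder",
    manipulations=[],
    authentication="placeholder-checksum",
    derived_from=None,
    test_pattern="placeholder gray scale target",
    service_agency="in house",
    history=["captured (date placeholder)"],
)
print(to_header_json(example))
```

A derivative image would carry the master's identifier in `derived_from`, which is one way to make derivative versions point back to the "original".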
Our working assumption has been a level of
functionality currently delivered by standard Web clients, such as Mosaic or
Netscape. We will need to explore other options, such as SGML viewers, for some
or all of the desired functionality.
The current elements are based on a cursory
evaluation of the collection. The curators will develop the final version in the
course of completing their review.