Dime Novels and Penny Dreadfuls

Original Project Outline and Proposal

 

Description of Source Material

The Dime Novel and Story Paper Collection consists of over 8,000 individual items, and includes long runs of the major dime novel series (Frank Leslie's Boys of America, Happy Days, Beadle's New York Dime Library, etc.) and equally strong holdings of story papers like the New York Ledger and Saturday Night.

Both genres flourished from the middle to the close of the 19th century in America and England (where the novels were known as "penny dreadfuls"), and benefited from three mutually reinforcing trends: the vastly increased mechanization of printing, the growth of efficient rail and canal shipping, and ever-growing rates of literacy.

The dime novels were aimed at youthful, working-class audiences and distributed in massive editions at newsstands and dry goods stores. Though the phrase conjures up stereotyped yarns of Wild West adventure, complete with lurid cover illustration, many other genres were represented: tales of urban outlaws, detective stories, working-girl narratives of virtue defended, and costume romances.

Story papers, weekly eight-page tabloids, covered much the same ground, but often combined material and themes to appeal to the whole family. The chief among them had national circulations greater than those of any other newspaper or magazine, with some reaching 400,000 copies sold per issue. Unlike the dime novels, which generally confine illustration to the cover, the story papers integrate text and illustration (in the form of wood engravings) throughout.

Many dime novels were microfilmed by UMI as part of a large dime novel collection it prepared about twenty years ago. While Stanford's collection is clearly significant, its uniqueness and research value cannot be fully assessed without a more extensive survey.

Material Proposed for Digitization/Preservation

These materials are in very fragile condition, and some form of rehousing will be required before any access can be provided. Even with protective housing, access to the original dime novels and penny dreadfuls will need to be limited because of their fragility. For this reason, we propose rehousing the entire collection as part of this project.

Since the intellectual content of this material is of a limited nature, we do not feel that the project needs to cover the entire collection at the outset. Rather, we believe that developing a model digital representation of the collection would serve current needs better, and at the same time give us an approach of general use for other collections containing primary resources and mixed media. This model digital representation will be designed and implemented in such a way that the entire collection can be linked in and digitized in the future, should we so choose.

Approach to a Model Digital Representation

In this "model approach", access to the content of the representative sample is required. The conjunction between visual and text is very important in these materials, but it is not necessary to render every page image. Page images of all or many of the covers and illustrations should be provided. Most dime novels only have illustrations on the covers; penny dreadfuls often have illustrations inside, as well, and an adequate selection of these pages should likewise be rendered.

It has become clear through discussions with the curators that such a model representation should be based on characteristics or salient features of the collection that are not usually captured in a bibliographic description. Those pursuing studies in cultural history, gender studies, and literary studies, for example, would be interested in details such as the target audience, genre focus, race and class representations, and so on. The curators will therefore develop a template-based database of features (hereafter the features database) that will be captured as part of the project and linked to bibliographic items in the collection.

Therefore, to meet all of the intellectual access requirements of this approach, we propose using a variety of technologies (scanning, OCR, rekeying, SGML encoding, traditional cataloging, and the creation of a value-added features database) to create a model digital representation that could serve as an introduction to the collection, be of sufficient size and importance to serve as a teaching collection, and offer a framework for the collection's complete reformatting at some future date.

Project Workflow

Preservation/Rehousing

  • Prepare collection for rehousing

    Small books (up to 5.5 x 8.5 inches): These will be stored in mylar envelopes (6 x 9 inches) with the opening at the top. The bagged books can then be stored in some form of document case or record storage box with a central insert to allow a maximum number of books per box.

    Midsized materials (up to 9 x 12 inches): Items will be placed in polyethylene bags with archival board inserts for support, and separated with document spacers.

    Large format materials (up to 11 x 15 inches): These will be stored flat in polyethylene zip-lock bags, in one of the two types of boxes suggested above.

  • Put collection, particularly serials and monographic series, in order

    A worker (skill level, e.g. student or other, to be determined) identifies runs split between boxes, using the collection inventory.

    Going through the boxes in order, the worker sets aside issues known to come from split titles for later handling. Where all issues are present, the worker removes and discards the old plastic bags and tape, places each item in its new housing, packs the items into boxes, and either labels the boxes with their contents or annotates the inventory with the new box numbers.

    As complete runs of titles are assembled, they are then ready for curatorial review.


Curatorial Review

  • Curatorial review of the collection to determine which items need to be captured in the features database
  • Initial curatorial screening for inclusion in model digital representation

Technical Processing

  • Transfer of collection for technical processing
  • Creation of features database record for identified items
  • Creation of bibliographic records for bibliographic entities (no analytics required; however, holdings statements will be created where appropriate)

Creation of Access Model

  • Curatorial determination (developed with the help of relevant curators/scholars/faculty team members) of the items to be included in the model representation
  • Development of access model (on-going)

Digitization

  • Digitization of identified items (resolution, bit-depth, and preservation/access copy requirements to be determined)[1]
  • OCR[2] and/or rekeying of textual material
  • SGML-based encoding of textual material

Integration

  • Integration of digital images and encoded text into the access model

Estimated Resources (Incremental and Other) Required

  • Incremental Cost Summary

    Category                 Known Cost   Worst Case Cost
    Preservation/Rehousing   $6,204       $6,204
    Cataloging               $6,113       $6,113
    Digitization             $0           $1,000
    Rekeying                 $0           $1,000
    Other Materials          $0           $1,000
    Total                    $12,317      $15,317

    We assume that incremental costs are limited to those for which Stanford is out of pocket. Resource requirements and costs for currently employed staff are mentioned in the individual sections, but are not counted here. Digitization and rekeying are given here as categories of cost that might be present, depending on circumstances that are difficult to predict or estimate at present. Our worst case cost is based on needing to treat and outsource the digitization of 500 images, and to outsource the rekeying of 300 pages of text.

  • Preservation/Rehousing
    • Material and Labor Costs

      Materials $4,500
      Labor $1,704
      Total $6,204

      Material costs are based on the total number of items in the collection (7,171) and their format. We should note that 653 are bound together and require no further preservation treatment at this time.

      Labor costs are calculated at the rate of one item per minute: 6,518 items yield 108 hours, to which we add 25% for the large amount of shifting that will be needed, for a total of 135 hours. At $10/hour, plus 26.2% benefits, the total is as shown. More expert labor at a higher rate could drive this figure up incrementally; we expect the total number of hours to remain roughly the same.[3]
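
      For clarity, the arithmetic behind these figures can be restated as a short computation (all rates are as given above):

        # Recomputation of the rehousing labor estimate above.
        items = 7171 - 653            # 6,518 pieces to rehouse (653 bound volumes excluded)
        base_hours = items // 60      # one item per minute -> 108 hours
        hours = base_hours * 1.25     # plus 25% for shifting -> 135 hours
        cost = hours * 10.00 * 1.262  # $10/hour plus 26.2% benefits
        print(f"{hours:.0f} hours, ${cost:,.0f}")  # 135 hours, $1,704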

  • Curatorial Review

    Two FTE weeks at the outset will be necessary to complete the needed identification of materials to appear in the features database and make some preliminary judgements about the items to be digitized.

    Some additional time may be necessary (but probably no more than one FTE day) after the features database has been created to complete the selection.

  • Technical Processing

    We assume that the 7,100 or so pieces comprise roughly 500 bibliographic items: 200 titles at the series level, and 300 individual monographs. For the series items, a 10% sample evidenced a hit rate of 75%, when searching both OCLC and RLIN.

    Using current salary, benefits, and production rates, the costs for cataloging are:

    Cataloging Type                 Number of Items   Cost
    Series - Copy/Variant Edition   150               $3,600
    Series - Original               50                $2,500
    Sub-Total                       200               $6,100
    Monographs - Copy               225               $2,363
    Monographs - Original           75                $3,750
    Sub-Total                       300               $6,113
    Grand Total                     500               $12,213
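
    As a quick check, the line items sum to the totals shown (figures taken directly from the table above):

      # Check of the cataloging totals above.
      lines = [
          ("Series - Copy/Variant Edition", 150, 3600),
          ("Series - Original",              50, 2500),
          ("Monographs - Copy",             225, 2363),
          ("Monographs - Original",          75, 3750),
      ]
      total_items = sum(n for _, n, _ in lines)
      total_cost = sum(c for _, _, c in lines)
      print(total_items, f"${total_cost:,}")  # 500 items, $12,213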

    The features database will be created after curatorial review, during the cataloging process, so that linkages between records may be easily maintained. The database will be created by a student assistant currently employed by ATS; failing that, another graduate student in either English or History will be sought out. The position will be funded out of current ATS salary requisition dollars.

  • Development of access model

    Developing the access model will be an on-going project, whose exact management and structure have yet to be determined. Our preference is to use an iterative test-bed model in which some basic design and access issues would be mapped out in a couple of meetings with the project team, then implemented using rapid development tools, repeating as necessary until the structure emerges. We will also explore the use of MediaWeaver tools for file management and rapid deployment.

    We expect an elapsed time of eight weeks, with at least three two-hour team meetings to develop, review, and refine the model. No external or incremental resources are expected.

  • Digitization of collection/Capture of digitization information

    The number of items to be digitized will not be known until the curatorial review has been completed. We expect the selection to be representative of the entire collection. At a minimum, one image from each bibliographic item would be created, along with page images from those items selected for content capture. This would mean approximately 1,000 images (one image per bibliographic item, plus some 500 pages of textual image/content) at the low end of the scale, and 3,000 or more images at the high end.
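
    The range can be restated as a back-of-the-envelope computation (the high-end figure is simply the planning ceiling given above):

      # Rough image-count range implied by the estimates above.
      bib_items = 500       # approximate bibliographic items (see Technical Processing)
      content_pages = 500   # pages selected for content capture, low-end estimate
      low_end = bib_items + content_pages   # one image per item, plus content pages
      high_end = 3000                       # high-end planning figure
      print(f"{low_end:,} to {high_end:,}+ images")  # 1,000 to 3,000+ images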

    We expect that most, if not all, of the digitization will be done in house, on flat-bed scanners located within ATS or Preservation, depending on the image requirements. There are issues associated with digitization, the fragility of the collection, and the use of this type of scanner that will need to be addressed (see Unresolved Issues). There may also be associated incremental costs (equipment, storage, maintenance related) of which we are currently unaware.

    Integrally related to the digitization of the items is the capture of digitization information[4] to be included or kept with the digital image. Here there are no standards to follow, and few models. This is an area ripe for cooperation and collaboration with others to gain experience and test approaches. Potential partners, such as RLG and UC Berkeley, come immediately to mind, and Connie's work with RLG will likely prove beneficial.

  • OCR/Keyboarding/Encoding of selected content resources

    We are not sure how well current OCR technology will handle the selected items, nor what precise workflow we will need for fragile items, since our current experience is based on working from a good original and then disbinding and discarding it. This project will give us the opportunity to gain some useful experience here. If keyboarding is necessary, we will explore the possibility of outsourcing.

    Our current processes suggest that 500 pages of text could be encoded using the TEI Lite DTD subset at the rate of 15-20 pages an hour, or in 25-35 hours by existing staff. It is likely that additional mark-up to map in links and image files (not part of our usual procedure) would add another 5 to 10 hours. One project goal here is to develop specifications and tools to automate as much of the encoding process as possible, so that future projects would benefit from greater productivity.
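
    For illustration only, the sketch below shows how rekeyed page text might be wrapped in minimal TEI Lite-style markup with pointers to page images; the element choices and file-naming scheme are our assumptions, not a worked-out DTD mapping:

      # Illustrative only: minimal TEI Lite-style wrapping of rekeyed page
      # text, with a pointer to the corresponding page image. Element
      # choices and file names are assumptions made for this sketch.
      def encode_page(n: int, text: str, image: str) -> str:
          # <pb> marks the page break; <figure> points at the page-image entity.
          return f'<pb n="{n}"><figure entity="{image}"><p>{text}</p>'

      pages = ["Deadwood Dick stood at bay..."]   # rekeyed or OCR'd page text
      body = "".join(encode_page(n, t, f"page{n:04d}")
                     for n, t in enumerate(pages, 1))
      print(f"<text><body>{body}</body></text>")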

  • Integration of images/encoded text into access model

    If we are successful in developing a good access model through iterative prototyping, then the integration of the pieces into the final 'product' will depend only on how well that prototype withstands, or can be made to withstand, a production environment. At this stage, the issues will largely be production-oriented, for example:

    • ease/facility of managing content-based searching with SGML and Web to Pat interface;
    • ease/facility of managing links/large file structure;
    • scalability of project for expansion both at the technical level and at the intellectual level;
    • ease/facility of linking in other external, but related resources, as these may be discovered;
    • client software base presumed for delivery of the collection.[5]

Unresolved Issues


  • Potential destruction of the original in the course of digitization

    The fragility of much of the collection means that some items we may wish to digitize may not survive the digitization process. There appear to be four courses of action in such circumstances, each with attendant risks and costs. All four could be used on a case-by-case basis; while this approach is attractive, it is also the most complicated and expensive, as four separate decision points and project flows would need to be created.

    1. Treat very fragile material with parylene before digitization

      Parylene treatment would strengthen the object sufficiently to allow digitization, and some handling thereafter. However, such treatment also changes the object at the molecular level, and thus violates general principles of artifactual preservation. Parylene treatment would also add an additional preservation cost to the project for those items treated.

    2. Outsource items that could stand digitization by digital cameras

      Some items that could not be digitized on a flatbed scanner might withstand digitization by a digital camera. These items could be outsourced. Here too, additional costs, as well as security issues, would be incurred.

    3. Forgo digitizing images that would be damaged/destroyed by digitization
    4. Permit the destruction of some items.
  • Lack of processing space in Special Collections

    Special Collections has acute processing space problems. As noted, such space issues might result in the suspension of other activities during the preservation rehousing of the collection. An alternative might be to transfer the collection to other secure space for rehousing, if processing it in place would impede the processing of other collections.


Project Deliverables

  • Prototype for delivering a model digital collection
  • Production digitization/preservation model and process flow for fragile source collections
  • Beginning production model for capture/storage of digitization meta-information
  • Accurate cost information for future digitization projects of this kind
  • Prototype access model (with related software/hardware requirements)

Appendix

Features Database

The features database will consist of the following items (and fixed values, where appropriate):[6]

  1. Short bibliographic description of item (if record is not linked to main bibliographic record)
  2. Condition of item
  3. Age of target audience

    • Young children
    • Young Adults
    • Middle-aged Adults
    • Elderly

  4. Gender of target audience
  5. Broad genre focus

    • Comedy
    • Murder/Violence
    • Domestic
    • Romance
    • Adventure
    • Western

  6. Setting
    • Urban
    • Rural
    • East
    • Midwest
    • West
    • South
    • US, but otherwise undifferentiated
    • Other (specify)

  7. Salient graphic features
    • Colored
    • Engraving (wood or metal)
    • Lithograph
    • Full-page/Cover

  8. Subject of graphics
    • Race
    • Class
    • Gender
    • Romantic
    • Violence
    • Disasters and accidents (flood, fire, etc.)
    • Celebrations (Fourth of July, World's Fairs, etc.)
    • Boy's/Girl's adventures

  9. Subject of Content
  10. Presence, absence, placement of advertising
    • Scattered throughout
    • Grouped at the front
    • Grouped at the back
    • Back Cover

  11. Additional noteworthy features

At least one record will be created for each bibliographic item, and at least one record will likewise be created for each piece that is digitized. Depending on the outcome of curatorial review, additional records may be created for serials or monographic series, particularly when these evidence changes in one of the main categories over time.
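
By way of illustration only, a features-database record might be sketched as follows; the field names, types, and record shape are our assumptions, and the final design rests with the curators (see note 6):

  # Illustrative sketch of a features-database record based on the element
  # list above. Field names and types are assumptions; the curators will
  # settle the final design.
  from dataclasses import dataclass, field
  from typing import Optional

  AUDIENCE_AGE = {"Young children", "Young Adults", "Middle-aged Adults", "Elderly"}
  GENRES = {"Comedy", "Murder/Violence", "Domestic", "Romance", "Adventure", "Western"}
  SETTINGS = {"Urban", "Rural", "East", "Midwest", "West", "South",
              "US, otherwise undifferentiated", "Other"}
  AD_PLACEMENT = {"Scattered throughout", "Grouped at the front",
                  "Grouped at the back", "Back Cover"}

  @dataclass
  class FeaturesRecord:
      bib_record_id: Optional[str]       # link to main bibliographic record, if any
      short_description: Optional[str]   # used only when no bibliographic link exists
      condition: str
      audience_age: set = field(default_factory=set)      # values from AUDIENCE_AGE
      audience_gender: Optional[str] = None
      genre: set = field(default_factory=set)             # values from GENRES
      setting: set = field(default_factory=set)           # values from SETTINGS
      graphic_features: set = field(default_factory=set)  # e.g. "Colored", "Lithograph"
      graphic_subjects: set = field(default_factory=set)  # e.g. "Race", "Violence"
      content_subject: str = ""
      advertising: set = field(default_factory=set)       # values from AD_PLACEMENT
      notes: str = ""                                     # additional noteworthy features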


Notes

[1] It is an open question whether any of the materials selected for digitization will be candidates for additional preservation or conservation after the digitization process. It is likely that many of these items will be too badly damaged in the process to be restored to a usable condition; other items may be treated beforehand so that they are sufficiently strong to withstand digitization.

[2] Due to the age of the material, the accuracy rate of the OCR may be too low, requiring rekeying of the material or the identification of more recent editions.

[3] The desire to minimize handling of the material means front-loading all of the physical processing. We will therefore want those 135 hours of work accomplished in the shortest possible time, which might best be done by employing more than one full-time worker. In any event, the paucity of space in Special Collections might require suspending other work while this work goes on.

[4] For example, some have argued that information about the following items (by no means a complete list) needs to be included as header information with the digital surrogate: 1) the state of the original (treated? discarded? microfilmed first?); 2) compression information; 3) scanning device and calibration information; 4) resolution; 5) density; 6) manipulation (e.g., were coffee stains removed, was the image enhanced in any way); 7) authentication for the "original" digital file, with derivative versions pointing back to the original; 8) test pattern for gray scale and color scale; 9) name of the service agency that captured the image; 10) life history of the image (date, update dates, record of refreshing).
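
One possible rendering of such a header follows; the field names and sample values are illustrative only, not a proposed standard:

  # Illustrative only: the ten items above rendered as a header record.
  # Field names and sample values are assumptions, not a proposed standard.
  image_header = {
      "original_state": "treated; retained",               # 1) state of original
      "compression": "none",                                # 2) compression information
      "scanner": "flatbed; calibration data attached",      # 3) device and calibration
      "resolution_dpi": 400,                                # 4) resolution (sample value)
      "density": "to be recorded at capture",               # 5) density
      "manipulation": "none; stains retained",              # 6) manipulation
      "authenticated_original": "master file id",           # 7) derivatives point here
      "test_pattern": "gray/color scales included",         # 8) test pattern
      "service_agency": "in house (ATS)",                   # 9) capture agency
      "history": "capture date; update and refresh dates",  # 10) life history of image
  }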

[5] Our working assumption has been a level of functionality currently delivered by standard Web clients, such as Mosaic or Netscape. We will need to explore other options, such as SGML viewers, for some or all of the desired functionality.

[6] The current elements are based on a cursory evaluation of the collection. The curators will develop the final version in the course of completing their review.