Data Extraction - Envision Forms Processing
Background: Envision offers interactive, experiential and engaging programs to students of all grades up to the college level. Students mail in completed enrollment forms, and their progress is tracked.
Challenge: Envision LLC required digital mail services, student form processing and an electronic record management system for digitized data.
Solution: InfoVance collects mail from client locations and transports it to a secure location. There, the front and back of each envelope and its contents are scanned and digitized using document imaging workstations. The images are processed using our image processing software, converted into client-defined formats and categorized by student program. They are then sent to the forms processing module, which converts the scanned forms and envelopes into PDF files and the associated data into electronic records. Each program is captured according to pre-defined templates, with multiple levels of form categorization. The export routines take the data from the forms processing module and integrate it into the Envision system, the EDMS (for further review) and the workflow. We customized our product DIGIDocx™ into a secure archive for Envision, consisting of the EDMS and a workflow manager. The archive uses role-based authentication and permissions to authorize access to the site for data review (along with the related image), search, advanced search and reports. The library has a reporting system that tracks all records (Good, QC), generates error codes for errors in forms processing, maintains audit trails and tracks data delivery. The library also has a configuration system that allows the user to configure the EDMS for each client program so that it works with the relevant data.
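As an illustration of how template-based capture and the Good/QC tracking described above might fit together, here is a minimal Python sketch. It is not the DIGIDocx™ implementation: the program names, field names and error-code format are all hypothetical, chosen only to show the idea of checking a captured form against a per-program template.

```python
# Hypothetical templates: the fields each student program's form must contain.
TEMPLATES = {
    "leadership": ["student_name", "grade", "program_date"],
    "medicine": ["student_name", "grade", "school"],
}


def classify_record(program, fields):
    """Return ("Good", []) or ("QC", [error codes]) for one captured form."""
    if program not in TEMPLATES:
        return "QC", ["UNKNOWN_PROGRAM"]
    # One error code per required field that is missing or empty.
    errors = [
        f"MISSING_{name.upper()}"
        for name in TEMPLATES[program]
        if not fields.get(name)
    ]
    return ("QC", errors) if errors else ("Good", [])


status, codes = classify_record("medicine", {"student_name": "A. Student"})
print(status, codes)
```

A record that fails the check is routed to QC with its error codes rather than discarded, which is what lets a reporting layer count Good and QC records separately.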
Newspaper Archival - Amateur Athletic Foundation, Los Angeles (now LA/84), in association with the Society for American Baseball Research (SABR) and the Baseball Hall of Fame
Background: The Baseball Hall of Fame (BBHOF) held an entire collection of an old newspaper, “The Sporting Life,” published from 1883 through 1924. The originals were oversized, bound, extremely fragile and stored at the BBHOF in Cooperstown, NY. Because the documents were so fragile, the archivists judged that they could withstand only one more handling. LA/84 and SABR joined forces to persuade the BBHOF to release the documents for conversion so that LA/84 could make the digital content available to the public.
Challenge: The font sizes used in the 1800s and early 1900s and the yellowing of the paper required us to scan the oversized material in color at a higher resolution of 400 dpi to create an optimized image of the content. The resulting files, at 70 MB per image, were too large to manage. The client’s traditional conversion method was to convert the images to PDF by cleaning up the text. In character count, each page was equivalent to approximately 12 average typewritten pages on letter-size paper, which made that methodology cost prohibitive for this application. Typical file compression could not make the files small enough to stream to the public while remaining legible.
Solution: InfoVance scanned the fragile documents. Using the DIGIFlex™ software, we were able to compress the original 70 MB TIFF files to approximately 700 KB PDF/A files that were full-text searchable. The files were delivered in the ISO standard PDF/A format with impressive OCR accuracy. LA/84 has already made a significant portion of the collection available to the public through its web site. The conversion cost only a fraction of what the traditional method would have cost. The look and feel of the vintage content was preserved in a PDF/A with hidden text, so the public still has the sense of viewing vintage content.
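As a rough check on the figures quoted above, going from about 70 MB to about 700 KB is a reduction of roughly 100:1. The short Python sketch below does the arithmetic. It assumes decimal units (1 MB = 1,000 KB), and the function name is illustrative only.

```python
def compression_ratio(original_kb, compressed_kb):
    """Return how many times smaller the compressed file is."""
    if compressed_kb <= 0:
        raise ValueError("compressed size must be positive")
    return original_kb / compressed_kb


# Figures quoted above: about 70 MB per TIFF, about 700 KB per PDF/A.
ratio = compression_ratio(70 * 1000, 700)
print(f"about {ratio:.0f}:1")
```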
Photo Conversions - National Register of Historic Places (NRHP)
Background: NRHP has multiple file folders for each submittal. One folder holds the paper-based application. The other holds the photographs submitted with the application. The photographs are captured in grayscale at 300 dpi and the applications in bi-tonal at 400 dpi. It is mandated that the photographs be scanned on a flatbed scanner. NRHP is a recently awarded multi-year contract customer.
Challenge: NRHP required multiple deliverables of the files in varying formats, including TIFF, PDF and DjVu. The TIFF files are to be delivered in a file folder as individual TIFFs, and the other file formats are to be converted and saved as multi-page files with the folder title as the file name. The photographs often have indexing information written on the back. This information is usually written in pencil, which appears light on the grayscale image.
Solution: InfoVance scans the applications on a production scanner and scans the fronts and backs of the photographs on flatbed scanners, as required by the contract. To keep costs down, we capture the fronts and backs using the same scanner settings. Changing the scanner settings to optimize for reading the pencil would degrade the image of the photo. Instead, we run the fronts and the backs through separate processes that use different algorithms to optimize the results. The original scans are delivered in TIFF format for archive. DIGIFlex™ is used to convert the images to the respective derivative files, which are compressed and enhanced to increase contrast and legibility. This significantly improves the legibility of the pencil markings. Finally, using DIGIFlex™ once again, we combine all of the separately optimized images into one multi-page PDF/A file for delivery.
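To illustrate the idea of processing fronts and backs separately after a single scan, the Python sketch below applies a linear contrast stretch to the back of a photo only. This is a generic enhancement technique, not the DIGIFlex™ algorithm; the function names and the 150 to 250 gray range are assumptions chosen for the example.

```python
def stretch_contrast(pixels, low, high):
    """Linearly map gray values in [low, high] onto the full 0-255 range.

    Values below `low` become 0 (black) and values above `high` become 255.
    Integer arithmetic keeps the result exact.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return [
        min(255, max(0, (p - low) * 255 // (high - low)))
        for p in pixels
    ]


def process_photo(front, back):
    # Hypothetical routing: the front is left as scanned, while the back
    # gets a stretch aimed at faint pencil marks on light paper.
    return front, stretch_contrast(back, low=150, high=250)


# Faint pencil (gray 200) on light paper (gray 250).
front, back = process_photo([40, 120, 200], [250, 200, 250])
print(back)
```

After the stretch, the pencil stroke sits at mid-gray against a white background, so the gap between mark and paper grows from 50 gray levels to 128.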
Paper Digitization - United States Geological Survey (USGS)
Background: USGS has numerous volumes of publications that they wish to digitize.
Challenge: Many of the publications are old and consist of a small percentage of pages with maps or photographs. Some of the books have fold-out maps in envelopes attached to the inside of the back cover. USGS has contracted to have the books scanned with black and white pages delivered as bi-tonal images, pages with grayscale photos or drawings delivered as grayscale images, and pages with color information delivered as color images. USGS wants all of the images of a publication to be compressed and delivered in a multi-page PDF/A format.
Solution: InfoVance scans the entire document in production, capturing all pages in bi-tonal. A preparation operator then goes through the publication and flags the pages that need grayscale or color capture. The automated color capture feature of the color scanners does not work accurately enough: in automatic mode, many of the drawings are captured as bi-tonal, while USGS wants them delivered as grayscale. We use different algorithms to compress each file type most effectively and then combine all of the file types into one multi-page PDF/A file. The originally scanned bi-tonal images of the color and grayscale pages are used to determine where the replacement files belong within the document.
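The placement step can be pictured as a positional swap: the bi-tonal pass fixes the page order, and each flagged page is replaced in place by its grayscale or color rescan. The Python sketch below shows that logic with page labels standing in for image files. The function name and data layout are hypothetical, not part of the production workflow.

```python
def merge_pages(bitonal_pages, replacements):
    """Return the page sequence with flagged pages swapped in place.

    bitonal_pages: page labels in original scan order.
    replacements: maps a 0-based page index to its grayscale or color rescan.
    """
    for index in replacements:
        if not 0 <= index < len(bitonal_pages):
            raise IndexError(f"no page at index {index}")
    return [replacements.get(i, page) for i, page in enumerate(bitonal_pages)]


pages = ["p1-bitonal", "p2-bitonal", "p3-bitonal", "p4-bitonal"]
flagged = {1: "p2-grayscale", 3: "p4-color"}
print(merge_pages(pages, flagged))
```

Because the bi-tonal scan of every page is kept as a placeholder, the page count and order of the final file always match the original publication.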
Microfilm Digitization - United States Geological Survey (USGS) (Microfilm)
Background: USGS had numerous volumes of historic documents that existed only on older 16 mm microfilm, which they wished to digitize and upload to an old imaging system (approximately 400,000 images).
Challenge: The microfilm was old, and the original filming of the documents was somewhat light. When the film was scanned in bi-tonal mode, some of the lighter text was dropped. The old system used to host the images supported only the TIFF file format, and the grayscale images captured from the film were too large for it to handle.
Solution: InfoVance scanned the film in grayscale with only a nominal price increase above bi-tonal rates. The grayscale images were enhanced and compressed using DIGIFlex™ software. Although their current system would not support grayscale PDFs, an archive copy of the grayscale PDF files was delivered so that, if and when the system was upgraded, the superior imagery would be available. The grayscale files were converted to bi-tonal images after image enhancement. The resulting bi-tonal imagery was far superior to the original bi-tonal scans coming out of the film scanners.
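The reason a grayscale capture helps can be shown with a simple threshold. A scanner working in bi-tonal mode must decide black or white at capture time, whereas a grayscale image lets that decision be made afterwards with a better-chosen cutoff. The Python sketch below is a simplified illustration with made-up pixel values and thresholds; it is not the DIGIFlex™ enhancement.

```python
def to_bitonal(gray_pixels, threshold):
    """Return 0 (black) for pixels darker than the threshold, else 1 (white)."""
    return [0 if p < threshold else 1 for p in gray_pixels]


# Hypothetical scan line: faint text (gray 170) on a light background (gray 220).
line = [220, 170, 220, 170, 220]

# A mid-range cutoff treats the faint text as background, so it is dropped.
print(to_bitonal(line, 128))

# A cutoff chosen between text and background keeps the text as black pixels.
print(to_bitonal(line, 195))
```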
Image Digitization - Culver Pictures
Background: Culver had more than one million vintage photographs that needed to be digitized.
Challenge: Both the front and the back of each photo had to be scanned. Culver also sold usage rights to the photos and potentially required enlargements. The smaller the photo, the higher the scanning resolution required.
Solution: We scanned the large collection of vintage photographs at various resolutions based on the size of the original material. In addition, we indexed all information from the tabs of the folders in which the photos were provided and captured index information from the written details on the backs of the photos. We provided the customer with a database-driven InfoVance retrieval system. We delivered the original scanned TIFF files, a JPEG derivative for the customer to provide to their printer, and a thumbnail file and a small JPEG file for display on the internet.
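One way to see why smaller originals need higher resolution is to hold the pixel count of the finished scan constant. The Python sketch below picks a resolution so that the longest side of any photo yields the same number of pixels. The 3,000-pixel target and the 300 dpi floor are assumptions made for illustration, not the settings used on the Culver project.

```python
import math


def scan_dpi(longest_side_in, target_pixels=3000, min_dpi=300):
    """Return a scan resolution that gives at least `target_pixels` on the long side."""
    if longest_side_in <= 0:
        raise ValueError("photo size must be positive")
    return max(min_dpi, math.ceil(target_pixels / longest_side_in))


# Longest side in inches for three hypothetical photo sizes.
for size in (10, 5, 2.5):
    print(f"{size} in -> {scan_dpi(size)} dpi")
```

Halving the size of the original doubles the resolution needed, which keeps enlargements from small prints as detailed as those from large ones.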