Current Research

Digital Text

Retrodigitization is the process of digitizing text that originated in print. The larger mathematical publishing houses have had retrodigitization programs under way for several years, and
many famous older journals will be available as electronic files in the near future. For smaller publishers and for scientific societies, the situation is much less clear.

With mathematical text, the retrodigitization process proceeds through several steps:

(1) scanning to obtain an image;

(2) compensating for faults in the original document or those introduced by the scanning process;

(3) using OCR software to recover text and mathematics;

(4) repackaging the results into logical units.

The aim of this project is to improve on steps (2) and (3). A suite of more than 50,000 scanned images, consisting of
all pages from the 1949-1996 issues of the Canadian Journal of Mathematics, has been made available by the Canadian
Mathematical Society for this purpose. In the short term the objective will be to improve the software available to enhance
images (taking into account the peculiarities of mathematical text). This includes deskewing, despeckling, and balancing the optical properties of bitonal images containing mathematics. Comparative runs on large samples are needed.
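
As an illustration of the kind of image enhancement described above, the following is a minimal sketch of despeckling a bitonal image by erasing small connected groups of black pixels. It is not the project's software. The function name `despeckle`, the list-of-rows image representation, and the `min_size` threshold are assumptions made for this example. It also hints at why mathematical text needs care: a naive size threshold may erase small but meaningful marks such as dots, primes, and decimal points.

```python
from collections import deque


def despeckle(image, min_size=3):
    """Return a copy of a bitonal image with small black components removed.

    image: list of rows, each a list of 0 (white) or 1 (black) pixels.
    min_size: hypothetical threshold; any 8-connected black component
        with fewer pixels than this is treated as a speck and erased.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]          # leave the input untouched
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one 8-connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase the component only if it is smaller than the threshold.
            if len(component) < min_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result


# A tiny page: one 2x2 blob (kept) and one isolated pixel (removed).
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
clean = despeckle(page, min_size=2)
```

In practice the threshold would presumably have to depend on scan resolution and on position relative to neighbouring glyphs, which is one reason comparative runs on large samples matter.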

This task is particularly well suited to an HPC environment such as the one made available by MRnet. In the long term, any software developed will be made publicly available. This will allow smaller societies and publishers to add their material to the growing body of retrodigitized material.

