Document & Layout

Large-scale digitization projects aim to preserve the cultural heritage held in printed documents for the digital age. Scanning the documents and storing them as images is not sufficient, however: their content must also be opened up for search, analysis, and re-publishing.

We research methods that automatically analyze the content and layout of printed documents and that robustly handle large document collections. We continuously refine individual processing steps such as binarization, page extraction, and deskewing, as well as our award-winning page segmentation method and our article segmentation method for complex newspaper layouts.

Our current research concentrates on analysis techniques for large-scale book digitization.