OCR Error Correction for Unconstrained Vietnamese Handwritten Text

Nguyen Quoc Dung (Master's), Duc-Anh Le, Ivan Zelinka

Faculty of Engineering

Research output: Proceeding

Abstract

Post-processing is an essential step in detecting and correcting errors in OCR-generated text. In this paper, we present an automatic OCR post-processing model comprising error detection and error correction phases for OCR output of unconstrained Vietnamese handwriting. We propose a hybrid approach to generating and scoring correction candidates for both non-syllable and real-syllable errors, based on linguistic features as well as the error characteristics of OCR outputs. We evaluate the proposed model on a Vietnamese benchmark database at the line level. Experimental results show that our model achieves a character error rate (CER) of 4.17% and a word error rate (WER) of 9.82%, improving the CER and WER of an attention-based encoder-decoder approach by 0.5% and 3.5%, respectively, on the VNOnDB-Line dataset of the Vietnamese online handwritten text recognition competition (VOHTR2018). These results outperform those obtained by the various recognition systems in the VOHTR2018 competition.
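
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how CER and WER are commonly computed from the Levenshtein edit distance between a reference line and a recognised line. It assumes the standard definitions (edit distance divided by reference length) and NFC Unicode normalisation; the abstract does not state the exact evaluation procedure used in the paper, so the function names and the normalisation step here are illustrative assumptions only.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def _nfc(text):
    # Assumption: compare text in NFC form so that a Vietnamese letter with
    # diacritics counts as one character rather than base + combining marks.
    return unicodedata.normalize("NFC", text)


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = _nfc(reference).split()
    hyp_words = _nfc(hypothesis).split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical example line, not taken from VNOnDB-Line.
    reference = "xin chào việt nam"
    recognised = "xin chao việt nam"
    print(f"CER = {cer(reference, recognised):.4f}")
    print(f"WER = {wer(reference, recognised):.4f}")
```

In this hypothetical example a single missing diacritic changes one character out of seventeen but one word out of four, which illustrates why WER is typically much higher than CER on the same output.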

Overview
Type: Proceeding
Publication year: 04 Dec 2019
Original language: English
Published Journal: The Tenth International Symposium on Information and Communication Technology (SoICT 2019)
Classification: Scopus Indexed
ISBN Index: 978-1-4503-7245-9
Page: 132-138
Quartiles: N/A
