A character-recognition system for Hangeul

Sageryd, Johan

Sageryd, Johan (LU) (2009). SPRM01 20081
Language Technology Program

Abstract: This work presents a rule-based character-recognition system for the Korean script, Hangeul. An input raster image representing one Korean character (a Hangeul syllable) is thinned down to a skeleton, and the individual lines are extracted. The lines, along with information on how they are interconnected, are translated into a set of hierarchical graphs, which can be easily traversed and compared with a set of reference structures represented in the same way. Hangeul consists of consonant and vowel graphemes, which are combined into blocks representing syllables. Each reference structure describes one possible variant of such a grapheme. The reference structures that best match the structures found in the input are combined to form a full Hangeul syllable. Tested on all 11 172 possible characters, each rendered as a 200 × 200-pixel raster image in the gothic font AppleGothic Regular, the system achieved a recognition accuracy of 80.6 percent. The system has no logic for separating graphemes that overlap or are conjoined; with such characters removed from the test set, reducing it to 9 352 characters, the accuracy reached 96.3 percent.
Handwritten characters were also recognised to some degree. The work shows that it is possible to create a workable character-recognition system with reasonably simple means.
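The count of 11 172 possible characters follows from the structure of Hangeul syllable blocks: 19 initial consonants × 21 vowels × 28 final options (27 final consonants or no final at all). As a minimal sketch, assuming the recogniser's combined graphemes are to be mapped to Unicode, the Python below uses the Unicode Standard's arithmetic for precomposed Hangul syllables (U+AC00 to U+D7A3) to build a syllable from grapheme indices and to split it back. The thesis does not state how it encodes its output; the functions `compose` and `decompose` and their index-based interface are hypothetical illustrations, not the author's code.

```python
# Sketch: Hangul syllable composition using the Unicode Standard's
# conjoining-jamo arithmetic. Not taken from the thesis; names are hypothetical.
S_BASE = 0xAC00               # first precomposed syllable, 가
L_COUNT = 19                  # initial consonants
V_COUNT = 21                  # vowels
T_COUNT = 28                  # 27 final consonants + "no final"
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 syllables in total


def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Combine grapheme indices into one precomposed syllable.

    Indices follow Unicode's jamo order, e.g. initial ㅎ = 18,
    vowel ㅏ = 0, final ㄴ = 4; t_index 0 means no final consonant.
    """
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Inverse of compose: split a syllable into (initial, vowel, final) indices."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    return (s_index // N_COUNT,
            (s_index % N_COUNT) // T_COUNT,
            s_index % T_COUNT)
```

For example, `compose(18, 0, 4)` combines ㅎ, ㅏ and final ㄴ into 한, and `decompose("글")` returns `(0, 18, 8)` for ㄱ, ㅡ and final ㄹ. Working with indices keeps the mapping a single arithmetic expression, so a recogniser only has to identify which variant of each grapheme slot it has matched.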

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/1492719

author

Sageryd, Johan (LU)

supervisor

Johan Frid (LU)

organization

Language Technology Program

course

SPRM01 20081

year

2009

type

H1 - Master's Degree (One Year)

keywords

graph matching, graph thinning, Hangeul, Korean, character recognition

language

English

id

1492719

date added to LUP

2009-10-20 10:28:29

date last changed

2009-10-20 10:28:29

@misc{1492719,
  abstract     = {{This work presents a rule-based character-recognition system for the Korean script, Hangeul. An input raster image representing one Korean character (a Hangeul syllable) is thinned down to a skeleton, and the individual lines are extracted. The lines, along with information on how they are interconnected, are translated into a set of hierarchical graphs, which can be easily traversed and compared with a set of reference structures represented in the same way. Hangeul consists of consonant and vowel graphemes, which are combined into blocks representing syllables. Each reference structure describes one possible variant of such a grapheme. The reference structures that best match the structures found in the input are combined to form a full Hangeul syllable. Tested on all 11 172 possible characters, each rendered as a 200 × 200-pixel raster image in the gothic font AppleGothic Regular, the system achieved a recognition accuracy of 80.6 percent. The system has no logic for separating graphemes that overlap or are conjoined; with such characters removed from the test set, reducing it to 9 352 characters, the accuracy reached 96.3 percent. Handwritten characters were also recognised to some degree. The work shows that it is possible to create a workable character-recognition system with reasonably simple means.}},
  author       = {{Sageryd, Johan}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{A character-recognition system for Hangeul}},
  year         = {{2009}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
