OCR - Best method to train Tesseract 3.02


I'm wondering what the best method is to train Tesseract (with text/TIFF files and so on) for a particular kind of document with these particularities:

  • The structure and main text of the documents are always the same
  • The only things that change are 5 alphanumeric codes (this is the really important thing to detect!)
  • Some of these codes are in bold

At the moment I use the standard trained data: I detect the entire text and then extract the codes with regular expressions. It works, but I still get errors sometimes, for example:

0 / O

l / I / 1
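The regex extraction described above can be made more robust with a position-aware cleanup step. The following is a minimal sketch, not a Tesseract feature: it assumes a hypothetical code layout of two uppercase letters followed by three digits (the question only says the codes are 5 alphanumeric characters), so `LETTER_POSITIONS`, `DIGIT_POSITIONS`, `CODE` and the substitution tables are placeholders to adapt to the real format.

```python
import re
from typing import List, Optional

# Hypothetical code layout (assumption): two uppercase letters, then three digits, e.g. "AB123".
LETTER_POSITIONS = {0, 1}
DIGIT_POSITIONS = {2, 3, 4}

# Common OCR confusions, mapped to the character class expected at that position.
TO_DIGIT = {"O": "0", "o": "0", "l": "1", "I": "1", "i": "1"}
TO_LETTER = {"0": "O", "1": "I"}

# Loose pattern for any 5-character alphanumeric token that might be a code.
CANDIDATE = re.compile(r"\b[0-9A-Za-z]{5}\b")
# Strict pattern a repaired token must satisfy (hypothetical format).
CODE = re.compile(r"^[A-Z]{2}[0-9]{3}$")


def fix_code(token: str) -> Optional[str]:
    """Repair O/0 and l/I/1 confusions by position; return None if it is still not a code."""
    chars = []
    for pos, ch in enumerate(token):
        if pos in DIGIT_POSITIONS:
            ch = TO_DIGIT.get(ch, ch)
        elif pos in LETTER_POSITIONS:
            ch = TO_LETTER.get(ch, ch)
        chars.append(ch)
    fixed = "".join(chars)
    return fixed if CODE.match(fixed) else None


def extract_codes(text: str) -> List[str]:
    """Return every repaired code found in the OCR output, in reading order."""
    codes = []
    for match in CANDIDATE.finditer(text):
        fixed = fix_code(match.group())
        if fixed is not None:
            codes.append(fixed)
    return codes
```

For example, `extract_codes("Ref: AB12O and XYl23, not Hello")` would return `["AB120", "XY123"]`. This only helps when every position of a code has a fixed character class; for fully free-form alphanumeric codes, the ambiguity has to be resolved inside Tesseract itself.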

Does anyone know any "tricks" to improve precision?

Thanks!

During the training part of Tesseract, you have to create a file manually and give it to the engine in order to specify the ambiguous characters.

For more information, look at the "unicharambigs" part of the Tesseract documentation.
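As a rough illustration, the Python sketch below generates such a file. It assumes the tab-separated `v1` format from the Tesseract 3.x training documentation: number of characters, the characters themselves, number of replacement characters, the replacement characters, then `1` for a mandatory replacement or `0` for an optional ambiguity. The `AMBIGS` entries are hypothetical choices for the 0/O and l/I/1 confusions; they are marked optional because both characters can legitimately appear in the codes, whereas a mandatory entry would rewrite every occurrence.

```python
from pathlib import Path
from typing import Iterable, Tuple

# Hypothetical entries for this use case: (text Tesseract outputs, text it may really be, mandatory?)
AMBIGS = [
    ("O", "0", False),
    ("0", "O", False),
    ("l", "1", False),
    ("1", "l", False),
    ("I", "1", False),
]


def ambig_line(source: str, target: str, mandatory: bool) -> str:
    """Format one v1 unicharambigs entry; every field is separated by a tab."""
    fields = [str(len(source)), *source, str(len(target)), *target]
    fields.append("1" if mandatory else "0")
    return "\t".join(fields)


def write_unicharambigs(path: Path, ambigs: Iterable[Tuple[str, str, bool]]) -> None:
    """Write the version header followed by one line per ambiguity."""
    lines = ["v1"] + [ambig_line(src, dst, mandatory) for src, dst, mandatory in ambigs]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling `write_unicharambigs(Path("eng.unicharambigs"), AMBIGS)` would write the file; `eng` is a placeholder for whatever language prefix your training run uses, and the file is then packed into the traineddata with `combine_tessdata` together with the other training files.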

Best regards.

