ocr - Best method to train Tesseract 3.02
I'm wondering what the best method is to train Tesseract (which kind of text/TIFF to use, and so on) for a particular kind of document, with these particularities:
- The structure and the main text of the documents are always the same.
- The only things that change are 5 alphanumeric codes (this is the really important thing to detect!).
- Some of these codes are in bold.
At the moment I use the standard trained data, detect the entire text, and extract the codes with regular expressions. It works okay, but I sometimes get errors, for example:
0 / O
l / I / 1
Do you know of any "tricks" to improve the precision?
Thanks!
During the training of Tesseract, you have to create a file manually and give it to the engine in order to specify the ambiguous characters.
For more information, look at the "unicharambigs" part of the Tesseract documentation.
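To illustrate the file format, here is a minimal Python sketch that generates entries for the confusions mentioned in the question. It assumes the `v1` tab-separated layout described in the training documentation: source character count, source characters, target character count, target characters, and a type flag where 0 marks an optional substitution and 1 a mandatory one. The helper names and the list of pairs are only an illustration, not part of Tesseract, and every character used must also be present in your unicharset.

```python
def ambig_line(source, target, mandatory=False):
    """Format one unicharambigs entry as tab-separated fields (assumed v1 layout)."""
    src = list(source)
    tgt = list(target)
    return "\t".join([
        str(len(src)), " ".join(src),   # source count, space-separated source chars
        str(len(tgt)), " ".join(tgt),   # target count, space-separated target chars
        "1" if mandatory else "0",      # type flag: 0 = optional, 1 = mandatory
    ])


def build_unicharambigs(pairs):
    """Build the file content, listing each confusion in both directions."""
    lines = ["v1"]  # version identifier on the first line
    for a, b in pairs:
        lines.append(ambig_line(a, b))
        lines.append(ambig_line(b, a))
    return "\n".join(lines) + "\n"


# Illustrative pairs taken from the errors in the question: 0/O and l/I/1.
CONFUSIONS = [("0", "O"), ("l", "1"), ("l", "I"), ("I", "1")]

content = build_unicharambigs(CONFUSIONS)
print(content, end="")
```

The generated text would then be saved as the unicharambigs file for your language, next to the other training files, before they are combined. Check the documentation for your Tesseract version for the exact file name it expects.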
Best regards.