# tesseract

> OCR (Optical Character Recognition) engine.
> More information: <https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc>.

- Recognize text in an image and save it to the given path (a `.txt` extension is added automatically):

`tesseract {{path/to/image.png}} {{path/to/output_file}}`

- Specify a custom [l]anguage (default is English) with an ISO 639-2 code (e.g. `deu` for German):

`tesseract -l deu {{path/to/image.png}} {{path/to/output}}`

- List the ISO 639-2 codes of installed languages:

`tesseract --list-langs`

- Specify a custom [p]age [s]egmentation [m]ode (default is 3):

`tesseract --psm {{0..13}} {{path/to/image.png}} {{path/to/output}}`

- List page segmentation modes and their descriptions:

`tesseract --help-psm`
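The commands above can be combined into a small batch script. The following is a minimal sketch, not part of tesseract itself: the helper names `output_base` and `ocr_dir` and the `scans` directory are hypothetical, and it assumes that `-l` and `--psm` may be passed together before the image path, as in the single-option examples above.

```sh
#!/bin/sh
# Sketch: OCR every PNG in a directory (helper and directory names are hypothetical).

# tesseract appends ".txt" itself, so pass the image path without its extension.
output_base() {
    printf '%s\n' "${1%.png}"
}

ocr_dir() {
    dir=$1
    lang=${2:-eng}   # language code, as listed by `tesseract --list-langs`
    psm=${3:-3}      # page segmentation mode, see `tesseract --help-psm`
    for image in "$dir"/*.png; do
        [ -e "$image" ] || continue   # skip the unexpanded glob when no PNGs exist
        tesseract -l "$lang" --psm "$psm" "$image" "$(output_base "$image")"
    done
}

# Example call (needs tesseract installed and a `scans` directory with PNGs):
# ocr_dir scans deu 6
```

With these assumptions, `ocr_dir scans deu 6` would write `scans/page1.txt` next to `scans/page1.png`, and so on for each image. A directory without PNG files is skipped silently.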