Pytesseract: OCR with Tesseract (LSTM) in Python

Libraries

OCR

NLP

Computer vision

Text extraction

Multilingual

Deep learning

LSTM

Author

Chris Endemann

Published

April 5, 2025

About this resource

Pytesseract is a Python wrapper for Google’s Tesseract Optical Character Recognition (OCR) engine, used for recognizing and extracting text from images. It works on a wide range of image types (e.g., JPEG, PNG, TIFF) and supports over 100 languages, including Chinese, Arabic, and Devanagari.

Tesseract (version 4 and later) uses an LSTM-based engine that recognizes text one line at a time, and it runs entirely on CPU, making it easy to deploy in low-resource environments. While it’s not state-of-the-art for complex layouts or scene text, it’s fast, scriptable, and widely supported, which makes it a good fit for lightweight OCR use cases.

Key features

Reads printed text from standard image formats
Works with file paths, Pillow/PIL (Python Imaging Library) images, or NumPy arrays such as OpenCV images (convert BGR to RGB first)
Supports multilingual text recognition
Outputs plain text, bounding boxes, searchable PDF, TSV, hOCR, and ALTO XML
Fast CPU-based inference with no GPU dependencies
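As a rough illustration of the multilingual support listed above, the sketch below joins language codes into the `eng+deu` form that Tesseract expects and passes a page segmentation mode through `config`. The helper names `build_lang_arg` and `ocr_multilingual` are hypothetical, and the sketch assumes the language data for each requested code is already installed alongside Tesseract.

```python
def build_lang_arg(languages):
    """Join Tesseract language codes into the 'eng+deu' form used by the lang parameter."""
    codes = [code.strip() for code in languages if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_multilingual(image_path, languages=("eng",), psm=3):
    """Run OCR with one or more languages (hypothetical helper).

    Requires the Tesseract binary plus the traineddata file for each language.
    psm=3 is Tesseract's default fully automatic page segmentation mode.
    """
    # Imported lazily so build_lang_arg works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(
            img,
            lang=build_lang_arg(languages),
            config=f"--psm {psm}",
        )
```

For example, `ocr_multilingual("scan.png", languages=("eng", "deu"))` would try both English and German on a hypothetical `scan.png`.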

When to use

You need fast OCR on clean documents or small image batches
You want to automate extraction from scanned forms, labels, or tables
You’re working in a CPU-only or resource-constrained environment
You want a scriptable fallback tool before reaching for ViT-based OCR
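One way to decide when Tesseract’s result is good enough, and when to hand the image to a heavier model, is to look at the per-word confidences from `image_to_data`. The sketch below is only an illustration: the function names and the threshold of 60 are assumptions, not recommendations from the Tesseract project, and a sensible cutoff depends on your data.

```python
def mean_word_confidence(data):
    """Average Tesseract confidence (0-100) over recognized words.

    `data` is the dict returned by
    pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT).
    Rows with conf == -1 describe layout elements (blocks, lines), not words,
    so they are skipped, as are rows whose text is empty or whitespace.
    """
    confidences = [
        float(conf)
        for conf, text in zip(data["conf"], data["text"])
        if float(conf) >= 0 and str(text).strip()
    ]
    return sum(confidences) / len(confidences) if confidences else 0.0


def needs_fallback(data, threshold=60.0):
    """Return True when the mean word confidence is below the (assumed) threshold."""
    return mean_word_confidence(data) < threshold
```

A pipeline could call `needs_fallback` on each page and route only the low-confidence pages to a slower model.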

Pros and limitations

Pros	Limitations
Easy to install and use on most systems	No GPU acceleration — slower on large datasets
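Multilingual, with trained data for 100+ languages (extra language packs may need to be installed separately)	Fine-tuning requires Tesseract’s separate training tools (tesstrain); pytesseract itself only runs inference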
Good for simple forms and documents	Struggles with complex layouts or visual context
CPU-only — works in low-resource environments	Lower accuracy than transformer-based models on cluttered or noisy inputs

Tesseract’s fast CPU performance and no-frills setup make it great for small-scale OCR, but it’s not optimized for high-volume pipelines or scene text recognition.

Model architecture

Tesseract’s recognizer is an LSTM network that reads text one line at a time. It performs well when the input is clean and straightforward, such as scanned documents or forms, but struggles with visual ambiguity, clutter, or layout-sensitive content.

For more robust use cases, newer models like TrOCR and Donut use transformer-based vision encoders (ViT-style in TrOCR, Swin Transformer in Donut), and PaddleOCR includes both CNN- and transformer-based backends. These models are better suited for tasks where text is visually entangled with surrounding context, like reading overlaid labels on maps or structured forms.

Installation and usage

To use pytesseract, you need to install both the Tesseract OCR engine and the Python wrapper.

Ubuntu / Debian

sudo apt update
sudo apt install tesseract-ocr
pip install pytesseract

macOS

brew install tesseract
pip install pytesseract

Windows

Download and install the Tesseract binary from the UB Mannheim builds

Note the install location, typically:

C:\Program Files\Tesseract-OCR\tesseract.exe

Either add this location to your system PATH, or set it manually in your script:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

Install the Python wrapper:

pip install pytesseract
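After installing on any platform, a quick check that Python can find the Tesseract binary can save some debugging. The sketch below is a minimal example with hypothetical helper names. It assumes a pytesseract version recent enough to provide `get_languages`.

```python
import shutil


def locate_tesseract(explicit_path=None):
    """Return the path to the Tesseract executable, or None if it cannot be found.

    With no argument, the system PATH is searched for "tesseract".
    """
    return shutil.which(explicit_path or "tesseract")


def check_installation(explicit_path=None):
    """Return (version string, installed language codes), or raise if Tesseract is missing."""
    # Imported lazily so locate_tesseract works even before pytesseract is installed.
    import pytesseract

    path = locate_tesseract(explicit_path)
    if path is None:
        raise RuntimeError("Tesseract executable not found; install it or pass its full path")
    pytesseract.pytesseract.tesseract_cmd = path
    version = str(pytesseract.get_tesseract_version())
    languages = pytesseract.get_languages(config="")
    return version, languages
```

On Windows, the install location noted above can be passed as `explicit_path` instead of editing the system PATH.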

Basic usage

from PIL import Image  # Pillow is the maintained fork of the Python Imaging Library (PIL)
import pytesseract

# Open the image once and reuse it for each call
image = Image.open("example.png")

# Extract plain text
text = pytesseract.image_to_string(image)

# Structured output with positions and confidences (DATAFRAME output requires pandas)
df = pytesseract.image_to_data(image, output_type=pytesseract.Output.DATAFRAME)

# Character-level bounding boxes, one "char left bottom right top page" line per character
boxes = pytesseract.image_to_boxes(image)

Replace "example.png" with your own image file containing text. Pytesseract supports both in-memory images and file paths.
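The string returned by `image_to_boxes` uses Tesseract’s box format: one character per line, followed by left, bottom, right, top, and page number, with the origin at the bottom-left corner of the image. The sketch below parses that string and converts a box to the top-left convention that Pillow and OpenCV use. `CharBox`, `parse_boxes`, and `to_top_left` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_boxes(box_text):
    """Parse the output of pytesseract.image_to_boxes into a list of CharBox tuples."""
    boxes = []
    for line in box_text.splitlines():
        # Split from the right so the character field is kept intact.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, *numbers = parts
        try:
            left, bottom, right, top, page = map(int, numbers)
        except ValueError:
            continue  # skip lines whose coordinates are not integers
        boxes.append(CharBox(char, left, bottom, right, top, page))
    return boxes


def to_top_left(box, image_height):
    """Convert a bottom-left-origin box to (x, y, width, height) with a top-left origin."""
    return (
        box.left,
        image_height - box.top,
        box.right - box.left,
        box.top - box.bottom,
    )
```

The image height needed by `to_top_left` is available from Pillow as `image.height`.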

Questions?

Working on OCR for maps, handwritten notes, or multilingual scans? Curious whether Tesseract is the right fit for your pipeline? Post in the Nexus Q&A to share examples or get advice.