Pytesseract: OCR with Tesseract in Python

Libraries
OCR
NLP
Computer vision
Text extraction
Multilingual
Deep learning
LSTM
Author

Chris Endemann

Published

April 5, 2025

About this resource

Pytesseract is a Python wrapper for Google’s Tesseract Optical Character Recognition (OCR) engine, used for recognizing and extracting text from images. It works on a wide range of image types (e.g., JPEG, PNG, TIFF) and supports over 100 languages, including Chinese, Arabic, and Devanagari.

Tesseract uses a character-level LSTM model and runs entirely on CPU, making it easy to deploy in low-resource environments. While it’s not state-of-the-art for complex layout or scene text, it’s fast, scriptable, and widely supported — ideal for lightweight OCR use cases.

Key features

  • Reads printed text from standard image formats
  • Works with file paths, Pillow/PIL (Python Imaging Library), or OpenCV arrays
  • Supports multilingual text recognition
  • Outputs plain text, bounding boxes, PDFs, TSV, and XML formats
  • Fast CPU-based inference with no GPU dependencies

When to use

  • You need fast OCR on clean documents or small image batches
  • You want to automate extraction from scanned forms, labels, or tables
  • You’re working in a CPU-only or resource-constrained environment
  • You want a scriptable fallback tool before reaching for ViT-based OCR

Pros and limitations

Pros Limitations
Easy to install and use on most systems No GPU acceleration — slower on large datasets
Multilingual out of the box Cannot be fine-tuned or retrained
Good for simple forms and documents Struggles with complex layouts or visual context
CPU-only — works in low-resource environments Lower accuracy than transformer-based models on cluttered or noisy inputs

Tesseract’s fast CPU performance and no-frills setup make it great for small-scale OCR, but it’s not optimized for high-volume pipelines or scene text recognition.

Model architecture

Tesseract relies on an LSTM pipeline trained on character-level text. It performs well when the input is clean and straightforward — such as scanned documents or forms — but struggles with visual ambiguity, clutter, or layout-sensitive content.

For more robust use cases, newer models like TrOCR, Donut, and PaddleOCR use Vision Transformers (ViTs). PaddleOCR in particular includes both CNN- and transformer-based backends. These models are better suited for tasks where text is visually entangled with surrounding context — like reading overlaid labels on maps or structured forms.

Installation and usage

To use pytesseract, you need to install both the Tesseract OCR engine and the Python wrapper.

Ubuntu / Debian

sudo apt update
sudo apt install tesseract-ocr
pip install pytesseract

macOS

brew install tesseract
pip install pytesseract

Windows

  1. Download and install the Tesseract binary from the UB Mannheim builds

  2. Note the install location, typically:

    C:\Program Files\Tesseract-OCR\tesseract.exe
  3. Either add this location to your system PATH, or set it manually in your script:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
  1. Install the Python wrapper:
pip install pytesseract

Basic usage

from PIL import Image  # Pillow is the Python Imaging Library
import pytesseract

# Extract plain text
text = pytesseract.image_to_string(Image.open("example.png"))

# Structured output with positions and confidences
df = pytesseract.image_to_data(Image.open("example.png"), output_type=pytesseract.Output.DATAFRAME)

# Character-level bounding boxes
boxes = pytesseract.image_to_boxes(Image.open("example.png"))

Replace "example.png" with your own image file containing text. Pytesseract supports both in-memory images and file paths.

Questions?

Working on OCR for maps, handwritten notes, or multilingual scans? Curious whether Tesseract is the right fit for your pipeline? Post in the Nexus Q&A to share examples or get advice.

See also

  • GitHub repo: madmaze/pytesseract – Source code and examples
  • PaddleOCR – End-to-end OCR with detection, recognition, and layout modeling (CNN and ViT backends)
  • TrOCR – Transformer-based OCR with multilingual support
  • Donut – OCR + document understanding via vision-language modeling
  • EasyOCR – Lightweight OCR tool with CNN + LSTM backends