Subtitle Edit

the subtitle editor :)


OCR (Optical Character Recognition)

Subtitle Edit can convert image-based subtitle formats to text using OCR.

OCR Window

Supported Image Formats

OCR Engines

Tesseract

Open-source OCR engine with language packs.

nOCR (Nikse OCR)

Built-in trainable OCR engine.

Binary OCR

Binary image comparison engine.

Google Lens OCR

Cloud-based OCR using Google Lens.

Google Vision OCR

Cloud-based OCR using Google Cloud Vision API.

Ollama OCR

Local LLM-based OCR using Ollama.

Mistral OCR

Cloud-based OCR using Mistral API.

PaddleOCR

Open-source OCR engine that runs locally.

How to Use

  1. Open an image-based subtitle file
  2. The OCR window opens automatically
  3. Select an OCR engine
  4. Configure engine-specific settings
  5. Click Start OCR
  6. Review and correct any errors
  7. Click OK to import the text subtitles

Keyboard Shortcuts

General

| Shortcut | Action |
|----------|--------|
| Escape | Cancel OCR / Close window |
| Ctrl+G | Go to line number |
| F1 | Show help |

Subtitle Grid

| Shortcut | Action |
|----------|--------|
| Ctrl+I | Toggle italic formatting |
| Ctrl+P | View selected image (use arrow keys to navigate) |
| Delete | Delete selected line(s) |
| Home | Jump to first line |
| End | Jump to last line |
| Double-click | Inspect line (nOCR/Binary OCR only) |

Unknown Words List

| Shortcut | Action |
|----------|--------|
| Enter | Jump to subtitle line containing the selected unknown word |

Note: All shortcuts can be customized. Go to Options → Shortcuts to view and change key bindings.

Pre-processing

Before OCR, you can apply image pre-processing:

OCR Pre-processing

Unknown Words

When the OCR engine encounters uncertain characters, you can:

OCR Fix Replacement Lists

Subtitle Edit uses language-specific XML files to automatically correct common OCR errors. These files are named {language}_OCRFixReplaceList.xml (e.g., eng_OCRFixReplaceList.xml for English) and are located in the Dictionaries folder.

See the example file: Dictionaries/eng_OCRFixReplaceList.xml

XML Structure

The OCR fix replacement list contains several sections that handle different types of corrections:

1. WholeWords

Replaces entire words that match exactly. This is the most common section for fixing OCR mistakes.

<WholeWords>
    <Word from="tñere" to="there" />
    <Word from="ri9ht" to="right" />
    <Word from="0f" to="of" />
    <Word from="alot" to="a lot" />
    <Word from="becuase" to="because" />
</WholeWords>

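Use cases: characters the engine misreads inside a word (ñ for h, 9 for g, 0 for o) and common misspellings such as alot and becuase.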
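The whole-word behaviour can be modelled in a few lines of Python. This is a simplified sketch, not Subtitle Edit's own code: the word-splitting pattern WORD_RE and the function fix_whole_words are hypothetical, and the real tokenizer may treat punctuation differently. It shows that a rule fires only when an entire word equals the from value.

```python
import re

# WholeWords rules from the example above (from -> to).
WHOLE_WORDS = {
    "tñere": "there",
    "ri9ht": "right",
    "0f": "of",
    "alot": "a lot",
    "becuase": "because",
}

# Hypothetical tokenizer: a "word" is a run of characters that are not
# whitespace or common punctuation. Subtitle Edit's real splitting may differ.
WORD_RE = re.compile(r'[^\s.,!?"]+')


def fix_whole_words(line, rules):
    """Replace a word only when the entire word equals a rule's 'from' value."""
    return WORD_RE.sub(lambda m: rules.get(m.group(0), m.group(0)), line)


print(fix_whole_words("Is tñere alot 0f it?", WHOLE_WORDS))  # Is there a lot of it?
print(fix_whole_words("0ften", WHOLE_WORDS))                  # 0ften (no exact match)
```

Note how 0ften is left untouched: 0f appears inside it, but the whole word is not an entry in the list.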

2. PartialWordsAlways

Replaces character sequences within words, always applied without spell checking.

<PartialWordsAlways>
    <WordPart from="¤" to="o" />
    <WordPart from="lVI" to="M" />
    <WordPart from="IVl" to="M" />
</PartialWordsAlways>

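Use cases: character sequences that are wrong wherever they appear, such as the ¤ symbol read in place of o, or lVI and IVl read in place of M.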
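The fragment rules can be sketched the same way. This is a simplified model rather than Subtitle Edit's implementation: fix_partial_words is a hypothetical name, and splitting the line on spaces is an assumption. Each from value is replaced wherever it occurs inside a word, with no dictionary lookup.

```python
# PartialWordsAlways rules from the example above (from -> to), applied in order.
PARTIAL_WORDS_ALWAYS = [("¤", "o"), ("lVI", "M"), ("IVl", "M")]


def fix_partial_words(line, rules):
    """Replace each fragment wherever it occurs inside a word; no spell check."""
    fixed_words = []
    for word in line.split(" "):
        for wrong, right in rules:
            word = word.replace(wrong, right)
        fixed_words.append(word)
    return " ".join(fixed_words)


print(fix_partial_words("lVIy c¤mputer", PARTIAL_WORDS_ALWAYS))  # My computer
```

Because no spell check guards these rules, keep them to sequences that are never valid text in the language.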

3. WholeLines

Replaces entire lines that match exactly (including formatting tags).

<WholeLines>
    <Line from="[chitte rs]" to="[chitters]" />
    <Line from="Hil' it!" to="Hit it!" />
    <Line from="&lt;i&gt;Hil' it!&lt;/i&gt;" to="&lt;i&gt;Hit it!&lt;/i&gt;" />
    <Line from="ISIGHS]" to="[SIGHS]" />
</WholeLines>

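Use cases: known full-line errors such as a split word in a sound description ([chitte rs]), misread letters (Hil' it!), or a misread opening bracket (ISIGHS]). Because the match is exact, a line wrapped in italic tags needs its own entry, with the tags escaped as &lt;i&gt; and &lt;/i&gt; inside the XML attribute.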
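Since the rules are stored as XML, the tags in a WholeLines entry are written escaped (&lt;i&gt;) and only become <i> once the file is parsed. The Python sketch below uses the standard xml.etree.ElementTree module to load the example section and model exact-line matching; load_whole_lines and fix_whole_line are hypothetical helpers, not part of Subtitle Edit.

```python
import xml.etree.ElementTree as ET

# The WholeLines example above, exactly as written in the XML file.
SNIPPET = """<WholeLines>
    <Line from="[chitte rs]" to="[chitters]" />
    <Line from="Hil' it!" to="Hit it!" />
    <Line from="&lt;i&gt;Hil' it!&lt;/i&gt;" to="&lt;i&gt;Hit it!&lt;/i&gt;" />
    <Line from="ISIGHS]" to="[SIGHS]" />
</WholeLines>"""


def load_whole_lines(xml_text):
    """Map each 'from' line to its 'to' line; the parser unescapes &lt; and &gt;."""
    root = ET.fromstring(xml_text)
    return {line.get("from"): line.get("to") for line in root.iter("Line")}


def fix_whole_line(line, rules):
    """Replace the line only if it matches a rule exactly, tags included."""
    return rules.get(line, line)


rules = load_whole_lines(SNIPPET)
print(fix_whole_line("<i>Hil' it!</i>", rules))  # <i>Hit it!</i>
print(fix_whole_line("Hil' it! Go!", rules))     # Hil' it! Go! (not an exact match)
```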

4. PartialLinesAlways

Replaces text fragments within lines, always applied without spell checking.

<PartialLinesAlways>
    <LinePart from="Apollo 1 3" to="Apollo 13" />
    <LinePart from=",.," to="..." />
    <LinePart from=" lt " to=" it " />
    <LinePart from=" lf " to=" if " />
</PartialLinesAlways>

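Use cases: fragments that span several words or include punctuation, such as the split number in Apollo 1 3 or the misread ellipsis ,.,. The leading and trailing spaces in " lt " and " lf " make those rules match only a standalone word, not letters inside a longer word.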
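A minimal Python model of these line-level fragment rules, assuming a plain substring replacement applied rule by rule (fix_partial_lines is a hypothetical name). It also shows why the surrounding spaces matter: in this model, a fragment at the very start of a line has no leading space and is not matched.

```python
# PartialLinesAlways rules from the example above, applied in order.
PARTIAL_LINES_ALWAYS = [
    ("Apollo 1 3", "Apollo 13"),
    (",.,", "..."),
    (" lt ", " it "),
    (" lf ", " if "),
]


def fix_partial_lines(line, rules):
    """Replace every occurrence of each fragment anywhere in the line."""
    for wrong, right in rules:
        line = line.replace(wrong, right)
    return line


print(fix_partial_lines("So lt was Apollo 1 3,.,", PARTIAL_LINES_ALWAYS))  # So it was Apollo 13...
print(fix_partial_lines("lt was", PARTIAL_LINES_ALWAYS))                   # lt was (no leading space)
```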

5. PartialLines

Replaces text fragments within lines (may be spell-checked).

<PartialLines>
    <LinePart from=" /be " to=" I be " />
    <LinePart from=" aren '1'" to=" aren't" />
    <LinePart from=" aren'tyou" to=" aren't you" />
</PartialLines>

6. RegularExpressionsIfSpelledCorrectly

Uses regex patterns to fix errors, but only applies the replacement if the corrected word is in the dictionary.

<RegularExpressionsIfSpelledCorrectly>
    <!-- Fix lowercase 'l' to uppercase 'I' if result is a valid word -->
    <RegEx find="\bl([A-Z]+)\b" spellCheck="I$1" replaceWith="I$1" />
    <RegEx find="\b([A-Z]+)l\b" spellCheck="$1I" replaceWith="$1I" />

    <!-- Fix possessive forms: David's, there's -->
    <RegEx find="\b([A-Z][a-z]+)['']s\b" spellCheck="$1" replaceWith="$1's" />
    <RegEx find="\b([a-z]+)['']s\b" spellCheck="$1" replaceWith="$1's" />

    <!-- Fix missing spaces: ofDavid → of David -->
    <RegEx find="\bof([A-Z][a-z]+)\b" spellCheck="$1" replaceWith="of $1" />
    <RegEx find="\bin([A-Z][a-z]+)\b" spellCheck="$1" replaceWith="in $1" />

    <!-- Fix a single 'l' in brackets: [GRlNDING] → [GRINDING] -->
    <RegEx find="\[([A-Z ]*)l([A-Z ]*)\]" spellCheck="[$1I$2]" replaceWith="[$1I$2]" />
</RegularExpressionsIfSpelledCorrectly>

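Attributes: find is the regular expression to search for; spellCheck is the text, built from the captured groups ($1, $2 and so on), that is looked up in the dictionary; replaceWith is the replacement used only when that lookup succeeds.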

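Use cases: an l read in place of I in uppercase words, possessive forms, missing spaces after short words such as of and in, and an l read in place of I inside bracketed sound descriptions.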
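The check-then-replace logic can be sketched in Python under a few assumptions: the dictionary is a hypothetical in-memory set of lowercase words, the lookup is case-insensitive, and the .NET-style $1 group references used in the file are translated to Python's \g<1> syntax. Subtitle Edit's real spell checker and .NET regex engine may behave differently in edge cases.

```python
import re

# (find, spellCheck, replaceWith) triples taken from the example above.
RULES = [
    (r"\bl([A-Z]+)\b", "I$1", "I$1"),
    (r"\bof([A-Z][a-z]+)\b", "$1", "of $1"),
]

# Hypothetical word list standing in for the real spell-check dictionary.
DICTIONARY = {"inside", "david"}


def dotnet_to_python(template):
    """Turn .NET-style $1 group references into Python's \\g<1> form."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", template)


def apply_if_spelled_correctly(line, rules, dictionary):
    for find, spell_check, replace_with in rules:
        check_tpl = dotnet_to_python(spell_check)
        repl_tpl = dotnet_to_python(replace_with)

        def replace(match):
            # Build the text to look up; replace only if the dictionary knows it.
            candidate = match.expand(check_tpl)
            if candidate.lower() in dictionary:
                return match.expand(repl_tpl)
            return match.group(0)

        line = re.sub(find, replace, line)
    return line


print(apply_if_spelled_correctly("lNSIDE, ofDavid!", RULES, DICTIONARY))  # INSIDE, of David!
print(apply_if_spelled_correctly("lXYZ", RULES, DICTIONARY))              # lXYZ (IXYZ is not a word)
```

The second call shows the safety net: the pattern matches, but IXYZ is not in the dictionary, so the original text is kept.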

Creating Custom Rules

To add your own OCR fix rules:

  1. Open the appropriate language file: Dictionaries/{language}_OCRFixReplaceList.xml
  2. Add entries to the relevant section based on the type of error
  3. Save the file
  4. Restart Subtitle Edit for changes to take effect
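A malformed XML file cannot be parsed at all, so it can help to check an edited file before restarting. The Python sketch below is an assumption-based helper, not a Subtitle Edit feature: find_problems and check_replace_list are hypothetical names, the check relies only on the section, element and attribute names shown in the examples above, and it ignores the name of the root element.

```python
import xml.etree.ElementTree as ET

# Section name -> (entry element, required attributes), as documented above.
SECTIONS = {
    "WholeWords": ("Word", ("from", "to")),
    "PartialWordsAlways": ("WordPart", ("from", "to")),
    "WholeLines": ("Line", ("from", "to")),
    "PartialLinesAlways": ("LinePart", ("from", "to")),
    "PartialLines": ("LinePart", ("from", "to")),
    "RegularExpressionsIfSpelledCorrectly": ("RegEx", ("find", "spellCheck", "replaceWith")),
}


def find_problems(root):
    """List entries that are missing a required attribute."""
    problems = []
    for section, (entry_tag, required) in SECTIONS.items():
        for node in root.iter(section):
            for entry in node.findall(entry_tag):
                missing = [name for name in required if entry.get(name) is None]
                if missing:
                    problems.append(f"{section}/{entry_tag}: missing {', '.join(missing)}")
    return problems


def check_replace_list(path):
    """Parse the file (raises ET.ParseError if it is malformed) and check its entries."""
    return find_problems(ET.parse(path).getroot())
```

Called as check_replace_list("Dictionaries/eng_OCRFixReplaceList.xml"), it returns an empty list for a clean file; an XML syntax error, such as an unescaped < inside an attribute value, is reported as an ET.ParseError instead.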

Tips: