Tesseract OCR (optical character recognition) on the Raspberry Pi


Install tesseract and the Python bindings

sudo apt-get install -y libleptonica-dev
sudo apt-get install -y tesseract-ocr
sudo apt-get install -y tesseract-ocr-dev
sudo pip install pytesseract
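
Before moving on, it can help to confirm that the tesseract binary is actually reachable. The following is a minimal sketch using only the Python standard library; the helper name tesseract_version is made up for illustration, and it assumes the binary is installed under the name tesseract and accepts the --version flag.

```python
import shutil
import subprocess


def tesseract_version(binary="tesseract"):
    """Return the first line printed by `<binary> --version`,
    or None if the binary is not found on PATH.

    `tesseract_version` is a hypothetical helper, not part of pytesseract.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    # Some tesseract releases print the version to stderr, so merge both streams.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version is not None else "tesseract not found on PATH")
```

If this reports that tesseract was not found, the apt-get steps above probably did not complete.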

Call tesseract through the Python interface

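# tesseract_demo.py
from PIL import Image
from pytesseract import image_to_string
import cv2
import numpy as np

IMAGE_FILE = "num2.jpg"

# OpenCV loads images in BGR order; convert to RGB before handing the array to PIL
cv2_im = cv2.imread(IMAGE_FILE)
pil_im = Image.fromarray(cv2.cvtColor(cv2_im, cv2.COLOR_BGR2RGB))

# Load the same file with PIL and convert it to a BGR array so OpenCV can display it
img = Image.open(IMAGE_FILE)
cv_img = cv2.cvtColor(np.array(img.convert('RGB')), cv2.COLOR_RGB2BGR)

# Run OCR with the English language data
words = image_to_string(pil_im,"eng").strip()

# print() with a single argument works on both Python 2 and Python 3
print(words)
cv2.imshow("cv_img", cv_img)
cv2.waitKey(0)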

Test the program:

python tesseract_demo.py 

Then check the recognition result.

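Chinese recognition

To recognize Chinese, first download the Simplified Chinese language pack chi_sim.traineddata and put it in the /usr/share/tesseract-ocr/tessdata folder on the Raspberry Pi. After that, change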

words = image_to_string(pil_im,"eng").strip()

to:

words = image_to_string(pil_im,"chi_sim").strip()

and that is all.
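
If the language pack is missing, tesseract will fail when asked for chi_sim, so it may be worth checking for the file first. Below is a minimal sketch of that check. The names pick_language and TESSDATA_DIR are hypothetical, and the directory is the one mentioned above; it may differ depending on the installed tesseract version.

```python
import os

# Directory taken from the text above; an assumption that may not hold on every system.
TESSDATA_DIR = "/usr/share/tesseract-ocr/tessdata"


def pick_language(preferred, fallback="eng", tessdata_dir=TESSDATA_DIR):
    """Return `preferred` if <tessdata_dir>/<preferred>.traineddata exists,
    otherwise return `fallback`.

    `pick_language` is a hypothetical helper, not part of pytesseract.
    """
    data_file = os.path.join(tessdata_dir, preferred + ".traineddata")
    return preferred if os.path.isfile(data_file) else fallback


if __name__ == "__main__":
    # Prints "chi_sim" when the language pack is in place, "eng" otherwise.
    print(pick_language("chi_sim"))
```

In the demo script, the result could then be passed as the language argument, for example image_to_string(pil_im, pick_language("chi_sim")).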

