CN214202843U - Reading device for visually impaired persons based on OCR and TTS

Info

Publication number: CN214202843U
Application number: CN202023117638.3U
Authority: CN
Inventors: 张德钱; 李宇航; 廖斌强; 丁凡; 杨森泉
Original assignee: Shaoguan University
Current assignee: Shaoguan University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-09-14
Anticipated expiration: 2030-12-22

Abstract

The utility model relates to a reading device for visually impaired persons based on OCR and TTS, comprising a microcomputer unit, a voice recognition unit, a camera and a voice player. The microcomputer unit is electrically connected to the voice player, the camera and the voice recognition unit, respectively. The voice recognition unit recognizes voice instructions, which include a reading instruction, and sends the reading instruction to the microcomputer unit. On receiving the reading instruction, the microcomputer unit drives the camera to capture the characters to be read and obtain a character image. The microcomputer unit then performs character recognition on the character image to obtain text data, obtains a voice stream from the text data, and sends the voice stream to the voice player for playback. The reading device is simple to operate and offers high reading efficiency.

Description

Reading device for visually impaired persons based on OCR and TTS

Technical Field

The utility model relates to the technical field of electronic reading equipment, and in particular to a reading device for visually impaired persons based on OCR and TTS.

Background

Visually impaired people currently read in one of two ways: traditionally, through braille, or electronically, through tools such as dedicated readers for the blind or screen-reading software. With the former, they can only read material that has been translated into braille, so sources are limited, prices are high and reading efficiency is low. With the latter, the readers for the blind on the market require an e-book to be imported first as the reading source, which introduces a time lag, while screen-reading software can only read text on a computer screen and cannot deal with the large volume of paper documents, so the experience for visually impaired people is poor.

Summary of the Utility Model

On this basis, the utility model aims to provide a reading device for visually impaired persons based on OCR and TTS. A visually impaired person can use voice control to trigger the device, which photographs the text to be read, performs character recognition on the photograph, and finally synthesizes and plays speech. The device is simple to operate and offers high reading efficiency.

An OCR and TTS based visually impaired reading device comprising:

a microcomputer unit, a voice recognition unit, a camera and a voice player;

the microcomputer unit is respectively electrically connected with the voice player, the camera and the voice recognition unit;

the voice recognition unit is used for recognizing a voice instruction, the voice instruction comprises a reading instruction, and the voice recognition unit is also used for sending the reading instruction to the microcomputer unit;

the microcomputer unit is used for driving the camera, according to the reading instruction, to capture the characters to be read and obtain a character image;

the microcomputer unit is also used for carrying out character recognition on the character image to obtain text data, obtaining a voice stream according to the text data and sending the voice stream to the voice player for playing; wherein, the voice stream records the content of the text to be read.

The reading device for visually impaired persons based on OCR and TTS is controlled by voice and finally reads out the text to be read as a voice broadcast, which makes it easy for visually impaired persons to operate and gives high reading efficiency.

Further, the device also comprises a server, and the microcomputer unit is in signal connection with the server;

the microcomputer unit is also used for sending the text data to the server;

the server is used for obtaining the voice stream according to the text data through a Baidu voice synthesis API and sending the voice stream to the microcomputer unit.

Further, the microcomputer unit comprises an OCR module, and the OCR module is used for performing character recognition on the character image through an open source OCR algorithm of Google.

Further, the voice recognition unit comprises an ASR management module and an ASR module;

the ASR management module is used for learning and recording the voiceprint information of the voice instruction;

the ASR module is used for identifying a voice instruction according to the recorded voiceprint information and sending an identification result to the ASR management module;

and the ASR management module is also used for sending the recognition result to the microcomputer unit.

Further, the device also comprises a gesture recognition module which is electrically connected with the microcomputer unit;

the gesture recognition module is used for recognizing preset gestures, generating a reading instruction and sending the reading instruction to the microcomputer unit.

Furthermore, the microcomputer unit is also used for carrying out edge detection algorithm processing or binarization algorithm processing on the character image, so that the imaging content of the character image is clearer.

Further, the voice recognition unit also comprises a microphone, the microphone is fitted with soundproof cotton, and the soundproof cotton leaves a sound-receiving hole in only one direction.

Further, the device also comprises an LED lamp, which is electrically connected to the voice recognition unit through a relay;

the voice recognition unit is also used for driving the LED lamp to be turned on or turned off according to the recognition result of the voice command.

Further, the device also comprises a case, wherein the main body of the case is made of aluminum profiles and the shell of the case is a lightweight PVC foam board;

the LED lamp is a light strip arranged around the inside of the case;

the case is provided with a sensing port, a reading area and a sealed area;

the sensing port is where the user performs gesture-sensing operations;

the sealed area is used for accommodating the microcomputer unit, the voice player and the voice recognition unit, so as to protect the circuitry and prevent misoperation by the user;

the reading area is used for placing the document to be read.

Further, the microcomputer unit is a Raspberry Pi 3B; the ASR management module is an STC89C52 single-chip microcomputer; the ASR module is an LD3320 voice recognition chip; the gesture recognition module is an E18-D80NK infrared sensor.

For better understanding and implementation, the utility model is described in detail below with reference to the accompanying drawings.

Drawings

Fig. 1 is a structural block diagram of the reading device for visually impaired persons based on OCR and TTS according to the first embodiment of the utility model;

Fig. 2 is a structural block diagram of the reading device for visually impaired persons based on OCR and TTS according to the second embodiment of the utility model.

Detailed Description

Embodiment One

Referring to Fig. 1, the reading device for visually impaired persons based on OCR and TTS (hereinafter referred to as the device) provided in the first embodiment of the utility model includes a microcomputer unit 10, a voice recognition unit 20, a camera 30, a voice player 40, a gesture recognition module 50, an LED lamp 60 and a display screen 70. The microcomputer unit 10 is electrically connected to the voice recognition unit 20, the camera 30, the voice player 40, the gesture recognition module 50 and the display screen 70, respectively, and the LED lamp 60 is electrically connected to the voice recognition unit 20.

The microcomputer unit 10 has a character recognition function and a voice acquisition function.

The character recognition function is to perform character recognition on a character image containing character content to obtain text data.

In one embodiment, the microcomputer unit 10 uses OCR (Optical Character Recognition) technology to perform character recognition, that is, recognition software converts the characters in an image into text. Conventional OCR processes images and extracts features using methods such as digital image processing and traditional machine learning, but its recognition results are very poor when the image is blurred or distorted.

In a preferred embodiment, to increase the recognition rate, the character image is preprocessed before character recognition is performed. The preprocessing comprises: processing each pixel of the character image with a threshold segmentation method and/or an edge detection algorithm; and denoising the image with a Gaussian filtering algorithm, so that the imaged content is clearer.
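
As a non-limiting illustration, the preprocessing described above can be sketched in plain Python on a grayscale image stored as a list of rows. The function names, the 3x3 kernel, the use of Otsu's method as the threshold segmentation step, and the order of the steps (denoise first, then binarize) are all assumptions made for this sketch; the embodiment itself does not fix them.

```python
from typing import List

Image = List[List[int]]  # grayscale image, pixel values 0..255


def gaussian_blur_3x3(img: Image) -> Image:
    """Denoise with the 3x3 Gaussian kernel (1 2 1 / 2 4 2 / 1 2 1) / 16.

    Border pixels are handled by clamping coordinates to the image edge.
    """
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc // 16
    return out


def otsu_threshold(img: Image) -> int:
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels at or below t yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the lower class
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img: Image) -> Image:
    """Threshold segmentation: pixels above the threshold become 255, others 0."""
    t = otsu_threshold(img)
    return [[255 if p > t else 0 for p in row] for row in img]


def preprocess(img: Image) -> Image:
    """Assumed order for this sketch: Gaussian denoising, then binarization."""
    return binarize(gaussian_blur_3x3(img))
```

In practice the same steps would more likely be delegated to an image processing library; the pure-Python form is used here only so that the sketch is self-contained.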

In a preferred embodiment, the microcomputer unit 10 preprocesses the image with the open-source library OpenCV and performs character recognition with Google's open-source Tesseract-OCR engine. Compared with traditional template matching algorithms and cascade classifiers, models trained with the Tesseract-OCR engine can recognize character images in various formats and convert them into text data. The Tesseract-OCR engine also supports more than 60 languages, which lets visually impaired users switch smoothly between reading materials in different languages.

The voice acquisition function is to obtain a voice stream from the text data, send the voice stream to the voice player 40, and control the voice player 40 to play the voice stream.

Specifically, the microcomputer unit 10 uses TTS (Text To Speech) technology to convert text into speech. The conversion process comprises: analyzing the text content, calling a speech synthesis engine and a voice library to synthesize speech, and finally obtaining a voice stream.
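
As a non-limiting illustration, the conversion process can be sketched as a text analysis step followed by calls to a synthesis engine. The engine is represented by a caller-supplied `synthesize` function, which is hypothetical and stands in for whatever TTS engine is used; the cleanup rules (rejoining words hyphenated across OCR line breaks, collapsing whitespace, splitting on sentence-ending punctuation) are likewise assumptions of this sketch.

```python
import re
from typing import Callable, List


def analyze_text(raw: str) -> List[str]:
    """Clean OCR output and split it into sentences for synthesis."""
    text = re.sub(r"-\n(?=\w)", "", raw)      # rejoin words hyphenated across lines
    text = re.sub(r"\s+", " ", text).strip()  # collapse line breaks and extra spaces
    if not text:
        return []
    # Split after Latin punctuation followed by whitespace, or directly
    # after full-width CJK punctuation (which is not followed by a space).
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text)
    return [p for p in parts if p]


def text_to_voice_stream(raw: str, synthesize: Callable[[str], bytes]) -> bytes:
    """Synthesize each sentence in turn and concatenate the audio."""
    return b"".join(synthesize(sentence) for sentence in analyze_text(raw))
```

Splitting into sentences before synthesis keeps each request to the engine short, which is a design choice of this sketch rather than a requirement of the embodiment.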

In a specific embodiment, the TTS engine provided by Android is used to implement speech synthesis, and the speech synthesis process is performed locally on the microcomputer unit 10.

In a specific embodiment, a Raspberry Pi 3B is used as the microcomputer unit 10.

The voice recognition unit 20 is configured to recognize a voice command and make a further response according to the recognized voice command, where the voice command includes a reading command, a light-on command, a light-off command, and the like.

Specifically, the voice recognition unit 20 comprises an ASR management module and an ASR module. The ASR management module is used for learning and recording the voiceprint information of the voice instructions; the ASR module uses Automatic Speech Recognition (ASR) technology to recognize a voice instruction according to the recorded voiceprint information and sends the recognition result to the ASR management module; the ASR management module is also used for responding further according to the recognition result.

The ASR module recognizes speech by voiceprint, and different users have different language and pronunciation habits. A voice presetting stage is therefore arranged before the normal use stage. In the voice presetting stage, the ASR management module learns and records the voiceprint information of the user's voice instructions; in the normal use stage, the ASR module performs speech recognition on the user's voice instruction and sends the recognition result to the ASR management module, which responds further according to that result.
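
As a non-limiting illustration, the two stages can be sketched as an enrollment step that stores one reference per instruction and a recognition step that matches new input against the stored references. This is not a description of how the ASR module works internally: the numeric feature vectors, the Euclidean distance measure and the `reject_distance` parameter are all hypothetical stand-ins chosen for this sketch.

```python
import math
from typing import Dict, List, Optional, Sequence


class CommandRecognizer:
    """Two-stage recognizer: enroll references first, then match against them."""

    def __init__(self, reject_distance: float = 1.0) -> None:
        self._templates: Dict[str, List[float]] = {}
        self._reject_distance = reject_distance

    def enroll(self, command: str, features: Sequence[float]) -> None:
        """Voice presetting stage: learn and record one reference per command."""
        self._templates[command] = list(features)

    def recognize(self, features: Sequence[float]) -> Optional[str]:
        """Normal use stage: return the closest enrolled command, or None."""
        best, best_distance = None, math.inf
        for command, template in self._templates.items():
            if len(template) != len(features):
                continue  # incomparable reference, skip it
            distance = math.dist(template, features)
            if distance < best_distance:
                best, best_distance = command, distance
        # Reject input that is too far from every enrolled reference.
        return best if best_distance <= self._reject_distance else None
```

The rejection threshold means that speech resembling none of the enrolled instructions produces no response, rather than being mapped to the nearest instruction.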

The voice instructions consist of preset keywords, for example "start reading", "turn on fill light" and "turn off fill light". The keywords can be set in English, Cantonese or a local dialect according to the user's habits.
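
As a non-limiting illustration, the preset keywords can be held in a lookup table that maps each phrase to an instruction, with user-defined phrases added on top of the defaults. The three default phrases are the examples given above; the `Command` names, the helper functions and the extra phrase used in the usage note are hypothetical.

```python
from enum import Enum
from typing import Dict, Mapping, Optional


class Command(Enum):
    READ = "read"
    LIGHT_ON = "light_on"
    LIGHT_OFF = "light_off"


# Example keywords taken from the description.
DEFAULT_KEYWORDS: Dict[str, Command] = {
    "start reading": Command.READ,
    "turn on fill light": Command.LIGHT_ON,
    "turn off fill light": Command.LIGHT_OFF,
}


def _normalize(phrase: str) -> str:
    """Lower-case the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def build_keyword_table(
    user_keywords: Optional[Mapping[str, Command]] = None,
) -> Dict[str, Command]:
    """Merge user-chosen phrases (any language or dialect) into the defaults."""
    table = dict(DEFAULT_KEYWORDS)
    for phrase, command in (user_keywords or {}).items():
        table[_normalize(phrase)] = command
    return table


def match_command(recognized: str, table: Mapping[str, Command]) -> Optional[Command]:
    """Return the instruction for a recognized phrase, or None if it is unknown."""
    return table.get(_normalize(recognized))
```

For example, `build_keyword_table({"read it to me": Command.READ})` would let a user trigger reading with a phrase of their own while keeping the default keywords available.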

In a specific embodiment, an STC89C52 single chip microcomputer is used as the ASR management module, and an LD3320 voice recognition chip is used as the ASR module.

The camera 30 is used for capturing characters to be read to obtain character images, and sending the character images to the microcomputer unit 10, and the microcomputer unit 10 further processes the character images.

In a specific embodiment, the camera 30 is an 8-megapixel camera.

When the lighting is poor, the character image captured by the camera 30 is of low quality, so the microcomputer unit 10 cannot recognize it correctly.

To solve this technical problem, the LED lamp 60 is arranged near the camera 30 and provides fill light while the camera 30 is shooting, so that the camera 30 can still capture a good-quality character image even in poor lighting.

As a further supplement, the gesture recognition module 50 is configured to recognize a preset gesture, generate a gesture command, and send the gesture command to the microcomputer unit 10.

In one specific embodiment, an infrared sensor E18-D80NK module is employed as the gesture recognition module 50.

Preferably, the gesture command includes a reading instruction. When the microcomputer unit 10 receives the reading instruction, it drives the camera 30 to take a picture and then carries out the subsequent actions, such as character recognition, voice acquisition and voice playback.
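
As a non-limiting illustration, a gesture in front of the infrared sensor can be turned into a single reading instruction by requiring the sensor to report an object for several consecutive samples. The sketch assumes the sensor output has already been read as a sequence of true/false samples; the function name and the `hold` parameter are hypothetical.

```python
from typing import Iterable, List


def gesture_triggers(samples: Iterable[bool], hold: int = 3) -> List[int]:
    """Return the sample indices at which a reading instruction is generated.

    A trigger fires once when the sensor has reported an object for `hold`
    consecutive samples, and cannot fire again until the object is removed.
    """
    triggers: List[int] = []
    run = 0
    for index, detected in enumerate(samples):
        if detected:
            run += 1
            if run == hold:  # fire exactly once per continuous detection
                triggers.append(index)
        else:
            run = 0
    return triggers
```

Requiring several consecutive detections filters out brief accidental passes of the hand, and firing only once per detection prevents a hand held over the sensing port from starting repeated readings.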

The display screen 70 is used for displaying data generated during the operation of the device, so that a tester or a user can observe and debug the data.

In a noisy environment, the LD3320 voice recognition chip may fail to recognize voice instructions correctly. To solve this technical problem, the microphone is preferably fitted with soundproof cotton that leaves a sound-receiving hole in only one direction, so as to isolate ambient noise effectively.

The device also comprises a case. The main body of the case is made of aluminum profiles, whose high hardness and strong corrosion resistance ensure structural stability. The shell of the case is a lightweight PVC foam board, which reduces weight; because the board is soft, a user who bumps into it will not be scratched, which makes the case friendly to visually impaired persons.

The LED lamp 60 is a light strip arranged around the inside of the case to provide fill light in dark environments. The case is provided with a sensing port, a reading area and a sealed area. The sensing port is where the user performs gesture-sensing operations; the sealed area accommodates the microcomputer unit, the voice player and the voice recognition unit, so as to protect the circuitry and prevent misoperation by the user; the reading area is where the document to be read is placed.

The preparation and working process of the device are as follows. The STC89C52 single-chip microcomputer is electrically connected to the LED lamp through a relay and is provided with a microphone. In the voice presetting stage, voice instructions are collected through the microphone, then learned and recorded. In the normal use stage, a voice instruction is picked up through the microphone, the LD3320 voice recognition chip performs speech recognition on it, and the recognition result is sent to the STC89C52 single-chip microcomputer. If the voice instruction is recognized as a light-on or light-off instruction, the STC89C52 single-chip microcomputer switches the LED lamp on or off directly through the relay. If the voice instruction is a reading instruction, the STC89C52 single-chip microcomputer forwards it to the microcomputer unit 10; the Raspberry Pi 3B then drives the camera 30 to photograph and process the character image, performs character recognition with the Tesseract-OCR engine, uses a TTS engine to synthesize speech and obtain a voice stream, and finally controls the voice player to play the voice stream.
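
As a non-limiting illustration, the working process can be sketched as a dispatcher in which light instructions are handled directly through the relay and reading instructions run the capture, recognition, synthesis and playback chain. Every hardware and engine interaction is represented by a caller-supplied function; these parameter names, the instruction strings and the returned labels are hypothetical and serve only to show the flow of control.

```python
from typing import Callable

LIGHT_ON, LIGHT_OFF, READ = "light_on", "light_off", "read"


class ReadingDevice:
    """Routes recognized instructions to the relay or to the reading pipeline."""

    def __init__(
        self,
        set_relay: Callable[[bool], None],     # switches the LED lamp
        capture: Callable[[], bytes],          # camera snapshot
        preprocess: Callable[[bytes], bytes],  # image cleanup before OCR
        ocr: Callable[[bytes], str],           # character recognition
        tts: Callable[[str], bytes],           # speech synthesis
        play: Callable[[bytes], None],         # voice player
    ) -> None:
        self._set_relay = set_relay
        self._capture = capture
        self._preprocess = preprocess
        self._ocr = ocr
        self._tts = tts
        self._play = play

    def on_recognition_result(self, command: str) -> str:
        """Return which side handled the instruction: 'mcu', 'pi' or 'ignored'."""
        if command == LIGHT_ON:
            self._set_relay(True)
            return "mcu"
        if command == LIGHT_OFF:
            self._set_relay(False)
            return "mcu"
        if command == READ:
            self.read_aloud()
            return "pi"
        return "ignored"

    def read_aloud(self) -> str:
        """Capture, preprocess, recognize, synthesize and play; return the text."""
        image = self._preprocess(self._capture())
        text = self._ocr(image)
        self._play(self._tts(text))
        return text
```

Passing the stages in as functions keeps the sketch independent of any particular camera driver, OCR engine or TTS engine, and allows each stage to be replaced or tested separately.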

The reading device for visually impaired persons based on OCR and TTS provided by the utility model combines existing OCR technology, gesture-recognition triggering and TTS technology, so that a visually impaired person can trigger a photograph by gesture or operate the device by voice control. The device then detects and recognizes the text in the character image, and finally synthesizes and plays speech. The device is simple to operate and gives a good user experience; it can recognize and read text in all forms, which widens the range of material visually impaired persons can read. In addition, the device has a low design cost, a long service life and high recognition accuracy, and therefore has high practical value.

Embodiment Two

In the first embodiment, speech synthesis is implemented with the TTS engine provided by Android and is completed locally on the microcomputer unit 10. This approach depends on local software and hardware resources, and the speech output by the native Android TTS engine is monotonous and sounds noticeably mechanical.

To solve this technical problem, referring to Fig. 2, the reading device for visually impaired persons based on OCR and TTS of this embodiment further includes a server 80 on the basis of the first embodiment. The microcomputer unit 10 is in signal connection with the server 80 and is further configured to send the text data obtained by the character recognition module to the server 80. The server 80 is used for obtaining the voice stream from the text data through the Baidu speech synthesis API and sending the voice stream to the microcomputer unit 10.

In this embodiment, speech synthesis is executed on the server and does not depend on local hardware resources. The local hardware only needs a network connection to upload the local text, have it converted into speech, and download the result, which is enough to realize text-to-speech conversion. The algorithm deployed in the cloud also achieves a higher synthesis rate, responds faster and produces more expressive speech.
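
As a non-limiting illustration, the upload and download exchange can be sketched from the side of the microcomputer unit. The network link is represented by a caller-supplied `send` function, and the JSON payload with a single `text` field, the `max_chars` limit per request and the error handling are all hypothetical; the sketch does not model the request format of any particular speech synthesis API.

```python
import json
from typing import Callable

# Hypothetical transport: takes a request body, returns the audio bytes.
Transport = Callable[[bytes], bytes]


def request_voice_stream(text: str, send: Transport, max_chars: int = 500) -> bytes:
    """Upload text in chunks and concatenate the audio returned by the server."""
    if not text.strip():
        raise ValueError("no text to synthesize")
    audio = bytearray()
    for start in range(0, len(text), max_chars):
        chunk = text[start:start + max_chars]
        # Assumed payload shape for this sketch only.
        payload = json.dumps({"text": chunk}, ensure_ascii=False).encode("utf-8")
        reply = send(payload)
        if not reply:
            raise RuntimeError("server returned an empty voice stream")
        audio.extend(reply)
    return bytes(audio)
```

Sending the text in bounded chunks is an assumption made because remote synthesis services commonly limit the length of a single request; a real deployment would follow the limits documented for the service in use.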

The above embodiments represent only some embodiments of the utility model. Their description is specific and detailed, but it is not to be construed as limiting the scope of the utility model. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the utility model, and these all fall within its scope of protection.

Claims

1. An OCR and TTS based visually impaired reading device comprising:

2. An OCR and TTS based visually impaired reading device according to claim 1, wherein: the microcomputer unit is in signal connection with the server;

the microcomputer unit is also used for sending the text data to the server;

3. An OCR and TTS based visually impaired reading device according to claim 2, wherein: the microcomputer unit comprises an OCR module, and the OCR module is used for carrying out character recognition on the character image through Google's open-source OCR algorithm.

4. An OCR and TTS based visually impaired reading device according to claim 2, wherein: the voice recognition unit comprises an ASR management module and an ASR module;

5. An OCR and TTS based visually impaired reading device according to claim 4, wherein: the gesture recognition module is electrically connected with the microcomputer unit;

6. An OCR and TTS based visually impaired reading device according to claim 5, wherein:

the microcomputer unit is also used for carrying out edge detection algorithm processing or binarization algorithm processing on the character image so as to enable the imaging content of the character image to be clearer.

7. An OCR and TTS based visually impaired reading device according to claim 5, wherein: the voice recognition unit further comprises a microphone, the microphone is fitted with soundproof cotton, and the soundproof cotton leaves a sound-receiving hole in only one direction.

8. An OCR and TTS based visually impaired reading device according to claim 5, wherein: the device further comprises an LED lamp, which is electrically connected to the voice recognition unit through a relay;

9. An OCR and TTS based visually impaired reading device according to claim 8, wherein: the device further comprises a case, the main body of the case is made of aluminum profiles, and the shell of the case is a lightweight PVC foam board;

the case is provided with a sensing port, a reading area and a sealed area;

the sensing port is where the user performs gesture-sensing operations;

the sealed area is used for accommodating the microcomputer unit, the voice player and the voice recognition unit;

the reading area is used for placing a file to be read.

10. An OCR and TTS based visually impaired reading device according to any of claims 5 to 9, wherein: the microcomputer unit is a Raspberry Pi 3B; the ASR management module is an STC89C52 single-chip microcomputer; the ASR module is an LD3320 voice recognition chip; the gesture recognition module is an E18-D80NK infrared sensor.