CN214202843U - Visual impairment person reading device based on OCR and TTS - Google Patents


Info

Publication number: CN214202843U
Authority: CN (China)
Legal status: Active
Application number: CN202023117638.3U
Other languages: Chinese (zh)
Inventors: 张德钱, 李宇航, 廖斌强, 丁凡, 杨森泉
Current assignee: Shaoguan University
Original assignee: Shaoguan University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-09-14

Landscapes

  • Character Discrimination (AREA)

Abstract

The utility model relates to a reading device for visually impaired people based on OCR and TTS. The device comprises a microcomputer unit, a voice recognition unit, a camera and a voice player. The microcomputer unit is electrically connected with the voice player, the camera and the voice recognition unit. The voice recognition unit recognizes voice instructions, including a reading instruction, and sends the reading instruction to the microcomputer unit. According to the reading instruction, the microcomputer unit drives the camera to photograph the characters to be read, obtaining a character image. The microcomputer unit then performs character recognition on the character image to obtain text data, obtains a voice stream from the text data, and sends the voice stream to the voice player for playback. The OCR and TTS based reading device for visually impaired people is simple to operate and reads efficiently.

Description

Visual impairment person reading device based on OCR and TTS
Technical Field
The utility model relates to the technical field of electronic reading equipment, and in particular to a reading device for visually impaired people based on OCR and TTS.
Background
Visually impaired people read in one of two ways: traditionally, through braille, or electronically, through tools such as blind-reader devices and screen-reading software. With braille, a visually impaired person can only read material that has already been translated into braille. The sources are therefore limited, the material is expensive, and reading is slow. With electronic reading, the blind-reader devices on the market must first import an e-book to use as the reading source, which introduces a delay. Screen-reading software can only read text on a computer screen and cannot handle the large volume of paper documents, so the experience for visually impaired users is poor.
SUMMARY OF THE UTILITY MODEL
Accordingly, the utility model aims to provide a reading device for visually impaired people based on OCR and TTS. A visually impaired user triggers the device by voice control to photograph the characters to be read. The device performs character recognition on the photo and finally plays synthesized speech. The device is simple to operate and reads efficiently.
An OCR and TTS based visually impaired reading device comprising:
the device comprises a microcomputer unit, a voice recognition unit, a camera and a voice player;
the microcomputer unit is respectively electrically connected with the voice player, the camera and the voice recognition unit;
the voice recognition unit is used for recognizing a voice instruction, the voice instruction comprises a reading instruction, and the voice recognition unit is also used for sending the reading instruction to the microcomputer unit;
the microcomputer unit is used for driving the camera, according to the reading instruction, to photograph the characters to be read and obtain a character image;
the microcomputer unit is also used for carrying out character recognition on the character image to obtain text data, obtaining a voice stream according to the text data and sending the voice stream to the voice player for playing; wherein the voice stream carries the content of the text to be read.
Because the OCR and TTS based reading device for visually impaired people is controlled by voice and reads the text aloud as speech, it is easy for visually impaired users to operate and reads efficiently.
Further, the device also comprises a server, and the microcomputer unit is in signal connection with the server;
the microcomputer unit is also used for sending the text data to the server;
the server is used for obtaining the voice stream according to the text data through a Baidu voice synthesis API and sending the voice stream to the microcomputer unit.
Further, the microcomputer unit comprises an OCR module, and the OCR module is used for performing character recognition on the character image using Google's open-source OCR algorithm.
Further, the voice recognition unit comprises an ASR management module and an ASR module;
the ASR management module is used for learning and recording the voiceprint information of the voice instruction;
the ASR module is used for recognizing a voice instruction according to the recorded voiceprint information and sending the recognition result to the ASR management module;
and the ASR management module is also used for sending the recognition result to the microcomputer unit.
Further, the device also comprises a gesture recognition module which is electrically connected with the microcomputer unit;
the gesture recognition module is used for recognizing preset gestures, generating a reading instruction and sending the reading instruction to the microcomputer unit.
Further, the microcomputer unit is also used for applying edge detection or binarization to the character image, so that the imaged content of the character image is clearer.
Further, the voice recognition unit also comprises a microphone fitted with soundproof cotton, and the soundproof cotton leaves a sound-receiving hole in one direction only.
Further, the device also comprises an LED lamp, which is electrically connected with the voice recognition unit through a relay;
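The summary does not say which edge detector is used. As an illustration only, the sketch below applies the Sobel operator, which is an assumption and is not named in the patent, to a grayscale image stored as a list of rows. On the Raspberry Pi, OpenCV's `cv2.Sobel` or `cv2.Canny` would normally do this far faster. The pure-Python version just makes the arithmetic visible.

```python
# Minimal Sobel edge-detection sketch (assumed detector, not specified by the patent).
# Image: list of rows of ints 0-255; border pixels are left at 0.
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the gradient magnitude of each interior pixel, clipped to 255."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = min(255, int(math.hypot(gx, gy)))
    return out
```

In a uniform region both gradients cancel to 0. Across a vertical ink/paper boundary the horizontal gradient saturates, so strokes of characters stand out as bright edges.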
the voice recognition unit is also used for driving the LED lamp to be turned on or turned off according to the recognition result of the voice command.
Further, the device also comprises a case. The main body of the case is made of aluminum profiles, and the shell of the case is a lightweight PVC foam board;
the LED lamp is a light strip arranged around the case;
the case is provided with a sensing port, a reading area and a sealed area;
the sensing port is used for sensing the user's gestures;
the sealed area is used for accommodating the microcomputer unit, the voice player and the voice recognition unit, so as to protect the circuitry and prevent accidental operation by the user;
the reading area is used for placing a file to be read.
Further, the microcomputer unit is a Raspberry Pi 3B; the ASR management module is an STC89C52 single-chip microcomputer; the ASR module is an LD3320 voice recognition chip; the gesture recognition module is an E18-D80NK infrared sensor.
For better understanding and implementation, the present invention is described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a structural block diagram of a reading device for visually impaired people based on OCR and TTS according to the first embodiment of the present invention;
Fig. 2 is a structural block diagram of a reading device for visually impaired people based on OCR and TTS according to the second embodiment of the present invention.
Detailed Description
Example one
Referring to Fig. 1, the reading device for visually impaired people based on OCR and TTS (hereinafter "the device") provided in the first embodiment of the present invention includes a microcomputer unit 10, a voice recognition unit 20, a camera 30, a voice player 40, a gesture recognition module 50, an LED lamp 60 and a display screen 70. The microcomputer unit 10 is electrically connected to the voice recognition unit 20, the camera 30, the voice player 40, the gesture recognition module 50 and the display screen 70, and the LED lamp 60 is electrically connected to the voice recognition unit 20.
The microcomputer unit 10 has a character recognition function and a voice acquisition function.
The character recognition function is to perform character recognition on a character image containing character content to obtain text data.
In one embodiment, the microcomputer unit 10 uses OCR (Optical Character Recognition) technology for character recognition, in which recognition software converts the characters in an image into text format. Conventional OCR processes the image and extracts features using digital image processing and classical machine learning, but its recognition performance is very poor when the image is blurred or distorted.
In a preferred embodiment, the character image is preprocessed before character recognition in order to increase the recognition rate. Preprocessing comprises: processing each pixel of the character image with a threshold segmentation method and/or an edge detection algorithm; and denoising the image with a Gaussian filter, so that the imaged content is clearer.
In a preferred embodiment, the microcomputer unit 10 preprocesses the image with the open-source OpenCV library and performs character recognition with Google's open-source Tesseract-OCR engine. Compared with traditional template matching and cascade classifiers, the models trained for the Tesseract-OCR engine can recognize text images in various formats and convert them into text data. The Tesseract-OCR engine also supports more than 60 languages, which lets visually impaired users move seamlessly between texts in different languages.
The voice acquisition function obtains a voice stream from the text data, sends the voice stream to the voice player 40, and controls the voice player 40 to play it.
Specifically, the microcomputer unit 10 uses TTS (Text To Speech) technology to convert text into speech. The conversion process comprises analyzing the text content, calling a speech synthesis engine and a voice library to synthesize speech, and finally obtaining a voice stream.
In a specific embodiment, speech synthesis is implemented with the TTS engine provided by Android, and the synthesis is performed locally on the microcomputer unit 10.
In a specific embodiment, a Raspberry Pi 3B is used as the microcomputer unit 10.
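The preprocessing step can be sketched as a 3x3 Gaussian filter plus a global threshold. The patent names neither the kernel nor the thresholding rule. Here a standard 1-2-1 Gaussian kernel and Otsu's method (which picks the threshold that best separates ink from paper) are assumptions standing in for them. The blur-then-threshold order shown is the common one; the patent lists thresholding first. With OpenCV the equivalents would be `cv2.GaussianBlur` and `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
# Sketch: Gaussian denoising + Otsu threshold segmentation on a grayscale image
# (list of rows of ints 0-255). Kernel and Otsu's method are assumptions.
GAUSS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16


def gaussian_blur(img):
    """3x3 Gaussian smoothing; out-of-range neighbours are clamped to the nearest edge pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += GAUSS_3X3[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc // 16
    return out


def otsu_threshold(img):
    """Return the threshold t that maximises between-class variance (class 0 is pixels <= t)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(img, t):
    """Map pixels above t to 255 (paper) and the rest to 0 (ink)."""
    return [[255 if p > t else 0 for p in row] for row in img]
```

For a page that is mostly light paper with dark strokes, the returned threshold falls between the two intensity clusters. This suits printed text photographed under the fill light.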
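One plausible way to drive Tesseract from Python on the Raspberry Pi is to call its command-line tool. This is a sketch under assumptions: the `tesseract` binary and the `chi_sim`/`eng` language data are installed, and the helper names `build_tesseract_cmd` and `recognize` are hypothetical. The CLI form `tesseract <image> stdout -l <langs> --psm <n>` is Tesseract's own. Joining language codes with `+` lets one call handle mixed-language pages.

```python
# Hedged sketch: run the Tesseract CLI on a preprocessed character image.
# Assumes tesseract and the chosen traineddata files are installed.
import subprocess


def build_tesseract_cmd(image_path, langs=("chi_sim", "eng"), psm=3):
    """Build argv for `tesseract <image> stdout -l <lang1+lang2> --psm <n>`."""
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs), "--psm", str(psm)]


def recognize(image_path, langs=("chi_sim", "eng")):
    """Run Tesseract and return the recognized text; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, langs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

`--psm 3` is Tesseract's fully automatic page segmentation, a reasonable default for a page placed in the reading area. Switching the language set requires only a different `langs` tuple.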
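The patent does not detail the "analyze the text content" step. A common first step, shown here only as an assumed illustration, is to clean up OCR output and split it into sentence-sized chunks so that each synthesis call stays short. `MAX_CHARS` is a hypothetical limit, not a value from the patent or from any TTS engine.

```python
# Hedged sketch of TTS text preparation: normalise whitespace, split after
# Chinese/Western sentence punctuation, hard-wrap overly long sentences.
import re

MAX_CHARS = 60  # hypothetical per-request length limit
_SENTENCE_END = re.compile(r"(?<=[。！？!?.;；])")  # zero-width split point after punctuation


def chunk_for_tts(text, max_chars=MAX_CHARS):
    """Return a list of non-empty text chunks, each at most max_chars long."""
    text = re.sub(r"\s+", " ", text).strip()  # OCR output often has stray line breaks
    chunks = []
    for sentence in _SENTENCE_END.split(text):
        sentence = sentence.strip()
        for start in range(0, len(sentence), max_chars):
            chunks.append(sentence[start:start + max_chars])
    return chunks
```

Splitting at sentence punctuation keeps natural pauses between synthesized chunks. The hard wrap is only a fallback for long runs of text that OCR returned without punctuation.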
The voice recognition unit 20 is configured to recognize a voice command and make a further response according to the recognized voice command, where the voice command includes a reading command, a light-on command, a light-off command, and the like.
Specifically, the voice recognition unit 20 comprises an ASR management module and an ASR module. The ASR management module is used for learning and recording the voiceprint information of voice instructions. The ASR module performs speech recognition on a voice instruction according to the recorded voiceprint information using ASR (Automatic Speech Recognition) technology and sends the recognition result to the ASR management module. The ASR management module then responds according to the recognition result.
The ASR module recognizes speech by voiceprint. Because different users have different speaking and pronunciation habits, a voice presetting stage precedes the normal use stage. In the voice presetting stage, the ASR management module learns and records the voiceprint information of the user's voice instructions. In the normal use stage, the ASR module performs speech recognition on the user's voice instruction and sends the recognition result to the ASR management module, which then responds accordingly.
Voice instructions are made up of preset keywords, for example "start reading", "turn on fill light" and "turn off fill light". The keywords can be set in English, Cantonese or a local dialect according to the user's habits.
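Because the commands are preset keywords, the response logic reduces to a lookup from recognized phrase to command. The sketch below is a hypothetical illustration. The `Command` names and the "read aloud" alternative phrase are assumptions, and the other three phrases come from the examples above. Mapping several phrases (for example, in different languages or dialects) to one command is how user-specific keywords could be supported.

```python
# Hedged sketch: map recognised keyword phrases to device commands.
from enum import Enum
from typing import Optional


class Command(Enum):
    READ = "read"
    LIGHT_ON = "light_on"
    LIGHT_OFF = "light_off"


# Several phrases may map to the same command (e.g. other languages or dialects).
KEYWORDS = {
    "start reading": Command.READ,
    "read aloud": Command.READ,  # hypothetical alternative phrase
    "turn on fill light": Command.LIGHT_ON,
    "turn off fill light": Command.LIGHT_OFF,
}


def to_command(phrase: str) -> Optional[Command]:
    """Return the command for a recognised phrase, or None if it is not a preset keyword."""
    return KEYWORDS.get(phrase.strip().lower())
```

Returning `None` for unknown phrases lets the caller ignore speech that is not a preset keyword rather than acting on a misrecognition.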
In a specific embodiment, an STC89C52 single chip microcomputer is used as the ASR management module, and an LD3320 voice recognition chip is used as the ASR module.
The camera 30 is used for capturing characters to be read to obtain character images, and sending the character images to the microcomputer unit 10, and the microcomputer unit 10 further processes the character images.
In a specific embodiment, the camera 30 is an 8-megapixel camera.
Under poor lighting, the character image captured by the camera 30 is of poor quality, so the microcomputer unit 10 cannot recognize it correctly.
To solve this problem, the LED lamp 60 is placed near the camera 30 to provide fill light while the camera 30 is shooting. The camera 30 can then still capture a good-quality character image under poor lighting.
In addition, the gesture recognition module 50 is configured to recognize a preset gesture, generate a gesture command, and send the gesture command to the microcomputer unit 10.
In one specific embodiment, an infrared sensor E18-D80NK module is employed as the gesture recognition module 50.
Preferably, the gesture command includes a reading command. When the microcomputer unit 10 receives the reading command, it drives the camera 30 to shoot and then carries out the subsequent steps, such as character recognition, voice acquisition and voice playback.
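An infrared proximity sensor such as the E18-D80NK gives a simple on/off reading. Turning that reading into one reading command per gesture needs some debouncing. The sketch assumes boolean samples at a fixed rate (True meaning a hand is in front of the sensing port, after any polarity inversion). It also assumes that a "gesture" is a hand held there for at least `HOLD_SAMPLES` consecutive samples. Both the rule and the value are hypothetical.

```python
# Hedged sketch: debounce raw IR-sensor samples into one-shot reading commands.
HOLD_SAMPLES = 3  # hypothetical number of consecutive detections required


def detect_gestures(samples, hold=HOLD_SAMPLES):
    """Return the sample indices at which a reading command would be issued."""
    triggers = []
    run = 0  # length of the current streak of detections
    for i, detected in enumerate(samples):
        run = run + 1 if detected else 0
        # Fire exactly once per continuous hold, when it first reaches the threshold.
        if run == hold:
            triggers.append(i)
    return triggers
```

Firing only when the streak first reaches the threshold prevents a hand that stays in place from triggering repeated captures. Brief flickers shorter than the threshold are ignored.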
The display screen 70 is used for displaying data generated during the operation of the device, so that a tester or a user can observe and debug the data.
In a noisy environment, the LD3320 voice recognition chip may fail to recognize voice commands correctly. To solve this problem, the microphone is preferably fitted with soundproof cotton that leaves a sound-receiving hole in one direction only, effectively isolating ambient noise.
The device further comprises a case. The main body of the case is made of aluminum profiles, whose high hardness and corrosion resistance keep the structure stable. The shell of the case is a lightweight PVC foam board, which reduces weight. The board is also soft, so it will not scratch a user who bumps into it, making the case safe for visually impaired people.
The LED lamp 60 is a light strip arranged around the case to provide fill light in dark environments. The case is provided with a sensing port, a reading area and a sealed area. The sensing port is used for sensing the user's gestures. The sealed area houses the microcomputer unit, the voice player and the voice recognition unit to protect the circuitry and prevent accidental operation by the user. The reading area is where the document to be read is placed.
The device is prepared and operates as follows. The STC89C52 single-chip microcomputer is electrically connected with the LED lamp through a relay and is equipped with a microphone. In the voice presetting stage, voice instructions are collected through the microphone, then learned and recorded. In the normal use stage, a voice command is captured through the microphone, the LD3320 voice recognition chip performs speech recognition on it, and the recognition result is sent to the STC89C52 single-chip microcomputer. If the command is recognized as a light-on or light-off command, the STC89C52 directly switches the LED on or off through the relay. If it is a reading command, the STC89C52 forwards it to the microcomputer unit 10. The Raspberry Pi 3B then drives the camera 30 to photograph and process the character image, performs character recognition with the Tesseract-OCR engine, synthesizes a voice stream with the TTS engine, and finally controls the voice player to play the voice stream.
The OCR and TTS based reading device for visually impaired people provided by the utility model combines existing OCR technology, gesture-triggered capture and TTS technology. A visually impaired user can trigger a photo with a gesture or operate the device by voice. The device then detects and recognizes the characters in the image and finally plays synthesized speech. The device is simple to operate and gives a good user experience. It can recognize and read characters in all forms, which widens the range of material visually impaired people can read. It also offers low design cost, long service life and high recognition accuracy, giving it high practical value.
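The division of work described above can be simulated in a few lines. The STC89C52 handles light commands itself, and reading commands go to the Raspberry Pi, which runs capture, OCR, TTS and playback. In this hypothetical sketch the hardware is replaced by injected callbacks, and the class name, command strings and log format are illustrative only.

```python
# Hedged simulation of the command flow; hardware calls are hypothetical callbacks.
from typing import Callable, List


class ReadingDevice:
    def __init__(self, capture: Callable[[], bytes], ocr: Callable[[bytes], str],
                 tts: Callable[[str], bytes], play: Callable[[bytes], None]):
        self.capture, self.ocr, self.tts, self.play = capture, ocr, tts, play
        self.light_on = False  # state of the relay-driven LED strip
        self.log: List[str] = []

    def on_recognized(self, command: str) -> None:
        """STC89C52 role: switch the light via the relay, forward reading commands."""
        if command == "light_on":
            self.light_on = True
            self.log.append("relay: LED on")
        elif command == "light_off":
            self.light_on = False
            self.log.append("relay: LED off")
        elif command == "read":
            self.read_page()
        # Any other recognition result is ignored.

    def read_page(self) -> None:
        """Raspberry Pi role: capture -> OCR -> TTS -> playback."""
        image = self.capture()
        text = self.ocr(image)
        audio = self.tts(text)
        self.play(audio)
        self.log.append(f"played {len(audio)} bytes for {text!r}")
```

Injecting the four stages as callbacks lets the local Android TTS of this embodiment or a server-based synthesizer be swapped in without changing the dispatch logic.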
Example two
In the first embodiment, speech synthesis uses the TTS engine provided by Android and runs locally on the microcomputer unit 10, so it depends on local software and hardware resources. In addition, the native Android TTS engine produces monotonous speech with an obvious machine-like sound.
To solve this problem, referring to Fig. 2, the reading device for visually impaired people based on OCR and TTS of this embodiment further includes a server 80 on the basis of the first embodiment. The microcomputer unit 10 is in signal connection with the server 80 and is further configured to send the text data obtained by character recognition to the server 80. The server 80 obtains the voice stream from the text data through the Baidu speech synthesis API and sends the voice stream to the microcomputer unit 10.
In this embodiment, speech synthesis runs on the server and does not depend on local hardware resources. The local device only needs a network connection to upload the text and download the synthesized speech. The algorithm deployed in the cloud also synthesizes speech faster and more expressively.
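On the microcomputer side, the second embodiment reduces to "send text, receive audio". The sketch below assumes a hypothetical HTTP endpoint on server 80 (`SERVER_URL`) that accepts JSON `{"text": ...}` and returns audio bytes. The server's own call to the Baidu speech synthesis API is not shown, and no Baidu request format is implied. Only standard-library `urllib.request` calls are used.

```python
# Hedged sketch of the upload/download exchange with server 80.
# SERVER_URL and the JSON field name are hypothetical.
import json
import urllib.request

SERVER_URL = "http://tts-server.local/synthesize"  # hypothetical endpoint


def build_tts_request(text, url=SERVER_URL):
    """Build a POST request carrying the OCR text as UTF-8 JSON."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def fetch_speech(text, timeout=10.0):
    """Send the text to the server and return the audio bytes it sends back."""
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        return resp.read()
```

The timeout keeps the device from hanging when the network is unavailable. How the device should then behave (for example, falling back to local synthesis) is not specified by the patent.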
The above-mentioned embodiments only represent some embodiments of the present invention, and the description thereof is specific and detailed, but not to be construed as limiting the scope of the present invention. It should be noted that, for those skilled in the art, without departing from the spirit of the present invention, several variations and modifications can be made, which are within the scope of the present invention.

Claims (10)

1. An OCR and TTS based visually impaired reading device comprising:
the device comprises a microcomputer unit, a voice recognition unit, a camera and a voice player;
the microcomputer unit is respectively electrically connected with the voice player, the camera and the voice recognition unit;
the voice recognition unit is used for recognizing a voice instruction, the voice instruction comprises a reading instruction, and the voice recognition unit is also used for sending the reading instruction to the microcomputer unit;
the microcomputer unit is used for driving the camera to snapshot characters to be read according to the reading instruction to obtain character images;
the microcomputer unit is also used for carrying out character recognition on the character image to obtain text data, obtaining a voice stream according to the text data and sending the voice stream to the voice player for playing; wherein, the voice stream records the content of the text to be read.
2. An OCR and TTS based visually impaired reading device according to claim 1, further comprising a server, wherein the microcomputer unit is in signal connection with the server;
the microcomputer unit is also used for sending the text data to the server;
the server is used for obtaining the voice stream according to the text data through a Baidu voice synthesis API and sending the voice stream to the microcomputer unit.
3. An OCR and TTS based visually impaired reading device according to claim 2, wherein the microcomputer unit comprises an OCR module, and the OCR module is used for carrying out character recognition on the character image using Google's open-source OCR algorithm.
4. An OCR and TTS based visually impaired reading device according to claim 2, wherein the voice recognition unit comprises an ASR management module and an ASR module;
the ASR management module is used for learning and recording the voiceprint information of the voice instruction;
the ASR module is used for recognizing a voice instruction according to the recorded voiceprint information and sending the recognition result to the ASR management module;
and the ASR management module is also used for sending the recognition result to the microcomputer unit.
5. An OCR and TTS based visually impaired reading device according to claim 4, further comprising a gesture recognition module electrically connected with the microcomputer unit;
the gesture recognition module is used for recognizing preset gestures, generating a reading instruction and sending the reading instruction to the microcomputer unit.
6. An OCR and TTS based visually impaired reading device according to claim 5, wherein:
the microcomputer unit is also used for applying edge detection processing or binarization processing to the character image so as to make the imaged content of the character image clearer.
7. An OCR and TTS based visually impaired reading device according to claim 5, wherein the voice recognition unit further comprises a microphone fitted with soundproof cotton, and the soundproof cotton has a sound-receiving hole in one direction only.
8. An OCR and TTS based visually impaired reading device according to claim 5, further comprising an LED lamp electrically connected with the voice recognition unit through a relay;
the voice recognition unit is also used for driving the LED lamp to be turned on or turned off according to the recognition result of the voice command.
9. An OCR and TTS based visually impaired reading device according to claim 8, further comprising a case, wherein the main body of the case is made of aluminum profiles, and the shell of the case is a lightweight PVC foam board;
the LED lamp is a light strip arranged around the case;
the case is provided with a sensing port, a reading area and a sealed area;
the sensing port is used for sensing the user's gestures;
the sealed area is used for accommodating the microcomputer unit, the voice player and the voice recognition unit;
the reading area is used for placing a file to be read.
10. An OCR and TTS based visually impaired reading device according to any one of claims 5 to 9, wherein the microcomputer unit is a Raspberry Pi 3B; the ASR management module is an STC89C52 single-chip microcomputer; the ASR module is an LD3320 voice recognition chip; the gesture recognition module is an E18-D80NK infrared sensor.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202023117638.3U CN214202843U (en) 2020-12-22 2020-12-22 Visual impairment person reading device based on OCR and TTS

Publications (1)

Publication Number Publication Date
CN214202843U (en) 2021-09-14

Family

ID=77654897

Country Status (1)

Country Link
CN (1) CN214202843U (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120769A (en) * 2021-11-29 2022-03-01 云知声智能科技股份有限公司 Braille reading method, device, storage medium and electronic device

Similar Documents

Publication Publication Date Title
US8538087B2 (en) Aiding device for reading a printed text
US20190205618A1 (en) Method and apparatus for generating facial feature
Rajesh et al. Text recognition and face detection aid for visually impaired person using Raspberry PI
Ani et al. Smart Specs: Voice assisted text reading system for visually impaired persons using TTS method
CN109558788B (en) Silence voice input identification method, computing device and computer readable medium
EP2144189A3 (en) Method for recognizing and translating characters in camera-based image
AlSaid et al. Deep learning assisted smart glasses as educational aid for visually challenged students
CN214202843U (en) Visual impairment person reading device based on OCR and TTS
Mainkar et al. Raspberry Pi based intelligent reader for visually impaired persons
CN111539408A (en) Intelligent point reading scheme based on photographing and object recognizing
Manage et al. An intelligent text reader based on python
CN111447325A (en) Call auxiliary method, device, terminal and storage medium
Shirke et al. Portable camera based Text Reading of Objects for blind Persons
CN113822187A (en) Sign language translation, customer service, communication method, device and readable medium
Zaman et al. Python based portable virtual text reader
CN110674825A (en) Character recognition method, device and system applied to intelligent voice mouse and storage medium
CN111711864A (en) Intelligent voice photographing method based on television, computer readable storage medium and television
KR102148021B1 (en) Information search method and apparatus in incidental images incorporating deep learning scene text detection and recognition
CN114359446A (en) Animation picture book generation method, device, equipment and storage medium
AV et al. Penpal-electronic pen aiding visually impaired in reading and visualizing textual contents
Muralidharan et al. Reading aid for visually impaired people
Krishna et al. Word Based Text Extraction Algorithm Implementation in Wearable Assistive Device for the Blind
Dokhe et al. Survey Paper: Image Reader For Blind Person
Ong et al. MATLAB-based Image-to-Speech Conversion
Jayyusi et al. Improved Camera-Based Text Reading Assistant System Using Digital Image Processing Techniques

Legal Events

Date Code Title Description
GR01 Patent grant