CN103390159A - Method and device for converting screen character into voice - Google Patents
Method and device for converting screen character into voice
- Publication number
- CN103390159A
- Authority
- CN
- China
- Prior art keywords
- text
- converted
- string
- content
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- User Interface Of Digital Computer (AREA)
Abstract
The invention is applicable to the field of artificial intelligence, and provides a method and a device for converting a screen character into a voice. The method comprises the steps as follows: a screenshot is formed from a content of a to-be-converted picture on a terminal screen; the content of the screenshot is converted into a character content; and the character content is converted into a voice signal. The device comprises a screenshot module, a character recognition module and a sound converting module, wherein the screenshot module is used for screenshot of the to-be-converted picture content on the terminal screen; the character recognition module is used for converting the screenshot content into the character content; and the sound converting module is used for converting the character content into the voice signal. According to the method and the device, a voice recognition technology and an optical character recognition technology are combined, characters on the screen are converted into voices on a terminal device, and the voices are read and output.
Description
Technical field
The invention belongs to the field of artificial intelligence and relates in particular to a method and a device for converting on-screen text into voice.
Background art
Intelligent terminals are now part of everyday life. They deliver a wide variety of information for work and daily living and have greatly broadened the range of information people can access. Their functions keep growing, and many of them can convert text into voice output, but only within specific applications, which makes the function inconvenient to use.
Many application programs on PCs and smartphones (e-book readers, for example) can already read aloud the text of a page inside the application, or read aloud the text the user has selected. Such software generally works as follows: a read-aloud event is triggered inside the application, the text to be read is obtained, the text is converted to speech, and the speech is played through the loudspeaker. The main drawback of such software is that the conversion operates on text content only. Text that appears inside a picture on the screen cannot be converted; only content already in text format can. Operation is therefore inconvenient and the user experience is poor.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method and a device for converting on-screen text into voice, so as to solve the problem that existing intelligent terminals cannot convert picture content into voice for the user.
The embodiments of the present invention are realized as a method for converting on-screen text into voice, the method comprising:
capturing the picture content to be converted on the terminal screen;
converting the captured picture content into text content;
converting the text content into a voice signal.
Further, capturing the picture content to be converted on the terminal screen comprises:
popping up a mask layer after an event that triggers the conversion is received;
delimiting a picture region within the area of the mask layer;
saving the picture content within the picture region as a bitmap object.
Further, converting the captured picture content into text content comprises:
obtaining the text information in the bitmap object according to the bitmap object and a preset optical character recognition algorithm;
performing syntactic and semantic analysis on the text information to obtain a text string.
Further, converting the text content into a voice signal comprises:
generating the voice signal corresponding to the text string according to the text string and a preset speech recognition engine.
Further, the method also comprises:
playing the voice signal.
The present invention also proposes a device for converting on-screen text into voice, the device comprising:
a picture capture module, configured to capture the picture content to be converted on the terminal screen;
a character recognition module, configured to convert the captured picture content into text content;
a sound conversion module, configured to convert the text content into a voice signal.
Further, the picture capture module comprises:
a trigger unit, configured to pop up a mask layer after an event that triggers the conversion is received;
a selection unit, configured to delimit a picture region within the area of the mask layer;
a storage unit, configured to save the picture content within the picture region as a bitmap object.
Further, the character recognition module comprises:
an acquiring unit, configured to obtain the text information in the bitmap object according to the bitmap object and a preset optical character recognition algorithm;
an analysis unit, configured to perform syntactic and semantic analysis on the text information to obtain a text string.
Further, the sound conversion module is specifically configured to:
generate the voice signal corresponding to the text string according to the text string and a preset speech recognition engine.
Further, the device also comprises:
a voice output module, configured to play the voice signal.
In the embodiments of the present invention, after the screen picture is locked, the region to be converted is selected, and techniques such as intelligent screen capture, OCR and speech conversion are combined to output the text in a picture as voice. This is particularly suitable for visually impaired users, is simple to operate, and improves the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of the method for converting on-screen text into voice provided by embodiment one of the present invention;
Fig. 2 is a structural diagram of the device for converting on-screen text into voice provided by embodiment two of the present invention;
Fig. 3 is a structural diagram of the picture capture module in the device for converting on-screen text into voice provided by embodiment two of the present invention;
Fig. 4 is a structural diagram of the character recognition module in the device for converting on-screen text into voice provided by embodiment two of the present invention.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Embodiment one
Embodiment one of the present invention proposes a method for converting on-screen text into voice. As shown in Fig. 1, the method of embodiment one comprises the following steps:
Step S1: capture the picture content to be converted on the terminal screen.
In embodiment one, a trigger button may be set on the terminal screen in advance, for example a floating toolbar or a half-hidden sidebar, located on the topmost layer of the screen and used to trigger the conversion event. After the event is triggered, a transparent mask layer (its transparency can be set as required) appears on the topmost layer of the screen. The mask layer locks the current screen interface: the original interface no longer receives other user events, while the mask layer can receive operation events such as touch selection or mouse-drag selection.
When the user delimits a region on the mask layer, the terminal selects a picture region on the mask layer according to the user's operation. The picture region may be the area covered by a touch slide or the area enclosed by a selection box. The terminal computes a rectangular region from the extent of the gesture and records the width and height of the rectangle and the relative coordinates of its starting point.
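The rectangle computation described above can be sketched as a bounding-box calculation over the points of the gesture. This is only an illustration under stated assumptions: the names `Rect` and `bounding_rect`, the integer pixel coordinates, and the clamping to the screen size are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Rect:
    """Selected region: start point (relative to the mask layer) plus size."""
    x: int
    y: int
    width: int
    height: int


def bounding_rect(points: Iterable[Tuple[int, int]],
                  screen_w: int, screen_h: int) -> Rect:
    """Return the smallest rectangle covering all gesture points.

    Points outside the screen are clamped to its edges first
    (an assumption; the patent does not say how they are handled).
    """
    pts = list(points)
    if not pts:
        raise ValueError("gesture contains no points")
    xs = [min(max(px, 0), screen_w) for px, _ in pts]
    ys = [min(max(py, 0), screen_h) for _, py in pts]
    left, top = min(xs), min(ys)
    return Rect(left, top, max(xs) - left, max(ys) - top)
```

For a slide through (120, 300), (40, 80) and (200, 150) on a 1080 x 1920 screen, the function yields a rectangle starting at (40, 80) that is 160 wide and 220 high.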
The terminal creates a temporary bitmap object IMG, draws the selected rectangular region into it, thereby capturing the selected region, and stores the temporary bitmap object IMG in the cached bitmap queue CacheList.
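A minimal sketch of this capture-and-cache step follows. It assumes the screen is available as a list of pixel rows, which stands in for whatever bitmap type the platform provides; `crop`, `capture_region` and `cache_list` are hypothetical names, with `cache_list` playing the role of CacheList.

```python
from collections import deque
from typing import Deque, List

# Rows of pixel values; a stand-in for a platform bitmap object (assumption).
Bitmap = List[List[int]]

# Plays the role of the cached bitmap queue CacheList.
cache_list: Deque[Bitmap] = deque()


def crop(screen: Bitmap, x: int, y: int, width: int, height: int) -> Bitmap:
    """Copy the selected rectangle out of the full-screen bitmap."""
    return [row[x:x + width] for row in screen[y:y + height]]


def capture_region(screen: Bitmap, x: int, y: int,
                   width: int, height: int) -> None:
    """Create the temporary bitmap IMG and append it to the cache queue."""
    img = crop(screen, x, y, width, height)
    cache_list.append(img)
```

A queue keeps captures in the order they were made, so the recognition step can later take the oldest pending bitmap with `cache_list.popleft()`.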
Step S2: convert the captured picture content into text content.
The captured bitmap object IMG is taken from the cached bitmap queue and text recognition is performed on it (a third-party library may be used) to obtain the text information in the bitmap object. Syntactic and semantic analysis is then performed to obtain the final text string. In this step, text recognition uses a preset optical character recognition algorithm. Optical character recognition (OCR) is the process of scanning the text in an image and then analyzing the image file to obtain the text and layout information; it can quickly convert the text in an image into a text string.
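The two-stage structure of step S2 (raw recognition followed by analysis) can be sketched as below. The OCR engine is passed in as a callable because the patent leaves the choice of library open. The cleanup in `normalize_ocr_text` is a deliberately simple stand-in for the syntactic and semantic analysis, which the patent does not specify, and all function names here are hypothetical.

```python
import re
from typing import Callable


def normalize_ocr_text(raw: str) -> str:
    """Light cleanup standing in for the grammar/semantic analysis step.

    Assumption: the real analysis is unspecified; this only repairs
    layout artifacts that would otherwise be read aloud oddly.
    """
    # Rejoin words hyphenated across line breaks. This also merges genuine
    # hyphenated compounds that happen to break at the hyphen.
    text = re.sub(r"-\s*\n\s*", "", raw)
    # Collapse line breaks and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def recognize_text(bitmap: object,
                   ocr_engine: Callable[[object], str]) -> str:
    """Run the preset OCR engine on a bitmap, then clean up its output."""
    raw = ocr_engine(bitmap)
    return normalize_ocr_text(raw)
```

Any OCR library can be plugged in by wrapping it in a function that takes the bitmap and returns a string.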
Step S3: convert the text content into a voice signal.
A preset speech recognition engine is called to convert the text string into the corresponding voice signal. The engine may be a third-party library; many mature solutions exist, and some operating systems even have a built-in module that converts text into voice. The voice signal is then output through the loudspeaker to play the sound.
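Steps S2 and S3 and the playback can be chained as in the sketch below. The three stages are injected as callables, so no particular OCR or speech library is assumed; `synthesize` corresponds to what the text calls the speech recognition engine, which in this role performs text-to-speech. The function name, the use of `bytes` for the voice signal, and the skip-on-empty behaviour are assumptions made for illustration.

```python
from typing import Callable, Optional


def read_region_aloud(bitmap: object,
                      recognize: Callable[[object], str],
                      synthesize: Callable[[str], bytes],
                      play: Callable[[bytes], None]) -> Optional[bytes]:
    """Convert a captured bitmap to text, then to a voice signal, and play it.

    Returns the voice signal, or None when no text was recognized
    (assumed behaviour; the patent does not cover this case).
    """
    text = recognize(bitmap)          # step S2: picture -> text string
    if not text.strip():
        return None                   # nothing to read aloud
    audio = synthesize(text)          # step S3: text string -> voice signal
    play(audio)                       # output through the loudspeaker
    return audio
```

Keeping the stages as parameters mirrors the text's remark that third-party or operating-system components may be used for each of them.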
Embodiment one combines speech recognition technology with optical character recognition technology so that text on the screen of an intelligent terminal is converted into voice and read aloud. After the screen picture is locked, the region to be converted is selected, and techniques such as intelligent screen capture, OCR and speech conversion are combined to output the text in a picture as voice. This is particularly suitable for visually impaired users, is simple to operate, and improves the user experience.
To further illustrate the method of embodiment one, two examples are described below:
Example one
The steps for converting on-screen text into voice and reading it aloud on a smartphone are as follows:
1. After the operating system starts, a background service process is initialized and runs an event listening module, ready to respond to the defined events at any time.
2. A control widget is created on the screen, for example a half-hidden sidebar that can be tapped to pull out the fully displayed button.
3. When the button is tapped, a mask layer with 90% transparency pops up on the screen, and touch events on the screen are monitored on the mask layer.
4. When a finger slides over a region of the screen, the finger-lift event is captured and the background conversion algorithm is called. It has five main steps:
1) compute a rectangular region from the delimited region;
2) capture the rectangular region and save it as a picture object;
3) perform text recognition on the picture object and filter the result into a text string;
4) perform speech conversion on the text string to convert it into sound;
5) call the system voice output unit to output the voice.
5. When processing is complete, the display of the touched region is restored and the next touch event is awaited.
6. Tapping Return exits the recognition state.
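The event flow in steps 3 to 6 can be sketched as a small state machine. This is an illustration only: the class and method names are hypothetical, no real UI toolkit is involved, and the conversion algorithm of step 4 is represented by a callback that receives the points of the gesture.

```python
from enum import Enum, auto
from typing import Callable, List, Tuple

Point = Tuple[int, int]


class State(Enum):
    IDLE = auto()       # widget visible, mask layer hidden
    SELECTING = auto()  # mask layer shown, touch events monitored


class RecognitionSession:
    """Tracks the mask-layer state and forwards finished gestures."""

    def __init__(self, on_region: Callable[[List[Point]], None]) -> None:
        self.state = State.IDLE
        self._on_region = on_region
        self._points: List[Point] = []

    def press_button(self) -> None:
        """Step 3: show the mask layer and start monitoring touches."""
        if self.state is State.IDLE:
            self.state = State.SELECTING

    def touch_move(self, x: int, y: int) -> None:
        """Record a point of the slide; ignored while the mask is hidden."""
        if self.state is State.SELECTING:
            self._points.append((x, y))

    def touch_up(self) -> None:
        """Step 4: on finger lift, hand the gesture to the conversion callback."""
        if self.state is State.SELECTING and self._points:
            points, self._points = self._points, []
            self._on_region(points)
        # Step 5: remain in SELECTING and wait for the next touch event.

    def press_return(self) -> None:
        """Step 6: leave the recognition state."""
        self._points = []
        self.state = State.IDLE
```

Because the session stays in the selecting state after each gesture, the user can have several regions read aloud in a row before tapping Return.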
Example two
The steps for converting on-screen text into voice and reading it aloud on a PC are as follows:
1. After the operating system starts, a background service process is initialized and runs an event listening module, ready to respond to the defined events at any time.
2. A control widget is created on the screen, for example a translucent toolbar floating in the lower right corner that can be clicked to pull out the fully displayed button.
3. When the button is clicked (or a shortcut such as Ctrl+Shift+C is used), a mask layer with 90% transparency pops up on the screen, and mouse drag-and-drop events on the screen are monitored on the mask layer.
4. When the mouse is dragged over a region of the current screen, the mouse-release event is captured and the background conversion algorithm is called. It has five main steps:
1) compute a rectangular region from the delimited region;
2) capture the rectangular region and save it as a picture object;
3) perform text recognition on the picture object and filter the result into a text string;
4) perform speech conversion on the text string to convert it into sound;
5) call the system voice output unit to output the voice.
5. When processing is complete, the display of the selected region is restored and the next mouse drag-and-drop selection event is awaited.
6. Clicking Return exits the recognition state.
Embodiment two
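Embodiment two of the present invention proposes a device for converting on-screen text into voice. The device of embodiment two may be the terminal itself, or a device built into or attached to the terminal. As shown in Fig. 2, the device of embodiment two comprises:
a picture capture module 10, configured to capture the picture content to be converted on the terminal screen;
a character recognition module 20, configured to convert the captured picture content into text content;
a sound conversion module 30, configured to convert the text content into a voice signal.
As shown in Fig. 3, the picture capture module 10 comprises:
a trigger unit 11, configured to pop up a mask layer after an event that triggers the conversion is received;
a selection unit 12, configured to delimit a picture region within the area of the mask layer;
a storage unit, configured to save the picture content within the picture region as a bitmap object.
As shown in Fig. 4, the character recognition module 20 comprises:
an acquiring unit 21, configured to obtain the text information in the bitmap object according to the bitmap object and a preset optical character recognition algorithm;
an analysis unit 22, configured to perform syntactic and semantic analysis on the text information to obtain a text string.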
In embodiment two, a trigger button may be set on the terminal screen in advance, for example a floating toolbar or a half-hidden sidebar, located on the topmost layer of the screen and used to trigger the conversion event. After the event is triggered, the trigger unit 11 pops up a transparent mask layer (its transparency can be set as required) on the topmost layer of the screen. The mask layer locks the current screen interface: the original interface no longer receives other user events, while the mask layer can receive operation events such as touch selection or mouse-drag selection.
When the user delimits a region on the mask layer, the selection unit 12 selects a picture region on the mask layer according to the user's operation. The picture region may be the area covered by a touch slide or the area enclosed by a selection box. The terminal computes a rectangular region from the extent of the gesture and records the width and height of the rectangle and the relative coordinates of its starting point.
The character recognition module 20 converts the captured picture content into text content. The acquiring unit 21 takes the captured bitmap object IMG from the cached bitmap queue and performs text recognition on it (a third-party library may be used) to obtain the text information in the bitmap object. The analysis unit 22 then performs syntactic and semantic analysis to obtain the final text string. The acquiring unit 21 may use a preset optical character recognition algorithm. Optical character recognition is the process of scanning the text in an image and then analyzing the image file to obtain the text and layout information; it can quickly convert the text in an image into a text string.
The sound conversion module 30 converts the text content into a voice signal. It calls a preset speech recognition engine to convert the text string into the corresponding voice signal. The engine may be a third-party library; many mature solutions exist, and some operating systems even have a built-in module that converts text into voice. The voice output module 40 then outputs the voice signal to play the sound.
The device of embodiment two combines speech recognition technology with optical character recognition technology so that text on the screen of an intelligent terminal is converted into voice and read aloud. After the screen picture is locked, the region to be converted is selected, and techniques such as intelligent screen capture, OCR and speech conversion are combined to output the text in a picture as voice. This is particularly suitable for visually impaired users, is simple to operate, and improves the user experience.
The foregoing are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A method for converting on-screen text into voice, characterized in that the method comprises:
capturing the picture content to be converted on the terminal screen;
converting the captured picture content into text content;
converting the text content into a voice signal.
2. The method of claim 1, characterized in that capturing the picture content to be converted on the terminal screen comprises:
popping up a mask layer after an event that triggers the conversion is received;
delimiting a picture region within the area of the mask layer;
saving the picture content within the picture region as a bitmap object.
3. The method of claim 2, characterized in that converting the captured picture content into text content comprises:
obtaining the text information in the bitmap object according to the bitmap object and a preset optical character recognition algorithm;
performing syntactic and semantic analysis on the text information to obtain a text string.
4. The method of claim 3, characterized in that converting the text content into a voice signal comprises:
generating the voice signal corresponding to the text string according to the text string and a preset speech recognition engine.
5. The method of any one of claims 1 to 4, characterized in that the method further comprises:
playing the voice signal.
6. A device for converting on-screen text into voice, characterized in that the device comprises:
a picture capture module, configured to capture the picture content to be converted on the terminal screen;
a character recognition module, configured to convert the captured picture content into text content;
a sound conversion module, configured to convert the text content into a voice signal.
7. The device of claim 6, characterized in that the picture capture module comprises:
a trigger unit, configured to pop up a mask layer after an event that triggers the conversion is received;
a selection unit, configured to delimit a picture region within the area of the mask layer;
a storage unit, configured to save the picture content within the picture region as a bitmap object.
8. The device of claim 7, characterized in that the character recognition module comprises:
an acquiring unit, configured to obtain the text information in the bitmap object according to the bitmap object and a preset optical character recognition algorithm;
an analysis unit, configured to perform syntactic and semantic analysis on the text information to obtain a text string.
9. The device of claim 8, characterized in that the sound conversion module is specifically configured to:
generate the voice signal corresponding to the text string according to the text string and a preset speech recognition engine.
10. The device of any one of claims 6 to 9, characterized in that the device further comprises:
a voice output module, configured to play the voice signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013103066970A CN103390159A (en) | 2013-07-19 | 2013-07-19 | Method and device for converting screen character into voice |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103390159A true CN103390159A (en) | 2013-11-13 |
Family
ID=49534426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2013103066970A Pending CN103390159A (en) | 2013-07-19 | 2013-07-19 | Method and device for converting screen character into voice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103390159A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6948937B2 (en) * | 2002-01-15 | 2005-09-27 | Tretiakoff Oleg B | Portable print reading device for the blind |
CN1960532A (en) * | 2006-11-08 | 2007-05-09 | 青岛海信移动通信技术股份有限公司 | Handset with function of reading aloud, and implementation method |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105989365A (en) * | 2015-01-30 | 2016-10-05 | 深圳市思路飞扬信息技术有限责任公司 | Vision assistant device, system and method |
CN108646975A (en) * | 2015-12-03 | 2018-10-12 | 广州阿里巴巴文学信息技术有限公司 | Information processing method and device |
CN105988709A (en) * | 2015-12-03 | 2016-10-05 | 广州阿里巴巴文学信息技术有限公司 | Information processing method and device |
CN105550233A (en) * | 2015-12-04 | 2016-05-04 | 广东欧珀移动通信有限公司 | Method and device for extracting characters from picture |
CN106022332B (en) * | 2016-04-15 | 2019-04-02 | 广州阿里巴巴文学信息技术有限公司 | Papery reading matter is switched to the device and method that reading matter to be listened plays by terminal device |
CN106022332A (en) * | 2016-04-15 | 2016-10-12 | 广州阿里巴巴文学信息技术有限公司 | Terminal device, and device and method of converting paper books into books to be listened for playing |
CN105955646A (en) * | 2016-04-25 | 2016-09-21 | 维沃移动通信有限公司 | Content processing method and intelligent terminal |
CN106325750A (en) * | 2016-08-26 | 2017-01-11 | 曹蕊 | Character recognition method and system applied in terminal equipment |
CN106815584A (en) * | 2017-01-19 | 2017-06-09 | 安徽声讯信息技术有限公司 | A kind of camera based on OCR technique is found a view picture conversion system manually |
CN106911959B (en) * | 2017-02-06 | 2020-01-14 | 深圳创维数字技术有限公司 | Voice picture reading method and system based on smart television |
CN106911959A (en) * | 2017-02-06 | 2017-06-30 | 深圳创维数字技术有限公司 | A kind of voice drawing reading and system based on intelligent television |
CN107633043A (en) * | 2017-09-14 | 2018-01-26 | 广东欧珀移动通信有限公司 | Picture based reminding method, device, terminal device and storage medium |
CN108182432A (en) * | 2017-12-28 | 2018-06-19 | 北京百度网讯科技有限公司 | Information processing method and device |
US10963760B2 (en) | 2017-12-28 | 2021-03-30 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for processing information |
CN109462689A (en) * | 2018-09-30 | 2019-03-12 | 深圳壹账通智能科技有限公司 | Voice broadcast method and device, electronic device and computer readable storage medium |
CN109462689B (en) * | 2018-09-30 | 2022-01-04 | 深圳壹账通智能科技有限公司 | Voice broadcasting method and device, electronic device and computer readable storage medium |
CN109933275A (en) * | 2019-02-12 | 2019-06-25 | 努比亚技术有限公司 | A kind of knowledge screen method, terminal and computer readable storage medium |
US10824790B1 (en) | 2019-05-28 | 2020-11-03 | Malcolm E. LeCounte | System and method of extracting information in an image containing file for enhanced utilization and presentation |
CN110674825A (en) * | 2019-09-27 | 2020-01-10 | 安徽咪鼠科技有限公司 | Character recognition method, device and system applied to intelligent voice mouse and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20131113 |