CN111461095A - Voice point reading method, device, equipment and readable medium - Google Patents

Voice point reading method, device, equipment and readable medium

Info

Publication number
CN111461095A
CN111461095A (application number CN201910054309.1A)
Authority
CN
China
Prior art keywords
voice
reading
read
content
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910054309.1A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201910054309.1A priority Critical patent/CN111461095A/en
Publication of CN111461095A publication Critical patent/CN111461095A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present disclosure disclose a voice point-reading method, apparatus, device, and readable medium. The method includes: if an operation of a user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained, acquiring an image of the current material to be point-read; identifying the content of the pointed area in the image according to the operation and the point-reading voice instruction; and broadcasting, by voice, a point-reading result of the identified content according to the point-reading voice instruction. This technical solution improves the convenience and real-time performance of point reading: point reading is triggered directly by the user's finger, so no specially configured point-reading pen is needed, the accuracy of point-read content identification is improved, and the user experience is enhanced.

Description

Voice point reading method, device, equipment and readable medium
Technical Field
The disclosed embodiments relate to computer processing technologies, and in particular, to a method, an apparatus, a device, and a readable medium for voice point reading.
Background
In existing point-reading devices, a dedicated point-reading pen is generally provided to satisfy diverse point-reading requirements: the pen is used to click the corresponding position on a book so that the text information at that position can be acquired and identified.
The user must therefore carry the dedicated point-reading pen at all times to interact with the text in a book; if the pen is lost, the point-reading device cannot identify the material in the book, which imposes a clear limitation on point reading.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a voice point-reading method, apparatus, device, and readable medium, which remove the prior-art requirement that material be point-read with a point-reading pen, reduce the limitations of information point reading, and improve its convenience.
In a first aspect, an embodiment of the present disclosure provides a voice point-reading method, where the method includes:
if an operation of the user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained, acquiring an image of the current material to be point-read;
identifying the content of the pointed area in the image of the current material to be point-read according to the operation and the point-reading voice instruction;
and broadcasting, by voice, a point-reading result of the identified content according to the point-reading voice instruction.
Further, identifying the content of the pointed area in the image of the current material to be point-read according to the operation and the point-reading voice instruction includes:
obtaining key feature points of the user's finger in the image of the current material to be point-read according to the operation;
determining the corresponding pointed area according to the point-reading voice instruction and the key feature points;
and identifying the content of the pointed area.
Further, identifying the content of the pointed area includes:
recognizing the text information of the pointed area by using an optical character recognition algorithm.
Further, broadcasting, by voice, a point-reading result of the identified content according to the point-reading voice instruction includes:
recognizing the point-reading voice instruction by using a natural language processing algorithm to obtain an instruction keyword;
determining a point-reading result of the identified pointed-area content according to the instruction keyword;
and broadcasting the point-reading result by voice.
Further, broadcasting the point-reading result by voice includes:
converting the point-reading result into a voice signal, and broadcasting it through a 3D doll displayed on the screen.
Further, the voice point-reading method includes:
if a voice interaction request of the user is obtained, determining the corresponding interaction information according to the voice interaction request;
and broadcasting the interaction information by voice through the 3D doll.
In a second aspect, an embodiment of the present disclosure provides a voice point-reading apparatus, where the apparatus includes:
an image acquisition module, configured to acquire an image of the current material to be point-read if an operation of the user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained;
a content identification module, configured to identify the content of the pointed area in the image of the current material to be point-read according to the operation and the point-reading voice instruction;
and a voice point-reading module, configured to broadcast, by voice, a point-reading result of the identified content according to the point-reading voice instruction.
Further, the content identification module includes:
a feature point acquisition unit, configured to obtain key feature points of the user's finger in the image of the current material to be point-read according to the operation;
an area determination unit, configured to determine the corresponding pointed area according to the point-reading voice instruction and the key feature points;
and a content identification unit, configured to identify the content of the pointed area.
Further, the content identification unit is specifically configured to:
recognize the text information of the pointed area by using an optical character recognition algorithm.
Further, the voice point-reading module includes:
a keyword determination unit, configured to recognize the point-reading voice instruction by using a natural language processing algorithm to obtain an instruction keyword;
a point-reading result determination unit, configured to determine a point-reading result of the identified pointed-area content according to the instruction keyword;
and a voice broadcast unit, configured to broadcast the point-reading result by voice.
Further, the voice broadcast unit is specifically configured to:
convert the point-reading result into a voice signal and broadcast it through a 3D doll displayed on the screen.
Further, the voice point-reading apparatus further includes:
an interaction information determination module, configured to determine the corresponding interaction information according to a voice interaction request of the user if such a request is obtained;
and a voice interaction module, configured to broadcast the interaction information by voice through the 3D doll.
In a third aspect, an embodiment of the present disclosure further provides a device, where the device includes:
one or more processors;
a memory for storing one or more programs;
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice point-reading method described in any embodiment of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides a readable medium on which a computer program is stored, where the computer program, when executed by a processor, implements the voice point-reading method described in any embodiment of the present disclosure.
According to the voice point-reading method, apparatus, device, and readable medium provided by the embodiments of the present disclosure, when an operation of the user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained, the pointed area indicated by the user's finger in the image of the current material is determined, the content in that area is identified, and a point-reading result of the identified content is broadcast by voice according to the point-reading voice instruction. This improves the convenience and real-time performance of point reading: point reading is triggered directly by the user's finger, no specially configured point-reading pen is needed, the accuracy of point-read content identification is improved, and the user experience is enhanced.
Drawings
To explain the embodiments of the present disclosure and the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly described below. The drawings described below show only some embodiments of the present disclosure; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 shows a flowchart of a voice point-reading method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic diagram illustrating how the content of the pointed area is identified in the voice point-reading process according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram illustrating voice broadcasting in the voice point-reading process according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a voice point-reading apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the present disclosure clearer, the technical solutions of the present disclosure are described clearly and completely below, through embodiments, with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure; all other embodiments derived from them by those skilled in the art without creative effort fall within the protection scope of the present disclosure.
Fig. 1 shows a flowchart of a voice point-reading method provided by an embodiment of the present disclosure. The method is applicable to point reading the contents of book material. It may be executed by the voice point-reading apparatus provided by the embodiments of the present disclosure; the apparatus can be implemented in software and/or hardware and is integrated into the device that executes the method, which in this embodiment may be an intelligent point-reading terminal.
Specifically, as shown in Fig. 1, the voice point-reading method provided by the embodiment of the present disclosure may include the following steps:
S110: if an operation of the user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained, acquire an image of the current material to be point-read.
The voice point-reading method in this embodiment is mainly applied to a point-reading device that reads out content in book material; the device needs no dedicated point-reading pen to realize the point-reading function. When the user point-reads book material with the device, the user first starts the camera pre-configured on the device, so that when the material to be point-read is placed in the device's capture area, it is previewed on the screen in real time and the user can watch it there. Meanwhile, the user can utter a corresponding point-reading voice; the device captures and analyzes it through a pre-configured voice collector and determines the user's current point-reading intention, so that the subsequent point-reading function can be carried out. The point-reading device in this embodiment is mainly used to assist students in learning, so the material to be point-read can be material from various teaching fields, such as books, courseware, or exercises.
When the user needs to point-read some content in a book, the user points a finger at the position of the material to be point-read, thereby determining the specific content to be point-read this time. In this embodiment, the user points directly at the corresponding position with a finger instead of pressing it with a specially configured point-reading pen. Here, the operation of pointing at the current material to be point-read refers to the user's finger pointing at the point-reading position in the material whose image is displayed on the screen; the point-reading voice instruction is a trigger instruction indicating that the content currently pointed at by the user's finger in the material needs to be point-read.
Optionally, when the user needs the point-reading device of this embodiment to read material in a book, the user may first turn on the camera configured on the device and place the material in its image capture area so that the material is previewed on the screen. When the user needs a specific piece of content read out, the user points a finger at the corresponding position of the material in the capture area and utters the corresponding point-reading voice. The device then detects the operation of the user pointing at the material on the screen and obtains the user's point-reading voice instruction through the pre-configured voice collector, which indicates that some content in the material needs to be point-read; the device therefore first acquires an image of the current material, so that the specific content to be read can subsequently be determined from the finger's pointing direction in that image.
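As a concrete illustration of this trigger flow, the following minimal sketch pairs a camera preview with a microphone listener and keeps the current frame only once a spoken instruction arrives. The library choices (OpenCV for the camera, the SpeechRecognition package for the voice collector) and all function names are assumptions for illustration, not the patented implementation.

```python
# Sketch of S110: capture a frame of the material once a point-reading
# voice instruction is heard. Library choices are assumptions.
import cv2
import speech_recognition as sr

def wait_for_pointing_trigger(camera_index=0):
    """Preview the material; return (frame, spoken_instruction)."""
    cap = cv2.VideoCapture(camera_index)
    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic)  # calibrate the voice collector
            while True:
                ok, frame = cap.read()  # real-time preview of the material
                if not ok:
                    continue
                cv2.imshow("preview", frame)
                cv2.waitKey(1)
                try:
                    # Short listening window so the preview keeps refreshing.
                    audio = recognizer.listen(mic, timeout=0.5, phrase_time_limit=3)
                    text = recognizer.recognize_google(audio, language="zh-CN")
                except (sr.WaitTimeoutError, sr.UnknownValueError):
                    continue
                return frame, text  # image of the material + instruction text
    finally:
        cap.release()
        cv2.destroyAllWindows()
```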
S120: identify the content of the pointed area in the image of the current material to be point-read according to the operation and the point-reading voice instruction.
The pointed area is the region of the material to be point-read, indicated by the user's finger, that contains the specific information to be read this time.
Specifically, after the image of the current material to be point-read is acquired, an image recognition technique is first used to analyze the fingertip direction corresponding to the user's pointing operation in the image, and the point-reading voice instruction is parsed to determine the type of content to be read this time. The content type is the category to which the information to be read belongs, for example a word, a sentence, or a paragraph; if the point-reading voice instruction is "please translate the word", the content type can be determined to be a word. The pointed area of this embodiment is then determined from the fingertip direction in the image and the content type contained in the point-reading voice instruction. Finally, to realize the point-reading function, once the pointed area in the image is determined, the text information it contains can be identified to obtain the specific content to be read this time.
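To make this step concrete, the sketch below selects the pointed area from OCR word boxes, using the fingertip position and the content type together: the nearest word box for a word request, the whole text line for a sentence request. It is a minimal sketch under assumptions; pytesseract stands in for whatever OCR component the device actually uses.

```python
# Sketch of S120: pick the pointed area from OCR word boxes using the
# fingertip position and the parsed content type. pytesseract is an
# assumed stand-in for the device's OCR component.
import pytesseract
from pytesseract import Output

def locate_pointed_region(image, fingertip_xy, content_type="word"):
    """Return the (x, y, w, h) box of the word nearest the fingertip,
    widened to the whole text line if a sentence was requested."""
    data = pytesseract.image_to_data(image, lang="chi_sim+eng",
                                     output_type=Output.DICT)
    fx, fy = fingertip_xy
    best_i, best_d = None, float("inf")
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue
        cx = data["left"][i] + data["width"][i] / 2
        cy = data["top"][i] + data["height"][i] / 2
        d = (cx - fx) ** 2 + (cy - fy) ** 2
        if d < best_d:
            best_i, best_d = i, d
    if best_i is None:
        return None
    if content_type == "word":
        return (data["left"][best_i], data["top"][best_i],
                data["width"][best_i], data["height"][best_i])
    # Sentence/paragraph request: merge all boxes on the same text line.
    key = (data["block_num"][best_i], data["par_num"][best_i],
           data["line_num"][best_i])
    boxes = [(data["left"][i], data["top"][i],
              data["left"][i] + data["width"][i],
              data["top"][i] + data["height"][i])
             for i, w in enumerate(data["text"])
             if w.strip() and (data["block_num"][i], data["par_num"][i],
                               data["line_num"][i]) == key]
    x1 = min(b[0] for b in boxes); y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes); y2 = max(b[3] for b in boxes)
    return x1, y1, x2 - x1, y2 - y1
```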
S130: broadcast, by voice, a point-reading result of the identified content according to the point-reading voice instruction.
Optionally, after the content of the pointed area in the image has been identified, the point-reading voice instruction is parsed to determine the user's point-reading intention this time, for example whether the instruction asks for a translation, a reading aloud, or the steps of a calculation or solution. According to the intention carried in the instruction, a point-reading result matching the identified pointed-area content can be looked up on a background server or cloud service, and the result is played back by voice broadcast. For example, if the instruction is "please translate the word", the translation of the word contained in the identified pointed area can be looked up and broadcast to the user by voice, realizing the point-reading function.
According to the technical solution provided by this embodiment of the present disclosure, when an operation of the user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained, the pointed area indicated by the user's finger in the image of the current material is determined, the content in that area is identified, and a point-reading result of the identified content is broadcast by voice according to the point-reading voice instruction. This improves the convenience and real-time performance of point reading: point reading is triggered directly by the user's finger, no specially configured point-reading pen is needed, the accuracy of point-read content identification is improved, and the user experience is enhanced.
Building on the technical solutions provided by the above embodiments, the processes of identifying the content of the pointed area and broadcasting the identified content by voice are each explained in detail below.
Fig. 2 is a schematic diagram illustrating how the content of the pointed area is identified in the voice point-reading process according to an embodiment of the present disclosure, refining the alternatives presented above. Specifically, this embodiment describes in detail how the content of the pointed area is identified from the image of the current material to be point-read.
Optionally, the method may specifically include the following steps:
s210, if the operation that the user points to the current to-be-read data is detected and the reading voice command of the user is obtained, the image of the current to-be-read data is obtained.
And S220, acquiring key characteristic points of the user fingers in the image of the current data to be read according to the operation.
Specifically, when the image of the current data to be read is acquired, the image may be identified through a corresponding image identification technology, so as to determine a key feature point 21 on the user finger corresponding to an operation of pointing the user finger to the current data to be read in the image, as shown in fig. 2, the key feature point 21 is a corresponding joint point on the user finger, so as to determine a pointing position of the finger in the image in the following.
And S230, determining a corresponding pointed area according to the point-reading voice command and the key characteristic point.
Specifically, when obtaining the key feature point 21 of the user finger in the image of the current data to be read, the position of the finger tip of the user finger may be determined according to the position of the key feature point 21, so as to determine the pointing position of the user finger in the image of the data to be read, and at the same time, the reading voice instruction is analyzed to obtain the type of the reading content contained in the reading voice instruction, that is, to determine which of the word, paragraph, or mathematical problem the reading content is, and then from the pointing position of the user finger, by identifying the text information at the pointing position, the pointed region 22 in the image corresponding to the type of the reading content is determined.
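One way to obtain the key feature points of the finger and derive the fingertip position is a hand-landmark model. The sketch below uses MediaPipe Hands as an assumed example; the patent does not name a specific keypoint detector, only some source of finger key feature points.

```python
# Sketch of S220/S230: detect finger joint keypoints and take the index
# fingertip as the pointing position. MediaPipe Hands is an assumption.
import cv2
import mediapipe as mp

def index_fingertip_position(bgr_image):
    """Return the (x, y) pixel position of the index fingertip, or None."""
    hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)
    try:
        rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
        result = hands.process(rgb)
        if not result.multi_hand_landmarks:
            return None
        landmarks = result.multi_hand_landmarks[0].landmark  # 21 joint keypoints
        tip = landmarks[mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP]
        h, w = bgr_image.shape[:2]
        return int(tip.x * w), int(tip.y * h)  # normalized -> pixel coordinates
    finally:
        hands.close()
```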
In addition, when the pointed area is determined from the pointing position of the user's finger, the user may also trace out a point-reading region on the material with the finger, and the traced region serves as the pointed area of this embodiment. To support this, when the operation of pointing at the current material is detected and the point-reading voice instruction is obtained, several consecutive frames of the material can be captured, the finger's pointing position in each frame analyzed, and the region traced by the finger across the frames determined and used as the corresponding pointed area; the content it contains is then identified to realize the point-reading function.
S240: identify the content of the pointed area.
Optionally, once the pointed area in the image of the current material to be point-read is determined, the text information it contains can be identified with a character recognition technique; for example, this embodiment may use an Optical Character Recognition (OCR) algorithm to recognize the text information of the pointed area. OCR is the process by which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper or displayed on a screen, determines their shapes from their patterns of dark and light, and translates the shapes into computer characters, so that text is recognized automatically. In this embodiment, the shape of each character in the pointed area is detected by the OCR algorithm, and the text information in the pointed area is thereby determined.
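For the recognition step itself, a minimal sketch with the open-source Tesseract engine (via pytesseract, an assumed choice; the patent only specifies "an optical character recognition algorithm") might look like this:

```python
# Sketch of S240: OCR the cropped pointed area. pytesseract is assumed.
import pytesseract

def recognize_region_text(image, region):
    """Crop the pointed area (x, y, w, h) and return its text."""
    x, y, w, h = region
    crop = image[y:y + h, x:x + w]  # numpy-style crop of the pointed area
    # chi_sim+eng covers the bilingual textbook material described above.
    return pytesseract.image_to_string(crop, lang="chi_sim+eng").strip()
```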
S250: broadcast, by voice, a point-reading result of the identified content according to the point-reading voice instruction.
In this technical solution, the pointed area is determined from the key feature points of the user's finger in the image of the current material to be point-read and from the point-reading voice instruction, the content in the area is identified, and the point-reading result of the identified content is then broadcast by voice according to the instruction. This improves the convenience and real-time performance of point reading: point reading is triggered directly by the user's finger, no specially configured point-reading pen is needed, and the user experience is enhanced.
Fig. 3 is a schematic diagram of voice broadcasting in the voice point-reading process provided by an embodiment of the present disclosure, refining the alternatives presented above. Specifically, this embodiment describes in detail how the point-reading result of the identified content is broadcast by voice according to the point-reading voice instruction.
Optionally, the method may specifically include the following steps:
s310, if the operation that the user points to the current to-be-read data is detected and the reading voice command of the user is obtained, the image of the current to-be-read data is obtained.
S320, according to the operation and the reading voice command, identifying the content of the pointed area in the image of the current material to be read.
S330, recognizing the point-reading voice instruction by adopting a natural language processing algorithm to obtain an instruction keyword.
The Natural L language Processing (N L P) algorithm is a theory and a method that can realize effective communication between a human and a computer in a Natural language, and can accurately analyze information contained in a user language.
Specifically, in this embodiment, when determining the corresponding click-to-read result, the N L P algorithm may be first used to identify the click-to-read voice instruction, so as to obtain an instruction keyword included in the click-to-read voice instruction, where the instruction keyword may clarify the click-to-read intention of the user in this time, and if the click-to-read voice instruction is "please translate the word", the obtained instruction keywords are "translate" and "word", so as to determine that this time click-to-read is to translate the word.
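As an illustration of the keyword step, the sketch below uses a small keyword table as a stand-in for a full NLP intent model; the vocabulary and function names are assumptions.

```python
# Sketch of S330: extract intent and content-type keywords from the
# recognized instruction text. The tiny keyword table is an illustrative
# assumption standing in for a full NLP pipeline.
INTENT_KEYWORDS = {
    "translate": ["translate", "翻译"],
    "read_aloud": ["read", "朗读"],
    "solve": ["solve", "calculate", "计算", "解题"],
}
CONTENT_KEYWORDS = {
    "word": ["word", "单词"],
    "sentence": ["sentence", "句子"],
    "problem": ["problem", "题"],
}

def parse_instruction(text):
    """Return (intent, content_type) keywords found in the instruction."""
    intent = next((k for k, ws in INTENT_KEYWORDS.items()
                   if any(w in text for w in ws)), None)
    content = next((k for k, ws in CONTENT_KEYWORDS.items()
                    if any(w in text for w in ws)), None)
    return intent, content

# parse_instruction("please translate the word") -> ("translate", "word")
```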
S340: determine a point-reading result of the identified pointed-area content according to the instruction keyword.
Optionally, once the instruction keywords contained in the point-reading voice instruction are obtained, the user's point-reading intention is fixed, and a point-reading result corresponding to the currently identified pointed-area content can be looked up on a background server or cloud service according to the keywords. If the keyword is "translate", the translation of the text information in the pointed area is retrieved. As another example, when a mathematical problem is point-read, the instruction may be "please solve this problem"; the problem the user's finger points at is determined, the corresponding solution steps are looked up on the background server or cloud service, and that process serves as the point-reading result.
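The keyword-to-result step can then be a simple dispatch from intent keyword to a backend lookup. In the sketch below the handler names and stubbed lookups are hypothetical, since the patent leaves the server interface unspecified.

```python
# Sketch of S340: map the instruction keyword to a backend lookup.
# lookup_translation / lookup_solution are hypothetical cloud calls,
# stubbed here so the sketch runs standalone.
def lookup_translation(text):
    return f"Translation of '{text}' (stub for a dictionary service)"

def lookup_solution(text):
    return f"Solution steps for '{text}' (stub for a solver service)"

def determine_reading_result(intent, region_text):
    handlers = {
        "translate": lookup_translation,
        "solve": lookup_solution,
        "read_aloud": lambda text: text,  # reading aloud just echoes the text
    }
    handler = handlers.get(intent)
    if handler is None:
        return "Sorry, I did not understand the point-reading request."
    return handler(region_text)
```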
S350: broadcast the point-reading result by voice.
Specifically, after the point-reading result of the identified pointed-area content is obtained, it can be broadcast by voice through a voice player pre-configured on the point-reading device. Illustratively, broadcasting the point-reading result by voice in this embodiment may specifically include: converting the point-reading result into a voice signal and broadcasting it through a 3D doll displayed on the screen.
Specifically, a 3D doll 31 may be designed into the point-reading device in advance; as shown in Fig. 3, the 3D doll 31 is displayed on the screen and performs the corresponding voice broadcast. Optionally, when a point-reading result is obtained, it is converted into a matching voice signal, which is broadcast through the 3D doll 31 on the screen. The 3D doll 31 can also analyze the emotion appropriate to the broadcast from the point-reading result, so that it plays the voice signal with a matching effect, such as a matching voice effect or a doll animation effect.
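The broadcast itself can be sketched with an offline text-to-speech engine; pyttsx3 is an assumed choice, and the doll's emotion/animation hook is reduced to a placeholder callback, since the patent does not specify the speech synthesizer or the doll renderer.

```python
# Sketch of S350: convert the point-reading result into a voice signal
# and play it. pyttsx3 is an assumed offline TTS engine; the 3D doll's
# animation is reduced to a hypothetical callback.
import pyttsx3

def broadcast_result(result_text, animate_doll=None):
    if animate_doll is not None:
        animate_doll(result_text)  # hypothetical hook: choose a doll effect
    engine = pyttsx3.init()
    engine.say(result_text)        # queue the voice signal
    engine.runAndWait()            # block until playback finishes
```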
S360: if a voice interaction request of the user is obtained, determine the corresponding interaction information according to the voice interaction request.
Optionally, the 3D doll of this embodiment can also carry out voice interaction with the user. By uttering an interaction voice, the user issues a voice interaction request that carries the information to be queried in this interaction; the interaction information corresponding to the request is then looked up so that it can be fed back to the user. For example, the request may be "please find a synonym of the last point-read word"; the point-reading device recognizes the request with the NLP algorithm, looks up a synonym of the specified word on the background server or cloud service, and then feeds it back to the user.
S370: broadcast the interaction information by voice through the 3D doll.
Specifically, after the corresponding interaction information is determined, it can be converted into a matching voice signal and broadcast through the 3D doll on the screen, realizing human-computer voice interaction on top of accurate point reading and diversifying the point-reading experience.
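The interaction path reuses the same building blocks. A minimal sketch, with the synonym lookup stubbed as a hypothetical cloud call:

```python
# Sketch of S360/S370: answer a voice interaction request through the doll.
# lookup_synonym is a hypothetical cloud call, stubbed for illustration.
def lookup_synonym(word):
    return f"Synonym of '{word}' (stub for a thesaurus service)"

def handle_interaction_request(request_text, last_point_read_word):
    if "synonym" in request_text or "近义词" in request_text:
        info = lookup_synonym(last_point_read_word)
    else:
        info = "Sorry, I cannot answer that yet."
    broadcast_result(info)  # reuse the 3D-doll broadcast sketch above
```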
In the technical solution provided by this embodiment of the present disclosure, the NLP algorithm extracts the instruction keywords from the point-reading voice instruction, the corresponding point-reading result is determined from the keywords, and the result is broadcast by voice through the 3D doll set on the screen. This diversifies the point-reading design, improves the convenience and real-time performance of point reading, adds variety to the information interaction through the 3D doll, and improves the user experience.
Fig. 4 is a schematic structural diagram of a voice point-reading apparatus provided by an embodiment of the present disclosure. The embodiment is applicable to point reading the contents of book material; the apparatus can be implemented in software and/or hardware and integrated into the device that executes the method. As shown in Fig. 4, the voice point-reading apparatus in the embodiment of the present disclosure may include:
an image acquisition module 410, configured to acquire an image of the current material to be point-read if an operation of the user pointing at the current material is detected and a point-reading voice instruction of the user is obtained;
a content identification module 420, configured to identify the content of the pointed area in the image of the current material according to the operation and the point-reading voice instruction;
and a voice point-reading module 430, configured to broadcast, by voice, a point-reading result of the identified content according to the point-reading voice instruction; a composition sketch of the three modules follows.
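Read together, the three modules compose into a small pipeline. The sketch below wires them in the order Fig. 4 implies, reusing the assumed sketches from the method embodiments above; all class and function names are illustrative assumptions.

```python
# Sketch of the Fig. 4 apparatus: the three modules wired together.
# Names are illustrative; the functions reused here come from the
# assumed sketches in the method embodiments above.
class VoicePointReadingApparatus:
    def point_read(self):
        # image acquisition module 410
        frame, instruction = wait_for_pointing_trigger()
        # content identification module 420
        intent, content_type = parse_instruction(instruction)
        fingertip = index_fingertip_position(frame)
        region = locate_pointed_region(frame, fingertip, content_type or "word")
        text = recognize_region_text(frame, region)
        # voice point-reading module 430
        result = determine_reading_result(intent, text)
        broadcast_result(result)
```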
According to the technical solution provided by this embodiment of the present disclosure, when an operation of the user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained, the pointed area indicated by the user's finger in the image of the current material is determined, the content in that area is identified, and a point-reading result of the identified content is broadcast by voice according to the point-reading voice instruction. This improves the convenience and real-time performance of point reading: point reading is triggered directly by the user's finger, no specially configured point-reading pen is needed, the accuracy of point-read content identification is improved, and the user experience is enhanced.
Further, the content identification module 420 may include:
a feature point acquisition unit, configured to obtain key feature points of the user's finger in the image of the current material to be point-read according to the operation;
an area determination unit, configured to determine the corresponding pointed area according to the point-reading voice instruction and the key feature points;
and a content identification unit, configured to identify the content of the pointed area.
Further, the content identification unit may be specifically configured to:
recognize the text information of the pointed area by using an optical character recognition algorithm.
Further, the voice point-reading module 430 may include:
a keyword determination unit, configured to recognize the point-reading voice instruction by using a natural language processing algorithm to obtain an instruction keyword;
a point-reading result determination unit, configured to determine a point-reading result of the identified pointed-area content according to the instruction keyword;
and a voice broadcast unit, configured to broadcast the point-reading result by voice.
Further, the voice broadcast unit may be specifically configured to:
convert the point-reading result into a voice signal and broadcast it through a 3D doll displayed on the screen.
Further, the voice point-reading apparatus may further include:
an interaction information determination module, configured to determine the corresponding interaction information according to a voice interaction request of the user if such a request is obtained;
and a voice interaction module, configured to broadcast the interaction information by voice through the 3D doll.
The voice point-reading apparatus provided by the embodiment of the present disclosure belongs to the same inventive concept as the voice point-reading method provided above; technical details not described in this embodiment can be found in the method embodiments, and the apparatus has the same beneficial effects as the method.
Referring now to Fig. 5, a block diagram of a device 500 suitable for implementing embodiments of the present disclosure is shown. Devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The device shown in Fig. 5 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 5, the device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. Various programs and data necessary for the operation of the device 500 are also stored in the RAM 503. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; output devices 507 including, for example, a liquid crystal display (LCD), speaker, or vibrator; storage devices 508 including, for example, magnetic tape or a hard disk; and communication devices 509. The communication devices 509 may allow the device 500 to communicate wirelessly or by wire with other devices to exchange data.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the apparatus; or may be separate and not incorporated into the device.
The computer readable medium carries one or more programs which, when executed by the device, cause the device to: if an operation of the user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained, acquire an image of the current material to be point-read; identify the content of the pointed area in the image of the current material to be point-read according to the operation and the point-reading voice instruction; and broadcast, by voice, a point-reading result of the identified content according to the point-reading voice instruction.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware, and the name of a unit does not in some cases constitute a limitation on the unit itself.
The foregoing description covers only the preferred embodiments of the present disclosure and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of features described above; it also covers other technical solutions formed by any combination of those features or their equivalents without departing from the concept of the disclosure, for example solutions in which the above features are interchanged with features disclosed in this disclosure (but not limited to them) that have similar functions.

Claims (10)

1. A voice point-reading method, comprising:
if an operation of a user pointing at the current material to be point-read is detected and a point-reading voice instruction of the user is obtained, acquiring an image of the current material to be point-read;
identifying the content of the pointed area in the image of the current material to be point-read according to the operation and the point-reading voice instruction;
and broadcasting, by voice, a point-reading result of the identified content according to the point-reading voice instruction.
2. The method of claim 1, wherein identifying the content of the pointed area in the image of the current material to be point-read according to the operation and the point-reading voice instruction comprises:
obtaining key feature points of the user's finger in the image of the current material to be point-read according to the operation;
determining the corresponding pointed area according to the point-reading voice instruction and the key feature points;
and identifying the content of the pointed area.
3. The method of claim 2, wherein identifying the content of the pointed area comprises:
recognizing the text information of the pointed area by using an optical character recognition algorithm.
4. The method of claim 1, wherein broadcasting, by voice, a point-reading result of the identified content according to the point-reading voice instruction comprises:
recognizing the point-reading voice instruction by using a natural language processing algorithm to obtain an instruction keyword;
determining a point-reading result of the identified pointed-area content according to the instruction keyword;
and broadcasting the point-reading result by voice.
5. The method of claim 4, wherein broadcasting the point-reading result by voice comprises:
converting the point-reading result into a voice signal, and broadcasting it through a 3D doll displayed on the screen.
6. The method of claim 5, further comprising:
if a voice interaction request of the user is obtained, determining the corresponding interaction information according to the voice interaction request;
and broadcasting the interaction information by voice through the 3D doll.
7. A voice point-reading apparatus, comprising:
an image acquisition module, configured to acquire an image of the current material to be point-read if an operation of the user pointing at the current material is detected and a point-reading voice instruction of the user is obtained;
a content identification module, configured to identify the content of the pointed area in the image of the current material according to the operation and the point-reading voice instruction;
and a voice point-reading module, configured to broadcast, by voice, a point-reading result of the identified content according to the point-reading voice instruction.
8. The apparatus of claim 7, wherein the content identification module comprises:
a feature point acquisition unit, configured to obtain key feature points of the user's finger in the image of the current material to be point-read according to the operation;
an area determination unit, configured to determine the corresponding pointed area according to the point-reading voice instruction and the key feature points;
and a content identification unit, configured to identify the content of the pointed area.
9. A device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice point-reading method of any one of claims 1-6.
10. A readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the voice point-reading method of any one of claims 1-6.
CN201910054309.1A 2019-01-21 2019-01-21 Voice point reading method, device, equipment and readable medium Pending CN111461095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910054309.1A CN111461095A (en) 2019-01-21 2019-01-21 Voice point reading method, device, equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910054309.1A CN111461095A (en) 2019-01-21 2019-01-21 Voice point reading method, device, equipment and readable medium

Publications (1)

Publication Number Publication Date
CN111461095A true CN111461095A (en) 2020-07-28

Family

ID=71682166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910054309.1A Pending CN111461095A (en) 2019-01-21 2019-01-21 Voice point reading method, device, equipment and readable medium

Country Status (1)

Country Link
CN (1) CN111461095A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090105531A (en) * 2008-04-03 2009-10-07 슬림디스크 주식회사 The method and divice which tell the recognized document image by camera sensor
CN101692681A (en) * 2009-09-17 2010-04-07 杭州聚贝软件科技有限公司 Method and system for realizing virtual image interactive interface on phone set terminal
CN104217197A (en) * 2014-08-27 2014-12-17 华南理工大学 Touch reading method and device based on visual gestures
CN108037882A (en) * 2017-11-29 2018-05-15 佛山市因诺威特科技有限公司 A kind of reading method and system
CN109063583A (en) * 2018-07-10 2018-12-21 广东小天才科技有限公司 A kind of learning method and electronic equipment based on read operation

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408438A (en) * 2021-06-23 2021-09-17 北京字节跳动网络技术有限公司 Control method and device of electronic equipment, terminal and storage medium

Similar Documents

Publication Publication Date Title
CN110069608B (en) Voice interaction method, device, equipment and computer storage medium
US10276154B2 (en) Processing natural language user inputs using context data
CN111341308B (en) Method and device for outputting information
CN110969012A (en) Text error correction method and device, storage medium and electronic equipment
CN111178056A (en) Deep learning based file generation method and device and electronic equipment
CN110706707B (en) Method, apparatus, device and computer-readable storage medium for voice interaction
US11622071B2 (en) Follow-up shooting method and device, medium and electronic device
CN112306235A (en) Gesture operation method, device, equipment and storage medium
US20240079002A1 (en) Minutes of meeting processing method and apparatus, device, and medium
CN111459443A (en) Character point-reading method, device, equipment and readable medium
CN111462548A (en) Paragraph point reading method, device, equipment and readable medium
CN111444321A (en) Question answering method, device, electronic equipment and storage medium
CN111460086A (en) Point reading marking method, device, equipment and readable medium
CN112309389A (en) Information interaction method and device
CN111461095A (en) Voice point reading method, device, equipment and readable medium
CN112069786A (en) Text information processing method and device, electronic equipment and medium
CN116629236A (en) Backlog extraction method, device, equipment and storage medium
US20240096347A1 (en) Method and apparatus for determining speech similarity, and program product
WO2021170094A1 (en) Method and device for information interaction
CN115171122A (en) Point reading processing method, device, equipment and medium
CN111435442B (en) Character selection method and device, point reading equipment, electronic equipment and storage medium
CN111459347A (en) Intelligent point reading method, device, equipment and readable medium
WO2021036823A1 (en) Text processing method and apparatus, and device and medium
CN116009682A (en) Interactive display method and device, electronic equipment and readable medium
CN112328308A (en) Method and device for recognizing text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination