CN109933198A - Semantic recognition method and device
Info
- Publication number
- CN109933198A (CN201910186422.5A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- information
- user
- initial
- target area
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- User Interface Of Digital Computer (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a semantic recognition method, comprising: collecting the voice information of a user; parsing the voice information to obtain initial semantic information; obtaining the gesture motion information of the user; obtaining the text information of a target area according to the gesture motion information; and obtaining the target semantics of the user according to the initial semantic information and the text information of the target area. The invention also discloses a semantic recognition device, comprising: a voice acquisition module for collecting the voice information of the user; a speech recognition module for parsing the voice information to obtain initial semantic information; an information acquisition module for obtaining the gesture motion information of the user and, according to it, the text information of the target area; and a control processing module for obtaining the target semantics of the user according to the initial semantic information and the text information of the target area. The invention can intelligently recognize the user's true intention even when the user cannot express it accurately by voice or does not know how to express it.
Description
Technical field
The present invention relates to the technical field of semantic recognition, and in particular to a semantic recognition method and device.
Background
With the rapid development of the Internet, intelligent products play an increasingly important role in people's lives, and people are increasingly accustomed to using intelligent terminals to meet their various needs. As artificial intelligence technologies mature, terminals of all kinds are becoming more intelligent. Voice interaction is one of the mainstream forms of human-computer interaction on intelligent terminals and is increasingly favored by users.
At present, many intelligent voice devices on the market recognize the voice input by the user and then take corresponding action, so the accuracy of the voice the user inputs strongly affects the feedback the intelligent terminal gives. Children in the lower grades, who are just starting to learn, may express themselves in incomplete sentences with unclear intent. This is a particular drawback for voice electronic products used by children: when a child meets a word or content that he or she does not recognize, the child cannot express it accurately by voice or does not know how to express it. The voice product is then limited when parsing semantics and struggles to recognize the user's true intention.
Summary of the invention
To overcome the above technical shortcomings, the present invention provides a semantic recognition method and device. The technical solution is as follows:
In one aspect, the present invention provides a semantic recognition method, comprising:
Collecting the voice information of a user;
Parsing the voice information to obtain initial semantic information;
Obtaining the gesture motion information of the user;
Obtaining the text information of a target area according to the gesture motion information;
Obtaining the target semantics of the user according to the initial semantic information and the text information of the target area.
Further, after the initial semantic information is obtained, the method also comprises:
Judging whether the initial semantic information is missing semantic information;
Obtaining the gesture motion information of the user when the initial semantic information is judged to be missing semantic information.
Further, judging whether the initial semantic information is missing semantic information comprises:
Judging whether the initial semantic information contains a preset prompt message and, if so, judging the initial semantic information to be missing semantic information.
Further, obtaining the target semantics of the user according to the initial semantic information and the text information of the target area comprises:
Generating a missing-semantics regular expression according to the initial semantic information;
Matching the missing-semantics regular expression against the text information of the target area;
Obtaining the target semantics of the user according to the successfully matched semantic regular expression.
Further, obtaining the text information of the target area according to the gesture motion information comprises:
Obtaining an image of the target area according to the gesture motion information;
Performing image processing on the image of the target area and recognizing the text information in the image of the target area.
In another aspect, the invention also discloses a semantic recognition device, comprising:
A voice acquisition module, for collecting the voice information of a user;
A speech recognition module, for parsing the voice information to obtain initial semantic information;
An information acquisition module, for obtaining the gesture motion information of the user and obtaining the text information of a target area according to the gesture motion information;
A control processing module, for obtaining the target semantics of the user according to the initial semantic information and the text information of the target area.
Further, the semantic recognition device also comprises:
A semantic judgment module, for judging, from the initial semantic information parsed by the speech recognition module, whether the initial semantic information is missing semantic information;
The information acquisition module is also used to obtain the gesture motion information of the user when the initial semantic information is judged to be missing semantic information.
Further, the semantic judgment module of the semantic recognition device comprises:
A lookup submodule, for checking whether the initial semantic information contains a preset prompt message;
A decision submodule, for judging the initial semantic information to be missing semantic information when the lookup submodule finds a preset prompt message in the initial semantic information.
Further, the control processing module of the semantic recognition device comprises:
An expression generation submodule, for generating a missing-semantics regular expression according to the initial semantic information;
A matching submodule, for matching the missing-semantics regular expression against the text information of the target area;
A semantic acquisition submodule, for obtaining the target semantics of the user according to the successfully matched semantic regular expression.
Further, the information acquisition module of the semantic recognition device comprises:
An image capture submodule, for obtaining the gesture motion information of the user and obtaining an image of the target area according to the gesture motion information;
An image processing submodule, for performing image processing on the image of the target area;
An image recognition submodule, for recognizing the text information in the image of the target area.
The present invention provides at least one of the following beneficial effects:
(1) The invention overcomes the limitations of voice-only input. On top of the voice input, it uses the gesture motion to obtain the text information of the target area, and combining the two yields the user's true semantics more accurately, so that the voice device can recognize the user's true intention even when the user cannot express it accurately by voice or does not know how to express it.
(2) After obtaining the user's voice information, the invention parses it to obtain initial semantic information and then judges whether that information is missing semantic information. Only when it is does the device trigger acquisition of the user's gesture motion information and then of the text information of the target area. Because gesture and text acquisition are triggered conditionally, power is saved: the auxiliary steps start only when the voice information is judged to be missing semantic information from which the complete user intention cannot be obtained. The two sources are then combined to understand the user's true intention, giving a higher degree of intelligence.
(3) Whether the initial semantic information is missing semantic information can be judged by checking whether it contains a preset prompt message. A preset prompt message is a simple, practical scheme that is easy to implement: the parsed initial semantic information only needs to be compared with the preset prompt message.
(4) When the user's voice information cannot fully express the actual intent (that is, the initial semantic information is missing semantic information), the invention obtains the user's gesture motion information, obtains an image of the target area the user points to, and performs text recognition on that image to obtain the text the gesture points to. Combining this text with the initial semantics of the earlier voice information yields the user's true intention. Image capture and recognition technology is relatively mature and fast, so feedback can be given quickly after the voice information is received, improving the user experience.
(5) The invention generates a missing-semantics regular expression from the initial semantic information obtained by parsing the user's voice information, and matches it against the text information of the target area obtained by the subsequent image recognition. A complete semantic sentence is obtained quickly, the user's true intention is known, and corresponding feedback can be given. Matching with semantic regular expressions is convenient and fast.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of embodiment one of the semantic recognition method of the present invention;
Fig. 2 is a flowchart of embodiment two of the semantic recognition method of the present invention;
Fig. 3 is a flowchart of embodiment three of the semantic recognition method of the present invention;
Fig. 4 is a flowchart of embodiment four of the semantic recognition method of the present invention;
Fig. 5 is a flowchart of embodiment five of the semantic recognition method of the present invention;
Fig. 6 is a flowchart of embodiment six of the semantic recognition method of the present invention;
Fig. 7 is a block diagram of embodiment seven of the semantic recognition device of the present invention;
Fig. 8 is a block diagram of embodiment eight of the semantic recognition device of the present invention;
Fig. 9 is a block diagram of embodiment nine of the semantic recognition device of the present invention.
Reference numerals:
10 -- voice acquisition module; 20 -- speech recognition module; 30 -- information acquisition module; 31 -- image capture submodule; 32 -- image processing submodule; 33 -- image recognition submodule; 40 -- control processing module; 41 -- expression generation submodule; 42 -- matching submodule; 43 -- semantic acquisition submodule; 50 -- semantic judgment module; 51 -- lookup submodule; 52 -- decision submodule.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
Embodiment one
The present invention provides a semantic recognition method. Embodiment one, as shown in Fig. 1, comprises:
S101 Collect the voice information of the user;
Specifically, the user's voice information can be collected by a microphone or another voice acquisition device. It may be voice that the user inputs in real time, and it is not necessarily complete: it may be a complete utterance or only part of one.
S102 Parse the voice information to obtain initial semantic information;
After the voice information is obtained, it is parsed to obtain the basic semantics it is meant to express. Parsing voice information into semantics can be done with various existing techniques. The invention is not limited to any particular parsing scheme, and parsing is not where the improvement of the invention lies, so it is not described in detail here.
S103 Obtain the gesture motion information of the user;
Specifically, the gesture motion information can be obtained by capturing an image of the user's gesture with a camera, or by sensing the gesture with another sensing device.
S104 Obtain the text information of the target area according to the gesture motion information;
After the gesture motion information is obtained, the text information of the target area can be obtained according to the gesture, for example the specific homework question in a book that the user's finger points to. To obtain the text, a camera can capture an image of the learning area the user points to, such as the image of a question, and the image is then processed and recognized to obtain the text of the question.
S105 Obtain the target semantics of the user according to the initial semantic information and the text information of the target area.
The initial semantic information obtained by parsing the voice information is fused with the text information of the target area obtained afterwards to obtain the user's true intention, so that a corresponding response can be given.
This embodiment overcomes the limitations of voice-only input: on top of the voice input, the gesture motion is used to obtain the text information of the target area, and combining the two yields the user's true semantics more accurately.
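Steps S101 to S105 can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the class name Recognizer, its four injected callables and the stub values in the usage example are all hypothetical, standing in for whatever speech parser, camera and text recognizer a real device would use.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Recognizer:
    """Hypothetical wiring of steps S102 to S105; every stage is injected."""
    parse_speech: Callable[[bytes], str]                 # S102: audio -> initial semantics
    capture_gesture: Callable[[], Tuple[int, int]]       # S103: -> fingertip position
    read_region_text: Callable[[Tuple[int, int]], str]   # S104: fingertip -> region text
    combine: Callable[[str, str], str]                   # S105: fuse both sources

    def recognize(self, audio: bytes) -> str:
        # S101 is assumed to have happened already: `audio` is the collected voice.
        initial = self.parse_speech(audio)
        fingertip = self.capture_gesture()
        region_text = self.read_region_text(fingertip)
        return self.combine(initial, region_text)


# Usage with stub stages (all values are made up for illustration).
demo = Recognizer(
    parse_speech=lambda audio: audio.decode("utf-8"),
    capture_gesture=lambda: (120, 80),
    read_region_text=lambda fingertip: "grapes",
    combine=lambda initial, text: f"{initial} [{text}]",
)
result = demo.recognize(b"how are these two words read")
```

Injecting the stages keeps the sketch independent of any particular speech or image library, in line with the statement that the invention is not limited to a particular parsing scheme.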
It is worth noting that the user's gesture motion information can be captured by a camera or sensed by another sensing device. When a camera is used, several processing approaches are possible. For example, the camera can stay on while the voice device is in use, recording video of the user and the user's learning area. When the user's voice information is collected, the recorded video frames are extracted according to the time of collection to obtain the user's gesture image, the gesture is recognized, and the image of the specific question in the learning area that the user points to is obtained. Text recognition is then performed on the question image to obtain the corresponding text information, which is combined with the initial semantics of the earlier voice information to obtain the user's true semantics and give a corresponding response. Alternatively, the camera of the intelligent voice device need not stay on: it can be switched on to capture the user's gesture and the image of the corresponding learning area only after the user's voice information is collected, and return to its dormant state after shooting.
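For the always-on camera variant, extracting frames "according to the time of collection" can be sketched with a bounded buffer of timestamped frames. The sketch below is an assumption about one way to do this: the FrameBuffer class, its max_frames parameter and the nearest-timestamp rule are hypothetical and are not specified in the text.

```python
from collections import deque


class FrameBuffer:
    """Keeps the most recent camera frames with their capture times (hypothetical helper)."""

    def __init__(self, max_frames=300):
        # A bounded deque drops the oldest frame automatically when full.
        self._frames = deque(maxlen=max_frames)

    def add(self, timestamp, frame):
        self._frames.append((timestamp, frame))

    def nearest(self, speech_time):
        """Return the frame captured closest to the moment the voice was collected."""
        if not self._frames:
            return None
        _, frame = min(self._frames, key=lambda item: abs(item[0] - speech_time))
        return frame


# Usage: frames are plain strings here; a real device would store image data.
buffer = FrameBuffer(max_frames=2)
buffer.add(0.0, "frame-0")
buffer.add(0.5, "frame-1")
buffer.add(1.0, "frame-2")   # "frame-0" is evicted because the buffer holds two frames
chosen = buffer.nearest(0.6)
```

The bounded buffer limits memory use while the camera stays on; the wake-on-voice variant described above avoids the buffer entirely at the cost of starting the camera after the utterance.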
Embodiment two
Embodiment two of the present invention, as shown in Fig. 2, comprises:
S201 Collect the voice information of the user;
S202 Parse the voice information to obtain initial semantic information;
S203 Judge whether the initial semantic information is missing semantic information;
Specifically, missing semantic information is information from which the complete semantics and the user's intention cannot be obtained. In other words, the initial semantics obtained from the collected voice information alone do not fully convey the user's intention, and another auxiliary means is needed to obtain the user's true semantics.
S204 Obtain the gesture motion information of the user when the initial semantic information is judged to be missing semantic information;
S205 Obtain the text information of the target area according to the gesture motion information;
S206 Obtain the target semantics of the user according to the initial semantic information and the text information of the target area.
Compared with embodiment one, this embodiment adds a step after the initial semantic information is obtained: judging whether it is missing semantic information. The result decides whether acquisition of the user's gesture motion information is triggered. The gesture motion information is obtained only when the initial semantics are judged to be missing semantic information, so its acquisition is triggered conditionally, which saves power.
For example, a user reading with the help of an intelligent voice device sees the sentence "I like grapes" but does not recognize the two characters for "grapes" and wants help. The user says "How are these two words read?" The device collects the voice and, after parsing, learns that the user does not know how two words are read, but the voice information does not say which two words, so the initial semantic information is judged to be missing semantic information. The camera is then started to capture the user's gesture and takes an image of the user's finger pointing at the two characters for "grapes". Image processing and text recognition on that area yield the text information "I like grapes". Combined with the earlier semantics "How are these two words read?", the user's true semantics are that the user does not know how to read "grapes", and the device can respond: "Master, these two words are read 'grapes': I like grapes." In this way the intelligent voice device helps the user with words that he or she does not recognize and assists in reading books.
Similarly, suppose the user sees "I like grapes", does not recognize the two characters for "grapes", and wants help. The user says "I like --". The device collects the voice, and parsing gives the semantics "I like --"; the object after "like" is not expressed by voice, so the initial semantic information is judged to be missing semantic information. The camera is then started to capture the user's gesture and takes an image of the user's finger pointing at the two characters for "grapes". Image processing and text recognition on that area yield the text information "I like grapes". Combined with the earlier semantics "I like --", the user's true semantics are that the user does not know how to read "grapes", and the device can respond: "Master, I like grapes." In this way the user can read aloud only the words that he or she recognizes and point to the ones that he or she does not; the intelligent voice device understands the user's intention, helps the user learn the unrecognized words, and supports reading comprehension.
Embodiment three
The semantic recognition method of this embodiment, as shown in Fig. 3, comprises:
S301 Collect the voice information of the user;
S302 Parse the voice information to obtain initial semantic information;
S303 Judge whether the initial semantic information is missing semantic information; if so, go to step S305, otherwise go to step S304;
S304 Obtain the target semantics of the user according to the initial semantic information;
S305 Obtain the gesture motion information of the user;
S306 Obtain the text information of the target area according to the gesture motion information;
S307 Obtain the target semantics of the user according to the initial semantic information and the text information of the target area.
This embodiment handles the initial semantic information differently depending on whether it is missing semantic information. When the initial semantic information obtained by parsing the voice information is missing semantic information, acquisition of the user's gesture motion information is triggered, the text information of the target area is obtained, and the two are combined to obtain the user's target semantics. When it is not, gesture acquisition is not triggered, and the user's true semantics are obtained directly from the initial semantics of the voice information. For example, suppose the collected voice information is "What is the English word for elephant?" Since this voice information fully states the user's true intention and its semantics are complete, the initial semantic information is not missing semantic information, and the user's target semantics (wanting to know the English word for elephant) can be obtained from it directly. There is no need to trigger capture of the user's gesture image; the true intention is obtained directly from the parsed initial semantic information, and a response can be given promptly: the English word for elephant is "elephant".
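The branch at S303 can be sketched as follows. This is a minimal illustration under stated assumptions: the Camera class is a stand-in that only counts how often it is woken, the is_lacking predicate is supplied by the caller, and the string returned for S307 is a placeholder for the real combination step.

```python
class Camera:
    """Stand-in for a camera that stays dormant until needed (hypothetical)."""

    def __init__(self, region_text):
        self.region_text = region_text
        self.wake_count = 0

    def capture_region_text(self):
        # S305-S306: wake up, capture the gesture, read the pointed-at text.
        self.wake_count += 1
        return self.region_text


def target_semantics(initial, is_lacking, camera):
    if not is_lacking(initial):
        # S304: the utterance is complete, so the camera is never woken.
        return initial
    region_text = camera.capture_region_text()
    # S307: placeholder combination of both sources.
    return f"{initial} -> {region_text}"


# Usage: a toy predicate that treats an unresolved "these two words" as missing.
lacking = lambda text: "these two words" in text
camera = Camera("grapes")
complete = target_semantics("What is the English word for elephant?", lacking, camera)
wakes_after_complete = camera.wake_count
resolved = target_semantics("How are these two words read?", lacking, camera)
```

Counting wake-ups makes the power argument concrete: the complete question leaves the counter at zero, and only the incomplete one triggers a capture.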
Embodiment four
The semantic recognition method of this embodiment, as shown in Fig. 4, comprises:
S401 Collect the voice information of the user;
S402 Parse the voice information to obtain initial semantic information;
S403 Judge whether the initial semantic information contains a preset prompt message; if so, go to step S404;
S404 Judge the initial semantic information to be missing semantic information;
S405 Obtain the gesture motion information of the user when the initial semantic information is judged to be missing semantic information;
S406 Obtain the text information of the target area according to the gesture motion information;
S407 Obtain the target semantics of the user according to the initial semantic information and the text information of the target area.
For example, if the preset prompt message is "Little Genius, question!", then as soon as the collected voice information contains "Little Genius, question!", the initial semantic information is judged to be missing semantic information. This triggers acquisition of the subsequent gesture motion information and then of the text information of the target area, which help identify the user's target semantics. One or more preset prompt messages can be set.
A preset prompt message is a simple, practical scheme that is easy to implement: the parsed initial semantic information only needs to be compared with the preset prompt message.
Of course, whether the initial semantic information is missing semantic information can also be judged in more intelligent ways. For example, after the initial semantic information is obtained by parsing the voice information, the device judges from its semantics whether the voice information fully expresses the user's true intention. Suppose the parsed initial semantic information is "How is this word read?" Because it is not known which word "this word" refers to, the device cannot tell which word's pronunciation the user wants and so cannot respond. Since the device cannot give a corresponding response from the voice information alone, this initial semantic information is missing semantic information: what "this word" refers to is missing, and the user's gesture motion is needed to help identify it.
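Both ways of judging missing semantic information can be sketched with plain string checks. The prompt string and the list of deictic phrases below are example values chosen for illustration; the text does not specify how the "more intelligent" judgment works, so the phrase lookup is only an assumed stand-in for it.

```python
# Example values only; a real device would configure its own prompts.
PRESET_PROMPTS = ("Little Genius, question!",)
# Assumed stand-in for the "more intelligent" check: phrases that point at
# something the voice alone does not identify.
DEICTIC_PHRASES = ("this word", "these two words")


def contains_preset_prompt(initial_semantics, prompts=PRESET_PROMPTS):
    """S403: compare the parsed text with each preset prompt message."""
    text = initial_semantics.casefold()
    return any(prompt.casefold() in text for prompt in prompts)


def has_unresolved_reference(initial_semantics, phrases=DEICTIC_PHRASES):
    """Heuristic: the utterance refers to a word it never names."""
    text = initial_semantics.casefold()
    return any(phrase in text for phrase in phrases)


def is_missing_semantic_information(initial_semantics):
    return (contains_preset_prompt(initial_semantics)
            or has_unresolved_reference(initial_semantics))
```

Case-insensitive substring comparison is a design choice of this sketch; exact matching or a trained classifier would fit the description equally well.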
Embodiment five
The semantic recognition method of this embodiment, as shown in Fig. 5, comprises:
S501 Collect the voice information of the user;
S502 Parse the voice information to obtain initial semantic information;
S503 Judge whether the initial semantic information is missing semantic information;
S504 Obtain the gesture motion information of the user when the initial semantic information is judged to be missing semantic information;
S505 Obtain the text information of the target area according to the gesture motion information;
S506 Generate a missing-semantics regular expression according to the initial semantic information;
S507 Match the missing-semantics regular expression against the text information of the target area;
S508 Obtain the target semantics of the user according to the successfully matched semantic regular expression.
This embodiment refines how the user's target semantics are obtained from the gesture motion information and the initial semantic information. After the initial semantic information and the text information of the target area are obtained, a missing-semantics regular expression is first generated from the initial semantic information. For example, if the collected voice information is "How are these two words read?", its initial semantics do not say which two words are meant; that part is missing, so the missing-semantics regular expression "How is XX read?" is generated. Then, from the text information "grapes" of the target area the user's finger points to, the user's target semantics are obtained: "How is grapes read?", that is, the pronunciation of "grapes".
Similarly, the user may not ask a question such as "How are these two words read?" but instead read aloud only the words that he or she recognizes, leave out the unrecognized ones, and point to them by gesture to ask for help. For example, the user sees "I like grapes", does not recognize "grapes", and wants help. The user says "I like --". The device collects the voice, and parsing gives the semantics "I like --"; the object after "like" is not expressed by voice, so the initial semantic information is judged to be missing semantic information. A missing-semantics regular expression is first generated from the initial semantic information: "I like XX". Then, from the captured image of the target area the user's gesture points to, the text information of the target area is recognized as "I like grapes". This text is matched against the missing-semantics regular expression "I like XX", giving the complete sentence "I like grapes". The device thereby tells the user that the word after "I like" is "grapes", and can output by voice: "I like grapes."
Specifically, for example, a user studies with the help of an intelligent voice device. After collecting the user's voice information, the device synchronously starts its camera and captures the user's finger-tap action during study. After the action is used to determine the learning area that raised the question, the text in the area is recognized and its intent analyzed. Combined with the semantics of the earlier voice input, the semantic slot in the missing-semantics regular-expression sentence is matched and filled with the result of the text parsing to obtain the true semantics, giving the user's true intention in an ambiguous scenario.
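Steps S506 to S508 can be sketched with Python's re module. The sketch assumes the template marks the missing part with the literal placeholder XX, as in the examples above; the function names are hypothetical, and the fallback (using the whole region text as the slot value when the spoken context does not appear in it) is an interpretation of the "How is XX read?" example rather than something the text states.

```python
import re

SLOT = "XX"  # placeholder used in the examples above


def missing_semantic_regex(template):
    """S506: turn a template such as 'I like XX' into a regex with a named slot."""
    before, _, after = template.partition(SLOT)
    return re.compile(re.escape(before) + r"(?P<slot>.+?)" + re.escape(after) + r"$")


def fill_template(template, region_text):
    """S507-S508: match the region text, then fill the slot to get the target semantics."""
    match = missing_semantic_regex(template).search(region_text)
    # Assumed fallback: if the region text lacks the spoken context,
    # treat the whole region text as the slot value.
    slot_value = match.group("slot") if match else region_text
    return template.replace(SLOT, slot_value)


# Usage with the two examples from this embodiment.
sentence = fill_template("I like XX", "I like grapes")
question = fill_template("How is XX read?", "grapes")
```

Escaping the fixed parts of the template with re.escape keeps punctuation such as "?" from being read as regex syntax.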
Embodiment six
The voice messaging of S601 acquisition user;
S602 parses the voice messaging, obtains initial semantic information;
S603 obtains the gesture motion information of the user;
S604 obtains the image of target area according to the gesture motion information;
S605 carries out image procossing to the image of the target area, identifies the text letter in the image of the target area
Breath;
For S606 according to the initial semantic information and the text information of target area, the target for obtaining the user is semantic.
This embodiment details how the text information of the target area is obtained from the user's gesture motion information. Specifically, for example, a camera captures an image of the user's gesture, and an image of the specific location (the target area) the user points to is obtained from the direction of the gesture. Image processing is then performed on that image to recognize the text information of the target area.
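The flow of steps S601 to S606 can be outlined in code. The following is only an illustrative sketch: the names `Recognizers`, `recognize_semantics`, and the four callables are hypothetical placeholders for the speech parser, gesture locator, text recognizer, and fusion logic, none of which the text specifies.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Recognizers:
    """Hypothetical pluggable back ends; the text names no concrete engines."""
    parse_speech: Callable[[bytes], str]          # S602: audio -> initial semantics
    locate_target: Callable[[object], Box]        # S603/S604: gesture frame -> target area
    recognize_text: Callable[[object, Box], str]  # S605: image + area -> text
    fuse: Callable[[str, str], str]               # S606: initial semantics + text -> target


def recognize_semantics(audio: bytes, frame: object, r: Recognizers) -> str:
    initial = r.parse_speech(audio)            # S601-S602
    box = r.locate_target(frame)               # S603-S604
    area_text = r.recognize_text(frame, box)   # S605
    return r.fuse(initial, area_text)          # S606


# Stub back ends, used only to show how the steps chain together.
stub = Recognizers(
    parse_speech=lambda audio: "how is this word read",
    locate_target=lambda frame: (10, 20, 110, 60),
    recognize_text=lambda frame, box: "perturbed",
    fuse=lambda initial, text: f"{initial}: {text}",
)
result = recognize_semantics(b"", None, stub)
```

Passing the back ends in as callables keeps the step order visible while leaving the choice of speech and image engines open.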
Embodiment seven
Based on the same technical concept, the invention also discloses a semantic recognition device, which can use the semantic recognition method of the invention to identify the true semantics of a user. Specifically, as shown in Fig. 7, embodiment seven of the invention comprises:
a voice acquisition module 10, configured to acquire the voice information of a user;
a speech recognition module 20, configured to parse the voice information to obtain initial semantic information;
an information acquisition module 30, configured to acquire gesture motion information of the user and to acquire the text information of a target area according to the gesture motion information;
a control processing module 40, configured to obtain the target semantics of the user according to the initial semantic information and the text information of the target area.
In this embodiment, the voice acquisition module 10 acquires the user's voice information, and the speech recognition module 20 then parses it to obtain the initial semantics. The information acquisition module 30 obtains the user's gesture motion, determines the target area, acquires image information of the target area, and parses the image to obtain the corresponding text information. Finally, the control processing module 40 obtains the target semantics of the user from the initial semantic information obtained by the speech recognition module 20 and the text information of the target area obtained by the information acquisition module 30. Specifically, for example, a user studies with the help of an intelligent voice device. The device acquires the user's voice information while its camera captures the user's finger-pointing action during study. After the action identifies the study area that prompted the question, the text in that area is recognized and its intent parsed; combined with the semantics of the preceding voice input, this yields the true semantics, that is, the user's real intention in that scenario.
This embodiment fuses the initial semantic information obtained earlier by parsing the voice information with the text information of the target area obtained afterwards; together they reveal the user's real intention, so that a corresponding response can be given. The semantic recognition device of this embodiment overcomes the shortcomings of voice-only input: on top of the voice input, it uses the gesture motion to obtain the text information of the target area, and the combination captures the user's true semantics more accurately.
In addition, the information acquisition module 30 may obtain the user's gesture motion information by, for example, shooting with a camera. The camera may remain on for as long as the voice device (with the built-in semantic recognition device) is in use, recording video of the user and the user's study area. When the user's voice information is collected, the recorded video frames are extracted according to the time point of collection to obtain an image of the user's gesture motion; the gesture is then recognized to obtain an image of the specific question in the study area that the user points to. Text recognition is performed on the question image to obtain the corresponding text information, which is combined with the initial semantics of the earlier voice information to obtain the user's true semantics and give a corresponding response. Of course, the camera of the intelligent voice device need not stay on all the time: it may be turned on only after the user's voice information has been collected, in order to shoot the user's gesture motion and the image of the corresponding study area, and return to its dormant state once shooting is finished.
Embodiment eight
This embodiment adds a semantic judgment module 50 on the basis of embodiment seven. Specifically, as shown in Fig. 8, the semantic recognition device of the invention further comprises:
a semantic judgment module 50, configured to judge, from the initial semantic information produced by the parsing of the speech recognition module 20, whether the initial semantic information is missing-semantics information;
the information acquisition module 30 being further configured to acquire the gesture motion information of the user when the initial semantic information is determined to be missing-semantics information.
Compared with embodiment seven, this embodiment adds the semantic judgment module 50. The semantic judgment module 50 judges whether the initial semantics are missing-semantics information, and this decides whether the information acquisition module 30 is triggered to acquire the user's gesture motion information. Only when the semantic judgment module 50 determines that the initial semantics are missing-semantics information does the information acquisition module 30 acquire the user's gesture motion information. This added step makes the acquisition of gesture motion information conditional, which saves power.
Specifically, for example, a user reads with the help of an intelligent voice device (with the built-in semantic recognition device) and comes across "I am very perturbed" without recognizing the two words "perturbed". Wanting help, the user says "How are these two words read?". The voice acquisition module 10 collects the utterance, and after parsing by the speech recognition module 20 it is understood that the user does not know how to pronounce two words; which two words, however, cannot be determined from the voice information alone. The semantic judgment module 50 therefore judges from the initial semantic information that the semantics are missing. The camera of the information acquisition module 30 is then started to shoot the user's gesture motion and captures an image of the user's finger pointing at the two words "perturbed". Image processing and text recognition on that area yield the text information "I am very perturbed". The control processing module 40 combines this with the user's earlier semantics, "How are these two words read?", and concludes that the user's true semantics are that the user does not know how to read the two words "perturbed". The corresponding response can then be given: "Owner, these two words are read 'perturbed': I am very perturbed." In this way, the intelligent voice device helps the user with unfamiliar words while reading books.
Of course, the user may express the same meaning in other ways. For example, the user again comes across "I am very perturbed" without recognizing the two words "perturbed", and this time says "I am very --". The intelligent voice device (with the built-in semantic recognition device) collects the utterance, and after parsing the user's semantics are "I am very --". These initial semantics clearly fail to express the user's intention completely, so the initial semantic information is judged to be missing semantics. The camera is then started to shoot the user's gesture motion and captures an image of the user's finger pointing at the passage "I am very perturbed". Image processing and text recognition on that area yield the text information "I am very perturbed". Combined with the user's earlier semantics, "I am very --", this shows that the user's true semantics are that the user does not know how to read the two words "perturbed", and the corresponding response can be given: "Owner, these two words are read 'perturbed': I am very perturbed." In this way, the user can read aloud only the words the user recognizes and simply point at the unfamiliar ones; the intelligent voice device understands the user's intention and helps the user learn the unfamiliar words, which aids reading comprehension.
Preferably, the control processing module 40 in the semantic recognition device of the invention is further configured to obtain the target semantics of the user from the initial semantic information when the semantic judgment module 50 determines that the initial semantic information is not missing-semantics information.
When the semantic judgment module 50 finds that the initial semantic information is not missing-semantics information, the information acquisition module 30 need not be triggered to acquire the user's gesture motion information; the user's true semantics can be obtained directly from the initial semantics produced by parsing the user's voice information. For example, the collected voice information is "How much is 14+25?". Because the collected voice information fully states the user's real intention and the semantics are complete, the initial semantic information obtained by parsing it is not missing-semantics information, and the user's target semantics, namely wanting to know the result of 14+25, can be obtained directly from the initial semantic information. There is therefore no need to trigger the acquisition of a gesture motion image; the user's real intention is obtained directly from the initial semantic information, and the corresponding response can be given promptly: 14+25 equals 39.
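The conditional triggering described in this embodiment can be summarized in a short sketch. The function `resolve_target_semantics` and its three callables are hypothetical stand-ins for the semantic judgment module 50, the information acquisition module 30, and the control processing module 40; the point illustrated is only that the camera path is skipped for complete requests.

```python
from typing import Callable, List


def resolve_target_semantics(
    initial: str,
    is_missing: Callable[[str], bool],
    read_pointed_text: Callable[[], str],
    fuse: Callable[[str, str], str],
) -> str:
    """Consult the camera only when the spoken request is incomplete."""
    if not is_missing(initial):
        # Complete request: the initial semantics already are the target semantics.
        return initial
    # Incomplete request: acquire the gesture, read the pointed-at text, then fuse.
    return fuse(initial, read_pointed_text())


camera_calls: List[str] = []


def fake_camera() -> str:
    # Stand-in for gesture capture plus text recognition; records each invocation.
    camera_calls.append("camera")
    return "perturbed"


complete = resolve_target_semantics(
    "how much is 14+25", lambda s: False, fake_camera, lambda a, b: f"{a} -> {b}"
)
calls_after_complete = len(camera_calls)  # the camera was not triggered

incomplete = resolve_target_semantics(
    "how are these two words read", lambda s: True, fake_camera, lambda a, b: f"{a} -> {b}"
)
```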
Embodiment nine
This embodiment refines the semantic judgment module 50 on the basis of embodiment eight above. Specifically, as shown in Fig. 9, the semantic judgment module 50 comprises:
a search submodule 51, configured to search the initial semantic information for preset prompt information;
a decision submodule 52, configured to determine that the initial semantic information is missing-semantics information when the search submodule 51 finds that the initial semantic information contains preset prompt information.
Specifically, for example, the preset prompt information is "Hello, small talent!". That is, when the user cannot fully express the true semantics by voice alone, the user only needs to say "Hello, small talent" first, and may then either continue speaking or stop speaking and indicate the rest directly by gesture. After the voice acquisition module 10 collects the user's voice information "Hello, small talent! How is this word read?", the speech recognition module 20 recognizes it and obtains the initial semantics. The search submodule 51 of the semantic judgment module 50 then searches the initial semantics for the prompt "Hello, small talent". Once it is found, the decision submodule 52 determines that the initial semantic information is missing-semantics information, so the information acquisition module 30 must be triggered to acquire the subsequent gesture motion information and then the text information of the target area, which assists in identifying the user's target semantics. One or more preset prompts may be configured. The semantic judgment module 50 of this embodiment is simple, convenient, and easy to implement.
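The lookup performed by the search submodule 51 and the decision made by the decision submodule 52 amount to a substring test, sketched below. The prompt list, the function names, and the case-insensitive comparison are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical preset list; the text notes that one or more prompts may be configured.
PRESET_PROMPTS: Tuple[str, ...] = ("hello, small talent",)


def contains_preset_prompt(
    initial_semantics: str, prompts: Iterable[str] = PRESET_PROMPTS
) -> bool:
    """Role of search submodule 51: look for any preset prompt in the parsed utterance."""
    text = initial_semantics.casefold()
    return any(p.casefold() in text for p in prompts)


def is_missing_semantics(initial_semantics: str) -> bool:
    """Role of decision submodule 52: a found prompt marks the utterance as missing."""
    return contains_preset_prompt(initial_semantics)


flagged = is_missing_semantics("Hello, small talent! How is this word read?")
not_flagged = is_missing_semantics("How much is 14+25?")
```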
Of course, whether the initial semantic information is missing-semantics information can also be judged in other, more intelligent ways. For example, after the speech recognition module 20 parses the voice information and obtains the initial semantic information, the semantic judgment module 50 judges from the semantics of the initial semantic information whether the voice information fully expresses the user's real intention. Suppose the initial semantic information after parsing is "How is this word read?". Because it is not known which word "this word" refers to, the device cannot tell which word's pronunciation the user wants to know and therefore cannot give a response. When the device cannot give a corresponding response from the user's voice information in this way, the initial semantic information is missing-semantics information: what "this word" refers to is the missing part, and the user's gesture motion is needed to assist recognition.
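The text does not say how this more intelligent judgment is made. As a deliberately crude stand-in, the sketch below flags an utterance as incomplete when it contains a demonstrative phrase with no spoken referent or breaks off with a dash. The pattern and the function name `looks_incomplete` are assumptions, not the method of the invention.

```python
import re

# Hypothetical pattern: demonstrative phrases whose referent is not spoken aloud,
# such as "this word" or "these two words".
_UNRESOLVED = re.compile(
    r"\b(this|that|these|those)\s+(?:\w+\s+)?(word|words|character|characters|sentence)\b",
    re.IGNORECASE,
)


def looks_incomplete(initial_semantics: str) -> bool:
    text = initial_semantics.strip()
    # A trailing dash models an utterance that breaks off, such as "I am very --".
    if text.endswith("-"):
        return True
    return _UNRESOLVED.search(text) is not None


examples = {
    "How is this word read": looks_incomplete("How is this word read"),
    "How are these two words read": looks_incomplete("How are these two words read"),
    "I am very --": looks_incomplete("I am very --"),
    "How much is 14+25?": looks_incomplete("How much is 14+25?"),
}
```

A production system would more plausibly rely on a trained language-understanding model; the heuristic only makes the idea of an unresolved reference concrete.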
Preferably, as shown in Fig. 9, this embodiment expands on the control processing module 40 of the semantic recognition device on the basis of any of the above device embodiments. The control processing module 40 comprises:
an expression generation submodule 41, configured to generate a missing-semantics regular expression according to the initial semantic information;
a matching submodule 42, configured to match the missing-semantics regular expression according to the text information of the target area;
a semantic acquisition submodule 43, configured to obtain the target semantics of the user according to the successfully matched semantic regular expression.
This embodiment elaborates on the control processing module 40. Specifically, after the speech recognition module 20 obtains the initial semantic information and the information acquisition module 30 obtains the text information of the target area, the expression generation submodule 41 of the control processing module 40 first generates a missing-semantics regular expression from the initial semantic information. For example, the collected voice information is "How are these two words read?". From the initial semantics of this voice information it is not yet known which two words are meant; they are the missing part, so the missing-semantics regular expression "How is XX read" is generated. The matching submodule 42 then takes the text information "perturbed" of the target area that the user's finger points to and matches it into the missing-semantics regular expression, giving "How is perturbed read", so that the semantic acquisition submodule 43 can obtain the user's target semantics: the pronunciation of "perturbed".
Likewise, the user may not ask a question-type missing-semantics utterance such as "How are these two words read?", but instead read aloud only the words the user recognizes, leave the unfamiliar words unread, and point at them to ask for help. For example, the user comes across "I am very perturbed" without recognizing the two words "perturbed". Wanting help, the user says "I am very --". The intelligent voice device collects the utterance, and after parsing the user's semantics are "I am very --". These initial semantics clearly fail to express the user's intention completely and the sentence is incomplete, so the initial semantic information is judged to be missing semantics. The missing-semantics regular expression "I am very XX" is first generated from the initial semantic information. Then, from the captured image of the target area the user's gesture points to, the text information of that target area is recognized as "I am very perturbed". This text is matched against the previously generated missing-semantics regular expression "I am very XX", yielding the complete sentence "I am very perturbed". The user is thereby told that the word following "I am very --" is "perturbed", and the device can respond by voice: "Perturbed. I am very perturbed."
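The two matching directions described for the missing-semantics regular expression can be sketched with Python's `re` module. This is one possible reading, assuming a template contains a single `XX` slot; the function names are hypothetical and the text does not specify how the expression is actually built.

```python
import re
from typing import Optional

SLOT = "XX"  # placeholder for the missing part, written as in the text


def build_missing_regex(template: str) -> "re.Pattern[str]":
    """Turn a template such as 'I am very XX' into a regex with one named slot."""
    before, _, after = template.partition(SLOT)
    return re.compile(
        re.escape(before) + r"(?P<slot>.+?)" + re.escape(after) + r"[.!?]?$"
    )


def fill_slot(template: str, area_text: str) -> Optional[str]:
    """Return the part of the area text that fills the slot, or None if no match."""
    match = build_missing_regex(template).match(area_text.strip())
    return match.group("slot") if match else None


def complete(template: str, filler: str) -> str:
    """Substitute recognized text directly into the slot of a question template."""
    return template.replace(SLOT, filler, 1)


# The pointed-at text is a whole sentence: extract the unread word.
word = fill_slot("I am very XX", "I am very perturbed")

# The pointed-at text is the word itself: complete the question.
question = complete("How is XX read", "perturbed")
```

`re.escape` keeps punctuation in the spoken part from being interpreted as regex syntax, and the lazy group stops at optional sentence-final punctuation.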
Preferably, as shown in Fig. 9, the information acquisition module 30 in the semantic recognition device of any of the above embodiments comprises:
an image capture submodule 31, configured to acquire the gesture motion information of the user and to acquire an image of the target area according to the gesture motion information;
an image processing submodule 32, configured to perform image processing on the image of the target area;
an image recognition submodule 33, configured to recognize the text information in the image of the target area.
This embodiment refines the information acquisition module 30. Specifically, for example, the image capture submodule 31, such as a camera, captures an image of the user's gesture, and an image of the specific location (the target area) the user points to is obtained from the direction of the gesture. The image processing submodule 32 then performs image processing on the image of that specific location (the target area), and the image recognition submodule 33 recognizes the text information of the target area.
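How the target area is derived from the pointing gesture is not specified. One simple possibility, shown purely as an assumption, is to take a fixed-size box just above the detected fingertip and clip it to the image. Fingertip detection itself is not shown, and the function name, default box size, and top-left pixel origin are all illustrative choices.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), origin at top-left


def target_box(
    fingertip: Tuple[int, int],
    image_size: Tuple[int, int],
    width: int = 200,
    height: int = 60,
) -> Box:
    """Box of the given size sitting just above the fingertip, clipped to the image."""
    x, y = fingertip
    img_w, img_h = image_size
    left = max(0, x - width // 2)
    right = min(img_w, x + width // 2)
    top = max(0, y - height)
    bottom = min(img_h, y)
    return left, top, right, bottom


centre_box = target_box((320, 240), (640, 480))
corner_box = target_box((50, 30), (640, 480))  # clipped at the top-left corner
```

The resulting box would be passed to the image processing and text recognition steps; a real device would likely adapt the box size to the detected line height.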
The semantic recognition device of the invention can be built into all kinds of smart devices, for example a voice device that assists a user's study. The voice device acquires the user's voice information and obtains the user's initial semantics, then starts its camera and captures the user's finger-pointing action during study. After the action identifies the study area that prompted the question, the text in that area is recognized and its intent parsed. Combined with the initial semantics of the earlier voice input, the semantic slot in the missing-semantics regular expression is matched and filled with the result of text parsing, giving the true semantics, that is, the user's real intention in an ambiguous scenario.
Although preferred embodiments of the invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and variations to the invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to include them.
Claims (10)
1. A semantic recognition method, characterized by comprising:
acquiring voice information of a user;
parsing the voice information to obtain initial semantic information;
acquiring gesture motion information of the user;
acquiring text information of a target area according to the gesture motion information;
obtaining target semantics of the user according to the initial semantic information and the text information of the target area.
2. The semantic recognition method according to claim 1, characterized in that, after the initial semantic information is obtained, the method further comprises:
judging whether the initial semantic information is missing-semantics information;
acquiring the gesture motion information of the user when the initial semantic information is judged to be missing-semantics information.
3. The semantic recognition method according to claim 2, characterized in that judging whether the initial semantic information is missing-semantics information comprises:
judging whether the initial semantic information contains preset prompt information and, if so, judging that the initial semantic information is missing-semantics information.
4. The semantic recognition method according to claim 1, characterized in that obtaining the target semantics of the user according to the initial semantic information and the text information of the target area comprises:
generating a missing-semantics regular expression according to the initial semantic information;
matching the missing-semantics regular expression according to the text information of the target area;
obtaining the target semantics of the user according to the successfully matched semantic regular expression.
5. The semantic recognition method according to any one of claims 1 to 4, characterized in that acquiring the text information of the target area according to the gesture motion information comprises:
acquiring an image of the target area according to the gesture motion information;
performing image processing on the image of the target area and recognizing the text information in the image of the target area.
6. A semantic recognition device, characterized by comprising:
a voice acquisition module, configured to acquire voice information of a user;
a speech recognition module, configured to parse the voice information to obtain initial semantic information;
an information acquisition module, configured to acquire gesture motion information of the user and to acquire text information of a target area according to the gesture motion information;
a control processing module, configured to obtain target semantics of the user according to the initial semantic information and the text information of the target area.
7. The semantic recognition device according to claim 6, characterized by further comprising:
a semantic judgment module, configured to judge, from the initial semantic information produced by the parsing of the speech recognition module, whether the initial semantic information is missing-semantics information;
the information acquisition module being further configured to acquire the gesture motion information of the user when the initial semantic information is determined to be missing-semantics information.
8. The semantic recognition device according to claim 7, characterized in that the semantic judgment module comprises:
a search submodule, configured to search the initial semantic information for preset prompt information;
a decision submodule, configured to determine that the initial semantic information is missing-semantics information when the search submodule finds that the initial semantic information contains preset prompt information.
9. The semantic recognition device according to claim 6, characterized in that the control processing module comprises:
an expression generation submodule, configured to generate a missing-semantics regular expression according to the initial semantic information;
a matching submodule, configured to match the missing-semantics regular expression according to the text information of the target area;
a semantic acquisition submodule, configured to obtain the target semantics of the user according to the successfully matched semantic regular expression.
10. The semantic recognition device according to any one of claims 6 to 9, characterized in that the information acquisition module comprises:
an image capture submodule, configured to acquire the gesture motion information of the user and to acquire an image of the target area according to the gesture motion information;
an image processing submodule, configured to perform image processing on the image of the target area;
an image recognition submodule, configured to recognize the text information in the image of the target area.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910186422.5A CN109933198B (en) | 2019-03-13 | 2019-03-13 | Semantic recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109933198A (en) | 2019-06-25 |
CN109933198B (en) | 2022-04-05 |
Family
ID=66986980
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910186422.5A Active CN109933198B (en) | 2019-03-13 | 2019-03-13 | Semantic recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109933198B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160351068A1 (en) * | 2014-08-27 | 2016-12-01 | South China University Of Technology | Finger reading method and device based on visual gestures |
CN106933783A (en) * | 2015-12-31 | 2017-07-07 | 远光软件股份有限公司 | A kind of method and device on the intelligent extraction date from text |
JP2018041155A (en) * | 2016-09-05 | 2018-03-15 | 株式会社野村総合研究所 | Voice order reception system |
CN109192204A (en) * | 2018-08-31 | 2019-01-11 | 广东小天才科技有限公司 | A kind of sound control method and smart machine based on smart machine camera |
CN109344231A (en) * | 2018-10-31 | 2019-02-15 | 广东小天才科技有限公司 | A kind of method and system of the semantic incomplete corpus of completion |
CN109343817A (en) * | 2018-09-10 | 2019-02-15 | 阿里巴巴集团控股有限公司 | The application method and device and electronic equipment of Self-Service |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110767219A (en) * | 2019-09-17 | 2020-02-07 | 中国第一汽车股份有限公司 | Semantic updating method, device, server and storage medium |
CN110767219B (en) * | 2019-09-17 | 2021-12-28 | 中国第一汽车股份有限公司 | Semantic updating method, device, server and storage medium |
CN112309387A (en) * | 2020-02-26 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Method and apparatus for processing information |
CN111324206A (en) * | 2020-02-28 | 2020-06-23 | 重庆百事得大牛机器人有限公司 | Gesture interaction-based confirmation information identification system and method |
CN111353034A (en) * | 2020-02-28 | 2020-06-30 | 重庆百事得大牛机器人有限公司 | Legal fact correction system and method based on gesture collection |
CN111324206B (en) * | 2020-02-28 | 2023-07-18 | 重庆百事得大牛机器人有限公司 | System and method for identifying confirmation information based on gesture interaction |
CN111881691A (en) * | 2020-06-15 | 2020-11-03 | 惠州市德赛西威汽车电子股份有限公司 | System and method for enhancing vehicle-mounted semantic analysis by utilizing gestures |
CN112863508A (en) * | 2020-12-31 | 2021-05-28 | 思必驰科技股份有限公司 | Wake-up-free interaction method and device |
Also Published As
Publication number | Publication date |
---|---|
CN109933198B (en) | 2022-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109933198A (en) | A kind of method for recognizing semantics and device | |
CN107665708B (en) | Intelligent voice interaction method and system | |
KR102081925B1 (en) | display device and speech search method thereof | |
US5544050A (en) | Sign language learning system and method | |
CN104050160B (en) | Interpreter's method and apparatus that a kind of machine is blended with human translation | |
CN107632980A (en) | Voice translation method and device, the device for voiced translation | |
CN115082602B (en) | Method for generating digital person, training method, training device, training equipment and training medium for model | |
CN110781328A (en) | Video generation method, system, device and storage medium based on voice recognition | |
CN109215638B (en) | Voice learning method and device, voice equipment and storage medium | |
CN109817203B (en) | Voice interaction method and system | |
CN104090968B (en) | The method and apparatus that a kind of intelligent information is pushed | |
JP2017016296A (en) | Image display device | |
CN108877334A (en) | A kind of voice searches topic method and electronic equipment | |
CN111046148A (en) | Intelligent interaction system and intelligent customer service robot | |
JP2023062173A (en) | Video generation method and apparatus of the same, and neural network training method and apparatus of the same | |
CN109108989A (en) | A kind of legal services special purpose robot of semantics recognition | |
CN113837907A (en) | Man-machine interaction system and method for English teaching | |
CN109063182A (en) | A kind of content recommendation method and electronic equipment for searching topic based on voice | |
CN113542797A (en) | Interaction method and device in video playing and computer readable storage medium | |
CN111223014B (en) | Method and system for online generation of subdivision scene teaching courses from a large number of subdivision teaching contents | |
CN109871128B (en) | Question type identification method and device | |
CN115602167A (en) | Display device and voice recognition method | |
CN112261321B (en) | Subtitle processing method and device and electronic equipment | |
CN115269961A (en) | Content search method and related device | |
CN108694939A (en) | Phonetic search optimization method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||