CN109933198A

CN109933198A - Semantic recognition method and device

Info

Publication number: CN109933198A
Application number: CN201910186422.5A
Authority: CN
Inventors: 魏誉荧
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2019-03-13
Filing date: 2019-03-13
Publication date: 2019-06-25
Anticipated expiration: 2039-03-13
Also published as: CN109933198B

Abstract

The invention discloses a semantic recognition method, comprising: acquiring voice information of a user; parsing the voice information to obtain initial semantic information; obtaining gesture motion information of the user; obtaining text information of a target area according to the gesture motion information; and obtaining the target semantics of the user according to the initial semantic information and the text information of the target area. The invention also discloses a semantic recognition device, comprising: a voice acquisition module for acquiring the voice information of the user; a speech recognition module for parsing the voice information to obtain the initial semantic information; a data obtaining module for obtaining the gesture motion information of the user and obtaining the text information of the target area according to the gesture motion information; and a control processing module for obtaining the target semantics of the user according to the initial semantic information and the text information of the target area. The invention can intelligently recognize the true intention of the user even when the user cannot express it accurately by voice or does not know how to express it.

Description

Semantic recognition method and device

Technical field

The present invention relates to the technical field of semantic recognition, and in particular to a semantic recognition method and device.

Background

With the rapid development of the Internet, smart products play an increasingly important role in people's lives, and people increasingly rely on smart terminals to meet their needs. As artificial intelligence technologies mature, terminals of all kinds are also becoming more intelligent. Voice interaction, one of the mainstream forms of human-computer interaction on smart terminals, is increasingly favored by users.

At present, many smart voice devices on the market recognize the voice input by the user and then act on it, so the accuracy of the user's voice input strongly affects the feedback given by the smart terminal. Children in the lower grades of primary school are only beginning to learn, so their spoken statements may be incomplete and their intent unclear. This is a particular drawback of voice products used by children: when a child meets a character or passage that he or she does not recognize, the child cannot express it accurately by voice or does not know how to express it. The voice product is then limited when parsing the semantics, and it is difficult for it to recognize the user's true intention.

Summary of the invention

To overcome the above technical deficiencies, the present invention provides a semantic recognition method and device. Specifically, the technical solution is as follows:

In one aspect, the present invention provides a semantic recognition method, comprising:

acquiring voice information of a user;

parsing the voice information to obtain initial semantic information;

obtaining gesture motion information of the user;

obtaining text information of a target area according to the gesture motion information;

obtaining the target semantics of the user according to the initial semantic information and the text information of the target area.

Further, after the initial semantic information is obtained, the method further comprises:

judging whether the initial semantic information is missing semantic information;

when the initial semantic information is judged to be missing semantic information, obtaining the gesture motion information of the user.

Further, judging whether the initial semantic information is missing semantic information comprises:

judging whether the initial semantic information contains preset prompt information, and if so, judging that the initial semantic information is missing semantic information.

Further, obtaining the target semantics of the user according to the initial semantic information and the text information of the target area comprises:

generating a missing-semantics regular expression according to the initial semantic information;

matching the missing-semantics regular expression against the text information of the target area;

obtaining the target semantics of the user according to the successfully matched semantic regular expression.

Further, obtaining the text information of the target area according to the gesture motion information comprises:

obtaining an image of the target area according to the gesture motion information;

performing image processing on the image of the target area and recognizing the text information in the image of the target area.

In another aspect, the present invention also discloses a semantic recognition device, comprising:

a voice acquisition module for acquiring voice information of a user;

a speech recognition module for parsing the voice information to obtain initial semantic information;

a data obtaining module for obtaining gesture motion information of the user and obtaining text information of a target area according to the gesture motion information;

a control processing module for obtaining the target semantics of the user according to the initial semantic information and the text information of the target area;

a semantic judgment module for judging, after the speech recognition module has parsed the voice information, whether the initial semantic information is missing semantic information.

The data obtaining module is further configured to obtain the gesture motion information of the user when the initial semantic information is determined to be missing semantic information.

Further, the semantic judgment module of the semantic recognition device of the present invention comprises:

a lookup submodule for checking whether the initial semantic information contains preset prompt information;

a decision submodule for determining that the initial semantic information is missing semantic information when the lookup submodule finds preset prompt information in the initial semantic information.

Further, the control processing module of the semantic recognition device of the present invention comprises:

an expression generation submodule for generating a missing-semantics regular expression according to the initial semantic information;

a matching submodule for matching the missing-semantics regular expression against the text information of the target area;

a semantic acquisition submodule for obtaining the target semantics of the user according to the successfully matched semantic regular expression.

Further, the data obtaining module of the semantic recognition device of the present invention comprises:

an image capture submodule for obtaining the gesture motion information of the user and obtaining an image of the target area according to the gesture motion information;

an image processing submodule for performing image processing on the image of the target area;

an image recognition submodule for recognizing the text information in the image of the target area.

The present invention has at least one of the following beneficial effects:

(1) The present invention overcomes the shortcomings of voice-only input. On top of the voice input, it uses the gesture motion to obtain the text information of the target area; combining the two yields the user's true semantics more accurately, so that the voice device can recognize the user's true intention even when the user cannot express it accurately by voice or does not know how to express it.

(2) After acquiring the user's voice information, the present invention parses it to obtain the initial semantic information and then judges whether the initial semantic information is missing semantic information. Only after the semantic information is judged to be missing semantic information is the acquisition of the user's gesture motion information triggered, followed by the acquisition of the text information of the target area. Because the gesture motion information and the text information are acquired only conditionally, power consumption is greatly reduced: the subsequent operations are started to assist parsing only when the voice information is judged to be missing semantic information from which the complete user intention cannot be obtained. Combining the two to understand the user's true intention makes the device more intelligent.

(3) To judge whether the initial semantic information is missing semantic information, the present invention can check whether it contains preset prompt information. Preset prompt information is a simple and practical scheme that is easy to implement: the parsed initial semantic information only needs to be compared with the preset prompt information.

(4) When the user's voice information cannot fully express the user's actual wishes (that is, the initial semantic information is missing semantic information), the present invention obtains the user's gesture motion information, obtains an image of the target area the user points to according to the gesture motion, and performs text recognition on that image to obtain the text the user's gesture points to. This text is combined with the initial semantics contained in the earlier voice information to obtain the user's true intention. Image acquisition and recognition technologies are relatively mature and fast, so feedback can be given quickly after the user's voice information is acquired, which improves the user experience.

(5) The present invention generates a missing-semantics regular expression according to the initial semantic information obtained by parsing the user's voice information, and matches it against the text information of the target area obtained by the subsequent image recognition. A complete semantic sentence can thus be obtained quickly, so that the user's true intention is known and appropriate feedback can be given. The semantic regular expression matching scheme is convenient and fast.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of embodiment one of the semantic recognition method of the present invention;

Fig. 2 is a flow chart of embodiment two of the semantic recognition method of the present invention;

Fig. 3 is a flow chart of embodiment three of the semantic recognition method of the present invention;

Fig. 4 is a flow chart of embodiment four of the semantic recognition method of the present invention;

Fig. 5 is a flow chart of embodiment five of the semantic recognition method of the present invention;

Fig. 6 is a flow chart of embodiment six of the semantic recognition method of the present invention;

Fig. 7 is a block diagram of embodiment seven of the semantic recognition device of the present invention;

Fig. 8 is a block diagram of embodiment eight of the semantic recognition device of the present invention;

Fig. 9 is a block diagram of embodiment nine of the semantic recognition device of the present invention.

Reference numerals:

10: voice acquisition module; 20: speech recognition module; 30: data obtaining module; 31: image capture submodule; 32: image processing submodule; 33: image recognition submodule; 40: control processing module; 41: expression generation submodule; 42: matching submodule; 43: semantic acquisition submodule; 50: semantic judgment module; 51: lookup submodule; 52: decision submodule.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Embodiment one

The present invention provides a semantic recognition method. Embodiment one, as shown in Fig. 1, comprises:

S101: acquire the voice information of the user;

Specifically, the voice information of the user can be acquired by a microphone or another voice acquisition device. The voice information can be speech input by the user in real time. It is not necessarily complete: it may be a complete utterance or only part of one.

S102: parse the voice information to obtain initial semantic information;

After the voice information is acquired, it is parsed to obtain the basic semantics it is intended to express. Parsing voice information to obtain the corresponding semantics can be done with various existing techniques. The present invention is not limited to any particular parsing scheme, and the parsing scheme is not the improvement made by the present invention, so it is not described in detail here.

S103: obtain the gesture motion information of the user;

Specifically, the gesture motion information of the user can be obtained by capturing an image of the user's gesture with a camera, or by sensing the user's gesture with another sensing device.

S104: obtain the text information of the target area according to the gesture motion information;

After the gesture motion information of the user is obtained, the text information of the target area can be obtained according to the user's gesture, for example a specific homework problem in a book that the user's finger points to. To obtain the text information of the target area, the camera can capture an image of the learning area the user points to, such as an image of a particular problem, and the image is then processed and recognized to obtain the text of the problem.

S105: obtain the target semantics of the user according to the initial semantic information and the text information of the target area.

The initial semantic information obtained earlier by parsing the voice information is fused with the text information of the target area obtained afterwards to obtain the user's true intention, so that a corresponding response can be given.

This embodiment overcomes the shortcomings of voice-only input: on top of the voice input, it uses the gesture motion to obtain the text information of the target area, and combining the two yields the user's true semantics more accurately.
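
Steps S101 to S105 can be sketched as a single function that chains five stages. This is only an illustrative outline: the function name `recognize_semantics` and all five callables are hypothetical placeholders, since the patent does not prescribe any particular speech parser, gesture sensor, text recognizer, or fusion rule.

```python
from typing import Callable


def recognize_semantics(
    acquire_voice: Callable[[], bytes],
    parse_voice: Callable[[bytes], str],
    acquire_gesture: Callable[[], object],
    read_target_text: Callable[[object], str],
    fuse: Callable[[str, str], str],
) -> str:
    """Chain the five steps of embodiment one (all stages are injected stubs)."""
    voice = acquire_voice()                    # S101: acquire voice information
    initial = parse_voice(voice)               # S102: initial semantic information
    gesture = acquire_gesture()                # S103: gesture motion information
    target_text = read_target_text(gesture)    # S104: text of the target area
    return fuse(initial, target_text)          # S105: target semantics


# Stub stages standing in for a microphone, a parser, a camera and OCR.
result = recognize_semantics(
    acquire_voice=lambda: b"\x00\x01",
    parse_voice=lambda _voice: "how are these two characters read",
    acquire_gesture=lambda: {"fingertip": (120, 45)},
    read_target_text=lambda _gesture: "grapes",
    fuse=lambda initial, text: f"{initial} -> {text}",
)
```

Passing the stages in as callables keeps the sketch independent of any specific hardware or recognition library.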

It is worth noting that the user's gesture motion information can be captured by a camera or sensed by another sensing device. If a camera is used, several processing modes are possible. For example, the camera may stay on for as long as the voice device is in use, recording video of the user and the user's learning area. When the user's voice information is acquired, the recorded video frames are extracted according to the acquisition time to obtain images of the user's gesture; the gesture is then recognized to obtain an image of the specific problem in the learning area that the user points to. Text recognition is performed on that image to obtain the corresponding text information, which is combined with the initial semantics of the earlier voice information to obtain the user's true semantics and give a corresponding response. Alternatively, the camera of the smart voice device need not stay on: it can be turned on to capture the user's gesture and the image of the corresponding learning area only after the user's voice information is acquired, and return to its previous dormant state after shooting.
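
For the always-on camera mode, extracting frames by the time the voice was acquired might look like the sketch below. The `FrameBuffer` class, the one-second window, and the string frame payloads are assumptions made for illustration; the patent only states that frames are extracted according to the acquisition time.

```python
from collections import deque
from typing import Deque, List, Tuple

Frame = Tuple[float, str]  # (timestamp in seconds, placeholder frame payload)


class FrameBuffer:
    """Keeps only the most recent frames from a continuously running camera."""

    def __init__(self, max_frames: int = 300) -> None:
        self._frames: Deque[Frame] = deque(maxlen=max_frames)

    def push(self, timestamp: float, frame: str) -> None:
        # Once the buffer is full, the oldest frame is dropped automatically.
        self._frames.append((timestamp, frame))

    def around(self, speech_time: float, window: float = 1.0) -> List[str]:
        """Return frames captured within `window` seconds of the speech time."""
        return [f for t, f in self._frames if abs(t - speech_time) <= window]


buffer = FrameBuffer(max_frames=5)
for i in range(8):
    buffer.push(float(i), f"frame-{i}")

# Speech was acquired at t = 5.0 s; pick the frames near that moment.
selected = buffer.around(speech_time=5.0, window=1.0)
```

A bounded buffer is one way to keep memory use fixed while the camera stays on; the on-demand mode described above avoids the buffer entirely at the cost of starting the camera after the speech.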

Embodiment two

Embodiment two of the present invention, as shown in Fig. 2, comprises:

S201: acquire the voice information of the user;

S202: parse the voice information to obtain initial semantic information;

S203: judge whether the initial semantic information is missing semantic information;

Specifically, missing semantic information is information from which the complete semantics and the user's intention cannot be obtained. In other words, the initial semantics obtained from the acquired voice information alone cannot fully convey the user's intention, and other auxiliary means are needed to obtain the user's true semantics.

S204: when the initial semantic information is judged to be missing semantic information, obtain the gesture motion information of the user;

S205: obtain the text information of the target area according to the gesture motion information;

S206: obtain the target semantics of the user according to the initial semantic information and the text information of the target area.

Compared with embodiment one, this embodiment adds a step, after the initial semantic information is obtained, of judging whether the initial semantic information is missing semantic information. The result of this judgment decides whether the acquisition of the user's gesture motion information is triggered: the gesture motion information is obtained only when the initial semantics are determined to be missing semantic information. Because the acquisition of gesture motion information is triggered conditionally, power consumption is greatly reduced.

Specifically, suppose a user who is reading with the help of a smart voice device sees the sentence "I like grapes" and does not recognize the two characters for "grape". Wanting help, the user says "how are these two characters read?". The smart voice device acquires the voice and, after parsing it, learns that the user does not know how to read two characters; which two characters they are cannot be expressed in the voice information, so the initial semantic information is judged to be missing semantic information. The camera is then started to capture the user's gesture and takes an image of the user's finger pointing at the two characters for "grape". Image processing and text recognition are performed on that area to obtain the text "I like grapes". Combined with the user's earlier semantics, "how are these two characters read?", the user's true semantics are found to be that the user does not know how to read the two characters for "grape", and the device can respond: "Master, these two characters read 'grape': I like grapes." In this way the smart voice device helps the user with unrecognized characters while reading a book.

Likewise, suppose the user sees "I like grapes", does not recognize the two characters for "grape", and wants help. The user says "I like --". The smart voice device acquires the voice and, after parsing it, obtains the user's semantics "I like --"; the object of "like" is not expressed by voice, so the initial semantic information is judged to be missing semantic information. The camera is then started to capture the user's gesture and takes an image of the user's finger pointing at the two characters for "grape". Image processing and text recognition are performed on that area to obtain the text "I like grapes". Combined with the user's earlier semantics, "I like --", the user's true semantics are found to be that the user does not know how to read the two characters for "grape", and the device can respond: "Master, I like grapes." In this way the user can read aloud only the characters he or she recognizes and point to the unrecognized ones; the smart voice device understands the user's intention and helps the user learn the unrecognized characters, which aids reading and comprehension.

Embodiment three

The semantic recognition method of this embodiment, as shown in Fig. 3, comprises:

S301: acquire the voice information of the user;

S302: parse the voice information to obtain initial semantic information;

S303: judge whether the initial semantic information is missing semantic information; if so, go to step S305, otherwise go to step S304;

S304: obtain the target semantics of the user according to the initial semantic information;

S305: obtain the gesture motion information of the user;

S306: obtain the text information of the target area according to the gesture motion information;

S307: obtain the target semantics of the user according to the initial semantic information and the text information of the target area.

This embodiment handles the initial semantic information differently depending on whether it is missing semantic information. When the initial semantic information obtained by parsing the voice information is missing semantic information, the acquisition of the user's gesture motion information is triggered, the text information of the target area is then obtained, and the target text information is combined with the initial semantic information to obtain the target semantics of the user. When the initial semantic information is not missing semantic information, the acquisition of gesture motion information is not triggered, and the user's true semantics are obtained directly from the initial semantics of the voice information. For example, suppose the acquired voice information is "What is the English for elephant?". This voice information fully states the user's true intention and its semantics are complete, so the initial semantic information obtained by parsing it is not missing semantic information, and the target semantics of the user (the user wants to know the English word for elephant) can be obtained directly from it. There is no need to trigger the capture of gesture images; the user's true intention is obtained directly from the initial semantic information, and a timely response can be given: "The English for elephant is 'elephant'."
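
The branch at S303 can be illustrated as follows. This is a sketch under assumptions: `target_semantics`, the substring-based completeness test, and the bracketed fusion format are all hypothetical. The point it shows is that the gesture and text acquisition stage is a callable that is invoked only on the missing-semantics branch, which is what allows the camera to stay off otherwise.

```python
from typing import Callable, List


def target_semantics(
    initial: str,
    is_missing: Callable[[str], bool],
    get_target_text: Callable[[], str],
    fuse: Callable[[str, str], str],
) -> str:
    """S303: branch on whether the initial semantics are missing information."""
    if not is_missing(initial):
        return initial                        # S304: voice alone is sufficient
    return fuse(initial, get_target_text())   # S305 to S307: use the gesture


camera_calls: List[int] = []


def fake_camera_text() -> str:
    """Stand-in for S305 and S306; records each time the camera is triggered."""
    camera_calls.append(1)
    return "grapes"


def demo(initial: str) -> str:
    return target_semantics(
        initial,
        is_missing=lambda s: "these two characters" in s,  # toy judgment rule
        get_target_text=fake_camera_text,
        fuse=lambda s, t: f"{s} [{t}]",
    )


complete = demo("what is the English for elephant")
calls_after_complete = len(camera_calls)
incomplete = demo("how are these two characters read")
```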

Embodiment four

The semantic recognition method of this embodiment, as shown in Fig. 4, comprises:

S401: acquire the voice information of the user;

S402: parse the voice information to obtain initial semantic information;

S403: judge whether the initial semantic information contains preset prompt information; if so, go to step S404;

S404: determine that the initial semantic information is missing semantic information;

S405: when the initial semantic information is determined to be missing semantic information, obtain the gesture motion information of the user;

S406: obtain the text information of the target area according to the gesture motion information;

S407: obtain the target semantics of the user according to the initial semantic information and the text information of the target area.

Specifically, suppose the preset prompt information is "Little Genius, question!". As soon as the acquired voice information of the user contains "Little Genius, question!", the initial semantic information is determined to be missing semantic information, which triggers the acquisition of the subsequent gesture motion information and then of the text information of the target area to help recognize the target semantics of the user. One or more items of preset prompt information can be set.

Preset prompt information is a simple and practical scheme that is easy to implement: the parsed initial semantic information only needs to be compared with the preset prompt information.
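
A minimal sketch of the comparison in S403, assuming the parsed initial semantic information is available as plain text. The tuple `PRESET_PROMPTS`, the function name, and the case-insensitive substring test are illustrative choices; the prompt string is an English rendering of the example phrase above.

```python
from typing import Iterable

# One or more preset prompts may be configured; this value is only an example.
PRESET_PROMPTS = ("Little Genius, question!",)


def contains_preset_prompt(
    initial: str, prompts: Iterable[str] = PRESET_PROMPTS
) -> bool:
    """Return True if any preset prompt occurs in the initial semantic text."""
    text = initial.casefold()
    return any(prompt.casefold() in text for prompt in prompts)


with_prompt = contains_preset_prompt("little genius, question! I like --")
without_prompt = contains_preset_prompt("what is the English for elephant")
```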

Of course, whether the initial semantic information is missing semantic information can also be judged in other, more intelligent ways. For example, after the initial semantic information is obtained by parsing the voice information, the device judges from the semantics whether the voice information fully expresses the user's true intention. Suppose the parsed initial semantic information is "how is this character read?". It is not known which character "this character" refers to, so the device does not know which character's pronunciation the user wants and cannot give a response. When the device cannot give a corresponding response from the user's voice information alone, the initial semantic information is missing semantic information: what "this character" refers to is the missing part, and the user's gesture is needed to help identify it.
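
The patent does not specify how such a semantic judgment is made. As one very simple stand-in, the sketch below flags an utterance as missing semantic information when it contains a demonstrative that points at a word or character without naming it. The pattern and the function name `looks_incomplete` are assumptions, and a real system would likely use a proper language-understanding component instead.

```python
import re

# Heuristic (an assumption, not the patent's method): a demonstrative such as
# "this" or "these", optionally followed by one word like "two", then a noun
# meaning word or character, signals a reference the voice alone cannot resolve.
UNRESOLVED_REFERENCE = re.compile(
    r"\b(this|that|these|those)\b(\s+\w+)?\s+(words?|characters?)\b",
    re.IGNORECASE,
)


def looks_incomplete(initial: str) -> bool:
    """Return True if the text refers to an unnamed word or character."""
    return UNRESOLVED_REFERENCE.search(initial) is not None


single = looks_incomplete("how is this character read")
pair = looks_incomplete("how are these two characters read")
full = looks_incomplete("what is the English for elephant")
```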

Embodiment five

The semantic recognition method of this embodiment, as shown in Fig. 5, comprises:

S501: acquire the voice information of the user;

S502: parse the voice information to obtain initial semantic information;

S503: judge whether the initial semantic information is missing semantic information;

S504: when the initial semantic information is judged to be missing semantic information, obtain the gesture motion information of the user;

S505: obtain the text information of the target area according to the gesture motion information;

S506: generate a missing-semantics regular expression according to the initial semantic information;

S507: match the missing-semantics regular expression against the text information of the target area;

S508: obtain the target semantics of the user according to the successfully matched semantic regular expression.

This embodiment refines how the target semantics of the user are obtained from the user's gesture motion information and the initial semantic information. Specifically, after the initial semantic information and the text information of the target area are obtained, a missing-semantics regular expression is first generated according to the initial semantic information. For example, suppose the acquired voice information is "how are these two characters read?". From the initial semantics it is not known which two characters are meant; they are the missing part, so the missing-semantics regular expression "how is XX read" is generated. Then, from the text information "grape" of the target area that the user's finger points to, the target semantics of the user are obtained as "how is grape read", that is, the pronunciation of "grape".

Likewise, the user may not ask a question such as "how are these two characters read?" but instead read aloud only the characters he or she recognizes, leave the unrecognized ones unread, and point to them by gesture to ask for help. For example, the user sees "I like grapes", does not recognize the two characters for "grape", and wants help, so the user says "I like --". The smart voice device acquires the voice and, after parsing it, obtains the user's semantics "I like --"; the object of "like" is not expressed by voice, so the initial semantic information is judged to be missing semantic information. A missing-semantics regular expression, "I like XX", is first generated according to the initial semantic information. Then, from the captured image of the target area that the user's gesture points to, the text information of the target area is recognized as "I like grapes". This text is matched against the missing-semantics regular expression "I like XX" to obtain the complete semantic sentence "I like grapes". The device thus learns that the characters following "I like" are "grapes" and can tell the user by voice output: "I like grapes."

Specifically, suppose the user uses a smart voice device to assist learning. After acquiring the user's voice information, the voice device starts its camera in step and captures the user's finger-pointing action during learning. Once the learning area that prompted the question is determined from the action, the text in that area is recognized and its intent analyzed. Combined with the semantics of the earlier voice input, the semantic slot in the missing-semantics regular expression is matched and filled with the result of the text parsing, which yields the true semantics and gives the user's true intention in an ambiguous scenario.
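
The slot filling in S506 to S508 can be sketched with Python's `re` module for the "I like XX" example. The `SLOT` marker, both function names, and the way the pattern is built are assumptions made for illustration; the patent does not give the concrete form of its regular expressions. The sketch covers the case where the recognized text contains the spoken words around the slot, as in "I like grapes".

```python
import re
from typing import Optional, Tuple

SLOT = "--"  # hypothetical marker for the part the user did not say


def build_missing_regex(initial: str) -> "re.Pattern[str]":
    """S506: turn 'I like --' into a pattern whose named group is the slot."""
    if SLOT not in initial:
        raise ValueError("initial semantics contain no missing slot")
    head, _, tail = initial.partition(SLOT)
    return re.compile(
        re.escape(head.strip())
        + r"\s*(?P<slot>.+?)\s*"
        + re.escape(tail.strip())
        + r"$"
    )


def complete_semantics(initial: str, target_text: str) -> Optional[Tuple[str, str]]:
    """S507 and S508: match the recognized text and fill the slot.

    Returns (complete sentence, slot content), or None if nothing matches.
    """
    match = build_missing_regex(initial).search(target_text)
    if match is None:
        return None
    slot = match.group("slot")
    return initial.replace(SLOT, slot), slot


filled = complete_semantics("I like --", "I like grapes")
unmatched = complete_semantics("I like --", "We eat apples")
```

Escaping the spoken words with `re.escape` keeps punctuation in the utterance from being interpreted as regular expression syntax.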

Embodiment six

The semantic recognition method of this embodiment, as shown in Fig. 6, comprises:

S601: acquire the voice information of the user;

S602: parse the voice information to obtain initial semantic information;

S603: obtain the gesture motion information of the user;

S604: obtain an image of the target area according to the gesture motion information;

S605: perform image processing on the image of the target area and recognize the text information in the image of the target area;

S606: obtain the target semantics of the user according to the initial semantic information and the text information of the target area.

This embodiment describes in detail how the text information of the target area is obtained from the user's gesture motion information. Specifically, for example, an image of the user's gesture can be captured by a camera, and an image of the specific location (target area) the user points to is then obtained according to where the gesture points. Image processing is performed on that image, and the text information of the target area is recognized.
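
Steps S604 and S605 can be outlined as cropping a region near the detected fingertip and passing it to a text recognizer. Everything here is a simplification for illustration: the image is a plain list of pixel rows, the target area is assumed to lie directly above the fingertip, and `ocr` is a hypothetical callable standing in for whatever recognition engine is used.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def crop_target_area(
    image: Image, fingertip: Tuple[int, int], half_width: int, height: int
) -> Image:
    """S604: cut out the rows just above the fingertip, clamped to the image."""
    row, col = fingertip
    top = max(0, row - height)
    left = max(0, col - half_width)
    right = min(len(image[0]), col + half_width + 1)
    return [line[left:right] for line in image[top:row]]


def read_target_text(
    image: Image,
    fingertip: Tuple[int, int],
    ocr: Callable[[Image], str],
    half_width: int = 40,
    height: int = 30,
) -> str:
    """S605: run the (injected) text recognizer on the cropped target area."""
    return ocr(crop_target_area(image, fingertip, half_width, height))


# A 5 x 6 test image whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(5)]
patch = crop_target_area(image, fingertip=(3, 2), half_width=1, height=2)
size = read_target_text(image, (3, 2), ocr=lambda p: f"{len(p)}x{len(p[0])}")
```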

Embodiment seven

Based on the same technical idea, the invention also discloses a semantic recognition device, which can use the semantic recognition method of the invention to identify the true semantics of the user. Specifically, Embodiment seven of the invention, as shown in Fig. 7, comprises:

a voice acquisition module 10, for acquiring the voice information of the user;

a speech recognition module 20, for parsing the voice information to obtain initial semantic information;

an information acquisition module 30, for obtaining the gesture motion information of the user and obtaining the text information of the target area according to the gesture motion information;

a control processing module 40, for obtaining the target semantics of the user according to the initial semantic information and the text information of the target area.

In the present embodiment, the voice acquisition module 10 acquires the voice information of the user, and the speech recognition module 20 then parses the voice information to obtain the initial semantics. The information acquisition module 30 obtains the user's gesture motion, determines the target area, obtains the image information of the target area, and parses the image to obtain the corresponding text information. Finally, the control processing module 40 obtains the target semantics of the user from the initial semantic information obtained by the speech recognition module 20 and the text information of the target area obtained by the information acquisition module 30. Specifically, for example, the user uses an intelligent voice device to assist learning. The intelligent voice device acquires the user's voice information while its camera captures the user's finger-pointing action during learning. After the action is used to determine the learning area that raised the question, the text in that area is recognized and its intent is understood. Combined with the semantics of the earlier voice input, the true semantics are obtained, giving the user's true intention in this scenario.

The present embodiment fuses the initial semantic information obtained earlier by parsing the voice information with the text information of the target area obtained afterwards. The fusion of the two yields the user's true intention, so that a corresponding response can be given. The semantic recognition device of this embodiment overcomes the shortcomings of voice input alone: on top of the voice input, it obtains the text information of the target area from the gesture motion, and the combination of the two captures the user's true semantics more accurately.

In addition, the information acquisition module 30 obtains the user's gesture motion information, for example by shooting with a camera. The camera can remain on throughout the period in which the voice device (with the built-in semantic recognition device) is in use, recording video of the user and the user's learning area. When the user's voice information is collected, the video frames shot at the time of collection are extracted to obtain an image of the user's gesture. The gesture is then recognized, and an image of the specific question in the learning area the user is pointing to is obtained. Text recognition is performed on this question image to obtain the corresponding text information, which is combined with the initial semantics of the earlier voice information to obtain the user's true semantics and give a corresponding response. Of course, the camera of the intelligent voice device need not remain on at all times. It may be turned on only after the user's voice information is collected, to shoot the user's gesture and an image of the corresponding learning area. After shooting, it can return to its previous dormant state.
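The always-on variant, in which a frame is picked according to the time the speech was collected, might be sketched as follows. The `FrameBuffer` class, its capacity, and the choice of the frame nearest in time are assumptions made for illustration. The description only states that frames are extracted according to the acquisition time.

```python
from bisect import bisect_left
from collections import deque
from typing import Any, Deque, Optional, Tuple


class FrameBuffer:
    """Keeps the most recent camera frames together with their timestamps."""

    def __init__(self, max_frames: int = 300) -> None:
        # Old frames are dropped automatically once the buffer is full.
        self._frames: Deque[Tuple[float, Any]] = deque(maxlen=max_frames)

    def add(self, timestamp: float, frame: Any) -> None:
        """Append a frame; timestamps are assumed to be non-decreasing."""
        self._frames.append((timestamp, frame))

    def nearest(self, timestamp: float) -> Optional[Any]:
        """Return the buffered frame closest in time to `timestamp`."""
        if not self._frames:
            return None
        times = [t for t, _ in self._frames]
        i = bisect_left(times, timestamp)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - timestamp))
        return self._frames[best][1]
```

A bounded buffer keeps memory use fixed while the camera stays on. The on-demand variant described above would not need a buffer at all, since the camera starts only after speech is collected.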

Embodiment eight

The present embodiment adds a semantic judgment module 50 on the basis of Embodiment seven. Specifically, as shown in Fig. 8, the semantic recognition device of the invention further comprises:

a semantic judgment module 50, for judging, according to the initial semantic information parsed by the speech recognition module 20, whether the initial semantic information is missing semantic information;

the information acquisition module 30 is further configured to obtain the gesture motion information of the user when the initial semantic information is determined to be missing semantic information.

Compared with Embodiment seven, the present embodiment adds the semantic judgment module 50. The semantic judgment module 50 judges whether the initial semantics are missing semantic information, and this decides whether the information acquisition module 30 is triggered to obtain the user's gesture motion information. In the present embodiment, the step of obtaining the user's gesture motion information through the information acquisition module 30 is performed only when the semantic judgment module 50 determines that the initial semantics are missing semantic information. The acquisition of gesture motion information is therefore conditionally triggered, which reduces power consumption.

Specifically, for example, the user uses an intelligent voice device (with the built-in semantic recognition device) to assist reading. The user sees the sentence "I am very perturbed" and does not recognize the two-character word "perturbed". Intending to ask for help, the user says "How are these two characters read?". The voice acquisition module 10 collects this speech, and the speech recognition module 20 parses it to learn that the user does not know the pronunciation of two characters. Which two characters are meant, however, cannot be determined from the voice information, so the semantic judgment module 50 judges from the initial semantic information that the semantics are missing. The camera of the information acquisition module 30 is then started to shoot the user's gesture, capturing an image of the user's finger pointing at the two characters of "perturbed". Image processing and text recognition are performed on this area to obtain the text information "I am very perturbed". The control processing module 40 then combines this with the user's earlier semantics, "How are these two characters read?", and obtains the user's true semantics: the user does not know how to read the two characters of "perturbed". A corresponding response can then be given: "Owner, these two characters are read 'perturbed': I am very perturbed." In this way, the user can get help with unrecognized words through the intelligent voice device while reading books.

Of course, the user can also express the same meaning in other ways. For example, the user again sees the sentence "I am very perturbed" and does not recognize the two-character word "perturbed". The user says "I am very --", and the intelligent voice device (with the built-in semantic recognition device) collects the speech and parses it to obtain the semantics "I am very --". These initial semantics clearly fail to express the user's intention completely, so the initial semantic information is judged to be missing semantics. The camera is then started to shoot the user's gesture, capturing an image of the user's finger pointing at the passage "I am very perturbed". Image processing and text recognition are performed on this area to obtain the text information "I am very perturbed". Combined with the user's earlier semantics "I am very --", the user's true semantics are obtained: the user does not know how to read the two characters of "perturbed". A corresponding response can then be given: "Owner, these two characters are read 'perturbed': I am very perturbed." In this way, the user can read aloud only the words they recognize and point at those they do not. The intelligent voice device understands the user's intention and helps the user learn the unrecognized words, aiding reading comprehension.

Preferably, the control processing module 40 in the semantic recognition device of the invention is further configured to obtain the target semantics of the user according to the initial semantic information when the semantic judgment module 50 determines that the initial semantic information is not missing semantic information.

When the semantic judgment module 50 determines that the initial semantic information is not missing semantic information, there is no need to trigger the information acquisition module 30 to obtain the user's gesture motion information. The initial semantics obtained by parsing the user's voice information can be used directly to obtain the user's true semantics. For example, the collected voice information of the user is "What does 14+25 equal?". Because the collected voice information fully states the user's true intention, the semantics are complete, and the initial semantic information obtained by parsing it is not missing semantic information. The target semantics of the user, namely wanting to know the result of 14+25, can therefore be obtained directly from the initial semantic information. There is no need to trigger the acquisition of the user's gesture image. The user's true intention is obtained directly from the initial semantic information parsed from the voice information, so that a corresponding response can be given promptly: "14+25 equals 39".
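The conditional triggering described in this embodiment can be sketched as a small control flow. This is a minimal illustration, not the device's implementation: the function name and the three injected callables stand in for the semantic judgment module 50, the information acquisition module 30, and the control processing module 40, and are hypothetical.

```python
from typing import Callable


def recognize_target_semantics(
    initial_semantics: str,
    is_missing: Callable[[str], bool],
    read_pointed_text: Callable[[], str],
    fuse: Callable[[str, str], str],
) -> str:
    """Return the target semantics, using the camera only when needed.

    is_missing        -- stand-in for the semantic judgment module
    read_pointed_text -- stand-in for gesture capture plus text recognition
    fuse              -- stand-in for combining speech and region text
    """
    if not is_missing(initial_semantics):
        # Complete utterance: answer from speech alone, camera stays off.
        return initial_semantics
    region_text = read_pointed_text()
    return fuse(initial_semantics, region_text)
```

Passing the modules in as callables keeps the branch itself easy to check: for a complete utterance such as "What does 14+25 equal?", `read_pointed_text` is never invoked.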

Embodiment nine

The present embodiment refines the semantic judgment module 50 on the basis of Embodiment eight above. Specifically, as shown in Fig. 9, the semantic judgment module 50 comprises:

a lookup submodule 51, for looking up whether the initial semantic information contains a preset prompt message;

a decision submodule 52, for determining that the initial semantic information is missing semantic information when the lookup submodule 51 finds that the initial semantic information contains a preset prompt message.

Specifically, for example, the preset prompt message is "Hello, small talent!". That is, when the user cannot fully express the true semantics through speech alone, the user only needs to say "Hello, small talent" first, and may then either continue speaking or stop speaking and indicate directly with a gesture. After the voice acquisition module 10 collects the user's voice information "Hello, small talent! How is this character read?", the speech recognition module 20 recognizes it and obtains the initial semantics. The lookup submodule 51 of the semantic judgment module 50 then looks up whether the initial semantics contain the prompt message "Hello, small talent". If it is found, the decision submodule 52 determines that the initial semantic information is missing semantic information, so the information acquisition module 30 must be triggered to obtain the subsequent gesture motion information and then the text information of the target area, which assists in identifying the target semantics of the user. One or more preset prompt messages can be set. The semantic judgment module 50 of this embodiment is simple, convenient, and easy to implement.
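The lookup and decision steps can be sketched in a few lines. The function names, the `PRESET_PROMPTS` tuple, and the case and whitespace normalization are assumptions for illustration. The description only requires checking whether a preset prompt message is contained in the initial semantic information.

```python
from typing import Iterable

# One or more preset prompt messages may be configured.
PRESET_PROMPTS = ("hello, small talent",)


def _normalize(text: str) -> str:
    """Lower-case and collapse whitespace so matching is not overly strict."""
    return " ".join(text.lower().split())


def contains_preset_prompt(initial_semantics: str,
                           prompts: Iterable[str] = PRESET_PROMPTS) -> bool:
    """Lookup step: is any preset prompt contained in the parsed speech?"""
    normalized = _normalize(initial_semantics)
    return any(_normalize(p) in normalized for p in prompts)


def is_missing_semantics(initial_semantics: str,
                         prompts: Iterable[str] = PRESET_PROMPTS) -> bool:
    """Decision step: a found prompt marks the semantics as missing."""
    return contains_preset_prompt(initial_semantics, prompts)
```

For instance, `is_missing_semantics("Hello, small talent! How is this character read?")` evaluates to true and would trigger gesture acquisition, while an utterance without the prompt would not.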

Of course, the judgment of whether the initial semantic information is missing semantic information can also be made in other, more intelligent ways. For example, after the speech recognition module 20 parses the voice information and obtains the initial semantic information, the semantic judgment module 50 intelligently judges from those semantics whether the voice information fully expresses the user's true intention. Suppose the parsed initial semantic information is "How is this character read?". Because it is not known which character "this character" refers to, the device cannot tell which character's pronunciation the user wants to know and therefore cannot give a response. When the device cannot give a corresponding response from the user's voice information in this way, the initial semantic information is missing semantic information: what "this character" refers to is the missing part, and the user's gesture motion is needed to assist recognition.
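The description does not specify how this more intelligent judgment is made. As a deliberately simple stand-in, the sketch below flags an utterance as incomplete when it contains an unresolved demonstrative reference ("this character", "these two words") or ends in a trailing gap. The patterns and the function name are assumptions and would need to be replaced by a real language-understanding component.

```python
import re

# Demonstrative reference to text that speech alone cannot identify.
_DEICTIC = re.compile(
    r"\b(this|that|these|those)\s+(\w+\s+)?(word|words|character|characters)\b",
    re.IGNORECASE,
)
# Utterance that breaks off, transcribed here as a trailing "--" or "...".
_TRAILING_GAP = re.compile(r"(--|\.\.\.)\s*$")


def looks_incomplete(initial_semantics: str) -> bool:
    """Heuristic stand-in for the 'more intelligent' missing-semantics check."""
    text = initial_semantics.strip()
    return bool(_DEICTIC.search(text) or _TRAILING_GAP.search(text))
```

A rule list like this is easy to audit but brittle. It only illustrates the kind of signal, an unresolved reference or a broken-off sentence, that the judgment relies on.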

Preferably, as shown in Fig. 9, on the basis of any of the above device embodiments, the present embodiment elaborates the control processing module 40 of the semantic recognition device. The control processing module 40 comprises:

an expression generation submodule 41, for generating a missing-semantics regular expression according to the initial semantic information;

a matching submodule 42, for matching the missing-semantics regular expression according to the text information of the target area;

a semantic acquisition submodule 43, for obtaining the target semantics of the user according to the successfully matched semantic regular expression.

The present embodiment elaborates the control processing module 40. Specifically, after the initial semantic information is obtained by the speech recognition module 20 and the text information of the target area is obtained by the information acquisition module 30, the expression generation submodule 41 of the control processing module 40 first generates a missing-semantics regular expression from the initial semantic information. For example, the collected voice information of the user is "How are these two characters read?". From the initial semantics of this voice information, it is not known which two characters are meant, so they are the missing part, and the missing-semantics regular expression "How is XX read" is generated. The matching submodule 42 then uses the text information "perturbed" of the target area that the user's finger points to, and matches the missing-semantics regular expression to give "How is 'perturbed' read", so that the semantic acquisition submodule 43 can obtain the target semantics of the user: the pronunciation of "perturbed".

Likewise, the user may not utter a question-type sentence with missing semantics such as "How are these two characters read?". Instead, the user may read aloud only the words they recognize, leave the unrecognized words unspoken, and point at them with a gesture to ask for help. For example, the user sees the sentence "I am very perturbed" and does not recognize the two-character word "perturbed". Intending to ask for help, the user says "I am very --". The intelligent voice device collects this speech and parses it to obtain the semantics "I am very --". These initial semantics clearly fail to express the user's intention completely, and the sentence is incomplete, so the initial semantic information is judged to be missing semantics. A missing-semantics regular expression is first generated from the initial semantic information: "I am very XX". Then, from the captured image of the target area that the user's gesture points to, the text of the target area is recognized as "I am very perturbed". This text is matched against the missing-semantics regular expression "I am very XX" to obtain the complete sentence "I am very perturbed". The device can thus tell the user that the word following "I am very" is "perturbed", for example by voice output: "Perturbed. I am very perturbed."
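The two cases above, matching a template against the recognized sentence and filling a template with the pointed-at word, can be sketched with Python's `re` module. The function names and the way the `XX` placeholder is compiled into a named group are assumptions. The description only gives the templates "I am very XX" and "How is XX read".

```python
import re
from typing import Optional

SLOT = "XX"  # placeholder used in the examples of the description


def build_pattern(template: str) -> "re.Pattern[str]":
    """Compile a template such as 'I am very XX' into a regex with one slot."""
    before, _, after = template.partition(SLOT)
    return re.compile(
        re.escape(before) + r"(?P<slot>\S.*?)" + re.escape(after) + r"\s*$"
    )


def extract_slot(template: str, region_text: str) -> Optional[str]:
    """Statement case: recover the unspoken words from the recognized text."""
    match = build_pattern(template).search(region_text.strip())
    return match.group("slot") if match else None


def fill_slot(template: str, value: str) -> str:
    """Question case: place the pointed-at word into the template."""
    return template.replace(SLOT, value, 1)
```

With these helpers, `extract_slot("I am very XX", "I am very perturbed")` yields the missing word, and `fill_slot("How is XX read", "perturbed")` yields the completed question. Escaping the literal parts with `re.escape` keeps punctuation in the spoken fragment from being interpreted as regex syntax.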

Preferably, as shown in Fig. 9, the information acquisition module 30 in the semantic recognition device of any of the above embodiments comprises:

an image capturing submodule 31, for obtaining the gesture motion information of the user and obtaining an image of the target area according to the gesture motion information;

an image processing submodule 32, for performing image processing on the image of the target area;

an image recognition submodule 33, for recognizing the text information in the image of the target area.

The present embodiment refines the information acquisition module 30. Specifically, for example, the image capturing submodule 31, such as a camera, captures an image of the user's gesture, and the direction in which the gesture points is then used to obtain an image of the specific location (target area) the user is pointing to. The image processing submodule 32 then performs image processing on this image, and the image recognition submodule 33 recognizes the text information of the target area.

The semantic recognition device of the invention can be built into all kinds of smart devices, for example into a voice device that assists the user's learning. The voice device acquires the user's voice information and obtains the user's initial semantics. It then starts its camera and captures the user's finger-pointing action during learning. After the action is used to determine the learning area that raised the question, the text in that area is recognized and its intent is understood. Combined with the initial semantics of the earlier voice input, the semantic slot in the missing-semantics regular expression is matched and filled with the result of text parsing. This yields the true semantics and gives the user's true intention in an ambiguous scenario.

Although preferred embodiments of the invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they know the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims

1. A semantic recognition method, characterized by comprising:

acquiring voice information of a user;

parsing the voice information to obtain initial semantic information;

obtaining gesture motion information of the user;

2. The semantic recognition method according to claim 1, characterized in that, after the obtaining of the initial semantic information, the method further comprises:

3. The semantic recognition method according to claim 2, characterized in that the judging whether the initial semantic information is missing semantic information comprises:

judging whether the initial semantic information contains a preset prompt message, and if so, judging that the initial semantic information is missing semantic information.

4. The semantic recognition method according to claim 1, characterized in that the obtaining of the target semantics of the user according to the initial semantic information and the text information of the target area comprises:

5. The semantic recognition method according to any one of claims 1-4, characterized in that the obtaining of the text information of the target area according to the gesture motion information comprises:

6. A semantic recognition device, characterized by comprising:

a voice acquisition module, for acquiring voice information of a user;

an information acquisition module, for obtaining gesture motion information of the user and obtaining text information of a target area according to the gesture motion information;

a control processing module, for obtaining target semantics of the user according to the initial semantic information and the text information of the target area.

7. The semantic recognition device according to claim 6, characterized by further comprising:

a semantic judgment module, for judging, according to the initial semantic information parsed by the speech recognition module, whether the initial semantic information is missing semantic information;

the information acquisition module is further configured to obtain the gesture motion information of the user when the initial semantic information is determined to be missing semantic information.

8. The semantic recognition device according to claim 7, characterized in that the semantic judgment module comprises:

a decision submodule, for determining that the initial semantic information is missing semantic information when the lookup submodule finds that the initial semantic information contains a preset prompt message.

9. The semantic recognition device according to claim 6, characterized in that the control processing module comprises:

a matching submodule, for matching the missing-semantics regular expression according to the text information of the target area;

10. The semantic recognition device according to any one of claims 6-9, characterized in that the information acquisition module comprises:

an image capturing submodule, for obtaining the gesture motion information of the user and obtaining an image of the target area according to the gesture motion information;