CN109933198B - Semantic recognition method and device - Google Patents


Info

Publication number: CN109933198B (application CN201910186422.5A)
Authority: CN (China)
Prior art keywords: information, semantic, user, voice, initial
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other versions: CN109933198A (Chinese, zh)
Inventor: 魏誉荧
Current assignee: Guangdong Genius Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Guangdong Genius Technology Co Ltd
Application filed by Guangdong Genius Technology Co Ltd; priority to CN201910186422.5A; published as CN109933198A, granted and published as CN109933198B

Landscapes

  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic recognition method comprising the following steps: collecting voice information from a user; parsing the voice information to obtain initial semantic information; acquiring gesture action information from the user; acquiring text information of a target area according to the gesture action information; and obtaining the user's target semantics from the initial semantic information and the text information of the target area. The invention further discloses a semantic recognition device comprising: a voice acquisition module for collecting the user's voice information; a voice recognition module for parsing the voice information to obtain initial semantic information; an information acquisition module for acquiring the user's gesture action information and, from it, the text information of a target area; and a control processing module for obtaining the user's target semantics from the initial semantic information and the text information of the target area. With the method and device, the user's real intention can be recognized intelligently even when the user cannot express it accurately in speech or does not know how to express it.

Description

Semantic recognition method and device
Technical Field
The invention relates to the technical field of semantic recognition, in particular to a semantic recognition method and a semantic recognition device.
Background
With the rapid development of the internet, intelligent products play an ever larger role in daily life, and people have grown accustomed to using intelligent terminals to meet all kinds of needs. As artificial intelligence technology matures, terminals are becoming increasingly intelligent, and voice interaction, one of the mainstream forms of human-computer interaction on intelligent terminals, is increasingly popular with users.
At present, most intelligent voice devices on the market perform recognition based on the user's voice input and then take corresponding action, so the accuracy of that input strongly affects the feedback the terminal gives. For young children, who are just beginning to learn language, expression may be incomplete and intentions ambiguous. For voice-enabled products aimed at children in particular, this creates a drawback: when a child cannot accurately express certain words or content, or does not know how to express them, semantic parsing is limited and the product cannot intelligently identify the user's real intention.
Disclosure of Invention
To remedy these technical defects, the invention provides a semantic recognition method and a semantic recognition device. Specifically, the technical scheme is as follows:
in one aspect, the present invention provides a semantic recognition method, including:
collecting voice information of a user;
analyzing the voice information to obtain initial semantic information;
acquiring gesture action information of the user;
acquiring character information of a target area according to the gesture action information;
and acquiring the target semantics of the user according to the initial semantic information and the character information of the target area.
Further, after the obtaining the initial semantic information, the method further includes:
judging whether the initial semantic information is missing semantic information or not;
and when the initial semantic information is judged to be missing semantic information, acquiring gesture action information of the user.
Further, the determining whether the initial semantic information is missing semantic information includes:
and judging whether the initial semantic information contains preset reminding information, if so, judging that the initial semantic information is missing semantic information.
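The preset-reminder check above can be sketched as a simple containment test. This is a minimal illustration, assuming reminder phrases are kept in a configurable list; the phrases and the function name are not taken from the patent.

```python
# Illustrative sketch of the preset-reminder check: the initial semantics
# are judged "missing" if any configured reminder phrase occurs in them.
# The phrase list below is an assumption, not part of the patent.
PRESET_REMINDERS = ["ask a question", "please help me read"]

def is_missing_semantic(initial_semantic: str) -> bool:
    """Judge the semantics as missing if any preset reminder phrase occurs."""
    return any(phrase in initial_semantic for phrase in PRESET_REMINDERS)
```

One or more phrases can be configured, matching the text's note that several preset reminders may be set.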
Further, the obtaining the target semantics of the user according to the initial semantic information and the text information of the target area includes:
generating a missing semantic regular expression according to the initial semantic information;
matching the missing semantic regular expression according to the character information of the target area;
and acquiring the target semantics of the user according to the successfully matched semantic regular expression.
Further, the acquiring text information of the target area according to the gesture action information includes:
acquiring an image of a target area according to the gesture action information;
and carrying out image processing on the image of the target area, and identifying character information in the image of the target area.
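The two steps above — isolating the target-area image and recognizing its text — can be sketched as follows. The OCR engine is injected as a callable so any backend can be plugged in; all names here are illustrative assumptions, not the patent's implementation.

```python
# Sketch: crop the image region the gesture indicates, then run text
# recognition on it. `ocr` stands in for whatever recognizer is used.
from typing import Callable, Tuple

def read_target_area(image, box: Tuple[int, int, int, int],
                     ocr: Callable[[object], str]) -> str:
    """`image` is any object with a PIL-style crop(); `box` is
    (left, top, right, bottom) in pixels around the pointed-at text."""
    region = image.crop(box)    # isolate the target area
    return ocr(region).strip()  # recognize and tidy the text
```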
On the other hand, the invention also discloses a semantic recognition device, which comprises:
the voice acquisition module is used for acquiring voice information of a user;
the voice recognition module is used for analyzing the voice information and acquiring initial semantic information;
the information acquisition module is used for acquiring gesture action information of the user and acquiring character information of a target area according to the gesture action information;
and the control processing module is used for acquiring the target semantics of the user according to the initial semantic information and the character information of the target area.
Further, the device includes a semantic judgment module, used for judging, from the initial semantic information parsed by the voice recognition module, whether the initial semantic information is missing semantic information;
the information acquisition module is further configured to acquire gesture action information of the user when it is determined that the initial semantic information is missing semantic information.
Further, the semantic judgment module in the semantic recognition device of the present invention includes:
the searching submodule is used for searching whether the initial semantic information contains preset reminding information or not;
and the judging submodule is used for judging the initial semantic information as missing semantic information when the searching submodule searches that the initial semantic information contains preset reminding information.
Further, the control processing module of the semantic recognition device of the present invention includes:
the expression generation submodule is used for generating a missing semantic regular expression according to the initial semantic information;
the matching submodule is used for matching the missing semantic regular expression according to the character information of the target area;
and the semantic obtaining submodule is used for obtaining the target semantics of the user according to the successfully matched semantic regular expression.
Further, the information acquisition module in the semantic recognition device of the present invention includes:
the image shooting submodule is used for acquiring gesture action information of the user and acquiring an image of a target area according to the gesture action information;
the image processing submodule is used for carrying out image processing on the image of the target area;
and the image identification submodule is used for identifying the character information in the image of the target area.
The invention at least comprises the following beneficial technical effects:
(1) the method and the device overcome the defect of single voice input, combine gesture actions to acquire the character information of the target area on the premise of voice input, and more accurately acquire the real semantics of the user after the combination of the character information and the gesture actions, so that the voice equipment can intelligently identify the real intention of the user under the condition that the user cannot accurately express the real semantics by voice or does not know how to express the real semantics.
(2) After the user's voice information is collected, it is parsed to obtain initial semantic information, and only when that information is judged to be missing semantic information is the acquisition of gesture action information, and in turn the text information of the target area, triggered. Because gesture and text acquisition are triggered conditionally, power consumption is greatly reduced: the auxiliary analysis starts only when the voice information alone does not yield the complete user intention, and combining the two operations to understand the user's real intention makes the system more intelligent.
(3) Whether the initial semantic information is missing semantic information can be judged by checking whether it contains preset reminder information, i.e., by comparing the parsed initial semantic information against the preset reminders. This scheme is simple, highly operable, and easy to implement.
(4) When the user's voice information cannot fully express the real intention (the initial semantic information is missing semantic information), the user's gesture action information is acquired, an image of the target area the user points at is obtained from the gesture, text recognition is performed on that image to obtain the text content the gesture indicates, and this is combined with the initial semantics of the earlier voice information to obtain the user's real intention. Image capture and recognition technology is mature and fast, so feedback can be given quickly after the voice information is collected, improving the user experience.
(5) A missing-semantic regular expression can be generated from the initial semantic information parsed from the user's voice, and then matched against the text information of the target area obtained by later image recognition, quickly yielding a complete semantic sentence pattern. The user's true intention is thus known and appropriate feedback can be given conveniently and rapidly.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a first embodiment of the semantic recognition method of the present invention;
FIG. 2 is a flowchart of a second embodiment of the semantic recognition method of the present invention;
FIG. 3 is a flowchart of a third embodiment of the semantic recognition method of the present invention;
FIG. 4 is a flowchart of a fourth embodiment of the semantic recognition method of the present invention;
FIG. 5 is a flowchart of a fifth embodiment of the semantic recognition method of the present invention;
FIG. 6 is a flowchart of a sixth embodiment of the semantic recognition method of the present invention;
FIG. 7 is a block diagram of a seventh embodiment of the semantic recognition apparatus of the present invention;
FIG. 8 is a block diagram of an eighth embodiment of the semantic recognition apparatus of the present invention;
FIG. 9 is a block diagram of a ninth embodiment of the semantic recognition apparatus of the present invention.
Reference numerals:
10 - voice acquisition module; 20 - voice recognition module; 30 - information acquisition module; 31 - image capture submodule; 32 - image processing submodule; 33 - image recognition submodule; 40 - control processing module; 41 - expression generation submodule; 42 - matching submodule; 43 - semantic acquisition submodule; 50 - semantic judgment module; 51 - search submodule; 52 - judgment submodule.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The invention provides a semantic recognition method, the embodiment of which is shown in figure 1 and comprises the following steps:
s101, collecting voice information of a user;
specifically, the voice information of the user is acquired, and the voice information can be acquired through a microphone or other voice acquisition devices. The voice information may be voice input by the user in real time, and of course, the voice information is not necessarily complete, and may be complete voice information, or may be only partial voice.
S102, analyzing the voice information to obtain initial semantic information;
After the voice information is acquired, it is parsed to obtain the basic semantics it is intended to express. Parsing speech into corresponding semantics can be achieved by various conventional technical means; the invention is not limited to any particular parsing scheme, and since this is not a point of improvement of the invention, it is not described in detail.
S103, acquiring gesture action information of the user;
specifically, the acquiring of the gesture motion information of the user may be to shoot a gesture motion image of the user through a camera, or may be to sense the gesture motion of the user through other sensing devices.
S104, acquiring character information of a target area according to the gesture action information;
after the gesture action information of the user is obtained, the text information of the target area can be obtained according to the gesture action of the user, for example, according to a specific certain operation question on the book pointed by the finger of the user. The text information of the target area may be obtained by capturing an image of the learning area pointed by the user, such as an image of a topic, through the camera, and then processing and identifying the image to obtain text information of the topic.
S105, acquiring the target semantic of the user according to the initial semantic information and the character information of the target area.
By fusing the initial semantic information previously parsed from the voice information with the subsequently obtained text information of the target area, the user's real intention can be obtained and a corresponding response given.
This embodiment overcomes the limitation of voice-only input: on the basis of voice input, the text information of the target area is acquired in combination with the gesture action, and combining the two yields the user's real semantics more accurately.
It is worth noting that the user's gesture information can be captured by a camera or sensed by other sensing devices. If a camera is used, several processing modes are possible. For example, the camera may remain on throughout use of the voice device, recording video of the user and the learning area. When the user's voice information is collected, video frames are extracted at the corresponding time points to obtain the user's gesture action images; the gesture is recognized, and an image of the specific question in the learning area the user points at is obtained. Text recognition is then performed on the question image to obtain the corresponding text information, which is combined with the initial semantics of the earlier voice information to obtain the user's real semantics and give a corresponding response. Alternatively, the camera of the intelligent voice device need not stay on: it can be started only after the user's voice information is collected, to photograph the gesture and the corresponding learning area, and return to its previous dormant state once shooting is finished.
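For the always-on-camera variant described above, the frame nearest the voice-capture moment must be selected. A minimal sketch, assuming frames are modeled as (timestamp, frame) pairs — a representation chosen here for illustration only:

```python
# Sketch: pick the recorded video frame closest in time to the moment the
# user's voice was captured. The (timestamp, frame) pair representation is
# an assumption for illustration.
def frame_at(frames, capture_time):
    """Return the frame whose timestamp is nearest to capture_time."""
    return min(frames, key=lambda pair: abs(pair[0] - capture_time))[1]
```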
Example two
A second embodiment of the invention, shown in fig. 2, comprises:
s201, collecting voice information of a user;
s202, analyzing the voice information to obtain initial semantic information;
s203, judging whether the initial semantic information is missing semantic information;
specifically, semantic information is missing, that is, information of complete semantics and user intention cannot be obtained. In other words, the initial semantics obtained by the collected user voice information alone cannot completely obtain the intention of the user, and the real semantics of the user needs to be obtained cooperatively by means of other auxiliary ways.
S204, when the initial semantic information is judged to be missing semantic information, acquiring gesture action information of the user;
s205, acquiring character information of a target area according to the gesture action information;
s206, according to the initial semantic information and the character information of the target area, obtaining the target semantic of the user.
Compared with the first embodiment, a step of judging whether the initial semantic information is missing semantic information is added after it is acquired, and this judgment determines whether acquisition of the user's gesture action information is triggered. Gesture action information is acquired only when the initial semantic information is judged to be missing semantic information; because the acquisition is conditionally triggered, power consumption is greatly reduced.
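The conditional trigger of this embodiment can be sketched as a short control flow in which gesture capture and text recognition run only when the initial semantics are judged missing. Every callable below is a placeholder for a module described in the text, not the patent's actual implementation:

```python
# Sketch of the second embodiment's flow: the camera branch is entered only
# when the parsed semantics are missing, saving power otherwise.
def recognize(voice, parse, is_missing, capture_gesture, read_region, fuse):
    initial = parse(voice)
    if not is_missing(initial):
        return initial                   # complete semantics: answer directly
    gesture = capture_gesture()          # triggered only on demand
    region_text = read_region(gesture)   # text of the pointed-at area
    return fuse(initial, region_text)    # combine into the target semantics
```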
Specifically, suppose a user reading with an intelligent voice device sees the sentence "I like grapes" but does not know the word "grapes", and asks for help by saying "How do these two characters read?". The device collects and parses this speech: the user wants the pronunciation of two unknown characters, but which two characters cannot be determined from the voice information alone, so the initial semantic information is judged to be missing semantics. The camera is then started to capture the user's gesture; an image of the user's finger pointing at "grapes" is taken, the region is image-processed and text-recognized to obtain "I like grapes", and combining this with the earlier semantics yields the real meaning: the user does not know how to read "grapes". A corresponding response can then be given — these characters read "grapes", as in "I like grapes". In this way the intelligent voice device helps the user with unknown words while reading.
Similarly, suppose the user sees the sentence "I like grapes", does not know the word "grapes", and asks for help by saying only "I like --". The device collects and parses the speech, obtains the semantics "I like --" with the liked object unexpressed, and judges the initial semantic information to be missing semantics. The camera is then started, an image of the user's finger pointing at "grapes" is captured, and image processing and text recognition yield the text "I like grapes". Combining this with the earlier semantics "I like --" gives the real meaning — the user does not know how to read "grapes" — and the corresponding response "grapes" is given. Thus the user need only read the known words aloud and point at the unknown one; the intelligent voice device understands the intention and helps the user learn the unknown word, facilitating reading comprehension.
Example three
The semantic recognition method of the embodiment, as shown in fig. 3, includes:
s301, collecting voice information of a user;
s302, analyzing the voice information to obtain initial semantic information;
s303, judging whether the initial semantic information is missing semantic information; if yes, go to step S305, otherwise go to step S304;
s304, acquiring the target semantics of the user according to the initial semantic information;
s305, acquiring gesture action information of the user;
s306, acquiring character information of a target area according to the gesture action information;
s307, acquiring the target semantic of the user according to the initial semantic information and the character information of the target area.
This embodiment adopts different processing depending on whether the initial semantic information is missing semantic information. When the initial semantic information parsed from the voice is missing semantic information, acquisition of the user's gesture action information is triggered, the text information of the target area is obtained, and the user's target semantics are obtained by combining the two. When the initial semantic information is not missing semantic information, there is no need to trigger gesture acquisition: the user's real semantics can be obtained directly from the initial semantics of the voice information. For example, if the collected voice information is "What is the English for elephant?", the voice fully expresses the user's real intention with complete semantics, so the parsed initial semantic information is not missing semantic information, and the target semantics — wanting to know the English for elephant — are obtained from it directly. No gesture image need be captured, and a timely response can be given: the English for elephant is "elephant".
Example four
As shown in fig. 4, the semantic identification method of this embodiment includes:
s401, collecting voice information of a user;
s402, analyzing the voice information to obtain initial semantic information;
s403, judging whether the initial semantic information contains preset reminding information or not; if yes, go to step S404
S404, judging that the initial semantic information is missing semantic information;
s405, when the initial semantic information is judged to be missing semantic information, acquiring gesture action information of the user;
s406, acquiring character information of a target area according to the gesture action information;
s407, acquiring the target semantic of the user according to the initial semantic information and the character information of the target area.
Specifically, suppose the preset reminder is "Little Genius, a question!". Then whenever the collected voice information contains "Little Genius, a question!", the initial semantic information can be determined to be missing semantic information, so acquisition of the subsequent gesture action information is triggered and the text information of the target area is obtained to assist in recognizing the user's target semantics. One or more preset reminder messages can be set.
Using preset reminder information — comparing the parsed initial semantic information against it — is a simple scheme that is highly operable and easy to implement.
Of course, other more intelligent ways of judging whether the initial semantic information is missing semantic information can be adopted. For example, after the initial semantic information is parsed from the voice, the device can judge from its semantics whether the voice fully expresses the user's real intention. If the parsed initial semantic information is "How does this character read?", the device cannot tell which character is meant and therefore cannot respond. Since no corresponding response can be given from the voice information alone, the initial semantic information is missing semantic information — which character is referred to is the missing part — and the user's gesture action is needed to assist recognition.
Example five
As shown in fig. 5, the semantic identification method of this embodiment includes:
s501, collecting voice information of a user;
s502, analyzing the voice information to obtain initial semantic information;
s503, judging whether the initial semantic information is missing semantic information;
s504, when the initial semantic information is judged to be missing semantic information, acquiring gesture action information of the user;
s505, acquiring character information of a target area according to the gesture action information;
s506, generating a missing semantic regular expression according to the initial semantic information;
s507, matching the missing semantic regular expression according to the character information of the target area;
s508, according to the successfully matched semantic regular expression, the target semantic of the user is obtained.
This embodiment refines how the user's target semantics are obtained from the gesture action information and the initial semantic information. Specifically, after the initial semantic information and the text information of the target region are obtained, a missing-semantic regular expression is generated from the initial semantic information. For example, if the collected voice information is "How do these two characters read?", the initial semantics show that which two characters is the missing part, so a missing-semantic regular expression can be generated: "How does XX read?". Matching it against the text information of the area the user's finger points at then yields the user's target semantics: how "grapes" reads.
Similarly, the user may not utter a query such as "How do these characters read?" at all, but simply read aloud the recognized words, skip the unrecognized one, and indicate by gesture that help is wanted. For example, seeing "I like grapes" and not knowing "grapes", the user says "I like --". The intelligent voice device parses the speech, obtains the semantics "I like --" with the liked object unexpressed, and judges this to be missing semantics from the initial semantic information. A missing-semantic regular expression is then generated: "I like XX". From the captured image, the text of the target area the gesture points at is recognized as "I like grapes", and matching it against "I like XX" yields the complete semantic sentence: "I like grapes". The device can thus tell the user that the word after "like" is "grapes", and output by voice: "I like grapes".
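The slot-filling step above can be sketched with an ordinary regular expression. Writing the unknown slot as "XX" follows the example in the text; using Python's `re` module and a single-word capture group are implementation assumptions for illustration:

```python
# Sketch of missing-semantic regular-expression matching: build a pattern
# from the initial semantics ("I like XX") and fill the XX slot from the
# recognized target-area text, returning the completed sentence.
import re
from typing import Optional

def fill_missing_semantic(initial_semantic: str, region_text: str) -> Optional[str]:
    """Return the complete semantic sentence, or None if nothing matches."""
    pattern = re.compile(re.escape(initial_semantic).replace("XX", r"(\w+)"))
    match = pattern.search(region_text)
    return match.group(0) if match else None
```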
Specifically, suppose the user uses an intelligent voice device to assist learning. After the device collects the user's voice information, its camera is started synchronously to capture the user's finger-pointing actions during learning. From these actions the device determines the learning area the question refers to, recognizes the text in that area and analyzes the intent, matches the semantic slot in the missing-semantic regular expression generated from the earlier voice input, fills the slot with the result of the text analysis, and thereby obtains the real semantics, giving the user's real intention in an ambiguous scene.
Example Six
S601, collecting voice information of the user;
S602, parsing the voice information to obtain initial semantic information;
S603, acquiring gesture action information of the user;
S604, acquiring an image of the target area according to the gesture action information;
S605, performing image processing on the image of the target area and recognizing the text information in it;
S606, obtaining the target semantics of the user according to the initial semantic information and the text information of the target area.
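The six steps S601 to S606 can be sketched as a small pipeline. Every component below is a hypothetical stub (the `asr`, `point_to_region`, and `ocr` callables stand in for real speech, gesture, and text recognizers), shown only to make the data flow concrete.

```python
from dataclasses import dataclass

@dataclass
class Recognizer:
    """Stub pipeline mirroring steps S601-S606; the component
    callables are illustrative stand-ins, not real device APIs."""
    asr: callable              # S601-S602: speech -> initial semantics
    point_to_region: callable  # S603-S604: gesture -> target-area image
    ocr: callable              # S605: target-area image -> text

    def target_semantics(self, audio, gesture):
        initial = self.asr(audio)               # partial utterance
        region = self.point_to_region(gesture)  # pointed-at area
        text = self.ocr(region)                 # recognized text
        # S606: fuse the partial utterance with the pointed-at text
        return initial.replace("XX", text)

r = Recognizer(asr=lambda a: "what do XX read",
               point_to_region=lambda g: "img",
               ocr=lambda img: "grape")
result = r.target_semantics("audio", "gesture")
```

In a real device each stub would wrap an actual recognizer; the fusion step here is the simple placeholder substitution described in the embodiment.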
This embodiment details how the text information of the target area is obtained from the user's gesture action information. Specifically, a gesture action image of the user can be captured by the camera, and the image of the specific position (the target area) the user points to is then obtained from the direction of the gesture. The image of that specific position (the target area) is then processed, and the text information of the target area is recognized.
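One plausible way to derive the target area from the gesture direction is to project a short distance past the fingertip along the pointing vector and crop a box around that point. The offset, box size, and function name below are illustrative assumptions only.

```python
def target_region(fingertip, direction, img_w, img_h, box=120):
    """Estimate the pixel region the finger points at.
    fingertip: (x, y) fingertip position; direction: unit vector of
    the pointing direction. The fixed projection distance and box
    size are assumed values for illustration."""
    fx, fy = fingertip
    dx, dy = direction
    # project past the fingertip along the pointing direction
    cx, cy = fx + dx * box, fy + dy * box
    half = box // 2
    # clamp the crop box to the image bounds
    x0 = max(0, min(img_w - box, int(cx - half)))
    y0 = max(0, min(img_h - box, int(cy - half)))
    return x0, y0, x0 + box, y0 + box
```

The returned box would then be cropped from the camera frame and handed to an OCR engine to recover the text of the target area.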
Example Seven
Based on the same technical concept, the present invention further discloses a semantic recognition apparatus that can recognize the real semantics of the user by the semantic recognition method of the present invention. Specifically, as shown in fig. 7, the seventh embodiment of the present invention includes:
the voice acquisition module 10 is used for acquiring voice information of a user;
the voice recognition module 20 is configured to parse the voice information to obtain initial semantic information;
the information acquisition module 30 is configured to acquire gesture action information of the user, and acquire text information of a target area according to the gesture action information;
and the control processing module 40 is configured to obtain the target semantic meaning of the user according to the initial semantic meaning information and the text information of the target area.
In this embodiment, the voice information of the user is collected by the voice collecting module 10, and the voice information is then parsed by the voice recognition module 20 to obtain the initial semantics. The information acquisition module 30 obtains the user's gesture action, determines the target area, acquires the image information of that area, and analyzes the image to obtain the corresponding text information. Finally, the control processing module 40 obtains the target semantics of the user from the initial semantic information produced by the voice recognition module 20 and the text information of the target area produced by the information acquisition module 30. For example, a user assists learning with an intelligent voice device: the device collects the user's voice information while its camera captures the user's finger-click actions during learning. After the learning area the question refers to is determined from those actions, the text in that area is recognized and analyzed for intent, combined with the semantics of the earlier voice input to obtain the real semantics, and the user's real intention in that scene is given.
In this embodiment, the initial semantic information parsed from the voice information is fused with the text information of the target area obtained afterwards, so the real intention of the user can be obtained and a corresponding response given. The semantic recognition device of this embodiment overcomes the limitation of voice-only input: on the basis of voice input, the text information of the target area is acquired through the gesture action, and combining the two recovers the real semantics of the user more accurately.
The information acquisition module 30 may obtain the user's gesture information by shooting with a camera, which can stay in an on state throughout the use of the voice device (with the semantic recognition device built in) and record video of the user and the user's learning area. When voice information is collected, frames of the recorded video are extracted at the collection time to obtain gesture action images; the gesture is recognized, and the image of the specific topic in the learning area the user points to is obtained. Text recognition is then performed on the topic image to obtain the corresponding text information, which is combined with the initial semantics of the earlier voice information to obtain the real semantics of the user and give a corresponding response. Of course, the camera of the intelligent voice device need not stay on all the time: it can be started to shoot the user's gesture action and the image of the corresponding learning area only after voice information has been collected, and return to its previous dormant state once shooting is finished.
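Extracting "the frame at the collection time" from a continuously recording camera can be sketched as a nearest-timestamp lookup over the buffered frames. The function name and buffering strategy are assumptions for illustration.

```python
def frame_at(timestamps, capture_time):
    """Return the index of the buffered video frame whose timestamp
    is closest to the moment the voice was captured. A real device
    would index a ring buffer of frames; here we only pick the index."""
    return min(range(len(timestamps)),
               key=lambda i: abs(timestamps[i] - capture_time))
```

For a buffer recorded at 2 fps, a voice event at t = 1.1 s would select the frame taken at t = 1.0 s.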
Example Eight
In this embodiment, on the basis of the seventh embodiment, a semantic judgment module 50 is added. Specifically, as shown in fig. 8, the semantic recognition apparatus of the present invention further includes:
a semantic judging module 50, configured to judge whether the initial semantic information is missing semantic information according to the initial semantic information analyzed by the voice recognition module 20;
the information obtaining module 30 is further configured to obtain gesture action information of the user when it is determined that the initial semantic information is missing semantic information.
Compared with the seventh embodiment, this embodiment adds the semantic judgment module 50, which determines whether the initial semantics are missing semantic information and thus whether to trigger the information acquisition module 30 to acquire the user's gesture information. Gesture action information is acquired through the information acquisition module 30 only when the semantic judgment module 50 determines that the initial semantic information belongs to missing semantic information; acquisition of gesture action information is therefore conditionally triggered, which greatly saves power.
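The conditional trigger can be sketched as a small gate: the camera path runs only when the semantics are judged missing. All callables here are hypothetical stand-ins for the real modules.

```python
def handle_utterance(initial, is_missing, capture_gesture, fuse):
    """Gate the power-hungry camera path behind the missing-semantics
    judgment. `is_missing` stands in for module 50, `capture_gesture`
    for module 30, and `fuse` for module 40; all are assumed names."""
    if not is_missing(initial):
        return initial               # complete semantics: answer directly
    region_text = capture_gesture()  # camera started only on demand
    return fuse(initial, region_text)

direct = handle_utterance("what is 14+25 equal to?",
                          lambda s: False, lambda: "", lambda a, b: a)
fused = handle_utterance("I like XX",
                         lambda s: True, lambda: "grapes",
                         lambda a, b: a.replace("XX", b))
```

In the first call the camera callable is never invoked; in the second, the captured text completes the utterance.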
Specifically, suppose a user reading with the aid of an intelligent voice device (with the semantic recognition device built in) sees the sentence "I like grapes" but does not know the word "grapes", so he asks for help by saying "what do these two words read". The voice collecting module 10 collects the speech, and after parsing by the voice recognition module 20 the device knows the user is asking how two words are pronounced; but which two words are meant cannot be determined from the voice information alone, so the semantic judgment module 50 judges from the initial semantic information that the semantics are missing. The camera of the information acquisition module 30 is then started to shoot the user's gesture: the image of the user's finger pointing at the word "grapes" is captured, image processing and text recognition are performed on that area to obtain the text "I like grapes", and the control processing module 40 combines this with the user's earlier query "what do these two words read" to obtain the real semantics, namely that the user does not know how to read "grapes", and then gives the corresponding response telling the user how "grapes" is read. In this way the intelligent voice device can help the user with unknown words and assist his reading.
Of course, the user may express the same meaning in other ways. For example, when the user sees the sentence "I like grapes" and does not know the word "grapes", he may say only "I like--". The intelligent voice device (with the semantic recognition device built in) collects this speech, and the semantics obtained after parsing are "I like--"; this initial semantics obviously cannot completely express the user's intention, so it is judged to be missing semantics according to the initial semantic information. The camera is then started to shoot the user's gesture: the image of the finger pointing at the sentence "I like grapes" is processed and recognized to obtain the text "I like grapes", which is combined with the earlier semantics "I like--" to obtain the real semantics, namely that the user does not know how to read "grapes"; the corresponding response can then be given, telling the user that the word after "I like" is "grapes", and outputting by voice: "I like grapes". Thus the user only needs to read aloud the words he knows and point at the unknown one, and the intelligent voice device can understand his intention and help him learn the unknown word, facilitating reading and comprehension.
Preferably, the control processing module 40 in the semantic recognition device of the present invention is further configured to obtain the target semantics of the user directly from the initial semantic information when the semantic judgment module 50 determines that the initial semantic information is not missing semantic information.
When the semantic judgment module 50 determines that the initial semantic information is not missing semantic information, there is no need to trigger the information acquisition module 30 to acquire the user's gesture action information: the initial semantic information parsed from the user's voice already yields the user's real semantics. For example, suppose the collected voice information is "what is 14+25 equal to?". Because this voice information completely expresses the user's real intention and its semantics are complete, the initial semantic information parsed from it does not belong to missing semantic information, and the target semantics, namely the result of 14+25, is obtained directly from the initial semantic information. There is then no need to trigger the capture of the user's gesture action image; the real intention is obtained directly from the parsed initial semantics, and a timely response is given: 14+25 equals 39.
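The direct-answer path for a complete utterance can be sketched with a narrow pattern match; the pattern below handles only the sum example from the text and is purely illustrative.

```python
import re

def answer_if_complete(utterance):
    """If the utterance already carries complete semantics (here, a
    simple sum), answer it directly without any gesture capture.
    Returns None when the gesture-assisted flow is needed instead."""
    m = re.fullmatch(r"what is (\d+)\s*\+\s*(\d+) equal to\?", utterance)
    if not m:
        return None  # missing or unsupported: fall back to gesture flow
    a, b = int(m.group(1)), int(m.group(2))
    return f"{m.group(1)}+{m.group(2)} equals {a + b}"
```

A production system would route through a full NLU component; the point is only that a complete utterance never touches the camera path.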
Example Nine
In this embodiment, on the basis of the eighth embodiment, the semantic determining module 50 is refined, specifically, as shown in fig. 9, the semantic determining module 50 includes:
the searching submodule 51 is configured to search whether the initial semantic information includes preset reminding information;
the determining submodule 52 is configured to determine that the initial semantic information is missing semantic information when the searching submodule 51 finds that the initial semantic information includes preset reminding information.
Specifically, suppose the preset reminder is "Hello, Little Genius!". When the user cannot completely express the real semantics by voice alone, he only needs to say "Hello, Little Genius" first and may then continue speaking, or say nothing more and indicate directly by gesture. The voice collecting module 10 collects the user's voice information, e.g. "Hello, Little Genius! What does this word read?", and the voice recognition module 20 recognizes it to obtain the initial semantics. The searching sub-module 51 of the semantic judgment module 50 then searches the initial semantic information for the reminder "Hello, Little Genius"; once it is found, the judging sub-module 52 judges the initial semantic information to be missing semantic information, so the information acquisition module 30 is triggered to acquire the subsequent gesture action information and, from it, the text information of the target area to assist in recognizing the user's target semantics. One or more preset reminders can be set. The semantic judgment module 50 of this embodiment is easy to implement, simple and convenient.
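The searching sub-module's check reduces to a substring search over a configurable set of reminder phrases. The phrase list and function name below are assumptions (the actual wake phrase is device-specific).

```python
# one or more preset reminders may be configured (assumed example phrase)
PRESET_REMINDERS = ("Hello, Little Genius",)

def is_missing_by_reminder(initial_semantics: str) -> bool:
    """Judge missing semantics by searching the parsed utterance for a
    preset reminder phrase, as sub-modules 51/52 do. Case-insensitive
    matching is an implementation choice, not mandated by the text."""
    text = initial_semantics.lower()
    return any(p.lower() in text for p in PRESET_REMINDERS)
```

Any utterance containing the phrase is routed to the gesture-assisted flow; everything else is treated as complete.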
Of course, judging whether the initial semantic information is missing semantic information may also be done in other, more intelligent ways. For example, after the voice recognition module 20 parses the voice information into initial semantic information, the semantic judgment module 50 intelligently judges from those semantics whether the voice information can completely express the user's real intention. Suppose the parsed initial semantics are "what does this word read": the device cannot tell which word "this word" refers to, so it cannot know which word the user actually wants read and cannot respond. In such a case, since no response can be given from the voice information alone, the initial semantic information is missing semantic information (which word is meant is the missing part), and the user's gesture action is needed to assist recognition.
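A heuristic version of this "more intelligent" judgment might look for demonstratives without a referent or for an utterance that trails off. The cue lists below are assumptions for illustration, not a complete NLU model.

```python
def looks_missing(utterance: str) -> bool:
    """Heuristic missing-semantics check: a demonstrative with no
    referent in the speech, or an utterance that trails off, suggests
    the user is pointing at something. Cue lists are illustrative."""
    cues = ("this word", "these two words", "that word")
    trails_off = utterance.rstrip().endswith(("-", "--", "..."))
    return trails_off or any(c in utterance.lower() for c in cues)
```

A real system would use a trained completeness classifier; the heuristic only demonstrates the decision the module makes.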
Preferably, as shown in fig. 9, on the basis of any one of the above device embodiments, this embodiment refines the control processing module 40 of the semantic recognition device, and the control processing module 40 includes:
an expression generation submodule 41, configured to generate a missing semantic regular expression according to the initial semantic information;
the matching submodule 42 is used for matching the missing semantic regular expression according to the text information of the target area;
and the semantic obtaining submodule 43 is configured to obtain the target semantic of the user according to the successfully matched semantic regular expression.
This embodiment explains the control processing module 40 in detail. Specifically, after the voice recognition module 20 obtains the initial semantic information and the information acquisition module 30 obtains the text information of the target area, the expression generation sub-module 41 of the control processing module 40 first generates a missing-semantic regular expression from the initial semantic information. For example, if the collected voice information of the user is "what do these two words read", the initial semantics cannot tell which two words are meant, so that part is missing and the regular expression "what does XX read" can be generated. The matching sub-module 42 then matches the missing-semantic regular expression against the text information of the target area pointed to by the user's finger, e.g. "grape", so that the semantic acquisition sub-module 43 can obtain the target semantics of the user, namely how "grape" is read.
Similarly, instead of asking a query such as "what do these two words read", the user may read aloud only the words he recognizes, stop at the unknown word, and indicate by gesture that help is requested. For example, when the user sees the sentence "I like grapes" and does not know the word "grapes", he says "I like--". The intelligent voice device collects this speech and parses it into the semantics "I like--"; the initial semantics obviously fail to completely express the user's intention and the sentence is incomplete, so the initial semantic information is judged to be missing semantics. A missing-semantic regular expression is then generated from the initial semantic information: "I like XX". From the captured image of the target area pointed to by the user's gesture, the text is recognized as "I like grapes", which is matched against the regular expression "I like XX" to obtain the complete semantic sentence "I like grapes". The device thus informs the user that the word after "I like" is "grapes" and may output by voice: "I like grapes".
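Sub-modules 41 to 43 can be sketched end to end: generate the pattern, match it against the OCR'd sentence, and phrase a reply. The function name, placeholder convention, and reply wording are assumptions for illustration.

```python
import re

def complete_and_answer(partial: str, ocr_text: str):
    """Fill the missing slot in the partial utterance from the OCR'd
    sentence and phrase a reply, mirroring sub-modules 41-43.
    Returns None if the pointed-at text does not match the pattern."""
    # 41: generate the missing-semantic regular expression
    pattern = re.escape(partial).replace("XX", r"(?P<slot>\S+)")
    # 42: match it against the text of the target area
    m = re.search(pattern, ocr_text)
    if not m:
        return None
    # 43: obtain the target semantics and phrase the reply
    word = m.group("slot")
    return f"The word after '{partial.replace('XX', '').strip()}' is '{word}'."
```

For the running example the slot resolves to "grapes" and the reply names the unknown word.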
Preferably, as shown in fig. 9, the information obtaining module 30 in the semantic recognition device according to any of the above embodiments includes:
the image shooting submodule 31 is configured to obtain gesture action information of the user, and obtain an image of a target area according to the gesture action information;
an image processing submodule 32 for performing image processing on the image of the target area;
and the image recognition submodule 33 is used for recognizing the character information in the image of the target area.
This embodiment refines the information acquisition module 30. Specifically, the gesture action image of the user may be obtained through the image capturing sub-module 31, e.g. a camera, and the image of the specific position (the target area) the user points to is then obtained from the direction of the gesture. The image processing sub-module 32 performs image processing on the image of that specific position (the target area), and the image recognition sub-module 33 recognizes the text information of the target area.
The semantic recognition device can be built into various intelligent devices, for example a voice device that assists a user in learning. Such a device collects the user's voice information and obtains the user's initial semantics; by starting its camera it captures the user's finger-click actions during learning; after determining from those actions the learning area the question refers to, it recognizes and analyzes the text in that area, matches the semantic slot in the missing-semantic regular expression against the initial semantics of the earlier voice input, fills the slot with the result of the text analysis, obtains the real semantics, and gives the user's real intention in an ambiguous scene.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method of semantic identification, comprising:
collecting voice information of a user;
analyzing the voice information to obtain initial semantic information;
acquiring gesture action information of the user;
acquiring character information of a target area according to the gesture action information;
acquiring the target semantics of the user according to the initial semantic information and the character information of the target area; the method specifically comprises the following steps:
generating a missing semantic regular expression according to the initial semantic information;
matching the missing semantic regular expression according to the character information of the target area;
and acquiring the target semantics of the user according to the successfully matched semantic regular expression.
2. The semantic recognition method according to claim 1, further comprising, after the obtaining the initial semantic information:
judging whether the initial semantic information is missing semantic information or not;
and when the initial semantic information is judged to be missing semantic information, acquiring gesture action information of the user.
3. The semantic recognition method according to claim 2, wherein the determining whether the initial semantic information is missing semantic information comprises:
and judging whether the initial semantic information contains preset reminding information, if so, judging that the initial semantic information is missing semantic information.
4. The semantic recognition method according to any one of claims 1 to 3, wherein the obtaining text information of the target area according to the gesture action information comprises:
acquiring an image of a target area according to the gesture action information;
and carrying out image processing on the image of the target area, and identifying character information in the image of the target area.
5. A semantic recognition apparatus, comprising:
the voice acquisition module is used for acquiring voice information of a user;
the voice recognition module is used for analyzing the voice information and acquiring initial semantic information;
the information acquisition module is used for acquiring gesture action information of the user and acquiring character information of a target area according to the gesture action information;
the control processing module is used for acquiring the target semantics of the user according to the initial semantic information and the character information of the target area;
the control processing module comprises:
the expression generation submodule is used for generating a missing semantic regular expression according to the initial semantic information;
the matching submodule is used for matching the missing semantic regular expression according to the character information of the target area;
and the semantic obtaining submodule is used for obtaining the target semantics of the user according to the successfully matched semantic regular expression.
6. The semantic recognition device according to claim 5, further comprising:
the semantic judgment module is used for judging whether the initial semantic information is missing semantic information or not according to the initial semantic information analyzed by the voice recognition module;
the information acquisition module is further configured to acquire gesture action information of the user when it is determined that the initial semantic information is missing semantic information.
7. The semantic recognition device according to claim 6, wherein the semantic determination module comprises:
the searching submodule is used for searching whether the initial semantic information contains preset reminding information or not;
and the judging submodule is used for judging the initial semantic information as missing semantic information when the searching submodule searches that the initial semantic information contains preset reminding information.
8. The semantic recognition device according to any one of claims 5 to 7, wherein the information acquisition module comprises:
the image shooting submodule is used for acquiring gesture action information of the user and acquiring an image of a target area according to the gesture action information;
the image processing submodule is used for carrying out image processing on the image of the target area;
and the image identification submodule is used for identifying the character information in the image of the target area.
CN201910186422.5A 2019-03-13 2019-03-13 Semantic recognition method and device Active CN109933198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910186422.5A CN109933198B (en) 2019-03-13 2019-03-13 Semantic recognition method and device


Publications (2)

Publication Number Publication Date
CN109933198A CN109933198A (en) 2019-06-25
CN109933198B true CN109933198B (en) 2022-04-05

Family

ID=66986980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910186422.5A Active CN109933198B (en) 2019-03-13 2019-03-13 Semantic recognition method and device

Country Status (1)

Country Link
CN (1) CN109933198B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110767219B (en) * 2019-09-17 2021-12-28 中国第一汽车股份有限公司 Semantic updating method, device, server and storage medium
CN112309387A (en) * 2020-02-26 2021-02-02 北京字节跳动网络技术有限公司 Method and apparatus for processing information
CN111324206B (en) * 2020-02-28 2023-07-18 重庆百事得大牛机器人有限公司 System and method for identifying confirmation information based on gesture interaction
CN111353034B (en) * 2020-02-28 2020-12-11 重庆百事得大牛机器人有限公司 Legal fact correction system and method based on gesture collection
CN111881691A (en) * 2020-06-15 2020-11-03 惠州市德赛西威汽车电子股份有限公司 System and method for enhancing vehicle-mounted semantic analysis by utilizing gestures
CN112863508A (en) * 2020-12-31 2021-05-28 思必驰科技股份有限公司 Wake-up-free interaction method and device

Citations (2)

Publication number Priority date Publication date Assignee Title
JP2018041155A (en) * 2016-09-05 2018-03-15 株式会社野村総合研究所 Voice order reception system
CN109343817A (en) * 2018-09-10 2019-02-15 阿里巴巴集团控股有限公司 The application method and device and electronic equipment of Self-Service

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN104217197B (en) * 2014-08-27 2018-04-13 华南理工大学 A kind of reading method and device of view-based access control model gesture
CN106933783A (en) * 2015-12-31 2017-07-07 远光软件股份有限公司 A kind of method and device on the intelligent extraction date from text
CN109192204B (en) * 2018-08-31 2021-05-11 广东小天才科技有限公司 Voice control method based on intelligent equipment camera and intelligent equipment
CN109344231B (en) * 2018-10-31 2021-08-17 广东小天才科技有限公司 Method and system for completing corpus of semantic deformity




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant