CN112965603A - Method and system for realizing man-machine interaction

Info

Publication number
CN112965603A
CN112965603A
Authority
CN
China
Prior art keywords
instruction
statement
input
candidate entity
entity
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number
CN202110325478.1A
Other languages
Chinese (zh)
Inventor
张纯青 (Zhang Chunqing)
Current Assignee
Nanjing Avatarmind Robot Technology Co ltd
Original Assignee
Nanjing Avatarmind Robot Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Avatarmind Robot Technology Co ltd
Priority: CN202110325478.1A
Publication: CN112965603A

Classifications

    • G06F - Electric digital data processing (G: Physics; G06: Computing, calculating or counting)
    • G06F 3/011 - Input arrangements for interaction between user and computer; arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 16/3329 - Querying unstructured textual data; natural language query formulation or dialogue systems
    • G06F 16/367 - Creation of semantic tools; ontology
    • G06F 40/211 - Natural language analysis; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/242 - Lexical tools; dictionaries
    • G06F 40/289 - Recognition of textual entities; phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30 - Semantic analysis of natural language data

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Computational Linguistics
  • Artificial Intelligence
  • General Health & Medical Sciences
  • Health & Medical Sciences
  • Audiology, Speech & Language Pathology
  • Data Mining & Analysis
  • Databases & Information Systems
  • Human Computer Interaction
  • Mathematical Physics
  • Life Sciences & Earth Sciences
  • Animal Behavior & Ethology
  • Machine Translation

Abstract

The invention discloses a method for realizing man-machine interaction, which comprises the following steps: identifying whether a received input instruction is of the behavior instruction type that drives the robot to change its limb actions; if so, analyzing the input instruction to obtain the corresponding keyword; and matching the keyword against a pre-established instruction knowledge graph to obtain an execution instruction. The method and the device improve the accuracy and efficiency of recognizing the user's input instructions, lower the threshold for configuring instruction templates, and simplify daily maintenance.

Description

Method and system for realizing man-machine interaction
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and a system for realizing human-computer interaction.
Background
In recent years, with breakthroughs in artificial intelligence research across numerous fields, artificial intelligence has gradually come into wide use in homes, business, medical treatment, education and other fields.
While interacting with a robot, a user often needs to utter behavior instructions that make the robot act or change its state, for example: move to a certain position, raise the right hand, speak louder. This requires the robot to recognize such action commands accurately. Traditional methods for recognizing robot behavior instructions are mainly rule-template-based or text-classification-based. A rule-template-based method converts the instruction input by the user into an abstract template expression and then performs matching; since template definition rules differ with business requirements, a different template language must be formulated for each business, only professional personnel can complete this work, and the threshold for configuring templates is therefore high. A text-classification-based method needs to collect a large amount of labeled behavior instructions as training data to train a corresponding text classifier; it depends on the quality and quantity of the training data, and daily maintenance is difficult. Moreover, in practical applications the commands initiated by users are diverse and complex, so the robot's accuracy in recognizing behavior instructions is low.
Disclosure of Invention
The invention aims to provide a method and a system for realizing man-machine interaction, so as to solve the prior-art technical problems of a low recognition rate for behavior instructions, a high threshold for configuring instruction templates, and difficult daily maintenance.
The technical scheme provided by the invention is as follows:
The invention provides a method for realizing man-machine interaction, which comprises the following steps:
identifying whether a received input instruction is of the behavior instruction type that drives the robot to change its limb actions;
if so, analyzing the input instruction to obtain the corresponding keyword;
and matching the keyword against a pre-established instruction knowledge graph to obtain an execution instruction.
Further, before identifying whether the received input instruction is of the behavior instruction type, the method further includes:
creating an instruction knowledge graph, wherein the instruction knowledge graph comprises an entity instruction set, an attribute statement set, and the association relations between each entity instruction and its attribute statements;
the attribute statements comprise descriptive statements and example statements; a descriptive statement is a statement that explains preset features of the entity instruction, the preset features comprising at least one of environmental features, behavior features and time features; an example statement is a statement that differs from the entity instruction in expression but has the same semantic meaning.
Further, the step of matching the keyword against a pre-established instruction knowledge graph to obtain an execution instruction comprises:
screening out the target attribute statements that match the keyword from the attribute statement set of the instruction knowledge graph, and obtaining the candidate entity instructions corresponding to the target attribute statements;
if the number of candidate entity instructions is equal to one, determining that candidate entity instruction to be the execution instruction;
if the number of candidate entity instructions is greater than one, prompting the user to input a selection, and determining the candidate entity instruction selected by the user to be the execution instruction according to the selection information.
Further, after prompting the user to input a selection, determining the candidate entity instruction selected by the user to be the execution instruction according to the selection information includes:
judging whether the selection information is acquired within a preset duration;
if not, executing a termination operation or determining the optimal candidate entity instruction to be the execution instruction, the optimal candidate entity instruction being the candidate entity instruction with the highest degree of matching with the keyword;
and if so, determining the execution instruction according to the selection information.
Further, analyzing the input instruction to obtain the corresponding keyword includes:
performing semantic understanding processing on the input instruction to obtain a word vector to be recognized;
and inputting the word vector to be recognized into a pre-established target word vector model, and outputting the keyword corresponding to the input instruction.
The invention also provides a system for realizing human-computer interaction, which comprises:
the identification module, used for identifying whether a received input instruction is of the behavior instruction type that drives the robot to change its limb actions;
the analysis module, used for analyzing the input instruction to obtain the corresponding keyword when the input instruction is of the behavior instruction type;
and the matching module, used for matching the keyword against a pre-established instruction knowledge graph to obtain an execution instruction.
Further, the system for implementing human-computer interaction further comprises:
the creating module, used for creating an instruction knowledge graph, wherein the instruction knowledge graph comprises an entity instruction set, an attribute statement set, and the association relations between each entity instruction and its attribute statements;
the attribute statements comprise descriptive statements and example statements; a descriptive statement is a statement that explains preset features of the entity instruction, the preset features comprising at least one of environmental features, behavior features and time features; an example statement is a statement that differs from the entity instruction in expression but has the same semantic meaning.
Further, the matching module comprises:
the screening submodule, used for screening out the target attribute statements that match the keyword from the attribute statement set of the instruction knowledge graph, and obtaining the candidate entity instructions corresponding to the target attribute statements;
the processing submodule, used for determining the candidate entity instruction to be the execution instruction if the number of candidate entity instructions is equal to one;
the processing submodule, further used for prompting the user to input a selection if the number of candidate entity instructions is greater than one, and determining the candidate entity instruction selected by the user to be the execution instruction according to the selection information.
Further, the processing submodule includes:
the judging unit, used for judging, after the user has been prompted to input a selection because the number of candidate entity instructions is greater than one, whether the selection information is acquired within a preset duration;
the output unit, used for determining a preset termination instruction or the optimal candidate entity instruction to be the execution instruction when the selection information is not acquired within the preset duration, the optimal candidate entity instruction being the candidate entity instruction with the highest degree of matching with the keyword;
the output unit, further used for determining the execution instruction according to the selection information when the selection information is acquired within the preset duration.
Further, the analysis module comprises:
the first processing submodule is used for carrying out semantic understanding processing on the input instruction to obtain a word vector to be recognized;
and the second processing submodule is used for inputting the word vector to be recognized into a pre-established target word vector model and outputting the keyword corresponding to the input instruction.
The method and the system for realizing man-machine interaction provided by the invention can bring the following beneficial effects:
1. The robot can better understand and correctly execute input instructions, effectively improving the accuracy and efficiency of behavior instruction recognition.
2. After simple learning, a user can construct the instruction knowledge graph according to the actual scene, effectively lowering the threshold for configuring instruction templates.
3. By setting a small number of attribute values, a user can match a large number of execution instructions, effectively improving the generalization effect of instruction attribute settings and keeping daily maintenance simple.
Drawings
The essential features, technical features, advantages and implementations of the present invention will be further described below in a clearly understandable manner, with reference to the preferred embodiments and the accompanying drawings.
FIG. 1 is a flow chart of one embodiment of a method of implementing human-computer interaction of the present invention;
FIG. 2 is a schematic diagram of an instruction knowledge-graph of the present invention;
FIG. 3 is another schematic diagram of an instruction knowledge-graph of the present invention;
FIG. 4 is a flow chart illustrating the identification of an input command according to the present invention;
FIG. 5 is a block diagram of one embodiment of a system for implementing human-computer interaction of the present invention.
The reference numbers illustrate:
51. identification module; 52. analysis module; 53. matching module.
Detailed Description
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the following description is made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention; for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention and do not represent the actual structure of a product. In addition, to make the drawings concise and easy to understand, components having the same structure or function in some of the drawings are only schematically illustrated or only partially labeled. In this document, "one" means not only "only one" but also "more than one".
Any process or method descriptions in flow charts, or otherwise described herein, may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Alternate implementations, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, are also within the scope of the preferred embodiments of the present invention, as would be understood by those reasonably skilled in the art.
In an embodiment of the method of the present invention, as shown in FIG. 1, the executing subject of the human-computer interaction scheme may be a robot or a server. The method of realizing human-computer interaction comprises the following steps:
s11, identifying whether the received input instruction is a behavior instruction type for driving the robot to change limb actions;
in practical application, a user can directly or indirectly send an input instruction to a robot in a mode of sending the input instruction by using a voice signal or in a mode of editing input characters in an input box of an intelligent terminal to generate the input instruction. The transmission mode of the voice signal includes but is not limited to the following forms: the voice signal is initiated by the user on site by using the daily spoken habit, or the voice signal of the user is recorded by the intelligent terminal and then played by the loudspeaker to initiate the voice signal.
When the executing subject of the whole human-computer interaction scheme is the server, the robot or the intelligent terminal only receives the user's input instruction and forwards it to the server; the server recognizes and analyzes the input instruction to obtain an execution instruction and sends the execution instruction to the robot, and the robot responds interactively to the user according to the received execution instruction. Having the robot acquire the input instruction and hand it to the server for analysis and recognition lowers the hardware requirements on the robot and thus reduces its manufacturing cost.
When the executing subject of the whole human-computer interaction scheme is the robot, the robot both receives the user's input instruction and performs the recognition and analysis work itself, so it can respond interactively to the user directly according to the execution instruction obtained from its own analysis. A robot that acquires, analyzes and recognizes input instructions by itself can interact even when offline, that is, disconnected from the network, while still analyzing and recognizing input instructions accurately and efficiently; moreover, an instruction knowledge graph created on the robot itself better satisfies the user's personalized requirements.
Accordingly, after the robot or the server obtains the input instruction, it performs semantic recognition on the input instruction to determine the instruction type, where the instruction types comprise the voice interaction type and the behavior instruction type. If the input instruction is recognized to include a word or phrase describing an action, the input instruction belongs to the behavior instruction type; conversely, if it includes no word or phrase describing an action, the input instruction belongs to the voice interaction type.
For example, input instructions of the behavior instruction type include, but are not limited to, instructions that drive the robot to change the actions of its limb joints, such as meeting guests, watching television, or turning off the light in the kitchen. The limb joints include, but are not limited to, arm joints, leg joints, finger joints and neck joints.
In the specific implementation process, whether at home or outdoors, a user can issue an input instruction as a voice signal, or send an input instruction to the server or the robot by editing text in the input box of an intelligent terminal; the robot or the server then performs semantic recognition on the input instruction, judges whether it includes a word or phrase describing an action, and thereby judges whether the acquired input instruction is of the behavior instruction type. The words or phrases describing an action are extracted by segmenting the textual form of the input instruction with a word-segmentation tool, performing part-of-speech recognition on the segmentation result, and judging whether the segmentation result contains verbs or words found in a pre-constructed verb dictionary. If at least one verb, or at least one word matching the verb dictionary, exists in the segmentation result, the input instruction is determined to belong to the behavior instruction type; conversely, if the segmentation result contains no verb and no word matching the verb dictionary, the input instruction is determined not to belong to the behavior instruction type.
For example, after acquiring an input instruction sent by a user, the robot or the server extracts the words or phrases describing an action from the input instruction through its semantic recognition function; if such a word is extracted from the input instruction "go to the kitchen and turn off the light", it confirms that the user's input instruction is a behavior instruction, as sketched below.
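The following minimal sketch illustrates this check. The jieba segmenter and its part-of-speech flags stand in for the unnamed word-segmentation tool, and VERB_DICT is a hypothetical pre-constructed verb dictionary; neither is specified by the patent.

    import jieba.posseg as pseg  # POS-tagging segmenter; flags starting with 'v' mark verbs

    VERB_DICT = {"移动", "关", "开", "举起"}  # hypothetical pre-constructed verb dictionary

    def is_behavior_instruction(text: str) -> bool:
        # The instruction is of the behavior instruction type if any segmented
        # word is a verb or appears in the verb dictionary; otherwise it is
        # treated as the voice interaction type.
        for pair in pseg.cut(text):
            if pair.flag.startswith("v") or pair.word in VERB_DICT:
                return True
        return False

    print(is_behavior_instruction("去厨房把灯关了"))  # "go to the kitchen and turn off the light" -> True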
S12, if so, analyzing the input instruction to obtain the corresponding keyword;
and S13, matching the keyword against the pre-established instruction knowledge graph to obtain an execution instruction.
Specifically, referring to the foregoing embodiment: since the segmentation result obtained with the word-segmentation tool may include verbs, pronouns, nouns, prepositions, adjectives and the like, after the robot or the server has determined through the foregoing embodiment that the input instruction is of the behavior instruction type and has obtained the segmentation result of the input instruction, the pronouns, prepositions and adjectives in the segmentation result are removed, so that the keywords of the input instruction consist of one or more verbs and nouns. The robot or the server then matches the keywords against the instruction knowledge graph established in advance by the system to obtain an execution instruction.
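A minimal sketch of this part-of-speech filtering, again assuming jieba's tag set ('v' verbs, 'n' nouns, 'r' pronouns, 'p' prepositions, 'a' adjectives); the patent does not name a particular segmenter or tag set.

    import jieba.posseg as pseg

    def extract_keywords(text: str) -> list:
        # Keep verbs and nouns as keywords; pronouns ('r'), prepositions ('p')
        # and adjectives ('a') are dropped from the segmentation result.
        return [pair.word for pair in pseg.cut(text)
                if pair.flag.startswith(("v", "n"))]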
In this embodiment, after the robot or the server obtains an input instruction sent by a user and recognizes it as the behavior instruction type, it further analyzes the input instruction to obtain the keyword, and then matches the keyword against the instruction knowledge graph established in advance by the system to obtain an execution instruction. Therefore, even if the input instruction is not in standard natural language or has missing information, the robot or the server can still recognize it accurately, and the execution instruction matched to the input instruction accurately reflects the user's intention, effectively improving the accuracy of input instruction recognition. In addition, when the input instruction is not of the behavior instruction type but of the voice interaction type, the robot or the server forgoes matching the semantically recognized keyword against the instruction knowledge graph and instead controls the robot to hold a dialogue with the user using existing voice interaction technology, which is not described in detail herein.
Preferably, when the number of times the robot or the server has acquired an input instruction of the same behavior instruction type exceeds a preset number N, the input instruction of the behavior instruction type and the execution instruction matched to it are stored. When the robot acquires the stored input instruction for the (N+1)-th time, the corresponding execution instruction is looked up directly from the stored correspondence. For example, if the robot or the server receives the behavior instruction "go to the kitchen and turn off the light" more than a preset number of times (for example, twice), the input instruction "go to the kitchen and turn off the light" and the execution instruction "move to the kitchen" are stored as a corresponding pair. When the robot or the server receives the input instruction "go to the kitchen and turn off the light" again, the corresponding execution instruction "move to the kitchen" is retrieved directly from the stored data. The robot or the server then needs neither to recognize whether the input instruction is of the behavior instruction type nor to perform semantic recognition and instruction matching, which reduces its computational burden.
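A minimal sketch of this caching strategy follows; the class layout, the in-memory store and the value of N are illustrative assumptions rather than details from the patent.

    from collections import defaultdict

    N = 2  # preset number of recognitions before a pair is stored

    class InstructionCache:
        def __init__(self):
            self._counts = defaultdict(int)  # input instruction -> times seen
            self._cache = {}                 # input instruction -> execution instruction

        def lookup(self, input_instruction):
            # Return the stored execution instruction, or None if the input
            # instruction still has to go through recognition and matching.
            return self._cache.get(input_instruction)

        def record(self, input_instruction, execution_instruction):
            # Called after each full recognition pass; once the same behavior
            # instruction has been seen N times, the pair is stored so that the
            # (N+1)-th occurrence is answered directly from the cache.
            self._counts[input_instruction] += 1
            if self._counts[input_instruction] >= N:
                self._cache[input_instruction] = execution_instruction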
In another embodiment of the method of the present invention, before step S11 of identifying whether the received input instruction is of the behavior instruction type that drives the robot to change its limb actions, the method further includes:
s01, creating an instruction knowledge graph, wherein the instruction knowledge graph comprises an entity instruction set, an attribute statement set and an incidence relation between an entity instruction and each attribute statement;
the attribute statement comprises a descriptive statement and an example statement, the descriptive statement is a statement for explaining preset features in the entity instruction, the preset features comprise at least one of environmental features, behavior features and time features, and the example statement is a statement which is different from the entity instruction in expression mode and has the same semantic meaning.
Specifically, the instruction knowledge graph comprises a knowledge base in which the entity instruction set, the attribute statement set, and the association relations between entity instructions and their attribute statements are stored; one entity instruction and the attribute statements connected to it form an instruction relation chain, so the knowledge base stores a number of different instruction relation chains.
The invention improves on the traditional knowledge graph: the instruction knowledge graph is generated by creating entity instructions and adding corresponding descriptive statements and example statements to each entity instruction. The environmental features comprise article environmental features and orientation environmental features: an article environmental feature characterizes the articles in a designated space, and an orientation environmental feature characterizes the directional layout of a designated space. A time feature characterizes the time at which a designated space is used. A behavior feature characterizes the designated behaviors corresponding to a designated space. The robot or the server solidifies the instruction knowledge graph into two basic attributes, description and example, and abstracts the two expression forms of the instruction knowledge graph shown in FIG. 2:
1) The first expression form: [entity instruction] ---> [descriptive statement] ---> [keyword].
2) The second expression form: [entity instruction] ---> [example statement] ---> [keyword].
In practical use, the instruction relation chains created by the robot or the server may be created using only the first expression form, only the second expression form, or both.
Illustratively, as shown in FIG. 3, an example of an instruction relation chain is as follows:
Assume that one of the entity instructions is "move to the living room". The user stores descriptive statements for it such as "place with a television and a sofa, where one watches television" and "the room on the right-hand side when entering the door, where guests are met"; naturally, other descriptive statements, such as "place with a tea table in front of the door, where one tastes tea", may be added as required. The example statements the user stores for this entity instruction include "take the guest to sit down", and other example statements such as "turn on the television" can likewise be added as required. With this, the construction of the instruction relation chain whose entity instruction is "move to the living room" is complete. The instruction relation chains corresponding to the other entity instructions are built by the same process, filling out a rich knowledge base.
Continuing the example above: the article environmental features corresponding to the living room are "sofa, television"; the orientation environmental feature corresponding to the living room is "the room on the right-hand side when entering the door"; and the behavior features corresponding to the living room are "sitting, meeting guests, watching television", and so on.
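As an illustration, the following minimal sketch represents such an instruction relation chain as an in-memory structure; the patent does not prescribe a storage format, so the dataclass layout is an assumption, and the sample strings mirror the living-room example above.

    from dataclasses import dataclass, field

    @dataclass
    class EntityInstruction:
        name: str
        descriptions: list = field(default_factory=list)  # descriptive statements
        examples: list = field(default_factory=list)      # example statements

    living_room = EntityInstruction(
        name="move to the living room",
        descriptions=[
            "place with a television and a sofa, where one watches television",
            "the room on the right-hand side when entering the door, where guests are met",
        ],
        examples=["take the guest to sit down", "turn on the television"],
    )

    # The knowledge base stores one instruction relation chain per entity instruction.
    knowledge_base = {living_room.name: living_room}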
In this embodiment, constructing the instruction knowledge graph requires the user to provide only entity instructions and the attribute statements (descriptive and example statements) that express them, rather than an abstract template language, so after simple learning a user can construct the instruction knowledge graph by himself according to the actual scene, effectively lowering the instruction-template configuration threshold; meanwhile, a small number of attribute values suffices to match a large number of execution instructions, improving the generalization effect of input-instruction attribute settings and making daily maintenance of the instruction knowledge graph simpler.
In another embodiment of the method of the present invention, step S13 of matching the keyword against a pre-established instruction knowledge graph to obtain an execution instruction specifically includes:
S131, screening out the target attribute statements that match the keyword from the attribute statement set of the instruction knowledge graph, and obtaining the candidate entity instructions corresponding to the target attribute statements;
S132, if the number of candidate entity instructions is equal to one, determining that candidate entity instruction to be the execution instruction;
S133, if the number of candidate entity instructions is greater than one, prompting the user to input a selection, and determining the candidate entity instruction selected by the user to be the execution instruction according to the selection information.
In particular, an input instruction may be unclear or semantically ambiguous, so the robot or the server may obtain more than one candidate entity instruction by matching. Because the attribute statements in the instruction knowledge graph consist of a number of characteristic vocabulary words, the keywords obtained by semantic analysis of the input instruction can be matched against the characteristic vocabulary associated with each attribute statement, and the target attribute statements matching the keywords are screened out. After the robot or the server has matched the target attribute statements with the keywords, the candidate entity instructions corresponding to the target attribute statements are obtained through the association relations between attribute statements and entity instructions in the instruction knowledge graph. If the number of candidate entity instructions is equal to one, that candidate entity instruction is determined to be the execution instruction; if the number of candidate entity instructions is greater than one, the robot does not execute immediately but, through the robot or the server, prompts the user to make a selection, as sketched below.
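A minimal sketch of this screening step, reusing the EntityInstruction structure sketched earlier; the similarity function is left abstract, and the threshold value is an illustrative assumption.

    SIM_THRESHOLD = 0.7  # hypothetical matching threshold

    def screen_candidates(keywords, knowledge_base, similarity):
        # Match the keywords against every attribute statement; every entity
        # instruction with at least one sufficiently similar statement becomes
        # a candidate, sorted by its best matching degree (highest first).
        scored = []
        for entity in knowledge_base.values():
            statements = entity.descriptions + entity.examples
            best = max(similarity(" ".join(keywords), s) for s in statements)
            if best >= SIM_THRESHOLD:
                scored.append((best, entity))
        scored.sort(key=lambda t: t[0], reverse=True)
        return [entity for _, entity in scored]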
The selection items of the candidate entity instructions may be provided for the user through voice prompts or a touch-screen display, or through the application interface of the intelligent terminal; of course, only the most similar candidate entity instruction may be prompted, or the top few may be prompted in descending order of matching rate.
Exemplarily, assume six candidate entity instructions are obtained by matching. Presenting all six one by one as voice information would give a poor user experience and reduce the recognition efficiency of the input instruction; further screening the candidate entity instructions by their degree of matching with the keyword reduces the number of prompts among which the user must choose, effectively improving the efficiency of input instruction recognition.
In this embodiment of the invention, further screening the candidate entity instructions effectively improves the efficiency of input instruction recognition; at the same time, prompting the user to select among the candidate entity instructions lets the user choose the robot's final execution instruction according to his intention, effectively improving the accuracy of input instruction recognition.
In another embodiment of the method, after prompting the user to input a selection and before determining the candidate entity instruction selected by the user to be the execution instruction according to the selection information, the method further includes:
S1331, judging whether the selection information is acquired within a preset duration;
specifically, the user may be disturbed by various factors during the interaction with the robot or the server, such as noise near when the user sends voice on the street, or talking sound of the user with a third person during the interaction with the server or the robot at home, which may be received by the robot or the server, and may cause the candidate entity instruction to be selected incorrectly if the robot or the server does not process the candidate entity instruction. For example, when the selection item corresponding to a certain candidate entity instruction is the number "3", the robot or the server receives the information of the number "3" in noise or talk sound while waiting for the voice selection of the user, and the candidate entity instruction corresponding to the number "3" selected by the user may be mistakenly considered; the candidate entity instruction may not be user-prepared for selection.
In the implementation step, at this time, the robot or the server only judges whether the selection information of the user is acquired within the preset time length, and does not receive interference information such as noise or talk sound, so as to avoid that the candidate entity instruction is selected by mistake, thereby improving the accuracy of input instruction identification.
S1332, if not, executing a termination operation or determining the optimal candidate entity instruction to be the execution instruction, the optimal candidate entity instruction being the candidate entity instruction with the highest degree of matching with the keyword;
and S1333, if so, determining the execution instruction according to the selection information.
Specifically, in practical applications the user may be temporarily interrupted and fail to make a selection in time; for example, the user sends the input instruction "go to the bedroom and turn on the air conditioner" and is then interrupted by an incoming call before selecting. If the robot or the server did nothing in this situation, the recognition process for the input instruction would simply break off.
In this implementation step, after prompting the user to input a selection, the robot or the server judges whether the user's selection information is acquired within the preset duration: from the moment the prompt starts until the preset duration elapses, the robot or the server remains in a waiting-for-confirmation state, and if the selection information input by the user's manual operation is received within that time, the execution instruction is obtained from the candidate entity instructions according to the selection information. If no selection information has been received when the preset duration is reached, the termination operation is executed, or the optimal candidate entity instruction is determined to be the execution instruction, so that the recognition process of the input instruction is not interrupted and its efficiency is improved. At the same time, the robot can accurately execute the execution instruction matched to the user's input instruction according to the actual situation, effectively improving the user experience. A sketch of this wait-and-confirm logic follows.
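In the minimal sketch below, get_user_selection is a hypothetical helper that returns the index of the chosen option, or None when the preset duration elapses; the timeout value is illustrative.

    PRESET_TIMEOUT_S = 10.0  # illustrative preset duration

    def confirm_candidate(candidates, get_user_selection, terminate_on_timeout=False):
        # candidates are assumed sorted by matching degree, best first.
        selection = get_user_selection(timeout=PRESET_TIMEOUT_S)
        if selection is None:          # no selection within the preset duration
            if terminate_on_timeout:
                return None            # execute the termination operation
            return candidates[0]       # fall back to the optimal candidate
        return candidates[selection]   # determined from the selection information

Whether to terminate or to fall back to the optimal candidate on timeout is a configuration choice the method leaves open.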
In this embodiment of the invention, while waiting for the user to select a candidate entity instruction the robot or the server stays in the waiting-for-confirmation state for the preset duration, which prevents the recognition process from being interrupted and prevents candidate entity instructions from being selected by mistake. This effectively improves the accuracy of input instruction recognition, reduces the robot's instruction recognition error rate, reduces the power the robot would consume executing tasks from wrongly recognized instructions, improves the user's experience with the robot, and broadens the robot's adoption and market prospects.
In another embodiment of the present invention, step S12 further includes:
s121, performing semantic understanding processing on the input instruction to obtain a word vector to be recognized;
specifically, a Word vector is a general name of a group of language modeling and feature learning technologies in Word embedded Natural Language Processing (NLP), and Word2vec (Word to vector) is a common tool for converting words into a vector form, which can simplify the processing of text content into vector operation in a vector space, and represent the semantic similarity of texts by using the similarity in the vector space, and the obtained keywords have differences according to the differences of models.
The method requires a word vector model to be constructed in advance, following this flow: a dialog corpus is collected beforehand; it may include a number of dialog texts, each specifically a sentence. The current dialog text in the corpus is segmented to obtain a number of word vectors, which are arranged in their order of appearance in the dialog text to form a word vector sequence; repeating these steps yields a number of word vector sequences. Training then proceeds on these sequences: each word vector in a sequence is input into the initial word vector model, and the similarity between the vectors the model predicts for the word vectors of the same sequence must be large enough, that is, greater than a preset similarity threshold; if it is not, the parameters of the initial word vector model are adjusted so that the predicted similarity between the vectors of word vectors in the same sequence exceeds the preset similarity threshold. The initial word vector model is trained continually on the word vector sequences; when training has run a preset number of times, for example millions of iterations, or the training target has been met for a preset number of consecutive rounds, that is, the similarity of the vectors of all word vectors in the same sequence exceeds the preset threshold, the parameters of the initial word vector model are fixed, the final target word vector model is thereby determined, and training ends.
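A minimal sketch of this training flow, with gensim's Word2Vec standing in as an assumed implementation; the toy corpus, vector size and epoch count are illustrative, and gensim's internal training objective replaces the similarity-threshold loop described above.

    from gensim.models import Word2Vec

    # Each dialog corpus sentence, already segmented into a word vector sequence.
    corpus = [
        ["go", "to", "the", "kitchen", "and", "turn", "off", "the", "light"],
        ["move", "to", "the", "living", "room"],
        ["turn", "on", "the", "television"],
    ]

    # Training pulls the vectors of words sharing contexts closer together,
    # playing the role of the similarity-threshold criterion described above.
    model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                     min_count=1, epochs=50)
    model.save("target_word_vector_model.bin")  # the final target word vector model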
And S122, inputting the word vector to be recognized into a pre-established target word vector model, and recognizing to obtain a keyword corresponding to the input instruction.
Specifically, an input instruction in the form of a voice signal is converted into text to obtain the instruction text to be recognized. Through natural segmentation and lexicon-based segmentation, the instruction text to be recognized is segmented into results comprising single characters, words and English words, and the segmentation results are converted into vectors to obtain the word vectors to be recognized. Because there may be several word vectors to be recognized, all word vectors of the instruction text are clustered into a number of word vector cluster families, each containing at least one word vector to be recognized. All word vectors in each cluster family are combined into a corresponding cluster vector; the cluster vectors are input into the trained target word vector model to output a number of candidate keywords, and the candidate keyword most similar to the cluster is determined to be the keyword of that word vector cluster family. Proceeding in this way yields the keywords of all word vector cluster families, and the keywords of the instruction text to be recognized are taken as the keywords of the input instruction.
The extracted keywords are thus obtained at the semantic level rather than the grammatical level. All word vectors of the instruction text to be recognized are divided by a clustering algorithm into a preset number of word vector cluster families, each of which is considered to contain one keyword; all word vectors in a cluster family are summed to obtain the cluster vector, which captures the spatial relations among the word vectors to be recognized. For each word vector cluster family the pre-established target word vector model outputs the corresponding candidate keywords, and the most similar candidate keyword is taken as the keyword of that cluster family. This fully accounts for the semantic dependency relations between the words of the text and improves the accuracy and objectivity of keyword extraction.
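In the sketch below, scikit-learn's KMeans stands in for the unnamed clustering algorithm, and gensim's similar_by_vector finds the vocabulary word closest to each cluster vector; the cluster count is an assumption.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_keywords(model, words, n_clusters=2):
        # One keyword per word-vector cluster family: sum the member vectors
        # into a cluster vector, then take the most similar vocabulary word.
        vecs = np.array([model.wv[w] for w in words if w in model.wv])
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vecs)
        keywords = []
        for c in range(n_clusters):
            cluster_vec = vecs[labels == c].sum(axis=0)
            word, _ = model.wv.similar_by_vector(cluster_vec, topn=1)[0]
            keywords.append(word)
        return keywords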
For example, as shown in FIG. 4, assume the input instruction acquired by the robot or the server is "go and see, the television is not on". The robot or the server performs semantic understanding processing on this input instruction using natural language processing technology to obtain the word vectors to be recognized, comprising "go/verb", "see/verb", "television/noun", "off/preposition" and "not/verb". It inputs the word vectors to be recognized into the pre-established target word vector model; for example, word2vec is used to calculate the similarity between the word vectors to be recognized, and after recognition processing by the target word vector model the most similar keyword, "television", is output. A sentence-similarity matching technique is then used to match the analyzed keyword against the attribute statements of the instruction knowledge graph: the robot or the server matches the input instruction to the descriptive statement "place with a television and a sofa, where one watches television", and through the association between that attribute statement and its entity instruction infers the entity instruction "move to the living room". "Move to the living room" is therefore the execution instruction obtained by recognizing and matching the input instruction.
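A minimal sketch of the sentence-similarity matching used in this example; embedding a statement as the average of its word vectors and comparing by cosine similarity is an assumed scheme, since the patent only names the technique, and the model is the target word vector model trained above.

    import numpy as np

    def sentence_vector(model, words):
        vecs = [model.wv[w] for w in words if w in model.wv]
        return np.mean(vecs, axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def best_matching_statement(model, keywords, statements):
        # Return the attribute statement most similar to the extracted keywords;
        # its associated entity instruction then becomes the execution instruction.
        key_vec = sentence_vector(model, keywords)
        return max(statements,
                   key=lambda s: cosine(key_vec, sentence_vector(model, s.split())))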
In this embodiment, after the robot or the server receives and recognizes the input instruction, it analyzes the input instruction to obtain the corresponding keyword and then matches the analysis result against the attribute statements of the instruction knowledge graph using a sentence-similarity matching technique to obtain the execution instruction. The execution instruction obtained in this way better conforms to the intention behind the user's input instruction, effectively improving the accuracy of the robot's input instruction recognition.
The invention also provides a system for implementing human-computer interaction, which can be deployed in the robot itself, in a server providing technical support for the robot, or with its different functional modules deployed separately across the robot and the server. As shown in FIG. 5, the system for implementing human-computer interaction comprises:
the identification module 51, used for identifying whether a received input instruction is of the behavior instruction type that drives the robot to change its limb actions;
the analysis module 52, used for analyzing the input instruction to obtain the corresponding keyword when the input instruction is of the behavior instruction type;
and the matching module 53, used for matching the keyword against the pre-established instruction knowledge graph to obtain the execution instruction.
Specifically, this embodiment is a system embodiment corresponding to the above method embodiment; for its specific effects, refer to the above method embodiment, and details are not repeated herein.
In another system embodiment of the present invention, based on the above embodiment of the system for implementing human-computer interaction, the system further includes:
the creating module 50, used for creating an instruction knowledge graph, wherein the instruction knowledge graph comprises an entity instruction set, an attribute statement set, and the association relations between each entity instruction and its attribute statements;
the attribute statements comprise descriptive statements and example statements; a descriptive statement is a statement that explains preset features of the entity instruction, the preset features comprising at least one of environmental features, behavior features and time features; an example statement is a statement that differs from the entity instruction in expression but has the same semantic meaning.
Specifically, this embodiment is a system embodiment corresponding to the above method embodiment; for its specific effects, refer to the above method embodiment, and details are not repeated herein.
Based on the foregoing system embodiments, the matching module includes:
the screening submodule, used for screening out the target attribute statements that match the keyword from the attribute statement set of the instruction knowledge graph, and obtaining the candidate entity instructions corresponding to the target attribute statements;
the processing submodule, used for determining the candidate entity instruction to be the execution instruction if the number of candidate entity instructions is equal to one;
the processing submodule, further used for prompting the user to input a selection if the number of candidate entity instructions is greater than one, and determining the candidate entity instruction selected by the user to be the execution instruction according to the selection information.
Specifically, this embodiment is a system embodiment corresponding to the above method embodiment; for its specific effects, refer to the above method embodiment, and details are not repeated herein.
Based on the foregoing system embodiment, the processing submodule includes:
the judging unit, used for judging, after the user has been prompted to input a selection because the number of candidate entity instructions is greater than one, whether the selection information is acquired within a preset duration;
the output unit, used for determining a preset termination instruction or the optimal candidate entity instruction to be the execution instruction when the selection information is not acquired within the preset duration, the optimal candidate entity instruction being the candidate entity instruction with the highest degree of matching with the keyword;
the output unit, further used for determining the execution instruction according to the selection information when the selection information is acquired within the preset duration.
Specifically, this embodiment is a system embodiment corresponding to the above method embodiment; for its specific effects, refer to the above method embodiment, and details are not repeated herein.
Based on the foregoing system embodiments, the analysis module comprises:
the first processing submodule is used for carrying out semantic understanding processing on the input instruction to obtain a word vector to be recognized;
and the second processing submodule is used for inputting the word vector to be recognized into a pre-established target word vector model and outputting the keyword corresponding to the input instruction.
Specifically, this embodiment is a system embodiment corresponding to the above method embodiment; for its specific effects, refer to the above method embodiment, and details are not repeated herein.
In conclusion, the method and system for realizing human-computer interaction provided by the invention enable the robot to better understand and recognize input instructions, effectively improving the accuracy and efficiency with which the robot recognizes them; after simple learning a user can construct an instruction knowledge graph according to the actual scene, effectively lowering the instruction-template configuration threshold. Furthermore, semantic analysis and sentence-similarity matching against the instruction knowledge graph generalize the user's instructions well: by setting a small number of attribute values the user can match a large number of commands, effectively improving the generalization effect of instruction attribute settings and keeping daily maintenance simple.
In the description herein, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples" and the like means that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention; for those skilled in the art, various improvements and refinements can be made without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; all equivalent structural or process transformations made using the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the protection scope of the present invention.

Claims (10)

1. A method for realizing human-computer interaction is characterized by comprising the following steps:
identifying whether a received input instruction is of the behavior instruction type that drives the robot to change its limb actions;
if so, analyzing the input instruction to obtain the corresponding keyword;
and matching the keyword against a pre-established instruction knowledge graph to obtain an execution instruction.
2. The method for implementing human-computer interaction according to claim 1, wherein before the step of identifying whether the received input instruction is of the behavior instruction type that drives the robot to change its limb actions, the method further comprises:
creating an instruction knowledge graph, wherein the instruction knowledge graph comprises an entity instruction set, an attribute statement set, and the association relations between each entity instruction and its attribute statements;
the attribute statements comprise descriptive statements and example statements; a descriptive statement is a statement that explains preset features of the entity instruction, the preset features comprising at least one of environmental features, behavior features and time features; and an example statement is a statement that differs from the entity instruction in expression but has the same semantic meaning.
3. The method for implementing human-computer interaction according to claim 2, wherein the step of matching the keyword against a pre-established instruction knowledge graph to obtain an execution instruction comprises:
screening out the target attribute statements that match the keyword from the attribute statement set of the instruction knowledge graph, and obtaining the candidate entity instructions corresponding to the target attribute statements;
if the number of candidate entity instructions is equal to one, determining that candidate entity instruction to be the execution instruction;
and if the number of candidate entity instructions is greater than one, prompting the user to input a selection, and determining the candidate entity instruction selected by the user to be the execution instruction according to the selection information.
4. The method for implementing human-computer interaction according to claim 3, wherein after prompting the user to input a selection, determining the candidate entity instruction selected by the user to be the execution instruction according to the selection information comprises:
judging whether the selection information is acquired within a preset duration;
if not, executing a termination operation or determining the optimal candidate entity instruction to be the execution instruction, the optimal candidate entity instruction being the candidate entity instruction with the highest degree of matching with the keyword;
and if so, determining the execution instruction according to the selection information.
5. The method for implementing human-computer interaction according to any one of claims 1 to 4, wherein said analyzing the input instruction to obtain a corresponding keyword comprises:
performing semantic understanding processing on the input instruction to obtain a word vector to be recognized;
and inputting the word vector to be recognized into a pre-established target word vector model, and outputting the keyword corresponding to the input instruction.
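The claim leaves the model itself open; below is a deliberately tiny stand-in in which the "target word vector model" is a nearest-neighbour lookup over toy two-dimensional embeddings. The vectors, vocabulary and threshold are all invented for illustration:

```python
import math

# Toy 2-D embeddings standing in for a pre-established target word vector model.
VECTORS = {
    "raise": [0.90, 0.10],
    "lift":  [0.85, 0.20],
    "hand":  [0.10, 0.90],
    "arm":   [0.20, 0.85],
}

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def keyword_for(vector: list[float], threshold: float = 0.9):
    """Return the vocabulary keyword whose embedding is closest to the word
    vector produced by semantic understanding; reject weak matches."""
    best = max(VECTORS, key=lambda word: cosine(vector, VECTORS[word]))
    return best if cosine(vector, VECTORS[best]) >= threshold else None

print(keyword_for([0.88, 0.15]))   # closest to "raise"
```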
6. A system for implementing human-computer interaction, characterized by comprising:
an identification module, configured to identify whether a received input instruction is of a behavior instruction type that drives a robot to change a limb action;
an analysis module, configured to analyze the input instruction to obtain a corresponding keyword when the input instruction is of the behavior instruction type;
and a matching module, configured to match the keyword with a pre-established instruction knowledge graph to obtain an execution instruction.
7. The system for implementing human-computer interaction according to claim 6, further comprising:
a creating module, configured to create the instruction knowledge graph, wherein the instruction knowledge graph comprises an entity instruction set, an attribute statement set, and association relations between each entity instruction and its attribute statements;
the attribute statements comprise descriptive statements and example statements, wherein a descriptive statement explains preset features of an entity instruction, the preset features comprising at least one of an environmental feature, a behavior feature and a time feature, and an example statement differs from the entity instruction in expression but has the same semantics.
8. The system for implementing human-computer interaction according to claim 7, wherein the matching module comprises:
a screening submodule, configured to screen out target attribute statements matched with the keyword from the attribute statement set of the instruction knowledge graph, to obtain candidate entity instructions corresponding to the target attribute statements;
a processing submodule, configured to determine a candidate entity instruction as the execution instruction if the number of candidate entity instructions is equal to one;
the processing submodule being further configured to prompt a user to input a selection if the number of candidate entity instructions is more than one, and to determine, according to the selection information, the candidate entity instruction selected by the user as the execution instruction.
9. The system for implementing human-computer interaction according to claim 8, wherein the processing submodule comprises:
a judging unit, configured to judge, after the user is prompted to input a selection because the number of candidate entity instructions is more than one, whether the selection information is acquired within a preset time length;
an output unit, configured to determine a preset termination instruction or an optimal candidate entity instruction as the execution instruction when the selection information is not acquired within the preset time length, the optimal candidate entity instruction being the candidate entity instruction with the highest degree of matching with the keyword;
the output unit being further configured to determine the execution instruction according to the selection information when the selection information is acquired within the preset time length.
10. The system for implementing human-computer interaction according to any one of claims 6 to 9, wherein the analysis module comprises:
a first processing submodule, configured to perform semantic understanding processing on the input instruction to obtain a word vector to be recognized;
and a second processing submodule, configured to input the word vector to be recognized into a pre-established target word vector model and output the keyword corresponding to the input instruction.
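As a final non-limiting illustration, the sketch below wires the modules of claims 6 to 8 into one object; the class names, stop-word list and toy graph are all invented here:

```python
class IdentificationModule:
    """Claim 6: flags inputs of the behavior instruction type (toy classifier)."""
    def is_behavior(self, text: str) -> bool:
        return any(word in text.lower() for word in ("hand", "arm", "walk", "turn"))

class AnalysisModule:
    """Claim 6: reduces the input instruction to keywords (toy tokenizer)."""
    def keywords(self, text: str) -> list[str]:
        return [w for w in text.lower().split() if w not in {"please", "the", "a"}]

class MatchingModule:
    """Claims 6 and 8: matches keywords against the instruction knowledge graph."""
    def __init__(self, graph: dict[str, list[str]]):
        self.graph = graph   # entity instruction -> associated attribute statements
    def execution_instruction(self, keywords: list[str]):
        for entity, statements in self.graph.items():
            if any(kw in s for s in statements for kw in keywords):
                return entity
        return None

class InteractionSystem:
    """Identification -> analysis -> matching, in the order the claims recite."""
    def __init__(self, graph: dict[str, list[str]]):
        self.identification = IdentificationModule()
        self.analysis = AnalysisModule()
        self.matching = MatchingModule(graph)
    def handle(self, text: str):
        if not self.identification.is_behavior(text):
            return None
        return self.matching.execution_instruction(self.analysis.keywords(text))

system = InteractionSystem({"wave_hand": ["wave your hand", "say hello with your hand"]})
print(system.handle("please wave your hand"))   # -> wave_hand
```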

Priority Applications (1)

Application Number: CN202110325478.1A
Publication: CN112965603A (en)
Priority Date: 2021-03-26
Filing Date: 2021-03-26
Title: Method and system for realizing man-machine interaction

Publications (1)

Publication Number: CN112965603A
Publication Date: 2021-06-15

Family ID: 76278578

Family Applications (1)

Application Number: CN202110325478.1A (Status: Pending)
Publication: CN112965603A (en)
Title: Method and system for realizing man-machine interaction

Country Status (1): CN, CN112965603A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004243504A (en) * 2003-02-17 2004-09-02 Matsushita Electric Works Ltd Robot and work object capable of providing information to the robot, and personal service robot
CN104834691A (en) * 2015-04-22 2015-08-12 中国建设银行股份有限公司 Voice robot
WO2017041372A1 (en) * 2015-09-07 2017-03-16 百度在线网络技术(北京)有限公司 Man-machine interaction method and system based on artificial intelligence
CN106845628A * 2017-02-07 2017-06-13 珠海金萝卜智动科技有限公司 Method and apparatus for a robot to autonomously learn new instructions via the Internet
CN107193865A * 2017-04-06 2017-09-22 上海奔影网络科技有限公司 Natural language intent understanding method and device for man-machine interaction
CN107450367A * 2017-08-11 2017-12-08 上海思依暄机器人科技股份有限公司 Voice transparent transmission method, apparatus and robot
CN107909998A * 2017-10-18 2018-04-13 成都市共维科技有限公司 Voice instruction processing method, device, computer equipment and storage medium
CN110019843A * 2018-09-30 2019-07-16 北京国双科技有限公司 Knowledge graph processing method and device
CN110909169A (en) * 2019-09-30 2020-03-24 珠海格力电器股份有限公司 Control method, system, electronic device and storage medium based on knowledge graph
CN110991180A (en) * 2019-11-28 2020-04-10 同济人工智能研究院(苏州)有限公司 Command identification method based on keywords and Word2Vec
CN111368046A (en) * 2020-02-24 2020-07-03 北京百度网讯科技有限公司 Man-machine conversation method, device, electronic equipment and storage medium
CN111645073A (en) * 2020-05-29 2020-09-11 武汉理工大学 Robot visual semantic navigation method, device and system
WO2021003819A1 (en) * 2019-07-05 2021-01-14 平安科技(深圳)有限公司 Man-machine dialog method and man-machine dialog apparatus based on knowledge graph
WO2021008120A1 (en) * 2019-07-12 2021-01-21 深圳技术大学 Method and apparatus for controlling cooking robot through cloud speech

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Yujing; XU Xinze; ZHU Qidan; ZHANG Zhi: "A Human-Computer Interaction Method Based on Natural-Language Keywords", Applied Science and Technology, No. 06, pages 5-10 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114490971A (en) * 2021-12-30 2022-05-13 重庆特斯联智慧科技股份有限公司 Robot control method and system based on man-machine conversation interaction
CN114490971B (en) * 2021-12-30 2024-04-05 重庆特斯联智慧科技股份有限公司 Robot control method and system based on man-machine interaction
CN115567567A (en) * 2022-09-20 2023-01-03 中国联合网络通信集团有限公司 Equipment control method, device and storage medium

Similar Documents

Publication Publication Date Title
JP6726800B2 (en) Method and apparatus for human-machine interaction based on artificial intelligence
CN108000526B (en) Dialogue interaction method and system for intelligent robot
KR102627948B1 (en) Automated assistants that accommodate multiple age groups and/or vocabulary levels
US6526395B1 (en) Application of personality models and interaction with synthetic characters in a computing system
CN108108340B (en) Dialogue interaction method and system for intelligent robot
KR100807307B1 (en) Spoken dialog system for human computer interface and response method therein
JP4629560B2 (en) Interactive information system
CN109410927A Speech recognition method, device and system combining offline command-word parsing with cloud parsing
WO2017112813A1 (en) Multi-lingual virtual personal assistant
CN108170764A Man-machine multi-turn dialogue model construction method based on scene context
JP2004527809A (en) Environmentally responsive user interface / entertainment device that simulates personal interaction
JP2004513444A (en) User interface / entertainment device that simulates personal interactions and augments external databases with relevant data
JP2004513445A (en) User interface / entertainment device that simulates personal interaction and responds to the user's emotional state and / or personality
JP2004527808A (en) Self-updating user interface / entertainment device that simulates personal interaction
CN106502382B (en) Active interaction method and system for intelligent robot
KR20180100001A (en) System, method and recording medium for machine-learning based korean language conversation using artificial intelligence
CN112965603A (en) Method and system for realizing man-machine interaction
US8498859B2 (en) Voice processing system, method for allocating acoustic and/or written character strings to words or lexical entries
Khouzaimi et al. Optimising turn-taking strategies with reinforcement learning
CN117524202A (en) Voice data retrieval method and system for IP telephone
San-Segundo et al. Proposing a speech to gesture translation architecture for Spanish deaf people
CN111968646A (en) Voice recognition method and device
CN115019787B (en) Interactive homonym disambiguation method, system, electronic equipment and storage medium
CN114627859A Offline semantic recognition method and system for an electronic photo frame
CN116127006A (en) Intelligent interaction method, language ability classification model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination