WO2019085697A1 - Human-computer interaction method and system - Google Patents

Human-computer interaction method and system

Info

Publication number
WO2019085697A1
WO2019085697A1 (PCT/CN2018/107893)
Authority
WO
WIPO (PCT)
Prior art keywords
intent
node
intention
information
user
Prior art date
Application number
PCT/CN2018/107893
Other languages
English (en)
French (fr)
Inventor
谢韬
Original Assignee
科沃斯商用机器人有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 科沃斯商用机器人有限公司
Publication of WO2019085697A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis

Definitions

  • the present application relates to the field of automatic response systems, and in particular, to a human-computer interaction method and system.
  • the voice mode is an ideal, direct and convenient way of human-computer interaction. Therefore, the question and answer system in the human-computer interaction system on the market currently adopts a voice interaction mode. That is, the user asks a question, the system responds to the user's question, gives the corresponding answer in a voice manner, and some are accompanied by other corresponding operations.
  • Another approach performs semantic matching on the voice signals input by the user through a semantic network: the content produced by semantic analysis is matched against preset semantic relation libraries and sentence-pattern templates. This method places high restrictions on the language the user may use, and recognition becomes difficult once an instruction has not been preset.
  • The invention patent application with publication number 104360994A, entitled "A natural language understanding method and system", provides another scheme: a Ranking SVM (a learning-to-rank algorithm based on support vector machines) extracts feature vectors from the text, and a linear-kernel SVM then implements statistics-based ranking of the correlation between the multi-scenario semantic parsing results and the natural language input by the user.
  • The disadvantage of this method is that it is susceptible to noise interference and prone to over-fitting, so its understanding of natural language is not accurate enough.
  • The invention patent application with publication number CN106156003A, entitled "A question understanding method for a question-and-answer system", provides a slot-filling method to obtain an understanding of the question. Its specific scheme is to jointly solve the intent recognition task and the slot filling task in the question through recurrent neural network modeling, improving the accuracy of question understanding.
  • However, the slot-filling technique must analyze the sentence, determine the event, extract the entities, find the satisfied slots, and so on, which is relatively complicated to implement; it can only handle dialogue within the same topic and cannot realize topic switching.
  • the technical problem to be solved by the present application is that the understanding of the user's intention in the existing human-computer interaction technology is not accurate enough, and a human-computer interaction method and system are provided for realizing accurate human-machine communication.
  • A human-computer interaction method includes the following steps: identifying the user's voice input information as user text information; determining a best intent by using an intent classifier and corresponding data processing according to the user text information and an intent node tag, based on an intent tree node group; querying a comparison table of intents and output information according to the best intent to obtain corresponding output information; and outputting the output information.
  • The step of determining the best intent by using the intent classifier and corresponding data processing according to the user text information and the intent node tag, based on the intent tree node group, specifically includes:
  • replacing the intent node tag in the intent classifier input information, in turn, with the intent node tag of each node on the node branch from the current intent node to the root intent node, to obtain a plurality of corresponding prediction intents;
  • the steps of verifying whether the predicted intents meet the user's intent include:
  • if the global maximum similarity is less than the first threshold, either acquiring corresponding specific output information and outputting the specific output information, or sending an interaction request to a third-party system, receiving the third-party interaction output information returned by the third-party system, and outputting the third-party interaction output information.
  • When the intent node tag in the intent classifier input information is replaced with an intent node tag from the node branch, the replacement starts from the current intent node:
  • the current intent node tag replaces the intent node tag in the intent classifier input information to obtain a corresponding prediction intent;
  • the steps to verify that the predicted intent meets the user's intent include:
  • if the maximum similarity of the prediction intent is less than the second threshold, the intent node label in the intent classifier input information is replaced with the intent node label of the node one level above the current intent node in the branch, and the above steps are repeated.
  • When the maximum similarity obtained after the intent node tag in the classifier input information has been replaced with the root intent node tag is still less than the second threshold, the comparison is performed according to the maximum similarities of the plurality of prediction intents that have already been calculated:
  • the scores of the maximum similarities of the prediction intents are compared, and the maximum similarity with the largest score is determined as the global maximum similarity; and
  • the global maximum similarity is compared with the first threshold. If the global maximum similarity is greater than or equal to the first threshold, the prediction intent corresponding to the global maximum similarity is determined to conform to the user's intent; if the global maximum similarity is less than the first threshold, either corresponding specific output information is acquired and output, or an interaction request is sent to a third-party system, the third-party interaction output information returned by the third-party system is received, and that information is output to the user.
  • So that the current intent node can be obtained quickly in the next interaction, after the prediction intent that meets the user's intention is determined as the best intent, the node corresponding to the best intent is determined as the current intent node.
  • the process of determining the node corresponding to the best intent as the current intent node includes:
  • the best intent node with the largest path length is determined as the current intent node.
  • the step of combining the user text information and the intent node label into the input information of the intent classifier specifically includes:
  • the word vector is used as input information for the intended classifier.
  • the intent classifier is a convolutional neural network model or a cyclic neural network model.
  • the application also provides a human-computer interaction system, including:
  • a voice recognition module configured to identify a user's voice input information as user text information
  • a best intent determining module configured to determine a best intention by using an intent classifier and corresponding data processing according to the user text information and the intent node tag, based on the node group of the intent tree;
  • a query module configured to query a comparison table of intent and output information according to the best intention, to obtain corresponding output information
  • An output module configured to output the output information.
  • the best intent determination module includes:
  • a merging unit configured to merge the user text information and the intent node label
  • An intention classifier configured to use the combined information of the merging unit as input information to obtain a prediction intention
  • a verification unit configured to verify whether the predicted intent meets a user intention
  • a determining unit for determining a predicted intent that meets the user's intention as the best intention.
  • the verification unit includes:
  • a searching subunit configured to search a comparison table of intent and preset input information according to the predicted intent output by the intent classifier, to obtain preset input information corresponding to each prediction intent;
  • a similarity calculation subunit configured to calculate a similarity between the user text information and the corresponding preset input information, to obtain a maximum similarity corresponding to the prediction intention
  • a threshold comparison subunit configured to compare the maximum similarity with a threshold, and send the comparison result to the determining unit.
  • the merging unit includes a notification receiving interface, configured to receive a merge notification; correspondingly, the intent classifier includes a notification output interface for outputting a merge notification; or the similarity calculation subunit includes a notification output interface, And transmitting a merge notification to the merging unit; or the threshold comparison subunit includes a notification output interface, configured to send a merge notification to the merging unit.
  • The system described herein further includes a current intent node maintenance module for retaining the current intent node when the maximum similarity is less than the threshold and, when the best intent has been determined, for determining the best-intent node with the largest path length among the best-intent node paths as the current intent node.
  • The system described herein further includes a third-party interface module connected to the best intent determination module, for sending the user text information and the interaction request to a third-party system when the best intent determination module determines that there is no best intent, receiving the output information returned by the third-party system, and transmitting that output information to the output module.
  • The application adopts an intent tree backtracking mechanism and provides a human-computer interaction method and system with simple operation, fast processing and accurate responses; only the intent and superior intent of the corpus in the vertical domain need to be annotated, without any other general-corpus annotation, which saves a great deal of annotation time.
  • In the specific implementation, only the classifier is used to predict the intent, and the backtracking mechanism is used to find the optimal node to obtain accurate output information. The approach can realize interaction under the same topic and topic switching and interaction under different topics within the business; by communicating with a third-party system, topics outside the service can also be switched to, providing the user with replies on different topics.
  • The application can be applied in vertical domains where the user's intention is clear and the completion of transactions follows clear steps, such as banks, courts, and hospitals.
  • FIG. 1 is a general flow chart of a human-computer interaction method according to the present application.
  • FIG. 2 is a schematic diagram showing the relationship of an embodiment of an intent tree of the present application.
  • FIG. 3 is a flow chart of a method for determining a best intention in a human-computer interaction method according to the present application
  • FIG. 4 is a flow chart of another method for determining a best intention in the human-computer interaction method according to the present application.
  • FIG. 5 is a schematic block diagram of a human-machine interaction system according to the present application.
  • FIG. 6 is a schematic block diagram of a best intent determination module of the present application.
  • FIG. 7 is a schematic block diagram of Embodiment 1 of the best intent determination module;
  • FIG. 8 is a schematic block diagram of a second embodiment of the best intention determination module
  • FIG. 9 is another schematic block diagram of a human-machine interaction system according to the present application.
  • Figure 10 is a schematic diagram showing the relationship of another embodiment of the intent tree of the present application.
  • FIG. 11 is a schematic diagram of an intent tree and a backtracking process thereof according to Embodiment 1 of the present application.
  • FIG. 1 is a general flow chart of the human-computer interaction method of the present application. As shown in FIG. 1, the human-computer interaction method described in the present application includes the following steps:
  • Step S1 Identify the user's voice input information as user text information
  • Step S2 determining, according to the user text information and the intent node tag, the best intent by using an intent classifier and corresponding data processing, based on the intent tree node group;
  • Step S3 query a comparison table of intent and output information according to the best intention, and obtain corresponding output information
  • Step S4 outputting the output information.
  • In step S1, the voice information input by the user is identified as the corresponding user text information by speech recognition technology, which facilitates subsequent processing. Since speech recognition is already a mature technology, it is not described further in this application; those skilled in the art can use any current speech recognition technology.
  • the intent tree in step S2 is the intent tree stored in the system database.
  • the intent tree includes a plurality of nodes in upper-lower (parent-child) relationships; each node is marked in the form of an intent node tag, and the path of each node is recorded in the system, thereby determining the node's position in the intent tree.
  • FIG. 2 is a schematic diagram of a relationship between an embodiment of an intent tree of the present application
  • the present embodiment takes a banking system in a vertical domain as an example.
  • Four levels of intent nodes are listed: the highest level is the root intent node Root, and the next level includes the three intent nodes "deposit", "withdrawal" and "loan".
  • The intent nodes one level below the intent node "withdrawal" are "withdrawal of 20,000 or less", "withdrawal of 20,000 to 50,000", and "withdrawal of more than 50,000".
  • The lower-level intent nodes of the intent node "withdrawal of 20,000 or less" include "bank card withdrawal of 20,000 or less" and "passbook withdrawal of 20,000 or less".
  • The lower-level intent nodes of the intent node "withdrawal of more than 50,000" include "withdrawal of more than 50,000, reservation required" and "withdrawal of more than 50,000, reservation made".
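As an illustration of how such an intent tree might be represented in practice, the following sketch builds part of the banking tree of FIG. 2 as parent-linked nodes and derives the branch from a node back to the root. The class and method names are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of an intent tree with parent links (names are illustrative only).
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntentNode:
    tag: str                               # intent node tag, e.g. "withdrawal"
    parent: Optional["IntentNode"] = None
    children: List["IntentNode"] = field(default_factory=list)

    def add_child(self, tag: str) -> "IntentNode":
        child = IntentNode(tag, parent=self)
        self.children.append(child)
        return child

    def branch_to_root(self) -> List["IntentNode"]:
        """Node branch from this node up to the root intent node."""
        node, branch = self, []
        while node is not None:
            branch.append(node)
            node = node.parent
        return branch


# Build a fragment of the banking example of FIG. 2.
root = IntentNode("Root")
withdraw = root.add_child("withdrawal")
under_20k = withdraw.add_child("withdrawal of 20,000 or less")
card_under_20k = under_20k.add_child("bank card withdrawal of 20,000 or less")

print([n.tag for n in card_under_20k.branch_to_root()])
# ['bank card withdrawal of 20,000 or less', 'withdrawal of 20,000 or less',
#  'withdrawal', 'Root']
```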
  • To determine the user's intent from the user's input information, the present application uses the intent classifier and corresponding data processing.
  • The specific process is shown in FIG. 3, a flow chart of the method for determining the best intent in the human-computer interaction method of the present application, as follows:
  • Step S21 Assigning an intent node label in the intent classifier input information.
  • the intent classifier input information is merged information of user text information and an intent node tag.
  • the user text information is obtained by identifying the user voice input information in step S1, which is already a known amount.
  • the intent node tag is a variable.
  • the system database stores information of the "current intent node", which includes the current intent node tag, and thus the current intent node tag is a known parameter.
  • At the initial use of the system, the current intent node can be any node in the intent tree, such as a last-level intent node. During use, the system saves the information about the current intent node determined after the previous interaction.
  • a node branch from the current intent node to the root intent node may be determined according to the location of the current intent node in the intent tree node group.
  • the current intent node tag can be obtained and assigned to the intent node tag in the intent classifier input information.
  • Step S22 combining the user text information and the current intent node label into new text information.
  • Step S23 performing word segmentation processing on the new text information.
  • the new text information is segmented using any of the word segmentation tools of the prior art. For example, for the text message "I want to take some money, about 20,000 look", the word segmentation tool divides it into: I / think / take / some / money /, / probably / 20,000 / look.
  • step S24 the text after the word segmentation is vectorized, for example, by querying the word vector in the corpus, thereby converting the text into a combination of a plurality of high-dimensional vectors.
  • the transformed vector can be expressed as: [V1, V2, V3, V4, V5, V6, V7, V8, V9], where V1-V9 are the corresponding word vectors of the respective participles in the example sentence.
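The following sketch illustrates steps S23-S24 under stated assumptions: jieba is assumed to be available as the segmentation tool (any segmenter would do), and the word-vector table below is a toy random stand-in for corpus-trained embeddings.

```python
# Sketch of steps S23-S24: word segmentation followed by word-vector lookup.
import numpy as np
import jieba  # one widely used Chinese segmentation tool; its lcut() returns a token list

EMBED_DIM = 8
rng = np.random.default_rng(0)
word_vectors = {}          # in practice: loaded from a pre-trained corpus


def lookup(word: str) -> np.ndarray:
    """Return the word vector for a token, creating a random one for unseen words."""
    if word not in word_vectors:
        word_vectors[word] = rng.normal(size=EMBED_DIM)
    return word_vectors[word]


text = "我想取一些钱，大概两万样子"                 # "I want to withdraw some money, about 20,000"
tokens = jieba.lcut(text)                        # e.g. 我 / 想 / 取 / 一些 / 钱 / ...
vectors = np.stack([lookup(t) for t in tokens])  # shape: (len(tokens), EMBED_DIM)
print(tokens, vectors.shape)
```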
  • the word vector is used as an input of the intent classifier to obtain a prediction intent.
  • The intent classifier is an intent classification model, with a text vector as input and an intent node tag as output; the output intent node tag can be used to determine what the intent is.
  • The intent classification model can be obtained by training on a training corpus; for example, it can be expressed as y = softmax(Wx + b), where x is the input text vector, W and b are the weights of the neural network, and y is the output vector whose largest value corresponds to the resulting category, i.e. the intent node tag. A neural network such as a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN) can also be used to obtain a corresponding model formula.
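A minimal sketch of such an intent classification model, assuming a simple mean-pooled text vector and the y = softmax(Wx + b) form mentioned above; the label set and the untrained weights are placeholders, and in practice a CNN or RNN encoder trained on the annotated corpus would replace them.

```python
# Minimal sketch of the intent classification model y = softmax(W·x + b).
# W and b are untrained placeholders here; in practice they are learned from the corpus.
import numpy as np

INTENT_TAGS = ["deposit", "withdrawal", "loan"]   # illustrative label set
EMBED_DIM = 8
rng = np.random.default_rng(1)
W = rng.normal(size=(len(INTENT_TAGS), EMBED_DIM))
b = np.zeros(len(INTENT_TAGS))


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def predict_intent(word_vectors: np.ndarray) -> str:
    """word_vectors: (num_tokens, EMBED_DIM) array for the merged input text."""
    x = word_vectors.mean(axis=0)          # simple pooling into one text vector
    y = softmax(W @ x + b)                 # y = softmax(Wx + b)
    return INTENT_TAGS[int(np.argmax(y))]  # tag with the largest probability


print(predict_intent(rng.normal(size=(5, EMBED_DIM))))
```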
  • Step S26 Find a comparison table of the intent and the preset input information according to the prediction intention, and obtain corresponding preset input information.
  • A comparison table of intents and preset input information is stored in the database; it is a pre-designed, stored correspondence table of questions (question sentences) and their intents.
  • One intent can correspond to multiple questions.
  • Step S27 Calculate the similarity between the user text information and each corresponding preset input information, and obtain the maximum similarity. The higher the similarity between two sentences, the more similar and reliable they are; calculating the similarity of the two sentences therefore effectively improves the accuracy of the output information.
  • The similarity between sentences can be calculated along multiple dimensions, including grammar, semantics, and sentence pattern: grammatical similarity (syntaxSim) considers word order, sentence length, and so on; semantic similarity (semanticSim) obtains a sentence vector by weighting and averaging the word vectors of each word and computes the cosine between the two sentence vectors; sentence-pattern similarity (classSim) is set to 0 or 1 according to whether the sentences belong to the same sentence type.
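A small sketch of the semantic dimension (semanticSim) described above: average the word vectors of each sentence and take the cosine of the resulting sentence vectors. The helper names and the uniform weighting are assumptions.

```python
# Sketch of semanticSim: weighted-average word vectors, then cosine of sentence vectors.
import numpy as np


def sentence_vector(word_vectors: np.ndarray, weights=None) -> np.ndarray:
    """Weighted average of word vectors; equal weights if none are given."""
    return np.average(word_vectors, axis=0, weights=weights)


def semantic_sim(vecs_a: np.ndarray, vecs_b: np.ndarray) -> float:
    a, b = sentence_vector(vecs_a), sentence_vector(vecs_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


rng = np.random.default_rng(2)
print(semantic_sim(rng.normal(size=(6, 8)), rng.normal(size=(4, 8))))
```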
  • Embodiment 1 of similarity calculation:
  • A neural network can also be used for the calculation: after the sentences are vectorized, a CNN, RNN or RNN+attention (recurrent neural network with an attention mechanism) is used to train a similarity model by calculating the Euclidean distance or cosine angle of the two sentences, thereby obtaining the similarity of the two sentences.
  • the calculation of this embodiment is simple and easy to explain.
  • Embodiment 2 of similarity calculation:
  • the trained model can calculate the similarity between two sentences.
  • Sim(A, B) = f(W·x1 + b, W·x2 + b)
  • where x1 and x2 are the vectors of sentence A and sentence B respectively, W and b are neural network parameters, and f is a function that computes the similarity from the Euclidean distance or cosine angle. This embodiment requires a large amount of corpus for training and therefore has high accuracy.
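A minimal sketch of this Sim(A, B) = f(W·x1 + b, W·x2 + b) formulation, with a shared projection and a cosine-based f; the parameters here are random placeholders rather than a trained model.

```python
# Sketch of Embodiment 2: shared projection of two sentence vectors, cosine-based f.
import numpy as np

rng = np.random.default_rng(3)
HIDDEN, EMBED_DIM = 16, 8
W = rng.normal(size=(HIDDEN, EMBED_DIM))   # shared, trainable in practice
b = rng.normal(size=HIDDEN)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))


def sim(x1: np.ndarray, x2: np.ndarray) -> float:
    h1, h2 = np.tanh(W @ x1 + b), np.tanh(W @ x2 + b)   # W·x + b for each sentence
    return cosine(h1, h2)                               # f: cosine of the projections


print(sim(rng.normal(size=EMBED_DIM), rng.normal(size=EMBED_DIM)))
```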
  • Embodiment 3 of similarity calculation:
  • Sentences under the same intent are considered similar to that intent, and sentences under different intents are considered dissimilar; the trained model can then calculate the similarity between a sentence and an intent.
  • Sim(A, C) = f(W·x1 + b, W·x2 + b)
  • where x1 and x2 are the vectors of sentence A and intent C respectively, W and b are neural network parameters, and f is a function that computes the similarity from the Euclidean distance or cosine angle. This embodiment is based on training with a large amount of corpus, and its calculation is fast, which effectively improves the response speed of the system.
  • At the beginning of system operation, the similarity can be calculated by the method of Embodiment 1; after a large amount of corpus has gradually accumulated, the system transitions to the method of Embodiment 2 or Embodiment 3. When device performance allows, the three similarities can also be used at the same time, and a voting mechanism or another algorithm is then used to determine the final similarity.
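One possible way to combine the three similarity dimensions with a simple voting rule is sketched below; the 0.5 vote threshold and the fallback to the mean are assumptions, not values given in the application.

```python
# Sketch of combining syntaxSim, semanticSim and classSim with a simple vote.
def combined_similarity(syntax_sim: float, semantic_sim: float, class_sim: float) -> float:
    scores = [syntax_sim, semantic_sim, class_sim]
    votes = sum(s >= 0.5 for s in scores)      # each dimension casts one vote
    if votes >= 2:                             # majority says "similar"
        return max(scores)                     # keep the strongest evidence
    return sum(scores) / len(scores)           # otherwise fall back to the mean


print(combined_similarity(0.7, 0.9, 1.0), combined_similarity(0.2, 0.4, 0.0))
```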
  • Step S28 determining whether the current intent node is the root node, if not, executing step S29, and if yes, executing step S30.
  • Step S29 Obtain the intent node label of the node one level above the current intent node. Based on the intent tree and the location of the current intent node in it, the node branch from the current intent node to the root node can be determined. Thus, in this step, the intent node label of the upper-level node of the current intent node is obtained from the node branch.
  • Returning to step S21, the intent node label of the upper-level node of the current intent node is assigned to the intent node label in the intent classifier input information, and steps S21-S27 are repeated to obtain the maximum similarity of a further prediction intent. Step S28 then determines whether to end the loop processing; when the prediction intent and its maximum similarity have been obtained for the root node's intent node label, the loop stops. The loop thus yields the maximum similarities of a plurality of prediction intents.
  • Step S30 Compare the scores of the maximum similarities of the plurality of prediction intentions, and determine the maximum similarity with the largest score as the global maximum similarity.
  • Step S31 Determine whether the global maximum similarity is greater than or equal to the first threshold. If it is, the prediction intent at this point can be taken as the true intention of the user, so in step S32 the prediction intent corresponding to the global maximum similarity is determined to conform to the user's intention and is determined as the best intent.
  • If the global maximum similarity is less than the first threshold, in step S33 the user text information and an interaction request for output information are transmitted to the third-party system.
  • After receiving the interaction request and the user text information, the third-party system performs the related processing and sends the resulting output information back; the system receives the output information returned by the third-party system and outputs it in step S4.
  • Another way of processing in this case is to output specific output information, such as "please re-enter".
  • In the flow of FIG. 3, the maximum similarity of the prediction intent obtained from each intent node is calculated step by step. After an intent node has been processed, it is determined whether that node is the root node; if it is, the maximum similarities of the prediction intents obtained from all intent nodes on the node branch have been calculated; if not, the next intent node is taken and calculated. Those skilled in the art will appreciate that this calculation process may also be varied.
  • For example, the process of obtaining the prediction intents and the process of calculating their maximum similarities may be performed separately or in parallel. Whichever method is adopted, those skilled in the art can flexibly choose any of the foregoing according to system requirements and the specific software and hardware conditions.
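A compact sketch of the FIG. 3 flow under stated assumptions: predict_intent() and max_similarity() stand in for the classifier and the similarity calculation described above, and the 0.8 first threshold is only an example value (it matches the value used in application Embodiment 1 below).

```python
# Sketch of the FIG. 3 flow: walk the node branch from the current intent node up to
# the root, obtain a prediction intent for each node, keep each prediction's maximum
# similarity, then compare the global maximum with the first threshold.
def determine_best_intent(user_text, branch_to_root, predict_intent, max_similarity,
                          first_threshold=0.8):
    candidates = []                                     # (similarity, predicted intent)
    for node in branch_to_root:                         # current node ... root node
        merged = node.tag + user_text                   # merge node tag with user text
        intent = predict_intent(merged)                 # classifier prediction
        candidates.append((max_similarity(user_text, intent), intent))

    global_max, best = max(candidates, key=lambda c: c[0])
    if global_max >= first_threshold:
        return best                                     # best intent found
    return None                                         # fall back to the third-party system
```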
  • FIG. 4 is another method flow chart for determining the best intention of the present application.
  • Step S21a assigning an intent node tag in the intent classifier input information, that is, setting an intent node tag in the intent classifier input information as a current intent node tag.
  • Step S22a combining the user text information and the current intent node label into new text information.
  • Step S23a performing word segmentation processing on the new text information.
  • step S24a the text after the word segmentation is vectorized.
  • Step S25a using the word vector as an input to the intent classifier, obtains a prediction intent.
  • Step S26a searching a comparison table of intent and preset input information, and obtaining preset input information corresponding to the prediction intention;
  • Step S27a calculating a similarity between the user text information and the corresponding preset input information, to obtain a maximum similarity corresponding to the predicted intent;
  • Step S28a determining whether the maximum similarity of the prediction intention is greater than or equal to a second threshold, and if the maximum similarity of the prediction intention is greater than or equal to a second threshold, determining, in step S29a, that the prediction intention meets a user intention, Set it to the best intention. If the maximum similarity of the prediction intent is less than the second threshold, in step S30a, it is determined whether the node for the intent node label in the intent classifier input information is the root node, and if not, step S31a is performed. If it is the root node, step S32a is performed.
  • Step S31a Obtain an intent node label of a higher-level node of the current intent node.
  • the intent node label of the upper level node of the current intent node is assigned to the intent node label in the intent classifier input information, and the above steps are repeated.
  • Step S32a Obtain the maximum similarity with the largest score, that is, the global maximum similarity, from the maximum similarities of the plurality of prediction intents that have already been calculated;
  • Step S33a Determine whether the global maximum similarity is greater than or equal to the first threshold. If it is, in step S34a the prediction intent corresponding to the global maximum similarity is determined as the best intent; if the global maximum similarity is less than the first threshold, in step S35a a request is sent to a third party or specific output information is acquired.
  • The threshold values used in the similarity comparisons are stored in the system. If the calculated similarity of two sentences reaches the threshold, the two sentences are the same, or can essentially be considered the same; in this case it can be determined that the current prediction intent is the true intention of the user's input, and the current prediction intent is therefore determined as the best intent.
  • If the calculated similarity is smaller than the threshold, the difference between the two sentences is large, and the intention of the user's input differs from the current prediction intent.
  • Two thresholds, namely a first threshold and a second threshold, are set in the embodiment of the present application.
  • The second threshold is greater than the first threshold. It is used to determine whether the user intention has already been obtained without the system having to backtrack the whole intent tree, thereby speeding up processing and improving the response speed of the system. Because the second threshold is set to a higher score, it demands a stronger correlation between the two sentences; when the current maximum similarity is less than the second threshold, this alone is not enough to decide whether the two sentences concern the same topic and whether the request needs to go to a third-party system. Therefore, when the global maximum similarity is less than the second threshold, it still needs to be compared with the first threshold, which is used to determine whether it is necessary to go to the third-party system.
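A sketch of how the two thresholds might interact in the FIG. 4 flow: the higher second threshold allows an early exit without further backtracking, and only when the whole branch has been tried is the global maximum compared with the lower first threshold. The threshold values and helper names are illustrative assumptions.

```python
# Sketch of the FIG. 4 flow with an early exit on the (higher) second threshold.
def determine_best_intent_early_exit(user_text, branch_to_root, predict_intent,
                                     max_similarity, first_threshold=0.8,
                                     second_threshold=0.95):
    seen = []
    for node in branch_to_root:
        intent = predict_intent(node.tag + user_text)
        score = max_similarity(user_text, intent)
        if score >= second_threshold:          # confident match: stop backtracking
            return intent
        seen.append((score, intent))

    global_max, best = max(seen, key=lambda c: c[0])
    return best if global_max >= first_threshold else None   # None -> third-party system
```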
  • The third-party system and this system can work serially or in parallel. Serially, the system requests the third-party system only when no best intent is found. In parallel, the system saves time by requesting the third-party system at the same time as it performs the steps of determining the best intent.
  • A comparison table of intents and output information is stored in the system database of the present application. Therefore, in step S3 the comparison table is queried according to the best intent obtained in step S2 to determine the output information, which is then output in step S4.
  • the output information described in the look-up table is textual information in a preferred embodiment.
  • When a device such as a display interface is available, the text information is output directly.
  • Otherwise, the text information is converted into voice information before output, for example by TTS (text-to-speech), and then played.
  • The comparison table of intents and output information may be a one-to-one or a one-to-many correspondence, that is, one intent may correspond to a plurality of pieces of output information, in which case one piece of output information may be selected at random.
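A small sketch of this lookup, assuming an in-memory dictionary as the comparison table and random selection when an intent maps to several replies; the table contents are illustrative.

```python
# Sketch of step S3: look up the intent-to-output comparison table; when one intent
# maps to several replies, pick one at random.
import random

OUTPUT_TABLE = {
    "loan": ["How much would you like to borrow?"],
    "bank card withdrawal of 20,000 or less": [
        "Please withdraw the money at the self-service cash machine.",
        "You can use any self-service cash machine for this amount.",
    ],
}


def get_output(best_intent: str) -> str:
    return random.choice(OUTPUT_TABLE[best_intent])


print(get_output("loan"))
```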
  • To prepare the corpus, step S1b collects pairs of input questions and the corresponding output-answer information.
  • step S2b the intent of the input question and the corresponding superior intention are marked for each pair of input questions and corresponding information of the output answer.
  • Step S3b generating an intent tree according to the intent of the annotation and the corresponding superior intention.
  • Step S4b combining each input question information and the intent node label into new text information.
  • Step S5b Use the word vector of the new text information as the input of an intent classifier (the intent classification model in the foregoing method), and obtain a prediction intent from the intent classifier.
  • the corpus can be continuously expanded to provide sufficient and rich corpus content for the human-computer interaction described in the present application.
  • the application also provides a human-computer interaction system, and its principle block diagram is shown in FIG. 5.
  • the system includes a speech recognition module 1, a best intent determination module 2, a query module 3, and an output module 4.
  • the voice recognition module 1 receives the voice information input by the user, and identifies the voice input information of the user as the corresponding user text information.
  • the best intent determination module 2 is connected to the speech recognition module 1 and the database, and the database stores an intent tree and an intent node tag, and the best intent determination module 2 obtains an intent node tag from the database, and Based on the user text information obtained by the speech recognition module 1, based on the node group of the intent tree, the intent classifier is used to determine the best intent; after the best intent is obtained, the best intent is sent to the query module 3.
  • the query module 3 queries the database for a comparison table of intent and output information according to the best intention, thereby obtaining corresponding output information, and transmitting the output information to the output module 4. After the output module 4 obtains the output information, the output information is output according to a format set or a format requested by the user, for example, by text, voice, or the like.
  • the schematic block diagram of the best intention determination module 2 is specifically as shown in FIG. 6, and includes a merging unit 21, an intent classifier 22, a verification unit 23, and a determining unit 24.
  • The merging unit 21 is connected to the voice recognition module 1 and the database respectively, obtains the user text information and the intent node label, merges the user text information and the intent node label into new text information, and sends the new text information to the intent classifier 22.
  • the intention classifier 22 takes the combined information of the merging unit as input information to obtain a prediction intention.
  • the verification unit 23 is configured to verify whether the predicted intent meets the user's intention.
  • the determining unit 24 is configured to determine a predicted intent that conforms to the user's intention as the best intention.
  • The verification unit 23 includes a lookup subunit 231, a similarity calculation subunit 232, and a threshold comparison subunit 233.
  • the searching sub-unit 231 is configured to search the comparison table of the intent and the preset input information according to the prediction intention output by the intent classifier to obtain preset input information corresponding to the prediction intention; the similarity calculation sub-unit 232 uses The similarity between the user text information and the corresponding preset input information is calculated, and the maximum similarity corresponding to the prediction intention is obtained.
  • the threshold comparison sub-unit 233 is configured to compare the maximum similarity with the magnitude of the threshold, and send the comparison result to the determining unit.
  • FIG. 7 is a schematic block diagram of the first embodiment of the best intention determining module 2.
  • The merging unit 21a is connected to the voice recognition module 1 and the database respectively, obtains the user text information and the current intent node label, merges the user text information and the current intent node label into new text information, and sends the new text information to the intent classifier 22a.
  • the merging unit further includes a notification receiving interface for receiving the merge notification, thereby performing merging of the user text information and the new intent node label.
  • the intention classifier 22a takes the combined information of the merging unit 21a as input information to obtain a prediction intention.
  • the lookup subunit 231a searches the lookup table of the intent and the preset input information to obtain preset input information corresponding to each prediction intent.
  • the similarity calculation sub-unit 232a calculates the similarity between the user text information and the corresponding preset input information, and obtains the maximum similarity of the corresponding prediction intention.
  • the similarity calculation sub-unit 232a includes a notification output interface that sends a merge notification to the merging unit 21a after calculating the maximum similarity of one prediction intention.
  • The merging unit 21a receives the merge notification through the notification receiving interface, obtains from the intent tree the intent node tag of the node one level above the current intent node, performs a new merge, and sends the merged information to the intent classifier 22a.
  • the intent classifier 22a may also send a merge notification to the merging unit 21a after obtaining a prediction intention, and at this time, the similarity calculation sub-unit 232a is not required to transmit the merge notification.
  • After the maximum similarities of the plurality of prediction intents have been calculated, the similarity calculation sub-unit 232a obtains the global maximum similarity with the largest score from them and sends it to the threshold comparison subunit 233a.
  • The threshold comparison subunit 233a receives the global maximum similarity, obtains the first threshold from the database, and compares the global maximum similarity with the first threshold. If the global maximum similarity is greater than or equal to the first threshold, a notification is sent to the determining unit 24a, and the determining unit 24a determines the prediction intent corresponding to the global maximum similarity as the best intent. If the global maximum similarity is less than the first threshold, the user text information and an interaction request are sent to the third-party system 6 through the third-party interface module 5.
  • the third-party system 6 processes the interaction request and the user text information, and sends the processed reply information (that is, the output information of the user should be replied to) to the system.
  • the third-party interface module 5 receives the output information returned by the third-party system 6, and sends the output information to the output module 4, and the output module 4 outputs the information.
  • the third-party interface module 5 of the present application is connected to the third-party system 6 in order to solve the problem of providing the user with non-system service content.
  • The question information input by the user is sometimes not something the system can solve; for example, the user asks a question in another field, such as "How much does gas cost?".
  • In that case the system still predicts an intent from the user input, but even when the prediction intent has backtracked to the root intent of the intent tree, the calculated global maximum similarity is still less than the internally set first threshold.
  • the system can determine that the question information input by the user at this time is a topic outside the system, so the user text information and the interaction request are sent to the third-party system, and at this time, the question is processed by the third-party system.
  • The reply information is produced by the third-party system and sent to this system, where it is received by the third-party interface module 5 and passed to the output module 4 for output. Therefore, the system can not only interact with the user on topics in the system's own domain, but also switch between topics in different fields, thereby realizing interaction without topic barriers and answering the various questions raised by the user.
  • FIG. 8 is a schematic block diagram of the second embodiment of the best intent determination module 2.
  • Its structural composition is the same as that of the first embodiment of the best intent determination module 2, but the workflow is different, as follows:
  • the merging unit 21b obtains user text information and current intent node tags, merges the user text information and the current intent node tag into new text information, and transmits the new text information to the intent classifier 22b.
  • the merging unit 21b further includes a notification receiving interface for receiving a merge notification to perform a new merging.
  • the intention classifier 22b takes the combined information of the merging unit 21b as input information to obtain a prediction intention.
  • the lookup subunit 231b is configured to search the lookup table of the intent and the preset input information according to the prediction intention output by the intention classifier 22b, and obtain preset input information corresponding to the prediction intention.
  • the similarity calculation sub-unit 232b calculates the similarity between the user text information and the corresponding preset input information, obtains the maximum similarity of the corresponding prediction intention, and transmits it to the threshold comparison unit 233b.
  • The threshold comparison subunit 233b receives the maximum similarity, obtains the second threshold from the database, and compares the maximum similarity with the second threshold. If the maximum similarity is greater than or equal to the second threshold, a notification is sent to the determining unit 24b, and the determining unit 24b determines the prediction intent corresponding to the maximum similarity as the best intent. If the maximum similarity is less than the second threshold, a merge notification is sent to the merging unit 21b through the notification interface.
  • The merging unit 21b performs a new merge according to the merge notification received through its interface, obtaining from the intent tree in the database the intent node tag of the node one level above the current intent node, and sends the merged information to the intent classifier 22b.
  • the intent classifier 22b obtains another prediction intent based on the new input information.
  • The workflow of each component repeats as described above until the threshold comparison subunit 233b finds, by comparison, that even the maximum similarity of the prediction intent obtained from the root intent node is smaller than the second threshold; it then sends a notification to the similarity calculation subunit 232b requesting the global maximum similarity.
  • the similarity calculation unit 232b obtains the global maximum similarity from the maximum similarity of all the predicted intentions, and transmits it to the threshold comparison unit 233b.
  • The threshold comparison subunit 233b receives the global maximum similarity, obtains the first threshold from the database, and compares the global maximum similarity with the first threshold. If the global maximum similarity is greater than or equal to the first threshold, a notification is sent to the determining unit 24b, and the determining unit 24b determines the prediction intent corresponding to the global maximum similarity as the best intent. If the global maximum similarity is less than the first threshold, the user text information and an interaction request are sent to the third-party system 6 through the third-party interface module 5.
  • the intent classifiers 22, 22a, 22b include a text vectorization unit for segmenting the user input text information transmitted by the merging unit 21, 21a, 21b and the merged new text information of the intent node label. And text vectorization processing, the resulting word vector is used as input to the intent classifiers 22, 22a, 22b.
  • the best intent determination module 2 includes the text vectorization unit, i.e., separately from the intent classifiers 22, 22a, 22b, for modular design and maintenance.
  • The system further includes a current intent node maintenance module 7, as shown in FIG. 9, which, after the best intent has been determined, sets the best-intent node with the largest path length among the best-intent node paths as the current intent node, changing the current intent to the best intent, so that the current intent node can be obtained quickly at the start of the system's next interaction. When backtracking the intent tree still yields no best intent, the original current intent node remains unchanged.
  • In general, the current intent node maintenance module 7 sets the best-intent node as the current intent node. However, an intent tree sometimes contains multiple intent nodes with identical tags, as shown in FIG. 10. When two or more identical best-intent node tags exist, the node with the longest (deepest) path must be set as the current intent node. Specifically, the best-intent node tag is first searched for in the node tag set of the intent tree to obtain the node path(s) corresponding to the best intent; when there are multiple such paths, the best-intent node with the largest path length is determined to be the current intent node.
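A sketch of this deepest-node rule, reusing the branch_to_root() helper from the intent-tree sketch above: collect every node carrying the best-intent tag and keep the one with the largest path length. The function name is an assumption.

```python
# Sketch of choosing the current intent node when several nodes share the best-intent tag.
def select_current_node(root, best_tag):
    matches = []
    stack = [root]
    while stack:                                   # walk the whole intent tree
        node = stack.pop()
        if node.tag == best_tag:
            matches.append(node)
        stack.extend(node.children)
    # deepest match = largest path length from node to root
    return max(matches, key=lambda n: len(n.branch_to_root()), default=None)
```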
  • The human-computer interaction process is briefly described below in conjunction with the intent tree and backtracking process diagram shown in FIG. 11.
  • Assume the current intent in the system is "bank card withdrawal of 20,000 or less"; the node branch from the current intent node to the root intent node then includes, from low to high (the root intent node being at the highest level),
  • "bank card withdrawal of 20,000 or less", "withdrawal of 20,000 or less", "withdrawal" and "Root".
  • Step S1 Identification: The system recognizes the user's voice input "I want a loan" as text information.
  • Step S2 merge 1: "Bank card withdrawal below 20,000” + "I want a loan”.
  • Step S3 text vectorization: word segmentation, and vectorization.
  • Step S4 Predicting intent: using the intent classifier to obtain the predicted intent "bank card withdrawal of 20,000 or less".
  • Step S5 calculating the similarity, and obtaining the maximum similarity score: 0.485366550785.
  • Step S6 merge 2: "Withdrawals below 20,000” + "I want a loan”.
  • Step S7 text vectorization: word segmentation, and vectorization.
  • Step S8 predicting intent: using the intent classifier to obtain the predicted intent "withdrawal".
  • Step S9 calculating the similarity, and obtaining the maximum similarity score: 0.577754257751.
  • Step S10 merge 3: "Withdrawal" + "I want a loan".
  • Step S11 text vectorization: word segmentation, and vectorization.
  • Step S12 predicting intent: using the intent classifier to obtain the predicted intent "withdrawal".
  • Step S13 calculating the similarity, and obtaining a maximum similarity score: 0.353053754796.
  • Step S14 merge 4: "Root" + "I want a loan".
  • Step S15 text vectorization: word segmentation, and vectorization.
  • Step S16 predicting intent: using the intent classifier to obtain the predicted intent "loan”.
  • Step S17 calculating the similarity, and obtaining a maximum similarity score: 1.0.
  • Step S18 Obtain a global maximum similarity with the largest score from the plurality of maximum similarities: 1.0.
  • Step S19 Comparing the global maximum similarity score 1.0 with the set first threshold value 0.8, and the global maximum similarity is greater than the first threshold.
  • Step S20 Determine a predicted intent "loan” corresponding to the global maximum similarity as a best intention.
  • Step S21 Select the output information "How much would you like to borrow?" corresponding to "loan", and update the current intent node in the system to "loan".
  • Step S22 Output the voice information "How much would you like to borrow?" to the user.
  • each node in the branch is traversed by backtracking the intent tree, thus increasing the accuracy of the best intent.
  • Step S1 Merger: "Bank card withdrawal is less than 20,000” + "Bank card withdrawal is less than 20,000”.
  • Step S2 forecasting intention: "Bank card withdrawal is less than 20,000”.
  • Step S3 calculating the maximum similarity: 1.0.
  • Step S4 Comparing the maximum similarity score 1.0 with a set second threshold value 1.0, the maximum similarity being equal to the second threshold value.
  • Step S5 determining a prediction intention "bank card withdrawal of 20,000 or less" corresponding to the maximum similarity is a best intention.
  • step S6 the output information corresponding to the “bank card withdrawal of 20,000 or less” is selected, “Please go to the self-service cash machine to withdraw money”, and the node that updates the current intention in the system is “the bank card withdrawal amount is less than 20,000”.
  • step S7 the user outputs a voice message of “please withdraw money to the self-service cash machine”.
  • the intent tree is no longer traversed, thereby saving processing time and improving the response speed of the system to the user.
  • In summary, the present application provides a human-computer interaction method and system with simple operation, fast processing, and accurate responses for vertical fields where intentions are clear and the completion of transactions has clear steps, such as banks, courts, and hospitals.
  • the intention tree backtracking mechanism is used, only the intent of the corpus in the vertical domain and the superior intention are marked, and no other common corpus annotation is needed, which saves a lot of time for the annotation processing.
  • In the specific implementation, only the classifier is used to predict the intent, and the backtracking mechanism is used to find the optimal node to obtain accurate output information. The approach can realize interaction under the same topic and topic switching and interaction under different topics within the business; by communicating with a third-party system, topics outside the service domain can be switched to, providing the user with replies on different topics.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • The instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device.
  • the instructions provide steps for implementing the functions specified in one or more of the flow or in a block or blocks of a flow diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory. (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, Magnetic tape cartridges, magnetic tape storage or other magnetic storage devices or any other non-transportable media can be used to store information that can be accessed by a computing device.
  • As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a human-computer interaction method and system. The method includes: identifying the user's voice input information as corresponding user text information; determining a best intent by means of an intent classifier and corresponding data processing according to the user text information and an intent node tag, based on an intent tree node group; querying a comparison table of intents and output information according to the best intent to obtain corresponding output information; and outputting the output information. The system includes a speech recognition module, a best intent determination module, a query module, and an output module. The embodiments of the present application adopt an intent tree backtracking mechanism, which improves the accuracy of recognizing the user's intent, with simple operation and fast system response.

Description

Human-computer interaction method and system
Cross Reference
This application claims priority to Chinese Patent Application No. 201711054329.6, entitled "Human-computer interaction method and system" and filed on October 31, 2017, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of automatic response systems, and in particular to a human-computer interaction method and system.
Background Art
With the development of society, robots realizing various functions play more and more roles in society. In some service industries, friendly and efficient human-computer interaction is particularly important. Among the many modes of human-computer interaction, such as touch interaction, somatosensory interaction, text, and voice, text and voice are the most common. For example, the ATM machines used in banking systems and the card readers used for payment in retail mostly use text interaction, providing accurate question-and-answer information for human-machine communication. However, compared with voice, text-based human-computer interaction has certain limitations; for example, when the users are children or people with reading difficulties, text-based interaction cannot serve these groups effectively. By contrast, voice is an ideal, direct and convenient mode of human-computer interaction. Therefore, the question-and-answer systems in human-computer interaction systems currently on the market mostly adopt voice interaction: the user asks a question, the system responds to the user's question, gives the corresponding answer by voice, and sometimes performs other corresponding operations at the same time.
However, because many technical difficulties remain in semantic understanding with current technology, most of the above question-and-answer systems remain shallow dialogue systems that can usually only perform simple matching of specific user instructions against the system database and then give the corresponding voice response. Human-computer interaction in this mode is far removed from human-to-human communication and falls far short of users' needs.
To improve the semantic understanding of users' voice input, the industry has made many technical attempts. For example, technologies such as speech recognition and semantic synthesis break through the traditional requirement that specific voice instructions must be input, allowing human-computer interaction based on natural language. However, because most such systems use grammar-based natural language understanding, although semantic understanding can be achieved, spoken natural language is often irregular or even ungrammatical, which leads to recognition failures or errors.
Another approach performs semantic matching on the voice signal input by the user through a semantic network, mainly by matching the content of semantic analysis against preset semantic relation libraries and sentence-pattern templates. This method places high restrictions on the language the user may use, and recognition becomes difficult once an instruction has not been preset.
The invention patent application with publication number 104360994A, entitled "A natural language understanding method and system", provides another scheme: a Ranking SVM (a learning-to-rank algorithm based on support vector machines) extracts feature vectors from the text, and a linear-kernel SVM then implements statistics-based ranking of the correlation between the multi-scenario semantic parsing results and the natural language input by the user. The shortcoming of this method is that it is susceptible to noise interference and prone to over-fitting, so its understanding of natural language is not accurate enough.
The invention patent application with publication number CN106156003A, entitled "A question understanding method for a question-and-answer system", provides a slot-filling method to obtain an understanding of the question. Its specific scheme is to jointly solve the intent recognition task and the slot filling task in the question through recurrent neural network modeling, improving the accuracy of question understanding. However, when the slot-filling technique is used, the sentence must be analyzed to determine the event, extract the entities, find the satisfied slots, and so on, which is relatively complicated to implement; it can only handle dialogue within the same topic and cannot realize topic switching.
Summary of the Invention
The technical problem to be solved by the present application is that existing human-computer interaction technology does not understand the user's intent accurately enough; a human-computer interaction method and system are provided for realizing accurate human-machine communication.
The present application solves the above technical problem through the following technical solution:
A human-computer interaction method, comprising the following steps:
identifying the user's voice input information as user text information;
determining a best intent by means of an intent classifier and corresponding data processing according to the user text information and an intent node tag, based on an intent tree node group;
querying a comparison table of intents and output information according to the best intent to obtain corresponding output information; and
outputting the output information.
In the above method, the step of determining the best intent by means of the intent classifier and corresponding data processing according to the user text information and the intent node tag, based on the intent tree node group, specifically includes:
obtaining the current intent node tag;
determining, from the intent tree node group, the node branch from the current intent node to the root intent node;
merging the user text information and the intent node tag into the input information of the intent classifier;
obtaining corresponding prediction intents with the intent classifier by replacing the intent node tag in the intent classifier input information with the intent node tags in the node branch; and
verifying whether the prediction intents meet the user's intent, and determining the prediction intent that meets the user's intent as the best intent.
In the foregoing method, when the intent node tag in the intent classifier input information is replaced with the intent node tags in the branch, each node's intent node tag from the current intent node to the root intent node replaces, in turn, the intent node tag in the classifier input information, yielding a plurality of corresponding prediction intents;
the step of verifying whether the prediction intents meet the user's intent includes:
looking up a comparison table of intents and preset input information to obtain the preset input information corresponding to each prediction intent;
calculating the similarity between the user text information and the corresponding preset input information to obtain a maximum similarity for each prediction intent;
comparing the scores of the maximum similarities of the plurality of prediction intents, and determining the maximum similarity with the largest score as the global maximum similarity; and
comparing the global maximum similarity with a first threshold; if the global maximum similarity is greater than or equal to the first threshold, determining that the prediction intent corresponding to the global maximum similarity meets the user's intent.
If the global maximum similarity is less than the first threshold, either corresponding specific output information is acquired and output, or an interaction request is sent to a third-party system, the third-party interaction output information returned by the third-party system is received, and the third-party interaction output information is output.
其中,将意图分类器输入信息中意图节点标签替换为所述节点分支中的意图节点标签时,从当前意图节点开始,用当前意图节点标签替换意图分类器输入信息中的意图节点标签,得到对应的预测意图;
验证所述预测意图是否符合用户意图的步骤包括:
查找意图与预置输入信息的对照表,得到与所述预测意图对应的预置输入信息;
计算用户文本信息与对应的预置输入信息的相似度,得到对应所述预测意图的最大相似度;
比较所述预测意图的最大相似度和第二阈值的大小,如果所述预测意图的最大相似度大于或等于第二阈值,则确定所述预测意图符合用户意图;
如果所述预测意图的最大相似度小于所述第二阈值,将意图分类器输入信息中意图节点标签替换所述分支中当前意图节点的上一级节点的意图节点标签,重复上述步骤。
当将意图分类器输入信息中意图节点标签替换为根意图节点标签时得到的对应最大相似度小于所述第二阈值时,根据已计算得到的多个预测意图的最大相似度,比较所述多个预测意图的最大相似度的分值大小,将分值最大的最大相似度确定为全局最大相似度;和
比较所述全局最大相似度和第一阈值的大小，如果所述全局最大相似度大于或等于所述第一阈值，则确定与所述全局最大相似度对应的预测意图符合用户意图；如果所述全局最大相似度小于所述第一阈值，则或者获取对应的特定输出信息，并输出所述特定输出信息；或者向第三方系统发送交互请求，接收第三方系统返回的第三方交互输出信息，并向用户输出所述第三方交互输出信息。
为方便下一次交互可以快速得到当前意图节点,在将符合用户意图的预测意图确定为最佳意图后,将所述最佳意图对应的节点确定为当前意图节点。
其中,将所述最佳意图对应的节点确定为当前意图节点的过程包括:
在所述意图树的节点标签集中搜索所述最佳意图节点标签,得到最佳意图节点路径;
当最佳意图节点路径为多个时,将路径长度最大的最佳意图节点确定为当前意图节点。
在前述方法中,将所述用户文本信息和意图节点标签合并为意图分类器的输入信息的步骤具体包括:
将所述的用户文本信息和意图节点标签合并为新的文本信息;
对所述新的文本信息进行分词和文本向量化处理,得到对应的词向量;和
将所述词向量作为意图分类器的输入信息。
在前述方法中,所述意图分类器为卷积神经网络模型或循环神经网络模型。
本申请还提供了一种人机交互系统,包括:
语音识别模块,用于将用户的语音输入信息识别为用户文本信息;
最佳意图确定模块,用于根据所述用户文本信息和意图节点标签,基于意图树的节点群,利用意图分类器及对应的数据处理确定最佳意图;
查询模块,用于根据所述最佳意图,查询意图与输出信息的对照表,得到对应的输出信息;和
输出模块,用于输出所述输出信息。
其中,所述最佳意图确定模块包括:
合并单元,用于合并所述的用户文本信息和意图节点标签;
意图分类器,用于以所述合并单元的合并信息为输入信息,得到预测意图;
验证单元,用于验证所述预测意图是否符合用户意图;和
确定单元,用于将符合用户意图的预测意图确定为最佳意图。
其中,所述验证单元包括:
查找子单元,用于根据意图分类器输出的预测意图,查找意图与预置输入信息的对照表,得到与每一预测意图对应的预置输入信息;
相似度计算子单元,用于计算用户文本信息与对应的预置输入信息的相似度,获得对应预测意图的最大相似度;和
阈值比较子单元,用于比较所述最大相似度与阈值的大小,并将比较结果发送给所述确定单元。
其中,所述合并单元包括通知接收接口,用于接收合并通知;对应地,所述意图分类器包括通知输出接口,用于输出合并通知;或者所述相似度计算子单元包括通知输出接口,用于向所述合并单元发送合并通知;或者所述阈值比较子单元包括通知输出接口,用于向所述合并单元发送合并通知。
本申请所述系统还包括当前意图节点维护模块,用于在最大相似度小于阈值时,保留当前意图节点;在确定了最佳意图时,将最佳意图节点路径中路径长度最大的最佳意图节点确定为当前意图节点。
本申请所述系统还包括第三方接口模块,与所述最佳意图确定模块相连接,用于在所述最佳意图确定模块确定没有最佳意图时,将所述用户文本信息和交互请求发送给第三方系统,并接收第三方系统返回的输出信息,将所述输出信息发送给所述输出模块。
本申请采用意图树回溯机制，提供了一种操作简单、系统处理快速、响应准确的人机交互方法和系统，只需要标注垂直领域内语料的意图及上级意图，无需其它通用语料的标注，节省了大量的标注处理时间。在具体实现过程中，只需要用分类器预测意图，并采用回溯机制寻找最优的节点便可以得到准确的输出信息。能够实现业务内同一主题下交互和不同主题下的话题切换和交互，通过与第三方系统的通信，能够实现业务外话题的切换，为用户提供不同话题的回复信息，本申请可以应用在用户意图明确、事务的完成有很清晰步骤的垂直领域，如银行、法院、医院等。
下面结合附图和具体实施例,对本申请的技术方案进行详细地说明。
附图说明
图1为本申请所述人机交互方法的总体流程图;
图2为本申请意图树的一个实施例的关系示意图;
图3为本申请所述人机交互方法中确定最佳意图的方法流程图;
图4为本申请所述人机交互方法中确定最佳意图的另一方法流程图;
图5为本申请所述人机交互系统的原理框图;
图6为本申请所述最佳意图确定模块的原理框图;
图7为最佳意图确定模块实施例一的原理框图;
图8为最佳意图确定模块实施例二的原理框图;
图9为本申请所述人机交互系统的另一原理框图;
图10为本申请意图树的另一个实施例的关系示意图;
图11为本申请应用实施例一中的意图树及其回溯过程示意图。
具体实施方式
图1为本申请人机交互方法的总体流程图。如图1所示,本申请所述的人机交互方法包括以下步骤:
步骤S1、将用户的语音输入信息识别为用户文本信息;
步骤S2、根据所述用户文本信息和意图节点标签,基于意图树节点群,通过意图分类器及对应的数据处理确定最佳意图;
步骤S3、根据所述最佳意图,查询意图与输出信息的对照表,得到对应的输出信息;和
步骤S4、输出所述输出信息。
其中,在所述步骤S1中,通过语音识别技术,将用户输入的语音信息识别为对应的用户文本信息,便于后续的处理。由于语音识别技术已为很成熟的技术,因而,本申请不再展开说明,本领域的技术人员可以参照目前的任意一种语音识别技术来完成。
步骤S2中的意图树为系统数据库中存储的意图树。在本申请中,所述的意图树包括多个呈上、下级关系的节点,每一个节点以意图节点标签的形式标注,并在系统中记录每一节点的路径,从而确定节点在意图树中的位置。
如图2所示，为本申请意图树的一个实施例的关系示意图，本实施例以垂直领域中的银行系统为例。在该实施例中，共列出了四级意图节点，最高一级为根意图节点Root，其下一级包括“存款”、“取款”和“贷款”三个意图节点。意图节点“取款”的下一级意图节点中，分别为“取款2万以下”、“取款2-5万”、“取款5万以上”。意图节点“取款2万以下”的下级意图节点包括“银行卡取款2万以下”和“存折取款2万以下”。意图节点“取款5万以上”的下级意图节点包括“取款5万以上需预约”和“取款5万以上已预约”。
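为便于理解意图树节点群及节点分支的组织方式，下面给出一个简化的Python示意（并非本申请限定的实现，其中的类名、字段名均为示例假设），按图2的银行系统实施例构建意图树，并演示如何从当前意图节点回溯得到到根意图节点的节点分支：

```python
# 意图树节点群的一种可能组织方式的简化示意，类名与字段名均为本示例假设。
from dataclasses import dataclass, field
from typing import Optional, List


@dataclass
class IntentNode:
    label: str                                   # 意图节点标签，如“取款2万以下”
    parent: Optional["IntentNode"] = None        # 上一级意图节点
    children: List["IntentNode"] = field(default_factory=list)

    def add_child(self, label: str) -> "IntentNode":
        child = IntentNode(label=label, parent=self)
        self.children.append(child)
        return child

    def branch_to_root(self) -> List["IntentNode"]:
        """从当前意图节点回溯到根意图节点，返回节点分支（由低到高）。"""
        node, branch = self, []
        while node is not None:
            branch.append(node)
            node = node.parent
        return branch


# 按图2的银行系统示例构建意图树
root = IntentNode("Root")
withdraw = root.add_child("取款")
root.add_child("存款")
root.add_child("贷款")
under_20k = withdraw.add_child("取款2万以下")
card_under_20k = under_20k.add_child("银行卡取款2万以下")

# 当前意图节点为“银行卡取款2万以下”时，其到根节点的分支依次为：
# 银行卡取款2万以下 -> 取款2万以下 -> 取款 -> Root
print([n.label for n in card_under_20k.branch_to_root()])
```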
为了根据用户的输入信息确定用户的意图,本申请通过意图分类器及对应的数据处理来确定用户意图,具体过程如图3所示,为本申请人机交互方法中确定最佳意图的方法流程图,具体如下:
步骤S21、对意图分类器输入信息中的意图节点标签进行赋值。在本申请中，意图分类器输入信息为用户文本信息和意图节点标签的合并信息。其中，用户文本信息由步骤S1通过识别用户语音输入信息得到，此时已是已知量。而意图节点标签为一变量。系统数据库中存储有“当前意图节点”的信息，其中包括了当前意图节点标签，因而当前意图节点标签为一已知参量。在系统最初始使用时，当前意图节点可以为意图树中的任意一个节点，如最末级的一个意图节点。在使用过程中，系统保存了上一次交互完成后确定的当前意图节点的相关信息。根据所述当前意图节点在意图树节点群中的位置，可以确定从当前意图节点到根意图节点的节点分支。在本步骤中，通过访问数据库中“当前意图节点”信息，便可以得到当前意图节点标签，并将其赋值给意图分类器输入信息中的意图节点标签。
步骤S22、将所述的用户文本信息和当前意图节点标签合并为新的文本信息。
步骤S23、对所述新的文本信息进行分词处理。在该步骤中,采用现有技术中任意一种分词工具对所述新的文本信息进行分词。例如,针对文字信息“我想取一些钱,大概两万样子”,分词工具将其分为:我/想/取/一些/钱/,/大概/两万/样子。
步骤S24,将分词后的文本向量化,例如,通过在语料库中查询词向量,从而将文本转换为多个高维向量的组合。如前的例句,转化后的向量可以表示为:[V1,V2,V3,V4,V5,V6,V7,V8,V9],其中V1-V9为例句中各个分词的对应词向量。
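以下为分词与文本向量化这两个步骤的一个简化Python示意，仅作流程说明：其中使用jieba分词、用随机向量代替真实语料库中的词向量，均为示例假设，并非本申请限定的工具或实现：

```python
# 分词与文本向量化的简化示意：将合并后的文本分词，并逐词查询词向量。
import jieba
import numpy as np

word_vectors = {}          # 词 -> 词向量（实际应从预训练词向量/语料库中加载）
DIM = 100                  # 词向量维度，示例假设为100维


def text_to_vectors(text: str) -> np.ndarray:
    """将文本分词，并逐词查询词向量，得到 [V1, V2, ..., Vn]。"""
    tokens = jieba.lcut(text)
    vectors = []
    for tok in tokens:
        if tok not in word_vectors:
            # 未登录词此处简单地用随机向量占位，实际系统可另行处理
            word_vectors[tok] = np.random.randn(DIM)
        vectors.append(word_vectors[tok])
    return np.stack(vectors)              # 形状为 (词数, DIM)


merged = "取款2万以下" + "我想取一些钱，大概两万样子"
print(text_to_vectors(merged).shape)
```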
步骤S25、以所述词向量作为意图分类器的输入,得到一个预测意图。其中,所述意图分类器为一个意图分类模型,以文本向量为输入,意图节点标签为输出,通过输出的意图节点标签便可以确定是什么意图。通过对训练语料进行训练可以得到所述的意图分类模型。例如,采用公式y=softmax(Wx+b)表示该模型,其中x为输入的文本向量,W与b为神经网络的权值,y为输出向量,其中最大值对应的标签即为得到的类别,也就是本申请所述的意图节点标签。也可以使用神经网络,如卷积神经网络(Convolutional Neural Network,简称CNN)或循环神经网络(Recurrent Neural Networks,简称RNN)等来获得对应的模型公式。
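下面给出公式y=softmax(Wx+b)的一个最小化Python/NumPy示意。其中权值W、b应由训练语料训练得到，此处随机初始化，意图类别集合亦为示例假设，仅用于说明分类器的输入输出形式：

```python
# y = softmax(Wx + b) 形式的意图分类器最小化示意，参数与类别集合均为示例假设。
import numpy as np

np.random.seed(0)
NUM_INTENTS = 9            # 意图节点标签的类别数（示例假设）
INPUT_DIM = 100            # 输入文本向量的维度（示例假设）

W = np.random.randn(NUM_INTENTS, INPUT_DIM) * 0.01
b = np.zeros(NUM_INTENTS)
intent_labels = ["Root", "存款", "取款", "贷款", "取款2万以下", "取款2-5万",
                 "取款5万以上", "银行卡取款2万以下", "存折取款2万以下"]


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()                       # 数值稳定处理
    e = np.exp(z)
    return e / e.sum()


def predict_intent(x: np.ndarray) -> str:
    """x 为合并文本的向量表示（例如各词向量取平均），返回预测的意图节点标签。"""
    y = softmax(W @ x + b)
    return intent_labels[int(np.argmax(y))]


x = np.random.randn(INPUT_DIM)            # 代表某一句合并文本的向量
print(predict_intent(x))
```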
在得到预测意图后，需要确定所述的预测意图是否符合用户意图，因而需要验证所述的预测意图。验证方法有多种，以下为其中的一种：
步骤S26、根据所述预测意图,查找意图与预置输入信息的对照表,得到对应的预置输入信息。在本申请中,数据库中存储有意图与预置输入信息的对照表,其为预先设计、存储的问句或问题与其意图的对应表。通常,一个意图可对应多个问句。
步骤S27、分别计算用户文本信息与对应的预置输入信息的相似度,并获得最大相似度。由于两句话相似度越高,即越相似越可信,因而,通过计算两句话的相似度可以有效提高获得输出信息的精确性。
句子与句子间的相似度可以从多个维度来计算,包括语法、语义及句型。关于语法相似度(syntaxSim),考虑词的顺序、句子的长度等;关于语义相似度(semanticSim),通过各个词的词向量加权求平均的方式得出句子向量,并计算向量间的余弦值;句型相似度(classSim),通过判断句子是否属于同一种句型,给0或1。
相似度计算的实施例一:
句子A与B的相似度可以采用以下公式计算:
sim(A,B)=α*semanticSim(A,B)+β*syntaxSim(A,B)+γ*classSim(A,B)
其中α+β+γ=1，α>β，α>γ。
另外,也可以利用神经网络来计算,将句子向量化后,利用CNN、RNN或RNN+attention(注意力循环神经网络),通过计算两句话的欧式距离或余弦夹角来训练相似度模型,从而得到两个句子的相似度。本实施例的计算简单,易解释。
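下面给出实施例一相似度公式的一个简化Python示意。其中各子相似度的具体实现（句向量取词向量平均、用句长之比近似语法相似度等）以及权重α=0.6、β=0.3、γ=0.1均为示例假设，并非本申请限定的取值：

```python
# sim(A,B) = α*semanticSim + β*syntaxSim + γ*classSim 的简化示意。
import numpy as np


def semantic_sim(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """语义相似度：句向量（各词向量平均）间的余弦值，映射到[0,1]。"""
    cos = float(np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))
    return (cos + 1.0) / 2.0


def syntax_sim(tokens_a: list, tokens_b: list) -> float:
    """语法相似度：此处简单地用句长之比近似，实际可进一步考虑词序等因素。"""
    la, lb = len(tokens_a), len(tokens_b)
    return min(la, lb) / max(la, lb)


def class_sim(type_a: str, type_b: str) -> float:
    """句型相似度：同一句型给1，否则给0。"""
    return 1.0 if type_a == type_b else 0.0


def sentence_sim(vec_a, vec_b, tokens_a, tokens_b, type_a, type_b,
                 alpha=0.6, beta=0.3, gamma=0.1) -> float:
    """按实施例一的公式加权求和，权重取值仅为示例假设。"""
    assert abs(alpha + beta + gamma - 1.0) < 1e-6
    return (alpha * semantic_sim(vec_a, vec_b)
            + beta * syntax_sim(tokens_a, tokens_b)
            + gamma * class_sim(type_a, type_b))
```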
相似度计算的实施例二:
将相同意图下的句子认为是相似句,将不同意图下的句子认为不相似,训练得到的模型即可计算两句话之间的相似度。
句子A与句子B的相似度可由以下用简单的公式表示:
sim(A,B)=f(Wx1+b,Wx2+b)，其中x1、x2分别为句子A与句子B的向量，W、b为神经网络参数，f为通过欧式距离或余弦夹角计算相似度的函数。本实施例需要大量的语料来进行训练，因而准确度高。
相似度计算的实施例三:
将相同意图下句子与意图认为相似,将不同意图下句子与意图认为不相似,训练得到的模型即可计算句子与意图之间的相似度。
句子A与意图C的相似度可由以下用简单的公式表示:
sim(A,C)=f(Wx1+b,Wx2+b)，其中x1、x2分别为句子A与意图C的向量，W、b为神经网络参数，f为通过欧式距离或余弦夹角计算相似度的函数。本实施例基于大量的语料训练，且计算速度快，可以有效提高系统的响应速度。
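下面给出实施例二、三中sim=f(Wx1+b,Wx2+b)这一形式的极简Python示意：两个输入共享同一组参数W、b，f取余弦夹角；参数应由标注为“相似/不相似”的语料训练得到，此处随机初始化仅作结构说明：

```python
# 共享参数的相似度模型结构示意：两路输入经同一线性变换后计算余弦相似度。
import numpy as np

np.random.seed(1)
IN_DIM, OUT_DIM = 100, 64
W = np.random.randn(OUT_DIM, IN_DIM) * 0.01
b = np.zeros(OUT_DIM)


def encode(x: np.ndarray) -> np.ndarray:
    """对输入向量做线性变换 Wx+b（两路输入共享参数）。"""
    return W @ x + b


def sim(x1: np.ndarray, x2: np.ndarray) -> float:
    """f 取余弦夹角：返回两个编码向量的余弦值，作为相似度。"""
    h1, h2 = encode(x1), encode(x2)
    return float(np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2)))
```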
系统在初期可使用实施例一的方法计算相似度,在逐步积累大量语料后,过渡到实施例二或实施例三的方法,在设备性能足够的情况下,也可以同时使用三种相似度,最后综合考虑,采用投票机制或其它算法来决策最终的相似度。
步骤S28、判断当前意图节点是否为根节点,如果不是,执行步骤S29,如果是,执行步骤S30。
步骤S29、获取当前意图节点的上一级节点的意图节点标签。根据意图树及当前意图节点所在的位置,可以确定从所述当前意图节点开始,到根节点结束的节点分支。因而,在本步骤中,从该节点分支中获得当前意图节点的上一级节点的意图节点标签。
返回步骤S21，将当前意图节点的上一级节点的意图节点标签赋值给意图分类器输入信息中的意图节点标签；重复执行步骤S21-S27，得到又一个预测意图的最大相似度。而后，再由步骤S28判断是否结束循环处理过程，在根据所述根节点的意图节点标签得到预测意图及其最大相似度时，停止所述循环处理过程。通过前述的循环处理过程得到多个预测意图的最大相似度。
步骤S30、比较所述多个预测意图的最大相似度的分值,将分值最大的最大相似度确定为全局最大相似度。
步骤S31、判断所述全局最大相似度是否大于或等于第一阈值，如果所述全局最大相似度大于或等于第一阈值，说明此时的预测意图可以认定为用户的真实意图，因而在步骤S32确定所述全局最大相似度对应的预测意图符合用户意图，将其确定为最佳意图。
如果所述全局最大相似度小于所述第一阈值，说明在本系统中能够得到的与用户真实意图最接近的意图仍然不能代表用户的真实意图，说明此时用户输入的信息与本系统提供的服务不相符，在这种情况下，有两种处理办法，如图3所示，在步骤S33，将所述用户文本信息和要求给予输出信息的交互请求发送给第三方系统。第三方系统接收到该交互请求和所述用户文本信息后进行相关处理，将得到的输出信息发送给本系统，本系统接收第三方系统返回的输出信息，并在步骤S4输出所述输出信息。另外一种处理方式，即是在此种情况下输出特定的输出信息，例如“请重新输入”等信息。
在图3所示的方法中，在计算预测意图的最大相似度时，沿从系统存储的当前意图节点开始、到根节点结束的节点分支，逐级计算由每一个意图节点获取的预测意图的最大相似度。在计算完一个意图节点之后，判断当前计算的意图节点是否为根节点，如果是根节点，说明已经计算完根据节点分支上的所有意图节点而得到的预测意图的最大相似度，如果不是根节点，则继续取下一个意图节点来计算。本领域的技术人员可以得知，这种计算流程也可以有相应的变化，例如，获取预测意图的过程与计算预测意图的最大相似度的过程可以分开或并行完成。具体采用哪种方式，本领域的技术人员可以根据系统要求及具体的软、硬件条件灵活采用前述的任一种方式。
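为帮助理解图3所示的回溯流程，下面给出一个简化的Python示意。其中classify_intent、compute_similarity、preset_inputs_of等接口以及第一阈值取0.8均为示例假设，节点结构可参考前文IntentNode的示意：

```python
# 图3流程的简化示意：沿当前意图节点到根节点的分支逐级预测意图并取全局最大相似度。
# classify_intent(用户文本, 意图节点标签) -> 预测意图标签
# compute_similarity(句子A, 句子B)       -> [0,1] 的相似度
# preset_inputs_of(意图标签)             -> 该意图对应的预置问句列表

FIRST_THRESHOLD = 0.8     # 第一阈值（示例取值）


def find_best_intent(user_text, current_node, classify_intent,
                     compute_similarity, preset_inputs_of):
    """返回 (最佳意图, 全局最大相似度)；未达到第一阈值时最佳意图为 None。"""
    best_intent, global_max = None, -1.0
    node = current_node
    while node is not None:                            # 回溯意图树
        predicted = classify_intent(user_text, node.label)
        # 计算用户文本与该预测意图各预置问句的相似度，取最大值
        max_sim = max((compute_similarity(user_text, q)
                       for q in preset_inputs_of(predicted)), default=0.0)
        if max_sim > global_max:
            best_intent, global_max = predicted, max_sim
        node = node.parent                             # 取上一级节点，直到根节点
    if global_max >= FIRST_THRESHOLD:
        return best_intent, global_max                 # 符合用户意图
    return None, global_max                            # 需输出特定信息或转第三方系统
```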
在得到预测意图后,验证所述的预测意图是否符合用户意图的方法也可以采用另外一种,如图4所示,为本申请的确定最佳意图的另一种方法流程图。
图4所示方法的前几个步骤与图3中所述的方法相同,如:
步骤S21a、对意图分类器输入信息中的意图节点标签进行赋值,即将所述意图分类器输入信息中的意图节点标签设置为当前意图节点标签。
步骤S22a、将所述的用户文本信息和当前意图节点标签合并为新的文本信息。
步骤S23a、对所述新的文本信息进行分词处理。
步骤S24a、将分词后的文本向量化。
步骤S25a、以所述词向量作为意图分类器的输入,得到一个预测意图。
步骤S26a、查找意图与预置输入信息的对照表,得到与所述预测意图对应的预置输入信息;
步骤S27a、计算用户文本信息与对应的预置输入信息的相似度,得到对应所述预测意图的最大相似度;
以下是与图3所示方法不同的步骤:
步骤S28a、判断所述预测意图的最大相似度是否大于或等于第二阈值，如果所述预测意图的最大相似度大于或等于第二阈值，则在步骤S29a中确定所述预测意图符合用户意图，将其设定为最佳意图。如果所述预测意图的最大相似度小于所述第二阈值，在步骤S30a中，判断意图分类器输入信息中的意图节点标签所对应的节点是否为根节点，如果不是，执行步骤S31a；如果是根节点，执行步骤S32a。
步骤S31a、获取当前意图节点的上一级节点的意图节点标签。返回步骤S21a,将当前意图节点的上一级节点的意图节点标签赋值给意图分类器输入信息中意图节点标签,重复上述步骤。
步骤S32a、根据已计算得到的多个预测意图的最大相似度,从中得到分值最大的相似度,即全局最大相似度;
步骤S33a、判断所述全局最大相似度是否大于或等于第一阈值,如果大于或等于第一阈值,在步骤S34a,将所述全局最大相似度对应的预测意图确定为最佳意图;如果全局最大相似度小于所述第一阈值,在步骤S35a,向第三方发送请求或获取特定输出信息。
在本申请中,在获得了某一预测意图的最大相似度之后,通过与预置的第二阈值相比较,来判断所述预测意图是否符合用户意图。与图3所示的方法相比,不需要回溯意图树、获取全局最大相似度再判断是否符合用户意图,因而可以提高响应速度。
在本申请的系统数据库中,存储有相似度计算时使用的阈值这一数据,如果两个句子的相似度计算值达到了这个阈值,说明这两个句子是一样的,或者可以基本认为是同样的句子,此时,便可以确定当前预测意图即是用户输入该信息的真实意图,从而将当前的预测意图确定为最佳意图。当相似度的计算值小于所述阈值,说明两个句子的差别较大,用户输入该信息的意图与目前预测的意图不同。
在本申请实施例中设置了两个阈值,即第一阈值和第二阈值。对于一个较佳的实施例,第二阈值大于第一阈值,即系统不需回溯意图树,通过分值较大的第二阈值来确定是否已得到用户意图,从而加快处理速度,提高系统的响应速度。由于第二阈值较高分值的设置,提高了两句话相匹配的相关度,但是在当前最大相似度小于第二阈值时,不利于判断此时的两句话是否是同一话题,是否需要转到第三方系统。因而在全局最大相似度小于第二阈值时,需要与第一阈值比较,用于确定是否需要转到第三方系统。
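下面给出图4所示“逐级比较、达到第二阈值即停止回溯”流程的简化Python示意，接口约定与前一示意相同，两个阈值的取值仅为示例假设：

```python
# 图4流程的简化示意：逐级计算最大相似度，一旦达到第二阈值即停止回溯。

FIRST_THRESHOLD = 0.8      # 第一阈值（示例取值）
SECOND_THRESHOLD = 1.0     # 第二阈值（示例取值，高于第一阈值）


def find_best_intent_early_stop(user_text, current_node, classify_intent,
                                compute_similarity, preset_inputs_of):
    """达到第二阈值即确定最佳意图；否则回溯到根节点后退回第一阈值判断。"""
    scores = []                                        # 记录各级预测意图的最大相似度
    node = current_node
    while node is not None:
        predicted = classify_intent(user_text, node.label)
        max_sim = max((compute_similarity(user_text, q)
                       for q in preset_inputs_of(predicted)), default=0.0)
        if max_sim >= SECOND_THRESHOLD:
            return predicted, max_sim                  # 不再回溯，直接确定最佳意图
        scores.append((predicted, max_sim))
        node = node.parent
    predicted, global_max = max(scores, key=lambda t: t[1])
    if global_max >= FIRST_THRESHOLD:
        return predicted, global_max
    return None, global_max                            # 转第三方系统或输出特定信息
```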
关于第三方系统,第三方系统和本系统可以串行也可以并行,串行即本系统在未找到最佳意图时再请求第三方系统。并行即同时请求,本系统在执行确定最佳意图的步骤时,同时请求第三方系统,从而节省了时间。
使用者也可以根据实际需要和成本考量,不连接第三方系统,当本系统未找到最佳意图时,直接输出特定的信息,如“您的问题我还不知道怎么回答”、“请重新输入”等。
本申请的系统数据库中存储有意图与输出信息的对照表,因而,在步骤3中,根据步骤2中得到的最佳意图查询所述对照表,便可以确定输出信息,从而在步骤4中输出所述的输出信息。
关于对照表中的所述的输出信息,在一个较佳实施例中为文本信息。根据输出格式的需求,例如一些非机器人平台,具有显示界面等设备时,输出文本信息。也可以输出语音信息,即在输出前,将所述文本信息转成语音信息,例如通过tts转为语音信息后播放。
意图与输出信息的对照表可以为一一对应关系，也可以是一对多的对应关系，即一个意图可以对应多个输出信息，此时，可随机选取一个输出信息。
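下面给出意图与输出信息对照表为一对多时随机选取一条输出信息的简化Python示意，表中内容为与本文示例一致的假设数据：

```python
# 意图与输出信息对照表的查询示意：一对多时随机选取一个输出信息。
import random

intent_to_outputs = {
    "贷款": ["您要贷款多少", "请问您需要哪种贷款业务"],
    "银行卡取款2万以下": ["请您到自助取款机取款"],
}


def lookup_output(best_intent: str) -> str:
    """根据最佳意图查询对照表；查不到时返回一条特定输出信息。"""
    outputs = intent_to_outputs.get(best_intent, ["您的问题我还不知道怎么回答"])
    return random.choice(outputs)       # 一对多时随机选取一个输出信息


print(lookup_output("贷款"))
```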
为了为前述的人机交互方法提供数据支持,本申请需要进行大量的语料训练。参照以下实施例具体说明语料训练的过程:
步骤S1b、收集输入问句及输出答案的对应信息。
步骤S2b、为每一对输入问句及输出答案的对应信息标注意图及对应的上级意图。经过步骤S1b和步骤S2b,得到如表1所示的数据。
表1:
（表1为输入问句、输出答案及其标注的意图与上级意图的对应示例，原文以图片形式给出。）
步骤S3b、根据标注的意图及对应的上级意图生成意图树。例如,如图2所示的意图树。
步骤S4b、将每一个输入问句信息与意图节点标签合并为新的文本信息。
步骤S5b、以所述新的文本信息的词向量作为意图分类器的输入,所述意图分类器为前述方法中的意图分类模型,通过意图分类器得到一个预测意图。对应关系如下表2所示:
表2
意图+输入(问题) → 预测意图
我需要取款 → 取款
取款一万 → 取款两万以下
取款两万以下银行卡 → 银行卡取款两万以下
经过上述语料训练方法,可以不断扩充语料,为本申请所述人机交互提供充分、丰富的语料内容。
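下面给出由标注语料构造“意图节点标签+输入问句→标注意图”训练样本的简化Python示意。此处假设以上级意图标签与问句合并作为分类器输入，语料内容为与表1、表2一致的少量示例：

```python
# 训练样本构造示意：将上级意图标签与输入问句合并作为意图分类器的训练输入。
labeled_corpus = [
    # (输入问句, 标注意图, 上级意图)
    ("我需要取款", "取款", "Root"),
    ("取款一万", "取款2万以下", "取款"),
    ("取款两万以下银行卡", "银行卡取款2万以下", "取款2万以下"),
]


def build_training_samples(corpus):
    """把每条语料的上级意图标签与问句拼接，训练目标为该条语料标注的意图。"""
    samples = []
    for question, intent, parent_intent in corpus:
        merged_text = parent_intent + question     # “意图+输入(问题)”
        samples.append((merged_text, intent))      # 训练目标为标注意图
    return samples


for x, y in build_training_samples(labeled_corpus):
    print(x, "->", y)
```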
本申请还提供了一种人机交互系统，其原理框图如图5所示。所述系统包括语音识别模块1、最佳意图确定模块2、查询模块3和输出模块4。其中，所述语音识别模块1接收用户输入的语音信息，并将用户的语音输入信息识别为对应的用户文本信息。所述最佳意图确定模块2与所述语音识别模块1和数据库相连接，数据库中存储有意图树及意图节点标签，所述最佳意图确定模块2从所述数据库中取得意图节点标签，并根据所述语音识别模块1得到的用户文本信息，基于意图树的节点群，利用意图分类器确定最佳意图；在得到最佳意图后，将所述最佳意图发送给所述查询模块3。所述查询模块3根据所述最佳意图，在数据库中查询意图与输出信息的对照表，从而得到对应的输出信息，并将所述输出信息发送给所述的输出模块4。输出模块4得到所述输出信息后，根据设定格式或用户要求的格式输出所述输出信息，例如，以文字、语音等方式输出。
其中,所述最佳意图确定模块2的原理框图具体如图6所示,其包括合并单元21、意图分类器22、验证单元23和确定单元24。其中,所述合并单元21分别与语音识别模块1和数据库相连接,分别获得用户文本信息和意图节点标签,将所述的用户文本信息和意图节点标签合并为新的文本信息,并将该新的文本信息发送给意图分类器22。意图分类器22以所述合并单元的合并信息为输入信息,得到预测意图。所述验证单元23,用于验证所述预测意图是否符合用户意图。所述确定单元24用于将符合用户意图的预测意图确定为最佳意图。
其中，所述验证单元23包括：查找子单元231、相似度计算子单元232和阈值比较子单元233。其中，查找子单元231用于根据意图分类器输出的预测意图，查找意图与预置输入信息的对照表，得到与预测意图对应的预置输入信息；所述相似度计算子单元232用于计算用户文本信息与对应的预置输入信息的相似度，获得对应预测意图的最大相似度。阈值比较子单元233用于比较所述最大相似度与阈值的大小，并将比较结果发送给所述确定单元。
根据不同的数据处理流程,上述各个单元、子单元结合成不同的结构,如图7所示,为最佳意图确定模块2实施例一的原理框图。
所述合并单元21a分别与语音识别模块1和数据库相连接,分别获得用户文本信息和当前意图节点标签,将所述的用户文本信息和当前意图节点标签合并为新的文本信息,并将该新的文本信息发送给意图分类器22a。所述合并单元还包括通知接收接口,用于接收合并通知,从而进行用户文本信息和新的意图节点标签的合并。
意图分类器22a以所述合并单元21a的合并信息为输入信息,得到预测意图。
根据意图分类器22a输出的预测意图,查找子单元231a查找意图与预置输入信息的对照表,得到与每一预测意图对应的预置输入信息。
所述相似度计算子单元232a计算用户文本信息与对应的预置输入信息的相似度,获得对应预测意图的最大相似度。所述相似度计算子单元232a包括通知输出接口,在计算完一个预测意图的最大相似度后,向所述合并单元21a发送合并通知。
所述合并单元21a通过通知接收接口接收合并通知，从而从数据库中的意图树中取得当前意图节点的上一级节点的意图节点标签，进行新的合并，并把合并后的信息发送给意图分类器22a。
其中,也可以由意图分类器22a在得到一个预测意图后向所述合并单元21a发送合并通知,此时则不需要所述相似度计算子单元232a来发送合并通知。
经过多次循环计算,将意图树回溯到根节点后,停止回溯,所述相似度计算子单元232a从多个预测意图的最大相似度中得出分值最大的全局最大相似度,并将其发送给阈值比较单元233a。
阈值比较单元233a接收所述全局最大相似度,并从数据库中取得第一阈值,比较全局最大相似度和第一阈值的大小,如果全局最大相似度大于或等于第一阈值,则向确定单元24a发送通知,确定单元24a将与所述全局最大相似度对应的预测意图确定为最佳意图。如果全局最大相似度小于第一阈值,则通过第三方接口模块5向第三方系统6发送所述用户文本信息和交互请求。
第三方系统6根据所述交互请求和所述用户文本信息进行处理,并将处理后得到的回复信息(即应该回复用户的输出信息)发送给本系统。所述第三方接口模块5接收第三方系统6返回的输出信息,将所述输出信息发送给所述输出模块4,所述输出模块4输出该信息。
本申请提供第三方接口模块5与第三方系统6相连接，是为了解决向用户提供非本系统服务内容的回复。在实际应用中，用户输入的问句信息有时并不是本系统可以解决的内容，例如，在银行系统的交互系统中，用户问了其他领域的问题，如“燃气费用是多少”。在处理这类问题时，当本系统根据用户输入信息预测的意图已回溯至意图树中的根意图节点，且计算得到的全局最大相似度仍然小于内部设定的第一阈值时，本系统可以判断此时用户输入的问句信息为本系统外的话题，所以将所述用户文本信息和交互请求发送给第三方系统，此时，由第三方系统处理所述问句信息，得到给用户的回复信息，第三方系统会将所述回复信息发送给本系统，由本系统的第三方接口模块5接收，并将所述回复信息发送给所述输出模块4，由所述输出模块4输出。因而，本系统不但可以就本系统领域的话题与用户进行交互，也可以在不同领域的话题之间切换，从而实现无话题障碍的交互，解答用户提出的各种问题。
如图8所示,为最佳意图确定模块2实施例二的原理框图。在本实施例中,其结构组成与最佳意图确定模块2实施例一相同,但是工作流程不同,具体如下:
所述合并单元21b分别获得用户文本信息和当前意图节点标签,将所述的用户文本信息和当前意图节点标签合并为新的文本信息,并将该新的文本 信息发送给意图分类器22b。所述合并单元21b还包括通知接收接口,用于接收合并通知,从而进行新的合并。
意图分类器22b以所述合并单元21b的合并信息为输入信息,得到预测意图。
查找子单元231b用于根据意图分类器22b输出的预测意图，查找意图与预置输入信息的对照表，得到与所述预测意图对应的预置输入信息。
所述相似度计算子单元232b计算用户文本信息与对应的预置输入信息的相似度,获得对应预测意图的最大相似度,并将其发送给阈值比较单元233b。
阈值比较单元233b接收所述最大相似度,并从数据库中取得第二阈值,比较所述最大相似度和第二阈值的大小,如果最大相似度大于或等于第二阈值,则向确定单元24b发送通知,确定单元24b将与所述最大相似度对应的预测意图确定为最佳意图。如果所述最大相似度小于第二阈值,则通过通知接口向所述合并单元21b发送合并通知。
所述合并单元21b根据该接口接收的合并通知，从数据库中的意图树中取得当前意图节点的上一级节点的意图节点标签，进行新的合并，并把合并后的信息发送给意图分类器22b。
意图分类器22b根据新的输入信息得到另一个预测意图。各部件的工作流程如上所述。直到当阈值比较单元233b通过比较得到根据根意图节点得到的预测意图的最大相似度也小于第二阈值时,向相似度计算单元232b发送通知,要求其提供全局最大相似度。相似度计算单元232b从所有的预测意图的最大相似度得到全局最大相似度,并将其发送给阈值比较单元233b。
阈值比较单元233b接收所述全局最大相似度,并从数据库中取得第一阈值,比较全局最大相似度和第一阈值的大小,如果全局最大相似度大于或等于第一阈值,则向确定单元24b发送通知,确定单元24b将与所述全局最大相似度对应的预测意图确定为最佳意图。如果全局最大相似度小于第一阈值时,则通过第三方接口模块5向第三方系统6发送所述用户文本信息和交互请求。
在以上实施例中,意图分类器22、22a、22b包括有文本向量化单元,用于将合并单元21、21a、21b发送的用户输入文本信息和意图节点标签的合并的新的文本信息进行分词和文本向量化处理,得到的词向量作为意图分类器22、22a、22b的输入。在另外的实施例中,最佳意图确定模块2包括所述的文本向量化单元,即与意图分类器22、22a、22b分开独立设置,便于模块化设计和维护。
为了使计算过程简便、信息读取迅速，本系统还包括当前意图节点维护模块7，如图9所示，用于在确定了最佳意图后，将最佳意图节点路径中路径长度最大的最佳意图节点设置为当前意图节点，即将最佳意图变更为当前意图，从而在系统下一次的交互过程开始时可以快速得到当前意图节点。在经过回溯意图树仍然没有得到最佳意图时，保留原来的当前意图节点不变。
在如图2所示的意图树中，由于没有相同标签的意图节点，因而当前意图节点维护模块7直接将最佳意图节点设置为当前意图节点。然而，有时意图树中会有多个相同标签的意图节点，如图10所示。此时，如果存在两个或多个具有相同标签的最佳意图节点，需要将路径最长、层级最深的那个节点设定为当前意图节点。具体地，首先在所述意图树的节点标签集中搜索所述最佳意图节点标签，得到对应最佳意图的节点路径；当对应最佳意图的节点路径为多个时，将路径长度最大的最佳意图节点确定为当前意图节点。
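下面给出“同名最佳意图节点取路径最长者作为当前意图节点”的简化Python示意，沿用前文IntentNode的节点结构（branch_to_root返回从该节点到根节点的分支）：

```python
# 从标签相同的多个最佳意图节点中，选取到根节点路径最长（层级最深）者作为当前意图节点。

def set_current_node(all_nodes, best_intent_label):
    """在节点标签集中搜索最佳意图节点标签，路径最长者作为当前意图节点。"""
    candidates = [n for n in all_nodes if n.label == best_intent_label]
    if not candidates:
        return None                               # 未找到对应节点，保留原当前意图节点
    return max(candidates, key=lambda n: len(n.branch_to_root()))
```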
以下通过具体的应用实施例对本申请进行说明。
应用实施例一
结合图11所示的意图树及其回溯过程示意图，简要说明人机交互过程。在本实施例中，系统中的当前意图为“银行卡取款2万以下”，因而，从当前意图节点到根意图节点的节点分支中，按照从低到高（根意图节点处于最高级别）的级别，依次包括“银行卡取款2万以下”、“取款2万以下”、“取款”、“Root”。当用户采用语音输入“我要贷款”时，获得输出信息的过程简要说明如下：
步骤S1、识别:系统将用户的语音输入“我要贷款”识别为文本信息。
步骤S2、合并1:“银行卡取款2万以下”+“我要贷款”。
步骤S3、文本向量化:分词,并向量化。
步骤S4、预测意图:采用意图分类器得到预测意图“采用银行卡取款2万以下”。
步骤S5、计算相似度,得到最大相似度得分:0.485366550785。
步骤S6、合并2:“取款2万以下”+“我要贷款”。
步骤S7、文本向量化:分词,并向量化。
步骤S8、预测意图:采用意图分类器得到预测意图“取款”。
步骤S9、计算相似度,得到最大相似度得分:0.577754257751。
步骤S10、合并3:“取款”+“我要贷款”。
步骤S11、文本向量化:分词,并向量化。
步骤S12、预测意图:采用意图分类器得到预测意图“取款”。
步骤S13、计算相似度,得到最大相似度得分:0.353053754796。
步骤S14、合并4：“ROOT(空)”+“我要贷款”。
步骤S15、文本向量化:分词,并向量化。
步骤S16、预测意图:采用意图分类器得到预测意图“贷款”。
步骤S17、计算相似度,得到最大相似度得分:1.0。
步骤S18、从前述多个最大相似度中得到分值最大的全局最大相似度:1.0。
步骤S19、将所述全局最大相似度得分1.0与设定的第一阈值0.8进行比较,全局最大相似度大于所述第一阈值。
步骤S20、确定与所述全局最大相似度对应的预测意图“贷款”为最佳意图。
步骤S21、选取与“贷款”对应的输出信息“您要贷款多少”,并更新系统中的当前意图的节点为“贷款”。
步骤S22、向用户输出“您要贷款多少”的语音信息。
在本应用实施例中，通过回溯意图树，遍历了所述分支中的每一个节点，因而提高了确定最佳意图的准确性。
应用实施例二
在当前意图为“银行卡取款2万以下”,用户输入信息为“银行卡取款2万以下”时,其处理过程简要说明如下:
步骤S1、合并:“银行卡取款2万以下”+“银行卡取款2万以下”。
步骤S2、预测意图:“银行卡取款2万以下”。
步骤S3、计算最大相似度:1。
步骤S4、将所述最大相似度得分1.0与设定的第二阈值1.0进行比较,所述最大相似度等于所述第二阈值。
步骤S5、确定与所述最大相似度对应的预测意图“银行卡取款2万以下”为最佳意图。
步骤S6、选取与“银行卡取款2万以下”对应的输出信息“请您到自助取款机取款”,并更新系统中的当前意图的节点为“银行卡取款2万以下”。
步骤S7、向用户输出“请您到自助取款机取款”的语音信息。
在本实施例中,在找到了最大相似度大于第二阈值后便不再遍历意图树,从而节省了处理时间,提高了系统对用户的响应速度。
综上所述，本申请为用户意图明确、事务的完成有很清晰步骤的垂直领域，如银行、法院、医院等，提供了一种操作简单、系统处理快速、响应准确的人机交互方法和系统。本文采用意图树回溯机制，只需要标注垂直领域内语料的意图及上级意图，无需其它通用语料的标注，节省了大量的标注处理时间。在具体实现过程中，只需要用分类器预测意图，并采用回溯机制寻找最优的节点便可以得到准确的输出信息。能够实现业务内同一主题下交互和不同主题下的话题切换和交互，通过与第三方系统的通信，能够实现业务领域外话题的切换，为用户提供不同话题的回复信息。
本领域内的技术人员应明白，本发明的实施例可提供为方法、系统、或计算机程序产品。因此，本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质（包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体，可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存（PRAM）、静态随机存取存储器（SRAM）、动态随机存取存储器（DRAM）、其他类型的随机存取存储器（RAM）、只读存储器（ROM）、电可擦除可编程只读存储器（EEPROM）、快闪记忆体或其他内存技术、只读光盘只读存储器（CD-ROM）、数字多功能光盘（DVD）或其他光学存储、磁盒式磁带、磁带磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体（transitory media），如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。
本领域技术人员应明白,本申请的实施例可提供为方法、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
以上所述仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (16)

  1. 一种人机交互方法,其特征在于,包括:
    将用户的语音输入信息识别为用户文本信息;
    根据所述用户文本信息和意图节点标签,基于意图树节点群,通过意图分类器及对应的数据处理确定最佳意图;
    根据所述最佳意图,查询意图与输出信息的对照表,得到对应的输出信息;和
    输出所述输出信息。
  2. 如权利要求1所述人机交互方法,其特征在于,所述根据所述用户文本信息和意图节点标签,基于意图树节点群,通过意图分类器及对应的数据处理确定最佳意图的步骤具体包括:
    获取当前意图节点标签;
    从所述意图树节点群中确定从当前意图节点到根意图节点的节点分支;
    将所述用户文本信息和意图节点标签合并为意图分类器的输入信息;
    通过将意图分类器输入信息中的意图节点标签替换为所述节点分支中的意图节点标签,利用所述意图分类器得到对应的预测意图;和
    验证所述预测意图是否符合用户意图,将符合用户意图的预测意图确定为最佳意图。
  3. 如权利要求2所述人机交互方法,其特征在于,其中,
    将意图分类器输入信息中的意图节点标签替换为所述节点分支中的意图节点标签时,从当前意图节点开始,到根意图节点结束,分别用每一个节点的意图节点标签替换意图分类器输入信息中的意图节点标签,得到多个对应的预测意图;
    验证所述预测意图是否符合用户意图的步骤包括:
    查找意图与预置输入信息的对照表，得到与每一预测意图对应的预置输入信息；
    计算用户文本信息与对应的预置输入信息的相似度,获得对应每一预测意图的最大相似度;
    比较所述多个预测意图的最大相似度的分值大小,将分值最大的最大相似度确定为全局最大相似度;和
    比较所述全局最大相似度和第一阈值的大小,如果所述全局最大相似度大于或等于所述第一阈值,则确定与所述全局最大相似度对应的预测意图符合用户意图。
  4. 如权利要求3所述人机交互方法,其特征在于,如果所述全局最大相似度小于所述第一阈值,或者:
    获取对应的特定输出信息,并输出所述特定输出信息;
    或者:
    向第三方系统发送交互请求;接收第三方系统返回的第三方交互输出信息;和输出所述第三方交互输出信息。
  5. 如权利要求2所述人机交互方法,其特征在于,其中,
    将意图分类器输入信息中意图节点标签替换为所述节点分支中的意图节点标签时,从当前意图节点开始,用当前意图节点标签替换意图分类器输入信息中的意图节点标签,得到对应的预测意图;
    验证所述预测意图是否符合用户意图的步骤包括:
    查找意图与预置输入信息的对照表,得到与所述预测意图对应的预置输入信息;
    计算用户文本信息与对应的预置输入信息的相似度,得到对应所述预测意图的最大相似度;和
    比较所述预测意图的最大相似度和第二阈值的大小，如果所述预测意图的最大相似度大于或等于第二阈值，则确定所述预测意图符合用户意图；如果所述预测意图的最大相似度小于所述第二阈值，将意图分类器输入信息中意图节点标签替换为所述节点分支中当前意图节点的上一级节点的意图节点标签，重复上述步骤。
  6. 如权利要求5所述人机交互方法,其特征在于,当将意图分类器输入信息中意图节点标签替换为根意图节点标签时得到的对应最大相似度小于所述第二阈值时,根据已计算得到的多个预测意图的最大相似度,比较所述多个预测意图的最大相似度的分值大小,将分值最大的最大相似度确定为全局最大相似度;和
    比较所述全局最大相似度和第一阈值的大小,如果所述全局最大相似度大于或等于所述第一阈值,则确定与所述全局最大相似度对应的预测意图符合用户意图;
    如果所述全局最大相似度小于所述第一阈值,或者:获取对应的特定输出信息;并输出所述特定输出信息;或者:向第三方系统发送交互请求;接收第三方系统返回的第三方交互输出信息;和向用户输出所述第三方交互输出信息。
  7. 如权利要求2所述人机交互方法,其特征在于,将符合用户意图的预测意图确定为最佳意图后,还包括:
    将所述最佳意图对应的节点确定为当前意图节点。
  8. 如权利要求7所述人机交互方法,其特征在于,将所述最佳意图对应的节点确定为当前意图节点的步骤包括:
    在所述意图树的节点标签集中搜索所述最佳意图节点标签,得到最佳意图节点路径;和
    当最佳意图节点路径为多个时,将路径长度最大的最佳意图节点确定为当前意图节点。
  9. 如权利要求2-8任一所述人机交互方法,其特征在于,将所述用户文本信息和意图节点标签合并为意图分类器的输入信息的步骤具体包括:
    将所述的用户文本信息和意图节点标签合并为新的文本信息;
    对所述新的文本信息进行分词和文本向量化处理,得到对应的词向量;
    将所述词向量作为意图分类器的输入信息。
  10. 如权利要求1-8任一所述人机交互方法,其特征在于,所述意图分类器为卷积神经网络模型或循环神经网络模型。
  11. 一种人机交互系统,包括:
    语音识别模块,用于将用户的语音输入信息识别为用户文本信息,其特征在于,还包括:
    最佳意图确定模块,用于根据所述用户文本信息和意图节点标签,基于意图树的节点群,通过意图分类器及对应的数据处理确定最佳意图;
    查询模块,用于根据所述最佳意图,查询意图与输出信息的对照表,得到对应的输出信息;和
    输出模块,用于输出所述输出信息。
  12. 如权利要求11所述人机交互系统,其特征在于,所述最佳意图确定模块包括:
    合并单元,用于合并所述用户文本信息和意图节点标签;
    意图分类器,用于以所述合并单元的合并信息为输入信息,得到预测意图;
    验证单元,用于验证所述预测意图是否符合用户意图;和
    确定单元,用于将符合用户意图的预测意图确定为最佳意图。
  13. 如权利要求12所述人机交互系统,其特征在于,所述验证单元包括:
    查找子单元,用于根据意图分类器输出的预测意图,查找意图与预置输入信息的对照表,得到与每一预测意图对应的预置输入信息;
    相似度计算子单元,用于计算用户文本信息与对应的预置输入信息的相似度,获得对应预测意图的最大相似度;和
    阈值比较子单元,用于比较所述最大相似度与阈值的大小,并将比较结果发送给所述确定单元。
  14. 如权利要求13所述人机交互系统,其特征在于,所述合并单元包括通知接收接口,用于接收合并通知;
    对应地,所述意图分类器包括通知输出接口,用于向所述合并单元发送合并通知;或者
    所述相似度计算子单元包括通知输出接口,用于向所述合并单元发送合并通知;或者
    所述阈值比较子单元包括通知输出接口,用于向所述合并单元发送合并通知。
  15. 如权利要求11-14任一所述人机交互系统,其特征在于,还包括当前意图节点维护模块,用于在最大相似度小于阈值时,保留当前意图节点;在确定了最佳意图时,将最佳意图节点路径中路径长度最大的最佳意图节点确定为当前意图节点。
  16. 如权利要求11-14任一所述人机交互系统,其特征在于,还包括:
    第三方接口模块,与所述最佳意图确定模块相连接,用于在所述最佳意图确定模块确定没有最佳意图时,将所述用户文本信息和交互请求发送给第三方系统,并接收第三方系统返回的输出信息,将所述输出信息发送给所述输出模块。
PCT/CN2018/107893 2017-10-31 2018-09-27 人机交互方法和系统 WO2019085697A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711054329.6 2017-10-31
CN201711054329.6A CN109726387A (zh) 2017-10-31 2017-10-31 人机交互方法和系统

Publications (1)

Publication Number Publication Date
WO2019085697A1 true WO2019085697A1 (zh) 2019-05-09

Family

ID=66294418

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107893 WO2019085697A1 (zh) 2017-10-31 2018-09-27 人机交互方法和系统

Country Status (2)

Country Link
CN (1) CN109726387A (zh)
WO (1) WO2019085697A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400340A (zh) * 2020-03-12 2020-07-10 杭州城市大数据运营有限公司 一种自然语言处理方法、装置、计算机设备和存储介质
CN112905893A (zh) * 2021-03-22 2021-06-04 北京百度网讯科技有限公司 搜索意图识别模型的训练方法、搜索意图识别方法及装置

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188199A (zh) * 2019-05-21 2019-08-30 北京鸿联九五信息产业有限公司 一种用于智能语音交互的文本分类方法
CN110175228B (zh) * 2019-05-27 2023-08-15 苏州课得乐教育科技有限公司 基于基础模块和机器学习的循环嵌入对话训练方法及系统
CN110995945B (zh) * 2019-11-29 2021-05-25 中国银行股份有限公司 一种生成外呼流程的数据处理方法、装置、设备及系统
CN111080927A (zh) * 2019-12-17 2020-04-28 中国建设银行股份有限公司 一种用于封闭式自助金融设备的软件架构方法及装置
CN111930921B (zh) * 2020-10-10 2021-01-22 南京福佑在线电子商务有限公司 意图预测的方法及装置
CN112417110A (zh) * 2020-10-27 2021-02-26 联想(北京)有限公司 一种信息处理方法及装置
CN112905765B (zh) * 2021-02-09 2024-06-18 联想(北京)有限公司 一种信息处理方法及装置
CN113126765A (zh) * 2021-04-22 2021-07-16 北京云迹科技有限公司 一种多模态输入交互方法、装置、机器人和存储介质
CN116052081A (zh) * 2023-01-10 2023-05-02 山东高速建设管理集团有限公司 一种场地安全实时监测方法、系统、电子设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598445A (zh) * 2013-11-01 2015-05-06 腾讯科技(深圳)有限公司 自动问答系统和方法
CN106202476A (zh) * 2016-07-14 2016-12-07 广州安望信息科技有限公司 一种基于知识图谱的人机对话的方法及装置
CN106227740A (zh) * 2016-07-12 2016-12-14 北京光年无限科技有限公司 一种面向对话系统的数据处理方法及装置
CN106326307A (zh) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 一种语言交互方法
CN106383875A (zh) * 2016-09-09 2017-02-08 北京百度网讯科技有限公司 基于人工智能的人机交互方法和装置
CN106528694A (zh) * 2016-10-31 2017-03-22 百度在线网络技术(北京)有限公司 基于人工智能的语义判定处理方法和装置
CN107146610A (zh) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 一种用户意图的确定方法及装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103533186B (zh) * 2013-09-23 2016-03-02 安徽科大讯飞信息科技股份有限公司 一种基于语音呼叫的业务流程实现方法及系统
JP6178208B2 (ja) * 2013-10-28 2017-08-09 株式会社Nttドコモ 質問分野判定装置及び質問分野判定方法
CN105786798B (zh) * 2016-02-25 2018-11-02 上海交通大学 一种人机交互中自然语言意图理解方法
CN107273477A (zh) * 2017-06-09 2017-10-20 北京光年无限科技有限公司 一种用于机器人的人机交互方法及装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598445A (zh) * 2013-11-01 2015-05-06 腾讯科技(深圳)有限公司 自动问答系统和方法
CN106326307A (zh) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 一种语言交互方法
CN106227740A (zh) * 2016-07-12 2016-12-14 北京光年无限科技有限公司 一种面向对话系统的数据处理方法及装置
CN106202476A (zh) * 2016-07-14 2016-12-07 广州安望信息科技有限公司 一种基于知识图谱的人机对话的方法及装置
CN106383875A (zh) * 2016-09-09 2017-02-08 北京百度网讯科技有限公司 基于人工智能的人机交互方法和装置
CN106528694A (zh) * 2016-10-31 2017-03-22 百度在线网络技术(北京)有限公司 基于人工智能的语义判定处理方法和装置
CN107146610A (zh) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 一种用户意图的确定方法及装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400340A (zh) * 2020-03-12 2020-07-10 杭州城市大数据运营有限公司 一种自然语言处理方法、装置、计算机设备和存储介质
CN111400340B (zh) * 2020-03-12 2024-01-09 杭州城市大数据运营有限公司 一种自然语言处理方法、装置、计算机设备和存储介质
CN112905893A (zh) * 2021-03-22 2021-06-04 北京百度网讯科技有限公司 搜索意图识别模型的训练方法、搜索意图识别方法及装置
CN112905893B (zh) * 2021-03-22 2024-01-12 北京百度网讯科技有限公司 搜索意图识别模型的训练方法、搜索意图识别方法及装置

Also Published As

Publication number Publication date
CN109726387A (zh) 2019-05-07

Similar Documents

Publication Publication Date Title
WO2019085697A1 (zh) 人机交互方法和系统
CN110377911B (zh) 对话框架下的意图识别方法和装置
WO2022078346A1 (zh) 文本意图识别方法、装置、电子设备及存储介质
CN109918673B (zh) 语义仲裁方法、装置、电子设备和计算机可读存储介质
Ren et al. Intention detection based on siamese neural network with triplet loss
WO2021208696A1 (zh) 用户意图分析方法、装置、电子设备及计算机存储介质
CN111708869B (zh) 人机对话的处理方法及装置
CN112328761B (zh) 一种意图标签设置方法、装置、计算机设备及存储介质
CN110633991A (zh) 风险识别方法、装置和电子设备
US10937417B2 (en) Systems and methods for automatically categorizing unstructured data and improving a machine learning-based dialogue system
CN113672718B (zh) 基于特征匹配和领域自适应的对话意图识别方法及系统
US12027159B2 (en) Automated generation of fine-grained call reasons from customer service call transcripts
CN113505601A (zh) 一种正负样本对构造方法、装置、计算机设备及存储介质
CN112446209A (zh) 一种意图标签的设置方法、设备、装置及存储介质
CN113076758B (zh) 一种面向任务型对话的多域请求式意图识别方法
CN114090792A (zh) 基于对比学习的文档关系抽取方法及其相关设备
CN113723077A (zh) 基于双向表征模型的句向量生成方法、装置及计算机设备
CN113342935A (zh) 语义识别方法、装置、电子设备及可读存储介质
Agarwal et al. Lidsnet: A lightweight on-device intent detection model using deep siamese network
Calvo-Zaragoza et al. Recognition of pen-based music notation with finite-state machines
CN112633381B (zh) 音频识别的方法及音频识别模型的训练方法
CN114637831A (zh) 基于语义分析的数据查询方法及其相关设备
CN114595329A (zh) 一种原型网络的少样本事件抽取系统及方法
CN110399984B (zh) 一种信息的预测方法、系统以及电子设备
CN114254622A (zh) 一种意图识别方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18872244

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18872244

Country of ref document: EP

Kind code of ref document: A1