CN106295807B

CN106295807B - A method and device for information processing

Info

Publication number: CN106295807B
Application number: CN201610710565.8A
Authority: CN
Inventors: 蔡宏伟; 朱频频
Original assignee: Guizhou Xiaoai Robot Technology Co ltd; Shanghai Xiaoi Robot Technology Co Ltd
Current assignee: Guizhou Xiaoai Robot Technology Co ltd; Shanghai Xiaoi Robot Technology Co Ltd
Priority date: 2016-08-23
Filing date: 2016-08-23
Publication date: 2018-12-21
Anticipated expiration: 2036-08-23
Also published as: CN109344237A; CN106295807A; CN108764480B; CN109344237B; CN108764480A

Abstract

The invention relates to the technical field of human-computer interaction, and in particular to a method and a device for processing information in human-computer interaction. The invention provides an information processing method comprising the following steps: providing a model sample library, wherein the model sample library comprises sample standard questions and sample extension questions corresponding to each sample standard question; providing a knowledge base, wherein the knowledge base comprises knowledge base standard questions and the knowledge base extension questions and answers corresponding to each knowledge base standard question, and the knowledge base is used for providing answers to user questions; determining whether the model sample library contains a sample extension question that matches a user question in a human-computer interaction log; if so, determining whether the standard question corresponding to the user question in the human-computer interaction log is the same as the sample standard question corresponding to the matched sample extension question; and if not, optimizing the knowledge base. The invention also provides an information processing device and system corresponding to the information processing method.

Description

A method and device for information processing

Technical field

The present invention relates to the technical field of human-computer interaction, and in particular to a method and device for information processing in human-computer interaction.

Background

Human-computer interaction is the study of the interaction between a system and its users. The system may be any of various machines, or a computerized system or piece of software. For example, human-computer interaction makes it possible to implement various artificial intelligence systems, such as intelligent customer service systems and voice control systems.

Artificial-intelligence semantic recognition is the basis of human-computer interaction: it recognizes human language and converts it into a language that machines can understand. To understand human language, a semantic recognition system needs a knowledge base. A knowledge learning system organizes massive heterogeneous data into knowledge and merges it into the existing knowledge system.

Artificial intelligence systems use semantic recognition technology to process the original question posed by a user. They determine the standard question corresponding to the original question, and then provide a corresponding answer based on that standard question and some limited information carried in the original question. The system records how each original question was handled in the form of a log. Each log entry includes the original question posed by the user (the user question), the standard question, and the answer.

Optimizing a knowledge base involves two important steps: picking out the interaction logs that need optimization, and optimizing the knowledge base for the selected logs.

In the prior art, interaction logs are selected mainly by manually collecting and sorting a library of correct logs and a library of meaningless logs, comparing these with the daily interaction logs, and filtering out log content that matches exactly. Every log requires manual comparison, which takes a large amount of manual labor. Moreover, when the knowledge base needs to be optimized, professional knowledge maintenance staff must write a standard question for every log that needs optimization, which is costly and inefficient.

Summary of the invention

The purpose of the present invention is to provide a method and device for information processing that overcome the following problem of the conventional technology: a large amount of manual labor is needed to select the interaction logs that need optimization. In addition, during information processing the system can automatically recommend standard questions, which further reduces manual labor and improves the efficiency of knowledge base optimization.

To this end, the present invention provides a method of information processing, comprising: providing a model sample library, the model sample library including sample standard questions and the sample extension questions corresponding to each sample standard question; providing a knowledge base, the knowledge base including knowledge base standard questions and the knowledge base extension questions and answers corresponding to each knowledge base standard question, the knowledge base being used to provide answers to user questions; determining whether the model sample library contains a sample extension question that matches a user question in a human-computer interaction log; if so, determining whether the standard question corresponding to the user question in the human-computer interaction log is the same as the sample standard question corresponding to the matched sample extension question; and if not, optimizing the knowledge base.

In one embodiment, the sample extension questions include knowledge base extension questions, and the sample standard questions include knowledge base standard questions.

In one embodiment, determining whether the model sample library contains a sample extension question that matches the user question comprises: performing a semantic similarity calculation between the user question and the sample extension questions, to determine whether the model sample library contains at least one sample extension question whose semantic similarity to the user question is greater than a first threshold.

In one embodiment, determining whether the standard question corresponding to the user question is the same as the sample standard question corresponding to the matched sample extension question comprises: comparing whether the texts of the two standard questions are exactly identical.

In one embodiment, if there is a sample extension question whose semantic similarity to the user question is greater than the first threshold and less than 100%, and the standard question corresponding to the user question is the same as the sample standard question corresponding to that sample extension question, the user question and its corresponding standard question are added to the model sample library in association with each other.

In one embodiment, if there are multiple matched sample extension questions, determining whether the standard question corresponding to the user question is the same as the sample standard question corresponding to the matched sample extension questions comprises: determining whether the sample standard question corresponding to any one of the matched sample extension questions is the same as the standard question corresponding to the user question.

In one embodiment, optimizing the knowledge base comprises: based on the result of the semantic similarity calculation, recommending the sample standard questions corresponding to the sample extension questions whose semantic similarity to the user question is greater than a second threshold; and adding the sample standard question manually selected from the recommended sample standard questions to the knowledge base in association with the user question.

In one embodiment, the method further comprises: adding the sample standard question manually selected from the recommended sample standard questions to the model sample library in association with the user question.

In one embodiment, if the model sample library contains no sample extension question that matches the user question, a knowledge point corresponding to the user question is created in the knowledge base, the knowledge point comprising a knowledge base standard question, knowledge base extension questions, and an answer.

In one embodiment, the method further comprises: adding the knowledge point created in the knowledge base to the model sample library at the same time.

In one embodiment, performing the semantic similarity calculation between the user question and a sample extension question comprises: segmenting the sample extension question into words and calculating its word and sentence vector values; segmenting the user question into words and calculating its word and sentence vector values; and calculating the correlation between the word and sentence vector values of the sample extension question and those of the user question, to obtain the semantic similarity between the user question and the sample extension question.

In one embodiment, before determining whether the model sample library contains a sample extension question that matches the user question, the method further comprises: preprocessing all user questions in the human-computer interaction log to filter out invalid data among them.
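The source does not define what counts as invalid data. As a minimal sketch, assuming that invalid entries are blank strings or strings made up only of punctuation or symbols, the preprocessing could look like the following; the function name `preprocess` and this notion of "invalid" are illustrative assumptions, not part of the disclosed method.

```python
import re


def preprocess(user_questions):
    """Drop entries that cannot carry a question.

    Assumption: "invalid" means blank or containing no letter, digit,
    or other word character (for example "???").
    """
    cleaned = []
    for question in user_questions:
        text = question.strip()
        if text and re.search(r"\w", text):
            cleaned.append(text)
    return cleaned
```

A production filter would likely also handle greetings, duplicates, or garbled input, depending on how the operator defines invalid data.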

The present invention also provides a device for information processing, comprising: a first analysis module, configured to determine whether a model sample library contains a sample extension question that matches a user question in a human-computer interaction log; a second analysis module, configured to determine, in response to such a matching sample extension question existing, whether the standard question corresponding to the user question in the human-computer interaction log is the same as the sample standard question corresponding to the matched sample extension question; and an optimization module, configured to optimize the knowledge base in response to the standard question corresponding to the user question not being the same as the sample standard question corresponding to the matched sample extension question.

In one embodiment, the first analysis module includes a semantic similarity calculation module, configured to perform a semantic similarity calculation between the user question and the sample extension questions, to determine whether the model sample library contains at least one sample extension question whose semantic similarity to the user question is greater than a first threshold.

In one embodiment, the second analysis module includes a comparison module, configured to compare whether the text of the standard question corresponding to the user question is exactly identical to the text of the sample standard question corresponding to the matched sample extension question.

In one embodiment, the second analysis module further includes an adding module, configured to add the user question and its corresponding standard question to the model sample library in association with each other, in response to there being a sample extension question whose semantic similarity to the user question is greater than the first threshold and less than 100%, and the standard question corresponding to the user question being the same as the sample standard question corresponding to that sample extension question.

In one embodiment, if there are multiple matched sample extension questions, the second analysis module determines whether the sample standard question corresponding to any one of the matched sample extension questions is the same as the standard question corresponding to the user question.

In one embodiment, the optimization module includes: a recommending module, configured to recommend, based on the result of the semantic similarity calculation, the sample standard questions corresponding to the sample extension questions whose semantic matching degree with the user question is greater than a second threshold; and an adding module, configured to add the standard question manually selected from the recommended sample standard questions to the knowledge base in association with the user question.

In one embodiment, the adding module is further configured to add the standard question manually selected from the recommended sample standard questions to the model sample library in association with the user question.

In one embodiment, if the model sample library contains no sample extension question that matches the user question, the adding module creates a knowledge point corresponding to the user question in the knowledge base, the knowledge point comprising a knowledge base standard question, knowledge base extension questions, and an answer.

In one embodiment, the adding module also adds the knowledge point created in the knowledge base to the model sample library at the same time.

In one embodiment, the semantic similarity calculation module includes: a word and vector calculation module, configured to segment the sample extension question into words and calculate its word and sentence vector values, and to segment the user question into words and calculate its word and sentence vector values; and a correlation calculation module, configured to calculate the correlation between the word and sentence vector values of the sample extension question and those of the user question, to obtain the semantic similarity between the user question and the sample extension question.

In one embodiment, the device further includes a preprocessing module, configured to preprocess all user questions in the human-computer interaction log, before it is determined whether the model sample library contains a sample extension question that matches the user question, to filter out invalid data among them.

The present invention also provides a system for information processing, including any of the information processing devices described above, and further including: a model sample library, the model sample library including sample standard questions and the sample extension questions corresponding to each sample standard question; and a knowledge base, the knowledge base including knowledge base standard questions and the knowledge base extension questions and answers corresponding to each knowledge base standard question, the knowledge base being used to provide answers to user questions.

When selecting the human-computer interaction logs that need optimization, the present invention first screens them automatically using the established model sample library, which filters out a large amount of content that is already known and reduces manual labor. The system can also automatically recommend standard questions for the logs that need optimization, so that staff only have to make a selection. This further reduces manual labor and improves the efficiency of knowledge base optimization.

To provide a better understanding of the above and other aspects of the present invention, preferred embodiments are described in detail below with reference to the accompanying drawings:

Brief description of the drawings

Fig. 1 is a schematic diagram of the knowledge base of the present invention;

Fig. 2 is a schematic diagram of the model sample library of the present invention;

Fig. 3 is a schematic diagram of the knowledge base optimization process in the information processing method flow of one embodiment of the present invention;

Fig. 4 is a schematic diagram of the information processing device of one embodiment of the present invention.

Detailed description

When a user interacts with an intelligent robot, interaction logs are generated. Each interaction log consists of three parts: the user question, the corresponding knowledge base standard question, and the answer. The user question is obtained directly from the user's input; after the question-answering engine parses and recognizes it, the corresponding knowledge base standard question and its answer are retrieved. Judged by the accuracy of the answer that the robot gives from the matched knowledge point, these logs fall into three classes: the robot gave no reply to the user question, the robot gave a correct answer, or the robot gave a wrong answer. The robot fails to reply or replies wrongly mainly because its knowledge base lacks the corresponding knowledge point, or because the phrasings covered by an existing knowledge point are not rich enough. Therefore, analyzing the interaction logs generated each day and extracting the logs that the robot answered incorrectly, because of missing knowledge points or insufficient phrasings, is a main route to continuously optimizing the knowledge base. The method and device provided by the present invention can greatly reduce the manual effort of extracting the human-computer interaction logs that need optimization. The present invention is mainly concerned with the user question and the standard question in each interaction log.

Please refer to Fig. 1 and Fig. 2, which show some of the objects involved in the information processing of the present invention: the knowledge base and the model sample library.

As shown in Fig. 1, the knowledge base 10 includes at least one knowledge base standard question 101, together with the knowledge base extension questions 1011 and the answer corresponding to each knowledge base standard question. Each knowledge base standard question corresponds to one answer, and multiple knowledge base extension questions (1011 to 101n) may correspond to one knowledge base standard question 101. Because each knowledge base standard question 101 corresponds one-to-one with an answer, the present invention is mainly concerned with processing the knowledge base standard questions and their corresponding knowledge base extension questions. A knowledge base generally contains multiple knowledge base standard questions (101 to 10n). The knowledge base contains multiple knowledge points, each comprising one knowledge base standard question, multiple knowledge base extension questions, and one answer; that is, the different knowledge base extension questions all correspond to the same answer, and the knowledge base standard question corresponds to that answer as well. Usually, one clearly worded and easily maintained extension question is selected from the extension questions of each knowledge point to serve as its knowledge base standard question, so the standard question is identical to one of the extension questions. Note that the number of extension questions may be the same or different across knowledge base standard questions.

During human-computer interaction, after a user question is received, semantic similarity calculation is used to find in the knowledge base the knowledge base extension question that has the highest semantic similarity to the user question and exceeds a threshold. The answer corresponding to that extension question is sent to the user, and the user question is recorded, in association with the knowledge base standard question corresponding to that extension question, as one interaction log.
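The answering step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the knowledge base is modeled as a dictionary from standard question to a pair of extension questions and answer, the similarity function is a simple cosine over word counts standing in for whatever semantic similarity the system actually uses, and the threshold value 0.6 is arbitrary. None of these names or values come from the source.

```python
from collections import Counter
from math import sqrt


def similarity(a: str, b: str) -> float:
    """Stand-in similarity: cosine over lowercase word counts (an assumption)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def answer(user_question, knowledge_base, threshold=0.6):
    """Return one interaction log: user question, standard question, answer.

    knowledge_base maps each standard question to (extension questions, answer).
    """
    best_score, best_standard = 0.0, None
    for standard, (extensions, _answer) in knowledge_base.items():
        for ext in extensions:
            score = similarity(user_question, ext)
            if score > best_score:
                best_score, best_standard = score, standard
    if best_standard is None or best_score <= threshold:
        # No extension question is similar enough: the robot gives no reply.
        return {"user_question": user_question, "standard_question": None, "answer": None}
    return {
        "user_question": user_question,
        "standard_question": best_standard,
        "answer": knowledge_base[best_standard][1],
    }
```

The returned dictionary mirrors the three parts of an interaction log, so a list of such dictionaries can serve as the input to the log screening described later.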

As shown in Fig. 2, the model sample library 20 includes at least one sample standard question 201 and one or more corresponding sample extension questions 2011. As in the knowledge base data structure, one sample standard question corresponds to multiple sample extension questions. Usually, one clearly worded and easily maintained extension question is selected from the sample extension questions to serve as their sample standard question, so the sample standard question is identical to one of the sample extension questions. The number of sample extension questions may be the same or different across sample standard questions.
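The two data structures of Fig. 1 and Fig. 2 can be modeled as below. This is an illustrative sketch only: the class names `KnowledgePoint` and `SampleEntry`, the field names, and the helper `sample_library_from` are assumptions introduced here. The helper reflects the embodiment in which the sample library contains the knowledge base questions.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgePoint:
    """One knowledge point: a standard question, its extension questions, one answer."""
    standard_question: str
    answer: str
    extension_questions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The standard question is itself one of the extension questions.
        if self.standard_question not in self.extension_questions:
            self.extension_questions.insert(0, self.standard_question)


@dataclass
class SampleEntry:
    """One model sample library entry: like a knowledge point, but without an answer."""
    standard_question: str
    extension_questions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.standard_question not in self.extension_questions:
            self.extension_questions.insert(0, self.standard_question)


def sample_library_from(knowledge_base: list[KnowledgePoint]) -> list[SampleEntry]:
    """Seed a sample library with every standard and extension question in the knowledge base."""
    return [
        SampleEntry(kp.standard_question, list(kp.extension_questions))
        for kp in knowledge_base
    ]
```

Copying the extension list when seeding keeps the two stores independent, so later additions to the sample library do not silently change the knowledge base.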

Please refer to Fig. 3, which shows the knowledge base optimization flow of one embodiment of the present invention. It comprises the following steps:

Step 301: Start.

Step 302: Determine whether the model sample library contains a sample extension question that matches the user question in the human-computer interaction log.

Step 303: If so, determine whether the standard question corresponding to the user question in the human-computer interaction log is the same as the sample standard question corresponding to the matched sample extension question.

Step 304: If not, optimize the knowledge base.

In step 302, the model sample library is first searched for a sample extension question that is semantically similar to the user question in the human-computer interaction log; if one is found, it is called a match. If there is a match, the user question is considered decidable by the model sample library. Then, in step 303, for a decidable user question, it is determined whether the standard question corresponding to the user question is the same as the standard question corresponding to the sample extension question, where the same means that the texts are exactly identical. If they are the same, the knowledge base already contains a knowledge point corresponding to the user question, and there is no need to optimize the knowledge base with this log. If they are not the same, neither the model sample library nor the knowledge base contains a question corresponding to the content of this interaction log; the log is new content and the knowledge base needs to be optimized with it, so the flow enters step 304. At this point, because the user question in the log is decidable, the one or more sample standard questions corresponding to the one or more sample questions that are semantically similar to the user question can be recommended directly to the knowledge maintenance staff. If there is one, the staff judge whether it is suitable; if there are several, the staff directly pick the most suitable one. The sample standard question judged suitable or selected as most suitable is then stored in the knowledge base in association with the user question. Manual effort is thus reduced to simple supervision: the supervising staff only need to read Chinese and have ordinary logical judgment. Compared with the earlier requirement for staff with knowledge-editing experience, this lowers the skill threshold for personnel and improves optimization efficiency.
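Steps 302 to 304 can be sketched for a single interaction log as follows. The sketch assumes a dictionary-based sample library and uses word overlap (Jaccard) as a stand-in similarity; the function names, the return labels, and the threshold of 0.5 are hypothetical and chosen only for illustration.

```python
def word_overlap(a: str, b: str) -> float:
    """Stand-in similarity (Jaccard over words); the source does not fix a formula."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def check_log(user_question, log_standard, sample_library, first_threshold=0.5):
    """Steps 302-304 for one interaction log.

    sample_library maps each sample standard question to its extension questions.
    Returns (decision, recommended sample standard questions).
    """
    # Step 302: collect the standard questions of all matching extension questions.
    matched = [
        standard
        for standard, extensions in sample_library.items()
        if any(word_overlap(user_question, ext) > first_threshold for ext in extensions)
    ]
    if not matched:
        return "create_knowledge_point", []
    # Step 303: exact text comparison of standard questions.
    if log_standard in matched:
        return "no_optimization_needed", []
    # Step 304: recommend candidates to the knowledge maintenance staff.
    return "optimize_knowledge_base", matched
```

Returning the candidate list together with the decision keeps the human in the loop: the code only proposes sample standard questions, and the final selection remains manual, as the text requires.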

Another advantage of the method is that the decision on whether the knowledge base needs optimizing is made entirely in the local model sample library, without using the cloud knowledge base. This both increases computation speed and saves cloud knowledge base costs.

In one embodiment, the sample extension questions include knowledge base extension questions, and the sample standard questions include knowledge base standard questions. Further, the sample extension questions include all knowledge base extension questions in the knowledge base, and the sample standard questions include all knowledge base standard questions in the knowledge base. In this embodiment, the model sample library contains every knowledge base standard question and knowledge base extension question in the knowledge base. The model sample library then judges more accurately whether optimization is needed, which further reduces the subsequent manual selection workload.

In one embodiment, if the result of step 302 is that the model sample library contains no sample extension question that matches the user question, a knowledge point corresponding to the user question is created in the knowledge base, the knowledge point comprising a knowledge base standard question, knowledge base extension questions, and an answer. In this embodiment, the interaction log is considered undecidable by the model sample library; that is, the knowledge base contains no information related to the log, and the knowledge base needs to be optimized with it. Because the log is undecidable, the knowledge maintenance staff have to add a knowledge point related to the user question themselves, that is, one knowledge base standard question, multiple knowledge base extension questions, and one answer, to complete the optimization of the knowledge base.

In a preferred embodiment, whether there is a match in step 302 is measured by semantic similarity. A first threshold can be set, and when the semantic similarity is greater than the first threshold, the user question in the interaction log is considered to match the sample extension question. When sufficient manual effort is available, the first threshold can be set higher; otherwise, it can be set lower to save labor cost.

In one embodiment, determining whether the model sample library contains a sample extension question that matches the user question in the human-computer interaction log is done by a semantic matching degree calculation, comprising the steps of: segmenting each sample extension question into words and calculating its word and sentence vector values; segmenting the user question into words and calculating its word and sentence vector values; and calculating the correlation between the word and sentence vector values of each sample extension question and those of the user question, to obtain the semantic similarity between the user question and the sample extension question. There are many ways to calculate the semantic matching degree, and methods known in the prior art can also be used in the present invention.
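The three steps above (word segmentation, vector calculation, correlation) can be illustrated with a deliberately simple sketch. It assumes one-hot word vectors summed into a sentence vector, and cosine as the correlation measure; the source names neither, so these choices and the function names are assumptions. The regular-expression tokenizer suits space-delimited text only; Chinese input would need a proper word segmenter.

```python
import math
import re


def tokenize(sentence: str) -> list[str]:
    """Segment a sentence into words (regex stand-in for a real word segmenter)."""
    return re.findall(r"\w+", sentence.lower())


def sentence_vector(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Sum one-hot word vectors over the vocabulary to get a sentence vector."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    for token in tokens:
        vector[index[token]] += 1.0
    return vector


def semantic_similarity(user_question: str, extension_question: str) -> float:
    """Cosine of the two sentence vectors; lies in [0, 1] for count vectors."""
    user_tokens = tokenize(user_question)
    ext_tokens = tokenize(extension_question)
    vocabulary = sorted(set(user_tokens) | set(ext_tokens))
    u = sentence_vector(user_tokens, vocabulary)
    e = sentence_vector(ext_tokens, vocabulary)
    dot = sum(x * y for x, y in zip(u, e))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in e))
    return dot / norm if norm else 0.0
```

Count vectors capture only word overlap, not meaning; replacing `sentence_vector` with learned embeddings would keep the rest of the calculation unchanged.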

Because the quality of the model sample library is critical to the present invention, in another, preferred embodiment the model sample library is itself optimized, in two ways. First, whenever the knowledge base is optimized, the same content is added to the model sample library. Second, when there is a sample extension question whose semantic similarity to the user question is greater than the first threshold and less than 100%, and the standard question corresponding to the user question is the same as the sample standard question corresponding to that sample extension question, the user question and its corresponding standard question are added to the model sample library in association with each other. The first mode mainly keeps the content of the model sample library consistent with the knowledge base and brings the newest questions and standard questions into the model sample library, so that the next time an interaction log similar to the updated content is encountered, it can be filtered out directly without manual judgment. In the second mode, the knowledge base already gives the correct answer to the current user question, that is, the correct standard question was found, so the interaction log does not need to be added to the knowledge base. Adding it to the model sample library instead brings more subsequent interaction logs within the decidable range of the model sample library, so that related logs can be handled directly by the present invention.
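The second optimization mode can be sketched as a small guard function. It assumes the similarity of the best-matching sample extension question has already been computed and is passed in as a number between 0 and 1; the function name, the dictionary layout of the sample library, and the default threshold are illustrative assumptions.

```python
def maybe_add_sample(sample_library, user_question, log_standard,
                     matched_standard, similarity, first_threshold=0.8):
    """Second optimization mode: add a near-match user question as a new sample.

    sample_library maps each sample standard question to a list of extension
    questions. Returns True when the library was changed.
    """
    # Similarity must exceed the first threshold but stay below 100%.
    is_near_match = first_threshold < similarity < 1.0
    if not (is_near_match and log_standard == matched_standard):
        return False
    extensions = sample_library.setdefault(log_standard, [log_standard])
    if user_question not in extensions:
        extensions.append(user_question)
        return True
    return False
```

A similarity of exactly 100% is excluded because such a question is already represented in the library, so adding it again would contribute nothing.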

The information processing method flow of one embodiment of the present invention specifically comprises:

Step 1: Start.

Step 2: Determine whether the model sample library contains a sample extension question that matches the user question in the human-computer interaction log. If so, go to step 3; otherwise, go to step 5.

Step 3: Determine whether the standard question corresponding to the user question in the human-computer interaction log is the same as the sample standard question corresponding to the matched sample extension question. If so, go to step 4; otherwise, go to step 6.

Step 4: Determine whether the semantic similarity between the user question and the sample extension question is greater than the first threshold and less than 100%. If so, go to step 7; otherwise, go to step 8.

Step 5: Create a new knowledge point, and optimize the knowledge base and the model sample library with it.

Step 6: Create a knowledge point by selection, and optimize the knowledge base and the model sample library with it.

Step 7: Optimize the model sample library using the content of the interaction log.

Step 8: End.

Step 5 comprises: the knowledge maintenance staff actively add a knowledge point related to the user question, that is, one knowledge base standard question, multiple knowledge base extension questions, and one answer, to complete the optimization of the knowledge base. The same knowledge point is also used to optimize the model sample library, although only its questions and standard question are used there. Step 6 comprises: recommending one or more standard questions from the model sample library to the knowledge maintenance staff, who select one directly to form a pairing of the user question and a standard question; the pairing is then added to the knowledge base and also to the model sample library. In step 7, the user question in the interaction log and its corresponding standard question are added to the model sample library, forming a new correspondence between a sample extension question and a sample standard question.
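The branching of steps 2 to 8 can be summarized as a routing function. The sketch assumes that similarity scores against the sample extension questions are already available as pairs of sample standard question and similarity; the enum `Action`, the function `route`, and the default threshold are hypothetical. Where several extension questions of the same standard question match, the sketch treats the log as a new phrasing only if none of them matches at 100%, which is one reading of step 4 rather than something the source states.

```python
from enum import Enum


class Action(Enum):
    CREATE_KNOWLEDGE_POINT = 5   # step 5
    SELECT_RECOMMENDED = 6       # step 6
    ADD_TO_SAMPLE_LIBRARY = 7    # step 7
    NOTHING = 8                  # step 8 (end)


def route(log_standard, matches, first_threshold=0.8):
    """Map one interaction log to the step that handles it.

    matches: list of (sample standard question, similarity) pairs, one per
    sample extension question compared with the user question.
    """
    # Step 2: keep only extension questions above the first threshold.
    matched = [(std, sim) for std, sim in matches if sim > first_threshold]
    if not matched:
        return Action.CREATE_KNOWLEDGE_POINT
    # Step 3: is the logged standard question among the matched ones?
    same = [sim for std, sim in matched if std == log_standard]
    if not same:
        return Action.SELECT_RECOMMENDED
    # Step 4: a best match below 100% means the user question is a new phrasing.
    if max(same) < 1.0:
        return Action.ADD_TO_SAMPLE_LIBRARY
    return Action.NOTHING
```

Numbering the enum members after the steps makes it easy to check the routing against the flow by eye.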

The present invention also provides an information processing device 51; refer to Fig. 4. In one embodiment, the device includes a first analysis module 501, a second analysis module 502, and an optimization module 503. The interaction log first enters the first analysis module 501, which determines whether the model sample library contains a sample extension question that matches the user question in the human-computer interaction log. If so, the log proceeds to the second analysis module 502, which determines whether the standard question corresponding to the user question in the human-computer interaction log is the same as the sample standard question corresponding to the matched sample extension question. If they are not the same, the optimization module 503 optimizes the knowledge base.

In another embodiment, referring to Fig. 4, the first analysis module 501 further includes a semantic similarity calculation module 5011, which calculates the semantic similarity between the user question in the human-computer interaction log and the sample extension questions and thereby obtains the matching degree. The second analysis module 502 includes a comparison module 5021, which compares whether the text of the standard question corresponding to the user question is completely consistent with that of the sample standard question corresponding to the matched sample extension question. The optimization module 503 further includes a recommendation module 5031, which, based on the result of the semantic similarity calculation module 5011, recommends the sample standard questions corresponding to sample extension questions whose semantic matching degree with the user question is greater than a second threshold. The optimization module 503 further includes an adding module 5032, which adds the standard question manually selected from the recommended sample standard questions to the knowledge base in association with the user question, and at the same time adds the same content to the model sample library.
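The cooperation of the recommendation module 5031 and the adding module 5032 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names, the dictionary layouts, the second-threshold value of 0.3, the token-overlap measure used in place of a real semantic similarity, and the ranking of recommendations by score are all hypothetical and are not specified by the description. Answers stored in the knowledge base are omitted for brevity.

```python
from typing import Callable, Dict, List


def token_overlap(a: str, b: str) -> float:
    """Toy stand-in for semantic similarity: Jaccard overlap of lowercase tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def recommend_standard_questions(
    user_question: str,
    sample_library: Dict[str, str],  # sample extension question -> sample standard question
    similarity: Callable[[str, str], float] = token_overlap,
    second_threshold: float = 0.3,  # assumed value
) -> List[str]:
    """Recommend sample standard questions whose extension questions score above the second threshold."""
    best: Dict[str, float] = {}
    for ext_q, std_q in sample_library.items():
        score = similarity(user_question, ext_q)
        # Keep the highest score seen for each standard question.
        if score > second_threshold and score > best.get(std_q, 0.0):
            best[std_q] = score
    # Ranking by score is an assumption; the text only says "recommend".
    return sorted(best, key=best.get, reverse=True)


def add_pairing(
    user_question: str,
    chosen_standard: str,
    knowledge_base: Dict[str, List[str]],  # standard question -> extension questions
    sample_library: Dict[str, str],
) -> None:
    """Add the manually selected pairing to both the knowledge base and the model sample library."""
    knowledge_base.setdefault(chosen_standard, []).append(user_question)
    sample_library[user_question] = chosen_standard
```

In use, the list returned by `recommend_standard_questions` would be shown to the knowledge maintenance personnel, and the standard question they select would be passed to `add_pairing`.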

More preferably, the model sample library 504 is optimized at the same time as the knowledge base. The second analysis module 502 further includes an adding module 5022, which adds the interaction log content to the model sample library when the semantic similarity between the user question and the sample extension question is greater than the first threshold and less than 100% and the corresponding standard questions are the same. The adding module 5032 is also used to add the standard question manually selected from the sample standard questions recommended by the recommendation module 5031 to the model sample library in association with the user question.

In another embodiment, invalid data in the interaction log is filtered out first: junk data in the log data can be removed according to preset filtering rules, for example, data in which a single English letter is repeated more than 5 times. A naive Bayes algorithm can then be used for analysis, to calculate whether the log content falls within the range that the analysis model can judge.
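The example filtering rule can be sketched in Python as follows. The sketch assumes that "repeated more than 5 times" means an entry consisting entirely of one English letter occurring at least six times in a row; the function names and the `min_repeats` parameter are hypothetical, and the subsequent naive Bayes analysis is not covered.

```python
import re
from typing import List


def is_junk(question: str, min_repeats: int = 6) -> bool:
    """True if the entry is nothing but one English letter repeated min_repeats or more times."""
    text = question.strip()
    # ([A-Za-z]) captures one letter; \1{n,} requires n or more further copies of it.
    pattern = r"([A-Za-z])\1{%d,}" % (min_repeats - 1)
    return re.fullmatch(pattern, text) is not None


def filter_log(questions: List[str], min_repeats: int = 6) -> List[str]:
    """Drop junk entries from a list of logged user questions, keeping the original order."""
    return [q for q in questions if not is_junk(q, min_repeats)]
```

Exposing the repeat count as a parameter lets the rule be tightened or relaxed if the intended boundary is "5 or more" rather than "more than 5".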

The present invention also provides an information processing system 52; refer to Fig. 4. The system includes any of the information processing devices described above, as well as a knowledge base 504 and a model sample library 505.

The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of information processing, comprising:

providing a model sample library, the model sample library including sample standard questions and sample extension questions corresponding to each sample standard question;

providing a knowledge base, the knowledge base including knowledge base standard questions and knowledge base extension questions and answers corresponding to each knowledge base standard question, the knowledge base being used to provide answers to user questions;

determining whether the model sample library contains a sample extension question that matches a user question in a human-computer interaction log;

if so, determining whether the standard question corresponding to the user question in the human-computer interaction log is the same as the sample standard question corresponding to the matched sample extension question;

if not the same, optimizing the knowledge base;

wherein determining whether the standard question corresponding to the user question is the same as the sample standard question corresponding to the matched sample extension question includes: comparing whether the text of the standard question corresponding to the user question is completely consistent with that of the sample standard question corresponding to the matched sample extension question; and

if there is a sample extension question whose semantic similarity with the user question is greater than the first threshold and less than 100%, and the standard question corresponding to the user question is the same as the sample standard question corresponding to that sample extension question, adding the user question and its corresponding standard question to the model sample library in association with each other.

2. The information processing method according to claim 1, wherein the sample extension questions include knowledge base extension questions, and the sample standard questions include knowledge base standard questions.

3. The method for information processing as claimed in claim 1, wherein determining whether the model sample library contains a sample extension question matching the user question includes:

performing semantic similarity calculation on the user question and the sample extension questions to determine whether the model sample library contains at least one sample extension question whose semantic similarity with the user question is greater than a first threshold.

4. The method for information processing as claimed in claim 1, wherein, if there are a plurality of matched sample extension questions, determining whether the standard question corresponding to the user question is the same as the sample standard question corresponding to the matched sample extension question includes:

determining whether there is a matched sample extension question whose corresponding sample standard question is the same as the standard question corresponding to the user question.

5. The method for information processing as claimed in claim 1, wherein the optimization of the knowledge base comprises:

Based on the result of the semantic similarity calculation, recommending a corresponding sample standard question of a sample extended question whose semantic similarity with the user question sentence is greater than a second threshold;

The sample standard questions manually selected from the recommended sample standard questions are added into the knowledge base in association with the user questions.

6. The method for information processing as claimed in claim 5, characterized in that the method further comprises:

The sample standard questions manually selected from the recommended sample standard questions are added into the model sample library in association with the user questions.

7. The method for information processing as claimed in claim 1, wherein, if no sample extension question matching the user question exists in the model sample library, a knowledge point corresponding to the user question is created in the knowledge base, the knowledge point including: a knowledge base standard question, knowledge base extension questions, and an answer.

8. The information processing method according to claim 7, further comprising: adding the knowledge point created in the knowledge base to the model sample library at the same time.

9. The method for information processing as claimed in claim 3, wherein performing the semantic similarity calculation between the user question and the sample extension question comprises:

performing word segmentation on the sample extension question and calculating word and sentence vector values;

performing word segmentation on the user question and calculating word and sentence vector values;

calculating the correlation between the word and sentence vector values of the sample extension question and the word and sentence vector values of the user question, so as to obtain the semantic similarity between the user question and the sample extension question.

10. The method for information processing as claimed in claim 1, wherein, before determining whether the model sample library contains a sample extension question matching the user question, the method further comprises:

preprocessing all user questions in the human-computer interaction log to filter out invalid data in those user questions.

11. An information processing device, comprising:

a first analysis module, used to determine whether the model sample library contains a sample extension question that matches a user question in a human-computer interaction log;

a second analysis module, configured to determine, in response to the existence of a sample extension question that matches the user question, whether the standard question corresponding to the user question in the human-computer interaction log is the same as the sample standard question corresponding to the matched sample extension question; and

an optimization module, configured to optimize the knowledge base in response to the standard question corresponding to the user question not being the same as the sample standard question corresponding to the matched sample extension question;

wherein the second analysis module includes:

a comparison module, used to compare whether the text of the standard question corresponding to the user question is completely consistent with that of the sample standard question corresponding to the matched sample extension question; and

an adding module, configured to, in response to the existence of a sample extension question whose semantic similarity with the user question is greater than a first threshold and less than 100%, and the standard question corresponding to the user question being the same as the sample standard question corresponding to that sample extension question, add the user question and its corresponding standard question to the model sample library in association with each other.

12. The information processing device according to claim 11, wherein the first analysis module comprises:

a semantic similarity calculation module, used to perform semantic similarity calculation on the user question and the sample extension questions, so as to determine whether the model sample library contains at least one sample extension question whose semantic similarity with the user question is greater than the first threshold.

13. The information processing device according to claim 12, wherein, if there are multiple matched sample extension questions, the second analysis module determines whether there is a matched sample extension question whose corresponding sample standard question is the same as the standard question corresponding to the user question.

14. The information processing device according to claim 12, wherein the optimization module comprises:

a recommendation module, configured to recommend, based on the result of the semantic similarity calculation, the sample standard question corresponding to a sample extension question whose semantic matching degree with the user question is greater than a second threshold; and

an adding module, used for adding the standard question manually selected from the recommended sample standard questions to the knowledge base in association with the user question.

15. The information processing device according to claim 14, wherein the adding module in the optimization module is further used for adding the standard question manually selected from the recommended sample standard questions to the model sample library in association with the user question.

16. The information processing device according to claim 14, wherein, if no sample extension question matching the user question exists in the model sample library, the adding module in the optimization module creates a knowledge point corresponding to the user question in the knowledge base, the knowledge point including: a knowledge base standard question, knowledge base extension questions, and an answer.

17. The information processing device according to claim 16, wherein the adding module in the optimization module also adds the knowledge point created in the knowledge base to the model sample library at the same time.

18. The device for information processing according to claim 12, wherein the semantic similarity calculation module comprises:

a word segmentation and vector calculation module, used to perform word segmentation on the sample extension question and calculate word and sentence vector values, and to perform word segmentation on the user question and calculate word and sentence vector values; and

a relevance calculation module, used to calculate the correlation between the word and sentence vector values of the sample extension question and the word and sentence vector values of the user question, so as to obtain the semantic similarity between the user question and the sample extension question.

19. The information processing device according to claim 11, further comprising:

a preprocessing module, configured to preprocess all user questions in the human-computer interaction log to filter out invalid data in those user questions.

20. An information processing system, characterized in that the system comprises:

An information processing device according to any one of claims 11-19;

A model sample library, the model sample library includes sample standard questions and sample extension questions corresponding to each sample standard question;

A knowledge base, the knowledge base includes knowledge base standard questions and knowledge base extended questions and answers corresponding to each knowledge base standard question, and the knowledge base is used to provide answers to user questions.