CN108804627A

CN108804627A - Information acquisition method and device

Info

Publication number: CN108804627A
Application number: CN201810551681.9A
Authority: CN
Inventors: 马文涛; 崔鸣; 崔一鸣; 陈致鹏; 何苏; 王士进; 胡国平; 刘挺
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2018-05-31
Filing date: 2018-05-31
Publication date: 2018-11-13
Anticipated expiration: 2038-05-31
Also published as: CN108804627B

Abstract

Embodiments of the present invention provide an information acquisition method and device, belonging to the technical field of natural language processing. The method includes: inputting a query text and an answer text that matches the query text into each of N key content computation models to obtain the candidate key content output by each model, where each key content computation model is trained on sample query texts, sample answer texts, and the sample key content in the sample answer texts, and each candidate key content is extracted from the answer text; and obtaining the optimal key content from the candidate key contents and using it as the answer to the query text. Because the outputs of the N key content computation models are fused, the bias that a single model acquires during training is mitigated, which improves the reliability and accuracy of the answer and the user's experience in question-and-answer interaction with a device.

Description

Information acquisition method and device

Technical field

Embodiments of the present invention relate to the technical field of natural language processing, and more particularly to an information acquisition method and device.

Background

In recent years, with the development of artificial intelligence and related disciplines, especially computational linguistics, various question answering systems and dialogue robots have emerged, and people can obtain the information they need by communicating with devices in natural language. In the related art, the answer to a user's question is typically determined by a single model. Because a single model inevitably acquires bias during training and can hardly fit the entire training data distribution, the reliability of the answer is low.

Summary of the invention

To solve the above problem, embodiments of the present invention provide an information acquisition method and device that overcome the above problem or at least partially solve it.

According to a first aspect of the embodiments of the present invention, an information acquisition method is provided. The method includes:

inputting a query text and an answer text that matches the query text into each of N key content computation models to obtain the candidate key content output by each key content computation model, where each key content computation model is trained on sample query texts, sample answer texts, and the sample key content in the sample answer texts, each candidate key content is extracted from the answer text, and N is a positive integer greater than 1;

obtaining the optimal key content from the candidate key contents, and using the optimal key content as the answer to the query text.

In the method provided by the embodiments of the present invention, the query text and the matching answer text are input into each of N key content computation models to obtain the candidate key content output by each model. The optimal key content is then obtained from the candidate key contents and used as the answer to the query text. Because the outputs of the N key content computation models are fused, the problem that a single model acquires bias during training and can hardly fit the entire training data distribution is mitigated, which improves the reliability and accuracy of the answer and the user's experience in question-and-answer interaction with a device.

According to a second aspect of the embodiments of the present invention, an information acquisition device is provided. The device includes:

a candidate key content acquisition module, configured to input a query text and an answer text that matches the query text into each of N key content computation models to obtain the candidate key content output by each key content computation model, where each key content computation model is trained on sample query texts, sample answer texts, and the sample key content in the sample answer texts, each candidate key content is extracted from the answer text, and N is a positive integer greater than 1;

an optimal key content acquisition module, configured to obtain the optimal key content from the candidate key contents and use the optimal key content as the answer to the query text.

According to a third aspect of the embodiments of the present invention, an information acquisition apparatus is provided, including:

at least one processor; and

at least one memory communicatively connected to the processor, wherein:

the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the information acquisition method provided by any possible implementation of the first aspect.

According to a fourth aspect of the present invention, a non-transitory computer-readable storage medium is provided. The storage medium stores computer instructions that cause a computer to perform the information acquisition method provided by any possible implementation of the first aspect.

It should be understood that the above general description and the following detailed description are exemplary and explanatory and do not limit the embodiments of the present invention.

Brief description of the drawings

Fig. 1 is a flow diagram of an information acquisition method according to an embodiment of the present invention;

Fig. 2 is a flow diagram of a candidate key content acquisition method according to an embodiment of the present invention;

Fig. 3 is a flow diagram of an optimal key content acquisition method according to an embodiment of the present invention;

Fig. 4 is a flow diagram of a method for obtaining the voting score of a start sentence as the start sentence of the optimal key content according to an embodiment of the present invention;

Fig. 5 is a flow diagram of a method for obtaining the voting score of an end sentence as the end sentence of the optimal key content according to an embodiment of the present invention;

Fig. 6 is a flow diagram of a method for obtaining N key content computation models according to an embodiment of the present invention;

Fig. 7 is a block diagram of an information acquisition device according to an embodiment of the present invention;

Fig. 8 is a block diagram of an information acquisition apparatus according to an embodiment of the present invention.

Detailed description

Specific implementations of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the embodiments of the present invention and do not limit their scope.

At present, people can obtain the information they need by communicating with devices in natural language. In the related art, the answer to a user's question is typically determined by a single model. Because a single model inevitably acquires bias during training and can hardly fit the entire training data distribution, the reliability of the answer is low. In view of this, an embodiment of the present invention provides an information acquisition method. The method can be used in intelligent question answering scenarios and in other scenarios that need an intelligent question answering function, such as driving and shopping scenarios; the embodiments of the present invention do not specifically limit this. Depending on the usage scenario, the method may be executed by different devices, which the embodiments likewise do not specifically limit. For example, if the method is used in a driving scenario, it may be executed by an in-vehicle device; if it is used in a shopping scenario, it may be executed by a mobile terminal. Referring to Fig. 1, the method includes:

101. Input a query text and an answer text that matches the query text into each of N key content computation models to obtain the candidate key content output by each key content computation model, where each key content computation model is trained on sample query texts, sample answer texts, and the sample key content in the sample answer texts, each candidate key content is extracted from the answer text, and N is a positive integer greater than 1.

Before the above process is executed, the voice data of the user's question may first be obtained and speech recognition performed on it to obtain the query text; alternatively, text entered by the user may be obtained directly and used as the query text. The embodiments of the present invention do not specifically limit this. The answer text that matches the query text may contain the answer to the question posed by the query text. Specifically, if the query text asks how to use a certain function of a product, the matching answer text may be the product's manual. Further, a product usually has multiple functions, and its manual records operating instructions for all of them. If the manual has been split in advance into several structured texts, one per function, and the query text asks about one function of the product, the matching answer text may be the structured text for that function. If the query text asks for the definition of a technical term, the answer text may be a technical dictionary that contains the definition of that term. The answer text may also take forms other than those listed above; the embodiments of the present invention do not specifically limit this.

Because the answer text may contain redundant information unrelated to the question posed by the query text, the key content computation models are used to filter out that redundant information. Specifically, for any key content computation model, once the answer text is input to the model, the model outputs a candidate key content. Each key content computation model outputs one candidate key content. The candidate key content may be the actual text that the model extracts from the answer text, or the sequence numbers of clauses in the answer text; the embodiments of the present invention do not specifically limit the form of the candidate key content.

In addition, before the above process is executed, the N key content computation models may be trained in advance. Any one of the N models may be trained as follows. First, a large number of sample query texts and the sample answer texts that match them are collected; the sample key content in each sample answer text is determined in advance and is the answer to the question posed by the sample query text. An initial model is then trained on the sample query texts, sample answer texts, and sample key contents to obtain the key content computation model. The initial model may be a single neural network model or a combination of several neural network models; the embodiments of the present invention do not specifically limit its type or structure.

It should be noted that, when the N key content computation models are trained in advance, each may be obtained from an initial model of the same type and the same training samples. In actual training, however, the parameters of the initial model may be adjusted to obtain N key content computation models that have the same function but different output behavior. Taking an initial model that includes a convolutional neural network or a long short-term memory (LSTM) network as an example, N different models can be obtained by adjusting the size and number of the convolution kernels of the convolutional neural network, or the number of nodes in the hidden layer of the LSTM network. Alternatively, different training methods, such as dropout or regularization, may be used to train N different models. The two approaches, adjusting different parameters and using different training methods, may of course also be combined to train N different models; the embodiments of the present invention do not specifically limit this.
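As a minimal sketch of how N differing configurations might be enumerated before training, the following Python snippet builds a small grid over convolution kernel size, kernel count, LSTM hidden-layer size, and dropout rate. The class name `ModelConfig`, the function `make_configs`, and every numeric value are hypothetical; the source names the adjustable parameters but gives no concrete values.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical hyperparameters for one key content computation model."""
    kernel_size: int   # convolution kernel size of the CNN
    num_kernels: int   # number of convolution kernels of the CNN
    lstm_hidden: int   # number of nodes in the LSTM hidden layer
    dropout: float     # dropout rate used during training


def make_configs(n: int) -> List[ModelConfig]:
    """Return n distinct configurations drawn from a small illustrative grid."""
    grid = product([3, 5], [64, 128], [128, 256], [0.1, 0.3])
    configs = [ModelConfig(*values) for values in grid]
    if n > len(configs):
        raise ValueError("grid too small for the requested number of models")
    return configs[:n]


# Five configurations, one per model to be trained (N = 5 is illustrative).
configs = make_configs(5)
```

Each configuration would then be passed to whatever training routine builds one model; that routine is outside the scope of this sketch.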

102. Obtain the optimal key content from the candidate key contents, and use the optimal key content as the answer to the query text.

As described in the above step, the N key content computation models output N candidate key contents. Because different models have different output behavior, the candidate key contents output by the N models can be fused in this step to obtain the optimal key content. As noted above, a candidate key content may take the form of actual text or of clause sequence numbers. Accordingly, the form of the optimal key content in this step corresponds to that of the candidate key content and may be actual text or clause sequence numbers; the embodiments of the present invention do not specifically limit this.

When the optimal key content is obtained from the candidate key contents, the candidate key contents may be input into a preset model to obtain the optimal key content. Alternatively, the number of times each clause of the answer text appears across all candidate key contents may be counted, and clauses then selected from the candidate key contents according to their occurrence counts to form the optimal key content; the embodiments of the present invention do not specifically limit this. After the optimal key content is obtained, it can be used as the answer to the query text. Specifically, it may be delivered to the user as the answer by voice broadcast, text display, or similar means; the embodiments of the present invention do not specifically limit this.
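The occurrence-count alternative can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the source only says clauses are selected "according to their occurrence counts", so the threshold parameter `min_votes`, the function name `fuse_by_frequency`, and the five candidate lists are all hypothetical.

```python
from collections import Counter
from typing import List


def fuse_by_frequency(candidates: List[List[int]], min_votes: int) -> List[int]:
    """Keep the clause numbers that appear in at least min_votes candidates."""
    # set(cand) ensures a clause is counted once per candidate.
    counts = Counter(idx for cand in candidates for idx in set(cand))
    return sorted(idx for idx, c in counts.items() if c >= min_votes)


# Five hypothetical candidates, each a list of clause numbers in the answer text.
candidates = [[2, 3, 4, 5], [4], [3], [2, 3, 4], [3, 4]]
optimal = fuse_by_frequency(candidates, min_votes=3)
```

A majority threshold is only one possible rule; selecting the top-k most frequent clauses would fit the same description.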

Based on the above embodiment, as an optional embodiment, the embodiments of the present invention do not specifically limit the manner of inputting the query text and the matching answer text into each of the N key content computation models to obtain the candidate key content output by each model. Referring to Fig. 2, the manner includes but is not limited to:

201. For the candidate key content output by any key content computation model, obtain the start probability of each clause in the answer text being the start sentence of the candidate key content, and the end probability of each clause in the answer text being the end sentence of the candidate key content.

202. According to the start probability of each clause, select from the answer text the clause with the highest start probability as the start sentence of the candidate key content; according to the end probability of each clause, select from the answer text the clause with the highest end probability as the end sentence of the candidate key content.

203. Take the start sentence, the end sentence, and the clauses of the answer text that lie between them as the candidate key content.

Take an answer text that contains 8 clauses as an example. For any candidate key content, when its start sentence and end sentence are determined, the start probability of each clause in the answer text being the start sentence of the candidate key content, and the end probability of each clause being its end sentence, are shown in Table 1 below:

Table 1

As shown in Table 1, the 4th clause has the highest start probability and the 6th clause has the highest end probability. Therefore, the 4th clause can be taken as the start sentence of the candidate key content, the 6th clause as its end sentence, and the 4th, 5th, and 6th clauses together as the candidate key content. Repeating this process for each model yields N candidate key contents.
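Steps 201 to 203 can be sketched in Python as below. The probability values are made up for illustration, since the contents of Table 1 are not reproduced in the text; only the positions of the maxima (clause 4 for the start, clause 6 for the end) follow the example. The function name `extract_candidate` and the error raised when the start falls after the end are assumptions, because the source does not say how that case is handled.

```python
from typing import List, Tuple


def extract_candidate(start_probs: List[float], end_probs: List[float]) -> Tuple[int, int]:
    """Return 1-based (start, end) clause numbers with the highest probabilities."""
    start = max(range(len(start_probs)), key=start_probs.__getitem__) + 1
    end = max(range(len(end_probs)), key=end_probs.__getitem__) + 1
    if start > end:
        # Not covered by the source; raising here is an assumption.
        raise ValueError("start sentence falls after end sentence")
    return start, end


# Made-up probabilities for 8 clauses; only the argmax positions (4 and 6)
# match the example in the text.
start_probs = [0.02, 0.05, 0.10, 0.60, 0.08, 0.05, 0.05, 0.05]
end_probs = [0.01, 0.02, 0.05, 0.07, 0.15, 0.55, 0.10, 0.05]

start, end = extract_candidate(start_probs, end_probs)
candidate_clauses = list(range(start, end + 1))  # clauses forming the candidate
```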

In the method provided by this embodiment, the start probability and the end probability of each clause in the answer text are obtained for each candidate key content, and the candidate key content is then determined from these probabilities. This helps improve the computational accuracy of the key content computation models and, in turn, the reliability and accuracy of the candidate key contents.

As described in the above embodiments, a single model acquires bias during training and can hardly fit the entire training data distribution, so the key content output by a single model has low reliability and accuracy. In view of this, and based on the above embodiments, as an optional embodiment, the embodiments of the present invention do not specifically limit the manner of obtaining the optimal key content from the candidate key contents. Referring to Fig. 3, the manner includes but is not limited to:

301. For each start sentence among the start sentences contained in all candidate key contents, determine the optimal start sentence of the optimal key content according to the voting score of each start sentence as the start sentence of the optimal key content.

As described in the above embodiments, the candidate key contents output by the N key content computation models may differ: some may share the same start sentence, some may share the same end sentence, and some may not overlap at all. To fuse the N candidate key contents and select an optimal start sentence from them, the embodiments of the present invention may vote over the start sentences that appear in the candidate key contents. That is, a voting score is determined for each start sentence that appears, and the start sentence with the highest voting score is selected as the optimal start sentence.

The voting score of a start sentence in the candidate key contents represents the accuracy and reliability of that sentence as the start sentence of the optimal key content. The higher the voting score, the higher the reliability and accuracy of the start sentence. The voting score may be determined from the start probability of the start sentence in each candidate key content, or from the number of times the start sentence appears across all candidate key contents; the embodiments of the present invention do not specifically limit this.

302. For each end sentence among the end sentences contained in all candidate key contents, determine the optimal end sentence of the optimal key content according to the voting score of each end sentence as the end sentence of the optimal key content.

As described in the above embodiments, the candidate key contents output by the N key content computation models may differ: some may share the same start sentence, some may share the same end sentence, and some may not overlap at all. To fuse the N candidate key contents and select an optimal end sentence from them, the embodiments of the present invention may vote over the end sentences that appear in the candidate key contents. That is, a voting score is determined for each end sentence that appears, and the end sentence with the highest voting score is selected as the optimal end sentence.

The voting score of an end sentence in the candidate key contents represents the accuracy and reliability of that sentence as the end sentence of the optimal key content. The higher the voting score, the higher the reliability and accuracy of the end sentence. The voting score may be determined from the end probability of the end sentence in each candidate key content, or from the number of times the end sentence appears across all candidate key contents; the embodiments of the present invention do not specifically limit this.

303. Take the optimal start sentence, the optimal end sentence, and the clauses of the answer text that lie between them as the optimal key content.

In the method provided by this embodiment, the optimal start sentence and the optimal end sentence are determined from the voting scores of the start sentences and of the end sentences, and the optimal key content is determined from them. The candidate key contents output by the N key content computation models are thereby fused, which improves the reliability and accuracy of the answer.
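Steps 301 to 303 can be sketched for the simplest scoring option named above, in which the voting score is just the number of candidates that share a start or end sentence. The spans, the function name `fuse_by_count`, and the tie-breaking rule (prefer the smaller clause number) are assumptions made for this illustration; the source does not specify how ties are resolved.

```python
from collections import Counter
from typing import List, Tuple


def fuse_by_count(spans: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Pick the most frequent start and the most frequent end across candidates."""
    start_votes = Counter(s for s, _ in spans)
    end_votes = Counter(e for _, e in spans)
    # Ties go to the smaller clause number (an assumption, not from the source).
    best_start = min(start_votes, key=lambda i: (-start_votes[i], i))
    best_end = min(end_votes, key=lambda i: (-end_votes[i], i))
    return best_start, best_end


# Five hypothetical candidates given as (start clause, end clause) pairs.
spans = [(2, 5), (4, 4), (3, 3), (2, 4), (3, 4)]
best_start, best_end = fuse_by_count(spans)
optimal_clauses = list(range(best_start, best_end + 1))
```

Here clauses 2 and 3 tie as start sentences with two votes each, which is why a pure count can be ambiguous and a probability term is a useful addition.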

As described in the above embodiments, the voting score of a start sentence in the candidate key contents may be determined from the start probability of that start sentence. Accordingly, before the optimal start sentence of the optimal key content is determined from the voting scores, and based on this principle and the above embodiments, as an optional embodiment, the embodiments of the present invention also provide a way to determine the voting score of a start sentence in the candidate key contents. Referring to Fig. 4, the way includes but is not limited to:

401. For any candidate start sentence among all candidate start sentences, take the candidate key contents that satisfy a first preset condition as the first target candidate key contents. All candidate start sentences are the start sentences contained in all candidate key contents, and the first preset condition is that the candidate key content contains the candidate start sentence and has it as its first clause.

402. Calculate the voting score of the candidate start sentence as the start sentence of the optimal key content from the total number of first target candidate key contents among all candidate key contents and the start probability of the start sentence in each first target candidate key content.

For ease of understanding, take N = 5 as an example, so that 5 key content computation models output 5 candidate key contents. For ease of description, each start sentence is represented by its clause number in the answer text. With the candidate key contents denoted a, b, c, d, and e, the start sentence of each candidate key content and its start probability are shown in Table 2 below:

Table 2

As Table 2 shows, the start sentences contained in the 5 candidate key contents are the clauses numbered 2, 3, and 4 in the answer text; that is, the candidate start sentences are clauses 2, 3, and 4. For candidate start sentence 2, the candidate key contents that contain it as their first clause (that is, satisfy the first preset condition) are a and d, which are therefore the first target candidate key contents. The total number of first target candidate key contents is 2; the start probability of the start sentence is 80% in first target candidate key content a and 94% in first target candidate key content d. From these parameters, the voting score of candidate start sentence 2 as the start sentence of the optimal key content can be calculated with the following formula:

$$\mathrm{score}_{\mathrm{start},i} = \mathrm{count}_{\mathrm{start}}(\mathrm{index}=i) + \frac{\mathrm{sum}(p_{\mathrm{start},i})}{\mathrm{count}_{\mathrm{start}}(\mathrm{index}=i)}$$

Here, $\mathrm{score}_{\mathrm{start},i}$ is the voting score of the candidate start sentence with clause number $i$ as the start sentence of the optimal key content, $\mathrm{count}_{\mathrm{start}}(\mathrm{index}=i)$ is the total number of first target candidate key contents, and $\mathrm{sum}(p_{\mathrm{start},i})$ is the sum of the start probabilities of the start sentence over all first target candidate key contents. Note that the start sentence of every first target candidate key content here is the candidate start sentence with clause number $i$.

In the above example, the candidate key contents whose start sentence is candidate start sentence 2 are a and d. Combining the example with the formula, the voting score of candidate start sentence 2 as the start sentence of the optimal key content is $\mathrm{score}_{\mathrm{start},2} = 2 + (0.80 + 0.94)/2 = 2.87$.

Similarly, for candidate start sentence 3, Table 2 shows that the candidate key contents whose start sentence is this candidate start sentence are c and e, which are the first target candidate key contents. The total number of first target candidate key contents is 2; the start probability of the start sentence is 72% in first target candidate key content c and 76% in first target candidate key content e. With these parameters and the formula, the voting score of candidate start sentence 3 as the start sentence of the optimal key content is $\mathrm{score}_{\mathrm{start},3} = 2 + (0.72 + 0.76)/2 = 2.74$. For candidate start sentence 4, the same calculation gives a voting score of $\mathrm{score}_{\mathrm{start},4} = 1.64$.

As described in the above embodiments, when the optimal start sentence of the optimal key content is determined, the candidate start sentence with the highest voting score is selected according to the voting score of each candidate start sentence as the start sentence of the optimal key content. In the above example, the highest voting score is $\mathrm{score}_{\mathrm{start},2} = 2.87$, which belongs to candidate start sentence 2. Therefore, in this example, the clause numbered 2 in the answer text is taken as the optimal start sentence of the optimal key content.
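The start-sentence calculation in this example can be checked with a short Python sketch. The function name `voting_scores` is illustrative. The start probabilities for candidates a, c, d, and e are the ones stated in the text; the value 0.64 for candidate b is not stated directly and is inferred here from the given score of 1.64 for clause 4 (one vote plus its probability).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def voting_scores(votes: List[Tuple[int, float]]) -> Dict[int, float]:
    """Score each clause number as vote count + mean probability of its votes."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for index, prob in votes:
        grouped[index].append(prob)
    return {i: len(p) + sum(p) / len(p) for i, p in grouped.items()}


# (start clause number, start probability) for candidates a, b, c, d, e.
# The 0.64 for candidate b is inferred from the stated score of 1.64.
start_votes = [(2, 0.80), (4, 0.64), (3, 0.72), (2, 0.94), (3, 0.76)]

start_scores = voting_scores(start_votes)
optimal_start = max(start_scores, key=start_scores.get)
```

Adding the count to the mean probability means the number of agreeing models dominates, with the probability acting mainly as a tie-breaker between clauses that received the same number of votes.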

As described in the above embodiments, the voting score of an end sentence in the candidate key contents may be determined from the end probability of that end sentence. Accordingly, before the optimal end sentence of the optimal key content is determined from the voting scores, and based on this principle and the above embodiments, as an optional embodiment, the embodiments of the present invention also provide a way to determine the voting score of an end sentence in the candidate key contents. Referring to Fig. 5, the way includes but is not limited to:

501. For any candidate end sentence among all candidate end sentences, take the candidate key contents that satisfy a second preset condition as the second target candidate key contents. All candidate end sentences are the end sentences contained in all candidate key contents, and the second preset condition is that the candidate key content contains the candidate end sentence and has it as its last clause.

502. Calculate the voting score of the candidate end sentence as the end sentence of the optimal key content from the total number of second target candidate key contents among all candidate key contents and the end probability of the end sentence in each second target candidate key content.

For ease of understanding, again take N = 5 as an example, so that 5 key content computation models output 5 candidate key contents. For ease of description, each end sentence is represented by its clause number in the answer text. With the candidate key contents denoted a, b, c, d, and e, the end sentence of each candidate key content and its end probability are shown in Table 3 below:

Table 3

As Table 3 shows, the end sentences contained in the 5 candidate key contents are the clauses numbered 3, 4, and 5 in the answer text; that is, the candidate end sentences are clauses 3, 4, and 5. For candidate end sentence 4, the candidate key contents that contain it as their last clause (that is, satisfy the second preset condition) are b, d, and e, which are therefore the second target candidate key contents. The total number of second target candidate key contents is 3; the end probability of the end sentence is 64% in second target candidate key content b, 94% in second target candidate key content d, and 76% in second target candidate key content e. From these parameters, the voting score of candidate end sentence 4 as the end sentence of the optimal key content can be calculated with the following formula:

$$\mathrm{score}_{end\text{-}i} = \mathrm{count}_{end}(index=i) + \frac{\mathrm{sum}(p_{end\text{-}i})}{\mathrm{count}_{end}(index=i)}$$

Here, $\mathrm{score}_{end\text{-}i}$ is the voting score of the candidate end statement with clause number i as the end statement of the optimal key content, $\mathrm{count}_{end}(index=i)$ is the total number of second target candidate key contents, and $\mathrm{sum}(p_{end\text{-}i})$ is the sum of the end probabilities of the end statements in all second target candidate key contents. Note that the end statement in each second target candidate key content here is the candidate end statement with clause number i.

As the example above shows, for the candidate end statement with clause number 4, the candidate key contents that take it as their end statement are b, d and e. Substituting the values from the example into the formula gives the voting score of the candidate end statement with clause number 4 as the end statement of the optimal key content: $\mathrm{score}_{end\text{-}4} = 3 + (0.64 + 0.94 + 0.76)/3 = 3.78$.

Similarly, for the candidate end statement with clause number 3, Table 3 shows that the only candidate key content taking it as its end statement is c, which is therefore the second target candidate key content. The total number of second target candidate key contents is 1, and the end probability of the end statement in second target candidate key content c is 72%. From these parameters and the formula above, the voting score of the candidate end statement with clause number 3 as the end statement of the optimal key content is $\mathrm{score}_{end\text{-}3} = 1 + 0.72/1 = 1.72$. For the candidate end statement with clause number 5, the same calculation gives a voting score of $\mathrm{score}_{end\text{-}5} = 1.8$.

As the above embodiment shows, when determining the optimal end statement of the optimal key content, the voting score of each candidate end statement as the end statement of the optimal key content is considered, and the candidate end statement with the largest voting score is selected as the optimal end statement. In the example above, the largest voting score is $\mathrm{score}_{end\text{-}4} = 3.78$, which belongs to the candidate end statement with clause number 4. The candidate end statement with clause number 4 in the reply text can therefore be taken as the optimal end statement of the optimal key content.
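The end-statement voting described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `end_voting_scores` and the data layout are hypothetical, and the end probability of 0.80 for candidate a is an assumption inferred from the stated score of 1.8 for clause 5.

```python
from collections import defaultdict


def end_voting_scores(candidates):
    """Return {clause_number: voting score} for every candidate end statement.

    candidates maps a candidate name to (end clause number, end probability).
    score = count + sum(probabilities) / count, as in the formula above.
    """
    probs_by_end = defaultdict(list)
    for end_index, end_prob in candidates.values():
        probs_by_end[end_index].append(end_prob)
    return {i: len(ps) + sum(ps) / len(ps) for i, ps in probs_by_end.items()}


candidates = {
    "a": (5, 0.80),  # assumed value, inferred from the score of 1.8
    "b": (4, 0.64),
    "c": (3, 0.72),
    "d": (4, 0.94),
    "e": (4, 0.76),
}

scores = end_voting_scores(candidates)
# The candidate end statement with the largest voting score is the optimal one.
best_end = max(scores, key=scores.get)
print({i: round(s, 2) for i, s in scores.items()}, best_end)
```

Because the count term is an integer and the mean probability is at most 1, an end statement chosen by more models always outranks one chosen by fewer; the mean probability only matters among end statements with equal counts.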

In the method provided by this embodiment of the present invention, the voting score of each start statement as the start statement of the optimal key content is determined from the initial probabilities of the start statements in the candidate key contents, and the voting score of each end statement as the end statement of the optimal key content is determined from the end probabilities of the end statements. The optimal start statement and the optimal end statement of the optimal key content are then elected by voting on the basis of these scores. The candidate key contents output by the N key content computation models can thus be fused, which improves the reliability and accuracy of the reply content.

In actual implementation, multiple to-be-selected key content computation models may be trained in advance. When these models are used, the reply content obtained from some of them may be relatively accurate and reliable, while the reply content obtained from others may be less so. The to-be-selected key content computation models trained in advance therefore need to be screened. On this basis, if the "N key content computation models" in the above embodiments are obtained by screening, then before the inquiry text and the reply text matching the inquiry text are input into the N key content computation models to obtain the candidate key content output by each model, the M to-be-selected key content computation models trained in advance may first be screened to obtain the N key content computation models.

Based on the content of the above embodiments, as an optional embodiment, an embodiment of the present invention provides a way of obtaining the N key content computation models by screening. Referring to Fig. 6, it includes but is not limited to:

601. Combine the M to-be-selected key content computation models to obtain several target to-be-selected model sets, where each target to-be-selected model set contains N to-be-selected key content computation models and M is not less than N.

The to-be-selected key content computation models in this embodiment of the present invention may be obtained by training initial models of the same type on the same samples. During training, the parameters of the initial model can be adjusted to obtain multiple to-be-selected key content computation models that have the same function but produce different outputs. Taking an initial model that includes a convolutional neural network or a long short-term memory (LSTM) network as an example, the size and number of the convolution kernels of the convolutional neural network, or the number of hidden-layer nodes of the LSTM network, can be adjusted so that training yields M different to-be-selected key content computation models. Alternatively, M different to-be-selected key content computation models can be obtained by using different training methods, such as dropout or regularization. The two approaches, adjusting parameters and using different training methods, can of course also be combined to obtain M different to-be-selected key content computation models; the embodiment of the present invention does not specifically limit this. For the training process itself, refer to the training of the key content computation model in the above embodiments, which is not repeated here.

Take M = 5, with the models denoted A, B, C, D and E. If N is 3, then 3 to-be-selected key content computation models can be chosen from the 5 in different combinations, yielding several target to-be-selected model sets that each contain 3 to-be-selected key content computation models, for example (A, B, C), (A, B, E) and (B, C, D). Note that the example above combines different to-be-selected key content computation models to obtain the target to-be-selected model sets. In actual implementation, the same to-be-selected key content computation model may also be combined with itself to obtain a target to-be-selected model set, which the embodiment of the present invention does not specifically limit. The resulting target to-be-selected model sets could then be, for example, (A, A, A), (B, B, B) and (C, C, C).
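As a rough illustration of the two ways of combining described above, the sketch below enumerates the sets for M = 5 and N = 3 with Python's standard `itertools` module. It assumes that the order of models within a set does not matter, which the text does not state explicitly; the variable names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement

models = ["A", "B", "C", "D", "E"]  # the M = 5 to-be-selected models
N = 3

# Sets built only from different models, e.g. (A, B, C), (A, B, E), (B, C, D).
distinct_sets = list(combinations(models, N))

# Sets in which the same model may appear more than once, e.g. (A, A, A).
repeated_sets = list(combinations_with_replacement(models, N))

print(len(distinct_sets), len(repeated_sets))
```

Exhaustive enumeration grows quickly with M and N, which is one reason a greedy combination strategy may be preferred in practice.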

602. Input multiple sample test cases into each target to-be-selected model set, and obtain the sample optimal key content that each target to-be-selected model set produces for each sample test case. Each sample test case includes a sample inquiry text and a sample reply text matching the sample inquiry text, and each sample test case has a corresponding key content that is extracted in advance from the sample reply text based on the sample inquiry text.

In this step, consider any sample test case and any target to-be-selected model set, which contains N to-be-selected key content computation models. Inputting the sample test case into the target to-be-selected model set means inputting the sample inquiry text and the matching sample reply text into each of the N to-be-selected key content computation models in the set. As in the above embodiments, once this has been done, the sample optimal key content can be obtained by following the process described above for obtaining the optimal key content.

Note that inputting one sample test case into one target to-be-selected model set yields one sample optimal key content. If there are m sample test cases and n target to-be-selected model sets, inputting every sample test case into every target to-be-selected model set yields m*n sample optimal key contents.

603. Compare the sample optimal key content that each target to-be-selected model set produces for each sample test case with the key content corresponding to that sample test case, determine the information acquisition accuracy of each target to-be-selected model set from the comparison results, and select the target to-be-selected model set with the highest information acquisition accuracy as the N key content computation models.

For a sample test case, once the sample inquiry text and the matching sample reply text are fixed, the reply content answering the question posed by the sample inquiry text, that is, the key content in this embodiment of the present invention, can be determined in advance. In this step, therefore, for any sample test case and any target to-be-selected model set, comparing the sample optimal key content that the set produces for the sample test case with the key content corresponding to the sample test case determines whether the output of the set for that input is correct or wrong.

Specifically, if the sample optimal key content output by the target to-be-selected model set is consistent with the key content corresponding to the sample test case, the output of the set for that sample test case is determined to be correct. If the two are inconsistent, the output of the set for that sample test case is determined to be wrong.

It follows that for any group of target model set to be selected, a test sample example is often input to this group of target Model set to be selected can be considered a test process to this group of target model set to be selected.And by above-mentioned comparison process, It can judge the output result correctness of each test process.It therefore, will be upper for any group of target model set to be selected The multiple test sample examples stated in step are separately input into this group of target model set to be selected, you being determined according to output result should The acquisition of information accuracy rate of group target model set to be selected.For example, if the number of test sample example is 100, and by 100 samples After this test case is separately input into this group of target model set to be selected, the number for exporting correct result is 72 times, you can determining should The acquisition of information accuracy rate of group target model set to be selected is 72%.Similarly, in several groups target model set to be selected Every group of target model set to be selected can determine the corresponding acquisition of information of every group of target model set to be selected as procedure described above Accuracy rate.

After the information acquisition accuracy of each target to-be-selected model set has been determined, the set with the highest information acquisition accuracy can be selected, and the N to-be-selected key content computation models in that set are used as the N key content computation models.
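Steps 602 and 603 can be sketched as below. This is a minimal sketch under stated assumptions: each target to-be-selected model set is represented by a hypothetical callable that maps (sample inquiry text, sample reply text) to a sample optimal key content, and "consistent" is taken to mean exact equality. The toy data and the names `acquisition_accuracy` and `select_best_model_set` are illustrative only.

```python
def acquisition_accuracy(predict, test_cases):
    """Fraction of sample test cases whose output equals the labelled key content."""
    correct = sum(1 for inquiry, reply, expected in test_cases
                  if predict(inquiry, reply) == expected)
    return correct / len(test_cases)


def select_best_model_set(model_sets, test_cases):
    """Return the name of the set with the highest accuracy, plus all accuracies."""
    accuracies = {name: acquisition_accuracy(predict, test_cases)
                  for name, predict in model_sets.items()}
    best_name = max(accuracies, key=accuracies.get)
    return best_name, accuracies


# Toy sample test cases: (sample inquiry text, sample reply text, key content).
test_cases = [("q1", "r1", "k1"), ("q2", "r2", "k2"),
              ("q3", "r3", "k3"), ("q4", "r4", "k4")]

# Two stand-in model sets with fixed behaviour, for illustration only.
lookup = {"q1": "k1", "q2": "k2", "q3": "k3", "q4": "wrong"}
model_sets = {
    "set_1": lambda inquiry, reply: "k1",
    "set_2": lambda inquiry, reply: lookup[inquiry],
}

best_name, accuracies = select_best_model_set(model_sets, test_cases)
print(best_name, accuracies)
```

In a real system the callable would run the N models of the set and fuse their candidate key contents by voting before the comparison is made.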

In the method provided by this embodiment of the present invention, the target to-be-selected model set with the highest information acquisition accuracy is selected as the N key content computation models, based on the information acquisition accuracy of each target to-be-selected model set, which helps ensure the reliability and accuracy of the reply content.

Based on the content of the above embodiments, as an optional embodiment, the embodiment of the present invention does not specifically limit the way in which the M to-be-selected key content computation models are combined to obtain the several target to-be-selected model sets. It includes but is not limited to: based on a greedy algorithm, combining the M to-be-selected key content computation models one by one according to the information acquisition accuracy of the to-be-selected model sets obtained after each combination, until the several target to-be-selected model sets are obtained.

A greedy algorithm makes a series of locally optimal (greedy) choices with the aim of reaching an overall optimal solution to the problem. As the above embodiments show, each target to-be-selected model set contains N to-be-selected key content computation models. For ease of understanding, the process of combining A, B, C, D and E into several target to-be-selected model sets is illustrated below, taking the M to-be-selected key content computation models to be A, B, C, D and E, N to be 3, and A to be the starting model of the combination.

Since A is the starting model, A can first be combined with each of A, B, C, D and E, giving (A, A), (A, B), (A, C), (A, D) and (A, E). Because these five sets are themselves combinations of to-be-selected key content computation models, the information acquisition accuracy of each of the 5 sets can be calculated in the same way as for a target to-be-selected model set in the above embodiments. Since the greedy algorithm proceeds by locally optimal choices, the set with the highest information acquisition accuracy is selected from the 5 sets to continue the combination process.

Suppose (A, B) is the set with the highest information acquisition accuracy. (A, B) can then be combined with each of A, B, C, D and E, giving (A, B, A), (A, B, B), (A, B, C), (A, B, D) and (A, B, E). The information acquisition accuracy of each of these sets is calculated in the same way, and the set with the highest information acquisition accuracy among the 5 is the target to-be-selected model set obtained with A as the starting model.

Similarly, taking B, C, D and E in turn as the starting model yields the target to-be-selected model sets that start from B, C, D and E respectively. All the target to-be-selected model sets obtained in this way constitute the several target to-be-selected model sets.
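The greedy combination walked through above can be sketched as follows. This is an illustrative sketch only: `greedy_model_sets` is a hypothetical name, and `toy_accuracy` is an invented stand-in for the information acquisition accuracy, which in practice would be measured on the sample test cases.

```python
def greedy_model_sets(models, n, accuracy):
    """For each starting model, grow a set to size n by locally optimal choices."""
    results = []
    for start in models:
        current = (start,)
        while len(current) < n:
            # Try extending the current set with every model, repeats allowed,
            # and keep the extension with the highest accuracy.
            extensions = [current + (m,) for m in models]
            current = max(extensions, key=accuracy)
        results.append(current)
    return results


# Invented per-model quality values; distinct models in a set add up.
weights = {"A": 0.5, "B": 0.9, "C": 0.7, "D": 0.6, "E": 0.4}


def toy_accuracy(model_set):
    """Stand-in accuracy: rewards sets made of strong, distinct models."""
    return sum(weights[m] for m in sorted(set(model_set))) / 3


target_sets = greedy_model_sets(["A", "B", "C", "D", "E"], 3, toy_accuracy)
print(target_sets)
```

With this stand-in, starting from A the sketch picks (A, B) and then (A, B, C), mirroring the shape of the walk-through; each starting model needs only (N - 1) * M accuracy evaluations instead of a full enumeration.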

In the method provided by this embodiment of the present invention, the M to-be-selected key content computation models are combined one by one, based on a greedy algorithm and according to the information acquisition accuracy of the to-be-selected model sets obtained after each combination, until the several target to-be-selected model sets are obtained. Because the M to-be-selected key content computation models are combined by locally optimal choices to obtain the several target to-be-selected model sets, the reliability and accuracy of the reply content can be ensured.

Note that all of the above optional embodiments may be combined in any manner to form further optional embodiments of the present invention, which are not described one by one here.

Based on the content of the above embodiments, an embodiment of the present invention provides an information acquisition device. The device is used to perform the information acquisition method provided in the above method embodiments. Referring to Fig. 7, the device includes:

A candidate key content acquisition module 701, configured to input the inquiry text and the reply text matching the inquiry text into each of N key content computation models, and to obtain the candidate key content output by each key content computation model; where each key content computation model is obtained by training on sample inquiry texts, sample reply texts, and the sample key contents in the sample reply texts, each candidate key content is extracted from the reply text, and N is a positive integer greater than 1;

An optimal key content acquisition module 702, configured to obtain the optimal key content according to the candidate key contents, and to take the optimal key content as the answer corresponding to the inquiry text.

As an optional embodiment, the candidate key content acquisition module 701 includes:

A probability acquisition unit, configured to obtain, for the candidate key content output by any key content computation model, the initial probability of each clause in the reply text being the start statement of the candidate key content, and the end probability of each clause in the reply text being the end statement of the candidate key content;

A statement acquisition unit, configured to select from the reply text, according to the initial probability of each clause, the clause with the highest initial probability as the start statement of the candidate key content, and to select from the reply text, according to the end probability of each clause, the clause with the highest end probability as the end statement of the candidate key content;

A candidate key content determination unit, configured to take the start statement, the end statement, and the clauses of the reply text located between the start statement and the end statement as the candidate key content.
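What the three units above do for a single key content computation model can be sketched as follows. This is a minimal sketch with invented probabilities: the function name `extract_candidate` is hypothetical, clause positions are 0-based here, and the text does not say what happens if the chosen end statement precedes the start statement, so the sketch simply returns an empty span in that case.

```python
def extract_candidate(clauses, start_probs, end_probs):
    """Pick the most probable start and end clauses and return the span between them."""
    positions = range(len(clauses))
    start = max(positions, key=lambda i: start_probs[i])
    end = max(positions, key=lambda i: end_probs[i])
    # Slice is inclusive of both the start statement and the end statement.
    # If end < start the slice is empty (handling not specified by the source).
    return start, end, clauses[start:end + 1]


clauses = ["c1", "c2", "c3", "c4", "c5"]
start_probs = [0.10, 0.60, 0.20, 0.05, 0.05]  # invented initial probabilities
end_probs = [0.05, 0.10, 0.15, 0.60, 0.10]    # invented end probabilities

start, end, candidate = extract_candidate(clauses, start_probs, end_probs)
print(start, end, candidate)
```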

As an optional embodiment, the optimal key content acquisition module 702 includes:

An optimal start statement determination unit, configured to determine, for each start statement among the start statements contained in all candidate key contents, the optimal start statement of the optimal key content according to the voting score of each start statement as the start statement of the optimal key content;

An optimal end statement determination unit, configured to determine, for each end statement among the end statements contained in all candidate key contents, the optimal end statement of the optimal key content according to the voting score of each end statement as the end statement of the optimal key content;

An optimal key content determination unit, configured to take the optimal start statement, the optimal end statement, and the clauses of the reply text located between the optimal start statement and the optimal end statement as the optimal key content.
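Put together, the three units above could operate roughly as in the following sketch. It assumes that the start-statement voting score has the same form as the end-statement score (count plus mean probability); all start-statement numbers are invented for illustration, clause numbers are 1-based, and the names `voting_scores` and `fuse` are hypothetical.

```python
from collections import defaultdict


def voting_scores(votes):
    """votes: list of (clause number, probability); score = count + mean probability."""
    grouped = defaultdict(list)
    for clause_no, prob in votes:
        grouped[clause_no].append(prob)
    return {no: len(ps) + sum(ps) / len(ps) for no, ps in grouped.items()}


def fuse(clauses, candidates):
    """candidates: list of (start_no, start_prob, end_no, end_prob) per model."""
    start_scores = voting_scores([(s, sp) for s, sp, _, _ in candidates])
    end_scores = voting_scores([(e, ep) for _, _, e, ep in candidates])
    best_start = max(start_scores, key=start_scores.get)
    best_end = max(end_scores, key=end_scores.get)
    # Clause numbers are 1-based, so shift the start for Python slicing.
    return best_start, best_end, clauses[best_start - 1:best_end]


clauses = ["s1", "s2", "s3", "s4", "s5", "s6"]
candidates = [
    (2, 0.70, 5, 0.80),  # start values are invented
    (2, 0.90, 4, 0.64),
    (1, 0.60, 3, 0.72),
    (2, 0.80, 4, 0.94),
    (3, 0.50, 4, 0.76),
]

best_start, best_end, optimal = fuse(clauses, candidates)
print(best_start, best_end, optimal)
```

Voting on the start and end statements separately means the fused span need not coincide with the output of any single model.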

As an optional embodiment, the optimal key content acquisition module 702 further includes:

A first target candidate key content acquisition unit, configured to take, for any candidate start statement among all candidate start statements, the candidate key contents that satisfy a first preset condition among all candidate key contents as first target candidate key contents, where all candidate start statements are the start statements contained in all candidate key contents, and the first preset condition is that the candidate key content contains the candidate start statement and has the candidate start statement as its first clause;

A start statement voting score acquisition unit, configured to calculate the voting score of the candidate start statement as the start statement of the optimal key content, according to the total number of first target candidate key contents among all candidate key contents and the initial probability of the start statement in each first target candidate key content.

A second target candidate key content acquisition unit, configured to take, for any candidate end statement among all candidate end statements, the candidate key contents that satisfy a second preset condition among all candidate key contents as second target candidate key contents, where all candidate end statements are the end statements contained in all candidate key contents, and the second preset condition is that the candidate key content contains the candidate end statement and has the candidate end statement as its last clause;

An end statement voting score acquisition unit, configured to calculate the voting score of the candidate end statement as the end statement of the optimal key content, according to the total number of second target candidate key contents among all candidate key contents and the end probability of the end statement in each second target candidate key content.

As an optional embodiment, the device further includes:

A target to-be-selected model set acquisition module, configured to combine M to-be-selected key content computation models to obtain several target to-be-selected model sets, where each target to-be-selected model set contains N to-be-selected key content computation models and M is not less than N;

A sample optimal key content acquisition module, configured to input multiple sample test cases into each target to-be-selected model set, and to obtain the sample optimal key content that each target to-be-selected model set produces for each sample test case; where each sample test case includes a sample inquiry text and a sample reply text matching the sample inquiry text, and each sample test case has a corresponding key content that is extracted in advance from the sample reply text based on the sample inquiry text;

A key content computation model acquisition module, configured to compare the sample optimal key content that each target to-be-selected model set produces for each sample test case with the key content corresponding to that sample test case, to determine the information acquisition accuracy of each target to-be-selected model set from the comparison results, and to select the target to-be-selected model set with the highest information acquisition accuracy as the N key content computation models.

As an optional embodiment, the sample optimal key content acquisition module is configured to combine, based on a greedy algorithm, the M to-be-selected key content computation models one by one according to the information acquisition accuracy of the to-be-selected model sets obtained after each combination, until the several target to-be-selected model sets are obtained.

The device provided by this embodiment of the present invention inputs the inquiry text and the reply text matching the inquiry text into each of N key content computation models, and obtains the candidate key content output by each key content computation model. The optimal key content is then obtained from the candidate key contents and taken as the answer corresponding to the inquiry text. Because the outputs of the N key content computation models can be fused, the problem that a single model is biased during training and cannot fully fit the overall training data distribution is effectively avoided. This improves the reliability and accuracy of the reply content, and improves the user's experience when interacting with the equipment through questions and answers.

Secondly, the initial probability of each clause in the reply text being the start statement of a candidate key content, and its end probability of being the end statement, are obtained, and the candidate key content is then determined from the initial probabilities and the end probabilities. This helps improve the computational precision of the key content computation model, and in turn the reliability and accuracy of the candidate key content.

Thirdly, the optimal start statement and the optimal end statement are determined from the voting scores of the start statements and the voting scores of the end statements, and the optimal key content is determined from them. The candidate key contents output by the N key content computation models are thereby fused, which improves the reliability and accuracy of the reply content.

Furthermore, the voting score of each start statement as the start statement of the optimal key content is determined from the initial probabilities of the start statements in the candidate key contents, and the voting score of each end statement as the end statement of the optimal key content is determined from the end probabilities of the end statements. The optimal start statement and the optimal end statement of the optimal key content are then elected by voting on the basis of these scores. The candidate key contents output by the N key content computation models can thus be fused, which improves the reliability and accuracy of the reply content.

In addition, the target to-be-selected model set with the highest information acquisition accuracy is selected as the N key content computation models, based on the information acquisition accuracy of each target to-be-selected model set, which helps ensure the reliability and accuracy of the reply content.

Finally, the M to-be-selected key content computation models are combined one by one, based on a greedy algorithm and according to the information acquisition accuracy of the to-be-selected model sets obtained after each combination, until the several target to-be-selected model sets are obtained. Because the M to-be-selected key content computation models are combined by locally optimal choices to obtain the several target to-be-selected model sets, the reliability and accuracy of the reply content can be ensured.

An embodiment of the present invention provides an information acquisition apparatus. Referring to Fig. 8, the apparatus includes a processor 801, a memory 802 and a bus 803.

The processor 801 and the memory 802 communicate with each other through the bus 803. The processor 801 is configured to call program instructions in the memory 802 to perform the information acquisition method provided by the above embodiments, for example including: inputting the inquiry text and the reply text matching the inquiry text into each of N key content computation models, and obtaining the candidate key content output by each key content computation model; where each key content computation model is obtained by training on sample inquiry texts, sample reply texts, and the sample key contents in the sample reply texts, each candidate key content is extracted from the reply text, and N is a positive integer greater than 1; and obtaining the optimal key content according to the candidate key contents, and taking the optimal key content as the answer corresponding to the inquiry text.

An embodiment of the present invention provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to perform the information acquisition method provided by the above embodiments, for example including: inputting the inquiry text and the reply text matching the inquiry text into each of N key content computation models, and obtaining the candidate key content output by each key content computation model; where each key content computation model is obtained by training on sample inquiry texts, sample reply texts, and the sample key contents in the sample reply texts, each candidate key content is extracted from the reply text, and N is a positive integer greater than 1; and obtaining the optimal key content according to the candidate key contents, and taking the optimal key content as the answer corresponding to the inquiry text.

A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

The embodiments of the information acquisition apparatus and the like described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

From the description of the above embodiments, a person skilled in the art will clearly understand that each embodiment may be implemented by software together with a necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.

Finally, the methods of the present application are merely preferred embodiments and are not intended to limit the protection scope of the embodiments of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.

Claims

1. An information acquisition method, characterized by comprising:

inputting an inquiry text and a reply text matching the inquiry text into each of N key content computation models, and obtaining the candidate key content output by each key content computation model; wherein the key content computation model is obtained by training on sample inquiry texts, sample reply texts, and the sample key contents in the sample reply texts, each candidate key content is extracted from the reply text, and N is a positive integer greater than 1;

obtaining an optimal key content according to the candidate key contents, and taking the optimal key content as the answer corresponding to the inquiry text.

2. The method according to claim 1, characterized in that inputting the inquiry text and the answer text matching the inquiry text into each of the N key content computation models, and obtaining the candidate key content output by each key content computation model, comprises:

for the candidate key content output by any key content computation model, obtaining a start probability of each clause in the answer text serving as the start statement of the candidate key content, and obtaining an end probability of each clause in the answer text serving as the end statement of the candidate key content;

according to the start probability corresponding to each clause, selecting from the answer text the clause with the maximum start probability as the start statement of the candidate key content, and according to the end probability corresponding to each clause, selecting from the answer text the clause with the maximum end probability as the end statement of the candidate key content;

using the start statement, the end statement, and the clauses between the start statement and the end statement in the answer text as the candidate key content.
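The selection step of claim 2 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `extract_candidate` and the representation of the answer text as a list of clauses with per-clause probabilities are assumptions, and the claim does not say what happens when the chosen end statement precedes the chosen start statement, so the sketch simply returns an empty candidate in that case.

```python
from typing import List, Sequence, Tuple


def extract_candidate(
    clauses: Sequence[str],
    start_probs: Sequence[float],
    end_probs: Sequence[float],
) -> Tuple[int, int, List[str]]:
    """Return (start index, end index, clauses of the candidate key content)."""
    if not clauses:
        raise ValueError("the answer text must contain at least one clause")
    if not (len(clauses) == len(start_probs) == len(end_probs)):
        raise ValueError("one start and one end probability is needed per clause")

    indices = range(len(clauses))
    # Clause with the maximum start probability becomes the start statement.
    start = max(indices, key=lambda i: start_probs[i])
    # Clause with the maximum end probability becomes the end statement.
    end = max(indices, key=lambda i: end_probs[i])

    # Assumption (not stated in the claim): if end < start the slice is empty,
    # i.e. this model contributes no usable candidate.
    return start, end, list(clauses[start:end + 1])
```

Each of the N models would produce its own probability vectors, so calling this function once per model yields the N candidate key contents that are later fused.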

3. The method according to claim 2, characterized in that obtaining the optimal key content according to each candidate key content comprises:

for each start statement among the start statements included in all candidate key contents, determining the optimal start statement of the optimal key content according to the vote score of each start statement serving as the start statement of the optimal key content;

for each end statement among the end statements included in all candidate key contents, determining the optimal end statement of the optimal key content according to the vote score of each end statement serving as the end statement of the optimal key content;

using the optimal start statement, the optimal end statement, and the clauses between the optimal start statement and the optimal end statement in the answer text as the optimal key content.

4. The method according to claim 3, characterized in that, before determining the optimal start statement of the optimal key content according to the vote score of each start statement serving as the start statement of the optimal key content, the method further comprises:

for any candidate start statement among all candidate start statements, taking each candidate key content, among all candidate key contents, that satisfies a first preset condition as a first target candidate key content, wherein all candidate start statements are the start statements included in all candidate key contents, and the first preset condition is that the candidate key content includes the candidate start statement and the candidate start statement is its first clause;

calculating the vote score of the candidate start statement serving as the start statement of the optimal key content according to the total number of first target candidate key contents among all candidate key contents and the start probability corresponding to the start statement in each first target candidate key content.

5. The method according to claim 3 or 4, characterized in that, before determining the optimal end statement of the optimal key content according to the vote score of each end statement serving as the end statement of the optimal key content, the method further comprises:

for any candidate end statement among all candidate end statements, taking each candidate key content, among all candidate key contents, that satisfies a second preset condition as a second target candidate key content, wherein all candidate end statements are the end statements included in all candidate key contents, and the second preset condition is that the candidate key content includes the candidate end statement and the candidate end statement is its last clause;

calculating the vote score of the candidate end statement serving as the end statement of the optimal key content according to the total number of second target candidate key contents among all candidate key contents and the end probability corresponding to the end statement in each second target candidate key content.
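Claims 3 to 5 say that the vote score depends on how many candidates begin (or end) with a given clause and on the corresponding probabilities, but this section does not fix the formula. The sketch below assumes one plausible reading, namely that the vote score is the sum of those probabilities, so that it grows with both the count and the confidence. The `Candidate` type and the `fuse_candidates` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    start: int         # index of the start statement in the answer text
    start_prob: float  # start probability the model assigned to that clause
    end: int           # index of the end statement in the answer text
    end_prob: float    # end probability the model assigned to that clause


def fuse_candidates(candidates: List[Candidate]) -> Tuple[int, int]:
    """Return (optimal start index, optimal end index) by probability-weighted voting."""
    if not candidates:
        raise ValueError("at least one candidate key content is required")

    start_votes: Dict[int, float] = defaultdict(float)
    end_votes: Dict[int, float] = defaultdict(float)
    for c in candidates:
        # Assumed vote score: sum of probabilities over all candidates that
        # share the same first (or last) clause.
        start_votes[c.start] += c.start_prob
        end_votes[c.end] += c.end_prob

    best_start = max(start_votes, key=lambda i: start_votes[i])
    best_end = max(end_votes, key=lambda i: end_votes[i])
    return best_start, best_end
```

With this choice, a clause selected by two moderately confident models can outvote a clause selected by one highly confident model, which is the intended effect of fusing N outputs. Other combinations of count and probability would fit the claim wording equally well.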

6. The method according to claim 1, characterized in that, before inputting the inquiry text and the answer text matching the inquiry text into the N key content computation models and obtaining the candidate key content output by each key content computation model, the method further comprises:

combining M to-be-selected key content computation models to obtain several target to-be-selected model sets, each target to-be-selected model set including N to-be-selected key content computation models, M being not less than N;

inputting multiple test samples into each target to-be-selected model set, and obtaining the sample optimal key content produced by each target to-be-selected model set when each test sample is used as input; wherein each test sample includes a sample inquiry text and a sample answer text matching the sample inquiry text, each test sample has corresponding key content, and the key content corresponding to each test sample is extracted in advance from the sample answer text based on the sample inquiry text;

comparing the sample optimal key content produced by each target to-be-selected model set for each test sample with the key content corresponding to that test sample, determining the information acquisition accuracy of each target to-be-selected model set according to the comparison results, and selecting the target to-be-selected model set with the highest information acquisition accuracy as the N key content computation models.
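The selection procedure of claim 6 can be sketched as an exhaustive search over N-element subsets of the M models. The sketch makes several simplifying assumptions: each model is a callable that maps a test sample to its predicted key content, the per-set fusion is replaced by a plain majority vote (`ensemble_answer`) standing in for the voting of the earlier claims, and accuracy is measured by exact match against the pre-extracted key content. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Hashable, List, Sequence, Tuple

Model = Callable[[str], Hashable]          # test sample -> predicted key content
TestSet = Sequence[Tuple[str, Hashable]]   # (test sample, reference key content)


def ensemble_answer(models: Sequence[Model], sample: str) -> Hashable:
    """Stand-in fusion: majority vote; ties go to the earliest model's answer."""
    votes = Counter(m(sample) for m in models)
    return votes.most_common(1)[0][0]


def accuracy(models: Sequence[Model], test_set: TestSet) -> float:
    """Fraction of test samples whose fused output equals the reference."""
    hits = sum(ensemble_answer(models, s) == gold for s, gold in test_set)
    return hits / len(test_set)


def select_best_subset(models: List[Model], n: int, test_set: TestSet) -> Tuple[int, ...]:
    """Return the indices of the N-model set with the highest accuracy."""
    if not 1 < n <= len(models):
        raise ValueError("N must be greater than 1 and not greater than M")
    if not test_set:
        raise ValueError("at least one test sample is required")
    return max(
        combinations(range(len(models)), n),
        key=lambda idx: accuracy([models[i] for i in idx], test_set),
    )
```

The number of subsets grows combinatorially with M, so this exhaustive form is only practical for small M; the one-by-one combination of claim 7 addresses that cost.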

7. The method according to claim 6, characterized in that combining the M to-be-selected key content computation models to obtain several target to-be-selected model sets comprises:

combining the M to-be-selected key content computation models one by one according to the information acquisition accuracy of the to-be-selected model set obtained after each combination, until the several target to-be-selected model sets are obtained.
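Claim 7 does not spell out how the one-by-one combination proceeds. One possible reading is a beam search: model sets are grown one model at a time, and after each step only the highest-scoring sets are kept. The sketch below follows that reading; `beam_combine`, the `beam_width` parameter, and the `score` callable (which would return the information acquisition accuracy of a model set) are assumptions introduced for illustration.

```python
from typing import Callable, List, Tuple

ModelSet = Tuple[int, ...]  # sorted indices of the models in a set


def beam_combine(
    m: int,
    n: int,
    score: Callable[[ModelSet], float],
    beam_width: int = 2,
) -> List[ModelSet]:
    """Grow model sets one model at a time, keeping the best `beam_width` sets."""
    if not 1 <= n <= m:
        raise ValueError("N must be between 1 and M")
    if beam_width < 1:
        raise ValueError("beam_width must be at least 1")

    beam: List[ModelSet] = [()]
    for _ in range(n):
        # Extend every kept set by each model it does not yet contain.
        expanded = {
            tuple(sorted(s + (i,)))
            for s in beam
            for i in range(m)
            if i not in s
        }
        # Keep the highest-scoring sets; ties are broken by index order.
        beam = sorted(expanded, key=lambda s: (-score(s), s))[:beam_width]
    return beam
```

The returned list plays the role of the several target to-be-selected model sets, each containing N models. Only about `beam_width * M` sets are scored per step, compared with every N-element subset in the exhaustive approach.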

8. An information acquisition device, characterized by comprising:

a candidate key content acquisition module, configured to input an inquiry text and an answer text matching the inquiry text into each of N key content computation models, and obtain the candidate key content output by each key content computation model; wherein each key content computation model is obtained by training on a sample inquiry text, a sample answer text, and the sample key content in the sample answer text, each candidate key content is extracted from the answer text, and N is a positive integer greater than 1;

an optimal key content acquisition module, configured to obtain optimal key content according to each candidate key content, and use the optimal key content as the answer corresponding to the inquiry text.

9. An information acquisition apparatus, characterized by comprising:

at least one processor; and

at least one memory communicatively connected to the processor, wherein:

the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to execute the method according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the method according to any one of claims 1 to 7.