CN106991161A

CN106991161A - Method for automatically generating answers to open-ended questions

Info

Publication number: CN106991161A
Application number: CN201710205299.8A
Authority: CN
Inventors: 曹欢欢; 罗立新
Original assignee: Beijing ByteDance Technology Co Ltd
Current assignee: Beijing Douyin Information Service Co Ltd
Priority date: 2017-03-31
Filing date: 2017-03-31
Publication date: 2017-07-28
Anticipated expiration: 2037-03-31
Also published as: CN106991161B

Abstract

The invention provides a method for automatically generating answers to open-ended questions, comprising: obtaining a content library, wherein each item in the content library has at least one of the attributes item source, item content and item title; establishing an answer rule and obtaining from the content library the items that satisfy the answer rule so as to build an answer library, wherein the answer rule comprises a screening condition corresponding to at least one of the item source, item content and item title; performing word segmentation on an input question item and finding, in the answer library, the target items that contain all of the segmented words; and calculating the comprehensive relevance between the question item and each target item and outputting the target item with the highest comprehensive relevance. The invention enables a network application to reply automatically to open-ended questions.

Description

A method for automatically generating answers to open-ended questions

Technical field

The present invention relates to the field of computer application technology, and in particular to a method for automatically generating answers to open-ended questions.

Background

With the development of the Internet, question-and-answer (Q&A) communities have become a popular kind of Internet product, for example Zhihu and Toutiao Q&A in China and Quora in the United States. Users can pose all kinds of questions in the community, and other users who see a question in a field they are familiar with can post their own answers. These answers are seen by every user who browses the question and can be upvoted or downvoted, and the system ranks well-received answers near the top. One challenge for this kind of community is that, as the number of users grows, more and more questions receive little attention from other users and go unanswered, which greatly dampens the enthusiasm of the askers. Some communities provide a function that automatically invites answerers: based on users' answer histories, suitable users are automatically invited to answer a question. However, this causes active answerers to receive too many invitations, so that they cannot answer in time while maintaining answer quality. The problem therefore still has no effective solution.

In terms of automatic answering, existing technology focuses mainly on generating answers to closed questions, such as "Which president of the United States was Obama?". Some existing systems typically use NLP techniques to convert the question into a structured query language and return the answer by querying a knowledge graph built in advance. Existing automatic answer generation techniques, however, cannot automatically generate answers to open-ended questions, such as "What do you think of LeEco's ecosystem strategy?". The asker of this kind of question does not need a factual answer but an analysis of the question and a statement of viewpoints, so as to exchange ideas. Current online Q&A communities cannot reply automatically to such open-ended questions and do not provide a function for automatically generating answers to them. The lack of this function wastes a large amount of network resources: network resources cannot be integrated, adaptive question answering cannot be realized, and a large number of user questions receive no timely feedback or resolution.

Summary of the invention

The technical problem solved by the technical solution of the present invention is how to provide automatic network replies to open-ended questions in network applications.

To solve the above technical problem, the technical solution of the present invention provides a method for automatically generating answers to open-ended questions, comprising:

obtaining a content library, wherein each item in the content library has at least one of the attributes item source, item content and item title;

establishing an answer rule and obtaining from the content library the items that satisfy the answer rule so as to build an answer library, wherein the answer rule comprises a screening condition corresponding to at least one of the item source, item content and item title;

performing word segmentation on an input question item and finding, in the answer library, the target items that contain all of the segmented words;

calculating the comprehensive relevance between the question item and each target item, and outputting the target item with the highest comprehensive relevance.

Optionally, obtaining the content library comprises: establishing the items in the content library from documents or videos that are provided internally or crawled externally.

Optionally, establishing the answer rule comprises:

screening, from the content library, the items that satisfy at least one of the following conditions:

the authority of the item source meets the requirement;

the timeliness of the item content meets the requirement;

the item title contains an entity word;

the item title is an interrogative sentence.

Optionally, establishing the answer rule comprises:

manually screening, from the content library, items that satisfy the answer rule and items that do not satisfy the answer rule, to serve as first positive samples and first negative samples respectively;

inputting the attributes of the items, the first positive samples and the first negative samples into an answer rule establishment model, and training the answer rule establishment model so as to establish the answer rule.

Optionally, training the answer rule establishment model so as to establish the answer rule comprises:

obtaining first true data of the first positive samples, and second true data of the first negative samples, with respect to the item attributes;

using the answer rule establishment model to predict first prediction data of the first positive samples, and second prediction data of the first negative samples, with respect to the item attributes;

adjusting the model parameters of the answer rule establishment model and, when the first prediction data are consistent with the first true data and the second prediction data are consistent with the second true data, outputting the model parameters and the corresponding answer rule establishment model.

Optionally, the answer rule establishment model is based on a machine learning algorithm.

Optionally, the machine learning algorithm is one of a naive Bayes algorithm, a GBDT algorithm, an SVM algorithm and an RNN algorithm.

Optionally, performing word segmentation on the input question item and finding, in the answer library, the target items that contain all of the segmented words comprises:

segmenting the input question item into an ordered sequence of individual words;

if an item title in the answer library contains every word in the sequence, selecting the item with that title as a target item.

Optionally, calculating the comprehensive relevance between the question item and the target item comprises:

calculating single relevance values between the question item and the target item using several relevance functions respectively;

assigning a weight to each calculated single relevance value, and summing the products of each weight and its corresponding single relevance value to obtain the comprehensive relevance.
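The weighted accumulation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `comprehensive_relevance` and the example weights are hypothetical, and the specification does not prescribe how the weights are chosen.

```python
from typing import Sequence


def comprehensive_relevance(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of single relevance values (one weight per value)."""
    if len(scores) != len(weights):
        raise ValueError("each single relevance value needs exactly one weight")
    return sum(w * s for w, s in zip(weights, scores))


# Three single relevance values combined with hypothetical weights.
example_score = comprehensive_relevance([0.8, 0.5, 0.6], [0.5, 0.2, 0.3])
```

The target item with the largest such score among all target items would then be the one selected for output.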

Optionally, calculating the single relevance values between the question item and the target item using several relevance functions comprises performing at least one of the following calculations:

performing word segmentation on the target item title and on the question item, obtaining the numerical vector of each word with the word2vec tool, adding the word vectors to obtain a semantic vector of the item title and a semantic vector of the question item, and finally calculating the cosine distance between the two vectors to obtain a first single relevance;

performing word segmentation on the target item title and on the question item, and calculating the ratio of the number of words shared by the two to the number of all words of the two, to obtain a second single relevance;

obtaining the keywords of the target item title, obtaining the numerical vector of each keyword with the word2vec tool, and adding these vectors to form the semantic vector of the target item title; performing word segmentation on the question item, obtaining the numerical vector of each word with the word2vec tool and adding them to form the semantic vector of the question item; and finally calculating the cosine distance between the semantic vector of the target item title and that of the question item, to obtain a third single relevance.
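A minimal sketch of two of these single relevance functions follows. Several things here are assumptions rather than part of the specification: the toy `embeddings` dictionary stands in for vectors produced by a trained word2vec model, the cosine measure is computed as a similarity (a higher value means more related), and the shared-word ratio is interpreted as shared words divided by the union of distinct words.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sum_vectors(words: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Add the vectors of all known words; words without a vector are skipped."""
    total = [0.0] * dim
    for word in words:
        vec = embeddings.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def shared_word_ratio(title_words: Sequence[str], question_words: Sequence[str]) -> float:
    """Shared distinct words divided by all distinct words of the two texts."""
    a, b = set(title_words), set(question_words)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical two-dimensional vectors, for illustration only.
embeddings = {"phone": [1.0, 0.0], "mobile": [1.0, 0.0], "company": [0.0, 1.0]}
title_vec = sum_vectors(["mobile", "phone", "company"], embeddings, dim=2)
question_vec = sum_vectors(["phone", "company"], embeddings, dim=2)
first_relevance = cosine_similarity(title_vec, question_vec)
second_relevance = shared_word_ratio(["mobile", "phone", "company"], ["phone", "company"])
```

The third calculation differs from the first only in that the title vector is built from extracted keywords instead of all segmented words, so `sum_vectors` could be reused with a keyword list.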

Optionally, the method for automatically generating answers to open-ended questions further comprises:

manually screening, from the answer library, item titles that match the question item and item titles that do not match the question item, to serve as second positive samples and second negative samples respectively;

splitting the question item, the second positive samples and the second negative samples into single-character sequences, and obtaining a semantic vector of each sequence after processing by an RNN;

training the RNN model so that the cosine distances between the above semantic vectors accurately predict the true relevance, so as to obtain RNN model parameters;

wherein calculating the comprehensive relevance between the question item and the target item comprises:

performing word segmentation on the target item title and on the question item to obtain single-character sequences;

using the RNN model, based on the RNN model parameters, to predict the relevance between the single-character sequence of the target item title and that of the question item.

Optionally, the method for automatically generating answers to open-ended questions further comprises: extracting and publishing at least the target content of the target item with the highest comprehensive relevance.

The beneficial effects of the technical solution of the present invention include at least the following:

The technical solution of the present invention solves the prior-art problem that questions in community Q&A cannot be answered, and provides users with a technical solution that effectively and automatically generates answers from the network's own resources. It not only solves the problem of idle network resources but also improves the effectiveness and user enthusiasm of the community Q&A environment.

By establishing answer rules and an item relevance measure, the technical solution of the present invention effectively identifies the best available answer on the network and can be combined with user selection, providing an effective technical solution for obtaining answers to questions and improving the accuracy with which answers match questions.

With a small investment of resources, the technical solution of the present invention can greatly reduce the proportion of questions with zero answers in a Q&A community. The automatically generated answers can also serve as a reference for users who are preparing to answer, lowering the threshold for writing high-quality answers and improving the user experience.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a method for automatically generating answers to open-ended questions provided by the technical solution of the present invention;

Fig. 2 is a schematic flowchart of a method for establishing an answer rule provided by the technical solution of the present invention;

Fig. 3 is a schematic flowchart of a method for training the answer rule establishment model provided by the technical solution of the present invention;

Fig. 4 is a schematic flowchart of a method, provided by the technical solution of the present invention, for performing word segmentation on an input question item and finding, in the answer library, the target items that contain all of the segmented words;

Fig. 5 is a schematic flowchart of a method, provided by the technical solution of the present invention, for calculating the comprehensive relevance between the question item and the target item based on manual rules;

Fig. 6 is a schematic flowchart of another method for automatically generating answers to open-ended questions provided by the technical solution of the present invention;

Fig. 7 is a schematic flowchart of a method, provided by the technical solution of the present invention, for calculating the comprehensive relevance between the question item and the target item based on machine learning;

Fig. 8 is a schematic flowchart of yet another method for automatically generating answers to open-ended questions provided by the technical solution of the present invention;

Fig. 9 is a schematic flowchart of a further method for automatically generating answers to open-ended questions provided by the technical solution of the present invention.

Detailed description

The prior art cannot automatically answer open-ended questions on the network. If there were an effective method for automatically generating answers to open-ended questions, it would greatly alleviate the situation in which many users' questions in Q&A communities cannot be answered in time. Even if the accuracy of the automatic answer generation method is not yet sufficient for replying to the asker directly, the generated answer can be sent as reference text to users whom the system considers capable of answering the original question. After simple editing, the answering user can produce a high-quality answer, which is far more efficient than writing an answer from scratch, and the proportion of questions without any answer falls accordingly. The technical solution of the present invention proposes a method for automatically generating answers that extracts candidate answers, according to the user's question, from a content library containing articles and videos. The candidate answers can either be used to reply to the asker directly or be sent as material to invited potential answerers, reducing their cost of writing an answer.

According to the overall inventive concept of the technical solution of the present invention, a content library is first obtained and an answer library is established from it. Establishing the answer library requires selecting from the content library the items that satisfy the answer rule, i.e. "quasi-answer items"; answers are then matched automatically to the question item posed by the user. That is, the technical solution of the present invention first obtains a content library, whose content may take the form of articles and short videos; next, content suitable for use as answers is mined from the content library to form the answer library; then, for a question given by a user, the most suitable candidate answers are found in the answer library by retrieval. A predetermined confidence function is applied to the candidate answers to calculate their confidence, and candidate answers whose confidence is higher than a threshold are either used to reply directly or sent as material to potential answerers for their reference.

The specific implementation of the technical solution of the present invention is described in detail below with reference to the accompanying drawings.

A method for automatically generating answers to open-ended questions, as shown in Fig. 1, comprises the following steps:

Step S100, obtaining a content library, wherein each item in the content library has at least one of the attributes item source, item content and item title;

Step S101, establishing an answer rule and obtaining from the content library the items that satisfy the answer rule so as to build an answer library, wherein the answer rule comprises a screening condition corresponding to at least one of the item source, item content and item title;

Step S102, performing word segmentation on an input question item and finding, in the answer library, the target items that contain all of the segmented words;

Step S103, calculating the comprehensive relevance between the question item and each target item, and outputting the target item with the highest comprehensive relevance.
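Steps S101 to S103 can be sketched end to end as follows. This is a minimal illustration, not the implementation of the invention: the `Item` class, the function names and the whitespace `tokenize` helper are all hypothetical, and a real system for Chinese text would use a proper word segmenter. The answer rule and the relevance function are passed in as parameters because the specification allows several ways of defining each.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Item:
    source: str
    title: str
    content: str


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def build_answer_library(content_library: Iterable[Item],
                         answer_rule: Callable[[Item], bool]) -> List[Item]:
    # S101: keep only the items that satisfy the answer rule.
    return [item for item in content_library if answer_rule(item)]


def find_target_items(question: str, answer_library: Iterable[Item]) -> List[Item]:
    # S102: a target item's title must contain every word of the question.
    words = set(tokenize(question))
    return [item for item in answer_library if words <= set(tokenize(item.title))]


def best_answer(question: str, targets: Iterable[Item],
                relevance: Callable[[str, Item], float]) -> Optional[Item]:
    # S103: output the target item with the highest comprehensive relevance.
    return max(targets, key=lambda item: relevance(question, item), default=None)
```

`best_answer` returns `None` when no target item was found, which corresponds to the case where no automatic reply can be produced.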

According to step S100, obtaining the content library comprises: establishing the items in the content library from documents or videos that are provided internally or crawled externally. To establish the answer library, a content library containing articles or short videos is needed first. This content can be crawled from the Internet, for example by using software that crawls Internet data to match and capture content on specific Internet sites, thereby building the content library and continuously updating it as the Internet data change. Alternatively, the operator of the Q&A community may also run a content platform and thus have a legitimate channel for obtaining content. For example, Toutiao Q&A is in fact a product of Jinri Toutiao, which itself has tens of millions of licensed short videos and articles that can serve as the source of the answer library; that is, the content library can be obtained directly from the platform's aggregated content and database. In general, the wider the range of fields covered by the content library the better, since a given question is then more likely to find a suitable answer or material. Once the content library is available, the technical solution of the present invention, according to step S101, screens out the content items that may serve as answers to questions by means of a preset answer rule, and thereby establishes the answer library.

In one variant of the technical solution of the present invention, according to step S101, establishing the answer rule comprises: screening, from the content library, the items that satisfy at least one of the following conditions:

the authority of the item source meets the requirement;

the timeliness of the item content meets the requirement;

the item title contains an entity word;

the item title is an interrogative sentence.

This variant provides a relatively simple scheme for establishing the answer rule. In this scheme, the screening condition may combine screening criteria on several item attributes, forming one answer rule defined by a group of screening conditions; alternatively, each screening condition may be a screening criterion on a single item attribute, so that several answer rules are established. The items in the answer library must then satisfy the answer rule defined by the group of screening conditions, or satisfy the several answer rules simultaneously.

An application example of a relatively simple way of establishing the answer rule, in the case where the answer rule is defined by a group of screening conditions, is given below.

The screening process set by this group of answer rules may be:

identifying whether the timeliness of the item content meets the requirement, i.e. whether the item content is knowledge-type content with long-lasting validity;

identifying whether the item title is an interrogative sentence or a declarative sentence;

identifying whether the item title contains an entity word;

identifying whether the length of the item title exceeds a predetermined length;

identifying whether the authority of the item content's source is higher than a predetermined value;

determining whether the identification results match a predetermined screening result and, if so, regarding the screened item as an item that satisfies the answer rule and assigning it to the answer library. For example, only content that meets one of the following conditions is selected as a candidate answer, i.e. the predetermined screening results are set as follows:

Predetermined screening result one: (long-lived content) AND (title contains a clear entity) AND (title contains an interrogative sentence);

Predetermined screening result two: (long-lived content) AND (title contains a clear entity) AND (title is a declarative sentence) AND (source authority is higher than a predetermined value).

For example, one article in the content library has the item title "Where did the mobile phone companies of those years all go? Learn in 3 minutes what they live on now". The timeliness identification model judges it to be long-lived content, and the title contains the entity "mobile phone" and an interrogative sentence, so it matches predetermined screening result one. It can therefore be selected as a candidate answer: it satisfies the answer rule and is added to the answer library. If a user asks a question such as "There were many companies making mobile phones back then; what are they all doing now?" or "What cases are there of companies in the mobile phone industry switching to other businesses?", this article is a valuable answer.
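The two predetermined screening results can be expressed as a small Boolean rule. In the sketch below the `ItemAttributes` class, its field names and the value of `AUTHORITY_THRESHOLD` are hypothetical; the attribute values themselves are assumed to have been produced already by the timeliness, sentence-type and entity recognizers.

```python
from dataclasses import dataclass


@dataclass
class ItemAttributes:
    long_lived: bool         # knowledge-type content with long-lasting validity
    title_has_entity: bool   # title contains a clear entity word
    title_is_question: bool  # title contains an interrogative sentence
    source_authority: int    # authority grade of the source, e.g. 1 to 10


AUTHORITY_THRESHOLD = 7  # hypothetical predetermined value


def meets_answer_rule(a: ItemAttributes) -> bool:
    # Predetermined screening result one.
    result_one = a.long_lived and a.title_has_entity and a.title_is_question
    # Predetermined screening result two (declarative title, authoritative source).
    result_two = (a.long_lived and a.title_has_entity
                  and not a.title_is_question
                  and a.source_authority > AUTHORITY_THRESHOLD)
    return result_one or result_two
```

Under these assumptions the mobile phone article from the example (long-lived, entity in the title, interrogative title) satisfies screening result one regardless of its source authority.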

Recognize that article is ageing according to content of text in the above method, recognize the Entity recognition in the clause of title, title All it is ripe text analysis technique, will not be repeated here.Technorati authority of originating is usually artificial setting, if what content library was related to Source is more, it is necessary to certain workload is counted and is classified to the source that content library is related to, a kind of divided originating The mode of level may be referred to be specified below, but it should be recognized that it is only a kind of example to be specified below, according to different sources and Classification is required, can there is other different hierarchical approaches：

The system grades the authority of Internet sources in advance, for example:

government websites have the highest source authority, grade 10;

websites of public institutions come second, grade 9;

official statements on social media sites come next, grade 8;

mass self-media accounts on social media sites that are officially certified are grade 7;

mass self-media accounts on social media sites whose forwarding count and view count reach a high predetermined number are grade 6;

mass self-media accounts on social media sites whose forwarding count and view count reach a medium predetermined number are grade 5;

mass self-media accounts on social media sites whose forwarding count and view count reach a low predetermined number are grade 4;

mass self-media accounts on social media sites whose view count reaches a predetermined number are grade 3;

mass self-media websites whose view count reaches a predetermined number are grade 2;

all other sources are grade 1.

According to step S101, the answer rule may also be established in a more complex way that screens more accurately: a model that automatically screens candidate answers is trained by machine learning, the model is then used to predict whether each item in the content library satisfies the answer rule, and whether an item satisfies the answer rule is decided according to the prediction result.

In another variant of the technical solution of the present invention, as shown in Fig. 2, establishing the answer rule comprises the following steps:

Step S200, manually screening, from the content library, items that satisfy the answer rule and items that do not satisfy the answer rule, to serve as first positive samples and first negative samples respectively;

Step S201, inputting the attributes of the items, the first positive samples and the first negative samples into an answer rule establishment model, and training the answer rule establishment model so as to establish the answer rule.

According to step S200, the sample screening process comprises: first manually labeling, from the content library, a certain number of content items that can serve as candidate answers (the first positive samples) and content items that are less suitable as candidate answers (the first negative samples). In general, each class in the sample set, i.e. the first positive samples and the first negative samples, needs at least several thousand manually screened samples.

After the above samples have been obtained, according to step S201 they are input into a machine learning model, namely the answer rule establishment model, such as a naive Bayes model, a GBDT model or an SVM model, so that the model learns by itself how to distinguish positive samples from negative samples according to the attributes of the content. After learning, the model can be used to judge whether a content item can serve as a candidate answer to some question. The content attributes mentioned here include the explicit attributes mentioned in the first, manual-rule-based method, such as the timeliness of the content (whether it is knowledge-type, long-lived content), the type of the title (whether it is an interrogative or declarative sentence, and whether it contains a clearly defined entity), the length of the title and the authority of the source. Besides these easily understood attributes, more complex attributes may also be introduced that are not easy to interpret but may help strengthen the recognition ability of the machine learning model. For example, the content title is converted into a numerical vector with an RNN (recurrent neural network), this vector is used as the input of a single-layer neural network to obtain the probability P that "the title may answer a question", and this probability P can also serve as an attribute of the content. For the above attributes, the trained model outputs attribute prediction results; in application, these prediction results can be compared with predetermined results to obtain the content library items that satisfy the answer rule, thereby establishing the answer library. Specifically, according to step S201 and as shown in Fig. 3, training the answer rule establishment model so as to establish the answer rule comprises the following steps:

Step S300, obtaining first true data of the first positive samples, and second true data of the first negative samples, with respect to the item attributes;

Step S301, using the answer rule establishment model to predict first prediction data of the first positive samples, and second prediction data of the first negative samples, with respect to the item attributes;

Step S302, adjusting the model parameters of the answer rule establishment model and, when the first prediction data are consistent with the first true data and the second prediction data are consistent with the second true data, outputting the model parameters and the corresponding answer rule establishment model.

Obtaining the first true data of the first positive samples and the second true data of the first negative samples with respect to the item attributes comprises: obtaining the first true data of a first positive sample with respect to the item attributes. For example, where the item attributes are the explicit attributes mentioned in the first, manual-rule-based method, the first true data with respect to predetermined screening result one are: yes, yes, yes (111 in binary form). It also comprises obtaining the second true data of a first negative sample with respect to the item attributes; the second true data with respect to predetermined screening result one are: no, no, no (000 in binary form).

Using the answer rule establishment model to predict the first prediction data of the first positive samples and the second prediction data of the first negative samples with respect to the item attributes comprises: using the answer rule establishment model to predict the first prediction data of the first positive sample with respect to the item attributes, for example, with respect to predetermined screening result one: yes, yes, no (110 in binary form); and using the answer rule establishment model to predict the second prediction data of the first negative sample with respect to the item attributes, for example, with respect to predetermined screening result one: no, yes, no (010 in binary form).

Adjusting the model parameters of the answer rule establishment model means, in the above example, adjusting the parameters so that the prediction that was 110 is correctly output as 111 and the prediction that was 010 is correctly output as 000. The same training process is applied to every sample. Finally, when for all samples the first prediction data are consistent with the first true data and the second prediction data are consistent with the second true data, the model parameters and the corresponding answer rule establishment model are output.
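The adjust-until-consistent loop of steps S300 to S302 can be illustrated with a very small learner. The specification does not say how the parameters are adjusted, so the perceptron update used below is a stand-in chosen for brevity, and the function names are hypothetical. Each sample is a binary attribute vector with a true label (1 for a positive sample, 0 for a negative sample), and training stops once every prediction equals its true label.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (binary attribute vector, true label)


def classify(weights: Sequence[float], bias: float, x: Sequence[int]) -> int:
    """Predict 1 if the weighted sum of attributes is positive, else 0."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def train_until_consistent(samples: List[Sample],
                           max_epochs: int = 1000) -> Tuple[List[float], float]:
    """Adjust parameters until every prediction matches its true label."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, truth in samples:
            error = truth - classify(weights, bias, x)
            if error:
                mistakes += 1
                # Perceptron update: move the parameters toward the true label.
                weights = [w + error * v for w, v in zip(weights, x)]
                bias += error
        if mistakes == 0:
            return weights, bias  # predictions are consistent with the true data
    raise RuntimeError("predictions did not become consistent with the true data")
```

This simple update only reaches full consistency when the samples are linearly separable; with real labeled data one would normally stop at an acceptable error rate rather than require every sample to match.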

In the technical solution of the present invention, the flow of steps S300 to S302 may be run internally, or its result may be obtained from external processing. According to steps S300 to S302, the answer rule establishment model is based on a machine learning algorithm. Specifically, the machine learning algorithm is one of a naive Bayes algorithm, a GBDT algorithm, an SVM algorithm and an RNN algorithm. Given a set of content attributes and a large number of positive and negative samples, how to train a machine learning model to obtain an effective candidate answer identification model is a mature, well-solved problem in the field of machine learning, so the technical details of this process are not repeated here.
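As one concrete instance of the algorithms listed above, the following sketch trains a Bernoulli naive Bayes classifier on binary content attributes using only the standard library. The function names, the choice of four attributes and the toy samples are assumptions made for illustration; a production system would more likely rely on an established machine learning library and thousands of labeled samples.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (binary attributes, label: 1 positive, 0 negative)
Model = Dict[int, Tuple[float, List[float]]]


def train_bernoulli_nb(samples: List[Sample]) -> Model:
    """Return, per class, (log prior, per-attribute P(attribute = 1 | class)).

    Both classes must be present in the samples.
    """
    n_features = len(samples[0][0])
    model: Model = {}
    for label in (0, 1):
        rows = [x for x, y in samples if y == label]
        prior = math.log(len(rows) / len(samples))
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict(model: Model, x: Sequence[int]) -> int:
    """Return the class with the higher log posterior score."""
    def log_score(label: int) -> float:
        prior, probs = model[label]
        return prior + sum(math.log(p if v else 1.0 - p) for p, v in zip(probs, x))
    return max((0, 1), key=log_score)


# Hypothetical attributes: (long_lived, title_has_entity, title_is_question, high_authority)
training = [
    ((1, 1, 1, 0), 1), ((1, 1, 1, 1), 1), ((1, 1, 0, 1), 1),
    ((0, 0, 0, 0), 0), ((0, 1, 0, 0), 0), ((1, 0, 0, 0), 0),
]
nb_model = train_bernoulli_nb(training)
```

An item is then added to the answer library when `predict(nb_model, attributes)` returns 1.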

The above describes the flow for building the candidate answer library according to the technical solution of the present invention. It should be noted that:

If the operator of the Q&A community already owns a reasonably complete, content-rich content platform (for example, the operator of the Toutiao Q&A product also owns the Toutiao account self-media platform, which adds hundreds of thousands of articles and videos every day), the obtaining step S100 in Fig. 1 can draw directly on the platform database. Otherwise, a large number of articles and videos with description information (which may be short videos) are crawled from the Internet to form the items in the content library, and the library is kept updated automatically.

The technical solution of the present invention also provides technical means for screening the content-library items that satisfy the answer rule in order to build the answer library: directly designing one or more candidate answer rules, and establishing the answer rule by training a machine learning model. The two means may be used separately or in parallel.

In addition, when several candidate answer rules are designed directly, weights may be assigned to the different candidate answer rules according to their screening effect in different situations, and the accumulated sum of each rule's judgment result multiplied by its weight is used as the final judgment for a content-library item when performing the actual screening.
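The weighted combination of candidate answer rules can be sketched as follows. The two rules, their weights, the allow-list of sources, and the acceptance threshold are all hypothetical; the specification states only that the weighted judgments are accumulated, so comparing the sum with a threshold is an assumption of this example.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]
Rule = Callable[[Item], bool]


def has_question_title(item: Item) -> bool:
    """Hypothetical rule: the item title is phrased as a question."""
    return item["title"].rstrip().endswith(("?", "？"))


def from_trusted_source(item: Item) -> bool:
    """Hypothetical rule: the item source is on an assumed allow-list."""
    return item["source"] in {"trusted-news", "expert-blog"}


def weighted_rule_score(item: Item, rules: List[Tuple[Rule, float]]) -> float:
    """Accumulate weight x judgment (1 if the rule is satisfied, else 0)."""
    return sum(weight * (1.0 if rule(item) else 0.0) for rule, weight in rules)


def build_answer_library(content_library: List[Item],
                         rules: List[Tuple[Rule, float]],
                         threshold: float) -> List[Item]:
    """Keep the items whose accumulated score reaches the assumed threshold."""
    return [item for item in content_library
            if weighted_rule_score(item, rules) >= threshold]


RULES = [(has_question_title, 0.6), (from_trusted_source, 0.4)]
LIBRARY = [
    {"title": "Why is the sky blue?", "source": "trusted-news"},
    {"title": "Sky report", "source": "unknown"},
]
```

Calling `build_answer_library(LIBRARY, RULES, 0.5)` keeps only the first item in this toy library.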

The means by which the technical solution of the present invention builds the answer library are not limited to the technical solutions described above.

According to the above, once the candidate answer library has been built and a user has raised a question, the most suitable candidate answer can be searched for in the answer library according to step S102. Specifically, for step S102 the technical solution of the present invention provides a feasible means of automatic answer matching. As shown in Fig. 4, segmenting the input question item into words and finding in the answer library the target items that contain all of the segmented words includes the steps:

Step S400: split the input question item into an ordered sequence of individual words;

Step S401: if an item title in the answer library contains every word in the sequence, select the item having that title as the target item.
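Steps S400 and S401 can be sketched as follows. The regular-expression tokenizer is only a stand-in for a real word segmenter (Chinese text would need a dedicated segmentation tool), and the item structure and function names are assumptions of this example.

```python
import re
from typing import Dict, List


def segment(text: str) -> List[str]:
    """Stand-in segmenter for step S400: ordered, lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def find_target_items(question: str,
                      answer_library: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Step S401: keep items whose title contains every word of the question."""
    words = set(segment(question))
    if not words:
        return []  # an empty question matches nothing
    return [item for item in answer_library
            if words <= set(segment(item["title"]))]


ANSWER_LIBRARY = [
    {"title": "Why did Sony decline in recent years"},
    {"title": "Panasonic earnings report"},
]
```

With this toy library, the question "Sony decline" selects only the first item, because the second title lacks both words.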

It should be noted that, for step S400, the input question item is generally short, typically a single sentence or a few sentences, and the part of an answer-library item that is comparable in length to the input question is the item title. The automatic answer matching here may therefore be limited to matching the question item directly against item titles. It will be understood, however, that if the input question item is long, it also covers more content, and such a question item generally has a title of its own; in that case the title of the input question item can be matched against the item titles. That is, in step S400, when the content length of the input question exceeds a predetermined content length, the title of the question item is used as the input question item; when the content length does not exceed the predetermined content length, the question content itself is used as the input question item.
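The length-based choice of what to match can be sketched as below. The function name and the default limit of 50 characters are hypothetical; the specification refers only to "a predetermined content length".

```python
def select_question_text(title: str, content: str, max_content_length: int = 50) -> str:
    """Return the text to segment in step S400.

    Long questions are represented by their own title, short questions by
    their content. The default limit is an assumed value for illustration.
    """
    if len(content) > max_content_length:
        return title
    return content
```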

After the target items have been found, according to step S103 the comprehensive relevance between the question item and each target item can be calculated to estimate how well the two match, thereby achieving automatic answer matching. The technical solution of the present invention provides the following two calculation schemes for obtaining the comprehensive relevance. Specifically:

The first calculation scheme is the manual-rule method. As shown in Fig. 5, calculating the comprehensive relevance between the question item and the target item includes the steps:

Step S500: calculate single relevance values between the question item and the target item using several relevance functions;

Step S501: assign a weight to each calculated single relevance and accumulate the products of the weights and the corresponding single relevances to obtain the comprehensive relevance.

According to step S500, calculating the single relevances between the question item and the target item using several relevance functions includes performing at least one or more of the following calculation methods:

First calculation method: segment the target item title and the question item into words, obtain the numeric vector of each word with the word2vec tool, add the word vectors to obtain the semantic vector of the item title and the semantic vector of the question item, and finally calculate the cosine distance between the two vectors to obtain the first single relevance;

Second calculation method: segment the target item title and the question item into words, and calculate the ratio of the number of words the two share to the number of all words in the two, to obtain the second single relevance;

Third calculation method: obtain the keywords of the target item title, obtain the numeric vector of each keyword with the word2vec tool, and add these vectors to form the semantic vector of the target item title; segment the question item into words, obtain the numeric vector of each word with the word2vec tool, and add them to form the semantic vector of the question item; finally calculate the cosine distance between the semantic vectors of the target item title and the question item to obtain the third single relevance.
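The three calculation methods can be sketched as follows. The two-dimensional vectors are made-up stand-ins for word2vec output, keyword extraction is approximated by removing an assumed stopword set, and "all words in the two" is interpreted as the union of the two word sets. The function named `cosine` computes cosine similarity, which matches the way this description uses "cosine distance" (a larger value means more related).

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]

# Hypothetical 2-dimensional embeddings standing in for word2vec output.
TOY_VECTORS: Dict[str, Vector] = {
    "why": [0.5, 0.5],
    "sony": [1.0, 0.0],
    "decline": [0.0, 1.0],
    "panasonic": [0.9, 0.1],
}


def sum_vectors(words: Sequence[str], vectors: Dict[str, Vector], dim: int = 2) -> Vector:
    """Add the vectors of all words; unknown words contribute a zero vector."""
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(vectors.get(word, [0.0] * dim)):
            total[i] += value
    return total


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embedding_relevance(title_words: Sequence[str], question_words: Sequence[str],
                        vectors: Dict[str, Vector]) -> float:
    """Methods 1 and 3: cosine between the summed word vectors of both sides."""
    return cosine(sum_vectors(title_words, vectors), sum_vectors(question_words, vectors))


def overlap_relevance(title_words: Sequence[str], question_words: Sequence[str]) -> float:
    """Method 2: shared words divided by all distinct words of the two."""
    a, b = set(title_words), set(question_words)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keywords(words: Sequence[str], stopwords: Set[str]) -> List[str]:
    """Assumed keyword extraction for method 3: drop stopwords."""
    return [w for w in words if w not in stopwords]
```

For the third method, `embedding_relevance(keywords(title_words, stopwords), question_words, TOY_VECTORS)` applies the same cosine computation to the title keywords only.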

Word2vec is a popular prior-art word-vectorization tool developed by Google. It represents a word as a numeric vector and ensures that semantically similar words have numeric vectors that are close together. The principle of the tool is not repeated here.

In a first extended example of this calculation scheme, let:

rel(c, q) = w₁×f₁(c, q) + w₂×f₂(c, q) + … + wₙ×fₙ(c, q)

Here c and q denote the target item and the question item respectively, rel(c, q) denotes the comprehensive relevance of c and q, f₁, f₂, …, fₙ each denote a single relevance function, n is a natural number greater than 2, and w₁, w₂, …, wₙ denote the weights of the relevance functions f₁, f₂, …, fₙ and are set manually. The functions f₁, f₂, …, fₙ may use the first to third calculation methods above or other prior-art relevance functions. Those skilled in the relevant art may also design more effective relevance functions; for reasons of space they are not enumerated here.
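A minimal sketch of the weighted sum rel(c, q) follows. The two relevance functions and the weights 0.7 and 0.3 are hypothetical placeholders chosen for this example; any single relevance functions could be supplied instead.

```python
from typing import Callable, List, Sequence, Tuple

RelevanceFn = Callable[[str, str], float]


def word_overlap(c: str, q: str) -> float:
    """Toy single relevance: shared words over all distinct words."""
    a, b = set(c.lower().split()), set(q.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_first_word(c: str, q: str) -> float:
    """Toy single relevance: 1.0 if both texts start with the same word."""
    cw, qw = c.lower().split(), q.lower().split()
    return 1.0 if cw and qw and cw[0] == qw[0] else 0.0


def comprehensive_relevance(c: str, q: str,
                            weighted: Sequence[Tuple[float, RelevanceFn]]) -> float:
    """rel(c, q): accumulate weight x single relevance over all functions."""
    return sum(w * f(c, q) for w, f in weighted)


def best_target(q: str, targets: List[str],
                weighted: Sequence[Tuple[float, RelevanceFn]]) -> str:
    """Return the target item with the highest comprehensive relevance."""
    return max(targets, key=lambda c: comprehensive_relevance(c, q, weighted))


WEIGHTED = [(0.7, word_overlap), (0.3, same_first_word)]  # manually set weights
```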

The second calculation scheme of the technical solution of the present invention is the machine learning method. As shown in Fig. 6, a method for automatically generating answers to open-ended questions (the method flow shown in Fig. 6 is based on Fig. 1) includes, in addition to steps S100 to S103, the following steps:

Step S600: manually screen, from the answer library, item titles that match the question item and item titles that do not match the question item, to serve as second positive samples and second negative samples respectively;

Step S601: split the question item, the second positive samples, and the second negative samples into single-character sequences, and obtain the semantic vector of each sequence after RNN processing;

Step S602: train the RNN model so that the cosine distance between the above semantic vectors accurately predicts the true relevance, thereby obtaining the RNN model parameters.

In the technical solution of the present invention, the flow of steps S600 to S602 may be run internally or obtained through external processing. The RNN (Recurrent Neural Network) model is a popular neural network structure: given a character sequence as input, it produces a vector that represents the latent semantics of the sequence. The model is described in detail in the prior art. The technical solution of the present invention uses the RNN model to train the matching prediction between question items and target item titles, thereby improving the accuracy of the matching prediction.

For step S103 in Fig. 6, with reference to Fig. 7, calculating the comprehensive relevance between the question item and the target item includes the steps:

Step S700: segment the target item title and the question item to obtain single-character sequences;

Step S701: based on the RNN model parameters, use the RNN model to predict the predicted relevance between the single-character sequences of the target item title and the question item.

According to steps S700 to S701, for a user question q in the technical solution of the present invention, a batch of content {c} in the candidate answer library that is suitable as an answer to the question (that is, the set of target titles c that match the question) can first be labeled, followed by a batch of content {c'} that is not suitable as an answer to the question (that is, the set of target titles c' that do not match the question). The pairs formed by q with every c serve as second positive samples, and the pairs formed by q with every c' serve as second negative samples. This operation generally needs to be carried out for thousands of different questions q, collecting positive and negative samples on the order of tens of thousands, before a machine learning model can be trained to learn how to calculate rel(c, q) automatically.

In this example, the popular RNN model is used to learn the relevance between candidate answer titles and user questions. Both the question and the candidate answer title are treated as single-character sequences. For example, the question q "Why have Japanese companies such as Sony and Panasonic declined in recent years?" (in the original Chinese, 为什么索尼、松下等日本企业近年来衰落？) is converted into the single-character sequence <为, 什, 么, 索, 尼, 松, 下, 等, 日, 本, 企, 业, 近, 年, 来, 衰, 落>. After processing by the RNN model, the semantic vector of the question sequence and the semantic vector of the candidate answer title are obtained. If the two are related, the cosine distance between the two vectors should be 1; if they are unrelated, it should be 0.
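The construction of labeled training pairs from single-character sequences can be sketched as follows. The function names and the tuple layout are assumptions of this example; the labels 1.0 and 0.0 follow the target cosine values stated above for related and unrelated pairs.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[List[str], List[str], float]


def to_char_sequence(text: str) -> List[str]:
    """Split a text into its single-character sequence."""
    return list(text)


def build_pairs(question: str,
                matching_titles: Sequence[str],
                non_matching_titles: Sequence[str]) -> List[Pair]:
    """Pair q with every c (label 1.0) and with every c' (label 0.0)."""
    q_seq = to_char_sequence(question)
    pairs: List[Pair] = [(q_seq, to_char_sequence(t), 1.0) for t in matching_titles]
    pairs += [(q_seq, to_char_sequence(t), 0.0) for t in non_matching_titles]
    return pairs
```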

Using the classical BPTT algorithm, the error between the cosine distance of the vectors predicted by the RNN model and the true cosine distance in the sample is computed, and this error is used in turn to adjust the RNN model parameters. By adjusting the parameters repeatedly so that the error between the predicted values and the true values becomes smaller and smaller, suitable model parameters can be found that give the RNN model the strongest ability to predict the relevance of (c, q) pairs.

Once suitable model parameters have been found, training is complete, and the trained RNN model can be used to calculate the relevance of any (c, q) pair, that is, to predict the cosine distance between the semantic vector of the question sequence and the semantic vector of the candidate answer title, and to judge the match from the predicted value. The title of c and the question q are first treated as single-character sequences, each is processed by the RNN model to obtain its semantic vector, and the cosine distance is then calculated. The larger the calculated value, the higher the relevance, so this calculation can be used to find the most relevant of several candidate answers. The predicted relevance refers to the cosine distance, calculated with the RNN model, between the semantic vector of the question sequence and the semantic vector of the candidate answer title.
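The prediction step can be illustrated with a very small recurrent encoder written in plain Python. Everything here is an assumption for illustration: the hidden size, the character embedding derived from the code point, and above all the parameters, which are random and untrained, so the scores carry no real meaning. Training with BPTT is omitted. The sketch shows only the data flow: character sequence, recurrent update, final hidden state as semantic vector, cosine between the two vectors.

```python
import math
import random
from typing import List, Tuple

HIDDEN = 4  # hypothetical hidden-state size
EMBED = 3   # hypothetical character-embedding size
Params = Tuple[List[List[float]], List[List[float]]]


def make_params(seed: int = 0) -> Params:
    """Random, untrained input and recurrent weight matrices."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED)] for _ in range(HIDDEN)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
    return w_in, w_rec


def embed_char(ch: str) -> List[float]:
    """Hypothetical fixed embedding built from bits of the code point."""
    code = ord(ch)
    return [((code >> (3 * i)) & 7) / 7.0 for i in range(EMBED)]


def encode(text: str, params: Params) -> List[float]:
    """Run the recurrence h = tanh(W_in x + W_rec h) over the characters."""
    w_in, w_rec = params
    h = [0.0] * HIDDEN
    for ch in text:
        x = embed_char(ch)
        h = [math.tanh(sum(w_in[j][i] * x[i] for i in range(EMBED))
                       + sum(w_rec[j][k] * h[k] for k in range(HIDDEN)))
             for j in range(HIDDEN)]
    return h


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predicted_relevance(title: str, question: str, params: Params) -> float:
    """Cosine between the encoded title and the encoded question."""
    return cosine(encode(title, params), encode(question, params))


def best_candidate(question: str, titles: List[str], params: Params) -> str:
    """Pick the candidate title with the highest predicted relevance."""
    return max(titles, key=lambda t: predicted_relevance(t, question, params))
```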

If the matching algorithm is sufficiently accurate (this requires actual evaluation: manual rules need continual adjustment to improve and do not easily give satisfactory results, while the effect of the machine learning method depends on the quality and quantity of the samples), candidate answers whose relevance exceeds a specific threshold can be posted directly under the question in the Q&A community by a robot account. This approach carries some risk, of course. A safer approach is to send the candidate answers to real users who are suited to answering the question and let those users judge whether the candidate answers address the question well. Such a user can also treat a candidate answer as source material and revise it into a better answer, which is more efficient than writing one from scratch.

In another variation of the technical solution of the present invention, as shown in Fig. 8 (Fig. 8 is based on the scheme shown in Fig. 1; other embodiments may instead be varied from the scheme shown in Fig. 6), a method for automatically generating answers to open-ended questions includes, in addition to steps S100 to S103, the following step: step S800, extracting at least the target content in the target item with the highest comprehensive relevance and publishing it.

The recipient of the publication may be the user who asked the question or a user who can write an answer. An application example operating according to the method shown in Fig. 8 is shown in Fig. 9. Here the user questions come from a Q&A community and are generally not simple factoid questions (such as "Who is the current President of the United States?") but open-ended questions that require a longer text or a video to explain (such as "What changes will Donald Trump bring to the international situation after taking office?"). The method comprises the following steps:

First, candidate answers are mined in advance from a content library containing a massive number of articles or videos, and a candidate answer library is built. Second, after a user question is received, a matching answer is found in the candidate answer library and published.

With reference to Fig. 9, finding a matching answer in the candidate answer library and publishing it further includes the steps:

Step S900: segment the user question into words;

Step S901: using the set of words obtained from segmentation, find the candidate answers in the candidate answer library whose titles contain these words;

Step S902: for each candidate answer, calculate the relevance between its title and the user question;

Step S903: the system judges whether the relevance of each automatically generated answer is higher than a specific threshold; candidate answers above the threshold are treated as Top K answers;

The Top K answers can be sent, according to step S904, to users who are suited to answering the question, that is, to potential answerers for their reference, providing material from which they can write an answer. Alternatively, according to step S905, the answer can be published directly in the Q&A community using a system account.
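Steps S902 to S905 can be sketched as a threshold filter followed by a routing decision. The relevance function, the threshold value, and the route labels are hypothetical; the specification leaves the choice between steps S904 and S905 to the operator.

```python
from typing import Callable, List


def word_overlap(title: str, question: str) -> float:
    """Toy relevance used for illustration: shared words over all words."""
    a, b = set(title.lower().split()), set(question.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_answers(question: str, candidate_titles: List[str],
                relevance: Callable[[str, str], float],
                threshold: float) -> List[str]:
    """Steps S902-S903: score each candidate and keep those above the threshold."""
    scored = [(relevance(title, question), title) for title in candidate_titles]
    kept = [(score, title) for score, title in scored if score > threshold]
    return [title for _, title in sorted(kept, key=lambda p: p[0], reverse=True)]


def route(answers: List[str], publish_directly: bool) -> str:
    """Steps S904/S905: choose how the Top K answers are delivered."""
    if not answers:
        return "no_answer"
    return "publish_with_system_account" if publish_directly else "send_to_suitable_user"
```

Sending the answers to a suitable user corresponds to the safer option described above; direct publication by a system account is the riskier one.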

Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit the present invention. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical solution of the present invention. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments in accordance with the technical substance of the present invention, without departing from the content of the technical solution, falls within the scope of protection of the technical solution of the present invention.

Claims

1. A method for automatically generating answers to open-ended questions, characterized by comprising:

obtaining items in a content library, each item in the content library including at least one attribute among item source, item content, and item title;

establishing an answer rule and obtaining from the content library the items that satisfy the answer rule to build an answer library, the answer rule including a screening condition corresponding to at least one of the item source, the item content, and the item title;

calculating the comprehensive relevance between the question item and the target items, and finding the target item with the highest comprehensive relevance for output.

2. The method for automatically generating answers to open-ended questions according to claim 1, characterized in that obtaining the items in the content library comprises: building the items in the content library from documents or videos provided internally or crawled externally.

3. The method for automatically generating answers to open-ended questions according to claim 1, characterized in that the established answer rule comprises:

the authority of the item source is acceptable;

the timeliness of the item content is acceptable;

the item title contains an entity word;

the item title is an interrogative sentence.

4. The method for automatically generating answers to open-ended questions according to claim 1, characterized in that establishing the answer rule comprises:

manually screening, from the content library, items that satisfy the answer rule and items that do not satisfy the answer rule, to serve as first positive samples and first negative samples respectively;

inputting the attributes of the items, the first positive samples, and the first negative samples into an answer-rule building model, and training the answer-rule building model to establish the answer rule.

5. The method for automatically generating answers to open-ended questions according to claim 4, characterized in that training the answer-rule building model to establish the answer rule comprises:

obtaining the first true data of the first positive samples and the second true data of the first negative samples with respect to the item attributes;

using the answer-rule building model to predict the first prediction data of the first positive samples and the second prediction data of the first negative samples with respect to the item attributes;

adjusting the model parameters of the answer-rule building model, and, when the first prediction data are consistent with the first true data and the second prediction data are consistent with the second true data, outputting the model parameters and the answer-rule building model.

6. The method for automatically generating answers to open-ended questions according to claim 4, characterized in that the answer-rule building model is based on a machine learning algorithm.

7. The method for automatically generating answers to open-ended questions according to claim 6, characterized in that the machine learning algorithm is one of the naive Bayes algorithm, the GBDT algorithm, the SVM algorithm, and the RNN algorithm.

8. The method for automatically generating answers to open-ended questions according to claim 1, characterized in that segmenting the input question item into words and finding in the answer library the target items that contain all of the segmented words comprises:

if an item title in the answer library contains every word in the sequence, selecting the item having that item title as the target item.

9. The method for automatically generating answers to open-ended questions according to claim 1, characterized in that calculating the comprehensive relevance between the question item and the target item comprises:

assigning a weight to each calculated single relevance and accumulating the products of the weights and the corresponding single relevances to obtain the comprehensive relevance.

10. The method for automatically generating answers to open-ended questions according to claim 9, characterized in that calculating the single relevances between the question item and the target item using several relevance functions includes performing at least one or more of the following calculation methods:

segmenting the target item title and the question item into words, obtaining the numeric vector of each word with the word2vec tool, adding the word vectors to obtain the semantic vector of the item title and the semantic vector of the question item, and finally calculating the cosine distance between the two vectors to obtain the first single relevance;

segmenting the target item title and the question item into words, and calculating the ratio of the number of words the two share to the number of all words in the two, to obtain the second single relevance;

obtaining the keywords of the target item title, obtaining the numeric vector of each keyword with the word2vec tool, and adding these vectors to form the semantic vector of the target item title; segmenting the question item into words, obtaining the numeric vector of each word with the word2vec tool, and adding them to form the semantic vector of the question item; and finally calculating the cosine distance between the semantic vectors of the target item title and the question item to obtain the third single relevance.

11. The method for automatically generating answers to open-ended questions according to claim 1, characterized by further comprising:

manually screening, from the answer library, item titles that match the question item and item titles that do not match the question item, to serve as second positive samples and second negative samples respectively;

splitting the question item, the second positive samples, and the second negative samples into single-character sequences, and obtaining the semantic vector of each sequence after RNN processing;

training the RNN model so that the cosine distance between the above semantic vectors accurately predicts the true relevance, thereby obtaining the RNN model parameters;

based on the RNN model parameters, using the RNN model to predict the predicted relevance between the single-character sequences of the target item title and the question item.

12. The method for automatically generating answers to open-ended questions according to claim 1, characterized by further comprising: extracting at least the target content in the target item with the highest comprehensive relevance and publishing it.