CN107239574A - Method and device for knowledge-question matching in an intelligent question answering system

Info

Publication number: CN107239574A
Application number: CN201710513108.4A
Authority: CN
Inventors: 陈飞; 崔培君; 乔思龙; 王萌萌
Original assignee: Beijing Shenzhou Taiyue Software Co Ltd
Current assignee: Dingfu Intelligent Technology Co., Ltd
Priority date: 2017-06-29
Filing date: 2017-06-29
Publication date: 2017-10-10
Anticipated expiration: 2037-06-29
Also published as: CN107239574B

Abstract

This application discloses a knowledge-question matching method and device for an intelligent question answering system. The matching process combines two similarity measures, weight similarity and vector similarity, which compensates for the systematic error of any single similarity measure. Before the weight similarity and vector similarity are calculated, the word segmentation results are preprocessed to remove stop words, which reduces the false-trigger rate. In addition, the weights of the knowledge words obtained after preprocessing are normalized so that their value range is [0, 1]. This reduces the deviation in the weight similarity calculation caused by large differences between the weights of different knowledge words, so the weight similarity between the question and each candidate knowledge item is more accurate. That in turn improves the accuracy of the total similarity and of knowledge-question matching in the intelligent question answering system.

Description

Method and device for knowledge-question matching in an intelligent question answering system

Technical field

This application relates to the technical field of natural language processing, and in particular to a method and device for matching knowledge with questions in an intelligent question answering system.

Background

An intelligent question answering system lets users look up answers to their questions through self-service human-computer interaction. It generally includes a client and a server connected over a network. The server is preset with a knowledge base and a corresponding answer base, and the knowledge items in the knowledge base correspond one-to-one with the answers in the answer base. The server matches the question text obtained from the client against the knowledge items in the preset knowledge base, and then returns the answer corresponding to the matched knowledge item to the client in order to answer the user's question.

There are generally two ways to match question text against preset knowledge. The first requires the text of the user's question to be identical to a preset knowledge item in the knowledge base. The second selects the preset knowledge item whose similarity to the text of the user's question is highest. With the first approach, the user's question is often not identical to the entries in the knowledge base. For example, suppose a pre-built knowledge base contains four knowledge items: 1. credit card handling flow, 2. credit card cancellation flow, 3. transit card handling flow, and 4. transit card cancellation flow. When the customer enters "credit card handling flow", the system matches knowledge item 1; when the customer enters "how to handle a credit card", the system cannot find a match. With the second approach, conventional similarity calculation methods have systematic deviations, so the similarity between a question and its corresponding knowledge item is often not the maximum. This can cause the question to be matched to the wrong knowledge item and an irrelevant answer to be returned. In the example above, when the customer enters "how to handle a credit card", the system may judge the question to be most similar to knowledge item 3 and return the answer for item 3; that is, the accuracy of this approach is poor.

There is therefore an urgent need for a method and device that enable an intelligent question answering system to accurately match the relevant knowledge to a user's loosely worded question.

Summary of the invention

This application provides a method and device for knowledge-question matching in an intelligent question answering system, in order to solve the problem that inaccurate matching between questions and knowledge leads to low accuracy of the extracted answers.

The invention provides the following aspects:

In a first aspect, this application provides a method for knowledge-question matching in an intelligent question answering system, the method comprising:

obtaining a question sent by a client;

obtaining, using knowledge words and question words, the weight similarity between each candidate knowledge item and the question;

obtaining, using the knowledge words and question words, the vector similarity between each candidate knowledge item and the question;

calculating, using the weight similarity and the vector similarity, the total similarity between each candidate knowledge item and the question;

obtaining the candidate knowledge item whose total similarity satisfies a preset rule as the knowledge matched with the question.

Optionally, before obtaining the weight similarity between each candidate knowledge item and the question, the method further comprises:

generating a knowledge base, the knowledge base containing at least one candidate knowledge item;

preprocessing the knowledge: performing word segmentation on the candidate knowledge item and removing the stop words from the word segmentation result, thereby obtaining the knowledge words of the candidate knowledge item.

Optionally, the knowledge words are obtained as follows:

performing word segmentation on the candidate knowledge item;

removing the stop words from the word segmentation result, thereby obtaining the knowledge words of the candidate knowledge item.

Optionally, the question words are obtained as follows:

performing word segmentation on the question;

removing the stop words from the word segmentation result, thereby obtaining the question words of the question.

Optionally, obtaining the weight similarity between each candidate knowledge item and the question using the knowledge words and question words comprises:

obtaining the weights of the knowledge words in the candidate knowledge item;

assigning weights to the question words in the question according to a preset weight assignment rule;

calculating the weight similarity from the weights of the knowledge words and the weights of the question words.

Optionally, obtaining the weights of the knowledge words in the candidate knowledge item comprises:

obtaining the weight of each knowledge word, the weight of a knowledge word being its weight within the knowledge item it belongs to;

normalizing the weight of each knowledge word.

Optionally, the weight assignment rule is to determine whether the question word satisfies the condition for assigning a preset weight; if it does, the preset weight is assigned to the question word;

if it does not, the weight of the question word is the average of the weights, across all candidate knowledge items, of the knowledge word identical to the question word.

The condition for assigning the preset weight is that the knowledge words do not include the question word.

Optionally, obtaining the vector similarity between each candidate knowledge item and the question using the knowledge words and question words comprises:

obtaining the vector of the candidate knowledge item;

obtaining the vector of the question;

calculating the vector similarity from the vector of the candidate knowledge item and the vector of the question.

Optionally, obtaining the vector of the candidate knowledge item comprises:

obtaining the word vector of each knowledge word, the word vector of a knowledge word being its word vector within the candidate knowledge item;

calculating the vector of the candidate knowledge item from the word vectors of the knowledge words.

Optionally, obtaining the vector of the question comprises:

obtaining the word vector of each question word, the word vector of a question word being the same as the word vector of the identical knowledge word;

calculating the vector of the question from the word vectors of the knowledge words.

Optionally, the weight similarity is obtained using one of, or a combination of, Jaccard distance, Hamming distance and edit distance;

the vector similarity is obtained using the cosine measure;

the total similarity between a candidate knowledge item and the question is the linearly weighted sum of the weight similarity and the vector similarity between that same candidate knowledge item and the question.

Optionally, the preset rule is to sort the total similarities between all candidate knowledge items and the question and to select the item with the largest total similarity.

In the knowledge-question matching process of the intelligent question answering system, this application combines two similarity measures, weight similarity and vector similarity, which compensates for the systematic error of any single similarity measure. Before the weight similarity and vector similarity are calculated, the word segmentation results are preprocessed to remove stop words, which reduces the false-trigger rate. In addition, the weights of the knowledge words obtained after preprocessing are normalized so that their value range is [0, 1]. This reduces the deviation in the weight similarity calculation caused by large differences between the weights of different knowledge words, so the weight similarity between the question and each candidate knowledge item is more accurate, which improves the accuracy of the total similarity and of knowledge-question matching in the intelligent question answering system.

In a second aspect, this application also provides a knowledge-question matching device for an intelligent question answering system, the device comprising:

a question acquisition unit, configured to obtain the question sent by a client;

a weight similarity acquisition unit, configured to obtain, using knowledge words and question words, the weight similarity between each candidate knowledge item and the question;

a vector similarity acquisition unit, configured to obtain, using the knowledge words and question words, the vector similarity between each candidate knowledge item and the question;

a total similarity calculation unit, configured to calculate, using the weight similarity and the vector similarity, the total similarity between each candidate knowledge item and the question;

a knowledge-question matching unit, configured to obtain the candidate knowledge item whose total similarity satisfies a preset rule as the knowledge matched with the question.

Optionally, the knowledge words are obtained as follows:

performing word segmentation on the candidate knowledge item.

Optionally, the question words are obtained as follows:

performing word segmentation on the question.

Optionally, the weight similarity acquisition unit comprises:

a knowledge word weight acquisition subunit, configured to obtain the weights of the knowledge words in the candidate knowledge item;

a question word weight assignment subunit, configured to assign weights to the question words in the question according to a preset weight assignment rule;

a weight similarity calculation subunit, configured to calculate the weight similarity from the weights of the knowledge words and the weights of the question words.

Optionally, the knowledge word weight acquisition subunit comprises:

an ordinary weight acquisition slave unit, configured to obtain the weight of each knowledge word, the weight of a knowledge word being its weight within the knowledge item it belongs to;

a normalization slave unit, configured to normalize the weight of each knowledge word.

Optionally, in the question word weight assignment subunit, the weight assignment rule is to determine whether the question word satisfies the condition for assigning a preset weight; if it does, the preset weight is assigned to the question word; if it does not, the weight of the question word is the average of the weights, across all candidate knowledge items, of the knowledge word identical to the question word.

Optionally, the vector similarity acquisition unit comprises:

a knowledge vector acquisition subunit, configured to obtain the vector of the candidate knowledge item;

a question vector acquisition subunit, configured to obtain the vector of the question;

a vector similarity calculation subunit, configured to calculate the vector similarity from the vector of the candidate knowledge item and the vector of the question.

Optionally, the knowledge vector acquisition subunit comprises:

a knowledge word vector acquisition slave unit, configured to obtain the word vector of each knowledge word, the word vector of a knowledge word being its word vector within the candidate knowledge item;

a knowledge vector calculation slave unit, configured to calculate the vector of the candidate knowledge item from the word vectors of the knowledge words.

Optionally, the question vector acquisition subunit comprises:

a question word vector acquisition slave unit, configured to obtain the word vector of each question word, the word vector of a question word being the same as the word vector of the identical knowledge word;

a question vector calculation slave unit, configured to calculate the vector of the question from the word vectors of the knowledge words.

The vector similarity is obtained using the cosine measure;

the total similarity between a candidate knowledge item and the question is the linearly weighted sum of the weight similarity and the vector similarity between that same candidate knowledge item and the question;

the preset rule is to sort the total similarities between all candidate knowledge items and the question and to select the item with the largest total similarity.

Brief description of the drawings

To explain the technical solutions of this application more clearly, the drawings needed for the embodiments are briefly introduced below. A person of ordinary skill in the art can derive other drawings from these drawings without creative effort.

Fig. 1 is a flow chart of one embodiment of the knowledge-question matching method of this application;

Fig. 2 is a flow chart of one embodiment of obtaining the weight similarity in S102;

Fig. 3 is a flow chart of one embodiment of obtaining the vector similarity in S103;

Fig. 4 is a structural schematic diagram of one embodiment of the knowledge-question matching device of this application;

Fig. 5 is a structural schematic diagram of one embodiment of the weight similarity acquisition unit 402;

Fig. 6 is a structural schematic diagram of one embodiment of the vector similarity acquisition unit 403;

Fig. 7 is a structural schematic diagram of the computer system provided by the embodiments of this application.

Detailed description

The invention is described in detail below; its features and advantages will become clearer from this description.

According to the first aspect of this application, a knowledge-question matching method for an intelligent question answering system is provided. As shown in Fig. 1, the method comprises:

S101: obtain the question sent by the client;

S102: obtain, using knowledge words and question words, the weight similarity between each candidate knowledge item and the question;

In this application, the knowledge words are the word segmentation result of a candidate knowledge item, and the question words are the word segmentation result of the question sent by the client.

S103: obtain, using the knowledge words and question words, the vector similarity between each candidate knowledge item and the question;

S104: calculate, using the weight similarity and the vector similarity, the total similarity between each candidate knowledge item and the question;

S105: obtain the candidate knowledge item whose total similarity satisfies the preset rule as the knowledge matched with the question.

In this application, the knowledge words are obtained as follows:

performing word segmentation on the candidate knowledge item;

removing the stop words from the word segmentation result, thereby obtaining the knowledge words of the candidate knowledge item.

In this application, the question words are obtained as follows:

performing word segmentation on the question.

In this application, a candidate knowledge item includes one standard question and, optionally, extended questions. An extended question is a different way of phrasing the standard question that expresses the same meaning. Take a banking example about how to handle a credit card: the candidate knowledge stored in the knowledge base relating to "how to handle a credit card" includes "credit card handling flow", "where can I handle a credit card", "steps for handling a credit card", and so on. One of these questions is taken as the standard question and the others as extended questions. In this embodiment, for example, the first question, "credit card handling flow", can be taken as the standard question and the remaining questions as its extended questions; in other embodiments another question can be designated as the standard question.

It should be noted that the standard question and the extended questions may take the form of semantic expressions or of concrete question sentences; both fall within the scope of protection of the invention.

In this application, the knowledge words of each candidate knowledge item are de-duplicated; that is, within the word segmentation result of a single candidate knowledge item, identical entries are counted only once. For example, for the candidate knowledge item "I want to handle a credit card, how do I handle a China Merchants Bank credit card", the word segmentation result is "I", "handle", "credit card", "I", "how", "handle", "China Merchants Bank", "credit card". Although the strings "I", "handle" and "credit card" each occur twice in this candidate knowledge item, its knowledge words are "I", "handle", "credit card", "how" and "China Merchants Bank".

Stop-word removal relies on a stop-word list built in advance. When stop words are removed, each entry in the word segmentation result is matched against the words in the stop-word list; if the entry appears in the list, it is deleted from the entry string produced by Chinese word segmentation.

The stop words referred to in this application are words without practical meaning, such as modal particles or structural particles, for example " ", " ", " ", " ", " ", etc.

The inventors found that removing stop words removes noise from the question, so the similarity between the candidate knowledge item and the question is more accurate. This improves the accuracy of knowledge-question matching and, in turn, the accuracy of the answers given by the intelligent question answering system.
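The de-duplication and stop-word removal described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `extract_words`, the English glosses of the entries, and the placeholder stop-word list are all assumptions, and the word segmentation itself is taken as already done.

```python
def extract_words(tokens, stop_words):
    """Drop stop words and repeated entries, keeping first-seen order."""
    seen = set()
    words = []
    for token in tokens:
        if token in stop_words or token in seen:
            continue
        seen.add(token)
        words.append(token)
    return words


# Hypothetical segmenter output for the de-duplication example (English glosses).
tokens = ["I", "handle", "credit card", "I", "how",
          "handle", "China Merchants Bank", "credit card"]
# Placeholder stop-word list; the real list is built in advance by the operator.
stop_words = {"of", "the"}

knowledge_words = extract_words(tokens, stop_words)
print(knowledge_words)
```

The same function could be applied to the segmented question, since the application states that questions and candidate knowledge items are preprocessed in the same way.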

In this application, steps S111 and S112 may be performed either after step S101 or before step S101.

In S101, obtaining the question sent by the client means obtaining the text of the question sent by the client.

In this application, as shown in Fig. 1, the order of steps S102 and S103 is not specifically limited; the two steps may be performed as S102-S103 or as S103-S102.

In this application, the weight similarity is the similarity between a knowledge item and the question calculated from the weights of the knowledge words and the question words.

In S102, as shown in Fig. 2, obtaining the weight similarity between each candidate knowledge item and the question using the knowledge words and question words comprises:

S201: obtain the weights of the knowledge words in the candidate knowledge item;

S202: assign weights to the question words in the question according to the preset weight assignment rule;

S203: calculate the weight similarity from the weights of the knowledge words and the weights of the question words.

In S201, obtaining the weights of the knowledge words in the candidate knowledge item comprises:

S2011: obtain the weight of each knowledge word, the weight of a knowledge word being its weight within the knowledge item it belongs to;

S2012: normalize the weight of each knowledge word.

In S2011, the weight of each knowledge word is obtained by the tf-idf (term frequency-inverse document frequency) method.

In S2012, the weight of each knowledge word is normalized so that its value range is [0, 1].

The inventors found that weights obtained by the tf-idf method span a wide range, for example [1, 4000]. If these raw values were used as the weights of the knowledge words, knowledge words with small weights would contribute approximately 0 to the similarity calculation. The calculated weight similarity would then differ greatly from the true similarity; that is, the weight similarity would be seriously distorted, and the answers given by the intelligent question answering system would have low accuracy.

By normalizing the weight of each knowledge word to the range [0, 1], this application preserves the distribution of the knowledge word weights while narrowing the gaps between them, so the similarity calculation between the candidate knowledge item and the question is more reasonable and accurate.
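The application does not state which normalization is used. The sketch below assumes one simple choice, dividing each raw weight by the sum of the raw weights of the same knowledge item, which keeps the relative distribution and places every weight in [0, 1]. The function name and the raw scores are hypothetical.

```python
def normalize_weights(raw_weights):
    """Scale the raw weights of one knowledge item so they lie in [0, 1] and sum to 1."""
    total = sum(raw_weights.values())
    if total == 0:
        # Degenerate case: no information in the raw scores.
        return {word: 0.0 for word in raw_weights}
    return {word: value / total for word, value in raw_weights.items()}


# Made-up raw tf-idf scores for one candidate knowledge item.
raw = {"credit card": 1200.0, "handle": 2400.0, "flow": 400.0}
normalized = normalize_weights(raw)
print(normalized)
```

Min-max scaling would be another option that also yields values in [0, 1]; it maps the smallest weight to exactly 0, which may or may not be desirable for the similarity calculation.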

In S202, the weight assignment rule is to determine whether the question word satisfies the condition for assigning a preset weight. If it does, the preset weight is assigned to the question word; if it does not, the weight of the question word is the average of the weights, across all candidate knowledge items, of the knowledge word identical to the question word.

In this application, the condition for assigning the preset weight is that the knowledge words do not include the question word.

In this application, all question words that satisfy the condition receive the same preset weight. For example, if the obtained question is "how do I handle a credit card", the result after word segmentation and stop-word removal is "I", "how", "handle", "credit card", while the knowledge words contain only "how", "handle" and "credit card". The knowledge words do not include the question word "I", so "I" is assigned the preset weight (for example, 0.2).

In S2023, segmenting different candidate knowledge items may yield the same knowledge word, and the weights of that knowledge word in different candidate knowledge items may or may not be equal. When they differ, assigning any single one of these weights to the question word would be one-sided and inaccurate, whereas the average weight of the knowledge word over all candidate knowledge items is generally representative and makes the knowledge weights and question weights more accurate.
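The assignment rule can be sketched as below. The function name, the data layout (one word-to-weight mapping per candidate knowledge item) and the sample weights are assumptions made for illustration; the average is taken over the candidate knowledge items that actually contain the word.

```python
from fractions import Fraction


def question_word_weights(question_words, knowledge_weights, preset_weight):
    """Assign a weight to every question word.

    knowledge_weights: list of dicts, one per candidate knowledge item,
    mapping each knowledge word to its normalized weight in that item.
    """
    result = {}
    for word in question_words:
        found = [item[word] for item in knowledge_weights if word in item]
        if not found:
            # The word is not a knowledge word anywhere: use the preset weight.
            result[word] = preset_weight
        else:
            # Otherwise average its weight over the items that contain it.
            result[word] = sum(found) / len(found)
    return result


# Made-up normalized weights for two candidate knowledge items.
items = [
    {"credit card": Fraction(1, 2), "handle": Fraction(1, 4), "flow": Fraction(1, 4)},
    {"credit card": Fraction(1, 4), "cancel": Fraction(1, 2), "flow": Fraction(1, 4)},
]
weights = question_word_weights(["I", "handle", "credit card"], items, Fraction(1, 5))
print(weights)
```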

In one embodiment of this application, the candidate knowledge items are "credit card handling flow", "credit card cancellation flow", "transit card handling flow" and "transit card cancellation flow", and the average weight of each knowledge word across the candidate knowledge items is obtained as follows:

The average weight of each knowledge word across the candidate knowledge items is:

In this application, before the weights of the question words are obtained in S202, the method further comprises:

S221: perform word segmentation on the question;

S222: remove the stop words from the word segmentation result, thereby obtaining the question words of the question.

In this application, the question is segmented in the same way as the candidate knowledge items.

In this application, stop words are removed from the question's word segmentation result in the same way as from the word segmentation result of the candidate knowledge items.

In S203, the weight similarity is calculated from the weights of the knowledge words and the weights of the question words using one of, or a combination of, Jaccard distance, Hamming distance and edit distance.

In one embodiment of this application, the weight similarity between the candidate knowledge item "credit card handling flow" and the question "how do I handle a credit card" is calculated as follows:

Let set A be the set of question words and their weights, and set B the set of knowledge words and their weights. Set A is "I", "how", "handle" and "credit card", with weights 1/5, 1/3, 1/3 and 1/3 respectively; set B is "credit card", "handle" and "flow", with weights 1/3, 1/3 and 1/3 respectively. The weight similarity between the question and the candidate knowledge item is:

$$\mathrm{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where $\mathrm{Jaccard}(A, B)$ is the weight similarity of sets A and B;

$|A \cap B|$ is the sum of the weights of the intersection of sets A and B;

$|A \cup B|$ is the sum of the weights of the union of sets A and B.

In this embodiment, the intersection of sets A and B is "handle" and "credit card", with weights 1/3 and 1/3; the union of sets A and B is "I", "how", "handle", "credit card" and "flow", with weights 1/5, 1/3, 1/3, 1/3 and 1/3 respectively;

Then $\mathrm{Jaccard}(A, B) = \dfrac{1/3 + 1/3}{1/5 + 1/3 + 1/3 + 1/3 + 1/3} = \dfrac{2/3}{23/15} = \dfrac{10}{23}$; that is, the weight similarity between the candidate knowledge item and the question is 10/23.
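The weighted Jaccard calculation of this embodiment can be sketched with exact fractions. The function name is hypothetical, and one detail is an assumption: when a word occurs in both sets, the sketch uses its question-side weight (in the embodiment both sides carry the same weight, so the choice does not matter there).

```python
from fractions import Fraction


def weighted_jaccard(question_weights, knowledge_weights):
    """Sum of weights over shared words divided by sum of weights over all words."""
    shared = question_weights.keys() & knowledge_weights.keys()
    union = question_weights.keys() | knowledge_weights.keys()

    def weight(word):
        # Assumption: a shared word takes its question-side weight.
        if word in question_weights:
            return question_weights[word]
        return knowledge_weights[word]

    denominator = sum(weight(w) for w in union)
    if denominator == 0:
        return Fraction(0)
    return Fraction(sum(weight(w) for w in shared)) / denominator


# Sets A (question) and B (candidate knowledge item) with the weights of the embodiment.
A = {"I": Fraction(1, 5), "how": Fraction(1, 3),
     "handle": Fraction(1, 3), "credit card": Fraction(1, 3)}
B = {"credit card": Fraction(1, 3), "handle": Fraction(1, 3), "flow": Fraction(1, 3)}

similarity = weighted_jaccard(A, B)
print(similarity)
```

Using `Fraction` avoids floating-point rounding, which makes it easy to check the hand calculation term by term.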

In S103, as shown in Fig. 3, obtaining the vector similarity between each candidate knowledge item and the question using the knowledge words and question words comprises:

S301: obtain the vector of the candidate knowledge item;

S302: obtain the vector of the question;

S303: calculate the vector similarity from the vector of the candidate knowledge item and the vector of the question.

In this application, the vector similarity is the similarity between a knowledge item and the question calculated from the vectors of the knowledge words and the question words.

In S301, obtaining the vector of the candidate knowledge item comprises:

S3011: obtain the word vector of each knowledge word, the word vector of a knowledge word being its word vector within the candidate knowledge item;

S3012: calculate the vector of the candidate knowledge item from the word vectors of the knowledge words.

In this application, the word vectors of the question words have the same dimension as the word vectors of the knowledge words.

Identical knowledge words in the knowledge base have identical word vectors. The word vectors of the knowledge words are obtained using either word2vec (word embedding) or one-hot encoding.

In this application, the vector of a candidate knowledge item is the average word vector of all its knowledge words, that is, the vector obtained by averaging all knowledge word vectors in each dimension. For example, for the candidate knowledge item "credit card handling flow", the word segmentation result is "credit card", "handle" and "flow", with the following vector representations:

"credit card": [8/10, 1/10, 1/10]

"handle": [3/10, 6/10, 1/10]

"flow": [4/10, 2/10, 4/10]

The vector of the candidate knowledge item is then [(8/10 + 3/10 + 4/10)/3, (1/10 + 6/10 + 2/10)/3, (1/10 + 1/10 + 4/10)/3] = [1/2, 3/10, 1/5].
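The per-dimension averaging above can be reproduced with a short sketch. The function name `mean_vector` is hypothetical; the word vectors are the three-dimensional values given in the example, and how such vectors are trained or looked up is outside the sketch.

```python
from fractions import Fraction as F


def mean_vector(word_vectors):
    """Average a list of equal-length word vectors dimension by dimension."""
    count = len(word_vectors)
    return [sum(column) / count for column in zip(*word_vectors)]


# Word vectors of "credit card", "handle" and "flow" from the example.
vectors = [
    [F(8, 10), F(1, 10), F(1, 10)],
    [F(3, 10), F(6, 10), F(1, 10)],
    [F(4, 10), F(2, 10), F(4, 10)],
]
knowledge_vector = mean_vector(vectors)
print(knowledge_vector)  # [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
```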

In S302, obtaining the vector of the question comprises:

S3021: obtain the word vector of each question word, the word vector of a question word being the same as the word vector of the identical knowledge word;

S3022: calculate the vector of the question from the word vectors of the knowledge words.

In this application, the vector of the question is obtained in the same way as the vector of the candidate knowledge item.

In S303, the vector similarity is calculated from the vector of the candidate knowledge item and the vector of the question using the cosine measure.

In one embodiment of this application, the calculation of the vector similarity is illustrated using the cosine measure. For example, to calculate the vector similarity between the candidate knowledge item "credit card handling flow" and the question "how do I handle a credit card", let the word vectors of the knowledge words and question words be three-dimensional. Performing word segmentation and stop-word removal on the candidate knowledge item gives:

"credit card": [8/10, 1/10, 1/10]

"handle": [3/10, 6/10, 1/10]

"flow": [4/10, 2/10, 4/10]

The vector of the candidate knowledge item, denoted A, is then [(8/10 + 3/10 + 4/10)/3, (1/10 + 6/10 + 2/10)/3, (1/10 + 1/10 + 4/10)/3] = [1/2, 3/10, 1/5]; that is, A = [1/2, 3/10, 1/5].

Performing word segmentation and stop-word removal on the question gives:

The vector of the question, denoted B, is then [(1/10 + 5/10 + 3/10 + 8/10)/4, (3/10 + 4/10 + 6/10 + 1/10)/4, (6/10 + 1/10 + 1/10 + 1/10)/4] = [17/40, 14/40, 9/40]; that is, B = [17/40, 14/40, 9/40].

The vector similarity between the candidate knowledge item and the question equals the cosine of the angle $\theta$ between A and B. Specifically, with A = [1/2, 3/10, 1/5] and B = [17/40, 14/40, 9/40]:

$$\cos\theta = \frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\frac{17}{40}\cdot\frac{1}{2} + \frac{14}{40}\cdot\frac{3}{10} + \frac{9}{40}\cdot\frac{1}{5}}{\sqrt{\left(\left(\frac{17}{40}\right)^2 + \left(\frac{14}{40}\right)^2 + \left(\frac{9}{40}\right)^2\right)\left(\left(\frac{1}{2}\right)^2 + \left(\frac{3}{10}\right)^2 + \left(\frac{1}{5}\right)^2\right)}} \approx \frac{98}{100}$$

That is, the vector similarity between the candidate knowledge item and the question is 98/100.
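The cosine calculation of this embodiment can be checked with a few lines of code. The function name is hypothetical, and returning 0.0 for a zero-length vector is an assumption added for robustness; the vectors A and B are the ones derived above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


A = [1 / 2, 3 / 10, 1 / 5]        # vector of the candidate knowledge item
B = [17 / 40, 14 / 40, 9 / 40]    # vector of the question
vector_similarity = cosine_similarity(A, B)
print(f"{vector_similarity:.4f}")
```

Evaluated in floating point, the value lies between 0.98 and 0.99, in line with the 98/100 stated in the embodiment.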

It is described to utilize the weight similarity and the vector similarity in the application S104, each is calculated respectively Total similarity of alternative knowledge and described problem, to seek the weight similarity of current alternative knowledge and described problem and described The linear weighted function sum of vector similarity, i.e. assign weight the first predetermined coefficient of similarity respectively, vector similarity second is preset Coefficient, calculates the product of weight similarity and the first predetermined coefficient respectively, and vector similarity and the second predetermined coefficient product, Total similarity is two sum of products.

In S104 of the application, the total similarity is calculated according to the following Formula I:

D_total = a*D_weight + b*D_vector    (Formula I)

where D_total denotes the total similarity,

D_weight denotes the weight similarity,

D_vector denotes the vector similarity,

a denotes the first preset coefficient,

b denotes the second preset coefficient,

and 0 < a < 1, a + b = 1.

In S105 of the application, the preset rule is to sort the total similarities of all alternative knowledge and the problem, and to select the alternative knowledge with the largest total similarity.
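The linear weighting of S104 and the selection rule of S105 can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function names, the knowledge labels and all similarity values are hypothetical, and a = 0.4 is used only as an example value satisfying 0 < a < 1.

```python
def total_similarity(d_weight, d_vector, a):
    """Linear weighted sum a*D_weight + b*D_vector with b = 1 - a."""
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    b = 1 - a
    return a * d_weight + b * d_vector


def best_match(scores, a):
    """Return the alternative knowledge with the largest total similarity.

    scores maps each alternative knowledge to a pair
    (weight similarity, vector similarity).
    """
    totals = {k: total_similarity(dw, dv, a) for k, (dw, dv) in scores.items()}
    return max(totals, key=totals.get), totals


# Hypothetical similarity values, for illustration only.
example_scores = {
    "knowledge_1": (0.50, 0.90),
    "knowledge_2": (0.20, 0.95),
    "knowledge_3": (0.10, 0.40),
}

winner, totals = best_match(example_scores, a=0.4)
print(winner, totals)
```

Because the selection only needs the maximum, taking `max` over the totals is equivalent to sorting them and picking the first entry.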

For a fuller understanding of the method by which the intelligent Answer System of the present invention extracts knowledge, a specific embodiment is described below.

The problem is "how I handle credit card", and the alternative knowledge is "credit card handles flow", "credit card nullifies flow", "mass transit card handles flow" and "mass transit card logout flow". The process of matching the alternative knowledge with the problem is then as follows:

(1) Word segmentation is performed on the alternative knowledge and the stop words are removed to obtain the knowledge words; tf-idf is then used to calculate the weight of each knowledge word in its own alternative knowledge, with the following result:

(2) According to the result of (1), the average weight of each knowledge word over the alternative knowledge is calculated as follows:

(3) Word segmentation is performed on the problem and the stop words are removed to obtain the problem words, and weights are assigned to the problem words according to the preset weight assignment rule. In the present embodiment the default weight is 1/5, and the weights assigned to the problem words are as follows:

Then the weight similarity of the alternative knowledge "credit card handles flow" and the problem "how I handle credit card" is: Jaccard(A, B) = |A intersect B| / |A union B| = (1/3+1/3)/(1/5+1/5+1/3+1/3+1/3) = (2/3)/(7/5) = 10/21, i.e. the weight similarity of the alternative knowledge and the problem is 10/21; the weight similarities of the remaining alternative knowledge and the problem are calculated in the same way, giving the results shown in Table 1 below;

(4) In the present embodiment, the term vectors of each knowledge word and problem word are set to be three-dimensional, and word2vec is used to obtain the term vector of each knowledge word and problem word. Taking the alternative knowledge "credit card handles flow" as an example, the vector similarity of the alternative knowledge and the problem is calculated as follows:

The term vectors of the knowledge words of the alternative knowledge are, in order:

Credit card: vector representation [8/10, 1/10, 1/10]

Handle: vector representation [3/10, 6/10, 1/10]

Flow: vector representation [4/10, 2/10, 4/10],

Specifically, A = [1/2, 3/10, 1/5], B = [17/40, 14/40, 9/40], so the similarity of A and B is:

cos θ = [(17/40*1/2) + (14/40*3/10) + (9/40*1/5)] / sqrt[(17/40*17/40 + 14/40*14/40 + 9/40*9/40) * (1/2*1/2 + 3/10*3/10 + 1/5*1/5)] ≈ 98/100,

That is, the vector similarity of the alternative knowledge and the problem is approximately 98/100; the vector similarities of the remaining alternative knowledge and the problem are calculated in the same way, giving the results shown in Table 1 below;

(5) For each alternative knowledge, the linear weighted sum of its weight similarity and vector similarity with the problem is calculated, where the first preset coefficient is a = 0.4 and the second preset coefficient is b = 0.6; the results are shown in Table 1.

Table 1. Similarity results of the alternative knowledge and the problem

(6) Comparing the "total similarity" values in Table 1, the alternative knowledge with the largest value is "credit card handles flow", i.e. the matching result is "credit card handles flow".
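The weight similarity of step (3) and the weighting of step (5) can be reproduced for the alternative knowledge "credit card handles flow" with the sketch below. The per-word weights are assumptions, since the weight tables are not reproduced here: each of the three knowledge words is taken to have weight 1/3, the two problem words that do not occur in the knowledge ("I" and "how") are taken to have the default weight 1/5, and the function name `weighted_jaccard` is illustrative. These assumptions yield the stated value 10/21.

```python
from fractions import Fraction as F


def weighted_jaccard(knowledge_weights, problem_weights):
    """Sum of weights of shared words divided by sum of weights of all words.

    When a word occurs on both sides, the knowledge-side weight is used
    (an assumption of this sketch).
    """
    merged = {**problem_weights, **knowledge_weights}
    shared = knowledge_weights.keys() & problem_weights.keys()
    return sum(merged[w] for w in shared) / sum(merged.values())


# Assumed word weights (the original weight tables are not available here).
knowledge_weights = {"credit card": F(1, 3), "handle": F(1, 3), "flow": F(1, 3)}
problem_weights = {
    "I": F(1, 5),
    "how": F(1, 5),
    "handle": F(1, 3),
    "credit card": F(1, 3),
}

d_weight = weighted_jaccard(knowledge_weights, problem_weights)

# Coefficients from step (5) and the vector similarity from step (4).
a, b = F(4, 10), F(6, 10)
d_vector = F(98, 100)
d_total = a * d_weight + b * d_vector
print(d_weight, float(d_total))
```

With these inputs the weight similarity is 10/21 and the total similarity for this one alternative knowledge is about 0.778. The sketch does not attempt to reproduce the other rows of Table 1.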

According to a second aspect of the application, a knowledge-problem matching device for an intelligent Answer System is also provided. As shown in Fig. 4, the device includes:

Problem acquiring unit 401, for obtaining the problem sent by the client;

Weight similarity acquiring unit 402, for obtaining, using the knowledge words and the problem words, the weight similarity of each alternative knowledge and the problem;

Vector similarity acquiring unit 403, for obtaining, using the knowledge words and the problem words, the vector similarity of each alternative knowledge and the problem;

Total similarity calculating unit 404, for calculating, using the weight similarity and the vector similarity, the total similarity of each alternative knowledge and the problem;

Knowledge-problem matching unit 405, for obtaining the alternative knowledge whose total similarity meets the preset rule, as the knowledge that matches the problem.

In this application, the knowledge words are obtained as follows:

Word segmentation is performed on the alternative knowledge;

In this application, the problem words are obtained as follows:

Word segmentation is performed on the problem;

In this application, as shown in Fig. 5, the weight similarity acquiring unit 402 includes:

Knowledge word weight acquisition subunit 4021, for obtaining the weights of the knowledge words in the alternative knowledge;

Problem word weight assignment subunit 4022, for assigning weights to the problem words in the problem according to the preset weight assignment rule;

Weight similarity calculation subunit 4023, for calculating the weight similarity using the weights of the knowledge words and the weights of the problem words.

In this application, the knowledge word weight acquisition subunit 4021 includes:

Common weight acquisition slave unit 40211, for obtaining the weight of each knowledge word, the weight of a knowledge word being the weight of that knowledge word in its own knowledge;

Normalization slave unit 40212, for normalizing the weight of each knowledge word.

In this application, in the problem word weight assignment subunit 4022, the weight assignment rule is as follows: it is judged whether the problem word meets the condition for assigning the default weight; if it does, the default weight is assigned to the problem word; if it does not, the weight of the problem word in the problem is the average of the weights, in each alternative knowledge, of all knowledge words identical to the problem word.
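The weight assignment rule of subunit 4022 can be sketched as follows. The condition for assigning the default weight is not spelled out in this passage, so the sketch assumes that a problem word receives the default weight when it occurs in none of the alternative knowledge. The function name, the word lists and the weights are hypothetical.

```python
from fractions import Fraction as F


def problem_word_weight(word, knowledge_weights_by_item, default_weight):
    """Assign a weight to one problem word.

    Assumption of this sketch: the word "meets the condition" for the
    default weight when no alternative knowledge contains it. Otherwise
    its weight is the average of the weights of the identical knowledge
    word over the alternative knowledge that contain it.
    """
    matches = [
        weights[word]
        for weights in knowledge_weights_by_item
        if word in weights
    ]
    if not matches:
        return default_weight
    return sum(matches) / len(matches)


# Hypothetical normalized knowledge-word weights for two alternative knowledge items.
alternatives = [
    {"credit card": F(1, 3), "handle": F(1, 3), "flow": F(1, 3)},
    {"credit card": F(1, 3), "nullify": F(1, 3), "flow": F(1, 3)},
]

problem_words = ["I", "how", "handle", "credit card"]
weights = {w: problem_word_weight(w, alternatives, F(1, 5)) for w in problem_words}
print(weights)
```

A different condition for the default weight would only change the `if not matches` test; the averaging branch follows the rule as stated.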

In this application, as shown in Fig. 6, the vector similarity acquiring unit 403 includes:

Knowledge vector acquisition subunit 4031, for obtaining the vector of the alternative knowledge;

Problem vector acquisition subunit 4032, for obtaining the vector of the problem;

Vector similarity calculation subunit 4033, for calculating the vector similarity using the vector of the alternative knowledge and the vector of the problem.

In this application, the knowledge vector acquisition subunit 4031 includes:

Knowledge word term vector acquisition slave unit 40311, for obtaining the term vectors of the knowledge words, the term vector of a knowledge word being the term vector of that knowledge word in the alternative knowledge;

Knowledge vector calculation slave unit 40312, for calculating the vector of the alternative knowledge using the term vectors of the knowledge words.

In this application, the problem vector acquisition subunit 4032 includes:

Problem word term vector acquisition slave unit 40321, for obtaining the term vectors of the problem words, the term vector of a problem word being the same as the term vector of the identical knowledge word;

Problem vector calculation slave unit 40322, for calculating the vector of the problem using the term vectors of the knowledge words.

In this application, the weight similarity is obtained using one of, or a combination of, Jaccard distance, Hamming distance and edit distance;

The vector similarity is obtained using the cosine method;

The total similarity of the alternative knowledge and the problem is the sum of the weight similarity and the vector similarity of the same alternative knowledge and the problem.

Fig. 7 shows a block diagram of a computer system 800 on which the embodiments can be implemented. Computer system 800 includes processor 810, storage medium 820, system memory 830, monitor 840, keyboard 850, mouse 860, network interface 820 and video adapter 880. These components are coupled by system bus 890.

Storage medium 820 (such as a hard disk) stores multiple programs, including an operating system, application programs and other program modules. A user can enter commands and information into computer system 800 through input devices such as keyboard 850, a touch pad (not shown) and mouse 860. Text and graphical information are displayed on monitor 840.

The operating system runs on processor 810 and is used to coordinate and provide control of the various components in the personal computer system 800 of Fig. 7. In addition, computer programs may be used in computer system 800 to implement the various embodiments described above.

It will be appreciated that the hardware components shown in Fig. 7 are for illustrative purposes only, and that the actual components may vary depending on the computing device deployed to implement the invention.

In addition, computer system 800 can be, for example, a desktop computer, a server computer, a laptop computer, or a wireless device such as a mobile phone, a personal digital assistant (PDA) or a handheld computer.

It will be appreciated that embodiments within the scope of the invention can be implemented in the form of a computer program product. The computer program product includes computer-executable instructions, such as program code, which can run in any appropriate computing environment in combination with an appropriate operating system, for example Microsoft Windows, Linux or a UNIX operating system. Embodiments within the scope of the invention can also include a program product that includes a computer-readable medium for carrying or storing computer-executable instructions or data structures. Such a computer-readable medium can be any available medium that can be accessed by a general-purpose or special-purpose computer. For example, such a computer-readable medium can include RAM, ROM, EPROM, EEPROM, CD-ROM, magnetic disk storage or other storage devices, or any other medium that can be used to carry or store the desired program code in the form of computer-executable instructions and that can be accessed by a general-purpose or special-purpose computer.

The method, device and system for extracting answers in an intelligent Answer System provided according to the present invention have the following beneficial effects:

(1) the weights are normalized, which makes the weight a more accurate evaluation factor;

(2) the two factors of weight and vector are combined to judge the similarity of the problem and the knowledge, which makes the similarity judgment more accurate;

(3) the work of manually correcting problems is reduced, saving enterprises a substantial amount of labor cost.

The present invention has been described in detail above with reference to specific embodiments and illustrative examples, but these descriptions are not to be construed as limiting the invention. Those skilled in the art will appreciate that, without departing from the spirit and scope of the invention, various equivalent substitutions, modifications or improvements can be made to the technical solution of the present invention and its embodiments, and all of these fall within the scope of the present invention. The protection scope of the present invention is determined by the appended claims.

Claims

1. A knowledge-problem matching method for an intelligent Answer System, characterized in that the method includes:

Obtaining the problem sent by the client;

Calculating, using the weight similarity and the vector similarity, the total similarity of each alternative knowledge and the problem;

2. The method according to claim 1, characterized in that

The knowledge words are obtained as follows:

Word segmentation is performed on the alternative knowledge;

The problem words are obtained as follows:

Word segmentation is performed on the problem;

3. The method according to claim 1 or 2, characterized in that obtaining, using the knowledge words and the problem words, the weight similarity of each alternative knowledge and the problem includes:

Obtaining the weights of the knowledge words in the alternative knowledge;

4. The method according to claim 3, characterized in that obtaining the weights of the knowledge words in the alternative knowledge includes:

Normalizing the weight of each knowledge word.

5. The method according to one of claims 1 to 4, characterized in that obtaining, using the knowledge words and the problem words, the vector similarity of each alternative knowledge and the problem includes:

Obtaining the vector of the alternative knowledge;

Obtaining the vector of the problem;

6. A knowledge-problem matching device for an intelligent Answer System, characterized in that the device includes:

A problem acquiring unit, for obtaining the problem sent by the client;

A weight similarity acquiring unit, for obtaining, using the knowledge words and the problem words, the weight similarity of each alternative knowledge and the problem;

A vector similarity acquiring unit, for obtaining, using the knowledge words and the problem words, the vector similarity of each alternative knowledge and the problem;

A total similarity calculating unit, for calculating, using the weight similarity and the vector similarity, the total similarity of each alternative knowledge and the problem;

A knowledge-problem matching unit, for obtaining the alternative knowledge whose total similarity meets the preset rule, as the knowledge that matches the problem.

7. The device according to claim 6, characterized in that

The knowledge words are obtained as follows:

Word segmentation is performed on the alternative knowledge;

The problem words are obtained as follows:

Word segmentation is performed on the problem;

8. The device according to claim 6 or 7, characterized in that the weight similarity acquiring unit includes:

A problem word weight assignment subunit, for assigning weights to the problem words in the problem according to the preset weight assignment rule;

A weight similarity calculation subunit, for calculating the weight similarity using the weights of the knowledge words and the weights of the problem words.

9. The device according to claim 8, characterized in that the knowledge word weight acquisition subunit includes:

A common weight acquisition slave unit, for obtaining the weight of each knowledge word, the weight of a knowledge word being the weight of that knowledge word in its own knowledge;

A normalization slave unit, for normalizing the weight of each knowledge word.

10. The device according to one of claims 6 to 9, characterized in that the vector similarity acquiring unit includes:

A vector similarity calculation subunit, for calculating the vector similarity using the vector of the alternative knowledge and the vector of the problem.