Summary of the Invention
This application provides a knowledge-problem matching method and device for an intelligent question answering system, to solve the problem that inaccurate matching between problems and knowledge in such systems leads to a low accuracy rate of the extracted answers.
The object of the invention is achieved through the following aspects:
In a first aspect, this application provides a knowledge-problem matching method for an intelligent question answering system, the method including:
Obtaining a problem sent by a client;
Obtaining, using knowledge words and problem words, the weight similarity between each piece of alternative knowledge and the problem;
Obtaining, using the knowledge words and the problem words, the vector similarity between each piece of alternative knowledge and the problem;
Calculating, using the weight similarity and the vector similarity, the total similarity between each piece of alternative knowledge and the problem;
Obtaining the alternative knowledge whose total similarity satisfies a preset rule as the knowledge matched with the problem.
Optionally, before obtaining the weight similarity between each piece of alternative knowledge and the problem, the method further includes:
Generating a knowledge base, the knowledge base containing at least one piece of the alternative knowledge;
Preprocessing the knowledge: performing word segmentation on the alternative knowledge and removing the stop words from the word segmentation result, thereby obtaining the knowledge words in the alternative knowledge.
Optionally, the knowledge words are obtained as follows:
Performing word segmentation on the alternative knowledge;
Removing the stop words from the word segmentation result, thereby obtaining the knowledge words in the alternative knowledge.
Optionally, the problem words are obtained as follows:
Performing word segmentation on the problem;
Removing the stop words from the word segmentation result, thereby obtaining the problem words in the problem.
Optionally, obtaining the weight similarity between each piece of alternative knowledge and the problem using the knowledge words and the problem words includes:
Obtaining the weights of the knowledge words in the alternative knowledge;
Assigning weights to the problem words in the problem according to a preset weight assignment rule;
Calculating the weight similarity using the weights of the knowledge words and the weights of the problem words.
Optionally, obtaining the weights of the knowledge words in the alternative knowledge includes:
Obtaining the weight of each knowledge word, the weight of a knowledge word being its weight within its own piece of knowledge;
Normalizing the weight of each knowledge word.
Optionally, the weight assignment rule is to judge whether a problem word satisfies the condition for assigning the preset weight; if it does, the preset weight is assigned to the problem word;
If it does not, the weight of the problem word is the average of the weights, across the pieces of alternative knowledge, of the knowledge word identical to that problem word;
The condition for assigning the preset weight is that the knowledge words do not include the problem word.
Optionally, obtaining the vector similarity between each piece of alternative knowledge and the problem using the knowledge words and the problem words includes:
Obtaining the vector of the alternative knowledge;
Obtaining the vector of the problem;
Calculating the vector similarity using the vector of the alternative knowledge and the vector of the problem.
Optionally, obtaining the vector of the alternative knowledge includes:
Obtaining the word vectors of the knowledge words, the word vector of a knowledge word being its word vector within the alternative knowledge;
Calculating the vector of the alternative knowledge using the word vectors of the knowledge words.
Optionally, obtaining the vector of the problem includes:
Obtaining the word vectors of the problem words, the word vector of a problem word being the same as the word vector of the identical knowledge word;
Calculating the vector of the problem using the word vectors of the knowledge words.
Optionally, the weight similarity is obtained using one of, or a combination of, Jaccard distance, Hamming distance and edit distance;
The vector similarity is obtained using the cosine method;
The total similarity between a piece of alternative knowledge and the problem is the linear weighted sum of the weight similarity and the vector similarity between that same piece of alternative knowledge and the problem.
Optionally, the preset rule is to sort the total similarities between all pieces of alternative knowledge and the problem and select the one with the largest total similarity.
In the knowledge-problem matching process of the intelligent question answering system, this application combines two similarity evaluation schemes, weight similarity and vector similarity, which compensates for the systematic error of any single similarity evaluation method. Moreover, before the weight similarity and the vector similarity are calculated, the word segmentation result is preprocessed to remove stop words, which reduces the rate of false matches. In addition, the weights of the knowledge words obtained after preprocessing are normalized so that their value range is [0, 1], which reduces the deviation in the weight similarity calculation caused by large differences between knowledge word weights. The weight similarity between the problem and the alternative knowledge is therefore more accurate, which improves the accuracy of the total similarity and, in turn, the accuracy of knowledge-problem matching in the intelligent question answering system.
In a second aspect, this application further provides a knowledge-problem matching device for an intelligent question answering system, the device including:
A problem acquiring unit, configured to obtain a problem sent by a client;
A weight similarity acquiring unit, configured to obtain, using knowledge words and problem words, the weight similarity between each piece of alternative knowledge and the problem;
A vector similarity acquiring unit, configured to obtain, using the knowledge words and the problem words, the vector similarity between each piece of alternative knowledge and the problem;
A total similarity calculating unit, configured to calculate, using the weight similarity and the vector similarity, the total similarity between each piece of alternative knowledge and the problem;
A knowledge-problem matching unit, configured to obtain the alternative knowledge whose total similarity satisfies a preset rule as the knowledge matched with the problem.
Optionally, the knowledge words are obtained as follows:
Performing word segmentation on the alternative knowledge;
Removing the stop words from the word segmentation result, thereby obtaining the knowledge words in the alternative knowledge.
Optionally, the problem words are obtained as follows:
Performing word segmentation on the problem;
Removing the stop words from the word segmentation result, thereby obtaining the problem words in the problem.
Optionally, the weight similarity acquiring unit includes:
A knowledge word weight acquisition subunit, configured to obtain the weights of the knowledge words in the alternative knowledge;
A problem word weight assignment subunit, configured to assign weights to the problem words in the problem according to a preset weight assignment rule;
A weight similarity calculation subunit, configured to calculate the weight similarity using the weights of the knowledge words and the weights of the problem words.
Optionally, the knowledge word weight acquisition subunit includes:
An ordinary weight acquisition secondary unit, configured to obtain the weight of each knowledge word, the weight of a knowledge word being its weight within its own piece of knowledge;
A normalization secondary unit, configured to normalize the weight of each knowledge word.
Optionally, in the problem word weight assignment subunit, the weight assignment rule is to judge whether a problem word satisfies the condition for assigning the preset weight; if it does, the preset weight is assigned to the problem word; if it does not, the weight of the problem word is the average of the weights, across the pieces of alternative knowledge, of the knowledge word identical to that problem word;
The condition for assigning the preset weight is that the knowledge words do not include the problem word.
Optionally, the vector similarity acquiring unit includes:
A knowledge vector acquisition subunit, configured to obtain the vector of the alternative knowledge;
A problem vector acquisition subunit, configured to obtain the vector of the problem;
A vector similarity calculation subunit, configured to calculate the vector similarity using the vector of the alternative knowledge and the vector of the problem.
Optionally, the knowledge vector acquisition subunit includes:
A knowledge word vector acquisition secondary unit, configured to obtain the word vectors of the knowledge words, the word vector of a knowledge word being its word vector within the alternative knowledge;
A knowledge vector calculation secondary unit, configured to calculate the vector of the alternative knowledge using the word vectors of the knowledge words.
Optionally, the problem vector acquisition subunit includes:
A problem word vector acquisition secondary unit, configured to obtain the word vectors of the problem words, the word vector of a problem word being the same as the word vector of the identical knowledge word;
A problem vector calculation secondary unit, configured to calculate the vector of the problem using the word vectors of the knowledge words.
Optionally, the weight similarity is obtained using one of, or a combination of, Jaccard distance, Hamming distance and edit distance;
The vector similarity is obtained using the cosine method;
The total similarity between a piece of alternative knowledge and the problem is the linear weighted sum of the weight similarity and the vector similarity between that same piece of alternative knowledge and the problem;
The preset rule is to sort the total similarities between all pieces of alternative knowledge and the problem and select the one with the largest total similarity.
Detailed Description
The present invention is described in detail below; its features and advantages will become clearer from this description.
According to the first aspect of this application, a knowledge-problem matching method for an intelligent question answering system is provided. As shown in Fig. 1, the method includes:
S101: obtaining a problem sent by a client;
S102: obtaining, using knowledge words and problem words, the weight similarity between each piece of alternative knowledge and the problem;
In this application, the knowledge words are the word segmentation result of the alternative knowledge, and the problem words are the word segmentation result of the problem sent by the client.
S103: obtaining, using the knowledge words and the problem words, the vector similarity between each piece of alternative knowledge and the problem;
S104: calculating, using the weight similarity and the vector similarity, the total similarity between each piece of alternative knowledge and the problem;
S105: obtaining the alternative knowledge whose total similarity satisfies a preset rule as the knowledge matched with the problem.
In this application, the knowledge words are obtained as follows:
Performing word segmentation on the alternative knowledge;
Removing the stop words from the word segmentation result, thereby obtaining the knowledge words in the alternative knowledge.
In this application, the problem words are obtained as follows:
Performing word segmentation on the problem;
Removing the stop words from the word segmentation result, thereby obtaining the problem words in the problem.
In this application, the alternative knowledge includes one standard question and optional extended questions, where an extended question is a different wording of the standard question that expresses the same meaning. Taking the banking question of how to handle a credit card as an example, the alternative knowledge related to "how to handle a credit card" stored in the knowledge base includes "credit card handling flow", "where can I handle a credit card", "credit card handling steps" and so on. One of these questions is taken as the standard question and the other three as extended questions. In this embodiment, for example, the first question, "credit card handling flow", may be taken as the standard question and the three other questions as its extended questions; in other embodiments another question may be designated as the standard question.
It should be noted that the standard question and the extended questions may each take the form of a semantic expression or of a concrete question sentence; both fall within the protection scope of the present invention.
In this application, the knowledge words of each piece of alternative knowledge are deduplicated, that is, within the word segmentation result of one piece of alternative knowledge, identical entries are counted as a single entry. For example, if the alternative knowledge is "I want to handle a credit card, how do I handle a China Merchants Bank credit card", the word segmentation result is "I", "handle", "credit card", "I", "how", "handle", "China Merchants Bank", "credit card". Although "I", "handle" and "credit card" each appear twice in this piece of alternative knowledge, its knowledge words are "I", "handle", "credit card", "how" and "China Merchants Bank".
Stop-word removal works by building a stop-word list in advance. When stop words are removed, each entry in the word segmentation result is matched against the words in the stop-word list; if the entry is in the list, it is deleted from the entry string produced by Chinese word segmentation.
Stop words herein are words without practical meaning, such as modal particles or structural particles.
The inventors found that removing stop words removes the noise in the problem, so that the similarity between the alternative knowledge and the problem is more accurate, which improves the accuracy of knowledge-problem matching and, in turn, the accuracy of the answers given by the intelligent question answering system.
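The deduplication and stop-word removal described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the function name `knowledge_words` and the contents of `STOP_WORDS` are hypothetical, and the word segmentation result is assumed to be supplied by a separate segmenter.

```python
def knowledge_words(tokens, stop_words):
    """Drop stop words and duplicates, keeping first-occurrence order."""
    seen = set()
    result = []
    for token in tokens:
        if token in stop_words or token in seen:
            continue  # skip stop words and entries already counted once
        seen.add(token)
        result.append(token)
    return result


# Word segmentation result of the example above (segmenter output assumed given).
tokens = ["I", "handle", "credit card", "I", "how", "handle",
          "China Merchants Bank", "credit card"]
STOP_WORDS = {"the", "of"}  # hypothetical stop-word list, for illustration only

print(knowledge_words(tokens, STOP_WORDS))
```

Keeping first-occurrence order is a design choice of the sketch; the method itself only requires that identical entries be counted once.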
In this application, step S111 and step S112 may be performed either before or after step S101.
In S101 of this application, obtaining the problem sent by the client means obtaining the text of the problem sent by the client.
In this application, as shown in Fig. 1, the order of step S102 and step S103 is not specifically limited; the two steps may be performed in either order, that is, S102-S103 or S103-S102.
In this application, the weight similarity refers to the similarity between the knowledge and the problem calculated from the weights of the knowledge words and the problem words.
In S102 of this application, as shown in Fig. 2, obtaining the weight similarity between each piece of alternative knowledge and the problem using the knowledge words and the problem words includes:
S201: obtaining the weights of the knowledge words in the alternative knowledge;
S202: assigning weights to the problem words in the problem according to a preset weight assignment rule;
S203: calculating the weight similarity using the weights of the knowledge words and the weights of the problem words.
In S201 of this application, obtaining the weights of the knowledge words in the alternative knowledge includes:
S2011: obtaining the weight of each knowledge word, the weight of a knowledge word being its weight within its own piece of knowledge;
S2012: normalizing the weight of each knowledge word.
In S2011 of this application, the weight of each knowledge word is obtained by the tf-idf (term frequency-inverse document frequency) method.
In S2012 of this application, the weight of each knowledge word is normalized so that its value range is [0, 1].
The inventors found that the weights obtained by the tf-idf method span a wide range, for example [1, 4000]. If each knowledge word is weighted with this raw value, the knowledge words with small weights contribute approximately 0 when the similarity is calculated, so the calculated weight similarity differs greatly from the true similarity, that is, the weight similarity is severely distorted, which in turn lowers the accuracy of the answers given by the intelligent question answering system.
This application normalizes the weight of each knowledge word so that its value range is [0, 1], which both preserves the distribution of the knowledge word weights and narrows the gaps between them, making the similarity calculation between the alternative knowledge and the problem more reasonable and accurate.
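The normalization step can be illustrated with a short sketch. The text does not specify the normalization formula, so dividing each weight by the sum of the weights within one piece of alternative knowledge is an assumption made here; it maps every weight into [0, 1] and preserves the relative proportions. The function name and the raw scores are hypothetical.

```python
def normalize_weights(raw_weights):
    """Scale the raw weights of one piece of knowledge into [0, 1].

    Assumption: sum-normalization (each weight divided by the total),
    chosen for illustration because the text does not name a formula.
    """
    total = sum(raw_weights.values())
    if total == 0:
        return {word: 0.0 for word in raw_weights}
    return {word: weight / total for word, weight in raw_weights.items()}


# Made-up raw tf-idf scores for one piece of alternative knowledge.
raw = {"credit card": 1200.0, "handle": 1200.0, "flow": 1200.0}
normalized = normalize_weights(raw)
print(normalized)
```

Min-max scaling or division by the largest weight would also yield values in [0, 1]; which variant is intended is not stated in the text.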
In S202 of this application, the weight assignment rule is to judge whether a problem word satisfies the condition for assigning the preset weight; if it does, the preset weight is assigned to the problem word; if it does not, the weight of the problem word is the average of the weights, across the pieces of alternative knowledge, of the knowledge word identical to that problem word.
In this application, the condition for assigning the preset weight is that the knowledge words do not include the problem word.
In this application, all problem words that satisfy the condition for assigning the preset weight receive the same preset weight. For example, if the obtained problem is "how do I handle a credit card", the result after word segmentation and stop-word removal is "I", "how", "handle", "credit card", while the knowledge words contain only "how", "handle" and "credit card", that is, the knowledge words do not include the problem word "I"; the problem word "I" is therefore assigned the preset weight (for example, 0.2).
In S2023 of this application, word segmentation of different pieces of alternative knowledge may yield the same knowledge word, and the weight of that knowledge word may be the same or different in different pieces of alternative knowledge. When the weights differ, assigning any single one of them to the problem word would be one-sided and inaccurate, whereas the average weight of the knowledge word over all pieces of alternative knowledge is generally representative and keeps the knowledge weights and the problem weights consistent and accurate.
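The weight assignment rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the average is taken over the pieces of alternative knowledge that contain the word, and every knowledge word is given the normalized weight 1/3, which is an assumed value used only for the example.

```python
from fractions import Fraction


def problem_word_weights(problem_words, knowledge_weights, preset_weight):
    """Assign a weight to each problem word.

    knowledge_weights: one {knowledge word: normalized weight} dict per
    piece of alternative knowledge.
    """
    weights = {}
    for word in problem_words:
        found = [kw[word] for kw in knowledge_weights if word in kw]
        if not found:
            # The knowledge words do not include this problem word.
            weights[word] = preset_weight
        else:
            # Average weight over the pieces of knowledge containing it.
            weights[word] = sum(found) / len(found)
    return weights


third = Fraction(1, 3)  # assumed normalized weight of every knowledge word
knowledge_weights = [
    {"credit card": third, "handle": third, "flow": third},
    {"credit card": third, "cancel": third, "flow": third},
    {"bus card": third, "handle": third, "flow": third},
    {"bus card": third, "cancel": third, "flow": third},
]
result = problem_word_weights(
    ["I", "how", "handle", "credit card"], knowledge_weights, Fraction(1, 5)
)
print(result)
```

With these assumed inputs, "I" and "how" receive the preset weight because neither occurs among the knowledge words, while "handle" and "credit card" receive their average knowledge weight.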
In one embodiment of this application, the alternative knowledge is "credit card handling flow", "credit card cancellation flow", "bus card handling flow" and "bus card cancellation flow"; the average weight of each knowledge word across the pieces of alternative knowledge is then obtained as follows:
where the average weights of the knowledge words across the pieces of alternative knowledge are respectively:
In this application, before the weights of the problem words in the problem are obtained in S202, the method further includes:
S221: performing word segmentation on the problem;
S222: removing the stop words from the word segmentation result, thereby obtaining the problem words in the problem.
In this application, word segmentation is performed on the problem in the same way as on the alternative knowledge.
In this application, stop words are removed from the word segmentation result of the problem in the same way as from the word segmentation result of the alternative knowledge.
In S203 of this application, the weight similarity is calculated from the weights of the knowledge words and the weights of the problem words using one of, or a combination of, Jaccard distance, Hamming distance and edit distance.
In one embodiment of this application, the weight similarity between the alternative knowledge "credit card handling flow" and the problem "how do I handle a credit card" is calculated as follows:
Let set A be the set of the problem words and their weights, and set B the set of the knowledge words and their weights. Set A is "I", "how", "handle" and "credit card", with weights 1/5, 1/5, 1/3 and 1/3 respectively ("I" and "how" are not knowledge words and therefore take the preset weight 1/5); set B is "credit card", "handle" and "flow", with weights 1/3, 1/3 and 1/3 respectively. The weight similarity between the problem and the alternative knowledge is:
Jaccard(A, B) = |A intersect B| / |A union B|
where Jaccard(A, B) denotes the weight similarity of sets A and B;
|A intersect B| denotes the sum of the weights in the intersection of sets A and B;
|A union B| denotes the sum of the weights in the union of sets A and B.
In this embodiment, the intersection of A and B is "handle" and "credit card", with weights 1/3 and 1/3; the union of A and B is "I", "how", "handle", "credit card" and "flow", with weights 1/5, 1/5, 1/3, 1/3 and 1/3.
Then Jaccard(A, B) = (1/3+1/3)/(1/5+1/5+1/3+1/3+1/3) = (2/3)/(7/5) = 10/21,
that is, the weight similarity between the alternative knowledge and the problem is 10/21.
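The weighted Jaccard calculation of this embodiment can be checked with exact fractions. The sketch below is illustrative only: the function name is hypothetical, and taking the smaller weight for the intersection and the larger weight for the union when a word has different weights in A and B is an assumption, since the text only shows a case in which the shared words have equal weights. The problem words "I" and "how" are given the preset weight 1/5 because they are not knowledge words, which is the weighting that reproduces the stated result of 10/21.

```python
from fractions import Fraction


def weighted_jaccard(a, b):
    """Weight sum of the intersection divided by weight sum of the union."""
    words = set(a) | set(b)
    # A word missing from one set contributes 0 to the intersection
    # and its single weight to the union.
    intersection = sum(min(a.get(w, 0), b.get(w, 0)) for w in words)
    union = sum(max(a.get(w, 0), b.get(w, 0)) for w in words)
    return Fraction(intersection) / union if union else Fraction(0)


# Problem words (set A) and knowledge words (set B) with their weights.
A = {"I": Fraction(1, 5), "how": Fraction(1, 5),
     "handle": Fraction(1, 3), "credit card": Fraction(1, 3)}
B = {"credit card": Fraction(1, 3), "handle": Fraction(1, 3),
     "flow": Fraction(1, 3)}

print(weighted_jaccard(A, B))
```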
In S103 of this application, as shown in Fig. 3, obtaining the vector similarity between each piece of alternative knowledge and the problem using the knowledge words and the problem words includes:
S301: obtaining the vector of the alternative knowledge;
S302: obtaining the vector of the problem;
S303: calculating the vector similarity using the vector of the alternative knowledge and the vector of the problem.
In this application, the vector similarity refers to the similarity between the knowledge and the problem calculated from the vectors of the knowledge words and the problem words.
In S301 of this application, obtaining the vector of the alternative knowledge includes:
S3011: obtaining the word vectors of the knowledge words, the word vector of a knowledge word being its word vector within the alternative knowledge;
S3012: calculating the vector of the alternative knowledge using the word vectors of the knowledge words.
In this application, the word vectors of the problem words have the same dimension as the word vectors of the knowledge words.
Identical knowledge words in the knowledge base have identical word vectors, and the word vectors of the knowledge words are obtained by either word2vec (word to vector, that is, word embedding) or one-hot (one-hot encoding).
In this application, the vector of a piece of alternative knowledge is the average word vector of all knowledge words in that alternative knowledge, that is, the vector obtained by averaging all knowledge word vectors in each dimension. For example, for the alternative knowledge "credit card handling flow", the word segmentation result is "credit card", "handle" and "flow", whose vector representations are respectively:
Credit card: vector representation [8/10, 1/10, 1/10]
Handle: vector representation [3/10, 6/10, 1/10]
Flow: vector representation [4/10, 2/10, 4/10]
The vector of the alternative knowledge is then [(8/10+3/10+4/10)/3, (1/10+6/10+2/10)/3, (1/10+1/10+4/10)/3] = [1/2, 3/10, 1/5].
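The per-dimension averaging above can be reproduced with a few lines of code. The function name `average_vector` is hypothetical, and the three word vectors are the illustrative values given in the text rather than real word2vec output.

```python
from fractions import Fraction as F


def average_vector(word_vectors):
    """Average a list of equal-length word vectors dimension by dimension."""
    count = len(word_vectors)
    # zip(*...) groups the values of each dimension together.
    return [sum(dimension) / count for dimension in zip(*word_vectors)]


word_vectors = [
    [F(8, 10), F(1, 10), F(1, 10)],  # "credit card"
    [F(3, 10), F(6, 10), F(1, 10)],  # "handle"
    [F(4, 10), F(2, 10), F(4, 10)],  # "flow"
]
knowledge_vector = average_vector(word_vectors)
print(knowledge_vector)
```

Exact fractions are used so that the result can be compared directly with the values written out in the text.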
In S302 of this application, obtaining the vector of the problem includes:
S3021: obtaining the word vectors of the problem words, the word vector of a problem word being the same as the word vector of the identical knowledge word;
S3022: calculating the vector of the problem using the word vectors of the knowledge words.
In this application, the vector of the problem is obtained in the same way as the vector of the alternative knowledge.
In S303 of this application, the vector similarity is calculated from the vector of the alternative knowledge and the vector of the problem using the cosine method.
In one embodiment of this application, the cosine method is used as an example to illustrate how the vector similarity is obtained. For example, to calculate the vector similarity between the alternative knowledge "credit card handling flow" and the problem "how do I handle a credit card", the word vectors of the knowledge words and the problem words are set to be three-dimensional, and word segmentation and stop-word removal are performed on the alternative knowledge, giving:
Credit card: vector representation [8/10, 1/10, 1/10]
Handle: vector representation [3/10, 6/10, 1/10]
Flow: vector representation [4/10, 2/10, 4/10]
The vector of the alternative knowledge, denoted A, is then [(8/10+3/10+4/10)/3, (1/10+6/10+2/10)/3, (1/10+1/10+4/10)/3] = [1/2, 3/10, 1/5], that is, A = [1/2, 3/10, 1/5];
Word segmentation and stop-word removal are performed on the problem, giving:
I: vector representation [1/10, 3/10, 6/10]
How: vector representation [5/10, 4/10, 1/10]
Handle: vector representation [3/10, 6/10, 1/10]
Credit card: vector representation [8/10, 1/10, 1/10]
The vector of the problem, denoted B, is then [(1/10+5/10+3/10+8/10)/4, (3/10+4/10+6/10+1/10)/4, (6/10+1/10+1/10+1/10)/4] = [17/40, 14/40, 9/40], that is, B = [17/40, 14/40, 9/40];
The vector similarity between the alternative knowledge and the problem equals the cosine of the angle θ between A and B:
$\cos\theta = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
Specifically, with A = [1/2, 3/10, 1/5] and B = [17/40, 14/40, 9/40], the similarity of A and B is
cos θ = [(17/40*1/2)+(14/40*3/10)+(9/40*1/5)] / sqrt[(17/40*17/40+14/40*14/40+9/40*9/40)*(1/2*1/2+3/10*3/10+1/5*1/5)] ≈ 98/100,
that is, the vector similarity between the alternative knowledge and the problem is approximately 98/100.
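The cosine calculation of this embodiment can be sketched as follows. The function name is hypothetical and floating-point arithmetic is used for the square root; evaluated numerically, the expression comes to roughly 0.989, in line with the value of about 98/100 given in the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for a zero vector
    return dot / (norm_u * norm_v)


A = [1 / 2, 3 / 10, 1 / 5]        # vector of the alternative knowledge
B = [17 / 40, 14 / 40, 9 / 40]    # vector of the problem
similarity = cosine_similarity(A, B)
print(round(similarity, 4))
```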
In S104 of this application, calculating the total similarity between each piece of alternative knowledge and the problem using the weight similarity and the vector similarity means taking the linear weighted sum of the weight similarity and the vector similarity between the current piece of alternative knowledge and the problem. That is, the weight similarity is assigned a first preset coefficient and the vector similarity a second preset coefficient; the product of the weight similarity and the first preset coefficient and the product of the vector similarity and the second preset coefficient are calculated, and the total similarity is the sum of the two products.
In S104 of this application, the total similarity is calculated according to Formula I:
$D_{\text{total}} = a \cdot D_{\text{weight}} + b \cdot D_{\text{vector}}$    (Formula I)
where $D_{\text{total}}$ denotes the total similarity,
$D_{\text{weight}}$ denotes the weight similarity,
$D_{\text{vector}}$ denotes the vector similarity,
$a$ denotes the first preset coefficient,
$b$ denotes the second preset coefficient,
and $0 < a < 1$, $a + b = 1$.
In S105 of this application, the preset rule is to sort the total similarities between all pieces of alternative knowledge and the problem and select the one with the largest total similarity.
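The linear weighted sum and the selection rule can be sketched together. This is an illustration under stated assumptions, not the implementation of this application: the function names are hypothetical, the coefficient a = 0.4 (so b = 0.6) is the value used in the specific embodiment, and the similarity pairs in `scores` are made-up placeholder values.

```python
def total_similarity(weight_sim, vector_sim, a=0.4):
    """Linear weighted sum a * D_weight + b * D_vector with b = 1 - a."""
    if not 0 < a < 1:
        raise ValueError("the first preset coefficient must satisfy 0 < a < 1")
    b = 1 - a
    return a * weight_sim + b * vector_sim


def best_match(scores, a=0.4):
    """Return the alternative knowledge with the largest total similarity.

    scores: {alternative knowledge: (weight similarity, vector similarity)}
    """
    ranked = sorted(
        scores,
        key=lambda knowledge: total_similarity(*scores[knowledge], a=a),
        reverse=True,
    )
    return ranked[0]


# Placeholder similarity pairs, for illustration only.
scores = {
    "credit card handling flow": (0.48, 0.98),
    "credit card cancellation flow": (0.20, 0.90),
    "bus card handling flow": (0.20, 0.85),
}
print(best_match(scores))
```

Sorting the full list mirrors the rule as worded; taking the maximum directly would select the same item.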
For a fuller understanding of how the intelligent question answering system of the present invention extracts knowledge, a specific embodiment is described below.
The problem is "how do I handle a credit card", and the alternative knowledge is "credit card handling flow", "credit card cancellation flow", "bus card handling flow" and "bus card cancellation flow". The process of matching the alternative knowledge with the problem is:
(1) Word segmentation is performed on the alternative knowledge and the stop words are removed to obtain the knowledge words; tf-idf is then used to calculate the weight of each knowledge word within its own piece of alternative knowledge, with the following result:
(2) From the result of (1), the average weight of each knowledge word across the pieces of alternative knowledge is calculated as:
(3) Word segmentation is performed on the problem and the stop words are removed to obtain the problem words, and weights are assigned to the problem words according to the preset weight assignment rule. In this embodiment the preset weight is 1/5, and the weights assigned to the problem words are as follows:
The weight similarity between the alternative knowledge "credit card handling flow" and the problem "how do I handle a credit card" is then: Jaccard(A, B) = |A intersect B| / |A union B| = (1/3+1/3)/(1/5+1/5+1/3+1/3+1/3) = (2/3)/(7/5) = 10/21, that is, the weight similarity between this alternative knowledge and the problem is 10/21. The weight similarities between the remaining pieces of alternative knowledge and the problem are calculated in the same way, giving the results shown in Table 1 below;
(4) In this embodiment, the word vectors of the knowledge words and the problem words are set to be three-dimensional and are obtained with word2vec. Taking the alternative knowledge "credit card handling flow" as an example, the vector similarity between the alternative knowledge and the problem is calculated as follows:
The knowledge word vectors of the alternative knowledge are, in order:
Credit card: vector representation [8/10, 1/10, 1/10]
Handle: vector representation [3/10, 6/10, 1/10]
Flow: vector representation [4/10, 2/10, 4/10],
The vector of the alternative knowledge, denoted A, is then [(8/10+3/10+4/10)/3, (1/10+6/10+2/10)/3, (1/10+1/10+4/10)/3] = [1/2, 3/10, 1/5], that is, A = [1/2, 3/10, 1/5];
Word segmentation and stop-word removal are performed on the problem, giving:
I: vector representation [1/10, 3/10, 6/10]
How: vector representation [5/10, 4/10, 1/10]
Handle: vector representation [3/10, 6/10, 1/10]
Credit card: vector representation [8/10, 1/10, 1/10]
The vector of the problem, denoted B, is then [(1/10+5/10+3/10+8/10)/4, (3/10+4/10+6/10+1/10)/4, (6/10+1/10+1/10+1/10)/4] = [17/40, 14/40, 9/40], that is, B = [17/40, 14/40, 9/40];
The vector similarity between the alternative knowledge and the problem equals the cosine of the angle θ between A and B:
$\cos\theta = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
Specifically, with A = [1/2, 3/10, 1/5] and B = [17/40, 14/40, 9/40], the similarity of A and B is
cos θ = [(17/40*1/2)+(14/40*3/10)+(9/40*1/5)] / sqrt[(17/40*17/40+14/40*14/40+9/40*9/40)*(1/2*1/2+3/10*3/10+1/5*1/5)] ≈ 98/100,
that is, the vector similarity between the alternative knowledge and the problem is approximately 98/100. The vector similarities between the remaining pieces of alternative knowledge and the problem are calculated in the same way, giving the results shown in Table 1 below;
(5) The linear weighted sum of the weight similarity and the vector similarity between each piece of alternative knowledge and the problem is calculated, where the first preset coefficient is a = 0.4 and the second preset coefficient is b = 0.6; the results are shown in Table 1.
Table 1. Similarity results of the alternative knowledge and the problem
(6) Comparing the "total similarity" values in Table 1, the alternative knowledge with the largest value is "credit card handling flow", that is, the matching result is "credit card handling flow".
According to the second aspect of this application, a knowledge-problem matching device for an intelligent question answering system is further provided. As shown in Fig. 4, the device includes:
A problem acquiring unit 401, configured to obtain a problem sent by a client;
A weight similarity acquiring unit 402, configured to obtain, using knowledge words and problem words, the weight similarity between each piece of alternative knowledge and the problem;
A vector similarity acquiring unit 403, configured to obtain, using the knowledge words and the problem words, the vector similarity between each piece of alternative knowledge and the problem;
A total similarity calculating unit 404, configured to calculate, using the weight similarity and the vector similarity, the total similarity between each piece of alternative knowledge and the problem;
A knowledge-problem matching unit 405, configured to obtain the alternative knowledge whose total similarity satisfies a preset rule as the knowledge matched with the problem.
In this application, the knowledge words are prepared as follows:
Word segmentation is performed on the candidate knowledge;
The stop words are removed from the segmentation result, thereby obtaining the knowledge words in the candidate knowledge;
In this application, the question words are prepared as follows:
Word segmentation is performed on the question;
The stop words are removed from the segmentation result, thereby obtaining the question words in the question.
In this application, as shown in Fig. 5, the weight similarity acquiring unit 402 includes:
A knowledge word weight acquiring subunit 4021, configured to obtain the weights of the knowledge words in the candidate knowledge;
A question word weight assigning subunit 4022, configured to assign weights to the question words in the question according to a preset weight assigning rule;
A weight similarity calculating subunit 4023, configured to calculate the weight similarity using the weights of the knowledge words and the weights of the question words.
In this application, the knowledge word weight acquiring subunit 4021 includes:
An ordinary weight acquiring sub-subunit 40211, configured to obtain the weight of each knowledge word, the weight of a knowledge word being the weight of that knowledge word within the knowledge item it belongs to;
A normalization sub-subunit 40212, configured to normalize the weight of each knowledge word.
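This passage does not state the normalization formula. As a minimal sketch, assuming that each weight is divided by the sum of the weights within the same knowledge item, the normalization could look as follows (the function name `normalize_weights` is hypothetical):

```python
def normalize_weights(weights):
    """Scale the word weights of one knowledge item so that they sum to 1.

    Assumption: sum normalization. The application may use another scheme.
    """
    total = sum(weights.values())
    if total == 0:
        return dict(weights)  # nothing to scale
    return {word: weight / total for word, weight in weights.items()}


# Hypothetical raw weights for the knowledge words of one knowledge item.
normalized = normalize_weights({"credit": 2.0, "card": 6.0})
```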
In this application, in the question word weight assigning subunit 4022, the weight assigning rule is as follows: judge whether the question word meets the condition for assigning a preset weight; if it does, assign the preset weight to the question word; if it does not, the weight of the question word in the question is the average of the weights, in each candidate knowledge item, of all knowledge words identical to the question word;
The condition for assigning the preset weight is that the knowledge words do not include the question word.
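The weight assigning rule can be sketched as follows. The data layout (one dictionary of normalized word weights per candidate knowledge item), the function name and the preset weight of 0.1 are assumptions made for illustration; the preset value itself is not given in this passage.

```python
def question_word_weight(word, knowledge_weights, preset_weight):
    """Assign a weight to one question word.

    knowledge_weights: list of dicts, one per candidate knowledge item,
    each mapping a knowledge word to its weight in that item.
    """
    matches = [weights[word] for weights in knowledge_weights if word in weights]
    if not matches:
        # Condition met: no knowledge word is identical to the question word.
        return preset_weight
    # Otherwise: average over all identical knowledge words.
    return sum(matches) / len(matches)


# Hypothetical weights for two candidate knowledge items.
knowledge_weights = [
    {"credit": 0.5, "card": 0.5},
    {"credit": 0.3, "limit": 0.7},
]
known = question_word_weight("credit", knowledge_weights, preset_weight=0.1)
unknown = question_word_weight("loan", knowledge_weights, preset_weight=0.1)
```

Here `known` is the average of 0.5 and 0.3, while `unknown` falls back to the preset weight because "loan" appears in no candidate knowledge item.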
In this application, as shown in Fig. 6, the vector similarity acquiring unit 403 includes:
A knowledge vector acquiring subunit 4031, configured to obtain the vector of the candidate knowledge;
A question vector acquiring subunit 4032, configured to obtain the vector of the question;
A vector similarity calculating subunit 4033, configured to calculate the vector similarity using the vector of the candidate knowledge and the vector of the question.
In this application, the knowledge vector acquiring subunit 4031 includes:
A knowledge word vector acquiring sub-subunit 40311, configured to obtain the word vectors of the knowledge words, the word vector of a knowledge word being the word vector of that knowledge word in the candidate knowledge;
A knowledge vector calculating sub-subunit 40312, configured to calculate the vector of the candidate knowledge using the word vectors of the knowledge words.
In this application, the question vector acquiring subunit 4032 includes:
A question word vector acquiring sub-subunit 40321, configured to obtain the word vectors of the question words, the word vector of a question word being the same as the word vector of the identical knowledge word;
A question vector calculating sub-subunit 40322, configured to calculate the vector of the question using the word vectors of the question words, which are taken from the identical knowledge words.
In this application, the weight similarity is obtained using one of, or a combination of, the Jaccard distance, the Hamming distance and the edit distance;
The vector similarity is obtained using the cosine;
The total similarity of a candidate knowledge item and the question is the sum of the weight similarity and the vector similarity of that same candidate knowledge item and the question;
The preset rule is to sort all candidate knowledge items by their total similarity with the question and to select the one with the largest total similarity.
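This passage names the Jaccard measure but does not state how the word weights enter it. As one possible reading, the following sketch uses the weighted generalization of the Jaccard coefficient (sum of minima divided by sum of maxima); this choice, the function name and the sample weights are assumptions, not the formula of the application.

```python
def weighted_jaccard(knowledge_weights, question_weights):
    """Weighted Jaccard similarity between two word-to-weight mappings.

    A word missing from one side is treated as having weight 0 there.
    """
    words = set(knowledge_weights) | set(question_weights)
    numerator = sum(
        min(knowledge_weights.get(w, 0.0), question_weights.get(w, 0.0)) for w in words
    )
    denominator = sum(
        max(knowledge_weights.get(w, 0.0), question_weights.get(w, 0.0)) for w in words
    )
    return numerator / denominator if denominator else 0.0


# Hypothetical normalized weights.
knowledge = {"credit": 0.5, "card": 0.5}
question = {"credit": 0.5, "limit": 0.5}
weight_similarity = weighted_jaccard(knowledge, question)
```

When all weights are equal to 1 this reduces to the ordinary set-based Jaccard coefficient, that is, the size of the intersection divided by the size of the union.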
Fig. 7 shows a block diagram of a computer system 800 on which the embodiments can be implemented. Computer system 800 includes a processor 810, a storage medium 820, a system memory 830, a monitor 840, a keyboard 850, a mouse 860, a network interface 870 and a video adapter 880. These components are coupled by a system bus 890.
Storage medium 820 (for example a hard disk) stores a number of programs, including an operating system, application programs and other program modules. A user can enter commands and information into computer system 800 through input devices such as keyboard 850, a touch pad (not shown) and mouse 860. Text and graphical information are displayed on monitor 840.
The operating system runs on processor 810 and is used to coordinate and provide control of the various components of the personal computer system 800 in Fig. 7. In addition, computer programs can be used on computer system 800 to implement the various embodiments described above.
It will be appreciated that the hardware components shown in Fig. 7 are for illustrative purposes only, and that the actual components may vary depending on the computing device deployed to implement the invention.
In addition, computer system 800 can be, for example, a desktop computer, a server computer, a laptop computer, or a wireless device such as a mobile phone, a personal digital assistant (PDA) or a handheld computer.
It will be appreciated that embodiments within the scope of the invention can be implemented in the form of a computer program product. The computer program product includes computer-executable instructions, such as program code, which can run in any suitable computing environment in combination with a suitable operating system, for example Microsoft Windows, Linux or UNIX. Embodiments within the scope of the invention can also include a program product that includes a computer-readable medium for carrying or storing computer-executable instructions or data structures. Such a computer-readable medium can be any available medium that can be accessed by a general-purpose or special-purpose computer. For example, such a computer-readable medium can include RAM, ROM, EPROM, EEPROM, CD-ROM, magnetic disk storage or other storage devices, or any other medium that can be used to carry or store the desired program code in the form of computer-executable instructions and that can be accessed by a general-purpose or special-purpose computer.
The method, device and system for extracting answers in an intelligent question answering system provided by the present invention have the following beneficial effects:
(1) the weights are normalized, so that weight is a more accurate evaluation factor;
(2) the two factors of weight and vector are combined to judge the similarity of the question and the knowledge, so that the similarity judgment is more accurate;
(3) the work of manually correcting questions is reduced, saving the enterprise a substantial amount of labor cost.
The present invention has been described in detail above with reference to specific embodiments and illustrative examples, but these descriptions are not to be construed as limiting the invention. Those skilled in the art will understand that, without departing from the spirit and scope of the invention, various equivalent substitutions, modifications or improvements can be made to the technical solution of the invention and its embodiments, all of which fall within the scope of the invention. The scope of protection of the invention is defined by the appended claims.