CN101520802A - Question-answer pair quality evaluation method and system - Google Patents
- Publication number
- CN101520802A CN200910081558A
- Authority
- CN
- China
- Prior art keywords
- answer
- question
- quality
- evaluation result
- quality evaluation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a quality evaluation method for question-answer pairs, comprising: clustering the input question-answer pairs according to question content to obtain clusters, each containing semantically identical or similar questions and the answers to those questions; performing, on each cluster, a quality evaluation between question-answer pairs and a quality evaluation within question-answer pairs, and obtaining respectively an inter-pair quality evaluation result and an intra-pair quality evaluation result; and fusing the inter-pair and intra-pair quality evaluation results and outputting the high-quality question-answer pairs. The invention also provides a quality evaluation system for question-answer pairs. The method and system evaluate the quality of question-answer pairs more effectively and improve the generality of the evaluation.
Description
Technical field
The present invention relates to Internet information processing technology, and in particular to a method and system for evaluating the quality of question-answer pairs.
Background art
As the Internet develops, information grows ever more abundant, and extracting useful knowledge from this mass of information has become an urgent problem. To provide better knowledge services, a number of knowledge question-answering interaction platforms have emerged. On these platforms the user is both a consumer and a creator of content: users can seek entertainment or help, interact socially, ask and answer questions, and rate the answers to questions. A typical question-answer pair is produced as follows: a user asks a question on the platform, other users answer it, and the asker confirms one satisfactory answer among the answers given.
As the number of questions grows, so does the number of semantically duplicate questions, because most users do not check whether the same question and answer already exist in the system before asking. Present question-answering platforms therefore contain many duplicate question-answer pairs. Although every resolved question has been confirmed by its asker, different askers apply different standards; some give a very high rating simply to thank the answerer, regardless of the quality of the answer. On platforms that contain duplicate questions and answers, it is therefore necessary to distinguish high-quality question-answer pairs from low-quality ones.
The prior art includes a method that classifies question-answer pairs by fusing various features in a decision-tree framework. The features used are text-content features and usage features. Text-content features include word N-grams, word length, the number of distinct words, the number of words whose frequency exceeds a threshold, the entropy of a character-based trigram language model over the answer, and so on. Usage features mainly include the numbers of users who agree or disagree with the question-answer pair, the rank of the answerer, the rank of the asker, and so on. The method studies the effect of the different features and fuses them in a decision-tree framework to distinguish high-quality question-answer pairs from medium- and low-quality ones.
However, this method does not consider the degree of semantic matching between the question and the answer, which is the basis of a high-quality question-answer pair; nor does it consider how the relations among multiple duplicate question-answer pairs affect their quality. In addition, question-answer pair data usually lack the usage features generated during production, and the method's heavy reliance on usage features limits its generality. Thus, when the prior art evaluates the quality of question-answer pairs, its results are unsatisfactory and its generality is poor.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a quality evaluation method and system for question-answer pairs that evaluate their quality more effectively and improve generality.
To achieve this purpose, the technical solution of the present invention is realized as follows:
The invention provides a quality evaluation method for question-answer pairs, the method comprising:
Clustering the input question-answer pairs according to question content to obtain clusters, each composed of semantically identical or similar questions and their answers;
Performing, on each cluster, a quality evaluation between question-answer pairs and a quality evaluation within question-answer pairs, and obtaining respectively an inter-pair quality evaluation result and an intra-pair quality evaluation result;
Fusing the inter-pair quality evaluation result and the intra-pair quality evaluation result, and outputting the high-quality question-answer pairs.
The clustering includes k-means clustering and single-pass clustering.
The single-pass clustering is specifically:
The similarity between each newly input question and each currently existing class is calculated in turn. If the similarity between the question and one of the classes exceeds a preset similarity threshold, the question is merged into that class; if its similarity to every existing class is below the preset similarity threshold, a new class is created for the question.
The quality evaluation between question-answer pairs is specifically:
Performing word segmentation, part-of-speech tagging and stop-word removal on each answer in the cluster;
Counting the document frequency of each word, and taking the words whose document frequency exceeds a frequency threshold as the topic center of all answers in the cluster;
Calculating the distance between each answer and the topic center with the general cosine distance function, and sorting the answers by the weight of the distance;
Eliminating, by sentence-level similarity calculation, the similarity relations and inclusion relations among the sorted answers, to obtain the inter-pair quality evaluation result.
The quality evaluation within question-answer pairs includes: evaluation of question and answer quality, calculation of the matching degree between question and answer, and evaluation of the quality of a single question-answer pair.
The evaluation of question and answer quality covers at least one of the following: question format information, answer length, visual feature information in the answer, the question positive/negative dictionary feature, the answer positive/negative dictionary feature, and non-text features from the formation of the question-answer pair.
The method further comprises: obtaining the matching degree between question and answer by topic clustering.
The evaluation of the quality of a single question-answer pair is specifically:
Fusing the following features with a maximum entropy statistical model to obtain a quality evaluation score for each question-answer pair:
Question format information, answer length, visual feature information in the answer, the question positive/negative dictionary feature, the answer positive/negative dictionary feature, non-text features from the formation of the question-answer pair, and the matching degree between question and answer.
The present invention also provides a quality evaluation system for question-answer pairs, the system comprising:
A clustering module, configured to cluster the input question-answer pairs according to question content to obtain clusters composed of semantically identical or similar questions and their answers;
A first quality evaluation module, configured to perform the quality evaluation between question-answer pairs on each cluster and obtain the inter-pair quality evaluation result;
A second quality evaluation module, configured to perform the quality evaluation within question-answer pairs on each cluster and obtain the intra-pair quality evaluation result;
A fusion module, configured to fuse the inter-pair quality evaluation result and the intra-pair quality evaluation result and output the high-quality question-answer pairs.
With the quality evaluation method and system provided by the present invention, the input question-answer pairs are clustered according to question content to obtain clusters composed of semantically identical or similar questions and their answers; the quality evaluation between question-answer pairs and the quality evaluation within question-answer pairs are then performed on each cluster to obtain the inter-pair and intra-pair quality evaluation results; and the two results are fused to output the high-quality question-answer pairs. The invention thus evaluates question-answer pairs more effectively and with higher generality.
The invention can separate high-quality question-answer pairs from low-quality ones to form a high-quality knowledge base. Used as a data source for a search engine, the high-quality data can be made part of the search index and placed directly near the top of the search results. Used as the knowledge base of an automatic question-answering system, the selected high-quality data can serve directly as the knowledge source from which answers are provided to users. In addition, the invention can process not only knowledge question-answering data but also user-generated question-answer data from blogs, forums, bulletin board systems (BBS) and frequently asked questions (FAQ); the high-quality data that pass evaluation can be used directly to build encyclopedic knowledge.
Description of drawings
Fig. 1 is a flowchart of the quality evaluation method for question-answer pairs of the present invention;
Fig. 2 is a schematic diagram of fusing the quality evaluation results in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the quality evaluation system for question-answer pairs of the present invention.
Embodiment
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the quality evaluation method for question-answer pairs provided by the present invention mainly comprises the following steps:
Step 101: cluster the input question-answer pairs according to question content to obtain clusters composed of semantically identical or similar questions and their answers.
In practice the same question can be phrased in different ways. Question clustering therefore groups semantically identical or similar questions into one class, and these questions together with their answers form a cluster in which multiple questions correspond to multiple answers. The invention can cluster questions with algorithms such as k-means clustering and single-pass clustering, but is not limited to these and can be extended as needed. Single-pass clustering is taken as the example below. Its principle is: the similarity between each newly input question and each currently existing class is calculated in turn; if the similarity to some class exceeds a preset similarity threshold, the question is merged into that class; if the similarity to every existing class is below the threshold, a new class is created for the question. To improve the ability of the clustering operation to handle large-scale data, the invention can use the topic words of a question as the index of each class.
The single-pass clustering operation is as follows:
Step A: analyze the newly input question, including word segmentation, part-of-speech tagging and stop-word removal on its sentences.
Step B: normalize the words obtained from the analysis.
A synonym dictionary is built, and all synonyms are represented by a single word according to it; for example, every full name is replaced by its abbreviation, and every commonly miswritten word by its correct form. The synonym dictionary is compiled by human editors. It contains equivalent words, such as computer = computing machine, Dad = father = loving father, how much = how many; full names and their abbreviations, such as Olympic Games (full name) = Olympic Games (abbreviation); and error corrections, such as a miswritten form of "Wealth tong" being mapped to "Wealth tong".
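The normalization in Step B can be sketched as a simple dictionary lookup. This is a minimal illustration, not the patented implementation: the entries in `SYNONYMS` and the function name `normalize` are hypothetical, and the tokens are assumed to have been segmented already.

```python
# Hypothetical hand-edited synonym dictionary: variant -> canonical word.
SYNONYMS = {
    "dad": "father",
    "pc": "computer",
    "olympics": "olympic games",
}


def normalize(tokens, synonyms):
    """Map every token to its canonical form; unknown tokens pass through."""
    return [synonyms.get(token, token) for token in tokens]


print(normalize(["dad", "buys", "pc"], SYNONYMS))
```

After this step two questions that use different variants of the same word share identical tokens, which is what allows Step D to compare questions by counting identical words.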
Step C: extract topic words, ranked by weight, from the normalized question.
Topic words are extracted as follows: for each word in the question, look up in the statistical corpus its word frequency tf and document frequency df, and assign the word the weight λ·log(tf)·log(1/df); sort all words in the question by weight in descending order and take the top few (for example the top 3) as topic words. Here λ is a parameter set separately for each part of speech; its value usually decreases in the order noun, adjective, verb, adverb. Word frequency is the frequency with which a word occurs in the statistical corpus; document frequency is the frequency of the documents in the statistical corpus that contain the word.
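The weighting λ·log(tf)·log(1/df) can be illustrated with a short sketch. Several things here are assumptions rather than details from the text: the concrete λ values in `POS_LAMBDA`, the treatment of tf as a raw corpus count of at least 1 and of df as a ratio in (0, 1], and all function names.

```python
import math

# Assumed lambda values; the text only states that they decrease
# in the order noun, adjective, verb, adverb.
POS_LAMBDA = {"noun": 1.0, "adj": 0.8, "verb": 0.6, "adv": 0.4}


def word_weight(tf, df, pos):
    """lambda * log(tf) * log(1/df); tf is a count >= 1, df a ratio in (0, 1]."""
    return POS_LAMBDA.get(pos, 0.0) * math.log(tf) * math.log(1.0 / df)


def extract_topic_words(tagged_words, stats, top_n=3):
    """tagged_words: list of (word, pos); stats: word -> (tf, df)."""
    scored = {}
    for word, pos in tagged_words:
        if word in stats:
            tf, df = stats[word]
            scored[word] = word_weight(tf, df, pos)
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:top_n]


stats = {"olympics": (200, 0.01), "open": (5000, 0.5), "when": (9000, 0.9)}
tagged = [("when", "adv"), ("olympics", "noun"), ("open", "verb")]
print(extract_topic_words(tagged, stats, top_n=2))
```

A rare noun such as "olympics" outranks frequent function-like words because log(1/df) grows as the document frequency shrinks.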
Step D: using the topic words extracted from the question, calculate the similarity between the question and each currently existing class. If the similarity between the question and some class exceeds the preset similarity threshold, merge the question into that class; if its similarity to every existing class is below the threshold, create a new class for the question and use the topic words of the question as the index of the new class.
Because the words in the question have been normalized, calculating the similarity between a question and each class amounts to comparing the number of identical words they share. The similarity value is defined by a formula [not reproduced in this text] in which $k$ denotes the number of identical words, $tf_k$ the word frequency of the $k$-th identical word, $df_k$ the document frequency of the $k$-th identical word, and $\lambda_k$ the parameter value corresponding to the $k$-th identical word.
After question clustering, semantically identical or similar questions are gathered into one class, and these questions together with their answers form a cluster in which multiple questions correspond to multiple answers.
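The single-pass procedure of Steps A to D can be sketched as follows. This is a hedged illustration: the similarity function is passed in as a parameter because the similarity formula is not available in this text, and `shared_word_count` is only a stand-in for it. Choosing the most similar class and widening a class index with the merged question's topic words are also assumptions of the sketch.

```python
def shared_word_count(topic_words, class_index):
    """Stand-in similarity: number of identical topic words."""
    return len(set(topic_words) & set(class_index))


def single_pass_cluster(questions, similarity, threshold):
    """questions: topic-word sets in arrival order.

    Returns a list of classes, each with an 'index' (topic words)
    and 'members' (positions of the questions merged into it).
    """
    classes = []
    for position, topic_words in enumerate(questions):
        best, best_sim = None, threshold
        for cls in classes:
            sim = similarity(topic_words, cls["index"])
            if sim > best_sim:  # must exceed the preset threshold
                best, best_sim = cls, sim
        if best is None:
            # No class is similar enough: create a new class.
            classes.append({"index": set(topic_words), "members": [position]})
        else:
            best["members"].append(position)
            best["index"] |= set(topic_words)  # assumption: index grows on merge
    return classes


questions = [
    {"olympics", "open", "date"},
    {"olympics", "date", "beijing"},
    {"python", "install"},
]
classes = single_pass_cluster(questions, shared_word_count, threshold=1)
print([cls["members"] for cls in classes])
```

Each question is compared only against the existing classes, so the input is scanned once, which is what makes the approach attractive for large question collections.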
Step 102: perform on each cluster the quality evaluation between question-answer pairs and the quality evaluation within question-answer pairs, and obtain the inter-pair and intra-pair quality evaluation results respectively.
The inter-pair quality evaluation analyzes the different answers in a cluster and judges the quality of the question-answer pairs from the relations among those answers. For the evaluation among the answers in each cluster, the invention adopts a quality evaluation method based on a topic center, specifically:
Step a: treat each answer in the cluster as a document and perform word segmentation, part-of-speech tagging and stop-word removal.
Step b: count the document frequency of each word, and take the words whose document frequency exceeds a frequency threshold as the topic center of all answers in the cluster.
The frequency threshold can be set as needed. For example, with a threshold of 1, all nouns, verbs, adjectives and adverbs whose document frequency is greater than 1 form the topic center of the whole answer set. Because each answer in the cluster is treated as a document, a document frequency greater than 1 means that the word occurs in at least two answers of the cluster.
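Steps a and b can be sketched in a few lines. The sketch assumes the answers are already segmented and stripped of stop words, and it omits the part-of-speech filter (nouns, verbs, adjectives, adverbs) mentioned above; the function name `topic_center` is hypothetical.

```python
from collections import Counter


def topic_center(answers, freq_threshold=1):
    """answers: list of token lists, one per answer in the cluster.

    Returns the words whose document frequency within the cluster
    exceeds freq_threshold.
    """
    doc_freq = Counter()
    for tokens in answers:
        doc_freq.update(set(tokens))  # each answer counts a word once
    return {word for word, count in doc_freq.items() if count > freq_threshold}


answers = [
    ["beijing", "2008", "august"],
    ["august", "beijing", "beijing"],
    ["no", "idea"],
]
print(sorted(topic_center(answers)))
```

With the threshold set to 1, only words that appear in at least two answers survive, so an isolated answer such as "no idea" contributes nothing to the topic center.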
Step c: calculate the distance between each answer and the topic center with the general cosine distance function, and sort all answers by the weight of the distance.
Let the feature word set of the topic center of the answers be $O=\{w_1, w_2, \ldots, w_n\}$ and the word set of the current answer $A$ be $A=\{c_1, c_2, \ldots, c_m\}$. The cosine between answer $A$ and the topic center is then given by a formula [not reproduced in this text], in which $W_{O,x}$ denotes the weight of word $x$ in the topic center $O$, with $W_{O,x}=tf_{L,x}\,\log(tf_x)\,\log(1/df_x)$; $tf_{L,x}$ denotes the local frequency of word $x$ counted over the answers in this cluster; and $W_{A,y}$ denotes the weight of word $y$ in answer $A$, with $W_{A,y}=\log(tf_y)\,\log(1/df_y)$. The word frequency $tf_x$ and document frequency $df_x$ of each word are obtained from statistics over the whole corpus.
After the cosine distance between each answer and the topic center has been calculated, all answers are sorted by distance weight in descending order; the larger the weight, the closer the answer is to the topic center.
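The ranking in Step c can be sketched with the standard cosine measure between two sparse weight vectors. Using the textbook cosine form is an assumption, since the exact formula is not available in this text; the word weights are taken as given inputs, and the names `cosine` and `rank_answers` are hypothetical.

```python
import math


def cosine(center_weights, answer_weights):
    """Standard cosine between two sparse vectors (dicts word -> weight)."""
    dot = sum(
        weight * answer_weights[word]
        for word, weight in center_weights.items()
        if word in answer_weights
    )
    norm_center = math.sqrt(sum(w * w for w in center_weights.values()))
    norm_answer = math.sqrt(sum(w * w for w in answer_weights.values()))
    if norm_center == 0 or norm_answer == 0:
        return 0.0
    return dot / (norm_center * norm_answer)


def rank_answers(center_weights, answers):
    """answers: dict answer_id -> weight dict. Best match first."""
    scores = {aid: cosine(center_weights, w) for aid, w in answers.items()}
    return sorted(scores, key=scores.get, reverse=True)


center = {"beijing": 2.0, "august": 1.0}
answers = {
    "a1": {"beijing": 1.0, "august": 1.0},
    "a2": {"idea": 1.0},
    "a3": {"beijing": 2.0, "august": 1.0},
}
print(rank_answers(center, answers))
```

An answer that shares no word with the topic center scores 0 and falls to the bottom of the ranking.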
Step d: eliminate the similarity relations and inclusion relations from the sorted answer set.
The sorted answer set is analyzed to determine whether it contains identical or similar answers (a similarity relation) or an answer that is a subset of another answer (an inclusion relation). If two answers are identical or similar, their ranking weights are essentially the same; if one answer is a subset of another, the superset has the larger ranking weight.
When a similarity relation exists in the answer set, only the answer with the largest ranking weight needs to be kept; the rest are redundant and can be removed. When an inclusion relation exists, only the superset answer needs to be kept.
To identify the similarity and inclusion relations in the answer set, the invention uses a sentence-level similarity calculation, specifically:
Step 01: split each answer into sentences. Sentences are delimited by punctuation marks such as "." and "?", and each sentence is about 50 characters long.
Step 02: use a hash algorithm to convert the text of each sentence into a 4-byte fingerprint. An answer in the set then consists of a series of 4-byte fingerprints, $A=\{s_1, s_2, \ldots, s_n\}$. Treating each $s_i$ as a word, a document inverted list can be built, in which all texts sharing the same fingerprint $s_i$ form one class. The fingerprint repetition degree is then computed pairwise for the texts within a class. If it exceeds a preset threshold (for example 40%), a similarity or inclusion relation is judged to exist; of the two answers involved in the calculation, the one with the lower ranking weight is marked for removal, and the relation between the two answers is recorded. If the fingerprint repetition degree is below the preset threshold, no similarity or inclusion relation is judged to exist, and no further processing is needed.
Step 03: repeat the above process until the similarity and inclusion relations in all classes have been eliminated.
Once these operations are complete, every answer in the set that is involved in a similarity or inclusion relation and has the lower ranking weight carries a removal mark. On this basis an evaluation score is produced and used as the inter-pair quality evaluation result.
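Steps 01 to 03 can be approximated with the sketch below. It makes several assumptions that go beyond the text: CRC32 (`zlib.crc32`) stands in for the unspecified hash algorithm because it yields a 32-bit value, the repetition degree is taken as the shared fingerprints divided by the size of the smaller fingerprint set, sentences are not capped at about 50 characters, and all answer pairs are compared directly instead of through an inverted list.

```python
import re
import zlib


def fingerprints(text):
    """Split into sentences and hash each one to a 32-bit fingerprint."""
    sentences = [s.strip() for s in re.split(r"[。.?？!！]", text) if s.strip()]
    return {zlib.crc32(s.encode("utf-8")) for s in sentences}


def repetition(fp_a, fp_b):
    """Assumed measure: shared fingerprints over the smaller set."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / min(len(fp_a), len(fp_b))


def mark_redundant(ranked_answers, threshold=0.4):
    """ranked_answers: answer texts, highest ranking weight first.

    Returns the indices of the answers marked for removal.
    """
    fps = [fingerprints(answer) for answer in ranked_answers]
    removed = set()
    for i in range(len(fps)):
        if i in removed:
            continue
        for j in range(i + 1, len(fps)):
            if j not in removed and repetition(fps[i], fps[j]) > threshold:
                removed.add(j)  # the lower-ranked answer of the pair
    return removed


ranked = [
    "It opens on August 8. The venue is Beijing. Tickets are sold online.",
    "The venue is Beijing.",
    "I have no idea.",
]
print(mark_redundant(ranked))
```

Dividing by the smaller set lets the same measure catch both similarity (two answers with mostly the same sentences) and inclusion (a short answer fully contained in a longer one).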
The quality evaluation within question-answer pairs includes: evaluation of question and answer quality, calculation of the matching degree between question and answer, and evaluation of the quality of a single question-answer pair.
The evaluation of question and answer quality can cover at least one of the following: 1. Question format information, including the length of the question, its punctuation and whether it contains an interrogative word; a question that satisfies the prescribed format tends to be of higher quality, whereas a question that does not and is unclearly expressed usually is not of high quality. 2. Answer length; statistically, answers of moderate length usually have higher quality. 3. Visual feature information in the answer, including the number of words in each paragraph, whether a paragraph begins with bold or emphasis marks, and so on; a higher-quality answer usually has, besides moderate length, good visual feature information. 4. The question positive/negative dictionary feature, that is, the proportions of the words in the question that appear in the positive dictionary and in the negative dictionary. 5. The answer positive/negative dictionary feature, that is, the proportions of the words in the answer that appear in the positive dictionary and in the negative dictionary. 6. Non-text features from the formation of the question-answer pair, for example user ratings, the rank of the answerer, and the answerer's number of answers and adoption rate.
Note that, to reflect the quality of questions and answers, the invention defines a positive dictionary and a negative dictionary. If a large proportion of the words in a question or answer appear in the positive dictionary, the question or answer is more likely to be of high quality; conversely, if a large proportion of its words appear in the negative dictionary, the question or answer is less likely to be of high quality.
The positive and negative dictionaries are built as follows. First, a corpus of question-answer pairs (for example 5000 pairs) is extracted and labeled into two classes: a high-quality data set D1 and a medium- and low-quality data set D2. All words occurring in the extracted questions and answers are then counted. If the frequency of a word in the high-quality data set D1 divided by its frequency in the whole data set (D1 and D2) is greater than a preset threshold $\alpha_1$, the word enters the positive dictionary; if that ratio is less than a preset threshold $\alpha_2$, the word enters the negative dictionary. Words occurring in questions enter the question positive/negative dictionaries, and words occurring in answers enter the answer positive/negative dictionaries.
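The dictionary construction can be sketched as below. The threshold values 0.8 and 0.2 are illustrative assumptions (the text does not give values for $\alpha_1$ and $\alpha_2$), raw counts are used as frequencies, and `build_dictionaries` is a hypothetical name.

```python
from collections import Counter


def build_dictionaries(high_docs, low_docs, alpha1=0.8, alpha2=0.2):
    """high_docs / low_docs: token lists from data sets D1 and D2.

    Returns (positive_dictionary, negative_dictionary) as sets of words.
    """
    high = Counter(token for doc in high_docs for token in doc)
    low = Counter(token for doc in low_docs for token in doc)
    positive, negative = set(), set()
    for word in set(high) | set(low):
        # Share of the word's occurrences that fall in the high-quality set.
        ratio = high[word] / (high[word] + low[word])
        if ratio > alpha1:
            positive.add(word)
        elif ratio < alpha2:
            negative.add(word)
    return positive, negative


d1 = [["because", "therefore", "thanks"], ["because", "steps"]]
d2 = [["thanks", "dunno"], ["dunno", "thanks"]]
positive, negative = build_dictionaries(d1, d2)
print(sorted(positive), sorted(negative))
```

Words such as "thanks" that occur in both sets with a ratio between the two thresholds enter neither dictionary. Running the function once over question tokens and once over answer tokens would give the separate question and answer dictionaries.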
The invention proposes a method based on topic clustering for calculating the matching degree between question and answer, specifically:
Step 001: collect a sufficiently large general corpus (for example 80 GB) as the statistical corpus for pointwise mutual information, segment it into words, and calculate the pointwise mutual information between words according to the formula
$$\mathrm{PMI}(w_1, w_2) = \log \frac{P(w_1 w_2)}{P(w_1)\,P(w_2)}$$
Here $\mathrm{PMI}(w_1, w_2)$ denotes the pointwise mutual information between words $w_1$ and $w_2$, $P(w_1)$ the frequency of occurrence of $w_1$ in the statistics, $P(w_2)$ the frequency of occurrence of $w_2$ in the statistics, and $P(w_1 w_2)$ the co-occurrence frequency of $w_1$ and $w_2$: the two words are considered to co-occur if they appear within several consecutive sentences whose total length is below a length threshold (for example 150 Chinese characters). Multiple occurrences of $w_1$ and $w_2$ within one document are counted only once.
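A small sketch of the pointwise mutual information computation follows. It simplifies the text in two labeled ways: each input item is assumed to be one already-cut co-occurrence window (a run of consecutive sentences under the length threshold), and probabilities are estimated as the fraction of windows containing a word or a word pair. The natural logarithm and the name `pmi_table` are also assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(windows):
    """windows: list of token lists, one per co-occurrence window.

    Returns {(w1, w2): PMI} with each pair stored in sorted order.
    """
    total = len(windows)
    word_count, pair_count = Counter(), Counter()
    for tokens in windows:
        vocab = sorted(set(tokens))  # repeated occurrences count once
        word_count.update(vocab)
        pair_count.update(combinations(vocab, 2))
    table = {}
    for (w1, w2), count in pair_count.items():
        p_joint = count / total
        p1, p2 = word_count[w1] / total, word_count[w2] / total
        table[(w1, w2)] = math.log(p_joint / (p1 * p2))
    return table


windows = [
    ["olympics", "beijing"],
    ["olympics", "beijing"],
    ["olympics", "python"],
    ["python", "install"],
]
table = pmi_table(windows)
print(round(table[("beijing", "olympics")], 4))
```

A positive value means the two words co-occur more often than independence would predict; a negative value means they co-occur less often.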
Step 002: perform word segmentation and part-of-speech tagging on the questions in the cluster and keep the nouns $q_1, q_2, \ldots, q_m$; the number of nouns is denoted $m$.
Step 003: process the answer. Check the length of the answer. If it exceeds a length threshold (for example 150 Chinese characters), extract topic words from it as follows: look up in the global statistical corpus the word frequency tf and document frequency df of each word in the answer, and assign each word the weight TF·log(tf)·log(1/df); sort all words in the answer by weight in descending order and take the top several nouns (for example $n = 50$) as topic words. Here TF denotes the local frequency of the word counted within the answer that contains it. If the answer is shorter than the length threshold, perform word segmentation and part-of-speech tagging on it directly and extract the nouns $a_1, a_2, \ldots, a_n$; their number is denoted $n$.
Step 004: taking each $q_i$ as a topic origin, check whether the pointwise mutual information between $a_j$ and $q_i$ exceeds a pointwise mutual information threshold. If it does, add $a_j$ to the center chain; if the pointwise mutual information between $a_j$ and every $q_i$ is below the threshold, delete $a_j$. The number of words finally contained in the center chain is denoted $k$, and the matching degree between question and answer is defined as k/m+n. Under this definition, the more keywords in the answer are related to keywords in the question, the larger this value is, and the higher the relevance between the question and the answer.
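Step 004 can be sketched as below. The expression k/m+n in the text is ambiguous; this sketch reads it as k/(m+n), which keeps the value below 1 and fits its description as a probability, but that reading is an assumption. The mutual information lookup is passed in as a function, and all names are hypothetical.

```python
def matching_degree(question_nouns, answer_nouns, pmi, pmi_threshold):
    """pmi: callable (question_word, answer_word) -> float.

    An answer noun joins the center chain if its PMI with at least
    one question noun exceeds the threshold.
    """
    chain = [
        a for a in answer_nouns
        if any(pmi(q, a) > pmi_threshold for q in question_nouns)
    ]
    k, m, n = len(chain), len(question_nouns), len(answer_nouns)
    return k / (m + n) if (m + n) else 0.0  # assumed reading: k / (m + n)


# Hypothetical PMI scores for illustration only.
scores = {("olympics", "beijing"): 2.0, ("date", "august"): 1.5}


def lookup(q, a):
    return scores.get((q, a), -1.0)


degree = matching_degree(
    ["olympics", "date"], ["beijing", "august", "pizza"], lookup, 1.0
)
print(degree)
```

In this example two of the three answer nouns are related to a question noun, so k = 2, m = 2 and n = 3.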
To fuse the features described above, the invention uses a maximum entropy statistical model as the fusion framework, so as to evaluate the quality of a single question-answer pair. The fusion framework can also be implemented with other types of classifier, such as a support vector machine or a Bayes classifier, and is not limited to these examples.
The fusion of the features is described in detail below using the maximum entropy evaluation classifier as an example. As shown in Fig. 2, the input features of the maximum entropy evaluation classifier are: question format information, answer length, visual feature information in the answer, the question positive/negative dictionary feature, the answer positive/negative dictionary feature, non-text features from the formation of the question-answer pair, and the matching degree between question and answer.
The question and answer positive/negative dictionary features are produced as follows: for each word in the question and in the answer, determine from the positive/negative dictionaries the probability that it belongs to high-quality data and the probability that it belongs to low-quality data; then use Bayes' formula to calculate the probabilities P(good|Q) and P(good|A), which serve as the maximum entropy inputs for the question positive/negative dictionary feature and the answer positive/negative dictionary feature respectively.
The answer length feature is defined as the probability P(good|L) that an answer of length L belongs to the high-quality data, given by a formula [not reproduced in this text] in which the probabilities p(good|L) and p(bad|L) are obtained by counting during training.
The non-text feature from the formation of the question-answer pair is a single value obtained by averaging the ratio of the user rating score to the highest score, the ratio of the answerer's rank to the highest rank, and the answerer's adoption rate when the answerer's number of answers exceeds a certain value; this value is used as the non-text feature input.
The question format information is defined as $P(\text{good}\mid Q)=\lambda_1 P(\text{good}\mid L_Q)+\lambda_2+\lambda_3$, where $\lambda_1+\lambda_2+\lambda_3=1$, and $\lambda_1$, $\lambda_2$, $\lambda_3$ denote respectively the weight of the probability $P(\text{good}\mid L_Q)$ that a question of length $L_Q$ is of high quality, the weight for high quality when the question has the punctuation feature, and the weight for high quality when the question has the interrogative-word feature.
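The question format feature can be illustrated with a small function. The sketch assumes that $\lambda_2$ and $\lambda_3$ contribute only when the corresponding feature is present, which is how the definition above reads; the default weights (0.5, 0.25, 0.25) and the function name are purely illustrative.

```python
def question_format_score(p_good_given_length, has_punctuation,
                          has_interrogative, lambdas=(0.5, 0.25, 0.25)):
    """lambda1 * P(good|L_Q) + lambda2 (if punctuated) + lambda3 (if interrogative).

    The three lambdas are assumed to sum to 1.
    """
    lambda1, lambda2, lambda3 = lambdas
    score = lambda1 * p_good_given_length
    if has_punctuation:
        score += lambda2
    if has_interrogative:
        score += lambda3
    return score


print(question_format_score(0.8, has_punctuation=True, has_interrogative=False))
```

Because the weights sum to 1 and the length probability is at most 1, the score stays in the range 0 to 1 and can be fed to the classifier alongside the other probability-valued features.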
The visual feature information is the result of judging whether the final form of the answer satisfies the format information: if it does, the feature value is 1, otherwise 0.
The training process is as follows: the model parameters of maximum entropy are first trained on 10000 training samples and then used for recognition, so that each question-answer pair finally receives an evaluation score, which serves as the intra-pair quality evaluation result. Question-answer pairs whose score is below a certain threshold are regarded as medium- or low-quality pairs and are deleted directly.
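For two classes (high quality versus not), a maximum entropy model has the same form as logistic regression, so the fusion step can be sketched with a tiny gradient-ascent trainer. This is an illustrative stand-in rather than the training procedure of the patent: the two-feature toy data, the learning rate, the epoch count and the function names are all assumptions, and a real system would train on the full feature set and on far more samples, as the text describes.

```python
import math


def predict(weights, features):
    """Probability that a question-answer pair is high quality."""
    z = sum(w * x for w, x in zip(weights, features)) + weights[-1]
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(samples, labels, epochs=500, learning_rate=0.5):
    """Binary maximum entropy (logistic regression) by gradient ascent.

    samples: list of feature vectors; labels: 1 = high quality, 0 = not.
    Returns the weights, with the bias as the last element.
    """
    dim = len(samples[0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        gradient = [0.0] * (dim + 1)
        for features, label in zip(samples, labels):
            error = label - predict(weights, features)
            for i, value in enumerate(features):
                gradient[i] += error * value
            gradient[dim] += error
        weights = [
            w + learning_rate * g / len(samples)
            for w, g in zip(weights, gradient)
        ]
    return weights


# Toy features: (matching degree, answer-length probability).
samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
weights = train_maxent(samples, labels)
print(predict(weights, [0.9, 0.9]) > 0.5, predict(weights, [0.1, 0.1]) < 0.5)
```

The predicted probability plays the role of the evaluation score: pairs scoring below a chosen threshold would be treated as medium or low quality and dropped.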
Step 103: fuse the quality evaluation result between question-answer pairs with the quality evaluation result within question-answer pairs, and output the high-quality question-answer pairs.
The quality evaluation result of each single question-answer pair and the quality evaluation result between question-answer pairs in the cluster are fused. The fusion can be performed by weighting, or by a classifier that takes the evaluation score of the single question-answer pair and the evaluation score between question-answer pairs as two features. Based on experimental statistics, the present invention adopts the following scheme:
First, count the number N of all question-answer pairs in the cluster, and pass all question-answer pairs in the cluster through the single question-answer pair evaluation classifier, so that only high-quality question-answer pairs remain;
For these high-quality question-answer pairs, remove the question-answer pairs that are contained in other pairs and the similar question-answer pairs that have lower ranking weights;
Grade and score each cluster according to the number N of question-answer pairs it contains: if N > 50, normalize the maximum ranking weight in the cluster to 1 and take the top three as high-quality question-answer pairs; if N > 20, normalize the maximum ranking weight in the cluster to 0.9 and take the top two as high-quality question-answer pairs; if N > 10, normalize the maximum ranking weight in the cluster to 0.8 and take the top one as a high-quality question-answer pair; if N > 5, normalize the maximum ranking weight in the cluster to 0.7 and take the top one as a high-quality question-answer pair; if N > 1, normalize the maximum ranking weight in the cluster to 0.6 and average it with the evaluation score within question-answer pairs, and if the maximum score exceeds 0.7, keep that pair as a high-quality question-answer pair, otherwise delete it; if N = 1, set the between-pair score of this question-answer pair to 0.5 and average it with the evaluation score within question-answer pairs, and if the maximum score exceeds 0.7, keep the pair as a high-quality question-answer pair, otherwise delete it.
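The tiered scheme above can be sketched as follows. The sketch assumes that "normalizing" means rescaling all ranking weights so that the largest one equals the tier value, and that in the two smallest tiers only the best fused pair is considered for retention. The data layout and function name are hypothetical.

```python
# (N must exceed, value the top ranking weight is normalised to, pairs kept)
TIERS = [(50, 1.0, 3), (20, 0.9, 2), (10, 0.8, 1), (5, 0.7, 1)]

def fuse_cluster(n_total, survivors, keep_threshold=0.7):
    """survivors: (pair_id, rank_weight, intra_score) tuples that passed the
    single-pair classifier. Returns a list of (pair_id, fused_score)."""
    if not survivors:
        return []
    ranked = sorted(survivors, key=lambda p: p[1], reverse=True)
    top_weight = ranked[0][1]
    for lower, cap, k in TIERS:
        if n_total > lower:
            scale = cap / top_weight if top_weight > 0 else 0.0
            return [(pid, w * scale) for pid, w, _ in ranked[:k]]
    if n_total > 1:
        # Small cluster: cap the ranking weight at 0.6, average with intra score.
        scale = 0.6 / top_weight if top_weight > 0 else 0.0
        fused = [(pid, (w * scale + intra) / 2) for pid, w, intra in ranked]
    else:
        # Single pair: the between-pair score is fixed at 0.5.
        fused = [(pid, (0.5 + intra) / 2) for pid, _, intra in ranked]
    best = max(fused, key=lambda p: p[1])
    return [best] if best[1] > keep_threshold else []
```

Under this reading, a cluster with a single pair is retained only when its within-pair score is above 0.9, since (0.5 + s) / 2 must exceed 0.7.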
To implement the above question-answer pair quality evaluation method, the present invention also provides a question-answer pair quality evaluation system. As shown in Figure 3, the system comprises a clustering module 10, a first quality evaluation module 20, a second quality evaluation module 30 and a fusion module 40. The clustering module 10 is used to cluster the input question-answer pairs according to question content, obtaining clusters composed of questions with identical or similar semantics and their answers. The first quality evaluation module 20 is connected to the clustering module 10 and is used to perform quality evaluation between question-answer pairs on the clusters, obtaining the quality evaluation result between question-answer pairs. The second quality evaluation module 30 is connected to the clustering module 10 and is used to perform quality evaluation within question-answer pairs on the clusters, obtaining the quality evaluation result within question-answer pairs. The fusion module 40 is connected to the first quality evaluation module 20 and the second quality evaluation module 30, and is used to fuse the quality evaluation result between question-answer pairs with the quality evaluation result within question-answer pairs and output the high-quality question-answer pairs.
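The connections between the four modules can be sketched as a simple composition of functions. The type aliases and the function name below are hypothetical; each callable stands in for one module and would be replaced by a real implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

QAPair = Tuple[str, str]        # (question, answer)
Scores = Dict[QAPair, float]    # evaluation score per pair

def run_pipeline(
    pairs: Iterable[QAPair],
    cluster: Callable[[Iterable[QAPair]], List[List[QAPair]]],     # module 10
    inter_eval: Callable[[List[QAPair]], Scores],                  # module 20
    intra_eval: Callable[[List[QAPair]], Scores],                  # module 30
    fuse: Callable[[List[QAPair], Scores, Scores], List[QAPair]],  # module 40
) -> List[QAPair]:
    high_quality: List[QAPair] = []
    for members in cluster(pairs):
        # Both evaluators read the same cluster; fusion sees both results.
        high_quality.extend(fuse(members, inter_eval(members), intra_eval(members)))
    return high_quality
```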
In summary, the present invention can separate high-quality question-answer pairs from low-quality ones to form a high-quality knowledge base. As a data source for a search engine, the high-quality data can be made part of the search engine index and placed directly near the top of the search results. As a knowledge base for automatic question answering, the selected high-quality data can be used directly as the knowledge source for providing answers to users. In addition, the present invention can process not only knowledge question-answer data but also user-generated data such as blogs, forums, BBS and FAQ question-answer data; the high-quality data obtained after evaluation can be used directly to build encyclopedic knowledge.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.
Claims (9)
1. A question-answer pair quality evaluation method, characterized in that the method comprises:
clustering the input question-answer pairs according to question content, to obtain clusters composed of questions with identical or similar semantics and their answers;
performing quality evaluation between question-answer pairs and quality evaluation within question-answer pairs on the clusters, and obtaining the quality evaluation result between question-answer pairs and the quality evaluation result within question-answer pairs, respectively;
fusing the quality evaluation result between question-answer pairs with the quality evaluation result within question-answer pairs, and outputting the high-quality question-answer pairs.
2. The question-answer pair quality evaluation method according to claim 1, characterized in that the clustering comprises: k-means clustering and single-pass clustering.
3. The question-answer pair quality evaluation method according to claim 2, characterized in that the single-pass clustering is specifically:
computing, one by one, the similarity between each subsequently input question and the classes that currently exist; if the similarity between the question and one of the classes exceeds a preset similarity threshold, merging the question into the corresponding class; if the similarity between the question and every currently existing class is below the preset similarity threshold, creating a new class for the question.
4. The question-answer pair quality evaluation method according to claim 1, characterized in that the quality evaluation between question-answer pairs is specifically:
performing word segmentation, part-of-speech tagging and stop-word removal on each answer in the cluster;
counting the document frequency of each word, and taking the words whose document frequency is greater than a frequency threshold as the topic center of all answers in the cluster;
calculating the distance between each answer and the topic center with a general cosine distance function, and ranking the answers according to the weight of the distance;
eliminating similarity relations and containment relations among the ranked answers by sentence-level similarity calculation, to obtain the quality evaluation result between question-answer pairs.
5. The question-answer pair quality evaluation method according to claim 1, characterized in that the quality evaluation within question-answer pairs comprises: evaluation of question and answer quality, calculation of the matching degree between question and answer, and evaluation of the quality of a single question-answer pair.
6. The question-answer pair quality evaluation method according to claim 5, characterized in that the content of the evaluation of question and answer quality comprises at least one of the following: question format information, answer length, visual feature information in the answer, the question positive/negative-example dictionary feature, the answer positive/negative-example dictionary feature, and non-text features in the formation process of the question-answer pair.
7. The question-answer pair quality evaluation method according to claim 5, characterized in that the method further comprises: obtaining the matching degree between the question and the answer by a topic-clustering-based approach.
8. The question-answer pair quality evaluation method according to claim 5, characterized in that the evaluation of the quality of a single question-answer pair is specifically:
fusing the following features by means of a maximum entropy statistical model, to obtain the quality evaluation score of each question-answer pair:
question format information, answer length, visual feature information in the answer, the question positive/negative-example dictionary feature, the answer positive/negative-example dictionary feature, non-text features in the formation process of the question-answer pair, and the matching degree between question and answer.
9. A question-answer pair quality evaluation system, characterized in that the system comprises:
a clustering module, used to cluster the input question-answer pairs according to question content, to obtain clusters composed of questions with identical or similar semantics and their answers;
a first quality evaluation module, used to perform quality evaluation between question-answer pairs on the clusters, to obtain the quality evaluation result between question-answer pairs;
a second quality evaluation module, used to perform quality evaluation within question-answer pairs on the clusters, to obtain the quality evaluation result within question-answer pairs;
a fusion module, used to fuse the quality evaluation result between question-answer pairs with the quality evaluation result within question-answer pairs, and output the high-quality question-answer pairs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910081558A CN101520802A (en) | 2009-04-13 | 2009-04-13 | Question-answer pair quality evaluation method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910081558A CN101520802A (en) | 2009-04-13 | 2009-04-13 | Question-answer pair quality evaluation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101520802A true CN101520802A (en) | 2009-09-02 |
Family
ID=41081391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910081558A Pending CN101520802A (en) | 2009-04-13 | 2009-04-13 | Question-answer pair quality evaluation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101520802A (en) |
-
2009
- 2009-04-13 CN CN200910081558A patent/CN101520802A/en active Pending
Cited By (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102681992A (en) * | 2011-03-07 | 2012-09-19 | 腾讯科技(深圳)有限公司 | Method and system for data hierarchy |
CN102955772A (en) * | 2011-08-17 | 2013-03-06 | 北京百度网讯科技有限公司 | Similarity computing method and similarity computing device on basis of semanteme |
CN102955772B (en) * | 2011-08-17 | 2015-11-25 | 北京百度网讯科技有限公司 | A kind of similarity calculating method based on semanteme and device |
CN103049637B (en) * | 2011-10-11 | 2018-05-11 | 塔塔咨询服务有限公司 | Strengthen the system and method for the content quality and user's participation of social platform |
CN103049637A (en) * | 2011-10-11 | 2013-04-17 | 塔塔咨询服务有限公司 | Content quality and user engagement in social platforms |
CN103218356B (en) * | 2012-01-18 | 2017-12-08 | 深圳市世纪光速信息技术有限公司 | A kind of enquirement quality judging method and system towards open platform |
CN103218356A (en) * | 2012-01-18 | 2013-07-24 | 深圳市腾讯计算机系统有限公司 | Question quality judging method and system facing open platform |
CN102629272A (en) * | 2012-03-14 | 2012-08-08 | 北京邮电大学 | Clustering based optimization method for examination system database |
CN103377245A (en) * | 2012-04-27 | 2013-10-30 | 腾讯科技(深圳)有限公司 | Automatic question and answer method and device |
CN103377245B (en) * | 2012-04-27 | 2018-09-11 | 深圳市世纪光速信息技术有限公司 | A kind of automatic question-answering method and device |
CN103425635A (en) * | 2012-05-15 | 2013-12-04 | 北京百度网讯科技有限公司 | Method and device for recommending answers |
CN103425635B (en) * | 2012-05-15 | 2018-02-02 | 北京百度网讯科技有限公司 | Method and apparatus are recommended in a kind of answer |
CN103810170A (en) * | 2012-11-06 | 2014-05-21 | 腾讯科技(深圳)有限公司 | Communication platform text classification method and device |
CN103810170B (en) * | 2012-11-06 | 2018-04-27 | 腾讯科技(深圳)有限公司 | Intercommunion platform file classification method and device |
CN103810218B (en) * | 2012-11-14 | 2018-06-08 | 北京百度网讯科技有限公司 | A kind of automatic question-answering method and device based on problem cluster |
CN103810218A (en) * | 2012-11-14 | 2014-05-21 | 北京百度网讯科技有限公司 | Problem cluster-based automatic asking and answering method and device |
CN103049433A (en) * | 2012-12-11 | 2013-04-17 | 微梦创科网络科技(中国)有限公司 | Automatic question answering method, automatic question answering system and method for constructing question answering case base |
CN103049433B (en) * | 2012-12-11 | 2015-10-28 | 微梦创科网络科技(中国)有限公司 | The method of automatic question-answering method, automatically request-answering system and structure question and answer case library |
CN103226580B (en) * | 2013-04-02 | 2016-03-30 | 西安交通大学 | A kind of topic detection method of interaction text |
CN103226580A (en) * | 2013-04-02 | 2013-07-31 | 西安交通大学 | Interactive-text-oriented topic detection method |
CN104347071A (en) * | 2013-08-02 | 2015-02-11 | 安徽科大讯飞信息科技股份有限公司 | Method and system for generating oral test reference answer |
WO2015058604A1 (en) * | 2013-10-21 | 2015-04-30 | 北京奇虎科技有限公司 | Apparatus and method for obtaining degree of association of question and answer pair and for search ranking optimization |
CN103577556A (en) * | 2013-10-21 | 2014-02-12 | 北京奇虎科技有限公司 | Device and method for obtaining association degree of question and answer pair |
CN103577558A (en) * | 2013-10-21 | 2014-02-12 | 北京奇虎科技有限公司 | Device and method for optimizing search ranking of frequently asked question and answer pairs |
CN103577556B (en) * | 2013-10-21 | 2017-01-18 | 北京奇虎科技有限公司 | Device and method for obtaining association degree of question and answer pair |
CN103577558B (en) * | 2013-10-21 | 2017-04-26 | 北京奇虎科技有限公司 | Device and method for optimizing search ranking of frequently asked question and answer pairs |
CN103729424A (en) * | 2013-12-20 | 2014-04-16 | 百度在线网络技术(北京)有限公司 | Method and system for assessing answers in Q&A (questions and answers) community |
CN103729424B (en) * | 2013-12-20 | 2017-03-15 | 百度在线网络技术(北京)有限公司 | Evaluation method and system is answered in Ask-Answer Community |
CN104376074A (en) * | 2014-11-14 | 2015-02-25 | 北京云知声信息技术有限公司 | Method and system for obtaining repeating resources |
CN104376074B (en) * | 2014-11-14 | 2018-05-01 | 北京云知声信息技术有限公司 | One kind repeats resource acquiring method and system |
US10475043B2 (en) | 2015-01-28 | 2019-11-12 | Intuit Inc. | Method and system for pro-active detection and correction of low quality questions in a question and answer based customer support system |
WO2016122681A1 (en) * | 2015-01-28 | 2016-08-04 | Intuit Inc. | Pro-active detection and correction of low quality questions in a customer support system |
US10083213B1 (en) | 2015-04-27 | 2018-09-25 | Intuit Inc. | Method and system for routing a question based on analysis of the question content and predicted user satisfaction with answer content before the answer content is generated |
US10755294B1 (en) | 2015-04-28 | 2020-08-25 | Intuit Inc. | Method and system for increasing use of mobile devices to provide answer content in a question and answer based customer support system |
US11429988B2 (en) | 2015-04-28 | 2022-08-30 | Intuit Inc. | Method and system for increasing use of mobile devices to provide answer content in a question and answer based customer support system |
US10134050B1 (en) | 2015-04-29 | 2018-11-20 | Intuit Inc. | Method and system for facilitating the production of answer content from a mobile device for a question and answer based customer support system |
US10447777B1 (en) | 2015-06-30 | 2019-10-15 | Intuit Inc. | Method and system for providing a dynamically updated expertise and context based peer-to-peer customer support system within a software application |
US10147037B1 (en) | 2015-07-28 | 2018-12-04 | Intuit Inc. | Method and system for determining a level of popularity of submission content, prior to publicizing the submission content with a question and answer support system |
US10861023B2 (en) | 2015-07-29 | 2020-12-08 | Intuit Inc. | Method and system for question prioritization based on analysis of the question content and predicted asker engagement before answer content is generated |
US10475044B1 (en) | 2015-07-29 | 2019-11-12 | Intuit Inc. | Method and system for question prioritization based on analysis of the question content and predicted asker engagement before answer content is generated |
US10268956B2 (en) | 2015-07-31 | 2019-04-23 | Intuit Inc. | Method and system for applying probabilistic topic models to content in a tax environment to improve user satisfaction with a question and answer customer support system |
US10169718B2 (en) | 2015-08-13 | 2019-01-01 | International Business Machines Corporation | System and method for defining and using different levels of ground truth |
US11138521B2 (en) | 2015-08-13 | 2021-10-05 | International Business Machines Corporation | System and method for defining and using different levels of ground truth |
US10169717B2 (en) | 2015-08-13 | 2019-01-01 | International Business Machines Corporation | System and method for defining and using different levels of ground truth |
US10394804B1 (en) | 2015-10-08 | 2019-08-27 | Intuit Inc. | Method and system for increasing internet traffic to a question and answer customer support system |
US10242093B2 (en) | 2015-10-29 | 2019-03-26 | Intuit Inc. | Method and system for performing a probabilistic topic analysis of search queries for a customer support system |
CN107168967B (en) * | 2016-03-07 | 2020-12-04 | 创新先进技术有限公司 | Target knowledge point acquisition method and device |
CN107168967A (en) * | 2016-03-07 | 2017-09-15 | 阿里巴巴集团控股有限公司 | The acquisition methods and device of object knowledge point |
US11734330B2 (en) | 2016-04-08 | 2023-08-22 | Intuit, Inc. | Processing unstructured voice of customer feedback for improving content rankings in customer support systems |
US10599699B1 (en) | 2016-04-08 | 2020-03-24 | Intuit, Inc. | Processing unstructured voice of customer feedback for improving content rankings in customer support systems |
CN106155522A (en) * | 2016-06-29 | 2016-11-23 | 上海智臻智能网络科技股份有限公司 | Session data process, knowledge base foundation, optimization, exchange method and device |
CN106155522B (en) * | 2016-06-29 | 2019-03-29 | 上海智臻智能网络科技股份有限公司 | Session data processing, knowledge base foundation, optimization, exchange method and device |
CN106250398B (en) * | 2016-07-19 | 2020-03-27 | 北京京东尚科信息技术有限公司 | Method and device for classifying and judging complaint content of complaint event |
CN106250398A (en) * | 2016-07-19 | 2016-12-21 | 北京京东尚科信息技术有限公司 | A kind of complaint classifying content decision method complaining event and device |
US10162734B1 (en) | 2016-07-20 | 2018-12-25 | Intuit Inc. | Method and system for crowdsourcing software quality testing and error detection in a tax return preparation system |
US10460398B1 (en) | 2016-07-27 | 2019-10-29 | Intuit Inc. | Method and system for crowdsourcing the detection of usability issues in a tax return preparation system |
US10467541B2 (en) | 2016-07-27 | 2019-11-05 | Intuit Inc. | Method and system for improving content searching in a question and answer customer support system by using a crowd-machine learning hybrid predictive model |
US10445332B2 (en) | 2016-09-28 | 2019-10-15 | Intuit Inc. | Method and system for providing domain-specific incremental search results with a customer self-service system for a financial management system |
US10572954B2 (en) | 2016-10-14 | 2020-02-25 | Intuit Inc. | Method and system for searching for and navigating to user content and other user experience pages in a financial management system with a customer self-service system for the financial management system |
US11403715B2 (en) | 2016-10-18 | 2022-08-02 | Intuit Inc. | Method and system for providing domain-specific and dynamic type ahead suggestions for search query terms |
US10733677B2 (en) | 2016-10-18 | 2020-08-04 | Intuit Inc. | Method and system for providing domain-specific and dynamic type ahead suggestions for search query terms with a customer self-service system for a tax return preparation system |
US10977247B2 (en) | 2016-11-21 | 2021-04-13 | International Business Machines Corporation | Cognitive online meeting assistant facility |
US11423411B2 (en) | 2016-12-05 | 2022-08-23 | Intuit Inc. | Search results by recency boosting customer support content |
US10552843B1 (en) | 2016-12-05 | 2020-02-04 | Intuit Inc. | Method and system for improving search results by recency boosting customer support content for a customer self-help system associated with one or more financial management systems |
CN106844334A (en) * | 2016-12-20 | 2017-06-13 | 网易(杭州)网络有限公司 | Method and apparatus for evaluating the intelligence of a conversational robot |
US10748157B1 (en) | 2017-01-12 | 2020-08-18 | Intuit Inc. | Method and system for determining levels of search sophistication for users of a customer self-help system to personalize a content search user experience provided to the users and to increase a likelihood of user satisfaction with the search experience |
CN107066541A (en) * | 2017-03-13 | 2017-08-18 | 平安科技(深圳)有限公司 | Method and system for processing customer service question-and-answer data |
CN107193872A (en) * | 2017-04-14 | 2017-09-22 | 深圳前海微众银行股份有限公司 | Question and answer data processing method and device |
CN107229733B (en) * | 2017-06-12 | 2020-01-14 | 上海智臻智能网络科技股份有限公司 | Extended question evaluation method and device |
CN107229733A (en) * | 2017-06-12 | 2017-10-03 | 上海智臻智能网络科技股份有限公司 | Extended question evaluation method and device |
US10922367B2 (en) | 2017-07-14 | 2021-02-16 | Intuit Inc. | Method and system for providing real time search preview personalization in data management systems |
CN107562856A (en) * | 2017-08-28 | 2018-01-09 | 深圳追科技有限公司 | Self-service customer service system and method |
US11093951B1 (en) | 2017-09-25 | 2021-08-17 | Intuit Inc. | System and method for responding to search queries using customer self-help systems associated with a plurality of data management systems |
CN108255943A (en) * | 2017-12-12 | 2018-07-06 | 百度在线网络技术(北京)有限公司 | Human-computer dialogue quality evaluation method, device, computer equipment and storage medium |
CN107908803A (en) * | 2017-12-26 | 2018-04-13 | 上海智臻智能网络科技股份有限公司 | Question-answer interaction response method and device, storage medium and terminal |
CN107908803B (en) * | 2017-12-26 | 2020-10-27 | 上海智臻智能网络科技股份有限公司 | Question-answer interaction response method and device, storage medium and terminal |
US11436642B1 (en) | 2018-01-29 | 2022-09-06 | Intuit Inc. | Method and system for generating real-time personalized advertisements in data management self-help systems |
US11269665B1 (en) | 2018-03-28 | 2022-03-08 | Intuit Inc. | Method and system for user experience personalization in data management systems using machine learning |
CN109102809A (en) * | 2018-06-22 | 2018-12-28 | 北京光年无限科技有限公司 | Dialogue method and system for intelligent robot |
CN109241519A (en) * | 2018-06-28 | 2019-01-18 | 平安科技(深圳)有限公司 | Environmental Evaluation Model acquisition methods and device, computer equipment and storage medium |
CN109271495A (en) * | 2018-08-14 | 2019-01-25 | 阿里巴巴集团控股有限公司 | Question-answer recognition effect detection method, device, equipment and readable storage medium |
CN109271495B (en) * | 2018-08-14 | 2023-02-17 | 创新先进技术有限公司 | Question-answer recognition effect detection method, device, equipment and readable storage medium |
CN109472030B (en) * | 2018-11-09 | 2023-11-24 | 科大讯飞股份有限公司 | System reply quality evaluation method and device |
CN109472030A (en) * | 2018-11-09 | 2019-03-15 | 科大讯飞股份有限公司 | System reply quality evaluation method and device |
CN111722819A (en) * | 2019-03-19 | 2020-09-29 | 富士施乐株式会社 | Information processing apparatus, recording medium, and information processing method |
CN110008340A (en) * | 2019-03-27 | 2019-07-12 | 曲阜师范大学 | Multi-source text knowledge representation, acquisition and fusion system |
CN110164447A (en) * | 2019-04-03 | 2019-08-23 | 苏州驰声信息科技有限公司 | Spoken language scoring method and device |
CN110164447B (en) * | 2019-04-03 | 2021-07-27 | 苏州驰声信息科技有限公司 | Spoken language scoring method and device |
WO2021169499A1 (en) * | 2020-02-26 | 2021-09-02 | 平安科技(深圳)有限公司 | Network bad data monitoring method, apparatus and system, and storage medium |
CN111444724A (en) * | 2020-03-23 | 2020-07-24 | 腾讯科技(深圳)有限公司 | Medical question-answer quality testing method and device, computer equipment and storage medium |
CN111680135A (en) * | 2020-04-20 | 2020-09-18 | 重庆兆光科技股份有限公司 | Reading comprehension method based on implicit knowledge |
CN111680135B (en) * | 2020-04-20 | 2023-08-25 | 重庆兆光科技股份有限公司 | Reading comprehension method based on implicit knowledge |
CN111667029A (en) * | 2020-07-09 | 2020-09-15 | 腾讯科技(深圳)有限公司 | Clustering method, device, equipment and storage medium |
CN111667029B (en) * | 2020-07-09 | 2023-11-10 | 腾讯科技(深圳)有限公司 | Clustering method, device, equipment and storage medium |
CN111967254B (en) * | 2020-10-21 | 2021-04-06 | 深圳追一科技有限公司 | Similar question set scoring method and device, computer equipment and storage medium |
CN111967254A (en) * | 2020-10-21 | 2020-11-20 | 深圳追一科技有限公司 | Similar question set scoring method and device, computer equipment and storage medium |
US11967253B2 (en) | 2021-05-27 | 2024-04-23 | International Business Machines Corporation | Semi-automated evaluation of long answer exams |
CN116775882A (en) * | 2023-06-29 | 2023-09-19 | 山东科技大学 | Intelligent government affair message processing method and equipment |
CN116775882B (en) * | 2023-06-29 | 2024-02-27 | 山东科技大学 | Intelligent government affair message processing method and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101520802A (en) | Question-answer pair quality evaluation method and system | |
CN110852087B (en) | Chinese error correction method and device, storage medium and electronic device | |
Panichella et al. | How can I improve my app? Classifying user reviews for software maintenance and evolution | |
Liu et al. | Review sentiment scoring via a parse-and-paraphrase paradigm | |
Furlan et al. | Semantic similarity of short texts in languages with a deficient natural language processing support | |
Reganti et al. | Modeling satire in English text for automatic detection | |
CN106202584A (en) | Microblog sentiment analysis method based on standard dictionary and semantic rules | |
CN107688630B (en) | Semantic-based weakly supervised microblog multi-emotion dictionary expansion method | |
CN107526841A (en) | Web-based Tibetan text summary generation method | |
CN107463703A (en) | English social media account number classification method based on information gain | |
CN106446147A (en) | Emotion analysis method based on structuring features | |
CN105335350A (en) | Language identification method based on ensemble learning | |
Tsapatsoulis et al. | Feature extraction for tweet classification: Do the humans perform better? | |
Imperial et al. | Developing a machine learning-based grade level classifier for Filipino children’s literature | |
Persing et al. | Lightly-supervised modeling of argument persuasiveness | |
CN109871429B (en) | Short text retrieval method integrating Wikipedia classification and explicit semantic features | |
Schmid et al. | FoSIL-Offensive language classification of German tweets combining SVMs and deep learning techniques. | |
Sokolova et al. | Verbs speak loud: Verb categories in learning polarity and strength of opinions | |
CN112507115B (en) | Method and device for classifying emotion words in barrage text and storage medium | |
CN113657090A (en) | Military news long text layering event extraction method | |
Nieto Piña | Splitting rocks: Learning word sense representations from corpora and lexica | |
Dankhara | A Review of Sentiment Analysis of Tweets | |
Taslioglu et al. | Irony detection on microposts with limited set of features | |
Talwar et al. | Intelligent Classroom System for Qualitative Analysis of Students' Conceptual Understanding | |
Jiang et al. | A novel feature selection based on Tibetan grammar for Tibetan text classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20090902