CN101520802A - Question-answer pair quality evaluation method and system - Google Patents

Publication number
CN101520802A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910081558A
Other languages
Chinese (zh)
Inventor
方高林
刘怀军
郑全战
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a question-answer pair quality evaluation method comprising: clustering the input question-answer pairs by question content to obtain clusters, each consisting of questions with the same or similar meaning together with their answers; performing inter-pair quality evaluation and intra-pair quality evaluation on each cluster to obtain an inter-pair quality evaluation result and an intra-pair quality evaluation result, respectively; and fusing the two results and outputting the high-quality question-answer pairs. The invention also provides a question-answer pair quality evaluation system. The method and system evaluate question-answer pair quality more effectively and improve the generality of the evaluation.

Description

A question-answer pair quality evaluation method and system
Technical field
The present invention relates to Internet information processing technology, and in particular to a question-answer pair quality evaluation method and system.
Background
With the development of the Internet, information has become increasingly abundant, and obtaining useful knowledge from this mass of information is a pressing problem. To provide better knowledge services, a number of knowledge question-answering interaction platforms have emerged. On these platforms the user is both a consumer and a creator of content: users can seek help or entertainment, interact socially, ask and answer questions, and rate the answers to questions. A typical question-answer pair is produced as follows: a user asks a question on the platform, other users answer it, and the asker confirms a satisfactory answer from among the answers given by different users.
As the number of questions grows, so does the number of semantically duplicate questions, because most users do not check whether an identical question and answer already exist in the system before asking. As a result, current question-answering platforms contain many duplicate question-answer pairs. Although every resolved question has gone through the asker's confirmation step, different askers apply different standards; some askers give a very high rating simply to thank the answerer, regardless of the quality of the answer. On platforms with duplicate questions and answers it is therefore necessary to distinguish high-quality question-answer pairs from low-quality ones.
The prior art includes a method that classifies question-answer pairs using a decision tree framework that combines several features. The features fall into two groups: text content features and usage features. Text content features include word N-grams, word length, the number of distinct words, the number of words whose frequency exceeds a threshold, the entropy of a character-based trigram language model in the answer, and so on. Usage features mainly include the numbers of users who approve and disapprove of the question-answer pair, the answerer's rank, the asker's rank, and so on. This method studies the role of the different features and combines them under the decision tree framework to distinguish high-quality question-answer pairs from medium- and low-quality ones.
However, this method does not consider the semantic matching degree between the question and the answer, which is the basis of a high-quality question-answer pair. Nor does it consider how the relationships among many duplicate question-answer pairs affect the quality of a pair. In addition, question-answer pair data often lack the usage features generated during production, and the method's heavy reliance on those features limits its generality. Prior-art quality evaluation of question-answer pairs is therefore unsatisfactory in effect and poor in generality.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a question-answer pair quality evaluation method and system that evaluate question-answer pairs more effectively and with better generality.
To achieve this purpose, the technical solution of the present invention is implemented as follows:
The invention provides a question-answer pair quality evaluation method, the method comprising:
clustering the input question-answer pairs by question content to obtain clusters consisting of semantically identical or similar questions and their answers;
performing inter-pair quality evaluation and intra-pair quality evaluation on the clusters to obtain an inter-pair quality evaluation result and an intra-pair quality evaluation result, respectively;
fusing the inter-pair quality evaluation result with the intra-pair quality evaluation result and outputting the high-quality question-answer pairs.
The clustering comprises k-means clustering and single-pass clustering.
The single-pass clustering is specifically:
Each newly input question is compared for similarity, one by one, with the clusters that currently exist. If the similarity between the question and one of the clusters exceeds a preset similarity threshold, the question is merged into that cluster; if the similarity between the question and every existing cluster is below the preset similarity threshold, a new cluster is created for the question.
The inter-pair quality evaluation is specifically:
performing word segmentation, part-of-speech tagging, and stop-word removal on each answer in the cluster;
counting the document frequency of each word, and taking the words whose document frequency is greater than a frequency threshold as the topic center of all answers in the cluster;
computing the distance between each answer and the topic center with a general cosine distance function, and ranking the answers by the weight of that distance;
eliminating similarity relations and inclusion relations among the ranked answers by a sentence-level similarity calculation, to obtain the inter-pair quality evaluation result.
The intra-pair quality evaluation comprises: evaluating the quality of the question and of the answer, computing the matching degree between the question and the answer, and evaluating the quality of the single question-answer pair.
The evaluation of question and answer quality covers at least one of the following: the question format information, the answer length, the visual feature information in the answer, the question positive/negative example dictionary feature, the answer positive/negative example dictionary feature, and the non-text features produced while the question-answer pair was formed.
The method further comprises: obtaining the matching degree between the question and the answer by a topic-clustering-based approach.
The evaluation of the quality of a single question-answer pair is specifically:
fusing the following features with a maximum entropy statistical model to obtain a quality evaluation score for each question-answer pair:
the question format information, the answer length, the visual feature information in the answer, the question positive/negative example dictionary feature, the answer positive/negative example dictionary feature, the non-text features produced while the question-answer pair was formed, and the matching degree between the question and the answer.
The present invention also provides a question-answer pair quality evaluation system, the system comprising:
a clustering module, configured to cluster the input question-answer pairs by question content to obtain clusters consisting of semantically identical or similar questions and their answers;
a first quality evaluation module, configured to perform inter-pair quality evaluation on the clusters to obtain an inter-pair quality evaluation result;
a second quality evaluation module, configured to perform intra-pair quality evaluation on the clusters to obtain an intra-pair quality evaluation result;
a fusion module, configured to fuse the inter-pair quality evaluation result with the intra-pair quality evaluation result and output the high-quality question-answer pairs.
The question-answer pair quality evaluation method and system provided by the present invention cluster the input question-answer pairs by question content to obtain clusters consisting of semantically identical or similar questions and their answers; then perform inter-pair quality evaluation and intra-pair quality evaluation on the clusters to obtain an inter-pair quality evaluation result and an intra-pair quality evaluation result, respectively; and finally fuse the two results and output the high-quality question-answer pairs. The invention evaluates question-answer pair quality more effectively and with better generality.
The invention can separate high-quality question-answer pairs from low-quality ones to form a high-quality knowledge base. As a data source for a search engine, the high-quality data can form part of the search index and be placed directly near the top of the search results. As a knowledge base for automatic question answering, the selected high-quality data can serve directly as the knowledge source that provides answers to users. In addition, the invention can process not only knowledge question-answering data but also user-generated question-answer data from blogs, forums, bulletin board systems (BBS), and frequently asked questions with their answers (FAQ); the high-quality data obtained after evaluation can be used directly to build encyclopedic knowledge.
Brief description of the drawings
Fig. 1 is a flowchart of the question-answer pair quality evaluation method of the present invention;
Fig. 2 is a schematic diagram of fusing the quality evaluation results in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the question-answer pair quality evaluation system of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments.
The question-answer pair quality evaluation method provided by the present invention, shown in Fig. 1, mainly comprises the following steps:
Step 101: cluster the input question-answer pairs by question content to obtain clusters consisting of semantically identical or similar questions and their answers.
In practice the same question can be expressed in different ways. Question clustering therefore groups semantically identical or similar questions into one class, and these questions together with their answers form a cluster in which multiple questions correspond to multiple answers. The invention can implement question clustering with algorithms such as k-means clustering and single-pass clustering, but it is not limited to these and can be extended as needed. Single-pass clustering is used as the example below. Its principle is as follows: each newly input question is compared for similarity, one by one, with the classes that currently exist; if its similarity to some class exceeds a preset similarity threshold, the question is merged into that class; if its similarity to every existing class is below the threshold, a new class is created for the question. To improve the ability of the clustering operation to handle large-scale data, the invention can use the topic words of the questions as the index of each class.
The single-pass clustering operation is specifically:
Step A: analyze the newly input question, including word segmentation, part-of-speech tagging, and stop-word removal on the sentences of the question.
Step B: normalize the words obtained from the analysis.
A synonym dictionary is built, and all synonyms are represented by a single word according to that dictionary. For example, every full name is replaced by its abbreviation, and commonly miswritten words are replaced by the correct word. The synonym dictionary is compiled by human editors. It contains equivalent words, such as "computer" = "computing machine", "dad" = "father", and "much" = "how old"; full names and their abbreviations, such as Olympic Games (full name) = Olympics (abbreviation); and error corrections that map a miswritten word to its correct form.
Step C: extract topic words, ranked by weight, from the normalized question.
The topic words are extracted as follows. For each word in the question, look up its term frequency tf and document frequency df in the statistical corpus, and assign the word the weight $\lambda \log(tf)\,\log(1/df)$. Sort all words in the question by weight in descending order and take the top-ranked words (for example, the first 3) as topic words. Here λ is a parameter set separately for each part of speech; its value usually decreases in the order noun, adjective, verb, adverb. The term frequency is the frequency with which a word occurs in the statistical corpus; the document frequency is the frequency of documents in the statistical corpus that contain the word.
Step D: using the topic words extracted from the question, compute the similarity between the question and each existing class. If the similarity between the question and some class exceeds the preset similarity threshold, merge the question into that class; if the similarity to every existing class is below the threshold, create a new class for the question and use the question's topic words as the index of the new class.
Because the words in the questions have been normalized, computing the similarity between a question and a class amounts to comparing the words they share. The similarity value is defined as
[Formula image A200910081558D0009152313QIETU: similarity formula, not reproduced in the text]
where k denotes the number of shared words, $tf_k$ is the term frequency of the k-th shared word, $df_k$ is the document frequency of the k-th shared word, and $\lambda_k$ is the parameter value corresponding to the k-th shared word.
After question clustering, semantically identical or similar questions are gathered into one class, and these questions together with their answers form a cluster in which multiple questions correspond to multiple answers.
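Steps C and D above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the values in `POS_LAMBDA` are hypothetical (the text only gives their ordering), tf is taken to be a corpus count and df a ratio in (0, 1], the similarity is assumed to be the sum of the weights of the shared topic words because the similarity formula is not reproduced in the text, and the cluster index is left unchanged when a question is merged.

```python
import math

# Hypothetical lambda values; the text only says they decrease in the
# order noun, adjective, verb, adverb.
POS_LAMBDA = {"n": 1.0, "a": 0.8, "v": 0.6, "d": 0.4}


def word_weight(word, pos, tf, df):
    """Weight lambda * log(tf) * log(1/df) from Step C."""
    return POS_LAMBDA.get(pos, 0.0) * math.log(tf[word]) * math.log(1.0 / df[word])


def topic_words(tagged_words, tf, df, top_n=3):
    """Return the top_n highest-weighted words of a question as {word: weight}."""
    weights = {w: word_weight(w, pos, tf, df)
               for w, pos in tagged_words if w in tf and w in df}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])


def single_pass_cluster(questions, tf, df, threshold):
    """Cluster (question_id, tagged_words) items in one pass (Step D)."""
    clusters = []  # each cluster: {"index": {word: weight}, "members": [ids]}
    for qid, tagged in questions:
        topics = topic_words(tagged, tf, df)
        best, best_sim = None, 0.0
        for cluster in clusters:
            # Assumed similarity: sum of the weights of shared topic words.
            sim = sum(wt for w, wt in topics.items() if w in cluster["index"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim > threshold:
            best["members"].append(qid)  # merge into the most similar class
        else:
            # No class is similar enough: open a new one indexed by topic words.
            clusters.append({"index": dict(topics), "members": [qid]})
    return clusters
```

Indexing each class by its topic words keeps the comparison cheap, since only a handful of words per class have to be checked for each incoming question.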
Step 102: perform inter-pair quality evaluation and intra-pair quality evaluation on the clusters to obtain an inter-pair quality evaluation result and an intra-pair quality evaluation result, respectively.
Inter-pair quality evaluation analyzes the different answers in a cluster and judges the quality of the question-answer pairs from the relationships among the answers. For the evaluation among the answers in each cluster, the invention uses a quality evaluation method based on a topic center, which specifically comprises:
Step a: treat each answer in the cluster as a document and perform word segmentation, part-of-speech tagging, and stop-word removal.
Step b: count the document frequency of each word, and take the words whose document frequency is greater than a frequency threshold as the topic center of all answers in the cluster.
The frequency threshold can be set as needed. For example, with a frequency threshold of 1, all nouns, verbs, adjectives, and adverbs whose document frequency is greater than 1 form the topic center of the whole answer set. Because each answer in the cluster is treated as a document, a document frequency greater than 1 means that the word occurs in at least two answers of the cluster.
Step c: compute the distance between each answer and the topic center with a general cosine distance function, and rank all answers by the weight of that distance.
Let the feature word set of the topic center of the answers be $O = \{w_1, w_2, \dots, w_n\}$ and the word set of the current answer A be $A = \{c_1, c_2, \dots, c_m\}$. The cosine between answer A and the topic center O is then

$$\cos(A, O) = \frac{\sum_{i \in A \cap O} W_{o,i}\, W_{a,i}}{\sqrt{\sum_{x \in O} (W_{o,x})^2}\; \sqrt{\sum_{y \in A} (W_{a,y})^2}}$$

where $W_{o,x}$ is the weight of word x in the topic center O, with $W_{o,x} = tf_{L,x} \log(tf_x) \log(1/df_x)$; $tf_{L,x}$ is the local frequency of word x counted over the answers in this cluster; and $W_{a,y}$ is the weight of word y in answer A, with $W_{a,y} = \log(tf_y) \log(1/df_y)$. The term frequency $tf_x$ and document frequency $df_x$ of each word are obtained from statistics over the whole corpus.
After the cosine distance between each answer and the topic center has been computed, all answers are sorted by the weight of that distance in descending order. The larger the weight, the closer the answer is to the topic center.
Step d: eliminate similarity relations and inclusion relations in the ranked answer set.
The ranked answer set is analyzed to determine whether it contains identical or similar answers (a similarity relation) or an answer that is a subset of another answer (an inclusion relation). If two answers are identical or similar, their ranking weights are essentially the same; if one answer is a subset of another, the ranking weight of the superset is greater than that of the subset.
When the answer set contains a similarity relation, that is, identical or similar answers, only the answer with the largest ranking weight needs to be kept; the rest are redundant and can be removed. When the answer set contains an inclusion relation, that is, one answer is a subset of another, only the superset answer needs to be kept.
To identify the similarity relations and inclusion relations in the answer set, the invention uses a sentence-level similarity calculation, specifically:
Step 01: split each answer into sentences. Sentences are delimited by punctuation marks such as "." and "?", and each sentence is about 50 characters long.
Step 02: use a hash algorithm to convert the text of each sentence into a 4-byte fingerprint. An answer in the answer set then corresponds to a set of 4-byte fingerprints $A = \{s_1, s_2, \dots, s_n\}$. Treating each $s_i$ as a word, an inverted list of documents can be built, in which all texts that share the same fingerprint $s_i$ form one class. The fingerprint repetition degree is then computed pairwise for the texts in a class. If the repetition degree is greater than a preset threshold (for example, 40%), a similarity relation or inclusion relation is judged to exist; of the two answers involved in the calculation, the one with the lower ranking weight is marked for removal, and the relation between the two answers is recorded. Conversely, if the repetition degree is below the preset threshold, no similarity relation or inclusion relation is judged to exist, and no further processing is needed.
Step 03: repeat the above process until the similarity relations and inclusion relations of all classes have been eliminated.
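Steps 01 to 03 above can be sketched as follows. This is only an illustrative sketch: CRC32 stands in for the unspecified 4-byte hash, the repetition degree is assumed to be the number of shared fingerprints divided by the fingerprint count of the smaller answer, and the guideline of roughly 50 characters per sentence is not enforced.

```python
import re
import zlib
from collections import defaultdict
from itertools import combinations


def sentence_fingerprints(answer):
    """Split an answer at sentence-ending punctuation and hash each sentence
    to a 32-bit (4-byte) value. CRC32 is an assumed stand-in hash."""
    sentences = [s.strip() for s in re.split(r"[.?!。？！]", answer) if s.strip()]
    return {zlib.crc32(s.encode("utf-8")) for s in sentences}


def mark_redundant(ranked_answers, threshold=0.4):
    """ranked_answers: answer texts ordered from highest to lowest ranking
    weight. Returns the indices of the answers marked for removal."""
    prints = [sentence_fingerprints(a) for a in ranked_answers]
    # Inverted list: fingerprint -> indices of the answers that contain it.
    inverted = defaultdict(set)
    for idx, fps in enumerate(prints):
        for fp in fps:
            inverted[fp].add(idx)
    removed = set()
    for group in inverted.values():
        for i, j in combinations(sorted(group), 2):
            shared = len(prints[i] & prints[j])
            # Assumed repetition degree: shared share of the smaller answer.
            overlap = shared / min(len(prints[i]), len(prints[j]))
            if overlap > threshold:
                removed.add(j)  # j > i, so answer j has the lower ranking weight
    return removed
```

Only answers that share at least one fingerprint are ever compared, which is the point of the inverted list: unrelated answers never enter the pairwise calculation.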
After the above operations are complete, among the answers in the answer set that have a similarity relation or inclusion relation, those with the lower ranking weight have all been marked for removal. An evaluation score is produced on this basis and serves as the inter-pair quality evaluation result.
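The topic-center ranking of steps a to c can be sketched as below. The sketch assumes the answers are already segmented with stop words removed, that `tf` holds corpus counts and `df` holds document-frequency ratios in (0, 1], and it skips the part-of-speech filter; the function and parameter names are illustrative only.

```python
import math
from collections import Counter


def rank_by_topic_center(answers, tf, df, freq_threshold=1):
    """answers: list of token lists. Returns (answer_index, cosine) pairs
    sorted from closest to the topic center to farthest."""
    # In how many answers of the cluster each word occurs (step b).
    local_df = Counter(w for tokens in answers for w in set(tokens))
    # Local frequency tf_L of each word over the answers of the cluster.
    local_tf = Counter(w for tokens in answers for w in tokens)
    center = {w for w, c in local_df.items()
              if c > freq_threshold and w in tf and w in df}

    def base(word):
        return math.log(tf[word]) * math.log(1.0 / df[word])

    w_o = {w: local_tf[w] * base(w) for w in center}  # W_o
    norm_o = math.sqrt(sum(v * v for v in w_o.values()))
    scored = []
    for idx, tokens in enumerate(answers):
        w_a = {w: base(w) for w in set(tokens) if w in tf and w in df}  # W_a
        norm_a = math.sqrt(sum(v * v for v in w_a.values()))
        dot = sum(w_o[w] * w_a[w] for w in center if w in w_a)
        cos = dot / (norm_o * norm_a) if norm_o and norm_a else 0.0
        scored.append((idx, cos))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The ranked indices can then be passed, in order, to a redundancy check so that the lower-ranked member of each similar pair is the one marked for removal.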
The intra-pair quality evaluation comprises: evaluating the quality of the question and of the answer, computing the matching degree between the question and the answer, and evaluating the quality of the single question-answer pair.
The evaluation of question and answer quality can cover at least one of the following:
1. Question format information, including the length of the question, its punctuation, and whether it contains an interrogative word. A question that satisfies the prescribed format tends to be of higher quality, whereas a question that does not follow the format and is unclearly expressed is usually not of high quality.
2. The length of the answer. According to statistics, answers of moderate length usually have higher quality.
3. Visual feature information in the answer, including the number of words in each paragraph, whether paragraph beginnings carry bold or emphasis marks, and so on. A higher-quality answer usually has, besides moderate length, good visual feature information.
4. The question positive/negative example dictionary feature, that is, the proportions of the words in the question that appear in the positive example dictionary and in the negative example dictionary.
5. The answer positive/negative example dictionary feature, that is, the proportions of the words in the answer that appear in the positive example dictionary and in the negative example dictionary.
6. Non-text features produced while the question-answer pair was formed, for example: the user rating, the answerer's rank, the answerer's number of answers and acceptance rate, and so on.
To reflect the quality of questions and answers, the invention defines a positive example dictionary and a negative example dictionary. If a large proportion of the words in a question or answer are in the positive example dictionary, the question or answer is more likely to be of high quality; conversely, if a large proportion of its words are in the negative example dictionary, it is less likely to be of high quality.
The positive and negative example dictionaries are built as follows. First, a large corpus of question-answer pairs (for example, 5000 pairs) is extracted and labeled into two classes: a high-quality data set D1 and a medium- and low-quality data set D2. All words that occur in the extracted questions and answers are then counted. If the frequency of a word in the high-quality data set D1 divided by its frequency in the whole data set (D1 and D2 together) is greater than a preset threshold α1, the word enters the positive example dictionary; if that ratio is less than a preset threshold α2, the word enters the negative example dictionary. Words that occur in questions enter the question positive/negative example dictionaries, and words that occur in answers enter the answer positive/negative example dictionaries.
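The dictionary construction just described can be sketched as follows. The threshold values 0.8 and 0.2 are placeholders for α1 and α2, which the text leaves open, and the inputs are assumed to be already segmented into words.

```python
from collections import Counter


def build_dictionaries(high_docs, low_docs, alpha1=0.8, alpha2=0.2):
    """high_docs / low_docs: token lists from the labeled sets D1 and D2.
    alpha1 and alpha2 are placeholder thresholds.
    Returns (positive_dictionary, negative_dictionary) as sets of words."""
    high = Counter(w for doc in high_docs for w in doc)
    low = Counter(w for doc in low_docs for w in doc)
    positive, negative = set(), set()
    for word in set(high) | set(low):
        # Frequency in D1 divided by frequency in the whole data set.
        ratio = high[word] / (high[word] + low[word])
        if ratio > alpha1:
            positive.add(word)
        elif ratio < alpha2:
            negative.add(word)
    return positive, negative
```

Calling the function once on question texts and once on answer texts yields the separate question and answer dictionaries.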
The invention proposes a topic-clustering-based method for computing the matching degree between the question and the answer, specifically:
Step 001: collect a sufficiently large general corpus (for example, 80 GB) as the statistical corpus for pointwise mutual information, perform word segmentation on it, and compute the pointwise mutual information between words according to the formula

$$PMI(w_1, w_2) = \log_2 \frac{P(w_1, w_2)}{P(w_1)\, P(w_2)}$$

where $PMI(w_1, w_2)$ is the pointwise mutual information between words $w_1$ and $w_2$, $P(w_1)$ is the frequency of occurrence of $w_1$ in the statistics, $P(w_2)$ is the frequency of occurrence of $w_2$ in the statistics, and $P(w_1, w_2)$ is the co-occurrence frequency of $w_1$ and $w_2$. Words $w_1$ and $w_2$ are considered to co-occur if they appear within several consecutive sentences whose total length is less than a length threshold (for example, 150 Chinese characters). Multiple occurrences of $w_1$ and $w_2$ within one document are counted only once.
Step 002: perform word segmentation and part-of-speech tagging on the questions in the cluster and keep the words tagged as nouns, $q_1, q_2, \dots, q_m$; the number of nouns is denoted m.
Step 003: process the answer according to its length. If the answer is longer than a length threshold (for example, 150 Chinese characters), extract topic words from it: look up the term frequency tf and document frequency df of each word of the answer in the global statistical corpus, assign each word the weight $TF \log(tf)\, \log(1/df)$, sort all words in the answer by weight in descending order, and take the top-ranked nouns (for example, n = 50) as topic words. Here TF is the local frequency of the word counted in the answer that contains it. If the answer is shorter than the length threshold, perform word segmentation and part-of-speech tagging on it directly and extract the words tagged as nouns. In both cases the resulting words are $a_1, a_2, \dots, a_n$, and their number is denoted n.
Step 004: taking each $q_i$ as a topic origin, judge whether the pointwise mutual information between $a_j$ and $q_i$ is greater than a mutual information threshold. If it is, add $a_j$ to the center chain; if it is below the threshold for every $q_i$, delete $a_j$. The number of words finally contained in the center chain is denoted k, and the matching degree between the question and the answer is defined as $k/(m+n)$. Under this definition, the more keywords in the answer are related to the keywords in the question, the larger this value is, and the higher the relevance between the question and the answer.
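Steps 001 to 004 can be sketched as below, assuming the occurrence and co-occurrence probabilities have already been estimated from the statistical corpus. The matching degree written as k/m+n in the source is read here as k/(m+n), which is an interpretation; the dictionary layout of `p_single` and `p_joint` is likewise an assumption made for the sketch.

```python
import math


def pmi(w1, w2, p_single, p_joint):
    """PMI(w1, w2) = log2(P(w1, w2) / (P(w1) * P(w2))).

    p_single: dict word -> occurrence probability
    p_joint:  dict frozenset({w1, w2}) -> co-occurrence probability
    """
    joint = p_joint.get(frozenset((w1, w2)), 0.0)
    if joint == 0.0:
        return float("-inf")  # the two words were never seen together
    return math.log2(joint / (p_single[w1] * p_single[w2]))


def matching_degree(question_nouns, answer_nouns, p_single, p_joint, threshold):
    """Build the center chain and return k / (m + n)."""
    chain = [a for a in answer_nouns
             if any(pmi(q, a, p_single, p_joint) > threshold
                    for q in question_nouns)]
    k, m, n = len(chain), len(question_nouns), len(answer_nouns)
    return k / (m + n) if m + n else 0.0
```

An answer noun is kept as soon as it is strongly associated with any one question noun, which mirrors the rule that a word is deleted only when it falls below the threshold for every topic origin.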
In addition, in order to merge above-mentioned various features, the present invention adopts the fusion framework of maximum entropy statistical model as each feature, to realize the evaluation of single question and answer to quality.Certainly, the fusion framework among the present invention also can adopt the sorter of other types to realize, for example: and support vector machine, Bayes etc., and fusion framework of the present invention is not limited only to above-mentioned the act.
To estimate sorter be example that the fusion process of each feature is described in detail with maximum entropy below, as shown in Figure 2, maximum entropy is estimated the input feature vector that sorter adopts and is comprised: the length of problem formatted message, answer, answer in visual signature information, the positive counter-example dictionary of problem feature, answer positive counter-example dictionary feature, question and answer matching degree to non-text feature, problem and answer in the forming process.
Wherein, the positive counter-example dictionary of problem feature with the production process of answering positive counter-example dictionary feature is: each speech in statistical problem and the answer is belonging to quality data and the probability that belongs to low quality data in positive counter-example dictionary respectively; Utilize Bayesian formula to calculate P (good|Q) then, the probability of P (good|A), this probability are respectively as the problem of maximum entropy positive counter-example dictionary feature with answer the input of positive counter-example dictionary feature.
The length of answering is defined as the probability P (good|L) that belongs to quality data under this length L, and P ( good | L ) = P ( good ) p ( good | L ) P ( good ) p ( good | L ) + P ( bad ) p ( bad | L ) . Probability p (good|L), p (bad|L) adds up in training process and obtains.
Question and answer are to the non-text feature in the forming process, be that ratio, answerer's the answer number of the rank of ratio, answerer by the user being estimated score and best result and the highest level answerer's acceptance rate during greater than certain numerical value averages, a numerical value that obtains is with the input of this numerical value as non-text feature.
The question format information is defined as $P(good|Q) = \lambda_1 P(good|L_Q) + \lambda_2 + \lambda_3$, where $\lambda_1 + \lambda_2 + \lambda_3 = 1$. Here $\lambda_1$ is the weight of the probability $P(good|L_Q)$ that a question of length $L_Q$ is high quality, $\lambda_2$ is the weight for the question being high quality when it has the punctuation mark feature, and $\lambda_3$ is the weight for the question being high quality when it has the interrogative word feature.
The visual feature information is the result of judging whether the final form of the answer satisfies the format information: if it does, the feature is 1; otherwise it is 0.
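The question format score can be sketched as follows. The formula in the text adds the weights for the punctuation and interrogative-word features directly; the sketch assumes each of those weights contributes only when the question actually has the corresponding feature. The function name and the default weight values are hypothetical.

```python
def question_format_feature(p_good_given_len, has_punctuation, has_interrogative,
                            lam1=0.6, lam2=0.2, lam3=0.2):
    """Combine length probability, punctuation and interrogative-word cues."""
    if abs(lam1 + lam2 + lam3 - 1.0) > 1e-9:
        raise ValueError("the three weights must sum to 1")
    score = lam1 * p_good_given_len
    if has_punctuation:       # assumption: weight applies only if present
        score += lam2
    if has_interrogative:     # assumption: weight applies only if present
        score += lam3
    return score
```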
The training process is as follows: the model parameters of the maximum entropy model are first trained on 10,000 training samples; the trained parameters are then used for recognition, so that each question-answer pair finally receives an evaluation score, which is taken as the quality evaluation result within the question-answer pair. Question-answer pairs whose score is below a certain threshold are regarded as low-quality pairs and are deleted directly.
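The train-then-score procedure can be illustrated with a small sketch. For two classes (high quality and low quality) a maximum entropy model is equivalent to logistic regression, so the sketch fits logistic-regression weights by batch gradient ascent. The function names, learning rate, epoch count and threshold are all hypothetical, and the toy data stands in for the 10,000 training samples mentioned in the text.

```python
import math

def score(x, w, b):
    """Evaluation score in (0, 1) for one feature vector x."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit weights and bias by batch gradient ascent on the log-likelihood."""
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = y - score(x, w, b)          # label minus predicted probability
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

def filter_pairs(pairs, features, w, b, threshold=0.5):
    """Keep only the pairs whose score reaches the threshold."""
    return [p for p, x in zip(pairs, features) if score(x, w, b) >= threshold]
```

Each feature vector would hold the seven inputs listed earlier (format information, length, visual feature, the two dictionary features, the non-text feature and the matching degree); the toy vectors used to exercise the sketch can be of any fixed length.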
Step 103: fuse the quality evaluation result between question-answer pairs with the quality evaluation result within question-answer pairs, and output the high-quality question-answer pairs.
The quality evaluation result of each single question-answer pair and the quality evaluation result between the question-answer pairs in the cluster are fused. The fusion can be done by weighting, or by a classifier that takes the single-pair evaluation score and the between-pair evaluation score as two features. Based on experimental statistics, the present invention adopts the following scheme:
First, count the number N of all question-answer pairs in the cluster, and pass all question-answer pairs in the cluster through the single-pair evaluation classifier, so that only the high-quality question-answer pairs remain;
For these high-quality question-answer pairs, remove those with a lower ranking weight among the pairs that are contained in, or similar to, other pairs;
The clusters are then graded and scored according to the number N of question-answer pairs they contain:
If N > 50, the maximum ranking value in the cluster is normalized to 1, and the top three pairs are taken as high-quality question-answer pairs;
If N > 20, the maximum ranking value in the cluster is normalized to 0.9, and the top two pairs are taken as high-quality question-answer pairs;
If N > 10, the maximum ranking value in the cluster is normalized to 0.8, and the top pair is taken as the high-quality question-answer pair;
If N > 5, the maximum ranking value in the cluster is normalized to 0.7, and the top pair is taken as the high-quality question-answer pair;
If N > 1, the maximum ranking value in the cluster is normalized to 0.6 and averaged with the evaluation score within the question-answer pair; if the maximum score exceeds 0.7, the pair is kept as a high-quality question-answer pair, otherwise it is deleted;
If N = 1, the score of the question-answer pair is set to 0.5 and averaged with the evaluation score within the question-answer pair; if the maximum score exceeds 0.7, the pair is kept as a high-quality question-answer pair, otherwise it is deleted.
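The graded scheme above can be sketched as a small lookup. The text leaves some details open; the sketch assumes that the "maximum score" in the two small-cluster cases refers to the averaged score of the top-ranked surviving pair, and that normalization rescales all ranking weights by the same factor. The names `TIERS` and `fuse_cluster` are hypothetical.

```python
# (cluster size must exceed, value the top ranking weight is normalized to, pairs kept)
TIERS = [(50, 1.0, 3), (20, 0.9, 2), (10, 0.8, 1), (5, 0.7, 1)]

def fuse_cluster(n_pairs, ranked):
    """Select the high-quality pairs of one cluster.

    n_pairs: number N of question-answer pairs originally in the cluster.
    ranked:  [(pair_id, ranking_weight, within_pair_score), ...] for the
             pairs that survived the single-pair classifier, sorted by
             ranking_weight in descending order.
    Returns [(pair_id, fused_score), ...].
    """
    if not ranked:
        return []
    for lower, top_value, keep in TIERS:
        if n_pairs > lower:
            top_weight = ranked[0][1]
            scale = top_value / top_weight if top_weight > 0 else 0.0
            return [(pid, w * scale) for pid, w, _ in ranked[:keep]]
    # Small clusters: average a fixed value with the within-pair score.
    fixed = 0.6 if n_pairs > 1 else 0.5
    pid, _, within = ranked[0]
    fused = (fixed + within) / 2.0
    return [(pid, fused)] if fused > 0.7 else []
```

Under these assumptions, a single-pair cluster (N = 1) can never pass the 0.7 cut, because the average of 0.5 and a score of at most 1 is at most 0.75 only when the within-pair score exceeds 0.9; pairs in tiny clusters therefore need a very strong within-pair score to survive.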
To implement the question-answer pair quality evaluation method described above, the present invention also provides a question-answer pair quality evaluation system. As shown in Figure 3, the system comprises a clustering module 10, a first quality evaluation module 20, a second quality evaluation module 30, and a fusion module 40. The clustering module 10 clusters the input question-answer pairs according to question content, obtaining clusters composed of semantically identical or similar questions and their answers. The first quality evaluation module 20 is connected to the clustering module 10 and performs quality evaluation between question-answer pairs on each cluster, obtaining the quality evaluation result between question-answer pairs. The second quality evaluation module 30 is connected to the clustering module 10 and performs quality evaluation within question-answer pairs on each cluster, obtaining the quality evaluation result within question-answer pairs. The fusion module 40 is connected to the first quality evaluation module 20 and the second quality evaluation module 30; it fuses the quality evaluation result between question-answer pairs with the quality evaluation result within question-answer pairs and outputs the high-quality question-answer pairs.
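The way the four modules are connected can be sketched as a small pipeline. This is only an illustration of the data flow described above; the class name, field names and type aliases are hypothetical, and each module is represented by a plain callable supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

QAPair = Tuple[str, str]            # (question, answer)
Cluster = List[QAPair]
Scores = Dict[QAPair, float]

@dataclass
class QualityEvaluationSystem:
    cluster_module: Callable[[List[QAPair]], List[Cluster]]            # module 10
    first_module: Callable[[Cluster], Scores]                          # module 20
    second_module: Callable[[Cluster], Scores]                         # module 30
    fusion_module: Callable[[Cluster, Scores, Scores], List[QAPair]]   # module 40

    def run(self, pairs: List[QAPair]) -> List[QAPair]:
        """Cluster the input, evaluate each cluster twice, then fuse."""
        high_quality: List[QAPair] = []
        for cluster in self.cluster_module(pairs):
            between = self.first_module(cluster)    # between-pair result
            within = self.second_module(cluster)    # within-pair result
            high_quality.extend(self.fusion_module(cluster, between, within))
        return high_quality
```

Both evaluation modules read the same cluster independently, which mirrors the description: each is connected to the clustering module, and only the fusion module sees both results.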
In summary, the present invention can separate high-quality question-answer pairs from low-quality ones and form a high-quality knowledge base. As a data source for a search engine, the high-quality data can be made part of the search engine index and placed directly near the top of the search results. As a knowledge base for automatic question answering, the selected high-quality data can serve directly as the knowledge source from which answers are provided to users. In addition, the present invention can process not only knowledge question-answer data but also user-generated data such as blogs, forums, BBS, and FAQ question-answer data; the high-quality data obtained after evaluation can be used directly to build encyclopedic knowledge.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection.

Claims (9)

1. A quality evaluation method for question-answer pairs, characterized in that the method comprises:
clustering the input question-answer pairs according to question content, to obtain clusters composed of semantically identical or similar questions and their answers;
performing quality evaluation between question-answer pairs and quality evaluation within question-answer pairs on the clusters, and obtaining respectively a quality evaluation result between question-answer pairs and a quality evaluation result within question-answer pairs;
fusing the quality evaluation result between question-answer pairs with the quality evaluation result within question-answer pairs, and outputting the high-quality question-answer pairs.
2. The quality evaluation method for question-answer pairs according to claim 1, characterized in that the clustering comprises: k-means clustering and single-pass clustering.
3. The quality evaluation method for question-answer pairs according to claim 2, characterized in that the single-pass clustering is specifically:
computing, one by one, the similarity between each subsequently input question and the currently existing classes; if the similarity between the question and one of the classes exceeds a preset similarity threshold, merging the question into the corresponding class; if the similarity between the question and every currently existing class is below the preset similarity threshold, creating a new class for the question.
4. The quality evaluation method for question-answer pairs according to claim 1, characterized in that the quality evaluation between question-answer pairs is specifically:
performing word segmentation, part-of-speech tagging, and stop-word removal on each answer in the cluster;
counting the document frequency of each word, and taking the words whose document frequency is greater than a frequency threshold as the topic center of all answers in the cluster;
computing the distance between each answer and the topic center with a general cosine distance function, and ranking the answers by the weight of the distance;
eliminating, based on sentence-level similarity computation, the similarity relations and containment relations among the ranked answers, to obtain the quality evaluation result between question-answer pairs.
5. The quality evaluation method for question-answer pairs according to claim 1, characterized in that the quality evaluation within question-answer pairs comprises: evaluation of the quality of the question and of the answer, computation of the matching degree between the question and the answer, and evaluation of the quality of the single question-answer pair.
6. The quality evaluation method for question-answer pairs according to claim 5, characterized in that the evaluation of the quality of the question and of the answer covers at least one of the following: the question format information, the answer length, the visual feature information in the answer, the question positive/negative example dictionary feature, the answer positive/negative example dictionary feature, and the non-text features from the process in which the question-answer pair was formed.
7. The quality evaluation method for question-answer pairs according to claim 5, characterized in that the method further comprises: obtaining the matching degree between the question and the answer by means of topic-based clustering.
8. The quality evaluation method for question-answer pairs according to claim 5, characterized in that the evaluation of the quality of the single question-answer pair is specifically:
fusing the following features by means of a maximum entropy statistical model, to obtain a quality evaluation score for each question-answer pair:
the question format information, the answer length, the visual feature information in the answer, the question positive/negative example dictionary feature, the answer positive/negative example dictionary feature, the non-text features from the process in which the question-answer pair was formed, and the matching degree between the question and the answer.
9. A quality evaluation system for question-answer pairs, characterized in that the system comprises:
a clustering module, configured to cluster the input question-answer pairs according to question content, to obtain clusters composed of semantically identical or similar questions and their answers;
a first quality evaluation module, configured to perform quality evaluation between question-answer pairs on the clusters, to obtain a quality evaluation result between question-answer pairs;
a second quality evaluation module, configured to perform quality evaluation within question-answer pairs on the clusters, to obtain a quality evaluation result within question-answer pairs;
a fusion module, configured to fuse the quality evaluation result between question-answer pairs with the quality evaluation result within question-answer pairs, and to output the high-quality question-answer pairs.
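As an illustration of the single-pass clustering recited in claim 3, the following sketch assigns each incoming question to an existing class or opens a new one. The claims do not fix a similarity measure or say which class wins when several exceed the threshold; the sketch assumes word-overlap (Jaccard) similarity, compares a question with every member of a class, and joins the most similar class. The function names and the default threshold are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two whitespace-tokenized questions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def single_pass_cluster(questions, threshold=0.5, similarity=jaccard):
    """Cluster questions in one pass over the input, in arrival order."""
    clusters = []
    for q in questions:
        best_idx, best_sim = -1, 0.0
        for i, members in enumerate(clusters):
            # Similarity to a class = best similarity to any of its members.
            sim = max(similarity(q, m) for m in members)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim > threshold:
            clusters[best_idx].append(q)     # merge into the closest class
        else:
            clusters.append([q])             # no class is close enough
    return clusters
```

For Chinese text the whitespace tokenizer would need to be replaced by a word segmenter, which is outside the scope of this sketch.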
CN200910081558A 2009-04-13 2009-04-13 Question-answer pair quality evaluation method and system Pending CN101520802A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910081558A CN101520802A (en) 2009-04-13 2009-04-13 Question-answer pair quality evaluation method and system

Publications (1)

Publication Number Publication Date
CN101520802A true CN101520802A (en) 2009-09-02

Family

ID=41081391

Country Status (1)

Country Link
CN (1) CN101520802A (en)

Cited By (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102681992A (en) * 2011-03-07 2012-09-19 腾讯科技(深圳)有限公司 Method and system for data hierarchy
CN102955772A (en) * 2011-08-17 2013-03-06 北京百度网讯科技有限公司 Similarity computing method and similarity computing device on basis of semanteme
CN102955772B (en) * 2011-08-17 2015-11-25 北京百度网讯科技有限公司 A kind of similarity calculating method based on semanteme and device
CN103049637B (en) * 2011-10-11 2018-05-11 塔塔咨询服务有限公司 Strengthen the system and method for the content quality and user's participation of social platform
CN103049637A (en) * 2011-10-11 2013-04-17 塔塔咨询服务有限公司 Content quality and user engagement in social platforms
CN103218356B (en) * 2012-01-18 2017-12-08 深圳市世纪光速信息技术有限公司 A kind of enquirement quality judging method and system towards open platform
CN103218356A (en) * 2012-01-18 2013-07-24 深圳市腾讯计算机系统有限公司 Question quality judging method and system facing open platform
CN102629272A (en) * 2012-03-14 2012-08-08 北京邮电大学 Clustering based optimization method for examination system database
CN103377245A (en) * 2012-04-27 2013-10-30 腾讯科技(深圳)有限公司 Automatic question and answer method and device
CN103377245B (en) * 2012-04-27 2018-09-11 深圳市世纪光速信息技术有限公司 A kind of automatic question-answering method and device
CN103425635A (en) * 2012-05-15 2013-12-04 北京百度网讯科技有限公司 Method and device for recommending answers
CN103425635B (en) * 2012-05-15 2018-02-02 北京百度网讯科技有限公司 Method and apparatus are recommended in a kind of answer
CN103810170A (en) * 2012-11-06 2014-05-21 腾讯科技(深圳)有限公司 Communication platform text classification method and device
CN103810170B (en) * 2012-11-06 2018-04-27 腾讯科技(深圳)有限公司 Intercommunion platform file classification method and device
CN103810218B (en) * 2012-11-14 2018-06-08 北京百度网讯科技有限公司 A kind of automatic question-answering method and device based on problem cluster
CN103810218A (en) * 2012-11-14 2014-05-21 北京百度网讯科技有限公司 Problem cluster-based automatic asking and answering method and device
CN103049433A (en) * 2012-12-11 2013-04-17 微梦创科网络科技(中国)有限公司 Automatic question answering method, automatic question answering system and method for constructing question answering case base
CN103049433B (en) * 2012-12-11 2015-10-28 微梦创科网络科技(中国)有限公司 The method of automatic question-answering method, automatically request-answering system and structure question and answer case library
CN103226580B (en) * 2013-04-02 2016-03-30 西安交通大学 A kind of topic detection method of interaction text
CN103226580A (en) * 2013-04-02 2013-07-31 西安交通大学 Interactive-text-oriented topic detection method
CN104347071A (en) * 2013-08-02 2015-02-11 安徽科大讯飞信息科技股份有限公司 Method and system for generating oral test reference answer
WO2015058604A1 (en) * 2013-10-21 2015-04-30 北京奇虎科技有限公司 Apparatus and method for obtaining degree of association of question and answer pair and for search ranking optimization
CN103577556A (en) * 2013-10-21 2014-02-12 北京奇虎科技有限公司 Device and method for obtaining association degree of question and answer pair
CN103577558A (en) * 2013-10-21 2014-02-12 北京奇虎科技有限公司 Device and method for optimizing search ranking of frequently asked question and answer pairs
CN103577556B (en) * 2013-10-21 2017-01-18 北京奇虎科技有限公司 Device and method for obtaining association degree of question and answer pair
CN103577558B (en) * 2013-10-21 2017-04-26 北京奇虎科技有限公司 Device and method for optimizing search ranking of frequently asked question and answer pairs
CN103729424A (en) * 2013-12-20 2014-04-16 百度在线网络技术(北京)有限公司 Method and system for assessing answers in Q&A (questions and answers) community
CN103729424B (en) * 2013-12-20 2017-03-15 百度在线网络技术(北京)有限公司 Evaluation method and system is answered in Ask-Answer Community
CN104376074A (en) * 2014-11-14 2015-02-25 北京云知声信息技术有限公司 Method and system for obtaining repeating resources
CN104376074B (en) * 2014-11-14 2018-05-01 北京云知声信息技术有限公司 One kind repeats resource acquiring method and system
US10475043B2 (en) 2015-01-28 2019-11-12 Intuit Inc. Method and system for pro-active detection and correction of low quality questions in a question and answer based customer support system
WO2016122681A1 (en) * 2015-01-28 2016-08-04 Intuit Inc. Pro-active detection and correction of low quality questions in a customer support system
US10083213B1 (en) 2015-04-27 2018-09-25 Intuit Inc. Method and system for routing a question based on analysis of the question content and predicted user satisfaction with answer content before the answer content is generated
US10755294B1 (en) 2015-04-28 2020-08-25 Intuit Inc. Method and system for increasing use of mobile devices to provide answer content in a question and answer based customer support system
US11429988B2 (en) 2015-04-28 2022-08-30 Intuit Inc. Method and system for increasing use of mobile devices to provide answer content in a question and answer based customer support system
US10134050B1 (en) 2015-04-29 2018-11-20 Intuit Inc. Method and system for facilitating the production of answer content from a mobile device for a question and answer based customer support system
US10447777B1 (en) 2015-06-30 2019-10-15 Intuit Inc. Method and system for providing a dynamically updated expertise and context based peer-to-peer customer support system within a software application
US10147037B1 (en) 2015-07-28 2018-12-04 Intuit Inc. Method and system for determining a level of popularity of submission content, prior to publicizing the submission content with a question and answer support system
US10861023B2 (en) 2015-07-29 2020-12-08 Intuit Inc. Method and system for question prioritization based on analysis of the question content and predicted asker engagement before answer content is generated
US10475044B1 (en) 2015-07-29 2019-11-12 Intuit Inc. Method and system for question prioritization based on analysis of the question content and predicted asker engagement before answer content is generated
US10268956B2 (en) 2015-07-31 2019-04-23 Intuit Inc. Method and system for applying probabilistic topic models to content in a tax environment to improve user satisfaction with a question and answer customer support system
US10169718B2 (en) 2015-08-13 2019-01-01 International Business Machines Corporation System and method for defining and using different levels of ground truth
US11138521B2 (en) 2015-08-13 2021-10-05 International Business Machines Corporation System and method for defining and using different levels of ground truth
US10169717B2 (en) 2015-08-13 2019-01-01 International Business Machines Corporation System and method for defining and using different levels of ground truth
US10394804B1 (en) 2015-10-08 2019-08-27 Intuit Inc. Method and system for increasing internet traffic to a question and answer customer support system
US10242093B2 (en) 2015-10-29 2019-03-26 Intuit Inc. Method and system for performing a probabilistic topic analysis of search queries for a customer support system
CN107168967B (en) * 2016-03-07 2020-12-04 创新先进技术有限公司 Target knowledge point acquisition method and device
CN107168967A (en) * 2016-03-07 2017-09-15 阿里巴巴集团控股有限公司 The acquisition methods and device of object knowledge point
US11734330B2 (en) 2016-04-08 2023-08-22 Intuit, Inc. Processing unstructured voice of customer feedback for improving content rankings in customer support systems
US10599699B1 (en) 2016-04-08 2020-03-24 Intuit, Inc. Processing unstructured voice of customer feedback for improving content rankings in customer support systems
CN106155522A (en) * 2016-06-29 2016-11-23 上海智臻智能网络科技股份有限公司 Session data process, knowledge base foundation, optimization, exchange method and device
CN106155522B (en) * 2016-06-29 2019-03-29 上海智臻智能网络科技股份有限公司 Session data processing, knowledge base foundation, optimization, exchange method and device
CN106250398B (en) * 2016-07-19 2020-03-27 北京京东尚科信息技术有限公司 Method and device for classifying and judging complaint content of complaint event
CN106250398A (en) * 2016-07-19 2016-12-21 北京京东尚科信息技术有限公司 A kind of complaint classifying content decision method complaining event and device
US10162734B1 (en) 2016-07-20 2018-12-25 Intuit Inc. Method and system for crowdsourcing software quality testing and error detection in a tax return preparation system
US10460398B1 (en) 2016-07-27 2019-10-29 Intuit Inc. Method and system for crowdsourcing the detection of usability issues in a tax return preparation system
US10467541B2 (en) 2016-07-27 2019-11-05 Intuit Inc. Method and system for improving content searching in a question and answer customer support system by using a crowd-machine learning hybrid predictive model
US10445332B2 (en) 2016-09-28 2019-10-15 Intuit Inc. Method and system for providing domain-specific incremental search results with a customer self-service system for a financial management system
US10572954B2 (en) 2016-10-14 2020-02-25 Intuit Inc. Method and system for searching for and navigating to user content and other user experience pages in a financial management system with a customer self-service system for the financial management system
US11403715B2 (en) 2016-10-18 2022-08-02 Intuit Inc. Method and system for providing domain-specific and dynamic type ahead suggestions for search query terms
US10733677B2 (en) 2016-10-18 2020-08-04 Intuit Inc. Method and system for providing domain-specific and dynamic type ahead suggestions for search query terms with a customer self-service system for a tax return preparation system
US10977247B2 (en) 2016-11-21 2021-04-13 International Business Machines Corporation Cognitive online meeting assistant facility
US11423411B2 (en) 2016-12-05 2022-08-23 Intuit Inc. Search results by recency boosting customer support content
US10552843B1 (en) 2016-12-05 2020-02-04 Intuit Inc. Method and system for improving search results by recency boosting customer support content for a customer self-help system associated with one or more financial management systems
CN106844334A (en) * 2016-12-20 2017-06-13 网易(杭州)网络有限公司 Method and apparatus for evaluating and testing session robotic intelligence
US10748157B1 (en) 2017-01-12 2020-08-18 Intuit Inc. Method and system for determining levels of search sophistication for users of a customer self-help system to personalize a content search user experience provided to the users and to increase a likelihood of user satisfaction with the search experience
CN107066541A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 Method and system for processing customer service question-and-answer data
CN107193872A (en) * 2017-04-14 2017-09-22 深圳前海微众银行股份有限公司 Question and answer data processing method and device
CN107229733B (en) * 2017-06-12 2020-01-14 上海智臻智能网络科技股份有限公司 Extended question evaluation method and device
CN107229733A (en) * 2017-06-12 2017-10-03 上海智臻智能网络科技股份有限公司 Extended question evaluation method and device
US10922367B2 (en) 2017-07-14 2021-02-16 Intuit Inc. Method and system for providing real time search preview personalization in data management systems
CN107562856A (en) * 2017-08-28 2018-01-09 深圳追科技有限公司 Self-service customer service system and method
US11093951B1 (en) 2017-09-25 2021-08-17 Intuit Inc. System and method for responding to search queries using customer self-help systems associated with a plurality of data management systems
CN108255943A (en) * 2017-12-12 2018-07-06 百度在线网络技术(北京)有限公司 Human-computer dialogue quality evaluation method, device, computer equipment and storage medium
CN107908803A (en) * 2017-12-26 2018-04-13 上海智臻智能网络科技股份有限公司 Question-answer interaction response method and device, storage medium and terminal
CN107908803B (en) * 2017-12-26 2020-10-27 上海智臻智能网络科技股份有限公司 Question-answer interaction response method and device, storage medium and terminal
US11436642B1 (en) 2018-01-29 2022-09-06 Intuit Inc. Method and system for generating real-time personalized advertisements in data management self-help systems
US11269665B1 (en) 2018-03-28 2022-03-08 Intuit Inc. Method and system for user experience personalization in data management systems using machine learning
CN109102809A (en) * 2018-06-22 2018-12-28 北京光年无限科技有限公司 Dialogue method and system for intelligent robot
CN109241519A (en) * 2018-06-28 2019-01-18 平安科技(深圳)有限公司 Environmental evaluation model acquisition method and device, computer equipment and storage medium
CN109271495A (en) * 2018-08-14 2019-01-25 阿里巴巴集团控股有限公司 Question-answer recognition effect detection method, device, equipment and readable storage medium
CN109271495B (en) * 2018-08-14 2023-02-17 创新先进技术有限公司 Question-answer recognition effect detection method, device, equipment and readable storage medium
CN109472030B (en) * 2018-11-09 2023-11-24 科大讯飞股份有限公司 System reply quality evaluation method and device
CN109472030A (en) * 2018-11-09 2019-03-15 科大讯飞股份有限公司 System reply quality evaluation method and device
CN111722819A (en) * 2019-03-19 2020-09-29 富士施乐株式会社 Information processing apparatus, recording medium, and information processing method
CN110008340A (en) * 2019-03-27 2019-07-12 曲阜师范大学 Multi-source text knowledge representation, acquisition and fusion system
CN110164447A (en) * 2019-04-03 2019-08-23 苏州驰声信息科技有限公司 Spoken language scoring method and device
CN110164447B (en) * 2019-04-03 2021-07-27 苏州驰声信息科技有限公司 Spoken language scoring method and device
WO2021169499A1 (en) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Network bad data monitoring method, apparatus and system, and storage medium
CN111444724A (en) * 2020-03-23 2020-07-24 腾讯科技(深圳)有限公司 Medical question-answer quality testing method and device, computer equipment and storage medium
CN111680135A (en) * 2020-04-20 2020-09-18 重庆兆光科技股份有限公司 Reading comprehension method based on implicit knowledge
CN111680135B (en) * 2020-04-20 2023-08-25 重庆兆光科技股份有限公司 Reading comprehension method based on implicit knowledge
CN111667029A (en) * 2020-07-09 2020-09-15 腾讯科技(深圳)有限公司 Clustering method, device, equipment and storage medium
CN111667029B (en) * 2020-07-09 2023-11-10 腾讯科技(深圳)有限公司 Clustering method, device, equipment and storage medium
CN111967254B (en) * 2020-10-21 2021-04-06 深圳追一科技有限公司 Similar question set scoring method and device, computer equipment and storage medium
CN111967254A (en) * 2020-10-21 2020-11-20 深圳追一科技有限公司 Similar question set scoring method and device, computer equipment and storage medium
US11967253B2 (en) 2021-05-27 2024-04-23 International Business Machines Corporation Semi-automated evaluation of long answer exams
CN116775882A (en) * 2023-06-29 2023-09-19 山东科技大学 Intelligent government affair message processing method and equipment
CN116775882B (en) * 2023-06-29 2024-02-27 山东科技大学 Intelligent government affair message processing method and equipment

Similar Documents

Publication Publication Date Title
CN101520802A (en) Question-answer pair quality evaluation method and system
CN110852087B (en) Chinese error correction method and device, storage medium and electronic device
Panichella et al. How can i improve my app? classifying user reviews for software maintenance and evolution
Liu et al. Review sentiment scoring via a parse-and-paraphrase paradigm
Furlan et al. Semantic similarity of short texts in languages with a deficient natural language processing support
Reganti et al. Modeling satire in English text for automatic detection
CN106202584A (en) Microblog sentiment analysis method based on standard dictionary and semantic rules
CN107688630B (en) Semantic-based weakly supervised microblog multi-emotion dictionary expansion method
CN107526841A (en) Web-based Tibetan text summary generation method
CN107463703A (en) English social media account number classification method based on information gain
CN106446147A (en) Emotion analysis method based on structuring features
CN105335350A (en) Language identification method based on ensemble learning
Tsapatsoulis et al. Feature extraction for tweet classification: Do the humans perform better?
Imperial et al. Developing a machine learning-based grade level classifier for Filipino children’s literature
Persing et al. Lightly-supervised modeling of argument persuasiveness
CN109871429B (en) Short text retrieval method integrating Wikipedia classification and explicit semantic features
Schmid et al. FoSIL-Offensive language classification of German tweets combining SVMs and deep learning techniques.
Sokolova et al. Verbs speak loud: Verb categories in learning polarity and strength of opinions
CN112507115B (en) Method and device for classifying emotion words in barrage text and storage medium
CN113657090A (en) Military news long text layering event extraction method
Nieto Piña Splitting rocks: Learning word sense representations from corpora and lexica
Dankhara A Review of Sentiment Analysis of Tweets
Taslioglu et al. Irony detection on microposts with limited set of features
Talwar et al. Intelligent Classroom System for Qualitative Analysis of Students' Conceptual Understanding
Jiang et al. A novel feature selection based on Tibetan grammar for Tibetan text classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20090902