CN103377245A - Automatic question and answer method and device - Google Patents

Info

Publication number
CN103377245A
Authority
CN
China
Prior art keywords
word
answer
centre
frequency
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101283600A
Other languages
Chinese (zh)
Other versions
CN103377245B (en)
Inventor
路彦雄
贺翔
焦峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201210128360.0A
Publication of CN103377245A
Application granted
Publication of CN103377245B
Legal status: Active
Anticipated expiration

Abstract

An embodiment of the invention discloses an automatic question answering method. The method includes: obtaining relevant existing user question-and-answer data according to a question string input by a user terminal; counting the word frequency of the central words in the abstract part of the existing user question-and-answer data; calculating the word weight of each central word from its word frequency and its pre-computed inverse document frequency, and taking the central word with the largest word weight as the answer word; and determining, according to the answer word, the automatic answer corresponding to the question string. The invention also discloses an automatic question answering device. The method and device do not need to build a knowledge base or restrict the knowledge domain; they realize automatic question answering using only existing user question-and-answer data.

Description

Automatic question answering method and device
Technical field
The present invention relates to the technical field of web search, and in particular to an automatic question answering method and device.
Background
Q&A communities have grown steadily in web search. A Q&A community is an Internet product in which users ask and answer questions; users and data are organized according to these question-and-answer relations and can be searched by users. Answers supplied by users alone cannot fully meet the demand for answers, so most Q&A communities also provide an automatic question answering function, in which a back-end server automatically answers the user's question.
Automatic question answering is currently implemented in two main ways:
1) Within a specific knowledge domain, the user's question is analyzed automatically according to a preset analysis method, and an answer is extracted from existing answers.
2) An answer is matched in a large predefined knowledge base.
The first method, which analyzes the question and extracts an answer within a specific knowledge domain, is limited because it is confined to that domain.
The second method, which matches an answer in a large predefined knowledge base, can solve problems only to the extent allowed by the amount of data stored in the knowledge base in advance; questions outside the scope of the knowledge base cannot be answered automatically.
In short, automatic question answering in the prior art must rely on a specific knowledge domain or a knowledge base; any question that falls outside the knowledge domain or knowledge base cannot be answered automatically.
Summary of the invention
In view of this, the present invention provides an automatic question answering method and device that can realize automatic question answering from the existing user question-and-answer (Q&A) data of a Q&A community.
To achieve this purpose, the technical scheme of the present invention is implemented as follows:
An automatic question answering method, comprising:
obtaining relevant existing user Q&A data according to a question string input by a user terminal;
counting the word frequency of the central words in the abstract part of the existing user Q&A data;
calculating the word weight of each central word from the word frequency of the central word and the pre-computed inverse document frequency of the central word, and determining the central word with the largest word weight as the answer word;
determining, according to the answer word, the automatic answer corresponding to the question string.
Preferably, obtaining relevant existing user Q&A data according to the question string input by the user terminal comprises:
inputting the question string, as a retrieval string, to the search engine of a Q&A community, and obtaining the query results corresponding to the retrieval string, each query result comprising a title part and an abstract part with distinguishing marks.
Preferably, counting the word frequency of the central words in the abstract part of the existing user Q&A data comprises:
counting the central-word frequencies of the abstract part of each query result, one result at a time, until all query results have been counted;
wherein, for each query result, the abstract part is split into sentences at full stops, the word frequency of each central word is counted for each sentence, and the word frequencies of the central words in all sentences are accumulated to obtain the word frequencies of all central words in the abstract.
Preferably, accumulating the word frequencies of the central words in all sentences to obtain the word frequencies of all central words in the abstract comprises:
if a sentence contains a word with a distinguishing mark, accumulating the word frequency of each central word in that sentence at 3 times the standard weight; if a sentence adjacent to it, before or after, contains a word with a distinguishing mark, accumulating the word frequency of each central word in that sentence at 2 times the standard weight; otherwise, accumulating the word frequency of each central word in that sentence at the standard weight, thereby obtaining the weighted word frequency of all central words in that sentence.
Preferably, counting the central-word frequencies of the abstract part of each query result, one result at a time, until all query results have been counted comprises:
comparing the similarity between the title part of each query result and the question string; if the similarity between the title of the current query result and the question string is greater than a preset threshold, performing the step of counting central-word frequencies; otherwise, skipping that step for the current query result.
Preferably, calculating the word weight of each central word comprises:
word weight of a central word = word frequency of the central word × inverse document frequency of the central word.
Preferably, determining the automatic answer corresponding to the question string according to the answer word comprises:
finding, among the abstracts of the query results, the top s abstracts in which the answer word occurs most often, s being an integer greater than or equal to 1;
splitting each of the s abstracts into sentences at full stops, and finding, among these sentences, the sentence that contains the largest number of the answer word and the central words of the user's question string, as the automatic answer corresponding to the question string.
An automatic question answering device, comprising:
a Q&A data acquisition module, configured to obtain relevant existing user Q&A data according to a question string input by a user terminal;
a word frequency statistics module, configured to count the word frequency of the central words in the abstract part of the existing user Q&A data;
an answer word determination module, configured to calculate the word weight of each central word from the word frequency of the central word and the pre-computed inverse document frequency of the central word, and to determine the central word with the largest word weight as the answer word;
an automatic answer determination module, configured to determine, according to the answer word, the automatic answer corresponding to the question string.
Preferably, the Q&A data acquisition module comprises:
a retrieval unit, configured to input the question string, as a retrieval string, to the search engine of a Q&A community;
an acquiring unit, configured to obtain the query results corresponding to the retrieval string, each query result comprising a title part and an abstract part with distinguishing marks.
Preferably, the word frequency statistics module comprises:
a splitting unit, configured to split the abstract part of each query result into sentences at full stops;
a statistics unit, configured to count, for each sentence produced by the splitting unit, the word frequency of each central word in it;
an accumulation unit, configured to accumulate the word frequencies of the central words in all sentences counted by the statistics unit, obtaining the word frequencies of all central words in the abstract;
a control unit, configured to control the splitting unit, the statistics unit, and the accumulation unit so that the central-word frequencies of the abstract part of each query result are counted one result at a time, until all query results have been counted.
Preferably, the accumulation unit comprises:
a mark judging subunit, configured to judge the distinguishing marks in the sentences produced by the splitting unit;
a weight accumulation subunit, configured to accumulate word frequencies according to the judgment of the mark judging subunit: if a sentence contains a word with a distinguishing mark, the word frequency of each central word in that sentence is accumulated at 3 times the standard weight; if a sentence adjacent to it, before or after, contains a word with a distinguishing mark, the word frequency of each central word in that sentence is accumulated at 2 times the standard weight; otherwise, the word frequency of each central word in that sentence is accumulated at the standard weight, thereby obtaining the weighted word frequency of all central words in that sentence.
Preferably, the word frequency statistics module further comprises:
a similarity comparing unit, configured to compare the similarity between the title part of each query result and the question string;
the control unit being further configured to: if the similarity between the title of the current query result and the question string is greater than a preset threshold, control the splitting unit, the statistics unit, and the accumulation unit to perform the step of counting central-word frequencies; otherwise, skip that step for the current query result.
Preferably, the answer word determination module comprises:
a word weight calculation unit, configured to calculate the word weight of each central word according to the formula: word weight of a central word = word frequency of the central word × inverse document frequency of the central word;
an answer word determining unit, configured to determine the central word with the largest word weight as the answer word.
Preferably, the automatic answer determination module comprises:
an abstract acquiring unit, configured to find, among the abstracts of the query results, the top s abstracts in which the answer word occurs most often, s being an integer greater than or equal to 1;
an abstract splitting unit, configured to split each of the s abstracts into sentences at full stops;
an answer determining unit, configured to find, among the sentences produced by the abstract splitting unit, the sentence that contains the largest number of the answer word and the central words of the user's question string, as the automatic answer corresponding to the question string.
As can be seen from the above technical scheme, the automatic question answering method and device of the present invention make full use of the existing user Q&A data of a Q&A community. They need neither a Q&A knowledge base nor a restriction on the knowledge domain of the user's question; instead, they use parameters such as word frequency, inverse document frequency, and text similarity to find, in the existing Q&A data, the answer most relevant to the question posed by the user, and thus answer automatically. In addition, the present invention can be used for semantic expansion of general questions or text strings, for example in classification or search.
Description of drawings
Fig. 1 is a flowchart of the automatic question answering method according to an embodiment of the invention;
Fig. 2 is a structural diagram of the automatic question answering device according to an embodiment of the invention;
Fig. 3 is a structural diagram of the Q&A data acquisition module according to an embodiment of the invention;
Fig. 4 is a structural diagram of the word frequency statistics module according to an embodiment of the invention;
Fig. 5 is a structural diagram of the accumulation unit according to an embodiment of the invention;
Fig. 6 is a structural diagram of the answer word determination module according to an embodiment of the invention;
Fig. 7 is a structural diagram of the automatic answer determination module according to an embodiment of the invention.
Embodiments
To make the purpose, technical scheme, and advantages of the present invention clearer, the present invention is described in more detail below with reference to the accompanying drawings and embodiments.
The present invention uses the existing Q&A data of a Q&A community. A search engine retrieves the Q&A data related to the question string posed by the user; candidate words are selected from these retrieval results according to parameters such as word frequency, inverse document frequency, and the similarity between text segments; the weights of the candidate words are calculated and sorted; the candidate word with the largest weight is taken as the answer word; and the sentence containing the answer word is taken as the automatic answer to the user's question string.
As shown in Fig. 1, the procedure comprises the following steps:
Step 101: obtain relevant existing user Q&A data according to the question string input by the user terminal.
The question string posed by the user terminal (denoted q) is obtained and input, as the retrieval string, to the search engine of the Q&A community, which returns n query results. Each result comprises a title (denoted t_i, i = 1, ..., n) and an abstract with distinguishing marks. A distinguishing mark identifies a word in the abstract that is identical to a word in the question string input by the user terminal; the Q&A community search engine adds the marks when it returns the search results, to draw the user's attention to the matches. The marks are usually rendered in a red font, so an abstract with distinguishing marks is also called a red-marked abstract; the red-marked words are exactly the words of the abstract that also occur in the retrieval string. Depending on the search engine, the query results may use other distinguishing marks; all that matters here is that an abstract with distinguishing marks is obtained, and the specific form of the mark is arbitrary.
These query results are the existing user Q&A data in the Q&A community that relate to the question string input by the user: each title is a question related to the question string q, and each abstract is the corresponding answer.
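The shape of a query result in step 101 can be illustrated with a small sketch. It is only one possible representation: the `QueryResult` class, the `parse_result` helper, and the `<em>...</em>` highlight markup are assumptions made for illustration, since the text only says that matching words carry some distinguishing mark (typically a red font).

```python
import re
from dataclasses import dataclass, field
from typing import Set

# Hypothetical highlight markup: the source only states that matching words
# are marked in some way (for example in red), not how the mark is encoded.
MARK_RE = re.compile(r"<em>(.*?)</em>")


@dataclass
class QueryResult:
    title: str                                      # t_i: a question related to q
    abstract: str                                   # a_i: answer text, markup removed
    marked: Set[str] = field(default_factory=set)   # words carrying the mark


def parse_result(title: str, raw_abstract: str) -> QueryResult:
    """Separate the marked words from the plain abstract text."""
    marked = set(MARK_RE.findall(raw_abstract))
    plain = MARK_RE.sub(r"\1", raw_abstract)
    return QueryResult(title=title, abstract=plain, marked=marked)


result = parse_result(
    "How tall is Everest?",
    "<em>Everest</em> is 8848 m <em>tall</em>.",
)
```

Keeping the marked words in a separate set lets later steps test whether a sentence contains a marked word without re-parsing the markup.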
Step 102: count the word frequency of the central words in the abstract part of the existing user Q&A data.
After the query results, that is, the existing user Q&A data, have been obtained, the n query results are analyzed one by one, and the word frequency of the central words in the abstract part of the existing user Q&A data is calculated, as follows:
Start from the first query result, i.e. i = 1.
First, the similarity between the title part of the query result and the question string q can be compared, so that query results with low similarity are excluded and fewer query results need to be analyzed. If the similarity between the question string q and the title t_i is greater than a preset threshold, the search result is sufficiently related to q and needs to be analyzed; otherwise, processing of this result ends and the next query result is analyzed.
If the similarity between the question string q and the title t_i is greater than the preset threshold, the processing is as follows:
The abstract part a_i of the query result is split at the full stop "." into m sentences (denoted a_{i,j}, j = 1, ..., m). For each sentence a_{i,j}, j = 1, ..., m, the word frequency tf, i.e. the number of occurrences, of each central word in it is counted. Central words are the words that remain after stop words, high-frequency words, and symbols (for example "I") are excluded. The tf values of the central words in all m sentences are accumulated to obtain the tf of every central word in a_i.
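The splitting and counting described above can be sketched as follows. This is a minimal sketch under stated assumptions: the stop-word list is illustrative, tokens are taken as runs of word characters (Chinese text would need a word segmenter instead), and the function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real system would use a much larger list
# that also covers high-frequency words.
STOP_WORDS = {"i", "the", "a", "is", "of"}


def split_sentences(abstract):
    """Split an abstract a_i into sentences a_{i,j} at full stops."""
    return [s.strip() for s in re.split(r"[.。]", abstract) if s.strip()]


def central_words(sentence):
    """Return the tokens of a sentence that are not stop words or symbols."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def abstract_tf(abstract):
    """Accumulate the per-sentence counts into the tf of the whole abstract."""
    tf = Counter()
    for sentence in split_sentences(abstract):
        tf.update(central_words(sentence))
    return tf


tf = abstract_tf("The capital of France is Paris. Paris is large.")
```

Splitting on every full stop also breaks decimal numbers such as "3.14" apart; the sketch accepts that for simplicity.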
Central words that occur near distinguishing marks in the abstract are more strongly related to the question string q. To reflect these differences in relatedness and obtain a more accurate and reasonable tf, a weighted calculation can be used when counting tf. For example: if the sentence a_{i,j} contains a word with a distinguishing mark, the tf of each central word in a_{i,j} is accumulated at 3 times the standard weight; if an adjacent sentence before or after it (a_{i,j-1} or a_{i,j+1}) contains a word with a distinguishing mark, the tf of each central word in a_{i,j} is accumulated at 2 times the standard weight; otherwise, the tf of each central word in a_{i,j} is accumulated at the standard weight. This yields the weighted word frequency of each central word in a_{i,j}.
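The 3x / 2x / 1x weighting can be sketched as below. The function names are hypothetical, the standard weight is taken to be 1, and the sketch assumes that each sentence has already been reduced to its list of central words and that marked words survive that filtering.

```python
from collections import Counter


def sentence_weight(has_mark, j):
    """Weight for sentence j: 3 if it holds a marked word, 2 if a neighbor does, else 1."""
    if has_mark[j]:
        return 3
    prev_marked = j > 0 and has_mark[j - 1]
    next_marked = j + 1 < len(has_mark) and has_mark[j + 1]
    return 2 if (prev_marked or next_marked) else 1


def weighted_tf(sentences, marked):
    """sentences: list of central-word lists; marked: set of marked words."""
    has_mark = [any(w in marked for w in words) for words in sentences]
    tf = Counter()
    for j, words in enumerate(sentences):
        weight = sentence_weight(has_mark, j)
        for w in words:
            tf[w] += weight  # each occurrence counts `weight` times
    return tf


sentences = [
    ["everest", "nepal"],     # contains the marked word: weight 3
    ["height", "8848"],       # adjacent to a marked sentence: weight 2
    ["climbers", "spring"],   # no marked neighbor: weight 1
]
wtf = weighted_tf(sentences, marked={"everest"})
```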
When the word frequency statistics are complete, or when the similarity between the question string q and the title t_i is less than or equal to the preset threshold, analysis of this query result ends and the next query result is processed: set i = i + 1 and repeat the above processing until all n query results have been handled. The similarity between the question string q and the title t_i can be calculated with any existing algorithm for the similarity between two texts, for example term proximity scoring.
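The threshold filter over the n query results can be sketched as follows. The text allows any text-similarity algorithm; this sketch substitutes a simple Jaccard overlap of word sets for term proximity scoring, and the function names and the threshold value 0.3 are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of the word sets of two texts (a stand-in similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_results(question, results, threshold=0.3, similarity=jaccard):
    """Keep only (title, abstract) pairs whose title similarity exceeds the threshold."""
    return [
        (title, abstract)
        for title, abstract in results
        if similarity(question, title) > threshold  # strictly greater, as in the text
    ]


results = [
    ("how tall is everest", "Everest is 8848 m tall."),
    ("best pizza in rome", "Try the places near the river."),
]
kept = select_results("how tall is mount everest", results)
```

Passing the similarity function as a parameter keeps the filter independent of the particular measure, which matches the statement that any existing algorithm can be used.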
Step 103: from the tf of each central word counted in the above process and the pre-computed inverse document frequency (idf) of each central word, calculate the word weight W of every central word, where W = tf * idf. Sort the word weights W of the central words in descending order, and determine the central word with the largest W as the answer word.
Here the inverse document frequency is the reciprocal of the document frequency, and the document frequency is the number of documents in which a given word occurs. It can be computed in advance from text collected on the Internet. The collection scope is arbitrary: text can be collected from specific websites or communities, or directly from the Q&A community that provides the automatic question answering.
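Step 103 can be sketched as follows, taking idf as the plain reciprocal of the document frequency, as defined above. The function names are hypothetical, and treating a word that never appears in the collected corpus as having document frequency 1 is an assumption of the sketch, not something the text specifies.

```python
from collections import Counter


def document_frequency(corpus):
    """corpus: iterable of token lists; returns the number of documents containing each word."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each word at most once per document
    return df


def rank_central_words(tf, df, default_df=1):
    """Return (word, W) pairs sorted by W = tf * idf, with idf = 1 / df."""
    weights = {w: f / df.get(w, default_df) for w, f in tf.items()}
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


def answer_word(tf, df):
    """The central word with the largest weight, or None if there are no central words."""
    ranked = rank_central_words(tf, df)
    return ranked[0][0] if ranked else None


corpus = [["paris", "city"], ["city", "big"], ["city", "river"], ["paris", "river"]]
df = document_frequency(corpus)
tf = {"paris": 6, "city": 6, "river": 2}
best = answer_word(tf, df)
```

In this example "paris" and "city" have the same tf, but "city" appears in more documents, so its weight is lower.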
Step 104: determine, according to the answer word thus determined, the automatic answer corresponding to the question string input by the user terminal.
Specifically: among the abstracts of the n query results, find the top s abstracts in which the answer word occurs most often (s is any integer greater than or equal to 1, for example 2); split each of these s abstracts at the full stop "." into sentences; then, among these sentences, find the sentence that contains the largest number of the answer word and the central words of the user's question string q, and take it as the automatic answer. Alternatively, a sentence that contains the answer word, or an abstract that contains the answer word, can be taken directly as the automatic answer.
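The answer selection of step 104 can be sketched as below. The function names are hypothetical, and two points are assumptions of the sketch: tokens are runs of word characters, and a sentence is scored by how many distinct target words (the answer word plus the central words of q) it contains, since the text does not say whether repeated occurrences count.

```python
import re


def tokens(text):
    return re.findall(r"\w+", text.lower())


def pick_answer(abstracts, answer, question_words, s=2):
    """Pick the best sentence from the s abstracts that mention the answer word most."""
    top = sorted(abstracts, key=lambda a: tokens(a).count(answer), reverse=True)[:s]
    targets = set(question_words) | {answer}
    best, best_score = None, -1
    for abstract in top:
        for sentence in re.split(r"[.。]", abstract):
            sentence = sentence.strip()
            if not sentence:
                continue
            score = len(targets & set(tokens(sentence)))  # distinct target words present
            if score > best_score:
                best, best_score = sentence, score
    return best


abstracts = [
    "Paris is the capital of France. It is a large city.",
    "France has many cities. Paris is famous.",
    "Rome is the capital of Italy.",
]
answer = pick_answer(abstracts, "paris", {"capital", "france"}, s=2)
```

Ties are resolved in favor of the earlier sentence, which is another choice the text leaves open.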
In addition, the present invention provides an automatic question answering device. As shown in Fig. 2, the device comprises:
a Q&A data acquisition module 201, configured to obtain relevant existing user Q&A data according to the question string input by the user;
a word frequency statistics module 202, configured to count the word frequency of the central words in the abstract part of the existing user Q&A data;
an answer word determination module 203, configured to calculate the word weight of each central word from the word frequency of the central word and the pre-computed inverse document frequency of the central word, and to determine the central word with the largest word weight as the answer word;
an automatic answer determination module 204, configured to determine the automatic answer according to the answer word.
The structure of the Q&A data acquisition module 201 is shown in Fig. 3 and comprises:
a retrieval unit 301, configured to input the question string, as a retrieval string, to the search engine of a Q&A community;
an acquiring unit 302, configured to obtain the query results corresponding to the retrieval string, each query result comprising a title part and an abstract part with distinguishing marks.
The word frequency statistics module 202 is shown in Fig. 4 and comprises:
a splitting unit 401, configured to split the abstract part of each query result into sentences at full stops;
a statistics unit 402, configured to count, for each sentence produced by the splitting unit 401, the word frequency of each central word in it;
an accumulation unit 403, configured to accumulate the word frequencies of the central words in all sentences counted by the statistics unit 402, obtaining the word frequencies of all central words in the abstract;
a control unit 404, configured to control the splitting unit 401, the statistics unit 402, and the accumulation unit 403 so that the central-word frequencies of the abstract part of each query result are counted one result at a time, until all query results have been counted.
The accumulation unit 403 is shown in Fig. 5 and comprises:
a mark judging subunit 501, configured to judge the distinguishing marks in the sentences produced by the splitting unit 401;
a weight accumulation subunit 502, configured to accumulate word frequencies according to the judgment of the mark judging subunit 501: if a sentence contains a word with a distinguishing mark, the word frequency of each central word in that sentence is accumulated at 3 times the standard weight; if a sentence adjacent to it, before or after, contains a word with a distinguishing mark, the word frequency of each central word in that sentence is accumulated at 2 times the standard weight; otherwise, the word frequency of each central word in that sentence is accumulated at the standard weight, thereby obtaining the weighted word frequency of all central words in that sentence.
As shown in Fig. 4, in another embodiment the word frequency statistics module 202 may further comprise:
a similarity comparing unit 405, configured to compare the similarity between the title part of each query result and the question string;
the control unit 404 being further configured to: if the similarity between the title of the current query result and the question string is greater than a preset threshold, control the splitting unit, the statistics unit, and the accumulation unit to perform the step of counting central-word frequencies; otherwise, skip that step for the current query result.
The answer word determination module 203 is shown in Fig. 6 and comprises:
a word weight calculation unit 601, configured to calculate the word weight of each central word according to the formula: word weight of a central word = word frequency of the central word × inverse document frequency of the central word;
an answer word determining unit 602, configured to determine the central word with the largest word weight as the answer word.
The automatic answer determination module 204 is shown in Fig. 7 and comprises:
an abstract acquiring unit 701, configured to find, among the abstracts of the query results, the top s abstracts in which the answer word occurs most often, s being an integer greater than or equal to 1;
an abstract splitting unit 702, configured to split each of the s abstracts into sentences at full stops;
an answer determining unit 703, configured to find, among the sentences produced by the abstract splitting unit 702, the sentence that contains the largest number of the answer word and the central words of the user's question string, as the automatic answer corresponding to the question string.
As can be seen from the above embodiments, the automatic question answering method and device of the present invention make full use of the existing user Q&A data of a Q&A community. They need neither a Q&A knowledge base nor a restriction on the knowledge domain of the user's question; instead, they use parameters such as word frequency, inverse document frequency, and text similarity to find, in the existing Q&A data, the answer most relevant to the question posed by the user, and thus answer automatically. In addition, the present invention can be used for semantic expansion of general questions or text strings, for example in classification or search.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (14)

1. automatic question-answering method is characterized in that the method comprises:
Problem string according to the user terminal input obtains relevant existing user's question and answer data;
Add up the word frequency of the summary centre word partly of described existing user's question and answer data;
According to the inverse document frequency of the word frequency of described each centre word and described each centre word of counting in advance, calculate the word weight of described each centre word, the centre word of word weight maximum is defined as the answer word;
Determine the answer of the automatic question answering that described problem string is corresponding according to described answer word.
2. automatic question-answering method as claimed in claim 1 is characterized in that, described problem string according to the user terminal input obtains relevant existing user's question and answer data, comprising:
Described problem string as retrieval string, is input to the search engine of Ask-Answer Community, obtains the Query Result corresponding with described retrieval string, every Query Result comprises title division and with the summary part of distinctive mark.
3. automatic question-answering method as claimed in claim 2 is characterized in that, adds up the word frequency of the summary centre word partly of described existing user's question and answer data, comprising:
Add up one by one the centre word word frequency of the summary part of each bar Query Result, finish until all Query Results are all added up;
Wherein, for each bar Query Result, its summary part take the fullstop cutting as sentence, for each sentence statistics word frequency of each centre word wherein, is added up the word frequency of all centre words in obtaining making a summary with the word frequency of the centre word in all sentences.
4. automatic question-answering method as claimed in claim 3 is characterized in that, described word frequency with the centre word in all sentences adds up, and the word frequency of all centre words in obtaining making a summary comprises:
If the word with distinctive mark is arranged in the sentence, then the word frequency of each centre word is cumulative by 3 times of standard weights in this sentence; If the word with distinctive mark is arranged in the adjacent sentence before or after this sentence, then the word frequency of each centre word is cumulative by 2 times of standard weights in this sentence; Otherwise the word frequency of each centre word is cumulative by the standard weight in this sentence, thereby obtains the Weighted Term Frequency of all centre words in this sentence.
5. automatic question-answering method as claimed in claim 3 is characterized in that, described centre word word frequency of adding up one by one the summary part of each bar Query Result is finished until all Query Results are all added up, and comprising:
Compare the title division of each bar Query Result and the similarity between the described problem string, if the similarity of the title of current Query Result and described problem string is greater than default threshold value, then carry out the step of described statistics centre word word frequency, otherwise skip the step of the statistics centre word word frequency of current Query Result.
6. automatic question-answering method as claimed in claim 1 is characterized in that, the word weight of described each centre word of calculating comprises:
The inverse document frequency of the word frequency of the word weight of centre word=this centre word * this centre word.
7. The automatic question and answer method of claim 1, characterized in that determining, according to the answer word, the automatic question and answer answer corresponding to the question string comprises:
Finding, among the abstracts of the query results, the top s abstracts in which the answer word occurs most often, s being an integer greater than or equal to 1;
Splitting each of the s abstracts into sentences at full stops; among those sentences, finding the sentence in which the answer word and the central words of the user's question string occur most often, as the automatic question and answer answer corresponding to the question string.
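The selection steps of claim 7 can be sketched as below. Counting occurrences by plain substring matching, splitting on both the ASCII and the Chinese full stop, and the name `select_answer` are assumptions; a real system would more likely count segmented words.

```python
import re

def select_answer(abstracts, answer_word, question_words, s=1):
    """Pick the sentence that best covers the answer word and question words.

    abstracts: list of abstract strings from the query results.
    Occurrences are counted by substring matching (an assumption).
    Returns None if no sentence is available.
    """
    # Top s abstracts by number of occurrences of the answer word.
    top = sorted(abstracts, key=lambda a: a.count(answer_word), reverse=True)[:s]
    # Split each selected abstract into sentences at full stops.
    sentences = [p.strip() for a in top
                 for p in re.split(r"[.。]", a) if p.strip()]
    targets = set(question_words) | {answer_word}

    def score(sentence):
        return sum(sentence.count(t) for t in targets)

    return max(sentences, key=score) if sentences else None
```

For example, with the answer word "Paris" and question words "capital" and "France", a sentence containing all three outranks one that mentions only "Paris".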
8. An automatic question and answer device, characterized in that the device comprises:
A question and answer data acquisition module, configured to obtain relevant existing user question and answer data according to a question string input by a user terminal;
A word frequency statistics module, configured to count the word frequencies of the central words in the abstract part of the existing user question and answer data;
An answer word determination module, configured to calculate the word weight of each central word according to the word frequency of each central word and the pre-calculated inverse document frequency of each central word, and to determine the central word with the largest word weight as the answer word;
An automatic question and answer answer determination module, configured to determine, according to the answer word, the automatic question and answer answer corresponding to the question string.
9. The automatic question and answer device of claim 8, characterized in that the question and answer data acquisition module comprises:
A retrieval unit, configured to input the question string, as a search string, into the search engine of a question and answer community;
An acquiring unit, configured to obtain the query results corresponding to the search string, each query result comprising a title part and an abstract part carrying distinctive marks.
10. The automatic question and answer device of claim 8, characterized in that the word frequency statistics module comprises:
A segmentation unit, configured to split the abstract part of each query result into sentences at full stops;
A statistics unit, configured to count, for each sentence produced by the segmentation unit, the word frequency of each central word in that sentence;
An accumulation unit, configured to accumulate the word frequencies of the central words in all sentences counted by the statistics unit, obtaining the word frequencies of all central words in the abstract;
A control module, configured to control the segmentation unit, the statistics unit and the accumulation unit so as to count, one query result at a time, the central-word frequencies of the abstract part of each query result until all query results have been counted.
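The cooperation of the segmentation, statistics and accumulation units in claim 10 can be sketched as a small pipeline. How central words are identified is not stated in this claim, so the sketch assumes a given vocabulary set `central_vocab` and whitespace tokenization (Chinese text would need a word segmenter instead); the function names are likewise hypothetical.

```python
from collections import Counter
import re

def count_central_words(abstract, central_vocab):
    """Split one abstract at full stops and count central words per sentence."""
    totals = Counter()
    for sentence in re.split(r"[.。]", abstract):       # segmentation unit
        tokens = sentence.split()                        # assumed tokenizer
        totals.update(t for t in tokens if t in central_vocab)  # statistics + accumulation
    return totals

def count_all(abstracts, central_vocab):
    """Control loop: process query results one by one until all are counted."""
    totals = Counter()
    for abstract in abstracts:
        totals += count_central_words(abstract, central_vocab)
    return totals
```

This version uses unweighted counts; the weighting by distinctive marks described in claim 11 would replace the simple `update` step.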
11. The automatic question and answer device of claim 10, characterized in that the accumulation unit comprises:
A mark judgment subunit, configured to determine whether the sentences produced by the segmentation unit contain distinctive marks;
A weight accumulation subunit, configured to accumulate word frequencies according to the determination of the mark judgment subunit: if a sentence contains a word with a distinctive mark, the word frequency of each central word in that sentence is accumulated at 3 times the standard weight; if the sentence immediately before or after that sentence contains a word with a distinctive mark, the word frequency of each central word in that sentence is accumulated at 2 times the standard weight; otherwise, the word frequency of each central word in that sentence is accumulated at the standard weight, thereby obtaining the weighted word frequencies of all central words in that sentence.
12. The automatic question and answer device of claim 10, characterized in that the word frequency statistics module further comprises:
A similarity comparing unit, configured to compare the title part of each query result with the question string for similarity;
The control module is further configured to: if the similarity between the title of the current query result and the question string is greater than a preset threshold, control the segmentation unit, the statistics unit and the accumulation unit to perform the step of counting central-word frequencies; otherwise, skip the step of counting central-word frequencies for the current query result.
13. The automatic question and answer device of claim 8, characterized in that the answer word determination module comprises:
A word weight calculation unit, configured to calculate the word weight of each central word according to the formula: word weight of a central word = word frequency of that central word × inverse document frequency of that central word;
An answer word determining unit, configured to determine the central word with the largest word weight as the answer word.
14. The automatic question and answer device of claim 8, characterized in that the automatic question and answer answer determination module comprises:
An abstract acquiring unit, configured to find, among the abstracts of the query results, the top s abstracts in which the answer word occurs most often, s being an integer greater than or equal to 1;
An abstract segmentation unit, configured to split each of the s abstracts into sentences at full stops;
An answer determining unit, configured to find, among the sentences produced by the abstract segmentation unit, the sentence in which the answer word and the central words of the user's question string occur most often, as the automatic question and answer answer corresponding to the question string.
CN201210128360.0A 2012-04-27 2012-04-27 Automatic question and answer method and device Active CN103377245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210128360.0A CN103377245B (en) 2012-04-27 2012-04-27 A kind of automatic question-answering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210128360.0A CN103377245B (en) 2012-04-27 2012-04-27 A kind of automatic question-answering method and device

Publications (2)

Publication Number Publication Date
CN103377245A true CN103377245A (en) 2013-10-30
CN103377245B CN103377245B (en) 2018-09-11

Family

ID=49462371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210128360.0A Active CN103377245B (en) 2012-04-27 2012-04-27 Automatic question and answer method and device

Country Status (1)

Country Link
CN (1) CN103377245B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050086045A1 (en) * 2003-10-17 2005-04-21 National Institute Of Information And Communications Technology Question answering system and question answering processing method
CN101071424A (en) * 2006-06-23 2007-11-14 腾讯科技(深圳)有限公司 Personalized information push system and method
CN1928864A (en) * 2006-09-22 2007-03-14 浙江大学 FAQ based Chinese natural language ask and answer method
CN101315624A (en) * 2007-05-29 2008-12-03 阿里巴巴集团控股有限公司 Text subject recommending method and device
CN101174273A (en) * 2007-12-04 2008-05-07 清华大学 News event detecting method based on metadata analysis
CN101520802A (en) * 2009-04-13 2009-09-02 腾讯科技(深圳)有限公司 Question-answer pair quality evaluation method and system
CN101593206A (en) * 2009-06-25 2009-12-02 腾讯科技(深圳)有限公司 Searching method and device based on answer in the question and answer interaction platform

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104375977A (en) * 2013-08-14 2015-02-25 腾讯科技(深圳)有限公司 Answer message processing method and device for question-answer communities
CN104375977B (en) * 2013-08-14 2018-11-23 腾讯科技(深圳)有限公司 The processing method and processing device of reply message in Ask-Answer Community
CN104933097B (en) * 2015-05-27 2019-04-16 百度在线网络技术(北京)有限公司 A kind of data processing method and device for retrieval
CN104933097A (en) * 2015-05-27 2015-09-23 百度在线网络技术(北京)有限公司 Data processing method and device for retrieval
WO2017071474A1 (en) * 2015-10-27 2017-05-04 中兴通讯股份有限公司 Method and device for processing language data items and method and device for analyzing language data items
CN105893476A (en) * 2016-03-29 2016-08-24 上海智臻智能网络科技股份有限公司 Intelligent questioning and answering method, knowledge base optimization method and device, and intelligent knowledge base
CN105893535A (en) * 2016-03-31 2016-08-24 上海智臻智能网络科技股份有限公司 Intelligent question and answer method, knowledge base optimizing method and device and intelligent knowledge base
CN108073664A (en) * 2016-11-11 2018-05-25 北京搜狗科技发展有限公司 A kind of information processing method, device, equipment and client device
CN108073664B (en) * 2016-11-11 2021-08-31 北京搜狗科技发展有限公司 Information processing method, device, equipment and client equipment
CN108256056A (en) * 2018-01-12 2018-07-06 广州杰赛科技股份有限公司 Intelligent answer method and system
CN108306864A (en) * 2018-01-12 2018-07-20 深圳壹账通智能科技有限公司 Network data detection method, device, computer equipment and storage medium
CN109002434A (en) * 2018-05-31 2018-12-14 青岛理工大学 Customer service question and answer matching process, server and storage medium
CN110096567A (en) * 2019-03-14 2019-08-06 中国科学院自动化研究所 Selection method, system are replied in more wheels dialogue based on QA Analysis of Knowledge Bases Reasoning
CN112101005A (en) * 2020-04-02 2020-12-18 上海迷因网络科技有限公司 Method for generating and dynamically adjusting quick expressive force test questions
CN112101005B (en) * 2020-04-02 2022-08-30 上海迷因网络科技有限公司 Method for generating and dynamically adjusting quick expressive force test questions

Also Published As

Publication number Publication date
CN103377245B (en) 2018-09-11

Similar Documents

Publication Publication Date Title
CN103377245A (en) Automatic question and answer method and device
CN105260359B (en) Semantic key words extracting method and device
JP5540079B2 (en) Knowledge base construction method and apparatus
KR101536520B1 (en) Method and server for extracting topic and evaluating compatibility of the extracted topic
CN106156372B (en) A kind of classification method and device of internet site
US10019492B2 (en) Stop word identification method and apparatus
CN103336766A (en) Short text garbage identification and modeling method and device
KR20110115542A (en) Method for calculating semantic similarities between messages and conversations based on enhanced entity extraction
KR20150036117A (en) Query expansion
CN102622375A (en) Intelligent matching system and method for third-party lawyer recommendations
CN106897290B (en) Method and device for establishing keyword model
CN105512333A (en) Product comment theme searching method based on emotional tendency
CN112149422B (en) Dynamic enterprise news monitoring method based on natural language
CN107330057A (en) A kind of ElasticSearch search relevances algorithm optimization method and system
CN106168968B (en) Website classification method and device
CN113076735A (en) Target information acquisition method and device and server
CN106202312B (en) A kind of interest point search method and system for mobile Internet
CN108875050B (en) Text-oriented digital evidence-obtaining analysis method and device and computer readable medium
CN106909534A (en) A kind of method and device for differentiating text-safe
CN104281710A (en) Network data excavation method
CN108650145A (en) Phone number characteristic automatic extraction method under a kind of home broadband WiFi
CN102722526B (en) Part-of-speech classification statistics-based duplicate webpage and approximate webpage identification method
CN105512270B (en) Method and device for determining related objects
KR20160034471A (en) Method For Retrieving Regional Real-time Hot Issue Using SNS and SMS And System Thereof
CN100593783C (en) Method, system and device for acquiring appraisement of vocabulary semanteme

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131021

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20131021

Address after: 518057 Tencent Building, 16, Nanshan District hi tech park, Guangdong, Shenzhen

Applicant after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Applicant before: Tencent Technology (Shenzhen) Co., Ltd.

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant