CN102789466A - Question title quality judgment method and device and question guiding method and device - Google Patents

Question title quality judgment method and device and question guiding method and device

Info

Publication number
CN102789466A
Authority
CN
China
Prior art keywords
title
correlation
candidate
enquirement
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011101311697A
Other languages
Chinese (zh)
Other versions
CN102789466B (en)
Inventor
陈庆轩
李连华
杨小光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110131169.7A priority Critical patent/CN102789466B/en
Publication of CN102789466A publication Critical patent/CN102789466A/en
Application granted granted Critical
Publication of CN102789466B publication Critical patent/CN102789466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a question title quality judgment method and device and a question guiding method and device. The question title quality judgment method includes: A, obtaining the title of a question; and B, analyzing the title by combining syntactic structure and text content so as to determine the quality of the title. The question guiding method includes: a, retrieving with the question title so as to obtain candidate titles; b, filtering the candidate titles so as to obtain candidate guiding titles; c, calculating the degree of correlation between the question title and the candidate guiding titles, and obtaining guiding titles according to the degree of correlation; and d, showing the guiding titles to the user. The load on the database is lightened, and the efficiency with which the database is used is improved.

Description

A question title quality judgment method, a question guiding method, and devices thereof
[technical field]
The present invention relates to the field of search engines, and in particular to a question title quality judgment method, a question guiding method, and devices thereof.
[background art]
With the widespread adoption of Web 2.0, people increasingly use the network to obtain the knowledge and information they want, and a large number of knowledge interaction communities and knowledge question-and-answer platforms have emerged on the Internet. A knowledge interaction community or knowledge question-and-answer platform is a place where a user can both ask questions and answer questions asked by other users.
Knowledge interaction communities and question-and-answer platforms hold a large amount of question-and-answer data, so making it convenient for users to answer or retrieve this data is an important problem. Every question on these platforms has a title, and the quality of the title directly affects retrieval and answering; controlling the quality of the titles of user questions is therefore very important work. The prior art usually addresses this with simple strategies, for example limiting the length of the title so that it cannot be shorter than a threshold, or filtering out characters in the title that lack actual meaning. These approaches cannot judge how well a title expresses its meaning: a title may exceed the required length and still leave the reader unable to tell what is being asked, and the simple strategies of the prior art cannot identify such unclear titles. In addition, the prior art provides no method for guiding a question that has such a low-quality title.
[summary of the invention]
The technical problem to be solved by the present invention is to provide a question title quality judgment method, a question guiding method, and devices thereof. They address the problem that the prior art cannot clearly identify titles with poor expressive power among the questions users ask on a knowledge question-and-answer platform and therefore cannot control the quality of question titles well, so that a large amount of data unfavorable to retrieval enters the database and increases its burden.
The technical solution adopted by the present invention provides a question title quality judgment method, comprising: A. obtaining the title of a question; B. analyzing the title by combining syntactic structure and text content, to determine the quality of the title.
According to a preferred embodiment of the present invention, step B comprises: B11. performing matching verification on the title using a question template that combines keywords with a syntactic structure; B12. calculating the number of meaning-bearing words contained in a title that passes the verification, and determining that the title is a high-quality title when the number is greater than a first threshold.
According to a preferred embodiment of the present invention, step B comprises: B21. performing matching verification on the title using an interrogative word list; B22. calculating the effective length of a title that passes the verification and the number of content words it contains, and determining that the title is a high-quality title when the effective length is greater than a second threshold and the number of content words is greater than a third threshold.
According to a preferred embodiment of the present invention, step B comprises: B31. performing matching verification on the title using interrogative rules, wherein an interrogative rule comprises at least one of three kinds of restriction: vocabulary, part of speech, or position; B32. calculating the effective length of a title that passes the verification and the number of content words it contains, and determining that the title is a high-quality title when the effective length is greater than a fourth threshold and the number of content words is greater than a fifth threshold.
According to a preferred embodiment of the present invention, step B further comprises: B41. when the title cannot be determined to be a high-quality title, performing semantic analysis on the title to obtain the topic of the title; B42. performing matching verification on the topic using a classified information catalog, and judging the quality of the title according to the level of the catalog that the topic matches.
The present invention also provides a question guiding method, comprising: a. retrieving with the title of a question, to obtain candidate titles; b. filtering the candidate titles to obtain candidate guiding titles, wherein the filtering comprises judging the quality of the candidate titles with the question title quality judgment method and filtering out every candidate title that is not judged to be a high-quality title; c. calculating the degree of correlation between the title of the question and each candidate guiding title, and obtaining guiding titles according to the degree of correlation; d. showing the guiding titles to the user, to guide the user's question.
According to a preferred embodiment of the present invention, the degree of correlation comprises a first degree of correlation and a second degree of correlation. The first degree of correlation is the ratio of the number of words shared by the title of the question and the candidate guiding title to the number of words in the title of the question; the second degree of correlation is the ratio of the number of shared words to the number of words in the candidate guiding title.
According to a preferred embodiment of the present invention, in step c, when the first degree of correlation and the second degree of correlation are both greater than a sixth threshold, the candidate guiding title is selected as a guiding title.
According to a preferred embodiment of the present invention, the method further comprises, before step d: e1. when the output of step c is zero, performing semantic analysis on the title of the question to obtain the topic of the title; e2. performing matching verification on the topic using the classified information catalog, and extracting a preset number of question titles from the database of the matched catalog level as the guiding titles.
The present invention also provides a question title quality judgment device, comprising: an input unit, configured to obtain the title of a question; and a quality judgment unit, configured to analyze the title by combining syntactic structure and text content, to determine the quality of the title.
According to a preferred embodiment of the present invention, the quality judgment unit comprises: a question template verification unit, configured to perform matching verification on the title using a question template that combines keywords with a syntactic structure; and a first determination unit, configured to calculate the number of meaning-bearing words contained in a title that passes the verification and to determine that the title is a high-quality title when the number is greater than a first threshold.
According to a preferred embodiment of the present invention, the quality judgment unit comprises: an interrogative word verification unit, configured to perform matching verification on the title using an interrogative word list; and a second determination unit, configured to calculate the effective length of a title that passes the verification and the number of content words it contains, and to determine that the title is a high-quality title when the effective length is greater than a second threshold and the number of content words is greater than a third threshold.
According to a preferred embodiment of the present invention, the quality judgment unit comprises: an interrogative rule verification unit, configured to perform matching verification on the title using interrogative rules, wherein an interrogative rule comprises at least one of three kinds of restriction: vocabulary, part of speech, or position; and a third determination unit, configured to calculate the effective length of a title that passes the verification and the number of content words it contains, and to determine that the title is a high-quality title when the effective length is greater than a fourth threshold and the number of content words is greater than a fifth threshold.
According to a preferred embodiment of the present invention, the quality judgment unit further comprises: a first semantic analysis unit, configured to perform semantic analysis on the title when the title cannot be determined to be a high-quality title, to obtain the topic of the title; and a fourth determination unit, configured to perform matching verification on the topic using a classified information catalog and to judge the quality of the title according to the level of the catalog that the topic matches.
The present invention also provides a question guiding device, comprising: a retrieval unit, configured to retrieve with the title of a question, to obtain candidate titles; a filtering unit, configured to filter the candidate titles to obtain candidate guiding titles, wherein the filtering comprises judging the quality of the candidate titles with the question title quality judgment device and filtering out every candidate title that is not judged to be a high-quality title; a correlation calculation unit, configured to calculate the degree of correlation between the title of the question and each candidate guiding title and to obtain guiding titles according to the degree of correlation; and a display unit, configured to show the guiding titles to the user, to guide the user's question.
According to a preferred embodiment of the present invention, the degree of correlation comprises a first degree of correlation and a second degree of correlation. The first degree of correlation is the ratio of the number of words shared by the title of the question and the candidate guiding title to the number of words in the title of the question; the second degree of correlation is the ratio of the number of shared words to the number of words in the candidate guiding title.
According to a preferred embodiment of the present invention, when the first degree of correlation and the second degree of correlation are both greater than a sixth threshold, the correlation calculation unit selects the candidate guiding title as a guiding title.
According to a preferred embodiment of the present invention, the device further comprises: a second semantic analysis unit, configured to perform semantic analysis on the title of the question to obtain the topic of the title when the output of the correlation calculation unit is zero; and an extraction unit, configured to perform matching verification on the topic using the classified information catalog and to extract a preset number of question titles from the database of the matched catalog level as the guiding titles.
As can be seen from the above technical solutions, analyzing the title of a question by combining syntactic structure and text content makes it possible to identify unclear titles among user questions. Combined with guiding the question, this effectively improves the quality of the question titles in the database, which helps users retrieve or answer questions, frees the storage space occupied by invalid questions, lightens the burden on the database, and improves the efficiency with which the database is used.
[description of the drawings]
Fig. 1 is a schematic flowchart of the question title quality judgment method in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another embodiment of the question title quality judgment method of the present invention;
Fig. 3 is a schematic flowchart of the method for establishing interrogative rules in an embodiment of the present invention;
Fig. 4 is a schematic flowchart of the question guiding method in an embodiment of the present invention;
Fig. 5 is a schematic structural block diagram of the question title quality judgment device in an embodiment of the present invention;
Fig. 6 is a schematic structural block diagram of the question guiding device in an embodiment of the present invention.
[embodiments]
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described below with reference to the drawings and specific embodiments.
Please refer to Fig. 1, which is a schematic flowchart of the question title quality judgment method in an embodiment of the present invention. As shown in Fig. 1, the method 100 comprises:
Step 101: obtain the title of a question;
Step 102: analyze the title by combining syntactic structure and text content, to determine the quality of the title.
The method is described in detail below with reference to a specific embodiment.
Please refer to Fig. 1 and Fig. 2 together. Fig. 2 is a schematic flowchart of another embodiment of the question title quality judgment method of the present invention.
As shown in Fig. 1 and Fig. 2, step S101 corresponds to step 101. In step S101 the title of a question is obtained; obtaining the title is the basis for the subsequent processing. Because the present invention can be applied offline, to judge the quality of question titles already in a database, and also online, to judge the quality of the title of a question as a user enters it, step S101 does not limit the source of the title.
Steps S1021 to S1028 correspond to step 102. In this embodiment, analyzing the title by combining syntactic structure and text content is implemented by four processing logics: question template matching, interrogative word matching, interrogative rule matching, and classified information matching. For the title of a user's question, as long as any one of these four processing logics identifies it as a high-quality title, the title is considered clear in meaning; otherwise the title is a low-quality title, that is, a title whose meaning is unclear.
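The "any one of four logics is enough" control flow described above can be sketched as a short-circuit cascade. This is only an illustrative sketch: `judge_title` and `toy_interrogative_check` are hypothetical names, and the toy check is a stand-in, not one of the four logics of the embodiment.

```python
from typing import Callable, Iterable

# A check takes a title and returns True if it identifies the title as high-quality.
Check = Callable[[str], bool]


def judge_title(title: str, checks: Iterable[Check]) -> str:
    """Return "high" as soon as any check accepts the title, otherwise "low"."""
    for check in checks:
        if check(title):
            return "high"
    return "low"


def toy_interrogative_check(title: str) -> bool:
    """Hypothetical stand-in: an interrogative word plus more than three words."""
    words = title.lower().rstrip("?").split()
    has_interrogative = any(w in {"where", "which", "why", "how", "who"} for w in words)
    return has_interrogative and len(words) > 3
```

Because the checks are passed in as a sequence, their order and combination can be changed without touching `judge_title`.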
The four processing logics above are described below with reference to the specific steps.
Steps S1021 and S1022 implement the processing logic of question template matching. Step S1021: perform matching verification on the title using a question template that combines keywords with a syntactic structure. If the title passes the verification, step S1022 is executed; otherwise step S1023 is executed.
A question template is a sentence-pattern definition that contains keywords and a syntactic structure, where the keywords usually have a strong interrogative tendency. Take the following question template as an example: the keyword "can also", followed by VP, followed by a second keyword that closes the pattern, where VP stands for a verb phrase. The template states that the title contains both keywords with a verb phrase between them; a title that meets this requirement passes the verification. For example, a title such as "Can one also be with a former boyfriend who has a girlfriend?" conforms to the template and passes. Besides verb phrases, the parts of the sentence other than the keywords can also be constrained in a question template through other syntactic structures, such as noun phrases, which are not described further here.
The keywords in a question template have a strong interrogative tendency because they are extracted by statistical analysis of the titles of high-quality questions in the database. For example, questions that received answers from many users, or that received an answer within a short time of being asked, can be extracted from the database, and the titles of these questions are segmented into words and counted. From the number of titles in which each word occurs and the number of titles in which it occurs together with other words, the probability that each word occurs in a title and the probability that it co-occurs with other words in a title can be calculated. Words with a high individual probability, or with a high co-occurrence probability, can then be selected as the keywords of question templates. Analyzing the syntactic structure of the questions that contain the extracted keywords yields complete question templates.
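The keyword statistics described above can be sketched as follows. This is a minimal sketch under stated assumptions: titles are already segmented into word lists, "probability" is read as the fraction of titles containing a word (or a pair of words), and the function name `keyword_statistics` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_statistics(segmented_titles):
    """Estimate per-word and per-pair occurrence probabilities over titles.

    segmented_titles: list of word lists, one list per high-quality title.
    Returns (p_single, p_pair); pair keys are alphabetically sorted tuples.
    """
    total = len(segmented_titles)
    single_counts = Counter()
    pair_counts = Counter()
    for tokens in segmented_titles:
        unique = sorted(set(tokens))  # count each word once per title
        single_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    p_single = {word: count / total for word, count in single_counts.items()}
    p_pair = {pair: count / total for pair, count in pair_counts.items()}
    return p_single, p_pair
```

Words or pairs whose probability exceeds a chosen cut-off would then be candidate template keywords; the cut-off itself is not specified in the text.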
Step S1022: calculate the number of meaning-bearing words contained in the title that passed the verification. When this number is greater than the first threshold, the title is determined to be a high-quality title; otherwise step S1023 is executed. Step S1022 further filters the titles matched in step S1021, which raises the confidence of the high-quality judgment. Meaning-bearing words are words with actual meaning, such as nouns or verbs.
Steps S1023 and S1024 implement the processing logic of interrogative word matching. Step S1023: perform matching verification on the title using an interrogative word list. If the title passes the verification, step S1024 is executed; otherwise step S1025 is executed. The interrogative word list can be compiled from common knowledge of how people use language, for example: where, which, why, how, who, and so on.
Step S1024: calculate the effective length of the title that passed the verification and the number of content words it contains. When the effective length is greater than the second threshold and the number of content words is greater than the third threshold, the title is determined to be a high-quality title; otherwise step S1025 is executed. Step S1024 further filters the titles matched in step S1023, which raises the confidence of the high-quality judgment.
The effective length of a title is the number of words that remain after the title is segmented into words and the stop words are removed. Stop words are words without practical meaning, such as "as" or "according to reason". The number of content words in a title is obtained by removing, in addition to the stop words, words that do not help in understanding the question. For example, some users often use words such as "seeking help", "master", or "swordsman" when asking; these words contribute nothing to understanding the content of the question, so they are also excluded when content words are counted. The second threshold and the third threshold act as two gates: only a title that clears both is determined to be a high-quality title.
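The effective-length and content-word counts can be sketched as follows. This is an illustrative sketch only: the word lists, threshold values, and function names (`title_stats`, `passes_length_check`) are assumptions, and the title is assumed to be already segmented into words.

```python
def title_stats(tokens, stop_words, filler_words):
    """Return (effective_length, content_word_count) for a segmented title.

    effective length   = words left after removing stop words
    content word count = effective words that are not unhelpful filler words
    """
    effective = [t for t in tokens if t not in stop_words]
    content = [t for t in effective if t not in filler_words]
    return len(effective), len(content)


def passes_length_check(tokens, stop_words, filler_words,
                        min_effective_length, min_content_words):
    """Both counts must be strictly greater than their thresholds."""
    effective_length, content_words = title_stats(tokens, stop_words, filler_words)
    return effective_length > min_effective_length and content_words > min_content_words
```

For instance, with `stop_words={"to"}` and `filler_words={"help"}`, the segmented title `["help", "how", "to", "install", "printer", "driver"]` has an effective length of 5 and 4 content words.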
Steps S1025 and S1026 implement the processing logic of interrogative rule matching. Step S1025: perform matching verification on the title using interrogative rules, where an interrogative rule comprises at least one of three kinds of restriction: vocabulary, part of speech, or position. If the title passes the verification, step S1026 is executed; otherwise step S1027 is executed.
A vocabulary restriction specifies a concrete word. For example, a rule can be written with the structure should/1 + not/1 + should/1, where each element is a concrete word, meaning that the expression "should or should not" occurs in the title. The digit "1" in the rule is a code indicating that the element is a restriction on vocabulary. A title asking, for example, whether one should or should not eat fruit after a meal conforms to this rule.
A part-of-speech restriction constrains the part of speech of some of the words in the sentence. Take the rule that/1 + noun/2: it means that the word "that" occurs in the title and is followed by a word whose part of speech is noun. A title such as "Is that apple like a knock-off?" conforms to this rule. The digit "1" indicates that "that" is a restriction on a concrete word, and the digit "2" indicates that "noun" is a restriction on part of speech; it does not mean that the word "noun" must occur in the title.
A position restriction constrains where in the title a concrete word, or a word of a certain part of speech, occurs. Take the rule meaning/1 + end/3: it means that the word "meaning" occurs in the title and occurs at the end of the title. A title asking for the meaning of the "tai" character in "typhoon" conforms to this rule (in the original word order, "meaning" is the last word of the title). The digit "1" indicates that "meaning" is a restriction on a concrete word, and the digit "3" indicates a restriction on the position at which "meaning" occurs, which in this example is the end. Besides the end of the title, a position restriction can also refer to the beginning, to either side of a certain word, and so on; any positional information in a rule can be understood as a position restriction, and the cases are not enumerated here.
The digits "1", "2", and "3" in the examples above are only an illustrative notation adopted to explain the present invention; any symbol that carries the same meaning can be used. In addition, restrictions on vocabulary, part of speech, and position can be combined arbitrarily in a rule and are not limited to the cases in the examples above.
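One possible way to represent and match such rules is sketched below. Everything here is an assumption made for illustration: the rule encoding as `(value, kind)` pairs, the reading of `+` as "immediately followed by", the position values `"start"` and `"end"`, and the names `matches_rule`, `VOCAB`, `POS`, `POSITION`. Titles are assumed to be segmented and part-of-speech tagged already.

```python
# Restriction kinds, mirroring the illustrative digits 1, 2, 3 in the text.
VOCAB, POS, POSITION = 1, 2, 3


def matches_rule(tagged_title, rule):
    """Check a tagged title against one rule.

    tagged_title: list of (word, part_of_speech) pairs.
    rule: list of (value, kind) pairs, e.g. [("that", VOCAB), ("noun", POS)].
    Word and part-of-speech items must match a contiguous run of the title
    (an assumption); a POSITION item anchors that run to "start" or "end".
    """
    items = [(value, kind) for value, kind in rule if kind != POSITION]
    anchors = {value for value, kind in rule if kind == POSITION}
    n, m = len(tagged_title), len(items)
    if m == 0 or m > n:
        return False

    def fits(start):
        window = tagged_title[start:start + m]
        for (word, tag), (value, kind) in zip(window, items):
            actual = word if kind == VOCAB else tag
            if actual != value:
                return False
        return True

    starts = [s for s in range(n - m + 1) if fits(s)]
    if "end" in anchors:
        starts = [s for s in starts if s + m == n]
    if "start" in anchors:
        starts = [s for s in starts if s == 0]
    return bool(starts)
```

With this encoding, the rule `that/1 + noun/2` becomes `[("that", VOCAB), ("noun", POS)]` and `meaning/1 + end/3` becomes `[("meaning", VOCAB), ("end", POSITION)]`.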
The interrogative rules are established by statistical analysis of the data in the database. Please refer to Fig. 3, which is a schematic flowchart of the method for establishing interrogative rules in an embodiment of the present invention. As shown in Fig. 3, the method for establishing interrogative rules comprises:
Step 201: according to the interrogative word list, extract from the database the high-quality question titles that contain the same interrogative word, to form a title set for that interrogative word. The interrogative word list is the set of words with an interrogative tendency, compiled from common knowledge, that was described in step S1023. For example, if the interrogative word list contains "where", step 201 extracts from the database all high-quality question titles that contain "where". Whether a title is high-quality can be decided by some strategy, for example the number of answers to the question, the time it took for the question to receive an answer, or the number of times the question was clicked. After step 201 has extracted the high-quality question titles for each interrogative word, there is a title set associated with each interrogative word.
Step 202: count the features of the frequent items in the title set to obtain statistics, where the features comprise vocabulary, part of speech, or position. A frequent item is a word that occurs with high frequency after word segmentation. The item itself constitutes the vocabulary feature; analyzing where it occurs yields the position feature; and analyzing how it connects to other words yields the part-of-speech feature.
Step 203: generate the interrogative rules from the statistics. By applying thresholds to the statistics, the title features with the most in common can be picked out, and after further manual review the corresponding interrogative rules are obtained.
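Steps 202 and 203 can be sketched roughly as below for the vocabulary and position features. This is a simplified sketch under assumptions: `frequent_features`, the `min_support` cut-off, and the "end ratio" statistic are hypothetical choices, part-of-speech statistics are omitted, and the manual review mentioned in step 203 is outside the code.

```python
from collections import Counter


def frequent_features(segmented_titles, min_support):
    """Collect frequent words of one title set with a simple position statistic.

    segmented_titles: word lists for the titles sharing one interrogative word.
    min_support: minimum fraction of titles a word must occur in to be kept.
    Returns {word: {"support": ..., "end_ratio": ...}}, where end_ratio is the
    share of the word's titles in which it is the last word.
    """
    total = len(segmented_titles)
    title_counts = Counter()
    end_counts = Counter()
    for tokens in segmented_titles:
        title_counts.update(set(tokens))  # count each word once per title
        if tokens:
            end_counts[tokens[-1]] += 1
    features = {}
    for word, count in title_counts.items():
        support = count / total
        if support >= min_support:
            features[word] = {"support": support,
                              "end_ratio": end_counts[word] / count}
    return features
```

A word with high support and an end ratio near 1 would be a candidate for a rule of the form word/1 + end/3, subject to review.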
Please refer again to Fig. 2. Step S1026: calculate the effective length of the title that passed the verification and the number of content words it contains. When the effective length is greater than the fourth threshold and the number of content words is greater than the fifth threshold, the title is determined to be a high-quality title; otherwise step S1027 is executed. Step S1026 is similar to step S1024 and further filters the titles matched in step S1025. The fourth and fifth thresholds can be set to the same values as the second and third thresholds, or to different values.
Steps S1027 and S1028 implement the processing logic of classified information matching. Step S1027: perform semantic analysis on the title to obtain the topic of the title. The semantic analysis can be carried out with the prior art and is not described further here.
Step S1028: perform matching verification on the topic using the classified information catalog, and judge the quality of the title according to the level of the catalog that the topic matches.
The classified information catalog is a systematic hierarchical classification structure. For example, the first level consists of broad domains such as computers, sports, and society. Each of these domains is subdivided further to give the second level; for example, computers can be divided into notebooks, desktops, tablets, and so on. A third level can be subdivided under the second level, and so forth.
When the topic of the title matches only the first level of the classified information catalog, the title is considered low-quality, that is, unclear. When the topic matches the second level or a deeper level, a filtering strategy can be used to judge further whether the title is high-quality. For example, a content-word threshold is set for each level; when a title matches a certain level and the number of content words it contains exceeds the threshold set for that level, the title is considered high-quality, and otherwise it is considered unclear. The finer the matched level, the clearer the semantics of the title, so the content-word threshold set for that level can be smaller.
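The level-dependent decision can be sketched as follows. This is a sketch under assumptions: `judge_by_category_level` and the threshold values are hypothetical, a title whose topic matches no level is treated as low-quality (the text does not cover that case), and levels deeper than the configured ones reuse the deepest configured threshold.

```python
def judge_by_category_level(matched_level, content_word_count, level_thresholds):
    """Judge title quality from the matched catalog level.

    matched_level: 1 for the broadest level, 2, 3, ... for finer levels,
                   or None if the topic matched nothing (assumed low-quality).
    level_thresholds: {level: content-word threshold} for levels >= 2;
                      finer levels are expected to carry smaller thresholds.
    """
    if matched_level is None or matched_level <= 1:
        return "low"
    deepest = max(level_thresholds)
    threshold = level_thresholds.get(matched_level, level_thresholds[deepest])
    return "high" if content_word_count > threshold else "low"
```

For example, with `level_thresholds = {2: 4, 3: 2}`, a title matching level 3 needs only three content words to count as high-quality, while a title matching level 2 needs five.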
It should be noted that the use in this embodiment of all four processing logics (question template matching, interrogative word matching, interrogative rule matching, and classified information matching), and the order in which they are applied, is only an exemplary description. In other embodiments of the present invention, these four processing logics can be combined and ordered arbitrarily, and each such arrangement can judge the quality of question titles.
With the method provided by the present invention, experimental data show that the precision in judging unclear titles is 87%, the rate at which clear questions are misjudged is 3%, and the recall for unclear questions is 60%, which indicates that the present invention achieves a good judgment effect.
Please refer to Fig. 4, which is a schematic flow chart of the question guiding method in an embodiment of the present invention. As shown in Fig. 4, the method 300 comprises:
Step 301: retrieve with the question title to obtain candidate titles. That is, the title of the user's question is used as the keyword for a search in the database of the search engine; all titles in the database that contain this keyword are found and taken as candidate titles.
Step 302: filter the candidate titles to obtain candidate guiding titles.
The candidate titles are filtered mainly to remove duplicate titles and low-quality titles. Because the data in the database come from different users, different users may submit the same question, which produces duplicate data; only one copy of each duplicated title needs to be kept. Low-quality titles are filtered by judging the quality of each title and removing those found to be low-quality. The quality judgment can be carried out with method 100 described above: every title that method 100 does not judge to be high-quality is a low-quality title and is filtered out.
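The filtering of step 302 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `filter_candidates` and the `is_high_quality` callback (standing in for the quality judgment of method 100) are assumptions introduced here, and duplicates are taken to mean titles that are identical after trimming whitespace.

```python
from typing import Callable, Iterable, List


def filter_candidates(
    candidates: Iterable[str],
    is_high_quality: Callable[[str], bool],
) -> List[str]:
    """Keep one copy of each title and drop titles not judged high-quality."""
    seen = set()
    kept: List[str] = []
    for title in candidates:
        key = title.strip()
        if key in seen:
            continue  # duplicate submission: only the first copy is kept
        seen.add(key)
        if is_high_quality(key):  # hypothetical stand-in for method 100
            kept.append(key)
    return kept
```

The order of the surviving titles follows the retrieval order, which leaves the later ranking by degree of correlation unaffected.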
Step 303: calculate the degree of correlation between the question title and each candidate guiding title, and obtain guiding titles according to the degree of correlation.
The degree of correlation comprises two measures, called the first degree of correlation and the second degree of correlation. The first degree of correlation is the ratio of the number of words shared by the question title and the candidate guiding title to the number of words in the question title itself. The second degree of correlation is the ratio of the number of shared words to the number of words in the candidate guiding title itself.
For example, suppose the question title is "What are the four greats of China" and the candidate guiding title is "The four great cuisines that Chinese people love to eat, who invented them". After word segmentation the two titles become "China, [particle], four, great, are, what" and "China, people, love to eat, [particle], four, great, cuisines, are, who, invent, [particle]" respectively. The words shared by the question title and the candidate guiding title are "China, [particle], four, great, are", so the number of shared words is 5, the number of words in the question title is 6, and the number of words in the candidate guiding title is 11. The first degree of correlation is therefore 5/6 and the second degree of correlation is 5/11.
When both the first degree of correlation and the second degree of correlation are greater than a prescribed threshold, the question title is considered relevant to the candidate guiding title, and the candidate guiding title is selected as a guiding title. Continuing the example above, suppose there is another candidate guiding title, "What are the four great inventions of China", whose words are "China, [particle], four, great, inventions, are, what". The words it shares with the question title "What are the four greats of China" are "China, [particle], four, great, are, what", so the first degree of correlation is 6/6 and the second degree of correlation is 6/7. If the threshold is set to 0.8, both degrees of correlation of the candidate "What are the four great inventions of China" exceed the threshold, so it can become a guiding title. For the candidate "The four great cuisines that Chinese people love to eat, who invented them", the first degree of correlation (5/6) exceeds the threshold but the second (5/11) does not, so it cannot become a guiding title.
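The two degrees of correlation and the threshold test of step 303 can be reproduced with a short sketch. The function names are assumptions introduced here, and the token lists are English glosses of the segmented titles in the example (the particle is written as "of"). One further assumption is needed to match the worked figures: shared words are counted as distinct words, while each title's length counts every token, repeats included.

```python
from typing import List, Sequence, Tuple


def correlation_degrees(
    question: Sequence[str], candidate: Sequence[str]
) -> Tuple[float, float]:
    """Return (first, second) degree of correlation for two segmented titles."""
    if not question or not candidate:
        return 0.0, 0.0
    shared = len(set(question) & set(candidate))  # distinct shared words
    return shared / len(question), shared / len(candidate)


def select_guiding_titles(
    question: Sequence[str],
    candidates: Sequence[Sequence[str]],
    threshold: float = 0.8,
) -> List[Sequence[str]]:
    """Keep candidates whose two degrees both exceed the threshold."""
    selected = []
    for candidate in candidates:
        first, second = correlation_degrees(question, candidate)
        if first > threshold and second > threshold:
            selected.append(candidate)
    return selected


question = ["China", "of", "four", "great", "are", "what"]
cuisines = ["China", "people", "love-to-eat", "of", "four", "great",
            "cuisines", "are", "who", "invent", "of"]
inventions = ["China", "of", "four", "great", "inventions", "are", "what"]

print(correlation_degrees(question, cuisines))    # 5/6 and 5/11
print(correlation_degrees(question, inventions))  # 6/6 and 6/7
```

With the threshold of 0.8 from the example, `select_guiding_titles(question, [cuisines, inventions])` keeps only the "inventions" title, because 5/11 falls below the threshold.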
Step 304: present the guiding titles to the user to guide the user's question. That is, in the user interaction interface the guiding titles are sorted and offered to the user for selection. The sorting may be based on the degrees of correlation calculated in step 303, and may also be combined with other strategies.
Before step 304, the method 300 may further comprise step 305: when the number of guiding titles obtained in step 303 is zero, perform semantic analysis on the question title to obtain the topic of the title. The semantic analysis can be carried out with the prior art and is not described again here. Step 306: match the topic against the classification catalogue, and extract a preset number of question titles from the database of the matched catalogue level as guiding titles.
For example, suppose the question title is "I am what spring to the Mount Emei". Because the number of guiding titles obtained in step 303 is zero, the semantic analysis of step 305 is performed and yields the topic Sichuan; a certain number of question titles are then extracted from the database of the catalogue level "Travel - Sichuan" as guiding titles.
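The fallback of steps 305 and 306 can be sketched as below, assuming the topic has already been produced by some semantic analysis. The nested-dictionary catalogue, the `titles_by_level` mapping and both function names are hypothetical structures chosen for illustration; the text does not specify how the catalogue or the per-level database is stored.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


def find_level_path(
    catalogue: Dict[str, dict], topic: str, prefix: Path = ()
) -> Optional[Path]:
    """Depth-first search for the catalogue level whose name equals the topic."""
    for name, children in catalogue.items():
        path = prefix + (name,)
        if name == topic:
            return path
        found = find_level_path(children, topic, path)
        if found is not None:
            return found
    return None


def fallback_guiding_titles(
    catalogue: Dict[str, dict],
    titles_by_level: Dict[Path, List[str]],
    topic: str,
    preset_number: int = 3,
) -> List[str]:
    """Extract up to preset_number titles stored under the matched level."""
    path = find_level_path(catalogue, topic)
    if path is None:
        return []
    return titles_by_level.get(path, [])[:preset_number]


catalogue = {"Travel": {"Sichuan": {}, "Yunnan": {}}, "Computer": {"Notebook": {}}}
print(find_level_path(catalogue, "Sichuan"))  # ('Travel', 'Sichuan')
```

Taking the first `preset_number` titles is only one possible extraction policy; a real system might sample or rank the titles stored under the matched level instead.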
Please refer to Fig. 5, which is a schematic structural block diagram of the question title quality judgment device in an embodiment of the present invention. As shown in Fig. 5, the device 400 comprises:
An input unit 401, configured to obtain the title of a question;
A quality judgment unit 402, configured to analyze the title by combining syntactic structure and text content, so as to determine the quality of the title. In this embodiment, this analysis is implemented through four processing logics: question template matching, interrogative word matching, interrogative rule matching and classification catalogue matching. For a title submitted by a user, as long as any one of these four processing logics identifies it as a high-quality title, the title is considered to have a clear meaning; otherwise the title is a low-quality title, that is, a title whose meaning is unclear.
In this embodiment, the quality judgment unit 402 comprises a question template verification unit 4021, a first confirmation unit 4022, an interrogative word list verification unit 4023, a second confirmation unit 4024, an interrogative rule verification unit 4025, a third confirmation unit 4026, a semantic analysis unit 4027 and a fourth confirmation unit 4028. The question template verification unit 4021 and the first confirmation unit 4022 correspond to the question template matching logic; the interrogative word list verification unit 4023 and the second confirmation unit 4024 correspond to the interrogative word matching logic; the interrogative rule verification unit 4025 and the third confirmation unit 4026 correspond to the interrogative rule matching logic; and the semantic analysis unit 4027 and the fourth confirmation unit 4028 correspond to the classification catalogue matching logic.
Each unit is described in detail below.
The question template verification unit 4021 is configured to match and verify the title against question templates that combine keywords with syntactic structure. A title that passes verification is passed to the first confirmation unit 4022; otherwise it is passed to the interrogative word list verification unit 4023.
A question template is a sentence pattern definition that comprises keywords and syntactic structure, where the keywords usually have a strong interrogative tendency. Take a template of the form "can still + VP + closing keyword" as an example, where VP denotes a verb phrase: the template requires the title to contain the keyword "can still" and the closing keyword, with a verb phrase between them. A title that meets these requirements passes verification. For example, a title such as "Can I still be with an ex-boyfriend who has a girlfriend?" matches the template and therefore passes verification. Besides verb phrases, a question template may also use other syntactic structures, such as noun phrases, to constrain the parts of the sentence other than the keywords; these are not described further here.
The keywords in a question template have a strong interrogative tendency because they are extracted by statistical analysis of the titles of high-quality questions in the database. For example, questions that received answers from many users, or that received an answer within a short time after being asked, can be extracted from the database. The titles of these questions are segmented into words and counted. From the number of times each word occurs on its own in a title and the number of times it co-occurs with other words, the probability that each word occurs on its own in a title and the probability that it co-occurs with other words in a title can be calculated. Words with a high individual occurrence probability, or with a high co-occurrence probability, can then be selected as keywords of question templates. Analyzing the syntactic structure of the questions that contain the extracted keywords yields the complete question templates.
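The keyword statistics described above can be sketched as follows. This is one possible reading, not the patented procedure: the individual probability is taken as the fraction of titles containing a word, the co-occurrence probability as the fraction of titles containing both words of a pair, and the function names and the two cut-off values are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def keyword_statistics(
    segmented_titles: List[List[str]],
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Per-word and per-pair title-level occurrence probabilities."""
    total = len(segmented_titles)
    if total == 0:
        return {}, {}
    single: Counter = Counter()
    pair: Counter = Counter()
    for tokens in segmented_titles:
        unique = sorted(set(tokens))           # count each word once per title
        single.update(unique)
        pair.update(combinations(unique, 2))   # unordered pairs, sorted order
    p_single = {word: n / total for word, n in single.items()}
    p_pair = {words: n / total for words, n in pair.items()}
    return p_single, p_pair


def pick_keywords(
    segmented_titles: List[List[str]],
    min_single: float = 0.5,   # hypothetical cut-off
    min_pair: float = 0.3,     # hypothetical cut-off
) -> Set[str]:
    """Select words that are frequent alone or frequent together."""
    p_single, p_pair = keyword_statistics(segmented_titles)
    keywords = {word for word, p in p_single.items() if p >= min_single}
    for (a, b), p in p_pair.items():
        if p >= min_pair:
            keywords.update((a, b))
    return keywords
```

The input would be the segmented titles of questions already known to be high-quality, for instance those that attracted many answers or a quick answer.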
The first confirmation unit 4022 is configured to count the words capable of expressing meaning in a title that has passed verification by the question template verification unit 4021. When this number is greater than a first threshold, the title is determined to be a high-quality title; otherwise the title is passed to the interrogative word list verification unit 4023. Words capable of expressing meaning are words with actual meaning, such as nouns and verbs.
The interrogative word list verification unit 4023 is configured to match and verify the title against an interrogative word list. A title that passes verification is passed to the second confirmation unit 4024; otherwise it is passed to the interrogative rule verification unit 4025. The interrogative word list can be compiled from common knowledge of everyday language use, for example: where, which, why, how, who, and so on.
The second confirmation unit 4024 is configured to calculate the effective length of a title that has passed verification by the interrogative word list verification unit 4023 and the number of notional words it contains. When the effective length is greater than a second threshold and the number of notional words is greater than a third threshold, the title is determined to be a high-quality title; otherwise the title is passed to the interrogative rule verification unit 4025.
The effective length of a title is the number of words that remain after the title is segmented and the stop words are removed. Stop words are words without practical meaning, such as "as" and "according to reason". The number of notional words in a title is obtained by further removing, after the stop words, words that do not help convey the meaning of the question. For example, some users often include words such as "seeking help", "master" or "swordsman" when asking; these words contribute nothing to understanding the content of the question, so they are also excluded when counting notional words. The second threshold and the third threshold act as two bars: the second confirmation unit 4024 determines a title to be high-quality only if it clears both.
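The two counts used by the second confirmation unit can be sketched as follows. The word lists and threshold values are placeholders invented for illustration, since the text gives neither a complete stop-word list nor concrete thresholds, and the function names are likewise assumptions.

```python
from typing import FrozenSet, Sequence

# Hypothetical word lists; a real system would load much larger ones.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "a", "as"})
FILLER_WORDS: FrozenSet[str] = frozenset({"help", "master", "swordsman"})


def effective_length(tokens: Sequence[str]) -> int:
    """Number of words left after removing stop words."""
    return sum(1 for token in tokens if token not in STOP_WORDS)


def notional_word_count(tokens: Sequence[str]) -> int:
    """Number of words left after removing stop words and filler words."""
    return sum(
        1 for token in tokens
        if token not in STOP_WORDS and token not in FILLER_WORDS
    )


def clears_both_thresholds(
    tokens: Sequence[str], length_threshold: int, notional_threshold: int
) -> bool:
    """True only if both counts are strictly greater than their thresholds."""
    return (
        effective_length(tokens) > length_threshold
        and notional_word_count(tokens) > notional_threshold
    )


tokens = ["help", "master", "how", "install", "the", "printer", "driver"]
print(effective_length(tokens), notional_word_count(tokens))  # 6 4
```

The same check serves the third confirmation unit if the fourth and fifth thresholds are passed in place of the second and third.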
The interrogative rule verification unit 4025 is configured to match and verify the title against interrogative rules, where each interrogative rule comprises at least one of three kinds of constraint: on vocabulary, on part of speech, or on position. A title that passes verification is passed to the third confirmation unit 4026; otherwise it is passed to the semantic analysis unit 4027.
A vocabulary constraint is a constraint that names a specific word. For example, a rule may be written with the following structure: "should/1 + not/1 + should/1", where "should", "not" and "should" are all specific words, meaning that the wording "should or should not" must appear in the title. The number "1" in the rule can be understood as a code indicating that "should", "not" and "should" are all vocabulary constraints. A title such as "Should or shouldn't one eat fruit after a meal?" satisfies this rule.
A part-of-speech constraint restricts the part of speech of some word in the sentence. Take the rule "that/1 + noun/2": the title must contain the word "that", followed by a word whose part of speech is noun. A title such as "Does that apple look like a shoddy product?" satisfies this rule. The number "1" indicates that the constraint on "that" is a constraint on a specific word, while the number "2" indicates that "noun" is a part-of-speech constraint, not a requirement that the word "noun" itself appear in the title.
A position constraint restricts where a specific word, or a word of a given part of speech, appears in the title. Take the rule "meaning/1 + end/3": the title must contain the word "meaning", and this word must appear at the end of the title. The title "I want to know the meaning of the character 'tai' in typhoon", in whose original word order "meaning" comes last, satisfies this rule. The number "1" indicates that the constraint on "meaning" is a vocabulary constraint, and the number "3" indicates a constraint on the position of the word "meaning"; in this example the position is the end of the title. Besides the end, a position constraint may place a word at the start of the title, on either side of a certain word, and so on. Any rule that carries position information can be understood as containing a position constraint; the possibilities are not enumerated here.
The numbers "1", "2" and "3" in the examples above are merely illustrative; in practice any symbols that convey these meanings can be used. In addition, vocabulary, part-of-speech and position constraints may be combined arbitrarily within a rule and are not limited to the cases in the examples above.
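The three kinds of constraint can be illustrated with a small rule matcher. This is a sketch under several assumptions not stated in the text: titles arrive as (word, part-of-speech) pairs from some tagger, the vocabulary and part-of-speech constraints of a rule must match consecutive tokens, and only the positions "start" and "end" are supported. The codes 1, 2 and 3 follow the examples above; the function and tag names are hypothetical.

```python
from typing import List, Tuple

WORD, POS, POSITION = 1, 2, 3      # constraint codes used in the examples
Token = Tuple[str, str]            # (word, part-of-speech tag)
Constraint = Tuple[str, int]       # (value, constraint code)


def _token_matches(token: Token, constraint: Constraint) -> bool:
    value, kind = constraint
    word, tag = token
    return word == value if kind == WORD else tag == value


def matches_rule(tokens: List[Token], rule: List[Constraint]) -> bool:
    """Check a tagged title against one interrogative rule."""
    sequence = [c for c in rule if c[1] in (WORD, POS)]
    positions = {value for value, kind in rule if kind == POSITION}
    if not sequence:
        return False
    span = len(sequence)
    for start in range(len(tokens) - span + 1):
        window = tokens[start:start + span]
        if not all(_token_matches(t, c) for t, c in zip(window, sequence)):
            continue
        if "start" in positions and start != 0:
            continue
        if "end" in positions and start + span != len(tokens):
            continue
        return True
    return False


apple = [("that", "pron"), ("apple", "noun"), ("looks", "verb"), ("fake", "adj")]
print(matches_rule(apple, [("that", WORD), ("noun", POS)]))  # True
```

A rule such as "meaning/1 + end/3" becomes `[("meaning", WORD), ("end", POSITION)]`, which matches only when "meaning" is the last token of the title.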
The third confirmation unit 4026 is configured to calculate the effective length of a title that has passed verification by the interrogative rule verification unit 4025 and the number of notional words it contains. When the effective length is greater than a fourth threshold and the number of notional words is greater than a fifth threshold, the title is determined to be a high-quality title; otherwise the title is passed to the semantic analysis unit 4027. The fourth and fifth thresholds may be set equal to the second and third thresholds respectively, or may differ from them.
The semantic analysis unit 4027 is configured to perform semantic analysis on the title to obtain the topic of the title. The semantic analysis can be carried out with the prior art and is not described again here.
The fourth confirmation unit 4028 is configured to match the topic obtained by the semantic analysis unit 4027 against the classification catalogue, and to judge the quality of the title according to the catalogue level that the topic matches.
The classification catalogue is a systematic hierarchical taxonomy. For example, the first level consists of broad knowledge domains such as computers, sports and society. Each of these is subdivided further to obtain the second level; for example, computers can be divided into notebooks, desktop computers, tablet computers and so on. The second level can be subdivided further into a third level, and so on.
When the topic of the title matches the first level of the classification catalogue, the title is considered low-quality, that is, unclear. When the topic matches the second level or any level below it, a filtering policy can further decide whether the title is high-quality. For example, a notional-word threshold is set for each level: if the title matches a given level and the number of notional words it contains exceeds the threshold set for that level, the title is considered high-quality; otherwise it is unclear. The finer the matched level, the clearer the semantics of the title, so the notional-word threshold set for that level can be smaller.
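The level-dependent decision of the fourth confirmation unit can be sketched as below. The threshold values are invented for illustration, the function name is an assumption, and levels deeper than the configured ones are assumed to reuse the threshold of the deepest configured level.

```python
from typing import Dict

# Hypothetical per-level thresholds: deeper (finer) levels need fewer
# notional words because the topic itself is already more specific.
LEVEL_THRESHOLDS: Dict[int, int] = {2: 4, 3: 3, 4: 2}


def is_high_quality_by_level(
    matched_level: int,
    notional_words: int,
    thresholds: Dict[int, int] = LEVEL_THRESHOLDS,
) -> bool:
    """Judge a title from the catalogue level its topic matched."""
    if matched_level <= 1:
        return False  # a first-level match is always treated as unclear
    deepest = max(thresholds)
    threshold = thresholds[min(matched_level, deepest)]
    return notional_words > threshold
```

For instance, a title with four notional words is unclear when its topic matches level 2 (the count must exceed 4) but high-quality when it matches level 3.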
Every title that the quality judgment unit 402 does not judge to be high-quality is a low-quality title, that is, a title whose meaning is unclear. It should be noted that the four processing logics this embodiment uses to judge the quality of a question title (question template matching, interrogative word matching, interrogative rule matching and classification catalogue matching), together with their processing order, are described by way of example only. In other embodiments of the present invention, these four processing logics may be combined and ordered in any way, and any such arrangement can still judge the quality of a question title.
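The hand-off between units 4021 to 4028 amounts to a cascade, which can be sketched as follows. The function name and the representation of each stage as a pair of callbacks (a verification step and a threshold confirmation step) are assumptions introduced for illustration; the callbacks themselves are placeholders for the units described above.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]


def judge_title(
    title: str,
    stages: List[Tuple[Check, Check]],
    catalogue_check: Check,
) -> bool:
    """Return True if any stage, or the final catalogue check, accepts the title."""
    for verify, confirm in stages:
        # A title that fails verification, or passes it but fails the
        # threshold confirmation, falls through to the next stage.
        if verify(title) and confirm(title):
            return True
    # Last resort: semantic analysis plus classification catalogue matching.
    return catalogue_check(title)
```

Because the stages are passed in as a list, reordering or dropping processing logics, as the text allows for other embodiments, only changes the list and not the cascade itself.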
Please refer to Fig. 6, which is a schematic structural block diagram of the question guiding device in an embodiment of the present invention. As shown in Fig. 6, the device 500 comprises: a retrieval unit 501, a filtering unit 502, a correlation calculation unit 503, a display unit 504, a semantic analysis unit 505 and an extraction unit 506.
The retrieval unit 501 is configured to retrieve with the question title to obtain candidate titles. That is, the title of the user's question is used as the keyword for a search in the database of the search engine; all titles in the database that contain this keyword are found and taken as candidate titles.
The filtering unit 502 is configured to filter the candidate titles to obtain candidate guiding titles.
The candidate titles are filtered mainly to remove duplicate titles and low-quality titles. Because the data in the database come from different users, different users may submit the same question, which produces duplicate data; only one copy of each duplicated title needs to be kept. Low-quality titles can be filtered with the device 400 described above: every title that the device 400 does not judge to be high-quality is a low-quality title and is filtered out.
The correlation calculation unit 503 is configured to calculate the degree of correlation between the question title and each candidate guiding title, and to obtain guiding titles according to the degree of correlation.
The degree of correlation comprises two measures, called the first degree of correlation and the second degree of correlation. The first degree of correlation is the ratio of the number of words shared by the question title and the candidate guiding title to the number of words in the question title itself. The second degree of correlation is the ratio of the number of shared words to the number of words in the candidate guiding title itself.
For example, suppose the question title is "What are the four greats of China" and the candidate guiding title is "The four great cuisines that Chinese people love to eat, who invented them". After word segmentation the two titles become "China, [particle], four, great, are, what" and "China, people, love to eat, [particle], four, great, cuisines, are, who, invent, [particle]" respectively. The words shared by the question title and the candidate guiding title are "China, [particle], four, great, are", so the number of shared words is 5, the number of words in the question title is 6, and the number of words in the candidate guiding title is 11. The first degree of correlation is therefore 5/6 and the second degree of correlation is 5/11.
When both the first degree of correlation and the second degree of correlation are greater than a prescribed threshold, the question title is considered relevant to the candidate guiding title, and the candidate guiding title is selected as a guiding title. Continuing the example above, suppose there is another candidate guiding title, "What are the four great inventions of China", whose words are "China, [particle], four, great, inventions, are, what". The words it shares with the question title "What are the four greats of China" are "China, [particle], four, great, are, what", so the first degree of correlation is 6/6 and the second degree of correlation is 6/7. If the threshold is set to 0.8, both degrees of correlation of the candidate "What are the four great inventions of China" exceed the threshold, so it can become a guiding title. For the candidate "The four great cuisines that Chinese people love to eat, who invented them", the first degree of correlation (5/6) exceeds the threshold but the second (5/11) does not, so it cannot become a guiding title.
The display unit 504 is configured to present the guiding titles to the user to guide the user's question. That is, in the user interaction interface the guiding titles are sorted and offered to the user for selection. The sorting may be based on the degrees of correlation calculated by the correlation calculation unit 503, and may also be combined with other strategies.
The semantic analysis unit 505 is configured to perform semantic analysis on the question title to obtain the topic of the title when the number of guiding titles output by the correlation calculation unit 503 is zero. The semantic analysis can be carried out with the prior art and is not described again here.
The extraction unit 506 is configured to match the topic against the classification catalogue, and to extract a preset number of question titles from the database of the matched catalogue level as guiding titles.
For example, suppose the question title is "I am what spring to the Mount Emei". Because the number of guiding titles obtained by the correlation calculation unit 503 is zero, the semantic analysis unit 505 analyzes the title and obtains the topic Sichuan; a certain number of question titles are then extracted from the database of the catalogue level "Travel - Sichuan" as guiding titles.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (18)

1. A question title quality judgment method, characterized in that the method comprises:
A. obtaining the title of a question;
B. analyzing the title by combining syntactic structure and text content, to determine the quality of the title.
2. The method according to claim 1, characterized in that step B comprises:
B11. matching and verifying the title against question templates that combine keywords with syntactic structure;
B12. counting the words capable of expressing meaning in a title that passes the verification, and when the number is greater than a first threshold, determining that the title is a high-quality title.
3. The method according to claim 1, characterized in that step B comprises:
B21. matching and verifying the title against an interrogative word list;
B22. calculating the effective length of a title that passes the verification and the number of notional words it contains, and when the effective length is greater than a second threshold and the number of notional words is greater than a third threshold, determining that the title is a high-quality title.
4. The method according to claim 1, characterized in that step B comprises:
B31. matching and verifying the title against interrogative rules, wherein each interrogative rule comprises at least one of three kinds of constraint: on vocabulary, on part of speech, or on position;
B32. calculating the effective length of a title that passes the verification and the number of notional words it contains, and when the effective length is greater than a fourth threshold and the number of notional words is greater than a fifth threshold, determining that the title is a high-quality title.
5. The method according to any one of claims 2 to 4, characterized in that step B further comprises:
B41. when the title cannot be determined to be a high-quality title, performing semantic analysis on the title to obtain the topic of the title;
B42. matching the topic against a classification catalogue, and judging the quality of the title according to the catalogue level that the topic matches.
6. A question guiding method, characterized in that the method comprises:
a. retrieving with the title of a question to obtain candidate titles;
b. filtering the candidate titles to obtain candidate guiding titles, wherein the filtering comprises judging the quality of the candidate titles with the method according to any one of claims 1 to 4, and filtering out every candidate title that is not judged to be a high-quality title;
c. calculating the degree of correlation between the title of the question and each candidate guiding title, and obtaining guiding titles according to the degree of correlation;
d. presenting the guiding titles to the user to guide the user's question.
7. The method according to claim 6, characterized in that the degree of correlation comprises a first degree of correlation and a second degree of correlation, wherein the first degree of correlation is the ratio of the number of words shared by the title of the question and the candidate guiding title to the number of words in the title of the question itself, and the second degree of correlation is the ratio of the number of words shared by the title of the question and the candidate guiding title to the number of words in the candidate guiding title itself.
8. The method according to claim 7, characterized in that, in step c, when the first degree of correlation and the second degree of correlation are both greater than a sixth threshold, the candidate guiding title is selected as a guiding title.
9. The method according to claim 6, characterized in that, before step d, the method further comprises:
e1. when the output of step c is empty, performing semantic analysis on the title of the question to obtain the topic of the title;
e2. matching the topic against a classification catalogue, and extracting a preset number of question titles from the database of the matched catalogue level as the guiding titles.
10. A question title quality judgment device, characterized in that the device comprises:
an input unit, configured to obtain the title of a question;
a quality judgment unit, configured to analyze the title by combining syntactic structure and text content, to determine the quality of the title.
11. The device according to claim 10, characterized in that the quality judgment unit comprises:
a question template verification unit, configured to match and verify the title against question templates that combine keywords with syntactic structure;
a first confirmation unit, configured to count the words capable of expressing meaning in a title that passes the verification, and when the number is greater than a first threshold, to determine that the title is a high-quality title.
12. The device according to claim 10, characterized in that the quality judgment unit comprises:
an interrogative word list verification unit, configured to match and verify the title against an interrogative word list;
a second confirmation unit, configured to calculate the effective length of a title that passes the verification and the number of notional words it contains, and when the effective length is greater than a second threshold and the number of notional words is greater than a third threshold, to determine that the title is a high-quality title.
13. The device according to claim 10, characterized in that the quality judgment unit comprises:
an interrogative rule verification unit, configured to match and verify the title against interrogative rules, wherein each interrogative rule comprises at least one of three kinds of constraint: on vocabulary, on part of speech, or on position;
a third confirmation unit, configured to calculate the effective length of a title that passes the verification and the number of notional words it contains, and when the effective length is greater than a fourth threshold and the number of notional words is greater than a fifth threshold, to determine that the title is a high-quality title.
14. The device according to any one of claims 11 to 13, characterized in that the quality judgment unit further comprises:
a first semantic analysis unit, configured to perform semantic analysis on the title to obtain the topic of the title when the title cannot be determined to be a high-quality title;
a fourth confirmation unit, configured to match the topic against a classification catalogue, and to judge the quality of the title according to the catalogue level that the topic matches.
15. an enquirement guiding device is characterized in that, said device comprises:
Retrieval unit is used for the title of puing question to is retrieved, to obtain candidate's title;
Filter element; Be used for said candidate's title is filtered; Guide title to obtain the candidate; Said filtration comprises that the described device of arbitrary claim carries out quality judging to said candidate's title in the employing claim 10 to 13, and filters out other titles except that being judged to be high quality titles in said candidate's title;
Correlation calculating unit is used to calculate the title of said enquirement and the degree of correlation that said candidate guides title, and obtains guiding title according to the said degree of correlation;
Display unit is used for showing said guiding title to the user, with the enquirement channeling conduct to the user.
16. device according to claim 15; It is characterized in that; The said degree of correlation comprises first degree of correlation and second degree of correlation; The title that wherein said first degree of correlation is said enquirement and said candidate guide the ratio of the vocabulary number that the title of vocabulary number that title comprises jointly and said enquirement comprises separately, and the title that said second degree of correlation is said enquirement and said candidate guide vocabulary number that title comprises jointly and said candidate to guide the ratio of the vocabulary number that title comprises separately.
17. The device according to claim 16, characterized in that, when said first degree of correlation and said second degree of correlation are both greater than a sixth threshold, said correlation calculating unit selects said candidate guiding title as said guiding title.
18. The device according to claim 15, characterized in that said device further comprises:
A second semantic analysis unit, configured to perform semantic analysis on the title of said question when the output result of said correlation calculating unit is zero, so as to obtain the theme of said title;
An extracting unit, configured to match and verify said theme against a classified information catalogue, and to extract a preset number of question titles from the database of the matched catalogue level as said guiding titles.
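The two word-overlap ratios of claim 16 and the threshold test of claim 17 can be illustrated with a minimal Python sketch. This is an illustration only, not the patented implementation. It assumes that titles are already segmented into words and that shared words are counted as distinct items, since the claims do not say how repeated words are handled. The names degrees_of_correlation and select_guiding_titles, the default threshold of 0.5 (standing in for the unspecified "sixth threshold"), and the fallback callback (a stand-in for the theme-based extraction of claim 18) are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Words = Sequence[str]  # a title that has already been segmented into words


def degrees_of_correlation(question: Words, candidate: Words) -> Tuple[float, float]:
    """Return the (first, second) degree of correlation as worded in claim 16.

    Assumption: words are counted as distinct items (sets); the claims do not
    specify how repeated words are treated.
    """
    q, c = set(question), set(candidate)
    if not q or not c:
        return 0.0, 0.0  # avoid dividing by zero for an empty title
    shared = len(q & c)
    # first: shared words / words in the question title
    # second: shared words / words in the candidate guiding title
    return shared / len(q), shared / len(c)


def select_guiding_titles(
    question: Words,
    candidates: Sequence[Words],
    threshold: float = 0.5,  # hypothetical value for the "sixth threshold"
    fallback: Optional[Callable[[Words], Sequence[Words]]] = None,
) -> List[List[str]]:
    """Keep candidates whose two degrees of correlation both exceed the threshold.

    If nothing is selected and a fallback is given, return the fallback's
    titles instead (a hypothetical stand-in for the extraction in claim 18).
    """
    selected = [
        list(c)
        for c in candidates
        if min(degrees_of_correlation(question, c)) > threshold
    ]
    if not selected and fallback is not None:
        return [list(t) for t in fallback(question)]
    return selected


if __name__ == "__main__":
    question = ["how", "to", "install", "python"]
    candidates = [
        ["how", "to", "install", "python", "on", "windows"],
        ["python", "snake", "diet"],
    ]
    print(select_guiding_titles(question, candidates))
```

Requiring both ratios to exceed the threshold rejects a candidate that merely contains the question's words inside a much longer title, as well as a short candidate that covers only a small part of the question.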
CN201110131169.7A 2011-05-19 2011-05-19 Question title quality judging method, question guiding method, and devices thereof Active CN102789466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110131169.7A CN102789466B (en) 2011-05-19 2011-05-19 Question title quality judging method, question guiding method, and devices thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110131169.7A CN102789466B (en) 2011-05-19 2011-05-19 Question title quality judging method, question guiding method, and devices thereof

Publications (2)

Publication Number Publication Date
CN102789466A true CN102789466A (en) 2012-11-21
CN102789466B CN102789466B (en) 2015-09-30

Family

ID=47154870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110131169.7A Active CN102789466B (en) 2011-05-19 2011-05-19 Question title quality judging method, question guiding method, and devices thereof

Country Status (1)

Country Link
CN (1) CN102789466B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013421A (en) * 2007-02-02 2007-08-08 清华大学 Rule-based automatic analysis method of Chinese basic block
CN101128818A (en) * 2004-12-29 2008-02-20 奥尔有限公司 Routing queries to information sources and sorting and filtering query results
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval
CN101576928A (en) * 2009-06-11 2009-11-11 腾讯科技(深圳)有限公司 Method and device for selecting related article
CN101814067A (en) * 2009-01-07 2010-08-25 张光盛 System and methods for quantitative assessment of information in natural language contents

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077330A (en) * 2013-03-30 2014-10-01 百度在线网络技术(北京)有限公司 Method and system for mounting problems to themes
CN103218436A (en) * 2013-04-17 2013-07-24 中国科学院自动化研究所 Similar problem retrieving method fusing user category labels and device thereof
CN103218436B (en) * 2013-04-17 2016-05-18 中国科学院自动化研究所 A kind of Similar Problems search method and device that merges class of subscriber label
CN110851579A (en) * 2019-11-06 2020-02-28 杨鑫蛟 User intention identification method, system, mobile terminal and storage medium
CN111581487A (en) * 2020-05-11 2020-08-25 北京字节跳动网络技术有限公司 Information processing method and device
CN111581487B (en) * 2020-05-11 2023-05-05 北京字节跳动网络技术有限公司 Information processing method and device

Also Published As

Publication number Publication date
CN102789466B (en) 2015-09-30

Similar Documents

Publication Publication Date Title
CN108829893B (en) Method and device for determining video label, storage medium and terminal equipment
KR101737887B1 (en) Apparatus and Method for Topic Category Classification of Social Media Text based on Cross-Media Analysis
US20160171373A1 (en) Training a Question/Answer System Using Answer Keys Based on Forum Content
US11521603B2 (en) Automatically generating conference minutes
US8126897B2 (en) Unified inverted index for video passage retrieval
CN108073568A (en) Keyword extraction method and device
CN110377908B (en) Semantic understanding method, semantic understanding device, semantic understanding equipment and readable storage medium
CN106570708A (en) Management method and management system of intelligent customer service knowledge base
US20140040181A1 (en) Automatic faq generation
CN110297988A (en) Hot topic detection method based on weighted LDA and improved Single-Pass clustering algorithm
CN109255012B (en) Method and device for machine reading understanding and candidate data set size reduction
CN110888990A (en) Text recommendation method, device, equipment and medium
CN106951503A (en) Information providing method, device, equipment and storage medium
CN103885966A (en) Question and answer interaction method and system of electronic commerce transaction platform
US11699034B2 (en) Hybrid artificial intelligence system for semi-automatic patent infringement analysis
CN111309916B (en) Digest extracting method and apparatus, storage medium, and electronic apparatus
Murray et al. Interpretation and transformation for abstracting conversations
KR102639979B1 (en) Keyword extraction apparatus, control method thereof and keyword extraction program
CN111767393A (en) Text core content extraction method and device
CN110807326A (en) Short text keyword extraction method combining GPU-DMM and text features
CN111061837A (en) Topic identification method, device, equipment and medium
CN102789466A (en) Question title quality judgment method and device and question guiding method and device
JP2017151588A (en) Image evaluation learning device, image evaluation device, image searching device, image evaluation learning method, image evaluation method, image searching method, and program
CN107908649B (en) Text classification control method
CN109634436A (en) Association method, device, and equipment for an input method, and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant