CN104077330A - Method and system for mounting problems to themes - Google Patents

Publication number
CN104077330A
Authority
CN
China
Prior art keywords
theme
relevance values
value
title
returned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310110075.0A
Other languages
Chinese (zh)
Other versions
CN104077330B (en)
Inventor
谢双宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310110075.0A
Publication of CN104077330A
Application granted
Publication of CN104077330B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/951 - Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for mounting problems to themes. The method includes the following steps: receiving an input search term, retrieving the corresponding theme based on the search term, and retrieving from a problem index database, according to the theme, a problem list comprising the theme; computing a relevance value for each problem in the problem list, sorting the problems based on the relevance values, setting a threshold, and returning the problems whose relevance values are higher than the threshold; computing a composite value for each returned problem, sorting the returned problems based on the composite values, and storing a specific number of the returned problems, taken in order, in a result data file. Correspondingly, the invention further provides a system for mounting problems to themes. The method and system can improve the user's experience when browsing extended problems.

Description

Method and system for mounting problems to themes
Technical field
The present invention relates to data management technology, and in particular to a method and system for mounting problems to themes.
Background
With the development of network technology, users depend more and more on the network, and usually turn to an online platform for answers whenever they have a question. Take the commonly used interactive question-and-answer platform Baidu Knows as an example: a user typically submits the title and content of a problem (that is, a question) on the platform, and may add follow-up questions, in order to obtain a satisfactory answer. Users can also rate the answers to a problem, so that other users searching for answers to similar problems can see at a glance information such as the number of ratings and the date. Usually, several problems are mounted under the same theme; for example, the problems "How long does CS OL take to download", "CS download", and "CS click-to-download" are all mounted under the theme "CS: download".
As users submit more and more problems and the answers obtained become richer, the question of mounting arises: how to mount and display, under the corresponding theme, the problems that are of the best quality, are the most relevant, or can best improve the user experience.
It is therefore desirable to provide a method and system for mounting problems to themes that addresses this question.
Summary of the invention
The object of the present invention is to provide a method and system for mounting problems to themes that improves the user's search and browsing experience.
According to one aspect of the present invention, a method for mounting problems to themes is provided, comprising the following steps:
receiving an input search term, retrieving the corresponding theme based on the search term, and retrieving from a problem index database, according to the theme, a problem list comprising the theme;
calculating a relevance value for each problem in the problem list, sorting the problems based on the relevance values, setting a threshold, and returning the problems whose relevance values are greater than the threshold;
calculating a composite value for each returned problem, sorting the returned problems based on the composite values, and storing a specific number of the returned problems, taken in order, in a result data file.
According to another aspect of the present invention, a system for mounting problems to themes is also provided, comprising:
a receiving module for receiving an input search term;
a retrieval module for retrieving the corresponding theme based on the search term and retrieving from a problem index database, according to the theme, a problem list comprising the theme;
a calculation device for calculating the relevance values and composite values of the problems in the problem list;
a sorting module for sorting the problems based on the relevance values or the composite values;
a returning module for setting a threshold and returning the problems whose relevance values are greater than the threshold;
a data storage module for storing a specific number of the returned problems, taken in order, in a result data file.
Compared with the prior art, the present invention has the following advantage: with the technical solution provided by the invention, the most suitable mounted problems and their answers (taking into account factors such as relevance and problem quality) can be obtained for a theme, improving the user's experience when browsing extended problems.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flowchart of a method for mounting problems to themes according to a preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of a page on which problems are mounted to a theme according to a preferred embodiment of the present invention;
Fig. 3 is a schematic block diagram of a system for mounting problems to themes according to another preferred embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
According to one aspect of the present invention, a method for mounting problems to themes is provided.
Referring to Fig. 1, Fig. 1 is a flowchart of a method for mounting problems to themes according to a preferred embodiment of the present invention.
As shown in Fig. 1, the method comprises the following steps:
Step S101: receive an input search term, retrieve the corresponding theme based on the search term, and retrieve from a problem index database, according to the theme, a problem list comprising the theme.
Specifically, the search term may be any character, word, phrase, or sentence composed of letters, Chinese characters, and/or punctuation marks, and is used to search for a theme under which at least one problem is mounted.
Usually, several similar problems correspond to the same theme; in other words, at least one related problem is usually mounted under a theme. Referring to Fig. 2, Fig. 2 is a schematic diagram of a page on which problems are mounted to a theme according to a preferred embodiment of the present invention. As shown in Fig. 2, the problems mounted under the theme "CS: download" are: "How long does CS OL take to download", "CS download", "CS click-to-download", "CS 1.6 download", and "Looking for a CS standalone version download!". These problems are sorted according to certain evaluation criteria, such as relevance and the number of favorable user ratings, and displayed on the page.
Each theme usually consists of a center word and a tag word. Taking the theme "CS: download" above as an example, the center word is "CS" and the tag word is "download".
After the search term is input, the corresponding theme is obtained and read according to the theme center word and/or tag word corresponding to the search term. For example, for the input search term "anti-terrorism", the corresponding theme center word is "CS", and the related themes are then obtained from that center word, such as "CS: download", "CS: standalone version", "CS: game", and "CS: official website", and these themes are read. The input search term can also be extended so that the search results are more focused; for example, extending the search term "anti-terrorism" to "anti-terrorism standalone version" may yield the single theme "CS: standalone version".
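The theme lookup described above can be sketched as follows. This is only a minimal illustration: the lookup tables, the substring matching, and the function name `find_themes` are assumptions made for the example and are not specified by the patent.

```python
# Hypothetical lookup tables; the patent does not specify how they are stored.
TERM_TO_CENTER = {"anti-terrorism": "CS"}
THEMES = [
    ("CS", "download"),
    ("CS", "standalone version"),
    ("CS", "game"),
    ("CS", "official website"),
]

def find_themes(search_term):
    """Return themes whose center word (and tag word, if present) match the term."""
    # Map words found in the search term to theme center words.
    centers = {c for word, c in TERM_TO_CENTER.items() if word in search_term}
    # Tag words that literally occur in the (possibly extended) search term.
    tags = {tag for _, tag in THEMES if tag in search_term}
    matches = [t for t in THEMES if t[0] in centers]
    if tags:
        # An extended search term narrows the result to matching tag words.
        matches = [t for t in matches if t[1] in tags]
    return [f"{center}: {tag}" for center, tag in matches]

print(find_themes("anti-terrorism"))
print(find_themes("anti-terrorism standalone version"))
```

With these sample tables, the bare term returns all four "CS" themes, while the extended term narrows the result to the single theme "CS: standalone version", mirroring the example in the text.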
After the theme is retrieved, a problem list comprising the theme is retrieved from the problem index database based on the theme. A problem generally includes related information such as a title, content, answers, number of favorable ratings, submission time, and supplementary questions, and the problem index database is usually a problem database indexed by problem title. In the present embodiment, the retrieval in the problem index database is performed according to the theme that was read. In general, the retrieval can be based on the relevance between the theme and the problem's title, content, and/or supplementary questions. Preferably, it is based on the relevance between the theme and the problem title.
The problem list mainly lists the title of each problem; the other information of a problem can be obtained directly from the problem index database using the title as the index. Still taking the theme "CS: download" as an example, retrieval yields the following problem list comprising the theme:
How long does CS OL take to download
CS download
CS click-to-download
CS 1.6 download
CS online download
CS 1.8 download
CS official download
In the present embodiment, the problem list is represented by titles. In other embodiments, the problem list may also be represented by content, supplementary questions, answers, or the like.
Step S102: calculate a relevance value for each problem in the problem list, sort the problems based on the relevance values, set a threshold, and return the problems whose relevance values are greater than the threshold.
Specifically, several problems are mounted under the same theme, but different problems differ in their relevance to the theme, and users usually want the problems most relevant to the theme. It is therefore necessary to determine the degree of relevance between each problem and the corresponding theme and to sort the problems by relevance value, in order to decide whether each problem is displayed on the page and at which position.
In the present embodiment, the relevance value represents the relevance between the title of each problem and the corresponding theme; in other embodiments, it may also represent the relevance between other information of each problem and the corresponding theme. Before the relevance values are calculated, the theme and the title of each problem are segmented into words. The same segmentation method can be used for both; it may be a string-matching-based method, an understanding-based method, and/or a statistics-based method, and is not limited here. After the title of a problem is segmented, any word of the title (that is, a term) that also appears in the theme is recorded as a hit, and a weight is computed for each such word using the TF-IDF (term frequency-inverse document frequency) algorithm, or using only the IDF (inverse document frequency) algorithm. The relevance value ($S_{rel}$) can be calculated using the following formula:
[formula image not available in this text]
To make the calculation of the relevance value clearer, the theme "CS: download" above and the problems mounted under it are again taken as an example. After segmentation, the word structure of the theme is:
Anti-terrorism/elite/download
After segmentation, the word structures of the titles of the problems mounted under the theme are:
Anti-terrorism/elite/OL/want/download/how long
Anti-terrorism/elite/download
Anti-terrorism/elite/click/download
Anti-terrorism/elite/1.6/download
Anti-terrorism/elite/online/download
Anti-terrorism/elite/1.7/download
Anti-terrorism/elite/official/download
Take the first problem above, "anti-terrorism/elite/OL/want/download/how long", as an example. Because the words "anti-terrorism", "elite", and "download" in this problem also appear in the theme "anti-terrorism/elite/download", these three words are recorded as hits. The IDF algorithm is applied to the three hit words to obtain their IDF values in the problem, denoted $idf_1$, $idf_2$, $idf_3$; correspondingly, the IDF values of the words "anti-terrorism", "elite", and "download" in the theme are denoted $idf_4$, $idf_5$, $idf_6$. The relevance value ($S_1$) of the first problem with respect to the theme is then:
[formula image not available in this text]
The relevance values of the other problems can be calculated in the same way. The relevance value lies in the interval from 0 to 1.
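Because the relevance formula itself is not reproduced in this text, the following is only a sketch of one plausible IDF-based score, not the formula of the patent. It assumes that the score is the share of the title's IDF mass carried by hit words multiplied by the share of the theme's IDF mass carried by hit words, which keeps the result between 0 and 1. The function name `relevance` and the sample IDF values are hypothetical.

```python
def relevance(title_tokens, theme_tokens, idf, default_idf=1.0):
    """Assumed score: hit-word IDF share in the title times hit-word IDF share in the theme."""
    title_set, theme_set = set(title_tokens), set(theme_tokens)
    hits = title_set & theme_set  # words of the title that also appear in the theme

    def mass(tokens):
        return sum(idf.get(t, default_idf) for t in tokens)

    title_mass, theme_mass = mass(title_set), mass(theme_set)
    if title_mass == 0 or theme_mass == 0:
        return 0.0
    return (mass(hits) / title_mass) * (mass(hits) / theme_mass)

# Hypothetical IDF values, chosen only for illustration.
idf = {"anti-terrorism": 2.0, "elite": 2.0, "download": 1.0,
       "OL": 3.0, "want": 0.5, "how long": 1.5}
theme = ["anti-terrorism", "elite", "download"]
title = ["anti-terrorism", "elite", "OL", "want", "download", "how long"]
print(relevance(title, theme, idf))
```

Under this assumed score, a title that consists only of the theme's words scores 1, and extra words in the title lower the score in proportion to their IDF weight.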
After the relevance values of the problems mounted under the theme have been calculated, the problems are sorted according to a chosen parameter. Preferably, they are sorted in descending order of relevance value.
Further, a threshold is set for the relevance value, the relevance value of each problem is compared with the threshold, and the problems whose relevance values are greater than the threshold are returned.
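The sorting and threshold step can be sketched as below. The function name `filter_by_relevance`, the (title, relevance) pair layout, and the sample values are assumptions for illustration; the patent does not fix a threshold value.

```python
def filter_by_relevance(scored, threshold):
    """Sort (title, relevance) pairs in descending order and keep those above the threshold."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    # "Greater than" is strict: a problem exactly at the threshold is not returned.
    return [(title, value) for title, value in ranked if value > threshold]

# Hypothetical relevance values.
scored = [("CS online download", 0.5), ("CS download", 1.0), ("CS OL", 0.3)]
print(filter_by_relevance(scored, threshold=0.4))
```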
Step S103: calculate a composite value for each returned problem, sort the returned problems based on the composite values, and store a specific number of the returned problems, taken in order, in a result data file.
Specifically, to further determine which problems should be displayed on the limited page, other parameters of the returned problems also need to be determined, such as problem quality and problem readability. In the present embodiment, the composite value of each problem is determined from parameters of the returned problem such as its relevance value ($S_{rel}$), quality value ($S_{quality}$), and readability value ($S_{format}$). The calculation proceeds as follows. First, the relevance value of the problem is calculated as described above, giving:
[formula image not available in this text]
Second, the quality value of the problem is calculated. In this step, the number of favorable ratings g of the problem is first obtained from a problem metadata database, and the following formula is then applied:
$S_{quality} = \log_2(g + 1.0) / 10$
The data stored in the problem metadata database includes, but is not limited to, attribute values of a problem such as its number of favorable ratings and submission date. The number of favorable ratings g of a problem is obtained by summing the favorable ratings of all answers under the problem, and is usually displayed next to the title of the problem. For example, for a problem titled "How is this car", the answers are "Very good", "Average", and "Good performance, good-looking exterior", with 0, 0, and 3 favorable ratings respectively; the number of favorable ratings of the problem is the sum of the three, namely 3. The number of favorable ratings is usually updated periodically.
After the quality value ($S_{quality}$) is obtained, it is compared with 1: if $S_{quality}$ is greater than 1, it is set to 1; otherwise its value is kept unchanged.
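The quality value, including the cap at 1, can be sketched as follows. The function name `quality_value` and the list-of-counts input are assumptions for the example; the formula and the cap are the ones stated above.

```python
import math

def quality_value(favorable_counts):
    """Quality value from the favorable-rating counts of all answers to one problem."""
    g = sum(favorable_counts)            # favorable ratings summed over all answers
    value = math.log2(g + 1.0) / 10      # S_quality = log2(g + 1.0) / 10
    return min(value, 1.0)               # values above 1 are set to 1

# The "How is this car" example: three answers with 0, 0 and 3 favorable ratings.
print(quality_value([0, 0, 3]))
```

For the example in the text (g = 3) the value is log2(4) / 10 = 0.2. The cap only takes effect for problems with more than 1023 favorable ratings.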
Then, the readability value ($S_{format}$) of the problem is calculated. The readability value measures how readable the problem is relative to the theme, and is calculated as follows:
(1) Calculate the remaining-length value (tit_len_left) of the problem relative to the theme, using the formula:
$\mathrm{tit\_len\_left} = \mathrm{tit\_len} - \mathrm{topic\_len}$
where tit_len is the character length of the problem title and topic_len is the character length of the corresponding theme. If the resulting value is less than or equal to 10, tit_len_left is set to 2; if it is greater than 10 and less than 20, tit_len_left is set to 1; otherwise it is set to 0.
(2) Calculate the center value (cent_at_head) of the problem relative to the theme: if the start position at which the theme's center word occurs in the title of the problem is less than or equal to 4 (equivalent to the length of two Chinese characters), cent_at_head is set to 1; otherwise it is set to 0.
(3) Calculate the tag value (cent_tag_gap) of the problem relative to the theme: if the distance between the positions at which the theme's center word and tag word occur is less than 4, cent_tag_gap is set to 2; if the distance is greater than 4 and less than 8, cent_tag_gap is set to 1; otherwise it is set to 0.
After the remaining-length value, center value, and tag value are obtained, the readability value ($S_{format}$) of the problem is:
$S_{format} = (\mathrm{tit\_len\_left} + \mathrm{cent\_at\_head} + \mathrm{cent\_tag\_gap}) / 5.0$
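The three readability sub-scores can be sketched as below. Several points are assumptions rather than statements of the patent: lengths and positions are measured in GBK bytes (suggested by the remark that 4 corresponds to two Chinese characters), the distance is taken between the start positions of the center word and the tag word, and a word missing from the title contributes 0. The helper names `width` and `position` are hypothetical.

```python
def width(text):
    # Assumption: lengths are counted in GBK bytes (2 per Chinese character).
    return len(text.encode("gbk"))

def position(title, word):
    """Start offset of word in title, in GBK bytes, or None if absent."""
    idx = title.find(word)
    return width(title[:idx]) if idx >= 0 else None

def readability_value(title, theme, center_word, tag_word):
    left = width(title) - width(theme)               # tit_len - topic_len
    tit_len_left = 2 if left <= 10 else 1 if left < 20 else 0

    center_pos = position(title, center_word)
    tag_pos = position(title, tag_word)
    cent_at_head = 1 if center_pos is not None and center_pos <= 4 else 0

    cent_tag_gap = 0
    if center_pos is not None and tag_pos is not None:
        gap = abs(tag_pos - center_pos)              # assumed distance definition
        if gap < 4:
            cent_tag_gap = 2
        elif 4 < gap < 8:
            cent_tag_gap = 1                         # a gap of exactly 4 scores 0, as in the text
    return (tit_len_left + cent_at_head + cent_tag_gap) / 5.0

print(readability_value("CS download", "CS: download", "CS", "download"))
```

The maximum of the three sub-scores is 2 + 1 + 2 = 5, so dividing by 5.0 keeps the readability value between 0 and 1.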
Finally, the composite value of the problem is calculated using the following formula:
$S_{final} = S_{rel} \cdot W_{rel} + S_{quality} \cdot W_{quality} + S_{format} \cdot W_{format}$
where $W_{rel}$, $W_{quality}$, and $W_{format}$ are preset weights representing the importance of the three features relevance, quality, and readability. Preferably, they are set to 0.8, 0.1, and 0.1 respectively.
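The weighted sum can be written directly in code. The function name `composite_value` is hypothetical, the default weights are the preferred values 0.8, 0.1, and 0.1 given above, and the sample inputs are illustrative.

```python
def composite_value(s_rel, s_quality, s_format,
                    w_rel=0.8, w_quality=0.1, w_format=0.1):
    """S_final = S_rel * W_rel + S_quality * W_quality + S_format * W_format."""
    return s_rel * w_rel + s_quality * w_quality + s_format * w_format

# Hypothetical inputs: relevance 0.5, quality 0.2, readability 1.0.
print(composite_value(0.5, 0.2, 1.0))
```

Because the preferred weights sum to 1 and each component lies between 0 and 1, the composite value also lies between 0 and 1.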
The returned problems are sorted by the composite value $S_{final}$; preferably, they are arranged in descending order and a specific number of them are stored in the result data file. For example, the top 5 problems after sorting are stored in the result data file as the final result. Preferably, the selected problems are deduplicated to ensure that the stored results contain no repeats. The information stored in the result data file includes, but is not limited to, the title, content, number of favorable ratings, and/or submission date of the problems related to each theme. Alternatively, each problem can be assigned an identifier and only the identifier stored in the result data file; when the problems need to be displayed on the page, the relevant front-end module uses the identifier to fetch the title, content, answers, number of favorable ratings, submission date, and other information of the corresponding problem.
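A minimal sketch of the final selection and storage step follows. The patent does not define the format of the result data file or the deduplication criterion, so the JSON layout, deduplication by title (applied while selecting), and the names `select_top` and `save_results` are all assumptions.

```python
import json

def select_top(problems, limit=5):
    """Sort by composite value (descending), drop duplicate titles, keep the first `limit`."""
    ranked = sorted(problems, key=lambda p: p["composite"], reverse=True)
    seen, selected = set(), []
    for problem in ranked:
        if problem["title"] in seen:      # assumed criterion: same title means duplicate
            continue
        seen.add(problem["title"])
        selected.append(problem)
        if len(selected) == limit:
            break
    return selected

def save_results(theme, problems, path):
    """Write the selected problems for one theme to a JSON result file (assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"theme": theme, "problems": problems}, f, ensure_ascii=False)
```

Storing only identifiers, as the text also allows, would amount to writing a list of problem ids instead of the full records and letting the front end resolve them.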
Compared with the prior art, the method for mounting problems to themes provided by the present invention has the following advantage: the problems mounted under a theme are evaluated and analyzed based on parameters such as the relevance between problem and theme, the quality of the problem, and the readability of the problem, so that the most suitable mounted problems are obtained for the theme, improving the user's experience when browsing extended problems.
According to another aspect of the present invention, a system for mounting problems to themes is also provided. Referring to Fig. 3, Fig. 3 is a schematic block diagram of a system for mounting problems to themes according to another preferred embodiment of the present invention. As shown in Fig. 3, the system comprises:
a receiving module 301 for receiving an input search term;
a retrieval module 302 for retrieving the corresponding theme based on the search term and retrieving from a problem index database, according to the theme, a problem list comprising the theme;
a calculation device 303 for calculating the relevance values and composite values of the problems in the problem list;
a sorting module 304 for sorting the problems based on the relevance values or the composite values;
a returning module 305 for setting a threshold and returning the problems whose relevance values are greater than the threshold;
a data storage module 306 for storing a specific number of the returned problems, taken in order, in a result data file.
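How the modules listed above might cooperate can be sketched as a small pipeline. The class `MountingSystem`, its field names, and the default threshold and limit are hypothetical; each module is modelled as a plain callable so that the sketch stays independent of any concrete implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MountingSystem:
    retrieve: Callable[[str], List[str]]        # retrieval module 302: term -> problem titles
    relevance: Callable[[str], float]           # calculation device 303: relevance value
    composite: Callable[[str, float], float]    # calculation device 303: composite value
    store: Callable[[List[str]], None]          # data storage module 306
    threshold: float = 0.5                      # assumed value; the patent fixes none
    limit: int = 5

    def run(self, search_term: str) -> List[str]:
        # Receiving module 301: accept the search term.
        titles = self.retrieve(search_term)
        # Sorting module 304: descending order of relevance.
        scored = sorted(((t, self.relevance(t)) for t in titles),
                        key=lambda pair: pair[1], reverse=True)
        # Returning module 305: keep problems strictly above the threshold.
        returned = [(t, s) for t, s in scored if s > self.threshold]
        # Sorting module 304 again, now on the composite value.
        ranked = sorted(returned, key=lambda pair: self.composite(*pair), reverse=True)
        top = [t for t, _ in ranked[: self.limit]]
        self.store(top)
        return top
```

Passing the modules in as callables is a design choice of this sketch: it lets the retrieval backend or the scoring functions be replaced without changing the pipeline.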
The working process of each of the above devices and modules is described in detail below.
The search term received by the receiving module may be any character, word, phrase, or sentence composed of letters, Chinese characters, and/or punctuation marks, and is used to search for a theme under which at least one problem is mounted.
Usually, several similar problems correspond to the same theme; in other words, at least one related problem is usually mounted under a theme. Referring to Fig. 2, Fig. 2 is a schematic diagram of a page on which problems are mounted to a theme according to a preferred embodiment of the present invention. As shown in Fig. 2, the problems mounted under the theme "CS: download" are: "How long does CS OL take to download", "CS download", "CS click-to-download", "CS 1.6 download", and "Looking for a CS standalone version download!". These problems are sorted according to certain evaluation criteria, such as relevance and the number of favorable user ratings, and displayed on the page.
Each theme usually consists of a center word and a tag word. Taking the theme "CS: download" above as an example, the center word is "CS" and the tag word is "download".
After the receiving module receives the search term, the retrieval module searches according to the theme center word and/or tag word corresponding to the search term, and obtains and reads the corresponding theme. For example, for the input search term "anti-terrorism", the corresponding theme center word is "CS", and the related themes are then obtained from that center word, such as "CS: download", "CS: standalone version", "CS: game", and "CS: official website", and these themes are read. The input search term can also be extended so that the search results are more focused; for example, extending the search term "anti-terrorism" to "anti-terrorism standalone version" may yield the single theme "CS: standalone version".
Further, the retrieval module retrieves from the problem index database, according to the theme that was read, a problem list comprising the theme. In general, the retrieval module can retrieve based on the relevance between the theme and the problem's title, content, and/or supplementary questions. Preferably, the retrieval module retrieves based on the relevance between the theme and the problem title.
The problem list mainly lists the title of each problem; the other information of a problem can be obtained directly from the problem index database using the title as the index. Still taking the theme "CS: download" as an example, retrieval yields the following problem list comprising the theme:
How long does CS OL take to download
CS download
CS click-to-download
CS 1.6 download
CS online download
CS 1.8 download
CS official download
In the present embodiment, the problem list is represented by titles. In other embodiments, the problem list may also be represented by content, supplementary questions, answers, or the like.
Further, to meet users' need for problems that are more relevant to the theme or of higher quality, the calculation device calculates the relevance value and composite value of each problem in the problem list. Usually, the relevance value of a problem determines whether it is displayed on the page and at which position. In the present embodiment, the relevance value represents the relevance between the title of each problem and the corresponding theme; in other embodiments, it may also represent the relevance between other information of each problem and the corresponding theme.
Further, the calculation device also comprises a word segmentation module.
Before the relevance values are calculated, the word segmentation module segments the theme and the title of each problem into words. The same segmentation method can be used for both; it may be a string-matching-based method, an understanding-based method, and/or a statistics-based method, and is not limited here. After the title of a problem is segmented, any word of the title (that is, a term) that also appears in the theme is recorded as a hit, and a weight is computed for each such word using the TF-IDF (term frequency-inverse document frequency) algorithm, or using only the IDF (inverse document frequency) algorithm.
The relevance value ($S_{rel}$) can be calculated using the following formula (1):
[formula image not available in this text]
To make the calculation of the relevance value clearer, the theme "CS: download" above and the problems mounted under it are again taken as an example. After segmentation, the word structure of the theme is:
Anti-terrorism/elite/download
After segmentation, the word structures of the titles of the problems mounted under the theme are:
Anti-terrorism/elite/OL/want/download/how long
Anti-terrorism/elite/download
Anti-terrorism/elite/click/download
Anti-terrorism/elite/1.6/download
Anti-terrorism/elite/online/download
Anti-terrorism/elite/1.7/download
Anti-terrorism/elite/official/download
Take the first problem above, "anti-terrorism/elite/OL/want/download/how long", as an example. Because the words "anti-terrorism", "elite", and "download" in this problem also appear in the theme "anti-terrorism/elite/download", these three words are recorded as hits. The IDF algorithm is applied to the three hit words to obtain their IDF values in the problem, denoted $idf_1$, $idf_2$, $idf_3$; correspondingly, the IDF values of the words "anti-terrorism", "elite", and "download" in the theme are denoted $idf_4$, $idf_5$, $idf_6$. The relevance value ($S_1$) of the first problem with respect to the theme is then:
[formula image not available in this text]
The relevance values of the other problems can be calculated in the same way. The relevance value lies in the interval from 0 to 1.
Then, the sorting module sorts the problems based on the relevance values. Preferably, they are sorted in descending order of relevance value.
Further, the returning module sets a threshold and returns the problems whose relevance values are greater than the threshold.
Because the relevance value measures only one parameter of a problem, other parameters of the problem also need to be determined, such as problem quality and problem readability. The calculation device therefore goes on to determine the composite value of each returned problem from parameters such as its relevance value ($S_{rel}$), quality value ($S_{quality}$), and readability value ($S_{format}$). The calculation proceeds as follows:
First, the relevance value of the problem is calculated as described above, giving:
[formula image not available in this text]
Second, the quality value of the problem is calculated. In this step, the number of favorable ratings g of the problem is first obtained from a problem metadata database, and the following formula is then applied:
$S_{quality} = \log_2(g + 1.0) / 10$
The data stored in the problem metadata database includes, but is not limited to, attribute values of a problem such as its number of favorable ratings and submission date. The number of favorable ratings g of a problem is obtained by summing the favorable ratings of all answers under the problem, and is usually displayed next to the title of the problem. For example, for a problem titled "How is this car", the answers are "Very good", "Average", and "Good performance, good-looking exterior", with 0, 0, and 3 favorable ratings respectively; the number of favorable ratings of the problem is the sum of the three, namely 3. The number of favorable ratings is usually updated periodically.
After the quality value ($S_{quality}$) is obtained, it is compared with 1: if $S_{quality}$ is greater than 1, it is set to 1; otherwise its value is kept unchanged.
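As a worked instance of the quality formula, take the example above, in which the problem has g = 3 favorable ratings:

```latex
S_{quality} = \frac{\log_2(3 + 1.0)}{10} = \frac{2}{10} = 0.2 \le 1
```

Since 0.2 does not exceed 1, the value is kept unchanged.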
Then, the readability value ($S_{format}$) of the problem is calculated. The readability value measures how readable a particular problem is with respect to the theme. The calculation proceeds as follows:
(1) Calculate the surplus value (tit_len_left) of the particular problem with respect to the theme, using the formula:
tit_len_left = tit_len - topic_len
Here, tit_len is the character length of the problem title and topic_len is the character length of the corresponding theme. If the resulting surplus value is less than or equal to 10, tit_len_left is set to 2; if it is greater than 10 and less than 20, tit_len_left is set to 1; otherwise, it is set to 0.
(2) Calculate the center value (cent_at_head) of the particular problem with respect to the theme. If the starting position at which the center word of the theme occurs in the title of the particular problem is less than or equal to 4 (equivalent to the length of two Chinese characters), cent_at_head is set to 1; otherwise, it is set to 0.
(3) Calculate the label value (cent_tag_gap) of the particular problem with respect to the theme. If the distance between the positions at which the center word of the theme and the label word (tag word) occur is less than 4, cent_tag_gap is set to 2; if this distance is greater than 4 and less than 8, cent_tag_gap is set to 1; otherwise, it is set to 0.
After the surplus value, center value, and label value have been obtained, the readability value ($S_{format}$) of the problem is:
$$S_{format} = (tit\_len\_left + cent\_at\_head + cent\_tag\_gap) / 5.0$$
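The three readability components can be illustrated with the following Python sketch. It assumes that the lengths, the starting position of the center word, and the center-to-tag distance have already been measured by the caller in the same units the text uses; the function name `readability_score` and its parameter names are hypothetical. The thresholds are implemented literally as stated, so a distance of exactly 4 falls into the "otherwise" branch.

```python
def readability_score(tit_len, topic_len, center_pos, center_tag_distance):
    """Compute S_format from pre-measured lengths and positions.

    All parameter names are illustrative; only the thresholds and the
    final division by 5.0 are taken from the text.
    """
    # (1) Surplus value: how much longer the title is than the theme.
    surplus = tit_len - topic_len
    if surplus <= 10:
        tit_len_left = 2
    elif surplus < 20:
        tit_len_left = 1
    else:
        tit_len_left = 0

    # (2) Center value: does the theme's center word start near the head of the title?
    cent_at_head = 1 if center_pos <= 4 else 0

    # (3) Label value: how close are the center word and the tag word?
    # As written in the text, a distance of exactly 4 matches neither range.
    if center_tag_distance < 4:
        cent_tag_gap = 2
    elif 4 < center_tag_distance < 8:
        cent_tag_gap = 1
    else:
        cent_tag_gap = 0

    return (tit_len_left + cent_at_head + cent_tag_gap) / 5.0


# A short title whose center word is at the start and close to the tag word.
print(readability_score(tit_len=20, topic_len=12, center_pos=0, center_tag_distance=2))
```

Because the three components sum to at most 5, the result always lies between 0 and 1.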
Finally, the integrated value of the problem is calculated with the following formula:
$$S_{final} = S_{rel} \cdot W_{rel} + S_{quality} \cdot W_{quality} + S_{format} \cdot W_{format}$$
Here, $W_{rel}$, $W_{quality}$, and $W_{format}$ are preset weights that represent the importance of the three features relevance, quality, and readability, respectively. Preferably, they are set to 0.8, 0.1, and 0.1, respectively.
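As a small worked illustration of the weighted sum, the following Python sketch combines the three scores using the preferred weights 0.8, 0.1, and 0.1 as defaults. The function name `final_score` and the sample input values are hypothetical.

```python
def final_score(s_rel, s_quality, s_format,
                w_rel=0.8, w_quality=0.1, w_format=0.1):
    """Weighted sum of relevance, quality and readability.

    The default weights are the preferred values given in the text;
    the function name is an illustrative assumption.
    """
    return s_rel * w_rel + s_quality * w_quality + s_format * w_format


# Hypothetical scores: relevance 0.5, quality 0.2, readability 1.0.
# 0.5 * 0.8 + 0.2 * 0.1 + 1.0 * 0.1 = 0.52 (up to floating-point rounding)
print(final_score(0.5, 0.2, 1.0))
```

With these weights, relevance dominates the ranking, while quality and readability mainly separate problems of similar relevance.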
After the calculation device has calculated the above values, the sorting module further sorts the returned problems according to the integrated value $S_{final}$, preferably in descending order.
Further, the data storage module takes a specific number of the returned problems in order and stores them in the result data file. For example, the top 5 problems in the ranking are stored in the result data file as the final result. Preferably, the selected problems are deduplicated to ensure that the stored results contain no repeats. The information stored in the result data file includes, but is not limited to, the title, content, favorable-rating count, and/or submission date of the relevant problems corresponding to each theme. Alternatively, each problem may be assigned an identifier, and the data storage module stores the corresponding identifier in the result data file; when the problem needs to be displayed on a page, the relevant front-end module uses the identifier to obtain the title, content, answers, favorable-rating count, submission date, and other information of the corresponding problem.
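The sorting, deduplication, and storage step might look like the following Python sketch. Several details are assumptions made only for illustration: problems are represented as dictionaries with `title` and `s_final` keys, duplicates are detected by identical titles, and the result data file is written as JSON. The text specifies none of these. The sketch also deduplicates while selecting, so that up to `limit` distinct problems are kept, which is one possible reading of the step.

```python
import json


def select_top_problems(problems, limit=5):
    """Sort by integrated value (descending) and keep the first `limit` distinct titles.

    The dictionary keys "title" and "s_final" are hypothetical field names.
    """
    ranked = sorted(problems, key=lambda p: p["s_final"], reverse=True)
    seen_titles = set()
    selected = []
    for problem in ranked:
        if problem["title"] in seen_titles:
            continue  # skip duplicates so stored results do not repeat
        seen_titles.add(problem["title"])
        selected.append(problem)
        if len(selected) == limit:
            break
    return selected


def save_results(theme, problems, path):
    """Write the selected problems for one theme to a JSON file (assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"theme": theme, "problems": problems}, f,
                  ensure_ascii=False, indent=2)


candidates = [
    {"title": "How is this car?", "s_final": 0.52},
    {"title": "Is this car reliable?", "s_final": 0.71},
    {"title": "How is this car?", "s_final": 0.40},
]
print([p["title"] for p in select_top_problems(candidates, limit=5)])
```

Storing only an identifier per problem, as the text also allows, would amount to writing `[p["id"] for p in selected]` instead of the full records, with `id` again being a hypothetical field.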
Compared with the prior art, the system for mounting problems to themes provided by the present invention has the following advantage: through the cooperation of the devices and modules of the system, the most suitable mounted problems and their corresponding answers (taking into account factors such as relevance and problem quality) can be obtained for a theme, improving the user's experience when browsing extended problems.
The above discloses only preferred embodiments of the present invention and certainly cannot be used to limit the scope of rights of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (10)

1. A method for mounting problems to a theme, the method comprising the following steps:
receiving an input search term, retrieving a corresponding theme based on the search term, and retrieving, from a problem index database according to the theme, a problem list containing the theme; calculating a relevance value for each problem in the problem list and sorting the problems based on the relevance values; setting a threshold and returning the problems whose relevance values are greater than the threshold;
calculating an integrated value for each returned problem, sorting the returned problems based on the integrated values, and storing a specific number of the returned problems, taken in order, in a result data file.
2. The method according to claim 1, wherein the problem index database is a problem database built using the titles of the problems as the index.
3. The method according to claim 1 or 2, wherein calculating the relevance value of each problem further comprises: performing word segmentation on the theme and on the title of each problem corresponding to the theme.
4. The method according to claim 3, further comprising: applying an IDF algorithm to compute statistics on the segmented words.
5. The method according to claim 1 or 2, wherein the integrated value of a problem comprises the relevance value of the problem, the problem quality value, and the readability value of the problem.
6. A system for mounting problems to a theme, comprising:
a receiving module for receiving an input search term;
a retrieval module for retrieving a corresponding theme based on the search term and retrieving, from a problem index database according to the theme, a problem list containing the theme;
a calculation device for calculating the relevance values and integrated values of the corresponding problems in the problem list;
a sorting module for sorting the corresponding problems based on the relevance values or the integrated values;
a return module for setting a threshold and returning the problems whose relevance values are greater than the threshold;
a data storage module for storing a specific number of the returned problems, taken in order, in a result data file.
7. The system according to claim 6, wherein the problem index database is a problem database built using the titles of the problems as the index.
8. The system according to claim 5 or 6, wherein the calculation device further comprises a word segmentation module for performing word segmentation on the theme and on the problem titles corresponding to the theme.
9. The system according to claim 8, wherein the calculation device applies an IDF algorithm to compute statistics on the segmented words.
10. The system according to claim 5 or 6, wherein the integrated value of a problem comprises the relevance value of the problem, the problem quality value, and the readability value of the problem.
CN201310110075.0A 2013-03-30 2013-03-30 Method and system for mounting problems to themes Active CN104077330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310110075.0A CN104077330B (en) 2013-03-30 2013-03-30 Method and system of the carry problem to theme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310110075.0A CN104077330B (en) 2013-03-30 2013-03-30 Method and system of the carry problem to theme

Publications (2)

Publication Number Publication Date
CN104077330A true CN104077330A (en) 2014-10-01
CN104077330B CN104077330B (en) 2019-05-07

Family

ID=51598589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310110075.0A Active CN104077330B (en) 2013-03-30 2013-03-30 Method and system for mounting problems to themes

Country Status (1)

Country Link
CN (1) CN104077330B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178718A (en) * 2007-05-17 2008-05-14 腾讯科技(深圳)有限公司 Knowledge sharing system, problem searching method and problem publish method
CN101286161A (en) * 2008-05-28 2008-10-15 华中科技大学 Intelligent Chinese request-answering system based on concept
CN101441660A (en) * 2008-12-16 2009-05-27 腾讯科技(深圳)有限公司 Knowledge evaluating system and method in inquiry and answer community
CN102789466A (en) * 2011-05-19 2012-11-21 百度在线网络技术(北京)有限公司 Question title quality judgment method and device and question guiding method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHARLOTTE VAN HOOIJDONK ET AL: ""On the Role of Visuals in Multimodal Answers to Medical Questions"", 《IEEE》 *
王君泽: ""基于大规模问答语料的问题检索系统"", 《万方》 *
王春秀: ""基于问答式检索技术的代理式数字参考咨询系统研究"", 《图书情报工作》 *

Also Published As

Publication number Publication date
CN104077330B (en) 2019-05-07

Similar Documents

Publication Publication Date Title
CN106649818B (en) Application search intention identification method and device, application search method and server
CN106709040B (en) Application search method and server
CN105224699B (en) News recommendation method and device
CN110059271B (en) Searching method and device applying tag knowledge network
CN107704503A (en) User's keyword extracting device, method and computer-readable recording medium
CN105653562B (en) The calculation method and device of correlation between a kind of content of text and inquiry request
CN106815297A (en) A kind of academic resources recommendation service system and method
US20060212441A1 (en) Full text query and search systems and methods of use
CN104199833B (en) The clustering method and clustering apparatus of a kind of network search words
CN106970991B (en) Similar application identification method and device, application search recommendation method and server
CN103678576A (en) Full-text retrieval system based on dynamic semantic analysis
CN105930469A (en) Hadoop-based individualized tourism recommendation system and method
CN103310343A (en) Commodity information issuing method and device
CN108572971B (en) Method and device for mining keywords related to search terms
CN105550359B (en) Webpage sorting method and device based on vertical search and server
CN111444304A (en) Search ranking method and device
CN111753167A (en) Search processing method, search processing device, computer equipment and medium
CN105653547A (en) Method and device for extracting keywords of text
WO2007011129A1 (en) Information search method and information search apparatus on which information value is reflected
CN110674365A (en) Searching method, device, equipment and storage medium
CN112989824A (en) Information pushing method and device, electronic equipment and storage medium
CN113204953A (en) Text matching method and device based on semantic recognition and device readable storage medium
CN109522275B (en) Label mining method based on user production content, electronic device and storage medium
CN111209480A (en) Method and device for determining pushed text, computer equipment and medium
CN104077288B (en) Web page contents recommend method and web page contents recommendation apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant