CN101576928A - Method and device for selecting related article - Google Patents

Publication number
CN101576928A
Authority
CN
China
Prior art keywords
article
word
recommended
core word
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA200910107940XA
Other languages
Chinese (zh)
Inventor
付恒
赵琳霖
耿方圆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CNA200910107940XA priority Critical patent/CN101576928A/en
Publication of CN101576928A publication Critical patent/CN101576928A/en
Pending legal-status Critical Current

Abstract

The invention relates to a method and a device for selecting related articles. The method comprises the following steps: S1, establishing an index database of articles to be recommended; S2, analyzing a given article to determine the key words of the given article; and S3, selecting the related articles of the given article from the index database according to the key words. The device comprises the index database, a key word determining module and a selecting module. The index database stores the articles to be recommended; the key word determining module analyzes the given article to determine its key words; and the selecting module selects the related articles of the given article from the index database according to the key words. By analyzing the given article to determine its key words, the invention selects the related articles of the given article without manual operation, which simplifies the operating process, provides related articles in a targeted way to the different readers of different articles, and meets readers' individual needs.

Description

Method and device for selecting related articles
Technical field
The present invention relates to the technical field of computer networks and, more particularly, to a method and a device for selecting related articles in a network.
Background art
With the development of Internet technology, the network has gradually become an important source from which people obtain information. However, as the amount of information on the Internet grows geometrically, users are confronted with a vast and complicated variety of information. To provide users quickly and in a targeted way with the information that interests them, portal websites offer recommended articles whenever a user reads portal news, blogs or posts.
These recommended articles are generally determined by manual curation: editorial staff read a large number of articles and then manually designate certain related articles as the recommended articles for the readers of a given article. The drawback of providing recommended articles in this way is that it requires a great deal of manual work, so the labor cost is very high.
The recommended articles may also be fixed recommendations: one or more groups of recommended articles are determined by manual curation, and then the single group is offered to the readers of every article on the website, or the several groups are offered to all readers at random. The drawback of providing recommended articles in this way is that the recommended content is not flexible enough: different content cannot be recommended to the readers of each article, so readers' individual needs cannot be met.
In summary, the prior-art schemes for providing recommended articles to readers of network articles either require a great deal of manual work at very high labor cost, or recommend content that is not flexible enough to meet readers' individual needs. A method for selecting related articles that satisfies both requirements at once is therefore needed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and a device for selecting related articles in a network that address the defects of the prior art, which either requires a great deal of manual work at very high labor cost or recommends content that is not flexible enough to meet readers' individual needs.
To achieve the object of the invention, a method for selecting related articles in a network is provided, the method comprising:
S1, establishing an index database of articles to be recommended;
S2, analyzing a given article to determine the core words of the given article;
S3, selecting the related articles of the given article from the index database according to the core words.
Preferably, step S1 further comprises:
S11, preprocessing the articles to be recommended;
S12, performing word segmentation on the preprocessed articles to be recommended;
S13, establishing, according to the word segmentation result, an index from each word to the articles that contain it, and storing the index in the index database.
Preferably, step S11 further comprises:
S111, filtering out the articles to be recommended that do not meet preset rules;
S112, setting weights for the articles to be recommended that meet the preset rules;
S113, deduplicating the articles with identical titles among the articles to be recommended.
Preferably, step S12 further comprises: removing single-character words and stop words from the word segmentation result.
Preferably, step S2 comprises:
S21, performing word segmentation on the given article;
S22, selecting at least one word with the highest word frequency as a core word.
Preferably, step S22 further comprises:
S221, when the number of words obtained by segmenting the article body is less than a certain number, taking all of the words as core words; otherwise, classifying the words whose word frequency in the article body is greater than 1;
S222, in each class, selecting at least one word with the highest word frequency as a body core word, and assigning different weights according to word frequency;
S223, classifying the words in the article title to serve as title core words, and assigning different weights according to word frequency;
S224, deduplicating the body core words and the title core words to obtain the full-text core words and their weights.
Preferably, step S22 further comprises adding the title of the section in which the article is located, together with its weight, and deduplicating the section title against the full-text core words to obtain the core words.
Preferably, step S3 further comprises:
S31, querying the index database, based on the core words, for the articles to be recommended that are related to the given article;
S32, selecting the related articles based on the weights of the core words and the weights of the articles to be recommended.
Preferably, step S32 further comprises:
S321, determining a relevance score for each article to be recommended based on the weights of the core words and the weight of that article, and removing the articles whose relevance score is below a set value;
S322, fitting the scores of the articles whose relevance score is above the set value to a curve by a mathematical method, calculating the first inflection point of the curve, and removing the articles after that inflection point.
To better achieve the object of the invention, a device for selecting related articles is provided, the device comprising:
an index database, used for storing the articles to be recommended;
a core word determination module, used for analyzing a given article to determine the core words of the given article;
a selection module, used for selecting the related articles of the given article from the index database according to the core words.
Preferably, the index database further comprises:
a preprocessing unit, used for preprocessing the articles to be recommended;
a first word segmentation unit, used for performing word segmentation on the preprocessed articles to be recommended;
a construction unit, used for establishing, according to the word segmentation result, an index from each word to the articles that contain it, and storing the index in the index database.
Preferably, the preprocessing comprises: filtering out the articles to be recommended that do not meet preset rules, setting weights for the articles to be recommended that meet the preset rules, and/or deduplicating the articles with identical titles among the articles to be recommended.
Preferably, the core word determination module further comprises:
a second word segmentation unit, used for performing word segmentation on the given article;
a core word selection unit, used for selecting at least one word with the highest word frequency as a core word.
Preferably, the selection module further comprises:
a query unit, used for querying the index database, based on the core words, for the articles to be recommended that are related to the given article;
a selecting unit, used for selecting the related articles based on the weights of the core words and the weights of the articles to be recommended.
By analyzing a given article to determine its core words and thereby selecting the related articles of the given article, the present invention requires no manual operation, simplifies the operating process, and can provide related articles in a targeted way to the different readers of different articles, meeting readers' individual needs.
Further, because the present invention introduces word segmentation and indexing techniques into the relevance calculation, the difficulty of the problem is greatly reduced; and because rules for setting article weights are defined, the effect of the relevance calculation is improved.
Description of the drawings
The invention is further described below with reference to the drawings and embodiments. In the drawings:
Fig. 1 is a flowchart of a first embodiment of the method for selecting related articles of the present invention;
Fig. 2 is a flowchart of a second embodiment of the method for selecting related articles of the present invention;
Fig. 3 is a flowchart of a third embodiment of the method for selecting related articles of the present invention;
Fig. 4 is a flowchart of another embodiment of step S306 of the third embodiment of the method shown in Fig. 3;
Fig. 5 is a schematic block diagram of a first embodiment of the device for selecting related articles of the present invention;
Fig. 6 is a schematic block diagram of a second embodiment of the device for selecting related articles of the present invention.
Detailed description of the embodiments
To make the purpose, technical scheme and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As used in the present invention, the term "related" means that two or more articles express the same or similar topics; "word segmentation" means dividing continuous article text into individual words; "index" means arranging the words of articles in an ordered way so that the articles containing a given word can be found quickly through that word; and "word frequency" means the number of times a word occurs in a whole passage or article.
Fig. 1 is a flowchart of a first embodiment of the method for selecting related articles of the present invention. The detailed process is as follows:
In step S100, an index database of articles to be recommended is established. The index database may contain all the posts, news items and/or blog articles published on a given forum or website. In specific embodiments of the invention, any of the various prior-art methods may be used to arrange the words of the articles in an ordered way, so that the articles containing a given word can be found quickly through that word.
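The index described in step S100 can be illustrated with a small inverted index. The following is only a minimal sketch: the function name `build_index`, the article ids and the pre-segmented word lists are assumptions made for illustration and do not come from the patent, which leaves the indexing method to the prior art.

```python
from collections import defaultdict

def build_index(articles):
    """Map each word to the set of ids of the articles that contain it.

    `articles` is a hypothetical dict: article id -> list of segmented words.
    """
    index = defaultdict(set)
    for article_id, words in articles.items():
        for word in words:
            index[word].add(article_id)
    return index

# Illustrative data only.
articles = {
    "a1": ["online", "game", "release"],
    "a2": ["game", "review"],
}
index = build_index(articles)
print(sorted(index["game"]))
```

Looking up a word in `index` returns every article that contains it, which is the property the query steps later rely on.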
In step S200, the given article is analyzed to determine its core words. A core word is a word that can represent the topic of the article; by determining the core words of an article, its topic, and thus the meaning it intends to express, can be determined simply. In a simplified embodiment of the invention, the core words of the given article may be determined manually. In a preferred embodiment of the invention, the core words may also be determined by computer, for example by taking the words whose word frequency exceeds a set value as core words.
In step S300, the related articles of the given article may be selected from the index database according to the core words. These related articles may be presented as a list while the user reads portal news, blogs or community posts, or may be delivered to the user by other means such as e-mail.
Fig. 2 is a flowchart of a second embodiment of the method for selecting related articles of the present invention. The detailed process is as follows:
In step S201, the articles to be recommended are preprocessed. In a preferred embodiment of the invention, the preprocessing may include filtering out the articles to be recommended that do not meet preset rules. For example, articles with very few words or of low quality may be filtered out so that no further processing is spent on them.
In another preferred embodiment of the invention, the preprocessing step may further include a weight-setting step, that is, setting weights for the articles to be recommended that meet the preset rules. For example, newly published articles, articles containing pictures and articles with many clicks are given a higher weight, while older articles and articles with few clicks are given a lower weight. The weight rules can be adjusted manually as needed. Well-designed weight rules can significantly influence the final recommendation effect, so that the newest and most popular content is displayed more often. The weight rules also provide an interface for manual intervention in the recommendation results, so that the results can be adjusted quickly and conveniently, which makes it easier to respond to sudden demands.
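The weight-setting step can be sketched as a simple rule function. This is an illustration under stated assumptions: the `Article` fields, the thresholds `fresh_days` and `hot_clicks`, and the increments 0.2 and 0.1 are all hypothetical values chosen here; the patent only says that newer, illustrated and frequently clicked articles receive higher weights and that the rules are adjustable.

```python
from dataclasses import dataclass

@dataclass
class Article:
    # Hypothetical fields chosen for this sketch.
    title: str
    age_days: int
    has_picture: bool
    clicks: int

def article_weight(article, fresh_days=7, hot_clicks=1000):
    """Return a weight that grows for new, illustrated and popular articles."""
    weight = 1.0
    if article.age_days <= fresh_days:   # newly published
        weight += 0.2
    if article.has_picture:              # contains pictures
        weight += 0.1
    if article.clicks >= hot_clicks:     # many clicks
        weight += 0.2
    return round(weight, 2)

fresh = Article("New release", age_days=1, has_picture=True, clicks=5000)
stale = Article("Old notice", age_days=100, has_picture=False, clicks=3)
print(article_weight(fresh), article_weight(stale))
```

Keeping the thresholds as parameters mirrors the text's point that the rules can be adjusted manually as needed.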
In yet another preferred embodiment of the invention, the preprocessing step further includes deduplicating the articles with identical titles among the articles to be recommended. The title is an important expression of an article's meaning, so in the relevance calculation articles with identical titles tend to be displayed or pushed to the user together, which harms the reading experience; deduplicating articles with identical titles solves this problem. Deduplication means that, of several articles with the same title, only one is finally kept. The article kept may be the one with the most words, the most recently published one, or the one given the highest weight in the weight-setting step.
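Title-based deduplication can be sketched as follows. The sketch keeps the highest-weight article for each title, which is one of the three retention options named above; the dictionary keys `id`, `title` and `weight` are assumptions made for illustration.

```python
def dedupe_by_title(articles):
    """Keep one article per title: the one with the highest weight."""
    best = {}
    for art in articles:
        kept = best.get(art["title"])
        if kept is None or art["weight"] > kept["weight"]:
            best[art["title"]] = art
    return list(best.values())

# Illustrative data only.
candidates = [
    {"id": "a1", "title": "Patch notes", "weight": 1.0},
    {"id": "a2", "title": "Patch notes", "weight": 1.3},
    {"id": "a3", "title": "Server status", "weight": 1.1},
]
unique = dedupe_by_title(candidates)
print([a["id"] for a in unique])
```

Swapping the comparison for word count or publication time would give the other two retention options.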
In step S202, word segmentation is performed on the preprocessed articles to be recommended. An existing word segmentation module may be used in this step. Because the present invention mainly deals with Chinese articles, it is described below using a Chinese word segmentation module. Those skilled in the art will understand, however, that the idea and teaching of the invention can obviously be applied to English and other languages.
In one embodiment of the invention, an existing Chinese word segmentation module may be used to segment Chinese articles, and the segmentation result is processed as follows:
A. Remove the single-character words from the segmentation result, so that they do not interfere with judging the semantics of the article;
B. Remove the stop words from the segmentation result. The stop words form a vocabulary list of words that occur frequently in Chinese but have little bearing on the meaning an article expresses, such as "if" and "but". Removing them keeps words that carry no real meaning but are common in Chinese from interfering with the relevance calculation.
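Processing steps A and B amount to a filter over the segmentation result. The sketch below assumes the tokens have already been produced by some word segmentation module; the stop-word set and the sample tokens are illustrative assumptions, not a list taken from the patent.

```python
# Hypothetical stop-word list; a real deployment would use a fuller vocabulary.
STOP_WORDS = {"如果", "但是"}

def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Drop single-character tokens (step A) and stop words (step B)."""
    return [t for t in tokens if len(t) > 1 and t not in stop_words]

tokens = ["网络", "游戏", "的", "如果", "发布"]
print(clean_tokens(tokens))
```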
In step S203, an index from each word to the articles that contain it may be established according to the word segmentation result and stored in the index database. The index database can thus contain the higher-quality articles of a given forum or website, with each article mapped to the words it contains, which facilitates lookup in the subsequent steps.
In step S204, word segmentation is performed on the given article. This step may be implemented with the same method as step S202, and the segmentation result may be preprocessed in a way similar to step S202.
In step S205, at least one word with the highest word frequency is selected from the segmentation result as a core word. In a preferred embodiment of the invention, the words whose word frequency is greater than 1 may be sorted in descending order of word frequency, and the top one or more words selected in that order.
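Step S205 can be sketched with a frequency counter. This is a minimal illustration: the function name `top_core_words` and the default of four words are assumptions, and ties are broken by first occurrence, a choice the patent does not specify.

```python
from collections import Counter

def top_core_words(tokens, k=4):
    """Return up to k words with frequency > 1, most frequent first."""
    counts = Counter(tokens)
    repeated = [(word, n) for word, n in counts.items() if n > 1]
    repeated.sort(key=lambda pair: -pair[1])  # stable sort keeps first-seen order on ties
    return [word for word, _ in repeated[:k]]

tokens = ["game", "game", "game", "player", "player", "server"]
print(top_core_words(tokens))
```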
In a preferred embodiment of the invention (taking Chinese articles as an example), when the number of words obtained by segmentation is less than a certain number (for example 10 or 20 words), all of the words may be regarded as core words.
In another preferred embodiment of the invention, the segmentation result may also be classified, for example into three classes:
Class 1: nouns, personal names, surnames, given names, place names, organization names, other proper nouns
Class 2: adjectives, set phrases, abbreviations, idiomatic expressions, verbs, post-verb components
Class 3: others
When the number of words obtained by segmentation is less than a certain number, the words of Class 1 and Class 2 are directly regarded as core words. When the number of words is greater than that number, the word with the highest word frequency in each class is taken as a core word.
Those skilled in the art will know that, besides the methods disclosed above, any other method may be used, or the above methods may be combined with one another or with methods known in the art, to obtain the core words from the segmentation result.
In step S206, the index database is queried, based on the core words, for the articles to be recommended that are related to the given article. This query step may be carried out with a Solr full-text search server. Of course, those skilled in the art may also use other known methods for the query.
In step S207, the related articles may be displayed as a list. In one embodiment of the invention, the related articles may be presented as a list while the user reads portal news, blogs or community posts, or may be delivered to the user by other means such as e-mail.
Fig. 3 is a flowchart of a third embodiment of the method for selecting related articles of the present invention. The detailed process is as follows:
In step S301, the articles to be recommended that do not meet preset rules may be filtered out. For example, duplicate articles and articles with too few words may be filtered out so that no further processing is spent on them. The preset rules can be set as needed and are not limited to the content disclosed in the present invention.
In step S302, weights are set for the articles to be recommended that meet the preset rules. In a forum, for example, articles with more replies, greater length and/or more clicks are given a higher weight.
In step S303, word segmentation is performed on the articles to be recommended after their weights have been set. An existing word segmentation module may be used in this step, and the segmentation result may also be preprocessed. Taking English articles as an example, the segmentation result may be processed as follows:
A. Remove the single words (that is, articles or words with no definite meaning) from the segmentation result, such as "a", "an" and "the";
B. Remove the stop words from the segmentation result. The stop words form a vocabulary list of words that occur frequently in English but have little bearing on the meaning an article expresses, such as "although", "but", "and" and "however".
For different languages and their corresponding word segmentation modules, different preprocessing may be applied to the segmentation result in order to delete words that have little bearing on the meaning an article expresses. Those skilled in the art can construct such a module according to the teaching of the present invention.
In step S304, an index from each word to the articles that contain it is established according to the word segmentation result, and the index and the weights of the articles to be recommended are stored in the index database. The index database can thus contain the higher-quality articles of a given forum or website, with each article mapped to the words it contains, which facilitates lookup in the subsequent steps.
In step S305, word segmentation is performed on the given article. This step may be implemented with the same method as step S202 and/or S303, and the segmentation result may be preprocessed in a way similar to step S202 and/or S303.
In step S306, several core words with the highest word frequency are selected and given different weights according to word frequency. In one embodiment of the invention, the four words with the highest word frequency may be selected and given weights of 1.4, 1.2, 1.1 and 1.0 respectively.
In another embodiment of the invention, all the words whose word frequency is greater than 1 may be classified, for example into the following three classes:
Class 1: nouns, personal names, surnames, given names, place names, organization names, other proper nouns
Class 2: adjectives, set phrases, abbreviations, idiomatic expressions, verbs, post-verb components
Class 3: others
The word with the highest word frequency is then chosen in each class. Suppose, for example, that Class 1 contains A1/A2, Class 2 contains B1/B2/B3 and Class 3 contains C1/C2/C3, with the words in each class sorted in descending order of word frequency. The words A1/B1/C1 are chosen first; among all the remaining words, A2 has the highest word frequency, so the core words finally obtained are A1/B1/C1/A2, and weights are assigned to them. For example, the weight of A1 is 1.4, the weight of B1 is 1.2, the weight of C1 is 1.1 and the weight of A2 is 1.0.
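The class-based selection just described can be sketched as follows. The sketch assumes that each class is given as a mapping from word to frequency and that weights follow descending word frequency, as stated for step S306; the function name, the input shape and the sample frequencies are hypothetical.

```python
WEIGHTS = [1.4, 1.2, 1.1, 1.0]  # weights named in the text, by frequency rank

def select_core_words(freq_by_class, total=4):
    """Pick the top word of every class, then fill up with the most frequent
    remaining words, and weight the result by descending frequency.

    `freq_by_class` is a hypothetical list of dicts (word -> frequency > 1).
    """
    ranked = [sorted(c.items(), key=lambda wf: -wf[1]) for c in freq_by_class]
    chosen = [r[0] for r in ranked if r]             # best word of each class
    rest = [wf for r in ranked for wf in r[1:]]      # words not chosen yet
    rest.sort(key=lambda wf: -wf[1])
    chosen += rest[: max(0, total - len(chosen))]
    chosen.sort(key=lambda wf: -wf[1])               # rank by frequency
    return {word: weight for (word, _), weight in zip(chosen, WEIGHTS)}

classes = [
    {"A1": 9, "A2": 5},
    {"B1": 7, "B2": 3, "B3": 2},
    {"C1": 6, "C2": 4, "C3": 2},
]
print(select_core_words(classes))
```

With these sample frequencies the selection reproduces the A1/B1/C1/A2 example from the text.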
In a simplified embodiment of the invention, the word with the highest word frequency in each class may instead be selected as a core word and given a weight; for example, the weight of A1 is 1.3, the weight of B1 is 1.2 and the weight of C1 is 1. Those skilled in the art may also set weights of other sizes as needed.
In step S307, the index database is queried, based on the core words, for the articles to be recommended that are related to the given article. This query step may be carried out with a Solr full-text search server. Of course, those skilled in the art may also use other known methods for the query.
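One way to hand weighted core words to a Solr server is to build a query string in which each term carries a boost, using the `term^boost` form of Solr's standard query syntax. The sketch below only assembles that string and sends nothing to a server. Passing core-word weights as boosts, and the field name `text`, are assumptions made here for illustration; the patent does not describe how the query is phrased, and terms containing query-syntax characters would need escaping.

```python
def build_query(core_words, field="text"):
    """Join weighted core words into a boosted OR query string.

    `core_words` is a hypothetical dict: word -> weight.
    """
    parts = [f"{field}:{word}^{weight}" for word, weight in core_words.items()]
    return " OR ".join(parts)

query = build_query({"game": 1.4, "player": 1.2})
print(query)
```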
In step S308, the related articles are selected based on the weights of the core words and the weights of the articles to be recommended. In one embodiment of the invention, a relevance score may be determined for each article to be recommended based on the weights of the core words and the weight of that article, and the articles whose relevance score is below a set value are removed. This ensures that poorly related articles do not appear in the final recommendation list.
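The patent does not give a formula for the relevance score. The following sketch therefore uses an assumed one, namely the sum of the weights of the matched core words multiplied by the article weight, purely to show how scoring and thresholding fit together; all names and data are hypothetical.

```python
def relevance(core_words, article_words, article_weight):
    """Assumed score: matched core-word weights times the article weight."""
    matched = sum(w for word, w in core_words.items() if word in article_words)
    return matched * article_weight

def filter_by_score(core_words, candidates, threshold):
    """Score each candidate and drop those below the threshold."""
    scored = []
    for art in candidates:
        score = relevance(core_words, art["words"], art["weight"])
        if score >= threshold:
            scored.append((art["id"], score))
    scored.sort(key=lambda pair: -pair[1])  # best match first
    return scored

core = {"game": 1.4, "player": 1.2}
pool = [
    {"id": "a1", "words": {"game", "player"}, "weight": 1.0},
    {"id": "a2", "words": {"game"}, "weight": 1.5},
    {"id": "a3", "words": {"weather"}, "weight": 1.5},
]
kept = filter_by_score(core, pool, threshold=1.0)
print([article_id for article_id, _ in kept])
```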
In another embodiment of the invention, the selection further comprises fitting the scores of the articles whose relevance score is above the set value to a curve by a mathematical method, calculating the first inflection point of the curve, and removing the articles after that inflection point. This ensures that the articles appearing in the list are reasonably consistent with one another.
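The inflection-point cut can be approximated without a fitted curve. The sketch below treats the descending score list itself as the curve and looks for the first sign change of the discrete second difference; this stand-in is an assumption made for illustration, since the patent does not name the fitting method.

```python
def cut_at_first_inflection(scores):
    """Keep the scores before the first change of curvature sign.

    `scores` must be sorted in descending order. The second difference
    s[i-1] - 2*s[i] + s[i+1] approximates the curvature at position i.
    """
    second = [scores[i - 1] - 2 * scores[i] + scores[i + 1]
              for i in range(1, len(scores) - 1)]
    for j in range(1, len(second)):
        if second[j - 1] * second[j] < 0:   # curvature changes sign
            return scores[: j + 1]
    return scores                           # no inflection found

scores = [100, 95, 85, 60, 50, 45, 42]
print(cut_at_first_inflection(scores))
```

In the sample list the drop accelerates up to 85 and slows afterwards, so the cut falls between 85 and 60.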
In yet another embodiment of the invention, network editors may also screen the articles to be recommended that were found to be related to the given article, in order to satisfy the ad hoc needs of certain Internet applications.
In step S309, the related articles may be displayed as a list. In one embodiment of the invention, the related articles may be presented as a list while the user reads portal news, blogs or community posts, or may be delivered to the user by other means such as e-mail.
Fig. 4 is a flowchart of another embodiment of step S306 of the third embodiment of the method shown in Fig. 3. The detailed process is as follows:
In step S401, word segmentation is performed separately on the article title and the article body. An existing word segmentation module may be used in this step, and the segmentation result may also be preprocessed.
In step S402, it is judged whether the segmentation result contains fewer than 10 words. If so, the process goes to step S412, where all of the words are directly taken as core words and their weights are determined; otherwise step S403 is executed. In the embodiment in which the segmentation result contains fewer than 10 words, the weights may be determined from the part of speech of the words: for example, nouns and adjectives are given a weight of 1.4 and words of other parts of speech a weight of 1.2. In another embodiment, only the nouns and adjectives may be taken as core words and given weights of 1.4 and 1.3 respectively.
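The short-text branch of step S402 can be sketched as follows, using the weights 1.4 and 1.2 from the first embodiment above. The part-of-speech labels `noun` and `adjective` and the input shape are hypothetical; a real segmentation module would supply its own tag set.

```python
def short_text_core_words(tagged_words, limit=10):
    """Weight every word of a short text by its part of speech.

    `tagged_words` is a hypothetical list of (word, part_of_speech) pairs.
    Returns None when the text is not short, so that the caller can go on
    to the frequency-based steps instead.
    """
    if len(tagged_words) >= limit:
        return None
    return {word: (1.4 if pos in ("noun", "adjective") else 1.2)
            for word, pos in tagged_words}

short = [("game", "noun"), ("play", "verb")]
print(short_text_core_words(short))
```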
In step S403, it is judged whether the word frequency of each word is greater than 1. For words whose word frequency is greater than 1, step S404 is executed; words whose word frequency equals 1 are not processed further.
In step S404, it is judged whether the word belongs to the article body. If so, step S405 is executed; otherwise the word belongs to the article title and step S413 is executed.
In step S405, the words in the article body are divided by part of speech into the following three classes, and step S406 is executed:
Class 1: nouns, personal names, surnames, given names, place names, organization names, other proper nouns
Class 2: adjectives, set phrases, abbreviations, idiomatic expressions, verbs, post-verb components
Class 3: others
In step S406, the words in each class are sorted in descending order of word frequency and four words are chosen. In one embodiment of the invention, the word with the highest word frequency may be chosen in each class. Suppose, for example, that Class 1 contains A1/A2, Class 2 contains B1/B2/B3 and Class 3 contains C1/C2/C3, with the words in each class sorted in descending order of word frequency. The words A1/B1/C1 are chosen first, and then the word with the second-highest word frequency in Class 1, A2, is chosen, so the core words finally obtained are A1, B1, C1 and A2. In another embodiment of the invention, the four words with the highest word frequency may be chosen directly, regardless of class. In yet another embodiment of the invention, the word with the highest word frequency may be chosen in each class, followed by the word with the second-highest word frequency in a designated class.
In step S407, the chosen words are given weights of 1.4, 1.2, 1.1 and 1.0 respectively according to word frequency. For example, if the core words A1, B1, C1 and A2 are ordered A1/B1/C1/A2 by descending word frequency, then the weight of A1 is 1.4, the weight of B1 is 1.2, the weight of C1 is 1.1 and the weight of A2 is 1.0. Those skilled in the art will know that, following the teaching of the present invention, a different number of core words may be chosen and weights of different values set as needed.
In step S413, the words in the article title are divided by part of speech into two classes, and step S414 is executed:
Class 1: nouns, personal names, surnames, given names, place names, organization names, other proper nouns
Class 2: adjectives, set phrases, abbreviations, idiomatic expressions, verbs, post-verb components
In step S414, the words in each class are sorted in descending order of word frequency and the first two words are chosen. In other embodiments of the invention, the words with the highest word frequency may be chosen directly regardless of class, or a different number of words may be chosen.
In step S415, each of the chosen words is given a weight of 1.2. In one embodiment of the invention, the chosen words may also be given identical or different weights according to their class and/or word frequency.
In step S408, it is judged whether any word chosen from the article title is identical to a word chosen from the article body. If so, step S409 is executed; otherwise step S410 is executed.
In step S409, the duplicates are removed by keeping the entry with the highest weight, that is, only the highest-weight entry of each word is retained. In another preferred embodiment of the invention, besides keeping the highest-weight entry, the weight of that word may additionally be multiplied by 1.1. In other embodiments of the invention, the retained words and weights may also be processed further.
In step S410, it is judged whether a word obtained in step S409 duplicates the title of the section in which the article is located, where the section title has a weight of 1.1. If so, step S411 is executed; otherwise step S412 is executed.
In step S411, the duplicates are removed by keeping the entry with the highest weight. That is, if the word "online game" obtained in step S409 has a weight of 1.3 and the title of the section in which the article is located, "online game", has a weight of 1.1, the resulting weight of the word "online game" is 1.3. In other embodiments of the invention, the retained words and weights may also be processed further, for example by multiplying the weight by 1.1.
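Steps S408 to S411 can be sketched as a merge that keeps the highest weight whenever a word appears more than once. The sketch covers only the basic variant without the optional multiplication by 1.1, and it adds the section title as a core word even when it duplicates nothing, following the summary's statement that the section title and its weight are added. Function and variable names and the sample words are hypothetical.

```python
def merge_core_words(body, title, section_title, section_weight=1.1):
    """Merge body, title and section-title words, keeping the highest weight.

    `body` and `title` are hypothetical dicts: word -> weight.
    """
    merged = dict(body)
    for word, weight in title.items():
        merged[word] = max(weight, merged.get(word, 0.0))
    merged[section_title] = max(section_weight, merged.get(section_title, 0.0))
    return merged

body_words = {"online game": 1.1, "server": 1.4}
title_words = {"online game": 1.2, "update": 1.2}
core = merge_core_words(body_words, title_words, "online game")
print(core)
```

Here "online game" appears in the body, the title and the section title, and only its highest weight survives.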
At step S412, can determine core word and weight thereof at last.Core word and its weight will be brought into play further effect in the retrieving of back.
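As a minimal sketch of steps S408 to S412, the deduplication can be expressed as a merge of weighted word sets that keeps the highest weight per word. The function name `merge_core_words` and the `boost` parameter are hypothetical; `boost=1.1` would correspond to the preferred embodiment in which the kept weight is multiplied by 1.1. The title weight of 1.3 in the usage lines is taken from the "online game" example above.

```python
def merge_core_words(title_words, text_words, section_title=None,
                     section_weight=1.1, boost=1.0):
    """Merge weighted title and text words, keeping the highest weight per word."""
    merged = dict(text_words)
    for word, weight in title_words.items():
        if word in merged:
            # S408/S409: the same word was chosen from title and text;
            # keep only the higher weight (optionally boosted).
            merged[word] = max(merged[word], weight) * boost
        else:
            merged[word] = weight
    if section_title is not None:
        # S410/S411: the section title carries its own weight (1.1 in the text)
        # and is deduplicated against the merged words in the same way.
        merged[section_title] = max(merged.get(section_title, 0.0), section_weight)
    return merged  # S412: final core words and their weights


core_words = merge_core_words(
    title_words={"online game": 1.3},
    text_words={"online game": 1.2, "player": 1.2},
    section_title="online game",
)
```

With these inputs "online game" ends up with weight 1.3, matching the example given for step S411.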
Fig. 5 is a block diagram of a first embodiment of a device for selecting related articles according to the present invention. As shown in Fig. 5, the device comprises an index database 501, a core word determination module 502, and a selection module 503. The index database 501 may be used to store the articles to be recommended. It may contain all the posts, news items, and/or blog articles published in a given forum or website. In specific embodiments of the present invention, any of the various methods of the prior art may be used to organize the words of the articles in an ordered manner, so that the articles containing a given word can be found quickly from that word.
The core word determination module 502 may be used to analyze a given article to determine the core words of the given article. In a preferred embodiment of the present invention, the core words may also be determined by computer checking or by any known word segmentation technique.
The selection module 503 may be used to select related articles of the given article from the index database 501 according to the core words. These related articles may be presented as a list while the user reads portal news, a blog, or a community post, or may be delivered to the user by means such as e-mail.
Fig. 6 is a block diagram of a second embodiment of a device for selecting related articles according to the present invention. As shown in Fig. 6, the index database 501 further comprises: a preprocessing unit 101 for preprocessing the articles to be recommended; a first word segmentation unit 102 for performing word segmentation on the preprocessed articles to be recommended; and a construction unit 103 for building, from the word segmentation result, an index from each word to the articles containing that word, and storing the index in the index database.
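The index built by the construction unit 103 corresponds to what is commonly called an inverted index. A minimal sketch follows; the function name `build_index` and the `segment` callable are hypothetical, and whitespace splitting stands in for a real Chinese word segmenter. Removal of single characters and stop words follows claim 4.

```python
from collections import defaultdict


def build_index(articles, segment, stop_words=frozenset()):
    """Map each word to the set of ids of the articles that contain it.

    articles -- mapping from article id to article text
    segment  -- hypothetical word segmentation function: text -> list of words
    """
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in segment(text):
            # Claim 4: drop single characters and stop words.
            if len(word) < 2 or word in stop_words:
                continue
            index[word].add(article_id)
    return index


articles = {
    1: "online game server",
    2: "online game player",
    3: "the stock market",
}
index = build_index(articles, segment=str.split, stop_words={"the"})
```

Looking up `index["online"]` then yields the ids of articles 1 and 2 without scanning any article text.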
In one embodiment of the present invention, the preprocessing comprises: filtering out articles to be recommended that do not meet a preset rule, assigning a weight to each article to be recommended that meets the preset rule, and/or deduplicating articles to be recommended that have identical titles.
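The three preprocessing operations might be sketched as below. The preset rule and the weighting scheme are not specified in this passage, so `is_acceptable` and `weight_of` are hypothetical callables supplied by the caller; keeping the first article among those with the same title is also an assumption.

```python
def preprocess(articles, is_acceptable, weight_of):
    """Filter, weight, and deduplicate the articles to be recommended.

    articles      -- iterable of dicts with at least a "title" key
    is_acceptable -- hypothetical predicate implementing the preset rule
    weight_of     -- hypothetical function returning an article's weight
    """
    seen_titles = set()
    kept = []
    for article in articles:
        if not is_acceptable(article):
            continue  # filtered out: does not meet the preset rule
        if article["title"] in seen_titles:
            continue  # deduplicated: same title as an article already kept
        seen_titles.add(article["title"])
        kept.append({**article, "weight": weight_of(article)})
    return kept


raw = [
    {"title": "A", "body": "long enough body"},
    {"title": "A", "body": "another long body"},
    {"title": "B", "body": "x"},
]
cleaned = preprocess(
    raw,
    is_acceptable=lambda a: len(a["body"]) >= 5,  # illustrative rule only
    weight_of=lambda a: 1.0,                      # illustrative weight only
)
```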
The core word determination module 502 further comprises: a second word segmentation unit 201 for performing word segmentation on the given article; and a core word selection unit 202 for selecting at least one word with the highest word frequency as a core word. The second word segmentation unit 201 and the first word segmentation unit 102 may be implemented with any word segmentation module known in the art.
The selection module 503 further comprises: a query unit 301 for querying the index database, based on the core words, for articles to be recommended that are related to the given article; and a selection unit 302 for selecting related articles based on the weights of the core words and the weights of the articles to be recommended.
In one embodiment of the present invention, the selection comprises determining a relevance score for each article to be recommended based on the weights of the core words and the weight of that article, and removing the articles whose relevance score is lower than a set value. This ensures that articles with poor relevance do not appear in the final recommendation list.
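The scoring formula itself is not given in this passage. Purely as an assumed example, the sketch below scores an article by summing, over the core words it contains, the core word weight multiplied by the article weight, and then drops scores below the set value. The function name `score_candidates` and all parameter names are hypothetical.

```python
def score_candidates(core_words, index, article_weights, threshold):
    """Score the articles that contain at least one core word.

    core_words      -- mapping from core word to weight
    index           -- mapping from word to the set of article ids containing it
    article_weights -- mapping from article id to article weight
    threshold       -- set value; articles scoring below it are removed
    """
    scores = {}
    for word, word_weight in core_words.items():
        for article_id in index.get(word, ()):
            # Assumed formula: core word weight times article weight, summed.
            contribution = word_weight * article_weights.get(article_id, 1.0)
            scores[article_id] = scores.get(article_id, 0.0) + contribution
    kept = [(aid, s) for aid, s in scores.items() if s >= threshold]
    # Highest score first; ties broken by article id.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept


ranked = score_candidates(
    core_words={"online": 1.2, "game": 1.2},
    index={"online": {1, 2}, "game": {1}},
    article_weights={1: 1.0, 2: 0.5},
    threshold=1.0,
)
```

In this example article 1 matches both core words and is kept, while article 2 matches only one word, has a low article weight, and falls below the set value.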
In another embodiment of the present invention, the selection further comprises fitting the scores of the articles whose relevance score is higher than the set value to a curve by a mathematical method, calculating the first inflection point of the curve, and removing the articles after that inflection point. This ensures that the articles appearing in the list are reasonably consistent with one another.
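The fitting method is left open in the text. As a simplified stand-in that skips the fitting step, the sketch below treats the scores sorted in descending order as samples of the curve and takes the first sign change of the discrete second difference as the first inflection point. The function name `cut_at_first_inflection` is hypothetical, and a real implementation fitting a smooth curve could place the cut differently.

```python
def cut_at_first_inflection(scores):
    """Return the scores up to and including the first inflection point.

    scores -- relevance scores sorted in descending order
    """
    previous_sign = 0
    for i in range(1, len(scores) - 1):
        # Discrete second difference, an approximation of the second derivative.
        second_diff = scores[i + 1] - 2 * scores[i] + scores[i - 1]
        sign = (second_diff > 0) - (second_diff < 0)
        if sign == 0:
            continue
        if previous_sign != 0 and sign != previous_sign:
            # Curvature changes sign here: treat position i as the inflection.
            return list(scores[: i + 1])
        previous_sign = sign
    return list(scores)  # no inflection found: keep every article


kept_scores = cut_at_first_inflection([90, 85, 82, 80, 60, 30, 25, 24])
```

For the scores shown, the curve flattens over the first four articles and then drops sharply, so only those four are kept. With noisy floating-point scores a small tolerance around zero would be advisable before comparing signs.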
In other embodiments of the present invention, the device for selecting related articles may be constructed in accordance with the method for selecting related articles described above. Those skilled in the art can construct such a device according to the teaching of the present invention, so it is not described again here.
The device and method for selecting related articles according to the present invention may be used, when a user reads portal news or a blog, to compute and display related community posts; or to extract semantics by analyzing keywords, in order to aggregate scattered articles and pictures; or to analyze the web pages and articles that a user has browsed, derive their semantics, and in turn analyze the user's preferences.
In light of the teaching of the present invention, those skilled in the art may also combine the method and device of the present invention with other prior art methods and devices, so as to apply them to other suitable fields.
Accordingly, the present invention may be implemented in hardware, in software, or in a combination of software and hardware. The present invention may be implemented in a centralized manner in at least one computer system, or in a distributed manner in which different parts are spread across several interconnected computer systems. Any computer system or other apparatus capable of carrying out the method is suitable. A typical combination of software and hardware is a general-purpose computer system with a computer program installed, where installing and executing the program controls the computer system so that it operates according to the method.
Although the present invention has been described by way of specific embodiments, those skilled in the art will appreciate that various changes and equivalent substitutions may be made without departing from the scope of the present invention. In addition, various modifications may be made to adapt the present invention to a particular situation or material without departing from its scope. Therefore, the present invention is not limited to the specific embodiments disclosed, but includes all embodiments falling within the scope of the claims.

Claims (14)

1. A method for selecting related articles, characterized in that the method comprises:
S1, establishing an index database of articles to be recommended;
S2, analyzing a given article to determine core words of the given article;
S3, selecting related articles of the given article from the index database according to the core words.
2. The method for selecting related articles according to claim 1, characterized in that step S1 further comprises:
S11, preprocessing the articles to be recommended;
S12, performing word segmentation on the preprocessed articles to be recommended;
S13, building, from the word segmentation result, an index from each word to the articles containing that word, and storing the index in the index database.
3. The method for selecting related articles according to claim 2, characterized in that step S11 further comprises:
S111, filtering out articles to be recommended that do not meet a preset rule;
S112, assigning a weight to each article to be recommended that meets the preset rule;
S113, deduplicating articles to be recommended that have identical titles.
4. The method for selecting related articles according to claim 2, characterized in that step S12 further comprises: removing single characters and stop words from the word segmentation result.
5. The method for selecting related articles according to any one of claims 1 to 4, characterized in that step S2 comprises:
S21, performing word segmentation on the given article;
S22, selecting at least one word with the highest word frequency as a core word.
6. The method for selecting related articles according to claim 5, characterized in that step S22 further comprises:
S221, when the number of words in the word segmentation result of the article text is less than a certain number, taking all of the words as core words; otherwise, classifying the words whose word frequency in the article text is greater than 1;
S222, in each class, choosing at least one word with the highest word frequency as a text core word and assigning different weights according to word frequency;
S223, classifying the words in the article title to serve as title core words, and assigning different weights according to word frequency;
S224, deduplicating the text core words and the title core words to obtain full-text core words and their weights.
7. The method for selecting related articles according to claim 6, characterized in that step S22 further comprises adding the title of the section in which the article is located, together with its weight, and deduplicating the section title against the full-text core words to obtain the core words.
8. The method for selecting related articles according to claim 6 or 7, characterized in that step S3 further comprises:
S31, querying the index database, based on the core words, for articles to be recommended that are related to the given article;
S32, selecting related articles based on the weights of the core words and the weights of the articles to be recommended.
9. The method for selecting related articles according to claim 8, characterized in that step S32 further comprises:
S321, determining a relevance score for each article to be recommended based on the weights of the core words and the weight of that article, and removing the articles whose relevance score is lower than a set value;
S322, fitting the scores of the articles whose relevance score is higher than the set value to a curve by a mathematical method, calculating the first inflection point of the curve, and removing the articles after that inflection point.
10. A device for selecting related articles, characterized in that the device comprises:
an index database for storing articles to be recommended;
a core word determination module for analyzing a given article to determine core words of the given article;
a selection module for selecting related articles of the given article from the index database according to the core words.
11. The device for selecting related articles according to claim 10, characterized in that the index database further comprises:
a preprocessing unit for preprocessing the articles to be recommended;
a first word segmentation unit for performing word segmentation on the preprocessed articles to be recommended;
a construction unit for building, from the word segmentation result, an index from each word to the articles containing that word, and storing the index in the index database.
12. The device for selecting related articles according to claim 11, characterized in that the preprocessing comprises: filtering out articles to be recommended that do not meet a preset rule, assigning a weight to each article to be recommended that meets the preset rule, and/or deduplicating articles to be recommended that have identical titles.
13. The device for selecting related articles according to any one of claims 10 to 12, characterized in that the core word determination module further comprises:
a second word segmentation unit for performing word segmentation on the given article;
a core word selection unit for selecting at least one word with the highest word frequency as a core word.
14. The device for selecting related articles according to claim 13, characterized in that the selection module further comprises:
a query unit for querying the index database, based on the core words, for articles to be recommended that are related to the given article;
a selection unit for selecting related articles based on the weights of the core words and the weights of the articles to be recommended.
CNA200910107940XA 2009-06-11 2009-06-11 Method and device for selecting related article Pending CN101576928A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA200910107940XA CN101576928A (en) 2009-06-11 2009-06-11 Method and device for selecting related article

Publications (1)

Publication Number Publication Date
CN101576928A true CN101576928A (en) 2009-11-11

Family

ID=41271863

Country Status (1)

Country Link
CN (1) CN101576928A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20091111