CN101000627B - Method and device for issuing correlation information - Google Patents

Method and device for issuing correlation information

Info

Publication number
CN101000627B
Authority
CN
China
Prior art keywords
user
network text
text
classification
relevant information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN200710000966A
Other languages
Chinese (zh)
Other versions
CN101000627A (en
Inventor
曹菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN200710000966A priority Critical patent/CN101000627B/en
Publication of CN101000627A publication Critical patent/CN101000627A/en
Application granted granted Critical
Publication of CN101000627B publication Critical patent/CN101000627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A method for publishing relevant information includes collecting network texts that meet a preset condition from the Internet, calculating text feature parameters of the network texts, classifying the network texts by comparing their feature parameters, and then publishing relevant information.

Description

Method and device for publishing relevant information
Technical field
The present invention relates to the field of Internet information collection and processing, and in particular to a method and device for publishing relevant information.
Background art
With current technology, relevant information published over the Internet takes many forms (for example, news, entertainment information, advertisements, and lists of related links), and the publishing methods are equally varied. Examples are described below:
The first approach publishes information closely related to a website's subject terms in the form of advertisements on mass-media websites [...] in the eco-tourism subdirectory of the website.
The second approach publishes information related to the search keywords on the search results page. Retrieving information through a search engine has become a fast and effective way for people to obtain information and resources from the Internet. In the prior art, a dedicated column is generally set aside on the right side of the search results page to publish links to information related to the search keywords, and the user can view the relevant information in a web browser by clicking such a link.
However, all of the above publishing approaches rely on keywords as their core basis. For knowledge-type information, keywords are important; for non-knowledge-type information, such as literary-appreciation network texts and personal network texts like blogs, keywords can hardly reflect the degree of association between pieces of information. Because the degree of association is insufficient, the accuracy of relevant information publishing decreases; for example, relevant information recommended on the basis of identical keywords may not be what the user needs.
In addition, as the Internet reaches deeper into people's lives, more and more end users have their own virtual space on the Internet. How to publish relevant information directly to end users who own such virtual spaces is also an active technical problem in this field.
To publish relevant information directly to end users' virtual spaces, the prior art has proposed the following methods for clustering end users: (1) users create and join categories (circles) on their own initiative, forming different categories; (2) users are clustered on the basis of their registration data. Corresponding relevant information is then published according to the clustering result.
Because end users on the Internet are virtual, it is very difficult to collect and analyze information about them, and both of the above methods have serious defects:
For (1): first, it cannot analyze users further over time; for example, whether a user's category needs to change or be updated as the user's personal information accumulates cannot be determined when categories are created and joined by the users themselves. Second, it cannot analyze users in depth; for example, it cannot tell whether a user could also belong to other categories. Third, it cannot further refine the clustering of users within a category (circle). Because categories (circles) are created by users, countless categories can exist on the Internet. If a category is too fine, too few users join it and it is of no use for publishing relevant information; if a category is coarse, it has plenty of users, but such coarse clustering is likewise of little use for publishing relevant information.
For (2): first, because Internet information is virtual, many users enter false registration information to protect their privacy, so a correct user clustering cannot be obtained from an incorrect data source. Second, because registration information is generally confidential and known only to the registrar, user clustering of this kind can only be performed by the registrar, which greatly limits the applications that can follow user clustering (for example, publishing relevant information).
In summary, a technical problem that those skilled in the art urgently need to solve is to propose a new method for publishing relevant information that can publish the most accurate relevant information without relying on the keywords of the text the user is currently browsing, and that can accurately publish relevant information to end users' virtual spaces, so as to meet the development needs of relevant information publishing on the Internet.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for publishing relevant information on the Internet. A new technical solution for clustering network texts is proposed, on the basis of which relevant information can be published simply, efficiently, and with high accuracy.
To solve the above problem, the invention discloses a method for publishing relevant information, comprising:
Step a: collecting non-knowledge-type network texts on the Internet that meet a preset condition;
Step b: calculating the user feature parameters corresponding to each network text, wherein the user feature parameters comprise the user's genre parameters, multi-word phrase frequency, vocabulary connection degree, part-of-speech degree, sentence length and/or emotion degree, and the genre parameters comprise word scope, expressive ability and/or logicality;
wherein the multi-word phrase frequency characterizes how often adjacent words in the text form two-word or multi-word phrases and is obtained by word segmentation statistics; the vocabulary connection degree characterizes the number of different words adjacent to a given word and is obtained by word segmentation statistics; the part-of-speech degree characterizes the number of words of each part of speech in the text and is obtained by word segmentation statistics; the sentence length is obtained by word segmentation statistics; the emotion degree is obtained by counting the distribution in the text of preset emotion-bearing words; the word scope is calculated from the total number of words in the network text and the number of non-high-frequency words in the network text; the expressive ability is calculated from the total number of words in the network text and the number of adjectives and adverbs in the network text; and the logicality is calculated from the total number of words in the network text and the number of conjunctions in the network text;
Step c: comparing the user feature parameters to classify the network texts;
Step d: when relevant information needs to be published on the basis of a network text, publishing the relevant information according to the classification information of that network text.
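The feature parameters named in step b can be sketched as follows. This is a minimal illustration, not the patented computation: the text only says which counts each parameter is calculated from, so expressing word scope, expressive ability, and logicality as simple ratios is an assumption, as are all function names and the part-of-speech labels `adj`, `adv`, and `conj`. The input is assumed to be already segmented into words.

```python
from collections import Counter


def phrase_frequency(tokens, n=2):
    """Count how often each run of n adjacent words occurs in the token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def connection_degree(tokens, word):
    """Number of different words that appear directly before or after `word`."""
    neighbours = set()
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        if i > 0:
            neighbours.add(tokens[i - 1])
        if i + 1 < len(tokens):
            neighbours.add(tokens[i + 1])
    return len(neighbours)


def genre_parameters(tagged_tokens, high_frequency_words):
    """Compute the three genre parameters from (word, part_of_speech) pairs.

    Assumption: each parameter is the ratio of the relevant count to the
    total word count; the source does not give the exact formula.
    """
    total = len(tagged_tokens)
    if total == 0:
        return {"word_scope": 0.0, "expressive_ability": 0.0, "logicality": 0.0}
    non_high = sum(1 for word, _ in tagged_tokens if word not in high_frequency_words)
    modifiers = sum(1 for _, pos in tagged_tokens if pos in ("adj", "adv"))
    conjunctions = sum(1 for _, pos in tagged_tokens if pos == "conj")
    return {
        "word_scope": non_high / total,
        "expressive_ability": modifiers / total,
        "logicality": conjunctions / total,
    }
```

A real system would obtain the tokens and part-of-speech tags from a word segmenter suited to the language of the network text.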
Preferably, step d comprises: when relevant information needs to be published on the basis of the network text that the user is currently browsing or submitting for posting, collecting that network text and publishing corresponding relevant information according to its classification information, the corresponding relevant information comprising other network texts similar to the network text that the user is currently browsing or submitting for posting.
Alternatively, step d comprises: when relevant information needs to be published on the basis of a network text that the user submits to a search engine, receiving the network text submitted to the search engine and publishing corresponding relevant information according to its classification information, the corresponding relevant information comprising other network texts similar to the network text that the user submitted to the search engine.
Alternatively, the method further comprises: aggregating network texts having the same user ID; and classifying the user ID according to the classification information of the aggregated network texts having that user ID. In this case step d comprises: when relevant information needs to be published on the basis of the network text currently being browsed, collecting the network text that the user is currently browsing and publishing relevant information according to its classification information, the relevant information comprising a list of the user IDs belonging to the category of the network text currently being browsed.
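The user-ID variant above can be sketched as follows. The text does not say how the categories of a user's texts are combined into one category for the user ID; taking the most frequent category is an assumption made here for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def classify_users(classified_texts):
    """Map each user ID to the most frequent category among that user's texts.

    classified_texts: iterable of (user_id, category) pairs, one per text.
    """
    by_user = defaultdict(Counter)
    for user_id, category in classified_texts:
        by_user[user_id][category] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in by_user.items()}


def users_in_category(user_categories, category):
    """Return the sorted list of user IDs assigned to `category`."""
    return sorted(user for user, cat in user_categories.items() if cat == category)
```

Given the category of the text currently being browsed, `users_in_category` yields the user ID list that would be published alongside it.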
Preferably, before step b the method further comprises: aggregating the texts having the same user ID into one network text;
and the calculating of the user feature parameters corresponding to the network text in step b comprises:
calculating the user feature parameters over the aggregated network text as a whole.
Preferably, the relevant information comprises recommendation information, news information, entertainment information, or advertising information.
Preferably, step c comprises: presetting a sample library and calculating the user feature parameters of each sample; and comparing the user feature parameters of a network text with those of each sample in the sample library to classify the network text.
Alternatively, step c comprises: directly comparing the user feature parameters of the network texts with one another to classify the network texts.
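The direct-comparison variant of step c, in which no sample library is used, can be sketched as a simple grouping procedure. The text does not specify the comparison rule; the sketch below assumes each text is represented as a numeric feature vector, that Euclidean distance measures similarity, and that a text joins the first group whose founding text lies within a chosen threshold. The function name and the threshold are hypothetical.

```python
import math


def group_texts(features_by_text, threshold):
    """Greedily group texts whose feature vectors lie close together.

    features_by_text: dict mapping text ID to a tuple of feature values.
    A text joins the first group whose first member is within `threshold`;
    otherwise it starts a new group.
    """
    groups = []
    for text_id, features in features_by_text.items():
        for group in groups:
            if math.dist(features, features_by_text[group[0]]) <= threshold:
                group.append(text_id)
                break
        else:
            groups.append([text_id])
    return groups
```

The result depends on the order in which texts are visited; a production system would more likely use an established clustering algorithm.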
Preferably, the user feature parameters comprise vocabulary and corresponding word frequencies, which are obtained by word segmentation statistics on the text.
Preferably, the method further comprises: subdividing the network texts of the same category by genre parameters, and marking the user IDs that belong to the same genre parameter level as the same group.
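Subdividing one category by genre parameter level can be sketched as bucketing users by a score. The text does not define what a level is; the sketch assumes a single numeric genre score per user and a list of ascending level boundaries, both hypothetical.

```python
import bisect
from collections import defaultdict


def group_by_level(scores_by_user, boundaries):
    """Group user IDs of one category by genre parameter level.

    boundaries: ascending cut points; a score below boundaries[0] is level 0,
    a score in [boundaries[0], boundaries[1]) is level 1, and so on.
    """
    groups = defaultdict(list)
    for user_id, score in sorted(scores_by_user.items()):
        level = bisect.bisect_right(boundaries, score)
        groups[level].append(user_id)
    return dict(groups)
```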
Preferably, the method further comprises: subdividing the user IDs of the same category according to registration information.
Preferably, the method further comprises: updating the samples in the sample library as required by the category settings.
The relevant information may be published in any of the following ways: to the end user's virtual space on the network, the virtual space comprising a personal website, a blog space, or an e-mail box; on the current page, by a publishing server; on the end user's computing device, by a publishing client; or on a search results page, by a publishing server.
The present invention also provides a device for publishing relevant information, comprising the following components:
a collection unit, configured to collect non-knowledge-type network texts on the Internet that meet a preset condition;
a text feature parameter calculation unit, configured to calculate the user feature parameters corresponding to each network text, wherein the user feature parameters comprise the user's genre parameters, multi-word phrase frequency, vocabulary connection degree, part-of-speech degree, sentence length and/or emotion degree, and the genre parameters comprise word scope, expressive ability and/or logicality;
wherein the multi-word phrase frequency characterizes how often adjacent words in the text form two-word or multi-word phrases and is obtained by word segmentation statistics; the vocabulary connection degree characterizes the number of different words adjacent to a given word and is obtained by word segmentation statistics; the part-of-speech degree characterizes the number of words of each part of speech in the text and is obtained by word segmentation statistics; the sentence length is obtained by word segmentation statistics; the emotion degree is obtained by counting the distribution in the text of preset emotion-bearing words; the word scope is calculated from the total number of words in the network text and the number of non-high-frequency words in the network text; the expressive ability is calculated from the total number of words in the network text and the number of adjectives and adverbs in the network text; and the logicality is calculated from the total number of words in the network text and the number of conjunctions in the network text;
a comparison and classification unit, configured to compare the user feature parameters and classify the network texts;
a publishing unit, configured to publish relevant information according to the classification information of a network text when relevant information needs to be published on the basis of that network text.
Preferably, the publishing unit is specifically configured to, when relevant information needs to be published on the basis of the network text that the user is currently browsing or submitting, collect that network text and publish corresponding relevant information according to its classification information, the corresponding relevant information comprising other network texts similar to the network text that the user is currently browsing or submitting.
Preferably, the device further comprises: an aggregation module, configured to aggregate network texts having the same user ID; and an end-user classification module, configured to classify the user ID according to the classification information of the aggregated network texts having that user ID. The publishing unit comprises a publishing module, configured to, when relevant information needs to be published on the basis of the network text currently being browsed, collect the network text that the user is currently browsing and publish relevant information according to its classification information, the relevant information comprising a list of the user IDs belonging to the category of the network text that the user is currently browsing.
Preferably, the device may further comprise: an aggregation module, configured to aggregate the texts having the same user ID into one network text;
the text feature parameter calculation unit being specifically configured to calculate the user feature parameters over the aggregated network text as a whole.
The relevant information may be recommendation information, news information, entertainment information, or advertising information.
Preferably, the comparison and classification unit comprises: a sample feature parameter calculation module, configured to calculate the user feature parameters of each sample in a preset sample library; and a comparison and classification module, configured to compare the user feature parameters of a network text with those of each sample in the sample library to classify the network text.
Alternatively, the comparison and classification unit comprises: a comparison and classification module, configured to directly compare the feature parameters of the network texts with one another to classify the network texts.
Preferably, the feature parameters comprise vocabulary and corresponding word frequencies, which are obtained by word segmentation statistics on the text.
Preferably, the device may further comprise: a category subdivision unit, configured to subdivide the network texts of the same category by genre parameters and to mark the user IDs that belong to the same genre parameter level as the same group.
Preferably, the device may further comprise: a category subdivision unit, configured to subdivide the user IDs of the same category according to registration information.
Preferably, the comparison and classification unit further comprises: a sample update module, configured to update the samples in the sample library as required by the category settings.
Preferably, the publishing unit is: a publishing server, configured to publish relevant information to the end user's virtual space on the network, the virtual space comprising a personal website, a blog space, or an e-mail box; or a publishing server, configured to publish relevant information on the current page; or a publishing client, configured to receive the information fed back by the server and publish relevant information on the end user's computing device; or a publishing server, configured to publish relevant information on a search results page.
The present invention also provides another method for publishing relevant information, comprising:
Step a: collecting non-knowledge-type network texts on the Internet that meet a preset condition;
Step b: calculating the user feature parameters corresponding to each network text, wherein the user feature parameters comprise the user's genre parameters, multi-word phrase frequency, vocabulary connection degree, part-of-speech degree, sentence length and/or emotion degree, and the genre parameters comprise word scope, expressive ability and/or logicality;
wherein the multi-word phrase frequency characterizes how often adjacent words in the text form two-word or multi-word phrases and is obtained by word segmentation statistics; the vocabulary connection degree characterizes the number of different words adjacent to a given word and is obtained by word segmentation statistics; the part-of-speech degree characterizes the number of words of each part of speech in the text and is obtained by word segmentation statistics; the sentence length is obtained by word segmentation statistics; the emotion degree is obtained by counting the distribution in the text of preset emotion-bearing words; the word scope is calculated from the total number of words in the network text and the number of non-high-frequency words in the network text; the expressive ability is calculated from the total number of words in the network text and the number of adjectives and adverbs in the network text; and the logicality is calculated from the total number of words in the network text and the number of conjunctions in the network text;
Step c: comparing the user feature parameters to classify the network texts;
Step d: aggregating network texts having the same user ID, and classifying the user ID according to the classification information of the aggregated network texts having that user ID;
Step e: when a user ID is submitted to a search engine, publishing relevant information according to the classification information of the submitted user ID, the relevant information comprising the user IDs of other users in the category to which that user ID belongs and/or the link addresses of those users' virtual personal spaces.
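Step e can be sketched as a lookup over the user categories produced in step d. The data structures and the function name are hypothetical; the source only states that the published information contains the other users' IDs and/or the link addresses of their virtual personal spaces.

```python
def related_users(query_user_id, user_categories, space_links):
    """Return (user_id, space_link) pairs for other users in the same category.

    user_categories: dict mapping user ID to category (output of step d).
    space_links: dict mapping user ID to a virtual-space link; users without
    a known link are returned with None.
    """
    category = user_categories.get(query_user_id)
    if category is None:
        return []
    return [
        (user_id, space_links.get(user_id))
        for user_id, cat in sorted(user_categories.items())
        if cat == category and user_id != query_user_id
    ]
```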
Preferably, the method further comprises: subdividing the network texts of the same category by genre parameters, and marking the user IDs that belong to the same genre parameter level as the same group.
Preferably, the method further comprises: subdividing the user IDs of the same category according to registration information.
The present invention also provides another device for publishing relevant information, comprising:
a collection unit, configured to collect non-knowledge-type network texts on the Internet that meet a preset condition;
a text feature parameter calculation unit, configured to calculate the user feature parameters corresponding to each network text, wherein the user feature parameters comprise the user's genre parameters, multi-word phrase frequency, vocabulary connection degree, part-of-speech degree, sentence length and/or emotion degree, and the genre parameters comprise word scope, expressive ability and/or logicality;
wherein the multi-word phrase frequency characterizes how often adjacent words in the text form two-word or multi-word phrases and is obtained by word segmentation statistics; the vocabulary connection degree characterizes the number of different words adjacent to a given word and is obtained by word segmentation statistics; the part-of-speech degree characterizes the number of words of each part of speech in the text and is obtained by word segmentation statistics; the sentence length is obtained by word segmentation statistics; the emotion degree is obtained by counting the distribution in the text of preset emotion-bearing words; the word scope is calculated from the total number of words in the network text and the number of non-high-frequency words in the network text; the expressive ability is calculated from the total number of words in the network text and the number of adjectives and adverbs in the network text; and the logicality is calculated from the total number of words in the network text and the number of conjunctions in the network text;
a comparison and classification unit, configured to compare the user feature parameters and classify the network texts;
an aggregation module, configured to aggregate network texts having the same user ID;
an end-user classification module, configured to classify the user ID according to the classification information of the aggregated network texts having that user ID;
a publishing module, configured to, when relevant information needs to be published on the basis of a user ID currently submitted to a search engine, publish relevant information according to the classification information of that user ID, the relevant information comprising the user IDs of other users in the same category as the submitted user ID and/or the link addresses of those users' virtual personal spaces.
Preferably, the device further comprises: a category subdivision unit, configured to subdivide the network texts of the same category by genre parameters and to mark the user IDs that belong to the same genre parameter level as the same group.
Preferably, the device further comprises: a category subdivision unit, configured to subdivide the user IDs of the same category according to registration information.
Compared with the prior art, the present invention has the following advantages:
Network texts written by a user implicitly carry certain features of that user. Technical means (for example, word segmentation and statistics) make these features explicit, and each network text is classified according to these feature parameters. For non-knowledge-type network texts this classification is highly accurate, so on the basis of this classification the present invention can publish relevant information with high accuracy. For example, when the user browses a page (assume its content is a novel), the present invention can display a recommendation list at a suitable position on the page, linking to the other novels most relevant and most similar to that text. Or, when the user finishes a blog entry and submits it for posting, the present invention can display a recommendation list at a suitable position on the page, linking to the other blog texts most relevant and most similar to that text.
Because end users on the Internet are highly virtual, they are very difficult to analyze and cluster. Through in-depth analysis, the inventor found the information source that best reflects a user's individuality: the network texts the user posts on the network. One core idea of the present invention is to cluster end users on the basis of the characteristics of their network texts, and to cluster end users automatically by technical means such as Internet information collection and information analysis. This clustering classifies end users very accurately, because the underlying data source contains little false content and the clustering is performed by in-depth analysis of the data source rather than by applying the data source directly. Second, the present invention can subdivide categories to any desired degree simply by setting different conditions as required, enabling in-depth analysis of end users. Further, the clustering can be updated as end users change, so end users can be tracked over time. Thanks to these advantages, it becomes technically feasible to choose which end users to publish a given piece of relevant information to according to how well the characteristics of that information match the characteristics of the end users. Relevant information can thus be published directly to end users' virtual spaces, with the accuracy and effectiveness of the publishing ensured.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of Embodiment 1 of the relevant information publishing method;
Fig. 2 is a flowchart of the steps of Embodiment 2 of the relevant information publishing method;
Fig. 3 is a flowchart of the steps of Embodiment 3 of the relevant information publishing method;
Fig. 4 is a schematic diagram of the position of each text in a two-dimensional space;
Fig. 5 is a structural block diagram of a device for publishing relevant information.
Detailed description of the embodiments
To make the above objects, features, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a flowchart of the steps of Embodiment 1 of the relevant information publishing method of the present invention is shown, which specifically comprises the following steps:
Step 101: preset a sample library and calculate the sample feature parameters of each sample;
The samples are selected according to the needs of the categorization. The sample library can contain multiple samples; each sample can form a group, the end users under each sample can be further subdivided according to other information, and multiple samples can also form a major class. For example, the sample library is divided into major classes such as martial arts, romance, fantasy, children's stories, and technical. If the samples under the romance class comprise the network texts of ten authors, each author can form a group, and the end users whose writing style is most similar to that author's are placed in that group; the end users placed in the group can then be subdivided according to parameters such as registration information and writing level. In this example, each author's network texts serve as one sample, and the sample may consist of multiple network texts by that author.
In the present invention, a network text refers to text information stored or transmitted on the Internet.
Of course, samples need not be selected and organized by author; it suffices that each sample represents one category. A sample can be one or more network texts of typical significance by different authors. Texts by the same author from different periods or in different formats may also differ in style and fall into different sample classes; the present invention mainly uses the sample feature parameters to screen the samples within a class.
To compare user network texts collected from the Internet with the samples, the corresponding sample feature parameters must be calculated for each sample in the sample library to facilitate the comparison.
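Comparing a collected text against the sample library can be sketched as a nearest-sample lookup. The comparison measure is not specified in this passage; Euclidean distance over named feature values is an assumption here (loosely suggested by Fig. 4, which places texts in a two-dimensional space), and the function names and feature names are hypothetical.

```python
import math


def feature_distance(a, b):
    """Euclidean distance between two feature dicts; missing features count as 0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def classify(text_features, sample_library):
    """Return the label of the sample whose feature parameters are closest.

    sample_library: non-empty dict mapping sample label to its feature dict.
    """
    return min(
        sample_library,
        key=lambda label: feature_distance(text_features, sample_library[label]),
    )
```

For instance, a text with features close to those of a romance sample would be placed in the romance sample's group.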
In general, the feature parameters comprise vocabulary and corresponding word frequencies, which are obtained by word segmentation statistics on the text. The vocabulary a user uses and the corresponding word frequencies are essential data for reflecting the author's style, because differences in style often show in the vocabulary authors use. The feature parameters are set by those skilled in the art as required (for example, for different category settings).
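Obtaining vocabulary and word frequencies by word segmentation statistics can be sketched as follows. The `segment` function is a placeholder that splits on non-word characters; it is an assumption for illustration only, since Chinese network texts would need a proper word segmenter in its place.

```python
import re
from collections import Counter


def segment(text):
    """Placeholder segmenter: lower-case the text and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(text):
    """Return each vocabulary item with its relative frequency in the text."""
    tokens = segment(text)
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}
```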
Preferably, the samples in the sample library are not fixed; they are updated as required by the category settings, for example by adding or deleting a category, or by adding or deleting a sample under a category.
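A sample library that supports these updates can be sketched as a two-level mapping from major class to sample to texts. The class and method names are hypothetical, and the layout is only one plausible reading of the description above.

```python
class SampleLibrary:
    """Two-level store: major class -> sample name -> list of sample texts."""

    def __init__(self):
        self._classes = {}

    def add_sample(self, major_class, sample_name, texts):
        """Add or replace a sample, creating the major class if needed."""
        self._classes.setdefault(major_class, {})[sample_name] = list(texts)

    def remove_sample(self, major_class, sample_name):
        """Delete one sample; drop the major class once it has no samples left."""
        samples = self._classes.get(major_class)
        if samples is None:
            return
        samples.pop(sample_name, None)
        if not samples:
            del self._classes[major_class]

    def remove_class(self, major_class):
        """Delete a major class together with all of its samples."""
        self._classes.pop(major_class, None)

    def classes(self):
        return sorted(self._classes)

    def samples(self, major_class):
        return sorted(self._classes.get(major_class, {}))
```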
Step 102: collect network texts on the Internet that meet a preset condition;
The core of the present invention is to aggregate the network texts a user posts on the network and then analyze them to obtain user feature parameters hidden in them, for example aspects of writing style such as word choice; the network texts, and in turn the end users, are then classified according to the feature parameters obtained. Therefore, the preset condition in step 102 can be as minimal as requiring that the text is not empty.
Preferably, to improve the accuracy of the analysis, further collection conditions may be imposed on the collected network texts. With respect to the network text itself, a collection condition may be the word count; for example, network texts below a given number of words are not collected. A collection condition may also be whether the network text is original; for example, a text containing markers such as "reposted" or "reprinted" is not collected. A collection condition may also be whether the network text is spam, for example, a text formed by repeating one sentence many times.
Further, a collection condition may be the time at which the network text was published; for example, network texts older than a given age are not collected. Because an end user also changes over time, network texts from before a certain preset point in time may not reflect the user's latest characteristic parameters. They are therefore excluded from the data source for analysis, to improve the accuracy of the classification or clustering.
A collection condition may also be the collection scope, for example, collecting only the network texts of a certain website (such as a large forum) or of a certain type (such as blogs). First, the internet contains too many network texts and users, and it is very common for the same end user to use different user names on different websites; moreover, for websites with few visits and few registered users the data source is small, so the collection and analysis of the present invention can hardly produce accurate results. Preferably, therefore, the place where network texts are published may be restricted. Second, network texts on the internet are of many types, and judging by the analysis results some types are poorly suited to the present invention. Technical network texts, for example, may reflect an end user's characteristic parameters to some extent (for example, what work the user does) but are difficult to analyze further. Other types are well suited to the present invention: in literary or blog network texts, even when the viewpoint or the story is the same, different end users show clearly different characteristic parameters. Preferably, therefore, the type of network text may be restricted.
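The collection conditions described above (length, originality, spam, publication time and collection scope) can be sketched as a single filter. This is a minimal sketch only: the `NetworkText` record, the function names, the thresholds, the repost markers and the site name are all hypothetical, since the description leaves the concrete values open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class NetworkText:
    """Hypothetical record for one collected network text."""
    user_id: str
    body: str
    posted: date
    site: str


def is_spam(body: str, max_share: float = 0.5) -> bool:
    """Assumed heuristic: spam if one sentence makes up most of the text."""
    normalized = body.replace("。", ".")
    sentences = [s.strip() for s in normalized.split(".") if s.strip()]
    if len(sentences) < 3:
        return False
    top = max(sentences.count(s) for s in set(sentences))
    return top / len(sentences) > max_share


def meets_collection_conditions(
    text: NetworkText,
    min_chars: int = 200,                            # assumed length threshold
    repost_markers=("reposted", "reprinted"),        # assumed markers
    cutoff: date = date(2006, 1, 1),                 # assumed time limit
    allowed_sites=frozenset({"blog.example.com"}),   # assumed collection scope
) -> bool:
    """Return True only if the text passes every collection condition."""
    if len(text.body) < min_chars:
        return False
    if any(marker in text.body for marker in repost_markers):
        return False
    if is_spam(text.body):
        return False
    if text.posted < cutoff:
        return False
    return text.site in allowed_sites
```

Each condition is checked independently, so an implementation could drop or reorder any of them according to the collection policy actually chosen.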
The text sources may also include user texts gathered by a client. For example, an input method client gathers the text a user types; the text is stored on the user's local computing device or on a network server, and can serve as a text source for the present invention.
Step 103: compare the text characteristic parameters with the characteristic parameters of each sample in the sample library, and thereby classify the network text. In general, a network text is placed in the sample class with which its similarity is highest.
The way similarity is compared may differ according to the characteristic parameters chosen. For example, when the characteristic parameters are vocabulary and word frequency, the comparison may proceed as follows. First, the similarity between the vocabulary used by the user and the vocabulary of the sample is obtained by calculating the vocabulary overlap rate. Second, the word frequency distributions of the user and of the sample are compared by plotting word frequency distribution curves, in which each word is a point on the horizontal axis and the corresponding word frequency is the value on the vertical axis; this yields the word frequency distribution similarity.
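The two comparisons can be sketched as follows. The description does not fix a formula for either one, so this sketch assumes the Jaccard ratio for the vocabulary overlap rate and cosine similarity over relative word frequencies as a stand-in for comparing the two distribution curves; the function names are hypothetical.

```python
import math
from collections import Counter


def vocabulary_overlap(user: Counter, sample: Counter) -> float:
    """Assumed overlap rate: shared words divided by all distinct words."""
    user_vocab, sample_vocab = set(user), set(sample)
    union = user_vocab | sample_vocab
    if not union:
        return 0.0
    return len(user_vocab & sample_vocab) / len(union)


def frequency_similarity(user: Counter, sample: Counter) -> float:
    """Assumed curve comparison: cosine similarity of relative frequencies."""
    user_total, sample_total = sum(user.values()), sum(sample.values())
    if user_total == 0 or sample_total == 0:
        return 0.0
    words = set(user) | set(sample)
    # A Counter returns 0 for missing words, so both curves share one axis.
    u = [user[w] / user_total for w in words]
    s = [sample[w] / sample_total for w in words]
    dot = sum(x * y for x, y in zip(u, s))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in s))
    return dot / norm


# Word counts as produced by word segmentation and counting.
user_words = Counter(["apricot", "apricot", "sparrow", "branch"])
sample_words = Counter(["apricot", "sparrow", "sparrow", "moon"])
print(vocabulary_overlap(user_words, sample_words))
print(frequency_similarity(user_words, sample_words))
```

Both scores lie between 0 and 1, so they could be combined, for example by a weighted sum, into the single similarity used to pick the sample class; the weighting is left open here as it is in the description.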
If the chosen characteristic parameters change, the comparison method changes accordingly; this is described in detail later.
Preferably, if the comparison of the text characteristic parameters with the sample characteristic parameters shows high similarity with several samples, the network text may be placed in all of the corresponding sample classes; that is, in the present invention one network text may belong to several categories at once.
Preferably, depending on the characteristic parameters chosen, the network texts may be divided into deeper or finer categories. For example, network texts of the same category are subdivided by genre parameters, and network texts at the same genre parameter level are marked as one group. The genre parameters comprise word scope, expressive ability and/or logicality: word scope is calculated from the total number of words in the text and the number of non-high-frequency words in the text; expressive ability is calculated from the total number of words and the number of adjectives and adverbs; logicality is calculated from the total number of words and the number of conjunctions.
Step 104: issue the corresponding relevant information according to the classification information of the network text.
The relevant information may comprise news information, entertainment information, advertising information or recommendation information. The recommendation information may link to other related network texts of the same category, or may link to or directly display hypermedia information related to the category. The hypermedia information comprises pictures, text, audio and video files, and the like.
The issuing of relevant information is illustrated by the following examples:
Example 1: when a user sends a request to browse a network text, the classification information of the network text currently being browsed is analyzed and the corresponding relevant information is issued. For instance, the publishing server or a publishing client (including a toolbar and the like) displays a recommendation list at a preset position, recommending other network texts whose writing style is most similar to that of the text being browsed.
Example 2: when a user finishes a new piece in his or her own blog space and submits it for publication, the classification information of that network text is analyzed, and the blog server displays a recommendation list at a preset position, recommending other blog texts whose writing style is most similar to that of the submitted text. The relevant information issued may also include the classification information or characteristic parameter information of the network text.
Example 3: when a user submits a network text to a search engine (either the network text itself or the address at which it is available), the search engine analyzes the classification information of the network text, and the search engine server displays a recommendation list at a preset position, recommending other network texts whose writing style is most similar to that of the text searched for.
The examples above describe issuing recommended text information. When the relevant information is issued in a hypermedia format, forms other than a list may be used; they are not described one by one here.
The issue of described relevant information can be adopted following variety of way: to the Virtual Space issue relevant information of terminal user in network; Described Virtual Space comprises personal website, blog space or E-mail address;
Perhaps, issue at current page by publisher server;
Perhaps, issue on terminal user's computing equipment by issue client terminal;
Perhaps, issue at result of page searching by publisher server.
Referring to Fig. 2, a flow chart of the steps of embodiment 2 of the relevant information issuing method of the present invention is shown, which comprises the following steps:
Step 201: collect network texts on the internet that meet a preset condition;
Step 202: calculate the text characteristic parameters of the network texts;
Step 203: directly compare the characteristic parameters of the network texts with one another, and thereby classify the network texts;
Step 204: issue the corresponding relevant information according to the classification information of the network text.
The parts of embodiment 2 that are the same as embodiment 1 are not described again. The main difference is that embodiment 2 needs no sample library; the network texts are classified by directly comparing their characteristic parameters. In other words, the present invention does not prescribe whether a sample library is provided, and in practice the classification methods of embodiments 1 and 2 may be used in combination.
Referring to Fig. 3, a flow chart of the steps of embodiment 3 of the relevant information issuing method of the present invention is shown. The embodiment shown in Fig. 3 classifies end users and can issue relevant information to their virtual spaces. It comprises the following steps:
Step 301: collect network texts on the internet that meet a preset condition;
Step 302: calculate the text characteristic parameters of the network texts;
Step 303: compare the characteristic parameters, and thereby classify the network texts;
The parts of the above steps that are the same as the embodiments shown in Fig. 1 and Fig. 2 are not repeated; see the relevant passages above.
Step 304: compile the network texts that have the same user ID;
Step 305: classify the user ID according to the classification information of the compiled network texts of that user ID. For example, if among the network texts of a user ID 1 belongs to class A, 10 belong to class B and 1 belongs to class C, the user ID is placed in class B. In general, one user ID corresponds to one virtual or real end user, so classifying user IDs amounts to classifying end users. In the embodiment shown in Fig. 3, steps 304 and 305 thus classify the end users to whom the network texts belong.
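Steps 304 and 305 amount to grouping the classified texts by user ID and taking the most frequent class. A minimal sketch, assuming the per-text classification is already available as (user ID, class) pairs; the function name is hypothetical:

```python
from collections import Counter, defaultdict


def classify_users(text_labels):
    """Map each user ID to the class most of that user's texts fall into.

    text_labels: iterable of (user_id, category) pairs, one per classified text.
    """
    per_user = defaultdict(Counter)
    for user_id, category in text_labels:   # step 304: compile by user ID
        per_user[user_id][category] += 1
    # Step 305: the most frequent class wins. On a tie, the class seen
    # first is returned; this is a property of Counter.most_common,
    # not a rule taken from the description.
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


# The example from step 305: 1 text in class A, 10 in class B, 1 in class C.
labels = [("abc", "A")] + [("abc", "B")] * 10 + [("abc", "C")]
print(classify_users(labels))  # {'abc': 'B'}
```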
In another specific embodiment, end users may be classified as follows: a compilation step is added in which the network texts with the same user ID are merged into a single network text; the corresponding user characteristic parameters are then calculated for the merged text and compared with the characteristic parameters of each sample in the sample library; the sample with the highest similarity to the user is found, and the end user is placed in that sample class.
After this classification, the present invention may, as actually required, further subdivide the end users within one category, for example into groups.
The subdivision may be based on the end users' registration information, for example the age, address or occupation given at registration.
The subdivision may also be based on genre parameters: the end users of one category are subdivided by genre parameters, and end users at the same genre parameter level are marked as one group.
The genre parameters may comprise parameters such as word scope, expressive ability and/or logicality; they are used to assess the user's writing level and serve as the basis for subdivision. Word scope may be calculated from the total number of words in the user's compiled network texts and the number of non-high-frequency words in them. Non-high-frequency words may be identified as follows: a table of common words is preset, and a word in the user's network texts is judged to be a non-high-frequency word if its usual frequency in the common word table is below a preset threshold. Expressive ability is calculated from the total number of words in the user's compiled network texts and the number of adjectives and adverbs in them. Logicality is calculated from the total number of words in the user's compiled network texts and the number of conjunctions in them.
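The description names the inputs of each genre parameter but not the formula that combines them. The sketch below assumes plain ratios to the total word count, which is only one possible reading; the tag names ("adj", "adv", "conj"), the function name and the common word table are likewise hypothetical.

```python
def genre_parameters(tokens, pos_tags, common_word_freq, threshold):
    """Compute assumed ratio forms of word scope, expressive ability, logicality.

    tokens:           words of the user's compiled network texts
    pos_tags:         part-of-speech tag for each token (same length)
    common_word_freq: preset common word table, word -> usual frequency
    threshold:        words whose usual frequency is below this value
                      (or that are absent from the table) count as
                      non-high-frequency words
    """
    total = len(tokens)
    if total == 0:
        return {"word_scope": 0.0, "expressive_ability": 0.0, "logicality": 0.0}
    non_high = sum(1 for w in tokens if common_word_freq.get(w, 0.0) < threshold)
    adj_adv = sum(1 for t in pos_tags if t in ("adj", "adv"))
    conj = sum(1 for t in pos_tags if t == "conj")
    return {
        "word_scope": non_high / total,
        "expressive_ability": adj_adv / total,
        "logicality": conj / total,
    }


tokens = ["the", "old", "tree", "quietly", "fell", "and", "broke"]
tags = ["det", "adj", "noun", "adv", "verb", "conj", "verb"]
table = {"the": 0.05, "and": 0.03, "old": 0.001, "tree": 0.0005, "fell": 0.0002}
print(genre_parameters(tokens, tags, table, threshold=0.001))
```

Users whose three values fall within the same preset ranges would then be marked as one group.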
Step 306: issue the corresponding relevant information according to the network text the user is currently browsing. The relevant information may be information in any hypermedia format.
For example, the relevant information comprises a list of the user IDs in the category to which the network text belongs: the network text the user is currently browsing is gathered, analyzed and classified, and the other user IDs in its category are displayed as a list. The list may also be sorted by a preset rule before display.
For a simpler implementation, if the network text the user is currently browsing carries a user ID, the other user IDs in the same category may be displayed as a list directly from the classification information of that user ID. This implementation is best suited to settings in which user IDs matter, such as forums and blogs.
The relevant information may be issued to the end user's virtual space on the network; the virtual space comprises a personal website, a blog space, an e-mail box, and the like.
In step 306, the relevant information may be pushed by means such as forum private messages or the e-mail address the user has left; what is pushed may be the relevant information itself or a link to its address. The relevant information may also be issued by the server that provides the virtual space, which publishes recommendation information and the like on the user's personal website or blog page; the advantage of this approach is that the information appears directly on the user's page.
In another specific embodiment, step 306 may instead be: issue search results according to the user ID currently submitted. For example, the search results comprise other user IDs in the same category; that is, the search results can show other blogs similar in style to the blog searched for, and may link to the blog addresses. Blogs are only an example: the approach applies to displaying any kind of virtual personal space, and the search results may be presented in various ways, which the present invention does not restrict.
The description so far has used vocabulary and word frequency as the characteristic parameters. In practice, those skilled in the art may set various characteristic parameters according to the needs of the classification. Other characteristic parameters that may be used are briefly introduced below.
The characteristic parameter may also be the multi-word phrase frequency, which characterizes how often adjacent words in the text form two-word or longer phrases, and is obtained by segmenting the text into words and counting. For example, suppose the collected network text contains the sentence "When the next day arrived, Jia Zhen came to arrange everything". Word segmentation gives "arrived / next day / , / Jia Zhen / came / arrange / everything", so the two-word phrases are "arrived next day", "Jia Zhen came", "came arrange" and "arrange everything". Counting gives each two-word phrase in the collected network text and its frequency. Because different end users habitually use different collocations, the two-word phrase frequency distinguishes end users well from the standpoint of collocation. Only two-word phrases are listed above; this characteristic parameter may also cover the frequency or probability with which three or more words occur in succession.
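Counting multi-word phrases over segmented text can be sketched as below, using the Jia Zhen sentence with English glosses as tokens (an assumption made for readability). As in the example, phrases are not formed across punctuation. The function name and the punctuation set are hypothetical.

```python
from collections import Counter

# Assumed delimiter set; phrases never span one of these tokens.
PUNCTUATION = {",", ".", ";", "!", "?", "，", "。", "；", "！", "？"}


def ngram_counts(tokens, n=2):
    """Count n-word phrases formed by adjacent words within one clause."""
    counts = Counter()
    run = []
    for token in list(tokens) + [None]:      # None flushes the final clause
        if token is None or token in PUNCTUATION:
            for i in range(len(run) - n + 1):
                counts[tuple(run[i:i + n])] += 1
            run = []
        else:
            run.append(token)
    return counts


tokens = ["arrived", "next day", ",", "Jia Zhen", "came", "arrange", "everything"]
for phrase, freq in ngram_counts(tokens).items():
    print(" ".join(phrase), freq)
```

Passing `n=3` gives three-word phrases in the same way.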
The characteristic parameter may also be the vocabulary connection degree, which characterizes the number of different words adjacent to a given word, and is obtained by word segmentation and counting. For example, in the sentence "I love Beijing Tiananmen, the sun rises over Tiananmen", the vocabulary connection degree of the word "Tiananmen" is 2, because two different words are adjacent to it: "Beijing" and "over".
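The vocabulary connection degree can be sketched as the size of each word's set of distinct neighbours. The tokens below are English glosses of the Tiananmen (Tian An-men) example kept in the word order of the original Chinese sentence, which is an assumption made for readability; punctuation is assumed not to count as a neighbour, consistent with the value 2 given in the example.

```python
from collections import defaultdict

PUNCTUATION = {",", ".", ";", "!", "?", "，", "。", "；", "！", "？"}  # assumed set


def connection_degree(tokens):
    """Return, for each word, the number of different words adjacent to it."""
    tokens = list(tokens)
    neighbours = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left in PUNCTUATION or right in PUNCTUATION:
            continue  # adjacency across punctuation is ignored
        neighbours[left].add(right)
        neighbours[right].add(left)
    return {word: len(adjacent) for word, adjacent in neighbours.items()}


tokens = ["I", "love", "Beijing", "Tiananmen", ",", "Tiananmen", "over", "sun", "rises"]
print(connection_degree(tokens)["Tiananmen"])  # 2: "Beijing" and "over"
```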
The characteristic parameter may also be the part-of-speech degree, which characterizes the number of words of each part of speech in the text, and is obtained by word segmentation and counting. Some end users like to use idioms, some like two-part allegorical sayings, and some, being rigorous in logic, often use conjunctions; all of this can be examined through parts of speech. The collected network text is tagged with parts of speech automatically by a program, and the number of words of each part of speech is then counted, which yields the characteristic parameter, namely the part-of-speech degree.
The characteristic parameter may also be sentence length, obtained by word segmentation and counting. Some end users like long, Europeanized sentences, while others write short, pithy sentences of pleasingly varied length. Splitting the collected text at delimiters such as commas and full stops, and then counting and recording the distribution of sentence lengths, distinguishes such end users.
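The sentence length distribution can be sketched as follows. The delimiter set is an assumption, and length is measured in characters here for simplicity; counting words per clause after segmentation would be an equally valid reading of the description.

```python
import re
from collections import Counter

# Assumed delimiters: Western and Chinese commas, full stops and similar marks.
DELIMITERS = re.compile(r"[,.;!?，。；！？]")


def sentence_length_distribution(text: str) -> Counter:
    """Map clause length (in characters) to the number of clauses of that length."""
    clauses = (part.strip() for part in DELIMITERS.split(text))
    return Counter(len(clause) for clause in clauses if clause)


text = "Short one. A much longer clause, then short."
print(sentence_length_distribution(text))
```

Two texts can then be compared through statistics of these distributions, for example the mean length or the share of clauses above some length.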
The characteristic parameter may also be the emotion degree, obtained by counting the distribution in the text of preset words that carry emotion; it serves as an auxiliary parameter characterizing the text. For example, nouns in a text often carry different emotional coloring: "double-edged sword" has a vigorous coloring, "dark night" an oppressive one, and so on, so the emotion a text expresses can be assessed from the distribution of such words in it. As a further example from poetry, "withered vines, old trees, crows at dusk; a small bridge, flowing water, a cottage" carries strong emotional coloring; the mention of "cuckoo" in a poem brings an atmosphere of sorrow, as in "Emperor Wang entrusted his spring longing to the cuckoo"; words such as "military hardware" show an impassioned coloring; and "willow" and "orchid boat" (Lan Zhou) show a graceful and restrained one. Counting the distribution of such words in a text identifies its emotion degree.
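A minimal sketch of the emotion degree, assuming a small preset lexicon that maps emotion-bearing words to an emotion label and reporting each label's share of all words. The lexicon entries and labels below are illustrative only and are not taken from any real resource.

```python
from collections import Counter

# Hypothetical preset lexicon: emotion-bearing word -> emotion label.
EMOTION_LEXICON = {
    "cuckoo": "sorrowful",
    "dark night": "oppressive",
    "willow": "graceful",
    "orchid boat": "graceful",
}


def emotion_degree(tokens, lexicon=EMOTION_LEXICON):
    """Return each emotion label's share of the total word count."""
    tokens = list(tokens)
    if not tokens:
        return {}
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return {label: n / len(tokens) for label, n in counts.items()}


tokens = ["willow", "by", "the", "orchid boat", "a", "cuckoo", "cries", "tonight"]
print(emotion_degree(tokens))  # {'graceful': 0.25, 'sorrowful': 0.125}
```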
Of course, the above characteristic parameters may be used singly or in any combination.
When there are several characteristic parameters, the user text may be compared with the samples as follows: each characteristic parameter of the user text is taken as the coordinate of the text along one dimension of a multidimensional space, and the similarity between the user text and each sample is obtained by calculating the distance between them in that space.
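The multidimensional comparison can be sketched with Euclidean distance, which is an assumption: the description only says "distance". The feature values and sample names below are made up, and in practice each dimension would need to be scaled to a comparable range before distances are meaningful.

```python
import math


def nearest_sample(user_vector, sample_vectors):
    """Return the name of the sample closest to the user text.

    user_vector:    tuple of characteristic parameter values for the user text
    sample_vectors: dict of sample name -> tuple with the same dimensions
    """
    # math.dist (Python 3.8+) is the Euclidean distance between two points.
    return min(sample_vectors, key=lambda name: math.dist(user_vector, sample_vectors[name]))


# Hypothetical three-dimensional feature vectors.
user = (0.14, 0.15, 0.02)
samples = {
    "sample_A": (0.13, 0.16, 0.02),
    "sample_B": (0.40, 0.05, 0.10),
}
print(nearest_sample(user, samples))  # sample_A
```

A smaller distance means a higher similarity, so the user text is placed in the class of the nearest sample.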
Referring to Fig. 4, vocabulary and word frequency are used as the coordinates of each text in a two-dimensional space; the figure shows the position of each text in that space, and the similarity between texts is obtained by calculating the distance between them. In Fig. 4 each small icon represents one text, and icons of the same shape represent texts by the same writer. In the present invention, a single icon may serve as a sample, or a class of icons of the same shape may serve as a sample. The distance between two texts in Fig. 4 indicates how similar they are. The pentagon icon represents the compiled user text. Taking the user text as the center of a circle, the texts within a certain diameter are sought and the distance to each of them is calculated, giving the icon text, or class of icon texts, most similar to the user text. For example, the triangular icons in Fig. 4 represent texts by the same author and together form one sample, so the user text in Fig. 4 is placed in that sample class.
The present invention is described in more detail below with a concrete example.
Suppose the following blog network text by the author identified as "abc" has been collected from the internet:
"Only the willows could be seen trailing golden threads and the peach trees breathing out rosy clouds. Behind the rocks stood a large apricot tree; its blossom had all fallen, its leaves were thick and shady green, and on it hung many small apricots the size of beans. Baoyu (Precious Jade) thought: "I was ill for only a few days, and I have let the apricot blossom down! Before I knew it, 'green leaves make a shade and the boughs are filled with fruit'!" So he gazed up at the apricots, unwilling to leave. He remembered too that Xing Xiuyan had been betrothed; though marriage is a great matter for men and women and cannot be left undone, it still meant one fine girl the fewer. In only two years she too would be "green leaves make a shade and the boughs are filled with fruit". In a few more days the apricots would fall and the branches be bare; in a few more years Xiuyan's black hair would turn silver and her rosy face wither. He could not help feeling sad, and wept and sighed at the apricot tree. As he lamented, a sparrow suddenly flew up, settled on a branch and chirped wildly. Baoyu fell into one of his dazes again and thought to himself: "This sparrow must have come here when the apricot was in full bloom; now it sees no flowers, only fruit and leaves, and so it chirps wildly too. That sound must be the sound of weeping. What a pity Gongye Chang is not here, so that I could ask him. I wonder whether, when the tree blooms again next year, this sparrow will still remember to fly here and meet the apricot blossom?""
After word segmentation, the counts of the words of more than one character are as follows:
Apricot spends 3 to bemoan and have 1 for 1 one 1 in vain and shed tears 1
Precious jade 2 is also wanted 1 by all means 1 must 1 next year 1
2 came 1 to see that 1 has lacked 1 sound 1 rather
Become shady more than 2 days 1 and fail to live up to 1 to send out 1 again and have one 1
More than 2 years 1 two years 1 crows of greenery send out 1 before eyes 1
More than 2 days 1 mountain stone 1 of apricot 1 flies here 1 here
Think 2 bean or pea 1 sigh 1 this 1 fly to 1
Therefore though 2 unexpectedly 1 good daughter 11 is died also 1
Be out of shape 1 do not know 1 remember 1 cry 1 ask he 1
Little apricot 1 can not 1 hateful 1 remembers 1 after 1
Send out 1 not all right 1 sad 1 beauty, 1 nothing and spent 1
Red rosy clouds 11 many 1 have understood 1 above can not 1
Gongye 1 does not give up 1 men and women, 1 size 1 and has tied 1
11 apricot, 1 major issue, 1 husband 1 unavoidably afterwards
Sound 1 only 1 cotyledon 1 look up at 1
The sample library stores samples of all classes. One class of sample has Cao Xueqin as its author; that is, a number of texts by Cao Xueqin have been collected as one of the samples. The vocabulary and word frequencies obtained by the segmentation and counting above are compared with the vocabulary and word frequencies of each sample; the similarity with the Cao Xueqin sample is found to be the highest, so the end user "abc" is placed in the Cao Xueqin sample class.
All collected network texts are then put through the same process of characteristic parameter analysis, comparison and classification, so that the end users within the collection scope are classified.
For end users who have been placed in a class by the above process, the present invention may carry out further subdivision as required. For example, each end user may have filled in some personal information when registering a blog, and the end users in the class can be subdivided according to that information.
Preferably, the end users in one category may also be subdivided by other parameters, for example by genre parameters, with end users whose genre parameters fall within a certain range subdivided into one group.
The genre parameters may be obtained by the following steps:
(1) Count the following data in the segmented network text: a, the total number of words in the network text; b, the number of single-character words; c, the number of non-high-frequency words; d, the number of adjectives and adverbs; e, the number of idioms; f, the number of conjunctions.
(2) The genre parameters comprise: A, vocabulary, calculated from data a and b; B, word scope, calculated from data a and c; C, expressive ability, calculated from data a and d; D, idiom use, calculated from data a and e; E, logicality, calculated from data a and f.
Likewise, for the network text collected above, the numbers of nouns, verbs, adverbs, adjectives, conjunctions, idioms and so on are counted after word segmentation, with the following result: total words: 184; idioms: 0; distinct words: 75; characters: 330; adjectives: 7; adverbs: 7; conjunctions: 4; uncommon words: 5. From these counts the genre parameters are calculated: vocabulary: 0.339489; word scope: 0.135796; expressive ability: 0.152091; idiom use: 0.000000; logicality: 0.434546.
End users who belong to the Cao Xueqin sample class and whose genre parameters fall within a preset range can then be subdivided into one group.
The above discloses the specific technical scheme by which the present invention classifies end users. Once users are classified, many applications are possible, for example issuing relevant information targeted at each class of end user (the relevant information comprising news information, entertainment information, advertising information, recommendation information, and the like).
In particular, for blog end users, applying the present invention can generate, to the greatest extent, a blog circle well suited to the user, that is, a circle of friends who can communicate effectively (for example, a list of other users whose writing style is similar to the user's can be generated automatically). Relevant advertisements, news, offline activities and other information can also be pushed to these end users in a highly targeted way. Because the present invention classifies end users, it improves the targeting and accuracy of relevant information issuing.
Referring to Fig. 5, a structural block diagram of a relevant information issuing device is shown, which comprises the following components:
Collection unit 501, used to collect network texts on the internet that meet a preset condition;
Text characteristic parameter calculation unit 502, used to calculate the text characteristic parameters of the network texts;
Comparison and classification unit 503, used to compare the characteristic parameters and thereby classify the network texts;
Release unit 504, used to issue the corresponding relevant information according to the classification information of the network text.
The classification information used as the basis for issuing is the classification information of the network text the user is currently browsing or of the network text the user submits. The submission may be a submission for publishing a network text, or a submission for searching for network texts.
Preferably, as a specific embodiment, the device shown in Fig. 5 may further comprise a compilation module, connected to the collection unit 501 or to the text characteristic parameter calculation unit 502 and used to merge the texts with the same user ID into one network text. Subsequent calculation and classification are performed on the merged text, which classifies the end users.
Preferably, as another specific embodiment, the release unit 504 may comprise: a compilation module, used to compile the network texts with the same user ID; an end user classification module, used to classify the end user according to the classification information of the compiled network texts of that end user; and a release module, used to issue the corresponding relevant information according to the network text the user is currently browsing. For example, the relevant information may comprise a list of the user IDs in the category to which the network text belongs.
Further, as yet another specific embodiment, the release unit 504 may comprise: a compilation module, used to compile the network texts with the same user ID; an end user classification module, used to classify the end user according to the classification information of the compiled network texts of that end user; and a release module, used to issue search results according to the user ID currently submitted. For example, the search results comprise other user IDs in the same category.
Of course, the compilation module and the end user classification module may also be placed in other units or form a separate unit; the present invention only needs to define their functions and connections.
Described relevant information in the device shown in Figure 5 can comprise: recommendation information, news information, entertainment information or advertising message.The issue of above-mentioned relevant information can show that described recommendation information can be used for linking other network of relation texts of same classification by modes such as tabulation, rolling windows, perhaps the user's of same classification various hypermedia information.
Preferably, described comparison is sorted out unit 503 and can be comprised: the sample characteristics parameter calculating module is used for obtaining the sample characteristics parameter at sample storehouse 507 each sample calculation that preset; Compare classifying module, be used for contrasting the characteristic parameter of text feature parameter and each sample of sample storehouse, finish the classification of network text successively.
Certainly, described comparison is sorted out unit 503 and also can only be comprised: compare classifying module, be used for directly comparing the characteristic parameter of each network text, finish the classification of network text successively.
Wherein, described characteristic parameter can comprise vocabulary and corresponding word frequency, and described vocabulary and corresponding word frequency are by obtaining text participle statistics.
Preferably, the characteristic parameters may further include:
polynary phrase frequency, which characterizes the frequency with which adjacent words in the text form two-word or multi-word phrases, obtained by word-segmentation statistics on the text;
or vocabulary connection degree, which characterizes the number of different words adjacent to a given word, obtained by word-segmentation statistics;
or part of speech degree, which characterizes the number of words of each part of speech in the text, obtained by word-segmentation statistics;
or sentence length, obtained by word-segmentation statistics;
or emotion degree, obtained by counting the distribution in the text of preset emotion-bearing words.
Of course, the above characteristic parameters may be used individually or in any combination.
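The additional parameters listed above can be sketched with simple counting over a segmented text. The inputs are assumed to be pre-segmented (and, for part of speech, pre-tagged) token lists; the function names, the tag labels, and the choice of averages and ratios as the concrete statistics are assumptions of this example.

```python
from collections import Counter, defaultdict


def bigram_frequencies(tokens):
    """Count how often each pair of adjacent words occurs (two-word phrases)."""
    return Counter(zip(tokens, tokens[1:]))


def connection_degree(tokens):
    """For each word, count the distinct words that appear next to it."""
    neighbours = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        neighbours[left].add(right)
        neighbours[right].add(left)
    return {word: len(adjacent) for word, adjacent in neighbours.items()}


def pos_counts(tagged_tokens):
    """Count words per part of speech, given (word, tag) pairs."""
    return Counter(tag for _, tag in tagged_tokens)


def mean_sentence_length(sentences):
    """Average number of words per sentence; each sentence is a token list."""
    if not sentences:
        return 0.0
    return sum(len(sentence) for sentence in sentences) / len(sentences)


def emotion_degree(tokens, emotion_lexicon):
    """Share of tokens that appear in a preset list of emotion words."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in emotion_lexicon)
    return hits / len(tokens)


# Hypothetical segmented text.
tokens = ["i", "love", "this", "team", "i", "love", "football"]
bigrams = bigram_frequencies(tokens)
degrees = connection_degree(tokens)
emotion = emotion_degree(tokens, {"love", "hate"})
```

Each function returns an independent statistic, so the results can be used alone or merged into one feature vector, matching the note that the parameters may be combined freely.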
The issuing device shown in Fig. 5 may further include a classification segmentation unit 505, used to subdivide the network texts of the same category by genre parameters and to mark network texts at the same genre-parameter level as the same group. The genre parameters include word scope, ability to express and/or logicality: the word scope is calculated from the total number of words in the text and the number of non-high-frequency words in the text; the ability to express is calculated from the total number of words in the text and the number of adjectives and adverbs in the text; and the logicality is calculated from the total number of words in the text and the number of conjunctions in the text.
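The three genre parameters can be sketched as ratios over the total word count. The description names the two quantities each parameter is calculated from but not the formula, so the simple ratio used here is an assumption, as are the tag labels `ADJ`, `ADV` and `CONJ` and the high-frequency word list.

```python
def genre_parameters(tagged_tokens, high_frequency_words):
    """Compute word scope, ability to express, and logicality as ratios.

    tagged_tokens: list of (word, tag) pairs from a segmenter and tagger.
    high_frequency_words: preset set of common words.
    """
    total = len(tagged_tokens)
    if total == 0:
        return {"word_scope": 0.0, "ability_to_express": 0.0, "logicality": 0.0}
    rare = sum(1 for word, _ in tagged_tokens
               if word not in high_frequency_words)
    modifiers = sum(1 for _, tag in tagged_tokens if tag in ("ADJ", "ADV"))
    conjunctions = sum(1 for _, tag in tagged_tokens if tag == "CONJ")
    return {
        "word_scope": rare / total,               # non-high-frequency words
        "ability_to_express": modifiers / total,  # adjectives and adverbs
        "logicality": conjunctions / total,       # conjunctions
    }


# Hypothetical tagged text of eight words.
tagged = [
    ("the", "DET"), ("weather", "NOUN"), ("was", "VERB"),
    ("surprisingly", "ADV"), ("mild", "ADJ"), ("but", "CONJ"),
    ("windy", "ADJ"), ("today", "NOUN"),
]
params = genre_parameters(tagged, high_frequency_words={"the", "was", "but"})
```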
The classification segmentation unit 505 may also be used to subdivide users of the same category according to their registration information. In short, the classification segmentation unit 505 is mainly used to further subdivide users of the same category; the basis for the subdivision can be set as actually required.
The clustering device shown in Fig. 5 may further include a sample update unit 506, used to update the samples in the sample library as the classification settings require. By configuring the samples, terminal users can be clustered at various levels.
In the clustering device shown in Fig. 5, the release unit 504 may be a publisher server used to issue relevant information to the terminal user's virtual space on the network; the virtual space includes a personal website, a blog space, or an e-mail box. The release unit 504 may also be a publisher server used to issue relevant information on the current page. The release unit 504 may also be an issuing client (including a toolbar, a browser plug-in, etc.) used to issue relevant information on the terminal user's computing device. The release unit 504 may also be a publisher server used to issue relevant information on the search results page.
Because the invention has already been described in great detail in the description of Fig. 1, part of the content is omitted from the description of Fig. 5; for the parts not detailed here, refer to the relevant parts of the description of Fig. 1.
The method and device for issuing relevant information provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the invention.
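Subdivision within one category can be sketched as grouping by a key, where the key is either a genre-parameter level or a registration field. The level boundaries, the field names `logicality` and `city`, and the helper names are hypothetical; the description leaves the basis of subdivision open.

```python
import bisect
from collections import defaultdict


def level_of(value, boundaries):
    """Map a score to a level index using sorted boundary values."""
    return bisect.bisect_right(boundaries, value)


def subdivide(members, key):
    """Group the user IDs of one category by the value that `key` returns."""
    groups = defaultdict(list)
    for member in members:
        groups[key(member)].append(member["user_id"])
    return dict(groups)


# Hypothetical users who all belong to the same category.
users = [
    {"user_id": "u1", "logicality": 0.05, "city": "Beijing"},
    {"user_id": "u2", "logicality": 0.20, "city": "Shanghai"},
    {"user_id": "u3", "logicality": 0.22, "city": "Beijing"},
]
boundaries = [0.1, 0.3]  # assumed cut points between three levels

by_level = subdivide(users, lambda u: level_of(u["logicality"], boundaries))
by_city = subdivide(users, lambda u: u["city"])
```

Passing the grouping rule as a function keeps the basis of subdivision configurable, which matches the statement that it can be set as required.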

Claims (32)

1. A method for issuing relevant information, characterized by comprising:
Step a: collecting non-knowledge-type network texts on the Internet that meet a preset condition;
Step b: calculating the user characteristic parameters corresponding to each network text; wherein the user characteristic parameters comprise the user's genre parameters, polynary phrase frequency, vocabulary connection degree, part of speech degree, sentence length and/or emotion degree; the genre parameters comprise word scope, ability to express and/or logicality;
wherein the polynary phrase frequency characterizes the frequency with which adjacent words in the text form two-word or multi-word phrases, and is obtained by word-segmentation statistics;
the vocabulary connection degree characterizes the number of different words adjacent to a given word, and is obtained by word-segmentation statistics;
the part of speech degree characterizes the number of words of each part of speech in the text, and is obtained by word-segmentation statistics;
the sentence length is obtained by word-segmentation statistics;
the emotion degree is obtained by counting the distribution in the text of preset emotion-bearing words;
the word scope is calculated from the total number of words in the network text and the number of non-high-frequency words in the network text;
the ability to express is calculated from the total number of words in the network text and the number of adjectives and adverbs in the network text;
the logicality is calculated from the total number of words in the network text and the number of conjunctions in the network text;
Step c: comparing the user characteristic parameters, thereby classifying the network texts one by one;
Step d: when relevant information needs to be issued according to a network text, issuing the relevant information according to the classification information of that network text.
2. the method for claim 1 is characterized in that, described steps d comprises:
When browsing or when submitting the network text issue relevant information of delivering to according to the user is current, gather that the user is current to browse or submit to the network text of delivering, according to the current classification information of browsing or submitting the network text of delivering to of this user, issue corresponding relevant information, corresponding relevant information comprises with this user is current browses or submits to other similar network texts of network text of delivering.
3. the method for claim 1 is characterized in that, described steps d comprises:
In the time need issuing relevant information according to the network text that the user submits to search engine, receive the network text that the user submits to search engine, according to the classification information of this user to the network text of search engine submission, issue corresponding relevant information, corresponding relevant information comprises other similar network texts of network text of submitting to search engine with this user.
4. the method for claim 1 is characterized in that, also comprises:
According to user ID, compile network text with same user ID;
According to the classification information of the network text that compiles the same user ID in back, user ID is sorted out;
Described steps d comprises:
In the time need issuing relevant information according to the current network text of browsing, gather the current network text of browsing of user, classification information according to the current network text of browsing of this user, issue relevant information, described relevant information comprise the user ID tabulation of the affiliated classification of the current network text of browsing of this user.
5. the method for claim 1 is characterized in that, also comprise before the described step b: compiling the text with same user ID is a network text;
The described corresponding user characteristics parameter of network text that calculates of described step b comprises:
The network text of unifiedly calculating after described compiling obtains corresponding user characteristics parameter.
6. as claim 1,2,3,4 or 5 described methods, it is characterized in that described relevant information comprises:
Recommendation information, news information, entertainment information or advertising message.
7. the method for claim 1 is characterized in that, described step c comprises:
Preset the sample storehouse, obtain the described user characteristics parameter of sample at each sample calculation;
The user characteristics parameter of each sample is finished the classification of network text successively in the user characteristics parameter of contrast network text and the sample storehouse.
8. the method for claim 1 is characterized in that, described step c comprises:
Directly the described user characteristics parameter of each network text of comparison is finished the classification of network text successively.
9. the method for claim 1 is characterized in that, described user characteristics parameter comprises vocabulary and corresponding word frequency, and described vocabulary and corresponding word frequency are by obtaining text participle statistics.
10. method as claimed in claim 4 is characterized in that, also comprises:
Carry out the segmentation of genre parameters at the network text of same classification, the user ID that will belong to same genre parameters level is labeled as same group.
11. method as claimed in claim 4 is characterized in that, also comprises:
According to log-on message, the user ID of same classification is segmented.
12. method as claimed in claim 7 is characterized in that, also comprises:
Needs according to classification is provided with upgrade the sample in the described sample storehouse.
13. the method for claim 1 is characterized in that, being issued as of described relevant information:
To the Virtual Space issue relevant information of terminal user in network; Described Virtual Space comprises personal website, blog space or E-mail address;
Perhaps, issue at current page by publisher server;
Perhaps, issue on terminal user's computing equipment by issue client terminal;
Perhaps, issue at result of page searching by publisher server.
14. A device for issuing relevant information, characterized by comprising:
a collector unit, used to collect non-knowledge-type network texts on the Internet that meet a preset condition;
a text characteristic parameter calculation unit, used to calculate the user characteristic parameters corresponding to each network text; wherein the user characteristic parameters comprise the user's genre parameters, polynary phrase frequency, vocabulary connection degree, part of speech degree, sentence length and/or emotion degree; the genre parameters comprise word scope, ability to express and/or logicality;
wherein the polynary phrase frequency characterizes the frequency with which adjacent words in the text form two-word or multi-word phrases, and is obtained by word-segmentation statistics;
the vocabulary connection degree characterizes the number of different words adjacent to a given word, and is obtained by word-segmentation statistics;
the part of speech degree characterizes the number of words of each part of speech in the text, and is obtained by word-segmentation statistics;
the sentence length is obtained by word-segmentation statistics;
the emotion degree is obtained by counting the distribution in the text of preset emotion-bearing words;
the word scope is calculated from the total number of words in the network text and the number of non-high-frequency words in the network text;
the ability to express is calculated from the total number of words in the network text and the number of adjectives and adverbs in the network text;
the logicality is calculated from the total number of words in the network text and the number of conjunctions in the network text;
a comparison classifying unit, used to compare the user characteristic parameters, thereby classifying the network texts one by one;
a release unit, used to issue relevant information according to the classification information of a network text when relevant information needs to be issued according to that network text.
15. The device of claim 14, characterized in that the release unit is specifically used, when relevant information needs to be issued according to the network text that the user is currently browsing or submitting, to collect that network text and issue corresponding relevant information according to its classification information, the corresponding relevant information comprising other network texts similar to the network text that the user is currently browsing or submitting.
16. The device of claim 15, characterized in that the submitting comprises: submitting a network text for publication; or submitting a network text in order to search for similar network texts.
17. The device of claim 14, characterized by further comprising:
a collection module, used to aggregate, according to user ID, the network texts having the same user ID;
a terminal-user classifying module, used to classify the user ID according to the classification information of the aggregated network texts having the same user ID;
wherein the release unit comprises:
a release module, used, when relevant information needs to be issued according to the network text currently being browsed, to collect the network text that the user is currently browsing and issue relevant information according to the classification information of that network text, the relevant information comprising a list of the user IDs in the category to which that network text belongs.
18. The device of claim 14, characterized by further comprising:
a collection module, used to aggregate, according to user ID, the texts having the same user ID into one network text;
wherein the text characteristic parameter calculation unit is specifically used to perform a unified calculation on the aggregated network text to obtain the corresponding user characteristic parameters.
19. The device of claim 14, 15, 16, 17 or 18, characterized in that the relevant information comprises:
recommendation information, news information, entertainment information, or advertising information.
20. The device of claim 14, characterized in that the comparison classifying unit comprises:
a sample characteristic parameter calculation module, used to calculate the user characteristic parameters of each sample in the preset sample library;
a comparison classifying module, used to compare the user characteristic parameters of the network text with the user characteristic parameters of each sample in the sample library, thereby classifying the network texts one by one.
21. The device of claim 14, characterized in that the comparison classifying unit comprises:
a comparison classifying module, used to compare the characteristic parameters of the network texts directly with one another, thereby classifying the network texts one by one.
22. The device of claim 14, characterized in that the user characteristic parameters comprise vocabulary and the corresponding word frequencies, which are obtained by word-segmentation statistics on the text.
23. The device of claim 17, characterized by further comprising:
a classification segmentation unit, used to subdivide the network texts of the same category by genre parameters and to mark the user IDs at the same genre-parameter level as the same group.
24. The device of claim 17, characterized by further comprising:
a classification segmentation unit, used to subdivide the user IDs of the same category according to registration information.
25. The device of claim 20, characterized in that the comparison classifying unit further comprises:
a sample update module, used to update the samples in the sample library as the classification settings require.
26. The device of claim 14, characterized in that the release unit is:
a publisher server, used to issue relevant information to the terminal user's virtual space on the network, the virtual space comprising a personal website, a blog space, or an e-mail box;
or a publisher server, used to issue relevant information on the current page;
or an issuing client, used to receive feedback information from the server and to issue relevant information on the terminal user's computing device;
or a publisher server, used to issue relevant information on the search results page.
27. A method for issuing relevant information, characterized by comprising:
Step a: collecting non-knowledge-type network texts on the Internet that meet a preset condition;
Step b: calculating the user characteristic parameters corresponding to each network text; wherein the user characteristic parameters comprise the user's genre parameters, polynary phrase frequency, vocabulary connection degree, part of speech degree, sentence length and/or emotion degree; the genre parameters comprise word scope, ability to express and/or logicality;
wherein the polynary phrase frequency characterizes the frequency with which adjacent words in the text form two-word or multi-word phrases, and is obtained by word-segmentation statistics;
the vocabulary connection degree characterizes the number of different words adjacent to a given word, and is obtained by word-segmentation statistics;
the part of speech degree characterizes the number of words of each part of speech in the text, and is obtained by word-segmentation statistics;
the sentence length is obtained by word-segmentation statistics;
the emotion degree is obtained by counting the distribution in the text of preset emotion-bearing words;
the word scope is calculated from the total number of words in the network text and the number of non-high-frequency words in the network text;
the ability to express is calculated from the total number of words in the network text and the number of adjectives and adverbs in the network text;
the logicality is calculated from the total number of words in the network text and the number of conjunctions in the network text;
Step c: comparing the user characteristic parameters, thereby classifying the network texts one by one;
Step d: aggregating, according to user ID, the network texts having the same user ID; and classifying the user ID according to the classification information of the aggregated network texts having the same user ID;
Step e: when a user ID is submitted to a search engine, issuing relevant information according to the classification information of the submitted user ID; the relevant information comprising the user IDs of other users in the category to which that user ID belongs, and/or the link addresses of those other users' virtual personal spaces.
28. The method of claim 27, characterized by further comprising:
subdividing the network texts of the same category by genre parameters, and marking the user IDs at the same genre-parameter level as the same group.
29. The method of claim 27, characterized by further comprising:
subdividing the user IDs of the same category according to registration information.
30. A device for issuing relevant information, characterized by comprising:
a collector unit, used to collect non-knowledge-type network texts on the Internet that meet a preset condition;
a text characteristic parameter calculation unit, used to calculate the user characteristic parameters corresponding to each network text; wherein the user characteristic parameters comprise the user's genre parameters, polynary phrase frequency, vocabulary connection degree, part of speech degree, sentence length and/or emotion degree; the genre parameters comprise word scope, ability to express and/or logicality;
wherein the polynary phrase frequency characterizes the frequency with which adjacent words in the text form two-word or multi-word phrases, and is obtained by word-segmentation statistics;
the vocabulary connection degree characterizes the number of different words adjacent to a given word, and is obtained by word-segmentation statistics;
the part of speech degree characterizes the number of words of each part of speech in the text, and is obtained by word-segmentation statistics;
the sentence length is obtained by word-segmentation statistics;
the emotion degree is obtained by counting the distribution in the text of preset emotion-bearing words;
the word scope is calculated from the total number of words in the network text and the number of non-high-frequency words in the network text;
the ability to express is calculated from the total number of words in the network text and the number of adjectives and adverbs in the network text;
the logicality is calculated from the total number of words in the network text and the number of conjunctions in the network text;
a comparison classifying unit, used to compare the user characteristic parameters, thereby classifying the network texts one by one;
a collection module, used to aggregate, according to user ID, the network texts having the same user ID;
a terminal-user classifying module, used to classify the user ID according to the classification information of the aggregated network texts having the same user ID;
a release module, used, when relevant information needs to be issued according to the user ID currently submitted to a search engine, to issue relevant information according to the classification information of that user ID; the relevant information comprising the user IDs of other users in the same category as the submitted user ID, and/or the link addresses of those other users' virtual personal spaces.
31. The device of claim 30, characterized by further comprising:
a classification segmentation unit, used to subdivide the network texts of the same category by genre parameters and to mark the user IDs at the same genre-parameter level as the same group.
32. The device of claim 30, characterized by further comprising:
a classification segmentation unit, used to subdivide the user IDs of the same category according to registration information.
CN200710000966A 2007-01-15 2007-01-15 Method and device for issuing correlation information Active CN101000627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200710000966A CN101000627B (en) 2007-01-15 2007-01-15 Method and device for issuing correlation information

Publications (2)

Publication Number Publication Date
CN101000627A CN101000627A (en) 2007-07-18
CN101000627B true CN101000627B (en) 2010-05-19

Family

ID=38692599

Country Status (1)

Country Link
CN (1) CN101000627B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101420313B (en) * 2007-10-22 2011-01-12 北京搜狗科技发展有限公司 Method and system for clustering customer terminal user group
CN101520784B (en) * 2008-02-29 2011-09-28 富士通株式会社 Information issuing system and information issuing method
CN101324948B (en) * 2008-07-24 2015-11-25 阿里巴巴集团控股有限公司 A kind of method of information recommendation and device
CN102486771B (en) * 2009-11-30 2015-07-08 国际商业机器公司 Method and system for releasing specified contents on webpage
CN103793495B (en) * 2010-12-07 2017-06-16 北京奇虎科技有限公司 Application message search method and system and application message acquisition methods and system
CN102609422A (en) * 2011-01-25 2012-07-25 阿里巴巴集团控股有限公司 Class misplacing identification method and device
CN102821058B (en) * 2012-07-18 2016-06-08 上海量明科技发展有限公司 The method that realizes of circle map, client and system in instant messaging
CN102855282B (en) * 2012-08-01 2018-10-16 北京百度网讯科技有限公司 A kind of document recommendation method and device
CN104123291B (en) * 2013-04-25 2017-09-12 华为技术有限公司 A kind of method and device of data classification
CN104794245B (en) * 2015-05-14 2018-07-13 百度在线网络技术(北京)有限公司 Information search method and device
CN104992182A (en) * 2015-06-29 2015-10-21 北京京东尚科信息技术有限公司 Method and device for determining user level
CN106126566B (en) * 2016-06-17 2019-08-23 武汉斗鱼网络科技有限公司 A kind of text list Rich Media breviary methods of exhibiting and device
CN106776808A (en) * 2016-11-23 2017-05-31 百度在线网络技术(北京)有限公司 Information data offering method and device based on artificial intelligence
CN107391723B (en) * 2017-07-31 2020-05-22 戴智伟 Method and system for automatically searching, classifying and redistributing information
CN110458236A (en) * 2019-08-14 2019-11-15 有米科技股份有限公司 A kind of Advertising Copy style recognition methods and system
CN112163585B (en) * 2020-11-10 2023-11-10 上海七猫文化传媒有限公司 Text auditing method and device, computer equipment and storage medium
CN113190683B (en) * 2021-07-02 2021-09-17 平安科技(深圳)有限公司 Enterprise ESG index determination method based on clustering technology and related product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1612135A (en) * 2003-10-30 2005-05-04 中联绿盟信息技术(北京)有限公司 Invasion detection (protection) product and firewall product protocol identifying technology
CN1629844A (en) * 2003-12-15 2005-06-22 微软公司 Dynamic content clustering
CN1845104A (en) * 2006-05-22 2006-10-11 赵开灏 System and method for intelligent retrieval and processing of information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant