CN102968439A

CN102968439A - Method and device for sending microblogs

Info

Publication number: CN102968439A
Application number: CN2012103850367A
Authority: CN
Inventors: 伏圣国
Original assignee: Weimeng Chuangke Network Technology China Co Ltd
Current assignee: Weimeng Chuangke Network Technology China Co Ltd
Priority date: 2012-10-11
Filing date: 2012-10-11
Publication date: 2013-03-13
Anticipated expiration: 2032-10-11
Also published as: CN102968439B

Abstract

The invention discloses a method and a device for pushing microblogs, which solve the prior-art problem that microblogs reflecting certain hot topics cannot be pushed to the corresponding users in time. In the method, the keywords in each microblog received within a set time interval are determined, and keyword sets are determined from them; incremental clustering is performed on the determined keyword sets according to the keywords contained in the intersection and the union of every two keyword sets; for each clustered keyword set obtained, when the hot topic library contains no hot topic keyword set whose similarity to the clustered keyword set is greater than a set similarity, the microblogs related to the clustered keyword set are pushed to the relevant users. Because the number of clusters does not need to be preset, no hot topic keyword set within the set time interval is omitted, and the microblogs reflecting hot topics can be pushed to the corresponding users in time.

Description

Method and device for pushing microblogs

Technical field

The present invention relates to the field of communication technologies, and in particular to a method and a device for pushing microblogs.

Background technology

In recent years, with the spread of the Internet, online media have come to be recognized as the "fourth medium" after newspapers, radio and television, and the Internet has become one of the main carriers of public opinion on social hot topics. With the rise and development of microblogs in particular, their immediacy, rapid propagation and convenience have further promoted the development of online public opinion, and public opinion on microblogs has become one of the most influential forms of online public opinion.

Through microblogs, a user can both post public opinion that he or she has discovered and forward microblogs posted by other users. For a hot topic capable of attracting the attention of a large number of users, once it is posted on a microblog, that microblog will be forwarded and followed by a large number of users within a short time. Therefore, relevant departments and enterprises have begun to pay attention to the hot topics reflected in microblogs so that they can respond in time. For example, when information that a critically ill baby is being taken to hospital is posted on a microblog, the microblog may be forwarded in large numbers within a short time and attract the attention of the traffic control department, which can then promptly take corresponding measures, such as clearing the road for the vehicle carrying the baby, to ensure that the baby reaches the hospital in time.

However, because the amount of microblog information is huge, it is very difficult to determine the hot topics reflected in massive numbers of microblogs by manual methods alone. How to determine the hot topics reflected in massive numbers of microblogs has therefore become an urgent problem.

In the prior art, a text clustering technique based on the k-means algorithm is mainly used to determine the hot topics reflected in massive numbers of microblogs, and at least one microblog reflecting such a hot topic is pushed to the corresponding users, who may specifically be relevant departments or relevant enterprises.

Text clustering based on the k-means algorithm requires the number of clusters to be preset; that is, the number of hot topics reflected in the microblogs must be preset before the microblogs can be clustered with the k-means algorithm. Each cluster obtained is a cluster of microblogs reflecting one hot topic, so the number of hot topics determined equals the preset number of clusters.

However, the number of hot topics reflected in massive numbers of microblogs often cannot be estimated. If the preset number of clusters is too small, some hot topics reflected in the microblogs will be omitted, so that the microblogs reflecting the omitted hot topics cannot be pushed to the corresponding users in time.

Summary of the invention

The embodiments of the invention provide a method and a device for pushing microblogs, in order to solve the prior-art problem that microblogs reflecting certain hot topics cannot be pushed to the corresponding users in time.

A method for pushing microblogs provided by an embodiment of the invention comprises:

receiving each microblog posted within a set time interval, and determining the keywords in each received microblog;

determining keyword sets from the determined keywords by a set method, and determining all keyword sets that can be determined by the set method, wherein the set method is: selecting any two of the keywords to form one keyword set;

performing incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two of the determined keyword sets, to obtain clustered keyword sets;

for each clustered keyword set obtained, judging whether the hot topic library contains a hot topic keyword set whose similarity to the clustered keyword set is greater than a set similarity; and, when it does not, selecting from the received microblogs the microblogs related to the clustered keyword set and pushing them to the relevant users, and saving the clustered keyword set in the hot topic library as a hot topic keyword set.

A device for pushing microblogs provided by an embodiment of the invention comprises:

a receiving and word segmentation module, configured to receive each microblog posted within a set time interval and determine the keywords in each received microblog;

a keyword set determination module, configured to determine keyword sets from the determined keywords by a set method, and to determine all keyword sets that can be determined by the set method, wherein the set method is: selecting any two of the keywords to form one keyword set;

an incremental clustering module, configured to perform incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two of the determined keyword sets, to obtain clustered keyword sets;

a judging and pushing module, configured to judge, for each clustered keyword set obtained, whether the hot topic library contains a hot topic keyword set whose similarity to the clustered keyword set is greater than a set similarity, and, when it does not, to select from the received microblogs the microblogs related to the clustered keyword set and push them to the relevant users, and to save the clustered keyword set in the hot topic library as a hot topic keyword set.

The embodiments of the invention provide a method and a device for pushing microblogs. The method determines the keywords in each microblog received within a set time interval, determines keyword sets by selecting any two of the keywords to form a keyword set, and performs incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two keyword sets. For each clustered keyword set obtained, when the hot topic library contains no hot topic keyword set whose similarity to the clustered keyword set is greater than the set similarity, the microblogs related to the clustered keyword set are pushed to the relevant users. With this method, the clustered keyword sets obtained are the hot topic keyword sets corresponding to the hot topics reflected in the microblogs posted within the set time interval. Because the number of clusters does not need to be preset during clustering, no hot topic keyword set within the set time interval is omitted, and the microblogs reflecting hot topics can be pushed to the corresponding users in time.

Description of drawings

Fig. 1 shows the process of pushing microblogs provided by an embodiment of the invention;

Fig. 2 is a schematic structural diagram of the device for pushing microblogs provided by an embodiment of the invention.

Embodiment

The embodiments of the invention are described in detail below with reference to the accompanying drawings.

Fig. 1 shows the process of pushing microblogs provided by an embodiment of the invention, which specifically comprises the following steps:

S101: Receive each microblog posted within a set time interval, and determine the keywords in each received microblog.

In the embodiment of the invention, at the end of each set time interval, the server determines the keywords contained in each microblog received during that interval. The set time interval can be set as required, for example to 2 hours.

To determine the keywords contained in each received microblog, word segmentation may be performed on each received microblog, and the segments of specified types among the resulting segments are taken as the keywords. Specifically, stop words are first removed from the segments obtained by word segmentation; each remaining segment is then matched against the segments in a pre-saved dictionary of the specified types, and if the match succeeds, the segment is of a specified type and is determined to be a keyword. The specified types include part-of-speech types such as nouns, verbs and adjectives.
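The keyword determination of step S101 can be sketched as follows. This is only a minimal illustration: the whitespace tokenizer stands in for a real word segmenter, and the stop-word list, the dictionary contents and the names `segment` and `extract_keywords` are hypothetical and do not come from the patent.

```python
# Hypothetical stop-word list and part-of-speech dictionary (illustration only).
STOP_WORDS = {"the", "a", "to", "is", "of"}
TYPE_DICTIONARY = {
    "noun": {"baby", "hospital", "traffic"},
    "verb": {"rushed", "blocked"},
    "adjective": {"critical"},
}


def segment(text):
    """Stand-in for a real word segmenter: lowercase and split on whitespace."""
    return text.lower().split()


def extract_keywords(microblog):
    """Return the segments that survive stop-word removal and match the dictionary."""
    allowed = set().union(*TYPE_DICTIONARY.values())
    keywords = []
    for token in segment(microblog):
        if token in STOP_WORDS:
            continue  # stop words are removed first
        if token in allowed:
            keywords.append(token)  # the segment is of a specified type
    return keywords


print(extract_keywords("The critical baby is rushed to hospital"))
# ['critical', 'baby', 'rushed', 'hospital']
```

For Chinese microblogs, a dedicated segmenter would replace `segment`; the stop-word and dictionary matching would stay the same.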

S102: Determine keyword sets from the determined keywords by a set method, and determine all keyword sets that can be determined by the set method.

The set method is: selecting any two of the keywords to form one keyword set.

For example, suppose the keywords determined in step S101 from the microblogs received within the set time interval are keyword X, keyword Y and keyword Z. By selecting any two keywords to form one keyword set, the server can determine the keyword sets {keyword X, keyword Y}, {keyword Y, keyword Z} and {keyword X, keyword Z}, 3 keyword sets in total.
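A minimal sketch of the set method of step S102, using the standard-library `itertools.combinations`. The function name `build_keyword_sets` is hypothetical.

```python
from itertools import combinations


def build_keyword_sets(keywords):
    """Return every two-keyword set that can be formed from the distinct keywords."""
    unique = sorted(set(keywords))
    return [frozenset(pair) for pair in combinations(unique, 2)]


keyword_sets = build_keyword_sets(["X", "Y", "Z"])
print(len(keyword_sets))  # 3
```

With n distinct keywords this yields n(n-1)/2 keyword sets, which is why the later embodiment limits the number of keywords before pairing.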

S103: Perform incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two of the determined keyword sets, to obtain clustered keyword sets.

Specifically, when performing incremental clustering on the keyword sets, the server may first sort the keyword sets according to a certain rule and then, in the sorted order, perform the following steps A to B for each keyword set in turn:

Step A: take the keyword set currently being processed as the keyword set to be clustered, and determine each keyword set ranked before the keyword set to be clustered as a preceding keyword set;

Step B: for each preceding keyword set determined, determine a first quantity, namely the number of keywords contained in the intersection of the keyword set to be clustered and the preceding keyword set, and a second quantity, namely the number of keywords contained in the union of the keyword set to be clustered and the preceding keyword set; when the ratio of the first quantity to the second quantity is greater than a set ratio, add the keywords in the keyword set to be clustered that satisfy a first specified condition to the preceding keyword set, wherein a keyword satisfying the first specified condition is a keyword that is contained in the keyword set to be clustered and not contained in the preceding keyword set.

After steps A to B have been performed for each keyword set in turn in the sorted order, the incremental clustering ends, and the keyword sets obtained after clustering are the clustered keyword sets.

Continuing the above example, the keyword sets determined in step S102 are {keyword X, keyword Y}, {keyword Y, keyword Z} and {keyword X, keyword Z}, 3 keyword sets in total. Suppose these 3 keyword sets are sorted arbitrarily into the order {keyword X, keyword Y}, {keyword Y, keyword Z}, {keyword X, keyword Z}. Then:

In the sorted order, keyword set {keyword X, keyword Y} is processed first and is taken as the keyword set to be clustered. Since no keyword set is ranked before {keyword X, keyword Y}, its processing ends, and processing continues in the sorted order with keyword set {keyword Y, keyword Z}.

Keyword set {keyword Y, keyword Z} is taken as the keyword set to be clustered. The keyword set ranked before it is {keyword X, keyword Y}, which is therefore taken as the preceding keyword set. The first quantity, the number of keywords contained in the intersection of the keyword set to be clustered {keyword Y, keyword Z} and the preceding keyword set {keyword X, keyword Y}, is 1, and the second quantity, the number of keywords contained in their union, is 3, so the ratio of the first quantity to the second quantity is 1/3. Assuming the set ratio is 1/5, this ratio is greater than the set ratio. The keyword satisfying the first specified condition is keyword Z (keyword Z is contained only in the keyword set to be clustered and not in the preceding keyword set), so keyword Z is added to the preceding keyword set {keyword X, keyword Y}, which thereby becomes {keyword X, keyword Y, keyword Z}. The processing of {keyword Y, keyword Z} then ends, and processing continues in the sorted order with {keyword X, keyword Z}.

Keyword set {keyword X, keyword Z} is taken as the keyword set to be clustered. The keyword sets ranked before it are {keyword X, keyword Y, keyword Z} and {keyword Y, keyword Z}, which are therefore taken as the preceding keyword sets. For the preceding keyword set {keyword X, keyword Y, keyword Z}, the ratio of the first quantity to the second quantity is 2/3, which is greater than the set ratio; however, all keywords in the keyword set to be clustered {keyword X, keyword Z} are already contained in {keyword X, keyword Y, keyword Z}, so there is no keyword that can be added to this preceding keyword set. For the preceding keyword set {keyword Y, keyword Z}, the ratio of the first quantity to the second quantity is 1/3, which is greater than the set ratio, and the keyword in {keyword X, keyword Z} that satisfies the first specified condition with respect to this preceding keyword set is keyword X, so keyword X is added to {keyword Y, keyword Z}, which thereby becomes {keyword X, keyword Y, keyword Z}.

At this point all 3 keyword sets have been processed and the incremental clustering ends. The 3 clustered keyword sets obtained are {keyword X, keyword Y, keyword Z}, {keyword X, keyword Y, keyword Z} and {keyword X, keyword Z}. Two of these 3 clustered keyword sets are identical; in the embodiment of the invention, when several identical clustered keyword sets are obtained, only one of them is kept.
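The worked example above can be reproduced with a short sketch of steps A to B. The function name `incremental_cluster` and the parameter name `set_ratio` are hypothetical; the logic follows the description, including the removal of identical clustered keyword sets at the end.

```python
def incremental_cluster(keyword_sets, set_ratio):
    """Run steps A and B over the keyword sets in the given order."""
    clusters = [set(s) for s in keyword_sets]
    for idx, pending in enumerate(clusters):
        # Step A: every set ranked earlier is a preceding keyword set.
        for preceding in clusters[:idx]:
            first = len(pending & preceding)   # keywords in the intersection
            second = len(pending | preceding)  # keywords in the union
            # Step B: merge the missing keywords when the ratio is large enough.
            if first / second > set_ratio:
                preceding |= pending - preceding
    # Keep only one copy of identical clustered keyword sets.
    unique = []
    for cluster in clusters:
        if cluster not in unique:
            unique.append(cluster)
    return unique


result = incremental_cluster([{"X", "Y"}, {"Y", "Z"}, {"X", "Z"}], set_ratio=1 / 5)
print([sorted(c) for c in result])  # [['X', 'Y', 'Z'], ['X', 'Z']]
```

Note that the keyword set to be clustered is never modified while it is being processed; only the preceding keyword sets grow.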

In the embodiment of the invention, each clustered keyword set obtained by incremental clustering is in fact the hot topic keyword set corresponding to a hot topic reflected in the microblogs posted within the set time interval.

S104: For each clustered keyword set obtained, judge whether the hot topic library contains a hot topic keyword set whose similarity to the clustered keyword set is greater than a set similarity; if so, perform step S105; otherwise, perform step S106.

In the embodiment of the invention, the server maintains a hot topic library, which stores the hot topic keyword set corresponding to each hot topic. The hot topic keyword set corresponding to a hot topic is simply the set formed by the keywords of that hot topic. Each time the server discovers a new hot topic, it saves the corresponding hot topic keyword set in the hot topic library.

Judging in step S104, for each clustered keyword set obtained, whether the hot topic library contains a hot topic keyword set whose similarity to the clustered keyword set is greater than the set similarity is in fact judging whether the hot topic keyword set corresponding to a hot topic reflected in the microblogs posted within the set time interval corresponds to a new hot topic.

S105: Determine, in the hot topic library, the hot topic keyword set with the greatest similarity to the clustered keyword set, and save the microblogs related to the clustered keyword set as microblogs related to the determined hot topic keyword set.

If the hot topic library contains at least one hot topic keyword set whose similarity to the clustered keyword set is greater than the set similarity, the hot topic corresponding to the clustered keyword set is not a new hot topic. In the embodiment of the invention, the hot topic keyword set with the greatest similarity to the clustered keyword set is therefore determined in the hot topic library; the hot topic corresponding to the determined hot topic keyword set can be regarded as the same as the hot topic corresponding to the clustered keyword set, so the microblogs related to the clustered keyword set are saved as microblogs related to the determined hot topic keyword set.

S106: Select, from the received microblogs, the microblogs related to the clustered keyword set and push them to the relevant users, and save the clustered keyword set in the hot topic library as a hot topic keyword set.

If the hot topic library contains no hot topic keyword set whose similarity to the clustered keyword set is greater than the set similarity, the hot topic corresponding to the clustered keyword set is a new hot topic. Therefore, the microblogs related to the clustered keyword set are selected from the microblogs received within the set time interval and pushed to the relevant users, and the clustered keyword set is saved in the hot topic library as the hot topic keyword set corresponding to the new hot topic.

As can be seen from step S106, in the embodiment of the invention, as soon as the server finds that the hot topic corresponding to a clustered keyword set obtained by incremental clustering is a new hot topic, it pushes the microblogs related to that clustered keyword set to the relevant users. Conversely, if the hot topic corresponding to a clustered keyword set is not a new hot topic, microblogs reflecting that hot topic have already been pushed to the relevant users. In that case there is no need to push the microblogs related to the clustered keyword set again, which reduces the consumption of network resources.
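The decision made in steps S104 to S106 can be sketched as below. The patent does not state how similarity between two keyword sets is computed, so the Jaccard coefficient used here is an assumption chosen for illustration; `handle_cluster`, `library` and `set_similarity` are likewise hypothetical names.

```python
def jaccard(a, b):
    """ASSUMED similarity measure: intersection size divided by union size."""
    return len(a & b) / len(a | b)


def handle_cluster(cluster, library, set_similarity):
    """Return ("push", None) for a new hot topic, else ("store", best_match)."""
    best, best_sim = None, 0.0
    for known in library:
        sim = jaccard(cluster, known)
        if sim > set_similarity and sim > best_sim:
            best, best_sim = known, sim
    if best is None:
        # S106: no sufficiently similar entry, so this is a new hot topic.
        library.append(set(cluster))
        return "push", None
    # S105: attach the related microblogs to the most similar known topic.
    return "store", best


library = [{"baby", "hospital", "traffic"}]
print(handle_cluster({"baby", "hospital"}, library, 0.5)[0])  # store
print(handle_cluster({"flood", "rescue"}, library, 0.5)[0])   # push
```

Any other set similarity could be substituted for `jaccard` without changing the control flow.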

With the above method, each clustered keyword set obtained by the server through incremental clustering is the hot topic keyword set corresponding to a hot topic reflected in the microblogs posted within the set time interval. Because the number of clusters does not need to be preset when the incremental clustering is performed, no hot topic keyword set corresponding to a hot topic reflected in the microblogs posted within the time interval is omitted, so that as soon as the server discovers a new hot topic it can push the related microblogs to the relevant users in time. Moreover, the complexity of the above incremental clustering is far lower than that of the prior-art k-means clustering algorithm, so the method for pushing microblogs provided by the embodiment of the invention further reduces the delay in pushing microblogs and improves the timeliness of pushing.

In the embodiment of the invention, the keywords determined in the received microblogs in step S101 of Fig. 1 occur with different frequencies. Therefore, to improve the accuracy of the clustered keyword sets subsequently determined to reflect a hot topic, the server determines the keyword sets in step S102 of Fig. 1 specifically as follows: for each determined keyword, according to the sum of the numbers of times the keyword occurs in the received microblogs, the number of received microblogs, and the pre-saved inverse document frequency of the keyword, the formula

$$Word_{Weight} = \frac{n_{Word}}{N} \times Idf$$

is used to determine the weight of the keyword, where $n_{Word}$ is the sum of the numbers of times the keyword occurs in the received microblogs, $N$ is the number of received microblogs, $Idf$ is the pre-saved inverse document frequency of the keyword, and $Word_{Weight}$ is the determined weight of the keyword; according to the weights of the determined keywords, a first set quantity of keywords is selected in descending order of weight, and the keyword sets are determined from the selected keywords by the set method.

The first set quantity can be set as required, for example to 200. In the above formula, $n_{Word}/N$ is in fact the term frequency of the keyword. That is, after determining the keywords in the received microblogs, the server determines the weight of each keyword; assuming the first set quantity is 200, the server takes the 200 keywords with the largest weights and, by selecting any two of these 200 keywords to form one keyword set, determines all keyword sets that can be determined in this way, giving $C_{200}^{2} = 19900$ keyword sets in total.
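A minimal sketch of the keyword weighting and selection follows. It assumes the weight is the term frequency n_Word / N multiplied by the pre-saved Idf, which is consistent with the symbol definitions above but is an assumption about the exact formula; the function names and the sample data are hypothetical.

```python
from collections import Counter
from math import comb


def keyword_weights(microblog_keywords, idf):
    """microblog_keywords: one keyword list per received microblog.
    idf: pre-saved inverse document frequency per keyword."""
    n_microblogs = len(microblog_keywords)
    # n_Word: total occurrences of each keyword across all received microblogs.
    counts = Counter(k for kws in microblog_keywords for k in kws)
    return {k: (n / n_microblogs) * idf[k] for k, n in counts.items()}


def top_keywords(weights, first_set_quantity):
    """Select the keywords with the largest weights (ties broken alphabetically)."""
    ranked = sorted(weights, key=lambda k: (-weights[k], k))
    return ranked[:first_set_quantity]


blogs = [["baby", "hospital"], ["baby"], ["traffic"]]
idf = {"baby": 1.0, "hospital": 3.0, "traffic": 0.5}
print(top_keywords(keyword_weights(blogs, idf), 2))  # ['hospital', 'baby']
print(comb(200, 2))  # 19900 keyword sets from 200 keywords
```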

Accordingly, after the keyword sets are determined and before incremental clustering is performed on them in step S103 of Fig. 1, the weight of each determined keyword set is also calculated, and only the keyword sets with the larger weights are clustered. Specifically, before incremental clustering is performed, for each determined keyword set, the mutual information of the two keywords contained in the keyword set is determined, and the weight $D_{Weight}$ of the keyword set is determined from this mutual information and the weights of the two keywords by a formula (not reproduced in this text), in which i and j denote the two keywords contained in the keyword set, the weights of keyword i and keyword j are each determined as $Word_{Weight}$ above, and $I(i, j)$ is the mutual information of keyword i and keyword j, with

$$I(i, j) = \log \frac{p(i, j)}{p(i)\,p(j)}$$

where $p(i)$ is the probability that a received microblog contains keyword i, $p(j)$ is the probability that a received microblog contains keyword j, and $p(i, j)$ is the probability that a received microblog contains both keyword i and keyword j. According to the weights of the determined keyword sets, a second set quantity of keyword sets is selected in descending order of weight. Incremental clustering is then performed on the selected keyword sets according to the keywords contained in the intersection and the union of every two of the selected keyword sets.

The second set quantity can be set as required, for example to 300. That is, after the keyword sets are determined, the weight of each keyword set is determined; assuming the second set quantity is 300, the 300 keyword sets with the largest weights are selected, and incremental clustering is performed on these 300 keyword sets according to the keywords contained in the intersection and the union of every two of them.
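The selection of keyword sets by weight can be sketched as follows. The mutual information is computed here as pointwise mutual information from p(i), p(j) and p(i, j). The formula that combines it with the two keyword weights is not available in this text, so the `combine` function below is a placeholder assumption, not the patent's formula; skipping pairs that never co-occur is also a choice made only for this sketch.

```python
from math import log


def pmi(i, j, microblogs):
    """log(p(i,j) / (p(i) p(j))) estimated over the received microblogs."""
    n = len(microblogs)
    p_i = sum(i in blog for blog in microblogs) / n
    p_j = sum(j in blog for blog in microblogs) / n
    p_ij = sum(i in blog and j in blog for blog in microblogs) / n
    if p_ij == 0:
        return float("-inf")  # the two keywords never co-occur
    return log(p_ij / (p_i * p_j))


def combine(w_i, w_j, mi):
    """PLACEHOLDER combination (assumption): scale the summed weights by the PMI."""
    return (w_i + w_j) * mi


def top_keyword_sets(pairs, weights, microblogs, second_set_quantity):
    """Return the keyword sets with the largest weights, in descending order."""
    scored = []
    for pair in pairs:
        i, j = sorted(pair)
        mi = pmi(i, j, microblogs)
        if mi == float("-inf"):
            continue  # sketch choice: drop pairs with no co-occurrence
        scored.append((combine(weights[i], weights[j], mi), [i, j]))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [set(p) for _, p in scored[:second_set_quantity]]
```

A caller would pass the pairs produced by the set method, the keyword weights and the per-microblog keyword sets, for example `top_keyword_sets(pairs, weights, microblogs, 300)`.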

In the embodiment of the invention, the method by which step S103 of Fig. 1 performs incremental clustering on the selected second set quantity of keyword sets is specifically: sorting the selected keyword sets in descending order of weight according to the weight of each selected keyword set; and, in the sorted order, performing steps A to B described above for each keyword set in turn.

This incremental clustering method is essentially the same as that of step S103 of Fig. 1 described above, except that the keyword sets are sorted in descending order of weight.

For example, suppose 3 keyword sets have been selected, namely set 1, set 2 and set 3, and that sorting in descending order of weight gives set 1, set 2, set 3. Since no keyword set is ranked before set 1, processing starts from set 2: set 2 is taken as the keyword set to be clustered and set 1 as the preceding keyword set. If the ratio of the first quantity, the number of keywords contained in the intersection of set 2 and set 1, to the second quantity, the number of keywords contained in their union, is greater than the set ratio (for example 20%), the keywords in set 2 that satisfy the first specified condition are added to set 1. Set 3 is then processed in the same way, which is not described again here.

It should be noted that, in the above example, when set 3 is processed, set 1 as a preceding keyword set is the set 1 to which the keywords of set 2 satisfying the first specified condition have already been added.

As can be seen, the incremental clustering provided by the embodiment of the invention has a definite direction: the keywords satisfying the first specified condition in a keyword set with a smaller weight are added to a keyword set with a larger weight. This is because the weight of a keyword set reflecting a hot topic is usually larger than that of a keyword set reflecting ordinary public opinion, so this incremental clustering method can determine more accurately the hot topics reflected in the microblogs received within the set time interval.

The incremental clustering method has been described above taking as an example the case in which the keywords satisfying the first specified condition in the keyword set to be clustered are added to the preceding keyword set when the ratio of the first quantity, the number of keywords contained in the intersection of the keyword set to be clustered and the preceding keyword set, to the second quantity, the number of keywords contained in their union, is greater than the set ratio. In practical applications, it is also possible to determine the first quantity as before and to take as the second quantity the smaller of the number of keywords contained in the keyword set to be clustered and the number of keywords contained in the preceding keyword set; when the ratio of the first quantity to the second quantity is greater than the set ratio, the keywords satisfying the first specified condition in the keyword set to be clustered are added to the preceding keyword set.
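The two choices of second quantity can be compared with a small helper. The name `overlap_ratio` and the flag `use_smaller_set` are hypothetical.

```python
def overlap_ratio(pending, preceding, use_smaller_set=False):
    """Ratio of the first quantity to the second quantity for two keyword sets."""
    first = len(pending & preceding)  # keywords in the intersection
    if use_smaller_set:
        second = min(len(pending), len(preceding))  # alternative second quantity
    else:
        second = len(pending | preceding)  # keywords in the union
    return first / second


print(overlap_ratio({"X", "Z"}, {"X", "Y", "Z"}))                        # 0.6666666666666666
print(overlap_ratio({"X", "Z"}, {"X", "Y", "Z"}, use_smaller_set=True))  # 1.0
```

With the smaller-set denominator, a keyword set fully contained in a much larger one always yields a ratio of 1, whereas the union-based ratio shrinks as the larger set grows.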

To further improve the accuracy with which the hot topics reflected in the microblogs received within the set time interval are determined, in the above incremental clustering method, before adding the keywords satisfying the first specified condition in the keyword set to be clustered to the preceding keyword set, the server also determines that, among the received microblogs, the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in the preceding keyword set is greater than a third set quantity. That is, the keywords satisfying the first specified condition in the keyword set to be clustered are added to the preceding keyword set only when both of the following conditions hold: the ratio of the first quantity to the second quantity is greater than the set ratio, and the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in the preceding keyword set is greater than the third set quantity. If either condition is not satisfied, the keywords are not added to the preceding keyword set.

Specifically, the pseudocode implementing the above incremental clustering method is as follows:

In the above pseudocode, cluster denotes a keyword set.
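As an illustration of the incremental clustering steps described above, here is a minimal Python sketch; it is not the pseudocode of the embodiment. The function name, parameter names, and the choice to leave a merged keyword set in the result list are assumptions made for illustration. Each microblog is represented as the set of keywords it contains, and the input keyword sets are assumed to be already sorted by weight in descending order.

```python
def incremental_cluster(keyword_sets, set_ratio, microblogs=None,
                        third_quantity=0, use_smaller_set=False):
    """Hypothetical sketch: merge each keyword set into the sets ranked before it.

    keyword_sets: list of sets, sorted by weight in descending order.
    microblogs:   optional list of keyword sets, one per received microblog;
                  when given, the co-occurrence check is applied as well.
    """
    clusters = [set(ks) for ks in keyword_sets]  # copy so the input is not mutated
    for idx in range(1, len(clusters)):
        pending = clusters[idx]                  # keyword set to be clustered
        for preceding in clusters[:idx]:         # every set ranked before it
            first = len(pending & preceding)     # first quantity: intersection size
            if use_smaller_set:
                second = min(len(pending), len(preceding))
            else:
                second = len(pending | preceding)  # second quantity: union size
            if second == 0 or first / second <= set_ratio:
                continue
            if microblogs is not None:
                merged = pending | preceding
                # microblogs containing the new keywords and every preceding keyword
                together = sum(1 for mb in microblogs if merged <= mb)
                if together <= third_quantity:
                    continue
            # add the keywords that are in pending but not yet in preceding
            preceding |= pending
    return clusters
```

For example, with the sets `{"quake", "rescue"}`, `{"rescue", "donation"}`, and `{"phone", "launch"}` and a set ratio of 0.3, the second set shares one of three union keywords with the first, so "donation" is added to the first set, while the third set is left unchanged.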

With the above incremental clustering method, once all of the selected second set quantity of keyword sets have been processed, each resulting keyword set is a keyword set to which some keywords have been added, that is, a cluster keyword set. Each cluster keyword set is the hot-topic keyword set corresponding to one of the hot public-opinion topics reflected by the microblogs published in the set time interval. Subsequently, for each cluster keyword set obtained, it can be judged whether the hot-topic library contains a hot-topic keyword set whose similarity to this cluster keyword set is greater than the set similarity, in order to judge whether the topic corresponding to this cluster keyword set is a new hot topic. The microblogs related to the cluster keyword sets that correspond to new hot topics are then pushed to the relevant users.

In the embodiments of the present invention, in step S104 shown in Figure 1, before judging, for each cluster keyword set obtained, whether the hot-topic library contains a hot-topic keyword set whose similarity to this cluster keyword set is greater than the set similarity, the cluster keyword sets obtained may also be filtered. Cluster keyword sets that contain few keywords or have few related microblogs are removed, which further improves the accuracy of the cluster keyword sets determined to reflect hot public-opinion topics. Specifically, before the judgment is made, the server extracts, from the cluster keyword sets obtained, those that satisfy a second specified condition. The second specified condition is: the number of keywords contained is not less than a fourth set quantity, and the number of microblogs related to the cluster keyword set is greater than a fifth set quantity. Here, a microblog related to the cluster keyword set is a microblog that contains at least m keywords of the cluster keyword set, where m is a sixth set quantity.

Both the fourth set quantity and the fifth set quantity can be set as needed. For example, if the fourth set quantity is 3 and the fifth set quantity is 20, then, among the cluster keyword sets obtained by incremental clustering, those that contain fewer than 3 keywords, or that have no more than 20 related microblogs, are removed.
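The filtering by the second specified condition can be sketched as follows. This is only an illustration: the function names are hypothetical, each microblog is represented as the set of keywords it contains, and the default m = 2 is an assumed value for the sixth set quantity, which the text leaves open.

```python
def related_microblogs(cluster, microblogs, m):
    """Microblogs that contain at least m keywords of the cluster keyword set."""
    return [mb for mb in microblogs if len(cluster & mb) >= m]


def filter_clusters(clusters, microblogs, fourth=3, fifth=20, m=2):
    """Keep the cluster keyword sets that satisfy the second specified condition."""
    kept = []
    for cluster in clusters:
        enough_keywords = len(cluster) >= fourth  # not less than the fourth set quantity
        enough_microblogs = len(related_microblogs(cluster, microblogs, m)) > fifth
        if enough_keywords and enough_microblogs:
            kept.append(cluster)
    return kept
```

With the example values above (fourth set quantity 3, fifth set quantity 20), a cluster keyword set with three keywords and 21 related microblogs is kept, while one with two keywords, or one with exactly 20 related microblogs, is removed.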

It should be noted that the microblogs related to the cluster keyword sets of different hot topics may include the same microblog. To avoid pushing the same microblog to the relevant users repeatedly and wasting network resources, the microblogs related to each cluster keyword set may be determined in the embodiments of the present invention as follows: re-determine the weight of each cluster keyword set obtained; then, in descending order of weight, for each cluster keyword set in turn, take as the microblogs related to this cluster keyword set those microblogs that contain at least m keywords of the set and have not already been determined to be related to another cluster keyword set.

For example, suppose three cluster keyword sets are obtained, namely set 1, set 2, and set 3, and that after their weights are re-determined, the descending order of weight is set 1, set 2, set 3. Suppose five microblogs, microblog 1 to microblog 5, are received in the set time interval. First, for set 1, the microblogs containing at least m keywords of set 1 are determined to be microblog 1, microblog 2, and microblog 3, so the microblogs related to set 1 are microblog 1, microblog 2, and microblog 3. Next, for set 2, the microblogs containing at least m keywords of set 2 are determined to be microblog 3 and microblog 4. Because microblog 3 has already been determined to be related to set 1, it is not treated as related to set 2, so that the microblogs related to set 1 and those related to set 2 do not overlap; only microblog 4 is determined to be related to set 2. In other words, among the received microblogs, those that contain at least m keywords of set 2 and have not been determined to be related to another cluster keyword set are determined to be related to set 2. Similarly, for set 3, the microblogs containing at least m keywords of set 3 are microblog 4 and microblog 5, so only microblog 5 is determined to be related to set 3.

It can be seen that, in the embodiments of the present invention, after the cluster keyword sets are obtained by incremental clustering and those satisfying the second specified condition are extracted, the method of determining the microblogs related to each cluster keyword set has a definite order of priority: cluster keyword sets with larger weights are considered first.
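The weight-ordered assignment described above can be sketched in Python as follows. The function name and data layout are assumptions: cluster keyword sets are passed as (name, keyword set) pairs already sorted by weight in descending order, and microblogs as a mapping from an identifier to the set of keywords the microblog contains.

```python
def assign_microblogs(clusters_by_weight, microblogs, m):
    """Assign each microblog to at most one cluster keyword set.

    clusters_by_weight: list of (name, keyword_set), heaviest first.
    microblogs:         dict mapping microblog id -> set of keywords.
    """
    assigned = set()   # microblogs already related to a heavier cluster keyword set
    result = {}
    for name, keywords in clusters_by_weight:
        hits = [mid for mid, kws in microblogs.items()
                if mid not in assigned and len(keywords & kws) >= m]
        result[name] = hits
        assigned.update(hits)
    return result
```

Reproducing the example with m = 2, keyword sets {a, b}, {c, d}, {e, f} for sets 1 to 3, and microblogs 1 to 5 containing {a, b}, {a, b}, {a, b, c, d}, {c, d, e, f}, {e, f}, the sketch assigns microblogs 1, 2, 3 to set 1, microblog 4 to set 2, and microblog 5 to set 3.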

Specifically, the weight of a cluster keyword set may be re-determined based on the formula

D_{weight}^{'} = I(i_{1}, i_{2}, \ldots, i_{n}) \times ({Word}_{weight}^{i_{1}} + {Word}_{weight}^{i_{2}} + \cdots + {Word}_{weight}^{i_{n}}),

where i_{1}, i_{2}, \ldots, i_{n} denote the n keywords contained in the cluster keyword set, {Word}_{weight}^{i_{1}}, {Word}_{weight}^{i_{2}}, \ldots, {Word}_{weight}^{i_{n}} are the respective weights of these n keywords, and I(i_{1}, i_{2}, \ldots, i_{n}) is the mutual information of these n keywords, which can be determined according to the formula

I(i_{1}, i_{2}, \ldots, i_{n}) = \log \frac{p(i_{1}, i_{2}, \ldots, i_{n})}{p(i_{1}) p(i_{2}) \cdots p(i_{n})}

D_{weight}^{'} is the re-determined weight of the cluster keyword set.
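The two formulas above can be evaluated as in the following minimal sketch. Several points are assumptions rather than statements of the embodiment: each probability is estimated as the fraction of received microblogs that contain the keywords, the logarithm is the natural logarithm (the text does not fix a base), and the function names are hypothetical. The sketch raises an error if the keywords never co-occur, since the logarithm of zero is undefined.

```python
import math


def probability(keywords, microblogs):
    """Fraction of microblogs (keyword sets) that contain all the given keywords."""
    wanted = set(keywords)
    return sum(1 for mb in microblogs if wanted <= mb) / len(microblogs)


def mutual_information(keywords, microblogs):
    """log of the joint probability over the product of the individual probabilities."""
    joint = probability(keywords, microblogs)
    product = math.prod(probability([k], microblogs) for k in keywords)
    return math.log(joint / product)


def cluster_weight(keywords, word_weight, microblogs):
    """Mutual information times the sum of the keyword weights."""
    total = sum(word_weight[k] for k in keywords)
    return mutual_information(keywords, microblogs) * total
```

For instance, with four microblogs {a, b}, {a, b}, {a}, {c}, the estimates are p(a) = 3/4, p(b) = 1/2, and p(a, b) = 1/2, so the mutual information of a and b is log(4/3).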

Preferably, to re-determine the weights of the cluster keyword sets more efficiently, the embodiments of the present invention may update the weights during incremental clustering: for a given keyword set, each time a keyword is added to the keyword set, the weight of the keyword set is re-determined once. When the incremental clustering process ends, the keyword set is a cluster keyword set, and its most recently determined weight is the re-determined weight of that cluster keyword set. A concrete example of re-determining the weight of a keyword set each time a keyword is added is given below.

Suppose the keyword set is {keyword X, keyword Y} and the current weight of this keyword set is D_{weight}^{X, Y}.

If keyword Z is added to this keyword set, the keyword set becomes {keyword X, keyword Y, keyword Z}, and its weight is re-determined using the formula

D_{weight}^{X, Y, Z} = I((X, Y), Z) \times (D_{weight}^{X, Y} + {Word}_{weight}^{Z})

where I((X, Y), Z) is the mutual information of the existing keyword set {X, Y} and keyword Z, {Word}_{weight}^{Z} is the weight of keyword Z, P(X, Y, Z) is the probability that a received microblog contains keyword X, keyword Y, and keyword Z simultaneously, P(Z) is the probability that a received microblog contains keyword Z, and D_{weight}^{X, Y, Z} is the re-determined weight of the keyword set.

With the above method, the weight of a keyword set is re-determined each time a keyword is added to it during incremental clustering, so the weight of each cluster keyword set is available directly when incremental clustering ends. The cluster keyword sets can then be sorted in descending order of weight and the microblogs related to each cluster keyword set determined, as described above; the details are not repeated here.
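A single update step of this kind can be sketched as below. The expression used for I((X, Y), Z) is an assumption: it treats the existing keyword set as one unit and applies the same log-ratio form as the mutual information formula for n keywords, with the natural logarithm. The function and parameter names are hypothetical.

```python
import math


def updated_weight(current_weight, p_joint_new, p_current, p_z, z_weight):
    """Weight of a keyword set after keyword Z is added.

    current_weight: weight of the set before Z is added
    p_joint_new:    probability that a microblog contains the old set and Z
    p_current:      probability that a microblog contains the old set
    p_z:            probability that a microblog contains Z
    z_weight:       weight of keyword Z
    """
    # Assumed form of I((X, Y), Z): log P(X, Y, Z) / (P(X, Y) * P(Z))
    mi = math.log(p_joint_new / (p_current * p_z))
    return mi * (current_weight + z_weight)
```

Because only the previous weight and a few probabilities are needed, each addition costs constant work instead of recomputing the sum over all keywords in the set.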

The pseudocode for filtering the obtained cluster keyword sets with the above method and determining the microblogs related to each obtained cluster keyword set is as follows:

In step S104 shown in Figure 1, when judging, for each cluster keyword set obtained, whether the hot-topic library contains a hot-topic keyword set whose similarity to this cluster keyword set is greater than the set similarity, the Jaccard similarity coefficient may be used to determine the similarity between the cluster keyword set and the hot-topic keyword sets in the hot-topic library. Specifically, for each extracted cluster keyword set that satisfies the second specified condition (the number of keywords contained is not less than the fourth set quantity, and the number of microblogs related to the cluster keyword set is greater than the fifth set quantity), the formula

J(A, B) = \frac{|A \cap B|}{|A \cup B|}

is used to determine the similarity between the cluster keyword set and each hot-topic keyword set in the hot-topic library, and it is judged whether the hot-topic library contains a hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity. Here, A is the extracted cluster keyword set, B is a hot-topic keyword set in the hot-topic library, J(A, B) is the determined similarity between the cluster keyword set and the hot-topic keyword set, |A ∩ B| is the number of keywords in the intersection of cluster keyword set A and hot-topic keyword set B, and |A ∪ B| is the number of keywords in the union of cluster keyword set A and hot-topic keyword set B.

With the above method, the similarity between the cluster keyword set and each hot-topic keyword set in the hot-topic library can be determined. If the hot-topic library contains no hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity, the topic corresponding to the cluster keyword set is determined to be a new hot public-opinion topic; in step S106 the microblogs related to the cluster keyword set are pushed to the relevant users, and the cluster keyword set is saved in the hot-topic library as a new hot-topic keyword set. The relevant users may be, for example, relevant government departments or enterprises, or users who have subscribed to this hot topic.
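The novelty check against the hot-topic library can be sketched as follows. The function names and the representation of the library as a Python list of keyword sets are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_new_hot_topic(cluster, library, set_similarity):
    """True if no library entry is more similar to the cluster than the threshold."""
    return not any(jaccard(cluster, known) > set_similarity for known in library)


def register_if_new(cluster, library, set_similarity):
    """Save a new hot-topic keyword set in the library; report whether it was new."""
    if is_new_hot_topic(cluster, library, set_similarity):
        library.append(set(cluster))
        return True   # the caller would now push the related microblogs
    return False
```

For example, with a library holding {a, b, c} and a set similarity of 0.6, the cluster keyword set {a, b, d} has similarity 2/4 = 0.5 and is treated as a new hot topic, whereas {a, b, c, e} has similarity 3/4 = 0.75 and is not.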

Preferably, when it is determined for an obtained cluster keyword set that the hot-topic library contains no hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity, the proportions of microblogs with positive, negative, and neutral sentiment among the microblogs related to the cluster keyword set may also be counted and sent to the relevant users. This helps the users understand the state of public sentiment on the hot topic corresponding to the cluster keyword set and respond accordingly.

Specifically, for each microblog related to the cluster keyword set, the sentiment orientation value of the microblog is determined. Based on the sentiment orientation values of all microblogs related to the cluster keyword set, the percentages of microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0 are determined respectively and sent to the relevant users. The sentiment orientation value determined for each microblog related to the cluster keyword set represents the sentiment orientation of that microblog.
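The percentage statistics can be sketched in a few lines of Python. The function name and the returned dictionary keys are illustrative assumptions; the input is the list of sentiment orientation values of the microblogs related to one cluster keyword set.

```python
def sentiment_percentages(values):
    """Percentages of microblogs with value > 0, < 0, and == 0."""
    total = len(values)
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    positive = sum(1 for v in values if v > 0)
    negative = sum(1 for v in values if v < 0)
    neutral = total - positive - negative
    return {
        "positive": 100.0 * positive / total,
        "negative": 100.0 * negative / total,
        "neutral": 100.0 * neutral / total,
    }
```

For the values 1.0, -2.0, 0.0, and 3.0, the sketch reports 50% positive, 25% negative, and 25% neutral.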

Further, the sentiment orientation value of a microblog is determined as follows. The microblog is split into clauses. Each clause is segmented into words, and the sentiment orientation value of each segmented word in the clause is determined from the pre-stored sentiment orientation values of segmented words. The sentiment orientation value of the clause is then determined from the sentiment orientation values of its segmented words and the type of the clause. Finally, the sum of the sentiment orientation values of all clauses in the microblog is taken as the sentiment orientation value of the microblog.

The clause splitting may be based on the punctuation marks in the microblog, for example splitting at periods, question marks, exclamation marks, and similar punctuation marks.

The method of determining the sentiment orientation value of a microblog in the embodiments of the present invention is illustrated below with four cases.

Case one: suppose the microblog is "Brand A phones are good to use." This microblog has only one clause, namely "Brand A phones are good to use." The microblog is segmented into words, and the emotion words among the resulting segmented words can be extracted according to pre-stored emotion words that carry a sentiment orientation. Suppose the pre-stored emotion words include the word "good"; then, after the microblog is segmented, the word "good" is extracted from the resulting segmented words. The server pre-stores the sentiment orientation value of each emotion word: if the emotion corresponding to an emotion word is positive, the stored value is greater than 0; if it is negative, the stored value is less than 0. Here the pre-stored value of the emotion word "good" is greater than 0, and the clause contains only this one emotion word, so the sentiment orientation value of the emotion word is the sentiment orientation value of the clause. Since the microblog has only this clause, the sentiment orientation value of the clause is the sentiment orientation value of the microblog. The value obtained is greater than 0, so the sentiment orientation of the microblog is positive.

Case two: suppose the microblog is "Brand A phones are not good to use." As in case one, the microblog is segmented into words and the emotion words among the resulting segmented words are extracted; the emotion word extracted is "good".

If the sentiment orientation of this microblog were determined directly by the method of case one, it would also come out positive, although the sentiment of this microblog is obviously negative. Therefore, when determining the sentiment orientation value of a microblog, the embodiments of the present invention apply a negation-word rule: for each emotion word obtained, within the clause containing that emotion word, the number q of negation words among the 3 segmented words preceding the emotion word is determined, and the pre-stored sentiment orientation value of the emotion word is multiplied by (-1)^q; the product is taken as the new sentiment orientation value of the emotion word. The server may pre-store negation words such as "not" and "no" and, based on the stored negation words, determine the number of negation words among the 3 segmented words preceding an emotion word in a clause.

Because the emotion word in this microblog is "good" and there is one negation word, "not", among the 3 segmented words preceding it, the sentiment orientation value of the emotion word "good" is multiplied by (-1), and the product is taken as the new sentiment orientation value of the emotion word. The sentiment orientation value of the microblog is therefore less than 0, so the sentiment orientation of the microblog is negative.

Case three: suppose the microblog is "Are brand A phones good to use?" As in case one, the microblog is segmented into words and the emotion words among the resulting segmented words are extracted; the emotion word extracted is "good".

If the sentiment orientation of this microblog were determined directly by the method of case one, it would also come out positive. However, the clause is an interrogative sentence, and in general the sentiment of such a microblog leans negative. Therefore, when determining the sentiment orientation value of a microblog, the embodiments of the present invention take into account the influence of the sentence type on the sentiment orientation of the clause. When a clause is determined to be of the interrogative type, the sentiment orientation values of the emotion words in the clause are summed, the sum is multiplied by -1, and the product is taken as the sentiment orientation value of the clause. After splitting the microblog into clauses, the server can determine the sentence type of each clause from its punctuation mark; for example, when the punctuation mark of a clause is a question mark, the clause is determined to be of the interrogative type.

Because the clause in this microblog is of the interrogative type, the sentiment orientation value of the emotion word "good" is multiplied by -1, and the product is taken as the sentiment orientation value of the clause. The sentiment orientation value of the microblog is therefore less than 0, so the sentiment orientation of the microblog is negative.

Case four: suppose the microblog is "Brand A phones are good to use, but not good-looking." As in case one, the microblog is segmented into words and the emotion words among the resulting segmented words are extracted. Two emotion words are extracted in total, and both are "good".

If the sentiment orientation of this microblog were determined by a method similar to case two, the first half of the clause would have the sentiment orientation value of the emotion word "good" and the second half would have the product of that value and -1, so the sentiment orientation value of the microblog would be 0, that is, neutral. However, the clause is an adversative (contrast) sentence, and in general the sentiment of such a microblog leans negative. Therefore, in the embodiments of the present invention the server may pre-store segmented words with an adversative meaning, for example "but" and "however". For each clause obtained, if the clause contains a segmented word with an adversative meaning, the clause is determined to be of the adversative type. When determining the sentiment orientation value of such a clause, either the sum of the sentiment orientation values of the emotion words located after the adversative word is taken as the sentiment orientation value of the clause, or the sum of the sentiment orientation values of the emotion words located before the adversative word is determined and the product of that sum and -1 is taken as the sentiment orientation value of the clause.

For the microblog "Brand A phones are good to use, but not good-looking.", the sentiment orientation value of the emotion word located after the adversative word "but" can be determined. Because there is one negation word, "not", among the 3 segmented words preceding the emotion word "good" that follows "but", the sentiment orientation value of that emotion word is the product of the pre-stored sentiment orientation value of "good" and -1, and this product is the sentiment orientation value of the microblog. Alternatively, the sentiment orientation value of the emotion word "good" located before the adversative word "but" is determined, and the product of this value and -1 is taken as the sentiment orientation value of the microblog. Whichever method is used for the microblog "Brand A phones are good to use, but not good-looking.", the resulting sentiment orientation value is less than 0, so the sentiment orientation of the microblog is negative.

In practical applications, the sentiment orientation value of a microblog can be determined using any one of the processing methods of case one to case four, or a combination of them, depending on the clauses and segmented words the microblog contains.
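The four cases can be combined in one small scoring routine, sketched below under several assumptions. Clause splitting and word segmentation are taken as already done, so a microblog is given as a list of (tokens, end mark) pairs with English stand-ins for the segmented words. The lexicon values, the word lists, and the function names are hypothetical. For adversative clauses the sketch uses the first of the two options, summing only the emotion words after the adversative word.

```python
EMOTION_VALUES = {"good": 1.0, "bad": -1.0}   # illustrative pre-stored lexicon
NEGATION_WORDS = {"not", "no"}
ADVERSATIVE_WORDS = {"but", "however"}


def word_value(tokens, idx):
    """Pre-stored value of the emotion word at idx, times (-1)^q."""
    window = tokens[max(0, idx - 3):idx]          # the 3 tokens before the word
    q = sum(1 for t in window if t in NEGATION_WORDS)
    return EMOTION_VALUES[tokens[idx]] * (-1) ** q


def clause_value(tokens, end_mark):
    """Sentiment orientation value of one clause."""
    start = 0
    for i, token in enumerate(tokens):
        if token in ADVERSATIVE_WORDS:            # adversative clause:
            start = i + 1                         # only count words after it
            break
    total = sum(word_value(tokens, i)
                for i in range(start, len(tokens))
                if tokens[i] in EMOTION_VALUES)
    if end_mark == "?":                           # interrogative clause
        total = -total
    return total


def microblog_value(clauses):
    """Sum of the clause values; clauses is a list of (tokens, end_mark)."""
    return sum(clause_value(tokens, mark) for tokens, mark in clauses)
```

With the clause from case four tokenized as ["Brand A phone", "good", "use", "but", "not", "good", "look"], only the second "good" is counted, it is preceded by one negation word within three tokens, and the resulting value is -1.0.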

For a given cluster keyword set, once the sentiment orientation values of all microblogs related to the cluster keyword set have been determined, the percentages of microblogs whose sentiment orientation value is greater than 0 (positive sentiment), less than 0 (negative sentiment), and equal to 0 (neutral sentiment) can be counted, and these percentages are also sent to the relevant users.

Similarly, for a given cluster keyword set, if it is determined that the hot-topic library contains at least one hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity, that is, the topic corresponding to the cluster keyword set is not a new hot topic, the server may also determine the percentages of microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0 among all microblogs related to the cluster keyword set, and send the determined percentages to the relevant users, so that they can follow the sentiment trend of this hot topic. For example, in the previous time interval the microblogs with negative sentiment may account for the larger percentage of the microblogs related to the hot-topic keyword set reflecting this hot topic, whereas in the current time interval the microblogs with positive sentiment account for the larger percentage.

In addition, in the embodiments of the present invention, when it is determined for an obtained cluster keyword set that the hot-topic library contains no hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity, the server determines the popularity, within the current time interval, of the hot topic corresponding to the cluster keyword set and also sends the determined popularity to the relevant users. Specifically, in that case, for each microblog related to the cluster keyword set, the popularity of the microblog within the set time interval is determined; the sum of the popularity values, within the set time interval, of all microblogs related to the cluster keyword set is then determined and sent to the relevant users.

The server determines the popularity, within the set time interval, of a microblog related to the cluster keyword set as follows: the sum of the number of times the microblog was reposted and the number of times it was commented on within the set time interval is taken as the popularity of the microblog within that interval.

Preferably, when pushing the microblogs related to a cluster keyword set to the relevant users, the server may select, from the microblogs related to the cluster keyword set, the microblog with the highest popularity and push it to the relevant users.
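The popularity computation and the selection of the microblog to push can be sketched as follows. The function names and the data layout, a mapping from a microblog identifier to its (repost count, comment count) within the set time interval, are assumptions for illustration.

```python
def microblog_popularity(reposts, comments):
    """Popularity of one microblog: reposts plus comments in the interval."""
    return reposts + comments


def topic_popularity(stats):
    """Sum of the popularity of all microblogs related to a cluster keyword set."""
    return sum(microblog_popularity(r, c) for r, c in stats.values())


def hottest_microblog(stats):
    """Identifier of the related microblog with the highest popularity."""
    return max(stats, key=lambda mid: microblog_popularity(*stats[mid]))
```

For three related microblogs with (reposts, comments) of (10, 5), (3, 20), and (0, 1), the topic popularity is 39 and the second microblog, with popularity 23, would be the one pushed.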

Similarly, for a given cluster keyword set, if it is determined that the hot-topic library contains at least one hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity, that is, the topic corresponding to the cluster keyword set is not a new hot topic, the server may also determine the popularity, within the current time interval, of the hot topic corresponding to the cluster keyword set and send it to the relevant users, so that they can follow the trend of the popularity of this hot topic across time intervals. For example, the popularity of the hot topic determined for the current time interval may be higher than that determined for the previous time interval.

It can be seen that, in the embodiments of the present invention, after the server obtains the cluster keyword sets by incremental clustering, each cluster keyword set is handled in one of two ways. If the server determines that the hot-topic library contains no hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity, that is, the topic corresponding to the cluster keyword set is a new hot topic, the server pushes the most popular microblog related to the cluster keyword set to the relevant users, and also sends them the counted percentages of related microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0, together with the sum of the popularity values of the microblogs related to the cluster keyword set. If the server determines that the hot-topic library contains a hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity, that is, the topic corresponding to the cluster keyword set is not a new hot topic, the microblogs related to the cluster keyword set need not be pushed, but the server may still send the relevant users the counted percentages of related microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0, together with the sum of the popularity values of the microblogs related to the cluster keyword set.

In addition, in the embodiments of the present invention, the server may also monitor microblogs that contain designated keywords and issue early warnings about them. Specifically, for each microblog published in the set time interval, the keywords determined in the microblog form a monitoring keyword set. The microblog is pushed to the relevant users as an early warning if all of the following hold: the server determines that the hot-topic library contains no hot-topic keyword set whose similarity to the monitoring keyword set is greater than the set similarity; the monitoring keyword set contains a designated keyword; the sentiment orientation value of the microblog is less than 0 (negative sentiment); and the popularity of the microblog within the set time interval is greater than a set popularity. The early warning reminds the relevant users that the microblog may reflect a new hot public-opinion topic, that its sentiment orientation is negative, and that its popularity is relatively high.
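The early-warning decision combines four checks, which can be sketched as a single predicate. This is an illustrative sketch: the function name and parameter names are hypothetical, and "contains a designated keyword" is interpreted here as containing at least one keyword from a set of designated keywords.

```python
def should_warn(monitor_keywords, library, designated_keywords,
                sentiment_value, popularity, set_similarity, set_popularity):
    """True if the microblog should be pushed as an early warning."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # 1. no sufficiently similar hot-topic keyword set in the library
    is_new = all(jaccard(monitor_keywords, known) <= set_similarity
                 for known in library)
    # 2. at least one designated keyword appears in the monitoring keyword set
    has_designated = bool(monitor_keywords & designated_keywords)
    # 3. negative sentiment and 4. popularity above the set popularity
    return (is_new and has_designated
            and sentiment_value < 0 and popularity > set_popularity)
```

A microblog whose keywords match nothing in the library, include a designated keyword, carry a negative sentiment orientation value, and exceed the set popularity would trigger the warning; failing any one of the four checks suppresses it.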

The above is the method for pushing microblogs provided by the embodiments of the present invention. Based on the same idea, the embodiments of the present invention also provide a device for pushing microblogs, as shown in Figure 2.

Figure 2 is a schematic structural diagram of the device for pushing microblogs provided by the embodiments of the present invention. The device specifically includes:

A receiving and word segmentation module 201, configured to receive the microblogs published in a set time interval and determine the keywords in each received microblog;

A keyword set determination module 202, configured to determine keyword sets from the determined keywords using a set method, and to determine all keyword sets obtainable with the set method, where the set method is: select any two keywords from the keywords to form one keyword set;

An incremental clustering module 203, configured to perform incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two of the determined keyword sets, to obtain the cluster keyword sets;

A judging and pushing module 204, configured to judge, for each cluster keyword set obtained, whether the hot-topic library contains a hot-topic keyword set whose similarity to the cluster keyword set is greater than the set similarity and, when no such set exists, to select from the received microblogs the microblogs related to the cluster keyword set and push them to the relevant users, and to save the cluster keyword set in the hot-topic library as a hot-topic keyword set.

The receiving and word segmentation module 201 is specifically configured to perform word segmentation on each received microblog and to determine, among the segmented words obtained, the segmented words of a specified type as the determined keywords.

The keyword set determination module 202 is specifically configured to: for each determined keyword, according to the total number of times the keyword occurs in the received microblogs, the number of received microblogs, and the pre-saved inverse document frequency of the keyword, use the formula

to determine the weight of the keyword, where n_Word is the total number of times the keyword occurs in the received microblogs, N is the number of received microblogs, Idf is the pre-saved inverse document frequency of the keyword, and Word_Weight is the determined weight of the keyword; and, according to the determined keyword weights, select a first set number of keywords in descending order of weight and determine the keyword sets from the selected keywords using the set method.
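The keyword weighting, top-N selection and pairing described above can be sketched as follows. The weight formula itself does not survive in this text, so the sketch assumes a TF-IDF style product, Word_Weight = (n_Word / N) × Idf, built only from the symbols the paragraph defines; that combination, the function names and the tie-breaking rule are illustrative assumptions, not the patent's stated method.

```python
from collections import Counter
from itertools import combinations


def keyword_weights(microblog_keywords, idf):
    """microblog_keywords: one list of keywords per received microblog.
    idf: pre-saved inverse document frequency per keyword."""
    n_microblogs = len(microblog_keywords)  # N
    # n_Word: total occurrences of each keyword over all received microblogs
    occurrences = Counter(k for kws in microblog_keywords for k in kws)
    # ASSUMPTION: Word_Weight = (n_Word / N) * Idf; the original formula is not in the text
    return {k: (c / n_microblogs) * idf.get(k, 0.0) for k, c in occurrences.items()}


def top_keywords(weights, first_set_number):
    """Select the first set number of keywords in descending order of weight.
    Ties are broken alphabetically (an assumption, for determinism)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [k for k, _ in ranked[:first_set_number]]


def keyword_sets(keywords):
    """The 'set method': every choice of two keywords forms one keyword set."""
    return [frozenset(pair) for pair in combinations(keywords, 2)]
```

Selecting only the first set number of keywords before pairing keeps the number of keyword sets manageable, since k keywords yield k(k-1)/2 two-keyword sets.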

The keyword set determination module 202 is further configured to: for each determined keyword set, determine the mutual information of the two keywords contained in the keyword set and, from that mutual information and the weights of the two keywords, use the formula

to determine the weight of the keyword set, where i denotes keyword i contained in the keyword set, j denotes keyword j contained in the keyword set, … is the weight of keyword i,

p(i) is the probability that a received microblog contains keyword i, p(j) is the probability that a received microblog contains keyword j, and p(i, j) is the probability that a received microblog contains both keyword i and keyword j; and, according to the weights of the determined keyword sets, select a second set number of keyword sets in descending order of weight;
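A minimal sketch of the keyword set weighting follows. The probabilities p(i), p(j) and p(i, j) are estimated exactly as the paragraph defines them, as fractions of the received microblogs. The formula that combines the mutual information with the two keyword weights is not reproduced in this text, so both the pointwise form log(p(i, j) / (p(i) p(j))) and the combination (w_i + w_j) × MI used here are assumptions made purely for illustration.

```python
import math


def pair_probabilities(microblog_keywords, i, j):
    """Estimate p(i), p(j), p(i, j) as fractions of the received microblogs."""
    n = len(microblog_keywords)
    sets = [set(kws) for kws in microblog_keywords]
    p_i = sum(i in s for s in sets) / n
    p_j = sum(j in s for s in sets) / n
    p_ij = sum(i in s and j in s for s in sets) / n
    return p_i, p_j, p_ij


def keyword_set_weight(microblog_keywords, weights, i, j):
    """Weight of the two-keyword set {i, j}."""
    p_i, p_j, p_ij = pair_probabilities(microblog_keywords, i, j)
    if p_ij == 0:
        return 0.0  # ASSUMPTION: keywords that never co-occur get weight 0
    # ASSUMPTION: pointwise mutual information of the two keywords
    mutual_information = math.log(p_ij / (p_i * p_j))
    # ASSUMPTION: how the keyword weights enter is not recoverable from the text
    return (weights[i] + weights[j]) * mutual_information


def top_keyword_sets(set_weights, second_set_number):
    """set_weights: dict mapping frozenset({i, j}) to its weight."""
    ranked = sorted(set_weights.items(), key=lambda kv: -kv[1])
    return ranked[:second_set_number]
```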

The incremental clustering module 203 is specifically configured to perform incremental clustering on the selected second set number of keyword sets according to the keywords contained in the intersection and in the union of every two of the selected keyword sets.

The incremental clustering module 203 is specifically configured to: sort the selected second set number of keyword sets in descending order of weight; and, following the sorted order, perform the following for each keyword set in turn: take the current keyword set as the keyword set to be clustered; determine each keyword set ranked before the keyword set to be clustered as a preceding keyword set; for each preceding keyword set, determine a first quantity, being the number of keywords in the intersection of the keyword set to be clustered and the preceding keyword set, and a second quantity, being the number of keywords in the union of the keyword set to be clustered and the preceding keyword set; and, when the ratio of the first quantity to the second quantity is greater than a set ratio, add the keywords of the keyword set to be clustered that satisfy a first specified condition to the preceding keyword set; wherein a keyword satisfying the first specified condition is a keyword that is contained in the keyword set to be clustered but not in the preceding keyword set.
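The incremental clustering loop can be sketched as below. The paragraph does not say what becomes of a keyword set once its keywords have been added to a preceding set, nor of one that matches no preceding set; this sketch assumes the former is absorbed and the latter starts a new cluster. Function and parameter names are illustrative.

```python
def incremental_cluster(weighted_sets, set_ratio):
    """weighted_sets: list of (keyword_set, weight) pairs.
    Returns the cluster keyword sets; no cluster count is fixed in advance."""
    # Sort in descending order of weight (sorted() is stable for equal weights).
    ordered = [set(s) for s, _ in sorted(weighted_sets, key=lambda sw: -sw[1])]
    clusters = []  # the preceding keyword sets, possibly already grown
    for to_cluster in ordered:
        absorbed = False
        for preceding in clusters:
            first_quantity = len(to_cluster & preceding)   # size of intersection
            second_quantity = len(to_cluster | preceding)  # size of union
            if first_quantity / second_quantity > set_ratio:
                # First specified condition: in to_cluster but not in preceding.
                preceding |= to_cluster - preceding
                absorbed = True
        if not absorbed:
            # ASSUMPTION: an unmatched keyword set becomes a new cluster.
            clusters.append(set(to_cluster))
    return clusters
```

With a set ratio of 0.3, the sets {a, b}, {b, c}, {x, y}, {a, c} (in descending weight order) yield the clusters {a, b, c} and {x, y}: two pairs sharing one keyword have a ratio of 1/3, which exceeds 0.3.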

The incremental clustering module 203 is further configured to: before adding the keywords of the keyword set to be clustered that satisfy the first specified condition to the preceding keyword set, determine that, among the received microblogs, the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in the preceding keyword set is greater than a third set number.
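This co-occurrence check can be expressed as a small guard evaluated before a merge. The sketch below counts the received microblogs that contain all of the keywords about to be added together with every keyword already in the preceding set; the function name and argument layout are assumptions for illustration.

```python
def enough_cooccurrence(microblog_keywords, new_keywords, preceding, third_set_number):
    """Return True when more than third_set_number received microblogs contain
    every keyword in new_keywords and every keyword in the preceding set."""
    required = set(new_keywords) | set(preceding)
    count = sum(required <= set(kws) for kws in microblog_keywords)
    return count > third_set_number
```

The guard prevents two keyword sets from being merged merely because they share a keyword when the combined keywords rarely appear together in any single microblog.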

The incremental clustering module 203 is further configured to extract, from the obtained cluster keyword sets, the cluster keyword sets that satisfy a second specified condition, the second specified condition comprising: the number of keywords contained is not less than a fourth set number, and the number of microblogs related to the cluster keyword set is greater than a fifth set number; wherein a microblog related to the cluster keyword set is a microblog containing at least m keywords of the cluster keyword set, m being a sixth set number.
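The second specified condition translates directly into a filter. The sketch below follows the paragraph as written; only the function names and the choice to identify microblogs by list index are assumptions.

```python
def related_microblogs(microblog_keywords, cluster, m):
    """Indices of received microblogs containing at least m keywords of the cluster."""
    return [idx for idx, kws in enumerate(microblog_keywords)
            if len(cluster & set(kws)) >= m]


def satisfies_second_condition(microblog_keywords, cluster,
                               fourth_set_number, fifth_set_number, m):
    """Not fewer than fourth_set_number keywords, and more than
    fifth_set_number related microblogs."""
    related = related_microblogs(microblog_keywords, cluster, m)
    return len(cluster) >= fourth_set_number and len(related) > fifth_set_number
```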

The judging and pushing module 204 is specifically configured to: for each extracted cluster keyword set that satisfies the second specified condition, use the formula

to determine the similarity between the cluster keyword set and each hot-spot public opinion keyword set in the hot-spot public opinion library, and judge whether the hot-spot public opinion library contains a hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity, where A is the extracted cluster keyword set, B is a hot-spot public opinion keyword set in the hot-spot public opinion library, and J(A, B) is the determined similarity between the cluster keyword set and the hot-spot public opinion keyword set.
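The similarity formula is not reproduced in this text. Given the notation J(A, B) over two keyword sets, the sketch below assumes the Jaccard index, |A ∩ B| / |A ∪ B|; that choice, the function names and the list-based library are assumptions for illustration. The second function covers the library lookup and the saving of a new set described for module 204.

```python
def jaccard(a, b):
    """ASSUMED form of J(A, B): size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def register_if_new(cluster, library, set_similarity):
    """Return True and save the cluster in the library when no stored
    hot-spot keyword set is more similar to it than set_similarity."""
    if any(jaccard(cluster, known) > set_similarity for known in library):
        return False
    library.append(set(cluster))
    return True
```

Because a newly detected set is saved immediately, the same hot spot detected again in the next time interval is recognised as already known and is not pushed a second time.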

The judging and pushing module 204 is further configured to: when the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity, determine, for each microblog related to the cluster keyword set, the sentiment orientation value of the microblog; determine, from the sentiment orientation values of the microblogs related to the cluster keyword set, the percentages of those microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0, respectively; and send the determined percentages to the relevant users; wherein determining the sentiment orientation value of a microblog specifically comprises: splitting the microblog into clauses; for each clause, performing word segmentation on the clause to obtain its segmented words, determining the sentiment orientation value of each segmented word from the pre-saved sentiment orientation values corresponding to segmented words, and determining the sentiment orientation value of the clause from the sentiment orientation values of its segmented words and the type of the clause; and taking the sum of the sentiment orientation values of the clauses of the microblog as the sentiment orientation value of the microblog.
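A rough sketch of the sentiment computation follows. The paragraph does not say how the clause type affects the clause value, so the sketch assumes a per-type multiplier supplied by the caller; the punctuation-based clause splitter and the regular-expression tokenizer are stand-ins for a real sentence splitter and word segmenter, and the lexicon is a hypothetical example.

```python
import re

# Clause = run of non-terminator characters plus an optional terminator.
CLAUSE_PATTERN = re.compile(r"[^.!?。！？]+[.!?。！？]?")


def microblog_sentiment(text, lexicon, type_factor):
    """Sum of clause sentiment values. lexicon maps a segmented word to its
    pre-saved sentiment value; type_factor maps a clause type to a multiplier
    (ASSUMPTION: the text does not say how the clause type is applied)."""
    total = 0.0
    for clause in CLAUSE_PATTERN.findall(text):
        clause = clause.strip()
        if not clause:
            continue
        if clause[-1] in "?？":
            kind = "question"
        elif clause[-1] in "!！":
            kind = "exclamation"
        else:
            kind = "statement"
        words = re.findall(r"\w+", clause.lower())  # stand-in for word segmentation
        clause_value = sum(lexicon.get(w, 0.0) for w in words)
        total += clause_value * type_factor.get(kind, 1.0)
    return total


def sentiment_percentages(values):
    """Percentages of microblogs with value > 0, < 0 and == 0."""
    n = len(values)
    if n == 0:
        return (0.0, 0.0, 0.0)
    positive = sum(v > 0 for v in values)
    negative = sum(v < 0 for v in values)
    return (100 * positive / n, 100 * negative / n, 100 * (n - positive - negative) / n)
```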

The judging and pushing module 204 is further configured to: when the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity, determine, for each microblog related to the cluster keyword set, the popularity of the microblog within the set time interval; determine the sum of the popularities, within the set time interval, of the microblogs related to the cluster keyword set; and send the determined sum to the relevant users; wherein determining the popularity of a microblog within the set time interval specifically comprises: taking the sum of the number of times the microblog is forwarded and the number of times it is commented on within the set time interval as the popularity of the microblog within the set time interval.
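The popularity measure is a simple count, sketched below. Representing forwards and comments as lists of timestamps, and treating the set time interval as half-open, [start, end), are assumptions made for the example.

```python
def microblog_popularity(forward_times, comment_times, start, end):
    """Forwards plus comments that fall inside the set time interval.
    ASSUMPTION: the interval is half-open, start <= t < end."""
    forwards = sum(start <= t < end for t in forward_times)
    comments = sum(start <= t < end for t in comment_times)
    return forwards + comments


def total_popularity(related, start, end):
    """related: list of (forward_times, comment_times), one per related microblog."""
    return sum(microblog_popularity(f, c, start, end) for f, c in related)
```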

The judging and pushing module 204 is further configured to: for each microblog published within the set time interval, form a monitoring keyword set from the keywords determined in the microblog; and push the microblog to the relevant users if the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the monitoring keyword set is greater than the set similarity, the monitoring keyword set contains a specified keyword, the sentiment orientation value of the microblog is less than 0, and the popularity of the microblog within the set time interval is greater than a set popularity.
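The four conditions for pushing a single microblog can be combined as in the sketch below. The Jaccard form of the similarity and the reading of "contains a specified keyword" as "contains at least one keyword from a specified list" are assumptions; the sentiment value and popularity are taken as already computed inputs.

```python
def jaccard(a, b):
    """ASSUMED similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_push(monitoring_set, library, set_similarity,
                specified_keywords, sentiment_value, popularity, set_popularity):
    """True when the microblog is not a known hot spot, mentions a specified
    keyword, is negative in sentiment and exceeds the set popularity."""
    already_known = any(jaccard(monitoring_set, known) > set_similarity
                        for known in library)
    return (not already_known
            and bool(monitoring_set & specified_keywords)
            and sentiment_value < 0
            and popularity > set_popularity)
```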

Specifically, the above device for pushing microblogs may be located in a server.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A method for pushing microblogs, characterized by comprising:

2. The method according to claim 1, characterized in that determining the keywords in each received microblog specifically comprises:

performing word segmentation on each received microblog, and taking the segmented words of a specified type among the resulting segmented words as the determined keywords.

3. The method according to claim 1, characterized in that determining keyword sets from the determined keywords using the set method specifically comprises:

for each determined keyword, according to the total number of times the keyword occurs in the received microblogs, the number of received microblogs, and the pre-saved inverse document frequency of the keyword, using the formula to determine the weight of the keyword, where n_Word is the total number of times the keyword occurs in the received microblogs, N is the number of received microblogs, Idf is the pre-saved inverse document frequency of the keyword, and Word_Weight is the determined weight of the keyword;

according to the determined keyword weights, selecting a first set number of keywords in descending order of weight, and determining the keyword sets from the selected keywords using the set method.

4. The method according to claim 3, characterized in that, before performing incremental clustering on the determined keyword sets according to the keywords contained in the intersection and in the union of every two of the determined keyword sets, the method further comprises:

for each determined keyword set, determining the mutual information of the two keywords contained in the keyword set, and, from that mutual information and the weights of the two keywords, using the formula

… is the weight of keyword i,

p(i) is the probability that a received microblog contains keyword i, p(j) is the probability that a received microblog contains keyword j, and p(i, j) is the probability that a received microblog contains both keyword i and keyword j;

according to the weights of the determined keyword sets, selecting a second set number of keyword sets in descending order of weight;

and performing incremental clustering on the determined keyword sets according to the keywords contained in the intersection and in the union of every two of the determined keyword sets specifically comprises:

performing incremental clustering on the selected second set number of keyword sets according to the keywords contained in the intersection and in the union of every two of the selected keyword sets.

5. The method according to claim 4, characterized in that performing incremental clustering on the selected second set number of keyword sets specifically comprises:

sorting the selected second set number of keyword sets in descending order of weight according to the weight of each selected keyword set;

following the sorted order of the keyword sets, performing the following steps A to B for each keyword set in turn:

6. The method according to claim 5, characterized in that, before adding the keywords of the keyword set to be clustered that satisfy the first specified condition to the preceding keyword set, the method further comprises:

determining that, among the received microblogs, the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in the preceding keyword set is greater than a third set number.

7. The method according to claim 5, characterized in that, before judging, for each obtained cluster keyword set, whether the hot-spot public opinion library contains a hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity, the method further comprises:

extracting, from the obtained cluster keyword sets, the cluster keyword sets that satisfy a second specified condition, the second specified condition comprising: the number of keywords contained is not less than a fourth set number, and the number of microblogs related to the cluster keyword set is greater than a fifth set number;

wherein a microblog related to the cluster keyword set is a microblog containing at least m keywords of the cluster keyword set, m being a sixth set number.

8. The method according to claim 7, characterized in that judging, for each obtained cluster keyword set, whether the hot-spot public opinion library contains a hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity specifically comprises:

for each extracted cluster keyword set that satisfies the second specified condition, using the formula

9. The method according to claim 8, characterized in that, when the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity, the method further comprises:

for each microblog related to the cluster keyword set, determining the sentiment orientation value of the microblog;

determining, from the sentiment orientation values of the microblogs related to the cluster keyword set, the percentages of those microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0, respectively, and sending the determined percentages to the relevant users;

wherein determining the sentiment orientation value of a microblog specifically comprises:

splitting the microblog into clauses;

for each clause, performing word segmentation on the clause to obtain its segmented words, determining the sentiment orientation value of each segmented word from the pre-saved sentiment orientation values corresponding to segmented words, and determining the sentiment orientation value of the clause from the sentiment orientation values of its segmented words and the type of the clause;

taking the sum of the sentiment orientation values of the clauses of the microblog as the sentiment orientation value of the microblog.

10. The method according to claim 8, characterized in that, when the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity, the method further comprises:

for each microblog related to the cluster keyword set, determining the popularity of the microblog within the set time interval;

determining the sum of the popularities, within the set time interval, of the microblogs related to the cluster keyword set, and sending the determined sum to the relevant users;

wherein determining the popularity of a microblog within the set time interval specifically comprises:

taking the sum of the number of times the microblog is forwarded and the number of times it is commented on within the set time interval as the popularity of the microblog within the set time interval.

11. The method according to any one of claims 8 to 10, characterized in that the method further comprises:

for each microblog published within the set time interval, forming a monitoring keyword set from the keywords determined in the microblog;

pushing the microblog to the relevant users if the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the monitoring keyword set is greater than the set similarity, the monitoring keyword set contains a specified keyword, the sentiment orientation value of the microblog is less than 0, and the popularity of the microblog within the set time interval is greater than a set popularity.

12. A device for pushing microblogs, characterized by comprising:

13. The device according to claim 12, characterized in that the receiving and word segmentation module is specifically configured to perform word segmentation on each received microblog and to take the segmented words of a specified type among the resulting segmented words as the determined keywords.

14. The device according to claim 12, characterized in that the keyword set determination module is specifically configured to: for each determined keyword, according to the total number of times the keyword occurs in the received microblogs, the number of received microblogs, and the pre-saved inverse document frequency of the keyword, use the formula

15. The device according to claim 14, characterized in that the keyword set determination module is further configured to: for each determined keyword set, determine the mutual information of the two keywords contained in the keyword set, and, from that mutual information and the weights of the two keywords, use the formula

… is the weight of keyword i,

the incremental clustering module is specifically configured to perform incremental clustering on the selected second set number of keyword sets according to the keywords contained in the intersection and in the union of every two of the selected keyword sets.

16. The device according to claim 15, characterized in that the incremental clustering module is specifically configured to: sort the selected second set number of keyword sets in descending order of weight; and, following the sorted order, perform the following for each keyword set in turn: take the current keyword set as the keyword set to be clustered; determine each keyword set ranked before the keyword set to be clustered as a preceding keyword set; for each preceding keyword set, determine a first quantity, being the number of keywords in the intersection of the keyword set to be clustered and the preceding keyword set, and a second quantity, being the number of keywords in the union of the keyword set to be clustered and the preceding keyword set; and, when the ratio of the first quantity to the second quantity is greater than a set ratio, add the keywords of the keyword set to be clustered that satisfy a first specified condition to the preceding keyword set; wherein a keyword satisfying the first specified condition is a keyword that is contained in the keyword set to be clustered but not in the preceding keyword set.

17. The device according to claim 16, characterized in that the incremental clustering module is further configured to: before adding the keywords of the keyword set to be clustered that satisfy the first specified condition to the preceding keyword set, determine that, among the received microblogs, the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in the preceding keyword set is greater than a third set number.

18. The device according to claim 16, characterized in that the incremental clustering module is further configured to extract, from the obtained cluster keyword sets, the cluster keyword sets that satisfy a second specified condition, the second specified condition comprising: the number of keywords contained is not less than a fourth set number, and the number of microblogs related to the cluster keyword set is greater than a fifth set number; wherein a microblog related to the cluster keyword set is a microblog containing at least m keywords of the cluster keyword set, m being a sixth set number.

19. The device according to claim 18, characterized in that the judging and pushing module is specifically configured to: for each extracted cluster keyword set that satisfies the second specified condition, use the formula

20. The device according to claim 19, characterized in that the judging and pushing module is further configured to: when the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity, determine, for each microblog related to the cluster keyword set, the sentiment orientation value of the microblog; determine, from the sentiment orientation values of the microblogs related to the cluster keyword set, the percentages of those microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0, respectively; and send the determined percentages to the relevant users; wherein determining the sentiment orientation value of a microblog specifically comprises: splitting the microblog into clauses; for each clause, performing word segmentation on the clause to obtain its segmented words, determining the sentiment orientation value of each segmented word from the pre-saved sentiment orientation values corresponding to segmented words, and determining the sentiment orientation value of the clause from the sentiment orientation values of its segmented words and the type of the clause; and taking the sum of the sentiment orientation values of the clauses of the microblog as the sentiment orientation value of the microblog.

21. The device according to claim 19, characterized in that the judging and pushing module is further configured to: when the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity, determine, for each microblog related to the cluster keyword set, the popularity of the microblog within the set time interval; determine the sum of the popularities, within the set time interval, of the microblogs related to the cluster keyword set; and send the determined sum to the relevant users; wherein determining the popularity of a microblog within the set time interval specifically comprises: taking the sum of the number of times the microblog is forwarded and the number of times it is commented on within the set time interval as the popularity of the microblog within the set time interval.

22. The device according to any one of claims 19 to 21, characterized in that the judging and pushing module is further configured to: for each microblog published within the set time interval, form a monitoring keyword set from the keywords determined in the microblog; and push the microblog to the relevant users if the hot-spot public opinion library contains no hot-spot public opinion keyword set whose similarity to the monitoring keyword set is greater than the set similarity, the monitoring keyword set contains a specified keyword, the sentiment orientation value of the microblog is less than 0, and the popularity of the microblog within the set time interval is greater than a set popularity.