CN102968439A - Method and device for sending microblogs - Google Patents

Method and device for sending microblogs

Publication number
CN102968439A
Authority
CN
China
Prior art keywords
keyword
keyword set
microblogging
cluster
public sentiment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012103850367A
Other languages
Chinese (zh)
Other versions
CN102968439B (en)
Inventor
伏圣国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weimeng Chuangke Network Technology China Co Ltd
Original Assignee
Weimeng Chuangke Network Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weimeng Chuangke Network Technology China Co Ltd
Priority to CN201210385036.7A
Publication of CN102968439A
Application granted
Publication of CN102968439B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for pushing microblogs, which solve the prior-art problem that microblogs reflecting some public-opinion hotspots cannot be pushed to the corresponding users in time. In the method, the keywords in each microblog received within a set time interval are determined, and keyword sets are determined from these keywords. Incremental clustering is performed on the determined keyword sets according to the keywords contained in the intersection and the union of every two keyword sets. For each clustered keyword set obtained, when the hotspot library contains no hotspot keyword set whose similarity to the clustered keyword set is greater than a set similarity, the microblogs related to the clustered keyword set are pushed to the relevant users. Because the number of clusters need not be set in advance, no hotspot keyword set within the set time interval is omitted, and microblogs reflecting public-opinion hotspots can be pushed to the corresponding users in time.

Description

Method and device for pushing microblogs
Technical field
The present invention relates to the field of communication technology, and in particular to a method and a device for pushing microblogs.
Background art
In recent years, as the internet has become widespread, online media have come to be recognized as the "fourth medium" after newspapers, radio, and television, and the network has become one of the main carriers reflecting hot topics of public opinion. With the rise and development of microblogs in particular, their immediacy, rapid propagation, and convenience have further promoted the development of online public opinion, and public opinion on microblogs has become one of the most influential forms of online public opinion.
Through microblogs, a user can both publish public-opinion information that he or she has discovered and forward microblogs published by other users. A public-opinion hotspot that can attract the attention of a large number of users will, once published as a microblog, be forwarded and followed by a large number of users within a short time. Relevant departments and enterprises have therefore begun to pay attention to the public-opinion hotspots reflected in microblogs so that they can respond to them in time. For example, if information that a critically ill baby is being taken to hospital is published as a microblog, the microblog may be forwarded in large numbers within a short time and attract the attention of the traffic control department, which can then promptly take countermeasures such as clearing the road for the vehicle carrying the baby, so that the baby reaches hospital in time.
However, because the volume of microblog information is enormous, it is very difficult to determine the public-opinion hotspots reflected in a massive number of microblogs by manual means alone. How to determine these hotspots has therefore become a problem that urgently needs to be solved.
In the prior art, text clustering based on the k-means algorithm is mainly used to determine the public-opinion hotspots reflected in massive microblogs, and at least one microblog reflecting such a hotspot is pushed to the corresponding user, who may be, for example, a relevant department or a relevant enterprise.
Text clustering based on the k-means algorithm requires the number of clusters to be set in advance. In other words, the number of hotspots reflected in the massive microblogs must be preset before the microblogs can be clustered with the k-means algorithm. Each cluster obtained is a cluster of microblogs reflecting one hotspot, so the number of hotspots determined equals the preset number of clusters.
However, the number of hotspots reflected in massive microblogs often cannot be estimated. If the preset number of clusters is too small, some hotspots reflected in the microblogs will be missed, and the microblogs reflecting the missed hotspots cannot be pushed to the corresponding users in time.
Summary of the invention
Embodiments of the present invention provide a method and a device for pushing microblogs, to solve the prior-art problem that microblogs reflecting some public-opinion hotspots cannot be pushed to the corresponding users in time.
A method for pushing microblogs provided by an embodiment of the present invention comprises:
receiving the microblogs published within a set time interval, and determining the keywords in each received microblog;
determining keyword sets from the determined keywords by a set method, and determining all keyword sets that can be determined by the set method, wherein the set method is: selecting any two of the keywords to form one keyword set;
performing incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two of the determined keyword sets, to obtain clustered keyword sets;
for each clustered keyword set obtained, judging whether the hotspot library contains a hotspot keyword set whose similarity to the clustered keyword set is greater than a set similarity, and, if not, selecting from the received microblogs the microblogs related to the clustered keyword set, pushing them to the relevant users, and saving the clustered keyword set in the hotspot library as a hotspot keyword set.
A device for pushing microblogs provided by an embodiment of the present invention comprises:
a receiving and word segmentation module, configured to receive the microblogs published within a set time interval and determine the keywords in each received microblog;
a keyword set determination module, configured to determine keyword sets from the determined keywords by a set method, and to determine all keyword sets that can be determined by the set method, wherein the set method is: selecting any two of the keywords to form one keyword set;
an incremental clustering module, configured to perform incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two of the determined keyword sets, to obtain clustered keyword sets;
a judging and pushing module, configured to judge, for each clustered keyword set obtained, whether the hotspot library contains a hotspot keyword set whose similarity to the clustered keyword set is greater than a set similarity, and, if not, to select from the received microblogs the microblogs related to the clustered keyword set, push them to the relevant users, and save the clustered keyword set in the hotspot library as a hotspot keyword set.
Embodiments of the present invention provide a method and a device for pushing microblogs. The method determines the keywords in each microblog received within a set time interval, determines keyword sets by selecting any two of the keywords to form a keyword set, and performs incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two keyword sets. For each clustered keyword set obtained, when the hotspot library contains no hotspot keyword set whose similarity to the clustered keyword set is greater than the set similarity, the microblogs related to the clustered keyword set are pushed to the relevant users. With this method, the clustered keyword sets obtained are the hotspot keyword sets corresponding to the hotspots reflected by the microblogs published within the set time interval. Because the number of clusters need not be preset, no hotspot keyword set within the set time interval is omitted, and microblogs reflecting public-opinion hotspots can be pushed to the corresponding users in time.
Description of drawings
Fig. 1 shows the microblog pushing process provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the microblog pushing device provided by an embodiment of the present invention.
Embodiments
The embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows the microblog pushing process provided by an embodiment of the present invention, which comprises the following steps:
S101: Receive the microblogs published within a set time interval, and determine the keywords in each received microblog.
In this embodiment, at the end of each set time interval the server determines the keywords contained in the microblogs received during that interval. The set time interval can be chosen as required, for example 2 hours.
To determine the keywords contained in each received microblog, the server may perform word segmentation on the microblog and, among the resulting segmented words, take those of a specified type as keywords. Specifically, stop words are first removed from the segmented words. Each remaining word is then matched against a pre-stored dictionary of words of the specified type; if the match succeeds, the word is of the specified type and is taken as a keyword. The specified types are part-of-speech types such as noun, verb, and adjective.
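The filtering part of step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes word segmentation has already been done by some external tool, and the names `extract_keywords`, `STOP_WORDS`, and `TYPE_DICTIONARY` are hypothetical.

```python
# Hypothetical stop-word list and pre-stored dictionary of words of the
# specified types (nouns, verbs, adjectives). Both are illustrative only.
STOP_WORDS = {"the", "is", "to", "a"}
TYPE_DICTIONARY = {"baby", "sent", "hospital", "traffic"}


def extract_keywords(segmented_words, stop_words, type_dictionary):
    """Return the keywords of one microblog from its segmented words."""
    keywords = []
    for word in segmented_words:
        if word in stop_words:
            continue  # step 1: drop stop words
        if word in type_dictionary:
            keywords.append(word)  # step 2: keep words of a specified type
    return keywords


# Segmented words of one microblog (segmentation itself is assumed done).
words = ["the", "baby", "is", "sent", "to", "hospital", "quickly"]
keywords = extract_keywords(words, STOP_WORDS, TYPE_DICTIONARY)
```

Here "quickly" is dropped because it is absent from the assumed dictionary, even though it is not a stop word.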
S102: Determine keyword sets from the determined keywords by a set method, and determine all keyword sets that can be determined by the set method.
The set method is: select any two of the keywords to form one keyword set.
For example, suppose the keywords determined in step S101 from the microblogs received within the set time interval are keyword X, keyword Y, and keyword Z. By selecting any two keywords to form one keyword set, the server can determine three keyword sets in total: {keyword X, keyword Y}, {keyword Y, keyword Z}, and {keyword X, keyword Z}.
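Forming every two-keyword set, as in step S102, amounts to enumerating all 2-combinations of the distinct keywords. A short sketch using the standard library (the function name `build_keyword_sets` is an assumption for illustration):

```python
from itertools import combinations


def build_keyword_sets(keywords):
    """Return every keyword set formed by choosing two distinct keywords."""
    distinct = sorted(set(keywords))  # remove repeats, fix a stable order
    return [frozenset(pair) for pair in combinations(distinct, 2)]


keyword_sets = build_keyword_sets(["X", "Y", "Z"])
```

For n distinct keywords this yields n(n-1)/2 keyword sets, so three keywords give three sets, matching the example in the text.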
S103: Perform incremental clustering on the determined keyword sets according to the keywords contained in the intersection and the union of every two of the determined keyword sets, to obtain clustered keyword sets.
Specifically, when performing incremental clustering, the server may first sort the keyword sets according to some rule and then, in the sorted order, perform the following steps A and B for each keyword set in turn:
Step A: take the keyword set currently being processed as the keyword set to be clustered, and take every keyword set ranked before it as a preceding keyword set;
Step B: for each preceding keyword set, determine a first quantity, which is the number of keywords in the intersection of the keyword set to be clustered and the preceding keyword set, and a second quantity, which is the number of keywords in their union. When the ratio of the first quantity to the second quantity is greater than a set ratio, add to the preceding keyword set the keywords of the keyword set to be clustered that satisfy a first specified condition, namely the keywords that are contained in the keyword set to be clustered but not in the preceding keyword set.
Once steps A and B have been performed for every keyword set in the sorted order, the incremental clustering is finished, and the keyword sets that result are the clustered keyword sets.
Continuing the example above, the three keyword sets determined in step S102 are {keyword X, keyword Y}, {keyword Y, keyword Z}, and {keyword X, keyword Z}. Suppose they are sorted arbitrarily into the order {keyword X, keyword Y}, {keyword Y, keyword Z}, {keyword X, keyword Z}. Then:
Following the sorted order, {keyword X, keyword Y} is processed first and taken as the keyword set to be clustered. Because no keyword set is ranked before it, its processing ends immediately, and processing continues with {keyword Y, keyword Z}.
{keyword Y, keyword Z} is taken as the keyword set to be clustered. The keyword set ranked before it is {keyword X, keyword Y}, which is therefore the preceding keyword set. The intersection of the two contains 1 keyword (the first quantity) and their union contains 3 keywords (the second quantity), so the ratio is 1/3. Supposing the set ratio is 1/5, the ratio is greater than the set ratio. The keyword satisfying the first specified condition is keyword Z, which is contained only in the keyword set to be clustered and not in the preceding keyword set. Keyword Z is therefore added to the preceding keyword set {keyword X, keyword Y}, which becomes {keyword X, keyword Y, keyword Z}. This ends the processing of {keyword Y, keyword Z}, and processing continues with {keyword X, keyword Z}.
{keyword X, keyword Z} is taken as the keyword set to be clustered. The keyword sets ranked before it are {keyword X, keyword Y, keyword Z} and {keyword Y, keyword Z}, which are therefore the preceding keyword sets. For the preceding keyword set {keyword X, keyword Y, keyword Z}, the ratio of the first quantity to the second quantity is greater than the set ratio, but every keyword in {keyword X, keyword Z} is already contained in {keyword X, keyword Y, keyword Z}, so there is no keyword to add. For the preceding keyword set {keyword Y, keyword Z}, the ratio of the first quantity to the second quantity is also greater than the set ratio, and the keyword satisfying the first specified condition is keyword X. Keyword X is therefore added to {keyword Y, keyword Z}, which becomes {keyword X, keyword Y, keyword Z}.
All three keyword sets have now been processed, so the incremental clustering is finished. The three clustered keyword sets obtained are {keyword X, keyword Y, keyword Z}, {keyword X, keyword Y, keyword Z}, and {keyword X, keyword Z}. Two of them are identical; in this embodiment, when several identical clustered keyword sets are obtained, only one of them is kept.
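Steps A and B, together with the removal of identical results, can be sketched in a few lines. This is an illustrative reading of the procedure described above, not an authoritative implementation; the function name `incremental_cluster` and the parameter name `set_ratio` are assumptions.

```python
def incremental_cluster(keyword_sets, set_ratio):
    """Incrementally cluster keyword sets that are already in sorted order."""
    clusters = [set(s) for s in keyword_sets]  # work on mutable copies
    for index, to_cluster in enumerate(clusters):
        # Step A: every set ranked earlier is a preceding keyword set.
        for preceding in clusters[:index]:
            # Step B: compare intersection size with union size.
            first_quantity = len(to_cluster & preceding)
            second_quantity = len(to_cluster | preceding)
            if first_quantity / second_quantity > set_ratio:
                # Add keywords found only in the set being clustered.
                preceding |= to_cluster - preceding
    # Keep only one copy of identical clustered keyword sets.
    unique = []
    for cluster in clusters:
        if cluster not in unique:
            unique.append(cluster)
    return unique


clusters = incremental_cluster(
    [{"X", "Y"}, {"Y", "Z"}, {"X", "Z"}], set_ratio=1 / 5
)
```

Run on the worked example with a set ratio of 1/5, the sketch leaves {X, Y, Z} and {X, Z} after duplicates are dropped. Note that no cluster count is supplied anywhere, which is the property the text contrasts with k-means.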
In this embodiment, each clustered keyword set obtained by incremental clustering is in effect the hotspot keyword set corresponding to one of the hotspots reflected by the microblogs published within the set time interval.
S104: For each clustered keyword set obtained, judge whether the hotspot library contains a hotspot keyword set whose similarity to the clustered keyword set is greater than a set similarity. If so, perform step S105; otherwise, perform step S106.
In this embodiment, the server maintains a hotspot library that stores the hotspot keyword set corresponding to each public-opinion hotspot. The hotspot keyword set corresponding to a hotspot is simply the set formed by the keywords of that hotspot. Each time the server finds a new hotspot, it saves the corresponding hotspot keyword set in the hotspot library.
Judging in step S104 whether the hotspot library contains a hotspot keyword set whose similarity to a clustered keyword set is greater than the set similarity is in effect judging whether the hotspot keyword set corresponding to a hotspot reflected by the microblogs published within the set time interval corresponds to a new hotspot.
S105: Determine, in the hotspot library, the hotspot keyword set with the greatest similarity to the clustered keyword set, and save the microblogs related to the clustered keyword set as microblogs related to the determined hotspot keyword set.
If the hotspot library contains at least one hotspot keyword set whose similarity to the clustered keyword set is greater than the set similarity, the hotspot corresponding to the clustered keyword set is not a new hotspot. In this embodiment, the hotspot keyword set with the greatest similarity to the clustered keyword set is therefore determined in the hotspot library. The hotspot corresponding to the determined hotspot keyword set can be regarded as the same as the hotspot corresponding to the clustered keyword set, so the microblogs related to the clustered keyword set are saved as microblogs related to the determined hotspot keyword set.
S106: Select, from the received microblogs, the microblogs related to the clustered keyword set and push them to the relevant users, and save the clustered keyword set in the hotspot library as a hotspot keyword set.
If the hotspot library contains no hotspot keyword set whose similarity to the clustered keyword set is greater than the set similarity, the hotspot corresponding to the clustered keyword set is a new hotspot. The microblogs related to the clustered keyword set are therefore selected from the microblogs received within the set time interval and pushed to the relevant users, and the clustered keyword set is saved in the hotspot library as the hotspot keyword set corresponding to the new hotspot.
As step S106 shows, in this embodiment the server pushes the microblogs related to a clustered keyword set to the relevant users as soon as it finds that the corresponding hotspot is new. Conversely, if the hotspot corresponding to a clustered keyword set is not new, microblogs reflecting that hotspot have already been pushed to the relevant users. In that case there is no need to push the related microblogs again, which reduces the consumption of network resources.
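The decision in steps S104 to S106 can be sketched as below. The source does not state how similarity between keyword sets is measured, so this sketch assumes Jaccard similarity purely for illustration. It also takes the related microblogs as an input rather than selecting them, and uses a caller-supplied `push` callback. All names (`jaccard`, `handle_cluster`, `saved_posts`) are hypothetical.

```python
def jaccard(a, b):
    """Assumed similarity measure: intersection size over union size."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def handle_cluster(cluster, related_posts, hotspot_library, saved_posts,
                   set_similarity, push):
    """Return True if the cluster was treated as a new hotspot and pushed."""
    best_index, best_score = None, set_similarity
    for index, known in enumerate(hotspot_library):  # S104
        score = jaccard(cluster, known)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None:
        # S105: known hotspot, file the posts under the most similar entry.
        saved_posts.setdefault(best_index, []).extend(related_posts)
        return False
    # S106: new hotspot, push the posts and remember the keyword set.
    push(related_posts)
    hotspot_library.append(set(cluster))
    return True


library, saved, pushed = [], {}, []
first = handle_cluster({"baby", "hospital", "traffic"}, ["post 1"],
                       library, saved, 0.5, pushed.extend)
second = handle_cluster({"baby", "hospital"}, ["post 2"],
                        library, saved, 0.5, pushed.extend)
```

In this usage the first cluster is new and is pushed, while the second is similar enough to the stored one (2/3 under the assumed measure) that its post is only saved, so nothing is pushed twice.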
With the above method, each clustered keyword set that the server obtains by incremental clustering is the hotspot keyword set corresponding to a hotspot reflected by the microblogs published within the set time interval. Because the number of clusters need not be preset for the incremental clustering, no hotspot keyword set corresponding to a hotspot reflected by the microblogs published within the interval is omitted, so the server can push the related microblogs to the relevant users promptly once it finds a new hotspot. Moreover, the complexity of the incremental clustering is far lower than that of the prior-art k-means clustering algorithm, so the method provided by this embodiment further reduces the delay in pushing microblogs and improves the real-time performance of pushing.
In this embodiment, the keywords determined in step S101 of Fig. 1 occur with different frequencies in the received microblogs. To improve the accuracy of the clustered keyword sets subsequently determined to reflect a hotspot, the server determines keyword sets in step S102 of Fig. 1 as follows. For each determined keyword, the weight of the keyword is determined from the total number of times the keyword occurs in the received microblogs, the number of received microblogs, and the pre-stored inverse document frequency of the keyword, using the formula
$$Word_{Weight} = \frac{n_{Word}}{N} \times Idf$$
where $n_{Word}$ is the total number of times the keyword occurs in the received microblogs, $N$ is the number of received microblogs, $Idf$ is the pre-stored inverse document frequency of the keyword, and $Word_{Weight}$ is the determined weight of the keyword. According to the weights of the determined keywords, a first set quantity of keywords is then selected in descending order of weight, and keyword sets are determined from the selected keywords by the set method.
The first set quantity can be chosen as required, for example 200. In the formula above, $\frac{n_{Word}}{N}$ is in effect the term frequency of the keyword. That is, after determining the keywords in the received microblogs, the server determines the weight of each keyword. If the first set quantity is 200, the server takes the 200 keywords with the largest weights and, by selecting any two of these 200 keywords to form one keyword set, determines all keyword sets that can be formed, of which there are $C_{200}^{2} = 19900$ in total.
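The keyword weighting and top-N selection can be sketched as follows. The sketch assumes the weight is the term frequency n_Word / N multiplied by the stored Idf, which is how the surrounding symbol definitions read. The function names and the sample Idf values are hypothetical.

```python
from collections import Counter


def keyword_weights(microblog_keywords, idf):
    """Weight each keyword as (total occurrences / number of posts) * Idf."""
    n_posts = len(microblog_keywords)
    counts = Counter(k for post in microblog_keywords for k in post)
    return {word: (count / n_posts) * idf.get(word, 0.0)
            for word, count in counts.items()}


def top_keywords(weights, first_set_quantity):
    """Select keywords in descending order of weight (ties by name)."""
    ranked = sorted(weights, key=lambda word: (-weights[word], word))
    return ranked[:first_set_quantity]


# Keywords per received microblog, plus assumed pre-stored Idf values.
posts = [["baby", "hospital"], ["baby", "traffic"], ["baby"]]
idf = {"baby": 1.0, "hospital": 2.0, "traffic": 1.5}
weights = keyword_weights(posts, idf)
selected = top_keywords(weights, first_set_quantity=2)
```

With a first set quantity of 200, as in the text's example, the selected keywords would then yield 200 × 199 / 2 = 19900 two-keyword sets.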
Correspondingly, after the keyword sets are determined and before incremental clustering is performed on them in step S103 of Fig. 1, the weight of each determined keyword set is also calculated, and only the keyword sets with the larger weights are clustered. Specifically, before incremental clustering, for each determined keyword set the mutual information of the two keywords it contains is determined, and the weight of the keyword set is determined from this mutual information and the weights of the two keywords, using the formula
[formula image BDA00002243360600102: $D_{Weight}$ expressed in terms of $I(i, j)$, $Word_{Weight}^{i}$, and $Word_{Weight}^{j}$]
where $i$ and $j$ denote the two keywords contained in the keyword set, $Word_{Weight}^{i}$ is the weight of keyword $i$, $Word_{Weight}^{j}$ is the weight of keyword $j$, $D_{Weight}$ is the determined weight of the keyword set, and $I(i, j)$ is the mutual information of keyword $i$ and keyword $j$:
$$I(i, j) = \log \frac{p(i, j)}{p(i)\,p(j)}$$
Here $p(i)$ is the probability that a received microblog contains keyword $i$, $p(j)$ is the probability that a received microblog contains keyword $j$, and $p(i, j)$ is the probability that a received microblog contains both keyword $i$ and keyword $j$. According to the weights of the determined keyword sets, a second set quantity of keyword sets is then selected in descending order of weight. Incremental clustering is performed on the selected keyword sets according to the keywords contained in the intersection and the union of every two of them.
The second set quantity can be chosen as required, for example 300. That is, after the keyword sets are determined, the weight of each keyword set is determined. If the second set quantity is 300, the 300 keyword sets with the largest weights are selected, and incremental clustering is performed on these 300 keyword sets according to the keywords contained in the intersection and the union of every two of them.
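The mutual information term can be estimated directly from the received microblogs. The sketch below assumes the standard pointwise form log(p(i, j) / (p(i) p(j))) with a natural logarithm, and estimates each probability as a fraction of the received microblogs. The formula that combines this value with the two keyword weights into the keyword set weight is not reproduced in the source, so it is deliberately left out. The function name is hypothetical.

```python
import math


def mutual_information(microblog_keywords, i, j):
    """Estimate I(i, j) from the keyword sets of the received microblogs."""
    n_posts = len(microblog_keywords)
    p_i = sum(1 for post in microblog_keywords if i in post) / n_posts
    p_j = sum(1 for post in microblog_keywords if j in post) / n_posts
    p_ij = sum(1 for post in microblog_keywords
               if i in post and j in post) / n_posts
    if p_ij == 0:
        return float("-inf")  # the two keywords never co-occur
    return math.log(p_ij / (p_i * p_j))


posts = [{"baby", "hospital"}, {"baby", "hospital"}, {"traffic"}, {"baby"}]
mi_related = mutual_information(posts, "baby", "hospital")
mi_unrelated = mutual_information(posts, "baby", "traffic")
```

Keywords that tend to appear in the same microblogs get a positive value, while keywords that never co-occur get negative infinity under this convention, so such pairs would rank last when keyword sets are ordered by weight.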
In embodiments of the present invention, the method that step S103 shown in Figure 1 carries out the increment cluster to the second keyword set of setting quantity of selecting is specially: according to the weight of each keyword set of selecting, the second keyword set of setting quantity of selecting is sorted according to weight order from big to small; According to the sequencing of keyword set ordering, successively for each keyword set, carry out following steps A ~ B:
Steps A, with current for keyword set as keyword set to be clustered, determine to come each keyword set before the keyword set to be clustered, as the preorder keyword set;
Step B, for each preorder keyword set of determining, determine the first quantity of the keyword that comprises in the common factor of keyword set to be clustered and this preorder keyword set, determine the second quantity keyword set to be clustered and this preorder keyword set and that concentrate the keyword that comprises, when the ratio of the first quantity and the second quantity when setting ratio, the keyword that satisfies the first specified requirements in the keyword set to be clustered is added in this preorder keyword set, wherein, the keyword of satisfied the first specified requirements is: be included in this keyword set to be clustered, and be not included in the keyword in this preorder keyword set.
This incremental clustering method is essentially the same as the incremental clustering of step S103 shown in Figure 1, except that the keyword sets are sorted in descending order of weight.
For example, suppose three keyword sets have been selected (set 1, set 2, and set 3) and that their descending weight order is set 1, set 2, set 3. Since no keyword set is ranked before set 1, processing starts from set 2: set 2 is taken as the keyword set to be clustered and set 1 as the preceding keyword set. If the ratio of the first quantity (the number of keywords in the intersection of set 2 and set 1) to the second quantity (the number of keywords in their union) is greater than the preset ratio (for example 20%), the keywords in set 2 that satisfy the first specified condition are added to set 1. Set 3 is then processed in the same way, which is not described again here.
Note that in the example above, when set 3 is processed, the set 1 used as a preceding keyword set is set 1 after the keywords in set 2 that satisfy the first specified condition have been added to it.
As can be seen, the incremental clustering provided by this embodiment is directional: keywords in a lower-weight keyword set that satisfy the first specified condition are added to a higher-weight keyword set. This is because a keyword set reflecting a hot public opinion usually has a larger weight than a keyword set reflecting an ordinary public opinion, so this incremental clustering method can determine more accurately the hot public opinions reflected by the microblogs received within the preset time interval.
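Steps A and B can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pseudocode of the original publication: the function name `incremental_cluster`, the parallel `weights` list, and the default ratio of 0.2 (taken from the 20% example) are all introduced here for illustration.

```python
def incremental_cluster(keyword_sets, weights, set_ratio=0.2):
    """Merge lower-weight keyword sets into higher-weight ones.

    keyword_sets: list of sets of keywords (hypothetical input format).
    weights: list of weights, parallel to keyword_sets.
    Returns the keyword sets in descending weight order after merging.
    """
    order = sorted(range(len(keyword_sets)), key=lambda k: weights[k], reverse=True)
    ordered = [set(keyword_sets[k]) for k in order]
    for pos in range(1, len(ordered)):
        current = ordered[pos]                 # keyword set to be clustered
        for preceding in ordered[:pos]:        # every set ranked before it
            first = len(current & preceding)   # first quantity: intersection size
            second = len(current | preceding)  # second quantity: union size
            if second and first / second > set_ratio:
                # add the keywords of current that preceding does not contain yet
                preceding |= current - preceding
    return ordered
```

Because the preceding sets are updated in place, a later set is always compared against the already enlarged higher-weight sets, which matches the note about set 1 in the example above.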
The incremental clustering method above was described using the example in which the keywords satisfying the first specified condition are added to the preceding keyword set when the ratio of the first quantity (the number of keywords in the intersection of the keyword set to be clustered and the preceding keyword set) to the second quantity (the number of keywords in their union) is greater than the preset ratio. In practice, the second quantity can instead be defined as the smaller of the number of keywords in the keyword set to be clustered and the number of keywords in the preceding keyword set, with the first quantity still being the number of keywords in their intersection; when the ratio of the first quantity to the second quantity is greater than the preset ratio, the keywords in the keyword set to be clustered that satisfy the first specified condition are added to the preceding keyword set.
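The alternative ratio described above, in which the denominator is the size of the smaller set, might look like the following sketch. The function name `overlap_ratio` is an assumption, as is returning 0.0 when either set is empty.

```python
def overlap_ratio(current, preceding):
    """Intersection size divided by the size of the smaller keyword set."""
    first = len(current & preceding)            # first quantity
    second = min(len(current), len(preceding))  # second quantity (alternative definition)
    return first / second if second else 0.0
```

Compared with the union-based ratio, this variant does not penalise a small keyword set for being compared against a much larger one.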
To further improve the accuracy of the hot public opinions determined from the microblogs received within the preset time interval, in the incremental clustering method above, before adding the keywords in the keyword set to be clustered that satisfy the first specified condition to a preceding keyword set, the server also determines that, among the received microblogs, the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in this preceding keyword set is greater than a third preset quantity. That is, the keywords satisfying the first specified condition are added to the preceding keyword set only when both conditions hold: the ratio of the first quantity (the number of keywords in the intersection of the keyword set to be clustered and the preceding keyword set) to the second quantity (the number of keywords in their union) is greater than the preset ratio, and the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in the preceding keyword set is greater than the third preset quantity. If either condition is not satisfied, the keywords are not added.
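The additional co-occurrence condition can be sketched as below. The sketch assumes each received microblog is represented by the set of keywords it contains; the name `cooccurrence_ok` and this representation are illustrative assumptions.

```python
def cooccurrence_ok(microblogs, current, preceding, third_quantity):
    """Check whether enough microblogs contain all keywords involved in the merge.

    microblogs: list of keyword sets, one per received microblog (assumed format).
    """
    # keywords satisfying the first specified condition plus every keyword
    # of the preceding set, which together form the union of the two sets
    required = (current - preceding) | preceding
    count = sum(1 for keywords in microblogs if required <= keywords)
    return count > third_quantity
```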
Specifically, the pseudocode implementing the incremental clustering method above is as follows:
[Pseudocode provided as images in the original publication: Figure BDA00002243360600121, Figure BDA00002243360600131]
In the pseudocode above, cluster denotes a keyword set.
Through the incremental clustering method above, once all of the selected second preset quantity of keyword sets have been processed, each resulting keyword set is a keyword set to which some keywords have been added, that is, a cluster keyword set. Each cluster keyword set is the hot public opinion keyword set corresponding to a hot public opinion reflected by the microblogs published within the preset time interval. Subsequently, for each cluster keyword set obtained, it is judged whether the hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity, so as to judge whether the hot public opinion corresponding to this cluster keyword set is a new one; the microblogs related to the cluster keyword sets corresponding to new hot public opinions are then pushed to the related users.
In this embodiment of the present invention, before step S104 shown in Figure 1 judges, for each cluster keyword set obtained, whether the hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity, the cluster keyword sets obtained can also be filtered to remove those that contain few keywords or have few related microblogs, further improving the accuracy of the cluster keyword sets determined to reflect hot public opinions. Specifically, before the similarity judgment, the server extracts from the cluster keyword sets obtained those that satisfy a second specified condition, which comprises: the number of keywords contained is not less than a fourth preset quantity, and the number of microblogs related to this cluster keyword set is greater than a fifth preset quantity. A microblog related to this cluster keyword set is a microblog that contains at least m keywords of this cluster keyword set, where m is a sixth preset quantity.
The fourth and fifth preset quantities can both be set as required. For example, if the fourth preset quantity is 3 and the fifth preset quantity is 20, then, among the cluster keyword sets obtained by incremental clustering, those that contain fewer than 3 keywords or have no more than 20 related microblogs are removed.
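The filtering by the second specified condition can be sketched as follows. The defaults for `fourth` and `fifth` come from the example above; the default `m=2` is purely an assumption, since the text gives no value for the sixth preset quantity. Microblogs are again assumed to be represented as keyword sets.

```python
def related_microblogs(microblogs, cluster, m):
    """Indices of microblogs containing at least m keywords of the cluster."""
    return [i for i, keywords in enumerate(microblogs) if len(keywords & cluster) >= m]

def satisfies_second_condition(microblogs, cluster, fourth=3, fifth=20, m=2):
    """True if the cluster has enough keywords and enough related microblogs."""
    enough_keywords = len(cluster) >= fourth  # not less than the fourth quantity
    enough_microblogs = len(related_microblogs(microblogs, cluster, m)) > fifth
    return enough_keywords and enough_microblogs
```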
Note that the microblogs related to the cluster keyword sets of different hot public opinions may overlap. To avoid pushing the same microblog to the related users repeatedly and wasting network resources, the microblogs related to each cluster keyword set obtained can be determined in this embodiment as follows: redetermine the weight of each cluster keyword set obtained; then, in descending order of weight, for each cluster keyword set in turn, take as its related microblogs those that are related to it (that is, contain at least m of its keywords) and have not already been determined to be related to another cluster keyword set.
For example, suppose three cluster keyword sets have been obtained (set 1, set 2, and set 3) and that, after their weights are redetermined, their descending weight order is set 1, set 2, set 3. Suppose five microblogs, microblog 1 to microblog 5, were received within the preset time interval. First, for set 1, the microblogs containing at least m keywords of set 1 are determined to be microblogs 1, 2, and 3, so the microblogs related to set 1 are microblogs 1, 2, and 3. Next, for set 2, the microblogs containing at least m keywords of set 2 are determined to be microblogs 3 and 4. Since microblog 3 has already been determined to be related to set 1, it is not taken as a microblog related to set 2, so that the microblogs related to set 1 and those related to set 2 do not overlap; only microblog 4 is determined to be related to set 2. In other words, among the received microblogs, those that contain at least m keywords of set 2 and have not been determined to be related to another cluster keyword set are determined to be related to set 2. Similarly, for set 3, the microblogs containing at least m keywords of set 3 are microblogs 4 and 5, so only microblog 5 is determined to be related to set 3.
As can be seen, after the cluster keyword sets are obtained by incremental clustering and those satisfying the second specified condition are extracted, the method of determining the microblogs related to each cluster keyword set is also directional: priority is again given to the cluster keyword sets with larger weights.
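The weight-ordered assignment of microblogs can be sketched as follows. The sketch assumes the cluster keyword sets are already sorted by descending redetermined weight and that microblogs are represented as keyword sets; the function name `assign_microblogs` is illustrative.

```python
def assign_microblogs(microblogs, clusters, m):
    """Assign each microblog to at most one cluster, highest weight first.

    clusters: cluster keyword sets, assumed sorted by descending weight.
    Returns one list of microblog indices per cluster.
    """
    assigned = set()
    result = []
    for cluster in clusters:
        related = [i for i, keywords in enumerate(microblogs)
                   if i not in assigned and len(keywords & cluster) >= m]
        assigned.update(related)  # these microblogs are no longer available
        result.append(related)
    return result
```

With five microblogs arranged as in the example above (indices 0 to 4 standing for microblogs 1 to 5), the three sets receive microblogs 1 to 3, microblog 4, and microblog 5 respectively.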
Specifically, the weight of a cluster keyword set can be redetermined using the formula $D'_{weight} = I(i_1, i_2, \ldots, i_n) \times (Word_{weight}^{i_1} + Word_{weight}^{i_2} + \cdots + Word_{weight}^{i_n})$, where $i_1, i_2, \ldots, i_n$ are the n keywords contained in this cluster keyword set, $Word_{weight}^{i_1}, Word_{weight}^{i_2}, \ldots, Word_{weight}^{i_n}$ are the respective weights of these n keywords, $I(i_1, i_2, \ldots, i_n)$ is the mutual information of these n keywords, which can be determined according to the formula $I(i_1, i_2, \ldots, i_n) = \log \frac{p(i_1, i_2, \ldots, i_n)}{p(i_1)\, p(i_2) \cdots p(i_n)}$, and $D'_{weight}$ is the redetermined weight of this cluster keyword set.
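The weight formula above can be illustrated with the following sketch, in which the probabilities are estimated as relative frequencies over the received microblogs. Several points are assumptions made for the sketch: the natural logarithm is used because the text does not state the base, a weight of 0.0 is returned when no microblog contains all keywords, and the `word_weight` dictionary stands in for keyword weights computed elsewhere.

```python
import math

def cluster_weight(microblogs, cluster, word_weight):
    """Mutual information of the cluster's keywords times the sum of their weights.

    microblogs: non-empty list of keyword sets, one per received microblog.
    word_weight: dict mapping each keyword to its weight (assumed given).
    """
    total = len(microblogs)
    # p(i_1, ..., i_n): share of microblogs containing every keyword of the cluster
    joint = sum(1 for keywords in microblogs if cluster <= keywords) / total
    if joint == 0:
        return 0.0  # assumption: the logarithm is undefined, so treat as no weight
    # p(i_1) * ... * p(i_n): product of the individual keyword probabilities
    product = math.prod(
        sum(1 for keywords in microblogs if k in keywords) / total for k in cluster
    )
    mutual_information = math.log(joint / product)
    return mutual_information * sum(word_weight[k] for k in cluster)
```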
Preferably, to improve the efficiency of redetermining the weight of a cluster keyword set, in this embodiment the weight of a keyword set is redetermined once each time a keyword is added to it during incremental clustering. When incremental clustering ends, this keyword set is the cluster keyword set, and the weight determined for it is the redetermined weight of the cluster keyword set. A concrete example of redetermining the weight of a keyword set each time a keyword is added is given below.
Suppose the keyword set is {keyword X, keyword Y} and its current weight is $D_{weight}^{X,Y}$. If keyword Z is added to this keyword set, the keyword set becomes {keyword X, keyword Y, keyword Z}, and its weight is redetermined using the formula $D_{weight}^{X,Y,Z} = I((X,Y),Z) \times (D_{weight}^{X,Y} + Word_{weight}^{Z})$, where $I((X,Y),Z)$ is given by a formula provided as an image in the original publication (Figure BDA00002243360600167), $Word_{weight}^{Z}$ is the weight of keyword Z, P(X, Y, Z) is the probability that a received microblog contains keyword X, keyword Y, and keyword Z simultaneously, P(Z) is the probability that a received microblog contains keyword Z, and $D_{weight}^{X,Y,Z}$ is the redetermined weight of this keyword set.
With this method, the weight of a keyword set is redetermined once each time a keyword is added to it during incremental clustering, so the weight of each cluster keyword set is available directly when incremental clustering ends. The cluster keyword sets can then be sorted in descending order of weight and the microblogs related to each cluster keyword set determined, which is not described again here.
The pseudocode for filtering the cluster keyword sets obtained using the method above and determining the microblogs related to each cluster keyword set is as follows:
[Pseudocode provided as images in the original publication: Figure BDA00002243360600171, Figure BDA00002243360600181]
In step S104 shown in Figure 1, when judging for each cluster keyword set obtained whether the hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity, the Jaccard similarity coefficient can be used to determine the similarity between the cluster keyword set and the hot public opinion keyword sets in the library. Specifically, for each extracted cluster keyword set satisfying the second specified condition (the number of keywords contained is not less than the fourth preset quantity, and the number of related microblogs is greater than the fifth preset quantity), the formula $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$ is used to determine the similarity between this cluster keyword set and each hot public opinion keyword set in the library, and it is judged whether the library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity. Here A is the extracted cluster keyword set, B is a hot public opinion keyword set in the hot public opinion library, J(A, B) is the similarity between them, |A ∩ B| is the number of keywords in the intersection of A and B, and |A ∪ B| is the number of keywords in the union of A and B.
In this way, the similarity between the cluster keyword set and each hot public opinion keyword set in the hot public opinion library can be determined. If the library contains no hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity, the hot public opinion corresponding to this cluster keyword set is determined to be a new hot public opinion; the microblogs related to this cluster keyword set are pushed to the related users in step S106, and the cluster keyword set is saved in the hot public opinion library as a new hot public opinion keyword set. The related users can be, for example, relevant departments or enterprises, or users who have subscribed to this hot public opinion.
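The similarity check against the hot public opinion library can be sketched as follows. The library is modelled as a plain list of keyword sets, and the names `jaccard`, `is_new_hot_opinion`, and `process_cluster` are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def is_new_hot_opinion(cluster, library, set_similarity):
    """True if no library entry is more similar to the cluster than the threshold."""
    return all(jaccard(cluster, known) <= set_similarity for known in library)

def process_cluster(cluster, library, set_similarity):
    """Save a new cluster keyword set in the library; report whether it was new."""
    if is_new_hot_opinion(cluster, library, set_similarity):
        library.append(set(cluster))  # stored as a new hot public opinion keyword set
        return True                   # caller would push the related microblogs
    return False
```

Once a cluster keyword set has been saved, the same keyword set arriving in a later time interval has similarity 1.0 with the stored entry and is no longer treated as new.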
Preferably, when it is determined for a cluster keyword set that the hot public opinion library contains no hot public opinion keyword set whose similarity to it is greater than the preset similarity, the server can also compute, among the microblogs related to this cluster keyword set, the respective proportions of microblogs with positive, negative, and neutral sentiment, and send these proportions to the related users, so that the users can better grasp the sentiment of public opinion on the corresponding hot public opinion and respond accordingly.
Specifically, for each microblog related to this cluster keyword set, the sentiment orientation value of the microblog is determined; based on these values, the percentages of related microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0 are determined, and these percentages are sent to the related users. The sentiment orientation value determined for each related microblog represents that microblog's sentiment orientation.
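Computing the three percentages is straightforward; a minimal sketch follows. The function name and the choice to return zeros for an empty input are assumptions.

```python
def sentiment_percentages(values):
    """Percentages of positive, negative, and neutral sentiment orientation values."""
    total = len(values)
    if total == 0:
        return (0.0, 0.0, 0.0)  # assumption: nothing to report for no microblogs
    positive = sum(1 for v in values if v > 0)
    negative = sum(1 for v in values if v < 0)
    neutral = total - positive - negative
    return tuple(100.0 * n / total for n in (positive, negative, neutral))
```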
Further, the sentiment orientation value of a microblog is determined as follows: split the microblog into sentences to obtain its clauses; for each clause, perform word segmentation to obtain the words in the clause, determine the sentiment orientation value of each word according to the pre-saved sentiment orientation values of words, and determine the sentiment orientation value of the clause according to the sentiment orientation values of its words and the type of the clause; the sum of the sentiment orientation values of all clauses in the microblog is taken as the sentiment orientation value of the microblog.
The sentence splitting can be based on the punctuation marks in the microblog, for example splitting at periods, question marks, and exclamation marks.
The method of determining the sentiment orientation value of a microblog in this embodiment is illustrated below using four cases.
Case 1: suppose the microblog is "Brand A phones are good to use." The microblog then has only one clause, "Brand A phones are good to use." The microblog is segmented into words, and the sentiment words among them are extracted according to pre-saved sentiment words that carry a sentiment orientation. Suppose the pre-saved sentiment words include "good"; then, after the microblog is segmented, the word "good" is extracted from the resulting words. The server pre-saves the sentiment orientation value of each sentiment word: if the sentiment expressed by the word is positive, its saved value is greater than 0; if it is negative, its saved value is less than 0. Here the pre-saved value of the sentiment word "good" is greater than 0, and the clause contains only this one sentiment word, so the word's value is the clause's sentiment orientation value. Since the microblog has only this clause, the clause's value is the microblog's sentiment orientation value, which is greater than 0; that is, the microblog's sentiment orientation is positive.
Case 2: suppose the microblog is "Brand A phones are not good to use." As in case 1, the microblog is segmented and the sentiment words are extracted from the resulting words; the sentiment word extracted is "good".
If the method of case 1 were applied directly, the microblog's sentiment orientation would also be positive, but it is clearly negative. Therefore, in this embodiment a negation rule is applied when determining the sentiment orientation value of a microblog: for each sentiment word obtained, within the clause containing it, determine the number q of negation words among the 3 words preceding the sentiment word, multiply the pre-saved sentiment orientation value of the sentiment word by (-1)^q, and take the product as the new sentiment orientation value of this sentiment word. The server can pre-save negation words such as "not" and "no" and use them to count the negation words among the 3 words preceding a sentiment word in a clause.
Since the sentiment word in this microblog is "good" and there is one negation word, "not", among the 3 words preceding it, the sentiment orientation value of "good" is multiplied by (-1), and the product is taken as its new sentiment orientation value. The microblog's sentiment orientation value is therefore less than 0; that is, its sentiment orientation is negative.
Case 3: suppose the microblog is "Are Brand A phones good to use?" As in case 1, the microblog is segmented and the sentiment words are extracted from the resulting words; the sentiment word extracted is "good".
If the method of case 1 were applied directly, the microblog's sentiment orientation would again be positive. But this clause is an interrogative sentence, and the sentiment orientation of such a microblog generally leans negative. Therefore, this embodiment takes into account the influence of sentence type on the sentiment orientation of a clause when determining the sentiment orientation value of a microblog. When a clause is determined to be an interrogative sentence, the sentiment orientation values of the sentiment words in the clause are summed, the sum is multiplied by -1, and the product is taken as the clause's sentiment orientation value. After splitting the microblog into clauses, the server can determine the sentence type of a clause from its punctuation mark; for example, when the punctuation mark of a clause is a question mark, the clause is determined to be an interrogative sentence.
Since the clause in this microblog is an interrogative sentence, the sentiment orientation value of the sentiment word "good" is multiplied by -1, and the product is taken as the clause's sentiment orientation value. The microblog's sentiment orientation value is therefore less than 0; that is, its sentiment orientation is negative.
Case 4: suppose the microblog is "Brand A phones are good to use, but not good-looking." As in case 1, the microblog is segmented and the sentiment words are extracted from the resulting words; two sentiment words can be extracted in total, both of which are "good".
If the method of case 2 were applied, the first half of the clause would have the sentiment orientation value of the sentiment word "good" and the second half would have the product of that value and -1, so the microblog's sentiment orientation value would be 0, that is, neutral. But this clause is an adversative sentence, and the sentiment orientation of such a microblog generally leans negative. Therefore, in this embodiment the server can pre-save words with an adversative meaning, such as "but" and "however"; for each clause obtained, if the clause contains a word with an adversative meaning, the clause is determined to be an adversative sentence. When determining the sentiment orientation value of such a clause, either the sentiment orientation values of only the sentiment words located after the adversative word are summed and the sum is taken as the clause's value, or the sentiment orientation values of the sentiment words located before the adversative word are summed and the product of this sum and -1 is taken as the clause's value.
For the microblog "Brand A phones are good to use, but not good-looking.", the sentiment orientation value of the sentiment word located after the adversative word "but" can be determined: since there is one negation word, "not", among the 3 words preceding the sentiment word "good" that follows "but", the value of that sentiment word is the product of the pre-saved value of "good" and -1, and this product is the microblog's sentiment orientation value. Alternatively, the sentiment orientation value of the sentiment word "good" located before the adversative word "but" is determined, and the product of this value and -1 is taken as the microblog's sentiment orientation value. As can be seen, whichever method is used for this microblog, its sentiment orientation value is less than 0; that is, its sentiment orientation is negative.
In practice, any one or a combination of the processing methods of cases 1 to 4 can be used, depending on the clauses and words contained in the specific microblog, to determine the microblog's sentiment orientation value.
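The four cases can be combined in a sketch such as the one below. It works on clauses that have already been split and segmented, so no word segmenter is involved. The English lexicon, the negation and adversative word lists, the use of the first option for adversative clauses (counting only sentiment words after the adversative word), and the order in which the rules are combined are all assumptions made for illustration.

```python
SENTIMENT = {"good": 1.0, "bad": -1.0}  # hypothetical pre-saved sentiment words
NEGATIONS = {"not", "no"}               # hypothetical pre-saved negation words
ADVERSATIVES = {"but", "however"}       # hypothetical pre-saved adversative words

def word_value(tokens, index):
    """Pre-saved value times (-1)^q, q = negations among the 3 preceding words."""
    window = tokens[max(0, index - 3):index]
    q = sum(1 for t in window if t in NEGATIONS)
    return SENTIMENT[tokens[index]] * (-1) ** q

def clause_value(tokens, end_mark):
    """Sentiment orientation value of one segmented clause."""
    start = 0
    for i, t in enumerate(tokens):
        if t in ADVERSATIVES:
            start = i + 1  # only sentiment words after the adversative word count
    value = sum(word_value(tokens, i)
                for i in range(start, len(tokens)) if tokens[i] in SENTIMENT)
    # an interrogative clause has its summed value multiplied by -1
    return -value if end_mark == "?" else value

def microblog_value(clauses):
    """Sum of clause values; clauses is a list of (tokens, end_mark) pairs."""
    return sum(clause_value(tokens, mark) for tokens, mark in clauses)
```

For instance, the clause `(["phone", "good", "use", "but", "not", "good", "look"], ".")` mirrors case 4: only the second "good" is counted, and the preceding "not" flips its sign.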
Once the sentiment orientation values of all microblogs related to a cluster keyword set have been determined, the percentages of those microblogs whose sentiment orientation value is greater than 0 (positive sentiment), less than 0 (negative sentiment), and equal to 0 (neutral sentiment) can be computed, and these percentages are also sent to the related users.
Similarly, for a cluster keyword set, if it is determined that the hot public opinion library contains at least one hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity, that is, the hot public opinion corresponding to this cluster keyword set is not a new one, the server can also determine the percentages of all microblogs related to this cluster keyword set whose sentiment orientation value is greater than 0, less than 0, and equal to 0, and send these percentages to the related users so that they learn the sentiment trend of this hot public opinion. For example, in the previous time interval, microblogs with negative sentiment accounted for the larger percentage of the microblogs related to the hot public opinion keyword set reflecting this hot public opinion, whereas in the current time interval microblogs with positive sentiment account for the larger percentage.
In addition, in this embodiment of the present invention, when it is determined for a cluster keyword set that the hot public opinion library contains no hot public opinion keyword set whose similarity to it is greater than the preset similarity, the server determines the popularity, within the current time interval, of the hot public opinion corresponding to this cluster keyword set, and also sends the determined popularity to the related users. Specifically, in that case, for each microblog related to this cluster keyword set, the popularity of the microblog within the preset time interval is determined; the sum of the popularities of all related microblogs within the preset time interval is then determined and sent to the related users.
The server determines the popularity, within the preset time interval, of a microblog related to this cluster keyword set as follows: the sum of the number of times the microblog was forwarded and the number of times it was commented on within the preset time interval is taken as the microblog's popularity within that interval.
Preferably, when pushing the microblogs related to a cluster keyword set to the related users, the server can select, among the microblogs related to that cluster keyword set, the one with the highest popularity and push it to the related users.
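The popularity computations can be sketched as follows. Each microblog is modelled as a dictionary with hypothetical `forwards` and `comments` counts for the preset time interval; the field and function names are assumptions.

```python
def microblog_popularity(microblog):
    """Forward count plus comment count within the preset time interval."""
    return microblog["forwards"] + microblog["comments"]

def opinion_popularity(related):
    """Sum of the popularities of all microblogs related to a cluster keyword set."""
    return sum(microblog_popularity(mb) for mb in related)

def hottest_microblog(related):
    """The related microblog with the highest popularity (related must be non-empty)."""
    return max(related, key=microblog_popularity)
```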
Similarly, for a cluster keyword set, if it is determined that the hot public opinion library contains at least one hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity, that is, the hot public opinion corresponding to this cluster keyword set is not a new one, the server can also determine the popularity, within the current time interval, of the hot public opinion corresponding to this cluster keyword set and send it to the related users, so that they learn how the popularity of this hot public opinion changes from one time interval to the next. For example, the popularity of this hot public opinion determined for the current time interval may be higher than that determined for the previous time interval.
As can be seen, in this embodiment of the present invention, after the server obtains the cluster keyword sets by incremental clustering, for each cluster keyword set obtained: if the server determines that the hot public opinion library contains no hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity, that is, the corresponding hot public opinion is a new one, the server pushes the most popular microblog related to this cluster keyword set to the related users, and also sends them the percentages of related microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0, together with the sum of the popularities of the related microblogs. If the server determines that the library does contain a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the preset similarity, that is, the corresponding hot public opinion is not a new one, the related microblogs need not be pushed, but the server can still send the related users the percentages of related microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0, together with the sum of the popularities of the related microblogs.
In addition, in embodiments of the present invention, the server can also monitor microblogs that contain specified keywords and issue early warnings about them. Specifically, for each microblog published within the set time interval, the keywords determined in that microblog form a monitoring keyword set. If the server determines that the hot public opinion library contains no hot public opinion keyword set whose similarity to this monitoring keyword set is greater than the set similarity, that the monitoring keyword set contains a specified keyword, that the sentiment orientation value of the microblog is less than 0 (negative sentiment), and that the popularity of the microblog within the set time interval is greater than a set popularity, the server pushes the microblog to the relevant user as an early warning. The warning reminds the relevant user that the microblog may reflect a new hot public opinion topic, that its sentiment is negative, and that its popularity is already high.
More than the method for the propelling movement microblogging that provides for the embodiment of the invention, based on same thinking, the embodiment of the invention also provides a kind of device that pushes microblogging, as shown in Figure 2.
The apparatus structure synoptic diagram of the propelling movement microblogging that Fig. 2 provides for the embodiment of the invention specifically comprises:
A receiving and word segmentation module 201, configured to receive the microblogs published within the set time interval and determine the keywords in each received microblog;
A keyword set determination module 202, configured to determine keyword sets from the determined keywords using a set method, and to determine all keyword sets that can be determined using the set method, where the set method is: select any two of the keywords to form one keyword set;
An incremental clustering module 203, configured to perform incremental clustering on the determined keyword sets according to the keywords contained in the intersection and in the union of every two of the determined keyword sets, obtaining cluster keyword sets;
A judging and pushing module 204, configured to judge, for each cluster keyword set obtained, whether the hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity and, when it does not, to select from the received microblogs the microblogs related to this cluster keyword set and push them to the relevant user, and to save this cluster keyword set in the hot public opinion library as a hot public opinion keyword set.
The receiving and word segmentation module 201 is specifically configured to perform word segmentation on each received microblog and to take the segmented words of a specified type among the resulting words as the determined keywords.
The keyword set determination module 202 is specifically configured to: for each determined keyword, determine the weight of the keyword from the total number of times the keyword occurs in the received microblogs, the number of received microblogs, and the pre-saved inverse document frequency of the keyword, using the formula
[Formula image BDA00002243360600241: keyword weight Word_Weight as a function of n_Word, N and Idf]
where n_Word is the total number of times the keyword occurs in the received microblogs, N is the number of received microblogs, Idf is the pre-saved inverse document frequency of the keyword, and Word_Weight is the determined weight of the keyword; then, according to the determined keyword weights, select a first set quantity of keywords in descending order of weight, and determine keyword sets from the selected keywords using the set method.
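The selection and pairing step described above can be sketched in Python. This is only an illustrative sketch: the function name candidate_keyword_sets, the example weights, and the tie-breaking rule are assumptions, and the keyword weights are taken as given because the weight formula itself is not reproduced in this text.

```python
from itertools import combinations


def candidate_keyword_sets(weights, first_set_quantity):
    """Keep the highest-weighted keywords and form every two-keyword set from them."""
    # Sort by weight, largest first; ties are broken alphabetically (an assumption)
    # so that the result is deterministic.
    ranked = sorted(weights, key=lambda word: (-weights[word], word))
    top = ranked[:first_set_quantity]
    # The "set method": any two of the selected keywords form one keyword set.
    return [frozenset(pair) for pair in combinations(top, 2)]


# Hypothetical weights for four keywords.
weights = {"earthquake": 0.9, "rescue": 0.7, "weather": 0.2, "sichuan": 0.8}
pairs = candidate_keyword_sets(weights, 3)
```

With a first set quantity of k, this produces k(k-1)/2 keyword sets, which is why the description limits the number of keywords before pairing.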
The keyword set determination module 202 is further configured to: for each determined keyword set, determine the mutual information of the two keywords contained in the keyword set and, according to that mutual information and the weights of the two keywords, determine the weight of the keyword set using the formula
[Formula image BDA00002243360600251: keyword set weight D_Weight]
where i and j denote the two keywords contained in the keyword set, the formula uses the weight of keyword i and the weight of keyword j (symbol image BDA00002243360600253), D_Weight is the determined weight of the keyword set, and I(i, j) is the mutual information of keyword i and keyword j, given by
[Formula image BDA00002243360600254: mutual information I(i, j)]
where p(i) is the probability that a received microblog contains keyword i, p(j) is the probability that a received microblog contains keyword j, and p(i, j) is the probability that a received microblog contains both keyword i and keyword j; then, according to the determined weights of the keyword sets, select a second set quantity of keyword sets in descending order of weight.
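As an illustration of the probabilities p(i), p(j) and p(i, j) defined above, the following Python sketch estimates them from the received microblogs and combines them in the standard pointwise mutual information form log(p(i, j) / (p(i) p(j))). That form, the natural logarithm, and the representation of each microblog as a set of keywords are assumptions; the patent's own formula image is not reproduced in this text and may differ.

```python
import math


def pointwise_mutual_information(posts, i, j):
    """Estimate I(i, j) from a list of microblogs, each given as a set of keywords."""
    n = len(posts)
    p_i = sum(1 for post in posts if i in post) / n
    p_j = sum(1 for post in posts if j in post) / n
    p_ij = sum(1 for post in posts if i in post and j in post) / n
    if p_ij == 0:
        # The two keywords never co-occur; treat the value as minus infinity.
        return float("-inf")
    return math.log(p_ij / (p_i * p_j))


# Hypothetical data: "a" and "b" always appear together, in half of the microblogs.
posts = [{"a", "b"}, {"a", "b"}, {"c"}, {"d"}]
value = pointwise_mutual_information(posts, "a", "b")
```

A positive value indicates that the two keywords co-occur more often than they would if they were independent, which is the property a keyword set weight would be expected to reward.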
The incremental clustering module 203 is specifically configured to perform incremental clustering on the selected second set quantity of keyword sets according to the keywords contained in the intersection and in the union of every two of the selected keyword sets.
The incremental clustering module 203 is specifically configured to sort the selected second set quantity of keyword sets in descending order of weight and then, following that order, perform the following for each keyword set in turn: take the current keyword set as the keyword set to be clustered, and take each keyword set ranked before it as a preceding keyword set; for each preceding keyword set, determine a first quantity, which is the number of keywords in the intersection of the keyword set to be clustered and this preceding keyword set, and a second quantity, which is the number of keywords in the union of the keyword set to be clustered and this preceding keyword set; when the ratio of the first quantity to the second quantity is greater than a set ratio, add the keywords in the keyword set to be clustered that satisfy a first specified condition to this preceding keyword set. The keywords that satisfy the first specified condition are those that are contained in the keyword set to be clustered and not contained in this preceding keyword set.
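The clustering pass described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name incremental_cluster and the example data are hypothetical; a keyword set that has been merged into a preceding set is assumed not to be kept as a separate cluster; and a keyword set that merges into no preceding set is assumed to remain as a cluster of its own. The section does not state either point explicitly.

```python
def incremental_cluster(weighted_sets, set_ratio):
    """Merge keyword sets, taken in descending weight order, into earlier sets.

    weighted_sets: list of (keyword_set, weight) tuples.
    set_ratio: merge threshold on |intersection| / |union|.
    """
    ordered = sorted(weighted_sets, key=lambda item: -item[1])
    clusters = []  # the preceding keyword sets, which grow as merging proceeds
    for keywords, _weight in ordered:
        candidate = set(keywords)  # the keyword set to be clustered
        merged = False
        for cluster in clusters:
            first_quantity = len(candidate & cluster)   # size of the intersection
            second_quantity = len(candidate | cluster)  # size of the union
            if first_quantity / second_quantity > set_ratio:
                # Adds exactly the keywords in candidate that cluster lacks.
                cluster |= candidate
                merged = True
        if not merged:
            clusters.append(candidate)
    return clusters


weighted = [({"a", "b"}, 0.9), ({"a", "c"}, 0.8), ({"x", "y"}, 0.7), ({"b", "c"}, 0.6)]
result = incremental_cluster(weighted, 0.3)
```

Because no cluster count is fixed in advance, the number of resulting clusters follows from the data and the set ratio alone, which matches the advantage the description claims for this approach.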
The incremental clustering module 203 is further configured to determine, before adding the keywords in the keyword set to be clustered that satisfy the first specified condition to this preceding keyword set, that among the received microblogs the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in this preceding keyword set is greater than a third set quantity.
The incremental clustering module 203 is further configured to extract, from the cluster keyword sets obtained, the cluster keyword sets that satisfy a second specified condition. The second specified condition comprises: the number of keywords contained is not less than a fourth set quantity, and the number of microblogs related to this cluster keyword set is greater than a fifth set quantity. A microblog related to this cluster keyword set is a microblog that contains at least m keywords of this cluster keyword set, where m is a sixth set quantity.
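The second specified condition can be illustrated with a short Python sketch. The function names, the parameter names, and the representation of each microblog as a set of its keywords are assumptions made for illustration only.

```python
def related_posts(cluster, posts, m):
    """Microblogs that contain at least m keywords of the cluster keyword set."""
    return [post for post in posts if len(cluster & post) >= m]


def satisfies_second_condition(cluster, posts, fourth_quantity, fifth_quantity, m):
    """At least fourth_quantity keywords, and more than fifth_quantity related microblogs."""
    enough_keywords = len(cluster) >= fourth_quantity
    enough_posts = len(related_posts(cluster, posts, m)) > fifth_quantity
    return enough_keywords and enough_posts


# Hypothetical data: each microblog is represented by the set of its keywords.
posts = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"x"}]
cluster = {"a", "b", "c"}
```

Here two of the four microblogs contain at least two keywords of the cluster, so the cluster passes when the fifth set quantity is 1 and fails when it is 2.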
The judging and pushing module 204 is specifically configured to: for each extracted cluster keyword set that satisfies the second specified condition, determine the similarity between this cluster keyword set and each hot public opinion keyword set in the hot public opinion library using the formula
J(A, B) = |A ∩ B| / |A ∪ B|
and judge whether the hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity, where A is the extracted cluster keyword set, B is a hot public opinion keyword set in the hot public opinion library, and J(A, B) is the determined similarity between the cluster keyword set and the hot public opinion keyword set.
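The library check can be sketched in Python as below. The sketch assumes that J(A, B) is the Jaccard coefficient, the size of the intersection divided by the size of the union, and that the hot public opinion library can be modelled as a plain list of keyword sets; the function names are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def check_and_store(cluster, library, set_similarity):
    """Return True and save the cluster if no library entry is similar enough to it."""
    is_new = not any(jaccard(cluster, known) > set_similarity for known in library)
    if is_new:
        # A new hot topic: keep its keyword set for comparison in later intervals.
        library.append(set(cluster))
    return is_new


library = [{"a", "b", "c"}]
first = check_and_store({"a", "b", "d"}, library, 0.6)        # similarity 2/4 = 0.5
second = check_and_store({"a", "b", "c", "e"}, library, 0.6)  # similarity 3/4 = 0.75
```

Saving each new cluster keyword set in the library is what prevents the same topic from being pushed again in the next time interval.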
The judging and pushing module 204 is further configured to: when the hot public opinion library contains no hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity, determine the sentiment orientation value of each microblog related to this cluster keyword set; according to the determined sentiment orientation values, determine the percentages of related microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0; and send the determined percentages to the relevant user. Determining the sentiment orientation value of a microblog specifically comprises: splitting the microblog into clauses; for each clause, performing word segmentation to obtain the words in the clause, determining the sentiment orientation value of each word from the pre-saved sentiment orientation values of words, and determining the sentiment orientation value of the clause from the sentiment orientation values of its words and the type of the clause; and taking the sum of the sentiment orientation values of the clauses in the microblog as the sentiment orientation value of the microblog.
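The sentiment computation can be sketched in Python as follows. Several simplifications are assumptions made only for illustration: clauses are split on common sentence punctuation, whitespace splitting stands in for a real word segmenter, the lexicon is a hypothetical dictionary of pre-saved word values, and the adjustment for clause type is omitted because this section does not say how the clause type changes the value.

```python
import re


def post_sentiment(text, lexicon):
    """Sum of clause values, where each clause value is the sum of its word values."""
    clauses = [c for c in re.split(r"[.!?;。！？；]+", text) if c.strip()]
    total = 0
    for clause in clauses:
        words = clause.split()  # stand-in for word segmentation
        total += sum(lexicon.get(word.lower(), 0) for word in words)
    return total


def sentiment_shares(values):
    """Percentages of values that are greater than 0, less than 0, and equal to 0."""
    n = len(values)
    if n == 0:
        return (0.0, 0.0, 0.0)
    positive = sum(1 for v in values if v > 0)
    negative = sum(1 for v in values if v < 0)
    neutral = n - positive - negative
    return (100 * positive / n, 100 * negative / n, 100 * neutral / n)


lexicon = {"good": 1, "bad": -1, "terrible": -2}  # hypothetical pre-saved values
score = post_sentiment("Service was good. Food was terrible!", lexicon)
shares = sentiment_shares([1, -1, 0, 2])
```

The three percentages are what would be sent to the relevant user alongside the pushed microblog.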
The judging and pushing module 204 is further configured to: when the hot public opinion library contains no hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity, determine, for each microblog related to this cluster keyword set, the popularity of the microblog within the set time interval; determine the sum of the popularity, within the set time interval, of all microblogs related to this cluster keyword set; and send the determined sum to the relevant user. Determining the popularity of a microblog within the set time interval specifically comprises: taking the sum of the number of times the microblog is forwarded and the number of times it is commented on within the set time interval as its popularity within the set time interval.
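The popularity measure described above reduces to two sums, sketched here in Python. The dictionary keys "forwards" and "comments" and the function names are hypothetical.

```python
def post_popularity(forwards, comments):
    """Popularity of one microblog in the interval: forwards plus comments."""
    return forwards + comments


def topic_popularity(posts):
    """Sum of the popularity of all microblogs related to one cluster keyword set."""
    return sum(post_popularity(p["forwards"], p["comments"]) for p in posts)


# Hypothetical counts for two related microblogs within one set time interval.
related = [{"forwards": 3, "comments": 2}, {"forwards": 10, "comments": 0}]
total = topic_popularity(related)
```

Computing the same sum for successive time intervals gives the popularity trend mentioned earlier in the description.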
The judging and pushing module 204 is further configured to: for each microblog published within the set time interval, form a monitoring keyword set from the keywords determined in the microblog; and push the microblog to the relevant user if the hot public opinion library contains no hot public opinion keyword set whose similarity to this monitoring keyword set is greater than the set similarity, the monitoring keyword set contains a specified keyword, the sentiment orientation value of the microblog is less than 0, and the popularity of the microblog within the set time interval is greater than a set popularity.
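The four early-warning conditions can be combined as in the following Python sketch. The function and parameter names are hypothetical, the library is modelled as a list of keyword sets, and the similarity is assumed to be the Jaccard coefficient.

```python
def should_warn(keywords, sentiment, popularity, library,
                specified_keywords, set_similarity, set_popularity):
    """True when a single microblog meets all four early-warning conditions."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # 1. No known hot topic is similar enough to the monitoring keyword set.
    is_new = all(jaccard(keywords, known) <= set_similarity for known in library)
    # 2. The monitoring keyword set contains at least one specified keyword.
    has_specified = bool(keywords & specified_keywords)
    # 3. Negative sentiment, and 4. popularity above the set popularity.
    return is_new and has_specified and sentiment < 0 and popularity > set_popularity


library = [{"concert", "tickets"}]
warn = should_warn({"factory", "leak"}, -2, 500, library, {"leak"}, 0.5, 100)
```

Unlike the clustering path, this check works on one microblog at a time, so a warning can be raised before enough related microblogs exist to form a cluster.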
Specifically, the above device for pushing microblogs may be located in a server.
The embodiments of the present invention provide a method and a device for pushing microblogs. The method determines the keywords in each microblog received within a set time interval, determines keyword sets by forming a keyword set from any two of the keywords, and performs incremental clustering on the determined keyword sets according to the keywords contained in the intersection and in the union of every two keyword sets. For each cluster keyword set obtained, when the hot public opinion library contains no hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity, the microblogs related to this cluster keyword set are pushed to the relevant user. With this method, the cluster keyword sets obtained are the hot public opinion keyword sets corresponding to the hot public opinion topics reflected by the microblogs published within the set time interval. Because the number of clusters does not need to be preset, no hot public opinion keyword set within the set time interval is omitted, and microblogs that reflect hot public opinion topics can be pushed to the corresponding users in a timely manner.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (22)

1. A method for pushing microblogs, characterized by comprising:
Receiving the microblogs published within a set time interval, and determining the keywords in each received microblog;
Determining keyword sets from the determined keywords using a set method, and determining all keyword sets that can be determined using the set method, wherein the set method is: select any two of the keywords to form one keyword set;
Performing incremental clustering on the determined keyword sets according to the keywords contained in the intersection and in the union of every two of the determined keyword sets, to obtain cluster keyword sets;
For each cluster keyword set obtained, judging whether a hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than a set similarity and, when it does not, selecting from the received microblogs the microblogs related to this cluster keyword set and pushing them to a relevant user, and saving this cluster keyword set in the hot public opinion library as a hot public opinion keyword set.
2. the method for claim 1 is characterized in that, the keyword in each microblogging of determining to receive specifically comprises:
Each microblogging that receives is carried out word segmentation processing, in each participle that obtains, determine the participle of specified type, as the keyword of determining.
3. the method for claim 1 is characterized in that, according to each keyword of determining, adopts establishing method to determine keyword set, specifically comprises:
For each keyword of determining, the number of times that in each microblogging that receives, occurs according to this keyword and, the quantity of the microblogging that receives, and the inverse document frequency of this keyword of pre-save adopts formula Determine the weight of this keyword, wherein, n WordThe number of times that in each microblogging that receives, occurs for this keyword and, N is the quantity of the microblogging that receives, Idf is the inverse document frequency of this keyword of pre-save, Word WeightWeight for this definite keyword;
According to the weight of each keyword of determining, select successively the first keyword of setting quantity according to weight order from big to small, the first keyword of setting quantity according to selecting adopts establishing method to determine keyword set.
4. The method of claim 3, characterized in that, before performing incremental clustering on the determined keyword sets according to the keywords contained in the intersection and in the union of every two of the determined keyword sets, the method further comprises:
For each determined keyword set, determining the mutual information of the two keywords contained in the keyword set and, according to that mutual information and the weights of the two keywords, determining the weight of the keyword set using the formula
[Formula image FDA00002243360500021: keyword set weight D_Weight]
wherein i and j denote the two keywords contained in the keyword set, the formula uses the weight of keyword i (symbol image FDA00002243360500022) and the weight of keyword j (symbol image FDA00002243360500023), D_Weight is the determined weight of the keyword set, and I(i, j) is the mutual information of keyword i and keyword j, given by
[Formula image FDA00002243360500024: mutual information I(i, j)]
wherein p(i) is the probability that a received microblog contains keyword i, p(j) is the probability that a received microblog contains keyword j, and p(i, j) is the probability that a received microblog contains both keyword i and keyword j;
According to the determined weights of the keyword sets, selecting a second set quantity of keyword sets in descending order of weight;
And performing incremental clustering on the determined keyword sets according to the keywords contained in the intersection and in the union of every two of the determined keyword sets specifically comprises:
Performing incremental clustering on the selected second set quantity of keyword sets according to the keywords contained in the intersection and in the union of every two of the selected keyword sets.
5. The method of claim 4, characterized in that performing incremental clustering on the selected second set quantity of keyword sets specifically comprises:
Sorting the selected second set quantity of keyword sets in descending order of weight, according to the weights of the selected keyword sets;
Following the sorted order, performing the following steps A to B for each keyword set in turn:
Step A: taking the current keyword set as the keyword set to be clustered, and taking each keyword set ranked before the keyword set to be clustered as a preceding keyword set;
Step B: for each preceding keyword set, determining a first quantity, which is the number of keywords in the intersection of the keyword set to be clustered and this preceding keyword set, and a second quantity, which is the number of keywords in the union of the keyword set to be clustered and this preceding keyword set; when the ratio of the first quantity to the second quantity is greater than a set ratio, adding the keywords in the keyword set to be clustered that satisfy a first specified condition to this preceding keyword set, wherein the keywords that satisfy the first specified condition are those that are contained in the keyword set to be clustered and not contained in this preceding keyword set.
6. The method of claim 5, characterized in that, before adding the keywords in the keyword set to be clustered that satisfy the first specified condition to this preceding keyword set, the method further comprises:
Determining that, among the received microblogs, the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in this preceding keyword set is greater than a third set quantity.
7. The method of claim 5, characterized in that, before judging, for each cluster keyword set obtained, whether the hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity, the method further comprises:
Extracting, from the cluster keyword sets obtained, the cluster keyword sets that satisfy a second specified condition, the second specified condition comprising: the number of keywords contained is not less than a fourth set quantity, and the number of microblogs related to this cluster keyword set is greater than a fifth set quantity;
Wherein a microblog related to this cluster keyword set is a microblog that contains at least m keywords of this cluster keyword set, m being a sixth set quantity.
8. The method of claim 7, characterized in that judging, for each cluster keyword set obtained, whether the hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity specifically comprises:
For each extracted cluster keyword set that satisfies the second specified condition, determining the similarity between this cluster keyword set and each hot public opinion keyword set in the hot public opinion library using the formula
J(A, B) = |A ∩ B| / |A ∪ B|
and judging whether the hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity, wherein A is the extracted cluster keyword set, B is a hot public opinion keyword set in the hot public opinion library, and J(A, B) is the determined similarity between the cluster keyword set and the hot public opinion keyword set.
9. The method of claim 8, characterized in that, when the hot public opinion library contains no hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity, the method further comprises:
For each microblog related to this cluster keyword set, determining the sentiment orientation value of the microblog;
According to the determined sentiment orientation values of the microblogs related to this cluster keyword set, determining the percentages of those microblogs whose sentiment orientation value is greater than 0, less than 0, and equal to 0, and sending the determined percentages to the relevant user;
Wherein determining the sentiment orientation value of a microblog specifically comprises:
Splitting the microblog into clauses;
For each clause, performing word segmentation on the clause to obtain the words in the clause, determining the sentiment orientation value of each word in the clause from the pre-saved sentiment orientation values of words, and determining the sentiment orientation value of the clause from the sentiment orientation values of its words and the type of the clause;
Taking the sum of the sentiment orientation values of the clauses in the microblog as the sentiment orientation value of the microblog.
10. The method of claim 8, characterized in that, when the hot public opinion library contains no hot public opinion keyword set whose similarity to this cluster keyword set is greater than the set similarity, the method further comprises:
For each microblog related to this cluster keyword set, determining the popularity of the microblog within the set time interval;
Determining the sum of the popularity, within the set time interval, of all microblogs related to this cluster keyword set, and sending the determined sum to the relevant user;
Wherein determining the popularity of a microblog within the set time interval specifically comprises:
Taking the sum of the number of times the microblog is forwarded and the number of times it is commented on within the set time interval as its popularity within the set time interval.
11. The method of any one of claims 8 to 10, characterized in that the method further comprises:
For each microblog published within the set time interval, forming a monitoring keyword set from the keywords determined in the microblog;
Pushing the microblog to the relevant user if the hot public opinion library contains no hot public opinion keyword set whose similarity to this monitoring keyword set is greater than the set similarity, the monitoring keyword set contains a specified keyword, the sentiment orientation value of the microblog is less than 0, and the popularity of the microblog within the set time interval is greater than a set popularity.
12. A device for pushing microblogs, characterized by comprising:
A receiving and word segmentation module, configured to receive the microblogs published within a set time interval and determine the keywords in each received microblog;
A keyword set determination module, configured to determine keyword sets from the determined keywords using a set method, and to determine all keyword sets that can be determined using the set method, wherein the set method is: select any two of the keywords to form one keyword set;
An incremental clustering module, configured to perform incremental clustering on the determined keyword sets according to the keywords contained in the intersection and in the union of every two of the determined keyword sets, obtaining cluster keyword sets;
A judging and pushing module, configured to judge, for each cluster keyword set obtained, whether a hot public opinion library contains a hot public opinion keyword set whose similarity to this cluster keyword set is greater than a set similarity and, when it does not, to select from the received microblogs the microblogs related to this cluster keyword set and push them to a relevant user, and to save this cluster keyword set in the hot public opinion library as a hot public opinion keyword set.
13. The device of claim 12, characterized in that the receiving and word segmentation module is specifically configured to perform word segmentation on each received microblog and to take the segmented words of a specified type among the resulting words as the determined keywords.
14. The device of claim 12, characterized in that the keyword set determination module is specifically configured to: for each determined keyword, determine the weight of the keyword from the total number of times the keyword occurs in the received microblogs, the number of received microblogs, and the pre-saved inverse document frequency of the keyword, using the formula
[Formula image FDA00002243360500061: keyword weight Word_Weight as a function of n_Word, N and Idf]
wherein n_Word is the total number of times the keyword occurs in the received microblogs, N is the number of received microblogs, Idf is the pre-saved inverse document frequency of the keyword, and Word_Weight is the determined weight of the keyword; and, according to the determined keyword weights, select a first set quantity of keywords in descending order of weight, and determine keyword sets from the selected keywords using the set method.
15. The device of claim 14, characterized in that the keyword set determination module is further configured to: for each determined keyword set, determine the mutual information of the two keywords contained in the keyword set and, according to that mutual information and the weights of the two keywords, determine the weight of the keyword set using the formula
[Formula image FDA00002243360500062: keyword set weight D_Weight]
wherein i and j denote the two keywords contained in the keyword set, the formula uses the weight of keyword i (symbol image FDA00002243360500063) and the weight of keyword j (symbol image FDA00002243360500064), D_Weight is the determined weight of the keyword set, and I(i, j) is the mutual information of keyword i and keyword j, given by
[Formula image FDA00002243360500065: mutual information I(i, j)]
wherein p(i) is the probability that a received microblog contains keyword i, p(j) is the probability that a received microblog contains keyword j, and p(i, j) is the probability that a received microblog contains both keyword i and keyword j; and, according to the determined weights of the keyword sets, select a second set quantity of keyword sets in descending order of weight;
The incremental clustering module is specifically configured to perform incremental clustering on the selected second set quantity of keyword sets according to the keywords contained in the intersection and in the union of every two of the selected keyword sets.
16. The device of claim 15, characterized in that the incremental clustering module is specifically configured to sort the selected second set quantity of keyword sets in descending order of weight and then, following that order, perform the following for each keyword set in turn: take the current keyword set as the keyword set to be clustered, and take each keyword set ranked before it as a preceding keyword set; for each preceding keyword set, determine a first quantity, which is the number of keywords in the intersection of the keyword set to be clustered and this preceding keyword set, and a second quantity, which is the number of keywords in the union of the keyword set to be clustered and this preceding keyword set; when the ratio of the first quantity to the second quantity is greater than a set ratio, add the keywords in the keyword set to be clustered that satisfy a first specified condition to this preceding keyword set; wherein the keywords that satisfy the first specified condition are those that are contained in the keyword set to be clustered and not contained in this preceding keyword set.
17. The device of claim 16, characterized in that the incremental clustering module is further configured to determine, before adding the keywords in the keyword set to be clustered that satisfy the first specified condition to this preceding keyword set, that among the received microblogs the number of microblogs that simultaneously contain the keywords satisfying the first specified condition and every keyword in this preceding keyword set is greater than a third set quantity.
18. The device according to claim 16, characterized in that the incremental clustering module is further configured to extract, from the cluster keyword sets obtained, the cluster keyword sets that satisfy a second specified condition, the second specified condition comprising: the number of keywords contained is not less than a fourth set quantity, and the number of microblogs related to the cluster keyword set is greater than a fifth set quantity; wherein the microblogs related to the cluster keyword set specifically comprise microblogs that contain at least m keywords of the cluster keyword set, m being a sixth set quantity.
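The filter in claim 18 can be sketched as follows. This is an illustrative reading only; the function names and the representation of each microblog as a set of its keywords are assumptions, not part of the claim.

```python
def related_microblogs(cluster, microblog_keywords, m):
    """Return the microblogs that contain at least m keywords of the cluster.

    cluster:            set of keywords forming one cluster keyword set
    microblog_keywords: list of keyword sets, one per received microblog
    m:                  the sixth set quantity
    """
    return [kws for kws in microblog_keywords if len(cluster & set(kws)) >= m]


def satisfies_second_condition(cluster, microblog_keywords,
                               fourth_set_quantity, fifth_set_quantity, m):
    """Check the second specified condition for one cluster keyword set."""
    # The cluster must contain no fewer than the fourth set quantity of keywords.
    if len(cluster) < fourth_set_quantity:
        return False
    # The number of related microblogs must exceed the fifth set quantity.
    related = related_microblogs(cluster, microblog_keywords, m)
    return len(related) > fifth_set_quantity
```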
19. The device according to claim 18, characterized in that the judgment and pushing module is specifically configured to, for each extracted cluster keyword set that satisfies the second specified condition, use the formula
Figure FDA00002243360500071
to determine the similarity between that cluster keyword set and each hotspot public opinion keyword set in the hotspot public opinion library, and to judge whether the hotspot public opinion library contains a hotspot public opinion keyword set whose similarity to that cluster keyword set is greater than the set similarity; wherein A is the extracted cluster keyword set, B is a hotspot public opinion keyword set in the hotspot public opinion library, and J(A, B) is the determined similarity between the cluster keyword set and the hotspot public opinion keyword set.
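The formula of claim 19 is an image that is not reproduced in this text. One plausible reading, suggested by the notation J(A, B) and by the intersection-to-union ratio used in claim 16, is the Jaccard coefficient |A ∩ B| / |A ∪ B|. The sketch below rests on that assumption; the function names are hypothetical and the actual formula in the patent may differ.

```python
def jaccard(a, b):
    """Jaccard coefficient of two keyword sets (assumed form of J(A, B))."""
    union = a | b
    # Two empty sets have no keywords in common; treat their similarity as 0.
    return len(a & b) / len(union) if union else 0.0


def is_new_hotspot(cluster, library, set_similarity):
    """True when no known keyword set in the library is similar enough.

    cluster:        the extracted cluster keyword set (A)
    library:        iterable of known hotspot keyword sets (each one a B)
    set_similarity: the set similarity threshold
    """
    return not any(jaccard(cluster, known) > set_similarity for known in library)
```

Under this reading, a cluster keyword set is treated as a new hotspot only when its similarity to every stored keyword set stays at or below the threshold.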
20. The device according to claim 19, characterized in that the judgment and pushing module is further configured to, when the hotspot public opinion library contains no hotspot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity: for each microblog related to the cluster keyword set, determine the sentiment tendency value of that microblog; according to the determined sentiment tendency values of the microblogs related to the cluster keyword set, determine the respective percentages of microblogs whose sentiment tendency value is greater than 0, less than 0, and equal to 0; and send the determined percentages to the relevant users; wherein determining the sentiment tendency value of a microblog specifically comprises: splitting the microblog into clauses; for each clause, performing word segmentation to obtain the words in the clause; determining the sentiment tendency value of each word in the clause according to pre-stored sentiment tendency values of words; determining the sentiment tendency value of the clause according to the sentiment tendency values of its words and the type of the clause; and taking the sum of the sentiment tendency values of the clauses in the microblog as the sentiment tendency value of the microblog.
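The sentiment computation of claim 20 can be sketched as below. Several parts are assumptions made purely for illustration: the tiny English `LEXICON`, the regular-expression clause splitter and tokenizer (a real system for Chinese microblogs would need a proper word segmenter), and the rule in `clause_factor` that doubles the value of exclamatory clauses. The claim only states that the clause type influences the clause value, not how.

```python
import re

# Hypothetical pre-stored sentiment tendency values of words.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def split_clauses(text):
    """Split a microblog into clauses, keeping the closing punctuation mark."""
    return [c.strip() for c in re.findall(r"[^.!?]+[.!?]?", text) if c.strip()]


def clause_factor(clause):
    """Hypothetical clause-type rule: exclamatory clauses count double."""
    return 2.0 if clause.endswith("!") else 1.0


def clause_sentiment(clause):
    """Sum the word values of a clause, scaled by the clause-type factor."""
    words = re.findall(r"\w+", clause.lower())
    return clause_factor(clause) * sum(LEXICON.get(w, 0.0) for w in words)


def microblog_sentiment(text):
    """Sentiment tendency value of a microblog: the sum over its clauses."""
    return sum(clause_sentiment(c) for c in split_clauses(text))


def sentiment_percentages(texts):
    """Percentages of microblogs with value > 0, < 0 and == 0, in that order."""
    values = [microblog_sentiment(t) for t in texts]
    total = len(values)
    if total == 0:
        return (0.0, 0.0, 0.0)
    positive = sum(1 for v in values if v > 0)
    negative = sum(1 for v in values if v < 0)
    neutral = total - positive - negative
    return (100 * positive / total, 100 * negative / total, 100 * neutral / total)
```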
21. The device according to claim 19, characterized in that the judgment and pushing module is further configured to, when the hotspot public opinion library contains no hotspot public opinion keyword set whose similarity to the cluster keyword set is greater than the set similarity: for each microblog related to the cluster keyword set, determine the heat of that microblog within the set time interval; determine the sum of the heat values, within the set time interval, of the microblogs related to the cluster keyword set; and send the determined sum to the relevant users; wherein determining the heat of a microblog within the set time interval specifically comprises: determining the sum of the number of times the microblog is forwarded and the number of times it is commented on within the set time interval, and taking that sum as the heat of the microblog within the set time interval.
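The heat measure of claim 21 is simple arithmetic and can be sketched as follows. The function names and the representation of each microblog as a `(forwards, comments)` pair counted within the set time interval are assumptions for illustration.

```python
def microblog_heat(forwards, comments):
    """Heat of one microblog: forwards plus comments within the interval."""
    return forwards + comments


def cluster_heat(related_counts):
    """Total heat of the microblogs related to one cluster keyword set.

    related_counts: iterable of (forwards, comments) pairs, one per microblog
    """
    return sum(microblog_heat(f, c) for f, c in related_counts)
```

For example, two related microblogs with 3 forwards and 2 comments, and 0 forwards and 1 comment, give a total heat of 6.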
22. The device according to any one of claims 19 to 21, characterized in that the judgment and pushing module is further configured to, for each microblog published within the set time interval, form a monitoring keyword set from the keywords determined in that microblog, and, if the hotspot public opinion library contains no hotspot public opinion keyword set whose similarity to the monitoring keyword set is greater than the set similarity, the monitoring keyword set contains a specified keyword, the sentiment tendency value of the microblog is less than 0, and the heat of the microblog within the set time interval is greater than a set heat, push the microblog to the relevant users.
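The per-microblog push rule of claim 22 combines four conditions. The sketch below is illustrative only: it assumes the Jaccard coefficient as the similarity measure (the formula image of claim 19 is not available in this text), assumes that containing at least one of the specified keywords is sufficient, and uses hypothetical function and parameter names.

```python
def should_push(monitoring_keywords, sentiment, heat, library,
                specified_keywords, set_similarity, set_heat):
    """Decide whether a single microblog is pushed to the relevant users.

    monitoring_keywords: set of keywords determined in the microblog
    sentiment:           sentiment tendency value of the microblog
    heat:                heat of the microblog within the set time interval
    library:             iterable of known hotspot keyword sets
    specified_keywords:  set of keywords being monitored (assumed: any one suffices)
    """
    def jaccard(a, b):
        # Assumed similarity measure: |A ∩ B| / |A ∪ B|.
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Condition 1: no known hotspot keyword set is similar enough.
    is_novel = all(jaccard(monitoring_keywords, known) <= set_similarity
                   for known in library)
    # Condition 2: the microblog mentions a specified keyword.
    has_specified = bool(monitoring_keywords & specified_keywords)
    # Conditions 3 and 4: negative sentiment and heat above the set heat.
    return is_novel and has_specified and sentiment < 0 and heat > set_heat
```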
CN201210385036.7A 2012-10-11 2012-10-11 Method and device for pushing microblogs Active CN102968439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210385036.7A CN102968439B (en) 2012-10-11 2012-10-11 A kind of method and device pushing microblogging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210385036.7A CN102968439B (en) 2012-10-11 2012-10-11 A kind of method and device pushing microblogging

Publications (2)

Publication Number Publication Date
CN102968439A true CN102968439A (en) 2013-03-13
CN102968439B CN102968439B (en) 2015-11-25

Family

ID=47798579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210385036.7A Active CN102968439B (en) 2012-10-11 2012-10-11 Method and device for pushing microblogs

Country Status (1)

Country Link
CN (1) CN102968439B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109660580B (en) * 2017-10-11 2021-06-22 苏州跃盟信息科技有限公司 Information pushing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1191459A1 (en) * 2000-09-22 2002-03-27 Nightingale Technologies Ltd. Data clustering methods and applications
CN102346766A (en) * 2011-09-20 2012-02-08 北京邮电大学 Method and device for detecting network hot topics found based on maximal clique

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
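Yang Jie: "Research on Multi-Document Keyword Extraction Technology", China Master's Theses Full-text Database, Information Science and Technology, 15 October 2009 (2009-10-15) *
Qian Aibing et al.: "Chinese Web Page Keyword Extraction Based on Improved TF-IDF: Taking News Web Pages as an Example", Information Studies: Theory & Application, 31 December 2008 (2008-12-31) *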

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177024A (en) * 2011-12-23 2013-06-26 微梦创科网络科技(中国)有限公司 Method and device of topic information show
CN103198103A (en) * 2013-03-20 2013-07-10 微梦创科网络科技(中国)有限公司 Microblog pushing method and device based on dense word clustering
CN103198103B (en) * 2013-03-20 2016-06-29 微梦创科网络科技(中国)有限公司 Microblog pushing method and device based on dense word clustering
CN103246644A (en) * 2013-04-02 2013-08-14 亿赞普(北京)科技有限公司 Method and device for processing Internet public opinion information
CN103246644B (en) * 2013-04-02 2017-05-03 亿赞普(北京)科技有限公司 Method and device for processing Internet public opinion information
CN103247169A (en) * 2013-04-17 2013-08-14 浙江大学 Social network-based vehicle information publishing method
WO2014169672A1 (en) * 2013-04-17 2014-10-23 Tencent Technology (Shenzhen) Company Limited Method, apparatus and system for pushing micro-blogs
CN105378730A (en) * 2013-05-13 2016-03-02 卡塔尔基金会 Social media content analysis and output
WO2015027909A1 (en) * 2013-08-29 2015-03-05 Tencent Technology (Shenzhen) Company Limited Method and apparatus for obtaining hot-topic information
CN104615593B (en) * 2013-11-01 2017-09-29 北大方正集团有限公司 Method and device for automatic detection of microblog hot topics
CN104615593A (en) * 2013-11-01 2015-05-13 北大方正集团有限公司 Method and device for automatic detection of microblog hot topics
CN104618216A (en) * 2013-11-05 2015-05-13 腾讯科技(北京)有限公司 Message management method, apparatus and system
CN104618216B (en) * 2013-11-05 2019-05-17 腾讯科技(北京)有限公司 Information management method, equipment and system
CN103617298A (en) * 2013-12-19 2014-03-05 金蝶软件(中国)有限公司 Data connecting method and data connector
CN103747000A (en) * 2014-01-13 2014-04-23 深圳市深信服电子科技有限公司 Authentication method and authentication device for accessing wireless network
CN103747000B (en) * 2014-01-13 2017-08-25 深信服科技股份有限公司 Authentication method and device for accessing a wireless network
CN105335422A (en) * 2014-08-06 2016-02-17 阿里巴巴集团控股有限公司 Public opinion information warning method and apparatus
CN105335422B (en) * 2014-08-06 2019-02-22 阿里巴巴集团控股有限公司 Public opinion information warning method and apparatus
CN104504024A (en) * 2014-12-11 2015-04-08 中国科学院计算技术研究所 Method and system for mining keywords based on microblog content
CN104504024B (en) * 2014-12-11 2018-09-07 中国科学院计算技术研究所 Method and system for mining keywords based on microblog content
CN104657665B (en) * 2015-03-12 2017-12-08 四川神琥科技有限公司 File processing method
CN104657665A (en) * 2015-03-12 2015-05-27 四川神琥科技有限公司 File processing method
CN104933475A (en) * 2015-05-27 2015-09-23 国家计算机网络与信息安全管理中心 Network forwarding behavior prediction method and apparatus
CN105930488A (en) * 2016-05-03 2016-09-07 乐视控股(北京)有限公司 Information search processing method and apparatus
CN105979287B (en) * 2016-05-31 2020-04-24 无锡天脉聚源传媒科技有限公司 Program keyword extraction and statistics method and device
CN105979287A (en) * 2016-05-31 2016-09-28 无锡天脉聚源传媒科技有限公司 Method and device used for extracting and counting program key words
CN106202032B (en) * 2016-06-24 2018-08-28 广州数说故事信息科技有限公司 Sentiment analysis method and system for microblog short texts
CN106202032A (en) * 2016-06-24 2016-12-07 广州数说故事信息科技有限公司 Sentiment analysis method and system for microblog short texts
CN106202293A (en) * 2016-06-30 2016-12-07 北京奇艺世纪科技有限公司 Method and device for updating an emergency event corpus
CN106202293B (en) * 2016-06-30 2019-05-10 北京奇艺世纪科技有限公司 Method and device for updating an emergency event corpus
CN108241699A (en) * 2016-12-26 2018-07-03 百度在线网络技术(北京)有限公司 Method and apparatus for pushing information
CN109241274A (en) * 2017-07-04 2019-01-18 腾讯科技(深圳)有限公司 text clustering method and device
CN109241274B (en) * 2017-07-04 2022-01-25 腾讯科技(深圳)有限公司 Text clustering method and device
CN109582863A (en) * 2018-11-19 2019-04-05 珠海格力电器股份有限公司 Recommendation method and server
CN109582863B (en) * 2018-11-19 2020-08-04 珠海格力电器股份有限公司 Recommendation method and server
CN111080341A (en) * 2019-11-26 2020-04-28 微梦创科网络科技(中国)有限公司 Method and device for creating dynamic card of specific character
CN111080341B (en) * 2019-11-26 2023-04-07 微梦创科网络科技(中国)有限公司 Method and device for creating dynamic card of specific character
CN111046264A (en) * 2019-11-29 2020-04-21 江西省天轴通讯有限公司 Public opinion cue processing method, system, readable storage medium and computer equipment
CN111046264B (en) * 2019-11-29 2023-07-21 江西省天轴通讯有限公司 Public opinion cue processing method, system, readable storage medium and computer device
CN110990708A (en) * 2019-12-11 2020-04-10 Oppo(重庆)智能科技有限公司 Hot event determination method and device, storage medium and electronic equipment
CN110990708B (en) * 2019-12-11 2023-05-02 Oppo(重庆)智能科技有限公司 Hot event determination method and device, storage medium and electronic equipment
CN112101008A (en) * 2020-09-27 2020-12-18 北京百度网讯科技有限公司 Text popularity determination method and device, electronic equipment and storage medium
CN112101008B (en) * 2020-09-27 2024-10-18 北京百度网讯科技有限公司 Text heat determining method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN102968439B (en) 2015-11-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant