CN103514167B - Data processing method and equipment - Google Patents
- Publication number
- CN103514167B (CN201210202800.2A)
- Authority
- CN
- China
- Prior art keywords
- microblogging
- microblog users
- active time
- key word
- interval
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method and device. The method may include: an active time interval determination step of determining microblog user groups with similar activity habits and determining the active time intervals of each microblog user group based on the microblogs posted by concern users in the determined groups; a keyword extraction step of extracting keywords from all microblogs in the determined active time intervals; and a topic determination step of determining, based on the extracted keywords, the topics corresponding to the determined active time intervals. According to the invention, the topics that a specific microblog user group is interested in during different active time intervals can be mined, so that information can be published and obtained in a targeted manner, which greatly improves the efficiency of information processing.
Description
Technical field
The present invention relates to a data processing method and device, and more particularly to a microblog-based data processing method and device capable of mining the topics that different user groups are interested in within specific time intervals.
Background art
In recent years, with the development of Internet technology, microblogs (micro-blogs) have increasingly become one of the important ways in which people communicate. How to mine the required information from vast and disordered network data, so that data can be processed more efficiently, poses a new challenge to Internet technology.
For example, ordinary office workers may, on workdays, be active on microblogs mainly between 8:30 and 9:30 in the morning and between 1:00 and 2:00 in the afternoon (that is, the periods just before starting work), and between 8:30 and 10:30 in the evening (that is, leisure time after dinner), whereas at weekends their active time intervals may differ greatly from those on workdays. A technique is therefore needed that can determine the topics that different user groups are interested in during different active time intervals, so that information can be published and obtained in a targeted manner and the efficiency of data processing is greatly improved.
Summary of the invention
A brief summary of the invention is given below in order to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or essential parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts of the invention in simplified form as a prelude to the more detailed description given later.
In view of the above, an object of the invention is to provide a data processing method and device that can determine, for the different active time intervals of a specific microblog user group, the topics that the user group is interested in during each active time interval, so that users can publish information in a targeted manner and obtain the information they need efficiently.
To achieve the above object, according to one aspect of an embodiment of the invention, a data processing method is provided, comprising: an active time interval determination step of determining microblog user groups with similar activity habits and determining the active time intervals of each microblog user group based on the microblogs posted by concern users in the determined groups; a keyword extraction step of extracting keywords from all microblogs in the determined active time intervals; and a topic determination step of determining, based on the extracted keywords, the topics corresponding to the determined active time intervals.
According to a preferred embodiment of the invention, in the active time interval determination step, determining the microblog user groups with similar activity habits may further include: a user vector construction sub-step of constructing user vectors of a predetermined dimension according to the times at which, and the numbers in which, microblog users have posted microblogs in the past; an edge determination sub-step of determining the edges between user nodes based on the similarity between user vectors; a microblog user group construction sub-step of constructing microblog user groups with similar activity habits based on the determined edges; and a concern user determination sub-step of determining the authority of each microblog user based on one or more of the user's follower count, the number of microblogs the user has posted, the number of replies to the user's microblogs, and the number of times the user's microblogs have been forwarded, and selecting a predetermined number of microblog users from each microblog user group as concern users based on that authority.
According to another preferred embodiment of the invention, in the active time interval determination step, determining the active time intervals of each microblog user group based on the microblogs posted by concern users in the determined groups may further include: a microblog count sub-step of counting the number of microblogs posted by the concern users in each period of a predetermined cycle, thereby obtaining a time-dependent microblog count sequence; a recursive sequence segmentation sub-step of recursively segmenting the microblog count sequence, thereby obtaining one or more split points; and an active time interval selection sub-step of selecting, from the time intervals delimited by the obtained split points, the N intervals with the largest variance as the active time intervals, where N is greater than or equal to 1. In the recursive sequence segmentation sub-step, the following is calculated for each point i in the current sequence:
AnthorV(i)=|L1(i)|*Var(L1(i))/|L|+|L2(i)|*Var(L2(i))/|L|
DiffV(i)=Var(L(i))-AnthorV(i)
where |L1(i)| and |L2(i)| denote the lengths of the two subsequences obtained by splitting the current sequence at point i, |L| denotes the length of the current sequence, and Var() denotes the variance of the current sequence or of a subsequence;
the point at which DiffV(i) is largest in the current sequence is found; and
if DiffV(i) at that point is less than a predetermined threshold, the recursive segmentation stops; otherwise the current sequence is split into two subsequences at that point, and the two subsequences are each segmented recursively in the same way.
According to a further preferred embodiment of the invention, the topic determination step may further include: a candidate keyword list determination sub-step of calculating, for a determined active time interval, the weight of each extracted keyword and adding the keywords whose weight exceeds a predetermined threshold to the candidate keyword list of that interval; a keyword relevance calculation sub-step of calculating the degree of association between any two keywords in the candidate keyword list; a graph construction sub-step of constructing a graph with each keyword in the candidate keyword list as a node and with an edge between each pair of keywords whose calculated degree of association exceeds a predetermined threshold; and a topic determination sub-step of determining, based on the constructed graph and using a clustering algorithm, the topics corresponding to the determined active time interval.
According to a further embodiment of the invention, in the candidate keyword list determination sub-step, the weight of each keyword may be calculated for the active time interval according to the following formula:
W(k)=count(k)*log(Q/counttimes(k))*log(authorfollowers(k))
where count(k) denotes the number of occurrences of keyword k, Q denotes the number of microblogs in the active time interval, counttimes(k) denotes the number of microblogs containing keyword k, and authorfollowers(k) denotes the total follower count of the people who posted microblogs containing keyword k.
According to another aspect of an embodiment of the invention, a data processing device is also provided, comprising: an active time interval determination unit configured to determine microblog user groups with similar activity habits and to determine the active time intervals of each microblog user group based on the microblogs posted by concern users in the determined groups; a keyword extraction unit configured to extract keywords from all microblogs in the determined active time intervals; and a topic determination unit configured to determine, based on the extracted keywords, the topics corresponding to the determined active time intervals.
In addition, according to another aspect of an embodiment of the invention, a terminal device is provided that includes the above data processing device. The terminal device may be, for example, a mobile phone, a handheld computer, a tablet computer, or a personal computer.
In addition, according to another aspect of an embodiment of the invention, a storage medium is provided that contains machine-readable program code which, when executed on an information processing device, causes the information processing device to perform the data processing method according to the invention.
Furthermore, according to yet another aspect of an embodiment of the invention, a program product is provided that contains machine-executable instructions which, when executed on an information processing device, cause the information processing device to perform the data processing method according to the invention.
Therefore, according to the embodiments of the invention, topics can be published and information obtained in a targeted manner, so that microblogs can be better used to obtain information and the efficiency of data processing is greatly improved.
Other aspects of the embodiments of the invention are given in the following description, in which the detailed description serves to fully disclose preferred embodiments of the invention without limiting them.
Brief description of the drawings
The invention may be better understood by reference to the detailed description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals are used throughout to denote the same or similar parts. The drawings, together with the detailed description below, are included in and form part of this specification, and serve to further illustrate preferred embodiments of the invention and to explain its principles and advantages. In the drawings:
Fig. 1 is a flowchart illustrating a data processing method according to an embodiment of the invention;
Fig. 2 is a flowchart illustrating the detailed process of determining microblog user groups with similar activity habits in the active time interval determination step shown in Fig. 1;
Fig. 3 is a flowchart illustrating the detailed process of determining active time intervals based on the microblogs posted by concern users in the active time interval determination step shown in Fig. 1;
Fig. 4 is a schematic diagram illustrating microblog count statistics;
Fig. 5 is a flowchart illustrating the detailed process of the topic determination step shown in Fig. 1;
Fig. 6 is a schematic diagram illustrating a topic clustering result;
Fig. 7 is a block diagram illustrating the functional configuration of a data processing device according to an embodiment of the invention;
Fig. 8 is a block diagram illustrating an example of the detailed functional configuration of the active time interval determination unit shown in Fig. 7;
Fig. 9 is a block diagram illustrating another example of the detailed functional configuration of the active time interval determination unit shown in Fig. 7;
Fig. 10 is a block diagram illustrating the detailed functional configuration of the topic determination unit shown in Fig. 7; and
Fig. 11 is a block diagram illustrating an exemplary structure of a personal computer serving as the information processing device employed in embodiments of the invention.
Detailed description
Exemplary embodiments of the invention are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions must be made in order to achieve the developer's specific goals, for example compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. It should also be appreciated that, although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted here that, in order to avoid obscuring the invention with unnecessary detail, the drawings show only the device structures and/or processing steps closely related to the solution according to the invention, and omit other details of little relevance to the invention.
A data processing method and device according to embodiments of the invention are described below with reference to Figs. 1 to 10.
First, a data processing method according to an embodiment of the invention is described with reference to Fig. 1. As shown in Fig. 1, the data processing method may include an active time interval determination step S101, a keyword extraction step S102, and a topic determination step S103.
Specifically, in the active time interval determination step S101, microblog user groups with similar activity habits may be determined, and the active time intervals of each microblog user group may be determined based on the microblogs posted by concern users in the determined groups.
As described above, different groups of people are active in different time periods. For example, office workers, students, and retirees follow different daily schedules and therefore have markedly different active time intervals. It is therefore first necessary to identify, among the large population of microblog users, the microblog user groups with similar activity habits, so that topics or information of interest to a particular group can be published according to the microblogs posted by the concern users in each microblog user group.
The process of determining microblog user groups with similar activity habits is described in detail below with reference to Fig. 2.
As shown in Fig. 2, in the active time interval determination step S101 shown in Fig. 1, determining the microblog user groups with similar activity habits may further include a user vector construction sub-step S201, an edge determination sub-step S202, a microblog user group construction sub-step S203, and a concern user determination sub-step S204.
Specifically, first, in the user vector construction sub-step S201, user vectors of a predetermined dimension may be constructed according to the times at which, and the numbers in which, microblog users have posted microblogs in the past. As an example, a 24-dimensional user vector may be constructed with one hour as the unit and one day as the statistical interval. Each user vector can then be expressed as V=(n1, n2, …, n24), where ni denotes the number of microblogs the user posted during the i-th hour. It should be understood that, although a 24-dimensional user vector is constructed here in units of hours, this is merely an example and not a limitation, and user vectors of more or fewer dimensions may be constructed as needed.
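As a rough illustration of the user vector described above, the following Python sketch counts a user's past posts per hour of the day. The function name build_user_vector, the use of datetime objects as input, and the sample timestamps are assumptions made for illustration and do not come from the patent.

```python
from datetime import datetime

def build_user_vector(post_times, dims=24):
    """Count a user's past posts per time-of-day bucket (hypothetical helper).

    With dims=24 each bucket is one hour, giving V=(n1, ..., n24).
    """
    vector = [0] * dims
    for t in post_times:
        minute_of_day = t.hour * 60 + t.minute
        # Map the minute of the day (0..1439) onto `dims` equal-width buckets.
        vector[minute_of_day * dims // 1440] += 1
    return vector

# Made-up posting history for one user.
posts = [
    datetime(2012, 6, 4, 8, 45),
    datetime(2012, 6, 4, 8, 50),
    datetime(2012, 6, 5, 21, 10),
]
vector = build_user_vector(posts)
print(vector[8], vector[21])
```

Passing a different dims value gives coarser or finer vectors, in line with the remark that more or fewer dimensions may be used.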
Next, in the edge determination sub-step S202, the edges between user nodes may be determined based on the similarity between the user vectors constructed in the user vector construction sub-step S201.
Preferably, in the edge determination sub-step S202, the similarity between user vectors may be determined using the cosine of the angle between the vectors, and an edge between two user nodes whose similarity exceeds a predetermined threshold is determined to be a formal edge.
Specifically, for example, for any two user vectors V1=(n1, n2, …, n24) and V2=(p1, p2, …, p24), the similarity between V1 and V2 can be calculated by the following formula (1):
CosVal=(n1*p1+n2*p2+…+n24*p24)/(sqrt(n1*n1+n2*n2+…+n24*n24)*sqrt(p1*p1+p2*p2+…+p24*p24)) (1)
where sqrt denotes the square root operation and CosVal denotes the similarity between the two users. Preferably, if CosVal > m, the edge between the two users is determined to be a formal edge, where m is a predetermined threshold.
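The cosine similarity and the threshold test for formal edges can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the dictionary layout, the short two-dimensional vectors, and the threshold value 0.8 are all hypothetical, and an all-zero vector is simply given similarity 0.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two user vectors (CosVal in the text)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    # Assumption: a user with no posts (zero vector) is similar to nobody.
    return dot / norm if norm else 0.0

def formal_edges(vectors, m):
    """Return the user pairs whose similarity exceeds the threshold m."""
    users = sorted(vectors)
    return [
        (u, w)
        for i, u in enumerate(users)
        for w in users[i + 1:]
        if cosine_similarity(vectors[u], vectors[w]) > m
    ]

# Short made-up vectors; real vectors would have 24 dimensions.
vectors = {"a": [1, 0], "b": [2, 0], "c": [0, 3]}
print(formal_edges(vectors, 0.8))
```

Comparing every pair of users is quadratic in the number of users, which is acceptable for a sketch but would probably need pruning for a large user base.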
Next, in the microblog user group construction sub-step S203, microblog user groups with similar activity habits may be constructed from the formal edges between user nodes determined in the edge determination sub-step S202, using a graph partitioning algorithm such as CNM. A group can be expressed, for example, as C=(V1,V2,…,Vr).
Subsequently, in the concern user determination sub-step S204, the authority of each microblog user may be determined based on one or more of the user's follower count, the number of microblogs the user has posted, the number of replies to the user's microblogs, and the number of times the user's microblogs have been forwarded, and a predetermined number of microblog users may then be selected from each constructed microblog user group as concern users based on the determined authority.
For example, when the follower count a of a microblog user and the number b of microblogs the user has posted are taken as the factors, the authority of the user can be calculated by the following formula (2):
Authority=Log(b)*Log(a) (2)
where Authority denotes the authority of the microblog user and Log denotes the logarithm. Preferably, the users ranked in, for example, the top 50% by authority in each microblog user group may be taken as concern users, that is, as the meaningful objects of the statistics. It should be understood that this method of calculating authority is merely an example and not a limitation.
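Formula (2) and the top-50% selection can be sketched in Python as below. The text does not state the base of the logarithm or how users with zero followers or zero posts are handled, so the natural logarithm and a score of 0 for such users are assumptions, as are the function names and sample figures.

```python
import math

def authority(followers, posts):
    """Authority = Log(b) * Log(a), with a = followers and b = posts."""
    if followers < 1 or posts < 1:
        # Assumption: the logarithm is undefined here, so rank such users last.
        return 0.0
    return math.log(posts) * math.log(followers)

def select_concern_users(group, fraction=0.5):
    """Keep the top `fraction` of a user group, ranked by authority."""
    ranked = sorted(group, key=lambda u: authority(*group[u]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

# Made-up group: user -> (follower count, number of posts).
group = {
    "u1": (1000, 200),
    "u2": (10, 5),
    "u3": (50000, 3000),
    "u4": (100, 1),
}
print(select_concern_users(group))
```

Because both factors pass through a logarithm, a user needs both a sizeable audience and a non-trivial posting history to rank highly; a user with a single post scores 0 regardless of follower count.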
Through the processing in steps S201 to S204 above, the microblog user groups with similar activity habits are determined, and the concern users in each group are further determined. The detailed process, in the active time interval determination step S101 shown in Fig. 1, of determining the active time intervals of each microblog user group based on the microblogs posted by the concern users in the determined groups is described below with reference to Fig. 3.
As shown in Fig. 3, in the active time interval determination step S101 shown in Fig. 1, determining the active time intervals of each microblog user group based on the microblogs posted by the concern users in the determined groups may include a microblog count sub-step S301, a recursive sequence segmentation sub-step S302, and an active time interval selection sub-step S303.
First, in the microblog count sub-step S301, the number of microblogs posted by the determined concern users in each period of a predetermined cycle may be counted, thereby obtaining a time-dependent microblog count sequence. Preferably, because even the same user may keep very different schedules on workdays and at weekends, the counting may be performed separately for workdays and weekends, which makes the statistics more reasonable and the topic mining more accurate. Here, as an example, one day is taken as the predetermined cycle and one minute as the interval for determining the time-dependent microblog count sequence. With time on the horizontal axis, for example at one-minute intervals, and the number of microblogs posted on the vertical axis, statistical charts such as those shown in Fig. 4 are obtained, where Fig. 4(a) is the chart for workdays and Fig. 4(b) is the chart for weekends. Each element of the resulting sequence (that is, a microblog count) thus corresponds to one period.
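The per-minute counting, kept separate for workdays and weekends, might look like the following Python sketch. The function name count_series, the datetime input, and the treatment of Saturday and Sunday as the weekend are assumptions for illustration only.

```python
from datetime import datetime

def count_series(post_times, weekend=False):
    """Count posts per minute of the day (1440 slots) for workdays or weekends."""
    series = [0] * 1440
    for t in post_times:
        # Assumption: weekday() values 5 and 6 (Saturday, Sunday) are the weekend.
        if (t.weekday() >= 5) == weekend:
            series[t.hour * 60 + t.minute] += 1
    return series

# Made-up posts by the concern users of one group.
posts = [
    datetime(2012, 6, 4, 8, 30),   # Monday
    datetime(2012, 6, 5, 8, 30),   # Tuesday
    datetime(2012, 6, 9, 22, 0),   # Saturday
]
workday_series = count_series(posts)
weekend_series = count_series(posts, weekend=True)
print(workday_series[8 * 60 + 30], weekend_series[22 * 60])
```

The two resulting sequences correspond to the kind of data plotted in Fig. 4(a) and Fig. 4(b) respectively.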
Next, in the recursive sequence segmentation sub-step S302, the microblog count sequence obtained in step S301 may be recursively segmented, thereby obtaining one or more split points.
Specifically, in the recursive sequence segmentation sub-step S302, the recursive segmentation is carried out as follows:
First, for each point i in the current sequence, the following formulas (3) and (4) are calculated:
AnthorV(i)=|L1(i)|*Var(L1(i))/|L|+|L2(i)|*Var(L2(i))/|L| (3)
DiffV(i)=Var(L(i))-AnthorV(i) (4)
where |L1(i)| and |L2(i)| denote the lengths of the two subsequences obtained by splitting the current sequence at point i, |L| denotes the length of the current sequence, and Var() denotes the variance of the current sequence or of a subsequence; the smaller the variance, the more uniform the sequence.
Next, the point at which DiffV(i) is largest in the current sequence is found. If this maximum DiffV(i) is less than a predetermined threshold, the recursive segmentation stops; otherwise, the current sequence is split into two subsequences at that point, and each of the two subsequences is recursively segmented in the same way, so that one or more split points are obtained. The purpose of this processing is to find the intervals in which the number of microblogs posted by users surges, that is, the users' active time intervals, such as the intervals in Fig. 4 in which the number of posted microblogs rises sharply.
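The recursive segmentation driven by formulas (3) and (4) can be sketched in Python as follows. Several points are assumptions rather than statements from the text: Var is taken to be the population variance, Var(L(i)) is read as the variance of the whole current sequence, a split at i divides the sequence into the elements before index i and those from index i onward, and the function name and sample data are hypothetical.

```python
from statistics import pvariance

def segment(seq, threshold, offset=0):
    """Return the split indices (relative to the original sequence), in order."""
    n = len(seq)
    if n < 2:
        return []
    total_var = pvariance(seq)
    best_i, best_diff = None, float("-inf")
    for i in range(1, n):
        left, right = seq[:i], seq[i:]
        # Formula (3): length-weighted variance of the two subsequences.
        anthor_v = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        # Formula (4): variance reduction achieved by splitting at i.
        diff_v = total_var - anthor_v
        if diff_v > best_diff:
            best_i, best_diff = i, diff_v
    if best_diff < threshold:
        return []  # No split reduces the variance enough: stop recursing.
    return (
        segment(seq[:best_i], threshold, offset)
        + [offset + best_i]
        + segment(seq[best_i:], threshold, offset + best_i)
    )

# Made-up count sequence with one burst of activity in the middle.
counts = [1, 1, 1, 1, 9, 9, 9, 9, 1, 1, 1, 1]
print(segment(counts, threshold=1.0))
```

This version recomputes the variance for every candidate split, which is simple but slow for a 1440-point sequence; running sums would be the usual way to speed it up.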
Next, the determination of the active time intervals continues. Specifically, in the active time interval selection sub-step S303 shown in Fig. 3, from the time intervals delimited by the split points obtained in the recursive sequence segmentation sub-step S302, the N intervals with the largest variance may be selected as the active time intervals of the microblog user group, where N is a predetermined value greater than or equal to 1.
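Selecting the N intervals with the largest variance can be sketched in Python as below. The function name active_intervals, the half-open (start, end) index convention, the use of the population variance, and the sample data are assumptions made for illustration.

```python
from statistics import pvariance

def active_intervals(seq, split_points, n=1):
    """Pick the n intervals between split points whose counts vary the most."""
    bounds = [0] + sorted(split_points) + [len(seq)]
    # Half-open index ranges (start, end); empty ranges are skipped.
    intervals = [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]
    ranked = sorted(
        intervals,
        key=lambda iv: pvariance(seq[iv[0]:iv[1]]),
        reverse=True,
    )
    return sorted(ranked[:n])

# Made-up count sequence and split points.
counts = [1, 1, 1, 1, 5, 9, 12, 8, 1, 1, 2, 1]
print(active_intervals(counts, [4, 8], n=1))
```

With per-minute counts, an index range such as (4, 8) would translate back to a clock interval by treating each index as a minute of the day.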
After the active time intervals of a specific microblog user group have been determined by the processing above, the topics these users are interested in during the different active time intervals need to be determined, so as to improve the efficiency of data processing and allow information to be published and obtained in a targeted manner. Referring back to Fig. 1, the description of the data processing method according to an embodiment of the invention now continues.
In the keyword extraction step S102, keywords may be extracted from all microblogs in the active time intervals determined in step S101. The keyword extraction may include, for example, word segmentation and stop-word filtering. Those skilled in the art may perform this processing using any suitable keyword extraction technique known in the art, which is not described in detail here.
Next, in the topic determination step S103, the topics corresponding to the determined active time intervals may be determined based on the keywords extracted in step S102.
The detailed process of the topic determination step is described below with reference to Fig. 5.
As shown in Fig. 5, the topic determination step S103 may include a candidate keyword list determination sub-step S501, a keyword relevance calculation sub-step S502, a graph construction sub-step S503, and a topic determination sub-step S504.
First, in the candidate keyword list determination sub-step S501, the weight of each extracted keyword may be calculated for a determined active time interval, and the keywords whose weight exceeds a predetermined threshold are added to the candidate keyword list of that active time interval.
Specifically, for a determined active time interval, the weight of each extracted keyword can be calculated, for example, by the following formula (5):
W(k)=count(k)*log(Q/counttimes(k))*log(authorfollowers(k)) (5)
where count(k) denotes the number of occurrences of keyword k, Q denotes the number of microblogs in the active time interval, counttimes(k) denotes the number of microblogs containing keyword k, and authorfollowers(k) denotes the total follower count of the people who posted microblogs containing keyword k. The logarithm is applied here to prevent large fluctuations in follower counts from affecting the accuracy of the result.
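Formula (5) and the threshold-based candidate list can be sketched in Python as follows. The input layout (author, follower count, token list), the counting of each author's followers only once per keyword, the natural logarithm, the weight of 0 when the follower total is below 1, and all names and sample values are assumptions for illustration.

```python
import math
from collections import Counter

def keyword_weights(posts):
    """Compute W(k) for every keyword; posts are (author, followers, tokens)."""
    q = len(posts)                 # Q: number of posts in the interval
    count = Counter()              # count(k): total occurrences of k
    counttimes = Counter()         # counttimes(k): posts containing k
    authors = {}                   # k -> {author: followers}
    for author, followers, tokens in posts:
        count.update(tokens)
        for k in set(tokens):
            counttimes[k] += 1
            authors.setdefault(k, {})[author] = followers
    weights = {}
    for k in count:
        followers_total = sum(authors[k].values())
        if followers_total < 1:
            weights[k] = 0.0       # Assumption: avoid log of zero.
            continue
        weights[k] = (
            count[k] * math.log(q / counttimes[k]) * math.log(followers_total)
        )
    return weights

def candidate_keywords(weights, threshold):
    """Keep the keywords whose weight exceeds the threshold."""
    return [k for k, w in weights.items() if w > threshold]

# Made-up posts from one active time interval.
posts = [
    ("a", 100, ["smog", "air"]),
    ("b", 1000, ["smog"]),
    ("c", 10, ["exam"]),
    ("d", 10, ["exam", "exam"]),
]
weights = keyword_weights(posts)
print(sorted(candidate_keywords(weights, 6.3)))
```

Note that under this formula a keyword appearing in every post of the interval receives weight 0, because log(Q/counttimes(k)) is then log(1).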
Next, in the keyword relevance calculation sub-step S502, the degree of association between any two keywords in the candidate keyword list determined in step S501 may be calculated.
Specifically, as an example, the degree of association between two keywords can be calculated by the following formula (6):
I(A,B)=log(P(A,B))/(log(P(A))*log(P(B))) (6)
where P(A) and P(B) denote the probability, relative to the total number of microblogs in the active time interval, of a microblog containing keyword A or keyword B respectively, and P(A,B) denotes the probability, relative to the total number of microblogs in the active time interval, of a microblog containing both keyword A and keyword B.
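A direct Python transcription of formula (6) is sketched below. The formula is undefined when two keywords never co-occur (logarithm of zero) or when a keyword appears in every microblog (division by zero); returning None in those cases is an assumption of this sketch, as are the function name, the representation of each microblog as a set of keywords, and the sample data. Because every probability is at most 1, the value computed this way is never positive, so any threshold applied to it would presumably be negative.

```python
import math

def relevance(posts, a, b):
    """I(A,B) = log(P(A,B)) / (log(P(A)) * log(P(B))), or None if undefined.

    `posts` is a non-empty list of keyword sets, one per microblog.
    """
    q = len(posts)
    p_a = sum(a in p for p in posts) / q
    p_b = sum(b in p for p in posts) / q
    p_ab = sum(a in p and b in p for p in posts) / q
    if p_ab == 0 or p_a == 1 or p_b == 1:
        return None  # Assumption: treat undefined cases as "no score".
    return math.log(p_ab) / (math.log(p_a) * math.log(p_b))

# Made-up microblogs from one active time interval.
posts = [
    {"smog", "air"},
    {"smog", "air"},
    {"exam"},
    {"exam", "smog"},
]
print(relevance(posts, "smog", "air"))
print(relevance(posts, "air", "exam"))
```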
Next, in the graph construction sub-step S503, a graph may be constructed with each keyword in the candidate keyword list determined in step S501 as a node, and with an edge between each pair of keywords whose degree of association calculated in step S502 exceeds a predetermined threshold.
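The keyword graph can be sketched in Python as an adjacency map. The function name, the dictionary of precomputed pairwise scores, the threshold, and the sample values are hypothetical; the scores below are made-up numbers, not outputs of formula (6).

```python
def build_keyword_graph(candidates, scores, threshold):
    """Build an undirected graph: keyword -> set of associated keywords.

    `scores` maps (keyword_a, keyword_b) pairs to a degree of association.
    """
    graph = {k: set() for k in candidates}
    for (a, b), score in scores.items():
        # Only connect candidate keywords whose association exceeds the threshold.
        if a in graph and b in graph and score > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Made-up candidate list and pairwise scores.
candidates = ["air quality", "pollution", "exam", "reform"]
scores = {
    ("air quality", "pollution"): 0.9,
    ("exam", "reform"): 0.7,
    ("pollution", "exam"): 0.1,
}
graph = build_keyword_graph(candidates, scores, threshold=0.5)
print(graph["pollution"])
```

A graph in this form can then be handed to a clustering or graph partitioning routine to group the keywords into topics.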
Then, in the topic determination sub-step S504, the topics corresponding to each active time interval may be determined using a clustering algorithm based on the graph constructed in step S503. Preferably, the CNM graph partitioning algorithm may be used here for topic clustering. The resulting topic cluster diagram is shown, for example, in Fig. 6, in which different colors represent different topic clusters. For example, keywords such as air quality, pollution, and environmental protection form a topic related to environmental protection, while keywords such as reform, entering a higher school, and examinations form a topic related to education.
Although the data processing method according to embodiments of the invention has been described in detail above with reference to Figs. 1 to 6, those skilled in the art should understand that the flowcharts shown in the drawings are merely exemplary, and that the above method flow may be modified according to the actual application and specific requirements. For example, the execution order of certain steps in the above method may be adjusted as needed, or certain processing steps may be omitted or added. In addition, the methods described above for calculating keyword weights, the degree of association between keywords, and so on are merely examples and not limitations, and other techniques known in the art may be used for these calculations.
Corresponding to the data processing method according to embodiments of the invention, an embodiment of the invention also provides a data processing device.
Specifically, as shown in Fig. 7, the data processing device 700 may include an active time interval determination unit 701, a keyword extraction unit 702, and a topic determination unit 703. The functional configuration of each unit is described in detail below.
The active time interval determination unit 701 may be configured to determine microblog user groups with similar activity habits and to determine the active time intervals of each microblog user group based on the microblogs posted by concern users in the determined groups.
Preferably, as shown in Fig. 8, the active time interval determination unit 701 may further include a user vector construction subunit 801, an edge determination subunit 802, a microblog user group construction subunit 803, and a concern user determination subunit 804. The functional configuration of each subunit is described in detail below.
The user vector construction subunit 801 may be configured to construct user vectors of a predetermined dimension according to the times at which, and the numbers in which, microblog users have posted microblogs in the past. Here, as an example, a 24-dimensional user vector V=(n1, n2, …, n24) may be constructed, where ni denotes the number of microblogs the user posted during the i-th hour.
The edge determination subunit 802 may be configured to determine the edges between user nodes based on the similarity between user vectors. Preferably, as an example, the similarity between user vectors is determined using the cosine of the angle between the vectors, and an edge between two user nodes whose similarity exceeds a predetermined threshold is determined to be a formal edge.
The microblog user group construction subunit 803 may be configured to construct microblog user groups with similar activity habits from the determined formal edges between user nodes, using a graph partitioning algorithm such as CNM. A group can be expressed, for example, as C=(V1, V2, …, Vr).
The focus user determination subunit 804 may be configured to determine the authority of each microblog user based on one or more of the user's number of followers, the number of microblogs the user has posted, the number of replies to the microblogs posted by the user, and the number of forwards of the microblogs posted by the user, and then to select, based on the determined authority, a predetermined number of microblog users from each constructed microblog user group as focus users. As an example, the number of followers of a microblog user and the number of microblogs posted can be taken as the factors, and the users whose authority ranks in the top 50% of the microblog user group can be taken as focus users.
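The selection of focus users can be illustrated with the sketch below. The text does not give a formula for authority, so the scoring function here (sum of the logarithms of the follower count and the post count) is purely a hypothetical stand-in; only the two factors and the top-50% cut come from the example above.

```python
import math

def authority(followers, posts):
    # Hypothetical score: the source names the factors but not the formula.
    return math.log1p(followers) + math.log1p(posts)

def select_focus_users(group, fraction=0.5):
    """Return the users whose authority ranks in the top `fraction` of the group."""
    ranked = sorted(group, key=lambda name: authority(*group[name]), reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical group: user -> (number of followers, number of microblogs posted).
group = {"a": (1000, 50), "b": (10, 5), "c": (500, 200), "d": (3, 1)}
focus = select_focus_users(group)
```

With these sample figures, users `c` and `a` are retained as focus users.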
Preferably, as shown in Fig. 9, the active time interval determination unit 701 can further include a microblog quantity statistics subunit 901, a sequence recursive segmentation subunit 902, and an active time interval selection subunit 903.
The microblog quantity statistics subunit 901 may be configured to count the number of microblogs posted by the determined focus users within each period of a predetermined cycle (for example, one day), thereby obtaining a time-related microblog quantity sequence. Preferably, the statistics can be collected separately for working days and weekends, so that the statistical result is more reasonable.
The sequence recursive segmentation subunit 902 may be configured to recursively segment the counted microblog quantity sequence, thereby obtaining one or more split points.
The active time interval selection subunit 903 may be configured to select, from the time intervals determined by the obtained split points, the top N time intervals with the largest standard variance as the active time intervals of the microblog user group, where N is a predetermined value greater than or equal to 1.
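As a non-limiting sketch, the recursive segmentation and the interval selection can be written as follows, using the AnthorV and DiffV quantities stated in the remarks of this document. The sketch assumes that Var() is the population variance (`statistics.pvariance`); the threshold of 40 and the sample counts are hypothetical.

```python
from statistics import pvariance

def split_points(seq, threshold, offset=0):
    """Recursively split seq; return the sorted absolute split indices."""
    n = len(seq)
    if n < 2:
        return []
    total_var = pvariance(seq)
    best_i, best_diff = None, float("-inf")
    for i in range(1, n):  # i is the candidate split point
        left, right = seq[:i], seq[i:]
        # AnthorV(i): length-weighted variance of the two subsequences
        anthor_v = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        diff_v = total_var - anthor_v  # DiffV(i)
        if diff_v > best_diff:
            best_i, best_diff = i, diff_v
    if best_diff < threshold:
        return []  # stop the recursion for this subsequence
    return (split_points(seq[:best_i], threshold, offset)
            + [offset + best_i]
            + split_points(seq[best_i:], threshold, offset + best_i))

def active_intervals(seq, threshold, top_n=1):
    """Pick the top_n intervals (start, end) with the largest variance."""
    cuts = [0] + split_points(seq, threshold) + [len(seq)]
    intervals = list(zip(cuts, cuts[1:]))
    intervals.sort(key=lambda ab: pvariance(seq[ab[0]:ab[1]]), reverse=True)
    return intervals[:top_n]

# Hypothetical microblog counts for 12 consecutive periods.
counts = [1, 1, 1, 1, 20, 40, 20, 40, 2, 2, 2, 2]
cuts = split_points(counts, threshold=40)
active = active_intervals(counts, threshold=40, top_n=1)
```

For these counts the sequence is split before periods 4 and 8, and the middle interval, whose counts fluctuate most, is selected.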
Next, referring back to Fig. 7, the description of the functional configuration of each unit of the data processing equipment according to an embodiment of the present invention continues.
The keyword extraction unit 702 may be configured to extract keywords from all microblogs in the determined active time interval. Keyword extraction methods are known in the art and are not described here.
The topic determining unit 703 may be configured to determine, based on the extracted keywords, the corresponding topics in the active time interval of each microblog user group.
With reference to Fig. 10, the topic determining unit 703 can include a candidate keyword list determination subunit 1001, a keyword relevance computation subunit 1002, a graph construction subunit 1003, and a topic determination subunit 1004. The functional configuration of each subunit is described in detail below.
Specifically, the candidate keyword list determination subunit 1001 may be configured to calculate, for the determined active time interval, the weight of each extracted keyword, and to include the keywords whose weight is greater than a predetermined threshold in the candidate keyword list of that active time interval.
The keyword relevance computation subunit 1002 may be configured to calculate the relevance between any two keywords in the determined candidate keyword list.
For the keyword weight calculation method and the relevance calculation method adopted by the candidate keyword list determination subunit 1001 and the keyword relevance computation subunit 1002, reference may be made to the methods employed in the candidate keyword list determination sub-step S501 and the keyword relevance calculation sub-step S502 described above; the description is not repeated here.
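As a non-limiting sketch, the weight W(k) and the relevance I(A,B) given in the remarks of this document can be computed as follows. The post records (with hypothetical `author`, `followers`, and `keywords` fields), the handling of undefined logarithms, and the function names are assumptions made for illustration.

```python
import math

def keyword_weights(posts):
    """W(k) = count(k) * log(Q / counttimes(k)) * log(authorfollowers(k))."""
    q = len(posts)  # Q: number of microblogs in the active time interval
    count, counttimes, authors = {}, {}, {}
    for post in posts:
        for k in post["keywords"]:
            count[k] = count.get(k, 0) + 1  # every occurrence of k
        for k in set(post["keywords"]):
            counttimes[k] = counttimes.get(k, 0) + 1  # microblogs containing k
            authors.setdefault(k, {})[post["author"]] = post["followers"]
    weights = {}
    for k in count:
        # Assumption: clamp to 1 so that log() is defined for zero followers.
        followers = max(sum(authors[k].values()), 1)
        weights[k] = count[k] * math.log(q / counttimes[k]) * math.log(followers)
    return weights

def relevance(posts, a, b):
    """I(A,B) = log(P(A,B)) / (log(P(A)) * log(P(B))); None when undefined."""
    q = len(posts)
    n_a = sum(1 for p in posts if a in p["keywords"])
    n_b = sum(1 for p in posts if b in p["keywords"])
    n_ab = sum(1 for p in posts if a in p["keywords"] and b in p["keywords"])
    if n_ab == 0 or n_a == q or n_b == q:
        return None  # log(0) or a zero denominator
    return math.log(n_ab / q) / (math.log(n_a / q) * math.log(n_b / q))

# Hypothetical microblogs within one active time interval.
posts = [
    {"author": "u1", "followers": 100, "keywords": ["rain", "rain", "traffic"]},
    {"author": "u2", "followers": 900, "keywords": ["rain"]},
    {"author": "u3", "followers": 50, "keywords": ["match"]},
    {"author": "u1", "followers": 100, "keywords": ["traffic"]},
]
w = keyword_weights(posts)
r = relevance(posts, "rain", "traffic")
```

Note that a keyword appearing in every microblog of the interval receives weight zero, because log(Q / counttimes(k)) is then log(1).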
The graph construction subunit 1003 may be configured to construct a graph with each keyword in the determined candidate keyword list as a node and with each calculated relevance greater than a predetermined threshold as an edge between keywords.
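The graph construction described above can be sketched with a plain adjacency mapping. The relevance scores, the threshold of -2.0, and the name `build_keyword_graph` are hypothetical values chosen only for illustration.

```python
def build_keyword_graph(candidates, relevance, threshold):
    """Nodes are candidate keywords; an edge joins pairs scoring above threshold."""
    graph = {k: set() for k in candidates}
    for (a, b), score in relevance.items():
        if a in graph and b in graph and score > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical candidate list and pairwise relevance scores.
candidates = ["rain", "traffic", "match"]
scores = {
    ("rain", "traffic"): -1.2,
    ("rain", "match"): -6.0,
    ("traffic", "match"): -5.5,
}
graph = build_keyword_graph(candidates, scores, threshold=-2.0)
```

Only the pair `rain` and `traffic` exceeds the threshold, so `match` remains an isolated node.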
The topic determination subunit 1004 may be configured to determine, based on the constructed graph, the corresponding topics in each active time interval using a clustering algorithm. Preferably, the clustering algorithm can be the CNM graph partitioning algorithm. The topic clustering result finally obtained is, for example, as shown in Fig. 6.
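To illustrate how keyword nodes are grouped into topics, the sketch below uses connected components, which is a much simpler stand-in and is not the CNM algorithm named above; CNM would additionally split a connected graph into modularity-maximizing communities. The sample graph and the function name are hypothetical.

```python
def connected_components(graph):
    """Group keywords that are linked directly or indirectly (not CNM)."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)  # visit unseen neighbours
        components.append(sorted(component))
    return components

# Hypothetical keyword graph: keyword -> set of related keywords.
graph = {
    "rain": {"traffic"},
    "traffic": {"rain", "jam"},
    "jam": {"traffic"},
    "match": set(),
}
topics = connected_components(graph)
```

Each returned keyword group plays the role of one topic of the active time interval.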
It should be noted that the equipment described in the embodiments of the present invention corresponds to the foregoing method embodiments; therefore, for parts not described in detail in the equipment embodiments, refer to the corresponding description in the method embodiments, which is not repeated here.
In addition, it should also be noted that the above series of processes and the equipment can also be realized by software and/or firmware. In the case of realization by software and/or firmware, a program constituting the software is installed from a storage medium or a network onto a computer with a dedicated hardware structure, for example the general-purpose personal computer 1100 shown in Fig. 11, and this computer, when various programs are installed, is able to execute various functions.
In Fig. 11, a central processing unit (CPU) 1101 executes various processes according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores, as needed, the data required when the CPU 1101 executes the various processes.
The CPU 1101, the ROM 1102, and the RAM 1103 are connected to each other via a bus 1104. An input/output interface 1105 is also connected to the bus 1104.
The following components are connected to the input/output interface 1105: an input section 1106, including a keyboard, a mouse, and the like; an output section 1107, including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 1108, including a hard disk and the like; and a communication section 1109, including a network interface card such as a LAN card, a modem, and the like. The communication section 1109 performs communication processing via a network such as the Internet.
A drive 1110 is also connected to the input/output interface 1105 as needed. A removable medium 1111 such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1110 as needed, so that a computer program read from it is installed into the storage section 1108 as needed.
In the case where the above series of processes is realized by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1111.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1111 shown in Fig. 11, in which the program is stored and which is distributed separately from the equipment to provide the program to the user. Examples of the removable medium 1111 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1102, a hard disk contained in the storage section 1108, or the like, in which the program is stored and which is distributed to the user together with the equipment containing it.
It should also be noted that the steps of the above series of processes can naturally be executed chronologically in the order described, but need not necessarily be executed in chronological order. Some steps can be executed in parallel or independently of one another.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the present invention as defined by the appended claims. Moreover, the terms "include", "comprise", or any other variant thereof in the embodiments of the present invention are intended to cover non-exclusive inclusion, so that a process, method, article, or equipment that includes a series of elements includes not only those elements but also other elements not expressly listed, or also includes elements inherent to the process, method, article, or equipment. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or equipment that includes the element.
With regard to the embodiments including the above examples, the following remarks are also disclosed:
Remark 1. A data processing method, including:
an active time interval determination step for determining microblog user groups having similar activity habits, and determining the active time interval of each microblog user group based on the microblogs posted by the focus users in the determined microblog user group;
a keyword extraction step for extracting keywords from all microblogs in the determined active time interval; and
a topic determination step for determining, based on the extracted keywords, the corresponding topics in the determined active time interval.
Remark 2. The data processing method according to remark 1, wherein, in the active time interval determination step, determining the microblog user groups having similar activity habits further includes:
a user vector construction sub-step for constructing a user vector with a predetermined number of dimensions according to the times at which, and the quantities in which, each microblog user habitually posts microblogs;
an edge determination sub-step for determining the edges between user nodes based on the similarity between the user vectors;
a microblog user group construction sub-step for constructing, based on the determined edges, the microblog user groups having similar activity habits; and
a focus user determination sub-step for determining the authority of each microblog user based on one or more of the user's number of followers, the number of microblogs posted, the number of replies to the microblogs posted by the user, and the number of forwards of the microblogs posted by the user, and selecting, based on the authority, a predetermined number of microblog users from the microblog user group as the focus users.
Remark 3. The data processing method according to remark 2, wherein, in the edge determination sub-step, the similarity between user vectors is determined with a method based on the cosine of the angle between the vectors, and an edge between two user nodes whose determined similarity is greater than a predetermined threshold is defined as a formal edge.
Remark 4. The data processing method according to remark 1, wherein, in the active time interval determination step, determining the active time interval of each microblog user group based on the microblogs posted by the focus users in the determined microblog user group further includes:
a microblog quantity statistics sub-step for counting the number of microblogs posted by the focus users within each period of a predetermined cycle, thereby obtaining a time-related microblog quantity sequence;
a sequence recursive segmentation sub-step for recursively segmenting the counted microblog quantity sequence, thereby obtaining one or more split points; and
an active time interval selection sub-step for selecting, from the time intervals determined by the obtained split points, the top N time intervals with the largest standard variance as the active time intervals, where N is greater than or equal to 1,
wherein, in the sequence recursive segmentation sub-step:
for each point in the current sequence, the following formulas are calculated:
AnthorV(i) = |L1(i)| * Var(L1(i)) / |L| + |L2(i)| * Var(L2(i)) / |L|
DiffV(i) = Var(L(i)) - AnthorV(i)
where |L1(i)| and |L2(i)| respectively represent the lengths of the two subsequences obtained by splitting the current sequence on the assumption that i is the current split point, |L| represents the length of the current sequence, and Var() represents the standard variance of the current sequence or of a subsequence;
the point with the maximum DiffV(i) in the current sequence is found; and
if the DiffV(i) of this point is less than a predetermined threshold, the recursive segmentation stops; otherwise this point is taken as the split point of the current sequence, the current sequence is divided into two subsequences, and recursive segmentation continues on each of the two subsequences.
Remark 5. The data processing method according to remark 2, wherein the statistics are collected separately for working days and weekends.
Remark 6. The data processing method according to remark 1, wherein the topic determination step further includes:
a candidate keyword list determination sub-step for calculating, for the determined active time interval, the weight of each extracted keyword, and including the keywords whose weight is greater than a predetermined threshold in the candidate keyword list of the active time interval;
a keyword relevance calculation sub-step for calculating the relevance between any two keywords in the determined candidate keyword list;
a graph construction sub-step for constructing a graph with each keyword in the candidate keyword list as a node and with each relevance greater than a predetermined threshold as an edge between keywords; and
a topic determination sub-step for determining, based on the constructed graph and using a clustering algorithm, the corresponding topics in the determined active time interval.
Remark 7. The data processing method according to remark 6, wherein, in the candidate keyword list determination sub-step, the weight of each keyword is calculated for the active time interval according to the following formula:
W(k) = count(k) * log(Q / counttimes(k)) * log(authorfollowers(k))
where count(k) represents the number of occurrences of keyword k, Q represents the number of microblogs in the active time interval, counttimes(k) represents the number of microblogs containing keyword k, and authorfollowers(k) represents the total number of followers of the users who posted microblogs containing keyword k.
Remark 8. The method according to remark 6, wherein, in the keyword relevance calculation sub-step, the relevance between two keywords is calculated by the following formula:
I(A,B) = log(P(A,B)) / (log(P(A)) * log(P(B)))
where P(A) and P(B) respectively represent the probability, relative to the total number of microblogs in the active time interval, of a microblog containing keyword A or keyword B, and P(A,B) represents the probability, relative to the total number of microblogs in the active time interval, of a microblog containing both keywords A and B.
Remark 9. The data processing method according to remark 6, wherein the clustering algorithm includes the CNM graph partitioning algorithm.
Remark 10. A data processing equipment, including:
an active time interval determination unit configured to determine microblog user groups having similar activity habits, and to determine the active time interval of each microblog user group based on the microblogs posted by the focus users in the determined microblog user group;
a keyword extraction unit configured to extract keywords from all microblogs in the determined active time interval; and
a topic determining unit configured to determine, based on the extracted keywords, the corresponding topics in the determined active time interval.
Remark 11. The data processing equipment according to remark 10, wherein the active time interval determination unit further includes:
a user vector construction subunit configured to construct a user vector with a predetermined number of dimensions according to the times at which, and the quantities in which, each microblog user habitually posts microblogs;
an edge determination subunit configured to determine the edges between user nodes based on the similarity between the user vectors;
a microblog user group construction subunit configured to construct, based on the determined edges, the microblog user groups having similar activity habits; and
a focus user determination subunit configured to determine the authority of each microblog user based on one or more of the user's number of followers, the number of microblogs posted, the number of replies to the microblogs posted by the user, and the number of forwards of the microblogs posted by the user, and to select, based on the authority, a predetermined number of microblog users from the microblog user group as the focus users.
Remark 12. The data processing equipment according to remark 11, wherein the edge determination subunit is configured to determine the similarity between user vectors with a method based on the cosine of the angle between the vectors, and to define an edge between two user nodes whose determined similarity is greater than a predetermined threshold as a formal edge.
Remark 13. The data processing equipment according to remark 10, wherein the active time interval determination unit further includes:
a microblog quantity statistics subunit configured to count the number of microblogs posted by the focus users within each period of a predetermined cycle, thereby obtaining a time-related microblog quantity sequence;
a sequence recursive segmentation subunit configured to recursively segment the counted microblog quantity sequence, thereby obtaining one or more split points; and
an active time interval selection subunit configured to select, from the time intervals determined by the obtained split points, the top N time intervals with the largest standard variance as the active time intervals, where N is greater than or equal to 1,
wherein the sequence recursive segmentation subunit is further configured to:
calculate, for each point in the current sequence, the following formulas:
AnthorV(i) = |L1(i)| * Var(L1(i)) / |L| + |L2(i)| * Var(L2(i)) / |L|
DiffV(i) = Var(L(i)) - AnthorV(i)
where |L1(i)| and |L2(i)| respectively represent the lengths of the two subsequences obtained by splitting the current sequence on the assumption that i is the current split point, |L| represents the length of the current sequence, and Var() represents the standard variance of the current sequence or of a subsequence;
find the point with the maximum DiffV(i) in the current sequence; and
stop the recursive segmentation if the DiffV(i) of this point is less than a predetermined threshold, and otherwise take this point as the split point of the current sequence, divide the current sequence into two subsequences, and continue recursive segmentation on each of the two subsequences.
Remark 14. The data processing equipment according to remark 11, wherein the statistics are collected separately for working days and weekends.
Remark 15. The data processing equipment according to remark 10, wherein the topic determining unit further includes:
a candidate keyword list determination subunit configured to calculate, for the determined active time interval, the weight of each extracted keyword, and to include the keywords whose weight is greater than a predetermined threshold in the candidate keyword list of the active time interval;
a keyword relevance computation subunit configured to calculate the relevance between any two keywords in the determined candidate keyword list;
a graph construction subunit configured to construct a graph with each keyword in the candidate keyword list as a node and with each relevance greater than a predetermined threshold as an edge between keywords; and
a topic determination subunit configured to determine, based on the constructed graph and using a clustering algorithm, the corresponding topics in the determined active time interval.
Remark 16. The data processing equipment according to remark 15, wherein the candidate keyword list determination subunit is further configured to calculate, for the active time interval, the weight of each keyword according to the following formula:
W(k) = count(k) * log(Q / counttimes(k)) * log(authorfollowers(k))
where count(k) represents the number of occurrences of keyword k, Q represents the number of microblogs in the active time interval, counttimes(k) represents the number of microblogs containing keyword k, and authorfollowers(k) represents the total number of followers of the users who posted microblogs containing keyword k.
Remark 17. The data processing equipment according to remark 15, wherein the keyword relevance computation subunit is further configured to calculate the relevance between two keywords by the following formula:
I(A,B) = log(P(A,B)) / (log(P(A)) * log(P(B)))
where P(A) and P(B) respectively represent the probability, relative to the total number of microblogs in the active time interval, of a microblog containing keyword A or keyword B, and P(A,B) represents the probability, relative to the total number of microblogs in the active time interval, of a microblog containing both keywords A and B.
Remark 18. The data processing equipment according to remark 15, wherein the clustering algorithm includes the CNM graph partitioning algorithm.
Remark 19. A terminal device, including the data processing equipment according to any one of remarks 10 to 18.
Remark 20. The terminal device according to remark 19, wherein the terminal device includes a mobile phone, a palmtop computer, a tablet computer, and a personal computer.
Claims (8)
1. A data processing method, including:
an active time interval determination step for determining microblog user groups having similar activity habits, and determining the active time interval of each microblog user group based on the microblogs posted by the focus users in the determined microblog user group;
a keyword extraction step for extracting, for the microblog user group, keywords from all microblogs in the determined active time interval; and
a topic determination step for determining, based on the extracted keywords, the corresponding topics in the determined active time interval of the microblog user group,
wherein, in the active time interval determination step, determining the microblog user groups having similar activity habits further includes:
a user vector construction sub-step for constructing a user vector with a predetermined number of dimensions according to the times at which, and the quantities in which, each microblog user habitually posts microblogs;
an edge determination sub-step for determining the edges between user nodes based on the similarity between the user vectors;
a microblog user group construction sub-step for constructing, based on the determined edges, the microblog user groups having similar activity habits; and
a focus user determination sub-step for determining the authority of each microblog user based on one or more of the user's number of followers, the number of microblogs posted, the number of replies to the microblogs posted by the user, and the number of forwards of the microblogs posted by the user, and selecting, based on the authority, a predetermined number of microblog users from the microblog user group as the focus users.
2. The data processing method according to claim 1, wherein, in the active time interval determination step, determining the active time interval of each microblog user group based on the microblogs posted by the focus users in the determined microblog user group further includes:
a microblog quantity statistics sub-step for counting the number of microblogs posted by the focus users within each period of a predetermined cycle, thereby obtaining a time-related microblog quantity sequence;
a sequence recursive segmentation sub-step for recursively segmenting the counted microblog quantity sequence, thereby obtaining one or more split points; and
an active time interval selection sub-step for selecting, from the time intervals determined by the obtained split points, the top N time intervals with the largest standard variance as the active time intervals, where N is greater than or equal to 1,
wherein, in the sequence recursive segmentation sub-step:
for each point in the current sequence, the following formulas are calculated:
AnthorV(i) = |L1(i)| * Var(L1(i)) / |L| + |L2(i)| * Var(L2(i)) / |L|
DiffV(i) = Var(L(i)) - AnthorV(i)
where |L1(i)| and |L2(i)| respectively represent the lengths of the two subsequences obtained by splitting the current sequence on the assumption that i is the current split point, |L| represents the length of the current sequence, and Var() represents the standard variance of the current sequence or of a subsequence;
the point with the maximum DiffV(i) in the current sequence is found; and
if the DiffV(i) of this point is less than a predetermined threshold, the recursive segmentation stops; otherwise this point is taken as the split point of the current sequence, the current sequence is divided into two subsequences, and recursive segmentation continues on each of the two subsequences.
3. The data processing method according to claim 1, wherein the topic determination step further includes:
a candidate keyword list determination sub-step for calculating, for the determined active time interval, the weight of each extracted keyword, and including the keywords whose weight is greater than a predetermined threshold in the candidate keyword list of the active time interval;
a keyword relevance calculation sub-step for calculating the relevance between any two keywords in the determined candidate keyword list;
a graph construction sub-step for constructing a graph with each keyword in the candidate keyword list as a node and with each calculated relevance greater than a predetermined threshold as an edge between keywords; and
a topic determination sub-step for determining, based on the constructed graph and using a clustering algorithm, the corresponding topics in the determined active time interval.
4. A data processing equipment, including:
an active time interval determination unit configured to determine microblog user groups having similar activity habits, and to determine the active time interval of each microblog user group based on the microblogs posted by the focus users in the determined microblog user group;
a keyword extraction unit configured to extract, for the microblog user group, keywords from all microblogs in the determined active time interval; and
a topic determining unit configured to determine, based on the extracted keywords, the corresponding topics in the determined active time interval of the microblog user group,
wherein the active time interval determination unit further includes:
a user vector construction subunit configured to construct a user vector with a predetermined number of dimensions according to the times at which, and the quantities in which, each microblog user habitually posts microblogs;
an edge determination subunit configured to determine the edges between user nodes based on the similarity between the user vectors;
a microblog user group construction subunit configured to construct, based on the determined edges, the microblog user groups having similar activity habits; and
a focus user determination subunit configured to determine the authority of each microblog user based on one or more of the user's number of followers, the number of microblogs posted, the number of replies to the microblogs posted by the user, and the number of forwards of the microblogs posted by the user, and to select, based on the authority, a predetermined number of microblog users from the microblog user group as the focus users.
5. The data processing equipment according to claim 4, wherein the active time interval determination unit further includes:
a microblog quantity statistics subunit configured to count the number of microblogs posted by the focus users within each period of a predetermined cycle, thereby obtaining a time-related microblog quantity sequence;
a sequence recursive segmentation subunit configured to recursively segment the counted microblog quantity sequence, thereby obtaining one or more split points; and
an active time interval selection subunit configured to select, from the time intervals determined by the obtained split points, the top N time intervals with the largest standard variance as the active time intervals, where N is greater than or equal to 1,
wherein the sequence recursive segmentation subunit is further configured to:
calculate, for each point in the current sequence, the following formulas:
AnthorV(i) = |L1(i)| * Var(L1(i)) / |L| + |L2(i)| * Var(L2(i)) / |L|
DiffV(i) = Var(L(i)) - AnthorV(i)
where |L1(i)| and |L2(i)| respectively represent the lengths of the two subsequences obtained by splitting the current sequence on the assumption that i is the current split point, |L| represents the length of the current sequence, and Var() represents the standard variance of the current sequence or of a subsequence;
find the point with the maximum DiffV(i) in the current sequence; and
stop the recursive segmentation if the DiffV(i) of this point is less than a predetermined threshold, and otherwise take this point as the split point of the current sequence, divide the current sequence into two subsequences, and continue recursive segmentation on each of the two subsequences.
6. The data processing equipment according to claim 4, wherein the topic determining unit further includes:
a candidate keyword list determination subunit configured to calculate, for the determined active time interval, the weight of each extracted keyword, and to include the keywords whose weight is greater than a predetermined threshold in the candidate keyword list of the active time interval;
a keyword relevance computation subunit configured to calculate the relevance between any two keywords in the determined candidate keyword list;
a graph construction subunit configured to construct a graph with each keyword in the candidate keyword list as a node and with each calculated relevance greater than a predetermined threshold as an edge between keywords; and
a topic determination subunit configured to determine, based on the constructed graph and using a clustering algorithm, the corresponding topics in the determined active time interval.
7. The data processing equipment according to claim 6, wherein the candidate keyword list determination subunit is further configured to calculate, for the active time interval, the weight of each keyword according to the following formula:
W(k) = count(k) * log(Q / counttimes(k)) * log(authorfollowers(k))
where count(k) represents the number of occurrences of keyword k, Q represents the number of microblogs in the active time interval, counttimes(k) represents the number of microblogs containing keyword k, and authorfollowers(k) represents the total number of followers of the users who posted microblogs containing keyword k.
8. A terminal device, including the data processing equipment according to any one of claims 4 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210202800.2A CN103514167B (en) | 2012-06-15 | 2012-06-15 | Data processing method and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210202800.2A CN103514167B (en) | 2012-06-15 | 2012-06-15 | Data processing method and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103514167A CN103514167A (en) | 2014-01-15 |
CN103514167B CN103514167B (en) | 2017-03-01 |
Family
ID=49896907
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210202800.2A Expired - Fee Related CN103514167B (en) | 2012-06-15 | 2012-06-15 | Data processing method and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103514167B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106484724A (en) * | 2015-08-31 | 2017-03-08 | 富士通株式会社 | Information processor and information processing method |
CN105946740A (en) * | 2016-04-29 | 2016-09-21 | 任开付 | Pipeline fixing clamp for automobile |
CN110134788B (en) * | 2019-05-16 | 2021-05-11 | 杭州师范大学 | Microblog release optimization method and system based on text mining |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101609445A (en) * | 2009-07-16 | 2009-12-23 | 复旦大学 | Crucial sub-method for extracting topic based on temporal information |
CN102135983A (en) * | 2011-01-17 | 2011-07-27 | 北京邮电大学 | Group dividing method and device based on network user behavior |
CN102314489A (en) * | 2011-08-15 | 2012-01-11 | 哈尔滨工业大学 | Method for analyzing opinion leader in network forum |
CN102394798A (en) * | 2011-11-16 | 2012-03-28 | 北京交通大学 | Multi-feature based prediction method of propagation behavior of microblog information and system thereof |
WO2012056463A1 (en) * | 2010-10-29 | 2012-05-03 | Hewlett-Packard Development Company, L.P. | Content recommendation for groups |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101609445A (en) * | 2009-07-16 | 2009-12-23 | 复旦大学 | Crucial sub-method for extracting topic based on temporal information |
WO2012056463A1 (en) * | 2010-10-29 | 2012-05-03 | Hewlett-Packard Development Company, L.P. | Content recommendation for groups |
CN102135983A (en) * | 2011-01-17 | 2011-07-27 | 北京邮电大学 | Group dividing method and device based on network user behavior |
CN102314489A (en) * | 2011-08-15 | 2012-01-11 | 哈尔滨工业大学 | Method for analyzing opinion leader in network forum |
CN102394798A (en) * | 2011-11-16 | 2012-03-28 | 北京交通大学 | Multi-feature based prediction method of propagation behavior of microblog information and system thereof |
Non-Patent Citations (1)
Title |
---|
一种中文微博新闻话题检测的方法;郑斐然等;《计算机科学》;20120131;第39卷(第1期);138-141 * |
Also Published As
Publication number | Publication date |
---|---|
CN103514167A (en) | 2014-01-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11093854B2 (en) | | Emoji recommendation method and device thereof |
CN110297988A (en) | | Hot topic detection method based on weighting LDA and improvement Single-Pass clustering algorithm |
CN103870474A (en) | | News topic organizing method and device |
CN110147421B (en) | | Target entity linking method, device, equipment and storage medium |
Fang et al. | | Topic aspect-oriented summarization via group selection |
Garg et al. | | The structure of word co-occurrence network for microblogs |
CN104077417A (en) | | Figure tag recommendation method and system in social network |
CN105630884A (en) | | Geographic position discovery method for microblog hot event |
CN111861596B (en) | | Text classification method and device |
CN106599194A (en) | | Label determining method and device |
CN103885933A (en) | | Method and equipment for evaluating text sentiment |
CN110399483A (en) | | A kind of subject classification method, apparatus, electronic equipment and readable storage medium storing program for executing |
CN111309834A (en) | | Method and device for matching wireless hotspot with interest point |
CN103678371B (en) | | Word library updating device, data integration device and method and electronic equipment |
CN103514167B (en) | | Data processing method and equipment |
Chen et al. | | An intelligent government complaint prediction approach |
Wang et al. | | An improved clustering method for detection system of public security events based on genetic algorithm and semisupervised learning |
CN116402166A (en) | | Training method and device of prediction model, electronic equipment and storage medium |
Subramani et al. | | Text mining and real-time analytics of twitter data: A case study of australian hay fever prediction |
CN103678355B (en) | | Text mining method and text mining device |
CN116151235A (en) | | Article generating method, article generating model training method and related equipment |
CN115455957A (en) | | User touch method, device, electronic equipment and computer readable storage medium |
CN111767730B (en) | | Event type identification method and device |
CN106156116A (en) | | Information issuing method and system |
Jee et al. | | Potential of patent image data as technology intelligence source |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170301 Termination date: 20180615 |