CN106570140B - Method and device for determining an information focus - Google Patents

Info

Publication number
CN106570140B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610964928.0A
Other languages
Chinese (zh)
Other versions
CN106570140A (en)
Inventor
李德彦
晋耀红
杨凯程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Tai Yue Xiang Sheng Software Co., Ltd.
Original Assignee
China Science And Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for determining an information focus, belonging to the field of information technology. The method includes: calculating the similarity between a class and each information focus in a first list; when the similarity between the class and any information focus in the first list is greater than a first threshold, adding the class to the information list corresponding to that information focus; and when the similarity between the class and any to-be-confirmed information focus in a second list is greater than a second threshold and that to-be-confirmed information focus meets a preset condition, moving the to-be-confirmed information focus into the first list. Rather than determining an information focus from the clustering result alone, the invention treats a clustering result as an information focus only when its similarity to an information focus in the first list exceeds the first threshold, or when its similarity to a to-be-confirmed focus in the second list exceeds the second threshold and the preset condition is met. This improves the accuracy of the determined information focus.

Description

Method and device for determining an information focus
This application claims priority to Chinese patent application No. 201610354737.2, entitled "An unstructured information focus management system and method", filed with the Chinese Patent Office on May 26, 2016, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of information technology, and in particular to a method and device for determining an information focus.
Background art
In modern society, the internet is increasingly becoming the main channel for publishing information. Through the internet, users can post comments on a hot event, a popular person or a hot issue on social websites such as microblogs, forums and blogs. In the field of information technology, such a hot event, popular person or hot issue is commonly called an information focus. Because the comments on an information focus reflect current public sentiment, and are significant for social stability and national development, information focuses need to be determined promptly from massive amounts of information so that effective measures can be taken to guide public opinion correctly.
When determining an information focus, the prior art mainly uses keyword retrieval. The detailed process is: obtain pending information from the internet; extract the keywords of each pending information item; calculate the similarity between the keywords of any two pending information items, and if the similarity is greater than a preset threshold, group the two items into one class and use the keywords as the class label of that class; if the amount of information in any class is greater than a preset quantity, take the class label of that class as an information focus.
In the course of realizing the present invention, the inventors found that the prior art has at least the following problem:
The prior art relies on a single clustering result and takes any class whose information count meets a certain condition as an information focus, yet a focus determined from such a class may in fact be a false information focus. The information focus determined by the prior art is therefore inaccurate.
Summary of the invention
In order to solve the problem of the prior art, the embodiments of the present invention provide a method and device for determining an information focus. The technical solution is as follows:
In one aspect, a method for determining an information focus is provided, the method including:
clustering pending information to obtain multiple classes;
for any one class, calculating the similarity between the class and each information focus in a first list, the first list being used to store information focuses;
if the similarity between the class and any information focus in the first list is greater than a first threshold, adding the class to the information list corresponding to that information focus;
if the similarity between the class and each information focus in the first list is less than the first threshold, calculating the similarity between the class and each to-be-confirmed information focus in a second list, the second list being used to store to-be-confirmed information focuses;
if the similarity between the class and any to-be-confirmed information focus in the second list is greater than a second threshold, adding the class to the information list corresponding to that to-be-confirmed information focus, and, when the to-be-confirmed information focus meets a preset condition, moving the to-be-confirmed information focus into the first list.
In another embodiment of the present invention, each class has a class label, and after calculating the similarity between the class and each to-be-confirmed information focus in the second list, the method further includes:
if the similarity between the class and each to-be-confirmed information focus in the second list is less than the second threshold, determining the class label of the class as a target to-be-confirmed information focus;
adding the target to-be-confirmed information focus to the second list, and, when the target to-be-confirmed information focus meets the preset condition, moving the target to-be-confirmed information focus into the first list.
In another embodiment of the present invention, after adding the class to the information list corresponding to the information focus, the method further includes:
drawing a change curve of the information focus, with the amount of information of the information focus as the vertical axis and time as the horizontal axis;
obtaining a first information amount of the information focus at the current time;
obtaining a second information amount of the information focus at a specified time, the specified time being separated from the current time by a preset duration;
calculating the angle between the change curve and the horizontal axis according to the first information amount, the second information amount and the preset duration;
determining the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis;
updating the life cycle state of the information focus according to its current life cycle state.
In another embodiment of the present invention, the life cycle state includes a generating state, a development state, an outburst state, a weak state and an extinction state;
determining the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis includes:
if the angle between the change curve and the horizontal axis is less than a first preset value, determining that the current life cycle state of the information focus is the generating state;
if the angle between the change curve and the horizontal axis is greater than the first preset value and less than a second preset value, determining that the current life cycle state of the information focus is the development state;
if the angle between the change curve and the horizontal axis is greater than the second preset value and less than a third preset value, determining that the current life cycle state of the information focus is the outburst state;
if the angle between the change curve and the horizontal axis is greater than the third preset value and less than a fourth preset value, determining that the current life cycle state of the information focus is the weak state;
if the angle between the change curve and the horizontal axis is greater than the fourth preset value, determining that the current life cycle state of the information focus is the extinction state;
wherein the first preset value is less than the second preset value, the second preset value is less than the third preset value, and the third preset value is less than the fourth preset value.
In another embodiment of the present invention, after updating the life cycle state of the information focus according to its current life cycle state, the method further includes:
if the updated life cycle state of the information focus is the extinction state, moving the information focus to a third list, the third list being used to store information focuses in the extinction state.
In another aspect, a device for determining an information focus is provided, the device including:
a clustering module, configured to cluster pending information to obtain multiple classes;
a computing module, configured to calculate, for any one class, the similarity between the class and each information focus in a first list, the first list being used to store information focuses;
an adding module, configured to add the class to the information list corresponding to an information focus when the similarity between the class and any information focus in the first list is greater than a first threshold;
the computing module being further configured to calculate the similarity between the class and each to-be-confirmed information focus in a second list when the similarity between the class and each information focus in the first list is less than the first threshold, the second list being used to store to-be-confirmed information focuses;
the adding module being further configured to add the class to the information list corresponding to a to-be-confirmed information focus when the similarity between the class and any to-be-confirmed information focus in the second list is greater than a second threshold;
a moving module, configured to move the to-be-confirmed information focus into the first list when the to-be-confirmed information focus meets a preset condition.
In another embodiment of the present invention, each class has a class label, and the device further includes:
a first determining module, configured to determine the class label of the class as a target to-be-confirmed information focus when the similarity between the class and each to-be-confirmed information focus in the second list is less than the second threshold;
the adding module being further configured to add the target to-be-confirmed information focus to the second list;
the moving module being further configured to move the target to-be-confirmed information focus into the first list when the target to-be-confirmed information focus meets the preset condition.
In another embodiment of the present invention, the device further includes:
a drawing module, configured to draw a change curve of the information focus, with the amount of information of the information focus as the vertical axis and time as the horizontal axis;
an obtaining module, configured to obtain a first information amount of the information focus at the current time;
the obtaining module being further configured to obtain a second information amount of the information focus at a specified time, the specified time being separated from the current time by a preset duration;
the computing module being further configured to calculate the angle between the change curve and the horizontal axis according to the first information amount, the second information amount and the preset duration;
a second determining module, configured to determine the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis;
an updating module, configured to update the life cycle state of the information focus according to its current life cycle state.
In another embodiment of the present invention, the life cycle state includes a generating state, a development state, an outburst state, a weak state and an extinction state;
the second determining module is configured to: determine that the current life cycle state of the information focus is the generating state when the angle between the change curve and the horizontal axis is less than a first preset value; determine that it is the development state when the angle is greater than the first preset value and less than a second preset value; determine that it is the outburst state when the angle is greater than the second preset value and less than a third preset value; determine that it is the weak state when the angle is greater than the third preset value and less than a fourth preset value; and determine that it is the extinction state when the angle is greater than the fourth preset value; wherein the first preset value is less than the second preset value, the second preset value is less than the third preset value, and the third preset value is less than the fourth preset value.
In another embodiment of the present invention, the moving module is further configured to move the information focus to a third list when the updated life cycle state of the information focus is the extinction state, the third list being used to store information focuses in the extinction state.
The beneficial effects of the technical solution provided by the embodiments of the present invention are:
An information focus is not determined from the clustering result alone. A clustering result is determined to be an information focus only when its similarity to an information focus in the first list is greater than the first threshold, or when its similarity to a to-be-confirmed focus in the second list is greater than the second threshold and the preset condition is met. This improves the accuracy of the determined information focus.
Brief description of the drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the implementation environment involved in a method for determining an information focus provided by one embodiment of the present invention;
Fig. 2 is a flow chart of a method for determining an information focus provided by another embodiment of the present invention;
Fig. 3 is a flow chart of a method for determining an information focus provided by another embodiment of the present invention;
Fig. 4 is a schematic diagram of an information focus discovery process provided by another embodiment of the present invention;
Fig. 5 is a schematic diagram of an information focus life cycle management process provided by another embodiment of the present invention;
Fig. 6 is a schematic diagram of an information focus tracking process provided by another embodiment of the present invention;
Fig. 7 is a schematic diagram of an information focus life cycle provided by another embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a device for determining an information focus provided by another embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the implementation environment involved in the method for determining an information focus provided by the embodiments of the present invention. Referring to Fig. 1, the implementation environment includes: an information input module 101, an information focus life cycle discovery, tracking and management module 102, management lists 103, and an information focus output module 104.
The management lists 103 include an active hotspot list 121, a historical hotspot list 122 and a to-be-confirmed hotspot list 123. The active hotspot list 121 stores newly discovered information focuses and information focuses in the development, outburst or weak state; the information focuses in the active hotspot list need to be tracked continuously. The historical hotspot list 122 stores information focuses in the extinction state; the information focuses in the historical hotspot list no longer need to be tracked. The to-be-confirmed hotspot list 123 stores to-be-confirmed information focuses, for which it still has to be confirmed whether they are information focuses.
The information input module 101 is used to input massive amounts of information. The information it inputs may come from an information database on the internet, which stores the various comments found on the internet, including comments that users post on social websites such as microblogs, WeChat and blogs; the information may also come from the server's local database. To make the massive input easier to analyze, the information input module 101 also processes the input information and converts it into plain text. In this process, if the input information is HTML (HyperText Markup Language) information, the HTML tags are removed, the text in the HTML information is retained, and the retained text is converted into txt form. If the input information is an office document, the document is parsed with a document parsing application, the text is extracted from it, and the extracted text is converted into txt form. The information input module 101 also organizes the processed information and automatically stores the organized information, in units of days, in the server's local database for use in discovering and tracking information focuses.
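The HTML-to-plain-text conversion described above can be sketched with Python's standard `html.parser` module. This is only an illustration under stated assumptions: the names `_TextExtractor` and `html_to_plain_text` are hypothetical, and skipping the contents of `<script>` and `<style>` elements is an added assumption that the patent text does not mention.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text nodes and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_plain_text(html):
    """Remove HTML tags and return the retained text as one plain-text string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Because text nodes are joined with a space, a word split by inline markup (for example `wor<b>ld</b>`) would come out as two tokens; a production converter would need to treat inline and block elements differently.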
The information focus life cycle discovery, tracking and management module 102 is used to automatically discover, track and manage information focuses in the information input by the information input module 101. It includes a hotspot discovery submodule 111, a hotspot tracking submodule 112 and a hotspot life cycle management submodule 113. The input of module 102 is the to-be-confirmed hotspot list 123, and its output is the updated active hotspot list 121 and historical hotspot list 122.
The information focus output module 104 is used to output the active hotspot list 121, the historical hotspot list 122 and the to-be-confirmed hotspot list 123. The three lists output by the information focus output module 104 can be displayed in HTML form or embedded into other systems.
Based on the implementation environment shown in Fig. 1, an embodiment of the present invention provides a method for determining an information focus. Referring to Fig. 2, the method flow provided by this embodiment includes:
201. Cluster the pending information to obtain multiple classes.
202. For any one class, calculate the similarity between the class and each information focus in the first list, the first list being used to store information focuses.
203. If the similarity between the class and any information focus in the first list is greater than the first threshold, add the class to the information list corresponding to that information focus.
204. If the similarity between the class and each information focus in the first list is less than the first threshold, calculate the similarity between the class and each to-be-confirmed information focus in the second list, the second list being used to store to-be-confirmed information focuses.
205. If the similarity between the class and any to-be-confirmed information focus in the second list is greater than the second threshold, add the class to the information list corresponding to that to-be-confirmed information focus, and, when the to-be-confirmed information focus meets the preset condition, move it into the first list.
With the method provided by this embodiment, an information focus is not determined from the clustering result alone. A clustering result is determined to be an information focus only when its similarity to an information focus in the first list is greater than the first threshold, or when its similarity to a to-be-confirmed focus in the second list is greater than the second threshold and the preset condition is met. This improves the accuracy of the determined information focus.
In another embodiment of the present invention, each class has a class label, and after calculating the similarity between the class and each to-be-confirmed information focus in the second list, the method further includes:
if the similarity between the class and each to-be-confirmed information focus in the second list is less than the second threshold, determining the class label of the class as a target to-be-confirmed information focus;
adding the target to-be-confirmed information focus to the second list, and, when the target to-be-confirmed information focus meets the preset condition, moving the target to-be-confirmed information focus into the first list.
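Steps 202 to 205, together with the fallback that turns an unmatched class label into a new to-be-confirmed information focus, can be sketched as follows. This is a minimal illustration, not the patented implementation: `Focus`, `assign_class`, `similarity` and `meets_preset_condition` are hypothetical names, the similarity measure and the preset condition are passed in as callables because this passage does not fix them, and taking the first focus that exceeds the threshold is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so list.remove() targets this exact focus
class Focus:
    label: object                               # class label, e.g. a space vector
    infos: list = field(default_factory=list)   # information list of the focus


def assign_class(label, infos, first_list, second_list, similarity,
                 first_threshold, second_threshold, meets_preset_condition):
    """Route one class through the two lists and report what happened to it."""
    # Steps 202-203: compare the class with the stored information focuses.
    for focus in first_list:
        if similarity(label, focus.label) > first_threshold:
            focus.infos.extend(infos)
            return "matched_first"

    # Steps 204-205: compare the class with the to-be-confirmed focuses.
    for focus in second_list:
        if similarity(label, focus.label) > second_threshold:
            focus.infos.extend(infos)
            outcome = "matched_second"
            break
    else:
        # No match in either list: the class label becomes a new candidate.
        focus = Focus(label=label, infos=list(infos))
        second_list.append(focus)
        outcome = "new_candidate"

    # Move the candidate into the first list once the preset condition holds.
    if meets_preset_condition(focus):
        second_list.remove(focus)
        first_list.append(focus)
        outcome = "promoted"
    return outcome
```

A caller might, for example, pass a cosine similarity as `similarity` and a check on the number of collected items as `meets_preset_condition`; both choices are assumptions made for illustration.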
In another embodiment of the present invention, after adding the class to the information list corresponding to the information focus, the method further includes:
drawing a change curve of the information focus, with the amount of information of the information focus as the vertical axis and time as the horizontal axis;
obtaining a first information amount of the information focus at the current time;
obtaining a second information amount of the information focus at a specified time, the specified time being separated from the current time by a preset duration;
calculating the angle between the change curve and the horizontal axis according to the first information amount, the second information amount and the preset duration;
determining the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis;
updating the life cycle state of the information focus according to its current life cycle state.
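The passage above names the three inputs of the angle calculation but does not give a formula. One plausible reading, assumed here and not confirmed by the text, is that the angle is the inclination of the straight line through the two sampled points of the change curve, that is, the arctangent of the change in information amount divided by the preset duration. The function name `curve_angle_degrees` is hypothetical.

```python
import math


def curve_angle_degrees(first_amount, second_amount, preset_duration):
    """Angle between the horizontal axis and the line through the two samples.

    Assumption: slope = (first_amount - second_amount) / preset_duration,
    where first_amount is measured at the current time and second_amount
    at the specified earlier time.
    """
    if preset_duration <= 0:
        raise ValueError("preset_duration must be positive")
    return math.degrees(math.atan2(first_amount - second_amount, preset_duration))
```

Under this reading the angle depends on the units chosen for both axes (for example items per hour versus items per day), so any preset values compared against it would have to be calibrated for the same units.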
In another embodiment of the present invention, the life cycle state includes a generating state, a development state, an outburst state, a weak state and an extinction state;
determining the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis includes:
if the angle between the change curve and the horizontal axis is less than the first preset value, determining that the current life cycle state of the information focus is the generating state;
if the angle between the change curve and the horizontal axis is greater than the first preset value and less than the second preset value, determining that the current life cycle state of the information focus is the development state;
if the angle between the change curve and the horizontal axis is greater than the second preset value and less than the third preset value, determining that the current life cycle state of the information focus is the outburst state;
if the angle between the change curve and the horizontal axis is greater than the third preset value and less than the fourth preset value, determining that the current life cycle state of the information focus is the weak state;
if the angle between the change curve and the horizontal axis is greater than the fourth preset value, determining that the current life cycle state of the information focus is the extinction state;
wherein the first preset value is less than the second preset value, the second preset value is less than the third preset value, and the third preset value is less than the fourth preset value.
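The mapping from angle to life cycle state stated above amounts to locating the angle among four increasing preset values. The sketch below uses `bisect` for that lookup. The names `STATES` and `life_cycle_state` are hypothetical, and because the text does not say which state applies when the angle equals a preset value, assigning such an angle to the higher interval is an assumption.

```python
import bisect

# State names in the order given in the text, from smallest to largest angle.
STATES = ("generating", "development", "outburst", "weak", "extinction")


def life_cycle_state(angle, presets):
    """Return the life cycle state for `angle` given four increasing preset values."""
    first, second, third, fourth = presets
    if not first < second < third < fourth:
        raise ValueError("preset values must be strictly increasing")
    # bisect_right counts how many preset values are <= angle, which is the
    # index of the interval the angle falls into.
    return STATES[bisect.bisect_right(presets, angle)]
```

For example, with the purely illustrative preset values `(10, 30, 60, 80)`, an angle of 45 falls between the second and third preset values and is classified as the outburst state.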
In another embodiment of the present invention, after updating the life cycle state of the information focus according to its current life cycle state, the method further includes:
if the updated life cycle state of the information focus is the extinction state, moving the information focus to the third list, the third list being used to store information focuses in the extinction state.
All of the optional technical solutions above can be combined in any way to form alternative embodiments of the present invention, which are not described one by one here.
Based on the implementation environment shown in Fig. 1, an embodiment of the present invention provides a method for determining an information focus. Referring to Fig. 3, the method flow provided by this embodiment includes:
301. The server obtains pending information.
The number of pending information items may be 1000, 2000, 3000, and so on; this embodiment does not specifically limit it. The pending information is mainly unstructured information whose form is not fixed and which includes files of various formats, such as electronic documents, emails, video files and multimedia.
The ways in which the server obtains pending information include, but are not limited to, the following: the server may obtain from local storage multiple information items that occurred within the current time and use them as the pending information; the server may also obtain multiple information items that occurred within the current time from an information database on the internet and use them as the pending information. The current time may be measured in hours or in days; in this embodiment the unit is the day, that is, the current time is the current day.
Because the obtained information comes in various forms, and information in such varied forms is very inconvenient to analyze, the method provided by this embodiment also processes the pending information after obtaining it, converting the pending information into plain text.
302nd, server is treated processing information and clustered, and obtains multiple classes.
In areas of information technology, often have to the comment information of same event, same personage or same report identical Keyword, multiple information are gathered for one kind if based on the identical keyword, and information is managed in units of class, Information management efficiency can be improved.Therefore, in the present embodiment, server, which can use, specifies clustering algorithm to treat processing information progress Cluster, obtains multiple classes.Wherein, clustering algorithm is specified to include suffix tree clustering algorithm, fuzzy clustering algorithm etc..To specify cluster Algorithm is exemplified by suffix tree clustering algorithm, specific cluster process is:
The first step, server carry out word segmentation processing to each pending information, obtain multiple participles, filter out multiple participles In stop words, and obtain noun from the participle after filtering, verb and length be more than the participles of 2 words as each treat from The characteristic value of information is managed, for any pending information, calculates TF-IDF (Term of each characteristic value in clustering information Frequency-Inverse Document Frequency, word frequency-reverse document-frequency) as each characteristic value weight it is special Value indicative, and the space vector with the eigenvalue cluster of the pending information into the pending information.
Second step, for the pending information of any two, server calculate space corresponding to two pending information to The similarity of amount, if the similarity of space vector is more than specified threshold corresponding to two pending information, it can determine that this Two pending information are similar.
For the computational methods of the similarity of space vector corresponding to two pending information, include but is not limited to:Meter The cosine value for calculating space vector corresponding to two pending information is determined, if empty corresponding to two pending information Between vectorial cosine value be more than specified threshold, then can determine that two pending information are similar.For example, wait to locate for any two Information A and B are managed, according to being (a from space vector corresponding to pending information A1, a2, a3..., an), corresponding to pending information B Space vector is (b1, b2, b3..., bn), server calculates space vector corresponding to pending information A and pending information B Cosine valueWherein, specified threshold can be 0.3,0.5,0.6 Deng, the present embodiment so that specified threshold is 0.3 as an example, if the cosine value is more than 0.3, it can determine that pending information A and wait to locate It is similar to manage information B.
In the third step, the server calculates the similarity between each other piece of pending information and either of these two pieces. If the similarity between the other piece and either of the two is greater than the specified threshold, the server groups the other piece into one class with the two pieces.
By clustering the pending information with this clustering algorithm, multiple classes are obtained. Each class corresponds to a class label, which is the space vector corresponding to that class.
As an optional embodiment, to improve clustering speed, the number of classes is capped at 100 when the pending information is clustered. When any piece of pending information is not similar to any of the 100 classes currently obtained, that piece can be removed and not processed further.
As an optional embodiment, to improve calculation speed, the method provided in this embodiment can sort the obtained classes by the amount of information each class contains and, according to the sorting result, select a specified number of classes, so that only the selected classes are processed in subsequent steps. The specified number is determined by the processing capability of the server and can be 30, 40, 50, etc.; this embodiment takes 30 as an example.
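The vectorization and similarity test in the first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: each piece of pending information is assumed to be already segmented and filtered into a list of feature words, the IDF variant log(N / df) is one common choice among several, and the names `tfidf_vectors`, `cosine`, and `is_similar` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one TF-IDF vector per document.

    docs: list of token lists, assumed already segmented and filtered
    (stop words removed, feature words kept).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each feature word occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vocab = sorted(df)  # fixed dimension order shared by all vectors
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        # Assumed weighting: tf * log(N / df); other IDF variants exist.
        vectors.append([(tf[w] / total) * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def is_similar(a, b, threshold=0.3):
    """Two items are similar when the cosine exceeds the threshold."""
    return cosine(a, b) > threshold


docs = [["flood", "rescue", "rescue"], ["flood", "election"], ["rescue", "team"]]
vocab, vectors = tfidf_vectors(docs)
```

The default threshold of 0.3 follows the example value in the text; a word that occurs in every document receives weight zero under this IDF variant.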
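The grouping step and the two optional speed-ups above (a cap on the number of classes and selection of the largest classes) can be sketched as follows. This is a simplified greedy pass written for illustration only; the function names and the choice to join the first matching class are assumptions, and the text does not specify the order in which items are compared.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(vectors, threshold=0.3, max_clusters=100):
    """Group vectors into classes; each class is a list of indices."""
    clusters = []
    for i, v in enumerate(vectors):
        for members in clusters:
            # Join the first class that has any member similar to this item.
            if any(cosine(v, vectors[j]) > threshold for j in members):
                members.append(i)
                break
        else:
            if len(clusters) < max_clusters:
                clusters.append([i])
            # Otherwise the item is dropped, as in the optional embodiment.
    return clusters


def top_clusters(clusters, n=30):
    """Keep the n classes that contain the most information."""
    return sorted(clusters, key=len, reverse=True)[:n]


vectors = [[1, 0], [0.9, 0.1], [0, 1], [1, 0.1]]
clusters = greedy_cluster(vectors)
```

With the sample vectors, the first, second, and fourth items fall into one class and the third item forms a class of its own.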
303. For any one class, the server calculates the similarity between the class and each information focus in the first list.
In this embodiment, the server maintains three management lists: a first list, a second list, and a third list. The first list is the active hotspot list, which stores the confirmed information focuses and records the attribute information of each one, including the information focus identifier, the information focus name, the amount of information it contains, and so on. The second list is the to-be-confirmed hotspot list, which stores information focuses to be confirmed. The third list is the historical hotspot list, which stores information focuses in the extinction state.
In this embodiment, the information focuses in the first list can be stored in XML (Extensible Markup Language) format. The specific format is as follows:
<hotlist>
<hot>
<id>1</id>
<title>Title</title>
<status>Occur</status>
<infonum>
<date>2012-01-01</date><num>21</num>
<date>2012-01-02</date><num>51</num>
……
</infonum>
<docid>3,5,6……</docid>
</hot>
</hotlist>
Here, id is the number of the information focus and can be of numeric type; title is the title of the information focus and can be of string type; status is the current state of the information focus; infonum is the daily information amount of the information focus, where date is the date and num is the amount of information on that date; docid lists the document numbers, or the database record numbers, of the information included in the information focus and can be of string type. When the information focuses in the first list are stored in XML, the numbers in docid are separated by commas (",").
For any class obtained by the clustering in step 302 above, to determine whether the class is an information focus, the server calculates the similarity between the class and each information focus in the first list. This similarity can be determined by calculating the cosine of the class label of the class and the space vector corresponding to the information focus.
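As an illustration of how a record in this format might be read, the following sketch parses a sample with Python's standard `xml.etree.ElementTree` module. The sample omits the elided entries of the original listing, and the function name `parse_hotlist` and the dictionary layout are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<hotlist>
  <hot>
    <id>1</id>
    <title>Title</title>
    <status>Occur</status>
    <infonum>
      <date>2012-01-01</date><num>21</num>
      <date>2012-01-02</date><num>51</num>
    </infonum>
    <docid>3,5,6</docid>
  </hot>
</hotlist>"""


def parse_hotlist(xml_text):
    """Return one dictionary per <hot> element."""
    root = ET.fromstring(xml_text)
    hots = []
    for hot in root.findall("hot"):
        infonum = hot.find("infonum")
        # <date> and <num> alternate, so pair them up in document order.
        dates = [d.text for d in infonum.findall("date")]
        nums = [int(n.text) for n in infonum.findall("num")]
        hots.append({
            "id": int(hot.findtext("id")),
            "title": hot.findtext("title"),
            "status": hot.findtext("status"),
            "daily": dict(zip(dates, nums)),
            # docid is a comma-separated string of document numbers.
            "docids": [s for s in hot.findtext("docid").split(",") if s],
        })
    return hots


hots = parse_hotlist(SAMPLE)
```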
304. If the similarity between the class and any information focus in the first list is greater than a first threshold, the server adds the class to the information list corresponding to that information focus and performs step 308.
The first threshold can be 0.3, 0.4, 0.6, etc.; this embodiment takes 0.3 as an example. In this embodiment, each information focus corresponds to an information list, which records every piece of information that the information focus includes.
When the similarity between the class and any information focus in the first list is greater than the first threshold, the class is similar to that information focus. The server then determines that the class is an information focus and adds the class to the information list corresponding to that information focus.
305. If the similarity between the class and every information focus in the first list is less than the first threshold, the server calculates the similarity between the class and each information focus to be confirmed in the second list.
When the similarity between the class and every information focus in the first list is less than the first threshold, the class is not similar to any of them. To further determine whether the class is an information focus, the server also calculates the similarity between the class and each information focus to be confirmed in the second list. Specifically, this can be determined by calculating the cosine of the class label of the class and the space vector corresponding to each information focus to be confirmed.
306. If the similarity between the class and any information focus to be confirmed in the second list is greater than a second threshold, the server adds the class to the information list corresponding to that information focus to be confirmed and, when the information focus to be confirmed meets a preset condition, moves it to the first list.
In this embodiment, each information focus to be confirmed corresponds to an information list, which records every piece of information that the information focus to be confirmed includes. When the similarity between the class and any information focus to be confirmed in the second list is greater than the second threshold, the server determines that the class belongs to that information focus to be confirmed and adds the class to its information list. When the information focus to be confirmed is determined to meet the preset condition, it is determined to be an information focus and is moved to the first list.
The preset condition includes the duration exceeding a preset duration, and the like; that is, within the preset duration, the angle between the change curve of the information focus to be confirmed and the horizontal axis exceeds a preset value, which can be 30 degrees, 40 degrees, etc. The preset duration can be 2 days, 3 days, 4 days, etc.; this embodiment does not specifically limit its length.
307. If the similarity between the class and every information focus to be confirmed in the second list is less than the second threshold, the server determines the class label of the class as a target information focus to be confirmed, adds the target information focus to be confirmed to the second list, and, when it meets the preset condition, moves it to the first list.
When the similarity between the class and every information focus to be confirmed in the second list is less than the second threshold, the class is not similar to any of them. The class can then be determined to be a target information focus to be confirmed, that is, a new information focus to be confirmed. When the target information focus to be confirmed is determined to meet the preset condition, the server determines that it is an information focus and moves it from the second list to the first list.
Steps 301 to 307 above constitute the discovery process for information focuses. For ease of understanding, the process is illustrated below using Fig. 4 as an example.
Referring to Fig. 4, the server obtains the pending information and clusters it using a suffix tree clustering algorithm to obtain multiple classes. For the first n classes, the server calculates the similarity between each class and each information focus in the active hotspot list. If the similarity between the class and any information focus in the active hotspot list is greater than the threshold, the class is added to the information list corresponding to that information focus in the active hotspot list. If the similarity between the class and every information focus in the active hotspot list is less than the threshold, the server calculates the similarity between the class and each hotspot to be confirmed in the to-be-confirmed hotspot list. If that similarity is less than the threshold for every hotspot to be confirmed, a new hotspot to be confirmed is generated and added to the to-be-confirmed hotspot list. If the similarity between the class and any hotspot to be confirmed is greater than the threshold, the class is added to the information list corresponding to that hotspot to be confirmed, and the server judges whether the duration of that information focus to be confirmed exceeds 2 days; if it does, the information focus to be confirmed is moved to the active hotspot list.
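The discovery flow of steps 303 to 307 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the dictionary layout, the function names, and the `meets_condition` callback (which stands in for the preset condition) are hypothetical, and the default second threshold of 0.3 is an assumption because the text gives example values only for the first threshold.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(label, hotspots):
    """Return (hotspot, similarity) for the most similar entry, or (None, 0.0)."""
    best, best_sim = None, 0.0
    for h in hotspots:
        sim = cosine(label, h["label"])
        if best is None or sim > best_sim:
            best, best_sim = h, sim
    return best, best_sim


def assign_cluster(label, items, active, pending, meets_condition, t1=0.3, t2=0.3):
    """Route one class to the active list or the to-be-confirmed list."""
    hot, sim = best_match(label, active)
    if hot is not None and sim > t1:
        hot["items"].extend(items)  # step 304: similar to an active hotspot
        return "active"
    cand, sim = best_match(label, pending)
    if cand is not None and sim > t2:
        cand["items"].extend(items)  # step 306: similar to a pending hotspot
    else:
        cand = {"label": label, "items": list(items)}  # step 307: new pending hotspot
        pending.append(cand)
    if meets_condition(cand):
        pending.remove(cand)  # promote from the second list to the first list
        active.append(cand)
        return "promoted"
    return "pending"


active = [{"label": [1, 0], "items": ["a"]}]
pending = []
r1 = assign_cluster([1, 0.1], ["b"], active, pending, lambda h: False)
r2 = assign_cluster([0, 1], ["c"], active, pending, lambda h: False)
r3 = assign_cluster([0.1, 1], ["d"], active, pending, lambda h: len(h["items"]) >= 2)
```

Checking only the best match is equivalent to asking whether any entry exceeds the threshold, which is how the steps are worded.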
308. The server manages the life cycle of the information focuses in the first list.
The hotspot life cycle describes the development trend and current state of an information focus. The life cycle states include the discovery state (also referred to as the generating state), the development state, the outburst state, the weak state, and the extinction state. When an information focus is in the discovery state, it has just appeared and its information amount grows very slowly; in the development state, it continues to appear and its information amount grows relatively slowly; in the outburst state, its information amount grows very quickly; in the weak state, its information amount shows negative growth; in the extinction state, its information amount shows sustained negative growth.
In this embodiment, to better track current public opinion, the server also manages the life cycle of the information focuses in the first list. The management process comprises the following steps 3081 to 3086:
3081. The server draws the change curve of the information focus with the information amount of the information focus as the vertical axis and time as the horizontal axis.
Referring to Fig. 5, Fig. 5 shows the change curve of an information focus, where the horizontal axis represents time and the vertical axis represents the information amount of the information focus.
3082. The server obtains a first information amount of the information focus at the current time.
The server calculates the similarity between any class obtained by the clustering in step 302 and each information focus in the first list, adds the class to the information list of the information focus similar to it, and counts the information newly added to that information focus at the current time to obtain the first information amount of the information focus at the current time. The server then updates the attribute information of the information focus in the first list according to the first information amount.
Referring to Fig. 6, the server obtains multiple classes using the suffix tree clustering algorithm. For any one class, the server calculates the similarity between the class and each information focus in the active hotspot list. If the similarity between the class and any information focus is greater than the threshold, the class is added to the information list corresponding to that information focus, the information newly added that day is counted, and the active hotspot list is updated according to that day's newly added information amount.
3083. The server obtains a second information amount of the information focus at a specified time.
The specified time is separated from the current time by a preset duration, which can be 2 days, 3 days, 4 days, etc.; this embodiment takes 3 days as an example. After obtaining the first information amount at the current time, the server takes the current time as the starting point, goes back 3 days to select the specified time, and counts the second information amount of the information focus at the specified time.
3084. The server calculates the growth acceleration of the information focus over the preset duration according to the first information amount, the second information amount, and the preset duration.
Based on the first information amount, the second information amount, and the preset duration, the server can calculate the slope of the change curve of the information focus and then determine the angle between the change curve and the horizontal axis. Let the first information amount be $y_2$, the second information amount be $y_1$, the current time be $x_2$, and the specified time be $x_1$. The slope of the change curve of the information focus is then $k = \frac{y_2 - y_1}{x_2 - x_1}$, and the angle between the change curve and the horizontal axis obtained from this slope is $\theta = \arctan k$.
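The slope and angle computation can be sketched as follows. Several points are assumptions of this sketch rather than statements of the embodiment: time is measured in days, a day without a recorded count is treated as zero, the function name `trend_angle` is hypothetical, and a falling curve is mapped to an obtuse angle in the range 90 to 180 degrees so that the result can be compared with preset values greater than 90 degrees.

```python
import math
from datetime import date, timedelta


def trend_angle(daily_counts, today, interval_days=3):
    """Return (slope, angle in degrees) of the change curve.

    daily_counts: mapping from date to the information amount on that date.
    """
    earlier = today - timedelta(days=interval_days)  # the specified time x1
    y2 = daily_counts.get(today, 0)    # first information amount
    y1 = daily_counts.get(earlier, 0)  # second information amount
    slope = (y2 - y1) / interval_days  # x2 - x1 equals the preset duration
    angle = math.degrees(math.atan(slope))
    if angle < 0:
        # Assumption: report falling curves as obtuse angles (90 to 180 degrees).
        angle += 180.0
    return slope, angle


rising = {date(2012, 1, 1): 21, date(2012, 1, 4): 24}
falling = {date(2012, 1, 1): 24, date(2012, 1, 4): 21}
slope_up, angle_up = trend_angle(rising, date(2012, 1, 4))
slope_down, angle_down = trend_angle(falling, date(2012, 1, 4))
```

Because the angle depends on the units chosen for both axes, the preset values used to interpret it only make sense for a fixed choice of units.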
3085. The server determines the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis.
The server determines the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis in the following cases:
Case 1: if the angle between the change curve and the horizontal axis is less than a first preset value, the server determines that the current life cycle state of the information focus is the generating state.
The first preset value is less than 90 degrees and can be 30 degrees, 40 degrees, etc. When the angle between the change curve and the horizontal axis, determined according to the first information amount, the second information amount, and the preset duration, is less than the first preset value, the server determines that the current life cycle state of the information focus is the generating state.
Case 2: if the angle between the change curve and the horizontal axis is greater than the first preset value and less than a second preset value, the server determines that the current life cycle state of the information focus is the development state.
The second preset value is greater than the first preset value and less than 90 degrees and can be 50 degrees, 60 degrees, etc. When the angle between the change curve and the horizontal axis, determined according to the first information amount, the second information amount, and the preset duration, is greater than the first preset value and less than the second preset value, the server determines that the current life cycle state of the information focus is the development state.
Case 3: if the angle between the change curve and the horizontal axis is greater than the second preset value and less than a third preset value, the server determines that the current life cycle state of the information focus is the outburst state.
The third preset value is greater than 90 degrees and can be 120 degrees, 130 degrees, etc. When the angle between the change curve and the horizontal axis, determined according to the first information amount, the second information amount, and the preset duration, is greater than the second preset value and less than the third preset value, the server determines that the current life cycle state of the information focus is the outburst state.
Case 4: if the growth acceleration, that is, the angle between the change curve and the horizontal axis, is greater than the third preset value and less than a fourth preset value, the server determines that the current life cycle state of the information focus is the weak state.
The fourth preset value is greater than the third preset value and greater than 90 degrees and can be 140 degrees, 150 degrees, etc. When the angle between the change curve and the horizontal axis, determined according to the first information amount, the second information amount, and the preset duration, is greater than the third preset value and less than the fourth preset value, the server determines that the current life cycle state of the information focus is the weak state.
Case 5: if the angle between the change curve and the horizontal axis is greater than the fourth preset value, the server determines that the current life cycle state of the information focus is the extinction state.
When the angle between the change curve and the horizontal axis, determined according to the first information amount, the second information amount, and the preset duration, is greater than the fourth preset value, the server determines that the current life cycle state of the information focus is the extinction state.
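The five cases above amount to a threshold lookup on the angle, which can be sketched as follows. The default preset values are taken from the example values in the text; the function name and the state strings are hypothetical, and because the text uses strict inequalities without saying what happens when the angle equals a preset value, this sketch assigns such boundary angles to the later state as an assumption.

```python
def lifecycle_state(angle, p1=30.0, p2=60.0, p3=120.0, p4=150.0):
    """Map the angle (in degrees) to a life cycle state.

    p1 < p2 < 90 < p3 < p4 are the first to fourth preset values.
    """
    if angle < p1:
        return "generating"
    if angle < p2:
        return "development"
    if angle < p3:
        return "outburst"
    if angle < p4:
        return "weak"
    return "extinction"


states = [lifecycle_state(a) for a in (10, 45, 90, 135, 170)]
```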
3086. The server updates the life cycle state of the information focus according to the current life cycle state of the information focus.
Since the first list in this embodiment records the life cycle state of each information focus, once the server has determined the current life cycle state of the information focus, it can update the life cycle state of that information focus in the first list accordingly.
When the life cycle state of the information focus in the first list is updated, if the updated life cycle state of the information focus is the extinction state, the server moves the information focus to the third list, which stores information focuses in the extinction state.
It should be noted that the above description takes the life cycle management of one information focus in the first list as an example. The life cycles of the other information focuses in the first list can be managed in the same way, which is not repeated here.
For the life cycle management process described above, see Fig. 7.
Referring to Fig. 7, for each information focus in the active hotspot list, the server counts the information amount of the information focus on a daily basis. Taking the current time as the starting point, the server obtains the information amount at the specified time, which is 3 days before the current time. From the information amount of the information focus at the current time, its information amount at the specified time, and the 3-day interval, the server calculates the slope of the change curve of the information focus and, from the slope, determines the angle between the change curve and the horizontal axis. This angle is also referred to as the growth acceleration of the information focus. The server then determines the current life cycle state of the information focus from its growth acceleration. If the current life cycle state is one of the generating state, the development state, the outburst state, and the weak state, the server updates the life cycle state of the information focus in the active hotspot list; if the current life cycle state is the extinction state, the server moves the information focus to the historical hotspot list.
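The update in step 3086 and the move to the historical list can be sketched as follows. The function name `refresh_lifecycle`, the dictionary layout, and the `state_of` callback (which stands in for the angle-based determination of step 3085) are assumptions made for illustration.

```python
def refresh_lifecycle(active, history, state_of):
    """Update each active hotspot's state; move extinct ones to the history list."""
    still_active = []
    for hot in active:
        hot["status"] = state_of(hot)  # step 3086: record the current state
        if hot["status"] == "extinction":
            history.append(hot)  # extinct hotspots go to the third list
        else:
            still_active.append(hot)
    active[:] = still_active  # update the first list in place


active = [{"id": 1, "trend": "up"}, {"id": 2, "trend": "dead"}]
history = []
refresh_lifecycle(
    active,
    history,
    lambda h: "extinction" if h["trend"] == "dead" else "development",
)
```

Rebuilding the active list and assigning it back in place avoids removing elements from a list while iterating over it.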
The method provided in the embodiments of the present invention does not determine an information focus from the clustering result alone. Instead, a clustering result is determined to be an information focus when its similarity to an information focus in the first list is greater than the first threshold, or when its similarity to a hotspot to be confirmed in the second list is greater than the second threshold and the preset condition is met, which improves the accuracy of the determined information focuses.
Referring to Fig. 8, an embodiment of the present invention provides a device for determining an information focus. The device includes:
a clustering module 801, configured to cluster the pending information to obtain multiple classes;
a computing module 802, configured to calculate, for any one class, the similarity between the class and each information focus in a first list, the first list being used to store information focuses;
an adding module 803, configured to add the class to the information list corresponding to the information focus when the similarity between the class and any information focus in the first list is greater than a first threshold;
the computing module 802, further configured to calculate the similarity between the class and each information focus to be confirmed in a second list when the similarity between the class and every information focus in the first list is less than the first threshold, the second list being used to store information focuses to be confirmed;
the adding module 803, configured to add the class to the information list corresponding to the information focus to be confirmed when the similarity between the class and any information focus to be confirmed in the second list is greater than a second threshold;
a moving module 804, configured to move the information focus to be confirmed to the first list when the information focus to be confirmed meets a preset condition.
In another embodiment of the present invention, each class has a class label, and the device further includes:
a first determining module, configured to determine the class label of the class as a target information focus to be confirmed when the similarity between the class and every information focus to be confirmed in the second list is less than the second threshold;
the adding module 803, configured to add the target information focus to be confirmed to the second list;
the moving module 804, configured to move the target information focus to be confirmed to the first list when the target information focus to be confirmed meets the preset condition.
In another embodiment of the present invention, the device further includes:
a drawing module, configured to draw the change curve of the information focus with the information amount of the information focus as the vertical axis and time as the horizontal axis;
an obtaining module, configured to obtain a first information amount of the information focus at the current time;
the obtaining module, further configured to obtain a second information amount of the information focus at a specified time, the specified time being separated from the current time by a preset duration;
a computing module, configured to calculate the angle between the change curve and the horizontal axis according to the first information amount, the second information amount, and the preset duration;
a second determining module, configured to determine the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis;
an updating module, configured to update the life cycle state of the information focus according to the current life cycle state of the information focus.
In another embodiment of the present invention, the life cycle states include the generating state, the development state, the outburst state, the weak state, and the extinction state;
the second determining module is configured to: determine that the current life cycle state of the information focus is the generating state when the angle between the change curve and the horizontal axis is less than a first preset value; determine that the current life cycle state is the development state when the angle is greater than the first preset value and less than a second preset value; determine that the current life cycle state is the outburst state when the angle is greater than the second preset value and less than a third preset value; determine that the current life cycle state is the weak state when the angle is greater than the third preset value and less than a fourth preset value; and determine that the current life cycle state is the extinction state when the angle is greater than the fourth preset value; where the first preset value is less than the second preset value, the second preset value is less than the third preset value, and the third preset value is less than the fourth preset value.
In another embodiment of the present invention, the moving module 804 is further configured to move the information focus to a third list when the updated life cycle state of the information focus is the extinction state, the third list being used to store information focuses in the extinction state.
The device provided in the embodiments of the present invention does not determine an information focus from the clustering result alone. Instead, a clustering result is determined to be an information focus when its similarity to an information focus in the first list is greater than the first threshold, or when its similarity to a hotspot to be confirmed in the second list is greater than the second threshold and the preset condition is met, which improves the accuracy of the determined information focuses.
It should be noted that when the device for determining an information focus provided in the above embodiment determines an information focus, the division into the functional modules above is used only as an example. In practical applications, the functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the device for determining an information focus provided in the above embodiment and the method embodiment for determining an information focus belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be completed by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which can be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

  1. A method for determining an information focus, characterized in that the method comprises:
    clustering pending information to obtain multiple classes;
    for any one class, calculating the similarity between the class and each information focus in a first list, the first list being used to store information focuses;
    if the similarity between the class and any information focus in the first list is greater than a first threshold, adding the class to the information list corresponding to the information focus;
    if the similarity between the class and every information focus in the first list is less than the first threshold, calculating the similarity between the class and each information focus to be confirmed in a second list, the second list being used to store information focuses to be confirmed;
    if the similarity between the class and any information focus to be confirmed in the second list is greater than a second threshold, adding the class to the information list corresponding to the information focus to be confirmed, and, when the information focus to be confirmed meets a preset condition, moving the information focus to be confirmed to the first list.
  2. The method according to claim 1, characterized in that each class has a class label, and after the calculating of the similarity between the class and each information focus to be confirmed in the second list, the method further comprises:
    if the similarity between the class and every information focus to be confirmed in the second list is less than the second threshold, determining the class label of the class as a target information focus to be confirmed;
    adding the target information focus to be confirmed to the second list, and, when the target information focus to be confirmed meets the preset condition, moving the target information focus to be confirmed to the first list.
  3. The method according to claim 1, characterized in that after the adding of the class to the information list corresponding to the information focus, the method further comprises:
    drawing the change curve of the information focus with the information amount of the information focus as the vertical axis and time as the horizontal axis;
    obtaining a first information amount of the information focus at the current time;
    obtaining a second information amount of the information focus at a specified time, the specified time being separated from the current time by a preset duration;
    calculating the angle between the change curve and the horizontal axis according to the first information amount, the second information amount, and the preset duration;
    determining the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis;
    updating the life cycle state of the information focus according to the current life cycle state of the information focus.
  4. The method according to claim 3, characterized in that the life cycle states include the generating state, the development state, the outburst state, the weak state, and the extinction state;
    the determining of the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis comprises:
    if the angle between the change curve and the horizontal axis is less than a first preset value, determining that the current life cycle state of the information focus is the generating state;
    if the angle between the change curve and the horizontal axis is greater than the first preset value and less than a second preset value, determining that the current life cycle state of the information focus is the development state;
    if the angle between the change curve and the horizontal axis is greater than the second preset value and less than a third preset value, determining that the current life cycle state of the information focus is the outburst state;
    if the angle between the change curve and the horizontal axis is greater than the third preset value and less than a fourth preset value, determining that the current life cycle state of the information focus is the weak state;
    if the angle between the change curve and the horizontal axis is greater than the fourth preset value, determining that the current life cycle state of the information focus is the extinction state;
    wherein the first preset value is less than the second preset value, the second preset value is less than the third preset value, and the third preset value is less than the fourth preset value.
  5. The method according to claim 3, characterized in that, after updating the life cycle state of the information focus according to the current life cycle state of the information focus, the method further comprises:
    If the updated life cycle state of the information focus is the extinction state, moving the information focus to a third list, the third list being used to store information focuses in the extinction state.
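The move described in claim 5 can be sketched as follows. This is only an illustration under stated assumptions: the lists are modeled as Python dictionaries mapping a focus to its information list, the focus is assumed to currently sit in the first list, and the function name `archive_if_extinct` and the state string are hypothetical.

```python
def archive_if_extinct(focus, updated_state, first_list, third_list):
    """Move a focus and its information list to the third list once it is extinct.

    Returns True if the focus was moved, False otherwise.
    """
    if updated_state == "extinction" and focus in first_list:
        # "Moving" is read here as removing from the first list and storing in the third.
        third_list[focus] = first_list.pop(focus)
        return True
    return False
```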
  6. A device for determining an information focus, characterized in that the device comprises:
    A clustering module, configured to cluster information to be processed to obtain a plurality of classes;
    A computing module, configured to calculate, for any one class, the similarity between the class and each information focus in a first list, the first list being used to store information focuses;
    An adding module, configured to add the class to the information list corresponding to an information focus when the similarity between the class and that information focus in the first list is greater than a first threshold;
    The computing module being further configured to calculate the similarity between the class and each to-be-confirmed information focus in a second list when the similarity between the class and every information focus in the first list is less than the first threshold, the second list being used to store to-be-confirmed information focuses;
    The adding module being further configured to add the class to the information list corresponding to a to-be-confirmed information focus when the similarity between the class and that to-be-confirmed information focus in the second list is greater than a second threshold;
    A moving module, configured to move the to-be-confirmed information focus to the first list when the to-be-confirmed information focus meets a preset condition.
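The cooperation of the computing, adding, and moving modules in claim 6 can be sketched in Python as below. Everything here is an assumption for illustration: the claim does not specify the similarity measure (Jaccard similarity over keyword sets is swapped in), it does not say which focus wins when several exceed the threshold (the best match is used), and the names `assign_class`, `best_match`, and `meets_preset_condition` are hypothetical. Both lists are modeled as dictionaries mapping a focus to its information list.

```python
def jaccard(a, b):
    """Illustrative similarity between two keyword sets (not specified by the claim)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(cls, foci, similarity):
    """Return (focus, score) for the most similar focus, or (None, 0.0) if there is none."""
    best, best_score = None, 0.0
    for focus in foci:
        score = similarity(cls, focus)
        if best is None or score > best_score:
            best, best_score = focus, score
    return best, best_score


def assign_class(cls, first_list, second_list, similarity,
                 first_threshold, second_threshold, meets_preset_condition):
    """Place one class and report where it went."""
    # Step 1: compare against the confirmed information focuses (first list).
    focus, score = best_match(cls, first_list, similarity)
    if focus is not None and score > first_threshold:
        first_list[focus].append(cls)
        return "confirmed"
    # Step 2: otherwise compare against the to-be-confirmed focuses (second list).
    candidate, score = best_match(cls, second_list, similarity)
    if candidate is not None and score > second_threshold:
        second_list[candidate].append(cls)
        # Step 3: move the candidate to the first list once it meets the preset condition.
        if meets_preset_condition(second_list[candidate]):
            first_list[candidate] = second_list.pop(candidate)
            return "promoted"
        return "candidate"
    return "unmatched"
```

A class that ends up `"unmatched"` is the case handled by claim 7, where its class label becomes a new to-be-confirmed information focus.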
  7. The device according to claim 6, characterized in that each class has a class label, and the device further comprises:
    A first determining module, configured to determine the class label of the class as a target to-be-confirmed information focus when the similarity between the class and every to-be-confirmed information focus in the second list is less than the second threshold;
    The adding module being further configured to add the target to-be-confirmed information focus to the second list;
    The moving module being further configured to move the target to-be-confirmed information focus to the first list when the target to-be-confirmed information focus meets the preset condition.
  8. The device according to claim 6, characterized in that the device further comprises:
    A drawing module, configured to draw a change curve of the information focus with the information amount of the information focus as the vertical axis and time as the horizontal axis;
    An acquisition module, configured to obtain a first information amount of the information focus at the current time;
    The acquisition module being further configured to obtain a second information amount of the information focus at a specified time, the specified time being separated from the current time by a preset duration;
    The computing module being further configured to calculate the angle between the change curve and the horizontal axis according to the first information amount, the second information amount, and the preset duration;
    A second determining module, configured to determine the current life cycle state of the information focus according to the angle between the change curve and the horizontal axis;
    An updating module, configured to update the life cycle state of the information focus according to the current life cycle state of the information focus.
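Claim 8 names the three inputs of the angle calculation but does not state the formula. One plausible reading, sketched here purely as an assumption, takes the angle as the arctangent of the slope between the two sampled points on the change curve. The function name `curve_angle_degrees`, the use of degrees, and the sign convention (a falling curve gives a negative angle) are all hypothetical.

```python
import math


def curve_angle_degrees(first_amount, second_amount, preset_duration):
    """Angle between the change curve and the time axis, in degrees.

    first_amount:    information amount at the current time
    second_amount:   information amount at the specified (earlier) time
    preset_duration: time between the two samples, in the unit of the time axis
    """
    if preset_duration <= 0:
        raise ValueError("preset_duration must be positive")
    # Assumed formula: slope of the segment joining the two samples.
    slope = (first_amount - second_amount) / preset_duration
    return math.degrees(math.atan(slope))
```

Because the slope depends on the units chosen for the information amount and for time, any preset values compared against this angle would have to be tuned for the same units.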
  9. The device according to claim 8, characterized in that the life cycle state includes a generating state, a development state, an outburst state, a weak state, and an extinction state;
    The second determining module is configured to: when the angle between the change curve and the horizontal axis is less than a first preset value, determine that the current life cycle state of the information focus is the generating state; when the angle is greater than the first preset value and less than a second preset value, determine that the current life cycle state of the information focus is the development state; when the angle is greater than the second preset value and less than a third preset value, determine that the current life cycle state of the information focus is the outburst state; when the angle is greater than the third preset value and less than a fourth preset value, determine that the current life cycle state of the information focus is the weak state; and when the angle is greater than the fourth preset value, determine that the current life cycle state of the information focus is the extinction state; wherein the first preset value is less than the second preset value, the second preset value is less than the third preset value, and the third preset value is less than the fourth preset value.
  10. The device according to claim 8, characterized in that the moving module is further configured to move the information focus to a third list when the updated life cycle state of the information focus is the extinction state, the third list being used to store information focuses in the extinction state.
CN201610964928.0A 2016-05-26 2016-11-04 Determine the method and device of information focus Active CN106570140B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2016103547372 2016-05-26
CN201610354737 2016-05-26

Publications (2)

Publication Number Publication Date
CN106570140A CN106570140A (en) 2017-04-19
CN106570140B true CN106570140B (en) 2018-03-02

Family

ID=58536136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610964928.0A Active CN106570140B (en) 2016-05-26 2016-11-04 Determine the method and device of information focus

Country Status (1)

Country Link
CN (1) CN106570140B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147482B (en) * 2017-09-11 2021-06-22 上海优扬新媒信息技术有限公司 Method and device for acquiring burst hotspot theme
CN108647222B (en) * 2018-03-22 2021-01-08 中国互联网络信息中心 Line three-dimensional roaming hotspot icon positioning method and system
CN109257700B (en) * 2018-11-19 2020-11-06 广东小天才科技有限公司 Positioning method, server and system based on positioning deviation rectification
CN112966505B (en) * 2021-01-21 2021-10-15 哈尔滨工业大学 Method, device and storage medium for extracting persistent hot phrases from text corpus
CN114153915A (en) * 2021-09-10 2022-03-08 北京天德科技有限公司 Method and system for tracing and tracing information in block chain
CN113836307B (en) * 2021-10-15 2024-02-20 国网北京市电力公司 Power supply service work order hot spot discovery method, system, device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477556A (en) * 2009-01-22 2009-07-08 苏州智讯科技有限公司 Method for discovering hot sport in internet mass information
CN101661513A (en) * 2009-10-21 2010-03-03 上海交通大学 Detection method of network focus and public sentiment
CN102982110A (en) * 2012-11-08 2013-03-20 中国科学院自动化研究所 Method for extracting hot spot event information of cyberspace in physical space

Also Published As

Publication number Publication date
CN106570140A (en) 2017-04-19

Similar Documents

Publication Publication Date Title
CN106570140B (en) Determine the method and device of information focus
CN103365924B (en) A kind of method of internet information search, device and terminal
JP5802745B2 (en) Intelligent navigation method, apparatus and system
CN105488092B (en) A kind of time-sensitive and adaptive sub-topic online test method and system
CN104008106B (en) A kind of method and device obtaining much-talked-about topic
US20080147642A1 (en) System for discovering data artifacts in an on-line data object
US20150154306A1 (en) Method for searching related entities through entity co-occurrence
CN101957816A (en) Webpage metadata automatic extraction method and system based on multi-page comparison
CN103279543B (en) Path mode inquiring system for massive image data
CN107256263A (en) Internet hot spots information automatic monitoring method
US10417334B2 (en) Systems and methods for providing a microdocument framework for storage, retrieval, and aggregation
TW201415254A (en) Method and system for recommending semantic annotations
CN108875065A (en) A kind of Indonesia's news web page recommended method based on content
Nakashole et al. Real-time population of knowledge bases: opportunities and challenges
Moro et al. Early Profile Pruning on XML-aware Publish/Subscribe Systems.
Zhang et al. Continuous top-k monitoring on document streams
Khodaei et al. Temporal-textual retrieval: Time and keyword search in web documents
US20220374482A1 (en) Dynamic taxonomy builder and smart feed compiler
CN106250456A (en) Bid winning announcement extraction method and device
Zhang Start small, build complete: Effective and efficient semantic table interpretation using tableminer
Sharma et al. Shallow neural network and ontology-based novel semantic document indexing for information retrieval
Jin et al. Tise: A temporal search engine for web contents
Kumar et al. Term-frequency inverse-document frequency definition semantic (TIDS) based focused web crawler
US20130124509A1 (en) Publish-subscribe based methods and apparatuses for associating data files
US20150154195A1 (en) Method for entity-driven alerts based on disambiguated features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180705

Address after: 230088 room 609-6, R & D center of China (Hefei) International Intelligent Speech Industrial Park, 3333, hi tech Road, Hefei, Anhui.

Patentee after: Anhui Tai Yue Xiang Sheng Software Co., Ltd.

Address before: 100089 Beijing Haidian District Haidian District 25 East Road three units 6 units

Patentee before: China Science and Technology (Beijing) Co., Ltd.
