CN105912670A - Method and device for network hotspot excavation - Google Patents
- Publication number
- CN105912670A (application number CN201610225018.0A)
- Authority
- CN
- China
- Prior art keywords
- network data
- text
- network
- phrase
- categories
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a method and device for network hotspot mining. The device comprises four modules.

- A classification and storage module collects network data, classifies it, and stores it by category.
- A filtering and extraction module filters the network data in each category according to preset filtering rules. It then extracts keywords from the filtered network data of each category.
- A sorting and combination module sorts the keywords extracted from the same network data item and combines them. This yields the keyword groups of each network data item in each category.
- A hotspot statistics module counts how often each keyword group occurs within its category. From these counts it obtains the network hotspot keyword groups of each category and displays them by category.

With this technical solution, network hotspots can be mined from a more macroscopic perspective. The mining results reflect the objective facts of Internet public opinion more accurately, and the hotspots of a particular field can be reflected in a more targeted manner.
Description
This application is a divisional application of application number 201210346827.9, filed on September 25, 2012, and entitled "Network hotspot mining method and device".
Technical field
The present invention relates to the field of Internet communications, and in particular to a network hotspot mining method and device.
Background
With the development of the Internet, more and more websites have introduced user-generated content (UGC) features. Large numbers of netizens flock to forums, blogs, and microblogs to publish their opinions and disclose all kinds of news, so thousands of topics emerge on the Internet every day. Obtaining network hotspots from this massive volume of information more quickly helps in understanding social developments and in keeping track of public opinion.
The hotspot mining method commonly used in the prior art works as follows. For each text, it applies a predetermined weighting to the repost count, click count, and reply count within a specific time period to obtain a heat value. It then sorts the texts by heat value to obtain the hottest ones. This prior-art approach has the following problems:

1. Statistics are gathered only on the attributes of individual texts. The resulting hot topics therefore reflect only the popularity of a particular article at the micro level. They cannot reflect, at the macro level, how much attention netizens pay to a given topic.
2. The statistical sample is the full data set, and no statistical analysis is performed on the text content. The results are therefore not targeted and cannot reflect the hotspots of a particular field separately.
3. The prior-art approach can only count texts of identical content with identical statistical features. The results are therefore highly repetitive and poorly readable.
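For contrast, the prior-art scoring described above can be sketched as follows. The `Text` record and the weight values are hypothetical, since the prior art only says that a "predetermined" weighting is applied. The sketch shows that each text is scored in isolation, which is why the ranking reflects only the popularity of individual articles.

```python
from dataclasses import dataclass

@dataclass
class Text:
    title: str
    reposts: int
    clicks: int
    replies: int

# Hypothetical weights: the prior art only states that a predetermined weighting is used.
WEIGHTS = {"reposts": 0.5, "clicks": 0.1, "replies": 0.4}

def heat_value(text: Text, weights: dict = WEIGHTS) -> float:
    """Weighted sum of one text's repost, click, and reply counts."""
    return (weights["reposts"] * text.reposts
            + weights["clicks"] * text.clicks
            + weights["replies"] * text.replies)

def hottest(texts: list, k: int = 10) -> list:
    """Rank individual texts by heat value, highest first (a per-text, micro-level view)."""
    return sorted(texts, key=heat_value, reverse=True)[:k]
```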
Summary of the invention
The present invention provides a network hotspot mining method and device. They address three problems in the prior art: mining results are not macroscopic enough, hotspots cannot be reflected separately for each field, and results are highly repetitive and poorly readable.
The present invention provides a network hotspot mining method, including:

- collecting network data, classifying it, and storing it by category;
- filtering the network data in each category according to preset filtering rules, and extracting keywords from the filtered network data of each category;
- sorting the keywords extracted from the same network data item, and combining the sorted keywords to obtain the keyword groups of each network data item in each category;
- counting the number of occurrences of each keyword group within its category, obtaining the network hotspot keyword groups of each category, and displaying them by category.
Optionally, the network data includes a text title, the article content corresponding to the text title, and the text attributes corresponding to the text title.
Optionally, the text attributes further include at least one of the following:

- the uniform resource locator (URL) of the text;
- the source forum or blog of the text;
- the source column of the text;
- the publication time of the text;
- the author of the text;
- the reply count of the text;
- the view count of the text.
Optionally, classifying the network data and storing it by category further includes two steps.

First, automatic text classification technology classifies the network data according to the article content and obtains the classification label for each item. The corresponding text title, classification label, and text attributes are stored in an engine.

Second, network data is collected from the engine once every predetermined time interval. The collected data is stored, according to the classification labels, in different XML files on a designated server.
Optionally, the filtering rules further include at least one of the following:

- deleting network data whose text title does not meet a predetermined word count;
- deleting network data whose publication time does not meet the requirements;
- deleting network data whose URL contains a predetermined domain name that appears in a preset domain-name blacklist, or alternatively retaining network data whose URL contains a predetermined domain name;
- deleting network data whose source column is a predetermined column that appears in a preset column blacklist, or alternatively retaining network data whose source column is a predetermined column;
- deleting network data whose source (forum, blog, or all posts) does not meet the requirements;
- deleting network data whose reply count does not meet the requirements;
- deleting network data whose view count does not meet the requirements;
- deleting network data whose author does not meet the requirements;
- deduplicating the network data.
Optionally, the method further includes prefix filtering of the text titles according to a preset prefix dictionary. This happens before word segmentation technology is used to extract keywords from the filtered network data of each category.
Optionally, extracting keywords with word segmentation technology further includes segmenting the filtered text titles of each category into words. The segmentation result is used as the keywords.
Optionally, before the keywords extracted from the same network data item are sorted, the method further includes removing common words from the extracted keywords according to a preset common-word dictionary.
Optionally, combining the sorted keywords of the same network data item further includes combining the sorted keywords of the same text title according to the combination $C_n^r$. Here n is the total number of keywords belonging to that text title, $r \le n$, and $2 \le r \le 5$.
Optionally, after the keyword groups of each network data item in each category are obtained, the method further includes removing junk phrases from the keyword groups according to a preset junk dictionary.
Optionally, counting occurrences and obtaining the network hotspot keyword groups of each category further includes the following. For each keyword group, the number of distinct text titles it occurs in within its category is counted. The keyword groups whose count exceeds a predetermined threshold are then arranged in a predetermined order. The result is the network hotspot keyword groups of each category.
Optionally, after the network hotspot keyword groups of each category are obtained, the method further includes:

- merging identical network hotspot keyword groups within the same category;
- calculating the heat value of each network hotspot keyword group in each category;
- searching for links to the hot events corresponding to the network hotspot keyword groups of each category.
Optionally, displaying by category further includes presenting a hotspot report to the user. The hotspot report includes:

- the category of each network hotspot keyword group;
- the network hotspot keyword groups of each category within a predetermined time period;
- the heat value of each network hotspot keyword group in each category;
- the links to the hot events corresponding to the network hotspot keyword groups of each category.

The predetermined time period is at least one of hourly, daily, weekly, and monthly.
The present invention also provides a network hotspot mining device, including:

- a classification and storage module, configured to collect network data, classify it, and store it by category;
- a filtering and extraction module, configured to filter the network data in each category according to preset filtering rules and to extract keywords from the filtered network data of each category;
- a sorting and combination module, configured to sort the keywords extracted from the same network data item and to combine the sorted keywords, obtaining the keyword groups of each network data item in each category;
- a hotspot statistics module, configured to count the number of occurrences of each keyword group within its category, obtain the network hotspot keyword groups of each category, and display them by category.
Optionally, the network data further includes a text title, the article content corresponding to the text title, and the text attributes corresponding to the text title.
Optionally, the text attributes further include at least one of the following:

- the uniform resource locator (URL) of the text;
- the source forum or blog of the text;
- the source column of the text;
- the publication time of the text;
- the author of the text;
- the reply count of the text;
- the view count of the text.
Optionally, the classification and storage module is further configured to perform two steps.

First, it uses automatic text classification technology to classify the network data according to the article content and obtain the classification label for each item. It stores the corresponding text title, classification label, and text attributes in an engine.

Second, it collects network data from the engine once every predetermined time interval. It stores the collected data, according to the classification labels, in different XML files on a designated server.
Optionally, the filtering rules further include at least one of the following:

- deleting network data whose text title does not meet a predetermined word count;
- deleting network data whose publication time does not meet the requirements;
- deleting network data whose URL contains a predetermined domain name that appears in a preset domain-name blacklist, or alternatively retaining network data whose URL contains a predetermined domain name;
- deleting network data whose source column is a predetermined column that appears in a preset column blacklist, or alternatively retaining network data whose source column is a predetermined column;
- deleting network data whose source (forum, blog, or all posts) does not meet the requirements;
- deleting network data whose reply count does not meet the requirements;
- deleting network data whose view count does not meet the requirements;
- deleting network data whose author does not meet the requirements;
- deduplicating the network data.
Optionally, the filtering and extraction module is further configured to apply prefix filtering to the text titles according to a preset prefix dictionary. It does this before using word segmentation technology to extract keywords from the filtered network data of each category.
Optionally, the filtering and extraction module is further configured to segment the filtered text titles of each category into words with word segmentation technology. The segmentation result is used as the keywords.
Optionally, the sorting and combination module is further configured to remove common words from the extracted keywords according to a preset common-word dictionary. It does this before sorting the keywords extracted from the same network data item.
Optionally, the sorting and combination module is further configured to combine the sorted keywords of the same text title according to the combination $C_n^r$. Here n is the total number of keywords belonging to that text title, $r \le n$, and $2 \le r \le 5$.
Optionally, the sorting and combination module is further configured to remove junk phrases from the keyword groups according to a preset junk dictionary. It does this after obtaining the keyword groups of each network data item in each category.
Optionally, the hotspot statistics module is further configured to count, for each keyword group, the number of distinct text titles it occurs in within its category. It then arranges the keyword groups whose count exceeds a predetermined threshold in a predetermined order, obtaining the network hotspot keyword groups of each category.
Optionally, the hotspot statistics module is further configured to:

- merge identical network hotspot keyword groups within the same category;
- calculate the heat value of each network hotspot keyword group in each category;
- search for links to the hot events corresponding to the network hotspot keyword groups of each category.
Optionally, the hotspot statistics module is further configured to present a hotspot report to the user. The hotspot report includes:

- the category of each network hotspot keyword group;
- the network hotspot keyword groups of each category within a predetermined time period;
- the heat value of each network hotspot keyword group in each category;
- the links to the hot events corresponding to the network hotspot keyword groups of each category.

The predetermined time period is at least one of hourly, daily, weekly, and monthly.
The present invention has the following beneficial effects:
Hotspot mining is implemented using the hot-word computing principle, and text classification technology is combined with hotspot mining technology. This solves the prior-art problems that hotspot mining results are not macroscopic enough, that hotspots cannot be reflected separately for each field, and that results are highly repetitive and poorly readable.

As a result:

- network hotspots can be mined more macroscopically, reflecting how much attention netizens pay to a given topic;
- the mining results better reflect the objective facts of Internet public opinion;
- duplicate articles with the same content are merged more easily;
- the hotspots of a particular field can be reflected in a more targeted manner.
The above description is only an overview of the technical solution of the present invention. Specific embodiments are described below. They make the technical means of the invention clearer so that it can be implemented according to this specification, and they make the objects, features, and advantages of the invention more readily apparent.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings only illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 is a flowchart of the network hotspot mining method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the filtering rules according to an embodiment of the present invention;
Fig. 3 is a detailed processing diagram of the network hotspot mining method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the network hotspot mining device according to an embodiment of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments, the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure is understood more thoroughly and its scope is fully conveyed to those skilled in the art.
The present invention provides a network hotspot mining method and device to solve three prior-art problems: mining results are not macroscopic enough, hotspots cannot be reflected separately for each field, and results are highly repetitive and poorly readable. The method and device of the embodiments are implemented with automatic text classification technology and hot-word computing technology. The invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the present invention and do not limit it.
According to an embodiment of the present invention, a network hotspot mining method is provided. Fig. 1 is a flowchart of this method. As shown in Fig. 1, the method includes the following processing:
Step 101: collect network data, classify it, and store it by category.
The network data in step 101 specifically includes a text title, the article content corresponding to the text title, and the text attributes corresponding to the text title. The text attributes specifically include at least one of the following:

- the uniform resource locator (URL) of the text;
- the source forum or blog of the text;
- the source column of the text;
- the publication time of the text;
- the author of the text;
- the reply count of the text;
- the view count of the text.
In step 101, classifying the network data and storing it by category specifically includes:
Step 1: use automatic text classification technology to classify the network data according to the article content and obtain the classification label for each item. Store the corresponding text title, classification label, and text attributes in the engine. Automatic text classification technology uses machine-learning principles to label a text set (or other entities or objects) automatically according to a given taxonomy or standard. It relies on model parameters learned from a small sample.
Step 2: collect network data from the engine once every predetermined time interval. Store the collected data, according to the classification labels, in different XML files on a designated server. The interval may be, for example, 1 hour, 6 hours, or 1 day. In embodiments of the present invention it can be set flexibly according to the characteristics of the collected data, such as how quickly it is updated.
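Steps 1 and 2 can be sketched as a grouping step followed by per-category XML serialization. This is a minimal sketch under several assumptions. The `classify` callable stands in for the trained text classifier, and a trivial keyword rule plays that role in the usage check. The element and attribute names are hypothetical. The function returns XML strings rather than writing files to the designated server.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Callable

# Hypothetical field names for the text attributes listed in step 101.
ATTRIBUTES = ("title", "url", "forum", "column", "time", "author", "replies", "views")

def classify_and_store(records: list, classify: Callable[[str], str]) -> dict:
    """Label each record from its article content and build one XML document per label."""
    by_label = defaultdict(list)
    for rec in records:
        # classify() stands in for the machine-learned classification model.
        by_label[classify(rec["content"])].append(rec)

    documents = {}
    for label, recs in by_label.items():
        root = ET.Element("posts", category=label)
        for rec in recs:
            post = ET.SubElement(root, "post")
            for name in ATTRIBUTES:
                if name in rec:
                    ET.SubElement(post, name).text = str(rec[name])
        # A deployment would write this string to a per-category file on the server.
        documents[label] = ET.tostring(root, encoding="unicode")
    return documents
```

Keeping one document per classification label means each later step can work on a single category at a time. That is what allows hotspots to be reported field by field.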
Step 102: filter the network data in each category according to preset filtering rules, and extract keywords from the filtered network data of each category.
Preferably, the filtering rules take the form shown in Fig. 2, a schematic diagram of the filtering rules according to an embodiment of the present invention. In this embodiment, the filtering rules specifically include at least one of the following:
1. Delete network data whose text title does not meet the predetermined word count.
2. Delete network data whose publication time does not meet the requirements.
3. Delete network data whose URL contains a predetermined domain name that appears in a preset domain-name blacklist; or, alternatively, retain network data whose URL contains a predetermined domain name.
4. Delete network data whose source column is a predetermined column that appears in a preset column blacklist; or, alternatively, retain network data whose source column is a predetermined column.
5. Delete network data whose source does not meet the requirements, where the source is forum, blog, or all posts.
6. Delete network data whose reply count does not meet the requirements.
7. Delete network data whose view count does not meet the requirements.
8. Delete network data whose author does not meet the requirements.
9. Deduplicate the network data.
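The configurable rule list above can be sketched as named predicates that are AND-combined, followed by deduplication. This is only an illustration. It covers rules 1, 3 (blacklist variant), 4 (blacklist variant), and 9. The blacklist entries, the 5 to 30 character bounds, the field names, and the choice to deduplicate on exact URLs are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical configuration; the patent leaves the concrete values to the operator.
DOMAIN_BLACKLIST = {"spam.example.com"}
COLUMN_BLACKLIST = {"newcomer check-in"}
MIN_TITLE_LEN, MAX_TITLE_LEN = 5, 30

# Each rule is (name, predicate); a post is kept only if every predicate returns True.
RULES = [
    ("title length", lambda p: MIN_TITLE_LEN <= len(p["title"]) <= MAX_TITLE_LEN),           # rule 1
    ("domain blacklist", lambda p: (urlparse(p["url"]).hostname or "") not in DOMAIN_BLACKLIST),  # rule 3
    ("column blacklist", lambda p: p["column"] not in COLUMN_BLACKLIST),                      # rule 4
]

def filter_and_dedup(posts: list) -> list:
    """Keep posts that pass every rule, then drop repeated URLs (rule 9, exact-URL variant)."""
    seen, kept = set(), []
    for post in posts:
        if all(check(post) for _, check in RULES) and post["url"] not in seen:
            seen.add(post["url"])
            kept.append(post)
    return kept
```

Storing the rules as data rather than hard-coding them mirrors the text's point that rules can be added or reconfigured as required.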
The filtering rules in the embodiments of the present invention are not limited to the nine rules listed above and can be configured as required. For example, a rule may be set to delete network data whose article length does not exceed a predetermined word-count threshold.
In addition, in step 102, prefix filtering may be applied to the text titles before keyword extraction, according to a preset prefix dictionary, so that the needed keywords are extracted more accurately. For example, unwanted prefixes such as "Mop University Students Base" and "Tianya Zatan" are filtered out, and they do not take part in keyword extraction.

Further, in embodiments of the present invention, word segmentation technology may be used to extract keywords from the filtered network data of each category. Specifically, the filtered text titles of each category are segmented into words, and the segmentation result is used as the keywords. Word segmentation is a mature keyword extraction technique in the prior art, and embodiments of the present invention may also use other techniques to extract keywords.
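Prefix filtering followed by segmentation can be sketched as below. This is a minimal sketch with three assumptions. The bracketed prefix strings are hypothetical English renderings of the examples in the text. Repeated prefixes are stripped in a loop. Whitespace splitting stands in for a real word segmenter, which Chinese titles would need.

```python
# Hypothetical prefix-dictionary entries modelled on the examples in the text.
PREFIX_DICTIONARY = ("[Mop University Students Base]", "[Tianya Zatan]")

def strip_prefixes(title: str, prefixes=PREFIX_DICTIONARY) -> str:
    """Remove leading dictionary prefixes so they never reach keyword extraction."""
    stripped = True
    while stripped:
        stripped = False
        for prefix in prefixes:
            if title.startswith(prefix):
                title = title[len(prefix):].lstrip()
                stripped = True
    return title

def extract_keywords(title: str) -> list:
    """Placeholder segmenter: splits on whitespace; a Chinese word segmenter would go here."""
    return strip_prefixes(title).split()
```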
Step 103: sort the keywords extracted from the same network data item, and combine the sorted keywords to obtain the keyword groups of each network data item in each category.
Step 103 is implemented with hot-word computing technology. This technology segments web page texts collected in real time, groups and merges them, and calculates high-frequency hot keywords. It then filters the keywords according to predefined dictionaries and preset rules and outputs real-time Internet hotspot vocabulary.
In step 103, common words may be removed from the extracted keywords according to a preset common-word dictionary. This happens before the keywords extracted from the same network data item are sorted. Common words include "original", "reprint", and "photo set", which need to be filtered out.
Further, in step 103, combining keywords means combining the sorted keywords of the same text title according to the combination $C_n^r$. Here n is the total number of keywords belonging to that text title, $r \le n$, and $2 \le r \le 5$.
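The sort-then-combine rule can be sketched with `itertools.combinations`. For each title it emits every $C_n^r$ subset for $2 \le r \le \min(n, 5)$. The common-word dictionary contents are hypothetical. Collapsing repeated keywords within one title is an added assumption, since the text does not say how repeats are handled.

```python
from itertools import combinations

# Hypothetical common-word dictionary based on the examples in the text.
COMMON_WORDS = {"original", "reprint", "photo set"}

def keyword_groups(keywords: list, min_r: int = 2, max_r: int = 5) -> list:
    """Drop common words, sort, and emit every C(n, r) combination with 2 <= r <= min(n, 5)."""
    # Collapsing repeated keywords is an assumption; the text does not address repeats.
    words = sorted({w for w in keywords if w not in COMMON_WORDS})
    groups = []
    for r in range(min_r, min(len(words), max_r) + 1):
        groups.extend(combinations(words, r))   # sorted input gives a canonical group order
    return groups
```

Because the keywords are sorted before combining, titles that mention the same keywords in a different order produce identical groups. This is what lets step 104 count them together.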
After step 103, junk phrases may preferably be removed from the keyword groups according to a preset junk dictionary.
Step 104: count the number of occurrences of each keyword group within its category, obtain the network hotspot keyword groups of each category, and display them by category.
Step 104 specifically includes the following processing. For each keyword group, count the number of distinct text titles it occurs in within its category. Then arrange the keyword groups whose count exceeds a predetermined threshold in a predetermined order, obtaining the network hotspot keyword groups of each category. The predetermined order may be descending order of occurrence count.
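The per-category counting in step 104 can be sketched with `collections.Counter`. The input shape (a dictionary from category to a list of per-title keyword-group lists) and the alphabetical tie-break are assumptions for illustration. A group is counted at most once per title, so the count equals the number of distinct titles it appears in.

```python
from collections import Counter

def hotspot_groups(groups_by_category: dict, threshold: int) -> dict:
    """For each category, count titles per keyword group and rank those above the threshold."""
    result = {}
    for category, titles in groups_by_category.items():
        counts = Counter()
        for groups in titles:                  # one entry per text title
            counts.update(set(groups))         # a group counts at most once per title
        hot = [(g, n) for g, n in counts.items() if n > threshold]
        hot.sort(key=lambda item: (-item[1], item[0]))   # descending count, ties by group
        result[category] = hot
    return result
```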
After the network hotspot keyword groups of each category are obtained, several further operations may be performed to give the user more comprehensive hotspot information:

- identical network hotspot keyword groups within the same category may be merged;
- the heat value of each network hotspot keyword group in each category may be calculated;
- links to the hot events corresponding to each category's network hotspot keyword groups may be searched for.
In step 104, displaying by category means presenting a hotspot report to the user. The hotspot report includes:

- the category of each network hotspot keyword group;
- the network hotspot keyword groups of each category within a predetermined time period;
- the heat value of each network hotspot keyword group in each category;
- the links to the hot events corresponding to the network hotspot keyword groups of each category.

The predetermined time period is at least one of hourly, daily, weekly, and monthly.
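One way to organize the hourly, daily, weekly, and monthly reports is to bucket each title by its publication time before counting. The sketch below assumes this approach. The bucket key formats and the use of ISO 8601 week numbering are assumptions, not something the text specifies.

```python
from collections import defaultdict
from datetime import datetime

def period_key(ts: datetime, period: str) -> str:
    """Map a publication time to its report bucket (hourly, daily, weekly, or monthly)."""
    if period == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if period == "day":
        return ts.strftime("%Y-%m-%d")
    if period == "week":
        year, week, _ = ts.isocalendar()      # ISO 8601 week numbering (an assumption)
        return f"{year}-W{week:02d}"
    if period == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period!r}")

def bucket_titles(titles: list, period: str) -> dict:
    """Group (timestamp, title) pairs by report period before hotspot counting."""
    buckets = defaultdict(list)
    for ts, title in titles:
        buckets[period_key(ts, period)].append(title)
    return dict(buckets)
```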
The technical solution of the embodiments of the present invention is illustrated below with reference to the drawings.
Fig. 3 is a detailed processing diagram of the network hotspot mining method according to an embodiment of the present invention. As shown in Fig. 3, the method specifically includes the following processing:
Step 301: a machine-learning module generates a classification model from a custom corpus. The model classifies the collected network data, and the classification labels are stored in the engine together with the text attributes.
Step 302: collect data from the engine once per hour, and store it by category in different Extensible Markup Language (XML) files on a designated server.
Step 303: filter the data with the following filtering rules and keep the filtered data in a database. Users can manage the filtering rules through a data-filtering-rule management back end.
Specifically, the filtering rules in this embodiment include:
1. Title filtering: keep data whose title is between 5 and 30 characters long.
2. Posting-time filtering: keep posts posted on the current day.
3. Domain filtering, in one of three modes:
(1) Fuzzy matching: keep posts whose URL contains a specified domain name or word.
(2) Domain-based selection: keep posts from 30 current-affairs forums and 20 automobile forums whose URL contains "auto".
(3) Keep posts that satisfy both (1) and (2).
4. Column filtering: select posts by the URL of the board seed. Posts can also be selected by certain characters in the column title; for example, posts whose column title contains "entertainment" or "gossip" are selected.
5. Domain-name blacklist filtering: from the results selected above, delete any post whose second-level domain name or second-level URL contains a certain word. For example, among results whose top-level domain is xinhuanet.com, posts from the domain 120ask.xinhuanet.com are removed.
6. Column blacklist: from the results selected above, delete any post whose seed or column name contains a certain word. For example, posts from the column named "newcomer check-in" are removed.
7. Source filtering: keep data that matches the configured source, which is forum, blog, or all posts.
8. Reply-count and click-count filtering: keep data with a reply count between 0 and 1000 and a click count between 0 and 10000.
9. Deduplication: deduplicate posts by URL, counting all posts with the same top-level domain as one post.
10. Filtered fields: title, URL, source forum, source board, posting time, author, reply count, view count, and so on.
11. Filter logic: rules 3 and 4 are combined with OR; all other rules are combined with AND.
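Rules 1 to 8 of this embodiment can be sketched as a single predicate, with the OR between rules 3 and 4 that rule 11 requires. The numeric thresholds come from the text. The following are illustrative assumptions: the word lists, the blacklist entries, the field names, and treating rule 7 as "forum or blog". Deduplication (rule 9) is left out.

```python
from datetime import date, datetime
from urllib.parse import urlparse

# Thresholds follow the embodiment; the word lists and blacklists are hypothetical examples.
DOMAIN_WORDS = ("auto",)                           # rule 3: URL fuzzy-match words
COLUMN_WORDS = ("entertainment", "gossip")         # rule 4: column-title words
BLACKLISTED_HOSTS = {"120ask.xinhuanet.com"}       # rule 5
BLACKLISTED_COLUMNS = {"newcomer check-in"}        # rule 6
ALLOWED_SOURCES = {"forum", "blog"}                # rule 7 ("all posts" would skip this check)

def passes(post: dict, today: date) -> bool:
    """Return True if a post survives rules 1-8, with rules 3 and 4 OR-ed (rule 11)."""
    posted: datetime = post["posted"]
    host = urlparse(post["url"]).hostname or ""
    domain_rule = any(w in post["url"] for w in DOMAIN_WORDS)
    column_rule = any(w in post["column"] for w in COLUMN_WORDS)
    return (
        5 <= len(post["title"]) <= 30                  # rule 1: title length
        and posted.date() == today                     # rule 2: posted today
        and (domain_rule or column_rule)               # rules 3 OR 4
        and host not in BLACKLISTED_HOSTS              # rule 5: domain blacklist
        and post["column"] not in BLACKLISTED_COLUMNS  # rule 6: column blacklist
        and post["source"] in ALLOWED_SOURCES          # rule 7: source
        and 0 <= post["replies"] <= 1000               # rule 8: reply count
        and 0 <= post["clicks"] <= 10000               # rule 8: click count
    )
```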
Step 304: central words are extracted from all text titles; a title may have several central words. Each title is segmented into words using word segmentation, and the segmentation result is the title's central words. Preferably, prefix filtering is applied to the title before segmentation, and these prefixes do not take part in segmentation, for example prefixes such as "Mop University Students Base" and "Tianya Gossip". The user can manage the prefixes to be filtered through a prefix-management back end;
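A minimal sketch of prefix filtering followed by segmentation. The bracketed prefix strings are hypothetical, and the whitespace-based `segment` function is only a stand-in: real Chinese titles would need a dictionary- or model-based word segmenter, which this sketch does not attempt.

```python
# Hypothetical prefix dictionary; real entries are managed in the back end.
PREFIXES = ("[Mop University Students Base]", "[Tianya Gossip]")

def strip_prefixes(title: str, prefixes=PREFIXES) -> str:
    """Remove known prefixes (repeatedly) so they never reach segmentation."""
    changed = True
    while changed:
        changed = False
        for prefix in prefixes:
            if title.startswith(prefix):
                title = title[len(prefix):].lstrip()
                changed = True
    return title

def segment(title: str) -> list[str]:
    """Stand-in segmenter: splits on whitespace only (an assumption)."""
    return title.split()

def central_words(title: str) -> list[str]:
    """The segmentation result of the prefix-stripped title is its central words."""
    return segment(strip_prefixes(title))
```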
Step 305: hot phrase calculation:
Step 1: common words in the segmentation result (for example, words such as "original", "repost", and "photo set") are filtered out; the user can manage the common words to be filtered through a common-word management back end;
Step 2: the filtered central words are sorted (for example, if the central words extracted from a title are bca, they become abc after sorting);
Step 3: the sorted central words of each title are combined according to the combination formula $C_n^r = \frac{n!}{r!\,(n-r)!}$, where n is the number of central words in the title; only phrases of 2-5 words are retained;
The sorting and combination of central words into phrases is illustrated below with an example.
Title one yields central words b, a, c; after sorting they are a, b, c, forming the phrases ab, bc, ac, abc.
Title two yields central words c, b, d; after sorting they are b, c, d, forming the phrases bc, cd, bd, bcd.
Title three yields central words b, c, forming the phrase bc.
The phrase ranking formed by these three titles is therefore: bc (3), ab (1), ac (1), cd (1), bd (1), abc (1), bcd (1).
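Steps 1 to 3, together with the title counts in the example above, can be reproduced with a short sketch built on `itertools.combinations`. The common-word set is hypothetical, phrases are formed by concatenating the sorted words as in the example, and repeated words within one title are collapsed, which is an assumption the text does not address.

```python
from collections import Counter
from itertools import combinations

# Hypothetical common-word dictionary (step 1).
COMMON_WORDS = {"original", "repost", "photo set"}

def title_phrases(words, min_len=2, max_len=5, sep=""):
    """Steps 1-3 for one title: drop common words, sort, emit all r-combinations."""
    # Collapsing repeated words with a set is an assumption, not stated in the text.
    kept = sorted({w for w in words if w not in COMMON_WORDS})
    phrases = set()
    for r in range(min_len, min(max_len, len(kept)) + 1):
        for combo in combinations(kept, r):
            phrases.add(sep.join(combo))
    return phrases

def count_phrases(titles):
    """Number of titles in which each phrase occurs (each title counted once)."""
    counts = Counter()
    for words in titles:
        counts.update(title_phrases(words))
    return counts
```

For the three example titles, `count_phrases([["b", "a", "c"], ["c", "b", "d"], ["b", "c"]]).most_common(1)` returns `[("bc", 3)]`, matching the ranking above.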
Step 4: junk phrases are filtered out, removing phrases such as "inquiry ### prize", "### phone", "### consulting", and "mobile phone ### prize"; the user can manage the junk phrases to be filtered through a junk-phrase management back end;
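A sketch of step 4, assuming that "###" in a junk-phrase entry is a wildcard matching any text; the text does not define its meaning, and the English pattern strings below are placeholders for the real dictionary entries.

```python
import re

# Hypothetical junk-phrase dictionary; "###" is assumed to match any text.
JUNK_PATTERNS = ["inquiry###prize", "###phone", "###consulting", "mobilephone###prize"]

def compile_patterns(patterns):
    """Turn each entry into a regex: literal parts joined by '.*'."""
    compiled = []
    for pattern in patterns:
        parts = [re.escape(part) for part in pattern.split("###")]
        compiled.append(re.compile(".*".join(parts)))
    return compiled

JUNK = compile_patterns(JUNK_PATTERNS)

def drop_junk(phrases):
    """Keep only phrases that fully match none of the junk patterns."""
    return [ph for ph in phrases if not any(rx.fullmatch(ph) for rx in JUNK)]
```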
Step 306: a hot phrase ranking list is formed. The number of titles behind each hot phrase is counted, the phrases are sorted by title count in descending order, and phrases with a title count greater than 2 are retained; this parameter can be adjusted according to actual data.
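Step 306 as a sketch, taking a phrase-to-title-count mapping as input. `min_titles` is the adjustable parameter mentioned above (default 2, keeping counts strictly greater than it); breaking ties alphabetically is an arbitrary choice the text does not specify.

```python
from collections import Counter

def hot_phrase_ranking(counts: Counter, min_titles: int = 2):
    """Keep phrases whose title count exceeds min_titles, sorted by count descending."""
    ranked = [(phrase, n) for phrase, n in counts.items() if n > min_titles]
    ranked.sort(key=lambda item: (-item[1], item[0]))  # ties: alphabetical
    return ranked
```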
In summary, the technical solution of the embodiments of the present invention implements hotspot mining using the hot-word computation principle and combines text classification with hotspot mining. This solves the problems of prior-art network hotspot mining, whose results are not macroscopic enough, cannot reflect the hotspots of a particular field separately by field, are highly repetitive, and are poorly readable. Network hotspots can be mined more macroscopically, reflecting how much attention netizens pay to a given topic at the macro level, so that the mining results better reflect the objective facts of Internet public opinion, repeatedly appearing articles with identical content are merged more easily, and the hotspots of a particular field are reflected in a more targeted manner.
According to an embodiment of the present invention, a network hotspot mining device is provided. Fig. 4 is a structural schematic diagram of the network hotspot mining device of an embodiment of the present invention. As shown in Fig. 4, the network hotspot mining device according to an embodiment of the present invention includes a classification and storage module 40, a filtering and extraction module 42, a sorting and combination module 44, and a hotspot statistics module 46. Each module of the embodiment of the present invention is described in detail below.
Classification and storage module 40: adapted to collect network data, classify the network data, and store it by category.
The network data specifically include a text title, the article content corresponding to the text title, and the text attributes corresponding to the text title. The text attributes specifically include at least one of the following: the URL of the text, the source forum/blog of the text, the source section of the text, the publication time of the text, the author of the text, the reply count of the text, and the view count of the text.
The classification and storage module 40 is specifically adapted to: 1. use automatic text classification technology to classify the network data according to the article content, obtain the classification label corresponding to the network data, and store the corresponding text title, classification label, and text attributes in an engine. Automatic text classification refers to using machine-learning principles, relying on model parameters learned from a small sample, to automatically label a text set (or other entities or objects) according to a given taxonomy or standard. 2. Collect network data from the engine once per predetermined interval, and store the collected network data by classification label in different XML files on a designated server. The predetermined interval may be 1 hour, 6 hours, or 1 day; in the embodiments of the present invention it can be set flexibly according to the characteristics of the collected data (for example, its update rate).
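The per-label storage step can be sketched with the standard library's `xml.etree.ElementTree`. The element names (`category`, `item`) and the record fields are hypothetical, since the text does not define the XML schema; the sketch returns one XML string per label instead of writing files to a server.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def group_to_xml(records):
    """Group records by their classification label; return one XML document per label."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)
    documents = {}
    for label, items in by_label.items():
        root = ET.Element("category", name=label)  # hypothetical schema
        for rec in items:
            item = ET.SubElement(root, "item")
            for field in ("title", "url", "posted_at"):
                ET.SubElement(item, field).text = str(rec.get(field, ""))
        documents[label] = ET.tostring(root, encoding="unicode")
    return documents
```

In a deployment, each string would then be written to a per-label file (for example `auto.xml`) on the designated server.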
Filtering and extraction module 42: adapted to filter the network data under each category according to preset filtering rules, and to extract central words from the filtered network data under each category.
In the embodiments of the present invention, Fig. 2 is a schematic diagram of the filtering rules of an embodiment of the present invention. As shown in Fig. 2, the filtering rules specifically include at least one of the following:
1. deleting network data whose text title does not meet a predetermined number of characters;
2. deleting network data whose publication time does not meet requirements;
3. deleting network data whose URL contains a predetermined domain name, the predetermined domain name being a domain name in a preset domain-name blacklist; or retaining network data whose URL contains a predetermined domain name;
4. deleting network data whose source section is a predetermined section, the predetermined section being a section in a preset section blacklist; or retaining network data whose source section is a predetermined section;
5. deleting network data whose source does not meet requirements, the source including a forum, a blog, or all posts;
6. deleting network data whose reply count does not meet requirements;
7. deleting network data whose view count does not meet requirements;
8. deleting network data whose author does not meet requirements;
9. deduplicating the network data.
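Rule 3 (domain list in delete or retain mode) and rule 9 (deduplication) can be sketched as follows. Matching a host against a listed domain by exact name or subdomain suffix, and deduplicating by exact URL, are assumptions; a section blacklist (rule 4) would follow the same shape with section names in place of hosts.

```python
from urllib.parse import urlparse

def host_matches(url: str, domains) -> bool:
    """True if the URL's host equals a listed domain or is a subdomain of it."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)

def apply_domain_list(posts, domains, mode="delete"):
    """mode="delete": drop matching posts; mode="retain": keep only matching posts."""
    if mode not in ("delete", "retain"):
        raise ValueError(f"unknown mode: {mode}")
    keep_matches = mode == "retain"
    return [p for p in posts if host_matches(p["url"], domains) == keep_matches]

def dedup_by_url(posts):
    """Keep the first post seen for each URL, preserving order."""
    seen, out = set(), []
    for p in posts:
        if p["url"] not in seen:
            seen.add(p["url"])
            out.append(p)
    return out
```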
It should be noted that the filtering rules in the embodiments of the present invention are not limited to the 9 rules listed above. The filtering rules can be configured as needed; for example, a rule can be set to delete network data whose article length does not exceed a predetermined character-count threshold.
In addition, to better extract the required central words, the filtering and extraction module 42 is further adapted to apply prefix filtering to the text title according to a preset prefix dictionary before central words are extracted, for example to filter out unwanted prefixes such as "Mop University Students Base" and "Tianya Gossip". These prefixes do not take part in central-word extraction. Further, in the embodiments of the present invention, the filtering and extraction module 42 can use word segmentation to extract central words from the filtered network data under each category. Specifically, the filtering and extraction module 42 can use word segmentation to segment the filtered text titles under each category, obtain a segmentation result, and use the segmentation result as the central words. It should be noted that word segmentation is a mature central-word extraction technique in the prior art; the embodiments of the present invention can also use other techniques for central-word extraction.
Sorting and combination module 44: adapted to sort the central words extracted from the same piece of network data and combine the sorted central words of that network data, obtaining the central phrases of each piece of network data under each category.
The sorting and combination module 44 implements the above processing using hot-word computation technology. Hot-word computation refers to segmenting web text collected in real time, automatically grouping and merging it, calculating high-frequency hot keywords, filtering them according to predefined dictionaries and preset rules, and outputting real-time Internet hot words.
Before sorting the central words extracted from the same piece of network data, the sorting and combination module 44 can filter the extracted central words according to the common words in a preset common-word dictionary. Common words are words such as "original", "repost", and "photo set", which need to be filtered out.
Combining central words means that the sorting and combination module 44 combines the sorted central words belonging to the same text title according to $C_n^r$, where n is the total number of central words belonging to the same text title, r ≤ n, and 2 ≤ r ≤ 5.
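The number of central phrases a single title can produce follows from the combination formula with 2 ≤ r ≤ 5. The worked values below are illustrative arithmetic, consistent with the three-title example given earlier (a title with central words a, b, c yields ab, ac, bc, abc):

$$
\begin{aligned}
C_n^r &= \frac{n!}{r!\,(n-r)!}, \qquad N(n) = \sum_{r=2}^{\min(n,\,5)} C_n^r \\
N(3) &= C_3^2 + C_3^3 = 3 + 1 = 4 \\
N(10) &= C_{10}^2 + C_{10}^3 + C_{10}^4 + C_{10}^5 = 45 + 120 + 210 + 252 = 627
\end{aligned}
$$

Without the r ≤ 5 cap, a 10-word title would instead yield $2^{10} - 1 - 10 = 1013$ phrases of two or more words, so the cap bounds the phrase count for long titles.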
Preferably, after combining the sorted central words of the same network data to obtain the central phrases of each piece of network data under each category, the sorting and combination module 44 is further adapted to filter junk phrases out of the central phrases according to a preset junk dictionary.
Hotspot statistics module 46: adapted to count the occurrences of the central phrases under their category, obtain the network hotspot phrases under each category, and display them by category.
The hotspot statistics module 46 is specifically adapted to count the number of different text titles in which each central phrase occurs under its category, and to arrange the central phrases whose occurrence count exceeds a predetermined threshold in a predetermined order, obtaining the network hotspot phrases under each category.
After obtaining the network hotspot phrases under each category, the hotspot statistics module 46 is further adapted to: merge identical network hotspot phrases within the same category; calculate the heat value corresponding to each network hotspot phrase under each category; and search for links to the hot events corresponding to the network hotspot phrases under each category.
Displaying by category means that the hotspot statistics module 46 shows the user a hotspot report, which includes: the category of each network hotspot phrase, the network hotspot phrases under each category within a predetermined time period, the heat value of each network hotspot phrase under each category, and the links to the hot events corresponding to the network hotspot phrases under each category. The predetermined time period includes at least one of: hourly, daily, weekly, and monthly.
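Grouping phrase occurrences by the report periods listed above can be sketched with `datetime`. The string keys and the use of ISO weeks for the weekly bucket are assumptions; the text does not say how periods are delimited, and heat values are not modelled here.

```python
from collections import Counter
from datetime import datetime

def period_key(ts: datetime, period: str) -> str:
    """Bucket a timestamp into an hourly, daily, weekly (ISO week) or monthly key."""
    if period == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if period == "day":
        return ts.strftime("%Y-%m-%d")
    if period == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if period == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period}")

def report(rows, period):
    """rows: (category, phrase, timestamp) triples -> counts per (category, period, phrase)."""
    return Counter((cat, period_key(ts, period), phrase) for cat, phrase, ts in rows)
```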
The technical solution of the embodiments of the present invention is illustrated below with reference to the accompanying drawings.
Fig. 3 is a detailed processing schematic diagram of the network hotspot mining method of an embodiment of the present invention. As shown in Fig. 3, the network hotspot mining method according to an embodiment of the present invention specifically includes the following processing:
Step 301: a machine-learning module generates a classification model from a custom corpus; the classification and storage module 40 classifies the collected network data with the classification model and stores the classification labels together with the text attributes in the engine.
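The text does not name the algorithm behind the classification model. As one possible stand-in, a multinomial naive Bayes classifier with add-one smoothing is sketched below over already segmented words; the class name, training data, and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing; a swapped-in technique."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per label (priors)
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, words):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)
            for w in words:
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best
```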
Step 302: the classification and storage module 40 collects data once per hour and stores the data from the engine by category in different Extensible Markup Language (XML) files on a designated server.
Step 303: the filtering and extraction module 42 filters the data according to the following filtering rules and keeps the filtered data in a database; the user can manage the filtering rules through a data-filtering-rule management back end.
Specifically, Fig. 3 is a preferred schematic diagram of the filtering rules of an embodiment of the present invention. As shown in Fig. 3, the filtering rules according to an embodiment of the present invention include:
1. Title filtering: data whose title length is between 5 and 30 characters are let through;
2. Posting-time filtering: posts whose posting time is the current day are let through;
3. Domain filtering: (1) fuzzy matching is used, so that posts whose URL contains the corresponding domain name or word are let through; or (2) by domain name, posts from 30 current-affairs forums and 20 automobile forums, as well as posts whose URL contains "auto", are let through; or posts satisfying both rules (1) and (2) are let through.
4. Section filtering: posts are filtered according to the URL of the seed board; posts whose section title contains certain Chinese characters can also be filtered. For example, posts whose section title contains the words "entertainment" or "gossip" are filtered out;
5. Domain blacklist filtering: a deletion is performed on the results let through above, removing posts whose second-level domain or second-level URL contains a certain word. For example, among results whose top-level domain is xinhuanet.com, posts whose domain is 120ask.xinhuanet.com are filtered out;
6. Section blacklist: a deletion is performed on the results let through above, removing posts whose seed or section name contains a certain word. For example, posts whose section name is "Newcomer Check-in" are filtered out;
7. Source filtering: data from qualifying sources are let through, where the qualifying sources are forums, blogs, or all posts;
8. Reply-count and click-count filtering: data with a reply count within 0-1000 are let through, and data with a click count within 0-10000 are let through;
9. Deduplication: duplicates are removed according to the post URL, and all posts with the same top-level domain are counted as one post;
10. The filtered fields include: title, URL, source forum, source board, posting time, author, reply count, view count, and the like.
11. Filtering logic: rules 3 and 4 above are combined with OR, and all other rules are combined with AND.
Step 304: the filtering and extraction module 42 extracts central words from all text titles; a title may have several central words. Each title is segmented into words using word segmentation, and the segmentation result is the title's central words. Preferably, prefix filtering is applied to the title before segmentation, and these prefixes do not take part in segmentation, for example prefixes such as "Mop University Students Base" and "Tianya Gossip". The user can manage the prefixes to be filtered through a prefix-management back end;
Step 305: the sorting and combination module 44 performs the hot phrase calculation:
Step 1: common words in the segmentation result (for example, words such as "original", "repost", and "photo set") are filtered out; the user can manage the common words to be filtered through a common-word management back end;
Step 2: the filtered central words are sorted (for example, if the central words extracted from a title are bca, they become abc after sorting);
Step 3: the sorted central words of each title are combined according to the combination formula $C_n^r = \frac{n!}{r!\,(n-r)!}$, where n is the number of central words in the title; only phrases of 2-5 words are retained;
The sorting and combination of central words into phrases is illustrated below with an example.
Title one yields central words b, a, c; after sorting they are a, b, c, forming the phrases ab, bc, ac, abc.
Title two yields central words c, b, d; after sorting they are b, c, d, forming the phrases bc, cd, bd, bcd.
Title three yields central words b, c, forming the phrase bc.
The phrase ranking formed by these three titles is therefore: bc (3), ab (1), ac (1), cd (1), bd (1), abc (1), bcd (1).
Step 4: junk phrases are filtered out, removing phrases such as "inquiry ### prize", "### phone", "### consulting", and "mobile phone ### prize"; the user can manage the junk phrases to be filtered through a junk-phrase management back end;
Step 306: the hotspot statistics module 46 forms a hot phrase ranking list. The number of titles behind each hot phrase is counted, the phrases are sorted by title count in descending order, and phrases with a title count greater than 2 are retained; this parameter can be adjusted according to actual data.
In summary, the technical solution of the embodiments of the present invention implements hotspot mining using the hot-word computation principle and combines text classification with hotspot mining. This solves the problems of prior-art network hotspot mining, whose results are not macroscopic enough, cannot reflect the hotspots of a particular field separately by field, are highly repetitive, and are poorly readable. Network hotspots can be mined more macroscopically, reflecting how much attention netizens pay to a given topic at the macro level, so that the mining results better reflect the objective facts of Internet public opinion, repeatedly appearing articles with identical content are merged more easily, and the hotspots of a particular field are reflected in a more targeted manner.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems can also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that various programming languages can be used to implement the content of the invention described herein, and the above description of a specific language is given to disclose the best mode of the present invention.
Numerous specific details are set forth in the description provided herein. However, it should be understood that embodiments of the present invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment can be combined into one module, unit, or component, and can furthermore be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed can be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) can be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the present invention can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) can be used in practice to implement some or all functions of some or all components of the network hotspot mining device according to embodiments of the present invention. The present invention can also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the method described herein. Such programs implementing the present invention can be stored on a computer-readable medium or can take the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present invention, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words can be interpreted as names.
Claims (10)
1. A network hotspot mining device, characterized by comprising:
a classification and storage module, adapted to collect network data, classify the network data, and store it by category;
a filtering and extraction module, adapted to filter the network data under each category according to preset filtering rules, and to extract central words from the filtered network data under each category;
a sorting and combination module, adapted to sort the central words extracted from the same piece of network data, and to combine the sorted central words of that network data to obtain the central phrases of each piece of network data under each category;
a hotspot statistics module, adapted to count the occurrences of the central phrases under their category, obtain the network hotspot phrases under each category, and display them by category.
2. The device of claim 1, characterized in that the network data further comprise: a text title, the article content corresponding to the text title, and the text attributes corresponding to the text title.
3. The device of claim 1 or 2, characterized in that the text attributes further comprise at least one of: the uniform resource locator (URL) of the text, the source forum/blog of the text, the source section of the text, the publication time of the text, the author of the text, the reply count of the text, and the view count of the text.
4. The device of any one of claims 1 to 3, characterized in that the classification and storage module is further adapted to:
use automatic text classification technology to classify the network data according to the article content, obtain the classification label corresponding to the network data, and store the corresponding text title, classification label, and text attributes in an engine;
collect network data from the engine once per predetermined interval, and store the collected network data by classification label in different XML files on a designated server.
5. The device of any one of claims 1 to 4, characterized in that the filtering rules further comprise at least one of:
deleting network data whose text title does not meet a predetermined number of characters;
deleting network data whose publication time does not meet requirements;
deleting network data whose URL contains a predetermined domain name, the predetermined domain name being a domain name in a preset domain-name blacklist; or retaining network data whose URL contains a predetermined domain name;
deleting network data whose source section is a predetermined section, the predetermined section being a section in a preset section blacklist; or retaining network data whose source section is a predetermined section;
deleting network data whose source does not meet requirements, the source including a forum, a blog, or all posts;
deleting network data whose reply count does not meet requirements;
deleting network data whose view count does not meet requirements;
deleting network data whose author does not meet requirements; and
deduplicating the network data.
6. The device of any one of claims 1 to 5, characterized in that the filtering and extraction module is further adapted to apply prefix filtering to the text title according to a preset prefix dictionary before using word segmentation to extract central words from the filtered network data under each category.
7. The device of any one of claims 1 to 6, characterized in that the filtering and extraction module is further adapted to use word segmentation to segment the filtered text titles under each category, obtain a segmentation result, and use the segmentation result as the central words.
8. The device of any one of claims 1 to 7, characterized in that the sorting and combination module is further adapted to filter the extracted central words according to the common words in a preset common-word dictionary before sorting the central words extracted from the same piece of network data.
9. The device of any one of claims 1 to 8, characterized in that the sorting and combination module is further adapted to combine the sorted central words belonging to the same text title according to $C_n^r$, where n is the total number of central words belonging to the same text title, r ≤ n, and 2 ≤ r ≤ 5.
10. A network hotspot mining method, characterized by comprising:
collecting network data, classifying the network data, and storing it by category;
filtering the network data under each category according to preset filtering rules, and extracting central words from the filtered network data under each category;
sorting the central words extracted from the same piece of network data, and combining the sorted central words of that network data to obtain the central phrases of each piece of network data under each category;
counting the occurrences of the central phrases under their category, obtaining the network hotspot phrases under each category, and displaying them by category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610225018.0A CN105912670A (en) | 2012-09-18 | 2012-09-18 | Method and device for network hotspot excavation |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210346827.9A CN102831248B (en) | 2012-09-18 | 2012-09-18 | Network focus method for digging and device |
CN201610225018.0A CN105912670A (en) | 2012-09-18 | 2012-09-18 | Method and device for network hotspot excavation |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210346827.9A Division CN102831248B (en) | 2012-09-18 | 2012-09-18 | Network focus method for digging and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105912670A true CN105912670A (en) | 2016-08-31 |
Family
ID=47334383
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610225018.0A Pending CN105912670A (en) | 2012-09-18 | 2012-09-18 | Method and device for network hotspot excavation |
CN201210346827.9A Expired - Fee Related CN102831248B (en) | 2012-09-18 | 2012-09-18 | Network focus method for digging and device |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210346827.9A Expired - Fee Related CN102831248B (en) | 2012-09-18 | 2012-09-18 | Network focus method for digging and device |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN105912670A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423444A (en) * | 2017-08-10 | 2017-12-01 | 世纪龙信息网络有限责任公司 | Hot word phrase extracting method and system |
CN107967299A (en) * | 2017-11-03 | 2018-04-27 | 中国农业大学 | The hot word extraction method and system of a kind of facing agricultural public sentiment |
CN108182191A (en) * | 2016-12-08 | 2018-06-19 | 腾讯科技(深圳)有限公司 | A kind of hot spot data processing method and its equipment |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902596B (en) * | 2012-12-28 | 2017-10-20 | 中国电信股份有限公司 | High frequency content of pages clustering method and system |
CN103324718B (en) * | 2013-06-25 | 2016-08-10 | 百度在线网络技术(北京)有限公司 | Method and system based on humongous search Web log mining topic venation |
CN103761234A (en) * | 2013-10-29 | 2014-04-30 | 北京奇虎科技有限公司 | Method and device for optimizing search ranking of network resource point |
CN103544294B (en) * | 2013-10-30 | 2017-02-01 | 北京京东尚科信息技术有限公司 | Keyword popularity automatic control method |
CN103580997B (en) * | 2013-11-19 | 2017-09-29 | 湖南蚁坊软件有限公司 | The extracting method and its device of a kind of popular microblogging in vertical field |
CN104714820A (en) * | 2013-12-17 | 2015-06-17 | 青岛龙泰天翔通信科技有限公司 | Cloud on-line updating method |
CN105095175B (en) * | 2014-04-18 | 2019-04-30 | 北京搜狗科技发展有限公司 | Method and device for obtaining a truncated web page title |
CN105095318B (en) * | 2014-05-22 | 2019-02-26 | 北京启明星辰信息安全技术有限公司 | Method and apparatus for implementing central issue analysis |
CN105373551A (en) * | 2014-08-25 | 2016-03-02 | 阿里巴巴集团控股有限公司 | Method and server for determining a sensitive resource processing policy |
CN105989176A (en) * | 2015-03-05 | 2016-10-05 | 北大方正集团有限公司 | Data processing method and device |
CN108108346B (en) * | 2016-11-25 | 2021-12-24 | 广东亿迅科技有限公司 | Method and device for extracting topic feature words from a document |
CN107133201B (en) * | 2017-04-21 | 2021-03-16 | 东莞中国科学院云计算产业技术创新与育成中心 | Hot spot information acquisition method and device based on text code recognition |
CN108881968B (en) * | 2017-05-15 | 2020-10-30 | 北京国双科技有限公司 | Online video advertisement placement method and system |
CN107315838A (en) * | 2017-07-17 | 2017-11-03 | 深圳源广安智能科技有限公司 | Efficient network hotspot mining system |
CN108712403B (en) * | 2018-05-04 | 2020-08-04 | 哈尔滨工业大学(威海) | Illegal domain name mining method based on domain name construction similarity |
CN110516066B (en) * | 2019-07-23 | 2022-04-15 | 同盾控股有限公司 | Text content safety protection method and device |
CN110765115A (en) * | 2019-09-27 | 2020-02-07 | 上海麦克风文化传媒有限公司 | Method for combining multiple sorting categories |
CN110929160B (en) * | 2019-12-02 | 2024-05-10 | 上海麦克风文化传媒有限公司 | Optimization method for system ordering result |
CN110888986B (en) * | 2019-12-06 | 2023-05-30 | 北京明略软件系统有限公司 | Information pushing method, device, electronic equipment and computer readable storage medium |
CN111580921B (en) * | 2020-05-15 | 2021-10-22 | 北京字节跳动网络技术有限公司 | Content creation method and device |
CN112380339B (en) * | 2020-11-23 | 2024-09-20 | 北京达佳互联信息技术有限公司 | Hot event mining method, device and server |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101420356A (en) * | 2008-05-30 | 2009-04-29 | 北京天腾时空信息科技有限公司 | Method and apparatus for classified processing of network content |
CN101788988A (en) * | 2009-01-22 | 2010-07-28 | 蔡亮华 | Information extraction method |
CN101923544A (en) * | 2009-06-15 | 2010-12-22 | 北京百分通联传媒技术有限公司 | Method for monitoring and displaying Internet hot spots |
CN102004792A (en) * | 2010-12-07 | 2011-04-06 | 百度在线网络技术(北京)有限公司 | Method and system for generating hot search words |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8046361B2 (en) * | 2008-04-18 | 2011-10-25 | Yahoo! Inc. | System and method for classifying tags of content using a hyperlinked corpus of classified web pages |
2012
- 2012-09-18: CN201610225018.0A (CN105912670A), active, Pending
- 2012-09-18: CN201210346827.9A (CN102831248B), not active, Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
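罗引 (Luo Yin): "Research on Internet Public Opinion Discovery and Opinion Mining Technology" (互联网舆情发现与观点挖掘技术研究), Master's thesis, University of Electronic Science and Technology of China * |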
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108182191A (en) * | 2016-12-08 | 2018-06-19 | 腾讯科技(深圳)有限公司 | Hotspot data processing method and device |
CN108182191B (en) * | 2016-12-08 | 2022-01-18 | 腾讯科技(深圳)有限公司 | Hotspot data processing method and device |
CN107423444A (en) * | 2017-08-10 | 2017-12-01 | 世纪龙信息网络有限责任公司 | Hot word phrase extraction method and system |
CN107423444B (en) * | 2017-08-10 | 2020-05-19 | 世纪龙信息网络有限责任公司 | Hot word phrase extraction method and system |
CN107967299A (en) * | 2017-11-03 | 2018-04-27 | 中国农业大学 | Agricultural public opinion-oriented hot word extraction method and system |
CN107967299B (en) * | 2017-11-03 | 2020-05-12 | 中国农业大学 | Agricultural public opinion-oriented automatic hot word extraction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN102831248A (en) | 2012-12-19 |
CN102831248B (en) | 2016-05-11 |
Similar Documents
Publication | Title |
---|---|
CN102831248B (en) | Network hotspot mining method and device |
CN102945290B (en) | Hot microblog topic mining device and method |
CN102968297B (en) | Software management system and method for a mobile terminal |
CN102982157A (en) | Device and method for mining microblog hot topics |
CN103218431B (en) | System capable of identifying web page information and collecting it automatically |
CN103136358B (en) | Method for automatically extracting forum data |
CN104933093A (en) | Regional public opinion monitoring and decision-making auxiliary system and method based on big data |
CN104281607A (en) | Microblog hot topic analyzing method |
CN106383887A (en) | Environment-friendly news data acquisition and recommendation display method and system |
CN104820685A (en) | Social network searching method and system |
CN103744877A (en) | Public opinion monitoring application system deployed on the Internet and application method |
CN103955505A (en) | Microblog-based real-time event monitoring method and system |
CN102521767A (en) | Method and system for publishing network advertising information |
EP3014414A2 (en) | Real-time and adaptive data mining |
CN107437038A (en) | Method and device for detecting web page tampering |
CN103617169A (en) | Microblog hot topic extracting method based on Hadoop |
CN110134845A (en) | Project public sentiment monitoring method, device, computer equipment and storage medium |
CN104536956A (en) | Microblog platform-based event visualization method and system |
CN107220745A (en) | Recognition method, system and device for intended behavior data |
CN105718590A (en) | Multi-tenant oriented SaaS public opinion monitoring system and method |
CN103235827B (en) | Method for automatic classification and screening of scientific and technical information |
CN103778225A (en) | Processing method, identification device and identification system for advertising and marketing language information |
CN103177076A (en) | Public sentiment monitoring system and method based on fixed point websites |
CN104809252A (en) | Internet data extraction system |
CN108733791A (en) | Network event detection method |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160831 |