CN105912670A - Method and device for network hotspot excavation - Google Patents

Method and device for network hotspot excavation

Info

Publication number
CN105912670A
Authority
CN
China
Prior art keywords
network data
text
network
phrase
categories
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610225018.0A
Other languages
Chinese (zh)
Inventor
林英杰
马良
陈强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201610225018.0A priority Critical patent/CN105912670A/en
Publication of CN105912670A publication Critical patent/CN105912670A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and device for network hotspot excavation. The device comprises a classification and storage module, a filtering and extraction module, a sorting and combination module and a hotspot statistics module, wherein the classification and storage module is applicable to collection of network data and is used for classification and sorted storage of the network data; the filtering and extraction module is applicable to separately filtering the network data under each class according to preset filtering rules and extracting keywords from the network data which is filtered under each class; the sorting and combination module is applicable to sorting the keywords extracted from the same network data, combining the sorted keywords of the same network data and obtaining keyword groups of each network data under each class; and the hotspot statistics module is applicable to counting occurrence frequencies of the keyword groups under the corresponding classes and obtaining network hotspot keyword groups under each class respectively for sorted display. By the technical scheme provided by the invention, network hotspots can be excavated more macroscopically; excavation results can more accurately reflect objective facts of Internet public opinions; and hotspots in a certain field can be reflected in a more targeted manner.

Description

Network hotspot mining method and device
This application is a divisional application of application No. 201210346827.9, filed on September 25, 2012 and entitled "Network hotspot mining method and device".
Technical field
The present invention relates to the field of Internet communication, and in particular to a network hotspot mining method and device.
Background
In the prior art, with the development of the Internet, more and more websites have introduced user-generated content (User Generated Content, UGC) functions. Large numbers of netizens flock to forums, blogs and microblogs to publish their opinions and disclose news of all kinds, and thousands of topics emerge from the Internet every day. Obtaining network hotspots from this mass of Internet information more quickly therefore plays a guiding role in understanding social developments and grasping trends in public opinion.
At present, the hotspot mining method commonly used in the prior art applies a weighted calculation under predetermined conditions to the repost count, click count and reply count of a text within a specific time period to obtain a heat value for the text, and obtains the hottest texts by sorting on the heat value. However, this prior-art scheme has the following problems: 1. Because statistics are computed only on the attributes of individual texts, the hot topics obtained reflect only the popularity of a particular article at the micro level and cannot reflect the popularity of a topic of netizen concern at the macro level. 2. Because the statistical sample set is the full data set and no statistical analysis is performed on the text content, the results are untargeted and cannot reflect the hotspots of a given field on a per-field basis. 3. The prior-art scheme can only count texts of the same content with identical features, so the results are highly repetitive and poorly readable.
Summary of the invention
The present invention provides a network hotspot mining method and device, to solve the problems in the prior art that network hotspot mining results are not macroscopic enough, cannot reflect the hotspots of a given field on a per-field basis, and are highly repetitive and poorly readable.
The present invention provides a network hotspot mining method, comprising: collecting network data, classifying the network data and storing it by category; filtering the network data under each category according to preset filtering rules, and extracting keywords from the filtered network data under each category; sorting the keywords extracted from the same piece of network data, and combining the sorted keywords of that piece of network data to obtain the keyword groups of each piece of network data under each category; and counting the number of occurrences of the keyword groups under their category, obtaining the network hotspot keyword groups under each category, and displaying them by category.
Optionally, the network data comprises: a text title, article content corresponding to the text title, and text attributes corresponding to the text title.
Optionally, the text attributes further comprise at least one of: the uniform resource locator (URL) of the text, the source forum/blog of the text, the source board of the text, the publication time of the text, the author of the text, the reply count of the text, and the view count of the text.
Optionally, classifying the network data and storing it by category further comprises: using automatic text classification to classify the network data according to the article content, obtaining a classification label corresponding to the network data, and storing the corresponding text title, the corresponding classification label and the corresponding text attributes in an engine; and collecting network data from the engine once every predetermined time interval, and storing the collected network data by category, according to the classification labels, in different XML files on a designated server.
Optionally, the filtering rules further comprise at least one of: deleting network data whose text title does not meet a predetermined character count; deleting network data whose publication time does not meet the requirements; deleting network data whose URL contains a predetermined domain name, where the predetermined domain name is a domain name in a preset domain-name blacklist, or alternatively retaining network data whose URL contains a predetermined domain name; deleting network data whose source board is a predetermined board, where the predetermined board is a board in a preset board blacklist, or alternatively retaining network data whose source board is a predetermined board; deleting network data whose source does not meet the requirements, where the source is one of: forums, blogs, or all posts; deleting network data whose reply count does not meet the requirements; deleting network data whose view count does not meet the requirements; deleting network data whose author does not meet the requirements; and deduplicating the network data.
Optionally, before keywords are extracted from the filtered network data under each category using word segmentation, the method further comprises: filtering prefixes from the text title according to a preset prefix dictionary.
Optionally, extracting keywords from the filtered network data under each category using word segmentation further comprises: segmenting the filtered text titles under each category to obtain segmentation results, and taking the segmentation results as the keywords.
Optionally, before the keywords extracted from the same piece of network data are sorted, the method further comprises: filtering the extracted keywords against the common words in a preset common-word dictionary.
Optionally, combining the sorted keywords of the same piece of network data further comprises: combining the sorted keywords belonging to the same text title according to $C_n^r$, where n is the total number of keywords belonging to the same text title, r ≤ n and 2 ≤ r ≤ 5.
Optionally, after the sorted keywords of the same piece of network data are combined and the keyword groups of each piece of network data under each category are obtained, the method further comprises: filtering junk groups out of the keyword groups according to a preset junk dictionary.
Optionally, counting the number of occurrences of the keyword groups under their category and obtaining the network hotspot keyword groups under each category further comprises: counting the number of occurrences of each keyword group in different text titles under its category, and arranging the keyword groups whose occurrence count exceeds a predetermined threshold in a predetermined order, to obtain the network hotspot keyword groups under each category.
Optionally, after the network hotspot keyword groups under each category are obtained, the method further comprises: merging identical network hotspot keyword groups under the same category; calculating the heat value corresponding to each network hotspot keyword group under each category; and searching for links to the hot events corresponding to the network hotspot keyword groups under each category.
Optionally, displaying by category further comprises: displaying a hotspot report to the user, where the hotspot report comprises: the category of each network hotspot keyword group, the network hotspot keyword groups under each category within a predetermined time period, the heat value corresponding to each network hotspot keyword group under each category, and the links to the hot events corresponding to the network hotspot keyword groups under each category; the predetermined time period comprises at least one of: hourly, daily, weekly, and monthly.
The present invention also provides a network hotspot mining device, comprising: a classification and storage module, adapted to collect network data, classify the network data and store it by category; a filtering and extraction module, adapted to filter the network data under each category according to preset filtering rules and to extract keywords from the filtered network data under each category; a sorting and combination module, adapted to sort the keywords extracted from the same piece of network data and to combine the sorted keywords of that piece of network data to obtain the keyword groups of each piece of network data under each category; and a hotspot statistics module, adapted to count the number of occurrences of the keyword groups under their category, obtain the network hotspot keyword groups under each category, and display them by category.
Optionally, the network data comprises: a text title, article content corresponding to the text title, and text attributes corresponding to the text title.
Optionally, the text attributes further comprise at least one of: the uniform resource locator (URL) of the text, the source forum/blog of the text, the source board of the text, the publication time of the text, the author of the text, the reply count of the text, and the view count of the text.
Optionally, the classification and storage module is further adapted to: use automatic text classification to classify the network data according to the article content, obtain a classification label corresponding to the network data, and store the corresponding text title, the corresponding classification label and the corresponding text attributes in an engine; and collect network data from the engine once every predetermined time interval, and store the collected network data by category, according to the classification labels, in different XML files on a designated server.
Optionally, the filtering rules further comprise at least one of: deleting network data whose text title does not meet a predetermined character count; deleting network data whose publication time does not meet the requirements; deleting network data whose URL contains a predetermined domain name, where the predetermined domain name is a domain name in a preset domain-name blacklist, or alternatively retaining network data whose URL contains a predetermined domain name; deleting network data whose source board is a predetermined board, where the predetermined board is a board in a preset board blacklist, or alternatively retaining network data whose source board is a predetermined board; deleting network data whose source does not meet the requirements, where the source is one of: forums, blogs, or all posts; deleting network data whose reply count does not meet the requirements; deleting network data whose view count does not meet the requirements; deleting network data whose author does not meet the requirements; and deduplicating the network data.
Optionally, the filtering and extraction module is further adapted to: before keywords are extracted from the filtered network data under each category using word segmentation, filter prefixes from the text title according to a preset prefix dictionary.
Optionally, the filtering and extraction module is further adapted to: segment the filtered text titles under each category using word segmentation to obtain segmentation results, and take the segmentation results as the keywords.
Optionally, the sorting and combination module is further adapted to: before the keywords extracted from the same piece of network data are sorted, filter the extracted keywords against the common words in a preset common-word dictionary.
Optionally, the sorting and combination module is further adapted to: combine the sorted keywords belonging to the same text title according to $C_n^r$, where n is the total number of keywords belonging to the same text title, r ≤ n and 2 ≤ r ≤ 5.
Optionally, the sorting and combination module is further adapted to: after the sorted keywords of the same piece of network data are combined and the keyword groups of each piece of network data under each category are obtained, filter junk groups out of the keyword groups according to a preset junk dictionary.
Optionally, the hotspot statistics module is further adapted to: count the number of occurrences of each keyword group in different text titles under its category, and arrange the keyword groups whose occurrence count exceeds a predetermined threshold in a predetermined order, to obtain the network hotspot keyword groups under each category.
Optionally, the hotspot statistics module is further adapted to: merge identical network hotspot keyword groups under the same category; calculate the heat value corresponding to each network hotspot keyword group under each category; and search for links to the hot events corresponding to the network hotspot keyword groups under each category.
Optionally, the hotspot statistics module is further adapted to: display a hotspot report to the user, where the hotspot report comprises: the category of each network hotspot keyword group, the network hotspot keyword groups under each category within a predetermined time period, the heat value corresponding to each network hotspot keyword group under each category, and the links to the hot events corresponding to the network hotspot keyword groups under each category; the predetermined time period comprises at least one of: hourly, daily, weekly, and monthly.
The beneficial effects of the present invention are as follows:
Hotspot mining is implemented using the principle of hot-word calculation, and text classification is combined with hotspot mining. This solves the problems in the prior art that network hotspot mining results are not macroscopic enough, cannot reflect the hotspots of a given field on a per-field basis, and are highly repetitive and poorly readable. Network hotspots can be mined more macroscopically, reflecting the popularity of topics of netizen concern at the macro level; the mining results better reflect the objective facts of Internet public opinion; repeated articles with identical content are merged more easily; and the hotspots of a given field can be reflected in a more targeted manner.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features and advantages of the present invention become more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flowchart of the network hotspot mining method of an embodiment of the present invention;
Fig. 2 is a schematic diagram of the filtering rules of an embodiment of the present invention;
Fig. 3 is a schematic diagram of the detailed processing of the network hotspot mining method of an embodiment of the present invention;
Fig. 4 is a structural diagram of the network hotspot mining device of an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
To solve the problems in the prior art that network hotspot mining results are not macroscopic enough, cannot reflect the hotspots of a given field on a per-field basis, and are highly repetitive and poorly readable, the present invention provides a network hotspot mining method and device, which are implemented using automatic text classification and hot-word calculation. The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.
According to an embodiment of the present invention, a network hotspot mining method is provided. Fig. 1 is a flowchart of the network hotspot mining method of the embodiment. As shown in Fig. 1, the method comprises the following processing:
Step 101: collect network data, classify the network data and store it by category;
The network data in step 101 specifically comprises: a text title, article content corresponding to the text title, and text attributes corresponding to the text title. The text attributes specifically comprise at least one of: the uniform resource locator (Uniform/Universal Resource Locator, URL) of the text, the source forum/blog of the text, the source board of the text, the publication time of the text, the author of the text, the reply count of the text, and the view count of the text.
In step 101, classifying the network data and storing it by category specifically comprises:
Step 1: use automatic text classification to classify the network data according to the article content, obtain a classification label corresponding to the network data, and store the corresponding text title, the corresponding classification label and the corresponding text attributes in an engine. Here, automatic text classification means using the principles of machine learning, relying on model parameters learned from a small sample, to automatically assign classification labels to a text collection (or other entities or objects) according to a given taxonomy or standard.
Step 2: collect network data from the engine once every predetermined time interval, and store the collected network data by category, according to the classification labels, in different XML files on a designated server. The predetermined time may be 1 hour, 6 hours or 1 day; in embodiments of the present invention it can be set flexibly according to the characteristics of the collected data (for example, its update speed).
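As an illustration of Step 2, the following minimal Python sketch groups already-labelled records by classification label and serializes each category as its own XML document. The record fields (`label`, `title`, `url`, `board`, `posted_at`), the element names and the sample data are assumptions made for this sketch; the source does not specify the XML layout.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def group_by_label(records):
    """Group collected records by their classification label."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["label"]].append(record)
    return grouped

def category_to_xml(label, records):
    """Serialize the records of one category as an XML string (hypothetical layout)."""
    root = ET.Element("category", {"label": label})
    for record in records:
        item = ET.SubElement(root, "item")
        for field in ("title", "url", "board", "posted_at"):
            ET.SubElement(item, field).text = str(record.get(field, ""))
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample records, as they might come back from the engine.
records = [
    {"label": "auto", "title": "New sedan launch draws crowds",
     "url": "http://example.com/1", "board": "news", "posted_at": "2012-09-25"},
    {"label": "sports", "title": "Local team wins the cup",
     "url": "http://example.com/2", "board": "football", "posted_at": "2012-09-25"},
    {"label": "auto", "title": "Fuel prices rise again",
     "url": "http://example.com/3", "board": "news", "posted_at": "2012-09-25"},
]

# One XML document per classification label.
xml_by_label = {
    label: category_to_xml(label, items)
    for label, items in group_by_label(records).items()
}
```

In a deployment along the lines described, each string would be written to a separate file on the designated server, and the collection would be repeated at the chosen interval.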
Step 102: filter the network data under each category according to preset filtering rules, and extract keywords from the filtered network data under each category;
Preferably, Fig. 2 is a schematic diagram of the filtering rules of the embodiment. As shown in Fig. 2, in the embodiment the filtering rules specifically comprise at least one of: 1. deleting network data whose text title does not meet a predetermined character count; 2. deleting network data whose publication time does not meet the requirements; 3. deleting network data whose URL contains a predetermined domain name, where the predetermined domain name is a domain name in a preset domain-name blacklist, or alternatively retaining network data whose URL contains a predetermined domain name; 4. deleting network data whose source board is a predetermined board, where the predetermined board is a board in a preset board blacklist, or alternatively retaining network data whose source board is a predetermined board; 5. deleting network data whose source does not meet the requirements, where the source is one of: forums, blogs, or all posts; 6. deleting network data whose reply count does not meet the requirements; 7. deleting network data whose view count does not meet the requirements; 8. deleting network data whose author does not meet the requirements; 9. deduplicating the network data.
It should be noted that the filtering rules in the embodiment are not limited to the nine rules listed above. In the embodiment, the filtering rules can be configured as needed; for example, a rule may be set to delete network data whose article length does not exceed a predetermined character-count threshold.
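The blacklist and deduplication rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the blacklist contents, the record fields (`url`, `board`) and the choice to deduplicate on the exact URL are assumptions for the example.

```python
from urllib.parse import urlparse

DOMAIN_BLACKLIST = {"ads.example.com"}    # hypothetical blacklist entries
BOARD_BLACKLIST = {"newcomer check-in"}   # hypothetical blacklist entries

def is_blacklisted(post):
    """True if the post's host or source board appears on a blacklist."""
    host = urlparse(post["url"]).hostname or ""
    return host in DOMAIN_BLACKLIST or post["board"] in BOARD_BLACKLIST

def deduplicate(posts):
    """Keep the first post seen for each URL (assumed deduplication key)."""
    seen = set()
    unique = []
    for post in posts:
        if post["url"] not in seen:
            seen.add(post["url"])
            unique.append(post)
    return unique

def apply_blacklists(posts):
    """Deduplicate, then drop blacklisted posts."""
    return [p for p in deduplicate(posts) if not is_blacklisted(p)]

posts = [
    {"url": "http://forum.example.com/t/1", "board": "current affairs"},
    {"url": "http://forum.example.com/t/1", "board": "current affairs"},  # duplicate
    {"url": "http://ads.example.com/t/9", "board": "current affairs"},    # bad domain
    {"url": "http://forum.example.com/t/2", "board": "newcomer check-in"},  # bad board
]
kept = apply_blacklists(posts)
```

Keeping each rule as a small predicate makes it straightforward to enable, disable or reconfigure individual rules, which matches the statement that the rule set can be configured as needed.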
In addition, in step 102, before keywords are extracted and in order to extract the needed keywords more reliably, prefixes can be filtered from the text title according to a preset prefix dictionary; for example, unwanted prefixes such as "Mop University Student Base" and "Tianya Zatan" (forum and board names) are filtered out. These prefixes do not take part in keyword extraction. Further, in the embodiment, word segmentation can be used to extract keywords from the filtered network data under each category. Specifically, the filtered text titles under each category can be segmented to obtain segmentation results, and the segmentation results are taken as the keywords. It should be noted that word segmentation is a mature keyword extraction technique in the prior art, and the embodiment may also use other techniques to extract keywords.
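The prefix filtering and keyword extraction just described might look like the sketch below. The prefix entries are hypothetical, and whitespace splitting is only a stand-in for a real word segmenter: Chinese titles would need a dedicated segmentation tool, which the source does not name.

```python
TITLE_PREFIXES = ("[Forum Digest]", "[Campus Base]")  # hypothetical prefix dictionary

def strip_prefix(title, prefixes=TITLE_PREFIXES):
    """Remove a known board/forum prefix from the start of a title, if present."""
    for prefix in prefixes:
        if title.startswith(prefix):
            return title[len(prefix):].strip()
    return title.strip()

def extract_keywords(title):
    """Stand-in segmentation: lowercase the title and split on whitespace."""
    return strip_prefix(title).lower().split()

keywords = extract_keywords("[Forum Digest] Flood hits city")
```

Stripping the prefix before segmentation keeps forum or board names from being counted as keywords of every title posted under them.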
Step 103: sort the keywords extracted from the same piece of network data, and combine the sorted keywords of that piece of network data to obtain the keyword groups of each piece of network data under each category;
Step 103 is implemented by hot-word calculation, which means: automatically segmenting web page text collected in real time, grouping and merging the results, calculating high-frequency hot keywords, filtering according to predefined dictionaries and preset rules, and outputting real-time Internet hot vocabulary.
In step 103, before the keywords extracted from the same piece of network data are sorted, the extracted keywords can be filtered against the common words in a preset common-word dictionary. Common words here are words such as "original", "reprint" and "photo set", which need to be filtered out.
Further, in step 103, combining keywords means: combining the sorted keywords belonging to the same text title according to $C_n^r$, where n is the total number of keywords belonging to the same text title, r ≤ n and 2 ≤ r ≤ 5.
After step 103 is performed, in the embodiment, junk groups can preferably be filtered out of the keyword groups according to a preset junk dictionary.
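A minimal sketch of the sorting and combination in step 103, using Python's `itertools.combinations` to enumerate every choice of r keywords for 2 ≤ r ≤ 5. The common-word and junk dictionaries below are illustrative assumptions, as is the removal of duplicate keywords within one title.

```python
from itertools import combinations

COMMON_WORDS = {"original", "reprint", "photo set"}  # illustrative common-word dictionary
JUNK_GROUPS = {("click", "here")}                    # hypothetical junk dictionary

def keyword_groups(keywords, min_size=2, max_size=5):
    """Filter common words, sort, and build all keyword groups of 2 to 5 words."""
    # Assumption: repeated keywords within one title are counted once.
    kept = sorted(set(k for k in keywords if k not in COMMON_WORDS))
    groups = []
    for r in range(min_size, min(max_size, len(kept)) + 1):
        groups.extend(combinations(kept, r))
    return [g for g in groups if g not in JUNK_GROUPS]

groups = keyword_groups(["b", "c", "a", "original"])
# groups == [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
```

Sorting before combining means that two titles containing the same keywords in a different order yield identical groups, which is what allows them to be counted together later. For a title with n = 3 keywords this produces 3 groups of two and 1 group of three, 4 groups in total.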
Step 104: count the number of occurrences of the keyword groups under their category, obtain the network hotspot keyword groups under each category, and display them by category.
Step 104 specifically comprises the following processing: counting the number of occurrences of each keyword group in different text titles under its category, and arranging the keyword groups whose occurrence count exceeds a predetermined threshold in a predetermined order, to obtain the network hotspot keyword groups under each category. The predetermined order may be descending order of occurrence count.
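The counting in step 104 can be sketched with `collections.Counter`. The input format (one list of keyword groups per title, all from a single category) and the alphabetical tie-break are assumptions for this example.

```python
from collections import Counter

def hot_keyword_groups(groups_per_title, threshold):
    """Count in how many titles each group occurs; keep those above the threshold."""
    counts = Counter()
    for groups in groups_per_title:
        counts.update(set(groups))  # count each group at most once per title
    hot = [(group, n) for group, n in counts.items() if n > threshold]
    # Descending by count; ties broken alphabetically (an assumption).
    return sorted(hot, key=lambda item: (-item[1], item[0]))

# Keyword groups of four titles within one category.
groups_per_title = [
    [("a", "b"), ("a", "c")],
    [("a", "b")],
    [("a", "b"), ("a", "c")],
    [("b", "c")],
]
hot = hot_keyword_groups(groups_per_title, threshold=1)
# hot == [(("a", "b"), 3), (("a", "c"), 2)]
```

Running this once per category yields the per-category ranking that the later display step presents.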
After the network hotspot keyword groups under each category are obtained, identical network hotspot keyword groups under the same category can be merged; the heat value corresponding to each network hotspot keyword group under each category can be calculated; and links to the hot events corresponding to the network hotspot keyword groups under each category can be searched for, so as to provide the user with more comprehensive hotspot information.
In step 104, displaying by category means: displaying a hotspot report to the user, where the hotspot report comprises: the category of each network hotspot keyword group, the network hotspot keyword groups under each category within a predetermined time period, the heat value corresponding to each network hotspot keyword group under each category, and the links to the hot events corresponding to the network hotspot keyword groups under each category; the predetermined time period comprises at least one of: hourly, daily, weekly, and monthly.
The technical solution of the embodiment is illustrated below with reference to the drawings.
Fig. 3 is a schematic diagram of the detailed processing of the network hotspot mining method of the embodiment. As shown in Fig. 3, the method specifically comprises the following processing:
Step 301: a machine learning module generates a classification model from a custom corpus; the collected network data is classified using the classification model; and the classification labels are stored in the engine together with the text attributes.
Step 302: data is collected from the engine once per hour and stored by category in different Extensible Markup Language (XML) files on the designated server.
Step 303: the data is filtered by the following filtering rules, and the filtered data is kept in a database; the user can manage the filtering rules through a data filtering rule management back end.
Specifically, the filtering rules of the embodiment comprise:
1. Title filter: keep data whose title length is between 5 and 30 characters;
2. Posting time filter: keep posts whose posting time is the current day;
3. Domain filter: (1) using fuzzy matching, keep posts whose URL contains a given domain name or word; or (2) by domain name, keep posts from 30 current-affairs forums and 20 automobile forums, and posts whose URL contains "auto"; alternatively, posts satisfying both rules (1) and (2) are kept;
4. Board filter: filter according to the URL of the board seed; posts whose board title contains certain Chinese characters can also be kept; for example, posts whose board title contains the word "entertainment" or "gossip" are selected;
5. Domain-name blacklist filter: perform deletion on the results selected above, filtering out posts under a given second-level domain name or with a given word in the second-level URL; for example, among results whose top-level domain is xinhuanet.com, filter out those whose domain name is 120ask.xinhuanet.com;
6. Board blacklist: perform deletion on the results selected above, filtering out posts with a given word in the seed or board name; for example, filter out posts whose board name is "newcomer check-in";
7. Source filter: keep data that matches the filter source, where the filter source is one of: forums, blogs, or all posts;
8. Reply count and click count filter: keep data whose reply count is within 0-1000; keep data whose click count is within 0-10000;
9. Deduplication: deduplicate according to the URL of the post; posts with the same top-level domain are all counted as one post;
10. Filter fields: title, URL, source forum, source board, posting time, author, reply count, view count, and so on;
11. Filter logic: rules 3 and 4 above are in an "or" relationship with each other; all other filtering rules are in an "and" relationship.
Step 304: center words are extracted from all text titles. A title may have multiple center words; the title is segmented using a word segmentation technique, and the segmentation results are the title's center words. Preferably, prefix filtering is applied to the title before segmentation so that such prefixes do not take part in segmentation, for example prefixes such as "Mop University Students Base" and "Tianya Tittle-tattle". The user can manage the prefixes to be filtered through a prefix management back end;
Step 305: hotspot phrase calculation:
Step 1: filter out common words (for example, words such as "original", "reprint", and "photo set") from the segmentation results; the user can manage the common words to be filtered through a common-word management back end;
Step 2: sort the center words remaining after filtering (for example, if the center words extracted from a title are b, c, a, they become a, b, c after sorting);
Step 3: combine the center words of each title. The center words of each title are combined according to $C_n^r$, with the combination formula $C_n^r = \frac{n!}{r!\,(n-r)!}$; only phrases of 2 to 5 words are retained;
The sorting and combination of center words into phrases is illustrated below with an example.
Title one yields the center words b, a, c; after sorting they are a, b, c, forming the phrases ab, bc, ac, abc.
Title two yields the center words c, b, d; after sorting they are b, c, d, forming the phrases bc, cd, bd, bcd.
Title three yields the center words b, c, forming the phrase bc.
The phrase ranking formed by these three titles is therefore: bc (3), ab (1), ac (1), cd (1), bd (1), abc (1), bcd (1).
Step 4: filter junk phrases, removing junk phrases such as "query ### prize-winning", "### phone", "### consulting", and "mobile phone ### prize-winning"; the user can manage the junk phrases to be filtered through a junk-phrase management back end;
Step 306: form the hotspot phrase ranking list: count the number of titles behind each hotspot phrase, sort in descending order of title count, and retain phrases whose title count is more than 2; this parameter can be adjusted according to the actual data;
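The sorting, combination, and ranking described in steps 305 and 306 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `center_phrases` and `rank_phrases` are hypothetical, phrases are formed by concatenating the words as in the a/b/c example, and "more than 2" is read as a strict comparison.

```python
from collections import Counter
from itertools import combinations


def center_phrases(center_words, min_len=2, max_len=5):
    """Sort one title's center words and return every 2- to 5-word combination."""
    ordered = sorted(center_words)
    phrases = []
    for r in range(min_len, min(max_len, len(ordered)) + 1):
        # combinations() keeps the sorted order, so "bc" never appears as "cb".
        phrases.extend("".join(combo) for combo in combinations(ordered, r))
    return phrases


def rank_phrases(titles, min_titles=2):
    """Count titles per phrase; keep phrases found in more than min_titles titles."""
    counts = Counter()
    for words in titles:
        counts.update(set(center_phrases(words)))  # each title votes once per phrase
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(phrase, n) for phrase, n in ranked if n > min_titles]


titles = [["b", "a", "c"], ["c", "b", "d"], ["b", "c"]]
print(center_phrases(titles[0]))  # ['ab', 'ac', 'bc', 'abc']
print(rank_phrases(titles))       # [('bc', 3)]
```

With the three example titles, only bc reaches a title count above 2, so it is the only phrase left in the ranking list at the default setting.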
In summary, the technical solution of the embodiment of the present invention implements hotspot mining using the hot-word calculation principle and combines text classification with hotspot mining. This solves the problems in the prior art that network hotspot mining results are insufficiently macroscopic, cannot reflect the hotspot situation of a particular field, are highly repetitive, and are hard to read. Network hotspots can be mined at a more macroscopic level, reflecting the popularity of a given point of netizen attention; the mining results better reflect the objective state of Internet public opinion, repeated articles with identical content are more easily merged, and the hotspots of a particular field are reflected in a more targeted way.
According to an embodiment of the present invention, a network hotspot mining device is provided. Fig. 4 is a schematic structural diagram of the network hotspot mining device of the embodiment of the present invention. As shown in Fig. 4, the network hotspot mining device according to the embodiment of the present invention includes: a classification storage module 40, a filtering and extraction module 42, a sorting and combination module 44, and a hotspot statistics module 46. Each module of the embodiment of the present invention is described in detail below.
The classification storage module 40 is adapted to collect network data, classify the network data, and store it by category;
The network data specifically includes: a text title, the article content corresponding to the text title, and the text attributes corresponding to the text title. The text attributes specifically include at least one of the following: the URL corresponding to the text, the source forum/blog of the text, the source column of the text, the publication time of the text, the text author, the reply count of the text, and the view count of the text.
The classification storage module 40 is specifically adapted to: 1. use automatic text classification to classify the network data according to the article content, obtain the classification label corresponding to the network data, and store the corresponding text title, classification label, and text attributes in the engine; here, automatic text classification refers to using machine-learning principles, relying on model parameters learned from a small sample, to automatically label a set of texts (or other entities or objects) according to a given classification system or standard. 2. collect network data from the engine once every predetermined time, and store the collected network data, according to the classification labels, in different XML files on a designated server. The predetermined time can be 1 hour, 6 hours, or 1 day; in the embodiment of the present invention, the predetermined time can be set flexibly according to the characteristics of the collected data (for example, its update rate).
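Storing the classified records in per-category XML documents might look like the following Python sketch. The element and attribute names (`posts`, `post`, `category`, `url`, `title`) and the record layout are assumptions made for illustration, since the description does not specify an XML schema. The sketch returns XML strings rather than writing files to a designated server.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_to_xml(records):
    """Group records by classification label and build one XML document per label."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["label"]].append(record)

    documents = {}
    for label, items in grouped.items():
        root = ET.Element("posts", {"category": label})  # hypothetical schema
        for item in items:
            post = ET.SubElement(root, "post", {"url": item["url"]})
            ET.SubElement(post, "title").text = item["title"]
        documents[label] = ET.tostring(root, encoding="unicode")
    return documents


records = [
    {"label": "auto", "url": "http://auto.example.com/1", "title": "New sedan launch"},
    {"label": "news", "url": "http://news.example.com/2", "title": "City council vote"},
    {"label": "auto", "url": "http://auto.example.com/3", "title": "Fuel price change"},
]
for label, xml_text in group_to_xml(records).items():
    print(label, xml_text)
```

In a deployment along the lines described, each string would be written to its own file once per collection interval.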
The filtering and extraction module 42 is adapted to filter the network data under each category according to preset filtering rules, and to extract center words from the filtered network data under each category;
In the embodiment of the present invention, Fig. 2 is a schematic diagram of the filtering rules of the embodiment of the present invention. As shown in Fig. 2, the filtering rules specifically include at least one of the following: 1. deleting network data whose text title does not meet a predetermined word count; 2. deleting network data whose publication time does not meet the requirement; 3. deleting network data whose URL contains a predetermined domain name, where the predetermined domain name is a domain name in a preset domain blacklist; or retaining network data whose URL contains a predetermined domain name; 4. deleting network data whose source column is a predetermined column, where the predetermined column is a column in a preset column blacklist; or retaining network data whose source column is a predetermined column; 5. deleting network data whose source does not meet the requirement, where the source includes: forum, blog, or all posts; 6. deleting network data whose reply count does not meet the requirement; 7. deleting network data whose view count does not meet the requirement; 8. deleting network data whose author does not meet the requirement; 9. deduplicating the network data.
It should be noted that the filtering rules in the embodiment of the present invention are not limited to the 9 rules listed above; filtering rules can be configured as needed, for example: deleting network data whose article word count does not exceed a predetermined word-count threshold.
In addition, to better extract the required center words, before extracting center words the filtering and extraction module 42 is further adapted to apply prefix filtering to the text titles according to a preset prefix dictionary, for example filtering out unneeded prefixes such as "Mop University Students Base" and "Tianya Tittle-tattle". These prefixes do not take part in center-word extraction. Further, in the embodiment of the present invention, the filtering and extraction module 42 can use a word segmentation technique to extract center words from the filtered network data under each category; specifically, the filtering and extraction module 42 can segment the filtered text titles under each category, obtain the segmentation results, and use the segmentation results as the center words. It should be noted that word segmentation is a mature center-word extraction technique in the prior art, and the embodiment of the present invention can also use other techniques to extract center words.
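Prefix filtering followed by segmentation can be sketched as below. The prefix entries and function names are hypothetical, and whitespace splitting is only a stand-in: a real system handling Chinese titles would call a proper word segmenter at that point.

```python
# Sample prefix dictionary; real entries would come from the prefix management back end.
PREFIX_DICTIONARY = ("Mop University Students Base", "Tianya Tittle-tattle")


def strip_prefix(title, prefixes=PREFIX_DICTIONARY):
    """Remove the longest matching dictionary prefix from the start of a title."""
    for prefix in sorted(prefixes, key=len, reverse=True):
        if title.startswith(prefix):
            # Also drop separator characters left behind after the prefix.
            return title[len(prefix):].lstrip(" :-|")
    return title


def extract_center_words(title, prefixes=PREFIX_DICTIONARY):
    """Strip the prefix, then segment; str.split() stands in for a word segmenter."""
    return strip_prefix(title, prefixes).split()


print(extract_center_words("Tianya Tittle-tattle: housing prices rise"))
# ['housing', 'prices', 'rise']
```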
The sorting and combination module 44 is adapted to sort the center words extracted from the same network data item, and to combine the sorted center words of that item to obtain the center phrases of each network data item under each category;
The sorting and combination module 44 implements the above processing using a hot-word calculation technique, which refers to: automatically segmenting and grouping/merging web page text collected in real time, computing high-frequency hotspot keywords, filtering them according to predefined dictionaries and preset rules, and outputting real-time Internet hotspot vocabulary.
Before sorting the center words extracted from the same network data item, the sorting and combination module 44 can filter the extracted center words according to the common words in a preset common-word dictionary. Common words are words such as "original", "reprint", and "photo set", which need to be filtered out.
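A minimal sketch of common-word filtering followed by sorting is shown below. The dictionary contents and the function name are illustrative, and lexicographic order is an assumption, since the description does not state which sort key is used.

```python
# Sample common-word dictionary; real entries would be managed by the user.
COMMON_WORDS = {"original", "reprint", "photo set"}


def filter_and_sort(center_words, common_words=COMMON_WORDS):
    """Drop dictionary words, then sort so equal word sets yield equal sequences."""
    return sorted(word for word in center_words if word not in common_words)


print(filter_and_sort(["reprint", "b", "a", "original", "c"]))  # ['a', 'b', 'c']
```

Sorting before combination is what lets two titles that mention the same words in different order contribute to the same phrase.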
The center-word combination performed by the sorting and combination module 44 refers to: the sorting and combination module 44 combines the sorted center words belonging to the same text title according to $C_n^r$, where n is the total number of center words belonging to the same text title, r ≤ n, and 2 ≤ r ≤ 5.
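As a worked check of the combination rule, and assuming the standard binomial coefficient is intended, the number of center phrases produced from a title with n center words can be written as follows. The symbol N(n) is introduced here for illustration only.

```latex
\begin{aligned}
N(n) &= \sum_{r=2}^{\min(n,5)} C_n^r, \qquad C_n^r = \frac{n!}{r!\,(n-r)!} \\
N(3) &= C_3^2 + C_3^3 = 3 + 1 = 4 \\
N(6) &= C_6^2 + C_6^3 + C_6^4 + C_6^5 = 15 + 20 + 15 + 6 = 56
\end{aligned}
```

The value N(3) = 4 is consistent with a title whose center words a, b, c yield the four phrases ab, ac, bc, and abc. The cap r ≤ 5 keeps the phrase count from growing with every possible subset size for titles with many center words.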
Preferably, after combining the sorted center words of the same network data item and obtaining the center phrases of each network data item under each category, the sorting and combination module 44 is further adapted to filter junk phrases out of the center phrases according to a preset junk dictionary.
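Junk-phrase filtering against a dictionary could be sketched as follows. This assumes that `###` in a dictionary entry acts as a wildcard for one or more arbitrary characters; the description shows such entries but does not define their matching semantics, so the matching rule, sample entries, and function names here are all hypothetical.

```python
import re

# Sample junk dictionary entries, with ### assumed to be a wildcard.
JUNK_PATTERNS = ("query ### prize-winning", "### phone", "### consulting")


def compile_junk(pattern):
    """Turn a dictionary entry into a regex, reading ### as 'one or more characters'."""
    return re.compile(".+".join(re.escape(part) for part in pattern.split("###")))


JUNK_REGEXES = [compile_junk(p) for p in JUNK_PATTERNS]


def remove_junk(phrases, regexes=JUNK_REGEXES):
    """Keep only phrases that match none of the junk patterns in full."""
    return [p for p in phrases if not any(rx.fullmatch(p) for rx in regexes)]


print(remove_junk(["bank phone", "query lottery prize-winning", "housing prices"]))
# ['housing prices']
```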
The hotspot statistics module 46 is adapted to count the occurrences of each center phrase under its category, obtain the network hotspot phrases under each category, and display them by category.
The hotspot statistics module 46 is specifically adapted to: count the number of occurrences of each center phrase in different text titles under its category, and arrange the center phrases whose occurrence count exceeds a predetermined threshold in a predetermined order, thereby obtaining the network hotspot phrases under each category.
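Per-category counting with a threshold can be sketched as below. The record layout, the function name, the default threshold of 2, and the choice of descending count as the "predetermined order" are assumptions for illustration.

```python
from collections import Counter, defaultdict


def hotspot_phrases(records, threshold=2):
    """records: iterable of (category, phrases_of_one_title) pairs."""
    per_category = defaultdict(Counter)
    for category, phrases in records:
        # set() ensures a phrase is counted once per title, i.e. per distinct title.
        per_category[category].update(set(phrases))

    result = {}
    for category, counts in per_category.items():
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[category] = [(p, n) for p, n in ranked if n > threshold]
    return result


records = [
    ("auto", ["ab", "bc"]),
    ("auto", ["bc", "cd"]),
    ("auto", ["bc", "bc"]),
    ("news", ["bc"]),
]
print(hotspot_phrases(records))  # {'auto': [('bc', 3)], 'news': []}
```

Keeping a separate counter per category means a phrase that is frequent in one field does not inflate its count in another.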
After obtaining the network hotspot phrases under each category, the hotspot statistics module 46 is further adapted to: merge identical network hotspot phrases under the same category; calculate the hot value corresponding to the network hotspot phrases under each category; and search for links to the hotspot events corresponding to the network hotspot phrases under each category.
Display by category by the hotspot statistics module 46 refers to: presenting a hotspot report to the user, where the hotspot report includes: the category of each network hotspot phrase, the network hotspot phrases under each category within a predetermined time period, the hot value corresponding to the network hotspot phrases under each category, and links to the hotspot events corresponding to the network hotspot phrases under each category; the predetermined time period includes at least one of the following: hourly, daily, weekly, and monthly.
The technical solution of the embodiment of the present invention is illustrated below with reference to the accompanying drawings.
Fig. 3 is a detailed processing flowchart of the network hotspot mining method of the embodiment of the present invention. As shown in Fig. 3, the network hotspot mining method according to the embodiment of the present invention specifically includes the following processing:
Step 301: a machine learning module generates a classification model from a custom corpus; the classification storage module 40 uses the classification model to perform text classification on the collected network data, and stores the classification labels in the engine together with the text attributes.
Step 302: the classification storage module 40 collects data from the engine once per hour and stores the data, by category, in different Extensible Markup Language (XML) files on a designated server.
Step 303: the filtering and extraction module 42 filters the data according to the following filtering rules, and the filtered data is retained in a database; the user can manage the filtering rules through a data-filtering rule management back end.
Specifically, Fig. 3 is a preferred schematic diagram of the filtering rules of the embodiment of the present invention. As shown in Fig. 3, the filtering rules according to the embodiment of the present invention include:
1. Title filtering: retain data whose title length is between 5 and 30 words;
2. Posting-time filtering: retain posts whose posting time is the current day;
3. Domain filtering: (1) using fuzzy matching, retain posts whose URL contains a given domain name or word; or (2) filter by domain name, for example retaining posts from 30 current-affairs forums and 20 automobile forums, as well as posts whose URL contains "auto"; posts that satisfy both rules (1) and (2) are also retained.
4. Column filtering: filter by the URL of the board seed; posts whose column title contains certain Chinese characters can also be selected, for example posts whose column title contains the words "entertainment" or "gossip";
5. Domain blacklist filtering: apply deletion to the results selected above, removing posts whose second-level domain or second-level URL contains a certain word; for example, among results whose top-level domain is xinhuanet.com, remove those whose domain is 120ask.xinhuanet.com;
6. Column blacklist: apply deletion to the results selected above, removing posts whose seed or column name contains a certain word; for example, remove posts whose column name is "newcomer check-in";
7. Source filtering: retain data that matches the filtering source, where the filtering source is: forum, blog, or all posts;
8. Reply-count and click-count filtering: retain data whose reply count is within 0-1000; retain data whose click count is within 0-10000;
9. Deduplication: deduplicate by post URL; posts with the same top-level domain are all counted as one post;
10. Filter fields include: title, URL, source forum, source board, posting time, author, reply count, view count, etc.
11. Filter logic: the 3rd and 4th filtering rules above are in an "or" relationship; all other filtering rules are in an "and" relationship.
Step 304: the filtering and extraction module 42 extracts center words from all text titles. A title may have multiple center words; the title is segmented using a word segmentation technique, and the segmentation results are the title's center words. Preferably, prefix filtering is applied to the title before segmentation so that such prefixes do not take part in segmentation, for example prefixes such as "Mop University Students Base" and "Tianya Tittle-tattle". The user can manage the prefixes to be filtered through a prefix management back end;
Step 305: the sorting and combination module 44 performs the hotspot phrase calculation:
Step 1: filter out common words (for example, words such as "original", "reprint", and "photo set") from the segmentation results; the user can manage the common words to be filtered through a common-word management back end;
Step 2: sort the center words remaining after filtering (for example, if the center words extracted from a title are b, c, a, they become a, b, c after sorting);
Step 3: combine the center words of each title. The center words of each title are combined according to $C_n^r$, with the combination formula $C_n^r = \frac{n!}{r!\,(n-r)!}$; only phrases of 2 to 5 words are retained;
The sorting and combination of center words into phrases is illustrated below with an example.
Title one yields the center words b, a, c; after sorting they are a, b, c, forming the phrases ab, bc, ac, abc.
Title two yields the center words c, b, d; after sorting they are b, c, d, forming the phrases bc, cd, bd, bcd.
Title three yields the center words b, c, forming the phrase bc.
The phrase ranking formed by these three titles is therefore: bc (3), ab (1), ac (1), cd (1), bd (1), abc (1), bcd (1).
Step 4: filter junk phrases, removing junk phrases such as "query ### prize-winning", "### phone", "### consulting", and "mobile phone ### prize-winning"; the user can manage the junk phrases to be filtered through a junk-phrase management back end;
Step 306: the hotspot statistics module 46 forms the hotspot phrase ranking list: it counts the number of titles behind each hotspot phrase, sorts in descending order of title count, and retains phrases whose title count is more than 2; this parameter can be adjusted according to the actual data;
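The way the filtering rules of step 303 combine (domain rule "or" column rule, everything else "and") can be sketched as a single predicate. The `Post` fields, the whitelist terms, and the function name are hypothetical, only a subset of the rules is modelled, and title length is measured in characters here as a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str
    url: str
    column: str
    posted: date
    replies: int
    clicks: int


# Illustrative whitelist terms; real values would come from the rule management back end.
DOMAIN_TERMS = ("auto",)
COLUMN_TERMS = ("entertainment", "gossip")


def keep_post(post, today):
    """Return True when the post passes the combined filtering rules."""
    title_ok = 5 <= len(post.title) <= 30                      # rule 1
    time_ok = post.posted == today                             # rule 2
    domain_ok = any(t in post.url for t in DOMAIN_TERMS)       # rule 3
    column_ok = any(t in post.column.lower() for t in COLUMN_TERMS)  # rule 4
    counts_ok = 0 <= post.replies <= 1000 and 0 <= post.clicks <= 10000  # rule 8
    # Rules 3 and 4 are joined by "or"; every other rule is joined by "and".
    return title_ok and time_ok and (domain_ok or column_ok) and counts_ok


today = date(2012, 9, 18)
car = Post("New sedan launch review", "http://auto.example.com/1", "Cars", today, 10, 500)
star = Post("Celebrity wedding photos", "http://bbs.example.com/2", "Entertainment", today, 5, 100)
rain = Post("Local weather update", "http://bbs.example.com/3", "Weather", today, 5, 100)
print(keep_post(car, today), keep_post(star, today), keep_post(rain, today))
# True True False
```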
In summary, the technical solution of the embodiment of the present invention implements hotspot mining using the hot-word calculation principle and combines text classification with hotspot mining. This solves the problems in the prior art that network hotspot mining results are insufficiently macroscopic, cannot reflect the hotspot situation of a particular field, are highly repetitive, and are hard to read. Network hotspots can be mined at a more macroscopic level, reflecting the popularity of a given point of netizen attention; the mining results better reflect the objective state of Internet public opinion, repeated articles with identical content are more easily merged, and the hotspots of a particular field are reflected in a more targeted way.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed at any particular programming language. It should be understood that various programming languages may be used to implement the invention described herein, and that the above description of a specific language is given to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the network hotspot mining device according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims (10)

1. A network hotspot mining device, characterized by comprising:
a classification storage module, adapted to collect network data, classify the network data, and store it by category;
a filtering and extraction module, adapted to filter the network data under each category according to preset filtering rules, and to extract center words from the filtered network data under each category;
a sorting and combination module, adapted to sort the center words extracted from the same network data item, and to combine the sorted center words of that item to obtain the center phrases of each network data item under each category;
a hotspot statistics module, adapted to count the occurrences of the center phrases under their category, obtain the network hotspot phrases under each category, and display them by category.
2. The device as claimed in claim 1, characterized in that the network data further includes: a text title, the article content corresponding to the text title, and the text attributes corresponding to the text title.
3. The device as claimed in claim 1 or 2, characterized in that the text attributes further include at least one of the following: the uniform resource locator (URL) corresponding to the text, the source forum/blog of the text, the source column of the text, the publication time of the text, the text author, the reply count of the text, and the view count of the text.
4. The device as claimed in any one of claims 1 to 3, characterized in that the classification storage module is further adapted to:
use automatic text classification to classify the network data according to the article content, obtain the classification label corresponding to the network data, and store the corresponding text title, classification label, and text attributes in an engine;
collect network data from the engine once every predetermined time, and store the collected network data, according to the classification labels, in different XML files on a designated server.
5. The device as claimed in any one of claims 1 to 4, characterized in that the filtering rules further include at least one of the following:
deleting network data whose text title does not meet a predetermined word count;
deleting network data whose publication time does not meet the requirement;
deleting network data whose URL contains a predetermined domain name, where the predetermined domain name is a domain name in a preset domain blacklist; or retaining network data whose URL contains a predetermined domain name;
deleting network data whose source column is a predetermined column, where the predetermined column is a column in a preset column blacklist; or retaining network data whose source column is a predetermined column;
deleting network data whose source does not meet the requirement, where the source includes: forum, blog, or all posts;
deleting network data whose reply count does not meet the requirement;
deleting network data whose view count does not meet the requirement;
deleting network data whose author does not meet the requirement; and
deduplicating the network data.
6. The device as claimed in any one of claims 1 to 5, characterized in that the filtering and extraction module is further adapted to: before using a word segmentation technique to extract center words from the filtered network data under each category, apply prefix filtering to the text titles according to a preset prefix dictionary.
7. The device as claimed in any one of claims 1 to 6, characterized in that the filtering and extraction module is further adapted to: use a word segmentation technique to segment the filtered text titles under each category, obtain the segmentation results, and use the segmentation results as the center words.
8. The device as claimed in any one of claims 1 to 7, characterized in that the sorting and combination module is further adapted to: before sorting the center words extracted from the same network data item, filter the extracted center words according to the common words in a preset common-word dictionary.
9. The device as claimed in any one of claims 1 to 8, characterized in that the sorting and combination module is further adapted to: combine the sorted center words belonging to the same text title according to $C_n^r$, where n is the total number of center words belonging to the same text title, r ≤ n, and 2 ≤ r ≤ 5.
10. A network hotspot mining method, characterized by comprising:
collecting network data, classifying the network data, and storing it by category;
filtering the network data under each category according to preset filtering rules, and extracting center words from the filtered network data under each category;
sorting the center words extracted from the same network data item, and combining the sorted center words of that item to obtain the center phrases of each network data item under each category;
counting the occurrences of the center phrases under their category, obtaining the network hotspot phrases under each category, and displaying them by category.
CN201610225018.0A 2012-09-18 2012-09-18 Method and device for network hotspot excavation Pending CN105912670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610225018.0A CN105912670A (en) 2012-09-18 2012-09-18 Method and device for network hotspot excavation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210346827.9A CN102831248B (en) 2012-09-18 2012-09-18 Network focus method for digging and device
CN201610225018.0A CN105912670A (en) 2012-09-18 2012-09-18 Method and device for network hotspot excavation

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201210346827.9A Division CN102831248B (en) 2012-09-18 2012-09-18 Network focus method for digging and device

Publications (1)

Publication Number Publication Date
CN105912670A true CN105912670A (en) 2016-08-31

Family

ID=47334383

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201610225018.0A Pending CN105912670A (en) 2012-09-18 2012-09-18 Method and device for network hotspot excavation
CN201210346827.9A Expired - Fee Related CN102831248B (en) 2012-09-18 2012-09-18 Network focus method for digging and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201210346827.9A Expired - Fee Related CN102831248B (en) 2012-09-18 2012-09-18 Network focus method for digging and device

Country Status (1)

Country Link
CN (2) CN105912670A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423444A (en) * 2017-08-10 2017-12-01 世纪龙信息网络有限责任公司 Hot word phrase extracting method and system
CN107967299A (en) * 2017-11-03 2018-04-27 中国农业大学 The hot word extraction method and system of a kind of facing agricultural public sentiment
CN108182191A (en) * 2016-12-08 2018-06-19 腾讯科技(深圳)有限公司 A kind of hot spot data processing method and its equipment

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902596B (en) * 2012-12-28 2017-10-20 中国电信股份有限公司 High-frequency page content clustering method and system
CN103324718B (en) * 2013-06-25 2016-08-10 百度在线网络技术(北京)有限公司 Method and system for mining topic threads from massive search logs
CN103761234A (en) * 2013-10-29 2014-04-30 北京奇虎科技有限公司 Method and device for optimizing search ranking of network resource point
CN103544294B (en) * 2013-10-30 2017-02-01 北京京东尚科信息技术有限公司 Keyword popularity automatic control method
CN103580997B (en) * 2013-11-19 2017-09-29 湖南蚁坊软件有限公司 Method and device for extracting popular microblogs in a vertical field
CN104714820A (en) * 2013-12-17 2015-06-17 青岛龙泰天翔通信科技有限公司 Cloud on-line updating method
CN105095175B (en) * 2014-04-18 2019-04-30 北京搜狗科技发展有限公司 Method and device for obtaining truncated web page titles
CN105095318B (en) * 2014-05-22 2019-02-26 北京启明星辰信息安全技术有限公司 Method and apparatus for hotspot analysis
CN105373551A (en) * 2014-08-25 2016-03-02 阿里巴巴集团控股有限公司 Method for determining sensitive resource processing policy and server
CN105989176A (en) * 2015-03-05 2016-10-05 北大方正集团有限公司 Data processing method and device
CN108108346B (en) * 2016-11-25 2021-12-24 广东亿迅科技有限公司 Method and device for extracting theme characteristic words of document
CN107133201B (en) * 2017-04-21 2021-03-16 东莞中国科学院云计算产业技术创新与育成中心 Hot spot information acquisition method and device based on text code recognition
CN108881968B (en) * 2017-05-15 2020-10-30 北京国双科技有限公司 Network video advertisement putting method and system
CN107315838A (en) * 2017-07-17 2017-11-03 深圳源广安智能科技有限公司 Efficient network hotspot mining system
CN108712403B (en) * 2018-05-04 2020-08-04 哈尔滨工业大学(威海) Illegal domain name mining method based on domain name construction similarity
CN110516066B (en) * 2019-07-23 2022-04-15 同盾控股有限公司 Text content safety protection method and device
CN110765115A (en) * 2019-09-27 2020-02-07 上海麦克风文化传媒有限公司 Method for combining multiple sorting categories
CN110929160B (en) * 2019-12-02 2024-05-10 上海麦克风文化传媒有限公司 Optimization method for system ordering result
CN110888986B (en) * 2019-12-06 2023-05-30 北京明略软件系统有限公司 Information pushing method, device, electronic equipment and computer readable storage medium
CN111580921B (en) * 2020-05-15 2021-10-22 北京字节跳动网络技术有限公司 Content creation method and device
CN112380339B (en) * 2020-11-23 2024-09-20 北京达佳互联信息技术有限公司 Hot event mining method, device and server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101420356A (en) * 2008-05-30 2009-04-29 北京天腾时空信息科技有限公司 Network content classified processing method and apparatus
CN101788988A (en) * 2009-01-22 2010-07-28 蔡亮华 Information extraction method
CN101923544A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for monitoring and displaying Internet hot spots
CN102004792A (en) * 2010-12-07 2011-04-06 百度在线网络技术(北京)有限公司 Method and system for generating hot-searching word

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8046361B2 (en) * 2008-04-18 2011-10-25 Yahoo! Inc. System and method for classifying tags of content using a hyperlinked corpus of classified web pages

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
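Luo Yin: "Research on Internet Public Opinion Discovery and Opinion Mining Technology", Master's thesis, University of Electronic Science and Technology of China *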

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182191A (en) * 2016-12-08 2018-06-19 腾讯科技(深圳)有限公司 Hotspot data processing method and device
CN108182191B (en) * 2016-12-08 2022-01-18 腾讯科技(深圳)有限公司 Hotspot data processing method and device
CN107423444A (en) * 2017-08-10 2017-12-01 世纪龙信息网络有限责任公司 Hot word phrase extraction method and system
CN107423444B (en) * 2017-08-10 2020-05-19 世纪龙信息网络有限责任公司 Hot word phrase extraction method and system
CN107967299A (en) * 2017-11-03 2018-04-27 中国农业大学 Agricultural public opinion-oriented automatic hot word extraction method and system
CN107967299B (en) * 2017-11-03 2020-05-12 中国农业大学 Agricultural public opinion-oriented automatic hot word extraction method and system

Also Published As

Publication number Publication date
CN102831248A (en) 2012-12-19
CN102831248B (en) 2016-05-11

Similar Documents

Publication Publication Date Title
CN102831248B (en) Network hotspot mining method and device
CN102945290B (en) Hot microblog topic mining device and method
CN102968297B (en) Software management system and method for mobile terminals
CN102982157A (en) Device and method used for mining microblog hot topics
CN103218431B (en) System for automatically collecting identifiable web page information
CN103136358B (en) Method for automatically extracting forum data
CN104933093A (en) Regional public opinion monitoring and decision-making auxiliary system and method based on big data
CN104281607A (en) Microblog hot topic analyzing method
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
CN104820685A (en) Social contact network searching method and social contact network searching system
CN103744877A (en) Public opinion monitoring application system deployed in internet and application method
CN103955505A (en) Micro-blog-based real-time event monitoring method and system
CN102521767A (en) Method and system for publishing network advertising information
EP3014414A2 (en) Real-time and adaptive data mining
CN107437038A (en) Method and device for detecting web page tampering
CN103617169A (en) Microblog hot topic extracting method based on Hadoop
CN110134845A (en) Project public sentiment monitoring method, device, computer equipment and storage medium
CN104536956A (en) A Microblog platform based event visualization method and system
CN107220745A (en) Method, system and device for identifying intent behavior data
CN105718590A (en) Multi-tenant oriented SaaS public opinion monitoring system and method
CN103235827B (en) Method for automatic classification and screening of scientific and technical information
CN103778225A (en) Processing method, identifying device and identifying system of advertisement marketing language information
CN103177076A (en) Public sentiment monitoring system and method based on fixed point websites
CN104809252A (en) Internet data extraction system
CN108733791A (en) Network event detection method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication; application publication date: 20160831