CN107315838A - An efficient network hotspot mining system - Google Patents
An efficient network hotspot mining system
- Publication number
- CN107315838A (application number CN201710581360.9A)
- Authority
- CN
- China
- Prior art keywords
- user
- junk
- feature
- normal users
- users
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides an efficient network hotspot mining system comprising a recommendation subsystem, a storage subsystem, a filtering subsystem and a hotspot mining subsystem. The recommendation subsystem uses microblog users to recommend network data; the storage subsystem stores the network data recommended by the microblog users; the filtering subsystem filters the network data according to a filtering rule and extracts center words from the filtered network data; the hotspot mining subsystem counts the occurrences of the center words, and center words with a high occurrence count are the network hotspots. Beneficial effect of the invention: network hotspots are mined efficiently.
Description
Technical field
The present invention relates to the technical field of network hotspot mining, and in particular to an efficient network hotspot mining system.
Background
With the development of Internet technology, quickly obtaining network hotspots from the mass of information on the Internet helps in understanding social trends and following public opinion. Existing hotspot mining techniques suffer from low mining efficiency.
Summary of the invention
In view of the above problem, the present invention aims to provide an efficient network hotspot mining system.
The object of the present invention is achieved by the following technical solution:
An efficient network hotspot mining system is provided, comprising a recommendation subsystem, a storage subsystem, a filtering subsystem and a hotspot mining subsystem. The recommendation subsystem uses microblog users to recommend network data; the storage subsystem stores the network data recommended by the microblog users; the filtering subsystem filters the network data according to a filtering rule and extracts center words from the filtered network data; the hotspot mining subsystem counts the occurrences of the center words, and center words with a high occurrence count are the network hotspots.
Beneficial effect of the invention: network hotspots are mined efficiently.
Brief description of the drawings
The invention is further described with reference to the accompanying drawing. The embodiment shown in the drawing does not limit the invention in any way; a person of ordinary skill in the art can derive other drawings from the following drawing without creative effort.
Fig. 1 is a structural schematic diagram of the present invention.
Reference numerals:
Recommendation subsystem 1, storage subsystem 2, filtering subsystem 3, hotspot mining subsystem 4.
Embodiment
The invention is further described with reference to the following embodiment.
Referring to Fig. 1, the efficient network hotspot mining system of this embodiment comprises a recommendation subsystem 1, a storage subsystem 2, a filtering subsystem 3 and a hotspot mining subsystem 4. The recommendation subsystem 1 uses microblog users to recommend network data; the storage subsystem 2 stores the network data recommended by the microblog users; the filtering subsystem 3 filters the network data according to a filtering rule and extracts center words from the filtered network data; the hotspot mining subsystem 4 counts the occurrences of the center words, and center words with a high occurrence count are the network hotspots.
This embodiment achieves efficient mining of network hotspots.
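The counting step of the hotspot subsystem can be illustrated with a minimal Python sketch. The source only states that center words with a high occurrence count are hotspots; the function name `mine_hotspots`, the explicit `min_count` cut-off and the sample words are assumptions made for illustration.

```python
from collections import Counter


def mine_hotspots(center_words, min_count):
    """Count center words and keep those seen at least `min_count` times.

    `min_count` is a hypothetical cut-off; the patent text does not say
    how "high occurrence count" is decided.
    """
    counts = Counter(center_words)
    # most_common() yields (word, count) pairs, most frequent first.
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Center words extracted from already-filtered network data (made-up sample).
words = ["flood", "election", "flood", "concert", "flood", "election"]
print(mine_hotspots(words, min_count=2))
```

A ranked top-k selection would be an equally plausible reading of the source; the threshold form is used here only because it is the simplest to state.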
Preferably, the network data comprises a text title, the text content corresponding to the text title, and the publication time of the text.
The network data obtained in this preferred embodiment is more comprehensive.
Preferably, the filtering rule is: network data whose text title does not meet the word-count requirement, and network data whose publication time does not meet the requirements, are rejected.
This preferred embodiment filters out non-conforming data, further improving hotspot mining efficiency.
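As a rough sketch of the filtering rule, the following Python code models one item of network data (title, content, publication time) and rejects items that fail either check. The concrete limits, the whitespace-based word count and all names (`NetworkData`, `passes_filter`) are assumptions; the source gives no numeric requirements, and for Chinese titles a character count may be the more natural measure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NetworkData:
    title: str
    content: str
    published: datetime


def passes_filter(item, min_words, max_words, earliest, latest):
    """Keep an item only if both the title length and the time are acceptable."""
    # Assumption: title length is measured in whitespace-separated words.
    title_ok = min_words <= len(item.title.split()) <= max_words
    # Assumption: "meets the requirements" means inside a time window.
    time_ok = earliest <= item.published <= latest
    return title_ok and time_ok


window = (datetime(2017, 6, 1), datetime(2017, 7, 17))
good = NetworkData("Flood warning issued for river towns", "...", datetime(2017, 7, 1))
short = NetworkData("Hi", "...", datetime(2017, 7, 1))
stale = NetworkData("Flood warning issued for river towns", "...", datetime(2016, 1, 1))
kept = [d for d in (good, short, stale) if passes_filter(d, 3, 20, *window)]
```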
Preferably, the recommendation subsystem 1 comprises a junk user discovery module, a junk user rejection module and a recommendation module. The junk user discovery module determines the junk users in the microblog; the junk user rejection module rejects the junk users; the recommendation module uses the microblog users remaining after the junk users have been rejected to recommend network data. The junk user discovery module comprises a first modeling submodule, a second user classification submodule and a third junk user determination submodule. The first modeling submodule builds a microblog user network model; the second user classification submodule classifies the microblog users based on the microblog user network model; the third junk user determination submodule determines the junk users among the microblog users based on the user classification. The microblog user network model is built on user follow relationships. Specifically, the directed graph H = (W, B) formed by the follow relationships among microblog users serves as the microblog user network model, where W is the set of microblog users and B is the edge set: an edge exists between two users if a follow relationship exists between them.
In this embodiment, the recommendation subsystem models the microblog user network on the follow relationships of microblog users, which allows junk users in the microblog to be found effectively and helps improve the accuracy of subsequent recommendations.
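The directed graph H = (W, B) can be sketched in Python as a user set plus an edge set, with adjacency maps added for convenient lookup. The edge direction (follower to followee), the input format and the function name `build_follow_graph` are assumptions; the source only says that an edge exists wherever a follow relationship exists.

```python
from collections import defaultdict


def build_follow_graph(follows):
    """Build H = (W, B) from (follower, followee) pairs.

    Returns the user set W, the edge set B, and two adjacency maps:
    following[u] = users that u follows, followers[u] = users following u.
    """
    users, edges = set(), set()
    following = defaultdict(set)
    followers = defaultdict(set)
    for follower, followee in follows:
        users.update((follower, followee))
        # Assumption: edges point from the follower to the followed user.
        edges.add((follower, followee))
        following[follower].add(followee)
        followers[followee].add(follower)
    return users, edges, following, followers


W, B, following, followers = build_follow_graph(
    [("a", "b"), ("c", "b"), ("b", "a")]
)
```

Keeping both adjacency maps makes it cheap to look up whom a user follows and who follows that user, which is the kind of neighbourhood information the later scoring step relies on.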
Preferably, the second user classification submodule comprises a first classification feature determination unit and a second classification unit. The first classification feature determination unit determines the features used for classification, and the second classification unit determines the user category according to the classification features. The first classification feature determination unit comprises a first junk user classification feature determination subunit, which determines the features for identifying junk users, and a second normal user classification feature determination subunit, which determines the features for identifying normal users.
The features for identifying junk users comprise a first junk user feature EH1 and a second junk user feature EH2. The first junk user feature is determined as follows: compute the user's first junk user feature index YW1 [formula not reproduced in this text]. In the formula, the user's time series is divided into m subsequences, and p(x_i) denotes the ratio of the number of messages posted in the x_i-th subsequence to the number posted over the whole time series. If YW1 ≥ CS1, the user satisfies the first junk user feature, where CS1 is a set threshold. The second junk user feature is determined as follows: compute the user's second junk user feature index YW2 [formula not reproduced in this text]. In the formula, d1 denotes the number of the user's posted messages that contain "@", l1 denotes the number of the user's posted messages that contain "http//", and d denotes the total number of messages posted by the user. If YW2 ≥ CS2, the user satisfies the second junk user feature, where CS2 is a set threshold.
The features for identifying normal users comprise a first normal user feature EM1 and a second normal user feature EM2. The first normal user feature is determined as follows: compute the user's first normal user feature index LG1 [formula not reproduced in this text]. If LG1 ≤ CS3, the user satisfies the first normal user feature, where CS3 is a set threshold. The second normal user feature is determined as follows: compute the user's second normal user feature index LG2 [formula not reproduced in this text]. If LG2 ≤ CS4, the user satisfies the second normal user feature, where CS4 is a set threshold.
In this preferred embodiment, the recommendation subsystem establishes several classification features. Specifically, the first junk user feature index and the first normal user feature index reflect the user's posting pattern, while the second junk user feature index and the second normal user feature index reflect the extent to which the user sends junk information. This lays the foundation for the subsequent user classification.
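The formulas for the feature indices are not available in this text, so the sketch below computes only the quantities the text defines as their inputs: the per-subsequence posting ratios p(x_i) and the message counts d1, l1 and d. Splitting the time span into m equal-width windows, representing timestamps as numbers, and matching links with the marker "http://" (the source prints "http//") are all assumptions, as are the function names.

```python
def posting_ratios(timestamps, m):
    """Fraction of a user's messages falling in each of m time subsequences.

    Assumption: the span from first to last post is cut into m equal-width
    windows; the source does not say how the subsequences are formed.
    """
    if not timestamps:
        return [0.0] * m
    start, end = min(timestamps), max(timestamps)
    width = (end - start) / m or 1.0  # avoid zero width when all posts coincide
    counts = [0] * m
    for t in timestamps:
        index = min(int((t - start) / width), m - 1)  # last post goes in last window
        counts[index] += 1
    return [c / len(timestamps) for c in counts]


def message_counts(messages, link_marker="http://"):
    """Return (d1, l1, d): messages with "@", messages with a link, all messages."""
    d1 = sum("@" in msg for msg in messages)
    l1 = sum(link_marker in msg for msg in messages)  # link_marker is an assumption
    return d1, l1, len(messages)


ratios = posting_ratios([0, 1, 2, 9], m=3)
d1, l1, d = message_counts(
    ["hi @bob", "see http://x.example", "plain", "@a http://b.example"]
)
```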
Preferably, the second classification unit determines the user category as follows:
(1) For any user w ∈ W, given the feature set EM = {EM_j}, j = 1, 2, for identifying normal users: if the user satisfies the j-th normal user feature, the probability that the user is a normal user is high. If exactly one feature gives user w a higher probability of being a normal user, the user is a suspected normal user; if two features do so, the user is an approximate normal user.
(2) For any user w ∈ W, given the feature set EH = {EH_i}, i = 1, 2, for identifying junk users: if the user satisfies the i-th junk user feature, the probability that the user is a junk user is high. If exactly one feature gives user w a higher probability of being a junk user, the user is a suspected junk user; if two features do so, the user is an approximate junk user.
(3) For any user w ∈ W, if the user satisfies neither a junk user feature nor a normal user feature, user w is an uncertain user.
In this preferred embodiment, the recommendation subsystem determines the user category from the junk user features and the normal user features, achieving accurate classification of users and laying the foundation for the subsequent determination of junk users.
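The three rules above map the number of satisfied features to one of five categories, which the Python sketch below encodes, using "suspected" for the one-feature ("doubtful") categories. The source does not say what happens when a user satisfies both a junk feature and a normal feature; checking the junk features first is a hypothetical tie-break chosen only to make the sketch complete, and the function name and label strings are likewise illustrative.

```python
def classify_user(junk_features_met, normal_features_met):
    """Map counts of satisfied features (0, 1 or 2 each) to a user category."""
    # Assumption: junk evidence takes precedence when both kinds are present.
    if junk_features_met == 2:
        return "approximate junk"
    if junk_features_met == 1:
        return "suspected junk"
    if normal_features_met == 2:
        return "approximate normal"
    if normal_features_met == 1:
        return "suspected normal"
    return "uncertain"


labels = [classify_user(j, n) for j, n in [(2, 0), (1, 0), (0, 2), (0, 1), (0, 0)]]
```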
Preferably, the third junk user determination submodule determines junk users as follows:
(1) Compute the user score ZC [formula not reproduced in this text]. In the formula, b1 denotes the number of suspected junk users that the user follows, b2 denotes the number of approximate junk users that the user follows, a1 denotes the number of suspected normal users who follow the user, and a2 denotes the number of approximate normal users who follow the user.
(2) The user is determined to be a junk user if any of the following holds: the user is an approximate junk user and ZC > 0.2; the user is a suspected junk user and ZC > 0.5; the user is an uncertain user and ZC > 1; the user is a suspected normal user and ZC > 2; or the user is an approximate normal user and ZC > 4. Otherwise the user is a normal user.
This preferred embodiment determines junk users by combining the user score with the features, which reduces the false positive rate of junk user discovery and improves its accuracy.
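The category-dependent thresholds of step (2) amount to a small lookup table, sketched below in Python. The threshold values 0.2, 0.5, 1, 2 and 4 come from the text; the score ZC is taken as an input because its formula is not available here, and the category label strings and the function name `is_junk_user` are assumptions.

```python
# Score thresholds per user category, as listed in step (2).
ZC_THRESHOLDS = {
    "approximate junk": 0.2,
    "suspected junk": 0.5,
    "uncertain": 1.0,
    "suspected normal": 2.0,
    "approximate normal": 4.0,
}


def is_junk_user(category, zc):
    """Return True if a user of the given category with score `zc` is junk.

    The comparison is strict (ZC > threshold), matching the text.
    """
    return zc > ZC_THRESHOLDS[category]


decisions = {
    "likely spammer": is_junk_user("approximate junk", 0.3),
    "borderline": is_junk_user("uncertain", 1.0),
    "trusted": is_junk_user("approximate normal", 3.0),
}
```

The thresholds rise as the prior category becomes more "normal", so a user already leaning toward junk needs little additional evidence from the score, while an approximate normal user needs a great deal.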
Users performed network hotspot mining with the efficient network hotspot mining system of the present invention. With the number of mined hotspots set to 10, 20, 30, 40 and 50, the mining time and user satisfaction were recorded. Compared with a junk user discovery system, the beneficial effects obtained are shown in the following table:
Number of hotspots | Reduction in mining time | Improvement in user satisfaction |
---|---|---|
10 | 23% | 21% |
20 | 25% | 20% |
30 | 24% | 25% |
40 | 26% | 22% |
50 | 24% | 23% |
Finally, it should be noted that the above embodiment merely illustrates the technical solution of the present invention and does not limit its scope of protection. Although the invention has been explained in detail with reference to the preferred embodiment, a person of ordinary skill in the art should understand that the technical solution of the invention may be modified or replaced by equivalents without departing from the substance and scope of the technical solution of the invention.
Claims (7)
1. An efficient network hotspot mining system, characterized by comprising a recommendation subsystem, a storage subsystem, a filtering subsystem and a hotspot mining subsystem, wherein the recommendation subsystem uses microblog users to recommend network data, the storage subsystem stores the network data recommended by the microblog users, the filtering subsystem filters the network data according to a filtering rule and extracts center words from the filtered network data, and the hotspot mining subsystem counts the occurrences of the center words, center words with a high occurrence count being the network hotspots.
2. The efficient network hotspot mining system according to claim 1, characterized in that the network data comprises a text title, the text content corresponding to the text title, and the publication time of the text.
3. The efficient network hotspot mining system according to claim 2, characterized in that the filtering rule is: network data whose text title does not meet the word-count requirement, and network data whose publication time does not meet the requirements, are rejected.
4. The efficient network hotspot mining system according to claim 3, characterized in that the recommendation subsystem comprises a junk user discovery module, a junk user rejection module and a recommendation module; the junk user discovery module determines the junk users in the microblog, the junk user rejection module rejects the junk users, and the recommendation module uses the microblog users remaining after the junk users have been rejected to recommend network data; the junk user discovery module comprises a first modeling submodule, a second user classification submodule and a third junk user determination submodule; the first modeling submodule builds a microblog user network model, the second user classification submodule classifies the microblog users based on the microblog user network model, and the third junk user determination submodule determines the junk users among the microblog users based on the user classification; the microblog user network model is built on user follow relationships, specifically: the directed graph H = (W, B) formed by the follow relationships among microblog users serves as the microblog user network model, where W is the set of microblog users and B is the edge set, an edge existing between two users if a follow relationship exists between them.
5. The efficient network hotspot mining system according to claim 4, characterized in that the second user classification submodule comprises a first classification feature determination unit and a second classification unit; the first classification feature determination unit determines the features used for classification, and the second classification unit determines the user category according to the classification features; the first classification feature determination unit comprises a first junk user classification feature determination subunit, which determines the features for identifying junk users, and a second normal user classification feature determination subunit, which determines the features for identifying normal users; the features for identifying junk users comprise a first junk user feature EH1 and a second junk user feature EH2; the first junk user feature is determined as follows: compute the user's first junk user feature index YW1 [formula not reproduced in this text], where the user's time series is divided into m subsequences and p(x_i) denotes the ratio of the number of messages posted in the x_i-th subsequence to the number posted over the whole time series; if YW1 ≥ CS1, the user satisfies the first junk user feature, CS1 being a set threshold; the second junk user feature is determined as follows: compute the user's second junk user feature index YW2 [formula not reproduced in this text], where d1 denotes the number of the user's posted messages that contain "@", l1 denotes the number of the user's posted messages that contain "http//", and d denotes the total number of messages posted by the user; if YW2 ≥ CS2, the user satisfies the second junk user feature, CS2 being a set threshold; the features for identifying normal users comprise a first normal user feature EM1 and a second normal user feature EM2; the first normal user feature is determined as follows: compute the user's first normal user feature index LG1 [formula not reproduced in this text]; if LG1 ≤ CS3, the user satisfies the first normal user feature, CS3 being a set threshold; the second normal user feature is determined as follows: compute the user's second normal user feature index LG2 [formula not reproduced in this text]; if LG2 ≤ CS4, the user satisfies the second normal user feature, CS4 being a set threshold.
6. The efficient network hotspot mining system according to claim 5, characterized in that the second classification unit determines the user category as follows: (1) for any user w ∈ W, given the feature set EM = {EM_j}, j = 1, 2, for identifying normal users, if the user satisfies the j-th normal user feature, the probability that the user is a normal user is high; if exactly one feature gives user w a higher probability of being a normal user, the user is a suspected normal user, and if two features do so, the user is an approximate normal user; (2) for any user w ∈ W, given the feature set EH = {EH_i}, i = 1, 2, for identifying junk users, if the user satisfies the i-th junk user feature, the probability that the user is a junk user is high; if exactly one feature gives user w a higher probability of being a junk user, the user is a suspected junk user, and if two features do so, the user is an approximate junk user; (3) for any user w ∈ W, if the user satisfies neither a junk user feature nor a normal user feature, user w is an uncertain user.
7. The efficient network hotspot mining system according to claim 6, characterized in that the third junk user determination submodule determines junk users as follows: (1) compute the user score ZC [formula not reproduced in this text], where b1 denotes the number of suspected junk users that the user follows, b2 denotes the number of approximate junk users that the user follows, a1 denotes the number of suspected normal users who follow the user, and a2 denotes the number of approximate normal users who follow the user; (2) the user is determined to be a junk user if the user is an approximate junk user and ZC > 0.2, or a suspected junk user and ZC > 0.5, or an uncertain user and ZC > 1, or a suspected normal user and ZC > 2, or an approximate normal user and ZC > 4; otherwise the user is a normal user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710581360.9A CN107315838A (en) | 2017-07-17 | 2017-07-17 | An efficient network hotspot mining system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710581360.9A CN107315838A (en) | 2017-07-17 | 2017-07-17 | An efficient network hotspot mining system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107315838A (en) | 2017-11-03 |
Family
ID=60178724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710581360.9A Pending CN107315838A (en) | 2017-07-17 | 2017-07-17 | An efficient network hotspot mining system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107315838A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109102418A (en) * | 2018-08-08 | 2018-12-28 | 电子科技大学 | Social networks rubbish account recognition methods based on customer relationship |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831248A (en) * | 2012-09-18 | 2012-12-19 | 北京奇虎科技有限公司 | Network hotspot mining method and network hotspot mining device |
CN103294833A (en) * | 2012-11-02 | 2013-09-11 | 中国人民解放军国防科学技术大学 | Junk user discovering method based on user following relationships |
US8606860B2 (en) * | 2003-03-31 | 2013-12-10 | Affini, Inc. | System and method for providing filtering email messages |
Non-Patent Citations (1)
Title |
---|
丁兆云等: ""微博中基于统计特征与双向投票的垃圾用户发现"", 《计算机研究与发展》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20171103 |