CN107315838A - An efficient network hotspot mining system - Google Patents

An efficient network hotspot mining system

Info

Publication number
CN107315838A
Authority
CN
China
Prior art keywords
user
junk
feature
normal users
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710581360.9A
Other languages
Chinese (zh)
Inventor
孟玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Source Guang'an Intelligent Technology Co Ltd
Original Assignee
Shenzhen Source Guang'an Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Source Guang'an Intelligent Technology Co Ltd
Priority to CN201710581360.9A
Publication of CN107315838A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an efficient network hotspot mining system comprising a recommendation subsystem, a storage subsystem, a filtering subsystem and a hotspot mining subsystem. The recommendation subsystem uses microblog users to recommend network data; the storage subsystem stores the network data recommended by the microblog users; the filtering subsystem filters the network data according to a filtering rule and extracts central words from the filtered network data; the hotspot mining subsystem counts the occurrences of the central words, and the central words that occur most often are taken as network hotspots. The beneficial effect of the invention is that network hotspots are mined efficiently.

Description

An efficient network hotspot mining system
Technical field
The present invention relates to the technical field of network hotspot mining, and in particular to an efficient network hotspot mining system.
Background
With the development of Internet technology, quickly obtaining network hotspots from the massive amount of information on the Internet plays a guiding role in understanding the dynamics of social development and grasping public opinion. Existing hotspot mining techniques suffer from low mining efficiency.
Summary of the invention
In view of the above problem, the present invention aims to provide an efficient network hotspot mining system.
The object of the present invention is achieved by the following technical solution:
An efficient network hotspot mining system is provided, comprising a recommendation subsystem, a storage subsystem, a filtering subsystem and a hotspot mining subsystem. The recommendation subsystem uses microblog users to recommend network data; the storage subsystem stores the network data recommended by the microblog users; the filtering subsystem filters the network data according to a filtering rule and extracts central words from the filtered network data; the hotspot mining subsystem counts the occurrences of the central words, and the central words that occur most often are taken as network hotspots.
The beneficial effect of the invention is that network hotspots are mined efficiently.
Brief description of the drawings
The invention is further described with reference to the accompanying drawing, but the embodiment shown in the drawing does not limit the invention in any way; a person of ordinary skill in the art can derive other drawings from the following drawing without creative effort.
Fig. 1 is a structural schematic diagram of the present invention;
Reference numerals:
Recommendation subsystem 1, storage subsystem 2, filtering subsystem 3, hotspot mining subsystem 4.
Embodiments
The invention is further described with reference to the following embodiments.
Referring to Fig. 1, the efficient network hotspot mining system of this embodiment comprises a recommendation subsystem 1, a storage subsystem 2, a filtering subsystem 3 and a hotspot mining subsystem 4. The recommendation subsystem 1 uses microblog users to recommend network data; the storage subsystem 2 stores the network data recommended by the microblog users; the filtering subsystem 3 filters the network data according to a filtering rule and extracts central words from the filtered network data; the hotspot mining subsystem 4 counts the occurrences of the central words, and the central words that occur most often are taken as network hotspots.
This embodiment mines network hotspots efficiently.
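As an illustration of the counting step described above, the following minimal Python sketch counts how often each central word occurs and reports the most frequent ones as hotspots. The function name `mine_hotspots`, the `top_k` cutoff and the sample words are assumptions made for illustration; the patent text does not specify how central words are extracted or which cutoff is applied.

```python
from collections import Counter


def mine_hotspots(central_words, top_k=3):
    """Count central-word occurrences and return the top_k most frequent.

    top_k is a hypothetical cutoff; the source only says that the central
    words with many occurrences are the network hotspots.
    """
    counts = Counter(central_words)
    return counts.most_common(top_k)


# Hypothetical central words extracted from filtered network data.
words = ["flood", "election", "flood", "concert", "flood", "election"]
print(mine_hotspots(words, top_k=2))  # [('flood', 3), ('election', 2)]
```

A fixed occurrence threshold could be used in place of `top_k`; the text leaves that choice open.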
Preferably, the network data includes a text title, the text content corresponding to the text title, and the publication time of the text.
The network data obtained in this preferred embodiment is more comprehensive.
Preferably, the filtering rule is: network data whose text title does not meet the required number of words, or whose publication time does not meet the requirements, is rejected.
This preferred embodiment filters out data that does not meet the conditions, which further improves hotspot mining efficiency.
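The filtering rule can be sketched as follows. This is only an illustrative sketch: the `NetworkData` class, the `passes_filter` function, the use of character count for the title length and the concrete limits are all assumptions, since the patent text does not state the required title length or the admissible publication times.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NetworkData:
    title: str           # text title
    content: str         # text content corresponding to the title
    published: datetime  # publication time of the text


def passes_filter(item, min_len, max_len, earliest, latest):
    """Keep an item only if both its title length and publication time comply.

    Title length is measured in characters here (an assumption); the limits
    and the time window are hypothetical parameters.
    """
    title_ok = min_len <= len(item.title) <= max_len
    time_ok = earliest <= item.published <= latest
    return title_ok and time_ok


window = (datetime(2017, 7, 1), datetime(2017, 7, 17))
good = NetworkData("Heavy rain hits the city", "body text", datetime(2017, 7, 10))
short = NetworkData("Hi", "body text", datetime(2017, 7, 10))
print(passes_filter(good, 5, 40, *window))   # True
print(passes_filter(short, 5, 40, *window))  # False
```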
Preferably, the recommendation subsystem 1 includes a junk user discovery module, a junk user rejection module and a recommendation module. The junk user discovery module determines the junk users in the microblog, the junk user rejection module removes the junk users, and the recommendation module performs network data recommendation using the microblog users that remain after the junk users are removed. The junk user discovery module includes a first modeling submodule, a second user classification submodule and a third junk user determination submodule. The first modeling submodule establishes a microblog user network model, the second user classification submodule classifies the microblog users based on the microblog user network model, and the third junk user determination submodule determines the junk users among the microblog users based on the user classification. The microblog user network model is established based on the follow relationships between users. Specifically, the directed graph H = (W, B) formed by the follow relationships in the microblog serves as the microblog user network model, where W is the set of microblog users and B is the set of edges; an edge exists between two users if a follow relationship exists between them.
In this embodiment, the recommendation subsystem models the microblog user network based on the follow relationships between microblog users, which makes it possible to find junk users in the microblog effectively and helps improve the accuracy of subsequent recommendation.
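The directed graph H = (W, B) built from the follow (concern) relationships can be represented, for example, with two adjacency maps. The sketch below is one possible representation, not the patent's implementation; the class name `FollowGraph`, its attribute names and the sample user names are hypothetical.

```python
from collections import defaultdict


class FollowGraph:
    """Directed graph H = (W, B): W is the user set, B the follow edges."""

    def __init__(self):
        self.users = set()                 # W: set of microblog users
        self.following = defaultdict(set)  # u -> users that u follows
        self.followers = defaultdict(set)  # u -> users that follow u

    def add_follow(self, follower, followee):
        """Add the directed edge follower -> followee to B."""
        self.users.update((follower, followee))
        self.following[follower].add(followee)
        self.followers[followee].add(follower)


graph = FollowGraph()
graph.add_follow("alice", "bob")
graph.add_follow("carol", "bob")
print(sorted(graph.followers["bob"]))  # ['alice', 'carol']
```

Keeping both directions makes it cheap to look up whom a user follows and who follows that user, which the later scoring step needs.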
Preferably, the second user classification submodule includes a first classification feature determination unit and a second classification unit. The first classification feature determination unit determines the features used for classification, and the second classification unit determines the user category according to the classification features. The first classification feature determination unit includes a first junk user classification feature determination subunit, which determines the features for judging junk users, and a second normal user classification feature determination subunit, which determines the features for judging normal users.
The features for judging junk users include a first junk user feature EH1 and a second junk user feature EH2. The first junk user feature is determined as follows: calculate the first junk user feature index of the user, YW1 = [formula not reproduced in this text], where the user's time series is divided into m subsequences and p(x_i) denotes the proportion of the messages in the whole time series that were published in the x_i-th subsequence. If YW1 ≥ CS1, the user satisfies the first junk user feature, where CS1 is a set threshold. The second junk user feature is determined as follows: calculate the second junk user feature index of the user, YW2 = [formula not reproduced in this text], where d1 denotes the number of the user's published messages that contain "@", l1 denotes the number of the user's published messages that contain "http://", and d denotes the total number of messages published by the user. If YW2 ≥ CS2, the user satisfies the second junk user feature, where CS2 is a set threshold.
The features for judging normal users include a first normal user feature EM1 and a second normal user feature EM2. The first normal user feature is determined as follows: calculate the first normal user feature index of the user, LG1 = [formula not reproduced in this text]. If LG1 ≤ CS3, the user satisfies the first normal user feature, where CS3 is a set threshold. The second normal user feature is determined as follows: calculate the second normal user feature index of the user, LG2 = [formula not reproduced in this text]. If LG2 ≤ CS4, the user satisfies the second normal user feature, where CS4 is a set threshold.
In this preferred embodiment, the recommendation subsystem establishes several classification features. Specifically, the first junk user feature index and the first normal user feature index reflect the user's posting pattern, while the second junk user feature index and the second normal user feature index reflect the extent to which the user sends junk information, which lays the foundation for the subsequent user classification.
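The index formulas themselves are not available in this text, so the following Python sketch only computes the quantities that the text defines as their inputs: the per-subsequence posting proportions p(x_i) and the counts d1, l1 and d. The function names, the equal-width split of the time series into m subsequences, numeric timestamps and the `LINK_MARKER` value are assumptions made for illustration.

```python
LINK_MARKER = "http://"  # assumed link marker; the scraped text shows "http//"


def posting_distribution(timestamps, m):
    """Return p(x_i) for i = 1..m: share of messages in each subsequence.

    The time span is split into m equal-width subsequences (an assumption;
    the source does not say how the time series is divided).
    """
    if not timestamps or m <= 0:
        return []
    start, end = min(timestamps), max(timestamps)
    width = (end - start) / m or 1.0  # avoid a zero width when all times match
    counts = [0] * m
    for t in timestamps:
        index = min(int((t - start) / width), m - 1)
        counts[index] += 1
    return [c / len(timestamps) for c in counts]


def message_counts(messages):
    """Return (d1, l1, d): messages with "@", messages with a link, total."""
    d1 = sum("@" in msg for msg in messages)
    l1 = sum(LINK_MARKER in msg for msg in messages)
    return d1, l1, len(messages)


print(posting_distribution([0, 1, 2, 9, 10], m=2))  # [0.6, 0.4]
print(message_counts(["hi @bob", "see http://x.example", "plain"]))  # (1, 1, 3)
```

Once the index formulas are known, YW1, YW2, LG1 and LG2 would be computed from these inputs and compared with the thresholds CS1 to CS4.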
Preferably, the second classification unit determines the user category as follows: (1) For any user w ∈ W, given the feature set for judging normal users EM = {EM_j}, j = 1, 2, a user who satisfies the j-th normal user feature has a higher probability of being a normal user. If exactly one feature gives user w a higher probability of being a normal user, the user is a suspected normal user; if two features give user w a higher probability of being a normal user, the user is an approximate normal user. (2) For any user w ∈ W, given the feature set for judging junk users EH = {EH_i}, i = 1, 2, a user who satisfies the i-th junk user feature has a higher probability of being a junk user. If exactly one feature gives user w a higher probability of being a junk user, the user is a suspected junk user; if two features give user w a higher probability of being a junk user, the user is an approximate junk user. (3) For any user w ∈ W, if the user satisfies neither the junk user features nor the normal user features, user w is an uncertain user.
In this preferred embodiment, the recommendation subsystem determines the user category from the junk user features and the normal user features, which classifies users accurately and lays the foundation for the subsequent determination of junk users.
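The three classification rules can be sketched in Python as below, using "suspected" for the doubtful categories and "approximate" for the approximate ones. The function names and label strings are hypothetical. The text does not say what happens when a user satisfies both junk and normal features, so this sketch simply returns every label that applies rather than assuming a precedence.

```python
def side_label(satisfied_count, kind):
    """Map the number of satisfied features (0, 1 or 2) to a label or None."""
    if satisfied_count == 1:
        return f"suspected {kind}"
    if satisfied_count == 2:
        return f"approximate {kind}"
    return None


def classify(em_flags, eh_flags):
    """Classify a user from feature flags.

    em_flags: (EM1 satisfied, EM2 satisfied) for the normal user features.
    eh_flags: (EH1 satisfied, EH2 satisfied) for the junk user features.
    Returns a list of labels; ["uncertain"] if no feature is satisfied.
    """
    normal = side_label(sum(em_flags), "normal")
    junk = side_label(sum(eh_flags), "junk")
    labels = [label for label in (normal, junk) if label is not None]
    return labels or ["uncertain"]


print(classify((True, False), (False, False)))   # ['suspected normal']
print(classify((False, False), (True, True)))    # ['approximate junk']
print(classify((False, False), (False, False)))  # ['uncertain']
```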
Preferably, the third junk user determination submodule determines junk users as follows: (1) Calculate the user score ZC = [formula not reproduced in this text], where b1 denotes the number of suspected junk users that the user follows, b2 denotes the number of approximate junk users that the user follows, a1 denotes the number of suspected normal users who follow the user, and a2 denotes the number of approximate normal users who follow the user. (2) The user is determined to be a junk user if the user is an approximate junk user with ZC > 0.2, a suspected junk user with ZC > 0.5, an uncertain user with ZC > 1, a suspected normal user with ZC > 2, or an approximate normal user with ZC > 4; otherwise the user is a normal user.
This preferred embodiment determines junk users by combining the user score with the features, which reduces the misjudgment rate of junk user discovery and improves its accuracy.
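The decision rule in step (2) amounts to a per-category threshold on the user score ZC. Because the ZC formula is not available in this text, the sketch below takes ZC as an input; the dictionary name, function name and label strings are hypothetical, while the threshold values are the ones stated above.

```python
# Score thresholds per user category, as stated in the decision rule.
ZC_THRESHOLDS = {
    "approximate junk": 0.2,
    "suspected junk": 0.5,
    "uncertain": 1.0,
    "suspected normal": 2.0,
    "approximate normal": 4.0,
}


def is_junk_user(category, zc):
    """Return True if the score ZC exceeds the threshold for the category."""
    return zc > ZC_THRESHOLDS[category]


print(is_junk_user("approximate junk", 0.3))    # True
print(is_junk_user("approximate normal", 0.3))  # False
```

The stricter thresholds for the normal-leaning categories mean that a user who already looks normal needs a much higher score before being treated as junk.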
A user performed network hotspot mining with the efficient network hotspot mining system of the present invention. With the number of mined hotspots set to 10, 20, 30, 40 and 50, the mining time and user satisfaction were recorded. Compared with a junk user discovery system, the beneficial effects are shown in the table below:
Number of hotspots    Mining time reduction    User satisfaction improvement
10                    23%                      21%
20                    25%                      20%
30                    24%                      25%
40                    26%                      22%
50                    24%                      23%
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit its scope of protection. Although the present invention has been explained in detail with reference to the preferred embodiments, a person of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from their essence and scope.

Claims (7)

1. An efficient network hotspot mining system, characterized in that it comprises a recommendation subsystem, a storage subsystem, a filtering subsystem and a hotspot mining subsystem, wherein the recommendation subsystem uses microblog users to recommend network data, the storage subsystem stores the network data recommended by the microblog users, the filtering subsystem filters the network data according to a filtering rule and extracts central words from the filtered network data, and the hotspot mining subsystem counts the occurrences of the central words, the central words that occur most often being network hotspots.
2. The efficient network hotspot mining system according to claim 1, characterized in that the network data includes a text title, the text content corresponding to the text title, and the publication time of the text.
3. The efficient network hotspot mining system according to claim 2, characterized in that the filtering rule is: network data whose text title does not meet the required number of words, or whose publication time does not meet the requirements, is rejected.
4. The efficient network hotspot mining system according to claim 3, characterized in that the recommendation subsystem includes a junk user discovery module, a junk user rejection module and a recommendation module; the junk user discovery module determines the junk users in the microblog, the junk user rejection module removes the junk users, and the recommendation module performs network data recommendation using the microblog users that remain after the junk users are removed; the junk user discovery module includes a first modeling submodule, a second user classification submodule and a third junk user determination submodule; the first modeling submodule establishes a microblog user network model, the second user classification submodule classifies the microblog users based on the microblog user network model, and the third junk user determination submodule determines the junk users among the microblog users based on the user classification; the microblog user network model is established based on the follow relationships between users, specifically: the directed graph H = (W, B) formed by the follow relationships in the microblog serves as the microblog user network model, where W is the set of microblog users and B is the set of edges, an edge existing between two users if a follow relationship exists between them.
5. The efficient network hotspot mining system according to claim 4, characterized in that the second user classification submodule includes a first classification feature determination unit and a second classification unit; the first classification feature determination unit determines the features used for classification, and the second classification unit determines the user category according to the classification features; the first classification feature determination unit includes a first junk user classification feature determination subunit, which determines the features for judging junk users, and a second normal user classification feature determination subunit, which determines the features for judging normal users; the features for judging junk users include a first junk user feature EH1 and a second junk user feature EH2; the first junk user feature is determined as follows: calculate the first junk user feature index of the user, YW1 = [formula not reproduced in this text], where the user's time series is divided into m subsequences and p(x_i) denotes the proportion of the messages in the whole time series that were published in the x_i-th subsequence; if YW1 ≥ CS1, the user satisfies the first junk user feature, CS1 being a set threshold; the second junk user feature is determined as follows: calculate the second junk user feature index of the user, YW2 = [formula not reproduced in this text], where d1 denotes the number of the user's published messages that contain "@", l1 denotes the number of the user's published messages that contain "http://", and d denotes the total number of messages published by the user; if YW2 ≥ CS2, the user satisfies the second junk user feature, CS2 being a set threshold; the features for judging normal users include a first normal user feature EM1 and a second normal user feature EM2; the first normal user feature is determined as follows: calculate the first normal user feature index of the user, LG1 = [formula not reproduced in this text]; if LG1 ≤ CS3, the user satisfies the first normal user feature, CS3 being a set threshold; the second normal user feature is determined as follows: calculate the second normal user feature index of the user, LG2 = [formula not reproduced in this text]; if LG2 ≤ CS4, the user satisfies the second normal user feature, CS4 being a set threshold.
6. The efficient network hotspot mining system according to claim 5, characterized in that the second classification unit determines the user category as follows: (1) for any user w ∈ W, given the feature set for judging normal users EM = {EM_j}, j = 1, 2, a user who satisfies the j-th normal user feature has a higher probability of being a normal user; if exactly one feature gives user w a higher probability of being a normal user, the user is a suspected normal user; if two features give user w a higher probability of being a normal user, the user is an approximate normal user; (2) for any user w ∈ W, given the feature set for judging junk users EH = {EH_i}, i = 1, 2, a user who satisfies the i-th junk user feature has a higher probability of being a junk user; if exactly one feature gives user w a higher probability of being a junk user, the user is a suspected junk user; if two features give user w a higher probability of being a junk user, the user is an approximate junk user; (3) for any user w ∈ W, if the user satisfies neither the junk user features nor the normal user features, user w is an uncertain user.
7. The efficient network hotspot mining system according to claim 6, characterized in that the third junk user determination submodule determines junk users as follows: (1) calculate the user score ZC = [formula not reproduced in this text], where b1 denotes the number of suspected junk users that the user follows, b2 denotes the number of approximate junk users that the user follows, a1 denotes the number of suspected normal users who follow the user, and a2 denotes the number of approximate normal users who follow the user; (2) the user is determined to be a junk user if the user is an approximate junk user with ZC > 0.2, a suspected junk user with ZC > 0.5, an uncertain user with ZC > 1, a suspected normal user with ZC > 2, or an approximate normal user with ZC > 4; otherwise the user is a normal user.
CN201710581360.9A 2017-07-17 2017-07-17 A kind of efficient network hotspot digging system Pending CN107315838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710581360.9A CN107315838A (en) 2017-07-17 2017-07-17 A kind of efficient network hotspot digging system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710581360.9A CN107315838A (en) 2017-07-17 2017-07-17 A kind of efficient network hotspot digging system

Publications (1)

Publication Number Publication Date
CN107315838A true CN107315838A (en) 2017-11-03

Family

ID=60178724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710581360.9A Pending CN107315838A (en) 2017-07-17 2017-07-17 A kind of efficient network hotspot digging system

Country Status (1)

Country Link
CN (1) CN107315838A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102418A (en) * 2018-08-08 2018-12-28 电子科技大学 Social networks rubbish account recognition methods based on customer relationship

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831248A (en) * 2012-09-18 2012-12-19 北京奇虎科技有限公司 Network hotspot mining method and network hotspot mining device
CN103294833A (en) * 2012-11-02 2013-09-11 中国人民解放军国防科学技术大学 Junk user discovering method based on user following relationships
US8606860B2 (en) * 2003-03-31 2013-12-10 Affini, Inc. System and method for providing filtering email messages

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8606860B2 (en) * 2003-03-31 2013-12-10 Affini, Inc. System and method for providing filtering email messages
CN102831248A (en) * 2012-09-18 2012-12-19 北京奇虎科技有限公司 Network hotspot mining method and network hotspot mining device
CN103294833A (en) * 2012-11-02 2013-09-11 中国人民解放军国防科学技术大学 Junk user discovering method based on user following relationships

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
丁兆云等: ""微博中基于统计特征与双向投票的垃圾用户发现"", 《计算机研究与发展》 *

Similar Documents

Publication Publication Date Title
US9842170B2 (en) Method, apparatus and system of intelligent navigation
CN101408883B (en) Method for collecting network public feelings viewpoint
WO2019200752A1 (en) Semantic understanding-based point of interest query method, device and computing apparatus
CN101819573B (en) Self-adaptive network public opinion identification method
CN103500175B (en) A kind of method based on sentiment analysis on-line checking microblog hot event
CN102929959A (en) Book recommendation method based on user actions
CN108170692A (en) A kind of focus incident information processing method and device
CN103049440A (en) Recommendation processing method and processing system for related articles
CN105488092A (en) Time-sensitive self-adaptive on-line subtopic detecting method and system
CN105354305A (en) Online-rumor identification method and apparatus
CN111461711B (en) Tracking system for block chain transaction
CN103559315B (en) Information screening method for pushing and device
CN103914491A (en) Data excavating method and system for high quality user generation content (UGC)
CN103838819A (en) Information publish method and system
CN107943905A (en) A kind of much-talked-about topic analysis method and system
CN104077723A (en) Social network recommending system and social network recommending method
CN103218368B (en) A kind of method and apparatus excavating hot word
CN107563807A (en) A kind of regional advertisement supplying system based on data mining
CN107908618A (en) A kind of hot spot word finds method and apparatus
CN103279483B (en) A kind of topic Epidemic Scope appraisal procedure towards micro-blog and system
Aamir et al. Trust in social-sensor cloud service
CN108959364B (en) Method for evaluating influence of news media in social media event-level news
CN107315838A (en) A kind of efficient network hotspot digging system
CN110413858A (en) Enterprise's public feelings information querying method, device, computer equipment and storage medium
CN102750288B (en) A kind of internet content recommend method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171103