CN104199947A - Important person speech supervision and incidence relation excavating method
- Publication number: CN104199947A (application number CN201410459905.5A)
- Authority: CN (China)
- Prior art keywords: personnel, incidence relation, speech, data, supervision
- Prior art date: 2014-09-11
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention discloses a method for monitoring the statements of key persons and mining their association relationships. The method comprises the following steps: (1) building a Hadoop big data platform; (2) collecting and parsing microblog data; (3) cleaning the data and matching persons; (4) analyzing statement tendencies and association relationships; (5) visualizing the data. Compared with the prior art, the method is reasonably designed and convenient to use. Built on a big data platform, the system applies distributed storage and processing technology to collect netizens' login information and browsing information on microblogs. Through information matching and association-relationship mining, it analyzes the statement tendencies and association relationships of the given persons of interest, displays the mined data visually, and keeps tracking them as the microblogs are refreshed.
Description
Technical field
The present invention relates to the technical fields of public opinion monitoring and association relationships, and specifically to a method, based on cloud-computing big data, for monitoring the statements of key persons and mining their association relationships.
Background art
Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It serves as a tool for classifying content on the Internet for keyword search.
The NameNode is software that usually runs on a separate, ordinary machine in an HDFS instance. It manages the file system namespace and controls access by external clients. The NameNode determines how files are mapped to replicated blocks on DataNodes.
The DataNode is likewise software that usually runs on a separate, ordinary machine in an HDFS instance. DataNodes are usually organized in racks, and a switch connects all the machines in a rack.
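To make the NameNode/DataNode division of labor concrete, here is a toy, in-memory sketch (plain Python, not Hadoop's API) of the metadata a NameNode keeps. It holds a namespace that maps file paths to block ids and a block map that records which DataNodes hold each block's replicas. The rack layout, host names and 64 MB block size are assumptions for illustration. The placement rule is a simplified, deterministic version of HDFS's commonly described rack-aware default: one replica on the writer's node and two on another rack.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy stand-in for NameNode metadata; all names are hypothetical, not Hadoop's API."""

    racks: dict[str, list[str]]          # rack id -> DataNodes behind that rack's switch
    block_size: int = 64 * 1024 * 1024   # assumed: 64 MB, the Hadoop 1.x default
    replication: int = 3                 # HDFS's default replication factor
    namespace: dict[str, list[int]] = field(default_factory=dict)  # path -> block ids
    block_map: dict[int, list[str]] = field(default_factory=dict)  # block id -> replica hosts
    next_block_id: int = 0

    def rack_of(self, node: str) -> str:
        return next(r for r, nodes in self.racks.items() if node in nodes)

    def place_replicas(self, writer: str) -> list[str]:
        # Replica 1 on the writer's DataNode, replica 2 on a DataNode in another rack,
        # replica 3 on a different DataNode in that same remote rack. Real HDFS picks
        # nodes randomly and checks load; this sketch is deterministic.
        local = self.rack_of(writer)
        remote = next(r for r in sorted(self.racks) if r != local)
        second, third = self.racks[remote][:2]
        return [writer, second, third][: self.replication]

    def create(self, path: str, size_bytes: int, writer: str) -> list[int]:
        n_blocks = max(1, -(-size_bytes // self.block_size))  # ceiling division
        ids = list(range(self.next_block_id, self.next_block_id + n_blocks))
        for block_id in ids:
            self.block_map[block_id] = self.place_replicas(writer)
        self.next_block_id += n_blocks
        self.namespace[path] = ids
        return ids


if __name__ == "__main__":
    nn = ToyNameNode(racks={"rack1": ["dn01", "dn02", "dn03", "dn04"],
                            "rack2": ["dn05", "dn06", "dn07", "dn08"]})
    print(nn.create("/weibo/raw/part-0001", 200 * 1024 * 1024, writer="dn02"))  # [0, 1, 2, 3]
    print(nn.block_map[0])  # ['dn02', 'dn05', 'dn06']
```

A 200 MB file spans four blocks. Each block's replicas sit in two racks, so losing one rack's switch does not make the file unreadable.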
ZooKeeper is an official subproject of Hadoop. It is a reliable coordination system for large-scale distributed systems and provides configuration maintenance, naming services, distributed synchronization, group services and similar functions.
HBase is a distributed, column-oriented, open-source database. Unlike a typical relational database, HBase is suited to storing unstructured data. Another difference is that HBase uses a column-based rather than a row-based model.
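HBase's data model can be pictured as a sparse, versioned map. The toy class below uses hypothetical names and is not the HBase client API. It keeps the mapping row key → (column family, qualifier) → {timestamp: value}. Column families are declared up front, qualifiers are created on write, absent columns cost nothing, and only the newest few versions of each cell are retained.

```python
from collections import defaultdict


class ToyHTable:
    """Toy model of HBase's logical layout (not the HBase client API):
    row key -> (column family, qualifier) -> {timestamp: value}."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # column families are fixed when the table is created
        self.max_versions = max_versions  # per-cell version limit (like HBase's VERSIONS setting)
        self.rows = defaultdict(dict)     # sparse: a row stores only the columns it actually has

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows[row].setdefault((family, qualifier), {})
        cells[ts] = value
        for old_ts in sorted(cells)[: -self.max_versions]:  # drop versions beyond the limit
            del cells[old_ts]

    def get(self, row, family, qualifier):
        cells = self.rows.get(row, {}).get((family, qualifier))
        return cells[max(cells)] if cells else None  # newest version wins


if __name__ == "__main__":
    t = ToyHTable(["info", "stat"], max_versions=2)
    for ts, followers in [(1, 100), (2, 120), (3, 150)]:
        t.put("u1", "stat", "followers", followers, ts)
    print(t.get("u1", "stat", "followers"))  # 150
```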
Microblogging is a platform for sharing, disseminating and obtaining information based on user relationships, with an emphasis on timeliness and spontaneity. Microblogs let users express their thoughts and latest developments at any moment. In recent years the number of microblogs and the volume of information posted have exploded. Microblogging has become a channel through which Chinese netizens can speak independently and relatively freely. It is an open platform for everyone, regardless of wealth or status, and its data volume has reached big-data scale. By monitoring microblog content, the thoughts, statement tendencies and association relationships of persons of interest can be tracked more faithfully and in real time. Meanwhile, the Hadoop ecosystem's distributed storage and computation, NoSQL databases, data query and processing tools, and data-mining algorithms have matured. Together they provide a technical platform for mining microblog big data. At present, however, there is no suitable method, based on cloud-computing big data, for monitoring the statements of key persons and mining their association relationships.
Summary of the invention
The technical task of the present invention is to provide a method for monitoring the statements of key persons and mining their association relationships.
The technical task is achieved as follows; the steps of the method are:
1) Build a Hadoop big data platform: set up a Hadoop cluster consisting of 11 nodes;
2) Collect and parse microblog data: the web crawler is a secondarily developed (customized) version of Nutch that performs topic-focused crawling. Taking information related to the given persons of interest as the topic, it crawls microblog data from the Internet, segments and parses the text according to a custom dictionary, and stores the predefined feature attribute values in a database, forming structured data;
3) Clean the data and match persons: preprocess the structured data, then use Euclidean distance to compute its similarity to the feature vectors provided for the persons of interest, and select the netizen information whose similarity exceeds a threshold as the analysis target;
4) Analyze statement tendencies and association relationships: using the custom dictionary and techniques such as semantic analysis and word-frequency statistics, analyze the statement tendencies of the persons of interest; from the interaction information between persons collected from microblogs, mine the persons' networks of relationships with an association-relationship algorithm, and keep tracking them as the microblogs are updated;
5) Visualize the data: present the statement tendencies and association relationships of the persons of interest visually.
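The patent does not disclose how its customized Nutch decides what to crawl in step 2). One common way to build a topic-focused crawler is best-first search. The crawler scores each fetched page against the topic terms, discards off-topic pages, and expands links from relevant pages first. This sketch assumes a caller-supplied `fetch(url)` function (hypothetical) and a simple term-overlap score. It runs offline against an in-memory page set and does not use Nutch.

```python
import heapq


def topic_score(text: str, topic_terms: set[str]) -> float:
    """Fraction of topic terms that appear in the page text (deliberately simple)."""
    if not topic_terms:
        return 0.0
    return sum(1 for term in topic_terms if term in text) / len(topic_terms)


def focused_crawl(seeds, fetch, topic_terms, threshold=0.3, max_pages=100):
    """Best-first topic crawl.

    fetch(url) -> (text, outlinks) is supplied by the caller (hypothetical).
    Pages scoring below `threshold` are neither kept nor expanded.
    """
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so priorities are negated
    heapq.heapify(frontier)
    seen, kept = set(seeds), {}
    while frontier and len(kept) < max_pages:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        score = topic_score(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: prune this branch
        kept[url] = score
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # child priority = parent's relevance
    return kept


if __name__ == "__main__":
    pages = {
        "seed": ("ZhangSan posted a new weibo", ["p1", "p2"]),
        "p1": ("LiSi replied to ZhangSan", ["p3"]),
        "p2": ("the weather is sunny today", ["p4"]),
        "p3": ("LiSi reposted it", []),
        "p4": ("ZhangSan again", []),
    }
    print(focused_crawl(["seed"], lambda u: pages[u], {"ZhangSan", "LiSi"}))
    # {'seed': 0.5, 'p1': 1.0, 'p3': 0.5}
```

The off-topic page `p2` is pruned, so its link to `p4` is never followed even though `p4` would be relevant. Pruning crawlers trade this recall for crawl efficiency.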
In step 1), the 11 nodes comprise 1 NameNode node, 1 SecondaryNameNode node, 1 ZooKeeper node and 8 DataNode/TaskTracker nodes.
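The 11-node layout in step 1) can be written down as data. This sketch uses hypothetical host names and renders the two host lists that Hadoop 1.x start scripts read: `conf/masters` names the SecondaryNameNode host and `conf/slaves` lists the DataNode/TaskTracker hosts. The NameNode address and the ZooKeeper configuration live in other files and are not shown.

```python
# Hypothetical host names: the patent fixes only how many nodes play each role.
CLUSTER = {
    "namenode": ["nn01"],
    "secondarynamenode": ["snn01"],
    "zookeeper": ["zk01"],
    "datanode_tasktracker": [f"dn{i:02d}" for i in range(1, 9)],
}


def total_nodes(cluster: dict[str, list[str]]) -> int:
    return sum(len(hosts) for hosts in cluster.values())


def hadoop1_host_files(cluster: dict[str, list[str]]) -> dict[str, str]:
    """Contents of Hadoop 1.x conf/masters (hosts that run the SecondaryNameNode)
    and conf/slaves (hosts that run a DataNode and a TaskTracker), one host per line."""
    return {
        "masters": "\n".join(cluster["secondarynamenode"]) + "\n",
        "slaves": "\n".join(cluster["datanode_tasktracker"]) + "\n",
    }


if __name__ == "__main__":
    print(total_nodes(CLUSTER))  # 11 = 1 + 1 + 1 + 8
    for name, body in hadoop1_host_files(CLUSTER).items():
        print(f"# conf/{name}\n{body}")
```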
The database in step 2) is HBase.
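Step 2) segments text with a custom dictionary and stores feature attribute values in HBase, but the patent discloses neither the segmentation algorithm nor the table schema. As one plausible reading, this sketch uses forward maximum matching, a standard dictionary-based segmenter for Chinese. It also uses a hypothetical row key made of the user id plus a reversed timestamp, a common HBase pattern that keeps each user's newest posts first. The dictionary, the person list, the `f:` column family and the chosen feature attributes are all illustrative.

```python
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, commonly used for reversed-timestamp row keys


def fmm_segment(text: str, dictionary: set[str], max_len: int = 6) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when no word matches."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + n]
            if n == 1 or piece in dictionary:
                tokens.append(piece)
                i += n
                break
    return tokens


def row_key(user_id: str, ts_millis: int) -> str:
    """Hypothetical row key: user id + reversed, zero-padded timestamp, so that one
    user's newest posts come first in HBase's lexicographic row order."""
    return f"{user_id}#{LONG_MAX - ts_millis:019d}"


def to_feature_cells(user_id, ts_millis, text, dictionary, persons):
    """Turn one post into (row key, {"family:qualifier": value}) ready for an HBase put."""
    tokens = fmm_segment(text, dictionary)
    cells = {
        "f:tokens": " ".join(tokens),
        "f:mentions": ",".join(t for t in tokens if t in persons),  # persons of interest named
        "f:length": str(len(text)),
    }
    return row_key(user_id, ts_millis), cells


if __name__ == "__main__":
    words = {"张三", "李四", "转发", "微博"}
    print(fmm_segment("张三转发了李四的微博", words))
    # ['张三', '转发', '了', '李四', '的', '微博']
```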
In step 3), the data preprocessing includes formulating a missing-value filling rule and a difference calculation rule.
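Step 3) and claim 4 name a missing-value filling rule and a difference calculation rule without defining them. This sketch makes three assumptions. Missing numeric attributes are filled with the mean of the observed values. Numbers are compared by scaled absolute difference and categorical values by a 0/1 mismatch. The per-attribute differences are combined into a Euclidean distance d, and the similarity is 1 / (1 + d). With that conversion, the "similarity exceeds a threshold" test in step 3) applies directly. The rules, the scale factors and the 0.5 threshold are all assumptions.

```python
import math


def fill_missing(records, numeric_keys):
    """Assumed fill rule: replace a missing numeric attribute with the mean of observed values."""
    means = {}
    for key in numeric_keys:
        observed = [r[key] for r in records if r.get(key) is not None]
        means[key] = sum(observed) / len(observed) if observed else 0.0
    return [{**r, **{k: means[k] for k in numeric_keys if r.get(k) is None}} for r in records]


def attr_diff(a, b, scale=None):
    """Assumed difference rule: scaled |a - b| for numbers, 0/1 mismatch otherwise."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) / scale if scale else abs(a - b)
    return 0.0 if a == b else 1.0


def similarity(record, target, scales):
    """Euclidean distance over the target's attributes, mapped to (0, 1] by 1 / (1 + d)."""
    d = math.sqrt(sum(attr_diff(record.get(k), v, scales.get(k)) ** 2 for k, v in target.items()))
    return 1.0 / (1.0 + d)


def match_persons(records, target, scales, threshold=0.5):
    """Keep netizen records whose similarity to the person-of-interest vector exceeds the threshold."""
    return [r for r in records if similarity(r, target, scales) > threshold]


if __name__ == "__main__":
    netizens = [
        {"id": "a", "name": "ZhangSan", "city": "Beijing", "age": 40},
        {"id": "b", "name": "ZhangSan", "city": "Beijing", "age": None},
        {"id": "c", "name": "WangWu", "city": "Shanghai", "age": 25},
    ]
    filled = fill_missing(netizens, ["age"])  # b's age becomes (40 + 25) / 2 = 32.5
    target = {"name": "ZhangSan", "city": "Beijing", "age": 40}
    print([r["id"] for r in match_persons(filled, target, {"age": 10})])  # ['a', 'b']
```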
Compared with the prior art, the method of the present invention for monitoring the statements of key persons and mining their association relationships is reasonably designed and easy to use. Built on a big data platform, the system applies distributed storage and processing technology to collect netizens' login information and browsing information on microblogs. Through information matching and association-relationship mining, it analyzes the statement tendencies and association relationships of the given persons of interest, displays the mined data visually, and keeps tracking them as the microblogs are refreshed.
Description of the drawings
Figure 1 is a schematic flowchart of the method for monitoring the statements of key persons and mining their association relationships.
Embodiment
Embodiment 1:
The steps of this method for monitoring the statements of key persons and mining their association relationships are as follows:
1) Build a Hadoop big data platform: set up a Hadoop cluster consisting of 11 nodes;
2) Collect and parse microblog data: the web crawler is a secondarily developed (customized) version of Nutch that performs topic-focused crawling. Taking information related to the given persons of interest as the topic, it crawls microblog data from the Internet, segments and parses the text according to a custom dictionary, and stores the predefined feature attribute values in a database, forming structured data;
3) Clean the data and match persons: preprocess the structured data by formulating a missing-value filling rule and a difference calculation rule, then use Euclidean distance to compute its similarity to the feature vectors provided for the persons of interest, and select the netizen information whose similarity exceeds a threshold as the analysis target;
4) Analyze statement tendencies and association relationships: using the custom dictionary and techniques such as semantic analysis and word-frequency statistics, analyze the statement tendencies of the persons of interest; from the interaction information between persons collected from microblogs, mine the persons' networks of relationships with an association-relationship algorithm, and keep tracking them as the microblogs are updated;
5) Visualize the data: present the statement tendencies and association relationships of the persons of interest visually.
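For the statement-tendency part of step 4), a minimal dictionary and word-frequency approach works on each segmented post. It counts the positive and negative dictionary hits, flips any hit that directly follows a negator, and scores the post as (pos - neg) / (pos + neg). A person's tendency is then the average over their posts. The word lists below are hypothetical stand-ins for the patent's undisclosed custom dictionary, and the patent's "semantic analysis" is not reproduced.

```python
from collections import Counter

# Hypothetical stand-ins for the custom dictionary.
POSITIVE = {"支持", "赞同", "好"}   # support, agree, good
NEGATIVE = {"反对", "抗议", "差"}   # oppose, protest, bad
NEGATORS = {"不", "没"}            # not, no


def tendency(tokens: list[str]) -> tuple[str, float]:
    """Word-frequency tendency score in [-1, 1]: (pos - neg) / (pos + neg).
    A negator immediately before a sentiment word flips that word's polarity."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        if polarity > 0:
            counts["pos"] += 1
        elif polarity < 0:
            counts["neg"] += 1
    total = counts["pos"] + counts["neg"]
    score = (counts["pos"] - counts["neg"]) / total if total else 0.0
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score


def person_tendency(token_lists: list[list[str]]) -> float:
    """Average post score for one person (posts scored separately so negators never span posts)."""
    scores = [tendency(tokens)[1] for tokens in token_lists]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(tendency(["我", "不", "支持", "这个", "方案"]))  # ('negative', -1.0)
```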
Embodiment 2:
The steps of this method for monitoring the statements of key persons and mining their association relationships are as follows:
1) Build a Hadoop big data platform: set up a Hadoop cluster consisting of 11 nodes, comprising 1 NameNode node, 1 SecondaryNameNode node, 1 ZooKeeper node and 8 DataNode/TaskTracker nodes;
2) Collect and parse microblog data: the web crawler is a secondarily developed (customized) version of Nutch that performs topic-focused crawling. Taking information related to the given persons of interest as the topic, it crawls microblog data from the Internet, segments and parses the text according to a custom dictionary, and stores the predefined feature attribute values in an HBase database, forming structured data;
3) Clean the data and match persons: preprocess the structured data by formulating a missing-value filling rule and a difference calculation rule, then use Euclidean distance to compute its similarity to the feature vectors provided for the persons of interest, and select the netizen information whose similarity exceeds a threshold as the analysis target;
4) Analyze statement tendencies and association relationships: using the custom dictionary and techniques such as semantic analysis and word-frequency statistics, analyze the statement tendencies of the persons of interest; from the interaction information between persons collected from microblogs, mine the persons' networks of relationships with an association-relationship algorithm, and keep tracking them as the microblogs are updated;
5) Visualize the data: present the statement tendencies and association relationships of the persons of interest visually.
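The association-relationship algorithm in step 4) is not specified. A simple stand-in is a weighted interaction graph. Each repost, comment or mention between two accounts adds an assumed weight to their edge, and a person's strongest associates are read off the edge weights. Re-crawled records are de-duplicated, so refreshing a page to follow microblog updates does not double-count interactions. The node-link JSON export is one possible input format for the visualization in step 5). The weights, record layout and class name are all hypothetical.

```python
import json
from collections import defaultdict

# Assumed weights per interaction type; the patent does not disclose its algorithm.
WEIGHTS = {"repost": 3.0, "comment": 2.0, "mention": 1.0}


class RelationGraph:
    """Undirected weighted graph of microblog interactions, updated incrementally."""

    def __init__(self):
        self.adj = defaultdict(lambda: defaultdict(float))  # person -> {other person: weight}
        self.seen = set()  # interaction records already counted

    def ingest(self, interactions):
        """Add (src, dst, kind, ts) records; records seen in an earlier crawl are skipped."""
        added = 0
        for record in interactions:
            src, dst, kind, _ts = record
            if record in self.seen or src == dst:
                continue
            self.seen.add(record)
            weight = WEIGHTS.get(kind, 0.0)
            self.adj[src][dst] += weight
            self.adj[dst][src] += weight
            added += 1
        return added

    def top_associates(self, person, k=3):
        """Strongest associates first; ties broken by name for stable output."""
        edges = self.adj.get(person, {})
        return sorted(edges.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    def ego_network_json(self, person, depth=1):
        """Node-link JSON of the person's neighborhood up to `depth` hops."""
        nodes, frontier = {person}, {person}
        for _ in range(depth):
            frontier = {n for f in frontier for n in self.adj.get(f, {})} - nodes
            nodes |= frontier
        links = [
            {"source": a, "target": b, "weight": w}
            for a in sorted(nodes)
            for b, w in sorted(self.adj.get(a, {}).items())
            if b in nodes and a < b  # emit each undirected edge once
        ]
        return json.dumps({"nodes": [{"id": n} for n in sorted(nodes)], "links": links},
                          ensure_ascii=False)


if __name__ == "__main__":
    g = RelationGraph()
    g.ingest([("ZhangSan", "LiSi", "repost", 1), ("WangWu", "ZhangSan", "comment", 2)])
    print(g.top_associates("ZhangSan"))  # [('LiSi', 3.0), ('WangWu', 2.0)]
```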
From the embodiments above, those skilled in the art can readily implement the present invention. It should be understood, however, that the invention is not limited to these embodiments. On the basis of the disclosed embodiments, those skilled in the art can combine the different technical features arbitrarily to realize different technical solutions.
Claims (4)
1. A method for monitoring the statements of key persons and mining their association relationships, characterized in that the steps of the method are as follows:
1) Build a Hadoop big data platform: set up a Hadoop cluster consisting of 11 nodes;
2) Collect and parse microblog data: the web crawler is a secondarily developed (customized) version of Nutch that performs topic-focused crawling. Taking information related to the given persons of interest as the topic, it crawls microblog data from the Internet, segments and parses the text according to a custom dictionary, and stores the predefined feature attribute values in a database, forming structured data;
3) Clean the data and match persons: preprocess the structured data, then use Euclidean distance to compute its similarity to the feature vectors provided for the persons of interest, and select the netizen information whose similarity exceeds a threshold as the analysis target;
4) Analyze statement tendencies and association relationships: using the custom dictionary and techniques such as semantic analysis and word-frequency statistics, analyze the statement tendencies of the persons of interest; from the interaction information between persons collected from microblogs, mine the persons' networks of relationships with an association-relationship algorithm, and keep tracking them as the microblogs are updated;
5) Visualize the data: present the statement tendencies and association relationships of the persons of interest visually.
2. The method for monitoring the statements of key persons and mining their association relationships according to claim 1, characterized in that, in step 1), the 11 nodes comprise 1 NameNode node, 1 SecondaryNameNode node, 1 ZooKeeper node and 8 DataNode/TaskTracker nodes.
3. The method for monitoring the statements of key persons and mining their association relationships according to claim 1, characterized in that the database in step 2) is HBase.
4. The method for monitoring the statements of key persons and mining their association relationships according to claim 1, characterized in that, in step 3), the data preprocessing includes formulating a missing-value filling rule and a difference calculation rule.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410459905.5A CN104199947A (en) | 2014-09-11 | 2014-09-11 | Important person speech supervision and incidence relation excavating method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410459905.5A CN104199947A (en) | 2014-09-11 | 2014-09-11 | Important person speech supervision and incidence relation excavating method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104199947A (en) | 2014-12-10 |
Family ID: 52085240
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410459905.5A Pending CN104199947A (en) | 2014-09-11 | 2014-09-11 | Important person speech supervision and incidence relation excavating method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104199947A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544196A (en) * | 2012-07-16 | 2014-01-29 | 闫忠华 | BigBase high-throughput big data online analysis software and hardware all-in-one machine |
CN103617169A (en) * | 2013-10-23 | 2014-03-05 | 杭州电子科技大学 | Microblog hot topic extracting method based on Hadoop |
CN103729420A (en) * | 2013-12-20 | 2014-04-16 | 潘大庆 | Microblog hotspot tracking system and method |
Non-Patent Citations (1)
Title |
---|
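唐继禹 (Tang Jiyu): "《云环境下基于个性化模型的探索式搜索技术研究与实现》" (Research and implementation of exploratory search technology based on personalized models in a cloud environment), 《中国优秀硕士学位论文全文数据库》 (China Master's Theses Full-text Database, CNKI) * |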
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104598631A (en) * | 2015-02-05 | 2015-05-06 | 北京航空航天大学 | Distributed data processing platform |
CN104598631B (en) * | 2015-02-05 | 2017-11-14 | 北京航空航天大学 | Distributed data processing platform |
CN104915438A (en) * | 2015-06-25 | 2015-09-16 | 西安交通大学 | Method for acquiring PCU association data in specific topic microblogs |
CN104915438B (en) * | 2015-06-25 | 2019-02-05 | 西安交通大学 | A method of obtaining PCU associated data in specific topics microblogging |
CN105718590A (en) * | 2016-01-27 | 2016-06-29 | 福州大学 | Multi-tenant oriented SaaS public opinion monitoring system and method |
CN110555149A (en) * | 2019-09-05 | 2019-12-10 | 深圳前海微众银行股份有限公司 | Method, device and equipment for processing speech data and readable storage medium |
CN113609403A (en) * | 2021-06-21 | 2021-11-05 | 河南工学院 | Internet public opinion information acquisition method |
CN113609403B (en) * | 2021-06-21 | 2024-03-26 | 河南工学院 | Internet public opinion information acquisition method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3819792A2 (en) | Method, apparatus, device, and storage medium for intention recommendation | |
Abrol et al. | Tweethood: Agglomerative clustering on fuzzy k-closest friends with variable depth for location mining | |
TWI501097B (en) | System and method of analyzing text stream message | |
Gad et al. | ThemeDelta: Dynamic segmentations over temporal topic models | |
CN104281607A (en) | Microblog hot topic analyzing method | |
CN104199947A (en) | Important person speech supervision and incidence relation excavating method | |
Lee | Unsupervised and supervised learning to evaluate event relatedness based on content mining from social-media streams | |
CN103699611B (en) | Microblog flow information extracting method based on dynamic digest technology | |
Psomakelis et al. | Big IoT and social networking data for smart cities: Algorithmic improvements on Big Data Analysis in the context of RADICAL city applications | |
CN110533212A (en) | Urban waterlogging public sentiment monitoring and pre-alarming method based on big data | |
CN108108459A (en) | Multi-source fusion and the associated dynamic data cleaning method of loop and electronic equipment | |
CN104408083A (en) | Socialized media analyzing system | |
CN105678590A (en) | topN recommendation method for social network based on cloud model | |
Chen et al. | D-map+ interactive visual analysis and exploration of ego-centric and event-centric information diffusion patterns in social media | |
Demirbaga | HTwitt: a hadoop-based platform for analysis and visualization of streaming Twitter data | |
Rani et al. | A survey of tools for social network analysis | |
Junaidi et al. | Analysis of Community Response to Disasters through Twitter Social Media | |
CN107239509A (en) | Towards single Topics Crawling method and system of short text | |
Aslam et al. | Opinion mining using live Twitter data | |
CN104035969A (en) | Method and system for building feature word banks in social network | |
Zhang et al. | Rumor detection with hierarchical representation on bipartite ad hoc event trees | |
Leung et al. | Knowledge discovery from big social key-value data | |
CN108830735B (en) | Online interpersonal relationship analysis method and system | |
US10511556B2 (en) | Bursty detection for message streams | |
Kim et al. | Construction of disaster knowledge graphs to enhance disaster resilience |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20141210 |