CN106202278A - A public opinion monitoring system based on data mining technology - Google Patents
- Publication number
- CN106202278A (application number CN201610507203.9A)
- Authority
- CN
- China
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A public opinion monitoring system based on data mining technology, comprising: a data acquisition unit for crawling raw Internet public opinion data with a web crawler; a sharding unit for dividing the raw Internet public opinion data into input splits; a computing unit for merging duplicate keys in the intermediate files to reduce redundancy in the map output files; a buffer unit for allocating a circular memory buffer in memory, through which the map output files are output; an output unit for storing all map output files in a distributed file storage system; a modeling unit for building a network public opinion prediction model; and a prediction unit for reading the map output files from the distributed file storage system and predicting public opinion with the network public opinion prediction model.
Description
Technical field
The present invention relates to the field of big data and cloud computing technology, and in particular to a public opinion monitoring system based on data mining technology.
Background art
Network public opinion is the online opinion formed by the differing views that netizens hold on popular social issues. It is one form of expression of social public opinion: the influential and tendentious statements and views, spread over the Internet, that the public holds on hot or focal issues in real life. Network public opinion mainly takes the form of news commentary, BBS forums, blogs, podcasts, microblogs, aggregated news (RSS), news follow-up comments, reposts, and the like.
Network public opinion is expressed quickly, carries diverse information, and spreads interactively. The openness and virtual nature of the network give network public opinion the following characteristics: directness, randomness and diversity, suddenness, concealment, and bias. These characteristics make network public opinion difficult to monitor.
Summary of the invention
In view of this, the present invention proposes a public opinion monitoring system based on data mining technology.
The public opinion monitoring system based on data mining technology comprises the following units:
Data acquisition unit, for crawling raw Internet public opinion data with a web crawler;
Sharding unit, for dividing the raw Internet public opinion data into input splits and assigning one map task to each input split, where each input split stores the split length and an array recording the positions of the data; and for performing mapping on the data storage node with a pre-written map function to obtain intermediate files;
Computing unit, for merging duplicate keys in the intermediate files to reduce redundancy in the map output files; serializing the merged key-value pairs to obtain map cache files; and automatically obtaining the computational load of each compute node and assigning each map cache file to a compute node according to the nodes' computational loads;
Buffer unit, for allocating a circular memory buffer in memory, through which the map output files are output; a configuration file created in the circular memory buffer sets the memory usage threshold of the buffer; when memory usage in the circular memory buffer reaches or exceeds the threshold, a guard thread suspends writing data into memory and a spill file is written in memory; the spill file determines which files are written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
Output unit, for storing all map output files in a distributed file storage system;
Modeling unit, for building a network public opinion prediction model;
Prediction unit, for reading the map output files from the distributed file storage system and predicting public opinion with the network public opinion prediction model.
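The computing unit's two steps, merging duplicate keys before output and spreading the resulting map cache files over compute nodes by load, can be sketched as follows. This is an illustrative sketch only: the patent does not say how values that share a key are merged or how computational load is measured, so the code assumes (key, count) pairs that are summed, treats the serialized size in bytes as the load a file adds, and assigns files greedily to the currently least-loaded node. All function names are hypothetical.

```python
import heapq
import json
from collections import defaultdict

def combine(intermediate_pairs):
    """Merge duplicate keys by summing their counts (a MapReduce-style combiner)."""
    merged = defaultdict(int)
    for key, count in intermediate_pairs:
        merged[key] += count
    return dict(merged)

def serialize(combined):
    """Serialize merged key-value pairs into the bytes of a 'map cache file'."""
    return json.dumps(sorted(combined.items())).encode("utf-8")

def assign_by_load(cache_files, node_loads):
    """Assign each cache file to the node with the lowest current load.

    cache_files: {file_name: payload_bytes}; node_loads: {node_name: current_load}.
    Larger files are placed first, and a file's size is added to its node's load.
    """
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    assignment = defaultdict(list)
    for name, payload in sorted(cache_files.items(), key=lambda kv: -len(kv[1])):
        load, node = heapq.heappop(heap)
        assignment[node].append(name)
        heapq.heappush(heap, (load + len(payload), node))
    return dict(assignment)

pairs = [("flood", 1), ("rumor", 1), ("flood", 1)]
print(combine(pairs))  # {'flood': 2, 'rumor': 1}
```

Greedy least-loaded placement is only one simple policy; the patent leaves the allocation rule open.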
In the public opinion monitoring system based on data mining technology of the present invention,
the data acquisition unit comprises:
fetching link addresses from a custom crawl list with the web crawler and obtaining web text;
performing web page detection on deep-web data sources, removing data noise, extracting the body text, and determining topic relevance.
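As a rough illustration of the noise-removal and topic-relevance steps, the sketch below treats `<script>` and `<style>` content as noise, keeps the remaining visible text as the body, and scores relevance as the fraction of a topic keyword list that appears in the body. The keyword-overlap score and the 0.5 threshold are assumptions for illustration; the patent does not specify how noise is identified or how topic relevance is determined, and the crawling itself is omitted.

```python
import re
from html.parser import HTMLParser

class BodyTextExtractor(HTMLParser):
    """Collects visible text, skipping <script>/<style> content treated as noise."""

    NOISE_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._noise_depth = 0   # > 0 while inside a noise tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self._noise_depth:
            self._noise_depth -= 1

    def handle_data(self, data):
        if not self._noise_depth and data.strip():
            self.chunks.append(data.strip())

def extract_body(html):
    """Return the page's visible text with noise removed, joined by spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

def topic_relevance(text, topic_terms):
    """Fraction of topic terms that occur as words in the text (hypothetical score)."""
    words = set(re.findall(r"\w+", text.lower()))
    if not topic_terms:
        return 0.0
    return sum(term.lower() in words for term in topic_terms) / len(topic_terms)

def is_relevant(html, topic_terms, threshold=0.5):
    return topic_relevance(extract_body(html), topic_terms) >= threshold

page = ("<html><head><style>p { color: red; }</style><script>var x = 1;</script>"
        "</head><body><p>Flood warning issued</p><p>Residents evacuate</p></body></html>")
print(extract_body(page))                                    # Flood warning issued Residents evacuate
print(is_relevant(page, ["flood", "evacuate", "election"]))  # True
```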
In the public opinion monitoring system based on data mining technology of the present invention, dividing the raw Internet public opinion data into input splits in the sharding unit comprises:
establishing an association table, splitting each input file into a position relation value, an activity relation value, a structural relation value, a function relation value, a functional relation value, a behavior relation value and other relation values, and writing the correspondence of each relation value of each input file into the association table;
assigning the data corresponding to each relation value to input splits.
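One possible reading of this step is sketched below: each input file is represented as a dictionary of relation values, every value is appended to the input split for its relation type, and the association table records where each file's value was placed. Treating each relation type as its own split, and the English relation names, are assumptions; the patent does not define the relation values or the split granularity.

```python
from collections import defaultdict

# Hypothetical English names for the relation values listed in the patent.
RELATIONS = ("position", "activity", "structure", "function",
             "functional", "behavior", "other")

def build_splits(input_files):
    """Split input files into per-relation input splits and record an association table.

    input_files: {file_id: {relation_name: data}}
    Returns (association_table, splits): splits[relation] lists the data for that
    relation, and association_table[(file_id, relation)] gives the (split, index)
    where that file's value was placed.
    """
    association_table = {}
    splits = defaultdict(list)
    for file_id, values in input_files.items():
        for relation in RELATIONS:
            if relation in values:
                splits[relation].append(values[relation])
                association_table[(file_id, relation)] = (relation, len(splits[relation]) - 1)
    return association_table, dict(splits)

files = {"f1": {"position": "Beijing", "behavior": "repost"},
         "f2": {"position": "Shanghai"}}
table, splits = build_splits(files)
print(splits)  # {'position': ['Beijing', 'Shanghai'], 'behavior': ['repost']}
```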
In the public opinion monitoring system based on data mining technology of the present invention, obtaining intermediate files in the sharding unit by mapping on the data storage node with the pre-written map function comprises:
mapping the input splits according to the map tasks with the pre-written map function, wherein the mapping comprises aligning the content of each input split into a list according to a preset data format and determining whether the position, activity, structural, function, functional, behavior and other relation values exist; a relation value that exists is retained directly, and any one or more missing relation values are set to null; the relation values are always arranged in the same order.
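The alignment rule in the map step (a fixed order, with missing relation values set to null) can be stated in a few lines. In this sketch `None` plays the role of null, and the relation names and their order are assumptions chosen for illustration.

```python
# Hypothetical fixed order of relation values; the patent only requires the order to be consistent.
RELATIONS = ("position", "activity", "structure", "function",
             "functional", "behavior", "other")

def align_record(record, order=RELATIONS):
    """Return the record's relation values as a list in a fixed order.

    Values that exist are kept as-is; missing relation values become None (null).
    """
    return [record.get(name) for name in order]

def map_split(records, order=RELATIONS):
    """Apply the alignment to every record of one input split."""
    return [align_record(r, order) for r in records]

print(align_record({"activity": "protest", "position": "Wuhan"}))
# ['Wuhan', 'protest', None, None, None, None, None]
```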
In the public opinion monitoring system based on data mining technology of the present invention,
the output unit comprises:
querying the association table for all index information corresponding to each map output file, and inserting each data segment of each map output file into a segment list; and recording the position, activity, structural, function, functional, behavior and other relation values of the segment data.
In the public opinion monitoring system based on data mining technology of the present invention,
when mapping the input splits according to the map tasks with the pre-written map function, the sharding unit further determines, according to the association table, whether an input split contains a logical error, and discards the input split if it does.
In the public opinion monitoring system based on data mining technology of the present invention,
the modeling unit comprises:
structuring all map output files with a clustering algorithm to form ordered network public opinion data;
performing grey accumulation on the ordered network public opinion data $x^{(0)}$ to generate an accumulated sequence, given by:
$x^{(1)} = [x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)]$, where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, $k = 1, 2, \ldots, n$;
scaling the generated accumulated sequence into [0, 1] by normalization, with the normalization formula
$x_i' = \dfrac{x_i - \min(x)}{\max(x) - \min(x)}$, where $x_i$ and $x_i'$ denote the values before and after the transformation, and $\min(x)$ and $\max(x)$ denote the minimum and maximum of the ordered network public opinion data;
establishing a network public opinion grey model, predicting the pre-input samples, and applying inverse accumulation (restoration) to the predicted values to obtain network public opinion predicted values;
computing the residuals between the network public opinion predicted values and the actual values to obtain residual training samples;
training a back-propagation (BP) neural network on the residual training samples and optimizing it with particle swarm optimization to obtain the network public opinion prediction model.
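The grey-model part of the modeling unit can be sketched in plain Python. The patent does not name the grey model; the sketch assumes the common GM(1,1) model with background values z(k) = (x1(k) + x1(k-1)) / 2 and least-squares estimates of the development coefficient a and grey input b. For brevity it fits the raw series rather than the normalized one, and it leaves out the back-propagation network and particle swarm steps; the residuals it returns are what those steps would be trained on.

```python
import math
from itertools import accumulate

def ago(x0):
    """Grey accumulation (1-AGO): x1(k) = x0(1) + ... + x0(k)."""
    return list(accumulate(x0))

def min_max(x):
    """Scale a sequence into [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]

def gm11_fit(x0):
    """Least-squares estimates of the GM(1,1) parameters a and b."""
    if len(x0) < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")
    x1 = ago(x0)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]  # background values
    y = x0[1:]
    m = len(z)
    sz, szz = sum(z), sum(v * v for v in z)
    sy, szy = sum(y), sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    # Normal equations for x0(k) = -a * z(k) + b, solved by Cramer's rule.
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b

def gm11_predict(x0, a, b, horizon=0):
    """Fitted values plus `horizon` forecasts, restored by inverse accumulation."""
    n = len(x0) + horizon
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n)]
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n)]

def residuals(actual, fitted):
    """Actual minus predicted; these would become the residual training samples."""
    return [y - y_hat for y, y_hat in zip(actual, fitted)]

x0 = [10, 12, 14.4, 17.28, 20.736]
a, b = gm11_fit(x0)
fitted = gm11_predict(x0, a, b, horizon=1)  # 5 fitted values + 1 forecast
print(residuals(x0, fitted))
```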
Compared with the prior art, the public opinion monitoring system based on data mining technology provided by the present invention has the following beneficial effects: massive network public opinion data is divided into several parts according to preset rules and processed in parallel by multiple processors, and the results from the individual processors are then aggregated to obtain the final result. This makes it possible to process large volumes of unstructured data, broadening the types of data that can be processed and increasing processing speed. Furthermore, because the network public opinion prediction model is obtained with a back-propagation neural network, the patterns of change in network public opinion data can be mined in depth, and network public opinion can be monitored effectively and accurately.
Brief description of the drawings
Fig. 1 is an architecture diagram of the public opinion monitoring system based on data mining technology according to an embodiment of the present invention.
Detailed description of the invention
As shown in Fig. 1, a public opinion monitoring system based on data mining technology comprises the following units:
Data acquisition unit, for crawling raw Internet public opinion data with a web crawler.
The sources of the raw Internet public opinion data include channels such as Internet web pages, microblogs, WeChat official accounts and forums.
Sharding unit, for dividing the raw Internet public opinion data into input splits and assigning one map task to each input split, where each input split stores the split length and an array recording the positions of the data; and for performing mapping on the data storage node with a pre-written map function to obtain intermediate files;
Computing unit, for merging duplicate keys in the intermediate files to reduce redundancy in the map output files; serializing the merged key-value pairs to obtain map cache files; and automatically obtaining the computational load of each compute node and assigning each map cache file to a compute node according to the nodes' computational loads;
Buffer unit, for allocating a circular memory buffer in memory, through which the map output files are output; a configuration file created in the circular memory buffer sets the memory usage threshold of the buffer; when memory usage in the circular memory buffer reaches or exceeds the threshold, a guard thread suspends writing data into memory and a spill file is written in memory; the spill file determines which files are written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
Output unit, for storing all map output files in a distributed file storage system;
Modeling unit, for building a network public opinion prediction model;
Prediction unit, for reading the map output files from the distributed file storage system and predicting public opinion with the network public opinion prediction model.
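The buffer unit's spill behavior resembles the in-memory circular buffer that Hadoop MapReduce uses for map output. A simplified, single-threaded sketch follows: records go into a fixed-capacity circular buffer, and once occupancy reaches the configured threshold, writing pauses while the buffered records are moved out as one spill. Assumptions: the `spill_ratio` parameter stands in for the threshold the patent keeps in a configuration file, spills are collected in a list instead of being written to disk, and a synchronous spill replaces the guard thread. The class name is hypothetical.

```python
class SpillingRingBuffer:
    """Fixed-capacity circular buffer that spills once occupancy hits a threshold."""

    def __init__(self, capacity, spill_ratio=0.8):
        if capacity < 1 or not 0 < spill_ratio <= 1:
            raise ValueError("capacity must be >= 1 and 0 < spill_ratio <= 1")
        self.capacity = capacity
        # Stand-in for the memory usage threshold read from the configuration file.
        self.threshold = max(1, int(capacity * spill_ratio))
        self.slots = [None] * capacity
        self.head = 0      # index of the oldest unspilled record
        self.size = 0      # number of unspilled records
        self.spills = []   # each entry stands in for one spill file on disk

    def write(self, record):
        """Append a record; writing pauses (blocks) while a spill is in progress."""
        self.slots[(self.head + self.size) % self.capacity] = record
        self.size += 1
        if self.size >= self.threshold:
            self.spill()

    def spill(self):
        """Move all buffered records, oldest first, into a new spill."""
        if self.size == 0:
            return
        batch = [self.slots[(self.head + i) % self.capacity] for i in range(self.size)]
        self.spills.append(batch)
        self.head = (self.head + self.size) % self.capacity
        self.size = 0

    def flush(self):
        """Spill whatever remains once all map output has been written."""
        self.spill()
        return self.spills

buf = SpillingRingBuffer(capacity=4, spill_ratio=0.75)
for record in range(1, 8):
    buf.write(record)
print(buf.flush())  # [[1, 2, 3], [4, 5, 6], [7]]
```

Because a spill is triggered before the buffer is full, the write position never overtakes unspilled records.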
In the public opinion monitoring system based on data mining technology of the present invention,
the data acquisition unit comprises:
fetching link addresses from a custom crawl list with the web crawler and obtaining web text;
performing web page detection on deep-web data sources, removing data noise, extracting the body text, and determining topic relevance.
In the public opinion monitoring system based on data mining technology of the present invention, dividing the raw Internet public opinion data into input splits in the sharding unit comprises:
establishing an association table, splitting each input file into a position relation value, an activity relation value, a structural relation value, a function relation value, a functional relation value, a behavior relation value and other relation values, and writing the correspondence of each relation value of each input file into the association table;
assigning the data corresponding to each relation value to input splits.
In the public opinion monitoring system based on data mining technology of the present invention, obtaining intermediate files in the sharding unit by mapping on the data storage node with the pre-written map function comprises:
mapping the input splits according to the map tasks with the pre-written map function, wherein the mapping comprises aligning the content of each input split into a list according to a preset data format and determining whether the position, activity, structural, function, functional, behavior and other relation values exist; a relation value that exists is retained directly, and any one or more missing relation values are set to null; the relation values are always arranged in the same order.
In the public opinion monitoring system based on data mining technology of the present invention,
the output unit comprises:
querying the association table for all index information corresponding to each map output file, and inserting each data segment of each map output file into a segment list; and recording the position, activity, structural, function, functional, behavior and other relation values of the segment data.
In the public opinion monitoring system based on data mining technology of the present invention,
when mapping the input splits according to the map tasks with the pre-written map function, the sharding unit further determines, according to the association table, whether an input split contains a logical error, and discards the input split if it does.
In the public opinion monitoring system based on data mining technology of the present invention,
the modeling unit comprises:
structuring all map output files with a clustering algorithm to form ordered network public opinion data;
performing grey accumulation on the ordered network public opinion data $x^{(0)}$ to generate an accumulated sequence, given by:
$x^{(1)} = [x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)]$, where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, $k = 1, 2, \ldots, n$;
scaling the generated accumulated sequence into [0, 1] by normalization, with the normalization formula
$x_i' = \dfrac{x_i - \min(x)}{\max(x) - \min(x)}$, where $x_i$ and $x_i'$ denote the values before and after the transformation, and $\min(x)$ and $\max(x)$ denote the minimum and maximum of the ordered network public opinion data;
establishing a network public opinion grey model, predicting the pre-input samples, and applying inverse accumulation (restoration) to the predicted values to obtain network public opinion predicted values;
computing the residuals between the network public opinion predicted values and the actual values to obtain residual training samples;
training a back-propagation (BP) neural network on the residual training samples and optimizing it with particle swarm optimization to obtain the network public opinion prediction model.
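The final modeling step trains a BP network on the residual samples and optimizes it with particle swarm optimization, but the patent does not state which quantities PSO tunes; in BP-PSO hybrids PSO is often used to search network weights by minimizing training error. The sketch below shows only a generic global-best PSO, applied to a toy least-squares objective that stands in for the network's training error. The inertia and acceleration constants are common textbook values and, like the function name, are assumptions rather than details from the patent.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=20, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization inside the box [lo, hi]^dim.

    w is the inertia weight; c1 and c2 weight the pull toward each particle's
    own best position and toward the swarm's best position.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective: mean squared error of the line p[0] * x + p[1] on data from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]

def mse(p):
    return sum((p[0] * x + p[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best, loss = pso_minimize(mse, dim=2, bounds=(-5.0, 5.0))
print(best, loss)
```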
Compared with the prior art, the public opinion monitoring system based on data mining technology provided by the present invention has the following beneficial effects: massive network public opinion data is divided into several parts according to preset rules and processed in parallel by multiple processors, and the results from the individual processors are then aggregated to obtain the final result. This makes it possible to process large volumes of unstructured data, broadening the types of data that can be processed and increasing processing speed. Furthermore, because the network public opinion prediction model is obtained with a back-propagation neural network, the patterns of change in network public opinion data can be mined in depth, and network public opinion can be monitored effectively and accurately.
It will be understood that those of ordinary skill in the art may make various other corresponding changes and modifications based on the technical concept of the present invention, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.
Claims (7)
1. A public opinion monitoring system based on data mining technology, characterized in that it comprises the following units:
a data acquisition unit, for crawling raw Internet public opinion data with a web crawler;
a sharding unit, for dividing the raw Internet public opinion data into input splits and assigning one map task to each input split, each input split storing the split length and an array recording the positions of the data; and for performing mapping on the data storage node with a pre-written map function to obtain intermediate files;
a computing unit, for merging duplicate keys in the intermediate files to reduce redundancy in the map output files; serializing the merged key-value pairs to obtain map cache files; and automatically obtaining the computational load of each compute node and assigning each map cache file to a compute node according to the nodes' computational loads;
a buffer unit, for allocating a circular memory buffer in memory, through which the map output files are output; creating a configuration file in the circular memory buffer that sets the memory usage threshold of the buffer; and, when memory usage in the circular memory buffer reaches or exceeds the threshold, having a guard thread suspend writing data into memory and writing a spill file in memory, the spill file determining which files are written to disk, and writing the contents of the circular memory buffer to disk until all map output files have been output;
an output unit, for storing all map output files in a distributed file storage system;
a modeling unit, for building a network public opinion prediction model; and
a prediction unit, for reading the map output files from the distributed file storage system and predicting public opinion with the network public opinion prediction model.
2. The public opinion monitoring system based on data mining technology according to claim 1, characterized in that
the data acquisition unit comprises:
fetching link addresses from a custom crawl list with the web crawler and obtaining web text;
performing web page detection on deep-web data sources, removing data noise, extracting the body text, and determining topic relevance.
3. The public opinion monitoring system based on data mining technology according to claim 2, characterized in that dividing the raw Internet public opinion data into input splits in the sharding unit comprises:
establishing an association table, splitting each input file into a position relation value, an activity relation value, a structural relation value, a function relation value, a functional relation value, a behavior relation value and other relation values, and writing the correspondence of each relation value of each input file into the association table;
assigning the data corresponding to each relation value to input splits.
4. The public opinion monitoring system based on data mining technology according to claim 3, characterized in that obtaining intermediate files in the sharding unit by mapping on the data storage node with the pre-written map function comprises:
mapping the input splits according to the map tasks with the pre-written map function, wherein the mapping comprises aligning the content of each input split into a list according to a preset data format and determining whether the position, activity, structural, function, functional, behavior and other relation values exist; a relation value that exists is retained directly, and any one or more missing relation values are set to null; the relation values are always arranged in the same order.
5. The public opinion monitoring system based on data mining technology according to claim 4, characterized in that
the output unit comprises:
querying the association table for all index information corresponding to each map output file, and inserting each data segment of each map output file into a segment list; and recording the position, activity, structural, function, functional, behavior and other relation values of the segment data.
6. The public opinion monitoring system based on data mining technology according to claim 5, characterized in that
when mapping the input splits according to the map tasks with the pre-written map function, the sharding unit further determines, according to the association table, whether an input split contains a logical error, and discards the input split if it does.
7. The public opinion monitoring system based on data mining technology according to claim 6, characterized in that
the modeling unit comprises:
structuring all map output files with a clustering algorithm to form ordered network public opinion data;
performing grey accumulation on the ordered network public opinion data $x^{(0)}$ to generate an accumulated sequence, given by:
$x^{(1)} = [x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)]$, where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, $k = 1, 2, \ldots, n$;
scaling the generated accumulated sequence into [0, 1] by normalization, with the normalization formula
$x_i' = \dfrac{x_i - \min(x)}{\max(x) - \min(x)}$, where $x_i$ and $x_i'$ denote the values before and after the transformation, and $\min(x)$ and $\max(x)$ denote the minimum and maximum of the ordered network public opinion data;
establishing a network public opinion grey model, predicting the pre-input samples, and applying inverse accumulation (restoration) to the predicted values to obtain network public opinion predicted values;
computing the residuals between the network public opinion predicted values and the actual values to obtain residual training samples;
training a back-propagation (BP) neural network on the residual training samples and optimizing it with particle swarm optimization to obtain the network public opinion prediction model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610507203.9A CN106202278B (en) | 2016-07-01 | 2016-07-01 | A public opinion monitoring system based on data mining technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610507203.9A CN106202278B (en) | 2016-07-01 | 2016-07-01 | A public opinion monitoring system based on data mining technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106202278A | 2016-12-07 |
CN106202278B | 2019-08-13 |
Family
ID=57463003
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610507203.9A Active CN106202278B (en) | 2016-07-01 | 2016-07-01 | A public opinion monitoring system based on data mining technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106202278B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5724573A (en) * | 1995-12-22 | 1998-03-03 | International Business Machines Corporation | Method and system for mining quantitative association rules in large relational tables |
CN104063230A (en) * | 2014-07-09 | 2014-09-24 | Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences | Rough set parallel reduction method, device and system based on MapReduce |
Non-Patent Citations (1)
Title |
---|
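杨光明子: "Design and Implementation of an APT Defense Platform Based on Intrusion Detection", China Master's Theses Full-text Database, Information Science and Technology Series * |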
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106951475A (en) * | 2017-03-07 | 2017-07-14 | Zhengzhou Railway Vocational & Technical College | Big data distributed approach and system based on cloud computing |
CN107679133A (en) * | 2017-09-22 | 2018-02-09 | University of Electronic Science and Technology of China | Mining method applicable to massive real-time PMU data |
CN107679133B (en) * | 2017-09-22 | 2020-01-17 | University of Electronic Science and Technology of China | Mining method applicable to massive real-time PMU data |
WO2020042427A1 (en) * | 2018-08-31 | 2020-03-05 | Ping An Technology (Shenzhen) Co., Ltd. | Reconciliation method and apparatus based on data fragments, computer device, and storage medium |
CN109471965A (en) * | 2018-10-26 | 2019-03-15 | Sichuan Caizi Software Information Network Co., Ltd. | A network public opinion data collection and processing method and monitoring platform based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN106202278B (en) | 2019-08-13 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |