CN106202278A - A public opinion monitoring system based on data mining technology


Info

Publication number
CN106202278A
Authority
CN
China
Legal status
Granted
Application number
CN201610507203.9A
Other languages
Chinese (zh)
Other versions
CN106202278B (en)
Inventor
刘丽君
李成华
Current Assignee
WUHAN TIPDM INTELLIGENT TECHNOLOGY Co Ltd
Original Assignee
WUHAN TIPDM INTELLIGENT TECHNOLOGY Co Ltd
Priority date
2016-07-01
Filing date
2016-07-01
Publication date
2016-12-07
2016-07-01: Application filed by WUHAN TIPDM INTELLIGENT TECHNOLOGY Co Ltd
2016-07-01: Priority to CN201610507203.9A
2016-12-07: Publication of CN106202278A
Application granted
2019-08-13: Publication of CN106202278B
Status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Abstract

A public opinion monitoring system based on data mining technology, comprising: a data acquisition unit, for crawling raw Internet public-opinion data with a web crawler; a sharding unit, for dividing the raw Internet public-opinion data into input splits; a computing unit, for merging duplicate keys in intermediate files to reduce redundancy in the map output files; a buffer unit, for allocating a circular memory buffer in memory, the circular memory buffer being used to output the map output files; an output unit, for storing all map output files in a distributed file storage system; a modeling unit, for building an online public-opinion prediction model; and a prediction unit, for reading the map output files from the distributed file storage system and performing public-opinion prediction with the online public-opinion prediction model.

Description

A public opinion monitoring system based on data mining technology
Technical field
The present invention relates to the field of big data and cloud computing technology, and in particular to a public opinion monitoring system based on data mining technology.
Background art
Online public opinion refers to the views that netizens widely express online about social issues, and it is one form in which public opinion manifests. It consists of influential and tendentious statements and views that the public holds about hot-spot and focal issues in real life and spreads via the Internet. Its main forms of expression are news commentary, BBS forums, blogs, podcasts, microblogs, aggregated news (RSS), news follow-up comments, and reposts.
Online public opinion is expressed quickly, its information is diverse, and its mode of expression is interactive. The openness and virtuality of the network give online public opinion the following characteristics: directness, randomness and diversity, suddenness, concealment, and bias. These characteristics also make online public opinion difficult to monitor.
Summary of the invention
In view of this, the present invention proposes a public opinion monitoring system based on data mining technology.
A public opinion monitoring system based on data mining technology comprises the following units:
Data acquisition unit, for crawling raw Internet public-opinion data with a web crawler;
Sharding unit, for dividing the raw Internet public-opinion data into input splits and assigning one map task to each input split, each input split storing the split length and an array recording the locations of its data;
and for performing mapping on the data storage nodes with a pre-written map function to obtain intermediate files;
Computing unit, for merging duplicate keys in the intermediate files to reduce redundancy in the map output files, and for serializing the combined key-value pairs to obtain map cache files; the computing unit automatically obtains the computational load value of each computing node and assigns each map cache file to a computing node according to those load values;
Buffer unit, for allocating a circular memory buffer in memory, the circular memory buffer being used to output the map output files; a configuration file created in the circular memory buffer sets the memory-occupancy threshold of the buffer; when memory occupancy in the circular memory buffer is greater than or equal to the occupancy threshold, a guard thread suspends writing data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
Output unit, for storing all map output files in a distributed file storage system;
Modeling unit, for building an online public-opinion prediction model;
Prediction unit, for reading the map output files from the distributed file storage system and performing public-opinion prediction with the online public-opinion prediction model.
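The split, combine, and load-based assignment steps above follow the familiar MapReduce pattern. The Python sketch below illustrates them under stated assumptions and is not the patented implementation. `InputSplit`, `make_splits`, `map_split`, `combine`, `serialize`, and `assign_by_load` are hypothetical names. The map function simply counts whitespace-separated terms, and JSON stands in for the serialization format. A node's "computational load value" is assumed to be the number of bytes already assigned to it.

```python
import heapq
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class InputSplit:
    split_id: int
    length: int = 0                                   # split length in bytes
    offsets: List[int] = field(default_factory=list)  # byte offset of each record in the input
    records: List[str] = field(default_factory=list)


def make_splits(records: Iterable[str], max_bytes: int) -> List[InputSplit]:
    """Cut raw crawled records into input splits of at most max_bytes each
    (a single oversized record still gets a split of its own)."""
    splits: List[InputSplit] = []
    offset = 0
    for rec in records:
        size = len(rec.encode("utf-8"))
        if not splits or splits[-1].length + size > max_bytes:
            splits.append(InputSplit(split_id=len(splits)))
        cur = splits[-1]
        cur.offsets.append(offset)
        cur.records.append(rec)
        cur.length += size
        offset += size
    return splits


def map_split(split: InputSplit) -> List[Tuple[str, int]]:
    """Hypothetical map function: emit (term, 1) for every whitespace token."""
    return [(tok, 1) for rec in split.records for tok in rec.split()]


def combine(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Combiner: merge duplicate keys so fewer pairs leave the map side."""
    counts: Counter = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def serialize(combined: Dict[str, int]) -> bytes:
    """Serialize combined pairs into a map cache 'file' (JSON bytes here)."""
    return json.dumps(sorted(combined.items()), ensure_ascii=False).encode("utf-8")


def assign_by_load(cache_files: Dict[str, bytes],
                   node_loads: Dict[str, int]) -> Dict[str, str]:
    """Greedy placement: largest cache file first, each to the currently
    least-loaded node; a file's byte size is added to that node's load."""
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    plan: Dict[str, str] = {}
    for name, blob in sorted(cache_files.items(), key=lambda kv: -len(kv[1])):
        load, node = heapq.heappop(heap)
        plan[name] = node
        heapq.heappush(heap, (load + len(blob), node))
    return plan
```

For example, `make_splits(["a b a", "b c", "c c c"], max_bytes=8)` yields two splits. Combining the first split gives `{"a": 2, "b": 2, "c": 1}`, which is the redundancy reduction the computing unit describes. The greedy largest-first placement is one simple load-balancing choice among many.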
In the public opinion monitoring system based on data mining technology of the present invention,
the data acquisition unit is configured to:
Take link addresses from a custom crawl list with the web crawler and obtain the web text;
Probe deep-web data sources in the web pages, remove data noise, extract the body text, and perform topic-relevance determination.
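A minimal sketch of the cleaning and relevance steps, using only the Python standard library, follows. It rests on several assumptions. Deep-web source probing is not modeled at all. "Data noise" is taken to be the content of `script`, `style`, `nav`, `header`, `footer`, and `aside` elements. Topic relevance is the fraction of topic terms found in the body text, compared against a hypothetical threshold. `extract_body`, `topic_relevance`, `fetch`, and `crawl` are illustrative names, and `crawl` accepts an injectable fetcher so it can be exercised offline.

```python
import re
import urllib.request
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List

NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}  # assumed noise


class BodyTextExtractor(HTMLParser):
    """Collect visible text, skipping anything nested inside a noise tag."""

    def __init__(self) -> None:
        super().__init__()
        self.noise_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth:
            self.noise_depth -= 1

    def handle_data(self, data):
        if not self.noise_depth and data.strip():
            self.chunks.append(data.strip())


def extract_body(html: str) -> str:
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def topic_relevance(text: str, topic_terms: List[str]) -> float:
    """Fraction of topic terms that occur in the text (case-insensitive)."""
    if not topic_terms:
        return 0.0
    lowered = text.lower()
    return sum(1 for t in topic_terms if t.lower() in lowered) / len(topic_terms)


def fetch(url: str, timeout: float = 10.0) -> str:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(crawl_list: Iterable[str], topic_terms: List[str], threshold: float = 0.5,
          fetcher: Callable[[str], str] = fetch) -> Dict[str, str]:
    """Fetch each URL from the custom crawl list and keep topic-relevant body text."""
    kept: Dict[str, str] = {}
    for url in crawl_list:
        text = extract_body(fetcher(url))
        if topic_relevance(text, topic_terms) >= threshold:
            kept[url] = text
    return kept
```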
In the public opinion monitoring system based on data mining technology of the present invention, dividing the raw Internet public-opinion data into input splits in the sharding unit comprises:
Establishing an association table; splitting each input file into a position relation value, an activity relation value, a structural relation value, a function relation value, a functional relation value, a behavior relation value, and other relation values; and writing the correspondence of each relation value of each input file into the association table;
Assigning the data corresponding to each relation value to an input split.
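The text names seven relation values but does not define how a field is classified into one of them. The sketch below therefore assumes that each input file arrives as a mapping from relation name to value, and that unrecognized field names fall into "other". `RELATIONS`, `build_association_table`, and `splits_by_relation` are hypothetical names, and the English relation labels are chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Relation categories as listed in the text (English labels are illustrative).
RELATIONS = ("position", "activity", "structure", "function",
             "functional", "behavior", "other")

Table = Dict[Tuple[str, str], List[str]]


def build_association_table(files: Dict[str, Dict[str, str]]) -> Table:
    """Record (file_id, relation) -> values; unknown field names map to 'other'."""
    table: Table = defaultdict(list)
    for file_id, fields in files.items():
        for name, value in fields.items():
            relation = name if name in RELATIONS else "other"
            table[(file_id, relation)].append(value)
    return dict(table)


def splits_by_relation(table: Table) -> Dict[str, List[Tuple[str, str]]]:
    """Put the data for each relation value into its own input split."""
    splits: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for (file_id, relation), values in table.items():
        splits[relation].extend((file_id, v) for v in values)
    return dict(splits)
```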
In the public opinion monitoring system based on data mining technology of the present invention, performing mapping on the data storage nodes with the pre-written map function in the sharding unit to obtain intermediate files comprises:
Mapping the input splits according to the map tasks with the pre-written map function. The mapping comprises: aligning the content of each input split into a list according to a preset data format; determining whether the position, activity, structural, function, functional, behavior, and other relation values exist; retaining each relation value that exists as-is, and setting any missing relation value (one or several) to empty; and keeping the order of the relations consistent across all entries.
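The list-alignment rule (keep present values, leave missing ones empty, never reorder) amounts to a fixed-order projection. In this hypothetical sketch "empty" is represented by `None`, the relation labels are illustrative, and `align_record` and `map_split` are assumed names rather than the patent's own.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Fixed relation order; every aligned row uses exactly this order.
RELATIONS = ("position", "activity", "structure", "function",
             "functional", "behavior", "other")


def align_record(record: Dict[str, str]) -> List[Optional[str]]:
    """Present relation values are kept as-is; missing ones become None."""
    return [record.get(rel) for rel in RELATIONS]


def map_split(split: Iterable[Tuple[str, Dict[str, str]]]
              ) -> List[Tuple[str, List[Optional[str]]]]:
    """Hypothetical map function: emit (file_id, aligned row) for each record."""
    return [(file_id, align_record(rec)) for file_id, rec in split]
```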
In the public opinion monitoring system based on data mining technology of the present invention,
the output unit is configured to:
Query the association table for all index information corresponding to each map output file, and insert each data segment corresponding to each map output file into a list; for each segment, record its position, activity, structural, function, functional, behavior, and other relation values.
In the public opinion monitoring system based on data mining technology of the present invention,
when the sharding unit maps the input splits according to the map tasks with the pre-written map function, it further determines, according to the association table, whether an input split contains a logical error and, if so, discards that input split.
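The text does not define what counts as a logical error. Purely as an assumption, the sketch below treats an input split as erroneous if any of its (file, relation, value) rows uses an unknown relation or is not recorded in the association table, and it drops such splits before mapping. `has_logical_error` and `drop_invalid_splits` are hypothetical names.

```python
from typing import Dict, List, Tuple

RELATIONS = ("position", "activity", "structure", "function",
             "functional", "behavior", "other")

Row = Tuple[str, str, str]                 # (file_id, relation, value)
Table = Dict[Tuple[str, str], List[str]]   # (file_id, relation) -> values


def has_logical_error(split: List[Row], table: Table) -> bool:
    """Assumed rule: every row must use a known relation and match the table."""
    for file_id, relation, value in split:
        if relation not in RELATIONS:
            return True
        if value not in table.get((file_id, relation), []):
            return True
    return False


def drop_invalid_splits(splits: List[List[Row]], table: Table) -> List[List[Row]]:
    """Keep only the input splits that pass the consistency check."""
    return [s for s in splits if not has_logical_error(s, table)]
```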
In the public opinion monitoring system based on data mining technology of the present invention,
the modeling unit is configured to:
Construct all map output files with a clustering algorithm to form ordered online public-opinion data;
Apply grey accumulation to the ordered online public-opinion data to generate an accumulated sequence:
$$x^{(1)} = [x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)],$$ where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$ for $k = 1, 2, \ldots, n$, and $x^{(0)}$ is the ordered online public-opinion data sequence;
Scale the generated accumulated sequence into the interval [0, 1] by normalization, using $$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)},$$ where $x_i$ and $x_i'$ are the values before and after transformation, and $\min(x)$ and $\max(x)$ are the minimum and maximum of the ordered online public-opinion data;
Build an online public-opinion grey model, predict the pre-input samples, and apply inverse accumulation (restoration) to the predicted values to obtain online public-opinion predicted values;
Compute the residuals between the online public-opinion predicted values and the actual values to obtain residual training samples;
Train a back-propagation neural network on the residual training samples and optimize it with a particle swarm optimization algorithm to obtain the online public-opinion prediction model.
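Grey accumulation, the grey model, and inverse accumulation are the standard building blocks of a GM(1,1) forecast. The sketch below makes three simplifying choices. It assumes the grey model is GM(1,1), since the text says only "grey model". It fits that model to the raw ordered series rather than to the normalized one. It provides `min_max` as a separate helper for the normalization step. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def ago(x0: List[float]) -> List[float]:
    """1-AGO (grey accumulation): x1(k) = x0(1) + ... + x0(k)."""
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    return x1


def min_max(x: List[float]) -> List[float]:
    """Min-max normalization into [0, 1]; a constant series maps to zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]


def fit_gm11(x0: List[float]) -> Tuple[float, float]:
    """Least-squares estimate of a (development coefficient) and b (grey input)
    in x0(k) + a * z1(k) = b, where z1(k) is the mean of x1(k) and x1(k-1)."""
    x1 = ago(x0)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    n = len(z)
    sz, szz = sum(z), sum(v * v for v in z)
    sy, szy = sum(y), sum(zi * yi for zi, yi in zip(z, y))
    det = n * szz - sz * sz
    if det == 0:
        raise ValueError("series too short or degenerate for GM(1,1)")
    a = -(n * szy - sz * sy) / det
    b = (szz * sy - sz * szy) / det
    return a, b


def gm11_forecast(x0: List[float], horizon: int = 0) -> List[float]:
    """Evaluate the GM(1,1) time response for len(x0) + horizon points and
    restore the original scale by inverse accumulation (IAGO)."""
    a, b = fit_gm11(x0)
    if a == 0:
        raise ValueError("a == 0: time-response formula undefined")
    n = len(x0) + horizon
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n)]
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n)]


def residuals(actual: List[float], predicted: List[float]) -> List[float]:
    """Residual training samples e(k) = actual(k) - predicted(k)."""
    return [a - p for a, p in zip(actual, predicted)]
```

For a smoothly growing series such as `[10, 11, 12.1, 13.31, 14.641]`, the fitted values track the data closely. The residual list is what the text feeds to the neural network stage.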
Compared with the prior art, the public opinion monitoring system based on data mining technology provided by the present invention has the following beneficial effects. Massive online public-opinion data are divided into several parts according to preset rules and handed to multiple processors for parallel processing. The results from the processors are then aggregated to obtain the final result. This makes it possible to process large volumes of unstructured data, and it improves both the variety of data that can be handled and the processing speed. Moreover, because the online public-opinion prediction model is obtained through a back-propagation neural network, the patterns of change in online public-opinion data can be mined in depth, so that online public opinion can be monitored effectively and accurately.
Accompanying drawing explanation
Fig. 1 is an architecture diagram of the public opinion monitoring system based on data mining technology according to an embodiment of the present invention.
Detailed description of the invention
As shown in Fig. 1, a public opinion monitoring system based on data mining technology comprises the following units:
Data acquisition unit, for crawling raw Internet public-opinion data with a web crawler.
The sources of the raw Internet public-opinion data include web pages, microblogs, WeChat official accounts, forums, and similar channels.
Sharding unit, for dividing the raw Internet public-opinion data into input splits and assigning one map task to each input split, each input split storing the split length and an array recording the locations of its data;
and for performing mapping on the data storage nodes with a pre-written map function to obtain intermediate files;
Computing unit, for merging duplicate keys in the intermediate files to reduce redundancy in the map output files, and for serializing the combined key-value pairs to obtain map cache files; the computing unit automatically obtains the computational load value of each computing node and assigns each map cache file to a computing node according to those load values;
Buffer unit, for allocating a circular memory buffer in memory, the circular memory buffer being used to output the map output files; a configuration file created in the circular memory buffer sets the memory-occupancy threshold of the buffer; when memory occupancy in the circular memory buffer is greater than or equal to the occupancy threshold, a guard thread suspends writing data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
Output unit, for storing all map output files in a distributed file storage system;
Modeling unit, for building an online public-opinion prediction model;
Prediction unit, for reading the map output files from the distributed file storage system and performing public-opinion prediction with the online public-opinion prediction model.
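The buffer unit resembles the map-side buffer in Hadoop MapReduce, where map output goes into a circular memory buffer and is spilled to disk once a usage threshold is reached. The single-threaded sketch below differs from the text in three ways. It spills synchronously instead of using a separate guard thread. It takes the threshold as a constructor parameter instead of reading a configuration file. Its class and spill-file names are hypothetical.

```python
import os
import tempfile
from typing import List, Optional


class CircularSpillBuffer:
    """Byte ring buffer for map output that spills to disk at a usage threshold."""

    def __init__(self, capacity: int, spill_ratio: float = 0.8,
                 spill_dir: Optional[str] = None) -> None:
        self.capacity = capacity
        self.threshold = max(1, int(capacity * spill_ratio))  # occupancy threshold (bytes)
        self.buf = bytearray(capacity)
        self.head = 0   # start of unspilled data
        self.tail = 0   # next write position
        self.used = 0   # bytes currently buffered
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="spill-")
        self.spill_files: List[str] = []

    def write(self, record: bytes) -> None:
        if len(record) > self.capacity:
            raise ValueError("record larger than the whole buffer")
        if self.used + len(record) > self.capacity:
            self.spill()                      # make room instead of overwriting
        end = self.tail + len(record)
        if end <= self.capacity:
            self.buf[self.tail:end] = record
        else:                                 # wrap around the end of the ring
            first = self.capacity - self.tail
            self.buf[self.tail:] = record[:first]
            self.buf[:len(record) - first] = record[first:]
        self.tail = end % self.capacity
        self.used += len(record)
        if self.used >= self.threshold:       # occupancy >= threshold: spill
            self.spill()

    def spill(self) -> None:
        """Write the buffered bytes, oldest first, to a new spill file."""
        if self.used == 0:
            return
        end = self.head + self.used
        if end <= self.capacity:
            data = bytes(self.buf[self.head:end])
        else:
            data = bytes(self.buf[self.head:]) + bytes(self.buf[:end - self.capacity])
        path = os.path.join(self.spill_dir, f"spill_{len(self.spill_files):04d}.out")
        with open(path, "wb") as fh:
            fh.write(data)
        self.spill_files.append(path)
        self.head, self.used = self.tail, 0

    def flush(self) -> List[str]:
        """Spill whatever remains once all map output has been written."""
        self.spill()
        return self.spill_files
```

Concatenating the spill files in order reproduces the map output byte for byte, including records that wrapped around the end of the ring.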
In the public opinion monitoring system based on data mining technology of the present invention,
the data acquisition unit is configured to:
Take link addresses from a custom crawl list with the web crawler and obtain the web text;
Probe deep-web data sources in the web pages, remove data noise, extract the body text, and perform topic-relevance determination.
In the public opinion monitoring system based on data mining technology of the present invention, dividing the raw Internet public-opinion data into input splits in the sharding unit comprises:
Establishing an association table; splitting each input file into a position relation value, an activity relation value, a structural relation value, a function relation value, a functional relation value, a behavior relation value, and other relation values; and writing the correspondence of each relation value of each input file into the association table;
Assigning the data corresponding to each relation value to an input split.
In the public opinion monitoring system based on data mining technology of the present invention, performing mapping on the data storage nodes with the pre-written map function in the sharding unit to obtain intermediate files comprises:
Mapping the input splits according to the map tasks with the pre-written map function. The mapping comprises: aligning the content of each input split into a list according to a preset data format; determining whether the position, activity, structural, function, functional, behavior, and other relation values exist; retaining each relation value that exists as-is, and setting any missing relation value (one or several) to empty; and keeping the order of the relations consistent across all entries.
In the public opinion monitoring system based on data mining technology of the present invention,
the output unit is configured to:
Query the association table for all index information corresponding to each map output file, and insert each data segment corresponding to each map output file into a list; for each segment, record its position, activity, structural, function, functional, behavior, and other relation values.
In the public opinion monitoring system based on data mining technology of the present invention,
when the sharding unit maps the input splits according to the map tasks with the pre-written map function, it further determines, according to the association table, whether an input split contains a logical error and, if so, discards that input split.
In the public opinion monitoring system based on data mining technology of the present invention,
the modeling unit is configured to:
Construct all map output files with a clustering algorithm to form ordered online public-opinion data;
Apply grey accumulation to the ordered online public-opinion data to generate an accumulated sequence:
$$x^{(1)} = [x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)],$$ where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$ for $k = 1, 2, \ldots, n$, and $x^{(0)}$ is the ordered online public-opinion data sequence;
Scale the generated accumulated sequence into the interval [0, 1] by normalization, using $$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)},$$ where $x_i$ and $x_i'$ are the values before and after transformation, and $\min(x)$ and $\max(x)$ are the minimum and maximum of the ordered online public-opinion data;
Build an online public-opinion grey model, predict the pre-input samples, and apply inverse accumulation (restoration) to the predicted values to obtain online public-opinion predicted values;
Compute the residuals between the online public-opinion predicted values and the actual values to obtain residual training samples;
Train a back-propagation neural network on the residual training samples and optimize it with a particle swarm optimization algorithm to obtain the online public-opinion prediction model.
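The text says that the back-propagation network trained on the residuals is "optimized with particle swarm optimization" but gives no details. The sketch below shows one common reading, under stated assumptions. A global-best PSO searches all weights of a small one-hidden-layer tanh network that predicts the next residual from the previous `lag` residuals, and the gradient-based back-propagation fine-tuning stage is omitted. `pso`, `mlp`, and `fit_residual_model` are hypothetical names, and the swarm coefficients (inertia 0.729, acceleration 1.494) are conventional defaults rather than values from the patent.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def pso(loss: Callable[[List[float]], float], dim: int, n_particles: int = 30,
        iters: int = 300, w: float = 0.729, c1: float = 1.494, c2: float = 1.494,
        bound: float = 2.0, seed: int = 0) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization; returns (best position, best loss)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < pbest_val[i]:                # update personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:               # update global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def mlp(params: Sequence[float], x: Sequence[float], hidden: int) -> float:
    """One-hidden-layer tanh network. Per hidden unit: n_in weights, a bias,
    and an output weight; the last parameter is the output bias."""
    n_in, stride = len(x), len(x) + 2
    y = params[hidden * stride]
    for h in range(hidden):
        base = h * stride
        s = params[base + n_in] + sum(params[base + j] * x[j] for j in range(n_in))
        y += params[base + n_in + 1] * math.tanh(s)
    return y


def fit_residual_model(e: Sequence[float], lag: int = 2, hidden: int = 3,
                       seed: int = 0) -> Tuple[List[float], float]:
    """Fit mlp(e[k-lag:k]) -> e[k] by PSO over all network weights (MSE loss)."""
    samples = [(list(e[k - lag:k]), e[k]) for k in range(lag, len(e))]

    def mse(p: List[float]) -> float:
        return sum((mlp(p, x, hidden) - t) ** 2 for x, t in samples) / len(samples)

    return pso(mse, dim=hidden * (lag + 2) + 1, seed=seed)
```

In use, the residual predicted by `mlp(params, last_lag_residuals, hidden)` would be added to the grey-model forecast to correct it. That correction step is implied by the residual-training design rather than spelled out in the text.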
Compared with the prior art, the public opinion monitoring system based on data mining technology provided by the present invention has the following beneficial effects. Massive online public-opinion data are divided into several parts according to preset rules and handed to multiple processors for parallel processing. The results from the processors are then aggregated to obtain the final result. This makes it possible to process large volumes of unstructured data, and it improves both the variety of data that can be handled and the processing speed. Moreover, because the online public-opinion prediction model is obtained through a back-propagation neural network, the patterns of change in online public-opinion data can be mined in depth, so that online public opinion can be monitored effectively and accurately.
It should be understood that those of ordinary skill in the art may make various other corresponding changes and modifications based on the technical concept of the present invention, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims (7)

1. A public opinion monitoring system based on data mining technology, characterized in that it comprises the following units:
Data acquisition unit, for crawling raw Internet public-opinion data with a web crawler;
Sharding unit, for dividing the raw Internet public-opinion data into input splits and assigning one map task to each input split, each input split storing the split length and an array recording the locations of its data;
and for performing mapping on the data storage nodes with a pre-written map function to obtain intermediate files;
Computing unit, for merging duplicate keys in the intermediate files to reduce redundancy in the map output files, and for serializing the combined key-value pairs to obtain map cache files; the computing unit automatically obtains the computational load value of each computing node and assigns each map cache file to a computing node according to those load values;
Buffer unit, for allocating a circular memory buffer in memory, the circular memory buffer being used to output the map output files; a configuration file created in the circular memory buffer sets the memory-occupancy threshold of the buffer; when memory occupancy in the circular memory buffer is greater than or equal to the occupancy threshold, a guard thread suspends writing data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
Output unit, for storing all map output files in a distributed file storage system;
Modeling unit, for building an online public-opinion prediction model;
Prediction unit, for reading the map output files from the distributed file storage system and performing public-opinion prediction with the online public-opinion prediction model.
2. The public opinion monitoring system based on data mining technology according to claim 1, characterized in that
the data acquisition unit is configured to:
Take link addresses from a custom crawl list with the web crawler and obtain the web text;
Probe deep-web data sources in the web pages, remove data noise, extract the body text, and perform topic-relevance determination.
3. The public opinion monitoring system based on data mining technology according to claim 2, characterized in that dividing the raw Internet public-opinion data into input splits in the sharding unit comprises:
Establishing an association table; splitting each input file into a position relation value, an activity relation value, a structural relation value, a function relation value, a functional relation value, a behavior relation value, and other relation values; and writing the correspondence of each relation value of each input file into the association table;
Assigning the data corresponding to each relation value to an input split.
4. The public opinion monitoring system based on data mining technology according to claim 3, characterized in that performing mapping on the data storage nodes with the pre-written map function in the sharding unit to obtain intermediate files comprises:
Mapping the input splits according to the map tasks with the pre-written map function. The mapping comprises: aligning the content of each input split into a list according to a preset data format; determining whether the position, activity, structural, function, functional, behavior, and other relation values exist; retaining each relation value that exists as-is, and setting any missing relation value (one or several) to empty; and keeping the order of the relations consistent across all entries.
5. The public opinion monitoring system based on data mining technology according to claim 4, characterized in that
the output unit is configured to:
Query the association table for all index information corresponding to each map output file, and insert each data segment corresponding to each map output file into a list; for each segment, record its position, activity, structural, function, functional, behavior, and other relation values.
6. The public opinion monitoring system based on data mining technology according to claim 5, characterized in that
when the sharding unit maps the input splits according to the map tasks with the pre-written map function, it further determines, according to the association table, whether an input split contains a logical error and, if so, discards that input split.
7. The public opinion monitoring system based on data mining technology according to claim 6, characterized in that
the modeling unit is configured to:
Construct all map output files with a clustering algorithm to form ordered online public-opinion data;
Apply grey accumulation to the ordered online public-opinion data to generate an accumulated sequence:
$$x^{(1)} = [x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)],$$ where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$ for $k = 1, 2, \ldots, n$, and $x^{(0)}$ is the ordered online public-opinion data sequence;
Scale the generated accumulated sequence into the interval [0, 1] by normalization, using $$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)},$$ where $x_i$ and $x_i'$ are the values before and after transformation, and $\min(x)$ and $\max(x)$ are the minimum and maximum of the ordered online public-opinion data;
Build an online public-opinion grey model, predict the pre-input samples, and apply inverse accumulation (restoration) to the predicted values to obtain online public-opinion predicted values;
Compute the residuals between the online public-opinion predicted values and the actual values to obtain residual training samples;
Train a back-propagation neural network on the residual training samples and optimize it with a particle swarm optimization algorithm to obtain the online public-opinion prediction model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610507203.9A CN106202278B (en) 2016-07-01 2016-07-01 A public opinion monitoring system based on data mining technology


Publications (2)

Publication Number Publication Date
CN106202278A 2016-12-07
CN106202278B 2019-08-13

Family

ID=57463003


Country Status (1)

Country Link
CN (1) CN106202278B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5724573A (en) * 1995-12-22 1998-03-03 International Business Machines Corporation Method and system for mining quantitative association rules in large relational tables
CN104063230A (en) * 2014-07-09 2014-09-24 中国科学院重庆绿色智能技术研究院 Rough set parallel reduction method, device and system based on MapReduce

Non-Patent Citations (1)

Title
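杨光明子: "Design and Implementation of an APT Defense Platform Based on Intrusion Detection" (基于入侵检测的APT防御平台的设计与实现), China Master's Theses Full-text Database, Information Science and Technology Series (《中国优秀硕士学位论文全文数据库信息科技辑》) *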

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN106951475A (en) * 2017-03-07 2017-07-14 郑州铁路职业技术学院 Big data distributed approach and system based on cloud computing
CN107679133A (en) * 2017-09-22 2018-02-09 电子科技大学 Mining method applicable to massive real-time PMU data
CN107679133B (en) * 2017-09-22 2020-01-17 电子科技大学 Mining method applicable to massive real-time PMU data
WO2020042427A1 (en) * 2018-08-31 2020-03-05 平安科技(深圳)有限公司 Reconciliation method and apparatus based on data fragments, computer device, and storage medium
CN109471965A (en) * 2018-10-26 2019-03-15 四川才子软件信息网络有限公司 Big-data-based online public-opinion data collection and processing method and monitoring platform

Also Published As

Publication number Publication date
CN106202278B (en) 2019-08-13


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant