News public sentiment monitoring system
Technical field
The present invention relates to the field of Internet information processing technology, and in particular to a news public sentiment monitoring system.
Background
With the rapid development of the Internet worldwide, network media have come to be recognized as the "fourth media" after newspapers, radio and television, and the network has become one of the main carriers that reflect social public opinion. Network public opinion is the set of emotions, attitudes, opinions, remarks and views, often influential and clearly slanted, that the public holds about hot-spot and focal issues in real life and spreads over the Internet. It is mainly expressed and amplified through forum (BBS) posts and follow-up replies, blogs, and similar channels. Because the Internet is virtual, anonymous, diverse, pervasive and unconstrained, more and more netizens are willing to express their views and spread their ideas through this channel.
Network public opinion is a powerful force: it can act back on hot-spot events and influence social development and the course of events. Because the network is open, network public opinion can form quickly and have a large social impact. In particular, when negative online news public sentiment appears and is not understood in time and guided effectively, it can easily develop into a public opinion crisis and, in serious cases, even affect public safety. Actively defusing online news public opinion crises is of real practical importance for maintaining social stability and promoting national development, and is an inherent part of building a harmonious society. Collecting the viewpoints expressed in online news public sentiment is equally important: netizens' viewpoints play a vital role in the evolution of a hot-spot event and can even be regarded as the core of online news public sentiment.
Recently, with the rapid development of Internet technology, new media represented by online news media have broken the control and monopoly over information. People now freely express their own attitudes and opinions on the network and no longer accept information as unconditionally as in the past; instead, the interests of different social strata are voiced one after another, and different ideas and viewpoints collide head-on. For the relevant government departments, perceiving online news public sentiment promptly and accurately, and strengthening the timely monitoring and effective guidance of online news public opinion, has become a major difficulty in managing online news public sentiment. Under these circumstances, it is highly necessary to build a news public sentiment monitoring system that covers news data sources. Such a system can be tailored to the new news-media communication environment, support further study of hot-topic analysis methods for news public sentiment and of the influence of new media, and enrich and improve research on news public sentiment.
Although many organizations have proposed different solutions for monitoring online news public sentiment, the technical problem that those skilled in the art still need to solve is how to improve the efficiency and accuracy of judging online news public sentiment information, because so far there is no network public opinion monitoring system that handles news media data efficiently and accurately.
Summary of the invention
The present invention addresses the shortcomings of the above background art and proposes a public sentiment monitoring system for news media that achieves higher accuracy. The object of the present invention is achieved by the following technical measures.
The present invention proposes a news public sentiment monitoring system comprising a news information acquisition module 1, a news data preprocessing module 2, a news public sentiment analysis module 3 and a news public sentiment result display module 4, wherein:
the news information acquisition module 1 collects news public sentiment information on the Internet to obtain news data;
the news data preprocessing module 2 removes useless information from the news data obtained by the news information acquisition module 1 and organizes the cleaned news data as necessary;
the news public sentiment analysis module 3 discovers news public sentiment hot spots in the news data organized by the news data preprocessing module 2, using a plurality of hot news discovery submodules;
the news public sentiment result display module 4 outputs the news public sentiment analysis results as charts or reports and provides user interaction.
Preferably, the news information acquisition module 1 automatically and concurrently collects multiple types of news public sentiment information according to a specified keyword, source URL or message subject, using a link-analysis-based search-engine web crawler and a multithreaded download queue; the multiple types of news public sentiment information include at least the text information and/or picture information of the news; and
the news data preprocessing module 2 includes a preliminary filtering submodule 2a, a text extraction submodule 2b, a word segmentation submodule 2c, a feature phrase filtering submodule 2d, a text sentiment orientation analysis submodule 2e, a picture analysis submodule 2f and a public sentiment heat acquisition submodule 2g.
Preferably, the preliminary filtering submodule 2a performs preliminary filtering on the news data to remove noise, processing each news item as follows:
Step 2a-1: for each news item, perform semantic analysis on its title and detect all news items on the network that are similar to it, obtaining the similar group of this news item; if no similar news item is found, the similar group of this news item consists of the item itself;
Step 2a-2: for each news item, divide the total number of news items in its similar group, counted at all locations on the network where they appear, by the total number of network addresses that publish the news items of the similar group, obtaining the spatial repetition value S1 of this news item;
Step 2a-3: for each news item, count the total number of news items in its similar group appearing on the network, obtaining the temporal repetition value S2 of this news item;
Step 2a-4: calculate the composite repetition value S of each news item from its spatial repetition value S1 and temporal repetition value S2, and compare S with a threshold; if S exceeds the threshold, filter out this news item and its similar group;
wherein the composite repetition value S is calculated by the following formula:
$S=\sqrt{\log_2(S_1+50)}+\sqrt{\log_2(S_2+20)}+\left(\lg S_1\cdot\lg S_2\right)^{1/4}$.
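The S1/S2/S calculation of steps 2a-2 to 2a-4 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes the similar groups of step 2a-1 have already been built (that step is not shown) and that each group is given as the list of URLs at which its members were found. Treating a "network address" as the URL host name, and the threshold of 5.0 in the example, are assumptions; the text does not give a threshold value.

```python
import math
from typing import Dict, List
from urllib.parse import urlparse


def spatial_repetition(urls: List[str]) -> float:
    """S1: occurrences of the similar group divided by the number of
    distinct publishing addresses (assumed here to be URL host names)."""
    sites = {urlparse(url).netloc for url in urls}
    return len(urls) / len(sites)


def temporal_repetition(urls: List[str]) -> float:
    """S2: total number of occurrences of the similar group on the network."""
    return float(len(urls))


def composite_repetition(s1: float, s2: float) -> float:
    """S = sqrt(log2(S1 + 50)) + sqrt(log2(S2 + 20)) + (lg S1 * lg S2)^(1/4).
    Here S1 >= 1 and S2 >= 1, so the product under the 4th root is never negative."""
    return (math.sqrt(math.log2(s1 + 50))
            + math.sqrt(math.log2(s2 + 20))
            + (math.log10(s1) * math.log10(s2)) ** 0.25)


def filter_groups(groups: Dict[str, List[str]], threshold: float) -> Dict[str, List[str]]:
    """Keep only the similar groups whose composite value S does not exceed the threshold."""
    kept = {}
    for group_id, urls in groups.items():
        s = composite_repetition(spatial_repetition(urls), temporal_repetition(urls))
        if s <= threshold:
            kept[group_id] = urls
    return kept


urls = ["http://a.com/1", "http://a.com/2", "http://b.com/x", "http://c.com/y"]
print(spatial_repetition(urls), temporal_repetition(urls))
print(filter_groups({"g1": ["http://a.com/1"], "g2": urls * 10}, threshold=5.0))
```

In the example, the single-item group g1 has S of about 4.48 and is kept, while g2 (40 occurrences on 3 hosts) reaches roughly 6.0 and is filtered out.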
Preferably, the text extraction submodule 2b extracts, from the news data processed by the preliminary filtering submodule 2a, the body-text information that is useful for news public sentiment analysis, restructures the body text, and gathers together news information that is representative of a topic;
the word segmentation submodule 2c performs word segmentation, stop-word filtering, named entity recognition, syntactic parsing, part-of-speech tagging, sentiment recognition, feature word extraction and feature phrase extraction on the news data processed by the text extraction submodule 2b, and builds a forward index and an inverted index; it also determines the sentiment orientation, topic attribute and sentiment attribute of each word from the word's grammatical, part-of-speech and sentiment attributes.
Preferably, the feature phrase filtering submodule 2d filters and screens the feature phrases in the news data processed by the word segmentation submodule 2c through the following steps:
Step 2d-1: deduplicate the feature phrases: record the repeated feature phrases that occur in the news text and the number of times each occurs, and filter out repeated feature phrases whose frequency of occurrence is below the repetition threshold, as well as repeated feature phrases whose length is below the repetition threshold;
Step 2d-2: group the feature phrases: calculate the similarity value between each feature phrase and every other feature phrase, and place feature phrases whose similarity value exceeds a similarity threshold in the same group; if the similarity values between a feature phrase and all other feature phrases are 0, filter out that feature phrase. Specifically, the similarity value Sims(X, Y) of two feature phrases X and Y may be calculated by any one of the following three steps, after which the feature phrases are grouped:
Step 2d-2-1:
First, the similarity value Sims(X, Y) of feature phrases X and Y is the number of identical characters shared by X and Y;
Second, if Sims(X, Y) > threshold TD1, feature phrase Y is placed in the group of feature phrase X;
Step 2d-2-2:
First, let sum(XY) be the number of sentences in which feature phrases X and Y both occur, sum(X) the number of sentences in which X occurs but Y does not, and sum(Y) the number of sentences in which Y occurs but X does not. The similarity value Sims(X, Y) of X and Y is then calculated as:
$\mathrm{Sims}(X,Y)=\dfrac{\log_2 \mathrm{sum}(XY)}{\log_2 \mathrm{sum}(X)}+\dfrac{\log_2 \mathrm{sum}(XY)}{\log_2 \mathrm{sum}(Y)}$;
Second, if Sims(X, Y) > threshold TD2, feature phrase Y is placed in the group of feature phrase X;
Step 2d-2-3:
First, let m and n be the numbers of characters in feature phrases X and Y, and let k = min(m, n). Let Xi and Yi denote the subphrases formed by the first i characters of X and Y respectively, where i = 1, 2, ..., k, and let |Xi-Yi| denote the number of characters in the longest common substring of Xi and Yi. The similarity value Sims(X, Y) of X and Y is then calculated as:
$\mathrm{Sims}(X,Y)=\left(|X_1-Y_1|^3+|X_2-Y_2|^3+\cdots+|X_k-Y_k|^3\right)^{1/3}$;
Second, if Sims(X, Y) > threshold TD3, feature phrase Y is placed in the group of feature phrase X;
Step 2d-3: perform entropy filtering on the feature phrases: calculate the entropy of each feature phrase and filter out feature phrases whose entropy is below a preset lower threshold or above a preset upper threshold.
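The three alternative similarity measures of step 2d-2, together with the grouping rule, can be sketched as follows. This is one illustrative reading, not the reference implementation. The following choices are assumptions:

* shared characters in step 2d-2-1 are counted with multiplicity;
* step 2d-2-2 returns 0 when sum(XY) = 0, and rejects counts that would make a log2 denominator zero;
* each phrase is greedily assigned to the first group whose first member plays the role of X.

```python
import math
from collections import Counter
from typing import Callable, List


def sims_shared_chars(x: str, y: str) -> int:
    """Step 2d-2-1: number of identical characters in X and Y (multiset intersection)."""
    return sum((Counter(x) & Counter(y)).values())


def sims_cooccurrence(sum_xy: int, sum_x: int, sum_y: int) -> float:
    """Step 2d-2-2: log2 sum(XY) / log2 sum(X) + log2 sum(XY) / log2 sum(Y)."""
    if sum_xy == 0:
        return 0.0  # the two phrases never occur in the same sentence
    if sum_x < 2 or sum_y < 2:
        raise ValueError("sum(X) and sum(Y) must be >= 2 so that log2 is non-zero")
    return math.log2(sum_xy) / math.log2(sum_x) + math.log2(sum_xy) / math.log2(sum_y)


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest common contiguous substring (dynamic programming)."""
    best, prev = 0, [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def sims_prefix_lcs(x: str, y: str) -> float:
    """Step 2d-2-3: (sum over i = 1..k of |Xi - Yi|^3)^(1/3), Xi/Yi = first i characters."""
    k = min(len(x), len(y))
    total = sum(longest_common_substring(x[:i], y[:i]) ** 3 for i in range(1, k + 1))
    return total ** (1 / 3)


def group_phrases(phrases: List[str], sims: Callable[[str, str], float],
                  threshold: float) -> List[List[str]]:
    """Drop phrases with zero similarity to all others, then put Y into the
    group of X (the group's first member) when Sims(X, Y) > threshold."""
    kept = [p for p in phrases if any(sims(p, q) > 0 for q in phrases if q != p)]
    groups: List[List[str]] = []
    for y in kept:
        for group in groups:
            if sims(group[0], y) > threshold:
                group.append(y)
                break
        else:  # no existing group is similar enough: start a new one
            groups.append([y])
    return groups


print(group_phrases(["news", "newsroom", "cat"], sims_shared_chars, threshold=2))
```

`group_phrases` accepts any two-argument measure. The co-occurrence measure of step 2d-2-2 would therefore need a small wrapper that looks up the sentence counts for each phrase pair.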
Preferably, the text sentiment orientation analysis submodule 2e analyzes the sentiment orientation of the news text through the following steps:
Step 2e-1: manually select a number of common sentiment-bearing Chinese and English adjectives, nouns and verbs as the initial seed set; the initial seed set may contain 50 adjectives and 150 nouns and verbs;
Step 2e-2: resolve every pronoun that has a coreference relation in the news text to its original nominal referent, so that no object is missed or misjudged during analysis;
Step 2e-3: taking each sentence of the news as a unit, use part-of-speech (POS) tagging and semantic role labeling (SRL) to analyze the constituents of each sentence and extract its subjective words;
Step 2e-4: input the subjective words of each sentence in order and automatically label their sentiment orientation according to the seed set; a subjective word that cannot be labeled automatically is added to the seed set after its sentiment orientation has been judged manually.
Preferably, the picture analysis submodule 2f extracts and represents the visual features of pictures in the news data; the visual features of a picture include color features, Tamura texture features and shape features;
the color features are represented by color histograms in the HSV, Luv and Lab color spaces;
the Tamura texture features include the coarseness, contrast and directionality of the picture;
the shape features include Fourier descriptors, obtained by applying a Fourier transform to shape signatures (the curvature function, centroid distance and complex coordinate function) computed from the coordinates of all pixels on the boundary contour of an object in the picture.
Preferably, the public sentiment heat acquisition submodule 2g calculates the public sentiment heat weight ρ of the news; if ρ is greater than a preset threshold Tρ, the news is used as a data source and analysis basis for public sentiment analysis. Specifically:
let K1 be the number of views, K2 the number of comments, K3 the number of replies, K4 the number of "support" clicks, K5 the number of "oppose" clicks, K6 the number of shares and K7 the number of favorites, and let ξ1 to ξ4 be preset, adjustable coefficients; then
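$\rho=\left(\lg\left(K_1^{3/4}\right)+0.03\right)\xi_1+\left(\lg\left(K_2^{2/3}+K_3^{2/3}\right)+0.02\right)\xi_2+\left(\lg\left(K_4^{1/2}+K_5^{1/2}\right)+0.01\right)\xi_3+\left(\lg\left(K_6^{1/3}+K_7^{1/3}\right)+0.005\right)\xi_4$;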
where ξ1 to ξ4 may be set to ξ1 = 0.5, ξ2 = 0.3, ξ3 = 0.2 and ξ4 = 0.1.
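A direct transcription of the heat weight ρ is sketched below. It uses the example coefficients ξ1 to ξ4 given above. Two points are assumptions: the garbled first term is read as lg(K1^(3/4)), and a zero count is left as an error because the logarithm is undefined there. The text does not give a value for the threshold Tρ, so `t_rho` is a free parameter.

```python
import math
from typing import Sequence

XI_DEFAULT = (0.5, 0.3, 0.2, 0.1)  # example values of xi1..xi4 from the text


def heat_weight(k: Sequence[float], xi: Sequence[float] = XI_DEFAULT) -> float:
    """rho for counts K1..K7 = (views, comments, replies, supports, opposes,
    shares, favorites). math.log10 raises ValueError if a log argument is 0,
    where the formula is undefined."""
    k1, k2, k3, k4, k5, k6, k7 = k
    lg = math.log10
    return ((lg(k1 ** (3 / 4)) + 0.03) * xi[0]
            + (lg(k2 ** (2 / 3) + k3 ** (2 / 3)) + 0.02) * xi[1]
            + (lg(k4 ** (1 / 2) + k5 ** (1 / 2)) + 0.01) * xi[2]
            + (lg(k6 ** (1 / 3) + k7 ** (1 / 3)) + 0.005) * xi[3])


def is_hot(k: Sequence[float], t_rho: float) -> bool:
    """Use the news item as an analysis source only if rho > T_rho."""
    return heat_weight(k) > t_rho


print(heat_weight((10000, 1, 1, 1, 1, 1, 1)))
```

With 10,000 views and a single comment, reply, support, oppose, share and favorite, ρ comes to about 1.70. The view-count term dominates because ξ1 is the largest coefficient.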
Preferably, the news public sentiment analysis module 3 analyzes the news data and discovers news public sentiment hot spots through the following steps:
First, a plurality of hot news discovery submodules obtain news public sentiment hot spots through parallel distributed computing; the hot news discovery submodules include:
1) a Single-Pass hot news discovery submodule 3.1, which uses a MapReduce-based Single-Pass algorithm;
2) a KNN hot news discovery submodule 3.2, which uses a MapReduce-based k-nearest-neighbor (KNN) classification algorithm;
3) an SVM hot news discovery submodule 3.3, which uses a MapReduce-based support vector machine (SVM) algorithm;
4) a K-means hot news discovery submodule 3.4, which uses a MapReduce-based k-means clustering algorithm; and
5) a SOM hot news discovery submodule 3.5, which uses a MapReduce-based self-organizing map (SOM) neural network clustering algorithm;
Second, all news public sentiment hot spots obtained by the above hot news discovery submodules are pooled and classified as follows:
if a news public sentiment hot spot is found by three or more of the hot news discovery submodules, it is labeled a senior news public sentiment hot spot;
if a news public sentiment hot spot is found by two of the hot news discovery submodules, it is labeled an intermediate news public sentiment hot spot;
if a news public sentiment hot spot is found by only one hot news discovery submodule, it is labeled a primary news public sentiment hot spot;
Finally, the senior, intermediate and primary news public sentiment hot spots are sent, in that order, to the news public sentiment result display module 4.
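The senior/intermediate/primary grading is a vote count across the five submodules, sketched below. The sketch assumes that hot spots found by different algorithms have already been matched to a shared topic key. How the original system matches topics across algorithms is not stated, and the submodule names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def grade_hot_spots(results: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Pool the hot spots from all submodules and grade them by how many
    submodules found each one (>= 3 senior, 2 intermediate, 1 primary)."""
    votes: Counter = Counter()
    for hot_spots in results.values():
        votes.update(set(hot_spots))  # each submodule votes at most once per hot spot
    graded: Dict[str, List[str]] = {"senior": [], "intermediate": [], "primary": []}
    for spot, n in sorted(votes.items()):
        if n >= 3:
            graded["senior"].append(spot)
        elif n == 2:
            graded["intermediate"].append(spot)
        else:
            graded["primary"].append(spot)
    return graded  # key order = order in which the grades are sent to module 4


results = {"single_pass": ["t1", "t2"], "knn": ["t1", "t2"], "svm": ["t1"],
           "kmeans": ["t3"], "som": []}
print(grade_hot_spots(results))
```

A hot spot found by many independent algorithms is less likely to be an artifact of any single method. That is the rationale for ranking it higher.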
Preferably, the news public sentiment result display module 4 is based on the J2EE framework and can generate a news public sentiment heat ranking report, a news public sentiment warning distribution report, a news public sentiment geographic distribution report, a news public sentiment sentiment-analysis report, a news public sentiment statistics report and a news public sentiment trend analysis chart.
In the prior art, the main data sources for network public opinion are usually various websites or forums, and few monitoring systems are dedicated to news public sentiment data; even systems designed specifically for news public sentiment data often have low accuracy or efficiency for various reasons. The present invention proposes a public sentiment data monitoring system designed specifically for news network data sources.
Compared with the prior art, the present invention has the following advantages:
First, the news public sentiment monitoring system of the present invention targets news network resources. The collected news data pass through data preprocessing steps such as preliminary filtering, text extraction, word segmentation, feature phrase filtering, text sentiment orientation analysis, picture analysis and public sentiment heat acquisition, which improves the efficiency of filtering news public sentiment data from news network data sources;
Second, distributed cloud computing allows large-scale collected data to be mined and analyzed. News public sentiment hot spots are obtained with several news public sentiment monitoring algorithm modules and then comprehensively judged and classified. This enables the discovery and tracking of hot news topics and the social network analysis of news, with visual presentation of the analysis results. It provides automated, systematic and scientific information support that helps Party and government agencies, large enterprises and other organizations to discover news leads in time, grasp news public sentiment hot spots and trends, and respond to news public sentiment crises. It also improves the accuracy of the judgments made by the news public sentiment monitoring system and provides a more reliable and accurate basis for the subsequent processing of online news public sentiment information.
Brief description of the drawings
The technical solution is described further below with reference to the accompanying drawings, in which identical reference numerals denote identical functional modules. The drawings serve only to illustrate preferred embodiments and are not to be regarded as limiting the invention.
Fig. 1 shows the functional structure chart of news public sentiment monitoring system according to an embodiment of the invention.
Fig. 2 shows the functional structure chart of news data pretreatment module according to an embodiment of the invention.
Detailed description of the embodiments
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The above description is only an overview of the technical solution of the present invention. It is provided so that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features and advantages of the invention become more apparent.
Exemplary embodiments of the disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention will be understood more thoroughly and the scope of the disclosure will be fully conveyed to those skilled in the art.
The present invention claims a news public sentiment monitoring system comprising a news information acquisition module, a news data preprocessing module, a news public sentiment analysis module and a news public sentiment result display module. The news data preprocessing module includes a preliminary filtering submodule, a text extraction submodule, a word segmentation submodule, a feature phrase filtering submodule, a text sentiment orientation analysis submodule, a picture analysis submodule and a public sentiment heat acquisition submodule. Using distributed cloud computing, several news public sentiment monitoring algorithm submodules obtain news public sentiment hot spots from the data preprocessed by the news data preprocessing module. The hot spots are then comprehensively judged, classified and assessed, which makes the monitoring of hot news topics more efficient and accurate.
Fig. 1 is the functional structure chart of news public sentiment monitoring system according to an embodiment of the invention.
As shown in Fig. 1, the news public sentiment monitoring system includes four modules: a news information acquisition module 1, a news data preprocessing module 2, a news public sentiment analysis module 3 and a news public sentiment result display module 4, wherein:
the news information acquisition module 1 collects news public sentiment information on the Internet to obtain news data;
the news data preprocessing module 2 removes useless information from the news data obtained by the news information acquisition module 1 and organizes the cleaned news data as necessary;
the news public sentiment analysis module 3 uses the hot spot discovery submodules to discover public sentiment hot spots in the information organized by the news data preprocessing module 2;
the news public sentiment result display module 4 outputs the news public sentiment analysis results as charts or reports and provides user interaction.
Specifically:
The news information acquisition module 1 automatically and concurrently collects multiple types of news public sentiment information according to a specified keyword, source URL or message subject, using a link-analysis-based search-engine web crawler and a multithreaded download queue; the multiple types of news public sentiment information include at least text information and/or picture information.
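The multithreaded download queue can be sketched with Python's standard `queue` and `threading` modules. This is a hedged illustration under several assumptions:

* `fetch` is a hypothetical callable returning a page's text and outgoing links, which lets the example run without network access; in practice it would wrap an HTTP client;
* a plain substring match of the keyword stands in for the keyword/URL/subject criteria;
* link-analysis ranking of the queue is not shown.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Fetch = Callable[[str], Tuple[str, List[str]]]  # url -> (page text, outgoing links)


def crawl(seeds: Iterable[str], fetch: Fetch, keyword: str,
          n_workers: int = 4, max_pages: int = 100) -> Dict[str, str]:
    """Crawl from the seed URLs with a shared queue served by worker threads.
    Returns {url: text} for every fetched page that contains `keyword`."""
    todo: "queue.Queue[Optional[str]]" = queue.Queue()
    seen: Set[str] = set()
    hits: Dict[str, str] = {}
    lock = threading.Lock()

    def enqueue(url: str) -> None:
        with lock:  # `seen` is shared between threads
            if url in seen or len(seen) >= max_pages:
                return
            seen.add(url)
        todo.put(url)

    def worker() -> None:
        while True:
            url = todo.get()
            try:
                if url is None:  # sentinel: no more work
                    return
                try:
                    text, links = fetch(url)
                except Exception:
                    continue  # skip pages that fail to download
                if keyword in text:
                    with lock:
                        hits[url] = text
                for link in links:  # follow outgoing links
                    enqueue(link)
            finally:
                todo.task_done()  # runs on return, continue and normal exit

    for seed in seeds:
        enqueue(seed)
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for t in threads:
        t.start()
    todo.join()  # wait until every queued URL has been processed
    for _ in threads:
        todo.put(None)
    for t in threads:
        t.join()
    return hits


web = {"a": ("flood news", ["b", "c"]), "b": ("sports", ["a", "d"]),
       "c": ("flood update", []), "d": ("weather flood", ["x"])}
print(sorted(crawl(["a"], lambda url: web[url], keyword="flood")))
```

New links are queued before the current item is marked done. As a result, `todo.join()` cannot return while a worker is still expanding a page.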
Fig. 2 is the functional structure chart of news data pretreatment module according to an embodiment of the invention.
As shown in Fig. 2, the news data preprocessing module 2 includes a preliminary filtering submodule 2a, a text extraction submodule 2b, a word segmentation submodule 2c, a feature phrase filtering submodule 2d, a text sentiment orientation analysis submodule 2e, a picture analysis submodule 2f and a public sentiment heat acquisition submodule 2g.
Specifically:
The preliminary filtering submodule 2a performs preliminary filtering on the news data to remove noise, processing each news item as follows:
Step 2a-1: for each news item, perform semantic analysis on its title and detect all news items on the network that are similar to it, obtaining the similar group of this news item; if no similar news item is found, the similar group of this news item consists of the item itself;
Step 2a-2: for each news item, divide the total number of news items in its similar group, counted at all locations on the network where they appear, by the total number of network addresses that publish the news items of the similar group, obtaining the spatial repetition value S1 of this news item;
Step 2a-3: for each news item, count the total number of news items in its similar group appearing on the network, obtaining the temporal repetition value S2 of this news item;
Step 2a-4: calculate the composite repetition value S of each news item from its spatial repetition value S1 and temporal repetition value S2, and compare S with a threshold; if S exceeds the threshold, filter out this news item and its similar group;
wherein the composite repetition value S is calculated by the following formula:
$S=\sqrt{\log_2(S_1+50)}+\sqrt{\log_2(S_2+20)}+\left(\lg S_1\cdot\lg S_2\right)^{1/4}$.
Specifically:
The text extraction submodule 2b extracts, from the news data processed by the preliminary filtering submodule 2a, the body-text information that is useful for news public sentiment analysis, restructures the body text, and gathers together news information that is representative of a topic;
The word segmentation submodule 2c performs word segmentation, stop-word filtering, named entity recognition, syntactic parsing, part-of-speech tagging, sentiment recognition, feature word extraction and feature phrase extraction on the news data processed by the text extraction submodule 2b, and builds a forward index and an inverted index; it also determines the sentiment orientation, topic attribute and sentiment attribute of each word from the word's grammatical, part-of-speech and sentiment attributes.
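The forward and inverted indexes built by submodule 2c can be sketched as plain dictionaries. The sketch assumes that word segmentation has already produced a token list per news item, and the stop-word set in the example is hypothetical. A forward index maps each news item to its terms. An inverted index maps each term to the news items, and the positions within them, where it occurs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Forward = Dict[str, List[str]]
Inverted = Dict[str, Dict[str, List[int]]]


def build_indexes(docs: Dict[str, List[str]], stop_words: Set[str]) -> Tuple[Forward, Inverted]:
    """Forward index: doc id -> terms; inverted index: term -> {doc id: positions}."""
    forward: Forward = {}
    inverted: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
    for doc_id, tokens in docs.items():
        terms = [t for t in tokens if t not in stop_words]  # stop-word filtering
        forward[doc_id] = terms
        for pos, term in enumerate(terms):
            inverted[term].setdefault(doc_id, []).append(pos)
    return forward, dict(inverted)


docs = {"d1": ["flood", "in", "city"], "d2": ["city", "council", "flood", "flood"]}
forward, inverted = build_indexes(docs, stop_words={"in"})
print(forward)
print(inverted["flood"])
```

The inverted index answers "which news items mention this phrase" without scanning every document. The forward index supports per-item processing such as feature extraction.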
Specifically:
The feature phrase filtering submodule 2d filters and screens the feature phrases in the news data processed by the word segmentation submodule 2c through the following steps:
Step 2d-1: deduplicate the feature phrases: record the repeated feature phrases that occur in the news text and the number of times each occurs, and filter out repeated feature phrases whose frequency of occurrence is below the repetition threshold, as well as repeated feature phrases whose length is below the repetition threshold;
Step 2d-2: group the feature phrases: calculate the similarity value between each feature phrase and every other feature phrase, and place feature phrases whose similarity value exceeds a similarity threshold in the same group; if the similarity values between a feature phrase and all other feature phrases are 0, filter out that feature phrase. Specifically, the similarity value Sims(X, Y) of two feature phrases X and Y may be calculated by any one of the following three steps, after which the feature phrases are grouped:
Step 2d-2-1:
First, the similarity value Sims(X, Y) of feature phrases X and Y is the number of identical characters shared by X and Y;
Second, if Sims(X, Y) > threshold TD1, feature phrase Y is placed in the group of feature phrase X;
Step 2d-2-2:
First, let sum(XY) be the number of sentences in which feature phrases X and Y both occur, sum(X) the number of sentences in which X occurs but Y does not, and sum(Y) the number of sentences in which Y occurs but X does not. The similarity value Sims(X, Y) of X and Y is then calculated as:
$\mathrm{Sims}(X,Y)=\dfrac{\log_2 \mathrm{sum}(XY)}{\log_2 \mathrm{sum}(X)}+\dfrac{\log_2 \mathrm{sum}(XY)}{\log_2 \mathrm{sum}(Y)}$;
Second, if Sims(X, Y) > threshold TD2, feature phrase Y is placed in the group of feature phrase X;
Step 2d-2-3:
First, let m and n be the numbers of characters in feature phrases X and Y, and let k = min(m, n). Let Xi and Yi denote the subphrases formed by the first i characters of X and Y respectively, where i = 1, 2, ..., k, and let |Xi-Yi| denote the number of characters in the longest common substring of Xi and Yi. The similarity value Sims(X, Y) of X and Y is then calculated as:
$\mathrm{Sims}(X,Y)=\left(|X_1-Y_1|^3+|X_2-Y_2|^3+\cdots+|X_k-Y_k|^3\right)^{1/3}$;
Second, if Sims(X, Y) > threshold TD3, feature phrase Y is placed in the group of feature phrase X;
Step 2d-3: perform entropy filtering on the feature phrases: calculate the entropy of each feature phrase and filter out feature phrases whose entropy is below a preset lower threshold or above a preset upper threshold.
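Steps 2d-1 and 2d-3 can be sketched as follows. The text does not say how the entropy of a feature phrase is defined, so this sketch assumes the Shannon entropy of the phrase's occurrence counts across news items. Under that assumption, a phrase confined to one item has entropy 0, and a phrase spread evenly over many items has high entropy. The text also appears to use one repetition threshold for both the frequency test and the length test in step 2d-1; the separate `min_count` and `min_len` parameters are an assumption and can be set equal.

```python
import math
from collections import Counter
from typing import Dict, List


def dedup_phrases(phrases: List[str], min_count: int, min_len: int) -> Counter:
    """Step 2d-1: count each phrase, then drop those that occur fewer than
    `min_count` times or are shorter than `min_len` characters."""
    counts = Counter(phrases)
    return Counter({p: c for p, c in counts.items() if c >= min_count and len(p) >= min_len})


def phrase_entropy(per_doc_counts: List[int]) -> float:
    """Shannon entropy (bits) of a phrase's occurrence distribution over documents."""
    total = sum(per_doc_counts)
    return -sum((c / total) * math.log2(c / total) for c in per_doc_counts if c > 0)


def entropy_filter(doc_counts: Dict[str, List[int]], low: float, high: float) -> List[str]:
    """Step 2d-3: keep phrases whose entropy lies within [low, high]."""
    return [p for p, counts in doc_counts.items() if low <= phrase_entropy(counts) <= high]


print(dedup_phrases(["flood", "flood", "in", "rain"], min_count=2, min_len=2))
print(entropy_filter({"x": [5], "y": [1, 1], "z": [1, 1, 1, 1]}, low=0.5, high=1.5))
```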
Specifically:
The text sentiment orientation analysis submodule 2e analyzes the sentiment orientation of the news text through the following steps:
Step 2e-1: manually select a number of common sentiment-bearing Chinese and English adjectives, nouns and verbs as the initial seed set; the initial seed set may contain 50 adjectives and 150 nouns and verbs;
Step 2e-2: resolve every pronoun that has a coreference relation in the news text to its original nominal referent, so that no object is missed or misjudged during analysis;
Step 2e-3: taking each sentence of the news as a unit, use part-of-speech (POS) tagging and semantic role labeling (SRL) to analyze the constituents of each sentence and extract its subjective words;
Step 2e-4: input the subjective words of each sentence in order and automatically label their sentiment orientation according to the seed set; a subjective word that cannot be labeled automatically is added to the seed set after its sentiment orientation has been judged manually.
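The seed-set loop of step 2e-4 can be sketched as follows. The sketch makes three assumptions:

* the subjective words have already been extracted by steps 2e-2 and 2e-3 (not shown);
* orientation is encoded as +1 (positive) or -1 (negative);
* automatic labeling is reduced to a seed-set lookup, although the real system may well do more, for example comparing a new word with the seeds.

The `judge` callable is hypothetical and stands in for the human annotator.

```python
from typing import Callable, Dict, List


def label_sentences(sentences: List[List[str]], seeds: Dict[str, int],
                    judge: Callable[[str], int]) -> List[List[int]]:
    """Label each subjective word +1 or -1. Seed words are labeled automatically;
    any other word goes to `judge` and is then added to the seed set."""
    labels: List[List[int]] = []
    for words in sentences:           # sentences in order
        row: List[int] = []
        for word in words:            # subjective words of the sentence
            if word not in seeds:
                seeds[word] = judge(word)  # manual judgment, then grow the seed set
            row.append(seeds[word])
        labels.append(row)
    return labels


seeds = {"good": 1, "bad": -1}
asked: List[str] = []


def annotator(word: str) -> int:
    asked.append(word)  # record which words needed a human decision
    return -1


print(label_sentences([["good", "awful"], ["awful", "bad"]], seeds, annotator))
print(asked)
```

Because judged words join the seed set, each new word is sent for manual judgment only once. Here "awful" is asked about in the first sentence and labeled automatically in the second.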
Specifically:
The picture analysis submodule 2f extracts and represents the visual features of pictures in the news data; the visual features of a picture include color features, Tamura texture features and shape features;
the color features are represented by color histograms in the HSV, Luv and Lab color spaces;
the most important Tamura texture features are the coarseness, contrast and directionality of the picture, which are especially useful for picture retrieval;
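Of the three Tamura features, contrast has the simplest closed form. In the commonly used definition it is F_con = σ / α4^(1/4), where σ is the standard deviation of the grey levels and α4 = μ4 / σ^4 is their kurtosis (μ4 being the fourth central moment). A minimal sketch, assuming the picture has already been converted to a flat list of grey values; coarseness and directionality are not shown.

```python
import math
from typing import Sequence


def tamura_contrast(gray: Sequence[float]) -> float:
    """Tamura contrast sigma / kurtosis^(1/4) of a list of grey levels."""
    n = len(gray)
    mean = sum(gray) / n
    var = sum((g - mean) ** 2 for g in gray) / n
    if var == 0:
        return 0.0  # a flat image has no contrast
    mu4 = sum((g - mean) ** 4 for g in gray) / n
    kurtosis = mu4 / var ** 2
    return math.sqrt(var) / kurtosis ** 0.25


print(tamura_contrast([0, 255] * 8))  # half black, half white
print(tamura_contrast([128] * 16))   # uniform grey
```

Dividing by the fourth root of the kurtosis penalizes distributions dominated by a few extreme values. For that reason, contrast is higher for a picture split evenly between dark and bright areas.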
For the shape features, the system of the present invention uses Fourier shape descriptors. The basic idea is to compute shape signatures (the curvature function, centroid distance and complex coordinate function) from the coordinates of all pixels on the boundary contour of an object in the picture, and to apply a Fourier transform to them.
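The Fourier shape descriptor can be illustrated with the centroid-distance signature. This minimal sketch assumes the boundary is given as an ordered list of (x, y) pixel coordinates. It computes a direct O(n²) discrete Fourier transform so that only the standard library is needed. The curvature and complex-coordinate signatures are not shown.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid_distance(boundary: Sequence[Point]) -> List[float]:
    """Distance of each boundary pixel from the shape's centroid
    (unchanged by translation and rotation of the shape)."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    return [math.hypot(x - cx, y - cy) for x, y in boundary]


def dft_magnitudes(signal: Sequence[float]) -> List[float]:
    """|F_k| of the discrete Fourier transform, computed directly in O(n^2)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def fourier_descriptors(boundary: Sequence[Point], n_coeffs: int = 8) -> List[float]:
    """|F_1|/|F_0| .. |F_n|/|F_0|: magnitudes ignore the starting point on the
    contour, and dividing by |F_0| removes the overall scale."""
    mags = dft_magnitudes(centroid_distance(boundary))
    return [m / mags[0] for m in mags[1:n_coeffs + 1]]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
big = [(3 * x + 10, 3 * y - 4) for x, y in square]  # scaled and shifted copy
print(fourier_descriptors(square))
print(fourier_descriptors(big))
```

The square and its scaled, shifted copy yield the same descriptors up to rounding. This invariance is what makes such descriptors suitable for comparing object shapes across pictures.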
Specifically:
The public sentiment heat acquisition submodule 2g calculates the public sentiment heat weight ρ of the news; if ρ is greater than a preset threshold Tρ, the news is used as a data source and analysis basis for public sentiment analysis. Specifically:
let K1 be the number of views, K2 the number of comments, K3 the number of replies, K4 the number of "support" clicks, K5 the number of "oppose" clicks, K6 the number of shares and K7 the number of favorites, and let ξ1 to ξ4 be preset, adjustable coefficients; then
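$\rho=\left(\lg\left(K_1^{3/4}\right)+0.03\right)\xi_1+\left(\lg\left(K_2^{2/3}+K_3^{2/3}\right)+0.02\right)\xi_2+\left(\lg\left(K_4^{1/2}+K_5^{1/2}\right)+0.01\right)\xi_3+\left(\lg\left(K_6^{1/3}+K_7^{1/3}\right)+0.005\right)\xi_4$;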
where ξ1 to ξ4 may be set to ξ1 = 0.5, ξ2 = 0.3, ξ3 = 0.2 and ξ4 = 0.1.
The news public sentiment analysis module 3 analyzes the data processed by the news data preprocessing module 2 to discover news public sentiment hot spots. Specifically:
The present invention uses distributed cloud computing to mine and analyze large-scale collected news data. News public sentiment hot spots are obtained with several public sentiment monitoring algorithm modules and then comprehensively judged and classified. This enables the discovery and tracking of hot news topics and the social network analysis of news, with visual presentation of the analysis results. It provides automated, systematic and scientific information support that helps Party and government agencies, large enterprises and other organizations to discover news leads in time, grasp news public sentiment hot spots and trends, and respond to news public sentiment crises. It also improves the accuracy of the judgments made by the news public sentiment monitoring system and provides a more reliable and accurate basis for the subsequent processing of online news public sentiment information. Specifically:
The collected news data and the analysis results are stored in a distributed storage layer implemented on HDFS.
In the distributed computing layer, parallel computation is implemented with the MapReduce parallel computing method.
Through HDFS file storage and transmission optimization and MapReduce parallel computing optimization, the system optimizes the monitoring of massive volumes of news public sentiment data, achieves stable and efficient big data storage, and optimizes query processing over massive news public sentiment data, with good scalability, reliability and security. Because the system is built on a cloud platform, it offers good response speed and supports analysis and mining services over massive news data.
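To make the MapReduce computation model concrete, the following single-process Python sketch simulates the map, shuffle and reduce phases on a term-frequency count over news texts. It only illustrates the programming model: it does not use the Hadoop or HDFS APIs, and both the choice of term counting as the job and the names `map_news`, `reduce_counts` and `run_mapreduce` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (news id, news text)


def map_news(news_id: str, text: str) -> Iterable[Tuple[str, int]]:
    """Map task: emit (term, 1) for every whitespace-separated term."""
    for term in text.lower().split():
        yield term, 1


def reduce_counts(term: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce task: sum the partial counts collected for one term."""
    return term, sum(counts)


def run_mapreduce(records: Iterable[Record],
                  mapper: Callable[[str, str], Iterable[Tuple[str, int]]],
                  reducer: Callable[[str, List[int]], Tuple[str, int]]
                  ) -> Dict[str, int]:
    # Map phase: on a Hadoop cluster each input split stored in HDFS would be
    # handled by a separate, parallel map task; here records run in a loop.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Shuffle: group intermediate values by key.
            grouped[out_key].append(out_value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(k, vs) for k, vs in sorted(grouped.items()))
```

The design point the sketch shows is that the mapper and reducer are pure functions of their inputs, which is what allows a framework such as Hadoop to distribute them across nodes without changing the result.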
The news public opinion analysis module 3 analyzes the news data processed by the news data preprocessing module 2 to find news public sentiment hotspots, in the following steps:
First, multiple hot news discovery submodules obtain news public sentiment hotspots through parallel distributed computing. The hot news discovery submodules include:
1) Single-Pass hot news discovery submodule 3.1, which uses a MapReduce-based Single-Pass algorithm;
2) KNN hot news discovery submodule 3.2, which uses a MapReduce-based k-nearest-neighbor (KNN) classification algorithm;
3) SVM hot news discovery submodule 3.3, which uses a MapReduce-based support vector machine (SVM) algorithm;
4) K-means hot news discovery submodule 3.4, which uses a MapReduce-based K-means clustering algorithm; and
5) SOM hot news discovery submodule 3.5, which uses a MapReduce-based self-organizing map (SOM) neural network clustering algorithm.
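Submodule 3.1 is described only as using a MapReduce-based Single-Pass algorithm. As a hedged illustration of the general Single-Pass clustering idea (sequential, not the MapReduce-parallel version), the sketch below assigns each document, in arrival order, to the most similar existing cluster if the cosine similarity reaches a threshold, and otherwise opens a new cluster. The sparse term-weight vectors, the cosine measure and the `threshold` parameter are assumptions; the source does not specify the document representation or similarity measure.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # assumed sparse term-weight vector, e.g. TF-IDF


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs: List[Vector], threshold: float) -> List[List[int]]:
    """Cluster documents in one pass; return member indices of each cluster."""
    sums: List[Vector] = []        # per-cluster sum of member vectors
    members: List[List[int]] = []  # per-cluster document indices
    for i, vec in enumerate(docs):
        # Cosine similarity is scale-invariant, so comparing against the
        # summed vector gives the same result as comparing with the centroid.
        sims = [cosine(vec, s) for s in sums]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(i)
            for t, w in vec.items():
                sums[best][t] = sums[best].get(t, 0.0) + w
        else:
            sums.append(dict(vec))
            members.append([i])
    return members
```

Clusters that accumulate many articles could then be treated as candidate hotspots; how submodule 3.1 turns clusters into hotspots is not specified in the source.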
Second, the news public sentiment hotspots obtained by each of the above hot news discovery submodules are aggregated and classified as follows:
If a news public sentiment hotspot is found by three or more of the hotspot discovery submodules, its category is marked as a senior news public sentiment hotspot;
If a news public sentiment hotspot is found by two of the hotspot discovery submodules, its category is marked as an intermediate news public sentiment hotspot;
If a news public sentiment hotspot is found by only one hotspot discovery submodule, its category is marked as a primary news public sentiment hotspot.
Finally, the senior, intermediate and primary news public sentiment hotspots are sent, in that order, to the news public sentiment result display module 4.
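The grading rule depends only on how many discovery submodules report the same hotspot. A minimal Python sketch follows. It assumes each submodule returns hotspot identifiers that are directly comparable across submodules (the source does not say how equivalent hotspots from different algorithms are matched), and the names `grade_hotspots` and `dispatch_order` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def grade_hotspots(found: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Turn {submodule name: hotspots it found} into graded hotspot lists."""
    votes: Counter = Counter()
    for hotspots in found.values():
        votes.update(set(hotspots))  # a submodule counts at most once per hotspot
    graded: Dict[str, List[str]] = {"senior": [], "intermediate": [], "primary": []}
    for hotspot, n in sorted(votes.items()):
        if n >= 3:
            graded["senior"].append(hotspot)        # three or more submodules
        elif n == 2:
            graded["intermediate"].append(hotspot)  # exactly two submodules
        else:
            graded["primary"].append(hotspot)       # a single submodule
    return graded


def dispatch_order(graded: Dict[str, List[str]]) -> List[str]:
    """Order in which hotspots are sent to the display module: senior first."""
    return graded["senior"] + graded["intermediate"] + graded["primary"]
```

With five submodules, a hotspot reported by Single-Pass, KNN and SVM is graded senior, one reported by two of them is intermediate, and one reported by K-means alone is primary.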
The algorithms used by the hotspot discovery submodules 3.1 to 3.5 are general-purpose algorithms commonly used in this field, so the improvement of the present invention does not lie in these algorithms themselves. Existing news public sentiment monitoring systems often use only one hotspot discovery algorithm, and no system has yet been found that uses the above hotspot discovery algorithms simultaneously and grades their combined results. Although the news public sentiment monitoring system of the present invention uses several hotspot discovery algorithms, its cloud-based distributed architecture means it does not incur prohibitive overhead, and combining the algorithms substantially improves the accuracy of the news public sentiment monitoring system, achieving a good technical effect.
Specifically:
The news public sentiment result display module 4 is based on the J2EE framework and can generate: a news public sentiment heat ranking report, a news public sentiment warning information distribution report, a news public sentiment geographic distribution report, a news public sentiment sentiment analysis report, news public sentiment statistical reports and a news public sentiment trend analysis chart.
The embodiments of the system and its component modules described in this specification are merely illustrative; some or all of the modules may be selected according to actual needs to achieve the objectives of the embodiments of the present invention. Those of ordinary skill in the art can understand and implement them without creative effort.
In summary, the above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the scope of the claims.