CN106227772A - A public opinion monitoring system based on semantic analysis - Google Patents

Publication number
CN106227772A
CN106227772A CN201610562032.XA CN201610562032A CN106227772A CN 106227772 A CN106227772 A CN 106227772A CN 201610562032 A CN201610562032 A CN 201610562032A CN 106227772 A CN106227772 A CN 106227772A
Authority
CN
China
Prior art keywords
net
module
information
class
subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610562032.XA
Other languages
Chinese (zh)
Inventor
党连坤
石晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HEFEI COMPASS ELECTRONIC TECHNOLOGY Co Ltd
Original Assignee
HEFEI COMPASS ELECTRONIC TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-07-15
Filing date
2016-07-15
Publication date
2016-12-14
Application filed by HEFEI COMPASS ELECTRONIC TECHNOLOGY Co Ltd
Priority to CN201610562032.XA
Publication of CN106227772A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a public opinion monitoring system based on semantic analysis, comprising a topic information retrieval module, a cache module, a net credibility preset module, a clustering module, a tendency computing module, and an evaluation output module. The invention introduces class credibility weights: by comparing these weights, the system can clearly determine which public opinion tendency in the retrieval results for a topic is the most credible and the most widely spread. In other words, class credibility weights make otherwise fuzzy public opinion analysis clear and concrete, and the analysis results are highly credible. The invention also introduces a tendency information class and most-inclined topic information, so that the public opinion tendency on the topic is expressed more precisely, which helps staff plan their response to public opinion.

Description

A public opinion monitoring system based on semantic analysis
Technical field
The present invention relates to the technical field of information processing, and in particular to a public opinion monitoring system based on semantic analysis.
Background art
As the internet plays an increasingly important role in Chinese social life, government bodies and related enterprises and institutions pay growing attention to the monitoring and early warning of online public opinion, and public opinion analysis and monitoring has become a research field of great strategic and practical significance. Because the volume of online information is enormous, manual methods alone cannot cope with collecting and processing it. Automated online public opinion analysis systems therefore need to be built on information technology and expertise from related disciplines.
Because the Internet is interconnected worldwide, the amount of data that can be obtained from it is hard to estimate, and extracting useful information from it cannot be done manually at all. Online public opinion monitoring must therefore be closely combined with data mining technology so that monitoring becomes automated and intelligent. In applying data mining to public opinion monitoring, how to find key public opinion information in the Internet, the largest data collection in the world, and in particular how to model the characteristics of different monitoring projects so as to provide precise services, has become a focus of data mining research.
Summary of the invention
To address the technical problems described in the background art, the present invention proposes a public opinion monitoring system based on semantic analysis.
The public opinion monitoring system based on semantic analysis proposed by the present invention comprises:
A topic information retrieval module, which includes an input unit and a web crawler. The input unit is used to input a topic; the web crawler is connected to the input unit and performs a web search on the topic to obtain topic-related information;
A cache module, which is connected to the topic information retrieval module and stores the topic-related information retrieved by the web crawler, with the source website of each item of topic-related information stored in association with it;
A net credibility preset module, which includes a net credibility database and a net credibility evaluation unit. The net credibility database stores website addresses and their corresponding net credibility values; the net credibility evaluation unit calculates a net credibility value from a website's content, and the database is updated or supplemented with the evaluation unit's results;
A clustering module, which is connected to the cache module, performs semantic analysis on each item of topic-related information in the cache module, and groups items whose semantic similarity exceeds a preset similarity threshold, yielding multiple information classes. The clustering module is also connected to the net credibility preset module; it obtains the net credibility value of the source website of each item of topic-related information and stores it in association with that item;
A tendency computing module, which is connected to the clustering module. For each information class it computes the sum of the stored net credibility values as the class credibility weight, and it extracts the information class with the largest class credibility weight as the tendency information class. Within the tendency information class, it compares the net credibility values of the source websites and selects the item of topic-related information with the largest value as the most-inclined topic information.
Preferably, the system further includes an evaluation output module connected to the tendency computing module. The evaluation output module obtains from the tendency computing module, for each information class, the item of topic-related information with the largest net credibility value as the class representative information, and then compiles the class credibility weight and class representative information of each information class into an evaluation report for output.
Preferably, the evaluation report also includes the address and net credibility value of the source website of each class representative information.
Preferably, the tendency information class and the most-inclined topic information are highlighted in the evaluation report.
Preferably, whenever the topic information retrieval module receives a new topic, the cache module is cleared before the new data is stored.
Preferably, the net credibility evaluation unit obtains multiple items of information from the website to be evaluated, judges the authenticity of each item, and evaluates the net credibility value according to the proportion of true items among them.
Preferably, the clustering module retrieves the source website of each item of topic-related information from the cache module and matches each source website address against the website addresses stored in the net credibility database. Depending on the match result, it either retrieves the source website's net credibility value from the database or has the net credibility evaluation unit calculate it.
The public opinion monitoring system proposed by the present invention uses semantic analysis to cluster semantically similar information, so that the work of handling scattered items becomes the work of handling information classes. Consolidating the parts into wholes reduces the tedium of subsequent work. Moreover, because of the semantic analysis, the items of topic-related information within one information class are semantically almost the same, which avoids the semantic covering problem that clustering might otherwise cause.
The invention introduces class credibility weights. By comparing these weights, the system can clearly determine which public opinion tendency in the retrieval results for a topic is the most credible and the most widely spread. In other words, class credibility weights make otherwise fuzzy public opinion analysis clear and concrete, and the analysis results are highly credible.
In the present invention, the class credibility weight equals the sum of the net credibility values of all websites that published information with the same meaning. Net credibility values can be evaluated by the net credibility evaluation unit. The invention also provides a net credibility database that complements the evaluation unit. The database allows net credibility values to be retrieved directly, which improves efficiency and saves the time of calculating them on the fly. The evaluation unit can update and supplement the values in the database, which improves the adaptability of the net credibility preset module and prevents the database from becoming too narrow.
By introducing a tendency information class and most-inclined topic information, the invention expresses the public opinion tendency on the topic more precisely, which helps staff plan their response to public opinion.
Brief description of the drawings
Fig. 1 is a block diagram of the public opinion monitoring system based on semantic analysis proposed by the present invention.
Detailed description
Referring to Fig. 1, the public opinion monitoring system based on semantic analysis proposed by the present invention comprises: a topic information retrieval module, a cache module, a net credibility preset module, a clustering module, a tendency computing module, and an evaluation output module.
The topic information retrieval module includes an input unit and a web crawler. The input unit is used to input a topic; the web crawler is connected to the input unit and performs a web search on the topic to obtain topic-related information.
The cache module is connected to the topic information retrieval module. It stores the topic-related information retrieved by the web crawler, with the source website of each item of topic-related information stored in association with it. Providing the cache module relieves the topic information retrieval module of the storage burden, which helps keep the web crawler efficient.
In this embodiment, whenever the topic information retrieval module receives a new topic, the cache module is cleared before the new data is stored, to keep the cache module's space from being taken up.
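As a rough illustration of the cache behavior described above, the following Python sketch keeps each retrieved item together with its source website and clears itself whenever a new topic arrives. The class name `TopicCache` and its method names are hypothetical; the patent does not specify any data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TopicCache:
    """Hypothetical cache holding (text, source_url) pairs for one topic at a time."""
    topic: Optional[str] = None
    items: List[Tuple[str, str]] = field(default_factory=list)

    def start_topic(self, topic: str) -> None:
        # A new topic clears everything stored for the previous one.
        self.items.clear()
        self.topic = topic

    def store(self, text: str, source_url: str) -> None:
        # Each item is stored together with the website it came from.
        self.items.append((text, source_url))


cache = TopicCache()
cache.start_topic("bridge closure")
cache.store("The bridge is closed", "news.example")
cache.start_topic("ticket prices")  # earlier items are discarded here
```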
The net credibility preset module includes a net credibility database and a net credibility evaluation unit.
The net credibility evaluation unit calculates the net credibility value of a website from the website's content. Specifically, the evaluation unit obtains multiple items of information from the website to be evaluated, judges the authenticity of each item, and evaluates the net credibility value according to the proportion of true items among them.
The net credibility value is computed as T = (number of true items) / (number of sampled items). Here, the number of sampled items is the number of items that the evaluation unit randomly selects from the website to be evaluated, and the number of true items is the number of those selected items that are true. The number of true items is therefore less than or equal to the number of sampled items, so T lies between 0 and 1.
In this embodiment, to make it easier to confirm the number of true items, after randomly selecting items from the website to be evaluated the evaluation unit can search for each selected item on a website known to have a high net credibility value, such as an official website, and verify the authenticity of the selected item according to the search results.
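The ratio T described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `net_credibility` and the `is_true` callback (standing in for the check against a high-credibility reference site) are hypothetical, and the patent does not say how authenticity is actually judged.

```python
import random
from typing import Callable, Optional, Sequence


def net_credibility(
    site_items: Sequence[str],
    is_true: Callable[[str], bool],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> float:
    """Return T = true items / sampled items for one website (hypothetical helper)."""
    rng = rng or random.Random()
    k = min(sample_size, len(site_items))
    if k == 0:
        raise ValueError("the website yielded no items to sample")
    sampled = rng.sample(list(site_items), k)  # random selection without repeats
    true_count = sum(1 for item in sampled if is_true(item))
    return true_count / k


# Assumed stand-in for verification: an item is true if a trusted set contains it.
trusted = {"road closed on Monday", "school opens in September"}
items = [
    "road closed on Monday",
    "aliens landed downtown",
    "school opens in September",
    "free gold for everyone",
]
T = net_credibility(items, lambda s: s in trusted, sample_size=4)
```

Because all four items are sampled here and two of them are confirmed, T is 0.5 in this example; with a smaller sample the value would depend on which items were drawn.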
The net credibility database stores website addresses and their corresponding net credibility values so that the values can be retrieved directly. In this embodiment, to keep up with the rapid changes of the web, the net credibility evaluation unit also recalculates, at a preset interval, the net credibility values of websites already stored in the database and updates the stored values with the newly calculated ones. In addition, websites not yet stored in the database are added according to the evaluation unit's results.
In this embodiment, the database allows net credibility values to be retrieved directly, which improves efficiency and saves the time of calculating them on the fly. The evaluation unit can update and supplement the values in the database, which improves the adaptability of the net credibility preset module and prevents the database from becoming too narrow.
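One way to picture the interplay between the database and the evaluation unit is a store that returns a saved value when it is fresh and re-evaluates it once a preset interval has passed. The sketch below makes a simplifying assumption: instead of a background job that recalculates periodically, it refreshes lazily on lookup. The class name `NetCredibilityDB` and the `evaluate` callback are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class NetCredibilityDB:
    """Hypothetical store mapping a website address to (value, time evaluated)."""

    def __init__(self, evaluate: Callable[[str], float], refresh_seconds: float):
        self._evaluate = evaluate        # stands in for the evaluation unit
        self._refresh = refresh_seconds  # the preset interval
        self._store: Dict[str, Tuple[float, float]] = {}

    def get(self, url: str, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        entry = self._store.get(url)
        # Supplement unknown sites; update entries older than the interval.
        if entry is None or now - entry[1] >= self._refresh:
            entry = (self._evaluate(url), now)
            self._store[url] = entry
        return entry[0]


calls = []


def fake_evaluate(url: str) -> float:
    calls.append(url)  # record how often the evaluation unit is invoked
    return 0.75


db = NetCredibilityDB(fake_evaluate, refresh_seconds=100)
first = db.get("news.example", now=0)     # not stored yet: evaluated
second = db.get("news.example", now=50)   # fresh: served from the store
third = db.get("news.example", now=100)   # interval elapsed: re-evaluated
```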
The clustering module is connected to the cache module. It performs semantic analysis on each item of topic-related information in the cache module and groups items whose semantic similarity exceeds a preset similarity threshold, yielding multiple information classes. In this way, semantic analysis is used to cluster semantically similar information, so that the work of handling scattered items becomes the work of handling information classes; consolidating the parts into wholes reduces the tedium of subsequent work. Moreover, because of the semantic analysis, the items of topic-related information within one information class are semantically almost the same, which avoids the semantic covering problem that clustering might otherwise cause.
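The threshold-based grouping described above might look like the following Python sketch. The patent does not name a similarity measure or a clustering algorithm, so two assumptions are made here: word-overlap (Jaccard) similarity stands in for real semantic similarity, and a simple greedy pass compares each item with the first member of every existing class. The function names are hypothetical.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster(
    texts: List[str],
    threshold: float,
    similarity: Callable[[str, str], float] = jaccard,
) -> List[List[str]]:
    """Greedily group texts whose similarity to a class's first member exceeds threshold."""
    classes: List[List[str]] = []
    for text in texts:
        for members in classes:
            if similarity(text, members[0]) > threshold:
                members.append(text)
                break
        else:
            classes.append([text])  # no class was similar enough: start a new one
    return classes


texts = [
    "the bridge is closed today",
    "the bridge is closed",
    "ticket prices will rise",
]
info_classes = cluster(texts, threshold=0.5)
```

Swapping `jaccard` for an embedding-based similarity would leave the grouping logic unchanged, since `cluster` only needs a function that scores two texts.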
The clustering module is also connected to the net credibility preset module; it obtains the net credibility value of the source website of each item of topic-related information and stores it in association with that item. That is, after generating the information classes, the clustering module retrieves the net credibility value of the source website for each item in each information class and stores the value in association with the item. Specifically, the clustering module retrieves the source website of each item of topic-related information from the cache module and matches each source website address against the website addresses stored in the net credibility database. If the source website of an item is stored in the database, the corresponding net credibility value is retrieved directly and stored in the information class together with the item. If the source website is not stored in the database, the net credibility evaluation unit evaluates the net credibility value of that website; the resulting value is stored in the information class together with the item, and the source website address and its evaluated value are also added to the database.
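The lookup-or-evaluate step described above can be sketched as follows. A plain dictionary stands in for the net credibility database and a callback stands in for the evaluation unit; both, along with the function name `attach_credibility`, are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Item = Tuple[str, str]                # (text, source website address)
ScoredItem = Tuple[str, str, float]   # (text, source website address, net credibility value)


def attach_credibility(
    info_class: List[Item],
    database: Dict[str, float],
    evaluate: Callable[[str], float],
) -> List[ScoredItem]:
    """Pair every item with its source website's value, evaluating unknown sites."""
    scored: List[ScoredItem] = []
    for text, url in info_class:
        if url not in database:
            # Unknown site: evaluate it and add the result to the database.
            database[url] = evaluate(url)
        scored.append((text, url, database[url]))
    return scored


database = {"news.example": 0.75}
scored = attach_credibility(
    [("the bridge is closed", "news.example"),
     ("the bridge is closed today", "blog.example")],
    database,
    evaluate=lambda url: 0.25,  # placeholder for the evaluation unit
)
```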
The tendency computing module is connected to the clustering module. For each information class it computes the sum of the stored net credibility values as the class credibility weight, and it extracts the information class with the largest class credibility weight as the tendency information class. In this embodiment, the class credibility weight equals the sum of the net credibility values of all websites that published information with the same meaning. By comparing class credibility weights, the system can clearly determine which public opinion tendency in the retrieval results for the topic is the most credible and the most widely spread. In other words, in this embodiment class credibility weights make otherwise fuzzy public opinion analysis clear and concrete, and the analysis results are highly credible.
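A minimal sketch of this calculation, assuming each information class is a list of (text, source website, net credibility value) tuples; the tuple layout and function names are illustrative, not taken from the patent.

```python
from typing import List, Tuple

Item = Tuple[str, str, float]  # (text, source website address, net credibility value)


def class_credibility_weight(info_class: List[Item]) -> float:
    """Sum of the net credibility values stored in one information class."""
    return sum(value for _text, _url, value in info_class)


def pick_tendency_class(info_classes: List[List[Item]]) -> List[Item]:
    """Return the information class with the largest class credibility weight."""
    return max(info_classes, key=class_credibility_weight)


class_a = [
    ("the bridge is closed", "site1.example", 0.5),
    ("the bridge is closed today", "site2.example", 0.5),
]
class_b = [("the bridge is open", "site3.example", 0.75)]
tendency = pick_tendency_class([class_b, class_a])
```

In this example `class_a` wins with a weight of 1.0 against 0.75, even though the single source in `class_b` is individually more credible. Summing rewards both credibility and breadth of publication, which matches the "most credible, most widely spread" reading in the text.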
Within the tendency information class, the tendency computing module compares the net credibility values of the source websites and selects the item of topic-related information with the largest value as the most-inclined topic information. The tendency information class contains multiple semantically similar items, but these items still differ from one another to a greater or lesser extent. In this embodiment, introducing the most-inclined topic information expresses the public opinion tendency on the topic more precisely, which helps staff plan their response to public opinion.
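Selecting the most-inclined topic information amounts to taking the maximum over one class. The sketch below assumes the same (text, source website, net credibility value) tuple layout as an illustration; how ties are broken is not stated in the patent, and here the first item with the largest value wins.

```python
from typing import List, Tuple

Item = Tuple[str, str, float]  # (text, source website address, net credibility value)


def most_inclined(tendency_class: List[Item]) -> Item:
    """Return the item whose source website has the largest net credibility value."""
    # max() keeps the first item among equals, an assumed tie-breaking rule.
    return max(tendency_class, key=lambda item: item[2])


tendency_class = [
    ("the bridge is closed today", "blog.example", 0.25),
    ("the bridge is closed for repairs", "city.example", 0.75),
    ("bridge closed", "forum.example", 0.5),
]
best = most_inclined(tendency_class)
```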
The evaluation output module is connected to the tendency computing module. It obtains from the tendency computing module, for each information class, the item of topic-related information with the largest net credibility value as the class representative information, and then compiles the class credibility weight and class representative information of each information class into an evaluation report for output. It follows that the most-inclined topic information is the class representative information of the tendency information class. In this embodiment, to make the evaluation report easier to read, the tendency information class and the most-inclined topic information are highlighted in the report. In addition, the evaluation report includes the address and net credibility value of the source website of each class representative information, so that staff can verify them.
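The report contents listed above can be assembled as in the following sketch. The dictionary keys, the `highlight` flag, and the function name `build_report` are hypothetical; the patent describes what the report contains but not its format.

```python
from typing import Dict, List, Tuple, Union

Item = Tuple[str, str, float]  # (text, source website address, net credibility value)
Row = Dict[str, Union[str, float, bool]]


def build_report(info_classes: List[List[Item]]) -> List[Row]:
    """One row per information class: weight, representative item, and its source."""
    rows: List[Row] = []
    for members in info_classes:
        text, url, value = max(members, key=lambda m: m[2])  # class representative
        rows.append({
            "class_weight": sum(m[2] for m in members),
            "representative": text,
            "source": url,
            "source_value": value,
            "highlight": False,
        })
    if rows:
        # The class with the largest weight is the tendency information class.
        max(rows, key=lambda r: r["class_weight"])["highlight"] = True
    return rows


report = build_report([
    [("the bridge is open", "site3.example", 0.75)],
    [("the bridge is closed", "site1.example", 0.5),
     ("the bridge is closed today", "site2.example", 0.25),
     ("bridge closed", "site4.example", 0.5)],
])
```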
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (7)

1. A public opinion monitoring system based on semantic analysis, characterized by comprising:
A topic information retrieval module, which includes an input unit and a web crawler, the input unit being used to input a topic, and the web crawler being connected to the input unit and performing a web search on the topic to obtain topic-related information;
A cache module, which is connected to the topic information retrieval module and stores the topic-related information retrieved by the web crawler, with the source website of each item of topic-related information stored in association with it;
A net credibility preset module, which includes a net credibility database and a net credibility evaluation unit, the net credibility database being used to store website addresses and their corresponding net credibility values, the net credibility evaluation unit being used to calculate a net credibility value from a website's content, and the net credibility database being updated or supplemented with the results of the net credibility evaluation unit;
A clustering module, which is connected to the cache module, performs semantic analysis on each item of topic-related information in the cache module, and groups items whose semantic similarity exceeds a preset similarity threshold to obtain multiple information classes; the clustering module is also connected to the net credibility preset module, and it obtains the net credibility value of the source website of each item of topic-related information and stores it in association with that item;
A tendency computing module, which is connected to the clustering module, computes for each information class the sum of the stored net credibility values as the class credibility weight, and extracts the information class with the largest class credibility weight as the tendency information class; within the tendency information class, the tendency computing module compares the net credibility values of the source websites and selects the item of topic-related information with the largest value as the most-inclined topic information.
2. The public opinion monitoring system based on semantic analysis as claimed in claim 1, characterized in that it further includes an evaluation output module connected to the tendency computing module; the evaluation output module obtains from the tendency computing module, for each information class, the item of topic-related information with the largest net credibility value as the class representative information, and then compiles the class credibility weight and class representative information of each information class into an evaluation report for output.
3. The public opinion monitoring system based on semantic analysis as claimed in claim 2, characterized in that the evaluation report also includes the address and net credibility value of the source website of each class representative information.
4. The public opinion monitoring system based on semantic analysis as claimed in claim 2, characterized in that the tendency information class and the most-inclined topic information are highlighted in the evaluation report.
5. The public opinion monitoring system based on semantic analysis as claimed in claim 1, characterized in that whenever the topic information retrieval module receives a new topic, the cache module is cleared before the new data is stored.
6. The public opinion monitoring system based on semantic analysis as claimed in claim 1, characterized in that the net credibility evaluation unit obtains multiple items of information from the website to be evaluated, judges the authenticity of each item, and evaluates the net credibility value according to the proportion of true items among them.
7. The public opinion monitoring system based on semantic analysis as claimed in claim 1 or 6, characterized in that the clustering module retrieves the source website of each item of topic-related information from the cache module, matches each source website address against the website addresses stored in the net credibility database, and, depending on the match result, either retrieves the source website's net credibility value from the net credibility database or has the net credibility evaluation unit calculate it.
CN201610562032.XA 2016-07-15 2016-07-15 A public opinion monitoring system based on semantic analysis Pending CN106227772A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610562032.XA (CN106227772A) | 2016-07-15 | 2016-07-15 | A public opinion monitoring system based on semantic analysis

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201610562032.XA (CN106227772A) | 2016-07-15 | 2016-07-15 | A public opinion monitoring system based on semantic analysis

Publications (1)

Publication Number | Publication Date
CN106227772A | 2016-12-14

Family

ID=57520458

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610562032.XA (CN106227772A, Pending) | 2016-07-15 | 2016-07-15 | A public opinion monitoring system based on semantic analysis

Country Status (1)

Country | Link
CN (1) | CN106227772A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107562722A * | 2017-08-14 | 2018-01-09 | 上海文军信息技术有限公司 | Internet public feelings monitoring analysis system based on big data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101751458A * | 2009-12-31 | 2010-06-23 | 暨南大学 | Network public sentiment monitoring system and method
US20120323627A1 * | 2011-06-14 | 2012-12-20 | Microsoft Corporation | Real-time Monitoring of Public Sentiment
CN103902619A * | 2012-12-28 | 2014-07-02 | 中国移动通信集团公司 | Internet public opinion monitoring method and system
CN104598450A * | 2013-10-30 | 2015-05-06 | 北大方正集团有限公司 | Popularity analysis method and system of network public opinion event
CN105373558A * | 2014-08-27 | 2016-03-02 | 青岛海尔智能家电科技有限公司 | Method and system for measuring recommendation levels of products

Similar Documents

Publication Publication Date Title
CN107220845B (en) User re-purchase probability prediction/user quality determination method and device and electronic equipment
CN102841946B (en) Commodity data retrieval ordering and Method of Commodity Recommendation and system
CN105183731B (en) Recommendation information generation method, device and system
CN103778548A (en) Goods information and keyword matching method, and goods information releasing method and device
Handani et al. Sentiment analysis for go-jek on google play store
CN107193962A (en) A kind of intelligent figure method and device of internet promotion message
CN105740415B (en) Bidding friend recommendation system based on label position weight and self study
WO2016115944A1 (en) Method and device for establishing webpage quality model
CN104281565B (en) Semantic dictionary construction method and device
CN111401700A (en) Data analysis method, device, computer system and readable storage medium
CN111028006B (en) Service delivery auxiliary method, service delivery method and related device
CN110019519A (en) Data processing method, device, storage medium and electronic device
CN106528755A (en) Hot topic generation method and device
CN112801498A (en) Risk identification model training method, risk identification device and risk identification equipment
CN110084653A (en) A kind of data processing method, device, server and storage medium
CN105868169A (en) Data acquisition interface and data acquisition method and system
CN106815266A (en) Judgement document's search method and device
CN102298618B (en) Method for obtaining matching degree to execute corresponding operations and device and equipment
CN109067708A (en) A kind of detection method, device, equipment and the storage medium at webpage back door
CN106227772A (en) A kind of public sentiment monitoring system based on semantic analysis
CN105975642A (en) Public opinion monitoring method based on network big data
CN112148946A (en) Microblog-based analysis and view display method and system
CN107908649A (en) A kind of control method of text classification
CN110019832A (en) The acquisition methods and device of language model
CN108074071A (en) A kind of project data processing method and processing device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20161214