CN111914087A - Public opinion analysis method - Google Patents

Public opinion analysis method

Info

Publication number
CN111914087A
Authority
CN
China
Prior art keywords
public opinion
public
information
opinion
emotion
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010750608.1A
Other languages
Chinese (zh)
Other versions
CN111914087B (en)
Inventor
黄涛
程朴
龚勋
何莹
梁少勇
万忠平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou China Dci Co ltd
Original Assignee
Guangzhou China Dci Co ltd
Application filed by Guangzhou China Dci Co ltd
Priority to CN202010750608.1A
Publication of CN111914087A
Application granted
Publication of CN111914087B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a public opinion analysis method comprising the following steps: collecting public opinion data, wherein the public opinion data comprises project public information, public opinions submitted under the project public information, media reports on the project public information, and public discussions of the project public information; preprocessing the public opinion data to obtain structured public opinion information; generating a public opinion object from the public opinion information, wherein the public opinion object comprises an object identifier, an object category, spatio-temporal information, semantic information, emotion information and relationship information; performing similarity matching between the public opinion object and the public opinion cases in a public opinion case library to obtain the most similar public opinion case; and deriving a public opinion control scheme by analyzing the most similar public opinion case. The method can reasonably mine network public opinion data, rapidly process massive multi-source public opinion data, obtain valuable public opinion information, and provide data support for subsequent public opinion case similarity matching and public opinion control scheme analysis.

Description

Public opinion analysis method
Technical Field
The invention relates to the technical field of data mining and analysis, in particular to a public opinion knowledge base construction method.
Background
In social public opinion application and decision-making technology, the "scenario-response" emergency management mode has become a consensus in academic circles and, through continuous improvement in theory and technology, has been promoted to governments and industry, so that a new scenario-response mode of emergency management is gradually being established. Scenario-response emergency management works as follows: emergencies in physical and social spaces are first monitored comprehensively in real time and analyzed intelligently; valuable information is mined from massive, scattered, unstructured and constantly changing disaster data; a general description of the current situation is obtained through analysis and the situation is projected forward; comprehensive assessment and decision-making follow; and relevant information is delivered in time to the people who need it most, so that decision makers can carry out on-site handling and emergency deployment as appropriate.
The information processing, prediction and early-warning systems established by local governments are also part of social public opinion decision analysis, for example: anti-terrorism information research and judgment systems; emergency information monitoring and processing systems; water and rainfall information acquisition, communication, forecasting and scheduling systems; monitoring and early-warning systems for public health emergencies; and dynamic monitoring, analysis and early warning of social security. However, these technical systems are imperfect: the thinking of some management departments has not kept up with the requirements of the situation; some management mechanisms are unsound, with overlapping functions and a low degree of information resource sharing; the information networks reflect developments with a lag; and the public opinion evaluation mechanism cannot make comprehensive, systematic and objective evaluations.
With the rapid development and popularization of the internet, people have become used to publishing their opinions on social hotspots, public affairs and the like online, and many forms of network media have sprung up, such as official accounts and microblogs. When social events and social problems occur, people often learn the causes and development of the events quickly through network media platforms and then publish opinions through those media; these opinions have a non-negligible effect on how the events develop, and network public opinion is thus generated. Because network transmission is rapid, far-reaching and highly interactive, network public opinion often grows explosively and takes complicated forms. The big data of network public opinion therefore needs to be mined reasonably to provide a powerful means for effective real-time supervision of network public opinion.
Disclosure of Invention
The invention aims to overcome at least one defect (deficiency) of the prior art and provides a public opinion analysis method for solving the problem of lack of a reasonable network public opinion data mining method.
The technical scheme adopted by the invention is as follows:
A public opinion analysis method comprises the following steps: collecting public opinion data; preprocessing the public opinion data to obtain structured public opinion information; generating a public opinion object from the public opinion information, wherein the public opinion object comprises an object identifier, an object category, spatio-temporal information, semantic information, emotion information and relationship information; performing similarity matching between the public opinion object and the public opinion cases in a public opinion case library to obtain the most similar public opinion case; and deriving a public opinion control scheme by analyzing the most similar public opinion case.
Under a multi-granularity public opinion space-time object description attribute framework, structured public opinion information is generated into a public opinion object comprising an object identifier and five types of attributes, and the time characteristic, the space characteristic, the evolution characteristic and the propagation characteristic of a public opinion entity can be abstractly described so as to rapidly process massive multi-source public opinion data, mine valuable public opinion information, provide data support for subsequent similarity matching with public opinion cases and public opinion control scheme analysis, and provide scientific decision reference for supervision of network public opinions.
Further, the public opinion object and the public opinion cases in the public opinion case library are subjected to similarity analysis to obtain the most similar public opinion cases, and the method comprises the following steps: carrying out structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion case in the public opinion case library to obtain the event type of the public opinion object; and carrying out attribute similarity matching on the attributes of the public sentiment objects and the attribute values of the public sentiment cases belonging to the event types in the public sentiment case library to obtain the most similar public sentiment cases.
Public opinion cases differ in structure from case to case, and both the information of a public opinion case and the description of its control scheme may be incomplete. Analyzing the structural similarity between the public opinion object and the public opinion cases handles this problem of missing attributes well. Structural similarity matching is first performed on the attribute structure to determine the event type to which the public opinion object belongs, and attribute similarity matching is then performed on the attribute values within the matched event type, which can greatly improve matching accuracy.
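The two-stage matching described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the formulas of the method itself: the Jaccard overlap of attribute names and the share of equal attribute values are stand-in similarity measures, and the function names and the sample case library are hypothetical.

```python
from typing import Dict, List, Tuple

Case = Dict[str, str]  # attribute name -> attribute value


def structural_similarity(obj: Case, event_cases: List[Case]) -> float:
    """Stand-in measure: Jaccard overlap between the object's attribute
    names and the attribute names used by the cases of one event type."""
    obj_attrs = set(obj)
    type_attrs = set().union(*(set(c) for c in event_cases))
    union = obj_attrs | type_attrs
    return len(obj_attrs & type_attrs) / len(union) if union else 0.0


def attribute_similarity(obj: Case, case: Case) -> float:
    """Stand-in measure: share of the object's attributes whose values
    equal the values recorded in the case (missing attributes score 0)."""
    if not obj:
        return 0.0
    hits = sum(1 for name, value in obj.items() if case.get(name) == value)
    return hits / len(obj)


def most_similar_case(obj: Case, library: Dict[str, List[Case]]) -> Tuple[str, Case]:
    # Stage 1: pick the event type whose attribute structure fits best.
    event_type = max(library, key=lambda t: structural_similarity(obj, library[t]))
    # Stage 2: compare attribute values only within that event type.
    best = max(library[event_type], key=lambda c: attribute_similarity(obj, c))
    return event_type, best


# Hypothetical case library keyed by event type.
library = {
    "land_use": [
        {"location": "district A", "topic": "zoning", "emotion": "anger"},
        {"location": "district B", "topic": "zoning", "emotion": "doubt"},
    ],
    "traffic": [
        {"road": "ring road", "topic": "congestion"},
    ],
}
obj = {"location": "district B", "topic": "zoning", "emotion": "doubt"}
event_type, case = most_similar_case(obj, library)
```

Restricting the second stage to one event type keeps cases with a very different attribute structure from competing on a handful of coincidentally equal values.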
Further, performing structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion case in the public opinion case library to obtain the event type to which the public opinion object belongs, including:
denoting a public opinion object as $X$ and the $i$-th event type in the public opinion case library as $C_i$, where the event type $C_i$ consists of a plurality of public opinion cases;
calculating the structural similarity $S(X, C_i)$ between the public opinion object $X$ and the event type $C_i$ according to the following formula:
Figure BDA0002609940940000021
where $\gamma$ is an empirical factor, $q$ is the number of attributes of the public opinion object $X$, $q_s$ is the number of attributes shared by the public opinion object $X$ and the event type $C_i$, and $K$ is the number of attributes of the event type $C_i$;
determining, according to the structural similarity $S(X, C_x)$, that the event type to which the public opinion object $X$ belongs is the $x$-th event type $C_x$.
Further, carrying out attribute similarity matching on the attribute of the public opinion object and the attribute value of the public opinion case belonging to the event type in the public opinion case library to obtain the most similar public opinion case, wherein the method comprises the following steps:
denoting a public opinion object as $X$, the event type to which it belongs as the $x$-th event type $C_x$, and a public opinion case of event type $C_x$ in the public opinion case library as $t_{xj}$;
The conditional probability is calculated as follows:
Figure BDA0002609940940000031
Figure BDA0002609940940000032
Figure BDA0002609940940000033
where $n_{xj}$ is the number of public opinion cases in the public opinion case library that belong to event type $C_x$ and match the attributes of the public opinion object $X$, $n$ is the total number of public opinion cases in the public opinion case library, $X_y$ is an attribute of the public opinion object $X$, $n_{xj}(X_y)$ is the number of public opinion cases of event type $C_x$ that have the attribute $X_y$, and $\omega_{xy}$ is the attribute weight of event type $C_x$;
determining the most similar public opinion case according to the conditional probability $p(X \mid t_{xj})\,p(t_{xj})$.
Further, public opinion data is preprocessed, and the preprocessing comprises the following steps:
identifying and removing useless characters from the public opinion data; and/or segmenting the public opinion data into words and removing stop words; and/or extracting keywords from the public opinion data based on a word frequency statistical method; and/or extracting entity names from the public opinion data based on a named entity recognition method; and/or aggregating the public opinion data based on a topic clustering method; and/or extracting topics from the public opinion data based on a co-word analysis method; and/or extracting texts with emotional tendency from the public opinion data based on text mining technology.
Useless characters and stop words are removed, and meanwhile, keywords, entity names, public opinion topics and emotional tendency texts are extracted, so that the public opinion data can be preliminarily classified to form structured public opinion information.
Further, the object category includes a text category, a topic category and a theme category: a public opinion object of the text category is generated from a text expressing the public opinion information, a public opinion object of the topic category is generated from a topic expressing the public opinion information, and a public opinion object of the theme category is generated from a theme expressing the public opinion information.
The object type represents the type of the public sentiment data processing and analyzing object, and public sentiment information expression models related to different object types are different. According to the sequence that the semantic abstraction degree is gradually increased, the public sentiment object categories are divided into three categories of text categories, topic categories and theme categories, so that the public sentiment information can be further mined conveniently.
Further, the semantic information includes semantic granularity and semantic content.
The semantic information of each public opinion object can be divided into multiple semantic granularities, each semantic granularity can have multiple semantic records, and each semantic record has respective number and semantic content.
Further, the emotion information comprises an emotion subject, an emotion object, an emotion category and emotion intensity;
generating emotion information in a public opinion object according to the public opinion information, comprising:
extracting an emotion subject and an emotion object from the public opinion information based on a named entity recognition method;
identifying emotion vocabulary from the context of the emotion object based on an association rule mining method, and determining the emotion category according to the emotion vocabulary;
and judging the emotion intensity according to the emotion vocabulary based on an emotion tendency judgment method.
The method has the advantages that the emotion information quadruple structure, namely the emotion subject, the emotion object, the emotion category and the emotion intensity, is constructed, public opinion viewpoint and emotion mining can be realized, further, the change of viewpoint and emotion along with the development of public opinion situation is analyzed, and data support is provided for the subsequent public opinion object analysis.
Further, the emotion vocabulary includes emotion words, negative words, degree adverbs and symbolic expressions.
The emotional words, the negative words, the degree adverbs and the symbolic expressions have certain importance on the judgment of the emotional categories and the emotional intensity in the emotional information.
Further, the relationship information includes a relationship category and other public sentiment objects having a relationship with the public sentiment objects, and the relationship category includes an association relationship, an aggregation relationship and a dependency relationship.
The relationship information expresses the relationships between public opinion objects, and its category is described as an association relationship, an aggregation relationship or a dependency relationship, which helps to sort out the relationships between public opinion objects.
Compared with the prior art, the invention has the beneficial effects that at least: by constructing a multi-granularity public opinion space-time object description attribute framework and generating public opinion information into a public opinion object under the attribute framework, the time characteristic, the space characteristic, the evolution characteristic and the propagation characteristic of a public opinion entity can be abstractly described, so that mass multi-source public opinion data can be rapidly processed, valuable public opinion information can be obtained, data support is provided for subsequent public opinion case similarity matching and public opinion control scheme analysis, and a public opinion supervision control decision can be timely and correctly made.
Drawings
Fig. 1 is a flowchart of a public opinion analysis method according to an embodiment of the invention.
Fig. 2 is a flow chart of public opinion object and public opinion case similarity matching in a public opinion case library according to an embodiment of the present invention.
Detailed Description
The drawings are only for purposes of illustration and are not to be construed as limiting the invention. For a better understanding of the following embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
As shown in fig. 1, the embodiment provides a public opinion analysis method, including:
S1, collecting public opinion data, wherein the public opinion data may comprise project public information, public opinions submitted under the project public information, media reports on the project public information, and public discussions of the project public information;
S2, preprocessing the public opinion data to obtain structured public opinion information;
S3, generating a public opinion object from the public opinion information, wherein the public opinion object comprises an object identifier, an object category, spatio-temporal information, semantic information, emotion information and relationship information;
S4, performing similarity matching between the public opinion object and the public opinion cases in the public opinion case library to obtain the most similar public opinion case;
S5, deriving a public opinion control scheme by analyzing the most similar public opinion case.
Public opinion is the collection of individual opinions expressed by the public as social events and problems arise, develop and change, including the expression, transmission, interaction and influence of individual emotions and views. The emergence and development of social events and problems have their own life cycles. In different periods of the life cycle, individual opinions keep changing and have specific spatial ranges and propagation patterns; these are called public opinion entities. Using multi-source, heterogeneous public opinion data as the data source, the content of public opinion events and the public's emotions and views are obtained and the spatio-temporal patterns of public opinion propagation and diffusion are analyzed, which can provide decision support for public opinion guidance.
In step S1, the collected project public information may include the project filing number, the construction project name, the construction project location, and so on. The public can generally submit feedback under the project public information, which forms the public opinions on the project public information; the collected public opinions may include the feedback itself, the contact address of the person giving feedback, and so on. The collected media reports on the project public information may include the article title, release time, name of the publishing media, article content, and so on. The collected public discussions of the project public information may include the title, posting time, poster, commenter, reply time, reply content and so on from public post bars.
For the network media platforms on which network public opinion spreads, such as microblogs, news websites and forums, network public opinion data can be collected through the API (application programming interface) provided by the social media and through HTML (hypertext markup language) parsing. A high-performance web crawler strategy is designed, for example a multi-threaded web crawler that crawls data on multiple machines, to improve the efficiency of data capture and achieve real-time acquisition and automatic updating of network public opinion data.
Public opinion data comprises not only text data from network media but also heterogeneous social network data such as the number of times a text is forwarded and the forwarding relationships; it is large in volume, short-lived, rich in sources, complex in form and unstructured. Therefore, in step S2, the public opinion data collected in step S1 is preprocessed and converted into structured public opinion information.
In step S2, preprocessing the public opinion data may specifically include: identifying and removing useless characters from the public opinion data; and/or segmenting the public opinion data into words and removing stop words; and/or extracting keywords from the public opinion data based on a word frequency statistical method; and/or extracting entity names from the public opinion data based on a named entity recognition method; and/or aggregating the public opinion data based on a topic clustering method; and/or extracting topics from the public opinion data based on a co-word analysis method; and/or extracting texts with emotional tendency from the public opinion data based on text mining technology.
Useless characters are punctuation marks or emoticons that carry no emotional expression, such as commas and pause marks. Stop words are words that contribute nothing to information extraction, such as "on", "of" and "at".
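A minimal Python sketch of this cleaning step is given below. It makes simplifying assumptions: every non-word character is removed (the text above removes only characters without emotional expression), tokens are split on whitespace (Chinese text would need a word segmenter instead), and `STOP_WORDS` is a hypothetical miniature list.

```python
import re
from typing import List

# Hypothetical stop-word list; a real system would load a full lexicon.
STOP_WORDS = {"on", "of", "at", "the", "a", "is"}


def clean_and_tokenize(text: str) -> List[str]:
    """Strip punctuation-like characters, split on whitespace, drop stop words."""
    # Replace every character that is neither a word character nor whitespace.
    stripped = re.sub(r"[^\w\s]", " ", text)
    tokens = stripped.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


tokens = clean_and_tokenize("The hearing on the project, at city hall, is delayed!")
```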
The word frequency statistical method is a common weighting technique in information retrieval and text mining. It evaluates how important a word is to a document within a document set or corpus; the importance of a word increases in proportion to the number of times it occurs in the document, so words with a higher word frequency can be extracted as keywords of the public opinion data.
Named Entity Recognition (NER) refers to recognizing special objects in text whose semantic categories, such as people, addresses and organizations, are usually predefined before recognition. Based on a named entity recognition method, entities such as place names, organization names, times, person names and events can be extracted from the public opinion data.
The topic clustering method can use a k-means clustering algorithm, an agglomerative hierarchical clustering algorithm, a neural network clustering algorithm, and so on. Public opinion data with similar topics can be aggregated based on the topic clustering method.
Co-word analysis counts, for each pair in a group of words, how often the two words occur in the same document, and clusters the words on the basis of these counts. The result reflects how closely or loosely the words are related, from which the structure and evolution of the topics they represent can be analyzed. Co-word analysis can be carried out on the subject terms or on the keywords of the documents. Public opinion topics can be extracted based on co-word analysis, enabling topic association analysis and hotspot detection.
Text mining refers to computer processing techniques that extract valuable information and knowledge from text data. Texts with emotional tendency can be extracted from the public opinion data by text mining.
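As a rough illustration of keyword extraction by word frequency, the following Python sketch simply ranks tokens by raw count. The function name `top_keywords` and the sample tokens are hypothetical, and no corpus-level weighting is applied.

```python
from collections import Counter
from typing import List


def top_keywords(tokens: List[str], k: int = 3) -> List[str]:
    """Return the k most frequent tokens as candidate keywords."""
    return [word for word, _ in Counter(tokens).most_common(k)]


tokens = ["station", "noise", "station", "relocation", "noise", "station"]
keywords = top_keywords(tokens, k=2)
```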
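The pairwise counting at the core of this analysis can be sketched in Python as follows. The sketch covers only the co-occurrence counts, not the subsequent clustering, and the function name `co_occurrence` and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def co_occurrence(docs: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for each word pair, the number of documents containing both."""
    counts: Counter = Counter()
    for doc in docs:
        # Sort the distinct words so each pair has one canonical order.
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return dict(counts)


docs = [
    ["metro", "noise", "complaint"],
    ["metro", "noise"],
    ["metro", "fare"],
]
pairs = co_occurrence(docs)
```

Pairs with high counts, such as `("metro", "noise")` here, are the candidates a later clustering step would group into one topic.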
Useless characters and stop words are removed, and meanwhile, keywords, entity names, public opinion topics and emotional tendency texts are extracted, so that the public opinion data can be preliminarily classified to form structured public opinion information.
In step S3, in order to design a structural model suitable for parallelization of public opinion big data, and considering the relevance characteristics of public opinion information, a multi-granularity public opinion spatio-temporal object description attribute framework is constructed, and the structured public opinion information is generated into a public opinion object, which can perform abstract description on the temporal characteristics, spatial characteristics, evolution and propagation characteristics of a public opinion entity, so as to quickly process massive multi-source public opinion data and obtain valuable public opinion information.
Specifically, the public opinion object may be defined as follows: public opinion object = { object unique identifier (ID), object category, spatio-temporal information, semantic information, emotion information, relationship information }. Information extracted from the public opinion information, such as spatio-temporal attributes, text topics and theme content, emotional tendency and relationships with other objects, can be expressed through the five types of attributes of the multi-granularity public opinion object: object category, spatio-temporal information, semantic information, emotion information and relationship information.
The object category may include a text category, a topic category and a theme category: a public opinion object of the text category is generated from a text expressing the public opinion information, a public opinion object of the topic category is generated from a topic expressing the public opinion information, and a public opinion object of the theme category is generated from a theme expressing the public opinion information.
The object category represents the type of object that public opinion data processing and analysis operates on, and different object categories involve different public opinion information expression models. In order of increasing semantic abstraction, public opinion objects are divided into three categories: text, topic and theme. A text-category public opinion object is a description model constructed for a public opinion text and expresses the public opinion information contained in the text. A topic-category public opinion object is a description model constructed for a public opinion topic and expresses the public opinion information contained in the topic. A theme-category public opinion object is a description model constructed for a public opinion theme and expresses the public opinion information contained in the theme.
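One possible in-memory representation of this definition is sketched below in Python. The class name, field names and the use of plain dictionaries and lists for the five attribute types are assumptions made for illustration; the text does not prescribe a concrete data layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PublicOpinionObject:
    object_id: str                      # object unique identifier (ID)
    category: str                       # object category, e.g. "text"
    spatiotemporal: Dict[str, Any] = field(default_factory=dict)
    semantic: Dict[str, Any] = field(default_factory=dict)
    emotion: List[Dict[str, Any]] = field(default_factory=list)
    relations: List[Dict[str, Any]] = field(default_factory=list)


# Hypothetical object with only the identifier and category filled in.
obj = PublicOpinionObject(object_id="obj-001", category="text")
```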
The spatiotemporal information may include temporal information and spatial information for expressing time and space of occurrence and termination of public sentiment, evolution and propagation in time and space, and the like.
Specifically, the definition of spatiotemporal information may be as follows:
spatio-temporal information = { temporal information, spatial information };
temporal information = { time granularity, [start time, end time] };
spatial information = { spatial granularity, spatial position }.
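These nested definitions map naturally onto small record types. The Python sketch below is one such mapping; the class names, the granularity values "day" and "city", and the sample dates and place are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class TimeInfo:
    granularity: str                     # e.g. "day" (assumed value)
    interval: Tuple[datetime, datetime]  # [start time, end time]


@dataclass(frozen=True)
class SpaceInfo:
    granularity: str                     # e.g. "city" (assumed value)
    position: str                        # spatial position


@dataclass(frozen=True)
class SpatioTemporalInfo:
    time: TimeInfo
    space: SpaceInfo

    def duration_days(self) -> int:
        """Whole days between the start time and the end time."""
        start, end = self.time.interval
        return (end - start).days


info = SpatioTemporalInfo(
    TimeInfo("day", (datetime(2020, 7, 1), datetime(2020, 7, 15))),
    SpaceInfo("city", "Guangzhou"),
)
```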
The semantic information can comprise semantic granularity and semantic content and is used for expressing the content of the public sentiment text, the semantic information of each public sentiment object can be divided into multiple semantic granularities, each semantic granularity can have multiple semantic records, and each semantic record has respective number and semantic content.
Specifically, the definition of semantic information may be as follows:
semantic information = {
(semantic granularity 1, ([number 1.1, semantic content], [number 1.2, semantic content], …)),
(semantic granularity 2, ([number 2.1, semantic content], [number 2.2, semantic content], …)),
(semantic granularity 3, ([number 3.1, semantic content], [number 3.2, semantic content], …)), … }.
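The numbered records per granularity could be kept in an ordered mapping, as in the Python sketch below. The helper `add_record`, the granularity names "sentence" and "keyword", and the sample contents are hypothetical; the numbering scheme follows the "granularity index.record index" pattern of the definition.

```python
from typing import Dict, List, Tuple

# semantic granularity -> list of (record number, semantic content)
SemanticInfo = Dict[str, List[Tuple[str, str]]]


def add_record(info: SemanticInfo, granularity: str, content: str) -> str:
    """Append a record and number it '<granularity index>.<record index>'."""
    records = info.setdefault(granularity, [])
    # Dicts keep insertion order, so the position gives the granularity index.
    g_index = list(info).index(granularity) + 1
    number = f"{g_index}.{len(records) + 1}"
    records.append((number, content))
    return number


semantic: SemanticInfo = {}
add_record(semantic, "sentence", "Residents object to the route.")
add_record(semantic, "sentence", "The notice period was too short.")
last = add_record(semantic, "keyword", "route")
```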
The emotion information may comprise an emotion subject, an emotion object, an emotion category and an emotion intensity, and is used to express the emotional content of public opinion. One public opinion object can have multiple emotion records, and each record expresses its emotion content structurally as a quadruple: emotion subject, emotion object, emotion category and emotion intensity. The "emotion subject" is the party expressing the emotion, usually an individual netizen or a network media platform. The "emotion object" is what the emotion is directed at, such as an attribute of a commodity or the content of a rumor. The "emotion category" is the kind of emotion, including happiness, sadness, anger, approval, disapproval, suspicion and the like. The "emotion intensity" is a score of how strong the emotion is; it can be expressed numerically and quantified by means such as emotion classification. Based on the quadruple structure, public opinion views and emotions can be mined, and the changes in views and emotions as the public opinion situation develops can then be analyzed, providing data support for subsequent public opinion object analysis.
Specifically, the definition of emotion information may be as follows:
emotion information = {[emotion record 1, emotion content], [emotion record 2, emotion content], …, [emotion record i, emotion content], …} (i ≥ 1);
emotion content = {emotion subject, emotion object, emotion category, emotion intensity}.
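A minimal sketch of the quadruple as a record type follows. The field names, the 0 to 1 intensity scale, the sample values and the helper `strongest` are assumptions made for illustration, not part of the patent.

```python
from typing import NamedTuple


class EmotionContent(NamedTuple):
    subject: str      # emotion subject: who expresses the emotion
    target: str       # emotion object: what the emotion is aimed at
    category: str     # emotion category, e.g. "anger" or "approval"
    intensity: float  # emotion intensity (assumed here to lie in 0..1)


# Emotion information: numbered emotion records of one public opinion object.
emotion_info = {
    1: EmotionContent("netizen_A", "product battery life", "anger", 0.8),
    2: EmotionContent("media_platform_B", "recall announcement", "approval", 0.4),
}


def strongest(records: dict[int, EmotionContent]) -> EmotionContent:
    """Return the emotion record with the highest intensity."""
    return max(records.values(), key=lambda r: r.intensity)
```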
The emotion information can be generated from the public opinion information in the following way: extracting the emotion subject and the emotion object from the public opinion information based on a named entity recognition method; identifying emotion vocabulary from the context of the emotion object based on an association rule mining method, and determining the emotion category according to the emotion vocabulary; and judging the emotion intensity according to the emotion vocabulary based on an emotion tendency judgment method.
The emotion tendency judgment method can judge the emotion intensity by calculating the co-occurrence frequency between the emotion vocabulary and the emotion reference words corresponding to the determined emotion category, and can also express the emotion intensity by judging the emotion polarity of the emotion vocabulary based on word frequency statistics. Specifically, the emotion polarity can be judged as follows: if the number of positive emotion words is greater than the number of negative emotion words, the emotion polarity is positive; if the number of positive emotion words equals the number of negative emotion words, the emotion polarity is neutral; if the number of positive emotion words is less than the number of negative emotion words, the emotion polarity is negative.
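The word-count polarity rule can be sketched as follows. The tiny word sets are placeholders standing in for a real emotion dictionary, and the function name `emotion_polarity` is hypothetical.

```python
def emotion_polarity(words, positive_words, negative_words):
    """Judge polarity by comparing counts of positive and negative words."""
    pos = sum(1 for w in words if w in positive_words)
    neg = sum(1 for w in words if w in negative_words)
    if pos > neg:
        return "positive"
    if pos < neg:
        return "negative"
    return "neutral"


# Placeholder lexicons for illustration only.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "angry", "terrible"}

print(emotion_polarity(["the", "service", "was", "great"], POSITIVE, NEGATIVE))
```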
The emotion vocabulary may include emotion words, negative words, degree adverbs, and symbolic expressions.
The emotion words can be taken from an emotion dictionary, for example HowNet, which contains 4566 positive emotion words and 4370 negative emotion words, or ANTUSD, which contains 2810 positive emotion words and 8276 negative emotion words, and the like.
The negative words (negation words) are words that reverse the emotional polarity of a text: they can turn a positive emotion into a negative one or a negative emotion into a positive one. Besides simple negation there are also double negation and multiple negation, so identifying the negative words is critical to the subsequent determination of the emotion category and the judgment of the emotion intensity.
The degree adverb is mainly used for modifying a corresponding adjective or adverb in the text, and usually appears in front of the adjective or adverb, and the degree adverb can weaken or strengthen the emotional intensity of the emotional word in the emotional analysis, so that the degree adverb is additionally considered in the judgment of the emotional intensity, and the judgment accuracy of the emotional intensity can be improved.
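One common way to combine emotion words, negative words and degree adverbs is a single left-to-right pass in which modifiers act on the next emotion word. The sketch below follows that idea; the lexicon entries, the multipliers and the function name `intensity` are illustrative assumptions, and the patent does not prescribe this particular scoring rule.

```python
# Placeholder lexicons; all values are illustrative assumptions.
NEGATIONS = {"not", "never", "no"}
DEGREE = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
LEXICON = {"good": 1.0, "happy": 1.0, "bad": -1.0, "angry": -1.0}


def intensity(tokens):
    """Sum signed word scores; pending modifiers apply to the next emotion word."""
    score = 0.0
    sign = 1.0   # flipped by each negation, so a double negation cancels out
    scale = 1.0  # product of the degree adverbs seen since the last emotion word
    for tok in tokens:
        if tok in NEGATIONS:
            sign = -sign
        elif tok in DEGREE:
            scale *= DEGREE[tok]
        elif tok in LEXICON:
            score += sign * scale * LEXICON[tok]
            sign, scale = 1.0, 1.0  # modifiers are consumed by this word
    return score
```

Because the sign is flipped once per negation, an even number of negations restores the original polarity, which covers the double and multiple negation cases mentioned above.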
People often use emoticons to convey emotion on social platforms; emoticons can add humor to a text and can also remove ambiguity from it. Emoticons are typically used in the following cases: (1) when words alone do not express the emotion well, e.g. in "how can there be such a person [anger][anger]" the emoticons convey the user's anger well; (2) to disambiguate text, e.g. "life is really interesting [tears]": analyzed as text alone, the polarity would be positive, but it is clearly negative, and the emoticon [tears] conveys the emotion category correctly; (3) to strengthen the emotion of the text, e.g. in "this movie is so good [too happy]" the emoticon [too happy] strengthens the emotion of the text.
The relationship information may include a relationship category and other public sentiment objects having a relationship with the public sentiment object, wherein the relationship category may be an association relationship, an aggregation relationship, a dependency relationship, and the like. The relationship information is used for expressing the relationship between the public sentiment objects.
Specifically, the definition of the relationship information may be as follows:
relationship information = {
(relationship type 1, ([number 1.1, object ID], [number 1.2, object ID], …)),
(relationship type 2, ([number 2.1, object ID], [number 2.2, object ID], …)),
(relationship type 3, ([number 3.1, object ID], [number 3.2, object ID], …)), … }.
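For illustration, the relationship information could be stored as lists of related object IDs grouped by relationship type. The class name `RelationInfo`, its methods and the sample object IDs are hypothetical.

```python
from collections import defaultdict


class RelationInfo:
    """Related object IDs of one public opinion object, grouped by type."""

    def __init__(self):
        self._by_type = defaultdict(list)

    def add(self, relation_type: str, object_id: str) -> None:
        """Record that object_id is related through relation_type."""
        self._by_type[relation_type].append(object_id)

    def related(self, relation_type: str) -> list:
        """Return a copy of the IDs related through relation_type."""
        return list(self._by_type.get(relation_type, []))


# Hypothetical example data.
relations = RelationInfo()
relations.add("association", "OBJ-002")
relations.add("association", "OBJ-007")
relations.add("dependency", "OBJ-003")
```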
In step S4, the public opinion case library may store historical public opinion cases in the form of event objects, and an event object may include an event identification, a start time, an end time, an event topic, event keywords, and an event profile. Object-oriented techniques are introduced to model each historical public opinion control case as an object that serves as an independent knowledge unit in the public opinion case library, so that more complex knowledge identification and knowledge reasoning can be carried out.
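A minimal sketch of such an event object is given below, assuming Python dataclasses. The field names follow the six listed items, while the class name `EventObject`, the `duration` helper and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EventObject:
    event_id: str              # event identification
    start_time: datetime       # start time
    end_time: datetime         # end time
    topic: str                 # event topic
    keywords: tuple[str, ...]  # event keywords
    profile: str               # event profile (short summary)

    def duration(self) -> timedelta:
        """Length of the event, derived from its start and end times."""
        return self.end_time - self.start_time


# Hypothetical historical case.
case = EventObject(
    event_id="E001",
    start_time=datetime(2020, 1, 1, 8, 0),
    end_time=datetime(2020, 1, 3, 8, 0),
    topic="product recall",
    keywords=("recall", "refund"),
    profile="A recall announcement triggered refund requests online.",
)
```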
When similarity matching is performed between the public opinion object and the public opinion cases in the public opinion case library, the analysis can focus on the five types of attributes under the multi-granularity public opinion spatio-temporal object description attribute framework of the public opinion object. The characteristics of each attribute are fully considered, different attributes are processed differently, and the similarity between attributes is mined more thoroughly, so that the most similar public opinion case is matched more scientifically and reasonably, and the public opinion control scheme obtained in step S5 by analyzing the most similar public opinion case is more practical.
Case structures differ between public opinion cases, the information of a public opinion case may be incomplete, and the description of a case's control scheme may also be incomplete. Analyzing the structural similarity between the public opinion object and the public opinion case handles this problem of missing attributes well. Therefore, as shown in fig. 2, step S4 may specifically include:
S41, performing structural similarity matching between the attribute structure of the public opinion object and the attribute structures of the public opinion cases in the public opinion case library to obtain the event type to which the public opinion object belongs;
S42, performing attribute similarity matching between the attribute values of the public opinion object and the attribute values of the public opinion cases belonging to that event type in the public opinion case library to obtain the most similar public opinion case.
The matching accuracy can be greatly improved by performing the structure similarity matching with respect to the attribute structure in step S41 to match the event type to which the public opinion object belongs, and performing the attribute similarity matching with respect to the attribute value according to the matched event type in step S42.
Specifically, according to the diversity of the attribute values of public opinion objects and the characteristics of public opinion cases, the following three types of attribute similarity matching can be applied flexibly:
(1) Numeric attribute similarity matching: numeric attributes are usually expressed by a definite number, either continuous or discrete. The similarity of two numeric values is calculated with a distance measure such as the Hamming distance or the Euclidean distance;
(2) Symbol attribute similarity matching: symbol attributes are usually expressed by an explicit symbol, such as the case publication time or the event location. There is no quantitative relationship between symbol attribute values, only the relationships "same (or contained)" and "different", so the similarity of two symbols can be judged directly by checking whether they are the same;
(3) Fuzzy attribute similarity matching: fuzzy attributes include fuzzy semantic attributes, fuzzy number attributes, fuzzy interval attributes and the like. The similarity of two fuzzy attributes is calculated through a membership function such as a trapezoidal, triangular or Gaussian function.
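The three kinds of attribute matching can be sketched as below. The mapping from Euclidean distance to a similarity `1 / (1 + d)` and the use of `1 - |μ(x) - μ(y)|` as a fuzzy similarity are assumptions chosen for illustration; the text only names the distance measures and membership functions, not how they are turned into a score.

```python
import math


def numeric_similarity(a, b):
    """Map the Euclidean distance of two numeric vectors into (0, 1]."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)  # assumed mapping from distance to similarity


def symbol_similarity(a, b):
    """Symbols have no magnitude: they are either the same or different."""
    return 1.0 if a == b else 0.0


def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_similarity(x, y, left, peak, right):
    """Compare two values by their degree of membership in one fuzzy set."""
    mu_x = triangular(x, left, peak, right)
    mu_y = triangular(y, left, peak, right)
    return 1.0 - abs(mu_x - mu_y)  # assumed comparison rule
```

For example, `numeric_similarity([0, 0], [3, 4])` uses a distance of 5 and therefore yields 1/6, while `symbol_similarity` returns 1.0 only for identical symbols.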
If the most similar public opinion case is selected with similarity as the only criterion, the result lacks reliability, so confidence analysis can be combined with similarity matching. Step S4 may further include:
S43, presetting confidence indexes and establishing a confidence decision tree;
S44, analyzing, according to the confidence decision tree, whether the attributes of the public opinion object (such as spatio-temporal information, semantic information, emotion information and the like) are credible.
By adopting the above-mentioned ways of structure similarity matching and attribute similarity matching, although more accurate similarity can be obtained, the time cost required is larger, and when the public opinion case base is continuously increased, the time required is increased in the same proportion. Therefore, similarity matching can be carried out by adopting a Bayesian probability model, and the time cost of matching can be reduced.
In step S41, let a public opinion object be X and the i-th event type in the public opinion case library be C_i, where event type C_i consists of a plurality of public opinion cases;
the structural similarity S(X, C_i) between the public opinion object X and the event type C_i is calculated according to the following formula:
Figure BDA0002609940940000101
where γ is an empirical factor, q is the number of attributes of the public opinion object X, q_s is the number of attributes common to the public opinion object X and the event type C_i, and k is the number of attributes of the event type C_i.
According to the structural similarity S(X, C_x), the event type to which the public opinion object X belongs can be judged to be the x-th event type C_x.
Specifically, a threshold τ may be preset; when the calculated maximum structural similarity S(X, C_{x-max}) is greater than the preset threshold τ, the event type to which the public opinion object X belongs can be considered to be the (x-max)-th event type C_{x-max}.
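The selection of an event type by maximum structural similarity and threshold τ can be sketched as follows. The patent's similarity formula is contained in an image that is not reproduced in this text, so the weighted overlap used in `structural_similarity` is only a stand-in built from the same quantities (γ, q, q_s, k) and should not be read as the patent's formula. All function names and the sample attribute sets are hypothetical.

```python
def structural_similarity(x_attrs: set, type_attrs: set, gamma: float = 0.5) -> float:
    """Stand-in similarity: weighted share of common attributes on each side."""
    q = len(x_attrs)                 # attributes of the public opinion object X
    k = len(type_attrs)              # attributes of the event type C_i
    q_s = len(x_attrs & type_attrs)  # attributes common to both
    if q == 0 or k == 0:
        return 0.0
    return gamma * q_s / q + (1 - gamma) * q_s / k


def match_event_type(x_attrs, type_structures, tau, gamma=0.5):
    """Return the best-scoring event type, or None if it does not exceed tau."""
    best_type, best_score = None, 0.0
    for name, attrs in type_structures.items():
        score = structural_similarity(x_attrs, attrs, gamma)
        if score > best_score:
            best_type, best_score = name, score
    return best_type if best_score > tau else None


# Hypothetical attribute structures.
x = {"time", "place", "sentiment"}
types = {
    "product_safety": {"time", "place", "sentiment", "product"},
    "rumor": {"time", "source"},
}
```

Returning `None` when no score exceeds τ leaves it to the caller to decide what to do with an object that fits no known event type.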
In step S42, let a public opinion object be X, the event type to which the public opinion object X belongs be the x-th event type C_x, and a certain public opinion case under event type C_x in the public opinion case library be t_xj.
Because the attributes are independent of one another, i.e. there are no dependencies among the conditional attributes, the conditional probability can be calculated:
Figure BDA0002609940940000111
Figure BDA0002609940940000112
Then:
Figure BDA0002609940940000113
Figure BDA0002609940940000114
In summary, the conditional probability can be calculated as follows:
Figure BDA0002609940940000115
Figure BDA0002609940940000116
Figure BDA0002609940940000117
where n_xj is the number of public opinion cases in the public opinion case library that belong to event type C_x and match the attributes of the public opinion object X, n is the total number of public opinion cases in the public opinion case library, X_y is an attribute of the public opinion object X, n_xj(X_y) is the number of public opinion cases under event type C_x that have the attribute X_y, and ω_xy is the attribute weight for event type C_x.
The most similar public opinion case is determined according to the conditional probability p(X | t_xj) p(t_xj).
Specifically, the one or several public opinion cases with the largest conditional probability may be taken as the most similar public opinion case(s) of the public opinion object X. The public opinion case corresponding to the maximum conditional probability can be calculated according to the following formula:
Figure BDA0002609940940000118
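The idea of choosing the case that maximizes p(X | t_xj) p(t_xj) under attribute independence can be sketched as a weighted naive Bayes score computed in log space. The patent's exact formulas are in images not reproduced in this text; the add-one smoothing, the use of the attribute weights as exponents, the input layout and all names below are assumptions made for illustration.

```python
import math


def log_score(candidate, x_attrs, n_total, weights):
    """log p(t) plus the weighted sum of log p(X_y | t), assuming independence."""
    n_case = candidate["n_case"]         # supporting cases, playing the role of n_xj
    score = math.log(n_case / n_total)   # log prior, with n_total in the role of n
    for attr in x_attrs:
        count = candidate["attr_counts"].get(attr, 0)  # role of n_xj(X_y)
        p = (count + 1) / (n_case + 2)   # add-one smoothing (assumption)
        score += weights.get(attr, 1.0) * math.log(p)  # weight in the role of omega_xy
    return score


def most_similar(candidates, x_attrs, n_total, weights):
    """Return the name of the candidate with the largest score."""
    return max(
        candidates,
        key=lambda name: log_score(candidates[name], x_attrs, n_total, weights),
    )


# Hypothetical counts for two candidate cases.
candidates = {
    "t1": {"n_case": 10, "attr_counts": {"time": 9, "place": 8}},
    "t2": {"n_case": 10, "attr_counts": {"time": 2, "place": 1}},
}
weights = {"time": 1.0, "place": 2.0}
```

Working with logarithms avoids numerical underflow when many attribute probabilities are multiplied, and it does not change which candidate attains the maximum.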
Step S5 may specifically be: deriving a solution from the most similar public opinion case according to the actual situation, so as to obtain a public opinion control scheme for the current public opinion event.
Step S5 may also specifically be: analyzing the differences and commonalities between the current public opinion event and the most similar public opinion case, and adjusting the handling strategy of the most similar public opinion case to obtain a public opinion control scheme for the current public opinion event.
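Finding the commonalities and differences between the current event and the most similar case can be sketched as a comparison of two attribute dictionaries. The function name `compare` and the sample attributes are hypothetical, and how the handling strategy is adjusted from the differences is left to the analyst.

```python
def compare(current: dict, case: dict):
    """Split the shared attributes into equal ones and differing ones."""
    shared = current.keys() & case.keys()
    same = {k: current[k] for k in shared if current[k] == case[k]}
    different = {k: (current[k], case[k]) for k in shared if current[k] != case[k]}
    return same, different


# Hypothetical attribute values.
current_event = {"topic": "recall", "platform": "weibo", "region": "Guangzhou"}
similar_case = {"topic": "recall", "platform": "forum"}

same, different = compare(current_event, similar_case)
```

Attributes listed in `different` mark the parts of the historical handling strategy that most likely need adjusting before reuse.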
Through the steps S1 to S5, the collected public opinion data can be fully mined, data support is provided for subsequent similarity matching with public opinion cases and public opinion control scheme analysis, and scientific decision reference is provided for supervision of network public opinions.
It should be understood that the above-mentioned embodiments of the present invention are only examples for clearly illustrating the technical solutions of the present invention, and are not intended to limit the specific embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention claims should be included in the protection scope of the present invention claims.

Claims (10)

1. A public opinion analysis method is characterized by comprising the following steps:
public opinion data is collected;
preprocessing the public opinion data to obtain structured public opinion information;
generating a public sentiment object according to the public sentiment information, wherein the public sentiment object comprises an object identifier, an object category, time-space information, semantic information, emotional information and relationship information;
carrying out similarity matching on the public opinion object and public opinion cases in a public opinion case library to obtain the most similar public opinion case;
and obtaining a public opinion control scheme according to the most similar public opinion case analysis.
2. The public opinion analysis method as claimed in claim 1, wherein the similarity analysis of the public opinion objects and the public opinion cases in the public opinion case library to obtain the most similar public opinion case comprises:
carrying out structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion case in a public opinion case library to obtain the event type of the public opinion object;
and carrying out attribute similarity matching on the attributes of the public opinion objects and the attribute values of the public opinion cases belonging to the event types in a public opinion case library to obtain the most similar public opinion cases.
3. The public opinion analysis method according to claim 2, wherein performing structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion case in a public opinion case library to obtain the event type to which the public opinion object belongs comprises:
letting a certain public opinion object be X and the i-th event type in the public opinion case library be C_i, where event type C_i consists of a plurality of public opinion cases;
calculating the structural similarity S(X, C_i) between the public opinion object X and the event type C_i according to the following formula:
Figure FDA0002609940930000011
where γ is an empirical factor, q is the number of attributes of the public opinion object X, q_s is the number of attributes common to the public opinion object X and the event type C_i, and k is the number of attributes of the event type C_i;
judging, according to the structural similarity S(X, C_x), that the event type to which the public opinion object X belongs is the x-th event type C_x.
4. The public opinion analysis method as claimed in claim 2, wherein performing attribute similarity matching between the attributes of the public opinion object and the attribute values of the public opinion cases belonging to the event type in a public opinion case library to obtain a most similar public opinion case comprises:
letting a public opinion object be X, the event type to which the public opinion object X belongs be the x-th event type C_x, and a certain public opinion case under event type C_x in the public opinion case library be t_xj;
calculating the conditional probability as follows:
Figure FDA0002609940930000021
Figure FDA0002609940930000022
Figure FDA0002609940930000023
where n_xj is the number of public opinion cases in the public opinion case library that belong to event type C_x and match the attributes of the public opinion object X, n is the total number of public opinion cases in the public opinion case library, X_y is an attribute of the public opinion object X, n_xj(X_y) is the number of public opinion cases under event type C_x that have the attribute X_y, and ω_xy is the attribute weight for event type C_x;
determining the most similar public opinion case according to the conditional probability p(X | t_xj) p(t_xj).
5. The public opinion analysis method according to claim 1, wherein the preprocessing of the public opinion data includes:
identifying and removing useless characters of the public opinion data, and/or segmenting the public opinion data and removing stop words of the public opinion data, and/or extracting keywords of the public opinion data based on a word frequency statistical method, and/or extracting entity names of the public opinion data based on a named entity identification method, and/or aggregating the public opinion data based on a subject clustering method, and/or extracting topics of the public opinion data based on a common word analysis method; and/or extracting emotion tendency texts in the public opinion data based on text mining technology.
6. The public opinion analysis method according to claim 1, wherein the object categories include a text category, a theme category, and a topic category; public opinion objects of the text category are generated according to the expressed text of the public opinion information, public opinion objects of the theme category are generated according to the expressed theme of the public opinion information, and public opinion objects of the topic category are generated according to the expressed topic of the public opinion information.
7. The public opinion analysis method according to claim 1, wherein the semantic information includes semantic granularity and semantic content.
8. A public opinion analysis method according to claim 1, wherein the emotion information includes emotion subject, emotion object, emotion classification and emotion intensity;
generating emotion information in a public opinion object according to the public opinion information, comprising:
extracting an emotional subject and an emotional object from the public sentiment information based on a named entity identification method;
identifying emotion vocabularies from the emotion object context based on an association rule mining method, and determining emotion types according to the emotion vocabularies;
and judging the emotional intensity according to the emotional vocabulary based on an emotional tendency judgment method.
9. The method for public opinion analysis according to claim 8, wherein the emotion vocabulary includes emotion words, negative words, degree adverbs and symbolic expressions.
10. The public opinion analysis method according to claim 1, wherein the relationship information includes a relationship category and other public opinion objects having a relationship with the public opinion objects, the relationship category includes an association relationship, an aggregation relationship and a dependency relationship.
CN202010750608.1A 2020-07-30 2020-07-30 Public opinion analysis method Active CN111914087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010750608.1A CN111914087B (en) 2020-07-30 2020-07-30 Public opinion analysis method

Publications (2)

Publication Number Publication Date
CN111914087A (en) 2020-11-10
CN111914087B (en) 2023-09-19

Family

ID=73286820

Country Status (1)

Country Link
CN (1) CN111914087B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Huang Tao, Gong Xun, He Ying, Liang Shaoyong, Wan Zhongping
Inventor before: Huang Tao, Cheng Pu, Gong Xun, He Ying, Liang Shaoyong, Wan Zhongping
GR01 Patent grant