CN111914087B - Public opinion analysis method - Google Patents

Publication number
CN111914087B
Authority
CN
China
Prior art keywords
public opinion
emotion
information
public
event type
Prior art date
Legal status
Active
Application number
CN202010750608.1A
Other languages
Chinese (zh)
Other versions
CN111914087A (en
Inventor
黄涛
龚勋
何莹
梁少勇
万忠平
Current Assignee
Guangzhou China Dci Co ltd
Original Assignee
Guangzhou China Dci Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou China Dci Co ltd
Priority to CN202010750608.1A
Publication of CN111914087A
Application granted
Publication of CN111914087B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a public opinion analysis method, which comprises the following steps: collecting public opinion data, wherein the public opinion data comprises project public opinion data, and public opinion under the project public opinion data, reports of media to the project public opinion data, and discussion of the project public opinion data; preprocessing public opinion data to obtain structured public opinion information; generating public opinion objects according to public opinion information, wherein the public opinion objects comprise object identifications, object categories, space-time information, semantic information, emotion information and relationship information; carrying out similarity matching on the public opinion objects and the public opinion cases in the public opinion case library to obtain the most similar public opinion cases; and obtaining a public opinion control scheme according to the most similar public opinion case analysis. The invention can reasonably mine the network public opinion data, rapidly process massive multi-source public opinion data, acquire valuable public opinion information, and provide data support for subsequent public opinion case similarity matching and public opinion control scheme analysis.

Description

Public opinion analysis method
Technical Field
The invention relates to the technical field of data mining and analysis, in particular to a public opinion knowledge base construction method.
Background
In the application and decision-making technology for social public opinion, academia has reached a consensus on the scenario-response emergency management mode. As the theory and technology have matured, this mode has been promoted to government and industry, so that a new scenario-response mode of emergency management is gradually being established. Scenario-response emergency management works as follows: emergencies in physical and social space are monitored comprehensively in real time and analyzed intelligently; valuable information is mined from massive, scattered, unstructured, real-time disaster data; a general description of the current situation is obtained through analysis; situation deduction and comprehensive research, judgment and decision-making are carried out; and relevant information is delivered in time to the people who need it most, so that decision makers can arrange appropriate on-site handling and emergency deployment.
The information processing and prediction and early-warning systems established by local governments are also part of social public opinion decision analysis, for example: monitoring and processing of emergency letter information; systems for water and rain information acquisition, communication, prediction and scheduling; monitoring and early-warning systems for sudden public health events; and dynamic monitoring, analysis and early warning of social security. However, these technical systems are imperfect: the thinking of some management departments has not kept pace with the demands of the situation; some management mechanisms are unsound, with overlapping functions and a low degree of information resource sharing; the information networks reflect developments with a lag; and the public opinion evaluation mechanism cannot produce a comprehensive, systematic and objective evaluation.
With the rapid development and popularization of the Internet, people have become used to publishing their ideas or comments on social hotspots, public affairs and the like through the network, and various forms of network media have emerged, such as public accounts and microblogs. When social events and social problems occur, people often quickly learn the cause and development of the events through network media platforms and then post comments through the network media; these comments have a non-negligible influence on the development of the events, and network public opinion is thereby generated. Because network propagation is rapid, wide-reaching and strongly interactive, network public opinion tends to be explosive and complex in form. Network public opinion big data therefore needs to be mined reasonably, so as to provide a powerful means for effective real-time supervision of network public opinion.
Disclosure of Invention
The present invention is directed to overcoming at least one of the above-mentioned drawbacks (shortcomings) of the prior art, and providing a public opinion analysis method for solving the problem of lack of a reasonable network public opinion data mining method.
The technical scheme adopted by the invention is as follows:
a public opinion analysis method comprising: collecting public opinion data; preprocessing public opinion data to obtain structured public opinion information; generating public opinion objects according to public opinion information, wherein the public opinion objects comprise object identifications, object categories, space-time information, semantic information, emotion information and relationship information; carrying out similarity matching on the public opinion objects and the public opinion cases in the public opinion case library to obtain the most similar public opinion cases; and obtaining a public opinion control scheme according to the most similar public opinion case analysis.
Under the multi-granularity public opinion space-time object description attribute framework, the structured public opinion information is generated into the public opinion object comprising the object identification and five types of attributes, and the time characteristics, the space characteristics, the evolution and the propagation characteristics of public opinion entities can be subjected to abstract description so as to rapidly process mass multi-source public opinion data, mine valuable public opinion information, provide data support for subsequent similarity matching with public opinion cases and public opinion control scheme analysis, and provide scientific decision reference for supervision of network public opinion.
Further, performing similarity analysis on the public opinion object and the public opinion cases in the public opinion case library to obtain the most similar public opinion cases, including: carrying out structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion case in the public opinion case library to obtain an event type to which the public opinion object belongs; and performing attribute similarity matching on the attribute of the public opinion object and the attribute value of the public opinion case belonging to the event type in the public opinion case library to obtain the most similar public opinion case.
The case structure of different public opinion cases has different characteristics, and the public opinion case information may be incomplete, and the information description of the case control scheme may be incomplete. And analyzing the structural similarity between the public opinion object and the public opinion case can well solve the problem of attribute deficiency. Firstly, carrying out structural similarity matching on attribute structures, matching out event types to which public opinion objects belong, and then carrying out attribute similarity matching on attribute values according to the matched event types, so that the matching accuracy can be greatly improved.
Further, performing structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion case in the public opinion case library to obtain an event type to which the public opinion object belongs, including:
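denoting a certain public opinion object as $O$ and the $i$-th event type in the public opinion case library as event type $T_i$, where event type $T_i$ consists of a plurality of public opinion cases;
calculating the structural similarity $S(O, T_i)$ between the public opinion object $O$ and the event type $T_i$ according to the following formula:
[formula not preserved in this text]
where $\alpha$ is an empirical factor, $n_O$ is the number of attributes of the public opinion object $O$, $n_{O,T_i}$ is the number of attributes that the public opinion object $O$ and the event type $T_i$ have in common, and $n_{T_i}$ is the number of attributes of the event type $T_i$;
judging, according to the structural similarity $S(O, T_i)$, that the event type to which the public opinion object $O$ belongs is the $j$-th event type $T_j$.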
Further, performing attribute similarity matching on the attribute of the public opinion object and the attribute value of the public opinion case belonging to the event type in the public opinion case library to obtain the most similar public opinion case, including:
denoting a certain public opinion object as $O$, the event type to which the public opinion object $O$ belongs as the $j$-th event type $T_j$, and a public opinion case in the public opinion case library belonging to event type $T_j$ as $C$;
calculating the conditional probability according to the following formula:
[formula not preserved in this text]
where $N_{T_j}$ is the number of public opinion cases in the public opinion case library that belong to event type $T_j$ and whose attributes match those of $O$, $N$ is the total number of public opinion cases in the public opinion case library, $a_k$ is an attribute of the public opinion object $O$, $N_{T_j}(a_k)$ is the number of public opinion cases of event type $T_j$ that have the attribute $a_k$, and $w_k$ is the attribute weight for event type $T_j$;
judging and obtaining the most similar public opinion case according to the conditional probability.
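The two-stage matching described above (event type first, then the closest case within that type) can be sketched as follows. Because the formulas are not reproduced in this text, the sketch substitutes two stand-ins that are assumptions, not the claimed formulas: a Jaccard-style overlap of attribute names scaled by an empirical factor `alpha`, and a weighted share of equal attribute values. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

Case = Dict[str, str]  # attribute name -> attribute value


def structural_similarity(obj_attrs: set, type_attrs: set, alpha: float = 1.0) -> float:
    """Stand-in measure (assumption): alpha * |shared names| / |union of names|."""
    union = obj_attrs | type_attrs
    if not union:
        return 0.0
    return alpha * len(obj_attrs & type_attrs) / len(union)


def match_event_type(obj: Case, case_base: Dict[str, List[Case]]) -> str:
    """Stage 1: pick the event type whose attribute structure best matches the object."""
    def type_attrs(cases: List[Case]) -> set:
        # An event type's structure is taken as the union of its cases' attribute names.
        return set().union(*(c.keys() for c in cases))
    return max(case_base, key=lambda t: structural_similarity(set(obj), type_attrs(case_base[t])))


def attribute_score(obj: Case, case: Case, weights: Dict[str, float]) -> float:
    """Stand-in score (assumption): weighted share of attributes with equal values."""
    total = sum(weights.get(a, 1.0) for a in obj)
    hit = sum(weights.get(a, 1.0) for a, v in obj.items() if case.get(a) == v)
    return hit / total if total else 0.0


def most_similar_case(obj: Case, case_base: Dict[str, List[Case]],
                      weights: Dict[str, float] = None) -> Tuple[str, Case]:
    """Stage 2: within the matched event type, return the best-scoring case."""
    weights = weights or {}
    event_type = match_event_type(obj, case_base)
    best = max(case_base[event_type], key=lambda c: attribute_score(obj, c, weights))
    return event_type, best
```

Matching on attribute names before attribute values means a case with missing values can still steer the object toward the right event type, which is the motivation given above for the structural step.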
Further, preprocessing public opinion data includes:
identifying and removing useless characters from the public opinion data; and/or segmenting the public opinion data into words and removing stop words; and/or extracting keywords of the public opinion data based on a word frequency statistics method; and/or extracting entity names from the public opinion data based on a named entity recognition method; and/or clustering the public opinion data based on a topic clustering method; and/or performing topic extraction on the public opinion data based on a co-word analysis method; and/or extracting text with emotional tendency from the public opinion data based on a text mining technology.
Removing useless characters and stop words while extracting keywords, entity names, public opinion topics and text with emotional tendency allows the public opinion data to be preliminarily classified, forming structured public opinion information.
Further, the object category comprises a text class, a topic class and a theme class, wherein a public opinion object of the text class is generated from the text expressed by the public opinion information, a public opinion object of the topic class is generated from the topic expressed by the public opinion information, and a public opinion object of the theme class is generated from the theme expressed by the public opinion information.
The object categories represent the types of the public opinion data processing and analysis objects, and the public opinion information expression models related to different object categories are different. According to the order of gradually increasing semantic abstraction degree, classification of the public opinion objects is divided into three categories of text, topic and theme, so that further mining of public opinion information is facilitated.
Further, the semantic information includes semantic granularity and semantic content.
The semantic information of each public opinion object can be divided into a plurality of semantic granularities, each semantic granularity can have a plurality of semantic records, and each semantic record has a respective number and semantic content.
Further, the emotion information comprises emotion subjects, emotion objects, emotion categories and emotion intensities;
generating emotion information in the public opinion object according to the public opinion information comprises the following steps:
extracting emotion subjects and emotion objects from public opinion information based on a named entity recognition method;
identifying emotion vocabulary from the context of the emotion objects based on an association rule mining method, and determining emotion categories according to the emotion vocabulary;
and judging the emotion intensity according to the emotion vocabulary based on an emotion tendency judging method.
The quadruple structure of the emotion information, namely emotion subject, emotion object, emotion category and emotion intensity, makes it possible to mine public opinions and emotions, to analyze how opinions and emotions change as the public opinion situation develops, and to provide data support for the subsequent analysis of public opinion objects.
Further, the emotion vocabulary includes emotion words, negation words, degree adverbs and emoticons.
Emotion words, negation words, degree adverbs and emoticons all matter for distinguishing the emotion categories and emotion intensities in the emotion information.
Further, the relationship information includes a relationship category and other public opinion objects having a relationship with the public opinion object, and the relationship category includes an association relationship, an aggregation relationship and a dependency relationship.
The relationship information can express the relationship among the public opinion objects, and the relationship among the public opinion objects can be easily combed by describing the category of the relationship information through the association relationship, the aggregation relationship and the dependency relationship.
Compared with the prior art, the invention has the beneficial effects that: by constructing a multi-granularity public opinion space-time object description attribute frame, public opinion information is generated under the attribute frame to form a public opinion object, and the time characteristics, the space characteristics, the evolution and the propagation characteristics of public opinion entities can be subjected to abstract description so as to rapidly process mass multi-source public opinion data and obtain valuable public opinion information, thereby providing data support for subsequent public opinion case similarity matching and public opinion control scheme analysis and timely and correctly making decisions of public opinion supervision control.
Drawings
FIG. 1 is a flowchart of a public opinion analysis method according to an embodiment of the present invention.
Fig. 2 is a flow chart of matching public opinion objects with public opinion case similarity in a public opinion case base according to an embodiment of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the invention. For better illustration of the following embodiments, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
As shown in fig. 1, the embodiment provides a public opinion analysis method, which includes:
S1, collecting public opinion data, wherein the public opinion data can comprise public opinion under project public opinion data, reports of media to the project public opinion data and discussion of the project public opinion data;
S2, preprocessing public opinion data to obtain structured public opinion information;
S3, generating a public opinion object according to the public opinion information, wherein the public opinion object comprises an object identification, an object category, space-time information, semantic information, emotion information and relation information;
S4, carrying out similarity matching on the public opinion object and the public opinion cases in the public opinion case library to obtain the most similar public opinion cases;
S5, obtaining a public opinion control scheme according to the most similar public opinion case analysis.
Public opinion is the collection of individual opinions that people express as social events and problems arise, develop and change; it comprises the expression, propagation, interaction and influence of individual emotions and opinions. The emergence and development of each social event or problem has its own life cycle. In the different time phases of that life cycle, individual opinions change continuously and exhibit a specific spatial range and propagation pattern; this is called a public opinion entity. Using multi-source heterogeneous public opinion data as the data source to obtain the content of public opinion events and the public's emotions and views, and analyzing the spatio-temporal patterns of public opinion propagation and diffusion, can provide decision support for public opinion guidance.
In step S1, the collected project publicity data may include the project case number, the construction project name, the construction project location, and the like. The public can generally submit feedback comments under the project publicity, thereby forming public opinion fed back on the project publicity; the collected public opinion may include the feedback comments, the contact addresses of the persons giving feedback, and the like. The collected media reports on the project publicity may include article titles, publication times, names of the publishing media, article content, and the like. The collected public discussion of the project publicity data may include forum sub-titles, posting times, posters, commenters, posted content, and the like.
For the network media platforms on which network public opinion appears, such as microblogs, news websites and forums, network public opinion data can be collected using the API interfaces provided by social media together with HTML parsing. A high-performance web crawler strategy, such as a multithreaded web crawler that crawls data in parallel on multiple machines, is designed to improve data capture efficiency and to achieve real-time acquisition and automatic updating of network public opinion data.
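As a minimal sketch of the multithreaded collection idea, the snippet below fetches a list of pages concurrently with a thread pool. The function names `fetch_html` and `crawl`, the worker count and the error-handling policy are illustrative assumptions; the text does not specify a concrete crawler design, and distribution across multiple machines is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it as UTF-8 (undecodable bytes are replaced)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls: Iterable[str],
          fetch: Callable[[str], str] = fetch_html,
          max_workers: int = 8) -> Dict[str, str]:
    """Fetch all URLs concurrently; pages whose fetch raises are skipped."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception:
                continue  # one failed page should not stop the whole crawl
    return results
```

Passing `fetch` as a parameter keeps the concurrency logic separate from the download logic, so an API client or an HTML parser can be swapped in without changing `crawl`.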
The public opinion data not only comprises text data in network media, but also comprises heterogeneous social network data such as text forwarding quantity and forwarding relation, and has the characteristics of large data quantity, short timeliness, abundant sources, complex form, unstructured and the like. Therefore, in step S2, the public opinion data collected in step S1 is preprocessed to a certain extent, so that the public opinion data is converted into structured public opinion information.
In step S2, preprocessing public opinion data may specifically include: identifying and removing useless characters from the public opinion data; and/or segmenting the public opinion data into words and removing stop words; and/or extracting keywords of the public opinion data based on a word frequency statistics method; and/or extracting entity names from the public opinion data based on a named entity recognition method; and/or clustering the public opinion data based on a topic clustering method; and/or performing topic extraction on the public opinion data based on a co-word analysis method; and/or extracting text with emotional tendency from the public opinion data based on a text mining technology.
The useless characters are punctuation marks or emoticons that express no emotion, such as commas and periods. The stop words are words that contribute nothing to information extraction, such as "on".
The word frequency statistics method is a weighting technique commonly used in information retrieval and text mining. It evaluates how important a word is to a document in a document collection or corpus; the importance of a word increases in proportion to the number of times it appears in the document. Words with higher word frequency can therefore be extracted as keywords of the public opinion data.
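The character removal, stop-word removal and word-frequency keyword steps can be sketched as below. This is a minimal illustration that splits on whitespace, which suits space-delimited text only; Chinese text would need a word segmenter first. The stop-word list and the names `preprocess` and `top_keywords` are assumptions made for the example.

```python
import re
from collections import Counter
from typing import List

STOP_WORDS = {"the", "a", "of", "on", "is", "to", "and"}  # illustrative stop-word list


def preprocess(text: str) -> List[str]:
    """Replace punctuation with spaces, split on whitespace, drop stop words."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def top_keywords(text: str, k: int = 3) -> List[str]:
    """Return the k most frequent tokens that remain after preprocessing."""
    return [word for word, _ in Counter(preprocess(text)).most_common(k)]
```

Raw term frequency is used here for brevity; weighting by inverse document frequency across a corpus would down-rank words that are common everywhere.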
Named entity recognition (NER) refers to recognizing special objects in text whose semantic categories are usually predefined before recognition, such as persons, addresses and organizations. The named entity recognition method can extract entity categories such as place names, organization names, times, person names and events from the public opinion data.
The topic clustering method may adopt a k-means clustering algorithm, an agglomerative hierarchical clustering algorithm, a neural network clustering algorithm, and the like. The topic clustering method can be used to group together public opinion data with similar topics.
The co-word analysis method counts, for each pair of words in a group, the number of times the two words appear in the same document, and performs cluster analysis on the words based on these counts. This reflects how closely or loosely the words are related, from which the structure and change of the subject represented by the words can be analyzed. Co-word analysis can be applied to either the subject words or the keywords of a document. Extracting public opinion topics based on the co-word analysis method supports topic association analysis and hot spot detection.
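The pairwise counting step of co-word analysis can be illustrated with a short sketch. Each document is assumed to be already reduced to a list of keywords; the name `coword_counts` is hypothetical, and the subsequent clustering of words is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def coword_counts(docs: Iterable[List[str]]) -> Counter:
    """Count, for every unordered word pair, the number of documents containing both."""
    counts: Counter = Counter()
    for keywords in docs:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears under the same key.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts
```

The resulting counts form a symmetric co-occurrence matrix that a clustering algorithm can take as its similarity input.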
Text mining refers to computer processing techniques that extract valuable information and knowledge from text data. Text with emotional tendency in the public opinion data is extracted by text mining techniques.
Removing useless characters and stop words while extracting keywords, entity names, public opinion topics and text with emotional tendency allows the public opinion data to be preliminarily classified, forming structured public opinion information.
In step S3, in order to design a structural model suitable for parallel processing of public opinion big data, and meanwhile, in consideration of the relevance characteristics of public opinion information, a multi-granularity public opinion space-time object description attribute frame is constructed, the structured public opinion information is generated into public opinion objects, and the time characteristics, the space characteristics, the evolution and the propagation characteristics of public opinion entities can be abstract described, so that massive multi-source public opinion data can be rapidly processed, and valuable public opinion information can be obtained.
Specifically, the definition of the public opinion object may be as follows: public opinion object= { object unique Identification (ID), object category, spatiotemporal information, semantic information, affective information, relationship information }. The public opinion information such as the space-time attribute, the text topic, the subject content, the emotion tendency, the relation with other objects and the like extracted from the public opinion information can be expressed through five types of attributes of the multi-granularity public opinion object, namely object category, space-time information, semantic information, emotion information and relation information.
The object category may comprise a text class, a topic class and a theme class: a public opinion object of the text class is generated from the text expressed by the public opinion information, a public opinion object of the topic class is generated from the topic expressed by the public opinion information, and a public opinion object of the theme class is generated from the theme expressed by the public opinion information.
The object category represents the type of object processed and analyzed in the public opinion data, and different object categories involve different public opinion information expression models. In order of increasing semantic abstraction, public opinion objects are divided into three categories: text class, topic class and theme class. A text-class public opinion object is a description model constructed for a public opinion text and expresses the public opinion information contained in the text. A topic-class public opinion object is a description model constructed for a public opinion topic and expresses the public opinion information contained in the topic. A theme-class public opinion object is a description model constructed for a public opinion theme and expresses the public opinion information contained in the theme.
The spatiotemporal information may include temporal information and spatial information for expressing the time and space of occurrence and termination of public opinion, evolution and propagation in time and space, etc.
Specifically, the definition of the spatio-temporal information may be as follows:
spatio-temporal information= { temporal information, spatial information };
time information= { time granularity, [ start time, end time ] };
spatial information = { spatial granularity, spatial position }.
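The spatio-temporal definition above maps naturally onto nested record types. The sketch below is one possible encoding; the field names, the example granularity values and the choice of a (longitude, latitude) pair for the spatial position are assumptions, since the text leaves these representations open.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass
class TimeInfo:
    granularity: str   # e.g. "day" or "hour" (illustrative values)
    start: datetime
    end: datetime


@dataclass
class SpaceInfo:
    granularity: str                 # e.g. "city" or "district" (illustrative values)
    position: Tuple[float, float]    # assumed here to be (longitude, latitude)


@dataclass
class SpatioTemporalInfo:
    time: TimeInfo
    space: SpaceInfo

    def duration_hours(self) -> float:
        """Length of the [start time, end time] interval in hours."""
        return (self.time.end - self.time.start).total_seconds() / 3600
```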
The semantic information may include semantic granularity and semantic content for expressing content of the public opinion text, and the semantic information of each public opinion object may be divided into a plurality of semantic granularities, each semantic granularity may have a plurality of semantic records, and each semantic record has a respective number and semantic content.
Specifically, the definition of the semantic information may be as follows:
semantic information = {
(semantic granularity 1, ([ number 1.1, semantic content ], [ number 1.2, semantic content ], … …)),
(semantic granularity 2, ([ number 2.1, semantic content ], [ number 2.2, semantic content ], … …)),
(semantic granularity 3, ([ number 3.1, semantic content ], [ number 3.2, semantic content ], … …)), … … }.
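A small container can reproduce the numbering scheme in the definition above, where record 2.1 is the first record of the second semantic granularity. The class name `SemanticInfo`, its `add` method and the example granularity labels are hypothetical; the sketch assumes granularities are numbered in the order they are first used.

```python
from typing import Dict, List, Tuple

SemanticRecord = Tuple[str, str]  # (number, semantic content)


class SemanticInfo:
    """Maps each semantic granularity to its numbered semantic records."""

    def __init__(self) -> None:
        self.records: Dict[str, List[SemanticRecord]] = {}

    def add(self, granularity: str, content: str) -> str:
        """Append a record numbered '<granularity index>.<record index>'."""
        bucket = self.records.setdefault(granularity, [])
        # Dicts preserve insertion order, so the position of the key
        # gives the granularity's 1-based index.
        g_index = list(self.records).index(granularity) + 1
        number = f"{g_index}.{len(bucket) + 1}"
        bucket.append((number, content))
        return number
```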
The emotion information may include emotion subjects, emotion objects, emotion categories and emotion intensities, and is used to express the emotional content of public opinion. One public opinion object may have a plurality of emotion records, and the emotion content is expressed structurally by four members: emotion subject, emotion object, emotion category and emotion intensity. The "emotion subject" is the party expressing the emotion, typically an individual netizen or a network media platform. The "emotion object" is the target at which the emotion is directed, such as the properties of a commodity or the content of a rumor. The "emotion category" is the kind of emotion, including happiness, sadness, anger, approval, objection, suspicion, and the like. The "emotion intensity" is a score of how strong the emotion is; it can be represented by a number and quantified by an emotion classification method or the like. Based on this quadruple structure, public opinions and emotions can be mined, so that the changes in opinions and emotions as the public opinion situation develops can be analyzed, providing data support for the subsequent analysis of public opinion objects.
Specifically, the definition of emotion information may be as follows:
emotion information = { [ Emotion record 1, emotion content ], [ Emotion record 2, emotion content ], [ Emotion record i, emotion content ], … … } (i > = 1);
emotion content= { emotion subject, emotion object, emotion category, emotion intensity }.
The emotion information may be generated from public opinion information by: extracting emotion subjects and emotion objects from the public opinion information based on a named entity recognition method; identifying emotion vocabulary from the context of the emotion objects based on an association rule mining method, and determining emotion categories according to the emotion vocabulary; and judging the emotion intensity according to the emotion vocabulary based on an emotion tendency judging method.
The emotion tendency judging method may be based on word frequency statistics: the emotion intensity is judged by calculating the co-occurrence frequency between the emotion vocabulary and the emotion reference words corresponding to the determined emotion category, and is expressed through the emotion polarity of the emotion vocabulary. Specifically, the emotion polarity may be judged as follows: if the number of positive emotion words is greater than the number of negative emotion words, the emotion polarity is positive; if the two numbers are equal, the emotion polarity is neutral; if the number of positive emotion words is less than the number of negative emotion words, the emotion polarity is negative.
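The counting rule for emotion polarity stated above can be written down directly. The function name `polarity` and the tiny word sets used in any call are illustrative; in practice the positive and negative sets would come from an emotion dictionary.

```python
from typing import Iterable, Set


def polarity(tokens: Iterable[str], positive: Set[str], negative: Set[str]) -> str:
    """Compare the counts of positive and negative emotion words in the tokens."""
    tokens = list(tokens)
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos > neg:
        return "positive"
    if pos < neg:
        return "negative"
    return "neutral"  # equal counts, including the case with no emotion words
```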
The emotion vocabulary may include emotion words, negation words, degree adverbs, and symbolic expressions (emoticons).
The emotion words can be drawn from an emotion dictionary such as HowNet or ANTUSD; HowNet contains 4566 positive emotion words and 4370 negative emotion words, and ANTUSD contains 2810 positive emotion words and 8276 negative emotion words.
Negation words are words that can reverse the emotion polarity of a text, turning positive emotion into negative emotion or negative emotion into positive emotion. Besides simple negation there are double and multiple negations, so recognizing negation words is critical for subsequently determining the emotion category and judging the emotion intensity.
Degree adverbs mainly modify adjectives or adverbs in the text and usually appear before them. In emotion analysis, degree adverbs can weaken or strengthen the emotion intensity of emotion words, so taking them into account when judging the emotion intensity improves the accuracy of the judgment.
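How negation words and degree adverbs might modify the score of an emotion word can be sketched as below. The word lists, the multiplier values, and the rule that modifiers apply to the next emotion word are all assumptions chosen for the example, not values taken from the patent or from any dictionary.

```python
# Hypothetical modifier lists, for illustration only.
NEGATIONS = {"not", "never", "no"}
DEGREE_WEIGHTS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}


def score_tokens(tokens, lexicon):
    """Sum emotion-word scores, applying pending negations and degree adverbs.

    lexicon maps an emotion word to a signed base score
    (positive > 0, negative < 0).
    """
    total = 0.0
    sign = 1.0    # flipped by each negation; two negations cancel out
    weight = 1.0  # product of the degree adverbs seen since the last emotion word
    for token in tokens:
        if token in NEGATIONS:
            sign = -sign
        elif token in DEGREE_WEIGHTS:
            weight *= DEGREE_WEIGHTS[token]
        elif token in lexicon:
            total += sign * weight * lexicon[token]
            sign, weight = 1.0, 1.0  # modifiers are consumed by this word
    return total


LEXICON = {"good": 1.0, "bad": -1.0}
negated = score_tokens(["not", "good"], LEXICON)
```

Flipping the sign on every negation is one simple way to handle double negation; other words between a modifier and the emotion word are ignored in this simplified sketch.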
People often use emoticons on social platforms to convey emotion; emoticons can add humor to a text and can also resolve its ambiguity. Emoticons are typically used in the following cases: (1) when the text alone does not express the emotion well, as in "how can there be such a person [anger]", where the emoticon conveys the user's anger; (2) to disambiguate the text, as in "life is really meaningful [tear]": analyzed by text alone, "meaningful" makes the polarity of the text positive, but the sentence is clearly negative, and the emoticon [tear] conveys the correct emotion category; (3) to strengthen the emotion of the text, as in "the movie is so good [too happy]", where the emoticon [too happy] reinforces the emotion of the text.
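The role of emoticons can be illustrated with a short sketch that separates bracketed emoticons from the text and lets an emoticon override the text-only polarity. The bracket convention follows the examples above; the mapping `EMOTICON_POLARITY` and the override rule (the last known emoticon wins) are assumptions made for illustration.

```python
import re

# Matches bracketed emoticons such as "[tear]" or "[ tear ]".
EMOTICON_PATTERN = re.compile(r"\[([^\[\]]+)\]")

# Hypothetical emoticon-to-polarity mapping, for illustration only.
EMOTICON_POLARITY = {
    "tear": "negative",
    "anger": "negative",
    "too happy": "positive",
}


def split_emoticons(text):
    """Return (text without emoticons, list of emoticon names)."""
    emoticons = [name.strip() for name in EMOTICON_PATTERN.findall(text)]
    plain = EMOTICON_PATTERN.sub("", text).strip()
    return plain, emoticons


def resolve_polarity(text_polarity, emoticons):
    """Let the last known emoticon override the text-only polarity."""
    known = [EMOTICON_POLARITY[e] for e in emoticons if e in EMOTICON_POLARITY]
    return known[-1] if known else text_polarity


plain, emoticons = split_emoticons("life is really meaningful [ tear ]")
final_polarity = resolve_polarity("positive", emoticons)
```

In the disambiguation case described above, the text alone would be judged positive, while the emoticon yields the negative label.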
The relationship information may include relationship categories and other public opinion objects having relationships with the public opinion objects, wherein the relationship categories may be association relationships, aggregation relationships, dependency relationships, and the like. The relationship information is used to express the relationship between the public opinion objects.
Specifically, the definition of the relationship information may be as follows:
relation information = {
(relationship category 1, ([ number 1.1, object ID ], [ number 1.2, object ID ], … …)),
(relationship category 2, ([ number 2.1, object ID ], [ number 2.2, object ID ], … …)),
(relationship category 3, ([ number 3.1, object ID ], [ number 3.2, object ID ], … …)) … … }.
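The relationship definition above can be sketched as a mapping from relationship category to a numbered list of object IDs. The class name `RelationInformation`, its methods, and the sample IDs are hypothetical.

```python
from collections import defaultdict


class RelationInformation:
    """Relationship information: category -> related public opinion object IDs."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def add(self, category, object_id):
        """Record that object_id is related under the given category."""
        self._by_category[category].append(object_id)

    def related(self, category):
        """Return the IDs of objects related under the given category."""
        return list(self._by_category.get(category, []))

    def numbered(self):
        """Return entries numbered "c.k" as in the definition (category c, item k)."""
        result = []
        for c, (category, ids) in enumerate(self._by_category.items(), start=1):
            entries = [(f"{c}.{k}", oid) for k, oid in enumerate(ids, start=1)]
            result.append((category, entries))
        return result


# Hypothetical sample data, for illustration only.
relations = RelationInformation()
relations.add("association", "obj-2")
relations.add("association", "obj-3")
relations.add("dependency", "obj-7")
```

Storing only object IDs, rather than the objects themselves, keeps each public opinion object an independent unit that others reference.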
In step S4, the public opinion case base may store historical public opinion cases in the form of event objects, where an event object may include an event identifier, start time, end time, event topic, event keywords, and event profile. Object-oriented techniques are used to model each historical public opinion control case as an object that serves as an independent knowledge unit in the public opinion case base, so that more complex knowledge identification and knowledge reasoning can be performed.
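An event object and a case base holding such objects might look like the sketch below. The field names follow the attributes listed above; the class names, the keyword lookup, and the sample case are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class EventObject:
    """A historical public opinion case stored as an independent object."""
    event_id: str
    start_time: date
    end_time: date
    topic: str
    keywords: List[str]
    profile: str


class CaseBase:
    """A minimal in-memory public opinion case base."""

    def __init__(self):
        self._cases: Dict[str, EventObject] = {}

    def add(self, case: EventObject) -> None:
        self._cases[case.event_id] = case

    def find_by_keyword(self, keyword: str) -> List[EventObject]:
        """Return the cases whose keyword list contains the keyword."""
        return [c for c in self._cases.values() if keyword in c.keywords]


# Hypothetical sample case, for illustration only.
case_base = CaseBase()
case_base.add(EventObject("E001", date(2019, 3, 1), date(2019, 3, 9),
                          "product recall rumor", ["rumor", "recall"],
                          "A rumor about a product recall spread online."))
hits = case_base.find_by_keyword("rumor")
```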
When the public opinion object is matched for similarity against the public opinion cases in the public opinion case base, emphasis can be placed on a detailed analysis of the five types of attributes under the multi-granularity public opinion spatio-temporal object description attribute framework. The characteristics of each attribute are taken into account and different attributes are processed differently, so that the similarity among attributes is well mined. The most similar public opinion case is thereby matched more rigorously, and the public opinion control scheme derived from it in step S5 is more practical.
The case structures of different public opinion cases differ; the public opinion case information may be incomplete, and so may the description of the case's control scheme. Analyzing the structural similarity between the public opinion object and the public opinion cases handles missing attributes well. Thus, as shown in fig. 2, step S4 may specifically include:
S41, performing structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion cases in the public opinion case base to obtain the event type to which the public opinion object belongs;
S42, performing attribute similarity matching on the attributes of the public opinion object and the attribute values of the public opinion cases belonging to that event type in the public opinion case base to obtain the most similar public opinion case.
Step S41 first matches the attribute structure to find the event type to which the public opinion object belongs, and step S42 then matches attribute values only within the matched event type, which can greatly improve matching accuracy.
Specifically, given the diversity of the attribute values of the public opinion object and the characteristics of the public opinion cases, the following three types of attribute similarity matching can be applied flexibly:
(1) Numeric attribute similarity matching: numeric attributes are typically represented by definite numbers, continuous or discrete. The similarity of two values is calculated with a distance measure such as the Hamming distance or the Euclidean distance;
(2) Symbolic attribute similarity matching: symbolic attributes are usually represented by explicit symbols, such as the case publication time or the place where the event occurred. Symbolic attribute values have no quantitative relation, only a relation of being the same (or contained) or different, so the similarity of two symbols can be judged directly by whether their values are the same;
(3) Fuzzy attribute similarity matching: fuzzy attributes include fuzzy semantic attributes, fuzzy number attributes, fuzzy interval attributes, and the like. The similarity of two fuzzy attributes is calculated through a membership function such as a trapezoidal, triangular, or Gaussian function.
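The three kinds of attribute similarity can be sketched as below. The source names the distance measures and membership functions but not how they are turned into similarity scores, so the mappings used here (1 / (1 + distance), the share of equal positions, and 1 minus the difference in membership degree) are assumptions made for the example.

```python
import math


def euclidean_similarity(a, b):
    """Numeric attributes: map the Euclidean distance into (0, 1]."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def hamming_similarity(a, b):
    """Discrete sequences of equal length: share of positions that agree."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    if not a:
        return 1.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return 1.0 - mismatches / len(a)


def symbolic_similarity(a, b):
    """Symbolic attributes: 1 if the values are the same, else 0."""
    return 1.0 if a == b else 0.0


def triangular_membership(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_similarity(x, y, fuzzy_set):
    """Fuzzy attributes: compare membership degrees in the same fuzzy set."""
    mu_x = triangular_membership(x, *fuzzy_set)
    mu_y = triangular_membership(y, *fuzzy_set)
    return 1.0 - abs(mu_x - mu_y)


same_place = symbolic_similarity("city_A", "city_A")
```

Each function returns a value in [0, 1], so scores from different attribute types can later be combined with weights.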
Treating a matched case as the most similar on the basis of similarity alone lacks credibility, so confidence analysis can be combined with similarity matching. Step S4 may further include:
S43, presetting a confidence index and establishing a confidence decision tree;
S44, analyzing whether the attributes of the public opinion object (such as the spatio-temporal information, semantic information, and emotion information) are credible according to the confidence decision tree.
Structural similarity matching and attribute similarity matching yield a more accurate similarity, but at a relatively high time cost, and the time required grows in proportion as the public opinion case base keeps growing. Similarity matching based on a Bayesian probability model can therefore be adopted to reduce the time cost of matching.
In step S41, a public opinion object is denoted X, and each event type in the public opinion case base is denoted by an indexed symbol; each event type consists of a plurality of public opinion cases.
The structural similarity between the public opinion object X and each event type is calculated according to a formula [formula not preserved in this text] whose parameters are: an empirical factor; the number of attributes of the public opinion object X; the number of attributes that the public opinion object X and the event type have in common; and the number of attributes of the event type.
According to the structural similarity, the event type to which the public opinion object X belongs can be judged.
Specifically, a threshold may be preset; when the maximum calculated structural similarity is greater than the preset threshold, the public opinion object X can be considered to belong to the corresponding event type.
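The structure-matching step can be illustrated with a sketch. The patent's own similarity formula is not available in this text, so the weighted overlap used below is an assumed stand-in that merely uses the same four quantities (an empirical factor, the two attribute counts, and the number of shared attributes). The function names, the default `alpha`, and the default threshold are likewise hypothetical.

```python
def structural_similarity(object_attrs, type_attrs, alpha=0.5):
    """Assumed stand-in formula: weighted overlap of two attribute-name sets.

    alpha plays the role of the empirical factor. It weights the share of
    the object's attributes found in the event type against the share of
    the event type's attributes found in the object.
    """
    obj, typ = set(object_attrs), set(type_attrs)
    if not obj or not typ:
        return 0.0
    shared = len(obj & typ)
    return alpha * shared / len(obj) + (1 - alpha) * shared / len(typ)


def match_event_type(object_attrs, event_types, threshold=0.6, alpha=0.5):
    """Return (best event type or None, best score), applying the threshold."""
    best_name, best_score = None, 0.0
    for name, attrs in event_types.items():
        score = structural_similarity(object_attrs, attrs, alpha)
        if score > best_score:
            best_name, best_score = name, score
    if best_score > threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical attribute structures, for illustration only.
EVENT_TYPES = {
    "rumor": {"time", "place", "emotion", "semantics", "source"},
    "product": {"time", "price"},
}
matched, score = match_event_type({"time", "place", "emotion", "semantics"},
                                  EVENT_TYPES)
```

Because only attribute names are compared, a case with missing attribute values can still contribute to the structural match, which is the property the text attributes to structural similarity.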
In step S42, a public opinion object is denoted X, the event type to which the public opinion object X belongs is the event type obtained in step S41, and the candidate cases are the public opinion cases in the public opinion case base that belong to that event type.
Because the attributes, including the conditional attributes, are independent of one another, i.e. there are no dependencies among them, the conditional probability can be calculated by a formula that is derived in several steps [formulas not preserved in this text].
The quantities entering the calculation are: the number of public opinion cases in the public opinion case base that are of the event type and whose attributes match those of the public opinion object X; the total number of public opinion cases in the public opinion case base; each attribute of the public opinion object X; the number of public opinion cases of the event type that have that attribute; and the attribute weight for the event type.
According to the conditional probability, the most similar public opinion cases can be judged.
Specifically, the one public opinion case, or the several public opinion cases, with the greatest conditional probability may be regarded as the most similar public opinion cases of the public opinion object X. The public opinion case corresponding to the maximum conditional probability is obtained by taking the maximum over the candidate cases [formula not preserved in this text].
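Selecting the most similar cases by conditional probability can be sketched in the style of naive Bayes. The patent's formulas are not available in this text, so the model below is an assumption: each candidate case is treated as a class with a uniform prior, an attribute matches with probability 1 - eps and mismatches with probability eps, and attribute weights act as exponents. All names and parameter values are hypothetical.

```python
import math


def case_log_posterior(obj, case, weights, n_cases, eps=0.05):
    """Unnormalized log posterior of one case given the object's attributes.

    Assumes attributes are conditionally independent, so the per-attribute
    log probabilities simply add up (weighted by attribute weight).
    """
    log_p = -math.log(n_cases)  # uniform prior over the candidate cases
    for attr, value in obj.items():
        matches = attr in case and case[attr] == value
        p = 1.0 - eps if matches else eps
        log_p += weights.get(attr, 1.0) * math.log(p)
    return log_p


def most_similar_cases(obj, cases, weights):
    """Return the IDs of the case or cases with the greatest posterior."""
    if not cases:
        return []
    scores = {
        case_id: case_log_posterior(obj, case, weights, len(cases))
        for case_id, case in cases.items()
    }
    best = max(scores.values())
    return sorted(cid for cid, s in scores.items() if math.isclose(s, best))


# Hypothetical object, cases, and weights, for illustration only.
X = {"place": "city_A", "category": "rumor", "polarity": "negative"}
CASES = {
    "c1": {"place": "city_A", "category": "rumor", "polarity": "negative"},
    "c2": {"place": "city_B", "category": "rumor", "polarity": "negative"},
    "c3": {"place": "city_B", "category": "rumor", "polarity": "positive"},
}
WEIGHTS = {"place": 1.0, "category": 2.0, "polarity": 1.5}
best_cases = most_similar_cases(X, CASES, WEIGHTS)
```

Returning a list rather than a single ID reflects the statement that one case or several cases may share the greatest conditional probability.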
Step S5 may specifically be: adapting the solution of the most similar public opinion case to the actual situation, thereby obtaining the public opinion control scheme for the current public opinion event.
Step S5 may also specifically be: analyzing the differences and commonalities between the current public opinion event and the most similar public opinion case, and adjusting the handling strategy of the most similar public opinion case to obtain the public opinion control scheme for the current public opinion event.
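The comparison of differences and commonalities in step S5 can be sketched as an attribute-by-attribute comparison. The function name and the sample attributes are hypothetical, and how the handling strategy is then adjusted is left open, as in the text.

```python
def compare_with_case(current, case):
    """Split attributes into those shared and those that differ.

    Returns (same, different): same maps attribute -> shared value,
    different maps attribute -> (value in current event, value in case),
    with None standing for a missing attribute.
    """
    same, different = {}, {}
    for key in sorted(set(current) | set(case)):
        if key in current and key in case and current[key] == case[key]:
            same[key] = current[key]
        else:
            different[key] = (current.get(key), case.get(key))
    return same, different


# Hypothetical attribute values, for illustration only.
current_event = {"category": "rumor", "place": "city_A", "polarity": "negative"}
similar_case = {"category": "rumor", "place": "city_B", "platform": "forum"}
same, different = compare_with_case(current_event, similar_case)
```

The attributes in `different` mark where the handling strategy of the similar case may need adjusting before it is reused.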
Through the steps S1 to S5, the collected public opinion data can be fully mined, data support is provided for subsequent similarity matching with public opinion cases and public opinion control scheme analysis, and scientific decision reference is provided for supervision of network public opinion.
It should be understood that the foregoing examples of the present invention are merely illustrative of the present invention and are not intended to limit the present invention to the specific embodiments thereof. Any modification, equivalent replacement, improvement, etc. that comes within the spirit and principle of the claims of the present invention should be included in the protection scope of the claims of the present invention.

Claims (7)

1. A public opinion analysis method, comprising:
collecting public opinion data;
preprocessing the public opinion data to obtain structured public opinion information;
generating a public opinion object according to the public opinion information, wherein the public opinion object comprises an object identification, an object category, space-time information, semantic information, emotion information and relationship information;
performing similarity matching on the public opinion object and the public opinion cases in the public opinion case library to obtain the most similar public opinion cases;
obtaining a public opinion control scheme according to the most similar public opinion case analysis;
wherein performing similarity matching on the public opinion object and the public opinion cases in the public opinion case library to obtain the most similar public opinion cases comprises:
performing structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion case in the public opinion case library to obtain an event type to which the public opinion object belongs;
performing attribute similarity matching on the attribute of the public opinion object and the attribute value of the public opinion case belonging to the event type in a public opinion case library to obtain the most similar public opinion case;
wherein performing structural similarity matching on the attribute structure of the public opinion object and the attribute structure of the public opinion case in the public opinion case library to obtain the event type to which the public opinion object belongs comprises:
denoting a public opinion object as X and denoting each event type in the public opinion case base by an indexed symbol, each event type consisting of a plurality of public opinion cases;
calculating the structural similarity between the public opinion object X and each event type according to a formula [formula not preserved in this text] whose parameters are an empirical factor, the number of attributes of the public opinion object X, the number of attributes that the public opinion object X and the event type have in common, and the number of attributes of the event type;
judging, according to the structural similarity, the event type to which the public opinion object X belongs;
wherein performing attribute similarity matching on the attribute of the public opinion object and the attribute value of the public opinion case belonging to the event type in the public opinion case library to obtain the most similar public opinion case comprises:
denoting a public opinion object as X, the event type to which the public opinion object X belongs being the event type obtained by the structural similarity matching, and the candidate cases being the public opinion cases in the public opinion case base that belong to that event type;
calculating the conditional probability according to a formula [formula not preserved in this text] whose parameters are the number of public opinion cases in the public opinion case library that are of the event type and whose attributes match those of the public opinion object X, the total number of public opinion cases in the public opinion case library, each attribute of the public opinion object X, the number of public opinion cases of the event type that have that attribute, and the attribute weight for the event type;
judging, according to the conditional probability, the most similar public opinion cases.
2. The public opinion analysis method of claim 1, wherein preprocessing the public opinion data comprises:
identifying and removing useless characters from the public opinion data, and/or performing word segmentation on the public opinion data and removing stop words from the public opinion data, and/or extracting keywords of the public opinion data based on a word frequency statistics method, and/or extracting entity names of the public opinion data based on a named entity recognition method, and/or clustering the public opinion data based on a topic clustering method, and/or performing topic extraction on the public opinion data based on a co-word analysis method, and/or extracting emotionally oriented text in the public opinion data based on a text mining technology.
3. The public opinion analysis method according to claim 1, wherein the object class includes a text class, a theme class, and a topic class, the public opinion object of the text class being generated according to the text expressed by the public opinion information, the public opinion object of the theme class being generated according to the theme expressed by the public opinion information, and the public opinion object of the topic class being generated according to the topic expressed by the public opinion information.
4. The public opinion analysis method of claim 1, wherein the semantic information includes semantic granularity and semantic content.
5. The public opinion analysis method according to claim 1, wherein the emotion information includes emotion subjects, emotion objects, emotion categories, and emotion intensities;
generating emotion information in the public opinion object according to the public opinion information comprises the following steps:
extracting emotion subjects and emotion objects from the public opinion information based on a named entity recognition method;
identifying emotion vocabulary in the context of the emotion object based on an association rule mining method, and determining the emotion category according to the emotion vocabulary;
and judging the emotion intensity according to the emotion vocabulary based on an emotion tendency judging method.
6. The public opinion analysis method of claim 5, wherein the emotion vocabulary includes emotion words, negation words, degree adverbs, and symbolic expressions.
7. The public opinion analysis method of claim 1, wherein the relationship information includes a relationship category and other public opinion objects having a relationship with the public opinion object, the relationship category including an association relationship, an aggregation relationship, and a dependency relationship.
CN202010750608.1A 2020-07-30 2020-07-30 Public opinion analysis method Active CN111914087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010750608.1A CN111914087B (en) 2020-07-30 2020-07-30 Public opinion analysis method

Publications (2)

Publication Number Publication Date
CN111914087A CN111914087A (en) 2020-11-10
CN111914087B CN111914087B (en) 2023-09-19

Family

ID=73286820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010750608.1A Active CN111914087B (en) 2020-07-30 2020-07-30 Public opinion analysis method

Country Status (1)

Country Link
CN (1) CN111914087B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Huang Tao, Gong Xun, He Ying, Liang Shaoyong, Wan Zhongping
Inventor before: Huang Tao, Cheng Pu, Gong Xun, He Ying, Liang Shaoyong, Wan Zhongping
GR01 Patent grant