CN115017302A - Public opinion monitoring method and public opinion monitoring system - Google Patents
- Publication number
- CN115017302A (application CN202210047264.7A)
- Authority
- CN
- China
- Prior art keywords
- public opinion
- data
- keyword
- public
- text
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a public opinion monitoring method and a public opinion monitoring system. The public opinion monitoring method comprises the following steps: S1: acquiring a keyword; S2: performing a keyword expansion operation on the keyword to obtain a keyword library; S3: extracting sensitive words from the keyword library to obtain a sensitive word library; S4: collecting the final public opinion data corresponding to the keyword library and the sensitive word library; S5: preprocessing the public opinion data to obtain a preprocessing result; S6: performing public opinion analysis on the preprocessing result to obtain an analysis result; S7: obtaining a public opinion monitoring result according to the analysis result. The public opinion monitoring method and system provided by the invention can effectively improve the comprehensiveness and accuracy of public opinion data.
Description
Technical Field
The invention relates to the technical field of public opinion monitoring, in particular to a public opinion monitoring method and a public opinion monitoring system.
Background
In the information age, the internet has become an important channel and carrier of information in society. Social media built on internet technology are widely used in social life, and people have shifted the focus of their information gathering to online social platforms. However, in the era of big data, the volume of data in network media keeps growing, data forms are increasingly diverse, and information spreads ever faster. These changes pose no small challenge to public opinion monitoring, which is expected to be both accurate and quick to respond. Existing public opinion monitoring systems have the following problems:
1. The collected data come from few sources, and the information is incomplete. Most network public opinion monitoring systems collect and analyze data from a single website only. Today, however, social platforms and network media form an interconnected whole; each platform has its own user characteristics, and the public opinion data on each platform have their own value. Collecting public opinion data through multiple channels reflects the state of online public opinion more comprehensively and accurately.
2. Public opinion retrieval is inaccurate. In the era of big data, network public opinion data are huge in scale and the information is complex. Some public opinion monitoring systems collect information directly with the keywords provided by the user. However, vocabularies contain many near-synonyms, different internet slang may be popular at different times, and words and sentences may be ambiguous. Retrieving with the initial keywords alone therefore mostly returns generic information and cannot obtain the most comprehensive public opinion data or the sensitive public opinions the user cares about.
3. Text-only sentiment analysis handles irony, sentence ambiguity, and similar problems poorly. Most current sentiment analysis methods analyze text alone, and they cannot recognize ironic sentences well, because they discard context and ironic sentences differ greatly from normal sentences. In today's network environment, images such as emoji and emoticons have become an important supplement to how people express emotions, and they deserve attention in the field of sentiment analysis.
Disclosure of Invention
The invention aims to provide a public opinion monitoring method and a public opinion monitoring system, which can effectively improve the comprehensiveness and accuracy of public opinion data.
The technical scheme for solving the technical problems is as follows:
the invention provides a public opinion monitoring method, which comprises the following steps:
S1: acquiring a keyword input by a user;
S2: performing a keyword expansion operation on the keyword to obtain a keyword library;
S3: extracting sensitive words from the keyword library to obtain a sensitive word library;
S4: collecting the final public opinion data corresponding to the keyword library and the sensitive word library;
S5: preprocessing the final public opinion data to obtain a preprocessing result;
S6: performing public opinion analysis on the preprocessing result to obtain an analysis result;
S7: obtaining a public opinion monitoring result according to the analysis result.
Optionally, step S2 includes:
searching related data sources with the keyword to obtain multiple pieces of data information matching the keyword;
and obtaining the keyword library from all the data information.
Optionally, step S3 includes:
performing word segmentation on all data in the keyword library with a word segmentation toolkit to obtain a word segmentation database;
converting all the segmented data into word vectors;
extracting negative words from the word segmentation database with a BiLSTM model according to the word vectors;
and taking the negative words as sensitive words to obtain the sensitive word library.
Optionally, step S4 includes:
S41: configuring a data collection expression, and merging the keyword library and the sensitive word library into a combined word library;
S42: searching for a related public opinion news list with the combined word library;
S43: adding the web address of the current news page of the related public opinion news list to a to-be-collected list;
S44: taking the web address from the to-be-collected list and retrieving the related information of the current news page to form initial public opinion data;
S45: if the initial public opinion data satisfy both integrity and uniqueness, proceeding to step S46, otherwise proceeding to step S47;
S46: outputting the initial public opinion data as the final public opinion data;
S47: judging whether the current news page is the last page of the related public opinion news list; if so, returning to step S46, otherwise returning to step S43.
Optionally, step S5 includes:
processing the final public opinion data in batches to obtain multiple batches of public opinion data;
removing special and useless characters from each batch with regular expressions to obtain processed final public opinion data;
performing feature extraction on the processed final public opinion data to obtain a feature extraction result;
and outputting the feature extraction result as the preprocessing result.
Optionally, the public opinion analysis operation includes: general statistical analysis, keyword extraction, heat calculation and multi-modal emotion analysis.
Optionally, the heat calculation includes heat index calculation for a single data source and heat index calculation for multiple data sources. The heat index for multiple data sources is calculated as:
$H = \sum_{i} W_i H_i$
where H is the heat value, $H_i$ is the aggregate heat index of all final public opinion data from the i-th related data source, and $W_i$ is the heat weight of that data source;
the heat index x of a single related data source is computed from E and $T_s$, where E is the user attention index of each related data source, and $T_s$ reflects the freshness of the related public opinion news and is computed from A, the release time, and B, the acquisition time; T is the total number of seconds in a 3-day heat cycle, T = 259200.
Optionally, the multimodal sentiment analysis comprises:
acquiring the picture features and text features in the preprocessing result;
training a picture-text alignment network with the picture features and text features to obtain a trained picture-text alignment network;
inputting the picture features and text features into the trained picture-text alignment network to obtain fused features;
and taking the fused features as the input of a classifier to obtain a multimodal sentiment analysis result;
the loss function of the multimodal sentiment analysis model is:
$L = L_{CA} - L_{DA}$
where $L_{CA}$ is the cross-reconstruction loss over M samples, obtained by decoding the encoding $E_i(x_i)$ of modality i with the decoder $D_j$ of modality j and comparing the result with $x_j$, the original features of modality j; $E_i$ is the encoder of modality i and $x_i$ the original features of modality i; $L_{DA}$ is the distribution alignment loss built from $W_{ij}$, the 2-Wasserstein distance between modalities i and j, which is computed from the hidden-layer feature vectors μ and Σ generated by the encoders.
Optionally, the picture-text alignment network comprises a picture feature encoder, a text feature encoder, a shared feature layer, and a plurality of shared feature decoders. The picture feature encoder and the text feature encoder are both connected to the input of the shared feature layer, the shared feature decoders are connected to the output of the shared feature layer, and the shared feature layer is also connected to a classifier;
the picture feature encoder encodes the picture features;
the text feature encoder encodes the text features;
the shared feature decoders decode the shared features to output reconstructed picture features and reconstructed text features;
the classifier classifies the shared features so as to train the picture-text alignment network.
The invention also provides a public opinion monitoring system based on the public opinion monitoring method, and the public opinion monitoring system comprises:
the keyword acquisition module is used for acquiring keywords;
the keyword expansion module is used for expanding the keywords;
the sensitive word extraction module is used for extracting sensitive words in the keyword library;
the public opinion data acquisition module is used for acquiring final public opinion data of the keyword library and the sensitive word library;
the data preprocessing module is used for preprocessing the final public opinion data;
the public opinion analysis module is used for analyzing the preprocessing result;
and the public opinion reporting module is used for displaying the public opinion monitoring result to the user.
The invention has the following beneficial effects:
1. The method improves the accuracy of sentiment analysis of public opinion;
2. Keyword expansion and sensitive word extraction are combined to form new search terms, so that the sensitive public opinions the user cares about can be searched effectively and comprehensively;
3. Information is collected from multiple related data sources through an extensible data collection interface, which solves the problems of few data sources and incomplete information in public opinion monitoring systems.
Drawings
Fig. 1 is a flowchart of a public opinion monitoring method according to the present invention;
FIG. 2 is a flowchart of step S4 in FIG. 1;
FIG. 3 is a schematic structural diagram of a multi-modal emotion analysis model provided by the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Examples
The invention provides a public opinion monitoring method which, as shown in Fig. 1, comprises the following steps:
S1: acquiring a keyword input by a user;
the keyword here is generally a user input keyword.
S2: performing keyword expansion operation on the keywords to obtain a keyword library;
the step S2 includes:
searching in related data sources by using the keywords to obtain a plurality of pieces of data information matched with the keywords;
Here, the related data sources include but are not limited to microblog, Toutiao (Today's Headlines), internet news, and Tencent News; the keywords of the retrieved articles are extracted with the TF-IDF algorithm and the TextRank algorithm.
TF-IDF is a weighting technique commonly used in information retrieval and text mining; it is a statistical method for evaluating how important a word is to one document in a document set or corpus. TF is the frequency with which a term occurs in the text: $TF_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of times word i appears in document j and $\sum_k n_{k,j}$ is the total number of occurrences of all words in document j. IDF is the inverse document frequency: $IDF_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where |D| is the total number of documents in the corpus and the denominator $|\{j : t_i \in d_j\}|$ is the number of documents containing the word $t_i$. The TF-IDF value of word i for a description text j is: $TF\text{-}IDF_{i,j} = TF_{i,j} \times IDF_i$.
If a word has a high term frequency in a particular document and a low document frequency across the whole document set, it receives a high TF-IDF weight. Common words are thus filtered out and important words are retained, which achieves keyword extraction.
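The TF-IDF formulas above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: it assumes documents are already tokenized (for example by jieba in step S3) and uses the unsmoothed IDF exactly as written above; `tf_idf` and `top_keywords` are hypothetical helper names.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return a {word: TF-IDF} dict for each tokenized document in `docs`."""
    num_docs = len(docs)  # |D|
    # Document frequency |{j : t_i in d_j}|: documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)          # n_{i,j}
        total = sum(counts.values())   # sum_k n_{k,j}
        scores.append({
            word: (n / total) * math.log(num_docs / doc_freq[word])
            for word, n in counts.items()
        })
    return scores


def top_keywords(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring words of one document."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]


docs = [["fuel", "price", "price", "rise"],
        ["fuel", "price", "drop"],
        ["weather", "sunny"]]
scores = tf_idf(docs)
print(top_keywords(scores[0], 1))  # ['rise']
```

"rise" wins in the first document because it appears in only one of the three documents, while "price", although more frequent, also appears in the second document and is down-weighted by IDF.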
TextRank, an improvement on the idea of PageRank, is a graph-based ranking algorithm for keyword extraction and document summarization. It uses the contribution information between words in a document to extract the keywords and key phrases of a given text, and it can also produce extractive automatic summaries. The basic idea of TextRank is to treat a document as a network of words, in which the links represent semantic relationships between words. The formula is as follows:
wherein WS (V) i ) The weight of the sentence i is represented, the summation on the right side represents the contribution degree of each adjacent sentence to the sentence, in a single document, all sentences can be roughly considered to be adjacent, generation and extraction of multiple windows are not needed like multiple documents, only a single document window is needed, and W is ij Representing the similarity of two sentences, WS (V) j ) Representing the weight of the last iterated sentence j. d is the damping coefficient, typically 0.85.
In summary, TF-IDF is suited to extracting rare words from an article: it looks for words that are frequent in the article but infrequent in the corpus, which makes it good at finding characteristic words. TextRank, by contrast, extracts an article's keywords with a graph algorithm and needs no corpus, which makes it good at finding conventional keywords. Using both methods together finds both conventional keywords and domain-specific words with which to expand the keyword library.
And obtaining the keyword library according to all the data information.
S3: extracting sensitive words in the keyword library to obtain a sensitive word library;
Optionally, step S3 includes:
performing word segmentation on all data in the keyword library with a word segmentation toolkit to obtain a word segmentation database; the toolkit used in the invention is jieba.
Converting all the word segmentation data information into word vector information;
Because a machine cannot process segmented words directly, the segmented data are converted into word vectors for machine processing.
Extracting negative words from the word segmentation database with a BiLSTM model according to the word vectors: the BiLSTM sentiment analysis model performs sentiment analysis on the segmentation results and computes a sentiment score in [-1, 1] for each word. The closer the score is to -1, the more likely the word is negative; the closer it is to 1, the more likely the word is positive. The words are then sorted by sentiment polarity score in ascending order, and the first m negative words are added to the sensitive word library.
And taking the negative words as sensitive words to obtain the sensitive word library.
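The ranking step that turns per-word sentiment scores into the sensitive word library can be sketched as below. The BiLSTM itself is out of scope: the sketch assumes its scores in [-1, 1] are already available as a dict. Treating "negative" as "score below 0" is an assumption, and `build_sensitive_words` is a hypothetical name.

```python
def build_sensitive_words(scores: dict[str, float], m: int) -> list[str]:
    """Pick up to m sensitive words from per-word sentiment scores in [-1, 1].

    `scores` is assumed to come from the BiLSTM sentiment model; a score
    below 0 is assumed to mark a negative word.
    """
    # Ascending sort: the most negative words come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    # Keep the negative side only, then take the first m words.
    return [word for word, score in ranked if score < 0][:m]


sensitive = build_sensitive_words(
    {"scam": -0.9, "delay": -0.4, "good": 0.8, "refund": -0.1}, m=2)
print(sensitive)  # ['scam', 'delay']
```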
S4: collecting final public opinion data of the keyword library and the sensitive word library;
The final public opinion data include the article's subject, publishing time, full text, number of forwards, number of comments, number of likes, and the publishing user's verification information, level, and region (where available).
Specifically, referring to fig. 2, the step S4 includes:
S41: configuring a data collection expression, and merging the keyword library and the sensitive word library into a combined word library; here, the data collection expression is mainly a CSS selector or an XPath expression.
S42: searching for a related public opinion news list with the combined word library;
S43: adding the web address of the current news page of the related public opinion news list to a to-be-collected list;
S44: taking the web address from the to-be-collected list and retrieving the related information of the current news page to form initial public opinion data;
here, the related information includes information such as the title of the extracted article, the release time, and the text of the article.
S45: if the initial public opinion data simultaneously meets the integrity and uniqueness, the step S46 is carried out, otherwise, the step S47 is carried out;
Integrity and uniqueness mean the following: when a news article is extracted, the data are discarded as incomplete if the article's title or content is missing, and the data are not stored if they are already in the database (duplicate data).
S46: outputting the initial public opinion data as the final public opinion data;
S47: judging whether the current news page is the last page of the related public opinion news list; if so, returning to step S46, otherwise returning to step S43.
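The page loop of S43 to S47 can be sketched as below. This is a simplified sketch under stated assumptions: `pages` and `fetch` are hypothetical hooks standing in for the crawler (in practice `fetch` would apply the CSS/XPath expressions from S41), and uniqueness is checked against already stored URLs, a simplification of "already in the database".

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Record:
    title: str
    publish_time: str
    content: str
    url: str


def collect(pages: Iterable[list[str]],
            fetch: Callable[[str], Record],
            stored_urls: set[str]) -> list[Record]:
    """Walk the pages of a news list and keep complete, unique records.

    `pages` yields the article URLs of each list page; `fetch` turns a URL
    into a Record. Both are hypothetical hooks, not a real library API.
    `stored_urls` stands in for the database used by the uniqueness check.
    """
    final: list[Record] = []
    for page_urls in pages:
        to_collect = list(page_urls)           # S43: queue the page's addresses
        for url in to_collect:
            record = fetch(url)                # S44: initial public opinion data
            complete = bool(record.title.strip() and record.content.strip())
            unique = record.url not in stored_urls
            if complete and unique:            # S45: integrity and uniqueness
                stored_urls.add(record.url)
                final.append(record)           # S46: keep as final data
        # S47: move on to the next page until the last page is reached
    return final
```

For example, if the second article has an empty title and the first article reappears on a later page, only the first and third articles are kept.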
S5: preprocessing the final public opinion data to obtain a preprocessing result;
Optionally, step S5 includes:
processing the final public opinion data in batches to obtain a plurality of batches of public opinion data;
removing special characters and useless characters from each batch of public opinion data by using a regular expression to obtain processed final public opinion data;
performing feature extraction on the processed final public opinion data to obtain a feature extraction result: for text data, the open-source ALBERT Chinese pre-trained model extracts 768-dimensional semantic vectors; for picture data, the open-source ResNet101 pre-trained model extracts 2048-dimensional picture feature vectors.
And outputting the feature extraction result as the preprocessing result.
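The batching and regular-expression cleaning steps of S5 might look like the sketch below. The text does not list which characters count as "special" or "useless", so the rules here (drop URLs and any character that is not a CJK ideograph, ASCII letter or digit, whitespace, or common punctuation) are assumptions. Note that these rules also drop emoji; a deployment that wants emoji as a sentiment signal would keep them. Feature extraction with ALBERT and ResNet101 is not shown.

```python
import re
from typing import Iterator

# Assumed cleaning rules; the patent does not specify the exact character set.
URL_RE = re.compile(r"https?://\S+")
JUNK_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。！？、,.!?]")
SPACE_RE = re.compile(r"\s+")


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Split the final public opinion data into batches of `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def clean(text: str) -> str:
    """Remove URLs and characters outside the allowed set, then tidy spaces."""
    text = URL_RE.sub("", text)
    text = JUNK_RE.sub("", text)
    return SPACE_RE.sub(" ", text).strip()


raw = ["油价上涨！！😡 详情 https://t.cn/abc", "Fuel price rises ###"]
cleaned = [[clean(t) for t in batch] for batch in batches(raw, 1)]
```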
S6: performing public opinion analysis on the preprocessing result to obtain an analysis result;
optionally, the public opinion analysis operation includes: general statistical analysis, keyword extraction, heat calculation and multi-modal emotion analysis.
General statistical analysis covers the total number of public opinion records related to the keyword and similar keywords, the share of records from each website, the number of records per time interval, and the regional distribution of where the records were published;
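The four general statistics listed above reduce to simple counting. In this sketch the `Post` fields and the choice of one-hour intervals are assumptions for illustration; a real record would carry the fields collected in S4.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    site: str    # source website, e.g. "weibo" (hypothetical label)
    hour: str    # publish time truncated to the hour (assumed interval)
    region: str  # publisher region, "" if unknown


def general_stats(posts: list[Post]) -> dict:
    """Total count, per-site share, per-interval counts and region distribution."""
    total = len(posts)
    by_site = Counter(p.site for p in posts)
    return {
        "total": total,
        "site_share": {s: n / total for s, n in by_site.items()} if total else {},
        "per_interval": dict(Counter(p.hour for p in posts)),
        "by_region": dict(Counter(p.region for p in posts if p.region)),
    }


stats = general_stats([
    Post("weibo", "2022-01-14 09", "Beijing"),
    Post("weibo", "2022-01-14 10", ""),
    Post("toutiao", "2022-01-14 09", "Shanghai"),
    Post("weibo", "2022-01-14 09", "Beijing"),
])
```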
Keyword extraction: a keyword search may return multiple related public opinion events, and each event comprises multiple public opinion records, so analyzing each independent event is also crucial to public opinion analysis. The invention uses the TF-IDF and TextRank algorithms to extract the public opinion keywords of related events; these keywords are used to build the public opinion keyword cloud and to compute per-keyword popularity in the public opinion report module.
Heat calculation: public opinion popularity is computed separately for each dimension, namely by search term, by event under a search term, and by keyword within an event under a search term. The invention bases its popularity calculation on Reddit's traditional social media ranking algorithm and designs separate methods for social media data (microblog) and network media data (Toutiao, Tencent News, and internet news) to compute a public opinion popularity index for each platform. Each platform's index is then given a weight, and the weighted indexes of all platforms are summed to obtain the multi-source popularity index of the public opinion related to the keyword.
Optionally, the heat calculation includes a heat index calculation for a single data source and a heat index calculation for a plurality of data sources:
the heat index x of a single related data source is computed from E and $T_s$, where E is the user attention index of each related data source, and $T_s$ reflects the freshness of the related public opinion news and is computed from A, the release time, and B, the acquisition time; T is the total number of seconds in a 3-day heat cycle, T = 259200.
When the related data source is a microblog, E = user type weight × (6 × forwards + 3 × comments + 1 × likes), where the user types and their weights are: ordinary users 1, microblog girls 1, microblog users 1.5, landers 2, blue V 4, yellow V 4, gold V 4, and gold V 10. Taking log10 also lets earlier interactions carry more weight.
News media do not divide user types as finely as microblogs, but their heat index follows the same principle, so apart from the calculation of E the steps are the same as for the microblog heat index.
Based on the characteristics of the analyzed data, E in the network media heat formula is: E = 8 × forwards + 5 × comments + 2 × votes.
The heat index for multiple data sources is calculated as:
$H = \sum_{i} W_i H_i$
where H is the heat value, $H_i$ is the aggregate heat index of all final public opinion data from the i-th related data source, and $W_i$ is the heat weight of that data source.
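The attention indexes and the multi-source weighted sum can be sketched as below. Because the formulas for the single-source heat index x and for $T_s$ are not reproduced in this text, the sketch covers only E and $H = \sum_i W_i H_i$; the per-source values $H_i$ are taken as inputs. The user-type weight is passed in as a number (for example 1 for ordinary users or 4 for blue V, per the list above), and the source names are hypothetical labels.

```python
def microblog_attention(user_weight: float, forwards: int,
                        comments: int, likes: int) -> float:
    """E for a microblog post: user-type weight x (6*forwards + 3*comments + 1*likes)."""
    return user_weight * (6 * forwards + 3 * comments + 1 * likes)


def news_attention(forwards: int, comments: int, votes: int) -> float:
    """E for a network media article: 8*forwards + 5*comments + 2*votes."""
    return 8 * forwards + 5 * comments + 2 * votes


def multi_source_heat(source_heat: dict[str, float],
                      weights: dict[str, float]) -> float:
    """H = sum_i W_i * H_i over the related data sources."""
    return sum(weights[name] * h for name, h in source_heat.items())


e_weibo = microblog_attention(4, forwards=10, comments=5, likes=100)  # blue V user
e_news = news_attention(forwards=2, comments=3, votes=10)
h = multi_source_heat({"weibo": 2.0, "news": 1.0}, {"weibo": 0.6, "news": 0.4})
```

Keeping the platform weights in a separate dict makes it straightforward to add a new data source behind the extensible collection interface without changing the aggregation code.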
Multimodal sentiment analysis: because data in different modalities (pictures and text) come from different domains, fusing features across modalities is a difficult problem. The sentiment analysis module of the software builds an encoder and a decoder for the picture data and for the text data respectively, aligns the picture and text features following the VAE idea, and fuses them into a single hidden layer that forms a shared feature representation layer. The fused features produced by this layer are used to train a sentiment classifier for public opinion data, which finally yields the sentiment tendency score of the multimodal public opinion data.
The loss function of the multimodal sentiment analysis model is:
$L = L_{CA} - L_{DA}$
where $L_{CA}$ is the cross-reconstruction loss over M samples, obtained by decoding the encoding $E_i(x_i)$ of modality i with the decoder $D_j$ of modality j and comparing the result with $x_j$, the original features of modality j; $E_i$ is the encoder of modality i and $x_i$ the original features of modality i; $L_{DA}$ is the distribution alignment loss built from $W_{ij}$, the 2-Wasserstein distance between modalities i and j, which is computed from the hidden-layer feature vectors μ and Σ generated by the encoders.
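The distance $W_{ij}$ can be illustrated with the standard closed form of the 2-Wasserstein distance between two Gaussians with diagonal covariance, $\sqrt{\lVert \mu_i - \mu_j \rVert_2^2 + \lVert \Sigma_i^{1/2} - \Sigma_j^{1/2} \rVert^2}$. Since the exact formula image is not reproduced here, treating μ and Σ as a mean vector and a vector of diagonal variances is an assumption, as is summing $W_{ij}$ over all ordered pairs of distinct modalities to form $L_{DA}$.

```python
import math
from typing import Sequence


def w2_diag_gaussian(mu_i: Sequence[float], var_i: Sequence[float],
                     mu_j: Sequence[float], var_j: Sequence[float]) -> float:
    """2-Wasserstein distance between N(mu_i, diag(var_i)) and N(mu_j, diag(var_j))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    # For diagonal covariances the trace term reduces to ||sqrt(var_i) - sqrt(var_j)||^2.
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_i, var_j))
    return math.sqrt(mean_term + cov_term)


def distribution_alignment(latents: dict[str, tuple[list[float], list[float]]]) -> float:
    """Assumed form of L_DA: sum of W_ij over ordered pairs of distinct modalities."""
    names = list(latents)
    return sum(
        w2_diag_gaussian(*latents[i], *latents[j])
        for i in names for j in names if i != j
    )


l_da = distribution_alignment({
    "picture": ([0.0, 0.0], [1.0, 1.0]),  # (mu, diagonal variances)
    "text": ([3.0, 4.0], [1.0, 1.0]),
})
```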
Optionally, the multimodal sentiment analysis comprises:
acquiring the picture features and text features in the preprocessing result;
training a picture-text alignment network with the picture features and text features to obtain a trained picture-text alignment network;
the trained picture-text alignment network is one that better fuses the semantic information shared by the pictures and the texts.
Inputting the picture features and text features into the trained picture-text alignment network to obtain fused features;
and taking the fused features as the input of a classifier to obtain a multimodal sentiment analysis result.
S7: obtaining a public opinion monitoring result according to the analysis result.
Optionally, as shown in fig. 3, the picture text alignment network includes: the system comprises a picture characteristic encoder, a text characteristic encoder, a shared characteristic layer and a plurality of shared characteristic decoders, wherein the picture characteristic encoder and the text characteristic encoder are simultaneously connected with the input end of the shared characteristic layer, the shared characteristic encoders are connected with the output end of the shared characteristic layer, and the shared characteristic layer is also connected with a classifier;
the picture feature encoder is used for encoding the picture features;
the text feature encoder is used for encoding the text features;
the shared feature decoders are used for decoding the shared features to output reconstructed picture features and reconstructed text features;
the classifier is used for classifying the shared features so as to train the picture-text alignment network.
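The architecture above can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the patent's model: every component is assumed to be a single dense layer, the shared feature layer is assumed to take the element-wise mean of the two encoder outputs, and the three sentiment labels are hypothetical. Training (for example with the cross-reconstruction and distribution-alignment losses) is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class Linear:
    """Dense layer y = W x + b with no activation."""
    weight: Matrix  # out_dim rows, each of length in_dim
    bias: Vector    # length out_dim

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


@dataclass
class PictureTextAlignmentNet:
    picture_encoder: Linear
    text_encoder: Linear
    shared_layer: Linear
    picture_decoder: Linear
    text_decoder: Linear
    classifier: Linear

    def forward(self, picture: Vector, text: Vector) -> Tuple[Vector, Vector, Vector, Vector]:
        # Both encoders feed the input end of the shared feature layer.
        p = relu(self.picture_encoder(picture))
        t = relu(self.text_encoder(text))
        merged = [(a + b) / 2 for a, b in zip(p, t)]  # assumption: element-wise mean
        shared = relu(self.shared_layer(merged))      # fused (shared) features
        # The shared feature decoders reconstruct each modality.
        picture_rec = self.picture_decoder(shared)
        text_rec = self.text_decoder(shared)
        # The classifier scores sentiment classes from the shared features.
        logits = self.classifier(shared)
        return shared, picture_rec, text_rec, logits


LABELS = ["negative", "neutral", "positive"]  # hypothetical label set


def predict(net: PictureTextAlignmentNet, picture: Vector, text: Vector) -> str:
    *_, logits = net.forward(picture, text)
    return LABELS[max(range(len(logits)), key=logits.__getitem__)]
```

For instance, with identity weights for the encoders, shared layer and decoders, picture features `[0.5, 0.25]` and text features `[1.0, 0.0]` produce the shared features `[0.75, 0.125]`; a classifier with rows `[-1, 0]`, `[0, 0]` and `[1, 0]` then labels the pair `"positive"`.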
The invention also provides a public opinion monitoring system based on the above public opinion monitoring method, the system comprising:
the keyword acquisition module is used for acquiring keywords;
the keyword expansion module is used for expanding the keywords;
the sensitive word extraction module is used for extracting sensitive words from the keyword library;
the public opinion data acquisition module is used for collecting the final public opinion data for the keyword library and the sensitive word library;
the data preprocessing module is used for preprocessing the final public opinion data;
the public opinion analysis module is used for analyzing the preprocessing result;
and the public opinion reporting module is used for displaying the public opinion monitoring result to the user.
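The seven modules map one-to-one onto steps S1 to S7 of the method. One way to wire them is sketched below in Python, assuming each module is an interchangeable callable; the class name, field names and call signatures are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PublicOpinionMonitor:
    """Chains the seven modules; each field is a pluggable callable."""
    acquire_keyword: Callable[[str], str]
    expand_keywords: Callable[[str], List[str]]
    extract_sensitive_words: Callable[[List[str]], List[str]]
    collect_data: Callable[[List[str], List[str]], List[str]]
    preprocess: Callable[[List[str]], Any]
    analyze: Callable[[Any], Any]
    report: Callable[[Any], Any]

    def run(self, user_input: str) -> Any:
        keyword = self.acquire_keyword(user_input)                          # S1
        keyword_library = self.expand_keywords(keyword)                     # S2
        sensitive_library = self.extract_sensitive_words(keyword_library)   # S3
        final_data = self.collect_data(keyword_library, sensitive_library)  # S4
        preprocessed = self.preprocess(final_data)                          # S5
        analysis = self.analyze(preprocessed)                               # S6
        return self.report(analysis)                                        # S7
```

Keeping each module behind a callable lets a deployment swap, say, the data acquisition module for one with more data sources without touching the rest of the chain.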
The invention has the following beneficial effects:
1. The method can improve the accuracy of sentiment analysis of public opinion;
2. Keyword expansion and sensitive-word extraction are combined to form new search terms, so that the sensitive public opinion the user cares about can be searched effectively and comprehensively;
3. Information is collected from a plurality of related data sources through an extensible data acquisition interface, which solves the problem of incomplete information when a public opinion monitoring system collects data from only a few sources.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within its scope of protection.
Claims (10)
1. A public opinion monitoring method, characterized by comprising the following steps:
S1: acquiring a keyword input by a user;
S2: performing a keyword expansion operation on the keyword to obtain a keyword library;
S3: extracting sensitive words from the keyword library to obtain a sensitive word library;
S4: collecting the final public opinion data for the keyword library and the sensitive word library;
S5: preprocessing the final public opinion data to obtain a preprocessing result;
S6: performing public opinion analysis processing on the preprocessing result to obtain an analysis result;
S7: obtaining a public opinion monitoring result from the analysis result.
2. The public opinion monitoring method according to claim 1, wherein the step S2 includes:
searching related data sources with the keyword to obtain a plurality of pieces of data information matching the keyword;
and obtaining the keyword library from all of the data information.
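Step S2 can be read as a search over each related data source followed by pooling the hits. In the Python sketch below, each data source is an in-memory list of records and the search is a case-insensitive substring match; both are stand-ins for the real search interfaces, which the patent does not specify, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def build_keyword_library(keyword: str,
                          data_sources: Dict[str, Iterable[str]]) -> List[str]:
    """Collect every record matching `keyword` across all data sources.

    `data_sources` maps a source name to its records (an in-memory stand-in
    for real search interfaces). Exact duplicate records are kept once.
    """
    needle = keyword.casefold()
    library: List[str] = []
    seen = set()
    for _source_name, records in data_sources.items():
        for record in records:
            # Case-insensitive substring match stands in for each source's search.
            if needle in record.casefold() and record not in seen:
                seen.add(record)
                library.append(record)
    return library
```

Records are deduplicated only when identical, so the same story with different capitalization from two sources is kept twice; a real system would likely need a stronger deduplication rule.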
3. The public opinion monitoring method according to claim 1, wherein the step S3 includes:
segmenting all data in the keyword library with a word segmentation toolkit to obtain a word segmentation database;
converting all of the segmented words into word vectors;
extracting negative words from the word segmentation database with a BiLSTM model according to the word vectors;
and taking the negative words as sensitive words to obtain the sensitive word library.
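The data flow of claim 3 is sketched below in Python. The regex tokenizer stands in for the word segmentation toolkit, and the `embed` and `is_negative` callables are hypothetical placeholders for the word-vector model and the trained BiLSTM classifier. The example plugs in a toy score table rather than a BiLSTM, so it shows the flow only, not the model.

```python
import re
from typing import Callable, List, Sequence

Vector = List[float]


def segment(text: str) -> List[str]:
    """Stand-in segmenter; a Chinese system would use a word segmentation toolkit."""
    return re.findall(r"\w+", text.lower())


def build_sensitive_word_library(
        keyword_library: Sequence[str],
        embed: Callable[[str], Vector],
        is_negative: Callable[[Vector], bool]) -> List[str]:
    # Segment every record, keeping each distinct word once.
    words = {w for record in keyword_library for w in segment(record)}
    # Convert words to vectors and keep those the classifier flags as negative.
    # `is_negative` is a placeholder for the trained BiLSTM of claim 3.
    return sorted(w for w in words if is_negative(embed(w)))
```

With `embed = lambda w: [scores.get(w, 0.0)]` over a toy table in which "fraud" and "scandal" score below zero, and `is_negative = lambda v: v[0] < 0`, the records `"Bank fraud scandal"` and `"Bank growth"` yield `["fraud", "scandal"]`.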
4. The public opinion monitoring method according to claim 1, wherein the step S4 includes:
S41: configuring a data acquisition expression, and merging the keyword library and the sensitive word library into a combined word library;
S42: searching for a related public opinion news list with the combined word library;
S43: adding the web address of the current news page of the related public opinion news list to a to-be-collected list;
S44: taking the web address from the to-be-collected list and accessing the related information of the current news page to form initial public opinion data;
S45: if the initial public opinion data satisfies both integrity and uniqueness, proceeding to step S46; otherwise, proceeding to step S47;
S46: outputting the initial public opinion data as the final public opinion data;
S47: judging whether the current news page is the last page of the related public opinion news list; if so, returning to step S46; otherwise, returning to step S43.
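The collection loop of steps S41 to S47 can be sketched as follows. The sketch assumes that integrity means a fixed set of non-empty fields (`REQUIRED_FIELDS` is hypothetical), that uniqueness is judged by page URL, and that `search_news` and `fetch_page` are caller-supplied stand-ins for the data acquisition expression and page access. It also assumes that collection for a search term ends once the last page of its news list has been visited.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]
# Hypothetical integrity rule: every one of these fields must be non-empty.
REQUIRED_FIELDS = ("url", "title", "content", "publish_time")


def collect_final_data(
        keywords: Iterable[str],
        sensitive_words: Iterable[str],
        search_news: Callable[[str], List[str]],
        fetch_page: Callable[[str], Optional[Record]]) -> List[Record]:
    # S41: merge both libraries into one combined word library (order kept, duplicates dropped).
    combined = list(dict.fromkeys([*keywords, *sensitive_words]))
    final: List[Record] = []
    seen_urls = set()
    for term in combined:
        # S42/S43: search the news list for this term and queue its page URLs.
        to_collect = list(search_news(term))
        # S44-S47: visit each queued page; the loop ends after the last page.
        for url in to_collect:
            record = fetch_page(url)  # initial public opinion data
            if record is None:
                continue
            complete = all(record.get(field) for field in REQUIRED_FIELDS)
            # S45: keep the record only if it is complete and not seen before.
            if complete and record["url"] not in seen_urls:
                seen_urls.add(record["url"])
                final.append(record)  # S46: output as final public opinion data
    return final
```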
5. The public opinion monitoring method according to claim 1, wherein the step S5 includes:
processing the final public opinion data in batches to obtain a plurality of batches of public opinion data;
removing special characters and useless characters from each batch of public opinion data with regular expressions to obtain processed final public opinion data;
performing a data feature extraction operation on the processed final public opinion data to obtain a feature extraction result;
and outputting the feature extraction result as the preprocessing result.
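Claim 5 does not name the cleaning rules or the extracted features. The Python sketch below assumes a fixed batch size, regular expressions that strip HTML tags, URLs and non-word characters (Python's `\w` is Unicode-aware, so Chinese characters survive), and bag-of-words counts as a stand-in for the feature extraction step. The function names and patterns are hypothetical.

```python
import re
from collections import Counter
from typing import Iterator, List, Sequence

# Hypothetical cleaning rules, applied in this order.
TAG_RE = re.compile(r"<[^>]+>")        # HTML tags
URL_RE = re.compile(r"https?://\S+")   # web addresses
SPECIAL_RE = re.compile(r"[^\w\s]")    # punctuation and other special characters
SPACE_RE = re.compile(r"\s+")          # runs of whitespace


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def clean(text: str) -> str:
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def preprocess(final_data: Sequence[str], batch_size: int = 100) -> List[Counter]:
    features: List[Counter] = []
    for batch in batches(final_data, batch_size):
        for text in batch:
            # Stand-in feature extraction: lower-cased bag-of-words counts.
            features.append(Counter(clean(text).lower().split()))
    return features
```

Tags are removed before URLs so that a URL immediately followed by a closing tag does not swallow the tag into the URL match.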
6. The public opinion monitoring method according to any one of claims 1-5, wherein the public opinion analysis processing comprises general statistical analysis, keyword extraction, heat calculation and multimodal sentiment analysis.
7. The public opinion monitoring method according to claim 6, wherein the heat calculation comprises calculating the heat index of a single data source and the heat index over a plurality of data sources, the heat index over the plurality of data sources being calculated as follows:
where H is the heat value, $H_i$ is the aggregated heat index of all final public opinion data from the $i$-th related data source, and $W_i$ is the heat weight of that data source;
the heat index x of a single related data source is calculated as follows:
where E is the user attention index of each related data source, and $T_s$ is the freshness of the related public opinion news, computed from the release time A, the acquisition time B, and T, the total number of seconds in a 3-day heat cycle (3 × 86,400 = 259,200 s).
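Claim 7 names the variables, but the formulas themselves did not survive extraction. The Python sketch below assumes the simplest forms consistent with those definitions: $H = \sum_i W_i H_i$; $H_i$ is the sum of the single-record heat $x = E \cdot T_s$ over a source's records; and freshness decays linearly over one heat cycle, $T_s = \max(0, 1 - (B - A)/T)$. All three forms, and the function names, are assumptions rather than the patent's formulas.

```python
from typing import Dict, List, Tuple

HEAT_CYCLE_SECONDS = 3 * 24 * 60 * 60  # T: a 3-day heat cycle = 259,200 s

# One collected item: (E, A, B) = (attention index, release time, acquisition time),
# with both timestamps in seconds.
Item = Tuple[float, float, float]


def freshness(release_ts: float, acquire_ts: float,
              cycle: float = HEAT_CYCLE_SECONDS) -> float:
    """Assumed T_s: falls linearly from 1 to 0 over one heat cycle."""
    elapsed = acquire_ts - release_ts  # B - A
    return max(0.0, 1.0 - elapsed / cycle)


def single_source_heat(items: List[Item]) -> float:
    """Assumed H_i: sum of x = E * T_s over one source's items."""
    return sum(e * freshness(a, b) for e, a, b in items)


def overall_heat(per_source: Dict[str, List[Item]], weights: Dict[str, float]) -> float:
    """Assumed H = sum_i W_i * H_i."""
    return sum(weights[name] * single_source_heat(items)
               for name, items in per_source.items())
```

For example, an item with attention index 100 collected 1.5 days after release has $T_s = 0.5$ and $x = 50$; with weights 0.7 and 0.3 and a second source contributing one fresh item with $E = 10$, the overall heat is $0.7 \cdot 50 + 0.3 \cdot 10 = 38$.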
8. The public opinion monitoring method according to claim 6, wherein the multimodal sentiment analysis comprises:
acquiring the picture features and text features from the preprocessing result;
training a picture-text alignment network on the picture features and text features to obtain a trained picture-text alignment network;
feeding the picture features and text features into the trained picture-text alignment network to obtain fused features;
using the fused features as the input of a classifier to obtain a multimodal sentiment analysis result;
the loss function of the multimodal sentiment analysis model is as follows:
$L = L_{CA} - L_{DA}$
where $L_{CA}$ is the cross-reconstruction loss, $M$ is the number of samples, $x_j$ denotes the original features of modality $j$, $D_j$ denotes the decoder of modality $j$, $E_i$ denotes the encoder of modality $i$, and $x_i$ denotes the original features of modality $i$; $L_{DA}$ is the distribution-alignment loss, $W_{ij}$ is the 2-Wasserstein distance between modalities $i$ and $j$, and $\mu$ and $\sigma$ are the hidden-layer feature vectors generated by the encoders.
9. The public opinion monitoring method according to claim 8, wherein the picture-text alignment network comprises a picture feature encoder, a text feature encoder, a shared feature layer and a plurality of shared feature decoders, wherein the picture feature encoder and the text feature encoder are both connected to the input end of the shared feature layer, the shared feature decoders are connected to the output end of the shared feature layer, and the shared feature layer is further connected to a classifier;
the picture feature encoder is used for encoding the picture features;
the text feature encoder is used for encoding the text features;
the shared feature decoders are used for decoding the shared features to output reconstructed picture features and reconstructed text features;
the classifier is used for classifying the shared features so as to train the picture-text alignment network.
10. A public opinion monitoring system based on the public opinion monitoring method according to any one of claims 1 to 9, wherein the public opinion monitoring system comprises:
the keyword acquisition module is used for acquiring keywords;
the keyword expansion module is used for expanding the keywords;
the sensitive word extraction module is used for extracting sensitive words from the keyword library;
the public opinion data acquisition module is used for collecting the final public opinion data for the keyword library and the sensitive word library;
the data preprocessing module is used for preprocessing the final public opinion data;
the public opinion analysis module is used for analyzing the preprocessing result;
and the public opinion reporting module is used for displaying the public opinion monitoring result to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210047264.7A CN115017302A (en) | 2022-01-17 | 2022-01-17 | Public opinion monitoring method and public opinion monitoring system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210047264.7A CN115017302A (en) | 2022-01-17 | 2022-01-17 | Public opinion monitoring method and public opinion monitoring system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115017302A (en) | 2022-09-06 |
Family
ID=83067426
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210047264.7A Withdrawn CN115017302A (en) | 2022-01-17 | 2022-01-17 | Public opinion monitoring method and public opinion monitoring system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115017302A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115982473A (en) * | 2023-03-21 | 2023-04-18 | 环球数科集团有限公司 | AIGC-based public opinion analysis arrangement system |
CN117131281A (en) * | 2023-10-26 | 2023-11-28 | 中关村科学城城市大脑股份有限公司 | Public opinion event processing method, apparatus, electronic device and computer readable medium |
CN117131281B (en) * | 2023-10-26 | 2024-02-09 | 中关村科学城城市大脑股份有限公司 | Public opinion event processing method, apparatus, electronic device and computer readable medium |
Similar Documents
Publication | Title |
---|---|
Ishmam et al. | Hateful speech detection in public facebook pages for the bengali language | |
Gupta et al. | A survey of text question answering techniques | |
CN104615593B (en) | Hot microblog topic automatic testing method and device | |
US8161059B2 (en) | Method and apparatus for collecting entity aliases | |
CN111950273A (en) | Network public opinion emergency automatic identification method based on emotion information extraction analysis | |
Suleiman et al. | Comparative study of word embeddings models and their usage in Arabic language applications | |
Geçkil et al. | A clickbait detection method on news sites | |
Khasawneh et al. | Arabic sentiment polarity identification using a hybrid approach | |
CN115017302A (en) | Public opinion monitoring method and public opinion monitoring system | |
KR101059557B1 (en) | Computer-readable recording media containing information retrieval methods and programs capable of performing the information | |
Faruque et al. | Ascertaining polarity of public opinions on Bangladesh cricket using machine learning techniques | |
CN111813874B (en) | Terahertz knowledge graph construction method and system | |
CN109815401A (en) | A kind of name disambiguation method applied to Web people search | |
Mouty et al. | The effect of the similarity between the two names of twitter users on the credibility of their publications | |
CN116910238A (en) | Knowledge perception false news detection method based on twin network | |
Saghayan et al. | Exploring the impact of machine translation on fake news detection: A case study on persian tweets about covid-19 | |
Rehman et al. | User-aware multilingual abusive content detection in social media | |
CN114255067A (en) | Data pricing method and device, electronic equipment and storage medium | |
Campbell et al. | Content+ context networks for user classification in twitter | |
CN109871429B (en) | Short text retrieval method integrating Wikipedia classification and explicit semantic features | |
Gohil et al. | Multilabel classification for emotion analysis of multilingual tweets | |
De Saa et al. | Self-reflective and introspective feature model for hate content detection in sinhala youtube videos | |
Wadawadagi et al. | A multi-layer approach to opinion polarity classification using augmented semantic tree kernels | |
TWI534640B (en) | Chinese network information monitoring and analysis system and its method | |
Alashri et al. | Lexi-augmenter: Lexicon-based model for tweets sentiment analysis |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20220906 |