CN115017302A - Public opinion monitoring method and public opinion monitoring system - Google Patents

Public opinion monitoring method and public opinion monitoring system

Info

Publication number
CN115017302A
Authority
CN
China
Prior art keywords
public opinion
data
keyword
public
text
Prior art date
Legal status
Withdrawn
Application number
CN202210047264.7A
Other languages
Chinese (zh)
Inventor
李响
杨国武
李蒍韦
侯柏成
Current Assignee
Yellow River Conservancy Technical Institute
Original Assignee
Yellow River Conservancy Technical Institute
Priority date
Filing date
Publication date
Application filed by Yellow River Conservancy Technical Institute
Priority to CN202210047264.7A
Publication of CN115017302A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a public opinion monitoring method and a public opinion monitoring system. The method comprises the following steps: S1: acquiring a keyword; S2: performing a keyword expansion operation on the keyword to obtain a keyword library; S3: extracting sensitive words from the keyword library to obtain a sensitive word library; S4: collecting final public opinion data for the keyword library and the sensitive word library; S5: preprocessing the final public opinion data to obtain a preprocessing result; S6: performing public opinion analysis on the preprocessing result to obtain an analysis result; S7: obtaining a public opinion monitoring result from the analysis result. The method and system provided by the invention can effectively improve the comprehensiveness and accuracy of public opinion data.

Description

Public opinion monitoring method and public opinion monitoring system
Technical Field
The invention relates to the technical field of public opinion monitoring, in particular to a public opinion monitoring method and a public opinion monitoring system.
Background
In the information age, the internet has become an important channel and carrier for transmitting information. Social media built on internet technology is widely used in daily life, and people have shifted the focus of their information gathering to online social platforms. In the big-data era, however, the volume of data in online media keeps growing, its forms are becoming more diverse, and information spreads ever faster. These changes pose a considerable challenge to public opinion monitoring, which is expected to be both accurate and fast to respond. Existing public opinion monitoring systems have the following problems:
1. The collected data comes from few sources and the information is incomplete. Most online public opinion monitoring systems collect and analyze data from a single website only. Today, however, social platforms and online media are many and varied; each platform has its own user characteristics, and the public opinion data of each has its own value. Collecting public opinion data through multiple channels reflects the state of online public opinion more comprehensively and accurately.
2. Public opinion retrieval is inaccurate. In the big-data setting, online public opinion data is huge in scale and the information is complex. Some monitoring systems collect information directly with the keywords the user provides. However, words have many near-synonyms, internet slang changes over time, and words and sentences can be ambiguous, so retrieval with the initial keywords alone returns mostly information in the ordinary sense and cannot obtain the most comprehensive public opinion data or the sensitive public opinion the user cares about.
3. Text-only emotion analysis handles irony, ambiguous sentences and similar problems poorly. Most current emotion analysis methods analyze text alone, and they cannot identify ironic sentences well because, taken out of context, an ironic sentence differs little from a normal one. In today's online environment, pictures such as emoji and emoticons have become an important supplement for expressing emotion and deserve attention in the field of emotion analysis.
Disclosure of Invention
The invention aims to provide a public opinion monitoring method and a public opinion monitoring system, which can effectively improve the comprehensiveness and accuracy of public opinion data.
The technical scheme for solving the technical problems is as follows:
The invention provides a public opinion monitoring method, which comprises the following steps:
S1: acquiring a keyword input by a user;
S2: performing a keyword expansion operation on the keyword to obtain a keyword library;
S3: extracting sensitive words from the keyword library to obtain a sensitive word library;
S4: collecting final public opinion data for the keyword library and the sensitive word library;
S5: performing a preprocessing operation on the final public opinion data to obtain a preprocessing result;
S6: performing public opinion analysis processing on the preprocessing result to obtain an analysis result;
S7: obtaining a public opinion monitoring result according to the analysis result.
Optionally, step S2 includes:
searching related data sources with the keyword to obtain a plurality of pieces of data information matching the keyword;
obtaining the keyword library from all the data information.
Optionally, step S3 includes:
performing a word segmentation operation on all data in the keyword library with a word segmentation toolkit to obtain a word segmentation database;
converting all the word segmentation data into word vector information;
extracting negative words from the word segmentation database with a BiLSTM model according to the word vector information;
taking the negative words as sensitive words to obtain the sensitive word library.
Optionally, step S4 includes:
S41: configuring a data collection expression, and merging the keyword library and the sensitive word library into a combined word library;
S42: searching for a related public opinion news list with the combined word library;
S43: adding the web address of the current news page of the related public opinion news list to a to-be-collected list;
S44: taking the web address from the to-be-collected list and accessing the related information of the current news page to form initial public opinion data;
S45: if the initial public opinion data satisfies both integrity and uniqueness, proceeding to step S46; otherwise, proceeding to step S47;
S46: outputting the initial public opinion data as the final public opinion data;
S47: judging whether the current news page is the last page of the related public opinion news list; if so, returning to step S46; otherwise, returning to step S43.
Optionally, step S5 includes:
processing the final public opinion data in batches to obtain multiple batches of public opinion data;
removing special characters and useless characters from each batch of public opinion data with a regular expression to obtain processed final public opinion data;
performing a data feature extraction operation on the processed final public opinion data to obtain a feature extraction result;
outputting the feature extraction result as the preprocessing result.
Optionally, the public opinion analysis operation includes: general statistical analysis, keyword extraction, heat calculation and multi-modal emotion analysis.
Optionally, the heat calculation includes a heat index calculation for a single data source and a heat index calculation for a plurality of data sources. The heat index of a plurality of data sources is calculated as:
$$H = \sum_{i} W_i H_i$$
where $H$ is the heat value, $H_i$ is the combined heat index of all final public opinion data from the $i$-th related data source, and $W_i$ is the heat weight of the $i$-th related data source;
the heat index x of a single related data source is calculated by the formula:
Figure BDA0003472514210000041
where E is the user attention index of each related data source, T_s reflects the freshness of the related public opinion news and is determined by the release time A and the acquisition time B, and T is the total number of seconds in a 3-day heat cycle, T = 259200.
Optionally, the multi-modal emotion analysis comprises:
acquiring picture features and text features from the preprocessing result;
training a picture-text alignment network on the picture features and the text features to obtain a trained picture-text alignment network;
obtaining fusion features from the picture features and the text features with the trained picture-text alignment network;
taking the fusion features as the input of a classifier to obtain a multi-modal emotion analysis result;
the loss function of the multi-modal emotion analysis model is:
$$L = L_{CA} - L_{DA}$$
where $L_{CA}$ is the cross-reconstruction loss and
Figure BDA0003472514210000042
$M$ is the number of samples, $x_j$ represents the original features of modality $j$, $D_j$ represents the decoder of modality $j$, $E_i$ represents the encoder of modality $i$, $x_i$ represents the original features of modality $i$, and $L_{DA}$ is the distribution alignment loss and
Figure BDA0003472514210000043
$W_{ij}$ is the 2-Wasserstein distance between modalities $i$ and $j$ and
Figure BDA0003472514210000044
where $\mu$ and
Figure BDA0003472514210000045
are both hidden-layer feature vectors generated by the encoders.
Optionally, the picture-text alignment network comprises: a picture feature encoder, a text feature encoder, a shared feature layer and a plurality of shared feature decoders, wherein the picture feature encoder and the text feature encoder are both connected to the input of the shared feature layer, the shared feature decoders are connected to the output of the shared feature layer, and the shared feature layer is also connected to a classifier;
the picture feature encoder is used for encoding the picture features;
the text feature encoder is used for encoding the text features;
the plurality of shared feature decoders are used for decoding the shared features to output reconstructed picture features and reconstructed text features;
the classifier is used for classifying the shared features so as to train the picture-text alignment network.
The invention also provides a public opinion monitoring system based on the public opinion monitoring method, and the public opinion monitoring system comprises:
the keyword acquisition module is used for acquiring keywords;
the keyword expansion module is used for expanding the keywords;
the sensitive word extraction module is used for extracting sensitive words in the keyword library;
the public opinion data acquisition module is used for acquiring final public opinion data of the keyword library and the sensitive word library;
the data preprocessing module is used for preprocessing the final public opinion data;
the public opinion analysis module is used for analyzing the preprocessing result;
and the public opinion reporting module is used for displaying the public opinion monitoring result to the user.
The invention has the following beneficial effects:
1. The method can improve the accuracy of emotion analysis on public opinion;
2. Keyword expansion and sensitive word extraction are combined to form new search terms, so that the sensitive public opinion the user cares about can be searched effectively and comprehensively;
3. Information is collected from a plurality of related data sources and an extensible data collection interface is provided, which addresses the problem that public opinion monitoring systems collect data from few sources and obtain incomplete information.
Drawings
FIG. 1 is a flowchart of the public opinion monitoring method provided by the present invention;
FIG. 2 is a flowchart of step S4 in FIG. 1;
FIG. 3 is a schematic structural diagram of the multi-modal emotion analysis model provided by the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Examples
Referring to FIG. 1, the invention provides a public opinion monitoring method comprising the following steps:
S1: acquiring a keyword input by a user;
S2: performing a keyword expansion operation on the keyword to obtain a keyword library;
Step S2 includes:
searching related data sources with the keyword to obtain a plurality of pieces of data information matching the keyword;
Here, the related data sources include, but are not limited to, Weibo (microblog), Toutiao (Today's Headlines), internet news and Tencent News; the search includes extracting keywords from the retrieved articles with the TF-IDF algorithm and the TextRank algorithm.
TF-IDF is a weighting technique commonly used in information retrieval and text mining. It is a statistical method for evaluating how important a word is to one document in a document set or corpus. TF is the frequency with which a term occurs in a text:
$$TF_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where $TF_{i,j}$ is the term frequency of word $i$ in document $j$, $n_{i,j}$ is the number of times word $i$ appears in document $j$, and $\sum_{k} n_{k,j}$ is the total number of occurrences of all words in document $j$.
$$IDF_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
IDF is the inverse document frequency, $|D|$ is the total number of documents in the corpus, and the denominator $|\{j : t_i \in d_j\}|$ is the number of documents that contain the word $t_i$. The TF-IDF value of a word $i$ for a text $j$ is calculated as:
$$\text{TF-IDF}_{i,j} = TF_{i,j} \times IDF_i$$
A word that occurs frequently in a particular document but appears in few documents of the whole set receives a high TF-IDF weight, so common words are filtered out and important words are kept, which is how keywords are extracted.
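The TF-IDF weighting described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes the documents are already segmented into word lists, uses the natural logarithm for IDF, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: TF-IDF weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())  # all word occurrences in this document
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

# Hypothetical, already-segmented documents.
docs = [
    ["flood", "warning", "river", "flood"],
    ["river", "tourism", "festival"],
    ["river", "flood", "relief"],
]
scores = tf_idf(docs)
top_word = max(scores[0], key=scores[0].get)
```

In this sample, "river" occurs in every document, so its IDF and therefore its weight are zero, which is the filtering of common words that the text describes.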
TextRank is an improvement on the idea of PageRank. It is a graph-based ranking algorithm for keyword extraction and document summarization: it can extract the keywords and key phrases of a given text using the co-occurrence information between words in the document, and it can extract key sentences of the text for extractive automatic summarization. The basic idea of TextRank is to treat a document as a network of words in which the links represent semantic relationships between words. The formula is:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$
where $WS(V_i)$ is the weight of sentence $i$, and the summation on the right is the contribution of each adjacent sentence to it. Within a single document all sentences can roughly be considered adjacent, so there is no need to generate and extract multiple windows as for multiple documents; a single document window suffices. $w_{ji}$ is the similarity of the two sentences $j$ and $i$, $WS(V_j)$ is the weight of sentence $j$ from the previous iteration, and $d$ is the damping coefficient, typically 0.85.
As the above shows, TF-IDF is suited to extracting rare words: it looks for words that are frequent within an article but infrequent across the corpus, which makes it good at finding characteristic terms. TextRank, by contrast, is a simple method that extracts the keywords of an article with a graph algorithm and does not rely on a corpus, which makes it good at finding conventional keywords. The two methods are used together so that both the conventional keywords and the special terms of the relevant field are found and used to expand the keyword library.
And obtaining the keyword library according to all the data information.
S3: extracting sensitive words from the keyword library to obtain a sensitive word library;
Optionally, step S3 includes:
performing a word segmentation operation on all data in the keyword library with a word segmentation toolkit to obtain a word segmentation database; the invention uses the jieba word segmentation toolkit.
Converting all the word segmentation data into word vector information;
because a machine cannot process the segmented words directly, they are converted into word vectors for machine processing.
Extracting negative words from the word segmentation database with a BiLSTM model according to the word vector information; the BiLSTM emotion analysis model performs emotion analysis on the word segmentation results and computes an emotion score in [-1, 1] for each word: the closer the score is to -1, the more likely the word is negative, and the closer it is to 1, the more likely the word is positive. The words are then sorted in ascending order of emotion polarity score, and the first m negative words are taken out and added to the sensitive word library.
Taking the negative words as sensitive words to obtain the sensitive word library.
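The selection step described above (sort by emotion score in ascending order, keep the first m negative words) can be sketched as below. The scores are hypothetical stand-ins for the output of the BiLSTM model, and treating "negative" as a score below zero is an assumption, since the text does not state a threshold.

```python
def select_sensitive_words(scores, m):
    """Return up to m negative words, most negative first.

    scores maps each word to an emotion polarity score in [-1, 1].
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    # Assumption: only words scoring below zero count as negative.
    return [word for word, score in ranked[:m] if score < 0]

# Hypothetical model output.
word_scores = {"collapse": -0.9, "delay": -0.4, "opening": 0.6, "rumor": -0.7}
sensitive = select_sensitive_words(word_scores, m=2)
```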
S4: collecting final public opinion data for the keyword library and the sensitive word library;
The final public opinion data includes the subject of the article, the publishing time, the full text of the content, the number of reposts, the number of comments, the number of likes, and the publishing user's authentication information, level, region and other information (where available).
Specifically, referring to FIG. 2, step S4 includes:
S41: configuring a data collection expression, and merging the keyword library and the sensitive word library into a combined word library; here, the data collection expression is mainly a CSS expression or an XPath expression.
S42: searching for a related public opinion news list with the combined word library;
S43: adding the web address of the current news page of the related public opinion news list to a to-be-collected list;
S44: taking the web address from the to-be-collected list and accessing the related information of the current news page to form initial public opinion data;
here, the related information includes the title, the release time, the body text and other information of the extracted article.
S45: if the initial public opinion data satisfies both integrity and uniqueness, proceeding to step S46; otherwise, proceeding to step S47;
Integrity and uniqueness mean the following: for example, when a news article is extracted, the data is incomplete and is discarded if the title or the content of the article is missing; and if the extracted data already exists in the database (duplicate data), it is not stored.
S46: outputting the initial public opinion data as the final public opinion data;
S47: judging whether the current news page is the last page of the related public opinion news list; if so, returning to step S46; otherwise, returning to step S43.
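The integrity and uniqueness checks of steps S45 to S47 can be sketched as a filter over already-fetched pages. This is a simplified illustration, not the patent's crawler: network access and CSS/XPath extraction are left out, and the field names (`title`, `content`, `url`), the use of the URL as the uniqueness key, and the function names are all assumptions.

```python
REQUIRED_FIELDS = ("title", "content")  # assumed field names

def is_complete(record):
    """Integrity check: every required field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)

def collect(pages, stored_urls):
    """Walk the news list page by page and keep complete, unseen records."""
    final_data = []
    for page in pages:                     # S47: continue until the last page
        for record in page:                # S43/S44: visit each collected address
            url = record.get("url")
            if is_complete(record) and url not in stored_urls:  # S45
                stored_urls.add(url)
                final_data.append(record)  # S46
    return final_data

# Hypothetical pages of extracted records.
pages = [
    [{"url": "u1", "title": "Flood alert", "content": "River level rising."},
     {"url": "u2", "title": "", "content": "Missing title."}],
    [{"url": "u1", "title": "Flood alert", "content": "River level rising."},
     {"url": "u3", "title": "Relief update", "content": "Supplies delivered."}],
]
final_data = collect(pages, stored_urls=set())
```

The record without a title fails the integrity check and the repeated `u1` fails the uniqueness check, so only `u1` and `u3` are kept.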
S5: performing a preprocessing operation on the final public opinion data to obtain a preprocessing result;
Optionally, step S5 includes:
processing the final public opinion data in batches to obtain multiple batches of public opinion data;
removing special characters and useless characters from each batch of public opinion data with a regular expression to obtain processed final public opinion data;
performing a data feature extraction operation on the processed final public opinion data to obtain a feature extraction result; for text data, the open-source ALBERT Chinese pre-trained model is used to extract 768-dimensional semantic vectors, and for picture data, the open-source ResNet101 pre-trained model is used to extract 2048-dimensional picture feature vectors.
Outputting the feature extraction result as the preprocessing result.
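The batching and regular-expression cleaning in step S5 can be sketched as follows. The text does not say which characters count as special or useless, so the pattern here (HTML tags, URLs, and symbols other than word characters and common punctuation) is an assumption, as are the names `clean_text` and `batches`. Feature extraction with ALBERT and ResNet101 is outside this sketch.

```python
import re

# Assumed noise: HTML tags, URLs, and symbols outside word characters,
# whitespace and common Chinese/English punctuation.
_NOISE = re.compile(r"<[^>]+>|https?://\S+|[^\w\s，。！？、；：,.!?;:]")
_SPACES = re.compile(r"\s+")

def clean_text(text):
    """Replace noise with spaces, then collapse runs of whitespace."""
    return _SPACES.sub(" ", _NOISE.sub(" ", text)).strip()

def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

raw = ["<b>Flood</b> alert!! http://example.com/x  #river#", "OK.", "  ★ fine "]
cleaned = [clean_text(text) for batch in batches(raw, 2) for text in batch]
```

Python's `\w` matches Chinese characters as well as Latin letters and digits, so Chinese text survives this cleaning.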
S6: performing public opinion analysis processing on the preprocessing result to obtain an analysis result;
Optionally, the public opinion analysis operation includes: general statistical analysis, keyword extraction, heat calculation and multi-modal emotion analysis.
The general statistical analysis covers the total amount of public opinion data related to the keyword and similar keywords, the share of public opinion data from each website, statistics of public opinion data volume by time period, and the regional distribution of where the public opinion data was published;
The keyword extraction works as follows: one keyword search may return several related public opinion events, and one public opinion event comprises multiple pieces of public opinion data. Analyzing each event on its own is also crucial to public opinion analysis. The invention uses the TF-IDF algorithm and the TextRank algorithm to extract public opinion keywords from the related events; these keywords are used to build the public opinion keyword cloud and to calculate per-keyword heat in the public opinion report module.
The heat calculation works as follows: public opinion heat is calculated separately along three dimensions: by search term, by event under a search term, and by keyword within an event under a search term. The calculation takes the algorithm of the traditional social media site Reddit as its basis. Different heat calculation methods are designed for social media public opinion data (Weibo) and for network media public opinion data (Toutiao, Tencent News and internet news) to calculate the heat index of each platform; each platform is then given a weight, and the heat indexes of all platforms are multiplied by their weights and summed to obtain the multi-source heat index of the public opinion related to the keyword.
Optionally, the heat calculation includes a heat index calculation for a single data source and a heat index calculation for a plurality of data sources:
the heat index x of a single related data source is calculated by the formula:
Figure BDA0003472514210000101
where E is the user attention index of each related data source, T_s reflects the freshness of the related public opinion news and is determined by the release time A and the acquisition time B, and T is the total number of seconds in a 3-day heat cycle, T = 259200.
When the single related data source is Weibo, E = user type weight × (6 × number of reposts + 3 × number of comments + 1 × number of likes). The user types and their weights are: 1 for common users, 1 for microblog girls, 1.5 for microblog users, 2 for landers, 4 for blue V, 4 for yellow V, 4 for gold V and 10 for gold V. Using log10 also lets earlier interactions carry more weight.
News network media do not classify users in as much detail as Weibo does, but the heat index is calculated on a similar principle, so the calculation steps are the same as for the Weibo heat index except for the calculation of E.
Based on the characteristics of the analyzed data, E in the network media heat calculation formula is: E = 8 × number of reposts + 5 × number of comments + 2 × number of likes.
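The two attention-index formulas above translate directly into code. This sketch covers only E, not the full single-source heat index, whose formula is not reproduced in this text. The function and parameter names are hypothetical, and reading both the "praise number" and the "vote number" as a like count is an assumption.

```python
def weibo_attention(user_weight, reposts, comments, likes):
    """E for Weibo: user-type weight times the weighted interaction count."""
    return user_weight * (6 * reposts + 3 * comments + 1 * likes)

def news_attention(reposts, comments, likes):
    """E for network news media, which has no user-type weight."""
    return 8 * reposts + 5 * comments + 2 * likes

# Hypothetical post: 10 reposts, 20 comments, 30 likes, user-type weight 1.5.
e_weibo = weibo_attention(1.5, reposts=10, comments=20, likes=30)
e_news = news_attention(reposts=10, comments=20, likes=30)
```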
The heat index of a plurality of data sources is calculated as:
$$H = \sum_{i} W_i H_i$$
where $H$ is the heat value, $H_i$ is the combined heat index of all final public opinion data from the $i$-th related data source, and $W_i$ is the heat weight of the $i$-th related data source.
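The multi-source heat index is a weighted sum: each platform's heat index is multiplied by that platform's weight and the products are added. A minimal sketch follows; the platform names, heat values and weights are hypothetical, since the text does not give the weights it uses.

```python
def multi_source_heat(source_heat, source_weight):
    """Weighted sum of per-source heat indexes: H = sum of W_i * H_i."""
    return sum(source_weight[name] * heat for name, heat in source_heat.items())

# Hypothetical per-platform heat indexes and weights.
heat = {"weibo": 12.0, "news": 8.0}
weights = {"weibo": 0.75, "news": 0.25}
total_heat = multi_source_heat(heat, weights)
```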
The multi-modal emotion analysis works as follows: because data from different modalities (pictures and text) differ in domain, fusing features across modalities is a difficult problem. The emotion analysis module builds an encoder and a decoder for the picture data and for the text data respectively, aligns the picture and text features following the VAE idea, and fuses them into a single hidden layer that serves as a shared feature representation layer. The fusion features produced by this layer are used to train an emotion classifier for public opinion data, which finally yields the emotion tendency score of the multi-modal public opinion data.
The loss function of the multi-modal emotion analysis model is:
$$L = L_{CA} - L_{DA}$$
where $L_{CA}$ is the cross-reconstruction loss and
Figure BDA0003472514210000111
$M$ is the number of samples, $x_j$ represents the original features of modality $j$, $D_j$ represents the decoder of modality $j$, $E_i$ represents the encoder of modality $i$, $x_i$ represents the original features of modality $i$, and $L_{DA}$ is the distribution alignment loss and
Figure BDA0003472514210000112
$W_{ij}$ is the 2-Wasserstein distance between modalities $i$ and $j$ and
Figure BDA0003472514210000113
where $\mu$ and
Figure BDA0003472514210000114
are the hidden-layer feature vectors generated by the encoders.
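The 2-Wasserstein distance named above has a simple closed form when each encoder output is read as a Gaussian with a diagonal covariance, as is usual for a VAE: the square root of the squared distance between the means plus the squared distance between the standard deviations. The sketch below assumes that reading; the patent's own formula is not reproduced in this text, so this may differ from it in detail, and the function name is hypothetical.

```python
import math

def wasserstein2_diag(mu_a, sigma_a, mu_b, sigma_b):
    """2-Wasserstein distance between two diagonal Gaussians.

    mu_* are mean vectors and sigma_* are standard-deviation vectors,
    assumed here to be the hidden-layer outputs of the two encoders.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)

# Hypothetical picture and text encoder outputs.
distance = wasserstein2_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
```

The distance is zero only when both the means and the standard deviations agree, which is why it can serve as a measure of how well the two modalities' hidden-layer distributions are aligned.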
Optionally, the multi-modal emotion analysis comprises:
acquiring picture features and text features from the preprocessing result;
training a picture-text alignment network on the picture features and the text features to obtain a trained picture-text alignment network;
the trained picture-text alignment network is one that fuses the semantic information of pictures and text more effectively.
Obtaining fusion features from the picture features and the text features with the trained picture-text alignment network;
taking the fusion features as the input of a classifier to obtain a multi-modal emotion analysis result.
S7: and obtaining a public opinion monitoring result according to the analysis result.
Optionally, as shown in FIG. 3, the picture-text alignment network includes: a picture feature encoder, a text feature encoder, a shared feature layer and a plurality of shared feature decoders, wherein the picture feature encoder and the text feature encoder are both connected to the input of the shared feature layer, the shared feature decoders are connected to the output of the shared feature layer, and the shared feature layer is also connected to a classifier;
the picture feature encoder is used for encoding the picture features;
the text feature encoder is used for encoding the text features;
the plurality of shared feature decoders are used for decoding the shared features to output reconstructed picture features and reconstructed text features;
the classifier is used for classifying the shared features so as to train the picture-text alignment network.
The invention also provides a public opinion monitoring system based on the public opinion monitoring method, and the public opinion monitoring system comprises:
the keyword acquisition module is used for acquiring keywords;
the keyword expansion module is used for expanding the keywords;
the sensitive word extraction module is used for extracting sensitive words in a keyword library;
the public opinion data acquisition module is used for acquiring final public opinion data of the keyword library and the sensitive word library;
the data preprocessing module is used for preprocessing the final public opinion data;
the public opinion analysis module is used for analyzing the preprocessing result;
and the public opinion reporting module is used for displaying the public opinion monitoring result to the user.
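The module list above can be read as a seven-stage pipeline matching steps S1 to S7. The following is a minimal sketch, assuming every module is a plain callable supplied by the caller; the class name `PublicOpinionPipeline` and all stage signatures are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PublicOpinionPipeline:
    """Hypothetical wiring of the seven modules; every stage is injected."""
    acquire_keywords: Callable[[], List[str]]               # S1
    expand_keywords: Callable[[List[str]], List[str]]       # S2
    extract_sensitive: Callable[[List[str]], List[str]]     # S3
    collect: Callable[[List[str], List[str]], List[dict]]   # S4
    preprocess: Callable[[List[dict]], Any]                 # S5
    analyze: Callable[[Any], Any]                           # S6
    report: Callable[[Any], Any]                            # S7

    def run(self) -> Any:
        # Each stage consumes the output of the previous one.
        keywords = self.acquire_keywords()
        keyword_library = self.expand_keywords(keywords)
        sensitive_library = self.extract_sensitive(keyword_library)
        final_data = self.collect(keyword_library, sensitive_library)
        preprocessed = self.preprocess(final_data)
        analysis = self.analyze(preprocessed)
        return self.report(analysis)
```

Injecting the stages keeps the sketch independent of any particular crawler, segmenter or model, which the text leaves open.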
The invention has the following beneficial effects:
1. the method improves the accuracy of public opinion sentiment analysis;
2. keyword expansion and sensitive word extraction are combined to form new search terms, so that the sensitive public opinion of concern to the user can be searched effectively and comprehensively;
3. information is collected from a plurality of related data sources and an extensible data acquisition interface is provided, which addresses the problem of incomplete information when a public opinion monitoring system collects data from too few sources.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A public opinion monitoring method is characterized by comprising the following steps:
S1: acquiring a keyword input by a user;
S2: performing a keyword expansion operation on the keyword to obtain a keyword library;
S3: extracting sensitive words in the keyword library to obtain a sensitive word library;
S4: collecting final public opinion data of the keyword library and the sensitive word library;
S5: carrying out a preprocessing operation on the final public opinion data to obtain a preprocessing result;
S6: carrying out public opinion analysis processing on the preprocessing result to obtain an analysis result;
S7: and obtaining a public opinion monitoring result according to the analysis result.
2. The public opinion monitoring method according to claim 1, wherein the step S2 includes:
searching in related data sources by using the keywords to obtain a plurality of pieces of data information matched with the keywords;
and obtaining the keyword library according to all the data information.
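The expansion step in claim 2 can be sketched as a search over several sources followed by a merge. This is a minimal illustration, assuming the related data sources are available as in-memory lists of text; the function name `expand_keywords` and the `sources` mapping are hypothetical, and a real system would query search engines or site APIs instead.

```python
from typing import Dict, Iterable, List


def expand_keywords(keywords: Iterable[str],
                    sources: Dict[str, List[str]]) -> List[str]:
    """Build a keyword library from every source entry that matches a keyword.

    `sources` maps a source name to its text entries (hypothetical stand-in
    for the related data sources). Matching is a case-insensitive substring
    test, which is an assumption; the patent does not define "matched".
    """
    library: List[str] = []
    seen = set()
    for keyword in keywords:
        needle = keyword.lower()
        for entries in sources.values():
            for entry in entries:
                # Keep each matching entry once, in first-seen order.
                if needle in entry.lower() and entry not in seen:
                    seen.add(entry)
                    library.append(entry)
    return library
```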
3. The public opinion monitoring method according to claim 1, wherein the step S3 includes:
performing word segmentation operation on all data in the keyword library by using a word segmentation toolkit to obtain a word segmentation database;
converting all the word segmentation data information into word vector information;
extracting negative words in the word segmentation database by using a BiLSTM model according to the word vector information;
and taking the negative words as sensitive words to obtain the sensitive word bank.
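The flow of claim 3 (segment, classify, collect) can be sketched without committing to a specific toolkit or model. In the sketch below, `segment` stands in for the word segmentation toolkit and `is_negative` stands in for the trained BiLSTM over word vectors; both are hypothetical placeholders, and the toy versions at the bottom use whitespace splitting and a fixed word set rather than a neural model.

```python
from typing import Callable, Iterable, List


def build_sensitive_lexicon(keyword_library: Iterable[str],
                            segment: Callable[[str], List[str]],
                            is_negative: Callable[[str], bool]) -> List[str]:
    """Segment every library entry and keep the words flagged as negative.

    `segment` and `is_negative` are injected placeholders: the first for a
    segmentation toolkit, the second for the trained BiLSTM classifier.
    """
    sensitive: List[str] = []
    for entry in keyword_library:
        for word in segment(entry):
            # Negative words become sensitive words; duplicates are skipped.
            if is_negative(word) and word not in sensitive:
                sensitive.append(word)
    return sensitive


# Toy stand-ins (assumptions): whitespace segmentation, fixed negative set.
NEGATIVE_WORDS = {"collapse", "scandal", "fraud"}
toy_segment = lambda text: text.lower().split()
toy_is_negative = lambda word: word in NEGATIVE_WORDS
```

Swapping `toy_is_negative` for a model-backed predicate would leave the collection logic unchanged.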
4. The public opinion monitoring method according to claim 1, wherein the step S4 includes:
S41: configuring a data acquisition expression, and merging the keyword library and the sensitive word library into a combined word library;
S42: searching a related public opinion news list by using the combined word library;
S43: adding the webpage address of the current news page of the related public opinion news list into a list to be collected;
S44: extracting the webpage address from the list to be collected, and accessing the related information of the current news page to form initial public opinion data;
S45: if the initial public opinion data meets both integrity and uniqueness, proceeding to step S46; otherwise, proceeding to step S47;
S46: outputting the initial public opinion data as the final public opinion data;
S47: and judging whether the current news page is the last page of the related public opinion news list; if so, returning to step S46; otherwise, returning to step S43.
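A simplified reading of steps S43 to S47 is a loop that visits each queued page and keeps only records that are complete and not yet seen. The sketch below makes several assumptions that the claim does not spell out: "integrity" is taken to mean that the fields in `REQUIRED_FIELDS` are non-empty, "uniqueness" is taken to mean that the body text has not been collected before, and `fetch` is a hypothetical callable that returns a record for a URL or `None`.

```python
from typing import Callable, Dict, Iterable, List, Optional

REQUIRED_FIELDS = ("url", "title", "body")  # assumed definition of integrity


def collect_public_opinion(
        page_urls: Iterable[str],
        fetch: Callable[[str], Optional[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Visit each queued URL and keep records that are complete and unique."""
    to_collect = list(page_urls)       # S43: list to be collected
    seen_bodies = set()
    final_data: List[Dict[str, str]] = []
    for url in to_collect:             # S44: take a URL and access the page
        record = fetch(url)
        if record is None:
            continue                   # page could not be read; move on
        complete = all(record.get(field) for field in REQUIRED_FIELDS)
        unique = record.get("body") not in seen_bodies
        if complete and unique:        # S45 -> S46: keep as final data
            seen_bodies.add(record["body"])
            final_data.append(record)
        # Otherwise continue with the next page until the list ends (S47).
    return final_data
```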
5. The public opinion monitoring method according to claim 1, wherein the step S5 includes:
processing the final public opinion data in batches to obtain a plurality of batches of public opinion data;
removing special characters and useless characters from each batch of public opinion data by using a regular expression to obtain processed final public opinion data;
performing a data feature extraction operation on the processed final public opinion data to obtain a feature extraction result;
and outputting the feature extraction result as the preprocessing result.
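The batching and regular-expression cleaning in claim 5 can be sketched as follows. The cleaning rules are assumptions, since the text does not list which characters count as special or useless: this sketch drops URLs and every character that is not a word character, whitespace, or basic punctuation. The helper names `batches`, `clean_text` and `preprocess` are hypothetical, and feature extraction is left out.

```python
import re
from typing import Iterator, List

# Assumed cleaning rules; the patent does not enumerate them.
URL_RE = re.compile(r"https?://\S+")
NOISE_RE = re.compile(r"[^\w\s.,!?]")
SPACE_RE = re.compile(r"\s+")


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def clean_text(text: str) -> str:
    """Remove URLs and noise characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = NOISE_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def preprocess(texts: List[str], batch_size: int = 2) -> List[str]:
    """Clean the texts batch by batch and return them in the original order."""
    cleaned: List[str] = []
    for batch in batches(texts, batch_size):
        cleaned.extend(clean_text(t) for t in batch)
    return cleaned
```

Because `\w` is Unicode-aware for `str` patterns in Python, Chinese characters survive the noise filter.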
6. The public opinion monitoring method according to any one of claims 1-5, wherein the public opinion analysis process includes: general statistical analysis, keyword extraction, heat calculation and multi-modal emotion analysis.
7. The public opinion monitoring method according to claim 6, wherein the heat calculation includes a heat index calculation for a single data source and a heat index calculation for a plurality of data sources, and the heat index for a plurality of data sources is calculated as follows:
Figure FDA0003472514200000031
wherein H is the heat value, H_i is the aggregate heat index of all final public opinion data of the i-th related data source, and W_i is the heat weight of that related data source;
the heat index x of a single related data source is calculated by the formula:
Figure FDA0003472514200000032
wherein E is the user attention index of each related data source, T_s represents the freshness of the related public opinion news, A is the release time, B is the acquisition time, and T represents the total number of seconds in a heat cycle of 3 days.
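The two formula images of claim 7 are not reproduced in this text, so the exact expressions are unknown. The sketch below only illustrates how the named quantities could fit together and rests on three explicit assumptions: freshness decays linearly from 1 at release time A to 0 after one full cycle T, the single-source heat is the attention index E multiplied by freshness, and the multi-source heat is the weighted sum of the per-source values H_i with weights W_i. All function names are hypothetical.

```python
from typing import Sequence

HEAT_CYCLE_SECONDS = 3 * 24 * 60 * 60  # T: total seconds in a 3-day heat cycle


def freshness(release_ts: float, acquire_ts: float,
              cycle: float = HEAT_CYCLE_SECONDS) -> float:
    """Assumed linear decay: 1 at release time, 0 once a full cycle has passed."""
    age = max(0.0, acquire_ts - release_ts)   # B - A, never negative
    return max(0.0, 1.0 - age / cycle)


def item_heat(attention: float, release_ts: float, acquire_ts: float) -> float:
    """Assumed single-source heat: attention index E scaled by freshness T_s."""
    return attention * freshness(release_ts, acquire_ts)


def combined_heat(source_heats: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Assumed multi-source heat: weighted sum of H_i with weights W_i."""
    if len(source_heats) != len(weights):
        raise ValueError("one weight per data source is required")
    return sum(h * w for h, w in zip(source_heats, weights))
```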
8. The public opinion monitoring method according to claim 6, wherein the multi-modal emotion analysis comprises:
acquiring picture features and text features from the preprocessing result;
training a picture text alignment network according to the picture features and the text features to obtain a trained picture text alignment network;
according to the picture features and the text features, using the trained picture text alignment network to obtain fusion features;
taking the fusion features as input of a classifier to obtain a multi-modal emotion analysis result;
the loss function of the multi-modal emotion analysis model is as follows:
L = L_CA - L_DA
wherein L_CA is the cross-reconstruction loss and
Figure FDA0003472514200000033
m is the number of samples, x_j represents the original features of modality j, D_j represents the decoder of modality j, E_i represents the encoder of modality i, x_i represents the original features of modality i, L_DA is the distribution alignment loss and
Figure FDA0003472514200000034
W_ij is the 2-Wasserstein distance between modalities i and j and
Figure FDA0003472514200000035
wherein μ and
Figure FDA0003472514200000041
are both hidden layer feature vectors generated by the encoder.
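The two loss terms named in claim 8 can be illustrated numerically. The formula images are not reproduced in this text, so the sketch below relies on stated assumptions: the cross-reconstruction error is measured with an L1 distance, and each encoder is assumed to output a mean vector and a per-dimension standard deviation of a diagonal Gaussian, for which the 2-Wasserstein distance has a well-known closed form. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cross_reconstruction_loss(x_i: Vector, x_j: Vector,
                              encode_i: Callable[[Vector], Vector],
                              encode_j: Callable[[Vector], Vector],
                              decode_i: Callable[[Vector], Vector],
                              decode_j: Callable[[Vector], Vector]) -> float:
    """Decode each modality from the other modality's code (assumed L1 error)."""
    return (l1_distance(x_j, decode_j(encode_i(x_i)))
            + l1_distance(x_i, decode_i(encode_j(x_j))))


def wasserstein2_diag(mu_i: Vector, sigma_i: Vector,
                      mu_j: Vector, sigma_j: Vector) -> float:
    """2-Wasserstein distance between Gaussians with diagonal covariances.

    sigma_* are per-dimension standard deviations (an assumption about what
    the encoders emit).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_i, sigma_j))
    return math.sqrt(mean_term + std_term)
```

Summing `wasserstein2_diag` over all modality pairs would give one plausible form of a distribution alignment term; how the two terms are weighted and combined is left to the claim.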
9. The public opinion monitoring method according to claim 8, wherein the picture text alignment network comprises: a picture feature encoder, a text feature encoder, a shared feature layer and a plurality of shared feature decoders, wherein the picture feature encoder and the text feature encoder are both connected to the input end of the shared feature layer, the shared feature decoders are connected to the output end of the shared feature layer, and the shared feature layer is also connected to a classifier;
the picture feature encoder is used for encoding the picture features;
the text feature encoder is used for encoding the text features;
the plurality of shared feature decoders are used for decoding the shared features to output reconstructed picture features and reconstructed text features;
the classifier is used for classifying the shared features so as to train the picture text alignment network.
10. A public opinion monitoring system based on the public opinion monitoring method according to any one of claims 1 to 9, wherein the public opinion monitoring system comprises:
the keyword acquisition module is used for acquiring keywords;
the keyword expansion module is used for expanding the keywords;
the sensitive word extraction module is used for extracting sensitive words in the keyword library;
the public opinion data acquisition module is used for acquiring final public opinion data of the keyword library and the sensitive word library;
the data preprocessing module is used for preprocessing the final public opinion data;
the public opinion analysis module is used for analyzing the preprocessing result;
and the public opinion reporting module is used for displaying the public opinion monitoring result to the user.
CN202210047264.7A 2022-01-17 2022-01-17 Public opinion monitoring method and public opinion monitoring system Withdrawn CN115017302A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210047264.7A CN115017302A (en) 2022-01-17 2022-01-17 Public opinion monitoring method and public opinion monitoring system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210047264.7A CN115017302A (en) 2022-01-17 2022-01-17 Public opinion monitoring method and public opinion monitoring system

Publications (1)

Publication Number Publication Date
CN115017302A true CN115017302A (en) 2022-09-06

Family

ID=83067426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210047264.7A Withdrawn CN115017302A (en) 2022-01-17 2022-01-17 Public opinion monitoring method and public opinion monitoring system

Country Status (1)

Country Link
CN (1) CN115017302A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982473A (en) * 2023-03-21 2023-04-18 环球数科集团有限公司 AIGC-based public opinion analysis arrangement system
CN117131281A (en) * 2023-10-26 2023-11-28 中关村科学城城市大脑股份有限公司 Public opinion event processing method, apparatus, electronic device and computer readable medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982473A (en) * 2023-03-21 2023-04-18 环球数科集团有限公司 AIGC-based public opinion analysis arrangement system
CN117131281A (en) * 2023-10-26 2023-11-28 中关村科学城城市大脑股份有限公司 Public opinion event processing method, apparatus, electronic device and computer readable medium
CN117131281B (en) * 2023-10-26 2024-02-09 中关村科学城城市大脑股份有限公司 Public opinion event processing method, apparatus, electronic device and computer readable medium

Similar Documents

Publication Publication Date Title
Ishmam et al. Hateful speech detection in public facebook pages for the bengali language
Gupta et al. A survey of text question answering techniques
CN104615593B (en) Hot microblog topic automatic testing method and device
US8161059B2 (en) Method and apparatus for collecting entity aliases
CN111950273A (en) Network public opinion emergency automatic identification method based on emotion information extraction analysis
Suleiman et al. Comparative study of word embeddings models and their usage in Arabic language applications
Geçkil et al. A clickbait detection method on news sites
Khasawneh et al. Arabic sentiment polarity identification using a hybrid approach
CN115017302A (en) Public opinion monitoring method and public opinion monitoring system
KR101059557B1 (en) Computer-readable recording media containing information retrieval methods and programs capable of performing the information
Faruque et al. Ascertaining polarity of public opinions on Bangladesh cricket using machine learning techniques
CN111813874B (en) Terahertz knowledge graph construction method and system
CN109815401A (en) A kind of name disambiguation method applied to Web people search
Mouty et al. The effect of the similarity between the two names of twitter users on the credibility of their publications
CN116910238A (en) Knowledge perception false news detection method based on twin network
Saghayan et al. Exploring the impact of machine translation on fake news detection: A case study on persian tweets about covid-19
Rehman et al. User-aware multilingual abusive content detection in social media
CN114255067A (en) Data pricing method and device, electronic equipment and storage medium
Campbell et al. Content+ context networks for user classification in twitter
CN109871429B (en) Short text retrieval method integrating Wikipedia classification and explicit semantic features
Gohil et al. Multilabel classification for emotion analysis of multilingual tweets
De Saa et al. Self-reflective and introspective feature model for hate content detection in sinhala youtube videos
Wadawadagi et al. A multi-layer approach to opinion polarity classification using augmented semantic tree kernels
TWI534640B (en) Chinese network information monitoring and analysis system and its method
Alashri et al. Lexi-augmenter: Lexicon-based model for tweets sentiment analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220906