WO2019227710A1 - Network public opinion analysis method and apparatus, and computer-readable storage medium - Google Patents

Network public opinion analysis method and apparatus, and computer-readable storage medium

Info

Publication number
WO2019227710A1
Authority
WO
WIPO (PCT)
Prior art keywords
public opinion
viewpoint
vocabulary set
word vector
model
Prior art date
Application number
PCT/CN2018/102116
Other languages
French (fr)
Chinese (zh)
Inventor
吴壮伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019227710A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Definitions

  • The present application relates to the field of computer technology, and in particular to a network public opinion analysis method and device, and a computer-readable storage medium.
  • Network public opinion refers to the differing views on social events that the public expresses on the Internet, and is one manifestation of social public opinion. It is carried mainly by the Internet and centers on an event, covering the public's expression, dissemination, and interaction of emotions, attitudes, and opinions about the event, as well as their subsequent influence.
  • The present application provides a network public opinion analysis method, device, and computer-readable storage medium, the main purpose of which is to improve the ability to monitor public opinion and give early warning.
  • The present application also provides a method for analyzing network public opinion, which includes:
  • inputting the word vector of the viewpoint into a pre-trained sentiment scoring model and outputting the sentiment score of the viewpoint;
  • calculating the popularity (heat) of the viewpoint according to the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel;
  • The present application also provides a network public opinion analysis device.
  • The device includes a memory and a processor. The memory stores a public opinion analysis program that can run on the processor, and when the program is executed by the processor, the following steps are implemented:
  • inputting the word vector of the viewpoint into a pre-trained sentiment scoring model and outputting the sentiment score of the viewpoint;
  • calculating the popularity of the viewpoint according to the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel;
  • The present application also provides a computer-readable storage medium that stores a public opinion analysis program; the program can be executed by one or more processors to implement the steps of the network public opinion analysis method described above.
  • The network public opinion analysis method, device, and computer-readable storage medium proposed in this application determine a public opinion event and collect public opinion articles related to it from preset data channels through a distributed web crawler; perform word segmentation on the public opinion articles to obtain vocabulary sets that characterize them; perform cluster analysis on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculate the word vector of each viewpoint from the word vectors of the vocabulary sets it contains; extract one or more vocabulary sets from those contained in a viewpoint, and take the public opinion articles they represent as the core topic of the viewpoint; input the word vector of the viewpoint into a pre-trained sentiment scoring model and output the sentiment score of the viewpoint, and calculate the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to its vocabulary sets and the preset weight of each data channel; calculate the public opinion index of the viewpoint from the sentiment score and the popularity, determine that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generate and output early warning information according to the abnormal viewpoint and its core topic.
  • By performing cluster analysis on the collected articles, this application constructs multiple types of viewpoints on a public opinion event, achieving a high degree of generalization of the event. It then applies the sentiment scoring model to the generalized viewpoints to calculate their sentiment scores, which makes it possible to judge the influence of each viewpoint on the public opinion event and to give early warning, improving the ability to monitor public opinion and warn early.
  • FIG. 1 is a schematic flowchart of a network public opinion analysis method according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the internal structure of a network public opinion analysis device according to an embodiment of the present application;
  • FIG. 3 is a schematic block diagram of the public opinion analysis program in a network public opinion analysis device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a network public opinion analysis method provided by an embodiment of the present application. The method may be performed by a device, which may be implemented in software and/or hardware.
  • In this embodiment, the network public opinion analysis method includes:
  • Step S10: determine a public opinion event, and collect public opinion articles related to the public opinion event from preset data channels through a distributed web crawler.
  • The public opinion event in this embodiment is generally an event that is currently taking place, and the user may set one or more keywords to indicate it.
  • Public opinion articles related to the event are collected from the preset data channels through a distributed web crawler and stored according to the data channel they came from.
  • A list of URLs (Uniform Resource Locators) to be crawled is set in advance, and a web crawler periodically crawls related public opinion articles from the URL addresses in the list, according to preset keywords that reflect the public opinion event, and adds them to the corpus.
  • Data channels include, but are not limited to, Weibo, WeChat, news portals, and forums.
  • The public opinion articles obtained from these data channels mainly include news comments, forum posts, Weibo posts, and WeChat articles.
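The storage side of step S10 can be illustrated with a minimal sketch. It assumes the crawler has already downloaded pages and hands them over as `(channel, text)` pairs; the function name `collect_by_channel` and that input shape are hypothetical and not taken from the application, and the crawling itself is out of scope here.

```python
from collections import defaultdict


def collect_by_channel(fetched, keywords):
    """Keep articles that mention any event keyword, grouped by data channel.

    `fetched` is an iterable of (channel, text) pairs standing in for the
    crawler output (an assumed shape, for illustration only).
    """
    corpus = defaultdict(list)
    for channel, text in fetched:
        # An article is related to the event if it contains at least one keyword.
        if any(keyword in text for keyword in keywords):
            corpus[channel].append(text)
    return dict(corpus)
```

Grouping by channel at collection time keeps the per-channel statistics needed later for the popularity calculation easy to look up.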
  • Step S20: perform word segmentation on the public opinion article to obtain a vocabulary set that characterizes the public opinion article.
  • This step includes the following refinement steps: extract the body data of the public opinion article; remove irrelevant data such as HTML (HyperText Markup Language) tag data and image markup; then remove the non-Chinese characters from the body data. Segment the retained body data with a word segmentation tool to generate a vocabulary set of Chinese words separated by spaces.
  • Stop words are then removed from the vocabulary set according to a preset stop-word list, and the remaining space-separated vocabulary set is used to characterize the public opinion article.
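The cleaning and segmentation steps above might be sketched as follows. The segmentation tool is passed in as a callable `segment` (a hypothetical parameter; a real tool such as jieba could be plugged in), and treating "Chinese characters" as the basic CJK block U+4E00 to U+9FFF is an assumption of this sketch.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                    # HTML tags, including image markup
NON_CHINESE_RE = re.compile(r"[^\u4e00-\u9fff]+")  # anything outside the basic CJK block


def preprocess(html, segment, stop_words):
    """Return the space-separated vocabulary set that characterizes one article."""
    text = TAG_RE.sub(" ", html)          # strip markup
    text = NON_CHINESE_RE.sub(" ", text)  # drop non-Chinese characters
    words = []
    for chunk in text.split():
        words.extend(segment(chunk))      # segment each run of Chinese text
    kept = [w for w in words if w not in stop_words]
    return " ".join(kept)
```

The stop-word filter runs after segmentation because stop words are defined at the word level, not the character level.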
  • Step S30: perform cluster analysis on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculate the word vector of each viewpoint from the vocabulary sets it contains.
  • Step S40: extract one or more vocabulary sets from those contained in the viewpoint, and take the public opinion articles represented by the extracted vocabulary sets as the core topic of the viewpoint.
  • The clustering of step S30 may include the following detailed steps:
  • Obtain the Chinese Wikipedia corpus and, based on the term frequency-inverse document frequency (TF-IDF) algorithm, select multiple words in each vocabulary set as keywords; generate a word vector model of the Chinese corpus from the corpus, calculate the word vectors of the keywords with that model, and calculate the word vector of each vocabulary set from the word vectors of its keywords; cluster all vocabulary sets of the public opinion event according to their word vectors with the K-means algorithm, dividing them into multiple types of viewpoints (since a vocabulary set characterizes a public opinion article, clustering vocabulary sets is in effect clustering public opinion articles); gather the keywords of the vocabulary sets contained in each viewpoint, and calculate the word vector of the viewpoint from the word vectors of the gathered keywords.
  • Obtain the Chinese Wikipedia corpus and, based on it, calculate the importance of each word in each vocabulary set with the TF-IDF algorithm. For each vocabulary set, select the N words with the highest importance as the keywords of the article.
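The keyword selection can be sketched as below. This is a minimal illustration, not the application's exact formula: `reference_docs` is a hypothetical list of word sets standing in for the Chinese Wikipedia corpus, and the smoothed IDF variant is an assumption.

```python
import math
from collections import Counter


def top_keywords(doc_words, reference_docs, n):
    """Rank the words of one vocabulary set by TF-IDF and return the top n."""
    tf = Counter(doc_words)
    total = len(doc_words)
    num_docs = len(reference_docs)
    scores = {}
    for word, count in tf.items():
        # Document frequency: how many reference documents contain the word.
        df = sum(1 for doc in reference_docs if word in doc)
        # Smoothed inverse document frequency (an assumed variant).
        idf = math.log((1 + num_docs) / (1 + df)) + 1.0
        scores[word] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:n]
```

Words that are frequent in the article but rare in the reference corpus score highest, which is why common function words drop out of the keyword list.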
  • A Word2vec model of the Chinese corpus is generated from the Chinese Wikipedia corpus. For each vocabulary set, the word vectors of the N selected keywords are calculated with the Word2vec model, and the word vector of the vocabulary set is calculated from the keyword vectors. The word vectors of all vocabulary sets of the public opinion event are calculated in this way. Since a vocabulary set represents a public opinion article, extracting the keywords of a vocabulary set is in effect extracting the keywords of the public opinion article.
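The application does not state how the keyword vectors are combined into the vector of a vocabulary set. One plausible reading, sketched here as an assumption, is the arithmetic mean; `embeddings` is a hypothetical dictionary standing in for the trained Word2vec model.

```python
def mean_vector(keywords, embeddings):
    """Average the word vectors of the keywords that the model knows."""
    vectors = [embeddings[w] for w in keywords if w in embeddings]
    if not vectors:
        raise ValueError("no keyword has a word vector")
    dim = len(vectors[0])
    # Component-wise mean over all available keyword vectors.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Keywords missing from the model are skipped rather than treated as zero vectors, so rare words do not pull the average toward the origin.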
  • All public opinion articles in the corpus (represented by their vocabulary sets) are clustered with the K-means algorithm and divided into multiple types of viewpoints.
  • The K value of the K-means algorithm is the number of clusters. Its initial value is set randomly, and it is then adjusted according to an evaluation of the clustering result until the accuracy of the result reaches a set threshold.
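A minimal sketch of K-means with an adjustable K follows. Two things here are assumptions rather than the application's method: K is increased from 1 instead of starting from a random value, and the "evaluation of the clustering result" is stood in for by the mean squared distance of points to their centroid (`max_mean_distance` is a hypothetical parameter).

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    return labels, centroids


def choose_k(points, max_mean_distance, k_max):
    """Increase K until the mean squared distance to the centroid is small enough."""
    for k in range(1, k_max + 1):
        labels, centroids = kmeans(points, k)
        mean_dist = sum(squared_distance(p, centroids[lab])
                        for p, lab in zip(points, labels)) / len(points)
        if mean_dist <= max_mean_distance:
            break
    return k, labels
```

Each resulting cluster corresponds to one type of viewpoint, with the points being the word vectors of the vocabulary sets.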
  • The keywords of all vocabulary sets in each viewpoint are gathered, and the word frequency of each keyword is calculated.
  • The word frequency reflects the weight of the keyword.
  • The Word2vec model is used to calculate the word vector of each keyword gathered under the viewpoint, and the word vector of the viewpoint is calculated from the keyword vectors and the word frequencies.
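One way to combine keyword vectors and word frequencies, sketched here under the assumption that the viewpoint vector is a frequency-weighted average, is shown below. `embeddings` is again a hypothetical dictionary standing in for the Word2vec model.

```python
from collections import Counter


def viewpoint_vector(keyword_lists, embeddings):
    """Frequency-weighted average of the keyword vectors gathered under a viewpoint.

    `keyword_lists` holds one keyword list per vocabulary set in the viewpoint.
    """
    freq = Counter(w for kws in keyword_lists for w in kws if w in embeddings)
    total = sum(freq.values())
    if total == 0:
        raise ValueError("no keyword has a word vector")
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word, count in freq.items():
        weight = count / total  # word frequency acts as the keyword weight
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
    return vec
```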
  • For step S40, the similarity between each vocabulary set and the viewpoint is calculated from the word vector of the viewpoint and the word vectors of the vocabulary sets under the viewpoint.
  • The similarity between vectors can be calculated as the cosine similarity, and the public opinion articles characterized by the one or more vocabulary sets with the highest similarity serve as the core topic of the viewpoint.
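The core-topic selection can be sketched with cosine similarity as follows. The function names and the `article_vecs` dictionary (article identifier to vocabulary-set vector) are hypothetical and chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def core_topics(viewpoint_vec, article_vecs, top_n=1):
    """Return the ids of the top_n articles most similar to the viewpoint vector."""
    ranked = sorted(
        article_vecs,
        key=lambda aid: cosine_similarity(viewpoint_vec, article_vecs[aid]),
        reverse=True,
    )
    return ranked[:top_n]
```

Cosine similarity compares direction only, so a long article and a short one with the same keyword profile rank equally.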
  • Step S50: input the word vector of the viewpoint into a pre-trained sentiment scoring model and output the sentiment score of the viewpoint; and calculate the popularity of the viewpoint according to the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel.
  • A preset deep neural network model is trained in advance on a constructed sample library to determine the model parameters, and the deep neural network model with the determined parameters is used as the sentiment scoring model.
  • The training includes the following detailed steps: obtain public opinion text data with label data added to form a sample library, where the label data is a sentiment score assigned to the text according to whether the comment is positive, negative, or neutral (for example, 1 for a positive comment, -1 for a negative comment, and 0 for a neutral comment); extract the keywords of the public opinion text data in the sample library using the Wikipedia corpus combined with the TF-IDF algorithm, and calculate the word vectors of the keywords with the trained word vector model; use the word vectors and label data of the public opinion text data in the sample library as training samples and input them into the preset deep neural network model for training to determine the model parameters; and use the deep neural network model with the determined parameters as the sentiment scoring model.
  • Cross-validation is used when training the model.
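The application does not say which form of cross-validation is used. As one common possibility, a k-fold split of the sample library can be sketched as below; the function name and the choice of contiguous, unshuffled folds are assumptions of this sketch.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size
```

Each labeled sample is used for validation exactly once, which gives a less optimistic estimate of the scoring model's accuracy than a single train/test split.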
  • For a single WeChat article, the statistical data mainly includes the number of reads, comments, and retweets.
  • the statistical data mainly includes the number of reposts, comments, and likes.
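The application does not give the popularity formula. A minimal sketch, assuming each article already has a popularity value in [0, 1] derived from its statistics and that the viewpoint popularity is the weight-averaged mean per channel, could look like this; the function and parameter names are hypothetical.

```python
def viewpoint_heat(article_popularity, channel_weights):
    """Combine per-channel article popularity with preset channel weights.

    `article_popularity` maps a channel to the popularity values (assumed to be
    normalized to [0, 1]) of the viewpoint's articles in that channel.
    """
    heat = 0.0
    for channel, weight in channel_weights.items():
        scores = article_popularity.get(channel, [])
        if scores:
            # Mean popularity of the viewpoint's articles in this channel.
            heat += weight * (sum(scores) / len(scores))
    return heat
```

If the weights sum to 1 and the per-article values stay in [0, 1], the result also stays in [0, 1], which fits an index whose absolute value is compared with 1.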
  • Public opinion index = popularity × sentiment score.
  • The magnitude of the public opinion index reflects the influence of the viewpoint on public opinion.
  • The closer the absolute value of the public opinion index is to 1, the greater the influence of the viewpoint on public opinion. When the absolute value of the calculated public opinion index exceeds a preset threshold, the viewpoint is judged to be an abnormal viewpoint. For example, if the preset threshold is 0.8 and the calculated public opinion index of a viewpoint is -0.9, the absolute value is 0.9, which is greater than the preset threshold, and the index leans toward negative evaluation. Early warning information can then be output, and it contains the core topic of the viewpoint.
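The index calculation and threshold check from the example above can be sketched as follows. The function names and the shape of the warning record are hypothetical; the threshold of 0.8 is the example value from the text.

```python
def public_opinion_index(popularity, sentiment_score):
    """Public opinion index = popularity × sentiment score."""
    return popularity * sentiment_score


def is_abnormal(index, threshold=0.8):
    """A viewpoint is abnormal when the absolute index exceeds the threshold."""
    return abs(index) > threshold


def build_warning(viewpoint, core_topic, index):
    """Assemble a simple warning record (an illustrative structure)."""
    tendency = "negative" if index < 0 else "positive"
    return {"viewpoint": viewpoint, "core_topic": core_topic,
            "index": index, "tendency": tendency}
```

For instance, a popularity of 0.9 and a sentiment score of -1 give an index of -0.9, whose absolute value exceeds 0.8, so the viewpoint would be flagged with a negative tendency.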
  • The network public opinion analysis method proposed in this embodiment determines a public opinion event and collects public opinion articles related to it from preset data channels through a distributed web crawler; performs word segmentation on the public opinion articles to obtain vocabulary sets that characterize them; performs cluster analysis on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculates the word vector of each viewpoint from the word vectors of the vocabulary sets it contains; extracts one or more vocabulary sets from those contained in a viewpoint, and takes the public opinion articles they represent as the core topic of the viewpoint; inputs the word vector of the viewpoint into a pre-trained sentiment scoring model and outputs the sentiment score of the viewpoint, and calculates the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to its vocabulary sets and the preset weight of each data channel; calculates the public opinion index of the viewpoint from the sentiment score and the popularity, and determines that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint.
  • By performing cluster analysis on the collected articles, this application constructs multiple types of viewpoints on a public opinion event, achieving a high degree of generalization of the event. It also applies the sentiment scoring model to calculate the sentiment scores of the generalized viewpoints, so that the influence of each viewpoint on the public opinion event can be judged and early warning given, which improves the ability to monitor public opinion and warn early.
  • The present application also provides a network public opinion analysis device.
  • Referring to FIG. 2, a schematic diagram of the internal structure of a network public opinion analysis device according to an embodiment of the present application is shown.
  • the network public opinion analysis device 1 may be a PC (Personal Computer) or a terminal device such as a smart phone, a tablet computer, or a portable computer.
  • the network public opinion analysis device 1 includes at least a memory 11, a processor 12, a network interface 13, and a communication bus 14.
  • the memory 11 includes at least one type of readable storage medium.
  • the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • the memory 11 may be an internal storage unit of the network public opinion analysis device 1 in some embodiments, such as a hard disk of the network public opinion analysis device 1.
  • In other embodiments, the memory 11 may also be an external storage device of the network public opinion analysis device 1, for example a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card provided on the network public opinion analysis device 1.
  • the memory 11 may include both an internal storage unit of the network public opinion analysis device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various types of data installed on the network public opinion analysis device 1, such as the code of the public opinion analysis program 01, but also to temporarily store data that has been or will be output.
  • In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is configured to run program code or process data stored in the memory 11, for example to execute the public opinion analysis program 01.
  • the network interface 13 may optionally include a standard wired interface, a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the device 1 and other electronic devices.
  • the communication bus 14 is used to implement connection communication between these components.
  • the device 1 may further include a user interface.
  • The user interface may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface.
  • The display may be an LED display, a liquid crystal display, a touch-type liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like.
  • the display may also be appropriately referred to as a display screen or a display unit, which is used to display information processed in the network public opinion analysis device 1 and to display a visualized user interface.
  • FIG. 2 shows only the network public opinion analysis device 1 with components 11-14 and the public opinion analysis program 01.
  • The structure shown in FIG. 2 does not constitute a limitation on the network public opinion analysis device 1, which may include fewer or more components than shown, combine some components, or arrange the components differently.
  • the public opinion analysis program 01 is stored in the memory 11; when the processor 12 executes the public opinion analysis program 01 stored in the memory 11, the following steps are implemented:
  • The public opinion event in this embodiment is generally an event that is currently taking place, and the user may set one or more keywords to indicate it.
  • Public opinion articles related to the event are collected from the preset data channels through a distributed web crawler and stored according to the data channel they came from.
  • A list of URLs to be crawled is set in advance, and a web crawler periodically crawls related public opinion articles from the URL addresses in the list, according to preset keywords that reflect the public opinion event, and adds them to the corpus.
  • Data channels include, but are not limited to, Weibo, WeChat, news portals, and forums.
  • The public opinion articles obtained from these data channels mainly include news comments, forum posts, Weibo posts, and WeChat articles.
  • Word segmentation is performed on the public opinion article to obtain a vocabulary set that characterizes the public opinion article.
  • This step includes the following refinement steps: extract the body data of the public opinion article; remove irrelevant data such as HTML tag data and image tags; then remove the non-Chinese characters from the body data through regular expressions. Segment the retained body data with a word segmentation tool to generate a vocabulary set of Chinese words separated by spaces.
  • Stop words are then removed from the vocabulary set according to a preset stop-word list, and the remaining space-separated vocabulary set is used to characterize the public opinion article.
  • Cluster analysis is performed on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and the word vector of each viewpoint is calculated from the vocabulary sets it contains.
  • One or more vocabulary sets are extracted from those contained in the viewpoint, and the public opinion articles represented by the extracted vocabulary sets are taken as the core topic of the viewpoint.
  • The cluster analysis may include the following detailed steps:
  • Obtain the Chinese Wikipedia corpus and, based on the term frequency-inverse document frequency (TF-IDF) algorithm, select multiple words in each vocabulary set as keywords; generate a word vector model of the Chinese corpus from the corpus, calculate the word vectors of the keywords with that model, and calculate the word vector of each vocabulary set from the word vectors of its keywords; cluster all vocabulary sets of the public opinion event according to their word vectors with the K-means algorithm, dividing them into multiple types of viewpoints (since a vocabulary set characterizes a public opinion article, clustering vocabulary sets is in effect clustering public opinion articles); gather the keywords of the vocabulary sets contained in each viewpoint, and calculate the word vector of the viewpoint from the word vectors of the gathered keywords.
  • Obtain the Chinese Wikipedia corpus and, based on it, calculate the importance of each word in each vocabulary set with the TF-IDF algorithm. For each vocabulary set, select the N words with the highest importance as the keywords of the article.
  • A Word2vec model of the Chinese corpus is generated from the Chinese Wikipedia corpus. For each vocabulary set, the word vectors of the N selected keywords are calculated with the Word2vec model, and the word vector of the vocabulary set is calculated from the keyword vectors. The word vectors of all vocabulary sets of the public opinion event are calculated in this way. Since a vocabulary set represents a public opinion article, extracting the keywords of a vocabulary set is in effect extracting the keywords of the public opinion article.
  • All public opinion articles in the corpus (represented by their vocabulary sets) are clustered with the K-means algorithm and divided into multiple types of viewpoints.
  • The K value of the K-means algorithm is the number of clusters. Its initial value is set randomly, and it is then adjusted according to an evaluation of the clustering result until the accuracy of the result reaches a set threshold.
  • The keywords of the public opinion articles represented by all vocabulary sets in each viewpoint are gathered, and the word frequency of each keyword is calculated.
  • The word frequency reflects the weight of the keyword.
  • The Word2vec model is used to calculate the word vector of each keyword gathered under the viewpoint, and the word vector of the viewpoint is calculated from the keyword vectors and the word frequencies.
  • The similarity between each vocabulary set and the viewpoint is calculated from the word vector of the viewpoint and the word vectors of the vocabulary sets under the viewpoint.
  • The similarity between vectors can be calculated as the cosine similarity, and the public opinion articles characterized by the one or more vocabulary sets with the highest similarity serve as the core topic of the viewpoint.
  • The word vector of the viewpoint is input into a pre-trained sentiment scoring model, and the sentiment score of the viewpoint is output.
  • The popularity of the viewpoint is calculated according to the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel.
  • A preset deep neural network model is trained in advance on a constructed sample library to determine the model parameters, and the deep neural network model with the determined parameters is used as the sentiment scoring model.
  • The training includes the following detailed steps: obtain public opinion text data with label data added to form a sample library, where the label data is a sentiment score assigned to the text according to whether the comment is positive, negative, or neutral (for example, 1 for a positive comment, -1 for a negative comment, and 0 for a neutral comment); extract the keywords of the public opinion text data in the sample library using the Wikipedia corpus combined with the TF-IDF algorithm, and calculate the word vectors of the keywords with the trained word vector model; use the word vectors and label data of the public opinion text data in the sample library as training samples and input them into the preset deep neural network model for training to determine the model parameters; and use the deep neural network model with the determined parameters as the sentiment scoring model.
  • Cross-validation is used when training the model.
  • For a single WeChat article, the statistical data mainly includes the number of reads, comments, and retweets.
  • the statistical data mainly includes the number of reposts, comments, and likes.
  • Public opinion index = popularity × sentiment score.
  • The magnitude of the public opinion index reflects the influence of the viewpoint on public opinion.
  • The closer the absolute value of the public opinion index is to 1, the greater the influence of the viewpoint on public opinion. When the absolute value of the calculated public opinion index exceeds a preset threshold, the viewpoint is judged to be an abnormal viewpoint. For example, if the preset threshold is 0.8 and the calculated public opinion index of a viewpoint is -0.9, the absolute value is 0.9, which is greater than the preset threshold, and the index leans toward negative evaluation. Early warning information can then be output, and it contains the core topic of the viewpoint.
  • The network public opinion analysis device of this embodiment first determines a public opinion event and collects public opinion articles related to it from preset data channels through a distributed web crawler; performs word segmentation on the public opinion articles to obtain vocabulary sets that characterize them; performs cluster analysis on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculates the word vector of each viewpoint from the word vectors of the vocabulary sets it contains; extracts one or more vocabulary sets from those contained in a viewpoint, and takes the public opinion articles they represent as the core topic of the viewpoint; inputs the word vector of the viewpoint into a pre-trained sentiment scoring model and outputs the sentiment score of the viewpoint, and calculates the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to its vocabulary sets and the preset weight of each data channel; calculates the public opinion index of the viewpoint from the sentiment score and the popularity, and determines that a viewpoint whose public opinion index has an absolute value greater than the preset threshold is an abnormal viewpoint.
  • This application constructs multiple types of viewpoints on a public opinion event, achieving a high degree of generalization of the event, and applies the sentiment scoring model to calculate the sentiment scores of the generalized viewpoints. The influence of each viewpoint on the public opinion event can thus be judged and early warning given, which improves the ability to monitor public opinion and warn early.
  • The public opinion analysis program may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement this application.
  • A module, as referred to in this application, is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution process of the public opinion analysis program in the network public opinion analysis device.
  • Referring to FIG. 3, a schematic diagram of the program modules of the public opinion analysis program in an embodiment of the network public opinion analysis device of this application is shown.
  • The public opinion analysis program can be divided into a data collection module 10, an article segmentation module 20, an article clustering module 30, a topic extraction module 40, a score calculation module 50, and an index calculation module 60. For example:
  • the data collection module 10 is configured to determine a public opinion event and collect a public opinion article related to the public opinion event from a preset data channel through a distributed web crawler;
  • the article segmentation module 20 is configured to perform word segmentation processing on the public opinion article, and obtain a vocabulary set in the public opinion article to characterize the public opinion article;
  • the article clustering module 30 is configured to perform cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and to calculate the word vector of each viewpoint from the vocabulary sets the viewpoint contains;
  • the topic extraction module 40 is configured to extract one or more vocabulary sets from the vocabulary set contained in the viewpoint, and use the public opinion articles represented by the extracted vocabulary set as the core topic of the viewpoint;
  • the score calculation module 50 is configured to input the word vector of a viewpoint into a pre-trained sentiment scoring model to output the sentiment score of the viewpoint, and to calculate the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel;
  • the index calculation module 60 is configured to calculate the public opinion index of a viewpoint from the sentiment score and the popularity, determine that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generate and output early warning information based on the abnormal viewpoint and its core topic.
  • an embodiment of the present application further proposes a computer-readable storage medium.
  • the computer-readable storage medium stores a public opinion analysis program, and the public opinion analysis program can be executed by one or more processors to implement the following operations:
  • the word vector of the viewpoint is input into a pre-trained sentiment scoring model, and the sentiment score of the viewpoint is output.
  • the popularity of the viewpoint is calculated from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel;

Abstract

Disclosed are a network public opinion analysis method and apparatus, and a computer-readable storage medium. The method comprises: determining a public opinion event, and collecting public opinion articles related to the public opinion event (S10); pre-processing the collected public opinion articles, and acquiring vocabulary sets in the public opinion articles to represent the public opinion articles (S20); performing clustering analysis on the vocabulary sets by means of a clustering algorithm to generate a plurality of opinions on the public opinion event, and computing word vectors of the opinions (S30); extracting core topics from the vocabulary sets contained in the opinions (S40); computing emotion scores of the opinions by means of an emotion scoring model, and computing the popularity of the opinions (S50); and computing public opinion indexes of the opinions according to the emotion scores and the popularity, determining an opinion, the absolute value of a public opinion index thereof being greater than a pre-set threshold value, to be an abnormal opinion, and generating early-warning information according to the abnormal opinion and a core topic thereof and outputting the early-warning information (S60). The method improves public opinion monitoring and early-warning performance.

Description

Network public opinion analysis method, device and computer-readable storage medium
This application claims priority under the Paris Convention to Chinese patent application No. 201810544762.6, filed on May 31, 2018 and entitled "Network public opinion analysis method, device and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a network public opinion analysis method and device, and a computer-readable storage medium.
Background
Network public opinion refers to the differing views on social events that circulate on the Internet, and is one form of public opinion. It takes the Internet as its carrier and an event as its core, and is the sum of the public's expression, dissemination and interaction of feelings, attitudes, opinions and viewpoints regarding the event, together with their subsequent influence.
It spreads over the Internet the influential and tendentious remarks and viewpoints that the public holds on hot and focal issues in real life. Network public opinion takes many forms, such as news comments, forum posts, Weibo posts and WeChat articles. In recent years its influence on the order of political life and on social stability has grown steadily, and several major network public opinion events have made people recognize the significant role the Internet plays in social supervision.
At the same time, if a network public opinion emergency is handled improperly, it is very likely to provoke negative public sentiment and trigger rule-breaking and extreme behavior, which in turn threatens social stability. Monitoring the state of network public opinion is therefore important, and its sentiment tendencies and viewpoints need to be analyzed and used for early warning. Current mainstream public opinion systems rely mainly on human participation: for example, public opinion analysts in the industry screen the public opinion and then track it in the system in order to form a rough judgment of an event's public opinion state in terms of its influence. This approach, however, does not monitor media sources comprehensively and lacks an automated scheme for calculating a public opinion index, so existing systems cannot accurately obtain the specific public opinion index of an event and therefore cannot issue accurate early warnings.
Summary of the Invention
The present application provides a network public opinion analysis method and device, and a computer-readable storage medium, whose main purpose is to improve the ability to monitor and warn about public opinion.
To achieve the above purpose, the present application provides a network public opinion analysis method, the method comprising:
determining a public opinion event, and collecting public opinion articles related to the public opinion event from preset data channels through a distributed web crawler;
performing word segmentation on the public opinion articles, and obtaining the vocabulary set of each public opinion article to characterize that article;
performing cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculating the word vector of each viewpoint from the vocabulary sets the viewpoint contains;
extracting one or more vocabulary sets from the vocabulary sets contained in a viewpoint, and using the public opinion articles represented by the extracted vocabulary sets as the core topic of the viewpoint;
inputting the word vector of the viewpoint into a pre-trained sentiment scoring model to output the sentiment score of the viewpoint, and calculating the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel;
calculating the public opinion index of the viewpoint from the sentiment score and the popularity, determining that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generating and outputting early warning information based on the abnormal viewpoint and its core topic.
In addition, to achieve the above purpose, the present application further provides a network public opinion analysis device. The device includes a memory and a processor; the memory stores a public opinion analysis program that can run on the processor, and the public opinion analysis program, when executed by the processor, implements the following steps:
determining a public opinion event, and collecting public opinion articles related to the public opinion event from preset data channels through a distributed web crawler;
performing word segmentation on the public opinion articles, and obtaining the vocabulary set of each public opinion article to characterize that article;
performing cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculating the word vector of each viewpoint from the vocabulary sets the viewpoint contains;
extracting one or more vocabulary sets from the vocabulary sets contained in a viewpoint, and using the public opinion articles represented by the extracted vocabulary sets as the core topic of the viewpoint;
inputting the word vector of the viewpoint into a pre-trained sentiment scoring model to output the sentiment score of the viewpoint, and calculating the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel;
calculating the public opinion index of the viewpoint from the sentiment score and the popularity, determining that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generating and outputting early warning information based on the abnormal viewpoint and its core topic.
In addition, to achieve the above purpose, the present application further provides a computer-readable storage medium storing a public opinion analysis program, the public opinion analysis program being executable by one or more processors to implement the steps of the network public opinion analysis method described above.
With the network public opinion analysis method, device and computer-readable storage medium proposed in this application, a public opinion event is determined and public opinion articles related to the event are collected from preset data channels through a distributed web crawler; word segmentation is performed on the articles and the vocabulary set of each article is obtained to characterize it; a clustering algorithm is used to perform cluster analysis on the vocabulary sets to generate multiple viewpoints on the event, and the word vector of each viewpoint is calculated from the word vectors of the vocabulary sets it contains; one or more vocabulary sets are extracted from those contained in a viewpoint, and the articles they represent are used as the core topic of the viewpoint; the word vector of the viewpoint is input into a pre-trained sentiment scoring model to output the viewpoint's sentiment score, and the popularity of the viewpoint is calculated from the popularity, in each data channel, of the articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel; the public opinion index of the viewpoint is calculated from the sentiment score and the popularity, a viewpoint whose public opinion index has an absolute value greater than a preset threshold is determined to be an abnormal viewpoint, and early warning information is generated and output based on the abnormal viewpoint and its core topic. By performing cluster analysis on the collected articles, this application constructs multiple types of viewpoints on a public opinion event and thus summarizes the event at a high level; by combining this with the sentiment scoring model to calculate the sentiment score of each summarized viewpoint, it can judge the influence of each viewpoint on the event and issue early warnings accordingly, which improves the ability to monitor and warn about public opinion.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a network public opinion analysis method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the internal structure of a network public opinion analysis device according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the modules of the public opinion analysis program in a network public opinion analysis device according to an embodiment of the present application.
The realization of the purpose, the functional characteristics and the advantages of this application are further described below with reference to the embodiments and the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
The present application provides a network public opinion analysis method. FIG. 1 is a schematic flowchart of the network public opinion analysis method provided by an embodiment of the present application. The method may be performed by a device, and the device may be implemented in software and/or hardware.
In this embodiment, the network public opinion analysis method includes:
Step S10: determine a public opinion event, and collect public opinion articles related to the public opinion event from preset data channels through a distributed web crawler.
The public opinion event in this embodiment is generally an event that is currently unfolding, and the user may set one or more keywords to represent it. Public opinion articles related to the event are collected from preset data channels through a distributed web crawler and stored according to the data channel they came from. Specifically, a list of URLs (Uniform Resource Locators) to be crawled is set in advance; at regular intervals the web crawler visits the URLs in the list, fetches the public opinion articles that match the preset keywords representing the event, and adds them to the corpus. The data channels include, but are not limited to, Weibo, WeChat, news portals and forums, and the articles obtained from them mainly include news comments, forum posts, Weibo posts and WeChat articles.
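The keyword filtering and per-channel storage in step S10 can be illustrated with a minimal sketch. It deliberately leaves out the crawler itself and works on pages that are assumed to have been fetched already; the function name `collect_by_channel`, the page dictionary layout and the example URLs are all hypothetical and are not taken from the application.

```python
from collections import defaultdict


def collect_by_channel(pages, keywords):
    """Keep pages that mention any event keyword, grouped by data channel."""
    corpus = defaultdict(list)
    for page in pages:
        if any(keyword in page["text"] for keyword in keywords):
            corpus[page["channel"]].append(page)
    return dict(corpus)


# Hypothetical pages, standing in for what a crawler would return.
pages = [
    {"channel": "weibo", "url": "https://example.com/a", "text": "产品召回 引发 讨论"},
    {"channel": "forum", "url": "https://example.com/b", "text": "今日 天气 晴"},
    {"channel": "wechat", "url": "https://example.com/c", "text": "关于 产品召回 的 说明"},
]
corpus = collect_by_channel(pages, keywords=["产品召回"])
```

Grouping by channel at collection time keeps the channel label available later, when per-channel popularity is weighted.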
Step S20: perform word segmentation on the public opinion articles, and obtain the vocabulary set of each public opinion article to characterize that article.
Specifically, this step includes the following sub-steps. The body text of each public opinion article is extracted, irrelevant data such as HTML (HyperText Markup Language) tags and image markup is removed, and non-Chinese characters are then removed from the body text with a regular expression. The remaining body text is segmented with a word segmentation tool, turning each Chinese passage into a space-separated vocabulary set. Stop words are removed from the vocabulary set according to a preset stop-word list, and the remaining vocabulary set is used to characterize the article; that is, the space-separated vocabulary set serves as the feature that represents the public opinion article.
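The preprocessing in step S20 might look like the following sketch. The application does not name a segmentation tool, so a toy forward-maximum-matching segmenter with a tiny hand-made dictionary stands in for it here; that segmenter, the regular expressions and all names are assumptions made for illustration only.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                    # HTML tags, including <img ...>
NON_CHINESE_RE = re.compile(r"[^\u4e00-\u9fff]+")  # anything outside the basic CJK block


def toy_segment(text, dictionary, max_len=4):
    """Forward maximum matching; a stand-in for a real segmentation tool."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop advances.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def preprocess(html, dictionary, stop_words):
    """Strip markup and non-Chinese characters, segment, drop stop words."""
    text = TAG_RE.sub("", html)
    text = NON_CHINESE_RE.sub("", text)
    words = toy_segment(text, dictionary)
    return " ".join(word for word in words if word not in stop_words)


vocabulary_set = preprocess(
    "<p>这款产品的质量很好!</p><img src='a.png'>",
    dictionary={"这款", "产品", "质量", "很好"},
    stop_words={"的"},
)
```

In practice a dedicated Chinese segmentation library would replace `toy_segment`; the rest of the pipeline would stay the same.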
Step S30: perform cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculate the word vector of each viewpoint from the vocabulary sets the viewpoint contains.
Step S40: extract one or more vocabulary sets from the vocabulary sets contained in a viewpoint, and use the public opinion articles represented by the extracted vocabulary sets as the core topic of the viewpoint.
After the vocabulary set of each public opinion article has been obtained, the articles, each represented by its space-separated vocabulary set, are clustered. The public's views on an event can be highly varied, and different people hold different viewpoints. Cluster analysis of all the collected public opinion articles therefore summarizes them at a high level and yields multiple viewpoint categories. Specifically, this step may include the following sub-steps:
A Chinese Wikipedia corpus is obtained, and based on that corpus multiple words in each vocabulary set are selected as keywords according to the term frequency-inverse document frequency (TF-IDF) algorithm. A word vector model for Chinese is generated from the corpus, the word vectors of the keywords are calculated with the word vector model, and the word vector of each vocabulary set is calculated from the word vectors of its keywords. All vocabulary sets of the public opinion event are then clustered using their word vectors and the K-means algorithm, so that the vocabulary sets are divided into multiple types of viewpoints; because a vocabulary set characterizes a public opinion article, clustering the vocabulary sets is in effect clustering the articles. Finally, the keywords of the vocabulary sets contained in a viewpoint are pooled, and the word vector of the viewpoint is calculated from the word vectors of the pooled keywords.
In more detail, the Chinese Wikipedia corpus is obtained and, based on it, the TF-IDF algorithm is used to calculate the importance of each word in each vocabulary set; for each vocabulary set, the N most important words are selected as the keywords of the corresponding article. A Word2vec model for Chinese is generated from the Chinese Wikipedia corpus. For each vocabulary set, the word vectors of the N selected keywords are calculated with the Word2vec model, and the word vector of the vocabulary set is calculated from those keyword vectors. The word vectors of all vocabulary sets of the public opinion event are obtained in this way. Because a vocabulary set characterizes a public opinion article, extracting the keywords of a vocabulary set is in effect extracting the keywords of the article.
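As a rough illustration of keyword selection and of building a vector for one vocabulary set, the sketch below scores words with a smoothed TF-IDF and averages the keyword vectors. The application does not state how keyword vectors are combined, so the plain mean is an assumption, as are the smoothing formula, the toy document frequencies and the two-dimensional toy embeddings that stand in for a trained Word2vec model.

```python
import math
from collections import Counter


def top_keywords(words, doc_freq, n_docs, top_n):
    """Rank words by TF-IDF against reference-corpus document frequencies."""
    tf = Counter(words)
    scores = {
        word: (count / len(words)) * math.log(n_docs / (1 + doc_freq.get(word, 0)))
        for word, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda word: (-scores[word], word))[:top_n]


def mean_vector(keywords, embeddings):
    """Average the vectors of the keywords that have an embedding (assumed rule)."""
    vectors = [embeddings[word] for word in keywords if word in embeddings]
    if not vectors:
        raise ValueError("no keyword has an embedding")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


words = ["产品", "质量", "质量", "问题", "今天"]
doc_freq = {"今天": 900, "产品": 100, "质量": 50, "问题": 200}  # toy counts
embeddings = {"质量": [1.0, 0.0], "产品": [0.0, 1.0]}            # toy vectors

keywords = top_keywords(words, doc_freq, n_docs=1000, top_n=2)
doc_vector = mean_vector(keywords, embeddings)
```

A very common word such as "今天" receives a low score because it appears in most reference documents, which is the behavior the TF-IDF step relies on.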
After the word vector of each vocabulary set has been calculated, the K-means algorithm is used to perform cluster analysis on all public opinion articles in the corpus that are related to the public opinion event (each represented by its vocabulary set), dividing them into multiple types of viewpoints. K is the number of clusters. Its initial value is set randomly, and it is then adjusted according to an evaluation of the clustering result until the accuracy of the result reaches a set threshold.
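The clustering step can be sketched with a small pure-Python K-means. To keep the example reproducible it uses farthest-point seeding of the centroids instead of random seeding; that substitution, the fixed iteration count and the toy vectors are assumptions, and the loop that adjusts K after evaluating the result is not shown.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(vectors, k):
    """Deterministic seeding: start at vectors[0], then take the farthest points."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm: assign to the nearest centroid, then recompute means."""
    centroids = initial_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy vocabulary-set vectors forming two obvious groups.
doc_vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels, centroids = kmeans(doc_vectors, k=2)
```

Each label identifies the viewpoint an article belongs to; an outer loop could rerun `kmeans` with different values of `k` and keep the one that scores best under the chosen evaluation.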
The keywords of all vocabulary sets in each viewpoint are pooled and the frequency of each keyword is calculated; the frequency reflects the weight of the keyword. The word vector of each pooled keyword is calculated with the Word2vec model, and the word vector of the viewpoint is calculated from the keyword vectors and frequencies. The similarity between each vocabulary set and the viewpoint is then calculated from the word vector of the viewpoint and the word vectors of the vocabulary sets under it; cosine similarity can be used to measure the similarity between vectors. The public opinion articles represented by the one or more vocabulary sets with the highest similarity are selected as the core topic of the viewpoint.
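A minimal sketch of the viewpoint vector and core-topic selection follows. It assumes the viewpoint vector is the frequency-weighted mean of the pooled keyword vectors, which is one reasonable reading of "calculated from the keyword vectors and frequencies" but is not spelled out in the application; the article identifiers and toy embeddings are hypothetical.

```python
import math
from collections import Counter


def viewpoint_vector(keyword_lists, embeddings):
    """Frequency-weighted mean of pooled keyword vectors (assumed weighting)."""
    freq = Counter(w for kws in keyword_lists for w in kws if w in embeddings)
    total = sum(freq.values())
    if total == 0:
        raise ValueError("no pooled keyword has an embedding")
    dim = len(next(iter(embeddings.values())))
    return [
        sum(embeddings[w][i] * n for w, n in freq.items()) / total
        for i in range(dim)
    ]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def core_topics(doc_vectors, view_vec, top_m=1):
    """Return the ids of the articles most similar to the viewpoint vector."""
    ranked = sorted(
        doc_vectors,
        key=lambda doc_id: cosine(doc_vectors[doc_id], view_vec),
        reverse=True,
    )
    return ranked[:top_m]


embeddings = {"质量": [1.0, 0.0], "价格": [0.0, 1.0]}  # toy vectors
keyword_lists = [["质量"], ["质量", "价格"]]            # keywords of two articles
doc_vectors = {"article_a": [1.0, 0.0], "article_b": [0.5, 0.5]}

view_vec = viewpoint_vector(keyword_lists, embeddings)
core = core_topics(doc_vectors, view_vec, top_m=1)
```

Here "质量" occurs twice and "价格" once, so the viewpoint vector leans toward the first axis, and the article whose vector points in the closest direction is reported as the core topic.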
Step S50: input the word vector of the viewpoint into a pre-trained sentiment scoring model to output the sentiment score of the viewpoint, and calculate the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel.
A preset deep neural network model is trained in advance on a constructed sample library to determine the model parameters, and the deep neural network model with the determined parameters is used as the sentiment scoring model. Specifically, this step includes the following sub-steps. Public opinion text data with label data is obtained to form the sample library, where the label data is a sentiment score assigned to the text according to whether the comment is positive, negative or neutral, for example 1 for a positive comment, -1 for a negative comment and 0 for a neutral comment. Using the Wikipedia corpus together with the TF-IDF algorithm, the keywords of the public opinion text data in the sample library are extracted, and their word vectors are calculated with the trained word vector model. The word vectors and label data of the public opinion text data in the sample library are then used as training samples and input into the preset deep neural network model for training to determine the model parameters, and the deep neural network model with the determined parameters is used as the sentiment scoring model. In addition, to prevent overfitting, the model is trained with cross-validation.
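The sketch below illustrates only the data side of this training setup: mapping sentiment labels to scores and splitting the sample library into cross-validation folds. The network itself is not shown, because the application does not describe its architecture. The label names, the contiguous fold layout and the toy samples are assumptions.

```python
# Label-to-score mapping taken from the example values in the text.
LABEL_SCORES = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Toy sample library: (word vector, sentiment label) pairs.
samples = [
    ([0.9, 0.1], "positive"),
    ([0.1, 0.9], "negative"),
    ([0.5, 0.5], "neutral"),
    ([0.8, 0.2], "positive"),
    ([0.2, 0.8], "negative"),
]
targets = [LABEL_SCORES[label] for _, label in samples]
folds = list(k_fold_indices(len(samples), k=3))
```

Each fold would be used once for validation while the model is fitted on the remaining samples; shuffling the sample order before splitting is usually advisable when the library is sorted by label.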
The trained sentiment scoring model is used to calculate the sentiment score of each viewpoint, where the sentiment score lies in the interval [-1, 1]. A negative score indicates that the viewpoint leans toward negative comment, a positive score indicates that it leans toward positive comment, and a score near 0 indicates that it is probably neutral.
The statistics of the public opinion articles represented by the vocabulary sets of each viewpoint are analyzed for each data channel, including Weibo, WeChat, news portals and forums. For example, for WeChat articles the statistics mainly include the number of reads, comments and shares of each article, and for Weibo posts they mainly include the number of reposts, comments and likes. The popularity of the viewpoint in each channel is evaluated, and the popularity of the viewpoint is calculated from its popularity in each data channel and the preset weight of each data channel. It will be understood that the popularity of an individual public opinion article can be calculated in the same way.
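One way to combine per-channel statistics into a single popularity value is sketched below. The application does not say how raw counts are turned into a per-channel popularity, so the capped ratio used here is purely an assumption, chosen so that the result stays in [0, 1]; the weights, caps and counts are made-up example values.

```python
def channel_popularity(stats, cap):
    """Map raw interaction counts to [0, 1]; the capping rule is an assumption."""
    return min(1.0, sum(stats.values()) / cap)


def viewpoint_popularity(channel_stats, channel_caps, channel_weights):
    """Weighted average of per-channel popularity using preset channel weights."""
    total_weight = sum(channel_weights[c] for c in channel_stats)
    weighted = sum(
        channel_weights[c] * channel_popularity(channel_stats[c], channel_caps[c])
        for c in channel_stats
    )
    return weighted / total_weight


# Hypothetical statistics for the articles of one viewpoint.
channel_stats = {
    "wechat": {"reads": 6000, "comments": 1500, "shares": 500},
    "weibo": {"reposts": 2000, "comments": 2000, "likes": 6000},
}
channel_caps = {"wechat": 10000, "weibo": 10000}  # assumed saturation points
channel_weights = {"wechat": 0.6, "weibo": 0.4}   # preset channel weights

popularity = viewpoint_popularity(channel_stats, channel_caps, channel_weights)
```

Dividing by the total weight keeps the result in [0, 1] even when a viewpoint has no articles in some channels.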
Step S60: calculate the public opinion index of the viewpoint from the sentiment score and the popularity, determine that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generate and output early warning information based on the abnormal viewpoint and its core topic.
After the popularity and sentiment score of each viewpoint have been calculated, its public opinion index is calculated as public opinion index = popularity × sentiment score. The magnitude of the public opinion index reflects the influence of the viewpoint on public opinion: the closer its absolute value is to 1, the greater that influence. When the absolute value of the calculated public opinion index exceeds the preset threshold, the viewpoint is determined to be an abnormal viewpoint. For example, if the preset threshold is 0.8 and the calculated public opinion index of a viewpoint is -0.9, its absolute value is 0.9, which is greater than the preset threshold, and the index leans toward negative evaluation. Early warning information can then be output, and the early warning information includes the core topic of the viewpoint.
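The index calculation and threshold check in step S60 can be written down directly. In the sketch below the dictionary layout of a viewpoint and of an alert is hypothetical; the formula and the 0.8 threshold follow the example in the text.

```python
def public_opinion_index(popularity, sentiment):
    """Public opinion index = popularity x sentiment score."""
    return popularity * sentiment


def find_abnormal(viewpoints, threshold=0.8):
    """Return an alert for every viewpoint whose |index| exceeds the threshold."""
    alerts = []
    for viewpoint in viewpoints:
        index = public_opinion_index(viewpoint["popularity"], viewpoint["sentiment"])
        if abs(index) > threshold:
            alerts.append({
                "viewpoint": viewpoint["name"],
                "index": index,
                "core_topic": viewpoint["core_topic"],
            })
    return alerts


# Hypothetical viewpoints with already computed popularity and sentiment.
viewpoints = [
    {"name": "A", "popularity": 1.0, "sentiment": -0.9, "core_topic": "article_b"},
    {"name": "B", "popularity": 0.5, "sentiment": 0.6, "core_topic": "article_a"},
]
alerts = find_abnormal(viewpoints)
```

Viewpoint A yields an index of -0.9 and is flagged, while viewpoint B yields 0.3 and is not; the sign of the index tells the reader whether the alert concerns negative or positive sentiment.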
With the network public opinion analysis method proposed in this embodiment, a public opinion event is determined and public opinion articles related to the event are collected from preset data channels through a distributed web crawler; word segmentation is performed on the articles and the vocabulary set of each article is obtained to characterize it; a clustering algorithm is used to perform cluster analysis on the vocabulary sets to generate multiple viewpoints on the event, and the word vector of each viewpoint is calculated from the word vectors of the vocabulary sets it contains; one or more vocabulary sets are extracted from those contained in a viewpoint, and the articles they represent are used as the core topic of the viewpoint; the word vector of the viewpoint is input into a pre-trained sentiment scoring model to output the viewpoint's sentiment score, and the popularity of the viewpoint is calculated from the popularity, in each data channel, of the articles corresponding to the vocabulary sets contained in the viewpoint and the preset weight of each data channel; the public opinion index of the viewpoint is calculated from the sentiment score and the popularity, a viewpoint whose public opinion index has an absolute value greater than a preset threshold is determined to be an abnormal viewpoint, and early warning information is generated and output based on the abnormal viewpoint and its core topic. By performing cluster analysis on the collected articles, this application constructs multiple viewpoints on a public opinion event and thus summarizes the event at a high level; by combining this with the sentiment scoring model to calculate the sentiment score of each summarized viewpoint, it can judge the influence of each viewpoint on the event and issue early warnings accordingly, which improves the ability to monitor and warn about public opinion.
The present application also provides a network public opinion analysis apparatus. FIG. 2 is a schematic diagram of the internal structure of a network public opinion analysis apparatus according to an embodiment of the present application.
In this embodiment, the network public opinion analysis apparatus 1 may be a PC (Personal Computer), or a terminal device such as a smartphone, a tablet computer, or a portable computer. The apparatus 1 includes at least a memory 11, a processor 12, a network interface 13, and a communication bus 14.
The memory 11 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a magnetic memory, a magnetic disk, or an optical disc. In some embodiments the memory 11 may be an internal storage unit of the apparatus 1, for example its hard disk. In other embodiments the memory 11 may be an external storage device of the apparatus 1, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the apparatus 1. The memory 11 may also include both an internal storage unit and an external storage device of the apparatus 1. The memory 11 can be used not only to store application software installed on the apparatus 1 and various types of data, such as the code of the public opinion analysis program 01, but also to temporarily store data that has been output or is about to be output.
In some embodiments the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. It runs the program code stored in the memory 11 or processes data, for example by executing the public opinion analysis program 01.
The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the apparatus 1 and other electronic devices.
The communication bus 14 provides the connections over which these components communicate.
Optionally, the apparatus 1 may further include a user interface. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. In some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or a display unit; it shows the information processed in the apparatus 1 and presents a visual user interface.
FIG. 2 shows only the network public opinion analysis apparatus 1 with components 11 to 14 and the public opinion analysis program 01. Those skilled in the art will understand that the structure shown in FIG. 2 does not limit the apparatus 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in FIG. 2, the memory 11 stores the public opinion analysis program 01, and the processor 12 implements the following steps when it executes the public opinion analysis program 01 stored in the memory 11:
Determine a public opinion event, and collect public opinion articles related to the public opinion event from preset data channels through a distributed web crawler.
A public opinion event in the embodiments of the present application is generally an event that is happening now, and the user may set one or more keywords to represent it. Public opinion articles related to the event are collected from the preset data channels through a distributed web crawler and stored separately by data channel. Specifically, a list of URLs to be crawled is set in advance. At regular intervals a web crawler visits the URL addresses in the list, fetches the articles that match the preset keywords reflecting the public opinion event, and adds them to the corpus. The data channels include, but are not limited to, Weibo, WeChat, news portals, and forums. The public opinion articles obtained from these channels mainly include news comments, forum posts, Weibo posts, and WeChat articles.
Perform word segmentation on the public opinion articles, and obtain the vocabulary set in each public opinion article to characterize the article.
Specifically, this step includes the following sub-steps. The body text of the public opinion article is extracted; irrelevant data such as HTML tags and image markup is removed, and non-Chinese characters are then removed from the body text with a regular expression. The remaining body text is segmented with a word segmentation tool, which turns each Chinese paragraph into a vocabulary set separated by spaces. Stop words are removed from the vocabulary set according to a preset stop-word list, and the remaining vocabulary set is used to characterize the public opinion article; that is, the space-separated vocabulary set serves as the feature that represents the article.
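The preprocessing sub-steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the stop-word list and dictionary are hypothetical toy values, and `toy_segment` is a simple forward-maximum-matching stand-in for whatever word segmentation tool is actually used (the text does not name one).

```python
import re
from html import unescape

# Hypothetical toy data; a real system would load much larger lists.
STOP_WORDS = {"的", "了", "是"}
TOY_DICT = {"网络", "舆情", "分析", "方法"}

def extract_chinese(html_text):
    """Strip HTML tags, then keep only CJK unified ideographs."""
    no_tags = unescape(re.sub(r"<[^>]+>", "", html_text))
    return re.sub(r"[^\u4e00-\u9fff]", "", no_tags)

def toy_segment(text, dictionary=TOY_DICT, max_len=4):
    """Forward maximum matching; a placeholder for a real segmenter."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

def to_vocabulary_set(html_text, segment=toy_segment, stop_words=STOP_WORDS):
    """Return the article as a space-separated string without stop words."""
    words = segment(extract_chinese(html_text))
    return " ".join(w for w in words if w not in stop_words)

print(to_vocabulary_set("<p>网络舆情的分析方法 2018!</p><img src='a.png'>"))
```

The `segment` parameter is left pluggable so that a production segmenter can replace the toy one without changing the rest of the pipeline.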
Perform cluster analysis on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculate the word vector of each viewpoint from the vocabulary sets it contains.
Extract one or more vocabulary sets from those contained in a viewpoint, and take the public opinion articles characterized by the extracted vocabulary sets as the core topic of that viewpoint.
After the vocabulary set of each public opinion article has been obtained, the articles, each represented by its space-separated vocabulary set, are clustered. The public may see an event in many different ways, and different people hold different views. Cluster analysis of all the collected public opinion articles therefore summarizes them at a high level and yields multiple viewpoint categories. Specifically, this step may include the following sub-steps:
Obtain the Chinese Wikipedia corpus and, based on that corpus, select multiple words in each vocabulary set as keywords according to the term frequency-inverse document frequency (TF-IDF) algorithm. Generate a word vector model of Chinese from the corpus, calculate the word vectors of the keywords with the word vector model, and calculate the word vector of each vocabulary set from the word vectors of its keywords. Cluster all vocabulary sets of the public opinion event according to their word vectors and the K-means algorithm, so as to divide the vocabulary sets into multiple types of viewpoints; because a vocabulary set characterizes a public opinion article, clustering the vocabulary sets is in effect clustering the articles. Collect the keywords of the vocabulary sets contained in each viewpoint, and calculate the word vector of the viewpoint from the word vectors of the collected keywords.
In more detail, the Chinese Wikipedia corpus is obtained, and the importance of each word in each vocabulary set is calculated against that corpus with the TF-IDF algorithm. For each vocabulary set, the N most important words are selected as the keywords of the corresponding article. A Word2vec model of Chinese is generated from the Chinese Wikipedia corpus. For each vocabulary set, the word vectors of the N selected keywords are calculated with the Word2vec model, and the word vector of the vocabulary set is calculated from the keyword vectors. The word vectors of all vocabulary sets of the public opinion event are obtained in this way. Because a vocabulary set characterizes a public opinion article, extracting the keywords of a vocabulary set is in effect extracting the keywords of the article.
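As an illustration of the keyword selection and vectorization just described, the sketch below ranks the words of one article by TF-IDF against reference-corpus document frequencies and then combines the keyword vectors. Several details are assumptions, because the text does not fix them: the smoothed IDF formula, the alphabetical tie-break, and the use of a plain average to turn keyword vectors into a vocabulary-set vector. The `word_vectors` dictionary stands in for a trained Word2vec model.

```python
import math
from collections import Counter

def top_keywords(words, doc_freq, n_docs, top_n=3):
    """Rank the words of one article by TF-IDF and return the top_n.

    doc_freq maps a word to the number of reference-corpus documents
    that contain it; n_docs is the size of the reference corpus.
    """
    tf = Counter(words)
    total = len(words)
    scores = {
        w: (count / total) * math.log((1 + n_docs) / (1 + doc_freq.get(w, 0)))
        for w, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically (an assumption).
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

def mean_vector(keywords, word_vectors):
    """Average the vectors of the keywords known to the model (assumed rule)."""
    known = [word_vectors[w] for w in keywords if w in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

words = ["a", "a", "b", "c"]
keywords = top_keywords(words, {"a": 1, "b": 9, "c": 9}, n_docs=9, top_n=2)
print(keywords, mean_vector(keywords, {"a": [1.0, 0.0], "b": [0.0, 1.0]}))
```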
After the word vector of each vocabulary set has been calculated, all public opinion articles in the corpus that relate to the public opinion event (each represented by its vocabulary set) are clustered with the K-means algorithm and divided into multiple types of viewpoints. K is the number of clusters. Its initial value is set randomly and is then adjusted according to an evaluation of the clustering result until the accuracy of the result reaches a set threshold.
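The clustering step can be sketched with a plain implementation of Lloyd's K-means algorithm. This is only an illustration for a fixed K: the text does not say how the clustering result is evaluated when K is adjusted, so that outer loop is omitted here, and the seed, iteration cap, and function names are assumptions.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    rng = random.Random(seed)
    # Initial centroids are k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for step in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

article_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans(article_vectors, k=2))
```

Each label identifies the viewpoint to which the corresponding article (vocabulary set) is assigned.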
The keywords of the public opinion articles characterized by all the vocabulary sets in a viewpoint are collected, and the word frequency of each keyword is calculated; the word frequency reflects the weight of the keyword. The word vector of each collected keyword is calculated with the Word2vec model, and the word vector of the viewpoint is calculated from the keyword vectors and word frequencies. The similarity between each vocabulary set and its viewpoint is then calculated from the word vector of the viewpoint and the word vectors of the vocabulary sets under that viewpoint; cosine similarity may be used to measure the similarity between vectors. The public opinion articles characterized by the one or more vocabulary sets with the highest similarity are selected as the core topic of the viewpoint.
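A minimal sketch of this step follows. It assumes that "calculated from the keyword vectors and word frequencies" means a frequency-weighted average, which the text does not state explicitly; the function names and the `word_vectors` dictionary (standing in for a Word2vec model) are hypothetical.

```python
import math
from collections import Counter

def weighted_viewpoint_vector(keyword_lists, word_vectors):
    """Frequency-weighted mean of keyword vectors over a viewpoint's articles."""
    freq = Counter(w for kws in keyword_lists for w in kws if w in word_vectors)
    total = sum(freq.values())
    if total == 0:
        raise ValueError("no keyword has a known word vector")
    dim = len(next(iter(word_vectors.values())))
    return [
        sum(word_vectors[w][i] * count for w, count in freq.items()) / total
        for i in range(dim)
    ]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def core_topics(article_vectors, viewpoint_vector, top_m=1):
    """Indices of the top_m articles most similar to the viewpoint."""
    ranked = sorted(
        range(len(article_vectors)),
        key=lambda i: -cosine(article_vectors[i], viewpoint_vector),
    )
    return ranked[:top_m]

vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
viewpoint = weighted_viewpoint_vector([["a", "b"], ["a"]], vectors)
print(core_topics([[0.0, 1.0], [1.0, 0.0]], viewpoint))
```

A similarity threshold, as in claim 4, could be used in place of `top_m` by filtering on the cosine value instead of slicing the ranking.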
Input the word vector of the viewpoint into the pre-trained sentiment scoring model, output the sentiment score of the viewpoint, and calculate the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and from the preset weight of each data channel.
A preset deep neural network model is trained in advance on a constructed sample library to determine the model parameters, and the deep neural network model with the determined parameters is used as the sentiment scoring model. Specifically, this step includes the following sub-steps. Public opinion text data with label data is obtained to form the sample library, where the label data is a sentiment score assigned to the text according to whether the comment is positive, negative, or neutral, for example 1 for a positive comment, -1 for a negative comment, and 0 for a neutral comment. The keywords of the public opinion text data in the sample library are extracted using the Wikipedia corpus together with the TF-IDF algorithm, and the word vectors of the keywords are calculated with the trained word vector model. The word vectors and label data of the public opinion text data in the sample library are used as training samples and input into the preset deep neural network model for training, so as to determine the model parameters; the deep neural network model with the determined parameters is used as the sentiment scoring model. In addition, the model is trained with cross-validation to prevent overfitting.
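The text mentions cross-validation without giving details. As a hedged illustration only, the sketch below shows one common form, k-fold splitting of the sample indices; the choice of k-fold, the value of k, and the function name are assumptions, and the network training itself is deliberately left out.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train, validation in k_fold_splits(10, k=3):
    # A model would be trained on `train` and scored on `validation` here.
    print(len(train), validation)
```

Averaging the validation score over the folds gives an estimate of how well the sentiment model generalizes beyond the samples it was trained on.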
The trained sentiment scoring model is used to calculate the sentiment score of each viewpoint. The score lies in the interval [-1, 1]. A negative score indicates that the viewpoint leans toward negative comments; a positive score indicates that it leans toward positive comments; a score near 0 indicates that the viewpoint is probably neutral.
The statistics of the public opinion articles characterized by the vocabulary sets of each viewpoint are analyzed for each data channel, including Weibo, WeChat, news portals, and forums. For a WeChat article, for example, the statistics mainly include the number of reads, comments, and shares of the article; for a Weibo post, they mainly include the number of reposts, comments, and likes. The popularity of the viewpoint in each channel is evaluated, and the overall popularity of the viewpoint is calculated from its popularity in each data channel and the preset weight of each data channel. The popularity of an individual public opinion article can be calculated in the same way.
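The weighted combination across channels might look like the sketch below. The text does not say how per-channel popularity is derived from the raw counts, so this example assumes that each channel's popularity has already been normalized to [0, 1]; it also assumes that the preset weights are normalized to sum to 1. The channel names and numbers are illustrative only.

```python
def viewpoint_popularity(channel_popularity, channel_weights):
    """Weighted mean of per-channel popularity values in [0, 1]."""
    total_weight = sum(channel_weights[ch] for ch in channel_popularity)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        channel_popularity[ch] * channel_weights[ch] for ch in channel_popularity
    )
    return weighted / total_weight

popularity = viewpoint_popularity(
    {"weibo": 0.8, "wechat": 0.4},   # hypothetical normalized popularity
    {"weibo": 3.0, "wechat": 1.0},   # hypothetical preset channel weights
)
print(round(popularity, 3))
```

Normalizing by the total weight keeps the result in [0, 1], which is what allows the later product with the sentiment score to stay within [-1, 1].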
Calculate the public opinion index of the viewpoint from the sentiment score and the popularity, determine that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generate and output early warning information according to the abnormal viewpoint and its core topic.
After the popularity and sentiment score of each viewpoint have been calculated, its public opinion index is calculated as public opinion index = popularity × sentiment score. The magnitude of the index reflects the influence of the viewpoint on public opinion: the closer the absolute value of the index is to 1, the greater the influence. When the absolute value of the calculated index exceeds the preset threshold, the viewpoint is determined to be an abnormal viewpoint. For example, if the preset threshold is 0.8 and the calculated index of a viewpoint is -0.9, its absolute value is 0.9, which is greater than the threshold, and the index leans toward a negative evaluation. Early warning information can then be output, and it contains the core topic of that viewpoint.
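The index calculation and threshold check can be illustrated with the worked example from the text (threshold 0.8, index -0.9). The dictionary layout, field names, and sample values below are hypothetical and chosen only to reproduce those numbers.

```python
def public_opinion_index(popularity, sentiment):
    """Index = popularity * sentiment score, as defined in the text."""
    return popularity * sentiment

def abnormal_viewpoints(viewpoints, threshold=0.8):
    """Return an alert for every viewpoint whose |index| exceeds threshold."""
    alerts = []
    for vp in viewpoints:
        index = public_opinion_index(vp["popularity"], vp["sentiment"])
        if abs(index) > threshold:
            alerts.append({
                "index": index,
                "tone": "negative" if index < 0 else "positive",
                "core_topic": vp["core_topic"],
            })
    return alerts

sample = [
    {"popularity": 1.0, "sentiment": -0.9, "core_topic": "article A"},
    {"popularity": 0.5, "sentiment": 0.9, "core_topic": "article B"},
]
for alert in abnormal_viewpoints(sample):
    print(alert)
```

Only the first viewpoint triggers an alert: its index is -0.9, whose absolute value exceeds 0.8, while the second has an index of 0.45.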
The network public opinion analysis apparatus of this embodiment first determines a public opinion event and then collects public opinion articles related to the event from preset data channels through a distributed web crawler. It performs word segmentation on each article to obtain a vocabulary set that characterizes the article. It applies a clustering algorithm to the vocabulary sets to generate multiple viewpoints on the event, and calculates a word vector for each viewpoint from the word vectors of the vocabulary sets it contains. It extracts one or more vocabulary sets from those contained in a viewpoint and takes the articles they characterize as the core topic of that viewpoint. It inputs the word vector of the viewpoint into a pre-trained sentiment scoring model, which outputs the sentiment score of the viewpoint, and calculates the popularity of the viewpoint from the popularity, in each data channel, of the articles corresponding to its vocabulary sets and from the preset weight of each data channel. It calculates a public opinion index for the viewpoint from the sentiment score and the popularity, determines that a viewpoint whose index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generates and outputs early warning information according to the abnormal viewpoint and its core topic.
By clustering the collected articles, the present application constructs multiple viewpoints on a public opinion event and thereby summarizes the event at a high level. It combines this with the sentiment scoring model to calculate the sentiment score of each summarized viewpoint, so that the influence of each viewpoint on the event can be judged and an early warning issued. This improves the ability to monitor public opinion and to give early warning of it.
Optionally, in other embodiments, the public opinion analysis program may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present application. A module in the present application is a series of computer program instruction segments capable of performing a specific function, and is used to describe how the public opinion analysis program executes in the network public opinion analysis apparatus.
For example, FIG. 3 is a schematic diagram of the program modules of the public opinion analysis program in an embodiment of the network public opinion analysis apparatus of the present application. In this embodiment, the public opinion analysis program may be divided into a data collection module 10, an article segmentation module 20, an article clustering module 30, a topic extraction module 40, a score calculation module 50, and an index calculation module 60. By way of example:
The data collection module 10 is configured to determine a public opinion event and collect public opinion articles related to the public opinion event from preset data channels through a distributed web crawler;
The article segmentation module 20 is configured to perform word segmentation on the public opinion articles and obtain the vocabulary set in each public opinion article to characterize the article;
The article clustering module 30 is configured to perform cluster analysis on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and to calculate the word vector of each viewpoint from the vocabulary sets it contains;
The topic extraction module 40 is configured to extract one or more vocabulary sets from those contained in a viewpoint and take the public opinion articles characterized by the extracted vocabulary sets as the core topic of that viewpoint;
The score calculation module 50 is configured to input the word vector of the viewpoint into the pre-trained sentiment scoring model, output the sentiment score of the viewpoint, and calculate the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and from the preset weight of each data channel;
The index calculation module 60 is configured to calculate the public opinion index of the viewpoint from the sentiment score and the popularity, determine that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generate and output early warning information according to the abnormal viewpoint and its core topic.
The functions or operation steps implemented when program modules such as the data collection module 10, the article segmentation module 20, the article clustering module 30, the topic extraction module 40, the score calculation module 50, and the index calculation module 60 are executed are substantially the same as those of the above embodiments and are not repeated here.
In addition, an embodiment of the present application provides a computer-readable storage medium storing a public opinion analysis program that can be executed by one or more processors to implement the following operations:
Determine a public opinion event, and collect public opinion articles related to the public opinion event from preset data channels through a distributed web crawler;
Perform word segmentation on the public opinion articles, and obtain the vocabulary set in each public opinion article to characterize the article;
Perform cluster analysis on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculate the word vector of each viewpoint from the vocabulary sets it contains;
Extract one or more vocabulary sets from those contained in a viewpoint, and take the public opinion articles characterized by the extracted vocabulary sets as the core topic of that viewpoint;
Input the word vector of the viewpoint into the pre-trained sentiment scoring model, output the sentiment score of the viewpoint, and calculate the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and from the preset weight of each data channel;
Calculate the public opinion index of the viewpoint from the sentiment score and the popularity, determine that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generate and output early warning information according to the abnormal viewpoint and its core topic.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as the embodiments of the network public opinion analysis apparatus and method described above and is not repeated here.
It should be noted that the serial numbers of the above embodiments of the present application are for description only and do not indicate that one embodiment is better than another. The terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, apparatus, article, or method. Unless further limited, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. On this understanding, the part of the technical solution of the present application that is essential, or that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. A network public opinion analysis method, characterized in that the method comprises:
    determining a public opinion event, and collecting public opinion articles related to the public opinion event from preset data channels through a distributed web crawler;
    performing word segmentation on the public opinion articles, and obtaining the vocabulary set in each public opinion article to characterize the article;
    performing cluster analysis on the vocabulary sets with a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculating the word vector of each viewpoint from the vocabulary sets it contains;
    extracting one or more vocabulary sets from those contained in a viewpoint, and taking the public opinion articles characterized by the extracted vocabulary sets as the core topic of that viewpoint;
    inputting the word vector of the viewpoint into a pre-trained sentiment scoring model, outputting the sentiment score of the viewpoint, and calculating the popularity of the viewpoint from the popularity, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and from the preset weight of each data channel;
    calculating the public opinion index of the viewpoint from the sentiment score and the popularity, determining that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generating and outputting early warning information according to the abnormal viewpoint and its core topic.
  2. The network public opinion analysis method according to claim 1, characterized in that the step of performing word segmentation on the public opinion articles and obtaining the vocabulary set in each public opinion article to characterize the article comprises:
    extracting the body text of the public opinion article, and removing non-Chinese characters from the body text with a regular expression;
    segmenting the body text from which non-Chinese characters have been removed with a word segmentation tool, and converting the body text into a vocabulary set separated by spaces;
    removing stop words from the vocabulary set to obtain the vocabulary set that characterizes the article.
  3. The network public opinion analysis method according to claim 1, wherein the step of performing cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculating a word vector of each viewpoint according to the vocabulary sets contained in that viewpoint, comprises:
    obtaining a Chinese Wikipedia corpus and, based on the corpus, selecting multiple words in a vocabulary set as keywords according to the term frequency-inverse document frequency (TF-IDF) algorithm;
    generating a word vector model for Chinese text based on the corpus, calculating word vectors of the keywords through the word vector model, and calculating a word vector of the vocabulary set according to the word vectors of the keywords;
    clustering all vocabulary sets of the public opinion event according to the word vectors of the vocabulary sets and the K-means algorithm, so as to divide the vocabulary sets of the public opinion event into multiple types of viewpoints;
    aggregating the keywords of the vocabulary sets contained in a viewpoint, and calculating the word vector of the viewpoint according to the word vectors of the aggregated keywords.
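Keyword selection and the derivation of a set-level vector from keyword vectors might look like the sketch below. The claim does not fix the TF-IDF variant or say how keyword vectors are combined, so the smoothed IDF and the element-wise mean are assumptions, and `embeddings` is a hypothetical word-to-vector lookup standing in for the trained word vector model.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tfidf_keywords(doc: Sequence[str],
                   corpus: Sequence[Sequence[str]],
                   top_n: int) -> List[str]:
    """Rank the words of `doc` by TF-IDF against `corpus`; return the top_n."""
    counts = Counter(doc)
    n_docs = len(corpus)

    def idf(word: str) -> float:
        doc_freq = sum(1 for d in corpus if word in d)
        # Smoothed IDF so that unseen words do not divide by zero.
        return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0

    scores = {w: (c / len(doc)) * idf(w) for w, c in counts.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


def mean_vector(words: Sequence[str],
                embeddings: Dict[str, List[float]]) -> List[float]:
    """Element-wise mean of the vectors of the words found in `embeddings`."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return []
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

The same `mean_vector` helper could be applied twice: once to the keywords of a single vocabulary set, and once to the aggregated keywords of all vocabulary sets in a viewpoint.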
  4. The network public opinion analysis method according to claim 3, wherein the step of extracting one or more vocabulary sets from the vocabulary sets contained in a viewpoint, and using the public opinion articles characterized by the extracted vocabulary sets as the core topic of the viewpoint, comprises:
    calculating the similarity between each vocabulary set and its corresponding viewpoint according to the word vector of the viewpoint and the word vectors of the vocabulary sets contained in the viewpoint;
    using the public opinion articles characterized by the one or more vocabulary sets whose similarity is greater than a preset threshold as the core topic of the viewpoint.
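Core-topic selection by similarity can be sketched as below. The claim only says "similarity", so cosine similarity is an assumed choice here, and the article identifiers and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def core_topics(viewpoint_vec: Sequence[float],
                article_vecs: Dict[str, List[float]],
                threshold: float) -> List[str]:
    """Ids of articles whose vocabulary-set vector is close to the viewpoint."""
    return [article_id
            for article_id, vec in article_vecs.items()
            if cosine(viewpoint_vec, vec) > threshold]
```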
  5. The network public opinion analysis method according to claim 1, wherein the training of the sentiment scoring model comprises:
    obtaining public opinion text data with label data added, to form a sample library;
    extracting keywords from the public opinion text data in the sample library using the TF-IDF algorithm, and calculating word vectors of the keywords through a trained word vector model;
    using the word vectors and label data of the public opinion text data in the sample library as training samples, inputting them into a preset deep neural network model for training to determine model parameters, and using the deep neural network model with the determined model parameters as the sentiment scoring model.
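The training loop can be illustrated with a deliberately small stand-in. The claim specifies a deep neural network but not its architecture, so the sketch below swaps in a single-neuron logistic model trained by stochastic gradient descent; it shows only the shape of the procedure (vectors and labels in, fitted parameters out) and is not the claimed model. The label convention (1 = positive, 0 = negative) and the mapping of the output to the range [-1, 1] are also assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_scorer(samples: Sequence[Sequence[float]],
                 labels: Sequence[int],
                 epochs: int = 500,
                 lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit weights and bias with per-sample gradient descent on log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad = pred - y  # derivative of log loss w.r.t. the pre-activation
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def sentiment_score(x: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> float:
    """Map the predicted probability to a signed score in [-1, 1]."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 2.0 * sigmoid(z) - 1.0
```

A signed score is used because the public opinion index is later compared by absolute value, which suggests that negative and positive sentiment both matter.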
  6. The network public opinion analysis method according to claim 2, wherein the training of the sentiment scoring model comprises:
    obtaining public opinion text data with label data added, to form a sample library;
    extracting keywords from the public opinion text data in the sample library using the TF-IDF algorithm, and calculating word vectors of the keywords through a trained word vector model;
    using the word vectors and label data of the public opinion text data in the sample library as training samples, inputting them into a preset deep neural network model for training to determine model parameters, and using the deep neural network model with the determined model parameters as the sentiment scoring model.
  7. The network public opinion analysis method according to claim 3, wherein the training of the sentiment scoring model comprises:
    obtaining public opinion text data with label data added, to form a sample library;
    extracting keywords from the public opinion text data in the sample library using the TF-IDF algorithm, and calculating word vectors of the keywords through a trained word vector model;
    using the word vectors and label data of the public opinion text data in the sample library as training samples, inputting them into a preset deep neural network model for training to determine model parameters, and using the deep neural network model with the determined model parameters as the sentiment scoring model.
  8. A network public opinion analysis apparatus, comprising a memory and a processor, wherein the memory stores a public opinion analysis program executable on the processor, and the public opinion analysis program, when executed by the processor, implements the following steps:
    determining a public opinion event, and collecting public opinion articles related to the public opinion event from preset data channels through a distributed web crawler;
    performing word segmentation on the public opinion articles, and obtaining the vocabulary sets in the public opinion articles to characterize the public opinion articles;
    performing cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculating a word vector of each viewpoint according to the vocabulary sets contained in that viewpoint;
    extracting one or more vocabulary sets from the vocabulary sets contained in a viewpoint, and using the public opinion articles characterized by the extracted vocabulary sets as the core topic of the viewpoint;
    inputting the word vector of the viewpoint into a pre-trained sentiment scoring model to output a sentiment score of the viewpoint, and calculating the heat of the viewpoint according to the heat, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and a preset weight of each data channel;
    calculating a public opinion index of the viewpoint according to the sentiment score and the heat, determining that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generating and outputting warning information according to the abnormal viewpoint and the core topic of the abnormal viewpoint.
  9. The network public opinion analysis apparatus according to claim 8, wherein the step of performing word segmentation on the public opinion articles and obtaining the vocabulary sets in the public opinion articles to characterize the public opinion articles comprises:
    extracting body text data of a public opinion article, and removing non-Chinese characters from the body text data through a regular expression;
    performing word segmentation on the body text data, after removal of the non-Chinese characters, using a word segmentation tool, so as to convert the body text data into a vocabulary set delimited by spaces;
    performing stop-word removal on the vocabulary set to obtain the vocabulary set.
  10. The network public opinion analysis apparatus according to claim 8, wherein the step of performing cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculating a word vector of each viewpoint according to the vocabulary sets contained in that viewpoint, comprises:
    obtaining a Chinese Wikipedia corpus and, based on the corpus, selecting multiple words in a vocabulary set as keywords according to the term frequency-inverse document frequency (TF-IDF) algorithm;
    generating a word vector model for Chinese text based on the corpus, calculating word vectors of the keywords through the word vector model, and calculating a word vector of the vocabulary set according to the word vectors of the keywords;
    clustering all vocabulary sets of the public opinion event according to the word vectors of the vocabulary sets and the K-means algorithm, so as to divide the vocabulary sets of the public opinion event into multiple types of viewpoints;
    aggregating the keywords of the vocabulary sets contained in a viewpoint, and calculating the word vector of the viewpoint according to the word vectors of the aggregated keywords.
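The K-means clustering step that groups vocabulary-set vectors into viewpoints can be sketched in plain Python. The claim does not specify initialization, distance measure, or stopping rule, so seeding the centroids with the first `k` points, using squared Euclidean distance, and running a fixed number of iterations are assumptions made to keep the example deterministic.

```python
from typing import List, Sequence


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: Sequence[Sequence[float]],
           k: int,
           iterations: int = 20) -> List[int]:
    """Return a cluster label (0..k-1) for each point."""
    # Deterministic initialization: the first k points are the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each resulting label would correspond to one viewpoint, and the vocabulary sets sharing a label would form the contents of that viewpoint.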
  11. The network public opinion analysis apparatus according to claim 10, wherein the step of extracting one or more vocabulary sets from the vocabulary sets contained in a viewpoint, and using the public opinion articles characterized by the extracted vocabulary sets as the core topic of the viewpoint, comprises:
    calculating the similarity between each vocabulary set and its corresponding viewpoint according to the word vector of the viewpoint and the word vectors of the vocabulary sets contained in the viewpoint;
    using the public opinion articles characterized by the one or more vocabulary sets whose similarity is greater than a preset threshold as the core topic of the viewpoint.
  12. The network public opinion analysis apparatus according to claim 8, wherein the training of the sentiment scoring model comprises:
    obtaining public opinion text data with label data added, to form a sample library;
    extracting keywords from the public opinion text data in the sample library using the TF-IDF algorithm, and calculating word vectors of the keywords through a trained word vector model;
    using the word vectors and label data of the public opinion text data in the sample library as training samples, inputting them into a preset deep neural network model for training to determine model parameters, and using the deep neural network model with the determined model parameters as the sentiment scoring model.
  13. The network public opinion analysis apparatus according to claim 9, wherein the training of the sentiment scoring model comprises:
    obtaining public opinion text data with label data added, to form a sample library;
    extracting keywords from the public opinion text data in the sample library using the TF-IDF algorithm, and calculating word vectors of the keywords through a trained word vector model;
    using the word vectors and label data of the public opinion text data in the sample library as training samples, inputting them into a preset deep neural network model for training to determine model parameters, and using the deep neural network model with the determined model parameters as the sentiment scoring model.
  14. The network public opinion analysis apparatus according to claim 10, wherein the training of the sentiment scoring model comprises:
    obtaining public opinion text data with label data added, to form a sample library;
    extracting keywords from the public opinion text data in the sample library using the TF-IDF algorithm, and calculating word vectors of the keywords through a trained word vector model;
    using the word vectors and label data of the public opinion text data in the sample library as training samples, inputting them into a preset deep neural network model for training to determine model parameters, and using the deep neural network model with the determined model parameters as the sentiment scoring model.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a public opinion analysis program, and the public opinion analysis program is executable by one or more processors to implement the following steps:
    determining a public opinion event, and collecting public opinion articles related to the public opinion event from preset data channels through a distributed web crawler;
    performing word segmentation on the public opinion articles, and obtaining the vocabulary sets in the public opinion articles to characterize the public opinion articles;
    performing cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculating a word vector of each viewpoint according to the vocabulary sets contained in that viewpoint;
    extracting one or more vocabulary sets from the vocabulary sets contained in a viewpoint, and using the public opinion articles characterized by the extracted vocabulary sets as the core topic of the viewpoint;
    inputting the word vector of the viewpoint into a pre-trained sentiment scoring model to output a sentiment score of the viewpoint, and calculating the heat of the viewpoint according to the heat, in each data channel, of the public opinion articles corresponding to the vocabulary sets contained in the viewpoint and a preset weight of each data channel;
    calculating a public opinion index of the viewpoint according to the sentiment score and the heat, determining that a viewpoint whose public opinion index has an absolute value greater than a preset threshold is an abnormal viewpoint, and generating and outputting warning information according to the abnormal viewpoint and the core topic of the abnormal viewpoint.
  16. The computer-readable storage medium according to claim 15, wherein the step of performing word segmentation on the public opinion articles and obtaining the vocabulary sets in the public opinion articles to characterize the public opinion articles comprises:
    extracting body text data of a public opinion article, and removing non-Chinese characters from the body text data through a regular expression;
    performing word segmentation on the body text data, after removal of the non-Chinese characters, using a word segmentation tool, so as to convert the body text data into a vocabulary set delimited by spaces;
    performing stop-word removal on the vocabulary set to obtain the vocabulary set.
  17. The computer-readable storage medium according to claim 15, wherein the step of performing cluster analysis on the vocabulary sets using a clustering algorithm to generate multiple types of viewpoints on the public opinion event, and calculating a word vector of each viewpoint according to the vocabulary sets contained in that viewpoint, comprises:
    obtaining a Chinese Wikipedia corpus and, based on the corpus, selecting multiple words in a vocabulary set as keywords according to the term frequency-inverse document frequency (TF-IDF) algorithm;
    generating a word vector model for Chinese text based on the corpus, calculating word vectors of the keywords through the word vector model, and calculating a word vector of the vocabulary set according to the word vectors of the keywords;
    clustering all vocabulary sets of the public opinion event according to the word vectors of the vocabulary sets and the K-means algorithm, so as to divide the vocabulary sets of the public opinion event into multiple types of viewpoints;
    aggregating the keywords of the vocabulary sets contained in a viewpoint, and calculating the word vector of the viewpoint according to the word vectors of the aggregated keywords.
  18. The computer-readable storage medium according to claim 17, wherein the step of extracting one or more vocabulary sets from the vocabulary sets contained in a viewpoint, and using the public opinion articles characterized by the extracted vocabulary sets as the core topic of the viewpoint, comprises:
    calculating the similarity between each vocabulary set and its corresponding viewpoint according to the word vector of the viewpoint and the word vectors of the vocabulary sets contained in the viewpoint;
    using the public opinion articles characterized by the one or more vocabulary sets whose similarity is greater than a preset threshold as the core topic of the viewpoint.
  19. The computer-readable storage medium according to claim 15, wherein the training of the sentiment scoring model comprises:
    obtaining public opinion text data with label data added, to form a sample library;
    extracting keywords from the public opinion text data in the sample library using the TF-IDF algorithm, and calculating word vectors of the keywords through a trained word vector model;
    using the word vectors and label data of the public opinion text data in the sample library as training samples, inputting them into a preset deep neural network model for training to determine model parameters, and using the deep neural network model with the determined model parameters as the sentiment scoring model.
  20. The computer-readable storage medium according to claim 16, wherein the training of the sentiment scoring model comprises:
    obtaining public opinion text data with label data added, to form a sample library;
    extracting keywords from the public opinion text data in the sample library using the TF-IDF algorithm, and calculating word vectors of the keywords through a trained word vector model;
    using the word vectors and label data of the public opinion text data in the sample library as training samples, inputting them into a preset deep neural network model for training to determine model parameters, and using the deep neural network model with the determined model parameters as the sentiment scoring model.
PCT/CN2018/102116 2018-05-31 2018-08-24 Network public opinion analysis method and apparatus, and computer-readable storage medium WO2019227710A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810544762.6A CN108959383A (en) 2018-05-31 2018-05-31 Analysis method, device and the computer readable storage medium of network public-opinion
CN201810544762.6 2018-05-31

Publications (1)

Publication Number Publication Date
WO2019227710A1 (en) 2019-12-05

Family

ID=64492765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102116 WO2019227710A1 (en) 2018-05-31 2018-08-24 Network public opinion analysis method and apparatus, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN108959383A (en)
WO (1) WO2019227710A1 (en)

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339385A (en) * 2020-02-26 2020-06-26 山东爱城市网信息技术有限公司 CART-based public opinion type identification method and system, storage medium and electronic equipment
CN111738596A (en) * 2020-06-22 2020-10-02 中国银行股份有限公司 Work order distribution method and device
CN111784492A (en) * 2020-07-10 2020-10-16 讯飞智元信息科技有限公司 Public opinion analysis and financial early warning method, device, electronic equipment and storage medium
CN111831824A (en) * 2020-07-16 2020-10-27 民生科技有限责任公司 Public opinion positive and negative face classification method
CN111832815A (en) * 2020-07-02 2020-10-27 山东电力研究院 Scientific research hotspot prediction method and system
CN111931022A (en) * 2020-06-10 2020-11-13 北京雅邦网络技术发展有限公司 AI hot spot content intelligent editing system
CN111950273A (en) * 2020-07-31 2020-11-17 南京莱斯网信技术研究院有限公司 Network public opinion emergency automatic identification method based on emotion information extraction analysis
CN111966920A (en) * 2020-07-13 2020-11-20 江汉大学 Public opinion propagation stable condition prediction method, device and equipment
CN112000813A (en) * 2020-09-14 2020-11-27 支付宝(杭州)信息技术有限公司 Knowledge base construction method and device
CN112035658A (en) * 2020-08-05 2020-12-04 海纳致远数字科技(上海)有限公司 Enterprise public opinion monitoring method based on deep learning
CN112101002A (en) * 2020-09-15 2020-12-18 南京行者易智能交通科技有限公司 Big data based case situation perception early warning method, measure recommendation method and device and terminal equipment
CN112270183A (en) * 2020-10-21 2021-01-26 北京钛氪新媒体科技有限公司 News spreading effect monitoring system based on text
CN112269852A (en) * 2020-10-23 2021-01-26 深圳中泓在线股份有限公司 Method, system and storage medium for generating public opinion topic
CN112329462A (en) * 2020-11-26 2021-02-05 北京五八信息技术有限公司 Data sorting method and device, electronic equipment and storage medium
CN112347230A (en) * 2020-11-16 2021-02-09 上海品见智能科技有限公司 Enterprise public opinion data analysis method based on Word2Vec
CN112527956A (en) * 2020-12-08 2021-03-19 北京工商大学 Food safety public opinion event extraction method based on deep learning
CN112541358A (en) * 2020-06-24 2021-03-23 深圳证券交易所 Public opinion risk early warning method and device and computer storage medium
CN112711691A (en) * 2021-01-08 2021-04-27 深圳市网联安瑞网络科技有限公司 Network public opinion guide effect data information processing method, system, terminal and medium
CN112711651A (en) * 2020-12-30 2021-04-27 上海金仕达软件科技有限公司 Public opinion monitoring method and system
CN112862305A (en) * 2021-02-03 2021-05-28 北京百度网讯科技有限公司 Method, device, equipment and storage medium for determining risk state of object
CN113010764A (en) * 2021-04-15 2021-06-22 杭州恒声科技有限公司 Public opinion monitoring system, method, computer equipment and storage medium
CN113032653A (en) * 2021-04-02 2021-06-25 盐城师范学院 Big data-based public opinion monitoring platform
CN113051455A (en) * 2021-03-31 2021-06-29 合肥供水集团有限公司 Water affair public opinion identification method based on network text data
CN113094703A (en) * 2021-03-11 2021-07-09 北京六方云信息技术有限公司 Output content filtering method and system for web intrusion detection
CN113239685A (en) * 2021-01-13 2021-08-10 中国科学院计算技术研究所 Public sentiment detection method and system based on dual sentiments
CN113239687A (en) * 2021-05-08 2021-08-10 北京天空卫士网络安全技术有限公司 Data processing method and device
CN113268976A (en) * 2021-02-20 2021-08-17 北京交通大学 Topic influence evaluation method facing microblog
CN113468868A (en) * 2021-07-07 2021-10-01 西北大学 NLP-based real-time network hotspot content analysis method
CN113536133A (en) * 2021-07-30 2021-10-22 西安康奈网络科技有限公司 Internet data processing method based on single public opinion event
CN113569118A (en) * 2021-06-30 2021-10-29 深圳市东信时代信息技术有限公司 Self-media pushing method and device, computer equipment and storage medium
CN113590914A (en) * 2021-06-23 2021-11-02 北京百度网讯科技有限公司 Information processing method, device, electronic equipment and storage medium
CN113610427A (en) * 2021-08-19 2021-11-05 深圳市德信软件有限公司 Event early warning index obtaining method and device, terminal equipment and storage medium
CN113657547A (en) * 2021-08-31 2021-11-16 平安医疗健康管理股份有限公司 Public opinion monitoring method based on natural language processing model and related equipment thereof
CN113672792A (en) * 2021-08-20 2021-11-19 广州畅驿智能科技有限公司 Network public opinion data processing method and system
CN113822498A (en) * 2021-10-29 2021-12-21 南京视察者智能科技有限公司 Social contradiction index prediction method based on big data
CN113946680A (en) * 2021-10-20 2022-01-18 河南师范大学 Online network rumor identification method based on graph embedding and information flow analysis
CN114611011A (en) * 2022-03-09 2022-06-10 之江实验室 High-influence user discovery method considering dynamic public sentiment theme
CN114661974A (en) * 2022-03-21 2022-06-24 重庆市规划和自然资源信息中心 Method for public opinion analysis and early warning of government affair website by utilizing natural language semantic analysis
CN114861027A (en) * 2022-04-29 2022-08-05 深圳市东晟数据有限公司 Multi-dimensional public opinion recommendation method based on big data and natural language processing
CN115827989A (en) * 2023-02-16 2023-03-21 杭州金诚信息安全科技有限公司 Network public opinion artificial intelligence early warning system and method under big data environment
CN116017070A (en) * 2022-12-01 2023-04-25 四川长虹电器股份有限公司 Method for improving clicking rate of television homepage based on operation strategy
CN116069832A (en) * 2023-04-07 2023-05-05 微网优联科技(成都)有限公司 Data mining method and device and electronic equipment
CN116522013A (en) * 2023-06-29 2023-08-01 乐麦信息技术(杭州)有限公司 Public opinion analysis method and system based on social network platform
CN116527697A (en) * 2023-06-30 2023-08-01 杭州城市大脑有限公司 Block chain and IPFS public opinion sharing method and system applied to network system management
CN116542238A (en) * 2023-07-07 2023-08-04 和元达信息科技有限公司 Event heat trend determining method and system based on small program
CN116701729A (en) * 2023-08-01 2023-09-05 贵州融云信息技术有限公司 Network public opinion detection system and detection method
CN116861063A (en) * 2023-06-07 2023-10-10 广州数说故事信息科技有限公司 Method for exploring commercial value degree of social media hot search
CN116881504A (en) * 2023-09-06 2023-10-13 北京橙色风暴数字技术有限公司 Image information digital management system and method based on artificial intelligence
CN116910231A (en) * 2023-09-11 2023-10-20 社治无忧(成都)智慧科技有限公司 WeChat public opinion early warning method and system based on natural language processing
CN117217218A (en) * 2023-11-08 2023-12-12 中国科学技术信息研究所 Emotion dictionary construction method and device, electronic equipment and storage medium
CN117390184A (en) * 2023-10-08 2024-01-12 南京特尔顿信息科技有限公司 Internet public opinion early warning method and system based on big data technology
CN117494897A (en) * 2023-11-14 2024-02-02 西安康奈网络科技有限公司 Single public opinion event development tendency judging method
CN117575171A (en) * 2024-01-09 2024-02-20 湖南工商大学 Grain situation intelligent evaluation system based on data analysis
CN112711691B (en) * 2021-01-08 2024-04-30 深圳市网联安瑞网络科技有限公司 Network public opinion guiding effect data information processing method, system, terminal and medium

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740146B (en) * 2018-12-10 2023-02-03 厦门市美亚柏科信息股份有限公司 Public opinion monitoring method, terminal and storage medium
CN109800302A (en) * 2018-12-14 2019-05-24 深圳壹账通智能科技有限公司 Public sentiment method for early warning, device, terminal and medium based on Recognition with Recurrent Neural Network algorithm
CN109800307B (en) * 2019-01-18 2022-08-02 深圳壹账通智能科技有限公司 Product evaluation analysis method and device, computer equipment and storage medium
CN110009128A (en) * 2019-01-28 2019-07-12 平安科技(深圳)有限公司 Industry public opinion index prediction technique, device, computer equipment and storage medium
CN109933709B (en) * 2019-01-31 2023-09-26 平安科技(深圳)有限公司 Public opinion tracking method and device for video text combined data and computer equipment
CN109948161A (en) * 2019-03-20 2019-06-28 北京深海巨鲸信息科技有限公司 Data processing method and device for Chinese public sentiment
CN110096652A (en) * 2019-05-06 2019-08-06 上海汽车集团股份有限公司 Public sentiment wind vane index calculation method and device, readable storage medium storing program for executing
CN110222513B (en) * 2019-05-21 2023-06-23 平安科技(深圳)有限公司 Abnormality monitoring method and device for online activities and storage medium
CN112100367A (en) * 2019-05-28 2020-12-18 贵阳海信网络科技有限公司 Public opinion early warning method and device for scenic spot
CN110196979B (en) * 2019-06-05 2023-07-25 深圳市思迪信息技术股份有限公司 Intent recognition method and device based on distributed system
CN110263238B (en) * 2019-06-21 2021-10-15 浙江华坤道威数据科技有限公司 Big data-based public opinion listening system
CN110297986A (en) * 2019-06-21 2019-10-01 山东科技大学 A kind of Sentiment orientation analysis method of hot microblog topic
CN110516067B (en) * 2019-08-23 2022-02-11 北京工商大学 Public opinion monitoring method, system and storage medium based on topic detection
CN110555092B (en) * 2019-09-10 2023-07-04 腾讯科技(深圳)有限公司 Public opinion processing method, apparatus and computer readable storage medium
CN110705288A (en) * 2019-09-29 2020-01-17 武汉海昌信息技术有限公司 Big data-based public opinion analysis system
CN110852090B (en) * 2019-11-07 2024-03-19 中科天玑数据科技股份有限公司 Mechanism characteristic vocabulary expansion system and method for public opinion crawling
CN111160019B (en) * 2019-12-30 2023-08-15 中国联合网络通信集团有限公司 Public opinion monitoring method, device and system
CN111241077B (en) * 2020-01-03 2023-06-09 四川新网银行股份有限公司 Identification method of financial fraud based on internet data
CN111309903B (en) * 2020-01-20 2023-06-16 北京大米未来科技有限公司 Data processing method and device, storage medium and electronic equipment
CN111400437A (en) * 2020-02-19 2020-07-10 北京三快在线科技有限公司 Internet information response method and device, electronic equipment and computer readable medium
CN111414455B (en) * 2020-03-20 2024-03-01 北京百度网讯科技有限公司 Public opinion analysis method, public opinion analysis device, electronic equipment and readable storage medium
CN111428146A (en) * 2020-03-24 2020-07-17 上海智臻智能网络科技股份有限公司 Network information processing method and system, equipment and storage medium
CN111753172A (en) * 2020-06-04 2020-10-09 南京晓庄学院 Internet public opinion information acquisition and processing method
CN111680226A (en) * 2020-06-16 2020-09-18 杭州安恒信息技术股份有限公司 Network public opinion analysis method, device, system, equipment and readable storage medium
CN112463963A (en) * 2020-11-30 2021-03-09 深圳前海微众银行股份有限公司 Method for identifying target public sentiment, model training method and device
CN112434226A (en) * 2020-12-15 2021-03-02 易研信息科技有限公司 Network public opinion monitoring and early warning method
CN112581006A (en) * 2020-12-25 2021-03-30 杭州衡泰软件有限公司 Public opinion engine and method for screening public opinion information and monitoring enterprise main body risk level
CN112966500B (en) * 2021-02-15 2021-11-23 珠海市鸿瑞信息技术股份有限公司 Network data chain safety monitoring platform based on artificial intelligence configuration
CN113392195B (en) * 2021-02-25 2023-07-28 中国人民解放军战略支援部队信息工程大学 Public opinion monitoring method and device, electronic equipment and storage medium
CN112948677B (en) * 2021-02-26 2023-11-03 上海携旅信息技术有限公司 Recommendation reason determining method, system, equipment and medium based on comment aesthetic feeling
CN113254746B (en) * 2021-05-24 2023-07-18 华北科技学院(中国煤矿安全技术培训中心) Internet public opinion display system based on Raspberry Pi
CN113360710B (en) * 2021-05-27 2023-09-01 北京奇艺世纪科技有限公司 Method and device for determining combination degree between objects, computer equipment and storage medium
CN113536805B (en) * 2021-07-09 2023-07-14 北京奇艺世纪科技有限公司 Public opinion analysis method, device, equipment and storage medium for hot events
CN113505581A (en) * 2021-07-27 2021-10-15 北京工商大学 Education big data text analysis method based on APSO-LSTM network
CN114036221A (en) * 2021-09-24 2022-02-11 国务院国有资产监督管理委员会研究中心 Thematic event analysis method
CN114969334B (en) * 2022-05-20 2023-04-07 北京九章云极科技有限公司 Abnormal log detection method and device, electronic equipment and readable storage medium
CN115409018B (en) * 2022-09-20 2023-05-02 浙江书香荷马文化有限公司 Corporate public opinion monitoring system and method based on big data
CN116362589B (en) * 2023-02-23 2023-08-25 中国标准化研究院 Quality work assessment and evaluation method
CN116108851B (en) * 2023-03-13 2023-08-11 北京国研数通软件技术有限公司 NER-based community appeal identification method and system
CN117093762B (en) * 2023-07-18 2024-02-13 南京特尔顿信息科技有限公司 Public opinion data evaluation analysis system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793503A (en) * 2014-01-24 2014-05-14 北京理工大学 Opinion mining and classification method based on web texts
CN104537097A (en) * 2015-01-09 2015-04-22 成都布林特信息技术有限公司 Microblog public opinion monitoring system
CN107045497A (en) * 2017-05-04 2017-08-15 成都华栖云科技有限公司 A kind of quick newsletter archive content sentiment analysis system and method
CN107085608A (en) * 2017-04-21 2017-08-22 上海喆之信息科技有限公司 A kind of effective network hotspot monitoring system

Cited By (88)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339385A (en) * 2020-02-26 2020-06-26 山东爱城市网信息技术有限公司 CART-based public opinion type identification method and system, storage medium and electronic equipment
CN111931022A (en) * 2020-06-10 2020-11-13 北京雅邦网络技术发展有限公司 AI hot spot content intelligent editing system
CN111738596A (en) * 2020-06-22 2020-10-02 中国银行股份有限公司 Work order distribution method and device
CN111738596B (en) * 2020-06-22 2024-03-22 中国银行股份有限公司 Work order dispatching method and device
CN112541358A (en) * 2020-06-24 2021-03-23 深圳证券交易所 Public opinion risk early warning method and device and computer storage medium
CN111832815B (en) * 2020-07-02 2023-12-05 国网山东省电力公司电力科学研究院 Scientific research hot spot prediction method and system
CN111832815A (en) * 2020-07-02 2020-10-27 山东电力研究院 Scientific research hotspot prediction method and system
CN111784492A (en) * 2020-07-10 2020-10-16 讯飞智元信息科技有限公司 Public opinion analysis and financial early warning method, device, electronic equipment and storage medium
CN111966920B (en) * 2020-07-13 2023-09-12 江汉大学 Method, device and equipment for predicting stable condition of public opinion propagation
CN111966920A (en) * 2020-07-13 2020-11-20 江汉大学 Public opinion propagation stable condition prediction method, device and equipment
CN111831824B (en) * 2020-07-16 2024-02-09 民生科技有限责任公司 Public opinion positive and negative surface classification method
CN111831824A (en) * 2020-07-16 2020-10-27 民生科技有限责任公司 Public opinion positive and negative face classification method
CN111950273B (en) * 2020-07-31 2023-09-01 南京莱斯网信技术研究院有限公司 Automatic network public opinion emergency identification method based on emotion information extraction analysis
CN111950273A (en) * 2020-07-31 2020-11-17 南京莱斯网信技术研究院有限公司 Network public opinion emergency automatic identification method based on emotion information extraction analysis
CN112035658A (en) * 2020-08-05 2020-12-04 海纳致远数字科技(上海)有限公司 Enterprise public opinion monitoring method based on deep learning
CN112035658B (en) * 2020-08-05 2024-04-30 海纳致远数字科技(上海)有限公司 Enterprise public opinion monitoring method based on deep learning
CN112000813A (en) * 2020-09-14 2020-11-27 支付宝(杭州)信息技术有限公司 Knowledge base construction method and device
CN112101002A (en) * 2020-09-15 2020-12-18 南京行者易智能交通科技有限公司 Big data based case situation perception early warning method, measure recommendation method and device and terminal equipment
CN112101002B (en) * 2020-09-15 2021-04-02 南京行者易智能交通科技有限公司 Big data based case situation perception early warning method, measure recommendation method and device and terminal equipment
CN112270183A (en) * 2020-10-21 2021-01-26 北京钛氪新媒体科技有限公司 News spreading effect monitoring system based on text
CN112270183B (en) * 2020-10-21 2024-03-19 北京钛氪新媒体科技有限公司 News propagation effect monitoring system based on text
CN112269852A (en) * 2020-10-23 2021-01-26 深圳中泓在线股份有限公司 Method, system and storage medium for generating public opinion topic
CN112347230B (en) * 2020-11-16 2024-04-19 上海品见智能科技有限公司 Enterprise public opinion data analysis method based on Word2Vec
CN112347230A (en) * 2020-11-16 2021-02-09 上海品见智能科技有限公司 Enterprise public opinion data analysis method based on Word2Vec
CN112329462B (en) * 2020-11-26 2024-02-20 北京五八信息技术有限公司 Data sorting method and device, electronic equipment and storage medium
CN112329462A (en) * 2020-11-26 2021-02-05 北京五八信息技术有限公司 Data sorting method and device, electronic equipment and storage medium
CN112527956A (en) * 2020-12-08 2021-03-19 北京工商大学 Food safety public opinion event extraction method based on deep learning
CN112711651A (en) * 2020-12-30 2021-04-27 上海金仕达软件科技有限公司 Public opinion monitoring method and system
CN112711691B (en) * 2021-01-08 2024-04-30 深圳市网联安瑞网络科技有限公司 Network public opinion guiding effect data information processing method, system, terminal and medium
CN112711691A (en) * 2021-01-08 2021-04-27 深圳市网联安瑞网络科技有限公司 Network public opinion guide effect data information processing method, system, terminal and medium
CN113239685A (en) * 2021-01-13 2021-08-10 中国科学院计算技术研究所 Public sentiment detection method and system based on dual sentiments
CN113239685B (en) * 2021-01-13 2023-10-31 中国科学院计算技术研究所 Public opinion detection method and system based on double emotions
CN112862305A (en) * 2021-02-03 2021-05-28 北京百度网讯科技有限公司 Method, device, equipment and storage medium for determining risk state of object
CN113268976A (en) * 2021-02-20 2021-08-17 北京交通大学 Topic influence evaluation method facing microblog
CN113268976B (en) * 2021-02-20 2023-09-12 北京交通大学 Microblog-oriented topic influence assessment method
CN113094703A (en) * 2021-03-11 2021-07-09 北京六方云信息技术有限公司 Output content filtering method and system for web intrusion detection
CN113051455A (en) * 2021-03-31 2021-06-29 合肥供水集团有限公司 Water affair public opinion identification method based on network text data
CN113032653A (en) * 2021-04-02 2021-06-25 盐城师范学院 Big data-based public opinion monitoring platform
CN113010764A (en) * 2021-04-15 2021-06-22 杭州恒声科技有限公司 Public opinion monitoring system, method, computer equipment and storage medium
CN113010764B (en) * 2021-04-15 2023-08-22 德观智能控制设备涿州有限公司 Public opinion monitoring system, public opinion monitoring method, computer equipment and storage medium
CN113239687A (en) * 2021-05-08 2021-08-10 北京天空卫士网络安全技术有限公司 Data processing method and device
CN113239687B (en) * 2021-05-08 2024-03-22 北京天空卫士网络安全技术有限公司 Data processing method and device
CN113590914B (en) * 2021-06-23 2024-02-20 北京百度网讯科技有限公司 Information processing method, apparatus, electronic device and storage medium
CN113590914A (en) * 2021-06-23 2021-11-02 北京百度网讯科技有限公司 Information processing method, device, electronic equipment and storage medium
CN113569118B (en) * 2021-06-30 2023-12-22 深圳市东信时代信息技术有限公司 Self-media pushing method, device, computer equipment and storage medium
CN113569118A (en) * 2021-06-30 2021-10-29 深圳市东信时代信息技术有限公司 Self-media pushing method and device, computer equipment and storage medium
CN113468868A (en) * 2021-07-07 2021-10-01 西北大学 NLP-based real-time network hotspot content analysis method
CN113536133B (en) * 2021-07-30 2023-04-11 西安康奈网络科技有限公司 Internet data processing method based on single public opinion event
CN113536133A (en) * 2021-07-30 2021-10-22 西安康奈网络科技有限公司 Internet data processing method based on single public opinion event
CN113610427A (en) * 2021-08-19 2021-11-05 深圳市德信软件有限公司 Event early warning index obtaining method and device, terminal equipment and storage medium
CN113610427B (en) * 2021-08-19 2023-08-18 深圳市德信软件有限公司 Event early warning index obtaining method, device, terminal equipment and storage medium
CN113672792A (en) * 2021-08-20 2021-11-19 广州畅驿智能科技有限公司 Network public opinion data processing method and system
CN113657547A (en) * 2021-08-31 2021-11-16 平安医疗健康管理股份有限公司 Public opinion monitoring method based on natural language processing model and related equipment thereof
CN113946680B (en) * 2021-10-20 2024-04-16 河南师范大学 Online network rumor identification method based on graph embedding and information flow analysis
CN113946680A (en) * 2021-10-20 2022-01-18 河南师范大学 Online network rumor identification method based on graph embedding and information flow analysis
CN113822498B (en) * 2021-10-29 2023-07-18 南京视察者智能科技有限公司 Social contradiction index prediction method based on big data
CN113822498A (en) * 2021-10-29 2021-12-21 南京视察者智能科技有限公司 Social contradiction index prediction method based on big data
CN114611011A (en) * 2022-03-09 2022-06-10 之江实验室 High-influence user discovery method considering dynamic public sentiment theme
CN114611011B (en) * 2022-03-09 2024-03-29 之江实验室 High-influence user discovery method considering dynamic public opinion theme
CN114661974B (en) * 2022-03-21 2024-03-08 重庆市规划和自然资源信息中心 Government website public opinion analysis and early warning method by utilizing natural language semantic analysis
CN114661974A (en) * 2022-03-21 2022-06-24 重庆市规划和自然资源信息中心 Method for public opinion analysis and early warning of government affair website by utilizing natural language semantic analysis
CN114861027A (en) * 2022-04-29 2022-08-05 深圳市东晟数据有限公司 Multi-dimensional public opinion recommendation method based on big data and natural language processing
CN116017070A (en) * 2022-12-01 2023-04-25 四川长虹电器股份有限公司 Method for improving clicking rate of television homepage based on operation strategy
CN116017070B (en) * 2022-12-01 2024-04-12 四川长虹电器股份有限公司 Method for improving clicking rate of television homepage based on operation strategy
CN115827989B (en) * 2023-02-16 2023-04-28 杭州金诚信息安全科技有限公司 Network public opinion artificial intelligent early warning system and method in big data environment
CN115827989A (en) * 2023-02-16 2023-03-21 杭州金诚信息安全科技有限公司 Network public opinion artificial intelligence early warning system and method under big data environment
CN116069832A (en) * 2023-04-07 2023-05-05 微网优联科技(成都)有限公司 Data mining method and device and electronic equipment
CN116069832B (en) * 2023-04-07 2023-06-06 微网优联科技(成都)有限公司 Data mining method and device and electronic equipment
CN116861063B (en) * 2023-06-07 2024-02-27 广州数说故事信息科技有限公司 Method for exploring commercial value degree of social media hot search
CN116861063A (en) * 2023-06-07 2023-10-10 广州数说故事信息科技有限公司 Method for exploring commercial value degree of social media hot search
CN116522013A (en) * 2023-06-29 2023-08-01 乐麦信息技术(杭州)有限公司 Public opinion analysis method and system based on social network platform
CN116522013B (en) * 2023-06-29 2023-09-05 乐麦信息技术(杭州)有限公司 Public opinion analysis method and system based on social network platform
CN116527697A (en) * 2023-06-30 2023-08-01 杭州城市大脑有限公司 Block chain and IPFS public opinion sharing method and system applied to network system management
CN116527697B (en) * 2023-06-30 2023-09-08 杭州城市大脑有限公司 Block chain and IPFS public opinion sharing method and system applied to network system management
CN116542238B (en) * 2023-07-07 2024-03-15 和元达信息科技有限公司 Event heat trend determining method and system based on small program
CN116542238A (en) * 2023-07-07 2023-08-04 和元达信息科技有限公司 Event heat trend determining method and system based on small program
CN116701729A (en) * 2023-08-01 2023-09-05 贵州融云信息技术有限公司 Network public opinion detection system and detection method
CN116701729B (en) * 2023-08-01 2023-10-31 贵州融云信息技术有限公司 Network public opinion detection system and detection method
CN116881504B (en) * 2023-09-06 2023-11-24 北京橙色风暴数字技术有限公司 Image information digital management system and method based on artificial intelligence
CN116881504A (en) * 2023-09-06 2023-10-13 北京橙色风暴数字技术有限公司 Image information digital management system and method based on artificial intelligence
CN116910231B (en) * 2023-09-11 2023-11-17 社治无忧(成都)智慧科技有限公司 WeChat public opinion early warning method and system based on natural language processing
CN116910231A (en) * 2023-09-11 2023-10-20 社治无忧(成都)智慧科技有限公司 WeChat public opinion early warning method and system based on natural language processing
CN117390184A (en) * 2023-10-08 2024-01-12 南京特尔顿信息科技有限公司 Internet public opinion early warning method and system based on big data technology
CN117217218A (en) * 2023-11-08 2023-12-12 中国科学技术信息研究所 Emotion dictionary construction method and device, electronic equipment and storage medium
CN117217218B (en) * 2023-11-08 2024-01-23 中国科学技术信息研究所 Emotion dictionary construction method and device for science and technology risk event related public opinion
CN117494897A (en) * 2023-11-14 2024-02-02 西安康奈网络科技有限公司 Single public opinion event development tendency judging method
CN117575171A (en) * 2024-01-09 2024-02-20 湖南工商大学 Grain situation intelligent evaluation system based on data analysis
CN117575171B (en) * 2024-01-09 2024-04-05 湖南工商大学 Grain situation intelligent evaluation system based on data analysis

Also Published As

Publication number Publication date
CN108959383A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
WO2019227710A1 (en) Network public opinion analysis method and apparatus, and computer-readable storage medium
CN109325165B (en) Network public opinion analysis method, device and storage medium
CN110516067B (en) Public opinion monitoring method, system and storage medium based on topic detection
CN109145216A (en) Network public-opinion monitoring method, device and storage medium
CN109271512B (en) Emotion analysis method, device and storage medium for public opinion comment information
US10831769B2 (en) Search method and device for asking type query based on deep question and answer
AU2017408801B2 (en) User keyword extraction device and method, and computer-readable storage medium
Alzahrani et al. Understanding plagiarism linguistic patterns, textual features, and detection methods
WO2019218514A1 (en) Method for extracting webpage target information, device, and storage medium
US9996604B2 (en) Generating usage report in a question answering system based on question categorization
CN108108426B (en) Understanding method and device for natural language question and electronic equipment
US9342592B2 (en) Method for systematic mass normalization of titles
KR102491172B1 (en) Natural language question-answering system and learning method
KR20130022042A (en) System for detecting and tracking topic based on topic opinion and social-influencer and method thereof
Tabak et al. Comparison of emotion lexicons
KR102407056B1 (en) Systems and methods for gathering public data of SNS user channel and providing influence reports based on the collected public data
CN116561538A (en) Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium
US20170235835A1 (en) Information identification and extraction
Li et al. A hybrid model for role-related user classification on twitter
CN110019763B (en) Text filtering method, system, equipment and computer readable storage medium
CN109753646B (en) Article attribute identification method and electronic equipment
CN113626704A (en) Method, device and equipment for recommending information based on word2vec model
CN110674288A (en) User portrait method applied to network security field
CN104408036A (en) Correlated topic recognition method and device
Dhiman et al. An unsupervised misinformation detection framework to analyze the users using covid-19 twitter data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920896

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26.03.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18920896

Country of ref document: EP

Kind code of ref document: A1