CN115269950A - Public opinion information content mining and propagation monitoring analysis method - Google Patents

Public opinion information content mining and propagation monitoring analysis method

Info

Publication number
CN115269950A
Authority
CN
China
Prior art keywords
public sentiment
last
event
sentiment event
media
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210653244.4A
Other languages
Chinese (zh)
Inventor
常宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Technology
Original Assignee
Qingdao University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Technology
Priority to CN202210653244.4A
Publication of CN115269950A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an analysis method for public opinion information content mining and propagation monitoring, comprising the following steps: acquiring the last public opinion information; identifying the positive or negative polarity of the last public sentiment event according to the last public opinion information; analyzing the occurrence factors of the last public sentiment event; analyzing the propagation degree of the reproduction of the last public sentiment event based on those occurrence factors; mining the relevant self-media that reported the last public sentiment event according to the propagation degree of its reproduction; predicting the attitude of the relevant self-media toward the reproduction of the last public sentiment event; and, according to the attitude prediction result, performing record-filing authentication on self-media that may cause a public opinion crisis.

Description

Public opinion information content mining and propagation monitoring analysis method
[ technical field ]
The invention relates to the technical field of information, in particular to an analysis method for public opinion information content mining and propagation monitoring.
[ background of the invention ]
Public opinion information carries implicit meaning in its wording, and the corresponding public sentiment events have positive or negative polarity. The occurrence of public sentiment events is not entirely accidental: some events occur because related events have occurred, and some occur because a certain time period has arrived. For example, the successful launch of China's No. 13 manned spaceflight mission brought a great deal of public opinion information. A manned spacecraft that lifts off must eventually land, so it can be predicted that a further wave of public opinion information will arrive once the planned mission period has elapsed; at that point, the self-media that exerted public opinion influence in the early stage may again produce publicity with even greater influence. However, if a self-media outlet had a negative influence when it reported the last public sentiment event, its attitude toward the reproduction of that event may change greatly. Therefore, for propagation prediction, the reproduction of a public sentiment event can be predicted from the last occurrence of that event. Some self-media publish bad speech on their platforms, and how to apply stricter record-filing authentication to such self-media, so as to prevent them from publishing bad speech, remains a problem.
[ summary of the invention ]
The invention provides an analysis method for public sentiment information content mining and propagation monitoring, which mainly comprises the following steps:
acquiring the last public opinion information; identifying the positive or negative polarity of the last public sentiment event according to the last public opinion information; analyzing the occurrence factors of the last public sentiment event; analyzing the propagation degree of the reproduction of the last public sentiment event based on its occurrence factors; mining the relevant self-media that reported the last public sentiment event according to the propagation degree of its reproduction; predicting the attitude of the relevant self-media toward the reproduction of the last public sentiment event; and, according to the attitude prediction result, performing record-filing authentication on self-media that may cause a public opinion crisis.
Further optionally, the acquiring of the last public opinion information includes:
The last public opinion information is acquired from public opinion information on the Internet, as follows: acquiring a preset condition, where the preset condition comprises at least one of a preset time, a preset field, a preset place and a preset object; and acquiring the last public opinion information by a public opinion calculation and analysis method based on the preset condition. Acquiring the last public opinion information based on the preset condition comprises: crawling, by means of a crawler, at least one piece of first information matching the preset condition from an Internet database; when there is exactly one piece of first information, determining that piece as the last public opinion information; when there are multiple pieces of first information, crawling their browsing volumes and screening out the at least one piece of second information with the highest browsing volume; when there is exactly one piece of second information, determining that piece as the last public opinion information; and when there are multiple pieces of second information, crawling their interaction volumes and determining the piece with the highest interaction volume as the last public opinion information. The interaction volume is the sum of the number of likes and the number of comment replies that a piece of information has received.
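The selection cascade described above (single match, then highest browsing volume, then highest interaction volume) can be sketched as follows. This is only an illustrative sketch: the `Item` record, its field names and the function `select_last_opinion` are hypothetical and do not come from the application, and the crawling step is assumed to have already produced the candidate list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical record for one crawled piece of information."""
    title: str
    views: int    # browsing volume
    likes: int    # number of likes
    replies: int  # number of comment replies

    @property
    def interaction(self) -> int:
        # Interaction volume = likes + comment replies, as defined in the text.
        return self.likes + self.replies


def select_last_opinion(items: List[Item]) -> Optional[Item]:
    """Pick the 'last public opinion information' from crawled candidates."""
    if not items:
        return None
    if len(items) == 1:
        return items[0]
    # Keep only the candidates that share the highest browsing volume.
    top_views = max(item.views for item in items)
    second = [item for item in items if item.views == top_views]
    if len(second) == 1:
        return second[0]
    # Break the remaining tie by interaction volume.
    return max(second, key=lambda item: item.interaction)
```

If several candidates still tie on interaction volume, this sketch returns the first of them; the text does not say how such a tie should be resolved.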
Further optionally, the identifying the positive and negative sides of the last public opinion event according to the last public opinion information includes:
Identifying the positive or negative polarity of the last public sentiment event means classifying the polarity of the last public opinion information, and comprises the following. First, features are extracted from the last public opinion information so that the computer system can distinguish the text content that expresses genuinely subjective information; feature extraction means that the computer scans the information text, consults a standard lexicon, and screens out the paragraphs and sentences that contain subjective information; subjective information refers to words with emotional coloring and a positive or negative tendency. Next, based on the paragraphs and sentences containing subjective information, the meaning they express is further extracted and one party's viewpoint on a specific topic is selected, which includes topic extraction, opinion holder identification and statement selection; topic extraction means extracting the specific aspects of the topic on which opinions are expressed; opinion holder identification means identifying the person who holds the opinion; statement selection means identifying the statements issued by the opinion holder and removing the statements of other people. Then, a one-class SVM model is constructed from the influence factors, using all historical public sentiment events as samples; the influence factors include the extracted topics and the statements of the opinion holders. The distance from a sample to the center of the hypersphere, which is an intermediate output of the one-class SVM model, is then mapped by an activation function to the interval [0,1]. If the mapping result equals a first preset threshold, the polarity of the last public sentiment event is neutral; if the mapping result is smaller than the first preset threshold, the polarity of the last public sentiment event is opposite to that of the historical public sentiment events; otherwise, the polarity of the last public sentiment event is the same as that of the historical public sentiment events.
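The final thresholding step can be illustrated with a small sketch. The application does not name the activation function, so the decreasing map exp(-d) used here is purely an assumption, chosen because it sends a sample at the hypersphere center to 1 and a distant sample toward 0. The function names, the default threshold of 0.5 and the tolerance are likewise hypothetical, and the distance is assumed to have been computed by an already trained one-class SVM.

```python
import math


def map_distance(distance: float, scale: float = 1.0) -> float:
    """Map a non-negative distance to (0, 1]; assumed activation exp(-d/scale)."""
    return math.exp(-distance / scale)


def classify_polarity(distance: float, historical_polarity: str,
                      threshold: float = 0.5, tol: float = 1e-9) -> str:
    """Apply the three-way rule from the text to one sample.

    historical_polarity is "positive" or "negative"; threshold stands in
    for the first preset threshold (its value here is an assumption).
    """
    score = map_distance(distance)
    if abs(score - threshold) <= tol:
        return "neutral"
    if score < threshold:
        # Far from the historical samples: opposite polarity.
        return "negative" if historical_polarity == "positive" else "positive"
    # Close to the historical samples: same polarity.
    return historical_polarity
```

Exact equality with the threshold is rare for floating-point scores, which is why the sketch compares within a small tolerance.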
Further optionally, the analyzing the last occurrence of the public sentiment event includes:
The occurrence factors of the last public sentiment event are analyzed for the event whose polarity has been identified. The factors include associated events and associated times; an associated event is an event that caused the last public sentiment event to occur; an associated time is a time period that caused the last public sentiment event to occur. The association degree of the last public sentiment event is calculated with the occurrence association function of the event, and the occurrence factors are analyzed accordingly. This comprises: analyzing the associated events of the last public sentiment event; analyzing the associated times of the last public sentiment event; and calculating the association degree of the last public sentiment event according to its occurrence association function.
The analysis of the associated events of the last public sentiment event specifically comprises:
The associated events of the last public sentiment event are analyzed by a public opinion associated event analysis system, which comprises a preprocessing module, a topology module and an expert module. The preprocessing module preprocesses the text of the last public sentiment event; preprocessing means segmenting the text into keywords that serve as sub-information, including places, things and states. The topology module classifies the sub-information from the preprocessing module into different layers, then builds topological combinations of sub-information within the same layer and across different layers, generating various types of sub-information based on the information of the last public sentiment event. The expert module, according to the various types of sub-information generated by the topology module, crawls texts containing the same sub-information from large Internet databases, takes the content of any text whose combined browsing volume and interaction volume (likes plus comment replies) exceeds a second preset threshold as an associated event, and generates an analysis report.
The analysis of the associated times of the last public sentiment event specifically comprises:
The associated time analysis is used to analyze time periods. First, a data granularity is preset; the data granularity is the degree to which the data of the last public sentiment event is refined. Time intervals are divided according to the data granularity, and a time interval distribution graph is drawn whose abscissa is time and whose ordinate is an associated event occurrence indicator: if an associated event occurs within a time interval, the ordinate for that interval is 1, otherwise it is 0. If, within a preset time period, the ordinate of a time interval in the distribution graph is 1, the last public sentiment event is associated with that time period, and the abscissa value corresponding to that ordinate is the associated time.
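The 0/1 time interval distribution can be sketched as below. The function names are hypothetical, time is represented as plain numbers (for example hours), and the granularity is assumed to be the width of one interval; none of these choices is specified in the application.

```python
from typing import List


def interval_indicator(event_times: List[float], start: float, end: float,
                       granularity: float) -> List[int]:
    """Return one 0/1 flag per interval of width `granularity` in [start, end).

    A flag is 1 if at least one associated event falls inside the interval.
    """
    n = int((end - start) // granularity)
    flags = [0] * n
    for t in event_times:
        if start <= t < start + n * granularity:
            flags[int((t - start) // granularity)] = 1
    return flags


def association_times(flags: List[int], start: float,
                      granularity: float) -> List[float]:
    """Return the start (abscissa) of every interval whose flag is 1."""
    return [start + i * granularity for i, flag in enumerate(flags) if flag == 1]
```

For instance, with events at hours 1, 2 and 13 over a 24-hour window and a 6-hour granularity, the first and third intervals are flagged, so the associated times are the interval starts 0 and 12.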
The method for calculating the degree of association of the last public sentiment event according to the function of association of the last public sentiment event includes:
An occurrence association function for the last public sentiment event is established: R = Sr × W1 + Tr × W2, where R represents the association degree of the last public sentiment event; Sr represents the number of associated events; Tr represents the number of associated times; and W1 and W2 are the corresponding weights, with W1 + W2 = 1.
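As a worked illustration of the association function, the following sketch computes R from the two counts and a single weight. The function name and the default weight of 0.5 are assumptions made for the example; the application does not state how W1 and W2 are chosen.

```python
def association_degree(num_events: int, num_times: int, w1: float = 0.5) -> float:
    """Compute R = Sr * W1 + Tr * W2 with W2 = 1 - W1.

    num_events is Sr (number of associated events) and num_times is Tr
    (number of associated times). The default weight is illustrative only.
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1] so that W1 + W2 = 1")
    w2 = 1.0 - w1
    return num_events * w1 + num_times * w2
```

With four associated events, two associated times and equal weights, R = 4 × 0.5 + 2 × 0.5 = 3.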
Further optionally, the analyzing a degree of propagation of the last public opinion event reproduction based on the occurrence factor of the last public opinion event includes:
The propagation degree is measured by the popularity of the reproduction of the last public sentiment event and by the focus of that reproduction. The popularity is how heated the reproduction of the last public sentiment event is, and reflects the degree of public attention to the reproduced event; the focus is the point on which public controversy concentrates when the last public sentiment event is reproduced. From the analyzed popularity and focus of the reproduction, a propagation degree evaluation system for the reproduction of the public sentiment event is established: SPR = (HOT + FOC)/2, where SPR represents the propagation degree evaluation index of the reproduction of the last public sentiment event; HOT represents the popularity evaluation index of the reproduction, with HOT = 1 if the popularity of the reproduction exceeds a third preset threshold and HOT = 0 otherwise; and FOC represents the focus evaluation index of the reproduction, with FOC = 1 if the probability that the keywords of the reproduced focus appear in the crawled big data lexicon is greater than a fourth preset threshold and FOC = 0 otherwise. The propagation degree of the reproduction of the last public sentiment event is analyzed with a BP neural network: training samples and test samples are determined from the public sentiment event popularity function and the focus prediction model, the training samples being used to train the neural network and the test samples being used to measure the relative error between actual and predicted values. This comprises: analyzing the popularity of the reproduction of the last public sentiment event based on the number of associated events; calculating the popularity of the last public sentiment event according to the public sentiment event popularity function; and predicting the focus of the reproduction of the last public sentiment event.
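The evaluation index SPR = (HOT + FOC)/2 takes only the values 0, 0.5 and 1, which the following sketch makes explicit. The function and parameter names are hypothetical; the two thresholds stand in for the third and fourth preset thresholds, whose values the application does not give.

```python
def propagation_index(popularity: float, popularity_threshold: float,
                      focus_probability: float, focus_threshold: float) -> float:
    """Compute SPR = (HOT + FOC) / 2 from the two binary indicators."""
    # HOT = 1 when the reproduction popularity exceeds the third preset threshold.
    hot = 1 if popularity > popularity_threshold else 0
    # FOC = 1 when the focus keyword probability exceeds the fourth preset threshold.
    foc = 1 if focus_probability > focus_threshold else 0
    return (hot + foc) / 2
```

An event that is popular but whose focus keywords are rare in the lexicon therefore scores 0.5, halfway between no propagation signal and both signals present.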
the analyzing the popularity of the last public sentiment event reproduction based on the number of the associated events specifically comprises:
The popularity of the reproduction of the last public sentiment event is analyzed based on the number of associated events. The more associated events there are, and the higher the total attention of the associated events and the total attention of the last public sentiment event, the higher the popularity of the last public sentiment event and the higher the popularity of its reproduction. An ARIMA regression model is built using as original data the total attention of the associated events and the popularity of the last public sentiment event calculated by the public sentiment event popularity function: the change curves of the total attention of the associated events and of the popularity of the last public sentiment event are plotted as time series; if a curve is not stationary, the corresponding sequence is differenced and a line graph of the differenced series is drawn to determine the order d; the autocorrelation function graph and the partial autocorrelation function graph of the differenced series are then drawn, and the orders p and q of the model are judged from the shapes of these graphs; finally, the predicted value at a later time is read from the ARIMA(p, d, q) output curve, and this value is the popularity of the reproduction of the last public sentiment event.
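Two of the preparatory steps above, differencing the series to choose d and computing the autocorrelation used to judge the model orders, can be sketched with the standard library alone. This is not a full ARIMA fit, which in practice would rely on a statistics package; the function names are hypothetical and the autocorrelation uses the common biased sample estimator, which is an assumption about the intended definition.

```python
from typing import List


def difference(series: List[float], d: int = 1) -> List[float]:
    """Apply first-order differencing d times (each pass shortens the series by 1)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def autocorrelation(series: List[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # constant series: correlation is undefined, report 0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var
```

A quadratic trend such as 1, 4, 9, 16, 25 becomes constant after two differences, which is the kind of evidence that would suggest d = 2 for that series.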
According to the public sentiment event popularity function, calculating the last public sentiment event popularity, and specifically comprising:
A public sentiment event popularity function is established: Hpr = Sr × Ca1 + Ca2, where Hpr represents the popularity of the last public sentiment event; Sr represents the number of associated events; Ca1 represents the total attention of the associated events, that is, the sum of the public's browsing volume, likes, comments and forwards for the associated events in the back-end statistics of the event publishing system; and Ca2 represents the total attention of the last public sentiment event, that is, the sum of the public's browsing volume, likes, comments and forwards for the last public sentiment event in the back-end statistics of the event publishing system.
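The popularity function can be illustrated with a short sketch. The function names and the sample figures in the note below are invented for illustration; only the formula Hpr = Sr × Ca1 + Ca2 and the four-term definition of total attention come from the text.

```python
def total_attention(views: int, likes: int, comments: int, forwards: int) -> int:
    """Total attention = browsing volume + likes + comments + forwards."""
    return views + likes + comments + forwards


def popularity(num_events: int, ca1: int, ca2: int) -> int:
    """Compute Hpr = Sr * Ca1 + Ca2.

    num_events is Sr, ca1 is the total attention of the associated events
    and ca2 is the total attention of the last public sentiment event.
    """
    return num_events * ca1 + ca2
```

With three associated events whose total attention is 128 and a last event whose total attention is 500, Hpr = 3 × 128 + 500 = 884.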
The predicting the focus of the last public sentiment event reproduction specifically comprises the following steps:
All historical focuses are obtained by crawling Internet big data, and a focus prediction model is used to predict the focus of the reproduction of the last public sentiment event. The focus prediction model works as follows: all historical focuses are taken as sample data and analyzed with the K-means clustering algorithm, and several centroids are selected according to the keywords of the historical focuses, where keywords are words whose frequency in the big data is greater than a fifth preset threshold; the Euclidean distance from each sample to each centroid is then calculated, and each sample is assigned to its nearest centroid; the sample data of each category is then averaged to obtain a new centroid for that category; finally, the assignment of samples and the computation of new centroids are iterated until the clustering converges and the centroids no longer change. The last public sentiment event and its associated events are input as parameters, and the clustering algorithm outputs the keyword category of the focus most similar to the last public sentiment event; finally, the focus of the reproduction of the last public sentiment event is obtained by analysis from the keywords of that focus and the last public sentiment event.
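The clustering loop described above (assign each sample to the nearest centroid, average each cluster, repeat until the centroids stop changing) can be sketched in plain Python. The sketch assumes each historical focus has already been turned into a numeric feature vector, which the application does not detail, and the function name `kmeans` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[List[Point]]]:
    """Run K-means from the given initial centroids until they stop changing."""
    centroids = [tuple(c) for c in centroids]
    clusters: List[List[Point]] = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: a centroid becomes the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: old and new centroids are identical
        centroids = new_centroids
    return centroids, clusters
```

A new event would then be assigned to the cluster whose centroid is nearest to its own feature vector, which corresponds to the "most similar focus" step in the text.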
Further optionally, the mining of the relevant media of the previous public opinion event report according to the propagation degree of the previous public opinion event reproduction comprises:
The types and number of relevant self-media that reported the last public sentiment event are mined based on the propagation degree of the reproduction of the last public sentiment event. The relevant self-media comprise image-and-text self-media, video self-media, audio self-media and live-streaming self-media. The more relevant self-media report the last public sentiment event, the higher the popularity of the event and the greater the propagation degree of its reproduction. The association between the relevant self-media and the propagation degree of the reproduction of the last public sentiment event is mined with the Apriori algorithm. The association rule algorithm proceeds as follows: through iteration, all frequent item sets among the types and numbers of relevant self-media in the network big data platform are retrieved, that is, the item sets whose support is not lower than a set threshold; the support is the number of occurrences of each factor item set in the big data platform as a percentage of the total number of item sets. Rules satisfying the confidence requirement are then constructed from the frequent item sets; the confidence is the percentage of the total item set of each factor in all the item sets of the database. The strength of the association with the propagation degree of the reproduction of the last public sentiment event is judged by comparing support and confidence, and a network graph related to that propagation degree is drawn according to the association rules; the network graph shows the relevant self-media that reported the last public sentiment event.
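A compact sketch of the frequent item set search and the confidence calculation follows. Support is computed as in the text, as the fraction of item sets (transactions) that contain a candidate. For confidence the sketch uses the standard Apriori definition, support(A ∪ B) / support(A), which is an assumption and may differ from the wording in the application. All function names and the sample transactions in the test are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[Iterable[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    target = frozenset(itemset)
    hits = sum(1 for t in transactions if target <= frozenset(t))
    return hits / len(transactions)


def frequent_itemsets(transactions: List[Iterable[str]],
                      min_support: float) -> Dict[FrozenSet[str], float]:
    """Level-wise search: keep k-item sets meeting min_support, then join them."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent: Dict[FrozenSet[str], float] = {}
    while current:
        kept = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        # Join step: merge kept sets that differ by exactly one item.
        current = list({a | b for a, b in combinations(kept, 2)
                        if len(a | b) == len(a) + 1})
    return frequent


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Iterable[str]]) -> float:
    """Standard rule confidence: support(A and B together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)
```

Here each transaction would be one observation, for example the set of self-media types that reported an event together with a label for the observed propagation degree.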
Further optionally, the predicting the attitude of the relevant self-media to the last public sentiment event reproduction comprises:
A relevant self-media attitude prediction model is established with a deep learning method to predict the attitude of the relevant self-media toward the reproduction of the last public sentiment event. First, a large amount of data on the public opinion influence of the relevant self-media is collected, including self-media popularity, self-media platform activity, self-media influence and influence on the last public sentiment event. Self-media popularity is the number of fans of the self-media: the larger the number of fans, the higher the popularity. Self-media platform activity comprises the frequency and total number of public sentiment events published on the self-media platform. Self-media influence comprises the numbers of likes, forwards and comments received after the self-media reported all public sentiment events. Influence on the last public sentiment event refers to the positive or negative influence of the self-media on that event: positive influence is measured by the number of likes after the report of the last public sentiment event, and the more likes, the greater the positive influence; negative influence is measured by the number of user complaints (reports) after the report of the last public sentiment event, and the more complaints, the greater the negative influence. A crawler crawls the content data on the public opinion influence of the relevant self-media from the Internet big data platform, including the number of fans of the self-media, the frequency and total number of all public sentiment events published on the self-media platform, the numbers of likes, forwards and comments after the self-media reported all public sentiment events, and the numbers of likes and comments after the report of the last public sentiment event. The content data is divided into a training set and a test set. The numbers of likes and complaints after the report of the last public sentiment event are then preprocessed: if the number of likes is greater than the number of complaints and numerically exceeds a sixth preset threshold, the preprocessing result is positive influence; if the number of complaints is greater than the number of likes and numerically exceeds a seventh preset threshold, the preprocessing result is negative influence. With self-media popularity, self-media platform activity and self-media influence as features and the preprocessing result as the label, the relevant self-media attitude prediction neural network model is trained; the attitude of the relevant self-media toward the reproduction of the last public sentiment event is judged, and the judgment result is recorded in a table file in a preset format and returned to the system. Finally, the model parameters are continuously adjusted using the test set data to improve the accuracy of the relevant self-media attitude prediction model.
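The label preprocessing step can be sketched as a simple rule. The wording of the text leaves open whether the threshold applies to the count itself or to the gap between the two counts; this sketch assumes the former. The function name is hypothetical, the two thresholds stand in for the sixth and seventh preset thresholds, and returning `None` for records that satisfy neither rule is also an assumption.

```python
from typing import Optional


def label_influence(likes: int, complaints: int,
                    like_threshold: int, complaint_threshold: int) -> Optional[str]:
    """Derive a training label from like and complaint counts.

    Returns "positive", "negative", or None when neither rule applies.
    """
    if likes > complaints and likes > like_threshold:
        return "positive"
    if complaints > likes and complaints > complaint_threshold:
        return "negative"
    return None  # ambiguous record; the text does not say how to treat it
```

The resulting labels would be paired with the popularity, platform activity and influence features when the prediction network is trained.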
The recording authentication of the self-media which possibly causes public opinion crisis according to the attitude prediction result of the self-media comprises the following steps:
A public opinion crisis refers to the negative social influence caused by self-media publishing bad speech. The self-media record-filing authentication comprises a data processing module, a neural network model construction module and a self-media early warning generation module. Performing record-filing authentication on self-media that may cause a public opinion crisis, according to the attitude prediction result, comprises: judging whether bad speech authentication is needed according to the number of user complaints (reports) received by the self-media's report of the last public sentiment event; if the number of complaints is higher than an eighth preset threshold, determining that bad speech authentication is needed and authenticating the last public opinion information reported by the self-media as bad speech; acquiring the content data of the last public sentiment event through the data processing module, analyzing the number of complaints and the dynamic prediction result of the relevant self-media for the public sentiment event with a deep learning RNN algorithm, finding the bad speech and constructing bad speech vectors; constructing and training a convolutional neural network model with the bad speech vectors through the neural network model construction module; and issuing early warnings through the self-media early warning generation module using the trained convolutional neural network model, so that self-media that may cause a public opinion crisis are record-filed and authenticated.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
The invention can analyze the associated events and associated times of the last public sentiment event according to the event's positive or negative polarity, and thereby predict the popularity and focus of the reproduction of the last public sentiment event. At the same time, the self-media that reported the last public sentiment event are mined and their attitude toward the reproduction of the event is predicted. According to the predicted attitude, early warnings are issued for self-media that may cause a public opinion crisis and such self-media are record-filed and authenticated, so that healthy public opinion influence is maintained, bad speech is sanctioned, and a green and healthy network platform is preserved.
[ description of the drawings ]
Fig. 1 is a flowchart of the analysis method for public opinion information content mining and propagation monitoring according to the present invention;
Fig. 2 is a schematic structural diagram of the public opinion associated event analysis system according to the present invention.
[ detailed description of embodiments ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of an analysis method for mining and monitoring public sentiment information content according to the present invention. As shown in fig. 1, the analysis method for mining and monitoring public opinion information content may specifically include:
Step 101: acquire the last public opinion information.
The last public opinion information is acquired from public opinion information on the Internet, as follows: acquiring a preset condition, where the preset condition comprises at least one of a preset time, a preset field, a preset place and a preset object; and acquiring the last public opinion information by a public opinion calculation and analysis method based on the preset condition. Acquiring the last public opinion information based on the preset condition comprises: crawling, by means of a crawler, at least one piece of first information matching the preset condition from an Internet database; when there is exactly one piece of first information, determining that piece as the last public opinion information; when there are multiple pieces of first information, crawling their browsing volumes and screening out the at least one piece of second information with the highest browsing volume; when there is exactly one piece of second information, determining that piece as the last public opinion information; and when there are multiple pieces of second information, crawling their interaction volumes and determining the piece with the highest interaction volume as the last public opinion information. The interaction volume is the sum of the number of likes and the number of comment replies that a piece of information has received.
Step 102, identifying the positive or negative polarity of the last public sentiment event according to the last public opinion information.
Identifying the positive or negative polarity of the last public sentiment event means classifying the last public opinion information as positive or negative, and comprises the following. First, features are extracted from the last public opinion information so that the computer system can distinguish the text content that expresses genuinely subjective information; feature extraction means that the computer screens out the paragraphs and sentences carrying subjective information by scanning the information text and consulting a standard lexicon; subjective information refers to words with emotional coloring and a positive or negative tendency. Second, based on the paragraphs and sentences carrying subjective information, the opinion they express is further extracted and the viewpoint of a party on a specific topic is selected, which includes topic extraction, opinion holder identification and statement selection; topic extraction means extracting the specific aspects of the topic on which praise or criticism is expressed; opinion holder identification means identifying the person who holds the opinion; statement selection means identifying the opinions issued by the opinion holder and removing the statements of other people. Third, a one-class SVM model is constructed based on influence factors, taking all historical public sentiment events as samples; the influence factors include the extracted topics and the statements of the opinion holders. Finally, an activation function maps the distance from a sample to the center of the hypersphere, which is an intermediate output of the one-class SVM model, to the interval [0, 1]; if the mapping result is equal to a first preset threshold, the polarity of the last public sentiment event is neutral; if the mapping result is smaller than the first preset threshold, the polarity of the last public sentiment event is opposite to that of the historical public sentiment events; otherwise, the polarity of the last public sentiment event is the same as that of the historical public sentiment events.
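The threshold rule at the end of step 102 can be sketched as below. The description does not name the activation function, so `tanh` is used here purely as an assumed example of a function that maps a non-negative distance into [0, 1); the function names and the tolerance used for the "equal to the threshold" case are also assumptions. Training the one-class SVM itself is outside this sketch.

```python
import math


def map_distance(distance: float) -> float:
    """Map a non-negative distance into [0, 1). tanh is an assumed choice."""
    if distance < 0:
        raise ValueError("distance to the hypersphere center must be non-negative")
    return math.tanh(distance)


def polarity(mapped: float, historical: str, threshold: float = 0.5,
             tol: float = 1e-9) -> str:
    """Apply the three-way rule: neutral, opposite, or same as historical."""
    opposite = {"positive": "negative", "negative": "positive"}
    if abs(mapped - threshold) <= tol:
        return "neutral"
    if mapped < threshold:
        return opposite[historical]
    return historical


# Figures from the description's example: threshold 0.5, mapping result 0.9,
# historical events of this kind are negative.
print(polarity(0.9, "negative"))
```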
Step 103, analyzing the occurrence factors of the last public sentiment event.
The occurrence factors of the last public sentiment event are analyzed according to the event whose polarity has been identified. The factors include associated events and association times: an associated event is an event that caused the last public sentiment event to occur, and an association time is a time period that caused it to occur. The occurrence association degree of the last public sentiment event is calculated by the occurrence association function of the last public sentiment event, and the associated factors are analyzed.
For example, when acquiring the last public opinion information, "March 21, 2022" is entered on the administrator's Portal interface, and the public opinion calculation and analysis method returns three pieces of information: "Tencent invests in the Reading Group", "Ali invests in Chayan Yuese" and "Quanzhou Wanda Plaza is formally put into use"; their browsing volumes are 100,000, 100,000 and 20,000 respectively. The interaction volumes of "Tencent invests in the Reading Group" and "Ali invests in Chayan Yuese" are then obtained and found to be 10,000 and 6,000 respectively. Finally, the administrator's Portal interface displays the recommended result: the last public opinion information is "Tencent invests in the Reading Group".
Analyzing the associated events of the last public sentiment event.
The associated events of the last public sentiment event are analyzed by a public opinion associated event analysis system, which comprises a preprocessing module, a topology module and an expert module. The preprocessing module preprocesses the text of the last public sentiment event; text preprocessing means dividing the keywords of the text into sub-information, including places, objects and states. The topology module classifies the sub-information from the preprocessing module into different layers, then builds topological relations among sub-information of the same layer and among sub-information of different layers, generating multiple types of sub-information based on the information of the last public sentiment event. The expert module, according to the multiple types of sub-information generated by the topology module, crawls texts containing the same sub-information from an Internet big database, takes the content of each text whose combined browsing volume and interaction volume exceeds a second preset threshold as an associated event, and generates an analysis report.
For example, when identifying polarity, suppose that in historical public sentiment events public opinion is negative when a listed company goes bankrupt. In one article, the computer detects the paragraphs and sentences containing the two words "careless" and "difficult"; it further detects that the topic is the bankruptcy reorganization of a listed company; the blogger's statement is that the financial and capital markets are in poor condition, funds are short and supply is insufficient. With the SVM model, the first preset threshold for the mapping result is 0.5; analyzing the polarity of the last public sentiment event gives a mapping result of 0.9; the last public sentiment event is therefore negative.
Analyzing the association time of the last public sentiment event.
Association time analysis is used to analyze time periods. First, a data granularity is preset; the data granularity is the degree to which the data of the last public sentiment event is refined. Time intervals are divided according to the data granularity and a time interval distribution graph is drawn, with time on the abscissa and an associated event occurrence indicator on the ordinate: if an associated event occurs in a time interval, the ordinate for that interval is 1; otherwise it is 0. If the ordinate is 1 for a time interval within the preset time period, the last public sentiment event is associated with that time period, and the abscissa value corresponding to that ordinate value is the association time.
For example, when analyzing occurrence factors, the associated events of "car owners hotly discuss the oil price rise" include "demand for purchasing new energy vehicles increases" and "car owners choose compressed natural gas as vehicle fuel"; there is no association time. The occurrence association degree of "car owners hotly discuss the oil price rise" is calculated as 1.2 by the occurrence association function of the last public sentiment event. It can be concluded that the event occurred because of associated events and is not associated with time, and that the associated factors are "new energy" and "compressed natural gas".
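The time interval indicator used in the association time analysis can be sketched as follows. The function names, the day-based granularity and the sample dates are assumptions made for illustration; the description only specifies that the indicator is 1 when an associated event falls in an interval and 0 otherwise.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple


def interval_indicator(start: date, end: date, granularity_days: int,
                       event_dates: Sequence[date]) -> List[Tuple[date, int]]:
    """Return (interval_start, 0 or 1) pairs covering start..end."""
    step = timedelta(days=granularity_days)
    series = []
    current = start
    while current <= end:
        nxt = current + step
        # 1 if any associated event falls in [current, nxt), else 0.
        hit = any(current <= d < nxt for d in event_dates)
        series.append((current, 1 if hit else 0))
        current = nxt
    return series


def association_times(series: Sequence[Tuple[date, int]]) -> List[date]:
    """Abscissa values whose ordinate is 1."""
    return [t for t, flag in series if flag == 1]


# Hypothetical data: one associated event on 5 March, daily granularity.
series = interval_indicator(date(2022, 3, 1), date(2022, 3, 7), 1,
                            [date(2022, 3, 5)])
print(association_times(series))
```

An empty result corresponds to the "no association time" case in the example above.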
Calculating the occurrence association degree of the last public sentiment event according to the occurrence association function of the last public sentiment event.
An occurrence association function of the last public sentiment event is established, namely R = Sr × W1 + Tr × W2, where R represents the occurrence association degree of the last public sentiment event; Sr represents the number of associated events; Tr represents the number of association times; W1 and W2 represent the corresponding weights, and W1 + W2 = 1.
For example, when analyzing associated events, suppose a positive public sentiment event has been obtained whose content is "the national theater creates the first 5G smart market". The public opinion associated event analysis system divides the four keywords "national theater", "first", "5G" and "smart market" into four layers; the first-layer topological relation is "national theater" -> "China Union" -> "Huacheng technology"; the second-layer topological relation is "first" -> "yard and yard integration"; the third-layer topological relation is "5G" -> "smart theater"; the fourth-layer topological relation is "smart market" -> "rich experience" -> "wide culture spread". Texts whose combined browsing volume and interaction volume exceeds 10,000 are then crawled to generate the final associated event analysis report; there are two results, namely "the national theater creates a 5G smart market" and "cultural spread of the cloud by different military project".
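The occurrence association function R = Sr × W1 + Tr × W2 is simple enough to check by hand, but a short sketch makes the weight constraint explicit. The function name is hypothetical; the inputs Sr = 2, Tr = 0, W1 = 0.6 and W2 = 0.4 are the figures the description uses in its oil price example.

```python
import math


def association_degree(sr: int, tr: int, w1: float, w2: float) -> float:
    """R = Sr * W1 + Tr * W2, with the constraint W1 + W2 = 1."""
    if not math.isclose(w1 + w2, 1.0):
        raise ValueError("weights must sum to 1")
    return sr * w1 + tr * w2


# Two associated events, no association time, weights 0.6 and 0.4.
r = association_degree(sr=2, tr=0, w1=0.6, w2=0.4)
print(round(r, 2))
```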
Step 104, analyzing the propagation degree of the reproduction of the last public sentiment event based on the occurrence factors of the last public sentiment event.
The propagation degree is measured by the popularity and the focus of the reproduction of the last public sentiment event. The popularity is how heated the reproduction of the last public sentiment event is, and reflects the degree of public attention to the reproduced event; the focus is the point on which public dispute concentrates when the last public sentiment event is reproduced. Based on the analyzed reproduction popularity and reproduction focus of the last public sentiment event, an evaluation system for the propagation degree of the reproduction is established, namely SPR = (HOT + FOC) / 2, where SPR represents the propagation degree evaluation index of the reproduction of the last public sentiment event; HOT represents the popularity evaluation index of the reproduction: if the reproduction popularity exceeds a third preset threshold, HOT = 1, otherwise HOT = 0; FOC represents the focus evaluation index of the reproduction: if the probability that the keyword of the reproduction focus occurs in the crawled big data lexicon is greater than a fourth preset threshold, FOC = 1, otherwise FOC = 0. The propagation degree of the reproduction is analyzed with a BP neural network: training samples and test samples are determined according to the public sentiment event popularity function and the focus prediction model; the training samples are used to train the neural network, and the test samples are used to measure the relative error between actual and predicted values.
For example, when choosing the data granularity for association time analysis, for the public sentiment event "citizens of Guangdong Province use up 1.5 L of peanut oil every week" there are two choices, "category" and "brand". Citizens have low brand dependence for peanut oil. At the granularity of a single brand the noise is large: a citizen may buy other brands of peanut oil between two purchases of the same brand, so no effective association can be mined. At the granularity of the category, however, the demand for peanut oil remains constant each week even though brands are switched. Dividing time into one-week intervals and drawing the time interval distribution graph shows that the ordinate is 1 on every Saturday; the association time is therefore every Saturday.
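The evaluation index SPR = (HOT + FOC) / 2 defined in step 104 can be sketched as below. The function name and the threshold values in the usage line are hypothetical; the description leaves the third and fourth preset thresholds unspecified.

```python
def spr_index(reproduction_popularity: float, focus_keyword_prob: float,
              popularity_threshold: float, focus_threshold: float) -> float:
    """SPR = (HOT + FOC) / 2 with binary HOT and FOC indicators."""
    hot = 1 if reproduction_popularity > popularity_threshold else 0
    foc = 1 if focus_keyword_prob > focus_threshold else 0
    return (hot + foc) / 2


# Hypothetical thresholds: popularity 300,000 and keyword probability 0.5.
print(spr_index(400_000, 0.3, popularity_threshold=300_000, focus_threshold=0.5))
```

Because HOT and FOC are binary, this index can only take the values 0, 0.5 and 1; the finer-grained values quoted elsewhere in the description come from the BP neural network output rather than from this formula alone.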
Analyzing the reproduction popularity of the last public sentiment event based on the number of associated events.
The reproduction popularity of the last public sentiment event is analyzed based on the number of associated events: the more associated events there are, the higher the total attention of the associated events and of the last public sentiment event, the higher the popularity of the last public sentiment event, and the higher its reproduction popularity. An ARIMA regression model is established with the total attention of the associated events and the popularity of the last public sentiment event, calculated by the public sentiment event popularity function, as the original data: the total attention of the associated events and the popularity of the last public sentiment event are plotted as curves over time; if a curve is not stationary, the corresponding sequence is differenced and a difference line graph is drawn to determine the order d; the autocorrelation function graph and the partial autocorrelation function graph of the differenced sequence are then drawn, and the orders p and q of the model are judged from the shapes of the graphs; finally, the predicted value at a later time is read from the ARIMA(p, d, q) output curve, which is the reproduction popularity of the last public sentiment event.
For example, when calculating the occurrence association degree, the associated events of "car owners hotly discuss the oil price rise" include "demand for purchasing new energy vehicles increases" and "car owners choose compressed natural gas as vehicle fuel"; there is no association time, so Sr = 2 and Tr = 0. With the weights of the corresponding association indices set to 0.6 and 0.4 respectively, the occurrence association degree of "car owners hotly discuss the oil price rise" is calculated as 1.2 by the occurrence association function of the last public sentiment event.
Calculating the popularity of the last public sentiment event according to the public sentiment event popularity function.
A public sentiment event popularity function is established, namely Hpr = Sr × Ca1 + Ca2, where Hpr represents the popularity of the last public sentiment event; Sr represents the number of associated events; Ca1 represents the total attention of the associated events, that is, the sum of the public's browsing volume, like count, comment count and forwarding count for the associated events in the back-end statistics of the event publishing system; Ca2 represents the total attention of the last public sentiment event, that is, the sum of the browsing volume, like count, comment count and forwarding count for the last public sentiment event in the back-end statistics of the event publishing system.
For example, when evaluating the propagation degree, the evaluation system for the propagation degree of reproduction defines a larger range (0.70-1), a general range (0.40-0.69) and a smaller range (0-0.39). The input layer of the BP neural network includes 2 indices: the reproduction popularity of the last public sentiment event, and the probability that the keyword of the reproduction focus occurs in the crawled big data lexicon, that is, n = 2; the output layer is the propagation degree evaluation result of the reproduction, that is, m = 1; and the hidden layer is 2. After the neural network is trained on the data, the reproduction propagation degree values of last public sentiment events A, B and C are 0.60, 0.82 and 0.25 respectively, which shows that event A propagates in a general range, event B in a larger range, and event C only in a smaller range.
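The range labels in the propagation degree example (larger 0.70-1, general 0.40-0.69, smaller 0-0.39) can be expressed as a small lookup. The function name is hypothetical, and the handling of values that fall between the stated bands (for instance 0.695) is an assumption: this sketch uses 0.70 and 0.40 as lower cut-offs.

```python
def propagation_range(value: float) -> str:
    """Label a propagation degree value in [0, 1] with its range."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("propagation degree must lie in [0, 1]")
    if value >= 0.70:
        return "larger"
    if value >= 0.40:
        return "general"
    return "smaller"


# Values from the description's example for events A, B and C.
for name, value in [("A", 0.60), ("B", 0.82), ("C", 0.25)]:
    print(name, propagation_range(value))
```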
Predicting the focus of the last public sentiment event reproduction.
All historical focuses are obtained by crawling Internet big data, and a focus prediction model is used to predict the focus of the reproduction of the last public sentiment event. The focus prediction model works as follows: all historical focuses are taken as sample data and analyzed with the K-means clustering algorithm; several centroids are selected according to the keywords of the historical focuses, a keyword being a word whose frequency in the big data is greater than a fifth preset threshold; the Euclidean distance from each sample to each centroid is calculated and each sample is assigned to its closest centroid; the sample data of each category is averaged to obtain the new centroid of that category; finally, the assignment of samples and the computation of new centroids are iterated until the clustering converges and the centroids no longer change. The last public sentiment event and its associated events are input as parameters, and the algorithm outputs the keyword type of the focus most similar to the last public sentiment event; finally, the reproduction focus of the last public sentiment event is obtained by analysis from the focus keywords and the last public sentiment event.
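The K-means loop described above (assign each sample to its nearest centroid by Euclidean distance, average each cluster, repeat until the centroids stop changing) can be sketched in plain Python. This is an illustration only: the samples are hypothetical 2-D points standing in for keyword feature vectors, and the initial centroids are drawn at random with a fixed seed, whereas the description selects them from focus keywords.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[List[Point]]]:
    """Return (centroids, clusters) after iterating to convergence."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # assumed initialisation
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: mean of each cluster (keep old centroid if empty).
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids unchanged
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
```

To classify a new event under this sketch, one would compute its feature vector and pick the centroid with the smallest `math.dist`.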
For example, "Tencent invests in the Reading Group" has 3 associated events, and the total attention of the associated events is 100,000, so the popularity of the last public sentiment event is 350,000. In the ARIMA regression analysis, the original data curve shows an upward trend and is not stationary, so first-order differencing is applied to the sequence; the resulting first-order difference line graph shows no obvious upward or downward trend, so d = 1. The autocorrelation function graph and the partial autocorrelation function graph of the first-order differenced sequence are then drawn to determine the model order: the autocorrelation function sequence has a sine-wave shape, which is characteristic of the AR(2) model, so p = 2; the partial autocorrelation function sequence has only one significantly non-zero value, so the time sequence is judged to suit a first-order moving average model MA(1), so q = 1. The resulting model is ARIMA(2, 1, 1). A function graph is drawn from the model parameter values, and the predicted value at the later time point is read from it as 400,000; that is, the reproduction popularity of the last public sentiment event is predicted to be 400,000.
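The differencing step that fixes d in this example can be illustrated on its own. The sketch below only shows how first-order differencing removes a steady upward trend; it does not fit an ARIMA model or choose p and q, and the sample series is invented for illustration rather than taken from the description.

```python
from typing import List, Sequence


def difference(series: Sequence[float], order: int = 1) -> List[float]:
    """Apply first-order differencing `order` times."""
    values = list(series)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# A hypothetical popularity series with a linear upward trend.
popularity = [10, 12, 14, 16, 18]
print(difference(popularity))  # constant differences: the trend is removed
```

When the differenced series shows no remaining trend, as here, a single difference is enough, which corresponds to d = 1.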
Step 105, mining the relevant self-media reporting the last public sentiment event according to the propagation degree of the reproduction of the last public sentiment event.
The types and quantities of relevant self-media reporting the last public sentiment event are mined based on the propagation degree of its reproduction. The relevant self-media include image-and-text self-media, video self-media, audio self-media and live-streaming self-media. The more relevant self-media report the last public sentiment event, the higher its popularity and the greater the propagation degree of its reproduction. The Apriori algorithm is used to mine the association relations between the propagation degree of the reproduction of the last public sentiment event and the corresponding relevant self-media. The association rule algorithm proceeds as follows: through iteration, all frequent itemsets over the types and quantities of relevant self-media in the network big data platform are retrieved, that is, the itemsets whose support is not lower than a set threshold; the support is the number of times an itemset of factors occurs in the big data platform as a percentage of the total number of occurrences; rules satisfying the confidence requirement are then constructed from the frequent itemsets; the confidence is the percentage of the itemsets containing each factor among all itemsets in the database. The strength of the association with the propagation degree of the reproduction is judged by comparing support and confidence, and a network graph related to the propagation degree of the reproduction is drawn according to the association rules; the network graph shows the relevant self-media reporting the last public sentiment event.
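The two measures that drive association rule mining, support and confidence, can be sketched as follows. The sketch uses the standard textbook definitions (support of an itemset is the fraction of records containing it; confidence of A => B is support(A and B) divided by support(A)), which is an assumption about what the description intends. It does not implement the full level-wise Apriori search, and the records are hypothetical.

```python
from typing import Iterable, List, Set


def support(records: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= record for record in records) / len(records)


def confidence(records: List[Set[str]], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(records, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs")
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical records: propagation range plus the platform that reported.
records = [
    {"larger range", "WeChat"},
    {"larger range", "WeChat"},
    {"larger range", "Weibo"},
    {"smaller range", "Weibo"},
]
print(support(records, {"larger range", "WeChat"}))
print(confidence(records, {"larger range"}, {"WeChat"}))
```

A rule is kept when both values clear their thresholds; the thresholds themselves are not given in the description.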
For example, "Tencent invests in the Reading Group" has 3 associated events; the back-end data of the publishing system shows that the sum of their browsing volume, like count, comment count and forwarding count is 100,000, that is, the total attention of the associated events is 100,000; the total attention of the last public sentiment event is 50,000. By the public sentiment event popularity function, the popularity of the last public sentiment event is 3 × 100,000 + 50,000 = 350,000.
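The arithmetic in this example follows directly from Hpr = Sr × Ca1 + Ca2 and can be checked with a few lines. The function name is hypothetical; the inputs are the figures stated in the example.

```python
def popularity(sr: int, ca1: int, ca2: int) -> int:
    """Hpr = Sr * Ca1 + Ca2."""
    return sr * ca1 + ca2


# 3 associated events, associated-event attention 100,000, event attention 50,000.
print(popularity(sr=3, ca1=100_000, ca2=50_000))
```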
Step 106, predicting the attitude of the relevant self-media toward the reproduction of the last public sentiment event.
A relevant self-media attitude prediction model is established based on a deep learning method to predict the attitude of the relevant self-media toward the reproduction of the last public sentiment event. First, a large amount of data on the public opinion influence of the relevant self-media is collected, including self-media popularity, self-media platform activity, self-media influence, and influence on the last public sentiment event. Self-media popularity is the number of fans of the self-media: the more fans, the higher the popularity. Self-media platform activity comprises the frequency and total number of public sentiment events published on the self-media platform. Self-media influence comprises the like count, forwarding count and comment count after the self-media reports all public sentiment events. Influence on the last public sentiment event is the positive or negative influence of the self-media on the last public sentiment event: positive influence is measured by the like count after the report on the last public sentiment event, and the more likes, the greater the positive influence; negative influence is measured by the complaint count, that is, the number of user complaints filed after the report on the last public sentiment event, and the more complaints, the greater the negative influence. A crawler crawls the content data on the public opinion influence of the relevant self-media from the Internet big data platform, comprising the number of fans of the self-media, the frequency and total number of public sentiment events published on the self-media platform, the like count, forwarding count and comment count after the self-media reports all public sentiment events, and the like count and comment count after the report on the last public sentiment event. The content data is divided into a training set and a test set. The like count and the complaint count after the report on the last public sentiment event are then preprocessed: if the like count is greater than the complaint count and numerically exceeds a sixth preset threshold, positive influence is taken as the preprocessing result; if the complaint count is greater than the like count and numerically exceeds a seventh preset threshold, negative influence is taken as the preprocessing result. With self-media popularity, self-media platform activity and self-media influence as features and the preprocessing result as the label, a relevant self-media attitude prediction neural network model is trained to judge the attitude of the relevant self-media toward the reproduction of the last public sentiment event; the judgment result is recorded in a table file in a preset format and returned to the system. Finally, the model parameters are continuously adjusted using the test set data to improve the accuracy of the relevant self-media attitude prediction model.
For example, when predicting the focus, the last public sentiment event "Tencent invests in the Reading Group" has the associated events "reading APP developed by the Reading Group" and "novel reading becomes a habit of pupils"; the 3 events are input into the focus prediction model, and analysis by the K-means clustering algorithm yields the focus keyword "reading"; finally, in combination with the last public sentiment event, the reproduction focus of the last public sentiment event is predicted to be "the successful development of novel excerpt-reading software in the WeChat dialogue mode".
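The label preprocessing in step 106 can be sketched as a small function. The description is ambiguous about whether the threshold applies to the larger count itself or to the gap between the two counts; this sketch assumes the former. The function name, the threshold values and the `None` result for cases that meet neither condition are likewise assumptions.

```python
from typing import Optional


def attitude_label(likes: int, complaints: int,
                   positive_threshold: int,
                   negative_threshold: int) -> Optional[str]:
    """Return 'positive', 'negative', or None when neither rule applies."""
    # Assumed reading: the dominant count must itself exceed its threshold.
    if likes > complaints and likes > positive_threshold:
        return "positive"
    if complaints > likes and complaints > negative_threshold:
        return "negative"
    return None


# Hypothetical counts and thresholds.
print(attitude_label(likes=5_000, complaints=200,
                     positive_threshold=1_000, negative_threshold=500))
```

Records labeled `None` under this sketch would need separate handling before training, for example exclusion from the training set.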
Step 107, filing and authenticating the self-media that may cause a public opinion crisis according to the self-media attitude prediction result.
A public opinion crisis is the negative social influence caused by self-media publishing harmful speech. Self-media filing and authentication involves a data processing module, a neural network model construction module and a self-media early warning generation module. Filing and authenticating the self-media that may cause a public opinion crisis according to the self-media attitude prediction result comprises: judging whether harmful speech authentication is needed according to the complaint count of the self-media's report on the last public sentiment event; if the complaint count is higher than an eighth preset threshold, determining that harmful speech authentication is needed and authenticating the last public opinion information reported by the self-media as harmful speech; acquiring the content data of the last public sentiment event through the data processing module, analyzing the complaint count and the attitude prediction result of the relevant self-media toward the reproduction of the public sentiment event with a deep learning RNN algorithm, finding the harmful speech and constructing a harmful speech vector; constructing and training a convolutional neural network model with the harmful speech vector through the neural network model construction module; and issuing early warnings with the trained convolutional neural network model through the self-media early warning generation module, so that the self-media that may cause a public opinion crisis are filed and authenticated.
For example, when mining the relevant self-media, the propagation degree index of "Tencent invests in the Reading Group" calculated by the evaluation system for the propagation degree of reproduction is 0.80, so the propagation range is larger. The Apriori algorithm mines the association relations between the propagation degree of the reproduction of the last public sentiment event and all corresponding relevant self-media: the first association rule is "larger range (0.70-0.80) => WeChat official account, support = 0.769, confidence = 0.953"; the second association rule is "larger range (0.70-0.80) => Weibo, support = 0.746, confidence = 0.921"; the third association rule is "larger range (0.70-0.80) => Douyin, support = 0.824, confidence = 0.896". This shows that a last public sentiment event with a larger propagation range is likely to be propagated on the three platforms WeChat official account, Weibo and Douyin, that is, the propagation degree "larger range (0.70-0.80)" is associated with the three types of relevant self-media "WeChat official account", "Weibo" and "Douyin". The association rules are summarized and a propagation degree association network graph of the reproduction of the last public sentiment event is drawn; the types and quantities of relevant self-media on this graph are the relevant self-media reporting the last public sentiment event "Tencent invests in the Reading Group".
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Programs implementing the information governance of the present invention may be written as computer program code, for carrying out the operations of the present invention, in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Python and C++, and conventional procedural programming languages such as C or similar programming languages.
The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative: the division of the units is only one logical functional division, and other divisions may be used in practice.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention.
The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (8)

1. An analysis method for public opinion information content mining and propagation monitoring, characterized by comprising the following steps:
acquiring last public opinion information; identifying the positive or negative polarity of the last public sentiment event according to the last public opinion information; analyzing the occurrence factors of the last public sentiment event, which specifically comprises: analyzing the associated events of the last public sentiment event, analyzing the association time of the last public sentiment event, and calculating the occurrence association degree of the last public sentiment event according to an occurrence association function of the last public sentiment event; analyzing the propagation degree of the reproduction of the last public sentiment event based on the occurrence factors of the last public sentiment event, which specifically comprises: analyzing the reproduction popularity of the last public sentiment event based on the number of associated events, calculating the popularity of the last public sentiment event according to a public sentiment event popularity function, and predicting the focus of the reproduction of the last public sentiment event; mining the relevant self-media reporting the last public sentiment event according to the propagation degree of the reproduction of the last public sentiment event; predicting the attitude of the relevant self-media toward the reproduction of the last public sentiment event; and filing and authenticating the self-media that may cause a public opinion crisis according to the self-media attitude prediction result.
2. The method of claim 1, wherein acquiring the last public opinion information comprises:
acquiring the last public opinion information from Internet-based public opinion information, including: acquiring a preset condition, the preset condition comprising at least one of a preset time, a preset field, a preset place, and a preset object; and acquiring the last public opinion information by a public opinion calculation and analysis method based on the preset condition, which comprises the following steps: according to the preset condition, crawling from an Internet database, by means of a crawler, at least one piece of first information that matches the preset condition; when there is exactly one piece of first information, determining that piece of first information to be the last public opinion information; or, when there are multiple pieces of first information, crawling the browsing volume of each piece and screening out from them at least one piece of second information with the highest browsing volume; when there is exactly one piece of second information, determining that piece of second information to be the last public opinion information; or, when there are multiple pieces of second information, crawling the interaction volume of each piece and determining the piece of second information with the highest interaction volume to be the last public opinion information; the interaction volume (also rendered as the mutual amount) is the sum of the number of likes and the number of comment replies that a piece of information has received.
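As an illustration only, the selection rule of claim 2 (highest browsing volume first, ties resolved by interaction volume) can be sketched in Python as follows. The `Item` record, its field names, and the function `select_last_public_opinion` are hypothetical and do not appear in the patent; crawling is assumed to have already produced the list of candidates.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one crawled piece of information."""
    title: str
    views: int    # browsing volume
    likes: int    # number of likes
    replies: int  # number of comment replies

    @property
    def interactions(self) -> int:
        # Interaction volume as defined in the claim: likes plus comment replies.
        return self.likes + self.replies


def select_last_public_opinion(items):
    """Return the item chosen by the two-stage rule, or None if nothing was crawled."""
    if not items:
        return None
    if len(items) == 1:
        return items[0]
    # Stage 1: keep only the items that share the highest browsing volume.
    top_views = max(item.views for item in items)
    most_viewed = [item for item in items if item.views == top_views]
    if len(most_viewed) == 1:
        return most_viewed[0]
    # Stage 2: among those, take the item with the highest interaction volume.
    return max(most_viewed, key=lambda item: item.interactions)
```

If several items also tie on interaction volume, this sketch returns the first of them; the claim does not say how such a tie should be resolved.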
3. The method of claim 1, wherein identifying the positive or negative polarity of the last public sentiment event according to the last public opinion information comprises:
identifying the positive or negative polarity of the last public sentiment event means classifying the last public opinion information as positive or negative, and comprises the following steps: first, extracting features from the last public opinion information so that the computer system can pick out the text content that expresses genuinely subjective information; feature extraction means that the computer scans the information text, checks it against a standard lexicon, and screens out the paragraphs and sentences that carry subjective information; subjective information refers to words with emotional coloring and a positive or negative tendency; then, based on the paragraphs and sentences that carry subjective information, further extracting the opinions they express and selecting a party's viewpoint on a specific topic, which includes topic extraction, opinion holder identification, and statement selection; topic extraction refers to extracting the specific aspects of a topic on which evaluative viewpoints are expressed; opinion holder identification refers to identifying the person who holds the opinion; statement selection refers to identifying the opinions issued by the opinion holder and removing statements made by other people; then, constructing a one-class SVM model based on influence factors, using all historical public sentiment events as samples; the influence factors include the extracted topics and the statements of the opinion holders; then, using an activation function, mapping the value range of the distance from a sample to the center of the hypersphere, an intermediate output of the one-class SVM model, onto [0,1]; if the mapping result is equal to a first preset threshold, the polarity of the last public sentiment event is neutral; if the mapping result is smaller than the first preset threshold, the polarity of the last public sentiment event is opposite to that of the historical public sentiment events; otherwise, the polarity of the last public sentiment event is the same as that of the historical public sentiment events.
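The final thresholding step of claim 3 can be sketched as follows, under stated assumptions. The claim does not name the activation function, so a logistic function centered on the hypersphere radius is assumed here; `map_distance`, `polarity`, and the default threshold of 0.5 are hypothetical names and values. Training the one-class SVM itself is outside the scope of the sketch.

```python
import math


def map_distance(distance, radius):
    """Squash a sample-to-center distance into (0, 1).

    Assumption: a logistic function that yields 0.5 on the hypersphere
    surface, values above 0.5 inside it, and values below 0.5 outside it.
    """
    return 1.0 / (1.0 + math.exp(distance - radius))


def polarity(score, threshold=0.5, tol=1e-9):
    """Compare the mapped score with the first preset threshold."""
    if math.isclose(score, threshold, abs_tol=tol):
        return "neutral"
    # Below the threshold: opposite to the historical events; above: the same.
    return "opposite" if score < threshold else "same"
```

A sample lying well outside the hypersphere maps to a small score and is therefore labeled as opposite in polarity to the historical events.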
4. The method of claim 1, wherein analyzing the factors behind the occurrence of the last public sentiment event comprises:
analyzing the occurrence factors of the last public sentiment event after its polarity has been identified; the factors include associated events and an association time; an associated event is an event that caused the last public sentiment event to occur; the association time is a time period that led to the occurrence of the last public sentiment event; the association degree of the last public sentiment event is calculated through the association function of the last public sentiment event in order to analyze the association factors; the analysis comprises the following steps: analyzing the associated events of the last public sentiment event; analyzing the association time of the last public sentiment event; and calculating the association degree of the last public sentiment event according to the association function of the last public sentiment event;
analyzing the associated events of the last public sentiment event specifically comprises:
analyzing the associated events of the last public sentiment event with a public opinion associated-event analysis system; the system comprises a preprocessing module, a topology module, and an expert module; the preprocessing module preprocesses the text of the last public sentiment event, that is, it splits the keywords of the text into sub-information covering places, things, and states; the topology module classifies the sub-information from the preprocessing module into different layers, then builds topologies over the sub-information within the same layer and across different layers, generating various types of sub-information based on the information of the last public sentiment event; the expert module, according to the various types of sub-information generated by the topology module, crawls a large Internet database for texts containing the same sub-information, takes as an associated event the content of any text whose browsing volume plus interaction volume (mutual amount) exceeds a second preset threshold, and generates an analysis report;
analyzing the association time of the last public sentiment event specifically comprises:
association time analysis is used to analyze time periods; first, a data granularity is preset; the data granularity refers to how finely the data of the last public sentiment event is divided; time intervals are divided according to the data granularity and a time interval distribution graph is drawn, in which the abscissa is time and the ordinate is an associated-event occurrence indicator: if an associated event occurs in a time interval, the ordinate for that interval is 1, and otherwise it is 0; if, within a preset time period, the ordinate of a time interval in the distribution graph is 1, the last public sentiment event is associated with that time period, and the abscissa value corresponding to that ordinate is the association time;
calculating the association degree of the last public sentiment event according to the association function of the last public sentiment event comprises:
establishing an occurrence association function for the last public sentiment event, namely R = Sr × W1 + Tr × W2; wherein R represents the association degree of the last public sentiment event; Sr represents the number of associated events; Tr represents the number of association times; and W1 and W2 represent the corresponding weights, with W1 + W2 = 1.
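A minimal sketch of the association time indicator and of the association function R = Sr × W1 + Tr × W2 follows. It assumes that timestamps are plain numbers (for example, hours), that Tr is the count of time intervals whose indicator is 1, and that equal weights are a reasonable default; the function and parameter names are illustrative and are not taken from the patent.

```python
import math


def association_indicator(event_times, start, end, granularity):
    """Return one 0/1 flag per interval of width `granularity` in [start, end).

    A flag is 1 when at least one associated event falls in that interval.
    """
    n_intervals = math.ceil((end - start) / granularity)
    flags = [0] * n_intervals
    for t in event_times:
        if start <= t < end:
            flags[int((t - start) // granularity)] = 1
    return flags


def association_degree(sr, tr, w1=0.5, w2=0.5):
    """R = Sr * W1 + Tr * W2, with the weights required to sum to 1."""
    if not math.isclose(w1 + w2, 1.0):
        raise ValueError("W1 + W2 must equal 1")
    return sr * w1 + tr * w2


# Hypothetical example: three associated events inside a 24-hour window
# split into 6-hour intervals; the event at t=30 lies outside the window.
flags = association_indicator([1, 2, 13, 30], start=0, end=24, granularity=6)
tr = sum(flags)  # number of intervals that contain an associated event
r = association_degree(sr=3, tr=tr, w1=0.6, w2=0.4)
```

A finer granularity produces more intervals and can therefore raise Tr for the same set of events, so the weights W1 and W2 would need to be chosen with the granularity in mind.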
5. The method of claim 1, wherein analyzing the propagation degree of the reproduction of the last public sentiment event based on the occurrence factors of the last public sentiment event comprises:
the propagation degree is measured by the popularity (heat) of the reproduction of the last public sentiment event and by the focus of that reproduction; the popularity is how heated the reproduction of the last public sentiment event is, and reflects the degree of public attention to the reproduced event; the focus is the point on which public dispute concentrates when the last public sentiment event is reproduced; a propagation degree evaluation system for public sentiment event reproduction is established from the analyzed popularity and focus, namely SPR = (HOT + FOC)/2; wherein SPR represents the propagation degree evaluation index of the reproduction of the last public sentiment event; HOT represents the popularity evaluation index of the reproduction: if the popularity of the reproduction exceeds a third preset threshold, HOT = 1, otherwise HOT = 0; FOC represents the focus evaluation index of the reproduction: if the probability that the keywords of the reproduced focus appear in the crawled big data lexicon is greater than a fourth preset threshold, FOC = 1, otherwise FOC = 0; the propagation degree of the reproduction of the last public sentiment event is analyzed with a BP neural network; training samples and test samples are determined according to the public sentiment event popularity function and the focus prediction model, the training samples being used to train the neural network and the test samples being used to measure the relative error between actual and predicted values; the analysis comprises the following steps: analyzing the popularity of the reproduction of the last public sentiment event based on the number of associated events; calculating the popularity of the last public sentiment event according to the public sentiment event popularity function; and predicting the focus of the reproduction of the last public sentiment event;
analyzing the popularity of the reproduction of the last public sentiment event based on the number of associated events specifically comprises:
analyzing the popularity of the reproduction of the last public sentiment event based on the number of associated events; the more associated events there are, and the higher the total attention of the associated events and the total attention of the last public sentiment event, the higher the popularity of the last public sentiment event and the higher the popularity of its reproduction; an ARIMA regression model is established using as raw data the total attention of the associated events and the popularity of the last public sentiment event calculated by the public sentiment event popularity function: the total attention of the associated events and the popularity of the last public sentiment event are plotted as time series; if a curve is not stationary, the corresponding sequence is differenced and a line graph of the differenced series is drawn to determine the order d; autocorrelation and partial autocorrelation function graphs of the differenced series are then drawn, and the orders p and q of the model are judged from the shapes of the graphs; finally, the predicted value for a later time is read from the ARIMA(p, d, q) output curve, and this value is the popularity of the reproduction of the last public sentiment event;
calculating the popularity of the last public sentiment event according to the public sentiment event popularity function specifically comprises:
establishing a public sentiment event popularity function, namely Hpr = Sr × Ca1 + Ca2; wherein Hpr represents the popularity of the last public sentiment event; Sr represents the number of associated events; Ca1 represents the total attention of the associated events, that is, the sum of the browsing volume, likes, comments, and forwards that the public gave the associated events, taken from the back-end statistics of the event publishing system; Ca2 represents the total attention of the last public sentiment event, that is, the sum of the browsing volume, likes, comments, and forwards for the last public sentiment event, taken from the back-end statistics of the event publishing system;
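The popularity function Hpr = Sr × Ca1 + Ca2 and the evaluation index SPR = (HOT + FOC)/2 defined in this claim reduce to simple arithmetic, sketched below. The function names, parameter names, and example numbers are hypothetical; the two thresholds stand for the third and fourth preset thresholds, whose values the claim does not give.

```python
def total_attention(views, likes, comments, forwards):
    """Total attention: browsing volume plus likes, comments, and forwards."""
    return views + likes + comments + forwards


def popularity(sr, ca1, ca2):
    """Hpr = Sr * Ca1 + Ca2."""
    return sr * ca1 + ca2


def propagation_index(predicted_heat, focus_probability,
                      heat_threshold, focus_threshold):
    """SPR = (HOT + FOC) / 2, where HOT and FOC are 0/1 indicators."""
    hot = 1 if predicted_heat > heat_threshold else 0
    foc = 1 if focus_probability > focus_threshold else 0
    return (hot + foc) / 2


# Hypothetical figures: 3 associated events with a combined attention of 1000,
# and an attention of 500 for the last public sentiment event itself.
ca1 = total_attention(views=800, likes=100, comments=60, forwards=40)
hpr = popularity(sr=3, ca1=ca1, ca2=500)
spr = propagation_index(predicted_heat=3500, focus_probability=0.2,
                        heat_threshold=3000, focus_threshold=0.3)
```

Because HOT and FOC are binary, SPR can only take the values 0, 0.5, and 1 under this definition.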
predicting the focus of the reproduction of the last public sentiment event specifically comprises:
all historical focuses are obtained by crawling Internet big data, and a focus prediction model is used to predict the focus of the reproduction of the last public sentiment event; the focus prediction model works as follows: all historical focuses are taken as sample data and analyzed with the K-means clustering algorithm, and several centroids are selected according to the keywords of the historical focuses; the keywords are words whose frequency in the big data is greater than a fifth preset threshold; the Euclidean distance from each sample to each centroid is then calculated, and each sample is assigned to its closest centroid; the sample data of each category are then averaged to obtain a new centroid for that category; sample assignment and centroid updating are iterated until the clustering converges and the centroids no longer change; the last public sentiment event and its associated events are input as parameters, and the clustering algorithm outputs the keyword category of the focus most similar to the last public sentiment event; finally, the focus of the reproduction of the last public sentiment event is obtained by analysis from the keywords of that focus and the last public sentiment event.
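The clustering loop described for the focus prediction model can be sketched in plain Python as below. This is a minimal K-means illustration, not the patent's implementation: it assumes each historical focus has already been turned into a numeric vector (for example, keyword frequencies), and it seeds the centroids with the first k samples, a choice the claim does not specify. The names `kmeans` and `nearest_cluster` are hypothetical.

```python
import math


def kmeans(samples, k, max_iter=100):
    """Cluster numeric vectors; return (centroids, labels)."""
    # Assumption: the first k samples serve as the initial centroids.
    centroids = [list(s) for s in samples[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every sample to its closest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer change
        labels = new_labels
        # Recompute each centroid as the mean of its assigned samples.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def nearest_cluster(point, centroids):
    """Index of the centroid closest to a new point."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


# Hypothetical two-dimensional focus vectors forming two obvious groups.
samples = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centroids, labels = kmeans(samples, k=2)
cluster_of_new_event = nearest_cluster((9, 9), centroids)
```

The vector for the last public sentiment event would be passed to `nearest_cluster` to find the historical focus category it most resembles.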
6. The method of claim 1, wherein mining the relevant self-media that reported the last public sentiment event according to the propagation degree of the reproduction of the last public sentiment event comprises:
mining the types and number of relevant self-media that reported the last public sentiment event based on the propagation degree of the reproduction of the last public sentiment event; the relevant self-media comprises: the system comprises a video server, an audio server and a video server, wherein the video server is used for playing video and audio; the more relevant self-media that reported the last public sentiment event, the higher the popularity of the last public sentiment event and the greater the propagation degree of its reproduction; the Apriori algorithm is used to mine the association relations between the relevant self-media and the propagation degree of the reproduction of the last public sentiment event; the association rule algorithm proceeds as follows: through iterative passes, all frequent item sets among the types and number of relevant self-media on the network big data platform are retrieved, a frequent item set being an item set whose support is not lower than a set threshold; the support is the number of times each factor item set occurs on the big data platform as a percentage of the total number of item sets; rules that satisfy the confidence are then constructed from the frequent item sets; the confidence is the percentage of the total item set of each factor in all the item sets of the database; the strength of the association with the propagation degree of the reproduction of the last public sentiment event is judged by comparing the support and the confidence, and a propagation degree association network graph for the reproduction of the last public sentiment event is drawn according to the association rules; the network graph is associated with the propagation degree of the reproduction of the last public sentiment event and shows the relevant self-media that reported the last public sentiment event.
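For orientation, a simplified level-wise frequent item set search in the spirit of Apriori is sketched below. It uses the standard textbook definitions (support of an item set is the fraction of transactions containing it; confidence of a rule A → B is support(A ∪ B) / support(A)), which may differ in wording from the definitions given in the claim. The transaction contents, the item labels, and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all item sets meeting `min_support`."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        kept = {}
        for candidate in candidates:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        size += 1
        # Join step: merge frequent sets into candidates one item larger.
        candidates = list({a | b for a in kept for b in kept if len(a | b) == size})
    return frequent


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    joint = frozenset(antecedent) | frozenset(consequent)
    return support(joint, transactions) / support(antecedent, transactions)


# Hypothetical data: each transaction lists the self-media types that
# covered one past event.
transactions = [
    {"short_video", "blog"},
    {"short_video", "blog", "podcast"},
    {"short_video"},
    {"blog", "podcast"},
]
found = frequent_itemsets(transactions, min_support=0.5)
rule_conf = confidence({"blog"}, {"podcast"}, transactions)
```

Rules whose confidence meets a chosen minimum could then be drawn as edges of an association graph between self-media types.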
7. The method of claim 1, wherein predicting the attitude of the relevant self-media toward the reproduction of the last public sentiment event comprises:
establishing a relevant self-media attitude prediction model based on a deep learning method, and predicting the attitude of the relevant self-media toward the reproduction of the last public sentiment event; first, collecting a large amount of data on the public opinion influence of the relevant self-media, including self-media popularity, self-media platform activity, self-media influence, and influence on the last public sentiment event; self-media popularity is the number of fans (followers) of the self-media: the larger the number of fans, the higher the self-media popularity; self-media platform activity comprises the frequency and total number of all public sentiment events published on the self-media platform; self-media influence comprises the number of likes, forwards, and comments received after the self-media reports on all public sentiment events; the influence on the last public sentiment event refers to the positive or negative influence of the self-media on the last public sentiment event; positive influence is measured by the number of likes received after the report on the last public sentiment event: the more likes, the greater the positive influence; negative influence is measured by the number of reports (complaints) filed after the report on the last public sentiment event: the more reports, the greater the negative influence; a crawler crawls the content data on the public opinion influence of the relevant self-media from an Internet big data platform, the content data comprising the number of fans of the self-media, the frequency and total number of all public sentiment events published on the self-media platform, the number of likes, forwards, and comments after the self-media reports on all public sentiment events, and the number of likes and comments after the report on the last public sentiment event; the content data on the public opinion influence of the relevant self-media is divided into a training set and a test set; then, the number of likes and the number of reports after the report on the last public sentiment event are preprocessed: when the number of likes is greater than the number of reports and the value exceeds a sixth preset threshold, the preprocessing result is a positive influence; when the number of reports is greater than the number of likes and the value exceeds a seventh preset threshold, the preprocessing result is a negative influence; with self-media popularity, self-media platform activity, and self-media influence as features and the preprocessing result as the label, the relevant self-media attitude prediction neural network model is trained, the attitude of the relevant self-media toward the reproduction of the last public sentiment event is judged, and the judgment result is recorded into a form file in a preset format and returned to the system; finally, the parameters of the model are continuously adjusted using the test set data to improve the accuracy of the relevant self-media attitude prediction model.
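The label preprocessing step of claim 7 can be sketched as follows. The claim's wording is ambiguous about what "exceeds" the sixth and seventh preset thresholds; this sketch assumes it is the margin between likes and reports, and it returns `None` when neither rule fires, a case the claim does not address. The function names, the feature layout, and the example numbers are hypothetical.

```python
def attitude_label(likes, reports, positive_margin, negative_margin):
    """Derive a training label from like and report counts.

    Assumption: the sixth and seventh preset thresholds are applied to the
    difference between the two counts.
    """
    if likes - reports > positive_margin:
        return "positive"
    if reports - likes > negative_margin:
        return "negative"
    return None  # unlabeled: neither condition of the claim is met


def make_example(fans, post_frequency, post_total, influence, likes, reports,
                 positive_margin=100, negative_margin=100):
    """Pair a feature tuple with its label, ready for a supervised model."""
    features = (fans, post_frequency, post_total, influence)
    label = attitude_label(likes, reports, positive_margin, negative_margin)
    return features, label


# Hypothetical account: 12,000 fans, 3 posts per day, 450 posts in total,
# and an aggregate influence (likes + forwards + comments) of 98,000.
features, label = make_example(12000, 3, 450, 98000, likes=500, reports=20)
```

Examples whose label is `None` would presumably be left out of training, although the claim does not say so.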
8. The method of claim 1, wherein recording and authenticating the self-media that may cause a public opinion crisis according to the attitude prediction result of the self-media comprises:
a public opinion crisis refers to the negative influence on society caused by self-media publishing bad speech; self-media record authentication involves a data processing module, a neural network model construction module, and a self-media early-warning generation module; recording and authenticating the self-media that may cause a public opinion crisis according to the attitude prediction result of the self-media comprises the following steps: judging, according to the number of reports filed against the self-media's coverage of the last public sentiment event, whether bad-speech authentication is to be performed; if the number of reports is higher than an eighth preset threshold, determining that bad-speech authentication is needed, and authenticating the last public opinion information reported by the self-media as bad speech; acquiring the content data of the last public sentiment event through the data processing module, analyzing the number of reports and the attitude prediction results of the relevant self-media toward the reproduction of the public sentiment event with a deep learning RNN algorithm, finding the bad speech, and constructing a bad-speech vector; constructing and training a convolutional neural network model with the bad-speech vector through the neural network model construction module; and issuing early warnings through the self-media early-warning generation module using the trained convolutional neural network model, so that the self-media that may cause a public opinion crisis is recorded and authenticated.
CN202210653244.4A 2022-06-07 2022-06-07 Public opinion information content mining and propagation monitoring analysis method Pending CN115269950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210653244.4A CN115269950A (en) 2022-06-07 2022-06-07 Public opinion information content mining and propagation monitoring analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210653244.4A CN115269950A (en) 2022-06-07 2022-06-07 Public opinion information content mining and propagation monitoring analysis method

Publications (1)

Publication Number Publication Date
CN115269950A true CN115269950A (en) 2022-11-01

Family

ID=83759496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210653244.4A Pending CN115269950A (en) 2022-06-07 2022-06-07 Public opinion information content mining and propagation monitoring analysis method

Country Status (1)

Country Link
CN (1) CN115269950A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117494897A (en) * 2023-11-14 2024-02-02 西安康奈网络科技有限公司 Single public opinion event development tendency judging method
CN117494897B (en) * 2023-11-14 2024-05-17 西安康奈网络科技有限公司 Single public opinion event development tendency judging method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination