CN112417253B

CN112417253B - Multi-dimensional public opinion monitoring system and method

Info

Publication number: CN112417253B
Application number: CN202011573978.9A
Authority: CN
Inventors: 王三山; 付巍; 张瑜
Original assignee: Time Know Beijing Culture Technology Co ltd
Current assignee: Wang Sanshan
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2021-10-15
Anticipated expiration: 2040-12-28
Also published as: CN112417253A

Abstract

The application relates to the technical field of big data processing, and in particular to a multidimensional public opinion monitoring system and method. The multidimensional public opinion monitoring system comprises a hotspot discovery module, an emotion analysis module and a viewpoint mining module, and can monitor and analyze public opinion comprehensively and from multiple dimensions in the converged media era, with its rich and varied content and forms.

Description

Multi-dimensional public opinion monitoring system and method

Technical Field

The application relates to the technical field of big data processing, in particular to a multidimensional public opinion monitoring system and method.

Background

"Public opinion" here refers to the sentiment or mood of the public; once such sentiment gathers to a certain extent, it turns into public opinion. Public opinion monitoring is therefore very important for media supervisors, public opinion guides and content producers, such as mainstream media represented by broadcast television and communication platforms with great social influence.

The smart media era has arrived, moving from the convergence of traditional and emerging media to media convergence enabled by new-generation information technologies such as big data and artificial intelligence. In the smart media era, public opinion monitoring must handle converged media whose content and forms are rich and varied. Existing public opinion monitoring technology, however, has a single function: it is usually used only to collect and separately analyze a few specific information sources, cannot link information sources or functions with one another, is insufficiently systematic, and has difficulty analyzing and monitoring public opinion events comprehensively.

Therefore, how to comprehensively analyze and monitor public opinion events in the converged media era, with its rich and varied content and forms, is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The application provides a multidimensional public opinion monitoring system and method for comprehensively analyzing and monitoring public opinion events in the converged media era, with its rich and varied content and forms.

In order to solve the above technical problem, the application provides the following technical solutions:

A multidimensional public opinion monitoring system comprises a hotspot discovery module, an emotion analysis module and a viewpoint mining module.

The hotspot discovery module comprises: a first information acquisition sub-module, a first data processing sub-module, a word frequency statistics sub-module, a hot word cloud generation sub-module, a hot news generation sub-module, a hot microblog information acquisition sub-module and a hotspot dynamic graph generation sub-module. The first information acquisition sub-module acquires news information from news websites; the first data processing sub-module performs data processing on the news information acquired by the first information acquisition sub-module to remove invalid information, remove redundant information and add missing information, obtaining text information; the word frequency statistics sub-module performs word frequency statistics on the text information obtained by the first data processing sub-module; the hot word cloud generation sub-module obtains hot words according to the word frequencies obtained by the word frequency statistics sub-module and generates a hot word cloud picture from the hot words; the hot news generation sub-module displays, as hot news, the news titles and URLs corresponding to the hot words found by the hot word cloud generation sub-module; the hot microblog information acquisition sub-module takes the hot words found by the hot word cloud generation sub-module as keywords and acquires microblog information containing the keywords; and the hotspot dynamic graph generation sub-module preprocesses the microblog information acquired by the hot microblog information acquisition sub-module and generates a hotspot dynamic graph from the preprocessed information.

The emotion analysis module comprises: a second information acquisition sub-module, a second data processing sub-module, an emotion analysis sub-module, an emotion distribution map sub-module, a positive attitude word cloud generation sub-module and a negative attitude word cloud generation sub-module. The second information acquisition sub-module acquires comment information published on websites and forums; the second data processing sub-module performs data processing on the comment information acquired by the second information acquisition sub-module to remove invalid information, remove redundant information and add missing information, obtaining text information; the emotion analysis sub-module performs emotion analysis on the text information obtained by the second data processing sub-module to obtain the proportion of each emotion in all the text information, the text information with a positive attitude and the text information with a negative attitude; the emotion distribution map sub-module generates an emotion distribution map according to the proportion of each emotion in all the text information obtained by the emotion analysis sub-module; the positive attitude word cloud generation sub-module generates a positive attitude word cloud picture from the text information with a positive attitude obtained by the emotion analysis sub-module; and the negative attitude word cloud generation sub-module generates a negative attitude word cloud picture from the text information with a negative attitude obtained by the emotion analysis sub-module.

The viewpoint mining module comprises: a third information acquisition sub-module, a third data processing sub-module, a viewpoint mining sub-module, a viewpoint statistical image-text generation sub-module, a heat viewpoint mining sub-module and a heat statistical image-text generation sub-module. The third information acquisition sub-module acquires microblog information published on microblogs; the third data processing sub-module preprocesses the microblog information acquired by the third information acquisition sub-module to generate information bars; the viewpoint mining sub-module performs viewpoint mining on the information bars generated by the third data processing sub-module; the viewpoint statistical image-text generation sub-module generates a viewpoint statistical graph from the viewpoints mined by the viewpoint mining sub-module; the heat viewpoint mining sub-module performs heat calculation on the information bars generated by the third data processing sub-module and obtains heat keywords; and the heat statistical image-text generation sub-module generates a heat viewpoint statistical graph according to the heat values calculated by the heat viewpoint mining sub-module and displays the heat keywords on the heat viewpoint statistical graph.

In the multidimensional public opinion monitoring system described above, preferably, the hotspot dynamic graph generation sub-module: sorts all the information bars formed by the preprocessing according to release time; groups the sorted information bars to form a plurality of information bar groups; calculates the heat of each information bar group; calculates the emotion of each information bar group; generates the heat dynamic graph of the hotspot dynamic graph according to the release time and the heat of each information bar group; and generates the emotion distribution dynamic graph of the hotspot dynamic graph according to the release time and the emotion of each information bar group.

In the multidimensional public opinion monitoring system described above, the viewpoint mining sub-module: sorts all the information bars according to release time; performs data processing on the sorted information bars to remove invalid information, remove redundant information and add missing information, obtaining the text information corresponding to each information bar, and extracts keywords from that text information; performs word frequency statistics on the keywords extracted from the text information of all the information bars, so as to obtain viewpoint keywords according to the keyword word frequency statistics; and searches the sorted information bars for the viewpoint keywords to obtain the text information of the information bars containing them, thereby performing viewpoint mining on the information bars.

A multidimensional public opinion monitoring method comprises the following steps: step S210, collecting news information, comment information and microblog information respectively; step S220, performing data processing on the collected news information, comment information and microblog information to obtain text information or information bars; and step S230, processing the text information or the information bars to obtain a hot word cloud picture, hot news, a hotspot dynamic graph, an emotion distribution map, a positive attitude word cloud picture, a negative attitude word cloud picture, a viewpoint statistical graph and a heat viewpoint statistical graph, and displaying the hot word cloud picture, the hot news, the hotspot dynamic graph and the emotion distribution map to the user.

In the multidimensional public opinion monitoring method described above, preferably, obtaining the hot word cloud picture, the hot news and the hotspot dynamic graph specifically comprises the following sub-steps: step S231, performing word frequency statistics on the obtained text information; step S232, obtaining hot words according to the word frequencies in the text information, and generating a hot word cloud picture from the hot words; step S233, displaying the news titles and URLs corresponding to the hot words as hot news; step S234, taking the hot words as keywords and collecting microblog information containing the keywords; and step S235, preprocessing the collected microblog information and generating a hotspot dynamic graph from the preprocessed information.

In the multidimensional public opinion monitoring method described above, preferably, generating the hotspot dynamic graph from the preprocessed information comprises the following sub-steps: step S410, sorting all the information bars formed by the preprocessing according to release time; step S420, grouping the sorted information bars to form a plurality of information bar groups; step S430, calculating the heat of each information bar group; step S440, calculating the emotion of each information bar group; step S450, generating the heat dynamic graph of the hotspot dynamic graph according to the release time and the heat of each information bar group; and step S460, generating the emotion distribution dynamic graph of the hotspot dynamic graph according to the release time and the emotion of each information bar group.

In the multidimensional public opinion monitoring method described above, preferably, obtaining the emotion distribution map, the positive attitude word cloud picture and the negative attitude word cloud picture specifically comprises the following sub-steps: step S510, performing emotion analysis on the text information to obtain the proportion of each emotion in all the text information, the text information with a positive attitude and the text information with a negative attitude; step S520, generating an emotion distribution map according to the proportion of each emotion in all the text information; step S530, generating a positive attitude word cloud picture from the text information with a positive attitude; and step S540, generating a negative attitude word cloud picture from the text information with a negative attitude.

In the multidimensional public opinion monitoring method described above, preferably, obtaining the viewpoint statistical graph and the heat viewpoint statistical graph specifically comprises the following sub-steps: step S610, performing viewpoint mining on the information bars; step S630, performing heat calculation on the generated information bars and obtaining heat keywords; and step S640, generating a heat viewpoint statistical graph according to the calculated heat values, and displaying the heat keywords on the heat viewpoint statistical graph.

In the multidimensional public opinion monitoring method described above, preferably, performing viewpoint mining on the information bars comprises the following sub-steps: step S710, sorting all the information bars according to release time; step S720, performing data processing on the sorted information bars to remove invalid information, remove redundant information and add missing information, obtaining the text information corresponding to each information bar; step S730, extracting keywords from the text information corresponding to each information bar; step S740, performing word frequency statistics on the keywords extracted from the text information of all the information bars, so as to obtain viewpoint keywords according to the keyword word frequency statistics; and step S750, searching the sorted information bars for the viewpoint keywords to obtain the text information of the information bars containing the viewpoint keywords.

In the multidimensional public opinion monitoring method described above, preferably, performing heat calculation on the information bars to obtain heat keywords comprises the following sub-steps: step S810, performing heat calculation on each information bar; step S820, sorting all the information bars according to the calculated heat values to obtain heat information bars; step S830, performing data processing on the sorted information bars to obtain the text information corresponding to the information bars; and step S840, extracting keywords from the text information corresponding to the heat information bars after data processing.

Compared with the background art, the multidimensional public opinion monitoring system and method of the application can monitor and analyze public opinion comprehensively and from multiple dimensions in the converged media era, with its rich and varied content and forms.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below obviously cover only some embodiments of the present application, and those skilled in the art can obtain other drawings from them.

Fig. 1 is a schematic diagram of the multidimensional public opinion monitoring system provided in the first embodiment of the present application;

Fig. 2 is a flowchart of the multidimensional public opinion monitoring method provided in the second embodiment of the present application;

Fig. 3 is a flowchart of obtaining the hot word cloud picture, the hot news and the hotspot dynamic graph provided in the second embodiment of the present application;

Fig. 4 is a flowchart of generating the hotspot dynamic graph provided in the second embodiment of the present application;

Fig. 5 is a flowchart of obtaining the emotion distribution map, the positive attitude word cloud picture and the negative attitude word cloud picture provided in the second embodiment of the present application;

Fig. 6 is a flowchart of obtaining the viewpoint statistical graph and the heat viewpoint statistical graph provided in the second embodiment of the present application;

Fig. 7 is a flowchart of performing viewpoint mining on the information bars provided in the second embodiment of the present application;

Fig. 8 is a flowchart of performing heat calculation on the information bars to obtain heat keywords provided in the second embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

Example one

Referring to Fig. 1, which is a schematic diagram of the multidimensional public opinion monitoring system provided in the first embodiment of the present application, the application provides a multidimensional public opinion monitoring system comprising: a hotspot discovery module 110, an emotion analysis module 120 and a viewpoint mining module 130.

The hot spot discovery module 110 is configured to process the news information to discover hot spot information in the news information, and may also display the discovered hot spot information in the form of a hot spot word cloud picture, hot spot news, and a hot spot dynamic picture.

The emotion analysis module 120 is configured to process the comment information to analyze emotion information included in the comment information, and may further display the analyzed emotion information in the form of an emotion distribution map, a positive attitude word cloud picture, and a negative attitude word cloud picture.

The viewpoint mining module 130 is configured to process the microblog information to mine viewpoint information included in the microblog information, and may further display the mined viewpoint information in the form of a viewpoint statistical graph and a popularity viewpoint statistical graph.

Specifically, the hotspot discovery module 110 includes: an information acquisition sub-module 111, a data processing sub-module 112, a word frequency statistics sub-module 113, a hot word cloud generation sub-module 114, a hot news generation sub-module 115, a hot microblog information acquisition sub-module 116 and a hotspot dynamic graph generation sub-module 117.

The information collecting sub-module 111 collects news information of news websites, preferably news headlines of the news websites, and obtains news information from the news headlines.

Specifically, the information collecting submodule 111 sends a collecting request to a server corresponding to a news webpage of the news information to be collected, where the collecting request includes a variable type to be collected, for example: news headlines, URLs corresponding to news pages, etc.

After the server corresponding to the news webpage determines that the acquisition request is valid, it sends the news information corresponding to the acquisition request to the information acquisition sub-module 111, and at the same time returns to the information acquisition sub-module 111 the addresses of the other webpages linked from the news webpage.

After receiving the news information, the information acquisition submodule 111 further sends an acquisition request to the servers corresponding to all the web page addresses linked to the news web page according to a predetermined policy until all the web pages linked to the news web page are subjected to news information acquisition.

The predetermined strategy may be to start from the current news webpage and follow a single chain of links, collecting news information from each webpage along that chain in turn. Alternatively, the predetermined strategy may be to collect news information from all webpages linked from the current news webpage, then select one of those linked webpages and collect news information from all webpages linked from it.
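The two predetermined strategies read like depth-first and breadth-first traversal of the link graph. The following is a minimal sketch under that assumption; the in-memory `LINKS` mapping stands in for real acquisition requests, and the names `LINKS`, `crawl_depth_first` and `crawl_breadth_first` are illustrative and do not come from the application.

```python
from collections import deque

# Hypothetical link graph: page address -> addresses linked from that page.
LINKS = {
    "news/home": ["news/a", "news/b"],
    "news/a": ["news/c"],
    "news/b": ["news/c", "news/d"],
    "news/c": [],
    "news/d": [],
}

def crawl_depth_first(start, links):
    """Follow one chain of links to its end before backtracking."""
    visited, order, stack = set(), [], [start]
    while stack:
        page = stack.pop()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)  # stand-in for "collect news information here"
        # Push in reverse so the first link on the page is followed first.
        stack.extend(reversed(links.get(page, [])))
    return order

def crawl_breadth_first(start, links):
    """Collect every page linked from the current page, then go one level down."""
    visited, order, queue = {start}, [], deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)  # stand-in for "collect news information here"
        for nxt in links.get(page, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order
```

Both functions track visited addresses so that a webpage linked from several places is collected only once, which is one way to satisfy the stopping condition of having collected every linked webpage.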

The information acquisition sub-module 111 parses the collected news information and stores the parsed news information in a news database, so that the collected news information can be processed in the next step.

On this basis, the information acquisition sub-module 111 can collect news webpages in real time, and can also obtain news information by reading offline text of news webpages. In addition, the information acquisition sub-module 111 stores the news information after collecting it, so that the stored news information can be provided as offline data to the information acquisition sub-module 121 of the emotion analysis module 120 and/or the information acquisition sub-module 131 of the viewpoint mining module 130. Likewise, the comment information collected by the information acquisition sub-module 121 of the emotion analysis module 120 can be provided as offline data to the information acquisition sub-module 111 of the hotspot discovery module 110 and/or the information acquisition sub-module 131 of the viewpoint mining module 130, and the microblog information collected by the information acquisition sub-module 131 of the viewpoint mining module 130 can be provided as offline data to the information acquisition sub-module 111 of the hotspot discovery module 110 and/or the information acquisition sub-module 121 of the emotion analysis module 120. Offline data exchange among the hotspot discovery module 110, the emotion analysis module 120 and the viewpoint mining module 130 is thereby realized.

The data processing sub-module 112 performs data processing on the news information acquired by the information acquisition sub-module 111 to remove invalid information, remove redundant information, and add missing information, thereby obtaining text information.

Specifically, the collected news information is checked for unreasonable information (such as self-contradictory information or information contrary to fact). Such unreasonable information is invalid information, and the detected invalid information is deleted from the news information.

The collected news information is traversed to find positions where information is missing, and the missing information is added at those positions. Specifically, according to a keyword in the data field where information is missing, data fields matching the keyword are searched for in all the news information, and the data field found is used to complete the news information with missing data. If several matching data fields are found, the data field record is selected according to timestamp order to complete the missing data.

The collected news information is traversed to find redundant information, and the redundant information is deleted. Specifically, the similarity between any two pieces of news information is calculated. If the calculated similarity S is smaller than a preset threshold TS, the data recorded in the two pieces of news information is considered duplicated, and one of the two pieces is deleted; preferably, the piece with the lower reliability is deleted.

The preset threshold TS can be set by the user as required.

In the similarity calculation, S is the similarity between the first news information and the second news information, A is the first news information, B is the second news information, Ai is the weight value of the i-th word in the first news information, Bi is the weight value of the i-th word in the second news information, and the first news information and the second news information each have a words.

If the two pieces of news information contain different numbers of words, the numbers are made equal, for example by removing the words with the smallest weight values from the piece with more words until both pieces contain the same number of words.
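The similarity formula itself is not reproduced in this text. As a rough illustration only, the sketch below assumes cosine similarity over the word weight vectors, which fits the symbol definitions (per-word weights Ai and Bi, equal word counts) but is not confirmed by the application; `equalize` and `cosine_similarity` are hypothetical names.

```python
import math

def equalize(a, b):
    """Drop the smallest weights from the longer list until both lengths match."""
    a, b = list(a), list(b)
    while len(a) > len(b):
        a.remove(min(a))
    while len(b) > len(a):
        b.remove(min(b))
    return a, b

def cosine_similarity(a, b):
    """Assumed measure: cosine of the angle between two word-weight vectors."""
    a, b = equalize(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Under this assumption a larger S means more similar texts, so how S should be compared with TS depends on the measure the application actually uses; the sketch deliberately returns S and leaves the threshold test to the caller.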

The word frequency statistic submodule 113 performs word frequency statistics on the text information processed by the data processing submodule 112.

If the text information is mainly English, word frequency statistics are performed directly. If the text information is mainly Chinese, the Chinese characters are first segmented into words, and word frequency statistics are then performed on the segmented Chinese text information.

Specifically, the probability that each pair of adjacent Chinese characters occurs together in the Chinese text information is calculated, for example the probability that the adjacent characters C and D co-occur in the Chinese text information and the probability that the adjacent characters D and E co-occur in the Chinese text information.

The probability that the adjacent characters C and D co-occur in the Chinese text information and the probability that the adjacent characters D and E co-occur in the Chinese text information are each calculated by formula from the following quantities: the probability that C and D co-occur adjacently in the pre-trained cluster; the probability that D and E co-occur adjacently in the pre-trained cluster; the probability that C occurs alone in the pre-trained cluster; the probability that D occurs alone in the pre-trained cluster; and the probability that E occurs alone in the pre-trained cluster.

In the present application, the pre-trained cluster is a data set composed of a pre-collected number (for example, 1000 pieces) of Chinese text information.

If C is the first character of a sentence in the Chinese text information, the co-occurrence probability of C and D is compared with a preset value; when the condition on the preset value is met, C and D are placed in the same word, and otherwise C and D are classified as different words. If E is the last character of a sentence in the Chinese text information, the co-occurrence probability of D and E is compared with a preset value; when the condition on the preset value is met, D and E are placed in the same word, and otherwise D and E are classified as different words. If D is a character in the middle of a sentence in the Chinese text information, the two characters to which the resulting value corresponds are placed in the same word.

Specifically, the word frequency M of a given word is calculated by formula, where N is the total number of pieces of text information, x_j is the quantity for the word in text information X_j, and n is the number of pieces of text information, among the N pieces, in which the word appears.

The hot word cloud generating submodule 114 obtains a hot word according to the word frequency in the text information obtained by the word frequency statistics submodule 113, and generates a hot word cloud picture according to the hot word, so as to display the hot word cloud picture to the user.

Specifically, a hot word cloud picture is generated in a preset pattern under the rule that the higher the word frequency, the larger the font, and the hot word cloud picture is displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted as hot words to form the hot word cloud picture. In the embodiment of the present application, the hot word cloud picture preferably displays 50 words, 5 of which are highlighted hot words. The number of words displayed and the number of hot words may of course take other values, as long as the requirements of the embodiment of the present application are met.
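A minimal sketch of the ranking rule above: words are sorted by frequency, the top 50 are shown, the top 5 are flagged as hot words, and font size grows with frequency. The linear font scaling, the point sizes and the name `hot_word_cloud` are assumptions for illustration; the application does not specify how font size is derived.

```python
from collections import Counter

def hot_word_cloud(word_freq, shown=50, highlighted=5, min_pt=12, max_pt=48):
    """Return (word, font_size, is_hot) for the `shown` most frequent words."""
    ranked = Counter(word_freq).most_common(shown)  # sorted high to low
    if not ranked:
        return []
    top, low = ranked[0][1], ranked[-1][1]
    span = (top - low) or 1  # avoid dividing by zero when all counts are equal
    cloud = []
    for rank, (word, freq) in enumerate(ranked):
        # Assumed rule: font size scales linearly between min_pt and max_pt.
        size = min_pt + (max_pt - min_pt) * (freq - low) / span
        cloud.append((word, round(size), rank < highlighted))
    return cloud
```

The returned tuples can be handed to any rendering layer; the flag marks which words should be highlighted as hot words.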

The hot news generation sub-module 115 displays, as hot news, the news titles and URLs corresponding to the hot words found by the hot word cloud generation sub-module 114, so that the hot news is shown to the user.
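As a small illustration, the sketch below treats a news item as "corresponding" to a hot word when the hot word appears in its title. That matching rule, the `(title, url)` tuple layout and the name `hot_news` are assumptions, since the application does not define the correspondence.

```python
def hot_news(news_items, hot_words):
    """Return (hot_word, title, url) for each news title containing a hot word."""
    matches = []
    for word in hot_words:
        for title, url in news_items:
            if word in title:  # assumed matching rule: substring of the title
                matches.append((word, title, url))
    return matches
```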

The hot-spot microblog information collection submodule 116 uses the hot-spot word found by the hot-spot word cloud generation submodule 114 as a keyword, and collects microblog information including the keyword.

Specifically, the hot-spot microblog information collection sub-module 116 sends a collection request to the microblog server, where the collection request carries a keyword, and the keyword is a hot word found by the hot-word cloud generation sub-module 114.

The microblog server traverses each microblog message according to the keyword carried by the acquisition request, and sends the microblog message containing the keyword to the hot microblog message acquisition submodule 116. After receiving the microblog information containing the keywords, the hot microblog information collecting sub-module 116 analyzes and stores the microblog information.

The hotspot dynamic graph generation sub-module 117 preprocesses the microblog information acquired by the hot microblog information acquisition sub-module 116 and generates a hotspot dynamic graph from the preprocessed information.

Specifically, each piece of feature information (such as release time, content text, number of comments and number of likes) is extracted from the acquired microblog information, and the feature information of each piece of microblog information is combined to form an information bar, which completes the preprocessing of the microblog information. A hotspot dynamic graph is then generated from the information bars, the hotspot dynamic graph including a heat dynamic graph.

All the information bars are sorted according to release time. Specifically, they are sorted from earliest to latest according to the release time of the microblog information contained in each information bar.

The sorted information bars are grouped to form a plurality of information bar groups. Specifically, the sorted information bars are grouped at a fixed count interval, for example with every 20 information bars forming one information bar group.
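The sorting and grouping steps can be sketched as follows. The `InfoBar` dataclass and its field names are hypothetical stand-ins for the feature information listed above, and keeping a final group with fewer than 20 entries is an assumption, since the application does not say how a remainder is handled.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class InfoBar:
    published: datetime  # release time
    text: str            # content text
    comments: int        # number of comments
    likes: int           # number of likes

def group_bars(bars, size=20):
    """Sort from earliest to latest, then split into consecutive groups of `size`."""
    ordered = sorted(bars, key=lambda b: b.published)
    # The last group may be shorter than `size` when the total is not a multiple.
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```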

The heat of each information strip group is calculated. Specifically, the heat is calculated according to a formula in which H is the heat value of the information strip group, u is the number of information strips contained in the information strip group, W₁ is the weight applied to the sum of the numbers of comments of all the information strips in the group, G is the number of comments contained in each information strip in the group, W₂ is the weight applied to the sum of the numbers of likes of all the information strips in the group, and Y is the number of likes contained in each information strip in the group. In the present application, preferably u = 20, W₁ = 70%, and W₂ = 30%.

The emotion of each information strip group is calculated, specifically as follows:

each piece of characteristic information (such as release time, content text, number of comments, number of likes and the like) of the acquired microblog information is extracted, and the characteristic information of each piece of microblog information is combined to form an information strip, which completes the preprocessing of the microblog information. A hot spot dynamic graph is generated from the information strips, wherein the hot spot dynamic graph comprises an emotion distribution dynamic graph.

The sorted information strips are grouped to form a plurality of information strip groups. Specifically, the sorted information strips are grouped at a fixed count interval, for example, every 20 information strips form one information strip group. One information strip group thus contains n information strips, for example, n = 20.

For each information strip in the information strip group, microblog information directly or indirectly linked with the corresponding microblog information is further acquired, and the linked microblog information is likewise formed into information strips. With M-1 denoting the number of linked information strips, each information strip and the information strips linked with it together constitute an information strip set.

In addition, the association weight between the acquired microblog information and the linked microblog information is also extracted, for example, the association weight of a primary link, the association weight of a secondary link, the association weight of a tertiary link, and so on. For example, the association weight of a primary link is 0.85, the association weight of a secondary link is 0.7, the association weight of a tertiary link is 0.58, and so on. Each information strip set has a corresponding association weight set, in which each element is the association weight corresponding to one information strip in the information strip set.

On this basis, the information strip sets and their corresponding association weight sets are used to build a set of information strip groups.

The set of information strip groups is input into a preset classification model, and the classification model is trained to obtain T different sub-classification models, indexed by t = 1, 2, 3, ..., T. The sub-classification models are used to classify the set of information strip groups to obtain classification results, and a set of weights for the sub-classification models is estimated according to the classification results. The optimal value of each weight in this set is calculated by a particle swarm optimization algorithm, and each sub-classification model is combined with the optimal value of its corresponding weight, or with the normalized optimal value, to obtain the classification model.

On this basis, the optimal weights are obtained by a formula using argmin, wherein argmin denotes the set of arguments for which the objective has its minimum value.

In use, a set of information strip groups is acquired according to the above steps and input into the obtained classification model, so that the emotion vocabulary in each information strip group is classified into different types.

The emotion of each information strip group is calculated according to a formula in which F is the emotion of the information strip group. The formula uses the number of positive-emotion words and the number of negative-emotion words in the information strip group obtained by classification, wp_i is the weight of a positive-emotion word in the emotion dictionary, and wp_j is the weight of a negative-emotion word in the emotion dictionary. In the present application, the emotion dictionary is a set of correspondences between emotion words, generated in advance by extending an existing electronic dictionary, and the weights of those emotion words. In this example, each information strip group is classified as "positive attitude", "negative attitude" or "neutral attitude" according to its calculated emotion.
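The formula for F does not survive in this text, so the following is only a sketch under stated assumptions: F is taken to be the sum of the dictionary weights of the positive-emotion words minus the sum of the dictionary weights of the negative-emotion words, and a group is labelled by the sign of F. The function names, the `margin` parameter and the sample dictionaries are hypothetical.

```python
def group_emotion(words, positive_weights, negative_weights):
    """Assumed emotion score: positive weights minus negative weights.

    `words` is the emotion vocabulary found in one information strip group;
    the two dictionaries map emotion words to their dictionary weights.
    """
    positive = sum(positive_weights[w] for w in words if w in positive_weights)
    negative = sum(negative_weights[w] for w in words if w in negative_weights)
    return positive - negative


def attitude(score, margin=0.0):
    """Map a score to one of the three attitudes (assumed thresholds)."""
    if score > margin:
        return "positive attitude"
    if score < -margin:
        return "negative attitude"
    return "neutral attitude"


# Hypothetical emotion dictionary entries.
positive_weights = {"good": 0.8, "great": 1.0}
negative_weights = {"bad": 0.9}

score = group_emotion(["good", "bad", "great", "movie"], positive_weights, negative_weights)
label = attitude(score)
```

A non-zero `margin` would widen the neutral band; the text does not say where the boundaries between the three attitudes lie.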

A heat dynamic graph is generated according to the release time and the heat of each information strip group. Specifically, because the information strip groups are obtained by dividing information strips arranged from earliest to latest release time, the information strip groups themselves have a time order; the heat values of the groups are therefore arranged in that time order to generate the heat dynamic graph, which shows the user how the heat changes over time.
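The data behind such a heat dynamic graph can be sketched as below. The group heat formula is not reproduced in this text, so the sketch assumes one plausible reading of the symbol definitions: the weighted sums of comments and likes, divided by the number of strips u. The dictionary keys and function names are hypothetical.

```python
def group_heat(group, w_comments=0.7, w_likes=0.3):
    """Heat of one information strip group under the assumed formula
    H = (W1 * sum(G) + W2 * sum(Y)) / u."""
    u = len(group)                                               # strips in the group
    total_comments = sum(strip["comments"] for strip in group)   # sum of G
    total_likes = sum(strip["likes"] for strip in group)         # sum of Y
    return (w_comments * total_comments + w_likes * total_likes) / u


def heat_series(groups):
    """Pair each group's earliest release time with its heat.

    `groups` must already be in time order, as produced by sorting the
    strips by release time before grouping."""
    return [(group[0]["release_time"], group_heat(group)) for group in groups]


# Hypothetical groups; release_time is an integer timestamp for brevity.
groups = [
    [{"release_time": 1, "comments": 10, "likes": 20},
     {"release_time": 2, "comments": 30, "likes": 40}],
    [{"release_time": 3, "comments": 0, "likes": 10},
     {"release_time": 4, "comments": 4, "likes": 0}],
]
series = heat_series(groups)
```

The resulting list of (time, heat) pairs can be handed to any plotting tool; if the original formula does not divide by u, only `group_heat` needs to change.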

An emotion distribution dynamic graph is generated according to the release time and the emotion of each information strip group. Specifically, because the information strip groups are obtained by dividing information strips arranged from earliest to latest release time, the information strip groups themselves have a time order; the emotions of the groups are therefore arranged in that time order to generate the emotion distribution dynamic graph, which shows the user how the emotion changes over time.

Specifically, the emotion analysis module 120 includes: an information acquisition sub-module 121, a data processing sub-module 122, an emotion analysis sub-module 123, an emotion distribution map sub-module 124, a positive attitude word cloud generation sub-module 125 and a negative attitude word cloud generation sub-module 126.

The information collecting sub-module 121 collects comment information posted in websites and forums.

Specifically, the information collecting sub-module 121 sends a collecting request to a server corresponding to a website and a forum, where the collecting request includes a keyword related to forum information to be collected.

After judging that the collection request is valid, the corresponding server sends comment information corresponding to the collection request to the information collection submodule 121, where the comment information includes: comment information on a movie, comment information on a commodity, comment information on a news event, and the like.

After receiving the comment information, the information collection submodule 121 analyzes the collected comment information and stores the analyzed comment information in the comment database to process the collected comment information in the next step.

On the basis, the information acquisition sub-module 121 may acquire websites and forums in real time, and may also acquire comment information by reading offline texts of the websites and forums.

The data processing sub-module 122 performs data processing on the comment information acquired by the information acquisition sub-module 121 to remove invalid information, remove redundant information, and add missing information, thereby obtaining text information.

Specifically, the collected comment information is checked for unreasonable information (such as inconsistent information or information contrary to fact); such unreasonable information is treated as invalid information and is deleted from the comment information.

The collected comment information is traversed to find positions where information is missing, and the missing information is added at those positions. Specifically, according to the keywords in a data field of the comment information where information is missing, a data field matching those keywords is searched for in all the comment information, and the data field found is used to complete the comment information with missing data. In this process, if a plurality of matching data fields are found, the data field record is selected according to the order of the time stamps to complete the missing data.

The collected comment information is traversed to find redundant information, which is then deleted. Specifically, the similarity between any two pieces of comment information is calculated; if the calculated similarity S is greater than a preset threshold TS, the data recorded in the two pieces of comment information is considered duplicated, and one of the two is deleted, preferably the one with the lower reliability according to the reliability recorded in the two pieces of comment information.

The preset threshold TS can be set by the user as required.

In the similarity formula, S is the similarity of the first comment information and the second comment information, A is the first comment information, B is the second comment information, A_i is the weight value of the i-th word in the first comment information, B_i is the weight value of the i-th word in the second comment information, and the first comment information and the second comment information both have a words.

If the two pieces of comment information have different numbers of words, the numbers of words are first made equal, for example, by removing the words with the smallest weight values from the comment information with more words until both pieces of comment information have the same number of words.
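The redundancy check can be sketched as follows. The similarity formula itself is missing from this text; the sketch assumes cosine similarity over the word weight vectors, which fits the symbol definitions (A_i, B_i, a words) but is not confirmed by the source. It also assumes that two comments count as duplicates when their similarity exceeds TS. All function names and the default threshold are hypothetical.

```python
import math


def equalize(a, b):
    """Drop the smallest weights from the longer list until both lists
    have the same number of word weights."""
    a, b = list(a), list(b)
    while len(a) > len(b):
        a.remove(min(a))
    while len(b) > len(a):
        b.remove(min(b))
    return a, b


def similarity(a, b):
    """Assumed similarity S: cosine of the two word weight vectors."""
    a, b = equalize(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(a, b, ts=0.9):
    """Treat two comments as duplicates when S exceeds the threshold TS."""
    return similarity(a, b) > ts


trimmed = equalize([0.5, 0.1, 0.9], [0.3, 0.4])
```

Trimming by weight alone does not guarantee that the i-th remaining word in one comment corresponds to the i-th word in the other; a fuller implementation would align the two comments on a shared vocabulary first.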

The emotion analysis submodule 123 performs emotion analysis on the text information processed and obtained by the data processing submodule 122 to obtain the proportion of each emotion, the text information of the positive attitude, and the text information of the negative attitude in all the text information.

Specifically, an emotion feature training set is constructed in advance and input into a Bayesian classifier for training to obtain an emotion classifier. In use, the text information obtained by the data processing sub-module 122 is input into the emotion classifier of the emotion analysis sub-module 123, so that the emotion of the text information is analyzed. For example, the text information is classified by emotion as positive attitude, neutral attitude or negative attitude, which yields the text information of the positive attitude, the text information of the neutral attitude and the text information of the negative attitude, together with the proportion of each emotion in all the text information. As another example, the text information is divided into five categories, "like", "somewhat like", "average", "less like" and "dislike", according to the analyzed emotion, wherein "like" and "somewhat like" are categorized as positive attitude, and "less like" and "dislike" are categorized as negative attitude.
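As an illustration of the Bayesian step, the sketch below trains a small multinomial naive Bayes classifier with add-one smoothing on tokenized text and then computes the proportion of each emotion. The text only says "Bayesian classifier", so the multinomial variant, the smoothing, the class name and the toy training set are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesEmotionClassifier:
    """Multinomial naive Bayes over word lists, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for word in words:
                if word in self.vocabulary:  # unseen words are ignored
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def emotion_proportions(labels):
    """Share of each emotion among all classified texts."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Hypothetical training set of tokenized comments.
training_texts = [["great", "film"], ["love", "it"], ["terrible", "plot"],
                  ["boring", "film"], ["it", "is", "ok"]]
training_labels = ["positive", "positive", "negative", "negative", "neutral"]
classifier = NaiveBayesEmotionClassifier().fit(training_texts, training_labels)
```

Calling `classifier.predict(words)` on each processed text and passing the resulting labels to `emotion_proportions` gives the figures an emotion distribution map would display.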

The emotion distribution map sub-module 124 generates an emotion distribution map according to the proportion of each emotion in all the text information obtained by the emotion analysis sub-module 123, so as to show the emotion distribution of the comment information to the user.

The positive attitude word cloud generation sub-module 125 generates a positive attitude word cloud picture according to the text information of the positive attitude obtained by the emotion analysis sub-module 123, so as to display the positive attitude word cloud picture to the user.

Specifically, the word frequencies in the text information of the positive attitude obtained by the emotion analysis sub-module 123 are counted, a positive attitude word cloud picture is generated in a predetermined pattern according to the rule that the higher the word frequency, the larger the font, and the positive attitude word cloud picture is displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset number from the top are highlighted to form the positive attitude word cloud picture.

The negative attitude word cloud generating submodule 126 generates a negative attitude word cloud picture according to the text information of the negative attitude obtained by the emotion analyzing submodule 123, so that the negative attitude word cloud picture is displayed for the user.

Specifically, the word frequencies in the text information of the negative attitude obtained by the emotion analysis sub-module 123 are counted, a negative attitude word cloud picture is generated in a predetermined pattern according to the rule that the higher the word frequency, the larger the font, and the negative attitude word cloud picture is displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset number from the top are highlighted to form the negative attitude word cloud picture.
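The word cloud rule (higher frequency, larger font; top-ranked words highlighted) can be sketched for either attitude as follows. Plain occurrence counting is used here as a stand-in for the word frequency calculation; the linear font scaling, the function name and the parameter defaults are hypothetical.

```python
from collections import Counter


def word_cloud_entries(texts, top_n=3, min_font=12, max_font=48):
    """Rank words by frequency and assign font sizes that grow with frequency.

    `texts` is a list of tokenized texts of one attitude. Words ranked in
    the top `top_n` are flagged for highlighting."""
    counts = Counter(word for words in texts for word in words)
    ranked = counts.most_common()  # sorted from high to low frequency
    if not ranked:
        return []
    highest = ranked[0][1]
    entries = []
    for rank, (word, frequency) in enumerate(ranked):
        size = min_font + (max_font - min_font) * frequency / highest
        entries.append({
            "word": word,
            "frequency": frequency,
            "font_size": round(size),
            "highlight": rank < top_n,
        })
    return entries


# Hypothetical negative attitude texts, already segmented into words.
entries = word_cloud_entries(
    [["slow", "plot"], ["slow", "ending"], ["slow", "plot", "acting"]],
    top_n=2,
)
```

The entries can then be rendered by any layout routine that accepts a word, a font size and a highlight flag.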

On the basis of the above, the following method is adopted regardless of whether the word frequency in the text information with positive attitude or the word frequency in the text with negative attitude is calculated.

By the formula

is the probability that C occurs alone in the pre-trained cluster,

is the probability that D occurs alone in the pre-trained cluster,

is the probability that E occurs alone in the pre-trained cluster,

is the probability of co-occurrence of C and D in the Chinese text information,

is the probability of co-occurrence of D and E in the Chinese text information. The pre-trained cluster in this application is a pre-collected number (e.g., 1000 pieces) of Chinese text information.

If C is the first word of a sentence in the Chinese text information, then

(Preset value) in case C and D are included in the same word, otherwise

The corresponding two characters fall under the same word.

Specifically, the word frequency is calculated according to a formula in which N is the total number of pieces of text information (of positive attitude or of negative attitude), M is the word frequency of a given word across the N pieces of text information, x_j is the term for that word in text information X_j, and n is the number of pieces of text information, among the N pieces, in which the word appears.

The viewpoint mining module 130 includes: an information acquisition sub-module 131, a data processing sub-module 132, a viewpoint mining sub-module 133, a viewpoint statistical image-text generation sub-module 134, a heat viewpoint mining sub-module 135 and a heat statistical image-text generation sub-module 136.

The information collecting submodule 131 collects microblog information published on a microblog.

Specifically, the information collecting submodule 131 may collect corresponding microblog information according to the keyword, and a specific collecting manner is the same as a manner in which the information collecting submodule 111 collects news information or the information collecting submodule 121 collects comment information. In addition, the information collecting sub-module 131 may also collect microblog information by reading an offline text.

The data processing sub-module 132 pre-processes the microblog information collected by the information collecting sub-module 131 to generate an information strip.

Specifically, each piece of feature information (such as release time, content text, number of comments, number of likes, and the like) of the microblog information is extracted, and the feature information of each piece of microblog information is combined to form one information strip.
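One way to represent an information strip is a small record type. The sketch below is illustrative only: the class name, its field names and the keys of the raw post (`created_at`, `text`, `comments`, `likes`) are assumptions, not part of the application or of any microblog API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class InformationStrip:
    """Feature information of one piece of microblog information."""
    release_time: datetime
    content_text: str
    comment_count: int
    like_count: int


def to_information_strip(post):
    """Build an information strip from a raw post dictionary
    (hypothetical key names)."""
    return InformationStrip(
        release_time=datetime.fromisoformat(post["created_at"]),
        content_text=post["text"].strip(),
        comment_count=int(post.get("comments", 0)),
        like_count=int(post.get("likes", 0)),
    )


strip = to_information_strip(
    {"created_at": "2020-12-28T09:30:00", "text": "  hello  ", "comments": "3", "likes": 7}
)
```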

The viewpoint mining submodule 133 performs viewpoint mining on the information pieces processed by the data processing submodule 132.

Data processing is performed on the sorted information strips to remove invalid information, remove redundant information and add missing information, so as to obtain the text information corresponding to the information strips. Specifically, the data processing method for the information strips is the same as the method used by the data processing sub-module 112 for news information or by the data processing sub-module 122 for comment information.

Keywords are extracted from the text information corresponding to the information strips. In the present application, at most 5 keywords are preferably extracted from the text information corresponding to each information strip.

Keyword word frequency statistics are performed on the extracted keywords across the text information corresponding to all the information strips, so as to obtain viewpoint keywords from those statistics. Specifically, the word frequency is calculated according to a formula in which N is the total number of pieces of text information corresponding to all the information strips, M is the word frequency of a given word across the N pieces of text information, x_j is the term for that word in text information X_j, and n is the number of pieces of text information, among the N pieces, in which the word appears. The keyword word frequencies of the text information corresponding to all the information strips are accumulated, and the keywords whose accumulated word frequency ranks within a preset range are taken as the viewpoint keywords corresponding to the microblog information; in the present application, the top 5 keywords by word frequency are preferably taken as the viewpoint keywords.

The sorted information strips are searched for the viewpoint keywords to obtain the text information corresponding to the information strips containing them. Specifically, the sorted information strips are searched in order for each viewpoint keyword. Once a viewpoint keyword appears in the text information corresponding to an information strip, that information strip is considered matched and the search within it stops. If the number of matched information strips has reached the set upper limit, the entire search for that viewpoint keyword stops; otherwise the search continues with the next information strip. In the present application, the upper limit on the number of information strips matched to each viewpoint keyword is set to 5. After an information strip is matched with a viewpoint keyword, the corresponding text information is acquired for later use in forming the viewpoint statistical image-text.
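Selecting viewpoint keywords and matching them against the sorted information strips can be sketched as below. Accumulated occurrence counts stand in for the word frequency formula, which is not reproduced in this text; the function names and sample data are hypothetical.

```python
from collections import Counter


def viewpoint_keywords(keywords_per_strip, top_k=5, per_strip_limit=5):
    """Accumulate keyword counts over all strips and keep the top-ranked ones.

    At most `per_strip_limit` keywords are taken from each strip."""
    counts = Counter()
    for keywords in keywords_per_strip:
        counts.update(keywords[:per_strip_limit])
    return [word for word, _ in counts.most_common(top_k)]


def match_strips(strip_texts, keywords, limit=5):
    """For each viewpoint keyword, collect the first `limit` strip texts
    (in sorted order) that contain it."""
    matches = {}
    for keyword in keywords:
        found = []
        for text in strip_texts:
            if keyword in text:
                found.append(text)
                if len(found) >= limit:
                    break  # upper limit reached: stop searching this keyword
        matches[keyword] = found
    return matches


# Hypothetical keywords already extracted from three strips.
top = viewpoint_keywords(
    [["price", "battery"], ["battery", "screen"], ["battery", "price"]], top_k=2
)
texts = ["battery lasts long", "price too high", "battery drains fast", "great battery"]
matched = match_strips(texts, top, limit=2)
```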

The viewpoint statistical graph generation submodule 134 generates a viewpoint statistical graph from the viewpoints mined by the viewpoint mining submodule 133 to be presented to the user.

Specifically, the viewpoint statistical graph-text generating sub-module 134 generates the viewpoint statistical graph-text according to the obtained viewpoint keywords and the statistical word frequency thereof, and preferably, further displays the text information corresponding to the obtained matching information pieces at the position corresponding to the viewpoint statistical graph-text.

The heat viewpoint mining sub-module 135 performs heat calculation on the information strips generated by the data processing sub-module 132, and obtains heat keywords.

Heat calculation is performed on each information strip. Specifically, the heat of each information strip is calculated according to the formula

H = W₃·G + W₄·Y

wherein H is the heat value of the information strip, W₃ is the weight of the number of comments of the information strip, G is the number of comments contained in the information strip, W₄ is the weight of the number of likes of the information strip, and Y is the number of likes contained in the information strip. In the present application, preferably W₃ = 70% and W₄ = 30%.

All the information strips are sorted according to the calculated heat values to obtain the heat information strips. Specifically, the information strips are sorted from high to low by heat value, and the top-ranked information strips are taken as the heat information strips; in the present application, the top 5 information strips are preferably taken as the heat information strips.
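A minimal sketch of the per-strip heat ranking follows, assuming the heat is the weighted sum of comments and likes with the preferred weights of 70% and 30%. The dictionary keys and function names are hypothetical.

```python
def strip_heat(comments, likes, w_comments=0.7, w_likes=0.3):
    """Heat of one information strip as a weighted sum of comments and likes."""
    return w_comments * comments + w_likes * likes


def heat_strips(strips, top_k=5):
    """Sort strips from highest to lowest heat and keep the top `top_k`."""
    ranked = sorted(
        strips,
        key=lambda strip: strip_heat(strip["comments"], strip["likes"]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical strips: heat is roughly 7 for a, 9 for b and 5 for c.
strips = [
    {"id": "a", "comments": 10, "likes": 0},
    {"id": "b", "comments": 0, "likes": 30},
    {"id": "c", "comments": 5, "likes": 5},
]
hottest = heat_strips(strips, top_k=2)
```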

Data processing is performed on the sorted information strips to obtain the text information corresponding to the information strips. Specifically, data processing is performed on the sorted information strips to remove invalid information, remove redundant information, and add missing information; the data processing method is the same as the method used by the data processing sub-module 112 for news information or by the data processing sub-module 122 for comment information.

Keywords are extracted from the text information corresponding to the heat information strips after data processing to obtain the heat keywords. In the present application, the upper limit on the number of heat keywords extracted from each heat information strip is preferably 3.

The heat statistical image-text generation sub-module 136 generates a heat viewpoint statistical chart from the heat values calculated by the heat viewpoint mining sub-module 135, and also displays the heat keywords on the heat viewpoint statistical chart.

Specifically, the heat viewpoint statistical chart is generated according to the heat values of the heat information strips; in addition, the text information and the heat keywords of the heat information strips can be displayed at the corresponding positions of the heat viewpoint statistical chart.

Example two

Referring to fig. 2, fig. 2 is a flowchart of a multidimensional public opinion monitoring method according to a second embodiment of the present application.

The application provides a multidimensional public opinion monitoring method, which comprises the following steps:

Step S210, respectively collecting news information, comment information and microblog information;

specifically, news information, comment information and microblog information are collected in real time, or offline texts are read to collect the news information, the comment information and the microblog information offline. For example, offline data exchange is carried out among the offline cached news information, comment information and microblog information, so that offline data collection is achieved.

Step S220, performing data processing on the collected news information, comment information and microblog information to obtain text information or information strips;

the data processing of the news information specifically comprises the following:

the collected news information is checked for unreasonable information (such as inconsistent information or information contrary to fact); such unreasonable information is treated as invalid information and is deleted from the news information.

The preset threshold TS used when removing redundant information can be set by the user as required.

The data processing of the comment information specifically comprises the following:

the collected comment information is checked for unreasonable information (such as inconsistent information or information contrary to fact); such unreasonable information is treated as invalid information and is deleted from the comment information.

The preset threshold TS used when removing redundant information can be set by the user as required.

The data processing of the microblog information specifically comprises the following:

each piece of feature information (such as release time, content text, number of comments, number of likes and the like) of the microblog information is extracted, and the feature information of each piece of microblog information is combined to form an information strip.

Step S230, processing the text information or the information bar to obtain a hot word cloud picture, hot news, a hot dynamic picture, an emotion distribution map, a positive attitude word cloud, a negative attitude word cloud, a viewpoint statistical picture and a hot statistical picture, and displaying the hot word cloud picture, the hot news, the hot dynamic picture and the emotion distribution map to the user.

Referring to fig. 3, a hot word cloud picture, hot news, and a hot dynamic picture are obtained, which includes the following steps:

step S231, carrying out word frequency statistics on the obtained text information;

By the formula

is the probability that C occurs alone in the pre-trained cluster,

is the probability that D occurs alone in the pre-trained cluster,

is the probability that E occurs alone in the pre-trained cluster,

as the probability of co-occurrence of C and D in the chinese text information,

If C is the first word of a sentence in the Chinese text information, then

(Preset value) in case C and D are included in the same word, otherwise

The corresponding two characters fall under the same word.

Specifically, the word frequency is calculated according to a formula in which N is the total number of pieces of text information, M is the word frequency of a given word across the N pieces of text information, x_j is the term for that word in text information X_j, and n is the number of pieces of text information, among the N pieces, in which the word appears.

And step S232, obtaining hot words according to the word frequency in the text information, and generating a hot word cloud picture according to the hot words.

And step S233, displaying the news title and the URL corresponding to the hot word as hot news so as to display the hot news title and the URL to a user.

Step S234, the hot words are used as key words, and microblog information containing the key words is collected.

Step S235, preprocessing the acquired microblog information, and generating a hot spot dynamic graph according to the preprocessed information;

specifically, each piece of characteristic information (such as release time, content text, number of comments, number of likes and the like) of the acquired microblog information is extracted, and the characteristic information of each piece of microblog information is combined to form an information strip, which completes the preprocessing of the microblog information. A hot spot dynamic graph is generated from the information strips, wherein the hot spot dynamic graph comprises a heat dynamic graph and an emotion distribution dynamic graph.

Referring to fig. 4, generating a hot spot dynamic graph according to the preprocessed information includes the following sub-steps:

Step S410, sorting all the information strips formed after the preprocessing according to release time;

specifically, all the information strips are sorted from earliest to latest according to the release time of the microblog information contained in each information strip;

Step S420, grouping all the sorted information strips to form a plurality of information strip groups;

specifically, the sorted information strips are grouped at a fixed count interval to form a plurality of information strip groups, for example, every 20 information strips form one information strip group.

Step S430, calculating the heat of each information strip group;

specifically, the heat is calculated according to a formula in which H is the heat value of the information strip group, u is the number of information strips contained in the information strip group, W₁ is the weight applied to the sum of the numbers of comments of all the information strips in the group, G is the number of comments contained in each information strip in the group, W₂ is the weight applied to the sum of the numbers of likes of all the information strips in the group, and Y is the number of likes contained in each information strip in the group. In the present application, preferably u = 20, W₁ = 70%, and W₂ = 30%.

Step S440, calculating the emotion of each information strip group;

specifically, this is done as follows:

The sorted information strips are grouped to form a plurality of information strip groups. Specifically, the sorted information strips are grouped at a fixed count interval, for example, every 20 information strips form one information strip group. For each information strip in an information strip group, the information strips linked with it are also acquired; with M-1 denoting the number of linked information strips, each information strip and the information strips linked with it together constitute an information strip set.

Each information strip set has a corresponding association weight set, in which each element is the association weight corresponding to one information strip in the information strip set.

On this basis, the information strip sets and their corresponding association weight sets are used to build a set of information strip groups.

T sub-classification models are obtained by training. The sub-classification models are used to classify the set of information strip groups to obtain classification results, a set of weights for the sub-classification models is estimated according to the classification results, and each sub-classification model is combined with the optimal value of its corresponding weight.

On this basis, the optimal weights are obtained by a formula using argmin, wherein argmin denotes the set of arguments for which the objective has its minimum value.

Specifically, the emotion of each information strip group is calculated according to a formula that uses the number of positive-emotion words and the number of negative-emotion words in the information strip group obtained by classification, wherein wp_i is the weight of a positive-emotion word in the emotion dictionary and wp_j is the weight of a negative-emotion word in the emotion dictionary. In the present application, the emotion dictionary is a set of correspondences between emotion words, generated in advance by extending an existing electronic dictionary, and the weights of those emotion words.

In this example, each information strip group is divided into three categories, namely "positive attitude", "negative attitude", and "neutral attitude", according to the obtained emotion of each information strip group.
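The emotion formula is not reproduced in this text. As a rough sketch, assuming F is the sum of the dictionary weights of the positive-emotion words minus the sum of the weights of the negative-emotion words, and assuming the three attitudes are separated by the sign of F (both are assumptions; `group_emotion`, `attitude`, and the `band` parameter are hypothetical):

```python
from typing import Dict, Iterable


def group_emotion(words: Iterable[str],
                  positive: Dict[str, float],
                  negative: Dict[str, float]) -> float:
    """Assumed score: sum of positive weights minus sum of negative weights."""
    score = 0.0
    for word in words:
        score += positive.get(word, 0.0)   # weight of a positive-emotion word
        score -= negative.get(word, 0.0)   # weight of a negative-emotion word
    return score


def attitude(score: float, band: float = 0.0) -> str:
    """Map a score to one of the three attitudes (band is a hypothetical tolerance)."""
    if score > band:
        return "positive attitude"
    if score < -band:
        return "negative attitude"
    return "neutral attitude"


# Tiny illustrative emotion dictionary (invented entries).
pos_dict = {"good": 1.0, "great": 2.0}
neg_dict = {"bad": 1.5}
score = group_emotion(["good", "bad", "great", "today"], pos_dict, neg_dict)
```

Words that are absent from both dictionaries contribute nothing to the score.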

Step S450, generating a heat dynamic graph according to the release time of each information strip group and the heat of each information strip group;

specifically, because the information strip groups are obtained by dividing information strips arranged by release time from earliest to latest, the information strip groups themselves have a time order; the heat values of the groups are therefore arranged in that time order to generate a heat dynamic graph, which shows the user how the heat changes.

Step S460, generating an emotion distribution dynamic graph according to the release time of each information strip group and the emotion of each information strip group;

specifically, because the information strip groups are obtained by dividing information strips arranged by release time from earliest to latest, the information strip groups themselves have a time order; the emotions of the groups are therefore arranged in that time order to generate an emotion distribution dynamic graph, which shows the user how the emotion changes.
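The data behind both dynamic graphs can be sketched as a time-ordered series with one point per information strip group. The sketch below assumes each group is stamped with the release time of its earliest strip, which the text does not state. `Strip`, `time_ordered_groups`, and `dynamic_series` are hypothetical names, and the metric passed in stands for either the heat or the emotion of a group.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Sequence, Tuple


@dataclass
class Strip:
    published: datetime   # release time of the microblog information
    comments: int
    likes: int


def time_ordered_groups(strips: Sequence[Strip], size: int = 20) -> List[List[Strip]]:
    """Sort strips from earliest to latest, then cut them into fixed-size groups."""
    if size <= 0:
        raise ValueError("size must be positive")
    ordered = sorted(strips, key=lambda s: s.published)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def dynamic_series(groups: Sequence[List[Strip]],
                   metric: Callable[[List[Strip]], float]
                   ) -> List[Tuple[datetime, float]]:
    """One (time, value) point per group, already in time order."""
    return [(group[0].published, metric(group)) for group in groups]
```

For example, `dynamic_series(time_ordered_groups(strips), lambda g: sum(s.comments for s in g))` gives the points of a comment-count curve over time; a plotting library would then draw the curve.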

Referring to fig. 5, obtaining the emotion distribution map, the positive attitude word cloud picture, and the negative attitude word cloud picture includes the following steps:

Step S510, performing emotion analysis on the text information to obtain the proportion of each emotion in all the text information, the text information of positive attitude, and the text information of negative attitude.

Specifically, an emotion feature training set is constructed in advance and input into a Bayes classifier for training to obtain an emotion classifier; the processed text information is then input into the emotion classifier to analyze the emotion of the text information. As one example, the text information is classified according to its analyzed emotion, the emotion being positive attitude, neutral attitude, or negative attitude, to obtain the text information of positive attitude, the text information of neutral attitude, and the text information of negative attitude, together with the proportion of each emotion in all the text information. As another example, the text information is divided into five categories, "like", "somewhat like", "average", "less like", and "dislike", according to the analyzed emotion, wherein "like" and "somewhat like" are categorized as positive attitude, and "less like" and "dislike" are categorized as negative attitude.
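The text names a Bayes classifier without giving its details. The following is a minimal sketch of one common variant, a multinomial naive Bayes classifier with add-one smoothing, written without external libraries. It assumes the text information has already been segmented into words. The class and function names and the tiny training set are illustrative only, not the applicant's implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


class NaiveBayesEmotion:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples: Sequence[Tuple[List[str], str]]) -> "NaiveBayesEmotion":
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)
        for tokens, label in samples:
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(samples)
        return self

    def predict(self, tokens: Sequence[str]) -> str:
        best_label, best_logp = "", -math.inf
        for label, n in self.class_counts.items():
            logp = math.log(n / self.total)               # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                if token in self.vocab:                   # ignore unseen words
                    logp += math.log((self.word_counts[label][token] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def emotion_proportions(labels: Sequence[str]) -> Dict[str, float]:
    """Share of each emotion among all classified texts."""
    counts = Counter(labels)
    return {label: c / len(labels) for label, c in counts.items()}


# Invented training set (emotion feature training set stand-in).
training = [
    (["love", "great"], "positive"),
    (["happy", "great"], "positive"),
    (["hate", "awful"], "negative"),
    (["bad", "awful"], "negative"),
    (["okay"], "neutral"),
]
model = NaiveBayesEmotion().fit(training)
```

The proportions returned by `emotion_proportions` are the values an emotion distribution map would plot.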

Step S520, generating an emotion distribution map according to the obtained proportion of each emotion in all the text information, so as to show the user the emotion distribution of the comment information.

Step S530, generating a positive attitude word cloud picture according to the obtained text information of positive attitude, so as to display the positive attitude word cloud picture to the user.

Specifically, word frequencies are counted in the obtained text information of positive attitude, and the positive attitude word cloud picture is generated in a preset pattern under the rule that the higher the word frequency, the larger the font, and is then displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted to form the positive attitude word cloud picture.

Step S540, generating a negative attitude word cloud picture according to the obtained text information of negative attitude, so as to display the negative attitude word cloud picture to the user.

Specifically, word frequencies are counted in the obtained text information of negative attitude, and the negative attitude word cloud picture is generated in a preset pattern under the rule that the higher the word frequency, the larger the font, and is then displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted to form the negative attitude word cloud picture.
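The rule that a higher word frequency gives a larger font can be sketched as follows. The linear scaling between a minimum and a maximum point size is an assumption, as are the default sizes; `word_cloud_sizes` is a hypothetical helper that only computes the layout data and does not draw the picture.

```python
from collections import Counter
from typing import Dict, Iterable, List


def word_cloud_sizes(texts: Iterable[List[str]],
                     top_n: int = 50,
                     min_pt: int = 12,
                     max_pt: int = 48) -> Dict[str, int]:
    """Map the top_n most frequent words to font sizes in points."""
    freq = Counter(word for tokens in texts for word in tokens)
    top = freq.most_common(top_n)          # sorted from high to low frequency
    if not top:
        return {}
    highest, lowest = top[0][1], top[-1][1]
    span = highest - lowest
    sizes: Dict[str, int] = {}
    for word, count in top:
        # Linear interpolation; all words get max_pt if frequencies are equal.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


# Two already-segmented texts of the same attitude (invented).
sizes = word_cloud_sizes([["good", "good", "nice"], ["good", "fine", "nice"]], top_n=2)
```

The same helper serves the positive and the negative word cloud; only the input texts differ.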

By the formula [formula not reproduced], adjacent characters C, D, and E of a sentence are assigned to words. The terms of the formula are: the probability that D and E occur adjacently in the pre-trained cluster; the probability that C occurs alone in the pre-trained cluster; the probability that D occurs alone in the pre-trained cluster; the probability that E occurs alone in the pre-trained cluster; and the probability that C and D co-occur in the Chinese text information.

If C is the first character of a sentence in the Chinese text information, then C and D are placed in the same word when the condition on the preset value [condition not reproduced] is met; otherwise C and D are classified as different words. If E is the last character of a sentence in the Chinese text information and the condition on the preset value [condition not reproduced] is met, D and E are divided into different words. If D lies between C and E in a sentence of the Chinese text information, the two characters corresponding to [expression not reproduced] are placed in the same word.


Referring to fig. 6, obtaining the viewpoint statistical graph and the heat viewpoint statistical graph includes the following steps:

step S610, carrying out viewpoint mining on the information bars;

specifically, referring to fig. 7, the viewpoint mining for the information bar includes the following sub-steps:

step S710, sequencing all the information strips according to the release time;

specifically, all the information strips are sorted from earliest to latest according to the release time of the microblog information contained in each information strip.

Step S720, performing data processing on the sorted information strips to remove invalid information, remove redundant information and add missing information, so as to obtain text information corresponding to the information strips.

Specifically, the data processing method for the information bar is the same as the data processing method for the news information or the data processing method for the comment information.

Step S730, extracting keywords from the text information corresponding to the information bar;

specifically, the keywords are extracted from the text information corresponding to the information bars, and in the application, at most 5 keywords are preferably extracted from the text information corresponding to each information bar.

Step S740, performing keyword frequency statistics on the extracted keywords across the text information corresponding to all the information strips, so as to obtain viewpoint keywords according to the keyword frequency statistics;

specifically, keyword frequency statistics are performed on the extracted keywords across the text information corresponding to all the information strips, according to the formula [formula not reproduced], where N is the total number of pieces of text information corresponding to all the information strips, M is the word frequency of a given word across the N pieces of text information, $x_j$ is the value for the word in the text information $X_j$, and n is the number of pieces of text information, among the N, in which the word appears.

The keyword frequencies of the text information corresponding to all the information strips are accumulated, and the keywords whose frequency ranking falls within a preset range are taken as the viewpoint keywords corresponding to the microblog information; in the present application, the top 5 keywords by frequency are preferably taken as the viewpoint keywords.
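The accumulation and ranking step can be sketched as below. Because the statistics formula is not reproduced in this text, the sketch substitutes plain occurrence counts for it, which is an assumption. `viewpoint_keywords` is a hypothetical name, and ties are broken alphabetically only to keep the result deterministic.

```python
from collections import Counter
from typing import List, Sequence


def viewpoint_keywords(keywords_per_text: Sequence[Sequence[str]],
                       texts: Sequence[Sequence[str]],
                       top_k: int = 5) -> List[str]:
    """Accumulate keyword counts over all texts and keep the top_k keywords.

    keywords_per_text[i] holds the keywords extracted from text i;
    texts[i] is the same text as a list of segmented words.
    """
    candidates = {kw for kws in keywords_per_text for kw in kws}
    totals: Counter = Counter()
    for tokens in texts:
        counts = Counter(tokens)
        for kw in candidates:
            totals[kw] += counts[kw]       # accumulate across all texts
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in ranked[:top_k]]


# Invented example: two texts with their extracted keywords.
top = viewpoint_keywords(
    [["policy", "tax"], ["tax", "school"]],
    [["policy", "tax", "tax"], ["tax", "school", "policy"]],
    top_k=2,
)
```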

Step S750, searching the sorted information strips for the viewpoint keywords to obtain the text information corresponding to the information strips containing the viewpoint keywords;

specifically, each viewpoint keyword is searched for in the sorted information strips in order. Once the viewpoint keyword appears in the text information corresponding to an information strip, that information strip is considered matched and the search within it stops. If the number of matched information strips has reached the set upper limit, the whole search for that viewpoint keyword stops; otherwise the search moves on to the next information strip. In the present application, the upper limit of the number of information strips matched to each viewpoint keyword is set to 5. After an information strip is matched to a viewpoint keyword, the text information corresponding to that information strip is acquired for later use in forming the viewpoint statistical graph.
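The bounded search described above can be sketched as follows, assuming a simple substring test decides whether a keyword appears in a text (`match_strips` is a hypothetical name):

```python
from typing import List, Sequence


def match_strips(keyword: str, texts: Sequence[str], limit: int = 5) -> List[str]:
    """Return the texts of the first `limit` strips that contain the keyword.

    `texts` must already be in release-time order, one text per strip.
    """
    matched: List[str] = []
    for text in texts:
        if keyword in text:                # first hit is enough for this strip
            matched.append(text)
            if len(matched) >= limit:      # upper limit reached: stop searching
                break
    return matched


# Invented texts in release-time order.
hits = match_strips("tax", ["a tax cut", "weather", "tax rise", "tax law"], limit=2)
```

Because the strips are visited in time order, the matches kept for each keyword are the earliest ones.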

Step S620, generating a viewpoint statistical graph-text according to the mining viewpoint so as to display the viewpoint statistical graph-text for a user;

specifically, a viewpoint statistical graph is generated according to the obtained viewpoint keywords and the statistical word frequency thereof, and preferably, the text information corresponding to the obtained matching information pieces is also displayed at a position corresponding to the viewpoint statistical graph.

Step S630, performing heat calculation on the generated information strips and obtaining heat keywords;

specifically, referring to fig. 8, performing heat calculation on the information strips to obtain the heat keywords includes the following sub-steps:

step S810, performing heat calculation on each information strip;

specifically, the heat value of each information strip is calculated according to the formula [formula not reproduced].

Step S820, sorting all the information strips by the calculated heat value to obtain heat information strips;

the information strips are sorted by calculated heat value from high to low, and the top few are taken as heat information strips; in the present application, the top 5 information strips are preferably taken as heat information strips.
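The selection in step S820 can be sketched as a sort followed by a cut-off. The per-strip heat formula is not reproduced in this text, so the sketch takes the heat values as given; `heat_strips` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def heat_strips(strips: Sequence[Tuple[str, float]], top_k: int = 5) -> List[str]:
    """Return the texts of the top_k strips, hottest first.

    Each strip is given as (text, heat value); the heat value is assumed
    to have been computed beforehand in step S810.
    """
    ranked = sorted(strips, key=lambda strip: strip[1], reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Invented strips with precomputed heat values.
hottest = heat_strips([("a", 1.0), ("b", 9.0), ("c", 5.0)], top_k=2)
```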

Step S830, data processing is carried out on the sequenced information strips to obtain text information corresponding to the information strips;

specifically, data processing is performed on the sequenced information strips to remove invalid information, remove redundant information, and add missing information, so as to obtain text information corresponding to the information strips. Specifically, the data processing method for the information bar is the same as the data processing method for the news information or the data processing method for the comment information.

Step S840, extracting keywords from the text information corresponding to the heat information strips after data processing;

specifically, the keywords are extracted from the heat degree information pieces after data processing, and in the present application, the upper limit of the number of the keywords extracted from each heat degree information piece is preferably 3.

Step S640, generating a heat viewpoint statistical graph according to the calculated heat values, and displaying the heat keywords on the heat viewpoint statistical graph.

The multidimensional public opinion monitoring system and method of the present application can monitor and analyze public opinion from multiple dimensions and in all respects in the converged-media era, with its rich and diverse content and forms.

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way for clarity only; those skilled in the art should treat it as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims

1. A multidimensional public opinion monitoring system, characterized by comprising: a hotspot discovery module (110), an emotion analysis module (120) and a viewpoint mining module (130);

wherein the hotspot discovery module (110) comprises: a first information acquisition sub-module (111), a first data processing sub-module (112), a word frequency statistics sub-module (113), a hot word cloud generation sub-module (114), a hot news generation sub-module (115), a hot microblog information acquisition sub-module (116) and a hot dynamic graph generation sub-module (117);

the first information acquisition submodule (111) acquires news information of a news website;

the first data processing submodule (112) performs data processing on the news information acquired by the first information acquisition submodule (111) to remove invalid information, remove redundant information and add missing information to obtain text information;

the word frequency statistic submodule (113) carries out word frequency statistics on the text information obtained by the processing of the first data processing submodule (112);

the hot word cloud generation submodule (114) obtains hot words according to the word frequencies counted in the text information by the word frequency statistics submodule (113), and generates a hot word cloud picture according to the hot words;

the hot news generation submodule (115) displays, according to the hot words found by the hot word cloud generation submodule (114), the news titles and URLs corresponding to the hot words as hot news;

the hot microblog information acquisition sub-module (116) takes the hot words found by the hot word cloud generation sub-module (114) as keywords and acquires microblog information containing the keywords;

the hot spot dynamic graph generating sub-module (117) preprocesses the microblog information acquired by the hot spot microblog information acquisition sub-module (116), and generates a hot spot dynamic graph according to the preprocessed information;

the hot spot dynamic graph generation sub-module (117) sequences all the information strips formed after the preprocessing according to the release time;

grouping all the information strips in the sequence to form a plurality of information strip groups;

one information strip group is $[x_{11}, x_{21}, \ldots, x_{n1}]$, wherein $x_{11}, x_{21}, \ldots, x_{n1}$ are the information strips in the information strip group, and the information strip group has $n$ information strips;

calculating the heat of each information bar group;

calculating the emotion of each information strip group;

collecting microblog information directly or indirectly linked with the microblog information corresponding to each information strip in the information strip group, and forming the linked microblog information into information strips, wherein the information strips linked with information strip $x_{i1}$ are $x_{i2}, x_{i3}, \ldots, x_{im}$, $m-1$ is the number of linked information strips, and information strip $x_{i1}$ together with its linked information strips $x_{i2}, x_{i3}, \ldots, x_{im}$ constructs an information strip set $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$;

extracting an association weight between the collected microblog information and the linked microblog information, wherein each information strip set $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$ has a corresponding associated weight set $Y_i = \{y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}\}$, $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}$ are the associated weights corresponding to $x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}$, and $y_{i1} = 1$;

constructing an information strip group set $S_n = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ from the information strip sets $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$ and the corresponding associated weight sets $Y_i = \{y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}\}$;

inputting the information strip group set into a preset classification model and training the classification model to obtain different sub-classification models $f_t(a)$, wherein $t = 1, 2, 3, \ldots, T$; classifying the information strip group set with the sub-classification models $f_t(a)$, wherein $t = 1, 2, 3, \ldots, T$, to obtain a classification result; and obtaining, through the formula [formula not reproduced], $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$, wherein argmin denotes the set of $\mu$ for which [expression not reproduced] takes its minimum value;

calculating, by a particle swarm optimization algorithm, the optimal value of each weight in the weight set $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$ of the sub-classification models $f_t(a)$, and combining the sub-classification models $\{f_1(X), f_2(X), \ldots, f_T(X)\}$ with the normalized optimal values of their corresponding weights $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$ to obtain a classification model;

acquiring an information strip group set, inputting the information strip groups into the obtained classification model, and classifying each information strip group to obtain emotion words of different categories;

calculating the emotion of each information strip group according to the formula [formula not reproduced], wherein F is the emotion of each information strip group, $N_p$ is the number of positive-emotion words in the information strip group obtained by classification, $N_n$ is the number of negative-emotion words in the information strip group obtained by classification, $wp_i$ is the weight of a positive-emotion word in the emotion dictionary, and $wp_j$ is the weight of a negative-emotion word in the emotion dictionary;

generating a hot degree dynamic graph of the hot spot dynamic graph according to the release time of each information strip group and the hot degree of each information strip group;

generating an emotion distribution dynamic graph of the hot spot dynamic graph according to the issuing time of each information strip group and the emotion of each information strip group;

the emotion analysis module (120) includes: a second information acquisition sub-module (121), a second data processing sub-module (122), an emotion analysis sub-module (123), an emotion distribution diagram sub-module (124), a positive attitude word cloud generation sub-module (125) and a negative attitude word cloud generation sub-module (126);

a second information acquisition sub-module (121) acquires comment information posted on websites and forums;

the second data processing submodule (122) performs data processing on the comment information acquired by the second information acquisition submodule (121) to remove invalid information, remove redundant information and add missing information to obtain text information;

the emotion analysis submodule (123) is used for carrying out emotion analysis on the text information obtained by processing of the second data processing submodule (122) so as to obtain the proportion of each emotion in all the text information, the text information with positive attitude and the text information with negative attitude;

the emotion distribution map submodule (124) generates an emotion distribution map according to the proportion of each emotion in all the text information obtained by the emotion analysis submodule (123);

the positive attitude word cloud generation submodule (125) generates a positive attitude word cloud picture according to the text information of positive attitude obtained by the emotion analysis submodule (123);

the negative attitude word cloud generating submodule (126) generates a negative attitude word cloud picture according to the text information of the negative attitude obtained by the emotion analyzing submodule (123);

the viewpoint mining module (130) includes: a third information acquisition sub-module (131), a third data processing sub-module (132), a viewpoint mining sub-module (133), a viewpoint statistical image-text generation sub-module (134), a hot viewpoint mining sub-module (135) and a hot statistical image-text generation sub-module (136);

the third information acquisition submodule (131) acquires microblog information published on microblogs;

the third data processing submodule (132) preprocesses the microblog information acquired by the third information acquisition submodule (131) to generate information strips;

the viewpoint mining submodule (133) performs viewpoint mining on the information strips obtained by processing of the third data processing submodule (132);

a viewpoint statistical image-text generation submodule (134) generates a viewpoint statistical image-text according to the viewpoint mined by the viewpoint mining submodule (133);

the heat viewpoint mining submodule (135) carries out heat calculation according to the information bar generated by the third data processing submodule (132) and obtains a heat keyword;

the heat statistical image-text generation submodule (136) generates a heat viewpoint statistical graph according to the heat values calculated by the heat viewpoint mining submodule (135), and displays the heat keywords on the heat viewpoint statistical graph.

2. The multidimensional public opinion monitoring system according to claim 1, wherein the viewpoint mining submodule (133) sorts all the information strips according to release time; performs data processing on the sorted information strips to remove invalid information, remove redundant information and add missing information, so as to obtain text information corresponding to the information strips; performs keyword extraction on the text information corresponding to the information strips; performs keyword frequency statistics on the extracted keywords in the text information corresponding to all the information strips, so as to obtain viewpoint keywords according to the keyword frequency statistics; and searches the sorted information strips for the viewpoint keywords to obtain the text information corresponding to the information strips containing the viewpoint keywords, thereby performing viewpoint mining on the information strips.

3. A multidimensional public opinion monitoring method is characterized by comprising the following steps:

step S230, processing the text information or the information strip to obtain a hot word cloud picture, hot news, a hot dynamic picture, an emotion distribution map, a positive attitude word cloud picture, a negative attitude word cloud picture, an opinion statistic graph and a hot opinion statistic picture, and displaying the hot word cloud picture, the hot news, the hot animation and the negative attitude word cloud picture to a user;

obtaining a hot word cloud picture, hot news and a hot dynamic picture, and concretely comprising the following substeps:

step S232, obtaining hot words according to the word frequency in the text information, and generating a hot word cloud picture according to the hot words;

step S233, displaying the news title and the URL corresponding to the hot word as hot news;

step S234, taking the hot word as a keyword, and collecting microblog information containing the keyword;

generating a hot spot dynamic graph according to the preprocessed information, comprising the following substeps:

one information strip group is $[x_{11}, x_{21}, \ldots, x_{n1}]$, wherein $x_{11}, x_{21}, \ldots, x_{n1}$ are the information strips in the information strip group, and the information strip group has $n$ information strips;

step S430, calculating the heat of each information bar group;

step S440, calculating the emotion of each information strip group;

extracting an association weight between the collected microblog information and the linked microblog information, wherein each information strip set $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$ has a corresponding associated weight set $Y_i = \{y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}\}$, $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}$ are the associated weights corresponding to $x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}$, and $y_{i1} = 1$;

obtaining $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$, wherein argmin denotes the set of $\mu$ for which [expression not reproduced] takes its minimum value;

calculating, by a particle swarm optimization algorithm, the optimal value of each weight in the weight set $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$ of the sub-classification models $f_t(a)$, and combining the sub-classification models $\{f_1(X), f_2(X), \ldots, f_T(X)\}$ with the normalized optimal values of their corresponding weights $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$ to obtain a classification model;

according to the formula

step S450, generating a heat dynamic graph of the hot spot dynamic graph according to the release time of each information strip group and the heat of each information strip group;

and step S460, generating an emotion distribution dynamic graph of the hotspot dynamic graph according to the release time of each information strip group and the emotion of each information strip group.

4. The multidimensional public opinion monitoring method according to claim 3, wherein the emotion distribution map, the positive attitude word cloud picture and the negative attitude word cloud picture are obtained by the following steps:

step S510, emotion analysis is carried out on the text information to obtain the proportion of each emotion, the text information of the positive attitude and the text information of the negative attitude in all the text information;

step S520, generating an emotion distribution diagram according to the proportion of each emotion in all the obtained text information;

step S530, generating a positive attitude word cloud picture according to the obtained text information of positive attitude;

and S540, generating a negative attitude word cloud picture according to the obtained text information of the negative attitude.

5. The method for monitoring multidimensional public opinion of claim 3, wherein the viewpoint statistical graph and the popularity viewpoint statistical graph are obtained by the following steps:

step S610, carrying out viewpoint mining on the information bars;

step S620, generating viewpoint statistical graphics and texts according to the mining viewpoint;

and step S640, generating a heat viewpoint statistical chart according to the calculated heat value, and displaying a heat keyword on the heat viewpoint statistical chart.

6. The method for monitoring multidimensional public opinion of claim 5, wherein the viewpoint mining of information strips comprises the following substeps:

step S710, sequencing all the information strips according to the release time;

step S720, performing data processing on the sorted information strips to remove invalid information, remove redundant information and add missing information, so as to obtain text information corresponding to the information strips;

step S750, searching the sorted information items for a viewpoint keyword to obtain text information corresponding to the information item containing the viewpoint keyword.

7. The method for monitoring multidimensional public opinion as claimed in claim 5, wherein the calculating of popularity according to the information bars to obtain the popularity keyword comprises the following substeps:

step S810, performing heat calculation on each information bar;

and step S840, extracting keywords from the text information corresponding to the heat information strips after data processing.