CN112417253B - Multi-dimensional public opinion monitoring system and method - Google Patents

Info

Publication number
CN112417253B
Authority
CN
China
Prior art keywords
information
hot
emotion
module
text
Legal status
Expired - Fee Related
Application number
CN202011573978.9A
Other languages
Chinese (zh)
Other versions
CN112417253A (en)
Inventor
王三山
付巍
张瑜
Current Assignee
Wang Sanshan
Original Assignee
Time Know Beijing Culture Technology Co ltd
Application filed by Time Know Beijing Culture Technology Co ltd
Priority to CN202011573978.9A
Publication of CN112417253A
Application granted
Publication of CN112417253B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F16/9538 Presentation of query results
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The application relates to the field of big data processing technology, and in particular to a multidimensional public opinion monitoring system and method. The multidimensional public opinion monitoring system comprises a hotspot discovery module, an emotion analysis module and a viewpoint mining module, and can monitor and analyze public opinion comprehensively and from multiple dimensions in the era of media convergence, with its rich and varied content and forms.

Description

Multi-dimensional public opinion monitoring system and method
Technical Field
The application relates to the technical field of big data processing, in particular to a multidimensional public opinion monitoring system and method.
Background
"Public sentiment" refers to the attitudes or emotions of the public; when such sentiment gathers to a certain extent, it becomes public opinion. Public opinion monitoring is therefore very important for media regulators, public opinion guides and content producers, such as mainstream media represented by radio and television, and distribution platforms with great social influence.
Media development has progressed from the convergence of traditional and emerging media to convergence enabled by new-generation information technologies such as big data and artificial intelligence, ushering in the smart media era. In this era, public opinion monitoring must process converged media whose content and forms are rich and varied. Existing public opinion monitoring technology, however, offers limited functionality: it usually collects and separately analyzes only a few specific information sources, cannot link information sources or functions together, lacks systematic integration, and struggles to analyze and monitor public opinion events comprehensively.
Therefore, how to comprehensively analyze and monitor public opinion events in the era of media convergence, with its rich and varied content and forms, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a multidimensional public opinion monitoring system and a multidimensional public opinion monitoring method for comprehensively analyzing and monitoring public opinion events in the era of media convergence, with its rich and varied content and forms.
In order to solve the technical problem, the application provides the following technical scheme:
A multidimensional public opinion monitoring system comprises a hotspot discovery module, an emotion analysis module and a viewpoint mining module.
The hotspot discovery module comprises a first information acquisition sub-module, a first data processing sub-module, a word frequency statistics sub-module, a hot word cloud generation sub-module, a hot news generation sub-module, a hot microblog information acquisition sub-module and a hotspot dynamic graph generation sub-module. The first information acquisition sub-module acquires news information from news websites. The first data processing sub-module processes the news information acquired by the first information acquisition sub-module, removing invalid information, removing redundant information and adding missing information, to obtain text information. The word frequency statistics sub-module performs word frequency statistics on the text information obtained by the first data processing sub-module. The hot word cloud generation sub-module obtains hot words according to the word frequencies computed by the word frequency statistics sub-module and generates a hot word cloud picture from the hot words. The hot news generation sub-module takes the hot words found by the hot word cloud generation sub-module and displays the news titles and URLs corresponding to those hot words as hot news. The hot microblog information acquisition sub-module uses the hot words found by the hot word cloud generation sub-module as keywords and acquires microblog information containing those keywords. The hotspot dynamic graph generation sub-module preprocesses the microblog information acquired by the hot microblog information acquisition sub-module and generates a hotspot dynamic graph from the preprocessed information.
The emotion analysis module comprises a second information acquisition sub-module, a second data processing sub-module, an emotion analysis sub-module, an emotion distribution map sub-module, a positive attitude word cloud generation sub-module and a negative attitude word cloud generation sub-module. The second information acquisition sub-module acquires comment information published on websites and forums. The second data processing sub-module processes the comment information acquired by the second information acquisition sub-module, removing invalid information, removing redundant information and adding missing information, to obtain text information. The emotion analysis sub-module performs emotion analysis on the text information obtained by the second data processing sub-module to obtain the proportion of each emotion across all the text information, together with the positive-attitude text information and the negative-attitude text information. The emotion distribution map sub-module generates an emotion distribution map from the proportion of each emotion obtained by the emotion analysis sub-module. The positive attitude word cloud generation sub-module generates a positive attitude word cloud picture from the positive-attitude text information, and the negative attitude word cloud generation sub-module generates a negative attitude word cloud picture from the negative-attitude text information, both obtained by the emotion analysis sub-module.
The viewpoint mining module comprises a third information acquisition sub-module, a third data processing sub-module, a viewpoint mining sub-module, a viewpoint statistical image-text generation sub-module, a heat viewpoint mining sub-module and a heat statistical image-text generation sub-module. The third information acquisition sub-module acquires microblog information published on microblogs. The third data processing sub-module preprocesses the microblog information acquired by the third information acquisition sub-module to generate information items. The viewpoint mining sub-module performs viewpoint mining on the information items produced by the third data processing sub-module. The viewpoint statistical image-text generation sub-module generates viewpoint statistics in image and text form from the viewpoints mined by the viewpoint mining sub-module. The heat viewpoint mining sub-module calculates heat from the information items generated by the third data processing sub-module and obtains heat keywords. The heat statistical image-text generation sub-module generates a heat viewpoint statistical graph from the heat values calculated by the heat viewpoint mining sub-module and displays the heat keywords on that graph.
In the multidimensional public opinion monitoring system described above, preferably, the hotspot dynamic graph generation sub-module sorts all the information items formed by preprocessing according to their publishing time; groups the sorted information items into a plurality of information item groups; calculates the heat of each information item group; calculates the emotion of each information item group; generates the heat dynamic graph of the hotspot dynamic graph from the publishing time and heat of each information item group; and generates the emotion distribution dynamic graph of the hotspot dynamic graph from the publishing time and emotion of each information item group.
In the multidimensional public opinion monitoring system described above, preferably, the viewpoint mining sub-module (133) sorts all the information items by publishing time; processes the sorted information items to remove invalid information, remove redundant information and add missing information, obtaining the text information corresponding to each information item, and extracts keywords from that text information; performs keyword frequency statistics on the keywords extracted from the text information of all the information items and obtains viewpoint keywords according to the keyword frequencies; and searches the sorted information items for the viewpoint keywords to obtain the text information of the information items containing them, thereby performing viewpoint mining on the information items.
A multidimensional public opinion monitoring method comprises the following steps: step S210, collecting news information, comment information and microblog information respectively; step S220, processing the collected news information, comment information and microblog information to obtain text information or information items; and step S230, processing the text information or information items to obtain a hot word cloud picture, hot news, a hotspot dynamic graph, an emotion distribution map, a positive attitude word cloud picture, a negative attitude word cloud picture, a viewpoint statistical graph and a heat viewpoint statistical graph, and displaying the hot word cloud picture, the hot news, the hotspot dynamic graph and the emotion distribution map to the user.
Preferably, in the multidimensional public opinion monitoring method, obtaining the hot word cloud picture, the hot news and the hotspot dynamic graph specifically comprises the following substeps: step S231, performing word frequency statistics on the obtained text information; step S232, obtaining hot words according to the word frequencies in the text information and generating a hot word cloud picture from the hot words; step S233, displaying the news titles and URLs corresponding to the hot words as hot news; step S234, using the hot words as keywords and collecting microblog information containing those keywords; and step S235, preprocessing the collected microblog information and generating a hotspot dynamic graph from the preprocessed information.
Preferably, in the multidimensional public opinion monitoring method, generating the hotspot dynamic graph from the preprocessed information includes the following substeps: step S410, sorting all the information items formed by preprocessing according to publishing time; step S420, grouping the sorted information items into a plurality of information item groups; step S430, calculating the heat of each information item group; step S440, calculating the emotion of each information item group; step S450, generating the heat dynamic graph of the hotspot dynamic graph from the publishing time and heat of each information item group; and step S460, generating the emotion distribution dynamic graph of the hotspot dynamic graph from the publishing time and emotion of each information item group.
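The patent does not say how information items are grouped or how a group's heat and emotion are computed. The sketch below fills those gaps with stated assumptions, all hypothetical: each item carries a publish timestamp, a per-item heat value and an emotion score in [-1, 1]; groups are fixed-length time windows (`window_seconds`); group heat is the sum of item heat and group emotion is the mean item emotion. The two returned series are what steps S450 and S460 would plot.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class InfoItem:
    publish_time: int   # Unix timestamp in seconds (assumed field)
    heat: float         # hypothetical per-item heat, e.g. reposts + comments + likes
    emotion: float      # hypothetical emotion score in [-1, 1]


def hotspot_series(items: List[InfoItem], window_seconds: int = 3600
                   ) -> Tuple[List[Tuple[int, float]], List[Tuple[int, float]]]:
    """Return (heat_series, emotion_series), one point per time-window group."""
    # S410: sort all information items by publishing time
    ordered = sorted(items, key=lambda it: it.publish_time)

    # S420: group sorted items into fixed-length time windows (assumed grouping rule)
    groups: Dict[int, List[InfoItem]] = {}
    for it in ordered:
        start = it.publish_time - it.publish_time % window_seconds
        groups.setdefault(start, []).append(it)

    heat_series, emotion_series = [], []
    for start in sorted(groups):
        group = groups[start]
        # S430: group heat as the sum of item heat (assumption)
        heat_series.append((start, sum(it.heat for it in group)))
        # S440: group emotion as the mean item emotion (assumption)
        emotion_series.append((start, sum(it.emotion for it in group) / len(group)))
    # S450/S460: these series feed the heat and emotion-distribution dynamic graphs
    return heat_series, emotion_series
```

Fixed windows are only one reasonable grouping; equal-count groups would fit the text equally well.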
Preferably, in the multidimensional public opinion monitoring method, obtaining the emotion distribution map, the positive attitude word cloud picture and the negative attitude word cloud picture specifically comprises the following substeps: step S510, performing emotion analysis on the text information to obtain the proportion of each emotion across all the text information, together with the positive-attitude text information and the negative-attitude text information; step S520, generating an emotion distribution map from the proportion of each emotion; step S530, generating a positive attitude word cloud picture from the positive-attitude text information; and step S540, generating a negative attitude word cloud picture from the negative-attitude text information.
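The patent names neither an emotion classifier nor a label set. As a minimal sketch of steps S510 to S540, assume each text has already been labeled by some upstream classifier and that a hypothetical mapping assigns each label to a positive, negative or neutral attitude. The function returns the emotion proportions for the distribution map and the two text lists that would feed the word clouds.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical label-to-attitude mapping; the patent names no label set.
ATTITUDE = {"joy": "positive", "trust": "positive",
            "anger": "negative", "sadness": "negative", "fear": "negative",
            "neutral": "neutral"}


def emotion_summary(labeled: List[Tuple[str, str]]
                    ) -> Tuple[Dict[str, float], List[str], List[str]]:
    """labeled: (text, emotion_label) pairs from an assumed upstream classifier.

    Returns the share of each emotion (input to the emotion distribution map)
    plus the positive- and negative-attitude texts (input to the word clouds).
    """
    counts = Counter(label for _, label in labeled)
    total = sum(counts.values())
    proportions = {label: n / total for label, n in counts.items()} if total else {}
    positive = [t for t, label in labeled if ATTITUDE.get(label) == "positive"]
    negative = [t for t, label in labeled if ATTITUDE.get(label) == "negative"]
    return proportions, positive, negative
```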
In the multidimensional public opinion monitoring method described above, preferably, obtaining the viewpoint statistical graph and the heat viewpoint statistical graph comprises the following substeps: step S610, performing viewpoint mining on the information items; step S630, calculating heat from the generated information items and obtaining heat keywords; and step S640, generating a heat viewpoint statistical graph from the calculated heat values and displaying the heat keywords on it.
In the multidimensional public opinion monitoring method described above, preferably, performing viewpoint mining on the information items includes the following substeps: step S710, sorting all the information items by publishing time; step S720, processing the sorted information items to remove invalid information, remove redundant information and add missing information, obtaining the text information corresponding to each information item; step S730, extracting keywords from the text information corresponding to each information item; step S740, performing keyword frequency statistics on the keywords extracted from the text information of all the information items to obtain viewpoint keywords according to the keyword frequencies; and step S750, searching the sorted information items for the viewpoint keywords to obtain the text information of the information items containing them.
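Steps S710 to S750 can be sketched as below. The patent does not specify the keyword extractor or how many viewpoint keywords to keep, so this assumes items arrive already cleaned and segmented into tokens, uses a hypothetical stopword filter as a stand-in for keyword extraction, and keeps the `top_k` most frequent keywords (`top_k` is an assumed parameter).

```python
from collections import Counter
from typing import Dict, List, Set

STOPWORDS: Set[str] = {"the", "a", "is", "of", "and", "to"}  # hypothetical stopword list


def extract_keywords(tokens: List[str]) -> List[str]:
    # S730 stand-in: keep non-stopword tokens longer than one character, once per item
    seen, out = set(), []
    for tok in tokens:
        if tok not in STOPWORDS and len(tok) > 1 and tok not in seen:
            seen.add(tok)
            out.append(tok)
    return out


def mine_viewpoints(items: List[Dict], top_k: int = 3) -> Dict[str, List[str]]:
    """items: dicts with 'time' (sortable) and 'tokens' (already cleaned and segmented)."""
    ordered = sorted(items, key=lambda it: it["time"])             # S710
    keywords = [extract_keywords(it["tokens"]) for it in ordered]  # S720-S730
    freq = Counter(k for kws in keywords for k in kws)             # S740
    viewpoint_kws = [k for k, _ in freq.most_common(top_k)]
    # S750: for each viewpoint keyword, collect the texts of the items containing it
    return {vk: [" ".join(it["tokens"]) for it, kws in zip(ordered, keywords) if vk in kws]
            for vk in viewpoint_kws}
```

Counting each keyword once per item approximates "how many posts share this viewpoint" rather than raw repetition; counting every occurrence is an equally valid reading.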
Preferably, in the multidimensional public opinion monitoring method, calculating heat from the information items to obtain heat keywords comprises the following substeps: step S810, calculating the heat of each information item; step S820, sorting all the information items by the calculated heat values to obtain the heat information items; step S830, processing the sorted information items to obtain the text information corresponding to each of them; and step S840, extracting keywords from the processed text information corresponding to the heat information items.
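The heat formula for an information item is not given in the patent. This sketch assumes a hypothetical weighted count of reposts, comments and likes, keeps the `top_items` hottest items, and takes their most frequent multi-character tokens as heat keywords (`WEIGHTS`, `top_items` and `top_words` are all assumptions).

```python
from collections import Counter
from typing import Dict, List

# Hypothetical weights; the patent does not define the heat formula.
WEIGHTS = {"reposts": 2.0, "comments": 1.5, "likes": 1.0}


def item_heat(item: Dict) -> float:
    # S810: heat of one information item as a weighted interaction count (assumption)
    return sum(w * item.get(field, 0) for field, w in WEIGHTS.items())


def heat_keywords(items: List[Dict], top_items: int = 2, top_words: int = 3) -> List[str]:
    """items: dicts with interaction counts and 'tokens' (already cleaned and segmented)."""
    ranked = sorted(items, key=item_heat, reverse=True)  # S820: sort by heat
    hot = ranked[:top_items]                             # the heat information items
    # S830-S840 stand-in: most frequent multi-character tokens in the hot items
    freq = Counter(tok for it in hot for tok in it["tokens"] if len(tok) > 1)
    return [tok for tok, _ in freq.most_common(top_words)]
```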
Compared with the prior art, the multidimensional public opinion monitoring system and method provided by the present application can monitor and analyze public opinion comprehensively and from multiple dimensions in the era of media convergence, with its rich and varied content and forms.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them.
Fig. 1 is a schematic diagram of a multidimensional public opinion monitoring system according to an embodiment of the present application;
Fig. 2 is a flowchart of a multidimensional public opinion monitoring method according to the second embodiment of the present application;
Fig. 3 is a flowchart of obtaining a hot word cloud picture, hot news and a hotspot dynamic graph according to the second embodiment of the present application;
Fig. 4 is a flowchart of generating a hotspot dynamic graph according to the second embodiment of the present application;
Fig. 5 is a flowchart of obtaining an emotion distribution map, a positive attitude word cloud picture and a negative attitude word cloud picture according to the second embodiment of the present application;
Fig. 6 is a flowchart of obtaining a viewpoint statistical graph and a heat viewpoint statistical graph according to the second embodiment of the present application;
Fig. 7 is a flowchart of performing viewpoint mining on information items according to the second embodiment of the present application;
Fig. 8 is a flowchart of obtaining heat keywords from the heat of information items according to the second embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
Example one
Referring to Fig. 1, which is a schematic diagram of a multidimensional public opinion monitoring system according to an embodiment of the present application, the application provides a multidimensional public opinion monitoring system including a hotspot discovery module 110, a sentiment analysis module 120 and an opinion mining module 130.
The hot spot discovery module 110 is configured to process the news information to discover hot spot information in the news information, and may also display the discovered hot spot information in the form of a hot spot word cloud picture, hot spot news, and a hot spot dynamic picture.
The emotion analysis module 120 is configured to process the comment information to analyze emotion information included in the comment information, and may further display the analyzed emotion information in the form of an emotion distribution map, a positive attitude word cloud picture, and a negative attitude word cloud picture.
The viewpoint mining module 130 is configured to process the microblog information to mine viewpoint information included in the microblog information, and may further display the mined viewpoint information in the form of a viewpoint statistical graph and a popularity viewpoint statistical graph.
Specifically, the hotspot discovery module 110 includes an information acquisition sub-module 111, a data processing sub-module 112, a word frequency statistics sub-module 113, a hot word cloud generation sub-module 114, a hot news generation sub-module 115, a hot microblog information acquisition sub-module 116 and a hotspot dynamic graph generation sub-module 117.
The information collecting sub-module 111 collects news information of news websites, preferably news headlines of the news websites, and obtains news information from the news headlines.
Specifically, the information acquisition sub-module 111 sends an acquisition request to the server hosting the news webpage from which news information is to be collected. The acquisition request specifies the types of variables to be collected, for example news headlines and the URLs of news pages.
After determining that the acquisition request is valid, the server hosting the news webpage sends the requested news information to the information acquisition sub-module 111 and, together with it, returns the web addresses of the other webpages linked from that news webpage.
After receiving the news information, the information acquisition sub-module 111 sends further acquisition requests, according to a predetermined strategy, to the servers of all the web addresses linked from the news webpage, until news information has been collected from every webpage linked from it.
One predetermined strategy starts from the current news webpage and follows a single chain of links, collecting news information from each webpage along that chain in turn (essentially a depth-first traversal). Alternatively, the predetermined strategy may first collect news information from all the webpages linked from the current news webpage, then select one of those linked webpages and collect news information from all the webpages it links to (essentially a breadth-first traversal).
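The two strategies read like depth-first and breadth-first traversal of the link graph. A minimal sketch, assuming the link addresses returned by each server have already been gathered into an in-memory map (no network I/O; `crawl_order` and the page names are hypothetical), shows the visiting order each strategy produces:

```python
from collections import deque
from typing import Dict, List


def crawl_order(links: Dict[str, List[str]], start: str, breadth_first: bool = True) -> List[str]:
    """Return the order in which pages would be collected.

    links maps a page URL to the URLs it links to; it stands in for the
    addresses the server returns with each response (an assumption).
    """
    visited, order = set(), []
    frontier = deque([start])
    while frontier:
        # Breadth-first takes the oldest queued page; depth-first takes the newest
        page = frontier.popleft() if breadth_first else frontier.pop()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)  # news information would be collected from this page here
        children = links.get(page, [])
        # For depth-first, push children in reverse so the first link is explored first
        for child in (children if breadth_first else reversed(children)):
            if child not in visited:
                frontier.append(child)
    return order
```

Marking pages as visited when they are taken from the frontier (rather than when queued) keeps the depth-first variant equal to a recursive pre-order traversal.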
The information collecting submodule 111 analyzes the collected news information and stores the analyzed news information into a news database so as to process the collected news information in the next step.
On this basis, the information acquisition sub-module 111 can collect news webpages in real time, or obtain news information by reading offline text of news webpages. After collecting news information, the information acquisition sub-module 111 stores it, so that the stored news information can be provided as offline data to the information acquisition sub-module 121 of the sentiment analysis module 120 and/or the information acquisition sub-module 131 of the opinion mining module 130. Likewise, comment information collected by the information acquisition sub-module 121 of the sentiment analysis module 120 can be provided as offline data to the information acquisition sub-module 111 of the hotspot discovery module 110 and/or the information acquisition sub-module 131 of the opinion mining module 130, and microblog information collected by the information acquisition sub-module 131 of the opinion mining module 130 can be provided as offline data to the information acquisition sub-module 111 of the hotspot discovery module 110 and/or the information acquisition sub-module 121 of the sentiment analysis module 120. Offline data can thus be exchanged among the hotspot discovery module 110, the sentiment analysis module 120 and the opinion mining module 130.
The data processing sub-module 112 performs data processing on the news information acquired by the information acquisition sub-module 111 to remove invalid information, remove redundant information, and add missing information, thereby obtaining text information.
Specifically, the collected news information is checked for unreasonable information (such as inconsistent information or information contrary to fact); such unreasonable information is treated as invalid information and deleted from the news information.
The collected news information is also traversed to find positions where information is missing, and the missing information is added at those positions. Specifically, using a keyword in a data field that has missing information, a data field matching that keyword is searched for among all the news information, and the matching data field is used to complete the news information with missing data. If several matching data fields are found, a record is selected in timestamp order to complete the missing data.
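A minimal sketch of this completion step, under assumptions: records are dicts sharing a keyword field, a missing value is `None`, and "selected in timestamp order" is read as "take the most recent matching record". That last choice is an interpretation, not something the patent states, and `fill_missing` is a hypothetical name.

```python
from typing import Dict, List


def fill_missing(records: List[Dict], key_field: str, target_field: str) -> List[Dict]:
    """Fill target_field where it is None, using other records that share key_field.

    Assumption: when several candidates match, the one with the latest
    'timestamp' is used.
    """
    filled = []
    for rec in records:
        rec = dict(rec)  # work on a copy so the caller's data is not mutated
        if rec.get(target_field) is None:
            # Candidate records: same keyword, and the target field is present
            candidates = [r for r in records
                          if r.get(key_field) == rec.get(key_field)
                          and r.get(target_field) is not None]
            if candidates:
                best = max(candidates, key=lambda r: r["timestamp"])
                rec[target_field] = best[target_field]
        filled.append(rec)
    return filled
```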
The collected news information is further traversed to find redundant information, which is then deleted. Specifically, the similarity between every two pieces of news information is calculated; if the calculated similarity S is smaller than a preset threshold TS, the two pieces are considered to record duplicate data and one of them is deleted, preferably the one with lower reliability.
The preset threshold TS can be set by the user as required.
[Formula for the similarity S: image not available in this copy]
where S is the similarity between the first news information A and the second news information B, Ai is the weight of the i-th word in A, Bi is the weight of the i-th word in B, and A and B each contain a words.
If the two pieces of news information contain different numbers of words, their word counts are first made equal, for example by removing the lowest-weight words from the piece with more words until both pieces contain the same number of words.
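The formula for S is missing from this copy, and the text treats S < TS as a duplicate, so S behaves like a distance. The sketch below therefore uses Euclidean distance between the aligned word-weight lists as a stand-in (an assumption, not the patent's formula), equalizes lengths by dropping the lowest-weight words as described, and deletes the less reliable item of each duplicate pair. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def truncate(weights: List[float], n: int) -> List[float]:
    """Drop the lowest-weight words until n remain, keeping the original word order."""
    if len(weights) <= n:
        return list(weights)
    keep = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:n]
    return [weights[i] for i in sorted(keep)]


def score(a: List[float], b: List[float]) -> float:
    # Stand-in for the unavailable formula: Euclidean distance over aligned weights,
    # so a smaller S means more alike, matching the "S < TS means duplicate" rule.
    n = min(len(a), len(b))
    a, b = truncate(a, n), truncate(b, n)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def deduplicate(items: List[Tuple[str, List[float], float]], ts: float) -> List[str]:
    """items: (id, word weights, reliability). Returns the ids that are kept."""
    removed = set()
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i][0] in removed or items[j][0] in removed:
                continue
            if score(items[i][1], items[j][1]) < ts:
                # Delete the less reliable of the two duplicates
                loser = items[i] if items[i][2] < items[j][2] else items[j]
                removed.add(loser[0])
    return [it[0] for it in items if it[0] not in removed]
```

If the original S is a true similarity (larger means more alike), only the comparison direction in `deduplicate` would change.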
The word frequency statistic submodule 113 performs word frequency statistics on the text information processed by the data processing submodule 112.
If the text information is mainly English, word frequency statistics are computed directly. If it is mainly Chinese, the Chinese characters are first segmented into words, and word frequency statistics are then computed on the segmented Chinese text.
Specifically, the probability of occurrence of each adjacent two chinese characters in the chinese text information is calculated, for example: the probability of the common occurrence of the adjacent Chinese characters C and D in the Chinese text information, and the probability of the common occurrence of the adjacent Chinese characters D and E in the Chinese text information.
By the formula
Figure 187202DEST_PATH_IMAGE002
Calculating the probability of the common occurrence of adjacent Chinese characters C and D in the Chinese text information, and obtaining the probability of the common occurrence of adjacent Chinese characters C and D in the Chinese text information through a formula
Figure 498098DEST_PATH_IMAGE003
Calculating the probability of the co-occurrence of the adjacent Chinese characters D and E in the Chinese text information, wherein,
Figure 158886DEST_PATH_IMAGE004
to be the probability of C and D co-occurring adjacently in the pre-trained cluster,
Figure 774192DEST_PATH_IMAGE005
to be the probability that D and E co-occur adjacently in the pre-trained cluster,
Figure 862234DEST_PATH_IMAGE006
is the probability that C occurs alone in the pre-trained cluster,
Figure 660426DEST_PATH_IMAGE007
is the probability that D occurs alone in the pre-trained cluster,
Figure 124905DEST_PATH_IMAGE008
for the probability of E occurring alone in the pre-trained cluster,
Figure 324942DEST_PATH_IMAGE009
as the probability of co-occurrence of C and D in the chinese text information,
Figure 600197DEST_PATH_IMAGE010
in the present application, the pre-trained cluster is a data set composed of a pre-collected number (for example, 1000 pieces) of Chinese text information, wherein the probability that D and E co-occur in the Chinese text information is shown.
If C is the first character of a sentence in the Chinese text information, C and D are assigned to the same word when their co-occurrence probability satisfies the condition set by a preset value, and to different words otherwise. If E is the last character of a sentence, D and E are assigned to the same word when their co-occurrence probability satisfies the condition set by a preset value, and to different words otherwise. If D is a character in the middle of a sentence, the two characters selected by the corresponding comparison (formula image not reproduced) are assigned to the same word.
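As a rough illustration of this adjacent-character approach, the sketch below scores each adjacent character pair against counts from a small corpus and joins the pair into one word when the score reaches a preset threshold. Because the original formula is not reproduced, the sketch swaps in pointwise mutual information, log(P(CD) / (P(C)·P(D))), as the association score. The function names, the default threshold of 0.0, and the single left-to-right pass (which does not implement the separate rule for a middle character D) are all assumptions.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs in a pre-collected corpus."""
    chars, pairs = Counter(), Counter()
    for text in corpus:
        chars.update(text)
        pairs.update(text[i:i + 2] for i in range(len(text) - 1))
    return chars, pairs


def pair_score(a, b, chars, pairs):
    """Pointwise mutual information of adjacent characters a and b (stand-in score)."""
    n_chars, n_pairs = sum(chars.values()), sum(pairs.values())
    if n_chars == 0 or n_pairs == 0:
        return float("-inf")
    p_ab = pairs[a + b] / n_pairs
    p_a, p_b = chars[a] / n_chars, chars[b] / n_chars
    if p_ab == 0 or p_a == 0 or p_b == 0:
        return float("-inf")  # unseen pair or character: never join
    return math.log(p_ab / (p_a * p_b))


def segment(sentence, chars, pairs, threshold=0.0):
    """Single left-to-right pass: join a pair when its score reaches the preset threshold."""
    if not sentence:
        return []
    words, current = [], sentence[0]
    for a, b in zip(sentence, sentence[1:]):
        if pair_score(a, b, chars, pairs) >= threshold:
            current += b
        else:
            words.append(current)
            current = b
    words.append(current)
    return words
```

The same functions work unchanged on Chinese strings, since Python iterates over a `str` character by character.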
Specifically, the word frequency $M$ of a given word in text information comprising $N$ pieces in total is calculated by a formula (image not reproduced), where $x_j$ is the frequency of the word in text information $X_j$, and $n$ is the number of the $N$ pieces of text information in which the word appears.
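A TF-IDF-style reading of the variables above gives one plausible form, M = (Σ_j x_j) · log(N / n). The sketch below assumes that form; it is not the formula from the original image, and `word_frequency` is a hypothetical helper operating on already-segmented texts.

```python
import math


def word_frequency(word, texts):
    """Assumed TF-IDF-style form: M = (sum of x_j) * log(N / n).

    texts: list of already-segmented texts, each a list of words.
    """
    N = len(texts)
    counts = [tokens.count(word) for tokens in texts]  # x_j for each text X_j
    n = sum(1 for c in counts if c > 0)  # number of texts containing the word
    if n == 0:
        return 0.0
    return sum(counts) * math.log(N / n)
```

Under this assumed form, a word that appears in every one of the N texts scores zero, which suppresses ubiquitous function words.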
The hot word cloud generating submodule 114 obtains hot words from the word frequencies computed by the word frequency statistics submodule 113 and generates a hot word cloud image from these hot words for display to the user.
Specifically, the hot word cloud image is generated in a preset pattern, with higher-frequency words rendered in larger fonts, and is then displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted as hot words to form the hot word cloud image. In this embodiment, the hot word cloud image preferably displays 50 words, 5 of which are highlighted as hot words; other values may of course be used for either number as long as the requirements of the embodiments of the present application are met.
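The selection and sizing rule can be sketched as follows. The sketch assumes a linear mapping from frequency to font size between hypothetical `min_font` and `max_font` bounds; laying the words out in the preset pattern is left to a rendering library and is not shown.

```python
def build_word_cloud(freqs, shown=50, highlighted=5, min_font=12, max_font=48):
    """Pick the top `shown` words, scale font with frequency, flag the top `highlighted`.

    freqs: mapping of word -> word frequency.
    """
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:shown]
    if not ranked:
        return []
    hi, lo = ranked[0][1], ranked[-1][1]
    span = (hi - lo) or 1  # avoid division by zero when all frequencies are equal
    cloud = []
    for rank, (word, f) in enumerate(ranked):
        size = min_font + (max_font - min_font) * (f - lo) / span
        cloud.append({"word": word, "font": round(size), "hot": rank < highlighted})
    return cloud
```

With the defaults, 50 words are returned and the 5 most frequent carry `hot=True`, matching the preferred values in this embodiment.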
The hot news generating submodule 115 takes the hot words found by the hot word cloud generating submodule 114 and displays the news titles and URLs corresponding to those hot words to the user as hot news.
The hot microblog information collection submodule 116 uses the hot words found by the hot word cloud generating submodule 114 as keywords and collects microblog information containing those keywords.
Specifically, the hot microblog information collection submodule 116 sends the microblog server a collection request carrying a keyword, namely a hot word found by the hot word cloud generating submodule 114.
The microblog server traverses its microblog messages according to the keyword carried in the collection request and sends the messages containing the keyword to the hot microblog information collection submodule 116, which parses and stores them on receipt.
The hot spot dynamic graph generating submodule 117 preprocesses the microblog information collected by the hot microblog information collection submodule 116 and generates hot spot dynamic graphs from the preprocessed information.
Specifically, the feature fields of each collected microblog message (such as release time, content text, number of comments, and number of likes) are extracted and combined into one information strip, which completes the preprocessing of the microblog information. A hot spot dynamic graph is then generated from the preprocessed information strips; the hot spot dynamic graphs include a heat dynamic graph, which is generated as follows.
All information strips are sorted by release time, specifically from earliest to latest according to the release time of the microblog message contained in each strip.
The sorted information strips are grouped at a fixed count interval to form a plurality of information strip groups; for example, every 20 information strips form one group.
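Sorting and grouping can be sketched in a few lines. The strip is modelled as a plain dict with hypothetical keys (`time`, `text`, `comments`, `likes`), and `time` is assumed to be a sortable value such as a Unix timestamp.

```python
def make_strip(post):
    """Combine one microblog message's feature fields into an information strip."""
    return {"time": post["time"], "text": post["text"],
            "comments": post["comments"], "likes": post["likes"]}


def group_strips(strips, size=20):
    """Sort strips from earliest to latest release time, then cut them into groups of `size`."""
    ordered = sorted(strips, key=lambda s: s["time"])
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

The last group simply holds the remainder when the number of strips is not a multiple of the group size.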
The heat of each information strip group is calculated according to the formula
$$H = W_1 \sum_{i=1}^{u} G_i + W_2 \sum_{i=1}^{u} Y_i$$
where $H$ is the heat value of the information strip group, $u$ is the number of information strips in the group, $W_1$ is the weight applied to the total number of comments of all information strips in the group, $G_i$ is the number of comments of the $i$-th information strip, $W_2$ is the weight applied to the total number of likes of all information strips in the group, and $Y_i$ is the number of likes of the $i$-th information strip. In the present application, preferably $u$ = 20, $W_1$ = 70%, and $W_2$ = 30%.
The emotion of each information strip group is calculated as follows.
The feature fields of each collected microblog message (such as release time, content text, number of comments, and number of likes) are extracted and combined into one information strip, which completes the preprocessing of the microblog information. A hot spot dynamic graph is generated from the preprocessed information strips; the hot spot dynamic graphs also include an emotion distribution graph.
All information strips are sorted by release time, specifically from earliest to latest according to the release time of the microblog message contained in each strip.
The sorted information strips are grouped at a fixed count interval to form a plurality of information strip groups; for example, every 20 information strips form one group. Each information strip group is thus a set of n information strips (for example, n = 20).
For each information strip in a group, the microblog messages directly or indirectly linked to its microblog message are further acquired and converted into information strips. An information strip together with its M − 1 linked information strips constitutes an information strip set.
In addition, the association weight between each acquired microblog message and each linked microblog message is extracted, for example, the association weights of first-level, second-level, and third-level links, and so on; for instance, 0.85 for a first-level link, 0.7 for a second-level link, and 0.58 for a third-level link. Each information strip set has a corresponding association weight set, in which each weight is the association weight of the corresponding information strip in the set.
On this basis, the information strip sets and their corresponding association weight sets are combined to build an information strip group set.
The information strip group set is input into a preset classification model, which is trained to obtain T different sub-classification models (t = 1, 2, 3, …, T). Each sub-classification model classifies the information strip group set, and the weight of each sub-classification model is estimated from its classification result. A particle swarm optimization algorithm is then used to compute the optimal value of each sub-classification model's weight, and the sub-classification models are combined with their optimal weights, or with the normalized optimal weights, to obtain the classification model.
On this basis, the optimal weights are obtained from the above formula as an argmin, that is, as the set of weights at which the objective function attains its minimum value (the formula images are not reproduced in this text).
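The weighting and combination steps can be illustrated with a basic global-best particle swarm. This sketch assumes that each sub-classification model is a callable returning a label, that the models are combined by weighted voting, and that the swarm minimizes the misclassification rate on labelled data; the original objective is not reproduced, and the inertia and acceleration coefficients are illustrative values, not values from the source.

```python
import random


def weighted_vote(models, weights, x):
    """Each sub-model adds its weight to the label it predicts; the top label wins."""
    scores = {}
    for model, w in zip(models, weights):
        label = model(x)
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)


def error_rate(models, weights, data):
    """Fraction of (x, y) pairs the weighted vote gets wrong."""
    wrong = sum(1 for x, y in data if weighted_vote(models, weights, x) != y)
    return wrong / len(data)


def pso_weights(models, data, particles=20, iters=50, seed=0,
                inertia=0.7, c1=1.5, c2=1.5):
    """Search sub-model weights in [0, 1] with a global-best PSO; return normalized weights."""
    rng = random.Random(seed)
    dim = len(models)
    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_err = [error_rate(models, p, data) for p in pos]
    g = min(range(particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g][:], pbest_err[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))  # clamp to [0, 1]
            err = error_rate(models, pos[i], data)
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    total = sum(gbest) or 1.0
    return [w / total for w in gbest], gbest_err
```

Normalizing the best weight vector to sum to 1 corresponds to the option of combining the sub-classification models with the normalized optimal weights.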
In use, the information strip group sets are obtained by the above steps and input into the resulting classification model, which classifies the emotion words in each information strip group into different types.
The emotion $F$ of each information strip group is then calculated by a formula (image not reproduced) from the number of positive-emotion words and the number of negative-emotion words in the classified information strip group, where $w_{pi}$ is the weight of a positive-emotion word in the emotion dictionary and $w_{pj}$ is the weight of a negative-emotion word in the emotion dictionary. In the present application, the emotion dictionary is a set of correspondences between emotion words and their weights, generated in advance by extending an existing electronic dictionary. In this example, each information strip group is classified into one of three categories, "positive attitude", "negative attitude", or "neutral attitude", according to its calculated emotion.
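A minimal scoring sketch follows. It assumes F is the sum of the dictionary weights of the positive-emotion words minus the sum of the weights of the negative-emotion words (the original formula image is not reproduced), and it uses a hypothetical symmetric `neutral_band` to separate the three attitudes.

```python
def group_emotion(words, pos_weights, neg_weights):
    """Assumed form of F: sum of positive-word weights minus sum of negative-word weights.

    words: emotion words classified out of one information strip group (repeats count).
    pos_weights / neg_weights: emotion dictionary entries, word -> weight.
    """
    positive = sum(pos_weights[w] for w in words if w in pos_weights)
    negative = sum(neg_weights[w] for w in words if w in neg_weights)
    return positive - negative


def attitude(score, neutral_band=0.1):
    """Map F to one of the three categories; the band width is a hypothetical parameter."""
    if score > neutral_band:
        return "positive attitude"
    if score < -neutral_band:
        return "negative attitude"
    return "neutral attitude"
```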
A heat dynamic graph is generated from the release time and heat of each information strip group. Specifically, because the groups are formed from information strips sorted by release time from earliest to latest, the groups themselves are in chronological order; the heat values of the groups are arranged in that order to generate the heat dynamic graph, which shows the user how the heat changes over time.
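The group heat calculation and the chronological series behind the heat dynamic graph can be sketched as follows. The sketch assumes each group's heat is the weighted sum W1 × (total comments) + W2 × (total likes) with W1 = 0.7 and W2 = 0.3, and models strips as dicts with hypothetical `time`, `comments`, and `likes` keys.

```python
def group_heat(group, w1=0.7, w2=0.3):
    """Assumed form: H = W1 * (total comments) + W2 * (total likes) over one group."""
    return (w1 * sum(s["comments"] for s in group)
            + w2 * sum(s["likes"] for s in group))


def heat_series(groups, w1=0.7, w2=0.3):
    """(release time of the group's first strip, heat) per group, in chronological order.

    Assumes `groups` were built from strips already sorted from earliest to latest.
    """
    return [(g[0]["time"], group_heat(g, w1, w2)) for g in groups if g]
```

The resulting (time, heat) pairs can be handed to any plotting library to draw the heat curve.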
An emotion distribution dynamic graph is generated from the release time and emotion of each information strip group. Specifically, because the groups are formed from information strips sorted by release time from earliest to latest, the groups themselves are in chronological order; the emotions of the groups are arranged in that order to generate the emotion distribution dynamic graph, which shows the user how the emotion changes over time.
Specifically, the emotion analysis module 120 includes an information collection submodule 121, a data processing submodule 122, an emotion analysis submodule 123, an emotion distribution graph submodule 124, a positive attitude word cloud generation submodule 125, and a negative attitude word cloud generation submodule 126.
The information collection submodule 121 collects comment information posted on websites and forums.
Specifically, the information collection submodule 121 sends a collection request to the servers of the websites and forums; the collection request contains keywords related to the information to be collected.
After determining that the collection request is valid, the corresponding server sends the matching comment information to the information collection submodule 121. The comment information includes comments on movies, comments on products, comments on news events, and the like.
After receiving the comment information, the information collection submodule 121 parses it and stores it in a comment database for subsequent processing.
In addition, the information collection submodule 121 may collect comment information from websites and forums in real time, or by reading offline texts of those websites and forums.
The data processing submodule 122 processes the comment information collected by the information collection submodule 121 to remove invalid information, remove redundant information, and fill in missing information, thereby obtaining text information.
Specifically, unreasonable information (such as inconsistent information or information contrary to fact) in the collected comment information is detected; such information is treated as invalid and deleted from the comment information.
The collected comment information is traversed to find positions where information is missing, and the missing information is filled in at those positions. Specifically, for a data field with missing information, the keywords in that field are used to search all the comment information for matching data fields, and a matching field is used to complete the missing data. If multiple matching data fields are found, the record used to complete the missing data is selected according to the order of the timestamps.
The collected comment information is traversed to find and delete redundant information. Specifically, the similarity S between every two pieces of comment information is calculated; if S is smaller than a preset threshold TS, the two pieces are considered to record duplicate data and one of them is deleted, preferably the one whose recorded reliability is lower.
The preset threshold TS can be set by the user as required. The similarity S between the first comment information and the second comment information is calculated by a formula (image not reproduced), where A is the first comment information, B is the second comment information, $A_i$ is the weight of the $i$-th word in the first comment information, $B_i$ is the weight of the $i$-th word in the second comment information, and both pieces of comment information contain the same number of words, a.
If the two pieces of comment information contain different numbers of words, their word counts are first made equal, for example by removing the lowest-weight words from the piece with more words until both pieces contain the same number of words.
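The word-count unification and the duplicate test can be sketched as follows. The formula for S is not reproduced, and the text flags a pair as duplicate when S is smaller than TS, which fits a distance rather than a cosine-type similarity; the sketch therefore assumes S is the Euclidean distance between the aligned weights A_i and B_i. Comments are modelled as lists of (word, weight) pairs, and the default `ts=0.1` is a hypothetical value.

```python
import math


def trim_to_same_length(a, b):
    """Drop the lowest-weight words from the longer comment until both have the same count.

    a, b: lists of (word, weight) pairs in their original order.
    """
    a, b = list(a), list(b)
    while len(a) > len(b):
        a.remove(min(a, key=lambda t: t[1]))
    while len(b) > len(a):
        b.remove(min(b, key=lambda t: t[1]))
    return a, b


def distance(a, b):
    """Assumed form of S: Euclidean distance between the i-th word weights A_i and B_i."""
    a, b = trim_to_same_length(a, b)
    return math.sqrt(sum((wa - wb) ** 2 for (_, wa), (_, wb) in zip(a, b)))


def is_duplicate(a, b, ts=0.1):
    """Following the text, a value of S below the threshold TS marks a duplicate pair."""
    return distance(a, b) < ts
```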
The emotion analysis submodule 123 performs emotion analysis on the text information produced by the data processing submodule 122 to obtain the proportion of each emotion among all the text information, as well as the positive-attitude text information and the negative-attitude text information.
Specifically, an emotion feature training set is constructed in advance and input into a Bayesian classifier for training to obtain an emotion classifier. In use, the emotion analysis submodule 123 inputs the text information produced by the data processing submodule 122 into the emotion classifier to analyze its emotion. For example, each piece of text information is classified as positive, neutral, or negative attitude, yielding the positive-attitude, neutral-attitude, and negative-attitude text information and the proportion of each emotion among all the text information. As another example, the text information is divided by its analyzed emotion into five categories, "like", "somewhat like", "so-so", "do not really like", and "dislike", where "like" and "somewhat like" are classified as positive attitude, and "do not really like" and "dislike" as negative attitude.
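The source specifies only a Bayesian classifier. The sketch below assumes a multinomial naive Bayes over tokenized text with add-one (Laplace) smoothing, together with the five-to-three category mapping from the example; the class, constant, and helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over tokenized texts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for cnt in self.word_counts.values() for w in cnt}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, tokens):
        """Return the label with the highest log posterior for the token list."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for label, count in self.class_counts.items():
            lp = math.log(count / n_docs)  # log prior
            for w in tokens:
                lp += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if lp > best_lp:
                best, best_lp = label, lp
        return best


# Five analyzed categories folded into the three attitudes, as in the example above.
FIVE_TO_THREE = {"like": "positive", "somewhat like": "positive", "so-so": "neutral",
                 "do not really like": "negative", "dislike": "negative"}


def attitude_proportions(labels):
    """Share of each attitude among the classified texts (input to the distribution graph)."""
    counts = Counter(FIVE_TO_THREE.get(lab, lab) for lab in labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}
```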
The emotion distribution graph submodule 124 generates an emotion distribution graph from the proportion of each emotion obtained by the emotion analysis submodule 123, showing the user the emotion distribution of the comment information.
The positive attitude word cloud generation submodule 125 generates a positive attitude word cloud image from the positive-attitude text information obtained by the emotion analysis submodule 123 and displays it to the user.
Specifically, the word frequencies in the positive-attitude text information obtained by the emotion analysis submodule 123 are counted, and the positive attitude word cloud image is generated in a predetermined pattern, with higher-frequency words rendered in larger fonts, and then displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted to form the positive attitude word cloud image.
The negative attitude word cloud generation submodule 126 generates a negative attitude word cloud image from the negative-attitude text information obtained by the emotion analysis submodule 123 and displays it to the user.
Specifically, the word frequencies in the negative-attitude text information obtained by the emotion analysis submodule 123 are counted, and the negative attitude word cloud image is generated in a predetermined pattern, with higher-frequency words rendered in larger fonts, and then displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted to form the negative attitude word cloud image.
On this basis, the following method is used to compute word frequency in both the positive-attitude text information and the negative-attitude text information.
If the text information is mainly English, word frequency statistics are computed directly. If it is mainly Chinese, the Chinese text information is first segmented into words, and word frequency statistics are then computed on the segmented text.
Specifically, the co-occurrence probability of every two adjacent Chinese characters in the Chinese text information is calculated, for example, the probability that adjacent characters C and D occur together and the probability that adjacent characters D and E occur together in the Chinese text information.
The co-occurrence probability of adjacent characters C and D in the Chinese text information, and that of adjacent characters D and E, are each calculated by a formula (the formula images are not reproduced in this text) from the following quantities: the probability that C and D occur adjacently in the pre-training corpus, the probability that D and E occur adjacently in the pre-training corpus, and the probabilities that C, D, and E each occur alone in the pre-training corpus. In the present application, the pre-training corpus is a data set composed of a pre-collected number (for example, 1000 pieces) of Chinese text information.
If C is the first character of a sentence in the Chinese text information, C and D are assigned to the same word when their co-occurrence probability satisfies the condition set by a preset value, and to different words otherwise. If E is the last character of a sentence, D and E are assigned to the same word when their co-occurrence probability satisfies the condition set by a preset value, and to different words otherwise. If D is a character in the middle of a sentence, the two characters selected by the corresponding comparison (formula image not reproduced) are assigned to the same word.
Specifically, the word frequency $M$ of a given word in text information (positive-attitude or negative-attitude) comprising $N$ pieces in total is calculated by a formula (image not reproduced), where $x_j$ is the frequency of the word in text information $X_j$, and $n$ is the number of the $N$ pieces of text information in which the word appears.
The viewpoint mining module 130 includes an information collection submodule 131, a data processing submodule 132, a viewpoint mining submodule 133, a viewpoint statistical graphic generation submodule 134, a heat viewpoint mining submodule 135, and a heat viewpoint statistical graphic generation submodule 136.
The information collecting submodule 131 collects microblog information published on a microblog.
Specifically, the information collection submodule 131 may collect microblog information by keyword, in the same way that the information collection submodule 111 collects news information or the information collection submodule 121 collects comment information. The information collection submodule 131 may also collect microblog information by reading offline texts.
The data processing sub-module 132 pre-processes the microblog information collected by the information collecting sub-module 131 to generate an information strip.
Specifically, the feature fields of each microblog message (such as release time, content text, number of comments, and number of likes) are extracted and combined into one information strip.
The viewpoint mining submodule 133 performs viewpoint mining on the information pieces processed by the data processing submodule 132.
All information strips are sorted by release time, specifically from earliest to latest according to the release time of the microblog message contained in each strip.
Data processing is performed on the sorted information strips to remove invalid information, remove redundant information, and fill in missing information, thereby obtaining the text information corresponding to each strip. The data processing method is the same as that used by the data processing submodule 112 for news information or by the data processing submodule 122 for comment information.
Keywords are extracted from the text information corresponding to each information strip; in the present application, preferably at most 5 keywords are extracted per strip.
Keyword frequency statistics are then computed for the extracted keywords over the text information of all information strips, and viewpoint keywords are selected according to these statistics. Specifically, the frequency is calculated by a formula (image not reproduced), where $N$ is the total number of pieces of text information corresponding to all the information strips, $M$ is the word frequency of a given word across those $N$ pieces, $x_j$ is the frequency of the word in text information $X_j$, and $n$ is the number of the $N$ pieces in which the word appears. The keyword frequencies are accumulated over the text information of all strips, and the keywords ranked within a preset range are taken as the viewpoint keywords of the microblog information; in the present application, preferably the top 5 keywords are used.
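The accumulation and top-5 selection can be sketched as below. How keywords are extracted from each strip is not specified, so the sketch takes the per-strip keyword lists as input, and it uses raw occurrence counts across the segmented strip texts as a simplified stand-in for the frequency M; both are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def viewpoint_keywords(strip_keywords, strip_texts, top_k=5, per_strip=5):
    """Accumulate frequencies of extracted keywords over all strip texts; keep the top_k.

    strip_keywords: per-strip keyword lists (at most `per_strip` are used from each).
    strip_texts: per-strip segmented texts, each a list of words.
    """
    candidates = {kw for kws in strip_keywords for kw in kws[:per_strip]}
    totals = Counter()
    for tokens in strip_texts:
        for tok in tokens:
            if tok in candidates:
                totals[tok] += 1
    return [kw for kw, _ in totals.most_common(top_k)]
```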
The viewpoint keywords are then searched for in the sorted information strips to obtain the text information of the strips containing them. Specifically, each viewpoint keyword is searched for in the sorted strips in order. As soon as the keyword appears in a strip's text information, that strip is considered a match and the search within it stops. If the number of matched strips reaches the set upper limit, the search for that viewpoint keyword ends; otherwise, the search continues with the next strip. In the present application, the upper limit of matched strips per viewpoint keyword is 5. The text information of each matched strip is retained for later use in the viewpoint statistical graphic.
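The bounded search can be sketched as follows, assuming plain substring containment as the matching test and strip texts already in release-time order; the helper names are hypothetical.

```python
def match_strips(keyword, strip_texts, limit=5):
    """Scan strips in time order; collect texts containing `keyword`, stopping at `limit`."""
    matches = []
    for text in strip_texts:
        if keyword in text:
            matches.append(text)
            if len(matches) >= limit:
                break  # upper limit reached: stop searching for this keyword
    return matches


def match_all(keywords, strip_texts, limit=5):
    """Run the bounded search for every viewpoint keyword."""
    return {kw: match_strips(kw, strip_texts, limit) for kw in keywords}
```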
The viewpoint statistical graphic generation submodule 134 generates a viewpoint statistical graphic from the viewpoints mined by the viewpoint mining submodule 133 for presentation to the user.
Specifically, the viewpoint statistical graphic generation submodule 134 generates the viewpoint statistical graphic from the obtained viewpoint keywords and their word frequencies, and preferably also displays the text information of the matched information strips at the corresponding positions of the graphic.
The heat viewpoint mining submodule 135 performs heat calculation on the information strips generated by the data processing submodule 132 to obtain heat keywords.
The heat of each information strip is calculated according to the formula
$$H = W_3 G + W_4 Y$$
where $H$ is the heat value of the information strip, $W_3$ is the weight of the strip's number of comments, $G$ is the number of comments contained in the strip, $W_4$ is the weight of the strip's number of likes, and $Y$ is the number of likes contained in the strip. In the present application, preferably $W_3$ = 70% and $W_4$ = 30%.
All information strips are sorted by the calculated heat value from high to low, and the top-ranked strips are taken as heat information strips; in the present application, preferably the top 5.
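A short sketch of the per-strip heat and the top-5 selection, assuming strips are dicts with hypothetical `comments` and `likes` keys and using `heapq.nlargest` from the Python standard library for the ranking:

```python
import heapq


def strip_heat(strip, w3=0.7, w4=0.3):
    """H = W3 * G + W4 * Y for one information strip."""
    return w3 * strip["comments"] + w4 * strip["likes"]


def top_heat_strips(strips, k=5):
    """Return the k strips with the highest heat, highest first."""
    return heapq.nlargest(k, strips, key=strip_heat)
```

`heapq.nlargest` avoids sorting the whole list when only the top few strips are needed.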
Data processing is performed on the sorted information strips to remove invalid information, remove redundant information, and fill in missing information, thereby obtaining the text information corresponding to each strip. The data processing method is the same as that used by the data processing submodule 112 for news information or by the data processing submodule 122 for comment information.
Keywords are extracted from the text information of the processed heat information strips to obtain heat keywords; in the present application, preferably at most 3 heat keywords are extracted from each heat information strip.
The heat viewpoint statistical graphic generation submodule 136 generates a heat viewpoint statistical chart from the heat values calculated by the heat viewpoint mining submodule 135 and also displays the heat keywords on the chart.
Specifically, the heat viewpoint statistical chart is generated from the heat values of the heat information strips; in addition, the text information and heat keywords of the heat information strips may be displayed at the corresponding positions on the chart.
Example two
Refer to FIG. 2, which is a flowchart of the multidimensional public opinion monitoring method according to the second embodiment of the present application.
The present application provides a multidimensional public opinion monitoring method, which includes the following steps:
S210: collect news information, comment information, and microblog information.
Specifically, news information, comment information, and microblog information are collected in real time; they may also be collected offline by reading offline texts, for example by exchanging offline-cached news information, comment information, and microblog information.
Step S220: perform data processing on the collected news information, comment information, and microblog information to obtain text information or information strips.
Data processing of the news information specifically comprises the following steps:
Unreasonable information in the collected news information (such as inconsistent information or information contrary to fact) is detected and treated as invalid information, and the detected invalid information is deleted from the news information.
The collected news information is traversed to find positions where information is missing, and the missing information is added at those positions. Specifically, based on a keyword in the data field with missing information, a data field matching the keyword is searched for among all the news information, and the matching data field is used to complete the missing data. If several matching data fields are found, the record used to complete the missing data is selected according to the order of the timestamps.
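The completion step can be sketched as follows. This is a minimal illustration, not the method's actual implementation: the record layout (a dict with hypothetical `timestamp` and `text` keys, `None` marking a missing field) and the choice of the most recent matching record as "the order of the timestamps" are both assumptions.

```python
def complete_missing(records, field, keyword):
    """Fill a missing `field` (None) in records whose text contains `keyword`,
    copying the value from another record that contains the keyword."""
    # Donor records: contain the keyword and have the field present.
    donors = [r for r in records
              if r.get(field) is not None and keyword in r.get("text", "")]
    # Assumption: "order of the timestamps" is read as "most recent first".
    donors.sort(key=lambda r: r["timestamp"], reverse=True)
    completed = []
    for record in records:
        record = dict(record)  # copy so the caller's data is not mutated
        if record.get(field) is None and keyword in record.get("text", "") and donors:
            record[field] = donors[0][field]
        completed.append(record)
    return completed
```

With two donors carrying different values, the one with the later timestamp supplies the missing field.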
The collected news information is traversed to find and delete redundant information. Specifically, the similarity between any two pieces of news information is calculated; if the calculated similarity S is smaller than a preset threshold TS, the two pieces are considered to record duplicate data and one of them is deleted, preferably the one with lower reliability.
The preset threshold TS can be set by the user as required. The similarity is calculated by the formula
[Formula image not reproduced: similarity S of two pieces of news information]
where S is the similarity between the first and second news information, A is the first news information, B is the second news information, Ai is the weight of the i-th word in the first news information, Bi is the weight of the i-th word in the second news information, and both pieces of news information contain a words.
If the two pieces of news information contain different numbers of words, the counts are unified, for example by removing the lowest-weight words from the piece with more words until both contain the same number of words.
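The deduplication rule can be sketched as below. Because the similarity formula itself is not reproduced, this sketch assumes a Euclidean distance between the aligned word-weight vectors; that choice fits the stated rule that pieces with S below TS are duplicates, but it is an assumption. The item layout (`weights`, `reliability`) is hypothetical.

```python
import math

def align_by_weight(a, b):
    """Drop the lowest-weight entries of the longer vector (keeping the
    survivors in their original order) until both have equal length."""
    if len(a) == len(b):
        return a, b
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    top = sorted(range(len(longer)), key=lambda i: longer[i], reverse=True)
    keep = sorted(top[:len(shorter)])
    trimmed = [longer[i] for i in keep]
    return (trimmed, shorter) if len(a) > len(b) else (shorter, trimmed)

def distance(a, b):
    """Assumed stand-in for S: Euclidean distance of aligned weight vectors."""
    a, b = align_by_weight(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def deduplicate(items, ts):
    """items: dicts with hypothetical keys "weights" and "reliability".
    For each pair with S < TS, the less reliable item is dropped."""
    removed = set()
    for i in range(len(items)):
        if i in removed:
            continue
        for j in range(i + 1, len(items)):
            if j in removed:
                continue
            if distance(items[i]["weights"], items[j]["weights"]) < ts:
                loser = i if items[i]["reliability"] < items[j]["reliability"] else j
                removed.add(loser)
                if loser == i:
                    break  # item i is gone; stop comparing it
    return [item for k, item in enumerate(items) if k not in removed]
```

The same routine applies unchanged to comment information.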
Data processing of the comment information specifically comprises the following steps:
Unreasonable information in the collected comment information (such as inconsistent information or information contrary to fact) is detected and treated as invalid information, and the detected invalid information is deleted from the comment information.
The collected comment information is traversed to find positions where information is missing, and the missing information is added at those positions. Specifically, based on a keyword in the data field with missing information, a data field matching the keyword is searched for among all the comment information, and the matching data field is used to complete the missing data. If several matching data fields are found, the record used to complete the missing data is selected according to the order of the timestamps.
The collected comment information is traversed to find and delete redundant information. Specifically, the similarity between any two pieces of comment information is calculated; if the calculated similarity S is smaller than a preset threshold TS, the two pieces are considered to record duplicate data and one of them is deleted, preferably the one with lower reliability.
The preset threshold TS can be set by the user as required. The similarity is calculated by the formula
[Formula image not reproduced: similarity S of two pieces of comment information]
where S is the similarity between the first and second comment information, A is the first comment information, B is the second comment information, Ai is the weight of the i-th word in the first comment information, Bi is the weight of the i-th word in the second comment information, and both pieces of comment information contain a words.
If the two pieces of comment information contain different numbers of words, the counts are unified, for example by removing the lowest-weight words from the piece with more words until both contain the same number of words.
Data processing of the microblog information specifically comprises the following step:
Each feature of the microblog information (such as release time, content text, number of comments, and number of likes) is extracted, and the features of each microblog post are combined into an information strip.
Step S230: process the text information or information strips to obtain a hot word cloud, hot news, a hot spot dynamic graph, an emotion distribution map, a positive attitude word cloud, a negative attitude word cloud, a viewpoint statistical chart, and a heat viewpoint statistical chart, and display the hot word cloud, the hot news, the hot spot dynamic graph, and the emotion distribution map to the user.
Referring to Fig. 3, obtaining the hot word cloud, the hot news, and the hot spot dynamic graph comprises the following steps:
Step S231: perform word frequency statistics on the obtained text information.
If the text information is mainly English, word frequency statistics are performed directly; if it is mainly Chinese, the Chinese characters are first segmented into words, and word frequency statistics are then performed on the segmented text.
Specifically, the probability that each pair of adjacent Chinese characters co-occurs is calculated, for example the probability that adjacent characters C and D co-occur in the Chinese text information and the probability that adjacent characters D and E co-occur in the Chinese text information.
The probability that adjacent characters C and D co-occur in the Chinese text information is calculated by the formula
[Formula image not reproduced: co-occurrence probability of adjacent characters C and D]
and the probability that adjacent characters D and E co-occur in the Chinese text information is calculated by the formula
[Formula image not reproduced: co-occurrence probability of adjacent characters D and E]
The quantities in these formulas are: the probability that C and D co-occur adjacently in the pre-trained corpus; the probability that D and E co-occur adjacently in the pre-trained corpus; the probabilities that C, D, and E each occur alone in the pre-trained corpus; the probability that C and D co-occur in the Chinese text information; and the probability that D and E co-occur in the Chinese text information. In the present application, the pre-trained corpus is a data set composed of a pre-collected number (for example, 1000 pieces) of Chinese text information.
If C is the first character of a sentence in the Chinese text information, C and D are grouped into the same word when the quantity in the (unreproduced) formula meets the condition relative to a preset value, and are assigned to different words otherwise. If E is the last character of a sentence, D and E are likewise grouped into the same word when the corresponding quantity meets the condition relative to a preset value, and are assigned to different words otherwise. If D is a character in the middle of a sentence, the two characters corresponding to the expression in the (unreproduced) formula are grouped into the same word.
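The boundary rules can be sketched as follows. The actual probability formulas are not reproduced, so this sketch assumes a pointwise-mutual-information-style score, P(ab) / (P(a)·P(b)), computed from hypothetical corpus tables `pair_p` and `char_p`. It also assumes that a middle character joins whichever neighbour has the higher score. Treat it as an illustration of the decision rules, not the patented formula.

```python
def pair_score(a, b, pair_p, char_p):
    """Assumed association score: P(ab) / (P(a) * P(b)) from corpus statistics."""
    denom = char_p.get(a, 0.0) * char_p.get(b, 0.0)
    return pair_p.get(a + b, 0.0) / denom if denom else 0.0

def segment(sentence, pair_p, char_p, threshold):
    """Split a sentence into words using the first/last/middle character rules."""
    n = len(sentence)
    if n < 2:
        return [sentence] if sentence else []
    scores = [pair_score(sentence[i], sentence[i + 1], pair_p, char_p)
              for i in range(n - 1)]
    link = [False] * (n - 1)  # link[i]: characters i and i+1 are in one word
    # Sentence-initial character: joins its right neighbour if the score
    # reaches the preset value.
    link[0] = link[0] or scores[0] >= threshold
    # Sentence-final character: joins its left neighbour under the same test.
    link[-1] = link[-1] or scores[-1] >= threshold
    # Middle characters: join the neighbour with the higher score (assumption).
    for i in range(1, n - 1):
        if scores[i - 1] >= scores[i]:
            link[i - 1] = True
        else:
            link[i] = True
    words, start = [], 0
    for i in range(n - 1):
        if not link[i]:
            words.append(sentence[start:i + 1])
            start = i + 1
    words.append(sentence[start:])
    return words
```

For example, with a strong corpus score for "天气" and a weak one for "气好", the sentence "天气好" splits into "天气" and "好".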
Specifically, the word frequency M of a given word over N pieces of text information is calculated by the formula
[Formula image not reproduced: word frequency M]
where N is the total number of pieces of text information, x_j is the frequency of the word in text information X_j, and n is the number of the N pieces of text information in which the word appears.
Step S232: obtain hot words according to the word frequencies in the text information, and generate a hot word cloud from the hot words.
Specifically, the hot word cloud is generated in a preset pattern following the rule that a higher word frequency gets a larger font, and is then displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted as hot words to form the hot word cloud. In the embodiment of the present application, the hot word cloud preferably displays 50 words, 5 of which are highlighted as hot words. Other numbers of displayed words and hot words may be used as long as they meet the requirements of the embodiment.
Step S233: display the news titles and URLs corresponding to the hot words as hot news, so that the hot news titles and URLs are shown to the user.
Step S234: use the hot words as keywords and collect microblog information containing those keywords.
Step S235: preprocess the collected microblog information and generate a hot spot dynamic graph from the preprocessed information.
Specifically, each feature of the collected microblog information (such as release time, content text, number of comments, and number of likes) is extracted, and the features of each microblog post are combined into an information strip, which completes the preprocessing. A hot spot dynamic graph is then generated from the preprocessed information strips; it comprises a heat dynamic graph and an emotion distribution dynamic graph.
Referring to Fig. 4, generating the hot spot dynamic graph from the preprocessed information comprises the following sub-steps:
Step S410: sort all information strips formed in preprocessing by release time.
Specifically, all information strips are sorted from earliest to latest according to the release time of the microblog post each contains.
Step S420: group the sorted information strips into a plurality of information strip groups.
Specifically, the sorted information strips are grouped at a fixed count interval, for example every 20 information strips form one group.
Step S430: calculate the heat of each information strip group.
Specifically, the heat of each information strip group is calculated by the formula
[Formula image not reproduced: heat H of an information strip group]
where H is the heat value of the group, u is the number of information strips in the group, W1 is the weight applied to the sum of the comment counts of all information strips in the group, G is the number of comments of each information strip, W2 is the weight applied to the sum of the like counts of all information strips in the group, and Y is the number of likes of each information strip. In the present application, preferably u = 20, W1 = 70%, and W2 = 30%.
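Steps S410 to S430 can be sketched as below. The heat formula is not reproduced; since it involves u, this sketch assumes an average, H = (W1·ΣG + W2·ΣY)/u. The `Strip` type and its fields are hypothetical, and the defaults follow the preferred values in the text (20 strips per group, W1 = 0.7, W2 = 0.3).

```python
from dataclasses import dataclass

@dataclass
class Strip:
    """Hypothetical information strip built from one microblog post."""
    release_time: float  # e.g. a Unix timestamp
    text: str
    comments: int        # G
    likes: int           # Y

def group_strips(strips, size=20):
    """Sort strips from earliest to latest and cut them into groups of `size`."""
    ordered = sorted(strips, key=lambda s: s.release_time)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def group_heat(group, w1=0.7, w2=0.3):
    """Assumed form: H = (W1 * sum(G) + W2 * sum(Y)) / u."""
    u = len(group)
    return (w1 * sum(s.comments for s in group)
            + w2 * sum(s.likes for s in group)) / u
```

Pairing each group's earliest release time with its heat yields the time-ordered series plotted in the heat dynamic graph.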
Step S440: calculate the emotion of each information strip group.
Specifically, this comprises the following:
Each feature of the collected microblog information (such as release time, content text, number of comments, and number of likes) is extracted, and the features of each microblog post are combined into an information strip, which completes the preprocessing of the microblog information. A hot spot dynamic graph, here comprising an emotion distribution graph, is generated from the preprocessed information strips.
All information strips are sorted by release time, specifically from earliest to latest according to the release time of the microblog post each contains.
The sorted information strips are grouped at a fixed count interval into a plurality of information strip groups, for example 20 information strips per group. Each information strip group is a set whose elements are its individual information strips (symbols not reproduced in this text); a group contains n information strips, for example n = 20.
Microblog information directly or indirectly linked to the microblog post of each information strip in the group is further collected and formed into information strips. For example, an information strip has M-1 linked information strips, and the information strip together with its linked information strips constitutes an information strip set (symbols not reproduced in this text).
In addition, the association weight between each collected microblog post and its linked microblog posts is extracted, for example the association weights of primary, secondary, and tertiary links, and so on; for instance, 0.85 for a primary link, 0.7 for a secondary link, and 0.58 for a tertiary link. Each information strip set has a corresponding association weight set whose elements are the association weights of the corresponding information strips; the condition these weights satisfy is given in a formula not reproduced in this text.
On this basis, an information strip group set is built from the information strip sets and their corresponding association weight sets.
The information strip group set is input into a preset classification model, and the classification model is trained to obtain T different sub-classification models. The sub-classification models are used to classify the information strip group set, and the weight set of the sub-classification models is estimated from the classification results. The weight set is then calculated by a particle swarm optimization algorithm, which determines the optimal value of each weight. Each sub-classification model is combined with the optimal value of its weight (or the normalized optimal value) to obtain the classification model.
On this basis, the optimal weight set is obtained from the formula
[Formula image not reproduced: optimal weight set of the sub-classification models]
where argmin denotes the set of arguments at which the objective function attains its minimum value.
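The weight search can be sketched with a plain particle swarm optimizer. Several parts are assumptions, not taken from the source: the objective (misclassification rate of a weighted vote), the representation of the T sub-classification models as precomputed predictions, and all PSO hyperparameters. One particle starts at uniform weights, so the result is never worse than unweighted voting.

```python
import random

def weighted_vote(predictions, weights, n_classes):
    """predictions[t][k]: class predicted by sub-model t for sample k."""
    out = []
    for k in range(len(predictions[0])):
        score = [0.0] * n_classes
        for t, w in enumerate(weights):
            score[predictions[t][k]] += w
        out.append(max(range(n_classes), key=lambda c: score[c]))
    return out

def error_rate(predictions, weights, labels, n_classes):
    pred = weighted_vote(predictions, weights, n_classes)
    return sum(p != y for p, y in zip(pred, labels)) / len(labels)

def pso_weights(predictions, labels, n_classes, particles=20, iters=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search sub-model weights minimizing the vote's error (argmin)."""
    rng = random.Random(seed)
    T = len(predictions)
    cost = lambda w: error_rate(predictions, w, labels, n_classes)
    # Particle 0 starts at uniform weights; the others start at random.
    pos = [[1.0 / T] * T] + [[rng.random() for _ in range(T)]
                             for _ in range(particles - 1)]
    vel = [[0.0] * T for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(T):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each weight to [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    total = sum(gbest) or 1.0
    return [w / total for w in gbest], gbest_cost  # normalized weights
```

Returning normalized weights mirrors the text's option of combining each sub-model with the normalized optimal value of its weight.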
In use, information strip group sets are obtained as described above and input into the resulting classification model, which classifies the emotion vocabulary in each information strip group into different types.
Specifically, the emotion of each information strip group is calculated by the formula
[Formula image not reproduced: emotion F of an information strip group]
where F is the emotion of the group. The formula uses the number of positive-emotion words and the number of negative-emotion words in the classified group, together with w_pi, the weight of a positive-emotion word in the emotion dictionary, and w_pj, the weight of a negative-emotion word in the emotion dictionary. In the present application, the emotion dictionary is a set of correspondences between emotion words and their weights, generated in advance by extending an existing electronic dictionary.
In this embodiment, each information strip group is classified into one of three categories, "positive attitude", "negative attitude", or "neutral attitude", according to its calculated emotion.
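The emotion score and the three-way split can be sketched as follows. Both the formula F = (Σ w_pi − Σ w_pj)/(p + q), with p and q the positive and negative word counts, and the ±0.1 margin that separates the three attitudes are assumptions. The emotion dictionary format, word → (polarity, weight), is hypothetical.

```python
def group_emotion(words, emotion_dict):
    """emotion_dict: word -> (polarity, weight), polarity "pos" or "neg".
    Assumed form: F = (sum of positive weights - sum of negative weights) / (p + q)."""
    pos = [emotion_dict[w][1] for w in words
           if w in emotion_dict and emotion_dict[w][0] == "pos"]
    neg = [emotion_dict[w][1] for w in words
           if w in emotion_dict and emotion_dict[w][0] == "neg"]
    p, q = len(pos), len(neg)
    if p + q == 0:
        return 0.0  # no emotion words: treat as neutral
    return (sum(pos) - sum(neg)) / (p + q)

def attitude(f, margin=0.1):
    """Map an emotion score to one of the three attitude categories."""
    if f > margin:
        return "positive attitude"
    if f < -margin:
        return "negative attitude"
    return "neutral attitude"
```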
Step S450: generate a heat dynamic graph from the release time and the heat of each information strip group.
Specifically, the information strip groups are formed from information strips sorted by release time from earliest to latest, so the groups are themselves in time order. The heat of each group is arranged in this order to generate the heat dynamic graph, which shows the user how the heat changes over time.
Step S460: generate an emotion distribution dynamic graph from the release time and the emotion of each information strip group.
Specifically, for the same reason, the emotion of each group is arranged in time order to generate the emotion distribution dynamic graph, which shows the user how the emotion changes over time.
Referring to Fig. 5, obtaining the emotion distribution map, the positive attitude word cloud, and the negative attitude word cloud comprises the following steps:
Step S510: perform emotion analysis on the text information to obtain the proportion of each emotion among all text information, the positive-attitude text information, and the negative-attitude text information.
Specifically, an emotion feature training set is constructed in advance and used to train a Bayes classifier, which yields an emotion classifier; the processed text information is input into this classifier to analyze its emotion. For example, each piece of text information is classified as positive, neutral, or negative attitude, yielding positive-attitude, neutral-attitude, and negative-attitude text information, from which the proportion of each emotion among all text information is obtained. As another example, the text information is divided into five categories according to its analyzed emotion: "like", "somewhat like", "neutral", "somewhat dislike", and "dislike", where "like" and "somewhat like" count as positive attitude and "somewhat dislike" and "dislike" count as negative attitude.
Step S520: generate an emotion distribution map from the proportion of each emotion among all text information, so as to show the user the emotion distribution of the comment information.
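Steps S510 and S520 can be sketched with a multinomial naive Bayes classifier. The source says only "Bayes classifier", so the multinomial variant and add-one smoothing are assumptions, as is the rule that words unseen in training are ignored. The training data format (segmented texts with attitude labels) is hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = defaultdict(Counter)  # per-class word counts
        for words, c in zip(docs, labels):
            self.counts[c].update(words)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, words):
        V = len(self.vocab)
        def log_post(c):
            # Words never seen in training are ignored.
            return math.log(self.prior[c]) + sum(
                math.log((self.counts[c][w] + 1) / (self.total[c] + V))
                for w in words if w in self.vocab)
        return max(self.classes, key=log_post)

def emotion_proportions(predicted):
    """Share of each predicted emotion among all texts (for the distribution map)."""
    n = len(predicted)
    return {c: k / n for c, k in Counter(predicted).items()}
```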
Step S530: generate a positive attitude word cloud from the positive-attitude text information and display it to the user.
Specifically, the word frequencies in the positive-attitude text information are rendered as the positive attitude word cloud in a preset pattern following the rule that a higher word frequency gets a larger font, and the word cloud is displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted to form the positive attitude word cloud.
Step S540: generate a negative attitude word cloud from the negative-attitude text information and display it to the user.
Specifically, the word frequencies in the negative-attitude text information are rendered as the negative attitude word cloud in a preset pattern following the rule that a higher word frequency gets a larger font, and the word cloud is displayed. More specifically, the calculated word frequencies are sorted from high to low, and the words ranked within a preset top number are highlighted to form the negative attitude word cloud.
The following method is used to calculate word frequencies in both the positive-attitude and the negative-attitude text information.
If the text information is mainly English, word frequency statistics are performed directly; if it is mainly Chinese, the Chinese characters are first segmented into words, and word frequency statistics are then performed on the segmented text.
Specifically, the probability that each pair of adjacent Chinese characters co-occurs is calculated, for example the probability that adjacent characters C and D co-occur in the Chinese text information and the probability that adjacent characters D and E co-occur in the Chinese text information.
The probability that adjacent characters C and D co-occur in the Chinese text information is calculated by the formula
[Formula image not reproduced: co-occurrence probability of adjacent characters C and D]
and the probability that adjacent characters D and E co-occur in the Chinese text information is calculated by the formula
[Formula image not reproduced: co-occurrence probability of adjacent characters D and E]
The quantities in these formulas are: the probability that C and D co-occur adjacently in the pre-trained corpus; the probability that D and E co-occur adjacently in the pre-trained corpus; the probabilities that C, D, and E each occur alone in the pre-trained corpus; the probability that C and D co-occur in the Chinese text information; and the probability that D and E co-occur in the Chinese text information. In the present application, the pre-trained corpus is a data set composed of a pre-collected number (for example, 1000 pieces) of Chinese text information.
If C is the first character of a sentence in the Chinese text information, C and D are grouped into the same word when the quantity in the (unreproduced) formula meets the condition relative to a preset value, and are assigned to different words otherwise. If E is the last character of a sentence, D and E are likewise grouped into the same word when the corresponding quantity meets the condition relative to a preset value, and are assigned to different words otherwise. If D is a character in the middle of a sentence, the two characters corresponding to the expression in the (unreproduced) formula are grouped into the same word.
Specifically, the word frequency M of a given word over the N pieces of text information (of positive or negative attitude) is calculated by the formula
[Formula image not reproduced: word frequency M]
where N is the total number of pieces of text information, x_j is the frequency of the word in text information X_j, and n is the number of the N pieces of text information in which the word appears.
Referring to Fig. 6, obtaining the viewpoint statistical chart and the heat viewpoint statistical chart comprises the following steps:
Step S610: perform viewpoint mining on the information strips.
Specifically, referring to Fig. 7, viewpoint mining on the information strips comprises the following sub-steps:
Step S710: sort all information strips by release time.
Specifically, all information strips are sorted from earliest to latest according to the release time of the microblog post each contains.
Step S720: perform data processing on the sorted information strips to remove invalid and redundant information and add missing information, obtaining the text information corresponding to each information strip.
Specifically, the data processing method for the information strips is the same as that for news information or comment information.
Step S730: extract keywords from the text information corresponding to each information strip.
Specifically, in the present application at most 5 keywords are preferably extracted from the text information of each information strip.
Step S740: perform keyword frequency statistics for the extracted keywords over the text information of all information strips, and obtain viewpoint keywords from these statistics.
Specifically, keyword frequency statistics are performed over the text information of all information strips by the formula
[Formula image not reproduced: word frequency M]
where N is the total number of pieces of text information corresponding to all information strips, M is the frequency of a given word over those N pieces, x_j is the frequency of the word in text information X_j, and n is the number of the N pieces of text information in which the word appears.
The keyword frequencies over the text information of all information strips are accumulated, and the keywords ranked within a preset range are taken as the viewpoint keywords of the microblog information. In the present application, the top 5 keywords are preferably taken as viewpoint keywords.
Step S750: search for the viewpoint keywords in the sorted information strips to obtain the text information of the information strips that contain them.
Specifically, each viewpoint keyword is searched for in the sorted information strips in order. As soon as the keyword appears in the text information of an information strip, that strip counts as matched and the search on it stops. If the number of matched information strips reaches the set upper limit, the search for that viewpoint keyword ends; otherwise the search continues with the next information strip. In the present application, the upper limit of matched information strips per viewpoint keyword is 5. The text information of each matched information strip is retrieved for later use in the viewpoint statistical chart.
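Steps S740 and S750 can be sketched as below. As a simplifying assumption, plain occurrence counts across strips stand in for the unreproduced frequency M. The defaults (top 5 viewpoint keywords, at most 5 matched strips per keyword) follow the text, and the input formats are hypothetical.

```python
from collections import Counter

def viewpoint_keywords(strip_keywords, top=5):
    """strip_keywords: the keywords extracted from each information strip.
    Accumulate counts across all strips and return the `top` most frequent
    (ties broken alphabetically)."""
    counts = Counter(k for kws in strip_keywords for k in kws)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [k for k, _ in ranked[:top]]

def match_strips(sorted_texts, keyword, limit=5):
    """Scan strip texts in release-time order and collect those containing
    `keyword`, stopping once `limit` strips have matched."""
    matched = []
    for text in sorted_texts:
        if keyword in text:
            matched.append(text)
            if len(matched) == limit:
                break
    return matched
```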
Step S620: generate a viewpoint statistical chart with accompanying text from the mined viewpoints and display it to the user.
Specifically, the viewpoint statistical chart is generated from the viewpoint keywords and their counted frequencies; preferably, the text information of the matched information strips is also displayed at the corresponding position on the chart.
Step S630: calculate heat from the generated information strips and obtain heat keywords.
Specifically, referring to Fig. 8, calculating heat from the information strips to obtain heat keywords comprises the following sub-steps:
Step S810: calculate the heat of each information strip.
Specifically, the heat of each information strip is calculated by the formula
[Formula image not reproduced: heat H of an information strip]
where H is the heat value of the information strip, W3 is the weight of the comment count, G is the number of comments of the strip, W4 is the weight of the like count, and Y is the number of likes of the strip. In the present application, preferably W3 = 70% and W4 = 30%.
Step S820: sort all information strips by their calculated heat values to obtain heat information strips.
The information strips are sorted by heat value from high to low, and the top-ranked strips are taken as heat information strips; in the present application, the top 5 are preferably used.
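Steps S810 and S820 can be sketched as follows. The per-strip heat formula is not reproduced; given the listed variables, this sketch assumes the weighted sum H = W3·G + W4·Y with the preferred weights 0.7 and 0.3. The tuple layout (text, comments, likes) is hypothetical.

```python
def strip_heat(comments, likes, w3=0.7, w4=0.3):
    """Assumed weighted sum: H = W3 * G + W4 * Y."""
    return w3 * comments + w4 * likes

def top_heat_strips(strips, k=5):
    """strips: (text, comments, likes) tuples. Return the k hottest strips,
    from high to low heat."""
    return sorted(strips, key=lambda s: strip_heat(s[1], s[2]), reverse=True)[:k]
```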
Step S830, performing data processing on the sorted information items to obtain the text information corresponding to them;
Specifically, data processing is performed on the sorted information items to remove invalid information, remove redundant information, and add missing information, yielding the text information corresponding to each item. The data processing method is the same as that used for the news information or the comment information.
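The three cleaning operations can be sketched as below. The text does not define what counts as invalid, redundant, or missing information, so this sketch assumes: links are noise, a record with no text left is invalid, an exact duplicate text is redundant, and absent or `None` fields are filled from a hypothetical `defaults` dictionary.

```python
import re
from typing import Dict, List

URL_RE = re.compile(r"https?://\S+")  # links treated as noise (an assumption)


def clean_items(records: List[Dict], defaults: Dict) -> List[Dict]:
    """Drop invalid and duplicate records and fill missing fields from `defaults`."""
    seen = set()
    cleaned: List[Dict] = []
    for rec in records:
        text = URL_RE.sub("", rec.get("text") or "").strip()
        if not text:        # invalid: nothing usable left
            continue
        if text in seen:    # redundant: exact duplicate of an earlier item
            continue
        seen.add(text)
        # missing: keep present values, fall back to defaults for absent/None fields
        present = {k: v for k, v in rec.items() if v is not None}
        cleaned.append({**defaults, **present, "text": text})
    return cleaned


records = [{"text": "Prices up http://example.com", "likes": None},
           {"text": "Prices up"}, {"text": "   "}]
print(clean_items(records, {"likes": 0, "comments": 0}))
# [{'likes': 0, 'comments': 0, 'text': 'Prices up'}]
```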
Step S840, extracting keywords from the text information corresponding to the heat information items after data processing;
Specifically, keywords are extracted from the processed heat information items; in the present application, at most 3 keywords are preferably extracted from each heat information item.
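The text does not say which keyword extractor is used, so the sketch below substitutes plain term frequency with a stopword filter, capped at 3 keywords per item. Whitespace tokenization is a simplification: Chinese text would first need word segmentation (for example with a segmenter such as jieba). The stopword list and function name are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on"}  # illustrative only


def extract_keywords(text: str, limit: int = 3,
                     stopwords: Iterable[str] = STOPWORDS) -> List[str]:
    """Return up to `limit` most frequent non-stopword tokens of a processed heat item."""
    stop = set(stopwords)
    tokens = [t for t in text.lower().split() if t not in stop]
    # most_common orders ties by first occurrence in the text
    return [word for word, _ in Counter(tokens).most_common(limit)]


print(extract_keywords("Rise in fuel price and rise in tax on fuel rise"))
# ['rise', 'fuel', 'price']
```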
Step S640, generating a heat viewpoint statistical chart from the calculated heat values and displaying the heat keywords on it.
Specifically, the heat viewpoint statistical chart is generated from the heat values of the heat information items; in addition, the text information and heat keywords of each heat information item can be displayed at the corresponding position on the chart.
The multidimensional public opinion monitoring system and method can thus monitor and analyze public opinion from multiple dimensions and in all directions in the era of media convergence, in which content and forms are rich and diverse.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is adopted only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the various embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims (7)

1. A multidimensional public opinion monitoring system, characterized by comprising: a hotspot discovery module (110), an emotion analysis module (120), and a viewpoint mining module (130);
wherein the hotspot discovery module (110) comprises: a first information acquisition sub-module (111), a first data processing sub-module (112), a word frequency statistics sub-module (113), a hot word cloud generation sub-module (114), a hot news generation sub-module (115), a hot microblog information acquisition sub-module (116), and a hotspot dynamic graph generation sub-module (117);
the first information acquisition sub-module (111) acquires news information from news websites;
the first data processing sub-module (112) performs data processing on the news information acquired by the first information acquisition sub-module (111) to remove invalid information, remove redundant information, and add missing information, obtaining text information;
the word frequency statistics sub-module (113) performs word frequency statistics on the text information obtained by the first data processing sub-module (112);
the hot word cloud generation sub-module (114) obtains hot words from the word frequencies counted by the word frequency statistics sub-module (113) and generates a hot word cloud picture from the hot words;
the hot news generation sub-module (115) displays, as hot news, the news titles and URLs corresponding to the hot words found by the hot word cloud generation sub-module (114);
the hot microblog information acquisition sub-module (116) uses the hot words found by the hot word cloud generation sub-module (114) as keywords and acquires microblog information containing those keywords;
the hotspot dynamic graph generation sub-module (117) preprocesses the microblog information acquired by the hot microblog information acquisition sub-module (116) and generates a hotspot dynamic graph from the preprocessed information;
the hotspot dynamic graph generation sub-module (117) sorts all the information items formed by the preprocessing by release time;
grouping all the sorted information items to form a plurality of information item groups;
one information item group is $[x_{11}, x_{21}, \ldots, x_{n1}]$, where $x_{11}, x_{21}, \ldots, x_{n1}$ are the information items of the group, so the group has n information items;
calculating the heat of each information item group;
calculating the emotion of each information item group;
collecting the microblog information directly or indirectly linked to the microblog information of each information item in the group and forming the linked microblog information into information items: the items linked to information item $x_{i1}$ are $x_{i2}, x_{i3}, \ldots, x_{im}$, where $m-1$ is the number of linked items, and $x_{i1}$ together with its linked items forms the information item set $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$;
extracting the association weights between the acquired microblog information and the linked microblog information, so that each information item set $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$ has a corresponding association weight set $Y_i = \{y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}\}$, where $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}$ are the association weights of $x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}$ and $y_{i1} = 1$;
constructing the information item group set $S_n = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ from the information item sets $X_i$ and their association weight sets $Y_i$;
inputting the information item group set into a preset classification model and training it to obtain different sub-classification models $f_t(a)$, $t = 1, 2, 3, \ldots, T$; classifying the information item group set with the sub-classification models $f_t(a)$ to obtain classification results, and obtaining the weight set $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$ through a formula [formula image not reproduced in the source text], where argmin returns the set of $\mu$ at which the expression [image not reproduced] attains its minimum;
calculating, by a particle swarm optimization algorithm, the optimal value of each weight in the weight set $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$ of the sub-classification models $f_t(a)$, and combining the sub-classification models $\{f_1(X), f_2(X), \ldots, f_T(X)\}$ according to the normalized optimal weight values to obtain a classification model;
acquiring the information item group set, inputting it into the obtained classification model, and classifying each information item group to obtain emotion vocabulary of different types;
calculating the emotion of each information item group according to the formula [formula image not reproduced in the source text], where $F$ is the emotion of the information item group, $N_p$ is the number of positive-emotion words in the classified information item group, $N_n$ is the number of negative-emotion words in the classified information item group, $w_{pi}$ is the weight of a positive-emotion word in the emotion dictionary, and $w_{pj}$ is the weight of a negative-emotion word in the emotion dictionary;
generating the heat dynamic graph of the hotspot dynamic graph from the release time and heat of each information item group;
generating the emotion distribution dynamic graph of the hotspot dynamic graph from the release time and emotion of each information item group;
the emotion analysis module (120) comprises: a second information acquisition sub-module (121), a second data processing sub-module (122), an emotion analysis sub-module (123), an emotion distribution map sub-module (124), a positive attitude word cloud generation sub-module (125), and a negative attitude word cloud generation sub-module (126);
the second information acquisition sub-module (121) acquires comment information posted on websites and forums;
the second data processing sub-module (122) performs data processing on the comment information acquired by the second information acquisition sub-module (121) to remove invalid information, remove redundant information, and add missing information, obtaining text information;
the emotion analysis sub-module (123) performs emotion analysis on the text information obtained by the second data processing sub-module (122) to obtain the proportion of each emotion across all the text information, the text information with a positive attitude, and the text information with a negative attitude;
the emotion distribution map sub-module (124) generates an emotion distribution map from the proportion of each emotion obtained by the emotion analysis sub-module (123);
the positive attitude word cloud generation sub-module (125) generates a positive attitude word cloud picture from the positive-attitude text information obtained by the emotion analysis sub-module (123);
the negative attitude word cloud generation sub-module (126) generates a negative attitude word cloud picture from the negative-attitude text information obtained by the emotion analysis sub-module (123);
the viewpoint mining module (130) comprises: a third information acquisition sub-module (131), a third data processing sub-module (132), a viewpoint mining sub-module (133), a viewpoint statistical graphic-text generation sub-module (134), a heat viewpoint mining sub-module (135), and a heat statistical graphic-text generation sub-module (136);
the third information acquisition sub-module (131) acquires microblog information posted on microblogs;
the third data processing sub-module (132) preprocesses the microblog information acquired by the third information acquisition sub-module (131) to generate information items;
the viewpoint mining sub-module (133) performs viewpoint mining on the information items obtained by the third data processing sub-module (132);
the viewpoint statistical graphic-text generation sub-module (134) generates viewpoint statistical graphics and text from the viewpoints mined by the viewpoint mining sub-module (133);
the heat viewpoint mining sub-module (135) performs heat calculation on the information items generated by the third data processing sub-module (132) and obtains heat keywords;
the heat statistical graphic-text generation sub-module (136) generates a heat viewpoint statistical chart from the heat values calculated by the heat viewpoint mining sub-module (135) and displays the heat keywords on the chart.
2. The multidimensional public opinion monitoring system according to claim 1, wherein the viewpoint mining sub-module (133) sorts all the information items by release time; performs data processing on the sorted information items to remove invalid information, remove redundant information, and add missing information, obtaining the text information corresponding to the information items; extracts keywords from that text information; performs keyword word frequency statistics on the extracted keywords across the text information of all the information items and obtains viewpoint keywords from the keyword word frequencies; and searches the sorted information items for the viewpoint keywords to obtain the text information of the information items containing them, thereby performing viewpoint mining on the information items.
3. A multidimensional public opinion monitoring method, characterized by comprising the following steps:
Step S210, collecting news information, comment information, and microblog information respectively;
Step S220, performing data processing on the collected news information, comment information, and microblog information to obtain text information or information items;
Step S230, processing the text information or information items to obtain a hot word cloud picture, hot news, a hotspot dynamic graph, an emotion distribution map, a positive attitude word cloud picture, a negative attitude word cloud picture, viewpoint statistical graphics and text, and a heat viewpoint statistical chart, and displaying them to a user;
wherein obtaining the hot word cloud picture, the hot news, and the hotspot dynamic graph specifically comprises the following sub-steps:
Step S231, performing word frequency statistics on the obtained text information;
Step S232, obtaining hot words from the word frequencies in the text information and generating a hot word cloud picture from the hot words;
Step S233, displaying the news titles and URLs corresponding to the hot words as hot news;
Step S234, using the hot words as keywords and collecting microblog information containing the keywords;
Step S235, preprocessing the collected microblog information and generating a hotspot dynamic graph from the preprocessed information;
wherein generating the hotspot dynamic graph from the preprocessed information comprises the following sub-steps:
Step S410, sorting all the information items formed by the preprocessing by release time;
Step S420, grouping all the sorted information items to form a plurality of information item groups;
one information item group is $[x_{11}, x_{21}, \ldots, x_{n1}]$, where $x_{11}, x_{21}, \ldots, x_{n1}$ are the information items of the group, so the group has n information items;
Step S430, calculating the heat of each information item group;
Step S440, calculating the emotion of each information item group;
collecting the microblog information directly or indirectly linked to the microblog information of each information item in the group and forming the linked microblog information into information items: the items linked to information item $x_{i1}$ are $x_{i2}, x_{i3}, \ldots, x_{im}$, where $m-1$ is the number of linked items, and $x_{i1}$ together with its linked items forms the information item set $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$;
extracting the association weights between the acquired microblog information and the linked microblog information, so that each information item set $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}\}$ has a corresponding association weight set $Y_i = \{y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}\}$, where $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{im}$ are the association weights of $x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}$ and $y_{i1} = 1$;
constructing the information item group set $S_n = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ from the information item sets $X_i$ and their association weight sets $Y_i$;
inputting the information item group set into a preset classification model and training it to obtain different sub-classification models $f_t(a)$, $t = 1, 2, 3, \ldots, T$; classifying the information item group set with the sub-classification models $f_t(a)$ to obtain classification results, and obtaining the weight set $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$ through a formula [formula image not reproduced in the source text], where argmin returns the set of $\mu$ at which the expression [image not reproduced] attains its minimum;
calculating, by a particle swarm optimization algorithm, the optimal value of each weight in the weight set $\{\mu_1, \mu_2, \mu_3, \ldots, \mu_T\}$ of the sub-classification models $f_t(a)$, and combining the sub-classification models $\{f_1(X), f_2(X), \ldots, f_T(X)\}$ according to the normalized optimal weight values to obtain a classification model;
acquiring the information item group set, inputting it into the obtained classification model, and classifying each information item group to obtain emotion vocabulary of different types;
calculating the emotion of each information item group according to the formula [formula image not reproduced in the source text], where $F$ is the emotion of the information item group, $N_p$ is the number of positive-emotion words in the classified information item group, $N_n$ is the number of negative-emotion words in the classified information item group, $w_{pi}$ is the weight of a positive-emotion word in the emotion dictionary, and $w_{pj}$ is the weight of a negative-emotion word in the emotion dictionary;
Step S450, generating the heat dynamic graph of the hotspot dynamic graph from the release time and heat of each information item group;
and Step S460, generating the emotion distribution dynamic graph of the hotspot dynamic graph from the release time and emotion of each information item group.
4. The multidimensional public opinion monitoring method according to claim 3, wherein the emotion distribution map, the positive attitude word cloud picture, and the negative attitude word cloud picture are obtained by the following steps:
Step S510, performing emotion analysis on the text information to obtain the proportion of each emotion across all the text information, the positive-attitude text information, and the negative-attitude text information;
Step S520, generating an emotion distribution map from the obtained proportion of each emotion;
Step S530, generating a positive attitude word cloud picture from the obtained positive-attitude text information;
and Step S540, generating a negative attitude word cloud picture from the obtained negative-attitude text information.
5. The multidimensional public opinion monitoring method according to claim 3, wherein the viewpoint statistical graphics and text and the heat viewpoint statistical chart are obtained by the following steps:
Step S610, performing viewpoint mining on the information items;
Step S620, generating viewpoint statistical graphics and text from the mined viewpoints;
Step S630, performing heat calculation on the generated information items and obtaining heat keywords;
and Step S640, generating a heat viewpoint statistical chart from the calculated heat values and displaying the heat keywords on the chart.
6. The multidimensional public opinion monitoring method according to claim 5, wherein performing viewpoint mining on the information items comprises the following sub-steps:
Step S710, sorting all the information items by release time;
Step S720, performing data processing on the sorted information items to remove invalid information, remove redundant information, and add missing information, obtaining the text information corresponding to the information items;
Step S730, extracting keywords from the text information corresponding to the information items;
Step S740, performing keyword word frequency statistics on the extracted keywords across the text information of all the information items, so as to obtain viewpoint keywords from the keyword word frequencies;
Step S750, searching the sorted information items for the viewpoint keywords to obtain the text information of the information items containing them.
7. The multidimensional public opinion monitoring method according to claim 5, wherein performing heat calculation on the information items to obtain the heat keywords comprises the following sub-steps:
Step S810, performing heat calculation on each information item;
Step S820, sorting all the information items by the calculated heat value to obtain the heat information items;
Step S830, performing data processing on the sorted information items to obtain the text information corresponding to them;
and Step S840, extracting keywords from the text information corresponding to the heat information items after data processing.
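Claims 1 and 3 obtain the sub-classifier weights μ1 … μT by particle swarm optimization and combine the sub-classifiers with the normalized optimal weights. Because the objective appears only as an image in the source, the sketch below assumes a mean-squared-error objective between the weighted combination Σ μt·ft(x) and known labels. The particle count, iteration count, inertia and acceleration coefficients, the non-negativity clamp, and all function names are illustrative assumptions, not the applicant's settings.

```python
import random
from typing import List, Sequence


def ensemble_loss(mu: Sequence[float], preds: List[List[float]], labels: List[float]) -> float:
    """Assumed objective: mean squared error of sum_t mu_t * f_t(x) against the labels."""
    total = 0.0
    for j, y in enumerate(labels):
        combined = sum(m * p[j] for m, p in zip(mu, preds))
        total += (combined - y) ** 2
    return total / len(labels)


def pso_weights(preds: List[List[float]], labels: List[float], particles: int = 20,
                iters: int = 100, inertia: float = 0.7, c1: float = 1.5,
                c2: float = 1.5, seed: int = 0) -> List[float]:
    """Search for non-negative weights with PSO, then normalize them to sum to 1."""
    rng = random.Random(seed)
    dim = len(preds)  # T sub-classification models
    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [ensemble_loss(p, preds, labels) for p in pos]
    best = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                pos[i][d] = max(0.0, pos[i][d] + vel[i][d])          # keep weights non-negative
            val = ensemble_loss(pos[i], preds, labels)
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    total = sum(gbest) or 1.0
    return [w / total for w in gbest]  # normalized optimal weights


def combined_score(mu: Sequence[float], scores: Sequence[float]) -> float:
    """Combined classifier output: weighted sum of the sub-classifier scores."""
    return sum(m * s for m, s in zip(mu, scores))


# Two hypothetical sub-classifiers scored on four samples; the first matches the labels.
preds = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
labels = [1.0, 0.0, 1.0, 0.0]
mu = pso_weights(preds, labels)
print(mu, combined_score(mu, [1.0, 0.0]))
```

PSO is used here because it needs only loss evaluations, not gradients, which suits an objective whose exact form may be non-differentiable; the final normalization mirrors the claims' use of normalized optimal weights.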
CN202011573978.9A 2020-12-28 2020-12-28 Multi-dimensional public opinion monitoring system and method Expired - Fee Related CN112417253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011573978.9A CN112417253B (en) 2020-12-28 2020-12-28 Multi-dimensional public opinion monitoring system and method

Publications (2)

Publication Number Publication Date
CN112417253A CN112417253A (en) 2021-02-26
CN112417253B true CN112417253B (en) 2021-10-15

Family

ID=74782620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011573978.9A Expired - Fee Related CN112417253B (en) 2020-12-28 2020-12-28 Multi-dimensional public opinion monitoring system and method

Country Status (1)

Country Link
CN (1) CN112417253B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965931A (en) * 2015-07-30 2015-10-07 成都布林特信息技术有限公司 Big data based public opinion analysis method
CN105224608A (en) * 2015-09-06 2016-01-06 华南理工大学 The hot news Forecasting Methodology analyzed based on microblog data and system
CN108710654A (en) * 2018-05-10 2018-10-26 新华智云科技有限公司 A kind of public sentiment data method for visualizing and equipment
CN109783815A (en) * 2018-12-28 2019-05-21 华南理工大学 A kind of various dimensions network public-opinion big data comparative analysis method
CN111538888A (en) * 2020-06-05 2020-08-14 国网山东省电力公司检修公司 Network public opinion intensity evolution analysis system based on active monitoring engine and big data

Also Published As

Publication number Publication date
CN112417253A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
Hu et al. Text analytics in social media
US7685091B2 (en) System and method for online information analysis
Jiang et al. Mining search and browse logs for web search: A survey
CN107577759A (en) User comment auto recommending method
CN104268148B (en) A kind of forum page Information Automatic Extraction method and system based on time string
CN110705288A (en) Big data-based public opinion analysis system
US8712999B2 (en) Systems and methods for online search recirculation and query categorization
CN111914087B (en) Public opinion analysis method
Geçkil et al. A clickbait detection method on news sites
CN111309936A (en) Method for constructing portrait of movie user
CN105378730A (en) Social media content analysis and output
CN106557558A (en) A kind of data analysing method and device
CN102542066A (en) Video clustering method, ordering method, video searching method and corresponding devices
Schinas et al. Event detection and retrieval on social media
Viet et al. Analyzing recent research trends of computer science from academic open-access digital library
CN115017302A (en) Public opinion monitoring method and public opinion monitoring system
Chen et al. Towards topic trend prediction on a topic evolution model with social connection
CN113282817A (en) Webpage content intelligent collection processing method and system based on webpage search engine data analysis and computer storage medium
CN112417253B (en) Multi-dimensional public opinion monitoring system and method
Ishikawa et al. T-scroll: Visualizing trends in a time-series of documents for interactive user exploration
Kumari et al. Performance improvement of web page genre classification
CN115640439A (en) Method, system and storage medium for network public opinion monitoring
Lalitha et al. Potential Web Content Identification and Classification System using NLP and Machine Learning Techniques
Alotaibi et al. A Comparison of Topic Modeling Algorithms on Visual Social Media Networks
Li et al. A Method of Interest Degree Mining Based on Behavior Data Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211022

Address after: 450003 No. 2, Jingwu Road, Jinshui District, Zhengzhou City, Henan Province

Patentee after: Wang Sanshan

Address before: 100020 523, 4th floor, building 15, xinzhaojiayuan, Chaoyang District, Beijing

Patentee before: Time know (Beijing) Culture Technology Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211015