CN117370621A - Big data-based foreign language speech multilingual public opinion monitoring and early warning system - Google Patents

Publication number
CN117370621A
Authority
CN
China
Prior art keywords
public opinion
information
semantic
rule
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311221563.9A
Other languages
Chinese (zh)
Inventor
杨明星
吴丽华
刘林君
牛桂玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University
Publication of CN117370621A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse, in the technical field of data management. It comprises the following steps: a keyword setting step, a public opinion information acquisition step, a public opinion information cleaning step, a public opinion information semantic analysis step, a sensitive/hot-spot information judgment step, an emergency directional tracking step, and a visual analysis report generation step. Based on Internet information collection, text mining, and intelligent retrieval, the method finds and quickly collects the required online public opinion information. Using analysis and filtering, automatic clustering, topic monitoring, and statistical analysis, it performs routine, cross-time and cross-space directional tracking, trend analysis, and timely early warning on online feedback in fields such as major national policies, hot topics, emergencies, and external publicity.

Description

Big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse
Technical Field
The invention relates to the technical field of data management, and in particular to a big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse.
Background
Public opinion monitoring is an extension of diplomatic discourse work and news publicity work. The Internet has brought people into the big data era and gives government institutions, non-governmental organizations, people at home and abroad, and others a new carrier for expressing views or leanings. Online speech is extremely active, and the use of the Internet to express views or spread opinion has made online public opinion, and even public opinion pressure, an indisputable fact. The Internet is a double-edged sword: online public opinion is a problem that no country or department can ignore, and a diplomatic discourse public opinion monitoring system is also indispensable when constructing a diplomatic discourse parallel corpus.
A big data-based public opinion monitoring and early warning system for diplomatic discourse refers to a method that, using professional public opinion software and based on Internet information collection, text mining, and intelligent retrieval, finds and quickly collects the required online public opinion information, and that uses analysis and filtering, automatic clustering, topic monitoring, statistical analysis, and similar techniques to perform routine, cross-time and cross-space directional tracking, trend analysis, and timely early warning on online feedback in fields such as hot topics, emergencies, and external publicity. Providing such a system is therefore a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse, which performs routine, cross-time and cross-space directional tracking, trend analysis, and timely early warning on online feedback in the target field.
To achieve the above purpose, the invention adopts the following technical solution:
The big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse is built on a diplomatic discourse parallel corpus and specifically comprises the following steps:
S101, keyword setting step: set the keywords or terms to be monitored for multilingual diplomatic discourse public opinion;
S201, public opinion information acquisition step: based on Internet information collection, text mining, and intelligent retrieval, find and collect in a timely manner the online public opinion information related to the monitored keywords or terms;
S301, public opinion information cleaning step: clean and purify the collected online public opinion information and remove content irrelevant to the topic;
S401, public opinion information semantic analysis step: extract the key points of the public opinion information through semantic analysis to support public opinion early warning;
S501, sensitive/hot-spot information judgment step: obtain sensitive or hot-spot information by automatic screening and/or manual review of the key points of the public opinion information;
S601, emergency directional tracking step: based on the sensitive or hot-spot information obtained, perform routine, cross-time and cross-space directional tracking, trend analysis, and timely early warning for the corresponding field;
S701, visual analysis report generation step: generate a multi-dimensional public opinion event analysis report to assist in formulating handling plans and measures for multilingual diplomatic discourse public opinion.
In the above method, optionally, the S201 public opinion information acquisition step further includes converting semi-structured web page data into structured text.
In the above method, optionally, the S401 public opinion information semantic analysis step specifically comprises:
establishing a semantic analysis model; obtaining a training sample library with a semantic analysis structure; training the semantic analysis model on the training sample library to obtain a trained semantic analysis model; and inputting each item of the original public opinion information into the trained model to obtain a primary semantic analysis result.
In the above method, optionally, the S401 public opinion information semantic analysis step further includes:
first, obtaining a seed word and traversing a lexicon to find words whose meaning is similar to that of the seed word, thereby obtaining a synonym lexicon; when the synonym lexicon contains no remaining unassigned words whose meaning is similar to the seed word, establishing a word family. Then, obtaining an original semantic rule, splitting it into several rule strings, and identifying the order information of those rule strings; using the rule strings and the order information to judge whether the format of the original semantic rule is correct; when the format is correct, judging whether the logic of the original semantic rule is correct; and when the logic is correct, adding the original semantic rule to a semantic rule library. The semantic rule library established in this way comprises several semantic rule types, and each semantic rule type comprises several word families arranged according to preset semantic logic;
splitting the primary semantic analysis result into several primary result rule strings, identifying the semantic order information of those strings, and searching the semantic rule library for a semantic rule whose order information is the same. If such a rule exists, it is taken as the secondary analysis result, and the secondary analysis result is used as the final analysis result; if no such rule exists, the primary semantic analysis result is used as the final analysis result.
In the above method, optionally, in the S501 sensitive/hot-spot information judgment step, important public opinion is determined according to public opinion popularity or public opinion type.
In the above method, optionally, in the S501 sensitive/hot-spot information judgment step, determining important public opinion according to public opinion popularity includes:
obtaining the popularity of each item in the public opinion information, where popularity is represented by click count;
when the popularity exceeds a popularity threshold, determining the corresponding public opinion information to be important public opinion; or determining the N items with the highest popularity at the same point in time to be important public opinion, where N is an integer not less than 1.
In the above method, optionally, in the S501 sensitive/hot-spot information judgment step, determining important public opinion according to public opinion type includes:
obtaining the type of each item in the public opinion information, where the type is obtained through a machine learning algorithm;
matching the public opinion type against the target type and, when the match succeeds, determining the corresponding public opinion information to be important public opinion.
A diplomatic discourse big data public opinion monitoring system is used to execute the big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse, and comprises a keyword setting module, a public opinion information acquisition module, a public opinion information cleaning module, a public opinion information semantic analysis module, a sensitive/hot-spot information judgment module, an emergency directional tracking module, and a visual analysis report generation module, connected in sequence;
the keyword setting module is used to set the keywords or terms to be monitored for multilingual diplomatic discourse public opinion;
the public opinion information acquisition module is used to find and collect in a timely manner, based on Internet information collection, text mining, and intelligent retrieval, the online public opinion information related to the monitored keywords or terms;
the public opinion information cleaning module is used to clean and purify the collected online public opinion information and remove content irrelevant to the topic;
the public opinion information semantic analysis module is used to extract the key points of the public opinion information through semantic analysis to support public opinion early warning;
the sensitive/hot-spot information judgment module is used to obtain sensitive or hot-spot information by automatic screening and/or manual review of the key points of the public opinion information;
the emergency directional tracking module is used to perform, based on the sensitive or hot-spot information obtained, routine, cross-time and cross-space directional tracking, trend analysis, and timely early warning for the corresponding field;
and the visual analysis report generation module is used to generate a multi-dimensional public opinion event analysis report to assist in formulating handling plans and measures for multilingual diplomatic discourse public opinion.
A storage medium comprises stored instructions, wherein, when the instructions run, the device on which the storage medium resides is controlled to execute the big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse.
An electronic device comprises a memory, one or more processors, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse.
As can be seen from the above technical solution, compared with the prior art, the invention provides a big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse: based on Internet information collection, text mining, and intelligent retrieval, it finds and quickly collects the required online public opinion information, and through analysis and filtering, automatic clustering, topic monitoring, statistical analysis, and similar techniques it performs routine, cross-time and cross-space directional tracking, trend analysis, and timely early warning on overseas online feedback in fields such as hot topics, emergencies, and external publicity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only embodiments of the present invention, and other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of the big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse provided by the invention;
FIG. 2 is a flowchart of the public opinion information semantic analysis step provided by the invention;
FIG. 3 is a schematic structural diagram of the diplomatic discourse big data public opinion monitoring system provided by the invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In this application, relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions, and the terms "comprise," "include," or any other variation thereof, are intended to cover a non-exclusive inclusion such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Referring to FIG. 1, the invention discloses a big data-based multilingual public opinion monitoring and early warning system for diplomatic discourse, which is built on a diplomatic discourse parallel corpus and specifically comprises the following steps:
S101, keyword setting step: set the keywords or terms to be monitored for multilingual diplomatic discourse public opinion;
S201, public opinion information acquisition step: based on Internet information collection, text mining, and intelligent retrieval, find and collect in a timely manner the online public opinion information related to the monitored keywords or terms;
S301, public opinion information cleaning step: clean and purify the collected online public opinion information and remove content irrelevant to the topic;
S401, public opinion information semantic analysis step: extract the key points of the public opinion information through semantic analysis to support public opinion early warning;
S501, sensitive/hot-spot information judgment step: obtain sensitive or hot-spot information by automatic screening and/or manual review of the key points of the public opinion information;
S601, emergency directional tracking step: based on the sensitive or hot-spot information obtained, perform routine, cross-time and cross-space directional tracking, trend analysis, and timely early warning for the corresponding field;
S701, visual analysis report generation step: generate a multi-dimensional public opinion event analysis report to assist in formulating handling plans and measures for multilingual diplomatic discourse public opinion.
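The sequential flow of the steps above can be pictured as a chain of stages, each consuming the previous stage's output. The following is only a minimal sketch: `run_pipeline`, `SAMPLE_PAGES`, and the three stand-in stages are hypothetical names introduced for illustration, and the real steps (crawling, semantic analysis, tracking) would be far more involved.

```python
from typing import Any, Callable, List, Tuple

# A stage is a (step id, function) pair; step ids follow the text (S201, S301, ...).
Stage = Tuple[str, Callable[[Any], Any]]

def run_pipeline(keywords: List[str], stages: List[Stage]) -> Tuple[Any, List[str]]:
    """Run the stages in order, feeding each stage the previous stage's output."""
    data: Any = keywords
    trace: List[str] = []
    for step_id, stage in stages:
        data = stage(data)
        trace.append(step_id)
    return data, trace

# Hypothetical stand-ins for real collection, cleaning, and reporting.
SAMPLE_PAGES = ["Tariff talks resume", "Tariff talks resume ", "Weather today"]

def collect(keywords: List[str]) -> List[str]:
    # Keep only pages that mention at least one monitored keyword.
    return [p for p in SAMPLE_PAGES if any(k.lower() in p.lower() for k in keywords)]

def clean(pages: List[str]) -> List[str]:
    # Trim whitespace and drop duplicates.
    return sorted({p.strip() for p in pages})

def report(pages: List[str]) -> dict:
    # Summarize what survived the earlier stages.
    return {"count": len(pages), "items": pages}

STAGES: List[Stage] = [("S201", collect), ("S301", clean), ("S701", report)]
result, trace = run_pipeline(["tariff"], STAGES)
```

Keeping each step behind the same call shape mirrors the "connected in sequence" module layout and lets a stage be swapped without touching the others.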
Specifically, in S101, the following points should be noted when setting the keywords or terms to be monitored for multilingual diplomatic discourse public opinion:
Multilingual coverage: keywords and terms in different languages should be considered to ensure coverage of public opinion information worldwide. For example, terms such as "diplomacy" and "international relations" may be used in English, while the corresponding terms "外交" and "国际关系" may be used in Chinese.
Polysemy: some keywords or terms have several meanings and need to be disambiguated for the specific topic. For example, "tariffs" may refer both to international trade tariffs in general and to the tariff on a particular product in a particular country.
Spelling and grammar: the spelling and grammar rules of each language need to be considered so that keywords and terms can be correctly identified and matched.
Sensitive words: diplomatic topics often involve sensitive issues, so the related keywords and terms need to be handled carefully to avoid misjudgment or false alarms.
Monitoring frequency: public opinion on diplomatic topics often changes rapidly, so an appropriate monitoring frequency needs to be set in order to find and respond to relevant public opinion information in time.
Regional characteristics: public opinion monitoring of diplomatic topics also needs to consider the characteristics and cultural backgrounds of different regions in order to better understand and analyze the related public opinion information. For example, attitudes and reactions to diplomatic topics may differ between regions.
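As a rough illustration of the multilingual coverage and spelling points above, keywords can be grouped by language and compared after Unicode normalization and case folding. This is a sketch under stated assumptions: the `KEYWORDS` table, the language codes, and the substring matching are hypothetical choices, not something the text prescribes, and substring matching does not address polysemy.

```python
import unicodedata
from typing import Dict, List

# Hypothetical keyword table: language code -> terms to monitor.
KEYWORDS: Dict[str, List[str]] = {
    "en": ["diplomacy", "international relations"],
    "zh": ["外交", "国际关系"],
    "fr": ["diplomatie"],
}

def normalize(text: str) -> str:
    """Unicode-normalize and case-fold so spelling variants compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()

def match_keywords(text: str, keywords: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per language, the monitored terms that occur in the text."""
    norm = normalize(text)
    hits: Dict[str, List[str]] = {}
    for lang, terms in keywords.items():
        found = [t for t in terms if normalize(t) in norm]
        if found:
            hits[lang] = found
    return hits

hits = match_keywords("DIPLOMACY and 外交 news", KEYWORDS)
```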
Further, the S201 public opinion information acquisition step also includes converting semi-structured web page data into structured text.
Specifically, collection software is used to gather text of all kinds from the Internet, including news, blogs, forums, and microblogs; common collection software includes web crawlers and web crawler frameworks.
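The conversion of semi-structured web page data into structured text mentioned above might look like the following sketch, which uses Python's standard `html.parser` module. The `PageExtractor` class, the `to_record` helper, the choice of `<title>` and `<p>` as the fields to keep, and the example URL are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional

class PageExtractor(HTMLParser):
    """Collect the <title> text and the text of each <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: List[str] = []
        self._tag: Optional[str] = None  # the title/p element currently open

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._tag = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title":
            self.title += data
        elif self._tag == "p":
            self.paragraphs[-1] += data

def to_record(html: str, url: str) -> dict:
    """Turn one HTML page into a flat record with url, title, and body fields."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"url": url, "title": parser.title.strip(), "body": body}

page = ("<html><head><title> Talks resume </title></head>"
        "<body><p>First <b>bold</b> point.</p><p>Second.</p></body></html>")
record = to_record(page, "https://example.org/a")
```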
Specifically, S301 includes cleaning, de-duplicating, and filtering the collected text data to ensure the accuracy and integrity of the data. Common data cleaning software includes OpenRefine and Data Wrangler.
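The cleaning, de-duplication, and filtering described above can be sketched in a few lines. The function name `clean_documents` and the rule "keep a text only if it mentions a monitored keyword" are illustrative assumptions; the text does not say how topical relevance is decided.

```python
import re
from typing import Iterable, List

def clean_documents(docs: Iterable[str], keywords: List[str]) -> List[str]:
    """Normalize whitespace, drop duplicates, and keep only on-topic texts."""
    seen = set()
    cleaned: List[str] = []
    for doc in docs:
        text = re.sub(r"\s+", " ", doc).strip()  # collapse runs of whitespace
        key = text.casefold()                     # case-insensitive duplicate key
        if not text or key in seen:
            continue
        seen.add(key)
        if any(k.casefold() in key for k in keywords):
            cleaned.append(text)
    return cleaned

docs = ["Tariff  talks\nresume", "tariff talks resume", "", "Sports news"]
kept = clean_documents(docs, ["tariff"])
```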
Further, in S401, text mining software such as RapidMiner, KNIME, or Weka may be used as auxiliary support.
Further, the S401 public opinion information semantic analysis step specifically comprises:
establishing a semantic analysis model; obtaining a training sample library with a semantic analysis structure; training the semantic analysis model on the training sample library to obtain a trained semantic analysis model; and inputting each item of the original public opinion information into the trained model to obtain a primary semantic analysis result.
Further, the S401 public opinion information semantic analysis step also includes the following:
Referring to FIG. 2, first obtain a seed word and traverse a lexicon to find words whose meaning is similar to that of the seed word, thereby obtaining a synonym lexicon; when the synonym lexicon contains no remaining unassigned words whose meaning is similar to the seed word, establish a word family. Then obtain an original semantic rule, split it into several rule strings, and identify the order information of those rule strings; use the rule strings and the order information to judge whether the format of the original semantic rule is correct; when the format is correct, judge whether the logic of the original semantic rule is correct; and when the logic is correct, add the original semantic rule to a semantic rule library. The semantic rule library established in this way comprises several semantic rule types, and each semantic rule type comprises several word families arranged according to preset semantic logic;
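One possible reading of the word-family and rule-checking procedure above is sketched below. Several things here are assumptions rather than details from the text: the `LEXICON` similarity table, the transitive expansion from the seed word, the `+` separator between rule strings, and the format check "every rule string names a known word family". The logic check is left out because the text does not define it.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical similarity lexicon: word -> words judged close in meaning.
LEXICON: Dict[str, Set[str]] = {
    "condemn": {"denounce", "criticize"},
    "denounce": {"condemn"},
    "criticize": {"condemn", "rebuke"},
    "rebuke": {"criticize"},
    "welcome": {"applaud"},
    "applaud": {"welcome"},
}

def build_word_family(seed: str, lexicon: Dict[str, Set[str]]) -> Set[str]:
    """Expand from the seed until no unvisited similar word remains."""
    family = {seed}
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        for similar in lexicon.get(word, ()):
            if similar not in family:
                family.add(similar)
                queue.append(similar)
    return family

def parse_rule(rule: str, family_names: Set[str]) -> List[str]:
    """Split a rule into ordered rule strings; raise if the format is malformed."""
    parts = [p.strip() for p in rule.split("+")]
    if len(parts) < 2 or any(p not in family_names for p in parts):
        raise ValueError(f"malformed rule: {rule!r}")
    return parts

family = build_word_family("condemn", LEXICON)
rule = parse_rule("ACTOR + CONDEMN + TARGET", {"ACTOR", "CONDEMN", "TARGET"})
```

A rule that passes `parse_rule` would then go through the logic check before being stored in the rule library.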
Split the primary semantic analysis result into several primary result rule strings, identify the semantic order information of those strings, and search the semantic rule library for a semantic rule whose order information is the same. If such a rule exists, it is taken as the secondary analysis result, and the secondary analysis result is used as the final analysis result; if no such rule exists, the primary semantic analysis result is used as the final analysis result.
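The lookup-with-fallback described above can be illustrated as follows. This is a simplified sketch: `WORD_FAMILIES`, `RULE_LIBRARY`, whitespace tokenization, and the decision to skip words that belong to no family are all hypothetical choices made so the example stays small.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical word families and rule library (ordered family names -> label).
WORD_FAMILIES: Dict[str, Set[str]] = {
    "ACTOR": {"ministry", "spokesperson"},
    "CONDEMN": {"condemns", "denounces", "criticizes"},
    "TARGET": {"sanctions", "tariffs"},
}
RULE_LIBRARY: Dict[Tuple[str, ...], str] = {
    ("ACTOR", "CONDEMN", "TARGET"): "official criticism of a measure",
}

def family_of(token: str) -> Optional[str]:
    """Return the name of the word family containing the token, if any."""
    for name, words in WORD_FAMILIES.items():
        if token in words:
            return name
    return None

def refine(primary_result: str) -> str:
    """Return the matching rule's label, or fall back to the primary result."""
    tokens = primary_result.lower().split()
    families = (family_of(t) for t in tokens)
    sequence = tuple(f for f in families if f is not None)
    return RULE_LIBRARY.get(sequence, primary_result)

matched = refine("Spokesperson denounces tariffs")
unmatched = refine("Weather improves")
```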
Further, in the S501 sensitive/hot-spot information judgment step, important public opinion is determined according to public opinion popularity or public opinion type.
Further, in the S501 sensitive/hot-spot information judgment step, determining important public opinion according to public opinion popularity includes:
obtaining the popularity of each item in the public opinion information, where popularity is represented by click count;
when the popularity exceeds a popularity threshold, determining the corresponding public opinion information to be important public opinion; or determining the N items with the highest popularity at the same point in time to be important public opinion, where N is an integer not less than 1.
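Both selection modes above, the threshold and the top N, are straightforward to express. In this sketch the function names and the sample click counts are hypothetical, and popularity is taken to be a plain click count per item as stated in the text.

```python
import heapq
from typing import Dict, List

def by_threshold(clicks: Dict[str, int], threshold: int) -> List[str]:
    """Items whose click count is strictly greater than the threshold."""
    return [item for item, n in clicks.items() if n > threshold]

def top_n(clicks: Dict[str, int], n: int) -> List[str]:
    """The n most-clicked items, most popular first; n must be at least 1."""
    if n < 1:
        raise ValueError("n must be an integer not less than 1")
    return heapq.nlargest(n, clicks, key=clicks.get)

# Hypothetical click counts observed at the same point in time.
clicks = {"a": 120, "b": 900, "c": 40}
above = by_threshold(clicks, 100)
leaders = top_n(clicks, 2)
```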
Further, in the S501 sensitive/hot-spot information judgment step, determining important public opinion according to public opinion type includes:
obtaining the type of each item in the public opinion information, where the type is obtained through a machine learning algorithm;
matching the public opinion type against the target type and, when the match succeeds, determining the corresponding public opinion information to be important public opinion.
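The type-matching step above reduces to filtering by a classifier's output. In the sketch below the classifier is passed in as a function; `toy_classifier` is only a keyword-based stand-in for the trained machine learning model the text refers to, and the type labels are invented for illustration.

```python
from typing import Callable, Iterable, List, Set

def select_by_type(items: Iterable[str],
                   classify: Callable[[str], str],
                   target_types: Set[str]) -> List[str]:
    """Keep the items whose predicted type is one of the target types."""
    return [text for text in items if classify(text) in target_types]

def toy_classifier(text: str) -> str:
    """Stand-in for a trained model: assigns a type from simple cue words."""
    lowered = text.lower()
    if "sanction" in lowered or "tariff" in lowered:
        return "trade"
    if "earthquake" in lowered:
        return "emergency"
    return "other"

items = ["New tariff announced", "Earthquake hits coast", "Film review"]
important = select_by_type(items, toy_classifier, {"emergency"})
```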
Specifically, machine learning and artificial intelligence software such as TensorFlow, PyTorch, and Keras can be used to process public opinion information automatically and intelligently, including text classification, sentiment analysis, and topic identification.
Further, S701 specifically consists of visualizing the results of text mining and intelligent retrieval so that users can understand and analyze the public opinion information more intuitively. Common visualization software includes Tableau, Power BI, and Gephi.
In view of the particular nature and sensitivity of the fields of diplomatic discourse and external publicity, custom settings for the diplomatic discourse public opinion monitoring system can be considered from the following aspects:
First, set the keywords or terms to be monitored in order to identify hot topics and sensitive topics. The overall public opinion situation is identified and analyzed through big data, keyword management and control, semantic analysis, and similar means.
Second, set the look-back period and the monitoring period, and produce a public opinion overview for a given period covering the number of mentions, the numbers of positive and negative mentions, hot words, sentiment proportions, and so on.
Third, the system should have a "trend analysis" function. For each topic, it analyzes and counts the positive, negative, or neutral evaluations and leanings online and the level of attention over different time periods.
Fourth, the system should have a "media analysis" function, covering the distribution of media, the number of mentions by each media type, and their proportions.
Fifth, customize the public opinion early warning system. Through two modes, keyword early warning and trend early warning, it finds and reports in a timely manner warning notices about emergencies, sensitive topics involving content security, sensitive public opinion, and the like.
Sixth, the system should have an "emergency analysis (or statistical report)" function. It performs cross-time and cross-space comprehensive analysis of emergencies, generates reports from the result library produced by the public opinion analysis engine, predicts how events will develop, and provides decision support.
Seventh, the system should support combined collection in multiple forms, that is, combined collection on several keywords and on special semantics. Public opinion data related to combined keywords can be collected in the system in forms such as "event + person + place".
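The combined "event + person + place" collection in the seventh point can be read as requiring at least one hit from every slot. The following is a minimal sketch under that assumption; the `matches_combination` name, the slot names, and the sample terms are hypothetical.

```python
from typing import Dict, List

def matches_combination(text: str, combo: Dict[str, List[str]]) -> bool:
    """True only if every slot (event, person, place, ...) has a term in the text."""
    lowered = text.casefold()
    return all(
        any(term.casefold() in lowered for term in terms)
        for terms in combo.values()
    )

# Hypothetical combined query in the "event + person + place" form.
combo = {"event": ["summit"], "person": ["minister"], "place": ["Geneva"]}
full_hit = matches_combination("Minister arrives in Geneva for summit", combo)
partial_hit = matches_combination("Summit in Geneva postponed", combo)
```

Requiring all slots narrows collection compared with single-keyword matching, which is the practical point of combining keywords.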
Specifically, sensitive and negative information screened from the public opinion corpus is flagged and directionally tracked in a timely manner and fed back immediately to the public opinion handling group or the relevant departments, and a multi-dimensional public opinion event analysis report is generated to assist in formulating handling plans and measures. For general information, public opinion analysis is carried out to provide qualitative and quantitative analysis data and to judge accurately how a specific public opinion item or topic is developing and changing; for thematic information, public opinion tracking is carried out; and for publicity information, public opinion feedback is carried out. Public opinion reports and related data tables are generated automatically to help decision makers judge whether handling plans and measures need to be formulated. The Internet environment is intricate. A public opinion monitoring system can effectively monitor and assess the risks faced and, when necessary, allow appropriate measures to be taken in advance to clarify, intervene, and avoid those risks, which makes it a necessary way of applying "technology empowerment" to diplomatic discourse and external publicity work.
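The keyword early warning and trend early warning modes mentioned earlier in this section could be prototyped as below. The text does not define how a trend is detected, so the rule used here (latest count exceeds a multiple of the mean of the preceding window) is purely an assumption, as are the function names and the default `window` and `factor` values.

```python
from statistics import mean
from typing import List

def keyword_alert(text: str, sensitive_terms: List[str]) -> List[str]:
    """Return the sensitive terms that appear in the text (empty list: no alert)."""
    lowered = text.casefold()
    return [t for t in sensitive_terms if t.casefold() in lowered]

def trend_alert(daily_mentions: List[int], window: int = 3, factor: float = 2.0) -> bool:
    """Alert when the latest count exceeds `factor` times the mean of the prior window."""
    if len(daily_mentions) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(daily_mentions[-window - 1:-1])
    return daily_mentions[-1] > factor * baseline

terms_hit = keyword_alert("Embassy evacuation ordered", ["evacuation", "sanction"])
spike = trend_alert([10, 12, 11, 40])
steady = trend_alert([10, 12, 11, 15])
```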
Corresponding to the method shown in fig. 1, an embodiment of the invention also provides a foreign language speech multilingual big data public opinion monitoring system for implementing that method. The system can be applied to a computer terminal or to various mobile devices. Its structure is shown schematically in fig. 3, and it comprises a keyword setting module, a public opinion information acquisition module, a public opinion information cleaning module, a public opinion information semantic analysis module, a sensitive/hot spot information judgment module, an emergency directional tracking module and a visual analysis report generation module, which are connected in sequence;
the keyword setting module is used for setting the keywords or terms of the foreign language speech multilingual public opinion to be monitored;
the public opinion information acquisition module is used for promptly discovering and collecting, based on internet information collection, text mining and intelligent retrieval, network public opinion information related to the monitored keywords or terms;
the public opinion information cleaning module is used for cleaning and filtering the obtained network public opinion information and removing content irrelevant to the topic;
the public opinion information semantic analysis module is used for extracting the key points of the public opinion information through semantic analysis, providing support for public opinion early warning;
the sensitive/hot spot information judgment module is used for automatically screening and/or manually assessing the key points of the public opinion information to obtain sensitive or hot spot information;
the emergency directional tracking module is used for carrying out routine, cross-time and cross-space directional tracking and trend analysis of the corresponding field according to the obtained sensitive or hot spot information, and for issuing timely early warnings where necessary;
and the visual analysis report generation module is used for generating a multi-dimensional public opinion event analysis report to assist in formulating a handling scheme and measures for foreign language speech multilingual public opinion.
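The sequential connection of the modules listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `Item` structure, and the simple keyword matching that stands in for internet collection, semantic analysis and sensitivity judgment. The embodiment does not prescribe any particular implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """One piece of collected public opinion information (hypothetical structure)."""
    text: str
    key_points: List[str] = field(default_factory=list)
    sensitive: bool = False


def acquire(keywords: List[str], source: List[str]) -> List[Item]:
    # Stand-in for internet collection and retrieval:
    # keep only the texts that mention at least one monitored keyword.
    return [Item(t) for t in source
            if any(k.lower() in t.lower() for k in keywords)]


def clean(items: List[Item]) -> List[Item]:
    # Stand-in for cleaning: normalise whitespace and drop empty texts.
    cleaned = []
    for it in items:
        text = " ".join(it.text.split())
        if text:
            cleaned.append(Item(text))
    return cleaned


def analyse(items: List[Item], keywords: List[str]) -> List[Item]:
    # Placeholder for semantic analysis: the "key points" are simply
    # the monitored keywords found in each text.
    for it in items:
        it.key_points = [k for k in keywords if k.lower() in it.text.lower()]
    return items


def judge(items: List[Item], sensitive_terms: List[str]) -> List[Item]:
    # Placeholder for automatic screening: flag texts containing a sensitive term.
    for it in items:
        it.sensitive = any(s.lower() in it.text.lower() for s in sensitive_terms)
    return items


def track(items: List[Item]) -> List[Item]:
    # Placeholder for directional tracking: select the flagged items.
    return [it for it in items if it.sensitive]


def report(items: List[Item], tracked: List[Item]) -> Dict[str, object]:
    # Placeholder for the analysis report: a small summary dictionary.
    return {
        "collected": len(items),
        "flagged": len(tracked),
        "flagged_texts": [it.text for it in tracked],
    }


def run_pipeline(keywords: List[str], sensitive_terms: List[str],
                 source: List[str]) -> Dict[str, object]:
    # The keyword setting step is represented by the `keywords` argument;
    # the remaining modules are applied one after another, in sequence.
    items = acquire(keywords, source)
    items = clean(items)
    items = analyse(items, keywords)
    items = judge(items, sensitive_terms)
    tracked = track(items)
    return report(items, tracked)
```

Each stage consumes only the output of the previous one, which mirrors the "connected in sequence" arrangement; in a real deployment each placeholder would be replaced by the corresponding module.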
An embodiment of the invention also provides a storage medium comprising stored instructions, wherein, when the instructions run, the device on which the storage medium resides is controlled to execute the steps of the big data-based foreign language speech multilingual public opinion monitoring and early warning system.
An embodiment of the invention further provides an electronic device, the structure of which is shown in fig. 4. It comprises a memory 401 and one or more instructions 402, wherein the one or more instructions 402 are stored in the memory 401 and are configured to be executed by one or more processors 403 so as to perform the foreign language speech big data public opinion monitoring operations.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A big data-based foreign language speech multilingual public opinion monitoring and early warning system, characterized by comprising the following steps:
S101, keyword setting step: setting the keywords or terms of the foreign language speech multilingual public opinion to be monitored;
S201, public opinion information acquisition step: based on internet information collection, text mining and intelligent retrieval, promptly discovering and collecting network public opinion information related to the monitored keywords or terms;
S301, public opinion information cleaning step: cleaning and filtering the obtained network public opinion information and removing content irrelevant to the topic;
S401, public opinion information semantic analysis step: extracting the key points of the public opinion information through semantic analysis, providing support for public opinion early warning;
S501, sensitive/hot spot information determination step: automatically screening and/or manually assessing the key points of the public opinion information to obtain sensitive or hot spot information;
S601, emergency directional tracking step: carrying out routine, cross-time and cross-space directional tracking and trend analysis of the corresponding field according to the obtained sensitive or hot spot information, and issuing timely early warnings where necessary;
S701, visual analysis report generation step: generating a multi-dimensional public opinion event analysis report to assist in formulating a handling scheme and measures for foreign language speech multilingual public opinion.
2. The system for monitoring and pre-warning multi-lingual public opinion based on big data of claim 1, wherein,
the S201 public opinion information acquisition step further comprises: converting semi-structured web page data into structured text form.
3. The system for monitoring and pre-warning multi-lingual public opinion based on big data of claim 1, wherein,
the specific content of the S401 public opinion information semantic analysis step is as follows:
establishing a semantic analysis model; obtaining a training sample library with a semantic analysis structure; training the semantic analysis model with the training sample library to obtain a trained semantic analysis model; and inputting each piece of the original public opinion information into the trained semantic analysis model to obtain a primary semantic analysis result.
4. The system for monitoring and pre-warning multi-lingual public opinion based on big data according to claim 3, wherein,
the S401 public opinion information semantic analysis step further comprises the following steps:
first, obtaining a seed word, traversing a word stock to obtain words whose senses are similar to that of the seed word so as to obtain a synonym word stock, and establishing word families once no unset words with senses similar to the seed word remain in the synonym word stock; then acquiring an original semantic rule, dividing it into a plurality of rule character strings, identifying the rule sequence information of the rule character strings, and using the rule character strings and the rule sequence information to judge whether the format of the original semantic rule is correct; when the format is correct, judging whether the logic of the original semantic rule is correct; when the logic is correct, adding the original semantic rule to a semantic rule library, thereby establishing the semantic rule library, wherein the semantic rule library comprises a plurality of semantic rule types and each semantic rule type comprises a plurality of word families arranged according to preset semantic logic;
dividing the primary semantic analysis result into a plurality of primary result rule strings and identifying the semantic sequence information of those strings; searching the semantic rule library for a semantic rule whose sequence information is the same as the semantic sequence information of the primary result rule strings; if such a semantic rule exists, taking it as the secondary analysis result and taking the secondary analysis result as the final analysis result; and if no such semantic rule exists, taking the primary semantic analysis result as the final analysis result.
5. The system for monitoring and pre-warning multi-lingual public opinion based on big data according to claim 3, wherein,
in the S501 sensitive/hot spot information determination step, important public opinion is determined according to public opinion popularity or public opinion type.
6. The system for monitoring and pre-warning multi-lingual public opinion based on big data of claim 5, wherein,
in the S501 sensitive/hot spot information determination step, determining important public opinion according to public opinion popularity comprises:
obtaining the popularity of each public opinion in the public opinion information, wherein the popularity is represented by click rate;
when the popularity is greater than a popularity threshold, determining the corresponding public opinion information as important public opinion; or determining the public opinion information ranked in the top N by popularity at the same point in time as important public opinion, wherein N is an integer not less than 1.
7. The system for monitoring and pre-warning multi-lingual public opinion based on big data of claim 5, wherein,
in the step of determining the sensitive/hot spot information in S501, determining important public opinion according to the public opinion type includes:
obtaining the public opinion type of each public opinion in the public opinion information; the public opinion type is obtained through a machine learning algorithm;
and matching the public opinion type with the target type, and determining the corresponding public opinion information as important public opinion when the matching is successful.
8. A big data public opinion monitoring system, characterized in that it executes the big data-based multilingual public opinion monitoring and early warning system and comprises a keyword setting module, a public opinion information acquisition module, a public opinion information cleaning module, a public opinion information semantic analysis module, a sensitive/hot spot information judgment module, an emergency directional tracking module and a visual analysis report generation module, which are connected in sequence;
the keyword setting module is used for setting the keywords or terms of the foreign language speech multilingual public opinion to be monitored;
the public opinion information acquisition module is used for promptly discovering and collecting, based on internet information collection, text mining and intelligent retrieval, network public opinion information related to the monitored keywords or terms;
the public opinion information cleaning module is used for cleaning and filtering the obtained network public opinion information and removing content irrelevant to the topic;
the public opinion information semantic analysis module is used for extracting the key points of the public opinion information through semantic analysis, providing support for public opinion early warning;
the sensitive/hot spot information judgment module is used for automatically screening and/or manually assessing the key points of the public opinion information to obtain sensitive or hot spot information;
the emergency directional tracking module is used for carrying out routine, cross-time and cross-space directional tracking and trend analysis of the corresponding field according to the obtained sensitive or hot spot information, and for issuing timely early warnings where necessary;
and the visual analysis report generation module is used for generating a multi-dimensional public opinion event analysis report to assist in formulating a handling scheme and measures for foreign language speech multilingual public opinion.
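The three ways of determining important public opinion recited in claims 6 and 7 (a popularity threshold, the top N by popularity, and a match against target types) can be sketched in Python as follows. This is only an illustrative sketch: the `Opinion` structure, the function names and the sample values are hypothetical, the click count stands in for the click rate, and the type label is assumed to have been produced already by some machine learning classifier that is not shown.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Opinion:
    """Hypothetical record for one public opinion item."""
    title: str
    clicks: int         # popularity proxy (click count), per claim 6
    opinion_type: str   # label assumed to come from a classifier, per claim 7


def by_threshold(opinions: List[Opinion], threshold: int) -> List[Opinion]:
    # Important if popularity is strictly greater than the threshold.
    return [o for o in opinions if o.clicks > threshold]


def top_n(opinions: List[Opinion], n: int) -> List[Opinion]:
    # Important if ranked in the top N by popularity; N must be at least 1.
    if n < 1:
        raise ValueError("n must be an integer not less than 1")
    return sorted(opinions, key=lambda o: o.clicks, reverse=True)[:n]


def by_type(opinions: List[Opinion], target_types: Set[str]) -> List[Opinion]:
    # Important if the opinion type matches one of the target types.
    return [o for o in opinions if o.opinion_type in target_types]


# Hypothetical sample data for a single point in time.
sample = [
    Opinion("a", 120, "diplomatic"),
    Opinion("b", 45, "economic"),
    Opinion("c", 300, "diplomatic"),
    Opinion("d", 80, "cultural"),
]

print([o.title for o in by_threshold(sample, 100)])       # ['a', 'c']
print([o.title for o in top_n(sample, 2)])                # ['c', 'a']
print([o.title for o in by_type(sample, {"diplomatic"})]) # ['a', 'c']
```

The threshold and top-N criteria are kept as separate functions because the claim presents them as alternatives; which one applies, and what threshold or N to use, is left open by the claims.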
CN202311221563.9A 2023-08-28 2023-09-21 Big data-based foreign language speech multilingual public opinion monitoring and early warning system Pending CN117370621A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2023110899357 2023-08-28
CN202311089935 2023-08-28

Publications (1)

Publication Number Publication Date
CN117370621A true CN117370621A (en) 2024-01-09

Family

ID=89399312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311221563.9A Pending CN117370621A (en) 2023-08-28 2023-09-21 Big data-based foreign language speech multilingual public opinion monitoring and early warning system

Country Status (1)

Country Link
CN (1) CN117370621A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101394311A (en) * 2008-11-12 2009-03-25 北京交通大学 Network public opinion prediction method based on time sequence
CN104408157A (en) * 2014-12-05 2015-03-11 四川诚品电子商务有限公司 Funnel type data gathering, analyzing and pushing system and method for online public opinion
CN104933093A (en) * 2015-05-19 2015-09-23 武汉泰迪智慧科技有限公司 Regional public opinion monitoring and decision-making auxiliary system and method based on big data
CN112434226A (en) * 2020-12-15 2021-03-02 易研信息科技有限公司 Network public opinion monitoring and early warning method
CN114298051A (en) * 2021-12-31 2022-04-08 安徽大学 Public opinion studying, judging, distributing and disposing management system based on big data
CN114707045A (en) * 2022-03-23 2022-07-05 江苏悉宁科技有限公司 Big data-based public opinion monitoring method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
王晋等: "大数据营销(第2版)", 31 October 2021, pages: 161 - 163 *
邓海荣等: "舆论学基础与实务教程", 31 December 2020, pages: 142 - 150 *
陆泽凯等: "微博中的"中美外交风波"舆情文本研究——基于R语言的词向量情感分析", 传媒观察, 28 February 2021 (2021-02-28), pages 54 - 61 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination