CN114896522A - Multi-platform information epidemic situation risk assessment method and device - Google Patents


Info

Publication number
CN114896522A
Authority
CN
China
Prior art keywords
information
epidemic situation
reliability
epidemic
risk index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210382759.5A
Other languages
Chinese (zh)
Other versions
CN114896522B (en)
Inventor
吴俊杰
殷博文
杜文宇
何熙
杨智尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202210382759.5A
Publication of CN114896522A
Application granted
Publication of CN114896522B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-platform information epidemic risk assessment method comprising the following steps. Step one: collect stream data from each platform. Step two: extract a domain name list from each piece of stream data, obtain the domain name redirection history, and match that history against a domain name reliability corpus to obtain a reliability label for the stream data. Step three: parse the user-defined location of each piece of stream data to obtain geographic information. Step four: group the stream data along two dimensions, geographic information and time. Step five: for each group of stream data, quantify a static information epidemic risk index value based on the posters' follower counts and the reliability labels. Step six: for each group of stream data, quantify a dynamic information epidemic risk index value based on the numbers of likes, reposts, and comments and the reliability labels. The invention also provides an assessment device. By constructing the static and dynamic information epidemic risk indexes, the invention reflects, respectively, the upper bound of information epidemic risk and the current extent of the information epidemic.

Description

Multi-platform information epidemic situation risk assessment method and device
Technical Field
The invention relates to the technical field of data mining. More specifically, the invention relates to a multi-platform information epidemic situation risk assessment method and device.
Background
The sudden outbreak of COVID-19 has had a huge impact on the world. The impact is not confined to physical space; it has also spread to cyberspace, where an "information epidemic" (infodemic) characterized by a flood of false news has emerged. The information epidemic not only seriously interferes with epidemic prevention and control but also threatens regional security and stability. If the information epidemic risk of each region could be assessed in real time, both online and offline epidemic prevention and control would benefit greatly. An effective method and device for assessing information epidemic risk are therefore needed.
Disclosure of Invention
The invention aims to provide a multi-platform information epidemic risk assessment method and device that construct a static information epidemic risk index and a dynamic information epidemic risk index, which reflect, respectively, the upper bound of information epidemic risk and the current extent of the information epidemic.
To achieve these objects and other advantages in accordance with the purpose of the invention, a multi-platform information epidemic risk assessment method is provided, comprising: step one, collecting stream data from each platform; step two, extracting a domain name list from each piece of stream data, obtaining the domain name redirection history, and matching that history against a domain name reliability corpus to obtain a reliability label for the stream data; step three, parsing the user-defined location of each piece of stream data to obtain geographic information; step four, grouping the stream data along two dimensions, geographic information and time; step five, for each group of stream data, quantifying, based on the posters' follower counts and the reliability labels, a static information epidemic risk index value that reflects the static information epidemic risk of a region in a given time period; and step six, for each group of stream data, quantifying, based on the numbers of likes, reposts, and comments and the reliability labels, a dynamic information epidemic risk index value that reflects the dynamic information epidemic risk of a region in a given time period.
Further, the method also comprises: collecting domain name reliability labeling data from a plurality of sources; constructing multiple categories of reliability labels, assigning each an unreliability score, and mapping the domain name reliability labeling data of each source onto these reliability labels; and merging the domain name reliability labeling data of all sources according to the reliability labels to form the domain name reliability corpus, wherein a domain name that carries several reliability labels is assigned the label with the lowest reliability, that is, the highest unreliability score.
Further, the static information epidemic risk index value $\mathrm{staticIRI}_{c,d}$ is calculated using formula 1:
[Formula 1, image: Figure BSA0000271085620000021]
where $c$ denotes a region, $d$ denotes a time period, $T_{c,d}$ denotes all stream data with reliability labels in region $c$ during time period $d$, $\mathrm{fans}_i$ denotes the number of followers of the poster of stream data $i$, and $r_i$ denotes the unreliability score of stream data $i$.
Further, the dynamic information epidemic risk index value $\mathrm{dynamicIRI}_{c,d}$ is calculated using formula 2:
[Formula 2, image: Figure BSA0000271085620000022]
where $c$ denotes a region, $d$ denotes a time period, $T_{c,d}$ denotes all stream data with reliability labels in region $c$ during time period $d$, $\mathrm{like}_i$ denotes the number of likes of stream data $i$, and $r_i$ denotes the unreliability score of stream data $i$.
Further, domain names are extracted with a regular expression, and the domain name redirection history is obtained with an asynchronous crawler.
Further, the user-defined location is parsed using a geocoding service offered by a geographic service provider to obtain the geographic information.
Further, the granularity of the time grouping is day, week, or month, and the granularity of the geographic grouping is city, province, or country.
Further, the method also comprises: obtaining the static and dynamic information epidemic risk index values for every time period in the historical data, and training a neural network prediction model with the static information epidemic risk index values as input and the dynamic information epidemic risk index values as output; inputting the static information epidemic risk index value of the current time period into the neural network prediction model to output a predicted dynamic information epidemic risk index value, comparing the prediction with the dynamic information epidemic risk index value calculated for the current time period, flagging the current dynamic information epidemic risk index value if the error exceeds a first preset range, and re-collecting the stream data of the current time period if the error exceeds a second preset range.
According to another aspect of the invention, a multi-platform information epidemic risk assessment device is also provided, comprising: a collection module for collecting stream data from each platform; a label assignment module for extracting a domain name list from each piece of stream data, obtaining the domain name redirection history, matching that history against a domain name reliability corpus, and assigning a reliability label; a geographic information parsing module for parsing the user-defined location of each piece of stream data to obtain geographic information; a grouping module for grouping the stream data along two dimensions, geographic information and time; a static information epidemic risk index value calculation module for quantifying, for each group of stream data and based on the posters' follower counts and the reliability labels, a static information epidemic risk index value that reflects the static information epidemic risk of a region in a given time period; and a dynamic information epidemic risk index value calculation module for quantifying, for each group of stream data and based on the numbers of likes, reposts, and comments and the reliability labels, a dynamic information epidemic risk index value that reflects the dynamic information epidemic risk of a region in a given time period.
Further, the static information epidemic risk index value calculation module calculates the static information epidemic risk index value $\mathrm{staticIRI}_{c,d}$ using formula 1:
[Formula 1, image: Figure BSA0000271085620000031]
The dynamic information epidemic risk index value calculation module calculates the dynamic information epidemic risk index value $\mathrm{dynamicIRI}_{c,d}$ using formula 2:
[Formula 2, image: Figure BSA0000271085620000032]
where $c$ denotes a region, $d$ denotes a time period, $T_{c,d}$ denotes all stream data with reliability labels in region $c$ during time period $d$, $\mathrm{like}_i$ denotes the number of likes of stream data $i$, $\mathrm{fans}_i$ denotes the number of followers of the poster of stream data $i$, and $r_i$ denotes the unreliability score of stream data $i$.
The invention provides at least the following beneficial effects:
The method takes into account what the platforms have in common, constructs a static information epidemic risk index and a dynamic information epidemic risk index, and comprehensively assesses the information epidemic risk on each platform. The static index reflects the upper bound of information epidemic risk, while the dynamic index measures the current extent of the information epidemic on each platform. Compared with traditional information epidemic calculation methods based on simple matching, the redirection-based information epidemic risk calculation method gives more accurate and comprehensive results.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Fig. 1 is a framework diagram of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.
It will be understood that terms such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.
As shown in fig. 1, an embodiment of the present application provides a multi-platform information epidemic risk assessment method, including: step one: collecting stream data from each platform, where the platforms include client apps, Weibo, Twitter, Facebook, Telegram, and the like, and each piece of stream data contains at least its text content, publication time, user-defined location, the poster's follower count, and its number of likes;
optionally, stream data on social platforms such as Weibo, Facebook, Twitter, and Telegram are collected daily using keywords such as "COVID-19" and "Virus", and the server distributes each day's stream data evenly across its ports using the pull-push scheme of the ZMQ transport layer; after the server receives the stream data, the stream data of each channel are stored in a corresponding table, that is, each channel has its own stream data storage table; the database may be an Elasticsearch distributed storage database, which allows large-scale data to be processed and stored quickly; the data from every platform contain at least the fields mid, text, time, user_location, user_fans, and like_num, where mid is the id of the piece of stream data, text is its text content, time is its publication time, user_location is the user-defined location of its poster, user_fans is the number of followers of its poster, and like_num is its number of likes;
step two: extracting a domain name list from each piece of stream data, obtaining the domain name redirection history, and matching that history against a domain name reliability corpus to obtain a reliability label for the stream data; specifically, the domain names contained in the text content are extracted first, the redirection history of each extracted domain name is then obtained, and the redirection history is finally matched against the domain name reliability corpus to obtain the reliability label of the stream data;
step three: parsing the user-defined location of each piece of stream data to obtain geographic information;
step four: grouping the stream data along two dimensions, geographic information and time; all data in the database are grouped by time and location so that every record in a group has the same time and location fields;
step five: for each group of stream data, quantifying, based on one or more of the posters' follower counts and the reliability labels, a static information epidemic risk index value (risk degree) that reflects the static information epidemic risk of a region in a given time period;
step six: for each group of stream data, quantifying, based on one or more of the number of likes, the number of reposts, the number of comments, and the reliability labels, a dynamic information epidemic risk index value (risk degree) that reflects the dynamic information epidemic risk of a region in a given time period;
as can be seen, given the large differences in data format and content between social platforms, this embodiment takes into account what the platforms have in common, constructs a static information epidemic risk index and a dynamic information epidemic risk index, and comprehensively assesses the information epidemic risk on each platform; the static index reflects the upper bound of information epidemic risk, while the dynamic index measures the current extent of the information epidemic on each platform; compared with traditional information epidemic calculation methods based on simple matching, the redirection-based information epidemic risk calculation method gives more accurate and comprehensive results.
To address the reliance of false-information detection on a domain name reliability corpus, and the problem of false domain names hiding behind alias ("vest") domain names, a method for integrating domain name labeling data from different sources is established, and redirection is introduced into domain name matching, which makes false-information detection more rigorous and accurate. The index system can not only assess the degree of information epidemic risk but also help explain the mechanisms behind the evolution of the information epidemic.
In other embodiments, the method further comprises: collecting domain name reliability labeling data from a plurality of sources; constructing multiple categories of reliability labels, assigning each an unreliability score, and mapping the domain name reliability labeling data of each source onto these reliability labels; and merging the domain name reliability labeling data of all sources according to the reliability labels to form the domain name reliability corpus, wherein a domain name that carries several reliability labels is assigned the label with the lowest reliability, that is, the highest unreliability score;
optionally, authoritative domain name reliability labeling data widely used in academia are collected, such as mediabiasfactcheck and decodex; the data mainly take the form of key-value pairs of domain name and reliability label, for example {"79days.news": "conspiracy theory"} and {"lung.…": "science"}; these two data sources are given only as examples, and any domain name reliability labeling data of this kind fall within the scope of protection;
a unified set of reliability labels is constructed, comprising eight categories: "science", "mainstream media", "satire", "clickbait", "others", "politics", "false information", and "pseudoscience", with unreliability scores running from 1 to 8 in order of increasing unreliability. Because the labels used by different sources are not necessarily the same, they must be unified before the data can be merged. The labels of each source are therefore mapped onto the unified reliability labels; for example, the "conspiracy theory" label in {"79days.news": "conspiracy theory"} is mapped onto the "pseudoscience" label;
then the domain name reliability data of all sources are merged, and for a domain name that carries several labels, the label with the lowest reliability is taken as its final label; for example, 79days.news is labeled "false information" in mediabiasfactcheck and "pseudoscience" in decodex, so the label of this domain name is finally merged into "pseudoscience";
to address the heavy dependence of reliability evaluation on the domain name reliability corpus, this embodiment takes into account the differing characteristics of mainstream websites in various regions and, building on current academic research into evaluating the reliability of stream data, establishes a method for merging false domain name labeling data from different sources; by combining the false domain name labeling data of all sources, the method captures false information more comprehensively and thus helps explain, more deeply and fully, the mechanisms behind the evolution of the information epidemic.
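As an illustration only, the merging step described above might be sketched in Python as follows. The label names are English renderings of the eight unified categories; the SOURCE_LABEL_MAP table and the to_unified and merge_sources helpers are hypothetical names that do not come from the specification. The sketch follows the worked example in keeping, for each domain name, the label with the lowest reliability (the highest unreliability score).

```python
# Unified labels ordered from most to least reliable; scores 1..8 follow the text.
UNIFIED_LABELS = [
    "science", "mainstream media", "satire", "clickbait",
    "others", "politics", "false information", "pseudoscience",
]
UNRELIABILITY = {label: score for score, label in enumerate(UNIFIED_LABELS, start=1)}

# Hypothetical mapping from source-specific labels to unified labels.
SOURCE_LABEL_MAP = {
    "conspiracy theory": "pseudoscience",
}


def to_unified(label):
    """Map a source-specific label onto the unified label set."""
    unified = SOURCE_LABEL_MAP.get(label, label)
    if unified not in UNRELIABILITY:
        raise ValueError(f"unmapped label: {label!r}")
    return unified


def merge_sources(*sources):
    """Merge {domain: label} dicts, keeping the least reliable label per domain."""
    corpus = {}
    for source in sources:
        for domain, label in source.items():
            unified = to_unified(label)
            current = corpus.get(domain)
            if current is None or UNRELIABILITY[unified] > UNRELIABILITY[current]:
                corpus[domain] = unified
    return corpus


# Toy inputs modeled on the example in the text (not real corpus extracts).
source_a = {"79days.news": "false information"}
source_b = {"79days.news": "conspiracy theory"}
corpus = merge_sources(source_a, source_b)
print(corpus)  # {'79days.news': 'pseudoscience'}
```

Keeping the scores in a single ordered list means the ranking used for merging and the score used later in the index calculation cannot drift apart.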
In other embodiments, the static information epidemic risk index value $\mathrm{staticIRI}_{c,d}$ is calculated using formula 1:
[Formula 1, image: Figure BSA0000271085620000051]
where $c$ denotes a region, $d$ denotes a time period, $T_{c,d}$ denotes all stream data with reliability labels in region $c$ during time period $d$, $\mathrm{fans}_i$ denotes the number of followers of the poster of stream data $i$, $r_i$ denotes the unreliability score of stream data $i$, and the symbol shown in Figure BSA0000271085620000063 denotes the number of posts.
In other embodiments, the dynamic information epidemic risk index value $\mathrm{dynamicIRI}_{c,d}$ is calculated using formula 2:
[Formula 2, image: Figure BSA0000271085620000061]
where $c$ denotes a region, $d$ denotes a time period, $T_{c,d}$ denotes all stream data with reliability labels in region $c$ during time period $d$, $\mathrm{like}_i$ denotes the number of likes of stream data $i$, $r_i$ denotes the unreliability score of stream data $i$, and the symbol shown in Figure BSA0000271085620000062 denotes the number of posts.
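The images of formulas 1 and 2 are not reproduced in this text, so their exact expressions cannot be restated here. Purely as a sketch, and ASSUMING that each index averages the product of the follower count (or like count) and the unreliability score over the labeled posts of one region and time period, the two calculations could look like this in Python. The Post class, the function names, and the averaging form itself are assumptions for illustration, not the formulas of the invention.

```python
from dataclasses import dataclass


@dataclass
class Post:
    fans: int   # follower count of the poster (user_fans)
    likes: int  # like count of the post (like_num)
    r: int      # unreliability score of the post, 1..8


def static_iri(posts):
    """ASSUMED form: mean of fans_i * r_i over the labeled posts of one group."""
    if not posts:
        return 0.0
    return sum(p.fans * p.r for p in posts) / len(posts)


def dynamic_iri(posts):
    """ASSUMED form: mean of like_i * r_i over the labeled posts of one group."""
    if not posts:
        return 0.0
    return sum(p.likes * p.r for p in posts) / len(posts)


# One (region, time period) group with two labeled posts (toy values).
group = [Post(fans=1000, likes=10, r=1), Post(fans=200, likes=50, r=8)]
print(static_iri(group))   # 1300.0
print(dynamic_iri(group))  # 205.0
```

Under this assumed form, the static value depends only on how many followers could see unreliable posts, while the dynamic value depends on how much engagement those posts actually received, which matches the roles the text assigns to the two indexes.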
In other embodiments, domain names are extracted with a regular expression, and the domain name redirection history is obtained with an asynchronous crawler; optionally, for each piece of stream data, all domain names in its text field are matched with a regular expression. The domain names are then followed through their redirects using asynchttp to obtain the redirection history of each domain name; the redirection histories are matched against the domain name reliability corpus with a regular expression to obtain the reliability labels of the domain names in each history; the lowest reliability among them is taken as the reliability label of the stream data; and this label is stored in the database as the reliability field of the piece of stream data; this approach removes the limitation of traditional methods, which evaluate the reliability of stream data by simple matching only: it allows for the possibility that false information is published under an alias ("vest") domain name and uses redirection to recover the real domain name behind it, so that information epidemic risk is assessed more comprehensively and accurately than with simple matching.
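A minimal Python sketch of this labeling step is given below under several assumptions. The regular expression is a simplified pattern for URLs, the redirection history is supplied as an already collected dictionary instead of being fetched by a crawler, and the corpus lookup is an exact match instead of the regular-expression matching mentioned above. The names DOMAIN_RE, extract_domains, and label_post are hypothetical.

```python
import re

# Simplified pattern: captures the host part of http(s) URLs.
DOMAIN_RE = re.compile(r"https?://([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def extract_domains(text):
    """Return all domain names found in the text, lower-cased."""
    return [m.lower() for m in DOMAIN_RE.findall(text)]


def label_post(text, redirect_history, corpus, unreliability):
    """Return the least reliable label found along any redirect chain, or None.

    redirect_history maps a domain to the domains it redirects through
    (assumed to have been collected beforehand by the crawler).
    """
    worst = None
    for domain in extract_domains(text):
        chain = [domain] + redirect_history.get(domain, [])
        for hop in chain:
            label = corpus.get(hop)
            if label is None:
                continue
            if worst is None or unreliability[label] > unreliability[worst]:
                worst = label
    return worst


# Toy data for illustration only.
unreliability = {"science": 1, "pseudoscience": 8}
corpus = {"79days.news": "pseudoscience", "example.org": "science"}
history = {"bit.ly": ["79days.news"]}
text = "Read this https://bit.ly/abc and http://example.org/page"

print(label_post(text, history, corpus, unreliability))  # pseudoscience
print(label_post(text, {}, corpus, unreliability))       # science
```

The second call shows why the redirection history matters: without it, the shortened link hides the unreliable target and the post is labeled from the visible domain only.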
In other embodiments, the user-defined location is parsed using a geocoding service offered by a geographic service provider to obtain the geographic information; optionally, for each piece of stream data, the user_location field is extracted, a request built with requests is sent to the ArcGIS server, the structured geographic information of the user_location is obtained, and the result is stored in the database as the location field; for example, if the user_location is Beihang University, sending it to ArcGIS returns the geographic information "China, Beijing".
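The geocoding step might be sketched as follows. The sketch only builds the request URL and parses a response, so it runs without network access; the endpoint, the SingleLine, outFields, and maxLocations parameters, and the Country, Region, and City attributes are assumptions based on the public ArcGIS World Geocoding Service and are not stated in the specification. The sample response is hand-written for illustration and is not actual service output.

```python
from urllib.parse import urlencode

# Assumed endpoint of the ArcGIS World Geocoding Service.
GEOCODE_URL = (
    "https://geocode.arcgis.com/arcgis/rest/services/"
    "World/GeocodeServer/findAddressCandidates"
)


def build_geocode_url(user_location):
    """Build the query URL for one free-text user_location value."""
    params = {
        "SingleLine": user_location,
        "f": "json",
        "outFields": "Country,Region,City",
        "maxLocations": 1,
    }
    return GEOCODE_URL + "?" + urlencode(params)


def parse_geocode_response(payload):
    """Extract country/region/city from the best candidate, or None if empty."""
    candidates = payload.get("candidates", [])
    if not candidates:
        return None
    attrs = candidates[0].get("attributes", {})
    return {
        "country": attrs.get("Country"),
        "region": attrs.get("Region"),
        "city": attrs.get("City"),
    }


url = build_geocode_url("Beihang University")

# Hand-written sample payload, for illustration only.
sample = {
    "candidates": [
        {"attributes": {"Country": "CHN", "Region": "Beijing", "City": "Beijing"}}
    ]
}
location = parse_geocode_response(sample)
print(location)
```

In a live pipeline the URL would be fetched, for example with requests.get(url).json(), and the parsed dictionary stored as the location field.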
In other embodiments, the granularity of the time grouping is day, week, or month, and the granularity of the geographic grouping is city, province, or country; optionally, all data in the database are grouped by time and location so that every record in a group has the same time and location fields. The time granularity can be set to day, week, or month, the location granularity can be set to province or country, and the specific granularity can be adjusted to suit the analysis.
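The grouping can be sketched in Python as below, assuming each record carries a datetime in its time field and a dictionary of already parsed geographic levels in its location field. The record layout and the time_key and group_posts helpers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def time_key(ts, granularity):
    """Reduce a timestamp to a day, ISO week, or month key."""
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if granularity == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown granularity: {granularity!r}")


def group_posts(posts, granularity="day", level="province"):
    """Group records by (location at the given level, time key)."""
    groups = defaultdict(list)
    for post in posts:
        key = (post["location"][level], time_key(post["time"], granularity))
        groups[key].append(post)
    return dict(groups)


# Toy records for illustration only.
posts = [
    {"mid": 1, "time": datetime(2022, 4, 11, 9, 0),
     "location": {"country": "China", "province": "Beijing"}},
    {"mid": 2, "time": datetime(2022, 4, 13, 18, 30),
     "location": {"country": "China", "province": "Beijing"}},
    {"mid": 3, "time": datetime(2022, 5, 2, 8, 0),
     "location": {"country": "China", "province": "Shanghai"}},
]

by_month = group_posts(posts, granularity="month", level="province")
print(sorted(by_month))  # [('Beijing', '2022-04'), ('Shanghai', '2022-05')]
```

Changing the granularity or level arguments regroups the same records without touching the stored data, which is what makes the granularity adjustable per analysis.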
In other embodiments, the method further comprises: obtaining the static and dynamic information epidemic risk index values for every time period in the historical data, and training a neural network prediction model with the static information epidemic risk index values as input and the dynamic information epidemic risk index values as output; inputting the static information epidemic risk index value of the current time period into the neural network prediction model to output a predicted dynamic information epidemic risk index value, comparing the prediction with the dynamic information epidemic risk index value calculated for the current time period, flagging the current dynamic information epidemic risk index value if the error exceeds a first preset range, and re-collecting the stream data of the current time period if the error exceeds a second preset range;
In the above embodiment, the static information epidemic situation risk index value and the dynamic information epidemic situation risk index value are strongly correlated within a given period of time, but the static value is less affected by human factors, whereas the dynamic value is more affected by human factors and is easily manipulated. The embodiment therefore performs an internal consistency check between the static index and the dynamic index. First, an LSTM model is trained on the two indexes over historical time periods to construct the neural network prediction model. Once the static information epidemic situation risk index value of the current time period has been calculated, it is input into the neural network prediction model, which outputs a predicted value of the dynamic information epidemic situation risk index value; the predicted value is compared with the actual value of the dynamic index calculated by formula 2, and an error is calculated. The error is then compared with a first preset range and a second preset range. The first preset range may be 20-30%; when the error between the predicted value and the actual value exceeds the first preset range, the actual value of the dynamic index is too high to a certain extent, and it is marked so that users can distinguish it. The second preset range may be 50-60%; when the error between the predicted value and the actual value exceeds the second preset range, the actual value of the dynamic index is far too high and is therefore inaccurate, and it needs to be recalculated after the data are collected again.
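The two-threshold check described above can be sketched as follows. This is a minimal illustration only: the function name `check_dynamic_index`, the use of a relative error, and the concrete thresholds 0.25 and 0.55 (picked from within the 20-30% and 50-60% ranges mentioned in the text) are assumptions, and the predicted value is taken as already produced by the trained prediction model.

```python
def check_dynamic_index(predicted: float, actual: float,
                        first_threshold: float = 0.25,
                        second_threshold: float = 0.55) -> str:
    """Classify the current dynamic index value by its deviation from the prediction.

    Returns "ok", "mark" (error above the first threshold) or
    "recollect" (error above the second threshold).
    """
    if predicted == 0:
        raise ValueError("relative error is undefined for a zero prediction")
    # Assumption: the error is the relative deviation of the actual value
    # from the predicted value; the source does not define the error measure.
    error = abs(actual - predicted) / abs(predicted)
    if error > second_threshold:
        return "recollect"  # value treated as inaccurate; collect the data again
    if error > first_threshold:
        return "mark"       # value kept but flagged for users
    return "ok"
```

For example, under these assumed thresholds a predicted value of 1.0 and an actual value of 1.4 would be marked, while an actual value of 2.0 would trigger re-collection.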
An embodiment of the present application further provides a multi-platform information epidemic situation risk assessment device, comprising: an acquisition module for collecting the stream data of each platform; a label assignment module for extracting a domain name list from each piece of stream data, acquiring domain name redirection history information, matching it against a domain name reliability corpus, and assigning a reliability label; a geographic information analysis module for analyzing the user-defined location of each piece of stream data to obtain geographic information; a grouping module for grouping the stream data along the two dimensions of geographic information and time; a static information epidemic situation risk index value calculation module for quantifying, for each group of stream data and based on the number of fans of the users and the reliability labels, a static information epidemic situation risk index value reflecting the static information epidemic situation risk degree of a region in a given time period; and a dynamic information epidemic situation risk index value calculation module for quantifying, for each group of stream data and based on the number of likes, the number of reposts, the number of comments and the reliability labels, a dynamic information epidemic situation risk index value reflecting the dynamic information epidemic situation risk degree of a region in a given time period.
In this embodiment, a processor and a memory are used to implement the acquisition module, the label assignment module, the geographic information analysis module, the grouping module, the static information epidemic situation risk index value calculation module and the dynamic information epidemic situation risk index value calculation module, thereby realizing the multi-platform information epidemic situation risk assessment method of the above embodiments, the details of which are described above. The information epidemic situation risk degree can be calculated every day, and the dynamic and static information epidemic situation risk indexes are obtained on that basis. All indexes are stored in a database by time and region, so that the evolution trend of the information epidemic situation of each country and province from the monitoring start date to the current date can be obtained by querying the database, the mechanism behind the trend can be analyzed, and the future direction of the information epidemic situation can be predicted. This helps to release early-warning information on the information epidemic situation in advance, to grasp its development and changes quickly, to better understand and guide the sentiment of netizens, to bring the positive role of online public opinion into play, and to support the response to the real, offline epidemic.
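The grouping by region and time, and the storage of index values for later trend queries, can be sketched as follows. This is a hedged illustration rather than the device's actual implementation: the record keys `region` and `timestamp`, the table name `iri`, the helper names, and the choice of an in-memory SQLite database with daily granularity are all assumptions, and the index values are supplied by the caller rather than computed here.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime


def group_posts(posts):
    """Group post records by (region, day); each post is a dict with
    hypothetical keys "region" (str) and "timestamp" (datetime)."""
    groups = defaultdict(list)
    for post in posts:
        day = post["timestamp"].date().isoformat()
        groups[(post["region"], day)].append(post)
    return dict(groups)


def make_store():
    """Create an in-memory table holding one row per (region, day)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE iri (region TEXT, day TEXT, static_iri REAL, "
        "dynamic_iri REAL, PRIMARY KEY (region, day))"
    )
    return conn


def save_index(conn, region, day, static_iri, dynamic_iri):
    """Insert or overwrite the index values of one region and day."""
    conn.execute(
        "INSERT OR REPLACE INTO iri VALUES (?, ?, ?, ?)",
        (region, day, static_iri, dynamic_iri),
    )


def trend(conn, region, start_day, end_day):
    """Return (day, static_iri, dynamic_iri) rows for a region, ordered by day.
    ISO-formatted dates compare correctly as strings."""
    cursor = conn.execute(
        "SELECT day, static_iri, dynamic_iri FROM iri "
        "WHERE region = ? AND day BETWEEN ? AND ? ORDER BY day",
        (region, start_day, end_day),
    )
    return cursor.fetchall()
```

Keying both the groups and the stored rows by (region, day) mirrors the two grouping dimensions named in the text; a coarser granularity such as week or province would only change how the key is derived.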
In other embodiments, the static information epidemic situation risk index value calculation module uses formula 1 to calculate the static information epidemic situation risk index value staticIRI_{c,d}:
Figure BSA0000271085620000081
The dynamic information epidemic situation risk index value calculation module uses formula 2 to calculate the dynamic information epidemic situation risk index value dynamicIRI_{c,d}:
Figure BSA0000271085620000082
where c denotes a region, d denotes a time period, T_{c,d} denotes all stream data with a reliability label in region c during time period d, like_i denotes the number of likes of stream data i, fans_i denotes the number of fans of the poster of stream data i, r_i denotes the unreliability score of stream data i, and the symbol shown in
Figure BSA0000271085620000083
denotes the number of posts.
The number of apparatuses and the scale of the process described herein are intended to simplify the description of the present invention. Applications, modifications and variations of the multi-platform information epidemic risk assessment method of the present invention will be apparent to those skilled in the art.
While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable in various fields of endeavor to which the invention pertains, and further modifications may readily be made by those skilled in the art, it being understood that the invention is not limited to the details shown and described herein without departing from the general concept defined by the appended claims and their equivalents.

Claims (10)

1. The multi-platform information epidemic situation risk assessment method is characterized by comprising the following steps:
step one: collecting the stream data of each platform;
step two: extracting a domain name list from each piece of stream data, acquiring domain name redirection history information, and matching it against a domain name reliability corpus to obtain a reliability label for the stream data;
step three: analyzing the user-defined location of each piece of stream data to obtain geographic information;
step four: grouping the stream data along the two dimensions of geographic information and time;
step five: for each group of stream data, quantifying, based on the number of fans of the users and the reliability labels, a static information epidemic situation risk index value reflecting the static information epidemic situation risk degree of a region in a given time period;
step six: for each group of stream data, quantifying, based on the number of likes, the number of reposts, the number of comments and the reliability labels, a dynamic information epidemic situation risk index value reflecting the dynamic information epidemic situation risk degree of a region in a given time period.
2. The multi-platform information epidemic risk assessment method of claim 1, further comprising:
collecting domain name reliability labeling data of a plurality of sources;
constructing multiple types of reliability labels, giving unreliability scores, and mapping domain name reliability labeling data of each source to the reliability labels;
and merging the domain name reliability labeling data of all sources according to the reliability labels to form a domain name reliability corpus, wherein, for domain name reliability labeling data that has a plurality of reliability labels, the reliability label with the lowest unreliability score is adopted.
3. The multi-platform information epidemic situation risk assessment method according to claim 1, wherein formula 1 is used to calculate the static information epidemic situation risk index value staticIRI_{c,d}:
Figure FSA0000271085610000011
where c denotes a region, d denotes a time period, T_{c,d} denotes all stream data with a reliability label in region c during time period d, fans_i denotes the number of fans of the poster of stream data i, and r_i denotes the unreliability score of stream data i.
4. The method according to claim 1, wherein formula 2 is used to calculate the dynamic information epidemic situation risk index value dynamicIRI_{c,d}:
Figure FSA0000271085610000021
where c denotes a region, d denotes a time period, T_{c,d} denotes all stream data with a reliability label in region c during time period d, like_i denotes the number of likes of stream data i, and r_i denotes the unreliability score of stream data i.
5. The multi-platform information epidemic risk assessment method according to claim 1, characterized in that a regular expression is adopted for domain name extraction, and an asynchronous crawler is adopted for obtaining domain name redirection history information.
6. The multi-platform information epidemic risk assessment method according to claim 1, wherein the geographic information is obtained by analyzing the user-defined location using geocoding services provided by a geographic service provider.
7. The multi-platform information epidemic risk assessment method according to claim 1, wherein the granularity of time grouping is day, week or month, and the granularity of geographic information grouping is city, province or country.
8. The multi-platform information epidemic risk assessment method of claim 1, further comprising:
acquiring the static information epidemic situation risk index values and dynamic information epidemic situation risk index values corresponding to each time period in historical time, and training a neural network prediction model with the static information epidemic situation risk index values as input and the dynamic information epidemic situation risk index values as output;
inputting the static information epidemic situation risk index value of the current time period into the neural network prediction model, outputting a predicted value of the dynamic information epidemic situation risk index value, comparing the predicted value with the dynamic information epidemic situation risk index value of the current time period, marking the dynamic information epidemic situation risk index value of the current time period if the error exceeds a first preset range, and re-collecting the stream data of the current time period if the error exceeds a second preset range.
9. The multi-platform information epidemic risk assessment apparatus of claim 1, comprising:
the acquisition module is used for collecting the flow data of each platform;
the label endowing module is used for extracting a domain name list from each piece of stream data, acquiring domain name redirection historical information, matching the domain name redirection historical information with a domain name reliability corpus and endowing a reliability label;
the geographic information analysis module is used for analyzing the user-defined position of each piece of streaming data to obtain geographic information;
the grouping module is used for grouping the stream data according to two dimensions of geographic information and time;
the static information epidemic situation risk index value calculation module is used for quantifying, for each group of stream data and based on the number of fans of the users and the reliability labels, a static information epidemic situation risk index value reflecting the static information epidemic situation risk degree of a region in a given time period;
and the dynamic information epidemic situation risk index value calculation module is used for quantifying, for each group of stream data and based on the number of likes, the number of reposts, the number of comments and the reliability labels, a dynamic information epidemic situation risk index value reflecting the dynamic information epidemic situation risk degree of a region in a given time period.
10. The multi-platform information epidemic risk assessment device according to claim 1, wherein the static information epidemic situation risk index value calculation module uses formula 1 to calculate the static information epidemic situation risk index value staticIRI_{c,d}:
Figure FSA0000271085610000031
The dynamic information epidemic situation risk index value calculation module uses formula 2 to calculate the dynamic information epidemic situation risk index value dynamicIRI_{c,d}:
Figure FSA0000271085610000032
where c denotes a region, d denotes a time period, T_{c,d} denotes all stream data with a reliability label in region c during time period d, like_i denotes the number of likes of stream data i, fans_i denotes the number of fans of the poster of stream data i, and r_i denotes the unreliability score of stream data i.
CN202210382759.5A 2022-04-14 2022-04-14 Multi-platform information epidemic situation risk assessment method and device Active CN114896522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210382759.5A CN114896522B (en) 2022-04-14 2022-04-14 Multi-platform information epidemic situation risk assessment method and device

Publications (2)

Publication Number Publication Date
CN114896522A true CN114896522A (en) 2022-08-12
CN114896522B CN114896522B (en) 2023-04-07

Family

ID=82716980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210382759.5A Active CN114896522B (en) 2022-04-14 2022-04-14 Multi-platform information epidemic situation risk assessment method and device

Country Status (1)

Country Link
CN (1) CN114896522B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038240A (en) * 2017-12-26 2018-05-15 武汉大学 Based on content, the social networks rumour detection method of user's multiplicity
CN109543118A (en) * 2018-11-12 2019-03-29 中国人民解放军战略支援部队信息工程大学 Web terrestrial reference reliability estimation method and device based on multilevel policy decision
CN110334180A (en) * 2019-06-05 2019-10-15 南京航空航天大学 A kind of mobile application security appraisal procedure based on comment data
CN110516967A (en) * 2019-08-28 2019-11-29 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of information evaluation
CN110941953A (en) * 2019-11-26 2020-03-31 华中师范大学 Automatic identification method and system for network false comments considering interpretability
CN111753093A (en) * 2020-07-02 2020-10-09 东北电力大学 Method and device for evaluating level of network public opinion crisis
WO2020208429A1 (en) * 2019-04-10 2020-10-15 Truthshare Software Private Limited System and method to find origin and to prevent spread of false information on an information sharing systems
CN112115257A (en) * 2019-06-20 2020-12-22 百度在线网络技术(北京)有限公司 Method and apparatus for generating information evaluation model
CN112883286A (en) * 2020-12-11 2021-06-01 中国科学院深圳先进技术研究院 BERT-based method, equipment and medium for analyzing microblog emotion of new coronary pneumonia epidemic situation
CN113934964A (en) * 2021-11-02 2022-01-14 重庆邮电大学 Rumor propagation control method based on multi-message and multi-dimensional composite game
CN114238738A (en) * 2022-02-28 2022-03-25 南京明博互联网安全创新研究院有限公司 Rumor detection method based on attention mechanism and bidirectional GRU

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
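HUAIWEN ZHANG et al.: "Multimodal Disentangled Domain Adaption for Social Media Event Rumor Detection", 《IEEE TRANSACTIONS ON MULTIMEDIA ( VOLUME: 23)》 *
JUNJIE WU et al.: "Fake news propagates differently from real news even at early stages of spreading", 《ARXIV E-PRINTS》 *
孙冉 et al.: "Research on rumor identification in public health emergencies" (突发公共卫生事件中谣言识别研究), 《情报资料工作》 *
韩玉民 et al.: "Research on credibility index analysis and calculation methods for news websites" (新闻网站可信度指标分析与计算方法研究), 《现代信息科技》 *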

Also Published As

Publication number Publication date
CN114896522B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant