CN112562861B - Method and device for training infectious disease prediction model


Info

Publication number
CN112562861B
Authority
CN
China
Legal status
Active
Application number
CN202011298182.7A
Other languages
Chinese (zh)
Other versions
CN112562861A (en)
Inventor
王智谨
蔡兵
陈荣鑫
王宗跃
付永钢
Current Assignee
Jimei University
Original Assignee
Jimei University
Events
Application filed by Jimei University
Priority to CN202011298182.7A
Publication of CN112562861A
Application granted
Publication of CN112562861B

Classifications

    • G16H50/80: ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G06N3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N3/08: Neural networks; learning methods
    • G16H50/20: ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70: ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

Disclosed is a method of training an infectious disease prediction model. The method comprises: for a plurality of preset geographic areas, determining the new-infection sequence data of each preset geographic area over a plurality of consecutive preset time periods; determining, based on the new-infection sequence data of the preset geographic areas, a plurality of time series of the number of new infections over the consecutive preset time periods; converting the plurality of time series into supervised data; and training the infectious disease prediction model with the supervised data, wherein the infectious disease prediction model is a time-series neural network based on time-series subsequences (shapelets). With this method and system, the time-series neural network can predict the number of confirmed infectious disease cases in multiple areas more accurately over the short term.

Description

Method and device for training infectious disease prediction model
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a system, a computer-readable storage medium, and an electronic device for training an infectious disease prediction model.
Background
With the acceleration of global economic integration, economic and communication activities have increased and population movement has become more and more frequent. This creates favorable conditions for the transmission and outbreak of infectious diseases, and public health problems are becoming increasingly severe. Meanwhile, changes in the social and natural environment, environmental pollution, natural disasters, and other events have increased the likelihood of public health emergencies.
Traditional infectious disease prediction estimates the number of cases in a future time interval from the numbers of cases observed in the same interval of several earlier periods. For example, when the number of cases in a period of the current year is predicted from the numbers in the same period of previous years, conventional methods give poor results.
Disclosure of Invention
The present application is proposed to solve the above technical problems. Embodiments of the present application provide a method, an apparatus, a system, a computer-readable storage medium, and an electronic device for training an infectious disease prediction model.
According to an aspect of the present application, there is provided a method for training an infectious disease prediction model, applied to an electronic device, including:
for a plurality of preset geographic areas, determining the new-infection sequence data of each preset geographic area over a plurality of consecutive preset time periods;
determining a plurality of time series of the number of new infections over the consecutive preset time periods based on the new-infection sequence data of the plurality of preset geographic areas;
converting the plurality of time series of new infections into a plurality of supervised data, wherein the time series and the supervised data are in one-to-one correspondence;
training the infectious disease prediction model with the plurality of supervised data, wherein the infectious disease prediction model is a time-series neural network based on time-series subsequences (shapelets).
According to a second aspect of the present application, there is provided an apparatus for training an infectious disease prediction model, applied to an electronic device, the apparatus comprising:
a first determining module, configured to determine, for a plurality of preset geographic areas, the new-infection sequence data of each preset geographic area over a plurality of consecutive preset time periods;
a second determining module, configured to determine a plurality of time series of the number of new infections over the consecutive preset time periods based on the new-infection sequence data of the plurality of preset geographic areas;
a data conversion module, configured to convert the plurality of time series of new infections into a plurality of supervised data, wherein the time series and the supervised data are in one-to-one correspondence;
and a model training module, configured to train the infectious disease prediction model with the plurality of supervised data, wherein the infectious disease prediction model is a time-series neural network based on time-series subsequences (shapelets).
According to a third aspect of the present application, there is provided a computer-readable storage medium storing a computer program for executing the method of training an infectious disease prediction model according to the first or second aspect.
According to a fourth aspect of the present application, there is provided an electronic apparatus comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method for training an infectious disease prediction model according to the first aspect or the second aspect.
In the embodiments provided by this application, the numbers of new infections in a plurality of preset geographic areas over a plurality of consecutive preset time periods are determined, the resulting time series of new infections are converted into supervised data, and a time-series neural network based on time-series subsequences is trained on the supervised data. The time-series neural network can thus predict the number of confirmed infectious disease cases in multiple areas more accurately over the short term.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally indicate like parts or steps.
Fig. 1 is a flowchart illustrating a method for training an infectious disease prediction model according to an exemplary embodiment of the present application.
Fig. 2 is a schematic structural diagram of an apparatus for training an infectious disease prediction model according to an exemplary embodiment of the present application.
Fig. 3 is a block diagram of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
To mitigate the uncertainty of infectious disease sequences across multiple regions, the invention provides a method and a device for training an infectious disease prediction model. The method counts the numbers of new infections in multiple regions over different time periods and uses a time-series neural network based on time-series subsequences (shapelets). The resulting infectious disease prediction model predicts the number of infections in the next time period, and thereby the fluctuation of the number of confirmed cases in future time periods.
Exemplary method
Fig. 1 is a flowchart illustrating a method for training an infectious disease prediction model according to an exemplary embodiment of the present application. The embodiment can be applied to electronic equipment. As shown in fig. 1, the method for training an infectious disease prediction model of the present application may include the following steps:
Step 101: for a plurality of preset geographic areas, determine the new-infection sequence data of each preset geographic area over a plurality of consecutive preset time periods.
In the embodiments of the application, the preset geographic areas can be set in advance. The more preset geographic areas there are, the more new-infection counts are collected and the more sample data are available for model training; the infectious disease prediction model is then trained more comprehensively and predicts future time periods more accurately.
In the embodiments of the present application, the consecutive preset time periods can also be set in advance. For example, the new infections in each preset geographic area may be collected with a month, a week, or a day as the unit time period, over consecutive months, weeks, or days.
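As a minimal sketch of choosing the unit time period, assuming the raw data are daily new-infection counts for one area, the Python function below sums them into consecutive fixed-length periods (for example 7 days for weekly periods). The function name `aggregate_counts` and the choice to drop an incomplete trailing period are assumptions for illustration, not part of the described method.

```python
from typing import List


def aggregate_counts(daily: List[int], period_days: int) -> List[int]:
    """Sum daily new-infection counts into consecutive periods of `period_days` days.

    A trailing partial period is dropped (assumption) so that every period
    covers the same span of time.
    """
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    n_full = len(daily) // period_days
    return [sum(daily[k * period_days:(k + 1) * period_days]) for k in range(n_full)]


# Hypothetical example: 15 days of counts for one area, aggregated into weeks.
daily_counts = [3, 5, 2, 0, 4, 6, 1, 2, 2, 3, 5, 7, 4, 1, 9]
print(aggregate_counts(daily_counts, 7))  # [21, 24]
```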
For each of the plurality of preset geographic areas, the number of new infections in that area over the plurality of consecutive preset time periods is normalized with the following formula:
$$ d' = \frac{d - \mathrm{mean}(d)}{\mathrm{std}(d)} $$
where d, the input to the formula, is the new-infection sequence data of one preset geographic area over the consecutive preset time periods; d' is the normalized sequence data; mean(·) returns the mean of the input d; and std(·) returns the standard deviation of the input d.
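The normalization above is a standard z-score. A minimal NumPy sketch follows; the helper name `zscore` is hypothetical, and using the population standard deviation (NumPy's default `ddof=0`) and substituting 1 for a zero standard deviation are assumptions, since the text specifies neither.

```python
import numpy as np


def zscore(d):
    """Normalize one area's new-infection sequence: d' = (d - mean(d)) / std(d).

    Returns the normalized sequence together with (mean, std) so that the
    transform can be inverted after prediction.
    """
    d = np.asarray(d, dtype=float)
    mu = d.mean()
    sigma = d.std()  # population standard deviation (ddof=0), an assumption
    if sigma == 0:
        # A constant sequence has no spread; avoid division by zero (assumed handling).
        sigma = 1.0
    return (d - mu) / sigma, mu, sigma


d = np.array([2, 4, 4, 4, 5, 5, 7, 9])
d_norm, mu, sigma = zscore(d)
print(mu, sigma)  # 5.0 2.0
```

Keeping `mu` and `sigma` per area matters because the same values are needed later to turn the model's normalized output back into case counts.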
Step 102: determine a plurality of time series of the number of new infections over the consecutive preset time periods based on the new-infection sequence data of the plurality of preset geographic areas.
In the embodiments of the application, the new-infection sequence data over the consecutive preset time periods can be preprocessed and sorted along the time dimension to obtain a plurality of time series of new infections, where the preprocessing may include data consistency processing, extreme-value processing, and data balancing. In this embodiment, M denotes the number of time intervals, and the time series of new infections across the preset geographic areas over the consecutive preset time periods can be written as $[y_1, y_2, \ldots, y_M]$, where
$$ y_t = \left[ y_t^{1}, y_t^{2}, \ldots, y_t^{I} \right] $$
I denotes the number of counted preset geographic areas, and $y_t^{i}$ denotes the number obtained for the i-th preset geographic area in time interval t.
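One way to picture step 102 is as an (M, I) array whose row t is $y_t$. The sketch below, assuming every area has been counted over the same M intervals, stacks per-area sequences in a fixed order; `build_panel`, the alphabetical area ordering, and the area names are hypothetical.

```python
import numpy as np


def build_panel(series_by_area):
    """Stack I per-area sequences of equal length M into an (M, I) array.

    Row t holds the new-infection counts of all I areas in time interval t,
    so the rows are ordered along the time dimension.
    """
    names = sorted(series_by_area)  # fixed area order (assumed: alphabetical)
    lengths = {len(series_by_area[n]) for n in names}
    if len(lengths) != 1:
        raise ValueError("all areas must cover the same M time intervals")
    panel = np.column_stack([np.asarray(series_by_area[n], dtype=float) for n in names])
    return names, panel


names, Y = build_panel({"area_b": [1, 2, 3], "area_a": [10, 20, 30]})
print(names, Y.shape)  # ['area_a', 'area_b'] (3, 2)
```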
Step 103: convert the plurality of time series of new infections into a plurality of supervised data, wherein the time series and the supervised data are in one-to-one correspondence.
In the embodiments of the present application, each of the plurality of time series of new infections may be normalized to obtain a plurality of normalized time series $[y_1, y_2, \ldots, y_M]$, where M, a positive integer, is the number of time intervals.
The normalized time series are then processed by a one-step-ahead (forward one-step) model with a preset lag parameter T to obtain the plurality of supervised data, the time series of new infections corresponding one to one with the supervised data. Specifically, the one-step-ahead model is:
$$ \left[ y_{t-T+1}, y_{t-T+2}, \ldots, y_{t} \right] \rightarrow y_{t+1} $$
where the left side of the formula is the input of the one-step-ahead model and the right side is its output.
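The one-step-ahead conversion can be sketched as a sliding window. Because the original formula survives only as an image, this sketch assumes the common form in which each input is the T most recent rows of the normalized (M, I) panel and the target is the row that follows. `one_step_ahead` is a hypothetical helper; column i of every sample belongs to area i, which keeps each area's time series paired with its own supervised data.

```python
import numpy as np


def one_step_ahead(panel, T):
    """Turn an (M, I) panel into supervised pairs with lag T.

    Input  X[k] = [y_{t-T+1}, ..., y_t]   shape (T, I)
    Target Z[k] = y_{t+1}                  shape (I,)
    for 0-based t = T-1, ..., M-2, giving M - T samples.
    """
    M = panel.shape[0]
    if not 1 <= T < M:
        raise ValueError("need 1 <= T < M")
    X = np.stack([panel[t - T + 1:t + 1] for t in range(T - 1, M - 1)])
    Z = np.stack([panel[t + 1] for t in range(T - 1, M - 1)])
    return X, Z


series = np.arange(6, dtype=float).reshape(6, 1)  # one area, M = 6
X, Z = one_step_ahead(series, T=3)
print(X[:, :, 0].tolist(), Z[:, 0].tolist())
# [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]] [3.0, 4.0, 5.0]
```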
Step 104: train the infectious disease prediction model with the plurality of supervised data, wherein the infectious disease prediction model is a time-series neural network based on time-series subsequences (shapelets).
In an embodiment of the present application, the plurality of supervised data is received by the time-series subsequence layer (shapelet layer) of the infectious disease prediction model. A shapelet is the subsequence of a time series that best characterizes the series, and the embodiments of the application use a plurality of shapelets to represent the features of the plurality of time series. This layer also stores the parameters S to be learned.
Calculating, through the distance layer of the infectious disease prediction model, the distance between the supervised data and the parameters to be learned stored in the shapelet layer, to obtain a number of distance values equal to the number of preset geographic areas. This layer computes the distance between the input time series of new infections and the parameters S. Let
$$ Y = \left[ Y_1, Y_2, \ldots, Y_I \right] $$
denote the input time series of new infections, where $Y_i$ is the segmented input data of the i-th area; the distance between Y and each shapelet is computed separately. The mean squared error is used as the distance between the input time series of new infections and the shapelets:
$$ D_{i,c} = \frac{1}{L} \sum_{l=1}^{L} \left( Y_{i,l} - S_{c,l} \right)^2 $$
where $D_{i,c}$ is the distance between the segmented input data of the i-th area and the c-th shapelet, and L is the shapelet length.
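The distance layer can be illustrated with NumPy broadcasting. This sketch assumes each area contributes one input window of the same length L as the shapelets; the function name `shapelet_distances` and the toy values are hypothetical, and in the actual model S would be learned rather than fixed.

```python
import numpy as np


def shapelet_distances(windows, shapelets):
    """Mean squared error between each area's input window and each shapelet.

    windows:   (I, L) one input segment of length L per area
    shapelets: (C, L) the parameters S
    returns:   (I, C) with D[i, c] = mean over l of (windows[i, l] - shapelets[c, l]) ** 2
    """
    diff = windows[:, None, :] - shapelets[None, :, :]  # broadcast to (I, C, L)
    return (diff ** 2).mean(axis=2)


Y = np.array([[0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0]])  # I = 2 areas, window length L = 3
S = np.array([[0.0, 1.0, 2.0],
              [1.0, 1.0, 1.0]])  # C = 2 shapelets
print(shapelet_distances(Y, S))
```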
Normalizing the plurality of distance values through the normalized activation function (softmin layer) of the infectious disease prediction model. This layer receives the distances from the previous layer and, for each area, normalizes the distances between that area's segmented input data and the shapelets, which helps identify the minimum distance between the input Y and the parameters S. The calculation is:
$$ M_{i,c} = \frac{\exp\left(-D_{i,c}\right)}{\sum_{c'=1}^{C} \exp\left(-D_{i,c'}\right)} $$
where $M_{i,c}$ is the weight between the segmented input data of the i-th area and the c-th shapelet, and C is the number of shapelets.
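Softmin is softmax applied to negated distances, so the closest shapelet receives the largest weight. A numerically stable NumPy sketch, assuming the normalization runs over the shapelets for each area; the function name `softmin` and the toy distances are illustrative.

```python
import numpy as np


def softmin(D, axis=1):
    """Softmin along `axis`: exp(-D) normalized to sum to 1.

    Subtracting the minimum distance first leaves the result unchanged
    but keeps exp from overflowing or underflowing for large distances.
    """
    e = np.exp(-(D - D.min(axis=axis, keepdims=True)))
    return e / e.sum(axis=axis, keepdims=True)


D = np.array([[0.0, np.log(3.0)],  # area 1: shapelet 0 is closer
              [5.0, 1.0]])         # area 2: shapelet 1 is closer
W = softmin(D)
print(W)
```

For the first row the weights are exp(0) and exp(-ln 3) rescaled to sum to 1, that is 0.75 and 0.25.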
Determining a weight of each of the plurality of preset geographic areas in the infectious disease prediction model based on the normalized plurality of distance values.
In one embodiment, the objective function for training is set to the mean squared error (MSE) function, the processed supervised data are input into the infectious disease prediction model, and the model is trained with a gradient descent algorithm.
On the basis of the above embodiment, the numbers of new infections in the plurality of preset geographic areas over the plurality of consecutive preset time periods form time series, which are converted into supervised data, and a time-series neural network based on time-series subsequences is trained on the supervised data. The time-series neural network can thereby predict the number of confirmed infectious disease cases in multiple areas accurately over the short term.
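The training step can be sketched as plain gradient descent on an MSE objective. This is a deliberately simplified stand-in, not the patented network: the shapelets are held fixed (here simply copied from the data), and only a hypothetical linear readout, shared across areas, maps each area's softmin weights to its next-step value. The function names, the readout, and the noisy sine-wave series are all assumptions; the series is synthetic toy data.

```python
import numpy as np


def features(X, S):
    """Softmin-normalized shapelet distances for every sample.

    X: (N, T, I) supervised inputs; S: (C, T) shapelets (length assumed equal to T).
    Returns an (N, I, C) array of weights.
    """
    W = np.transpose(X, (0, 2, 1))                                      # (N, I, T)
    D = ((W[:, :, None, :] - S[None, None, :, :]) ** 2).mean(axis=3)   # (N, I, C)
    e = np.exp(-(D - D.min(axis=2, keepdims=True)))
    return e / e.sum(axis=2, keepdims=True)


def train_readout(X, Z, S, lr=0.1, epochs=500):
    """Fit y_hat[n, i] = F[n, i] @ w + b by gradient descent on the MSE loss.

    Shapelets S stay fixed in this sketch; only the readout (w, b) is learned.
    """
    F = features(X, S)
    w = np.zeros(S.shape[0])
    b = 0.0
    losses = []
    for _ in range(epochs):
        err = F @ w + b - Z                                  # (N, I) residuals
        losses.append(float((err ** 2).mean()))
        grad_w = 2 * np.einsum("nic,ni->c", F, err) / err.size
        grad_b = 2 * err.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 6 * np.pi, 60))[:, None] + 0.1 * rng.standard_normal((60, 2))
T = 5
X = np.stack([series[t - T + 1:t + 1] for t in range(T - 1, len(series) - 1)])
Z = np.stack([series[t + 1] for t in range(T - 1, len(series) - 1)])
S = X[[0, 7, 14], :, 0]  # three shapelets copied from the data (assumed initialization)
w, b, losses = train_readout(X, Z, S)
print(losses[0], losses[-1])
```

In a full implementation the shapelets S would also receive gradients, typically through an automatic-differentiation framework.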
Further, the embodiment shown in fig. 1 further includes:
determining, based on the infectious disease prediction model, model data for the next time period after the preset time periods;
and performing inverse normalization on the model data to obtain the predicted number of cases in the next time period.
Specifically, if F denotes the trained infectious disease prediction model, the model output for the next time interval is:
$$ \hat{y}_{M+1} = F\left( y_{M-T+1}, y_{M-T+2}, \ldots, y_{M} \right) $$
where $\hat{y}_{M+1}$ represents the model data of the next time period; inverse normalization of this model data gives the predicted number of cases.
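The forecasting and inverse-normalization step can be sketched as follows, assuming the per-area mean and standard deviation from the z-score normalization were kept. `predict_next` is hypothetical, and the persistence "model" (repeat the last observed value) merely stands in for the trained network F.

```python
import numpy as np


def predict_next(model, panel_norm, T, mu, sigma):
    """Forecast the next interval from the last T normalized rows, then undo the z-score.

    model:      any callable mapping a (T, I) window to an (I,) normalized forecast
    panel_norm: (M, I) normalized panel
    mu, sigma:  (I,) per-area mean and standard deviation used for normalization
    """
    window = panel_norm[-T:]                 # the T most recent intervals
    y_hat_norm = np.asarray(model(window), dtype=float)
    return y_hat_norm * sigma + mu           # inverse normalization: d = d' * std + mean


def persistence(window):
    """Toy stand-in model: repeat the last observed (normalized) values."""
    return window[-1]


panel = np.array([[0.5, -1.0],
                  [1.0, 0.0],
                  [1.5, 1.0]])
forecast = predict_next(persistence, panel, T=2,
                        mu=np.array([100.0, 20.0]), sigma=np.array([10.0, 5.0]))
print(forecast)
```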
In the embodiments of the application, the features of the multiple geographic areas are learned through the shapelets, and the next time step is predicted from the features learned by the infectious disease prediction model, which can greatly improve prediction accuracy.
In addition, a shapelet is the subsequence that best characterizes a time series, and shapelets can be visualized. Therefore, by analyzing the trends of the shapelets and matching them against the time series of infections in each area, the outbreak trend of each area can be explained.
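The interpretation step relies on locating where a shapelet best matches an area's series. A minimal sketch, assuming the same sliding-window mean squared error as the distance layer; `best_match`, the series, and the "rising outbreak" shapelet are hypothetical toy values.

```python
import numpy as np


def best_match(series, shapelet):
    """Slide `shapelet` over `series` and return (offset, mse) of the closest segment."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    L = len(shapelet)
    if L > len(series):
        raise ValueError("shapelet longer than series")
    dists = [((series[k:k + L] - shapelet) ** 2).mean() for k in range(len(series) - L + 1)]
    k = int(np.argmin(dists))
    return k, float(dists[k])


area_series = [0, 0, 1, 3, 6, 3, 1, 0]  # hypothetical normalized counts for one area
peak = [1, 3, 6]                        # hypothetical learned "rising outbreak" shapelet
print(best_match(area_series, peak))    # (2, 0.0)
```

The returned offset marks where in the area's history the shapelet's pattern occurs, which is the kind of alignment the explanation of outbreak trends relies on.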
Exemplary devices
Fig. 2 is a schematic structural diagram of an apparatus for training an infectious disease prediction model according to an exemplary embodiment of the present application. The apparatus in this embodiment may be provided in an electronic device and includes a first determining module 21, a second determining module 22, a data conversion module 23, and a model training module 24.
The first determining module 21 is configured to determine, for a plurality of preset geographic areas, the new-infection sequence data of each preset geographic area over a plurality of consecutive preset time periods;
the second determining module 22 is configured to determine a plurality of time series of the number of new infections over the consecutive preset time periods based on the new-infection sequence data of each preset geographic area;
the data conversion module 23 is configured to convert the time series of the number of new infected persons into a plurality of supervised data, where the time series of the number of new infected persons and the supervised data correspond to each other one to one;
and the model training module 24 is used for training the infectious disease prediction model through the plurality of supervised data, wherein the infectious disease prediction model is a time sequence neural network based on a time sequence subsequence.
Further, on the basis of the embodiment shown in fig. 2, the data conversion module 23 is specifically configured to:
normalizing each of the plurality of time series of new infections to obtain a plurality of normalized time series;
and processing the plurality of normalized time series through a one-step-ahead model with a preset lag parameter to obtain a plurality of supervised data, wherein the plurality of time series of new infections are in one-to-one correspondence with the plurality of supervised data.
Further, on the basis of the embodiment shown in fig. 2, the model training module 24 is specifically configured to:
receiving the plurality of supervised data through the time-series subsequence layer (shapelet layer) of the infectious disease prediction model;
calculating the distance between the supervised data and the parameter to be learned stored in the time sequence sub-sequence layer through a distance layer (distance layer) of the infectious disease prediction model to obtain a plurality of distance values which are the same as the plurality of preset geographical areas in number;
normalizing the plurality of distance values by a normalized activation function (softmin layer) of the infectious disease prediction model;
determining a weight of each of the plurality of preset geographic areas in the infectious disease prediction model based on the normalized plurality of distance values.
Further, on the basis of the embodiment shown in fig. 2, the first determining module 21 is specifically configured to:
for each of the plurality of preset geographic areas, normalizing the number of new infections in that area over the plurality of consecutive preset time periods through the following formula:
$$ d' = \frac{d - \mathrm{mean}(d)}{\mathrm{std}(d)} $$
wherein d represents the number of new infections in the preset geographic area over the consecutive preset time periods, d' represents the normalized sequence data, mean(·) returns the mean of the input d, and std(·) returns the standard deviation of the input d.
Further, on the basis of the embodiment shown in fig. 2, the apparatus further includes:
a third determining module 25, configured to determine, based on the infectious disease prediction model, model data for the next time period after the preset time periods;
and a processing module 26, configured to perform inverse normalization on the model data to obtain the predicted number of cases in the next time period.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 3. FIG. 3 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 3, the electronic device 11 includes one or more processors 111 and memory 112.
The processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 11 to perform desired functions.
Memory 112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 111 to implement the method for training an infectious disease prediction model and/or other desired functions of the various embodiments of the present application described above. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 11 may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is a first device or a second device, the input device 113 may be a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input means 113 may be a communication network connector for receiving the acquired input signals from the first device and the second device.
The input device 113 may also include, for example, a keyboard, a mouse, and the like.
The output device 114 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 114 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for the sake of simplicity, only some of the components of the electronic device 11 relevant to the present application are shown in fig. 3, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 11 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of a method of training an infectious disease prediction model according to various embodiments of the present application described in the "exemplary methods" section of this specification, supra.
The program code of the computer program product, for carrying out operations according to embodiments of the present application, may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a method of training an infectious disease prediction model according to various embodiments of the present application described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above with reference to specific embodiments, but it should be noted that advantages, effects, etc. mentioned in the present application are only examples and are not limiting, and the advantages, effects, etc. must not be considered to be possessed by various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, equipment, and systems referred to in this application are only illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will appreciate, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," and "having" are open-ended, mean "including, but not limited to," and may be used interchangeably with that phrase. As used herein, the words "or" and "and" mean "and/or" and may be used interchangeably with it, unless the context clearly dictates otherwise. As used herein, the phrase "such as" means "such as but not limited to" and may be used interchangeably with it.
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations should be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (7)

1. A method for training an infectious disease prediction model, applied to an electronic device, the method comprising:
for each of a plurality of preset geographic areas, determining the new-infection sequence data corresponding to that area over a plurality of consecutive preset time periods;
determining a plurality of time series of the number of new infections over the plurality of consecutive preset time periods, based on the new-infection sequence data of the respective preset geographic areas;
converting the plurality of time series of the number of new infections into a plurality of supervised data sets, wherein the time series and the supervised data sets are in one-to-one correspondence;
training the infectious disease prediction model with the plurality of supervised data sets, wherein the infectious disease prediction model is a time-series neural network based on time-series subsequences (shapelets);
wherein converting the plurality of time series of the number of new infections into the plurality of supervised data sets comprises:
normalizing each of the plurality of time series of the number of new infections to obtain a plurality of normalized time series;
processing the plurality of normalized time series with a forward one-step model having a preset lag parameter to obtain the plurality of supervised data sets, wherein the time series of the number of new infections correspond one-to-one to the supervised data sets;
wherein determining the new-infection sequence data corresponding to each of the plurality of preset geographic areas over the plurality of consecutive preset time periods comprises:
for each of the plurality of preset geographic areas, normalizing the number of new infections in that area over the plurality of consecutive preset time periods by the following formula:
$$d' = \frac{d - \mathrm{mean}(d)}{\mathrm{std}(d)}$$
where d denotes the numbers of new infections in the preset geographic area over the plurality of consecutive preset time periods, d' denotes the normalized sequence data, mean(·) denotes the mean of its input d, and std(·) denotes the standard deviation of its input d;
wherein determining the plurality of time series of the number of new infections over the plurality of consecutive preset time periods based on the new-infection sequence data of the respective preset geographic areas comprises preprocessing the new-infection sequence data for the plurality of consecutive preset time periods and ordering it along the time dimension to obtain the plurality of time series, the preprocessing comprising data consistency processing, maximum-value processing, and data equalization; with M denoting the number of time intervals, the time series of the number of new infections in a preset geographic area over the plurality of consecutive preset time periods is denoted [y_1, y_2, …, y_M], where
[Formula image FDA0003660719740000021]
and i denotes the index of a counted preset geographic area among the total of I preset geographic areas;
receiving the plurality of supervised data sets through a time-series subsequence (shapelet) layer of the infectious disease prediction model, wherein a shapelet is the subsequence of a time series that best characterizes that series, a plurality of shapelets are used to represent the features of the plurality of time series, and the shapelet layer stores the parameter S to be learned;
calculating, through a distance layer of the infectious disease prediction model, the distances between the supervised data and the parameter S stored in the shapelet layer, to obtain a plurality of distance values equal in number to the plurality of preset geographic areas; the distance layer calculates the distance values between the input time series of the number of new infections and the parameter S. Using the symbol Y, given by
[Formula image FDA0003660719740000022]
to denote the time series of the number of new infections, the distance between Y and each shapelet is calculated separately, with the mean squared error used as the distance between the input time series and a shapelet:
[Formula image FDA0003660719740000023]
where D_{i,c} denotes the distance value between the segmented input data of the i-th region and the c-th shapelet;
normalizing the plurality of distance values through a normalized-activation (softmin) layer of the infectious disease prediction model; this layer receives the distances from the previous layer and normalizes the distance values between each region's input segment data and the shapelets, which helps find the minimum distance between the input Y and the parameter S, according to the following formula:
[Formula image FDA0003660719740000024]
where M_{i,c} denotes the weight of the segmented input data of the i-th region with respect to the c-th shapelet;
determining the weights of the preset geographic areas in the infectious disease prediction model based on the normalized distance values, which comprises setting the training objective function to the mean squared error, inputting the processed supervised data into the infectious disease prediction model, and training the infectious disease prediction model with a gradient descent algorithm;
determining, based on the infectious disease prediction model, the model data for the next time period after the preset time periods; and
performing inverse normalization on the model data to obtain the predicted number of cases in the next time period; with F denoting the trained infectious disease prediction model, the model output for the next time interval is:
[Formula image FDA0003660719740000031]
where
[Formula image FDA0003660719740000032]
denotes the model data of the next time period, which is inverse-normalized to obtain the predicted number of cases.
2. The method of claim 1, wherein training the infectious disease prediction model with the plurality of supervised data sets comprises:
setting the training objective function to the mean squared error, inputting the plurality of supervised data sets into the infectious disease prediction model, and training the infectious disease prediction model with a gradient descent algorithm.
3. The method of claim 1, further comprising:
determining, based on the infectious disease prediction model, model data for the next time period after the preset time periods; and
performing inverse normalization on the model data to obtain the predicted number of cases in the next time period.
4. An apparatus for training an infectious disease prediction model, applied to an electronic device, the apparatus comprising:
a first determining module, configured to determine, for each of a plurality of preset geographic areas, the new-infection sequence data corresponding to that area over a plurality of consecutive preset time periods;
a second determining module, configured to determine a plurality of time series of the number of new infections over the plurality of consecutive preset time periods based on the new-infection sequence data of the respective preset geographic areas;
a data conversion module, configured to convert the plurality of time series of the number of new infections into a plurality of supervised data sets, wherein the time series and the supervised data sets are in one-to-one correspondence; and
a model training module, configured to train the infectious disease prediction model with the plurality of supervised data sets, wherein the infectious disease prediction model is a time-series neural network based on time-series subsequences (shapelets);
wherein the data conversion module is further configured to determine the new-infection sequence data corresponding to each of the plurality of preset geographic areas over the plurality of consecutive preset time periods, which comprises:
for each of the plurality of preset geographic areas, normalizing the number of new infections in that area over the plurality of consecutive preset time periods by the following formula:
$$d' = \frac{d - \mathrm{mean}(d)}{\mathrm{std}(d)}$$
where d denotes the numbers of new infections in the preset geographic area over the plurality of consecutive preset time periods, d' denotes the normalized sequence data, mean(·) denotes the mean of its input d, and std(·) denotes the standard deviation of its input d;
wherein determining the plurality of time series of the number of new infections over the plurality of consecutive preset time periods based on the new-infection sequence data of the respective preset geographic areas comprises preprocessing the new-infection sequence data for the plurality of consecutive preset time periods and ordering it along the time dimension to obtain the plurality of time series, the preprocessing comprising data consistency processing, maximum-value processing, and data equalization; with M denoting the number of time intervals, the time series of the number of new infections in a preset geographic area over the plurality of consecutive preset time periods is denoted [y_1, y_2, …, y_M], where
[Formula image FDA0003660719740000042]
and i denotes the index of a counted preset geographic area among the total of I preset geographic areas;
receiving the plurality of supervised data sets through a time-series subsequence (shapelet) layer of the infectious disease prediction model, wherein a shapelet is the subsequence of a time series that best characterizes that series, a plurality of shapelets are used to represent the features of the plurality of time series, and the shapelet layer stores the parameter S to be learned;
calculating, through a distance layer of the infectious disease prediction model, the distances between the supervised data and the parameter S stored in the shapelet layer, to obtain a plurality of distance values equal in number to the plurality of preset geographic areas; the distance layer calculates the distance values between the input time series of the number of new infections and the parameter S. Using the symbol Y, given by
[Formula image FDA0003660719740000043]
to denote the time series of the number of new infections, the distance between Y and each shapelet is calculated separately, with the mean squared error used as the distance between the input time series and a shapelet:
[Formula image FDA0003660719740000044]
where D_{i,c} denotes the distance value between the segmented input data of the i-th region and the c-th shapelet;
normalizing the plurality of distance values through a normalized-activation (softmin) layer of the infectious disease prediction model; this layer receives the distances from the previous layer and normalizes the distance values between each region's input segment data and the shapelets, which helps find the minimum distance between the input Y and the parameter S, according to the following formula:
[Formula image FDA0003660719740000051]
where M_{i,c} denotes the weight of the segmented input data of the i-th region with respect to the c-th shapelet;
determining the weights of the preset geographic areas in the infectious disease prediction model based on the normalized distance values, which comprises setting the training objective function to the mean squared error, inputting the processed supervised data into the infectious disease prediction model, and training the infectious disease prediction model with a gradient descent algorithm;
determining, based on the infectious disease prediction model, the model data for the next time period after the preset time periods; and
performing inverse normalization on the model data to obtain the predicted number of cases in the next time period; with F denoting the trained infectious disease prediction model, the model output for the next time interval is:
[Formula image FDA0003660719740000052]
where
[Formula image FDA0003660719740000053]
denotes the model data of the next time period, which is inverse-normalized to obtain the predicted number of cases.
5. The apparatus of claim 4, wherein the data conversion module is specifically configured to:
normalize each of the plurality of time series of the number of new infections to obtain a plurality of normalized time series; and
process the plurality of normalized time series with a forward one-step model having a preset lag parameter to obtain the plurality of supervised data sets, wherein the time series of the number of new infections correspond one-to-one to the supervised data sets.
6. The apparatus of claim 4, wherein the model training module is specifically configured to:
set the training objective function to the mean squared error, input the plurality of supervised data sets into the infectious disease prediction model, and train the infectious disease prediction model with a gradient descent algorithm.
7. The apparatus of claim 4, further comprising:
a third determining module, configured to determine, based on the infectious disease prediction model, model data for the next time period after the preset time periods; and
a processing module, configured to perform inverse normalization on the model data to obtain the predicted number of cases in the next time period.
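Claims 1 and 4 normalize each area's new-infection counts with d' = (d - mean(d)) / std(d), and claims 3 and 7 later undo that normalization. The following is a minimal Python sketch of the z-score step. It assumes the population standard deviation, because the claims do not say which form of std(·) is meant. The function name zscore and the sample counts are hypothetical.

```python
import statistics
from typing import List, Tuple


def zscore(d: List[float]) -> Tuple[List[float], float, float]:
    """Return (d', mean, std) with d' = (d - mean(d)) / std(d).

    Assumption: std(.) is the population standard deviation (pstdev).
    The mean and std are returned so the prediction can be inverted later.
    """
    mu = statistics.fmean(d)
    sigma = statistics.pstdev(d)
    if sigma == 0:
        # A constant series cannot be z-score normalized.
        raise ValueError("series has zero standard deviation")
    return [(x - mu) / sigma for x in d], mu, sigma


counts = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical new cases per period for one area
d_prime, mu, sigma = zscore(counts)
print(mu, sigma)  # 5.0 2.0
print(d_prime)    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Keeping mu and sigma per area matters because each area's series is normalized separately, so each prediction must be inverted with that area's own statistics.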
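The forward one-step conversion in claims 1 and 5 turns each normalized series into supervised (input, target) pairs using a preset lag parameter. Here is a minimal sketch. It assumes that each input is the window of the `lag` most recent values and that the target is the value one step ahead. The name to_supervised is illustrative and does not come from the patent.

```python
from typing import List, Tuple


def to_supervised(series: List[float], lag: int) -> Tuple[List[List[float]], List[float]]:
    """Frame one normalized series as one-step-ahead supervised samples.

    Sample t uses series[t-lag:t] as input and series[t] as the target.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(lag, len(series)):
        inputs.append(series[t - lag:t])  # the `lag` values before step t
        targets.append(series[t])         # the value one step ahead
    return inputs, targets


X, y = to_supervised([0.1, 0.4, 0.3, 0.8, 0.5], lag=2)
print(X)  # [[0.1, 0.4], [0.4, 0.3], [0.3, 0.8]]
print(y)  # [0.3, 0.8, 0.5]
```

Applying this function to every area's normalized series yields one supervised data set per area. That matches the one-to-one correspondence stated in the claims.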
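The distance layer in claims 1 and 4 scores each region's segmented input against each shapelet using a mean squared error, which produces D_{i,c}. The exact formula appears only as an image, so this sketch makes a simplifying assumption: each segment has already been cut to the shapelet length, and the layer averages the squared differences. The function name distance_layer and the sample values are hypothetical. Shapelet-learning models often also slide each shapelet along the series and keep the minimum distance; that variant is not shown here.

```python
from typing import List


def distance_layer(segments: List[List[float]], shapelets: List[List[float]]) -> List[List[float]]:
    """Return D where D[i][c] is the mean squared error between the
    segmented input of region i and shapelet c (equal lengths assumed)."""
    D: List[List[float]] = []
    for seg in segments:
        row: List[float] = []
        for s in shapelets:
            if len(s) != len(seg):
                raise ValueError("segment and shapelet lengths differ")
            row.append(sum((a - b) ** 2 for a, b in zip(seg, s)) / len(s))
        D.append(row)
    return D


segments = [[0.0, 1.0], [2.0, 2.0]]   # hypothetical segmented inputs for two regions
shapelets = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical values of the learned parameter S
print(distance_layer(segments, shapelets))  # [[0.5, 0.5], [4.0, 1.0]]
```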
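The softmin layer in claims 1 and 4 normalizes the distances so that closer shapelets receive larger weights M_{i,c}. The exact formula is available only as an image. This sketch therefore assumes softmin(x) = softmax(-x), applied across the shapelets of each region, with no temperature parameter. The function name softmin is illustrative.

```python
import math
from typing import List


def softmin(D: List[List[float]]) -> List[List[float]]:
    """Row-wise softmin: M[i][c] = exp(-D[i][c]) / sum_k exp(-D[i][k])."""
    M: List[List[float]] = []
    for row in D:
        smallest = min(row)
        # Subtracting the row minimum leaves the result unchanged but keeps
        # every exponent <= 0, which avoids overflow for large distances.
        exps = [math.exp(-(d - smallest)) for d in row]
        total = sum(exps)
        M.append([e / total for e in exps])
    return M


W = softmin([[0.5, 0.5], [4.0, 1.0]])
print(W[0])                         # [0.5, 0.5]: equal distances give equal weights
print([round(w, 3) for w in W[1]])  # [0.047, 0.953]: the closer shapelet dominates
```

Each row sums to 1, so the weights for a region can be read as a soft indication of which shapelet that region's input is closest to.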
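Claims 1, 2, and 6 train the model with a mean squared error objective and a gradient descent algorithm. The patent does not spell out the output layer. This sketch assumes a single linear output on top of a feature vector, such as the softmin weights, and runs plain batch gradient descent. The names train_linear_head, lr, and epochs, and their default values, are hypothetical.

```python
from typing import List, Tuple


def train_linear_head(X: List[List[float]], y: List[float],
                      lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Fit y ~ w.x + b by batch gradient descent on the mean squared error."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            # d/dw_j of (1/n) * sum(err^2) is (2/n) * sum(err * x_j)
            for j in range(k):
                grad_w[j] += 2.0 * err * xi[j] / n
            grad_b += 2.0 * err / n
        # Step against the gradient with learning rate lr.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical features and targets following y = 2x + 1.
w, b = train_linear_head([[0.0], [0.5], [1.0]], [1.0, 2.0, 3.0])
print(round(w[0], 3), round(b, 3))  # 2.0 1.0
```

In a full shapelet network, the gradient would also flow back into the shapelet parameter S. That is usually left to an automatic-differentiation framework rather than written out by hand.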
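For the next time period, claims 1, 3, and 7 apply the trained model and invert the normalization, d = d' · std(d) + mean(d). This sketch assumes that the model F is any callable that maps the last `lag` normalized values to one normalized output. The names predict_next_count and toy_model are hypothetical, and the averaging model is only a stand-in for the trained network.

```python
from typing import Callable, List


def predict_next_count(history_norm: List[float], lag: int,
                       model: Callable[[List[float]], float],
                       mu: float, sigma: float) -> float:
    """Predict the next period's count and map it back to the original scale."""
    if len(history_norm) < lag:
        raise ValueError("not enough history for the chosen lag")
    window = history_norm[-lag:]     # most recent `lag` normalized values
    y_next_norm = model(window)      # normalized model data for the next period
    return y_next_norm * sigma + mu  # inverse of d' = (d - mean) / std


def toy_model(window: List[float]) -> float:
    """Hypothetical stand-in for the trained model F: averages the window."""
    return sum(window) / len(window)


print(predict_next_count([-1.0, 0.0, 1.0], lag=2, model=toy_model, mu=5.0, sigma=2.0))  # 6.0
```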
CN202011298182.7A 2020-11-19 2020-11-19 Method and device for training infectious disease prediction model Active CN112562861B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011298182.7A CN112562861B (en) 2020-11-19 2020-11-19 Method and device for training infectious disease prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011298182.7A CN112562861B (en) 2020-11-19 2020-11-19 Method and device for training infectious disease prediction model

Publications (2)

Publication Number Publication Date
CN112562861A CN112562861A (en) 2021-03-26
CN112562861B true CN112562861B (en) 2022-09-09

Family

ID=75043930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011298182.7A Active CN112562861B (en) 2020-11-19 2020-11-19 Method and device for training infectious disease prediction model

Country Status (1)

Country Link
CN (1) CN112562861B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3642847A1 (en) * 2018-08-31 2020-04-29 Google LLC. Privacy-first on-device federated health modeling and intervention
CN111599485A (en) * 2020-05-26 2020-08-28 Central South University of Forestry and Technology Infectious disease propagation law prediction method, device, equipment and storage medium
CN111863276A (en) * 2020-07-21 2020-10-30 Jimei University Hand-foot-and-mouth disease prediction method using fine-grained data, electronic device, and medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3642847A1 (en) * 2018-08-31 2020-04-29 Google LLC. Privacy-first on-device federated health modeling and intervention
CN111599485A (en) * 2020-05-26 2020-08-28 中南林业科技大学 Infectious disease propagation law prediction method, device, equipment and storage medium
CN111863276A (en) * 2020-07-21 2020-10-30 集美大学 Hand-foot-and-mouth disease prediction method using fine-grained data, electronic device, and medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Diarrhoea outpatient visits prediction based on time series; Yongming Wang et al.; Knowledge-Based Systems; 20150825; 12-23 *
Dilated Recurrent Neural Network for Epidemiological Predictions; Yonggang Fu et al.; 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE); 20190820; 1728-1731 *
Application of EMD-BP neural network in research on infectious disease incidence trends and prediction; Zhenqiu Liu et al.; Chinese Journal of Health Statistics; 20180225 (Issue 01); full text *
Prediction of HFMD Cases by Leveraging Time Series; Zhijin Wang et al.; Wireless Communications and Mobile Computing; 20210513; 1-10 *

Also Published As

Publication number Publication date
CN112562861A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN107773214B (en) Method, computer readable medium and system for optimal wake-up strategy
JP2021532499A (en) Machine learning-based medical data classification methods, devices, computer devices and storage media
WO2021159761A1 (en) Pathological data analysis method and apparatus, and computer device and storage medium
WO2022188773A1 (en) Text classification method and apparatus, device, computer-readable storage medium, and computer program product
CN112183166A (en) Method and device for determining training sample and electronic equipment
US11520817B2 (en) Method and system for automatic discovery of topics and trends over time
EP3968337A1 (en) Target object attribute prediction method based on machine learning and related device
EP3916641A1 (en) Continuous time self attention for improved computational predictions
CN116663568B (en) Critical task identification system and method based on priority
US20230076575A1 (en) Model personalization system with out-of-distribution event detection in dialysis medical records
CN113889262A (en) Model-based data prediction method and device, computer equipment and storage medium
CN113705809A (en) Data prediction model training method, industrial index prediction method and device
CN113704410A (en) Emotion fluctuation detection method and device, electronic equipment and storage medium
JPWO2016084326A1 (en) Information processing system, information processing method, and program
CN112562861B (en) Method and device for training infectious disease prediction model
CN110393539B (en) Psychological anomaly detection method and device, storage medium and electronic equipment
CN116580702A (en) Speech recognition method, device, computer equipment and medium based on artificial intelligence
Parveen et al. Probabilistic Model-Based Malaria Disease Recognition System
US20230190159A1 (en) Mood forecasting method, mood forecasting apparatus and program
WO2022092447A1 (en) Method for mediating deep learning model transaction, performed by deep learning model transaction mediation server
Hodapp Unsupervised learning for computational phenotyping
WO2022249483A1 (en) Prediction device, learning device, prediction method, learning method, and program
US20230368920A1 (en) Learning apparatus, mental state sequence prediction apparatus, learning method, mental state sequence prediction method and program
WO2022092448A1 (en) Deep learning solution providing method performed by deep learning platform providing device for providing deep learning solution platform
Maharudra et al. A HIGH-LEVEL ENSEMBLE FEATURE SELECTION ALGORITHM FOR MITIGATING THE DIMENSIONALITY IN STRESS DATA

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant