CN113192640A

CN113192640A - COVID-19 risk stage assessment method and system based on transfer learning

Info

Publication number: CN113192640A
Application number: CN202110492146.2A
Authority: CN
Inventors: 沈国江; 李宁; 郦鹏飞; 孔祥杰
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2021-05-06
Filing date: 2021-05-06
Publication date: 2021-07-30

Abstract

The COVID-19 risk stage assessment method based on transfer learning comprises the following steps: 1) proposing the COVID-19 risk stage, i.e. defining the risk stage of the epidemic from the perspective of the epidemic cycle; 2) pre-training a decoder to obtain a standard feature space mapping method; 3) classifying the countries' COVID-19 data by degree of similarity using the decoder; 4) quantitatively analyzing the characteristics of each category of data; 5) matching the country to be evaluated to its country category according to the data characteristics; 6) assessing the COVID-19 risk stage by transfer learning, using two methods, instance-based transfer learning and instance-based transfer learning after normalization, wherein the former is used with a fixed data volume and the latter with a self-determined data volume. The invention also comprises a system for implementing the COVID-19 risk stage assessment method based on transfer learning. Assessment experiments on the COVID-19 risk stage of 9 affected countries show that the invention performs well on this problem.

Description

COVID-19 risk stage assessment method and system based on transfer learning

Technical Field

The invention relates to the field of infectious diseases such as COVID-19, and in particular to a COVID-19 risk stage assessment method and system based on transfer learning.

Background

Current assessment of the COVID-19 stage has the following shortcomings. First, there is no good definition of the indicators used to assess the COVID-19 risk stage. Second, in the initial stage of a COVID-19 outbreak a country lacks epidemic data of its own, so data from other countries are needed to help determine its risk stage; however, epidemic data differ greatly between countries, so a transfer learning method is required. Assessing the risk stage of the COVID-19 epidemic is therefore highly challenging.

Disclosure of Invention

The invention provides a COVID-19 risk stage assessment method and system based on transfer learning, aiming to overcome the above shortcomings of the prior art.

To address this situation, the invention first defines the COVID-19 risk stage and provides a method for assessing it. It further provides a complete data-transfer assessment workflow that assesses the COVID-19 risk stage and offers an effective basis for government decision-making.

The invention achieves this aim through the following technical scheme. The COVID-19 risk stage assessment method based on transfer learning comprises the following steps:

(1) proposing the COVID-19 risk stage;

(2) designing a decoder;

(3) pre-training a decoder to obtain a standard feature space mapping method;

(4) classifying the countries' COVID-19 data by degree of similarity using the decoder;

(5) quantitatively analyzing the characteristics of each category of data;

(6) matching the country to be evaluated to its country category according to the data characteristics;

(7) instance-based transfer learning assessment;

(8) instance-based transfer learning assessment after normalization;

Step (1) specifically comprises the following:

At present, countries have no unified standard for assessing the risk stage of COVID-19. Existing assessment standards are generally based on the number of confirmed cases. However, count-based risk stage assessment has several problems: it does not indicate the future development trend of the epidemic, and it cannot assess the risk stage from the viewpoint of a complete cycle of confirmed cases. Our assessment criterion is based on the complete cycle of confirmed cases and takes the future development trend of the epidemic into account.

The specific assessment criterion is defined as follows.

Definition: COVID-19 risk stage. We introduce a measure, recent_to_max (rtm), to describe the infection status of a country.

rdc represents the average number of infections over the last three days. A window that is too short is overly sensitive to daily fluctuations, while a window that is too long reduces the differences between countries during their early high-growth phase. A three-day window is reasonable and captures the latest growth trend. mdc represents the maximum daily confirmed count over the whole infection cycle; to reduce errors as far as possible, it is likewise taken as the average of the three highest daily confirmed counts over the entire COVID-19 infection cycle. case_n denotes the number of confirmed cases on a given day. The specific COVID-19 stage is determined as shown in Table 1. After several attempts, this classification criterion proved the most suitable for our present study. The labels for the whole COVID-19 period of all countries are shown in FIG. 2; the figure clearly shows that the distribution of the labels is consistent with the development of the COVID-19 cycle.

TABLE 1
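The rtm criterion described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, the ratio rdc / mdc is inferred from the name of the measure and from the ranges in Table 1, mdc is assumed to be the mean of the three largest daily counts, and the stage names paraphrase Table 1.

```python
from typing import Sequence

# Upper bounds and stage names taken from the rtm ranges in Table 1.
STAGES = [(0.2, "Low risk"), (0.5, "Medium risk"), (0.8, "High risk")]


def recent_to_max(cases: Sequence[float], window: int = 3) -> float:
    """Return rdc / mdc for a series of daily confirmed cases."""
    if len(cases) < window:
        raise ValueError("need at least `window` days of data")
    # rdc: mean of the most recent `window` days.
    rdc = sum(cases[-window:]) / window
    # mdc: mean of the `window` largest daily counts (assumed reading of the text).
    mdc = sum(sorted(cases)[-window:]) / window
    if mdc == 0:
        return 0.0  # assumption: no confirmed cases maps to the lowest ratio
    return rdc / mdc


def risk_stage(rtm: float) -> str:
    """Map an rtm value to a stage using half-open intervals [low, high)."""
    for upper, name in STAGES:
        if rtm < upper:
            return name
    return "Severe"  # rtm in [0.8, +inf)


if __name__ == "__main__":
    series = [10, 50, 100, 90, 80, 20, 10, 6]
    value = recent_to_max(series)
    print(round(value, 4), risk_stage(value))
```

For the sample series, rdc is 12 and mdc is 90, so rtm is about 0.13 and the series falls in the lowest range of Table 1.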

Step (2) specifically comprises the following:

An LSTM can capture the quantitative characteristics and trends of its input. As shown in FIG. 1, the decoder consists of an LSTM followed by one fully connected layer, whose output is passed through an argmax function. The input is the confirmed COVID-19 case data for the previous 4 days together with the confirmed case data for the following day, and the output is the stage that COVID-19 is currently in. The specific formulas are as follows:

The input is passed through the LSTM and the last output is taken.

The final output is then obtained.

Y_lable is the risk stage assessment result that we need.

The loss function is shown in the following equation. The first half minimizes the error between the true and predicted values. The second half, L_reg, is an L2 regularization term used to avoid overfitting, where λ is a hyper-parameter.
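The structure of the loss described above (an error term plus λ times an L2 term) and the final argmax can be sketched as follows. The text does not say which error measure is used, so cross-entropy over the four stages is an assumption here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the stage scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_loss(logits: Sequence[float], true_stage: int,
                     weights: Sequence[float], lam: float = 1e-3) -> float:
    """Error term plus lam * L_reg, where L_reg is the sum of squared weights."""
    probs = softmax(logits)
    error_term = -math.log(probs[true_stage])  # assumption: cross-entropy error
    l2_term = sum(w * w for w in weights)      # L2 regularization term
    return error_term + lam * l2_term


def predict_stage(logits: Sequence[float]) -> int:
    """argmax over the fully connected layer's output scores."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    print(regularized_loss([0.0, 0.0, 0.0, 0.0], 0, [1.0, 2.0], lam=0.1))
    print(predict_stage([0.1, 2.0, -1.0, 0.5]))
```

A larger λ penalizes large weights more strongly, trading training accuracy for a smoother model.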

Step (3) specifically comprises the following:

31. Data preprocessing;

Countries with a recent_to_max less than or equal to 0.1 and more than 3000 confirmed cases in total are selected from the 187 countries in the global COVID-19 confirmed-case dataset; these countries are considered to be at the end of, or past, a complete cycle. Countries with obvious data errors are removed beforehand, and the data of the selected countries are labeled according to the rules above.
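The selection rule above can be sketched as a simple filter. This is an illustrative sketch: the function name and the dictionary layout are hypothetical, and the ratio is computed under the same assumptions as the rtm measure (mean of the last three days divided by the mean of the three largest days).

```python
from typing import Dict, List, Sequence


def select_pretraining_countries(daily_cases: Dict[str, Sequence[float]],
                                 rtm_max: float = 0.1,
                                 min_total: float = 3000) -> List[str]:
    """Keep countries whose cycle looks complete and whose total is large enough."""
    selected = []
    for country, cases in daily_cases.items():
        if len(cases) < 3:
            continue  # too little data to compute a three-day average
        rdc = sum(cases[-3:]) / 3
        mdc = sum(sorted(cases)[-3:]) / 3
        if mdc == 0:
            continue  # no confirmed cases at all
        # Thresholds from the text: ratio <= 0.1 and total confirmed > 3000.
        if rdc / mdc <= rtm_max and sum(cases) > min_total:
            selected.append(country)
    return selected


if __name__ == "__main__":
    data = {
        "A": [100, 2000, 3000, 2500, 50, 20, 10],      # past its peak, large total
        "B": [10, 20, 30, 25, 5, 2, 1],                # past its peak, small total
        "C": [100, 2000, 3000, 2500, 2400, 2300, 2200],  # still near its peak
    }
    print(select_pretraining_countries(data))
```

Removal of countries with obvious data errors is a manual step in the text and is not modeled here.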

32. Formal pre-training;

We train on the confirmed COVID-19 data of one country at a time; as the stopping criterion, we generally require that the trained decoder decodes the training country with 100% accuracy.
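The stopping criterion above can be sketched as a loop that trains until the decoder reproduces every label of the training country. The two callbacks, `train_one_epoch` and `accuracy`, are hypothetical stand-ins for the real training step and evaluation, and the `max_epochs` cap is an added safeguard that the text does not mention.

```python
from typing import Callable, Tuple


def pretrain_until_perfect(train_one_epoch: Callable[[], None],
                           accuracy: Callable[[], float],
                           max_epochs: int = 1000) -> Tuple[int, bool]:
    """Train until accuracy on the training country reaches 100%.

    Returns (epochs_run, converged).
    """
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if accuracy() >= 1.0:  # stop sign: the training country is fully decoded
            return epoch, True
    return max_epochs, False


if __name__ == "__main__":
    # Toy stand-in for a model: each epoch gets one more of 10 labels right.
    state = {"correct": 7}

    def step() -> None:
        state["correct"] = min(10, state["correct"] + 1)

    def acc() -> float:
        return state["correct"] / 10

    print(pretrain_until_perfect(step, acc))
```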

Step (4) specifically comprises the following:

The specific classification process is as follows: the decoder is used to decode all countries to obtain their degree of similarity, and the data whose similarity exceeds a given criterion are selected as one class. This is repeated until all data are classified.
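The classification loop above can be sketched as a greedy grouping procedure. The `similarity` callback is a hypothetical stand-in for "train a decoder on the benchmark country, then measure how well it decodes another country"; the default threshold of 0.8 follows the 80% criterion chosen in the embodiment, and picking the benchmark at random is an assumption about how each round starts.

```python
import random
from typing import Callable, List, Sequence


def group_by_similarity(countries: Sequence[str],
                        similarity: Callable[[str, str], float],
                        threshold: float = 0.8,
                        seed: int = 0) -> List[List[str]]:
    """Repeatedly pick a benchmark country and group everything similar to it."""
    rng = random.Random(seed)
    remaining = list(countries)
    groups: List[List[str]] = []
    while remaining:
        benchmark = rng.choice(remaining)
        # The benchmark always belongs to its own class; others join if the
        # decoder trained on the benchmark decodes them above the threshold.
        group = [c for c in remaining
                 if c == benchmark or similarity(benchmark, c) > threshold]
        groups.append(group)
        remaining = [c for c in remaining if c not in group]
    return groups


if __name__ == "__main__":
    blocks = {"A": 0, "B": 0, "C": 1, "D": 1}

    def toy_similarity(x: str, y: str) -> float:
        return 1.0 if blocks[x] == blocks[y] else 0.0

    print(group_by_similarity(["A", "B", "C", "D"], toy_similarity))
```

Because every round removes at least the benchmark country, the loop always terminates.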

Step (5) specifically comprises the following:

The difference between categories of data is mainly reflected in the number of confirmed cases. Countries in the same category show similar COVID-19 cyclical variation and similar distributions of maxima. The average number of confirmed cases is approximately the same for countries in the same category. We should therefore find a characteristic trend in each category of data, which in turn helps us classify data of unknown category. We obtained these features by statistical analysis of the data.

Step (6) specifically comprises the following:

The category to which the country to be evaluated belongs is determined from these features.
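One way to sketch the matching step is a nearest-category rule on the average daily confirmed count. The text does not spell out the matching features, so everything here is an assumption for illustration: the function name, the use of a single feature (mean daily cases), and the log-scale distance.

```python
import math
from typing import Dict, Sequence


def match_category(target_cases: Sequence[float],
                   category_mean_cases: Dict[str, float]) -> str:
    """Return the category whose mean daily count is closest on a log scale."""
    if not target_cases or not category_mean_cases:
        raise ValueError("need target data and at least one category")
    target_mean = sum(target_cases) / len(target_cases)

    def distance(item) -> float:
        _, mean = item
        # log1p keeps the comparison meaningful across orders of magnitude.
        return abs(math.log1p(target_mean) - math.log1p(mean))

    name, _ = min(category_mean_cases.items(), key=distance)
    return name


if __name__ == "__main__":
    categories = {"small": 50.0, "medium": 1200.0, "large": 30000.0}
    print(match_category([900, 1100, 1000], categories))
```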

Step (7) specifically comprises the following:

Data of the same category can be transferred directly, which solves the problem of insufficient data. The data of the same category are transferred directly to train a decoder, and the decoder is then used to decode the current COVID-19 risk stage of the country to be evaluated.
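The direct transfer described above amounts to pooling the training samples of every source country in the target's category. A minimal sketch, with a hypothetical function name and data layout (each sample is an arbitrary `(features, label)` pair):

```python
from typing import Dict, Hashable, List, Tuple

Sample = Tuple[object, int]


def pool_source_data(category_of: Dict[str, Hashable],
                     samples_by_country: Dict[str, List[Sample]],
                     target_country: str) -> List[Sample]:
    """Collect samples from all source countries in the target's category."""
    target_category = category_of[target_country]
    pooled: List[Sample] = []
    for country, samples in samples_by_country.items():
        # The target itself is excluded: it is the country lacking data.
        if country != target_country and category_of.get(country) == target_category:
            pooled.extend(samples)
    return pooled


if __name__ == "__main__":
    category_of = {"A": 1, "B": 1, "C": 2, "T": 1}
    samples = {"A": [("a1", 0)], "B": [("b1", 1), ("b2", 2)], "C": [("c1", 3)]}
    print(pool_source_data(category_of, samples, "T"))
```

The pooled list would then be used to train the decoder that assesses the target country.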

Step (8) specifically comprises the following:

The assessment process for instance-based transfer learning after normalization is shown in FIG. 7. The data of every country are normalized by the maximum value of that country's own data to the range [0,1], that is, mapped into the same distribution space. These data are then used as source data to train the decoder. For the country to be evaluated, the maximum number of confirmed COVID-19 cases is not yet known, so its data are normalized according to the normalization rule of the country category to which it is matched by its features. The normalized data are fed into the decoder to obtain the COVID-19 risk stage of that country.
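The two normalization cases above can be sketched as follows. Source countries are scaled by their own maximum, as stated. For the country to be evaluated, the text only says that the rule of its matched category is used; taking the mean of the category members' maxima as the reference peak, and clipping to 1.0, are assumptions made for this sketch, as are the function names.

```python
from typing import List, Sequence


def normalize_by_own_max(cases: Sequence[float]) -> List[float]:
    """Scale a source country's series into [0, 1] by its own maximum."""
    peak = max(cases)
    if peak == 0:
        return [0.0 for _ in cases]
    return [c / peak for c in cases]


def normalize_target(cases: Sequence[float],
                     category_peaks: Sequence[float]) -> List[float]:
    """Scale a target country's series using a category-level reference peak."""
    # Assumption: the category rule is the mean of its members' maxima.
    reference_peak = sum(category_peaks) / len(category_peaks)
    if reference_peak <= 0:
        raise ValueError("category reference peak must be positive")
    # Assumption: values above the reference peak are clipped to stay in [0, 1].
    return [min(c / reference_peak, 1.0) for c in cases]


if __name__ == "__main__":
    print(normalize_by_own_max([0, 5, 10]))
    print(normalize_target([50, 100, 300], [200, 200]))
```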

The system for implementing the COVID-19 risk stage assessment method based on transfer learning comprises a COVID-19 risk stage assessment criterion module, a decoder pre-training module, a national COVID-19 epidemic data classification module, a data analysis module, a country category matching module, a transfer learning assessment module and a post-normalization transfer learning assessment module, connected in sequence.

The advantages of the invention are: 1. The COVID-19 risk level is divided according to the COVID-19 cycle, which makes the abstract notion of COVID-19 risk concrete. 2. By analyzing the characteristics of the confirmed-case data of different epidemics, the data of other countries are fully utilized to

Drawings

FIG. 1 is an overall flow diagram of the method of the present invention.

FIG. 2(a) to FIG. 2(p) show the COVID-19 risk stages as defined in the example of the invention, wherein FIG. 2(a) is country A; FIG. 2(b) is country B; FIG. 2(c) is country C; FIG. 2(d) is country D; FIG. 2(e) is country E; FIG. 2(f) is country F; FIG. 2(g) is country G; FIG. 2(h) is country H; FIG. 2(i) is country I; FIG. 2(j) is country J; FIG. 2(k) is country K; FIG. 2(m) is country M; FIG. 2(n) is country N; FIG. 2(o) is country O; FIG. 2(p) is country P.

FIG. 3 is a diagram of the similarity between countries' COVID-19 data in the example of the invention.

FIG. 4 is a diagram of the selection of the classification criterion in the example of the invention.

FIG. 5 shows the confirmed cases of the epidemic after classification of the countries' COVID-19 data in the example of the invention.

FIG. 6 shows the mean number of confirmed cases per cycle after classification of the countries' COVID-19 data in the example of the invention.

FIG. 7 is an illustration of the instance-based transfer learning method after normalization of the present invention.

FIG. 8(a) to FIG. 8(i) show the actual assessment results for 9 countries in the example of the invention, wherein FIG. 8(a) is country A; FIG. 8(b) is country B; FIG. 8(c) is country E; FIG. 8(d) is country H; FIG. 8(e) is country J; FIG. 8(f) is country K; FIG. 8(g) is country Q; FIG. 8(h) is country M; FIG. 8(i) is country O.

Detailed description of the preferred embodiments

The invention is further described below with reference to an example in which the COVID-19 risk stage is assessed for 9 countries.

The overall flow of the COVID-19 risk stage assessment method in this example is shown in FIG. 1, and specifically includes the following steps:

(1) Proposing the COVID-19 risk stage:

The specific assessment criterion is defined as follows:

rdc represents the average number of infections over the last three days. A window that is too short is overly sensitive to daily fluctuations, while a window that is too long reduces the differences between countries during their early high-growth phase. A three-day window is reasonable and captures the latest growth trend. mdc represents the maximum daily confirmed count over the whole infection cycle; to reduce errors as far as possible, it is likewise taken as the average of the three highest daily confirmed counts over the entire COVID-19 infection cycle. case_n denotes the number of confirmed cases on a given day. The specific COVID-19 stage is determined as shown in Table 1. After several attempts, this classification criterion proved the most suitable for our present study. The labels for the whole COVID-19 period of all countries are shown in FIG. 2; the figure clearly shows that the distribution of the labels is consistent with the development of the COVID-19 cycle.

(2) Designing a decoder:

The input is passed through the LSTM and the last output is taken.

The final output is then obtained.

Y_lable is the risk stage assessment result that we need.

(3) Pre-training the decoder to obtain a standard feature space mapping method:

31. Data preprocessing;

32. Formal pre-training;

(4) Classifying the countries' COVID-19 data by degree of similarity using the decoder:

FIG. 3 shows the country similarity obtained with Austria as the pre-training country, and FIG. 4 shows the accuracy of the COVID-19 risk stage assessment after instance-based transfer learning for different similarity criteria; on this basis we select 80% as the criterion. The specific classification process is as follows: each time, we train the decoder with the epidemic data of one country as the benchmark and use the decoder to decode all countries, obtaining its degree of similarity to every country; the data whose similarity is greater than 80% are selected as one class. This is repeated until all data are classified. The classification results in this example are shown in Table 2.

TABLE 2

(5) Quantitative analysis of the characteristics of each category of data:

As can be seen from FIG. 5 and FIG. 6, the difference between categories of data is mainly reflected in the number of confirmed cases. In FIG. 5, countries in the same category show similar COVID-19 cyclical variation and similar distributions of maxima. In FIG. 6, the average number of confirmed cases is approximately the same for countries in the same category. We should therefore find a characteristic trend in each category of data, which in turn helps us classify data of unknown category. Through statistical analysis of the data we obtained these features, as shown in Table 2.

(6) Matching the country to be evaluated to its country category according to the data characteristics:

(7) Instance-based transfer learning assessment:

(8) Instance-based transfer learning assessment after normalization:

The assessment results for the 9 countries after steps (6), (7) and (8) are shown in FIG. 8. "True values" denotes the true COVID-19 risk stages, "i." denotes the estimates from instance-based transfer learning, and "n.i." denotes the estimates from instance-based transfer learning after normalization; a percentage in brackets, such as 31.25%, denotes the proportion of source data used, and so on. The results show that the assessment approach fits the development of the COVID-19 epidemic well and can be applied successfully in a country that has had no COVID-19 epidemic, which is of considerable reference value for planning government work. It also offers some guidance when facing an unknown epidemic.

The system for implementing the COVID-19 risk stage assessment method based on transfer learning comprises a COVID-19 risk stage assessment criterion module, a decoder pre-training module, a national COVID-19 epidemic data classification module, a data analysis module, a country category matching module, a transfer learning assessment module and a post-normalization transfer learning assessment module, connected in sequence;

wherein the COVID-19 risk stage assessment criterion module specifically comprises:

the specific assessment criterion is defined as follows:

definition: COVID-19 risk stage; a measure, recent_to_max (rtm), is introduced to describe the infection status of a country;

rdc represents the average number of infections over the last three days; a window that is too short is overly sensitive to daily fluctuations, while a window that is too long reduces the differences between countries during their early high-growth phase; a three-day window is reasonable and captures the latest growth trend; mdc represents the maximum daily confirmed count over the whole infection cycle, and to reduce errors as far as possible it is taken as the average of the three highest daily confirmed counts over the entire COVID-19 infection cycle; case_n denotes the number of confirmed cases on a given day; the specific COVID-19 stage is determined as shown in Table 1;

Range of rtm	COVID-19 risk stage
[0,0.2)	Low risk
[0.2,0.5)	Medium risk
[0.5,0.8)	High risk
[0.8,+∞)	Severe

TABLE 1

wherein the decoder specifically comprises:

an LSTM can capture the quantitative characteristics and trends of its input; the decoder consists of an LSTM followed by one fully connected layer, whose output is passed through an argmax function; the input is the confirmed COVID-19 case data for the previous 4 days together with the confirmed case data for the following day, and the output is the stage that COVID-19 is currently in; the specific formulas are as follows:

the input is passed through the LSTM and the last output is taken;

the final output is obtained;

Y_lable is the required risk stage assessment result;

the loss function is shown in the following formula; the first half minimizes the error between the true and predicted values; the second half, L_reg, is an L2 regularization term used to avoid overfitting, and λ is a hyper-parameter;

the decoder pre-training module specifically comprises a data preprocessing submodule and a formal pre-training submodule;

the data preprocessing submodule: countries with a recent_to_max of 0.1 or less and more than 3000 confirmed cases in total are selected from the 187 countries in the global COVID-19 confirmed-case dataset, these countries being considered to be at the end of, or past, a complete cycle; countries with obvious data errors are removed beforehand, and the data of the selected countries are labeled according to the rules above;

the formal pre-training submodule: training is performed on the confirmed COVID-19 data of one country at a time, and the stopping criterion is that the trained decoder decodes the training country with 100% accuracy;

the specific classification process of the national COVID-19 epidemic data classification module is as follows: each time, the epidemic data of one randomly chosen country are used as the benchmark to train a decoder; the decoder is used to decode all countries, giving the degree of similarity between the benchmark country and every country; the data whose similarity exceeds a given criterion are selected as one class; this is repeated until all data are classified;

wherein the data analysis module specifically comprises: the difference between categories of data is mainly reflected in the number of confirmed cases; countries in the same category show similar COVID-19 cyclical variation and similar distributions of maxima; the average number of confirmed cases is approximately the same for countries in the same category; a characteristic trend should therefore be found in each category of data, which in turn helps to classify data of unknown category; these features are obtained by statistical analysis of the data;

the country category matching module determines from these features the category to which the country to be evaluated belongs;

wherein the transfer learning assessment module comprises: data of the same category are transferred directly, which solves the problem of insufficient data; the data of the same category are transferred directly to train a decoder, and the decoder is then used to decode the current COVID-19 risk stage of the country to be evaluated;

the post-normalization transfer learning assessment module specifically comprises: for instance-based assessment after normalization, the data of every country are normalized by the maximum value of that country's own data to the range [0,1], that is, mapped into the same distribution space; these data are then used as source data to train a decoder; for the country to be evaluated, the maximum number of confirmed COVID-19 cases is not yet known, so its data are normalized according to the normalization rule of the country category to which it is matched by its features; the normalized data are fed into the decoder to obtain the COVID-19 risk stage of that country.

Claims

1. A COVID-19 risk stage assessment method based on transfer learning, comprising the following steps:

(1) proposing the COVID-19 risk stage, which specifically comprises:

at present, countries have no unified standard for assessing the risk stage of COVID-19; existing assessment standards are generally based on the number of confirmed cases; however, count-based risk stage assessment has several problems: it does not indicate the future development trend of the epidemic, and it cannot assess the risk stage from the viewpoint of a complete cycle of confirmed cases; the assessment criterion here is based on the complete cycle of confirmed cases and takes the future development trend of the epidemic into account;

the specific assessment criterion is defined as follows:

definition: COVID-19 risk stage; a measure, recent_to_max (rtm), is introduced to describe the infection status of a country;

rdc represents the average number of infections over the last three days; mdc represents the maximum daily confirmed count over the whole infection cycle, and to reduce errors as far as possible it is taken as the average of the three highest daily confirmed counts over the entire COVID-19 infection cycle; case_n denotes the number of confirmed cases on a given day; the specific COVID-19 stage is determined as shown in Table 1;

TABLE 1

(2) designing a decoder, which specifically comprises:

an LSTM can capture the quantitative characteristics and trends of its input; the decoder consists of an LSTM followed by one fully connected layer, whose output is passed through an argmax function;

the input is passed through the LSTM and the last output is taken;

the final output is obtained;

Y_lable is the required risk stage assessment result;

the loss function is shown in the following formula; the first half minimizes the error between the true and predicted values; the second half, L_reg, is an L2 regularization term used to avoid overfitting, and λ is a hyper-parameter;

(3) pre-training the decoder to obtain a standard feature space mapping method, which specifically comprises:

31. data preprocessing;

countries with a recent_to_max of 0.1 or less and more than 3000 confirmed cases in total are selected from the 187 countries in the global COVID-19 confirmed-case dataset, these countries being considered to be at the end of, or past, a complete cycle; countries with obvious data errors are removed beforehand, and the data of the selected countries are labeled according to the rules above;

32. formal pre-training;

training is performed on the confirmed COVID-19 data of one country at a time, and the stopping criterion is generally that the trained decoder decodes the training country with 100% accuracy;

(4) classifying the countries' COVID-19 epidemic data by degree of similarity using the decoder;

the specific classification process is as follows: each time, the COVID-19 epidemic data of one randomly chosen country are used as the benchmark to train a decoder; the decoder is used to decode all countries, giving the degree of similarity between the benchmark country and every country; the data whose similarity exceeds a given criterion are selected as one class; this is repeated until all data are classified;

(5) quantitatively analyzing the characteristics of each category of data;

the difference between categories of data is mainly reflected in the number of confirmed cases; countries in the same category show similar COVID-19 cyclical variation and similar distributions of maxima; the average number of confirmed cases is approximately the same for countries in the same category; a characteristic trend should therefore be found in each category of data in order to classify data of unknown category; these features are obtained by statistical analysis of the data;

(7) instance-based transfer learning assessment;

data of the same category can be transferred directly, which solves the problem of insufficient data; the data of the same category are transferred directly to train a decoder, and the decoder is then used to decode the current COVID-19 risk stage of the country to be evaluated;

(8) instance-based transfer learning assessment after normalization;

the data of every country are normalized by the maximum value of that country's own data to the range [0,1], that is, mapped into the same distribution space; these data are then used as source data to train a decoder; for the country to be evaluated, the maximum number of confirmed COVID-19 cases is not yet known, so its data are normalized according to the normalization rule of the country category to which it is matched by its features; the normalized data are fed into the decoder to obtain the COVID-19 risk stage of that country.

2. A system for implementing the COVID-19 risk stage assessment method based on transfer learning of claim 1, characterized in that: the system comprises a COVID-19 risk stage assessment criterion module, a decoder pre-training module, a national COVID-19 epidemic data classification module, a data analysis module, a country category matching module, a transfer learning assessment module and a post-normalization transfer learning assessment module, connected in sequence;

the specific assessment criterion is defined as follows:

rdc represents the average number of infections over the last three days; a window that is too short is overly sensitive to daily fluctuations, while a window that is too long reduces the differences between countries during their early high-growth phase; a three-day window is reasonable and captures the latest growth trend; mdc represents the maximum daily confirmed count over the whole infection cycle, and to reduce errors as far as possible it is taken as the average of the three highest daily confirmed counts over the entire COVID-19 infection cycle; case_n denotes the number of confirmed cases on a given day; the specific COVID-19 stage is determined as shown in Table 1;

TABLE 1

wherein the decoder specifically comprises:

the input is passed through the LSTM and the last output is taken;

the final output is obtained;

Y_lable is the required risk stage assessment result;

the data preprocessing submodule: countries with a recent_to_max of 0.1 or less and more than 3000 confirmed cases in total are selected from the 187 countries in the global COVID-19 confirmed-case dataset, these countries being considered to be at the end of, or past, a complete cycle; countries with obvious data errors are removed beforehand, and the data of the selected countries are labeled according to the rules above;

the post-normalization transfer learning assessment module specifically comprises: for instance-based assessment after normalization, the data of every country are normalized by the maximum value of that country's own data to the range [0,1], that is, mapped into the same distribution space; these data are then used as source data to train a decoder; for the country to be evaluated, the maximum number of confirmed COVID-19 cases is not yet known, so its data are normalized according to the normalization rule of the country category to which it is matched by its features; the normalized data are fed into the decoder to obtain the COVID-19 risk stage of that country.