CN112164471A - New crown epidemic situation comprehensive evaluation method based on classification regression model - Google Patents

Publication number
CN112164471A
Authority
CN
China
Prior art keywords
factor
factors
epidemic
influence
severity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011006901.3A
Other languages
Chinese (zh)
Other versions
CN112164471B (en
Inventor
刘晓夏
陈海鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University
Priority to CN202011006901.3A
Publication of CN112164471A
Application granted
Publication of CN112164471B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Abstract

The invention provides a COVID-19 epidemic comprehensive evaluation method based on a classification regression model, comprising the following steps: P1, acquiring feature information of the factors influencing the epidemic; P2, constructing a classifier to classify the influencing factors; P3, fitting a regression model between the influencing factors and the severity of the epidemic; and P4, calculating the factor values and weight coefficients at each level and the epidemic risk coefficient, and analyzing the important factors influencing the epidemic. The invention combines artificial intelligence and big data analysis, dynamically builds a machine-learning classification regression model, and refines the evaluation of how influencing factors at each level affect epidemic risk through repeated function fitting and adjustment by a floating module. It can identify the main and key aspects that affect the epidemic in the target area, provide effective warning and guidance for prevention and control, and, by optimizing macroscopic regulation and refining microscopic prevention and control, benefit people's lives, the economy and related aspects of the target area, greatly improving the effect of epidemic prevention and control work.

Description

COVID-19 epidemic comprehensive evaluation method based on classification regression model
Technical Field
The invention relates to the field of artificial intelligence and data analysis, and in particular to a COVID-19 epidemic comprehensive evaluation method based on a classification regression model.
Background Art
On January 12, 2020, the World Health Organization officially named the virus 2019-nCoV. People are generally susceptible to the 2019 novel coronavirus, which can damage multiple systems throughout the body of an infected patient and affects middle-aged and elderly people with a history of chronic disease more severely. The epidemic has become a public health security challenge shared by the whole world.
COVID-19 (novel coronavirus pneumonia) struck fiercely, spread at an astonishing speed and is difficult to control, causing an unprecedented impact on the global economy and on people's normal lives. Its impact is estimated to far exceed that of SARS in 2003 and the global financial crisis in 2008. Available data so far show that preventing and controlling the epidemic and treating infected patients are costly, which places a heavy economic burden on the nation and on individuals.
The Guangzhou geographic research institute has proposed an epidemic risk level assessment method based on population density, which comprises: acquiring the total number of newly confirmed cases, the population flow, the population density and enterprise data in the assessment area, and inputting them into an assessment model to produce an epidemic risk assessment level for the area.
At present, few evaluation models for the COVID-19 epidemic have been published, and existing evaluation methods usually focus on a single influencing factor. However, the severity of an epidemic results from the combined action of many factors, so a result obtained by considering only one or a few factors inevitably carries a large error. Such an evaluation can hardly provide effective warning and guidance for prevention and control work, so the work has little effect and may even allow the epidemic to rebound or worsen, causing economic loss and other adverse effects.
With the development of artificial intelligence and big data in recent years, computer technology has come into wide use in every field. Machine learning is an interdisciplinary field that draws on mathematical statistics, computer modeling and other disciplines. Its main concern is how to simulate the human learning process with a computer. The computer performs classification, statistics and data mining by learning from existing knowledge, and extracts the desired objects or data from existing information. With the rapid development of artificial intelligence, big data, cloud computing and parallel processing, humanity's capacity to process information has greatly improved: computers can process and analyze massive amounts of data and complete tasks that the human brain cannot.
Supervised learning is a common machine learning approach and is divided into regression and classification: regression is used for continuous variables and classification for discrete variables. A regression model fits a function to a data set, while a classification model predicts the class of the data. Typical real-life applications include spam detection, Amazon's personalized recommendations, and the "guess what you want to watch" feature of video websites.
Regression problems in supervised learning are mainly divided into linear regression and logistic regression. For a regression problem, the expected result is a function model. A training set is given first; a mathematical function is then learned from it, that is, a straight line or curve is fitted to the points of the training set; the function is then tested to see whether it fits the training data well enough, that is, whether the cost function is currently at its minimum; finally, the function model that best matches the training set is obtained. Logistic regression, although called a regression algorithm, is actually a classification method.
Classification in supervised learning is also a common real-life problem, such as determining whether a tumor is benign or malignant, or whether an e-mail is spam. In a typical application scenario, part of a given data set has known classes and the rest has unknown classes; a classifier is trained on the feature information of the known part so that, once trained, it can assign each record of unknown class to the corresponding class. Common classification algorithms include the naive Bayes model, K-nearest neighbors (KNN) and the support vector machine (SVM). The naive Bayes model is one of the simplest classifiers. It is a probabilistic classifier, and its Gaussian variant assumes that the features of each class follow a normal distribution. For a given training set, the KNN algorithm first stores all samples; to label a new sample, it examines the K nearest neighbors around the sample and assigns the class that occurs most frequently among them.
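The KNN rule described above can be sketched in a few lines. This is a minimal illustration, not part of the claimed method: the function name `knn_predict`, the toy samples and the class labels are all hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` with the most frequent class among its k nearest training samples.

    train: list of (feature_vector, label) pairs; KNN simply stores them (no fitting step).
    """
    # Sort stored samples by Euclidean distance to the query (assumed metric).
    by_distance = sorted(train, key=lambda sample: math.dist(sample[0], query))
    # Majority vote among the k closest samples.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups of 2-D feature vectors.
samples = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

print(knn_predict(samples, (1.1, 1.0)))
print(knn_predict(samples, (5.1, 5.0)))
```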
The SVM is a classification method based on kernel functions. Its principle is to find the hyperplane that correctly separates the training data set and has the largest geometric margin, so it is also called a large-margin classifier. The feature vectors are mapped into a high-dimensional space by a kernel function, and an optimal hyperplane separating the training data is then sought in that space; the optimal hyperplane maximizes the distance between the separating surface and the feature vectors closest to it. The feature vectors closest to the separating surface are called support vectors. The complexity of an SVM model is determined by the number of support vectors, and the trained model depends entirely on them, so overfitting is less likely to occur. The SVM is mainly used for binary classification in machine learning, but with suitable extensions it can also be applied to multi-class problems.
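To make the notions of geometric margin and support vector concrete, the following sketch evaluates a given separating hyperplane on a toy data set. It covers only the linear case (no kernel mapping) and does not train anything; the hyperplane `w`, `b`, the sample points and both function names are hypothetical.

```python
import math

def geometric_margins(w, b, samples):
    """Return y * (w . x + b) / ||w|| for each (x, y) sample, with y in {-1, +1}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x, y in samples]

def support_vectors(w, b, samples, tol=1e-9):
    """Return the samples lying closest to the hyperplane (smallest geometric margin)."""
    margins = geometric_margins(w, b, samples)
    smallest = min(margins)
    return [s for s, m in zip(samples, margins) if abs(m - smallest) <= tol]

# Hypothetical hyperplane x1 + x2 - 3 = 0 and four labelled points.
w, b = (1.0, 1.0), -3.0
data = [((1.0, 1.0), -1), ((0.0, 0.0), -1), ((2.0, 2.0), 1), ((3.0, 3.0), 1)]

print(geometric_margins(w, b, data))
print(support_vectors(w, b, data))
```

All margins are positive here, which means the hyperplane separates the two classes; the points with the smallest margin are the ones that would constrain a maximum-margin solution.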
At present, the factors that may affect the severity of an epidemic include: the total number of confirmed cases; the number of newly confirmed cases; the stock of medical supplies (protective clothing, protective masks, disinfection equipment, ventilators, etc.); the age composition of confirmed cases; the number of doctors of each type (by department); the number of nursing staff; population flow; the number of volunteers participating in epidemic prevention and control; the stock of medicines; the number of newly cured cases; the number of new deaths; the number of hospitals at each level; the number of available beds in hospitals at each level; the ratio of confirmed cases to medical staff at each level; the severity of illness of confirmed cases (mild or severe); the occupations of confirmed cases; the movement tracks of confirmed cases; the distance from high-incidence areas; the local population density; the local distribution of confirmed cases; climate conditions (temperature, humidity, etc.); and the number of infected medical workers. These raw, unintegrated factors are called primary factors.
Among the primary factors, some consist of a single data item and are called primary single factors, for example the total number of confirmed cases in the target country/region, the number of newly confirmed cases, the age composition of confirmed cases and the local population density. Taking the total number of confirmed cases in the target country/region as an example, the factor contains only the number of confirmed cases, which makes up the entire data set of the factor, so its proportion is 100%. Other factors consist of several data sets, each carrying a different proportion within the factor; such a factor is called a primary composite factor. Examples include the stock of medical supplies (protective clothing, protective masks, disinfection equipment, ventilators, etc.), the number of doctors of each type (by department), the occupations of confirmed cases and the severity of illness of confirmed cases (mild or severe).
Take the number of doctors of each type (by department) as an example. The infectious disease and respiratory departments are the busiest and stand at the front line of the epidemic, handling the diagnosis and treatment of febrile patients, so the number of doctors in these two departments has the greatest influence on the epidemic. Doctors in the clinical laboratory must carry out tests and examinations on the patients seen, so they too are assigned a share, albeit a smaller one. Consequently, within this factor, the numbers of doctors in the infectious disease, respiratory and clinical laboratory departments carry larger proportions than the numbers in departments such as stomatology and urology. In short, the proportion of each data set within a primary composite factor reflects its importance, that is, its degree of influence on the epidemic.
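One way to read the proportions of a primary composite factor is as normalized weights over its sub-data. The sketch below assumes a simple weighted sum, which the text does not prescribe; the function name `composite_value`, the department counts and the proportions are hypothetical.

```python
def composite_value(parts):
    """Combine the sub-data of a composite factor by their proportions.

    parts: name -> (value, proportion). Proportions are normalized so they sum to 1.
    A single factor is the special case of one part with proportion 1.0 (100%).
    """
    total = sum(proportion for _, proportion in parts.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive number")
    return sum(value * proportion / total for value, proportion in parts.values())

# Hypothetical numbers of doctors per department and assumed proportions.
doctors = {
    "infectious_disease": (40, 0.35),
    "respiratory": (50, 0.35),
    "clinical_laboratory": (30, 0.20),
    "stomatology": (20, 0.05),
    "urology": (20, 0.05),
}

print(composite_value(doctors))
print(composite_value({"total_confirmed_cases": (120, 1.0)}))
```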
All primary factors are classified and integrated to obtain secondary factors, so every secondary factor is a composite factor. The secondary factors are: medical level, epidemic prevention and control strength, geographic factor influence, and basic characteristics of the target area. The medical level comprises primary factors such as the stock of medical supplies (protective clothing, protective masks, disinfection equipment, ventilators, etc.), the number of doctors of each type, the number of nursing staff, the stock of medicines, the number of newly cured cases, the number of new deaths, the number of hospitals at each level, the number of available beds in hospitals at each level, and the ratio of confirmed cases to medical staff at each level. Epidemic prevention and control strength comprises primary factors such as the number of newly confirmed cases, the number of volunteers participating in epidemic prevention and control, the occupations of confirmed cases, the movement tracks of confirmed cases, and the number of infected medical workers. Geographic factor influence comprises primary factors such as the distance between the target area and the outbreak site, the climate conditions (temperature, humidity, etc.) of the target area, the local distribution of confirmed cases, and population flow. The basic characteristics of the target area comprise primary factors such as the total number of confirmed cases in the target country/region, the age composition of confirmed cases, the occupations of confirmed cases, and population density.
The final result of the method, namely the epidemic risk coefficient of the target area, is composed of the four secondary factors: medical level, epidemic prevention and control strength, geographic factor influence, and basic characteristics of the target area.
Secondary factors are processed in the same way as primary composite factors: each primary factor within a secondary factor is assigned a proportion according to its degree of influence on the epidemic. For example, within the secondary factor "epidemic prevention and control strength", the number of newly confirmed cases is the major factor and plays a key role in the epidemic severity given by the analysis model, whereas the number of volunteers participating in prevention and control and the other factors are minor factors, so the major factor is assigned a higher proportion than the minor ones. Assigning different weight coefficients to the influencing factors in this way makes the resulting epidemic risk coefficient more objective, more accurate and closer to the actual situation. Determining the current severity of the epidemic in the target country/region from its factors and taking corresponding measures is the key to epidemic prevention and control. In addition, how effectively the development trend of the epidemic and the allocation of resources are determined through computer simulation and data analysis also has a major bearing on the epidemic.
The influencing factors are classified as positive or negative according to their nature. Positive factors are those that help prevent and control the epidemic, such as the stock of medical supplies (protective clothing, protective masks, disinfection equipment, ventilators, etc.), the number of doctors of each type (by department) and the number of newly cured cases. Negative factors are those that worsen the epidemic, such as the total number of confirmed cases and the number of new deaths in the target country/region. The polarity of some factors can change with the actual situation. For example, regarding the occupations of confirmed cases: if the occupation involves frequent contact with people, such as postal worker, restaurant waiter, food delivery worker or doctor, the factor tends to be negative; if the occupation involves little human contact, the factor tends to be positive. Likewise, if the target area is far enough from the area where the outbreak is concentrated, the distance factor tends to be positive; if it is close to the outbreak area, the factor tends to be negative.
Trade, population flow trends, transportation and similar links between countries/regions can also greatly affect the severity of the epidemic. For example, if control measures in a particular region are lax, a large number of cases may flow out and spread the virus over a wide area, which poses a great challenge to epidemic prevention and control in surrounding countries/regions and may even affect their economic development and the normal lives of their residents. If the density of confirmed cases in a region is high and the virus adheres to certain goods, it can be carried into other regions through trade and transport, infecting local residents and seriously affecting epidemic prevention and control.
The severity of the epidemic in a country/region results from the combined action of several factors, each of which affects the epidemic in the target country/region: an increase in newly confirmed cases aggravates the epidemic, while an increase in medical staff alleviates it. The interaction between factors must also be considered in the evaluation. For example, when medical resources are very plentiful, the number of confirmed cases has very little influence on the severity of the epidemic. Therefore, if the number of medical staff in the target country/region is much larger than the number of confirmed cases, the weight of the number of confirmed cases and of its secondary factor (basic characteristics of the target area) should be reduced in the evaluation, and the weight of the number of medical staff and of its secondary factor (medical level) should be increased. This reflects the fact that interactions among influencing factors also affect the overall epidemic.
To bring the evaluation method of the invention closer to the actual situation, weight coefficients must be assigned to the different factors and adjusted according to the relationships among them; the invention therefore provides a floating module. The floating module adjusts the weight coefficient of each influencing factor. The adjustment may float the weight upward or downward, and the floating range is determined by the value of the factor and by the influence of other factors on it. The epidemic risk coefficient finally obtained is therefore objective and accurate.
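The text does not give the floating module's formula, so the following is only a sketch under stated assumptions: weights are scaled up or down by a capped relative shift and then renormalized, and one hypothetical rule lowers the weight of confirmed cases and raises the weight of medical staff when staff far outnumber cases. The names `float_weights`, `staff_case_adjustments`, `max_shift` and `threshold`, and all numeric values, are illustrative.

```python
def float_weights(weights, adjustments, max_shift=0.5):
    """Float each weight up or down by a relative shift, then renormalize to sum to 1.

    weights: name -> non-negative weight.
    adjustments: name -> relative shift; positive floats up, negative floats down.
    max_shift: assumed cap on the relative change (hypothetical parameter).
    """
    floated = {}
    for name, weight in weights.items():
        shift = max(-max_shift, min(max_shift, adjustments.get(name, 0.0)))
        floated[name] = weight * (1.0 + shift)
    total = sum(floated.values())
    return {name: weight / total for name, weight in floated.items()}

def staff_case_adjustments(medical_staff, confirmed_cases, threshold=10.0):
    """Hypothetical interaction rule: shift weight from cases to staff when staff >> cases."""
    if confirmed_cases > 0 and medical_staff / confirmed_cases < threshold:
        return {}  # no interaction effect, weights stay as they are
    return {"confirmed_cases": -0.3, "medical_staff": 0.3}

# Hypothetical initial weights of three factors.
weights = {"confirmed_cases": 0.5, "medical_staff": 0.3, "population_density": 0.2}
adjusted = float_weights(weights, staff_case_adjustments(5000, 100))
print(adjusted)
```

Renormalizing after the shift keeps the weights comparable across rounds; whether the patented module does this is not stated in the text.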
With so many influencing factors, both positive and negative, modeling by computer is required. First, the feature information of each influencing factor is extracted. For a primary single factor, this comprises: the data or data set of the factor; the type of the corresponding secondary factor; the initial proportion of the factor within that secondary factor; the epidemic severity when the factor value equals 0; the epidemic severity when the factor value tends to positive infinity or reaches its maximum; whether the trend of epidemic severity changes as the factor value increases continuously from 0; and whether the factor is positive, negative, or requires further judgment. For a primary composite factor, it comprises: each sub-data item or sub-data set of the factor; the type of the corresponding secondary factor; the initial proportion of the factor within that secondary factor; the epidemic severity when the factor value equals 0; the epidemic severity when the factor value tends to positive infinity or reaches its maximum; whether the trend of epidemic severity changes as the factor value increases continuously from 0; and whether the factor is positive, negative, or requires further judgment. In addition, each part of a composite factor is assigned an initial proportion.
The feature information of all influencing factors is collected into a feature matrix and input into the classifier for classification. All influencing factors are divided into three classes according to the functional relationship between the size/number of each factor and the epidemic severity. Next, for each influencing factor, the functional form of its class is taken as the regression model, regression is performed on the factor's data or data set, and a mathematical function model relating the factor to epidemic severity is fitted; this function yields the epidemic severity corresponding to the factor value and the factor's proportion within its secondary factor, i.e. its weight coefficient. The weight coefficient is then input into the floating module for a first round of adjustment, and the primary factor is judged to be positive or negative so that the sign of its weight coefficient can be set. The value of each secondary factor is obtained from the epidemic severities and proportions of all the primary factors it contains. All secondary factors are then input into the floating module for a second round of weight adjustment, and each secondary factor is judged to be positive or negative so that the sign of its weight coefficient can be set.
Finally, the epidemic risk coefficient of the target area is calculated and the factors with the greatest influence on the target area are identified, providing accurate and comprehensive warning and guidance for government prevention and control work, helping the national economy recover from the damage caused by the epidemic and improving people's lives.
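The final step can be illustrated as a signed weighted sum over the four secondary factors. This is an assumed reading, not the stated formula: here negative factors add to the risk and positive factors subtract from it, and the most influential factor is the one with the largest absolute contribution. The function names and all values of h, w2 and polarity are hypothetical.

```python
def risk_coefficient(secondary):
    """Combine secondary factors into a risk coefficient.

    secondary: name -> (h, w2, is_positive), where h is the secondary factor value,
    w2 its weight, and is_positive marks a factor that helps control the epidemic.
    Assumed sign convention: positive factors lower the risk, negative factors raise it.
    """
    contributions = {}
    for name, (h, w2, is_positive) in secondary.items():
        sign = -1.0 if is_positive else 1.0
        contributions[name] = sign * w2 * h
    return sum(contributions.values()), contributions

def most_influential(contributions):
    """Return the factor whose contribution has the largest magnitude."""
    return max(contributions, key=lambda name: abs(contributions[name]))

# Hypothetical values for the four secondary factors named in the text.
secondary = {
    "medical_level": (0.6, 0.3, True),
    "prevention_and_control_strength": (0.7, 0.4, False),
    "geographic_factor_influence": (0.5, 0.1, False),
    "target_area_characteristics": (0.4, 0.2, False),
}

risk, contributions = risk_coefficient(secondary)
print(risk)
print(most_influential(contributions))
```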
Disclosure of Invention
The invention aims to provide a COVID-19 epidemic comprehensive evaluation method based on a classification regression model, which can comprehensively and accurately evaluate the epidemic risk of a target country/region, obtain its epidemic risk coefficient, identify the factor with the greatest influence on its epidemic, and provide warning and guidance for epidemic prevention and control work.
To solve the problems in the related art, the invention provides a COVID-19 epidemic comprehensive evaluation method based on a classification regression model, comprising the following steps:
Part_1: acquiring feature information of the factors influencing the epidemic;
Part_2: creating a classifier and classifying the influencing factors;
Part_3: fitting a regression model between the influencing factors and the severity of the epidemic;
Part_4: calculating the values and weight coefficients of the factors at each level, calculating the epidemic risk coefficient, and selecting the factor with the greatest influence on the epidemic.
The COVID-19 epidemic comprehensive evaluation method based on a classification regression model uses the following data structures:
The data structure of the influencing factor (primary factor) feature data set FeatureSet, which is input into the classification model, is defined as follows:
Data Item 1, Item_1: name of the influencing factor
Data Item 2, Item_2: type of secondary factor to which the influencing factor belongs
Data Item 3, Item_3: initial proportion within the secondary factor
Data Item 4, Item_4: epidemic severity when the value of the influencing factor equals 0
Data Item 5, Item_5: epidemic severity when the value of the influencing factor tends to positive infinity or reaches its maximum
Data Item 6, Item_6: whether the trend of epidemic severity changes as the value of the influencing factor increases continuously from 0
Data Item 7, Item_7: positive/negative factor, or further discrimination needed
Data Item 8, Item_8: the parts of the influencing factor and their initial proportions, represented as arrays
Data Item 9, Item_9: type of functional relationship between the influencing factor and the severity of the epidemic; needs to be labeled.
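The nine items of FeatureSet map naturally onto a record type. The sketch below is one possible representation; the class names `FeatureRecord` and `Polarity`, the field names, the numeric encoding in `feature_vector` and the sample values are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional

class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNDETERMINED = "undetermined"  # further discrimination needed

@dataclass
class FeatureRecord:
    name: str                   # Item_1
    secondary_factor: str       # Item_2
    initial_proportion: float   # Item_3
    severity_at_zero: float     # Item_4
    severity_at_max: float      # Item_5
    trend_changes: bool         # Item_6
    polarity: Polarity          # Item_7
    parts: Dict[str, float] = field(default_factory=dict)  # Item_8: part -> initial proportion
    function_type: Optional[int] = None                    # Item_9: None until labeled

    def is_composite(self) -> bool:
        """A composite factor has more than one part."""
        return len(self.parts) > 1

    def feature_vector(self) -> List[float]:
        """Assumed numeric encoding of Items 3 to 6 for input to a classifier."""
        return [
            self.initial_proportion,
            self.severity_at_zero,
            self.severity_at_max,
            1.0 if self.trend_changes else 0.0,
        ]

# Hypothetical record for the medical supplies factor.
record = FeatureRecord(
    name="medical_supplies",
    secondary_factor="medical_level",
    initial_proportion=0.2,
    severity_at_zero=1.0,
    severity_at_max=0.1,
    trend_changes=False,
    polarity=Polarity.POSITIVE,
    parts={"protective_clothing": 0.4, "protective_masks": 0.3, "ventilators": 0.3},
)
print(record.is_composite(), record.feature_vector())
```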
The data structure of the regression data set RegressionSet of the primary factors, which is input into the regression model, is defined as follows:
Data Item 1, Item_1: name of the influencing factor
Data Item 2, Item_2: data or data set of the influencing factor
Data Item 3, Item_3: epidemic severity corresponding to the values in the data or data set of the influencing factor
Data Item 4, Item_4: relevant standard value of the influencing factor
Data Item 5, Item_5: type of functional relationship between the influencing factor and the severity of the epidemic.
The data structure of the primary factor result data set Result1 is defined as follows:
Data Item 1, Item_1: name of the influencing factor
Data Item 2, Item_2: epidemic severity p corresponding to the value of the influencing factor
Data Item 3, Item_3: weight w1 of the influencing factor among the factors influencing the epidemic.
The data structure of the secondary factor data set Data2Set is defined as follows:
Data Item 1, Item_1: name of the secondary factor
Data Item 2, Item_2: positive/negative factor
Data Item 3, Item_3: number of primary factors constituting the secondary factor
Data Item 4, Item_4: proportion w2 of the secondary factor in the analysis model
Data Item 5, Item_5: secondary factor value h calculated from the primary factors.
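Result1 and Data2Set can be linked by computing h from the p and w1 values of the constituent primary factors. The sketch below assumes h is the weighted sum of p by w1, which the text implies but does not state as a formula; the helper `build_secondary` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Result1:
    name: str   # Item_1: name of the influencing factor
    p: float    # Item_2: epidemic severity for the factor value
    w1: float   # Item_3: weight of the factor

@dataclass
class Data2Set:
    name: str            # Item_1: name of the secondary factor
    is_positive: bool    # Item_2: positive/negative factor
    primary_count: int   # Item_3: number of constituent primary factors
    w2: float            # Item_4: proportion in the analysis model
    h: float             # Item_5: value calculated from the primary factors

def build_secondary(name: str, is_positive: bool, w2: float, primaries: List[Result1]) -> Data2Set:
    """Assumed aggregation: h is the sum of w1 * p over the primary factors."""
    h = sum(r.w1 * r.p for r in primaries)
    return Data2Set(name, is_positive, len(primaries), w2, h)

# Hypothetical primary results for "epidemic prevention and control strength".
primaries = [
    Result1("newly_confirmed_cases", p=0.8, w1=0.6),
    Result1("volunteers", p=0.3, w1=0.4),
]
secondary_factor = build_secondary("prevention_and_control_strength", False, 0.4, primaries)
print(secondary_factor)
```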
The new crown epidemic situation comprehensive evaluation method based on a classification regression model uses the following functions:
The classifier SVM is defined as follows:
Specific process: the influencing factors are classified into multiple classes using the support vector machine (SVM) classification algorithm from machine learning. The SVM multi-class classifier is constructed by combining several binary classifiers. During training, the samples of one class are treated as one class and all remaining samples as the other class. During classification, an unknown sample is assigned to the class with the largest classification function value, so that the multi-class problem is solved with SVMs.
Initial data input: the feature data sets of the manually labeled influencing factors, which are used to train the classifier; and the feature data sets of the remaining, unlabeled influencing factors, which are to be classified by the classifier.
Output and result: all factors in the evaluation model are labeled. Influencing factors whose functional relationship with the epidemic severity is of the same type are regarded as one class.
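The one-versus-rest construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each binary classifier is a linear SVM trained by subgradient descent on the regularized hinge loss, the hyperparameters are arbitrary, and the three training samples (R1, R2, R3) with labels 1 to 3 are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(samples, targets, epochs=200, lr=0.1, reg=0.01):
    """Linear SVM via subgradient descent on the regularized hinge loss.
    targets are +1 (the chosen class) or -1 (all remaining classes)."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            inside_margin = t * (dot(w, x) + b) < 1
            w = [wi * (1 - lr * reg) for wi in w]  # L2 shrinkage
            if inside_margin:                      # hinge subgradient step
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def train_one_vs_rest(samples, labels):
    """Train one binary classifier per class: that class versus the rest."""
    return {
        c: train_binary_svm(samples, [1 if l == c else -1 for l in labels])
        for c in sorted(set(labels))
    }


def classify(models, x):
    """Assign x to the class whose classification function value is largest."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


# Hypothetical labeled features (R1, R2, R3) with relationship labels 1..3.
labeled = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
labels = [1, 2, 3]
models = train_one_vs_rest(labeled, labels)
print(classify(models, (1.0, 0.0, 0.0)))
```

In practice a library SVM with more training samples would replace the hand-written trainer; only the combination rule (one classifier per class, largest decision value wins) is taken from the text.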
The regression model Regression is defined as follows:
Specific process: the three function models in the invention are all in polynomial form. Because the highest degree of each polynomial is fixed, a single-variable linear regression algorithm is used, and the coefficient vector that minimizes the loss function is found by gradient descent, which yields the best-fitting function model. The independent variable is the influencing factor and the dependent variable is the epidemic severity. The initial data input is a set of ordered pairs of the independent and dependent variables, and the output is the fitted function model.
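As an illustration of the fitting step, the sketch below fits a fixed-degree polynomial by batch gradient descent on the mean squared error. It is a sketch under assumptions: the learning rate and step count are arbitrary, the inputs are assumed to be scaled to roughly [0, 1] so that the descent is stable, and the sample points are generated from a hypothetical quadratic rather than taken from real data.

```python
def fit_polynomial(points, degree, lr=0.1, steps=20000):
    """Fit y = c[0] + c[1]*x + ... + c[degree]*x**degree to (x, y) points
    by batch gradient descent on the mean squared error."""
    coeffs = [0.0] * (degree + 1)
    n = len(points)
    for _ in range(steps):
        grad = [0.0] * (degree + 1)
        for x, y in points:
            err = sum(c * x ** k for k, c in enumerate(coeffs)) - y
            for k in range(degree + 1):
                grad[k] += 2.0 * err * x ** k / n
        coeffs = [c - lr * g for c, g in zip(coeffs, grad)]
    return coeffs


# Hypothetical ordered pairs (influencing factor value, epidemic severity)
# generated from y = -x^2 + x + 0.5.
points = [(x, -x * x + x + 0.5) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
c0, c1, c2 = fit_polynomial(points, degree=2)
print(round(c2, 3), round(c1, 3), round(c0, 3))
```

Here c2, c1 and c0 play the roles of the quadratic, linear and constant coefficients of the fitted model.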
The new crown epidemic situation comprehensive evaluation method based on a classification regression model uses the following process.
The system process Task specifically includes:
Task{
Task_1: extract the feature information of the primary factors to form the feature data set FeatureSet,
Task_2: manually label some of the primary factors,
Task_3: train the classifier with the feature matrix LabeledFeatures of the manually labeled primary factors,
Task_4: divide all primary factors into three classes with the trained classifier,
Task_5: input the regression data set RegressionSet of each influencing factor into the regression model Regression and fit a function model,
Task_6: input the data or data set of primary factor i into the regression function model of its class to obtain p_i and w1_i,
Task_7: input the primary factors into the floating module in turn, and adjust the weight coefficients by comparing the magnitude or ratio of the w1 of each primary factor with the w1 of the other primary factors,
Task_8: calculate the secondary factor value h = w1_1*p_1 + w1_2*p_2 + ... + w1_n*p_n,
Task_9: input the secondary factors into the floating module in turn, and adjust the weight coefficients by comparing the magnitude of the w2 of each secondary factor with the w2 of the other secondary factors,
Task_10: calculate the epidemic risk coefficient nCoV = w2_1*h_1 + w2_2*h_2 + w2_3*h_3 + w2_4*h_4, and obtain the factor with the greatest influence on the epidemic,
Task_i: execution instructions and processes reserved for the user,
}
Task_1 mainly collects the feature information of the influencing factors in preparation for the subsequent classification.
Task_2, Task_3 and Task_4 mainly determine, for every influencing factor input into the evaluation model, the type of its functional relationship with the epidemic severity. The three types are a quadratic relationship with a negative quadratic coefficient, a positive correlation and a negative correlation.
Task_5 is the process in which the regression algorithm fits the function models. The three types of functional relationship between the influencing factors and the epidemic severity correspond to three function models: a quadratic function with a negative quadratic coefficient, a cubic function and an inverse proportional function. The coefficients of the function model for each influencing factor are obtained from its data set and its relationship type, which yields an accurate fitted function.
Task_6 and Task_7 calculate the epidemic severity p corresponding to the value of each primary factor and the weight w1 of that primary factor within its secondary factor. The floating module then decreases or increases the weight of each primary factor within its secondary factor, that is, its influence on and importance for the epidemic, so that the weight assigned to each primary factor better matches the actual situation.
Task_8 and Task_9 calculate the value h of each secondary factor, that is, the epidemic severity corresponding to that secondary factor, from the epidemic severity p of its primary factors and their weights w1. The floating module then decreases or increases the weight of each secondary factor, so that the analysis model is adjusted to the actual situation and the final result is accurate.
Task_10 produces the final result, the epidemic risk coefficient. By comparing the w2 values and the w1 values respectively, the factor with the greatest influence on the epidemic in the target area can also be identified, which provides warning and guidance for epidemic prevention and control in the target area.
Task_i is an execution instruction and process reserved by the system for the user, to meet the user's need to extend the functionality.
A new crown epidemic situation comprehensive evaluation method based on a classification regression model, characterized by comprising the following steps:
Part_1, extracting the feature information of the primary factors to form the feature data set FeatureSet, specifically:
All primary factors are grouped into four secondary factors: medical level, epidemic prevention and control strength, geographic factor influence, and basic characteristics of the target area. If the secondary factor of primary factor i is the medical level, type_i in the feature data set FeatureSet_i is 1; if it is the epidemic prevention and control strength, type_i is 2; if it is the geographic factor influence, type_i is 3; if it is the basic characteristics of the target area, type_i is 4. The data set Data2Set_i of secondary factor i therefore contains the secondary factor name Name_i, the positive/negative factor flag PS_i, the number Number_i of primary factors constituting the secondary factor, the weight w2_i of the secondary factor in the analysis model, and the secondary factor value h_i calculated from the primary factors. Formalized as follows:
Data2Set_i={Name_i,PS_i,Number_i,w2_i,h_i}.
The influencing factors Fact_index to be input into the evaluation model for analysis are selected (Fact_1, Fact_2, ..., Fact_i, ..., Fact_n-1, ...), for example: the age of confirmed cases, the number of confirmed cases, the number of medical staff, the quantity of various medical supplies, the number of hospitals at all levels, and so on. The more influencing factors are input into the evaluation model, the more accurate the resulting epidemic risk coefficient. Analyzing influencing factor i yields its feature data set FeatureSet_i, which contains: the influencing factor name Name_i; the type type_i of the secondary factor it belongs to; the initial weight w0_i assigned to it within that secondary factor; the epidemic severity R1_i when the influencing factor value equals 0; the epidemic severity R2_i when the influencing factor value tends to positive infinity or reaches its maximum; the flag R3_i indicating whether the trend of the epidemic severity changes as the influencing factor value increases continuously from 0; the flag PS_i indicating whether the factor is a positive factor, a negative factor, or a factor that needs further determination; the sub-parts of the influencing factor and their initial proportions, represented by the array proportion[m], where m is the number of sub-parts; and the type label_i of the functional relationship between the influencing factor and the epidemic severity.
Before manual labeling and classification, the relationship type label_i has no value, so it can be initialized to 0. A quadratic relationship with a negative quadratic coefficient corresponds to the value 1, a positive correlation to 2, and a negative correlation to 3. FeatureSet_i denotes the feature data set of influencing factor i; the feature data sets of all influencing factors are stored in a matrix to form the feature matrix AllFeatures. Formalized as follows:
FeatureSet_i={Name_i,type_i,w0_i,R1_i,R2_i,R3_i,PS_i,proportion[m]_i,label_i}
AllFeatures_i={FeatureSet_1,FeatureSet_2,...,FeatureSet_i,...}
This completes the description of Part_1.
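A feature data set FeatureSet_i as formalized above could be represented like this. The sketch is illustrative only: the lowercase field names are assumptions, and the sample values are those given for the medical material reserve factor in Example 1 of this document, with label left at its initial value 0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureSet:
    name: str    # Name_i
    type: int    # type_i: 1..4, the secondary factor the factor belongs to
    w0: float    # initial weight within the secondary factor
    r1: float    # epidemic severity when the factor value is 0
    r2: float    # epidemic severity at positive infinity / maximum
    r3: int      # 1 if the severity trend changes as the value grows, else 0
    ps: int      # 0 = needs further determination, 1 = positive, 2 = negative
    proportion: List[float] = field(default_factory=list)  # sub-part shares
    label: int = 0  # relationship type, 0 until labeled


def classifier_features(fs: FeatureSet) -> List[float]:
    """The three features the classifier is described as using."""
    return [fs.r1, fs.r2, float(fs.r3)]


supplies = FeatureSet(
    name="medical material reserves", type=1, w0=0.2,
    r1=1.0, r2=0.0, r3=0, ps=0,
    proportion=[0.3, 0.3, 0.3, 0.1],
)
all_features = [supplies]  # AllFeatures: one entry per influencing factor
print(classifier_features(supplies))
```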
A new crown epidemic situation comprehensive evaluation method based on a classification regression model, characterized by comprising the following steps:
Part_2, creating a classifier and classifying the influencing factors, specifically:
The regression data set RegressionSet_i of an influencing factor, which is input into the regression model, contains the influencing factor name Name_i, the influencing factor data or data set Data_i, the value or array of values Degree_i of the epidemic severity corresponding to that data, the standard value Standard_i related to the influencing factor, and the type label_i of the functional relationship between the influencing factor and the epidemic severity. A subset of the influencing factors is selected, initialized and manually labeled. If the functional relationship between influencing factor i and the epidemic severity is a quadratic relationship with a negative quadratic coefficient, label_i in FeatureSet_i and label_i in RegressionSet_i are both set to 1; if it is a positive correlation, both are set to 2; if it is a negative correlation, both are set to 3. The manually labeled influencing factors are stored in the matrix LabeledFeatures, and the remaining unlabeled influencing factors are stored in the matrix UnLabeledFeatures. The classifier is trained on the manually labeled influencing factors, that is, the feature matrix LabeledFeatures is input into the classifier SVM for learning.
The feature matrix UnLabeledFeatures is then input into the trained classifier. The classifier classifies each influencing factor in UnLabeledFeatures according to three features in FeatureSet_i: the epidemic severity R1_i when the influencing factor value equals 0, the epidemic severity R2_i when the influencing factor value tends to positive infinity, and the flag R3_i indicating whether the trend of the epidemic severity changes as the influencing factor value increases continuously from 0. If the functional relationship between influencing factor i and the epidemic severity is a quadratic relationship with a negative quadratic coefficient, the classifier SVM sets label_i in both FeatureSet_i and RegressionSet_i to 1; if it is a positive correlation, the classifier sets both to 2; if it is a negative correlation, the classifier sets both to 3. Among all influencing factors input into the evaluation model, those with the same label_i are regarded as one class, so that all influencing factors are divided into three classes according to their feature attributes. Formalized as follows:
RegressionSet={Name_i,Data_i,Degree_i,Standard_i,label_i}
LabeledFeatures={Feature_m,Feature_n,...,Feature_p,...}
UnLabeledFeatures={Feature_q,Feature_k,...,Feature_t,...}
This completes the description of Part_2.
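The bookkeeping in Part_2 (splitting the factors by whether label is still 0, then writing each predicted label into both FeatureSet_i and RegressionSet_i) can be sketched as follows. The dictionary layout is an assumption, and stand_in_classifier is a hypothetical hand-written rule used only so the example runs; it is not the trained SVM.

```python
def split_by_label(all_features):
    """Return (LabeledFeatures, UnLabeledFeatures); label 0 means unlabeled."""
    labeled = [f for f in all_features if f["label"] != 0]
    unlabeled = [f for f in all_features if f["label"] == 0]
    return labeled, unlabeled


def label_unlabeled(unlabeled, regression_sets, classify):
    """Write the predicted label into FeatureSet_i and RegressionSet_i."""
    for f in unlabeled:
        label = classify((f["R1"], f["R2"], f["R3"]))
        f["label"] = label
        regression_sets[f["Name"]]["label"] = label


def stand_in_classifier(features):
    """Hypothetical rule standing in for the trained classifier."""
    r1, r2, r3 = features
    if r3 == 1:
        return 1                # trend changes: quadratic relationship
    return 2 if r2 > r1 else 3  # rising: positive, falling: negative


# Hypothetical feature data sets; only the fields used here are shown.
all_features = [
    {"Name": "medical material reserves", "R1": 1.0, "R2": 0.0, "R3": 0, "label": 3},
    {"Name": "total confirmed cases", "R1": 0.0, "R2": 1.0, "R3": 0, "label": 0},
]
regression_sets = {f["Name"]: {"label": f["label"]} for f in all_features}

labeled, unlabeled = split_by_label(all_features)
label_unlabeled(unlabeled, regression_sets, stand_in_classifier)
```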
A new crown epidemic situation comprehensive evaluation method based on a classification regression model, characterized by comprising the following steps:
Part_3, fitting a regression model between each influencing factor and the epidemic severity, specifically:
After Part_2 is completed, all influencing factors input into the evaluation model have been divided into three classes. Each class corresponds to one functional form between the influencing factor and the epidemic severity: a quadratic function with a negative quadratic coefficient, a cubic function, or an inverse proportional function.
If label_i in the regression data set RegressionSet_i of influencing factor i is 1, the functional relationship between influencing factor i and the epidemic severity is a quadratic function with a negative quadratic coefficient. The influencing factor value array Data in RegressionSet_i and the corresponding epidemic severity value array Degree are combined into ordered pairs and stored in the array Test0_i, that is, Test0_i = {(Data[0], Degree[0]), (Data[1], Degree[1]), ..., (Data[n], Degree[n])}. The ordered-pair data set Test0_i of influencing factor i is input into the regression model Regression, where the ordered pairs act as reference points on the function graph. From these points the regression algorithm computes the coefficients a_i, b_i and c_i of the quadratic function for influencing factor i, and thereby fits the complete function function1 = a_i*x^2 + b_i*x + c_i.
If label_i in the regression data set RegressionSet_i of influencing factor i is 2, the functional relationship between influencing factor i and the epidemic severity is a cubic function. The standard value Standard_i related to the influencing factor in RegressionSet_i and the epidemic severity value Degree_i are combined into an ordered pair; in this case the value of Degree_i is usually 0.5. The ordered pairs (0, R1) and (1, R2) are also formed, where R1 and R2 are the values R1_i and R2_i in the feature data set FeatureSet_i of influencing factor i. These ordered pairs are stored in the array Test1_i, that is, Test1_i = {(Standard_i, Degree_i), (0, R1), (1, R2)}. The ordered-pair data set Test1_i of influencing factor i is input into the regression model Regression, where the ordered pairs act as reference points on the function graph. From these points the regression algorithm computes the coefficients a_i and b_i of the cubic function for influencing factor i, and thereby fits the complete function function2 = a_i*x^3 + b_i.
If label_i in the regression data set RegressionSet_i of influencing factor i is 3, the functional relationship between influencing factor i and the epidemic severity is an inverse proportional function. The standard value Standard_i related to the influencing factor in RegressionSet_i and the epidemic severity value Degree_i are combined into an ordered pair; in this case the value of Degree_i is usually 0.5. The ordered pair is input into the regression model Regression, and from this reference point the regression algorithm computes the coefficient a_i of the inverse proportional function for influencing factor i, and thereby fits the complete function function3 = a_i/x.
The three regression models are formalized as:
function1=a_i*x^2+b_i*x+c_i,
function2=a_i*x^3+b_i,
function3=a_i/x
This completes the description of Part_3.
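For function2 and function3 the number of reference points is so small that the coefficients can be written in closed form. The sketch below uses ordinary least squares in place of the gradient descent described in the text, which is a substitution made here for brevity; the sample points are hypothetical.

```python
def fit_function3(standard, degree):
    """function3 = a/x through the single point (Standard_i, Degree_i)."""
    return degree * standard  # a


def fit_function2(points):
    """Least-squares fit of function2 = a*x^3 + b to (x, y) points,
    i.e. simple linear regression of y on u = x^3."""
    us = [x ** 3 for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_u, mean_y = sum(us) / n, sum(ys) / n
    var_u = sum((u - mean_u) ** 2 for u in us)
    a = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys)) / var_u
    b = mean_y - a * mean_u
    return a, b


# Hypothetical Test1_i = {(Standard_i, Degree_i), (0, R1), (1, R2)}.
a2, b2 = fit_function2([(0.5, 0.125), (0.0, 0.0), (1.0, 1.0)])

# Hypothetical Standard_i = 2.0 with the usual Degree_i = 0.5.
a3 = fit_function3(standard=2.0, degree=0.5)
print(a2, b2, a3)
```

With three points and two coefficients, function2 is in general fitted approximately rather than interpolated exactly; the sample points here were chosen to lie on a cubic.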
A new crown epidemic situation comprehensive evaluation method based on a classification regression model, characterized by comprising the following steps:
Part_4, calculating the value and weight coefficient of the factors at each level, calculating the epidemic risk coefficient, and selecting the factor with the greatest influence on the epidemic, specifically:
First, the epidemic severity p corresponding to the value of each primary factor and the weight w1 of that primary factor within its secondary factor are obtained.
If label_i in the regression data set RegressionSet_i of influencing factor i is 1, the function model between influencing factor i and the epidemic severity is function1 = a_i*x^2 + b_i*x + c_i. The weighted average Average_i of the influencing factor values Data in RegressionSet_i is computed and substituted into function1. The resulting value function1(Average_i) is p_i in the result data set Result_i of influencing factor i. w1_i in Result_i is computed as w1_i = (p_i - 0.5)*(1/n) + w0_i, where w0_i is the initial weight of the influencing factor within its secondary factor, as set in the feature data set FeatureSet_i, and n is the number of primary factors contained in the corresponding secondary factor, that is, the Number value in the Data2Set of that secondary factor.
If label_i in the regression data set RegressionSet_i of influencing factor i is 2, the function model between influencing factor i and the epidemic severity is function2 = a_i*x^3 + b_i. The influencing factor value Data in RegressionSet_i is substituted into function2, and the resulting value function2(Data) is p_i in the result data set Result_i of influencing factor i. w1_i in Result_i is again computed as w1_i = (p_i - 0.5)*(1/n) + w0_i, with w0_i and n defined as above.
If label_i in the regression data set RegressionSet_i of influencing factor i is 3, the function model between influencing factor i and the epidemic severity is function3 = a_i/x. The influencing factor value Data in RegressionSet_i is substituted into function3, and the resulting value function3(Data) is p_i in the result data set Result_i of influencing factor i. w1_i in Result_i is again computed as w1_i = (p_i - 0.5)*(1/n) + w0_i, with w0_i and n defined as above.
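The three cases above differ only in which function model is evaluated; the weight formula is the same. A minimal sketch, assuming the coefficients have already been fitted and, for label 1, that x is the already computed weighted average Average_i (the text does not specify the averaging weights). The numeric values are hypothetical.

```python
def severity(label, coeffs, x):
    """Epidemic severity p_i from the function model selected by label_i."""
    if label == 1:
        a, b, c = coeffs
        return a * x ** 2 + b * x + c  # function1
    if label == 2:
        a, b = coeffs
        return a * x ** 3 + b          # function2
    if label == 3:
        (a,) = coeffs
        return a / x                   # function3
    raise ValueError("label must be 1, 2 or 3")


def initial_weight(p, w0, n):
    """w1_i = (p_i - 0.5) * (1/n) + w0_i."""
    return (p - 0.5) * (1.0 / n) + w0


# Hypothetical factor with an inverse proportional model a_i = 1.8,
# current value 2.0, initial weight 0.2, in a secondary factor of 4 factors.
p_i = severity(3, (1.8,), 2.0)
w1_i = initial_weight(p_i, w0=0.2, n=4)
print(p_i, w1_i)
```

A severity of exactly 0.5 leaves the initial weight unchanged; higher severities raise the weight and lower severities reduce it, in steps scaled by 1/n.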
The processed primary factors are then input into the floating module in turn to adjust the weight coefficients. w1_i of the input primary factor i is compared with w1_j of another primary factor j; according to the magnitude or ratio of the two, w1_i is increased (or decreased) and w1_j is correspondingly decreased (or increased), that is, w1_i = w1_i +/- Δw and w1_j = w1_j -/+ Δw. w1_i of primary factor i is compared and adjusted in this way against the w1 of every other primary factor. This completes the first round of weight coefficient adjustment.
The weight coefficients of all primary factors belonging to the same secondary factor are then scaled (normalized) so that they sum to 1, that is, w1_i = w1_i/(w1_i + w1_(i+1) + ... + w1_m); after all weight coefficients have been scaled, w1_i + w1_(i+1) + ... + w1_m = 1. Whether each factor is a positive or a negative factor is determined from the attributes of the influencing factor and its epidemic severity p. For a positive factor the weight coefficient is recorded as a negative number, that is, w1 = -w1; for a negative factor the weight coefficient w1 is unchanged. The epidemic severity p_i of each primary factor i contained in a secondary factor is then multiplied by its adjusted weight coefficient w1_i to give the partial product m_i = p_i*w1_i, and all partial products are summed to give the value h_j of secondary factor j, that is, h_j = m_1 + m_2 + ... + m_n, where n is the number of primary factors contained in the secondary factor, that is, the Number value in its Data2Set.
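The normalization, sign convention and summation just described can be sketched as one function. The tuple layout and the sample numbers are assumptions for illustration; the floating-module adjustment is taken to have been applied already.

```python
def secondary_value(primaries):
    """Compute h_j for one secondary factor.

    primaries: list of (p, w1, is_positive) tuples, one per primary factor,
    where w1 is the weight after the floating-module adjustment.
    """
    total = sum(w1 for _, w1, _ in primaries)
    h = 0.0
    for p, w1, is_positive in primaries:
        w = w1 / total  # scale so the weights sum to 1
        if is_positive:
            w = -w      # positive factors count against the severity
        h += p * w      # partial product m_i = p_i * w1_i
    return h


# Hypothetical secondary factor with one negative and one positive factor.
h_j = secondary_value([(0.8, 0.3, False), (0.4, 0.1, True)])
print(h_j)
```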
The weight coefficients w2 of the four secondary factors are adjusted in a second round, using the same method as the first round. After the adjustment, whether each secondary factor is a positive or a negative factor is determined from its value h and its attributes. For a positive factor the weight coefficient is recorded as a negative number, that is, w2 = -w2; for a negative factor the weight coefficient w2 is unchanged.
Finally, the epidemic risk coefficient of the target area is calculated as nCoV = w2_1*h_1 + w2_2*h_2 + w2_3*h_3 + w2_4*h_4. The weight coefficients w2 of all secondary factors are compared, and the secondary factor with the largest w2 is identified as the aspect with the greatest influence on the epidemic severity in the target country/region. Likewise, within each secondary factor, the primary factor with the largest weight coefficient w1 is the factor with the greatest influence on that aspect. The method thus identifies the aspect with the greatest influence on the local epidemic, that is, whichever of medical level, epidemic prevention and control strength, geographic factor influence and basic characteristics of the target area has the largest weight coefficient. This provides more accurate warning and guidance information for epidemic prevention and control and helps to carry out and optimize this work in the target area. The method also identifies the primary factor with the largest weight in each secondary factor, which provides more detailed information and supports more fine-grained prevention and control measures. This completes the description of Part_4.
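The final aggregation can be sketched as follows. The dictionary layout, the shortened factor names and the numeric values are hypothetical; the selection rule (largest w2) follows the text.

```python
def risk_coefficient(secondaries):
    """Return (nCoV, name of the secondary factor with the largest w2).

    secondaries: dict mapping secondary factor name -> (w2, h).
    """
    ncov = sum(w2 * h for w2, h in secondaries.values())
    dominant = max(secondaries, key=lambda name: secondaries[name][0])
    return ncov, dominant


# Hypothetical adjusted weights w2 and values h for the four secondary factors.
secondaries = {
    "medical level": (0.4, 0.5),
    "prevention and control strength": (0.3, 0.2),
    "geographic factor influence": (0.2, 0.1),
    "basic characteristics": (0.1, 0.6),
}
ncov, dominant = risk_coefficient(secondaries)
print(ncov, dominant)
```

The same max-by-weight selection, applied to the w1 values inside one secondary factor, gives the primary factor with the greatest influence on that aspect.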
The new crown epidemic situation comprehensive evaluation method based on a classification regression model disclosed by the invention has the following beneficial effects:
(1) The epidemic risk is analyzed by integrating multiple factors.
Most common evaluation models are based on a single factor, whereas an epidemic is the result of many factors acting together. An epidemic risk obtained by analyzing a single influencing factor is therefore not accurate enough. The new crown epidemic situation comprehensive evaluation method based on a classification regression model can jointly analyze all factors that may influence the epidemic and obtain a comprehensive and accurate epidemic risk evaluation result. The evaluation method can also screen out the factors with the greatest influence on the epidemic in the target country/region, which serves as a warning and reminder for epidemic prevention work there and helps that work to be carried out better.
(2) Machine learning and the model characteristics of the influencing factors are applied, which makes it convenient to process massive data.
Very many factors influence the epidemic in the target country/region and the data sets are very large, so manual analysis would be very inefficient. The method therefore adopts machine learning algorithms: a trained classifier carries out the heavy and complex work in place of humans, accurately and efficiently, which makes the method more accurate and efficient and achieves the expected effect.
(3) The aspect with the greatest influence on the epidemic in the target country/region, and the factor with the greatest influence within each aspect, can be identified.
Most existing evaluation models only produce an epidemic risk level for the target country/region. Such a result reflects the effect of epidemic prevention work and serves to some extent as a warning and reminder, but it gives no guidance for improving and optimizing that work. The new crown epidemic situation comprehensive evaluation method based on a classification regression model can determine, among all factors input into the evaluation model, the macroscopic aspect with the greatest influence on the epidemic in the target country/region, that is, the weakest key link in its epidemic prevention work. This provides macroscopic guidance for carrying out and optimizing epidemic prevention and control, and makes it easier for the state and the government to grasp the key points of the epidemic. The invention can also identify the influencing factor with the largest weight within each macroscopic aspect, which provides more detailed and accurate warning and guidance and greatly improves the effect of epidemic prevention and control.
Drawings
FIG. 1 is a schematic diagram of the main process of the new crown epidemic situation comprehensive evaluation method based on a classification regression model.
Detailed Description
The following detailed description of the embodiments of the present invention is provided in connection with the accompanying drawings and examples, which are provided for illustration of the present invention and are not intended to limit the scope of the present invention.
Example 1: classification of the influencing factors by level and category, and their initialization.
The influencing factors selected for input into the evaluation model are: the total number of confirmed cases, the number of newly confirmed cases, the reserves of various medical materials (such as protective clothing, protective masks, disinfection equipment, ventilators, etc.), the age composition of confirmed cases, the number of doctors of different types (different departments), the number of nursing staff, the volume of population movement, the number of volunteers participating in epidemic prevention and control, the reserves of medicines, the number of cured newly confirmed cases, the number of deaths among newly confirmed cases, the number of hospitals at all levels, the number of available beds in hospitals at all levels, the ratio of confirmed cases to medical staff in hospitals at all levels, the severity of the condition of confirmed cases (mild and severe), the occupation of confirmed cases, the movement tracks of confirmed cases, the distance from high-incidence areas, the local population density, the local distribution of confirmed cases, the climate conditions of the target country/region (temperature, humidity, etc.), and the number of infected medical personnel. The primary composite factors are: the reserves of various medical materials (such as protective clothing, protective masks, disinfection equipment, ventilators, etc.), the severity of the condition of confirmed cases (mild and severe), the reserves of medicines, the climate conditions of the target area (temperature, humidity, etc.), and the number of doctors of different types (different departments).
Among all the primary factors, the secondary factor medical level consists of primary factors such as the reserves of various medical materials (such as protective clothing, protective masks, disinfection equipment, ventilators, etc.), the number of doctors of different types (different departments), the number of nursing staff, the reserves of medicines, the number of cured newly confirmed cases, the number of deaths among newly confirmed cases, the number of hospitals at all levels, the number of available beds in hospitals at all levels, and the ratio of confirmed cases to medical staff in hospitals at all levels; the type value in the feature data sets of these primary factors is therefore 1. The epidemic prevention and control strength consists of primary factors such as the number of newly confirmed cases, the number of volunteers participating in epidemic prevention and control, the occupation of confirmed cases, the movement tracks of confirmed cases, and the number of infected medical personnel; the type value in the feature data sets of these primary factors is therefore 2. The geographic factor influence consists of primary factors such as the distance between the target area and high-incidence areas, the climate conditions of the target area (temperature, humidity, etc.), the local distribution of confirmed cases, and population movement; the type value in the feature data sets of these primary factors is therefore 3. The basic characteristics of the target area consist of primary factors such as the total number of confirmed cases in the target country/region, the age of confirmed cases, the occupation of confirmed cases, and the population density; the type value in the feature data sets of these primary factors is therefore 4.
Take as an example the reserves of various medical materials (such as protective clothing, protective masks, disinfection equipment, ventilators, etc.), a primary composite factor belonging to the medical level. The type value in its feature data set is 1. According to the degree to which this factor influences the epidemic, its initial weight w0 within the secondary factor is set to 0.2. When the influencing factor value equals 0, the corresponding epidemic severity is 100%, that is, R1 = 1. When the influencing factor value tends to positive infinity or reaches its maximum, the corresponding epidemic severity is 0, that is, R2 = 0. The trend of the epidemic severity does not change as the influencing factor value increases continuously from 0, that is, R3 = 0 (1 indicates that a change exists, 0 that it does not). Whether this influencing factor is a positive or a negative factor needs further determination: when medical material reserves are ample, the factor benefits epidemic prevention and control and is a positive factor; when medical material reserves are seriously insufficient, the factor harms epidemic prevention and control and is a negative factor. Therefore PS = 0 (0 indicates that the attribute needs further determination, 1 indicates a positive factor, and 2 indicates a negative factor). Protective clothing, protective masks and ventilators are vital medical supplies, so the initial proportion of each of these three parts is set to 0.3, that is, proportion[0] = proportion[1] = proportion[2] = 0.3. Disinfection equipment is an auxiliary supply compared with the other three, so its initial proportion is set to proportion[3] = 0.1.
If this factor is selected as an influencing factor to be manually labeled, the functional relationship between it and the epidemic severity is an inverse proportional relationship, so the label value in its feature data set FeatureSet is 3. The format and content of the feature data set FeatureSet for the reserves of various medical materials, a primary composite factor of the medical level, are therefore as follows:
FeatureSet = {"medical material stock condition", 1, 0.2, 1, 0, 0, 0, {0.3, 0.3, 0.3, 0.1}, 3}.
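The record above could be held in a small typed structure. The following is only an illustrative sketch: the Python field names (type_, w0, r1, r2, r3, ps, proportion, label) are assumed renderings of the data items described in this example and are not defined by the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureSet:
    """Feature record of one first-level influence factor (field names are assumptions)."""
    name: str    # name of the influence factor
    type_: int   # secondary factor: 1 medical level, 2 prevention and control,
                 # 3 geographic influence, 4 target region basic features
    w0: float    # initial specific gravity within the secondary factor
    r1: float    # epidemic severity when the factor value equals 0
    r2: float    # epidemic severity when the factor value is at +infinity / maximum
    r3: int      # 1 if the trend of severity changes as the value grows, else 0
    ps: int      # 0 needs further judgment, 1 positive factor, 2 negative factor
    proportion: List[float] = field(default_factory=list)  # weights of the components
    label: int = 0  # 0 unlabeled, 1 quadratic, 2 cubic, 3 inverse proportion


# The medical supplies example, with the values given in the text.
medical_supplies = FeatureSet(
    "medical material stock condition", 1, 0.2, 1, 0, 0, 0,
    [0.3, 0.3, 0.3, 0.1], 3,
)
```

Keeping label at a default of 0 mirrors the convention that factors without manual marking carry no relationship type yet.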
Example 2: the classification process in the evaluation model.
The influence factors input into the evaluation model are: the total number of confirmed cases, the number of newly confirmed cases, the reserves of various medical supplies (such as protective clothing, protective masks, disinfection equipment and ventilators), the age composition of confirmed cases, the number of doctors of different types (different departments), the number of nursing staff, the volume of population movement, the number of volunteers involved in epidemic prevention and control, the reserves of medicines, the number of cured cases, the number of deaths, the number of hospitals at each level, the number of available beds in hospitals at each level, the ratio of confirmed cases to medical staff in hospitals at each level, the severity of illness of confirmed cases (mild or severe), the occupations of confirmed cases, the movement tracks of confirmed cases, the distance from high-incidence places, the local population density, the local distribution of confirmed cases, the climate conditions (temperature, humidity, etc.) of the target region, and the number of infected medical personnel.
First, some of the influence factors are selected and classified manually in order to train the classifier. For example, for the influence factor "ratio of confirmed cases to medical staff in hospitals at each level": when the value of the factor equals 0, the corresponding severity of the epidemic is 0, so R1 in the feature data set FeatureSet is 0; when the value is positive infinity or at its maximum, the corresponding severity is 100%, so R2 is 1; the trend of change of the severity does not change as the value increases from 0, so R3 is 0 (1 means yes, 0 means no). By manual judgment, the functional relationship between this factor and the severity of the epidemic is a cubic relationship, i.e. the corresponding label value in FeatureSet is 2.
For the influence factor "age composition of confirmed cases": when the value of the factor equals 0, the corresponding number of patients, i.e. the severity of the epidemic, is the ratio of the number of confirmed cases of that age to the total number of confirmed cases in the region, recorded as rate1, so R1 = rate1 in the feature data set FeatureSet; when the value is positive infinity or at its maximum, the corresponding severity is likewise the ratio of the number of confirmed cases of that age to the total number of confirmed cases in the region, recorded as rate2, so R2 = rate2; the trend of change of the severity changes as the value increases from 0, i.e. the graph of the function has a turning point, so R3 is 1 (1 means yes, 0 means no). By manual judgment, the functional relationship between this factor and the severity of the epidemic is a quadratic relationship with a negative quadratic coefficient, i.e. the corresponding label value in FeatureSet is 1.
For the influence factor of the distance from the disease high-incidence place, when the value of the influence factor is equal to 0, the severity of the epidemic is 100%, so that the corresponding R1 in the feature data set FeatureSet is 1; when the influence factor value is in a positive infinity or a maximum value state, the severity of the epidemic is 0, so that the corresponding R2 in the feature data set FeatureSet is 0; the trend of the severity of the epidemic does not change when the influence factor value is increased from 0, so that the corresponding R3 in the feature data set FeatureSet is 0(1 means yes, and 0 means no). Through manual judgment, the type of the functional relationship between the influence factor and the severity of the epidemic situation belongs to an inverse proportion functional relationship, namely the corresponding label value in the feature data set FeatureSet is 3. And simultaneously assigning label in the regression data set corresponding to the influence factors, wherein the value of the label in the regression data set is equal to the value of the corresponding label in the feature data set FeatureSet.
The feature data sets of the manually labeled influence factors are input into the classifier to train it, and the trained classifier is then used to classify the remaining influence factors that were not manually labeled, which yields the label values in the regression data sets of all factors.
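The labeling-then-classifying step can be sketched as follows. This is an illustrative stand-in rather than the classifier of the invention: it trains one linear scorer per class with plain hinge-loss subgradient steps (no regularisation, no kernel) instead of a full SVM solver, uses only (R1, R2, R3) as features, and the value 0.2 / 0.1 chosen for rate1 / rate2 is hypothetical.

```python
def train_binary(samples, targets, lr=0.1, max_epochs=10000):
    """Train one linear scorer with hinge-loss subgradient steps (simplified SVM stand-in)."""
    w = [0.0] * (len(samples[0]) + 1)  # last entry is the bias term
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(samples, targets):
            xb = list(x) + [1.0]
            score = sum(wi * xi for wi, xi in zip(w, xb))
            if y * score < 1.0:  # sample lies inside the margin: step towards it
                w = [wi + lr * y * xi for wi, xi in zip(w, xb)]
                updated = True
        if not updated:  # every sample is outside the margin
            break
    return w


def train_one_vs_rest(samples, labels):
    """One scorer per class: that class is +1, all other classes are -1."""
    return {
        c: train_binary(samples, [1 if l == c else -1 for l in labels])
        for c in sorted(set(labels))
    }


def classify(models, x):
    """Assign the class whose scorer returns the largest function value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wi * xi for wi, xi in zip(models[c], xb)))


# (R1, R2, R3) of the three manually labeled factors from this example.
labeled = [(0.0, 1.0, 0), (0.2, 0.1, 1), (1.0, 0.0, 0)]
labels = [2, 1, 3]  # cubic, quadratic, inverse proportion
models = train_one_vs_rest(labeled, labels)

# An unlabeled factor would then be classified from its own (R1, R2, R3).
predicted = classify(models, (0.9, 0.05, 0))
```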
Example 3: the regression process in the evaluation model.
The influence factor "age of confirmed cases" is taken as an example: the label in the regression data set RegressionSet of influence factor i is 1. The age data set of confirmed cases is analyzed statistically, i.e. the number Number_i of confirmed cases at each age Age_i is counted, forming a series of ordered pairs (Age_i, Number_i). These ordered pairs are input into the regression model Regression as standard points, and the regression algorithm calculates from them the coefficients a_i, b_i and c_i of the quadratic function for influence factor i, thereby fitting the complete function function1 = a_i*x^2 + b_i*x + c_i, which is the mathematical expression of the relationship between the age of confirmed cases and the number of confirmed cases of that age, i.e. the severity of the epidemic.
The influence factor "ratio of confirmed cases to the total population of the target country/region" is taken as an example: the label in the regression data set RegressionSet of influence factor i is 2. With reference to the relevant regulations and authoritative documents of the target country/region, several epidemic diseases that are relatively common locally are selected, the proportion of cases of each to the total local population is obtained, and their average value AverageSick is calculated. When the ratio of confirmed cases to the total population of the target country/region is 0, i.e. nobody is ill, the corresponding severity of the epidemic is set to 0, giving the ordered pair (0, 0); when the ratio is 1, i.e. assuming all residents of the target country/region are confirmed cases, the corresponding severity is set to 1, giving the ordered pair (1, 1); when the ratio equals AverageSick, the corresponding severity is set to 0.5, giving the ordered pair (AverageSick, 0.5). These ordered pairs are input into the regression model Regression as standard points, i.e. as points on the graph of the function. The regression algorithm calculates from them the coefficients a_i and b_i of the cubic function for influence factor i, thereby fitting the complete function function2 = a_i*x^3 + b_i, which is the mathematical expression of the relationship between this influence factor and the severity of the epidemic.
The influence factor "ratio of the number of medical staff to the number of confirmed cases" is taken as an example: the label in the regression data set RegressionSet of influence factor i is 3. The relevant regulations and authoritative documents of the target country/region are consulted to obtain the standard ratio Standard of medical staff to cases specified by the target country/region. When the ratio of the number of medical staff to the number of confirmed cases is 0, i.e. medical staff are extremely scarce, the corresponding severity of the epidemic is set to positive infinity; when the ratio is positive infinity, i.e. medical staff are sufficient, the corresponding severity is approximately 0; when the ratio equals Standard, the corresponding severity is set to 0.5, giving the ordered pair (Standard, 0.5). The ordered pairs obtained are input into the regression model Regression as standard points, and the regression algorithm calculates from them the coefficient a_i of the inverse proportion function for influence factor i, thereby fitting the complete function function3 = a_i/x, which is the mathematical expression of the relationship between the ratio of the number of medical staff to the number of confirmed cases and the severity of the epidemic.
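As a rough illustration of the cubic fit, the sketch below fits a*x^3 + b through the three standard points by treating z = x^3 as the single regressor. It uses the closed-form least-squares solution rather than an iterative regression algorithm, and the AverageSick value of 0.1 is purely hypothetical.

```python
def fit_cubic(points):
    """Least-squares fit of y = a * x**3 + b (ordinary linear regression on z = x**3)."""
    zs = [x ** 3 for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    covariance = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    variance = sum((z - z_mean) ** 2 for z in zs)
    a = covariance / variance
    b = y_mean - a * z_mean
    return a, b


average_sick = 0.1  # hypothetical average share of the population with common epidemic diseases
standard_points = [(0.0, 0.0), (average_sick, 0.5), (1.0, 1.0)]
a_i, b_i = fit_cubic(standard_points)


def function2(x):
    """Fitted severity for a given ratio of confirmed cases to total population."""
    return a_i * x ** 3 + b_i
```

With only two free coefficients the curve cannot pass exactly through all three standard points, so the fit is a compromise between them.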
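For the inverse proportion case, a single finite standard point already determines the coefficient. A minimal sketch, assuming a least-squares fit of y = a/x and a hypothetical Standard value of 0.25 (one medical worker per four confirmed cases):

```python
def fit_inverse(points):
    """Least-squares fit of y = a / x over points with x > 0."""
    numerator = sum(y / x for x, y in points)
    denominator = sum(1.0 / x ** 2 for x, _ in points)
    return numerator / denominator


standard = 0.25  # hypothetical standard ratio of medical staff to confirmed cases
a_i = fit_inverse([(standard, 0.5)])


def function3(x):
    """Fitted severity for a given ratio of medical staff to confirmed cases."""
    return a_i / x
```

The two limiting cases in the text (severity tending to infinity at a ratio of 0 and to 0 at an infinite ratio) are properties of the a/x form itself, so they need no extra standard points.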
Example 4: the working process of the floating module in the evaluation model.
The following influence factors are selected to simulate the working process of the floating module in the evaluation model: the total number of confirmed cases in the target country/region, the age composition of confirmed cases, the ratio of confirmed cases to the total population of the target country/region, the local population density, the ratio of the number of medical staff to the number of confirmed cases, the climate conditions (temperature, humidity, etc.) of the target region, and the number of infected medical personnel.
The relationship between the total number of confirmed cases in the target country/region and the severity of the epidemic can be represented by a cubic function, so the label in the regression data set RegressionSet is 2 and the corresponding regression function is function2 = a_1*x^3 + b_1. Substituting the total number of confirmed cases Number1 of the target country/region into function2 gives function2(Number1), which is p_1 in the result data set Result of this influence factor. w1_1 in Result is calculated as (p_1 - 0.5)*(1/n) + w0, where w0 is the initial weight of the influence factor within its secondary factor as set in the feature data set FeatureSet, and n is the number of primary factors contained in the corresponding secondary factor, i.e. the Number value of that secondary factor.
The relationship between the age composition of confirmed cases and the severity of the epidemic can be represented by a quadratic function with a negative quadratic coefficient, so the label in RegressionSet is 1 and the corresponding regression function is function1 = a_2*x^2 + b_2*x + c_2, where a_2 < 0. The weighted average AverageAge of the age data set of confirmed cases is obtained and substituted into function1, and the resulting value function1(AverageAge) is p_2 in the result data set Result of this influence factor. w1_2 in Result is calculated as (p_2 - 0.5)*(1/n) + w0, with w0 and n defined as above.
The relationship between the ratio of the number of medical staff to the number of confirmed cases and the severity of the epidemic can be represented by an inverse proportion function, so the label in RegressionSet is 3 and the corresponding regression function is function3 = a_3/x. The ratio rate1 of the number of medical staff to the number of confirmed cases is substituted into function3, and the resulting function3(rate1) is p_3 in the result data set Result of this influence factor. w1_3 in Result is calculated as (p_3 - 0.5)*(1/n) + w0, with w0 and n defined as above.
The relationship between the ratio of confirmed cases to the total population of the target country/region and the severity of the epidemic can be represented by a cubic function, so the label in RegressionSet is 2 and the corresponding regression function is function2 = a_4*x^3 + b_4. The ratio rate2 of confirmed cases to the total population of the target country/region is substituted into function2, and the resulting function2(rate2) is p_4 in the result data set Result of this influence factor. w1_4 in Result is calculated as (p_4 - 0.5)*(1/n) + w0, with w0 and n defined as above.
The relationship between the local population density and the severity of the epidemic can be represented by an inverse proportion function, so the label in RegressionSet is 3 and the corresponding regression function is function3 = a_5/x. The local population density number2 is substituted into function3, and the resulting function3(number2) is p_5 in the result data set Result of this influence factor. w1_5 in Result is calculated as (p_5 - 0.5)*(1/n) + w0, with w0 and n defined as above. The climate conditions (temperature, humidity, etc.) of the target region and the number of infected medical personnel both belong to the case label = 2, and p_6 and p_7 are obtained for them in the same way.
All the factors are input into the floating module in turn and the weight coefficients w1 are adjusted. If w1_1, corresponding to the total number of confirmed cases in the target country/region, is far greater than w1_3, corresponding to the ratio of the number of medical staff to the number of confirmed cases, then w1_1 is increased, i.e. w1_1 = w1_1 + Δw1, and w1_3 is decreased, i.e. w1_3 = w1_3 - Δw1. If w1_5, corresponding to the local population density, is far greater than w1_3, corresponding to the ratio of the number of medical staff to the number of confirmed cases, or than w1_4, corresponding to the ratio of confirmed cases to the total population of the target country/region, then w1_5 is increased, i.e. w1_5 = w1_5 + Δw2, and w1_3 or w1_4 is decreased, i.e. w1_3 = w1_3 - Δw2/2 or w1_4 = w1_4 - Δw2/2. This completes the first round of adjustment of the weight coefficients.
The total number of confirmed cases in the target country/region, the age composition of confirmed cases and the local population density belong to the secondary factor "target region basic features"; the ratio of confirmed cases to the total population of the target country/region and the number of infected medical personnel belong to the secondary factor "epidemic prevention and control strength"; the ratio of the number of medical staff to the number of confirmed cases belongs to the secondary factor "medical level"; and the climate conditions (temperature, humidity, etc.) of the target region belong to the secondary factor "geographic factor influence".
The weight coefficients of all primary factors belonging to the same secondary factor are scaled and normalized so that they sum to 1, i.e. w1_i = w1_i/(w1_i + w1_(i+1) + ... + w1_m); once all weight coefficients have been scaled, w1_i + w1_(i+1) + ... + w1_m = 1. For each of the factors above (the total number of confirmed cases in the target country/region, the age composition of confirmed cases, the ratio of confirmed cases to the total population of the target country/region, the number of infected medical personnel, the ratio of the number of medical staff to the number of confirmed cases, the climate conditions (temperature, humidity, etc.) of the target region, and the local population density), if the corresponding p value is greater than 0.5 the factor is considered a negative factor, and if the p value is not greater than 0.5 it is considered a positive factor. For a positive factor, the weight coefficient w1 is recorded as a negative number, i.e. w1 = -w1; for a negative factor, the weight coefficient w1 is left unchanged.
The value of the secondary factor "target region basic features" is h1 = p_1*w1_1 + p_2*w1_2 + p_5*w1_5, the value of "epidemic prevention and control strength" is h2 = p_4*w1_4 + p_7*w1_7, the value of "medical level" is h3 = p_3*w1_3, and the value of "geographic factor influence" is h4 = p_6*w1_6.
A second round of floating is then performed on all secondary factors. If the value h2 of the epidemic prevention and control strength is far greater than the value h1 of the target region basic features, w2_2 is increased, i.e. w2_2 = w2_2 + Δw3, and w2_1 is decreased, i.e. w2_1 = w2_1 - Δw3; if h2 is far greater than the value h4 of the geographic factor influence, w2_2 is increased, i.e. w2_2 = w2_2 + Δw4, and w2_4 is decreased, i.e. w2_4 = w2_4 - Δw4; if the value h3 of the medical level is far greater than h1, w2_3 is increased, i.e. w2_3 = w2_3 + Δw5, and w2_1 is decreased, i.e. w2_1 = w2_1 - Δw5; if h3 is far greater than h4, w2_3 is increased, i.e. w2_3 = w2_3 + Δw6, and w2_4 is decreased, i.e. w2_4 = w2_4 - Δw6. This completes the description of all floating operations.
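The weight initialisation used for every factor in this example follows one formula, which can be written out as a small helper. The numbers in the call (p = 0.8, w0 = 0.2, n = 3) are hypothetical and serve only to show the arithmetic.

```python
def initial_weight(p, w0, n):
    """w1 = (p - 0.5) * (1 / n) + w0.

    p  : epidemic severity returned by the fitted regression function
    w0 : initial weight of the factor within its secondary factor (from FeatureSet)
    n  : number of primary factors in the corresponding secondary factor
    """
    return (p - 0.5) * (1.0 / n) + w0


# A factor whose severity is above the 0.5 midpoint gains weight relative to w0,
# one below the midpoint loses weight.
w1_above = initial_weight(0.8, 0.2, 3)
w1_below = initial_weight(0.2, 0.2, 3)
```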
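One pairwise floating step can be sketched as below. The text does not quantify "far greater", so the ratio threshold of 2.0 is an assumption, as are the sample weights and the adjustment step delta.

```python
def float_pair(weights, strong, weak, delta, ratio=2.0):
    """Shift delta from weights[weak] to weights[strong] if strong is far greater than weak.

    "Far greater" is interpreted here as exceeding `ratio` times the weaker weight,
    which is an assumed threshold. A new dict is returned; the input is not modified.
    """
    adjusted = dict(weights)
    if adjusted[strong] > ratio * adjusted[weak]:
        adjusted[strong] += delta
        adjusted[weak] -= delta
    return adjusted


# Hypothetical weights for the total number of confirmed cases (w1_1)
# and the ratio of medical staff to confirmed cases (w1_3).
w1 = {"w1_1": 0.5, "w1_3": 0.2}
w1_after = float_pair(w1, "w1_1", "w1_3", delta=0.05)
```

The same helper covers the second round on the secondary factors if the comparison is made on the h values and the shift is applied to the w2 weights.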
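The normalisation and the sign rule can be illustrated with two short helpers. The function names and the sample weights are assumptions made for this sketch only.

```python
def normalize(weights):
    """Scale the weights of the primary factors of one secondary factor so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def signed_weight(p, w1):
    """Negative factor (p > 0.5): keep w1; positive factor (p <= 0.5): record w1 as negative."""
    return w1 if p > 0.5 else -w1


scaled = normalize([1.0, 1.0, 2.0])  # hypothetical raw weights within one secondary factor
```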
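The four secondary factor values are all weighted sums of the same form. A minimal sketch, in which every p and w1 value is a hypothetical placeholder:

```python
def secondary_value(pairs):
    """h = sum of p_i * w1_i over the primary factors of one secondary factor."""
    return sum(p * w1 for p, w1 in pairs)


# Grouping follows the assignment of primary factors given in this example;
# the (p, w1) numbers are illustrative only.
groups = {
    "h1": [(0.5, 0.5), (1.0, 0.25), (0.0, 0.25)],  # p_1, p_2, p_5
    "h2": [(0.5, 0.5), (0.5, 0.5)],                # p_4, p_7
    "h3": [(0.25, 1.0)],                           # p_3
    "h4": [(0.75, 1.0)],                           # p_6
}
h = {name: secondary_value(pairs) for name, pairs in groups.items()}
```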
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (1)

1. A new crown epidemic situation comprehensive evaluation method based on a classification regression model is characterized by comprising the following steps:
Part_1: acquiring characteristic information of factors influencing the epidemic situation,
Part_2: creating a classifier, classifying the influencing factors,
Part_3: fitting a regression model between the influencing factors and the severity of the epidemic,
Part_4: calculating the value and the weight coefficient of each level of factor, calculating an epidemic situation risk coefficient, and selecting the factor which has the greatest influence on the epidemic situation;
the data structures, functions and processes used by the method are described as follows
(1) The data structure of the feature data set FeatureSet of the primary influence factors input into the classification model is defined as follows
Data item 1, Item_1: the name of the influence factor,
Data item 2, Item_2: the type of the secondary factor to which the influence factor belongs,
Data item 3, Item_3: the initial specific gravity within the secondary factor,
Data item 4, Item_4: the severity of the epidemic when the value of the influence factor equals 0,
Data item 5, Item_5: the severity of the epidemic when the value of the influence factor is positive infinity or at its maximum,
Data item 6, Item_6: the trend and degree of change of the severity of the epidemic as the value of the influence factor increases from 0,
Data item 7, Item_7: whether the factor is a positive factor or a negative factor,
Data item 8, Item_8: the components of the influence factor and their initial specific gravities, expressed as an array,
Data item 9, Item_9: the label of the type of functional relationship between the influence factor and the severity of the epidemic;
(2) The data structure of the regression data set RegressionSet of the primary influence factors input into the regression model is defined as follows
Data item 1, Item_1: the name of the influence factor,
Data item 2, Item_2: the data or data set of the influence factor,
Data item 3, Item_3: the values of epidemic severity corresponding to the data or data set of the influence factor,
Data item 4, Item_4: the standard value corresponding to the influence factor,
Data item 5, Item_5: the type of functional relationship between the influence factor and the severity of the epidemic;
(3) The data structure of the primary factor result data set Result1 is defined as follows
Data item 1, Item_1: the name of the influence factor,
Data item 2, Item_2: the severity p of the epidemic corresponding to the value of the influence factor,
Data item 3, Item_3: the proportion w1 of the factor among the factors influencing the epidemic;
(4) The data structure Data2Set of the secondary factor is defined as follows
Data item 1, Item_1: the name of the secondary factor,
Data item 2, Item_2: whether the factor is a positive factor or a negative factor,
Data item 3, Item_3: the number of primary factors that make up the secondary factor,
Data item 4, Item_4: the proportion w2 of the secondary factor in the analysis model,
Data item 5, Item_5: the secondary factor value h calculated from the primary factors;
(5) classifier SVM, defined as follows
The specific process is as follows: a support vector machine (SVM) performs multi-class classification of the influence factors; the multi-class SVM classifier is built by combining several binary classifiers; during training, the samples of each class in turn are treated as one class and all remaining samples as the other class; during classification, an unknown sample is assigned to the class with the largest classification function value,
initial data input: the classifier is trained with the feature data sets of the manually labeled influence factors, and the feature data sets of the influence factors that were not manually labeled are then input,
output and result: all factors of the evaluation model are labeled, and the factors having the same type of functional relationship with the severity of the epidemic are grouped into one class;
(6) regression model Regression is defined as follows
The specific process is as follows: the three function models are all treated in polynomial form with the highest degree of the polynomial fixed in advance; a univariate linear regression algorithm is adopted, and the coefficient vector that minimizes the loss function is calculated by a gradient descent algorithm, giving the function model with the best fit, wherein the independent variable is the influence factor and the dependent variable is the severity of the epidemic; the initial data input are ordered pairs consisting of the independent variable and the dependent variable, and the output and result is the function model;
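A minimal sketch of gradient descent for univariate linear regression is given below, assuming a mean-squared-error loss and a fixed learning rate, neither of which is specified by the text. The regressor z stands for a fixed transform of the influence factor value (here z = x^3, as in the cubic model); the sample points are synthetic.

```python
def gradient_descent_fit(zs, ys, lr=0.1, steps=5000):
    """Fit y = a * z + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(zs)
    for _ in range(steps):
        residuals = [a * z + b - y for z, y in zip(zs, ys)]
        grad_a = 2.0 / n * sum(r * z for r, z in zip(residuals, zs))
        grad_b = 2.0 / n * sum(residuals)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Synthetic ordered pairs generated from y = 2 * x**3 + 1, used only to check the fit.
xs = [0.0, 0.5, 1.0]
zs = [x ** 3 for x in xs]
ys = [2.0 * z + 1.0 for z in zs]
a_fit, b_fit = gradient_descent_fit(zs, ys)
```

For regressors on a much larger scale than [0, 1], such as raw ages or case counts, the learning rate would need to be reduced or the inputs rescaled for the iteration to converge.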
(7) the system procedure Task is defined as follows
Task{
Task_1: preprocessing the feature information of the primary factors to form the feature data set FeatureSet,
Task_2: manually labeling some of the primary factors,
Task_3: training the classifier with the feature matrix LabeledFeatures of the manually labeled primary factors,
Task_4: dividing all the primary factors into three classes with the trained classifier,
Task_5: inputting the regression data set RegressionSet of the influence factors into the regression model Regression and fitting the function models,
Task_6: inputting the data or data set of primary factor i into the regression function model of the corresponding class to obtain p_i and w1_i,
Task_7: inputting the primary factors into the floating module in turn, and adjusting the weight coefficients of the related factors by comparing the value and proportional relation of w1 of each primary factor with w1 of the factors related to it,
Task_8: calculating the secondary factor value h = w1_1*p_1 + w1_2*p_2 + ... + w1_n*p_n,
Task_9: inputting the secondary factors into the floating module in turn, and adjusting the weight coefficients of the related factors by comparing w2 of each secondary factor with w2 of the factors related to it,
Task_10: calculating the epidemic risk coefficient according to the formula
nCoV=w2_1*h_1+w2_2*h_2+w2_3*h_3+w2_4*h_4
and determining the factors which have the greatest influence on the epidemic,
Task_i: an entry reserved for the user for executing instructions and processes,
}
wherein Task_1 collects the feature information of the influence factors; Task_2, Task_3 and Task_4 divide all influence factors input into the evaluation model by the type of functional relationship between the factor and the severity of the epidemic, namely a quadratic functional relationship with a negative quadratic coefficient, a positive correlation and a negative correlation; Task_5 is the process in which the regression algorithm fits the function models, there being three kinds of functional relationship between an influence factor and the severity of the epidemic, corresponding to three function models: a quadratic function with a negative quadratic coefficient, a cubic function and an inverse proportion function, and the coefficients of the function model corresponding to each influence factor are calculated from the data set of that factor and the type of its functional relationship with the severity of the epidemic, thereby fitting the function; Task_6 and Task_7 calculate the severity p of the epidemic corresponding to the value of each primary factor and its proportion w1 in the corresponding secondary factor, and the floating module decreases or increases the weight of each primary factor within its secondary factor according to its influence and importance; Task_8 and Task_9 calculate the value h of each secondary factor, namely the severity of the epidemic corresponding to that secondary factor, and the floating module decreases or increases the weight of each secondary factor to adjust the analysis model; Task_10 calculates the final result, namely the epidemic risk coefficient, and determines the factors with the greatest influence on the epidemic in the target region by comparing the w2 values and the w1 values respectively; Task_i is an entry for executing instructions and processes that the system reserves for the user, to meet the user's need to extend functionality;
the data structure, the function and the process used by the system are described;
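The final step of the procedure, the risk coefficient nCoV = w2_1*h_1 + w2_2*h_2 + w2_3*h_3 + w2_4*h_4 together with the selection of the most influential secondary factor, can be sketched as follows. The function names and all numeric values are assumptions made for illustration; choosing the factor with the largest w2 is one reading of "comparing w2".

```python
def risk_coefficient(w2, h):
    """nCoV = sum of w2_k * h_k over the four secondary factors."""
    return sum(w * v for w, v in zip(w2, h))


def most_influential(names, weights):
    """Return the name of the factor with the largest weight."""
    return max(zip(names, weights), key=lambda item: item[1])[0]


names = [
    "medical level",
    "epidemic prevention and control strength",
    "geographic factor influence",
    "target region basic features",
]
w2 = [0.25, 0.25, 0.25, 0.25]  # hypothetical weights after the second floating round
h = [1.0, 0.0, 1.0, 0.0]       # hypothetical secondary factor values
ncov = risk_coefficient(w2, h)
```

The same most_influential helper can be applied to the w1 values of the primary factors inside the selected secondary factor.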
a new crown epidemic situation comprehensive evaluation method based on a classification regression model is characterized by comprising the following steps
Part _1, collecting the characteristic information of the primary factor, and storing the characteristic information in a data structure FeatureSet, specifically
All primary factors are classified and divided among four secondary factors, namely medical level, epidemic prevention and control strength, geographic factor influence and target region basic features, wherein if the secondary factor corresponding to primary factor i is the medical level, type_i in the feature data set FeatureSet_i is 1; if the secondary factor corresponding to primary factor i is the epidemic prevention and control strength, type_i in FeatureSet_i is 2; if the secondary factor corresponding to primary factor i is the geographic factor influence, type_i in FeatureSet_i is 3; if the secondary factor corresponding to primary factor i is the target region basic features, type_i in FeatureSet_i is 4; the Data2Set of secondary factor i comprises the secondary factor name Name_i, the positive/negative factor attribute PS_i, the number Number_i of primary factors forming the secondary factor, the proportion w2_i of the secondary factor in the analysis model, and the secondary factor value h_i calculated from the primary factors, formalized as follows
Data2Set_i={Name_i,PS_i,Number_i,w2_i,h_i}
The influencing factors Fact_index to be input into the evaluation model for analysis are selected, and each influencing factor i is analyzed to obtain its feature data set FeatureSet_i, which comprises: the influencing factor name Name_i; the type type_i of the secondary factor to which it belongs; the initial proportion w0_i assigned to it within that secondary factor; the epidemic severity R1_i corresponding to an influencing factor value of 0; the epidemic severity R2_i corresponding to an influencing factor value at positive infinity or at its maximum; the trend and degree of change R3_i of the epidemic severity as the influencing factor value increases from 0; the indicator PS_i, determined once the factor has been designated a positive factor or a negative factor; the array proportion[m], which records the parts that make up the influencing factor and their initial proportions, where m is the number of parts; and the label label_i of the functional relationship type between the influencing factor and the epidemic severity. If the factor has not been manually labeled and classified, label_i has no value and is initialized to 0; otherwise a quadratic relationship with a negative quadratic coefficient takes the value 1, a positive correlation takes the value 2, and a negative correlation takes the value 3. The FeatureSet_i of each influencing factor i is stored in a matrix, constructing the feature matrix AllFeatures of all influencing factors, formalized as follows
FeatureSet_i={Name_i,type_i,w0_i,R1_i,R2_i,R3_i,PS_i,proportion[m]_i,label_i}
AllFeatures_i={FeatureSet_1,FeatureSet_2,...,FeatureSet_i,...}
This concludes the description of feature Part_1;
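The two records defined in Part_1 can be sketched as Python dataclasses. This is only an illustrative sketch: the field names follow FeatureSet_i and Data2Set_i from the text, while the numeric encoding of R3 and PS, the lookup table SECONDARY_TYPES, and the helper build_secondary_sets are assumptions introduced here and are not part of the claims.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

# Codes for type_i as given in the text (1..4).
SECONDARY_TYPES = {
    1: "medical level",
    2: "epidemic prevention and control strength",
    3: "geographic factor influence",
    4: "basic characteristics of the target area",
}


@dataclass
class FeatureSet:
    """Feature data set of one primary (influencing) factor."""
    name: str
    type: int      # key into SECONDARY_TYPES
    w0: float      # initial proportion within the secondary factor
    r1: float      # severity when the factor value is 0
    r2: float      # severity at positive infinity / maximum value
    r3: float      # assumed numeric encoding of trend and degree of change
    ps: int        # assumed encoding: +1 positive factor, -1 negative factor
    proportion: List[float] = field(default_factory=list)
    label: int = 0  # 0 = not yet labeled; 1, 2, 3 = relationship type


@dataclass
class Data2Set:
    """Data set of one secondary factor."""
    name: str
    ps: int
    number: int    # count of primary factors in this secondary factor
    w2: float      # proportion of the secondary factor in the model
    h: float = 0.0  # value computed later from the primary factors


def build_secondary_sets(all_features: List[FeatureSet],
                         ps_by_type: Dict[int, int],
                         w2_by_type: Dict[int, float]) -> Dict[int, Data2Set]:
    """Hypothetical helper: derive Number_i by counting features per type."""
    counts = Counter(f.type for f in all_features)
    return {
        t: Data2Set(name=SECONDARY_TYPES[t], ps=ps_by_type[t],
                    number=counts.get(t, 0), w2=w2_by_type[t])
        for t in SECONDARY_TYPES
    }
```

Keeping label at 0 by default mirrors the rule that an influencing factor without manual labeling starts with label_i = 0.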
A comprehensive evaluation method for the new crown (COVID-19) epidemic based on a classification regression model, characterized by comprising the following:
Part_2: establishing a classifier and classifying the influencing factors into categories, specifically:
The regression data set RegressionSet of each influencing factor is input for the regression model. RegressionSet_i comprises the influencing factor name Name_i, the influencing factor data array Data_i, the array Degree_i of epidemic severity values corresponding to those data, the standard value Standard_i associated with the influencing factor, and the label label_i of the functional relationship type between the influencing factor and the epidemic severity. A subset of the influencing factors is selected and manually labeled: if the functional relationship between influencing factor i and the epidemic severity is a quadratic relationship with a negative quadratic coefficient, label_i in FeatureSet_i and label_i in RegressionSet_i are set to 1; if the relationship is a positive correlation, both are set to 2; if the relationship is a negative correlation, both are set to 3. The manually labeled influencing factors are stored in the matrix LabeledFeatures and the influencing factors that have not been manually labeled are stored in the matrix UnLabeledFeatures. The classifier is trained on the manually labeled factors by inputting the feature matrix LabeledFeatures into the SVM classifier for it to learn;
The feature matrix UnLabeledFeatures is then input into the trained classifier, which classifies each influencing factor in UnLabeledFeatures according to three entries of FeatureSet_i: the epidemic severity R1_i when the influencing factor value equals 0, the epidemic severity R2_i when the influencing factor value is at positive infinity, and the trend and degree of change R3_i of the epidemic severity as the influencing factor value increases from 0. If the functional relationship between influencing factor i and the epidemic severity is a quadratic relationship with a negative quadratic coefficient, the SVM classifier sets label_i in FeatureSet_i and label_i in RegressionSet_i to 1; if the relationship is a positive correlation, it sets both to 2; if the relationship is a negative correlation, it sets both to 3. Among all influencing factors input into the evaluation model, those with the same label_i are grouped into one class, so that all influencing factors are divided into three classes according to their characteristic attributes, formalized as follows
RegressionSet={Name_i,Data_i,Degree_i,Standard_i,label_i}
LabeledFeatures={Feature_m,Feature_n,...,Feature_p,...}
UnLabeledFeatures={Feature_q,Feature_k,...,Feature_t,...}
This concludes the description of feature Part_2;
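The labeling flow of Part_2 can be sketched as follows. The text specifies an SVM classifier; to keep the example dependency-free, this sketch swaps in a nearest-centroid rule as a stand-in, so it illustrates only the data flow (split by label, learn from LabeledFeatures, assign labels to UnLabeledFeatures), not the SVM itself. The dictionary keys and the numeric encoding of R3 are assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def split_by_label(all_features):
    """Separate manually labeled factors (label 1, 2 or 3) from unlabeled (0)."""
    labeled = [f for f in all_features if f["label"] != 0]
    unlabeled = [f for f in all_features if f["label"] == 0]
    return labeled, unlabeled


def train_centroids(labeled):
    """Stand-in for SVM training: mean (R1, R2, R3) vector per label."""
    sums, counts = {}, {}
    for f in labeled:
        acc = sums.setdefault(f["label"], [0.0, 0.0, 0.0])
        for k, v in enumerate((f["R1"], f["R2"], f["R3"])):
            acc[k] += v
        counts[f["label"]] = counts.get(f["label"], 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc)
            for lab, acc in sums.items()}


def classify(unlabeled, centroids):
    """Assign each unlabeled factor the label of its nearest centroid."""
    for f in unlabeled:
        vec = (f["R1"], f["R2"], f["R3"])
        f["label"] = min(centroids, key=lambda lab: dist(vec, centroids[lab]))
    return unlabeled


def group_by_label(all_features):
    """Group factor names into the three classes by label."""
    groups = {1: [], 2: [], 3: []}
    for f in all_features:
        groups[f["label"]].append(f["Name"])
    return groups
```

In a fuller implementation the two middle functions would be replaced by fitting and querying an actual SVM on the same (R1, R2, R3) vectors.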
A comprehensive evaluation method for the new crown (COVID-19) epidemic based on a classification regression model, characterized by comprising the following:
Part_3: fitting a regression model between the influencing factors and the epidemic severity, specifically:
All influencing factors input into the evaluation model are divided into three classes, and each class corresponds to one functional form between the influencing factor and the epidemic severity, namely a quadratic function with a negative quadratic coefficient, a cubic function, and an inverse proportional function, giving the following three regression models
function1(x) = a_i*x^2 + b_i*x + c_i, where a_i < 0
function2(x) = a_i*x^3 + b_i
function3(x) = a_i/x
If label_i in the regression data set RegressionSet_i of influencing factor i is 1, the functional relationship between influencing factor i and the epidemic severity takes the form of a quadratic function with a negative quadratic coefficient. The influencing factor data array Data in RegressionSet_i and the corresponding epidemic severity array Degree are combined into ordered pairs and stored in the Test0_i array, that is, Test0_i = {(Data0[0], Degree0[0]), (Data0[1], Degree0[1]), ..., (Data0[n], Degree0[n])}. The ordered pair data set Test0_i is input into the regression model Regression, where the ordered pairs are defined as standard points of the function; from these standard points the regression algorithm calculates the coefficients a_i, b_i and c_i of the quadratic function for influencing factor i, with a_i negative, and fits the complete functional relationship function1;
if label_i in the regression data set RegressionSet_i of influencing factor i is 2, the functional relationship between influencing factor i and the epidemic severity takes the form of a cubic function. The standard value Standard_i associated with the influencing factor in RegressionSet_i and the corresponding epidemic severity value Degree_i, which is 0.5 at the standard value, are combined into an ordered pair, and the ordered pairs (0, R1) and (1, R2) are also formed, where R1 and R2 are R1_i and R2_i in the feature data set FeatureSet_i of influencing factor i. The ordered pairs are stored in the Test1_i array, that is, Test1_i = {(Standard_i, Degree_i), (0, R1), (1, R2)}, and input into the regression model Regression, where they are defined as standard points of the function; from these standard points the regression algorithm calculates the coefficients a_i and b_i of the cubic function for influencing factor i and fits the complete functional relationship function2;
if label_i in the regression data set RegressionSet_i of influencing factor i is 3, the functional relationship between influencing factor i and the epidemic severity takes the form of an inverse proportional function. The standard value Standard_i associated with the influencing factor in RegressionSet_i and the corresponding epidemic severity value Degree_i are combined into an ordered pair and input into the regression model Regression; from this standard point the regression algorithm calculates the coefficient a_i of the inverse proportional function for influencing factor i and fits the complete functional relationship function3;
this concludes the description of feature Part_3;
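The three fits in Part_3 can be sketched with ordinary least squares in plain Python. The text does not name the regression algorithm, so least squares is an assumption here, as are the function names. Each function takes the ordered pairs (the "standard points") as a list of (x, y) tuples. The quadratic fit does not force a_i to be negative; a caller would need to check that separately.

```python
def _solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (a[r][3] - s) / a[r][r]
    return x


def fit_quadratic(points):
    """function1: y = a*x^2 + b*x + c via the normal equations.
    Needs at least three points with distinct x values."""
    rows = [(x * x, x, 1.0) for x, _ in points]
    ys = [y for _, y in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    a, b, c = _solve3(ata, aty)
    return a, b, c


def fit_cubic(points):
    """function2: y = a*x^3 + b, a simple linear regression on u = x^3.
    Needs at least two points with distinct x values."""
    us = [x ** 3 for x, _ in points]
    ys = [y for _, y in points]
    mu, my = sum(us) / len(us), sum(ys) / len(ys)
    suu = sum((u - mu) ** 2 for u in us)
    suy = sum((u - mu) * (y - my) for u, y in zip(us, ys))
    a = suy / suu
    return a, my - a * mu


def fit_inverse(points):
    """function3: y = a/x, minimizing the sum of (y - a/x)^2. Requires x != 0."""
    num = sum(y / x for x, y in points)
    den = sum(1.0 / (x * x) for x, _ in points)
    return (num / den,)


def fit_by_label(label, points):
    """Dispatch on label_i: 1 quadratic, 2 cubic, 3 inverse proportional."""
    return {1: fit_quadratic, 2: fit_cubic, 3: fit_inverse}[label](points)
```

For label 2, the points would be Test1_i = {(Standard_i, Degree_i), (0, R1), (1, R2)}; for label 3, a single pair (Standard_i, Degree_i) already determines a_i.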
A comprehensive evaluation method for the new crown (COVID-19) epidemic based on a classification regression model, characterized by comprising the following:
Part_4: calculating the value and weight coefficient of the factors at each level, calculating the epidemic risk coefficient, and selecting the factor with the greatest influence on the epidemic, specifically:
First, the epidemic severity p corresponding to the value of each primary factor and the proportion w1 it occupies within its secondary factor are calculated. If label_i in the regression data set RegressionSet_i of influencing factor i is 1, the function model between influencing factor i and the epidemic severity is function1; the weighted average Average_i of the influencing factor data array Data in RegressionSet_i is calculated and substituted into function1, the value function1(Average_i) is p_i in the result data set Result_i of influencing factor i, and w1_i in Result_i is calculated as (p_i - 0.5)*(1/n) + w0_i, where w0_i is the initial weight of the influencing factor within its secondary factor as set in the feature data set FeatureSet_i, and n is the number of primary factors contained in the corresponding secondary factor, namely the Number value in the Data2Set of that secondary factor;
if label_i in the regression data set RegressionSet_i of influencing factor i is 2, the function model between influencing factor i and the epidemic severity is function2; the influencing factor value Data in RegressionSet_i is substituted into function2, the value function2(Data) is p_i in the result data set Result_i of influencing factor i, and w1_i in Result_i is calculated as (p_i - 0.5)*(1/n) + w0_i, with w0_i and n defined as above;
if label_i in the regression data set RegressionSet_i of influencing factor i is 3, the function model between influencing factor i and the epidemic severity is function3; the influencing factor value Data in RegressionSet_i is substituted into function3, the value function3(Data) is p_i in the result data set Result_i of influencing factor i, and w1_i in Result_i is calculated as (p_i - 0.5)*(1/n) + w0_i, with w0_i and n defined as above;
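The first step of Part_4 can be sketched as a short calculation. The formula w1 = (p - 0.5)*(1/n) + w0 is taken from the text; everything else is an assumption made for illustration, including the function names and the use of equal weights when no weights are supplied for the "weighted average" (the text does not say how that average is weighted).

```python
def weighted_average(values, weights=None):
    """Average of the Data array; equal weights are assumed by default."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def evaluate(label, coeffs, x):
    """Evaluate function1, function2 or function3 at x."""
    if label == 1:
        a, b, c = coeffs
        return a * x * x + b * x + c
    if label == 2:
        a, b = coeffs
        return a * x ** 3 + b
    if label == 3:
        (a,) = coeffs
        return a / x
    raise ValueError("label must be 1, 2 or 3")


def severity_and_weight(label, coeffs, data, w0, n, weights=None):
    """Return (p_i, w1_i) for one primary factor.

    label 1: data is an array and its weighted average is substituted;
    labels 2 and 3: data is the single influencing factor value.
    n is the number of primary factors in the same secondary factor.
    """
    x = weighted_average(data, weights) if label == 1 else data
    p = evaluate(label, coeffs, x)
    w1 = (p - 0.5) * (1.0 / n) + w0
    return p, w1
```

A severity of exactly 0.5 leaves the initial weight unchanged, values above 0.5 raise it, and values below 0.5 lower it, with the size of the shift scaled down by the number of sibling factors n.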
second, the processed primary factors are input into the floating module in sequence to adjust the weight coefficients: w1_i of the input primary factor i is compared with w1_j of an associated primary factor j, and according to the values of w1 and their proportional relationship, w1_i is increased or decreased while w1_j is correspondingly decreased or increased, that is, w1_i = w1_i ± Δw and w1_j = w1_j ∓ Δw. The w1_i of primary factor i is compared with and adjusted against the w1 of each associated primary factor in this way, completing the first round of weight coefficient adjustment;
the weight coefficients of all primary factors belonging to the same secondary factor are then normalized proportionally so that they sum to 1, that is, w1_i = w1_i/(w1_i + w1_(i+1) + ... + w1_m), after which w1_i + w1_(i+1) + ... + w1_m = 1. After all weight coefficients have been scaled in this way, each factor is designated a positive factor or a negative factor according to the attribute of the influencing factor itself and its corresponding epidemic severity p: if it is a positive factor, its weight coefficient is recorded as a negative number, that is, w1 = -w1; if it is a negative factor, the weight coefficient w1 is unchanged. The epidemic severity p_i of every primary factor i contained in a secondary factor is then multiplied by its adjusted weight coefficient w1_i to obtain the parameter m_i, that is, m_i = p_i*w1_i, and all these parameters are summed to obtain the value h_j of secondary factor j, that is, h_j = m_1 + m_2 + ... + m_n, where n is the number of primary factors contained in the secondary factor, namely the Number value in the Data2Set of that secondary factor;
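The normalization, sign assignment and summation described above can be sketched as follows. The sketch starts from w1 values that have already been through the floating module, because the text does not specify how Δw is chosen. The function names and the boolean is_positive flags are assumptions introduced for illustration.

```python
def normalize(weights):
    """Scale the weights of one secondary factor so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def apply_sign(weights, is_positive):
    """Positive factors get a negative weight; negative factors are unchanged."""
    return [-w if pos else w for w, pos in zip(weights, is_positive)]


def secondary_value(p_values, w1_values, is_positive):
    """Compute h_j = sum over i of p_i * w1_i for one secondary factor."""
    signed = apply_sign(normalize(w1_values), is_positive)
    return sum(p * w for p, w in zip(p_values, signed))
```

For example, two primary factors with severities 0.8 and 0.4 and weights 0.3 and 0.1 normalize to 0.75 and 0.25; if the second is a positive factor its weight becomes -0.25, giving h = 0.8*0.75 - 0.4*0.25 = 0.5.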
following the same method as the first round of weight coefficient adjustment, a second round of adjustment is applied to the weight coefficients w2 of the four secondary factors. After the adjustment, each secondary factor is designated a positive factor or a negative factor according to its value h and its own attribute: if it is a positive factor, its weight coefficient is made negative, that is, w2 = -w2; if it is a negative factor, the weight coefficient w2 is unchanged;
finally, the epidemic risk coefficient of the target area is calculated as
nCoV=w2_1*h_1+w2_2*h_2+w2_3*h_3+w2_4*h_4
The weight coefficients w2 of all secondary factors are compared, and the secondary factor with the largest weight coefficient w2 is judged to be the aspect that most affects the epidemic severity in the target country/region. Within each secondary factor, the primary factor with the largest weight coefficient w1 is the factor with the greatest influence on that aspect. This yields the overall factors with the greatest influence on the local epidemic, namely the highest-weighted primary factor within each of the medical level, the epidemic prevention and control strength, the geographic factor influence, and the basic characteristics of the target area, and provides warning and guidance information for epidemic prevention and control work;
this concludes the description of feature Part_4.
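The final step of Part_4 can be sketched as a weighted sum followed by two maximum searches. The sketch takes each term of the risk coefficient as w2_j*h_j, on the assumption that the four secondary factor values h_j are what is being combined. It compares the signed weights exactly as stated; whether magnitudes should be compared instead is not specified in the text. All function names are hypothetical.

```python
def risk_coefficient(w2, h):
    """nCoV = sum over the four secondary factors of w2_j * h_j."""
    return sum(w * v for w, v in zip(w2, h))


def most_influential(names, weights):
    """Name of the entry with the largest weight coefficient."""
    return max(zip(names, weights), key=lambda pair: pair[1])[0]


def report(secondary_names, w2, primary_by_secondary):
    """Pick the top secondary factor and the top primary factor in each.

    primary_by_secondary maps a secondary factor name to a pair
    (primary factor names, their w1 values).
    """
    top_secondary = most_influential(secondary_names, w2)
    top_primary = {
        sec: most_influential(names, w1)
        for sec, (names, w1) in primary_by_secondary.items()
    }
    return top_secondary, top_primary
```

The returned pair corresponds to the warning and guidance information described above: the aspect that weighs most on the epidemic, and the leading primary factor within each aspect.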
CN202011006901.3A 2020-09-17 2020-09-17 New crown epidemic situation comprehensive evaluation method based on classification regression model Active CN112164471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011006901.3A CN112164471B (en) 2020-09-17 2020-09-17 New crown epidemic situation comprehensive evaluation method based on classification regression model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011006901.3A CN112164471B (en) 2020-09-17 2020-09-17 New crown epidemic situation comprehensive evaluation method based on classification regression model

Publications (2)

Publication Number Publication Date
CN112164471A (en) 2021-01-01
CN112164471B (en) 2022-05-24

Family

ID=73863426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011006901.3A Active CN112164471B (en) 2020-09-17 2020-09-17 New crown epidemic situation comprehensive evaluation method based on classification regression model

Country Status (1)

Country Link
CN (1) CN112164471B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130275160A1 (en) * 2010-12-15 2013-10-17 Michal Lev System and method for analyzing and controlling epidemics
JP2013145224A (en) * 2011-12-14 2013-07-25 Dretec Co Ltd Influenza information display device and influenza information/heat stroke information display device
CN109754881A (en) * 2017-11-03 2019-05-14 中国移动通信有限公司研究院 A kind of appraisal procedure and device of community's screening scheme
CN108172301A (en) * 2018-01-31 2018-06-15 中国科学院软件研究所 A kind of mosquito matchmaker's epidemic Forecasting Methodology and system based on gradient boosted tree
CN109147949A (en) * 2018-08-16 2019-01-04 辽宁大学 A method of based on post-class processing come for detecting teacher's sub-health state
CN110085327A (en) * 2019-04-01 2019-08-02 东莞理工学院 Multichannel LSTM neural network Influenza epidemic situation prediction technique based on attention mechanism
CN110993118A (en) * 2020-02-29 2020-04-10 同盾控股有限公司 Epidemic situation prediction method, device, equipment and medium based on ensemble learning model
CN111402347A (en) * 2020-03-20 2020-07-10 吴刚 New crown pneumonia epidemic situation prevention and control system based on Internet of things
CN111462919A (en) * 2020-03-31 2020-07-28 中国科学院软件研究所 Method and system for predicting insect-borne diseases based on sliding window time sequence model
CN111639191A (en) * 2020-05-08 2020-09-08 中科院合肥技术创新工程院 Prediction method for simulating epidemic situation development trend by novel coronavirus knowledge map

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAOCHEN LI: ""Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan"", 《JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY》 *
崔文华 等: ""基于递阶支持向量机的产品族配置性能预测"", 《计算机集成制造系统》 *
植运超 等: ""基于改进SEIR模型的COVID-19疫情状况评估及发展趋势预测"", 《东莞理工学院学报》 *
黄德生 等: ""沈阳市细菌性痢疾疫情分类回归树分析"", 《中国医科大学学报》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802611A (en) * 2021-02-04 2021-05-14 天博电子信息科技有限公司 Visual area prevention and control method based on epidemic situation risk model
CN112992372A (en) * 2021-03-09 2021-06-18 深圳前海微众银行股份有限公司 Epidemic situation risk monitoring method, device, equipment, storage medium and program product
CN112986503A (en) * 2021-04-20 2021-06-18 深圳市儒翰基因科技有限公司 Quantitative monitoring system and method for pathogen microorganism safety risk indexes
CN113192640A (en) * 2021-05-06 2021-07-30 浙江工业大学 New crown risk stage assessment method and system based on transfer learning
CN113724792A (en) * 2021-08-01 2021-11-30 北京工业大学 Correlation analysis-based virus diffusion and climate factor relationship analysis method
CN113724792B (en) * 2021-08-01 2024-04-09 北京工业大学 Virus diffusion and climate factor relation analysis method based on correlation analysis
CN114264784A (en) * 2021-12-03 2022-04-01 淮阴工学院 Cultivation water regime judgment method and system based on sensor risk interval model
CN114264784B (en) * 2021-12-03 2023-08-22 淮阴工学院 Breeding water condition judging method and system based on sensor risk interval model
CN117116495A (en) * 2023-09-08 2023-11-24 天津医科大学眼科医院 Fine classification method and system for keratoconus
CN117116495B (en) * 2023-09-08 2024-04-05 天津医科大学眼科医院 Fine classification method and system for keratoconus

Also Published As

Publication number Publication date
CN112164471B (en) 2022-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant