CN110349673B

CN110349673B - Group constitution evaluation method based on Gaussian mixture distribution

Info

Publication number: CN110349673B
Application number: CN201910570304.4A
Authority: CN
Inventors: 赵宏伟; 张宝亮; 赵浩宇; 范丽丽; 胡黄水; 李星; 姚瑶; 张原瑞; 王万鹏; 刘萍萍
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2022-08-05
Anticipated expiration: 2039-06-27
Also published as: CN110349673A

Abstract

The invention discloses a population constitution evaluation method based on Gaussian mixture distribution, comprising the following steps. Step 1: randomly divide unlabeled physical test data into several segments that serve as test data and training data. Step 2: preprocess each segment of test data and training data from step 1. Step 3: extract each group of feature data using an unsupervised learning algorithm. Step 4: fit each group of feature data obtained in step 3 and determine the number of mixture components. Step 5: calculate the weight and mean of each mixture component using the EM algorithm. Step 6: establish a three-level evaluation model, and substitute the results observed and calculated in steps 4 and 5 into the three-level evaluation model and the population constitution evaluation quantification formula to obtain a grade and a score. The method does not depend on individual constitution evaluation results: the population evaluation result is obtained without them.

Description

Group constitution evaluation method based on Gaussian mixture distribution

Technical Field

The invention belongs to the field of physique evaluation, relates to a physique health evaluation method, and particularly relates to a population physique evaluation method based on Gaussian mixture distribution.

Background

Constitution refers to the quality of the human body: the comprehensive and relatively stable characteristics of morphological structure, physiological function, and psychological factors that a person exhibits on the basis of heredity and acquired traits. Physical health assessment has long been an active topic in health research. Constitution evaluation uses scientific indicators and methods to assess the physique and health of a nation's people, and thereby helps to improve them continuously. Many domestic scholars have explored and practiced constitution monitoring and evaluation systems, with good results. Most existing work first obtains expert knowledge through preliminary investigation to derive evaluation indicators and their weight coefficients, and then evaluates individual constitution using existing statistical formulas or curve-fitting techniques. In the literature published over the last two decades, the physical health of a population is assessed simply by aggregating statistics of individual assessment results.

Machine learning is a branch of artificial intelligence and is in many cases almost synonymous with it. Machine learning systems are used to identify objects in images, transcribe speech into text, match news items, posts, or products to users' interests, and select relevant search results. Machine learning is also an important auxiliary tool in medicine and has significant application value in health care. Although evaluation models are widely applied in other fields, assessing the health of a population in a complex data environment remains a worthwhile problem that has not been studied in depth.

Disclosure of Invention

To solve the problems of existing group constitution evaluation, the invention provides a group constitution evaluation method based on Gaussian mixture distribution. Its core idea is to use a convolutional neural network to learn features automatically, without supervision, from raw physical test data; to propose a three-level group constitution evaluation model based on Gaussian mixture distribution; and to feed the learned features into the evaluation model to obtain the group constitution evaluation result.

The purpose of the invention is realized by the following technical scheme:

A population constitution evaluation method based on Gaussian mixture distribution comprises the following steps:

step 1: randomly dividing unlabeled physical test data into several segments as test data and training data;

step 2: preprocessing each segment of test data and training data from step 1;

step 3: taking the training data preprocessed in step 2 as the input of a convolutional neural network model, and extracting each group of feature data using an unsupervised learning algorithm;

step 4: fitting each group of feature data obtained in step 3, and determining the number of mixture components;

step 5: calculating the weight and mean of each mixture component using the EM (expectation-maximization) algorithm, based on each group of feature data obtained in step 3;

step 6: establishing a three-level evaluation model, and substituting the results observed and calculated in step 4 and step 5 into the three-level evaluation model and the group constitution evaluation quantification formula to obtain a grade and a score.

Compared with the prior art, the invention has the following advantages:

1. The method is independent of individual constitution evaluation results: the group constitution evaluation result is obtained without relying on them.

2. The method fully considers the distribution characteristics of the population constitution, can be used for constitution evaluation in all regions and for all classes of people, and is therefore global and general in scope.

Drawings

FIG. 1 is a training flow chart of the method for assessing population constitution based on Gaussian mixture distribution according to the present invention;

FIG. 2 is a diagram of a feature extraction convolutional neural network in the present invention;

FIG. 3 is a test data feature distribution diagram of the Gaussian mixture distribution based population fitness evaluation method of the present invention;

FIG. 4 is a graph of two sets of Gaussian mixture distributions under test data in the present invention.

Detailed Description

The technical solution of the present invention is further described below with reference to the accompanying drawings, but not limited thereto, and any modification or equivalent replacement of the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention shall be covered by the protection scope of the present invention.

The invention provides a group constitution evaluation method based on Gaussian mixture distribution. It takes into account the feature distribution of the group constitution, which has an important influence on group constitution evaluation. Using machine learning, the method provides a physical health assessment model consisting mainly of feature learning and feature assessment. Its aim is to build a model of the group's physical health state by analyzing group physical test data independently of individual assessment results, so that the health state of the group can be understood. As shown in FIG. 1, the method comprises the following steps:

Step 1: Divide the unlabeled physical test data into several segments as training data.

The unlabeled physical test data are divided into several segments that are fed to the model as the input of the convolutional neural network. The data are divided into 7 dimensions according to the test items. From each dimension, 1800 data items are randomly extracted as test data, and the remaining data serve as training data. From the training data, 1800 data items are randomly drawn each time as one training input.
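The split described in step 1 can be sketched as follows. This is only an illustration under stated assumptions: the records of one test item are held in a 1-D NumPy array, and the function names `split_dimension` and `draw_training_batch`, the seeds, and the synthetic scores are hypothetical and not taken from the patent.

```python
import numpy as np

def split_dimension(values, n_test=1800, seed=0):
    """Randomly hold out n_test items of one test dimension; the rest is training data."""
    values = np.asarray(values, dtype=float)
    if values.size <= n_test:
        raise ValueError("need more than n_test items to leave any training data")
    rng = np.random.default_rng(seed)
    order = rng.permutation(values.size)          # random order of all indices
    return values[order[n_test:]], values[order[:n_test]]  # (train, test)

def draw_training_batch(train, batch_size=1800, rng=None):
    """Draw one training input of batch_size items, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(train, size=batch_size, replace=False)

# Synthetic stand-in for one dimension (assumption: 10,000 records are available).
scores = np.random.default_rng(1).normal(60.0, 10.0, size=10_000)
train, test = split_dimension(scores)
batch = draw_training_batch(train, rng=np.random.default_rng(2))
```

The same two calls would be repeated for each of the 7 dimensions, so that every dimension gets its own 1800-item test set.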

Step 2: Preprocess each segment of data.

Before feature extraction, the data must be normalized. Normalization matters here because each sample has several features and the values of different features are on different scales. Without normalization, large differences in scale may cause the entire model to fail.

This step uses 7-dimensional data. To preserve the distribution characteristics of the original data, a 0-1 (min-max) normalization method is adopted:

$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}\,(\mathrm{max} - \mathrm{min}) + \mathrm{min}$$

where X is an input data item (training data or test data), X_max is the maximum item in the data set, and X_min is the minimum item in the data set. In the invention, max is 1 and min is 0, so the data normalization formula can be written as:

$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

Eventually all data are normalized to the range 0-1.
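The 0-1 normalization of step 2 can be illustrated with a short sketch. It assumes the usual min-max form, in which each value is shifted by the set minimum and divided by the set range; the function name `min_max_normalize` and the sample vital capacity values are hypothetical.

```python
import numpy as np

def min_max_normalize(x, new_min=0.0, new_max=1.0):
    """Rescale x linearly so that its minimum maps to new_min and its maximum to new_max."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A constant column carries no spread; map it to new_min (an assumption).
        return np.full_like(x, new_min)
    return (x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min

# Hypothetical vital capacity readings in millilitres.
vital_capacity = np.array([2100.0, 2600.0, 3100.0, 4100.0])
normalized = min_max_normalize(vital_capacity)
```

Because the mapping is linear, the shape of each feature's distribution is unchanged, which is what lets the later mixture fitting work on the rescaled data.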

Step 3: Extract feature information using an unsupervised learning algorithm.

In this step, a model consisting of a two-layer convolutional neural network converts a large volume of raw signals into a reduced set of features. As shown in FIG. 2, the convolutional neural network comprises two convolutional layers, two activation layers, and two pooling layers. In the convolutional layers, the kernel size is 3 × 1 and the stride is 1; the activation layers use the ReLU activation function; the pooling layers use a 2 × 1 filter with max pooling. The data from step 2 are fed to the model as input, and the feature map of the constitution data is obtained after the convolutional, activation, and pooling layers. Following the idea of an autoencoder, the reconstruction error between the input data and its reconstruction is analyzed and fed back to adjust the network parameters, so that better features are learned. Step 3 runs for 5000 iterations; when the error approaches zero, the corresponding feature column can be taken out.
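The layer stack described in step 3 can be traced with a forward-pass sketch in plain NumPy. It covers only the encoder shapes (3 × 1 convolution with stride 1, ReLU, 2 × 1 max pooling, applied twice). The single channel, the absence of padding, the random untrained kernels, and all function names are assumptions; the decoder and the 5000-iteration reconstruction training are omitted.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """3 x 1 convolution, stride 1, no padding (cross-correlation, as in most CNN libraries)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, kernel.size)
    return windows @ kernel + bias

def relu(x):
    """ReLU activation: negative values become 0."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """2 x 1 max pooling; a trailing element that does not fill a window is dropped."""
    usable = (x.size // size) * size
    return x[:usable].reshape(-1, size).max(axis=1)

def encode(x, kernels):
    """Apply convolution -> ReLU -> max pooling once per kernel."""
    for kernel in kernels:
        x = max_pool(relu(conv1d_valid(x, kernel)))
    return x

rng = np.random.default_rng(0)
kernels = [rng.normal(size=3), rng.normal(size=3)]  # untrained weights, illustration only
segment = rng.random(1800)                          # one normalized 1800-item input
features = encode(segment, kernels)                 # 1800 -> 1798 -> 899 -> 897 -> 448
```

Under these assumptions an 1800-item input is reduced to 448 feature values; a different padding choice or channel count would change that figure.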

Step 4: Fit the feature data and determine the number of Gaussian mixture components.

In this step, the mixture distribution of the feature data is observed. Each group of feature data obtained in step 3 is fitted using a fitting function in Python, the data distribution is inspected, and the number of components in the mixture distribution is recorded.
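The patent determines the number of components by inspecting the fitted distribution. As a rough automated stand-in, and not the patent's own procedure, the sketch below counts the peaks of a histogram density estimate. The function name `count_modes`, the bin count, the relative height cutoff, and the sample data are all assumptions.

```python
import numpy as np

def count_modes(features, bins=10, value_range=(0.0, 1.0), min_rel_height=0.2):
    """Count local maxima of a histogram density that reach min_rel_height of the tallest bin."""
    density, _ = np.histogram(features, bins=bins, range=value_range, density=True)
    padded = np.concatenate(([0.0], density, [0.0]))   # lets edge bins count as peaks
    left, mid, right = padded[:-2], padded[1:-1], padded[2:]
    is_peak = (mid > left) & (mid >= right) & (mid >= min_rel_height * density.max())
    return int(is_peak.sum())

# Hand-built normalized feature values with two clusters, near 0.25 and near 0.75.
features = np.repeat([0.15, 0.25, 0.35, 0.65, 0.75, 0.85], [5, 20, 5, 3, 10, 3])
n_components = count_modes(features)
```

The count is sensitive to the bin width, so a result like this is best treated as a starting value to confirm against the plotted distribution.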

Step 5: Calculate the weight and mean of each mixture component using the EM algorithm.

In this step, the weight and mean of each mixture component in the corresponding group are calculated from each group of feature data obtained in step 3. The EM algorithm procedure is shown in Table 1:

TABLE 1
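For orientation, the textbook EM iteration for a one-dimensional Gaussian mixture can be sketched as below. This is the standard form and not a transcription of Table 1. The quantile-based initialization, the stopping tolerance, the variance estimate (the text names only weights and means), and the name `em_gmm_1d` are assumptions.

```python
import numpy as np

def em_gmm_1d(y, K, n_iter=200, tol=1e-8):
    """Fit a K-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    weights = np.full(K, 1.0 / K)                      # start from equal weights
    means = np.quantile(y, (np.arange(K) + 0.5) / K)   # spread initial means over the data
    variances = np.full(K, y.var() + 1e-12)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: responsibility of component k for observation y_i
        diff = y[:, None] - means[None, :]
        pdf = np.exp(-0.5 * diff**2 / variances) / np.sqrt(2.0 * np.pi * variances)
        joint = weights * pdf                          # shape (n, K)
        total = joint.sum(axis=1, keepdims=True)
        resp = joint / total
        # M step: re-estimate the parameters from the responsibilities
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * y[:, None]).sum(axis=0) / nk
        variances = (resp * (y[:, None] - means) ** 2).sum(axis=0) / nk + 1e-12
        # Log-likelihood of the parameters used in this E step, for the stopping rule
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances

# Synthetic feature values: 70% around 0.3 and 30% around 0.7 (illustration only).
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.3, 0.05, 700), rng.normal(0.7, 0.05, 300)])
weights, means, variances = em_gmm_1d(y, K=2)
```

The returned weights and means are the quantities that step 6 consumes; K comes from the component count recorded in step 4.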

Step 6: Establish a three-level evaluation model, and substitute the results of step 5 into the group constitution evaluation quantification formula to obtain a grade and a score.

Step 6.1: Establish the three-level evaluation model.

The three-level evaluation model established in this step is shown in Table 2:

TABLE 2

where K is the number of component models, α_max is the largest weight among the K component models, and μ_n and μ_m are the means of the two component models with the largest weights. a (0 < a < 1) is a threshold on the weight difference, and b (0 < b < 1) is a threshold on the distance between the two distributions with the largest weights. In this step, the weight difference threshold a is set to 0.3 and the distance threshold b is set to 0.3. Substituting the number of components, the weights, and the means observed and calculated in steps 4 and 5 into Table 2 gives the constitution evaluation grade.

When the features are expressed as a single Gaussian model, or as a multi-Gaussian model satisfying the inequality α_max − (1 − α_max) > a, the grade is set to A;

when the features are expressed as a multi-Gaussian model and satisfy the grade B inequality set of Table 2, the grade is set to B;

when the features are expressed as a multi-Gaussian model and satisfy the grade C inequality set of Table 2, the grade is set to C.
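The grade A condition, together with the quantities that the grade B and grade C inequalities depend on, can be computed as in the sketch below. The inequality sets for grades B and C are not reproduced in this text, so the sketch stops short of separating B from C. Taking the distance between the two largest-weight components as |μ_n − μ_m| is an assumption, as are the function names.

```python
import numpy as np

def grade_inputs(weights, means):
    """Return (alpha_max, weight margin, distance between the two largest-weight means)."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    top = np.argsort(weights)[::-1][:2]          # indices of the two largest weights
    alpha_max = weights[top[0]]
    margin = alpha_max - (1.0 - alpha_max)       # left-hand side of the grade A inequality
    # Assumption: distance is the absolute difference of the two means.
    distance = abs(means[top[0]] - means[top[1]]) if weights.size > 1 else 0.0
    return alpha_max, margin, distance

def is_grade_a(weights, a=0.3):
    """Grade A: a single Gaussian (K = 1), or alpha_max - (1 - alpha_max) > a."""
    weights = np.asarray(weights, dtype=float)
    if weights.size == 1:
        return True
    alpha_max = weights.max()
    return bool(alpha_max - (1.0 - alpha_max) > a)

alpha_max, margin, distance = grade_inputs([0.7, 0.3], [0.3, 0.7])
```

With a = 0.3, weights of 0.7 and 0.3 give a margin of about 0.4 and therefore grade A, while weights of 0.6 and 0.4 give a margin of about 0.2 and fall through to the B or C test.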

Step 6.2: Calculate the evaluation result.

From the number of components, the weights, and the means, the constitution evaluation result of the group is calculated using the group constitution evaluation quantification formula designed in the invention. The quantification formula is as follows:

In the formula, the function h returns its input when the input is greater than 0 and returns 0 otherwise, i.e. h(x) = max(0, x); the other parameters are the same as in step 6.1.

In the experiment, the method was tested on the physical test results of all female students of a school, as published on a university website. According to the latest revision of the National Student Physical Health Standard issued by the Ministry of Education of China, seven test items were selected: BMI, vital capacity, standing long jump, sit-and-reach, 50-meter run, 800-meter run, and one-minute sit-ups. The resulting feature probability distributions are shown in FIG. 3.

The probability distributions show that the features approximately follow a Gaussian distribution. Some features, however, follow a Gaussian mixture distribution rather than a single Gaussian. The results obtained with the three-level evaluation model established in step 6.1 and the group constitution evaluation quantification formula are shown in Table 3.

TABLE 3

Finally, the Gaussian mixture distributions of BMI and vital capacity can each be split into single Gaussian distributions, as shown in FIG. 4.

Claims

1. A population constitution assessment method based on Gaussian mixture distribution is characterized by comprising the following steps:

step 5: calculating the weight and mean of each mixture component using the EM (expectation-maximization) algorithm, based on each group of feature data obtained in step 3;

step 6: establishing a three-level evaluation model, and substituting the results observed and calculated in step 4 and step 5 into the three-level evaluation model and a group constitution evaluation quantification formula to obtain a grade and a score, wherein the three-level evaluation model comprises grade A, grade B, and grade C, as follows:

grade A satisfies one of the following conditions:

the features are expressed as a single Gaussian model, K = 1;

the features are expressed as a Gaussian mixture model, K ≥ 2, and satisfy:

α_max − (1 − α_max) > a;

grade B: the features are expressed as a multi-Gaussian model, K ≥ 2, and satisfy the set of inequalities:

grade C: the features are expressed as a multi-Gaussian model, K ≥ 2, and satisfy the set of inequalities:

in the formulas, K is the number of component models, α_max is the largest weight among the K component models, μ_n and μ_m are the means of the two component models with the largest weights, a is a threshold describing the weight difference, and b is a threshold describing the distance between the two distributions with the largest weights; the group constitution evaluation quantification formula is defined as follows:

in the formula, K is the number of component models, α_max is the largest weight among the K component models, μ_n and μ_m are the means of the two component models with the largest weights, and the function h returns its input when the input is greater than 0 and returns 0 otherwise.

2. The population constitution evaluation method based on Gaussian mixture distribution as claimed in claim 1, wherein in step 1, each set of test data and training data comprises 1800 data items.

3. The population constitution evaluation method based on Gaussian mixture distribution as claimed in claim 1, wherein in step 2, each segment of test data and training data is preprocessed by a 0-1 normalization method, and the data normalization formula is as follows:

where X is an input data item, X_max is the maximum item in the data set, and X_min is the minimum item in the data set.

4. The population constitution evaluation method based on Gaussian mixture distribution as claimed in claim 1, wherein in step 3, the convolutional neural network comprises two convolutional layers, two activation layers, and two pooling layers, wherein: in the convolutional layers, the kernel size is 3 × 1 and the stride is 1; the activation layers use the ReLU activation function; the pooling layers use a 2 × 1 filter with max pooling.

5. The method according to claim 1, wherein in step 4, each group of feature data obtained in step 3 is fitted using a Python fitting function, the data distribution is observed, and the number of components in the mixture distribution is recorded.

6. The population constitution evaluation method based on Gaussian mixture distribution as claimed in claim 1, wherein in step 5, the E step of the EM algorithm is:

calculating, according to the current model parameters, the responsibility of component model k for the observation y_i, by the following formula: