CN113807490A

CN113807490A - Data linear correlation judgment method based on convolutional neural network

Info

Publication number: CN113807490A
Application number: CN202010535355.6A
Authority: CN
Inventors: 汪丽莉; 刘烨; 李大明; 李伟豪; 郭博研; 朱子杰; 田浥岐
Original assignee: Shanghai University of Engineering Science
Current assignee: Shanghai University of Engineering Science
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2021-12-17

Abstract

The invention discloses a method for judging the linear correlation of data based on a convolutional neural network, which overcomes the limited application range of traditional correlation analysis. The training strategy refers in particular to generating random measurement data according to a nonlinear rate and a data quality parameter, and then generating images from that data. The method does not depend on statistical assumptions about the variables and can give the judgment reliability of the network under different conditions.

Description

Data linear correlation judgment method based on convolutional neural network

Technical Field

The invention relates to data linear correlation judgment, in particular to a data linear correlation judgment method based on a convolutional neural network.

Background

In recent years, rapid advances in deep learning theory and practice have provided a research basis for establishing a new linear correlation analysis method. The convolutional neural network, a deep learning model with very high learning efficiency, is widely applied in image and speech recognition, financial analysis and scientific research, and has developed by leaps and bounds. Its strong feature extraction capability makes it a powerful tool for analysis and modeling.

As two closely related means of analysis, correlation analysis and regression analysis have important applications in scientific experimental data processing and in engineering practice. The objective of regression analysis is to obtain a quantitative mathematical relationship between the variables under study by data fitting. The fit may be linear, nonlinear, or of a specified functional form. However, because the relationship between the variables is not known in advance, it is difficult to select the correct fitting function, which distorts the model; and if fitting accuracy alone is pursued, the model is overfitted. Correlation analysis can provide a reasonable reference for regression analysis. In classical correlation analysis, the linear correlation coefficient based on the Pearson product-moment reflects the strength of the linear correlation between two variables, and thereby supports a judgment of how reasonable a linear regression analysis is.

However, the Pearson product-moment correlation coefficient rests on the assumption that both variables conform to a normal distribution, which greatly limits its application range. Although the theory of correlation analysis continues to develop, there is still no linear correlation analysis method with wide applicability.
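As a point of reference, the classical coefficient discussed above can be computed in a few lines. The following is only an illustrative sketch: the helper name `pearson_r` and the sample data (11 noise-free points on [0, 1]) are assumptions chosen for the example and are not part of the invention.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative, noise-free data (assumed values): 11 points on [0, 1]
xs = [i / 10 for i in range(11)]
linear = [x + 1 for x in xs]                # y = x + 1
quadratic = [x * x + x + 1 for x in xs]     # y = x^2 + x + 1

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)
print(r_linear, r_quadratic)
```

On this sample the quadratic series still yields a coefficient of about 0.99, so a high coefficient by itself does not establish that a linear fit is appropriate.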

Disclosure of Invention

The invention aims to provide a data linear correlation judgment method based on a convolutional neural network, which does not depend on statistical assumptions about the variables and gives the judgment reliability of the network under different conditions.

The technical purpose of the invention is realized by the following technical scheme:

a data linear correlation judgment method based on a convolutional neural network comprises the following steps:

establishing a convolutional neural network;

generating linear data and nonlinear data of accurate data according to the generating function;

setting nonlinear rate and data quality parameters and generating random measurement data based on accurate data;

generating an image according to a training strategy by using the generated random measurement data;

inputting the generated image into a convolutional neural network for training;

obtaining a convolutional neural network with data linear correlation judgment capability;

and testing at different nonlinear rates and data quality parameters to obtain the corresponding recognition capability limits of the convolutional neural network, and thereby the judgment reliability.

Preferably, the specific steps of generating the random measurement data according to the accurate data are as follows:

generating a training data set and a testing data set, classified into linear data and nonlinear data, with the accurate data generated according to the following two generating functions,

y_l＝bx+c；

y_nl＝ax²+bx+c；

wherein y_l is the accurate linear data, y_nl is the accurate nonlinear data, a is the coefficient of the second-order nonlinear term, b is the coefficient of the linear term, c is an arbitrary constant, and the nonlinear rate is defined as: P_nl＝a/b.
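The two generating functions and the nonlinear rate can be sketched as follows. The function names are hypothetical, the value a = 0.4 is only an example, and the 11-point grid on [0, 1] follows the detailed description.

```python
def accurate_linear(x, b, c):
    """Accurate linear data: y_l = b*x + c."""
    return b * x + c

def accurate_nonlinear(x, a, b, c):
    """Accurate nonlinear data: y_nl = a*x^2 + b*x + c."""
    return a * x * x + b * x + c

def nonlinear_rate(a, b):
    """Nonlinear rate P_nl = a / b."""
    return a / b

xs = [i / 10 for i in range(11)]   # 11 uniform points on [0, 1]
a, b, c = 0.4, 1.0, 1.0            # example coefficients (assumed)
y_l = [accurate_linear(x, b, c) for x in xs]
y_nl = [accurate_nonlinear(x, a, b, c) for x in xs]
p_nl = nonlinear_rate(a, b)
print(p_nl, y_l[-1], y_nl[-1])
```

With b fixed, lowering a lowers P_nl and brings the nonlinear series closer to the linear one, which is what makes the judgment harder.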

The data quality parameter is adjusted, and the random measurement data are obtained from generating probability functions expressed by the following two formulas, that is, normal distributions centered on the accurate data with standard deviations σ·y_l and σ·y_nl,

y'_l ~ N(y_l, (σ·y_l)²)；

y'_nl ~ N(y_nl, (σ·y_nl)²)；

wherein σ is the data quality parameter, which represents the relative deviation of the random measurement data from the accurate data and can be understood as the relative uncertainty in real experimental measurement; y'_l is the linear random number in the measured data, and y'_nl is the nonlinear random number in the measured data.
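A minimal sketch of this step is given below, assuming (as the detailed description states) that each measured value is drawn from a normal distribution centered on the accurate value with standard deviation σ times that value. The helper name `measure`, the seed and the coefficients are illustrative assumptions.

```python
import random

def measure(y_accurate, sigma, rng):
    """Draw one simulated measurement around y_accurate.

    Assumption: normal distribution, mean y_accurate, standard deviation
    sigma * |y_accurate| (abs() only keeps the deviation non-negative).
    """
    return rng.gauss(y_accurate, sigma * abs(y_accurate))

rng = random.Random(0)                         # fixed seed for repeatability
xs = [i / 10 for i in range(11)]
y_nl = [0.4 * x * x + x + 1.0 for x in xs]     # accurate nonlinear data, a=0.4, b=c=1
y_nl_measured = [measure(y, 0.02, rng) for y in y_nl]

for y, ym in zip(y_nl, y_nl_measured):
    print(f"{y:.3f} -> {ym:.3f}")
```

Setting σ to 0 returns the accurate data unchanged, and larger σ values simulate poorer data quality.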

Preferably, the strategy for generating the image specifically includes:

uniformly taking values in an x value interval, and generating a linear random number and a nonlinear random number according to the generating functions of the measurement data; directly generating the (x, y'_l) and (x, y'_nl) function images according to the correspondence between the x values and the linear and nonlinear random numbers.
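One possible way to turn the (x, y') pairs into an image is sketched below. The source does not state the image size, axis limits or plotting style, so the 32 x 32 binary grid, the y range of [0, 4] and the name `rasterize` are all assumptions made for illustration.

```python
import random

def rasterize(xs, ys, size=32, x_range=(0.0, 1.0), y_range=(0.0, 4.0)):
    """Plot (x, y) points into a size x size binary image (row 0 is the top)."""
    image = [[0] * size for _ in range(size)]
    x_min, x_max = x_range
    y_min, y_max = y_range
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (size - 1))
        row = round((y_max - y) / (y_max - y_min) * (size - 1))
        if 0 <= row < size and 0 <= col < size:   # points outside the axes are dropped
            image[row][col] = 1
    return image

rng = random.Random(0)
xs = [i / 10 for i in range(11)]
a = b = c = 1.0
sigma = 0.02
# Simulated measurements: normal, centered on the accurate value, std = sigma * value
y_l_meas = [rng.gauss(b * x + c, sigma * (b * x + c)) for x in xs]
y_nl_meas = [rng.gauss(a * x * x + b * x + c, sigma * (a * x * x + b * x + c)) for x in xs]

image_l = rasterize(xs, y_l_meas)     # (x, y'_l) function image
image_nl = rasterize(xs, y_nl_meas)   # (x, y'_nl) function image
for row in image_nl:
    print("".join("#" if p else "." for p in row))
```

Both images share the same axes, so the only cue left to the network is the shape traced by the points.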

Preferably, the method for detecting and obtaining the identification capability limit of the convolutional neural network according to different nonlinear rates and data quality parameters specifically comprises the following steps:

inputting a function image generated by a training strategy into a convolutional neural network for training to obtain the convolutional neural network with judgment capability;

and testing the convolutional neural networks trained with different nonlinear rates and data quality parameters by inputting the corresponding data images, to obtain the recognition and judgment limit of each trained network.
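The limit test can be pictured as a sweep over the nonlinear rate and the data quality parameter, recording the recognition rate at each setting. The sketch below is only a harness: the trained convolutional neural network is replaced by a simple least-squares curvature threshold (clearly a stand-in, not the method of the invention), and it works on the raw series rather than on images. All names, the threshold and the sample counts are assumptions, so its numbers will not match the figures.

```python
import random

XS = [i / 10 for i in range(11)]            # 11 uniform points on [0, 1]

def noisy_series(a, b, c, sigma, rng):
    """Simulated measurements of a*x^2 + b*x + c (a = 0 gives linear data)."""
    out = []
    for x in XS:
        y = a * x * x + b * x + c
        out.append(rng.gauss(y, sigma * abs(y)))
    return out

def curvature_estimate(ys):
    """Least-squares coefficient of x^2 for the equally spaced, symmetric grid XS."""
    mean_x = sum(XS) / len(XS)
    sq = [(x - mean_x) ** 2 for x in XS]
    mean_sq = sum(sq) / len(sq)
    q = [s - mean_sq for s in sq]           # orthogonal to 1 and to (x - mean_x)
    return sum(qi * yi for qi, yi in zip(q, ys)) / sum(qi * qi for qi in q)

def make_stand_in_classifier(threshold):
    """Placeholder for the trained CNN: returns 1 (nonlinear) or 0 (linear)."""
    return lambda ys: 1 if curvature_estimate(ys) > threshold else 0

def recognition_rate(classify, p_nl, sigma, n_per_class=200, b=1.0, c=1.0, seed=0):
    """Fraction of correctly judged series at one (P_nl, sigma) setting."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_per_class):
        correct += classify(noisy_series(0.0, b, c, sigma, rng)) == 0
        correct += classify(noisy_series(p_nl * b, b, c, sigma, rng)) == 1
    return correct / (2 * n_per_class)

classify = make_stand_in_classifier(threshold=0.2)
for sigma in (0.0, 0.01, 0.02, 0.03, 0.1):
    print(sigma, recognition_rate(classify, p_nl=0.4, sigma=sigma))
```

Replacing `classify` with a function that renders the series to an image and queries the trained network would give the recognition limit of that network over the same grid of settings.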

In conclusion, the invention has the following beneficial effects:

a novel data linear correlation judgment method based on a convolutional neural network is provided, whose training strategy gives the judgment reliability of the network under different nonlinear rates and data quality conditions.

Drawings

FIG. 1 is a schematic block flow diagram of the process;

FIG. 2 shows images generated according to the set strategy;

FIG. 3 shows the recognition results of the network for different data quality parameters at a nonlinear rate P_nl = 0.4;

FIG. 4 shows the recognition results of the network for different nonlinear rates at a data quality parameter σ = 0.02.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

According to one or more embodiments, a method for determining linear correlation of data based on a convolutional neural network is disclosed, which comprises the following steps:

establishing a convolutional neural network;

setting a nonlinear rate and a data quality parameter to generate random measurement data based on accurate data;

inputting the generated image into a convolutional neural network for training;

obtaining a neural network with data linear correlation judgment capability;

and testing at different nonlinear rates and data quality parameters to obtain the corresponding recognition capability limits of the convolutional neural network, and thereby the judgment reliability of the method.

The specific steps of generating the measurement data according to the accurate data are as follows:

generating a training data set and a testing data set, classified into linear data and nonlinear data, with the accurate data generated according to the generating functions,

y_l＝bx+c；

y_nl＝ax²+bx+c；

wherein σ is the data quality parameter and represents the relative deviation of the measured data from the accurate data; y'_l is the linear random number in the measured data, and y'_nl is the nonlinear random number in the measured data.

The strategy for generating the image specifically comprises the following steps:

The method for detecting and obtaining the identification capability limit of the convolutional neural network according to different nonlinear rates and data quality parameters specifically comprises the following steps:

The method builds on the powerful capability of the convolutional neural network to extract features from complex data, and uses deep learning to convert the traditional linear correlation analysis problem into an image recognition problem. By establishing a convolutional neural network capable of identifying the degree of linear correlation of data, a new linear correlation data analysis method is obtained. This method does not depend on statistical assumptions about the variables, has a wider application range than the classical Pearson product-moment correlation coefficient method, and is more extensible. It can therefore better support judgments for regression analysis.
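To make the idea of feature extraction from an image concrete, the following sketch applies a single convolution filter followed by a ReLU to a toy image. The source does not specify a network architecture, so the 5 x 5 image, the hand-picked 3 x 3 kernel and the function names are purely illustrative assumptions; a trained network would learn its own filters.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation without padding, as used in CNN convolution layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for col in range(out_w):
            out[r][col] = sum(
                image[r + i][col + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)
            )
    return out

def relu(feature_map):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Toy image: points lying on a rising straight line (bottom-left to top-right)
image = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
# Hand-picked kernel that responds to a rising diagonal (assumed for illustration)
rising_kernel = [
    [-1, 0, 1],
    [0, 1, 0],
    [1, 0, -1],
]

features = relu(conv2d_valid(image, rising_kernel))
for row in features:
    print(row)
```

The filter responds only where the pixels line up along the diagonal, which is the kind of local shape cue that stacked convolution layers can combine to separate straight from curved point patterns.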

A network training strategy based on different data imaging methods is provided. The judgment capability of the network is compared using the accuracy of the trained network under different data quality and degrees of nonlinearity, so as to provide the optimal convolutional neural network with linear correlation analysis capability.

Through training on large amounts of data, the convolutional neural network can establish the internal mapping between input and output. Training and testing data sets of accurate data are generated, classified into linear data and nonlinear data, and used to train the network. The generating functions of the accurate linear data and nonlinear data are shown in formula (1) and formula (2).

y_l＝bx+c； (1)

y_nl＝ax²+bx+c； (2)

The image input into the convolutional neural network is not generated from accurate data. In practical applications, errors exist between the measured data and the accurate data. The measured data input into the convolutional neural network for training are random numbers conforming to a normal distribution, centered on the accurate linear data y_l and the accurate nonlinear data y_nl, with standard deviations σ·y_l and σ·y_nl respectively. As measurement data, the value distributions of the linear random number y'_l and the nonlinear random number y'_nl are given by formulas (3) and (4).

y'_l ~ N(y_l, (σ·y_l)²)； (3)

y'_nl ~ N(y_nl, (σ·y_nl)²)； (4)

Here σ is the data quality parameter, which represents the relative deviation of the random measurement data from the accurate data and can be understood as the relative uncertainty in real experimental measurement. When σ = 0.02, the standard deviation of the measured value about the true value is 2% of the true value. By setting different σ values, test data of different data quality can be generated and the recognition capability of the network tested under different data quality conditions, so as to detect the limit of that capability.
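The meaning of σ as a relative uncertainty can be checked numerically. The sketch below assumes the normal model described above; the true value 3.0, the sample count and the seed are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(42)
sigma = 0.02
y_true = 3.0                      # illustrative true value (assumed)

# Simulated measurements: normal, centered on y_true, std = sigma * y_true
samples = [rng.gauss(y_true, sigma * y_true) for _ in range(10000)]
relative_dev = [(s - y_true) / y_true for s in samples]

rel_std = statistics.pstdev(relative_dev)
within_one_sigma = sum(abs(d) <= sigma for d in relative_dev) / len(relative_dev)
print(rel_std, within_one_sigma)
```

The standard deviation of the relative deviation comes out close to 0.02, and roughly two thirds of the simulated measurements fall within 2% of the true value, as expected for a normal distribution.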

Without loss of generality, 11 data points are taken uniformly in the interval x ∈ [0, 1], generating 11 accurate data values each for y_l and y_nl. According to formulas (3) and (4), 11 linear random numbers y'_l and 11 nonlinear random numbers y'_nl are generated. Finally, function images are made from the generated y'_l and y'_nl data and input into the convolutional neural network for training. FIG. 2 (a) and (b) show images of linear data and nonlinear data generated according to the training strategy. In the data generation, a = b = c = 1 and σ = 0.02 are selected.
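Assembling a labeled data set from these settings might look like the sketch below. It assumes equal numbers of linear and nonlinear samples and a label convention of 0 for linear and 1 for nonlinear, neither of which is stated in the source; the function name and the seeds are also hypothetical. Each series would still have to be rendered to a function image before training.

```python
import random

def make_dataset(n_samples, a=1.0, b=1.0, c=1.0, sigma=0.02, n_points=11, seed=0):
    """Return a shuffled list of (series, label); label 0 = linear, 1 = nonlinear."""
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]   # uniform points on [0, 1]
    dataset = []
    for k in range(n_samples):
        label = k % 2                       # assumed: equal numbers of both classes
        coeff = a if label == 1 else 0.0    # a = 0 reduces y_nl to y_l
        series = []
        for x in xs:
            y = coeff * x * x + b * x + c
            series.append(rng.gauss(y, sigma * abs(y)))
        dataset.append((series, label))
    rng.shuffle(dataset)
    return dataset

# Small sizes for the example; the description reports 20000 training pictures.
train_set = make_dataset(2000)
test_set = make_dataset(400, seed=1)
print(len(train_set), len(test_set), train_set[0][1])
```

Using a different seed for the test set keeps its random measurement data independent of the training data.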

For clarity, two examples are given:

1. Improvement in data quality recognition rate:

As shown in FIG. 3, with P_nl = 0.4 and the data quality parameter σ taken as 0.01, 0.02 and 0.03, the data quality gradually deteriorates as σ increases, which reduces the recognition capability of the convolutional neural network. When σ is 0.02, the convolutional neural network can no longer identify the difference between a linear image and a nonlinear image.

2. Improvement in nonlinear rate detection:

As shown in FIG. 4, with σ fixed at 0.02, different nonlinear rates are used. When P_nl = 0.8, as shown in FIG. 4(a), the recognition rate reaches 99% or more; when P_nl = 0.6, as shown in FIG. 4(b), the recognition rate also reaches 99% or more; when P_nl = 0.2, as shown in FIG. 4(c), the effect of the nonlinear term becomes weaker and the recognition rate falls to 0.5.

In actual network training, the training set contains 20000 pictures. The figure shows the accuracy of the training strategy under different σ conditions. During training, the nonlinear rate P_nl is reduced step by step from 1 to 0.4. As P_nl decreases, the contribution of the nonlinear term becomes smaller and smaller. Under these conditions, where manual identification by traditional methods can no longer distinguish the two cases, the method still gives the judgment reliability of the convolutional neural network under different nonlinear rates and data quality conditions, which facilitates the judgment and analysis of physical experiment measurement data.

The present embodiment only explains the invention and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the invention.

Claims

1. A data linear correlation judgment method based on a convolutional neural network is characterized by comprising the following steps:

establishing a convolutional neural network;

inputting the generated image into a convolutional neural network for training;

2. The convolutional neural network-based data linear correlation decision method as claimed in claim 1, wherein the specific steps of generating the measurement data from the accurate data are as follows:

generating a training data set and a testing data set, classified into linear data and nonlinear data, with the accurate data generated according to the following generating functions,

y_l＝bx+c；

y_nl＝ax²+bx+c；

3. The convolutional neural network-based data linear correlation decision method as claimed in claim 2, wherein the training strategy for generating the image is specifically:

4. The convolutional neural network-based data linear correlation decision method as claimed in claim 3, wherein the obtaining of the recognition capability limit of the convolutional neural network based on different non-linear rates and data quality parameter detection specifically comprises: