CN111046080A - Carbon fiber precursor monitoring data preprocessing method based on convolution denoising autoencoder - Google Patents
- Publication number
- CN111046080A (application CN201911238154.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The invention discloses a carbon fiber precursor monitoring data preprocessing method based on a convolutional denoising autoencoder, comprising the following steps: 1) collecting monitoring data from carbon fiber precursor production and arranging the data into a data set according to time sequence and process sequence; 2) selecting the samples of the data set that contain no missing values as the target data set, and performing a random missing-value operation on the target data set to obtain the training data set; 3) building a convolutional denoising autoencoder model and training it with the training data set and the target data set stored in step 2); 4) preprocessing the precursor monitoring data to be processed with the convolutional denoising autoencoder model trained in step 3). By designing this preprocessing method, the invention resolves the missing-value and outlier problems in carbon fiber monitoring data and can complete the preprocessing of the data quickly.
Description
Technical Field
The invention relates to carbon fiber production technology, and in particular to a carbon fiber precursor monitoring data preprocessing method based on a convolutional denoising autoencoder.
Background
Carbon fiber is a novel high-strength, high-modulus fiber material with a carbon content above 90 percent; it is lighter than aluminum, stronger than steel, and resistant to corrosion and fatigue. At the same time it retains the soft processability of textile fibers, making it an important material for national defense, military, and civil applications. Polyacrylonitrile-based carbon fiber, with its simple process, low cost, and excellent performance, currently holds the largest market share. Research and production of carbon fiber in China started late; limited by process technology, domestic high-performance carbon fiber products cannot yet be produced industrially at full scale, so more than 80 percent of the high-performance carbon fiber used in China depends on imports.
The quality of the carbon fiber precursor is the bottleneck restricting improvements in carbon fiber quality. To improve precursor quality and ensure production stability, real-time monitoring of the carbon fiber precursor production process is necessary. In recent years the industry has paid increasing attention to monitoring the carbon fiber production process. With the development of intelligent manufacturing and big data, more and more monitoring devices and sensors are deployed along the carbon fiber production process, yielding a large amount of production-process monitoring data. However, because of network transmission problems, sensor faults, and other human factors, carbon fiber monitoring data often suffer from quality problems such as noise and missing values, and the large volume and high dimensionality of the data greatly inconvenience subsequent data analysis and modeling. Missing data refers to the phenomenon in which part of a time period, or part of a dimension, is lost from time-series data. Data noise refers to erroneous values in the data or outliers that deviate significantly from expectations. In carbon fiber monitoring data, the prominent quality problems are missing values in some dimensions and inconspicuous outliers.
Traditional data preprocessing methods rely on statistical knowledge or machine learning. For missing data, simple statistical methods (such as mean filling and median filling) fill quickly, but the filled result deviates to some extent from the distribution of the original data, while complex filling methods (such as regression filling and KNNI) fill large-scale data sets inefficiently. For noise in monitoring data, the smoothing and filtering methods currently adopted smooth the data stream and remove abrupt changes. With the development of deep learning and the growing volume of monitoring data, more and more deep learning techniques are being applied to data preprocessing, where they show clear advantages over traditional methods on high-dimensional, large-volume problems.
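The distribution distortion of simple imputation mentioned above can be illustrated numerically. The following sketch (an illustration only; the data are synthetic and not from the patent) shows that mean filling preserves the mean of a monitoring signal but shrinks its variance, which is exactly the deviation from the original distribution the passage describes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=1000)  # a complete monitoring signal

# Knock out 30% of the values at random, then fill with the observed mean.
mask = rng.random(x.size) < 0.3
x_missing = x.copy()
x_missing[mask] = np.nan
filled = np.where(np.isnan(x_missing), np.nanmean(x_missing), x_missing)

# Mean filling preserves the mean but shrinks the variance,
# i.e. the filled data deviate from the original distribution.
var_ratio = filled.var() / x.var()
```

With 30% of the entries replaced by a constant, roughly 30% of the variance disappears, even though the mean is almost unchanged.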
The production environment of the carbon fiber precursor is complex and involves many processes: a complete precursor spinning line includes stock-solution preparation, spinning, water washing, drafting, oiling, drying, and other steps, and strong coupling relations exist among the process parameters. The monitoring data of carbon fiber precursor production are therefore high-dimensional, large in volume, high in value, time-sensitive, coupled, and heterogeneous. On the one hand, the monitored quantity at a given moment is a continuation of its change at the previous moment; on the other hand, it is influenced by the other monitored quantities at the same moment. These complex correlations complicate data preprocessing and subsequent data mining. Existing preprocessing methods, in particular the solutions to missing values and noise interference in carbon fiber monitoring, pay little attention to these characteristics of the data, and the common statistical methods process them inefficiently and with poor results.
To solve the missing-value and noise problems in carbon fiber production monitoring data, and the problems caused by its large volume and high dimensionality, the invention designs a carbon fiber precursor monitoring data preprocessing method based on a convolutional denoising autoencoder. The method resolves the missing values and noise in the carbon fiber monitoring data through the convolutional denoising autoencoder model and has great advantages on high-dimensional, large-volume carbon fiber monitoring data.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the above defects in the prior art, a carbon fiber precursor monitoring data preprocessing method based on a convolutional denoising autoencoder.
The technical scheme adopted by the invention for solving the technical problems is as follows: a carbon fiber precursor monitoring data preprocessing method based on a convolution denoising autoencoder comprises the following steps:
1) collecting monitoring data of carbon fiber precursor production, and arranging the data into a data set according to a time sequence and a process sequence;
2) selecting the samples of the data set that contain no missing values as the target data set; performing a random missing-value operation on the target data set to obtain the training data set;
3) establishing a convolution denoising autoencoder model, and training the model by using the training data set and the target data set stored in the step 2);
the convolution denoising self-encoder model is as follows:
the convolutional denoising autoencoder comprises 10 layers in total, consisting of fully-connected layers, 2D convolutional layers, max-pooling layers, upsampling layers, and deconvolution layers,
wherein the 1st layer is a fully-connected layer, the 2nd and 4th layers are 2D convolutional layers, the 3rd and 5th layers are max-pooling layers, the 6th and 8th layers are upsampling layers, the 7th and 9th layers are deconvolution layers, and the 10th layer is the output layer, which is a fully-connected layer;
4) preprocessing the precursor monitoring data to be processed by utilizing the convolutional denoising self-encoder model trained in the step 3).
According to the scheme, the data in the step 1) are arranged into a data set according to a time sequence and a process sequence, and the method specifically comprises the following steps:
1.1) collecting real-time monitoring data of the carbon fiber precursor production process and, taking time as the acquisition order, generating a data set X = {X_i} (1 ≤ i ≤ n, where n is the number of data variables) for the data types temperature, pressure, and flow, the operating data values of the system at each moment forming one row of the matrix; the monitoring data cover each time period of precursor production, and for the various data types the generated data set is:
X = [X_1, X_2, X_3, …, X_i, …, X_n]
1.2) reordering the features of the data set according to the following principle:
arranging the characteristics of the data set according to the process sequence of carbon fiber precursor production;
the monitoring data of the same type in the same procedure are adjacent;
1.3) uploading the sorted data set to a cloud platform.
According to the scheme, the specific steps of obtaining the training data set in the step 2) are as follows:
2.1) selecting the subset X of the data set sorted in step 1) that contains no missing values;
2.2) normalizing the data set X to serve as the target data set D_output, where the normalization method is as follows:
wherein x is a data value of the selected complete data set X, and x' is the corresponding value of the normalized data set X;
2.3) performing a random deletion operation on the data set with a random zeroing method, the missing ratio being set as required according to the missing-value conditions of the carbon fiber precursor monitoring data, to obtain the corrupted data set D_corruption;
and 2.4) normalizing the corrupted data set D_corruption to obtain the input data set D_input.
According to the above scheme, before the random zeroing operation is performed on the data set in step 2.3), the method further comprises adding Gaussian noise to the data set:
adding additive white Gaussian noise to X:
X'_i = X_i + μ·Z_i,  Z_i ~ N(0, 1)
where μ is a noise factor used to control the amplitude of the additive noise.
According to the scheme, in the step 3), the loss function adopted in the model training is selected as the mean square error of the reconstructed sample D _ train and the target sample D _ output;
According to the scheme, in the step 4), preprocessing is performed on the precursor monitoring data to be processed, and the preprocessing comprises the following specific steps:
4.1) sorting the data into a data set by taking days as a unit, rearranging the data set, and replacing missing values with zero values;
4.2) normalizing the data set;
4.3) loading the convolution denoising self-encoder model in the step 3), taking the data after normalization processing as input, and obtaining reconstructed data by using the model;
and 4.4) performing inverse normalization on the reconstructed data to obtain the preprocessed precursor monitoring data.
According to the scheme, the convolution denoising self-encoder has the following specific structure:
3.1) the 1st layer is a fully-connected layer with 256 neurons and the ReLU activation function; the feature vector output by this first layer is then built into a feature map with the Reshape function, of size 16 × 16 × 1 (256 = 16 × 16):
ReLU(x) = max(0, x)
h_1 = ReLU(w_1·h_0 + b_1)
where h_0 is the input layer, w_1 is the weight matrix of the fully-connected layer, and b_1 is the bias.
3.2) the 2nd and 4th layers are 2D convolutional layers with depths 16 and 8 respectively, 3 × 3 convolution kernels, ReLU activation functions, and SAME padding, whose purpose is to keep the size of the feature maps unchanged;
h_2,a = ReLU(h_1 * w_2,a + b_2,a)
where h_2,a and w_2,a are the a-th convolution area of the convolutional layer and the corresponding convolution kernel;
3.3) the 3rd and 5th layers are max-pooling layers with 2 × 2 pools and SAME padding;
h_3,a = down(h_2,a)
where h_3,a is the down-sampling result of the corresponding convolution area;
3.4) the 6th and 8th layers are 2 × 2 upsampling layers;
3.5) the 7th and 9th layers are deconvolution layers with 16 and 1 convolution kernels respectively, 3 × 3 kernel size, ReLU activation functions, and SAME padding;
3.6) the 10th layer is the output layer, a fully-connected layer with 195 neurons and the ReLU activation function; before entering this fully-connected layer, the feature map output by the 9th layer is flattened into a feature vector.
The invention has the following beneficial effects:
1. Existing data preprocessing ignores the complex correlations present in carbon fiber monitoring data, so the preprocessing effect is poor. To address this, the invention adds convolutional and pooling layers to the denoising autoencoder model, reconstructs the data better by exploiting the correlations among the data, and shows a clear advantage in preprocessing effect.
2. Existing data preprocessing treats denoising and missing-value filling as two independent steps, and because the monitoring data are high-dimensional and large in volume, traditional methods suffer from low efficiency or low precision. To address this, the invention designs the carbon fiber precursor monitoring data preprocessing method based on a convolutional denoising autoencoder, which effectively fuses denoising with missing-value filling, resolves the missing-value and outlier problems in carbon fiber monitoring data, and can complete the preprocessing quickly.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a network structure diagram of a convolutional denoising autoencoder according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the denoising autoencoder according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, a method for preprocessing carbon fiber precursor monitoring data based on a convolution denoising autoencoder includes the following steps:
step 1: the method comprises the steps of collecting carbon fiber precursor production monitoring data from a real-time database of a carbon fiber production plant, uploading the carbon fiber precursor production monitoring data to a big data analysis platform, and sorting the data into a data set according to a time sequence and a process sequence. The method specifically comprises the following steps:
1.1: collecting real-time monitoring data of the carbon fiber precursor production process from the real-time database of a carbon fiber production plant and, taking time as the acquisition order, generating a data set X = {X_i} (1 ≤ i ≤ n, where n is the number of data variables) for data types such as temperature, pressure, and flow, the operating data values of the system at each moment forming one row of the matrix; the monitoring data should cover each time period of precursor production as far as possible, and the data set generated over the various data types can be expressed as:
X = [X_1, X_2, X_3, …, X_i, …, X_n]
1.2: the features of the data set are reordered according to the following principles:
① the features of the data set are arranged in the process order of carbon fiber precursor production (the process order being spinning, water washing, drawing in water, oiling, drying, steam drawing, and winding);
② monitored data of the same type within the same process are adjacent.
1.3: and uploading the sorted data set to a cloud platform.
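The arrangement of step 1 — rows as time steps, columns as monitored variables, columns grouped first by process stage and then by data type — can be sketched as follows. The particular process names, column tuples, and orderings here are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

# Hypothetical monitoring variables: (process stage, data type) per column.
columns = [
    ("washing", "temperature"), ("spinning", "pressure"),
    ("spinning", "temperature"), ("washing", "flow"),
    ("drying",  "temperature"), ("spinning", "flow"),
]
process_order = {"spinning": 0, "washing": 1, "drying": 2}
type_order = {"temperature": 0, "pressure": 1, "flow": 2}

# Rows are time steps (the system's operating values at each moment),
# columns are monitored variables: X = [X_1, ..., X_n].
rng = np.random.default_rng(1)
X = rng.random((8, len(columns)))

# Reorder the features: group by process stage, keeping monitored data
# of the same type adjacent within each stage, per the two principles.
order = sorted(range(len(columns)),
               key=lambda j: (process_order[columns[j][0]],
                              type_order[columns[j][1]]))
X_sorted = X[:, order]
sorted_columns = [columns[j] for j in order]
```

After sorting, all spinning columns precede all washing columns, and within each stage the data types appear in a fixed order, so neighboring columns are the most strongly related ones — the locality the later convolutional layers rely on.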
Step 2: select the samples of the data set on the big data platform that contain no missing values as the target data set; randomly corrupt the target data set to serve as the training data set. The specific steps are as follows:
2.1 selecting the data X of the data set sorted in step 1 that contain no missing values;
2.2 normalizing the data set X to serve as the target data set D_output, and storing the normalization model on the big data analysis platform. The normalization method is as follows:
wherein x is a value of the selected complete data set X and x' is the corresponding normalized value.
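The exact normalization formula is rendered as an image in the source and is not reproduced in this text. One common choice consistent with the surrounding steps — a stored "normalization model" that is later reused for inverse normalization — is per-column min-max scaling, sketched here purely as an assumption:

```python
import numpy as np

def fit_minmax(X):
    """Record per-column min and max: the 'normalization model' kept on
    the platform for later inverse normalization (an assumed formula;
    the patent's own equation is not reproduced in the text)."""
    return X.min(axis=0), X.max(axis=0)

def normalize(X, model):
    lo, hi = model
    return (X - lo) / (hi - lo)

def denormalize(Xn, model):
    lo, hi = model
    return Xn * (hi - lo) + lo

rng = np.random.default_rng(2)
X = rng.random((100, 4)) * 50 + 20   # complete (non-missing) samples
model = fit_minmax(X)                # saved on the big data platform
D_output = normalize(X, model)       # target data set, scaled into [0, 1]
X_back = denormalize(D_output, model)
```

Whatever the patent's actual formula, the essential property used in steps 2.5 and 4.4 is that the stored model makes the mapping invertible, as the round trip above demonstrates.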
2.3 adding additive white Gaussian noise to X;
X'_i = X_i + μ·Z_i,  Z_i ~ N(0, 1)
where μ is the noise factor, which controls the amplitude of the additive noise.
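The noise-injection formula above is straightforward to sketch; the noise factor μ = 0.1 below is an arbitrary illustrative value:

```python
import numpy as np

def add_gaussian_noise(X, mu, rng):
    """X'_i = X_i + mu * Z_i with Z_i ~ N(0, 1): additive white Gaussian
    noise whose amplitude is controlled by the noise factor mu."""
    return X + mu * rng.standard_normal(X.shape)

rng = np.random.default_rng(3)
X = np.zeros((1000, 5))                 # zero signal isolates the noise
X_noisy = add_gaussian_noise(X, mu=0.1, rng=rng)
```

On a zero input the corrupted output is pure noise, so its standard deviation should be close to μ and its mean close to zero.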
2.4, performing a random deletion operation on the Gaussian-noise-augmented data set with a random zeroing method, the missing ratio being set to 0.1 according to the missing-value conditions of the carbon fiber precursor monitoring data, to obtain the corrupted data set D_corruption;
and 2.5, normalizing the damaged data set D _ corruption by using the stored normalization model to obtain an input data set D _ input.
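The random zeroing of step 2.4 — setting a fraction of entries to zero to imitate the data set's missing-value pattern — can be sketched as below (the 0.1 ratio matches the embodiment; the data themselves are synthetic):

```python
import numpy as np

def random_zeroing(X, missing_ratio, rng):
    """Randomly set a fraction of entries to zero to imitate the
    missing-value pattern of the precursor monitoring data."""
    mask = rng.random(X.shape) < missing_ratio
    return np.where(mask, 0.0, X), mask

rng = np.random.default_rng(4)
X = rng.random((200, 10)) + 1.0   # all entries >= 1, so zeros mark 'missing'
D_corruption, mask = random_zeroing(X, missing_ratio=0.1, rng=rng)
observed_ratio = (D_corruption == 0.0).mean()
```

Training the autoencoder to map this D_corruption (after normalization, as D_input) back to the clean D_output is what lets one forward pass later perform denoising and missing-value filling together.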
And step 3: design the convolutional denoising autoencoder model, train it with the training data set and the target data set stored in step 2, select the best-performing model, save it as an HDF5 file as the final model, and deploy it to the big data analysis platform.
The method specifically comprises the following steps:
3.1 design the convolutional denoising autoencoder: as shown in fig. 2, the autoencoder has 10 layers in total, composed of fully-connected layers, 2D convolutional layers, max-pooling layers, upsampling layers, and deconvolution layers, with the following structure:
3.1.1 Layer 1 is a fully-connected layer with 256 neurons and the ReLU activation function. Compared with the sigmoid and tanh functions, ReLU has low computational complexity and avoids the vanishing-gradient problem. The feature vector output by the first layer is then built into a feature map with the Reshape function, of size 16 × 16 × 1 (256 = 16 × 16), to suit the operation of the subsequent convolutional layers.
ReLU(x) = max(0, x)
h_1 = ReLU(w_1·h_0 + b_1)
where h_0 is the input layer, w_1 is the weight matrix of the fully-connected layer, and b_1 is the bias.
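Layer 1 can be sketched directly from the formulas above. The 195-dimensional input matches the output layer's neuron count given in 3.1.6, and the 16 × 16 × 1 reshape is inferred from 256 = 16 × 16; the weight initialization is an arbitrary illustrative choice:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

rng = np.random.default_rng(5)
n_in, n_hidden = 195, 256        # 195 monitored variables -> 256 neurons
w1 = rng.standard_normal((n_hidden, n_in)) * 0.05
b1 = np.zeros(n_hidden)
h0 = rng.random(n_in)            # one normalized input sample

# h_1 = ReLU(w_1 h_0 + b_1), then Reshape the 256-vector into a
# 16 x 16 x 1 feature map for the following 2D convolutional layers.
h1 = relu(w1 @ h0 + b1)
feature_map = h1.reshape(16, 16, 1)
```

The reshape carries no parameters; it only reinterprets the dense layer's output so that the convolutional layers can exploit the adjacency created by the process-order column sorting of step 1.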
3.1.2 Layers 2 and 4 are 2D convolutional layers with depths 16 and 8 respectively, 3 × 3 convolution kernels, and ReLU activation functions. The padding mode is SAME padding, whose purpose is to keep the size of the feature maps constant.
h_2,a = ReLU(h_1 * w_2,a + b_2,a)
where h_2,a and w_2,a are the a-th convolution area of the convolutional layer and the corresponding convolution kernel.
3.1.3 Layers 3 and 5 are max-pooling layers with 2 × 2 pools and SAME padding.
h_3,a = down(h_2,a)
where h_3,a is the down-sampled result of the corresponding convolution area.
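The down(·) operation of layers 3 and 5, and the 2 × 2 upsampling of layers 6 and 8 that later reverses its size reduction, can be sketched on a single channel (a minimal illustration; real layers apply this per channel):

```python
import numpy as np

def max_pool_2x2(fm):
    """down(.): 2 x 2 max pooling on an H x W feature map
    (H and W assumed even, as with the 16 x 16 maps here)."""
    h, w = fm.shape
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x2(fm):
    """The decoder's 2 x 2 upsampling: repeat each element along both
    axes, undoing the pooling-induced halving of the spatial size."""
    return fm.repeat(2, axis=0).repeat(2, axis=1)

fm = np.arange(16.0).reshape(4, 4)   # toy 4 x 4 feature map
pooled = max_pool_2x2(fm)            # -> 2 x 2, each entry a block maximum
up = upsample_2x2(pooled)            # -> back to 4 x 4
```

Pooling halves each spatial dimension and keeps only block maxima, which is why the decoder needs matching upsampling layers to restore the original map size.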
3.1.4 Layers 6 and 8 are 2 × 2 upsampling layers.
3.1.5 Layers 7 and 9 are deconvolution layers with 16 and 1 convolution kernels respectively, 3 × 3 kernel size, ReLU activation functions, and SAME padding.
3.1.6 Layer 10 is the output layer, a fully-connected layer with 195 neurons and the ReLU activation function. Before entering the fully-connected layer, the feature map output by layer 9 is flattened into a feature vector.
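The shape bookkeeping of the 10 layers described above can be checked with a few lines of plain Python. This is not the model itself, only a verification that the stated layer sizes are mutually consistent (assuming, as inferred earlier, that the 256-unit dense layer is reshaped to 16 × 16 × 1):

```python
def conv_same(shape, depth):
    # SAME padding keeps height and width; only the depth changes.
    h, w, _ = shape
    return (h, w, depth)

def pool2(shape):
    # 2 x 2 max pooling halves height and width.
    h, w, c = shape
    return (h // 2, w // 2, c)

def up2(shape):
    # 2 x 2 upsampling doubles height and width.
    h, w, c = shape
    return (h * 2, w * 2, c)

s = (16, 16, 1)        # layer 1: dense(256) reshaped to 16 x 16 x 1
s = conv_same(s, 16)   # layer 2: conv, 16 kernels, 3 x 3, SAME
s = pool2(s)           # layer 3: max pool 2 x 2
s = conv_same(s, 8)    # layer 4: conv, 8 kernels, 3 x 3, SAME
s = pool2(s)           # layer 5: max pool 2 x 2  -> 4 x 4 x 8 bottleneck
s = up2(s)             # layer 6: upsample 2 x 2
s = conv_same(s, 16)   # layer 7: deconv, 16 kernels, SAME
s = up2(s)             # layer 8: upsample 2 x 2
s = conv_same(s, 1)    # layer 9: deconv, 1 kernel, SAME
flat = s[0] * s[1] * s[2]   # flattened before layer 10: dense(195)
```

The decoder exactly mirrors the encoder: the map returns to 16 × 16 × 1 (256 values) before the final fully-connected layer maps it back to the 195 monitored variables.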
3.2 training the convolutional autocoder.
3.2.1 the input dataset is D _ input and the target dataset is D _ output.
3.2.2 during training, 20% of the training set is randomly selected as the validation set, and the number of iterations is 200; training is considered finished when the training-set loss levels off and no longer decreases appreciably with the iteration count while the validation-set loss begins to rise.
3.2.3 the optimization algorithm is Adam, a first-order algorithm that can replace the traditional stochastic-gradient-descent procedure and iteratively updates the neural-network weights from the training data; the initial learning rate is 0.01;
3.2.4 the loss function is chosen as the mean square error of the reconstructed sample D _ train and the target sample D _ output.
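The Adam update of 3.2.3 and the MSE loss of 3.2.4 can be sketched together on a toy one-parameter reconstruction problem (the problem itself is invented purely to exercise the two formulas; the learning rate 0.01 matches the embodiment):

```python
import numpy as np

def mse(pred, target):
    # Loss: mean squared error between reconstruction and target.
    return np.mean((pred - target) ** 2)

def adam_step(theta, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: first and second moment estimates with bias
    correction give per-parameter adaptive step sizes."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Toy problem: recover the scale w = 3 that reconstructs the target.
rng = np.random.default_rng(6)
x = rng.random(50)
target = 3.0 * x
w = np.array(0.0)
state = (np.zeros(()), np.zeros(()), 0)
for _ in range(2000):
    grad = np.mean(2.0 * (w * x - target) * x)   # d/dw of the MSE
    w, state = adam_step(w, grad, state, lr=0.01)
final_loss = mse(w * x, target)
```

The same loop structure, with the gradients supplied by backpropagation through the 10-layer network, is what the embodiment's Keras/HDF5 training run performs.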
3.3 the trained model and its weights are stored in an HDF5 file, which contains the model's structure, weights, training configuration, and related information; the whole is packaged and stored on the cloud platform.
And step 4: preprocess subsequent precursor monitoring data with the preprocessing model trained in step 3, and store the processed data in the data warehouse of the big data analysis platform to complete the preprocessing. The specific steps are as follows:
4.1, sorting the data into a data set by taking days as a unit, rearranging the data set, and replacing missing values with zero values;
4.2, normalizing the data set by using a normalization model stored by a big data analysis platform;
4.3, loading the convolution denoising self-encoder model stored in the step 3, taking the data after normalization processing as input, and obtaining reconstructed data by using the model;
and 4.4, inverse-normalizing the reconstructed data with the normalization model to obtain the preprocessed data, which are stored in the data warehouse.
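Steps 4.1 to 4.4 above form a small pipeline that can be sketched end to end. The trained autoencoder is replaced here by a caller-supplied stand-in (the identity map), and the min-max normalization model is an assumption carried over from the earlier sketch, so this only demonstrates the data flow, not the reconstruction quality:

```python
import numpy as np

def preprocess(day_data, norm_model, reconstruct):
    """Step-4 pipeline: zero-fill missing values, normalize with the
    stored model, reconstruct with the trained autoencoder (here a
    caller-supplied stand-in), then inverse-normalize."""
    lo, hi = norm_model
    filled = np.nan_to_num(day_data, nan=0.0)      # 4.1 replace missing with 0
    normalized = (filled - lo) / (hi - lo)         # 4.2 normalize
    reconstructed = reconstruct(normalized)        # 4.3 model forward pass
    return reconstructed * (hi - lo) + lo          # 4.4 inverse-normalize

rng = np.random.default_rng(7)
day_data = rng.random((24, 3)) * 10 + 5            # one day of readings
day_data[3, 1] = np.nan                            # one missing reading
norm_model = (np.full(3, 5.0), np.full(3, 15.0))   # stored min/max per column

# Stand-in for the trained convolutional denoising autoencoder:
# the identity map, used only to exercise the pipeline.
out = preprocess(day_data, norm_model, lambda z: z)
```

With the real trained model in place of the identity stand-in, the forward pass would also fill the zeroed entry with a plausible value and suppress noise, which is the point of fusing the two operations into one model.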
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (7)
1. A carbon fiber precursor monitoring data preprocessing method based on a convolution denoising autoencoder is characterized by comprising the following steps:
1) collecting monitoring data of carbon fiber precursor production, and arranging the data into a data set according to a time sequence and a process sequence;
2) selecting the samples of the data set that contain no missing values as the target data set; performing a random missing-value operation on the target data set to obtain the training data set;
3) establishing a convolution denoising autoencoder model, and training the model by using the training data set and the target data set stored in the step 2);
the convolution denoising self-encoder model is as follows:
the convolutional denoising autoencoder comprises 10 layers in total, consisting of fully-connected layers, 2D convolutional layers, max-pooling layers, upsampling layers, and deconvolution layers,
wherein the 1st layer is a fully-connected layer, the 2nd and 4th layers are 2D convolutional layers, the 3rd and 5th layers are max-pooling layers, the 6th and 8th layers are upsampling layers, the 7th and 9th layers are deconvolution layers, and the 10th layer is the output layer, which is a fully-connected layer;
4) preprocessing the precursor monitoring data to be processed by utilizing the convolutional denoising self-encoder model trained in the step 3).
2. The method for preprocessing carbon fiber precursor monitoring data according to claim 1, wherein the data in step 1) are sorted into a data set according to time order and process order, as follows:
1.1) collecting real-time monitoring data of the carbon fiber precursor production process, with time as the collection order and the data grouped by type (temperature, pressure, flow), to generate a data set X = {X_i}, 1 ≤ i ≤ n, where n is the number of data variables; the system's operating data values at each moment form one row of the matrix, and the monitoring data cover every time period of precursor production. The data set over the various data types is expressed as:
X = [X_1, X_2, X_3, …, X_i, …, X_n]
1.2) reordering the features of the data set according to the following principle:
arranging the characteristics of the data set according to the process sequence of carbon fiber precursor production;
the monitoring data of the same type in the same procedure are adjacent;
1.3) uploading the sorted data set to a cloud platform.
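Steps 1.1-1.2 amount to building a matrix whose rows are sampling instants and whose columns are monitored variables, then reordering the columns so that variables from the same production step, and of the same type within a step, sit next to each other. A minimal pandas sketch, with hypothetical column names and process order:

```python
# Rows: time stamps; columns: monitored variables of the precursor line.
# The column-to-process mapping used here is illustrative only.
import pandas as pd

records = pd.DataFrame(
    {
        "oxidation_temp": [230.1, 231.0],
        "spinning_temp": [65.2, 65.4],
        "spinning_pressure": [1.20, 1.25],
        "oxidation_flow": [4.1, 4.0],
    },
    index=pd.to_datetime(["2019-12-05 00:00", "2019-12-05 00:01"]),
)

# 1.2: arrange features in process order (spinning before oxidation here),
# keeping same-type monitoring data of each step adjacent
process_order = ["spinning_temp", "spinning_pressure",
                 "oxidation_temp", "oxidation_flow"]
dataset = records.sort_index()[process_order]
```

The resulting `dataset` is what step 1.3 would upload to the cloud platform.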
3. The method for preprocessing carbon fiber precursor monitoring data according to claim 1, wherein the specific steps of obtaining the training data set in step 2) are as follows:
2.1) selecting a data subset X without deletion in the data set sorted in the step 1);
2.2) normalizing the data set X to serve as the target data set D_output, where x is a value in the selected non-missing data set X and x' is the corresponding value of the normalized data set;
2.3) performing a random deletion operation on the data set by randomly setting entries to zero, with the deletion ratio set as needed to match the missing-data rate of the carbon fiber precursor monitoring data, to obtain the corrupted data set D_corruption;
and 2.4) normalizing the damaged data set D _ corruption to obtain an input data set D _ input.
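Steps 2.1-2.4 can be sketched in numpy as follows; since the claim's normalization formula is not reproduced in the text, min-max scaling is used here as an assumption. Complete rows become the target set D_output, and a randomly zeroed copy, normalized the same way, becomes the input set D_input.

```python
# Build a (corrupted input, clean target) training pair from monitoring data
# that may contain NaN-marked missing entries.
import numpy as np

def make_training_pair(X, missing_ratio=0.1, seed=0):
    rng = np.random.default_rng(seed)
    complete = X[~np.isnan(X).any(axis=1)]        # 2.1: samples without deletion
    lo, hi = complete.min(axis=0), complete.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)        # assumed min-max normalization
    d_output = (complete - lo) / span             # 2.2: normalized target set
    mask = rng.random(complete.shape) < missing_ratio
    d_corruption = np.where(mask, 0.0, complete)  # 2.3: random zero-setting
    d_input = (d_corruption - lo) / span          # 2.4: normalized input set
    return d_input, d_output
```

The `missing_ratio` argument plays the role of the deletion proportion that the claim says should match the real missing-data rate.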
4. The method for preprocessing carbon fiber precursor monitoring data according to claim 3, wherein before the random deletion operation is performed on the data set by using a random zero setting method in the step 2.3), the method further comprises an operation of adding Gaussian noise to the data set:
adding additive white Gaussian noise to X;
X'_i = X_i + μ·Z_i,  Z_i ~ N(0, 1)
where μ is a noise factor used to control the amplitude of the additive noise.
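The additive-noise step of claim 4 is a one-line numpy helper; the default μ = 0.05 below is an illustrative value, not one given in the patent.

```python
# X'_i = X_i + mu * Z_i with Z_i drawn from the standard normal N(0, 1);
# mu scales the amplitude of the additive white Gaussian noise.
import numpy as np

def add_gaussian_noise(X, mu=0.05, seed=0):
    rng = np.random.default_rng(seed)
    return X + mu * rng.standard_normal(X.shape)
```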
5. The method for preprocessing carbon fiber precursor monitoring data according to claim 1, wherein in step 3), the loss function used in model training is the mean square error between the reconstructed sample D_train and the target sample D_output.
6. The method for preprocessing carbon fiber precursor monitoring data according to claim 1, wherein in the step 4), the precursor monitoring data to be processed is preprocessed, specifically as follows:
4.1) sorting the data into a data set in units of days, rearranging the data set, and replacing missing values with zeros;
4.2) normalizing the data set;
4.3) loading the convolutional denoising autoencoder model of step 3), taking the normalized data as input, and obtaining the reconstructed data from the model;
4.4) performing inverse normalization on the reconstructed data to obtain the preprocessed precursor monitoring data.
7. The method for preprocessing carbon fiber precursor monitoring data as claimed in claim 1, wherein in the step 3), the convolutional denoising autoencoder has the following specific structure:
3.1) layer 1 is a fully connected layer with 256 neurons and a ReLU activation function; the feature vector output by this layer is then reshaped, using a Reshape operation, into a feature map of size 16 × 16 × 1 (256 = 16 × 16);
ReLU(x) = max(0, x)
h_1 = ReLU(w_1·h_0 + b_1)
where h_0 is the input layer, w_1 is the fully connected layer's neuron weight matrix, and b_1 is the bias.
3.2) layers 2 and 4 are 2D convolutional layers with depths of 16 and 8 respectively, 3 × 3 convolution kernels, ReLU activation functions, and SAME padding, which keeps the feature map size unchanged;
h_{2,a} = ReLU(h_1 * w_{2,a} + b_{2,a})
where h_{2,a} and w_{2,a} are the a-th convolution region and convolution kernel of the convolutional layer;
3.3) layers 3 and 5 are max pooling layers with 2 × 2 pooling windows and SAME padding;
h_{3,a} = down(h_{2,a})
where h_{3,a} is the down-sampling result of the corresponding convolution region;
3.4) layers 6 and 8 are upsampling layers of size 2 × 2;
3.5) layers 7 and 9 are deconvolution layers with 16 and 1 convolution kernels respectively, a kernel size of 3 × 3, ReLU activation functions, and SAME padding;
3.6) layer 10 is the output layer, a fully connected layer with 195 neurons and a ReLU activation function; before entering this layer, the feature map output by layer 9 is flattened back into a feature vector.
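The 10-layer structure of claim 7 can be sketched with tensorflow.keras. Layer types, depths, kernel sizes, and neuron counts follow the claim; the optimizer choice and the interpretation of the Reshape target as 16 × 16 × 1 (since 256 = 16 × 16) are assumptions.

```python
# Sketch of the convolutional denoising autoencoder: Dense -> Reshape ->
# (Conv2D + MaxPool) x2 -> (UpSample + Conv2DTranspose) x2 -> Flatten -> Dense.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cdae(n_features=195):
    model = models.Sequential([
        tf.keras.Input(shape=(n_features,)),
        layers.Dense(256, activation="relu"),                  # layer 1
        layers.Reshape((16, 16, 1)),                           # vector -> feature map
        layers.Conv2D(16, 3, padding="same", activation="relu"),           # layer 2
        layers.MaxPooling2D(2, padding="same"),                            # layer 3
        layers.Conv2D(8, 3, padding="same", activation="relu"),            # layer 4
        layers.MaxPooling2D(2, padding="same"),                            # layer 5
        layers.UpSampling2D(2),                                            # layer 6
        layers.Conv2DTranspose(16, 3, padding="same", activation="relu"),  # layer 7
        layers.UpSampling2D(2),                                            # layer 8
        layers.Conv2DTranspose(1, 3, padding="same", activation="relu"),   # layer 9
        layers.Flatten(),                                      # feature map -> vector
        layers.Dense(n_features, activation="relu"),           # layer 10: output
    ])
    model.compile(optimizer="adam", loss="mse")  # MSE loss per claim 5
    return model
```

The spatial sizes trace as 16×16 → 8×8 → 4×4 → 8×8 → 16×16, so the flattened layer-9 output has 256 elements, which the output layer maps back to the 195 monitored variables.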
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911238154.3A CN111046080A (en) | 2019-12-05 | 2019-12-05 | Carbon fiber precursor monitoring data preprocessing method based on convolution denoising autoencoder |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111046080A true CN111046080A (en) | 2020-04-21 |
Family
ID=70234782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911238154.3A Pending CN111046080A (en) | 2019-12-05 | 2019-12-05 | Carbon fiber precursor monitoring data preprocessing method based on convolution denoising autoencoder |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111046080A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109213753A (en) * | 2018-08-14 | 2019-01-15 | 西安理工大学 | A kind of industrial system monitoring data restoration methods based on online PCA |
CN109815223A (en) * | 2019-01-21 | 2019-05-28 | 北京科技大学 | A kind of complementing method and complementing device for industry monitoring shortage of data |
CN109829903A (en) * | 2019-01-28 | 2019-05-31 | 合肥工业大学 | A kind of chip surface defect inspection method based on convolution denoising self-encoding encoder |
CN109887290A (en) * | 2019-03-30 | 2019-06-14 | 西安电子科技大学 | Traffic flow forecasting method based on equilibrium index exponential smoothing and stack self-encoding encoder |
CN109978079A (en) * | 2019-04-10 | 2019-07-05 | 东北电力大学 | A kind of data cleaning method of improved storehouse noise reduction self-encoding encoder |
Non-Patent Citations (10)
Title |
---|
BO LI et al.: "Denoising Convolutional Autoencoder Based B-Mode Ultrasound Tongue Image Feature Extraction" * |
JANEBHOP SAWAENGCHOB et al.: "A Fast Convolutional Denoising Autoencoder Based Extreme Learning Machine" * |
JUNJIE CHEN et al.: "Sparse Convolutional Denoising Autoencoders for Genotype Imputation" * |
LOVEDEEP GONDARA et al.: "Medical image denoising using convolutional denoising autoencoders" * |
LOVEDEEP GONDARA et al.: "Multiple Imputation using Denoising Autoencoders" * |
PASCAL VINCENT et al.: "Extracting and Composing Robust Features with Denoising Autoencoders" * |
DAI Jiejie et al.: "Data cleaning method for condition data of power transmission and transformation equipment based on stacked denoising autoencoders" * |
YIN Jingwei et al.: "Research on underwater acoustic signal enhancement based on denoising autoencoders" * |
XIONG Wei: "Research on deep learning algorithms for visual feature representation" * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114109456A (en) * | 2020-08-28 | 2022-03-01 | 神华神东煤炭集团有限责任公司 | Monitoring and early warning method and system for coal mine underground reservoir coal pillar dam body structure |
CN114417923A (en) * | 2022-01-13 | 2022-04-29 | 哈尔滨工业大学 | Vehicle induced stress robust correlation mapping method based on convolution noise reduction self-encoder |
CN116092614A (en) * | 2023-03-06 | 2023-05-09 | 山东大学 | Carbon fiber precursor preparation simulation method based on hybrid neural network |
CN117116476A (en) * | 2023-07-04 | 2023-11-24 | 中国医学科学院阜外医院 | Downstream task prediction method and device and computer readable storage medium |
CN117116476B (en) * | 2023-07-04 | 2023-12-19 | 中国医学科学院阜外医院 | Downstream task prediction method and device and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111046080A (en) | Carbon fiber precursor monitoring data preprocessing method based on convolution denoising autoencoder | |
Cai et al. | Path-level network transformation for efficient architecture search | |
CN110349146B (en) | Method for constructing fabric defect identification system based on lightweight convolutional neural network | |
CN112577747A (en) | Rolling bearing fault diagnosis method based on space pooling network | |
CN110991424A (en) | Fault diagnosis method based on minimum entropy deconvolution and stacking sparse self-encoder | |
CN111161207B (en) | Integrated convolutional neural network fabric defect classification method | |
CN109783910B (en) | Structure optimization design method for accelerating by using generation countermeasure network | |
CN113203566B (en) | Motor bearing fault diagnosis method based on one-dimensional data enhancement and CNN | |
CN113923104B (en) | Network fault diagnosis method, equipment and storage medium based on wavelet neural network | |
CN107633272B (en) | DCNN texture defect identification method based on compressed sensing under small sample | |
CN115204035A (en) | Generator set operation parameter prediction method and device based on multi-scale time sequence data fusion model and storage medium | |
Yu et al. | CWGAN: Conditional wasserstein generative adversarial nets for fault data generation | |
CN111666986A (en) | Machine learning-based crayfish grading method | |
CN110390358A (en) | A kind of deep learning method based on feature clustering | |
CN115290326A (en) | Rolling bearing fault intelligent diagnosis method | |
CN112884149A (en) | Deep neural network pruning method and system based on random sensitivity ST-SM | |
CN115452376A (en) | Bearing fault diagnosis method based on improved lightweight deep convolution neural network | |
CN110598326A (en) | Well testing interpretation method based on artificial intelligence | |
CN114386452A (en) | Method for detecting faults of sun wheel of nuclear power circulating water pump | |
CN108537581B (en) | Energy consumption time series prediction method and device based on GMDH selective combination | |
CN115660221B (en) | Oil and gas reservoir economic recoverable reserve assessment method and system based on hybrid neural network | |
CN114235409A (en) | Rolling bearing multi-user cooperative intelligent fault diagnosis method for light weight communication | |
CN110739030B (en) | Soft measurement method for small sample in ethylene production process | |
CN112444395A (en) | CMWPE and SaE-ELM based locomotive wheel pair bearing fault diagnosis method | |
CN112257791A (en) | Classification method of multi-attribute classification tasks based on CNN and PCA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200421 |