CN117333704A - Small-sample physical and chemical experiment equipment state detection method based on transfer learning - Google Patents

Small-sample physical and chemical experiment equipment state detection method based on transfer learning

Info

Publication number
CN117333704A
CN117333704A (application number CN202311286111.9A)
Authority
CN
China
Prior art keywords
feature
basic
network
data set
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311286111.9A
Other languages
Chinese (zh)
Inventor
Liu Feng (刘峰)
Zhou Yufan (周玉帆)
Gan Zongliang (干宗良)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202311286111.9A
Publication of CN117333704A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a transfer-learning-based state detection method for physical and chemical experiment equipment under small-sample conditions. Data from different experimental instruments are collected to construct a basic class data set and a new class data set. A basic class target detection model, comprising a feature extraction module, an RPN (region proposal network), a RoIPooling module and a prediction module, is pre-trained on the basic class data set to obtain a basic model; a feature pyramid network is introduced into the feature extraction module to fuse features at multiple scales effectively. A contrastive branch is then introduced into the basic model and fine-tuned on the new class data set to obtain the final detection model. The image to be detected is input into the final detection model to obtain the detection result. The method alleviates the difficulty of acquiring large amounts of annotated data in real scenes; by jointly considering features extracted at multiple scales, it effectively fuses semantic and positional information, strengthens the representation ability of the backbone network, and improves the model's ability to classify and localize the targets to be detected; the contrastive branch further improves the model's classification accuracy.

Description

Small-sample physical and chemical experiment equipment state detection method based on transfer learning
Technical Field
The invention relates to a transfer-learning-based state detection method for physical and chemical experiment equipment under small-sample conditions, and belongs to the technical field of computer vision and image target detection.
Background
Manually collected laboratory-instrument data sets contain few samples; the instruments used by different schools differ in appearance from those in the training sets of existing models; and new classes of experimental instruments absent from those training sets must still be detected. Research on deep-learning target detection under such small-sample conditions is therefore of practical significance.
Recent research on few-shot learning has focused mainly on target classification tasks; target detection under small-sample conditions has been studied comparatively little. Current mainstream approaches to few-shot target detection fall into two types: 1) meta-learning-based methods, which use a meta-learner to acquire meta-knowledge across different tasks and then adapt that knowledge to detect novel classes in tasks that contain them; and 2) transfer-learning-based methods, which migrate knowledge learned on known classes to detection tasks on unknown classes.
Transfer-learning-based methods generally adopt Faster R-CNN as the base framework for few-shot target detection and train in two stages: the first stage uses a large amount of annotated base-class data, and the second stage fine-tunes on a small amount of base-class and novel-class data. The TFA method showed great potential in few-shot target detection: it simply freezes the network parameters learned on the base-class data and then fine-tunes only the last two fully connected layers, i.e. classification and box regression, on the novel-class data, with all other structures frozen. Later work found that better results can be obtained without freezing the model structure, given suitable training. Drawbacks remain, however: the small-sample condition causes severe imbalance between positive and negative categories, and localization is inaccurate when detecting novel-class targets with few samples.
Disclosure of Invention
The invention provides a transfer-learning-based state detection method for physical and chemical experiment equipment under small-sample conditions, which addresses the problems identified in the background above.
To solve the above technical problems, the invention adopts the following technical scheme. A small-sample physical and chemical experiment equipment state detection method based on transfer learning comprises the following steps:
collecting data of different experimental equipment and dividing it into a basic class data set and a new class data set, wherein the basic class data set has more annotated images than the new class data set, and the basic classes and the new classes have no intersection;
pre-training a basic class target detection model by using the basic class data set to obtain a basic model, wherein the basic class target detection model comprises a feature extraction module, an RPN (region proposal network), a RoIPooling module and a prediction module;
fine-tuning the basic model by using a balanced data set consisting of basic class data and new class data to obtain a final model, wherein during fine-tuning a contrastive branch is added in parallel with the classifier and the regressor;
inputting the image of the experimental instrument to be detected into the final model to obtain the category, localization bounding box and confidence of the detected target.
Further, the pre-training method of the basic model comprises the following steps:
inputting the basic class data set into the feature extraction network to obtain four feature maps C2, C3, C4 and C5 of different scales and depths; feeding the four feature maps into an improved feature pyramid network to obtain feature maps N2, N3, N4, N5 and N6 of different scales that fuse semantic and positional information;
transmitting the feature maps N2, N3, N4, N5 and N6 in parallel into the RPN region proposal network to generate candidate regions;
mapping the candidate regions generated by the RPN region proposal network onto the feature maps N2, N3, N4 and N5, and inputting them into the RoIPooling module to obtain outputs of fixed size;
passing the outputs through fully connected layers and then into the classifier and regressor to obtain the predicted category and target candidate box.
Further, the process of inputting the basic class data set into the feature extraction network to obtain the four feature maps C2, C3, C4 and C5 of different scales and depths, feeding them into the improved feature pyramid network, and obtaining the feature maps N2, N3, N4, N5 and N6 of different scales that fuse semantic and positional information is as follows:
the feature extraction network adopts the residual network ResNet-101, which consists of a 7×7 convolution layer with 64 channels, conv1, and four residual blocks conv2_x, conv3_x, conv4_x and conv5_x; the image passes through the four residual blocks, which respectively output four feature maps C2, C3, C4 and C5 of different scales and depths, where feature map C2 has size 256×256 and depth 256, and from C2 to C5 the size of each level halves and the depth doubles;
the channel numbers of C2, C3, C4 and C5 are each adjusted to 256 by a 1×1 convolution; each upper-level feature map is upsampled by a factor of two and added to the feature map one level below, and a 3×3 convolution then yields feature maps P2, P3, P4 and P5, where P2 has size 256×256 and depth 256, and from P2 to P5 the size of each level halves while the depth stays 256;
a bottom-up path is built on P2, P3, P4 and P5: each lower-level feature map is passed through a 3×3 convolution with stride 2 and added to the feature map one level above, and each result then passes through a 3×3 convolution; this yields feature maps N2, N3, N4 and N5, and max pooling with stride 2 applied to N5 yields N6. Feature map N2 has size 256×256 and depth 256, and from N2 to N6 the size of each level halves while the depth stays 256.
Further, the process of transmitting the feature maps N2, N3, N4, N5 and N6 in parallel into the RPN region proposal network and generating candidate regions is as follows:
the feature maps N2, N3, N4, N5 and N6 are fed into the RPN region proposal network; through a 3×3 convolution, k anchors of different scales and aspect ratios are generated for each 3×3 sliding window on a feature map, so a feature map of size W×H yields W×H×k anchors;
each anchor is classified and regressed through two parallel 1×1 convolutions: a 1×1 convolution followed by a softmax function yields two-class confidences, predicting only the probabilities of foreground and background, while another 1×1 convolution yields the positional offsets of the anchors relative to the ground truth, four values per anchor, corresponding to the center-point abscissa and ordinate, the width and the height;
each anchor is adjusted using its positional offsets, all anchors are sorted by the probability computed by the softmax function and the first 12000 are retained, and after non-maximum suppression the first 2000 by predicted probability are selected as the final candidate regions.
Further, the process of mapping the candidate regions generated by the RPN region proposal network onto the feature maps N2, N3, N4 and N5 and inputting them into the RoIPooling module to obtain outputs of fixed size is as follows:
the mapped region is divided evenly into a 7×7 grid, without quantizing the boundary of any cell;
four regularly spaced sampling points are selected in each cell, and the pixel values at the four sampling points are computed by bilinear interpolation;
a max pooling aggregation is performed on each cell to obtain a fixed 7×7 feature map.
Further, the process of fine-tuning the basic model with a balanced data set consisting of basic class data and new class data to obtain the final model is as follows:
for all categories of the basic and new class data sets, the same number of samples per category is used as the training set;
the basic model parameters are loaded, the feature extraction module and the RoIPooling module are frozen, and a contrastive branch parallel to the classification and regression branches is introduced into the prediction module; the branch consists of a fully connected layer that takes RoI features as input and outputs 128-dimensional RoI feature vectors;
joint fine-tuning training of the improved feature pyramid network, the RPN region proposal network, the contrastive branch and the prediction module yields the final detection model.
Accordingly, a computer-readable storage medium stores one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any of the methods described above.
Accordingly, a computing device, comprising:
one or more processors, one or more memories, and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods described above.
The invention has the beneficial effects that:
the invention provides a state detection method of a small sample physical and chemical experiment device based on transfer learning, which adopts a mode of training a small number of samples to relieve the problem that a large number of marked data in a real scene are not easy to obtain; the semantic information and the position information are effectively fused by adopting a top-down path enhancement technology and a bottom-up path enhancement technology respectively, and the representation capability of a backbone network is enhanced, so that the classification and positioning capability of a model to a target to be detected are improved; the comparison branches introduced in the fine adjustment stage are used for shortening the distances of the same category and lengthening the distances of different categories, so that the classification capacity of the model is improved more effectively, and the accuracy of the model for predicting experimental instruments is improved.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
fig. 2 is a schematic diagram of the structure of the model of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
As shown in FIG. 1, the transfer-learning-based method for detecting the state of physical and chemical experiment equipment under small-sample conditions comprises four steps: data set preparation, basic model training, model fine-tuning and target detection, described in detail as follows:
step 1: collecting data of different experimental instruments, and dividing the data into a basic class data set and a new class data set, wherein the basic class data set has more samples, the new class data set is a balanced data set consisting of a small amount of basic class data and new class data, and the basic class data and the new class data have no intersection; and labeling each picture, and respectively generating json labeling files in the COCO format.
Step 2: input the basic class data set into the basic class target detection model for training to obtain the basic model. The basic class target detection model comprises a feature extraction module, an RPN (region proposal network), a RoIPooling module and a prediction module:
(1) Feature extraction module. It consists of the residual network ResNet-101 and an improved feature pyramid network, takes the basic class data set as input, and outputs several feature maps of different scales. The specific process is as follows:
(1.1) The residual network ResNet-101 consists of a 7×7 convolution layer with 64 channels, conv1, and four residual blocks conv2_x, conv3_x, conv4_x and conv5_x. An image input to ResNet-101 passes through the four residual blocks in sequence, which respectively output four feature maps C2, C3, C4 and C5 of different scales and depths; feature map C2 has size 256×256 and depth 256, and from C2 to C5 the size of each level halves and the depth doubles;
(1.2) The channel numbers of C2, C3, C4 and C5 are each adjusted to 256 by a 1×1 convolution; each upper-level feature map is upsampled by a factor of two and added to the feature map one level below, and a 3×3 convolution then yields feature maps P2, P3, P4 and P5; P2 has size 256×256 and depth 256, and from P2 to P5 the size of each level halves while the depth stays 256;
(1.3) A bottom-up path is built on P2, P3, P4 and P5: each lower-level feature map is passed through a 3×3 convolution with stride 2 and added to the feature map one level above, and a 3×3 convolution then yields a new feature map; repeating this operation yields feature maps N2, N3, N4 and N5, and max pooling with stride 2 applied to N5 yields N6. Feature map N2 has size 256×256 and depth 256, and from N2 to N6 the size of each level halves while the depth stays 256.
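The pyramid construction in (1.1)-(1.3) can be sketched with plain array arithmetic. The sketch below is a simplification written for illustration, not the patent's implementation: the learned 1×1 and 3×3 convolutions are omitted, nearest-neighbour repetition stands in for the 2× upsampling, and stride-2 max pooling stands in for the stride-2 convolution, so only the multi-scale bookkeeping of the top-down and bottom-up paths remains:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """Stride-2 max pooling of a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def improved_pyramid(c_maps):
    """Top-down path (P2..P5) followed by a bottom-up path (N2..N6).

    c_maps: [C2, C3, C4, C5], each (C, H, W) with H and W halving per
    level (the learned convolutions are omitted for clarity).
    Returns [N2, N3, N4, N5, N6].
    """
    # Top-down: upsample the coarser map and add it to the finer one.
    p_maps = [c_maps[-1]]
    for c in reversed(c_maps[:-1]):
        p_maps.insert(0, c + upsample2x(p_maps[0]))
    # Bottom-up: downsample the finer map and add it to the coarser one.
    n_maps = [p_maps[0]]
    for p in p_maps[1:]:
        n_maps.append(p + downsample2x(n_maps[-1]))
    # N6 comes from stride-2 max pooling of N5.
    n_maps.append(downsample2x(n_maps[-1]))
    return n_maps
```

With C2..C5 of sizes 256, 128, 64 and 32, this yields N2..N6 of sizes 256, 128, 64, 32 and 16, matching the halving described above.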
(2) RPN region proposal network. It predicts candidate regions on feature maps of different levels to generate better candidate regions. The specific process is as follows:
(2.1) The feature maps N2, N3, N4, N5 and N6 are fed in parallel into the RPN region proposal network; through a 3×3 convolution, k anchors of different scales and aspect ratios are generated for each 3×3 sliding window on a feature map, so a feature map of size W×H yields W×H×k anchors.
(2.2) Each anchor is classified and regressed through two parallel 1×1 convolutions. A 1×1 convolution followed by a softmax function yields two-class confidences, predicting only the probabilities of foreground and background; another 1×1 convolution yields the positional offsets of the anchors relative to the ground truth, four values per anchor, namely the center-point abscissa and ordinate, the width and the height.
(2.3) Each anchor is adjusted using its positional offsets, then all anchors are sorted by the probability computed by the softmax function and the first 12000 are retained; after non-maximum suppression, the first 2000 by predicted probability are selected as the final candidate regions.
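The filtering in (2.3) — rank anchors by foreground score, keep the top candidates, then suppress heavy overlaps — can be sketched as below. This is a generic greedy NMS illustration written for this text, with small top-k values in place of the 12000/2000 used in the patent and an IoU threshold of 0.7 assumed rather than taken from the patent:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def select_proposals(boxes, scores, pre_nms_topk, post_nms_topk, iou_thresh=0.7):
    """Keep top-scoring boxes, suppress overlaps, return the survivors."""
    order = np.argsort(-scores)[:pre_nms_topk]   # rank by foreground score
    boxes, scores = boxes[order], scores[order]
    keep = []
    while len(boxes) and len(keep) < post_nms_topk:
        keep.append(boxes[0])                    # best remaining box wins
        mask = iou(boxes[0], boxes[1:]) < iou_thresh  # drop heavy overlaps
        boxes, scores = boxes[1:][mask], scores[1:][mask]
    return np.array(keep)
```

In the patent's setting the call would be `select_proposals(anchors, fg_probs, 12000, 2000)` after the anchors have been adjusted by their predicted offsets.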
(3) RoIPooling module. It takes the candidate regions generated by the RPN region proposal network and the feature maps of different levels as inputs, and uses the RoIAlign operation to obtain outputs of fixed size. The specific process is as follows:
(3.1) The candidate regions generated by the RPN region proposal network are mapped onto the feature maps N2, N3, N4 and N5, and each mapped region is divided evenly into a 7×7 grid without quantizing the boundary of any cell;
(3.2) four regularly spaced sampling points are selected in each cell, and the pixel values at the four sampling points are computed by bilinear interpolation;
(3.3) a max pooling aggregation is performed on each cell to obtain a fixed 7×7 feature map.
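Steps (3.1)-(3.3) can be sketched in NumPy as below. This is a single-channel simplification written for illustration (a real RoIAlign runs per channel and batches many RoIs); it keeps the three defining properties above: an unquantized 7×7 grid, four bilinear sampling points per cell, and max pooling aggregation per cell:

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate feat (H, W) at the continuous point (y, x)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def roi_align(feat, roi, out_size=7):
    """RoIAlign: unquantized grid, 4 bilinear samples per cell, max-pooled.

    feat: (H, W) feature map; roi: (y1, x1, y2, x2) in feature coordinates.
    """
    y1, x1, y2, x2 = roi
    # Cell size stays fractional: no rounding of cell boundaries.
    ch, cw = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Four regularly spaced sample points inside the cell.
            samples = [bilinear(feat, y1 + (i + fy) * ch, x1 + (j + fx) * cw)
                       for fy in (0.25, 0.75) for fx in (0.25, 0.75)]
            out[i, j] = max(samples)  # aggregate each cell with max pooling
    return out
```

Because no coordinate is rounded, the output varies smoothly with the RoI position, which is exactly what distinguishes RoIAlign from quantized RoI pooling.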
(4) Prediction module. The output of step (3) passes through fully connected layers and is then input into the classifier and regressor for category prediction and target candidate box regression.
In the basic training phase of the model, the objective function of the whole network takes the following form:

$L = L_{rpn\_cls} + L_{rpn\_reg} + L_{cls} + L_{reg}$

where $L_{rpn\_cls}$ and $L_{rpn\_reg}$ are both loss functions in the RPN region proposal network: $L_{rpn\_cls}$ is the cross-entropy loss for the two-class (foreground/background) prediction and $L_{rpn\_reg}$ is the smooth L1 loss for regression; $L_{cls}$ is the cross-entropy loss of the classifier for multi-class classification; $L_{reg}$ is the smooth L1 loss of the regressor for bounding-box regression.
Step 3: load the basic model parameters obtained in step 2, freeze the feature extraction module and the RoIPooling module, and introduce into the prediction module a contrastive branch parallel to the classification and regression branches; the branch consists of a fully connected layer that takes RoI features as input and outputs 128-dimensional RoI feature vectors. Then jointly fine-tune the improved feature pyramid network, the RPN region proposal network, the contrastive branch and the prediction module on the new class data set to obtain the final detection model.
In the fine-tuning stage of the model, the objective function of the whole network takes the following form:

$L = L_{rpn\_cls} + L_{rpn\_reg} + L_{cls} + L_{reg} + \lambda L_{CL}$

where λ is set to 0.5 and $L_{CL}$ is the loss function of the contrastive branch, defined as:

$L_{CL} = \frac{1}{N}\sum_{i=1}^{N} f(u_i)\cdot L_{x_i},\qquad L_{x_i} = \frac{-1}{N_{y_i}-1}\sum_{j=1,\,j\neq i}^{N} \mathbb{I}\{y_j = y_i\}\cdot\log\frac{\exp(\tilde{x}_i\cdot\tilde{x}_j/\tau)}{\sum_{k\neq i}\exp(\tilde{x}_i\cdot\tilde{x}_k/\tau)}$

In the above formulas, N is the total number of RoI features input; $x_i$ is the 128-dimensional feature of the i-th RoI; $y_i$ is the ground-truth class of the i-th RoI; $u_i$ is the IoU value between the i-th RoI and its corresponding ground-truth box; $\tilde{x}_i$, $\tilde{x}_j$ and $\tilde{x}_k$ are the normalized features; $N_{y_i}$ is the total number of RoIs whose label is $y_i$; $\mathbb{I}\{\cdot\}$ is the indicator function; and τ is the temperature coefficient, a hyperparameter.

The loss $L_{x_i}$ of each RoI is weighted by a function of $u_i$, $f(u_i) = \mathbb{I}\{u_i \geq 0.7\}\cdot u_i$, which weights candidate boxes of different quality to different degrees. Within $L_{x_i}$, $\tilde{x}_i\cdot\tilde{x}_j$ computes the cosine similarity between the normalized features; when two RoIs belong to the same class, the higher their similarity, the smaller the loss value, which during training pulls the features of the same category closer together and pushes the features of different categories further apart.
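A minimal NumPy sketch of the IoU-weighted supervised contrastive loss described above, following my reading of that description (the inputs are toy values, and RoIs with no same-class partner are skipped to avoid division by zero):

```python
import numpy as np

def contrastive_loss(x, y, u, tau=0.2):
    """IoU-weighted supervised contrastive loss over N RoI features.

    x: (N, D) RoI features; y: (N,) class labels; u: (N,) IoU with the
    matched ground-truth box; tau: temperature hyperparameter.
    """
    z = x / np.linalg.norm(x, axis=1, keepdims=True)  # normalized features
    sim = z @ z.T / tau                               # cosine similarity / tau
    n = len(x)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and y[j] == y[i]]
        if not pos:
            continue  # no same-class partner: this RoI contributes nothing
        denom = sum(np.exp(sim[i, k]) for k in range(n) if k != i)
        l_xi = -sum(np.log(np.exp(sim[i, j]) / denom) for j in pos) / len(pos)
        weight = u[i] if u[i] >= 0.7 else 0.0  # f(u_i) = I{u_i >= 0.7} * u_i
        total += weight * l_xi
    return total / n
```

Note how the IoU gate zeroes out the contribution of poorly localized candidate boxes, so only high-quality RoIs shape the embedding space.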
Step 4: input the image to be detected into the final detection model to obtain the category, localization bounding box and confidence of the detected target.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.
A computer-readable storage medium stores one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform the transfer-learning-based small-sample physical and chemical experiment equipment state detection method.
A computing device comprises one or more processors, one or more memories, and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the transfer-learning-based small-sample physical and chemical experiment equipment state detection method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is illustrative of the present invention and is not to be construed as limiting it; all modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention as defined by the appended claims.

Claims (8)

1. A transfer-learning-based small-sample state detection method for physical and chemical experiment equipment, characterized by comprising the following steps:
collecting image data of different experimental equipment and dividing it into a basic-class data set and a new-class data set, wherein the basic-class data set contains more labelled images than the new-class data set, and the basic-class and new-class data have no intersection;
pre-training a basic-category target detection model on the basic-class data set to obtain a basic model, wherein the basic-category target detection model comprises a feature extraction module, an RPN (Region Proposal Network), a RoIPooling module and a prediction module;
fine-tuning the basic model on a balanced data set composed of basic-class and new-class data to obtain a final model, wherein during fine-tuning a contrastive branch is added in parallel with the classifier and the regressor;
inputting an image of the experimental instrument to be detected into the final model to obtain the category, the localization bounding box and the confidence of each detected target.
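The data-partitioning steps of claim 1 can be sketched as follows. This is an illustrative Python sketch, not part of the patent: the sample dictionaries and the function names `split_base_novel` and `build_balanced_set` are assumptions introduced only to make the base/novel split and the balanced fine-tuning set concrete.

```python
import random
from collections import defaultdict

def split_base_novel(samples, base_classes, novel_classes):
    """Partition labelled samples into disjoint basic-class and new-class sets."""
    assert not set(base_classes) & set(novel_classes), "class sets must not intersect"
    base = [s for s in samples if s["label"] in base_classes]
    novel = [s for s in samples if s["label"] in novel_classes]
    return base, novel

def build_balanced_set(base, novel, k, seed=0):
    """Draw the same number (k) of samples per class for the fine-tuning stage."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s in base + novel:
        by_class[s["label"]].append(s)
    balanced = []
    for label, items in sorted(by_class.items()):
        balanced.extend(rng.sample(items, min(k, len(items))))
    return balanced
```

In practice the samples would be image paths with annotation files; here a sample is reduced to its label so only the partitioning logic is shown.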
2. The method for detecting the state of small-sample physical and chemical experiment equipment based on transfer learning according to claim 1, wherein the basic model is pre-trained as follows:
inputting the basic-class data set into the feature extraction network to obtain four feature maps C2, C3, C4 and C5 of different scales and depths, feeding them into an improved feature pyramid network, and finally obtaining feature maps N2, N3, N4, N5 and N6 of different scales that fuse semantic and positional information;
feeding the feature maps N2, N3, N4, N5 and N6 in parallel into the RPN region proposal network to generate candidate regions;
mapping the candidate regions generated by the RPN onto the feature maps N2, N3, N4 and N5 and inputting them into the RoIPooling module to obtain fixed-size outputs;
passing the outputs through a fully connected layer and then into the classifier and the regressor to obtain the predicted category and the target candidate box.
3. The method for detecting the state of small-sample physical and chemical experiment equipment based on transfer learning according to claim 2, wherein the process of inputting the basic-class data set into the feature extraction network, obtaining the four feature maps C2, C3, C4 and C5 of different scales and depths, feeding them into the improved feature pyramid network, and obtaining the feature maps N2, N3, N4, N5 and N6 that fuse semantic and positional information is as follows:
the feature extraction network adopts the residual network ResNet-101, consisting of a 7×7, 64-channel convolution layer conv1 and four residual blocks conv2_x, conv3_x, conv4_x and conv5_x; the image passes through the four residual blocks and yields four feature maps C2, C3, C4 and C5 of different scales and depths, where C2 has spatial size 256×256 and depth 256, and from C2 to C5 the spatial size of each level is halved and the depth is doubled;
adjusting the channel number of C2, C3, C4 and C5 to 256 with 1×1 convolutions, adding each upper-level feature map, after 2× upsampling, to the feature map one level below, and applying a 3×3 convolution to obtain feature maps P2, P3, P4 and P5, where P2 has spatial size 256×256 and depth 256, and from P2 to P5 the spatial size of each level is halved while the depth remains 256;
constructing a bottom-up path on P2, P3, P4 and P5: passing each lower-level feature map through a 3×3 convolution with stride 2 and adding it to the feature map one level above, applying a 3×3 convolution at every level to obtain feature maps N2, N3, N4 and N5, and applying stride-2 max pooling to N5 to obtain N6, where N2 has spatial size 256×256 and depth 256, and from N2 to N6 the spatial size of each level is halved while the depth remains 256.
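The scale bookkeeping of claim 3 (top-down fusion to P2–P5, then a bottom-up path to N2–N6) can be checked with a NumPy sketch. This is an assumption-laden illustration: the learned 1×1 and 3×3 convolutions are replaced by fixed random channel projections and plain stride-2 subsampling, so only the shapes and fusion order are exercised, and the names `pafpn`, `upsample2`, `downsample2` are invented here.

```python
import numpy as np

def upsample2(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2(x):
    """Stride-2 subsampling standing in for the stride-2 3x3 convolution."""
    return x[:, ::2, ::2]

def pafpn(c_maps, out_channels=256, rng=np.random.default_rng(0)):
    """Fuse C2..C5 top-down into P2..P5, then bottom-up into N2..N6."""
    # "1x1 convolution": project every C_i to out_channels with a random matrix
    laterals = []
    for c in c_maps:
        w = rng.standard_normal((out_channels, c.shape[0]))
        laterals.append(np.einsum("oc,chw->ohw", w, c))
    # top-down path: start at the coarsest level and add upsampled maps
    p_maps = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        p_maps.insert(0, lat + upsample2(p_maps[0]))
    # bottom-up path: add each downsampled N map to the P map one level above
    n_maps = [p_maps[0]]
    for p in p_maps[1:]:
        n_maps.append(p + downsample2(n_maps[-1]))
    # N6 = 2x2 stride-2 max pooling of N5
    n5 = n_maps[-1]
    n6 = np.maximum.reduce([n5[:, i::2, j::2] for i in range(2) for j in range(2)])
    return n_maps + [n6]
```

With C2 at 64×64 (a scaled-down stand-in for the 256×256 in the claim), the sketch yields five maps whose sides halve level by level while the depth stays 256, matching the claim's bookkeeping.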
4. The method for detecting the state of small-sample physical and chemical experiment equipment based on transfer learning according to claim 2, wherein feeding the feature maps N2, N3, N4, N5 and N6 in parallel into the RPN region proposal network and generating candidate regions proceeds as follows:
the feature maps N2, N3, N4, N5 and N6 are fed into the RPN; via a 3×3 convolution, k anchors of different scales and aspect ratios are generated for each 3×3 sliding window on the feature map, so a W×H feature map yields W×H×k anchors;
each anchor is classified and regressed by two parallel 1×1 convolutions: one 1×1 convolution followed by a softmax function outputs the classification confidence, predicting only foreground and background probabilities; the other 1×1 convolution outputs the position offsets of the anchors relative to the ground truth, four values per anchor corresponding to the centre-point abscissa, the centre-point ordinate, the width and the height;
each anchor is adjusted with its position offsets, all anchors are sorted by the softmax probability, the top 12000 are retained, and after non-maximum suppression the top 2000 by predicted probability are selected as the final candidate regions.
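The anchor generation, offset decoding and non-maximum suppression of claim 4 can be sketched in NumPy. This is an illustrative sketch only: the function names, the (cx, cy, w, h) box convention and the exponential width/height decoding are common conventions assumed here, not details stated in the claim.

```python
import numpy as np

def make_anchors(h, w, stride, scales=(8,), ratios=(0.5, 1.0, 2.0)):
    """k = len(scales)*len(ratios) anchors per cell -> (H*W*k, 4) as (cx, cy, w, h)."""
    anchors = []
    for y in range(h):
        for x in range(w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    aw, ah = s * stride * np.sqrt(r), s * stride / np.sqrt(r)
                    anchors.append((cx, cy, aw, ah))
    return np.array(anchors)

def decode(anchors, deltas):
    """Apply predicted offsets (dx, dy, dw, dh) to (cx, cy, w, h) anchors."""
    out = anchors.copy()
    out[:, 0] += deltas[:, 0] * anchors[:, 2]
    out[:, 1] += deltas[:, 1] * anchors[:, 3]
    out[:, 2:] *= np.exp(deltas[:, 2:])
    return out

def nms(boxes_cxcywh, scores, iou_thr=0.7, topk=2000):
    """Greedy non-maximum suppression over score-sorted boxes."""
    x1 = boxes_cxcywh[:, 0] - boxes_cxcywh[:, 2] / 2
    y1 = boxes_cxcywh[:, 1] - boxes_cxcywh[:, 3] / 2
    x2, y2 = x1 + boxes_cxcywh[:, 2], y1 + boxes_cxcywh[:, 3]
    order, keep = np.argsort(-scores), []
    while order.size and len(keep) < topk:
        i, order = order[0], order[1:]
        keep.append(i)
        iw = np.maximum(0, np.minimum(x2[i], x2[order]) - np.maximum(x1[i], x1[order]))
        ih = np.maximum(0, np.minimum(y2[i], y2[order]) - np.maximum(y1[i], y1[order]))
        inter = iw * ih
        union = ((x2[i] - x1[i]) * (y2[i] - y1[i])
                 + (x2[order] - x1[order]) * (y2[order] - y1[order]) - inter)
        order = order[inter / np.maximum(union, 1e-9) <= iou_thr]
    return keep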
5. The method for detecting the state of small-sample physical and chemical experiment equipment based on transfer learning according to claim 2, wherein mapping the candidate regions generated by the RPN onto the feature maps N2, N3, N4 and N5 and inputting them into the RoIPooling module to obtain fixed-size outputs proceeds as follows:
dividing the mapped region evenly into a 7×7 grid without quantizing the cell boundaries;
selecting four regular sampling points in each cell and computing the pixel value at each sampling point by bilinear interpolation;
applying a max-pooling aggregation within each cell to obtain a fixed 7×7 feature map.
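The unquantized-grid pooling of claim 5 (RoIAlign-style sampling) can be sketched for a single-channel map. The function names and the placement of the four sample points at the quarter positions of each cell are assumptions for illustration; the claim only requires four regular sampling points per cell, bilinear interpolation and max aggregation.

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate a (H, W) feature map at continuous (y, x)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    y0, x0 = max(min(y0, h - 1), 0), max(min(x0, w - 1), 0)
    wy, wx = y - y0, x - x0
    return (feat[y0, x0] * (1 - wy) * (1 - wx) + feat[y0, x1] * (1 - wy) * wx
            + feat[y1, x0] * wy * (1 - wx) + feat[y1, x1] * wy * wx)

def roi_align(feat, roi, out=7):
    """Split the RoI into an out x out grid with unquantized cell boundaries,
    bilinearly sample four regular points per cell, and max-aggregate each cell."""
    ry1, rx1, ry2, rx2 = roi  # (y1, x1, y2, x2) in feature-map coordinates
    bh, bw = (ry2 - ry1) / out, (rx2 - rx1) / out
    pooled = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            samples = [bilinear(feat, ry1 + (i + oy) * bh, rx1 + (j + ox) * bw)
                       for oy in (0.25, 0.75) for ox in (0.25, 0.75)]
            pooled[i, j] = max(samples)
    return pooled
```

Because no boundary is rounded to integer coordinates, the 7×7 output varies smoothly with the RoI position, which is the point of this pooling variant.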
6. The method for detecting the state of small-sample physical and chemical experiment equipment based on transfer learning according to claim 1, wherein fine-tuning the basic model on a balanced data set composed of basic-class and new-class data to obtain the final model proceeds as follows:
for every category in the basic-class and new-class data sets, taking the same number of samples as the training set;
loading the basic model parameters, freezing the feature extraction module and the RoIPooling module, and introducing into the prediction module a contrastive branch parallel to the classification and regression branches, the branch consisting of a fully connected layer that takes RoI features as input and outputs 128-dimensional RoI feature vectors;
jointly fine-tuning the improved feature pyramid network, the RPN region proposal network, the contrastive branch and the prediction module to obtain the final detection model.
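Claim 6 names a contrastive branch but does not specify its loss. A supervised contrastive loss over the 128-dimensional RoI embeddings is one common instantiation; the NumPy sketch below is therefore an assumption for illustration (the function name `supcon_loss` and the temperature value are invented here), using 2-dimensional embeddings in place of the 128-dimensional ones for brevity.

```python
import numpy as np

def supcon_loss(embs, labels, tau=0.2):
    """Supervised contrastive loss over L2-normalised RoI embeddings:
    pull same-class RoI features together, push different-class ones apart."""
    z = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sim = z @ z.T / tau  # pairwise cosine similarities scaled by temperature
    n = len(labels)
    loss, terms = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # anchors without a positive contribute nothing
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        loss += -np.mean([sim[i, j] - log_denom for j in pos])
        terms += 1
    return loss / max(terms, 1)
```

During fine-tuning this term would be added to the classification and regression losses for the joint training step named in the claim.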
7. A computer readable storage medium storing one or more programs, characterized by: the one or more programs include instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-6.
8. A computing device, comprising:
one or more processors, one or more memories, and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-6.
CN202311286111.9A 2023-10-07 2023-10-07 Small sample materialization experiment equipment state detection method based on transfer learning Pending CN117333704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311286111.9A CN117333704A (en) 2023-10-07 2023-10-07 Small sample materialization experiment equipment state detection method based on transfer learning


Publications (1)

Publication Number Publication Date
CN117333704A 2024-01-02

Family

ID=89276706


Country Status (1)

Country Link
CN (1) CN117333704A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination