CN115204261A

CN115204261A - Small sample data set model training method, device, equipment, medium and product

Info

Publication number: CN115204261A
Application number: CN202210699010.3A
Authority: CN
Inventors: 涂晓招
Original assignee: Bank of China Financial Technology Co Ltd
Current assignee: Bank of China Financial Technology Co Ltd
Priority date: 2022-06-20
Filing date: 2022-06-20
Publication date: 2022-10-18

Abstract

The invention provides a small sample data set model training method, device, equipment, medium and product. The method comprises: determining a source domain sample set corresponding to a target domain sample set, and pre-training an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model; retaining the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initializing the weight parameters of the detection network in the pre-trained Faster-RCNN model; and training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model. The rich supervision information of the source domain thereby compensates for the shortage of target domain samples, improving the accuracy of a model trained on a small sample target domain.

Description

Small sample data set model training method, device, equipment, medium and product

Technical Field

The invention relates to the technical field of deep learning, in particular to a small sample data set model training method, a device, equipment, a medium and a product.

Background

Visual understanding of images or video is a long-standing and challenging problem in computer vision. One approach to visual understanding is to collect a set of image data and build a model that learns to recognize the targets annotated in that data.

Over the past few years, many deep learning methods have achieved excellent performance in target detection. However, the success of these deep detectors depends largely on large-scale detection benchmarks with fully annotated bounding boxes, which are extremely costly to produce. In some scenarios the number of fully labeled training samples is limited by various factors, which degrades the detection performance of the deep detector.

Disclosure of Invention

The invention provides a small sample data set model training method, device, equipment, medium and product, which overcome the defect in the prior art that the detection performance of a deep detector suffers when the training set contains too few samples, and which improve the accuracy of a small sample target detection algorithm.

The invention provides a small sample data set model training method, which comprises the following steps:

determining a source domain sample set corresponding to a target domain sample set, and pre-training an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model;

retaining the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initializing the weight parameters of the detection network in the pre-trained Faster-RCNN model;

training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, and identifying a target domain object through the target Faster-RCNN model.

According to the small sample data set model training method provided by the invention, training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model specifically comprises the following steps:

inputting the target domain sample set into the feature extraction network in the pre-trained Faster-RCNN model to obtain the target domain sample features output by that feature extraction network;

inputting the target domain sample features into the region generation network in the pre-trained Faster-RCNN model to obtain the target domain sample candidate domain feature information output by that region generation network;

inputting the target domain sample candidate domain feature information into the randomly initialized detection network for regression and classification to obtain the target domain sample prediction label output by the randomly initialized detection network;

and adjusting the class weight parameters and the position regression parameters of the randomly initialized detection network based on the target domain sample real label and the target domain sample prediction label to obtain a trained target Faster-RCNN model.

According to the small sample data set model training method provided by the invention, the adjusting of the class weight parameter and the position regression parameter of the detection network after random initialization based on the target domain sample real label and the target domain sample prediction label specifically comprises the following steps:

calculating a classification loss function and a regression loss function corresponding to the target domain sample real label and the target domain sample prediction label;

adjusting the class weight parameters of the classifiers in the detection network after random initialization according to the classification loss function;

and adjusting the position regression parameters in the detection network after random initialization according to the regression loss function.

According to the training method of the small sample data set model provided by the invention, the feature extraction network comprises a VGG16 convolutional neural network, the region generation network comprises an RPN network, and the detection network comprises a Fast-RCNN network.

According to the small sample data set model training method provided by the invention, the marginal probability distribution of the target domain corresponding to the target domain sample set differs from that of the source domain sample set, the conditional probability distribution of the target domain corresponding to the target domain sample set is the same as that of the source domain sample set, and the source domain sample set contains more samples than the target domain sample set.

The invention also provides a small sample data set model training device, which comprises:

the first training unit is used for determining a source domain sample set corresponding to a target domain sample set and pre-training an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model;

the initialization unit is used for retaining the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model and randomly initializing the weight parameters of the detection network in the pre-trained Faster-RCNN model;

and the second training unit is used for training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, so as to identify a target domain object through the target Faster-RCNN model.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the small sample data set model training method.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of training a small sample dataset model as described in any one of the above.

The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a method of training a small sample dataset model as described in any one of the above.

According to the small sample data set model training method, device, equipment, medium and product provided by the invention, a source domain sample set corresponding to a target domain sample set is determined, and an initialized Faster-RCNN model is pre-trained based on the source domain sample set to obtain a pre-trained Faster-RCNN model; the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model are retained, and the weight parameters of the detection network in the pre-trained Faster-RCNN model are randomly initialized; and the randomly initialized detection network is trained based on the target domain sample set to obtain a trained target Faster-RCNN model, through which target domain objects are identified. The initialized Faster-RCNN model is thus pre-trained on the source domain sample set, and only the detection network of the pre-trained model is trained on the small sample target domain sample set, so that the rich supervision information of the source domain compensates for the shortage of target domain samples and the accuracy of the target Faster-RCNN model trained on a small sample target domain is improved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow diagram of a small sample data set model training method provided by the present invention;

FIG. 2 is a schematic structural diagram of a small sample data set model training device provided by the present invention;

FIG. 3 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

A small sample dataset model training method of the present invention is described below in conjunction with fig. 1.

FIG. 1 is a schematic flow chart of a small sample data set model training method provided by the present invention. As shown in FIG. 1, the method includes:

Step 100, determining a source domain sample set corresponding to a target domain sample set, and pre-training an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model;

It should be noted that the small sample data set model training method provided by the present invention is mainly applied to the technical field of image classification and identification. In other words, to identify and detect an object, the method collects images of the object and uses the Faster-RCNN model to locate the object and identify its category.

In the invention, the source domain is a domain different from that of the test samples but rich in supervision information; the target domain is the domain of the test samples, in which the samples have no labels or only a small number of labels. The model learning task is the same for the source domain and the target domain.

In practical application, image resources for some target domains in industrial production environments are scarce and the real-time requirements are high. The method therefore uses only the source domain in the first training stage. Because the source domain has rich supervision information, pre-training on it both compensates for the small number of target domain samples and speeds up the later training on the target domain.

Specifically, the marginal probability distribution of the target domain corresponding to the target domain sample set differs from that of the source domain sample set, the conditional probability distribution of the target domain corresponding to the target domain sample set is the same as that of the source domain sample set, and the source domain sample set contains more samples than the target domain sample set.

Here, different marginal probability distributions means that the distribution obtained by summing the joint probability over all labels differs between the target domain sample set and the source domain sample set. That is, the target domain samples and the source domain samples have different data contents but related data characteristics.
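The three conditions above can be written compactly. The notation is an assumption introduced here for illustration and does not appear in the source: x denotes an image, y its label, the subscripts S and T denote the source and target domains, and D_S and D_T denote the two sample sets.

```latex
P_S(x) \neq P_T(x), \qquad
P_S(y \mid x) = P_T(y \mid x), \qquad
|D_S| > |D_T|,
\qquad \text{where } P(x) = \sum_{y} P(x, y).
```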

In practical application, because the target domain sample set contains too few labeled samples, a model trained only on the target domain sample set is not accurate enough. In the invention, the initialized Faster-RCNN model is therefore pre-trained on a source domain sample set with a sufficient number of labeled samples, which improves the detection accuracy of the Faster-RCNN model.

In the pre-training process, the source domain sample set can be divided into a training set and a test set, where the training set contains far more samples than the target domain sample set. The model parameters of the initialized Faster-RCNN model are adjusted iteratively on the training set, and the test set is used to check whether the adjusted model parameters have reached the optimum.
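The split described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `split_source_domain`, the 20% test fraction, the fixed seed and the file names are all assumptions, since the source does not specify how the source domain sample set is divided.

```python
import random


def split_source_domain(samples, test_fraction=0.2, seed=0):
    """Shuffle the source domain samples and split them into (train, test).

    The test fraction and the seed are illustrative assumptions.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for repeatability
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical source domain sample identifiers.
source_samples = [f"source_{i:04d}.jpg" for i in range(1000)]
train_set, test_set = split_source_domain(source_samples)
```

The training set is then used to adjust the model parameters, and the held-out test set is used only to evaluate them.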

Specifically, the Faster-RCNN model is a candidate-domain-based target detection algorithm. Target detection means using a detection algorithm to locate and classify targets in a given image to be detected. To locate the targets, the Faster-RCNN model first generates a number of pre-selection frames on the feature map of the image to be detected, then screens and processes the pre-selection frames to extract candidate domains, and finally performs classification and position regression on the extracted candidate domains.
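The source does not say how the pre-selection frames are screened. One common choice in Faster R-CNN implementations is non-maximum suppression (NMS) over scored boxes, and the sketch below assumes that choice. The box format `(x1, y1, x2, y2)`, the function names and the 0.5 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept after greedy non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two heavily overlapping pre-selection frames and one separate frame.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
candidate_indices = nms(boxes, scores)
```

In this example the second frame overlaps the first with an IoU of about 0.68, so it is suppressed and the first and third frames remain as candidate domains.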

Step 200, retaining the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initializing the weight parameters of the detection network in the pre-trained Faster-RCNN model;

The Faster-RCNN model comprises a feature extraction network, a region generation network and a detection network (a Fast-RCNN network).

Specifically, the feature extraction network in the invention comprises a VGG16 convolutional neural network, the region generation network comprises an RPN network, and the detection network comprises a Fast-RCNN network.

It should be noted that the 16 in VGG16 denotes its 16 weight layers, namely 13 convolutional layers and 3 fully connected layers. In the invention, the VGG16 network is a simple network built from stacked convolutional layers and has few hyperparameters. As the network deepens, the height, width and number of channels of the feature map change in a regular pattern: each pooling operation halves the height and width, and each group of convolutions doubles the number of channels until it reaches 512. The extracted feature map supports the subsequent extraction, classification and regression of candidate domains.
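The regular change in feature map size can be traced with a short calculation. This sketch uses the standard VGG16 block layout (2, 2, 3, 3, 3 convolutional layers with 64, 128, 256, 512, 512 output channels); the 224 x 224 input size is an assumption, as the source does not state an input resolution.

```python
# (number of conv layers, output channels) for each of the five VGG16 blocks.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
NUM_FC_LAYERS = 3


def vgg16_feature_shapes(height=224, width=224):
    """Return (height, width, channels) after each conv block and its pooling."""
    shapes = []
    for _num_convs, channels in VGG16_BLOCKS:
        # 3x3 convolutions with padding keep the spatial size;
        # the 2x2 pooling that follows each block halves it.
        height //= 2
        width //= 2
        shapes.append((height, width, channels))
    return shapes


num_conv_layers = sum(n for n, _ in VGG16_BLOCKS)
total_weight_layers = num_conv_layers + NUM_FC_LAYERS
shapes = vgg16_feature_shapes()
```

Under these assumptions the shapes run from 112 x 112 x 64 after the first block to 7 x 7 x 512 after the fifth, and the layer count comes to 13 + 3 = 16.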

In the invention, the target domain samples and the source domain samples have different data contents but related data characteristics. Therefore, after the pre-trained Faster-RCNN model is obtained, the weight parameters of the feature extraction network and the region generation network trained on the source domain sample set are used as the corresponding weight parameters of the Faster-RCNN model for identifying the target domain, while the weight parameters of the detection network trained on the source domain sample set are randomly initialized. The randomly initialized weight parameters of the detection network are then adjusted using the target domain sample set to obtain a detection network that can accurately identify the target domain.
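The weight handling described above can be sketched with plain dictionaries. This is a framework-free illustration under stated assumptions: the group names (`feature_extractor`, `rpn`, `detection`), the parameter names and the Gaussian initialization with standard deviation 0.01 are all hypothetical, since the source does not specify a storage layout or an initialization scheme.

```python
import copy
import random


def reinitialize_detection_head(pretrained, seed=0, std=0.01):
    """Copy the backbone and region network weights; redraw the detection weights."""
    rng = random.Random(seed)
    return {
        # Retained unchanged from source domain pre-training.
        "feature_extractor": copy.deepcopy(pretrained["feature_extractor"]),
        "rpn": copy.deepcopy(pretrained["rpn"]),
        # Randomly re-initialized, keeping only the shape of each parameter.
        "detection": {
            name: [rng.gauss(0.0, std) for _ in values]
            for name, values in pretrained["detection"].items()
        },
    }


# Hypothetical pre-trained weights, flattened to short lists for illustration.
pretrained = {
    "feature_extractor": {"conv1": [0.5, -0.2, 0.1]},
    "rpn": {"objectness": [0.3, 0.7]},
    "detection": {
        "class_weights": [1.0, 2.0, 3.0],
        "box_regression": [0.4, 0.5, 0.6, 0.7],
    },
}
target_model = reinitialize_detection_head(pretrained)
```

Deep copies are used so that later training on the target domain cannot modify the stored source domain weights by accident.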

Step 300, training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, and identifying a target domain object through the target Faster-RCNN model.

Specifically, during training on the target domain sample set, each parameter update adjusts only the weight parameters of the detection network, so that the classification and localization accuracy of the Faster-RCNN model on the target domain is gradually optimized while the feature extraction of the Faster-RCNN model remains at its pre-trained optimum.
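Updating only the detection network can be sketched as a gradient step that skips the frozen parameter groups. Plain stochastic gradient descent, the learning rate and the group names are assumptions made for illustration; the source does not name an optimizer.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one SGD update, changing only the groups listed in `trainable`."""
    updated = {}
    for group, values in params.items():
        if group in trainable:
            updated[group] = [w - lr * g for w, g in zip(values, grads[group])]
        else:
            # Frozen group: gradients are ignored and weights are copied as-is.
            updated[group] = list(values)
    return updated


# Hypothetical flattened parameters and gradients.
params = {"feature_extractor": [0.5, -0.2], "rpn": [0.3], "detection": [0.0, 0.0]}
grads = {"feature_extractor": [1.0, 1.0], "rpn": [1.0], "detection": [0.5, -0.5]}
new_params = sgd_step(params, grads, trainable={"detection"})
```

Even though gradients exist for every group, only the `detection` group moves, which mirrors the training rule described above.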

The invention provides a small sample data set model training method, which comprises: determining a source domain sample set corresponding to a target domain sample set, and pre-training an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model; retaining the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initializing the weight parameters of the detection network in the pre-trained Faster-RCNN model; and training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, through which target domain objects are identified. The initialized Faster-RCNN model is thus pre-trained on the source domain sample set, and only the detection network of the pre-trained model is trained on the small sample target domain sample set, so that the rich supervision information of the source domain compensates for the shortage of target domain samples and the accuracy of the target Faster-RCNN model trained on a small sample target domain is improved.

Based on the above embodiment, pre-training the initialized Faster-RCNN model based on the source domain sample set to obtain the pre-trained Faster-RCNN model specifically comprises the following steps:

inputting the source domain sample set into a feature extraction network in an initialized Faster-RCNN model to obtain source domain sample features output by the feature extraction network in the initialized Faster-RCNN model;

inputting the source domain sample characteristics into a region generation network in an initialized Faster-RCNN model to obtain source domain sample candidate domain characteristic information output by the region generation network in the initialized Faster-RCNN model;

inputting the source domain sample candidate domain feature information into the detection network in the initialized Faster-RCNN model for regression and classification to obtain the source domain sample prediction label output by the detection network in the initialized Faster-RCNN model;

and adjusting the weight parameters of the feature extraction network, the region generation network and the detection network in the initialized Faster-RCNN model according to the source domain sample real label and the source domain sample prediction label to obtain a pre-trained Faster-RCNN model.

Specifically, the source domain sample feature refers to a feature map extracted from each source domain sample in the source domain sample set, the source domain sample candidate domain feature information refers to a candidate domain further extracted from the feature map extracted from each source domain sample, and the source domain sample prediction label refers to a category label of a category to which the candidate domain extracted in the previous step belongs and a position label of a position of the candidate domain in the source domain sample.

In practical application, a source domain sample in the source domain sample set is passed through the feature extraction network to obtain a feature map, the region generation network extracts the candidate domains corresponding to the feature map, and the candidate domains are finally sent to the detection network. The classifier of the detection network determines the category of each candidate domain, and the regressor of the detection network performs position regression on the candidate domains that belong to a category, thereby completing target detection.
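The data flow through the three networks can be sketched as a composition of three callables. The toy stand-ins below are purely hypothetical and exist only to make the flow runnable: an identity "feature extractor", a "region generator" that proposes the left and right halves of the map, and a "detector" that labels a region by its mean value. None of them reflects the real networks.

```python
def detect(image, feature_extractor, region_generator, detector):
    """Run feature extraction, candidate domain extraction, then detection."""
    feature_map = feature_extractor(image)
    candidates = region_generator(feature_map)
    return [detector(feature_map, box) for box in candidates]


def toy_feature_extractor(image):
    return image  # stand-in: the "feature map" is the image itself


def toy_region_generator(feature_map):
    h, w = len(feature_map), len(feature_map[0])
    return [(0, 0, w // 2, h), (w // 2, 0, w, h)]  # left half, right half


def toy_detector(feature_map, box):
    x1, y1, x2, y2 = box
    values = [feature_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    label = "object" if sum(values) / len(values) > 0.5 else "background"
    return label, box  # category label plus position label


image = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
predictions = detect(image, toy_feature_extractor, toy_region_generator, toy_detector)
```

Each prediction pairs a category label with a position, matching the two parts of the prediction label described above.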

Based on the above embodiment, training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model specifically includes:

inputting the target domain sample set into the feature extraction network in the pre-trained Faster-RCNN model to obtain the target domain sample features output by that feature extraction network;

inputting the target domain sample features into the region generation network in the pre-trained Faster-RCNN model to obtain the target domain sample candidate domain feature information output by that region generation network;

inputting the candidate domain feature information of the target domain sample into the detection network after random initialization for regression and classification to obtain a target domain sample prediction label output by the detection network after random initialization;

and adjusting the category weight parameters and the position regression parameters of the detection network after random initialization based on the real target domain sample labels and the target domain sample prediction labels to obtain a trained target Faster-RCNN model.

Specifically, the target domain sample feature refers to a feature map extracted from each target domain sample in the target domain sample set, the target domain sample candidate domain feature information refers to a candidate domain further extracted from the feature map extracted from each target domain sample, and the target domain sample prediction label refers to a category label of a category to which the candidate domain extracted in the previous step belongs and a position label of a position of the candidate domain in the target domain sample.

In practical application, a target domain sample in a target domain sample set is subjected to a pre-trained feature extraction network to obtain a feature map, then a candidate domain corresponding to the feature map is extracted by utilizing a pre-trained region generation network, and finally the candidate domain is sent to a detection network to be trained, so that the detection network to be trained is trained through the target domain sample set.

Specifically, a classification loss function and a regression loss function corresponding to a target domain sample real label and a target domain sample prediction label are calculated; adjusting the class weight parameters of the classifiers in the detection network after random initialization according to the classification loss function; and adjusting the position regression parameters in the detection network after random initialization according to the regression loss function.

In other words, in the process of training the target Faster-RCNN model for identifying the target domain, only the two groups of weight parameters of the detection network (the class weight parameters and the position regression parameters) of the Faster-RCNN model pre-trained on the source domain sample set are trained using the target domain sample set. The rich supervision information of the source domain thereby compensates for the shortage of target domain samples, and the accuracy of the target Faster-RCNN model trained on a small sample target domain is improved.
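The source does not name the classification and regression loss functions. Faster R-CNN implementations commonly use softmax cross-entropy for classification and smooth L1 for box regression, and the sketch below assumes that pairing. The function names and the `beta` parameter are illustrative.

```python
import math


def cross_entropy(logits, true_class):
    """Softmax cross-entropy for one candidate domain (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_class]


def smooth_l1(pred_box, true_box, beta=1.0):
    """Smooth L1 loss summed over the box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


# One candidate domain: predicted class scores and predicted box offsets.
classification_loss = cross_entropy([0.0, 0.0], true_class=0)
regression_loss = smooth_l1([0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 2.0, 0.0])
```

Under this assumption the classification loss drives the class weight parameters and the regression loss drives the position regression parameters, as the text describes.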

The following describes the small sample dataset model training apparatus provided by the present invention, and the small sample dataset model training apparatus described below and the small sample dataset model training method described above may be referred to in correspondence with each other.

FIG. 2 is a schematic structural diagram of a small sample data set model training apparatus provided by the present invention. As shown in FIG. 2, the small sample data set model training apparatus includes: a first training unit 210, configured to determine a source domain sample set corresponding to a target domain sample set, and pre-train an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model; an initialization unit 220, configured to retain the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initialize the weight parameters of the detection network in the pre-trained Faster-RCNN model; and a second training unit 230, configured to train the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, so as to identify a target domain object through the target Faster-RCNN model.

The small sample data set model training device provided by the invention determines a source domain sample set corresponding to a target domain sample set and pre-trains an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model; retains the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initializes the weight parameters of the detection network in the pre-trained Faster-RCNN model; and trains the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, through which target domain objects are identified. The initialized Faster-RCNN model is thus pre-trained on the source domain sample set, and only the detection network of the pre-trained model is trained on the small sample target domain sample set, so that the rich supervision information of the source domain compensates for the shortage of target domain samples and the accuracy of the target Faster-RCNN model trained on a small sample target domain is improved.

FIG. 3 illustrates the physical structure of an electronic device. As shown in FIG. 3, the electronic device may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a small sample data set model training method comprising: determining a source domain sample set corresponding to a target domain sample set, and pre-training an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model; retaining the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initializing the weight parameters of the detection network in the pre-trained Faster-RCNN model; and training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, and identifying a target domain object through the target Faster-RCNN model.

In addition, when the logic instructions in the memory 330 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is capable of executing the small sample data set model training method provided above, the method comprising: determining a source domain sample set corresponding to a target domain sample set, and pre-training an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model; retaining the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initializing the weight parameters of the detection network in the pre-trained Faster-RCNN model; and training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, and identifying a target domain object through the target Faster-RCNN model.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the small sample data set model training method provided above, the method comprising: determining a source domain sample set corresponding to a target domain sample set, and pre-training an initialized Faster-RCNN model based on the source domain sample set to obtain a pre-trained Faster-RCNN model; retaining the weight parameters of the feature extraction network and the region generation network in the pre-trained Faster-RCNN model, and randomly initializing the weight parameters of the detection network in the pre-trained Faster-RCNN model; and training the randomly initialized detection network based on the target domain sample set to obtain a trained target Faster-RCNN model, and identifying a target domain object through the target Faster-RCNN model.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A small sample data set model training method is characterized by comprising the following steps:

2. The small sample dataset model training method according to claim 1, wherein the training of the detection network after random initialization based on the target domain sample set to obtain a trained target Faster-RCNN model specifically comprises:

3. The small sample dataset model training method according to claim 2, wherein the adjusting the class weight parameter and the position regression parameter of the randomly initialized detection network based on the target domain sample real label and the target domain sample prediction label specifically comprises:

4. The method for training a small sample dataset model according to any of claims 1 to 3, wherein the feature extraction network comprises a VGG16 convolutional neural network, the region generation network comprises an RPN network, and the detection network comprises a Fast-RCNN network.

5. The small sample dataset model training method according to any one of claims 1 to 3, wherein the marginal probability distribution of the target domain corresponding to the target domain sample set is different from that of the source domain sample set, the conditional probability distribution of the target domain corresponding to the target domain sample set is the same as that of the source domain sample set, and the number of samples in the source domain sample set is greater than that of the target domain sample set.

6. A small sample dataset model training apparatus, comprising:

7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of training a small sample dataset model according to any one of claims 1 to 5 when executing the program.

8. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the small sample dataset model training method of any one of claims 1 to 5.

9. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a method for training a small sample dataset model according to any one of claims 1 to 5.