CN116091867A - Model training and image recognition method, device, equipment and storage medium
- Publication number: CN116091867A
- Application number: CN202310063908.6A
- Authority: CN (China)
- Prior art keywords: image, episode, samples, adaptive, sample
- Legal status: Granted
Classifications
- G06V10/774: Image or video recognition or understanding using pattern recognition or machine learning; processing image or video features in feature spaces; generating sets of training patterns, e.g. bagging or boosting
- G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
- Y02D10/00: Climate change mitigation technologies in ICT; energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a model training method, an image recognition method, an apparatus, a device, and a storage medium. The model training method comprises the following steps: randomly acquiring a plurality of image episodes from a source domain data set; constructing a task-aware self-adaptive learning network model; inputting the image episodes into the self-adaptive learning network model to obtain feature maps of the support samples and query samples in each image episode; determining a classification loss according to the feature maps of the support samples and the query samples, determining an adaptive loss according to the domain offset between the image episodes and a target domain data set, and determining an overall loss according to the classification loss and the adaptive loss; and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges. Because the domain offset is introduced into the loss function, the trained model can accommodate target data sets with different domain offsets, achieving more accurate image recognition.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a model training method, an image recognition method, an apparatus, a device, and a storage medium.
Background
Conventional deep learning models show excellent generalization performance when trained on a large number of labeled samples. However, abundant samples and reliable labels are difficult to obtain in practical applications such as rare disease diagnosis and fine-grained recognition. Inspired by the fact that humans can quickly learn new knowledge, small-sample (few-shot) learning aims to enable a model to recognize test samples well when each class has only a few labeled samples.
However, most current small-sample-learning image recognition models focus only on how to adapt quickly to new categories and do not consider the domain offset between the test task and the training domain.
Disclosure of Invention
The problem solved by the application is that the current small sample learning image recognition model does not consider the domain offset problem between the test task and the training domain.
To solve the above problem, a first aspect of the present application provides a model training method, including:
randomly acquiring a plurality of image episodes in a source domain data set, wherein each image episode comprises a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
constructing a task-aware self-adaptive learning network model;
Inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode;
determining a classification loss according to the feature maps of the support samples and the query samples, determining an adaptive loss according to the domain offset between the image episode and a target domain data set, and determining an overall loss according to the classification loss and the adaptive loss;
and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges.
A second aspect of the present application provides an image recognition method, including:
acquiring an image episode to be identified in a target domain data set, wherein the image episode to be identified comprises a plurality of types of support samples and query samples, and the support samples are marked;
obtaining a pre-trained self-adaptive learning network model, wherein the self-adaptive learning network model is obtained by training through the model training method;
adjusting the self-adaptive learning network model through the marked support sample;
determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
And determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
A third aspect of the present application provides a model training apparatus, comprising:
the training acquisition module is used for randomly acquiring a plurality of image episodes in the source domain data set, each image episode comprises a plurality of categories of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
the model construction module is used for constructing a task-aware self-adaptive learning network model;
the feature acquisition module is used for inputting the image episode into the self-adaptive learning network model to obtain feature graphs of a support sample and a query sample in the image episode;
a loss determination module for determining a classification loss according to the feature maps of the support samples and the query samples, determining an adaptive loss according to the domain offset between the image episode and a target domain data set, and determining an overall loss according to the classification loss and the adaptive loss;
and the model training module is used for adjusting the self-adaptive learning network model according to the integral loss until the integral loss converges.
A fourth aspect of the present application provides an image recognition apparatus, comprising:
the test acquisition module is used for acquiring an image episode to be identified in the target domain data set, wherein the image episode to be identified comprises a plurality of types of support samples and query samples, and the support samples are marked;
the model acquisition module is used for acquiring a pre-trained self-adaptive learning network model, and the self-adaptive learning network model is obtained by training the model training method;
the model adjustment module is used for adjusting the self-adaptive learning network model through the marked support samples;
the model output module is used for determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
and the category determining module is used for determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
A fifth aspect of the present application provides a terminal device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is coupled to the memory for executing the program for executing the model training method described above, or for executing the image recognition method described above.
A sixth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program for execution by a processor to implement the model training method described above, or to implement the image recognition method described above.
In the method, because the domain offset is introduced into the loss function, the trained model can accommodate target data sets with different domain offsets, achieving more accurate image recognition.
According to the domain offset between the task to be tested and the source domain, an optimal task-specific parameter strategy is adaptively learned for each test task, so that different optimal inference network structures are obtained for tasks with different domain offsets, improving the accuracy of small-sample image recognition.
Drawings
FIG. 1 is a flow chart of a model training method according to one embodiment of the present application;
FIG. 2 is a schematic diagram of an adaptive learning network model according to the present application;
FIG. 3 is a flow chart of a model training method based on an adaptive learning network model according to one embodiment of the present application;
FIG. 4 is a flow chart of a model training method based on residual blocks according to one embodiment of the present application;
FIG. 5 is a flow chart of a model training method based on an adaptive module layer according to one embodiment of the present application;
FIG. 6 is a flow chart of a model training method based on a gating network according to one embodiment of the present application;
FIG. 7 is a flow chart of an image recognition method according to one embodiment of the present application;
FIG. 8 is a block diagram of a model training apparatus according to one embodiment of the present application;
FIG. 9 is a block diagram of an image recognition device according to one embodiment of the present application;
FIG. 10 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order to make the above objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
Conventional deep learning models show excellent generalization performance when trained on a large number of labeled samples. However, abundant samples and reliable labels are difficult to obtain in practical applications such as rare disease diagnosis and fine-grained recognition. Inspired by the fact that humans can quickly learn new knowledge, small-sample (few-shot) learning aims to enable a model to recognize test samples well when each class has only a few labeled samples.
Recently, small-sample-learning image recognition methods based on meta-learning have made great progress when the test data come from the source domain. However, more and more studies have shown that the generalization ability of these methods is significantly inadequate when the test data are heterogeneous with the source domain. This limitation arises because most small-sample-learning image recognition models focus only on how to adapt quickly to new classes, with little or no effort to understand and solve the domain offset between the test task and the training domain.
To address these problems, the application provides a novel model training scheme: by introducing the domain offset between the image episodes and the target domain data set to determine the self-adaptive loss, it solves the problem that current small-sample-learning image recognition models do not consider the domain offset between the test task and the training domain.
For ease of understanding, the following terms that may be used are explained herein:
activation function: each neuron node in the neural network receives the output value of the neuron of the upper layer as the input value of the neuron, and transmits the input value to the next layer, and the input layer neuron node directly transmits the input attribute value to the next layer (hidden layer or output layer); in a multi-layer neural network, there is a functional relationship between the output of an upper node and the input of a lower node, and this function is called an activation function.
Source domain: the domain that differs from that of the test samples but has rich supervision information.
Target domain: the domain of the test samples, which has no labels or only a small number of labels. In general, the source domain and the target domain involve the same class of tasks but have different distributions.
The embodiment of the application provides a model training method, which can be executed by a model training device; the model training device can be integrated in an electronic device such as a pad, a computer, a server cluster, or a data center. FIG. 1 is a flow chart of a model training method according to one embodiment of the present application; the model training method comprises the following steps:
S100, randomly acquiring a plurality of image episodes in a source domain data set, wherein each image episode comprises a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
In this application, the source domain data set is used for training the model, and the trained model is used for recognizing the images to be identified in the target domain data set. The data in the source domain data set are training data, and the data in the target domain data set are test data.
The source domain data set may be an ImageNet image data set, or may be another source, and the method for obtaining the source domain data set is not limited in this application.
It should be noted that, the labeling modes of the support sample and the query sample in the image episode are not limited in this application.
In one embodiment, the plurality of categories of the support samples in the image episode are the same as the plurality of categories of the query samples.
In the method, a better training effect is achieved because the support samples and the query samples share the same categories.
In one embodiment, each category contains 1-5 support samples in the image episode.
Each sampled image episode T contains a support sample set S and a query sample set Q, namely:

T = {S, Q}, S = {(x_i, y_i)}_{i=1}^{N×K}, Q = {(x_j, y_j)}_{j=1}^{N×M}

where N is the number of categories in the episode, K is the number of support samples per category, and M is the number of query samples per category.
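For illustration only, the following is a minimal sketch of such episodic sampling, assuming a generic dataset structure that maps each class label to its list of images (all names and parameters here are illustrative and not taken from the application):

```python
import random

def sample_episode(dataset, n_way=5, k_shot=5, m_query=15):
    """Randomly sample one N-way K-shot image episode from a source-domain dataset.

    dataset: assumed dict mapping class label -> list of images.
    Returns a support set S and a query set Q of (image, label) pairs.
    """
    classes = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for label in classes:
        images = random.sample(dataset[label], k_shot + m_query)
        support += [(img, label) for img in images[:k_shot]]
        query += [(img, label) for img in images[k_shot:]]
    return support, query
```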
S200, constructing a task-aware self-adaptive learning network model;
the self-adaptive learning network model is used for small sample image recognition.
S300, inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode;
S400, determining a classification loss according to the feature maps of the support samples and the query samples, determining a self-adaptive loss according to the domain offset between the image episode and the target domain data set, and determining an overall loss according to the classification loss and the self-adaptive loss;
The image episode comes from the source domain data set, so the domain offset between the image episode and the target domain data set is the domain offset between the source domain data set and the target domain data set.
In one embodiment, to determine the classification loss according to the feature maps of the support samples and the query samples, the method predicts the category of each query sample through the labeled support samples and then determines the classification loss from the predicted category and the labeled category of the query sample.
S500, adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges.
In the present application, the model is adjusted according to the overall loss by means of back propagation.
In one embodiment, the back-propagated gradient is calculated through the soft decision gradient to achieve better convergence.
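For illustration only, a minimal sketch of this idea, assuming the soft decision gradient is used in the straight-through manner (the hard gate is used in the forward pass while the gradient flows through the soft decision; this interpretation is an assumption):

```python
import torch

def straight_through_gate(soft_decision: torch.Tensor, threshold: float = 0.5):
    # Forward pass: discrete 0/1 gate; backward pass: gradient of the soft decision.
    hard = (soft_decision >= threshold).float()
    return hard + soft_decision - soft_decision.detach()
```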
In the method, because the domain offset is introduced into the loss function, the trained model can accommodate target data sets with different domain offsets, achieving more accurate image recognition.
FIG. 2 is a schematic diagram of the self-adaptive learning network model described below; the following description is made with reference to this figure.
In one embodiment, the adaptive learning network model includes a plurality of residual blocks and full connection layers connected in sequence;
as shown in fig. 3, the inputting the image episode into the adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode includes:
S301, performing feature extraction on an input image episode through a residual block to obtain an extracted intermediate feature map;
In the application, every two residual blocks form a group, and a plurality of groups of residual blocks are connected in sequence; as the groups of residual blocks extract features from the input image episodes layer by layer, the size of the extracted feature maps gradually decreases while their number doubles.
S302, performing linear transformation on the extracted feature map through a full connection layer to obtain feature maps of the support sample and the query sample.
In this application, the classifier shown in FIG. 3 is the fully connected layer. The fully connected (FC) layer acts as the "classifier" of the whole convolutional neural network.
The residual blocks form a residual network that extracts features from the input image episodes layer by layer, which avoids the vanishing-gradient problem of multi-layer networks; the linear transformation of the fully connected layer increases the complexity of the self-adaptive learning network model, so that the model can express more complex features.
In one embodiment, the residual block includes an adaptive module layer, a batch normalization layer, and a ReLU function;
as shown in fig. 4, the feature extraction of the input image episode by the residual block includes:
S311, performing feature extraction on the output of the last residual block through the self-adaptive module layer;
S312, normalizing the output of the self-adaptive module layer through a batch normalization layer;
the batch normalization layer (Batch Normalization, BN) is used for performing normalization processing on data, so that the model convergence speed is increased.
S313, mapping the normalized output result to an output end through a ReLU function.
In the method, within each residual block, batch normalization (BN) is applied after the convolution, and ReLU is used as the activation function after the identity-mapping branch is added.
In one embodiment, the adaptive module layer comprises a convolutional layer, a task adapter and a gating network, wherein the task adapter comprises a plurality of task parameter convolutional layers;
as shown in fig. 5, the feature extraction of the output of the last residual block by the adaptive module layer includes:
S321, performing first feature extraction on the output of the last residual block through a convolution layer;
In one embodiment, the first feature extraction is performed on the output of the last residual block through a 3×3 convolution layer.
S322, respectively carrying out second feature extraction on the output of the last residual block through a plurality of task parameter convolution layers in the task adapter;
In one embodiment, each task parameter convolution layer is a 1×1 convolution layer.
S323, generating a decision result based on the output of the last residual block through a gating network, and deciding whether each task parameter convolution layer in the task adapter is executed or not according to the decision result;
And S324, adding the result of the second feature extraction after the decision and the result of the first feature extraction to be used as the output of the adaptive module layer.
In the present application, the task adapters (TAs) are parallel to the 3×3 convolution layers; each adapter includes k task-specific 1×1 parameter convolution layers, and the gating network decides whether each task-specific parameter layer is executed.
Let f_l^{3×3} denote the 3×3 convolution layer of the l-th self-adaptive module. When the input to the l-th self-adaptive module is h_{l-1}, the features learned by the task adapter are combined with those learned by the 3×3 convolution layer, namely:

h_l = f_l^{3×3}(h_{l-1}) + Σ_{i=1}^{k} g_l^i · α_l^i(h_{l-1})

where α_l^i(·) denotes the task-specific learning function of the i-th layer of the task adapter, and g_l^i ∈ {0, 1} denotes the gate decision generated by the gating network, which determines whether the task-specific function of the i-th layer is executed; 1 indicates execution and 0 indicates non-execution.
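For illustration only, a PyTorch-style sketch of such a self-adaptive module, assuming the gate values come from the gating network described below (module and parameter names are illustrative):

```python
import torch
import torch.nn as nn

class AdaptiveModule(nn.Module):
    """3x3 convolution in parallel with k gated task-specific 1x1 convolutions."""
    def __init__(self, channels: int, k: int = 4):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.adapters = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in range(k)
        )

    def forward(self, h, gates):
        # gates: tensor of shape (k,), each entry in {0, 1} (or soft in [0, 1]).
        out = self.conv3x3(h)
        for g, adapter in zip(gates, self.adapters):
            out = out + g * adapter(h)  # contributes nothing when g == 0
        return out
```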
In one embodiment, the gating network includes a global average pooling layer, a class prototype layer, a 1×1 convolution layer, and an activation function;
as shown in fig. 6, the generating, by the gating network, a decision result based on the output of the last residual block includes:
S331, performing spatial dimension compression on the output of the last residual block through a global average pooling layer;
In the present application, the input of the gating network is the output h_{l-1} of the last residual block. The spatial dimensions of the feature map are first compressed via global average pooling (GAP), namely:

u_{l-1} = GAP(h_{l-1})
S332, determining the prototype feature of each category in the image episode at the current layer through the class prototype layer;
wherein the prototype feature of each category at the current layer can be obtained by the following formula:

p_n = (1/|S_n|) Σ_{(x,y)∈S_n} u_{l-1}(x)

where p_n denotes the prototype feature of class n and S_n denotes the set of support samples belonging to class n; P = [p_1, …, p_N] denotes the prototype features of all classes of the current task.
S333, generating a decision result via the 1×1 convolution layer and the activation function according to the prototype features of each category at the current layer.

The prototype features generate soft decisions through the 1×1 convolution layer (a linear function) and a Sigmoid activation function, namely:

g̃_l = Sigmoid(Conv_{1×1}(P))

Discrete decisions (the decision results) can then be generated by a simple thresholding algorithm, namely:

g_l^i = 1 if g̃_l^i ≥ 0.5, and g_l^i = 0 otherwise

As shown in FIG. 2, 0.6, 0.2, 1, …, 0.3 are the soft decisions, and 1, 0, 1, …, 0 are the generated discrete decisions (hard decisions, i.e. the decision results).
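For illustration only, a sketch of the gating network under the formulas above (GAP compression, class prototypes averaged over support samples, the 1×1 convolution realized as a linear layer, Sigmoid, and a 0.5 threshold; names and shapes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GatingNetwork(nn.Module):
    def __init__(self, channels: int, k: int):
        super().__init__()
        self.fc = nn.Linear(channels, k)  # plays the role of the 1x1 convolution

    def forward(self, h_prev, support_labels):
        # Compress spatial dimensions: u has shape (num_support, channels).
        u = h_prev.mean(dim=(2, 3))
        # Class prototypes: mean feature of the support samples of each class.
        protos = torch.stack([
            u[support_labels == c].mean(dim=0) for c in support_labels.unique()
        ])
        soft = torch.sigmoid(self.fc(protos.mean(dim=0)))  # soft decisions, shape (k,)
        hard = (soft >= 0.5).float()                       # discrete decisions
        return soft, hard
```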
In one embodiment, when determining the classification loss according to the feature maps of the support samples and the query samples, determining the adaptive loss according to the domain offset between the image episode and the target domain data set, and determining the overall loss according to the classification loss and the adaptive loss (S400), the domain offset between the image episode and the target domain data set is quantified by the maximum mean discrepancy (MMD).
In one embodiment, the quantization formula for the domain offset is:

T = || (1/N_b) Σ_{i=1}^{N_b} φ(P_i) - (1/N_s) Σ_{j=1}^{N_s} φ(x_j) ||_H^2
  = (1/N_b^2) Σ_i Σ_j k(P_i, P_j) - (2/(N_b N_s)) Σ_i Σ_j k(P_i, x_j) + (1/N_s^2) Σ_i Σ_j k(x_i, x_j)

where φ(·) maps features to the reproducing kernel Hilbert space, P_i is the class prototype of class i in the source domain data set, N_b is the number of categories of the source domain data set, P_j is the class prototype of class j in the source domain data set, k(·, ·) is a kernel function, N_s is the number of samples of the support set, x_j is the feature of the j-th support sample, and the norm ||·||_H on the left of the second equal sign denotes the MMD metric in the reproducing kernel Hilbert space.
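For illustration only, a sketch of this quantization with a Gaussian kernel (the kernel choice and bandwidth are assumptions; P holds the source-domain class prototypes and X the support-sample features of the task):

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma^2))
    return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

def domain_offset_mmd(P, X, sigma=1.0):
    """Squared MMD between source class prototypes P (N_b, d) and support features X (N_s, d)."""
    k_pp = gaussian_kernel(P, P, sigma).mean()
    k_xx = gaussian_kernel(X, X, sigma).mean()
    k_px = gaussian_kernel(P, X, sigma).mean()
    return k_pp + k_xx - 2 * k_px
```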
In one embodiment, the adaptive loss function is:

L_ada = (1/L) Σ_{l=1}^{L} Σ_i (g̃_l^i - T)^2

where L is the number of self-adaptive modules, T is the domain offset, i indexes the layers of the task adapter (the i-th layer), and g̃_l^i is the soft decision of the i-th layer of the task adapter in the l-th self-adaptive module.
In one embodiment, the overall loss function is:

L_total = L_cls + λ · L_ada

where λ is a hyperparameter used to balance the weights of the two losses, L_ada is the adaptive loss, and L_cls is the classification loss.
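For illustration only, a sketch of one training step combining the two losses, under the adaptive-loss form assumed above (the model interface returning query logits together with the soft decisions of all adapter layers is an assumption):

```python
import torch.nn.functional as F

def training_step(model, optimizer, episode, T, lam=0.1):
    support, query, query_labels = episode
    logits, soft_decisions = model(support, query)      # assumed model interface
    cls_loss = F.cross_entropy(logits, query_labels)    # classification loss
    ada_loss = ((soft_decisions - T) ** 2).mean()       # adaptive loss, assumed form
    loss = cls_loss + lam * ada_loss                    # overall loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()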
The embodiment of the application provides an image recognition method, which can be executed by an image recognition device; the image recognition device can be integrated in an electronic device such as a pad, a computer, a server cluster, or a data center. FIG. 7 is a flowchart of an image recognition method according to one embodiment of the present application; the image recognition method comprises the following steps:
S10, acquiring an image episode to be identified in a target domain data set, wherein the image episode to be identified comprises a plurality of types of support samples and query samples, and the support samples are marked;
In this step, unlike in the model training method, the query samples of the image episode to be identified are not labeled.
Wherein, each image episode to be identified is independently subjected to image identification.
S20, acquiring a pre-trained self-adaptive learning network model, wherein the self-adaptive learning network model is obtained by training through the model training method;
S30, adjusting the self-adaptive learning network model through the marked support samples;
in one embodiment, the adapting the adaptive learning network model by the annotated support samples includes: inputting the marked support sample into the self-adaptive learning network model to obtain a feature map of the support sample; determining prototype features of each category at the current layer according to the feature map and the marked category of the support sample; calculating cross entropy loss according to the distance between each support sample and the prototype feature of the category in the current layer; optimizing the self-adaptive learning network model according to the cross entropy loss until convergence, and obtaining the adjusted self-adaptive learning network model.
The cross entropy loss is the classification loss in the model training method.
The determining manner of the prototype feature of each category in the current layer is described in the model training method, and is not described in detail in this step.
The distance between each support sample and the prototype feature of its category at the current layer may be a Euclidean distance.
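For illustration only, a sketch of this support-based adjustment (the distance-based cross entropy over class prototypes follows the description above; the model interface and step count are assumptions):

```python
import torch
import torch.nn.functional as F

def adapt_on_support(model, optimizer, support_imgs, support_labels, steps=20):
    """Fine-tune the model on the labeled support set via prototype-distance cross entropy."""
    classes = support_labels.unique()                      # sorted unique class labels
    targets = torch.searchsorted(classes, support_labels)  # map labels to 0..N-1 indices
    for _ in range(steps):
        feats = model(support_imgs)                        # assumed: (num_support, d) features
        protos = torch.stack([
            feats[support_labels == c].mean(dim=0) for c in classes
        ])
        logits = -torch.cdist(feats, protos)               # negative Euclidean distances
        loss = F.cross_entropy(logits, targets)            # cross entropy loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```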
S40, determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
after the adaptive learning network model is adjusted, the feature map of the support sample needs to be redetermined by the adjusted adaptive learning network model.
S50, determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
In one embodiment, the determining the category of the query sample according to the feature map of the query sample and the feature map of the labeled support sample includes: determining prototype features of each category at the current layer according to the feature map and the marked category of the support sample; determining the distance between each query sample and the prototype feature of the category in the current layer according to the feature map of each query sample; and selecting the category with the smallest distance as the identification result of the query sample.
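For illustration only, a sketch of this nearest-prototype classification using the Euclidean distance, assuming the feature maps have already been compressed to vectors:

```python
import torch

def classify_queries(support_feats, support_labels, query_feats):
    """Assign each query sample the class of its nearest class prototype."""
    classes = support_labels.unique()
    protos = torch.stack([
        support_feats[support_labels == c].mean(dim=0) for c in classes
    ])
    dists = torch.cdist(query_feats, protos)   # (num_query, num_classes) Euclidean distances
    return classes[dists.argmin(dim=1)]        # smallest distance -> predicted category
```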
According to the domain offset between the task to be tested and the source domain, an optimal task-specific parameter strategy is adaptively learned for each test task, so that different optimal inference network structures are obtained for tasks with different domain offsets, improving the accuracy of small-sample image recognition.
In the above model training method, the category of each query sample is predicted through the labeled support samples; the specific process is the same as determining the category of the query sample in the image recognition method (steps S20-S50), except that during model training the feature maps of the support samples and query samples are obtained through the not-yet-trained self-adaptive learning network model, whereas during image recognition they are obtained through the pre-trained self-adaptive learning network model. For this reason, the specific process of predicting the category of the query sample through the labeled support samples in the model training method is not described again.
The embodiment of the application provides a model training device, which is used for executing the model training method described in the content of the application, and the model training device is described in detail below.
As shown in fig. 8, the model training apparatus includes:
a training acquisition module 101, configured to randomly acquire a plurality of image episodes in a source domain data set, where each image episode includes a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are labeled;
a model building module 102 for building a task aware adaptive learning network model;
the feature acquisition module 103 is configured to input the image episode into the adaptive learning network model to obtain feature graphs of a support sample and a query sample in the image episode;
a loss determination module 104, configured to determine a classification loss according to the feature maps of the support samples and the query samples, determine an adaptive loss according to the domain offset between the image episode and a target domain data set, and determine an overall loss according to the classification loss and the adaptive loss;
a model training module 105 for adjusting the adaptive learning network model according to the overall loss until the overall loss converges.
In one embodiment, the adaptive learning network model includes a plurality of residual blocks and full connection layers connected in sequence;
The feature acquisition module 103 is further configured to:
feature extraction is carried out on the input image episode through a residual block to obtain an extracted intermediate feature map;
and carrying out linear transformation on the extracted feature map through a full connection layer to obtain feature maps of the support samples and the query samples.
In one embodiment, the residual block includes an adaptive module layer, a batch normalization layer, and a ReLU function;
the feature acquisition module 103 is further configured to:
performing feature extraction on the output of the last residual block through the self-adaptive module layer;
normalizing the output of the self-adaptive module through a batch normalization layer;
and mapping the normalized output result to an output end through a ReLU function.
In one embodiment, the adaptive module layer comprises a convolutional layer, a task adapter and a gating network, wherein the task adapter comprises a plurality of task parameter convolutional layers;
the feature acquisition module 103 is further configured to:
performing first feature extraction on the output of the last residual block through a convolution layer;
respectively carrying out second feature extraction on the output of the last residual block through a plurality of task parameter convolution layers in the task adapter;
generating a decision result based on the output of the last residual block through a gating network, and deciding whether each task parameter convolution layer in the task adapter is executed or not according to the decision result;
And adding the result of the second feature extraction after decision and the result of the first feature extraction to be used as the output of the adaptive module layer.
In one embodiment, the gating network includes a global average pooling layer, a class prototype layer, a 1×1 convolution layer, and an activation function;
the feature acquisition module 103 is further configured to:
carrying out space dimension compression on the output of the last residual block through a global average pooling layer;
determining the prototype feature of each category in the image episode at the current layer through the class prototype layer;
generating the decision result via the 1×1 convolution layer and the activation function according to the prototype features of each category at the current layer.
The model training device provided by the above embodiment of the present application and the model training method provided by the embodiment of the present application are the same inventive concept, and have the same beneficial effects as the method adopted, operated or implemented by the application program stored therein.
An embodiment of the present application provides an image recognition device, which is used for executing the image recognition method described in the foregoing content of the present application, and the image recognition device is described in detail below.
As shown in fig. 9, the image recognition apparatus includes:
a test acquisition module 201, configured to acquire an image episode to be identified in a target domain data set, where the image episode to be identified includes a plurality of types of support samples and query samples, and the support samples are labeled;
A model obtaining module 202, configured to obtain a pre-trained adaptive learning network model, where the adaptive learning network model is obtained by training by the model training method described above;
a model adjustment module 203, configured to adjust the adaptive learning network model through the labeled support sample;
the model output module 204 is configured to determine, through the adjusted adaptive learning network model, a feature map of a support sample and a query sample in the image episode to be identified;
the category determining module 205 is configured to determine a category of the query sample according to the feature map of the query sample and the feature map of the labeled support sample.
The image recognition device provided by the above embodiment of the present application and the image recognition method provided by the embodiment of the present application are the same inventive concept, and have the same advantages as the method adopted, operated or implemented by the application program stored therein.
The internal functions and structures of the model training apparatus/image recognition apparatus are described above. As shown in FIG. 10, in practice the model training apparatus/image recognition apparatus may be implemented as a terminal device, including: a memory 301 and a processor 303.
The memory 301 may be configured to store a program.
In addition, the memory 301 may also be configured to store other various data to support operations on the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, contact data, phonebook data, messages, pictures, video, etc.
The memory 301 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A processor 303 coupled to the memory 301 for executing programs in the memory 301 for:
randomly acquiring a plurality of image episodes in a source domain data set, wherein each image episode comprises a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
constructing a task-aware self-adaptive learning network model;
inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode;
Determining a classification loss according to the feature maps of the support samples and the query samples, determining an adaptive loss according to the domain offset between the image episode and a target domain data set, and determining an overall loss according to the classification loss and the adaptive loss;
and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges.
In one embodiment, the adaptive learning network model includes a plurality of residual blocks and full connection layers connected in sequence;
the processor 303 is specifically configured to:
feature extraction is carried out on the input image episode through a residual block to obtain an extracted intermediate feature map;
and carrying out linear transformation on the extracted feature map through a full connection layer to obtain feature maps of the support samples and the query samples.
In one embodiment, the residual block includes an adaptive module layer, a batch normalization layer, and a ReLU function;
the processor 303 is specifically configured to:
performing feature extraction on the output of the last residual block through the self-adaptive module layer;
normalizing the output of the self-adaptive module through a batch normalization layer;
and mapping the normalized output result to an output end through a ReLU function.
In one embodiment, the adaptive module layer comprises a convolutional layer, a task adapter and a gating network, wherein the task adapter comprises a plurality of task parameter convolutional layers;
the processor 303 is specifically configured to:
performing first feature extraction on the output of the last residual block through a convolution layer;
respectively carrying out second feature extraction on the output of the last residual block through a plurality of task parameter convolution layers in the task adapter;
generating a decision result based on the output of the last residual block through a gating network, and deciding whether each task parameter convolution layer in the task adapter is executed or not according to the decision result;
and adding the result of the second feature extraction after decision and the result of the first feature extraction to be used as the output of the adaptive module layer.
In one embodiment, the gating network includes a global average pooling layer, a class prototype layer, a 1×1 convolution layer, and an activation function;
the processor 303 is specifically configured to:
carrying out space dimension compression on the output of the last residual block through a global average pooling layer;
determining the prototype feature of each category in the image episode at the current layer through the class prototype layer;
generating the decision result via the 1×1 convolution layer and the activation function according to the prototype features of each category at the current layer.
Alternatively, the processor 303 is coupled to the memory 301 for executing the program in the memory 301 for:
acquiring an image episode to be identified in a target domain data set, wherein the image episode to be identified comprises a plurality of types of support samples and query samples, and the support samples are marked;
obtaining a pre-trained self-adaptive learning network model, wherein the self-adaptive learning network model is obtained by training through the model training method;
adjusting the self-adaptive learning network model through the marked support sample;
determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
and determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
In this application, only some components are schematically shown in fig. 10, which does not mean that the terminal device only includes the components shown in fig. 10.
The terminal device provided in this embodiment, which is the same as the model training method or the image recognition method provided in this embodiment of the present application, has the same advantages as the method adopted, operated or implemented by the application program stored therein, because of the same inventive concept.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
The present application also provides a computer readable storage medium corresponding to the model training method or the image recognition method provided in the foregoing embodiments, on which a computer program (i.e., a program product) is stored, which when executed by a processor, performs the model training method provided in any of the foregoing embodiments, or performs the image recognition method provided in any of the foregoing embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
The computer readable storage medium provided by the above embodiment of the present application has the same advantages as the method adopted, operated or implemented by the application program stored therein, because the same inventive concept is adopted by the model training method or the image recognition method provided by the embodiment of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a(n) …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.
Claims (10)
1. A method of model training, comprising:
randomly acquiring a plurality of image episodes in a source domain data set, wherein each image episode comprises a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
constructing a task-aware self-adaptive learning network model;
inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode;
determining a classification loss according to the feature maps of the support samples and the query samples, determining an adaptive loss according to the domain offset between the image episode and a target domain data set, and determining an overall loss according to the classification loss and the adaptive loss;
and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges.
2. The method of claim 1, wherein the adaptive learning network model comprises a plurality of residual blocks and full connection layers connected in sequence;
inputting the image episode into the adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode, including:
feature extraction is carried out on the input image episode through a residual block to obtain an extracted intermediate feature map;
and carrying out linear transformation on the extracted feature map through a full connection layer to obtain feature maps of the support samples and the query samples.
3. The method of claim 2, wherein the residual block comprises an adaptation module layer, a batch normalization layer, and a ReLU function;
the feature extraction of the input image episode by the residual block includes:
performing feature extraction on the output of the last residual block through the self-adaptive module layer;
normalizing the output of the self-adaptive module through a batch normalization layer;
and mapping the normalized output result to an output end through a ReLU function.
4. The method of claim 3, wherein the adaptive module layer comprises a convolutional layer, a task adapter, and a gating network, the task adapter comprising a plurality of task parameter convolutional layers;
The feature extraction of the output of the last residual block by the adaptive module layer comprises:
performing first feature extraction on the output of the last residual block through a convolution layer;
respectively carrying out second feature extraction on the output of the last residual block through a plurality of task parameter convolution layers in the task adapter;
generating a decision result based on the output of the last residual block through a gating network, and deciding whether each task parameter convolution layer in the task adapter is executed or not according to the decision result;
and adding the result of the second feature extraction after decision and the result of the first feature extraction to be used as the output of the adaptive module layer.
5. The method of claim 4, wherein the gating network comprises a global average pooling layer, a class prototype layer, a 1×1 convolution layer, and an activation function;
the generating, by the gating network, a decision result based on an output of a last residual block, including:
carrying out space dimension compression on the output of the last residual block through a global average pooling layer;
determining the prototype feature of each category in the image episode at the current layer through the class prototype layer;
generating the decision result via the 1×1 convolution layer and the activation function according to the prototype features of each category at the current layer.
6. An image recognition method, comprising:
acquiring an image episode to be identified in a target domain data set, wherein the image episode to be identified comprises a plurality of types of support samples and query samples, and the support samples are marked;
obtaining a pre-trained self-adaptive learning network model, wherein the self-adaptive learning network model is obtained by training a model training method according to any one of claims 1-5;
adjusting the self-adaptive learning network model through the marked support sample;
determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
and determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
7. A model training device, comprising:
a training acquisition module for randomly acquiring a plurality of image episodes in a source domain dataset, each image episode comprising support samples of a plurality of categories and query samples, the support samples and the query samples in each image episode being labeled;
a model construction module for constructing a task-aware adaptive learning network model;
a feature acquisition module for inputting an image episode into the adaptive learning network model to obtain feature maps of the support samples and the query samples in the image episode;
a loss determination module for determining a classification loss according to the feature maps of the support samples and the query samples, determining an adaptive loss according to the domain offset between the image episode and a target domain dataset, and determining an overall loss according to the classification loss and the adaptive loss;
and a model training module for adjusting the adaptive learning network model according to the overall loss until the overall loss converges.
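The loss determination module combines two terms whose exact forms the claims leave open. The sketch below therefore assumes a nearest-prototype cross-entropy for the classification loss and a squared mean-feature distance as a proxy for the domain offset in the adaptive loss; the weighting factor `weight` is likewise an assumption.

```python
import torch
import torch.nn.functional as F

def overall_loss(support_feats, support_y, query_feats, query_y,
                 episode_feats, target_feats, weight=0.5):
    """Hypothetical overall loss per claim 7: classification loss + weighted adaptive loss."""
    # Classification loss: nearest-prototype cross-entropy over the query samples.
    protos = torch.stack([support_feats[support_y == c].mean(0) for c in support_y.unique()])
    logits = F.cosine_similarity(query_feats.unsqueeze(1), protos.unsqueeze(0), dim=-1)
    cls_loss = F.cross_entropy(logits, query_y)

    # Adaptive loss: domain offset between episode features and target domain features.
    ada_loss = (episode_feats.mean(0) - target_feats.mean(0)).pow(2).sum()

    return cls_loss + weight * ada_loss  # overall loss driving adjustment until convergence
```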
8. An image recognition apparatus, comprising:
a test acquisition module for acquiring an image episode to be identified in a target domain dataset, wherein the image episode to be identified comprises support samples of a plurality of categories and query samples, and the support samples are labeled;
a model acquisition module for obtaining a pre-trained adaptive learning network model, the adaptive learning network model being trained by the model training method according to any one of claims 1-5;
a model adjustment module for adjusting the adaptive learning network model with the labeled support samples;
a model output module for determining feature maps of the support samples and the query samples in the image episode to be identified through the adjusted adaptive learning network model;
and a category determination module for determining the category of each query sample according to its feature map and the feature maps of the labeled support samples.
9. A terminal device, comprising: a memory and a processor;
the memory is configured to store a program;
and the processor, coupled to the memory, is configured to execute the program to perform the model training method of any one of claims 1-5 or the image recognition method of claim 6.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the model training method of any one of claims 1-5 or the image recognition method of claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310063908.6A CN116091867B (en) | 2023-01-12 | 2023-01-12 | Model training and image recognition method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116091867A true CN116091867A (en) | 2023-05-09 |
CN116091867B CN116091867B (en) | 2023-09-29 |
Family
ID=86204138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310063908.6A Active CN116091867B (en) | 2023-01-12 | 2023-01-12 | Model training and image recognition method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116091867B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116863216A (en) * | 2023-06-30 | 2023-10-10 | 国网湖北省电力有限公司武汉供电公司 | Depth field adaptive image classification method, system and medium based on data manifold geometry |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020083073A1 (en) * | 2018-10-23 | 2020-04-30 | 苏州科达科技股份有限公司 | Non-motorized vehicle image multi-label classification method, system, device and storage medium |
CN109447149A (en) * | 2018-10-25 | 2019-03-08 | 腾讯科技(深圳)有限公司 | A kind of training method of detection model, device and terminal device |
US20200143209A1 (en) * | 2018-11-07 | 2020-05-07 | Element Ai Inc. | Task dependent adaptive metric for classifying pieces of data |
US20210003700A1 (en) * | 2019-07-02 | 2021-01-07 | Wuyi University | Method and apparatus for enhancing semantic features of sar image oriented small set of samples |
US20210319263A1 (en) * | 2020-04-13 | 2021-10-14 | International Business Machines Corporation | System and method for augmenting few-shot object classification with semantic information from multiple sources |
CN111858991A (en) * | 2020-08-06 | 2020-10-30 | 南京大学 | Small sample learning algorithm based on covariance measurement |
CN112990282A (en) * | 2021-03-03 | 2021-06-18 | 华南理工大学 | Method and device for classifying fine-grained small sample images |
CN114511521A (en) * | 2022-01-21 | 2022-05-17 | 浙江大学 | Tire defect detection method based on multiple representations and multiple sub-field self-adaption |
CN115239946A (en) * | 2022-06-30 | 2022-10-25 | 锋睿领创(珠海)科技有限公司 | Small sample transfer learning training and target detection method, device, equipment and medium |
CN115270872A (en) * | 2022-07-26 | 2022-11-01 | 中山大学 | Radar radiation source individual small sample learning and identifying method, system, device and medium |
Non-Patent Citations (2)
Title |
---|
YURONG GUO et al.: "Learning Calibrated Class Centers for Few-Shot Classification by Pair-Wise Similarity", IEEE Transactions on Image Processing *
杨晨曦; 左?; 孙频捷: "Research progress on zero-shot learning methods based on autoencoders" (in Chinese), 现代计算机 (Modern Computer), no. 01 *
Also Published As
Publication number | Publication date |
---|---|
CN116091867B (en) | 2023-09-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||