CN116091867B - Model training and image recognition method, device, equipment and storage medium

Model training and image recognition method, device, equipment and storage medium

Info

Publication number
CN116091867B
CN116091867B (application CN202310063908.6A)
Authority
CN
China
Prior art keywords
image
adaptive
episode
layer
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310063908.6A
Other languages
Chinese (zh)
Other versions
CN116091867A
Inventor
Zhanyu Ma
Yurong Guo
Ruoyi Du
Kongming Liang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202310063908.6A
Publication of CN116091867A
Application granted
Publication of CN116091867B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a model training and image recognition method, device, equipment and storage medium, wherein the method comprises the following steps: randomly acquiring a plurality of image episodes in a source domain data set; constructing a task-aware self-adaptive learning network model; inputting the image episodes into the self-adaptive learning network model to obtain feature maps of the support samples and query samples in each image episode; determining a classification loss according to the feature maps of the support samples and query samples, determining an adaptive loss according to the domain offset between the image episodes and a target domain data set, and determining an overall loss according to the classification loss and the adaptive loss; and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges. By introducing the domain offset into the loss function, the trained model can accommodate target data sets with different domain offsets, achieving a more accurate image recognition effect.

Description

Model training and image recognition method, device, equipment and storage medium
Technical Field
The application relates to the technical field of image processing, and in particular to a model training method, an image recognition method, an apparatus, a device and a storage medium.
Background
The conventional deep learning model shows excellent generalization performance when trained on a large number of labeled samples. However, abundant samples and reliable labels are difficult to obtain in practical applications such as rare disease diagnosis and fine-grained recognition. Inspired by the way humans quickly learn new knowledge, small sample learning aims to enable a model to recognize the samples to be tested well when each class has only a small number of labeled samples.
However, most current small sample learning image recognition models only focus on how to adapt quickly to new categories, and do not consider the domain offset problem between the test task and the training domain.
Disclosure of Invention
The application addresses the problem that current small sample learning image recognition models do not consider the domain offset between a test task and the training domain.
To solve the above problems, a first aspect of the present application provides a model training method, including:
randomly acquiring a plurality of image episodes in a source domain data set, wherein each image episode comprises a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
constructing a task-aware self-adaptive learning network model;
inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode;
determining classification loss according to the feature graphs of the support samples and the query samples, determining adaptive loss according to the domain offset of the image episode and the target domain data set, and determining overall loss according to the classification loss and the adaptive loss;
and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges.
The second aspect of the present application provides an image recognition method, comprising:
acquiring an image episode to be identified in a target domain data set, wherein the image episode to be identified comprises a plurality of types of support samples and query samples, and the support samples are marked;
obtaining a pre-trained self-adaptive learning network model, wherein the self-adaptive learning network model is obtained by training through the model training method;
adjusting the self-adaptive learning network model through the marked support sample;
determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
and determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
A third aspect of the present application provides a model training apparatus comprising:
the training acquisition module is used for randomly acquiring a plurality of image episodes in the source domain data set, each image episode comprises a plurality of categories of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
the model construction module is used for constructing a task-aware self-adaptive learning network model;
the feature acquisition module is used for inputting the image episode into the self-adaptive learning network model to obtain feature graphs of a support sample and a query sample in the image episode;
a loss determination module for determining a classification loss according to a feature map of the support sample and the query sample, an adaptive loss according to a domain offset of the image episode and a target domain dataset, and an overall loss according to the classification loss and the adaptive loss;
and the model training module is used for adjusting the self-adaptive learning network model according to the integral loss until the integral loss converges.
A fourth aspect of the present application provides an image recognition apparatus, comprising:
the test acquisition module is used for acquiring an image episode to be identified in the target domain data set, wherein the image episode to be identified comprises a plurality of categories of support samples and query samples, and the support samples are marked;
the model acquisition module is used for acquiring a pre-trained self-adaptive learning network model, and the self-adaptive learning network model is obtained by training through the model training method;
the model adjustment module is used for adjusting the self-adaptive learning network model through the marked support samples;
the model output module is used for determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
and the category determining module is used for determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
A fifth aspect of the present application provides a terminal device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is coupled to the memory for executing the program for executing the model training method described above, or for executing the image recognition method described above.
A sixth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program for execution by a processor to implement the model training method described above, or to implement the image recognition method described above.
According to the application, the domain offset is introduced into the loss function, so that the trained model can accommodate target data sets with different domain offsets, achieving a more accurate image recognition effect.
According to the domain offset between the task to be tested and the source domain, the application adaptively learns an optimal task-specific parameter strategy for each test task; tasks to be tested with different domain offsets thus obtain different optimal inference network structures, which improves the accuracy of small sample image recognition.
Drawings
FIG. 1 is a flow chart of a model training method according to one embodiment of the application;
FIG. 2 is a schematic diagram of an adaptive learning network model according to the present application;
FIG. 3 is a flow chart of a model training method based on an adaptive learning network model according to one embodiment of the present application;
FIG. 4 is a flow chart of a model training method based on residual blocks according to one embodiment of the application;
FIG. 5 is a flow chart of a model training method based on an adaptive module layer according to one embodiment of the present application;
FIG. 6 is a flow chart of a model training method based on a gating network according to one embodiment of the present application;
FIG. 7 is a flow chart of an image recognition method according to one embodiment of the present application;
FIG. 8 is a block diagram of a model training apparatus according to one embodiment of the present application;
fig. 9 is a block diagram of an image recognition apparatus according to an embodiment of the present application;
fig. 10 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
The conventional deep learning model shows excellent generalization performance when trained on a large number of labeled samples. However, abundant samples and reliable labels are difficult to obtain in practical applications such as rare disease diagnosis and fine-grained recognition. Inspired by the way humans quickly learn new knowledge, small sample learning aims to enable a model to recognize the samples to be tested well when each class has only a small number of labeled samples.
Recently, small sample learning image recognition methods based on meta learning have made considerable progress when the test data are drawn from the source domain. However, more and more studies show that the generalization ability of these methods is significantly inadequate when the test data are heterogeneous with the source domain. This limitation arises because most small sample learning image recognition models only focus on how to quickly adapt to new classes, with little or no effort to understand and solve the domain offset problem between the test task and the training domain.
Aiming at the problems, the application provides a new model training scheme, which can solve the problem that the current small sample learning image recognition model does not consider the domain offset between a test task and a training domain by introducing the domain offset of an image episode and a target domain data set to determine the self-adaptive loss.
For ease of understanding, the following terms that may be used are explained herein:
activation function: each neuron node in the neural network receives the output value of the neuron of the upper layer as the input value of the neuron, and transmits the input value to the next layer, and the input layer neuron node directly transmits the input attribute value to the next layer (hidden layer or output layer); in a multi-layer neural network, there is a functional relationship between the output of an upper node and the input of a lower node, and this function is called an activation function.
Source domain: a domain different from that of the test samples, but with rich supervision information.
Target domain: the domain of the test samples, with no labels or only a small number of labels. In general, the source domain and the target domain belong to the same class of tasks but have different distributions.
The embodiment of the application provides a model training method which can be executed by a model training device, and the model training device can be integrated in electronic equipment such as a tablet computer, a personal computer, a server cluster or a data center. FIG. 1 is a flow chart of a model training method according to one embodiment of the application; the model training method comprises the following steps:
S100, randomly acquiring a plurality of image episodes in a source domain data set, wherein each image episode comprises a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
in the application, the source domain data set is used for training a model, and the trained model is used for identifying the image to be identified in the target domain data set. The data in the source domain data set is training data, and the data in the target domain data set is test data.
The source domain data set may be an ImageNet image data set or other sources, and the method for acquiring the source domain data set is not limited in the application.
It should be noted that, the labeling modes of the support sample and the query sample in the image episode are not limited in the present application.
In one embodiment, the plurality of categories of the support samples in the image episode are the same as the plurality of categories of the query samples.
In the application, better training effect is achieved through the support samples and the query samples of the same category.
In one embodiment, each category contains 1-5 support samples in the image episode.
Each sampled image episode contains a support sample set S and a query sample set Q, namely:
$$T = S \cup Q$$
where the support sample set is denoted $S=\{(x_i, y_i)\}_{i=1}^{N_s}$ and the query sample set is denoted $Q=\{(x_j, y_j)\}_{j=1}^{N_q}$.
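As an illustrative sketch (not part of the patent text), episode sampling as described above might be implemented as follows; the dataset layout, the function name sample_episode, and the default episode sizes (5-way, 5-shot, 15 queries per class) are assumptions for illustration only.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, n_query=15):
    """Randomly sample one image episode T = S (support) u Q (query).
    `dataset` is assumed to be an iterable of (image, label) pairs."""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)
    classes = random.sample(list(by_class), n_way)     # pick N categories at random
    support, query = [], []
    for c in classes:
        images = random.sample(by_class[c], k_shot + n_query)
        support += [(img, c) for img in images[:k_shot]]   # labeled support samples
        query += [(img, c) for img in images[k_shot:]]     # labeled query samples
    return support, query
```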
S200, constructing a task-aware self-adaptive learning network model;
the self-adaptive learning network model is used for small sample image recognition.
S300, inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode;
s400, determining classification loss according to the feature graphs of the support samples and the query samples, determining self-adaptive loss according to the domain offset of the image episode and the target domain data set, and determining overall loss according to the classification loss and the self-adaptive loss;
the image episode is from a source domain data set, so that the domain offset between the image episode and a target domain data set is the domain offset between the source domain data set and the target domain data set.
In one embodiment, when determining the classification loss according to the feature maps of the support samples and the query samples, the method predicts the category of each query sample through the labeled support samples, and then determines the classification loss from the predicted category and the labeled category of the query sample.
S500, adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges.
In the application, the model is adjusted according to the overall loss by a back propagation mode.
In one embodiment, the gradient for back propagation is computed from the soft decisions rather than the discrete decisions, which achieves better convergence.
According to the application, the domain offset is introduced into the loss function, so that the trained model can give consideration to the target data sets with different domain offsets, and a more accurate image recognition effect is achieved.
FIG. 2 is a schematic diagram of the adaptive learning network model; the following description is made with reference to this figure.
In one embodiment, the adaptive learning network model includes a plurality of residual blocks and full connection layers connected in sequence;
as shown in fig. 3, the inputting the image episode into the adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode includes:
s301, performing feature extraction on an input image episode through a residual block to obtain an extracted intermediate feature map;
in the application, every two residual blocks form a group, and multiple groups of residual blocks are connected in sequence; as the groups of residual blocks extract features from the input image episodes layer by layer, the spatial size of the extracted feature maps gradually decreases while the number of feature maps doubles.
S302, performing linear transformation on the extracted feature map through a full connection layer to obtain feature maps of the support sample and the query sample.
In the present application, the classifier shown in fig. 3 is the fully connected layer. The fully connected layer (FC) acts as the "classifier" of the whole convolutional neural network.
The residual blocks form a residual network that extracts features from the input image episodes layer by layer, which avoids the vanishing-gradient problem in a multi-layer network; the linear transformation of the fully connected layer increases the capacity of the self-adaptive learning network model, so that the model can express more complex features.
In one embodiment, the residual block includes an adaptive module layer, a batch normalization layer, and a ReLU function;
as shown in fig. 4, the feature extraction of the input image episode by the residual block includes:
s311, extracting the characteristics of the output of the last residual block through the self-adaptive module layer;
s312, normalizing the output of the self-adaptive module through a batch normalization layer;
the batch normalization layer (Batch Normalization, BN) is used for performing normalization processing on data, so that the model convergence speed is increased.
S313, mapping the normalized output result to an output end through a ReLU function.
In the application, the residual block performs batch normalization (BN) after the convolution, and uses ReLU as the activation function after the identity (direct-mapping) branch is added.
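For illustration only, a minimal PyTorch sketch of the residual block just described; here a plain 3×3 convolution stands in for the adaptive module layer detailed below, and the class and variable names are not from the patent.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Convolution -> batch normalization, with ReLU applied after the
    identity (skip) branch is added, as described above."""
    def __init__(self, channels):
        super().__init__()
        # A plain 3x3 convolution stands in for the adaptive module layer.
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, h):
        out = self.bn(self.conv(h))   # normalize after the convolution
        return torch.relu(out + h)    # ReLU after adding the identity branch
```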
In one embodiment, the adaptive module layer comprises a convolutional layer, a task adapter and a gating network, wherein the task adapter comprises a plurality of task parameter convolutional layers;
as shown in fig. 5, the feature extraction of the output of the last residual block by the adaptive module layer includes:
s321, performing first feature extraction on the output of the last residual block through a convolution layer;
in one embodiment, the first feature extraction is performed on the output of the last residual block by a 3×3 convolution layer.
S322, respectively carrying out second feature extraction on the output of the last residual block through a plurality of task parameter convolution layers in the task adapter;
in one embodiment, the task parameter convolution layer is a 1×1 convolution layer.
S323, generating a decision result based on the output of the last residual block through a gating network, and deciding whether each task parameter convolution layer in the task adapter is executed or not according to the decision result;
S324, adding the result of the second feature extraction after decision and the result of the first feature extraction to be used as the output of the adaptive module layer.
In the present application, the task adapters (TAs) are parallel to the 3×3 convolution layers, and each adapter contains k task-specific 1×1 parameter convolution layers; whether each task-specific parameter layer is executed is decided by the gating network.
Let $f_l(\cdot)$ denote the 3×3 convolution layer of the $l$-th adaptive module. When the input to the $l$-th adaptive module is $h_{l-1}$, the features learned by the task adapter are combined with those learned by the 3×3 convolution layer, namely:
$$h_l = f_l(h_{l-1}) + \sum_{i=1}^{k} g_i^l \, \alpha_i^l(h_{l-1})$$
where $\alpha_i^l(\cdot)$ is the task-specific learning function of the $i$-th layer of the task adapter, and $g_i^l \in \{0,1\}$ is the gate decision generated by the gating network, which determines whether the task-specific function of the $i$-th layer is executed, where 1 indicates execution and 0 indicates non-execution.
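A minimal PyTorch sketch of the gated combination above; the channel count, the number of adapter layers k, and the class name are illustrative assumptions, and the gate vector is produced by the gating network described next. Multiplying by soft (0 to 1) gate values during training keeps the sum differentiable.

```python
import torch
import torch.nn as nn

class AdaptiveModuleLayer(nn.Module):
    """h_l = f_l(h_{l-1}) + sum_i g_i^l * alpha_i^l(h_{l-1})."""
    def __init__(self, channels, k=4):
        super().__init__()
        self.f = nn.Conv2d(channels, channels, 3, padding=1, bias=False)  # 3x3 branch
        # k task-specific 1x1 parameter layers (the task adapter)
        self.adapters = nn.ModuleList(
            nn.Conv2d(channels, channels, 1, bias=False) for _ in range(k)
        )

    def forward(self, h, gates):
        # gates: tensor of shape (k,); 1 executes an adapter layer, 0 skips it
        out = self.f(h)
        for g, alpha in zip(gates, self.adapters):
            out = out + g * alpha(h)   # gated task-specific features
        return out
```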
In one embodiment, the gating network includes a global average pooling layer, a class prototype layer, a 1×1 convolution layer, and an activation function;
as shown in fig. 6, the generating, by the gating network, a decision result based on the output of the last residual block includes:
s331, carrying out space dimension compression on the output of the last residual block through a global average pooling layer;
In the application, the input of the gating network is the output $h_{l-1}$ of the last residual block. The spatial dimensions of the feature map are first compressed via global average pooling, namely:
$$u_{l-1} = \mathrm{GAP}(h_{l-1})$$
where $h_{l-1} \in \mathbb{R}^{C \times H \times W}$ and $u_{l-1} \in \mathbb{R}^{C}$.
S332, determining the prototype feature of each category in the image episode at the current layer through the class prototype layer;
Wherein, the prototype feature of each category at the current layer can be obtained by the following formula:
$$p_n = \frac{1}{|S_n|} \sum_{(x,\, y) \in S_n} u_{l-1}(x)$$
where $p_n$ denotes the prototype feature of class $n$, $S_n$ denotes the set of samples belonging to class $n$, and $P = [p_1, \dots, p_N]$ denotes the prototype features of all classes of the current task.
S333, generating a decision result via the 1×1 convolution layer and the activation function according to the prototype feature of each category at the current layer.
The prototype features generate soft decisions through a 1×1 linear function and a Sigmoid activation function, namely:
$$\tilde{g}^l = \sigma\!\left(W^l P\right)$$
where $W^l$ denotes the weight of the 1×1 convolution layer, $\sigma$ denotes the Sigmoid activation function, and $\tilde{g}^l \in (0,1)^k$.
Discrete decisions (the decision results) can then be generated by a simple thresholding algorithm, namely:
$$g_i^l = \begin{cases} 1, & \tilde{g}_i^l \ge 0.5 \\ 0, & \text{otherwise} \end{cases}$$
In fig. 2, 0.6, 0.2, 1, ..., 0.3 are the soft decisions, and 1, 0, 1, ..., 0 are the generated discrete decisions (hard decisions, i.e. the decision results).
In one embodiment of S400 (determining the classification loss according to the feature maps of the support samples and the query samples, determining the adaptive loss according to the domain offset between the image episode and the target domain data set, and determining the overall loss according to the classification loss and the adaptive loss), the domain offset between the image episode and the target domain data set is quantified by the maximum mean discrepancy (MMD).
In one embodiment, the quantization formula for the domain offset is:
$$t = \left\| \frac{1}{N_b} \sum_{i=1}^{N_b} \phi(P_i) - \frac{1}{N_s} \sum_{j=1}^{N_s} \phi(x_j) \right\|_{\mathcal{H}}^2 = \frac{1}{N_b^2} \sum_{i=1}^{N_b} \sum_{j=1}^{N_b} k(P_i, P_j) - \frac{2}{N_b N_s} \sum_{i=1}^{N_b} \sum_{j=1}^{N_s} k(P_i, x_j) + \frac{1}{N_s^2} \sum_{i=1}^{N_s} \sum_{j=1}^{N_s} k(x_i, x_j)$$
where $\phi(\cdot)$ maps features to the regenerated (reproducing kernel) Hilbert space $\mathcal{H}$, $P_i$ is the class prototype of class $i$ in the source domain dataset, $N_b$ denotes the number of categories of the source domain dataset, $P_j$ is the class prototype of class $j$ in the source domain dataset, $k(\cdot,\cdot)$ is a kernel function, $N_s$ is the number of samples of the support set, $x_j$ denotes a support sample, and the norm on the left of the equals sign denotes the MMD metric in the regenerated Hilbert space.
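A sketch of quantifying the domain offset t with the kernel form of the squared MMD; the Gaussian kernel and its bandwidth are assumptions, and only the prototype-versus-support-sample structure comes from the text above.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

def domain_offset(source_prototypes, support_features, sigma=1.0):
    """Squared MMD between source-domain class prototypes (N_b, d)
    and the support samples of the episode to be tested (N_s, d)."""
    k_pp = gaussian_kernel(source_prototypes, source_prototypes, sigma).mean()
    k_xx = gaussian_kernel(support_features, support_features, sigma).mean()
    k_px = gaussian_kernel(source_prototypes, support_features, sigma).mean()
    return k_pp + k_xx - 2.0 * k_px
```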
In one embodiment, the adaptive loss function is:
$$\mathcal{L}_{ada} = \frac{1-t}{L} \sum_{l=1}^{L} \sum_{i} \tilde{g}_i^l$$
where $L$ is the number of adaptive modules, $t$ is the domain offset, $i$ is the layer index of the task adapter and denotes its $i$-th layer, and $\tilde{g}_i^l$ is the soft decision of the $i$-th layer of the task adapter in the $l$-th adaptive module, so that the number of executed task-specific layers adapts to the magnitude of the domain offset.
In one embodiment, the overall loss function is:
$$\mathcal{L} = \mathcal{L}_{cls} + \lambda \, \mathcal{L}_{ada}$$
where $\lambda$ is a hyperparameter used to balance the weights of the two losses, $\mathcal{L}_{ada}$ is the adaptive loss, and $\mathcal{L}_{cls}$ is the classification loss.
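Putting the two terms together, a hedged sketch of the overall loss; the prototype-distance classification loss, the gate-penalty form of the adaptive term, and the default lam value are assumptions that follow the reconstructed formulas above rather than a verbatim implementation.

```python
import torch
import torch.nn.functional as F

def overall_loss(query_feats, query_labels, prototypes, soft_gates, t, lam=0.1):
    # Classification loss: softmax over negative Euclidean distances to the
    # class prototypes; query_labels are class indices aligned with prototypes.
    logits = -torch.cdist(query_feats, prototypes)    # (N_q, N_classes)
    cls_loss = F.cross_entropy(logits, query_labels)
    # Adaptive loss: mean soft decision weighted by the domain offset t
    # (assumed form of the preset adaptive loss function).
    ada_loss = (1.0 - t) * soft_gates.mean()
    return cls_loss + lam * ada_loss                  # lambda balances the two losses
```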
The embodiment of the application provides an image recognition method which can be executed by an image recognition device, wherein the image recognition device can be integrated in electronic equipment such as a pad, a computer, a server cluster, a data center and the like. FIG. 7 is a flow chart of a method of image recognition according to one embodiment of the application; the image recognition method comprises the following steps:
S10, acquiring an image episode to be identified in a target domain data set, wherein the image episode to be identified comprises a plurality of categories of support samples and query samples, and the support samples are marked;
in this step, unlike in the model training method, the query sample of the image episode to be identified is not labeled.
Wherein, each image episode to be identified is independently subjected to image identification.
S20, acquiring a pre-trained self-adaptive learning network model, wherein the self-adaptive learning network model is obtained by training through the model training method;
s30, adjusting the self-adaptive learning network model through the marked support sample;
in one embodiment, the adapting the adaptive learning network model by the annotated support samples includes: inputting the marked support sample into the self-adaptive learning network model to obtain a feature map of the support sample; determining prototype features of each category at the current layer according to the feature map and the marked category of the support sample; calculating cross entropy loss according to the distance between each support sample and the prototype feature of the category in the current layer; optimizing the self-adaptive learning network model according to the cross entropy loss until convergence, and obtaining the adjusted self-adaptive learning network model.
The cross entropy loss is the classification loss in the model training method.
The determining manner of the prototype feature of each category in the current layer is described in the model training method, and is not described in detail in this step.
The distance between each support sample and the prototype feature of the class at the current layer may be a euclidean distance.
S40, determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
after the adaptive learning network model is adjusted, the feature map of the support sample needs to be redetermined by the adjusted adaptive learning network model.
S50, determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
In one embodiment, the determining the category of the query sample according to the feature map of the query sample and the feature map of the labeled support sample includes: determining prototype features of each category at the current layer according to the feature map and the marked category of the support sample; determining the distance between each query sample and the prototype feature of the category in the current layer according to the feature map of each query sample; and selecting the category with the smallest distance as the identification result of the query sample.
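A sketch of the nearest-prototype classification described above, under the assumption (suggested earlier for the support samples) that Euclidean distance is used; the function name and tensor shapes are illustrative.

```python
import torch

def classify_queries(query_feats, support_feats, support_labels):
    """Assign each query sample the class of its nearest prototype."""
    classes = support_labels.unique()
    protos = torch.stack([support_feats[support_labels == c].mean(0)
                          for c in classes])        # one prototype per class
    dists = torch.cdist(query_feats, protos)        # Euclidean distances
    return classes[dists.argmin(dim=1)]             # smallest-distance class
```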
According to the domain offset between the task to be tested and the source domain, the application adaptively learns an optimal task-specific parameter strategy for each test task; tasks to be tested with different domain offsets thus obtain different optimal inference network structures, which improves the accuracy of small sample image recognition.
In the above model training method, the category of a query sample is predicted through the labeled support samples; the specific process is the same as determining the category of a query sample in the image recognition method (steps S20 to S50), except that the model training method obtains the feature maps of the support samples and query samples through the untrained self-adaptive learning network model, whereas the image recognition method obtains them through the pre-trained self-adaptive learning network model. The specific process of predicting the category of a query sample in the model training method is therefore not repeated here.
The embodiment of the application provides a model training device for executing the model training method disclosed by the application, and the model training device is described in detail below.
As shown in fig. 8, the model training apparatus includes:
a training acquisition module 101, configured to randomly acquire a plurality of image episodes in a source domain data set, where each image episode includes a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are labeled;
a model building module 102 for building a task aware adaptive learning network model;
the feature acquisition module 103 is configured to input the image episode into the adaptive learning network model to obtain feature graphs of a support sample and a query sample in the image episode;
a loss determination module 104, configured to determine a classification loss according to a feature map of the support sample and the query sample, determine an adaptive loss according to a domain offset of the image episode and a target domain data set, and determine an overall loss according to the classification loss and the adaptive loss;
a model training module 105 for adjusting the adaptive learning network model according to the overall loss until the overall loss converges.
In one embodiment, the adaptive learning network model includes a plurality of residual blocks and full connection layers connected in sequence;
The feature acquisition module 103 is further configured to:
feature extraction is carried out on the input image episode through a residual block, and an extracted intermediate feature map is obtained;
and carrying out linear transformation on the extracted feature images through a full connection layer to obtain feature images of the support samples and the query samples.
In one embodiment, the residual block includes an adaptive module layer, a batch normalization layer, and a ReLU function;
the feature acquisition module 103 is further configured to:
performing feature extraction on the output of the last residual block through the self-adaptive module layer;
normalizing the output of the self-adaptive module through a batch normalization layer;
and mapping the normalized output result to an output end through a ReLU function.
In one embodiment, the adaptive module layer comprises a convolutional layer, a task adapter and a gating network, wherein the task adapter comprises a plurality of task parameter convolutional layers;
the feature acquisition module 103 is further configured to:
performing first feature extraction on the output of the last residual block through a convolution layer;
respectively carrying out second feature extraction on the output of the last residual block through a plurality of task parameter convolution layers in the task adapter;
generating a decision result based on the output of the last residual block through a gating network, and deciding whether each task parameter convolution layer in the task adapter is executed or not according to the decision result;
and adding the result of the second feature extraction after decision and the result of the first feature extraction to be used as the output of the adaptive module layer.
In one embodiment, the gating network includes a global average pooling layer, a class prototype layer, a 1×1 convolution layer, and an activation function;
the feature acquisition module 103 is further configured to:
carrying out space dimension compression on the output of the last residual block through a global average pooling layer;
determining prototype features of each category in the image episode at the current layer through the class prototype layer;
the decision results are generated via the 1×1 convolution layer and the activation function based on the prototype features of each class at the current layer.
The model training device provided by the above embodiment of the application is based on the same inventive concept as the model training method provided by the embodiment of the application, and therefore has the same beneficial effects as that method.
An embodiment of the present application provides an image recognition apparatus for performing the image recognition method according to the foregoing aspect of the present application, and the image recognition apparatus is described in detail below.
As shown in fig. 9, the image recognition apparatus includes:
a test acquisition module 201, configured to acquire an image episode to be identified in a target domain data set, where the image episode to be identified includes a plurality of types of support samples and query samples, and the support samples are labeled;
a model obtaining module 202, configured to obtain a pre-trained adaptive learning network model, where the adaptive learning network model is obtained by training through the model training method described above;
a model adjustment module 203, configured to adjust the adaptive learning network model through the labeled support sample;
the model output module 204 is configured to determine, through the adjusted adaptive learning network model, a feature map of a support sample and a query sample in the image episode to be identified;
the category determining module 205 is configured to determine a category of the query sample according to the feature map of the query sample and the feature map of the labeled support sample.
The image recognition device provided by the above embodiment of the application is based on the same inventive concept as the image recognition method provided by the embodiment of the application, and therefore has the same beneficial effects as that method.
The internal functions and structures of the model training apparatus/image recognition apparatus are described above, and as shown in fig. 10, in practice, the model training apparatus/image recognition apparatus may be implemented as a terminal device, including: memory 301 and processor 303.
The memory 301 may be configured to store a program.
In addition, the memory 301 may also be configured to store other various data to support operations on the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, contact data, phonebook data, messages, pictures, video, etc.
The memory 301 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A processor 303 coupled to the memory 301 for executing programs in the memory 301 for:
randomly acquiring a plurality of image episodes in a source domain data set, wherein each image episode comprises a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
constructing a task-aware self-adaptive learning network model;
inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode;
determining classification loss according to the feature graphs of the support samples and the query samples, determining adaptive loss according to the domain offset of the image episode and the target domain data set, and determining overall loss according to the classification loss and the adaptive loss;
and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges.
In one embodiment, the adaptive learning network model includes a plurality of residual blocks and full connection layers connected in sequence;
the processor 303 is specifically configured to:
feature extraction is carried out on the input image episode through a residual block, and an extracted intermediate feature map is obtained;
and carrying out linear transformation on the extracted feature images through a full connection layer to obtain feature images of the support samples and the query samples.
In one embodiment, the residual block includes an adaptive module layer, a batch normalization layer, and a ReLU function;
the processor 303 is specifically configured to:
performing feature extraction on the output of the last residual block through the self-adaptive module layer;
normalizing the output of the self-adaptive module through a batch normalization layer;
and mapping the normalized output result to an output end through a ReLU function.
In one embodiment, the adaptive module layer comprises a convolutional layer, a task adapter and a gating network, wherein the task adapter comprises a plurality of task parameter convolutional layers;
the processor 303 is specifically configured to:
performing first feature extraction on the output of the last residual block through a convolution layer;
respectively carrying out second feature extraction on the output of the last residual block through a plurality of task parameter convolution layers in the task adapter;
generating a decision result based on the output of the last residual block through a gating network, and deciding whether each task parameter convolution layer in the task adapter is executed or not according to the decision result;
and adding the result of the second feature extraction after decision and the result of the first feature extraction to be used as the output of the adaptive module layer.
In one embodiment, the gating network includes a global average pooling layer, a class prototype layer, a 1×1 convolution layer, and an activation function;
the processor 303 is specifically configured to:
carrying out space dimension compression on the output of the last residual block through a global average pooling layer;
determining prototype features of each category in the image episode at the current layer through the class prototype layer;
the decision results are generated via the 1×1 convolution layer and the activation function based on the prototype features of each class at the current layer.
Alternatively, the processor 303 is coupled to the memory 301 for executing the program in the memory 301 for:
acquiring an image episode to be identified in a target domain data set, wherein the image episode to be identified comprises a plurality of types of support samples and query samples, and the support samples are marked;
obtaining a pre-trained self-adaptive learning network model, wherein the self-adaptive learning network model is obtained by training through the model training method;
adjusting the self-adaptive learning network model through the marked support sample;
determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
and determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
In the present application, only some components are schematically shown in fig. 10, which does not mean that the terminal device includes only the components shown in fig. 10.
The terminal device provided in this embodiment is based on the same inventive concept as the model training method and the image recognition method provided by the embodiments of the application, and therefore has the same beneficial effects as those methods.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
The present application also provides a computer readable storage medium corresponding to the model training method or the image recognition method provided in the foregoing embodiments, on which a computer program (i.e., a program product) is stored, which when executed by a processor, performs the model training method provided in any of the foregoing embodiments, or performs the image recognition method provided in any of the foregoing embodiments.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
The computer readable storage medium provided by the above embodiment of the application is based on the same inventive concept as the model training method and the image recognition method provided by the embodiments of the application, and therefore has the same beneficial effects as those methods.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A method of model training, comprising:
randomly acquiring a plurality of image episodes in a source domain data set, wherein each image episode comprises a plurality of types of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
constructing a task-aware self-adaptive learning network model;
inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode;
determining classification loss according to the feature graphs of the support samples and the query samples, determining adaptive loss according to the domain offset of the image episode and the target domain data set, and determining overall loss according to the classification loss and the adaptive loss;
the determining an adaptive loss according to the domain offset of the image episode from the target domain dataset comprises:
quantifying the domain offset between the image episode and the target domain data set through the maximum mean discrepancy, and determining the adaptive loss corresponding to the domain offset according to a preset adaptive loss function; the preset adaptive loss function is:
$$\mathcal{L}_{ada} = \frac{1-t}{L} \sum_{l=1}^{L} \sum_{i} \tilde{g}_i^l$$
wherein $L$ is the number of adaptive modules, $t$ is the domain offset, $i$ is the layer index of the task adapter and denotes its $i$-th layer, and $\tilde{g}_i^l$ is the soft decision of the $i$-th layer of the task adapter in the $l$-th adaptive module;
and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges.
2. The method of claim 1, wherein the adaptive learning network model comprises a plurality of residual blocks and full connection layers connected in sequence;
inputting the image episode into the adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode, including:
feature extraction is carried out on the input image episode through a residual block, and an extracted intermediate feature map is obtained;
and carrying out linear transformation on the extracted feature images through a full connection layer to obtain feature images of the support samples and the query samples.
3. The method of claim 2, wherein the residual block comprises an adaptation module layer, a batch normalization layer, and a ReLU function;
The feature extraction of the input image episode by the residual block includes:
performing feature extraction on the output of the last residual block through the self-adaptive module layer;
normalizing the output of the self-adaptive module through a batch normalization layer;
and mapping the normalized output result to an output end through a ReLU function.
4. The method of claim 3, wherein the adaptive module layer comprises a convolutional layer, a task adapter, and a gating network, the task adapter comprising a plurality of task parameter convolutional layers;
the feature extraction of the output of the last residual block by the adaptive module layer comprises:
performing first feature extraction on the output of the last residual block through a convolution layer;
respectively carrying out second feature extraction on the output of the last residual block through a plurality of task parameter convolution layers in the task adapter;
generating a decision result based on the output of the last residual block through a gating network, and deciding whether each task parameter convolution layer in the task adapter is executed or not according to the decision result;
and adding the result of the second feature extraction after decision and the result of the first feature extraction to be used as the output of the adaptive module layer.
5. The method of claim 4, wherein the gating network comprises a global average pooling layer, a class prototype layer, a 1×1 convolution layer, and an activation function;
the generating, by the gating network, a decision result based on an output of a last residual block, including:
carrying out space dimension compression on the output of the last residual block through a global average pooling layer;
determining prototype features of each category in the image episode at the current layer through the class prototype layer;
the decision results are generated via the 1×1 convolution layer and the activation function based on the prototype features of each class at the current layer.
6. An image recognition method, comprising:
acquiring an image episode to be identified in a target domain data set, wherein the image episode to be identified comprises a plurality of types of support samples and query samples, and the support samples are marked;
obtaining a pre-trained self-adaptive learning network model, wherein the self-adaptive learning network model is obtained by training a model training method according to any one of claims 1-5;
adjusting the self-adaptive learning network model through the marked support sample;
determining a feature map of a support sample and a query sample in the image episode to be identified through the adjusted self-adaptive learning network model;
and determining the category of the query sample according to the feature map of the query sample and the feature map of the marked support sample.
7. A model training device, comprising:
the training acquisition module is used for randomly acquiring a plurality of image episodes in the source domain data set, each image episode comprises a plurality of categories of support samples and query samples, and the support samples and the query samples in the image episodes are marked;
the model construction module is used for constructing a task-aware self-adaptive learning network model;
the feature acquisition module is used for inputting the image episode into the self-adaptive learning network model to obtain feature graphs of a support sample and a query sample in the image episode;
a loss determination module for determining a classification loss according to a feature map of the support sample and the query sample, an adaptive loss according to a domain offset of the image episode and a target domain dataset, and an overall loss according to the classification loss and the adaptive loss; the determining an adaptive loss according to the domain offset of the image episode from the target domain dataset comprises:
quantifying the domain offset between the image episode and the target domain data set through the maximum mean discrepancy, and determining the adaptive loss corresponding to the domain offset according to a preset adaptive loss function; the preset adaptive loss function is:
$$\mathcal{L}_{ada} = \frac{1-t}{L} \sum_{l=1}^{L} \sum_{i} \tilde{g}_i^l$$
wherein $L$ is the number of adaptive modules, $t$ is the domain offset, $i$ is the layer index of the task adapter and denotes its $i$-th layer, and $\tilde{g}_i^l$ is the soft decision of the $i$-th layer of the task adapter in the $l$-th adaptive module;
and the model training module is used for adjusting the self-adaptive learning network model according to the integral loss until the integral loss converges.
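Claim 7 quantifies the domain offset through the maximum mean discrepancy; the following sketch shows one standard biased MMD estimate with an RBF kernel. The kernel choice and bandwidth are assumptions, and the patent's preset adaptive loss function built on top of t is not reproduced here.

```python
import torch

def mmd_rbf(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD estimate between episode features x (n, d) and target-domain
    features y (m, d); the RBF kernel is an assumed choice. The returned
    scalar plays the role of the domain offset t in the adaptive loss."""
    def kernel(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        sq_dists = torch.cdist(a, b).pow(2)  # pairwise squared distances
        return torch.exp(-sq_dists / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```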
8. An image recognition apparatus, comprising:
the test acquisition module is used for acquiring an image episode to be identified from a target domain data set, wherein the image episode to be identified comprises support samples and query samples of a plurality of categories, and the support samples are marked;
the model acquisition module is used for acquiring a pre-trained adaptive learning network model, wherein the adaptive learning network model is obtained through training with the model training method of any one of claims 1 to 5;
the model adjustment module is used for adjusting the adaptive learning network model with the marked support samples;
the model output module is used for determining feature maps of the support samples and the query samples in the image episode to be identified through the adjusted adaptive learning network model;
and the category determination module is used for determining the category of each query sample according to the feature map of the query sample and the feature maps of the marked support samples.
9. A terminal device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor, coupled to the memory, is configured to execute the program to perform the model training method of any one of claims 1 to 5 or the image recognition method of claim 6.
10. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the model training method of any one of claims 1 to 5 or the image recognition method of claim 6.
CN202310063908.6A 2023-01-12 2023-01-12 Model training and image recognition method, device, equipment and storage medium Active CN116091867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310063908.6A CN116091867B (en) 2023-01-12 2023-01-12 Model training and image recognition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116091867A (en) 2023-05-09
CN116091867B (en) 2023-09-29

Family

ID=86204138

Country Status (1)

Country Link
CN (1) CN116091867B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447149A (en) * 2018-10-25 2019-03-08 腾讯科技(深圳)有限公司 A kind of training method of detection model, device and terminal device
WO2020083073A1 (en) * 2018-10-23 2020-04-30 苏州科达科技股份有限公司 Non-motorized vehicle image multi-label classification method, system, device and storage medium
CN111858991A (en) * 2020-08-06 2020-10-30 南京大学 Small sample learning algorithm based on covariance measurement
CN112990282A (en) * 2021-03-03 2021-06-18 华南理工大学 Method and device for classifying fine-grained small sample images
CN114511521A (en) * 2022-01-21 2022-05-17 浙江大学 Tire defect detection method based on multiple representations and multiple sub-field self-adaption
CN115239946A (en) * 2022-06-30 2022-10-25 锋睿领创(珠海)科技有限公司 Small sample transfer learning training and target detection method, device, equipment and medium
CN115270872A (en) * 2022-07-26 2022-11-01 中山大学 Radar radiation source individual small sample learning and identifying method, system, device and medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20200143209A1 (en) * 2018-11-07 2020-05-07 Element Ai Inc. Task dependent adaptive metric for classifying pieces of data
CN110472483B (en) * 2019-07-02 2022-11-15 五邑大学 SAR image-oriented small sample semantic feature enhancement method and device
US11263488B2 (en) * 2020-04-13 2022-03-01 International Business Machines Corporation System and method for augmenting few-shot object classification with semantic information from multiple sources

Non-Patent Citations (2)

Title
Yurong Guo et al. Learning Calibrated Class Centers for Few-Shot Classification by Pair-Wise Similarity. IEEE Transactions on Image Processing, 2022. *
Yang Chenxi; Zuo Jie; Sun Pinjie. Research progress on zero-shot learning methods based on autoencoders. Modern Computer, 2020, (01). *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant