CN116091867A

CN116091867A - A model training, image recognition method, device, equipment and storage medium

Info

Publication number: CN116091867A
Application number: CN202310063908.6A
Authority: CN
Inventors: 马占宇; 郭玉荣; 杜若一; 梁孔明
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2023-01-12
Filing date: 2023-01-12
Publication date: 2023-05-09
Anticipated expiration: 2043-01-12
Also published as: CN116091867B

Abstract

The application provides a model training method, an image recognition device, equipment and a storage medium, wherein the method comprises the following steps: randomly acquiring a plurality of image episodes in a source domain data set; constructing a task-aware self-adaptive learning network model; inputting the image episode into the self-adaptive learning network model to obtain a feature map of a support sample and a query sample in the image episode; determining classification loss according to the feature graphs of the support samples and the query samples, determining adaptive loss according to the domain offset of the image episode and the target domain data set, and determining overall loss according to the classification loss and the adaptive loss; and adjusting the self-adaptive learning network model according to the overall loss until the overall loss converges. In the method, the domain offset is introduced into the loss function, so that the trained model can give consideration to the target data set with different domain offsets, and a more accurate image recognition effect is achieved.

Description

A model training, image recognition method, device, equipment and storage medium

技术领域Technical Field

本申请涉及图像处理技术领域，具体而言，涉及一种模型训练、图像识别方法、装置、设备及存储介质。The present application relates to the field of image processing technology, and more specifically, to a model training, image recognition method, device, equipment and storage medium.

背景技术Background Art

传统的深度学习模型通过对大量有标签样本的训练，展示出优异的泛化性能。然而，丰富的样本和可靠的标注在实际应用中难以获取，比如罕见病诊断，细粒度识别等。受人类能够快速学习新知识的启发，小样本学习旨在实现当每个类别仅有少量有标签的样本时，模型可以很好的识别待测样本。Traditional deep learning models have demonstrated excellent generalization performance by training on a large number of labeled samples. However, rich samples and reliable annotations are difficult to obtain in practical applications, such as rare disease diagnosis and fine-grained recognition. Inspired by the ability of humans to quickly learn new knowledge, small sample learning aims to achieve that when there are only a small number of labeled samples in each category, the model can well identify the sample to be tested.

但是目前大多数小样本学习图像识别模型只关注如何快速适应新类别，并未考虑测试任务和训练域之间的域偏移问题。However, most current image recognition models using small-sample learning only focus on how to quickly adapt to new categories, without considering the domain shift problem between the test task and the training domain.

发明内容Summary of the invention

本申请解决的问题是当前小样本学习图像识别模型未考虑测试任务和训练域之间的域偏移问题。The problem addressed by this application is that current small-sample learning image recognition models do not consider the domain shift problem between the test task and the training domain.

为解决上述问题，本申请第一方面提供了一种模型训练方法，包括：To solve the above problems, the first aspect of the present application provides a model training method, comprising:

在源域数据集中随机获取多个图像插曲，每个图像插曲包含多个类别的支持样本与查询样本，所述图像插曲中的支持样本与查询样本已标注；A plurality of image episodes are randomly obtained in a source domain dataset, each of which includes support samples and query samples of multiple categories, and the support samples and query samples in the image episodes are labeled;

构建任务感知的自适应学习网络模型；Constructing task-aware adaptive learning network models;

将所述图像插曲输入所述自适应学习网络模型，得到所述图像插曲中的支持样本与查询样本的特征图；Inputting the image episode into the adaptive learning network model to obtain feature graphs of support samples and query samples in the image episode;

根据所述支持样本与所述查询样本的特征图确定分类损失，根据所述图像插曲与目标域数据集的域偏移确定自适应损失，根据所述分类损失与所述自适应损失确定整体损失；Determine a classification loss according to the feature graphs of the support sample and the query sample, determine an adaptive loss according to the domain shift of the image interlude and a target domain dataset, and determine an overall loss according to the classification loss and the adaptive loss;

根据所述整体损失调整所述自适应学习网络模型，直至所述整体损失收敛为止。The adaptive learning network model is adjusted according to the overall loss until the overall loss converges.

本申请第二方面提供了一种图像识别方法，其包括：The second aspect of the present application provides an image recognition method, which includes:

获取目标域数据集中待识别的图像插曲，所述待识别的图像插曲包含多个类别的支持样本与查询样本，所述支持样本已标注；Obtaining an image episode to be identified in a target domain dataset, wherein the image episode to be identified includes support samples and query samples of multiple categories, and the support samples have been labeled;

获取预训练的自适应学习网络模型，所述自适应学习网络模型通过前述所述的模型训练方法进行训练得到；Obtaining a pre-trained adaptive learning network model, wherein the adaptive learning network model is trained by the aforementioned model training method;

通过已标注的支持样本对所述自适应学习网络模型进行调整；Adjusting the adaptive learning network model through labeled support samples;

通过调整后的所述自适应学习网络模型确定待识别的图像插曲中支持样本与查询样本的特征图；Determine the feature graphs of the support samples and the query samples in the image episode to be identified by using the adjusted adaptive learning network model;

根据查询样本的特征图和已标注的支持样本的特征图，确定所述查询样本的类别。The category of the query sample is determined according to the feature graph of the query sample and the feature graphs of the labeled support samples.

本申请第三方面提供了一种模型训练装置，其包括：The third aspect of the present application provides a model training device, which includes:

训练获取模块，其用于在源域数据集中随机获取多个图像插曲，每个图像插曲包含多个类别的支持样本与查询样本，所述图像插曲中的支持样本与查询样本已标注；A training acquisition module, which is used to randomly acquire multiple image episodes in a source domain dataset, each image episode includes support samples and query samples of multiple categories, and the support samples and query samples in the image episode are labeled;

模型构建模块，其用于构建任务感知的自适应学习网络模型；A model building module, which is used to build a task-aware adaptive learning network model;

特征获取模块，其用于将所述图像插曲输入所述自适应学习网络模型，得到所述图像插曲中的支持样本与查询样本的特征图；A feature acquisition module, which is used to input the image episode into the adaptive learning network model to obtain feature graphs of support samples and query samples in the image episode;

损失确定模块，其用于根据所述支持样本与所述查询样本的特征图确定分类损失，根据所述图像插曲与目标域数据集的域偏移确定自适应损失，根据所述分类损失与所述自适应损失确定整体损失；a loss determination module, configured to determine a classification loss according to feature maps of the support sample and the query sample, determine an adaptive loss according to a domain shift between the image interlude and a target domain dataset, and determine an overall loss according to the classification loss and the adaptive loss;

模型训练模块，其用于根据所述整体损失调整所述自适应学习网络模型，直至所述整体损失收敛为止。A model training module is used to adjust the adaptive learning network model according to the overall loss until the overall loss converges.

本申请第四方面提供了一种图像识别装置，其包括：A fourth aspect of the present application provides an image recognition device, comprising:

测试获取模块，其用于获取目标域数据集中待识别的图像插曲，所述待识别的图像插曲包含多个类别的支持样本与查询样本，所述支持样本已标注；A test acquisition module, which is used to acquire image episodes to be identified in a target domain dataset, wherein the image episodes to be identified include support samples and query samples of multiple categories, and the support samples are labeled;

模型获取模块，其用于获取预训练的自适应学习网络模型，所述自适应学习网络模型通过前述所述的模型训练方法进行训练得到；A model acquisition module, which is used to acquire a pre-trained adaptive learning network model, wherein the adaptive learning network model is trained by the aforementioned model training method;

模型调整模块，其用于通过已标注的支持样本对所述自适应学习网络模型进行调整；A model adjustment module, which is used to adjust the adaptive learning network model through labeled support samples;

模型输出模块，其用于通过调整后的所述自适应学习网络模型确定待识别的图像插曲中支持样本与查询样本的特征图；A model output module, which is used to determine the feature graphs of the supporting samples and the query samples in the image episode to be identified through the adjusted adaptive learning network model;

类别确定模块，其用于根据查询样本的特征图和已标注的支持样本的特征图，确定所述查询样本的类别。The category determination module is used to determine the category of the query sample according to the feature graph of the query sample and the feature graphs of the labeled support samples.

本申请第五方面提供了一种终端设备，其包括：存储器和处理器；A fifth aspect of the present application provides a terminal device, comprising: a memory and a processor;

所述存储器，其用于存储程序；The memory is used to store programs;

所述处理器，耦合至所述存储器，用于执行所述程序，以用于执行前述所述的模型训练方法，或者，以用于执行前述所述的图像识别方法。The processor, coupled to the memory, is used to execute the program to execute the model training method described above, or to execute the image recognition method described above.

本申请第六方面提供了一种计算机可读存储介质，其上存储有计算机程序，所述程序被处理器执行实现上述所述的模型训练方法，或者，实现上述所述的图像识别方法。In a sixth aspect, the present application provides a computer-readable storage medium having a computer program stored thereon, wherein the program is executed by a processor to implement the above-mentioned model training method, or to implement the above-mentioned image recognition method.

本申请中，通过将域偏移引入损失函数，从而使得训练后的模型可以兼顾具备不同域偏移的目标数据集，达到更准确的图像识别效果。In this application, by introducing domain offset into the loss function, the trained model can take into account target data sets with different domain offsets to achieve more accurate image recognition effects.

本申请中，根据待测任务与源域的域偏移大小，自适应的为每个测试任务学习到最佳任务特定参数策略，同时具备不同域偏移的待测任务获得不同的最优推理网络结构图，提升小样本图像识别的准确率。In this application, according to the domain offset size between the task to be tested and the source domain, the optimal task-specific parameter strategy is adaptively learned for each test task. At the same time, different optimal inference network structure diagrams are obtained for tasks to be tested with different domain offsets, thereby improving the accuracy of small sample image recognition.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为根据本申请一个实施例的模型训练方法的流程图；FIG1 is a flow chart of a model training method according to an embodiment of the present application;

图2为根据本申请自适应学习网络模型示意图；FIG2 is a schematic diagram of an adaptive learning network model according to the present application;

图3为根据本申请一个实施例的模型训练方法基于自适应学习网络模型的流程图；FIG3 is a flow chart of a model training method based on an adaptive learning network model according to an embodiment of the present application;

图4为根据本申请一个实施例的模型训练方法基于残差块的流程图；FIG4 is a flow chart of a model training method based on a residual block according to an embodiment of the present application;

图5为根据本申请一个实施例的模型训练方法基于自适应模块层的流程图；FIG5 is a flow chart of a model training method based on an adaptive module layer according to an embodiment of the present application;

图6为根据本申请一个实施例的模型训练方法基于门控网络的流程图；FIG6 is a flow chart of a model training method based on a gated network according to an embodiment of the present application;

图7为根据本申请一个实施例的图像识别方法的流程图；FIG7 is a flow chart of an image recognition method according to an embodiment of the present application;

图8为根据本申请一个实施例的模型训练装置的结构框图；FIG8 is a structural block diagram of a model training device according to an embodiment of the present application;

图9为根据本申请一个实施例的图像识别装置的结构框图；FIG9 is a structural block diagram of an image recognition device according to an embodiment of the present application;

图10为根据本申请实施例的终端设备的结构框图。FIG10 is a structural block diagram of a terminal device according to an embodiment of the present application.

具体实施方式DETAILED DESCRIPTION

为使本申请的上述目的、特征和优点能够更为明显易懂，下面结合附图对本申请的具体实施例做详细的说明。虽然附图中显示了本申请的示例性实施方式，然而应当理解，可以以各种形式实现本申请而不应被这里阐述的实施方式所限制。相反，提供这些实施方式是为了能够更透彻地理解本申请，并且能够将本申请的范围完整的传达给本领域的技术人员。In order to make the above-mentioned purposes, features and advantages of the present application more obvious and easy to understand, the specific embodiments of the present application are described in detail below in conjunction with the accompanying drawings. Although the exemplary embodiments of the present application are shown in the accompanying drawings, it should be understood that the present application can be implemented in various forms and should not be limited by the embodiments described herein. On the contrary, these embodiments are provided in order to enable a more thorough understanding of the present application and to fully convey the scope of the present application to those skilled in the art.

需要注意的是，除非另有说明，本申请使用的技术术语或者科学术语应当为本申请所属领域技术人员所理解的通常意义。It should be noted that, unless otherwise specified, the technical terms or scientific terms used in this application should have the common meanings understood by technicians in the field to which this application belongs.

最近，面对来源于源域的测试数据，基于元学习的小样本学习图像识别方法取得了长足的进步。然而，越来越多的研究表明，当测试数据与源域异构时，这些方法的泛化能力明显不足。这一局限性归因于，大多数小样本学习图像识别模型只关注如何快速适应新类别，但很少或根本没有努力理解和解决测试任务和训练域之间的域偏移问题。Recently, meta-learning-based few-shot image recognition methods have made great progress when the test data is derived from the source domain. However, more and more studies have shown that these methods have significantly insufficient generalization ability when the test data is heterogeneous with the source domain. This limitation is attributed to the fact that most few-shot image recognition models only focus on how to quickly adapt to new categories, but little or no effort is made to understand and address the domain shift problem between the test task and the training domain.

针对上述问题，本申请提供一种新的模型训练方案，能够通过引入图像插曲与目标域数据集的域偏移确定自适应损失的方式，解决当前小样本学习图像识别模型未考虑测试任务和训练域之间的域偏移的问题。To address the above problems, the present application provides a new model training scheme, which can solve the problem that the current small-sample learning image recognition model does not consider the domain offset between the test task and the training domain by introducing the domain offset between the image interlude and the target domain dataset to determine the adaptive loss.

为了便于理解，在此对下述可能使用的术语进行解释：For ease of understanding, the following terms that may be used are explained here:

激活函数：神经网络中的每个神经元节点接受上一层神经元的输出值作为本神经元的输入值，并将输入值传递给下一层，输入层神经元节点会将输入属性值直接传递给下一层(隐层或输出层)；在多层神经网络中，上层节点的输出和下层节点的输入之间具有一个函数关系，这个函数称为激活函数。Activation function: Each neuron node in a neural network accepts the output value of the previous layer of neurons as the input value of this neuron, and passes the input value to the next layer. The input layer neuron node will directly pass the input attribute value to the next layer (hidden layer or output layer); in a multi-layer neural network, there is a functional relationship between the output of the upper layer node and the input of the lower layer node. This function is called an activation function.

源域(source domain)：表示与测试样本不同的领域，但是有丰富的监督信息。Source domain: It refers to a domain that is different from the test sample but has rich supervision information.

目标域(target domain)：表示测试样本所在的领域，无标签或者只有少量标签；一般而言，源域和目标域属于同一类任务，但是分布不同。Target domain: refers to the field where the test samples are located, with no labels or only a small number of labels; generally speaking, the source domain and the target domain belong to the same type of tasks, but have different distributions.

本申请实施例提供了一种模型训练方法，该方法可以由模型训练装置来执行，该模型训练装置可以集成在pad、电脑、服务器、计算机、服务器集群、数据中心等电子设备中。如图1所示，其为根据本申请一个实施例的模型训练方法的流程图；其中，所述模型训练方法，包括：The embodiment of the present application provides a model training method, which can be performed by a model training device, and the model training device can be integrated in electronic devices such as pads, computers, servers, computers, server clusters, data centers, etc. As shown in Figure 1, it is a flow chart of a model training method according to an embodiment of the present application; wherein the model training method includes:

S100，在源域数据集中随机获取多个图像插曲，每个图像插曲包含多个类别的支持样本与查询样本，所述图像插曲中的支持样本与查询样本已标注；S100, randomly obtaining multiple image episodes in a source domain dataset, each image episode comprising support samples and query samples of multiple categories, and the support samples and query samples in the image episodes are labeled;

本申请中，源域数据集用于训练模型，训练后的模型用于识别目标域数据集中的待识别图像。其中，源域数据集中的数据为训练数据，目标域数据集中的数据为测试数据。In this application, the source domain dataset is used to train the model, and the trained model is used to identify the images to be identified in the target domain dataset. The data in the source domain dataset is the training data, and the data in the target domain dataset is the test data.

其中，源域数据集，可以为ImageNet图像数据集，也可以为其他来源，本申请中对源域数据集的获取方式不做限制。The source domain dataset may be an ImageNet image dataset or another source. This application does not limit the method for obtaining the source domain dataset.

需要说明的是，图像插曲中的支持样本与查询样本的标注方式，本申请中不做限制。It should be noted that the labeling method of the support samples and query samples in the image interlude is not limited in this application.

在一种实施方式中，所述图像插曲中所述支持样本的多个类别与所述查询样本的多个类别相同。In one embodiment, the multiple categories of the support samples in the image episode are the same as the multiple categories of the query samples.

本申请中，通过相同类别的支持样本与查询样本，达到更好的训练效果。In this application, better training effect is achieved by using support samples and query samples of the same category.

在一种实施方式中，所述图像插曲中，每个类别包含1-5个支持样本。In one embodiment, in the image episode, each category contains 1-5 supporting samples.

每个采样的图像插曲均包含支持样本集S和查询样本集Q，即：Each sampled image episode contains a support sample set S and a query sample set Q, namely:

T＝S∪Q，且

T = S ∪ Q, and

其中支持样本集记为：

查询样本集记为

The supported sample set is recorded as:

The query sample set is denoted as

S200，构建任务感知的自适应学习网络模型；S200, building a task-aware adaptive learning network model;

其中，所述自适应学习网络模型为用于小样本图像识别的自适应学习网络模型。The adaptive learning network model is an adaptive learning network model for small sample image recognition.

S300，将所述图像插曲输入所述自适应学习网络模型，得到所述图像插曲中的支持样本与查询样本的特征图；S300, inputting the image episode into the adaptive learning network model to obtain feature graphs of support samples and query samples in the image episode;

S400，根据所述支持样本与所述查询样本的特征图确定分类损失，根据所述图像插曲与目标域数据集的域偏移确定自适应损失，根据所述分类损失与所述自适应损失确定整体损失；S400, determining a classification loss according to the feature graphs of the support sample and the query sample, determining an adaptive loss according to a domain shift between the image interlude and a target domain dataset, and determining an overall loss according to the classification loss and the adaptive loss;

其中，所述图像插曲来自于源域数据集，因此所述图像插曲与目标域数据集的域偏移即为述源域数据集与目标域数据集的域偏移。The image interlude comes from a source domain dataset, so the domain offset between the image interlude and the target domain dataset is the domain offset between the source domain dataset and the target domain dataset.

在一种实施方式中，所述根据所述支持样本与所述查询样本的特征图确定分类损失，首先通过已标注的支持样本对查询样本的类别进行预测，然后通过查询样本的预测类别和已标注的类别确定分类损失。In one embodiment, the classification loss is determined based on the feature graphs of the support samples and the query sample. First, the category of the query sample is predicted through the labeled support samples, and then the classification loss is determined based on the predicted category of the query sample and the labeled category.

S500，根据所述整体损失调整所述自适应学习网络模型，直至所述整体损失收敛为止。S500, adjusting the adaptive learning network model according to the overall loss until the overall loss converges.

本申请中，通过反向传播的方式，根据整体损失对模型进行调整。In this application, the model is adjusted according to the overall loss through back propagation.

在一种实施方式中，通过软决策的梯度对反向传播的梯度进行计算，达到更好的收敛效果。In one implementation, the gradient of back propagation is calculated using the gradient of the soft decision to achieve better convergence effect.

如图2所示，其为下述自适应学习网络模型的示意图，结合该图进行下述描述。As shown in FIG2 , it is a schematic diagram of the following adaptive learning network model, and the following description is made in conjunction with the figure.

在一种实施方式中，所述自适应学习网络模型包括多个依次连接的残差块和全连接层；In one embodiment, the adaptive learning network model includes a plurality of residual blocks and fully connected layers connected in sequence;

如图3所示，所述将所述图像插曲输入所述自适应学习网络模型，得到所述图像插曲中的支持样本与查询样本的特征图，包括：As shown in FIG3 , the step of inputting the image episode into the adaptive learning network model to obtain feature graphs of support samples and query samples in the image episode includes:

S301，通过残差块对输入的图像插曲进行特征提取，得到提取后的中间特征图；S301, extracting features from the input image episode through a residual block to obtain an extracted intermediate feature map;

本申请中，每两个残差块为一组，多组残差块依次连接；随着多组残差块对输入的图像插曲进行逐层提取，其提取的特征图的大小逐渐减小，特征图的数量成倍增加。In the present application, every two residual blocks form a group, and multiple groups of residual blocks are connected in sequence; as multiple groups of residual blocks extract the input image interlude layer by layer, the size of the extracted feature map gradually decreases, and the number of feature maps increases exponentially.

S302，通过全连接层对提取后的特征图进行线性变换，得到所述支持样本与所述查询样本的特征图。S302, performing a linear transformation on the extracted feature map through a fully connected layer to obtain feature maps of the support sample and the query sample.

本申请中，如图3中所示的分类器，即为所述的全连接层。全连接层(fullyconnected layers，FC)在整个卷积神经网络中起到“分类器”的作用。In this application, the classifier shown in FIG3 is the fully connected layer. The fully connected layers (FC) act as a "classifier" in the entire convolutional neural network.

通过残差块组成残差网络对输入的图像插曲进行逐层提取，可以避免多层网络中的梯度消失问题；通过全连接层的线性变换，增加自适应学习网络模型的复杂性，使得模型可以表达更复杂的特征。By using residual blocks to form a residual network to extract the input image episodes layer by layer, the gradient vanishing problem in multi-layer networks can be avoided; through the linear transformation of the fully connected layer, the complexity of the adaptive learning network model is increased, allowing the model to express more complex features.

在一种实施方式中，所述残差块包括自适应模块层、批量归一化层和ReLU函数；In one embodiment, the residual block includes an adaptive module layer, a batch normalization layer and a ReLU function;

如图4所示，所述通过残差块对输入的图像插曲进行特征提取，包括：As shown in FIG4 , the feature extraction of the input image episode by the residual block includes:

S311，通过自适应模块层对上一个残差块的输出进行特征提取；S311, extracting features from the output of the previous residual block through an adaptive module layer;

S312，通过批量归一化层对所述自适应模块的输出进行归一化；S312, normalizing the output of the adaptive module through a batch normalization layer;

其中，批量归一化层(Batch Normalization，BN)的作用就是对数据进行标准化处理，从而加快模型收敛速度。Among them, the role of the batch normalization layer (Batch Normalization, BN) is to standardize the data, thereby accelerating the convergence of the model.

S313，通过ReLU函数将归一化后的输出结果映射到输出端。S313, mapping the normalized output result to the output end through the ReLU function.

本申请中，所述残差块在卷积之后通过BN做归一化，然后再和直接映射单位加之后使用了ReLU作为激活函数，该种残差设置方式，提高了模型的精度。In this application, the residual block is normalized by BN after convolution, and then ReLU is used as the activation function after adding the direct mapping unit. This residual setting method improves the accuracy of the model.

在一种实施方式中，所述自适应模块层包括卷积层、任务适配器和门控网络，所述任务适配器包括多个任务参数卷积层；In one embodiment, the adaptive module layer includes a convolutional layer, a task adapter and a gating network, and the task adapter includes a plurality of task parameter convolutional layers;

如图5所示，所述通过自适应模块层对上一个残差块的输出进行特征提取，包括：As shown in FIG5 , the feature extraction of the output of the previous residual block through the adaptive module layer includes:

S321，通过卷积层对上一个残差块的输出进行第一特征提取；S321, performing a first feature extraction on the output of the previous residual block through a convolutional layer;

在一种实施方式中，通过3*3卷积层对上一个残差块的输出进行第一特征提取。In one implementation, a first feature extraction is performed on the output of the previous residual block through a 3*3 convolutional layer.

S322，通过任务适配器中的多个任务参数卷积层分别对上一个残差块的输出进行第二特征提取；S322, performing second feature extraction on the output of the previous residual block respectively through multiple task parameter convolution layers in the task adapter;

在一种实施方式中，任务参数卷积层为1*1卷积层。In one implementation, the task parameter convolution layer is a 1*1 convolution layer.

S323，通过门控网络基于上一个残差块的输出生成决策结果，通过所述决策结果对任务适配器中的各个任务参数卷积层是否执行进行决策；S323, generating a decision result based on the output of the previous residual block through a gating network, and making a decision on whether to execute each task parameter convolution layer in the task adapter based on the decision result;

S324，决策后的所述第二特征提取的结果与所述第一特征提取的结果相加，作为所述自适应模块层的输出。S324, adding the result of the second feature extraction after the decision to the result of the first feature extraction as the output of the adaptive module layer.

本申请中，任务适配器(Task Adapter，TA)与3*3卷积层并行，且每个适配器包含k个特定任务参数1*1卷积层，每个特定任务参数层执行与否由门控网络决定。In this application, the task adapter (TA) is parallel to the 3*3 convolution layer, and each adapter contains k specific task parameter 1*1 convolution layers, and whether each specific task parameter layer is executed or not is determined by the gating network.

其中，

表示第l个自适应模块的3*3卷积层，当第l个自适应模块的输入为：in,

Represents the 3*3 convolutional layer of the lth adaptive module. When the input of the lth adaptive module is:

任务适配器学习到的特征将与3*3卷积层学习到的特征相结合，即：

The features learned by the task adapter will be combined with the features learned by the 3*3 convolutional layer, namely:

其中，

表示特定任务适配器的第i层的特定任务学习函数。

表示门控网络生成的门决策，决定第i层的特定任务函数的执行与否，

其中，1表示执行，0表示不执行。in,

represents the task-specific learning function of the i-th layer of the task-specific adapter.

represents the gate decision generated by the gating network, which determines whether the specific task function of the i-th layer is executed or not.

Among them, 1 means execution, and 0 means no execution.

在一种实施方式中，所述门控网络包括全局平均池化层、类原型层、1*1卷积层和激活函数；In one embodiment, the gated network includes a global average pooling layer, a prototype-like layer, a 1*1 convolutional layer, and an activation function;

如图6所示，所述通过门控网络基于上一个残差块的输出生成决策结果，包括：As shown in FIG6 , the decision result is generated based on the output of the previous residual block through the gating network, including:

S331，通过全局平均池化层对上一个残差块的输出进行空间维度压缩；S331, compressing the spatial dimension of the output of the previous residual block through a global average pooling layer;

本申请中，门控网络的输入是上一个残差块的输出h_l-1，首先经过全局平均池化层压缩特征图的空间维度，即：In this application, the input of the gating network is the output h _l-1 of the previous residual block, which first passes through a global average pooling layer to compress the spatial dimension of the feature map, namely:

u_l-1＝GAP(h_l-1)u _l-1 =GAP(h _l-1 )

其中，

并且

in,

and

S332，通过类原型层确定所述图像插曲中的每个类别在当前层的原型特征；S332, determining the prototype feature of each category in the image episode at the current layer through the class prototype layer;

其中，每个类别在当前层的原型特征可以通过以下公式得到：Among them, the prototype feature of each category at the current layer can be obtained by the following formula:

其中，

表示类别n的原型特征，S_n表示属于类别n的样本集合。并且

表示当前任务所有类别的原型特征。in,

represents the prototype feature of category n, and _Sn represents the sample set belonging to category n. And

Represents the prototype features of all categories of the current task.

S333，根据每个类别在当前层的原型特征，经由1*1卷积层和激活函数生成决策结果。S333, based on the prototype features of each category in the current layer, a decision result is generated through a 1*1 convolution layer and an activation function.

原型特征通过1*1的线性函数以及Sigmoid激活函数生成软决策，即：The prototype feature generates soft decisions through a 1*1 linear function and a Sigmoid activation function, namely:

其中，

σ表示Sigmoid激活函数，

in,

σ represents the Sigmoid activation function,

接着，通过简单阈值算法可以生成离散决策(决策结果)，即：Then, a discrete decision (decision result) can be generated through a simple threshold algorithm, namely:

如图2所示，其中的0.6、0.2、1、…、0.3为所述软决策，其中的1、0、1、…、0为生产的离散决策/硬决策/决策结果。As shown in FIG. 2 , 0.6, 0.2, 1, ..., 0.3 are soft decisions, and 1, 0, 1, ..., 0 are discrete decisions/hard decisions/decision results of production.

在一种实施方式中，S400，根据所述支持样本与所述查询样本的特征图确定分类损失，根据所述图像插曲与目标域数据集的域偏移确定自适应损失，根据所述分类损失与所述自适应损失确定整体损失中，通过最大平均差异量化据所述图像插曲与目标域数据集的域偏移。In one embodiment, S400, a classification loss is determined based on feature maps of the support sample and the query sample, an adaptive loss is determined based on a domain offset between the image interlude and a target domain dataset, and in determining an overall loss based on the classification loss and the adaptive loss, the domain offset between the image interlude and the target domain dataset is quantified by a maximum average difference.

在一种实施方式中，所述域偏移的量化公式为：In one implementation, the quantization formula of the domain offset is:

其中，

将特征映射到再生希尔伯特空间，P_i为源域数据集中类别i的类原型，N_b代表源域数据集的类别数，P_j为源域数据集中类别j的类原型，k()为核函数，N_s为支持集样本数，

表示MMD度量是在再生希尔伯特空间进行的，等号左侧竖线为范数。in,

Map the features to the regenerated Hilbert space, _Pi is the class prototype of category i in the source domain dataset, _Nb represents the number of categories in the source domain dataset, _Pj is the class prototype of category j in the source domain dataset, k() is the kernel function, _Ns is the number of samples in the support set,

It indicates that the MMD measurement is performed in the regenerated Hilbert space, and the vertical line on the left side of the equal sign is the norm.

在一种实施方式中，自适应损失函数为：In one embodiment, the adaptive loss function is:

其中，L为自适应模块数量，t为域偏移，i为任务适配器的层数，表示任务适配器的第i层，

为第l层任务适配器的第i层的软决策。Where L is the number of adaptive modules, t is the domain offset, i is the number of layers of the task adapter, and represents the i-th layer of the task adapter.

is the soft decision of the i-th layer of the l-th layer task adapter.

在一种实施方式中，整体损失函数是：In one embodiment, the overall loss function is:

其中，λ为超参数，用于平衡两个损失的权重，

为自适应损失，

为分类损失。Among them, λ is a hyperparameter used to balance the weights of the two losses.

is the adaptive loss,

is the classification loss.

本申请实施例提供了一种图像识别方法，该方法可以由图像识别装置来执行，该图像识别装置可以集成在pad、电脑、服务器、计算机、服务器集群、数据中心等电子设备中。如图7所示，其为根据本申请一个实施例的图像识别方法的流程图；其中，所述图像识别方法，包括：The embodiment of the present application provides an image recognition method, which can be performed by an image recognition device, and the image recognition device can be integrated in electronic devices such as pads, computers, servers, computers, server clusters, data centers, etc. As shown in Figure 7, it is a flow chart of an image recognition method according to an embodiment of the present application; wherein the image recognition method includes:

S10，获取目标域数据集中待识别的图像插曲，所述待识别的图像插曲包含多个类别的支持样本与查询样本，所述支持样本已标注；S10, obtaining an image episode to be identified in a target domain dataset, wherein the image episode to be identified includes supporting samples and query samples of multiple categories, and the supporting samples have been labeled;

本步骤中，与模型训练方法中不同的是，所述待识别的图像插曲的查询样本未标注。In this step, unlike the model training method, the query sample of the image episode to be identified is not labeled.

其中，每个待识别的图像插曲独立进行图像识别。Each image episode to be identified is independently subjected to image recognition.

S20，获取预训练的自适应学习网络模型，所述自适应学习网络模型通过上述所述的模型训练方法进行训练得到；S20, obtaining a pre-trained adaptive learning network model, wherein the adaptive learning network model is trained by the above-mentioned model training method;

S30，通过已标注的支持样本对所述自适应学习网络模型进行调整；S30, adjusting the adaptive learning network model through the labeled support samples;

在一种实施方式中，所述通过已标注的支持样本对所述自适应学习网络模型进行调整，包括：将已标注的支持样本输入所述自适应学习网络模型，得到支持样本的特征图；根据支持样本的特征图和标注的类别，确定每个类别在当前层的原型特征；根据每个支持样本与所述类别在当前层的原型特征之间的距离，计算交叉熵损失；根据交叉熵损失优化所述自适应学习网络模型，直至收敛为止，得到调整后的所述自适应学习网络模型。In one embodiment, the adaptive learning network model is adjusted using labeled support samples, including: inputting the labeled support samples into the adaptive learning network model to obtain a feature map of the support samples; determining the prototype features of each category at the current layer based on the feature map of the support samples and the labeled categories; calculating the cross entropy loss based on the distance between each support sample and the prototype features of the category at the current layer; optimizing the adaptive learning network model based on the cross entropy loss until convergence, to obtain the adjusted adaptive learning network model.

其中，所述交叉熵损失，即为模型训练方法中的分类损失。The cross entropy loss is the classification loss in the model training method.

其中，每个类别在当前层的原型特征的确定方式，已在模型训练方法中进行描述，本步骤中不再赘述。The method for determining the prototype features of each category in the current layer has been described in the model training method and will not be repeated in this step.

其中，每个支持样本与所述类别在当前层的原型特征之间的距离，可以为欧式距离。The distance between each supporting sample and the prototype feature of the category at the current layer may be a Euclidean distance.

S40，通过调整后的所述自适应学习网络模型确定待识别的图像插曲中支持样本与查询样本的特征图；S40, determining feature graphs of supporting samples and query samples in the image episode to be identified by using the adjusted adaptive learning network model;

需要说明的是，在所述自适应学习网络模型调整后，需要通过调整后的所述自适应学习网络模型重新确定支持样本的特征图。It should be noted that after the adaptive learning network model is adjusted, the feature graph of the supporting samples needs to be re-determined by the adjusted adaptive learning network model.

S50，根据查询样本的特征图和已标注的支持样本的特征图，确定所述查询样本的类别。S50, determining the category of the query sample according to the feature graph of the query sample and the feature graphs of the labeled support samples.

在一种实施方式中，所述根据查询样本的特征图和已标注的支持样本的特征图，确定所述查询样本的类别，包括：根据支持样本的特征图和标注的类别，确定每个类别在当前层的原型特征；根据每个查询样本的特征图，确定每个查询样本与所述类别在当前层的原型特征之间的距离；选择距离最小的所述类别为该查询样本的识别结果。In one embodiment, determining the category of the query sample based on the feature graph of the query sample and the feature graph of the labeled support samples includes: determining the prototype features of each category at the current layer based on the feature graph of the support samples and the labeled categories; determining the distance between each query sample and the prototype features of the category at the current layer based on the feature graph of each query sample; and selecting the category with the smallest distance as the recognition result of the query sample.

需要说明的是，上述模型训练方法中，所述通过已标注的支持样本对查询样本的类别进行预测，与图像识别方法中，通过已标注的支持样本对查询样本的类别进行确定(步骤S20-S50)，具体过程相同，不同之处在于，模型训练方法中通过未训练的自适应学习网络模型获取支持样本与查询样本的特征图，图像识别方法中通过预训练的自适应学习网络模型获取支持样本与查询样本的特征图。基于此，不再对模型训练方法中所述通过已标注的支持样本对查询样本的类别进行预测的具体过程进行赘述。It should be noted that in the above-mentioned model training method, the prediction of the category of the query sample by the labeled support samples is the same as the determination of the category of the query sample by the labeled support samples in the image recognition method (steps S20-S50). The difference is that in the model training method, the feature graphs of the support samples and the query samples are obtained by the untrained adaptive learning network model, and in the image recognition method, the feature graphs of the support samples and the query samples are obtained by the pre-trained adaptive learning network model. Based on this, the specific process of predicting the category of the query sample by the labeled support samples in the model training method will not be described in detail.

本申请实施例提供了一种模型训练装置，用于执行本申请上述内容所述的模型训练方法，以下对所述模型训练装置进行详细描述。An embodiment of the present application provides a model training device for executing the model training method described in the above content of the present application. The model training device is described in detail below.

如图8所示，所述模型训练装置，包括：As shown in FIG8 , the model training device includes:

训练获取模块101，其用于在源域数据集中随机获取多个图像插曲，每个图像插曲包含多个类别的支持样本与查询样本，所述图像插曲中的支持样本与查询样本已标注；A training acquisition module 101 is used to randomly acquire multiple image episodes in a source domain dataset, each image episode includes support samples and query samples of multiple categories, and the support samples and query samples in the image episode are labeled;

模型构建模块102，其用于构建任务感知的自适应学习网络模型；A model building module 102, which is used to build a task-aware adaptive learning network model;

特征获取模块103，其用于将所述图像插曲输入所述自适应学习网络模型，得到所述图像插曲中的支持样本与查询样本的特征图；A feature acquisition module 103, which is used to input the image episode into the adaptive learning network model to obtain feature graphs of support samples and query samples in the image episode;

损失确定模块104，其用于根据所述支持样本与所述查询样本的特征图确定分类损失，根据所述图像插曲与目标域数据集的域偏移确定自适应损失，根据所述分类损失与所述自适应损失确定整体损失；a loss determination module 104, configured to determine a classification loss according to the feature graphs of the support sample and the query sample, determine an adaptive loss according to a domain shift between the image interlude and a target domain dataset, and determine an overall loss according to the classification loss and the adaptive loss;

模型训练模块105，其用于根据所述整体损失调整所述自适应学习网络模型，直至所述整体损失收敛为止。The model training module 105 is used to adjust the adaptive learning network model according to the overall loss until the overall loss converges.

所述特征获取模块103还用于：The feature acquisition module 103 is also used for:

通过残差块对输入的图像插曲进行特征提取，得到提取后的中间特征图；The input image episode is subjected to feature extraction through the residual block to obtain an extracted intermediate feature map;

通过全连接层对提取后的特征图进行线性变换，得到所述支持样本与所述查询样本的特征图。The extracted feature map is linearly transformed through a fully connected layer to obtain the feature maps of the support sample and the query sample.

通过自适应模块层对上一个残差块的输出进行特征提取；Perform feature extraction on the output of the previous residual block through the adaptive module layer;

通过批量归一化层对所述自适应模块的输出进行归一化；Normalizing the output of the adaptive module through a batch normalization layer;

通过ReLU函数将归一化后的输出结果映射到输出端。The normalized output result is mapped to the output end through the ReLU function.

通过卷积层对上一个残差块的输出进行第一特征提取；Perform the first feature extraction on the output of the previous residual block through the convolution layer;

通过任务适配器中的多个任务参数卷积层分别对上一个残差块的输出进行第二特征提取；Performing second feature extraction on the output of the previous residual block through multiple task parameter convolution layers in the task adapter;

通过门控网络基于上一个残差块的输出生成决策结果，通过所述决策结果对任务适配器中的各个任务参数卷积层是否执行进行决策；Generate a decision result based on the output of the previous residual block through a gating network, and decide whether to execute each task parameter convolution layer in the task adapter based on the decision result;

决策后的所述第二特征提取的结果与所述第一特征提取的结果相加，作为所述自适应模块层的输出。The result of the second feature extraction after the decision is added to the result of the first feature extraction as the output of the adaptive module layer.

通过全局平均池化层对上一个残差块的输出进行空间维度压缩；The output of the previous residual block is compressed in spatial dimension through the global average pooling layer;

通过类原型层确定所述图像插曲中的每个类别在当前层的原型特征；Determine the prototype feature of each category in the image episode at the current layer through the class prototype layer;

根据每个类别在当前层的原型特征，经由1*1卷积层和激活函数生成决策结果。According to the prototype features of each category in the current layer, the decision result is generated through a 1*1 convolution layer and an activation function.

本申请的上述实施例提供的模型训练装置与本申请实施例提供的模型训练方法出于相同的发明构思，具有与其存储的应用程序所采用、运行或实现的方法相同的有益效果。The model training device provided in the above-mentioned embodiments of the present application and the model training method provided in the embodiments of the present application are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the application programs stored therein.

本申请实施例提供了一种图像识别装置，用于执行本申请上述内容所述的图像识别方法，以下对所述图像识别装置进行详细描述。An embodiment of the present application provides an image recognition device for executing the image recognition method described in the above content of the present application. The image recognition device is described in detail below.

如图9所示，所述图像识别装置，包括：As shown in FIG9 , the image recognition device includes:

测试获取模块201，其用于获取目标域数据集中待识别的图像插曲，所述待识别的图像插曲包含多个类别的支持样本与查询样本，所述支持样本已标注；A test acquisition module 201 is used to acquire an image episode to be identified in a target domain dataset, wherein the image episode to be identified includes support samples and query samples of multiple categories, and the support samples are labeled;

模型获取模块202，其用于获取预训练的自适应学习网络模型，所述自适应学习网络模型通过上述所述的模型训练方法进行训练得到；A model acquisition module 202, which is used to acquire a pre-trained adaptive learning network model, wherein the adaptive learning network model is trained by the above-mentioned model training method;

模型调整模块203，其用于通过已标注的支持样本对所述自适应学习网络模型进行调整；A model adjustment module 203, which is used to adjust the adaptive learning network model through the labeled support samples;

模型输出模块204，其用于通过调整后的所述自适应学习网络模型确定待识别的图像插曲中支持样本与查询样本的特征图；A model output module 204, which is used to determine the feature graphs of the support samples and the query samples in the image episode to be identified through the adjusted adaptive learning network model;

类别确定模块205，其用于根据查询样本的特征图和已标注的支持样本的特征图，确定所述查询样本的类别。The category determination module 205 is used to determine the category of the query sample according to the feature graph of the query sample and the feature graphs of the labeled support samples.

本申请的上述实施例提供的图像识别装置与本申请实施例提供的图像识别方法出于相同的发明构思，具有与其存储的应用程序所采用、运行或实现的方法相同的有益效果。The image recognition device provided in the above-mentioned embodiment of the present application and the image recognition method provided in the embodiment of the present application are based on the same inventive concept and have the same beneficial effects as the method adopted, run or implemented by the application program stored therein.

以上描述了模型训练装置/图像识别装置的内部功能和结构，如图10所示，实际中，该模型训练装置/图像识别装置可实现为终端设备，包括：存储器301及处理器303。The above describes the internal functions and structure of the model training device/image recognition device. As shown in FIG10 , in practice, the model training device/image recognition device can be implemented as a terminal device, including: a memory 301 and a processor 303 .

存储器301，可被配置为存储程序。The memory 301 may be configured to store programs.

另外，存储器301，还可被配置为存储其它各种数据以支持在终端设备上的操作。这些数据的示例包括用于在终端设备上操作的任何应用程序或方法的指令，联系人数据，电话簿数据，消息，图片，视频等。In addition, the memory 301 may also be configured to store various other data to support operations on the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, contact data, phone book data, messages, pictures, videos, etc.

存储器301可以由任何类型的易失性或非易失性存储设备或者它们的组合实现，如静态随机存取存储器(SRAM)，电可擦除可编程只读存储器(EEPROM)，可擦除可编程只读存储器(EPROM)，可编程只读存储器(PROM)，只读存储器(ROM)，磁存储器，快闪存储器，磁盘或光盘。The memory 301 may be implemented by any type of volatile or nonvolatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.

处理器303，耦合至存储器301，用于执行存储器301中的程序，以用于：The processor 303 is coupled to the memory 301 and is configured to execute a program in the memory 301 to:

处理器303具体用于：The processor 303 is specifically used for:

或者，处理器303，耦合至存储器301，用于执行存储器301中的程序，以用于：Alternatively, the processor 303 is coupled to the memory 301 and is configured to execute a program in the memory 301 to:

获取目标域数据集中待识别的图像插曲，所述待识别的图像插曲包含多个类别的支持样本与查询样本，所述支持样本已标注；Obtaining an image episode to be identified in a target domain dataset, wherein the image episode to be identified includes support samples and query samples of multiple categories, and the support samples are labeled;

获取预训练的自适应学习网络模型，所述自适应学习网络模型通过上述所述的模型训练方法进行训练得到；Obtaining a pre-trained adaptive learning network model, wherein the adaptive learning network model is trained by the above-mentioned model training method;

本申请中，图10中仅示意性给出部分组件，并不意味着终端设备只包括图10所示组件。In the present application, only some components are schematically shown in FIG10 , which does not mean that the terminal device only includes the components shown in FIG10 .

本实施例提供的终端设备，与本申请实施例提供的模型训练方法或图像识别方法出于相同的发明构思，具有与其存储的应用程序所采用、运行或实现的方法相同的有益效果。The terminal device provided in this embodiment is based on the same inventive concept as the model training method or image recognition method provided in the embodiment of the present application, and has the same beneficial effects as the method adopted, run or implemented by the application stored therein.

本领域内的技术人员应明白，本申请的实施例可提供为方法、系统、或计算机程序产品。因此，本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可读存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented in one or more computer-readable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.

本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to the flowchart and/or block diagram of the method, device (system) and computer program product according to the embodiment of the present application. It should be understood that each process and/or box in the flowchart and/or block diagram, and the combination of the process and/or box in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the function specified in one process or multiple processes in the flowchart and/or one box or multiple boxes in the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

在一个典型的配置中，计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration, a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.

内存可能包括计算机可读介质中的非永久性存储器，随机存取存储器(RAM)和/或非易失性内存等形式，如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。The memory may include non-permanent storage in a computer-readable medium, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash RAM. The memory is an example of a computer-readable medium.

本申请还提供一种与前述实施方式所提供的模型训练方法或图像识别方法对应的计算机可读存储介质，其上存储有计算机程序(即程序产品)，所述计算机程序在被处理器运行时，会执行前述任意实施方式所提供的模型训练方法，或者执行前述任意实施方式所提供的图像识别方法。The present application also provides a computer-readable storage medium corresponding to the model training method or image recognition method provided in the aforementioned embodiment, on which a computer program (i.e., program product) is stored. When the computer program is run by a processor, it will execute the model training method provided in any of the aforementioned embodiments, or execute the image recognition method provided in any of the aforementioned embodiments.

计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带，磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体(transitory media)，如调制的数据信号和载波。Computer readable media include permanent and non-permanent, removable and non-removable media that can be implemented by any method or technology to store information. Information can be computer readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission media that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include temporary computer readable media (transitory media), such as modulated data signals and carrier waves.

本申请的上述实施例提供的计算机可读存储介质与本申请实施例提供的模型训练方法或图像识别方法出于相同的发明构思，具有与其存储的应用程序所采用、运行或实现的方法相同的有益效果。The computer-readable storage medium provided in the above-mentioned embodiments of the present application and the model training method or image recognition method provided in the embodiments of the present application are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the application programs stored therein.

需要说明的是，在此处所提供的说明书中，说明了大量具体细节。然而，能够理解，本申请的实施例可以在没有这些具体细节的情况下实践。在一些实例中，并未详细示出公知的结构和技术，以便不模糊对本说明书的理解。It should be noted that in the description provided herein, a large number of specific details are described. However, it is understood that the embodiments of the present application can be practiced without these specific details. In some instances, well-known structures and technologies are not shown in detail so as not to obscure the understanding of this description.

还需要说明的是，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "include", "comprises" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, commodity or device. In the absence of more restrictions, the elements defined by the sentence "comprises a ..." do not exclude the existence of other identical elements in the process, method, commodity or device including the elements.

以上所述仅为本申请的实施例而已，并不用于限制本申请。对于本领域技术人员来说，本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等，均应包含在本申请的权利要求范围之内。The above is only an embodiment of the present application and is not intended to limit the present application. For those skilled in the art, the present application may have various changes and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A model training method, characterized by comprising:

A plurality of image episodes are randomly obtained in a source domain dataset, each of which includes support samples and query samples of multiple categories, and the support samples and query samples in the image episodes are labeled;

Constructing task-aware adaptive learning network models;

Inputting the image episode into the adaptive learning network model to obtain feature graphs of support samples and query samples in the image episode;

Determine a classification loss according to the feature graphs of the support sample and the query sample, determine an adaptive loss according to the domain shift of the image interlude and a target domain dataset, and determine an overall loss according to the classification loss and the adaptive loss;

The adaptive learning network model is adjusted according to the overall loss until the overall loss converges.

2. The method according to claim 1, characterized in that the adaptive learning network model comprises a plurality of residual blocks and fully connected layers connected in sequence;

The step of inputting the image episode into the adaptive learning network model to obtain a feature graph of a support sample and a query sample in the image episode includes:

Perform feature extraction on the input image episode through the residual block to obtain an extracted intermediate feature map;

The extracted feature map is linearly transformed through a fully connected layer to obtain the feature maps of the support sample and the query sample.

3. The method according to claim 2, wherein the residual block comprises an adaptive module layer, a batch normalization layer and a ReLU function;

The feature extraction of the input image episode by the residual block includes:

Perform feature extraction on the output of the previous residual block through the adaptive module layer;

Normalizing the output of the adaptive module through a batch normalization layer;

The normalized output result is mapped to the output end through the ReLU function.

4. The method according to claim 3, characterized in that the adaptive module layer includes a convolution layer, a task adapter and a gating network, and the task adapter includes a plurality of task parameter convolution layers;

The feature extraction of the output of the previous residual block through the adaptive module layer includes:

Perform the first feature extraction on the output of the previous residual block through the convolution layer;

Performing second feature extraction on the output of the previous residual block through multiple task parameter convolution layers in the task adapter;

Generate a decision result based on the output of the previous residual block through a gating network, and decide whether to execute each task parameter convolution layer in the task adapter based on the decision result;

The result of the second feature extraction after the decision is added to the result of the first feature extraction as the output of the adaptive module layer.

5. The method according to claim 4, characterized in that the gated network includes a global average pooling layer, a prototype-like layer, a 1*1 convolutional layer and an activation function;

The generating of a decision result based on the output of the previous residual block through the gating network includes:

The output of the previous residual block is compressed in spatial dimension through the global average pooling layer;

Determine the prototype feature of each category in the image episode at the current layer through the class prototype layer;

According to the prototype features of each category in the current layer, the decision result is generated through a 1*1 convolution layer and an activation function.

6. An image recognition method, comprising:

Obtaining an image episode to be identified in a target domain dataset, wherein the image episode to be identified includes support samples and query samples of multiple categories, and the support samples are labeled;

Obtain a pre-trained adaptive learning network model, wherein the adaptive learning network model is trained by the model training method described in any one of claims 1 to 5;

Adjusting the adaptive learning network model through labeled support samples;

Determine the feature graphs of the support samples and the query samples in the image episode to be identified by using the adjusted adaptive learning network model;

The category of the query sample is determined according to the feature graph of the query sample and the feature graphs of the labeled support samples.

7. A model training device, characterized in that it comprises:

A training acquisition module, which is used to randomly acquire multiple image episodes in a source domain dataset, each image episode includes support samples and query samples of multiple categories, and the support samples and query samples in the image episode are labeled;

A model building module, which is used to build a task-aware adaptive learning network model;

A feature acquisition module, which is used to input the image episode into the adaptive learning network model to obtain feature graphs of support samples and query samples in the image episode;

a loss determination module, configured to determine a classification loss according to feature maps of the support sample and the query sample, determine an adaptive loss according to a domain shift between the image interlude and a target domain dataset, and determine an overall loss according to the classification loss and the adaptive loss;

A model training module is used to adjust the adaptive learning network model according to the overall loss until the overall loss converges.

8. An image recognition device, comprising:

A test acquisition module, which is used to acquire image episodes to be identified in a target domain dataset, wherein the image episodes to be identified include support samples and query samples of multiple categories, and the support samples are labeled;

A model acquisition module, which is used to acquire a pre-trained adaptive learning network model, wherein the adaptive learning network model is trained by the model training method described in any one of claims 1 to 5;

A model adjustment module, which is used to adjust the adaptive learning network model through labeled support samples;

A model output module, which is used to determine the feature graphs of the supporting samples and the query samples in the image episode to be identified through the adjusted adaptive learning network model;

The category determination module is used to determine the category of the query sample according to the feature graph of the query sample and the feature graphs of the labeled support samples.

9. A terminal device, comprising: a memory and a processor;

The memory is used to store programs;

The processor, coupled to the memory, is used to execute the program to execute the model training method described in any one of claims 1-5, or to execute the image recognition method described in claim 6.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program is executed by a processor to implement the model training method described in any one of claims 1 to 5, or to implement the image recognition method described in claim 6.