CN114648650A - Method and apparatus, device, and storage medium for neural network training and target detection - Google Patents
- Publication number
- CN114648650A (Application CN202210333676.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- neural network
- trained
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The present disclosure provides a method, apparatus, device, and storage medium for neural network training and target detection. The method includes: acquiring a first image sample collected in an upstream task, a first neural network to be trained in a downstream task, and a second neural network and an image generation network trained on the first image sample, wherein the second neural network is used for feature extraction and the image generation network is used to generate new images that conform to the overall distribution of the first image samples; performing feature extraction on the new images generated by the image generation network with the trained second neural network and with the first neural network to be trained, respectively; and training the first neural network to be trained on the extracted first image features and second image features to obtain a trained first neural network. By using the two sets of image features to guide the training of the first neural network, the present disclosure improves its performance on downstream tasks.
Description
Technical Field
The present disclosure relates to the technical field of artificial intelligence, and in particular to a method, apparatus, device, and storage medium for neural network training and target detection.
Background
With the rapid development of artificial intelligence technology, end-to-end deep learning has steadily matured. Using a large-scale dataset, a pre-trained neural network (i.e., a pre-trained model) can be learned upstream and shared across various tasks; such a model directly shares its pre-trained early-layer weights and has relatively strong feature representation capabilities.
However, when migrating a pre-trained model to a specific downstream task, the amount of data actually available downstream is relatively small. As a result, whether the model is transferred directly or fine-tuned before transfer, the pre-trained model performs poorly on the downstream task.
Summary of the Invention
Embodiments of the present disclosure provide at least a method, apparatus, device, and storage medium for neural network training and target detection.
In a first aspect, an embodiment of the present disclosure provides a method for training a neural network, the method comprising:
acquiring a first image sample collected in an upstream task, a first neural network to be trained in a downstream task, and a second neural network and an image generation network trained on the first image sample, wherein the second neural network is used for feature extraction, and the image generation network is used to generate a new image that conforms to the overall distribution of the first image samples;
performing feature extraction on the new image generated by the image generation network with the trained second neural network and with the first neural network to be trained, respectively, to obtain a first image feature and a second image feature; and
training the first neural network to be trained based on the first image feature and the second image feature to obtain a trained first neural network.
With the above training method, the new image generated by the image generation network is processed both by the first neural network to be trained in the downstream task and by the second neural network trained on the first image samples collected in the upstream task, and the first neural network is then trained on the resulting first and second image features. Because the images generated by the image generation network conform to the overall distribution of the first image samples, they are well suited to the network environment of the second neural network; meanwhile, the first image feature output by the trained second neural network and the second image feature output by the first neural network to be trained can better guide the training of the first neural network, further improving its performance on downstream tasks.
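As a minimal sketch of this teacher-student guidance, the snippet below distils the fixed "teacher" (second network) features into a "student" (first network) feature extractor on generated images. The linear networks, the mean-squared feature loss, and the plain gradient step are all illustrative assumptions, not the patent's concrete architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generated "new images", flattened to vectors (illustrative shapes).
images = rng.normal(size=(64, 32))

# Teacher (trained second network) and student (first network to be
# trained), each sketched as a single linear feature extractor.
W_teacher = rng.normal(size=(32, 16))
W_student = rng.normal(size=(32, 16))

initial_loss = np.mean((images @ W_student - images @ W_teacher) ** 2)

lr = 0.01
for _ in range(500):
    f1 = images @ W_teacher              # first image features (teacher)
    f2 = images @ W_student              # second image features (student)
    # Loss: mean squared distance between the two feature sets; its
    # gradient w.r.t. the student weights drives the update.
    grad = 2 * images.T @ (f2 - f1) / len(images)
    W_student -= lr * grad

final_loss = np.mean((images @ W_student - images @ W_teacher) ** 2)
assert final_loss < 0.1 * initial_loss   # student features moved toward the teacher's
```

The key point the sketch illustrates is that the student is supervised only through feature agreement on generated images, so the two networks need not share a structure.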
In a possible implementation, the image generation network is trained according to the following steps:
acquiring a first image output by a codebook generation network, the codebook generation network being used to generate a codebook that decomposes the first image sample into a plurality of primitives;
inputting the first image into the image generation network to be trained to obtain a second image output by the image generation network;
determining a loss function value of the image generation network to be trained based on the image similarity between the second image and the first image; and
training the image generation network to be trained based on the loss function value to obtain a trained image generation network.
Here, the image generation network to be trained can be trained on the first image produced by codebook encoding, so that the trained image generation network better conforms to the overall distribution of the first image samples, which helps improve subsequent network training performance.
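The image-similarity loss in the steps above might be sketched as follows; treating similarity as cosine similarity over flattened pixels is an assumption, since the disclosure does not fix a concrete metric:

```python
import numpy as np

def image_similarity(img_a, img_b):
    """Cosine similarity between two flattened images (assumed metric)."""
    a, b = img_a.ravel(), img_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def generation_loss(second_image, first_image):
    """Loss shrinks as the generated second image approaches the first."""
    return 1.0 - image_similarity(second_image, first_image)

first = np.ones((8, 8))
assert generation_loss(first, first) < 1e-9   # identical images: zero loss
assert generation_loss(-first, first) > 1.0   # dissimilar images: high loss
```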
In a possible implementation, inputting the first image into the image generation network to be trained includes:
masking part of the image area in the first image to obtain a masked first image; and
inputting the masked first image into the image generation network to be trained.
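The partial masking step could be sketched as random patch masking; the patch size and mask ratio below are illustrative assumptions, as the disclosure does not specify how the masked regions are chosen:

```python
import numpy as np

def mask_image(image, patch=4, ratio=0.4, rng=None):
    """Zero out a random subset of non-overlapping patches in the image."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape
    masked = image.copy()
    ph, pw = h // patch, w // patch          # grid of patch positions
    n_mask = int(ph * pw * ratio)            # how many patches to cover
    idx = rng.choice(ph * pw, size=n_mask, replace=False)
    for i in idx:
        r, c = divmod(i, pw)
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return masked

img = np.ones((16, 16))
out = mask_image(img)
assert out.shape == img.shape
# int(16 * 0.4) = 6 of the 16 patches, i.e. 96 pixels, are zeroed.
assert (out == 0).sum() == 6 * 4 * 4
```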
In a possible implementation, the codebook generation network includes an encoder and a decoder, and the codebook generation network is trained according to the following steps:
repeating the following steps until the similarity between the image output by the decoder and the first image sample input into the encoder is greater than a preset threshold:
inputting the first image sample into the encoder to be trained to obtain the codebook output by the encoder; and inputting the codebook output by the encoder into the decoder to be trained to obtain the image output by the decoder.
Here, the codebook can be an image encoding realized by the adversarial network formed by the encoder and the decoder, which gives relatively high accuracy.
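The encoder-codebook-decoder round trip can be sketched in the spirit of vector-quantised autoencoders; the nearest-neighbour quantisation and the toy codebook below are assumptions, as the disclosure does not commit to a specific mechanism:

```python
import numpy as np

# Codebook of 8 "primitives" in a 4-dim patch space; using signed unit
# vectors keeps the primitives well separated (illustrative choice).
codebook = np.vstack([np.eye(4), -np.eye(4)])

def encode(patches, book):
    """Map each patch vector to the index of its nearest primitive."""
    dists = np.linalg.norm(patches[:, None, :] - book[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def decode(indices, book):
    """Reconstruct patch vectors by looking up primitives in the codebook."""
    return book[indices]

rng = np.random.default_rng(1)
# Patches that sit close to known primitives, plus small noise.
patches = codebook[[2, 5, 5, 0]] + 0.01 * rng.normal(size=(4, 4))
codes = encode(patches, codebook)
recon = decode(codes, codebook)
assert list(codes) == [2, 5, 5, 0]            # each patch maps to its primitive
assert np.allclose(recon, patches, atol=0.1)  # round trip stays close
```

In training, the loop described above would repeat this round trip and update both encoder and decoder until the reconstruction similarity exceeds the preset threshold.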
In a possible implementation, the first image output by the codebook generation network is acquired according to the following steps:
inputting the first image sample into the encoder included in the codebook generation network to obtain the codebook output by the encoder; and
inputting the codebook output by the encoder into the decoder included in the codebook generation network to obtain the first image output by the decoder.
Here, the codebook output by the encoder can be used to re-represent the first image sample, and the re-represented first image is better suited to subsequent network training.
In a possible implementation, the image generation network includes a first generation sub-network for generating a codebook that decomposes the first image sample into a plurality of primitives, and a second generation sub-network for generating the new image based on the image output by the first generation sub-network; the image generation network is trained according to the following steps:
inputting the first image sample into the trained first generation sub-network to obtain a first image output by the first generation sub-network;
inputting the first image into the second generation sub-network to be trained to obtain a second image output by the second generation sub-network;
determining a loss function value of the image generation network to be trained based on a first image similarity between the first image and the input first image sample and a second image similarity between the second image and the first image; and
training the image generation network to be trained based on the loss function value to obtain a trained image generation network.
Here, the image generation network can be trained with the first and second generation sub-networks together, so that the trained image generation network better balances the quality and efficiency of both codebook generation and image generation.
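The two-term loss over the generation sub-networks might be sketched as a weighted sum of the two similarity terms; the equal weights and the pixel-wise similarity measure are illustrative assumptions:

```python
import numpy as np

def similarity(a, b):
    """Pixel-wise similarity in (0, 1] (assumed metric)."""
    return 1.0 / (1.0 + np.mean((a - b) ** 2))

def generation_network_loss(sample, first_image, second_image, w1=0.5, w2=0.5):
    """Combine the first similarity (first image vs. input sample) and the
    second similarity (second image vs. first image) into one loss value;
    lower is better."""
    sim1 = similarity(first_image, sample)
    sim2 = similarity(second_image, first_image)
    return w1 * (1.0 - sim1) + w2 * (1.0 - sim2)

x = np.zeros((4, 4))
assert generation_network_loss(x, x, x) == 0.0        # perfect round trip
assert generation_network_loss(x, x + 1.0, x) > 0.0   # any mismatch is penalised
```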
In a possible implementation, training the first neural network to be trained based on the first image feature and the second image feature to obtain a trained first neural network includes:
determining a loss function value of the first neural network to be trained based on the similarity between the first image feature and the second image feature; and
when the loss function value of the current round is greater than a preset threshold, adjusting the network parameter values of the first neural network based on the loss function value, and performing the next round of training with the adjusted first neural network, until the loss function value is less than or equal to the preset threshold.
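The threshold-gated loop above (keep adjusting parameters while the loss exceeds a preset threshold) can be sketched on a toy one-parameter "network"; the quadratic loss and learning rate are illustrative assumptions:

```python
def train_until_threshold(param, target, lr=0.1, threshold=1e-4, max_rounds=1000):
    """Each round: compute the loss; if it exceeds the threshold, adjust the
    parameter from the loss gradient and run the next round; otherwise stop."""
    for rounds in range(1, max_rounds + 1):
        loss = (param - target) ** 2
        if loss <= threshold:
            return param, rounds
        param -= lr * 2 * (param - target)   # gradient of the squared loss
    return param, max_rounds

param, rounds = train_until_threshold(param=5.0, target=1.0)
# loss <= 1e-4 implies |param - target| <= 1e-2
assert abs(param - 1.0) <= 1e-2
assert rounds < 100
```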
In a possible implementation, after the trained first neural network is obtained, the method further includes:
acquiring a third image sample collected in the downstream task; and
training the trained first neural network again based on the third image sample to obtain a final trained first neural network.
Here, the first neural network can be fine-tuned on the third image samples collected in the downstream task, extending the network's generalization performance in that task.
In a possible implementation, training the trained first neural network again based on the third image sample to obtain a final trained first neural network includes:
inputting the third image sample into the first neural network to obtain the network's task output result;
determining a loss function value of the first neural network based on the comparison between the task output result and the task labeling result annotated for the third image sample; and
training the first neural network again based on the loss function value to obtain a final trained first neural network.
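The fine-tuning step above, which compares the network's task output against the task labels, could be sketched as logistic-regression fine-tuning on the third image samples; the model, labels, and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
# Third image samples from the downstream task, with annotated task labels
# (here synthesized from a hidden linear rule, purely for illustration).
x = rng.normal(size=(100, 8))
w_true = rng.normal(size=8)
labels = (x @ w_true > 0).astype(float)

w = np.zeros(8)                      # stand-in for the pre-trained weights
for _ in range(300):
    pred = sigmoid(x @ w)            # network's task output
    # Loss gradient from comparing the task output with the task labels.
    grad = x.T @ (pred - labels) / len(x)
    w -= 0.5 * grad

acc = ((sigmoid(x @ w) > 0.5) == labels.astype(bool)).mean()
assert acc > 0.9   # the fine-tuned network fits the downstream samples
```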
In a possible implementation, the second neural network is trained according to the following steps:
acquiring an original neural network, the original neural network including at least a feature extraction layer;
performing feature extraction on the first image sample with the feature extraction layer of the original neural network to obtain image feature information output by the feature extraction layer;
adjusting the network parameter values of the feature extraction layer based on the image feature information to obtain an adjusted feature extraction layer; and
determining the original neural network containing the adjusted feature extraction layer as the trained second neural network.
Here, the second neural network can be obtained by training an original neural network that includes a feature extraction layer; such a network outputs relatively general feature information, which facilitates subsequent task migration.
In a second aspect, an embodiment of the present disclosure further provides a method for target detection, the method comprising:
acquiring a target image collected in a downstream task; and
inputting the target image into a first neural network trained by the neural network training method of any one of the first aspect and its various implementations, to obtain a detection result for a target object in the target image.
In a third aspect, an embodiment of the present disclosure further provides an apparatus for training a neural network, the apparatus comprising:
an acquisition module configured to acquire a first image sample collected in an upstream task, a first neural network to be trained in a downstream task, and a second neural network and an image generation network trained on the first image sample, wherein the second neural network is used for feature extraction, and the image generation network is used to generate a new image that conforms to the overall distribution of the first image samples;
an extraction module configured to perform feature extraction on the new image generated by the image generation network with the trained second neural network and with the first neural network to be trained, respectively, to obtain a first image feature and a second image feature; and
a training module configured to train the first neural network to be trained based on the first image feature and the second image feature to obtain a trained first neural network.
In a fourth aspect, an embodiment of the present disclosure further provides an apparatus for target detection, the apparatus comprising:
an acquisition module configured to acquire a target image collected in a downstream task; and
a detection module configured to input the target image into a first neural network trained by the neural network training method of any one of the first aspect and its various implementations, to obtain a detection result for a target object in the target image.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, comprising a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the memory communicate over the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network training method of any one of the first aspect and its various implementations, or the steps of the target detection method of the second aspect, are performed.
In a sixth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, the steps of the neural network training method of any one of the first aspect and its various implementations, or the steps of the target detection method of the second aspect, are performed.
For a description of the effects of the above apparatus, electronic device, and computer-readable storage medium, refer to the description of the above method; details are not repeated here.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings required in the embodiments are briefly introduced below. The drawings are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from them without creative effort.
FIG. 1 shows a flowchart of a method for training a neural network provided by an embodiment of the present disclosure;
FIG. 2 shows a flowchart of a method for target detection provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of an apparatus for training a neural network provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of an apparatus for target detection provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the disclosed embodiments, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
As used herein, the term "and/or" merely describes an association relationship, indicating that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
Research has found that when migrating a pre-trained model to a specific downstream task, the related art usually improves downstream performance through model fine-tuning.
Existing fine-tuning approaches fall mainly into two categories. The first filters and maps the features extracted by the backbone network: in practice, this screening is implemented by adding extra network layers (e.g., convolutional or normalization layers) after the backbone, which filter and map the backbone's general features, retaining and strengthening those needed by the downstream task. The second manipulates the backbone's weight parameters: instead of directly using backpropagation for downstream task migration, weight increments and offset values are predicted for the downstream task within a specified weight parameter space, helping the backbone adapt to the downstream task.
However, both categories have drawbacks. The first may cause the feature mapping layers to overfit when the amount of downstream data is small; the second restricts weight updates to the specified weight parameter space, so the weights cannot be guaranteed to reach an optimal state. Evidently, both leave room for improving model migration performance.
Moreover, related pre-trained models often have specific network structures; in the actual downstream scenarios to which they are migrated, a different network structure may be needed, which places even higher demands on model migration performance.
Based on the above research, the present disclosure provides a scheme that achieves network migration through a teacher-student training approach, in which a trained teacher network guides the student network to be trained, so as to improve the performance of the pre-trained model in downstream tasks.
To facilitate understanding of this embodiment, a method for training a neural network disclosed in an embodiment of the present disclosure is first introduced in detail. The method is generally executed by an electronic device with certain computing capabilities, such as a terminal device, a server, or another processing device; the terminal device may be a user equipment (UE), a mobile device, a user terminal, and the like. In some possible implementations, the method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to FIG. 1, a flowchart of a method for training a neural network provided by an embodiment of the present disclosure, the method includes steps S101 to S103:
S101: acquire a first image sample collected in an upstream task, a first neural network to be trained in a downstream task, and a second neural network and an image generation network trained on the first image sample; the second neural network is used for feature extraction, and the image generation network is used to generate a new image that conforms to the overall distribution of the first image samples;
S102: perform feature extraction on the new image generated by the image generation network with the trained second neural network and the first neural network to be trained, respectively, to obtain a first image feature and a second image feature;
S103: train the first neural network to be trained based on the first image feature and the second image feature to obtain a trained first neural network.
To facilitate understanding of the neural network training method provided by the embodiments of the present disclosure, its application scenarios are first briefly described. The method is mainly applicable to network training for downstream tasks under visual scene migration; a downstream task here is a task in the scene being migrated to, for example, a target detection task in a natural scene or a semantic segmentation task in an acquisition scene.
The number of training samples that can be collected in a downstream task is relatively small. Corresponding to the downstream task is the upstream task, a related task with more training samples. Taking target classification as an example, a target classification neural network trained on a database composed of various target objects may already exist; however, for a specific downstream application scenario such as autonomous driving, the training data corresponding to that scenario is relatively scarce, so the pre-trained model obtained upstream is often needed to support downstream training, for example, by fine-tuning the pre-trained model before migrating it.
However, a series of problems with the fine-tuning schemes in the related art leads to poor performance of pre-trained models in downstream tasks. At the same time, the related art also imposes the restriction that migrating an upstream pre-trained model to a downstream dataset requires the same model structure, which prevents the pre-trained model from migrating well to downstream tasks with different model structures.
正是为了解决上述问题,本公开实施例才提供了一种基于训练好的教师网络指导待训练的学生网络的带教训练方式实现网络迁移的方案,以提升预训练模型在下游任务中的表现性能。It is in order to solve the above problem that the embodiments of the present disclosure provide a solution for network migration based on a trained teacher network instructing the student network to be trained, so as to improve the performance of the pre-trained model in downstream tasks. performance.
In the embodiments of the present disclosure, the pre-trained model here may be the second neural network obtained by training, in the upstream task, on first image samples collected for the upstream task. The image generation network here may be a network that generates new images conforming to the overall distribution of the first image samples.
In a specific application, an upstream data set for the upstream task and a downstream data set for the downstream task may be prepared in advance. The upstream data set serves as a large-scale pre-training data set with a large number of first image samples; the downstream data set is the data set to which the model is to be transferred, with a small number of second image samples.
The first image samples may be images collected in multiple tasks under multiple application scenarios; the application scenarios may include natural scenes, surveillance scenes, acquisition scenes, and so on, and the tasks may include image classification, target detection, semantic segmentation, and so on. The second image samples may be images collected for the specific scene and specific task to be transferred to, for example, street-pedestrian images in a detection task.
An original neural network including a feature extraction layer can be trained on the first image samples. Here, feature extraction may be performed on the first image samples by the feature extraction layer, and the network parameter values of the feature extraction layer are then adjusted according to the image feature information output by the feature extraction layer. The original neural network thus trained can be determined as the trained second neural network described above.
The original neural network may be any network structure with a feature extraction function. For the second neural network obtained by training the original neural network on large-scale upstream data (corresponding to the first image samples), its backbone (corresponding to the feature extraction layer) can output a general feature representation for any input image.
It should be noted that the original neural network may further include, after the feature extraction layer, a task layer for task processing. In that case, the degree of matching between the task output results of the task layer and the task annotation results of the large-scale upstream data may be used to train the entire original neural network; details are not repeated here.
To better train the first neural network for the downstream task, the trained second neural network can be used as the teacher model in knowledge distillation, and the downstream first neural network to be transferred as the student model; the first neural network is trained by fixing the teacher model and training the student model.
During training, the second neural network and the first neural network each perform feature extraction, and the similarity between the first image features output by the teacher model (the second neural network) and the second image features output by the student model (the first neural network) is then used to guide the training of the first neural network, so that the representation output by the student model is as close as possible to that of the teacher model and can better adapt to the downstream task. With the two representations close, the task metrics can be achieved well even if the downstream task does not use the same network structure as the upstream task.
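The similarity-guided distillation described above can be sketched minimally as follows (an illustrative example only, not code from the embodiments; the choice of cosine similarity and all names are assumptions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity between the teacher's and the student's feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def distillation_loss(teacher_feat, student_feat):
    # Higher similarity -> lower loss, pushing the student's
    # representation toward the teacher's.
    return 1.0 - cosine_similarity(teacher_feat, student_feat)

t = np.array([1.0, 0.0, 1.0])   # first image feature (teacher output)
s = np.array([0.9, 0.1, 1.1])   # second image feature (student output)
loss = distillation_loss(t, s)
```

Minimizing this loss over the student's parameters (with the teacher frozen) is one way to make the two representations "as close as possible".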
Considering that, in practical applications, the first image samples come from a large-scale pre-training data set, different first image samples may have been collected in different application scenarios. The characteristics of first image samples collected in different scenarios may differ to some extent, which may interfere with network training. Here, to reduce the interference of irrelevant information while fully mining the feature information contained in the upstream data set, the image generation network may be used to generate new images conforming to the overall distribution of the first image samples, and the teacher-student training described above is then performed on the generated new images. In this way, the pre-trained model to which the second neural network belongs can be transferred efficiently and accurately to the specific downstream task domain, with a good transfer effect even when the amount of downstream data is small.
Considering the key role of the training process of the image generation network in model transfer, the scheme for training the image generation network is described next. The image generation network in the embodiments of the present disclosure may be trained on the basis of a codebook generation network, or may be trained synchronously with the codebook generation network. This is developed in the following two aspects.
First aspect: the embodiments of the present disclosure may train the image generation network according to the following steps:
Step 1: Obtain a first image output based on the codebook generation network; the codebook generation network is used to generate a codebook that decomposes the first image samples into multiple primitives;
Step 2: Input the first image into the image generation network to be trained, to obtain a second image output by the image generation network;
Step 3: Determine the loss function value of the image generation network to be trained based on the image similarity between the second image and the first image;
Step 4: Train the image generation network to be trained based on the loss function value, to obtain a trained image generation network.
Here, the first image output by the codebook generation network can be used as the input image of the image generation network to be trained, and the loss function value of the image generation network is then determined based on the image similarity between the second image output by the image generation network and the first image. A larger loss function value indicates, to some extent, a larger gap between the output second image and the input first image, in which case further training is needed; a smaller loss function value indicates a smaller gap. When the gap is small enough, it can be determined that the output image is essentially consistent with the input image, and training can be ended.
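The relationship between the loss function value and the gap between the two images can be illustrated with a toy pixel-wise loss (a sketch under the assumption that mean squared error stands in for the image-similarity measure, which the embodiments do not specify):

```python
import numpy as np

def reconstruction_loss(first_image, second_image):
    # Pixel-wise mean squared error: a larger gap between the generated
    # second image and the input first image gives a larger loss value.
    return float(np.mean((first_image - second_image) ** 2))

first = np.full((4, 4), 0.5)   # input first image
close = first + 0.01           # generator output nearly matching its input
far = first + 0.4              # poorly reconstructed output
```

A well-trained generator drives this value toward zero, at which point the output is essentially consistent with the input and training can stop.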
To better train the image generation network, before the first image is input to the image generation network to be trained, part of the image regions in the first image may be masked to obtain a masked first image. When the masked first image is input to the image generation network to be trained, the unmasked image regions can guide the generation of the masked image regions, and network training can then be carried out based on how close the generated image is to the original first image.
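The masking of part of the image regions might look like the following sketch (patch size, masking ratio, and function names are hypothetical; the embodiments do not fix these details):

```python
import numpy as np

def mask_regions(image, patch=4, ratio=0.5, seed=0):
    # Zero out a random subset of non-overlapping patches; the remaining
    # unmasked patches guide the generator to restore the masked ones.
    rng = np.random.default_rng(seed)
    out = image.copy()
    h, w = image.shape[:2]
    patches = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    n_mask = int(len(patches) * ratio)
    for idx in rng.choice(len(patches), size=n_mask, replace=False):
        i, j = patches[idx]
        out[i:i + patch, j:j + patch] = 0.0
    return out

img = np.ones((8, 8))          # toy "first image"
masked = mask_regions(img)     # half of the 4x4 patches are zeroed
```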
In the embodiments of the present disclosure, the codebook generation network may likewise be trained on the first image samples. The codebook generation network here mainly serves to train a codebook that can encode the visual features in the upstream data; image restoration can then be performed from the multiple primitives contained in the codebook generated by the codebook generation network, thereby obtaining the first image output by the codebook generation network.
The training process and the application process of the codebook generation network are described in detail next.
In the embodiments of the present disclosure, an adversarial network composed of a paired encoder and decoder can be used to train the codebook generation network. Here, the first image sample can be input to the encoder to be trained to obtain the codebook output by the encoder; the codebook output by the encoder is input to the decoder to be trained to obtain the image output by the decoder; it is then verified whether the similarity between the image output by the decoder and the first image sample input to the encoder is greater than a preset threshold. If not, the above process of inputting the first image sample to the encoder to be trained is repeated, until the similarity between the two images is greater than the preset threshold.
Here, with the trained codebook generation network, an image can be decomposed by the encoder into a codebook composed of several primitives, and the decoder can restore those primitives into the image.
Here, the first image sample can be input to the encoder included in the codebook generation network to obtain the codebook output by the encoder. When the codebook output by the encoder is input to the decoder included in the codebook generation network, image restoration can be performed using the primitives contained in the codebook, thereby obtaining the re-represented first image.
It should be noted that, in practical applications, the determination of the first image may be combined with the training of the codebook generation network. That is, the steps of inputting the first image sample into the encoder to be trained to obtain the codebook output by the encoder, and inputting the codebook output by the encoder into the decoder to be trained to obtain the image output by the decoder, may be repeated until the similarity between the image output by the decoder and the first image sample input to the encoder is greater than the preset threshold, at which point the image output by the decoder is determined as the first image.
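The encode/decode behavior of the codebook generation network — decomposing an image into primitive indices and restoring it from them — can be illustrated with a toy nearest-neighbour quantizer (the codebook entries and patch vectors below are invented purely for illustration):

```python
import numpy as np

# A hypothetical 4-entry codebook of learned "primitives" (vectors).
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def encode(patches):
    # Map each patch vector to the index of its nearest primitive.
    dists = np.linalg.norm(patches[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def decode(indices):
    # Restore an approximate image by looking the primitives back up.
    return codebook[indices]

patches = np.array([[0.1, 0.1], [0.9, 0.1], [0.2, 0.8]])
codes = encode(patches)        # primitive indices, i.e. the "codebook" view
restored = decode(codes)       # approximate reconstruction of the patches
```

Training the real encoder/decoder pair amounts to adjusting the codebook and the two networks until the restored image is sufficiently similar to the input.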
Second aspect: in the case where the image generation network includes a first generation sub-network for generating a codebook that decomposes the first image samples into multiple primitives, and a second generation sub-network for generating new images based on the image output by the first generation sub-network, the embodiments of the present disclosure may train the image generation network according to the following steps:
Step 1: Input the first image sample into the trained first generation sub-network, to obtain a first image output by the first generation sub-network;
Step 2: Input the first image into the second generation sub-network to be trained, to obtain a second image output by the second generation sub-network;
Step 3: Determine the loss function value of the image generation network to be trained based on the first image similarity between the first image and the input first image sample, and the second image similarity between the second image and the first image;
Step 4: Train the image generation network to be trained based on the loss function value, to obtain a trained image generation network.
Here, the first image similarity between the first image and the input first image sample, and the second image similarity between the second image and the first image, can be combined to determine the loss function value of the image generation network to be trained. Both the first image similarity and the second image similarity affect the adjustment of the relevant network parameter values; that is, synchronous training of the first generation sub-network and the second generation sub-network is realized here, giving higher training efficiency.
Once the network including the first generation sub-network and the second generation sub-network is trained, a corresponding new image can be generated for any input first image sample, and the new image contains the rich data information of the upstream task, making it better suited to the training needs of the network.
When a large number of new images generated by the image generation network are input to the trained second neural network (i.e., the pre-trained model) of the upstream task, general representations can be obtained. Using these general representations for knowledge distillation, the knowledge of the general representations can be distilled into the downstream first neural network to be transferred.
In the embodiments of the present disclosure, the image similarity between the two image features (i.e., the first image feature and the second image feature) extracted by the trained second neural network and the first neural network to be trained can be used to guide the training of the first neural network in the downstream task, which can be achieved through the following steps:
Step 1: Determine the loss function value of the first neural network to be trained based on the image similarity between the first image feature and the second image feature;
Step 2: When the loss function value corresponding to the current round is greater than a preset threshold, adjust the network parameter values of the first neural network based on the loss function value, and perform the next round of training with the adjusted first neural network, until the loss function value is less than or equal to the preset threshold.
Here, the image similarity between the two image features is negatively correlated with the loss function value of the first neural network; that is, when the image similarity is relatively small, the determined loss function value is large, and when the image similarity is relatively large, the determined loss function value is small. The purpose of training the first neural network in the embodiments of the present disclosure is to make the representations output by the two neural networks (the second neural network and the first neural network) as similar as possible.
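The round-based loop of steps 1 and 2 can be sketched as follows (a stand-in for gradient descent, with mean squared error as a loss that is negatively correlated with similarity; all values are illustrative):

```python
import numpy as np

def train_rounds(student_feat, teacher_feat, lr=0.2, threshold=0.01, max_rounds=100):
    # Each round: compute a loss that grows as the two features diverge;
    # if it exceeds the threshold, nudge the student feature toward the
    # teacher's (a stand-in for a parameter update) and train another round.
    feat = student_feat.astype(float).copy()
    loss = float(np.mean((teacher_feat - feat) ** 2))
    for _ in range(max_rounds):
        if loss <= threshold:
            break
        feat += lr * (teacher_feat - feat)
        loss = float(np.mean((teacher_feat - feat) ** 2))
    return feat, loss

teacher = np.array([1.0, 2.0, 3.0])
student = np.zeros(3)
final_feat, final_loss = train_rounds(student, teacher)
```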
To further extend the generalization performance of the first neural network in the downstream task domain, the second image samples collected in the downstream task can be used here to fine-tune the first neural network, which can be achieved through the following steps:
Step 1: Input the second image samples into the first neural network, to obtain task output results of the network;
Step 2: Determine the loss function value of the first neural network based on the comparison between the task output results and the task annotation results with which the second image samples are annotated;
Step 3: Perform network training again on the first neural network based on the loss function value, to obtain a finally trained first neural network.
Here, feature extraction can be performed by the feature extraction layer included in the first neural network. When the feature information output by the feature extraction layer is input to the task layer included in the first neural network, multiple rounds of training of the first neural network can be performed based on the matching between the task output results and the task annotation results of the second image samples.
In the embodiments of the present disclosure, when a task output result does not match the corresponding task annotation result, the current network performance is poor and the network parameter values need to be adjusted for the next round of training, until the two results match or until another network convergence condition is met, for example, the number of iterations reaches a preset number, or the loss function value is less than a preset threshold.
The task annotation results here differ for different downstream tasks. For example, some image samples may be annotated for a target detection task with information such as the position and size of a target object, while other image samples may be annotated for a semantic segmentation task with object semantic information. Annotation can be performed for different downstream tasks, and no specific limitation is imposed here.
In the process of fine-tuning the network based on the second image samples in the embodiments of the present disclosure, the adjustment may cover every network layer included in the network. Here, all parameters of each network layer can be unfrozen and a relatively small learning rate used for the final adjustment of the network, which can significantly improve the generalization performance of the network in the downstream task domain.
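The fine-tuning setup — unfreezing all parameters of every layer and using a smaller learning rate — might be expressed as a configuration sketch (the layer names, the base rate, and the scaling factor are all hypothetical):

```python
# Hypothetical base learning rate used in the earlier distillation stage.
DISTILL_LR = 1e-3

def finetune_config(layers, lr_scale=0.1):
    # Unfreeze every layer and shrink the learning rate for the
    # final adjustment on the downstream second image samples.
    return {layer: {"trainable": True, "lr": DISTILL_LR * lr_scale}
            for layer in layers}

cfg = finetune_config(["backbone", "neck", "task_head"])
```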
Based on the neural network training method provided by the embodiments of the present disclosure, an embodiment of the present disclosure further provides a target detection method, as shown in FIG. 2, which specifically includes the following steps:
S201: Obtain a target image collected in a downstream task;
S202: Input the target image into the first neural network trained by the neural network training method, to obtain a detection result of a target object in the target image.
Here, when the target image collected in the downstream task is obtained, the target object in the target image can be detected by the trained first neural network for target detection, to obtain the detection result of the target object in the target image.
The detection result of the target object in the target image may be information such as the position and size of the target object in the target image.
In the embodiments of the present disclosure, the target images collected for different downstream tasks also differ; for details, refer to the collection process of the second image samples, which is not repeated here. For the training process of the first neural network, refer to the relevant descriptions in the above embodiments, which are likewise not repeated here.
It should be noted that the neural network training method provided by the embodiments of the present disclosure can be applied not only to the field of target detection but also to fields such as image classification and semantic segmentation, which are not repeated here.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, embodiments of the present disclosure further provide an apparatus corresponding to the method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above method of the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
Referring to FIG. 3, which is a schematic diagram of a neural network training apparatus provided by an embodiment of the present disclosure, the apparatus includes an obtaining module 301, an extraction module 302, and a training module 303; wherein:
the obtaining module 301 is configured to obtain first image samples collected in an upstream task, a first neural network to be trained in a downstream task, and a second neural network and an image generation network trained on the first image samples; the second neural network is used for feature extraction, and the image generation network is used to generate new images that conform to the overall distribution of the first image samples;
the extraction module 302 is configured to perform feature extraction on the new images generated by the image generation network, using the trained second neural network and the first neural network to be trained respectively, to obtain first image features and second image features;
the training module 303 is configured to train the first neural network to be trained based on the first image features and the second image features, to obtain a trained first neural network.
With the above neural network training apparatus, feature extraction can be performed on the new images generated by the image generation network, using the first neural network to be trained in the downstream task and the second neural network trained on the first image samples collected in the upstream task respectively, and the first neural network can then be trained according to the obtained first image features and second image features. Since the new images generated by the image generation network conform better to the overall distribution of the first image samples, such image samples are better adapted to the network environment of the second neural network. At the same time, the first image features output by the trained second neural network and the second image features output by the first neural network to be trained can better guide the training of the first neural network, thereby further improving the performance on downstream tasks.
In a possible implementation, the obtaining module 301 is configured to train the image generation network according to the following steps:
obtaining a first image output based on the codebook generation network, where the codebook generation network is used to generate a codebook that decomposes the first image samples into multiple primitives;
inputting the first image into the image generation network to be trained, to obtain a second image output by the image generation network;
determining the loss function value of the image generation network to be trained based on the image similarity between the second image and the first image;
training the image generation network to be trained based on the loss function value, to obtain a trained image generation network.
In a possible implementation, the obtaining module 301 is configured, when inputting the first image into the image generation network to be trained, to:
mask part of the image regions in the first image, to obtain a masked first image;
input the masked first image into the image generation network to be trained.
In a possible implementation, the codebook generation network includes an encoder and a decoder, and the obtaining module 301 is configured to train the codebook generation network according to the following steps:
repeating the following steps until the similarity between the image output by the decoder and the first image sample input to the encoder is greater than a preset threshold:
inputting the first image sample into the encoder to be trained, to obtain a codebook output by the encoder; and inputting the codebook output by the encoder into the decoder to be trained, to obtain an image output by the decoder.
In a possible implementation, the obtaining module 301 is configured to obtain the first image output based on the codebook generation network according to the following steps:
inputting the first image sample into the encoder included in the codebook generation network, to obtain the codebook output by the encoder;
inputting the codebook output by the encoder into the decoder included in the codebook generation network, to obtain the first image output by the decoder.
In a possible implementation, the image generation network includes a first generation sub-network for generating a codebook that decomposes the first image samples into multiple primitives, and a second generation sub-network for generating new images based on the image output by the first generation sub-network; the obtaining module 301 is configured to train the image generation network according to the following steps:
inputting the first image sample into the trained first generation sub-network, to obtain a first image output by the first generation sub-network;
inputting the first image into the second generation sub-network to be trained, to obtain a second image output by the second generation sub-network;
determining the loss function value of the image generation network to be trained based on the first image similarity between the first image and the input first image sample, and the second image similarity between the second image and the first image;
training the image generation network to be trained based on the loss function value, to obtain a trained image generation network.
In a possible implementation, the training module 303 is configured to train the first neural network to be trained based on the first image features and the second image features according to the following steps, to obtain a trained first neural network:
determining the loss function value of the first neural network to be trained based on the image similarity between the first image features and the second image features;
when the loss function value corresponding to the current round is greater than a preset threshold, adjusting the network parameter values of the first neural network based on the loss function value, and performing the next round of training with the adjusted first neural network, until the loss function value is less than or equal to the preset threshold.
In a possible implementation, the training module 303 is further configured to:
after the trained first neural network is obtained, obtain second image samples collected in the downstream task, and perform network training again on the trained first neural network based on the second image samples, to obtain a finally trained first neural network.
In a possible implementation, the training module 303 is configured to perform network training again on the trained first neural network based on the second image sample according to the following steps, to obtain the final trained first neural network:
inputting the second image sample into the first neural network to obtain a task output result of the network;
determining a loss function value of the first neural network based on a comparison between the task output result and the task annotation result labeled for the second image sample;
performing network training on the first neural network again based on the loss function value, to obtain the final trained first neural network.
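This fine-tuning step can be illustrated with a deliberately tiny stand-in: a one-parameter "network", mean squared error as an assumed form of the comparison-based loss, and one gradient update per call. The real first neural network and task loss are of course far richer than this sketch.

```python
def task_loss(outputs, labels):
    # Compare the network's task outputs with the annotated task labels.
    return sum((o - y) ** 2 for o, y in zip(outputs, labels)) / len(labels)

def finetune_step(weight, samples, labels, lr=0.05):
    outputs = [weight * x for x in samples]      # task output for each sample
    loss = task_loss(outputs, labels)            # loss from output/label comparison
    grad = sum(2 * (weight * x - y) * x
               for x, y in zip(samples, labels)) / len(labels)
    return weight - lr * grad, loss              # adjusted weight, current loss
```

Repeated calls drive the weight toward the value that best maps samples to labels, with the loss decreasing round over round.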
In a possible implementation, the acquisition module 301 is configured to train the second neural network according to the following steps:
acquiring an original neural network, the original neural network including at least a feature extraction layer;
performing feature extraction on the first image sample based on the feature extraction layer included in the original neural network, to obtain image feature information output by the feature extraction layer;
adjusting the network parameter values of the feature extraction layer based on the image feature information, to obtain an adjusted feature extraction layer;
determining the original neural network containing the adjusted feature extraction layer as the trained second neural network.
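These steps can be read as a self-supervised adjustment of the feature extraction layer from its own output statistics. The sketch below is an assumption-laden illustration: the "layer" is a single scale parameter, and the training signal drives the squared feature norm toward 1 — the patent does not specify the actual objective.

```python
def train_second_network(samples, scale=2.0, lr=0.01, rounds=200):
    # Stand-in feature extraction layer: multiply the input by `scale`.
    # Adjust `scale` from the extracted features by pushing the squared
    # feature norm toward 1 (an assumed self-supervised target).
    for _ in range(rounds):
        for s in samples:
            feats = [scale * x for x in s]       # feature extraction
            norm2 = sum(f * f for f in feats)    # statistic of the features
            s2 = sum(x * x for x in s)
            # gradient of (norm2 - 1)^2 with respect to scale
            scale -= lr * 4.0 * (norm2 - 1.0) * scale * s2
    return scale  # the "second network" keeps this adjusted layer
```

With a unit-norm sample, the scale converges to 1, i.e., the adjusted layer learns to leave the feature norm unchanged.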
Referring to FIG. 4, which is a schematic diagram of an apparatus for target detection provided by an embodiment of the present disclosure, the apparatus includes an acquisition module 401 and a detection module 402, wherein:
the acquisition module 401 is configured to acquire a target image collected in a downstream task;
the detection module 402 is configured to input the target image into a first neural network trained by the neural network training method, to obtain a detection result of the target object in the target image.
For a description of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be detailed here.
Corresponding to the methods in FIG. 1 and FIG. 2, an embodiment of the present disclosure further provides an electronic device. As shown in FIG. 5, a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure includes:
a processor 501, a memory 502, and a bus 503. The memory 502 is configured to store execution instructions and includes an internal memory 5021 and an external memory 5022. The internal memory 5021 temporarily stores operation data of the processor 501 as well as data exchanged with the external memory 5022, such as a hard disk; the processor 501 exchanges data with the external memory 5022 through the internal memory 5021. When the electronic device runs, the processor 501 communicates with the memory 502 through the bus 503, so that the processor 501 executes the steps of the neural network training method shown in FIG. 1 or the steps of the target detection method shown in FIG. 2.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the methods described in the foregoing method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the methods described in the foregoing method embodiments. For details, refer to the foregoing method embodiments, which will not be repeated here.
The above computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the system and apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other ways of division. As another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be through communication interfaces; the indirect coupling or communication connection of apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, in essence, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed by the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210333676.7A CN114648650A (en) | 2022-03-30 | 2022-03-30 | Method and device, device and storage medium for neural network training and target detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114648650A true CN114648650A (en) | 2022-06-21 |
Family
ID=81996274
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210333676.7A Pending CN114648650A (en) | 2022-03-30 | 2022-03-30 | Method and device, device and storage medium for neural network training and target detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114648650A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023142918A1 (en) * | 2022-01-28 | 2023-08-03 | 华为云计算技术有限公司 | Image processing method based on pre-trained large model, and related apparatus |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112581370A (en) * | 2020-12-28 | 2021-03-30 | 苏州科达科技股份有限公司 | Training and reconstruction method of super-resolution reconstruction model of face image |
CN112861825A (en) * | 2021-04-07 | 2021-05-28 | 北京百度网讯科技有限公司 | Model training method, pedestrian re-identification method, device and electronic equipment |
US20220019855A1 (en) * | 2019-03-31 | 2022-01-20 | Huawei Technologies Co., Ltd. | Image generation method, neural network compression method, and related apparatus and device |
CN114037055A (en) * | 2021-11-05 | 2022-02-11 | 北京市商汤科技开发有限公司 | Data processing system, method, device, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
CHEN, Xingjian et al.: "A Two-Teacher Framework for Knowledge Distillation", Advances in Neural Networks - ISNN 2019, Pt I, no. 11554, 31 December 2019, pages 58-66 *
WEI, Lei: "Research on Face Frontalization and Model Compression Based on Generative Adversarial Networks", China Master's Theses Full-text Database, Information Science and Technology, 15 March 2022 (2022-03-15), pages 138-1031 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111429885B (en) | A method for mapping audio clips to face and mouth keypoints | |
CN112149459B (en) | Video saliency object detection model and system based on cross attention mechanism | |
CN111539290B (en) | Video motion recognition method and device, electronic equipment and storage medium | |
CN111126674B (en) | Propagation prediction method and system based on deep model | |
CN109783094A (en) | Front end page generation method, device, computer equipment and storage medium | |
CN112949647B (en) | Three-dimensional scene description method and device, electronic equipment and storage medium | |
CN115687687B (en) | Video segment searching method and system for open domain query | |
CN113726545B (en) | Network traffic generation method and device based on knowledge-enhanced generative confrontation network | |
US11423307B2 (en) | Taxonomy construction via graph-based cross-domain knowledge transfer | |
CN113065459B (en) | Video instance segmentation method and system based on dynamic condition convolution | |
CN105786980A (en) | Method and apparatus for combining different examples for describing same entity and equipment | |
Liu et al. | Research of animals image semantic segmentation based on deep learning | |
CN115688920A (en) | Knowledge extraction method, model training method, device, equipment and medium | |
CN117636072B (en) | Image classification method and system based on difficulty perception data enhancement and label correction | |
CN116777925B (en) | Image segmentation domain generalization method based on style migration | |
CN117973445B (en) | Graph masking self-coding learning method, system and storage medium based on community awareness | |
CN115471771A (en) | Video time sequence action positioning method based on semantic level time sequence correlation modeling | |
CN113392220A (en) | Knowledge graph generation method and device, computer equipment and storage medium | |
CN114648650A (en) | Method and device, device and storage medium for neural network training and target detection | |
CN115204171A (en) | Document-level event extraction method and system based on hypergraph neural network | |
CN114648679A (en) | Neural network training method, neural network training device, target detection method, target detection device, equipment and storage medium | |
CN111144492B (en) | Scene map generation method for mobile terminal virtual reality and augmented reality | |
CN113192006A (en) | Crowd counting method and system based on bimodal network | |
CN112036495A (en) | Garment image classification method based on web crawler and transfer learning | |
CN113377959B (en) | Few-sample social media rumor detection method based on meta-learning and deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||