WO2023273720A1 - Method and apparatus for training model, and device, and storage medium - Google Patents


Info

Publication number
WO2023273720A1
WO2023273720A1 (PCT application PCT/CN2022/095186)
Authority
WO
WIPO (PCT)
Prior art keywords
model
training
substructure
target image
feature map
Prior art date
Application number
PCT/CN2022/095186
Other languages
French (fr)
Chinese (zh)
Inventor
张炜
许靖
梅涛
周伯文
Original Assignee
京东科技控股股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东科技控股股份有限公司 filed Critical 京东科技控股股份有限公司
Publication of WO2023273720A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • Embodiments of the present application relate to the field of computer technology, specifically to the field of image processing technology, and in particular to a method and apparatus for training a model.
  • The present application provides a method, apparatus, device, and storage medium for training a model, and a method, apparatus, device, and storage medium for generating information.
  • Some embodiments of the present application provide a method for training a model, the method comprising: obtaining a training sample set, wherein each training sample in the training sample set includes a target image and a feature map corresponding to the target image;
  • using the target image included in the training sample as the input data of a network and the feature map corresponding to the input target image as the output data of the network, and obtaining an image detection model through training, wherein the network structure of the image detection model is constructed from the model substructures in the various substructure classes.
  • The optimization goal of the image detection model is to learn the optimal solution of the network structure of the image detection model by sampling the structural parameters of each model substructure in each substructure class; the substructure classes are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the image detection model.
  • In some embodiments, the substructure classes are obtained by dividing the feature maps processed by each model substructure in the network structure search space of the image detection model according to their feature semantic levels.
  • The network structure of the image detection model is constructed by grouping the model substructures belonging to the same substructure class and stacking the groups according to the levels into which the substructure classes are divided.
  • During training, each model substructure in the image detection model is expanded into multiple feature layers, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and the output data of each feature layer preceding it within the corresponding model substructure.
  • Specifically, the input data of each feature layer may include the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model; or it may include the output data of each feature layer preceding it within the corresponding model substructure; or it may include both.
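As a minimal illustration of this input-composition rule (the function name and the representation of feature maps as flat lists of numbers are assumptions of this sketch, not part of the application):

```python
def feature_layer_input(prev_substructure_outputs, prev_layer_outputs):
    """Assemble a feature layer's input by element-wise summation of the
    outputs of the preceding model substructures and of every preceding
    feature layer (feature maps modelled as flat lists of numbers)."""
    sources = list(prev_substructure_outputs) + list(prev_layer_outputs)
    return [sum(values) for values in zip(*sources)]

# e.g. two preceding substructures and one earlier feature layer
combined = feature_layer_input([[1, 2], [3, 4]], [[5, 6]])
```

Either source group may also be used alone, matching the "at least one of" wording above.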
  • In some embodiments, the substructure classes include a first substructure class and a second substructure class, the image detection model includes a first submodel and a second submodel, the network structure of the first submodel is constructed from the model substructures in the first substructure class, and the network structure of the second submodel is constructed from the model substructures in the second substructure class. Training the image detection model then includes: using the target image included in the training samples in the training sample set as the input data of the network and the feature map corresponding to the input target image as the output data of the network, adjusting the structural parameters of each model substructure in the first substructure class to obtain the trained first submodel; likewise using the target image as the input data of the network and the feature map corresponding to the input target image as the output data of the network, adjusting the structural parameters of each model substructure in the second substructure class to obtain the trained second submodel; and determining the image detection model based on the trained first submodel and the trained second submodel.
  • In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model, the model parameters of the first detection sub-model are the training parameters of the image detection model, and the model parameters of the second detection sub-model are the structural parameters of the image detection model. Training the image detection model then includes: using the target image included in the training samples in the training sample set as input data and the feature map corresponding to the input target image as output data, adjusting the training parameters to obtain the trained first detection sub-model; using the target image as input data and the corresponding feature map as output data, adjusting the structural parameters to obtain the trained second detection sub-model; and determining the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
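The alternation just described (adjust the training parameters, then the structural parameters) can be sketched with a deliberately tiny scalar model; the model y = w * s * x, the names, and the learning-rate values are illustrative assumptions, not the application's actual networks:

```python
def train_alternating(samples, w=0.5, s=0.5, lr=0.1, epochs=50):
    """Toy two-phase loop for the model y = w * s * x:
    phase 1 adjusts the training parameter w, phase 2 the structural
    parameter s, both against the same (input, target) supervision."""
    for _ in range(epochs):
        for x, target in samples:       # phase 1: training parameters
            error = w * s * x - target
            w -= lr * error * s * x
        for x, target in samples:       # phase 2: structural parameters
            error = w * s * x - target
            s -= lr * error * w * x
    return w, s
```

After training on a single pair such as (1.0, 2.0), the product w * s approaches the target, mirroring how the trained first and second detection sub-models jointly determine the image detection model.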
  • Some embodiments of the present application provide a method for generating information, the method including: acquiring a target image; and inputting the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is obtained through training according to any embodiment of the method for training a model above.
  • In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model.
  • The first detection sub-model is obtained by training the training parameters of the image detection model using a machine learning algorithm, and the second detection sub-model is obtained by training the structural parameters of the image detection model using the machine learning algorithm.
  • Inputting the target image into the pre-trained image detection model to generate the feature map corresponding to the target image includes: inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and, based on the first feature map and the second feature map, determining a feature map corresponding to both as the feature map corresponding to the target image.
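One simple way to realize that final combination step is sketched below; the averaging rule, the function name, and the list representation of feature maps are assumptions of this sketch (the application does not fix a particular combination rule):

```python
def generate_feature_map(image, first_sub_model, second_sub_model):
    """Run both pre-trained detection sub-models on the target image and
    combine the two resulting feature maps, here by element-wise averaging."""
    first_map = first_sub_model(image)
    second_map = second_sub_model(image)
    return [(a + b) / 2 for a, b in zip(first_map, second_map)]

# Stand-in sub-models for demonstration only
result = generate_feature_map([1, 2],
                              lambda im: [x * 2 for x in im],
                              lambda im: [x * 4 for x in im])
```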
  • Some embodiments of the present application provide an apparatus for training a model, the apparatus including: an acquisition unit configured to acquire a training sample set, wherein each training sample in the training sample set includes a target image and a feature map corresponding to the target image;
  • and a training unit configured to use the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network to train the image detection model, wherein the network structure of the image detection model is constructed from the model substructures in the various substructure classes, the optimization goal of the image detection model is to learn the optimal solution of its network structure by sampling the structural parameters of each model substructure in each substructure class, and the substructure classes are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the image detection model.
  • In some embodiments, the substructure classes in the training unit are obtained by dividing the feature maps processed by each model substructure in the network structure search space of the image detection model according to their feature semantic levels.
  • The network structure of the image detection model in the training unit is constructed by grouping the model substructures belonging to the same substructure class and stacking the groups according to the levels into which the substructure classes are divided.
  • During training, each model substructure in the image detection model is expanded into multiple feature layers, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and the output data of each feature layer preceding it within the corresponding model substructure.
  • Specifically, the input data of each feature layer may include the output data of at least two model substructures preceding the corresponding model substructure; or the output data of each feature layer preceding it within the corresponding model substructure; or both.
  • In some embodiments, the substructure classes in the training unit include a first substructure class and a second substructure class, the image detection model includes a first submodel and a second submodel, the network structure of the first submodel is constructed from the model substructures in the first substructure class, and the network structure of the second submodel is constructed from the model substructures in the second substructure class.
  • The training unit includes: a first training module configured to use the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network, and to adjust the structural parameters of each model substructure in the first substructure class to obtain the trained first submodel;
  • and a second training module configured to use the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network, and to adjust the structural parameters of each model substructure in the second substructure class to obtain the trained second submodel.
  • In some embodiments, the image detection model in the training unit includes a first detection sub-model and a second detection sub-model, the model parameters of the first detection sub-model are the training parameters of the image detection model, and the model parameters of the second detection sub-model are the structural parameters of the image detection model.
  • The training unit includes: a third training module configured to use the target image included in the training samples as input data and the feature map corresponding to the input target image as output data, and to adjust the training parameters to obtain the trained first detection sub-model;
  • a fourth training module configured to use the target image included in the training samples as input data and the feature map corresponding to the input target image as output data, and to adjust the structural parameters to obtain the trained second detection sub-model;
  • and a second determination module configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
  • Some embodiments of the present application provide an apparatus for generating information, the apparatus including: an image acquisition unit configured to acquire a target image; and a generation unit configured to input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is obtained through training according to any embodiment of the method for training a model above.
  • In some embodiments, the image detection model in the generation unit includes a first detection sub-model and a second detection sub-model; the first detection sub-model is obtained by training the training parameters of the image detection model using a machine learning algorithm, and the second detection sub-model is obtained by training the structural parameters of the image detection model using the machine learning algorithm.
  • The generation unit includes: a first generation module configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image;
  • a second generation module configured to input the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image;
  • and a second determination module configured to determine, based on the first feature map and the second feature map, a feature map corresponding to both as the feature map corresponding to the target image.
  • Some embodiments of the present application provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method described in the foregoing implementations.
  • Some embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method described in the foregoing implementations.
  • FIG. 1 is a schematic diagram of a first embodiment of a method for training a model according to the present application;
  • FIG. 2 is a scene diagram of a method for training a model that can implement an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a second embodiment of a method for training a model according to the present application;
  • FIG. 4 is a schematic diagram of the model sub-module structure of an embodiment of the present application;
  • FIG. 5A is a schematic diagram of a network model architecture according to the method for training a model of the present application;
  • FIGS. 5B and 5C are schematic diagrams of the sampling process of the model sub-module structure in the network model architecture;
  • FIG. 5D is a schematic diagram of the sampling results of the network model architecture;
  • FIG. 6 is a schematic diagram of a first embodiment of a method for generating information according to the present application;
  • FIG. 7 is a schematic structural diagram of an embodiment of an apparatus for training a model according to the present application;
  • FIG. 8 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present application;
  • FIG. 9 is a block diagram of an electronic device used to implement an embodiment of the present application.
  • FIG. 1 shows a schematic diagram 100 of a first embodiment of a method for training a model according to the present application.
  • the method for training the model includes the following steps:
  • Step 101: obtain a training sample set.
  • the execution subject may obtain the training sample set from other electronic devices or locally through a wired connection or a wireless connection.
  • the training samples in the training sample set include target images and feature maps corresponding to the target images.
  • The above-mentioned wireless connection methods may include, but are not limited to, 3G, 4G, and 5G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra-wideband) connections, and other wireless connection methods now known or developed in the future.
  • Step 102: use the target image included in the training samples in the training sample set as the input data of the network, and the feature map corresponding to the input target image as the output data of the network, to train an image detection model.
  • the execution subject can use the machine learning algorithm to train the image detection model by using the target image obtained in step 101 as the input data of the network, and using the feature map corresponding to the input target image as the output data of the network.
  • The network structure of the model is constructed from the model substructures in the various substructure classes; for example, the model substructures in the different substructure classes are stacked according to a preset construction method.
  • During training, the optimization goal of the image detection model may be pursued by using the machine learning algorithm to sample the structural parameters of each model substructure in the various substructure classes at each step, continually sampling better structures until convergence, thereby obtaining the optimal model network structure.
  • each model substructure in the network structure search space is analyzed for feature information, and each model substructure in each substructure is classified based on the analysis results.
  • the feature information can be forged feature information in the image.
  • The above execution subject may store a pre-trained image detection model whose network architecture is predefined; for example, an eight-layer basic model architecture is defined for the search of the model sub-module structure.
  • the execution subject may use the image detection model to predict a feature map corresponding to feature information in the target image in the target image.
  • the image detection model can be used to characterize the correspondence between the target object and the feature map.
  • the model structure of the image prediction model can be constructed based on various logistic regression models in related technologies, such as but not limited to: BERT, FastText, TextCNN, etc.
  • the image detection model may be, for example, a data table or a calculation formula, and this embodiment does not make any limitation on this aspect.
  • the above-mentioned machine learning algorithm is a well-known technology widely researched and applied at present, and will not be repeated here.
  • the method 200 for training a model of this embodiment runs on a server 201 .
  • The server 201 first obtains the training sample set 202, wherein the training samples in the training sample set include the target image and the feature map corresponding to the target image. The server 201 then uses the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network to train the image detection model 203, wherein the network structure of the model is constructed from the model substructures in the various substructure classes, the optimization goal of the image detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of each model substructure, and the substructure classes are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the model.
  • The method for training a model of this embodiment obtains a training sample set, wherein the training samples include a target image and a feature map corresponding to the target image; it then uses the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network to train the image detection model.
  • The network structure of the model is constructed from the model substructures in the various substructure classes; the optimization goal of the detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of each model substructure; and the substructure classes are obtained by analyzing the feature maps processed by each model substructure. This realizes research on and optimization of the image detection model structure.
  • By classifying the model substructures, information at different levels is separated, and model structures that process different contents are generated, realizing the construction of an image detection model on a separated search space. More resources can be allocated to the processing of the required types of information according to requirements, realizing lightweight image authentication, avoiding waste of resources, and improving the accuracy and efficiency of the model as a whole.
  • Referring to FIG. 3, a schematic diagram 300 of a second embodiment of a method for training a model is shown.
  • the flow of the method includes the following steps:
  • Step 301: acquire a training sample set.
  • Step 302: use the target image included in the training samples in the training sample set as the input data of the network, and the feature map corresponding to the input target image as the output data of the network, to train an image detection model.
  • the execution subject can use the machine learning algorithm to train the image detection model by using the target image obtained in step 301 as the input data of the network, and using the feature map corresponding to the input target image as the output data of the network.
  • The network structure of the model is constructed from the model substructures in each substructure class. During training, the optimization goal of the image detection model may be pursued by sampling the structural parameters of each model substructure at each step and continually sampling better structures until convergence, to obtain the optimal model network structure. The substructure classes are obtained by dividing the feature maps processed by each model substructure in the network structure search space of the model according to their feature semantic levels. For example, based on the semantic features, the information corresponding to the feature maps is divided, according to the operations performed at different depths, into shallow (bottom-level) information and deep information; the model substructures corresponding to the shallow information of the model are divided into shallow substructures, and the model substructures corresponding to the deep information are divided into deep substructures.
  • During training, each model substructure in the image detection model is expanded into multiple feature layers, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model network structure, and the output data of each feature layer preceding it within the corresponding model substructure.
  • The model sub-module structure definition is shown in FIG. 4.
  • Each sub-module structure receives the outputs of the two preceding sub-module structures as inputs (Input 1 and Input 2 in the figure) and performs a three-layer feature transformation on these two inputs (Node 1 to Node 3 in the figure). The input of each feature layer includes the two inputs of the sub-module structure and the outputs of the preceding feature layers, and the sum of these results is used as the output of that layer; finally, the outputs of the feature layers (Node 1 to Node 3) are superposed as the output of the model sub-module structure.
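The wiring just described can be sketched as follows; feature maps are modelled as lists of floats and each node's operation as a plain callable, both of which are simplifying assumptions rather than the convolutional operations an actual cell would use:

```python
def run_cell(input1, input2, node_ops):
    """One sub-module structure: three nodes, each fed the element-wise
    sum of the two cell inputs and all earlier node outputs; the cell
    output superposes (element-wise adds) the three node outputs."""
    states = [input1, input2]
    node_outputs = []
    for op in node_ops:                            # Node 1 .. Node 3
        layer_input = [sum(v) for v in zip(*states)]
        out = op(layer_input)
        node_outputs.append(out)
        states.append(out)
    return [sum(v) for v in zip(*node_outputs)]
```

With identity operations and inputs [1.0] and [2.0], the node outputs are [3.0], [6.0], and [12.0], so the cell outputs [21.0].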
  • An eight-layer basic network model architecture of the image detection model is defined.
  • The bottom two modules (cell 1 and cell 2) are divided into shallow units based on the shallow substructure,
  • and the upper six modules (cell 3 to cell 8) are divided into deep units based on the deep substructure.
  • the controller samples a specific structure for the shallow unit and the deep unit each time, so as to obtain a candidate model.
  • The test results of the candidate model are used to train the controller by means of reinforcement learning or gradient optimization, so that it continually samples better structures until convergence, obtaining the optimal model structure.
  • the controller can use commonly used structure search algorithms.
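A toy stand-in for such a controller is sketched below (REINFORCE-flavoured; the score table, the reward function, and all names are assumptions of this sketch, and a practical structure-search controller would instead be, e.g., an RNN or a differentiable relaxation):

```python
import random

def search_operation(scores, evaluate, lr=0.5, steps=200, seed=0):
    """Repeatedly sample an operation in proportion to its score,
    'evaluate' the resulting candidate, and reinforce the sampled
    operation by the reward, until the best operation dominates."""
    rng = random.Random(seed)
    for _ in range(steps):
        ops = list(scores)
        choice = rng.choices(ops, weights=[scores[o] for o in ops])[0]
        scores[choice] += lr * evaluate(choice)   # reward assumed in [0, 1]
    return max(scores, key=scores.get)
```

With two candidates where only one earns reward, the rewarded operation's score grows and the search settles on it.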
  • For each unit, an operation type is selected for each searched edge (the dotted-line edges shown in FIG. 5B) to obtain a subnetwork of the unit (as shown in FIG. 5C), finally yielding the sampling result.
  • In FIG. 5D, the left diagram is the sampling result of the deep unit and the right diagram is the sampling result of the shallow unit, where max_pool denotes max pooling, dil_conv_5x5 denotes a dilated convolution with a kernel size of 5, dil_conv_3x3 denotes a dilated convolution with a kernel size of 3, identity denotes an identity mapping, and sep_conv_3x3 denotes a depthwise separable convolution with a kernel size of 3.
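For intuition, the named operations can be given 1-D stand-ins; the function bodies below are illustrative assumptions (real cells use 2-D versions with learned kernels):

```python
def max_pool(x, k=3):
    """Non-overlapping max pooling over a 1-D list."""
    return [max(x[i:i + k]) for i in range(0, len(x), k)]

def identity(x):
    """Identity mapping: pass the feature map through unchanged."""
    return list(x)

def dil_conv(x, kernel, dilation=2):
    """1-D dilated convolution: kernel taps are spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation
    return [sum(k * x[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(x) - span)]

CANDIDATE_OPS = {"max_pool": max_pool, "identity": identity}
# dil_conv_3x3 / dil_conv_5x5 would fix the kernel length at 3 or 5;
# sep_conv_3x3 would additionally factorise into depthwise + pointwise steps.
```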
  • The network structure of the model is constructed by grouping the model substructures belonging to the same substructure class and then stacking the groups according to the levels into which the substructure classes are divided.
  • The shallow structure of the model is thereby separated from the deep structure, so that the model can focus more on the processing of the underlying information.
  • In some embodiments, the substructure classes include a first substructure class and a second substructure class, the image detection model includes a first submodel and a second submodel, the network structure of the first submodel is constructed from the model substructures in the first substructure class, and the network structure of the second submodel is constructed from the model substructures in the second substructure class. Training the image detection model then includes: using the target image included in the training samples in the training sample set as the input data of the network and the feature map corresponding to the input target image as the output data of the network, adjusting the structural parameters of each model substructure in the first substructure class to obtain the trained first submodel; and, likewise using the target image as the input data of the network and the feature map corresponding to the input target image as the output data of the network, adjusting the structural parameters of each model substructure in the second substructure class to obtain the trained second submodel.
  • In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model, the model parameters of the first detection sub-model are the training parameters of the model, and the model parameters of the second detection sub-model are the structural parameters of the model.
  • The target image included in the training samples in the training sample set is used as the input data of the network, and the feature map corresponding to the input target image is used as the output data of the network.
  • The image detection model is obtained by training, including: using the target image included in the training samples as input data and the feature map corresponding to the input target image as output data, and adjusting the training parameters to obtain the trained first detection sub-model;
  • using the target image included in the training samples as input data and the feature map corresponding to the input target image as output data, and adjusting the structural parameters to obtain the trained second detection sub-model; and determining the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
  • the image detection model can be built based on the neural network.
  • the parameters of the neural network model include: training parameters and network structure parameters.
  • the training parameters are the parameters, other than the model structure parameters, that are obtained through training, such as the learning rate (step size), the batch size (number of data samples per update), and the weight decay
  • the network structure parameters are parameters that define the network structure of the image detection model, such as the number of network layers, the operator used in each layer, and the filter sizes of the convolutions
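As a purely illustrative sketch (all names and values are hypothetical, not taken from the application), the two parameter groups described above might be kept separate like this:

```python
# Hypothetical sketch: the application distinguishes training parameters
# (learning rate, batch size, weight decay, ...) from network structure
# parameters (number of layers, per-layer operator, filter sizes).

training_params = {
    "learning_rate": 0.01,  # step size
    "batch_size": 64,       # number of data samples per update
    "weight_decay": 1e-4,   # regularization strength
}

structure_params = {
    "num_layers": 12,                     # depth of the network
    "layer_ops": ["conv3x3", "conv5x5"],  # candidate operators per layer
    "filter_sizes": [3, 5],               # convolution kernel sizes
}

# The two groups are disjoint: a structure search adjusts only the second
# group, while ordinary training adjusts only the first.
assert set(training_params).isdisjoint(structure_params)
```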
  • step 301 is basically the same as the operation of step 101 in the embodiment shown in FIG. 1 , and will not be repeated here.
  • the schematic diagram 300 of the method for training a model in this embodiment adopts various substructures obtained by searching each model substructure in the network structure search space of the model
  • the various substructures are obtained by dividing the feature semantic levels of the processed feature maps, and the training of the model represents transforming each model substructure in the model through multiple feature layers
  • the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model's network structure, and the output data of each feature layer preceding the corresponding model substructure
  • the design of image classification models tends toward deeper model structures, which, when applied to image forgery detection, wastes computing resources
  • by dividing the model substructures, operations of different depths are separated from each other, and different structures and processing methods are used for different levels of information; this realizes the construction of an image detection model based on a depth-separated search space, on which a lightweight image authentication model is achieved
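The multi-feature-layer transformation described above, in which each feature layer's input includes the outputs of preceding substructures and feature layers, can be sketched in miniature (pure Python, with a weighted sum standing in for real convolutions; everything here is a hypothetical illustration):

```python
# Hypothetical sketch of the dense connectivity described above: each
# feature layer receives the outputs of ALL preceding layers (one reading
# of "at least two model substructures ... and each feature layer before
# the corresponding model substructure").

def feature_layer(prev_outputs, weight):
    # Stand-in transform: a weighted sum of all earlier outputs.
    return weight * sum(prev_outputs)

def run_dense_stack(x, weights):
    outputs = [x]  # the raw input counts as the first available output
    for w in weights:
        outputs.append(feature_layer(outputs, w))
    return outputs

outs = run_dense_stack(1.0, [1.0, 1.0, 1.0])
# layer 1 sees [1] -> 1; layer 2 sees [1, 1] -> 2; layer 3 sees [1, 1, 2] -> 4
print(outs)  # [1.0, 1.0, 2.0, 4.0]
```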
  • FIG. 6 shows a schematic diagram 600 of a first embodiment of a method for generating information according to the present application
  • the method for generating information includes the following steps:
  • Step 601 acquire a target image.
  • the execution subject (such as a server or a terminal device) may acquire the target image from other electronic devices or locally through a wired connection or a wireless connection.
  • Step 602 Input the target image into the pre-trained image detection model to generate a feature map corresponding to the target image.
  • the execution subject may input the target image obtained in step 601 into a pre-trained image detection model to generate a feature map corresponding to feature information in the target image.
  • the image detection model is obtained through training according to any one of the above-mentioned methods for training the model.
  • the image detection model includes a first detection sub-model and a second detection sub-model
  • the first detection sub-model is used to represent training the training parameters of the model using a machine learning algorithm
  • the second detection sub-model is used to represent training the structural parameters of the model using a machine learning algorithm; inputting the target image into the pre-trained image detection model to generate the feature map corresponding to the target image includes: inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and, based on the first feature map and the second feature map, determining the feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image
  • the two sub-models are used to extract features separately, which improves the processing efficiency and accuracy of the system
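The two-branch inference just described can be sketched as follows; the stand-in sub-models and the element-wise-average fusion rule are assumptions for illustration only (the application does not fix a particular fusion rule):

```python
# Hypothetical sketch: the target image passes through both detection
# sub-models, and their feature maps are fused into the final feature map.

def first_detection_submodel(image):
    return [[p * 0.5 for p in row] for row in image]  # stand-in features

def second_detection_submodel(image):
    return [[p + 1.0 for p in row] for row in image]  # stand-in features

def generate_feature_map(image):
    f1 = first_detection_submodel(image)
    f2 = second_detection_submodel(image)
    # Assumed fusion rule: element-wise average of the two feature maps.
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(f1, f2)]

print(generate_feature_map([[2.0, 4.0]]))  # [[2.0, 3.5]]
```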
  • the process 600 of the method for generating information in this embodiment highlights the step of using the trained image detection model to generate the feature map corresponding to the feature information in the target image; therefore, the solution described in this embodiment can realize targeted feature extraction of different types, different levels, and different depths
  • the present application provides an embodiment of an apparatus for training a model, which corresponds to the method embodiment shown in FIG. 1; in addition to the features described below, the apparatus embodiment may also include features the same as or corresponding to those of the method embodiment shown in FIG. 1, and produce the same or corresponding effects; the apparatus may specifically be applied to various electronic devices
  • the apparatus 700 for training a model in this embodiment includes: an acquisition unit 701 and a training unit 702, wherein the acquisition unit is configured to acquire a training sample set, wherein the training samples in the training sample set include a target image and a feature map corresponding to the target image; the training unit is configured to use the target image included in the training samples in the training sample set as the input data of the network, and use the feature map corresponding to the input target image as the output data of the network, to train the image detection model
  • the network structure of the model is constructed based on each model substructure in the various substructures, and the optimization goal of the image detection model is to learn, by sampling the structural parameters of each model substructure in the various substructures, the optimal solution of the model's network structure
  • the various substructures are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the model
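The optimization goal above — sampling structural parameters to learn the optimal network structure — can be caricatured as random search over a tiny search space. Everything below (the space, the scoring function) is a hypothetical stand-in; a real architecture search would train and validate each sampled structure:

```python
import random

# Hypothetical sketch: sample structural parameters from the search space
# and keep the best-scoring network structure.
random.seed(0)

search_space = {"depth": [4, 8, 12], "op": ["conv3x3", "conv5x5"]}

def score(arch):
    # Stand-in objective favoring lightweight structures; a real search
    # would use validation accuracy of the trained candidate.
    return -arch["depth"] + (1 if arch["op"] == "conv3x3" else 0)

samples = [
    {k: random.choice(v) for k, v in search_space.items()} for _ in range(20)
]
best = max(samples, key=score)
print(best)
```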
  • the various substructures in the training unit are obtained by dividing the feature semantic levels of the feature maps processed by each model substructure in the network structure search space of the model.
  • the network structure of the model in the training unit is constructed by summarizing the model substructures of the same kind and then stacking them according to the levels into which the various substructures are divided
  • the training of the image detection model in the training unit is used to represent that each model substructure in the model is transformed through multiple feature layers
  • the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model's network structure, and the output data of each feature layer preceding the corresponding model substructure
  • the various substructures in the training unit include a first type of substructure and a second type of substructure; the image detection model includes a first submodel and a second submodel; the network structure of the first submodel is constructed based on each model substructure in the first type of substructure, and the network structure of the second submodel is constructed based on each model substructure in the second type of substructure
  • the training unit includes: a first training module configured to use the target image included in the training samples in the training sample set as the input data of the network, use the feature map corresponding to the input target image as the output data of the network, and adjust the structural parameters of each model substructure in the first type of substructure to obtain the trained first submodel
  • a second training module configured to use the target image included in the training samples in the training sample set as the input data of the network, use the feature map corresponding to the input target image as the output data of the network, and adjust the structural parameters of each model substructure in the second type of substructure to obtain the trained second submodel
  • the image detection model in the training unit includes a first detection sub-model and a second detection sub-model
  • the model parameters of the first detection sub-model are the training parameters of the model
  • the model parameters of the second detection sub-model are the structural parameters of the model
  • the training unit includes: a third training module configured to use the target image included in the training samples in the training sample set as input data, use the feature map corresponding to the input target image as output data, and adjust the training parameters to obtain the trained first detection sub-model
  • a fourth training module configured to use the target image included in the training samples in the training sample set as input data, use the feature map corresponding to the input target image as output data, and adjust the structural parameters to obtain the trained second detection sub-model
  • the second determination module is configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
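The third and fourth training modules amount to two coupled optimizations: one over the training parameters (weights) and one over the structural parameters, an idea familiar from differentiable architecture search. A heavily simplified, hypothetical sketch (made-up fixed gradients, plain gradient descent):

```python
# Hypothetical sketch: alternate between adjusting training parameters
# (first detection sub-model) and structural parameters (second detection
# sub-model); the "gradients" here are fixed stand-ins, not real gradients.

def step(params, grads, lr=0.1):
    return [p - lr * g for p, g in zip(params, grads)]

weights = [1.0, 2.0]  # training parameters
arch = [0.5, 0.5]     # structural parameters

for _ in range(3):
    weights = step(weights, grads=[0.2, -0.1])  # adjust training parameters
    arch = step(arch, grads=[0.1, -0.1])        # adjust structural parameters

print(weights, arch)
```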
  • the above-mentioned embodiments of the present disclosure provide an apparatus for training a model.
  • the training sample set is acquired by the first acquisition unit, wherein the training samples in the training sample set include head images, feature information of head objects in the head images, and the feature maps corresponding to the feature information of the head objects
  • then, the training unit uses a machine learning algorithm to take the head image included in the training samples as input data, and to take the feature information of the head object corresponding to the input head image and the feature map corresponding to that feature information as the expected output data, so as to train the feature extraction model
  • the feature extraction model is constructed based on a convolutional neural network
  • the parameters of the convolutional neural network model include scale parameters and other convolution kernel parameters; the scale parameter is the scale structure of the head object set using scale space theory, and the other convolution kernel parameters are the parameters of the convolution kernels in the convolutional neural network other than the scale parameters; this enriches the methods for training the model and helps realize feature extraction in a multi-scale space based on the trained model
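The scale-space idea mentioned for the convolution kernels — extracting features at several scales — can be sketched with a toy 1-D signal; the moving-average filter is a hypothetical stand-in for a proper scale-space (e.g. Gaussian) kernel:

```python
# Hypothetical sketch: filter the same signal at several scales; larger
# scales spread local structure out, echoing multi-scale feature extraction.

def smooth(signal, scale):
    half = scale // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

signal = [0.0, 0.0, 1.0, 0.0, 0.0]
multi_scale = {s: smooth(signal, s) for s in (1, 3, 5)}
print(multi_scale[1])  # scale 1 leaves the impulse intact
print(multi_scale[3])  # scale 3 spreads it over three samples
```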
  • the present application provides an embodiment of a device for generating information.
  • this apparatus embodiment corresponds to the method embodiment shown in FIG. 6
  • in addition to the features described below, the apparatus embodiment may also include features the same as or corresponding to those of the method embodiment shown in FIG. 6, and produce the same or corresponding effects; the apparatus may specifically be applied to various electronic devices
  • the apparatus 800 for generating information in this embodiment includes: an image acquisition unit 801 and a generation unit 802, wherein the image acquisition unit is configured to acquire a target image; the generation unit is configured to convert the target image Input to the pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is obtained by training according to any embodiment of the method for training the model above.
  • the image detection model in the generation unit includes a first detection sub-model and a second detection sub-model; the first detection sub-model is used to represent training the training parameters of the model using a machine learning algorithm
  • the second detection sub-model is used to represent training the structural parameters of the model using a machine learning algorithm
  • the generation unit includes: a first generation module configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image
  • the second generation module is configured to input the target image to the pre-trained second detection sub-model to generate a second feature map corresponding to the target image;
  • a second determination module configured to determine, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image
  • the present application also provides an electronic device and a readable storage medium.
  • FIG. 9 it is a block diagram of an electronic device according to a method for training a model according to an embodiment of the present application.
  • the electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers
  • electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smart phones, wearable devices, and other similar computing devices
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the applications described and/or claimed herein.
  • the electronic device includes: one or more processors 901, a memory 902, and interfaces for connecting various components, including high-speed interfaces and low-speed interfaces.
  • the various components are interconnected using different buses and can be mounted on a common motherboard or otherwise as desired.
  • the processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device such as a display device coupled to an interface.
  • multiple processors and/or multiple buses may be used with multiple memories, if desired
  • multiple electronic devices may be connected, with each device providing some of the necessary operations (eg, as a server array, a set of blade servers, or a multi-processor system).
  • a processor 901 is taken as an example.
  • the memory 902 is a non-transitory computer-readable storage medium provided in this application.
  • the memory stores instructions executable by at least one processor, so that at least one processor executes the method for training a model provided in this application.
  • the non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to cause a computer to execute the method for training a model provided in the present application.
  • the memory 902 as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the method for training the model in the embodiment of the present application ( For example, the acquisition unit 701 and the training unit 702 shown in Fig. 7).
  • the processor 901 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the method for training the model in the above method embodiments.
  • the memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the electronic device for training the model, and the like
  • the memory 902 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 902 may optionally include memories remotely located relative to the processor 901, and these remote memories may be connected through a network to the electronic device for training the model. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof
  • the electronic equipment used in the method for training a model may further include: an input device 903 and an output device 904 .
  • the processor 901, the memory 902, the input device 903, and the output device 904 may be connected through a bus or in other ways, and connection through a bus is taken as an example in FIG. 9 .
  • the input device 903 can receive input numeric or character information, and generate key signal inputs related to the user settings and function control of the electronic device for training the model; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, and other input devices
  • the output device 904 may include a display device, an auxiliary lighting device (eg, LED), a tactile feedback device (eg, a vibration motor), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device
  • the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memories, and programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer
  • other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input)
  • the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or middleware components (e.g., an application server), or front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with embodiments of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components
  • the components of the system can be interconnected by any form or medium of digital data communication, eg, a communication network. Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN) and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the training sample set is obtained, wherein the training samples in the training sample set include the target image and the feature map corresponding to the target image; the target image included in the training samples in the training sample set is used as the input data of the network, the feature map corresponding to the input target image is used as the output data of the network, and the image detection model is trained
  • the network structure of the model is constructed based on each model substructure in the various substructures, and the optimization goal of the image detection model is to learn the optimal solution of the model's network structure by sampling the structural parameters of each model substructure in the various substructures; each type of substructure is obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the model
  • the research and optimization of the image detection model structure is realized.
  • by classifying the model substructures, different levels of information are separated from each other to generate model structures with different content, realizing the construction of an image detection model based on a separated search space; more resources can be allocated to processing the required types of information according to requirements, realizing lightweight and efficient image detection, avoiding waste of resources, and improving the accuracy and efficiency of the model as a whole

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the present application are a method and apparatus for training a model. A specific implementation scheme includes: acquiring a training sample set, wherein training samples in the training sample set comprise target images and feature maps corresponding to the target images; and by using the target images comprised in the training samples in the training sample set as input data of a network, and using the feature maps corresponding to the input target images as output data of the network, performing training to obtain an image detection model, wherein a network structure of the image detection model is constructed on the basis of each model sub-structure in each type of sub-structure; an optimization objective of the image detection model is to obtain, by means of sampling a structure parameter of each model sub-structure in each type of sub-structure and performing learning, an optimal solution of the network structure of the image detection model; and each type of sub-structure is obtained by means of analyzing the feature map that is processed by each model sub-structure in a search space of the network structure of the image detection model.

Description

Method, apparatus, device, and storage medium for training a model

Cross-Reference to Related Applications

This application claims priority to the Chinese patent application No. 202110717772.7, titled "Method, apparatus, device, and storage medium for training a model", filed on June 28, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The embodiments of the present application relate to the field of computer technology, specifically to the field of image processing technology, and in particular to a method and apparatus for training a model.

Background

The development of computer vision and graphics technology has made it increasingly easy to forge digital images, resulting in a large number of malicious fake images and videos circulating on the Internet and causing serious social impact. To counter this threat, image forgery detection technology has emerged. Image forgery detection identifies forged images from real ones. Because there are many digital image forgery techniques whose underlying principles differ greatly, an image forgery detection model needs to be capable of detecting different forgery methods at the same time.

Summary

The present application provides a method, apparatus, device, and storage medium for training a model, and a method, apparatus, device, and storage medium for generating information.
本申请的一些实施例提供了一种用于训练模型的方法,该方法包括:获取训练样本集,其中,训练样本集中的训练样本包括目标图像和与目标图像对应的特征图;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,训练得到图像检测模型,其中,图像检测模型的网络结构基于各类子结构中的各个模型子结构而构建,图像检测模型的优化目标为通过对各类子结构中各个模型子结构的结构参数进行采样,学习得到图像检测模型的网络结 构的最优解,各类子结构通过对图像检测模型的网络结构搜索空间中各个模型子结构所处理的特征图进行分析而得到。Some embodiments of the present application provide a method for training a model, the method comprising: obtaining a training sample set, wherein the training samples in the training sample set include a target image and a feature map corresponding to the target image; The target image included in the training sample is used as the input data of the network, and the feature map corresponding to the input target image is used as the output data of the network, and the image detection model is obtained through training, wherein the network structure of the image detection model is based on various substructures. Each model substructure is constructed. The optimization goal of the image detection model is to learn the optimal solution of the network structure of the image detection model by sampling the structural parameters of each model substructure in each substructure. It is obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the image detection model.
在一些实施例中,各类子结构通过对图像检测模型的网络结构搜索空间中各个模型子结构所处理的特征图的特征语义层级进行划分而得到。In some embodiments, various substructures are obtained by dividing feature semantic levels of feature maps processed by each model substructure in the network structure search space of the image detection model.
在一些实施例中,图像检测模型的网络结构基于将同类子结构中的各个模型子结构进行汇总后,按照各类子结构划分后的层级进行堆叠而构建。In some embodiments, the network structure of the image detection model is constructed based on summarizing model substructures in the same substructure and stacking them according to the levels divided by various substructures.
在一些实施例中,图像检测模型的训练用于表征将图像检测模型中的各个模型子结构进行多个特征层变换,每个特征层的输入数据包括:图像检测模型的网络结构中相应模型子结构之前的至少两个模型子结构的输出数据和相应模型子结构之前的各个特征层的输出数据中的至少一方。具体地,每个特征层的输入数据可以包括图像检测模型的网络结构中相应模型子结构之前的至少两个模型子结构的输出数据;或者,每个特征层的输入数据可以包括图像检测模型的网络结构中相应模型子结构之前的各个特征层的输出数据;再或者,每个特征层的输入数据可以包括:图像检测模型的网络结构中相应模型子结构之前的至少两个模型子结构的输出数据和相应模型子结构之前的各个特征层的输出数据。In some embodiments, the training of the image detection model is used to represent that each model substructure in the image detection model is transformed into multiple feature layers, and the input data of each feature layer includes: corresponding model substructures in the network structure of the image detection model At least one of the output data of at least two model substructures preceding the structure and the output data of each feature layer preceding the corresponding model substructure. Specifically, the input data of each feature layer may include the output data of at least two model substructures before the corresponding model substructure in the network structure of the image detection model; or, the input data of each feature layer may include the output data of the image detection model The output data of each feature layer before the corresponding model substructure in the network structure; or, the input data of each feature layer may include: the output of at least two model substructures before the corresponding model substructure in the network structure of the image detection model The output data of each feature layer before the data and the corresponding model substructure.
在一些实施例中,各类子结构包括:一类子结构和二类子结构,图像检测模型包括第一子模型和第二子模型,第一子模型的网络结构基于一类子结构中的各个模型子结构而构建,第二子模型的网络结构基于二类子结构中的各个模型子结构而构建;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,训练得到图像检测模型,包括:将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,调整一类子结构中各个模型子结构的结构参数,得到训练完成的第一子模型;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,调整二类子结构中各个模型子结构的结构参数,得到训练完成的第二子模型;基于训练完成的第一子模型和训练完成的第二子模型,确定图像检测模型。In some embodiments, various substructures include: a first substructure and a second substructure, the image detection model includes a first submodel and a second submodel, and the network structure of the first submodel is based on the Each model substructure is constructed, and the network structure of the second submodel is constructed based on each model substructure in the second substructure; the target image included in the training sample in the training sample set is used as the input data of the network, and the input target The feature map corresponding to the image is used as the output data of the network, and the image detection model is obtained by training, including: using the target image included in the training samples in the training sample set as the input data of the network, and using the feature map corresponding to the input target image as the output of the network Data, adjust the structural parameters of each model substructure in a class of substructures, and obtain the first submodel after training; use the target image included in the training sample in the training sample set as the input data of the network, and use the input data corresponding to the input target image The feature map is used as the output data of the network, and the structural parameters of each model substructure in the second type of substructure are adjusted to obtain the trained second submodel; based on the trained first submodel and the trained second submodel, determine the image detection model.
在一些实施例中,图像检测模型包括第一检测子模型和第二检测子模型,第一检测子模型的模型参数为图像检测模型的训练参数,第二检测子模型的模型参数为图像检测模型的结构参数;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,训练得到图像检测模型,包括:将训练样本集中的训练样本包括的目标图像作为输入数据,将与输入的目标图像对应的特征图作为输出数据,调整训练参数,得到训练完成的第一检测子模型;将训练样本集中的训练样本包括的目标图像作为输入数据,将与输入的目标图像对应的特征图作为输出数据,调整结构参数,得到训练完成的第二检测子模型;基于训练完成的第一检测子模型和训练完成的第二检测子模型,确定图像检测模型。In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model, the model parameters of the first detection sub-model are the training parameters of the image detection model, and the model parameters of the second detection sub-model are the image detection model Structural parameters; the target image included in the training samples in the training sample set is used as the input data of the network, and the feature map corresponding to the input target image is used as the output data of the network, and the image detection model is obtained by training, including: the training sample set The target image included in the training sample is used as input data, the feature map corresponding to the input target image is used as output data, and the training parameters are adjusted to obtain the first detection sub-model that has been trained; the target image included in the training sample set in the training sample set is used as Input data, use the feature map corresponding to the input target image as output data, adjust the structural parameters, and obtain the trained second detection sub-model; based on the trained first detection sub-model and the trained second detection sub-model, Determine the image detection model.
Some embodiments of the present application provide a method for generating information. The method includes: acquiring a target image; and inputting the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, where the image detection model is trained by the method of any of the foregoing embodiments of the method for training a model.
In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model, the first detection sub-model representing that the training parameters of the image detection model are trained with a machine learning algorithm, and the second detection sub-model representing that the structural parameters of the image detection model are trained with a machine learning algorithm. Inputting the target image into the pre-trained image detection model to generate the feature map corresponding to the target image includes: inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and determining, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image.
Some embodiments of the present application provide an apparatus for training a model. The apparatus includes: an acquisition unit configured to acquire a training sample set, where the training samples in the training sample set include target images and feature maps corresponding to the target images; and a training unit configured to train an image detection model by using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, where the network structure of the image detection model is constructed based on the model substructures in the classes of substructures, the optimization objective of the image detection model is to learn an optimal solution of the network structure of the image detection model by sampling the structural parameters of the model substructures in the classes of substructures, and the classes of substructures are obtained by analyzing the feature maps processed by the model substructures in the network-structure search space of the image detection model.
In some embodiments, the classes of substructures in the training unit are obtained by dividing the feature maps processed by the model substructures in the network-structure search space of the image detection model according to the semantic levels of their features.
In some embodiments, the network structure of the image detection model in the training unit is constructed by aggregating the model substructures of each class and stacking them according to the levels into which the classes of substructures are divided.
In some embodiments, the training of the image detection model in the training unit represents that each model substructure of the image detection model performs multiple feature-layer transformations, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and the output data of each preceding feature layer. Specifically, the input data of each feature layer may include the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model; or the input data of each feature layer may include the output data of each preceding feature layer; or the input data of each feature layer may include both the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model and the output data of each preceding feature layer.
In some embodiments, the classes of substructures in the training unit include a first-class substructure and a second-class substructure, the image detection model includes a first sub-model and a second sub-model, the network structure of the first sub-model is constructed based on the model substructures in the first-class substructure, and the network structure of the second sub-model is constructed based on the model substructures in the second-class substructure. The training unit includes: a first training module configured to use the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, and to adjust the structural parameters of the model substructures in the first-class substructure to obtain a trained first sub-model; a second training module configured to use the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, and to adjust the structural parameters of the model substructures in the second-class substructure to obtain a trained second sub-model; and a first determination module configured to determine the image detection model based on the trained first sub-model and the trained second sub-model.
In some embodiments, the image detection model in the training unit includes a first detection sub-model and a second detection sub-model; the model parameters of the first detection sub-model are the training parameters of the image detection model, and the model parameters of the second detection sub-model are the structural parameters of the image detection model. The training unit includes: a third training module configured to use the target images included in the training samples of the training sample set as input data and the feature maps corresponding to the input target images as output data, and to adjust the training parameters to obtain a trained first detection sub-model; a fourth training module configured to use the target images included in the training samples of the training sample set as input data and the feature maps corresponding to the input target images as output data, and to adjust the structural parameters to obtain a trained second detection sub-model; and a second determination module configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
Some embodiments of the present application provide an apparatus for generating information. The apparatus includes: an image acquisition unit configured to acquire a target image; and a generation unit configured to input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, where the image detection model is trained by the method of any of the foregoing embodiments of the method for training a model.
In some embodiments, the image detection model in the generation unit includes a first detection sub-model and a second detection sub-model, the first detection sub-model representing that the training parameters of the image detection model are trained with a machine learning algorithm, and the second detection sub-model representing that the structural parameters of the image detection model are trained with a machine learning algorithm. The generation unit includes: a first generation module configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; a second generation module configured to input the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and a second determination module configured to determine, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image.
Some embodiments of the present application provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in the foregoing implementations.
Some embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the method described in the foregoing implementations.
It should be understood that the content described in this section is not intended to identify key or essential features of the embodiments of the present application, nor to limit the scope of the present application. Other features of the present application will become readily understood from the following description.
Description of the Drawings
The accompanying drawings are provided for a better understanding of the solution and do not constitute a limitation of the present application.
Fig. 1 is a schematic diagram of a first embodiment of a method for training a model according to the present application;
Fig. 2 is a scenario diagram in which the method for training a model according to an embodiment of the present application may be implemented;
Fig. 3 is a schematic diagram of a second embodiment of the method for training a model according to the present application;
Fig. 4 is a schematic diagram of a model sub-module structure for implementing an embodiment of the present application;
Fig. 5A is a schematic diagram of a network model architecture of the method for training a model according to the present application;
Fig. 5B and Fig. 5C are schematic diagrams of the sampling process of a model sub-module structure in the network model architecture;
Fig. 5D is a schematic diagram of a sampling result of the network model architecture;
Fig. 6 is a schematic diagram of a first embodiment of a method for generating information according to the present application;
Fig. 7 is a schematic structural diagram of an embodiment of an apparatus for training a model according to the present application;
Fig. 8 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present application;
Fig. 9 is a block diagram of an electronic device for implementing an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings, including various details of the embodiments of the present application to facilitate understanding; these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows a schematic diagram 100 of a first embodiment of a method for training a model according to the present application. The method for training a model includes the following steps:
Step 101: acquire a training sample set.
In this embodiment, the execution subject (for example, a server or a terminal device) may acquire the training sample set from another electronic device or locally through a wired or wireless connection. The training samples in the training sample set include target images and feature maps corresponding to the target images. It should be noted that the wireless connection may include, but is not limited to, a 3G, 4G or 5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (ultra-wideband) connection, or another wireless connection now known or developed in the future.
Step 102: train an image detection model by using the target images included in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network.
In this embodiment, the execution subject may use a machine learning algorithm to train the image detection model by using the target images obtained in step 101 as input data of the network and the feature maps corresponding to the input target images as output data of the network. The network structure of the model is constructed based on the model substructures in the classes of substructures, for example by stacking the model sub-module structures of the different classes of substructures in a preset manner. The optimization objective of the image detection model may be to use the machine learning algorithm to repeatedly sample the structural parameters of the model substructures in the classes of substructures, continually sampling better structures until convergence, so as to obtain the optimal model network structure. The classes of substructures may be obtained by analyzing the feature information of the feature maps processed by the model substructures in the network-structure search space of the model and classifying the model substructures based on the analysis results; the feature information may be feature information forged in the image.
It should be noted that the execution subject may store a pre-trained image detection model whose network architecture is predefined; for example, an eight-layer basic model architecture may be defined for the search of the model sub-module structures. The execution subject may use the image detection model to predict, in the target image, the feature map corresponding to the feature information in the target image. The image detection model may be used to represent the correspondence between target objects and feature maps. The model structure of the image detection model may be constructed based on various models in the related art, such as, but not limited to, BERT, FastText and TextCNN.
It should be pointed out that the image detection model may be, for example, a data table or a calculation formula, which is not limited in this embodiment. The machine learning algorithm mentioned above is a well-known technique that is currently widely studied and applied, and is not described in detail here.
Continuing to refer to Fig. 2, the method 200 for training a model of this embodiment runs on a server 201. The server 201 first acquires a training sample set 202, where the training samples in the training sample set include target images and feature maps corresponding to the target images. The server 201 then trains an image detection model 203 by using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, where the network structure of the model is constructed based on the model substructures in the classes of substructures, the optimization objective of the image detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of the model substructures in the classes of substructures, and the classes of substructures are obtained by analyzing the feature maps processed by the model substructures in the network-structure search space of the model.
The method for training a model provided by the above embodiments of the present application acquires a training sample set whose training samples include target images and feature maps corresponding to the target images, and trains an image detection model by using the target images as input data of the network and the corresponding feature maps as output data of the network. The network structure of the model is constructed based on the model substructures in the classes of substructures; the optimization objective of the image detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of the model substructures in the classes of substructures; and the classes of substructures are obtained by analyzing the feature maps processed by the model substructures in the network-structure search space of the model. This enables the study and optimization of the structure of the image detection model: by classifying the model substructures, information at different levels is separated, model structures for different content are generated, and an image detection model based on a separated search space is constructed. More resources can be allocated, as required, to processing the required type of information, achieving lightweight image forgery detection, avoiding a waste of resources, and improving the accuracy and efficiency of the model as a whole.
With further reference to Fig. 3, a schematic diagram 300 of a second embodiment of the method for training a model is shown. The flow of the method includes the following steps:
Step 301: acquire a training sample set.
Step 302: train an image detection model by using the target images included in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network.
In this embodiment, the execution subject may use a machine learning algorithm to train the image detection model by using the target images obtained in step 301 as input data of the network and the feature maps corresponding to the input target images as output data of the network. The network structure of the model is constructed based on the model substructures in the classes of substructures. The optimization objective of the image detection model may be, during training, to repeatedly sample the structural parameters of the model substructures in the classes of substructures, continually sampling better structures until convergence, so as to obtain the optimal model network structure. The classes of substructures are obtained by dividing the feature maps processed by the model substructures in the network-structure search space of the model according to the semantic levels of their features. For example, based on the semantic features, the information corresponding to the feature maps may be divided, according to operations at different depths, into shallow/low-level information and deep information; the model substructures corresponding to the shallow information of the model are divided into shallow substructures, and the model substructures corresponding to the deep information of the model are divided into deep substructures. The shallow/low-level information may include information such as low-level pixel features, pixel distributions and frequency-domain structures, and the deep information may include information such as the structure and shape of objects.
In this embodiment, the training of the image detection model may represent that each model substructure of the image detection model performs multiple feature-layer transformations, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model network structure, and the output data of each preceding feature layer. For example, the model sub-module structure is defined as shown in Fig. 4: each sub-module structure receives the outputs of the two preceding sub-module structures as inputs (Input 1 and Input 2 in the figure) and performs three layers of feature transformation on these two inputs (Node 1 to Node 3 in the figure). The input of each feature layer includes the two inputs of the sub-module structure and the outputs of the preceding feature layers, and the results are summed as the output of that layer. Finally, the outputs of the feature layers (Node 1 to Node 3) are superimposed as the output of the model sub-module structure.
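The per-layer wiring described above (each node summing transformed versions of the cell's two inputs and all earlier node outputs, with the cell output being the superposition of all node outputs) can be sketched as follows. This is a simplified, hypothetical rendering of the cell of Fig. 4: scalar values stand in for feature maps and plain Python callables stand in for the convolution/pooling operations, so the names and shapes here are illustrative, not the patent's actual implementation.

```python
from typing import Callable, List

def cell_forward(input1: float, input2: float,
                 node_ops: List[List[Callable[[float], float]]]) -> float:
    """Sketch of the sub-module structure in Fig. 4.

    Each entry of node_ops is the list of operations one node applies,
    one operation per available input state (Input 1, Input 2, and the
    outputs of all earlier nodes, in order).
    """
    states = [input1, input2]          # Input 1 and Input 2
    node_outputs = []
    for ops in node_ops:               # Node 1 .. Node 3
        # apply one op to each available state and sum the results
        out = sum(op(s) for op, s in zip(ops, states))
        node_outputs.append(out)
        states.append(out)             # later nodes also read this output
    return sum(node_outputs)           # superposition of all node outputs
```

With identity operations on every edge, the data flow can be traced by hand: Node 1 sums the two inputs, Node 2 additionally sums Node 1's output, and so on, before all three node outputs are added together.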
The sampling process of the image detection model is illustrated here by way of example. First, a basic network model architecture of an eight-layer image detection model is defined. As shown in Fig. 5A, the two bottom modules (cell 1 and cell 2) are divided into shallow units based on the shallow substructure, and the upper six modules (cell 3 to cell 8) are divided into deep units based on the deep substructure. Each time, the controller samples a specific structure for the shallow unit and for the deep unit, thereby obtaining a candidate model. The test results of the candidate model are used to train the controller by reinforcement learning or gradient optimization, so that it continually samples better structures until convergence, yielding the optimal model structure. The controller may use a common structure search algorithm. During sampling, each unit searches each edge (the dashed edges shown in Fig. 5B) and selects one operation type for it, yielding a sub-network of that unit (as shown in Fig. 5C). The final sampling result is shown in Fig. 5D, where the left diagram is the sampling result of the deep unit and the right diagram is the sampling result of the shallow unit; max_pool denotes max pooling, dil_conv_5x5 denotes a dilated convolution with a kernel size of 5, dil_conv_3x3 denotes a dilated convolution with a kernel size of 3, identity denotes an identity mapping, and sep_conv_3x3 denotes a depth-wise separable convolution with a kernel size of 3.
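As a concrete illustration of one controller step, the following sketch samples an operation type for every searchable edge of the shallow unit and of the deep unit, producing one candidate model. The candidate-operation list is taken from the operation names of Fig. 5D; the edge count per unit is an assumption made for illustration, and a real controller would be a learned policy (trained by reinforcement learning or gradient optimization) rather than this uniform random sampler.

```python
import random

# operation types appearing in the sampled results of Fig. 5D
CANDIDATE_OPS = [
    "max_pool", "dil_conv_5x5", "dil_conv_3x3", "identity", "sep_conv_3x3",
]

def sample_unit(num_edges: int, rng: random.Random) -> list:
    """Pick one operation type per searchable (dashed) edge of a unit."""
    return [rng.choice(CANDIDATE_OPS) for _ in range(num_edges)]

def sample_candidate_model(num_edges: int = 6, seed: int = 0) -> dict:
    """One controller step: sample a sub-network for the shallow unit
    (cells 1-2) and the deep unit (cells 3-8) independently."""
    rng = random.Random(seed)
    return {
        "shallow_unit": sample_unit(num_edges, rng),
        "deep_unit": sample_unit(num_edges, rng),
    }
```

Sampling the two unit types independently is what lets the search allocate different operation mixes to low-level and high-level processing.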
In some optional implementations of this embodiment, the network structure of the model is constructed by aggregating the model substructures of each class and stacking them according to the levels into which the classes of substructures are divided. This separates the shallow construction of the model from its deep construction, so that the model can focus more on processing low-level information.
In some optional implementations of this embodiment, the classes of substructures include a first-class substructure and a second-class substructure, and the image detection model includes a first sub-model and a second sub-model; the network structure of the first sub-model is constructed based on the model substructures in the first-class substructure, and the network structure of the second sub-model is constructed based on the model substructures in the second-class substructure. Training the image detection model by using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network includes: using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, and adjusting the structural parameters of the model substructures in the first-class substructure to obtain a trained first sub-model; using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, and adjusting the structural parameters of the model substructures in the second-class substructure to obtain a trained second sub-model; and determining the image detection model based on the trained first sub-model and the trained second sub-model. By splitting the model into multiple classes of sub-module structures, searching each class for its optimal substructure, and then stacking the optimal substructures to build the complete model, the efficiency of model training is improved.
In some optional implementations of this embodiment, the image detection model includes a first detection sub-model and a second detection sub-model; the model parameters of the first detection sub-model are the training parameters of the model, and the model parameters of the second detection sub-model are the structural parameters of the model. Training the image detection model by using the target images included in the training samples of the training sample set as the input data of the network and the feature maps corresponding to the input target images as the output data of the network includes: using the target images included in the training samples as input data and the feature maps corresponding to the input target images as output data, and adjusting the training parameters to obtain a trained first detection sub-model; using the target images included in the training samples as input data and the corresponding feature maps as output data, and adjusting the structural parameters to obtain a trained second detection sub-model; and determining the image detection model based on the trained first detection sub-model and the trained second detection sub-model. The image detection model may be built on a neural network.
The parameters of a neural network model include training parameters and network structure parameters. The training parameters are the parameters obtained through training other than the model structure parameters, such as the learning rate (step size), the batch size (number of data samples), and the weight decay; the network structure parameters are the parameters that define the network structure of the image detection model, such as the number of layers of the network, the operator of each layer, and the filter size of the convolutions. By training the model's training parameters and structure parameters simultaneously, the accuracy and efficiency of model training are improved, making the model more widely applicable.
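The alternating adjustment of training parameters and structural parameters can be illustrated with a toy sketch; the quadratic objective and learning rate below are purely illustrative stand-ins for the real training loss over target images and feature maps:

```python
def train_image_detection_model(w, a, lr=0.1, steps=50):
    # Toy quadratic objective standing in for the real training loss
    # (target image in, feature map out): minimized at w = 2.0, a = 1.0.
    for _ in range(steps):
        # First detection sub-model pass: adjust the training parameters (w)
        # with the structural parameters held fixed.
        w -= lr * 2.0 * (w - 2.0)
        # Second detection sub-model pass: adjust the structural parameters (a)
        # with the training parameters held fixed.
        a -= lr * 2.0 * (a - 1.0)
    return w, a

w, a = train_image_detection_model(0.0, 0.0)
print(round(w, 3), round(a, 3))  # → 2.0 1.0
```

The alternation lets each sub-model's parameters converge against the same input/output pairs, which is what allows the weights and the architecture to be optimized in a single training run.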
In this embodiment, the specific operation of step 301 is basically the same as the operation of step 101 in the embodiment shown in FIG. 1 and is not repeated here.
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 1, in the schematic diagram 300 of the method for training a model in this embodiment, the various types of substructures are obtained by dividing the network structure search space of the model according to the feature semantic level of the feature map processed by each model substructure, and the training of the model represents transforming each model substructure in the model through multiple feature layers, where the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model network structure, and the output data of each feature layer preceding the corresponding model substructure. This solves the problem that the standard deep learning image classification models used in the related art rely on high-level semantic features, whereas image forgery detection relies on low-level semantic features, so that the requirements of the two conflict. It also solves the problem that the design of current deep-learning image classification models tends toward deeper model structures, which wastes computing resources in image forgery detection applications. By dividing the model substructures, operations at different depths are separated from one another, different constructions are used, and different levels of information are processed in different ways, thereby realizing the construction of an image detection model based on a depth-separated search space.
On the depth-separated search space, a lightweight image forgery detection model is realized.
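A minimal sketch of the feature-layer connectivity described above, assuming each feature layer combines its running input with skip connections from the two most recent preceding substructures (plain numbers stand in for feature maps, and the "+1" is a toy placeholder for a real layer transform):

```python
def forward(x, num_substructures=3, layers_per_sub=2):
    sub_outputs = []                         # outputs of completed substructures
    h = x
    for _ in range(num_substructures):
        for _ in range(layers_per_sub):
            skips = sum(sub_outputs[-2:])    # up to two preceding substructures
            h = h + skips + 1                # toy feature-layer transform
        sub_outputs.append(h)
    return h

print(forward(0))  # → 30
```

The skip connections give deeper feature layers direct access to earlier, lower-level information, which matches the goal of handling low-level and high-level semantics with different constructions.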
Further referring to FIG. 6, a schematic diagram 600 of a first embodiment of a method for generating information according to the present application is shown. The method for generating information includes the following steps:
Step 601: acquire a target image.
In this embodiment, the execution subject (for example, a server or a terminal device) may acquire the target image from another electronic device, or locally, through a wired or wireless connection.
Step 602: input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image.
In this embodiment, the execution subject may input the target image acquired in step 601 into the pre-trained image detection model to generate a feature map corresponding to the feature information in the target image. The image detection model is trained by the method of any one of the embodiments of the method for training a model described above.
In some optional implementations of this embodiment, the image detection model includes a first detection sub-model and a second detection sub-model; the first detection sub-model represents training the training parameters of the model by a machine learning algorithm, and the second detection sub-model represents training the structural parameters of the model by a machine learning algorithm. Inputting the target image into the pre-trained image detection model to generate the feature map corresponding to the target image includes: inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and, based on the first feature map and the second feature map, determining the feature map corresponding to both as the feature map corresponding to the target image. Using two sub-modules to extract features separately improves the processing efficiency and accuracy of the system, and adding a sub-model based on the model structure parameters on top of the original convolutional neural network model makes model improvement more flexible and convenient.
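The two-branch inference can be sketched as follows; the branch transforms and the element-wise averaging used to fuse the two feature maps are illustrative assumptions — the embodiment only requires determining a feature map corresponding to both:

```python
def first_sub_model(image):
    # Stand-in for the branch trained on the model's training parameters.
    return [p * 0.5 for p in image]

def second_sub_model(image):
    # Stand-in for the branch trained on the model's structural parameters.
    return [p + 1.0 for p in image]

def detect(image):
    f1 = first_sub_model(image)
    f2 = second_sub_model(image)
    # Fuse the two feature maps element-wise (averaging chosen for the sketch).
    return [(a + b) / 2 for a, b in zip(f1, f2)]

print(detect([2.0, 4.0]))  # → [2.0, 3.5]
```

Concatenation or learned weighting would be equally valid fusion choices; the key point is that both branches see the same target image and contribute to the final feature map.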
As can be seen from FIG. 6, compared with the embodiment corresponding to FIG. 1, the flow 600 of the method for generating information in this embodiment highlights the step of using the trained image detection model to generate the feature map corresponding to the feature information in the target image. Accordingly, the solution described in this embodiment can realize targeted feature extraction across different types, levels, and depths of features.
Further referring to FIG. 7, as an implementation of the methods shown in FIGS. 1 to 3 above, the present application provides an embodiment of an apparatus for training a model. This apparatus embodiment corresponds to the method embodiment shown in FIG. 1. In addition to the features described below, the apparatus embodiment may further include features identical or corresponding to those of the method embodiment shown in FIG. 1, and produce identical or corresponding effects. The apparatus may be applied to various electronic devices.
As shown in FIG. 7, the apparatus 700 for training a model in this embodiment includes an acquisition unit 701 and a training unit 702. The acquisition unit is configured to acquire a training sample set, where the training samples in the training sample set include target images and feature maps corresponding to the target images. The training unit is configured to use the target images included in the training samples of the training sample set as the input data of a network and the feature maps corresponding to the input target images as the output data of the network, and to train to obtain an image detection model, where the network structure of the model is built from the model substructures of the various types of substructures, the optimization objective of the image detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of each model substructure in the various types of substructures, and the various types of substructures are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the model.
In this embodiment, for the specific processing of the acquisition unit 701 and the training unit 702 of the apparatus 700 for training a model, and the technical effects they bring, reference may be made to the descriptions of step 101 and step 102 in the embodiment corresponding to FIG. 1, which are not repeated here.
In some optional implementations of this embodiment, the various types of substructures in the training unit are obtained by dividing the network structure search space of the model according to the feature semantic level of the feature map processed by each model substructure.
In some optional implementations of this embodiment, the network structure of the model in the training unit is built by aggregating the model substructures of each type and stacking them according to the levels into which the various types of substructures are divided.
In some optional implementations of this embodiment, the training of the image detection model in the training unit represents transforming each model substructure in the model through multiple feature layers, where the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model network structure, and the output data of each feature layer preceding the corresponding model substructure.
In some optional implementations of this embodiment, the various types of substructures in the training unit include first-type substructures and second-type substructures, and the image detection model includes a first sub-model and a second sub-model; the network structure of the first sub-model is built from the model substructures of the first type, and the network structure of the second sub-model is built from the model substructures of the second type. The training unit includes: a first training module configured to use the target images included in the training samples of the training sample set as the input data of the network and the feature maps corresponding to the input target images as the output data of the network, and to adjust the structural parameters of each first-type model substructure to obtain a trained first sub-model; a second training module configured to use the target images included in the training samples as the input data of the network and the corresponding feature maps as the output data of the network, and to adjust the structural parameters of each second-type model substructure to obtain a trained second sub-model; and a first determining module configured to determine the image detection model based on the trained first sub-model and the trained second sub-model.
In some optional implementations of this embodiment, the image detection model in the training unit includes a first detection sub-model and a second detection sub-model; the model parameters of the first detection sub-model are the training parameters of the model, and the model parameters of the second detection sub-model are the structural parameters of the model. The training unit includes: a third training module configured to use the target images included in the training samples of the training sample set as input data and the feature maps corresponding to the input target images as output data, and to adjust the training parameters to obtain a trained first detection sub-model; a fourth training module configured to use the target images included in the training samples as input data and the corresponding feature maps as output data, and to adjust the structural parameters to obtain a trained second detection sub-model; and a second determining module configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
The above embodiments of the present disclosure provide an apparatus for training a model. A training sample set is acquired by a first acquisition unit, where the training samples in the training sample set include head images, feature information of the head objects in the head images, and feature maps corresponding to the feature information of the head objects. The training unit then uses a machine learning algorithm, taking the head images included in the training samples as input data, and the feature information of the head objects corresponding to the input head images and the feature maps corresponding to that feature information as expected output data, to train a feature extraction model. The feature extraction model is built on a convolutional neural network, and the parameters of the convolutional neural network model include scale parameters and other convolution kernel parameters: the scale parameters are the scale structures of the head objects set using scale-space theory, and the other convolution kernel parameters are the parameters of the convolution kernels in the convolutional neural network other than the scale parameters. This enriches the ways in which the model can be trained and helps realize multi-scale feature extraction based on the trained model.
Continuing to refer to FIG. 8, as an implementation of the method shown in FIG. 6 above, the present application provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in FIG. 6. In addition to the features described below, the apparatus embodiment may further include features identical or corresponding to those of the method embodiment shown in FIG. 6, and produce identical or corresponding effects. The apparatus may be applied to various electronic devices.
As shown in FIG. 8, the apparatus 800 for generating information in this embodiment includes an image acquisition unit 801 and a generation unit 802. The image acquisition unit is configured to acquire a target image; the generation unit is configured to input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, where the image detection model is trained by the method of any one of the embodiments of the method for training a model described above.
In this embodiment, for the specific processing of the image acquisition unit 801 and the generation unit 802 of the apparatus 800 for generating information, and the technical effects they bring, reference may be made to the descriptions of step 601 and step 602 in the embodiment corresponding to FIG. 6, which are not repeated here.
In some optional implementations of this embodiment, the image detection model in the generation unit includes a first detection sub-model and a second detection sub-model; the first detection sub-model represents training the training parameters of the model by a machine learning algorithm, and the second detection sub-model represents training the structural parameters of the model by a machine learning algorithm. The generation unit includes: a first generation module configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; a second generation module configured to input the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and a second determining module configured to determine, based on the first feature map and the second feature map, the feature map corresponding to both as the feature map corresponding to the target image.
According to embodiments of the present application, the present application further provides an electronic device and a readable storage medium.
As shown in FIG. 9, it is a block diagram of an electronic device for the method for training a model according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementations of the application described and/or claimed herein.
As shown in FIG. 9, the electronic device includes one or more processors 901, a memory 902, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or mounted in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, if required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). In FIG. 9, one processor 901 is taken as an example.
The memory 902 is the non-transitory computer-readable storage medium provided in the present application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the method for training a model provided in the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method for training a model provided in the present application.
The memory 902, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for training a model in the embodiments of the present application (for example, the acquisition unit 701 and the training unit 702 shown in FIG. 7). By running the non-transitory software programs, instructions, and modules stored in the memory 902, the processor 901 executes the various functional applications and data processing of the server, that is, implements the method for training a model in the above method embodiments.
The memory 902 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the electronic device for training a model, and the like. In addition, the memory 902 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 902 may optionally include memories remotely located relative to the processor 901, and these remote memories may be connected through a network to the electronic device for training a model. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method for training a model may further include an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 9.
The input device 903 may receive input numeric or character information and generate key signal inputs related to the user settings and function control of the electronic device for training a model, and may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or another input device. The output device 904 may include a display device, an auxiliary lighting device (for example, an LED), a tactile feedback device (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described herein may be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (for example, a magnetic disk, an optical disc, a memory, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein may be implemented in a computing system including a back-end component (for example, as a data server), or a computing system including a middleware component (for example, an application server), or a computing system including a front-end component (for example, a user computer having a graphical user interface or a web browser through which the user can interact with an implementation of the systems and techniques described herein), or a computing system including any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other.
根据本申请实施例的技术方案采用获取训练样本集,其中,训练样本集中的训练样本包括目标图像和与目标图像对应的特征图;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,训练得到图像检测模型,其中,模型的网络结构基于各类子结构中的各个模型子结构而构建,图像检测模型的优化目标为通过对各类子结构中各个模型子结构的结构参数进行采样,学习得到模型网络结构的最优解,各类子结构通过对模型的网络结构搜索空 间中各个模型子结构所处理的特征图进行分析而得到,实现了针对图像检测模型结构的研究和优化,通过对模型子结构进行分类,将不同层次信息相互分离,生成不同内容的模型结构,实现了一种基于分离搜索空间的图像检测模型的构建。根据需求将更多资源分配到对所需类型信息的处理上,实现轻量高效地图像检测,避免了资源的浪费,从整体上提高了模型的精度和效率。According to the technical solution of the embodiment of the present application, the training sample set is obtained, wherein the training samples in the training sample set include the target image and the feature map corresponding to the target image; the target image included in the training sample set in the training sample set is used as the input data of the network , the feature map corresponding to the input target image is used as the output data of the network, and the image detection model is trained. The network structure of the model is constructed based on each model substructure in various substructures, and the optimization goal of the image detection model is By sampling the structural parameters of each model substructure in various substructures, the optimal solution of the model network structure is learned, and each type of substructure is processed by the feature map processed by each model substructure in the network structure search space of the model. Based on the analysis, the research and optimization of the image detection model structure is realized. By classifying the model substructure, different levels of information are separated from each other to generate a model structure with different content, and an image detection model based on a separated search space is realized. build. Allocate more resources to the processing of required types of information according to requirements, realize lightweight and efficient image detection, avoid waste of resources, and improve the accuracy and efficiency of the model as a whole.
应该理解,可以使用上面所示的各种形式的流程,重新排序、增加或删除步骤。例如,本申请中记载的各步骤可以并行地执行也可以顺序地执行也可以不同的次序执行,只要能够实现本申请公开的技术方案所期望的结果,本文在此不进行限制。It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in this application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in this application can be achieved, no limitation is imposed herein.
The above specific implementations do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (18)

  1. A method for training a model, the method comprising:
    acquiring a training sample set, wherein training samples in the training sample set comprise a target image and a feature map corresponding to the target image; and
    training an image detection model by using the target images comprised in the training samples of the training sample set as input data of a network and using the feature maps corresponding to the input target images as output data of the network, wherein a network structure of the image detection model is constructed based on respective model substructures in respective classes of substructures, an optimization objective of the image detection model is to learn an optimal solution of the network structure of the image detection model by sampling structural parameters of the respective model substructures in the respective classes of substructures, and the respective classes of substructures are obtained by analyzing the feature maps processed by the respective model substructures in a network structure search space of the image detection model.
  2. The method according to claim 1, wherein the respective classes of substructures are obtained by dividing, by feature semantic level, the feature maps processed by the respective model substructures in the network structure search space of the image detection model.
  3. The method according to claim 2, wherein the network structure of the image detection model is constructed by aggregating the model substructures within each class of substructures and then stacking the aggregated classes according to the levels into which the respective classes of substructures are divided.
  4. The method according to claim 1, wherein the training of the image detection model characterizes performing a plurality of feature-layer transformations on the respective model substructures in the image detection model, and input data of each feature layer comprises at least one of: output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and output data of respective feature layers preceding the corresponding model substructure.
  5. The method according to claim 1, wherein the respective classes of substructures comprise a first class of substructures and a second class of substructures, the image detection model comprises a first sub-model and a second sub-model, a network structure of the first sub-model is constructed based on the respective model substructures in the first class of substructures, and a network structure of the second sub-model is constructed based on the respective model substructures in the second class of substructures; and
    the training an image detection model by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network comprises:
    adjusting structural parameters of the respective model substructures in the first class of substructures, by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network, to obtain the trained first sub-model;
    adjusting structural parameters of the respective model substructures in the second class of substructures, by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network, to obtain the trained second sub-model; and
    determining the image detection model based on the trained first sub-model and the trained second sub-model.
  6. The method according to claim 1, wherein the image detection model comprises a first detection sub-model and a second detection sub-model, model parameters of the first detection sub-model are training parameters of the image detection model, and model parameters of the second detection sub-model are structural parameters of the image detection model; and
    the training an image detection model by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network comprises:
    adjusting the training parameters, by using the target images comprised in the training samples of the training sample set as input data and using the feature maps corresponding to the input target images as output data, to obtain the trained first detection sub-model;
    adjusting the structural parameters, by using the target images comprised in the training samples of the training sample set as input data and using the feature maps corresponding to the input target images as output data, to obtain the trained second detection sub-model; and
    determining the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
  7. A method for generating information, the method comprising:
    acquiring a target image; and
    inputting the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is trained by the method according to any one of claims 1-6.
  8. The method according to claim 7, wherein the image detection model comprises a first detection sub-model and a second detection sub-model, the first detection sub-model characterizes training the training parameters of the image detection model by using a machine learning algorithm, and the second detection sub-model characterizes training the structural parameters of the image detection model by using a machine learning algorithm; and
    the inputting the target image into a pre-trained image detection model to generate a feature map corresponding to the target image comprises:
    inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image;
    inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and
    determining, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image.
  9. An apparatus for training a model, the apparatus comprising:
    an acquisition unit, configured to acquire a training sample set, wherein training samples in the training sample set comprise a target image and a feature map corresponding to the target image; and
    a training unit, configured to train an image detection model by using the target images comprised in the training samples of the training sample set as input data of a network and using the feature maps corresponding to the input target images as output data of the network, wherein a network structure of the image detection model is constructed based on respective model substructures in respective classes of substructures, an optimization objective of the image detection model is to learn an optimal solution of the network structure of the image detection model by sampling structural parameters of the respective model substructures in the respective classes of substructures, and the respective classes of substructures are obtained by analyzing the feature maps processed by the respective model substructures in a network structure search space of the image detection model.
  10. The apparatus according to claim 9, wherein the respective classes of substructures in the training unit are obtained by dividing, by feature semantic level, the feature maps processed by the respective model substructures in the network structure search space of the image detection model.
  11. The apparatus according to claim 10, wherein the network structure of the image detection model in the training unit is constructed by aggregating the model substructures within each class of substructures and then stacking the aggregated classes according to the levels into which the respective classes of substructures are divided.
  12. The apparatus according to claim 9, wherein the training of the image detection model in the training unit characterizes performing a plurality of feature-layer transformations on the respective model substructures in the image detection model, and input data of each feature layer comprises at least one of: output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and output data of respective feature layers preceding the corresponding model substructure.
  13. The apparatus according to claim 9, wherein the respective classes of substructures in the training unit comprise a first class of substructures and a second class of substructures, the image detection model comprises a first sub-model and a second sub-model, a network structure of the first sub-model is constructed based on the respective model substructures in the first class of substructures, and a network structure of the second sub-model is constructed based on the respective model substructures in the second class of substructures; and
    the training unit comprises:
    a first training module, configured to adjust structural parameters of the respective model substructures in the first class of substructures, by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network, to obtain the trained first sub-model;
    a second training module, configured to adjust structural parameters of the respective model substructures in the second class of substructures, by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network, to obtain the trained second sub-model; and
    a first determination module, configured to determine the image detection model based on the trained first sub-model and the trained second sub-model.
  14. The apparatus according to claim 9, wherein the image detection model in the training unit comprises a first detection sub-model and a second detection sub-model, model parameters of the first detection sub-model are training parameters of the image detection model, and model parameters of the second detection sub-model are structural parameters of the image detection model; and
    the training unit comprises:
    a third training module, configured to adjust the training parameters, by using the target images comprised in the training samples of the training sample set as input data and using the feature maps corresponding to the input target images as output data, to obtain the trained first detection sub-model;
    a fourth training module, configured to adjust the structural parameters, by using the target images comprised in the training samples of the training sample set as input data and using the feature maps corresponding to the input target images as output data, to obtain the trained second detection sub-model; and
    a second determination module, configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
  15. An apparatus for generating information, the apparatus comprising:
    an image acquisition unit, configured to acquire a target image; and
    a generation unit, configured to input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is trained by the method according to any one of claims 1-6.
  16. The apparatus according to claim 15, wherein the image detection model in the generation unit comprises a first detection sub-model and a second detection sub-model, the first detection sub-model characterizes training the training parameters of the image detection model by using a machine learning algorithm, and the second detection sub-model characterizes training the structural parameters of the image detection model by using a machine learning algorithm; and
    the generation unit comprises:
    a first generation module, configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image;
    a second generation module, configured to input the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and
    a second determination module, configured to determine, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image.
  17. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-6 or claims 7-8.
  18. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-6 or claims 7-8.
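The alternating adjustment of training parameters and structural parameters recited in claims 6 and 14 resembles differentiable architecture search, where architecture mixing coefficients are optimized alongside ordinary weights. The toy sketch below alternates the two update steps on a linear stand-in model with synthetic data in place of images and feature maps; all shapes, learning rates, and the softmax-mixture formulation are illustrative assumptions, not the patent's specific method.

```python
# Toy sketch of alternating optimization: step 1 adjusts the training
# parameters (weights) with the structure fixed; step 2 adjusts the
# structural parameters (mixing coefficients over candidate
# substructures) with the weights fixed.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))                 # stand-in for target images
y = x @ np.array([1.0, -2.0, 0.5, 3.0])      # stand-in for feature maps

w = 0.1 * rng.normal(size=(2, 4))            # weights of 2 candidate substructures
alpha = np.zeros(2)                          # structural parameters, one per candidate

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def predict(x, w, alpha):
    # Mix the candidate substructures' outputs by softmax(alpha).
    probs = softmax(alpha)
    return sum(p * (x @ wi) for p, wi in zip(probs, w))

baseline = float(np.mean((predict(x, w, alpha) - y) ** 2))

for _ in range(500):
    # Step 1: gradient step on the training parameters.
    probs = softmax(alpha)
    err = predict(x, w, alpha) - y
    for i in range(2):
        w[i] -= 0.05 * probs[i] * (x.T @ err) / len(x)
    # Step 2: gradient step on the structural parameters.
    err = predict(x, w, alpha) - y
    outs = np.stack([x @ wi for wi in w])    # each candidate's output
    g = outs @ err / len(x)                  # dLoss/dprobs (up to scale)
    p = softmax(alpha)
    alpha -= 0.1 * p * (g - p @ g)           # chain rule through softmax

mse = float(np.mean((predict(x, w, alpha) - y) ** 2))
```

Splitting the two parameter groups into separate update steps, as the claims do, keeps the structure search from chasing half-trained weights and mirrors the claim's "first detection sub-model / second detection sub-model" decomposition.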
PCT/CN2022/095186 2021-06-28 2022-05-26 Method and apparatus for training model, and device, and storage medium WO2023273720A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110717772.7A CN115618218A (en) 2021-06-28 2021-06-28 Method, apparatus, device and storage medium for training a model
CN202110717772.7 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023273720A1

Family

ID=84692497

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095186 WO2023273720A1 (en) 2021-06-28 2022-05-26 Method and apparatus for training model, and device, and storage medium

Country Status (2)

Country Link
CN (1) CN115618218A (en)
WO (1) WO2023273720A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169573A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using composite machine learning model come the method and system of perform prediction
CN107169574A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using nested machine learning model come the method and system of perform prediction
CN107273979A (en) * 2017-06-08 2017-10-20 第四范式(北京)技术有限公司 The method and system of machine learning prediction are performed based on service class
CN111695052A (en) * 2020-06-12 2020-09-22 上海智臻智能网络科技股份有限公司 Label classification method, data processing device and readable storage medium
CN112200169A (en) * 2020-12-07 2021-01-08 北京沃东天骏信息技术有限公司 Method, apparatus, device and storage medium for training a model


Also Published As

Publication number Publication date
CN115618218A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
Li et al. Deepsaliency: Multi-task deep neural network model for salient object detection
US20220383535A1 (en) Object Tracking Method and Device, Electronic Device, and Computer-Readable Storage Medium
CN111860479B (en) Optical character recognition method, device, electronic equipment and storage medium
US20210303921A1 (en) Cross-modality processing method and apparatus, and computer storage medium
CN113657465B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN110705460B (en) Image category identification method and device
Zhao et al. Hi-Fi: Hierarchical feature integration for skeleton detection
WO2018005594A1 (en) Eye gaze tracking using neural networks
Nguyen et al. Yolo based real-time human detection for smart video surveillance at the edge
US11893708B2 (en) Image processing method and apparatus, device, and storage medium
KR102551835B1 (en) Active interaction method, device, electronic equipment and readable storage medium
EP3852011A2 (en) Method and apparatus for determining target anchor, device and storage medium
CN114677565B (en) Training method and image processing method and device for feature extraction network
WO2022213857A1 (en) Action recognition method and apparatus
US10438088B2 (en) Visual-saliency driven scene description
KR20220047228A (en) Method and apparatus for generating image classification model, electronic device, storage medium, computer program, roadside device and cloud control platform
Lu et al. An improved target detection method based on multiscale features fusion
Zhang et al. R2net: Residual refinement network for salient object detection
WO2023273720A1 (en) Method and apparatus for training model, and device, and storage medium
CN111368800A (en) Gesture recognition method and device
CN116052288A (en) Living body detection model training method, living body detection device and electronic equipment
Rubin Bose et al. In-situ identification and recognition of multi-hand gestures using optimized deep residual network
CN111862030B (en) Face synthetic image detection method and device, electronic equipment and storage medium
CN113204665A (en) Image retrieval method, image retrieval device, electronic equipment and computer-readable storage medium
Quan et al. Object detection model based on deep dilated convolutional networks by fusing transfer learning

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE