WO2023273720A1 - Method and apparatus for training model, and device, and storage medium - Google Patents


Info

Publication number
WO2023273720A1
WO2023273720A1 (PCT application PCT/CN2022/095186)
Authority
WO
WIPO (PCT)
Prior art keywords
model
training
substructure
target image
feature map
Prior art date
Application number
PCT/CN2022/095186
Other languages
French (fr)
Chinese (zh)
Inventor
张炜
许靖
梅涛
周伯文
Original Assignee
京东科技控股股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东科技控股股份有限公司 filed Critical 京东科技控股股份有限公司
Publication of WO2023273720A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • Embodiments of the present application relate to the field of computer technology, specifically to the field of image processing technology, and in particular to a method and apparatus for training a model.
  • The present application provides a method, apparatus, device, and storage medium for training a model, and a method, apparatus, device, and storage medium for generating information.
  • Some embodiments of the present application provide a method for training a model, the method comprising: obtaining a training sample set, wherein each training sample in the training sample set includes a target image and a feature map corresponding to the target image;
  • using the target image included in the training sample as the input data of a network and the feature map corresponding to the input target image as the output data of the network, and obtaining an image detection model through training, wherein the network structure of the image detection model is constructed from the model substructures in the various substructure classes.
  • The optimization goal of the image detection model is to learn the optimal solution of the network structure of the image detection model by sampling the structural parameters of each model substructure in each substructure class; the substructure classes are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the image detection model.
  • In some embodiments, the substructure classes are obtained by dividing the feature maps processed by each model substructure in the network structure search space of the image detection model according to their feature semantic levels.
  • The network structure of the image detection model is constructed by grouping the model substructures belonging to the same substructure class and stacking the groups according to the levels into which the substructure classes are divided.
  • During training, each model substructure in the image detection model is expanded into multiple feature layers, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and the output data of each feature layer preceding it within the corresponding model substructure.
  • Specifically, the input data of each feature layer may include the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model; or it may include the output data of each feature layer preceding it within the corresponding model substructure; or it may include both.
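As a minimal illustration of this input-composition rule (the function name and the representation of feature maps as flat lists of numbers are assumptions of this sketch, not part of the application):

```python
def feature_layer_input(prev_substructure_outputs, prev_layer_outputs):
    """Assemble a feature layer's input by element-wise summation of the
    outputs of the preceding model substructures and of every preceding
    feature layer (feature maps modelled as flat lists of numbers)."""
    sources = list(prev_substructure_outputs) + list(prev_layer_outputs)
    return [sum(values) for values in zip(*sources)]

# e.g. two preceding substructures and one earlier feature layer
combined = feature_layer_input([[1, 2], [3, 4]], [[5, 6]])
```

Either source group may also be used alone, matching the "at least one of" wording above.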
  • In some embodiments, the substructure classes include a first substructure class and a second substructure class, the image detection model includes a first submodel and a second submodel, the network structure of the first submodel is constructed from the model substructures in the first substructure class, and the network structure of the second submodel is constructed from the model substructures in the second substructure class. Training the image detection model then includes: using the target image included in the training samples in the training sample set as the input data of the network and the feature map corresponding to the input target image as the output data of the network, adjusting the structural parameters of each model substructure in the first substructure class to obtain the trained first submodel; likewise using the target image as the input data of the network and the feature map corresponding to the input target image as the output data of the network, adjusting the structural parameters of each model substructure in the second substructure class to obtain the trained second submodel; and determining the image detection model based on the trained first submodel and the trained second submodel.
  • In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model, the model parameters of the first detection sub-model are the training parameters of the image detection model, and the model parameters of the second detection sub-model are the structural parameters of the image detection model. Training the image detection model then includes: using the target image included in the training samples in the training sample set as input data and the feature map corresponding to the input target image as output data, adjusting the training parameters to obtain the trained first detection sub-model; using the target image as input data and the corresponding feature map as output data, adjusting the structural parameters to obtain the trained second detection sub-model; and determining the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
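The alternation just described (adjust the training parameters, then the structural parameters) can be sketched with a deliberately tiny scalar model; the model y = w * s * x, the names, and the learning-rate values are illustrative assumptions, not the application's actual networks:

```python
def train_alternating(samples, w=0.5, s=0.5, lr=0.1, epochs=50):
    """Toy two-phase loop for the model y = w * s * x:
    phase 1 adjusts the training parameter w, phase 2 the structural
    parameter s, both against the same (input, target) supervision."""
    for _ in range(epochs):
        for x, target in samples:       # phase 1: training parameters
            error = w * s * x - target
            w -= lr * error * s * x
        for x, target in samples:       # phase 2: structural parameters
            error = w * s * x - target
            s -= lr * error * w * x
    return w, s
```

After training on a single pair such as (1.0, 2.0), the product w * s approaches the target, mirroring how the trained first and second detection sub-models jointly determine the image detection model.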
  • Some embodiments of the present application provide a method for generating information, the method including: acquiring a target image; and inputting the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is obtained through training according to any embodiment of the method for training a model above.
  • In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model.
  • The first detection sub-model is obtained by training the training parameters of the image detection model using a machine learning algorithm, and the second detection sub-model is obtained by training the structural parameters of the image detection model using the machine learning algorithm.
  • Inputting the target image into the pre-trained image detection model to generate the feature map corresponding to the target image includes: inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and, based on the first feature map and the second feature map, determining a feature map corresponding to both as the feature map corresponding to the target image.
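One simple way to realize that final combination step is sketched below; the averaging rule, the function name, and the list representation of feature maps are assumptions of this sketch (the application does not fix a particular combination rule):

```python
def generate_feature_map(image, first_sub_model, second_sub_model):
    """Run both pre-trained detection sub-models on the target image and
    combine the two resulting feature maps, here by element-wise averaging."""
    first_map = first_sub_model(image)
    second_map = second_sub_model(image)
    return [(a + b) / 2 for a, b in zip(first_map, second_map)]

# Stand-in sub-models for demonstration only
result = generate_feature_map([1, 2],
                              lambda im: [x * 2 for x in im],
                              lambda im: [x * 4 for x in im])
```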
  • Some embodiments of the present application provide an apparatus for training a model, the apparatus including: an acquisition unit configured to acquire a training sample set, wherein each training sample in the training sample set includes a target image and a feature map corresponding to the target image;
  • and a training unit configured to use the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network to train the image detection model, wherein the network structure of the image detection model is constructed from the model substructures in the various substructure classes, the optimization goal of the image detection model is to learn the optimal solution of its network structure by sampling the structural parameters of each model substructure in each substructure class, and the substructure classes are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the image detection model.
  • In some embodiments, the substructure classes in the training unit are obtained by dividing the feature maps processed by each model substructure in the network structure search space of the image detection model according to their feature semantic levels.
  • The network structure of the image detection model in the training unit is constructed by grouping the model substructures belonging to the same substructure class and stacking the groups according to the levels into which the substructure classes are divided.
  • During training, each model substructure in the image detection model is expanded into multiple feature layers, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and the output data of each feature layer preceding it within the corresponding model substructure.
  • Specifically, the input data of each feature layer may include the output data of at least two model substructures preceding the corresponding model substructure; or the output data of each feature layer preceding it within the corresponding model substructure; or both.
  • In some embodiments, the substructure classes in the training unit include a first substructure class and a second substructure class, the image detection model includes a first submodel and a second submodel, the network structure of the first submodel is constructed from the model substructures in the first substructure class, and the network structure of the second submodel is constructed from the model substructures in the second substructure class.
  • The training unit includes: a first training module configured to use the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network, and to adjust the structural parameters of each model substructure in the first substructure class to obtain the trained first submodel;
  • and a second training module configured to use the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network, and to adjust the structural parameters of each model substructure in the second substructure class to obtain the trained second submodel.
  • In some embodiments, the image detection model in the training unit includes a first detection sub-model and a second detection sub-model, the model parameters of the first detection sub-model are the training parameters of the image detection model, and the model parameters of the second detection sub-model are the structural parameters of the image detection model.
  • The training unit includes: a third training module configured to use the target image included in the training samples as input data and the feature map corresponding to the input target image as output data, and to adjust the training parameters to obtain the trained first detection sub-model;
  • a fourth training module configured to use the target image included in the training samples as input data and the feature map corresponding to the input target image as output data, and to adjust the structural parameters to obtain the trained second detection sub-model;
  • and a second determination module configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
  • Some embodiments of the present application provide an apparatus for generating information, the apparatus including: an image acquisition unit configured to acquire a target image; and a generation unit configured to input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is obtained through training according to any embodiment of the method for training a model above.
  • In some embodiments, the image detection model in the generation unit includes a first detection sub-model and a second detection sub-model; the first detection sub-model is obtained by training the training parameters of the image detection model using a machine learning algorithm, and the second detection sub-model is obtained by training the structural parameters of the image detection model using the machine learning algorithm.
  • The generation unit includes: a first generation module configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image;
  • a second generation module configured to input the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image;
  • and a second determination module configured to determine, based on the first feature map and the second feature map, a feature map corresponding to both as the feature map corresponding to the target image.
  • Some embodiments of the present application provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method described in the foregoing implementations.
  • Some embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method described in the foregoing implementations.
  • FIG. 1 is a schematic diagram of a first embodiment of a method for training a model according to the present application;
  • FIG. 2 is a scene diagram of a method for training a model that can implement an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a second embodiment of a method for training a model according to the present application;
  • FIG. 4 is a schematic diagram of the model sub-module structure of an embodiment of the present application;
  • FIG. 5A is a schematic diagram of a network model architecture according to the method for training a model of the present application;
  • FIGS. 5B and 5C are schematic diagrams of the sampling process of the model sub-module structure in the network model architecture;
  • FIG. 5D is a schematic diagram of the sampling results of the network model architecture;
  • FIG. 6 is a schematic diagram of a first embodiment of a method for generating information according to the present application;
  • FIG. 7 is a schematic structural diagram of an embodiment of an apparatus for training a model according to the present application;
  • FIG. 8 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present application;
  • FIG. 9 is a block diagram of an electronic device used to implement an embodiment of the present application.
  • FIG. 1 shows a schematic diagram 100 of a first embodiment of a method for training a model according to the present application.
  • the method for training the model includes the following steps:
  • Step 101: obtain a training sample set.
  • the execution subject may obtain the training sample set from other electronic devices or locally through a wired connection or a wireless connection.
  • the training samples in the training sample set include target images and feature maps corresponding to the target images.
  • The above-mentioned wireless connection methods may include, but are not limited to, 3G, 4G, and 5G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra-wideband) connections, and other wireless connection methods now known or developed in the future.
  • Step 102: use the target image included in the training samples in the training sample set as the input data of the network, and the feature map corresponding to the input target image as the output data of the network, to train an image detection model.
  • the execution subject can use the machine learning algorithm to train the image detection model by using the target image obtained in step 101 as the input data of the network, and using the feature map corresponding to the input target image as the output data of the network.
  • The network structure of the model is constructed from the model substructures in the various substructure classes; for example, the model substructures in the different substructure classes are stacked according to a preset construction method.
  • During training, the optimization goal of the image detection model may be pursued by using the machine learning algorithm to sample the structural parameters of each model substructure in the various substructure classes at each step, continually sampling better structures until convergence, thereby obtaining the optimal model network structure.
  • each model substructure in the network structure search space is analyzed for feature information, and each model substructure in each substructure is classified based on the analysis results.
  • the feature information can be forged feature information in the image.
  • The above execution subject may store a pre-trained image detection model whose network architecture is predefined; for example, an eight-layer basic model architecture is defined for the search of the model sub-module structure.
  • the execution subject may use the image detection model to predict a feature map corresponding to feature information in the target image in the target image.
  • the image detection model can be used to characterize the correspondence between the target object and the feature map.
  • the model structure of the image prediction model can be constructed based on various logistic regression models in related technologies, such as but not limited to: BERT, FastText, TextCNN, etc.
  • the image detection model may be, for example, a data table or a calculation formula, and this embodiment does not make any limitation on this aspect.
  • the above-mentioned machine learning algorithm is a well-known technology widely researched and applied at present, and will not be repeated here.
  • the method 200 for training a model of this embodiment runs on a server 201 .
  • The server 201 first obtains the training sample set 202, wherein the training samples in the training sample set include the target image and the feature map corresponding to the target image. The server 201 then uses the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network to train the image detection model 203, wherein the network structure of the model is constructed from the model substructures in the various substructure classes, the optimization goal of the image detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of each model substructure, and the substructure classes are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the model.
  • The method for training a model of this embodiment obtains a training sample set, wherein the training samples include a target image and a feature map corresponding to the target image; it then uses the target image included in the training samples as the input data of the network and the feature map corresponding to the input target image as the output data of the network to train the image detection model.
  • The network structure of the model is constructed from the model substructures in the various substructure classes; the optimization goal of the detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of each model substructure; and the substructure classes are obtained by analyzing the feature maps processed by each model substructure. This realizes research on and optimization of the image detection model structure.
  • By classifying the model substructures, information at different levels is separated, and model structures that process different contents are generated, realizing the construction of an image detection model on a separated search space. More resources can be allocated to the processing of the required types of information according to requirements, realizing lightweight image authentication, avoiding waste of resources, and improving the accuracy and efficiency of the model as a whole.
  • Referring to FIG. 3, a schematic diagram 300 of a second embodiment of a method for training a model is shown.
  • the flow of the method includes the following steps:
  • Step 301: acquire a training sample set.
  • Step 302: use the target image included in the training samples in the training sample set as the input data of the network, and the feature map corresponding to the input target image as the output data of the network, to train an image detection model.
  • the execution subject can use the machine learning algorithm to train the image detection model by using the target image obtained in step 301 as the input data of the network, and using the feature map corresponding to the input target image as the output data of the network.
  • The network structure of the model is constructed from the model substructures in each substructure class. During training, the optimization goal of the image detection model may be pursued by sampling the structural parameters of each model substructure at each step and continually sampling better structures until convergence, to obtain the optimal model network structure. The substructure classes are obtained by dividing the feature maps processed by each model substructure in the network structure search space of the model according to their feature semantic levels. For example, based on the semantic features, the information corresponding to the feature maps is divided, according to the operations performed at different depths, into shallow (bottom-level) information and deep information; the model substructures corresponding to the shallow information of the model are divided into shallow substructures, and the model substructures corresponding to the deep information are divided into deep substructures.
  • During training, each model substructure in the image detection model is expanded into multiple feature layers, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model network structure, and the output data of each feature layer preceding it within the corresponding model substructure.
  • The model sub-module structure definition is shown in FIG. 4.
  • Each sub-module structure receives the outputs of the two preceding sub-module structures as inputs (Input 1 and Input 2 in the figure) and performs a three-layer feature transformation on these two inputs (Node 1 to Node 3 in the figure). The input of each feature layer includes the two inputs of the sub-module structure and the outputs of the preceding feature layers, and the sum of these results is used as the output of that layer; finally, the outputs of the feature layers (Node 1 to Node 3) are superposed as the output of the model sub-module structure.
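The wiring just described can be sketched as follows; feature maps are modelled as lists of floats and each node's operation as a plain callable, both of which are simplifying assumptions rather than the convolutional operations an actual cell would use:

```python
def run_cell(input1, input2, node_ops):
    """One sub-module structure: three nodes, each fed the element-wise
    sum of the two cell inputs and all earlier node outputs; the cell
    output superposes (element-wise adds) the three node outputs."""
    states = [input1, input2]
    node_outputs = []
    for op in node_ops:                            # Node 1 .. Node 3
        layer_input = [sum(v) for v in zip(*states)]
        out = op(layer_input)
        node_outputs.append(out)
        states.append(out)
    return [sum(v) for v in zip(*node_outputs)]
```

With identity operations and inputs [1.0] and [2.0], the node outputs are [3.0], [6.0], and [12.0], so the cell outputs [21.0].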
  • An eight-layer basic network model architecture of the image detection model is defined.
  • The bottom two modules (cell 1 and cell 2) are divided into shallow units based on the shallow substructure,
  • and the upper six modules (cell 3 to cell 8) are divided into deep units based on the deep substructure.
  • the controller samples a specific structure for the shallow unit and the deep unit each time, so as to obtain a candidate model.
  • The test results of the candidate model are used to train the controller by means of reinforcement learning or gradient optimization, so that it continually samples better structures until convergence, obtaining the optimal model structure.
  • the controller can use commonly used structure search algorithms.
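A toy stand-in for such a controller is sketched below (REINFORCE-flavoured; the score table, the reward function, and all names are assumptions of this sketch, and a practical structure-search controller would instead be, e.g., an RNN or a differentiable relaxation):

```python
import random

def search_operation(scores, evaluate, lr=0.5, steps=200, seed=0):
    """Repeatedly sample an operation in proportion to its score,
    'evaluate' the resulting candidate, and reinforce the sampled
    operation by the reward, until the best operation dominates."""
    rng = random.Random(seed)
    for _ in range(steps):
        ops = list(scores)
        choice = rng.choices(ops, weights=[scores[o] for o in ops])[0]
        scores[choice] += lr * evaluate(choice)   # reward assumed in [0, 1]
    return max(scores, key=scores.get)
```

With two candidates where only one earns reward, the rewarded operation's score grows and the search settles on it.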
  • For each unit, an operation type is selected for each searched edge (the dotted-line edges shown in FIG. 5B) to obtain a subnetwork of the unit (as shown in FIG. 5C), finally yielding the sampling result.
  • In FIG. 5D, the left diagram is the sampling result of the deep unit and the right diagram is the sampling result of the shallow unit, where max_pool denotes max pooling, dil_conv_5x5 denotes a dilated convolution with a kernel size of 5, dil_conv_3x3 denotes a dilated convolution with a kernel size of 3, identity denotes an identity mapping, and sep_conv_3x3 denotes a depthwise separable convolution with a kernel size of 3.
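For intuition, the named operations can be given 1-D stand-ins; the function bodies below are illustrative assumptions (real cells use 2-D versions with learned kernels):

```python
def max_pool(x, k=3):
    """Non-overlapping max pooling over a 1-D list."""
    return [max(x[i:i + k]) for i in range(0, len(x), k)]

def identity(x):
    """Identity mapping: pass the feature map through unchanged."""
    return list(x)

def dil_conv(x, kernel, dilation=2):
    """1-D dilated convolution: kernel taps are spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation
    return [sum(k * x[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(x) - span)]

CANDIDATE_OPS = {"max_pool": max_pool, "identity": identity}
# dil_conv_3x3 / dil_conv_5x5 would fix the kernel length at 3 or 5;
# sep_conv_3x3 would additionally factorise into depthwise + pointwise steps.
```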
  • The network structure of the model is constructed by grouping the model substructures belonging to the same substructure class and then stacking the groups according to the levels into which the substructure classes are divided.
  • The shallow structure of the model is thereby separated from the deep structure, so that the model can focus more on the processing of the underlying information.
  • In some embodiments, the substructure classes include a first substructure class and a second substructure class, the image detection model includes a first submodel and a second submodel, the network structure of the first submodel is constructed from the model substructures in the first substructure class, and the network structure of the second submodel is constructed from the model substructures in the second substructure class. Training the image detection model then includes: using the target image included in the training samples in the training sample set as the input data of the network and the feature map corresponding to the input target image as the output data of the network, adjusting the structural parameters of each model substructure in the first substructure class to obtain the trained first submodel; and, likewise using the target image as the input data of the network and the feature map corresponding to the input target image as the output data of the network, adjusting the structural parameters of each model substructure in the second substructure class to obtain the trained second submodel.
  • In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model, the model parameters of the first detection sub-model are the training parameters of the model, and the model parameters of the second detection sub-model are the structural parameters of the model.
  • The target image included in the training samples in the training sample set is used as the input data of the network, and the feature map corresponding to the input target image is used as the output data of the network.
  • The image detection model is obtained by training, including: using the target image included in the training samples as input data and the feature map corresponding to the input target image as output data, and adjusting the training parameters to obtain the trained first detection sub-model;
  • using the target image included in the training samples as input data and the feature map corresponding to the input target image as output data, and adjusting the structural parameters to obtain the trained second detection sub-model; and determining the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
  • the image detection model can be built based on the neural network.
  • the parameters of the neural network model include: training parameters and network structure parameters.
  • the training parameters are the parameters, other than the model structure parameters, that are obtained through training, such as the learning rate (step size), the batch size (number of data samples per update), and the weight decay
  • the network structure parameters are parameters that define the network structure of the image detection model, such as the number of network layers, the operator used in each layer, and the filter sizes of the convolutions
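As a purely illustrative sketch (all names and values are hypothetical, not taken from the application), the two parameter groups described above might be kept separate like this:

```python
# Hypothetical sketch: the application distinguishes training parameters
# (learning rate, batch size, weight decay, ...) from network structure
# parameters (number of layers, per-layer operator, filter sizes).

training_params = {
    "learning_rate": 0.01,  # step size
    "batch_size": 64,       # number of data samples per update
    "weight_decay": 1e-4,   # regularization strength
}

structure_params = {
    "num_layers": 12,                     # depth of the network
    "layer_ops": ["conv3x3", "conv5x5"],  # candidate operators per layer
    "filter_sizes": [3, 5],               # convolution kernel sizes
}

# The two groups are disjoint: a structure search adjusts only the second
# group, while ordinary training adjusts only the first.
assert set(training_params).isdisjoint(structure_params)
```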
  • step 301 is basically the same as the operation of step 101 in the embodiment shown in FIG. 1 , and will not be repeated here.
  • the schematic diagram 300 of the method for training a model in this embodiment adopts various substructures obtained by searching each model substructure in the network structure search space of the model
  • the various substructures are obtained by dividing the feature semantic levels of the processed feature maps, and the training of the model represents transforming each model substructure in the model through multiple feature layers
  • the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model's network structure, and the output data of each feature layer preceding the corresponding model substructure
  • the design of image classification models tends toward deeper model structures, which, when applied to image forgery detection, wastes computing resources
  • by dividing the model substructures, operations of different depths are separated from each other, and different structures and processing methods are used for different levels of information; this realizes the construction of an image detection model based on a depth-separated search space, on which a lightweight image authentication model is achieved
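The multi-feature-layer transformation described above, in which each feature layer's input includes the outputs of preceding substructures and feature layers, can be sketched in miniature (pure Python, with a weighted sum standing in for real convolutions; everything here is a hypothetical illustration):

```python
# Hypothetical sketch of the dense connectivity described above: each
# feature layer receives the outputs of ALL preceding layers (one reading
# of "at least two model substructures ... and each feature layer before
# the corresponding model substructure").

def feature_layer(prev_outputs, weight):
    # Stand-in transform: a weighted sum of all earlier outputs.
    return weight * sum(prev_outputs)

def run_dense_stack(x, weights):
    outputs = [x]  # the raw input counts as the first available output
    for w in weights:
        outputs.append(feature_layer(outputs, w))
    return outputs

outs = run_dense_stack(1.0, [1.0, 1.0, 1.0])
# layer 1 sees [1] -> 1; layer 2 sees [1, 1] -> 2; layer 3 sees [1, 1, 2] -> 4
print(outs)  # [1.0, 1.0, 2.0, 4.0]
```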
  • FIG. 6 shows a schematic diagram 600 of a first embodiment of a method for generating information according to the present application
  • the method for generating information includes the following steps:
  • Step 601 acquire a target image.
  • the execution subject (such as a server or a terminal device) may acquire the target image from other electronic devices or locally through a wired connection or a wireless connection.
  • Step 602 Input the target image into the pre-trained image detection model to generate a feature map corresponding to the target image.
  • the execution subject may input the target image obtained in step 601 into a pre-trained image detection model to generate a feature map corresponding to feature information in the target image.
  • the image detection model is obtained through training according to any one of the above-mentioned methods for training the model.
  • the image detection model includes a first detection sub-model and a second detection sub-model
  • the first detection sub-model is used to represent training the training parameters of the model using a machine learning algorithm
  • the second detection sub-model is used to represent training the structural parameters of the model using a machine learning algorithm; inputting the target image into the pre-trained image detection model to generate the feature map corresponding to the target image includes: inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and, based on the first feature map and the second feature map, determining the feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image
  • the two sub-models are used to extract features separately, which improves the processing efficiency and accuracy of the system
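The two-branch inference just described can be sketched as follows; the stand-in sub-models and the element-wise-average fusion rule are assumptions for illustration only (the application does not fix a particular fusion rule):

```python
# Hypothetical sketch: the target image passes through both detection
# sub-models, and their feature maps are fused into the final feature map.

def first_detection_submodel(image):
    return [[p * 0.5 for p in row] for row in image]  # stand-in features

def second_detection_submodel(image):
    return [[p + 1.0 for p in row] for row in image]  # stand-in features

def generate_feature_map(image):
    f1 = first_detection_submodel(image)
    f2 = second_detection_submodel(image)
    # Assumed fusion rule: element-wise average of the two feature maps.
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(f1, f2)]

print(generate_feature_map([[2.0, 4.0]]))  # [[2.0, 3.5]]
```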
  • the process 600 of the method for generating information in this embodiment highlights the step of using the trained image detection model to generate the feature map corresponding to the feature information in the target image; therefore, the solution described in this embodiment can realize targeted feature extraction of different types, different levels, and different depths
  • the present application provides an embodiment of an apparatus for training a model, which corresponds to the method embodiment shown in FIG. 1; in addition to the features described below, the apparatus embodiment may also include features the same as or corresponding to those of the method embodiment shown in FIG. 1, and produce the same or corresponding effects; the apparatus may specifically be applied to various electronic devices
  • the apparatus 700 for training a model in this embodiment includes: an acquisition unit 701 and a training unit 702, wherein the acquisition unit is configured to acquire a training sample set, wherein the training samples in the training sample set include a target image and a feature map corresponding to the target image; the training unit is configured to use the target image included in the training samples in the training sample set as the input data of the network, and use the feature map corresponding to the input target image as the output data of the network, to train the image detection model
  • the network structure of the model is constructed based on each model substructure in the various substructures, and the optimization goal of the image detection model is to learn, by sampling the structural parameters of each model substructure in the various substructures, the optimal solution of the model's network structure
  • the various substructures are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the model
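The optimization goal above — sampling structural parameters to learn the optimal network structure — can be caricatured as random search over a tiny search space. Everything below (the space, the scoring function) is a hypothetical stand-in; a real architecture search would train and validate each sampled structure:

```python
import random

# Hypothetical sketch: sample structural parameters from the search space
# and keep the best-scoring network structure.
random.seed(0)

search_space = {"depth": [4, 8, 12], "op": ["conv3x3", "conv5x5"]}

def score(arch):
    # Stand-in objective favoring lightweight structures; a real search
    # would use validation accuracy of the trained candidate.
    return -arch["depth"] + (1 if arch["op"] == "conv3x3" else 0)

samples = [
    {k: random.choice(v) for k, v in search_space.items()} for _ in range(20)
]
best = max(samples, key=score)
print(best)
```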
  • the various substructures in the training unit are obtained by dividing the feature semantic levels of the feature maps processed by each model substructure in the network structure search space of the model.
  • the network structure of the model in the training unit is constructed by summarizing the model substructures of the same kind and then stacking them according to the levels into which the various substructures are divided
  • the training of the image detection model in the training unit is used to represent that each model substructure in the model is transformed through multiple feature layers
  • the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model's network structure, and the output data of each feature layer preceding the corresponding model substructure
  • the various substructures in the training unit include a first type of substructure and a second type of substructure; the image detection model includes a first submodel and a second submodel; the network structure of the first submodel is constructed based on each model substructure in the first type of substructure, and the network structure of the second submodel is constructed based on each model substructure in the second type of substructure
  • the training unit includes: a first training module configured to use the target image included in the training samples in the training sample set as the input data of the network, use the feature map corresponding to the input target image as the output data of the network, and adjust the structural parameters of each model substructure in the first type of substructure to obtain the trained first submodel
  • a second training module configured to use the target image included in the training samples in the training sample set as the input data of the network, use the feature map corresponding to the input target image as the output data of the network, and adjust the structural parameters of each model substructure in the second type of substructure to obtain the trained second submodel
  • the image detection model in the training unit includes a first detection sub-model and a second detection sub-model
  • the model parameters of the first detection sub-model are the training parameters of the model
  • the model parameters of the second detection sub-model are the structural parameters of the model
  • the training unit includes: a third training module configured to use the target image included in the training samples in the training sample set as input data, use the feature map corresponding to the input target image as output data, and adjust the training parameters to obtain the trained first detection sub-model
  • a fourth training module configured to use the target image included in the training samples in the training sample set as input data, use the feature map corresponding to the input target image as output data, and adjust the structural parameters to obtain the trained second detection sub-model
  • the second determination module is configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
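The third and fourth training modules amount to two coupled optimizations: one over the training parameters (weights) and one over the structural parameters, an idea familiar from differentiable architecture search. A heavily simplified, hypothetical sketch (made-up fixed gradients, plain gradient descent):

```python
# Hypothetical sketch: alternate between adjusting training parameters
# (first detection sub-model) and structural parameters (second detection
# sub-model); the "gradients" here are fixed stand-ins, not real gradients.

def step(params, grads, lr=0.1):
    return [p - lr * g for p, g in zip(params, grads)]

weights = [1.0, 2.0]  # training parameters
arch = [0.5, 0.5]     # structural parameters

for _ in range(3):
    weights = step(weights, grads=[0.2, -0.1])  # adjust training parameters
    arch = step(arch, grads=[0.1, -0.1])        # adjust structural parameters

print(weights, arch)
```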
  • the above-mentioned embodiments of the present disclosure provide an apparatus for training a model.
  • the training sample set is acquired by the first acquisition unit, wherein the training samples in the training sample set include head images, feature information of head objects in the head images, and the feature maps corresponding to the feature information of the head objects
  • then, the training unit uses a machine learning algorithm to take the head image included in the training samples as input data, and to take the feature information of the head object corresponding to the input head image and the feature map corresponding to that feature information as the expected output data, so as to train the feature extraction model
  • the feature extraction model is constructed based on a convolutional neural network
  • the parameters of the convolutional neural network model include scale parameters and other convolution kernel parameters; the scale parameter is the scale structure of the head object set using scale space theory, and the other convolution kernel parameters are the parameters of the convolution kernels in the convolutional neural network other than the scale parameters; this enriches the methods for training the model and helps realize feature extraction in a multi-scale space based on the trained model
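The scale-space idea mentioned for the convolution kernels — extracting features at several scales — can be sketched with a toy 1-D signal; the moving-average filter is a hypothetical stand-in for a proper scale-space (e.g. Gaussian) kernel:

```python
# Hypothetical sketch: filter the same signal at several scales; larger
# scales spread local structure out, echoing multi-scale feature extraction.

def smooth(signal, scale):
    half = scale // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

signal = [0.0, 0.0, 1.0, 0.0, 0.0]
multi_scale = {s: smooth(signal, s) for s in (1, 3, 5)}
print(multi_scale[1])  # scale 1 leaves the impulse intact
print(multi_scale[3])  # scale 3 spreads it over three samples
```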
  • the present application provides an embodiment of a device for generating information.
  • this apparatus embodiment corresponds to the method embodiment shown in FIG. 6
  • in addition to the features described below, the apparatus embodiment may also include features the same as or corresponding to those of the method embodiment shown in FIG. 6, and produce the same or corresponding effects; the apparatus may specifically be applied to various electronic devices
  • the apparatus 800 for generating information in this embodiment includes: an image acquisition unit 801 and a generation unit 802, wherein the image acquisition unit is configured to acquire a target image; the generation unit is configured to convert the target image Input to the pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is obtained by training according to any embodiment of the method for training the model above.
  • the image detection model in the generation unit includes a first detection sub-model and a second detection sub-model; the first detection sub-model is used to represent training the training parameters of the model using a machine learning algorithm
  • the second detection sub-model is used to represent training the structural parameters of the model using a machine learning algorithm
  • the generation unit includes: a first generation module configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image
  • the second generation module is configured to input the target image to the pre-trained second detection sub-model to generate a second feature map corresponding to the target image;
  • a second determination module configured to determine, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image
  • the present application also provides an electronic device and a readable storage medium.
  • FIG. 9 it is a block diagram of an electronic device according to a method for training a model according to an embodiment of the present application.
  • the electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers
  • electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smart phones, wearable devices, and other similar computing devices
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the applications described and/or claimed herein.
  • the electronic device includes: one or more processors 901, a memory 902, and interfaces for connecting various components, including high-speed interfaces and low-speed interfaces.
  • the various components are interconnected using different buses and can be mounted on a common motherboard or otherwise as desired.
  • the processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device such as a display device coupled to an interface.
  • multiple processors and/or multiple buses may be used with multiple memories, if desired
  • multiple electronic devices may be connected, with each device providing some of the necessary operations (eg, as a server array, a set of blade servers, or a multi-processor system).
  • a processor 901 is taken as an example.
  • the memory 902 is a non-transitory computer-readable storage medium provided in this application.
  • the memory stores instructions executable by at least one processor, so that at least one processor executes the method for training a model provided in this application.
  • the non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to cause a computer to execute the method for training a model provided in the present application.
  • the memory 902 as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the method for training the model in the embodiment of the present application ( For example, the acquisition unit 701 and the training unit 702 shown in Fig. 7).
  • the processor 901 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the method for training the model in the above method embodiments.
  • the memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the electronic device for training the model, and the like
  • the memory 902 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 902 may optionally include memories remotely located relative to the processor 901, and these remote memories may be connected through a network to the electronic device for training the model. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof
  • the electronic equipment used in the method for training a model may further include: an input device 903 and an output device 904 .
  • the processor 901, the memory 902, the input device 903, and the output device 904 may be connected through a bus or in other ways, and connection through a bus is taken as an example in FIG. 9 .
  • the input device 903 can receive input numeric or character information, and generate key signal inputs related to the user settings and function control of the electronic device for training the model; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, and other input devices
  • the output device 904 may include a display device, an auxiliary lighting device (eg, LED), a tactile feedback device (eg, a vibration motor), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device
  • the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memories, and programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer
  • other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input)
  • the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or middleware components (e.g., an application server), or front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with embodiments of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components
  • the components of the system can be interconnected by any form or medium of digital data communication, eg, a communication network. Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN) and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the training sample set is obtained, wherein the training samples in the training sample set include the target image and the feature map corresponding to the target image; the target image included in the training samples in the training sample set is used as the input data of the network, the feature map corresponding to the input target image is used as the output data of the network, and the image detection model is trained
  • the network structure of the model is constructed based on each model substructure in the various substructures, and the optimization goal of the image detection model is to learn the optimal solution of the model's network structure by sampling the structural parameters of each model substructure in the various substructures; each type of substructure is obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the model
  • the research and optimization of the image detection model structure is realized.
  • by classifying the model substructures, different levels of information are separated from each other to generate model structures with different content, realizing the construction of an image detection model based on a separated search space; more resources can be allocated to processing the required types of information according to requirements, realizing lightweight and efficient image detection, avoiding waste of resources, and improving the accuracy and efficiency of the model as a whole

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the present application are a method and apparatus for training a model. A specific implementation scheme includes: acquiring a training sample set, wherein training samples in the training sample set comprise target images and feature maps corresponding to the target images; and by using the target images comprised in the training samples in the training sample set as input data of a network, and using the feature maps corresponding to the input target images as output data of the network, performing training to obtain an image detection model, wherein a network structure of the image detection model is constructed on the basis of each model sub-structure in each type of sub-structure; an optimization objective of the image detection model is to obtain, by means of sampling a structure parameter of each model sub-structure in each type of sub-structure and performing learning, an optimal solution of the network structure of the image detection model; and each type of sub-structure is obtained by means of analyzing the feature map that is processed by each model sub-structure in a search space of the network structure of the image detection model.

Description

Method, apparatus, device, and storage medium for training a model

Cross-Reference to Related Applications

This application claims priority to the Chinese patent application No. 202110717772.7, titled "Method, apparatus, device, and storage medium for training a model", filed on June 28, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The embodiments of the present application relate to the field of computer technology, specifically to the field of image processing technology, and in particular to a method and apparatus for training a model.

Background

The development of computer vision and graphics technology has made it increasingly easy to forge digital images, resulting in a large number of malicious fake images and videos circulating on the Internet and causing serious social impact. To counter this threat, image forgery detection technology has emerged. Image forgery detection identifies forged images from real ones. Because there are many digital image forgery techniques whose underlying principles differ greatly, an image forgery detection model needs to be capable of detecting different forgery methods at the same time.

Summary

The present application provides a method, apparatus, device, and storage medium for training a model, and a method, apparatus, device, and storage medium for generating information.
本申请的一些实施例提供了一种用于训练模型的方法,该方法包括:获取训练样本集,其中,训练样本集中的训练样本包括目标图像和与目标图像对应的特征图;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,训练得到图像检测模型,其中,图像检测模型的网络结构基于各类子结构中的各个模型子结构而构建,图像检测模型的优化目标为通过对各类子结构中各个模型子结构的结构参数进行采样,学习得到图像检测模型的网络结 构的最优解,各类子结构通过对图像检测模型的网络结构搜索空间中各个模型子结构所处理的特征图进行分析而得到。Some embodiments of the present application provide a method for training a model, the method comprising: obtaining a training sample set, wherein the training samples in the training sample set include a target image and a feature map corresponding to the target image; The target image included in the training sample is used as the input data of the network, and the feature map corresponding to the input target image is used as the output data of the network, and the image detection model is obtained through training, wherein the network structure of the image detection model is based on various substructures. Each model substructure is constructed. The optimization goal of the image detection model is to learn the optimal solution of the network structure of the image detection model by sampling the structural parameters of each model substructure in each substructure. It is obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the image detection model.
在一些实施例中,各类子结构通过对图像检测模型的网络结构搜索空间中各个模型子结构所处理的特征图的特征语义层级进行划分而得到。In some embodiments, various substructures are obtained by dividing feature semantic levels of feature maps processed by each model substructure in the network structure search space of the image detection model.
在一些实施例中,图像检测模型的网络结构基于将同类子结构中的各个模型子结构进行汇总后,按照各类子结构划分后的层级进行堆叠而构建。In some embodiments, the network structure of the image detection model is constructed based on summarizing model substructures in the same substructure and stacking them according to the levels divided by various substructures.
在一些实施例中,图像检测模型的训练用于表征将图像检测模型中的各个模型子结构进行多个特征层变换,每个特征层的输入数据包括:图像检测模型的网络结构中相应模型子结构之前的至少两个模型子结构的输出数据和相应模型子结构之前的各个特征层的输出数据中的至少一方。具体地,每个特征层的输入数据可以包括图像检测模型的网络结构中相应模型子结构之前的至少两个模型子结构的输出数据;或者,每个特征层的输入数据可以包括图像检测模型的网络结构中相应模型子结构之前的各个特征层的输出数据;再或者,每个特征层的输入数据可以包括:图像检测模型的网络结构中相应模型子结构之前的至少两个模型子结构的输出数据和相应模型子结构之前的各个特征层的输出数据。In some embodiments, the training of the image detection model is used to represent that each model substructure in the image detection model is transformed into multiple feature layers, and the input data of each feature layer includes: corresponding model substructures in the network structure of the image detection model At least one of the output data of at least two model substructures preceding the structure and the output data of each feature layer preceding the corresponding model substructure. Specifically, the input data of each feature layer may include the output data of at least two model substructures before the corresponding model substructure in the network structure of the image detection model; or, the input data of each feature layer may include the output data of the image detection model The output data of each feature layer before the corresponding model substructure in the network structure; or, the input data of each feature layer may include: the output of at least two model substructures before the corresponding model substructure in the network structure of the image detection model The output data of each feature layer before the data and the corresponding model substructure.
在一些实施例中,各类子结构包括:一类子结构和二类子结构,图像检测模型包括第一子模型和第二子模型,第一子模型的网络结构基于一类子结构中的各个模型子结构而构建,第二子模型的网络结构基于二类子结构中的各个模型子结构而构建;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,训练得到图像检测模型,包括:将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,调整一类子结构中各个模型子结构的结构参数,得到训练完成的第一子模型;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,调整二类子结构中各个模型子结构的结构参数,得到训练完成的第二子模型;基于训练完成的第一子模型和训练完成的第二子模型,确定图像检测模型。In some embodiments, various substructures include: a first substructure and a second substructure, the image detection model includes a first submodel and a second submodel, and the network structure of the first submodel is based on the Each model substructure is constructed, and the network structure of the second submodel is constructed based on each model substructure in the second substructure; the target image included in the training sample in the training sample set is used as the input data of the network, and the input target The feature map corresponding to the image is used as the output data of the network, and the image detection model is obtained by training, including: using the target image included in the training samples in the training sample set as the input data of the network, and using the feature map corresponding to the input target image as the output of the network Data, adjust the structural parameters of each model substructure in a class of substructures, and obtain the first submodel after training; use the target image included in the training sample in the training sample set as the input data of the network, and use the input data corresponding to the input target image The feature map is used as the output data of the network, and the structural parameters of each model substructure in the second type of substructure are adjusted to obtain the trained second submodel; based on the trained first submodel and the trained second submodel, determine the image detection model.
在一些实施例中,图像检测模型包括第一检测子模型和第二检测子模型,第一检测子模型的模型参数为图像检测模型的训练参数,第二检测子模型的模型参数为图像检测模型的结构参数;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,训练得到图像检测模型,包括:将训练样本集中的训练样本包括的目标图像作为输入数据,将与输入的目标图像对应的特征图作为输出数据,调整训练参数,得到训练完成的第一检测子模型;将训练样本集中的训练样本包括的目标图像作为输入数据,将与输入的目标图像对应的特征图作为输出数据,调整结构参数,得到训练完成的第二检测子模型;基于训练完成的第一检测子模型和训练完成的第二检测子模型,确定图像检测模型。In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model, the model parameters of the first detection sub-model are the training parameters of the image detection model, and the model parameters of the second detection sub-model are the image detection model Structural parameters; the target image included in the training samples in the training sample set is used as the input data of the network, and the feature map corresponding to the input target image is used as the output data of the network, and the image detection model is obtained by training, including: the training sample set The target image included in the training sample is used as input data, the feature map corresponding to the input target image is used as output data, and the training parameters are adjusted to obtain the first detection sub-model that has been trained; the target image included in the training sample set in the training sample set is used as Input data, use the feature map corresponding to the input target image as output data, adjust the structural parameters, and obtain the trained second detection sub-model; based on the trained first detection sub-model and the trained second detection sub-model, Determine the image detection model.
Some embodiments of the present application provide a method for generating information. The method includes: acquiring a target image; and inputting the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, where the image detection model is trained by the method of any of the foregoing embodiments of the method for training a model.
In some embodiments, the image detection model includes a first detection sub-model and a second detection sub-model, the first detection sub-model representing that the training parameters of the image detection model are trained with a machine learning algorithm, and the second detection sub-model representing that the structural parameters of the image detection model are trained with a machine learning algorithm. Inputting the target image into the pre-trained image detection model to generate the feature map corresponding to the target image includes: inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and determining, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image.
Some embodiments of the present application provide an apparatus for training a model. The apparatus includes: an acquisition unit configured to acquire a training sample set, where the training samples in the training sample set include target images and feature maps corresponding to the target images; and a training unit configured to train an image detection model by using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, where the network structure of the image detection model is constructed based on the model substructures in the classes of substructures, the optimization objective of the image detection model is to learn an optimal solution of the network structure of the image detection model by sampling the structural parameters of the model substructures in the classes of substructures, and the classes of substructures are obtained by analyzing the feature maps processed by the model substructures in the network-structure search space of the image detection model.
In some embodiments, the classes of substructures in the training unit are obtained by dividing the feature maps processed by the model substructures in the network-structure search space of the image detection model according to the semantic levels of their features.
In some embodiments, the network structure of the image detection model in the training unit is constructed by aggregating the model substructures of each class and stacking them according to the levels into which the classes of substructures are divided.
In some embodiments, the training of the image detection model in the training unit represents that each model substructure of the image detection model performs multiple feature-layer transformations, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and the output data of each preceding feature layer. Specifically, the input data of each feature layer may include the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model; or the input data of each feature layer may include the output data of each preceding feature layer; or the input data of each feature layer may include both the output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model and the output data of each preceding feature layer.
In some embodiments, the classes of substructures in the training unit include a first-class substructure and a second-class substructure, the image detection model includes a first sub-model and a second sub-model, the network structure of the first sub-model is constructed based on the model substructures in the first-class substructure, and the network structure of the second sub-model is constructed based on the model substructures in the second-class substructure. The training unit includes: a first training module configured to use the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, and to adjust the structural parameters of the model substructures in the first-class substructure to obtain a trained first sub-model; a second training module configured to use the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, and to adjust the structural parameters of the model substructures in the second-class substructure to obtain a trained second sub-model; and a first determination module configured to determine the image detection model based on the trained first sub-model and the trained second sub-model.
In some embodiments, the image detection model in the training unit includes a first detection sub-model and a second detection sub-model; the model parameters of the first detection sub-model are the training parameters of the image detection model, and the model parameters of the second detection sub-model are the structural parameters of the image detection model. The training unit includes: a third training module configured to use the target images included in the training samples of the training sample set as input data and the feature maps corresponding to the input target images as output data, and to adjust the training parameters to obtain a trained first detection sub-model; a fourth training module configured to use the target images included in the training samples of the training sample set as input data and the feature maps corresponding to the input target images as output data, and to adjust the structural parameters to obtain a trained second detection sub-model; and a second determination module configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
Some embodiments of the present application provide an apparatus for generating information. The apparatus includes: an image acquisition unit configured to acquire a target image; and a generation unit configured to input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, where the image detection model is trained by the method of any of the foregoing embodiments of the method for training a model.
In some embodiments, the image detection model in the generation unit includes a first detection sub-model and a second detection sub-model, the first detection sub-model representing that the training parameters of the image detection model are trained with a machine learning algorithm, and the second detection sub-model representing that the structural parameters of the image detection model are trained with a machine learning algorithm. The generation unit includes: a first generation module configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; a second generation module configured to input the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and a second determination module configured to determine, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image.
Some embodiments of the present application provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in the foregoing implementations.
Some embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the method described in the foregoing implementations.
It should be understood that the content described in this section is not intended to identify key or essential features of the embodiments of the present application, nor to limit the scope of the present application. Other features of the present application will become readily understood from the following description.
Description of the Drawings
The accompanying drawings are provided for a better understanding of the solution and do not constitute a limitation of the present application.
Fig. 1 is a schematic diagram of a first embodiment of a method for training a model according to the present application;
Fig. 2 is a scenario diagram in which the method for training a model according to an embodiment of the present application may be implemented;
Fig. 3 is a schematic diagram of a second embodiment of the method for training a model according to the present application;
Fig. 4 is a schematic diagram of a model sub-module structure for implementing an embodiment of the present application;
Fig. 5A is a schematic diagram of a network model architecture of the method for training a model according to the present application;
Fig. 5B and Fig. 5C are schematic diagrams of the sampling process of a model sub-module structure in the network model architecture;
Fig. 5D is a schematic diagram of a sampling result of the network model architecture;
Fig. 6 is a schematic diagram of a first embodiment of a method for generating information according to the present application;
Fig. 7 is a schematic structural diagram of an embodiment of an apparatus for training a model according to the present application;
Fig. 8 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present application;
Fig. 9 is a block diagram of an electronic device for implementing an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings, including various details of the embodiments of the present application to facilitate understanding; these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows a schematic diagram 100 of a first embodiment of a method for training a model according to the present application. The method for training a model includes the following steps:
Step 101: acquire a training sample set.
In this embodiment, the execution subject (for example, a server or a terminal device) may acquire the training sample set from another electronic device or locally through a wired or wireless connection. The training samples in the training sample set include target images and feature maps corresponding to the target images. It should be noted that the wireless connection may include, but is not limited to, a 3G, 4G or 5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (ultra-wideband) connection, or another wireless connection now known or developed in the future.
Step 102: train an image detection model by using the target images included in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network.
In this embodiment, the execution subject may use a machine learning algorithm to train the image detection model by using the target images obtained in step 101 as input data of the network and the feature maps corresponding to the input target images as output data of the network. The network structure of the model is constructed based on the model substructures in the classes of substructures, for example by stacking the model sub-module structures of the different classes of substructures in a preset manner. The optimization objective of the image detection model may be to use the machine learning algorithm to repeatedly sample the structural parameters of the model substructures in the classes of substructures, continually sampling better structures until convergence, so as to obtain the optimal model network structure. The classes of substructures may be obtained by analyzing the feature information of the feature maps processed by the model substructures in the network-structure search space of the model and classifying the model substructures based on the analysis results; the feature information may be feature information forged in the image.
It should be noted that the execution subject may store a pre-trained image detection model whose network architecture is predefined; for example, an eight-layer basic model architecture may be defined for the search of the model sub-module structures. The execution subject may use the image detection model to predict, in the target image, the feature map corresponding to the feature information in the target image. The image detection model may be used to represent the correspondence between target objects and feature maps. The model structure of the image detection model may be constructed based on various models in the related art, such as, but not limited to, BERT, FastText and TextCNN.
It should be pointed out that the image detection model may be, for example, a data table or a calculation formula, which is not limited in this embodiment. The machine learning algorithm mentioned above is a well-known technique that is currently widely studied and applied, and is not described in detail here.
Continuing to refer to Fig. 2, the method 200 for training a model of this embodiment runs on a server 201. The server 201 first acquires a training sample set 202, where the training samples in the training sample set include target images and feature maps corresponding to the target images. The server 201 then trains an image detection model 203 by using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, where the network structure of the model is constructed based on the model substructures in the classes of substructures, the optimization objective of the image detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of the model substructures in the classes of substructures, and the classes of substructures are obtained by analyzing the feature maps processed by the model substructures in the network-structure search space of the model.
The method for training a model provided by the above embodiments of the present application acquires a training sample set whose training samples include target images and feature maps corresponding to the target images, and trains an image detection model by using the target images as input data of the network and the corresponding feature maps as output data of the network. The network structure of the model is constructed based on the model substructures in the classes of substructures; the optimization objective of the image detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of the model substructures in the classes of substructures; and the classes of substructures are obtained by analyzing the feature maps processed by the model substructures in the network-structure search space of the model. This enables the study and optimization of the structure of the image detection model: by classifying the model substructures, information at different levels is separated, model structures for different content are generated, and an image detection model based on a separated search space is constructed. More resources can be allocated, as required, to processing the required type of information, achieving lightweight image forgery detection, avoiding a waste of resources, and improving the accuracy and efficiency of the model as a whole.
With further reference to Fig. 3, a schematic diagram 300 of a second embodiment of the method for training a model is shown. The flow of the method includes the following steps:
Step 301: acquire a training sample set.
Step 302: train an image detection model by using the target images included in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network.
In this embodiment, the execution subject may use a machine learning algorithm to train the image detection model by using the target images obtained in step 301 as input data of the network and the feature maps corresponding to the input target images as output data of the network. The network structure of the model is constructed based on the model substructures in the classes of substructures. The optimization objective of the image detection model may be, during training, to repeatedly sample the structural parameters of the model substructures in the classes of substructures, continually sampling better structures until convergence, so as to obtain the optimal model network structure. The classes of substructures are obtained by dividing the feature maps processed by the model substructures in the network-structure search space of the model according to the semantic levels of their features. For example, based on the semantic features, the information corresponding to the feature maps may be divided, according to operations at different depths, into shallow/low-level information and deep information; the model substructures corresponding to the shallow information of the model are divided into shallow substructures, and the model substructures corresponding to the deep information of the model are divided into deep substructures. The shallow/low-level information may include information such as low-level pixel features, pixel distributions and frequency-domain structures, and the deep information may include information such as the structure and shape of objects.
In this embodiment, the training of the image detection model may represent that each model substructure of the image detection model performs multiple feature-layer transformations, and the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model network structure, and the output data of each preceding feature layer. For example, the model sub-module structure is defined as shown in Fig. 4: each sub-module structure receives the outputs of the two preceding sub-module structures as inputs (Input 1 and Input 2 in the figure) and performs three layers of feature transformation on these two inputs (Node 1 to Node 3 in the figure). The input of each feature layer includes the two inputs of the sub-module structure and the outputs of the preceding feature layers, and the results are summed as the output of that layer. Finally, the outputs of the feature layers (Node 1 to Node 3) are superimposed as the output of the model sub-module structure.
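The per-layer wiring described above (each node summing transformed versions of the cell's two inputs and all earlier node outputs, with the cell output being the superposition of all node outputs) can be sketched as follows. This is a simplified, hypothetical rendering of the cell of Fig. 4: scalar values stand in for feature maps and plain Python callables stand in for the convolution/pooling operations, so the names and shapes here are illustrative, not the patent's actual implementation.

```python
from typing import Callable, List

def cell_forward(input1: float, input2: float,
                 node_ops: List[List[Callable[[float], float]]]) -> float:
    """Sketch of the sub-module structure in Fig. 4.

    Each entry of node_ops is the list of operations one node applies,
    one operation per available input state (Input 1, Input 2, and the
    outputs of all earlier nodes, in order).
    """
    states = [input1, input2]          # Input 1 and Input 2
    node_outputs = []
    for ops in node_ops:               # Node 1 .. Node 3
        # apply one op to each available state and sum the results
        out = sum(op(s) for op, s in zip(ops, states))
        node_outputs.append(out)
        states.append(out)             # later nodes also read this output
    return sum(node_outputs)           # superposition of all node outputs
```

With identity operations on every edge, the data flow can be traced by hand: Node 1 sums the two inputs, Node 2 additionally sums Node 1's output, and so on, before all three node outputs are added together.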
The sampling process of the image detection model is illustrated here by way of example. First, a basic network model architecture of an eight-layer image detection model is defined. As shown in Fig. 5A, the two bottom modules (cell 1 and cell 2) are divided into shallow units based on the shallow substructure, and the upper six modules (cell 3 to cell 8) are divided into deep units based on the deep substructure. Each time, the controller samples a specific structure for the shallow unit and for the deep unit, thereby obtaining a candidate model. The test results of the candidate model are used to train the controller by reinforcement learning or gradient optimization, so that it continually samples better structures until convergence, yielding the optimal model structure. The controller may use a common structure search algorithm. During sampling, each unit searches each edge (the dashed edges shown in Fig. 5B) and selects one operation type for it, yielding a sub-network of that unit (as shown in Fig. 5C). The final sampling result is shown in Fig. 5D, where the left diagram is the sampling result of the deep unit and the right diagram is the sampling result of the shallow unit; max_pool denotes max pooling, dil_conv_5x5 denotes a dilated convolution with a kernel size of 5, dil_conv_3x3 denotes a dilated convolution with a kernel size of 3, identity denotes an identity mapping, and sep_conv_3x3 denotes a depth-wise separable convolution with a kernel size of 3.
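As a concrete illustration of one controller step, the following sketch samples an operation type for every searchable edge of the shallow unit and of the deep unit, producing one candidate model. The candidate-operation list is taken from the operation names of Fig. 5D; the edge count per unit is an assumption made for illustration, and a real controller would be a learned policy (trained by reinforcement learning or gradient optimization) rather than this uniform random sampler.

```python
import random

# operation types appearing in the sampled results of Fig. 5D
CANDIDATE_OPS = [
    "max_pool", "dil_conv_5x5", "dil_conv_3x3", "identity", "sep_conv_3x3",
]

def sample_unit(num_edges: int, rng: random.Random) -> list:
    """Pick one operation type per searchable (dashed) edge of a unit."""
    return [rng.choice(CANDIDATE_OPS) for _ in range(num_edges)]

def sample_candidate_model(num_edges: int = 6, seed: int = 0) -> dict:
    """One controller step: sample a sub-network for the shallow unit
    (cells 1-2) and the deep unit (cells 3-8) independently."""
    rng = random.Random(seed)
    return {
        "shallow_unit": sample_unit(num_edges, rng),
        "deep_unit": sample_unit(num_edges, rng),
    }
```

Sampling the two unit types independently is what lets the search allocate different operation mixes to low-level and high-level processing.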
In some optional implementations of this embodiment, the network structure of the model is constructed by aggregating the model substructures of each class and stacking them according to the levels into which the classes of substructures are divided. This separates the shallow construction of the model from its deep construction, so that the model can focus more on processing low-level information.
In some optional implementations of this embodiment, the classes of substructures include a first-class substructure and a second-class substructure, and the image detection model includes a first sub-model and a second sub-model; the network structure of the first sub-model is constructed based on the model substructures in the first-class substructure, and the network structure of the second sub-model is constructed based on the model substructures in the second-class substructure. Training the image detection model by using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network includes: using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, and adjusting the structural parameters of the model substructures in the first-class substructure to obtain a trained first sub-model; using the target images included in the training samples of the training sample set as input data of the network and the feature maps corresponding to the input target images as output data of the network, and adjusting the structural parameters of the model substructures in the second-class substructure to obtain a trained second sub-model; and determining the image detection model based on the trained first sub-model and the trained second sub-model. By splitting the model into multiple classes of sub-module structures, searching each class for its optimal substructure, and then stacking the optimal substructures to build the complete model, the efficiency of model training is improved.
In some optional implementations of this embodiment, the image detection model includes a first detection sub-model and a second detection sub-model; the model parameters of the first detection sub-model are the training parameters of the model, and the model parameters of the second detection sub-model are the structural parameters of the model. Training the image detection model by using the target images included in the training samples of the training sample set as the input data of the network and the feature maps corresponding to the input target images as the output data of the network includes: using the target images included in the training samples as input data and the feature maps corresponding to the input target images as output data, and adjusting the training parameters to obtain a trained first detection sub-model; using the target images included in the training samples as input data and the corresponding feature maps as output data, and adjusting the structural parameters to obtain a trained second detection sub-model; and determining the image detection model based on the trained first detection sub-model and the trained second detection sub-model. The image detection model may be built on a neural network.
The parameters of a neural network model include training parameters and network structure parameters. The training parameters are the parameters obtained through training other than the model structure parameters, such as the learning rate (step size), the batch size (number of data samples), and the weight decay; the network structure parameters are the parameters that define the network structure of the image detection model, such as the number of layers of the network, the operator of each layer, and the filter size of the convolutions. By training the model's training parameters and structure parameters simultaneously, the accuracy and efficiency of model training are improved, making the model more widely applicable.
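The alternating adjustment of training parameters and structural parameters can be illustrated with a toy sketch; the quadratic objective and learning rate below are purely illustrative stand-ins for the real training loss over target images and feature maps:

```python
def train_image_detection_model(w, a, lr=0.1, steps=50):
    # Toy quadratic objective standing in for the real training loss
    # (target image in, feature map out): minimized at w = 2.0, a = 1.0.
    for _ in range(steps):
        # First detection sub-model pass: adjust the training parameters (w)
        # with the structural parameters held fixed.
        w -= lr * 2.0 * (w - 2.0)
        # Second detection sub-model pass: adjust the structural parameters (a)
        # with the training parameters held fixed.
        a -= lr * 2.0 * (a - 1.0)
    return w, a

w, a = train_image_detection_model(0.0, 0.0)
print(round(w, 3), round(a, 3))  # → 2.0 1.0
```

The alternation lets each sub-model's parameters converge against the same input/output pairs, which is what allows the weights and the architecture to be optimized in a single training run.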
In this embodiment, the specific operation of step 301 is basically the same as the operation of step 101 in the embodiment shown in FIG. 1 and is not repeated here.
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 1, in the schematic diagram 300 of the method for training a model in this embodiment, the various types of substructures are obtained by dividing the network structure search space of the model according to the feature semantic level of the feature map processed by each model substructure, and the training of the model represents transforming each model substructure in the model through multiple feature layers, where the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model network structure, and the output data of each feature layer preceding the corresponding model substructure. This solves the problem that the standard deep learning image classification models used in the related art rely on high-level semantic features, whereas image forgery detection relies on low-level semantic features, so that the requirements of the two conflict. It also solves the problem that the design of current deep-learning image classification models tends toward deeper model structures, which wastes computing resources in image forgery detection applications. By dividing the model substructures, operations at different depths are separated from one another, different constructions are used, and different levels of information are processed in different ways, thereby realizing the construction of an image detection model based on a depth-separated search space.
On the depth-separated search space, a lightweight image forgery detection model is realized.
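A minimal sketch of the feature-layer connectivity described above, assuming each feature layer combines its running input with skip connections from the two most recent preceding substructures (plain numbers stand in for feature maps, and the "+1" is a toy placeholder for a real layer transform):

```python
def forward(x, num_substructures=3, layers_per_sub=2):
    sub_outputs = []                         # outputs of completed substructures
    h = x
    for _ in range(num_substructures):
        for _ in range(layers_per_sub):
            skips = sum(sub_outputs[-2:])    # up to two preceding substructures
            h = h + skips + 1                # toy feature-layer transform
        sub_outputs.append(h)
    return h

print(forward(0))  # → 30
```

The skip connections give deeper feature layers direct access to earlier, lower-level information, which matches the goal of handling low-level and high-level semantics with different constructions.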
Further referring to FIG. 6, a schematic diagram 600 of a first embodiment of a method for generating information according to the present application is shown. The method for generating information includes the following steps:
Step 601: acquire a target image.
In this embodiment, the execution subject (for example, a server or a terminal device) may acquire the target image from another electronic device, or locally, through a wired or wireless connection.
Step 602: input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image.
In this embodiment, the execution subject may input the target image acquired in step 601 into the pre-trained image detection model to generate a feature map corresponding to the feature information in the target image. The image detection model is trained by the method of any one of the embodiments of the method for training a model described above.
In some optional implementations of this embodiment, the image detection model includes a first detection sub-model and a second detection sub-model; the first detection sub-model represents training the training parameters of the model by a machine learning algorithm, and the second detection sub-model represents training the structural parameters of the model by a machine learning algorithm. Inputting the target image into the pre-trained image detection model to generate the feature map corresponding to the target image includes: inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and, based on the first feature map and the second feature map, determining the feature map corresponding to both as the feature map corresponding to the target image. Using two sub-modules to extract features separately improves the processing efficiency and accuracy of the system, and adding a sub-model based on the model structure parameters on top of the original convolutional neural network model makes model improvement more flexible and convenient.
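The two-branch inference can be sketched as follows; the branch transforms and the element-wise averaging used to fuse the two feature maps are illustrative assumptions — the embodiment only requires determining a feature map corresponding to both:

```python
def first_sub_model(image):
    # Stand-in for the branch trained on the model's training parameters.
    return [p * 0.5 for p in image]

def second_sub_model(image):
    # Stand-in for the branch trained on the model's structural parameters.
    return [p + 1.0 for p in image]

def detect(image):
    f1 = first_sub_model(image)
    f2 = second_sub_model(image)
    # Fuse the two feature maps element-wise (averaging chosen for the sketch).
    return [(a + b) / 2 for a, b in zip(f1, f2)]

print(detect([2.0, 4.0]))  # → [2.0, 3.5]
```

Concatenation or learned weighting would be equally valid fusion choices; the key point is that both branches see the same target image and contribute to the final feature map.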
As can be seen from FIG. 6, compared with the embodiment corresponding to FIG. 1, the flow 600 of the method for generating information in this embodiment highlights the step of using the trained image detection model to generate the feature map corresponding to the feature information in the target image. Accordingly, the solution described in this embodiment can realize targeted feature extraction across different types, levels, and depths of features.
Further referring to FIG. 7, as an implementation of the methods shown in FIGS. 1 to 3 above, the present application provides an embodiment of an apparatus for training a model. This apparatus embodiment corresponds to the method embodiment shown in FIG. 1. In addition to the features described below, the apparatus embodiment may further include features identical or corresponding to those of the method embodiment shown in FIG. 1, and produce identical or corresponding effects. The apparatus may be applied to various electronic devices.
As shown in FIG. 7, the apparatus 700 for training a model in this embodiment includes an acquisition unit 701 and a training unit 702. The acquisition unit is configured to acquire a training sample set, where the training samples in the training sample set include target images and feature maps corresponding to the target images. The training unit is configured to use the target images included in the training samples of the training sample set as the input data of a network and the feature maps corresponding to the input target images as the output data of the network, and to train to obtain an image detection model, where the network structure of the model is built from the model substructures of the various types of substructures, the optimization objective of the image detection model is to learn the optimal solution of the model network structure by sampling the structural parameters of each model substructure in the various types of substructures, and the various types of substructures are obtained by analyzing the feature maps processed by each model substructure in the network structure search space of the model.
In this embodiment, for the specific processing of the acquisition unit 701 and the training unit 702 of the apparatus 700 for training a model, and the technical effects they bring, reference may be made to the descriptions of step 101 and step 102 in the embodiment corresponding to FIG. 1, which are not repeated here.
In some optional implementations of this embodiment, the various types of substructures in the training unit are obtained by dividing the network structure search space of the model according to the feature semantic level of the feature map processed by each model substructure.
In some optional implementations of this embodiment, the network structure of the model in the training unit is built by aggregating the model substructures of each type and stacking them according to the levels into which the various types of substructures are divided.
In some optional implementations of this embodiment, the training of the image detection model in the training unit represents transforming each model substructure in the model through multiple feature layers, where the input data of each feature layer includes at least one of: the output data of at least two model substructures preceding the corresponding model substructure in the model network structure, and the output data of each feature layer preceding the corresponding model substructure.
In some optional implementations of this embodiment, the various types of substructures in the training unit include first-type substructures and second-type substructures, and the image detection model includes a first sub-model and a second sub-model; the network structure of the first sub-model is built from the model substructures of the first type, and the network structure of the second sub-model is built from the model substructures of the second type. The training unit includes: a first training module configured to use the target images included in the training samples of the training sample set as the input data of the network and the feature maps corresponding to the input target images as the output data of the network, and to adjust the structural parameters of each first-type model substructure to obtain a trained first sub-model; a second training module configured to use the target images included in the training samples as the input data of the network and the corresponding feature maps as the output data of the network, and to adjust the structural parameters of each second-type model substructure to obtain a trained second sub-model; and a first determining module configured to determine the image detection model based on the trained first sub-model and the trained second sub-model.
In some optional implementations of this embodiment, the image detection model in the training unit includes a first detection sub-model and a second detection sub-model; the model parameters of the first detection sub-model are the training parameters of the model, and the model parameters of the second detection sub-model are the structural parameters of the model. The training unit includes: a third training module configured to use the target images included in the training samples of the training sample set as input data and the feature maps corresponding to the input target images as output data, and to adjust the training parameters to obtain a trained first detection sub-model; a fourth training module configured to use the target images included in the training samples as input data and the corresponding feature maps as output data, and to adjust the structural parameters to obtain a trained second detection sub-model; and a second determining module configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
The above embodiments of the present disclosure provide an apparatus for training a model. A training sample set is acquired by a first acquisition unit, where the training samples in the training sample set include head images, feature information of the head objects in the head images, and feature maps corresponding to the feature information of the head objects. The training unit then uses a machine learning algorithm, taking the head images included in the training samples as input data, and the feature information of the head objects corresponding to the input head images and the feature maps corresponding to that feature information as expected output data, to train a feature extraction model. The feature extraction model is built on a convolutional neural network, and the parameters of the convolutional neural network model include scale parameters and other convolution kernel parameters: the scale parameters are the scale structures of the head objects set using scale-space theory, and the other convolution kernel parameters are the parameters of the convolution kernels in the convolutional neural network other than the scale parameters. This enriches the ways in which the model can be trained and helps realize multi-scale feature extraction based on the trained model.
Continuing to refer to FIG. 8, as an implementation of the method shown in FIG. 6 above, the present application provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in FIG. 6. In addition to the features described below, the apparatus embodiment may further include features identical or corresponding to those of the method embodiment shown in FIG. 6, and produce identical or corresponding effects. The apparatus may be applied to various electronic devices.
As shown in FIG. 8, the apparatus 800 for generating information in this embodiment includes an image acquisition unit 801 and a generation unit 802. The image acquisition unit is configured to acquire a target image; the generation unit is configured to input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, where the image detection model is trained by the method of any one of the embodiments of the method for training a model described above.
In this embodiment, for the specific processing of the image acquisition unit 801 and the generation unit 802 of the apparatus 800 for generating information, and the technical effects they bring, reference may be made to the descriptions of step 601 and step 602 in the embodiment corresponding to FIG. 6, which are not repeated here.
In some optional implementations of this embodiment, the image detection model in the generation unit includes a first detection sub-model and a second detection sub-model; the first detection sub-model represents training the training parameters of the model by a machine learning algorithm, and the second detection sub-model represents training the structural parameters of the model by a machine learning algorithm. The generation unit includes: a first generation module configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image; a second generation module configured to input the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and a second determining module configured to determine, based on the first feature map and the second feature map, the feature map corresponding to both as the feature map corresponding to the target image.
According to embodiments of the present application, the present application further provides an electronic device and a readable storage medium.
As shown in FIG. 9, it is a block diagram of an electronic device for the method for training a model according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementations of the application described and/or claimed herein.
As shown in FIG. 9, the electronic device includes one or more processors 901, a memory 902, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or mounted in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, if required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). In FIG. 9, one processor 901 is taken as an example.
The memory 902 is the non-transitory computer-readable storage medium provided in the present application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the method for training a model provided in the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method for training a model provided in the present application.
The memory 902, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for training a model in the embodiments of the present application (for example, the acquisition unit 701 and the training unit 702 shown in FIG. 7). By running the non-transitory software programs, instructions, and modules stored in the memory 902, the processor 901 executes the various functional applications and data processing of the server, that is, implements the method for training a model in the above method embodiments.
The memory 902 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the electronic device for training a model, and the like. In addition, the memory 902 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 902 may optionally include memories remotely located relative to the processor 901, and these remote memories may be connected through a network to the electronic device for training a model. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method for training a model may further include an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 9.
The input device 903 may receive input numeric or character information and generate key signal inputs related to the user settings and function control of the electronic device for training a model, and may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or another input device. The output device 904 may include a display device, an auxiliary lighting device (for example, an LED), a tactile feedback device (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described herein may be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (for example, a magnetic disk, an optical disc, a memory, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein may be implemented in a computing system including a back-end component (for example, as a data server), or a computing system including a middleware component (for example, an application server), or a computing system including a front-end component (for example, a user computer having a graphical user interface or a web browser through which the user can interact with an implementation of the systems and techniques described herein), or a computing system including any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other.
根据本申请实施例的技术方案采用获取训练样本集,其中,训练样本集中的训练样本包括目标图像和与目标图像对应的特征图;将训练样本集中的训练样本包括的目标图像作为网络的输入数据,将与输入的目标图像对应的特征图作为网络的输出数据,训练得到图像检测模型,其中,模型的网络结构基于各类子结构中的各个模型子结构而构建,图像检测模型的优化目标为通过对各类子结构中各个模型子结构的结构参数进行采样,学习得到模型网络结构的最优解,各类子结构通过对模型的网络结构搜索空 间中各个模型子结构所处理的特征图进行分析而得到,实现了针对图像检测模型结构的研究和优化,通过对模型子结构进行分类,将不同层次信息相互分离,生成不同内容的模型结构,实现了一种基于分离搜索空间的图像检测模型的构建。根据需求将更多资源分配到对所需类型信息的处理上,实现轻量高效地图像检测,避免了资源的浪费,从整体上提高了模型的精度和效率。According to the technical solution of the embodiment of the present application, the training sample set is obtained, wherein the training samples in the training sample set include the target image and the feature map corresponding to the target image; the target image included in the training sample set in the training sample set is used as the input data of the network , the feature map corresponding to the input target image is used as the output data of the network, and the image detection model is trained. The network structure of the model is constructed based on each model substructure in various substructures, and the optimization goal of the image detection model is By sampling the structural parameters of each model substructure in various substructures, the optimal solution of the model network structure is learned, and each type of substructure is processed by the feature map processed by each model substructure in the network structure search space of the model. Based on the analysis, the research and optimization of the image detection model structure is realized. By classifying the model substructure, different levels of information are separated from each other to generate a model structure with different content, and an image detection model based on a separated search space is realized. build. Allocate more resources to the processing of required types of information according to requirements, realize lightweight and efficient image detection, avoid waste of resources, and improve the accuracy and efficiency of the model as a whole.
应该理解,可以使用上面所示的各种形式的流程,重新排序、增加或删除步骤。例如,本申请中记载的各步骤可以并行地执行也可以顺序地执行也可以不同的次序执行,只要能够实现本申请公开的技术方案所期望的结果,本文在此不进行限制。It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in this application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in this application can be achieved, no limitation is imposed herein.
The above specific implementations do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (18)

  1. A method for training a model, the method comprising:
    acquiring a training sample set, wherein training samples in the training sample set comprise a target image and a feature map corresponding to the target image; and
    training an image detection model by using the target images comprised in the training samples of the training sample set as input data of a network and using the feature maps corresponding to the input target images as output data of the network, wherein a network structure of the image detection model is constructed based on respective model substructures in respective classes of substructures, an optimization objective of the image detection model is to learn an optimal solution of the network structure of the image detection model by sampling structural parameters of the respective model substructures in the respective classes of substructures, and the respective classes of substructures are obtained by analyzing the feature maps processed by the respective model substructures in a network structure search space of the image detection model.
  2. The method according to claim 1, wherein the respective classes of substructures are obtained by dividing, by feature semantic level, the feature maps processed by the respective model substructures in the network structure search space of the image detection model.
  3. The method according to claim 2, wherein the network structure of the image detection model is constructed by aggregating the model substructures within each class of substructures and then stacking the aggregated classes according to the levels into which the respective classes of substructures are divided.
  4. The method according to claim 1, wherein the training of the image detection model characterizes performing a plurality of feature-layer transformations on the respective model substructures in the image detection model, and input data of each feature layer comprises at least one of: output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and output data of respective feature layers preceding the corresponding model substructure.
  5. The method according to claim 1, wherein the respective classes of substructures comprise a first class of substructures and a second class of substructures, the image detection model comprises a first sub-model and a second sub-model, a network structure of the first sub-model is constructed based on the respective model substructures in the first class of substructures, and a network structure of the second sub-model is constructed based on the respective model substructures in the second class of substructures; and
    the training an image detection model by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network comprises:
    adjusting structural parameters of the respective model substructures in the first class of substructures, by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network, to obtain the trained first sub-model;
    adjusting structural parameters of the respective model substructures in the second class of substructures, by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network, to obtain the trained second sub-model; and
    determining the image detection model based on the trained first sub-model and the trained second sub-model.
  6. The method according to claim 1, wherein the image detection model comprises a first detection sub-model and a second detection sub-model, model parameters of the first detection sub-model are training parameters of the image detection model, and model parameters of the second detection sub-model are structural parameters of the image detection model; and
    the training an image detection model by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network comprises:
    adjusting the training parameters, by using the target images comprised in the training samples of the training sample set as input data and using the feature maps corresponding to the input target images as output data, to obtain the trained first detection sub-model;
    adjusting the structural parameters, by using the target images comprised in the training samples of the training sample set as input data and using the feature maps corresponding to the input target images as output data, to obtain the trained second detection sub-model; and
    determining the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
  7. A method for generating information, the method comprising:
    acquiring a target image; and
    inputting the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is trained by the method according to any one of claims 1-6.
  8. The method according to claim 7, wherein the image detection model comprises a first detection sub-model and a second detection sub-model, the first detection sub-model characterizes training the training parameters of the image detection model by using a machine learning algorithm, and the second detection sub-model characterizes training the structural parameters of the image detection model by using a machine learning algorithm; and
    the inputting the target image into a pre-trained image detection model to generate a feature map corresponding to the target image comprises:
    inputting the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image;
    inputting the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and
    determining, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image.
  9. An apparatus for training a model, the apparatus comprising:
    an acquisition unit, configured to acquire a training sample set, wherein training samples in the training sample set comprise a target image and a feature map corresponding to the target image; and
    a training unit, configured to train an image detection model by using the target images comprised in the training samples of the training sample set as input data of a network and using the feature maps corresponding to the input target images as output data of the network, wherein a network structure of the image detection model is constructed based on respective model substructures in respective classes of substructures, an optimization objective of the image detection model is to learn an optimal solution of the network structure of the image detection model by sampling structural parameters of the respective model substructures in the respective classes of substructures, and the respective classes of substructures are obtained by analyzing the feature maps processed by the respective model substructures in a network structure search space of the image detection model.
  10. The apparatus according to claim 9, wherein the respective classes of substructures in the training unit are obtained by dividing, by feature semantic level, the feature maps processed by the respective model substructures in the network structure search space of the image detection model.
  11. The apparatus according to claim 10, wherein the network structure of the image detection model in the training unit is constructed by aggregating the model substructures within each class of substructures and then stacking the aggregated classes according to the levels into which the respective classes of substructures are divided.
  12. The apparatus according to claim 9, wherein the training of the image detection model in the training unit characterizes performing a plurality of feature-layer transformations on the respective model substructures in the image detection model, and input data of each feature layer comprises at least one of: output data of at least two model substructures preceding the corresponding model substructure in the network structure of the image detection model, and output data of respective feature layers preceding the corresponding model substructure.
  13. The apparatus according to claim 9, wherein the respective classes of substructures in the training unit comprise a first class of substructures and a second class of substructures, the image detection model comprises a first sub-model and a second sub-model, a network structure of the first sub-model is constructed based on the respective model substructures in the first class of substructures, and a network structure of the second sub-model is constructed based on the respective model substructures in the second class of substructures; and
    the training unit comprises:
    a first training module, configured to adjust structural parameters of the respective model substructures in the first class of substructures, by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network, to obtain the trained first sub-model;
    a second training module, configured to adjust structural parameters of the respective model substructures in the second class of substructures, by using the target images comprised in the training samples of the training sample set as input data of the network and using the feature maps corresponding to the input target images as output data of the network, to obtain the trained second sub-model; and
    a first determination module, configured to determine the image detection model based on the trained first sub-model and the trained second sub-model.
  14. The apparatus according to claim 9, wherein the image detection model in the training unit comprises a first detection sub-model and a second detection sub-model, model parameters of the first detection sub-model are training parameters of the image detection model, and model parameters of the second detection sub-model are structural parameters of the image detection model; and
    the training unit comprises:
    a third training module, configured to adjust the training parameters, by using the target images comprised in the training samples of the training sample set as input data and using the feature maps corresponding to the input target images as output data, to obtain the trained first detection sub-model;
    a fourth training module, configured to adjust the structural parameters, by using the target images comprised in the training samples of the training sample set as input data and using the feature maps corresponding to the input target images as output data, to obtain the trained second detection sub-model; and
    a second determination module, configured to determine the image detection model based on the trained first detection sub-model and the trained second detection sub-model.
  15. An apparatus for generating information, the apparatus comprising:
    an image acquisition unit, configured to acquire a target image; and
    a generation unit, configured to input the target image into a pre-trained image detection model to generate a feature map corresponding to the target image, wherein the image detection model is trained by the method according to any one of claims 1-6.
  16. The apparatus according to claim 15, wherein the image detection model in the generation unit comprises a first detection sub-model and a second detection sub-model, the first detection sub-model characterizes training the training parameters of the image detection model by using a machine learning algorithm, and the second detection sub-model characterizes training the structural parameters of the image detection model by using a machine learning algorithm; and
    the generation unit comprises:
    a first generation module, configured to input the target image into the pre-trained first detection sub-model to generate a first feature map corresponding to the target image;
    a second generation module, configured to input the target image into the pre-trained second detection sub-model to generate a second feature map corresponding to the target image; and
    a second determination module, configured to determine, based on the first feature map and the second feature map, a feature map corresponding to the first feature map and the second feature map as the feature map corresponding to the target image.
  17. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-6 or claims 7-8.
  18. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-6 or claims 7-8.
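The alternating adjustment of training parameters and structural parameters recited in claims 6 and 14 resembles differentiable architecture search, where architecture mixing coefficients are optimized alongside ordinary weights. The toy sketch below alternates the two update steps on a linear stand-in model with synthetic data in place of images and feature maps; all shapes, learning rates, and the softmax-mixture formulation are illustrative assumptions, not the patent's specific method.

```python
# Toy sketch of alternating optimization: step 1 adjusts the training
# parameters (weights) with the structure fixed; step 2 adjusts the
# structural parameters (mixing coefficients over candidate
# substructures) with the weights fixed.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))                 # stand-in for target images
y = x @ np.array([1.0, -2.0, 0.5, 3.0])      # stand-in for feature maps

w = 0.1 * rng.normal(size=(2, 4))            # weights of 2 candidate substructures
alpha = np.zeros(2)                          # structural parameters, one per candidate

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def predict(x, w, alpha):
    # Mix the candidate substructures' outputs by softmax(alpha).
    probs = softmax(alpha)
    return sum(p * (x @ wi) for p, wi in zip(probs, w))

baseline = float(np.mean((predict(x, w, alpha) - y) ** 2))

for _ in range(500):
    # Step 1: gradient step on the training parameters.
    probs = softmax(alpha)
    err = predict(x, w, alpha) - y
    for i in range(2):
        w[i] -= 0.05 * probs[i] * (x.T @ err) / len(x)
    # Step 2: gradient step on the structural parameters.
    err = predict(x, w, alpha) - y
    outs = np.stack([x @ wi for wi in w])    # each candidate's output
    g = outs @ err / len(x)                  # dLoss/dprobs (up to scale)
    p = softmax(alpha)
    alpha -= 0.1 * p * (g - p @ g)           # chain rule through softmax

mse = float(np.mean((predict(x, w, alpha) - y) ** 2))
```

Splitting the two parameter groups into separate update steps, as the claims do, keeps the structure search from chasing half-trained weights and mirrors the claim's "first detection sub-model / second detection sub-model" decomposition.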
PCT/CN2022/095186 2021-06-28 2022-05-26 Method and apparatus for training model, and device, and storage medium WO2023273720A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110717772.7A CN115618218A (en) 2021-06-28 2021-06-28 Method, apparatus, device and storage medium for training a model
CN202110717772.7 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023273720A1

Family

ID=84692497

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095186 WO2023273720A1 (en) 2021-06-28 2022-05-26 Method and apparatus for training model, and device, and storage medium

Country Status (2)

Country Link
CN (1) CN115618218A (en)
WO (1) WO2023273720A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169573A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using composite machine learning model come the method and system of perform prediction
CN107169574A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using nested machine learning model come the method and system of perform prediction
CN107273979A (en) * 2017-06-08 2017-10-20 第四范式(北京)技术有限公司 The method and system of machine learning prediction are performed based on service class
CN111695052A (en) * 2020-06-12 2020-09-22 上海智臻智能网络科技股份有限公司 Label classification method, data processing device and readable storage medium
CN112200169A (en) * 2020-12-07 2021-01-08 北京沃东天骏信息技术有限公司 Method, apparatus, device and storage medium for training a model


Also Published As

Publication number Publication date
CN115618218A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
Li et al. Deepsaliency: Multi-task deep neural network model for salient object detection
US20220383535A1 (en) Object Tracking Method and Device, Electronic Device, and Computer-Readable Storage Medium
CN111860479B (en) Optical character recognition method, device, electronic equipment and storage medium
US20210303921A1 (en) Cross-modality processing method and apparatus, and computer storage medium
CN113657465B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN110705460B (en) Image category identification method and device
Zhao et al. Hi-Fi: Hierarchical feature integration for skeleton detection
WO2018005594A1 (en) Eye gaze tracking using neural networks
Nguyen et al. Yolo based real-time human detection for smart video surveillance at the edge
US11893708B2 (en) Image processing method and apparatus, device, and storage medium
KR102551835B1 (en) Active interaction method, device, electronic equipment and readable storage medium
EP3852011A2 (en) Method and apparatus for determining target anchor, device and storage medium
CN114677565B (en) Training method and image processing method and device for feature extraction network
WO2022213857A1 (en) Action recognition method and apparatus
US10438088B2 (en) Visual-saliency driven scene description
KR20220047228A (en) Method and apparatus for generating image classification model, electronic device, storage medium, computer program, roadside device and cloud control platform
Lu et al. An improved target detection method based on multiscale features fusion
Zhang et al. R2net: Residual refinement network for salient object detection
WO2023273720A1 (en) Method and apparatus for training model, and device, and storage medium
CN111368800A (en) Gesture recognition method and device
CN116052288A (en) Living body detection model training method, living body detection device and electronic equipment
Rubin Bose et al. In-situ identification and recognition of multi-hand gestures using optimized deep residual network
CN111862030B (en) Face synthetic image detection method and device, electronic equipment and storage medium
CN113204665A (en) Image retrieval method, image retrieval device, electronic equipment and computer-readable storage medium
Quan et al. Object detection model based on deep dilated convolutional networks by fusing transfer learning

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE