CN115601550B - Model determination method, model determination device, computer equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115601550B
CN115601550B
Authority
CN
China
Prior art keywords
width value
target
semantic segmentation
convolutional network
network layer
Prior art date
Legal status
Active
Application number
CN202211595457.2A
Other languages
Chinese (zh)
Other versions
CN115601550A (en)
Inventor
杨帅
颜泽鑫
刘枢
吕江波
沈小勇
Current Assignee
Shenzhen Smartmore Technology Co Ltd
Original Assignee
Shenzhen Smartmore Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Smartmore Technology Co Ltd
Priority to CN202211595457.2A
Publication of CN115601550A
Application granted
Publication of CN115601550B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The application relates to a model determination method, a model determination device, computer equipment and a computer readable storage medium. The method comprises the following steps: selecting a target convolutional network layer to be optimized from a plurality of convolutional network layers included in the initial semantic segmentation model; selecting at least one target width value corresponding to the target convolutional network layer from the width value set corresponding to the target convolutional network layer; generating at least one candidate width value combination based on each target width value corresponding to each target convolutional network layer; updating the width value of each target convolutional network layer in the initial semantic segmentation model to a target width value in a candidate width value combination to obtain a semantic segmentation model to be trained corresponding to the candidate width value combination; training each semantic segmentation model to be trained to obtain a trained semantic segmentation model corresponding to each candidate width value combination; and determining a target semantic segmentation model from the trained semantic segmentation models. By adopting the method, the efficiency of obtaining the required semantic segmentation model can be improved.

Description

Model determination method, model determination device, computer equipment and computer readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a model determination method, apparatus, computer device, and computer-readable storage medium.
Background
With the development of computer vision technology, deep neural networks are widely applied in the field of image recognition; semantic segmentation models are one example. These models involve a large number of parameters and complex structures and place high demands on hardware storage and computational overhead, so a semantic segmentation model needs to be processed to obtain a semantic segmentation model that meets the requirements, namely a target semantic segmentation model.
In conventional approaches, the target semantic segmentation model is usually obtained by network clipping or network pruning. However, these methods require many rounds of complex decomposition and clipping, so the process of obtaining the target semantic segmentation model is complicated and its efficiency is low.
Disclosure of Invention
The application provides a model determination method, a model determination device, computer equipment and a computer readable storage medium, which can improve the efficiency of obtaining a required semantic segmentation model.
In a first aspect, the present application provides a model determining method, including:
selecting at least one target convolutional network layer to be optimized from a plurality of convolutional network layers included in the initial semantic segmentation model;
for each target convolutional network layer, selecting at least one target width value corresponding to the target convolutional network layer from the width value set corresponding to that layer; the width value is the number of convolution kernels of the target convolutional network layer;
generating at least one candidate width value combination based on each target width value corresponding to each target convolutional network layer; the candidate width value combination comprises a target width value corresponding to each target convolutional network layer;
for each candidate width value combination, updating the width value of each target convolutional network layer in the initial semantic segmentation model to a corresponding target width value in the candidate width value combination to obtain a semantic segmentation model to be trained corresponding to the candidate width value combination;
training the semantic segmentation model to be trained corresponding to each candidate width value combination by using the sample image to obtain a trained semantic segmentation model corresponding to each candidate width value combination;
and determining a target semantic segmentation model from the trained semantic segmentation models.
In a second aspect, the present application further provides a model determination apparatus, including:
the first selection module is used for selecting at least one target convolutional network layer to be optimized from a plurality of convolutional network layers included in the initial semantic segmentation model;
the second selection module is used for selecting, for each target convolutional network layer, at least one target width value corresponding to the target convolutional network layer from the width value set corresponding to that layer; the width value is the number of convolution kernels of the target convolutional network layer;
the generating module is used for generating at least one candidate width value combination based on each target width value corresponding to each target convolutional network layer; the candidate width value combination comprises a target width value corresponding to each target convolutional network layer;
the updating module is used for updating, for each candidate width value combination, the width value of each target convolutional network layer in the initial semantic segmentation model to the corresponding target width value in the candidate width value combination, to obtain a to-be-trained semantic segmentation model corresponding to the candidate width value combination;
the training module is used for training the semantic segmentation model to be trained corresponding to each candidate width value combination by using the sample image to obtain a trained semantic segmentation model corresponding to each candidate width value combination;
and the determining module is used for determining a target semantic segmentation model from the trained semantic segmentation models.
In a third aspect, the present application further provides a computer device, where the computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps in the model determination method when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the above-described model determination method.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the above-described model determination method.
According to the model determination method, the model determination device, the computer device, the computer-readable storage medium, and the computer program product, at least one target convolutional network layer of the initial semantic segmentation model is selected, and the target width value corresponding to each target convolutional network layer is determined from the width value set corresponding to that layer. At least one candidate width value combination generated from these target width values is then used to train at least one semantic segmentation model, and the target semantic segmentation model is determined from the trained semantic segmentation models. The process of obtaining the target semantic segmentation model is therefore simple, which improves the efficiency of obtaining the required semantic segmentation model.
Drawings
Fig. 1 is an application environment diagram of a model determining method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a first model determining method according to an embodiment of the present application;
fig. 3A is a schematic flowchart of a second model determination method according to an embodiment of the present application;
FIG. 3B is a schematic diagram of a confusion matrix of a validation data set according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a third model determination method provided in the embodiment of the present application;
fig. 5 is a block diagram of a structure of a model determining apparatus according to an embodiment of the present application;
fig. 6 is an internal structural diagram of a first computer device according to an embodiment of the present application;
fig. 7 is an internal structural diagram of a second computer device according to an embodiment of the present application;
FIG. 8 is a diagram illustrating an internal structure of a third computer device according to an embodiment of the present application;
fig. 9 is an internal structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and are not intended to limit the present application.
The model determination method provided by the embodiments of the present application can be applied in the application environment shown in fig. 1, in which the computer device 102 communicates with the server 104 over a communication network. The data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on the cloud or another network server.
Specifically, the server 104 selects at least one target convolutional network layer to be optimized from the plurality of convolutional network layers included in the initial semantic segmentation model; for each target convolutional network layer, it selects at least one target width value corresponding to that layer from the corresponding width value set; it then generates at least one candidate width value combination based on the target width values of the target convolutional network layers, where each candidate width value combination includes one target width value for each target convolutional network layer. For each candidate width value combination, the server 104 updates the width value of each target convolutional network layer in the initial semantic segmentation model to the corresponding target width value in that combination to obtain a semantic segmentation model to be trained; it trains the semantic segmentation model to be trained corresponding to each candidate width value combination using the sample image, obtaining a trained semantic segmentation model for each combination; and it determines the target semantic segmentation model from the trained semantic segmentation models. The computer device 102 may then use the target semantic segmentation model determined by the server 104 to determine the position of a target object belonging to a target object class in an image, i.e., it inputs a target image into the target semantic segmentation model for semantic segmentation processing to obtain the target object position corresponding to the target object class in the target image.
The computer device 102 may be, but is not limited to, a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, a smart watch, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In some embodiments, as shown in fig. 2, a model determination method is provided. The method may be executed by a computer device, by a server, or by both together; it is described here as applied to the server 104 in fig. 1 and includes the following steps:
step 202, selecting at least one target convolutional network layer to be optimized from a plurality of convolutional network layers included in the initial semantic segmentation model.
The semantic segmentation model is a neural network model used for image processing and may also be called a segmentation network. The initial semantic segmentation model is a convolutional neural network model whose model parameters have been initialized, and it may include a plurality of convolutional network layers, where a plurality means at least two. For example, the initial semantic segmentation model may be ResNet101 (a residual neural network), which includes 101 convolutional network layers. Each convolutional network layer may be defined by the following parameters: the convolution kernel width k_w, the convolution kernel height k_h, the number of input channels c_in, and the number of output channels c_out; the number of parameters of each convolutional network layer can then be expressed as k_w × k_h × c_in × c_out + c_out. Semantic segmentation models correspond to different segmentation task types; a segmentation task is a computer vision task that assigns a category label to each pixel in an image, and the segmentation task type includes at least one of road vehicle segmentation, pedestrian segmentation, indoor scene segmentation, or industrial defect segmentation.
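By way of illustration, the following minimal Python sketch counts the parameters of every convolutional layer of a model using the formula above; the PyTorch/torchvision usage and the choice of ResNet101 as the example backbone are illustrative assumptions, not part of the claimed method.

```python
# Illustrative sketch: per-layer parameter count k_w * k_h * c_in * c_out + c_out,
# evaluated for the convolutional layers of an example backbone.
import torch.nn as nn
from torchvision import models

def conv_param_count(conv: nn.Conv2d) -> int:
    # weight parameters plus one bias per output channel (the bias may be absent),
    # assuming a standard (non-grouped) convolution
    k_h, k_w = conv.kernel_size
    bias = conv.out_channels if conv.bias is not None else 0
    return k_w * k_h * conv.in_channels * conv.out_channels + bias

model = models.resnet101()  # example backbone mentioned in the description
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        print(name, conv_param_count(module))
```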
The convolutional network layer is a network layer including at least one convolutional layer, and may include, for example, a plurality of convolutional layers, a plurality being at least two.
Specifically, the server acquires an initial semantic segmentation model and then selects at least one target convolutional network layer to be optimized from the plurality of convolutional network layers included in the initial semantic segmentation model. The server may use all of the convolutional network layers in the initial semantic segmentation model as the at least one target convolutional network layer to be optimized, or only a subset of them. For example, if the initial semantic segmentation model has 101 convolutional network layers in total, the server may select the 1st to 100th convolutional network layers as the at least one target convolutional network layer to be optimized.
In some embodiments, the server may select the target convolutional network layers from the plurality of convolutional network layers included in the initial semantic segmentation model according to their parameter counts. Specifically, the server may arrange the convolutional network layers of the initial semantic segmentation model in descending order of parameter count to obtain a convolutional network layer sequence and select a specified number of convolutional network layers from the front of the sequence as the at least one target convolutional network layer. The server may also simply determine the convolutional network layer with the largest number of parameters as the target convolutional network layer.
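A minimal sketch of this selection strategy is given below, assuming a PyTorch model; the function name and the top_k parameter are illustrative assumptions.

```python
# Rank the convolutional layers by parameter count (descending) and take the
# largest `top_k` layers as the target convolutional network layers to optimize.
import torch.nn as nn

def select_target_layers(model: nn.Module, top_k: int):
    convs = [(name, m) for name, m in model.named_modules() if isinstance(m, nn.Conv2d)]
    convs.sort(key=lambda item: sum(p.numel() for p in item[1].parameters()), reverse=True)
    return [name for name, _ in convs[:top_k]]
```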
Step 204, for each target convolutional network layer, selecting at least one target width value corresponding to the target convolutional network layer from the width value set corresponding to that layer; the width value is the number of convolution kernels of the target convolutional network layer.
Each target convolutional network layer corresponds to its own width value set. A width value set includes one or more width values, where a plurality refers to at least two. A width value in the width value set corresponding to the target convolutional network layer refers to the number of output channels of the target convolutional network layer, that is, the number of convolution kernels of the last convolutional layer of the target convolutional network layer. The target width value may also be referred to as a usable width value. The target width values corresponding to different target convolutional network layers may be the same or different.
Specifically, the server determines a search space of the initial semantic segmentation model according to the model parameters of the initial semantic segmentation model, wherein the search space comprises width value sets respectively corresponding to the target convolutional network layers. For example, if the server selects the neural network model a as the initial semantic segmentation model, the search space may be defined to include a variable range of model parameters of the neural network model a, where the model parameters include a width value of the target convolutional network layer.
In some embodiments, the server obtains the width value set of each target convolutional network layer, searches the width value set corresponding to each target convolutional network layer, and determines at least one target width value corresponding to that layer. For example, as shown in fig. 3A, assuming that I target convolutional network layers to be optimized are selected from the plurality of convolutional network layers included in the initial semantic segmentation model, the server may obtain the width value set corresponding to the i-th of the I target convolutional network layers, where the width value set includes N width values, search within that width value set, and determine the n-th width value as a target width value corresponding to the target convolutional network layer, where i = 1, ..., I and n = 1, ..., N.
In some embodiments, for each target convolutional network layer, the server may select a specified number of width values from the set of width values in order from small to large as the target width values of the target convolutional network layer. The specified number may be set as needed, and is, for example, any one of 2 or 3.
Step 206, generating at least one candidate width value combination based on each target width value corresponding to each target convolutional network layer; the combination of candidate width values includes one target width value corresponding to each target convolutional network layer.
Each candidate width value combination is different: at least one target width value in each candidate width value combination differs from the target width values in the other candidate width value combinations. For example, suppose there are 3 target convolutional network layers in total: the 1st target convolutional network layer has 2 target width values, 8 and 16; the 2nd target convolutional network layer has 2 target width values, 16 and 24; and the 3rd target convolutional network layer has 1 target width value, 32. The candidate width value combinations are then [8,16,32], [8,24,32], [16,16,32] and [16,24,32].
Specifically, the server may combine the target width values of the target convolutional network layers to generate the candidate width value combinations. Taking 2 target convolutional network layers as an example, namely a first target convolutional network layer and a second target convolutional network layer, the server may select one target width value from the target width values of the first target convolutional network layer, select one target width value from the target width values of the second target convolutional network layer, and combine the 2 selected target width values to obtain a candidate width value combination.
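A minimal sketch of the combination step, using the example values above; the dictionary layout and layer names are illustrative assumptions.

```python
# The Cartesian product of the per-layer target width values yields every
# candidate width value combination.
from itertools import product

target_widths = {"layer1": [8, 16], "layer2": [16, 24], "layer3": [32]}
layer_names = list(target_widths)
candidate_combinations = [
    dict(zip(layer_names, combo))
    for combo in product(*(target_widths[n] for n in layer_names))
]
# -> combinations corresponding to [8,16,32], [8,24,32], [16,16,32], [16,24,32]
```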
Step 208, for each candidate width value combination, updating the width value of each target convolutional network layer in the initial semantic segmentation model to the corresponding target width value in the candidate width value combination to obtain the semantic segmentation model to be trained corresponding to the candidate width value combination.
For example, the server selects 2 target convolutional network layers, which are a first target convolutional network layer and a second target convolutional network layer, respectively, and if the target width value corresponding to the first target convolutional network layer in the candidate width value combination is 16 and the target width value corresponding to the second target convolutional network layer is 24, the width value of the first target convolutional network layer in the initial semantic segmentation model is updated to 16, and the width value of the second target convolutional network layer in the initial semantic segmentation model is updated to 24.
For example, assuming that the server generates M candidate width value combinations based on the target width values of the target convolutional network layers, then for the m-th of the M candidate width value combinations, the server updates the width value of each target convolutional network layer in the initial semantic segmentation model to the corresponding target width value in the m-th candidate width value combination to obtain the semantic segmentation model to be trained corresponding to the m-th candidate width value combination. Proceeding in this way, the server obtains the semantic segmentation models to be trained corresponding to the M candidate width value combinations respectively, where M is a positive integer greater than or equal to 1 and m = 1, ..., M.
In some embodiments, for each target convolutional network layer, after updating the width value of the target convolutional network layer in the initial semantic segmentation model to the target width value corresponding to that layer in the candidate width value combination, the server may determine the convolutional network layer to which the output data of the target convolutional network layer is input, i.e., the backward convolutional network layer of the target convolutional network layer; for example, the backward convolutional network layer may be found with a depth-first search algorithm. The server then updates the number of input channels of the backward convolutional network layer to the target width value corresponding to the target convolutional network layer, so that the channel counts of adjacent layers remain matched. This reduces errors in the semantic segmentation model and improves its operational accuracy.
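The following simplified PyTorch sketch, an assumption for illustration rather than the patent's implementation, updates the width of one target convolutional layer and the input-channel count of the layer that consumes its output so that the channel counts stay matched. In a real network, normalization layers that follow the target layer would also need their feature counts updated, and the backward layer would be located by traversing the computation graph rather than assumed to be the next module.

```python
# Rebuild a Conv2d with new channel counts (weights are re-initialised), then apply
# the new width to a target layer and to the layer that consumes its output.
import torch.nn as nn

def resize_conv(conv: nn.Conv2d, in_channels=None, out_channels=None) -> nn.Conv2d:
    return nn.Conv2d(
        in_channels or conv.in_channels,
        out_channels or conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )

def update_width(model: nn.Sequential, target_idx: int, next_idx: int, width: int):
    # set the target layer's width (its number of convolution kernels) ...
    model[target_idx] = resize_conv(model[target_idx], out_channels=width)
    # ... and keep the backward (consuming) layer's input channels consistent with it
    model[next_idx] = resize_conv(model[next_idx], in_channels=width)

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 128, 3, padding=1))
update_width(model, target_idx=0, next_idx=2, width=48)
```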
Step 210, training the semantic segmentation model to be trained corresponding to each candidate width value combination by using the sample image to obtain a trained semantic segmentation model corresponding to each candidate width value combination.
The trained semantic segmentation model is obtained by training the semantic segmentation model to be trained with a training data set, and it is used to determine the position of a target object belonging to a target object class from an image. The training data set includes a sample image and a label image corresponding to the sample image. The sample image is a visual image; for example, it may be a 3-channel RGB image containing all of the digital information of the image. The label image includes the real category label of each pixel in the sample image, so the label image is also referred to as the semantic label of the sample image. The category labels include an object category label and a non-object category label: when the real category label of a pixel is the object category label, the pixel is at the position of the target object; when the real category label of a pixel is the non-object category label, the pixel is not at the position of the target object. The target object refers to an object belonging to the target object category, and the target object category includes but is not limited to at least one of a person, a cat, a table, and the like.
Specifically, the server obtains the semantic segmentation model to be trained based on the candidate width value combination, inputs a sample image from the training data set into the semantic segmentation model to be trained for prediction, and outputs a predicted image. The server then calculates the error between the label image and the predicted image and adjusts the model parameters of the semantic segmentation model to be trained according to this error to obtain the trained semantic segmentation model. The training data set may be pre-stored on the server, and the model parameters include the parameters of the convolutional network layers, normalization layers, and fully connected layers in the semantic segmentation model. The predicted image includes a predicted category label for each pixel of the sample image, and each predicted category label is either the object category label or the non-object category label. It should be noted that when the model parameters of the semantic segmentation model to be trained are adjusted, the width value of the target convolutional network layer is not adjusted.
In some embodiments, the server may obtain a model loss value from the error between the label image and the predicted image, adjust the model parameters of the semantic segmentation model in the direction that reduces the model loss value until the model converges, and determine the converged semantic segmentation model as the trained semantic segmentation model. Model convergence includes the model loss value being smaller than a loss value threshold, which can be set as needed.
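A hedged training sketch is shown below: the model parameters are adjusted in the direction that reduces the loss until the loss falls below a threshold. The loss function, optimiser, learning rate, and threshold are illustrative assumptions rather than values taken from the description.

```python
# Train a semantic segmentation model to be trained until the loss value drops
# below a threshold (a simple stand-in for the model-convergence condition).
import torch
import torch.nn as nn

def train_until_converged(model, dataloader, loss_threshold=0.05, max_epochs=100, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()  # per-pixel classification loss
    for _ in range(max_epochs):
        for sample_image, label_image in dataloader:
            optimizer.zero_grad()
            predicted = model(sample_image)           # (B, num_classes, H, W)
            loss = criterion(predicted, label_image)  # label_image: (B, H, W) class indices
            loss.backward()
            optimizer.step()
            if loss.item() < loss_threshold:          # convergence condition
                return model
    return model
```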
Step 212, a target semantic segmentation model is determined from each trained semantic segmentation model.
The target semantic segmentation model may be a semantic segmentation model that satisfies a target requirement, for example, the target requirement may be a lightweight semantic segmentation model.
Specifically, the server calculates model performances corresponding to the trained semantic segmentation models, and determines a target semantic segmentation model from the trained semantic segmentation models based on the model performances corresponding to the trained semantic segmentation models, so that image processing can be performed based on the target semantic segmentation model. Wherein the model performance may be used to characterize a processing performance of the semantic segmentation model, including at least one of a processing speed performance or a processing precision performance. Processing speed performance refers to the speed at which the semantic segmentation model processes an image. The processing precision performance refers to the accuracy of the semantic segmentation model for processing the image.
In the model determination method, at least one target convolutional network layer of the initial semantic segmentation model is selected and the target width value corresponding to each target convolutional network layer is determined from the width value set corresponding to that layer. At least one candidate width value combination generated from these target width values is then used to train at least one semantic segmentation model, and the target semantic segmentation model is determined from the trained semantic segmentation models. The process of obtaining the target semantic segmentation model is therefore simple, which improves the efficiency of obtaining the required semantic segmentation model.
In some embodiments, step 204 comprises:
searching a candidate width value from a width value set corresponding to the target convolutional network layer;
updating the width value of the target convolutional network layer in the initial semantic segmentation model into a candidate width value to obtain a candidate semantic segmentation model;
training the candidate semantic segmentation model to obtain a trained candidate semantic segmentation model;
and under the condition that the model performance of the trained candidate semantic segmentation model meets the preset performance requirement, taking the candidate width value as a target width value corresponding to the target convolutional network layer.
The candidate width value is a width value obtained by searching the width value set corresponding to the target convolutional network layer. The preset performance requirement is a performance requirement set in advance; for example, it may require that the time for the semantic segmentation model to process an image be less than a preset processing time, where the preset processing time may be a processing time threshold set empirically.
Specifically, the server may search a candidate width value from a set of width values corresponding to the target convolutional network layer based on a width value search strategy from small to large, for example, when the set of width values is [32, 40, 48, 56, 64], the server selects 32 from the set of width values as the candidate width value of the target convolutional network layer, calculates the model performance of the trained candidate semantic segmentation model corresponding to the width value of the target convolutional network layer of 32, and in a case that the model performance of the trained candidate semantic segmentation model meets a preset performance requirement, the server may use 32 as the target width value corresponding to the target convolutional network layer. The width value search strategy is a search strategy for searching candidate width values from a width value set corresponding to a target convolutional network layer.
In some embodiments, the server searches for a candidate width value from a width value set corresponding to a target convolutional network layer, then updates the width value of the target convolutional network layer in the initial semantic segmentation model to the candidate width value, and keeps the width values of other convolutional network layers and other model parameters in the initial semantic segmentation model unchanged to obtain a candidate semantic segmentation model, and then trains the candidate semantic segmentation model by using a training sample set to obtain the trained candidate semantic segmentation model. The server calculates the model performance of the trained candidate semantic segmentation model, and takes the candidate width value as a target width value corresponding to the target convolutional network layer under the condition that the model performance of the trained candidate semantic segmentation model meets the preset performance requirement; and under the condition that the model performance of the trained candidate semantic segmentation model does not meet the preset performance requirement, the server returns to the step of searching the candidate width value from the width value set corresponding to the target convolutional network layer.
In some embodiments, in a case that the model performance of all trained candidate semantic segmentation models does not meet the preset performance requirement, the server may select a trained candidate semantic segmentation model with the optimal model performance from all trained candidate semantic segmentation models, and use a candidate width value corresponding to the candidate semantic segmentation model as a target width value corresponding to the target convolutional network layer.
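The search and fallback behaviour described above can be sketched as follows; the helper functions (build_candidate, train, evaluate, meets_requirement) are assumed placeholders for the steps described in this embodiment, not part of the patent's text.

```python
# Small-to-large width search for one target convolutional network layer: return
# the first width whose trained candidate model meets the preset performance
# requirement, otherwise fall back to the best-performing width.
def search_target_width(width_value_set, build_candidate, train, evaluate, meets_requirement):
    best_width, best_score = None, float("-inf")
    for width in sorted(width_value_set):
        trained = train(build_candidate(width))  # trained candidate semantic segmentation model
        score = evaluate(trained)                # e.g. average intersection-over-union
        if meets_requirement(score):
            return width
        if score > best_score:
            best_width, best_score = width, score
    return best_width
```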
In some embodiments, the server may represent the model performance of the candidate semantic segmentation model by the GPU (Graphics Processing Unit) average processing time, which is the time the semantic segmentation model needs on average to process one image. When the GPU average processing time is less than the preset processing time, the server takes the candidate width value as a target width value corresponding to the target convolutional network layer. The server inputs a plurality of original images into the trained candidate semantic segmentation model for prediction, records the GPU processing time for each original image, and averages these processing times to obtain the GPU average processing time. The GPU average processing time is negatively correlated with the model performance of the semantic segmentation model: the smaller the GPU average processing time, the better the model performance. The original images do not belong to the training data set, the preset processing time may be a processing time threshold set empirically, and a negative correlation means that, with other conditions unchanged, the two variables change in opposite directions.
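A minimal timing sketch for the GPU average processing time is shown below, assuming a PyTorch model and images that are already on the GPU; synchronisation before and after each forward pass makes the wall-clock measurement cover the GPU work.

```python
# Average per-image processing time of a model on the GPU over a set of original images.
import time
import torch

@torch.no_grad()
def gpu_average_processing_time(model, images):
    model.eval()
    times = []
    for image in images:                       # one original image at a time
        torch.cuda.synchronize()
        start = time.perf_counter()
        model(image.unsqueeze(0))              # forward pass / prediction
        torch.cuda.synchronize()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)             # average seconds per image
```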
In some embodiments, the server may represent the model performance of the candidate semantic segmentation model by the average intersection-over-union, which is the mean, over all classes, of the ratio of intersection to union between the predicted pixels and the true pixels of each class. When the average intersection-over-union is not smaller than a preset intersection-over-union threshold, the server takes the candidate width value as a target width value corresponding to the target convolutional network layer. The server inputs a validation data set, which includes sample images and corresponding label images, into the trained candidate semantic segmentation model for prediction and outputs predicted images; it then obtains the confusion matrix of the validation data set from the label images and predicted images, and calculates the average intersection-over-union of the validation data set from the confusion matrix. For example, assuming the confusion matrix of the validation data set is as shown in FIG. 3B, the average intersection-over-union of the validation data set can be calculated with the formula: MIoU = [ TP/(TP + FP + FN) + TN/(TN + FN + FP) ] / 2, where MIoU denotes the average intersection-over-union of the validation data set, TP is the number of pixels predicted as positive whose label is positive, FN is the number of pixels predicted as negative whose label is positive, FP is the number of pixels predicted as positive whose label is negative, and TN is the number of pixels predicted as negative whose label is negative.
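For the two-class case above, the average intersection-over-union can be computed directly from per-pixel predictions as in the sketch below (an illustrative NumPy implementation, not the patent's code).

```python
# MIoU = [ TP/(TP+FP+FN) + TN/(TN+FN+FP) ] / 2 over all pixels of the validation set.
import numpy as np

def binary_miou(predicted, label):
    """predicted, label: integer arrays with 1 for positive pixels and 0 for negative pixels."""
    tp = np.sum((predicted == 1) & (label == 1))
    fp = np.sum((predicted == 1) & (label == 0))
    fn = np.sum((predicted == 0) & (label == 1))
    tn = np.sum((predicted == 0) & (label == 0))
    iou_positive = tp / (tp + fp + fn)
    iou_negative = tn / (tn + fn + fp)
    return (iou_positive + iou_negative) / 2
```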
In this embodiment, when the trained candidate semantic segmentation model meets the preset performance requirement, the candidate width value is used as a target width value corresponding to the target convolutional network layer, so that the chosen target width values improve the model performance of the semantic segmentation model. In addition, in industrial application scenarios with strict requirements on processing speed, the target semantic segmentation model obtained by this method can achieve a faster processing speed; for example, the processing time can be reduced from 40 milliseconds to 5 milliseconds.
In some embodiments, step 204 further comprises:
and under the condition that the model performance of the trained candidate semantic segmentation model does not meet the preset performance requirement, returning to the step of searching the candidate width value from the width value set corresponding to the target convolutional network layer.
Specifically, the server may further select at least one search algorithm from random search, evolutionary algorithm search, structural parameter gradient descent search, and the like, as a width value search strategy, and search for a candidate width value from a set of width values corresponding to the target convolutional network layer. And under the condition that the model performance of the trained candidate semantic segmentation model does not meet the preset performance requirement, the server returns to the step of searching candidate width values from the width value set corresponding to the target convolutional network layer, and continuously selects the candidate width values from the width value set.
In this embodiment, when the model performance does not meet the preset performance requirement, the method returns to the step of searching for a candidate width value from the width value set corresponding to the target convolutional network layer until the model performance of the trained candidate semantic segmentation model meets the preset performance requirement, thereby improving the model performance of the target semantic segmentation model.
In some embodiments, prior to step 204, the model determination method further comprises:
obtaining an initial width value of a target convolutional network layer in an initial semantic segmentation model;
determining a width value set corresponding to the target convolutional network layer based on the initial width value of the target convolutional network layer; the width value in the set of width values is less than or equal to the initial width value.
Wherein the initial width value is a width value of a target convolutional network layer in the initial semantic segmentation model.
For example, when the initial width value of the target convolutional network layer is 64, the set of width values corresponding to the target convolutional network layer may be [32, 40, 48, 56, 64].
In this embodiment, because every width value in the width value set is less than or equal to the initial width value, determining the width value set from the initial width value of the target convolutional network layer means that the target width value selected from the set is no larger than the initial width value. Since a smaller width value means fewer model parameters, this reduces the parameter count and complexity of the target semantic segmentation model, and the resulting target semantic segmentation model is a lightweight model. The model determination method can therefore be used to determine a lightweight semantic segmentation model.
In some embodiments, determining the set of width values corresponding to the target convolutional network layer based on the initial width value of the target convolutional network layer comprises:
carrying out reduction processing on the initial width value of the target convolutional network layer to obtain a first width value;
and selecting a plurality of width values from the range of the width values from the first width value to the initial width value to form a width value set corresponding to the target convolutional network layer.
The first width value is obtained by reducing the initial width value of the target convolutional network layer; for example, if the initial width value of the target convolutional network layer is 32, the first width value may be 16.
Specifically, the server may reduce the initial width value of the target convolutional network layer to obtain the first width value; for example, one half of the initial width value may be taken as the first width value. Then, from the range of width values between the first width value and the initial width value, the positive integers that are multiples of 8 are selected and combined to obtain the width value set corresponding to the target convolutional network layer. For example, when the initial width value of the target convolutional network layer is 64, the first width value is 32, and from the range of width values from 32 to 64 the values 32, 40, 48, 56, and 64 are selected and combined, so the width value set corresponding to the target convolutional network layer is [32, 40, 48, 56, 64].
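A one-function sketch of this construction, following the halving rule and the multiples of 8 described above; the function name is illustrative.

```python
# Halve the initial width, then keep the multiples of 8 up to and including the
# initial width to form the width value set.
def build_width_value_set(initial_width: int):
    first_width = initial_width // 2  # reduction processing: one half of the initial width
    return [w for w in range(first_width, initial_width + 1) if w % 8 == 0]

print(build_width_value_set(64))  # -> [32, 40, 48, 56, 64]
```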
In this embodiment, because the first width value is obtained by reducing the initial width value, selecting a plurality of width values from the range between the first width value and the initial width value to form the width value set allows a target width value smaller than the initial width value to be selected from the set, which reduces the parameter count and model complexity of the target semantic segmentation model.
In some embodiments, step 210 comprises:
for the semantic segmentation model to be trained corresponding to each candidate width value combination, inputting the sample image into the semantic segmentation model to be trained for semantic segmentation processing to obtain a predicted object position corresponding to a target object category in the sample image;
and adjusting model parameters of the semantic segmentation model to be trained based on the predicted object position and the real object position corresponding to the target object class in the sample image to obtain a trained semantic segmentation model corresponding to the candidate width value combination.
The sample image is an image used for training a semantic segmentation model to be trained. The target object category is an object category of image recognition, including but not limited to at least one of a person, a cat, a table, or the like. The predicted object position is a position corresponding to a target object type obtained by inputting the sample image into the semantic segmentation network, the real object position is a real position of the target object type in the label image corresponding to the sample image, and the real object position can be pre-marked on the sample image to obtain the label image.
Specifically, for the semantic segmentation model to be trained corresponding to each candidate width value combination, the server inputs a sample image from the training data set into the model for semantic segmentation processing and outputs a predicted image, from which the predicted object position corresponding to the target object class in the sample image is obtained. The server then computes the error between the predicted object position and the real object position corresponding to the target object class in the label image of the sample image, and adjusts the model parameters of the semantic segmentation model to be trained based on this error to obtain the trained semantic segmentation model corresponding to the candidate width value combination.
In this embodiment, the trained semantic segmentation model is obtained by training the semantic segmentation model to be trained corresponding to each candidate width value combination, so that the target image can be subjected to semantic segmentation processing by using the target semantic segmentation model determined from the trained semantic segmentation model.
In some embodiments, the model determination method further comprises:
acquiring a target image to be processed;
and inputting the target image into a target semantic segmentation model for semantic segmentation processing to obtain a target object position corresponding to a target object category in the target image.
Wherein the target object position is an object position of the target object in the target image.
Specifically, after the target semantic segmentation model is determined, the server may obtain a target image to be processed and input the target image into the target semantic segmentation model for semantic segmentation processing to obtain the target object position corresponding to the target object category in the target image. Taking a person as the target object, to determine the position of the person in the target image, the server inputs the target image into the target semantic segmentation model for semantic segmentation processing to obtain a predicted image that includes the position of the person.
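A minimal usage sketch of this inference step is given below; the class index for the target object category and the absence of preprocessing are illustrative assumptions.

```python
# Run the target semantic segmentation model on a target image and read off the
# pixels assigned to the target object class.
import torch

@torch.no_grad()
def locate_target_object(target_model, target_image, target_class_id=1):
    target_model.eval()
    logits = target_model(target_image.unsqueeze(0))        # (1, num_classes, H, W)
    predicted_labels = logits.argmax(dim=1).squeeze(0)      # (H, W) class label per pixel
    return (predicted_labels == target_class_id).nonzero()  # pixel coordinates of the target object
```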
In some embodiments, step 212 comprises:
determining the model performance corresponding to each trained semantic segmentation model;
and determining a target semantic segmentation model from the trained semantic segmentation models based on the model performance corresponding to each trained semantic segmentation model.
Wherein the model performance may be used to characterize a processing performance of the semantic segmentation model, including at least one of a processing speed performance or a processing precision performance. Processing speed performance refers to the speed at which the semantic segmentation model processes an image. The processing precision performance refers to the accuracy of the semantic segmentation model for processing the image.
Specifically, the server calculates the model performance of each trained semantic segmentation model and determines the trained semantic segmentation model with the best model performance as the target semantic segmentation model. For example, the server may determine the trained semantic segmentation model with the shortest GPU average processing time, or the one with the largest average intersection-over-union, as the target semantic segmentation model. The model performance can be calculated in the same way as for the trained candidate semantic segmentation models and is not described again here.
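A small selection sketch under the assumption that an evaluate helper returns the chosen performance metric for a trained model:

```python
# Pick the trained semantic segmentation model with the best measured performance,
# e.g. the largest average intersection-over-union (set higher_is_better=False for
# metrics such as GPU average processing time, where smaller is better).
def select_target_model(trained_models, evaluate, higher_is_better=True):
    scored = [(evaluate(m), m) for m in trained_models]
    scored.sort(key=lambda pair: pair[0], reverse=higher_is_better)
    return scored[0][1]
```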
In this embodiment, the target semantic segmentation model is determined from each trained semantic segmentation model based on the model performance corresponding to each trained semantic segmentation model, so that the model performance of the target semantic segmentation model is improved.
In some embodiments, as shown in fig. 4, a model determination method is provided, which is described by taking the method as an example for being applied to a server, and includes the following steps:
Step 402, obtaining an initial semantic segmentation model, and selecting at least one target convolutional network layer to be optimized from a plurality of convolutional network layers included in the initial semantic segmentation model.
Step 404, determining a width value set corresponding to each target convolutional network layer based on the initial width value of each target convolutional network layer.
Step 406, based on the width value search strategy, searching candidate width values from the width value set corresponding to the target convolutional network layer.
Step 408, updating the width value of the target convolutional network layer in the initial semantic segmentation model to the candidate width value to obtain a candidate semantic segmentation model to be trained.
Step 410, training the candidate semantic segmentation model to be trained to obtain a trained candidate semantic segmentation model, and calculating the model performance of the trained candidate semantic segmentation model.
Step 412, determining whether the model performance of the trained candidate semantic segmentation model meets the preset performance requirement; if not, returning to step 406; if so, proceeding to step 414.
Step 414, taking the candidate width value corresponding to the trained candidate semantic segmentation model as the target width value corresponding to the target convolutional network layer.
Step 416, generating at least one candidate width value combination based on the target width values corresponding to the target convolutional network layers.
Step 418, updating the width value of each target convolutional network layer in the initial semantic segmentation model to a corresponding target width value in the candidate width value combination to obtain a semantic segmentation model to be trained corresponding to the candidate width value combination.
Step 420, training the semantic segmentation model to be trained corresponding to each candidate width value combination by using the sample image to obtain a trained semantic segmentation model corresponding to each candidate width value combination.
Step 422, determine the target semantic segmentation model from each trained semantic segmentation model.
In this embodiment, at least one target convolutional network layer of the initial semantic segmentation model and the width value set corresponding to each target convolutional network layer are determined based on the segmentation task type of the initial semantic segmentation model. A target width value corresponding to each target convolutional network layer is determined from its width value set, at least one candidate width value combination is generated from these target width values, a semantic segmentation model is trained for each candidate width value combination to obtain the corresponding trained semantic segmentation model, and the target semantic segmentation model is determined from the trained semantic segmentation models based on model performance. Because the implementation process is simple, the method is applicable to most segmentation network models.
It should be understood that, although the steps in the flowcharts related to the embodiments as described above are sequentially shown as indicated by arrows, the steps are not necessarily performed sequentially as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least a part of the steps in the flowcharts according to the embodiments described above may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of performing the steps or stages is not necessarily sequential, but may be performed alternately or alternately with other steps or at least a part of the steps or stages in other steps.
Based on the same inventive concept, an embodiment of the present application further provides a model determination apparatus. The implementation scheme by which the apparatus solves the problem is similar to that described for the method above; therefore, for the specific limitations of the model determination apparatus embodiments below, reference may be made to the limitations of the model determination method above, which are not repeated here.
In some embodiments, as shown in fig. 5, there is provided a model determination apparatus including:
a first selecting module 502, configured to select at least one target convolutional network layer to be optimized from a plurality of convolutional network layers included in the initial semantic segmentation model;
a second selecting module 504, configured to select, for each target convolutional network layer, at least one target width value corresponding to the target convolutional network layer from the set of width values corresponding to the target convolutional network layer; the width value is the number of convolution kernels of the target convolutional network layer;
a generating module 506, configured to generate at least one candidate width value combination based on each target width value corresponding to each target convolutional network layer; the candidate width value combination comprises a target width value corresponding to each target convolutional network layer;
an updating module 508, configured to update, for each candidate width value combination, the width value of each target convolutional network layer in the initial semantic segmentation model to a corresponding target width value in the candidate width value combination, so as to obtain a to-be-trained semantic segmentation model corresponding to the candidate width value combination;
a training module 510, configured to train, by using a sample image, a to-be-trained semantic segmentation model corresponding to each candidate width value combination, to obtain a trained semantic segmentation model corresponding to each candidate width value combination;
a determining module 512, configured to determine a target semantic segmentation model from the trained semantic segmentation models.
In some embodiments, in selecting at least one target width value corresponding to the target convolutional network layer from the set of width values corresponding to the target convolutional network layer, the second selecting module 504 is specifically configured to:
searching a candidate width value from a width value set corresponding to the target convolutional network layer;
updating the width value of the target convolutional network layer in the initial semantic segmentation model into a candidate width value to obtain a candidate semantic segmentation model;
training the candidate semantic segmentation model to obtain a trained candidate semantic segmentation model;
and under the condition that the model performance of the trained candidate semantic segmentation model meets the preset performance requirement, taking the candidate width value as a target width value corresponding to the target convolutional network layer.
In some embodiments, the second selecting module 504 is further specifically configured to:
and under the condition that the model performance of the trained candidate semantic segmentation model does not meet the preset performance requirement, returning to the step of searching the candidate width value from the width value set corresponding to the target convolutional network layer.
In some embodiments, before selecting at least one target width value corresponding to the target convolutional network layer from the set of width values corresponding to the target convolutional network layer, the model determining apparatus further includes a set determining module, where the set determining module is specifically configured to:
obtaining an initial width value of a target convolutional network layer in an initial semantic segmentation model;
determining a width value set corresponding to the target convolutional network layer based on the initial width value of the target convolutional network layer; a width value in the set of width values is less than or equal to the initial width value.
In some embodiments, in determining the set of width values corresponding to the target convolutional network layer based on the initial width value of the target convolutional network layer, the set determination module is specifically configured to:
carrying out reduction processing on the initial width value of the target convolutional network layer to obtain a first width value;
and selecting a plurality of width values from the range of the width values from the first width value to the initial width value to form a width value set corresponding to the target convolutional network layer.
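For illustration only, the reduction processing and width value selection described above might be sketched as follows; the reduction ratio and the number of selected widths are assumed values, not values given in the application.

```python
# Hypothetical sketch of set determination: reduce the initial width value to a first
# width value, then select several widths between the first width value and the initial
# width value. reduction_ratio and num_values are illustrative assumptions.
def make_width_value_set(initial_width: int, reduction_ratio: float = 0.25, num_values: int = 4) -> list:
    first_width = max(1, int(initial_width * reduction_ratio))   # reduction processing
    if num_values < 2 or first_width >= initial_width:
        return [initial_width]
    step = (initial_width - first_width) / (num_values - 1)
    widths = {int(round(first_width + i * step)) for i in range(num_values)}
    return sorted(w for w in widths if w <= initial_width)        # every width <= initial width


# Example: a layer whose initial width value is 64 convolution kernels
# might get the width value set [16, 32, 48, 64].
print(make_width_value_set(64))
```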
In some embodiments, in the aspect of training, by using the sample image, the to-be-trained semantic segmentation model corresponding to each candidate width value combination to obtain a trained semantic segmentation model corresponding to each candidate width value combination, the training module 510 is specifically configured to:
for the to-be-trained semantic segmentation model corresponding to each candidate width value combination, inputting the sample image into the to-be-trained semantic segmentation model for semantic segmentation processing to obtain a predicted object position corresponding to a target object category in the sample image;
and adjusting model parameters of the semantic segmentation model to be trained based on the predicted object position and the real object position corresponding to the target object class in the sample image to obtain a trained semantic segmentation model corresponding to the candidate width value combination.
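For illustration only, a single training iteration of this kind might be sketched in PyTorch-style Python as follows; representing the real object positions as a per-pixel label mask and using a cross-entropy loss with an externally supplied optimizer are assumptions, not choices stated in the application.

```python
# Hypothetical sketch of one parameter update for a to-be-trained segmentation model.
import torch
import torch.nn.functional as F


def train_step(model: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               sample_images: torch.Tensor,   # (N, C, H, W) batch of sample images
               label_masks: torch.Tensor      # (N, H, W) per-pixel target object categories
               ) -> float:
    model.train()
    logits = model(sample_images)               # predicted object positions per category
    loss = F.cross_entropy(logits, label_masks) # compare predicted vs. real object positions
    optimizer.zero_grad()
    loss.backward()                             # adjust the model parameters
    optimizer.step()
    return loss.item()
```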
In some embodiments, the model determining apparatus further comprises an image processing module, the image processing module is specifically configured to:
acquiring a target image to be processed;
and inputting the target image into a target semantic segmentation model for semantic segmentation processing to obtain a target object position corresponding to a target object category in the target image.
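For illustration only, applying the target semantic segmentation model to a target image might look like the following sketch; returning a per-pixel class-index map as the target object positions is an assumed representation.

```python
# Hypothetical sketch of inference with the target semantic segmentation model.
import torch


@torch.no_grad()
def segment_target_image(target_model: torch.nn.Module, target_image: torch.Tensor) -> torch.Tensor:
    target_model.eval()
    logits = target_model(target_image.unsqueeze(0))   # add a batch dimension: (1, C, H, W)
    return logits.argmax(dim=1).squeeze(0)             # (H, W) map of target object categories
```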
In some embodiments, in determining the target semantic segmentation model from among the trained semantic segmentation models, the determination module 512 is specifically configured to:
determining the model performance corresponding to each trained semantic segmentation model;
and determining a target semantic segmentation model from the trained semantic segmentation models based on the model performance corresponding to each trained semantic segmentation model.
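For illustration only, the performance-based selection might be sketched as follows; mean intersection-over-union is used here as an assumed performance metric, since the application does not fix a particular metric.

```python
# Hypothetical sketch: score each trained model and keep the best-performing one.
import torch


def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int) -> float:
    """Assumed performance metric: mean intersection-over-union over all classes present."""
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().item()
        union = ((pred == c) | (target == c)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def pick_target_model(trained_models, evaluate):
    """Return the trained semantic segmentation model with the best model performance."""
    return max(trained_models, key=evaluate)
```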
The modules in the model determination apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or be independent of, a processor of the computer device, or may be stored in software form in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to each module.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device comprises a processor, a memory, an Input/Output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing relevant data involved in the model determination method. The communication interface of the computer device is used for communicating with an external computer device through a network connection. The computer program is executed by a processor to implement the steps in the model determination method described above.
In some embodiments, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 7. The computer device comprises a processor, a memory, an input/output (I/O) interface, a communication interface, a display unit and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the input device and the display unit are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for communicating with an external terminal in a wired or wireless manner; the wireless manner can be implemented through Wi-Fi, a mobile cellular network, NFC (Near Field Communication) or other technologies. The computer program, when executed by the processor, implements the steps in the model determination method described above. The display unit of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse, among others.
In some embodiments, a computer device is provided, which may comprise a server, and the system architecture of which is shown in fig. 8. The server may be a local server, a cloud server or the like, which is not limited here. The computer device has computing capabilities and may include a processor, a memory, a communication interface, a transmitter and a receiver. The processor, which may be a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), may be used to initialize the semantic segmentation model and to determine the search space and the network layers to be optimized. The processor may also train the semantic segmentation model according to the determined width value of each network layer to obtain a trained semantic segmentation model, determine the model performance of the trained candidate semantic segmentation model, and determine the usable width values of each network layer based on that model performance. The processor may further determine at least one semantic segmentation model to be trained according to different combinations of the usable width values of the network layers, and determine the final target semantic segmentation model according to the model performance of the trained semantic segmentation model corresponding to each semantic segmentation model to be trained. The memory may be a volatile or non-volatile memory and may be used to store data required by the processor during computation, the computation results of the processor, and the like. The memory may also store the initial semantic segmentation model, the search space, the training data set, the trained candidate semantic segmentation models, the original images, the usable width value of each network layer, the performance of each candidate semantic segmentation model or the lightweight segmentation network model, and so on. The communication interface is used for communicating with other devices. The processor, the memory and the communication interface communicate via a bus, which may include a data bus, a power bus, a control bus, a status signal bus, and the like. The receiver may be used to receive the initialized semantic segmentation model, the training data set, the search space, the original images and the like from other devices or systems, and the transmitter may be used to send the usable width value of each network layer or the target semantic segmentation model to other devices.
Those skilled in the art will appreciate that the structures shown in figs. 6, 7 and 8 are merely block diagrams of partial structures related to the solution of the present application and do not limit the computer devices to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In some embodiments, a computer device is provided, the computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above-described model determination method when executing the computer program.
In some embodiments, a computer-readable storage medium 900 is provided, the internal structure of which may be as shown in fig. 9. A computer program 902 is stored on the storage medium, and the computer program 902, when executed by a processor, implements the steps of the model determination method described above.
In some embodiments, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps in the above-described model determination method.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant country and region.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, databases or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. The volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
For the sake of brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to fall within the scope of the present disclosure.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method of model determination, comprising:
selecting at least one target convolutional network layer to be optimized from a plurality of convolutional network layers included in the initial semantic segmentation model;
for each target convolutional network layer, searching a candidate width value from a width value set corresponding to the target convolutional network layer; updating the width value of the target convolutional network layer in the initial semantic segmentation model into the candidate width value to obtain a candidate semantic segmentation model; training the candidate semantic segmentation model to obtain a trained candidate semantic segmentation model; under the condition that the model performance of the trained candidate semantic segmentation model meets the preset performance requirement, taking the candidate width value as a target width value corresponding to the target convolutional network layer; the width value is a number of convolution kernels of the target convolutional network layer;
generating at least one candidate width value combination based on each target width value corresponding to each target convolutional network layer; the candidate width value combination comprises a target width value corresponding to each target convolutional network layer;
for each candidate width value combination, updating the width value of each target convolutional network layer in the initial semantic segmentation model to a corresponding target width value in the candidate width value combination to obtain a to-be-trained semantic segmentation model corresponding to the candidate width value combination;
for the semantic segmentation model to be trained corresponding to each candidate width value combination, inputting a sample image into the semantic segmentation model to be trained for semantic segmentation processing to obtain a predicted object position corresponding to a target object category in the sample image; obtaining a trained semantic segmentation model corresponding to the candidate width value combination based on the predicted object position corresponding to the target object category;
a target semantic segmentation model is determined from each of the trained semantic segmentation models based on model performance.
2. The method of claim 1, further comprising:
and under the condition that the model performance of the trained candidate semantic segmentation model does not meet the preset performance requirement, returning to the step of searching the candidate width value from the width value set corresponding to the target convolutional network layer.
3. The method of claim 1, wherein before selecting at least one target width value corresponding to the target convolutional network layer from the set of width values corresponding to the target convolutional network layer, the method further comprises:
obtaining an initial width value of the target convolutional network layer in the initial semantic segmentation model;
determining a width value set corresponding to the target convolutional network layer based on the initial width value of the target convolutional network layer; a width value of the set of width values is less than or equal to the initial width value.
4. The method of claim 3, wherein determining the set of width values corresponding to the target convolutional network layer based on the initial width value of the target convolutional network layer comprises:
carrying out reduction processing on the initial width value of the target convolutional network layer to obtain a first width value;
and selecting a plurality of width values from the range of the width values from the first width value to the initial width value to form a width value set corresponding to the target convolutional network layer.
5. The method of claim 1, wherein obtaining the trained semantic segmentation model corresponding to the candidate width value combination based on the predicted object position corresponding to the target object class comprises:
and adjusting model parameters of the semantic segmentation model to be trained based on the predicted object position and the real object position corresponding to the target object class in the sample image to obtain the trained semantic segmentation model corresponding to the candidate width value combination.
6. The method of claim 1, further comprising:
acquiring a target image to be processed;
and inputting the target image into the target semantic segmentation model for semantic segmentation processing to obtain a target object position corresponding to a target object category in the target image.
7. The method of claim 1, wherein determining a target semantic segmentation model from each of the trained semantic segmentation models based on model performance comprises:
determining the model performance corresponding to each trained semantic segmentation model;
and determining a target semantic segmentation model from each trained semantic segmentation model based on the model performance corresponding to each trained semantic segmentation model.
8. A model determination apparatus, comprising:
the first selection module is used for selecting at least one target convolutional network layer to be optimized from a plurality of convolutional network layers included in the initial semantic segmentation model;
the second selection module is used for searching a candidate width value from the width value set corresponding to the target convolutional network layer aiming at each target convolutional network layer; updating the width value of the target convolutional network layer in the initial semantic segmentation model into the candidate width value to obtain a candidate semantic segmentation model; training the candidate semantic segmentation model to obtain a trained candidate semantic segmentation model; under the condition that the model performance of the trained candidate semantic segmentation model meets the preset performance requirement, taking the candidate width value as a target width value corresponding to the target convolutional network layer; the width value is a number of convolution kernels of the target convolutional network layer;
a generating module, configured to generate at least one candidate width value combination based on each target width value corresponding to each target convolutional network layer; the candidate width value combination comprises a target width value corresponding to each target convolutional network layer;
the updating module is used for updating the width value of each target convolutional network layer in the initial semantic segmentation model to a corresponding target width value in the candidate width value combination aiming at each candidate width value combination to obtain a semantic segmentation model to be trained corresponding to the candidate width value combination;
the training module is used for: for the semantic segmentation model to be trained corresponding to each candidate width value combination, inputting a sample image into the semantic segmentation model to be trained for semantic segmentation processing to obtain a predicted object position corresponding to a target object category in the sample image; and obtaining a trained semantic segmentation model corresponding to the candidate width value combination based on the predicted object position corresponding to the target object category;
a determination module for determining a target semantic segmentation model from each of the trained semantic segmentation models based on model performance.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202211595457.2A 2022-12-13 2022-12-13 Model determination method, model determination device, computer equipment and computer readable storage medium Active CN115601550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211595457.2A CN115601550B (en) 2022-12-13 2022-12-13 Model determination method, model determination device, computer equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211595457.2A CN115601550B (en) 2022-12-13 2022-12-13 Model determination method, model determination device, computer equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN115601550A (en) 2023-01-13
CN115601550B (en) 2023-04-07

Family

ID=84854094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211595457.2A Active CN115601550B (en) 2022-12-13 2022-12-13 Model determination method, model determination device, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115601550B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998595A (en) * 2022-07-18 2022-09-02 赛维森(广州)医疗科技服务有限公司 Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220980B (en) * 2017-05-25 2019-12-03 重庆师范大学 A kind of MRI image brain tumor automatic division method based on full convolutional network
CN112598675A (en) * 2020-12-25 2021-04-02 浙江科技学院 Indoor scene semantic segmentation method based on improved full convolution neural network
CN114419058A (en) * 2022-01-28 2022-04-29 北京文安智能技术股份有限公司 Image semantic segmentation model training method for traffic road scene
CN114494701A (en) * 2022-02-14 2022-05-13 浙江大学 Semantic segmentation method and device based on graph structure neural network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998595A (en) * 2022-07-18 2022-09-02 赛维森(广州)医疗科技服务有限公司 Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium

Also Published As

Publication number Publication date
CN115601550A (en) 2023-01-13

Similar Documents

Publication Publication Date Title
US20230088171A1 (en) Method and apparatus for training search recommendation model, and method and apparatus for sorting search results
CN112101169B (en) Attention mechanism-based road image target detection method and related equipment
CN108171663B (en) Image filling system of convolutional neural network based on feature map nearest neighbor replacement
CN110162993B (en) Desensitization processing method, model training device and computer equipment
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN112418292A (en) Image quality evaluation method and device, computer equipment and storage medium
CN112131261B (en) Community query method and device based on community network and computer equipment
CA3179311A1 (en) Identifying claim complexity by integrating supervised and unsupervised learning
CN115601550B (en) Model determination method, model determination device, computer equipment and computer readable storage medium
CN114638823B (en) Full-slice image classification method and device based on attention mechanism sequence model
CN116258923A (en) Image recognition model training method, device, computer equipment and storage medium
CN115713769A (en) Training method and device of text detection model, computer equipment and storage medium
CN116894802B (en) Image enhancement method, device, computer equipment and storage medium
CN116343013A (en) Image processing acceleration method, device, computer equipment and storage medium
CN116522999B (en) Model searching and time delay predictor training method, device, equipment and storage medium
CN114998634B (en) Image processing method, image processing device, computer equipment and storage medium
US20230059976A1 (en) Deep neural network (dnn) accelerator facilitating quantized inference
CN116881122A (en) Test case generation method, device, equipment, storage medium and program product
CN116127183A (en) Service recommendation method, device, computer equipment and storage medium
CN115130539A (en) Classification model training method, data classification device and computer equipment
CN116167011A (en) Data mining model construction method, device, computer equipment and storage medium
CN117390098A (en) Data analysis method, device, computer equipment and storage medium
CN115470526A (en) Processing method and device for anti-attack data based on black box model
CN116304677A (en) Channel pruning method and device for model, computer equipment and storage medium
CN116229326A (en) Object identification method, device, computer equipment and storage medium thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant