CN112734013A - Image processing method, image processing device, electronic equipment and storage medium


Info

Publication number
CN112734013A
Authority
CN
China
Prior art keywords
processing
neural network
dynamic
convolutional neural
image
Prior art date
Legal status
Granted
Application number
CN202110020618.4A
Other languages
Chinese (zh)
Other versions
CN112734013B (en)
Inventor
张选杨
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd
Priority to CN202110020618.4A
Publication of CN112734013A
Application granted
Publication of CN112734013B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/047 - Probabilistic or stochastic networks
    • G06N 3/08 - Learning methods


Abstract

Embodiments of the present application provide an image processing method, an image processing apparatus, an electronic device, and a storage medium. The method includes: determining, based on an image to be processed, a dynamic convolutional neural network corresponding to the image to be processed, where the dynamic convolutional neural network has a second dynamic connection mode and a second operation mode, the second dynamic connection mode indicating, for each processing node among all processing nodes, the objects to which that processing node is connected, and the second operation mode indicating the operation type corresponding to each object connected to each processing node; and processing, by the dynamic convolutional neural network, the image to be processed based on the second dynamic connection mode and the second operation mode to generate a processing result of the image to be processed. This improves the performance of the dynamic convolutional neural network while enhancing its expressive capacity.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
Some embodiments of the present application relate to the field of image processing, and in particular to an image processing method, an image processing apparatus, an electronic device, and a storage medium.
Background
Convolutional neural networks are widely used in fields such as image classification and object detection. Compared with a static convolutional neural network such as ResNet, a dynamic convolutional neural network has lower computational complexity and is better suited to deployment on resource-constrained devices.
A dynamic convolutional neural network is searched out of an original convolutional neural network by a neural network architecture search algorithm. For example, a dynamic convolutional neural network may be searched within a defined search space by the DARTS algorithm. Because the structure of the searched network is simpler than that of the original convolutional neural network, its computational complexity is reduced.
However, the searched dynamic convolutional neural network only reduces computational complexity relative to the original network. Its performance is not improved relative to the original convolutional neural network, and may even be lower because of its simpler structure. How to improve the performance of dynamic convolutional neural networks therefore remains a problem to be solved.
Disclosure of Invention
Some embodiments of the present application provide an image processing method, an image processing apparatus, an electronic device, and a storage medium.
Some embodiments of the present application provide an image processing method, including:
determining, based on an image to be processed, a dynamic convolutional neural network corresponding to the image to be processed, where the dynamic convolutional neural network corresponding to the image to be processed has a second dynamic connection mode and a second operation mode, the second dynamic connection mode indicating, for each processing node among all processing nodes, the objects to which that processing node is connected, and the second operation mode indicating the operation type corresponding to each object connected to each processing node, and where the dynamic convolutional neural network includes: an object selection network for selecting, for each processing node, the objects to which the processing node is connected from all candidate objects of the processing node;
and processing, by the dynamic convolutional neural network, the image to be processed based on the second dynamic connection mode and the second operation mode to generate a processing result of the image to be processed.
Some embodiments of the present application provide an image processing apparatus, including:
a determining unit, configured to determine, based on an image to be processed, a dynamic convolutional neural network corresponding to the image to be processed, where the dynamic convolutional neural network corresponding to the image to be processed has a second dynamic connection mode and a second operation mode, the second dynamic connection mode indicating, for each processing node among all processing nodes, the objects to which that processing node is connected, and the second operation mode indicating the operation type corresponding to each object connected to each processing node, and where the dynamic convolutional neural network includes: an object selection network for selecting, for each processing node, the objects to which the processing node is connected from all candidate objects of the processing node;
and a processing unit, configured to process the image to be processed based on the second dynamic connection mode and the second operation mode by using the dynamic convolutional neural network, and to generate a processing result of the image to be processed.
The image processing method and apparatus provided by the embodiments of the present application determine, when an image is processed by the dynamic convolutional neural network, a second dynamic connection mode and a second operation mode specific to that image. This amounts to determining a topology of the dynamic convolutional neural network suited to processing that image; the image is then input into the dynamic convolutional neural network under this topology, and operations are performed on the outputs of the objects to which each processing node is connected. The topology of the dynamic convolutional neural network can therefore change: for different images to be processed, a topology suited to each image is determined and used to process it, producing the processing result. The performance of the dynamic convolutional neural network is improved, and its expressive capacity is enhanced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with some embodiments of the application and, together with the description, serve to explain the principles of some embodiments of the application.
FIG. 1 is a flowchart illustrating an image processing method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating the selection of objects to which a processing node is connected;
FIG. 3 is a schematic diagram illustrating the selection of the operation corresponding to an object to which a processing node is connected;
FIG. 4 is a block diagram showing the configuration of an image processing apparatus provided by an embodiment of the present application;
FIG. 5 is a block diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of it. It should be noted that, for convenience of description, only the portions related to the relevant invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flowchart of an image processing method provided in an embodiment of the present application, where the method includes:
step 101, determining a dynamic convolutional neural network corresponding to the image to be processed based on the image to be processed.
In the embodiments of the present application, each image to be processed is processed using a dynamic convolutional neural network.
For different images to be processed, the second dynamic connection mode and the second operation mode adopted by the dynamic convolutional neural network differ.
The dynamic convolutional neural network corresponding to the image to be processed is to be understood as follows: for a given image to be processed, the dynamic convolutional neural network adopts a second dynamic connection mode and a second operation mode suited to that image, so that when processing the image, the network can be called the dynamic convolutional neural network corresponding to the image.
In the embodiments of the present application, the dynamic convolutional neural network corresponding to the image to be processed has a second dynamic connection mode and a second operation mode, where the second dynamic connection mode indicates, for each processing node among all processing nodes, the objects to which that processing node is connected, and the second operation mode indicates the operation type corresponding to each object connected to each processing node.
For an image to be processed, the second dynamic connection mode for that image can be characterized as: the objects to which each processing node should be connected for that image. Determining the second dynamic connection mode for the image is therefore equivalent to determining a topology of the dynamic convolutional neural network suited to processing that image.
In an embodiment of the present application, the dynamic convolutional neural network includes: an object selection network, which is used to select, for each processing node, the objects to which the processing node is connected from all candidate objects of that processing node.
In the embodiments of the present application, before the dynamic convolutional neural network corresponding to the image to be processed is determined, the dynamic convolutional neural network is trained. Each training iteration uses one training image, and the training image differs from iteration to iteration. Each training image may be labeled in advance so that it has a corresponding annotated output; for each training image, the annotated output is the expected output when that image is input to the dynamic convolutional neural network.
The dynamic convolutional neural network is trained so that the object selection network learns the association between the features of the image input to the network and the objects to which the processing nodes are connected, and so that the unit in the network responsible for selecting operations learns the association between the features of the input image and the operations corresponding to the objects connected to the processing nodes.
After the dynamic convolutional neural network has been trained on a large number of training images, the object selection network has learned the association between input image features and the objects to which the processing nodes are connected. Therefore, when the network processes an image to be processed, with that image as its input, the object selection network can accurately select the objects to which each processing node is connected.
Likewise, after training on a large number of images, the network has learned the association between input image features and the operations corresponding to the objects connected to the processing nodes. Therefore, when processing an image to be processed, the operation corresponding to each object connected to a processing node can be determined accurately.
In some embodiments of the present application, the dynamic convolutional neural network may be constructed according to the search space defined in the DARTS (Differentiable Architecture Search) algorithm. The dynamic convolutional neural network includes a plurality of processing units, each of which includes a plurality of processing nodes. A processing unit may be referred to as a cell, and a processing node as a node.
In some embodiments of the present application, the set of all processing nodes consists of the processing nodes in every processing unit. For example, if the dynamic convolutional neural network includes 8 processing units and each processing unit includes 4 processing nodes, there are 32 processing nodes in total.
Each processing node may include one or more of: at least one convolutional layer, at least one pooling layer, and an output layer.
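For concreteness, the cell/node hierarchy described above can be sketched in PyTorch as follows. This is a minimal structural sketch, not the patent's implementation; the class names, the channel count, and the per-node layer choice are all illustrative assumptions.

```python
import torch.nn as nn

class ProcessingNode(nn.Module):
    """One node; per the text it may hold convolutional layers, pooling layers,
    or an output layer (a single convolutional layer is assumed here)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

class ProcessingUnit(nn.Module):
    """One cell containing several processing nodes."""
    def __init__(self, channels: int, num_nodes: int = 4):
        super().__init__()
        self.nodes = nn.ModuleList(ProcessingNode(channels) for _ in range(num_nodes))

class DynamicConvNet(nn.Module):
    """8 cells of 4 nodes each gives the 32 processing nodes of the example."""
    def __init__(self, channels: int = 64, num_units: int = 8):
        super().__init__()
        self.units = nn.ModuleList(ProcessingUnit(channels) for _ in range(num_units))
```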
In some embodiments of the present application, the dynamic convolutional neural network has an initial connection mode. In the initial connection mode, all processing units in the dynamic convolutional neural network are connected in series: each processing unit is connected to the processing units adjacent to it in position. All processing nodes in the same processing unit are likewise connected in series: within any processing unit, each processing node is connected to the processing nodes adjacent to it in position.
For example, suppose a dynamic convolutional neural network includes 8 processing units, each including 4 processing nodes. The 8 processing units are connected in series: the 1st processing unit is connected to the 2nd, the 2nd is connected to both the 1st and the 3rd, the 3rd is connected to both the 2nd and the 4th, and so on. Within any processing unit, the 4 processing nodes are connected in series.
In the initial connection mode, for each processing unit that has a subsequent processing unit, the output of the processing unit may be used as the input of the subsequent processing unit; accordingly, the output of the processing unit may be used as the input of the 1st processing node in the subsequent processing unit.
In some embodiments of the present application, the dynamic convolutional neural network may further include a convolution unit located before all processing units. This convolution unit may include one or more convolutional layers and may be used to perform preliminary feature extraction on the image input to the dynamic convolutional neural network. For each processing node in the first processing unit, all candidates of the node include: the convolution unit located before all processing units, and all processing nodes located before the node in the first processing unit.
In some embodiments of the present application, in the initial connection mode, the output of the convolution unit located before all processing units may be used as the input of the 1st processing unit, and accordingly as the input of the 1st processing node in the 1st processing unit. For each processing node in the first processing unit, the output of each candidate object of the node is associated with the output of this convolution unit in the initial connection mode.
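As a hedged illustration of the initial connection mode, the sketch below chains a stem convolution unit into the first processing unit and each unit into the next, so that every output feeds the 1st node of the following unit; each cell is stood in for by a single convolution, and all names are assumptions.

```python
import torch
import torch.nn as nn

class InitialModeNet(nn.Module):
    def __init__(self, in_channels: int = 3, channels: int = 64, num_units: int = 8):
        super().__init__()
        # Convolution unit located before all processing units (preliminary features).
        self.stem = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        self.units = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # stand-in for a cell
            for _ in range(num_units)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)          # stem output feeds the 1st node of the 1st unit
        for unit in self.units:   # each unit's output feeds the next unit
            x = unit(x)
        return x
```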
In some embodiments of the present application, saying that a processing unit is the n-th processing unit in the dynamic convolutional neural network means that it is the n-th unit counted in the direction from the input of the network to its output. Likewise, saying that a processing node is the m-th processing node in a processing unit means that it is the m-th node in that unit counted in the same direction.
In some embodiments, each processing unit of the plurality of processing units is of a type that is one of: a normal type, a reduced type, wherein the number of feature map output channels of the processing unit of the normal type is 2 times the number of feature map output channels of the processing unit of the reduced type, and the size of the feature map in the output of the processing unit of the normal type is 2 times the size of the feature map in the output of the processing unit of the reduced type.
In some embodiments of the present application, a processing unit of a conventional type may be referred to as a normal cell, and a processing unit of a reduced type may be referred to as a reduction cell. The plurality of cells in the dynamic convolutional neural network may be composed of at least one normal cell and at least one reduction cell. The number of feature map output channels of the normal cell is 2 times that of the reduction cell, that is, the number of feature maps included in the output of the normal cell is 2 times that of the feature maps included in the output of the reduction cell. The size of the feature map in the output of the normal cell is 2 times the size of the feature map in the output of the reduction cell.
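The 2x relations between the two cell types reduce to simple shape bookkeeping, recorded in the sketch below; the function and argument names are assumptions.

```python
def unit_output_shape(unit_type: str, reduce_channels: int, reduce_size: int):
    """Feature-map channel count and spatial size per the 2x relations above:
    a normal cell outputs 2x the channels and 2x the size of a reduction cell."""
    if unit_type == "normal":
        return 2 * reduce_channels, 2 * reduce_size
    if unit_type == "reduce":
        return reduce_channels, reduce_size
    raise ValueError(unit_type)
```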
In some embodiments of the present application, for a processing node: if the number of candidate objects of the node is greater than a preset selection number, the preset selection number of candidates may be selected from all candidate objects as the objects to which the node is connected; if the number of candidates is less than or equal to the preset selection number, the candidates may be used directly as the objects to which the node is connected.
In some embodiments of the present application, the preset selection number may be 2.
For each processing node satisfying the following condition: the number of processing units preceding the processing unit to which the node belongs is greater than 2, and at least one processing node precedes this node within its processing unit. All candidates of such a node may consist of: the two processing units immediately preceding the node's processing unit, and all processing nodes located before the node within its processing unit.
For each processing node satisfying the following condition: the number of processing units preceding the processing unit to which the node belongs is greater than 2, and the node is the 1st processing node in its processing unit. All candidates of such a node may consist of: the two processing units immediately preceding the node's processing unit.
The only candidate of the 1st processing node in the 2nd processing unit is the 1st processing unit.
For each processing node in the 2nd processing unit other than the 1st processing node, all candidates of that node may consist of: the 1st processing unit, and all processing nodes located before that node in the 2nd processing unit.
The only candidate of the 1st processing node in the 1st processing unit is the convolution unit preceding all processing units in the dynamic convolutional neural network.
For each processing node in the 1st processing unit other than the 1st processing node, all candidates of that node may consist of: the convolution unit preceding all processing units, and all processing nodes located before that node in the 1st processing unit.
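Taken together, the rules above can be summarized in one small helper. The sketch below assumes 1-based indices (k, j) for cells and nodes and applies the two-preceding-cells rule from the 3rd cell onward, consistent with the FIG. 2 example further below; the 'stem', 'cell', and 'node' tags are illustrative.

```python
def candidate_objects(k: int, j: int):
    """Return the candidate objects of the j-th node in the k-th cell (1-based).
    'stem' denotes the convolution unit located before all processing units."""
    candidates = []
    if k == 1:
        candidates.append(("stem", None))
    elif k == 2:
        candidates.append(("cell", 1))
    else:  # from the 3rd cell onward, the two preceding cells are candidates
        candidates.append(("cell", k - 2))
        candidates.append(("cell", k - 1))
    for prev in range(1, j):  # all earlier nodes in the same cell
        candidates.append(("node", (k, prev)))
    return candidates

# e.g. candidate_objects(1, 1) -> [('stem', None)]
#      candidate_objects(2, 3) -> [('cell', 1), ('node', (2, 1)), ('node', (2, 2))]
```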
In some embodiments of the present application, before the dynamic convolutional neural network corresponding to the image to be processed is determined, a training operation is repeatedly performed on the dynamic convolutional neural network until its training is completed. For example, training may be stopped when the precision of the dynamic convolutional neural network reaches a precision threshold, at which point training is determined to be complete and the training operation is no longer performed. Alternatively, when the total number of times the dynamic convolutional neural network has been trained exceeds a threshold indicating completion, training is determined to be complete and the training operation is no longer performed.
In some embodiments of the present application, the training operation includes: determining a first dynamic connection mode and a first operation mode for a training image, where the first dynamic connection mode indicates, for each processing node among all processing nodes, the objects to which that processing node is connected, and the first operation mode indicates the operation type corresponding to each object connected to each processing node; and updating, based on the loss between the output corresponding to the training image under the first dynamic connection mode and the first operation mode and the annotated output of the training image, the parameter values of the parameters in the dynamic convolutional neural network associated with the loss.
In some embodiments of the present application, each time the dynamic convolutional neural network is trained, a first dynamic connection mode and a first operation mode are determined for the training image used in that iteration.
The first dynamic connection mode can be characterized as: the objects to which each processing node is connected for this training image. Determining the first dynamic connection mode is therefore equivalent to determining the topology of the dynamic convolutional neural network for the training image. For any processing node, the first operation mode characterizes the operation type corresponding to each object connected to that node.
When the dynamic convolutional neural network is trained, the training image is input into the network, and for a processing node, the input of the node includes: the operation results obtained by performing, on the output of each object to which the node is connected under the first dynamic connection mode, the operation corresponding to that object. The type of the operation corresponding to an object is the operation type indicated for that object by the first operation mode.
In some embodiments of the present application, each time the dynamic convolutional neural network is trained, the output corresponding to the training image under the first dynamic connection mode and the first operation mode is: the network output obtained by inputting the training image into the dynamic convolutional neural network under the first dynamic connection mode and the first operation mode determined for that image.
In some embodiments of the present application, if a last processing unit of the plurality of processing units in the dynamic convolutional neural network is a last unit of the dynamic convolutional neural network, an output of the last processing unit of the plurality of processing units in the dynamic convolutional neural network may be used as a network output of the dynamic convolutional neural network.
If the last processing unit of the plurality of processing units in the dynamic convolutional neural network is not the last unit of the dynamic convolutional neural network, and the dynamic convolutional neural network further includes at least one other unit located after the last processing unit, the output of the output layer of the last unit in the dynamic convolutional neural network may be used as the network output of the dynamic convolutional neural network.
In some embodiments of the present application, each time the dynamic convolutional neural network is trained, a loss between an output corresponding to the training image and a labeled output of the training image in the first dynamic connection mode and the first operation mode may be calculated by using a preset loss function. Then, based on the loss between the corresponding output of the training image and the labeled output of the training image in the first dynamic connection mode and the first operation mode, the parameter value of the parameter related to the loss in the dynamic convolution neural network is updated.
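A minimal sketch of one such training iteration follows, assuming a classification task, cross-entropy as the preset loss function, and a standard PyTorch optimizer; everything beyond the text above (the model interface, the optimizer choice) is an assumption.

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               image: torch.Tensor, label: torch.Tensor) -> float:
    """One training operation: the model internally determines the first dynamic
    connection mode and the first operation mode for this training image."""
    loss_fn = nn.CrossEntropyLoss()             # stands in for the preset loss function
    output = model(image.unsqueeze(0))          # forward pass under the first modes
    loss = loss_fn(output, label.unsqueeze(0))  # loss against the annotated output
    optimizer.zero_grad()
    loss.backward()                             # gradients reach all loss-related
    optimizer.step()                            # parameters: nodes + selection networks
    return loss.item()
```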
In some embodiments of the present application, the object selection network may include at least any one of a pooling layer, a fully connected layer, and a softmax layer.
Each time the dynamic convolutional neural network is trained, for a processing node, the object selection network may select the objects to which the processing node is connected from all candidate objects of the node.
In some embodiments of the present application, the operation selection network may include at least any one of a pooling layer, a fully connected layer, and a softmax layer.
Each time the dynamic convolutional neural network is trained, for each object connected to a processing node, the operation selection network can select the operation corresponding to that object from all candidate operations.
In some embodiments of the present application, each time the dynamic convolutional neural network is trained, all parameters associated with the loss in the dynamic convolutional neural network include: parameters associated with loss in each processing node, parameters associated with loss in the object selection network, parameters associated with loss in the operation selection network.
Therefore, each time the dynamic convolutional neural network is trained, not only each processing node but also the object selection network and the operation selection network are trained.
After the dynamic convolutional neural network is trained through a large number of training images, an object selection network in the dynamic convolutional neural network learns the association relationship between the characteristics of the images input into the dynamic convolutional neural network and the objects connected with the processing nodes.
After the dynamic convolutional neural network is trained through a large number of training images, an operation selection network in the dynamic convolutional neural network learns the association relationship between the characteristics of the images input into the dynamic convolutional neural network and the operation corresponding to the object connected with the processing node.
In some embodiments, the object selection network includes: a global pooling layer, a fully connected layer, and an object selection result output layer. The object selection network acquires, in the initial connection mode, the first output of each candidate object of each processing node, and after processing obtains the probability that each candidate object of the node is an object to which the node is connected. In other words, for any processing node, the object selection network may process the first outputs of the node's candidate objects in the initial connection mode to obtain the probability that each candidate is an object to which the node is connected.
The following illustrates the process of selecting an object to which a processing node is connected:
assume that 2 objects to which the processing node is connected are to be selected from all candidate objects of the node. The object selection network may obtain the first output of each candidate object of the node in the initial connection mode. When the object selection network selects the objects, its global pooling layer receives the first output of each candidate object of the node in the initial connection mode.
Specifically, the feature maps in the outputs of all candidate objects of the node are concatenated in the channel dimension in the initial connection mode to obtain a concatenation result. The concatenation result is input into the global pooling layer of the object selection network, where a pooling operation produces a pooling result. The pooling result is input into the fully connected layer of the object selection network, and the output of the fully connected layer is used as the input of the object selection result output layer. In the object selection result output layer, the probability that each candidate object of the node is an object to which the node is connected may be calculated from the output of the fully connected layer using a first preset function, for example a top-2 Gumbel-softmax function. All candidate objects of the node are then sorted in descending order of probability.
The object selection result output layer outputs a policy vector whose dimension equals the number of candidate objects. Each vector element takes the value 0 or 1, and each element corresponds to one candidate object. Exactly 2 vector elements take the value 1 and the rest take the value 0: after sorting the candidates in descending order of probability, the vector elements corresponding to the first two candidates are 1. In other words, the candidate objects corresponding to the vector elements whose value is 1 are the objects to which the node is connected.
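In PyTorch, the pass just described might be sketched as follows, assuming the candidate outputs share a common spatial size so they can be concatenated. The top-2 Gumbel-softmax is approximated here by perturbing the logits with Gumbel noise and marking the two largest entries; the exact relaxation is not spelled out in the text, so this form is an assumption.

```python
import torch
import torch.nn as nn

class ObjectSelectionNetwork(nn.Module):
    """Global pooling layer -> fully connected layer -> object selection output."""
    def __init__(self, total_channels: int, num_candidates: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                  # global pooling layer
        self.fc = nn.Linear(total_channels, num_candidates)  # fully connected layer

    def forward(self, candidate_outputs):
        # candidate_outputs: first outputs of the candidates, each (B, C_i, H, W);
        # total_channels must equal the sum of the C_i.
        x = torch.cat(candidate_outputs, dim=1)       # splice in the channel dimension
        logits = self.fc(self.pool(x).flatten(1))     # (B, num_candidates)
        # Assumed top-2 Gumbel-softmax: perturb logits with Gumbel noise, then set
        # the policy-vector entries of the two most probable candidates to 1.
        u = torch.rand_like(logits).clamp_min(1e-9)
        gumbel = -torch.log(-torch.log(u))
        top2 = (logits + gumbel).topk(2, dim=-1).indices
        return torch.zeros_like(logits).scatter_(-1, top2, 1.0)  # policy vector
```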
Referring to fig. 2, a schematic diagram illustrating an effect of selecting an object to which a processing node is connected is shown.
node_j denotes a processing node; node_j is the j-th node in the k-th cell, with k greater than 2 and j greater than 2. All candidates of node_j consist of: the (k-2)-th cell, the (k-1)-th cell, and all nodes located before the j-th node in the k-th cell. cell_(k-2) denotes the (k-2)-th cell and cell_(k-1) denotes the (k-1)-th cell. node_0 denotes the 1st node in the k-th cell, and node_(j-1) denotes the node preceding node_j. cat denotes concatenating, in the channel dimension, the feature maps in the outputs of all candidate objects of node_j.
In the object selection result output layer, the probability that each candidate object of node_j is an object to which node_j is connected is calculated from the output of the fully connected layer in the object selection network using the first preset function, a top-2 Gumbel-softmax function; that is, for each candidate of node_j, the probability that it is an object connected to node_j is calculated. The object selection result output layer then outputs the policy vector, each element of which corresponds to one candidate object.
Counting from top to bottom, the values of the 2nd and 4th vector elements are 1. Assuming the 2nd vector element corresponds to the (k-1)-th cell and the 4th corresponds to the 2nd node in the k-th cell, the 2 objects selected from all candidates of node_j are the (k-1)-th cell and the 2nd node in the k-th cell.
In some embodiments, the method further includes: adjusting the temperature coefficient of the first preset function when a first adjustment condition is satisfied.
The first preset function may be a top-N Gumbel-softmax function. If, for each processing node, 2 objects are selected from all candidate objects of the node, the first preset function may be a top-2 Gumbel-softmax function.
The first adjustment condition may be that the number of times the dynamic convolutional neural network has been trained reaches a threshold. Adjusting the temperature coefficient means decreasing the temperature coefficient of the top-N Gumbel-softmax function: when the training count reaches the threshold, the temperature coefficient may be decreased, so that the coefficient after adjustment is smaller than before.
The first adjustment condition may also be that the number of training iterations added since the last adjustment reaches an increment threshold. After each adjustment, when the newly added training count reaches the threshold again, the temperature coefficient may be decreased further; the temperature coefficient of the top-N Gumbel-softmax function thus decreases gradually as training proceeds.
By dynamically adjusting the temperature coefficient of the top-N Gumbel-softmax function, premature convergence of the object selection network to some locally optimal strategies can be avoided.
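A minimal sketch of such a schedule follows, assuming the temperature is multiplied by a fixed decay factor each time a fixed number of additional training iterations has elapsed; the concrete interval and decay factor are assumptions, since the text only requires that the temperature decrease.

```python
class TemperatureSchedule:
    """Decrease the Gumbel-softmax temperature every `interval` training steps."""
    def __init__(self, tau: float = 1.0, interval: int = 1000, decay: float = 0.9):
        self.tau, self.interval, self.decay = tau, interval, decay
        self.steps_since_adjust = 0

    def step(self) -> float:
        self.steps_since_adjust += 1
        if self.steps_since_adjust >= self.interval:  # adjustment condition met
            self.tau *= self.decay                    # smaller tau, sharper choices
            self.steps_since_adjust = 0
        return self.tau
```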
In some embodiments, the dynamic convolutional neural network further includes an operation selection network. For each object connected to each processing node, the operation selection network obtains the candidate operation probabilities corresponding to the object from the second output of the object, where each candidate operation is performed on the second output of the object to obtain the candidate outputs corresponding to the object.
In this application, the output of an object to which a node is connected under the first dynamic connection mode or the second dynamic connection mode may be called the second output of the object. The operation selection network may obtain, from the second output of each object, the candidate operation probabilities corresponding to that object: for each candidate operation, the probability that it is the operation corresponding to the object. The candidate operation probabilities for an object may be represented by a policy vector in which each element takes the value 0 or 1 and each element represents one candidate operation. Exactly one element takes the value 1 and the rest take the value 0; the element representing the candidate operation with the highest probability is 1, and that candidate operation is the operation corresponding to the object.
In this application, for an object connected to a processing node, an operation corresponding to the object is performed on a second output of the object in the first dynamic connection mode or the second dynamic connection mode to obtain an operation result, and the operation result may be used as an input of the processing node or a part of an input of the processing node.
In this application, for an object connected to a processing node, each candidate operation may be performed on the second output of the object to obtain a plurality of candidate outputs, and the plurality of candidate outputs are weighted by using a policy vector indicating a probability of the candidate operation corresponding to the object, and an obtained weighting result is used as the input of the processing node or a part of the input of the processing node.
The value of the vector element in the policy vector representing the operation corresponding to the object is 1, the weight of the operation corresponding to the object is 1 when weighting is performed, the value of the vector element in the policy vector representing other candidate operations is 0, and the weight of each other candidate operation is 0 when weighting is performed.
When weighting is performed, the candidate output related to the operation corresponding to the object is multiplied by the weight 1, and the other candidate outputs except the candidate output related to the operation corresponding to the object are multiplied by the weight 0, so that the obtained weighting result is the candidate output related to the operation corresponding to the object.
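Since exactly one policy element is 1, the weighting reduces to picking out a single candidate output. A sketch, assuming an unbatched 0/1 policy vector and candidate outputs of a common shape:

```python
import torch

def weighted_operation_output(candidate_outputs: list, policy: torch.Tensor) -> torch.Tensor:
    """candidate_outputs: one tensor per candidate operation, all of equal shape.
    policy: 0/1 vector with exactly one 1; the result equals the selected
    candidate output, since every other candidate is multiplied by weight 0."""
    return sum(w * out for w, out in zip(policy, candidate_outputs))
```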
The following illustrates the process of determining the operation corresponding to the object to which a processing node is connected:
the second output of the object is used as the input of the global pooling layer in the operation selection network to obtain a global pooling result. The global pooling result is input into the fully connected layer of the operation selection network, and the output of the fully connected layer is used as the input of the operation selection result output layer. In the operation selection result output layer, a second preset function, for example a Gumbel-softmax function, may be used to calculate, from the output of the fully connected layer, the probability that each candidate operation is the operation corresponding to the object, and to generate the policy vector. The operation selection result output layer outputs the policy vector, whose dimension equals the number of candidate operations.
In some embodiments of the present application, all candidate operations may include, but are not limited to: 3×3 separable convolution, 5×5 separable convolution, 3×3 dilated separable convolution, 5×5 dilated separable convolution, 3×3 average pooling, 3×3 max pooling, and identity. identity is an operation that keeps the original value: if the operation performed on an object connected to a processing node is identity, it is equivalent to not processing the output of that object.
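In PyTorch these candidate operations might be constructed as below, building the separable and dilated separable convolutions from a depthwise convolution followed by a pointwise convolution; this standard construction is assumed here rather than quoted from the text.

```python
import torch.nn as nn

def sep_conv(c: int, k: int, dilation: int = 1) -> nn.Module:
    """Separable convolution: depthwise k x k conv followed by pointwise 1 x 1 conv."""
    pad = dilation * (k - 1) // 2  # preserve spatial size
    return nn.Sequential(
        nn.Conv2d(c, c, k, padding=pad, dilation=dilation, groups=c, bias=False),
        nn.Conv2d(c, c, 1, bias=False),
    )

def candidate_operations(c: int) -> dict:
    return {
        "sep_conv_3x3": sep_conv(c, 3),
        "sep_conv_5x5": sep_conv(c, 5),
        "dil_conv_3x3": sep_conv(c, 3, dilation=2),
        "dil_conv_5x5": sep_conv(c, 5, dilation=2),
        "avg_pool_3x3": nn.AvgPool2d(3, stride=1, padding=1),
        "max_pool_3x3": nn.MaxPool2d(3, stride=1, padding=1),
        "identity": nn.Identity(),  # keeps the original value
    }
```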
Referring to fig. 3, an effect diagram of an operation of determining an object to which a processing node is connected is shown.
node_j denotes a processing node. In FIG. 3, only one object node_i connected to node_j is shown by way of example. The number of objects connected to node_j equals the preset selection number.
For example, the preset selection number is 2 and node_j is the j-th node in the k-th cell, with k greater than 2; the object selection network selects 2 objects from all candidate objects of node_j, the 2 objects consisting of node_i and another object. The other object may be a node or a cell.
For the object node_i connected to node_j, the second output of node_i is used as the input of the global pooling layer in the operation selection network to obtain a global pooling result. The global pooling result is input into the fully connected layer of the operation selection network, and the output of the fully connected layer is used as the input of the operation selection result output layer. In the operation selection result output layer, the candidate operation probabilities corresponding to node_i may be calculated from the output of the fully connected layer using the second preset function, a Gumbel-softmax function; the candidate operation probabilities indicate, for each candidate operation, the probability that the candidate operation is the operation corresponding to node_i. The operation selection result output layer outputs a policy vector whose dimension equals the number of candidate operations. Each vector element takes the value 0 or 1; exactly one element is 1 and the rest are 0. The element representing the candidate operation with the highest probability is 1, and the candidate operation it represents is the operation corresponding to node_i.
Counting from top to bottom, the 1st vector element is the 1st element of the policy vector, the 2nd vector element is the 2nd element, and so on.
Suppose the value of the 2nd vector element in the policy vector is 1, and the candidate operations represented by the vector elements are in the following order: 3×3 separable convolution, 5×5 separable convolution, 3×3 dilated separable convolution, 5×5 dilated separable convolution, 3×3 max pooling, identity. Then the 1st element represents the 3×3 separable convolution, the 2nd represents the 5×5 separable convolution, and so on. The value 1 for the 2nd element means that the 5×5 separable convolution is selected as the operation corresponding to node_i.
The result obtained by performing the 5×5 separable convolution on the output of node_i is used as the input of node_j, or as part of the input of node_j.
In some embodiments, the method further includes: adjusting the temperature coefficient of the second preset function when a second adjustment condition is satisfied.
The second preset function may be a Gumbel-softmax function. The second adjustment condition may be that the number of times the dynamic convolutional neural network has been trained reaches a threshold; when it does, the temperature coefficient of the Gumbel-softmax function may be decreased, so that the coefficient becomes smaller.
The second adjustment condition may also be that the number of training iterations added since the last adjustment reaches an increment threshold. After each adjustment, when the newly added training count reaches the threshold again, the temperature coefficient of the Gumbel-softmax function may be decreased further; the coefficient thus decreases gradually as training proceeds.
By dynamically adjusting the temperature coefficient of the Gumbel-softmax function, premature convergence of the operation selection network to some locally optimal strategies can be avoided.
Step 102, the dynamic convolutional neural network processes the image to be processed based on the second dynamic connection mode and the second operation mode to generate a processing result of the image to be processed.
In some embodiments of the present application, when the image to be processed is processed by using the dynamic convolutional neural network, the second dynamic connection mode and the second operation mode for the image to be processed may be determined by using the dynamic convolutional neural network. Then, a processing result of the image to be processed may be generated based on the second dynamic connection manner and the second operation manner for the image to be processed.
When a processing result of the image to be processed is generated based on the second dynamic connection mode and the second operation mode for that image, each processing node is connected to its corresponding objects according to the second dynamic connection mode, and the image is input into the dynamic convolutional neural network. For each processing node, the input of the node under the second connection mode includes: the operation results obtained by performing, on the output of each object to which the node is connected under the second connection mode, the operation corresponding to that object, where the type of the operation is the operation type indicated for that object by the second operation mode. Under the second dynamic connection mode, after the image to be processed is input into the dynamic convolutional neural network, the output of the network is the processing result of the image.
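Putting step 101 and step 102 together, inference could be organized as in the following hedged sketch. The interface (stem, units, candidates, select_objects, select_operation, head) is entirely an illustrative assumption about how the pieces described above fit together, not the patent's implementation.

```python
import torch

@torch.no_grad()
def process_image(model, image: torch.Tensor) -> torch.Tensor:
    """Step 101 + step 102: the second dynamic connection mode and the second
    operation mode are derived from the image itself, then used to process it."""
    x = image.unsqueeze(0)                     # add a batch dimension
    outputs = {"stem": model.stem(x)}          # preliminary feature extraction
    for k, unit in enumerate(model.units, start=1):
        for j, node in enumerate(unit.nodes, start=1):
            # model.candidates(k, j) is assumed to return keys into `outputs`.
            cands = [outputs[key] for key in model.candidates(k, j)]
            policy = model.select_objects(k, j, cands)  # second dynamic connection mode
            inputs = []
            for obj_out, chosen in zip(cands, policy):
                if chosen:                              # object selected for this node
                    op = model.select_operation(k, j, obj_out)  # second operation mode
                    inputs.append(op(obj_out))
            outputs[(k, j)] = node(sum(inputs))
        outputs[("cell", k)] = outputs[(k, len(unit.nodes))]  # cell output = last node
    return model.head(outputs[("cell", len(model.units))])    # processing result
```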
Referring to fig. 4, a block diagram of an image processing apparatus according to some embodiments of the present application is shown. The device comprises: a determination unit 401 and a processing unit 402.
The determining unit 401 is configured to determine, based on the image to be processed, the dynamic convolutional neural network corresponding to the image to be processed, where the dynamic convolutional neural network corresponding to the image to be processed has a second dynamic connection mode and a second operation mode, the second dynamic connection mode indicating, for each processing node among all processing nodes, the objects to which that processing node is connected, and the second operation mode indicating the operation type corresponding to each object connected to each processing node, and where the dynamic convolutional neural network includes: an object selection network for selecting, for each processing node, the objects to which the processing node is connected from all candidate objects of the processing node.
The processing unit 402 is configured to process the image to be processed based on the second dynamic connection mode and the second operation mode by using the dynamic convolutional neural network, and to generate a processing result of the image to be processed.
In some embodiments, the dynamic convolutional neural network further comprises: a convolution unit preceding all processing units in the dynamic convolutional neural network, for each processing node in a first processing unit in the dynamic convolutional neural network, all candidates for each processing node including: the convolution unit, all processing nodes in the first processing unit before the processing node.
In some embodiments, each processing unit of the plurality of processing units is of a type that is one of: a normal type, a reduced type, wherein the number of feature map output channels of the processing unit of the normal type is 2 times the number of feature map output channels of the processing unit of the reduced type, and the size of the feature map in the output of the processing unit of the normal type is 2 times the size of the feature map in the output of the processing unit of the reduced type.
In some embodiments, the image processing apparatus further includes: a training unit configured to repeatedly perform a training operation on the dynamic convolutional neural network, before the dynamic convolutional neural network corresponding to the image to be processed is determined, until training of the dynamic convolutional neural network is completed. The dynamic convolutional neural network includes: a plurality of processing units, each processing unit including a plurality of processing nodes. The training operation includes: determining a first dynamic connection mode and a first operation mode for a training image, where the first dynamic connection mode indicates, for each processing node among all processing nodes, the objects to which that processing node is connected, and the first operation mode indicates the operation type corresponding to each object connected to each processing node; and updating parameter values of the parameters in the dynamic convolutional neural network associated with the loss, based on the loss between the output corresponding to the training image under the first dynamic connection mode and the first operation mode and the annotated output of the training image.
In some embodiments, the object selection network comprises: the system comprises a global pooling layer, a full connection layer and an object selection result output layer, wherein the object selection network acquires first output of each candidate object of each processing node in an initial connection mode, and the probability that each candidate object is an object connected with the processing node is obtained after processing.
In some embodiments, in the initial connected mode, the input of the first processing unit in the dynamic convolutional neural network is the output of the convolutional unit located before all processing units in the dynamic convolutional neural network, and for each processing node in the first processing unit in the dynamic convolutional neural network, the output of each candidate object of the processing node in the initial connected mode is associated with the output of the convolutional unit.
In some embodiments, the image processing apparatus further comprises:
a first adjusting unit configured to adjust a temperature coefficient of a first preset function for calculating a probability that each candidate object is an object to which the processing node is connected, in a case where a first adjusting condition is satisfied.
In some embodiments, the dynamic convolutional neural network further includes an operation selection network, where for each object connected to each processing node, the operation selection network obtains the candidate operation probabilities corresponding to the object from the second output of the object, and each candidate operation is performed on the second output of the object to obtain the candidate outputs corresponding to the object.
In some embodiments, the image processing apparatus further comprises:
and the second adjusting unit is configured to adjust the temperature coefficient of a second preset function under the condition that a second adjusting condition is met, wherein the second preset function is used for calculating the candidate operation probability corresponding to each object.
Fig. 5 is a block diagram of an electronic device provided in this embodiment. The electronic device includes a processing component 522 that further includes one or more processors, and memory resources, represented by memory 532, for storing instructions, e.g., applications, that are executable by the processing component 522. The application programs stored in memory 532 may include one or more modules that each correspond to a set of instructions. Further, the processing component 522 is configured to execute instructions to perform the above-described methods.
The electronic device may also include a power supply component 526 configured to perform power management of the electronic device, a wired or wireless network interface 550 configured to connect the electronic device to a network, and an input/output (I/O) interface 558. The electronic device may operate based on an operating system stored in memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a storage medium comprising instructions, such as a memory comprising instructions, executable by an electronic device to perform the above method is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (12)

1. An image processing method, characterized in that the method comprises:
determining, based on an image to be processed, a dynamic convolutional neural network corresponding to the image to be processed, wherein the dynamic convolutional neural network corresponding to the image to be processed has a second dynamic connection mode and a second operation mode, the second dynamic connection mode indicates, for each processing node among all processing nodes, the object to which the processing node is connected, the second operation mode indicates the operation type corresponding to each object to which each processing node is connected, and the dynamic convolutional neural network comprises an object selection network for selecting, for each processing node, the object to which the processing node is connected from among all candidate objects of the processing node;
and processing, by the dynamic convolutional neural network, the image to be processed based on the second dynamic connection mode and the second operation mode, to generate a processing result of the image to be processed.
2. The method of claim 1, wherein the dynamic convolutional neural network further comprises a convolution unit located before all processing units in the dynamic convolutional neural network, and for each processing node in the first processing unit of the dynamic convolutional neural network, all candidate objects of the processing node comprise: the convolution unit, and all processing nodes that precede the processing node in the first processing unit.
3. The method of claim 1, wherein each of the plurality of processing units is of one of the following types: a normal type and a reduced type, wherein the number of feature-map output channels of a normal-type processing unit is twice the number of feature-map output channels of a reduced-type processing unit, and the size of the feature map output by a normal-type processing unit is twice the size of the feature map output by a reduced-type processing unit.
4. The method according to any one of claims 1-3, wherein before the determining of the dynamic convolutional neural network corresponding to the image to be processed based on the image to be processed, the method further comprises:
repeatedly performing a training operation on a dynamic convolutional neural network until training of the dynamic convolutional neural network is completed, wherein the dynamic convolutional neural network comprises a plurality of processing units and each processing unit comprises a plurality of processing nodes, and the training operation comprises: determining a first dynamic connection mode and a first operation mode for a training image, wherein the first dynamic connection mode indicates, for each processing node among all processing nodes, the object to which the processing node is connected, and the first operation mode indicates the operation type corresponding to each object to which each processing node is connected; and updating, based on a loss between the output corresponding to the training image under the first dynamic connection mode and the first operation mode and the labeled output of the training image, parameter values of the parameters in the dynamic convolutional neural network associated with the loss.
5. The method of claim 4, wherein the object selection network comprises: a global pooling layer, a fully connected layer, and an object selection result output layer, and wherein the object selection network acquires the first output of each candidate object of each processing node in the initial connection mode and processes it to obtain the probability that each candidate object is the object to which the processing node is connected.
6. The method of claim 5, wherein in the initial connection mode, the input of the first processing unit in the dynamic convolutional neural network is the output of the convolution unit located before all processing units in the dynamic convolutional neural network, and for each processing node in the first processing unit of the dynamic convolutional neural network, the output of each candidate object of the processing node in the initial connection mode is associated with the output of the convolution unit.
7. The method of claim 5, further comprising:
adjusting, when a first adjusting condition is satisfied, the temperature coefficient of a first preset function, wherein the first preset function is used to calculate the probability that each candidate object of the processing node is the object to which the processing node is connected.
8. The method of any one of claims 1-4, wherein the dynamic convolutional neural network further comprises an operation selection network, wherein, for each object connected to each processing node, the operation selection network obtains the candidate operation probabilities corresponding to the object according to the second output of the object, and the second output of the object is subjected to the candidate operations to obtain the candidate outputs corresponding to the object.
9. The method of claim 8, further comprising:
adjusting, when a second adjusting condition is satisfied, the temperature coefficient of a second preset function, wherein the second preset function is used to calculate the candidate operation probabilities corresponding to each object.
10. An image processing apparatus, characterized in that the apparatus comprises:
a determining unit configured to determine, based on an image to be processed, a dynamic convolutional neural network corresponding to the image to be processed, wherein the dynamic convolutional neural network corresponding to the image to be processed has a second dynamic connection mode and a second operation mode, the second dynamic connection mode indicates, for each processing node among all processing nodes, the object to which the processing node is connected, the second operation mode indicates the operation type corresponding to each object to which each processing node is connected, and the dynamic convolutional neural network comprises an object selection network for selecting, for each processing node, the object to which the processing node is connected from among all candidate objects of the processing node;
and a processing unit configured to process, by means of the dynamic convolutional neural network, the image to be processed based on the second dynamic connection mode and the second operation mode, to generate a processing result of the image to be processed.
11. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 9.
12. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 9.
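For readers implementing claim 4 above, the training operation reduces to: run the network under the first dynamic connection mode and first operation mode determined for the batch, compare its output with the labeled output, and update the parameters associated with the loss. Below is a minimal sketch under assumed names (a `model` that chooses its modes inside `forward`, and cross-entropy as the loss); it is an illustration, not the reference procedure of this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def training_operation(model: nn.Module, images: torch.Tensor,
                       labels: torch.Tensor,
                       optimizer: torch.optim.Optimizer) -> float:
    # One training operation: the forward pass is assumed to determine the
    # first dynamic connection mode and first operation mode internally.
    optimizer.zero_grad()
    outputs = model(images)
    loss = F.cross_entropy(outputs, labels)  # loss against the labeled output
    loss.backward()   # gradients reach the parameters associated with the loss
    optimizer.step()
    return loss.item()
```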
CN202110020618.4A 2021-01-07 2021-01-07 Image processing method, device, electronic equipment and storage medium Active CN112734013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110020618.4A CN112734013B (en) 2021-01-07 2021-01-07 Image processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112734013A true CN112734013A (en) 2021-04-30
CN112734013B CN112734013B (en) 2024-06-21

Family

ID=75591091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110020618.4A Active CN112734013B (en) 2021-01-07 2021-01-07 Image processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112734013B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136056A (en) * 2018-02-08 2019-08-16 华为技术有限公司 The method and apparatus of image super-resolution rebuilding
CN111612024A (en) * 2019-02-25 2020-09-01 北京嘀嘀无限科技发展有限公司 Feature extraction method and device, electronic equipment and computer-readable storage medium
WO2020221278A1 (en) * 2019-04-29 2020-11-05 北京金山云网络技术有限公司 Video classification method and model training method and apparatus thereof, and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONG Yehao; KE Zongwu; XIONG Xuhui: "Application of Convolutional Neural Networks in Image Processing", Fujian Computer, no. 05 *
ZHAO Xilin; WANG Zhanfeng; MA Weifeng: "Application of Artificial Intelligence Technology in Information Services", Electronic Technology & Software Engineering, no. 05 *

Also Published As

Publication number Publication date
CN112734013B (en) 2024-06-21

Similar Documents

Publication Publication Date Title
CN112101530B (en) Neural network training method, device, equipment and storage medium
JP6610278B2 (en) Machine learning apparatus, machine learning method, and machine learning program
CN111047563B (en) Neural network construction method applied to medical ultrasonic image
KR20160037022A (en) Apparatus for data classification based on boost pooling neural network, and method for training the appatratus
CN111079780A (en) Training method of space map convolution network, electronic device and storage medium
CN110995487A (en) Multi-service quality prediction method and device, computer equipment and readable storage medium
EP4287144A1 (en) Video behavior recognition method and apparatus, and computer device and storage medium
CN109284860A (en) A kind of prediction technique based on orthogonal reversed cup ascidian optimization algorithm
CN109558898B (en) Multi-choice learning method with high confidence based on deep neural network
Yue et al. Effective, efficient and robust neural architecture search
CN110738362A (en) method for constructing prediction model based on improved multivariate cosmic algorithm
CN109886343A (en) Image classification method and device, equipment, storage medium
CN112215269A (en) Model construction method and device for target detection and neural network architecture
CN113222014A (en) Image classification model training method and device, computer equipment and storage medium
CN113705724B (en) Batch learning method of deep neural network based on self-adaptive L-BFGS algorithm
CN114511083A (en) Model training method and device, storage medium and electronic device
CN117853746A (en) Network model for target detection, model training method and model deployment method
CN112734013B (en) Image processing method, device, electronic equipment and storage medium
CN116861985A (en) Neural network branch pruning subnet searching method based on convolutional layer relative information entropy
CN116668351A (en) Quality of service prediction method, device, computer equipment and storage medium
CN113743448B (en) Model training data acquisition method, model training method and device
CN113065641B (en) Neural network model training method and device, electronic equipment and storage medium
CN115345303A (en) Convolutional neural network weight tuning method, device, storage medium and electronic equipment
US20230086727A1 (en) Method and information processing apparatus that perform transfer learning while suppressing occurrence of catastrophic forgetting
CN114758191A (en) Image identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant