CN110659726B - Image processing method and device, electronic equipment and storage medium - Google Patents

Image processing method and device, electronic equipment and storage medium

Info

Publication number
CN110659726B
CN110659726B (application CN201910906732.XA)
Authority
CN
China
Prior art keywords
branch
data
image
processing
image feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910906732.XA
Other languages
Chinese (zh)
Other versions
CN110659726A (en)
Inventor
张�雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910906732.XA
Publication of CN110659726A
Application granted
Publication of CN110659726B

Classifications

    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/50 Depth or shape recovery
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to an image processing method, an image processing apparatus, an electronic device and a storage medium. In the image processing method, a neural network model performs image processing on an image to be processed to obtain image processing results of multiple processing types. The neural network model includes multiple processing layers; each processing layer comprises a shared convolutional network and a plurality of branch networks, and each non-final processing layer further comprises a feature aggregation network; the plurality of branch networks correspond to the plurality of processing types one to one. According to the image processing method provided by the embodiments of the disclosure, when a single neural network model performs image processing of multiple processing types on an image, the accuracy of the image processing results of the multiple processing types can be improved; moreover, because the image processing results determined by the neural network model are more accurate, the model can converge quickly during its training stage, which improves the learning efficiency of the neural network model.

Description

Image processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
A neural network is an algorithmic model that simulates the structure and function of biological neural networks, and it has been widely applied in the field of image processing. In practice, there is often a need to perform image processing of multiple processing types on an image using a single neural network model.
In the related art, a method for performing image processing of multiple processing types on an image using a neural network model includes: inputting the image into a pre-trained neural network model to obtain image processing results of the image for the multiple processing types. The neural network model comprises a plurality of branch networks; each branch network extracts, from the image, image feature data related to its corresponding processing type and determines an image processing result of that processing type using the extracted image feature data. In this way, the plurality of branch networks obtain image processing results of multiple processing types.
Since the image feature data extracted by the branch networks all come from the same image, the image feature data are correlated. However, in the neural network model used in the related art, each branch network determines the image processing result of one processing type of the image to be processed using only the image feature data it extracts alone, ignoring the correlation among the image feature data of the whole image, which may make the determined image processing results of the multiple processing types inaccurate.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a storage medium to improve accuracy of image processing results of a plurality of processing types when image processing of a plurality of processing types is performed on an image using one neural network model. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an image processing method, including:
acquiring an image to be processed;
inputting the image to be processed into a pre-trained neural network model to obtain image processing results of the image to be processed in multiple processing types;
the neural network model comprises a plurality of processing layers, and each processing layer comprises: a shared convolutional network and a plurality of branch networks, and each non-final processing layer further comprises a feature aggregation network; the plurality of branch networks correspond to the plurality of processing types one to one;
in each non-last processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolution network and the plurality of branch networks, aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data, and outputting the aggregated image feature data to a next processing layer;
in the last stage processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolutional network and the plurality of branch networks, and determining an image processing result of the corresponding processing type by each branch network based on the extracted branch image feature data;
the data input into the first-stage processing layer is image characteristic data extracted from the image to be processed, and the data input into each non-first-stage processing layer is aggregated image characteristic data output by the previous-stage processing layer.
Optionally, the plurality of branch networks included in each processing layer is a group of branch networks with the same network structure but different model parameters; the shared convolutional networks included in the processing layers have the same network structure but different convolution parameters.
Optionally, in each processing layer, extracting, by the shared convolutional network and the plurality of branch networks, branch image feature data of a processing type corresponding to each branch network from data input to the processing layer includes:
extracting, in each processing layer, high-level image feature data from data input to the processing layer by the shared convolutional network; extracting branch image feature data of corresponding processing types from the high-level image feature data by each branch network;
wherein the high-level image feature data is: image feature data including branch image feature data of a processing type corresponding to each of the branch networks.
Optionally, the branch image feature data is a multi-dimensional matrix;
correspondingly, aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data includes the following steps:
calculating a descriptor of each extracted branch image feature data based on the dimension of the target data to be aggregated;
aggregating the extracted branch image feature data based on each calculated descriptor according to an aggregation mode defined in the feature aggregation network to obtain aggregated image feature data;
the descriptor is a matrix with the same data dimensions as the branch image feature data; in the descriptor, data belonging to the target data dimension is equal to the data of the corresponding data dimension in the branch image feature data, and data not belonging to the target data dimension either retains its original value or is pooled.
Optionally, the calculating a descriptor of each extracted branch image feature data based on the target data dimension to be aggregated includes:
performing matrix summation on each extracted branch image characteristic data according to the target data dimension to obtain a summation matrix;
sequentially carrying out pooling processing and convolution operation on the summation matrix to obtain a descriptor of each extracted branch image characteristic data;
wherein the pooling is performed on data which does not belong to the target data dimension; the convolution operation is a convolution operation of data about the target data dimension.
Optionally, the aggregating, based on each of the calculated descriptors, the extracted branch image feature data according to an aggregation mode defined in the feature aggregation network to obtain aggregated image feature data includes:
inputting the calculated descriptors into a preset weight calculation function, and calculating the weight corresponding to each descriptor;
taking the weight corresponding to each descriptor as the weight of the branch image feature data corresponding to the descriptor;
and calculating weighted image characteristic data according to the extracted branch image characteristic data and the weight of each branch image characteristic data, wherein the weighted image characteristic data is used as aggregated image characteristic data obtained by aggregation.
Optionally, the weight calculation function includes the normalized exponential function Softmax.
Optionally, the image processing results of the plurality of processing types include a plurality of the following image processing results:
an object segmentation result, an object pose estimation result, an object detection result, a scene segmentation result, and a scene depth estimation result.
According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:
an acquisition module configured to acquire an image to be processed;
the image processing module is configured to input the image to be processed to a pre-trained neural network model to obtain image processing results of multiple processing types of the image to be processed;
wherein the neural network model comprises a plurality of processing layers, wherein each processing layer comprises: a shared convolutional network and a plurality of branch networks, and each non-final processing layer further comprises a feature aggregation network; the plurality of branch networks correspond to the plurality of processing types one to one;
in each non-last processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolution network and the plurality of branch networks, aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data, and outputting the aggregated image feature data to a next processing layer;
in the last stage processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolutional network and the plurality of branch networks, and determining an image processing result of the corresponding processing type by each branch network based on the extracted branch image feature data;
the data input into the first-stage processing layer is image characteristic data extracted from the image to be processed, and the data input into each non-first-stage processing layer is aggregated image characteristic data output by the previous-stage processing layer.
Optionally, the plurality of branch networks included in each processing layer is a group of branch networks with the same network structure but different model parameters; the shared convolutional networks included in the processing layers have the same network structure but different convolution parameters.
Optionally, in each processing layer, extracting, by the shared convolutional network and the plurality of branch networks, branch image feature data of a processing type corresponding to each branch network from data input to the processing layer includes:
extracting, in each processing layer, high-level image feature data from data input to the processing layer by the shared convolutional network; extracting branch image feature data of corresponding processing types from the high-level image feature data by each branch network;
wherein the high-level image feature data is: image feature data including branch image feature data of a processing type corresponding to each of the branch networks.
Optionally, the branch image feature data is a multi-dimensional matrix;
correspondingly, aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data includes the following steps:
calculating a descriptor of each extracted branch image feature data based on the dimension of the target data to be aggregated;
aggregating the extracted branch image feature data according to the aggregation mode defined in the feature aggregation network based on the calculated descriptors to obtain aggregated image feature data;
the descriptor is a matrix with the same data dimensions as the branch image feature data; in the descriptor, data belonging to the target data dimension is equal to the data of the corresponding data dimension in the branch image feature data, and data not belonging to the target data dimension either retains its original value or is pooled.
Optionally, the calculating a descriptor of each extracted branch image feature data based on the target data dimension to be aggregated includes:
performing matrix summation on each extracted branch image characteristic data according to the target data dimension to obtain a summation matrix;
sequentially carrying out pooling processing and convolution operation on the summation matrix to obtain a descriptor of each extracted branch image characteristic data;
wherein the pooling is performed on data which does not belong to the target data dimension; the convolution operation is a convolution operation of data about the target data dimension.
Optionally, the aggregating, based on each of the calculated descriptors, the extracted branch image feature data according to an aggregation mode defined in the feature aggregation network to obtain aggregated image feature data includes:
inputting the calculated descriptors into a preset weight calculation function, and calculating the weight corresponding to each descriptor;
taking the weight corresponding to each descriptor as the weight of the branch image feature data corresponding to the descriptor;
and calculating weighted image characteristic data according to the extracted branch image characteristic data and the weight of each branch image characteristic data, wherein the weighted image characteristic data is used as aggregated image characteristic data obtained by aggregation.
Optionally, the weight calculation function includes the normalized exponential function Softmax.
Optionally, the image processing results of the plurality of processing types include a plurality of the following image processing results:
an object segmentation result, an object pose estimation result, an object detection result, a scene segmentation result, and a scene depth estimation result.
According to a third aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any of the image processing methods described above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the image processing method described in any one of the above is implemented.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product which, when run on a computer, causes the computer to perform any of the image processing methods described above.
The technical scheme provided by the embodiment of the disclosure at least has the following beneficial effects:
in the image processing method provided by the embodiments of the disclosure, the neural network model used to perform image processing of multiple processing types on the image to be processed comprises multiple processing layers. In each non-final processing layer, the shared convolutional network and the plurality of branch networks extract, from the data input to that layer, branch image feature data of the processing type corresponding to each branch network; the feature aggregation network then aggregates the extracted branch image feature data to obtain aggregated image feature data, and outputs the aggregated image feature data to the next-stage processing layer. In the final processing layer, the shared convolutional network and the plurality of branch networks extract branch image feature data of the processing type corresponding to each branch network from the data input to that layer, and each branch network determines an image processing result of its corresponding processing type based on the extracted branch image feature data. Because the aggregated image feature data produced by the feature aggregation network synthesizes the branch image feature data extracted by all of the branch networks, the branch image feature data that each branch network subsequently extracts from the aggregated image feature data is more accurate; in this way, in the final processing layer, each branch network determines the image processing result of its corresponding type from branch image feature data extracted from aggregated image feature data, which improves the accuracy of that result. In addition, because the image processing results determined by the neural network model are more accurate, the neural network model can converge quickly during its training stage, which improves the learning efficiency of the neural network model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating an image processing method according to an exemplary embodiment.
FIG. 2 is a schematic diagram illustrating a structure of a neural network model for image processing, according to an example embodiment.
FIG. 3 is a schematic diagram illustrating the structure of another neural network model for image processing, according to an example embodiment.
Fig. 4 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Fig. 6 is a block diagram illustrating an apparatus for image processing according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating another apparatus for image processing according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In order to improve the accuracy of image processing results of multiple processing types when one neural network model is used for performing image processing of multiple processing types on an image, embodiments of the present disclosure provide an image processing method, an image processing apparatus, an electronic device, and a storage medium.
An execution subject of an image processing method provided by the embodiment of the disclosure may be an image processing apparatus, and the apparatus may be applied to an electronic device. In a specific application, the electronic device may be a smart phone, a computer, a monitoring device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant, etc.
Fig. 1 is a flow chart illustrating an image processing method according to an exemplary embodiment, which may include the following steps, as shown in fig. 1.
S11: and acquiring an image to be processed.
Here, the image to be processed may be a single picture or a group of pictures. In addition, it is understood that a video is composed of multiple frames of images; therefore, in this step, the image to be processed may also be a video.
S12: and inputting the image to be processed into a pre-trained neural network model to obtain image processing results of the image to be processed in multiple processing types.
The neural network model can be obtained by training based on the sample image and the labeling information of the sample image. Here, the annotation information of the sample image may include: and image processing results of multiple processing types corresponding to the sample images are labeled in advance.
In a particular application, the plurality of processing types described above may include a plurality of processing types among object segmentation, pose estimation of objects, object detection, scene segmentation, and depth estimation of a scene.
The object refers to a human object and/or an animal object in the image to be processed, and the scene refers to the scene in the image to be processed. Object segmentation belongs to image segmentation technology and specifically refers to extracting, from an image, the image data of the area where an object is located. Object pose estimation is an image processing method for detecting the position, direction and scale information of each part of an object from an image. Object detection is an image processing method for detecting the coordinate position of an object and various characteristics of the object from an image; here, the various characteristics of the object include, but are not limited to, sex, age, apparel category, apparel color, worn accessories, and behavioral actions. Scene segmentation also belongs to image segmentation technology. Depth estimation of a scene is an image processing method that extracts image feature data from an image and estimates the depth of the scene in the image using the extracted image feature data.
Accordingly, the image processing results of the above-described plural processing types may include plural kinds of the following image processing results:
an object segmentation result, an object pose estimation result, an object detection result, a scene segmentation result, and a scene depth estimation result.
In this step, the neural network model used is a model for performing image processing of multiple processing types on an image. In order to improve the accuracy of image processing of multiple processing types, the neural network model used in the embodiments of the present disclosure includes multiple processing layers; each processing layer includes a shared convolutional network and a plurality of branch networks, and each non-final processing layer further includes a feature aggregation network. Here, the plurality of branch networks correspond to the multiple processing types one to one, and the plurality of branch networks included in each processing layer is a group of branch networks having the same network structure.
In each non-final processing layer, extracting branch image characteristic data of a processing type corresponding to each branch network from data input to the processing layer by using a shared convolution network and a plurality of branch networks, aggregating the extracted branch image characteristic data by using a characteristic aggregation network to obtain aggregated image characteristic data, and outputting the aggregated image characteristic data to the next processing layer;
in the last stage processing layer, branch image feature data of a processing type corresponding to each branch network is extracted from data input to the processing layer by a shared convolution network and a plurality of branch networks, and an image processing result of the corresponding processing type is determined by each branch network based on the extracted branch image feature data.
In the neural network model, the data input to the first-stage processing layer is the image feature data extracted from the image to be processed, and the data input to each non-first-stage processing layer is the aggregated image feature data output by the previous-stage processing layer.
In addition, the shared convolutional network in each processing layer serves to extract high-level image feature data from the data input to that layer; here, the high-level image feature data is image feature data that includes the branch image feature data of the processing type corresponding to each branch network. Therefore, in each processing layer, extracting, by the shared convolutional network and the plurality of branch networks, branch image feature data of the processing type corresponding to each branch network from the data input to the processing layer may include:
extracting high-level image feature data from data input to each processing layer by a shared convolutional network in each processing layer; and extracting branch image characteristic data corresponding to the processing type from the high-level image characteristic data by each branch network.
It can be understood that the image processing results of the multiple processing types determined by the neural network model and the annotation information corresponding to the sample image may be in a one-to-one correspondence relationship. For example, assuming that the sample image is a picture, the annotation information of the sample image includes: the width of the picture, the length of the picture and the classification category corresponding to the picture; then, when an image is processed by using the neural network model, the determined image processing result may include: the width of the picture, the length of the picture and the classification category corresponding to the picture.
It can be understood that, after the branch networks extract image feature data, the branch image feature data extracted by different branch networks for the same kind of image feature may deviate from one another; this makes the branch image feature data for the same kind of image feature non-unique and introduces error. After the feature aggregation network aggregates the branch image feature data extracted by the branch networks, the resulting aggregated image feature data is unique, and the error is canceled out. Thus, as the processing layers progress, the accuracy of the aggregated image feature data output by each processing layer gradually improves; in the final processing layer, each branch network can then determine the image processing result of its corresponding processing type based on the extracted branch image feature data, so the image processing result is more accurate.
For clarity of the scheme, fig. 2 schematically shows a structural diagram of a neural network model for image processing. As shown in fig. 2, the neural network model includes three processing layers: a first-stage processing layer 10, a second-stage processing layer 20, and a last-stage processing layer 30. The first-stage processing layer 10 includes the shared convolutional network 110, the branch network 1201, the branch network 1202, the branch network 1203, and the feature aggregation network 130; the second-stage processing layer 20 includes the shared convolutional network 210, the branch network 2201, the branch network 2202, the branch network 2203, and the feature aggregation network 230; the last-stage processing layer 30 includes the shared convolutional network 310, the branch network 3201, the branch network 3202, and the branch network 3203.
In fig. 2, the shared convolutional network in each processing layer is respectively connected to each branch network in the processing layer; each branch network in each stage of processing layer is respectively connected with the feature aggregation network in the same processing layer; and the characteristic aggregation network of the previous stage is connected with the shared convolution network of the next stage, and so on.
The shared convolution network 110 in the first-stage processing layer 10 extracts high-level image feature data from the image to be processed; each branch network in the first-stage processing layer 10 extracts branch image feature data of a corresponding processing type from the high-level image feature data; the feature aggregation network 130 in the first-stage processing layer 10 aggregates the extracted feature data of each branch image to obtain aggregated image feature data, and sends the aggregated image feature data to the second-stage processing layer 20; the shared convolution network 210 in the second-stage processing layer 20 extracts high-level image feature data from the aggregated image feature data input by the feature aggregation network 130 in the first-stage processing layer 10; each branch network in the second-level processing layer 20 extracts branch image feature data of a corresponding processing type from the high-level image feature data; the feature aggregation network 230 in the second-stage processing layer 20 aggregates the extracted feature data of each branch image to obtain aggregated image feature data, and sends the aggregated image feature data to the last-stage processing layer 30; the shared convolutional network 310 in the last processing layer 30 extracts high-level image feature data from the aggregated image feature data input by the feature aggregation network 230 in the second processing layer 20; each branch network in the last-stage processing layer 30 extracts branch image feature data of a corresponding processing type from the high-level image feature data, and determines an image processing result of the corresponding processing type based on the branch image feature data extracted by each branch network.
It should be noted that the number of processing layers in the neural network model shown in fig. 2 is only an example and does not limit the embodiments of the present disclosure. In order to further improve the accuracy of the image processing results of the multiple processing types, the neural network model used in the image processing method provided by the embodiments of the disclosure may further increase the number of second-stage processing layers 20 on the basis of the structure shown in fig. 2.
It can be understood that, for the purpose of improving the accuracy of the image processing results of multiple processing types, the neural network model used in the image processing method provided by the embodiments of the disclosure includes at least two processing layers. When two processing layers are included, the first processing layer includes a shared convolutional network, a plurality of branch networks, and a feature aggregation network, and the last processing layer includes a shared convolutional network and a plurality of branch networks. Fig. 3 is a schematic structural diagram of such an exemplary neural network model for image processing. As shown in fig. 3, the neural network model includes two processing layers; the first-stage processing layer 10 includes the shared convolutional network 110, the branch network 1201, the branch network 1202, the branch network 1203, and the feature aggregation network 130; the last-stage processing layer 30 includes the shared convolutional network 310, the branch network 3201, the branch network 3202, and the branch network 3203. A code sketch of this two-level structure follows.
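To make the structure concrete, the following is a minimal PyTorch-style sketch of a fig. 3-style model: one non-final processing layer whose branch outputs are aggregated, followed by a final processing layer whose branches feed per-type output heads. All class names, channel sizes, and the placeholder mean-based aggregation are illustrative assumptions rather than the patented implementation; the descriptor-and-softmax aggregation described later is sketched separately.

    import torch
    import torch.nn as nn

    class ProcessingLayer(nn.Module):
        # One processing layer: a shared convolutional network whose
        # high-level features feed several branch networks (one branch
        # per processing type).
        def __init__(self, in_ch, feat_ch, num_branches):
            super().__init__()
            self.shared = nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            )
            # Branches share a structure but learn separate parameters.
            self.branches = nn.ModuleList(
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
                for _ in range(num_branches)
            )

        def forward(self, x):
            high = self.shared(x)  # high-level image feature data
            return [branch(high) for branch in self.branches]

    class TwoLevelModel(nn.Module):
        # Fig. 3-style model: first-stage layer plus aggregation, then a
        # final layer whose branch outputs go through per-type heads.
        def __init__(self, num_branches=3, feat_ch=64):
            super().__init__()
            self.layer1 = ProcessingLayer(3, feat_ch, num_branches)
            self.layer_last = ProcessingLayer(feat_ch, feat_ch, num_branches)
            # Illustrative heads: one single-channel map per processing type.
            self.heads = nn.ModuleList(
                nn.Conv2d(feat_ch, 1, 1) for _ in range(num_branches)
            )

        def forward(self, image):
            feats = self.layer1(image)            # branch features, level 1
            # Placeholder aggregation: element-wise mean of branch features.
            agg = torch.stack(feats).mean(dim=0)  # aggregated feature data
            feats = self.layer_last(agg)          # branch features, last level
            return [head(f) for head, f in zip(self.heads, feats)]

    model = TwoLevelModel()
    results = model(torch.randn(1, 3, 128, 128))  # one result per type

Adding more middle layers, each with its own feature aggregation network as in fig. 2, would simply insert additional layer-plus-aggregation stages between layer1 and layer_last.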
In the image processing method provided by the embodiments of the disclosure, the neural network model used to perform image processing of multiple processing types on the image to be processed comprises multiple processing layers. In each non-final processing layer, the shared convolutional network and the plurality of branch networks extract, from the data input to that layer, branch image feature data of the processing type corresponding to each branch network; the feature aggregation network then aggregates the extracted branch image feature data to obtain aggregated image feature data, and outputs the aggregated image feature data to the next-stage processing layer. In the final processing layer, the shared convolutional network and the plurality of branch networks extract branch image feature data of the processing type corresponding to each branch network from the data input to that layer, and each branch network determines an image processing result of its corresponding processing type based on the extracted branch image feature data. Because the aggregated image feature data produced by the feature aggregation network synthesizes the branch image feature data extracted by all of the branch networks, the branch image feature data that each branch network subsequently extracts from the aggregated image feature data is more accurate; in this way, in the final processing layer, each branch network determines the image processing result of its corresponding type from branch image feature data extracted from aggregated image feature data, which improves the accuracy of that result. In addition, because the image processing results determined by the neural network model are more accurate, the neural network model can converge quickly during its training stage, which improves the learning efficiency of the neural network model.
For clarity of the scheme and clear layout, a specific implementation manner of aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data is exemplarily described below.
Illustratively, in one implementation, the branch image feature data extracted by the branch network is a multi-dimensional matrix;
correspondingly, the aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data may include:
calculating a descriptor of the extracted feature data of each branch image based on the dimension of the target data to be aggregated;
aggregating the extracted branch image feature data according to the aggregation mode defined in the feature aggregation network based on the calculated descriptors to obtain aggregated image feature data;
the descriptor is a matrix with the same data dimensions as the branch image feature data; in the descriptor, data belonging to the target data dimension is equal to the data of the corresponding data dimension in the branch image feature data, and data not belonging to the target data dimension either retains its original value or is pooled.
It can be understood that the descriptor has a role in describing data belonging to a target data dimension in the branch image feature data.
In this implementation, the dimensions of the target data to be aggregated by the feature aggregation networks in different processing layers may be the same or different; moreover, the aggregation modes defined by the feature aggregation networks in different processing layers may be the same or different. For example, assume that the branch image feature data is a multidimensional matrix of A × B × C × D; then, in the neural network model shown in fig. 2, in the feature aggregation network 130 in the first-stage processing layer 10, the target data dimension may be dimension A, and the defined aggregation mode may be aggregation mode x; in the feature aggregation network 230 in the second-stage processing layer 20, the target data dimensions may be dimension B and dimension C, and the defined aggregation manner may be aggregation manner y. Alternatively, in the feature aggregation network 130 in the first-stage processing layer 10, the target data dimensions may be dimension A and dimension B, and the defined aggregation mode may be aggregation mode x; and in the feature aggregation network 230 in the second-stage processing layer 20, the target data dimensions may also be dimension A and dimension B, and the defined aggregation manner may also be aggregation mode x.
It can be understood that when the dimensions of target data to be aggregated by the feature aggregation networks in different processing layers are different, the network structures of the feature aggregation networks in different processing layers are different; moreover, when aggregation modes defined by feature aggregation networks in different processing layers are different, network structures of the feature aggregation networks in different processing layers can also be different.
In the first-stage processing layer, the shared convolutional network extracts high-level image features from the image to be processed; in a non-first-stage processing layer, the shared convolutional network extracts high-level image features from the aggregated image feature data output by the previous-stage processing layer. Therefore, when the network structures of the feature aggregation networks included in different processing layers differ, the convolution parameters related to extracting high-level image feature data in the shared convolutional networks of those layers also differ. Accordingly, since the plurality of branch networks extract branch image features from the high-level image features extracted by the shared convolutional network and input the extracted branch image features to the feature aggregation network, when the network structures of the feature aggregation networks in different processing layers differ, the model parameters of the plurality of branch networks in different processing layers also differ. Thus, in the neural network model, the plurality of branch networks included in each processing layer may be a group of branch networks with the same network structure but different model parameters, and the shared convolutional networks included in the processing layers may have the same network structure but different convolution parameters.
In addition, even when the network structures of the feature aggregation networks included in different processing layers are the same, the processing layers sit at different levels of the hierarchy; as a result, the shared convolutional networks at different levels may still have convolution parameters that differ with the level, and the branch networks at different levels may still have model parameters that differ with the level.
In this implementation manner, calculating the descriptor of the extracted feature data of each branch image based on the dimension of the target data to be aggregated may include:
performing matrix summation on the extracted feature data of each branch image according to the dimension of target data to obtain a summation matrix;
performing pooling processing and convolution operation on the summation matrix in sequence to obtain a descriptor of the extracted feature data of each branch image;
wherein the pooling is performed on data which do not belong to the target data dimension; the convolution operation is a convolution operation of data with respect to a target data dimension.
For example, assume that the branch image feature data extracted by the branch networks are three N×C×H×W multidimensional matrices, where dimension N represents the number of pictures, dimension C represents the number of channels of the pictures, dimension H represents the height of the pictures, and dimension W represents the width of the pictures. Assume further that the target data dimension to be aggregated is dimension C, dimension N is data whose original value needs to be retained, and dimension H and dimension W are data that need to be pooled. Then the process of computing the descriptor of each N×C×H×W matrix may be: performing matrix summation on the three N×C×H×W matrices according to dimension C, that is, concatenating them along dimension C, to obtain N×3C×H×W; then pooling dimension H and dimension W in N×3C×H×W to 1, obtaining N×3C×1×1; then, taking 3C×1×1 as the convolution kernel, performing a convolution operation on N×3C×1×1 to obtain 3×N×C×1×1; and splitting 3×N×C×1×1 into three N×C×1×1 matrices. These three N×C×1×1 matrices are the descriptors of the three N×C×H×W branch image feature data, respectively. It can be understood that, in each of the three N×C×1×1 descriptors, the data of dimension N and dimension C are equal to the data of dimension N and dimension C in the corresponding N×C×H×W matrix; dimension W and dimension H are pooled to 1, and dimension N retains its original value.
When pooling data that needs to be pooled, the pooling value may be 1, or may be any other preset pooling value, which is not limited in the embodiment of the present disclosure.
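Under the assumptions of the worked example above (three N×C×H×W branch feature matrices, target dimension C, dimension N retained, dimensions H and W pooled to 1), the descriptor computation might look as follows in PyTorch; reading the "3C×1×1 convolution kernel" step as a 1×1 convolution with 3C input and 3C output channels is an interpretive assumption.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N, C, H, W = 4, 16, 32, 32
    branch_feats = [torch.randn(N, C, H, W) for _ in range(3)]

    # "Matrix summation according to dimension C": concatenate the three
    # N x C x H x W matrices along the channel axis -> N x 3C x H x W.
    summed = torch.cat(branch_feats, dim=1)

    # Pool dimensions H and W down to 1 (dimension N keeps its original
    # value) -> N x 3C x 1 x 1.
    pooled = F.adaptive_avg_pool2d(summed, output_size=1)

    # Convolution over the target dimension: 1x1 conv from 3C channels to
    # 3C channels -> N x 3C x 1 x 1, then split into three N x C x 1 x 1
    # descriptors, one per branch.
    conv = nn.Conv2d(3 * C, 3 * C, kernel_size=1)
    descriptors = torch.chunk(conv(pooled), chunks=3, dim=1)

    for d in descriptors:
        assert d.shape == (N, C, 1, 1)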
In addition, there are various specific implementations of aggregating the extracted branch image feature data according to the aggregation mode defined in the feature aggregation network based on the calculated descriptors to obtain aggregated image feature data. For example, in one implementation, this aggregation may include:
inputting the calculated descriptors into a preset weight calculation function, and calculating the weight corresponding to each descriptor;
taking the weight corresponding to each descriptor as the weight of the branch image characteristic data corresponding to the descriptor;
and calculating weighted image feature data according to the extracted branch image feature data and the weight of each branch image feature data, wherein the weighted image feature data are used as aggregated image feature data obtained by aggregation.
The weight calculation function may include, but is not limited to, the normalized exponential function Softmax.
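Continuing the sketch, the softmax-based weighting described above could be realized as follows; normalizing the three descriptors against one another at each (N, C) position is an assumption about the otherwise unspecified weight calculation function.

    import torch

    def aggregate_with_softmax(branch_feats, descriptors):
        # branch_feats: three N x C x H x W tensors; descriptors: the
        # matching three N x C x 1 x 1 descriptors computed above.
        stacked_desc = torch.stack(descriptors, dim=0)    # 3 x N x C x 1 x 1
        # Softmax over the branch axis: the weights of corresponding
        # positions across the three descriptors sum to 1.
        weights = torch.softmax(stacked_desc, dim=0)
        stacked_feats = torch.stack(branch_feats, dim=0)  # 3 x N x C x H x W
        # Weighted sum over branches; broadcasting expands 1 x 1 to H x W.
        return (weights * stacked_feats).sum(dim=0)       # N x C x H x W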
The above specific implementation in which the feature aggregation network aggregates the extracted branch image feature data to obtain aggregated image feature data is merely an example and should not be construed as limiting the embodiments of the present disclosure. Any aggregation mode that can aggregate image feature data such that image processing results of multiple processing types can be determined based on the aggregated image feature data, for example directly summing the branch image feature data, is applicable to the embodiments of the present disclosure.
For clarity of the scheme, the training process of the neural network model described above is illustrated below. Illustratively, the training process of the neural network model may include:
acquiring a plurality of sample images and the annotation information of each sample image; here, the annotation information of a sample image may include: pre-labeled image processing results of the multiple processing types corresponding to the sample image;
respectively inputting each sample image into the neural network model in training to obtain image processing results of the multiple processing types for the sample image;
judging whether the neural network model in training has converged based on the difference between the obtained image processing results of the multiple processing types and the annotation information of the sample images; and, if it has converged, ending the training to obtain the trained neural network model.
There are various specific implementations of judging, based on this difference, whether the neural network model in training has converged.
For example, in one implementation, in the neural network model under training, the plurality of branch networks included in each processing layer may output a group of image processing results of the multiple processing types based on the extracted branch image feature data;
correspondingly, judging whether the neural network model in training has converged based on the difference between the obtained image processing results of the multiple processing types and the annotation information of the sample image may include:
for each processing layer, calculating a branch loss value corresponding to the processing layer based on the difference between a group of image processing results output by a plurality of branch networks included in the processing layer and the labeling information of the sample image;
and judging whether the neural network model in training converges or not based on the calculated branch loss values.
During judgment, when each branch loss value is smaller than a corresponding preset threshold value, judging that the neural network model in training converges; or, the calculated branch loss values may be added to obtain a total loss value, and when the total loss value is smaller than a preset total loss value threshold, it is determined that the neural network model in training converges.
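As a schematic of this convergence test, assume a model whose forward pass returns, for each processing layer, the group of image processing results output by its branch networks; the L1 loss and the threshold logic below are illustrative choices, since the concrete loss function is left open here.

    import torch
    import torch.nn.functional as F

    def branch_losses(per_layer_outputs, labels):
        # per_layer_outputs: for each processing layer, the group of image
        # processing results output by its branch networks; labels: the
        # pre-labeled results, one per processing type.
        losses = []
        for outputs in per_layer_outputs:
            layer_loss = sum(
                F.l1_loss(out, lab) for out, lab in zip(outputs, labels)
            )
            losses.append(layer_loss)
        return losses

    def has_converged(losses, per_loss_thresholds=None, total_threshold=None):
        # Variant 1: every branch loss below its own preset threshold.
        if per_loss_thresholds is not None:
            return all(
                l.item() < t for l, t in zip(losses, per_loss_thresholds)
            )
        # Variant 2: the summed total loss below a preset total threshold.
        return sum(losses).item() < total_threshold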
It is to be understood that, in addition to extracting branch image feature data, the plurality of branch networks included in each processing layer may determine a group of image processing results of the multiple processing types based on the extracted branch image feature data. However, in the embodiments of the present disclosure, in order to obtain accurate image processing results of the multiple processing types, the image processing results determined by the plurality of branch networks included in the final processing layer are the ones used. In this way, the neural network model used in the embodiments of the disclosure refines the image processing results from coarse to fine, so that more precise image processing results of the multiple processing types can be obtained.
Corresponding to the image processing method described above, an embodiment of the present disclosure further provides an image processing apparatus. As shown in fig. 4, the apparatus includes an acquisition module 401 and an image processing module 402;
an acquisition module 401 configured to acquire an image to be processed;
an image processing module 402, configured to input the image to be processed to a pre-trained neural network model, so as to obtain image processing results of multiple processing types of the image to be processed;
wherein the neural network model comprises a plurality of processing layers, wherein each processing layer comprises: a shared convolutional network and a plurality of branch networks, and each non-final processing layer further comprises a feature aggregation network; the plurality of branch networks correspond to the plurality of processing types one to one;
in each non-last processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolution network and the plurality of branch networks, aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data, and outputting the aggregated image feature data to a next processing layer;
in the last stage processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolutional network and the plurality of branch networks, and determining an image processing result of the corresponding processing type by each branch network based on the extracted branch image feature data;
the data input into the first-stage processing layer is image characteristic data extracted from the image to be processed, and the data input into each non-first-stage processing layer is aggregated image characteristic data output by the previous-stage processing layer.
Optionally, the plurality of branch networks included in each processing layer is a group of branch networks with the same network structure but different model parameters; the shared convolutional networks included in the processing layers have the same network structure but different convolution parameters.
Optionally, in each processing layer, extracting, by the shared convolutional network and the plurality of branch networks, branch image feature data of a processing type corresponding to each branch network from data input to the processing layer includes:
extracting, in each processing layer, high-level image feature data from data input to the processing layer by the shared convolutional network; extracting branch image feature data of corresponding processing types from the high-level image feature data by each branch network;
wherein the high-level image feature data is: image feature data including branch image feature data of a processing type corresponding to each of the branch networks.
Optionally, the branch image feature data is a multi-dimensional matrix;
correspondingly, aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data includes the following steps:
calculating a descriptor of each extracted branch image feature data based on the dimension of the target data to be aggregated;
aggregating the extracted branch image feature data based on each calculated descriptor according to an aggregation mode defined in the feature aggregation network to obtain aggregated image feature data;
the descriptor is a matrix with the same data dimensions as the branch image feature data; in the descriptor, data belonging to the target data dimension is equal to the data of the corresponding data dimension in the branch image feature data, and data not belonging to the target data dimension either retains its original value or is pooled.
Optionally, the calculating a descriptor of each extracted branch image feature data based on the target data dimension to be aggregated includes:
performing matrix summation on each extracted branch image characteristic data according to the target data dimension to obtain a summation matrix;
sequentially carrying out pooling processing and convolution operation on the summation matrix to obtain a descriptor of each extracted branch image characteristic data;
wherein the pooling is performed on data which does not belong to the target data dimension; the convolution operation is a convolution operation of data about the target data dimension.
Optionally, the aggregating, based on each of the calculated descriptors, the extracted branch image feature data according to an aggregation mode defined in the feature aggregation network to obtain aggregated image feature data includes:
inputting the calculated descriptors into a preset weight calculation function, and calculating the weight corresponding to each descriptor;
taking the weight corresponding to each descriptor as the weight of the branch image feature data corresponding to the descriptor;
and calculating weighted image characteristic data according to the extracted branch image characteristic data and the weight of each branch image characteristic data, wherein the weighted image characteristic data is used as aggregated image characteristic data obtained by aggregation.
Optionally, the weight calculation function includes a normalized exponential function (Softmax function).
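For illustration, a minimal sketch of the softmax-weighted aggregation, continuing the hypothetical DescriptorNet example above; applying the softmax channel-wise across branches is an assumption consistent with the weighting described here:

import torch

def aggregate(branch_feats: list[torch.Tensor],
              descriptors: list[torch.Tensor]) -> torch.Tensor:
    # Stack descriptors along a new branch axis: B x N x C x 1 x 1.
    desc = torch.stack(descriptors, dim=0)
    # Softmax across the branch axis turns the descriptors into weights
    # that sum to 1 for each channel.
    weights = torch.softmax(desc, dim=0)
    feats = torch.stack(branch_feats, dim=0)  # B x N x C x H x W
    # Weighted sum of branch image feature data = aggregated image feature data.
    return (weights * feats).sum(dim=0)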
Optionally, the image processing results of the plurality of processing types include a plurality of the following image processing results:
an object segmentation result, an object pose estimation result, an object detection result, a scene segmentation result, and a scene depth estimation result.
According to the image processing apparatus provided by the embodiments of the present disclosure, an image to be processed is processed by a neural network model for multiple processing types, the model comprising multiple stages of processing layers. In each non-last-stage processing layer, the shared convolutional network and the plurality of branch networks extract branch image feature data of the processing type corresponding to each branch network from the data input to the layer, the feature aggregation network aggregates the extracted branch image feature data to obtain aggregated image feature data, and the aggregated image feature data is output to the next-stage processing layer. In the last-stage processing layer, the shared convolutional network and the plurality of branch networks extract branch image feature data of the processing type corresponding to each branch network from the data input to the layer, and each branch network determines the image processing result of its corresponding processing type based on the extracted branch image feature data.
Because the aggregated image feature data synthesizes the branch image feature data extracted by every branch network, the branch image feature data that each branch network subsequently extracts from it is more accurate. Consequently, in the last-stage processing layer, each branch network determines its image processing result from more accurate branch image feature data, which improves the accuracy of each type of image processing result. In addition, because the image processing results determined by the neural network model are more accurate, the model converges more quickly during training, improving the neural network model's learning efficiency.
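For illustration, a minimal end-to-end sketch composing the hypothetical modules sketched above into a multi-stage model; the stage count, channel sizes, and single-channel task heads are illustrative assumptions only:

import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, num_branches: int = 3, ch: int = 64, stages: int = 3):
        super().__init__()
        # Initial feature extraction from the image to be processed.
        self.stem = nn.Conv2d(3, ch, kernel_size=3, padding=1)
        self.layers = nn.ModuleList(
            ProcessingLayer(ch, ch, num_branches) for _ in range(stages)
        )
        # Feature aggregation exists only in the non-last-stage layers.
        self.descriptor_nets = nn.ModuleList(
            DescriptorNet(ch, num_branches) for _ in range(stages - 1)
        )
        # One output head per processing type in the last-stage layer.
        self.heads = nn.ModuleList(
            nn.Conv2d(ch, 1, kernel_size=1) for _ in range(num_branches)
        )

    def forward(self, image: torch.Tensor) -> list[torch.Tensor]:
        x = self.stem(image)
        # Non-last stages: extract branch features, aggregate, pass onward.
        for layer, desc_net in zip(self.layers[:-1], self.descriptor_nets):
            branch_feats = layer(x)
            x = aggregate(branch_feats, desc_net(branch_feats))
        # Last stage: each branch determines its own processing result.
        branch_feats = self.layers[-1](x)
        return [head(f) for head, f in zip(self.heads, branch_feats)]

For a 3-channel input such as torch.randn(1, 3, 224, 224), the model returns one map per processing type, mirroring the per-branch results described above.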
Fig. 5 is a block diagram illustrating an electronic device according to an example embodiment. As shown in Fig. 5, the electronic device includes:
a processor 510;
a memory 520 for storing instructions executable by the processor 510;
wherein the processor 510 is configured to execute the instructions to implement any of the image processing methods described above.
Fig. 6 is a block diagram illustrating an apparatus 600 for image processing according to an example embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operations at the apparatus 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply component 606 provides power to the various components of device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 may detect the open/closed state of the apparatus 600 and the relative positioning of components, such as the display and keypad of the apparatus 600; it may also detect a change in position of the apparatus 600 or of a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as memory 604 comprising instructions, executable by processor 620 of device 600 to perform the image processing method described above. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 7 is a block diagram illustrating an apparatus 700 for image processing according to an example embodiment. For example, the apparatus 700 may be provided as a server. Referring to fig. 7, apparatus 700 includes a processing component 722 that further includes one or more processors and memory resources, represented by memory 732, for storing instructions, such as applications, that are executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. Further, the processing component 722 is configured to execute instructions to perform the image processing methods described above.
The apparatus 700 may also include a power component 726 configured to perform power management of the apparatus 700, a wired or wireless network interface 750 configured to connect the apparatus 700 to a network, and an input/output (I/O) interface 758. The apparatus 700 may operate based on an operating system stored in memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or a similar operating system.
In an exemplary embodiment, there is also provided a storage medium having a computer program stored therein, which when executed by a processor implements any of the image processing methods described above.
Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product which, when run on a computer, causes the computer to perform any of the image processing methods described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the claims.

Claims (18)

1. An image processing method, comprising:
acquiring an image to be processed;
inputting the image to be processed into a pre-trained neural network model to obtain image processing results of the image to be processed in multiple processing types;
wherein the neural network model comprises a plurality of processing layers, and each processing layer comprises a shared convolutional network and a plurality of branch networks; each non-last-stage processing layer further comprises a feature aggregation network; and the plurality of branch networks correspond one-to-one to the plurality of processing types;
in each non-last processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolution network and the plurality of branch networks, aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data, and outputting the aggregated image feature data to a next processing layer;
in the last stage processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolutional network and the plurality of branch networks, and determining an image processing result of the corresponding processing type by each branch network based on the extracted branch image feature data;
wherein the data input to the first-stage processing layer is image feature data extracted from the image to be processed, and the data input to each non-first-stage processing layer is the aggregated image feature data output by the previous-stage processing layer.
2. The method according to claim 1, wherein the plurality of branch networks included in each stage of the processing layer are a group of branch networks having the same network structure and different model parameters; the shared convolution networks included in each stage of processing layer have the same network structure and different convolution parameters.
3. The method according to claim 1 or 2, wherein in each processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolutional network and the plurality of branch networks comprises:
extracting, in each processing layer, high-level image feature data from data input to the processing layer by the shared convolutional network; extracting branch image feature data of corresponding processing types from the high-level image feature data by each branch network;
wherein the high-level image feature data is: image feature data including branch image feature data of a processing type corresponding to each of the branch networks.
4. The method of claim 1, wherein the branch image feature data is a multi-dimensional matrix;
wherein aggregating, by the feature aggregation network, the extracted branch image feature data to obtain the aggregated image feature data includes:
calculating a descriptor of each piece of extracted branch image feature data based on the target data dimension to be aggregated, wherein the target data dimension to be aggregated is the picture channel dimension;
aggregating the extracted branch image feature data according to the aggregation mode defined in the feature aggregation network based on the calculated descriptors to obtain aggregated image feature data;
wherein the descriptor is a matrix with the same data dimensions as the branch image feature data; in the descriptor, data belonging to the target data dimension is equal to the data of the corresponding data dimension in the branch image feature data, and data not belonging to the target data dimension is retained or pooled.
5. The method according to claim 4, wherein the calculating the descriptor of each extracted branch image feature data based on the dimension of the target data to be aggregated comprises:
performing matrix summation on the extracted branch image feature data according to the target data dimension to obtain a summation matrix;
sequentially performing pooling and a convolution operation on the summation matrix to obtain the descriptor of each piece of extracted branch image feature data;
wherein the pooling is performed on data that does not belong to the target data dimension, and the convolution operation is performed on data of the target data dimension.
6. The method according to claim 4, wherein the aggregating the extracted branch image feature data according to an aggregation manner defined in the feature aggregation network based on the calculated descriptors to obtain aggregated image feature data comprises:
inputting the calculated descriptors into a preset weight calculation function, and calculating the weight corresponding to each descriptor;
taking the weight corresponding to each descriptor as the weight of the branch image feature data corresponding to the descriptor;
and calculating weighted image feature data according to the extracted branch image feature data and the weight of each piece of branch image feature data, wherein the weighted image feature data serves as the aggregated image feature data obtained by the aggregation.
7. The method of claim 6, wherein the weight calculation function comprises a normalized exponential function (Softmax function).
8. The method of claim 1, wherein the image processing results of the plurality of processing types comprise a plurality of the following image processing results:
an object segmentation result, an object pose estimation result, an object detection result, a scene segmentation result, and a scene depth estimation result.
9. An image processing apparatus characterized by comprising:
an acquisition module configured to acquire an image to be processed;
the image processing module is configured to input the image to be processed to a pre-trained neural network model to obtain image processing results of multiple processing types of the image to be processed;
wherein the neural network model comprises a plurality of processing layers, and each processing layer comprises a shared convolutional network and a plurality of branch networks; each non-last-stage processing layer further comprises a feature aggregation network; and the plurality of branch networks correspond one-to-one to the plurality of processing types;
in each non-last processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolution network and the plurality of branch networks, aggregating the extracted branch image feature data by the feature aggregation network to obtain aggregated image feature data, and outputting the aggregated image feature data to a next processing layer;
in the last stage processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolutional network and the plurality of branch networks, and determining an image processing result of the corresponding processing type by each branch network based on the extracted branch image feature data;
wherein the data input to the first-stage processing layer is image feature data extracted from the image to be processed, and the data input to each non-first-stage processing layer is the aggregated image feature data output by the previous-stage processing layer.
10. The apparatus according to claim 9, wherein the plurality of branch networks included in each stage of the processing layer are a group of branch networks having the same network structure and different model parameters; the shared convolution networks included in each stage of processing layer have the same network structure and different convolution parameters.
11. The apparatus according to claim 9 or 10, wherein in each processing layer, extracting branch image feature data of a processing type corresponding to each branch network from data input to the processing layer by the shared convolutional network and the plurality of branch networks comprises:
extracting, in each processing layer, high-level image feature data from data input to the processing layer by the shared convolutional network; extracting branch image feature data of corresponding processing types from the high-level image feature data by each branch network;
wherein the high-level image feature data is: image feature data including branch image feature data of a processing type corresponding to each of the branch networks.
12. The apparatus of claim 9, wherein the branch image feature data is a multi-dimensional matrix;
wherein aggregating, by the feature aggregation network, the extracted branch image feature data to obtain the aggregated image feature data includes:
calculating a descriptor of each piece of extracted branch image feature data based on the target data dimension to be aggregated, wherein the target data dimension to be aggregated is the picture channel dimension;
aggregating the extracted branch image feature data according to the aggregation mode defined in the feature aggregation network based on the calculated descriptors to obtain aggregated image feature data;
wherein the descriptor is a matrix with the same data dimensions as the branch image feature data; in the descriptor, data belonging to the target data dimension is equal to the data of the corresponding data dimension in the branch image feature data, and data not belonging to the target data dimension is retained or pooled.
13. The apparatus according to claim 12, wherein the calculating the descriptor of each extracted branch image feature data based on the dimension of the target data to be aggregated comprises:
performing matrix summation on the extracted branch image feature data according to the target data dimension to obtain a summation matrix;
sequentially performing pooling and a convolution operation on the summation matrix to obtain the descriptor of each piece of extracted branch image feature data;
wherein the pooling is performed on data that does not belong to the target data dimension, and the convolution operation is performed on data of the target data dimension.
14. The apparatus according to claim 12, wherein the aggregating the extracted branch image feature data according to an aggregation manner defined in the feature aggregation network based on the calculated descriptors to obtain aggregated image feature data includes:
inputting the calculated descriptors into a preset weight calculation function, and calculating the weight corresponding to each descriptor;
taking the weight corresponding to each descriptor as the weight of the branch image feature data corresponding to the descriptor;
and calculating weighted image feature data according to the extracted branch image feature data and the weight of each piece of branch image feature data, wherein the weighted image feature data serves as the aggregated image feature data obtained by the aggregation.
15. The apparatus of claim 14, wherein the weight calculation function comprises a normalized exponential function (Softmax function).
16. The apparatus of claim 9, wherein the image processing results of the plurality of processing types comprise a plurality of the following image processing results:
an object segmentation result, an object pose estimation result, an object detection result, a scene segmentation result, and a scene depth estimation result.
17. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1-8.
18. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 8.
CN201910906732.XA 2019-09-24 2019-09-24 Image processing method and device, electronic equipment and storage medium Active CN110659726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910906732.XA CN110659726B (en) 2019-09-24 2019-09-24 Image processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910906732.XA CN110659726B (en) 2019-09-24 2019-09-24 Image processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110659726A CN110659726A (en) 2020-01-07
CN110659726B (en) 2022-05-06

Family

ID=69038880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910906732.XA Active CN110659726B (en) 2019-09-24 2019-09-24 Image processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110659726B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377981B (en) * 2021-06-29 2022-05-27 山东建筑大学 Large-scale logistics commodity image retrieval method based on multitask deep hash learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016054779A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Spatial pyramid pooling networks for image processing
CN107122796B (en) * 2017-04-01 2019-07-12 中国科学院空间应用工程与技术中心 A kind of remote sensing image classification method based on multiple-limb network integration model
CN108734211B (en) * 2018-05-17 2019-12-24 腾讯科技(深圳)有限公司 Image processing method and device
CN109447169B (en) * 2018-11-02 2020-10-27 北京旷视科技有限公司 Image processing method, training method and device of model thereof and electronic system
CN110163193B (en) * 2019-03-25 2021-08-06 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer-readable storage medium and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Feature Aggregation Convolutional Neural Network for Remote Sensing Scene Classification; Lu Xiaoqiang et al.; IEEE Transactions on Geoscience and Remote Sensing; IEEE; 2019-06-04; Vol. 57, No. 10; entire document *

Also Published As

Publication number Publication date
CN110659726A (en) 2020-01-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant