CN112967198A - Image processing method and device - Google Patents
- Publication number
- CN112967198A (application CN202110247369.2A)
- Authority
- CN
- China
- Prior art keywords
- mask
- target object
- feature map
- image
- blocked
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Abstract
The present disclosure relates to an image processing method and apparatus. The method includes: acquiring an image to be processed and a first mask, where the image to be processed contains a target object partially occluded by an obstruction, and the first mask is a mask of either the unoccluded region or the occluded region of the target object; and completing the content of the occluded region of the target object according to the image to be processed and the first mask. By focusing attention on the change trend of the occluded region (or of the unoccluded region) of the target object, the occluded content in the image to be processed can be filled in, the computation needed to infer that content is reduced, and tasks such as target tracking, target detection, and image segmentation can be completed reliably.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus.
Background
During image acquisition, images are easily corrupted or disturbed by noise under the influence of factors such as illumination, equipment, and algorithms, so that their content is no longer expressed correctly. Image inpainting has therefore become an important preprocessing step for computer vision tasks, and it directly affects the results of tasks such as target tracking, target detection, and image segmentation.
At present, image repair is usually performed by a neural network model that completes the image, and such models typically apply an attention mechanism directly to infer the content to be completed.
However, the content to be completed changes as it passes through the network, for example in models based on partial convolution (PartialConv), and a plain attention mechanism cannot capture this change. As a result, the content inferred by the neural network model is not accurate enough, which hinders tasks such as target tracking, target detection, and image segmentation; moreover, the computation of the neural network model increases, so the completed content cannot be obtained in time.
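As context for the partial-convolution (PartialConv) models mentioned above, the following is a minimal sketch of how partial convolution computes one output pixel: the kernel is applied only to valid (unmasked) pixels and the response is renormalized by the fraction of valid pixels in the window. This is an illustrative reconstruction, not code from the patent; all names are the editor's.

```python
import numpy as np

def partial_conv2d_pixel(patch: np.ndarray, mask_patch: np.ndarray,
                         weight: np.ndarray, bias: float) -> float:
    """One output pixel of a partial convolution.

    The kernel response is taken over valid pixels only (patch * mask_patch)
    and scaled by window_size / num_valid, so the magnitude of the response
    does not depend on how many pixels in the window happen to be valid.
    """
    valid = mask_patch.sum()
    if valid == 0:
        return 0.0  # no valid pixel in the receptive field
    scale = mask_patch.size / valid
    return float((weight * patch * mask_patch).sum() * scale + bias)
```

With a fully valid window the scale is 1 and the result is an ordinary convolution; with a partially valid window the renormalization compensates for the missing pixels.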
Disclosure of Invention
To solve the above technical problem or at least partially solve the above technical problem, the present disclosure provides an image processing method and apparatus.
In a first aspect, the present disclosure provides an image processing method, including:
acquiring an image to be processed and a first mask, where the image to be processed contains a target object, a partial region of the target object is occluded by an obstruction, and the first mask is a mask of the region of the target object that is not occluded by the obstruction or a mask of the region that is occluded by the obstruction;
and inputting the image to be processed and the first mask into a neural network model to complete the content of the region of the target object that is occluded by the obstruction.
With the method provided by the first aspect, the mask of the unoccluded region or of the occluded region of the target object is introduced into the attention mechanism, so that attention focuses on the change trend of the occluded region or of the unoccluded region. The content of the occluded region can then be accurately inferred from the features of the unoccluded region in the image to be processed and the features of the predicted complete region of the target object, so the occluded content is completed and the image with the occluded region filled in can be output accurately. At the same time, the computation needed to infer the occluded content is reduced, which helps complete tasks such as target tracking, target detection, and image segmentation reliably.
In one possible design, inputting the image to be processed and the first mask into the neural network model to complete the content of the occluded region of the target object includes: processing the image to be processed and the first mask through a plurality of network layers in the neural network model to complete that content, where the processing of at least one of the network layers includes: receiving a first feature map and a second mask output by a previous network layer, the first feature map and the second mask being obtained based on the image to be processed and the first mask; obtaining a second feature map and a third mask according to the first feature map and the second mask, where the second feature map is the feature of the target object after the content of the occluded region is enhanced, and the second feature map has the same number of channels as the first feature map; and outputting the second feature map and the third mask to a subsequent network layer.
In one possible design, obtaining the second feature map and the third mask according to the first feature map and the second mask includes: updating the first feature map through attention mechanism processing using the first feature map and the second mask; and performing partial convolution on the second mask and the updated first feature map to obtain the second feature map and the third mask.
In one possible design, the second mask and the third mask are masks of the region of the target object that is not occluded by the obstruction; the unoccluded region corresponding to the third mask is larger than that corresponding to the second mask, which in turn is larger than that corresponding to the first mask.
In one possible design, the second mask and the third mask are masks of the region of the target object that is occluded by the obstruction; the occluded region corresponding to the third mask is smaller than that corresponding to the second mask, which in turn is smaller than that corresponding to the first mask.
In one possible design, the first feature map and the second mask have the same number of channels, and both are multi-channel.
In a second aspect, the present disclosure provides an image processing apparatus comprising:
an acquisition module, configured to acquire an image to be processed and a first mask, where the image to be processed contains a target object, a partial region of the target object is occluded by an obstruction, and the first mask is a mask of the region of the target object that is not occluded by the obstruction or a mask of the region that is occluded by the obstruction;
and a processing module, configured to input the image to be processed and the first mask into a neural network model and complete the content of the region of the target object that is occluded by the obstruction.
In one possible design, the processing module is specifically configured to process the image to be processed and the first mask through a plurality of network layers in the neural network model to complete the content of the occluded region of the target object, where the processing of at least one of the network layers includes: receiving a first feature map and a second mask output by a previous network layer, the first feature map and the second mask being obtained based on the image to be processed and the first mask; obtaining a second feature map and a third mask according to the first feature map and the second mask, where the second feature map is the feature of the target object after the content of the occluded region is enhanced, and the second feature map has the same number of channels as the first feature map; and outputting the second feature map and the third mask to a subsequent network layer.
In one possible design, the processing module is specifically configured to update the first feature map through attention mechanism processing using the first feature map and the second mask, and to perform partial convolution on the second mask and the updated first feature map to obtain the second feature map and the third mask.
In one possible design, the second mask and the third mask are masks of the region of the target object that is not occluded by the obstruction; the unoccluded region corresponding to the third mask is larger than that corresponding to the second mask, which in turn is larger than that corresponding to the first mask.
In one possible design, the second mask and the third mask are masks of the region of the target object that is occluded by the obstruction; the occluded region corresponding to the third mask is smaller than that corresponding to the second mask, which in turn is smaller than that corresponding to the first mask.
In one possible design, the first feature map and the second mask have the same number of channels, and both are multi-channel.
The beneficial effects of the image processing apparatus provided in the second aspect and the possible designs of the second aspect may refer to the beneficial effects brought by the possible embodiments of the first aspect and the first aspect, and are not described herein again.
In a third aspect, the present disclosure provides an electronic device, comprising: a memory and a processor; the memory is used for storing program instructions; the processor is configured to invoke the program instructions in the memory to cause the electronic device to perform the image processing method of the first aspect and any one of the possible designs of the first aspect.
In a fourth aspect, the present disclosure provides a computer storage medium comprising computer instructions that, when run on an electronic device, cause the electronic device to perform the image processing method of the first aspect and any one of the possible designs of the first aspect.
In a fifth aspect, the present disclosure provides a computer program product for causing a computer to perform the image processing method of the first aspect and any one of the possible designs of the first aspect when the computer program product runs on the computer.
In a sixth aspect, the present disclosure provides a chip system including a processor. When the processor executes computer instructions stored in a memory, the electronic device performs the image processing method of the first aspect and any one of its possible designs.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic structural diagram of 19 parts of a human body according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the disclosure;
fig. 3 is a schematic structural diagram of a neural network model provided in an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Illustratively, the present disclosure provides an image processing method, apparatus, device, computer storage medium, and computer program product that introduce a mask of the target object to guide the neural network model to capture the change trend of the region of the target object to be completed, or of the region that does not need completion. The content of the region to be completed can therefore be inferred accurately, the computation spent on that region can be reduced, and tasks such as target tracking, target detection, and image segmentation can be achieved in a timely and accurate manner.
Wherein, the image processing method of the present disclosure is executed by an electronic device. The electronic device may be a tablet computer, a mobile phone (e.g., a folding screen mobile phone, a large screen mobile phone, etc.), a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), a smart television, a smart screen, a high-definition television, a 4K television, a smart speaker, an intelligent projector, and other internet of things (IOT) devices, and the specific type of the electronic device is not limited by the disclosure.
The mask of the target object may be understood as an encoding formed by dividing the target object into parts, each part corresponding to 0 or 1. For example, when the target object is a human body, the human body is divided into N parts, and each part is marked with the number 0 or 1 to obtain the mask of the human body, where N is a positive integer.
When N = 19, as shown in fig. 1, the 19 parts of the human body may include: head a1, neck a2, left shoulder a3, right shoulder a4, left upper arm a5, right upper arm a6, left lower arm a7, right lower arm a8, left hand a9, right hand a10, left hip a11, right hip a12, left thigh a13, right thigh a14, left calf a15, right calf a16, left foot a17, right foot a18, and torso a19. The human parsing mask is then the encoding in which each of the 19 parts corresponds to one number. In addition, the present disclosure is not limited to dividing the human body into 19 parts.
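The per-part binary encoding described above can be sketched as follows. This is an illustrative example, not code from the patent: `label_map` is a hypothetical dense label image whose pixel values are the part indices a1..a19 (0 for background).

```python
import numpy as np

# Hypothetical part-label map: each pixel holds a part index in 0..19
# (0 = background, 1..19 = body parts a1..a19 from fig. 1).
label_map = np.zeros((8, 8), dtype=np.int64)
label_map[0:2, 3:5] = 1    # head (a1)
label_map[2:6, 3:5] = 19   # torso (a19)

def part_mask(labels: np.ndarray, part: int) -> np.ndarray:
    """Binary mask of one part: 1 where the pixel belongs to the part, else 0."""
    return (labels == part).astype(np.float32)

head = part_mask(label_map, 1)
body = part_mask(label_map, 19)
```

The full human-parsing mask is then simply the stack (or union) of such per-part binary maps.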
Based on the foregoing description, the image processing method provided by the present disclosure will be explained in detail by taking an electronic device as an example, and combining with the accompanying drawings and application scenarios.
Referring to fig. 2, fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the disclosure. As shown in fig. 2, the image processing method provided by the present disclosure may include:
s101, acquiring an image to be processed and a first mask.
The electronic device may acquire an image to be processed. The image to be processed contains a target object, and a partial region of the target object is occluded by an obstruction. The present disclosure does not limit parameters of the image to be processed such as its size, format, or content.
Based on the foregoing description, the present disclosure divides the complete region of the target object into a region that is not occluded by the obstruction and a region that is occluded by the obstruction.
The target object may include, but is not limited to, a human body, an animal, or an article. The present disclosure does not limit parameters such as the size, shape, or position of the unoccluded and occluded regions of the target object.
The first mask may be a mask of the region of the target object that is not occluded by the obstruction. The electronic device processes the unoccluded region of the target object in the image to be processed to obtain the first mask.
The first mask may also be a mask of the region of the target object that is occluded by the obstruction. In this case, the electronic device processes the occluded region of the target object in the image to be processed to obtain the first mask, or processes the unoccluded region together with the complete region of the target object in the image to be processed to obtain the first mask.
The complete region of the target object in the image to be processed is predicted. In addition, the electronic device may further obtain a mask of the complete region of the target object, and the image to be processed, the first mask, and the mask of the complete region may together serve as the complete input.
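The second way of obtaining the first mask described above amounts to taking the set difference between the predicted complete region and the visible region. A minimal sketch (illustrative names, not from the patent):

```python
import numpy as np

# Hypothetical masks: `full` marks the predicted complete region of the
# target object; `visible` marks the part of it not occluded by the
# obstruction. The occluded-region mask is the part of `full` that
# `visible` does not cover.
full = np.array([[1, 1, 1],
                 [1, 1, 1],
                 [0, 0, 0]], dtype=np.float32)
visible = np.array([[1, 1, 0],
                    [1, 0, 0],
                    [0, 0, 0]], dtype=np.float32)

occluded = full * (1.0 - visible)  # set difference for binary masks
```

The two candidate first masks are thus exact complements of each other within the complete region of the target object.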
S102, inputting the image to be processed and the first mask into the neural network model, and completing the content of the area of the target object, which is shielded by the shielding object.
Using a neural network model, the electronic device can, based on the image to be processed and the first mask, focus the attention inside the model on the change trend of the occluded region of the target object, complete the content of that region, and output the image to be processed with the occluded content filled in.
The image processing method provided by the present disclosure introduces the mask of the unoccluded region or of the occluded region of the target object into the attention mechanism, so that attention focuses on the change trend of the occluded region or of the unoccluded region. The content of the occluded region can therefore be accurately inferred from the features of the unoccluded region in the image to be processed and the features of the predicted complete region of the target object; the occluded content is completed, the image with the occluded region filled in can be output accurately, and the computation needed to infer the occluded content is reduced, which helps complete tasks such as target tracking, target detection, and image segmentation reliably.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a neural network model according to an embodiment of the present disclosure. As shown in fig. 3, the neural network model 10 provided by the present disclosure may include: an input layer 11, an intermediate layer 12, an output layer 13, and an attention mechanism module 14. The attention mechanism module 14 is disposed between the input layer 11 and the intermediate layer 12, and/or the attention mechanism module 14 is disposed between the intermediate layer 12 and the output layer 13, and/or the attention mechanism module 14 is disposed between a plurality of intermediate layers 12.
Also, the present disclosure does not limit the number of intermediate layers 12 and attention mechanism modules 14. In some embodiments, the attention mechanism module 14 is symmetrically disposed in the neural network model 10.
The input layer 11 is mainly configured to receive the image to be processed (a1 in fig. 3) and the first mask (M0 in fig. 3), and to output a feature map (F11 in fig. 3) and a mask (M1 in fig. 3), obtained by processing such as convolution of the image and the first mask, to the intermediate layer 12 or the attention mechanism module 14.
The intermediate layer 12 is mainly configured to receive a mask and a feature map from the input layer 11, the attention mechanism module 14, or another intermediate layer 12, and to transmit the feature map and mask obtained after processing such as feature extraction, convolution, and channel-dimension reduction or expansion to the output layer 13, the attention mechanism module 14, or another intermediate layer 12.
The output layer 13 is mainly configured to receive a feature map and a mask from the intermediate layer 12 or the attention mechanism module 14, and to perform processing such as convolution on them to output the image (a2 in fig. 3) in which the content of the occluded region of the target object has been filled in.
The attention mechanism module 14 is mainly configured to receive a feature map (F11, F21, and F31 in fig. 3) and a mask (M1, M2, and M3 in fig. 3) from the input layer 11 or the intermediate layer 12, and to perform attention mechanism processing on them to obtain an updated feature map (F12, F22, and F32 in fig. 3). The feature map and mask input to the attention mechanism module 14 are split into two paths: on one path the features are multiplied by the mask to form the key, and on the other path the features are multiplied by the mask to form the query; a point-level correlation between the two is computed, and the updated feature map is output through attention mechanism processing.
Also, the attention mechanism module 14 updates the feature map but not the input mask, so it may output the mask together with the updated feature map to the intermediate layer 12 or the output layer 13.
It should be noted that the feature maps input to and output from the same attention mechanism module 14 keep the same number of channels and the same resolution, while the feature maps output by different attention mechanism modules 14 may have different numbers of channels (and different resolutions), which improves the final completion result.
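The mask-guided attention step described above can be sketched as follows. This is an illustrative reconstruction under the stated assumptions (mask multiplied into both the key and query paths, point-level correlation, softmax aggregation); the function name and shapes are the editor's, not the patent's.

```python
import numpy as np

def masked_attention(feat: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mask-guided attention over a (C, H, W) feature map.

    Both the key and the query are the feature map multiplied by the mask;
    their point-wise correlation gives an (HW, HW) score matrix, and the
    softmax-weighted aggregation produces the updated feature map with the
    same shape (and channel count) as the input.
    """
    c, h, w = feat.shape
    f = (feat * mask).reshape(c, h * w)          # masked key/query paths
    scores = f.T @ f                             # point-level correlation
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    updated = feat.reshape(c, h * w) @ weights.T
    return updated.reshape(c, h, w)
```

Because only the features are updated, the mask passes through unchanged, matching the note above that the module outputs the input mask alongside the updated feature map.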
Based on the foregoing description, the neural network model 10 may include a plurality of network layers. A single network layer may be: the input layer 11; one or more intermediate layers 12; the attention mechanism module 14 together with the layer connected to its output side (i.e., the side close to the output layer 13); or the attention mechanism module 14 together with the layer connected to its input side (i.e., the side close to the input layer 11).
The layer connected to the output side of the attention mechanism module 14 may be the intermediate layer 12 or the output layer 13, and the layer connected to the input side of the attention mechanism module 14 may be the input layer 11 or the intermediate layer 12.
Also, the neural network model 10 includes at least one network layer including the attention mechanism module 14 and a layer connected to the output side of the attention mechanism module 14, or the attention mechanism module 14 and a layer connected to the input side of the attention mechanism module 14.
Therefore, the electronic device can process the image to be processed and the first mask through the plurality of network layers. In each network layer containing the attention mechanism module 14, attention can be focused on the change trend of the occluded region or of the unoccluded region of the target object, which reduces the computation needed to infer the content of the occluded region and allows that content to be completed.
For example, in conjunction with fig. 3, assume that the neural network model 10 includes one input layer 11, two intermediate layers 12, one output layer 13, and three attention mechanism modules 14.
Wherein a first attention mechanism module 14 is disposed between the input layer 11 and the first intermediate layer 12, a second attention mechanism module 14 is disposed between the first intermediate layer 12 and the second intermediate layer 12, and a third attention mechanism module 14 is disposed between the second intermediate layer 12 and the output layer 13.
The input layer 11 may receive the image to be processed A1 and the first mask M0, and performs partial convolution and other processing on them to obtain a feature map F11 and a mask M1. The first mask M0 may be mask B1 or mask C1, and the mask M1 may be mask B2 or mask C2; the white area in mask B1/B2 corresponds to the region not blocked by the blocking object, and the white area in mask C1/C2 corresponds to the region blocked by the blocking object.
The input layer 11 may communicate the feature map F11 and the mask M1 to the first attention mechanism module 14. The first attention mechanism module 14 performs attention mechanism processing on the feature map F11 and the mask M1 to obtain a feature map F12, and transmits the feature map F12 and the mask M1 to the first intermediate layer 12. Alternatively, the mask M1 may be transmitted from the input layer 11 to the first intermediate layer 12.
The first intermediate layer 12 performs partial convolution and other processing on the feature map F12 and the mask M1 to obtain a feature map F21 and a mask M2. The mask M2 may be mask B3 or mask C3; the white area in mask B3 corresponds to the region not blocked by the blocking object, and the white area in mask C3 corresponds to the region blocked by the blocking object.
The first intermediate layer 12 may pass the feature map F21 and the mask M2 to the second attention mechanism module 14. The second attention mechanism module 14 performs attention mechanism processing on the feature map F21 and the mask M2 to obtain a feature map F22, and transmits the feature map F22 and the mask M2 to the second intermediate layer 12. Alternatively, the mask M2 may be transmitted from the first intermediate layer 12 to the second intermediate layer 12.
The second intermediate layer 12 performs partial convolution and other processing on the feature map F22 and the mask M2 to obtain a feature map F31 and a mask M3. The mask M3 may be mask B4 or mask C4; the white area in mask B4 corresponds to the region not blocked by the blocking object, and the white area in mask C4 corresponds to the region blocked by the blocking object.
The second intermediate layer 12 may pass the feature map F31 and the mask M3 to the third attention mechanism module 14. The third attention mechanism module 14 performs attention mechanism processing on the feature map F31 and the mask M3 to obtain a feature map F32, and transmits the feature map F32 and the mask M3 to the output layer 13. Alternatively, the mask M3 may be transmitted from the second intermediate layer 12 to the output layer 13.
The output layer 13 performs partial convolution and other processing on the feature map F32 and the mask M3 to obtain the processed image A2.
Across the sequence of the first mask M0, the mask M1, the mask M2, and the mask M3, the region of the target object not blocked by the blocking object gradually grows, and the region of the target object blocked by the blocking object gradually shrinks.
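The walkthrough above can be sketched as a toy data flow in which each (attention module + convolution layer) stage leaves the feature map's shape intact while the blocked region of the mask shrinks by one ring of border pixels. The border-peeling rule stands in for the partial-convolution mask update; none of this is the patented layers themselves, only an illustration of the M0 → M1 → M2 → M3 progression:

```python
def neighbours(p):
    """4-connected neighbours of a pixel coordinate (x, y)."""
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def run_pipeline(feat, mask0, n_stages=3):
    """Toy version of the fig. 3 flow. Masks are sets of blocked pixel
    coordinates; each stage removes blocked pixels that have at least one
    unblocked 4-neighbour, so the blocked region shrinks monotonically."""
    mask = set(mask0)
    history = [set(mask)]                        # M0, M1, M2, M3
    for _ in range(n_stages):
        feat = [row[:] for row in feat]          # attention stage: shape unchanged
        mask = {p for p in mask                  # convolution stage: mask shrinks
                if all(q in mask for q in neighbours(p))}
        history.append(set(mask))
    return feat, history
```

The returned history exhibits exactly the property stated above: the blocked region never grows from one mask to the next.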
It should be noted that, beyond the structure shown in fig. 3, the present disclosure does not limit the specific implementation of the neural network model 10; it is only required that the neural network model 10 contain at least one network layer including the attention mechanism module 14.
Assume that a network layer including the attention mechanism module 14 is network layer 1, and network layer 1 is disposed between network layer 0 and network layer 2, network layer 0 is a previous network layer of network layer 1, and network layer 2 is a next network layer of network layer 1.
The present disclosure does not limit the specific types of network layer 0 and network layer 2.
When network layer 0 is the input layer 11 or an intermediate layer 12, network layer 1 may receive the first feature map and the second mask from network layer 0. The first feature map and the second mask are obtained by network layer 0 based on the first mask and the image to be processed.
For example, when the network layer 0 is the input layer 11, the first feature map and the second mask are obtained by performing processing such as partial convolution on the first mask and the image to be processed by the input layer 11.
For another example, when network layer 0 is an intermediate layer 12, the first feature map and the second mask are obtained by the intermediate layer 12 applying partial convolution and/or attention mechanism processing to the mask and feature map it receives, which themselves derive from the first mask and the image to be processed.
When the attention mechanism module 14 is included in network layer 0, network layer 1 may receive the first feature map and the second mask from the attention mechanism module 14 in network layer 0; the first feature map and the second mask are obtained by network layer 0 based on the first mask and the image to be processed. Network layer 1 may also receive the second mask from the input layer 11 or the intermediate layer 12 in network layer 0.
For example, the first feature map is obtained by performing attention mechanism processing by the attention mechanism module 14 in the network layer 0, and the second mask is obtained by performing processing such as partial convolution by the input layer 11 or the intermediate layer 12 in the network layer 0.
In addition, the number of channels of the first feature map is kept the same as the number of channels of the second mask, which reduces the processing steps of the attention mechanism module 14; both the first feature map and the second mask are multi-channel.
Network layer 1 may update the first feature map through attention mechanism processing according to the first feature map and the second mask, and then perform partial convolution processing on the second mask and the updated first feature map to obtain a second feature map and a third mask, which network layer 1 outputs to network layer 2. The second feature map carries the features of the target object with the content of the blocked region enhanced, so that the features of the region of the target object blocked by the blocking object can be accurately inferred.
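The partial-convolution step can be sketched with the common formulation from the image-inpainting literature: convolve only over valid (unmasked) inputs, renormalise by the number of valid inputs, and mark an output valid if any input in its window was valid, so the valid region grows by one ring per layer. The patent relies on this kind of mask update but does not fix the exact kernel, so the 3×3 averaging kernel below is an assumption:

```python
def partial_conv(feat, mask):
    """Partial convolution sketch. `feat` is a 2-D list of floats and
    `mask` a 2-D list of 1 (valid) / 0 (blocked). Each output is the mean
    of the valid inputs in its 3x3 window, and the new mask marks an
    output valid if its window contained any valid input."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, valid = 0.0, 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w and mask[y][x]:
                        total += feat[y][x]
                        valid += 1
            if valid:
                out[i][j] = total / valid   # renormalised mean over valid inputs
                new_mask[i][j] = 1          # window saw at least one valid input
    return out, new_mask
```

Applied repeatedly, this is what makes the second mask and third mask differ: each layer's output mask covers a strictly larger (or equal) valid region than its input mask.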
It should be noted that the network layer 1 may use parameters such as confidence level to determine the third mask. For example, when the confidence of the edge region in the region of the target object that is not occluded by the occlusion object is greater than or equal to the preset threshold, the network layer 1 may update the region of the target object that is not occluded by the occlusion object to the difference between the region of the target object that is not occluded by the occlusion object and the edge region. For another example, when the confidence of the edge region in the region of the target object that is blocked by the blocking object is greater than or equal to the preset threshold, the network layer 1 may update the region of the target object that is blocked by the blocking object to the difference between the region of the target object that is blocked by the blocking object and the edge region.
In addition, because the number of channels of the second feature map is the same as that of the first feature map, the attention mechanism module 14 can be seamlessly inserted between layers at different stages of the neural network model 10: the current network layer does not need to adjust the spatial dimensions of its output feature map, update iterations of the remaining network layers are unaffected, and the process by which the current network layer produces the feature map for the next network layer is simplified.
Wherein the second mask and the third mask may adopt a variety of possible implementations.
In some embodiments, the second mask and the third mask are masks of the region of the target object that is not blocked by the blocking object. The unblocked region corresponding to the third mask is larger than the unblocked region corresponding to the second mask, which in turn is larger than the unblocked region corresponding to the first mask.
In this way, the attention mechanism modules 14 in multiple network layers keep the mask changing dynamically based on the feature maps they output, reflecting that the unblocked region of the target object grows from layer to layer, which helps reduce the amount of computation needed to infer the content of the region of the target object blocked by the blocking object.
Referring to fig. 3, for example, if network layer 1 includes the second attention mechanism module 14 and the first intermediate layer 12, the second mask is mask B2 in the mask M1, and the third mask is mask B3 in the mask M2. For another example, if network layer 1 includes the second attention mechanism module 14 and the second intermediate layer 12, the second mask is mask B3 in the mask M2, and the third mask is mask B4 in the mask M3.
In other embodiments, the second mask and the third mask are masks of the region of the target object that is blocked by the blocking object. The blocked region corresponding to the third mask is smaller than the blocked region corresponding to the second mask, which in turn is smaller than the blocked region corresponding to the first mask.
In this way, the attention mechanism modules 14 in multiple network layers keep the mask changing dynamically based on the feature maps they output, reflecting that the blocked region of the target object shrinks from layer to layer, which helps reduce the amount of computation needed to infer the content of the region of the target object blocked by the blocking object.
Referring to fig. 3, for example, if network layer 1 includes the second attention mechanism module 14 and the first intermediate layer 12, the second mask is mask C2 in the mask M1, and the third mask is mask C3 in the mask M2. For another example, if network layer 1 includes the second attention mechanism module 14 and the second intermediate layer 12, the second mask is mask C3 in the mask M2, and the third mask is mask C4 in the mask M3.
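The two mask conventions in these embodiments carry the same information, since a mask of the unblocked region is the pixel-wise complement of a mask of the blocked region; a minimal sketch of the conversion:

```python
def invert_mask(mask):
    """Convert between the two conventions used in the embodiments:
    1s marking the unblocked region become 1s marking the blocked region,
    and vice versa. `mask` is a 2-D list of 0/1 values."""
    return [[1 - v for v in row] for row in mask]
```

Inverting twice returns the original mask, so either convention can drive the same layer-by-layer update.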
In addition, among the plurality of network layers in the neural network model 10 that contain the attention mechanism module 14, some layers may leave the mask unchanged, with the third mask identical to the second mask, provided that the unblocked region corresponding to the second mask is still larger than the unblocked region corresponding to the first mask; it is only necessary to ensure that the mask keeps changing dynamically across the neural network model 10 as a whole.
In this way, the attention mechanism modules 14 in multiple network layers can change the mask slowly based on the feature maps they output, reducing the computation of those network layers while still reflecting that the unblocked region of the target object grows, or the blocked region shrinks, from layer to layer.
In summary, through the above process, network layer 1 outputs a second feature map in which the features corresponding to the content of the blocked region of the target object are enhanced, and a third mask in which the blocked region of the target object shrinks (equivalently, the unblocked region grows). The content of the blocked region of the target object can therefore be inferred accurately from the feature map and the third mask, with a reduced amount of computation.
It should be noted that the attention mechanism module 14 may also be provided separately, without being bound to other layers; that is, at least one attention mechanism module 14 may be disposed between any two layers.
The present disclosure may implement the above process using frameworks such as PyTorch and computing platforms such as CUDA.
Illustratively, the present disclosure provides an image processing apparatus.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the disclosure. The image processing apparatus of the present disclosure may be disposed in an electronic device and can implement the image processing methods of the embodiments of figs. 1 to 3 above, corresponding to the operations performed by the electronic device. As shown in fig. 4, the image processing apparatus 400 provided by the present disclosure may include: an acquisition module 401 and a processing module 402.
An obtaining module 401, configured to obtain an image to be processed and a first mask, where the image to be processed includes a target object, a partial region of the target object is blocked by a blocking object, and the first mask is a mask of a region of the target object that is not blocked by the blocking object, or a mask of a region of the target object that is blocked by the blocking object;
The processing module 402 is configured to input the image to be processed and the first mask into the neural network model and complete the content of the region of the target object that is blocked by the blocking object.
In some embodiments, the processing module 402 is specifically configured to process the image to be processed and the first mask through a plurality of network layers in the neural network model, and fill the content of the area of the target object that is blocked by the blocking object; wherein, the processing procedure of at least one network layer in the plurality of network layers comprises: receiving a first feature map and a second mask output by a previous network layer, wherein the first feature map and the second mask are obtained based on an image to be processed and the first mask; obtaining a second feature map and a third mask according to the first feature map and the second mask, wherein the second feature map is a feature of the target object after the content of the region shielded by the shielding object is enhanced, and the number of channels of the second feature map is the same as that of the first feature map; and outputting the second feature map and the third mask to a subsequent network layer.
In some embodiments, the processing module 402 is specifically configured to update the first feature map by attention mechanism processing using the first feature map and the second mask; and performing partial convolution processing on the second mask and the updated first feature map to obtain a second feature map and a third mask.
In some embodiments, the second mask and the third mask are masks of areas of the target object which are not blocked by the blocking object, the areas of the target object which are not blocked by the blocking object and correspond to the third mask are larger than the areas of the target object which are not blocked by the blocking object and correspond to the second mask, and the areas of the target object which are not blocked by the blocking object and correspond to the second mask are larger than the areas of the target object which are not blocked by the blocking object and correspond to the first mask.
In some embodiments, the second mask and the third mask are masks of an area of the target object that is blocked by the blocking object, the area of the target object that is blocked by the blocking object and corresponds to the third mask is smaller than the area of the target object that is blocked by the blocking object and corresponds to the second mask, and the area of the target object that is blocked by the blocking object and corresponds to the second mask is smaller than the area of the target object that is blocked by the blocking object and corresponds to the first mask.
In some embodiments, the number of channels of the first feature map and the number of channels of the second mask are the same, and the first feature map and the second mask are multi-channel.
The image processing apparatus provided by the present disclosure may implement the above method embodiments, and specific implementation principles and technical effects thereof can be referred to the above method embodiments, which are not described herein again.
Illustratively, the present disclosure provides an electronic device comprising: one or more processors; a memory; and one or more computer programs; wherein the one or more computer programs are stored in the memory; the one or more processors, when executing the one or more computer programs, cause the electronic device to implement the image processing methods of the foregoing embodiments.
Illustratively, the present disclosure provides a chip system applied to an electronic device that includes a memory and a sensor. The chip system includes a processor configured to perform the image processing method of the foregoing embodiments.
Exemplarily, the present disclosure provides a computer-readable storage medium on which a computer program is stored, which, when being executed by a processor, causes an electronic device to implement the image processing method of the foregoing embodiment.
Illustratively, the present disclosure provides a computer program product which, when run on a computer, causes the computer to perform the image processing method of the foregoing embodiments.
In the above-described embodiments, all or part of the functions may be implemented by software, hardware, or a combination of the two. When implemented in software, the functions may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the disclosure are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed and a first mask, wherein the image to be processed contains a target object, a partial region of the target object is shielded by a shielding object, and the first mask is a mask of a region of the target object which is not shielded by the shielding object or a mask of a region of the target object which is shielded by the shielding object;
inputting the image to be processed and the first mask into a neural network model, and completing the content of the area of the target object, which is shielded by the shielding object.
2. The method according to claim 1, wherein the inputting the image to be processed and the first mask into a neural network model and completing the content of the area of the target object that is shielded by the shielding object comprises:
processing the image to be processed and the first mask through a plurality of network layers in the neural network model, and completing the content of the area of the target object, which is shielded by the shielding object;
wherein the processing procedure of at least one of the plurality of network layers comprises:
receiving a first feature map and a second mask output by a previous network layer, wherein the first feature map and the second mask are obtained based on the image to be processed and the first mask;
obtaining a second feature map and a third mask according to the first feature map and the second mask, wherein the second feature map is a feature of the target object after the content of the region blocked by the blocking object is enhanced, and the number of channels of the second feature map is the same as that of the first feature map;
and outputting the second feature map and the third mask to a next network layer.
3. The method of claim 2, wherein obtaining a second feature map and a third mask according to the first feature map and the second mask comprises:
updating the first feature map by attention mechanism processing using the first feature map and the second mask;
and performing partial convolution processing on the second mask and the updated first feature map to obtain the second feature map and the third mask.
4. The method according to claim 2 or 3, wherein the second mask and the third mask are masks of areas of the target object which are not blocked by the blocking object, the areas of the target object which are not blocked by the blocking object and correspond to the third mask are larger than the areas of the target object which are not blocked by the blocking object and correspond to the second mask, and the areas of the target object which are not blocked by the blocking object and correspond to the second mask are larger than the areas of the target object which are not blocked by the blocking object and correspond to the first mask.
5. The method according to claim 2 or 3, wherein the second mask and the third mask are masks of areas of the target object that are blocked by the blocking object, the areas of the target object that are blocked by the blocking object and correspond to the third mask are smaller than the areas of the target object that are blocked by the blocking object and correspond to the second mask, and the areas of the target object that are blocked by the blocking object and correspond to the second mask are smaller than the areas of the target object that are blocked by the blocking object and correspond to the first mask.
6. The method of any of claims 2-5, wherein the number of channels of the first feature map and the number of channels of the second mask are the same, and wherein the first feature map and the second mask are multi-channel.
7. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire an image to be processed and a first mask, wherein the image to be processed comprises a target object, a partial region of the target object is shielded by a shielding object, and the first mask is a mask of a region of the target object which is not shielded by the shielding object or a mask of a region of the target object which is shielded by the shielding object;
and the processing module is used for inputting the image to be processed and the first mask into a neural network model and completing the content of the area of the target object, which is shielded by the shielding object.
8. An electronic device, comprising: one or more processors; a memory; and one or more computer programs; wherein the one or more computer programs are stored in the memory; characterized in that the one or more processors, when executing the one or more computer programs, cause the electronic device to implement the image processing method of any of claims 1-6.
9. A computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the image processing method of any one of claims 1-6.
10. A computer program product, characterized in that, when the computer program product is run on a computer, the computer program product causes the computer to carry out the image processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110247369.2A CN112967198A (en) | 2021-03-05 | 2021-03-05 | Image processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112967198A true CN112967198A (en) | 2021-06-15 |
Family
ID=76276767
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110247369.2A Pending CN112967198A (en) | 2021-03-05 | 2021-03-05 | Image processing method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5859645A (en) * | 1993-03-26 | 1999-01-12 | Loral Corporation | Method for point sampling in computer graphics systems |
CN110070056A (en) * | 2019-04-25 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Image processing method, device, storage medium and equipment |
CN110728330A (en) * | 2019-10-23 | 2020-01-24 | 腾讯科技(深圳)有限公司 | Object identification method, device, equipment and storage medium based on artificial intelligence |
CN110874575A (en) * | 2019-11-01 | 2020-03-10 | 天津大学 | Face image processing method and related equipment |
CN110929651A (en) * | 2019-11-25 | 2020-03-27 | 北京达佳互联信息技术有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
QIANG ZHOU et al.: "Human De-occlusion: Invisible Perception and Recovery for Humans", arXiv, 22 March 2022 (2022-03-22), pages 1-11 * |
WEITAO WAN et al.: "Occlusion robust face recognition based on mask learning", 2017 IEEE International Conference on Image Processing (ICIP), 22 February 2018 (2018-02-22), pages 3795-3799 * |
YANWEI PANG et al.: "Mask-Guided Attention Network and Occlusion-Sensitive Hard Example Mining for Occluded Pedestrian Detection", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 February 2020 (2020-02-27), pages 4966-4974 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210150736A1 (en) | Learning rigidity of dynamic scenes for three-dimensional scene flow estimation | |
US20210082124A1 (en) | Foreground-aware image inpainting | |
CN109740534B (en) | Image processing method, device and processing equipment | |
US20190295228A1 (en) | Image in-painting for irregular holes using partial convolutions | |
CN110121034B (en) | Method, device, equipment and storage medium for implanting information into video | |
CN111860398B (en) | Remote sensing image target detection method and system and terminal equipment | |
US20160275721A1 (en) | 3d face model reconstruction apparatus and method | |
US20230048906A1 (en) | Method for reconstructing three-dimensional model, method for training three-dimensional reconstruction model, and apparatus | |
US8194743B2 (en) | Displacement estimation device and method for the same | |
US11954828B2 (en) | Portrait stylization framework using a two-path image stylization and blending | |
US20210383241A1 (en) | Training neural networks with limited data using invertible augmentation operators | |
WO2021196718A1 (en) | Key point detection method and apparatus, electronic device, storage medium, and computer program | |
US20200410330A1 (en) | Composable neural network kernels | |
CN112967199A (en) | Image processing method and device | |
CN112967198A (en) | Image processing method and device | |
CN116934591A (en) | Image stitching method, device and equipment for multi-scale feature extraction and storage medium | |
US11657530B2 (en) | Stereo matching method and apparatus of images | |
CN118076969A (en) | Method and apparatus for synthesizing six degree-of-freedom views from sparse RGB depth input | |
CN112967197A (en) | Image processing method, apparatus, electronic device, medium, and computer program product | |
CN113222830A (en) | Image processing method and device | |
WO2023070421A1 (en) | Methods and apparatus to perform mask-based depth enhancement for multi-view systems | |
US20220012491A1 (en) | Contextual usage control of cameras | |
CN117011156A (en) | Image processing method, device, equipment and storage medium | |
CN112348069B (en) | Data enhancement method, device, computer readable storage medium and terminal equipment | |
Haeublein et al. | Utilizing PYNQ for accelerating image processing functions in ADAS applications |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||