CN112967197A - Image processing method, apparatus, electronic device, medium, and computer program product - Google Patents

Image processing method, apparatus, electronic device, medium, and computer program product

Info

Publication number
CN112967197A
Authority
CN
China
Prior art keywords
mask
target object
target
sample
original image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110246097.4A
Other languages
Chinese (zh)
Inventor
王诗吟 (Wang Shiyin)
周强 (Zhou Qiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202110246097.4A
Publication of CN112967197A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Abstract

The present disclosure relates to an image processing method, apparatus, electronic device, medium, and computer program product. A first mask of a visible region of a target object in an original image is acquired, where the original image contains the target object and a partial region of the target object is occluded by an occlusion. The original image and the first mask are stacked and input into a target neural network model to obtain a target complete mask of the target object in the original image, where the target neural network model is trained based on an original image sample, a preset analysis mask of the target object sample, and a reference complete mask of the target object sample. Because semantic information of each part of the target object is taken into account, the target complete mask of the target object is obtained with higher accuracy.

Description

Image processing method, apparatus, electronic device, medium, and computer program product
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an electronic device, a medium, and a computer program product.
Background
In the field of image processing, it is often necessary to complete partially occluded target objects in an image. A target object may be a human body or another object. For example, if a human body in an image is partially occluded by an occlusion, a series of image processing steps is required to complete the occluded part of the human body and obtain a complete human body. In general, completing a target object requires predicting a target complete mask of the target object.
In the prior art, an original image is generally input into an instance segmentation network to obtain a mask of the visible region of a target object, and a target complete mask of the target object is then obtained from that mask through a neural network model.
However, the target complete mask of the target object obtained with the prior art is not highly accurate.
Disclosure of Invention
In order to solve the technical problem, the present disclosure provides an image processing method, an apparatus, an electronic device, a medium, and a computer program product.
A first aspect of the present disclosure provides a method for processing an image, including:
acquiring a first mask of a visible region of a target object in an original image, wherein the original image contains the target object, and a partial region of the target object is occluded by an occlusion;
stacking the original image and the first mask and inputting the result into a target neural network model to obtain a target complete mask of the target object in the original image, wherein the target neural network model is trained based on an original image sample, a preset analysis mask of the target object sample, and a reference complete mask of the target object sample, the original image sample contains the target object sample, a partial region of the target object sample is occluded by an occlusion, and the preset analysis mask comprises semantic mask labels of the parts of the target object sample.
Optionally, the method further includes:
and subtracting the first mask from the target complete mask of the target object to obtain a mask of an invisible area of the target object.
Optionally, after stacking the original image and the first mask and inputting the result into the target neural network model, the method further includes:
outputting a complete parsing mask of a target object in the original image, the complete parsing mask including semantic mask tags of portions of the target object.
Optionally, the method further includes:
subtracting the first mask from the target complete mask of the target object to obtain a mask of an invisible area of the target object;
and multiplying the mask of the invisible area of the target object by the complete analysis mask of the target object to obtain a semantic mask label of the invisible area of the target object.
Optionally, before acquiring the first mask of the visible region of the target object in the original image, the method further includes:
acquiring a second mask of a target object sample in the original image sample;
inputting the original image sample and the second mask into a neural network model to obtain a target complete mask of a target object sample in the original image sample and a complete analysis mask of the target object sample;
and taking the reference complete mask of the target object sample and the preset analysis mask of the target object sample as supervision signals, obtaining a loss function by using the supervision signals, the target complete mask of the target object sample and the complete analysis mask of the target object sample, training the neural network model until the neural network model converges, and taking the converged neural network model as the target neural network model.
Optionally, the method further includes:
and inputting the original image sample into a target object analysis network model to obtain a preset analysis mask of the target object sample.
A second aspect of the present disclosure provides an image processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object, and a partial region of the target object is occluded by an occlusion;
a processing module, configured to stack the original image and the first mask and input the result into a target neural network model to obtain a target complete mask of the target object in the original image, wherein the target neural network model is trained based on an original image sample, a preset analysis mask of the target object sample, and a reference complete mask of the target object sample, the original image sample contains the target object sample, a partial region of the target object sample is occluded by an occlusion, and the preset analysis mask comprises semantic mask labels of the parts of the target object sample.
Optionally, the processing module is further configured to subtract the first mask from the target complete mask of the target object to obtain a mask of an invisible area of the target object.
Optionally, the processing module is further configured to output a complete parsing mask of the target object in the original image, where the complete parsing mask includes semantic mask tags of each portion of the target object.
Optionally, the processing module is further configured to subtract the first mask from the target complete mask of the target object to obtain a mask of an invisible area of the target object; and multiplying the mask of the invisible area of the target object by the complete analysis mask of the target object to obtain a semantic mask label of the invisible area of the target object.
Optionally, the mask of the visible region of the target object is the first mask.
Optionally, the obtaining module is further configured to obtain a second mask of a target object sample in the original image sample;
the processing module is further configured to input the original image sample and the second mask into a neural network model, so as to obtain a target complete mask of a target object sample in the original image sample and a complete analysis mask of the target object sample; and taking the reference complete mask of the target object sample and the preset analysis mask of the target object sample as supervision signals, obtaining a loss function by using the supervision signals, the target complete mask of the target object sample and the complete analysis mask of the target object sample, training the neural network model until the neural network model converges, and taking the converged neural network model as the target neural network model.
Optionally, the processing module is further configured to input the original image sample into a target object analysis network model, so as to obtain a preset analysis mask of the target object sample.
A third aspect of the present disclosure provides an electronic device, comprising: a processor for executing a computer program stored in a memory, the computer program, when executed by the processor, performing the steps of the method of the first aspect.
A fourth aspect of the present disclosure provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
A fifth aspect of the present disclosure provides a computer program product, which, when run on a computer, causes the computer to perform the image processing method of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
the target neural network model is trained based on an original image sample, a preset analysis mask of the target object sample, and a reference complete mask of the target object sample, so semantic information of each part of the target object is taken into account. A first mask of the visible region of the target object in the original image is acquired, where the original image contains the target object and a partial region of the target object is occluded by an occlusion; the original image and the first mask are stacked and input into the target neural network model, so the target complete mask of the target object in the original image is obtained with higher accuracy.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below; it will be apparent to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a human body parsing mask provided by the present disclosure;
FIG. 2 is a schematic diagram of an image processing system provided by the present disclosure;
FIG. 3 is a schematic flow chart of an image processing method provided by the present disclosure;
FIG. 4 is a schematic flow chart of another image processing method provided by the present disclosure;
FIG. 5 is a schematic flow chart of yet another image processing method provided by the present disclosure;
FIG. 6 is a schematic flow chart of yet another image processing method provided by the present disclosure;
fig. 7 is a schematic structural diagram of an image processing apparatus according to the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The original image of the present disclosure contains a target object, where a partial region of the target object is occluded by an occlusion. For convenience of description, the region of the target object occluded by the occlusion is described as the invisible region of the target object, and the region of the target object not occluded by the occlusion as the visible region of the target object. The target object may be a human body, an animal, or another object; the disclosure is not limited thereto.
In the process of completing the invisible region of the target object in the original image, a mask of the invisible region of the target object needs to be predicted, and the target object is completed based on that mask, so as to support image processing tasks such as target tracking, target detection, and image segmentation.
The mask of the invisible area of the target object is generally obtained based on the difference between the target complete mask of the target object and the mask of the visible area of the target object, and therefore, the accuracy of the target complete mask of the target object directly affects the accuracy of the mask of the invisible area of the target object.
The parsing mask of the target object may be understood as a coding in which, after the target object is divided into parts, each part corresponds to a different number.
The target object of the present disclosure may be a human body, an animal, or other objects, and the following embodiments of the present disclosure are described and illustrated by taking a human body as an example, and other target objects are similar to a human body and are not repeated herein.
Taking a human body as the target object as an example, human parsing (human body analysis) divides the human body into N parts, where N is a positive integer.
Assuming that N is 19, as shown in fig. 1, which is a schematic diagram of a human body parsing mask provided by the present disclosure, the 19 parts of a human body may include: the head a1, neck a2, left shoulder a3, right shoulder a4, left upper arm a5, right upper arm a6, left lower arm a7, right lower arm a8, left hand a9, right hand a10, left hip a11, right hip a12, left thigh a13, right thigh a14, left calf a15, right calf a16, left foot a17, right foot a18, and body a19. The human body parsing mask is a coding in which each of the 19 parts of the human body corresponds to its own number. The present disclosure is not limited to splitting the human body into 19 parts.
By parsing the target object, semantic information of each segmented part of the target object can be obtained.
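As a concrete illustration of such a coding (a minimal sketch, not taken from the disclosure; the array size and example region are arbitrary), a parsing mask can be stored as an integer label map in which each pixel holds the ID of the body part it belongs to, with 0 reserved for the background:

```python
import numpy as np

# Part IDs follow the 19-part split of fig. 1; 0 is assumed background.
PART_LABELS = {
    0: "background", 1: "head", 2: "neck", 3: "left shoulder",
    4: "right shoulder", 5: "left upper arm", 6: "right upper arm",
    7: "left lower arm", 8: "right lower arm", 9: "left hand",
    10: "right hand", 11: "left hip", 12: "right hip", 13: "left thigh",
    14: "right thigh", 15: "left calf", 16: "right calf", 17: "left foot",
    18: "right foot", 19: "body",
}

# An H x W parsing mask: every pixel stores one of the integer labels above.
parsing_mask = np.zeros((256, 192), dtype=np.uint8)
parsing_mask[10:40, 80:110] = 1  # an illustrative region labelled "head"
print(PART_LABELS[int(parsing_mask[20, 90])])  # -> "head"
```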
To improve the accuracy of the acquired target complete mask of the target object, the present disclosure acquires the target complete mask through a target neural network model, where the target neural network model is trained based on an original image sample, a preset analysis mask of the target object sample, and a reference complete mask of the target object sample. The reference complete mask of the target object sample is an accurate complete mask of the target object sample. The preset analysis mask includes reliable semantic mask labels of the parts of the target object sample; these semantic mask labels are used as pseudo labels, i.e., they are treated as real semantic mask labels and participate in training the neural network. During training of the target neural network model, both the reference complete mask of the target object sample and the preset analysis mask of the target object sample are used as supervision signals. Because the analysis mask carries semantic information of each part of the target object, the semantic accuracy of each part is taken into account during training, so the target complete mask output by a target neural network model trained with these two supervision signals is more accurate.
The image processing method of the present disclosure is performed by an electronic device. The electronic device may be a tablet computer, a mobile phone (e.g., a folding screen mobile phone, a large screen mobile phone, etc.), a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), a smart television, a smart screen, a high-definition television, a 4K television, a smart speaker, a smart projector, and other internet of things (IOT) devices, and the specific type of the electronic device is not limited by the present disclosure.
Fig. 2 is a schematic diagram of an image processing system provided by the present disclosure, as shown in fig. 2, the system includes:
a target neural network model 101, where the target neural network model 101 comprises a plurality of network layers, some for predicting a target complete mask of a target object and some for predicting a complete parsing mask of the target object. The target neural network model 101 may include an hourglass network 101a and a multi-layer convolutional network 101b (also called a segmentation head or task head). The input of the target neural network model 101 is the result of stacking the original image 102 with a first mask 103 of the visible region of the target object in the original image, and the output of the target neural network model 101 is a target complete mask 104 and a complete parsing mask 105 of the target object. The target complete mask 104 is a binary segmentation mask; the complete parsing mask 105 is a mask that identifies the semantics of the parts of the target object: it includes semantic mask labels of the parts of the target object, from which it can be determined what each part of the target object is, for example the head, neck, left shoulder, right shoulder, left upper arm, right upper arm, left lower arm, right lower arm, left hand, right hand, left hip, right hip, left thigh, right thigh, left calf, right calf, left foot, right foot, or body.
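To make the data flow of fig. 2 concrete, the following is a minimal PyTorch sketch of such a dual-output model. It replaces the hourglass network 101a with a toy two-layer encoder for brevity; the class name, layer sizes, and the 20-class parsing output (19 parts plus background) are assumptions, not details from the disclosure:

```python
import torch
import torch.nn as nn

class MaskCompletionNet(nn.Module):
    """Minimal stand-in for the network of fig. 2: a shared backbone
    (an hourglass network 101a in the disclosure; a toy encoder here)
    followed by two convolutional heads 101b, one per output."""

    def __init__(self, num_parts: int = 20):  # 19 parts + background assumed
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.full_mask_head = nn.Conv2d(32, 1, 1)        # target complete mask
        self.parsing_head = nn.Conv2d(32, num_parts, 1)  # complete parsing mask

    def forward(self, x: torch.Tensor):
        """x: (B, 4, H, W) stacked image + visible-region mask."""
        feat = self.backbone(x)
        return self.full_mask_head(feat), self.parsing_head(feat)
```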
Fig. 3 is a schematic flow chart of an image processing method provided by the present disclosure. With reference to fig. 2, the method of this embodiment is as follows:
S301: A first mask of a visible region of a target object in an original image is obtained.
The original image contains a target object, and a partial region of the target object is occluded by an occlusion. The visible region of the target object refers to the region of the target object displayed in the original image. The original image typically contains a background, a target object, and an occlusion.
Taking fig. 2 as an example, the target object is a human body, the occlusion is grass, the legs and feet of the human body are occluded by the occlusion, and the head, the upper body, and part of the legs form the visible region of the human body.
Illustratively, the size of the original image may be H × W × 3, where H is the height, W is the width, and 3 is the number of channels.
Alternatively, the first mask of the visible region of the target object in the original image may be acquired through an instance segmentation network. The size of the first mask is H × W × 1, and its values are binary (0 or 1). However, because the instance segmentation network does not account for occlusion (it is designed to segment various objects and is not tailored to the target object), the first mask of the visible region of the target object acquired with the instance segmentation network may be incomplete or partially erroneous, and hence inaccurate.
In this step, the original image may be denoted as I_s, the instance segmentation network as N_i(·), and the first mask as M_i; then:
M_i = N_i(I_s)
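The disclosure does not name a specific instance segmentation network N_i. As one hedged example, a pretrained Mask R-CNN from torchvision could play that role and produce the first mask M_i; the score threshold and the choice of the single highest-scoring instance are assumptions:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# pretrained=True loads COCO weights (newer torchvision prefers weights=...).
seg_net = maskrcnn_resnet50_fpn(pretrained=True).eval()

def first_mask_of(image: torch.Tensor, score_thresh: float = 0.5) -> torch.Tensor:
    """image: (3, H, W) float tensor in [0, 1]. Returns an (H, W) binary
    mask M_i for the highest-scoring detected instance, or all zeros."""
    with torch.no_grad():
        out = seg_net([image])[0]
    keep = out["scores"] > score_thresh
    if keep.any():
        return (out["masks"][keep][0, 0] > 0.5).float()
    return torch.zeros(image.shape[1:])
```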
s302: and stacking the original image and the first mask and inputting the stacked original image and the first mask into a target neural network model to obtain a target complete mask corresponding to a target object in the original image.
The target complete mask corresponding to the target object in the original image is a binary segmentation mask.
Specifically, the original image and the first mask may be stacked to obtain a stacking result, and the stacking result is input into the target neural network model to obtain the target complete mask corresponding to the target object. Combined with the description in S301, the stacking result is an H × W × 4 image, that is, a four-channel image.
Denoting the target neural network model as N_t(·) and the target complete mask corresponding to the target object as M_a, then:
M_a = N_t(I_s ⊕ M_i)
where ⊕ denotes stacking along the channel dimension.
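Concretely, the stacking and inference of S302 might look as follows; this sketch reuses the MaskCompletionNet class from the earlier example, and the tensor sizes and the 0.5 binarization threshold are assumptions:

```python
import torch

image = torch.rand(1, 3, 256, 192)               # I_s as (B, 3, H, W)
first_mask = torch.rand(1, 1, 256, 192).round()  # M_i, binary, (B, 1, H, W)

# Stacking the H x W x 3 image with the H x W x 1 first mask along the
# channel dimension gives the four-channel input described in S302.
stacked = torch.cat([image, first_mask], dim=1)  # (B, 4, H, W)

net = MaskCompletionNet()                        # from the earlier sketch
full_logits, parsing_logits = net(stacked)
target_complete_mask = (torch.sigmoid(full_logits) > 0.5).float()  # M_a
```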
the target neural network model is obtained by training based on the original image sample, the preset analysis mask of the target object sample and the reference complete mask of the target object sample, namely, two tasks are complete in the training process, one task is to obtain the target complete mask of the target object sample, the other task is to obtain the complete analysis mask of the target object sample, and the complete analysis mask of the target object sample comprises semantic mask labels of all parts of the target object, so that the target neural network model is trained by taking the preset analysis mask and the reference complete mask as supervision signals, more dimensions are measured, and the accuracy of the target complete mask of the target object output by the target neural network model can be improved.
In this embodiment, the target neural network model is obtained by training based on an original image sample, a preset analysis mask of a target object sample, and a reference complete mask of the target object sample, and semantic information of each part of the target object is considered, so that a first mask of a visible region of the target object in the original image is obtained, wherein the original image includes the target object, and a part of the region of the target object is blocked by a blocking object; and stacking the original image and the first mask and inputting the stacked original image and the first mask into a target neural network model to obtain a target complete mask of a target object in the original image, wherein the accuracy of obtaining the target complete mask of the target object is higher.
Fig. 4 is a schematic flow chart of another image processing method provided by the present disclosure. Fig. 4 builds on the embodiment shown in fig. 3 and further includes:
s303: and obtaining a complete analysis mask of the target object in the original image.
Wherein the complete parsing mask includes semantic mask tags for portions of the target object.
In some scenarios, image processing may be performed using the complete resolution mask of the target object, for example, in the process of performing image completion, the completion effect may be determined based on the complete resolution mask of the image after completion of completion.
Therefore, the present disclosure also outputs a complete analysis mask of the target object in the original image, so as to facilitate processing of the image in some scenes, and improve the accuracy of image processing.
And a target complete mask of the target object and a complete analysis mask of the target object can be obtained simultaneously through one target neural network model. The image processing efficiency can be improved for a scene in which the target complete mask and the complete analysis mask of the target object are simultaneously applied to perform image processing.
Fig. 5 is a schematic flow chart of yet another image processing method provided by the present disclosure. Fig. 5 builds on the embodiment shown in fig. 3 or fig. 4. In some scenarios, image processing applies both the mask of the visible region of the target object and the target complete mask of the target object; for example, in a target object completion scenario, completion must be performed based on the mask of the invisible region. Accordingly, the method of the present disclosure may further include:
s304: and subtracting the mask of the visible area of the target object from the target complete mask of the target object to obtain the mask of the invisible area of the target object.
One possible implementation manner is as follows: and taking the first mask as a mask of a visible region of a target object, namely, subtracting the mask of the visible region of the target object from a target complete mask of the target object to obtain a mask of an invisible region of the target object.
Another possible implementation is: the mask of the accurate visible region of the target object is obtained through other manners, for example, the mask of the accurate visible region of the target object is obtained through a manner such as matting, and the mask of the accurate visible region of the target object is subtracted from the target complete mask of the target object to obtain the mask of the invisible region of the target object, so that the accuracy of the mask of the invisible region of the target object can be further improved.
In the present disclosure, since the accuracy of the target complete mask is improved, the accuracy of obtaining the mask of the invisible area of the target object is also improved by the difference between the target complete mask of the target object and the mask of the visible area of the target object, and image completion and other processing can be performed based on the obtained mask of the invisible area, so that the image completion effect can be further improved.
Building on the embodiment shown in fig. 5, in some scenarios a semantic mask label of the invisible region of the target object is required; therefore, the method may further include S305:
s305: multiplying the mask of the invisible area of the target object by the complete analysis mask of the target object to obtain the semantic mask label of the invisible area of the target object.
Because the mask of the invisible area only has the value of the invisible area of 1 and the values of the rest areas are all 0, the mask of the invisible area of the target object is multiplied by the complete analysis mask of the target object, and the semantic mask label of the invisible area of the target object can be obtained. Therefore, it can be determined based on the semantic mask label of the invisible area of the target object which part of the target object is the invisible area, that is, which part of the target object is specifically covered by the blocking object, and taking the target object as a human body as an example, it can be determined whether the head, the neck, the left shoulder, the right shoulder, the left upper arm, the right upper arm, the left lower arm, the right lower arm, the left hand, the right hand, the left hip, the right hip, the left thigh, the right thigh, the left calf, the right calf, the left foot, the right foot or the body is covered according to the semantic mask label. Optionally, the completion of the invisible area may be further performed according to the semantic mask label, for example, if it is determined that the right foot is blocked according to the semantic mask label, the completion of the invisible area may be performed according to the feature of the right foot. Optionally, the completion effect may be further measured according to the semantic mask label, for example, it is determined that the right foot is blocked according to the semantic mask label, however, the completion accuracy may be determined to be not high if the image after completion of the invisible area is the left foot.
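A short sketch of S304 and S305, continuing the variables from the inference example above (all names are illustrative):

```python
# S304: the invisible-region mask is the target complete mask minus the
# mask of the visible region (clamped so stray negative pixels become 0).
invisible_mask = (target_complete_mask - first_mask).clamp(min=0)  # (B,1,H,W)

# The complete analysis mask as per-pixel part IDs (0 assumed background).
parsing_ids = parsing_logits.argmax(dim=1, keepdim=True)           # (B,1,H,W)

# S305: pixel-wise multiplication keeps part IDs only where the target
# object is invisible, i.e. the semantic mask label of the invisible region.
invisible_semantics = (invisible_mask * parsing_ids).long()
occluded_part_ids = invisible_semantics.unique()  # which parts are occluded
```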
Fig. 6 is a schematic flow chart of yet another image processing method provided by the present disclosure. Fig. 6 builds on any one of the embodiments shown in fig. 3 to fig. 5 and further includes a process of obtaining the target neural network model through machine learning, as shown in fig. 6:
s601: a second mask of the target object sample in the original image sample is obtained.
The original image sample contains a target object sample, and a partial region of the target object sample is occluded by an occlusion. The visible region of the target object sample refers to the region of the target object sample displayed in the original image sample. The original image sample typically contains a background, a target object sample, and an occlusion.
Optionally, the second mask of the visible region of the target object sample in the original image sample may be obtained through an instance segmentation network.
S602: Stack the original image sample and the second mask and input the result into a neural network model to obtain a target complete mask of the target object sample in the original image sample and a complete analysis mask of the target object sample.
The neural network model handles two tasks simultaneously: one is to obtain the target complete mask of the target object sample, and the other is to obtain the complete analysis mask of the target object sample.
S603: and taking the reference complete mask of the target object sample and the preset analysis mask of the target object sample as supervision signals, obtaining a loss function by using the supervision signals, the target complete mask of the target object sample and the complete analysis mask of the target object sample, training the neural network model until the neural network model converges, and taking the converged neural network model as the target neural network model.
Optionally, the preset analysis mask of the target object sample may be obtained through, but is not limited to, the following possible implementations:
One possible implementation is to annotate the parts of the target object with semantic mask labels manually and then use these semantic mask labels to form the preset analysis mask of the target object sample, which includes the semantic mask labels of the parts of the target object sample.
Another possible implementation is to input the original image sample into a target object analysis network model to obtain the preset analysis mask of the target object sample.
With the reference complete mask of the target object sample and the preset analysis mask of the target object sample as supervision signals, a first loss function is obtained during training from the target complete mask of the target object sample output by the neural network model and the reference complete mask of the target object sample, and a second loss function is obtained from the complete analysis mask of the target object sample output by the neural network model and the preset analysis mask of the target object sample. The neural network model is trained according to the first loss function and the second loss function; that is, the parameters of the neural network model are adjusted according to the two loss functions until the neural network model converges, i.e., until the first and second loss functions meet preset requirements, at which point the accuracy of the output target complete mask and complete analysis mask of the target object sample meets the requirements and the neural network model is considered converged. The converged neural network model is taken as the target neural network model.
Specifically, the neural network model may be trained based on a gradient descent method according to the first loss function and the second loss function until the neural network model converges. In this embodiment, a second mask of the target object sample in the original image sample is obtained, the original image sample and the second mask are stacked and input into a neural network model to obtain a target complete mask of the target object sample in the original image sample and a complete analysis mask of the target object sample, and, with the reference complete mask of the target object sample and the preset analysis mask of the target object sample as supervision signals, the neural network model is trained until it converges and the converged model is taken as the target neural network model. Because the complete analysis mask of the target object sample includes semantic mask labels of the parts of the target object, training with both the preset analysis mask and the reference complete mask as supervision signals measures more dimensions and can improve the accuracy of the target complete mask of the target object output by the target neural network model.
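A minimal training-step sketch of S601-S603 under the two supervision signals, assuming PyTorch and the toy MaskCompletionNet from the earlier example; the optimizer, learning rate, and equal weighting of the two loss functions are assumptions, not values from the disclosure:

```python
import torch
import torch.nn as nn

net = MaskCompletionNet()                       # toy model from earlier sketch
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()   # first loss: target complete mask vs. reference
ce = nn.CrossEntropyLoss()     # second loss: parsing output vs. preset mask

def train_step(stacked, ref_complete_mask, preset_parsing_ids):
    """stacked: (B, 4, H, W) image/second-mask stack; ref_complete_mask:
    (B, 1, H, W) in {0, 1}; preset_parsing_ids: (B, H, W) int64 labels."""
    full_logits, parsing_logits = net(stacked)
    first_loss = bce(full_logits, ref_complete_mask)
    second_loss = ce(parsing_logits, preset_parsing_ids)
    loss = first_loss + second_loss             # equal weighting assumed
    optimizer.zero_grad()
    loss.backward()                             # gradient descent per S603
    optimizer.step()
    return loss.item()
```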
Fig. 7 is a schematic structural diagram of an image processing apparatus according to the present disclosure, and as shown in fig. 7, the apparatus of the present embodiment includes: an acquisition module 701 and a processing module 702, wherein,
an obtaining module 701, configured to obtain a first mask of a visible region of a target object in an original image, where the original image contains the target object, and a partial region of the target object is occluded by an occlusion;
a processing module 702, configured to stack the original image and the first mask and input the result into a target neural network model to obtain a target complete mask of the target object in the original image, where the target neural network model is trained based on an original image sample, a preset analysis mask of the target object sample, and a reference complete mask of the target object sample; the original image sample includes the target object sample, a partial region of the target object sample is occluded by an occlusion, and the preset analysis mask includes semantic mask labels of the parts of the target object sample.
Optionally, the processing module 702 is further configured to subtract the first mask from the target complete mask of the target object to obtain a mask of the invisible area of the target object.
Optionally, the processing module 702 is further configured to output a complete parsing mask of the target object in the original image, where the complete parsing mask includes semantic mask tags of portions of the target object.
Optionally, the processing module 702 is further configured to subtract the first mask from the target complete mask of the target object to obtain a mask of an invisible area of the target object; and multiplying the mask of the invisible area of the target object by the complete analysis mask of the target object to obtain a semantic mask label of the invisible area of the target object.
Optionally, the obtaining module 701 is further configured to obtain a second mask of a target object sample in the original image sample;
the processing module 702 is further configured to input the original image sample and the second mask into a neural network model, so as to obtain a target complete mask of a target object sample in the original image sample and a complete analysis mask of the target object sample; and taking the reference complete mask of the target object sample and the preset analysis mask of the target object sample as supervision signals, obtaining a loss function by using the supervision signals, the target complete mask of the target object sample and the complete analysis mask of the target object sample, training the neural network model until the neural network model converges, and taking the converged neural network model as the target neural network model.
Optionally, the processing module 702 is further configured to input the original image sample into a target object analysis network model, so as to obtain a preset analysis mask of the target object sample.
The apparatus of this embodiment may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects thereof are similar, and are not described herein again.
The present disclosure also provides an electronic device, comprising: a processor for executing a computer program stored in a memory, the computer program, when executed by the processor, implementing the steps of the method embodiments of any of figures 3-6. It should be noted that the processor may be a Graphics Processing Unit (GPU); that is, the program algorithm of the present disclosure may be executed entirely by the GPU, for example using Compute Unified Device Architecture (CUDA)-accelerated PyTorch or the like.
The present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method embodiment of any one of the figures 3-6.
The present disclosure provides a computer program product which, when run on a computer, causes the computer to perform the steps of a method embodiment as described in any of figures 3-6.
In the above-described embodiments, all or part of the functions may be implemented by software, hardware, or a combination of software and hardware. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the disclosure are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of processing an image, comprising:
acquiring a first mask of a visible region of a target object in an original image, wherein the original image contains the target object, and a partial region of the target object is occluded by an occlusion;
stacking the original image and the first mask and inputting the result into a target neural network model to obtain a target complete mask of the target object in the original image, wherein the target neural network model is trained based on an original image sample, a preset analysis mask of the target object sample, and a reference complete mask of the target object sample, the original image sample contains the target object sample, a partial region of the target object sample is occluded by an occlusion, and the preset analysis mask comprises semantic mask labels of the parts of the target object sample.
2. The method of claim 1, further comprising:
and subtracting the first mask from the target complete mask of the target object to obtain a mask of an invisible area of the target object.
3. The method of claim 1, wherein after stacking the original image and the first mask and inputting the result into the target neural network model, the method further comprises:
outputting a complete parsing mask of a target object in the original image, the complete parsing mask including semantic mask tags of portions of the target object.
4. The method of claim 3, further comprising:
subtracting the first mask from the target complete mask of the target object to obtain a mask of an invisible area of the target object;
and multiplying the mask of the invisible area of the target object by the complete analysis mask of the target object to obtain a semantic mask label of the invisible area of the target object.
5. The method according to any of claims 1-4, wherein prior to said obtaining the first mask of the visible region of the target object in the original image, further comprising:
acquiring a second mask of a target object sample in the original image sample;
inputting the original image sample and the second mask into a neural network model to obtain a target complete mask of a target object sample in the original image sample and a complete analysis mask of the target object sample;
and taking the reference complete mask of the target object sample and the preset analysis mask of the target object sample as supervision signals, obtaining a loss function by using the supervision signals, the target complete mask of the target object sample and the complete analysis mask of the target object sample, training the neural network model until the neural network model converges, and taking the converged neural network model as the target neural network model.
6. The method of claim 5, further comprising:
and inputting the original image sample into a target object analysis network model to obtain a preset analysis mask of the target object sample.
7. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object, and a partial region of the target object is occluded by an occlusion;
a processing module, configured to stack the original image and the first mask and input the result into a target neural network model to obtain a target complete mask of the target object in the original image, wherein the target neural network model is trained based on an original image sample, a preset analysis mask of the target object sample, and a reference complete mask of the target object sample, the original image sample contains the target object sample, a partial region of the target object sample is occluded by an occlusion, and the preset analysis mask comprises semantic mask labels of the parts of the target object sample.
8. An electronic device, comprising: a processor for executing a computer program stored in a memory, the computer program, when executed by the processor, implementing the steps of the method of any of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
10. A computer program product, characterized in that it causes a computer to carry out the image processing method according to any one of claims 1 to 6, when said computer program product is run on the computer.
CN202110246097.4A 2021-03-05 2021-03-05 Image processing method, apparatus, electronic device, medium, and computer program product Pending CN112967197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110246097.4A CN112967197A (en) 2021-03-05 2021-03-05 Image processing method, apparatus, electronic device, medium, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110246097.4A CN112967197A (en) 2021-03-05 2021-03-05 Image processing method, apparatus, electronic device, medium, and computer program product

Publications (1)

Publication Number Publication Date
CN112967197A true CN112967197A (en) 2021-06-15

Family

ID=76276600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110246097.4A Pending CN112967197A (en) 2021-03-05 2021-03-05 Image processing method, apparatus, electronic device, medium, and computer program product

Country Status (1)

Country Link
CN (1) CN112967197A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353514A (en) * 2018-12-20 2020-06-30 马上消费金融股份有限公司 Model training method, image recognition method, device and terminal equipment
WO2020216008A1 (en) * 2019-04-25 2020-10-29 腾讯科技(深圳)有限公司 Image processing method, apparatus and device, and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FANG MIAO ET AL.: "Research on Character Image Inpainting based on Generative Adversarial Network", 2020 International Conference on Culture-oriented Science & Technology (ICCST), 24 November 2020 (2020-11-24), pages 137-140 *
QIANG ZHOU ET AL.: "Human De-occlusion: Invisible Perception and Recovery for Humans", 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2 November 2021 (2021-11-02), pages 3691-3701 *
XIAOSHENG YAN ET AL.: "Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 February 2020 (2020-02-27), pages 7618-7627 *
ZHOU QIANG: "Research on Video Object Segmentation and Completion Based on Prior-Information Modeling" (基于先验信息建模的视频目标分割和补全研究), China Master's Theses Full-text Database (Information Science and Technology), no. 01, 15 January 2023 (2023-01-15), pages 138-1227 *
TANG HAOFENG ET AL.: "A Survey of Deep-Learning-Based Image Completion Algorithms" (基于深度学习的图像补全算法综述), Computer Science (计算机科学), vol. 47, no. 11, 30 November 2020 (2020-11-30), pages 151-164 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445632A (en) * 2022-02-08 2022-05-06 支付宝(杭州)信息技术有限公司 Picture processing method and device

Similar Documents

Publication Publication Date Title
CN109584276B (en) Key point detection method, device, equipment and readable medium
US20220230420A1 (en) Artificial intelligence-based object detection method and apparatus, device, and storage medium
Li et al. CNN for saliency detection with low-level feature integration
Srivastav et al. Human pose estimation on privacy-preserving low-resolution depth images
US11663502B2 (en) Information processing apparatus and rule generation method
Ejaz et al. Feature aggregation based visual attention model for video summarization
CN111275784B (en) Method and device for generating image
Rezende et al. Development and validation of a Brazilian sign language database for human gesture recognition
Biswas et al. Beyond document object detection: instance-level segmentation of complex layouts
US20200193604A1 (en) Encoder Regularization of a Segmentation Model
WO2022161302A1 (en) Action recognition method and apparatus, device, storage medium, and computer program product
CN111353325A (en) Key point detection model training method and device
Tan et al. Distinctive accuracy measurement of binary descriptors in mobile augmented reality
EP4222700A1 (en) Sparse optical flow estimation
CN112364916A (en) Image classification method based on transfer learning, related equipment and storage medium
Lee et al. Improved method on image stitching based on optical flow algorithm
Kapur et al. Mastering opencv android application programming
CN111932438A (en) Image style migration method, equipment and storage device
CN112967197A (en) Image processing method, apparatus, electronic device, medium, and computer program product
Muchtar et al. Moving pedestrian localization and detection with guided filtering
JP2020003879A (en) Information processing device, information processing method, watermark detection device, watermark detection method, and program
CN112967199A (en) Image processing method and device
Novkovic et al. CLUBS: An RGB-D dataset with cluttered box scenes containing household objects
CN110517239A (en) A kind of medical image detection method and device
Zhang et al. A map-based normalized cross correlation algorithm using dynamic template for vision-guided telerobot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination