CN111247790A - Image processing method and device, image shooting and processing system and carrier


Info

Publication number
CN111247790A (application CN201980004937.7A)
Authority
CN
China
Prior art keywords
image
frame
target
target object
sequence
Prior art date
Legal status
Pending
Application number
CN201980004937.7A
Other languages
Chinese (zh)
Inventor
薛立君
费奥多尔·克拉夫琴科
Current Assignee
SZ DJI Technology Co Ltd
Shenzhen Dajiang Innovations Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd
Publication of CN111247790A

Classifications

    • H04N23/64 Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/20224 Image subtraction

Abstract

Embodiments of the invention disclose an image processing method, an image processing apparatus, an image capturing and processing system, and a carrier, wherein the method comprises the following steps: acquiring an image frame sequence captured by time-lapse shooting; determining, in the image frame sequence, a target frame in which a target object is present; matting out the image region of the target frame in which the target object is present; and filling the image region after the target object has been matted out. The target object can thereby be effectively removed from the images, improving the playback effect of the image frame sequence obtained by time-lapse shooting.

Description

Image processing method and device, image shooting and processing system and carrier
The disclosure of this patent document contains material that is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office official records.
Technical Field
Embodiments of the invention relate to the technical field of image processing, and in particular to an image processing method, an image processing apparatus, an image capturing and processing system, and a carrier.
Background
Time-lapse shooting (delayed shooting) is a shooting technique that compresses time: after a group of photos or a video is captured, a long-lasting process can be compressed into a short time by stitching the photos together or extracting video frames in post-processing, and the process can then be played back as a video.
With the development of aerial photography by unmanned aerial vehicles (UAVs), more and more users use UAVs for time-lapse shooting. However, when a UAV performs time-lapse shooting, animals such as birds are easily attracted to it; for example, during aerial photography, birds often accompany the UAV in flight, so that birds, or parts of their bodies, frequently appear in the UAV's lens. Similar situations also occur when a handheld gimbal is used for time-lapse shooting; for example, in a scenic spot with a dense flow of people, visitors often enter the field of view of the time-lapse shot. Because of the nature of time-lapse shooting, the images of the birds or visitors appear abruptly in the resulting frames, which seriously degrades the playback effect of the time-lapse video.
Disclosure of Invention
In view of this, embodiments of the present invention provide an image processing method and apparatus, an image capturing and processing system, and a carrier, which can effectively remove an abnormal object from an image and improve the playback effect of an image frame sequence obtained by time-lapse shooting.
A first aspect of embodiments of the present invention provides an image processing method, including:
acquiring an image frame sequence captured by time-lapse shooting;
determining, in the image frame sequence, a target frame in which a target object is present;
matting out an image region of the target frame in which the target object is present;
and filling the image region after the target object has been matted out.
A second aspect of embodiments of the present invention provides an image processing apparatus, including a memory and a processor;
the memory is used for storing program code;
the processor invokes the program code and, when the code is executed, is configured to perform:
acquiring an image frame sequence captured by time-lapse shooting;
determining, in the image frame sequence, a target frame in which a target object is present;
matting out an image region of the target frame in which the target object is present;
and filling the image region after the target object has been matted out.
A third aspect of embodiments of the present invention provides an image capturing and processing system, comprising a capturing device and one or more processors, wherein:
the capturing device is used for obtaining an image frame sequence through time-lapse shooting and sending the image frame sequence to the one or more processors;
the one or more processors are configured to determine, in the image frame sequence, a target frame in which a target object is present, mat out the image region of the target frame in which the target object is present, and fill the image region after the target object has been matted out.
A fourth aspect of embodiments of the present invention provides a carrier comprising an image capturing and processing apparatus, wherein the image capturing and processing apparatus is configured to perform:
acquiring an image frame sequence captured by time-lapse shooting;
determining, in the image frame sequence, a target frame in which a target object is present;
matting out an image region of the target frame in which the target object is present;
and filling the image region after the target object has been matted out.
In embodiments of the invention, the image frame sequence captured by time-lapse shooting is first acquired, so that a target frame in which a target object is present can be determined in the sequence, the image region of the target object in the target frame can be matted out, and the matted-out region can be filled. The image corresponding to the target object in the target frame can thereby be effectively removed, improving the playback effect of the time-lapse shot.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a schematic diagram of an image processing scenario provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image processing scenario provided by another embodiment of the present invention;
FIG. 3 is a schematic diagram of a sequence of image frames provided by an embodiment of the present invention;
FIG. 4a is a diagram of a target frame with a target object according to an embodiment of the present invention;
FIG. 4b is a schematic diagram of the target frame shown in FIG. 4a after the target object has been matted out, according to an embodiment of the present invention;
FIG. 4c is a schematic diagram of the image shown in FIG. 4b after being filled in according to the embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram of an image processing method provided by an embodiment of the invention;
FIG. 6 is a schematic flow chart diagram of an image processing method according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of a target object being a local object according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of an image capture and processing system provided by an embodiment of the present invention;
FIG. 9a is a schematic diagram of a partial mask according to an embodiment of the present invention;
fig. 9b is a schematic structural diagram of a partial-convolution neural network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Currently, to eliminate a target object from an image frame sequence obtained by time-lapse shooting (where the target object is an abnormal object that appears in a target frame during the time-lapse shooting, or an object in the target frame that a user has designated for removal), the sequence may be processed manually after it has been obtained. Manual processing, however, carries a risk of omissions and removes target objects from the sequence inefficiently, so the target object cannot be eliminated effectively. The present application therefore provides an image processing method that automatically identifies, in the time-lapse image frame sequence, the image regions in which a target object is present, mats them out, and fills the matted-out regions. This improves both the efficiency and the quality of removing the target object's image regions, and thereby the playback effect of the image frame sequence obtained by time-lapse shooting.
In one embodiment, the image processing method may be applied to the image processing scenario shown in fig. 1; in particular, it may be applied to the image capturing and processing system shown in fig. 1, which comprises a capturing device and one or more processors. In this application scenario the capturing device and the one or more processors are integrated in the same physical device, so the image capturing and processing system comprises only that device. In the drone shown in the figure, the one or more processors are configured inside the drone and the camera is mounted on it; the camera obtains an image frame sequence through time-lapse shooting and sends it to the drone, specifically to the one or more processors integrated in it, which acquire the image frame sequence captured and sent by the capturing device. Concretely, the capturing device may capture images at a preset time interval and send them to the one or more processors, which order the acquired images in time sequence; the ordered images can then be further compressed into an image frame sequence, so that the sequence captured by time-lapse shooting is obtained.
In another embodiment, the camera and the one or more processors may be integrated into different physical devices, in which case the corresponding image capturing and processing system is composed of several physical devices: the camera may be integrated into a mobile phone, a camera body, or the like, and the one or more processors may be integrated into a ground station or a remote control device. The physical devices hosting the camera and the one or more processors may transmit images over a pre-established communication connection, so that images in which a target object is present can be processed.
In one embodiment, the image processing method may further be applied to the image processing scene shown in fig. 2; in particular, it may be applied to the carrier shown in fig. 2. The carrier comprises an image capturing and processing apparatus, which may be mounted on the carrier; the carrier may be an unmanned aerial vehicle, an unmanned vehicle, a handheld device with a gimbal, a carrier device, or the like. In this application scenario the carrier may be a handheld gimbal in which the above-mentioned image capturing and processing apparatus is configured, and the apparatus may be configured to perform: acquiring an image frame sequence captured by time-lapse shooting, and processing the sequence to obtain target images from which the abnormal objects have been removed. The image capturing device may be a part of the carrier or may be fixedly attached to it. The image processing device is in wired or wireless communication with the image capturing device and receives the image data the latter captures.
In an embodiment, the drone or the handheld gimbal may order the acquired images by time. The shooting scene shown in fig. 1 is taken as an example to describe the solution in detail; the implementation of the image processing method in the shooting scene shown in fig. 2 can be understood by reference to this embodiment. Specifically, the drone, i.e., the one or more processors in the drone, may order the acquired images in the temporal order indicated by the arrows in fig. 3, and the image frame sequence obtained by further compression after ordering may be as shown in fig. 3. After the image frame sequence is obtained, each frame in the sequence may be identified so as to determine a target frame containing a target object, where the target object is an abnormal object that appears during the time-lapse shooting: an object whose image is composed of the pixels of the target frame whose values differ, at the same positions, from those of a frame adjacent to the target frame. Through identification of the image frame sequence shown in fig. 3, the image determined to contain the target object is the image marked 2 in the figure.
After the drone acquires the image frame sequence captured by time-lapse shooting, a target frame in which a target object is present may be determined in the sequence; assume the target frame the drone determines is as shown in fig. 4a, which is the image marked 2 in fig. 3. The target object is assumed to be the abnormal object identified by region 401 in the figure. The target object may be an interfering object, such as a bird, preset in the drone as an abnormal object, or an object selected by the user for removal. When determining a target object, the drone may determine it from the target frame upon detecting that the frame includes a preset interfering object; alternatively, the drone may identify the types of the objects included in any image frame and determine the target object based on the counts of the various object types, for example taking as the target object the type with the fewest instances among the objects in the frame, as the sketch below illustrates.
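As an illustration of the fewest-instances rule just described, the following minimal sketch picks the object type with the smallest count among the detections in a frame. The list-of-class-names input format is an illustrative assumption, not the patent's data structure.

```python
# Sketch of the "fewest instances" rule for choosing the target object type.
# Assumes the object types detected in one frame are given as a list of
# class-name strings (an illustrative assumption).
from collections import Counter

def fewest_count_class(detected_classes: list[str]) -> str | None:
    """Return the object type that appears least often in one frame."""
    counts = Counter(detected_classes)
    return min(counts, key=counts.get) if counts else None

# Example: a frame full of cars with a single bird -> "bird" is the target.
print(fewest_count_class(["car", "car", "car", "bird"]))  # -> bird
```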
After the drone determines the target frame from the image frame sequence, the image region of the target object in the target frame may be matted out, that is, the image identified by region 401 in fig. 4a is removed, and the matted target frame may be as shown in fig. 4b. Further, after the region has been matted out, it can be filled, and the filled image may be as shown in fig. 4c; the target object thus no longer disturbs playback of the image frame sequence, and the viewing quality for the user is improved. In the shooting scene shown in fig. 2, the target object (i.e., the abnormal object) for the handheld gimbal is a visitor, i.e., the person in fig. 2 who suddenly appears during shooting.
Referring to fig. 5, which is a schematic flowchart of an image processing method provided by an embodiment of the present invention. The method can be applied to the image capturing and processing system and the carrier described above; in this embodiment of the invention, the method is described in detail with the image capturing and processing system as the executing subject. As shown in fig. 5, the method includes:
s501, obtaining an image frame sequence shot in a time-delay mode.
In one embodiment, the capturing device in the image capturing and processing system is configured to capture an image frame sequence by time-lapse shooting and to transmit the sequence to the one or more processors included in the system, and the processors acquire the image frame sequence so captured.
Specifically, the capturing device may obtain multiple frames of images by shooting at a preset time interval, which may be, for example, 30 minutes or 2 hours; the capturing device may be, for example, an image capturing apparatus such as a camera. After capturing the multiple frames, the device may order them by time to obtain an initial image sequence, and the initial image sequence may then be compressed to generate the image frame sequence.
In another embodiment, the image frame sequence may instead be generated by the device into which the one or more processors are integrated, which may be, for example, the drone, unmanned vehicle, ground station, or remote control device mentioned above. After shooting the multiple frames, the capturing device may send them directly to the one or more processors, which order them by time to obtain an initial image sequence and compress that sequence, thereby generating the image frame sequence, as sketched below.
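A minimal sketch of the ordering-and-compression step described above, assuming each captured image carries a capture timestamp; the (timestamp, image) pair format and the brightness-based blank-frame test are illustrative assumptions.

```python
# Sketch: building the time-lapse image frame sequence from captured stills.
import numpy as np

def build_frame_sequence(captures: list[tuple[float, np.ndarray]],
                         blank_thresh: float = 5.0) -> list[np.ndarray]:
    # Order the acquired images by capture time (the initial image sequence).
    ordered = [img for _, img in sorted(captures, key=lambda c: c[0])]
    # "Compress" by dropping blank frames so the sequence plays continuously;
    # the mean-brightness test for blankness is an assumption.
    return [img for img in ordered if img.mean() > blank_thresh]
```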
S502, in the image frame sequence, determining a target frame with a target object.
After the image frame sequence is acquired by the image capturing and processing system (specifically, by the one or more processors), the sequence may be preprocessed in order to determine a target frame in which a target object is present: the sequence may be split into image groups ordered by time, and the target frame may then be determined from these image groups.
When the image capturing and processing system determines the target object, it may do so based on a target object preset in the system. Specifically, the type of the preset target object may be determined first; image recognition may then be performed on any image frame of the sequence to determine the types of the objects it contains; the detected types may be compared with the preset target object type; and the target frame containing an object of the target type may be determined from the comparison result.
In one embodiment, the target object type preset in the image capturing and processing system may differ with the shooting scene, such as natural scenery, city life, or biological evolution. For example, when the shooting scene is natural scenery, the preset target object type may be birds and the like; when the shooting scene is biological evolution, the preset target object type may be humans and the like. One or more target object types may be preset in the image capturing and processing system.
S503, matting out the image region of the target frame in which the target object is present.
Before matting out the image region of the target object in the target frame, the image capturing and processing system may determine that region based on a preset network model. The preset network model may be, for example, a Region-based Convolutional Neural Network (R-CNN) model; specifically, the system may input the target frame into the R-CNN model and determine the image region corresponding to the target object from the model's output.
When the R-CNN model determines the image region of the target object from the input target frame, it may first perform feature extraction on the frame, determine the categories of the objects contained in the frame from the extraction result, compare those categories with the target object category preset in the image capturing and processing system, and determine the image region of the target object in the target frame from the comparison result. A detection sketch follows.
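The patent specifies an R-CNN model without naming a concrete implementation; the sketch below uses torchvision's pretrained Faster R-CNN as a stand-in, and the COCO class id for "bird" and the confidence threshold are illustrative assumptions.

```python
# Sketch: locating the target-object region with an off-the-shelf detector
# standing in for the preset R-CNN model described above.
import torch
import torchvision

BIRD_CLASS = 16      # COCO label id for "bird" (assumed preset target type)
SCORE_THRESH = 0.7   # assumed confidence threshold

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def target_regions(frame_chw: torch.Tensor) -> list[tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) boxes of target objects in one float CHW frame."""
    with torch.no_grad():
        pred = model([frame_chw])[0]   # dict with boxes, labels, scores
    return [tuple(int(v) for v in box.tolist())
            for box, label, score in zip(pred["boxes"], pred["labels"],
                                         pred["scores"])
            if label.item() == BIRD_CLASS and score.item() >= SCORE_THRESH]
```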
After the image region of the target object in the target frame has been determined, a corresponding local mask image can be generated for that region; the mask image identifies a local area of the image, so the local image region identified by the mask can be matted out, removing the region of the target frame in which the target object is present. After the region is matted out, it may be represented by a white area, and step S504 is performed.
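A minimal sketch of the masking and matting step, assuming an H x W x 3 uint8 frame and a detection box from the previous step; representing the matted-out region as white follows the description above.

```python
# Sketch: generating a local mask from a detection box and matting out the
# masked region, leaving it white as described in the text.
import numpy as np

def box_mask(shape: tuple, box: tuple) -> np.ndarray:
    """Build a boolean local mask from an (x1, y1, x2, y2) box."""
    mask = np.zeros(shape[:2], dtype=bool)
    x1, y1, x2, y2 = box
    mask[y1:y2, x1:x2] = True
    return mask

def mat_out(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Remove the masked pixels from the frame, representing them as white."""
    result = frame.copy()
    result[mask] = 255
    return result
```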
S504, filling the image region after the target object has been matted out.
In one embodiment, after the image region in which the target object is present has been matted out of the target frame, the image capturing and processing system needs to fill the removed region, so as to maintain the continuity of the target frame image and preserve the playback effect of the image frame sequence.
The matted-out image region may be filled in several ways. It may be filled based on the previous frame image and the next frame image of the target frame. Alternatively, the target frame from which the region of the target object has been matted out, together with the unit image corresponding to that region, may be input into a convolutional neural network model; the model's output image is the filled target frame. When a convolutional neural network is used for pixel filling, a UNet structure built from partial convolution layers may specifically be used. As a further alternative, the previous frame image, the next frame image, the matted target frame, and the unit image corresponding to the removed region may all be input into the convolutional neural network model for pixel filling.
In this embodiment of the invention, the image capturing and processing system may first acquire the image frame sequence captured by time-lapse shooting, determine in it a target frame in which a target object is present, mat out the image region of the target object in the target frame, and fill the matted-out region, so that the image corresponding to the target object is effectively removed and the playback effect of the time-lapse shot is improved.
Referring to fig. 6, which is a schematic flowchart of an image processing method according to another embodiment of the present invention. This method can likewise be applied to the image capturing and processing system and the carrier, and in this embodiment of the invention it is again described with the image capturing and processing system as the executing subject. As shown in fig. 6, the method includes:
s601, obtaining an image frame sequence shot in a time delay mode.
In one embodiment, when determining the image frame sequence, at least one initial image captured by the camera in the image capturing and processing system may first be acquired, the images may be ordered by time to obtain an initial image sequence, and the initial image sequence may then be compressed to obtain the image frame sequence. When compressing the initial image sequence, blank images may be deleted and the timestamps of the remaining initial images adjusted, so that each frame of the resulting image frame sequence is a continuous non-blank image.
S602, in the image frame sequence, determining a target frame with a target object.
In one embodiment, when determining a target frame in which a target object is present, the image capturing and processing system may determine an adjacent frame of the target frame and compare the target frame with the adjacent frame, so as to determine, from the target frame, a target image composed of pixels whose values differ from those of the adjacent frame at the same positions; the object corresponding to the target image is the target object, and the frame of the image frame sequence that includes the target object is determined as the target frame. A comparison sketch follows.
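A minimal sketch of the adjacent-frame comparison, assuming aligned H x W x 3 uint8 frames; the per-pixel difference threshold and minimum changed area are illustrative assumptions, since the patent only requires differing pixel values at the same positions.

```python
# Sketch: flagging a target frame by pixel-wise comparison with an adjacent frame.
import numpy as np

DIFF_THRESH = 25   # per-pixel difference treated as "different" (assumed)
MIN_PIXELS = 500   # minimum changed area for a target frame (assumed)

def changed_mask(frame: np.ndarray, adjacent: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels whose values differ between the two frames."""
    diff = np.abs(frame.astype(np.int16) - adjacent.astype(np.int16)).max(axis=2)
    return diff > DIFF_THRESH

def is_target_frame(frame: np.ndarray, adjacent: np.ndarray) -> bool:
    return int(changed_mask(frame, adjacent).sum()) >= MIN_PIXELS
```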
The target object is generally a moving object and may be the whole or a part of one. When the whole moving object is captured in the target frame, the target object is the whole object: in fig. 4a, for example, the target object in the target frame is an entire bird. When only part of the moving object is captured in the target frame, the target object is that part, i.e., the part of the moving object captured in the frame: in fig. 7, the target object in the target frame is a part of a bird (its feet), i.e., the partial image marked 701.
In an embodiment, the image capturing and processing system may identify image edges in the target frame based on a convolutional neural network structure, which can speed up identifying the image corresponding to the target object in the target frame. Specifically, the edges may be identified by a UNet structure with partial convolution layers, and the edge of the image corresponding to the target object in the target frame is determined according to the recognition result. When partial convolution layers are used to identify the image edges, the sets of pixels belonging to the same semantics can be determined, so that the image area formed by a same-semantics pixel set can be taken as the image area of the target object's image, from which the target frame can further be determined. Pixels belonging to the same semantics means, for example, that in fig. 4a the pixels corresponding to the bird's wings and the pixels corresponding to the bird's feet all describe features of the bird, so the wing pixels and the foot pixels belong to the same semantic pixel set; the pixels corresponding to the car door do not describe features of the bird, so they do not belong to the same semantic pixel set as the wing pixels. A sketch of a partial convolution layer follows.
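The patent names a UNet built from partial convolution layers without giving the layer's details; the sketch below follows the partial convolution formulation commonly used for image inpainting (Liu et al., 2018) and is an illustrative assumption, not the patented implementation.

```python
# Sketch of a partial convolution layer in PyTorch: the convolution is applied
# only to valid (unmasked) pixels, the result is re-normalised by the fraction
# of valid pixels under each window, and the mask shrinks layer by layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int,
                 stride: int = 1, padding: int = 0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        # Fixed all-ones kernel used to count valid input pixels per window.
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.stride, self.padding = stride, padding

    def forward(self, x: torch.Tensor, mask: torch.Tensor):
        # mask: (N, 1, H, W); 1 where pixels are valid, 0 inside the hole.
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones, stride=self.stride,
                             padding=self.padding)
        out = self.conv(x * mask)
        bias = self.conv.bias.view(1, -1, 1, 1)
        scale = self.ones.numel() / valid.clamp(min=1e-8)
        out = torch.where(valid > 0, (out - bias) * scale + bias,
                          torch.zeros_like(out))
        return out, (valid > 0).float()   # updated mask: the hole shrinks
```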
To enhance the reliability of image edge recognition by the partial convolution algorithm, after the same-semantics pixel sets have been determined, the distance between the image capturing and processing system and the object corresponding to each image area formed by same-semantics pixels may further be determined, so that the target object can be identified by comparing distances and the target frame in which it is present can then be determined. For example, if in fig. 4a the distance between the bird and the image capturing and processing system, determined from its semantic pixel set, is a, the distance between the vehicle and the system is b, and a is smaller than b, then the vehicle and the bird are not in the same plane, and the determined target object is the bird.
When determining a target frame in which a target object is present, the image capturing and processing system may process the image frame sequence with a neural network model and, based on that processing, determine the category of the objects included in each image of the sequence. Specifically, the system may input the plurality of images of the image frame sequence into the neural network model, invoke the model to perform feature extraction on any image to obtain a feature extraction result, and determine, based on the feature extraction results, the category of the objects included in each image of the sequence. In one embodiment, the features may be, for example, color features, texture features, and the like.
When determining, based on the feature extraction result, the category of the objects included in each image of the sequence, the image capturing and processing system may first invoke the neural network model to summarize the feature extraction result to obtain a feature summarizing result. In another embodiment, the system may aggregate all the feature extraction results, for example summarizing based on the texture features and the color features together. When summarizing the feature extraction results, they may be added directly to obtain the summarizing result, or combined by weighted calculation.
After the feature summarizing result is obtained, the image capturing and processing system may determine from it the category of the objects included in each image of the sequence; specifically, the system may match the feature summarizing result against the preset feature value corresponding to each object category, and determine the categories included in each image according to the matching result, as in the sketch below.
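A minimal sketch of matching a summarized feature vector against preset per-category feature values; cosine similarity and the 0.8 threshold are illustrative assumptions, as the patent does not specify the matching rule.

```python
# Sketch: deciding which object categories an image contains by matching its
# feature summary against preset per-category feature values.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def categories_in_image(summary: np.ndarray,
                        presets: dict[str, np.ndarray],
                        thresh: float = 0.8) -> list[str]:
    """Return the categories whose preset feature value matches the summary."""
    return [name for name, feat in presets.items()
            if cosine(summary, feat) >= thresh]
```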
After the image capturing and processing system has processed the image frame sequence with the neural network model and determined the category of the objects included in each image, a target frame may further be determined based on the model's output. Specifically, the system may determine the target frame from the sequence based on the object categories output by the model for each image: it may match those categories against the preset category of the target object, so as to judge whether each image of the sequence contains the target object, and determine an image frame that contains the target object as a target frame.
In another embodiment, when determining a target frame in which a target object is present, the image capturing and processing system may instead divide any frame of the plurality of frames of the image frame sequence into regions to obtain a plurality of region images, acquire the characteristic parameters of each region image, and determine the target frame based on those parameters: once the characteristic parameters have been acquired, the object categories included in the frame can be determined from them, and the target frame in which a target object is present can be determined from the object categories.
S603, determining the image region of the target frame in which the target object is present.
S604, generating a local mask pattern corresponding to the image region, based on the image region.
S605, matting out, according to the local mask pattern and the target frame, the image region of the target frame in which the target object is present.
Steps S603 to S605 are a refinement of step S503 in the embodiment above. When the image capturing and processing system mats out the image region of the target object in the target frame, the image region of the target frame in which the target object is present may be determined first.
After the image region of the target object in the target frame has been determined, a local mask pattern, i.e., a mask image, corresponding to that region may be generated based on the region; the local mask pattern marks the local image region of the target frame in which the target object is present. Based on the mask pattern and the target frame, the image region in which the target object is present can be matted out, after which it is represented by a white area in the target frame.
For example, if the target frame is the image shown in fig. 4a, the image region in which the target object is present corresponds to the region identified by 401. A corresponding local mask pattern may be generated based on that image region, and the image within the mask pattern may then be matted out; after the matting, the image region of the target frame in which the target object was present is represented in white, as shown in fig. 4b.
In an embodiment, the image region from which the target object has been removed may be filled based on the image information surrounding the local mask pattern. Specifically, the surrounding image region of the region in which the target object is present may first be determined, where the distance between the pixels of the surrounding region and the pixels of the target object's region is smaller than or equal to a preset distance threshold; the removed region may then be filled based on the surrounding image region.
When the image region from which the target object has been removed is filled based on the surrounding image region, a reference frame may be determined from the image frame sequence, where the reference frame is any one of the M frames preceding the target frame and M is an integer greater than 1. The image capturing and processing system may then determine the exposure intensity of the reference frame and, using a white balance algorithm, fill the removed region based on the exposure intensity of the reference frame and the surrounding image region; a sketch of such a fill follows.
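A heavily hedged sketch of such a fill: the dilation-based "surrounding region", the mean-color fill, and the mean-intensity gain toward the reference frame's exposure are all illustrative assumptions standing in for the white balance algorithm, which the patent does not detail.

```python
# Sketch: filling the hole from surrounding pixels while scaling toward the
# exposure of a reference frame taken from the M frames before the target.
import numpy as np
from scipy import ndimage

def fill_from_surroundings(frame: np.ndarray, hole_mask: np.ndarray,
                           reference: np.ndarray, radius: int = 15) -> np.ndarray:
    # Surrounding region: pixels within `radius` of the hole, outside it.
    surround = ndimage.binary_dilation(hole_mask, iterations=radius) & ~hole_mask
    filled = frame.astype(np.float32).copy()
    fill_color = frame[surround].mean(axis=0)          # mean surrounding color
    gain = reference.mean() / max(frame[surround].mean(), 1e-6)  # exposure match
    filled[hole_mask] = np.clip(fill_color * gain, 0, 255)
    return filled.astype(np.uint8)
```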
S606, filling the image region after the target object has been matted out.
In one embodiment, when filling the image region after the target object has been matted out, the image capturing and processing system may first obtain a first unit image included in the target frame, where the first unit image is the image region of the target frame in which the target object is present; the target frame from which that region has been matted out, together with the first unit image, may then be input into a convolutional neural network model, and the model's output image is the filled target frame.
In another embodiment, when filling the image region after the target object has been matted out, the image capturing and processing system may first obtain the first unit image included in the target frame together with the previous frame image and the next frame image of the target frame, where the first unit image is the image region of the target frame in which the target object is present; the previous frame image, the next frame image, the matted target frame, and the first unit image are then input into the convolutional neural network model, whose output image is the filled target frame. The convolutional neural network model may be a UNet structure built from partial convolution layers.
In another embodiment, when filling the image region after the target object has been matted out, the image capturing and processing system may instead obtain the previous frame image and the next frame image of the target frame and fill the matted image based on them to obtain the filled target frame. Specifically, the system may obtain a second unit image in the previous frame image, where the second unit image is the image of the previous frame at the same position as the image region of the target frame in which the target object is present, and a third unit image in the next frame image, at the same position. It then obtains a first value for each pixel of the second unit image and a second value for each pixel of the third unit image, computes, for each pixel, the average of the first and second values, and pixel-fills the matted region of the target frame with these averages to obtain the filled target frame; a sketch follows.
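A minimal sketch of the averaging fill just described, assuming aligned H x W x 3 uint8 frames and a boolean mask of the matted-out region.

```python
# Sketch: filling the matted-out region with the pixel-wise average of the
# co-located regions of the previous and next frames, as described above.
import numpy as np

def fill_by_temporal_average(target: np.ndarray, hole_mask: np.ndarray,
                             prev_frame: np.ndarray,
                             next_frame: np.ndarray) -> np.ndarray:
    filled = target.copy()
    avg = (prev_frame.astype(np.uint16) + next_frame.astype(np.uint16)) // 2
    filled[hole_mask] = avg[hole_mask].astype(np.uint8)
    return filled
```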
It should be noted that, because the image capturing and processing system refers, when filling pixels, to the pixels at the same position as the target object in the previous and next frames of the target frame, the temporal continuity of the filled target frame's content and color is preserved.
In this embodiment of the present invention, the image capturing and processing system first acquires the image frame sequence captured by time-lapse shooting and determines in it a target frame in which a target object is present; it then determines the image region of the target object in the target frame, generates a local mask image corresponding to that region based on the region, mats out the region based on the local mask image and the target frame, and fills the region after the target object has been matted out.
An embodiment of the present invention provides an image capturing and processing system. FIG. 8 is a block diagram of the image capturing and processing system according to this embodiment. As shown in fig. 8, the image capturing and processing system 800 comprises a capturing device 801 and one or more processors 802 and is applicable in particular to the image processing scenario shown in fig. 1, wherein,
the capturing device 801 is configured to obtain an image frame sequence through time-lapse shooting, and to send the image frame sequence to the one or more processors;
the one or more processors 802 are configured to acquire the image frame sequence captured by time-lapse shooting, determine, in the image frame sequence, a target frame in which a target object is present, mat out the image region of the target frame in which the target object is present, and fill the image region after the target object has been matted out.
In one embodiment, when acquiring the image frame sequence captured by time-lapse shooting, the one or more processors 802 are specifically configured to:
acquiring at least one shot initial image;
sequencing the at least one frame of initial image based on a time sequence to obtain an initial image sequence;
and compressing the initial image sequence to obtain an image frame sequence.
In one embodiment, the one or more processors 802, when determining a target frame having a target object in the sequence of image frames, are specifically configured to:
processing the sequence of image frames based on a neural network model;
and determining the target frame according to an output result of the neural network model for processing the image frame sequence.
In one embodiment, the one or more processors 802, when processing the sequence of image frames based on a neural network model, are specifically configured to:
inputting the image frame sequence into the neural network model, the image frame sequence comprising a plurality of images;
calling the neural network model to perform feature extraction on any image to obtain a feature extraction result;
determining, based on the feature extraction result, the category of the objects included in each image in the sequence of image frames.
In one embodiment, when determining, based on the feature extraction result, the category of the objects included in each image in the sequence of image frames, the one or more processors 802 are specifically configured to:
calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;
and determining, according to the feature summarizing result, the category of the objects included in each image in the image frame sequence.
In one embodiment, the one or more processors 802, when determining the target frame according to the output result of the processing of the sequence of image frames by the neural network model, are specifically configured to:
judging whether each image in the image frame sequence contains a target object or not according to an output result of the neural network model for processing the image frame sequence;
determining an image frame containing the target object as a target frame.
In one embodiment, the one or more processors 802, when determining a target frame having a target object in the sequence of image frames, are specifically configured to:
for any frame image of the plurality of frame images included in the image frame sequence, dividing the image into regions to obtain a plurality of region images;
and acquiring the characteristic parameters of each region image, and determining, based on the characteristic parameters, a target frame of the image frame sequence in which a target object is present.
In one embodiment, the one or more processors 802, when determining a target frame having a target object in the sequence of image frames based on the characteristic parameter, are specifically configured to:
determining an object class included in any frame image in the image frame sequence based on the characteristic parameters;
and determining a target frame with a target object in the image frame sequence according to the object class.
In one embodiment, when matting out the image region of the target frame in which the target object is present, the one or more processors 802 are specifically configured to:
determining the image region of the target frame in which the target object is present;
generating a local mask pattern corresponding to the image region, based on the image region;
and matting out, according to the local mask pattern and the target frame, the image region of the target frame in which the target object is present.
In one embodiment, when filling the image region after the target object has been matted out, the one or more processors 802 are specifically configured to:
determining a surrounding image region of the image region of the target frame in which the target object is present, wherein the distance between the pixels of the surrounding image region and the pixels of the image region in which the target object is present is smaller than or equal to a preset distance threshold;
and filling, based on the surrounding image region, the image region from which the target object has been removed.
In one embodiment, when filling, based on the surrounding image region, the image region from which the target object has been removed, the one or more processors 802 are specifically configured to:
determining a reference frame from the image frame sequence, wherein the reference frame is any one of the first M frames preceding the target frame, and M is an integer greater than 1;
determining an exposure intensity of the reference frame;
and filling, using a white balance algorithm, the image region from which the target object has been removed, based on the exposure intensity of the reference frame and the surrounding image region.
In one embodiment, when filling the image region after the target object has been matted out, the one or more processors 802 are specifically configured to:
acquiring a first unit image included in the target frame, wherein the first unit image is the image region of the target frame in which the target object is present;
inputting the target frame from which the image region of the target object has been matted out, together with the first unit image, into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the filled target frame.
In one embodiment, when filling the image region after the target object has been matted out, the one or more processors 802 are specifically configured to:
acquiring a first unit image included in the target frame, together with the previous frame image and the next frame image of the target frame, wherein the first unit image is the image region of the target frame in which the target object is present;
inputting the previous frame image, the next frame image, the target frame from which the image region of the target object has been matted out, and the first unit image into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the filled target frame.
In one embodiment, when filling the image region after the target object has been matted out, the one or more processors 802 are specifically configured to:
acquiring a previous frame image and a next frame image of the target frame;
and filling, based on the previous frame image and the next frame image, the image region from which the target object has been matted out, to obtain the filled target frame.
In one embodiment, when filling, based on the previous frame image and the next frame image, the image region after the target object has been matted out to obtain the filled target frame, the one or more processors 802 are specifically configured to:
acquiring a second unit image in the previous frame image, wherein the second unit image is the image of the previous frame at the same position as the image region of the target frame in which the target object is present;
acquiring a third unit image in the next frame image, wherein the third unit image is the image of the next frame at the same position as the image region of the target frame in which the target object is present;
acquiring a first value of each pixel contained in the second unit image, and acquiring a second value of each pixel contained in the third unit image;
calculating, for each pixel, the average of the first value and the second value;
and pixel-filling, based on the averages, the target frame from which the image region of the target object has been matted out, to obtain the filled target frame.
In one embodiment, the target object is an abnormal object that appears in the target frame during the capturing device's time-lapse shooting.
In one embodiment, the one or more processors 802, when determining a target frame having a target object, are specifically configured to:
determining an adjacent frame of the target frame;
comparing the target frame with the adjacent frame, and determining, from the target frame, a target image composed of pixels whose values differ from those of the adjacent frame at the same positions, wherein the object corresponding to the target image is the target object;
and taking the frame of the image frame sequence that includes the target object as the target frame.
In one embodiment, the target object is all or part of a moving object.
The image capturing and processing system provided by this embodiment can execute the image processing method shown in fig. 5 and fig. 6 provided by the foregoing embodiment, and the execution manner and the beneficial effects thereof are similar and will not be described again here.
An embodiment of the present invention provides a carrier, which includes an image capturing and processing apparatus and may be applied in particular to the image processing scenario shown in fig. 2, wherein the image capturing and processing apparatus is configured to perform:
acquiring an image frame sequence captured by time-lapse shooting;
determining, in the image frame sequence, a target frame in which a target object is present;
matting out an image region of the target frame in which the target object is present;
and filling the image region after the target object has been matted out.
In an embodiment, when acquiring the image frame sequence captured by time-lapse shooting, the image capturing and processing apparatus is specifically configured to:
acquiring at least one shot initial image;
sequencing the at least one frame of initial image based on a time sequence to obtain an initial image sequence;
and compressing the initial image sequence to obtain an image frame sequence.
In one embodiment, the image capturing and processing device, when determining the target frame having the target object in the image frame sequence, is specifically configured to:
processing the sequence of image frames based on a neural network model;
and determining the target frame according to an output result of the neural network model for processing the image frame sequence.
In an embodiment, the image capturing and processing device, when processing the sequence of image frames based on a neural network model, is specifically configured to:
inputting the image frame sequence into the neural network model, the image frame sequence comprising a plurality of images;
calling the neural network model to perform feature extraction on any image to obtain a feature extraction result;
determining, based on the feature extraction result, the category of the objects included in each image in the sequence of image frames.
In one embodiment, when determining, based on the feature extraction result, the category of the objects included in each image in the sequence of image frames, the image capturing and processing apparatus is specifically configured to:
calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;
and determining, according to the feature summarizing result, the category of the objects included in each image in the image frame sequence.
In an embodiment, when determining the target frame according to an output result of the processing of the image frame sequence by the neural network model, the image capturing and processing apparatus is specifically configured to:
judging whether each image in the image frame sequence contains a target object or not according to an output result of the neural network model for processing the image frame sequence;
determining an image frame containing the target object as a target frame.
In one embodiment, the image capturing and processing device, when determining the target frame having the target object in the image frame sequence, is specifically configured to:
for any frame image of the plurality of frame images included in the image frame sequence, dividing the image into regions to obtain a plurality of region images;
and acquiring the characteristic parameters of each region image, and determining, based on the characteristic parameters, a target frame of the image frame sequence in which a target object is present.
In one embodiment, the image capturing and processing device, when determining the target frame having the target object in the image frame sequence based on the characteristic parameter, is specifically configured to:
determining an object class included in any frame image in the image frame sequence based on the characteristic parameters;
and determining a target frame with a target object in the image frame sequence according to the object class.
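A sketch of the region-division variant follows; since the characteristic parameter is left unspecified, a mean-color statistic per region image stands in for it here.

```python
import numpy as np

def region_statistics(frame, grid=(4, 4)):
    # frame: (H, W, 3) uint8 array; grid: number of regions per axis.
    h, w = frame.shape[:2]
    gh, gw = h // grid[0], w // grid[1]
    stats = np.empty((grid[0], grid[1], 3))
    for i in range(grid[0]):
        for j in range(grid[1]):
            region = frame[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            # Characteristic parameter of this region image (assumed:
            # mean color); a downstream rule or classifier would map
            # these statistics to object classes.
            stats[i, j] = region.reshape(-1, 3).mean(axis=0)
    return stats
```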
In one embodiment, when the image capturing and processing device is used to scratch out an image region of a target object in the target frame, the image capturing and processing device is specifically configured to:
determining an image area with a target object in the target frame;
generating a local mask map corresponding to the image area based on the image area;
and scratching out the image area with the target object in the target frame according to the local mask map and the target frame.
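A sketch of the matting step, assuming the image area of the target object is given as a bounding box:

```python
import numpy as np

def scratch_out(frame, box):
    # box: (x0, y0, x1, y1) bounding the target object (assumed input).
    x0, y0, x1, y1 = box
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = True      # local mask over the target area
    hole = frame.copy()
    hole[mask] = 0                 # scratch out the target-object pixels
    return hole, mask
```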
In one embodiment, when filling the image region after the target object is removed, the image capturing and processing apparatus is specifically configured to:
determining a surrounding image domain of an image region where a target object exists in the target frame, wherein the distance between pixel points in the surrounding image domain and the pixel points of the image region where the target object exists is smaller than or equal to a preset distance threshold;
and filling the image area deducted from the target object based on the surrounding image area.
In an embodiment, when the image capturing and processing device fills the image area obtained by deducting the target object based on the surrounding image area, the image capturing and processing device is specifically configured to:
determining a reference frame from the image frame sequence, wherein the reference frame is any one of the first M frames of the target frame, and M is an integer greater than 1;
determining an exposure intensity of the reference frame;
and filling the image area from which the target object is deducted, by using a white balance algorithm, based on the exposure intensity of the reference frame and the surrounding image domain.
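A sketch of this fill, using OpenCV's Telea inpainting to propagate from the surrounding image domain and a simple luminance-ratio gain as a stand-in for the white-balance adjustment; the gain model is an assumption, not the patent's algorithm.

```python
import cv2
import numpy as np

def fill_from_surroundings(frame, mask, reference, radius=5):
    # Telea inpainting fills the hole from pixels within `radius` of it,
    # i.e. from the surrounding image domain (the preset distance threshold).
    mask_u8 = mask.astype(np.uint8) * 255
    filled = cv2.inpaint(frame, mask_u8, radius, cv2.INPAINT_TELEA)
    # Exposure intensity approximated as mean luminance; scale the filled
    # pixels toward the reference frame's exposure.
    gain = float(reference.mean()) / max(float(filled.mean()), 1e-6)
    out = filled.astype(np.float32)
    out[mask] *= gain
    return np.clip(out, 0, 255).astype(np.uint8)
```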
In one embodiment, when filling the image region after the target object is removed, the image capturing and processing apparatus is specifically configured to:
acquiring a first unit image included in the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the target frame, from which the image area of the target object has been scratched out, together with the first unit image into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the filled target frame.
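A sketch of a convolutional filling network of this kind; the architecture is an illustrative stand-in, not the model the patent contemplates.

```python
import torch
import torch.nn as nn

class FillNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),    # 3 RGB + 1 unit-image channel
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),  # filled RGB frame
        )

    def forward(self, holed_frame, unit_mask):
        # Stack the scratched-out target frame with the first unit image.
        x = torch.cat([holed_frame, unit_mask], dim=1)     # (N, 4, H, W)
        return self.net(x)
```

The adjacent-frame embodiment described next would extend this input with the previous and next frame images as six additional channels.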
In one embodiment, when filling the image region after the target object is removed, the image capturing and processing apparatus is specifically configured to:
acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the previous frame image, the next frame image, the target frame from which the image area with the target object has been scratched out, and the first unit image into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the filled target frame.
In one embodiment, when filling the image region after the target object is removed, the image capturing and processing apparatus is specifically configured to:
acquiring a previous frame image and a next frame image of the target frame;
and filling the image area after the target object is scratched out based on the previous frame image and the next frame image to obtain the filled target frame.
In an embodiment, when the image capturing and processing device fills the image region after the target object is scratched out based on the previous frame image and the next frame image to obtain a filled target frame, the image capturing and processing device is specifically configured to:
acquiring a second unit image in the previous frame image, wherein the second unit image is an image at the same position corresponding to an image area with a target object in the previous frame image and the target frame;
acquiring a third unit image in the next frame of image, wherein the third unit image is an image in the same position corresponding to an image area with a target object in the next frame of image and the target frame;
acquiring a first numerical value of each pixel point contained in the second unit image, and acquiring a second numerical value of each pixel point contained in the third unit image;
calculating the average value of the first numerical value and the second numerical value aiming at any pixel point;
and performing pixel filling on the scratched-out image region of the target frame based on the average values to obtain the filled target frame.
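This averaging fill maps directly to code; a sketch, assuming aligned frames and a boolean mask of the scratched-out region:

```python
import numpy as np

def fill_by_temporal_average(holed_frame, mask, prev_frame, next_frame):
    # Average the second unit image (previous frame) and third unit
    # image (next frame) pixel values at each position.
    avg = (prev_frame.astype(np.float32) + next_frame.astype(np.float32)) / 2
    filled = holed_frame.astype(np.float32)
    filled[mask] = avg[mask]       # fill only the scratched-out region
    return filled.astype(np.uint8)
```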
In one embodiment, the target object is an abnormal object included in the target frame during the delayed shooting process of the shooting device.
In one embodiment, the image capturing and processing device, when determining the target frame having the target object, is specifically configured to:
determining adjacent frames of the target frame;
comparing the target frame with the adjacent frame, and determining, from the target frame, a target image formed by pixel points whose pixel values differ from those of the corresponding pixel points at the same positions in the adjacent frame, wherein the object corresponding to the target image is the target object;
and taking the frame comprising the target object in the image frame sequence as a target frame.
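A sketch of this frame-differencing detection; the per-pixel tolerance and the minimum differing-pixel count are illustrative thresholds, not values from the disclosure.

```python
import numpy as np

def detect_target(frame, adjacent, tol=25, min_area=500):
    # Pixels whose values differ from the adjacent frame beyond the
    # tolerance form the candidate target image.
    diff = np.abs(frame.astype(np.int16) - adjacent.astype(np.int16))
    target_mask = diff.max(axis=2) > tol
    # A frame with enough differing pixels is taken as a target frame.
    is_target_frame = target_mask.sum() >= min_area
    return is_target_frame, target_mask
```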
In one embodiment, the target object is all or part of a moving object.
The carrier provided by this embodiment can execute the image processing method shown in figs. 5 and 6 of the foregoing embodiments; the execution manner and beneficial effects are similar and are not described again here.
The local mask (Mask) and partial convolution (Partial Convolution) referred to in this specification are described below. In image processing, a fully convolutional neural network (Fully Convolutional Network) is often used. However, a fully convolutional network must convolve over the entire input image, which consumes resources and reduces processing speed to some extent. A local mask instead convolves only the region of interest, identifies the semantics of the pixels in that region one by one, and performs regression on the local mask's bounding box to obtain the pixel features around the bounding box.
As shown in fig. 9a, the framed region of the input image is a region of interest (RoI); only the RoI is partially convolved in the convolutional layers of the CNN, and the semantic classification of the RoI is output by a classifier (class box) that analyzes the semantics of each pixel. For the loss, a multitask loss function may be defined for each sample in the RoI:
L = L_cls + L_box + L_mask
where L_cls and L_box may be defined as in the usual Faster R-CNN loss functions. The mask branch outputs a K·m²-dimensional map for each RoI, encoding K binary masks of resolution m × m, one for each of the K classes. A sigmoid() function is applied to each pixel, and L_mask is defined as the average binary cross-entropy loss.
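A sketch of the mask term under this formulation (per-class mask outputs, per-pixel sigmoid, average binary cross-entropy), in the style of Mask R-CNN, which this passage paraphrases:

```python
import torch
import torch.nn.functional as F

def mask_loss(mask_logits, gt_masks, gt_classes):
    # mask_logits: (N, K, m, m) mask-branch output per RoI
    # gt_masks:    (N, m, m) ground-truth binary masks
    # gt_classes:  (N,) ground-truth class index per RoI
    idx = torch.arange(mask_logits.size(0))
    # Only the mask of the ground-truth class k contributes to L_mask.
    per_class_logits = mask_logits[idx, gt_classes]        # (N, m, m)
    # Per-pixel sigmoid + average binary cross-entropy loss.
    return F.binary_cross_entropy_with_logits(per_class_logits,
                                              gt_masks.float())
```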
As shown in fig. 9b, a U-Net neural network model used with partial convolution is given as an example. The network performs multiple down-convolution and up-convolution passes on the input image.
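A sketch of a single partial-convolution step of the kind such a network stacks: the convolution is computed only from valid (unmasked) pixels, renormalized by the fraction of valid inputs per window, and the mask is updated. The renormalization scheme follows the standard partial-convolution formulation and is not quoted from the patent.

```python
import torch
import torch.nn.functional as F

def partial_conv(x, mask, weight, bias):
    # x: (N, C_in, H, W); mask: (N, 1, H, W) float with 1 = valid pixel.
    # weight: (C_out, C_in, k, k) with odd k; bias: (C_out,).
    k = weight.shape[2]
    ones = torch.ones(1, 1, k, k)
    # Count of valid input pixels in each window.
    valid = F.conv2d(mask, ones, padding=k // 2)
    # Convolve only the valid pixels.
    out = F.conv2d(x * mask, weight, padding=k // 2)
    # Renormalize by the fraction of valid inputs per window.
    scale = ones.numel() / valid.clamp(min=1e-8)
    out = out * scale + bias.view(1, -1, 1, 1)
    # Mask update: a position becomes valid once any input was valid.
    new_mask = (valid > 0).float()
    return out * new_mask, new_mask
```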
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (73)

1. An image processing method, comprising:
acquiring an image frame sequence shot in a delayed manner;
determining a target frame with a target object in the image frame sequence;
matting an image area with a target object in the target frame;
and filling the image area from which the target object is scratched out.
2. The method of claim 1, wherein the obtaining the sequence of image frames captured with a delay time comprises:
acquiring at least one shot initial image;
sequencing the at least one frame of initial image based on a time sequence to obtain an initial image sequence;
and compressing the initial image sequence to obtain an image frame sequence.
3. The method of claim 1, wherein determining a target frame having a target object in the sequence of image frames comprises:
processing the sequence of image frames based on a neural network model;
and determining the target frame according to an output result of the neural network model for processing the image frame sequence.
4. The method of claim 3, wherein the processing the sequence of image frames based on the neural network model comprises:
inputting the image frame sequence into the neural network model, the image frame sequence comprising a plurality of images;
calling the neural network model to perform feature extraction on any image to obtain a feature extraction result;
determining a category in which each image in the sequence of image frames includes an object based on the feature extraction result.
5. The method of claim 4, wherein the determining that each image in the sequence of image frames includes a category of objects based on the feature extraction result comprises:
calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;
and determining the category of each image comprising the object in the image frame sequence according to the feature summarizing result.
6. The method of any of claims 3-5, wherein determining the target frame from the output of the processing of the sequence of image frames according to the neural network model comprises:
judging whether each image in the image frame sequence contains a target object or not according to an output result of the neural network model for processing the image frame sequence;
determining an image frame containing the target object as a target frame.
7. The method of claim 1, wherein determining a target frame having a target object in the sequence of image frames comprises:
aiming at any frame image of a plurality of frame images included in the image frame sequence, carrying out region division on the image to obtain a plurality of region images;
and acquiring the characteristic parameters of each region image, and determining a target frame with a target object in the image frame sequence based on the characteristic parameters.
8. The method of claim 7, wherein the determining a target frame having a target object in the sequence of image frames based on the feature parameter comprises:
determining an object class included in any frame image in the image frame sequence based on the characteristic parameters;
and determining a target frame with a target object in the image frame sequence according to the object class.
9. The method of claim 1, wherein the matting an image region in the target frame where a target object exists comprises:
determining an image area with a target object in the target frame;
generating a local mask graph corresponding to the image area based on the image area;
and according to the local mask graph and the target frame, carrying out scratch-out on an image area with a target object in the target frame.
10. The method of claim 9, wherein the filling the image region after the target object is scratched out comprises:
determining a surrounding image domain of an image region where a target object exists in the target frame, wherein the distance between pixel points in the surrounding image domain and the pixel points of the image region where the target object exists is smaller than or equal to a preset distance threshold;
and filling the image area deducted from the target object based on the surrounding image area.
11. The method of claim 10, wherein the filling the image region after deducting the target object based on the surrounding image region comprises:
determining a reference frame from the image frame sequence, wherein the reference frame is any one of the first M frames of the target frame, and M is an integer greater than 1;
determining an exposure intensity of the reference frame;
and filling the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image area by adopting a white balance algorithm.
12. The method of claim 1, wherein the filling out the image region after the target object is removed comprises:
acquiring a first unit image included in the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the target frame with the image area of the target object and the first unit image into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the filled target frame.
13. The method of claim 1, wherein the filling out the image region after the target object is removed comprises:
acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the previous frame image, the next frame image, the target frame after the image area with the target object and the first unit image are scratched out into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the target frame after filling.
14. The method of claim 1, wherein the filling out the image region after the target object is removed comprises:
acquiring a previous frame image and a next frame image of the target frame;
and filling the image area after the target object is scratched out based on the previous frame image and the next frame image to obtain the filled target frame.
15. The method as claimed in claim 14, wherein said filling the image region after said removing the target object based on the previous frame image and the next frame image to obtain a filled target frame comprises:
acquiring a second unit image in the previous frame image, wherein the second unit image is an image at the same position corresponding to an image area with a target object in the previous frame image and the target frame;
acquiring a third unit image in the next frame of image, wherein the third unit image is an image in the same position corresponding to an image area with a target object in the next frame of image and the target frame;
acquiring a first numerical value of each pixel point contained in the second unit image, and acquiring a second numerical value of each pixel point contained in the third unit image;
calculating the average value of the first numerical value and the second numerical value aiming at any pixel point;
and carrying out pixel filling on the target frame of the image region with the target object based on the average value to obtain the filled target frame.
16. The method of claim 1, wherein the target object is an abnormal object included in the target frame during the time-lapse shooting.
17. The method of claim 16, wherein determining the target frame having the target object comprises:
determining adjacent frames of the target frame;
comparing the target frame with the adjacent frame, and determining a target image formed by pixel points with different pixel values corresponding to the adjacent frame at the same position from the target frame, wherein an object corresponding to the target image is the target object;
and taking the frame comprising the target object in the image frame sequence as a target frame.
18. The method of claim 16, wherein the target object is all or part of a moving object.
19. An image processing apparatus includes a memory, a processor;
the memory is used for storing program codes;
the processor, invoking the program code, when executed, is configured to:
acquiring an image frame sequence shot in a delayed manner;
determining a target frame with a target object in the image frame sequence;
matting an image area with a target object in the target frame;
and filling the image area from which the target object is scratched out.
20. The apparatus of claim 19, wherein the processor performs the following operations when acquiring the sequence of image frames captured with a time delay:
acquiring at least one shot initial image;
sequencing the at least one frame of initial image based on a time sequence to obtain an initial image sequence;
and compressing the initial image sequence to obtain an image frame sequence.
21. The apparatus of claim 19, wherein the processor, in determining a target frame having a target object in the sequence of image frames, performs the following:
processing the sequence of image frames based on a neural network model;
and determining the target frame according to an output result of the neural network model for processing the image frame sequence.
22. The apparatus of claim 21, wherein the processor, when processing the sequence of image frames based on a neural network model, performs the following:
inputting the image frame sequence into the neural network model, the image frame sequence comprising a plurality of images;
calling the neural network model to perform feature extraction on any image to obtain a feature extraction result;
determining a category in which each image in the sequence of image frames includes an object based on the feature extraction result.
23. The apparatus of claim 22, wherein the processor, when determining that each image in the sequence of image frames includes a category of an object based on the feature extraction result, performs the following:
calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;
and determining the category of each image comprising the object in the image frame sequence according to the feature summarizing result.
24. The apparatus according to any of claims 21-23, wherein the processor determines the target frame based on an output of the neural network model processing the sequence of image frames by:
judging whether each image in the image frame sequence contains a target object or not according to an output result of the neural network model for processing the image frame sequence;
determining an image frame containing the target object as a target frame.
25. The apparatus of claim 19, wherein the processor, in determining a target frame having a target object in the sequence of image frames, performs the following:
aiming at any frame image of a plurality of frame images included in the image frame sequence, carrying out region division on the image to obtain a plurality of region images;
and acquiring the characteristic parameters of each region image, and determining a target frame with a target object in the image frame sequence based on the characteristic parameters.
26. The apparatus of claim 25, wherein the processor, when determining a target frame having a target object in the sequence of image frames based on the feature parameter, performs the following:
determining an object class included in any frame image in the image frame sequence based on the characteristic parameters;
and determining a target frame with a target object in the image frame sequence according to the object class.
27. The apparatus of claim 19, wherein the processor, when matting out an image region in the target frame where a target object exists, performs the following operations:
determining an image area with a target object in the target frame;
generating a local mask graph corresponding to the image area based on the image area;
and according to the local mask graph and the target frame, carrying out scratch-out on an image area with a target object in the target frame.
28. The apparatus of claim 27, wherein the processor performs the following operations when filling the image region after the object is removed:
determining a surrounding image domain of an image region where a target object exists in the target frame, wherein the distance between pixel points in the surrounding image domain and the pixel points of the image region where the target object exists is smaller than or equal to a preset distance threshold;
and filling the image area deducted from the target object based on the surrounding image area.
29. The apparatus of claim 28, wherein the processor performs the following operations when filling the image region with the subtracted target object based on the surrounding image region:
determining a reference frame from the image frame sequence, wherein the reference frame is any one of the first M frames of the target frame, and M is an integer greater than 1;
determining an exposure intensity of the reference frame;
and filling the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image area by adopting a white balance algorithm.
30. The apparatus of claim 19, wherein the processor performs the following operations when filling the image region after the object is scratched out:
acquiring a first unit image included in the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the target frame with the image area of the target object and the first unit image into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the filled target frame.
31. The apparatus of claim 19, wherein the processor performs the following operations when filling the image region after the object is scratched out:
acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the previous frame image, the next frame image, the target frame after the image area with the target object and the first unit image are scratched out into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the target frame after filling.
32. The apparatus of claim 19, wherein the processor performs the following operations when filling the image region after the object is scratched out:
acquiring a previous frame image and a next frame image of the target frame;
and filling the image area after the target object is scratched out based on the previous frame image and the next frame image to obtain the filled target frame.
33. The apparatus of claim 32, wherein the processor fills the image region after the target object is scratched out based on the previous frame image and the next frame image, and when a filled target frame is obtained, performs the following operations:
acquiring a second unit image in the previous frame image, wherein the second unit image is an image at the same position corresponding to an image area with a target object in the previous frame image and the target frame;
acquiring a third unit image in the next frame of image, wherein the third unit image is an image at the same position corresponding to an image area with a target object in the next frame of image and the target frame;
acquiring a first numerical value of each pixel point contained in the second unit image, and acquiring a second numerical value of each pixel point contained in the third unit image;
calculating the average value of the first numerical value and the second numerical value aiming at any pixel point;
and carrying out pixel filling on the target frame of the image region with the target object based on the average value to obtain the filled target frame.
34. The apparatus of claim 19, wherein the target object is an abnormal object included in the target frame during the delayed shooting.
35. The apparatus of claim 34, wherein the processor, when determining a target frame with a target object, performs the following:
determining adjacent frames of the target frame;
comparing the target frame with the adjacent frame, and determining a target image formed by pixel points with different pixel values corresponding to the adjacent frame at the same position from the target frame, wherein an object corresponding to the target image is the target object;
and taking the frame comprising the target object in the image frame sequence as a target frame.
36. The apparatus of claim 35, wherein the target object is all or part of a moving object.
37. An image capture and processing system comprising a capture device and one or more processors, wherein:
the shooting device is used for obtaining an image frame sequence through delayed shooting and sending the image frame sequence to the one or more processors;
the one or more processors are configured to acquire a sequence of image frames captured in a delayed manner, determine a target frame with a target object in the sequence of image frames, scratch out an image area of the target frame where the target object exists, and fill in the image area where the target object is scratched out.
38. The image capture and processing system of claim 37, wherein the capture device, when acquiring the sequence of time-lapse captured image frames, is specifically configured to:
acquiring at least one shot initial image;
sequencing the at least one frame of initial image based on a time sequence to obtain an initial image sequence;
and compressing the initial image sequence to obtain an image frame sequence.
39. The image capture and processing system of claim 37, wherein the one or more processors, in determining a target frame having a target object in the sequence of image frames, are specifically configured to:
processing the sequence of image frames based on a neural network model;
and determining the target frame according to an output result of the neural network model for processing the image frame sequence.
40. The image capture and processing system of claim 39, wherein the one or more processors, when processing the sequence of image frames based on a neural network model, are specifically configured to:
inputting the image frame sequence into the neural network model, the image frame sequence comprising a plurality of images;
calling the neural network model to perform feature extraction on any image to obtain a feature extraction result;
determining a category in which each image in the sequence of image frames includes an object based on the feature extraction result.
41. The image capture and processing system of claim 40, wherein the one or more processors, in determining that each image in the sequence of image frames includes a category of an object based on the feature extraction results, are specifically configured to:
calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;
and determining the category of each image comprising the object in the image frame sequence according to the feature summarizing result.
42. The image capture and processing system of any of claims 39-41, wherein the one or more processors, in determining the target frame from the output of the processing of the sequence of image frames by the neural network model, are specifically configured to:
judging whether each image in the image frame sequence contains a target object or not according to an output result of the neural network model for processing the image frame sequence;
determining an image frame containing the target object as a target frame.
43. The image capture and processing system of claim 37, wherein the one or more processors, in determining a target frame having a target object in the sequence of image frames, are specifically configured to:
aiming at any frame image of a plurality of frame images included in the image frame sequence, carrying out region division on the image to obtain a plurality of region images;
and acquiring the characteristic parameters of each region image, and determining a target frame with a target object in the image frame sequence based on the characteristic parameters.
44. The image capture and processing system of claim 43, wherein the one or more processors, in determining a target frame having a target object in the sequence of image frames based on the feature parameters, are specifically configured to:
determining an object class included in any frame image in the image frame sequence based on the characteristic parameters;
and determining a target frame with a target object in the image frame sequence according to the object class.
45. The image capture and processing system of claim 37, wherein the one or more processors, when matting out an image region in which a target object is present in the target frame, are specifically configured to:
determining an image area with a target object in the target frame;
generating a local mask graph corresponding to the image area based on the image area;
and according to the local mask graph and the target frame, carrying out scratch-out on an image area with a target object in the target frame.
46. The image capture and processing system of claim 45, wherein the one or more processors, when filling the image region after the target object is scratched out, are specifically configured to:
determining a surrounding image domain of an image region where a target object exists in the target frame, wherein the distance between pixel points in the surrounding image domain and the pixel points of the image region where the target object exists is smaller than or equal to a preset distance threshold;
and filling the image area deducted from the target object based on the surrounding image area.
47. The image capture and processing system of claim 46, wherein the one or more processors, when populating the image region subtracted from the target object based on the surrounding image domain, are specifically configured to:
determining a reference frame from the image frame sequence, wherein the reference frame is any one of the first M frames of the target frame, and M is an integer greater than 1;
determining an exposure intensity of the reference frame;
and filling the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image area by adopting a white balance algorithm.
48. The image capture and processing system of claim 37, wherein the one or more processors, when filling the image region after the target object is scratched out, are specifically configured to:
acquiring a first unit image included in the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the target frame with the image area of the target object and the first unit image into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the filled target frame.
49. The image capture and processing system of claim 37, wherein the one or more processors, when filling the image region after the target object is scratched out, are specifically configured to:
acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the previous frame image, the next frame image, the target frame after the image area with the target object and the first unit image are scratched out into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the target frame after filling.
50. The image capture and processing system of claim 37, wherein the one or more processors, when filling the image region after the target object is scratched out, are specifically configured to:
acquiring a previous frame image and a next frame image of the target frame;
and filling the image area after the target object is scratched out based on the previous frame image and the next frame image to obtain the filled target frame.
51. The image capture and processing system of claim 50, wherein the one or more processors, when filling the image region from which the target object was scratched out based on the previous frame image and the next frame image to obtain a filled target frame, are specifically configured to:
acquiring a second unit image in the previous frame image, wherein the second unit image is an image at the same position corresponding to an image area with a target object in the previous frame image and the target frame;
acquiring a third unit image in the next frame of image, wherein the third unit image is an image in the same position corresponding to an image area with a target object in the next frame of image and the target frame;
acquiring a first numerical value of each pixel point contained in the second unit image, and acquiring a second numerical value of each pixel point contained in the third unit image;
calculating the average value of the first numerical value and the second numerical value aiming at any pixel point;
and carrying out pixel filling on the target frame of the image region with the target object based on the average value to obtain the filled target frame.
52. The image capture and processing system of claim 37, wherein the target object is an anomalous object included in the target frame during the time-lapse capture by the capture device.
53. The image capture and processing system of claim 52, wherein the one or more processors, in determining the target frame with the target object, are specifically configured to:
determining adjacent frames of the target frame;
comparing the target frame with the adjacent frame, and determining a target image formed by pixel points with different pixel values corresponding to the adjacent frame at the same position from the target frame, wherein an object corresponding to the target image is the target object;
and taking the frame comprising the target object in the image frame sequence as a target frame.
54. The image capture and processing system of claim 52, wherein the target object is all or part of a moving object.
55. A carrier for carrying an image capture and processing device, the image capture and processing device being configured to:
acquiring an image frame sequence shot in a delayed manner;
determining a target frame with a target object in the image frame sequence;
matting an image area with a target object in the target frame;
and filling the image area from which the target object is scratched out.
56. The carrier according to claim 55, wherein the image capture and processing means, when acquiring the sequence of image frames captured with a time delay, are particularly adapted to:
acquiring at least one shot initial image;
sequencing the at least one frame of initial image based on a time sequence to obtain an initial image sequence;
and compressing the initial image sequence to obtain an image frame sequence.
57. The carrier as claimed in claim 55, wherein the image capturing and processing means, when determining a target frame with a target object in the sequence of image frames, are particularly adapted to:
processing the sequence of image frames based on a neural network model;
and determining the target frame according to an output result of the neural network model for processing the image frame sequence.
58. The carrier of claim 57, wherein the image capture and processing means, when processing the sequence of image frames based on a neural network model, is specifically configured to:
inputting the image frame sequence into the neural network model, the image frame sequence comprising a plurality of images;
calling the neural network model to perform feature extraction on any image to obtain a feature extraction result;
determining a category in which each image in the sequence of image frames includes an object based on the feature extraction result.
59. The carrier of claim 58, wherein the image capture and processing device, when determining, based on the feature extraction results, that each image in the sequence of image frames includes a category of objects, is specifically configured to:
calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;
and determining the category of each image comprising the object in the image frame sequence according to the feature summarizing result.
60. The carrier of any of claims 57-59, wherein the image capture and processing means, when determining the target frame from the output of the processing of the sequence of image frames by the neural network model, is specifically configured to:
judging whether each image in the image frame sequence contains a target object or not according to an output result of the neural network model for processing the image frame sequence;
determining an image frame containing the target object as a target frame.
61. The carrier as claimed in claim 55, wherein the image capturing and processing means, when determining a target frame with a target object in the sequence of image frames, are particularly adapted to:
aiming at any frame image of a plurality of frame images included in the image frame sequence, carrying out region division on the image to obtain a plurality of region images;
and acquiring the characteristic parameters of each region image, and determining a target frame with a target object in the image frame sequence based on the characteristic parameters.
62. The carrier of claim 61, wherein the image capture and processing device, when determining a target frame having a target object in the sequence of image frames based on the characteristic parameters, is specifically configured to:
determining an object class included in any frame image in the image frame sequence based on the characteristic parameters;
and determining a target frame with a target object in the image frame sequence according to the object class.
63. The carrier of claim 55, wherein the image capture and processing device, when matting out an image region where a target object exists in the target frame, is specifically configured to:
determining an image area with a target object in the target frame;
generating a local mask graph corresponding to the image area based on the image area;
and according to the local mask graph and the target frame, carrying out scratch-out on an image area with a target object in the target frame.
64. The carrier of claim 63, wherein the image capture and processing device, when filling the image region after matting the object, is specifically configured to:
determining a surrounding image domain of an image region where a target object exists in the target frame, wherein the distance between pixel points in the surrounding image domain and the pixel points of the image region where the target object exists is smaller than or equal to a preset distance threshold;
and filling the image area deducted from the target object based on the surrounding image area.
65. The carrier according to claim 64, wherein the image capturing and processing device, when filling the image area deducted from the target object based on the surrounding image area, is specifically configured to:
determining a reference frame from the image frame sequence, wherein the reference frame is any one of the first M frames of the target frame, and M is an integer greater than 1;
determining an exposure intensity of the reference frame;
and filling the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image area by adopting a white balance algorithm.
66. The carrier of claim 55, wherein the image capture and processing device, when filling the image region after matting the object, is specifically configured to:
acquiring a first unit image included in the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the target frame with the image area of the target object and the first unit image into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the filled target frame.
67. The carrier of claim 55, wherein the image capture and processing device, when filling the image region after matting the object, is specifically configured to:
acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, wherein the first unit image is an image area of a target object in the target frame;
inputting the previous frame image, the next frame image, the target frame after the image area with the target object and the first unit image are scratched out into a convolutional neural network model, and acquiring an output image of the convolutional neural network model, wherein the output image is the target frame after filling.
68. The carrier of claim 55, wherein the image capture and processing device, when filling the image region after matting the object, is specifically configured to:
acquiring a previous frame image and a next frame image of the target frame;
and filling the image area after the target object is scratched out based on the previous frame image and the next frame image to obtain the filled target frame.
69. The carrier of claim 68, wherein the image capturing and processing device, when filling the image region after the target object is removed based on the previous frame image and the next frame image to obtain a filled target frame, is specifically configured to:
acquiring a second unit image in the previous frame image, wherein the second unit image is an image at the same position corresponding to an image area with a target object in the previous frame image and the target frame;
acquiring a third unit image in the next frame of image, wherein the third unit image is an image in the same position corresponding to an image area with a target object in the next frame of image and the target frame;
acquiring a first numerical value of each pixel point contained in the second unit image, and acquiring a second numerical value of each pixel point contained in the third unit image;
calculating the average value of the first numerical value and the second numerical value aiming at any pixel point;
and carrying out pixel filling on the target frame of the image region with the target object based on the average value to obtain the filled target frame.
70. The carrier of claim 55, wherein the target object is an abnormal object included in the target frame during the delayed shooting by the shooting device.
71. The carrier of claim 70, wherein the image capture and processing device, when determining the target frame with the target object, is specifically configured to:
determining adjacent frames of the target frame;
comparing the target frame with the adjacent frame, and determining a target image formed by pixel points with different pixel values corresponding to the adjacent frame at the same position from the target frame, wherein an object corresponding to the target image is the target object;
and taking the frame comprising the target object in the image frame sequence as a target frame.
72. The carrier of claim 70 wherein the target object is all or part of a moving object.
73. A computer storage medium having computer program instructions stored therein for execution by a processor to perform the image processing method of any of claims 1-18.
CN201980004937.7A 2019-02-21 2019-02-21 Image processing method and device, image shooting and processing system and carrier Pending CN111247790A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/075707 WO2020168515A1 (en) 2019-02-21 2019-02-21 Image processing method and apparatus, image capture processing system, and carrier

Publications (1)

Publication Number Publication Date
CN111247790A (en) 2020-06-05

Family

ID=70877357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980004937.7A Pending CN111247790A (en) 2019-02-21 2019-02-21 Image processing method and device, image shooting and processing system and carrier

Country Status (2)

Country Link
CN (1) CN111247790A (en)
WO (1) WO2020168515A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113014799A (en) * 2021-01-28 2021-06-22 维沃移动通信有限公司 Image display method and device and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110103644A1 (en) * 2009-10-30 2011-05-05 Zoran Corporation Method and apparatus for image detection with undesired object removal
US20160027159A1 (en) * 2014-07-24 2016-01-28 Adobe Systems Incorporated Low memory content aware image modification
CN106604011A (en) * 2015-10-16 2017-04-26 联咏科技股份有限公司 Method and apparatus for processing source image to generate target image
CN106651762A (en) * 2016-12-27 2017-05-10 努比亚技术有限公司 Photo processing method, device and terminal
CN108399362A (en) * 2018-01-24 2018-08-14 中山大学 A kind of rapid pedestrian detection method and device
CN108961302A (en) * 2018-07-16 2018-12-07 Oppo广东移动通信有限公司 Image processing method, device, mobile terminal and computer readable storage medium
CN109167893A (en) * 2018-10-23 2019-01-08 Oppo广东移动通信有限公司 Shoot processing method, device, storage medium and the mobile terminal of image
CN109191414A (en) * 2018-08-21 2019-01-11 北京旷视科技有限公司 A kind of image processing method, device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204567B (en) * 2016-07-05 2019-01-29 华南理工大学 A kind of natural background video matting method
CN106250874B (en) * 2016-08-16 2019-04-30 东方网力科技股份有限公司 Recognition methods and the device of a kind of dress ornament and carry-on articles
CN106951899A (en) * 2017-02-24 2017-07-14 李刚毅 Method for detecting abnormality based on image recognition
CN107481244B (en) * 2017-07-04 2020-09-25 昆明理工大学 Manufacturing method of visual semantic segmentation database of industrial robot

Also Published As

Publication number Publication date
WO2020168515A1 (en) 2020-08-27

Similar Documents

Publication Publication Date Title
CN110839129A (en) Image processing method and device and mobile terminal
CN108875619B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN108810413B (en) Image processing method and device, electronic equipment and computer readable storage medium
KR20140016401A (en) Method and apparatus for capturing images
CN107465855B (en) Image shooting method and device and unmanned aerial vehicle
KR101167567B1 (en) Fish monitoring digital image processing apparatus and method
CN108198177A (en) Image acquiring method, device, terminal and storage medium
CN110572573A (en) Focusing method and device, electronic equipment and computer readable storage medium
CN110807759B (en) Method and device for evaluating photo quality, electronic equipment and readable storage medium
CN105007430B (en) The method and apparatus set for determining exposure
CN110443766B (en) Image processing method and device, electronic equipment and readable storage medium
CN111898581A (en) Animal detection method, device, electronic equipment and readable storage medium
CN112312231A (en) Video image coding method and device, electronic equipment and medium
JP6913953B2 (en) Digitization method of quality standard of meat quality grade, digital processing method of quality evaluation using the quality standard, automatic meat quality evaluation device using them, and software installed in it
CN111985281A (en) Image generation model generation method and device and image generation method and device
CN112513928A (en) Method and system for training a model to perform semantic segmentation on a hazy image
CN111247790A (en) Image processing method and device, image shooting and processing system and carrier
CN111667511B (en) Method, device and system for extracting background in dynamic video
CN114257730A (en) Image data processing method and device, storage medium and computer equipment
CN114143429B (en) Image shooting method, device, electronic equipment and computer readable storage medium
CN114550069B (en) Piglet nipple counting method based on deep learning
CN116095363A (en) Mobile terminal short video highlight moment editing method based on key behavior recognition
CN112839167B (en) Image processing method, device, electronic equipment and computer readable medium
CN114049317A (en) CCD equipment automatic inspection system and method based on artificial intelligence
EP3148173A1 (en) Method for color grading of a digital visual content, and corresponding electronic device, computer readable program product and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200605