CN113129227A - Image processing method, image processing device, computer equipment and storage medium


Info

Publication number
CN113129227A
CN113129227A
Authority
CN
China
Prior art keywords
image
frame
target
images
static
Prior art date
Legal status
Pending
Application number
CN202110330377.3A
Other languages
Chinese (zh)
Inventor
张伟俊
谢朝毅
Current Assignee
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd
Priority to CN202110330377.3A
Publication of CN113129227A
Priority to PCT/CN2022/083400 (WO2022206679A1)

Classifications

    • G06T 5/73: Deblurring; Sharpening (under G06T 5/00, Image enhancement or restoration)
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction (under G06T 5/00)
    • G06T 7/194: Segmentation; Edge detection involving foreground-background segmentation (under G06T 7/00, Image analysis)
    • G06T 7/20: Analysis of motion (under G06T 7/00, Image analysis)

    (Full hierarchy for all entries: G, Physics; G06, Computing, calculating or counting; G06T, Image data processing or generation, in general.)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image processing method, an image processing device, computer equipment and a storage medium, which are applicable to the technical field of computers. The method comprises the following steps: acquiring multi-frame images shot in the same scene by a camera module, and performing target detection on the multi-frame images by using a target detection model to obtain a target object included in each frame image in the multi-frame images; classifying the target objects included in each frame of image, and determining moving objects and static objects included in each frame of image; removing the moving object in each frame of image, obtaining a background image corresponding to each frame of image, and performing fusion processing on all the background images to generate a target background image; and covering the static object in the target background image by using the static object in the multi-frame image to obtain an output image of the camera module. By adopting the method, the image quality of the composite image after the moving object is removed can be improved.

Description

Image processing method, image processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
With the continuous development of science and technology, more and more devices, such as video cameras, smartphones, and tablet computers, can be used to take pictures. However, when a picture is taken with such a device, pedestrians, vehicles, or other objects such as animals often stray into the shot, which detracts from the aesthetics of the picture.
In order to solve the above problem, the conventional technology generally recognizes a moving object by relying on changes in pixel values across different frame images, and then removes the moving object from the images.
However, when the moving object moves too little or stays in one place for too long, its pixel values change little across different frame images, causing recognition errors. In other words, the prior art's recognition accuracy for moving objects is insufficient, so that ghosting remains in the synthesized image from which the moving object has been removed, and the image quality is poor.
Disclosure of Invention
In view of the above, it is desirable to provide an image processing method, an apparatus, a computer device and a storage medium capable of improving the image quality of a composite image from which a moving object is removed.
In a first aspect, an image processing method is provided, which includes:
acquiring multi-frame images shot in the same scene by a camera module, and performing target detection on the multi-frame images by using a target detection model to obtain a target object included in each frame image in the multi-frame images; classifying the target objects included in each frame of image, and determining moving objects and static objects included in each frame of image; removing the moving object in each frame of image, obtaining a background image corresponding to each frame of image, and performing fusion processing on all the background images to generate a target background image; and covering the static object in the target background image by using the static object in the multi-frame image to obtain an output image of the camera module.
In one embodiment, the classifying the target object included in each frame of image and determining the moving object and the static object included in each frame of image includes: determining the position of the target object in each frame of image; and determining the target object as a moving object or a static object according to the position of the target object in each frame of image.
In one embodiment, determining that the target object is a moving object or a static object according to the position of the target object in each frame of image includes: calculating a position deviation value of the target object between any two frames of the multi-frame images; if the maximum position deviation value is smaller than a position deviation threshold, determining that the target object is a static object; and if the position deviation value of the target object between any two frames of the multi-frame images is greater than or equal to the position deviation threshold, determining that the target object is a moving object.
In one embodiment, the classifying the target object included in each frame of image and determining the moving object and the static object included in each frame of image includes: determining the number of target pixels of each frame of image at a tracking position, wherein the target pixels are used for displaying a target object, and the tracking position is the position of the target object in any one frame of image in the multiple frames of images; and determining that the target object is a moving object or a static object according to the number of the target pixels of each frame image at the tracking position.
In one embodiment, determining the target object as a moving object or a static object according to the number of target pixels of each frame of image at the tracking position includes: calculating the difference of the number of target pixels of any two frames of images in the tracking position in the multi-frame images; if the maximum number difference is smaller than the pixel number threshold, determining that the target object is a static object; and if the difference of the number of the target pixels of any two frames of images at the tracking position is greater than or equal to the pixel number threshold value, determining that the target object is a moving object.
In one embodiment, removing the moving object in each frame of image to obtain a background image corresponding to each frame of image includes: marking the pixels corresponding to the moving object in each frame image as invalid pixels; and generating a background image corresponding to each frame of image according to the rest pixels except the invalid pixel in each frame of image.
In one embodiment, the obtaining an output image of the camera module by covering a still object in the target background image with a still object in the multi-frame image comprises: determining a reference image according to the multi-frame image, wherein the definition of a static object in the reference image is greater than that of a corresponding static object in the target background image; and covering the static object in the target background image by using the static object in the reference image to obtain an output image.
In a second aspect, there is provided an image processing apparatus including:
the acquisition module is used for acquiring multi-frame images shot by the camera module on the same scene, and performing target detection on the multi-frame images by using the target detection model to obtain a target object included in each frame image in the multi-frame images;
the determining module is used for classifying the target objects included in each frame of image and determining moving objects and static objects included in each frame of image;
the removing module is used for removing the moving object in each frame of image, obtaining a background image corresponding to each frame of image, and fusing all the background images to generate a target background image;
and the covering module is used for covering the static object in the target background image by utilizing the static object in the multi-frame image to obtain the output image of the camera module.
In a third aspect, there is provided a computer device comprising a memory storing a computer program and a processor implementing the method according to any of the first aspects as described above when the processor executes the computer program.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of the first aspects described above.
The image processing method, the image processing device, the computer equipment and the storage medium acquire multi-frame images shot by the camera module on the same scene, and perform target detection on the multi-frame images by using the target detection model to acquire a target object included in each frame image in the multi-frame images; classifying the target objects included in each frame of image, and determining moving objects and static objects included in each frame of image; removing the moving object in each frame of image, obtaining a background image corresponding to each frame of image, and performing fusion processing on all the background images to generate a target background image; and covering the static object in the target background image by using the static object in the multi-frame image to obtain an output image of the camera module. According to the method, the foreground object (for example, the target object described above) in the multi-frame image can be accurately identified through the target detection model, and the accuracy of the target object identification result is improved. The target objects included in each frame of image are classified, and whether the target objects are moving objects or static objects is determined, so that the accuracy of identifying the static objects and the moving objects is improved. In addition, on the premise of ensuring the accuracy of the searched moving object, the moving object in the multi-frame image is removed to generate a background image. By carrying out fusion processing on the multi-frame background images, the ghost image existing in any frame of the multi-frame images is eliminated, so that the generated target background image does not have the ghost image, and the definition of the target background image is ensured. And finally, covering the image corresponding to the static object in the target background image by using the image corresponding to the static object in the multi-frame image, so that the moving object is removed from the finally generated output image, the ghost in the output image is eliminated, the definition of the background and the static object is ensured, and the image quality is improved.
Drawings
FIG. 1 is a diagram of an exemplary embodiment of an image processing method;
FIG. 2 is a flow diagram illustrating a method for image processing according to one embodiment;
FIG. 3 is a diagram illustrating the determination of the position of a target in a multi-frame image in an image processing method according to an embodiment;
FIG. 4 is a diagram illustrating a still object in a multi-frame image overlaid on a still object in a target background image in an embodiment of an image processing method;
FIG. 5 is a schematic flow chart of image processing steps in one embodiment;
FIG. 6 is a flowchart illustrating an image processing method according to another embodiment;
FIG. 7 is a diagram illustrating the determination of a target object in a multi-frame image in an image processing method according to an embodiment;
FIG. 8 is a flowchart illustrating an image processing method according to another embodiment;
FIG. 9 is a flowchart illustrating an image processing method according to another embodiment;
FIG. 10 is a flowchart illustrating an image processing method according to another embodiment;
FIG. 11 is a flowchart illustrating an image processing method according to another embodiment;
FIG. 12 is a flowchart illustrating an image processing method according to another embodiment;
FIG. 13 is a block diagram showing the configuration of an image processing apparatus according to an embodiment;
FIG. 14 is a block diagram showing a configuration of an image processing apparatus according to an embodiment;
FIG. 15 is a block diagram showing the configuration of an image processing apparatus according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The image processing method provided by the application can be applied to the computer device shown in FIG. 1, which may be a terminal; its internal structure is shown in FIG. 1. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the nonvolatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, where the wireless communication can be realized through Wi-Fi, an operator network, NFC (near field communication), or other technologies. The computer program is executed by the processor to implement an image processing method. The display screen of the computer device can be a liquid crystal display or an electronic ink display, and the input device of the computer device can be a touch layer covering the display screen, a key, trackball, or touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the architecture shown in FIG. 1 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In an embodiment of the present application, as shown in fig. 2, an image processing method is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
step 201, a terminal acquires a plurality of frames of images shot by a camera module on the same scene, and performs target detection on the plurality of frames of images by using a target detection model to obtain a target object included in each frame of image in the plurality of frames of images.
Specifically, the user can place the shooting device in which the camera module is located at a fixed position and keep it still, so that the camera module shoots multiple frames of images of the same scene. In the multiple frames shot of the same scene, the relative position of a stationary object does not change (for example, the stationary object may be a building, a person, or a tree being photographed), while the relative position of a moving object may change (for example, the moving object may be a person, an animal, or a vehicle suddenly intruding into the scene being photographed). It should be understood that "the same scene" refers mainly to the same shooting scene with respect to the stationary object; that is, the stationary object is the subject desired in the final image, whereas the moving object has strayed into the shooting scene and is not wanted by the user. Multiple frames of the same scene can be obtained by fixing the shooting device in which the camera module is located, but the method of obtaining them is not limited thereto, and this embodiment does not specifically limit it.
Optionally, after the terminal or the shooting device receives a shooting instruction input by the user, it can control the camera module to shoot multiple consecutive frames of images. Optionally, the shooting instruction input by the user may be the user pressing a shutter button, the user speaking a voice shooting command, or the terminal or shooting device detecting a shooting gesture of the user; this embodiment does not specifically limit the shooting instruction input by the user.
After the camera module shoots multiple frames of images of the same scene, the multiple frames of images can be stored in a storage device, and the terminal can acquire them from the storage device. The terminal can input the multiple frames of images into the target detection model, which extracts features from the multiple frames of images, thereby determining the target object in each frame of image. The target detection model may be a model based on hand-crafted features, such as DPM (Deformable Parts Model), or a model based on a convolutional neural network, such as YOLO (You Only Look Once), R-CNN (Region-based Convolutional Neural Network), SSD (Single Shot MultiBox Detector), Mask R-CNN (Mask Region-based Convolutional Neural Network), and the like. The embodiment of the present application does not specifically limit the target detection model.
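As an illustrative aside (not part of the original disclosure), the following is a minimal sketch of this per-frame detection step in Python, assuming a pretrained Faster R-CNN from torchvision stands in for the target detection model; the score threshold and helper name are arbitrary choices.

```python
# A minimal sketch of per-frame target detection; assumes torchvision >= 0.13
# and a pretrained Faster R-CNN as a stand-in for the target detection model.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_objects(frames):
    """frames: list of HxWx3 uint8 numpy arrays (the multi-frame images)."""
    tensors = [torch.from_numpy(f).permute(2, 0, 1).float() / 255.0
               for f in frames]
    with torch.no_grad():
        results = model(tensors)
    detections = []
    for r in results:
        keep = r["scores"] > 0.5          # discard low-confidence boxes
        detections.append({"boxes": r["boxes"][keep],
                           "labels": r["labels"][keep]})
    return detections                      # one dict of boxes per frame
```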
In step 202, the terminal classifies the target objects included in each frame of image, and determines moving objects and static objects included in each frame of image.
Optionally, the terminal may track the same target object included in the multiple frames of images by using a target tracking algorithm, determine the position of the same target object in different frames of images, and determine whether the same target object is a moving object or a stationary object, thereby classifying the moving object and the stationary object in each frame of image.
For example, after the multi-frame images acquired by the terminal all include the target object a, the terminal respectively identifies the position of the target object a in the multi-frame images by using a target tracking algorithm, and judges whether the target object a is a moving object or a stationary object according to the position of the target object a in the multi-frame images.
Optionally, the terminal may further track the same position in the multiple frames of images by using a target tracking algorithm, determine the number of pixels of the target object detected at the same position in the multiple frames of images, and determine whether the target object is a moving object or a stationary object according to the number of pixels of the target object displayed at the same position in the multiple frames of images.
For example, as shown in fig. 3, the terminal detects the position of the target object B in the first frame image according to the target tracking algorithm, and determines the position of the target object B in the first frame image as the target position. The terminal then determines the same position in the other frames as the target position, according to the target position in the first frame image. The terminal tracks the number of pixels occupied by the target object B at the same target position across the multiple frames of images, and determines whether the target object B is a moving object or a static object according to the number of pixels of the target object B at the target position in each frame.
And step 203, the terminal removes the moving object in each frame of image, obtains a background image corresponding to each frame of image, and performs fusion processing on all the background images to generate a target background image.
Specifically, after determining whether each target object is a static object or a moving object, the terminal marks the pixels within the target rectangular frame of each moving object in each frame of image as invalid pixels, obtains a background image corresponding to each frame of image, and performs fusion processing on all the background images to generate a target background image.
Optionally, after the pixels in the target rectangular frame where the moving object is located in each frame of image are marked as invalid pixels and the background image corresponding to each frame of image is obtained, the terminal may perform fusion processing on the multiple frames of background images by using a pixel-level image fusion method, so as to generate the target background image. The pixel-level image fusion method may be an image fusion method based on non-multi-scale transformation (e.g., an averaging or weighted-averaging method, a logic filter method, a mathematical morphology method, an image algebra method, etc.) or an image fusion method based on multi-scale transformation (e.g., a pyramid image fusion method, a wavelet transformation image fusion method, an image fusion method based on a neural network, etc.). The embodiment of the application does not limit the fusion method for the multiple frames of background images; adopting a pixel-level image fusion method retains more image information.
Optionally, the terminal may further perform fusion processing on the background image corresponding to each frame of image by using a background modeling method. The background modeling method can use a non-recursive background modeling method and can also use a recursive background modeling method, wherein the non-recursive background modeling method can comprise a median model, a mean model, a linear prediction model, non-parameter kernel density estimation and the like, and the recursive background modeling method can comprise an approximate median filtering method, a single Gaussian model method, a mixed Gaussian model method and the like.
For example, the embodiment of the present application takes the median model in the non-recursive background modeling methods as an example for detailed description. Suppose there are n frames of images. Let $\mathcal{I} = \{I_k\}_{k=1}^{n}$ denote the set of images, where $I_k$ is the k-th frame image. Let $\mathcal{M} = \{M_k\}_{k=1}^{n}$ denote the set of mask images obtained by labeling each pixel of each frame in the image set, where $M_k$ is the mask corresponding to $I_k$. In each mask, the pixels corresponding to the moving object are invalid pixels and may be labeled 0, and the pixels other than the moving object are valid pixels and may each be labeled 1, thereby generating the corresponding mask. Optionally, the value of each pixel of $M_k$ ranges over $\{0, 1\}$, where 0 denotes an invalid pixel and 1 a valid pixel. The coordinate position of a pixel in an image is denoted $p = (x, y)$; for example, $p = (1, 2)$ may denote the pixel in the first row and second column of the image. $I_k(p)$ and $M_k(p)$ denote the pixel values of $I_k$ and $M_k$, respectively, at coordinate position $p$. Let $B$ denote the synthesized target background image and $B(p)$ its pixel value at coordinate position $p$; the corresponding calculation formula is:

$$B(p) = \mathrm{Median}\left(\{\, I_k(p) \mid M_k(p) = 1,\ k = 1, \dots, n \,\}\right) \tag{1}$$

where Median(·) in equation (1) denotes taking the median of the elements of the set. The pixel value $B(p)$ of the target background image at each coordinate position $p$ is generated from the valid pixel values of the frame background images at that position, thereby obtaining the target background image.
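A minimal sketch of equation (1) in Python with NumPy, assuming the images and masks are stacked into arrays; the fill value used for positions that are invalid in every frame is an arbitrary choice.

```python
import numpy as np

def fuse_background(images, masks):
    """images: (n, H, W, 3) uint8; masks: (n, H, W), 1 = valid, 0 = invalid."""
    invalid = np.broadcast_to((masks == 0)[..., None], images.shape)
    stack = np.ma.masked_array(images.astype(np.float32), mask=invalid)
    # Per-pixel median over the valid frames, i.e. equation (1)
    background = np.ma.median(stack, axis=0)
    # Positions invalid in every frame have no valid sample; fill with 0 here
    return background.filled(0).astype(np.uint8)
```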
And step 204, the terminal covers the static object in the target background image by using the static object in the multi-frame image to obtain an output image of the camera module.
Specifically, when the camera module acquires multiple frames of images of the same scene, human error or equipment error may cause a slight deviation in the position of each still or moving object across the frames, so that after the fusion processing the edges of still objects in the generated target background image become blurred. To improve the definition of the image corresponding to the still object in the output image, the terminal can select an image with higher definition from the multiple frames acquired by the camera module and use the image corresponding to the still object in that frame to replace the image corresponding to the still object in the target background image, thereby obtaining an image in which the moving object is removed, no ghosting remains, and the still object is very clear, which serves as the output image of the camera module.
Illustratively, as shown in fig. 4, diagram A in fig. 4 is any one frame of the multiple frames of images, diagram B is the target background image, and diagram C is the output image of the camera module. The terminal may extract the pixels corresponding to the still person (1) in diagram A and overlay them on the position of the pixels corresponding to the still person (1) in diagram B, thereby generating diagram C, i.e., the output image of the camera module.
In the image processing method, a target detection model is used to perform target detection on the multiple frames of images shot by the camera module of the same scene, obtaining the target object included in each frame of the multiple frames of images; the target objects included in each frame of image are classified, and the moving objects and static objects included in each frame of image are determined; the moving object in each frame of image is removed to obtain a background image corresponding to each frame of image, and all the background images are fused to generate a target background image; and the static object in the target background image is covered with the static object in the multi-frame image to obtain the output image of the camera module. With this method, the target object in the multi-frame image can be accurately identified through the target detection model, improving the accuracy of the target object identification result. The target objects included in each frame of image are classified, and whether each target object is a moving object or a static object is determined, improving the accuracy of identifying static and moving objects. On the premise of ensuring the accuracy of the identified moving objects, the moving objects in the multi-frame images are removed to generate background images. By fusing the multiple frames of background images, the ghosting present in any single frame is eliminated, so that the generated target background image is free of ghosting and its definition is ensured. Finally, the image corresponding to the static object in the target background image is covered with the image corresponding to the static object in the multi-frame image, so that the moving object is removed from the final output image, ghosting in the output image is eliminated, the definition of the background and the static object is ensured, and the image quality is improved.
In an alternative implementation manner of the present application, as shown in fig. 5, the step 202 "classify the target objects included in each frame of image, and determine the moving objects and the static objects included in each frame of image" may include the following steps:
in step 501, the terminal determines the position of the target object in each frame of image.
Specifically, the terminal determines the target object according to the target detection model identification result. And aiming at the same target object in the multi-frame images, the terminal respectively determines the position of the same target object in each frame image.
And 502, the terminal determines that the target object is a moving object or a static object according to the position of the target object in each frame of image.
Specifically, the terminal marks the position corresponding to the same target object in each frame of image. And the terminal compares whether the position of the target object in each frame of image changes or not and judges whether the target object is a moving object or a static object according to the comparison detection result.
Illustratively, the terminal identifies the same target object C in each frame of image according to the target detection model identification result. The terminal marks the corresponding position of the target object C in each frame of image according to the identification result; optionally, the terminal may draw a bounding box around the target object in each frame of image. The terminal then compares whether the position marked for the target object C changes between frames, and judges whether the target object is a moving object or a static object according to the comparison result.
In the embodiment of the application, the position of the target object in each frame of image is determined through the terminal, and the target object is determined to be a moving object or a static object according to the position of the target object in each frame of image. Therefore, whether the target object is a moving object or a static object can be accurately determined, the error of an output image caused by the detection error of the moving object is avoided, and the quality of the output image of the moving object is ensured.
In an alternative implementation manner of the present application, as shown in fig. 6, the step 502 "the terminal determines that the target object is a moving object or a stationary object according to the position of the target object in each frame of image" may include the following steps:
step 601, the terminal calculates the position deviation value of the target object in any two frames of images of the multi-frame images. If the maximum position deviation value is smaller than the position deviation threshold, go to step 602; if the position deviation value of the target object in any two frames of images of the multiple frames of images is greater than or equal to the position deviation threshold value, step 603 is executed.
In step 602, the terminal determines that the target object is a stationary object.
And step 603, the terminal determines that the target object is a moving object.
Specifically, after labeling the pixel position corresponding to the same target object in each frame of image, the terminal may determine the position of the same target object in each frame of image. The terminal can compare the positions corresponding to the same target object in any two frames of images, and perform difference calculation on the positions corresponding to the same target object in any two frames of images to obtain the position deviation value of the target object in any two frames of images. The terminal can compare the position deviation values of the target objects in any two frames of images, so that the maximum position deviation value is determined.
And after determining the maximum position deviation value of the target object in any two frames of images, the terminal compares the maximum position deviation value with a position deviation threshold, if the maximum position deviation value is smaller than the position deviation threshold, the position deviation of the target object in the multi-frame images is small, and the terminal determines that the target object is a static object. If the position deviation value of the target object in any two frames of images of the multi-frame images is larger than or equal to the position deviation threshold value, the fact that the position deviation of the target object in the multi-frame images is large is indicated, and the terminal determines that the target object is a moving object.
For example, as shown in fig. 7, after the position of the target object D is labeled in each frame of image, the terminal calculates the corresponding position deviation of the target object D between any two frames of images. If there are 5 frames of images, the terminal calculates the position deviation between the position of the target object D in the first frame and its position in the second frame, the position deviation between its position in the first frame and its position in the third frame, and so on, obtaining the position deviation of the target object D between any two frames. The terminal compares the obtained position deviations and determines the maximum position deviation among them. Suppose the maximum position deviation is 5 pixels and the position deviation threshold is 10 pixels; the terminal compares the two, and since the maximum position deviation is smaller than the position deviation threshold, it determines that the target object is a static object. Conversely, suppose the position deviation of the target object D between some two frames is 15 pixels while the position deviation threshold is 10 pixels; since this position deviation is larger than the position deviation threshold, the terminal determines that the target object is a moving object.
In the embodiment of the application, the terminal calculates the position deviation value of the target object between any two frames of the multi-frame images. If the maximum position deviation value is smaller than the position deviation threshold, the terminal determines that the target object is a static object; if the position deviation value of the target object between any two frames is greater than or equal to the position deviation threshold, the terminal determines that the target object is a moving object. By comparing the position deviation value of the target object between any two frames with the position deviation threshold, the terminal can accurately and effectively determine whether the target object is a moving object or a static object, avoiding errors in the output image caused by moving object detection errors and ensuring the quality of the output image from which the moving object is removed.
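A minimal sketch of this position-deviation test, assuming the target object's per-frame position is summarized by its bounding-box center and deviations are measured in pixels, as in the example above.

```python
import numpy as np

def classify_by_position(boxes, deviation_threshold=10.0):
    """boxes: one (x1, y1, x2, y2) box per frame for the same target object."""
    centers = np.array([[(x1 + x2) / 2.0, (y1 + y2) / 2.0]
                        for x1, y1, x2, y2 in boxes])
    # Maximum position deviation between any two frames
    max_dev = max(np.linalg.norm(a - b) for a in centers for b in centers)
    return "static" if max_dev < deviation_threshold else "moving"
```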
In an alternative implementation manner of the present application, as shown in fig. 8, the step 202 "the terminal performs classification processing on the target objects included in each frame of image, and determines the moving object and the stationary object included in each frame of image", and may further include the following steps:
in step 801, the terminal determines the number of target pixels at the tracking position for each frame of image.
The target pixel is used for displaying a target object, and the tracking position is the position of the target object in any one frame of image in the multi-frame images.
Specifically, the terminal may determine the position of the target object in any one frame of image as the tracking position, and determine the same position in the other frames as the tracking position according to the tracking position in that frame, ensuring that the tracking position is the same across the multiple frames of images; the target object may then be displayed, to a greater or lesser extent, at the tracking position in each frame.
After determining the tracking position in each frame of image, the terminal may calculate the number of target pixels at the tracking position in each frame of image. The target pixel is used for displaying the target object. That is, the terminal may calculate the number of pixels displaying the target object in the tracking position per one frame image.
In step 802, the terminal determines that the target object is a moving object or a static object according to the number of target pixels of each frame of image at the tracking position.
Specifically, the terminal may compare the number of target pixels of any two frames of images at the tracking position, and determine that the target object is a moving object or a stationary object according to the comparison result.
In the embodiment of the application, the terminal determines the number of target pixels of each frame of image at the tracking position, and determines that the target object is a moving object or a static object according to the number of target pixels of each frame of image at the tracking position. By using the method, the terminal can accurately determine whether the target object is a moving object or a static object, thereby avoiding the error of the output image caused by the detection error of the moving object and ensuring the quality of the output image without the moving object.
In an alternative implementation manner of the present application, as shown in fig. 9, the step 802 "determining, by the terminal, that the target object is a moving object or a stationary object according to the number of target pixels at the tracking position in each frame of image" may include the following steps:
in step 901, the terminal calculates the difference of the number of target pixels of any two frames of images in the tracking position in the multi-frame images.
Specifically, after determining the number of target pixels at the tracking position of each frame of image, the terminal may calculate the difference between the number of target pixels at the tracking position of any two frames of images, respectively.
For example, assuming that there are 5 frames of images, the number of target pixels of the first frame of image at the tracking position is 108; the number of target pixels of the second frame image at the tracking position is 111; the number of target pixels of the third frame image at the tracking position is 100; the number of target pixels of the fourth frame image at the tracking position is 105; the number of target pixels of the fifth frame image at the tracking position is 113. And the terminal respectively calculates the difference of the number of target pixels of any two frames of images at the tracking position.
In step 902, if the maximum number difference is smaller than the threshold value of the number of pixels, the terminal determines that the target object is a stationary object.
Specifically, the terminal calculates the difference between the number of target pixels at the tracking position of any two frames of images respectively. And sorting the number differences of the plurality of target pixels obtained by calculation, and selecting the largest number difference from the target pixels. And the terminal compares the maximum number difference with a pixel number threshold, and if the maximum number difference is smaller than the pixel number threshold, the target object is not moved, and the terminal determines that the target object is a static object.
Illustratively, if the maximum number difference is 9 and the pixel number threshold is 15, the terminal compares the two, finds that the maximum number difference is less than the pixel number threshold, and determines that the target object is a stationary object.
In step 903, if the difference between the number of target pixels of any two frames of images at the tracking position is greater than or equal to the pixel number threshold, the terminal determines that the target object is a moving object.
Specifically, each time the terminal calculates the difference between the numbers of target pixels at the tracking position for a pair of frames, it may compare the newly calculated difference with the pixel number threshold. As soon as a difference is found to be greater than or equal to the pixel number threshold, the terminal determines that the target object is a moving object and no longer calculates the differences for the remaining pairs of frames.
For example, after calculating the difference between the numbers of target pixels at the tracking positions of the first frame image and the second frame image, the terminal determines that the difference between the numbers of target pixels at the tracking positions of the first frame image and the second frame image is 20 and the threshold value of the number of pixels is 15, the difference between the numbers of target pixels at the tracking positions of the first frame image and the second frame image is greater than the threshold value of the number of pixels, the terminal determines that the target object is a moving object, and the terminal will not calculate the difference between the numbers of target pixels at the tracking positions of any two remaining frame images.
In the embodiment of the application, the terminal calculates the difference of the number of target pixels of any two frames of images in the tracking position in the multiple frames of images, and if the maximum difference of the number of the target pixels is smaller than the threshold value of the number of the pixels, the terminal determines that the target object is a static object; and if the difference of the number of the target pixels of any two frames of images at the tracking position is greater than or equal to the pixel number threshold, the terminal determines that the target object is a moving object. According to the method, the terminal compares the difference between the number of the target pixels of any two frames of images in the tracking position in the multi-frame images and the threshold value of the number of the pixels, so that whether the target object is a moving object or a static object can be accurately and effectively determined, errors of output images caused by detection errors of the moving object are avoided, and the quality of the output images with the moving object removed is guaranteed.
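A minimal sketch of this pixel-count test with the early exit described above; how the per-frame counts at the tracking position are obtained (e.g., from a segmentation mask) is assumed to be handled elsewhere.

```python
def classify_by_pixel_count(counts, pixel_count_threshold=15):
    """counts: number of target pixels at the tracking position, one per frame."""
    for i in range(len(counts)):
        for j in range(i + 1, len(counts)):
            # The first pair at or above the threshold ends the search early,
            # as described above
            if abs(counts[i] - counts[j]) >= pixel_count_threshold:
                return "moving"
    return "static"
```

With the counts from the 5-frame example above, classify_by_pixel_count([108, 111, 100, 105, 113]) returns "static", since the largest difference, 13, is below the threshold of 15.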
In an alternative implementation manner of the present application, as shown in fig. 10, the step 203 of removing a moving object in each frame of image to obtain a background image corresponding to each frame of image may include the following steps:
in step 1001, the terminal marks the pixels corresponding to the moving object in each frame image as invalid pixels.
Specifically, after determining that each target object is a stationary object or a moving object, the terminal may perform target segmentation on the moving object in each frame of image using a target segmentation algorithm, so as to obtain a more accurate mask image corresponding to multiple frames of images. The terminal may represent the mask image corresponding to each frame image as a binary image. Optionally, the pixel position corresponding to the moving object may be 0, and the other pixel positions may be 1. The pixel position is 1 to indicate that the pixel is valid, and the pixel position is 0 to indicate that the pixel is invalid, so that the pixel corresponding to the moving object in each frame of image is marked as an invalid pixel.
In step 1002, the terminal generates a background image corresponding to each frame of image according to the remaining pixels of each frame of image except the invalid pixels.
Specifically, after the terminal marks the moving object in each frame of image as an invalid pixel, a background image corresponding to each frame of image may be generated according to other pixels except the invalid pixel.
In the embodiment of the application, the terminal marks the pixels corresponding to the moving object in each frame of image as invalid pixels, and generates the background image corresponding to each frame of image according to the remaining pixels except the invalid pixels in each frame of image, so that the background image in each frame of image can be eliminated and determined, and the background image has no moving object.
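A minimal sketch of these two steps, assuming moving objects are described by rectangular boxes with integer pixel coordinates; the embodiment above refines the rectangles with a target segmentation algorithm, which is omitted here.

```python
import numpy as np

def make_valid_mask(shape, moving_object_boxes):
    """shape: (H, W); moving_object_boxes: (x1, y1, x2, y2) integer boxes."""
    mask = np.ones(shape, dtype=np.uint8)   # 1 = valid background pixel
    for x1, y1, x2, y2 in moving_object_boxes:
        mask[y1:y2, x1:x2] = 0              # 0 = invalid (moving object)
    return mask
```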
In an alternative implementation manner of the present application, as shown in fig. 11, the step 204 of the terminal covering the still object in the target background image with the still object in the multi-frame image to obtain the output image of the camera module by using the still object in the multi-frame image may include the following steps:
in step 1101, the terminal determines a reference image from the multi-frame image.
The definition of the static object in the reference image is greater than that of the corresponding static object in the target background image.
Specifically, the terminal can identify the definition of the static object in each of the multiple frames of images through an image definition recognition algorithm. The terminal can rank the frames by the definition of the static object according to the recognition results and select the frame in which the static object has the highest definition as the reference image.
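The embodiment does not name a particular image definition recognition algorithm; the sketch below assumes the common variance-of-the-Laplacian score and, for brevity, scores whole frames rather than only the static-object region.

```python
import cv2

def pick_reference(frames):
    """Pick the frame with the highest definition (sharpness) score."""
    def sharpness(img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Variance of the Laplacian: higher values indicate sharper images
        return cv2.Laplacian(gray, cv2.CV_64F).var()
    return max(frames, key=sharpness)
```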
Step 1102, the terminal covers the static object in the target background image with the static object in the reference image to obtain an output image.
Specifically, the terminal may extract an image corresponding to the still object in the reference image, and overlay the image corresponding to the still object in the reference image on the image corresponding to the still object in the target background image.
Optionally, when the terminal covers the image corresponding to the static object in the target background image with the image corresponding to the static object in the reference image, classical techniques such as Poisson fusion and multi-band fusion can be adopted, so that the output image looks more natural at the boundary of the static object region.
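One way to realize the Poisson fusion mentioned above is OpenCV's seamlessClone; this sketch assumes a binary mask of the static object and that the object occupies the same location in the reference image and the target background image.

```python
import cv2
import numpy as np

def paste_static_object(reference, target_background, object_mask):
    """object_mask: uint8, 255 inside the static object, 0 elsewhere."""
    ys, xs = np.where(object_mask > 0)
    center = (int(xs.mean()), int(ys.mean()))   # (x, y) center of the region
    # Poisson blending of the static object into the target background
    return cv2.seamlessClone(reference, target_background, object_mask,
                             center, cv2.NORMAL_CLONE)
```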
In the embodiment of the application, the terminal determines the reference image from the multiple frames of images and covers the static object in the target background image with the static object in the reference image to obtain the output image. This ensures the definition of the static object in the output image, making the whole output image clearer.
To better explain the image processing method described in the embodiment of the present application, an alternative operation flow of the image processing method is shown in fig. 12.
Step 1201, the terminal acquires a plurality of frames of images shot by the camera module on the same scene, performs target detection on the plurality of frames of images by using the target detection model, acquires a target object included in each frame of image in the plurality of frames of images, and executes step 1202 or step 1206.
In step 1202, the terminal determines the position of the target object in each frame of image.
Step 1203, the terminal calculates a position deviation value of the target object in any two frames of images of the multiple frames of images, and if the maximum position deviation value is smaller than a position deviation threshold value, step 1204 is executed; if the position deviation value of the target object in any two frames of images of the multiple frames of images is greater than or equal to the position deviation threshold value, step 1205 is executed.
In step 1204, the terminal determines that the target object is a stationary object.
In step 1205, the terminal determines that the target object is a moving object and executes step 1210.
In step 1206, the terminal determines the number of target pixels at the tracking position of each frame image.
In step 1207, the terminal calculates the difference between the numbers of target pixels at the tracking position for any two frames of the multiple frames of images. If the maximum number difference is smaller than the pixel number threshold, step 1208 is executed; if the difference between the numbers of target pixels at the tracking position for any two frames is greater than or equal to the pixel number threshold, step 1209 is executed.
In step 1208, the terminal determines that the target object is a stationary object.
In step 1209, the terminal determines that the target object is a moving object and performs step 1210.
In step 1210, the terminal marks the pixels corresponding to the moving object in each frame of image as invalid pixels.
In step 1211, the terminal generates a background image corresponding to each frame of image according to the remaining pixels of each frame of image except the invalid pixel.
And 1212, fusing all the background images by the terminal to generate a target background image.
In step 1213, the terminal determines a reference image from the multi-frame image.
In step 1214, the terminal covers the still object in the target background image with the still object in the reference image to obtain an output image.
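Tying steps 1201 through 1214 together, here is a condensed end-to-end sketch under simplifying assumptions: objects are tracked as per-frame integer bounding boxes, classification uses only the position-deviation test, and static objects are pasted back by direct pixel copy from the first frame rather than from a separately chosen reference image.

```python
import numpy as np

def remove_moving_objects(frames, tracks, dev_thresh=10.0):
    """frames: (n, H, W, 3) uint8; tracks: {obj_id: n (x1, y1, x2, y2) boxes}."""
    n, H, W, _ = frames.shape
    valid = np.ones((n, H, W), dtype=bool)
    static_ids = []
    for obj_id, boxes in tracks.items():
        centers = np.array([[(x1 + x2) / 2.0, (y1 + y2) / 2.0]
                            for x1, y1, x2, y2 in boxes])
        max_dev = max(np.linalg.norm(a - b) for a in centers for b in centers)
        if max_dev < dev_thresh:
            static_ids.append(obj_id)           # static: keep for step 1214
        else:
            for k, (x1, y1, x2, y2) in enumerate(boxes):
                valid[k, y1:y2, x1:x2] = False  # invalidate moving-object pixels
    # Steps 1211-1212: masked per-pixel median fusion, as in equation (1)
    invalid = np.broadcast_to(~valid[..., None], frames.shape)
    stack = np.ma.masked_array(frames.astype(np.float32), mask=invalid)
    output = np.ma.median(stack, axis=0).filled(0).astype(np.uint8)
    # Steps 1213-1214 (simplified): paste static objects back from frame 0
    for obj_id in static_ids:
        x1, y1, x2, y2 = tracks[obj_id][0]
        output[y1:y2, x1:x2] = frames[0, y1:y2, x1:x2]
    return output
```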
It should be understood that although the various steps in the flowcharts of FIGS. 2, 5-6, and 8-12 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, these steps are not performed in a strictly limited order and may be performed in other orders. Moreover, at least some of the steps in FIGS. 2, 5-6, and 8-12 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and the order of their execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In an embodiment of the present application, as shown in fig. 13, there is provided an image processing apparatus 1300 including: an acquisition module 1310, a determination module 1320, a removal module 1330, and an overlay module 1340, wherein:
the obtaining module 1310 is configured to obtain multiple frames of images shot in the same scene by the camera module, perform target detection on the multiple frames of images by using the target detection model, and obtain a target object included in each frame of image in the multiple frames of images;
a determining module 1320, configured to perform classification processing on the target object included in each frame of image, and determine a moving object and a stationary object included in each frame of image;
a removing module 1330, configured to remove a moving object in each frame of image, obtain a background image corresponding to each frame of image, perform fusion processing on all background images, and generate a target background image;
the covering module 1340 is configured to cover the still object in the target background image with the still object in the multiple frames of images, so as to obtain an output image of the camera module.
In an embodiment of the present application, as shown in fig. 14, the determining module 1320 includes: a first determining unit 1321 and a second determining unit 1322, wherein:
a first determining unit 1321, configured to determine a position of the target object in each frame of the image.
The second determining unit 1322 is configured to determine that the target object is a moving object or a stationary object according to the position of the target object in each frame of image.
In an embodiment of the application, the second determining unit 1322 is specifically configured to calculate a position deviation value of the target object in any two frames of the multi-frame images, determine that the target object is a stationary object if the maximum position deviation value is smaller than the position deviation threshold, and determine that the target object is a moving object if the position deviation value of the target object in any two frames of the multi-frame images is greater than or equal to the position deviation threshold.
In an embodiment of the present application, as shown in fig. 15, the determining module 1320 includes: a third determining unit 1323 and a fourth determining unit 1324, wherein:
a third determining unit 1323, configured to determine the number of target pixels of each frame of image at a tracking position, where the target pixels are used for displaying the target object, and the tracking position is a position of the target object in any one of the images of the multiple frames of images.
And a fourth determining unit 1324, configured to determine whether the target object is a moving object or a static object according to the number of target pixels at the tracking position in each frame of image.
In an embodiment of the present application, the fourth determining unit 1324 is specifically configured to calculate the difference between the numbers of target pixels at the tracking position in any two frames of the multi-frame images; if the maximum difference is smaller than a pixel number threshold, determine that the target object is a static object; and if the difference for any two frames is greater than or equal to the pixel number threshold, determine that the target object is a moving object.
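The pixel-count variant admits an equally short sketch; the per-frame boolean masks over a common tracking window are an assumed input format. Since the counts are scalars, the largest difference between any two frames is simply max(counts) - min(counts).

```python
import numpy as np

def classify_by_pixel_count(masks, count_threshold):
    """Classify a target object from its per-frame masks at the tracking position.

    masks: list of N boolean arrays over the same tracking window, True
    where a pixel displays the target object (assumed input format).
    """
    counts = np.array([np.count_nonzero(m) for m in masks])
    # Largest pairwise difference in target-pixel count across the frames.
    max_diff = counts.max() - counts.min()
    return "static" if max_diff < count_threshold else "moving"
```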
In an embodiment of the present application, the removing module 1330 is specifically configured to mark the pixels corresponding to the moving object in each frame of image as invalid pixels, and to generate the background image corresponding to each frame of image from the remaining pixels other than the invalid pixels.
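By way of example, the marking-and-fusion step can be sketched as follows; averaging the valid pixels per location is an assumed fusion strategy, as the embodiment does not restrict the fusion processing to a particular operator.

```python
import numpy as np

def fuse_backgrounds(frames, moving_masks):
    """Fuse N frames of the same scene into a target background image.

    frames: list of H x W x 3 uint8 images; moving_masks: list of H x W
    boolean arrays, True where a moving object was detected. Per-pixel
    averaging over the valid pixels is an illustrative assumption.
    """
    stack = np.stack(frames).astype(np.float64)      # (N, H, W, 3)
    valid = ~np.stack(moving_masks)                  # (N, H, W); True = valid pixel
    weights = valid[..., None].astype(np.float64)    # (N, H, W, 1)
    # Average only the valid pixels at each location; locations that are
    # invalid in every frame fall back to zero to avoid division by zero.
    num = (stack * weights).sum(axis=0)
    den = np.maximum(weights.sum(axis=0), 1.0)
    return (num / den).astype(np.uint8)
```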
In an embodiment of the present application, the covering module 1340 is specifically configured to determine a reference image from the multiple frames of images, where the sharpness of the static object in the reference image is greater than the sharpness of the corresponding static object in the target background image, and to cover the static object in the target background image with the static object in the reference image to obtain the output image.
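A sketch of this selection-and-covering step is given below. Scoring frames by the variance of the Laplacian (via OpenCV) is an illustrative sharpness proxy; the embodiment only requires that the static object in the chosen reference image be sharper than its counterpart in the target background image.

```python
import cv2
import numpy as np

def cover_static_object(target_background, frames, static_mask):
    """Cover the static-object region of the fused background with the
    same region taken from the sharpest frame.

    static_mask: H x W boolean array of the static object's pixels.
    Laplacian-variance sharpness scoring is an illustrative assumption.
    """
    def sharpness(img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()

    reference = max(frames, key=sharpness)           # sharpest frame as reference
    output = target_background.copy()
    output[static_mask] = reference[static_mask]     # paste the static object
    return output
```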
For the specific limitations of the image processing apparatus, reference may be made to the limitations of the image processing method above, which are not repeated here. Each module in the image processing apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can invoke them and execute the operations corresponding to the modules.
In one embodiment of the present application, there is provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the following steps when executing the computer program: acquiring multi-frame images shot in the same scene by a camera module, and performing target detection on the multi-frame images by using a target detection model to obtain a target object included in each frame image in the multi-frame images; classifying the target objects included in each frame of image, and determining moving objects and static objects included in each frame of image; removing the moving object in each frame of image, obtaining a background image corresponding to each frame of image, and performing fusion processing on all the background images to generate a target background image; and covering the static object in the target background image by using the static object in the multi-frame image to obtain an output image of the camera module.
In one embodiment of the application, the processor when executing the computer program further performs the following steps: determining the position of the target object in each frame of image; and determining the target object as a moving object or a static object according to the position of the target object in each frame of image.
In one embodiment of the application, the processor when executing the computer program further performs the following steps: calculating the position deviation value of the target object between any two frames of the multi-frame images; if the maximum position deviation value is smaller than the position deviation threshold, determining that the target object is a static object; and if the position deviation value between any two frames of the multi-frame images is greater than or equal to the position deviation threshold, determining that the target object is a moving object.
In one embodiment of the application, the processor when executing the computer program further performs the following steps: determining the number of target pixels of each frame of image at a tracking position, wherein the target pixels are used for displaying a target object, and the tracking position is the position of the target object in any one frame of image in the multiple frames of images; and determining that the target object is a moving object or a static object according to the number of the target pixels of each frame image at the tracking position.
In one embodiment of the application, the processor when executing the computer program further performs the following steps: calculating the difference between the numbers of target pixels at the tracking position in any two frames of the multi-frame images; if the maximum difference is smaller than the pixel number threshold, determining that the target object is a static object; and if the difference for any two frames is greater than or equal to the pixel number threshold, determining that the target object is a moving object.
In one embodiment of the application, the processor when executing the computer program further performs the following steps: marking the pixels corresponding to the moving object in each frame of image as invalid pixels; and generating a background image corresponding to each frame of image according to the remaining pixels other than the invalid pixels in each frame of image.
In one embodiment of the application, the processor when executing the computer program further performs the following steps: determining a reference image according to the multi-frame images, wherein the sharpness of the static object in the reference image is greater than that of the corresponding static object in the target background image; and covering the static object in the target background image with the static object in the reference image to obtain the output image.
In one embodiment of the present application, there is provided a computer readable storage medium having a computer program stored thereon, the computer program when executed by a processor implementing the steps of: acquiring multi-frame images shot in the same scene by a camera module, and performing target detection on the multi-frame images by using a target detection model to obtain a target object included in each frame image in the multi-frame images; classifying the target objects included in each frame of image, and determining moving objects and static objects included in each frame of image; removing the moving object in each frame of image, obtaining a background image corresponding to each frame of image, and performing fusion processing on all the background images to generate a target background image; and covering the static object in the target background image by using the static object in the multi-frame image to obtain an output image of the camera module.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: determining the position of the target object in each frame of image; and determining the target object as a moving object or a static object according to the position of the target object in each frame of image.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: calculating the position deviation value of the target object between any two frames of the multi-frame images; if the maximum position deviation value is smaller than the position deviation threshold, determining that the target object is a static object; and if the position deviation value between any two frames of the multi-frame images is greater than or equal to the position deviation threshold, determining that the target object is a moving object.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: determining the number of target pixels of each frame of image at a tracking position, wherein the target pixels are used for displaying a target object, and the tracking position is the position of the target object in any one frame of image in the multiple frames of images; and determining that the target object is a moving object or a static object according to the number of the target pixels of each frame image at the tracking position.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: calculating the difference between the numbers of target pixels at the tracking position in any two frames of the multi-frame images; if the maximum difference is smaller than the pixel number threshold, determining that the target object is a static object; and if the difference for any two frames is greater than or equal to the pixel number threshold, determining that the target object is a moving object.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: marking the pixels corresponding to the moving object in each frame of image as invalid pixels; and generating a background image corresponding to each frame of image according to the remaining pixels other than the invalid pixels in each frame of image.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: determining a reference image according to the multi-frame images, wherein the sharpness of the static object in the reference image is greater than that of the corresponding static object in the target background image; and covering the static object in the target background image with the static object in the reference image to obtain the output image.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that several variations and improvements can be made by those of ordinary skill in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An image processing method, characterized in that the method comprises:
acquiring multi-frame images shot in the same scene by a camera module, and performing target detection on the multi-frame images by using a target detection model to obtain a target object included in each frame image in the multi-frame images;
classifying the target object included in each frame of image, and determining a moving object and a static object included in each frame of image;
removing the moving object in each frame of image, obtaining a background image corresponding to each frame of image, and performing fusion processing on all the background images to generate a target background image;
and covering the static object in the target background image by using the static object in the multi-frame image to obtain an output image of the camera module.
2. The method according to claim 1, wherein the classifying the target object included in each frame of image and determining a moving object and a static object included in each frame of image comprises:
determining the position of the target object in each frame of image;
and determining the target object to be a moving object or a static object according to the position of the target object in each frame of image.
3. The method according to claim 2, wherein the determining the target object as a moving object or a static object according to the position of the target object in each frame of image comprises:
calculating a position deviation value of the target object between any two frames of the multi-frame images; if the maximum position deviation value is smaller than a position deviation threshold, determining that the target object is a static object; and if the position deviation value of the target object between any two frames of the multi-frame images is greater than or equal to the position deviation threshold, determining that the target object is a moving object.
4. The method according to claim 1, wherein the classifying the target object included in each frame of image and determining a moving object and a static object included in each frame of image comprises:
determining the number of target pixels of each frame of image at a tracking position, wherein the target pixels are used for displaying the target object, and the tracking position is the position of the target object in any one frame of image in the multiple frames of images;
and determining that the target object is a moving object or a static object according to the number of target pixels of each frame of image at the tracking position.
5. The method of claim 4, wherein the determining the target object as a moving object or a static object according to the number of target pixels of each frame of image at the tracking position comprises:
calculating the difference between the numbers of target pixels of any two frames of images at the tracking position;
if the maximum difference is smaller than a pixel number threshold, determining that the target object is the static object;
and if the difference between the numbers of target pixels of any two frames of images at the tracking position is greater than or equal to the pixel number threshold, determining that the target object is the moving object.
6. The method of claim 1, wherein removing the moving object in each frame of image to obtain a background image corresponding to each frame of image comprises:
marking the pixels corresponding to the moving object in each frame image as invalid pixels;
and generating a background image corresponding to each frame of image according to the remaining pixels other than the invalid pixels in each frame of image.
7. The method according to claim 1, wherein the covering the still object in the target background image with the still object in the multi-frame image to obtain the output image of the camera module comprises:
determining a reference image according to the multi-frame images, wherein the sharpness of a static object in the reference image is greater than that of a corresponding static object in the target background image;
and covering the static object in the target background image by using the static object in the reference image to obtain the output image.
8. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring multi-frame images shot by the camera module on the same scene, and performing target detection on the multi-frame images by using a target detection model to obtain a target object included in each frame of image in the multi-frame images;
the determining module is used for classifying the target objects included in each frame of image and determining moving objects and static objects included in each frame of image;
the removing module is used for removing the moving object in each frame of image, obtaining a background image corresponding to each frame of image, and performing fusion processing on all the background images to generate a target background image;
and the covering module is used for covering the static object in the target background image by utilizing the static object in the multi-frame image to obtain an output image of the camera module.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110330377.3A 2021-03-29 2021-03-29 Image processing method, image processing device, computer equipment and storage medium Pending CN113129227A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110330377.3A CN113129227A (en) 2021-03-29 2021-03-29 Image processing method, image processing device, computer equipment and storage medium
PCT/CN2022/083400 WO2022206679A1 (en) 2021-03-29 2022-03-28 Image processing method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110330377.3A CN113129227A (en) 2021-03-29 2021-03-29 Image processing method, image processing device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113129227A (en) 2021-07-16

Family

ID=76773963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110330377.3A Pending CN113129227A (en) 2021-03-29 2021-03-29 Image processing method, image processing device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113129227A (en)
WO (1) WO2022206679A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022206679A1 (en) * 2021-03-29 2022-10-06 影石创新科技股份有限公司 Image processing method and apparatus, computer device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827952A (en) * 2016-02-01 2016-08-03 维沃移动通信有限公司 Photographing method for removing specified object and mobile terminal
CN106469311A (en) * 2015-08-19 2017-03-01 南京新索奇科技有限公司 Object detection method and device
CN109002787A (en) * 2018-07-09 2018-12-14 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment
CN109040603A (en) * 2018-10-15 2018-12-18 Oppo广东移动通信有限公司 High-dynamic-range image acquisition method, device and mobile terminal
CN110166710A (en) * 2019-06-21 2019-08-23 上海闻泰电子科技有限公司 Image composition method, device, equipment and medium
CN111242128A (en) * 2019-12-31 2020-06-05 深圳奇迹智慧网络有限公司 Target detection method, target detection device, computer-readable storage medium and computer equipment

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4596216B2 (en) * 2001-06-20 2010-12-08 ソニー株式会社 Image processing apparatus and method, recording medium, and program
JP4596219B2 (en) * 2001-06-25 2010-12-08 ソニー株式会社 Image processing apparatus and method, recording medium, and program
JP4840630B2 (en) * 2001-06-27 2011-12-21 ソニー株式会社 Image processing apparatus and method, recording medium, and program
JP4596223B2 (en) * 2001-06-27 2010-12-08 ソニー株式会社 Image processing apparatus and method, recording medium, and program
DE102007024868A1 (en) * 2006-07-21 2008-01-24 Robert Bosch Gmbh Image processing apparatus, monitoring system, method for generating a scene reference image and computer program
CN103310454B (en) * 2013-05-08 2016-12-28 北京大学深圳研究生院 Method and system for judging and analyzing the type and owner of a stationary object in abandoned-object detection
CN107943837B (en) * 2017-10-27 2022-09-30 江苏理工学院 Key-framed video abstract generation method for foreground target
CN107844765A (en) * 2017-10-31 2018-03-27 广东欧珀移动通信有限公司 Photographic method, device, terminal and storage medium
CN110290323B (en) * 2019-06-28 2021-09-07 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111028263B (en) * 2019-10-29 2023-05-05 福建师范大学 Moving object segmentation method and system based on optical flow color clustering
CN113129227A (en) * 2021-03-29 2021-07-16 影石创新科技股份有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113129229A (en) * 2021-03-29 2021-07-16 影石创新科技股份有限公司 Image processing method, image processing device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2022206679A1 (en) 2022-10-06

Similar Documents

Publication Publication Date Title
CN110176027B (en) Video target tracking method, device, equipment and storage medium
CN103578116B (en) For tracking the apparatus and method of object
CN113129229A (en) Image processing method, image processing device, computer equipment and storage medium
US9542735B2 (en) Method and device to compose an image by eliminating one or more moving objects
CN104866805B (en) Method and device for real-time tracking of human face
CN109299658B (en) Face detection method, face image rendering device and storage medium
JP4373840B2 (en) Moving object tracking method, moving object tracking program and recording medium thereof, and moving object tracking apparatus
CN109033955B (en) Face tracking method and system
CN107346414B (en) Pedestrian attribute identification method and device
CN110163864B (en) Image segmentation method and device, computer equipment and storage medium
CN111626163A (en) Human face living body detection method and device and computer equipment
WO2022194079A1 (en) Sky region segmentation method and apparatus, computer device, and storage medium
US10255512B2 (en) Method, system and apparatus for processing an image
CN113658197B (en) Image processing method, device, electronic equipment and computer readable storage medium
WO2022206679A1 (en) Image processing method and apparatus, computer device and storage medium
CN114170558A (en) Method, system, device, medium and article for video processing
Yun et al. Unsupervised moving object detection through background models for ptz camera
CN112070035A (en) Target tracking method and device based on video stream and storage medium
CN109785367B (en) Method and device for filtering foreign points in three-dimensional model tracking
CN115514887A (en) Control method and device for video acquisition, computer equipment and storage medium
CN115660969A (en) Image processing method, model training method, device, equipment and storage medium
Satiro et al. Super-resolution of facial images in forensics scenarios
CN115239551A (en) Video enhancement method and device
CN117237386A (en) Method, device and computer equipment for carrying out structuring processing on target object
JP6555940B2 (en) Subject tracking device, imaging device, and method for controlling subject tracking device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination