WO2022206680A1 - Image processing method and apparatus, computer device, and storage medium - Google Patents

Image processing method and apparatus, computer device, and storage medium

Info

Publication number
WO2022206680A1
WO2022206680A1, PCT/CN2022/083404, CN2022083404W
Authority
WO
WIPO (PCT)
Prior art keywords
frame
target
image
images
pixels
Prior art date
Application number
PCT/CN2022/083404
Other languages
English (en)
French (fr)
Inventor
张伟俊
Original Assignee
影石创新科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 影石创新科技股份有限公司
Publication of WO2022206680A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/73 - Deblurring; Sharpening
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/194 - Segmentation; Edge detection involving foreground-background segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion

Definitions

  • the present application relates to the field of computer technology, and in particular, to an image processing method, apparatus, computer device and storage medium.
  • In the related art, moving objects are usually identified by relying on changes in pixel values across different frame images, so as to remove the moving objects from the images.
  • an image processing method comprising:
  • In some embodiments, using the target area of the target background image to cover the target area of one frame of images in the multiple frames of images to obtain the output image of the camera module includes: determining the number of pixels of the moving objects in each frame of images; determining the frame image with the smallest number of pixels as the reference image; and using the target area of the target background image to cover the target area of the reference image to obtain the output image of the camera module.
  • classifying the target objects included in each frame of images and determining the moving objects and stationary objects included in each frame of images includes: determining the position of the target object in each frame of images; and determining, according to the position of the target object in each frame of images, whether the target object is a moving object or a stationary object.
  • determining whether the target object is a moving object or a stationary object according to the position of the target object in each frame of images includes: calculating the position deviation value of the target object between any two frames of the multi-frame images; if the maximum position deviation value is less than the position deviation threshold, determining that the target object is a stationary object; and if the position deviation value of the target object between any two frames of the multi-frame images is greater than or equal to the position deviation threshold, determining that the target object is a moving object.
  • classifying the target objects included in each frame of images and determining the moving objects and stationary objects included in each frame of images includes: determining the number of target pixels at the tracking position in each frame of images, where the target pixels are pixels used to display the target object and the tracking position is the position of the target object in any one frame of the multi-frame images; and determining, according to the number of target pixels at the tracking position in each frame of images, whether the target object is a moving object or a stationary object.
  • determining whether the target object is a moving object or a stationary object according to the number of target pixels at the tracking position in each frame of images includes: calculating the difference in the number of target pixels at the tracking position between any two frames of the multi-frame images; if the maximum number difference is less than the pixel number threshold, determining that the target object is a stationary object; and if the difference in the number of target pixels at the tracking position between any two frames of images is greater than or equal to the pixel number threshold, determining that the target object is a moving object.
  • removing the moving objects in each frame of images to obtain the background image corresponding to each frame of images includes: marking the pixels corresponding to the moving objects in each frame of images as invalid pixels; and generating the background image corresponding to each frame of images from the remaining pixels other than the invalid pixels.
  • In a second aspect, an image processing apparatus is provided, including:
  • the acquisition module is used for acquiring multiple frames of images shot by the camera module on the same scene, and using the target detection model to perform target detection on the multiple frames of images to obtain the target object included in each frame of the multiple frames of images;
  • a determination module used for classifying and processing the target objects included in each frame of images, and determining the moving objects and stationary objects included in each frame of images;
  • the removal module is used to remove the moving objects in each frame of image, obtain the background image corresponding to each frame of image, and perform fusion processing on all the background images to generate the target background image;
  • the covering module is used to cover the target area of one frame of images in the multi-frame images with the target area of the target background image to obtain the output image of the camera module, wherein the target area is the area corresponding to the moving object.
  • a computer device including a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the method according to any one of the foregoing first aspects is implemented.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the method according to any one of the foregoing first aspects.
  • The above image processing method, apparatus, computer device, and storage medium acquire multiple frames of images shot by a camera module on the same scene, and use a target detection model to perform target detection on the multiple frames of images to obtain the target object included in each frame of the multiple frames of images; classify the target objects included in each frame of images, and determine the moving objects and stationary objects included in each frame of images; remove the moving objects in each frame of images, obtain the background image corresponding to each frame of images, and fuse all the background images to generate the target background image; and use the target area of the target background image to cover the target area of one frame of the multi-frame images to obtain the output image of the camera module.
  • the foreground object (for example, the target object described above) in the multi-frame images can be accurately identified by the target detection model, which improves the accuracy of the target object identification result.
  • By classifying the target objects included in each frame of images, it is determined whether each target object is a moving object or a stationary object, thereby preventing stationary objects and moving objects from being misidentified.
  • the moving objects in the multi-frame images are removed to generate a background image.
  • the target background image is generated, the ghost in the target background image is eliminated, and the clarity of the target background image is ensured.
  • the target area of the target background image is used to cover the target area of one frame of the multi-frame images, so that moving objects are removed from the final generated output image, there is no ghost in the output image, and the overall clarity of the image is guaranteed, which improves image quality.
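The flow summarized above can be sketched end to end in Python. The sketch below is illustrative only: the NumPy array layout, the rectangular-box detections, the helper name `remove_moving_objects`, and the choice of a per-pixel masked median for fusion are assumptions for demonstration, not the application's actual implementation.

```python
import numpy as np

def remove_moving_objects(frames, detections, moving_ids):
    """Sketch of the described pipeline: mask out moving objects in each
    frame, fuse the masked frames into a target background image, then
    cover the target areas of a reference frame with that background.

    frames:     list of HxWxC uint8 arrays (same scene, fixed camera)
    detections: per-frame dict {object_id: (x0, y0, x1, y1)}
    moving_ids: set of object ids classified as moving
    """
    masks = []
    for dets in detections:
        mask = np.ones(frames[0].shape[:2], dtype=bool)   # True = valid pixel
        for oid, (x0, y0, x1, y1) in dets.items():
            if oid in moving_ids:
                mask[y0:y1, x0:x1] = False                # invalid pixels
        masks.append(mask)

    # Per-pixel median over the frames in which the pixel is valid
    stack = np.stack(frames).astype(np.float64)
    valid = np.stack(masks)
    stack[~valid] = np.nan
    background = np.nanmedian(stack, axis=0)   # NaN if invalid in every frame

    # Reference image: the frame with the fewest moving-object pixels
    ref_idx = int(np.argmax([m.sum() for m in masks]))
    output = frames[ref_idx].copy()
    # Cover the reference frame's target areas with the fused background
    output[~masks[ref_idx]] = background[~masks[ref_idx]].astype(output.dtype)
    return output
```

The sketch assumes every pixel is valid in at least one frame; a real system would need a fallback for pixels occluded by moving objects in all frames.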
  • FIG. 1 is an application environment diagram of the image processing method in one embodiment
  • FIG. 2 is a schematic flowchart of an image processing method in one embodiment
  • FIG. 3 is a schematic diagram of determining a target position in a multi-frame image in an image processing method in one embodiment
  • FIG. 4 is a schematic diagram of an image processing method covering a target area of a frame of images in multiple frames of images using a target area of a target background image in one embodiment
  • FIG. 6 is a schematic flowchart of an image processing method in another embodiment
  • FIG. 7 is a schematic flowchart of an image processing method in another embodiment
  • FIG. 8 is a schematic diagram of determining a target object in a multi-frame image in an image processing method in one embodiment
  • FIG. 10 is a schematic flowchart of an image processing method in another embodiment
  • FIG. 11 is a schematic flowchart of an image processing method in another embodiment
  • FIG. 13 is a structural block diagram of an image processing apparatus in one embodiment
  • FIG. 14 is a structural block diagram of an image processing apparatus in one embodiment
  • FIG. 15 is a structural block diagram of an image processing apparatus in one embodiment.
  • the image processing method provided by the present application can be applied to the computer device as shown in FIG. 1 .
  • The computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 1.
  • The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by WIFI, operator network, NFC (Near Field Communication) or other technologies.
  • the computer program implements an image processing method when executed by a processor.
  • the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad set on the housing of the computer device, or an external keyboard, trackpad, or mouse.
  • FIG. 1 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • an image processing method is provided, and the method is applied to the terminal in FIG. 1 as an example for description, including the following steps:
  • step 201 the terminal acquires multiple frames of images shot by the camera module on the same scene, and uses a target detection model to perform target detection on the multiple frames of images to obtain a target object included in each frame of the multiple frames of images.
  • the user can place the device where the camera module is located at a fixed position and keep the device still, so that the camera module can capture multiple frames of images of the same scene.
  • the relative positions of stationary objects in multiple frames of images captured by the camera module of the same scene do not change (for example, the stationary objects may be buildings, people being photographed, or trees), while the relative positions of moving objects may change (for example, a moving object may be a person, animal, or vehicle that suddenly intrudes into the scene currently being filmed).
  • the same scene here mainly means the same shooting scene with respect to the stationary objects; that is, the stationary objects are the target content of the final desired image, while a moving object has mistakenly entered the shooting scene and is not what the user needs.
  • the above-mentioned method of fixing the shooting device where the camera module is located can obtain multiple frames of images of the same scene, but the method of obtaining multiple frames of images of the same scene by shooting is not limited to this, which is not specifically limited in this embodiment.
  • the terminal or the photographing device may control the camera module to photograph multiple frames of continuous images.
  • the photographing instruction input by the user may be the user pressing the shutter button, the user speaking a voice photographing password, or the terminal or photographing device detecting the user's photographing gesture; the form of the photographing instruction is not specifically limited here.
  • the multi-frame images can be stored in the storage device, and the terminal can obtain the multiple frames of images captured by the camera module of the same scene from the storage device.
  • the terminal can input multiple frames of images into the target detection model, and use the target detection model to extract features in the multiple frames of images, thereby determining the target object in each frame of images.
  • the target detection model can be a model based on manual features, such as DPM (Deformable Parts Model), or a model based on a convolutional neural network, such as a YOLO (You Only Look Once) detector, an R-CNN (Region-based Convolutional Neural Networks) model, an SSD (Single Shot MultiBox Detector), or a Mask R-CNN (Mask Region-based Convolutional Neural Networks) model.
  • Step 202 The terminal classifies the target objects included in each frame of image, and determines the moving objects and stationary objects included in each frame of image.
  • the terminal can use a target tracking algorithm to track the same target object across the multi-frame images, determine the position of that target object in different frames, and determine whether it is a moving object or a stationary object, thereby classifying the moving and stationary objects in each frame of images.
  • For example, the terminal uses the target tracking algorithm to identify the position of target object A in each of the multi-frame images, and determines, according to those positions, whether target object A is a moving object or a stationary object.
  • the terminal can also use the target tracking algorithm to track the same position in the multi-frame images, determine the number of pixels at which the target object is detected at that position in each frame, and determine, according to that number of pixels, whether the target object is a moving object or a stationary object.
  • the terminal detects the position of the target object B in the first frame image according to the target tracking algorithm, and determines the position of the target object B in the first frame image as the target position.
  • the terminal determines the same position in other multi-frame images as the target position according to the position of the target position in the first frame of image.
  • the terminal tracks the number of pixels occupied by the target object B at the same target position of the multi-frame image, and determines whether the target object B is a moving object or a stationary object according to the number of pixels of the target object B at the target position of the multi-frame image.
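The pixel-count test described above can be sketched as follows. The boolean per-frame object masks and the helper name `classify_by_pixel_count` are illustrative assumptions, since the application does not prescribe a particular data layout.

```python
import numpy as np

def classify_by_pixel_count(object_masks, tracking_box, count_threshold):
    """Classify a tracked target object as moving or stationary from the
    number of its pixels inside a fixed tracking position in each frame.

    object_masks:    per-frame HxW boolean arrays, True where the object's
                     pixels are detected (illustrative representation)
    tracking_box:    (x0, y0, x1, y1), the object's position in some frame
    count_threshold: threshold on the maximum pixel-number difference
    """
    x0, y0, x1, y1 = tracking_box
    counts = [int(m[y0:y1, x0:x1].sum()) for m in object_masks]
    # Largest difference in pixel count between any two frames
    max_diff = max(counts) - min(counts)
    return "moving" if max_diff >= count_threshold else "stationary"
```

If the object stays put, its pixel count inside the tracking position is nearly constant across frames; if it leaves, the count collapses and the maximum difference exceeds the threshold.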
  • Step 203 the terminal removes the moving objects in each frame of images, obtains a background image corresponding to each frame of images, and performs fusion processing on all the background images to generate a target background image.
  • the terminal marks the pixels in the target rectangular frame where the moving object in each frame of images is located as invalid pixels, obtains the background image corresponding to each frame of images, and fuses all the background images to generate the target background image.
  • After the pixels in the target rectangular frame where the moving objects are located have been marked as invalid pixels and the background image corresponding to each frame of images has been acquired, the terminal can use a pixel-level image fusion method to fuse the multiple frames of background images and generate the target background image. The pixel-level image fusion method can be an image fusion method based on non-multi-scale transformation (for example, the average and weighted average method, the logical filter method, the mathematical morphology method, or the image algebra method) or an image fusion method based on multi-scale transformation (for example, the pyramid image fusion method, the wavelet transform image fusion method, or a neural-network-based image fusion method). The fusion method for the multiple frames of background images is not limited here; pixel-level image fusion is used because it retains more image information.
  • the terminal may also use the background modeling method to perform fusion processing on the background images corresponding to each frame of images.
  • the background modeling method can be a non-recursive background modeling method or a recursive background modeling method, wherein non-recursive background modeling methods can include the median model, mean model, linear prediction model, non-parametric kernel density estimation, and the like.
  • recursive background modeling methods may include approximate median filtering methods, single Gaussian model methods, mixture Gaussian model methods, and the like.
  • the embodiments of the present application take the median model modeling method in the non-recursive background modeling method as an example for detailed description.
  • Assume there are n frames of images, denoted I_k, k = 1, 2, …, n, each with a corresponding mask image M_k in the mask image set.
  • In each mask image in the mask image set, the pixels corresponding to the moving objects are invalid pixels and can be marked as 0, while the pixels other than the moving objects are valid pixels and can be marked as 1, generating the corresponding mask image.
  • the value range of the pixel value of each pixel in M k may be ⁇ 0, 1 ⁇ , where 0 represents an invalid pixel, and 1 represents a valid pixel.
  • I k (p) and M k (p) represent the pixel values of I k and M k corresponding to the pixel point at the coordinate position p, respectively.
  • Using B to represent the synthesized target background image and B(p) to represent its pixel value at the pixel point at coordinate position p, the corresponding calculation formula is:

  B(p) = Median({ I_k(p) | M_k(p) = 1, k = 1, 2, …, n })  (1)

  • Median(*) in formula (1) represents the operation of taking the median of the elements in the set; that is, for each pixel position p, the median is taken over the valid pixel values only.
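As a minimal sketch of formula (1), the masked median can be computed per pixel with NumPy. Using NaN to exclude invalid pixels is an implementation convenience, not part of the application, and the sketch assumes every pixel is valid in at least one frame.

```python
import numpy as np

def fuse_background(frames, masks):
    """Compute B(p) = Median({ I_k(p) : M_k(p) = 1 }) per formula (1).

    frames: list of n arrays I_k (HxW, or HxWxC for color)
    masks:  list of n arrays M_k with values in {0, 1}; 0 marks an
            invalid (moving-object) pixel, 1 a valid one
    """
    stack = np.stack(frames).astype(np.float64)
    valid = np.stack(masks).astype(bool)
    # Boolean index covers the leading (frame, row, col) dims; for color
    # frames the channel axis broadcasts.
    stack[~valid] = np.nan
    # nanmedian takes the median over valid pixels only; a pixel invalid
    # in every frame would come out NaN and would need a fallback.
    return np.nanmedian(stack, axis=0)
```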
  • Step 204 the terminal uses the target area of the target background image to cover the target area of one frame of images in the multiple frames of images, and obtains the output image of the camera module.
  • the target area is the area corresponding to the moving object.
  • When the camera module obtains multiple frames of images of the same scene, human error or equipment error causes slight deviations in the positions of the stationary and moving objects across the multiple frames, so that after fusion processing the edges corresponding to the moving objects in the generated target background image become blurred. The following processing is therefore used to improve the clarity of the output image.
  • the terminal can identify the sharpness of multiple frames of images according to the sharpness identification model, and select a frame of images with the highest sharpness from the multiple frames of images as a reference image.
  • the terminal identifies the moving object in the reference image, and determines the area corresponding to the moving object.
  • the terminal determines the area corresponding to the moving object in the target background image according to the area corresponding to the moving object in the reference image.
  • the terminal extracts the area corresponding to the moving object in the target background image, and covers the area corresponding to the moving object in the target background image to the area corresponding to the moving object in the reference image, thereby obtaining the output image of the camera module.
  • In FIG. 4, picture A is one optional frame among the multi-frame images, picture B is the target background image, and picture C is the output image of the camera module.
  • the terminal identifies the regions corresponding to moving objects (1) and (2) in picture A, and determines the regions corresponding to moving objects (1) and (2) in the target background image.
  • the terminal extracts the regions corresponding to moving objects (1) and (2) in the target background image and copies them onto the regions corresponding to moving objects (1) and (2) in picture A, thereby generating picture C, that is, the output image of the camera module.
  • the terminal may also identify the moving objects in the multi-frame images, calculate the number of moving objects in each frame of the multi-frame images, and select the frame with the fewest moving objects as the reference image.
  • the terminal determines the area corresponding to the moving object in the target background image according to the area corresponding to the moving object in the reference image.
  • the terminal extracts the area corresponding to the moving object in the target background image, and covers the area corresponding to the moving object in the target background image to the area corresponding to the moving object in the reference image, thereby obtaining the output image of the camera module.
  • In the above method, multiple frames of images shot by a camera module on the same scene are obtained, and a target detection model is used to perform target detection on the multiple frames of images to obtain the target object included in each frame; the target objects included in each frame of images are classified, and the moving objects and stationary objects included in each frame are determined; the moving objects in each frame of images are removed, the background image corresponding to each frame is obtained, and all background images are fused to generate the target background image; and the target area of the target background image is used to cover the target area of one frame of the multi-frame images to obtain the output image of the camera module.
  • the target object in the multi-frame images can be accurately recognized by the target detection model, which improves the accuracy of the target object recognition result.
  • By classifying the target objects included in each frame of images, it is determined whether each target object is a moving object or a stationary object, thereby preventing stationary objects and moving objects from being misidentified.
  • the moving objects in the multi-frame images are removed to generate a background image.
  • the target background image is generated, the ghost in the target background image is eliminated, and the clarity of the target background image is ensured.
  • the target area of the target background image is used to cover the target area of one frame of the multi-frame images, so that moving objects are removed from the final output image, there is no ghost in the output image, and the clarity of the output image is guaranteed, which improves image quality.
  • In one embodiment, the step in which the terminal uses the target area of the target background image to cover the target area of one frame of images in the multi-frame images to obtain the output image of the camera module may include the following steps:
  • Step 501 the terminal determines the number of pixels of the moving object in each frame of image.
  • the terminal may determine the moving objects in the multi-frame images according to the target tracking algorithm, and determine the number of pixels occupied by the moving objects in the entire image in each frame of images according to the recognized moving objects in the multi-frame images.
  • Step 502 the terminal determines a frame image with the least number of pixels as a reference image.
  • the terminal can sort the multi-frame images according to the number of pixels occupied by the moving objects in each frame of images, and select the frame in which the moving objects occupy the fewest pixels as the reference image.
  • Step 503 the terminal uses the target area of the target background image to cover the target area of the reference image, and obtains the output image of the camera module.
  • According to the position of the moving object in the reference image, the terminal may determine the target area in the reference image, that is, the area corresponding to the moving object, and determine the same position in the target background image as the target area of the target background image.
  • the terminal may extract the target area in the target background image, and overlay the target area in the extracted target background image on the target area in the reference image, thereby generating an output image of the camera module.
  • When covering the target area of the reference image with the target area extracted from the target background image, the terminal may use classical techniques such as Poisson fusion or multi-band fusion, so that the transition of the output image at the boundary of the target area is more natural.
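Full Poisson or multi-band fusion is beyond a short example. Purely as a deliberately simplified stand-in for the boundary-smoothing idea, the sketch below blurs the binary target mask into an alpha map and alpha-blends the target background image over the reference image, so the transition at the target-area boundary is gradual rather than a hard edge. All names and the box-blur feathering are assumptions, not the classical techniques named above.

```python
import numpy as np

def _box_blur(a, r):
    """Mean filter over a (2r+1)x(2r+1) box via a summed-area table."""
    k = 2 * r + 1
    p = np.pad(a, r, mode="edge")
    c = p.cumsum(0).cumsum(1)
    c = np.pad(c, ((1, 0), (1, 0)))      # prepend a zero row and column
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def feathered_cover(reference, background, target_mask, feather=2):
    """Cover the target area with a soft (feathered) edge: blur the binary
    target mask into an alpha map, then alpha-blend the background over
    the reference so the boundary transition is gradual.
    """
    alpha = _box_blur(target_mask.astype(np.float64), feather)
    alpha = np.clip(alpha, 0.0, 1.0)[..., None]    # per-pixel blend weight
    out = alpha * background + (1.0 - alpha) * reference
    return out.astype(reference.dtype)
```

Deep inside the target area the output is pure background, far outside it is the untouched reference, and near the boundary the two are mixed.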
  • the terminal identifies moving objects in multiple frames of images, and determines a frame of images in which moving objects occupy the least number of pixels as a reference image.
  • the terminal uses the target area of the target background image to cover the target area of the reference image, and obtains the output image of the camera module. Therefore, it can be ensured that the number of covered pixels in the reference image is minimal, and the output image as a whole is clearer, thereby improving the image quality of the output image.
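Steps 501 to 503 can be sketched as follows, assuming per-frame boolean masks of the moving-object pixels are available (a hypothetical representation; the application does not fix one).

```python
import numpy as np

def build_output_image(frames, moving_masks, background):
    """Steps 501-503: select as the reference image the frame whose moving
    objects occupy the fewest pixels, then cover its target area with the
    corresponding area of the target background image.

    frames:       list of HxWxC arrays
    moving_masks: per-frame HxW boolean arrays, True on moving-object pixels
    background:   the fused target background image (HxWxC)
    """
    counts = [int(m.sum()) for m in moving_masks]   # step 501: pixel counts
    ref_idx = int(np.argmin(counts))                # step 502: reference image
    output = frames[ref_idx].copy()                 # step 503: cover the
    target = moving_masks[ref_idx]                  # target area
    output[target] = background[target].astype(output.dtype)
    return output
```

Choosing the frame with the fewest moving-object pixels minimizes the covered area, so most of the output comes directly from a single sharp frame.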
  • In one embodiment, step 202, "classify the target objects included in each frame of images, and determine the moving objects and stationary objects included in each frame of images", may include the following steps:
  • Step 601 the terminal determines the position of the target object in each frame of image.
  • the terminal determines the target object according to the recognition result of the target detection model. For the same target object in multiple frames of images, the terminal determines the position of the same target object in each frame of images respectively.
  • Step 602 the terminal determines whether the target object is a moving object or a stationary object according to the position of the target object in each frame of images.
  • the terminal marks the position corresponding to the same target object in each frame of image.
  • the terminal compares whether the position of the target object changes from frame to frame, and determines from the comparison result whether the target object is a moving object or a stationary object.
  • the terminal recognizes the same target object C in each frame of images according to the recognition result of the target detection model.
  • the terminal marks the position corresponding to the target object C in each frame of image according to the recognition result.
  • the terminal may use a bounding box to frame the target object in each frame of images.
  • the terminal compares whether the position mark of the target object C in each frame of images has changed, and judges whether the target object is a moving object or a stationary object according to the comparison result.
  • the terminal determines the position of the target object in each frame of images, and determines whether the target object is a moving object or a stationary object according to that position. Therefore, it is possible to accurately determine whether the target object is a moving object or a stationary object, avoid errors in the output image caused by misdetection of moving objects, and ensure the quality of the output image after the moving objects are removed.
  • In one embodiment, the above step 602, "the terminal determines whether the target object is a moving object or a stationary object according to the position of the target object in each frame of images", may include the following steps:
  • Step 701 the terminal calculates the position deviation value of the target object in any two frames of the multi-frame images. If the maximum position deviation value is less than the position deviation threshold, go to step 702; if the target object's position deviation in any two frames of the multi-frame image is greater than or equal to the position deviation threshold, go to step 703.
  • Step 702 the terminal determines that the target object is a stationary object.
  • Step 703 the terminal determines that the target object is a moving object.
  • the terminal may determine the position of the same target object in each frame of images.
  • the terminal can compare the positions corresponding to the same target object in any two frames of images, and calculate the difference between the positions corresponding to the same target object in any two frames of images, so as to obtain the position deviation value of the target objects in any two frames of images.
  • the terminal can compare the position deviation value of the target object in any two frames of images, so as to determine the maximum position deviation value.
  • after determining the maximum position deviation value of the target object between any two frames of images, the terminal compares it with the position deviation threshold. If the maximum position deviation value is less than the threshold, the position deviation of the target object across the multiple frames of images is small, and the terminal determines that the target object is a stationary object. If the maximum position deviation value is greater than or equal to the threshold, the position deviation of the target object across the multiple frames of images is relatively large, and the terminal determines that the target object is a moving object.
  • the terminal calculates the position deviation of target object D between any two frames of images. Assuming there are 5 frames of images, the terminal calculates the position deviation between the position of target object D in the first frame of images and its position in the second frame, the deviation between its position in the first frame and its position in the third frame, and so on, calculating the position deviation of target object D between every pair of frames. The terminal compares the resulting position deviations and determines the largest one.
  • the terminal compares the maximum position deviation with the position deviation threshold; if the comparison result is that the maximum position deviation is less than the threshold, the terminal determines that the target object is a stationary object. If, for some pair of frames, the position deviation of target object D is a distance of 15 pixels while the position deviation threshold is a distance of 10 pixels, the comparison result is that the maximum position deviation is greater than the threshold, and the terminal determines that the target object is a moving object.
  • the terminal calculates the position deviation value of the target object in any two frames of images of the multi-frame images. If the maximum position deviation value is less than the position deviation threshold, the terminal determines that the target object is a stationary object; if the maximum position deviation value is greater than or equal to the position deviation threshold, the terminal determines that the target object is a moving object.
  • by comparing the maximum position deviation value of the target object between any two frames of the multiple frames of images with the position deviation threshold, the terminal can accurately and effectively determine whether the target object is a moving object or a stationary object, avoiding errors in the output image caused by moving-object detection errors and thereby ensuring the quality of the output image with the moving objects removed.
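The position-deviation test of steps 701-703 can be sketched as follows. This is a minimal illustration (the function and variable names are ours, not from the patent), assuming each frame yields a single (x, y) center for the tracked object and that deviation is measured as Euclidean pixel distance:

```python
import math

def classify_by_position(positions, deviation_threshold):
    """Classify a tracked object as moving or stationary from its per-frame positions.

    positions: list of (x, y) centers of the same target object, one per frame.
    The object is stationary only if every pairwise deviation stays below the threshold.
    """
    max_deviation = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            max_deviation = max(max_deviation, math.hypot(dx, dy))
    return "stationary" if max_deviation < deviation_threshold else "moving"
```

For example, with a position deviation threshold of 10 pixels, centers that drift by only a few pixels yield "stationary", while a 15-pixel jump between any two frames yields "moving".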
  • as an alternative to determining, according to the position of the target object in each frame of images, whether the target object is a moving object or a stationary object, the method may also include the following steps:
  • Step 901 the terminal determines the number of target pixels at the tracking position of each frame of image.
  • the target pixel is used to display the target object, and the tracking position is the position of the target object in any one frame of the multi-frame images.
  • the terminal can determine the position of the target object in any one frame of images as the tracking position, and determine the same position in the other frames as the tracking position according to the tracking position in the current frame, thereby ensuring that the tracking position is the same across the multiple frames of images; the tracking position may then reveal more or less of the target object.
  • the terminal may calculate the number of target pixels in the tracking position of each frame of image.
  • the target pixels are used to display the target object; that is, for each frame of images, the terminal can calculate the number of pixels at the tracking position in which the target object is displayed.
  • Step 902 the terminal determines that the target object is a moving object or a stationary object according to the number of target pixels in the tracking position of each frame of image.
  • the terminal may compare the number of target pixels at the tracking position of any two frames of images, and determine that the target object is a moving object or a stationary object according to the comparison result.
  • the terminal determines the number of target pixels at the tracking position for each frame of image, and determines whether the target object is a moving object or a stationary object according to the number of target pixels at the tracking position for each frame of image.
  • the terminal can accurately determine whether the target object is a moving object or a stationary object, avoiding the error of the output image due to the detection error of the moving object, thereby ensuring the quality of the output image after removing the moving object.
  • the step in which "the terminal determines whether the target object is a moving object or a stationary object according to the number of target pixels at the tracking position in each frame of images" may include the following steps:
  • Step 1001 the terminal calculates the difference in the number of target pixels at the tracking position of any two frames of images in the multi-frame images.
  • the terminal may calculate the difference in the number of target pixels at the tracking position for any two frames of images respectively.
  • for example, the number of target pixels at the tracking position is 108 in the first frame of images, 111 in the second frame, 110 in the third frame, 104 in the fourth frame, and 113 in the fifth frame.
  • the terminal calculates the difference in the number of target pixels at the tracking position of any two frames of images respectively.
  • Step 1002 if the largest difference in quantity is less than the threshold of the number of pixels, the terminal determines that the target object is a stationary object.
  • the terminal calculates the difference in the number of target pixels at the tracking position between any two frames of images, sorts the calculated differences, and selects the largest one. The terminal compares the maximum number difference with the pixel number threshold; if the maximum number difference is less than the pixel number threshold, the target object has not moved, and the terminal determines that the target object is a stationary object.
  • for example, the maximum number difference is 9 and the pixel number threshold is 15.
  • the terminal compares the maximum number difference with the pixel number threshold, determines that the maximum number difference is less than the pixel number threshold, and determines that the target object is a stationary object.
  • Step 1003 if the difference in the number of target pixels at the tracking position between any two frames of images is greater than or equal to the pixel number threshold, the terminal determines that the target object is a moving object.
  • each time the terminal calculates the difference in the number of target pixels at the tracking position between two frames of images, it can compare the newly calculated difference with the pixel number threshold. Once a difference is greater than or equal to the pixel number threshold, the terminal determines that the target object is a moving object and no longer calculates the difference in the number of target pixels at the tracking position for the remaining pairs of frames.
  • for example, the terminal determines that the difference in the number of target pixels at the tracking position between the first frame of images and the second frame of images is 20, and the pixel number threshold is 15. Since this difference is greater than the pixel number threshold, the terminal determines that the target object is a moving object and no longer calculates the differences for the remaining pairs of frames.
  • the terminal calculates the difference in the number of target pixels at the tracking position between any two frames of the multiple frames of images. If the largest number difference is less than the pixel number threshold, the terminal determines that the target object is a stationary object; if the difference in the number of target pixels at the tracking position between any two frames of images is greater than or equal to the pixel number threshold, the terminal determines that the target object is a moving object.
  • by comparing the difference in the number of target pixels at the tracking position between any two frames of the multiple frames of images with the pixel number threshold, the terminal can accurately and effectively determine whether the target object is a moving object or a stationary object, avoiding errors in the output image caused by moving-object detection errors and thereby ensuring the quality of the output image from which the moving objects are removed.
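Steps 1001-1003 can be sketched in the same way. This is a minimal illustration (the names are ours), assuming the per-frame counts of target pixels at the shared tracking position are already known, with the early exit described above once any pairwise difference reaches the threshold:

```python
def classify_by_pixel_count(counts, pixel_number_threshold):
    """Classify the tracked object from target-pixel counts at the tracking position.

    counts: number of target pixels at the tracking position, one value per frame.
    Returns "moving" as soon as any pairwise difference reaches the threshold, so
    the remaining pairs are skipped, as in step 1003; otherwise "stationary".
    """
    for i in range(len(counts)):
        for j in range(i + 1, len(counts)):
            if abs(counts[i] - counts[j]) >= pixel_number_threshold:
                return "moving"
    return "stationary"
```

With counts whose maximum pairwise difference is 9 and a pixel number threshold of 15, the object is classified as stationary, matching the example above; a single difference of 20 classifies it as moving.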
  • step 203 "remove moving objects in each frame of image and obtain a background image corresponding to each frame of image" may include the following steps:
  • Step 1101 the terminal marks the pixels corresponding to the moving objects in each frame of images as invalid pixels.
  • the terminal can use a target segmentation algorithm to perform target segmentation on the moving objects in each frame of images, thereby obtaining more accurate mask images corresponding to multiple frames of images.
  • the terminal may represent the mask image corresponding to each frame of image as a binary image.
  • in the mask image, the value at the pixel positions corresponding to the moving objects may be 0, and the value at the other pixel positions may be 1; a value of 1 indicates that the pixel is valid, and a value of 0 indicates that the pixel is invalid, so that the pixels corresponding to the moving objects in each frame of images are marked as invalid pixels.
  • Step 1102 the terminal generates a background image corresponding to each frame of image according to the remaining pixels in each frame of image except invalid pixels.
  • the terminal may generate the background image corresponding to each frame of images according to the pixels other than the invalid pixels.
  • the terminal marks the pixels corresponding to the moving objects in each frame of images as invalid pixels, and generates the background image corresponding to each frame of images from the remaining pixels in each frame of images other than the invalid pixels, so that the moving objects are eliminated from each frame of images and no moving objects remain in the background image.
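Steps 1101-1102 can be sketched with NumPy. This is a minimal single-channel illustration (the names are ours), assuming a boolean moving-object mask per frame, with the binary mask using 1 for valid pixels and 0 for invalid pixels as described above:

```python
import numpy as np

def mark_and_extract_background(frame, moving_mask):
    """Mark moving-object pixels as invalid and keep the rest as the background.

    frame: 2-D array of pixel values for one frame.
    moving_mask: boolean array, True where the moving object is.
    Returns the binary validity mask (1 = valid, 0 = invalid) and the frame with
    invalid pixels zeroed out; later fusion ignores the invalid positions.
    """
    valid = (~moving_mask).astype(np.uint8)
    background = frame * valid
    return valid, background
```

The zeroed pixels are placeholders only; during fusion the validity mask, not the zero value, decides which samples contribute at each position.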
  • FIG. 12 shows an optional operation flow of the image processing method.
  • Step 1201 The terminal acquires multiple frames of images shot by the camera module on the same scene, uses the target detection model to perform target detection on the multiple frames of images, obtains the target object included in each frame of the multiple frames of images, and executes step 1202 or step 1206.
  • Step 1202 the terminal determines the position of the target object in each frame of image.
  • Step 1203: the terminal calculates the position deviation value of the target object between any two frames of the multiple frames of images. If the maximum position deviation value is less than the position deviation threshold, step 1204 is executed; if the position deviation value of the target object between any two frames of the multiple frames of images is greater than or equal to the position deviation threshold, step 1205 is executed.
  • Step 1204 the terminal determines that the target object is a stationary object.
  • Step 1205 the terminal determines that the target object is a moving object, and executes step 1210.
  • Step 1206 the terminal determines the number of target pixels at the tracking position for each frame of image.
  • Step 1207 the terminal calculates the difference in the number of target pixels at the tracking position of any two frames of images in the multi-frame images. If the maximum number difference is less than the pixel number threshold, step 1208 is performed; if the target pixel number difference between any two frames of images at the tracking position is greater than or equal to the pixel number threshold, step 1209 is performed.
  • Step 1208 the terminal determines that the target object is a stationary object.
  • Step 1209 the terminal determines that the target object is a moving object, and executes step 1210.
  • Step 1210 the terminal marks the pixels corresponding to the moving objects in each frame of images as invalid pixels.
  • Step 1211 The terminal generates a background image corresponding to each frame of image according to the remaining pixels in each frame of image except invalid pixels.
  • Step 1212 The terminal performs fusion processing on all background images to generate a target background image.
  • Step 1213 The terminal determines the number of pixels of the moving object in each frame of image.
  • Step 1214: the terminal determines the frame of images with the fewest moving-object pixels as the reference image.
  • Step 1215 The terminal uses the target area of the target background image to cover the target area of the reference image to obtain the output image of the camera module.
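Steps 1213-1215 can be sketched as follows. This is a minimal illustration (the names are ours), assuming single-channel frames, a boolean moving-object mask per frame, and a target background image already produced by fusion:

```python
import numpy as np

def compose_output(frames, moving_masks, target_background):
    """Pick the frame with the fewest moving-object pixels as the reference image,
    then cover its target area (the moving-object region) with the target background.

    frames: list of 2-D arrays; moving_masks: matching list of boolean arrays;
    target_background: fused background image of the same shape.
    """
    counts = [int(mask.sum()) for mask in moving_masks]   # step 1213
    ref_index = counts.index(min(counts))                 # step 1214
    output = frames[ref_index].copy()
    target_area = moving_masks[ref_index]                 # step 1215
    output[target_area] = target_background[target_area]
    return output
```

Choosing the frame with the fewest moving-object pixels minimizes the area that must be replaced by fused background, which limits any blur introduced by the fusion step.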
  • although the steps in FIGS. 5-7 and 9-12 are shown in sequence according to the arrows, these steps are not necessarily executed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 5-7 and 9-12 may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • an image processing apparatus 1300 including: an acquisition module 1310, a determination module 1320, a removal module 1330, and an overlay module 1340, wherein:
  • the acquisition module 1310 is configured to acquire multiple frames of images shot by the camera module on the same scene, and to perform target detection on the multiple frames of images by using a target detection model to obtain a target object included in each frame of the multiple frames of images;
  • a determination module 1320 configured to classify and process the target objects included in each frame of images, and determine the moving objects and stationary objects included in each frame of images;
  • the removing module 1330 is used to remove the moving objects in each frame of image, obtain the background image corresponding to each frame of image, perform fusion processing on all the background images, and generate the target background image;
  • the covering module 1340 is configured to use the target area of the target background image to cover the target area of one frame of the multi-frame images to obtain the output image of the camera module; wherein, the target area is the area corresponding to the moving object.
  • the above-mentioned covering module 1340 is specifically used to determine the number of pixels of the moving objects in each frame of images; determine the frame of images with the fewest such pixels as a reference image; and cover the target area of the reference image with the target area of the target background image to obtain the output image of the camera module.
  • the above determination module 1320 includes: a first determination unit 1321 and a second determination unit 1322, wherein:
  • the first determining unit 1321 is configured to determine the position of the target object in each frame of image.
  • the second determining unit 1322 is configured to determine whether the target object is a moving object or a stationary object according to the position of the target object in each frame of image.
  • the above-mentioned second determining unit 1322 is specifically configured to calculate the position deviation value of the target object in any two frames of images of the multi-frame images. If the maximum position deviation value is less than the position deviation threshold, the target object is determined. It is a stationary object. If the position deviation value of the target object in any two frames of the multi-frame images is greater than or equal to the position deviation threshold, the target object is determined to be a moving object.
  • the above determination module 1320 further includes: a third determination unit 1323 and a fourth determination unit 1324, wherein:
  • the third determining unit 1323 is used to determine the number of target pixels in the tracking position of each frame of images, the target pixels are used to display the target object, and the tracking position is the position of the target object in any frame of the multi-frame images.
  • the fourth determining unit 1324 is configured to determine whether the target object is a moving object or a stationary object according to the number of target pixels in the tracking position of each frame of image.
  • the above-mentioned fourth determining unit 1324 is specifically configured to calculate the difference in the number of target pixels at the tracking position between any two frames of the multiple frames of images; if the largest number difference is less than the pixel number threshold, determine that the target object is a stationary object; if the difference in the number of target pixels at the tracking position between any two frames of images is greater than or equal to the pixel number threshold, determine that the target object is a moving object.
  • the above-mentioned removal module 1330 is specifically configured to mark the pixels corresponding to the moving objects in each frame of images as invalid pixels, and to generate the background image corresponding to each frame of images from the remaining pixels in each frame of images other than the invalid pixels.
  • Each module in the above-mentioned image processing apparatus may be implemented in whole or in part by software, hardware, and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, including a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the following steps: acquiring multiple frames of images shot by a camera module of the same scene, and performing target detection on the multiple frames of images by using a target detection model to obtain the target objects included in each frame of the multiple frames of images; classifying the target objects included in each frame of images and determining the moving objects and stationary objects included in each frame of images; removing the moving objects in each frame of images, obtaining the background image corresponding to each frame of images, and fusing all the background images to generate a target background image; and covering the target area of one frame of the multiple frames of images with the target area of the target background image to obtain the output image of the camera module, wherein the target area is the area corresponding to the moving objects.
  • the processor also implements the following steps when executing the computer program: determining the number of pixels of moving objects in each frame of image; determining a frame image with the least number of pixels as a reference image; using the target area of the target background image Cover the target area of the reference image to obtain the output image of the camera module.
  • the processor further implements the following steps when executing the computer program: determining the position of the target object in each frame of image; determining the target object as a moving object or stationary object.
  • the processor further implements the following steps when executing the computer program: calculating the position deviation value of the target object between any two frames of the multiple frames of images; if the maximum position deviation value is smaller than the position deviation threshold, determining that the target object is a stationary object; if the position deviation value of the target object between any two frames of the multiple frames of images is greater than or equal to the position deviation threshold, determining that the target object is a moving object.
  • the processor further implements the following steps when executing the computer program: determining the number of target pixels at the tracking position in each frame of images, the target pixels being used to display the target object and the tracking position being the position of the target object in any one frame of the multiple frames of images; and determining whether the target object is a moving object or a stationary object according to the number of target pixels at the tracking position in each frame of images.
  • the processor also implements the following steps when executing the computer program: calculating the difference in the number of target pixels at the tracking position of any two frames of images in the multi-frame images; if the largest difference in the number is less than the pixel number threshold, then determine The target object is a stationary object; if the difference in the number of target pixels at the tracking position between any two frames of images is greater than or equal to the pixel number threshold, the target object is determined to be a moving object.
  • the processor also implements the following steps when executing the computer program: marking the pixels corresponding to the moving objects in each frame of images as invalid pixels; and generating the background image corresponding to each frame of images from the remaining pixels in each frame of images other than the invalid pixels.
  • a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented: acquiring multiple frames of images shot by a camera module of the same scene, and performing target detection on the multiple frames of images by using a target detection model to obtain the target objects included in each frame of the multiple frames of images; classifying the target objects included in each frame of images and determining the moving objects and stationary objects included in each frame of images; removing the moving objects in each frame of images, obtaining the background image corresponding to each frame of images, and fusing all the background images to generate a target background image; and covering the target area of one frame of the multiple frames of images with the target area of the target background image to obtain the output image of the camera module, wherein the target area is the area corresponding to the moving objects.
  • the following steps are further implemented: determining the number of pixels of the moving objects in each frame of images; determining the frame of images with the fewest such pixels as a reference image; and covering the target area of the reference image with the target area of the target background image to obtain the output image of the camera module.
  • the following steps are also implemented: determining the position of the target object in each frame of images; and determining whether the target object is a moving object or a stationary object according to the position of the target object in each frame of images.
  • the following steps are further implemented: calculating the position deviation value of the target object between any two frames of the multiple frames of images; if the maximum position deviation value is smaller than the position deviation threshold, determining that the target object is a stationary object; if the position deviation value of the target object between any two frames of the multiple frames of images is greater than or equal to the position deviation threshold, determining that the target object is a moving object.
  • the following steps are further implemented: determining the number of target pixels at the tracking position in each frame of images, the target pixels being used to display the target object and the tracking position being the position of the target object in any one frame of the multiple frames of images; and determining whether the target object is a moving object or a stationary object according to the number of target pixels at the tracking position in each frame of images.
  • the following steps are further implemented: calculating the difference in the number of target pixels of any two frames of images in the multi-frame images at the tracking position; if the largest difference in the number is less than the pixel number threshold, then The target object is determined to be a stationary object; if the difference in the number of target pixels at the tracking position between any two frames of images is greater than or equal to the pixel number threshold, the target object is determined to be a moving object.
  • the following steps are further implemented: marking the pixels corresponding to the moving objects in each frame of images as invalid pixels; and generating the background image corresponding to each frame of images from the remaining pixels in each frame of images other than the invalid pixels.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).


Abstract

The present application relates to an image processing method and apparatus, a computer device, and a storage medium, applicable to the field of computer technology. The method includes: acquiring multiple frames of images shot by a camera module of the same scene, and performing target detection on the multiple frames of images by using a target detection model to obtain the target objects included in each frame of the multiple frames of images; classifying the target objects included in each frame of images to determine the moving objects and stationary objects included in each frame of images; removing the moving objects in each frame of images to obtain a background image corresponding to each frame of images, and fusing all the background images to generate a target background image; and covering the target area of one frame of the multiple frames of images with the target area of the target background image to obtain the output image of the camera module. The method can improve the image quality of the composite image after the moving objects are removed.

Description

Image Processing Method and Apparatus, Computer Device, and Storage Medium

Technical Field

The present application relates to the field of computer technology, and in particular to an image processing method and apparatus, a computer device, and a storage medium.
Background

With the continuous development of science and technology, more and more devices are used for taking photos; video cameras, still cameras, smartphones, tablet computers, and the like can all be used to take pictures. However, when photographing with these devices, other objects such as pedestrians, vehicles, or animals often enter the captured image, impairing its appearance.

To solve the above problem, conventional techniques usually rely on changes in the pixel values across different frames of images to identify moving objects and thereby remove the moving objects from the image.

Technical Problem

However, when a moving object does not move sufficiently, or stays too long in a single place, its pixel values do not change much across different frames of images, causing errors in recognition. That is, the prior art does not identify moving objects accurately enough, so ghosting appears in the composite image from which the moving objects are removed, and the image quality is poor.
Technical Solution

On this basis, in view of the above technical problem, it is necessary to provide an image processing method and apparatus, a computer device, and a storage medium that can improve the image quality of the composite image after the moving objects are removed.

In a first aspect, an image processing method is provided, the method including:

acquiring multiple frames of images shot by a camera module of the same scene, and performing target detection on the multiple frames of images by using a target detection model to obtain the target objects included in each frame of the multiple frames of images; classifying the target objects included in each frame of images to determine the moving objects and stationary objects included in each frame of images; removing the moving objects in each frame of images to obtain a background image corresponding to each frame of images, and fusing all the background images to generate a target background image; and covering the target area of one frame of the multiple frames of images with the target area of the target background image to obtain the output image of the camera module, wherein the target area is the area corresponding to the moving objects.

In one embodiment, covering the target area of one frame of the multiple frames of images with the target area of the target background image to obtain the output image of the camera module includes: determining the number of pixels of the moving objects in each frame of images; determining the frame of images with the fewest such pixels as a reference image; and covering the target area of the reference image with the target area of the target background image to obtain the output image of the camera module.

In one embodiment, classifying the target objects included in each frame of images to determine the moving objects and stationary objects included in each frame of images includes: determining the position of the target object in each frame of images; and determining whether the target object is a moving object or a stationary object according to the position of the target object in each frame of images.

In one embodiment, determining whether the target object is a moving object or a stationary object according to the position of the target object in each frame of images includes: calculating the position deviation value of the target object between any two frames of the multiple frames of images; if the maximum position deviation value is less than a position deviation threshold, determining that the target object is a stationary object; and if the position deviation value of the target object between any two frames of the multiple frames of images is greater than or equal to the position deviation threshold, determining that the target object is a moving object.

In one embodiment, classifying the target objects included in each frame of images to determine the moving objects and stationary objects included in each frame of images includes: determining the number of target pixels at a tracking position in each frame of images, the target pixels being used to display the target object and the tracking position being the position of the target object in any one frame of the multiple frames of images; and determining whether the target object is a moving object or a stationary object according to the number of target pixels at the tracking position in each frame of images.

In one embodiment, determining whether the target object is a moving object or a stationary object according to the number of target pixels at the tracking position in each frame of images includes: calculating the difference in the number of target pixels at the tracking position between any two frames of the multiple frames of images; if the maximum number difference is less than a pixel number threshold, determining that the target object is a stationary object; and if the difference in the number of target pixels at the tracking position between any two frames of images is greater than or equal to the pixel number threshold, determining that the target object is a moving object.

In one embodiment, removing the moving objects in each frame of images to obtain the background image corresponding to each frame of images includes: marking the pixels corresponding to the moving objects in each frame of images as invalid pixels; and generating the background image corresponding to each frame of images from the remaining pixels in each frame of images other than the invalid pixels.
In a second aspect, an image processing apparatus is provided, the apparatus including:

an acquisition module, configured to acquire multiple frames of images shot by a camera module of the same scene, and to perform target detection on the multiple frames of images by using a target detection model to obtain the target objects included in each frame of the multiple frames of images;

a determination module, configured to classify the target objects included in each frame of images and determine the moving objects and stationary objects included in each frame of images;

a removal module, configured to remove the moving objects in each frame of images, obtain the background image corresponding to each frame of images, and fuse all the background images to generate a target background image; and

a covering module, configured to cover the target area of one frame of the multiple frames of images with the target area of the target background image to obtain the output image of the camera module, wherein the target area is the area corresponding to the moving objects.

In a third aspect, a computer device is provided, including a memory and a processor, the memory storing a computer program, and the processor implementing the method of any embodiment of the first aspect above when executing the computer program.

In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, the computer program implementing the method of any embodiment of the first aspect above when executed by a processor.

Technical Effect

With the above image processing method and apparatus, computer device, and storage medium, multiple frames of images shot by a camera module of the same scene are acquired; target detection is performed on the multiple frames of images by using a target detection model to obtain the target objects included in each frame of the multiple frames of images; the target objects included in each frame of images are classified to determine the moving objects and stationary objects included in each frame of images; the moving objects in each frame of images are removed to obtain the background image corresponding to each frame of images, and all the background images are fused to generate a target background image; and the target area of one frame of the multiple frames of images is covered with the target area of the target background image to obtain the output image of the camera module. In the above method, the target detection model can accurately identify the foreground objects (for example, the aforementioned target objects) in the multiple frames of images, improving the accuracy of the target object recognition results. Classifying the target objects included in each frame of images determines whether each target object is a moving object or a stationary object, preventing stationary and moving objects from being misidentified. On the premise of ensuring the accuracy of the detected moving objects, the moving objects in the multiple frames of images are removed to generate the background images. Fusing the multiple background images to generate the target background image eliminates ghosting in the target background image and ensures its sharpness. Finally, covering the target area of one frame of the multiple frames of images with the target area of the target background image yields a final output image in which the moving objects are removed, no ghosting is present, and the overall sharpness of the image is preserved, improving the image quality.
Brief Description of the Drawings

FIG. 1 is a diagram of an application environment of an image processing method in one embodiment;

FIG. 2 is a schematic flowchart of an image processing method in one embodiment;

FIG. 3 is a schematic diagram of determining a target position in multiple frames of images in an image processing method in one embodiment;

FIG. 4 is a schematic diagram of covering the target area of one frame of multiple frames of images with the target area of a target background image in an image processing method in one embodiment;

FIG. 5 is a schematic flowchart of image processing steps in one embodiment;

FIG. 6 is a schematic flowchart of an image processing method in another embodiment;

FIG. 7 is a schematic flowchart of an image processing method in another embodiment;

FIG. 8 is a schematic diagram of determining a target object in multiple frames of images in an image processing method in one embodiment;

FIG. 9 is a schematic flowchart of an image processing method in another embodiment;

FIG. 10 is a schematic flowchart of an image processing method in another embodiment;

FIG. 11 is a schematic flowchart of an image processing method in another embodiment;

FIG. 12 is a schematic flowchart of an image processing method in another embodiment;

FIG. 13 is a structural block diagram of an image processing apparatus in one embodiment;

FIG. 14 is a structural block diagram of an image processing apparatus in one embodiment;

FIG. 15 is a structural block diagram of an image processing apparatus in one embodiment.
Embodiments of the Invention

To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it.

The image processing method provided by the present application can be applied to the computer device shown in FIG. 1. The computer device may be a terminal, and its internal structure may be as shown in FIG. 1. The computer device includes a processor, a memory, a communication interface, a display screen, and an input apparatus connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be implemented through WIFI, a carrier network, NFC (near-field communication), or other technologies. The computer program, when executed by the processor, implements an image processing method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input apparatus of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.

Those skilled in the art can understand that the structure shown in FIG. 1 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
在本申请一个实施例中,如图2所示,提供了一种图像处理方法,以该方法应用于图1中的终端为例进行说明,包括以下步骤:
步骤201,终端获取摄像模组对同一场景拍摄的多帧图像,利用目标检测模型对多帧图像进行目标检测,获得多帧图像中每一帧图像包括的目标物体。
具体地,用户可以将摄像模组所在设备放置在固定的位置,保持设备静止不动,使得摄像模组对同一场景拍摄多帧图像。其中,摄像模组对同一场景拍摄的多帧图像中的静止物体的相对位置不发生变化(例如,静止物体可以是建筑物、正在被拍照的人或者树木等),运动物体的相对位置可以发生变化(例如,运动物体可以是突然闯入当前正在拍摄场景的人、动物或者车辆等)。应当理解,这里的同一场景,主要是针对静止物体而言的同一拍摄场景,即静止物体是最终想要得到的图像中的目标物体,而运动物体是误入这一拍摄场景中,是用户不想要的。上述通过固定摄像模组所在拍摄设备的方式可以得到同一场景的多帧图像,但拍摄得到同一场景多帧图像的方法并不仅限于此,本实施例对此不做具体限定。
可选的,终端或者拍摄设备在接收到用户输入的拍照指令后,可以控制摄像模组拍摄多帧连续图像。可选的,用户输入的拍照指令可以是用户按下快门按键,也可以是用户输入语音拍照口令,还可以是终端或者拍摄设备检测到用户的拍照手势,本申请实施例中对用户输入的拍照指令不做具体限定。
摄像模组对同一场景拍摄多帧图像以后,可以将多帧图像存储至存储设备中,终端可以从存储设备中获取到摄像模组对同一场景拍摄的多帧图像。终端可以将多帧图像输入至目标检测模型,利用目标检测模型对多帧图像中的特征进行提取,从而确定每一帧图像中的目标物体。其中,目标检测模型可以是基于手工特征的模型,例如DPM(Deformable Parts Model,可变形零件模型),目标检测模型也可以是基于卷积神经网络的模型,例如YOLO(You Only Look Once,你只看一次)检测器、R-CNN,(Region-based Convolutional Neural Networks,基于区域的卷积神经网络)模型、SSD(Single Shot MultiBox,单发多框)检测器以及Mask R-CNN(Mask Region-based Convolutional Neural Networks,带掩码的基于区域的卷积神经网络)模型等。本申请实施例对于目标检测模型不做具体限定。
步骤202,终端对每一帧图像包括的目标物体进行分类处理,确定每一帧图像包括的运动物体和静止物体。
可选的,终端可以利用目标追踪算法对多帧图像中包括的同一个目标物体进行追踪,确定同一个目标物体在不同帧图像中的位置,判断同一个目标物体为运动物体还是静止物体,从而对每一帧图像中的运动物体和静止物体进行分类。
示例性的,在终端获取到的多帧图像中均包括目标物体A后,终端利用目标追踪算法分别识别出目标物体A在多帧图像中的位置,根据目标物体A在多帧图像中的位置,判断目标物体A为运动物体还是静止物体。
可选的,终端还可以利用目标追踪算法对多帧图像中的同一位置进行追踪,确定多帧图像中在这同一位置检测到目标物体的像素的数量,根据目标物体在多帧图像的同一位置显示的像素的数量判断目标物体为运动物体还是静止物体。
示例性的,如图3所示,终端根据目标追踪算法检测到目标物体B在第一帧图像中的位置,将目标物体B在第一帧图像中的位置确定为目标位置。终端根据目标位置在第一帧图像中的位置,将其他多帧图像中的相同位置确定为目标位置。终端追踪目标物体B在多帧图像的同一目标位置所占据的像素的数量,并根据目标物体B在多帧图像目标位置的像素的数量确定目标物体B为运动物体或静止物体。
步骤203,终端去除每一帧图像中的运动物体,获得每一帧图像对应的背景图像,对所有背景图像进行融合处理,生成目标背景图像。
具体地,终端在确定各个目标物体为静止物体或者运动物体之后,将每一帧图像中的运动物体所在目标矩形框内的像素标记为无效像素,获取到每一帧图像对应的背景图像,对所有背景图像进行融合处理,生成目标背景图像。
可选的,将每一帧图像中的运动物体所在目标矩形框内的像素标记为无效像素,获取到每一帧图像对应的背景图像之后,终端可以采用像素级图像融合方法对多帧背景图像进行融合处理,从而生成目标背景图像,其中,像素级图像融合方法可以是基于非多尺度变换的图像融合方法(例如:平均与加权平均法、逻辑滤波器法、数学形态法、图像代数法等)或者是基于多尺度变换的图像融合方法(例如:金字塔图像融合法、小波变换图像融合法、基于神经网络的图像融合法等)。在本申请实施例中,不对多帧背景图像的融合方法进行限定,采用像素级图像融合方法,可以保留更多的图像信息。
可选的,终端还可以利用背景建模的方法,对每一帧图像对应的背景图像进行融合处理。其中,背景建模的方法可以使用非递归背景建模方法,也可以使用递归背景建模方法,其中,非递归背景建模方法可以包括中值、均值模型,线性预测模型,非参数核密度估计等,递归背景建模方法可以包括近似中值滤波方法,单高斯模型方法,混合高斯模型方法等。
示例性的,本申请实施例以非递归背景建模方法中的中值模型建模方法为例进行详细介绍。假设有n帧图像,用I = {I_1, I_2, …, I_n}表示图像集合,其中I_k表示第k帧图像;用M = {M_1, M_2, …, M_n}表示对图像集合中的每一帧图像中的各个像素进行标注后得到的掩码图集合,M_k表示I_k对应的掩码图。其中,掩码图集合中的每一帧掩码图中,运动物体对应的像素为无效像素,可以将无效像素标注为0;除运动物体以外的像素为有效像素,可以将各有效像素标注为1,从而生成对应的掩码图。可选的,M_k中每一像素点的像素值的取值范围可以为{0,1},其中,0代表无效像素,1代表有效像素。用p=(x,y)表示图像中各个像素点的坐标位置,例如p=(1,2)可以代表图像中第一行第二列的像素点的坐标位置。I_k(p)和M_k(p)分别表示I_k和M_k在坐标位置p对应像素点的像素值。用B和B(p)分别表示合成的目标背景图像及其在坐标位置p对应像素点的像素值,则相应的计算公式为:

B(p) = Median({I_k(p) | M_k(p) = 1, 1 ≤ k ≤ n})    (1)

公式(1)中Median(·)表示对集合中的元素取中值操作。

按公式(1)计算出目标背景图像在各个坐标位置p对应像素点的像素值B(p),再按各像素点的坐标位置进行组合,即可得到目标背景图像B。
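公式(1)的中值背景合成可以用如下NumPy代码示意(属于示例性写法,并非本申请限定的实现;其中假设每个像素位置至少在一帧掩码中有效):

```python
import numpy as np


def fuse_background(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """按公式(1)合成目标背景图像 B。

    frames: 形状为 (n, H, W) 的图像集合 I_k;
    masks:  形状为 (n, H, W) 的掩码图集合 M_k,取值为 {0, 1};
    对每个像素位置 p,取所有有效帧(M_k(p)=1)像素值的中值。
    """
    # 无效像素置为 NaN,随后用 nanmedian 只对有效像素取中值
    data = np.where(masks.astype(bool), frames.astype(float), np.nan)
    return np.nanmedian(data, axis=0)
```

对彩色图像,可对各通道分别执行同样的操作。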
步骤204,终端利用目标背景图像的目标区域覆盖多帧图像中一帧图像的目标区域,获得摄像模组的输出图像。
其中,目标区域为运动物体对应的区域。
可选的,由于摄像模组在获取同一场景的多帧图像时,会存在人为误差或者设备误差,导致多帧图像中各个静止物体或者运动物体的位置存在些许偏差,从而使得经过融合处理后生成的目标背景图像中运动物体对应的边缘变得模糊。为了提高输出图像的清晰度,终端可以根据清晰度识别模型识别多帧图像的清晰度,从多帧图像中选择清晰度最高的一帧图像作为参考图像。终端识别参考图像中的运动物体,并确定运动物体对应的区域。终端根据参考图像中运动物体对应的区域,确定目标背景图像中运动物体对应的区域。终端提取目标背景图像中运动物体对应的区域,并将其覆盖至参考图像中运动物体对应的区域,从而获得所述摄像模组的输出图像。
示例性的,如图4所示,图4中的图A为多帧图像中任选一帧的图像,图B为目标背景图像,图C为摄像模组的输出图像。终端识别图A中的运动物体(1)和(2)对应的区域,并根据图A中的运动物体(1)和(2)对应的区域确定目标背景图像中运动物体(1)和(2)对应的区域。终端将目标背景图像中运动物体(1)和(2)对应的区域提取并复制到图A中运动物体(1)和(2)对应的区域,从而生成图C,即摄像模组的输出图像。
可选的,终端还可以识别多帧图像中的运动物体,并计算多帧图像中每帧图像包括的运动物体的数量,将包括的运动物体数量最少的一帧图像作为参考图像。终端根据参考图像中运动物体对应的区域,确定目标背景图像中运动物体对应的区域。终端提取目标背景图像中运动物体对应的区域,并将其覆盖至参考图像中运动物体对应的区域,从而获得所述摄像模组的输出图像。
上述图像处理方法中,获取摄像模组对同一场景拍摄的多帧图像,利用目标检测模型对多帧图像进行目标检测,获得多帧图像中每一帧图像包括的目标物体;对每一帧图像包括的目标物体进行分类处理,确定每一帧图像包括的运动物体和静止物体;去除每一帧图像中的运动物体,获得每一帧图像对应的背景图像,对所有背景图像进行融合处理,生成目标背景图像;利用目标背景图像的目标区域覆盖所述多帧图像中一帧图像的目标区域,获得所述摄像模组的输出图像。上述方法,通过目标检测模型可以准确识别出多帧图像中的目标物体,提高了目标物体识别结果的准确性。通过对每一帧图像包括的目标物体进行分类处理,确定目标物体为运动物体还是静止物体,从而防止静止物体和运动物体识别错误。在保证查找到的运动物体的准确性的前提下,将多帧图像中的运动物体去除,生成背景图像。通过对多帧背景图像进行融合处理,生成目标背景图像,消除了目标背景图像中的鬼影,保证了目标背景图像的清晰度。最后,利用目标背景图像的目标区域覆盖多帧图像中一帧图像的目标区域,使得最终生成的输出图像中去除了运动物体,且输出图像中不存在鬼影,且保证了输出图像的清晰度,提高了图像质量。
在本申请一种可选的实现方式中,如图5所示,上述步骤204中的“终端利用目标背景图像的目标区域覆盖多帧图像中一帧图像的目标区域,获得摄像模组的输出图像”,可以包括以下步骤:
步骤501,终端确定每一帧图像中运动物体的像素数量。
具体地,终端可以根据目标追踪算法确定多帧图像中的运动物体,并根据识别到的多帧图像中的运动物体,确定每一帧图像中运动物体在整个图像中占据的像素的数量。
步骤502,终端确定像素数量最少的一帧图像为参考图像。
具体地,终端在识别完多帧图像中的运动物体,并确定运动物体在每帧图像中占据的像素的数量以后,可以根据运动物体在每帧图像中占据的像素的数量对多帧图像进行排序,从中选择运动物体占据像素数量最少的一帧图像作为参考图像。
步骤503,终端利用目标背景图像的目标区域覆盖参考图像的目标区域,获得摄像模组的输出图像。
具体地,终端可以根据运动物体在参考图像中的位置确定参考图像中的目标区域,即运动物体对应的区域,并根据参考图像中的目标区域,将目标背景图像中对应的相同位置也确定为目标区域。终端可以将目标背景图像中的目标区域进行提取,将提取出的目标背景图像中的目标区域覆盖在参考图像中的目标区域,从而生成摄像模组的输出图像。
可选的,终端在将提取出的目标背景图像中的目标区域覆盖在参考图像中的目标区域时,可以采用泊松融合、多波段融合等经典技术,从而使得输出图像在目标区域的边界上过渡更加自然。
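步骤501至步骤503可以用如下NumPy代码概括(属于示例性写法:掩码中1表示运动物体像素,覆盖时直接逐像素替换;实际实现中可替换为泊松融合、多波段融合等方式,使目标区域边界过渡更自然):

```python
import numpy as np


def compose_output(frames, masks, background):
    """步骤501~503的示意实现。

    frames:     形状为 (n, H, W) 的多帧图像;
    masks:      形状为 (n, H, W) 的运动物体掩码,1 表示运动物体像素;
    background: 形状为 (H, W) 的目标背景图像。
    返回 (输出图像, 参考图像索引)。
    """
    counts = [int(m.sum()) for m in masks]      # 每帧运动物体的像素数量
    ref = int(np.argmin(counts))                # 像素数量最少的一帧为参考图像
    out = frames[ref].copy()
    region = masks[ref].astype(bool)            # 参考图像中的目标区域
    out[region] = background[region]            # 用目标背景图像的目标区域覆盖
    return out, ref
```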
在本申请实施例中,终端识别多帧图像中的运动物体,并确定多帧图像中运动物体占像素数量最少的一帧图像为参考图像。终端利用目标背景图像的目标区域覆盖参考图像的目标区域,获得摄像模组的输出图像。从而可以保证参考图像中被覆盖的像素的数量最小,且输出图像整体更加清晰,提高了输出图像的图像质量。
在本申请一种可选的实现方式中,如图6所示,上述步骤202“对每一帧图像包括的目标物体进行分类处理,确定每一帧图像包括的运动物体和静止物体”,可以包括以下步骤:
步骤601,终端确定目标物体在每一帧图像中的位置。
具体地,终端根据目标检测模型识别结果,确定目标物体。针对多帧图像中的同一个目标物体,终端分别确定同一个目标物体在每一帧图像中的位置。
步骤602,终端根据目标物体在每一帧图像中的位置,确定目标物体为运动物体或者静止物体。
具体地,终端在每一帧图像中对同一个目标物体对应的位置进行标注。终端对比目标物体在每一帧图像中的位置是否发生变化,并根据对比结果判断目标物体为运动物体还是静止物体。
示例性的,终端根据目标检测模型识别结果,在每一帧图像中均识别出相同的目标物体C。终端根据识别结果,分别对每一帧图像中的目标物体C对应的位置进行标注,可选的,终端可以在每一帧图像中利用框图框出目标物体。终端对比每帧图像中针对目标物体C的位置标记是否发生变化,并根据对比结果判断目标物体为运动物体还是静止物体。
本申请实施例中,通过终端确定目标物体在每一帧图像中的位置,根据目标物体在每一 帧图像中的位置,确定目标物体为运动物体或者静止物体。从而能够准确地确定目标物体为运动物体还是静止物体,避免了因为运动物体检测错误,造成输出图像的错误,从而保证了去除运动物体的输出图像的质量。
在本申请一种可选的实现方式中,如图7所示,上述步骤602“终端根据目标物体在每一帧图像中的位置,确定目标物体为运动物体或者静止物体”,可以包括以下步骤:
步骤701,终端计算目标物体在多帧图像的任意两帧图像中位置偏差值。若最大的位置偏差值小于位置偏差阈值,则执行步骤702;若目标物体在多帧图像的任意两帧图像中位置偏差值大于或等于位置偏差阈值,则执行步骤703。
步骤702,终端确定目标物体为静止物体。
步骤703,终端确定目标物体为运动物体。
具体地,终端在对每帧图像中的同一目标物体对应的像素位置进行标注之后,可以确定同一目标物体在每一帧图像中的位置。终端可以比较任意两帧图像中同一目标物体对应的位置,并对任意两帧图像中同一目标物体对应的位置进行求差计算,得到任意两帧图像中目标物体的位置偏差值。终端可以对任意两帧图像中目标物体的位置偏差值进行对比,从而确定最大的位置偏差值。
终端在确定了任意两帧图像中目标物体的最大的位置偏差值后,将最大的位置偏差值与位置偏差阈值进行对比,若最大的位置偏差值小于位置偏差阈值,则说明目标物体在多帧图像中的位置偏差较小,则终端确定目标物体为静止物体。若最大的位置偏差值大于或者等于位置偏差阈值,则说明目标物体在多帧图像中的位置偏差较大,则终端确定目标物体为运动物体。
示例性的,如图8所示,在每帧图像中针对目标物体D进行位置标注之后,终端计算任意两帧图像中目标物体D对应的位置偏差。假设有5帧图像,则终端分别计算第一帧图像中目标物体D对应的位置与第二帧图像中目标物体D对应的位置之间的位置偏差,以及第一帧图像中目标物体D对应的位置与第三帧图像中目标物体D对应的位置之间的位置偏差,依次类推,分别计算任意两帧图像中目标物体D对应的位置偏差。终端对得到的多个位置偏差进行对比,从中确定出最大的位置偏差。若最大的位置偏差为5个像素距离,而位置偏差阈值为10个像素距离,终端对比最大的位置偏差与位置偏差阈值,对比结果为最大的位置偏差小于位置偏差阈值,终端确定目标物体为静止物体。若存在任意两帧图像中目标物体D的位置偏差为15个像素距离,而位置偏差阈值为10个像素距离,终端对比该位置偏差与位置偏差阈值,对比结果为该位置偏差大于位置偏差阈值,终端确定目标物体为运动物体。
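步骤701至步骤703的判定逻辑可以用如下代码示意(属于示例性写法:位置偏差此处用欧氏距离度量,本申请并未限定具体的距离度量方式):

```python
from itertools import combinations


def classify_by_position(positions, threshold):
    """根据目标物体在各帧中的位置判定其为运动物体或静止物体。

    positions: 目标物体在各帧图像中的位置 (x, y) 列表;
    threshold: 位置偏差阈值。
    任意两帧位置偏差的最大值小于阈值则判为静止物体,否则判为运动物体。
    """
    max_dev = 0.0
    for (x1, y1), (x2, y2) in combinations(positions, 2):
        dev = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5  # 欧氏距离(示意)
        max_dev = max(max_dev, dev)
    return "static" if max_dev < threshold else "moving"
```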
在本申请实施例中,终端计算目标物体在多帧图像的任意两帧图像中位置偏差值。若最大的位置偏差值小于位置偏差阈值,则终端确定目标物体为静止物体;若最大的位置偏差值大于或等于位置偏差阈值,则终端确定目标物体为运动物体。上述方法,终端通过对比目标物体在多帧图像的任意两帧图像中最大的位置偏差值与位置偏差阈值的关系,从而可以准确有效地确定目标物体是运动物体还是静止物体,避免了因为运动物体检测错误,造成输出图像的错误,从而保证了去除运动物体的输出图像的质量。
在本申请一种可选的实现方式中,如图9所示,上述步骤202“对每一帧图像包括的目标物体进行分类处理,确定每一帧图像包括的运动物体和静止物体”,还可以包括以下步骤:
步骤901,终端确定每一帧图像在追踪位置的目标像素的数量。
其中,目标像素用于显示目标物体,追踪位置为多帧图像中任意一帧图像中目标物体的位置。
具体地,终端可以将目标物体在任意一帧图像中的位置确定为追踪位置,并根据当前帧中的追踪位置,将其他帧中对应的相同位置也确定为追踪位置,从而保证多帧图像中的追踪位置相同,追踪位置可以或多或少地展示目标物体。
终端在确定了每一帧图像中的追踪位置之后,可以计算每一帧图像在追踪位置的目标像素的数量。其中,目标像素用于显示目标物体。也就是说,终端可以计算每一帧图像在追踪位置中显示目标物体的像素的数量。
步骤902,终端根据每一帧图像在追踪位置的目标像素的数量,确定目标物体为运动物体或者静止物体。
具体地,终端可以对比任意两帧图像在追踪位置的目标像素的数量,并根据对比的结果,确定目标物体为运动物体或者静止物体。
在本申请实施例中,终端确定每一帧图像在追踪位置的目标像素的数量,并根据每一帧图像在追踪位置的目标像素的数量,确定目标物体为运动物体或者静止物体。使用上述方法,终端能够准确地确定目标物体为运动物体还是静止物体,避免了因为运动物体检测错误,造成输出图像的错误,从而保证了去除运动物体的输出图像的质量。
在本申请一种可选的实现方式中,如图10所示,上述步骤902“终端根据每一帧图像在追踪位置的目标像素的数量,确定目标物体为运动物体或者静止物体”,可以包括以下步骤:
步骤1001,终端计算多帧图像中任意两帧图像在追踪位置的目标像素的数量差。
具体地,在确定每一帧图像在追踪位置的目标像素的数量之后,终端可以分别计算任意两帧图像在追踪位置的目标像素的数量差。
示例性的,假设有5帧图像,第一帧图像在追踪位置的目标像素的数量为108个;第二帧图像在追踪位置的目标像素的数量为111个;第三帧图像在追踪位置的目标像素的数量为100个;第四帧图像在追踪位置的目标像素的数量为104个;第五帧图像在追踪位置的目标像素的数量为113个。终端分别计算任意两帧图像在追踪位置的目标像素的数量差。
步骤1002,若最大的数量差小于像素数量阈值,则终端确定目标物体为静止物体。
具体地,终端在分别计算出任意两帧图像在追踪位置的目标像素的数量差之后,对计算得到的多个目标像素的数量差进行排序,并从中选择出最大的数量差。终端将最大的数量差与像素数量阈值进行对比,若最大的数量差小于像素数量阈值,说明目标物体没有移动,则终端确定目标物体为静止物体。
示例性的,最大的数量差为9,而像素数量阈值为15,终端对比最大的数量差和像素数量阈值之间的关系,确定最大的数量差小于像素数量阈值,则终端确定目标物体为静止物体。
步骤1003,若任意两帧图像在追踪位置的目标像素的数量差大于或者等于像素数量阈值,则终端确定目标物体为运动物体。
具体地,终端每次计算完任意两帧图像在追踪位置的目标像素的数量差,均可以将最后一次计算得到的目标像素的数量差与像素数量阈值进行对比,在第一次发现目标像素的数量差大于或者等于像素数量阈值后,终端确定目标物体为运动物体,并且终端将不再计算剩余的任意两帧图像在追踪位置的目标像素的数量差。
示例性的,终端在计算完第一帧图像与第二帧图像在追踪位置的目标像素的数量差之后,确定第一帧图像与第二帧图像在追踪位置的目标像素的数量差为20,而像素数量阈值为15,第一帧图像与第二帧图像在追踪位置的目标像素的数量差大于像素数量阈值,终端确定目标物体为运动物体,并且终端将不再计算剩余的任意两帧图像在追踪位置的目标像素的数量差。
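步骤1001至步骤1003(含一旦数量差达到阈值即判为运动物体并停止后续计算的提前退出逻辑)可以用如下代码示意(属于示例性写法,并非本申请限定的实现):

```python
from itertools import combinations


def classify_by_pixel_count(counts, threshold):
    """根据各帧在追踪位置的目标像素数量判定目标物体类别。

    counts:    每一帧图像在追踪位置显示目标物体的像素的数量;
    threshold: 像素数量阈值。
    某两帧的数量差一旦大于或等于阈值,立即判为运动物体并停止计算;
    否则所有数量差均小于阈值,判为静止物体。
    """
    for a, b in combinations(counts, 2):
        if abs(a - b) >= threshold:
            return "moving"   # 提前退出,不再计算剩余帧对
    return "static"
```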
在本申请实施例中,终端计算多帧图像中任意两帧图像在追踪位置的目标像素的数量差,若最大的数量差小于像素数量阈值,则终端确定目标物体为静止物体;若任意两帧图像在追踪位置的目标像素的数量差大于或者等于像素数量阈值,则终端确定目标物体为运动物体。上述方法,终端通过对比多帧图像中任意两帧图像在追踪位置的目标像素的数量差与像素数量阈值之间的大小,从而可以准确有效地确定目标物体是运动物体还是静止物体,避免了因为运动物体检测错误,造成输出图像的错误,从而保证了去除运动物体的输出图像的质量。
在本申请一种可选的实现方式中,如图11所示,上述步骤203中的“去除每一帧图像中的运动物体,获得每一帧图像对应的背景图像”可以包括以下步骤:
步骤1101,终端将每一帧图像中的运动物体对应的像素标记为无效像素。
具体地,在确定各个目标物体为静止物体或者运动物体之后,终端可以使用目标分割算法对每一帧图像中的运动物体进行目标分割,从而得到多帧图像对应的更精确的掩码图像。终端可以将每一帧图像对应的掩码图像表示为二值图像。可选的,运动物体对应像素位置的像素值可以为0,其他像素位置的像素值可以为1。其中,像素值为1表示像素有效,像素值为0表示像素无效,从而实现将每一帧图像中的运动物体对应的像素标记为无效像素。
步骤1102,终端根据每一帧图像中除无效像素外的其余像素生成每一帧图像对应的背景图像。
具体地,终端将每一帧图像中的运动物体标记为无效像素之后,可以根据无效像素以外的其余像素,生成每一帧图像对应的背景图像。
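步骤1101中由运动物体区域生成掩码图的过程可以用如下代码示意(属于示例性写法:此处以目标矩形框标记无效像素,坐标约定x为列、y为行,均为说明性假设;若使用目标分割算法,则可得到更精确的逐像素掩码):

```python
import numpy as np


def build_mask(shape, moving_boxes):
    """生成一帧图像的掩码图:运动物体框内像素标为0(无效),其余标为1(有效)。

    shape:        图像尺寸 (H, W);
    moving_boxes: 运动物体的目标矩形框列表,每项为 (x, y, w, h)。
    """
    mask = np.ones(shape, dtype=np.uint8)
    for x, y, w, h in moving_boxes:
        mask[y:y + h, x:x + w] = 0   # 框内像素标记为无效像素
    return mask
```

生成的掩码图即可作为前文公式(1)中的 M_k 参与背景融合。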
在本申请实施例中,终端将每一帧图像中的运动物体对应的像素标记为无效像素,并根据每一帧图像中除无效像素外的其余像素生成每一帧图像对应的背景图像,从而可以确定每一帧图像对应的背景图像,使得背景图像中没有运动物体。
为了更好的说明本申请实施例中介绍的图像处理方法,如图12所示,其示出了图像处理方法的一种可选的操作流程。
步骤1201,终端获取摄像模组对同一场景拍摄的多帧图像,利用目标检测模型对多帧图像进行目标检测,获得多帧图像中每一帧图像包括的目标物体,执行步骤1202或者步骤1206。
步骤1202,终端确定目标物体在每一帧图像中的位置。
步骤1203,终端计算目标物体在多帧图像的任意两帧图像中位置偏差值,若最大的位置偏差值小于位置偏差阈值,则执行步骤1204;若目标物体在多帧图像的任意两帧图像中位置偏差值大于或等于位置偏差阈值,则执行步骤1205。
步骤1204,终端确定目标物体为静止物体。
步骤1205,终端确定目标物体为运动物体,执行步骤1210。
步骤1206,终端确定每一帧图像在追踪位置的目标像素的数量。
步骤1207,终端计算多帧图像中任意两帧图像在追踪位置的目标像素的数量差。若最大的数量差小于像素数量阈值,执行步骤1208;若任意两帧图像在追踪位置的目标像素的数量差大于或者等于像素数量阈值,则执行步骤1209。
步骤1208,终端确定目标物体为静止物体。
步骤1209,终端确定目标物体为运动物体,执行步骤1210。
步骤1210,终端将每一帧图像中的运动物体对应的像素标记为无效像素。
步骤1211,终端根据每一帧图像中除无效像素外的其余像素生成每一帧图像对应的背景图像。
步骤1212,终端对所有背景图像进行融合处理,生成目标背景图像。
步骤1213,终端确定每一帧图像中所述运动物体的像素数量。
步骤1214,终端确定像素数量最少的一帧图像为参考图像。
步骤1215,终端利用目标背景图像的目标区域覆盖参考图像的目标区域,获得摄像模组的输出图像。
应该理解的是,虽然图2、图5-7以及图9-12的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2、图5-7以及图9-12中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。
在本申请一个实施例中,如图13所示,提供了一种图像处理装置1300,包括:获取模块1310、确定模块1320、去除模块1330和覆盖模块1340,其中:
获取模块1310,用于获取摄像模组对同一场景拍摄的多帧图像,利用目标检测模型对多帧图像进行目标检测,获得多帧图像中每一帧图像包括的目标物体;
确定模块1320,用于对每一帧图像包括的目标物体进行分类处理,确定每一帧图像包括的运动物体和静止物体;
去除模块1330,用于去除每一帧图像中的运动物体,获得每一帧图像对应的背景图像,对所有背景图像进行融合处理,生成目标背景图像;
覆盖模块1340,用于利用目标背景图像的目标区域覆盖多帧图像中一帧图像的目标区域,获得摄像模组的输出图像;其中,目标区域为运动物体对应的区域。
在本申请一个实施例中,上述覆盖模块1340,具体用于确定每一帧图像中运动物体的像素数量;确定像素数量最少的一帧图像为参考图像;利用目标背景图像的目标区域覆盖参考图像的目标区域,获得摄像模组的输出图像。
在本申请一个实施例中,如图14所示,上述确定模块1320,包括:第一确定单元1321和第二确定单元1322,其中:
第一确定单元1321,用于确定目标物体在每一帧图像中的位置。
第二确定单元1322,用于根据目标物体在每一帧图像中的位置,确定目标物体为运动物体或者静止物体。
在本申请一个实施例中,上述第二确定单元1322,具体用于计算目标物体在多帧图像的任意两帧图像中位置偏差值,若最大的位置偏差值小于位置偏差阈值,则确定目标物体为静止物体,若目标物体在多帧图像的任意两帧图像中位置偏差值大于或等于位置偏差阈值,则确定目标物体为运动物体。
在本申请一个实施例中,如图15所示,上述确定模块1320,还包括:第三确定单元1323和第四确定单元1324,其中:
第三确定单元1323,用于确定每一帧图像在追踪位置的目标像素的数量,目标像素用于显示目标物体,追踪位置为多帧图像中任意一帧图像中目标物体的位置。
第四确定单元1324,用于根据每一帧图像在追踪位置的目标像素的数量,确定目标物体为运动物体或者静止物体。
在本申请一个实施例中,上述第四确定单元1324,具体用于计算多帧图像中任意两帧图像在追踪位置的目标像素的数量差;若最大的数量差小于像素数量阈值,则确定目标物体为静止物体;若任意两帧图像在追踪位置的目标像素的数量差大于或者等于像素数量阈值,则确定目标物体为运动物体。
在本申请一个实施例中,上述去除模块1330,具体用于将每一帧图像中的运动物体对应的像素标记为无效像素;根据每一帧图像中除无效像素外的其余像素生成每一帧图像对应的背景图像。
关于图像处理装置的具体限定可以参见上文中对于图像处理方法的限定,在此不再赘述。上述图像处理装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在本申请一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器中存储有计算机程序,该处理器执行计算机程序时实现以下步骤:获取摄像模组对同一场景拍摄的多帧图像,利用目标检测模型对多帧图像进行目标检测,获得多帧图像中每一帧图像包括的目标物体;对每一帧图像包括的目标物体进行分类处理,确定每一帧图像包括的运动物体和静止物体;去除每一帧图像中的运动物体,获得每一帧图像对应的背景图像,对所有背景图像进行融合处理,生成目标背景图像;利用目标背景图像的目标区域覆盖多帧图像中一帧图像的目标区域,获得摄像模组的输出图像;其中,目标区域为运动物体对应的区域。
在本申请一个实施例中,处理器执行计算机程序时还实现以下步骤:确定每一帧图像中运动物体的像素数量;确定像素数量最少的一帧图像为参考图像;利用目标背景图像的目标区域覆盖参考图像的目标区域,获得摄像模组的输出图像。
在本申请一个实施例中,处理器执行计算机程序时还实现以下步骤:确定目标物体在每一帧图像中的位置;根据目标物体在每一帧图像中的位置,确定目标物体为运动物体或者静止物体。
在本申请一个实施例中,处理器执行计算机程序时还实现以下步骤:计算目标物体在多帧图像的任意两帧图像中位置偏差值,若最大的位置偏差值小于位置偏差阈值,则确定目标物体为静止物体,若目标物体在多帧图像的任意两帧图像中位置偏差值大于或等于位置偏差阈值,则确定目标物体为运动物体。
在本申请一个实施例中,处理器执行计算机程序时还实现以下步骤:确定每一帧图像在追踪位置的目标像素的数量,目标像素用于显示目标物体,追踪位置为多帧图像中任意一帧图像中目标物体的位置;根据每一帧图像在追踪位置的目标像素的数量,确定目标物体为运动物体或者静止物体。
在本申请一个实施例中,处理器执行计算机程序时还实现以下步骤:计算多帧图像中任意两帧图像在追踪位置的目标像素的数量差;若最大的数量差小于像素数量阈值,则确定目标物体为静止物体;若任意两帧图像在追踪位置的目标像素的数量差大于或者等于像素数量阈值,则确定目标物体为运动物体。
在本申请一个实施例中,处理器执行计算机程序时还实现以下步骤:将每一帧图像中的运动物体对应的像素标记为无效像素;根据每一帧图像中除无效像素外的其余像素生成每一帧图像对应的背景图像。
在本申请一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现以下步骤:获取摄像模组对同一场景拍摄的多帧图像,利用目标检测模型对多帧图像进行目标检测,获得多帧图像中每一帧图像包括的目标物体;对每一帧图像包括的目标物体进行分类处理,确定每一帧图像包括的运动物体和静止物体;去除每一帧图像中的运动物体,获得每一帧图像对应的背景图像,对所有背景图像进行融合处理,生成目标背景图像;利用目标背景图像的目标区域覆盖多帧图像中一帧图像的目标区域,获得摄像模组的输出图像;其中,目标区域为运动物体对应的区域。
在本申请一个实施例中,计算机程序被处理器执行时还实现以下步骤:确定每一帧图像中运动物体的像素数量;确定像素数量最少的一帧图像为参考图像;利用目标背景图像的目标区域覆盖参考图像的目标区域,获得摄像模组的输出图像。
在本申请一个实施例中,计算机程序被处理器执行时还实现以下步骤:确定目标物体在每一帧图像中的位置;根据目标物体在每一帧图像中的位置,确定目标物体为运动物体或者静止物体。
在本申请一个实施例中,计算机程序被处理器执行时还实现以下步骤:计算目标物体在多帧图像的任意两帧图像中位置偏差值,若最大的位置偏差值小于位置偏差阈值,则确定目标物体为静止物体,若目标物体在多帧图像的任意两帧图像中位置偏差值大于或等于位置偏差阈值,则确定目标物体为运动物体。
在本申请一个实施例中,计算机程序被处理器执行时还实现以下步骤:确定每一帧图像在追踪位置的目标像素的数量,目标像素用于显示目标物体,追踪位置为多帧图像中任意一帧图像中目标物体的位置;根据每一帧图像在追踪位置的目标像素的数量,确定目标物体为运动物体或者静止物体。
在本申请一个实施例中,计算机程序被处理器执行时还实现以下步骤:计算多帧图像中任意两帧图像在追踪位置的目标像素的数量差;若最大的数量差小于像素数量阈值,则确定目标物体为静止物体;若任意两帧图像在追踪位置的目标像素的数量差大于或者等于像素数量阈值,则确定目标物体为运动物体。
在本申请一个实施例中,计算机程序被处理器执行时还实现以下步骤:将每一帧图像中的运动物体对应的像素标记为无效像素;根据每一帧图像中除无效像素外的其余像素生成每一帧图像对应的背景图像。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、磁带、软盘、闪存或光存储器等。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或外部高速缓冲存储器。作为说明而非局限,RAM可以是多种形式,比如静态随机存取存储器(Static Random Access Memory,SRAM)或动态随机存取存储器(Dynamic Random Access Memory,DRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (10)

  1. 一种图像处理方法,其特征在于,所述方法包括:
    获取摄像模组对同一场景拍摄的多帧图像,利用目标检测模型对所述多帧图像进行目标检测,获得所述多帧图像中每一帧图像包括的目标物体;
    对所述每一帧图像包括的所述目标物体进行分类处理,确定所述每一帧图像包括的运动物体和静止物体;
    去除所述每一帧图像中的运动物体,获得所述每一帧图像对应的背景图像,对所有背景图像进行融合处理,生成目标背景图像;
    利用所述目标背景图像的目标区域覆盖所述多帧图像中一帧图像的目标区域,获得所述摄像模组的输出图像;其中,所述目标区域为所述运动物体对应的区域。
  2. 根据权利要求1所述的方法,其特征在于,所述利用所述目标背景图像的目标区域覆盖所述多帧图像中一帧图像的目标区域,获得所述摄像模组的输出图像,包括:
    确定所述每一帧图像中所述运动物体的像素数量;
    确定所述像素数量最少的一帧图像为参考图像;
    利用所述目标背景图像的目标区域覆盖所述参考图像的目标区域,获得所述摄像模组的输出图像。
  3. 根据权利要求1所述的方法,其特征在于,所述对所述每一帧图像包括的所述目标物体进行分类处理,确定所述每一帧图像包括的运动物体和静止物体,包括:
    确定所述目标物体在所述每一帧图像中的位置;
    根据所述目标物体在所述每一帧图像中的位置,确定所述目标物体为运动物体或者静止物体。
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述目标物体在所述每一帧图像中的位置,确定所述目标物体为运动物体或者静止物体,包括:
    计算所述目标物体在所述多帧图像的任意两帧图像中位置偏差值,若最大的位置偏差值小于位置偏差阈值,则确定所述目标物体为静止物体,若所述目标物体在所述多帧图像的任意两帧图像中位置偏差值大于或等于所述位置偏差阈值,则确定所述目标物体为运动物体。
  5. 根据权利要求1所述的方法,其特征在于,所述对所述每一帧图像包括的所述目标物体进行分类处理,确定所述每一帧图像包括的运动物体和静止物体,包括:
    确定所述每一帧图像在追踪位置的目标像素的数量,所述目标像素用于显示所述目标物体,所述追踪位置为所述多帧图像中任意一帧图像中所述目标物体的位置;
    根据所述每一帧图像在追踪位置的目标像素的数量,确定所述目标物体为运动物体或者静止物体。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述每一帧图像在追踪位置的目标像素的数量,确定所述目标物体为运动物体或者静止物体,包括:
    计算所述多帧图像中任意两帧图像在所述追踪位置的目标像素的数量差;
    若最大的数量差小于像素数量阈值,则确定所述目标物体为所述静止物体;
    若任意两帧图像在所述追踪位置的目标像素的数量差大于或者等于像素数量阈值,则确定所述目标物体为所述运动物体。
  7. 根据权利要求1所述的方法,其特征在于,去除所述每一帧图像中的运动物体,获得所述每一帧图像对应的背景图像,包括:
    将所述每一帧图像中的运动物体对应的像素标记为无效像素;
    根据所述每一帧图像中除所述无效像素外的其余像素生成所述每一帧图像对应的背景图像。
  8. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于获取摄像模组对同一场景拍摄的多帧图像,利用目标检测模型对所述多帧图像进行目标检测,获得所述多帧图像中每一帧图像包括的目标物体;
    确定模块,用于对所述每一帧图像包括的所述目标物体进行分类处理,确定所述每一帧图像包括的运动物体和静止物体;
    去除模块,用于去除所述每一帧图像中的运动物体,获得所述每一帧图像对应的背景图像,对所有背景图像进行融合处理,生成目标背景图像;
    覆盖模块,用于利用所述目标背景图像的目标区域覆盖所述多帧图像中一帧图像的目标区域,获得所述摄像模组的输出图像;其中,所述目标区域为所述运动物体对应的区域。
  9. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,其特征在于,所述处理器执行所述计算机程序时实现权利要求1至7中任一项所述的方法的步骤。
  10. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至7中任一项所述的方法的步骤。
PCT/CN2022/083404 2021-03-29 2022-03-28 图像处理方法、装置、计算机设备和存储介质 WO2022206680A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110338721.3 2021-03-29
CN202110338721.3A CN113129229A (zh) 2021-03-29 2021-03-29 图像处理方法、装置、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2022206680A1 true WO2022206680A1 (zh) 2022-10-06

Family

ID=76774558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083404 WO2022206680A1 (zh) 2021-03-29 2022-03-28 图像处理方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN113129229A (zh)
WO (1) WO2022206680A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129227A (zh) * 2021-03-29 2021-07-16 影石创新科技股份有限公司 图像处理方法、装置、计算机设备和存储介质
CN113129229A (zh) * 2021-03-29 2021-07-16 影石创新科技股份有限公司 图像处理方法、装置、计算机设备和存储介质
CN117716705A (zh) * 2022-06-20 2024-03-15 北京小米移动软件有限公司 一种图像处理方法、图像处理装置及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003030664A (ja) * 2001-07-18 2003-01-31 Hitachi Software Eng Co Ltd 移動物体抽出方法及び装置
CN107507160A (zh) * 2017-08-22 2017-12-22 努比亚技术有限公司 一种图像融合方法、终端及计算机可读存储介质
CN109167893A (zh) * 2018-10-23 2019-01-08 Oppo广东移动通信有限公司 拍摄图像的处理方法、装置、存储介质及移动终端
CN111815673A (zh) * 2020-06-23 2020-10-23 四川虹美智能科技有限公司 运动目标检测方法、装置及可读介质
CN113129229A (zh) * 2021-03-29 2021-07-16 影石创新科技股份有限公司 图像处理方法、装置、计算机设备和存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827952B (zh) * 2016-02-01 2019-05-17 维沃移动通信有限公司 一种去除指定对象的拍照方法及移动终端
CN110213476A (zh) * 2018-02-28 2019-09-06 腾讯科技(深圳)有限公司 图像处理方法及装置
CN109002787B (zh) * 2018-07-09 2021-02-23 Oppo广东移动通信有限公司 图像处理方法和装置、存储介质、电子设备
CN111242128B (zh) * 2019-12-31 2023-08-04 深圳奇迹智慧网络有限公司 目标检测方法、装置、计算机可读存储介质和计算机设备
CN111369469B (zh) * 2020-03-10 2024-01-12 北京爱笔科技有限公司 图像处理方法、装置及电子设备


Also Published As

Publication number Publication date
CN113129229A (zh) 2021-07-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22778862

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22778862

Country of ref document: EP

Kind code of ref document: A1