CN113950705A - Image processing method and device and movable platform - Google Patents

Image processing method and device and movable platform

Info

Publication number
CN113950705A
Authority
CN
China
Prior art keywords
image
determining
pixel
pixel block
filtered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080039127.8A
Other languages
Chinese (zh)
Inventor
周游
刘洁
陈希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN113950705A

Classifications

    • G06T 7/593 Depth or shape recovery from multiple images (stereo images)
    • G06F 18/00 Pattern recognition
    • G06F 18/23 Pattern recognition; clustering techniques
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 7/11 Region-based segmentation
    • G06T 7/13 Edge detection
    • G06T 7/194 Segmentation involving foreground-background segmentation
    • H04N 23/60 Control of cameras or camera modules comprising electronic image sensors
    • H04N 23/951 Computational photography systems using two or more images to influence resolution, frame rate or aspect ratio
    • G06T 2207/10004 Still image; photographic image
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/10052 Images from lightfield camera

Abstract

An image processing method, an image processing apparatus, and a movable platform are provided. The method comprises the following steps: acquiring a first image captured by a camera device at a first pose, and determining a first pixel block corresponding to an object to be filtered in the first image; acquiring a second image captured by the camera device at a second pose, wherein the second image comprises a second pixel block corresponding to a target object, and the target object is an object occluded by the object to be filtered in the first image; and replacing the first pixel block in the first image with the second pixel block to generate a replaced first image.

Description

Image processing method and device and movable platform
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, and a movable platform.
Background
In the process of shooting images or videos, a user often faces scenes in which non-target objects appear within the shooting view angle, and these non-target objects are captured in the final images or videos, affecting the final shooting effect. For example, suppose a user shoots a certain building: there may be non-target objects such as passers-by, vehicles, trash cans or utility poles beside the building, and these non-target objects may occlude the building or appear in the captured image, degrading the display effect of the image. In order to improve the shooting effect of images or videos and better meet users' shooting requirements, it is necessary to provide a scheme for removing non-target objects from images.
Disclosure of Invention
In view of the above, the present application provides an image processing method, an image processing apparatus and a movable platform.
According to a first aspect of the present application, there is provided an image processing method, the method comprising:
acquiring a first image captured by a camera device at a first pose, and determining a first pixel block corresponding to an object to be filtered in the first image;
acquiring a second image captured by the camera device at a second pose, wherein the second image comprises a second pixel block corresponding to a target object, and the target object is an object occluded by the object to be filtered in the first image;
and replacing the first pixel block in the first image with the second pixel block to generate a replaced first image.
According to a second aspect of the present application, there is provided an image processing method, characterized in that the method includes:
determining a first pixel block corresponding to a dynamic object in a first image;
determining the pixel block at the pixel position corresponding to that of the first pixel block in each of a plurality of second images, wherein the plurality of second images and the first image are captured by a camera device at the same pose;
determining, from the plurality of second images, a third image in which the pixel block at the corresponding pixel position is a static area;
replacing the first pixel block in the first image with the pixel block at the corresponding pixel position in the third image.
According to a third aspect of the present application, there is provided an image processing apparatus comprising a processor, a memory, and a computer program stored in the memory and executable by the processor, the processor implementing the following steps when executing the computer program:
acquiring a first image captured by a camera device at a first pose, and determining a first pixel block corresponding to an object to be filtered in the first image;
acquiring a second image captured by the camera device at a second pose, wherein the second image comprises a second pixel block corresponding to a target object, and the target object is an object occluded by the object to be filtered in the first image;
and replacing the first pixel block in the first image with the second pixel block to generate a replaced first image.
According to a fourth aspect of the present application, there is provided an image processing apparatus comprising a processor, a memory, and a computer program stored in the memory and executable by the processor, the processor implementing the following steps when executing the computer program:
determining a first pixel block corresponding to a dynamic object in a first image;
determining the pixel block at the pixel position corresponding to that of the first pixel block in each of a plurality of second images, wherein the plurality of second images and the first image are captured by a camera device at the same pose;
determining, from the plurality of second images, a third image in which the pixel block at the corresponding pixel position is a static area;
replacing the first pixel block in the first image with the pixel block at the corresponding pixel position in the third image.
According to a fifth aspect of the present application, there is provided a movable platform comprising an image capture device and an image processing device according to any one of the embodiments of the present application.
With this scheme, images captured by the camera device at different poses are acquired, and the second image, in which the target object occluded by the object to be filtered is visible, is used to complete the occluded target object in the first image, thereby eliminating the object to be filtered from the first image. The method is applicable to filtering out both dynamic objects and static objects, can automatically filter out non-target objects from an image according to user requirements, and can improve the display effect of the image and the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a schematic diagram of filtering out a non-target object in an image according to an embodiment of the present application.
Fig. 2 is a flowchart of an image processing method according to an embodiment of the present application.
Fig. 3 is a schematic diagram illustrating a first pixel block corresponding to an object to be filtered according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a prompt interface for prompting a user to adjust to a second pose according to an embodiment of the present application.
Fig. 5 is a schematic diagram of adjusting a camera device to a second pose according to an embodiment of the present application.
Fig. 6 is a schematic diagram of determining a second position according to an embodiment of the present application.
Fig. 7 is a schematic diagram of determining corresponding pixel regions of an object to be filtered and a target object according to an embodiment of the present application.
Fig. 8 is a schematic diagram of determining a second orientation according to an embodiment of the present application.
Fig. 9 is a schematic diagram of determining whether an image captured by a camera device can be used as a second image according to an embodiment of the present application.
Fig. 10 is a schematic diagram of determining whether an image captured by a camera device can be used as a second image according to an embodiment of the present application.
Fig. 11(a) is a schematic diagram of filtering out dynamic objects according to an embodiment of the present application.
Fig. 11(b) is a schematic diagram of determining a third image according to an embodiment of the present application.
Fig. 12 is a flowchart of an image processing method according to an embodiment of the present application.
Fig. 13 is a schematic diagram of an application scenario according to an embodiment of the present application.
Fig. 14 is a schematic diagram of filtering out dynamic objects according to an embodiment of the present application.
Fig. 15 is a schematic diagram of framing an object to be filtered and an occluded background area according to an embodiment of the present application.
Fig. 16 is a schematic diagram of filtering out static objects according to an embodiment of the present application.
Fig. 17 is a schematic diagram of a logic structure of an image processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
During the process of shooting a video or an image, some non-target objects may exist within the shooting view angle and are captured in the image, affecting the display effect of the image. These non-target objects may be dynamic, such as walking pedestrians or moving vehicles, or static, such as trash cans, utility poles or buildings. As shown in fig. 1(a), the user wants to photograph a house 11 and its surroundings, but a trash can 12 and passers-by in front of the house are also captured in the image 13, seriously affecting the visual effect of the image. It is therefore necessary to remove these non-target objects from the image, as shown in fig. 1(b), so that the image has a better shooting effect and the user experience is improved.
In order to remove a non-target object (hereinafter referred to as an object to be filtered) from an image, an embodiment of the present application provides an image processing method in which a first image is captured at a first pose, and a second image is then captured at a second pose from which the target object occluded by the object to be filtered in the first image can be observed, so that the second image can be used to complete the occluded target object in the first image, thereby removing the object to be filtered.
Specifically, the flowchart of the method is shown in fig. 2, and includes the following steps:
s202, acquiring a first image acquired by a camera device in a first position, and determining a first pixel block corresponding to an object to be filtered in the first image;
s204, acquiring a second image acquired by the camera device in a second position, wherein the second image comprises a second pixel block corresponding to a target object, and the target object is an object shielded by the object to be filtered in the first image;
s206, replacing the first pixel block in the first image through the second pixel block to generate a replaced first image.
The image processing method may be executed by the camera device that captures the first image and the second image. The camera device may be any device with an image or video capture function, for example a camera; a terminal equipped with a camera, such as a mobile phone, tablet, notebook computer or smart wearable device; or a movable platform equipped with a camera, such as an unmanned aerial vehicle, an unmanned vehicle or a handheld gimbal. Of course, in some embodiments, the image processing method may also be executed by another device communicatively connected to the camera device; for example, it may be executed by a cloud server, with the camera device capturing the first image and the second image and then sending them to the cloud server for processing.
The object to be filtered in the embodiment of the present application refers to an object that a user desires to remove from a first image, where the object to be filtered may be a dynamic object or a static object, and the number of the objects to be filtered may be one or more.
The target object in the embodiment of the present application refers to an object in the first image that is blocked by an object to be filtered, and the target object may be one or more objects.
After the first image captured by the camera device is acquired, a first pixel block corresponding to the object to be filtered may be determined in the first image; a second image containing the target object occluded by the object to be filtered is then acquired, and the first pixel block in the first image is replaced with the second pixel block corresponding to the target object in the second image, so that the object to be filtered is removed from the first image.
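As a rough illustration of this flow (a sketch only, not the patent's implementation; the helper callables plan_second_pose, capture_at and warp_to_first are hypothetical names standing in for the steps detailed later):

def filter_out_object(first_image, first_block_mask,
                      plan_second_pose, capture_at, warp_to_first):
    """Hypothetical end-to-end flow: plan a pose that reveals the occluded
    target object, capture the second image there, bring it into the first
    image's frame, and overwrite the first pixel block."""
    second_pose = plan_second_pose(first_image, first_block_mask)
    second_image = capture_at(second_pose)
    # align the second image with the first, e.g. via the homography of
    # formula (2) below, so pixel positions correspond
    aligned = warp_to_first(second_image, first_image)
    out = first_image.copy()
    m = first_block_mask.astype(bool)
    out[m] = aligned[m]  # the second pixel block replaces the first pixel block
    return out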
According to the method, images captured by the camera device at different poses are acquired, and an image in which the target object occluded by the object to be filtered is visible is used to complete the target object in another image, thereby removing the object to be filtered from that image. The method is applicable to filtering out both dynamic and static objects, can automatically filter out non-target objects from an image according to user requirements, and can improve the display effect of the image and the user experience.
The object to be filtered in the first image may be selected by the user or automatically identified by the device. In some embodiments, the device may automatically identify the object to be filtered from the first image. For example, identification rules for objects to be filtered may be preset, and objects may then be identified according to these rules: objects such as trash cans, utility poles and moving vehicles may be automatically identified from the first image as objects to be filtered, and objects located at the edge of the image may also be identified as objects to be filtered. Of course, after automatic identification, the identified object to be filtered may also be displayed to the user, for example by framing it in the user interface, with the subsequent steps executed after the user confirms it.
In some embodiments, the object to be filtered out may be determined in the first image according to an instruction of the user, for example, the user may click or frame the object desired to be filtered out on the image. The first pixel block may be a pixel block including only the object to be filtered, or a pixel block including the object to be filtered and a peripheral portion of the background thereof. For example, as shown in fig. 3, the first pixel block may be a human-shaped outline region 31 including only the person to be filtered out, or may be a rectangular region 32 including the person to be filtered out and its surrounding background region.
In some embodiments, the user instruction for determining the object to be filtered out may include a selection box input by the user through the human-computer interaction interface, and the selection box is used for selecting the object to be filtered out. For example, a user may directly draw a box on the user interface to select an object to be filtered, where the box drawn by the user may be a rectangular box, a circular box, or an irregularly shaped box, and may be specifically set according to actual requirements.
In one embodiment, the image area selected by the user through the selection box may be used directly as the first pixel block corresponding to the object to be filtered. However, because the box drawn by the user may not be accurate enough, part of the edge area of the object to be filtered may fall outside the box while some background areas fall inside it; when the first pixel block is then replaced, a residue of the object to be filtered may remain in the image, leaving a defect after replacement. Therefore, in some embodiments, superpixel segmentation may be performed on the first image. The principle of superpixel segmentation is to group pixels: adjacent pixels with similar texture, color, brightness and other characteristics are grouped into one image area, so that the image is divided into a plurality of image areas, and the pixel block framed by the selection box can then be adjusted according to these image areas. The superpixel segmentation of the first image may be implemented with a currently common algorithm, which is not described in detail here.
In some embodiments, when the pixel block framed by the selection box is adjusted according to the plurality of image areas obtained by superpixel segmentation, the adjustment may be based on the ratio of the portion of each image area falling inside the box to the whole image area. For example, if the ratio is greater than a preset ratio, for example 50%, the image area is considered selected by the box and is placed inside it; if the ratio is less than the preset ratio, the image area is considered not selected and is placed outside the box, so that the pixel block selected by the box is adjusted. Of course, in some embodiments, in order to ensure that the first pixel block corresponding to the object to be filtered lies within the selection box as far as possible, the selection box may be expanded after the above adjustment, and the expanded selection used as the first pixel block.
By performing superpixel segmentation on the image and then fine-tuning the selection box input by the user, the image area the user intends to select can be determined accurately without requiring a precise selection from the user, so that the first pixel block corresponding to the object to be filtered is determined accurately and the user's operation is made easier. A minimal sketch of this adjustment is given below.
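The following Python sketch assumes SLIC as the superpixel algorithm (the patent does not name one) and uses the 50% ratio from the example above:

import numpy as np
from skimage.segmentation import slic

def refine_selection_box(image, box, ratio_thresh=0.5, n_segments=400):
    """Snap a rough user selection box to superpixel boundaries.

    box: (x0, y0, x1, y1) rectangle drawn by the user.
    Returns a boolean mask to use as the first pixel block.
    """
    labels = slic(image, n_segments=n_segments, start_label=0)
    x0, y0, x1, y1 = box
    in_box = np.zeros(labels.shape, dtype=bool)
    in_box[y0:y1, x0:x1] = True
    mask = np.zeros(labels.shape, dtype=bool)
    for lbl in np.unique(labels):
        region = labels == lbl
        # keep a superpixel iff most of its area falls inside the user's box
        if (region & in_box).sum() / region.sum() > ratio_thresh:
            mask |= region
    return mask

The expansion step mentioned above would correspond to dilating the returned mask by a few pixels.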
In order to conveniently adjust the camera device to different poses so as to capture images at those poses, in some embodiments the camera device may be carried on a movable platform, and the movement of the camera device may be controlled by controlling the movement of the movable platform, so as to adjust to a second pose from which the target object occluded by the object to be filtered can be observed and to capture the second image.
In some embodiments, the movable platform may be any electronic device that includes a powered component that can actuate movement of the movable platform. For example, the movable platform may be any one of an unmanned aerial vehicle, an unmanned ship, an intelligent robot, and the like.
In some embodiments, the camera device may also be carried on the movable platform through a gimbal. For example, the movable platform may be provided with a gimbal on which the camera device is fixed; the movement of the camera device may then be controlled either by moving the movable platform or by moving the gimbal so that the camera device moves relative to the movable platform, in order to adjust to the second pose.
In some embodiments, the second pose may include a second position and a second orientation. When the camera device is carried on the movable platform through a gimbal, the camera device may be brought to the second position by controlling the movement of the movable platform, and its orientation may be adjusted to the second orientation by controlling the rotation of the gimbal. Taking an unmanned aerial vehicle as an example of the movable platform, a gimbal may be provided on the unmanned aerial vehicle and the camera device mounted on it; after the second pose is determined, the unmanned aerial vehicle may be controlled to fly to the second position, and upon arrival the gimbal may be rotated so that the orientation of the camera device is adjusted to the second orientation.
The second image may be acquired in a number of ways. For example, the second pose from which the target object can be observed may be determined first, and the camera device then controlled directly to adjust to the second pose and capture the second image. Alternatively, the pose of the camera device may be changed continuously to obtain multiple frames captured at different poses; as each frame is acquired, it is determined whether the image includes the complete target object, and the image that does is taken as the second image.
In some embodiments, in order to capture the target object occluded by the object to be filtered, a second pose at which the camera device can observe the target object may be determined automatically, and the camera device then controlled to move to that pose to capture the second image. For example, the camera device may be mounted on a movable platform such as an unmanned aerial vehicle or an unmanned vehicle; the position and attitude at which the camera device can observe the complete target object can then be calculated automatically, and the unmanned aerial vehicle or unmanned vehicle automatically controlled to move to the corresponding position so as to capture the second image.
In some embodiments, the second pose at which the camera device can observe the target object may be determined automatically, and a prompt indicating the second pose may then be sent to the user, so that the user controls the camera device to move to the second pose and capture the second image. For example, in a scene where the user shoots with a mobile phone, camera or other imaging device, the second pose may be calculated automatically and prompt information indicating it presented to the user through an interactive interface. The prompt information may be text or image information; for example, prompts such as "move 100 meters east from the current position" or "move 50 meters right from the current position" may be displayed, so that the user can move the imaging device to the corresponding position according to the prompt and then shoot. Of course, if the user shoots with a movable platform, the user may operate the control terminal of the movable platform according to the prompt information and control the platform to move to the corresponding position.
Of course, in some embodiments the second pose includes a second position and a second orientation, and when prompting the user, an image identifying the second position may be presented together with the rotation angle information for adjusting to the second orientation. To make the prompt more intuitive and help the user locate the second pose quickly, the prompt information may be image information; for example, an image identifying the second position may be presented on the interactive interface. Since the second position from which the target object can be observed may be a region, that region may be framed in the image, as shown in fig. 4. Meanwhile, the rotation angle information corresponding to the second orientation may be displayed, so that the user can adjust the camera device according to the displayed position and angle information and capture the second image.
The appropriate second pose depends on the positions of the objects. For example, when the distance between the object to be filtered and the target object is relatively large, an image containing the complete target object can be captured after moving only a small distance; when that distance is small, a larger movement is needed. Therefore, in some embodiments, when determining the second pose, the position information of the object to be filtered and the position information of the target object may be determined, and the second pose determined from them.
Of course, the size of the object to be filtered also affects where the target object can be completely captured: a large object to be filtered may require moving a long distance before the target object is fully visible, while a small one may require only a small movement. Therefore, in some embodiments, when the second pose is determined according to the position information of the object to be filtered and the position information of the target object, the size of the object to be filtered may also be determined, and the second pose determined from the size of the object to be filtered together with the two pieces of position information.
As shown in fig. 5, the trash can 51 is the object to be filtered, the house 52 is the occluded target object, and the black dots indicate camera positions. Suppose the camera device captures the first image at "position 1", where the target object 52 is occluded by the object 51 to be filtered. When determining a second pose from which the target object 52 can be completely captured, the camera device may go around the object 51 to be filtered (for example, behind it, as at "position 2"), or it may move a distance from the current capture position to "position 3", so that the target object 52 falls within its view angle. In some embodiments, the first pose includes a first position and a first orientation, and the second pose includes a second position and a second orientation, where the second position lies on a straight line that passes through the first position and is parallel to the plane of the object to be filtered, and the second orientation points toward the object to be filtered. In this way, the camera device can be translated a distance from the first position to the second position, and its orientation then adjusted to point at the object to be filtered.
In some embodiments, when determining the second position, the moving distance may be determined from the position information of the object to be filtered, the position information of the target object, and the size of the object to be filtered, and the second position then determined from the first position and the moving distance. For example, as shown in fig. 6, the small cuboid 61 is the object to be filtered, with width L and distance d1 from the camera device, and the large cuboid 62 is the occluded target object, at distance d2 from the camera device. Converting the scene to a top view, the target object is shown as 63, the object to be filtered as 65, and the region of the target object occluded by the object to be filtered as 64. Position A is the first position; image plane diagram 66 sketches the image captured at the first position, and image plane diagram 67 sketches the image captured at position B, where position B is the position from which the camera device can observe the left edge of the occluded region of the target object. "Position B" can be reached from the first position by a translation distance D which, as can be seen from fig. 6, can be solved by equation (1):
D = L·d2 / (d2 − d1)    (1)
the distance d1 between the object to be filtered and the camera device and the distance d2 between the target object and the camera device can be determined by a plurality of images acquired by the camera device in different poses. The width L of the object to be filtered may be determined according to the distance between the object to be filtered and the imaging device and the imaging size of the object to be filtered.
As can be seen from fig. 6, the occluded region can be observed both at "position B" and from any position further along in the region to the right of the line through the right edge of the object to be filtered; the second position may therefore be any position in that region. After the moving distance D is determined, the three-dimensional coordinates of "position B" may be computed from the three-dimensional coordinates of the first position and the moving distance D, the coordinates of the second position determined accordingly, and the camera device controlled to move there.
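A small numeric sketch of this relation, using formula (1) as reconstructed above (it follows from similar triangles when the object to be filtered is assumed centered in front of the camera):

def min_translation(width_l, d1, d2):
    """Minimum sideways translation D at which the left edge of the occluded
    region, and hence the whole region, becomes visible (similar triangles):
    D = L * d2 / (d2 - d1)."""
    assert d2 > d1 > 0, "target object must lie behind the object to be filtered"
    return width_l * d2 / (d2 - d1)

# e.g. a 1 m wide trash can 5 m away in front of a house 20 m away:
print(min_translation(1.0, 5.0, 20.0))  # -> 1.33... m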
In some embodiments, when determining the distance d1 between the object to be filtered and the camera device and the distance d2 between the target object and the camera device, multiple frames captured by the camera device at different poses may be acquired, and a region containing the object to be filtered determined in one of them; for example, as shown in fig. 7, a region 71 containing the object 70 to be filtered is determined, and a ring-shaped region 72 surrounding region 71 is then determined. To determine d1, a plurality of feature points may be extracted from region 71 using an existing feature point extraction algorithm, which is not described in detail here. The matching points of these feature points in the remaining frames may then be determined, the optical flow vectors of the feature points computed from those matching points, and the optical flow vector of the center of the object to be filtered (i.e. of region 71) relative to each image fitted from the feature-point flow vectors, so that the matching points of the center of the object to be filtered in the remaining frames are obtained. From these feature points and matching points, the intrinsic and extrinsic parameters of the camera device may be determined with a BA (bundle adjustment) algorithm, and the depth of the center of the object to be filtered, i.e. the distance d1, determined from them. For the distance d2 between the target object and the camera device, feature points may be extracted from the ring-shaped region 72 and d2 determined by similar means, which is not repeated here.
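A simplified sketch of this depth estimate: Shi-Tomasi corners in the region are tracked to another frame with pyramidal Lucas-Kanade optical flow and triangulated with known projection matrices. This stands in for the full optical-flow fitting and bundle adjustment described above; the calls are standard OpenCV, not the patent's own API.

import cv2
import numpy as np

def estimate_region_depth(img_a, img_b, region_mask, proj_a, proj_b):
    """Rough median depth of a region such as area 71 (or ring 72 for d2).

    region_mask: uint8 mask of the region; proj_a/proj_b: 3x4 projection
    matrices of the two views (assumed known here instead of solved by BA).
    """
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(gray_a, maxCorners=100, qualityLevel=0.01,
                                  minDistance=7, mask=region_mask)
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, pts, None)
    ok = status.ravel() == 1
    pts_a = pts[ok].reshape(-1, 2).T  # 2xN points in view a
    pts_b = nxt[ok].reshape(-1, 2).T  # their matches in view b
    pts4d = cv2.triangulatePoints(proj_a, proj_b, pts_a, pts_b)
    depths = pts4d[2] / pts4d[3]  # z in view a's frame if proj_a = K[I|0]
    return float(np.median(depths))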
In some embodiments, the second orientation may be determined from the first position and the position of the object to be filtered in the image frame captured by the camera device. For example, during the movement of the camera device, the position of the object to be filtered in the captured frame may be detected in real time and the orientation continuously adjusted so that the object to be filtered stays at the center of the frame; when the camera device reaches the second position, the second orientation is thereby determined as well. In other words, from the position of the center of the object to be filtered in the first image and the pose parameters corresponding to the first pose, the attitude angle at which the center of the object to be filtered sits at the center of the frame once the camera device has moved to the second position can be computed, and this gives the second orientation.
In some embodiments, the second orientation may also be determined from the first position, the positions of the left and right end points of the object to be filtered, and the positions of the left and right end points of the target object. For example, as shown in fig. 8, which side of the first position the second position lies on may be determined from the three-dimensional coordinates of the two positions. When the second position is on the right of the first position, a connection line AD may be determined from the left end point A of the object to be filtered and the right end point D of the target object, and the second orientation points toward the object to be filtered along line AD; since the three-dimensional coordinates of A and D are known, the attitude angle corresponding to their connection line can be solved. Likewise, when the second position is on the left of the first position, a connection line BC may be determined from the right end point B of the object to be filtered and the left end point C of the target object, and the attitude angle corresponding to line BC determined from the three-dimensional coordinates of B and C.
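For this end-point variant, a sketch of turning the connection line into an attitude (yaw) angle; the coordinate convention (x east, z forward, y up) is an assumption, not taken from the patent:

import numpy as np

def yaw_of_connection_line(p_near, p_far):
    """Yaw angle (radians about the vertical axis) of the line from one
    end point to the other, e.g. A -> D when the second position is on
    the right of the first position."""
    d = np.asarray(p_far, dtype=float) - np.asarray(p_near, dtype=float)
    return float(np.arctan2(d[0], d[2]))  # assumes x east, z forward, y up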
Of course, the second image may also be obtained by continuously adjusting the pose of the camera device, capturing images, and then judging whether each captured image can serve as the second image. For example, in some embodiments the pose of the camera device may be changed continuously to obtain multiple frames captured at different poses; each time a frame is acquired, it may be determined whether it includes the second pixel block corresponding to the target object, and if so, that frame is taken as the second image.
Whether an image includes the second pixel block may be judged by the user or determined automatically by the device executing the image processing method. Taking the case where the camera device is mounted on a movable platform such as an unmanned aerial vehicle: the user can adjust the pose of the unmanned aerial vehicle to capture images at different poses, the unmanned aerial vehicle can transmit the captured images back to the control terminal, and when the user judges that an image includes the target object, the user can click that image to use it as the second image. It is also possible to determine automatically whether a captured image can serve as the second image by processing and recognizing it. For example, in some embodiments, after the first image is acquired and the first pixel block corresponding to the object to be filtered is determined in it, a plurality of first feature points may be extracted from the first pixel block and a plurality of second feature points from its peripheral region; for each frame captured after the pose of the camera device changes, the first matching points of the first feature points and the second matching points of the second feature points in that frame may be determined, and whether the frame includes the second pixel block is then judged from the positional relationship of the first and second matching points in the frame.
In some embodiments, the first feature points may be located inside a first side of the first pixel block and the second feature points outside a second side of the first pixel block, where the first side is opposite to the second side: for example, the first side is the left side and the second side the right side, or the first side is the upper side and the second side the lower side. When judging from the positional relationship, it may be determined whether the second matching points are located on the first side of the first matching points; if so, the image is determined to include the second pixel block. For example, fig. 9(a) is a schematic diagram of a first image 90: a first pixel block 91 is determined in the first image 90, a plurality of first feature points 92 are extracted inside the first side (the left side) of the first pixel block 91, and a plurality of second feature points 93 are extracted outside the second side (the right side). After the camera device changes pose and captures a frame, shown as image 94 in fig. 9(b), the first matching points 95 of the first feature points and the second matching points 96 of the second feature points in image 94 may be determined and their positional relationship examined. As shown, the second matching points 96 are located on the first side (the left) of the first matching points 95, so the target object is considered no longer occluded by the object to be filtered, and the image is determined to include the second pixel block corresponding to the target object.
In some embodiments, the plurality of second feature points may be located in a ring-shaped pixel block surrounding the first pixel block, and when judging from the positional relationship, the image is determined to include the second pixel block when at least a preset number of the second matching points lie on one side of the first matching points. The preset number may be set according to actual requirements, for example 90% of the second matching points lying on one side of the first matching points. As shown in fig. 10(a), a first pixel block 101 is determined in the first image 100, a plurality of first feature points 102 are extracted from the first pixel block 101, and a plurality of second feature points 104 are extracted from the ring-shaped pixel block 103 around it. After the camera device changes pose and captures a frame, shown as image 105 in fig. 10(b), the first matching points 106 of the first feature points 102 and the second matching points 107 of the second feature points 104 in image 105 may be determined and their positional relationship examined. As shown, when more than a certain number of second matching points 107 (for example, more than 90% of their total) lie on one side of the first matching points 106, the target object is considered not occluded by the object to be filtered, and the image is determined to include the second pixel block corresponding to the target object.
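A sketch of the ring-based visibility test; "one side" is interpreted horizontally here, and 90% is the example threshold from the text. Both are assumptions where the patent leaves the choice open.

import numpy as np

def target_visible(first_matches, ring_matches, frac=0.9):
    """True when at least `frac` of the ring matches lie entirely on one
    (horizontal) side of all the in-block matches, i.e. the target object
    is no longer hidden behind the object to be filtered."""
    fm = np.asarray(first_matches, dtype=float)  # matches of points 102
    rm = np.asarray(ring_matches, dtype=float)   # matches of points 104
    left = (rm[:, 0][:, None] < fm[:, 0][None, :]).all(axis=1)
    right = (rm[:, 0][:, None] > fm[:, 0][None, :]).all(axis=1)
    return max(left.mean(), right.mean()) >= frac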
In some scenes, the target object occluded by the object to be filtered cannot be completely captured even if the pose of the camera device is adjusted; for example, when the object to be filtered is very close to the target object, no pose adjustment allows the complete target object to be captured to complete the occluded part of the first image. In such scenes, prompt information may be sent to the user indicating that the object to be filtered in the first image cannot be filtered out in the current scene. Therefore, in some embodiments, after a preset first condition is triggered, prompt information that the object to be filtered cannot be filtered out is sent to the user; the prompt may for example be displayed in a pop-up window on the user interaction interface.
In certain embodiments, the preset first condition may be at least one of the following: the first distance between the object to be filtered and the target object is smaller than a first preset threshold; the second distance between the target object and the camera device is smaller than a second preset threshold; or the relationship between the first distance and the second distance does not satisfy a preset second condition. The first preset threshold, second preset threshold and second condition may be set flexibly according to the actual scene; for example, if the complete target object cannot be captured whenever the object to be filtered is less than 1 meter from the target object, the first preset threshold may be set to 1 meter. The second preset threshold and the second condition may be set by similar means, which is not detailed here.
In some embodiments, the second pixel block in the acquired second image may be determined before the replacement processing of the first pixel block in the first image by the second pixel block in the second image. In some embodiments, when the second pixel block is determined in the second image, a mapping relationship between pixel points of the first image and pixel points of the second image may be determined, the mapping region of the first pixel block in the second image is determined according to the mapping relationship and is used as the second pixel block, and then the second pixel block is used to replace the first pixel block in the first image.
In some implementations, when determining the mapping relationship between the pixel points of the first image and those of the second image, third feature points may be extracted from the peripheral region of the first pixel block in the first image, their third matching points determined in the second image, and the mapping relationship determined from the third feature points and third matching points. For example, if the pixel coordinates of a point in the first image are P1, the coordinates of the corresponding point in the second image are P2, and the homography matrix is H, the pixel points of the two images satisfy the following formula (2):
H·P1 = P2    (2)
Since the homography matrix H has 8 unknowns, at least 4 pairs of feature points and matching points are required to solve it. Therefore, at least 4 third feature points may be extracted from the peripheral region of the first pixel block in the first image (e.g. a region surrounding the first pixel block by one circle), their third matching points determined in the second image, and H solved from the third feature points and third matching points.
Of course, in some embodiments, in order to obtain a more accurate homography matrix H, a RANSAC (Random Sample Consensus) algorithm may be used to remove poorly matched points from the third feature points and third matching points, and H solved from the screened, more accurately matched points, thereby obtaining a more accurate H and ensuring the validity of the result.
After H is determined, the mapping area of the first pixel block of the first image in the second image may be determined according to H, and that mapping area used as the second pixel block to replace the first pixel block in the first image, thereby completing the target object occluded by the object to be filtered in the first image.
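This step maps directly onto standard OpenCV calls. A minimal sketch, with ORB matching standing in for whichever feature extractor is intended and a fixed dilation standing in for the "one circle" peripheral region:

import cv2
import numpy as np

def patch_with_homography(first_img, second_img, block_mask):
    """Estimate H from matches around the first pixel block (RANSAC, as in
    the text), warp the second image into the first image's frame, and copy
    the warped pixels over the block. block_mask: uint8 0/1 mask."""
    ring = cv2.dilate(block_mask, np.ones((25, 25), np.uint8)) - block_mask
    orb = cv2.ORB_create(nfeatures=1000)
    k1, d1 = orb.detectAndCompute(first_img, ring)   # third feature points
    k2, d2 = orb.detectAndCompute(second_img, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    H, _inliers = cv2.findHomography(p2, p1, cv2.RANSAC, 3.0)  # second -> first
    h, w = first_img.shape[:2]
    warped = cv2.warpPerspective(second_img, H, (w, h))
    out = first_img.copy()
    m = block_mask.astype(bool)
    out[m] = warped[m]  # second pixel block replaces the first pixel block
    return out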
In some embodiments, when determining the second pixel block in the second image, a ring-shaped pixel block surrounding the first pixel block in the first image may be determined, then a matching ring-shaped block matching the ring-shaped pixel block in the second image may be determined, and then the pixel block surrounded by the matching ring-shaped block in the second image may be used as the second pixel block.
Since the image from which the object to be filtered is removed is usually intended for subsequent use, the first image should be an image with a relatively good shooting effect, for example a clear image with a full view of the shooting target. In some embodiments, the camera device may capture a plurality of images at the first pose and the first image be selected from them, either by the user or automatically by the device executing the image processing method according to the definition, brightness, composition and other properties of the images.
For a dynamic object, the object itself moves, so it is better removed using a plurality of images captured by the camera device at the same pose, without moving the camera device. A static object does not move, so it can be removed by moving the camera device and using images captured at different poses. Therefore, in some implementations, after the user determines the first image and before the first pixel block corresponding to the object to be filtered is determined in it, category information of the object to be filtered may be determined, the category information identifying whether the object to be filtered is a dynamic or a static object, and the object then removed with the corresponding processing method. Whether the object to be filtered is dynamic or static may be determined from a plurality of images captured by the camera device at the first pose.
In some embodiments, when the category of the object to be filtered is determined from the plurality of images captured by the camera device at the first pose, the category information may be determined from the optical flow vectors of the pixel points of the object to be filtered in the first image relative to the other images. For example, the optical flow vectors of the pixel points of the object to be filtered relative to the other images may be computed, and if the modulus of the optical flow exceeds a preset threshold, the object to be filtered is considered a dynamic object.
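A sketch of this dynamic/static test using dense Farneback optical flow; the flow algorithm and the 2-pixel threshold are illustrative choices, not the patent's:

import cv2
import numpy as np

def is_dynamic(frames, block_mask, mag_thresh=2.0):
    """Classify the object to be filtered as dynamic when the median flow
    magnitude inside its block exceeds mag_thresh (pixels) between any two
    consecutive same-pose frames."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    m = block_mask.astype(bool)
    for prev, cur in zip(grays, grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        if np.median(np.linalg.norm(flow, axis=2)[m]) > mag_thresh:
            return True
    return False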
In some embodiments, if the object to be filtered is a static object, the steps described above are executed: a first pixel block corresponding to the object to be filtered is determined in the first image, a second image captured by the camera device at the second pose and containing the second pixel block is acquired, and the first pixel block in the first image is replaced with the second pixel block to generate the replaced first image.
In some embodiments, if the object to be filtered is determined to be a dynamic object, a third pixel block corresponding to it may be determined in the first image; among the other images captured by the camera device at the first pose, the fourth pixel block located at the pixel position corresponding to that of the third pixel block is determined; a third image in which the fourth pixel block is a static area (i.e. an image in which the region corresponding to the third pixel block is not occluded) is determined from the other images; and the third pixel block in the first image is replaced with the fourth pixel block of the third image. As shown in fig. 11(a), a third pixel block 110 corresponding to the object to be filtered is determined in the first image; the fourth pixel block (e.g. 111 in the figure) at the corresponding pixel position in the other images (image 1, image 2 and image 3 in the figure) is then determined; and whether each fourth pixel block 111 is a static area is judged. If so, that image is taken as the third image: for example, the fourth pixel block 111 of image 1 is a static area, so image 1 is taken as the third image and its fourth pixel block 111 is used to replace the third pixel block 110 in the first image.
In some embodiments, when determining the third image in which the fourth pixel block is a static area, the dynamic areas of the other images may be determined first; then, for each object to be filtered in the first image, the pixel block at the pixel position corresponding to that of the third pixel block is examined in the other images in order of their capture proximity to the first image, until an image is found in which that pixel block does not overlap any dynamic area (i.e. is not occluded), and that image is taken as the third image. For example, as shown in fig. 11(b), suppose the first image is the Kth frame captured by the camera device and the other images are the K+1th, K+2th, K+3th frames and so on. The dynamic areas of the K+1th, K+2th and K+3th frames are determined first: for each frame, the optical flow vector of each pixel point with respect to one or more adjacent frames is computed, and a pixel point whose flow modulus exceeds a certain threshold is considered moving; the moving pixel points are then clustered into several sets, and a set whose number of pixels exceeds a certain value is treated as a moving area (a set with too few pixels may be noise and is ignored). Suppose the rectangular and circular areas in the image are dynamic areas and the rest is static. The third pixel block 121 corresponding to the object to be filtered is determined in the first image, and the pixel block at the corresponding position in the K+1th, K+2th and K+3th frames is the area 122 framed by the dashed box. To determine the third image, it is first checked whether the pixel block 122 in the frame nearest the first image (e.g. the K+1th frame) overlaps that frame's dynamic area; if it does, the K+2th frame is checked in the same way, and if the K+2th frame meets the requirement, it is taken as the third image and the pixel block at the corresponding position in that frame replaces the third pixel block of the first image.
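A sketch of the dynamic-area detection and frame search; connected components on a thresholded flow-magnitude mask stand in for the clustering step, and the area cutoff implements the "too few pixels is noise" rule:

import cv2
import numpy as np

def dynamic_area(prev_gray, cur_gray, mag_thresh=2.0, min_area=200):
    """Moving-pixel mask of the current frame: threshold the Farneback flow
    magnitude, then keep only connected components too large to be noise."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    moving = (np.linalg.norm(flow, axis=2) > mag_thresh).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(moving)
    keep = np.zeros(moving.shape, dtype=bool)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            keep |= labels == i
    return keep

def find_third_image(third_block_mask, other_frames, dyn_masks):
    """Nearest frame (in capture order) whose pixels at the block's position
    overlap no dynamic area; None if every candidate is occluded."""
    m = third_block_mask.astype(bool)
    for frame, dyn in zip(other_frames, dyn_masks):
        if not (dyn & m).any():
            return frame
    return None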
When the camera device collects images in the first pose, multiple frames are usually collected continuously to form an image sequence. Because the difference between two or more adjacent frames in the sequence may be small, and the position of a dynamic object changes little between two adjacent frames, adjacent frames are not well suited to filtering the dynamic object from the first image; and if the image frames are judged one by one, considerable resources are consumed. Therefore, when a plurality of images acquired by the camera device are obtained, images that clearly show the change of the dynamic object can be screened out of the image sequence, so that the dynamic object can be filtered more efficiently. Accordingly, in some embodiments, the other images of the plurality of images except the first image may be images whose difference from the first image exceeds a preset threshold, or images separated from the first image by a specified number of frames. For example, taking the first image as a reference, if the angle or displacement of a specified object in a certain frame differs from its angle or displacement in the first image by more than a preset threshold, that frame may be acquired as one of the plurality of images; alternatively, images separated from the first image by a specified number of frames may be acquired, for example, if the first image is the 5th frame of the sequence, the other images are the 10th, 15th, 20th, 25th frames, and so on.
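As an illustration of the fixed-interval screening criterion, a short sketch follows; the step of 5 frames mirrors the example above, and a difference-threshold test (e.g., on the displacement of a tracked object) could be substituted.

```python
def screen_other_images(sequence, first_index, step=5):
    """Pick frames spaced a fixed number of frames after the first image,
    e.g., first image at frame 5 -> frames 10, 15, 20, 25, ..."""
    return [sequence[i] for i in range(first_index + step, len(sequence), step)]
```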
In some embodiments, when static objects in the image are filtered using images collected at different poses, dynamic objects in the image may interfere with the filtering of the static objects, so that the static objects cannot be filtered out well.
In addition, the present application also provides an image processing method that can be used to automatically remove a dynamic object from an image. As shown in fig. 12, the method includes the following steps:
S1202, determining a first pixel block corresponding to a dynamic object to be filtered in a first image;
S1204, determining the pixel blocks at the pixel positions, in a plurality of second images, corresponding to the pixel position of the first pixel block, wherein the plurality of second images and the first image are acquired at the same pose by a camera device;
S1206, determining, from the plurality of second images, a third image in which the pixel block at the corresponding pixel position is a static area;
S1208, replacing the first pixel block in the first image with the pixel block at the corresponding pixel position in the third image.
The image processing method may be executed by the camera device that acquires the first image and the second images. The camera device may be any equipment with an image acquisition function; for example, it may be a camera, a terminal equipped with a camera such as a mobile phone, tablet, notebook computer, or intelligent wearable device, or a movable platform equipped with a camera, such as an unmanned aerial vehicle or unmanned vehicle. Of course, in some embodiments, the image processing method of the present application may also be executed by another device communicatively connected to the camera device; for example, it may be executed by a cloud server, with the camera device acquiring the first image and the second images and then sending them to the cloud server for processing.
The dynamic objects to be filtered in the embodiments of the present application are objects which a user desires to remove from an image. The dynamic objects to be filtered may be determined or selected by the user, and one or more dynamic objects to be filtered may be selected.
The first image may be selected from an image sequence continuously acquired by the camera device at a certain fixed pose. It may be selected by the user, or selected automatically by the equipment executing the image processing method, for example by automatically choosing an image with better definition, composition, or shooting angle from the image sequence as the first image.
After the first image is determined, a first pixel block corresponding to the dynamic object to be filtered may be determined in the first image. Multiple frames of second images are then determined from the image sequence; a third image in which the pixel block at the pixel position corresponding to the first pixel block is a static area (i.e., an image in which the pixel block at the corresponding pixel position is not blocked) is determined from the second images; and the first pixel block in the first image is replaced with the pixel block at the corresponding pixel position in the third image, thereby removing the dynamic object.
In order to screen out image frames that can be used to filter the dynamic object from the first image more quickly, in some embodiments, images that differ from the first image may be selected from the image sequence as second images. For example, a second image may be an image whose difference from the first image exceeds a preset threshold, or an image separated from the first image by a specified number of frames. Taking the first image as a reference, if the angle or displacement of a certain object in a frame differs from its angle or displacement in the first image by more than a preset threshold, that frame may be acquired as a second image; alternatively, images separated from the first image by a specified number of frames may be acquired, for example, if the first image is the 5th frame of the sequence, the second images are the 10th, 15th, 20th, 25th frames, and so on.
In some embodiments, when determining the first pixel block corresponding to the dynamic object in the first image, the following operations may be performed separately for each frame of second image: calculating the optical flow vector of each pixel point of the first image relative to the second image; determining, among the pixel points of the first image, target pixel points whose optical flow vector modulus is greater than a preset threshold; and clustering the target pixel points to obtain the first pixel block corresponding to the dynamic object. For example, the optical flow vectors between each pixel point of the first image and one or more adjacent frames may be calculated; if the modulus of a pixel point's optical flow vector is greater than a certain threshold, the pixel point is considered to be moving. The pixel points determined to be moving are then clustered to obtain a plurality of pixel point sets, and regions whose sets contain more pixel points than a certain value are considered dynamic objects (sets with too few points may be ignored as noise).
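A minimal sketch of this detection is shown below. Farneback dense optical flow and connected-component labeling stand in for the patent's unspecified flow and clustering methods, and the two thresholds are assumed values.

```python
import cv2
import numpy as np

def dynamic_pixel_blocks(first_gray, second_gray, mag_thresh=2.0, min_pixels=200):
    """Threshold the per-pixel flow modulus, then cluster moving pixels and
    keep clusters large enough not to be treated as noise."""
    flow = cv2.calcOpticalFlowFarneback(first_gray, second_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)          # modulus of each flow vector
    moving = (magnitude > mag_thresh).astype(np.uint8)
    num_labels, labels = cv2.connectedComponents(moving)
    blocks = []
    for label in range(1, num_labels):
        component = labels == label
        if component.sum() >= min_pixels:             # small sets are noise
            blocks.append(component)                  # boolean mask per dynamic object
    return blocks
```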
In some embodiments, when the pixel block at the corresponding pixel position in the third image is used to replace the first pixel block in the first image, the dynamic regions in the plurality of second images may first be determined. For each first pixel block, the corresponding third image may then be determined as follows: the pixel block at the pixel position, in each second image, corresponding to the pixel position of the dynamic object in the first image is examined in order of acquisition from near to far relative to the first image, until a pixel block at the corresponding pixel position does not overlap the dynamic region; that second image is then taken as the third image.
With the method provided by this embodiment of the application, a reference image (namely the first image) can be determined from the multiple frames acquired by the camera device at the same pose, and the first pixel block corresponding to the dynamic object to be filtered can be determined in the reference image. By judging whether the pixel block at the pixel position corresponding to the first pixel block in the other images is a static area, an image in which that pixel block is not blocked can be screened out quickly; the pixel block at the corresponding pixel position in that image then replaces the first pixel block in the first image, so that the dynamic object in the first image can be removed quickly and efficiently.
To further explain the image processing method provided in the present application, the following is explained with reference to a specific embodiment.
Generally, when a user takes an image or a video, some non-target objects are also within the shooting visual angle, so that non-target objects appear in the final image and block the target the user wants to shoot. It is therefore necessary to remove these non-target objects, and a method for removing non-target objects (i.e., objects to be filtered) in an image is provided below, where the objects to be filtered may be dynamic objects or static objects. Taking as an example a scene in which a user performs image acquisition with an unmanned aerial vehicle carrying a camera, as shown in fig. 13, the user can control the camera device 133 carried by the unmanned aerial vehicle 131 through the control terminal 132 to acquire images, and the unmanned aerial vehicle 131 can transmit the images acquired by the camera device 133 back to the control terminal for display to the user. The filtering of objects in the image can be performed by the control terminal; methods for filtering dynamic objects and static objects are described below in turn.
Filtering dynamic objects:
1. The camera device can be controlled to acquire a series of images at a certain fixed pose, and a reference image I0 is then screened out of the image sequence, either by the user or automatically by the control terminal.
2. A plurality of key frames are selected from the image sequence, where the key frames are image frames that differ substantially from the reference image I0. For example, a key frame may be an image frame in which the angle or displacement of a certain object differs from that in the reference image I0 by more than a certain threshold, or an image frame separated from the reference image by a specified number of frames; for example, if the reference image I0 is the K-th frame, the key frames may be the frames separated from the K-th frame by 5, 10, 15, and 20 frames.
3. Optical flow vectors of each pixel are calculated between the reference image I0 and each of the screened key frames. If the modulus of the optical flow vector of a single pixel is larger than a certain threshold, the pixel is determined to have changed between the two moments, i.e., to be moving. The pixels determined to be moving are then clustered to obtain a plurality of pixel point sets, and if a set contains more pixel points than a certain value, the region corresponding to that set is determined to be a moving object.
4. For the other key frames, optical flow vectors are likewise calculated against the two (or more) key frames before and after them, and the dynamic area and static area of each key frame are determined.
5. A key frame that can fill each dynamic object of the k0 frame (i.e., the reference image I0) is found by judging, starting from the key frame nearest to the k0 frame, whether the area corresponding to each dynamic object of the k0 frame is a static area in that key frame. As shown in fig. 14, the triangular, square, and circular areas in the figure represent dynamic areas, and the remaining areas are static. For example, for the circular dynamic area of the k0 frame, the corresponding area in the k-1 frame (the circular dashed area in the figure) is checked first; it is a static area, so the circular dynamic area of the k0 frame can be filled using the k-1 frame. Similarly, the remaining triangular and square dynamic areas need to be filled by the k-5 and k+7 frames, respectively.
6. After the key frames that can be used to fill each dynamic object in the k0 frame are determined, each dynamic object in the k0 frame can be replaced by its corresponding area in the respective key frame, thereby filtering out the dynamic objects.
Filtering static objects:
7. Determine the pixel area corresponding to the object to be filtered and the pixel area corresponding to the blocked background.
The static object to be filtered can be determined in the reference image I0, either automatically identified by the control terminal or determined by the user through frame selection in an interactive interface. As shown in fig. 15, the user may draw a selection frame in the reference image I0 around the static object to be filtered (as shown in (1) of the figure). Since the frame drawn by the user may not be accurate, it can be adjusted automatically: for example, super-pixel segmentation is performed on the reference image I0 to obtain a plurality of image regions, and for each region the ratio of the portion falling inside the frame to the whole region is determined; if the ratio is greater than a certain value, the region is classified as inside the frame, otherwise outside. The frame adjusted in this manner is shown in (2) of the figure. After a relatively accurate selection frame is obtained through automatic adjustment, it can be expanded appropriately, for example by 5%, so that the static object lies completely inside the frame while some background also falls inside it; this background is the part that needs to be completed, as shown in (3) of the figure. On this basis, the frame can be expanded again to obtain a part of the background region, such as the square annular region shown in (4) of the figure.
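A hedged sketch of the frame refinement follows. skimage's SLIC is used as one possible super-pixel method, and the 0.5 ratio is an assumption; the patent only says the ratio must be greater than a certain value.

```python
import numpy as np
from skimage.segmentation import slic

def refine_selection_box(image_rgb, box_mask, ratio=0.5, n_segments=400):
    """Super-pixel segment the reference image, then classify each region
    into or out of the box by the fraction of it that the box covers."""
    segments = slic(image_rgb, n_segments=n_segments, start_label=1)
    refined = np.zeros(box_mask.shape, dtype=bool)
    for seg_id in np.unique(segments):
        region = segments == seg_id
        inside = np.logical_and(region, box_mask).sum() / region.sum()
        if inside > ratio:
            refined |= region       # classify the whole region into the box
    return refined                  # may then be dilated ~5% as described above
```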
8. Determine the depth distances of the static object and the background area.
For the pixel area corresponding to the static object and the pixel area corresponding to the background area detected in the previous step, feature points are extracted and then tracked and matched between multiple frames of images (between preceding and following frames), and the depth distance of the static object and the depth distance of the background area are determined. The specific steps are as follows:
(1) Feature point extraction
According to the static object selected by the frame, feature points are extracted from the corresponding region of the static object on the reference image. The feature points can be extracted using a general algorithm, such as the Harris, SIFT, SURF, or ORB algorithm.
In order to reduce the amount of calculation, a sparse method may be adopted: feature points of the image are extracted first, and corners (Corner Detection) may generally be selected as the feature points. Optional corner detection algorithms include FAST (Features from Accelerated Segment Test), SUSAN, the Harris operator, and so on. The Harris corner detection algorithm is used as an example below:
Define the matrix A as the structure tensor, as in formula (3):

$$A = \sum_{u,v} w(u,v) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \qquad \text{formula (3)}$$
where $I_x$ and $I_y$ are the gradients of a point on the image in the x and y directions, respectively, and $w(u,v)$ weights the points of the local window. A corner response function $M_c$ can then be defined as in formula (4):
$$M_c = \lambda_1 \lambda_2 - \kappa(\lambda_1 + \lambda_2)^2 = \det(A) - \kappa\,\operatorname{trace}^2(A) \qquad \text{formula (4)}$$
where $\det(A)$ is the determinant of the matrix A, $\operatorname{trace}(A)$ is its trace, and $\kappa$ is a parameter for adjusting sensitivity. With the threshold set to $M_{th}$, a point may be regarded as a feature point when $M_c > M_{th}$.
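The following sketch implements formulas (3) and (4) directly from image gradients; the 5x5 averaging window, kappa = 0.04, and the relative threshold are assumed values, and no non-maximum suppression is applied.

```python
import cv2
import numpy as np

def harris_feature_points(gray, kappa=0.04, rel_thresh=0.01):
    """Score Mc = det(A) - kappa * trace(A)^2 from the structure tensor A
    and keep pixels whose response exceeds the threshold Mth."""
    gray = np.float32(gray)
    ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # x gradient
    iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # y gradient
    # windowed entries of the structure tensor A
    ixx = cv2.boxFilter(ix * ix, -1, (5, 5))
    iyy = cv2.boxFilter(iy * iy, -1, (5, 5))
    ixy = cv2.boxFilter(ix * iy, -1, (5, 5))
    mc = (ixx * iyy - ixy * ixy) - kappa * (ixx + iyy) ** 2
    mth = rel_thresh * mc.max()
    ys, xs = np.where(mc > mth)
    return np.stack([xs, ys], axis=1)                 # (u, v) coordinates
```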
(2) KLT (Kanade-Lucas-Tomasi) feature point tracking and matching
Feature points can be tracked between multiple frames of images to calculate their movement (optical flow). Let h be the offset between two successive images: the former image is F(x) and the latter image is G(x) = F(x + h).
For each feature point, the displacement h between the two image frames can be obtained through the iteration of formula (5):

$$h_{k+1} = h_k + \frac{\sum_x w(x)\, F'(x + h_k)\,\big[G(x) - F(x + h_k)\big]}{\sum_x w(x)\, F'(x + h_k)^2}, \quad h_0 = 0 \qquad \text{formula (5)}$$
In order to ensure the reliability of the result, the roles are then reversed: with the later image as F(x) and the earlier image as G(x), the offset h' of the earlier image relative to the later image is computed for the same feature point. In theory h = -h', and a tracked point is considered correct when this condition holds, where h is the optical flow vector h = (Δu, Δv).
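A sketch of the KLT step with the forward-backward consistency check follows, using OpenCV's pyramidal Lucas-Kanade tracker; the 1-pixel round-trip bound is an assumption.

```python
import cv2
import numpy as np

def track_with_back_check(prev_gray, next_gray, points, max_err=1.0):
    """Track forward to get h, track the result backward to get h', and keep
    a point only when the round trip returns near the start (h = -h')."""
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 1, 2)
    fwd, st1, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    back, st2, _ = cv2.calcOpticalFlowPyrLK(next_gray, prev_gray, fwd, None)
    round_trip = np.linalg.norm(pts - back, axis=2).ravel()
    ok = (st1.ravel() == 1) & (st2.ravel() == 1) & (round_trip < max_err)
    flow = (fwd - pts).reshape(-1, 2)   # optical flow vectors h = (du, dv)
    return pts.reshape(-1, 2)[ok], flow[ok]
```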
(3) Updating feature points
During tracking, some feature points can no longer be observed and some new feature points appear as the viewing angle changes, so the feature point set can be updated continuously.
(4) Calculating the position of the center of a static object
Since the center of the static object does not necessarily coincide with a feature point, its position in each image needs to be determined using a fitted optical flow vector; the three-dimensional coordinates of the center can then be obtained by running a bundle adjustment (BA) algorithm.
The optical flow of the center point of the static object can be estimated from the optical flow vectors of the other feature points within the framed region of the static object, as in formula (6):
$$x_0 = \sum_{i}^{n} w_i x_i \qquad \text{formula (6)}$$
where $x_i$ is the optical flow vector of feature point $i$ within the frame, and $w_i$ is a weight determined according to the 2D image positions of the feature point and the center point, as in formula (7):
$$w_i = \frac{e^{-d_i^2/(2\sigma^2)}}{\sum_{j=1}^{n} e^{-d_j^2/(2\sigma^2)}} \qquad \text{formula (7)}$$
where σ is an adjustable parameter tuned empirically, and $d_i$ is the distance of feature point $i$ from the center point:

$$d_i = \sqrt{(u_i - u_0)^2 + (v_i - v_0)^2}$$
where $(u_i, v_i)$ are the 2D image pixel coordinates of feature point $i$, and $(u_0, v_0)$ are the 2D image pixel coordinates of the center point of the target frame.
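A sketch of formulas (6) and (7) follows; the Gaussian weight form (and sigma = 20 pixels) is an assumption consistent with the description, since the patent only states that the weight depends on the distance d_i and an empirically tuned sigma.

```python
import numpy as np

def center_optical_flow(flows, coords, center_uv, sigma=20.0):
    """Estimate the flow of the object center as a weighted sum of the
    feature-point flows, weighting each point by its distance to the center."""
    d = np.linalg.norm(coords - np.asarray(center_uv, dtype=float), axis=1)
    w = np.exp(-d * d / (2.0 * sigma * sigma))
    w /= w.sum()                                # weights sum to 1
    return (w[:, None] * flows).sum(axis=0)     # x0 = sum_i w_i * x_i
```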
Through steps (1) to (4), the parallax and optical flow of the center of the static object can be calculated, and thus the three-dimensional depth information of the center of the static object can be obtained.
In a similar way, three-dimensional depth information of the background region can be calculated.
9. Determine the pose at which the camera can observe the background area.
Referring to fig. 6, assume that the imaging device is located at "position 1" when taking the reference image I0, and that by a translation from "position 1" it can reach "position 2", from which the entire blocked region can be observed. Here d1 and d2 are the depth distance of the static object and the depth distance of the background area obtained in the previous step. The maximum width L of the static object can be obtained from the size of the static object in the image and its depth distance. The moving distance D from "position 1" to "position 2" can then be solved by formula (1):
$$D = \frac{L \cdot d_2}{d_2 - d_1} \qquad \text{formula (1)}$$
From formula (1) above, when the static object is very close to the background region, i.e., $d_2 \approx d_1$, D approaches infinity. Therefore, when the static object is too close to the background region, the static object cannot be filtered out, and a prompt message indicating that filtering is not possible may be sent to the user.
Fig. 6 shows the drone flying to the right, so the extreme position from which the whole occluded background area can be seen is the one at which the left edge of the occluded background area is just observed. Of course, the drone may instead fly to the left, in which case the corresponding limit viewing angle just sees the right edge of the occluded background area. After the translation distance D is determined, the orientation of the camera can be adjusted at the same time so that the right edge of the pixel area corresponding to the object to be filtered is centered. In this way the adjustment of the camera pose is completed and the desired viewing angle is obtained.
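A small sketch of formula (1) as reconstructed above follows; the refusal branch mirrors the prompt behavior described for d2 ≈ d1, and the relative gap test is an assumption.

```python
def move_distance(width_l, d1, d2, min_rel_gap=0.01):
    """Translation D needed so the whole occluded background becomes visible:
    D = L * d2 / (d2 - d1). Diverges as the object depth nears the background
    depth, in which case filtering should be refused."""
    if d2 - d1 <= min_rel_gap * d2:
        return None   # too close to the background: prompt that filtering fails
    return width_l * d2 / (d2 - d1)
```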
10. Computing a homography matrix H
The previous step determined the pose from which the blocked background area can be observed; the unmanned aerial vehicle can be automatically controlled to adjust to this pose, and an image In is captured at this pose.
Feature points can be extracted from the pixel region corresponding to the background area in the reference image I0 and matched against the image In to obtain a queue of matching points. A homography matrix H is then determined from the feature points and the matching points:

$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$
The H matrix represents the mapping relationship between two matched pixel points on two images acquired by the camera device at different poses, specifically as formula (8):

$$x_0 = H x_n \qquad \text{formula (8)}$$
where $x_0$ is a feature point in the background area of image I0 and $x_n$ is the point in image In matched with $x_0$. Using the H matrix to represent the mapping between two points actually requires the pixels to lie on the same plane in space. When the camera device is far from the shooting target, the background area can be treated as a plane; therefore, when d1 is small (e.g., less than 100 m), the user is also prompted that the filtering effect may be poor or that filtering is not possible.
Alternatively, a plane can be fitted to three-dimensional points on the background (the feature points have depth information and can be converted into three-dimensional points). The tolerance parameter used in fitting the plane (i.e., the maximum allowed unevenness) can be set according to the depth of the background (for example, 2% of the background depth); if no plane can be fitted, the user is prompted that the filtering effect is poor or filtering is not possible. The H matrix has 8 unknowns, so at least 4 pairs of points are required to calculate it.
A RANSAC (Random Sample Consensus) algorithm can also be adopted to effectively filter out feature points and matching points with poor matching quality, further improving the validity of the result and yielding a more accurate H matrix.
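A sketch of the H-matrix estimation with RANSAC follows, using OpenCV; the 3-pixel reprojection threshold is an assumption.

```python
import cv2
import numpy as np

def estimate_homography(pts_in, pts_i0, reproj_thresh=3.0):
    """Estimate H such that x0 = H * xn from matched background feature
    points, with RANSAC rejecting badly matched pairs. At least 4 pairs are
    needed for the 8 unknowns of H."""
    src = np.asarray(pts_in, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(pts_i0, dtype=np.float32).reshape(-1, 1, 2)
    h, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    return h, inlier_mask
```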
11. Fill the occluded background area in image I0 with image In to filter out the static object.
As shown in fig. 16, using the homography matrix H determined in the previous step, the image In can be projected into the camera pose corresponding to image I0 to obtain an image In'; the pixel region corresponding to the object to be filtered in image I0 is then replaced with the corresponding region of In', so that the static object is removed.
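A sketch of this final fill step follows; object_mask (a boolean mask of the region to replace in I0) is an assumed input.

```python
import cv2

def fill_occluded_region(i0, i_n, h, object_mask):
    """Warp In into I0's viewpoint with H (giving In'), then copy the warped
    pixels into I0 over the region of the object to be filtered."""
    height, width = i0.shape[:2]
    warped = cv2.warpPerspective(i_n, h, (width, height))  # In -> In'
    out = i0.copy()
    out[object_mask] = warped[object_mask]
    return out
```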
Furthermore, the present application also provides an image processing apparatus. As shown in fig. 17, the image processing apparatus includes a processor 171, a memory 172, and a computer program stored in the memory 172 and executable by the processor 171; when executing the computer program, the processor implements the following steps:
acquiring a first image acquired by a camera device in a first position, and determining a first pixel block corresponding to an object to be filtered in the first image;
acquiring a second image acquired by the camera device in a second position, wherein the second image comprises a second pixel block corresponding to a target object, and the target object is an object shielded by the object to be filtered in the first image;
and replacing the first pixel block in the first image through the second pixel block to generate a replaced first image.
In some embodiments, the processor, when being configured to acquire the second image acquired by the camera device in the second pose, is specifically configured to:
determining the second pose;
and controlling the camera device to move so as to adjust to the second pose and acquire the second image.
In some embodiments, the processor, when being configured to acquire the second image acquired by the camera device in the second pose, is specifically configured to:
determining the second pose;
and sending prompt information indicating the second pose to a user so that the user controls the camera device to move according to the prompt information to adjust to the second pose and acquire the second image.
In some embodiments, the processor, when being configured to determine the second pose, is specifically configured to:
acquiring the position information of the object to be filtered and the position information of the target object;
and determining the second pose according to the position information of the object to be filtered and the position information of the target object.
In some embodiments, the processor is configured to determine the second pose according to the position information of the object to be filtered and the position information of the target object, and specifically configured to:
and determining the second pose according to the position information of the object to be filtered, the position information of the target object and the size of the object to be filtered.
In some embodiments, the first position comprises a first position and a first orientation, and the second position comprises a second position and a second orientation;
the second position is located on a straight line which passes through the first position and is parallel to the plane where the object to be filtered is located, and the second orientation points to the position where the object to be filtered is located.
In certain embodiments, the second position is determined by:
determining a moving distance according to the position information of the object to be filtered, the position information of the target object and the size of the object to be filtered;
and determining the second position according to the first position and the moving distance.
In certain embodiments, the second orientation is determined by:
determining the second orientation according to the first position and the position of the object to be filtered in an image picture acquired by the camera device; or
determining the second orientation according to the first position, the positions of the left and right end points of the object to be filtered, and the positions of the left and right end points of the target object.
In some embodiments, the second pose comprises a second position and a second orientation, and the processor is configured to issue a prompt to the user indicating the second pose, and when the processor is configured to:
and displaying the image marked with the second position to a user, and displaying the rotation angle information adjusted to the second orientation.
In some embodiments, the processor, when being configured to acquire the second image acquired by the camera device in the second pose, is specifically configured to:
controlling the camera device to move so as to change the pose of the camera device and acquire multiple frames of images, and judging, for each frame of image, whether the image comprises the second pixel block;
and taking the image comprising the second pixel block as the second image.
In some embodiments, the processor, when being configured to determine whether the image includes the second pixel block, is specifically configured to:
determining a first characteristic point in the first pixel block, and determining a second characteristic point in a peripheral area of the first pixel block;
for each frame of the image, determining a first matching point of the first feature point in the image and a second matching point of the second feature point in the image;
and determining whether the image comprises the second pixel block according to the position relation of the first matching point and the second matching point in the image.
In some embodiments, the first feature point is located inside a first side of the first pixel block, and the second feature point is located outside a second side of the first pixel block, wherein the first side is opposite the second side;
the processor is configured to, when determining whether the image includes the second pixel block according to the positional relationship between the first matching point and the second matching point, specifically:
when the second matching point is determined to be located on the first side of the first matching point, determining that the image includes the second block of pixels.
In some embodiments, the plurality of second feature points are located in a ring-shaped pixel block surrounding the first pixel block, and when the processor is configured to determine whether the image includes the second pixel block according to the position relation of the first matching point and the second matching point, the processor is specifically configured to:
and when a preset number of the second matching points are positioned at one side of the first matching point, determining that the second image comprises the second pixel block.
In some embodiments, the imaging device is mounted on a movable platform, and the processor is configured to, when controlling the movement of the imaging device, specifically:
and controlling the movable platform to move so as to control the camera device to move.
In some embodiments, the camera device is mounted on the movable platform through a pan-tilt, and the processor is configured to, when controlling the camera device to move, specifically:
and controlling the movable platform to move, and/or controlling the holder to enable the camera device and the movable platform to generate relative motion so as to control the camera device to move.
In some embodiments, the camera device is mounted on the movable platform through the pan/tilt head, the second pose includes a second position and a second orientation, and the processor is configured to control the camera device to move so as to adjust the camera device to the second pose, and specifically configured to:
controlling the movable platform to move so that the camera device is located at the second position; and controlling the holder to rotate so as to adjust the orientation of the camera device to the second orientation.
In certain embodiments, the movable platform comprises any one of a drone, an unmanned vehicle, an unmanned ship.
In some embodiments, the processor, when determining the first pixel block corresponding to the object to be filtered out in the first image, is specifically configured to:
and responding to an instruction of a user, and determining a first pixel block corresponding to an object to be filtered from the first image.
In some embodiments, the instruction comprises a selection box input by a user through a human-computer interaction interface, and the selection box is used for selecting the static target object.
In some embodiments, the first pixel block is a pixel block selected by the frame, and the apparatus is further configured to:
and performing super-pixel segmentation processing on the first image to obtain a plurality of image areas, and adjusting the pixel block selected by the selection frame based on the plurality of image areas.
In some embodiments, the processor is specifically configured to, when adjusting the framed pixel block based on the plurality of image regions:
and adjusting the pixel block framed by the selection frame according to the ratio, for each of the plurality of image areas, of the part of the image area falling inside the selection frame to the whole image area.
In certain embodiments, the apparatus is further configured to:
and sending, to a user, prompt information indicating that the object to be filtered cannot be filtered out, after a preset first condition is triggered.
In certain embodiments, the preset first condition comprises one or more of:
a first distance between the object to be filtered and the target object is smaller than a first preset threshold value;
or a second distance between the target object and the camera device is smaller than a second preset threshold;
or the distance size relation between the first distance and the second distance does not meet a preset second condition.
In certain embodiments, the apparatus is further configured to:
determining the second block of pixels in the second image.
In some embodiments, the processor, when determining the second block of pixels in the second image, is specifically configured to:
determining a mapping relation between pixel points of the first image and pixel points of the second image;
and determining the mapping area of the first pixel block in the second image according to the mapping relation to serve as the second pixel block.
In some embodiments, when the processor is configured to determine the mapping relationship between the pixel points of the first image and the pixel points of the second image, the processor is specifically configured to:
extracting a third feature point in a peripheral area of the first pixel block, and determining a third matching point of the third feature point in the second image;
determining the mapping relation based on the third feature point and the third matching point.
In some embodiments, the processor, when determining the second block of pixels in the second image, is specifically configured to:
determining a ring-shaped pixel block surrounding the first pixel block in the first image;
determining a matching annular block in the second image that matches the annular block of pixels;
and taking a pixel block surrounded by the matching annular block in the second image as the second pixel block.
In some embodiments, the processor, when being configured to acquire the first image acquired by the camera device in the first pose, is specifically configured to:
acquiring a plurality of images acquired by a camera device in a first position;
determining the first image in the plurality of images.
In some embodiments, before determining the first pixel block corresponding to the object to be filtered out in the first image, the processor is further configured to:
determining category information of the object to be filtered through the plurality of images, wherein the category information is used for identifying the object to be filtered as a dynamic object or a static object;
and if the object to be filtered is a static object, executing the step of determining a first pixel block corresponding to the object to be filtered in the first image, acquiring a second image which is acquired by the camera device at a second position and comprises a second pixel block, and replacing the first pixel block in the first image through the second pixel block to generate a replaced first image.
In certain embodiments, the apparatus is further configured to:
if the object to be filtered is a dynamic object, executing the following steps:
determining a third pixel block corresponding to an object to be filtered in the first image;
determining a fourth pixel block at the pixel position, in the other images of the plurality of images except the first image, corresponding to the pixel position of the third pixel block;
determining the fourth pixel block as a third image of a static area from the other images;
replacing the third block of pixels in the first image with the fourth block of pixels in the third image.
In some embodiments, when the processor is configured to determine the category of the object to be filtered out from the plurality of images, the processor is specifically configured to:
and determining the category information of the object to be filtered according to the optical flows of all pixel points of the object to be filtered relative to other images except the first image in the plurality of images.
In some embodiments, the processor, when determining that the fourth pixel block is the third image of the static area from the other images, is specifically configured to:
determining a dynamic region in the other image;
for each object to be filtered out, performing the following steps:
and determining the pixel block at the pixel position, in the other images, corresponding to the pixel position of the third pixel block, in order of acquisition from near to far relative to the first image, until a pixel block at the corresponding pixel position does not overlap the dynamic area, and taking that other image as the third image.
In some embodiments, the difference between the other images and the first image exceeds a preset threshold; or
The other image is an image separated from the first image by a designated frame.
For details of the image processing performed by the image processing apparatus, reference is made to the description of each embodiment in the image processing method, and details are not repeated here.
Further, the present application provides another image processing apparatus, where the image processing apparatus includes a processor, a memory, and a computer program stored in the memory and executable by the processor, and when the processor executes the computer program, the following steps are implemented:
determining a first pixel block corresponding to a dynamic object in a first image;
determining pixel blocks of pixel positions of the first pixel blocks in corresponding pixel positions of a plurality of second images, wherein the plurality of second images and the first image are acquired at the same pose through a camera device;
determining a pixel block at the corresponding pixel position from the plurality of second images as a third image of a static area;
replacing the first pixel block in the first image with a pixel block of the corresponding pixel location in the third image.
In some embodiments, the processor, when being configured to determine the first pixel block corresponding to the dynamic object in the first image, is specifically configured to:
performing the following for the second image of each frame:
calculating optical flows of all pixel points of the first image relative to the second image;
determining target pixel points of which the modulus of the optical flow is larger than a preset threshold value from each pixel point of the first image;
and clustering the target pixel points to obtain a first pixel block corresponding to the dynamic object.
In some embodiments, when the processor is configured to determine, from the plurality of second images, a third image in which the pixel block at the corresponding pixel position is a static area, the processor is specifically configured to:
determining dynamic regions in the plurality of second images;
for each of the first pixel blocks, performing the following steps:
and determining the pixel block at the pixel position, in the second image, corresponding to the pixel position of the first pixel block, in order of acquisition from near to far relative to the first image, until the pixel block at the corresponding pixel position does not overlap the dynamic region, and taking that second image as the third image.
In some embodiments, the difference between the second image and the first image exceeds a preset threshold; or
The second image is an image spaced from the first image by a specified frame.
For details of the image processing performed by the image processing apparatus, reference is made to the description of each embodiment in the image processing method, and details are not repeated here.
In addition, the present application also provides a movable platform, which can be any equipment such as an unmanned aerial vehicle, an unmanned ship, an intelligent robot, or a handheld gimbal. The movable platform includes a camera device and an image processing device; the image processing device may implement any image processing method in the embodiments of the present application, and for specific implementation details reference is made to the description of each embodiment of the image processing methods, which is not repeated here.
Accordingly, the embodiments of the present specification further provide a computer storage medium, in which a program is stored, and the program, when executed by a processor, implements the image processing method in any of the above embodiments.
Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of the storage medium of the computer include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by a computing device.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method and apparatus provided by the embodiments of the present invention are described in detail above, and the principle and the embodiments of the present invention are explained in detail herein by using specific examples, and the description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (75)

1. An image processing method, characterized in that the method comprises:
acquiring a first image acquired by a camera device in a first position, and determining a first pixel block corresponding to an object to be filtered in the first image;
acquiring a second image acquired by the camera device in a second position, wherein the second image comprises a second pixel block corresponding to a target object, and the target object is an object shielded by the object to be filtered in the first image;
and replacing the first pixel block in the first image through the second pixel block to generate a replaced first image.
2. The method of claim 1, wherein acquiring a second image captured by the camera in a second position comprises:
determining the second pose;
and controlling the camera device to move so as to adjust to the second pose and acquire the second image.
3. The method of claim 1, wherein acquiring a second image acquired by the camera device in a second position comprises:
determining the second pose;
and sending prompt information indicating the second pose to a user so that the user controls the camera device to move according to the prompt information to adjust to the second pose and acquire the second image.
4. The method of claim 2 or 3, wherein determining the second pose comprises:
acquiring the position information of the object to be filtered and the position information of the target object;
and determining the second pose according to the position information of the object to be filtered and the position information of the target object.
5. The method of claim 4, wherein determining the second pose according to the position information of the object to be filtered and the position information of the target object comprises:
and determining the second pose according to the position information of the object to be filtered, the position information of the target object and the size of the object to be filtered.
6. The method of claim 5, wherein the first position comprises a first position and a first orientation, and the second position comprises a second position and a second orientation;
the second position is located on a straight line which passes through the first position and is parallel to the plane where the object to be filtered is located, and the second orientation points to the position where the object to be filtered is located.
7. The method of claim 6, wherein the second location is determined by:
determining a moving distance according to the position information of the object to be filtered, the position information of the target object and the size of the object to be filtered;
and determining the second position according to the first position and the moving distance.
8. The method of claim 6, wherein the second orientation is determined by:
determining the second orientation according to the first position and the position of the object to be filtered in an image picture acquired by the camera device; or
determining the second orientation according to the first position, the positions of the left and right end points of the object to be filtered, and the positions of the left and right end points of the target object.
9. The method of claim 3, wherein the second pose comprises a second position and a second orientation, and wherein issuing a prompt to a user indicating the second pose comprises:
and displaying the image marked with the second position to a user, and displaying the rotation angle information adjusted to the second orientation.
10. The method of claim 1, wherein acquiring a second image captured by the camera in a second position comprises:
controlling the camera device to move so as to change the pose of the camera device and acquire multiple frames of images, and judging, for each frame of image, whether the image comprises the second pixel block;
and taking the image comprising the second pixel block as the second image.
11. The method of claim 10, wherein determining whether the second block of pixels is included in the image comprises:
determining a first characteristic point in the first pixel block, and determining a second characteristic point in a peripheral area of the first pixel block;
for each frame of the image, determining a first matching point of the first feature point in the image and a second matching point of the second feature point in the image;
and determining whether the image comprises the second pixel block according to the position relation of the first matching point and the second matching point in the image.
12. The method of claim 11, wherein the first feature point is located inside a first side of the first pixel block and the second feature point is located outside a second side of the first pixel block, wherein the first side is opposite the second side;
determining whether the image comprises the second pixel block according to the position relation of the first matching point and the second matching point, wherein the determining comprises the following steps:
when the second matching point is determined to be located on the first side of the first matching point, determining that the image includes the second block of pixels.
13. The method according to claim 11, wherein a plurality of the second feature points are located in a ring-shaped pixel block surrounding the first pixel block, and determining whether the image includes the second pixel block according to a positional relationship between the first matching point and the second matching point comprises:
and when a preset number of the second matching points are positioned at one side of the first matching point, determining that the second image comprises the second pixel block.
14. The method of claim 2, 3 or 10, wherein the camera is mounted on a movable platform, and wherein controlling the camera to move comprises:
and controlling the movable platform to move so as to control the camera device to move.
15. The method of claim 2, 3 or 10, wherein the camera is carried on a movable platform by a pan-tilt; the controlling the camera device to move comprises:
and controlling the movable platform to move, and/or controlling the holder to enable the camera device and the movable platform to generate relative motion so as to control the camera device to move.
16. The method of claim 2 or 3, wherein the camera is carried on a movable platform by a pan-tilt head, the second position comprises a second position and a second orientation, and controlling the camera to move to adjust the camera to the second position comprises:
controlling the movable platform to move so that the camera device is located at the second position; and controlling the holder to rotate so as to adjust the orientation of the camera device to the second orientation.
17. The method of any one of claims 14-16, wherein the movable platform comprises any one of a drone, an unmanned vehicle, and an unmanned ship.
18. The method according to any one of claims 1 to 17, wherein determining a first pixel block corresponding to an object to be filtered out in the first image comprises:
and responding to an instruction of a user, and determining a first pixel block corresponding to an object to be filtered from the first image.
19. The method of claim 18, wherein the instruction comprises a selection box input by a user through a human-machine interface, and the selection box is used for selecting the static target object.
20. The method of claim 19, wherein the first block of pixels is a block of pixels framed by the frame, the method further comprising:
and performing super-pixel segmentation processing on the first image to obtain a plurality of image areas, and adjusting the pixel block selected by the selection frame based on the plurality of image areas.
21. The method of claim 20, wherein adjusting the boxed block of pixels based on the plurality of image regions comprises:
and adjusting the pixel block framed by the selection frame according to the ratio, for each of the plurality of image areas, of the part of the image area falling inside the selection frame to the whole image area.
22. The method according to any one of claims 1-21, further comprising:
and sending, to a user, prompt information indicating that the object to be filtered cannot be filtered out, after a preset first condition is triggered.
23. The method of claim 22, wherein the preset first condition comprises one or more of:
a first distance between the object to be filtered and the target object is smaller than a first preset threshold value;
or a second distance between the target object and the camera device is smaller than a second preset threshold;
or the distance size relation between the first distance and the second distance does not meet a preset second condition.
24. The method according to any one of claims 1-23, further comprising:
determining the second block of pixels in the second image.
25. The method of claim 24, wherein determining the second block of pixels in the second image comprises:
determining a mapping relation between pixel points of the first image and pixel points of the second image;
and determining the mapping area of the first pixel block in the second image according to the mapping relation to serve as the second pixel block.
26. The method of claim 25, wherein determining the mapping relationship between the pixel points of the first image and the pixel points of the second image comprises:
extracting a third feature point in a peripheral area of the first pixel block, and determining a third matching point of the third feature point in the second image;
determining the mapping relation based on the third feature point and the third matching point.
27. The method of claim 24, wherein determining the second block of pixels in the second image comprises:
determining a ring-shaped pixel block surrounding the first pixel block in the first image;
determining a matching annular block in the second image that matches the annular block of pixels;
and taking a pixel block surrounded by the matching annular block in the second image as the second pixel block.
28. The method of any one of claims 1-27, wherein acquiring the first image acquired by the camera in the first pose comprises:
acquiring a plurality of images acquired by a camera device in a first position;
determining the first image in the plurality of images.
29. The method according to claim 28, prior to determining the first pixel block corresponding to the object to be filtered out in the first image, further comprising:
determining category information of the object to be filtered through the plurality of images, wherein the category information is used for identifying the object to be filtered as a dynamic object or a static object;
and if the object to be filtered is a static object, executing the step of determining a first pixel block corresponding to the object to be filtered in the first image, acquiring a second image which is acquired by the camera device at a second position and comprises a second pixel block, and replacing the first pixel block in the first image through the second pixel block to generate a replaced first image.
30. The method according to claim 29, wherein if the object to be filtered is a dynamic object, the following steps are performed:
determining a third pixel block corresponding to an object to be filtered in the first image;
determining a fourth pixel block at the pixel position, in the other images of the plurality of images except the first image, corresponding to the pixel position of the third pixel block;
determining the fourth pixel block as a third image of a static area from the other images;
replacing the third block of pixels in the first image with the fourth block of pixels in the third image.
31. The method according to claim 29 or 30, wherein determining the category of the object to be filtered out from the plurality of images comprises:
and determining the category information of the object to be filtered according to the optical flow of each pixel point of the object to be filtered relative to the other images.
32. The method of claim 30, wherein determining, from the other images, the third image in which the fourth pixel block is a static region comprises:
determining the dynamic regions in the other images;
for each object to be filtered out, performing the following steps:
and checking, in order of the other images from nearest to farthest from the first image in acquisition order, the pixel block at the pixel position corresponding to the third pixel block in each of the other images, until the pixel block at the corresponding pixel position does not overlap the dynamic region, and taking that other image as the third image.
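Claim 32's nearest-first search can be sketched as a short loop, assuming the other images are supplied already sorted by acquisition order relative to the first image and that each has a precomputed boolean mask of its dynamic regions (names are illustrative):

```python
import numpy as np

def find_third_image(other_images, dynamic_masks, block_rect):
    """Scan the other images, nearest to the first image first, and return
    the first one whose co-located pixel block avoids every dynamic region."""
    x, y, w, h = block_rect  # pixel position of the third pixel block
    for image, dyn in zip(other_images, dynamic_masks):
        # Fourth pixel block: the same pixel position in this other image.
        if not dyn[y:y + h, x:x + w].any():  # no overlap with dynamic region
            return image                     # use this image as the third image
    return None  # no static co-located block found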
33. The method according to any one of claims 29-32, wherein the difference between the other image and the first image exceeds a preset threshold; or
the other image is an image separated from the first image by a specified number of frames.
34. An image processing method, characterized in that the method comprises:
determining a first pixel block corresponding to a dynamic object in a first image;
determining, in a plurality of second images, the pixel blocks at the pixel positions corresponding to the pixel position of the first pixel block, wherein the plurality of second images and the first image are acquired by a camera device at the same pose;
determining, from the plurality of second images, a third image in which the pixel block at the corresponding pixel position is a static region;
replacing the first pixel block in the first image with the pixel block at the corresponding pixel position in the third image.
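Because the second images are acquired at the same pose as the first image, the replacement step of claim 34 reduces to copying the co-located block. A minimal NumPy sketch (function and parameter names assumed):

```python
import numpy as np

def replace_block(first_image: np.ndarray,
                  third_image: np.ndarray,
                  block_rect: tuple) -> np.ndarray:
    x, y, w, h = block_rect
    out = first_image.copy()
    # Copy the co-located static pixel block from the third image.
    out[y:y + h, x:x + w] = third_image[y:y + h, x:x + w]
    return out
```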
35. The method of claim 34, wherein determining the first pixel block corresponding to the dynamic object in the first image comprises:
performing the following for the second image of each frame:
calculating optical flows of all pixel points of the first image relative to the second image;
determining target pixel points of which the modulus of the optical flow is larger than a preset threshold value from each pixel point of the first image;
and clustering the target pixel points to obtain a first pixel block corresponding to the dynamic object.
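A sketch of claim 35 in Python/OpenCV, assuming connected-component labelling as the clustering step (DBSCAN over pixel coordinates would serve equally well); the flow threshold and minimum area below are illustrative values, not from the patent:

```python
import cv2
import numpy as np

def dynamic_blocks(first_gray, second_gray, flow_thresh=2.0, min_area=100):
    flow = cv2.calcOpticalFlowFarneback(first_gray, second_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Target pixel points: modulus of the optical flow above the threshold.
    moving = (np.linalg.norm(flow, axis=2) > flow_thresh).astype(np.uint8)

    # Cluster the target pixel points into first pixel blocks.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(moving)
    blocks = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            blocks.append((x, y, w, h))
    return blocks
```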
36. The method of claim 35, wherein determining, from the plurality of second images, the third image in which the pixel block at the corresponding pixel position is a static region comprises:
determining dynamic regions in the plurality of second images;
for each of the first pixel blocks, performing the following steps:
and checking, in order of the second images from nearest to farthest from the first image in acquisition order, the pixel block at the pixel position corresponding to the first pixel block in each second image, until the pixel block at the corresponding position does not overlap the dynamic region, and taking that second image as the third image.
37. The method according to any one of claims 34-36, wherein the difference between the second image and the first image exceeds a preset threshold; or
the second image is an image separated from the first image by a specified number of frames.
38. An image processing apparatus comprising a processor, a memory, and a computer program stored in the memory and executable by the processor, wherein the processor, when executing the computer program, implements the steps of:
acquiring a first image acquired by a camera device in a first pose, and determining a first pixel block corresponding to an object to be filtered in the first image;
acquiring a second image acquired by the camera device in a second pose, wherein the second image comprises a second pixel block corresponding to a target object, and the target object is an object occluded by the object to be filtered in the first image;
and replacing the first pixel block in the first image with the second pixel block to generate a replaced first image.
39. The apparatus according to claim 38, wherein, when acquiring the second image acquired by the camera device in the second pose, the processor is specifically configured to:
determining the second pose;
and controlling the camera device to move so as to adjust to the second pose and acquire the second image.
40. The apparatus according to claim 38, wherein, when acquiring the second image acquired by the camera device in the second pose, the processor is specifically configured to:
determining the second pose;
and sending prompt information indicating the second pose to a user so that the user controls the camera device to move according to the prompt information to adjust to the second pose and acquire the second image.
41. The apparatus according to claim 39 or 40, wherein, when determining the second pose, the processor is specifically configured to:
acquiring the position information of the object to be filtered and the position information of the target object;
and determining the second pose according to the position information of the object to be filtered and the position information of the target object.
42. The apparatus according to claim 41, wherein, when determining the second pose according to the position information of the object to be filtered and the position information of the target object, the processor is specifically configured to:
and determining the second pose according to the position information of the object to be filtered, the position information of the target object and the size of the object to be filtered.
43. The apparatus of claim 42, wherein the first pose comprises a first position and a first orientation, and the second pose comprises a second position and a second orientation;
the second position is located on a straight line which passes through the first position and is parallel to a plane where the object to be filtered is located, and the second orientation points to the position where the object to be filtered is located.
44. The apparatus of claim 43, wherein the second position is determined by:
determining a moving distance according to the position information of the object to be filtered, the position information of the target object and the size of the object to be filtered;
and determining the second position according to the first position and the moving distance.
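An illustrative geometry sketch for claims 43-44. The claims do not spell out the moving-distance formula; the similar-triangles bound below, and all names, are assumptions: the camera slides along a line through the first position, parallel to the plane of the object to be filtered, until the sight line to the target clears the occluder.

```python
import numpy as np

def second_position(first_pos: np.ndarray,
                    occluder_pos: np.ndarray,
                    target_pos: np.ndarray,
                    occluder_width: float) -> np.ndarray:
    d1 = np.linalg.norm(occluder_pos - first_pos)  # camera -> object to filter
    d2 = np.linalg.norm(target_pos - first_pos)    # camera -> occluded target

    # Direction of the straight line of claim 43: through the first position,
    # parallel to the occluder's plane (a horizontal sideways shift here).
    line_dir = np.cross(target_pos - first_pos, np.array([0.0, 0.0, 1.0]))
    line_dir /= np.linalg.norm(line_dir)

    # Hypothetical similar-triangles bound: a lateral shift s at the camera
    # moves the sight line by s * (d2 - d1) / d2 at the occluder's depth,
    # which must exceed half the occluder's width to expose the target.
    move = 0.5 * occluder_width * d2 / max(d2 - d1, 1e-6)
    return first_pos + move * line_dir
```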
45. The apparatus of claim 43 or 44, wherein the second orientation is determined by:
determining the second orientation according to the first position and the position of the object to be filtered in an image picture acquired by the camera device; or
determining the second orientation according to the first position, the positions of the left and right end points of the object to be filtered, and the positions of the left and right end points of the target object.
46. The apparatus of claim 40, wherein the second pose comprises a second position and a second orientation, and wherein, when sending the prompt information indicating the second pose to the user, the processor is specifically configured to:
and displaying, to the user, an image on which the second position is marked, and displaying rotation angle information for adjusting to the second orientation.
47. The apparatus according to claim 38, wherein, when acquiring the second image acquired by the camera device in the second pose, the processor is specifically configured to:
controlling the camera device to move so as to change the pose of the camera device and acquire a plurality of frames of images, and determining, for each frame of image, whether the image comprises the second pixel block;
and taking the image comprising the second pixel block as the second image.
48. The apparatus of claim 47, wherein, when determining whether the second pixel block is included in the image, the processor is specifically configured to:
determining a first characteristic point in the first pixel block, and determining a second characteristic point in a peripheral area of the first pixel block;
for each frame of the image, determining a first matching point of the first feature point in the image and a second matching point of the second feature point in the image;
and determining whether the image comprises the second pixel block according to the position relation of the first matching point and the second matching point in the image.
49. The apparatus of claim 48, wherein the first feature point is located inside a first side of the first pixel block and the second feature point is located outside a second side of the first pixel block, the first side being opposite to the second side;
wherein, when determining whether the image includes the second pixel block according to the positional relationship between the first matching point and the second matching point, the processor is specifically configured to:
when the second matching point is determined to be located on the first side of the first matching point, determining that the image includes the second pixel block.
50. The apparatus according to claim 48, wherein a plurality of the second feature points are located in an annular pixel block surrounding the first pixel block, and wherein, when determining whether the image includes the second pixel block according to the positional relationship between the first matching point and the second matching points, the processor is specifically configured to:
and when a preset number of the second matching points are located on one side of the first matching point, determining that the image comprises the second pixel block.
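One possible reading of the check in claims 48-50, sketched in Python under the assumption that feature matching has already produced image coordinates for the matching points: with the second feature points taken on a ring around the first pixel block, the block is judged exposed once enough ring matches end up on one side of the first matching point. The side test and names are illustrative.

```python
import numpy as np

def contains_second_block(first_match: np.ndarray,
                          ring_matches: list,
                          required: int) -> bool:
    # Signed horizontal offset of each ring match from the first match;
    # a consistent sign suggests the occluder has shifted off the background.
    sides = [np.sign(pt[0] - first_match[0]) for pt in ring_matches]
    return abs(sum(sides)) >= required
```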
51. The apparatus of claim 39, 40 or 47, wherein the camera device is mounted on a movable platform, and wherein, when controlling the camera device to move, the processor is specifically configured to:
and controlling the movable platform to move so as to control the camera device to move.
52. The apparatus according to claim 39, 40 or 47, wherein the camera device is mounted on a movable platform via a pan-tilt head, and wherein, when controlling the camera device to move, the processor is specifically configured to:
and controlling the movable platform to move, and/or controlling the pan-tilt head to generate relative motion between the camera device and the movable platform, so as to control the camera device to move.
53. The apparatus according to claim 39 or 40, wherein the camera device is mounted on the movable platform via a pan-tilt head, wherein the second pose comprises a second position and a second orientation, and wherein, when controlling the camera device to move so as to adjust the camera device to the second pose, the processor is specifically configured to:
controlling the movable platform to move so that the camera device is located at the second position; and controlling the pan-tilt head to rotate so as to adjust the orientation of the camera device to the second orientation.
54. The apparatus of any one of claims 51-53, wherein the movable platform comprises any one of a drone, an unmanned vehicle, or an unmanned ship.
55. The apparatus according to any one of claims 38-54, wherein, when determining the first pixel block corresponding to the object to be filtered in the first image, the processor is specifically configured to:
and responding to an instruction of a user, and determining a first pixel block corresponding to an object to be filtered from the first image.
56. The apparatus of claim 55, wherein the instruction comprises a selection box input by a user through a human-machine interface, and wherein the selection box is used for selecting the static target object.
57. The apparatus of claim 56, wherein the first pixel block is a pixel block framed by the selection box, and the processor is further configured to:
and performing superpixel segmentation processing on the first image to obtain a plurality of image regions, and adjusting the pixel block framed by the selection box based on the plurality of image regions.
58. The apparatus as claimed in claim 57, wherein, when adjusting the pixel block framed by the selection box based on the plurality of image regions, the processor is specifically configured to:
and adjusting the pixel block framed by the selection box according to the ratio of the part of each of the plurality of image regions that falls within the selection box to that image region.
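A sketch of claims 57-58 using SLIC superpixels from scikit-image; the 0.5 inclusion ratio is an assumed cutoff, not a value from the patent, and all names are illustrative:

```python
import numpy as np
from skimage.segmentation import slic

def refine_selection(image, box, n_segments=400, ratio_thresh=0.5):
    x, y, w, h = box
    segments = slic(image, n_segments=n_segments, start_label=0)

    inside = np.zeros(segments.shape, dtype=bool)
    inside[y:y + h, x:x + w] = True

    keep = np.zeros_like(inside)
    for label in np.unique(segments):
        region = segments == label
        # Ratio of the part of this image region falling inside the
        # selection box to the whole region (claim 58).
        if region[inside].sum() / region.sum() >= ratio_thresh:
            keep |= region  # snap the selection to superpixel borders
    return keep  # boolean mask: the adjusted first pixel block
```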
59. The apparatus of any one of claims 38-58, wherein the processor is further configured to:
and sending, to a user, prompt information indicating that the object to be filtered cannot be filtered out, after a preset first condition is triggered.
60. The apparatus according to claim 59, wherein the preset first condition comprises one or more of:
a first distance between the object to be filtered and the target object is smaller than a first preset threshold value;
or a second distance between the target object and the camera device is smaller than a second preset threshold;
or the magnitude relationship between the first distance and the second distance does not meet a preset second condition.
61. The apparatus of any one of claims 38-60, wherein the processor is further configured to:
determining the second block of pixels in the second image.
62. The apparatus of claim 61, wherein, when determining the second pixel block in the second image, the processor is specifically configured to:
determining a mapping relation between pixel points of the first image and pixel points of the second image;
and determining the mapping area of the first pixel block in the second image according to the mapping relation to serve as the second pixel block.
63. The apparatus according to claim 62, wherein the processor, when determining the mapping relationship between the pixel points of the first image and the pixel points of the second image, is specifically configured to:
extracting a third feature point in a peripheral area of the first pixel block, and determining a third matching point of the third feature point in the second image;
determining the mapping relation based on the third feature point and the third matching point.
64. The apparatus of claim 61, wherein, when determining the second pixel block in the second image, the processor is specifically configured to:
determining, in the first image, an annular pixel block surrounding the first pixel block;
determining, in the second image, a matching annular block that matches the annular pixel block;
and taking the pixel block surrounded by the matching annular block in the second image as the second pixel block.
65. The apparatus according to any one of claims 38-64, wherein, when acquiring the first image acquired by the camera device in the first pose, the processor is specifically configured to:
acquiring a plurality of images acquired by the camera device in the first pose;
determining the first image in the plurality of images.
66. The apparatus of claim 65, wherein the processor, prior to determining the first pixel block corresponding to the object to be filtered in the first image, is further configured to:
determining category information of the object to be filtered through the plurality of images, wherein the category information identifies the object to be filtered as a dynamic object or a static object;
and if the object to be filtered is a static object, executing the steps of determining a first pixel block corresponding to the object to be filtered in the first image, acquiring a second image which is acquired by the camera device in a second pose and comprises a second pixel block, and replacing the first pixel block in the first image with the second pixel block to generate a replaced first image.
67. The apparatus of claim 66, wherein the processor is further configured to:
if the object to be filtered is a dynamic object, execute the following steps:
determining a third pixel block corresponding to the object to be filtered in the first image;
determining a fourth pixel block located, in the images of the plurality of images other than the first image, at the pixel position corresponding to the pixel position of the third pixel block;
determining, from the other images, a third image in which the fourth pixel block is a static region;
replacing the third pixel block in the first image with the fourth pixel block in the third image.
68. The apparatus according to claim 66 or 67, wherein, when determining the category information of the object to be filtered through the plurality of images, the processor is specifically configured to:
and determining the category information of the object to be filtered according to the optical flow of each pixel point of the object to be filtered relative to the images of the plurality of images other than the first image.
69. The apparatus of claim 67, wherein, when determining, from the other images, the third image in which the fourth pixel block is a static region, the processor is specifically configured to:
determining the dynamic regions in the other images;
for each object to be filtered out, performing the following steps:
and checking, in order of the other images from nearest to farthest from the first image in acquisition order, the pixel block at the pixel position corresponding to the third pixel block in each of the other images, until the pixel block at the corresponding pixel position does not overlap the dynamic region, and taking that other image as the third image.
70. The apparatus according to any one of claims 66-69, wherein the difference between the other image and the first image exceeds a preset threshold; or
the other image is an image separated from the first image by a specified number of frames.
71. An image processing apparatus comprising a processor, a memory, and a computer program stored in the memory and executable by the processor, wherein the processor, when executing the computer program, implements the steps of:
determining a first pixel block corresponding to a dynamic object in a first image;
determining, in a plurality of second images, the pixel blocks at the pixel positions corresponding to the pixel position of the first pixel block, wherein the plurality of second images and the first image are acquired by a camera device at the same pose;
determining, from the plurality of second images, a third image in which the pixel block at the corresponding pixel position is a static region;
replacing the first pixel block in the first image with the pixel block at the corresponding pixel position in the third image.
72. The apparatus of claim 71, wherein, when determining the first pixel block corresponding to the dynamic object in the first image, the processor is specifically configured to:
performing the following for the second image of each frame:
calculating optical flows of all pixel points of the first image relative to the second image;
determining target pixel points of which the modulus of the optical flow is larger than a preset threshold value from each pixel point of the first image;
and clustering the target pixel points to obtain a first pixel block corresponding to the dynamic object.
73. The apparatus according to claim 71, wherein, when determining, from the plurality of second images, the third image in which the pixel block at the corresponding pixel position is a static region, the processor is specifically configured to:
determining dynamic regions in the plurality of second images;
for each of the first pixel blocks, performing the following steps:
and checking, in order of the second images from nearest to farthest from the first image in acquisition order, the pixel block at the pixel position corresponding to the first pixel block in each second image, until the pixel block at the corresponding pixel position does not overlap the dynamic region, and taking that second image as the third image.
74. The apparatus according to any one of claims 71-73, wherein the difference between the second image and the first image exceeds a preset threshold; or
the second image is an image separated from the first image by a specified number of frames.
75. A movable platform comprising a camera device and an image processing apparatus according to any one of claims 38-74.
CN202080039127.8A 2020-08-26 2020-08-26 Image processing method and device and movable platform Pending CN113950705A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/111450 WO2022040988A1 (en) 2020-08-26 2020-08-26 Image processing method and apparatus, and movable platform

Publications (1)

Publication Number Publication Date
CN113950705A true CN113950705A (en) 2022-01-18

Family

ID=79327044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080039127.8A Pending CN113950705A (en) 2020-08-26 2020-08-26 Image processing method and device and movable platform

Country Status (2)

Country Link
CN (1) CN113950705A (en)
WO (1) WO2022040988A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8615111B2 (en) * 2009-10-30 2013-12-24 Csr Technology Inc. Method and apparatus for image detection with undesired object removal
CN106056534B (en) * 2016-05-31 2022-03-18 中国科学院深圳先进技术研究院 Intelligent glasses-based method and device for perspective of shelters
CN106791393B (en) * 2016-12-20 2019-05-17 维沃移动通信有限公司 A kind of image pickup method and mobile terminal
CN109035185A (en) * 2018-06-29 2018-12-18 努比亚技术有限公司 A kind of image processing method and terminal
CN109167893B (en) * 2018-10-23 2021-04-27 Oppo广东移动通信有限公司 Shot image processing method and device, storage medium and mobile terminal

Also Published As

Publication number Publication date
WO2022040988A1 (en) 2022-03-03

Similar Documents

Publication Publication Date Title
EP3457683B1 (en) Dynamic generation of image of a scene based on removal of undesired object present in the scene
CN109690620B (en) Three-dimensional model generation device and three-dimensional model generation method
CN108702444B (en) Image processing method, unmanned aerial vehicle and system
CN109348119B (en) Panoramic monitoring system
US20200234451A1 (en) Automatic background replacement for single-image and multi-view captures
CN102148965B (en) Video monitoring system for multi-target tracking close-up shooting
JP5740884B2 (en) AR navigation for repeated shooting and system, method and program for difference extraction
US20180182114A1 (en) Generation apparatus of virtual viewpoint image, generation method, and storage medium
US11783443B2 (en) Extraction of standardized images from a single view or multi-view capture
CN110799921A (en) Shooting method and device and unmanned aerial vehicle
CN110148223B (en) Method and system for concentrating and expressing surveillance video target in three-dimensional geographic scene model
CN111737518A (en) Image display method and device based on three-dimensional scene model and electronic equipment
CN109741404B (en) Light field acquisition method based on mobile equipment
CN110892714A (en) Control method, device and equipment of mobile robot and storage medium
CN112207821B (en) Target searching method of visual robot and robot
CN112348958A (en) Method, device and system for acquiring key frame image and three-dimensional reconstruction method
US20170201723A1 (en) Method of providing object image based on object tracking
CN111683221B (en) Real-time video monitoring method and system for natural resources embedded with vector red line data
US20120162412A1 (en) Image matting apparatus using multiple cameras and method of generating alpha maps
WO2021217403A1 (en) Method and apparatus for controlling movable platform, and device and storage medium
CN112640419B (en) Following method, movable platform, device and storage medium
JP6483661B2 (en) Imaging control apparatus, imaging control method, and program
CN105467741A (en) Panoramic shooting method and terminal
CN113950705A (en) Image processing method and device and movable platform
CN108269278B (en) Scene modeling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination