CN112585946A - Image shooting method, image shooting device, movable platform and storage medium - Google Patents

Image shooting method, image shooting device, movable platform and storage medium

Info

Publication number
CN112585946A
CN112585946A (application CN202080004316.1A)
Authority
CN
China
Prior art keywords
image
target object
movable platform
target
camera device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080004316.1A
Other languages
Chinese (zh)
Inventor
胡晓翔
张李亮
李鑫超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN112585946A publication Critical patent/CN112585946A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/64 Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • H04N23/67 Focus control based on electronic image sensor signals
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)

Abstract

An image photographing method includes: when a movable platform is in a manual photographing mode, manually photographing a target object through a photographing device to generate a first image; when the movable platform is in an automatic photographing mode, automatically acquiring a second image through the photographing device; determining whether the second image contains the target object by performing feature matching on the first image and the second image; and, if the second image contains the target object, controlling the photographing device to photograph the target object in focus. The method can automatically and accurately photograph the target object, avoids manual photographing operations on the target object, and improves operation efficiency.

Description

Image shooting method, image shooting device, movable platform and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image capturing method, an image capturing apparatus, a movable platform, and a storage medium.
Background
With the development of science and technology, unmanned aerial vehicles (UAVs) can replace manual labor in inspection operations.
During an inspection, a technician can control the UAV to fly near a target object such as an electric tower or a bridge, and the UAV transmits the captured preview picture back to the ground in real time. Based on the returned preview picture, the technician controls the UAV to adjust its pose so as to aim at the target object for shooting, and then performs anomaly analysis on the captured image to determine whether the target object is abnormal.
Each time the target object needs to be shot, the technician must precisely control the UAV to adjust it into a suitable pose for shooting, which is inefficient and inconvenient to operate.
Disclosure of Invention
The embodiments of the present invention provide an image capturing method, an image capturing apparatus, a movable platform, and a storage medium, which are used to improve the shooting efficiency of a movable platform.
In a first aspect, an embodiment of the present invention provides an image capturing method, which is applied to a movable platform on which a camera device is mounted; the method comprises the following steps:
when the movable platform is in a manual shooting mode, manually shooting a target object through the camera device to generate a first image;
when the movable platform is in an automatic shooting mode, automatically acquiring a second image through the camera device;
determining whether the second image contains the target object by performing feature matching on the first image and the second image;
and if the second image contains the target object, controlling the camera device to carry out focusing shooting on the target object.
In a second aspect, an embodiment of the present invention provides an image capturing apparatus provided on a movable platform, where a camera device is mounted on the movable platform; the apparatus includes a memory and a processor, wherein the memory stores executable code that, when executed by the processor, causes the processor to:
when the movable platform is in a manual shooting mode, manually shooting a target object through the camera device to generate a first image;
when the movable platform is in an automatic shooting mode, automatically acquiring a second image through the camera device;
determining whether the second image contains the target object by performing feature matching on the first image and the second image;
and if the second image contains the target object, controlling the camera device to carry out focusing shooting on the target object.
In a third aspect, an embodiment of the present invention provides a movable platform, including:
a body;
the power system is arranged on the machine body and used for providing power for the movable platform;
the camera device is arranged on the machine body and used for collecting images;
and one or more processors configured to perform:
when the movable platform is in a manual shooting mode, manually shooting a target object through the camera device to generate a first image;
when the movable platform is in an automatic shooting mode, automatically acquiring a second image through the camera device;
determining whether the second image contains the target object by performing feature matching on the first image and the second image;
and if the second image contains the target object, controlling the camera device to carry out focusing shooting on the target object.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium in which program instructions are stored, the program instructions being used to implement the image capturing method according to the first aspect of the present invention.
In the method provided by the embodiment of the present invention, the movable platform may capture a first image in the manual capture mode and a second image in the automatic capture mode, where the first image contains a target object. After the two images are obtained, feature matching may be performed on them to determine whether the second image also contains the target object; if it does, the target object is shot in focus, so that the finally obtained image contains the target object. The captured image containing the target object can subsequently be observed and analyzed to determine whether the target object is in a normal state. In this way, the target object can be shot automatically and accurately, manual shooting operations on the target object are avoided, and operation efficiency is improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of an image capturing method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for determining whether a second image contains a target object according to an embodiment of the present invention;
FIG. 3 is another flowchart of determining whether a second image includes a target object according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for determining whether a target object is included in a second image according to an embodiment of the present invention;
FIG. 5 is a flowchart of another image capture method provided by an embodiment of the invention;
fig. 6 is a schematic structural diagram of an image capturing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a movable platform according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
The word "if" as used herein may be interpreted, depending on the context, as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted, depending on the context, as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
The image capturing method provided by the embodiment of the present invention can be executed by a movable platform on which a camera device may be mounted; the movable platform may be a device such as an unmanned aerial vehicle or a robot.
In some high-risk scenarios, the movable platform can work in place of manual labor; for example, an unmanned aerial vehicle can perform high-altitude inspection of a target object, where the target object may be a key component of electrical equipment such as an electric tower. By way of example, electric-tower key components may include the lightning conductor, tower head, insulators, and so on. During the operation of the movable platform, the target object needs to be photographed, and whether the target object is in a normal state can subsequently be judged based on the image of the target object captured by the movable platform.
By the method provided by the embodiment of the invention, the target object can be accurately shot so as to ensure that the target object can be in the shot image, thereby facilitating the subsequent judgment of whether the target object is in a normal state.
The following describes some embodiments of the image capturing method provided herein.
Fig. 1 is a flowchart of an image capturing method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
101. when the movable platform is in a manual shooting mode, a first image is generated by manually shooting a target object through the camera device.
102. When the movable platform is in the automatic shooting mode, a second image is automatically acquired through the camera device.
103. And determining whether the second image contains the target object or not by performing feature matching on the first image and the second image.
104. And if the second image contains the target object, controlling the shooting device to shoot the target object in focus.
In practical applications, the manual shooting mode refers to the shooting mode adopted in a teaching stage, and the automatic shooting mode refers to the shooting mode adopted in a replay stage; that is, the process of the movable platform shooting the target object comprises two stages: the teaching stage and the replay stage.
In the teaching stage, the technician controls the UAV to fly to the target object. The UAV captures a preview picture through its onboard camera device and transmits the preview picture back to the ground, so the technician can tell from the returned picture whether the camera device is aimed at the target object. If the camera device is not aimed at the target object, the target object does not appear in the preview picture, or only part of it appears. According to the preview picture returned in real time, the technician adaptively adjusts the state of the UAV and of the camera device so that the target object appears completely in the preview picture.
When the target object appears completely in the preview picture, the technician may trigger a photographing operation; in response, the camera device photographs the target object, and the first image containing the target object is obtained.
When the target object appears completely in the preview picture, in addition to capturing the first image, the state information of the movable platform and the camera device at that moment can be recorded. This state information is used in the replay stage to automatically control the movable platform and the camera device to shoot again in the same state. The state information may include pose information of the movable platform and/or focal segment information of the camera device, where the focal segment information can be understood as the focal length.
It can be understood that a gimbal may be mounted on the movable platform, and the camera device may be mounted on the gimbal. The gimbal is a supporting component for mounting and fixing the camera device; after the camera device is mounted on the gimbal, its horizontal angle and pitch angle can be adjusted, so the monitoring range of the camera device can be enlarged through the gimbal. When the camera device is mounted on the movable platform through the gimbal, the state information may include pose information of the movable platform, pose information of the gimbal, and/or focal segment information of the camera device.
After obtaining the first image and the state information, the first image may be stored in association with the state information. It should be noted that, in practical applications, the number of the target objects may be one or multiple, and when the number of the target objects is multiple, a technician may control the movable platform to shoot each target object, and associate and store each first image obtained by shooting and corresponding state information.
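The association between each first image and its state information can be sketched as a small keyed store. The following Python sketch is illustrative only: the class names and record fields (`platform_pose`, `gimbal_pose`, `focal_length_mm`) are assumptions, not part of the claimed embodiment.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class TeachRecord:
    """Hypothetical teach-stage record pairing a first image with capture-time state."""
    image_path: str
    platform_pose: Dict[str, float]              # e.g. lat, lon, azimuth, depression
    gimbal_pose: Optional[Dict[str, float]] = None
    focal_length_mm: Optional[float] = None

class TeachStore:
    """Minimal in-memory association of target objects to teach records."""
    def __init__(self) -> None:
        self._records: Dict[str, TeachRecord] = {}

    def save(self, target_id: str, record: TeachRecord) -> None:
        # One record per target object; shooting several targets stores several records.
        self._records[target_id] = record

    def load(self, target_id: str) -> TeachRecord:
        # The replay stage retrieves the state needed to restore the capture pose.
        return self._records[target_id]
```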
The implementation of the teaching stage has been described above; the implementation of the replay stage is described below. In the replay stage, according to the state information recorded when the first image was captured in the teaching stage, the movable platform can be automatically controlled to restore the state in which the first image was captured, and the second image is then automatically acquired by the camera device in that state.
For ease of understanding, the implementation of the teaching stage and the replay stage is described using a scene in which the UAV shoots a key component of an electric tower. Suppose the UAV collects an image of the key component for the first time on 2020.3.23 (corresponding to the teaching stage). The technician controls the UAV to fly near the key component and adjusts its state; when the key component appears completely in the preview picture, the technician triggers the shooting operation, obtaining image A containing the key component, and the state information at the moment of shooting image A is recorded. Assume the state information indicates that the UAV is at the position 40°N, 116°E, in an attitude with an azimuth angle of 30° and a depression angle of 45°, and that image A is taken with the camera device at a focal length of 50 mm.
Then suppose the UAV collects an image of the key component again on 2020.3.24 (corresponding to the replay stage). The above state information can be input directly into the UAV, which then automatically flies back to the position 40°N, 116°E according to this state information, adjusts its attitude to an azimuth angle of 30° and a depression angle of 45°, and acquires image B at a focal length of 50 mm.
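In the replay stage, the platform must decide when the recorded state has been restored before triggering capture. A minimal sketch of such a check, with hypothetical state keys and tolerance parameters:

```python
from typing import Dict

def state_reached(current: Dict[str, float],
                  recorded: Dict[str, float],
                  tolerances: Dict[str, float]) -> bool:
    """Return True when every recorded state quantity (position, attitude,
    focal length) is within its tolerance of the current value."""
    return all(abs(current[k] - recorded[k]) <= tolerances.get(k, 0.0)
               for k in recorded)
```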
It is understood that, excluding external disturbance factors, a first image and a second image taken in the same state should in theory be identical. In practice, however, because of control deviations and external disturbances, the first image captured in the teaching stage and the second image captured in the replay stage are not necessarily identical; in some cases the first image completely contains the target object while the second image no longer contains it, or contains only part of it. If the second image does not contain the target object, or contains only part of it, the subsequent judgment of whether the target object is in a normal state based on the second image is affected. To solve this problem, whether the second image contains the target object can be identified.
In practical applications, the process of identifying whether the second image contains the target object may be implemented as follows: and determining whether the second image contains the target object or not by performing feature matching on the first image and the second image. After determining whether the target object is included in the second image, if the target object is included in the second image, the image pickup device may be controlled to perform in-focus shooting of the target object. If the second image does not contain the target object, the movable platform and the camera device can be adjusted, and then shooting can be carried out again.
In connection with the embodiment shown in fig. 2, a scheme for determining whether the second image contains the target object is exemplified below. As shown in fig. 2, the determination scheme may include the steps of:
201. a first region image containing the target object in the first image is determined.
The first area image is the area of the first image corresponding to the target object; it can be understood as the region selected by framing the target object in the first image.
Determining the first area image containing the target object in the first image may be implemented by inputting the first image into a pre-trained neural network model; the model outputs the position region of the target object in the first image, and the first area image corresponding to the target object can be determined based on that region. It can be understood that, for the neural network model to identify the target object, it must be trained on a large number of images that do or do not contain the target object, together with annotation information for the position region of the target object in each image. After training, the model has the function of identifying the target object. Besides a neural network model, the target object may also be identified through other algorithms, which the embodiment of the present invention does not limit.
It should be noted that identifying the first area image may be performed either in the teaching stage or in the replay stage. If it is performed in the teaching stage, the first area image may be stored in association with the state information recorded when the first image was captured. If it is performed in the replay stage, the first area image corresponding to the target object is determined from the first image before it is used.
202. And performing image block matching on the first area image and the second image to determine whether the second image contains the target object.
In practical applications, an image block matching process may be performed in the second image based on the first area image to determine whether the second image includes the target object. The process of image block matching the first area image and the second image may be implemented as follows: and calculating the image similarity between the first area image and the second image, wherein if the image similarity is greater than a preset threshold value, the second image contains the target object, and otherwise, the second image does not contain the target object.
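The image-block matching and similarity-threshold step described above can be sketched with a plain normalized cross-correlation (NCC) search over the second image. The function names and the threshold value are illustrative assumptions; a production implementation would typically use an optimized routine such as OpenCV's template matching.

```python
import numpy as np

def ncc(patch: np.ndarray, window: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized grayscale blocks."""
    a = patch.astype(np.float64) - patch.mean()
    b = window.astype(np.float64) - window.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def contains_target(region: np.ndarray, second: np.ndarray, thresh: float = 0.8):
    """Slide the first-area image over the second image and report whether the
    best similarity exceeds the preset threshold."""
    h, w = region.shape
    best = -1.0
    for y in range(second.shape[0] - h + 1):
        for x in range(second.shape[1] - w + 1):
            best = max(best, ncc(region, second[y:y + h, x:x + w]))
    return best >= thresh, best
```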
The above describes one scheme for determining whether the second image contains the target object. In an alternative embodiment, this can also be determined as follows. As shown in fig. 3, the determination scheme may include the steps of:
301. a first region image containing the target object in the first image is determined.
Specifically, the first image can be processed by a trained neural network model to determine the region containing the target object; the first area image may also be determined manually, by framing or scribbling. In addition, the focal length can be adjusted so that the first area image fills the whole picture.
In an embodiment, the first image may be divided into a plurality of image blocks according to its texture or color features, and the user clicks to select one of the image blocks as the first area image. The present invention is not limited to the above methods of determining the first area image; other methods of determining the first area image also fall within its scope.
302. And performing classification recognition processing on the second image to determine a second area image containing the preset class object in the second image, wherein the target object corresponds to the preset class object.
In practical application, whether the second image contains the preset category object or not can be identified. The process of identifying whether the second image contains the preset class object may be implemented by inputting the second image into a pre-trained neural network model, and the neural network model may output indication information indicating whether the second image contains the preset class object.
It is to be understood that, since the neural network model can output the above indication information, the model needs to be trained in advance on a number of images that do or do not contain objects of the preset category, together with label information indicating whether each image contains such an object. After training, the neural network model has an object-classification function and can determine, based on the input second image, whether the second image contains an object of the preset category. If it does, the neural network model can output the second area image corresponding to the preset-category object in the second image.
It should be noted that the preset category is a category, and any object belonging to the same category as the target object corresponds to the preset category. Taking electric-tower key components as an example, assume there are an electric-tower key component A and an electric-tower key component B; both are electric-tower key components, but they are not the same component. When an image containing key component A is input into the neural network model, the model may output an area image A' corresponding to the key component contained in that image; when an image containing key component B is input, the model may output an area image B'. Although the model outputs the area images A' and B' corresponding to electric-tower key components, it cannot by itself distinguish which key component each area image corresponds to; whether the second area image corresponds to the target object must therefore be determined through the following steps.
303. And if the similarity between the first area image and the second area image meets the first set condition, determining that the second image contains the target object.
Illustratively, the similarity matching in step 303 determines the similarity between the first area image and the second area image by means of image block matching.
In practical application, after the first area image and the second area image are obtained, the image similarity between the first area image and the second area image can be calculated, and if the obtained image similarity is greater than a preset threshold value, it indicates that the first area image and the second area image contain the same object, and further the second area image contains the target object.
Two schemes for determining whether the second image contains the target object have been described above. In another alternative embodiment, whether the second image contains the target object may also be determined as follows. As shown in fig. 4, the determination scheme may include the steps of:
401. and acquiring a first characteristic point corresponding to the target object in the first image.
Optionally, obtaining the first feature points corresponding to the target object in the first image may be implemented as follows: extracting the feature points contained in the first image; performing semantic segmentation processing on the first image to obtain a classification result for each pixel, the classification result indicating whether the pixel is located on the target object; and, based on the classification results, removing from the extracted feature points those that do not belong to the target object, thereby obtaining the first feature points corresponding to the target object in the first image.
The above process of extracting the feature points contained in the first image may be implemented by extracting the feature points in the first image through the Scale-Invariant Feature Transform (SIFT) algorithm. Of course, in practical applications, the feature points in the first image may also be extracted through other algorithms, which the present invention does not limit.
The extracted feature points included in the first image are feature points of all objects in the first image, and all objects include the target object and also include a background portion in the image. In order to make the feature matching result of the first image and the second image more accurate, feature points which are not located on the target object in the first image can be removed.
In order to remove the feature points of the first image, which are not located on the target object, the semantic segmentation processing may be performed on the first image to obtain a classification result of each pixel point in the first image. For convenience of understanding, what the classification result of each pixel point in the first image refers to is exemplarily described below with reference to a specific example. Assuming that the first image contains 800 pixels, the classification category of each of the 800 pixels can be determined by performing semantic segmentation processing on the first image. Taking a simple scene as an example, assuming that the first image only includes pixel points of two objects, one object is a target object, and the other object is a background portion, the classification category of each of the 800 pixel points is used to indicate whether the corresponding pixel point belongs to the target object or the background portion.
In the above, the implementation manner of determining the classification result of each pixel point in the first image is introduced, and after the classification result of each pixel point in the first image is determined, the feature points included in the first image may be screened based on the classification result to eliminate the feature points in the first image that are not located on the target object.
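The screening step above, removing feature points not located on the target object, amounts to keeping only the feature points whose pixels the semantic segmentation classified as target. A minimal sketch, assuming a boolean mask indexed as `mask[y, x]` (the mask convention and function name are assumptions):

```python
import numpy as np
from typing import List, Tuple

def filter_keypoints(keypoints: List[Tuple[int, int]],
                     target_mask: np.ndarray) -> List[Tuple[int, int]]:
    """Keep only (x, y) feature points lying on the target object.
    target_mask[y, x] is True where semantic segmentation classified the
    pixel as belonging to the target object."""
    return [(x, y) for (x, y) in keypoints if target_mask[y, x]]
```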
In order to further improve the accuracy of extracting the first feature point corresponding to the target object in the first image, before the first feature point is extracted, a region image corresponding to the target object in the first image may be identified, and the first feature point corresponding to the target object may be extracted in the identified region image.
It should be noted that the first feature point corresponding to the target object in the first image may be extracted in advance in the teaching stage, and the first feature point may be stored in association with the state information at the time of capturing the first image, or the first feature point may be extracted before the first feature point is used in the replay stage.
402. Second feature points included in the second image are extracted.
Alternatively, the feature points contained in the second image may be extracted through a SIFT algorithm. Of course, in practical applications, the feature points in the second image may also be extracted through other algorithms, which is not limited by the present invention.
It is to be understood that the second feature point is a feature point of all objects in the second image, and the second feature point may include a feature point on the target object or a feature point of the background portion.
403. And determining the target feature point matched with the first feature point in the second feature points.
In practical applications, in the process of extracting the first feature points and the second feature points, a feature descriptor corresponding to each first feature point and a feature descriptor corresponding to each second feature point can be determined. Whether pixel points from two different images correspond to the same physical point can be judged through these feature descriptors. Based on this, the target feature points among the second feature points that match the first feature points may be determined from the feature descriptors of the second feature points and the feature descriptors of the first feature points. If feature point A among the second feature points matches feature point B among the first feature points, then feature point A and feature point B correspond to the same real object point in actual space; that is, the two feature points in the two images are obtained by shooting the same real object point.
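The descriptor-based matching described above can be sketched as a nearest-neighbour search over descriptor vectors. This is an illustrative simplification: the Euclidean metric, the `max_dist` threshold, and the plain nearest-neighbour rule are assumptions; practical SIFT pipelines usually add a ratio test or a cross-check.

```python
import math

def match_descriptors(desc1, desc2, max_dist=0.5):
    """Match each first-image descriptor to its nearest second-image descriptor.

    desc1, desc2: lists of descriptor vectors (lists of floats).
    Returns (i, j) index pairs whose descriptor distance is below max_dist,
    i.e. pairs judged to correspond to the same physical point.
    """
    matches = []
    for i, d1 in enumerate(desc1):
        best_j, best_dist = None, float("inf")
        for j, d2 in enumerate(desc2):
            dist = math.dist(d1, d2)  # Euclidean distance between descriptors
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None and best_dist < max_dist:
            matches.append((i, best_j))
    return matches

d1 = [[0.0, 0.0], [1.0, 1.0]]
d2 = [[0.1, 0.0], [5.0, 5.0]]
print(match_descriptors(d1, d2))  # [(0, 0)] — only the close pair matches
```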
It should be noted that, in the process of matching the second feature points against the first feature points, the matching result may be inaccurate due to interference factors. In the embodiment of the present invention, in order to ensure the accuracy of the matching result, the wrongly matched feature points among the target feature points may be removed, which may specifically be implemented as follows: performing semantic segmentation processing on the second image to obtain a classification result of each pixel point in the second image, wherein the classification result indicates whether the corresponding pixel point belongs to the target object; and filtering out, from the target feature points, the feature points that do not belong to the target object based on the classification result of each pixel point.
In this embodiment, for a frame structure such as an electric power device like an electric tower, the matching method based on feature points is more suitable than the matching method based on image blocks, because an image block framing such a target object inevitably includes a large number of background areas. When the target is positioned by an image block matching algorithm and the background area of the target changes greatly between the teaching stage and the replay stage, the positioning is prone to failure.
By adopting semantic segmentation, whether each pixel of the image belongs to the target object can be judged, so that each feature extracted in the teaching stage can be attributed either to the target object itself or to the background. The extracted feature points can thus represent the target object more accurately, and more accurate target positioning can be performed according to these features in the replay stage. This is particularly relevant for the frame structure of an electric tower, where an image block framing the target would contain a large number of background areas.
Therefore, with the scheme of this embodiment, the feature points of the background area are removed, so that the method is more robust to background changes, such as changes across seasons or weather conditions. In the teaching stage, accurate feature point descriptions of the target object with the background removed can be obtained by combining image semantic segmentation and feature point extraction. In the replay stage, target matching precision is improved by combining feature point matching with image semantic segmentation: wrong feature point matches are eliminated according to the semantic segmentation result, which improves matching accuracy and prevents wrong tracking or missed tracking.
404. And if the target characteristic point meets a second set condition, determining that the second image contains the target object.
Alternatively, the second setting condition may include that the number of target feature points reaches a set threshold.
Taking the scene of determining whether the second image contains a key component of an electric tower as an example, assume that 30 first feature points corresponding to the key component are obtained in the first image and 50 second feature points are extracted in the second image. Through the matching processing of the first feature points and the second feature points, 20 feature points among the second feature points are determined to match first feature points, which means that each of these 20 feature points in the second image and its matched feature point in the first image are obtained by shooting the same physical point. Assuming that the set threshold is 20, it can be determined that the second image contains the electric tower key component.
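The second set condition can be sketched as a simple count check on the matched pairs; the function name is illustrative, and the default threshold of 20 merely echoes the example above.

```python
def second_condition_met(matched_points, threshold=20):
    """Second set condition of step 404: the second image is deemed to
    contain the target object when the number of matched target feature
    points reaches the set threshold (threshold value is illustrative)."""
    return len(matched_points) >= threshold

# 20 matched feature points against a threshold of 20 -> target present
matches = [(i, i) for i in range(20)]
print(second_condition_met(matches))        # True
print(second_condition_met(matches[:12]))   # False
```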
The process of determining whether the second image contains the target object has been introduced above. Further, in order to improve shooting precision, the target object may be controlled to be located in the central area of the image, since a target object appearing in the central area is easier to observe and process, so that whether it is in a normal state can be determined more accurately. The process of controlling the target object to be located in the central area of the image may be implemented as follows: determining the position area of the target object in the second image according to the coordinates of the target feature points; if the position area is not located in the central area of the second image, determining pose adjustment data according to the relative position of the position area and the central area; and controlling the movable platform to perform focusing shooting on the target object after adjusting according to the pose adjustment data.
In practical applications, after the target feature points are determined, their coordinates may be acquired, and the position area of the target object in the second image may be determined according to these coordinates. For example, the average value of the target feature point coordinates may be calculated, which can represent the position area of the target object in the second image. Assuming that the center point of the image is located at (10,10) and the average value of the target feature point coordinates is also (10,10), this indicates that the position area of the target object is located in the central area of the second image.
Conversely, if it is determined that the position region of the target object in the second image is not located in the center region of the second image, the relative position of the position region of the target object in the second image and the center region of the second image may be determined, and the pose adjustment data may be determined according to the relative position.
For ease of understanding, the process of determining pose adjustment data is described with a specific example. Assume that the center point of the image is located at (10,10) and the position area of the target object in the second image is located at (8,10); then the target object is offset from the image center by 2 pixel points in the horizontal direction. Assuming that the actual distance corresponding to these 2 pixels is calculated as 1.6 cm through the intrinsic and extrinsic parameters of the camera device, the position of the movable platform can be adjusted so that it moves 1.6 cm to the right. After the movable platform moves 1.6 cm to the right, the target object is in the central area of the preview picture, and the target object can then be shot with focusing.
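The centroid and offset computation in this example can be sketched as follows. The `cm_per_pixel` scale stands in for the pixel-to-metric conversion derived from the camera's intrinsic and extrinsic parameters; all names are illustrative, not from the patent.

```python
def pose_offset(target_points, image_center, cm_per_pixel):
    """Average the matched feature-point coordinates to locate the target,
    then convert its pixel offset from the image center into a metric move.

    cm_per_pixel is assumed to come from camera intrinsics/extrinsics, as in
    the example above (2 px corresponds to 1.6 cm, i.e. 0.8 cm per pixel).
    Returns (dx_cm, dy_cm): how far the platform should translate.
    """
    n = len(target_points)
    cx = sum(p[0] for p in target_points) / n  # centroid x
    cy = sum(p[1] for p in target_points) / n  # centroid y
    dx_px = image_center[0] - cx
    dy_px = image_center[1] - cy
    return (dx_px * cm_per_pixel, dy_px * cm_per_pixel)

# Target centred at (8, 10), image centre (10, 10): move 1.6 cm to the right
print(pose_offset([(7, 9), (9, 11)], (10, 10), 0.8))
```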
It is understood that, in some cases, the method provided by the embodiment of the present invention may need to be performed multiple times, with the movable platform adjusting its pose each time, until an image with the target object in the central area can be captured under the adjusted pose. To reduce the amount of calculation, as shown in fig. 5, after the target object is matched in the previous frame, it can be tracked in the next frame, so that the matching operation does not need to be performed for every frame. After the target object is tracked, its position area in the next frame can be determined; if this position area is not located in the central area of the next frame, pose adjustment data are determined, and the movable platform is controlled to adjust accordingly. Meanwhile, since the teaching stage also records the focal length of the camera device when the first image was shot, the focal length of the camera device can be adjusted back to the one used when shooting the first image, after which the target object can be shot. If the target object is not matched in the previous frame, it cannot be tracked, and the current focal length can be reduced so as to enlarge the field of view captured by the camera. Of course, before reducing the focal length, it can also be determined whether the current focal length is already at the minimum; if so, it is prompted that the target object cannot be successfully matched, and otherwise the current focal length is reduced. After the current focal length is reduced, the process of matching the target object may be resumed.
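One decision step of the replay-stage logic described above (shoot when matched, otherwise widen the field of view or report failure at the minimum focal length) can be sketched as follows. The minimum focal length, step size, and action names are assumptions for illustration only.

```python
MIN_FOCAL = 24  # illustrative minimum focal length (mm)

def replay_step(matched, focal, recorded_focal, min_focal=MIN_FOCAL, step=10):
    """One decision step of the replay stage. Returns (action, new_focal).

    matched: whether the target object was matched in the previous frame
    focal: current focal length of the camera device
    recorded_focal: focal length recorded in the teaching stage
    """
    if matched:
        # Target found: restore the focal length recorded when the
        # first image was shot, then shoot the target object
        return ("shoot", recorded_focal)
    if focal <= min_focal:
        # Field of view cannot be widened further: prompt a match failure
        return ("report_failure", focal)
    # Widen the field of view and retry matching
    return ("retry", max(min_focal, focal - step))

print(replay_step(True, 50, 70))   # ('shoot', 70)
print(replay_step(False, 50, 70))  # ('retry', 40)
print(replay_step(False, 24, 70))  # ('report_failure', 24)
```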
Fig. 6 is a schematic structural diagram of an image capturing apparatus according to an embodiment of the present invention, where the image capturing apparatus may be disposed on the above movable platform, as shown in fig. 6, the image capturing apparatus includes: a memory 11, a processor 12; wherein the memory 11 has stored thereon executable code which, when executed by the processor 12, causes the processor 12 to implement:
when the movable platform is in a manual shooting mode, manually shooting a target object through the camera device to generate a first image;
when the movable platform is in an automatic shooting mode, automatically acquiring a second image through the camera device;
determining whether the second image contains the target object by performing feature matching on the first image and the second image;
and if the second image contains the target object, controlling the camera device to carry out focusing shooting on the target object.
Optionally, the processor 12 is specifically configured to:
determining a first area image containing the target object in the first image;
and performing image block matching on the first area image and the second image to determine whether the second image contains the target object.
Optionally, the processor 12 is further configured to:
recording state information when the target object is photographed in the manual photographing mode; the state information comprises position information of the movable platform, attitude information of the movable platform and/or focal length information of the camera device;
under the automatic shooting mode, controlling the movable platform and/or the camera device to be adjusted to the same state according to the state information;
and when the movable platform and/or the camera device are in the same state, acquiring a second image through the camera device.
Optionally, the processor 12 is specifically configured to:
determining a first area image containing the target object in the first image;
performing semantic segmentation processing on the second image to determine a second region image containing a preset class object in the second image, wherein the target object corresponds to the preset class object;
and when the similarity between the first area image and the second area image meets a first set condition, determining that the target object is contained in the second image.
Optionally, the processor 12 is specifically configured to:
acquiring a first feature point corresponding to the target object in the first image;
extracting a second feature point contained in the second image;
determining a target feature point matched with the first feature point in the second feature points;
and when the target feature point meets a second set condition, determining that the second image contains the target object.
Optionally, the second setting condition includes: the number of the target characteristic points reaches a set threshold value.
Optionally, the processor 12 is further configured to:
determining a position area of the target object in the second image according to the coordinates of the target feature point;
when the position area is not located in the central area of the second image, determining pose adjustment data according to the relative position of the position area and the central area;
and controlling the movable platform to perform focusing shooting on the target object after adjusting according to the pose adjusting data.
Optionally, the processor 12 is specifically configured to:
extracting feature points included in the first image;
performing semantic segmentation processing on the first image to obtain a classification result of each pixel point in the first image, wherein the classification result indicates whether the corresponding pixel point is located on the target object;
based on the classification result of each pixel point, eliminating the feature points which do not belong to the target object from the feature points contained in the first image to obtain the first feature points corresponding to the target object in the first image.
Optionally, the processor 12 is further configured to:
performing semantic segmentation processing on the second image to obtain a classification result of each pixel point in the second image, wherein the classification result indicates whether the corresponding pixel point belongs to the target object;
and filtering the characteristic points which do not belong to the target object from the target characteristic points based on the classification result of each pixel point.
Optionally, the movable platform comprises a drone.
Optionally, the movable platform further comprises a cradle head, the cradle head is mounted on the unmanned aerial vehicle, and the camera device is mounted on the cradle head.
Optionally, the target object comprises an electrical device.
Fig. 7 is a schematic structural diagram of a movable platform provided in an embodiment of the present invention, and in fig. 7, the movable platform is implemented as an unmanned aerial vehicle.
As shown in fig. 7, the movable platform includes: a body 21, a power system 22 provided on the body 21, an imaging device 23 provided on the body 21, and a processor 24.
Wherein the power system 22 is used to power the movable platform.
The camera 23 is used to capture images.
Wherein, the processor 24 is configured to generate a first image by manually shooting the target object through the camera 23 when the camera 23 is in the manual shooting mode; when the image shooting device is in the automatic shooting mode, a second image is automatically acquired through the camera device 23; determining whether the second image contains the target object by performing feature matching on the first image and the second image; when the target object is included in the second image, the imaging device 23 is controlled to perform focus shooting on the target object.
Optionally, the processor 24 is further configured to: when it is determined that the target object is contained in the second image, control the power system 22 to adjust the pose of the body 21 to track the target object, so that the target object is kept in the picture.
Optionally, when the movable platform is implemented as an unmanned aerial vehicle, as shown in fig. 7, the unmanned aerial vehicle may further include a cradle head 25 disposed on the body 21, so that the camera 23 may be disposed on the cradle head 25 and move relative to the body 21 through the cradle head 25. The processor 24 is further configured to: when it is determined that the target object is contained in the second image, control the power system 22 to adjust the pose of the cradle head 25 to track the target object, so that the target object is kept in the picture.
The power system 22 of the drone may include an electronic governor, one or more rotors, and one or more motors corresponding to the one or more rotors.
Other devices (not shown in the figure) such as an inertial measurement unit may also be provided on the drone, not listed here.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where executable codes are stored in the computer-readable storage medium, and the executable codes are used for implementing the image capturing method provided in each of the foregoing embodiments.
The above-described apparatus embodiments are merely illustrative, wherein the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (28)

1. An image shooting method is characterized by being applied to a movable platform, wherein an image pickup device is loaded on the movable platform; the method comprises the following steps:
when the movable platform is in a manual shooting mode, manually shooting a target object through the camera device to generate a first image;
when the movable platform is in an automatic shooting mode, automatically acquiring a second image through the camera device;
determining whether the second image contains the target object by performing feature matching on the first image and the second image;
and if the second image contains the target object, controlling the camera device to carry out focusing shooting on the target object.
2. The method of claim 1, wherein determining whether the target object is contained in the second image by feature matching the first image and the second image comprises:
determining a first area image containing the target object in the first image;
and performing image block matching on the first area image and the second image to determine whether the second image contains the target object.
3. The method according to claim 1, wherein before the generating the first image by manually capturing the target object by the image capturing device, further comprising:
recording state information when the target object is photographed in the manual photographing mode; the state information comprises position information of the movable platform, attitude information of the movable platform and/or focal length information of the camera device;
when the movable platform is under the automatic shooting mode, obtain the second image through camera device automatic acquisition, include:
under the automatic shooting mode, controlling the movable platform and/or the camera device to be adjusted to the same state according to the state information;
and when the movable platform and/or the camera device are in the same state, acquiring a second image through the camera device.
4. The method of claim 1, wherein determining whether the target object is contained in the second image by feature matching the first image and the second image comprises:
determining a first area image containing the target object in the first image;
performing semantic segmentation processing on the second image to determine a second region image containing a preset class object in the second image, wherein the target object corresponds to the preset class object;
and if the similarity between the first area image and the second area image meets a first set condition, determining that the second image contains the target object.
5. The method of claim 1, wherein determining whether the target object is contained in the second image by feature matching the first image and the second image comprises:
acquiring a first feature point corresponding to the target object in the first image;
extracting a second feature point contained in the second image;
determining a target feature point matched with the first feature point in the second feature points;
and if the target feature point meets a second set condition, determining that the second image contains the target object.
6. The method according to claim 5, wherein the second setting condition includes: the number of the target characteristic points reaches a set threshold value.
7. The method of claim 5, further comprising:
determining a position area of the target object in the second image according to the coordinates of the target feature point;
if the position area is not located in the central area of the second image, determining pose adjustment data according to the relative position of the position area and the central area;
and controlling the movable platform to perform focusing shooting on the target object after adjusting according to the pose adjusting data.
8. The method according to claim 5, wherein the obtaining a first feature point corresponding to the target object in the first image comprises:
extracting feature points included in the first image;
performing semantic segmentation processing on the first image to obtain a classification result of each pixel point in the first image, wherein the classification result indicates whether the corresponding pixel point is located on the target object;
based on the classification result of each pixel point, eliminating the feature points which do not belong to the target object from the feature points contained in the first image to obtain the first feature points corresponding to the target object in the first image.
9. The method according to claim 5, wherein after determining the target feature point matching the first feature point in the second feature points, further comprising:
performing semantic segmentation processing on the second image to obtain a classification result of each pixel point in the second image, wherein the classification result indicates whether the corresponding pixel point belongs to the target object;
and filtering the characteristic points which do not belong to the target object from the target characteristic points based on the classification result of each pixel point.
10. The method of any one of claims 1-9, wherein the movable platform comprises a drone.
11. The method of claim 10, wherein the movable platform further comprises a pan-tilt mounted on the drone and the camera is mounted on the pan-tilt.
12. The method of any one of claims 1-9, wherein the target object comprises an electrical device.
13. An image capturing apparatus is provided on a movable platform on which an image capturing apparatus is mounted; the image capturing apparatus includes: a memory, a processor; wherein the memory has stored thereon executable code that, when executed by the processor, causes the processor to:
when the movable platform is in a manual shooting mode, manually shooting a target object through the camera device to generate a first image;
when the movable platform is in an automatic shooting mode, automatically acquiring a second image through the camera device;
determining whether the second image contains the target object by performing feature matching on the first image and the second image;
and if the second image contains the target object, controlling the camera device to carry out focusing shooting on the target object.
14. The apparatus of claim 13, wherein the processor is specifically configured to:
determining a first area image containing the target object in the first image;
and performing image block matching on the first area image and the second image to determine whether the second image contains the target object.
15. The apparatus of claim 13, wherein the processor is further configured to:
recording state information when the target object is photographed in the manual photographing mode; the state information comprises position information of the movable platform, attitude information of the movable platform and/or focal length information of the camera device;
under the automatic shooting mode, controlling the movable platform and/or the camera device to be adjusted to the same state according to the state information;
and when the movable platform and/or the camera device are in the same state, acquiring a second image through the camera device.
16. The apparatus of claim 13, wherein the processor is specifically configured to:
determining a first area image containing the target object in the first image;
performing semantic segmentation processing on the second image to determine a second region image containing a preset class object in the second image, wherein the target object corresponds to the preset class object;
and when the similarity between the first area image and the second area image meets a first set condition, determining that the target object is contained in the second image.
17. The apparatus of claim 13, wherein the processor is specifically configured to:
acquiring a first feature point corresponding to the target object in the first image;
extracting a second feature point contained in the second image;
determining a target feature point matched with the first feature point in the second feature points;
and when the target feature point meets a second set condition, determining that the second image contains the target object.
18. The apparatus according to claim 17, wherein the second setting condition comprises: the number of the target characteristic points reaches a set threshold value.
19. The apparatus of claim 17, wherein the processor is further configured to:
determining a position area of the target object in the second image according to the coordinates of the target feature point;
when the position area is not located in the central area of the second image, determining pose adjustment data according to the relative position of the position area and the central area;
and controlling the movable platform to perform focusing shooting on the target object after adjusting according to the pose adjusting data.
20. The apparatus of claim 17, wherein the processor is specifically configured to:
extracting feature points included in the first image;
performing semantic segmentation processing on the first image to obtain a classification result of each pixel point in the first image, wherein the classification result indicates whether the corresponding pixel point is located on the target object;
based on the classification result of each pixel point, eliminating the feature points which do not belong to the target object from the feature points contained in the first image to obtain the first feature points corresponding to the target object in the first image.
21. The apparatus of claim 17, wherein the processor is further configured to:
performing semantic segmentation processing on the second image to obtain a classification result of each pixel point in the second image, wherein the classification result indicates whether the corresponding pixel point belongs to the target object;
and filtering the characteristic points which do not belong to the target object from the target characteristic points based on the classification result of each pixel point.
22. The apparatus of any one of claims 13-21, wherein the movable platform comprises a drone.
23. The apparatus of claim 22, wherein the movable platform further comprises a pan-tilt mounted on the drone, the camera mounted on the pan-tilt.
24. The apparatus of any of claims 13-21, wherein the target object comprises an electrical device.
25. A movable platform, comprising:
a body;
the power system is arranged on the machine body and used for providing power for the movable platform;
the camera device is arranged on the machine body and used for collecting images;
and one or more processors configured to perform:
when the image shooting device is in a manual shooting mode, manually shooting a target object through the image shooting device to generate a first image;
when the image shooting device is in an automatic shooting mode, automatically acquiring a second image through the image shooting device;
determining whether the second image contains the target object by performing feature matching on the first image and the second image;
and if the second image contains the target object, controlling the camera device to carry out focusing shooting on the target object.
26. The movable platform of claim 25, wherein the processor is further configured to:
and when the target object is determined to be contained in the second image, controlling the power system to adjust the pose of the body to track the target object so as to keep the target image in a picture.
27. The movable platform of claim 25, further comprising a pan-tilt on which the camera device is mounted;
the processor is further configured to: and when the target object is determined to be contained in the second image, controlling the power system to adjust the pose of the holder to track the target object so as to keep the target object in a picture.
28. A computer-readable storage medium, characterized in that the storage medium is a computer-readable storage medium in which program instructions for implementing the image capturing method according to any one of claims 1 to 12 are stored.
CN202080004316.1A 2020-03-27 2020-03-27 Image shooting method, image shooting device, movable platform and storage medium Pending CN112585946A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/081712 WO2021189429A1 (en) 2020-03-27 2020-03-27 Image photographing method and device, movable platform, and storage medium

Publications (1)

Publication Number Publication Date
CN112585946A true CN112585946A (en) 2021-03-30

Family

ID=75145423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080004316.1A Pending CN112585946A (en) 2020-03-27 2020-03-27 Image shooting method, image shooting device, movable platform and storage medium

Country Status (2)

Country Link
CN (1) CN112585946A (en)
WO (1) WO2021189429A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035626A (en) * 2022-05-19 2022-09-09 成都中科大旗软件股份有限公司 Intelligent scenic spot inspection system and method based on AR

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609506A (en) * 2008-06-20 2009-12-23 索尼株式会社 Object recognition device and method, program therefor, and recording medium recording the program
CN102929288A (en) * 2012-08-23 2013-02-13 山东电力集团公司电力科学研究院 Unmanned aerial vehicle inspection pan-tilt control method based on visual servoing
CN103856708A (en) * 2012-12-03 2014-06-11 原相科技股份有限公司 Automatic focusing method, photographic device and computer readable storage medium
CN105487555A (en) * 2016-01-14 2016-04-13 浙江大华技术股份有限公司 Hovering positioning method and hovering positioning device of unmanned aerial vehicle
CN106125744A (en) * 2016-06-22 2016-11-16 山东鲁能智能技术有限公司 Pan-tilt control method of intelligent inspection robot based on visual servoing
CN106713734A (en) * 2015-11-17 2017-05-24 华为技术有限公司 Auto focusing method and apparatus
CN107042511A (en) * 2017-03-27 2017-08-15 国机智能科技有限公司 Pan-tilt adjustment method for inspection robot based on visual feedback
CN108803668A (en) * 2018-06-22 2018-11-13 航天图景(北京)科技有限公司 Intelligent inspection UAV pod system for static target monitoring
CN110335319A (en) * 2019-06-26 2019-10-15 华中科技大学 Semantics-driven camera positioning and map reconstruction method and system
CN110581954A (en) * 2019-09-30 2019-12-17 深圳酷派技术有限公司 Shooting focusing method and device, storage medium and terminal
CN110660023A (en) * 2019-09-12 2020-01-07 中国测绘科学研究院 Video stitching method based on image semantic segmentation
CN110738673A (en) * 2019-10-21 2020-01-31 哈尔滨理工大学 Visual SLAM method based on instance segmentation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463899B (en) * 2014-12-31 2017-09-22 北京格灵深瞳信息技术有限公司 A kind of destination object detection, monitoring method and its device
WO2018214078A1 (en) * 2017-05-24 2018-11-29 深圳市大疆创新科技有限公司 Photographing control method and device
US10366287B1 (en) * 2018-08-24 2019-07-30 Loveland Innovations, LLC Image analysis and estimation of rooftop solar exposure
CN110887461B (en) * 2019-11-19 2021-04-06 西北工业大学 Unmanned aerial vehicle real-time computer vision processing method based on GPS attitude estimation

Also Published As

Publication number Publication date
WO2021189429A1 (en) 2021-09-30

Similar Documents

Publication Publication Date Title
CN111272148B (en) Unmanned aerial vehicle autonomous inspection self-adaptive imaging quality optimization method for power transmission line
CN112164015B (en) Monocular vision autonomous inspection image acquisition method and device and power inspection unmanned aerial vehicle
CN110633629A (en) Power grid inspection method, device, equipment and storage medium based on artificial intelligence
CN110570454B (en) Method and device for detecting foreign matter invasion
CN108062763B (en) Target tracking method and device and storage medium
JP2007074143A (en) Imaging device and imaging system
CN113391644B (en) Unmanned aerial vehicle shooting distance semi-automatic optimization method based on image information entropy
CN111462229B (en) Unmanned aerial vehicle-based target shooting method, shooting device and unmanned aerial vehicle
CN114020039A (en) Automatic focusing system and method for unmanned aerial vehicle inspection tower
CN112585946A (en) Image shooting method, image shooting device, movable platform and storage medium
CN113378782A (en) Vehicle-mounted fire identification and automatic tracking method
CN112585945A (en) Focusing method, device and equipment
CN117716702A (en) Image shooting method and device and movable platform
CN114594770B (en) Inspection method for inspection robot without stopping
JP5539565B2 (en) Imaging apparatus and subject tracking method
CN116246200A (en) Screen display information candid photographing detection method and system based on visual identification
CN112802112B (en) Visual positioning method, device, server and storage medium
US11575826B2 (en) Image processing apparatus and method, and image capturing apparatus
CN114004891A (en) Distribution network line inspection method based on target tracking and related device
CN107147845B (en) Focusing method and device and terminal equipment
Li et al. A camera PTZ control algorithm for autonomous mobile inspection robot
CN115527176B (en) Target object identification method, device and equipment
KR102186751B1 (en) Apparatus and method for improving multiple recognition and recognition rate of classification using object detection
KR101760892B1 (en) System and method for tracking object based on omnidirectional image in omni-camera
CN115052096A (en) Optical zoom lens

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210330