WO2021217398A1 - Image processing method and apparatus, movable platform and control terminal thereof, and computer-readable storage medium


Info

Publication number
WO2021217398A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
target
display object
video
space
Application number
PCT/CN2020/087404
Other languages
English (en)
French (fr)
Inventor
杨振飞
周游
苏坤岳
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to PCT/CN2020/087404
Priority to CN202080030128.6A (published as CN113853577A)
Publication of WO2021217398A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range

Definitions

  • This application relates to the field of image processing technology, and specifically to an image processing method, an image processing device, a movable platform, a control terminal of the movable platform, and a computer-readable storage medium.
  • UAV aerial video, with its unique bird's-eye ("God's") perspective, has been widely adopted and well received. However, adding special effects to aerial video, such as 3D subtitle effects, still requires downloading the video from the drone's SD (Secure Digital) memory card to a computer and creating the effects in traditional professional video editing software, which is difficult to operate, time-consuming, and laborious.
  • This application aims to solve at least one of the technical problems existing in the prior art or related technologies.
  • the first aspect of this application proposes an image processing method.
  • the second aspect of the present application proposes an image processing device.
  • the third aspect of this application proposes a movable platform.
  • the fourth aspect of the present application proposes a control terminal of a movable platform.
  • the fifth aspect of the present application proposes a computer-readable storage medium.
  • an image processing method applied to an image processing device, including: acquiring a target video collected by a shooting device of a movable platform while the movable platform moves in space; acquiring the pose information of the shooting device when capturing each image frame in the target video; acquiring a display object edited by the user; determining the position information of the display object in space; and projecting the display object onto each image frame according to the position information of the display object in space and the pose information corresponding to each image frame, to obtain a target composite video.
  • acquiring the pose information of each image frame in the target video when the shooting device is capturing includes: acquiring the position information of the feature point in each image frame in the image frame; according to the feature point The position information in the image frame determines the pose information of the camera when collecting each image frame.
  • the method further includes: acquiring initial pose information of the shooting device when capturing each image frame, where the initial pose information is collected by a sensor configured on the movable platform; and determining the pose information of the shooting device when capturing each image frame according to the position information of the feature points in the image frames includes: determining the pose information of the shooting device when capturing each image frame according to the position information of the feature points in the image frames and the initial pose information corresponding to each image frame.
  • the method further includes: acquiring the initial pose information of the camera when collecting each image frame, wherein the initial pose information is collected by a sensor configured on the movable platform; according to the display object The position information in the space and the initial pose information corresponding to each image frame project the display object onto each image frame to obtain a preview composite video.
  • projecting the display object onto each image frame according to the position information of the display object in space and the pose information corresponding to each image frame to obtain the target composite video includes: according to the display object The position information in the space and the pose information corresponding to each image frame determine the projection position and projection posture of the display object in each image frame; the object will be displayed according to the projection position and projection posture of the display object in each image frame Projected into each image frame to obtain the target composite video.
  • the method further includes: obtaining position adjustment information and/or posture adjustment information of the display object edited by the user; according to the position information of the display object in space and the posture information corresponding to each image frame Determine the projection position and projection posture of the display object in each image frame, including: according to the position information of the display object in space, the pose information corresponding to each image frame, and the position adjustment information and/or posture adjustment information of the display object Determine the projection position and projection posture of the display object in each image frame.
  • acquiring the display object edited by the user includes: detecting the user's display object editing operation, and determining the display object edited by the user according to the detected editing operation.
  • detecting the user's display object editing operation includes: controlling the interactive device to display the display object editing interface; and detecting the user's display object editing operation on the interactive device that displays the display object editing interface.
  • the method further includes: acquiring the initial video collected by the shooting device when the movable platform moves in space; acquiring the target video collected by the shooting device of the movable platform when the movable platform moves in space, including : Detect the user's video selection operation, and determine the target video from the initial video based on the detected video selection operation.
  • detecting the user's video selection operation includes: controlling the interactive device to display the video selection interface; and detecting the user's video selection operation on the interactive device that displays the video selection interface.
  • determining the position information of the display object in space includes: obtaining the position of the pixel selected by the user in the target image frame of the target video in the target image frame or in the target image frame The position of the selected pixel area in the target image frame; the position information of the display object in space is determined according to the position of the pixel or the pixel area in the target image frame.
  • the method further includes: determining a target sub-video from the target video; and obtaining the position in the target image frame of the pixel selected by the user in the target image frame of the target video, or of the pixel area selected in the target image frame, includes: obtaining the position in the target image frame of the pixel selected by the user in a target image frame within the target sub-video, or of the pixel area selected in that target image frame.
  • the method further includes: in response to a user's operation of selecting a pixel or a pixel area in an image frame other than the target sub-video in the target video, outputting a prompt message indicating that the selection is invalid.
  • the method further includes: outputting first prompt information, where the first prompt information is used to instruct the user to select a pixel point or a pixel point area in the target image frame in the target sub-video.
  • the target sub-video includes the video collected by the shooting device when the motion state of the shooting device in the target video meets a preset motion condition.
  • determining the target sub-video from the target video includes: determining a plurality of continuous image frames from the target video, wherein the average of the feature points between the adjacent image frames of the plurality of continuous image frames The sum of the amount of movement is greater than or equal to the preset distance threshold, and the disparity of the multiple consecutive image frames is greater than or equal to the preset disparity threshold; the multiple consecutive image frames are determined as the target sub-video.
  • the number of multiple consecutive image frames is greater than or equal to a preset image number threshold.
  • determining the position information of the display object in space further includes: determining whether the object in space indicated by the pixel or pixel area is a static object; and determining the position information of the user-edited display object in space according to the position of the pixel or pixel area in the target image frame includes: when the object is a static object, determining the position information of the user-edited display object in space according to the position of the pixel or pixel area in the target image frame.
  • determining the position information of the display object in space further includes: determining whether the object in space indicated by the pixel or pixel area is a static object; and when the object is not static, outputting second prompt information, where the second prompt information is used to prompt the user that the pixel or pixel area is not selectable, or to prompt the user to select another pixel or pixel area.
  • determining the position information of the display object in space further includes: determining whether the pixel area meets a preset texture condition; and determining the position information of the user-edited display object in space according to the position of the pixel area in the target image frame includes: when the preset texture condition is met, determining the position information of the user-edited display object in space according to the position of the pixel or pixel area in the target image frame.
  • determining the position information of the display object in space further includes: determining whether the pixel area meets a preset texture condition; when the preset texture condition is not satisfied, outputting third prompt information , Wherein the third prompt information is used to prompt the user that the pixel or the pixel area is not selectable, or to prompt the user to select other pixels or the pixel area.
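  • The application does not specify how the preset texture condition is evaluated; one plausible check, sketched below with assumed thresholds and a hypothetical helper name, is to require enough trackable corners and sufficient local contrast in the selected pixel area:

```python
import cv2
import numpy as np

def region_has_enough_texture(gray_frame, roi, min_corners=8, min_lap_var=50.0):
    """Hypothetical texture check for a user-selected pixel area (ROI).

    roi is (x, y, w, h). Both the corner-count and the Laplacian-variance
    thresholds are illustrative values, not taken from the application.
    """
    x, y, w, h = roi
    patch = gray_frame[y:y + h, x:x + w]

    # Count trackable corners inside the patch.
    corners = cv2.goodFeaturesToTrack(patch, maxCorners=50,
                                      qualityLevel=0.01, minDistance=5)
    n_corners = 0 if corners is None else len(corners)

    # Variance of the Laplacian is a cheap proxy for local texture/contrast.
    lap_var = cv2.Laplacian(patch, cv2.CV_64F).var()

    return n_corners >= min_corners and lap_var >= min_lap_var
```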
  • determining the position information of the display object in space according to the position of the pixel area in the target image frame includes: determining the position of the pixel area in the target image frame according to the position of the pixel area in the target image frame Feature points; obtain the location of the feature point in the pixel point area in the target image frame; determine the location information of the spatial point corresponding to the geometric center pixel point of the pixel point area according to the location of the feature point in the target image frame.
  • determining the spatial position information of the corresponding spatial point of the geometric center pixel point of the pixel point area according to the position of the feature point in the target image frame includes: according to the feature point in the target image frame The position in determines the optical flow vector of the corresponding spatial point of the geometric center pixel of the pixel point area in at least one reference image frame of the target video; according to the optical flow vector, the corresponding spatial point of the geometric center pixel is determined in at least one frame of reference The position in the image frame; according to the position in the reference image frame of at least one frame and the position of the feature point in the target image frame, the position information of the spatial point corresponding to the geometric center pixel point is determined in space.
  • determining the position information of the display object in space according to the position of the pixel point in the target image frame includes: obtaining the position in space of the space point corresponding to the target feature point in the target image frame ; Fit the target plane according to the position in the space of the spatial point corresponding to the target feature point; determine the position information of the display object in the space according to the position of the pixel point in the target image frame and the fitted target plane.
  • the pixel distance between the target feature point and the pixel point is less than or equal to a preset pixel distance threshold.
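  • As an illustration of the plane-fitting step described above (the application does not prescribe a specific method), a least-squares plane can be fitted to the spatial points of nearby target feature points and intersected with the back-projected ray of the selected pixel; all function and variable names here are hypothetical:

```python
import numpy as np

def fit_plane(points_3d):
    """Least-squares plane fit: returns (centroid, unit normal).

    points_3d: (N, 3) array of spatial points corresponding to target feature
    points near the user-selected pixel (a sketch, not the patented method).
    """
    centroid = points_3d.mean(axis=0)
    # The normal is the singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(points_3d - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def intersect_ray_with_plane(ray_origin, ray_dir, centroid, normal):
    """Back-project the selected pixel as a camera ray and intersect it with
    the fitted plane to obtain the display object's position in space."""
    t = np.dot(centroid - ray_origin, normal) / np.dot(ray_dir, normal)
    return ray_origin + t * ray_dir
```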
  • when the target video is captured by the shooting device while the movable platform follows a target object in space, determining the position information of the display object in space includes: obtaining the position information in space of the object followed by the shooting device; and determining the position information of the display object in space according to the position information of the followed object in space.
  • when the target video is captured by the shooting device while the movable platform moves around a target object in space, determining the position information of the display object in space includes: obtaining the position information in space of the object the shooting device moves around; and determining the position information of the display object in space according to the position information of that object in space.
  • the display object includes at least one of numbers, letters, symbols, characters, and object identifications.
  • the display object is a three-dimensional model.
  • the method further includes: playing or storing or running a social application program to share the target composite video.
  • when the movable platform includes the image processing device, the method further includes: sending the target composite video to the control terminal of the movable platform so that the control terminal can play, store, or run a social application to share the target composite video.
  • the movable platform includes an unmanned aerial vehicle.
  • an image processing device including: a memory configured to store a computer program; and a processor configured to execute the computer program to: acquire a target video captured by the shooting device of a movable platform while the movable platform moves in space; acquire the pose information of the shooting device when capturing each image frame in the target video; acquire a display object edited by the user; determine the position information of the display object in space; and project the display object onto each image frame according to the position information of the display object in space and the pose information corresponding to each image frame, to obtain a target composite video.
  • the image processing device in the above technical solution provided by this application may also have the following additional technical features:
  • the processor is further configured to execute a computer program to achieve: obtain the position information of the feature point in each image frame in the image frame; determine according to the position information of the feature point in the image frame The pose information of each image frame collected by the camera.
  • the processor is further configured to execute a computer program to achieve: obtain the initial pose information of the camera when acquiring each image frame, where the initial pose information is configured by the movable platform Collected by the sensor; according to the position information of the feature point in the image frame and the initial pose information corresponding to each image frame, the pose information of the camera when collecting each image frame is determined.
  • the processor is further configured to execute a computer program to achieve: obtain the initial pose information of the camera when acquiring each image frame, where the initial pose information is configured by the movable platform Collected by the sensor; according to the position information of the display object in space and the initial pose information corresponding to each image frame, the display object is projected onto each image frame to obtain a preview composite video.
  • the processor is further configured to execute a computer program to implement: determining the display object in each image frame according to the position information of the display object in space and the pose information corresponding to each image frame Projection position and projection posture of the display object; according to the projection position and projection posture of the display object in each image frame, the display object is projected into each image frame to obtain the target composite video.
  • the processor is further configured to execute a computer program to achieve: obtain the position adjustment information and/or posture adjustment information of the display object edited by the user; The pose information corresponding to an image frame and the position adjustment information and/or the posture adjustment information of the display object determine the projection position and projection posture of the display object in each image frame.
  • the processor is further configured to execute a computer program to realize: detecting the user's editing operation of the display object, and determining the display object edited by the user according to the detected editing operation.
  • the processor is further configured to execute a computer program to implement: control the interactive device to display the display object editing interface; and detect the user's display object editing operation on the interactive device that displays the display object editing interface.
  • the processor is further configured to execute a computer program to achieve: obtain the initial video collected by the shooting device when the movable platform moves in space; detect the user's video selection operation, and select according to the detected video The operation determines the target video from the initial video.
  • the processor is further configured to execute a computer program to achieve: control the interactive device to display a video selection interface; and detect a user's video selection operation on the interactive device that displays the video selection interface.
  • the processor is further configured to execute a computer program to achieve: obtain the position of the pixel selected by the user in the target image frame of the target video in the target image frame or in the target image frame The position of the selected pixel area in the target image frame; the position information of the display object in space is determined according to the position of the pixel or the pixel area in the target image frame.
  • the processor is further configured to execute a computer program to: determine the target sub-video from the target video; obtain the pixel points selected by the user in the target image frame in the target sub-video in the target image The position in the frame or the position of the selected pixel area in the target image frame in the target image frame.
  • the processor is further configured to execute a computer program to achieve: in response to the user's operation of selecting a pixel or a pixel area in an image frame other than the target sub-video in the target video, output a prompt message indicating that the selection is invalid.
  • the processor is further configured to execute a computer program to implement: output first prompt information, where the first prompt information is used to instruct the user to select a pixel in the target image frame in the target sub-video Point or pixel area.
  • the target sub-video includes the video collected by the shooting device when the motion state of the shooting device in the target video meets a preset motion condition.
  • the processor is further configured to execute a computer program to realize: determine a plurality of continuous image frames from the target video, wherein the characteristic points between adjacent image frames of the plurality of continuous image frames are determined The sum of the average movement amount is greater than or equal to the preset distance threshold, and the disparity of the multiple consecutive image frames is greater than or equal to the preset disparity threshold; the multiple consecutive image frames are determined as the target sub-video.
  • the number of multiple consecutive image frames is greater than or equal to a preset image number threshold.
  • the processor is further configured to execute a computer program to achieve: determine whether the object in the space indicated by the pixel point or the pixel point area is a stationary object; when the object is a stationary object, according to the pixel point or The position of the pixel area in the target image frame determines the position information of the display object edited by the user in space.
  • the processor is further configured to execute a computer program to realize: determine whether the object in space indicated by the pixel or pixel area is a static object; when the object is not static, output second prompt information, where the second prompt information is used to prompt the user that a pixel or pixel area is not selectable, or to prompt the user to select another pixel or pixel area.
  • the processor is further configured to execute a computer program to achieve: determine whether the pixel area meets a preset texture condition; when the preset texture condition is met, determine the position information in space of the user-edited display object according to the position of the pixel or pixel area in the target image frame.
  • the processor is further configured to execute a computer program to achieve: determine whether the pixel area meets a preset texture condition; when the preset texture condition is not satisfied, output third prompt information, Among them, the third prompt information is used to prompt the user that a pixel or a pixel area is not selectable, or to prompt the user to select another pixel or a pixel area.
  • the processor is further configured to execute a computer program to implement: determine the feature points in the pixel area according to the position of the pixel area in the target image frame; obtain the feature points in the pixel area The position in the target image frame; according to the position of the feature point in the target image frame, the position information of the spatial point corresponding to the geometric center pixel point of the pixel point area is determined in space.
  • the processor is further configured to execute a computer program to realize: according to the position of the feature points in the target image frame, determine the optical flow vector of the spatial point corresponding to the geometric-center pixel of the pixel area in at least one reference image frame of the target video; according to the optical flow vector, determine the position of that spatial point in the at least one reference image frame; and according to its position in the at least one reference image frame and the position of the feature points in the target image frame, determine the position information in space of the spatial point corresponding to the geometric-center pixel.
  • the processor is further configured to execute a computer program to achieve: obtain the position in space of the spatial point corresponding to the target feature point in the target image frame; according to the spatial point corresponding to the target feature point The position in the space fits the target plane; the position information of the display object in the space is determined according to the position of the pixel in the target image frame and the fitted target plane.
  • the pixel distance between the target feature point and the pixel point is less than or equal to a preset pixel distance threshold.
  • when the target video is captured by the shooting device while the movable platform follows a target object in space, the processor is further configured to execute a computer program to achieve: obtain the position information in space of the object followed by the shooting device; and determine the position information of the display object in space according to the position information of the followed object in space.
  • when the target video is captured by the shooting device while the movable platform moves around a target object in space, the processor is further configured to execute a computer program to achieve: obtain the position information in space of the object the shooting device moves around; and determine the position information of the display object in space according to the position information of that object in space.
  • the display object includes at least one of numbers, letters, symbols, characters, and object identifications.
  • the display object is a three-dimensional model.
  • the processor is further configured to execute a computer program to realize: playing or storing or running a social application program to share the target composite video.
  • the movable platform includes an image processing device, and the processor is also configured to execute a computer program to realize: sending the target composite video to the control terminal of the movable platform to make the control terminal play or store Or run a social application to share the target composite video.
  • the movable platform includes an unmanned aerial vehicle.
  • a movable platform which includes an image processing device as described in some of the above technical solutions.
  • a control terminal of a movable platform which includes an image processing device as described in some of the above technical solutions.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the steps of the image processing method as in any of the above technical solutions are realized.
  • this application proposes an image processing scheme.
  • the pose information of the movable platform's shooting device during movement is acquired and combined with the image information of the video to calculate the three-dimensional information of the scene in the video.
  • the user only needs to input the display object to be generated, such as subtitle text, and click the position where it should be placed, and the special-effect video with the display object inserted, such as a 3D subtitle effect video, is rendered automatically.
  • Fig. 1 shows a schematic flowchart of an image processing method according to an embodiment of the present application
  • Fig. 2 shows a schematic flowchart of a method for acquiring pose information according to an embodiment of the present application
  • Fig. 3 shows a schematic flowchart of a method for determining position information of a display object in space according to an embodiment of the present application
  • Fig. 4 shows a schematic flowchart of a method for obtaining a target composite video according to an embodiment of the present application
  • Fig. 5 shows a wireframe of a three-dimensional model according to an embodiment of the present application
  • Fig. 6 shows a three-dimensional model blanking diagram according to an embodiment of the present application
  • Fig. 7 shows a schematic flowchart of an image processing method according to another embodiment of the present application.
  • Fig. 8 shows a schematic flowchart of an image processing method according to still another embodiment of the present application.
  • Fig. 9 shows a schematic flowchart of a method for determining a target sub-video according to an embodiment of the present application
  • Fig. 10 shows a schematic diagram of a strategy for determining a target sub-video according to an embodiment of the present application
  • FIG. 11 shows a schematic flowchart of a method for calculating points of interest according to an embodiment of the present application
  • FIG. 12 shows a schematic flowchart of a method for calculating points of interest according to another embodiment of the present application.
  • FIG. 13 shows a schematic flowchart of an image processing method according to another embodiment of the present application.
  • Fig. 14 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present application.
  • Fig. 15 shows a schematic block diagram of a movable platform according to an embodiment of the present application
  • Fig. 16 shows a schematic block diagram of a control terminal of a movable platform according to an embodiment of the present application.
  • the embodiment of the first aspect of the present application provides an image processing method, which is applied to an image processing apparatus.
  • the image processing device can be separately arranged on the movable platform, can also be separately arranged on the control terminal of the movable platform, and can also be partly arranged on the movable platform and partly on the control terminal of the movable platform.
  • the movable platform can be, for example, an unmanned aerial vehicle, or other vehicles with multiple cameras, such as an unmanned car.
  • the control terminal can be any terminal device that can interact with the movable platform, for example a remote control device, or a smart device interacting through an APP (application), such as a smartphone, smart tablet, or smart glasses (e.g. VR (Virtual Reality) glasses or AR (Augmented Reality) glasses); the SD card of the movable platform can also be inserted into a computer, in which case the control terminal is the computer.
  • The camera model takes the standard pinhole form s·[u, v, 1]^T = K·[R | T]·[x_w, y_w, z_w, 1]^T, where s is a scale factor, [u, v, 1]^T represents a two-dimensional point in homogeneous image coordinates, and [x_w, y_w, z_w, 1]^T represents a three-dimensional point in homogeneous world coordinates.
  • The matrix R is a rotation matrix and the matrix T (which can also be written as t) is a displacement (translation) matrix; R and T are the camera's extrinsic parameters, expressing the rotation and displacement transformation from the world coordinate system to the camera coordinate system in three-dimensional space, and together they constitute the camera pose.
  • The matrix K is called the camera calibration matrix, i.e. the intrinsic parameters of the camera, which expresses the conversion from the three-dimensional camera coordinate system to the two-dimensional homogeneous image coordinate system.
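  • For illustration only (this code is not part of the application), the projection above can be evaluated directly in NumPy; the intrinsic matrix, pose, and world point below are made-up values:

```python
import numpy as np

# Hypothetical intrinsics K and pose (R, t) of the shooting device.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # rotation: world -> camera
t = np.array([[0.0], [0.0], [5.0]])    # translation: world -> camera

P_w = np.array([[1.0], [0.5], [10.0], [1.0]])  # homogeneous world point

# s * [u, v, 1]^T = K [R | t] [x_w, y_w, z_w, 1]^T
p = K @ np.hstack([R, t]) @ P_w
u, v = (p[:2] / p[2]).ravel()          # divide by the scale factor s
print(u, v)
```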
  • Fig. 1 shows a schematic flowchart of an image processing method according to an embodiment of the present application. As shown in Figure 1, the image processing method includes:
  • Step 110 Obtain a target video collected by a shooting device of the movable platform when the movable platform moves in space.
  • This is the initial step: the target video collected by the shooting device of the movable platform is acquired and used as the object of the subsequent processing operations.
  • the method further includes: acquiring the initial video collected by the shooting device when the movable platform moves in the space.
  • step 110 may include: detecting the user's video selection operation, and determining the target video from the initial video according to the detected video selection operation.
  • the target video is a part selected from the initial video; it can be the part the user wishes to keep, or the part the user chooses for subsequent processing, such as the part into which the display object is to be inserted, which improves the flexibility of video production and reduces unnecessary computation.
  • detecting the user's video selection operation includes: controlling the interactive device to display the video selection interface; and detecting the user's video selection operation on the interactive device that displays the video selection interface.
  • By controlling the interactive device to display the video selection interface, a clear interface can be provided for the user to operate, and the interactive device can be used to accurately detect the operation to ensure that an accurate target video is obtained.
  • the interactive device may be a touch display screen, for example.
  • Step 120 Obtain the pose information of the camera when capturing each image frame in the target video.
  • the camera moves with the movable platform in the space.
  • The pose information corresponding to the shooting device, as the extrinsic parameters of the shooting device, includes rotation information and displacement information; combined with the known intrinsic parameters of the shooting device, it enables conversion between the world coordinate system and the homogeneous image coordinate system, so that the appearance of an entity in space in each frame of the target video can be determined in order to perform the subsequent image processing operations.
  • FIG. 2 shows a schematic flowchart of a method for acquiring pose information according to an embodiment of the present application.
  • step 120 in Fig. 1 includes:
  • Step 122 Obtain the position information of the feature points in each image frame in the image frame.
  • Step 124 According to the position information of the feature point in the image frame, determine the pose information of the photographing device when acquiring each image frame.
  • For each feature point, a conversion equation can be established between its three-dimensional position information in the world coordinate system and its two-dimensional position information in the homogeneous image coordinate system (i.e., its position information in the image frame).
  • image recognition can be used to determine the position information (known quantity) of multiple feature points in each image frame in the image frame, and two adjacent image frames often have a large number of overlapping feature points.
  • the three-dimensional position information (unknown quantity) of the same feature point in the world coordinate system is the same, and an image frame also has unique pose information (unknown quantity).
  • The feature points are extracted frame by frame from the target video and are tracked and matched, yielding multiple sets of conversion equations; solving these simultaneously gives the pose information corresponding to each image frame and the three-dimensional position information of each feature point in the world coordinate system.
  • If an erroneous image frame is detected while tracking the feature points, it can be deleted to optimize the calculation result.
  • The number of image frames in a target video is huge, and if frame-by-frame tracking were performed solely on the basis of the first image frame, a large deviation could accumulate.
  • Therefore, image frames with obvious changes can be marked as key frames during frame-by-frame tracking; subsequent non-key frames are then tracked relative to the most recent key frame, and when a new key frame appears, it becomes the basis for tracking the non-key frames that follow, thereby improving the calculation accuracy.
  • For example, the calculation can be performed once there are more than 5 key frames in the target video.
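  • As a rough illustration of this step (OpenCV-based, with hypothetical frame variables; the application itself does not mandate any particular library), feature points can be tracked between adjacent frames and a relative pose recovered from the matches:

```python
import cv2
import numpy as np

def relative_pose(prev_gray, curr_gray, K):
    """Sketch: track feature points from one frame to the next and recover
    the relative camera rotation R and translation direction t.

    prev_gray, curr_gray: consecutive grayscale frames; K: 3x3 intrinsics.
    """
    # Extract feature points in the previous frame.
    pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=8)
    # Track them into the current frame with pyramidal Lucas-Kanade flow.
    pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   pts_prev, None)
    good = status.ravel() == 1
    p0, p1 = pts_prev[good], pts_curr[good]

    # Relative pose from the essential matrix (translation is up to scale).
    E, inliers = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=inliers)
    return R, t
```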
  • the method further includes: acquiring the initial pose information of the camera when acquiring each image frame, wherein the initial pose information is collected by a sensor configured on the movable platform.
  • Step 124 specifically includes: determining the pose information of the photographing device when acquiring each image frame according to the position information of the feature point in the image frame and the initial pose information corresponding to each image frame.
  • In addition, some prior information can be obtained while the target video is being collected, including the attitude data of the unmanned aerial vehicle, the gimbal (PTZ) pose data, and the trajectory data used for one-click shooting; from these, initial pose information of the shooting device with lower accuracy can be obtained.
  • The initial pose information is used as the initial value of the pose information when solving the equations simultaneously, which reduces the number of iterations, speeds up algorithm convergence, and reduces the probability of errors caused by a poorly chosen initial value; this helps shorten the post-processing time of the target video, so that even a smartphone can insert the display object into the target video and produce the target composite video.
  • The calculation process may be, for example, as follows, where i denotes the key-frame sequence.
  • The projection transformation can be written as the reprojection-error minimization P* = arg min_P Σ_i || p_i − π(R_i·P + t_i) ||², where P is the three-dimensional coordinate of a feature point (that is, the position information in space of the spatial point corresponding to the feature point), p_i is the pixel coordinate of this feature point in the i-th image frame (that is, the position information of the feature point in the i-th image frame), R_i and t_i represent the rotation and translation transformation of the current frame relative to the previous frame, π denotes the camera projection, and arg indicates that the optimized parameter (target) is P.
  • j denotes the non-key-frame sequence; for the non-key frames, the BA (bundle adjustment) algorithm is likewise used to calculate the pose information of the corresponding shooting device.
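  • A minimal sketch of such a reprojection-error optimization using SciPy's least_squares (the application does not prescribe an implementation; the pose parameterization and helper names below are assumptions):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def reprojection_residuals(params, n_frames, K, observations):
    """params packs per-keyframe poses (rotation vector + translation) followed
    by the 3D points; observations is a list of (frame_idx, point_idx, u, v)."""
    poses = params[:n_frames * 6].reshape(n_frames, 6)
    points = params[n_frames * 6:].reshape(-1, 3)

    residuals = []
    for f, j, u, v in observations:
        R = Rotation.from_rotvec(poses[f, :3]).as_matrix()
        t = poses[f, 3:]
        p_cam = R @ points[j] + t                 # world -> camera
        p_img = K @ p_cam
        residuals.append(p_img[0] / p_img[2] - u)  # reprojection error (u)
        residuals.append(p_img[1] / p_img[2] - v)  # reprojection error (v)
    return np.array(residuals)


def bundle_adjust(initial_poses, initial_points, K, observations):
    """Refine poses and points starting from the sensor-derived initial pose
    information, so fewer iterations are needed (as described above)."""
    n_frames = initial_poses.shape[0]
    x0 = np.hstack([initial_poses.ravel(), initial_points.ravel()])
    result = least_squares(reprojection_residuals, x0,
                           args=(n_frames, K, observations), method="lm")
    return (result.x[:n_frames * 6].reshape(n_frames, 6),
            result.x[n_frames * 6:].reshape(-1, 3))
```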
  • Step 130 Obtain the display object edited by the user.
  • the processing operation is specifically to insert a display object in the target video
  • the display object pre-inserted by the user needs to be obtained first.
  • step 130 includes: detecting the user's editing operation of the display object, and determining the display object edited by the user according to the detected editing operation.
  • the control terminal can accurately obtain the display object edited by the user by detecting the user's editing operation of the display object.
  • detecting the user's display object editing operation includes: controlling the interactive device to display the display object editing interface; detecting the user's display object editing operation on the interactive device that displays the display object editing interface.
  • a clear interface can be provided for the user to operate, and the interactive device can be used to accurately detect the operation to ensure that the accurate display object is obtained.
  • the interactive device may be a touch display screen, for example.
  • the display object includes at least one of numbers, letters, symbols, text, and object identifiers to meet the rich display needs of users.
  • the display object editing interface can be provided with a text input box for the user to use the input method to input numbers and letters, and the font library can be configured, and the user can also load a new font library or delete an existing font library.
  • the display object editing interface can also display a collection of symbols and object identifiers for users to choose.
  • the display object can also be drawn by the user. It can be drawn by inputting numbers, letters, symbols, text, and object identification, or drawing any graphics.
  • the display object is a three-dimensional model to meet rich display requirements.
  • the specific processing method will be detailed below.
  • Step 140 Determine the position information of the display object in the space.
  • the position information in space can be the position information in the world coordinate system, or the position information in the camera coordinate system obtained by combining the pose information corresponding to each image frame with the position information in the world coordinate system.
  • step 130 may be performed first, or step 140 may be performed first, and the present application does not limit the execution order of the two.
  • FIG. 3 shows a schematic flowchart of a method for determining position information of a display object in space according to an embodiment of the present application.
  • step 140 in FIG. 1 includes:
  • Step 142 Obtain the position of the pixel selected by the user in the target image frame of the target video in the target image frame or the position of the pixel area selected in the target image frame in the target image frame.
  • the target video contains multiple image frames.
  • the user can select one of the image frames as the target image frame and select a pixel or pixel area in it to indicate the insertion position of the display object; that is, the user selects a point of interest, and when a pixel area (ROI, Region of Interest) is selected, the center point of the pixel area is used as the point of interest.
  • The user can select freely, or the feature points in the image frame can be displayed and made selectable, so that the user can directly select an identified feature point as the point of interest, simplifying subsequent calculations.
  • Reference points of the pixel area, such as the pixel at its upper-left corner and the pixel at its lower-right corner, can be used to represent the pixel area; for example, the user can select these two pixels simultaneously or successively to frame the pixel area, or select one pixel and then slide to another pixel to select it.
  • Algorithms such as SLIC (Simple Linear Iterative Clustering), NCut (Normalized Cut), Turbopixel, Quick-Shift, Graph-cut a, Graph-cut b, or other algorithms can be used to generate superpixels in the image frame (that is, irregular pixel blocks with a certain visual significance, formed by adjacent pixels with similar texture, color, brightness, and other characteristics).
  • The superpixels inside the user's selection frame are selected and those outside are excluded; for superpixels on the border of the frame, a rule can be set such that if a certain percentage (for example, 50%) or more of the superpixel lies inside the frame it counts as selected, otherwise it counts as unselected, and all selected superpixels together constitute the pixel area.
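  • A sketch of that selection rule using the SLIC implementation in scikit-image (an assumed choice of library; the 50% threshold follows the example above):

```python
import numpy as np
from skimage.segmentation import slic

def pixel_area_from_box(image, box, ratio=0.5):
    """Select the superpixels covered by a user-drawn box.

    image: HxWx3 RGB array; box: (x0, y0, x1, y1) corner pixels chosen by the
    user. A superpixel on the border counts as selected if at least `ratio`
    of its pixels fall inside the box. Returns a boolean mask of the area.
    """
    labels = slic(image, n_segments=400, compactness=10)

    x0, y0, x1, y1 = box
    inside = np.zeros(labels.shape, dtype=bool)
    inside[y0:y1, x0:x1] = True

    mask = np.zeros(labels.shape, dtype=bool)
    for label in np.unique(labels):
        region = labels == label
        if inside[region].mean() >= ratio:   # fraction of the superpixel in the box
            mask |= region
    return mask
```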
  • Step 144 Determine the position information of the display object in space according to the position of the pixel point or the pixel point area in the target image frame.
  • the position of the selected point of interest in the target image frame is clear, but it is still necessary to calculate its position in other image frames other than the target image frame, or to calculate its position in space.
  • the calculation process is similar to the calculation process of the pose information corresponding to each image frame, that is, it is achieved by tracking and matching feature points and solving the equation set simultaneously.
  • The selected point of interest can also be used as a feature point to establish conversion equations. The differences are: first, when calculating the pose information the feature points of the whole image must be extracted, whereas when calculating the point of interest only the feature points near the selected pixel, or the feature points inside the selected pixel area, are extracted, which speeds up the calculation.
  • Second, the frame rate of the target video can be reduced.
  • For example, a mobile phone video is 30 Hz, i.e. 30 images per second, and 5 of them can be extracted per second at equal intervals.
  • feature point tracking can be performed once along the forward and reverse directions of the target video time axis to obtain accurate calculation results. The scheme of calculating points of interest will be further described later.
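  • For illustration, the point-of-interest tracking and its 3D position estimate might look like the following sketch (OpenCV; the forward-backward check and the use of cv2.triangulatePoints are assumptions, not steps prescribed by the application):

```python
import cv2
import numpy as np

def track_point(frames_gray, pt):
    """Track one point of interest through a list of grayscale frames with
    Lucas-Kanade optical flow, stopping when the forward-backward check
    disagrees by more than 1 pixel."""
    tracks = [np.float32([[pt]])]
    for prev, curr in zip(frames_gray, frames_gray[1:]):
        fwd, st, _ = cv2.calcOpticalFlowPyrLK(prev, curr, tracks[-1], None)
        bwd, st2, _ = cv2.calcOpticalFlowPyrLK(curr, prev, fwd, None)
        ok = (st.ravel() == 1) & (st2.ravel() == 1) \
             & (np.linalg.norm(tracks[-1] - bwd, axis=2).ravel() < 1.0)
        if not ok.all():
            break                      # stop at the first unreliable frame
        tracks.append(fwd)
    return [t[0, 0] for t in tracks]   # pixel position per frame

def triangulate(K, pose_a, pose_b, uv_a, uv_b):
    """3D position of the tracked point from two frames with known poses.
    pose_a / pose_b are 3x4 [R | t] matrices from the pose-estimation step."""
    X = cv2.triangulatePoints(K @ pose_a, K @ pose_b,
                              np.float32(uv_a).reshape(2, 1),
                              np.float32(uv_b).reshape(2, 1))
    return (X[:3] / X[3]).ravel()
```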
  • Step 150 Project the display object onto each image frame according to the position information of the display object in the space and the pose information corresponding to each image frame to obtain the target composite video.
  • the position information of the display object in space reflects the absolute position of the display object, and the pose information corresponding to each image frame reflects the shooting angle of the camera. Combining the two, the display object can be projected into each image frame to obtain composite image frames; all composite image frames are combined in sequence to form the target composite video, completing the insertion of the display object into the target video.
  • FIG. 4 shows a schematic flowchart of a method for acquiring a target composite video according to an embodiment of the present application.
  • step 150 in FIG. 1 specifically includes:
  • Step 152 Determine the projection position and projection posture of the display object in each image frame according to the position information of the display object in the space and the pose information corresponding to each image frame.
  • the display object itself is not a point, and therefore has a certain shape.
  • the position information of the display object in the space may be, for example, the position of a reference point on the display object in the space.
  • The position information of the display object in space can be converted into position information in each image frame, and this is used as the projection position of the display object in each image frame.
  • Using the pose information corresponding to each image frame to determine the orientation of the display object can also be understood as performing coordinate transformation on the entire display object to obtain the projection posture of the display object in each image frame.
  • Step 154 Project the display object into each image frame according to the projection position and projection posture of the display object in each image frame to obtain the target composite video.
  • the display object is placed at the corresponding projection position in the corresponding image frame to complete the projection of the display object to obtain a composite image frame, and then combine to obtain the target composite video.
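  • A simplified compositing loop might look like this (a sketch only: it draws 2D text at the projected anchor with cv2.putText rather than rendering a full 3D model, and all names are hypothetical):

```python
import cv2
import numpy as np

def composite_video(frames, poses, K, object_xyz, text="HELLO"):
    """Project the display object's anchor point into every frame and draw it.

    frames: list of BGR images; poses: per-frame 3x4 [R | t] matrices from the
    pose-estimation step; object_xyz: the object's position in space (3,).
    """
    out = []
    P_w = np.append(object_xyz, 1.0)             # homogeneous world point
    for frame, pose in zip(frames, poses):
        p = K @ pose @ P_w                        # project into this frame
        if p[2] <= 0:                             # behind the camera: skip drawing
            out.append(frame)
            continue
        u, v = int(p[0] / p[2]), int(p[1] / p[2])
        composite = frame.copy()
        cv2.putText(composite, text, (u, v), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (255, 255, 255), 2)
        out.append(composite)
    return out
```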
  • the method of the present application further includes: obtaining position adjustment information and/or posture adjustment information of the display object edited by the user.
  • the selected point of interest can be used as the initial position for placing the display object.
  • the position of the display object can be further adjusted based on the point of interest.
  • the selected points of interest can be used as a bridge to solve the problem that the position where the user actually expects to insert the display object cannot be calculated or accurately calculated, which improves the flexibility of the solution and can meet the rich image processing needs.
  • the display object is placed frontally in the image frame by default.
  • step 152 includes: determining the projection of the display object in each image frame according to the position information of the display object in space, the pose information corresponding to each image frame, and the position adjustment information and/or posture adjustment information of the display object Position and projection attitude.
  • the specific projection process is as follows:
  • the realistic graphics generated at this time are placed on the initial position (that is, the point of interest), and are placed frontally in the image frame where the user selects the point of interest.
  • The user can input position adjustment information as needed, such as dragging to adjust the position of the realistic graphic (that is, adjusting the translation transformation t relative to the point of interest), or posture adjustment information to rotate the realistic graphic (that is, adjusting its angle).
  • Note that the pose information above should be expressed in the camera coordinate system of the first image frame.
  • The position and pose of each image frame relative to the realistic graphic are then calculated (a simple coordinate-system conversion gives the orientation of the realistic graphic in the image frame, and a z-buffer is used to obtain the hidden-surface (blanking) map for rendering), and the camera-model projection relationship is used to calculate the two-dimensional position information of the realistic graphic in each image frame. At this point the placement position and orientation of the realistic graphic are obtained, and the placement of the realistic graphic is complete.
  • Fig. 7 shows a schematic flowchart of an image processing method according to another embodiment of the present application to describe a solution for making a preview composite video.
  • the image processing method includes:
  • Step 210 Obtain a target video collected by a shooting device of the movable platform when the movable platform moves in space.
  • Step 220 Obtain initial pose information of the camera when collecting each image frame, where the initial pose information is collected by a sensor configured on the movable platform.
  • Step 230 Obtain the display object edited by the user.
  • Step 240 Determine the position information of the display object in the space.
  • For a pixel area, for example, the position of the pixel at its center point can be selected, and then the initial pose information and the intrinsic parameters of the shooting device are used together to obtain a rough position of the pixel or pixel area in space, which is recorded as the preview position information of the display object in space.
  • Step 250 Project the display object onto each image frame according to the position information of the display object in the space and the initial pose information corresponding to each image frame to obtain a preview composite video.
  • the initial pose information is used to replace the pose information, and a rough preview composite video is first obtained to preview the composite effect.
  • Step 260 Determine whether the received preview feedback information is confirmation information, if yes, go to step 270, and if not, go to step 230.
  • Preview feedback information can then be obtained. If the user is satisfied with the preview composite video, the user performs a confirmation operation and the generated preview feedback information is confirmation information; if not satisfied, the user performs a cancel operation, the generated preview feedback information is cancel information, and the process returns to step 230, where the user can continue editing the display object and obtain a new preview composite video. This cycle repeats until the user performs the confirmation operation, after which the subsequent processing steps are carried out, which reduces the computational load and improves the response speed.
  • Step 270 Obtain the pose information of the camera when capturing each image frame in the target video.
  • Step 280 Project the display object onto each image frame according to the position information of the display object in the space and the pose information corresponding to each image frame to obtain the target composite video.
  • Step 210, step 230, step 270, and step 280 in this embodiment can respectively refer to step 110, step 130, step 120, and step 150 in the foregoing embodiment, which will not be repeated here.
  • FIG. 8 shows a schematic flowchart of an image processing method according to still another embodiment of the present application to describe the admission decision when selecting a point of interest.
  • the image processing method includes:
  • Step 310 Obtain the target video collected by the shooting device of the movable platform when the movable platform moves in the space.
  • Step 320 Obtain the pose information of each image frame in the target video when the shooting device collects each image frame.
  • Step 330 Determine the target sub-video from the target video.
  • the target sub-video includes the video collected by the shooting device when the motion state of the shooting device in the target video meets the preset motion condition. What this step achieves is to filter the computable parts of the target video to obtain the video parts that can be used to calculate points of interest.
  • The preset motion condition requires that the shooting device undergoes displacement, rather than remaining static or merely rotating (panning) in place.
  • FIG. 9 shows a schematic flowchart of a method for determining a target sub-video according to an embodiment of the present application.
  • step 330 in FIG. 8 includes:
  • Step 331 Determine a plurality of continuous image frames from the target video, wherein the sum of the average movement amount of the feature points between adjacent image frames of the plurality of continuous image frames is greater than or equal to a preset distance threshold, and the plurality of continuous image frames The disparity of is greater than or equal to the preset disparity threshold.
  • Step 332 Determine multiple consecutive image frames as target sub-videos.
  • the target sub-video is composed of multiple continuous image frames, and these multiple continuous images need to meet two conditions.
  • the first condition is that the sum of the average movement amount of the feature points between adjacent image frames is greater than or equal to the preset distance threshold to ensure a sufficient amount of movement.
  • The second condition is that the disparity of the multiple consecutive image frames is greater than or equal to the preset disparity threshold, which filters out the apparent movement caused by the shooting device merely rotating (panning) in place.
  • the multiple continuous image frames may specifically include multiple continuous segments, and each segment is composed of multiple continuous image frames, that is, the multiple continuous image frames are divided into multiple segments. In particular, when the multiple continuous image frames only include one segment, it is equivalent to not dividing the multiple continuous image frames.
  • the above-mentioned second condition may specifically be that the sum of the disparity of the multiple consecutive segments is greater than or equal to a preset disparity threshold, or that the disparity of each segment is greater than or equal to a preset threshold, and this threshold may be Less than or equal to the preset parallax threshold.
• the above first condition may further require that, within each segment, the sum of the average movement amounts of feature points between adjacent image frames is greater than or equal to a preset threshold, and this threshold may be less than or equal to the preset distance threshold.
• the number of the multiple consecutive image frames needs to be greater than or equal to a preset image number threshold. Since the multiple consecutive image frames already have a large enough amount of movement, too few consecutive image frames would mean that the camera moved a great deal in a short period of time, which would leave too few continuously observed feature points and make the calculation difficult. By limiting the threshold on the number of images, it can be ensured that the number of feature points continuously observable across the multiple consecutive image frames is sufficiently large, which ensures the accuracy of the interest point calculation.
• Specifically, feature points can be extracted and tracked frame by frame, and segments are divided according to the cumulative value of the average feature-point movement; segments whose disparity reaches the threshold (for example, 10 pixels) are regarded as usable segments, and segments that do not reach the threshold are regarded as unusable segments. Finally, adjacent segments of the same kind are merged into parts: if a part contains more than a predetermined number (for example, 5) of usable segments, that part is computable and serves as the target sub-video; otherwise it is a non-computable part. In this way the foregoing two conditions and the requirement on the number of consecutive image frames are satisfied at the same time.
• Specifically, the average movement amount of the full-image feature points between two adjacent image frames can be calculated and accumulated frame by frame until the cumulative value exceeds a certain threshold, such as 20 pixels.
• For example, if the cumulative value of the average feature-point movement is 18 pixels and at the 10th image frame it becomes 21 pixels, then the 1st image frame to the 10th image frame are divided into one segment.
• The disparity between image frame No. 1 and image frame No. 10 can then be calculated as the disparity of this segment.
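• A minimal sketch of this filtering strategy is shown below, using OpenCV's KLT tracker on grayscale frames. The thresholds (20 px cumulative movement per segment, 10 px disparity) follow the examples above, while the disparity here is approximated by the mean feature displacement between a segment's first and last frame, a simplification of a rotation-compensated parallax check.

```python
import cv2
import numpy as np

def split_into_segments(frames, move_threshold=20.0):
    """Split grayscale frames into segments whose cumulative average
    feature movement reaches move_threshold pixels."""
    segments, start, accumulated = [], 0, 0.0
    prev = frames[0]
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=300, qualityLevel=0.01, minDistance=8)
    for i in range(1, len(frames)):
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, frames[i], pts, None)
        ok = status.ravel() == 1
        if ok.sum() < 10:  # tracking collapsed: re-detect and continue
            pts = cv2.goodFeaturesToTrack(frames[i], 300, 0.01, 8)
            prev = frames[i]
            continue
        accumulated += float(np.linalg.norm(nxt[ok] - pts[ok], axis=2).mean())
        pts, prev = nxt[ok].reshape(-1, 1, 2), frames[i]
        if accumulated >= move_threshold:          # e.g. 18 px -> 21 px closes a segment
            segments.append((start, i))
            start, accumulated = i, 0.0
            pts = cv2.goodFeaturesToTrack(prev, 300, 0.01, 8)
    return segments

def segment_disparity(frames, seg):
    """Crude disparity proxy: mean displacement of features tracked from the
    segment's first frame directly to its last frame."""
    a, b = seg
    pts = cv2.goodFeaturesToTrack(frames[a], 300, 0.01, 8)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(frames[a], frames[b], pts, None)
    ok = status.ravel() == 1
    return float(np.linalg.norm(nxt[ok] - pts[ok], axis=2).mean()) if ok.any() else 0.0

def usable_segments(frames, disparity_threshold=10.0):
    """Keep only segments whose disparity reaches the threshold."""
    return [s for s in split_into_segments(frames)
            if segment_disparity(frames, s) >= disparity_threshold]
```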
  • Step 340 Obtain the display object edited by the user.
  • Step 350 Output first prompt information, where the first prompt information is used to instruct the user to select a pixel point or a pixel point area in the target image frame in the target sub-video.
• This is the first layer of admission determination: by outputting the first prompt information, the computable image frames in the target sub-video can be actively provided, so that the user can conveniently and accurately select the target image frame.
• For example, during display, the image frames in the target sub-video can be kept in a selectable state while the image frames outside the target sub-video are grayed out, or, when available image frames are announced by voice, only the image frames in the target sub-video are announced.
  • Step 360 In response to the user's operation of selecting a pixel or a pixel area in an image frame other than the target sub-video in the target video, output a prompt message indicating that the selection is invalid.
  • step 350 and step 360 play the role of admission determination from both positive and negative aspects, and they may exist at the same time, or only one of them may be retained.
  • Step 370 Determine the position information of the display object in the space.
  • step 370 specifically includes:
• Step 371 Obtain the position, in the target image frame, of the pixel point selected by the user in a target image frame of the target sub-video, or the position, in the target image frame, of the selected pixel point area.
  • the target image frame can be selected from the target sub-video, and the interest point can be selected, which can ensure the accurate calculation of the interest point.
  • Step 372 Determine whether the object in the space indicated by the pixel point or the pixel point area is a stationary object, if yes, go to step 374, and if not, go to step 373.
  • Step 373 Output second prompt information, and return to step 371, where the second prompt information is used to prompt the user that a pixel or a pixel area is not selectable, or to prompt the user to select another pixel or pixel area.
• Steps 372 and 373 constitute the second layer of admission determination. Since the target is required to be stationary during measurement, a convolutional neural network (CNN) can be used to determine whether the selected target object is a potentially moving object (such as a person, car, boat, or ocean wave); if it is, the second prompt information needs to be output to warn the user that the measurement may be inaccurate and to request reselection of the point of interest.
  • the second prompt information is used to prompt the user to select other pixels or pixel areas, it can prompt the user to select feature points on a stationary object, which helps to improve the prompt efficiency, reduce the amount of calculation, and improve the accuracy of the calculation.
  • the point of interest is only the initial position where the display object is placed, and the position of the display object can be further adjusted based on the point of interest. For example, if you set the point of interest directly on the ocean wave, a warning will pop up, but you can set the point of interest on the beach first, and then adjust the display object to the surface of the sea.
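• A hedged sketch of this admission check is given below. The text describes the classifier only as a CNN; here `segment_semantics` is a hypothetical callback standing in for any off-the-shelf semantic-segmentation network that returns a per-pixel class-name map, and the set of mover classes follows the examples above (people, cars, boats, waves).

```python
POTENTIAL_MOVERS = {"person", "car", "boat", "wave"}

def is_static_selection(target_frame, pixel, segment_semantics):
    """Return True if the clicked pixel does not land on a potentially moving object."""
    u, v = pixel
    label_map = segment_semantics(target_frame)  # assumed: HxW array of class names
    return label_map[v, u] not in POTENTIAL_MOVERS

# Usage sketch: when the check fails, output the second prompt information
# (prompt_user is likewise a hypothetical UI callback).
# if not is_static_selection(frame, (u, v), segment_semantics):
#     prompt_user("This point may move; please select a point on a stationary object.")
```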
  • Step 374 Determine whether the pixel area meets the preset texture condition, if yes, go to step 376, if not, go to step 375.
  • the area within a certain size range around the pixel can be used as the pixel area.
  • Step 375 Output third prompt information, and return to step 371, where the third prompt information is used to prompt the user that a pixel or a pixel area is not selectable, or to prompt the user to select another pixel or pixel area.
  • Steps 374 and 375 are the third layer admission determination.
• Feature point extraction is used to analyze whether the target is trackable, that is, whether the target has enough texture, by extracting the feature points in the target area (that is, the selected pixel point area).
• Feature extraction methods such as Harris corner detection and HOG (Histogram of Oriented Gradients) can be used; if not enough feature points are found, the texture is too weak to be tracked, and the user is likewise warned.
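• A minimal sketch of this texture check is shown below: corner features are extracted inside the selected ROI and a minimum count is required before the point of interest is accepted. cv2.goodFeaturesToTrack with useHarrisDetector=True corresponds to the Harris-corner option mentioned above; the minimum count of 10 is an illustrative value, not one taken from the text.

```python
import cv2

def roi_has_enough_texture(gray_frame, roi, min_features=10):
    """roi = (x, y, w, h) in pixel coordinates of the target image frame."""
    x, y, w, h = roi
    patch = gray_frame[y:y + h, x:x + w]
    corners = cv2.goodFeaturesToTrack(patch, maxCorners=50, qualityLevel=0.01,
                                      minDistance=5, useHarrisDetector=True, k=0.04)
    # Too few corners means the texture is too weak to track; warn the user.
    return corners is not None and len(corners) >= min_features
```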
• If the third prompt information is used to prompt the user to select other pixels or pixel areas, it can prompt the user to select a pixel area that meets the texture condition or the feature points in such an area, which helps to improve the prompt efficiency, reduce the amount of calculation, and improve the calculation accuracy.
  • the third prompt information here may be the same as or different from the second prompt information, which is not limited here.
  • the interactive device can be controlled to output the above-mentioned selected invalid prompt information, the first prompt information, the second prompt information, and the third prompt information.
  • the interactive device can be, for example, a touch screen or an intelligent voice interaction device.
  • the image processing device is set on the control terminal of the movable platform, it can also directly output the above-mentioned invalid prompt information, first prompt information, second prompt information, and third prompt information through methods such as display, voice broadcast, etc.
  • the image processing device is set on a movable platform, it can also directly output the above-mentioned invalid prompt information, first prompt information, second prompt information, and third prompt information by means of, for example, lighting a warning light.
  • Step 376 Determine the position information of the display object in space according to the position of the pixel point or the pixel point area in the target image frame.
• For this step, reference can be made to step 144 in the foregoing embodiment, with the interest point calculation performed within the target sub-video.
• It should be noted that the target sub-video is the part of the target video that can be used to calculate points of interest, so points of interest can only be selected within the target sub-video; however, the display object can still appear in non-computable video segments, as long as the point of interest appears in them.
• For example, if a non-computable video segment follows the target sub-video, the display object placed based on the target sub-video may also appear in that non-computable part of the video.
  • Step 380 Project the display object onto each image frame according to the position information of the display object in the space and the pose information corresponding to each image frame to obtain the target composite video.
  • Step 310, step 320, step 340, and step 380 in this embodiment can respectively refer to step 110, step 120, step 130, and step 150 in the foregoing embodiment, and will not be repeated here.
  • FIG. 11 shows a schematic flowchart of a method for calculating interest points according to an embodiment of the present application, and is aimed at the case of selecting a pixel point area.
  • the method for calculating points of interest includes:
  • Step 410 Determine feature points in the pixel point area according to the position of the pixel point area in the target image frame.
  • Step 420 Obtain the position of the feature point in the pixel point area in the target image frame.
• the point of interest is specifically the geometric center pixel of the pixel point area. Its position in the target image frame is known, but to calculate the position information of the corresponding spatial point in space, it is also necessary to know the position of the point of interest in other image frames. However, with high probability the point of interest is not itself a feature point. In this case, the feature points in the pixel point area can be used to fit and estimate the geometric center pixel. By obtaining the positions of the extracted feature points in the target image frame, the positional relationship between the extracted feature points and the point of interest can be obtained, and the fitting estimation can be performed on this basis, which helps to improve the calculation accuracy.
  • Step 430 Determine, according to the position of the feature point in the target image frame, the optical flow vector of the spatial point corresponding to the geometric center pixel point of the pixel point area in at least one reference image frame of the target video.
  • the scheme of obtaining the optical flow vector is specifically adopted.
• Specifically, the KLT feature tracking algorithm can be used to calculate the optical flow vectors of the feature points, and then, by combining the positional relationship between the feature points and the interest point, the optical flow vector of the interest point in at least one reference image frame can be obtained by fitting.
• For example, the weighted average of the optical flow vectors of the feature points can be calculated as the optical flow vector of the point of interest, namely
$$\mathbf{x}_0 = \frac{\sum_i w_i \mathbf{x}_i}{\sum_i w_i}$$
where $\mathbf{x}_i$ is the optical flow vector of feature point $i$ in the pixel point area and $w_i$ is its weight.
• $w_i$ can be determined according to the two-dimensional image position relationship between the feature point and the geometric center pixel, for example as a weight that decreases with the distance $d_i$, such as $w_i = e^{-d_i/\sigma}$, where $\sigma$ is adjusted based on experience and is an adjustable parameter.
• $d_i = \sqrt{(u_i - u_0)^2 + (v_i - v_0)^2}$ represents the distance from feature point $i$ to the geometric center pixel, where $(u_i, v_i)$ are the pixel coordinates of feature point $i$ in the image frame and $(u_0, v_0)$ are the pixel coordinates of the geometric center pixel in the image frame.
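• The fitting step above can be sketched as follows, assuming the ROI's feature points have already been extracted in the target frame: the KLT tracker gives each feature point's optical flow into a reference frame, and the interest point's flow is their distance-weighted average. The value of sigma is an illustrative choice for the empirically tuned parameter mentioned above.

```python
import cv2
import numpy as np

def interest_point_in_reference(target_gray, ref_gray, roi_feature_pts, center_uv, sigma=20.0):
    """roi_feature_pts: (N,1,2) float32 feature points inside the ROI;
    center_uv: (u0, v0) geometric center pixel of the ROI in the target frame."""
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(target_gray, ref_gray,
                                                   roi_feature_pts, None)
    ok = status.ravel() == 1
    flows = (next_pts[ok] - roi_feature_pts[ok]).reshape(-1, 2)        # x_i
    dists = np.linalg.norm(roi_feature_pts[ok].reshape(-1, 2)
                           - np.float32(center_uv), axis=1)            # d_i
    weights = np.exp(-dists / sigma)                                   # w_i
    flow_0 = (weights[:, None] * flows).sum(axis=0) / weights.sum()    # weighted average
    return np.float32(center_uv) + flow_0   # estimated interest-point position in ref frame
```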
  • Step 440 Determine the position of the corresponding spatial point of the geometric center pixel in the at least one reference image frame according to the optical flow vector.
  • the position of the point of interest in at least one reference image frame can be obtained.
  • Step 450 Determine the position information of the spatial point corresponding to the geometric center pixel point in space according to the position in the at least one reference image frame and the position of the feature point in the target image frame.
• After obtaining the position of the point of interest in at least one reference image frame, the point of interest can itself be used as a feature point and, together with the other feature points, be used to establish and solve the coordinate conversion equation set, thereby completing the calculation of the point of interest.
• For example, the BA (bundle adjustment) algorithm can be used for the calculation.
• It should be noted that, for the case where a single pixel is selected, the geometric center pixel of the pixel point area can be replaced with the selected pixel, and the feature points in the pixel point area can be replaced with feature points within a certain range near the selected pixel; the calculation can then be performed in the same way.
  • FIG. 12 shows a schematic flowchart of a method for calculating interest points according to another embodiment of the present application, and is aimed at the case of selecting pixels.
  • the method for calculating points of interest includes:
  • Step 510 Obtain the position in space of the spatial point corresponding to the target feature point in the target image frame.
  • the pixel distance between the target feature point and the pixel point is less than or equal to the preset pixel distance threshold, that is, the target feature point is located near the pixel point.
  • the pixel point is not a feature point with high probability, so it is necessary to combine nearby feature points for fitting estimation.
  • the target feature point may be, for example, a reliable feature point analyzed when calculating the pose information of the camera, so as to ensure the accuracy of the calculation of the interest point.
  • Step 520 Fit the target plane according to the position in the space of the spatial point corresponding to the target feature point.
• If the target feature point is one of the aforementioned reliable feature points, the position in space of the spatial point corresponding to the target feature point has already been obtained when calculating the pose information of the camera.
• If the target feature point is not one of the aforementioned reliable feature points, it is necessary to calculate the position of the corresponding spatial point in space; the calculation method is still to solve the conversion equation set, which will not be repeated here.
  • Step 530 Determine the position information of the display object in space according to the position of the pixel in the target image frame and the fitted target plane.
• Since the target feature point is near the selected pixel point, it can be considered that the spatial point corresponding to the pixel point also lies in the fitted target plane.
  • the calculation of points of interest is completed, and the position information of the display object in the space is obtained.
• For example, suppose the pixel clicked on the i-th image frame is (u, v); with high probability there is no feature point exactly at this pixel.
• The nearest reliable feature point (one retained after filtering during the pose-information calculation) is recorded as $feature_{i,click}$, with the three-dimensional coordinates $P_{i,click}$ of its corresponding spatial point; it is combined with the nearby three-dimensional points $P_k(x_k, y_k, z_k)$ (feature points whose three-dimensional positions lie nearby) to fit the target plane $(a, b, c, d)$, and the three-dimensional coordinates of the spatial point corresponding to the user's pixel are then obtained through interpolation.
• The plane fitting can be described by the following optimization problem:
$$\min_{a,b,c,d}\ \sum_k (a x_k + b y_k + c z_k + d)^2 \quad \text{s.t.}\ a^2 + b^2 + c^2 = 1$$
• The intersection point of the back-projected ray through the pixel (u, v) and the fitted target plane is recorded as $P_0(x, y, z)$, which satisfies the plane equation $a x + b y + c z + d = 0$ and projects back onto the pixel (u, v) under the pose of the i-th image frame.
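• A minimal sketch of this interpolation is given below, under stated assumptions: the nearby feature points are already triangulated in the world frame, the intrinsics K and the world-to-camera pose (R, t) of the i-th image frame are known, and the clicked pixel's spatial point is taken as the intersection of the back-projected ray with the fitted plane.

```python
import numpy as np

def fit_plane(points_3d):
    """Least-squares plane (a, b, c, d) with a^2 + b^2 + c^2 = 1 through an Nx3 array."""
    centroid = points_3d.mean(axis=0)
    _, _, vt = np.linalg.svd(points_3d - centroid)
    normal = vt[-1]                              # direction of smallest variance
    d = -float(normal @ centroid)
    return normal[0], normal[1], normal[2], d

def intersect_ray_with_plane(u, v, K, R, t, plane):
    """Back-project pixel (u, v) and return its intersection P0 with ax+by+cz+d=0,
    all in world coordinates."""
    a, b, c, d = plane
    n = np.array([a, b, c])
    cam_center = -R.T @ t                        # camera center in the world frame
    ray_dir = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])
    lam = -(n @ cam_center + d) / (n @ ray_dir)  # solve n.(C + lam*dir) + d = 0
    return cam_center + lam * ray_dir            # P0: interest point in the world frame
```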
  • the target video is captured by the camera when the movable platform is following the target object in the space
• determining the position information of the display object in the space includes: acquiring the position information, in the space, of the object followed by the camera; and determining the position information of the display object in the space according to the position information of the following object in the space.
• Since the shooting device needs to select the following object when performing follow shooting, the following object is taken as the point of interest or region of interest by default, and the position of the display object in the space is determined directly based on the position information of the following object in the space.
• For example, the position of the following object can be used directly as the position of the display object, or the position of the display object can be adjusted based on the position of the following object, which helps to greatly reduce the amount of calculation and lighten the computational load.
  • the target video is captured by the camera when the movable platform is moving around the target object in the space
• determining the position information of the display object in the space includes: acquiring the position information, in the space, of the object around which the camera moves; and determining the position information of the display object in the space according to the position information of the surrounding object in the space.
  • the surrounding object is taken as the point of interest or region of interest by default, and the position of the display object in the space is determined directly based on the position information of the surrounding object in the space.
• For example, the position of the surrounding object can be used directly as the position of the display object, or the position of the display object can be adjusted based on the position of the surrounding object, which helps to greatly reduce the amount of calculation and lighten the computational load.
• In the case where the control terminal of the movable platform includes the image processing device, the image processing method further includes: playing the target composite video, storing it, or running a social application program to share it, so that the user can watch, save, or share the target composite video.
• In the case where the movable platform includes the image processing device, the method further includes: sending the target composite video to the control terminal of the movable platform, so that the control terminal can play it, store it, or run a social application to share it, for the user to watch, save, or share the target composite video.
  • the projected image frames can be played frame by frame for the user to view the effect of inserting the display object. If the user confirms the effect, then composite and save the target composite video. If you are not satisfied, you can continue to edit the display object, select a point of interest, or adjust the position of the inserted display object based on the selected point of interest.
  • the user selects the video to be edited (ie the initial video) and downloads it to the APP of the smart device.
• At the same time, the APP automatically downloads the initial pose information provided by the inertial navigation system of the corresponding UAV and by the gimbal stabilization system, namely the AIS (Automatic Identification System) file, as well as the intrinsic parameters K of the camera (the matrix K in the background knowledge).
• The user first crops the video to select the desired part; the APP splits the cropped target video into image frames and, according to the filtering strategy, selects the computable video segments as the target sub-video.
  • the user selects a point of interest on the target image frame in the target sub-video (in the actual process, the user can select a region ROI, and the center point of the region is the point of interest).
• For a tracking video or a one-click short video, the default point of interest is on the selected subject (the target of smart tracking, or the subject around which one-click shooting orbits).
  • the user enters the display object to be displayed.
  • Calculating accurate points of interest is similar to calculating the pose information in the previous step.
• One difference is that the pose information only needs to be calculated once, so the image frames used in that calculation do not need to be saved; for the points of interest, however, the user may adjust them at any time, so the image frames need to be kept from beginning to end.
• In this case, frame-rate reduction processing can be performed.
• For example, a mobile phone video is 30 Hz, that is, 30 images per second, and 5 of them can be extracted at equal intervals; this operation is only used when memory is limited.
  • Another difference is the calculation of points of interest. Feature points can be extracted only in the ROI of the framed area, and tracking and matching calculations are performed to calculate accurate points of interest.
• Finally, the image frames are recombined into the video.
  • the embodiment of the second aspect of the present application provides an image processing device.
• The image processing device may be provided entirely on the movable platform, entirely on the control terminal of the movable platform, or partly on the movable platform and partly on the control terminal of the movable platform.
  • the movable platform can be, for example, an unmanned aerial vehicle, or other vehicles with multiple cameras, such as an unmanned car.
• The control terminal can be any terminal device capable of interacting with the movable platform, such as a remote control device or a smart device (interacting via an APP), for example a smart phone, smart tablet, or smart glasses (such as VR glasses or AR glasses); the SD card of the movable platform can also be inserted into a computer, in which case the control terminal is the computer.
  • Fig. 14 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present application.
  • the image processing apparatus 100 includes: a memory 102, configured to store a computer program; a processor 104, configured to execute a computer program to realize: acquiring a photograph of the movable platform when the movable platform moves in space The target video collected by the device; obtain the pose information of each image frame in the target video captured by the shooting device; obtain the display object edited by the user; determine the position information of the display object in the space; according to the position of the display object in the space The information and the pose information corresponding to each image frame project the display object onto each image frame to obtain the target composite video.
• The image processing device 100 obtains the target video collected by the camera of the movable platform and the corresponding pose information of the camera during collection, and obtains the display object edited by the user, so that the display object can be inserted into the target video to realize the production of a special-effects video. Specifically, by acquiring the pose information corresponding to the shooting device when each image frame is collected (serving as the extrinsic parameters of the shooting device, including rotation information and displacement information) and combining it with the known intrinsic parameters of the shooting device, the conversion between the world coordinate system and the homogeneous image coordinate system can be realized, so as to determine the view, in each image frame of the target video, of an entity in the space.
  • the insertion position of the display object can be clarified.
  • the position information of the display object in the space reflects the absolute position of the display object, and the pose information corresponding to each image frame reflects the shooting angle of the camera.
• the display object can be projected into each image frame to obtain composite image frames; all composite image frames are sequentially combined to form the target composite video, completing the process of inserting the display object into the target video.
  • the position information in the space can be the position information in the world coordinate system, or the position information in the camera coordinate system obtained by combining the pose information corresponding to each image frame and the position information in the world coordinate system.
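• The projection itself follows the standard pinhole model, sketched below: a display-object point in world coordinates is transformed into the camera coordinate system with the frame's pose (R, t) and then mapped to pixel coordinates with the intrinsic matrix K. Rendering details such as occlusion and rasterization are omitted.

```python
import numpy as np

def project_point(P_world, K, R, t):
    """P_world: 3-vector in the world frame; (R, t): world-to-camera pose of one frame."""
    P_cam = R @ P_world + t        # world -> camera coordinates
    uvw = K @ P_cam                # camera -> homogeneous image coordinates
    return uvw[:2] / uvw[2]        # pixel coordinates (u, v)

def project_display_object(vertices_world, K, R, t):
    """Project every vertex of an edited display object into one image frame."""
    return np.array([project_point(P, K, R, t) for P in vertices_world])
```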
  • the display object edited by the user can be obtained first, or the position information of the display object in the space can be determined first, and the execution order of the two is not limited in this application.
  • the memory 102 may include a large-capacity memory for data or instructions.
• the memory 102 may include a hard disk drive (Hard Disk Drive, HDD), a floppy disk drive, a flash memory, an optical disk, a magneto-optical disk, a magnetic tape or a universal serial bus (Universal Serial Bus, USB) drive, or a combination of two or more of these.
  • the memory 102 may include removable or non-removable (or fixed) media.
  • the memory 102 may be inside or outside the integrated gateway disaster recovery device.
  • the memory 102 is a non-volatile solid-state memory.
  • the memory 102 includes read-only memory (ROM).
• the ROM can be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically rewritable ROM (EAROM) or flash memory, or a combination of two or more of these.
• the foregoing processor 104 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
• the processor 104 is further configured to execute the computer program to: obtain the position information, in each image frame, of the feature points in the image frame; and determine, according to the position information of the feature points in the image frames, the pose information of the camera when each image frame is collected.
• Based on a general camera model, for a feature point, a conversion equation can be established between its three-dimensional position information in the world coordinate system and its two-dimensional position information in the homogeneous image coordinate system (that is, its position information in the image frame).
• Image recognition can be used to determine the position information (a known quantity), in each image frame, of multiple feature points in that image frame, and two adjacent image frames often share a large number of overlapping feature points.
  • the three-dimensional position information (unknown quantity) of the same feature point in the world coordinate system is the same, and an image frame also has unique pose information (unknown quantity).
• Therefore, by extracting feature points from the target video frame by frame and tracking and matching them, multiple sets of conversion equations can be obtained; solving these equations simultaneously yields the pose information corresponding to each image frame and the three-dimensional position information of each feature point in the world coordinate system.
• If an erroneous image frame is identified while tracking the feature points, it can be deleted to optimize the calculation result.
  • the number of image frames in a target video is huge, and if the frame-by-frame tracking is performed on the basis of the first image frame, a large deviation may occur.
• the image frames with obvious changes can be marked as key frames during the frame-by-frame tracking, and the subsequent non-key frames can then be tracked on the basis of the key frames; when a new key frame appears, the non-key frames after it are tracked on the basis of the new key frame, thereby improving the calculation accuracy.
• For example, if there are more than 5 key frames in the target video used for the calculation, the calculation can be performed.
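• A hedged, two-view sketch of this pose recovery is given below. The full method jointly solves for every frame's pose and every feature point's three-dimensional position (with key frames and, optionally, prior pose data); here only the relative pose between two key frames is recovered from tracked feature points via the essential matrix, which is the basic building block such a pipeline iterates and refines.

```python
import cv2

def relative_pose(gray_a, gray_b, K):
    """Recover the rotation and (unit-scale) translation from key frame a to key frame b."""
    pts_a = cv2.goodFeaturesToTrack(gray_a, maxCorners=500, qualityLevel=0.01, minDistance=8)
    pts_b, status, _ = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, pts_a, None)
    ok = status.ravel() == 1
    pts_a, pts_b = pts_a[ok].reshape(-1, 2), pts_b[ok].reshape(-1, 2)
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t
```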
• the processor 104 is further configured to execute the computer program to: obtain the initial pose information of the camera when capturing each image frame, where the initial pose information is collected by a sensor configured on the movable platform; and determine, according to the position information of the feature points in the image frames and the initial pose information corresponding to each image frame, the pose information of the camera when each image frame is collected.
• By using the sensors configured on the movable platform, some prior information can be obtained while the target video is being collected, including unmanned-aerial-vehicle pose data, gimbal pose data, and the trajectory data used for one-click shooting. From this, low-precision initial pose information of the shooting device can be obtained, and this initial pose information is used as the initial iteration value of the pose information when solving the equation set simultaneously, which can reduce the number of iterations, speed up algorithm convergence, and reduce the probability of errors caused by an improper choice of initial value, helping to shorten the post-processing time of the target video. Even with a smart phone, the display object can be inserted into the target video to make the target composite video.
• the processor 104 is further configured to execute the computer program to: obtain the initial pose information of the camera when capturing each image frame, where the initial pose information is collected by a sensor configured on the movable platform; and project the display object onto each image frame according to the position information of the display object in the space and the initial pose information corresponding to each image frame, to obtain a preview composite video.
• That is, the rough initial pose information collected by the sensors of the movable platform is used to perform the coordinate transformation directly; the resulting composite video will suffer from display-object jitter and the effect is not ideal, but the calculation is fast, so it can be used to produce a preview composite video that allows the effect to be previewed quickly.
• the processor 104 is further configured to execute the computer program to: determine the projection position and projection posture of the display object in each image frame according to the position information of the display object in the space and the pose information corresponding to each image frame; and project the display object into each image frame according to the projection position and projection posture of the display object in each image frame, to obtain the target composite video.
• In general, the display object is not a single point but has a certain shape.
  • the position information of the display object in the space may be, for example, the position of a reference point on the display object in the space. Taking the position information of the display object in space as the position information in the world coordinate system as an example, combining the pose information of each image frame and the internal parameters of the shooting device, the position information of the display object in the space can be converted into each image The position information in the frame is used as the projection position of the display object in each image frame.
  • Using the pose information corresponding to each image frame to determine the orientation of the display object can also be understood as performing coordinate transformation on the entire display object to obtain the projection posture of the display object in each image frame. According to the determined projection posture, the display object is placed at the corresponding projection position in the corresponding image frame to complete the projection of the display object to obtain a composite image frame, and then combine to obtain the target composite video.
  • the processor 104 is further configured to execute a computer program to achieve: obtain the position adjustment information and/or posture adjustment information of the display object edited by the user; according to the position information of the display object in the space, each image frame The corresponding pose information and the position adjustment information and/or the posture adjustment information of the display object determine the projection position and the projection posture of the display object in each image frame.
  • the processor 104 may also obtain position adjustment information and/or posture adjustment information of the display object edited by the user. For the determined position of the display object in the space, it can be used as the initial position for placing the display object. By obtaining the position adjustment information, the position of the display object can be further adjusted. At this time, there is no need to recalculate the new position. It helps to reduce the amount of calculation, and the initial position can be used as a bridge to solve the problem that the position where the user actually expects to insert the display object cannot be calculated or accurately calculated, which improves the flexibility of the scheme and can meet the rich image processing needs. In addition, the display object is placed frontally in the image frame by default. By obtaining the posture adjustment information, the rotation angle of the display object can be adjusted, and then the posture can be changed. A small amount of calculation can meet the rich display needs of users.
  • the processor 104 is further configured to execute a computer program to realize: detecting the user's editing operation of the display object, and determining the display object edited by the user according to the detected editing operation.
  • the display object is specifically determined by the control terminal detecting the user's display object editing operation, and the display edited by the user can be accurately obtained. Object to meet the display needs of users.
  • the processor 104 is further configured to execute a computer program to implement: control the interactive device to display the display object editing interface; and detect the user's display object editing operation on the interactive device that displays the display object editing interface.
  • a clear interface can be provided for the user to operate, and the interactive device can be used to accurately detect the operation to ensure that accurate display objects are obtained.
  • the interactive device may be a touch display screen, for example.
  • the display object includes at least one of numbers, letters, symbols, text, and object identifiers to meet the rich display needs of users.
  • the display object editing interface can be provided with a text input box for the user to use the input method to input numbers and letters, and the font library can be configured, and the user can also load a new font library or delete an existing font library.
  • the display object editing interface can also display a collection of symbols and object identifiers for users to choose.
  • the display object can also be drawn by the user. It can be drawn by inputting numbers, letters, symbols, text, and object identification, or drawing any graphics.
  • the display object is a three-dimensional model to meet rich display requirements.
  • the processor 104 is further configured to execute a computer program to achieve: obtain the initial video collected by the shooting device when the movable platform moves in space; detect the user's video selection operation, and select the video from the initial Identify the target video in the video.
  • the control terminal detects the user's video selection operation for the initial video, and the initial video can be selected and edited.
  • the target video is a part selected from the initial video, which can be a part that the user expects to keep, or a part that the user chooses for subsequent processing, such as the part that needs to be inserted into the display object, which improves the flexibility of video production , And reduce unnecessary calculations.
  • the processor 104 is further configured to execute a computer program to implement: control the interactive device to display a video selection interface; and detect a user's video selection operation on the interactive device that displays the video selection interface.
  • a clear interface can be provided for the user to operate, and the interactive device can be used to accurately detect the operation to ensure that an accurate target video is obtained.
  • the interactive device may be a touch display screen, for example.
  • the processor 104 is further configured to execute a computer program to achieve: obtain the position of the pixel selected by the user in the target image frame of the target video in the target image frame or the pixel selected in the target image frame The position of the point area in the target image frame; the position information of the display object in space is determined according to the position of the pixel point or the pixel point area in the target image frame.
  • the target video contains multiple image frames, and the user can select one of the image frames as the target image frame, and select a pixel or pixel area in the target image frame (that is, select a point of interest, select a pixel area In ROI, the center point of the pixel area is taken as the point of interest) to indicate the insertion position of the display object.
  • the user can select at will, or display the feature points in the image frame, and make the feature points in a selectable state, so that the user can directly select the identified feature points as points of interest to simplify Subsequent calculations.
  • reference points in the pixel area such as the pixel at the upper left corner and the pixel at the lower right corner, can be used to represent the pixel area. For example, the user can select these two pixels simultaneously or successively. To frame the pixel area, for example, you can select one pixel and then slide to another pixel to select the pixel area.
• When the pixel point area is selected by framing, it is also possible to first generate superpixels (that is, irregular pixel blocks with a certain visual significance, formed by adjacent pixels with similar texture, color, brightness, and other characteristics).
• The superpixels inside the frame are selected and the superpixels outside the frame are excluded; for superpixels on the border of the frame, it can be set that a superpixel is counted as selected when a certain percentage (for example, 50%) or more of it lies within the frame and as unselected otherwise, and all selected superpixels together constitute the pixel point area.
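• A minimal sketch of this superpixel-based framing is shown below, using SLIC from scikit-image as one possible superpixel generator. A superpixel is kept if at least `ratio` (e.g. 50%) of its pixels fall inside the user's box, and the union of the kept superpixels forms the pixel point area.

```python
import numpy as np
from skimage.segmentation import slic

def roi_mask_from_box(image_rgb, box, n_segments=400, ratio=0.5):
    """box = (x0, y0, x1, y1) given by the two corner pixels selected by the user."""
    labels = slic(image_rgb, n_segments=n_segments, compactness=10, start_label=0)
    x0, y0, x1, y1 = box
    inside = np.zeros(labels.shape, dtype=bool)
    inside[y0:y1, x0:x1] = True
    mask = np.zeros_like(inside)
    for sp in np.unique(labels):
        sp_pixels = labels == sp
        if inside[sp_pixels].mean() >= ratio:   # >= 50% of the superpixel is in the box
            mask |= sp_pixels
    return mask                                  # boolean mask of the pixel point area
```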
  • the position of the selected point of interest in the target image frame is clear, but it is still necessary to calculate its position in other image frames other than the target image frame, or to calculate its position in space.
  • the calculation process is similar to the calculation process of the pose information corresponding to each image frame, that is, it is achieved by tracking and matching feature points and solving the equation set simultaneously.
• The selected interest point can also be used as a feature point to establish a conversion equation. The differences are that, first, when calculating the pose information it is necessary to extract feature points over the whole image, whereas when calculating the interest point only the feature points near the selected pixel or within the selected pixel point area are extracted, which improves calculation efficiency.
• Second, the target video can be down-sampled in frame rate.
  • the mobile phone video is 30Hz, that is, 30 images per second, and 5 of them can be extracted at equal intervals.
  • feature point tracking can be performed once along the forward and reverse directions of the target video time axis to obtain accurate calculation results. The scheme of calculating points of interest will be further described later.
  • the processor 104 is further configured to execute a computer program to: determine the target sub-video from the target video; obtain the pixel points selected by the user in the target image frame in the target sub-video in the target image frame The position or the position of the selected pixel area in the target image frame in the target image frame.
  • the part of the video that can be used to calculate the points of interest can be filtered out, so that the user can select the points of interest in this part, which can ensure the accurate calculation of the points of interest.
• It should be noted that the target sub-video is the part of the target video that can be used to calculate points of interest, so points of interest can only be selected within the target sub-video; however, the display object can still appear in non-computable video segments, as long as the point of interest appears in them. For example, if a non-computable video segment follows the target sub-video, the display object placed based on the target sub-video may also appear in that non-computable part of the video.
  • the processor 104 is further configured to execute a computer program to realize: in response to the user's operation of selecting pixels or pixel areas in an image frame other than the target sub-video in the target video, outputting the invalid selection Prompt information.
  • the interactive device can be controlled to output the prompt information indicating that the selection is invalid, and the interactive device can be, for example, a touch screen or a smart voice interactive device.
  • the image processing device 100 if the image processing device 100 is set on the control terminal of the mobile platform, it can also directly output the above-mentioned invalid prompt information through display, voice broadcast, etc., if the image processing device 100 is set on the mobile platform, It is also possible to directly output the above-mentioned invalid prompt information by lighting a warning lamp, for example.
  • the processor 104 is further configured to execute a computer program to realize: output first prompt information, where the first prompt information is used to instruct the user to select a pixel or a pixel in the target image frame in the target sub-video Point area.
• In this way, the computable image frames in the target sub-video can be actively provided so that the user can conveniently and accurately select the target image frame; for example, during display, the image frames in the target sub-video can be kept in a selectable state while the image frames outside the target sub-video are grayed out, or, when available image frames are announced by voice, only the image frames in the target sub-video are announced.
  • the interaction device can be controlled to output the above-mentioned first prompt information, and the interaction device may be, for example, a touch screen or a smart voice interaction device.
  • the image processing device 100 is installed on the control terminal of the movable platform, the above-mentioned first prompt information can also be directly output by means such as display, voice broadcast, etc., if the image processing device 100 is installed on the movable platform, it is also The above-mentioned first prompt information can be directly output by means such as lighting a warning lamp.
  • the target sub-video includes a video collected by the shooting device when the motion state of the shooting device in the target video meets a preset motion condition.
• the preset motion condition specifically refers to the shooting device undergoing displacement, rather than remaining static or merely panning in place, so as to ensure accurate calculation of the points of interest.
  • the processor 104 is further configured to execute a computer program to implement: determine a plurality of continuous image frames from the target video, wherein the average movement amount of feature points between adjacent image frames of the plurality of continuous image frames The sum is greater than or equal to the preset distance threshold, and the disparity of the multiple consecutive image frames is greater than or equal to the preset disparity threshold; the multiple consecutive image frames are determined as the target sub-video.
  • the target sub-video is composed of multiple continuous image frames, and the multiple continuous images need to meet two conditions.
  • the first condition is that the sum of the average movement amount of the feature points between adjacent image frames is greater than or equal to the preset distance threshold to ensure a sufficient amount of movement.
• the second condition is that the disparity of the multiple consecutive image frames is greater than or equal to the preset disparity threshold, which can filter out the amount of movement caused by the camera merely panning in place.
  • the multiple continuous image frames may specifically include multiple continuous segments, and each segment is composed of multiple continuous image frames, that is, the multiple continuous image frames are divided into multiple segments. In particular, when the multiple continuous image frames only include one segment, it is equivalent to not dividing the multiple continuous image frames.
• the above-mentioned second condition may specifically be that the sum of the disparities of the multiple consecutive segments is greater than or equal to the preset disparity threshold, or that the disparity of each segment is greater than or equal to a preset threshold, and this threshold may be less than or equal to the preset disparity threshold.
• the above first condition may further require that, within each segment, the sum of the average movement amounts of feature points between adjacent image frames is greater than or equal to a preset threshold, and this threshold may be less than or equal to the preset distance threshold.
  • the number of multiple consecutive image frames is greater than or equal to a preset image number threshold.
• The number of the multiple consecutive image frames is limited because, although the multiple consecutive image frames have a large enough amount of movement, too few consecutive image frames would mean that the camera moved a great deal in a short period of time, which would leave too few continuously observed feature points and make the calculation difficult. By limiting the threshold on the number of images, it can be ensured that the number of feature points continuously observable across the multiple consecutive image frames is sufficiently large, which ensures the accuracy of the interest point calculation.
• Specifically, feature points can be extracted and tracked frame by frame, and segments are divided according to the cumulative value of the average feature-point movement; segments whose disparity reaches the threshold (for example, 10 pixels) are regarded as usable segments, and segments that do not reach the threshold are regarded as unusable segments. Finally, adjacent segments of the same kind are merged into parts: if a part contains more than a predetermined number (for example, 5) of usable segments, that part is computable and serves as the target sub-video; otherwise it is a non-computable part. In this way the foregoing two conditions and the requirement on the number of consecutive image frames are satisfied at the same time.
• Specifically, the average movement amount of the full-image feature points between two adjacent image frames can be calculated and accumulated frame by frame until the cumulative value exceeds a certain threshold, such as 20 pixels.
• For example, if the cumulative value of the average feature-point movement is 18 pixels and at the 10th image frame it becomes 21 pixels, then the 1st image frame to the 10th image frame are divided into one segment.
• The disparity between image frame No. 1 and image frame No. 10 can then be calculated as the disparity of this segment.
• the processor 104 is further configured to execute the computer program to: determine whether the object in the space indicated by the pixel point or pixel point area is a stationary object; and, when the object is a stationary object, determine, according to the position of the pixel point or pixel point area in the target image frame, the position information in the space of the display object edited by the user.
• Since the solution proposed in this application requires the target to remain stationary for a short period of time when calculating interest points, some admission determinations need to be made.
• Specifically, a convolutional neural network (CNN) can be used to determine whether the selected target object is a potentially moving object (such as a person, car, boat, or ocean wave), and the calculation is performed only when the object in the space indicated by the pixel point or pixel point area is determined to be a stationary object, so as to ensure the accuracy of the calculation result.
• the processor 104 is further configured to execute the computer program to: determine whether the object in the space indicated by the pixel point or pixel point area is a stationary object; and, when the object is not a stationary object, output second prompt information, where the second prompt information is used to prompt the user that the pixel point or pixel point area is not selectable, or to prompt the user to select another pixel point or pixel point area.
• That is, when the object in the space indicated by the pixel point or pixel point area is not a stationary object, outputting the second prompt information reminds the user to make a modification, which ensures the accuracy of the calculation result.
  • the second prompt information is used to prompt the user to select other pixels or pixel areas, it can prompt the user to select feature points on a stationary object, which helps to improve the prompt efficiency, reduce the amount of calculation, and improve the accuracy of the calculation.
  • the point of interest is only the initial position where the display object is placed, and the position of the display object can be further adjusted based on the point of interest. For example, if you set the point of interest directly on the ocean wave, a warning will pop up, but you can set the point of interest on the beach first, and then adjust the display object to the sea surface.
  • the interaction device can be controlled to output the above-mentioned second prompt information.
  • the interaction device may be, for example, a touch screen or a smart voice interaction device.
• If the image processing device 100 is installed on the control terminal of the movable platform, the above-mentioned second prompt information can also be directly output by means such as display or voice broadcast; if the image processing device 100 is installed on the movable platform, the above-mentioned second prompt information can also be directly output by means such as lighting a warning lamp.
• the processor 104 is also configured to execute the computer program to: determine whether the pixel point area meets a preset texture condition; and, when the preset texture condition is satisfied, determine, according to the position of the pixel point or pixel point area in the target image frame, the position information in the space of the display object edited by the user.
• That is, the selected pixel point area needs to have enough feature points; the calculation is performed only when the preset texture condition is met, which ensures that the calculation result is accurate.
  • the area within a certain size range around the pixel can be used as the pixel area.
• the processor 104 is further configured to execute the computer program to: determine whether the pixel point area meets a preset texture condition; and, when the preset texture condition is not satisfied, output third prompt information, where the third prompt information is used to prompt the user that the pixel point or pixel point area is not selectable, or to prompt the user to select another pixel point or pixel point area.
• That is, when the selected pixel point area does not meet the preset texture condition, outputting the third prompt information reminds the user to make a modification, which ensures the accuracy of the calculation result.
• If the third prompt information is used to prompt the user to select other pixels or pixel areas, it can prompt the user to select a pixel area that meets the texture condition or the feature points in such an area, which helps to improve the prompt efficiency, reduce the amount of calculation, and improve the calculation accuracy.
  • the interaction device can be controlled to output the above-mentioned third prompt information, and the interaction device may be, for example, a touch display screen or an intelligent voice interaction device.
• If the image processing device 100 is installed on the control terminal of the movable platform, the third prompt information can also be directly output by means such as display or voice broadcast; if the image processing device 100 is installed on the movable platform, the above-mentioned third prompt information can also be directly output by means such as lighting a warning lamp.
  • the processor 104 may only realize the detection of static objects, or only realize the detection of texture conditions, and may also realize the detection of static objects and the detection of texture conditions.
  • the second prompt information and the third prompt information may be the same or different, which is not limited here.
  • the processor 104 is further configured to execute a computer program to: determine the feature points in the pixel area according to the position of the pixel area in the target image frame; obtain the feature points in the pixel area in the target image The position in the frame; according to the position of the feature point in the target image frame, the position information of the spatial point corresponding to the geometric center pixel point of the pixel point area is determined in space.
  • a scheme for calculating interest points when the pixel point area is selected is specifically defined.
• the point of interest is specifically the geometric center pixel of the pixel point area, and its position in the target image frame is known; however, to calculate the position information of the corresponding spatial point in space, it is also necessary to know the position of the point of interest in other image frames.
  • the point of interest is not a feature point with high probability.
  • the feature point in the pixel area can be used to fit and estimate the geometric center pixel point.
• the processor 104 is further configured to execute the computer program to: determine, according to the position of the feature points in the target image frame, the optical flow vector, in at least one reference image frame of the target video, of the spatial point corresponding to the geometric center pixel of the pixel point area; determine, according to the optical flow vector, the position of the spatial point corresponding to the geometric center pixel in the at least one reference image frame; and determine the position information in space of the spatial point corresponding to the geometric center pixel according to its position in the at least one reference image frame and the positions of the feature points in the target image frame.
• Specifically, the KLT feature tracking algorithm can be used to calculate the optical flow vectors of the feature points, and then, by combining the positional relationship between the feature points and the interest point, the optical flow vector of the interest point in at least one reference image frame can be obtained by fitting.
  • the weighted average of the optical flow vectors of the feature points can be calculated as the optical flow vectors of the points of interest.
  • the position of the point of interest in at least one reference image frame can be obtained.
  • the point of interest can also be used as A feature point, together with other feature points, is used to establish and solve the coordinate transformation equations, and then complete the calculation of the points of interest.
  • the BA beam adjustment algorithm can be used for calculation.
  • the geometric center pixel of the pixel area can be replaced with the selected pixel, and the feature point in the pixel area can be replaced with a certain range near the selected pixel. Feature points can also be calculated.
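  • as a minimal sketch (an assumption-laden example, not the claimed implementation; OpenCV is used here only for illustration and the function and parameter names are hypothetical), the ROI-center flow can be estimated as a Gaussian-weighted average of the KLT flows of the features inside the ROI:

```python
import numpy as np
import cv2

def track_roi_center(prev_gray, next_gray, roi_features, center, sigma=15.0):
    """roi_features: (N, 2) float32 pixel coords of features inside the ROI (target frame).
    center: (2,) geometric-center pixel of the ROI in the target frame."""
    pts = roi_features.reshape(-1, 1, 2).astype(np.float32)
    # KLT feature tracking into the reference frame
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    ok = status.ravel() == 1
    p0 = pts.reshape(-1, 2)[ok]
    p1 = next_pts.reshape(-1, 2)[ok]
    flows = p1 - p0                                        # per-feature optical flow vectors
    d = np.linalg.norm(p0 - np.asarray(center), axis=1)    # distance of each feature to the ROI center
    w = np.exp(-(d ** 2) / (2.0 * sigma ** 2))             # Gaussian weights
    center_flow = (w[:, None] * flows).sum(axis=0) / w.sum()
    return np.asarray(center) + center_flow                # estimated center position in the reference frame
```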
  • the processor 104 is further configured to execute a computer program to: obtain the position in space of the spatial point corresponding to the target feature point in the target image frame; fit a target plane according to the position in space of the spatial point corresponding to the target feature point; and determine the position information of the display object in space according to the position of the pixel in the target image frame and the fitted target plane.
  • this specifically defines a scheme for calculating the point of interest when a single pixel is selected.
  • the selected pixel is, with high probability, not a feature point, so nearby feature points need to be combined for a fitting estimate.
  • the target feature point may be, for example, a reliable feature point identified when calculating the pose information of the camera, so as to ensure the accuracy of the point-of-interest calculation.
  • if the target feature point is such a reliable feature point, the position in space of the spatial point corresponding to the target feature point is already obtained when the pose information of the camera is calculated.
  • if the target feature point is not such a reliable feature point, the position of its corresponding spatial point in space needs to be calculated; the calculation method is still to solve the set of transformation equations, which will not be repeated here.
  • since the target feature point is near the selected pixel, it can be considered that the spatial point corresponding to the pixel also lies in the fitted target plane.
  • the calculation of the point of interest is thereby completed, and the position information of the display object in space is obtained.
  • the pixel distance between the target feature point and the pixel is less than or equal to a preset pixel distance threshold, that is, the target feature point is located near the pixel, which ensures the accuracy of the fitting calculation. A minimal plane-fitting sketch is given below.
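  • a minimal sketch of the plane-fitting step (illustrative assumptions only, not the patented implementation; the helper names are hypothetical): fit a plane to the 3D points of nearby reliable features with SVD, then intersect the selected pixel's viewing ray with that plane to obtain the 3D anchor of the display object.

```python
import numpy as np

def fit_plane(points_3d):
    """points_3d: (N, 3) spatial points of nearby target feature points."""
    centroid = points_3d.mean(axis=0)
    _, _, vt = np.linalg.svd(points_3d - centroid)
    normal = vt[-1]                       # direction of least variance = plane normal
    d = -normal.dot(centroid)
    return normal, d                      # plane: normal . X + d = 0

def pixel_to_plane(u, v, K, normal, d):
    """Intersect the viewing ray through pixel (u, v) with the fitted plane.
    Assumes the plane is expressed in the same camera frame as K."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray direction from the optical center
    depth = -d / normal.dot(ray)                     # solve normal . (depth * ray) + d = 0
    return depth * ray                               # 3D point where the display object is anchored
```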
  • when the target video is captured by the camera while the movable platform is following a target object in space, the processor 104 is further configured to execute a computer program to: obtain the position information in space of the object followed by the camera; and determine the position information of the display object in space according to the position information of the followed object in space.
  • since the camera itself needs to select a followed object when performing follow shooting, the followed object is taken as the point of interest or region of interest by default, and the position of the display object in space is determined directly based on the position information of the followed object in space.
  • for example, the position of the followed object can be used directly as the position of the display object, or the position of the display object can be adjusted based on the position of the followed object, which helps greatly reduce the amount of calculation and the computational load.
  • when the target video is captured by the camera while the movable platform is orbiting a target object in space, the processor 104 is further configured to execute a computer program to: obtain the position information in space of the object being orbited; and determine the position information of the display object in space according to the position information of the orbited object in space.
  • since the camera itself needs to select the orbited object when performing orbit shooting, the orbited object is taken as the point of interest or region of interest by default, and the position of the display object in space is determined directly based on the position information of the orbited object in space.
  • for example, the position of the orbited object can be used directly as the position of the display object, or the position of the display object can be adjusted based on the position of the orbited object, which helps greatly reduce the amount of calculation and the computational load.
  • when the control terminal of the movable platform includes the image processing device 100, the processor 104 is also configured to execute a computer program to: play or store the target composite video, or run a social application to share it, so that the user can watch, save, or share the target composite video.
  • when the movable platform includes the image processing device 100, the processor 104 is also configured to execute a computer program to: send the target composite video to the control terminal of the movable platform, so that the control terminal plays or stores the target composite video, or runs a social application to share it, for the user to watch, save, or share the target composite video.
  • before the target composite video is composited, the projected image frames can be played frame by frame for the user to view the effect of the inserted display object; if the user confirms the effect, the target composite video is then composited and saved, and if the user is not satisfied, the user can continue to edit the display object, select a point of interest, or adjust the position of the inserted display object based on the selected point of interest.
  • the embodiment of the third aspect of the present application provides a movable platform 200, which includes the image processing device 100 as in some of the above embodiments, and thus has the corresponding technical effects of the image processing device 100. This will not be repeated here.
  • the movable platform 200 may be, for example, an unmanned aerial vehicle, or other vehicles with multiple cameras, such as an unmanned car.
  • the embodiment of the fourth aspect of the present application provides a control terminal 300 of a movable platform, which includes the image processing device 100 of some of the above embodiments, and thus has the corresponding technical effects of the image processing device 100, which will not be repeated here.
  • the control terminal 300 of the movable platform can be any terminal device that can interact with the movable platform, for example a remote control device, or a smart device that interacts via an APP, such as a smartphone, a smart tablet, or smart glasses (e.g. VR glasses, AR glasses); the SD card of the movable platform can also be inserted into a computer, in which case the control terminal 300 of the movable platform is the computer.
  • the embodiment of the fifth aspect of the present application provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the steps of the image processing method in any of the above-mentioned embodiments are implemented, and thus all the technical effects of that image processing method are obtained, which will not be repeated here.
  • the computer-readable storage medium may include any medium that can store or transmit information. Examples of computer-readable storage media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so on.
  • the code segment can be downloaded via a computer network such as the Internet, an intranet, and so on.
  • the term “plurality” refers to two or more than two, unless specifically defined otherwise.
  • the terms “installed”, “connected”, “coupled”, “fixed” and other such terms should be understood in a broad sense.
  • “connected” can be a fixed connection, a detachable connection, or an integral connection;
  • “coupled” can be a direct connection or an indirect connection through an intermediary.
  • the specific meanings of the above-mentioned terms in this application can be understood according to specific circumstances.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

本申请提供了一种图像的处理方法及装置、可移动平台及其控制终端、计算机可读存储介质,其中,图像的处理方法包括:获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频;获取拍摄装置在采集目标视频中每一图像帧时的位姿信息;获取用户编辑的展示对象;确定展示对象在空间中的位置信息;根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息将展示对象投影到每一图像帧上以获取目标合成视频。通过在移动过程中获取拍摄装置的位姿信息,结合视频的图像信息,计算出视频中场景的三维信息,使得处理视频更加快速便捷,用户只需要输入要生成的展示对象,并点击想要摆放的位置,即可自动渲染制作插入了展示对象的特效视频。

Description

图像的处理方法及装置、可移动平台及其控制终端、计算机可读存储介质 技术领域
本申请涉及图像处理技术领域,具体而言,涉及一种图像的处理方法、一种图像的处理装置、一种可移动平台、一种可移动平台的控制终端及一种计算机可读存储介质。
背景技术
无人机航拍视频,由于其独特的上帝视角,得到广泛应用和推崇。但目前航拍视频要想做一些特效,如3D字幕效果,还是需要将视频从无人机的SD(Secure Digital Memory Card,安全数码卡)中下载到计算机上,利用传统的专业视频编辑软件制作特效,并有一定的操作难度,费时费力。
发明内容
本申请旨在至少解决现有技术或相关技术中存在的技术问题之一。
为此,本申请的第一方面提出了一种图像的处理方法。
本申请的第二方面提出了一种图像的处理装置。
本申请的第三方面提出了一种可移动平台。
本申请的第四方面提出了一种可移动平台的控制终端。
本申请的第五方面提出了一种计算机可读存储介质。
有鉴于此,根据本申请的第一方面,提供了一种图像的处理方法,应用于图像的处理装置,包括:获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频;获取拍摄装置在采集目标视频中每一图像帧时的位姿信息;获取用户编辑的展示对象;确定展示对象在空间中的位置信息;根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息将展示对象投影到每一图像帧上以获取目标合成视频。
另外,根据本申请上述技术方案提供的图像的处理方法,还具有如下附加技术特征:
在本申请的一种实施例中,获取拍摄装置在采集目标视频中每一图像帧时的位姿信息,包括:获取每一图像帧中的特征点在图像帧中的位置信息;根据特征点在图像帧中的位置信息确定拍摄装置在采集每一图像帧时的位姿信息。
在本申请的一种实施例中,方法还包括:获取拍摄装置在采集每一图像帧时的初始位姿信息,其中,初始位姿信息是可移动平台配置的传感器采集得到的;根据特征点在图像帧中的位置信息确定拍摄装置在采集每一图像帧时的位姿信息,包括:根据特征点在图像帧中的位置信息和每一图像帧对应的初始位姿信息确定拍摄装置在采集每一图像帧时的位姿信息。
在本申请的一种实施例中,方法还包括:获取拍摄装置在采集每一图像帧时的初始位姿信息,其中,初始位姿信息是可移动平台配置的传感器采集得到的;根据展示对象在空间中的位置信息和每一图像帧对应的初始位姿信息将展示对象投影到每一图像帧上以获取预览合成视频。
在本申请的一种实施例中,根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息将展示对象投影到每一图像帧上以获取目标合成视频,包括:根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息确定展示对象在每一图像帧中的投影位置和投影姿态;根据展示对象在每一图像帧中的投影位置和投影姿态将展示对象投影到每一图像帧中以获取目标合成视频。
在本申请的一种实施例中,方法还包括:获取用户编辑的展示对象的位置调整信息和/或姿态调整信息;根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息确定展示对象在每一图像帧中的投影位置和投影姿态,包括:根据展示对象在空间中的位置信息、每一图像帧对应的位姿信息和展示对象的位置调整信息和/或姿态调整信息确定展示对象在每一图像帧中的投影位置和投影姿态。
在本申请的一种实施例中,获取用户编辑的展示对象,包括:检测用户的展示对象编辑操作,根据检测到的编辑操作确定用户编辑的展示对象。
在本申请的一种实施例中,检测用户的展示对象编辑操作,包括:控制交互装置显示展示对象编辑界面;检测用户对显示展示对象编辑界面的 交互装置的展示对象编辑操作。
在本申请的一种实施例中,方法还包括:获取可移动平台在空间中移动时拍摄装置采集的初始视频;获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频,包括:检测用户的视频选取操作,根据检测到的视频选取操作从初始视频中确定目标视频。
在本申请的一种实施例中,检测用户的视频选取操作,包括:控制交互装置显示视频选取界面;检测用户对显示视频选取界面的交互装置的视频选取操作。
在本申请的一种实施例中,确定展示对象在空间中的位置信息,包括:获取用户在目标视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置;根据像素点或者像素点区域在目标图像帧中的位置确定展示对象在空间中的位置信息。
在本申请的一种实施例中,方法还包括:从目标视频中确定目标子视频;获取用户在目标视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置,包括:获取用户在目标子视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置。
在本申请的一种实施例中,方法还包括:响应于用户在目标视频中的目标子视频之外的图像帧中选中像素点或者像素点区域的操作,输出选中无效的提示信息。
在本申请的一种实施例中,方法还包括:输出第一提示信息,其中,第一提示信息用于指示用户在目标子视频中的目标图像帧中选中像素点或者像素点区域。
在本申请的一种实施例中,目标子视频包括目标视频中拍摄装置的运动状态满足预设的运动条件时拍摄装置采集到的视频。
在本申请的一种实施例中,从目标视频中确定目标子视频,包括:从目标视频中确定多个连续图像帧,其中,多个连续图像帧相邻的图像帧之间特征点的平均移动量之和大于或等于预设的距离阈值,多个连续图像帧 的视差大于或等于预设的视差阈值;将多个连续图像帧确定为目标子视频。
在本申请的一种实施例中,多个连续图像帧的数量大于或等于预设的图像数量阈值。
在本申请的一种实施例中,确定展示对象在空间中的位置信息,还包括:确定像素点或者像素点区域指示的空间中物体是否为静止物体;根据像素点或者像素点区域在目标图像帧中的位置确定用户编辑的展示对象在空间中的位置信息,包括:当物体为静止物体时,根据像素点或者像素点区域在目标图像帧中的位置确定用户编辑的展示对象在空间中的位置信息。
在本申请的一种实施例中,确定展示对象在空间中的位置信息,还包括:确定像素点或者像素点区域指示的空间中物体是否为静止物体;当物体不为静止运动时,输出第二提示信息,其中,第二提示信息用于提示用户像素点或者像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
在本申请的一种实施例中,确定展示对象在空间中的位置信息,还包括:确定像素点区域是否满足预设的纹理条件;根据像素点区域在目标图像帧中的位置确定用户编辑的展示对象在空间中的位置信息,包括:当满足预设的纹理条件时,根据像素点或者像素点区域在目标图像帧中的位置确定用户编辑的展示对象在空间中的位置信息。
在本申请的一种实施例中,确定展示对象在空间中的位置信息,还包括:确定像素点区域是否满足预设的纹理条件;当不满足预设的纹理条件时,输出第三提示信息,其中,第三提示信息用于提示用户像素点或者像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
在本申请的一种实施例中,根据像素点区域在目标图像帧中的位置确定展示对象在空间中的位置信息,包括:根据像素点区域在目标图像帧中的位置确定像素点区域中的特征点;获取像素点区域中的特征点在目标图像帧中的位置;根据特征点在目标图像帧中的位置确定像素点区域的几何中心像素点的对应的空间点在空间中的位置信息。
在本申请的一种实施例中,根据特征点在目标图像帧中的位置确定像 素点区域的几何中心像素点的对应的空间点在空间中的位置信息,包括:根据特征点在目标图像帧中的位置确定像素点区域的几何中心像素点的对应的空间点在目标视频至少一帧参考图像帧的光流向量;根据光流向量确定几何中心像素点的对应的空间点在至少一帧参考图像帧中的位置;根据在至少一帧参考图像帧中的位置和特征点在目标图像帧中的位置确定几何中心像素点的对应的空间点在空间中的位置信息。
在本申请的一种实施例中,根据像素点在目标图像帧中的位置确定展示对象在空间中的位置信息,包括:获取目标图像帧中的目标特征点对应的空间点在空间中的位置;根据目标特征点对应的空间点在空间中的位置拟合目标平面;根据像素点在目标图像帧中的位置和拟合目标平面确定展示对象在空间中的位置信息。
在本申请的一种实施例中,目标特征点与像素点之间的像素距离小于或等于预设的像素距离阈值。
在本申请的一种实施例中,目标视频是可移动平台对空间中的目标对象进行跟随时由拍摄装置拍摄获取的,确定展示对象在空间中的位置信息,包括:获取拍摄装置的跟随对象在空间中的位置信息;根据跟随对象在空间中的位置信息确定展示对象在空间中的位置信息。
在本申请的一种实施例中,目标视频是可移动平台对空间中的目标对象进行环绕运动时由拍摄装置拍摄获取的,确定展示对象在空间中的位置信息,包括:获取拍摄装置的环绕对象在空间中的位置信息;根据环绕对象在空间中的位置信息确定展示对象在空间中的位置信息。
在本申请的一种实施例中,展示对象包括数字、字母、符号、文字和物体标识中的至少一种。
在本申请的一种实施例中,展示对象为三维模型。
在本申请的一种实施例中,方法还包括:播放或存储或者运行社交应用程序分享目标合成视频。
在本申请的一种实施例中,可移动平台包括图像的处理装置,方法还包括:将目标合成视频发送给可移动平台的控制终端以使控制终端播放或存储或者运行社交应用程序分享目标合成视频。
在本申请的一种实施例中,可移动平台包括无人飞行器。
根据本申请的第二方面,提供了一种图像的处理装置,包括:存储器,被配置为存储计算机程序;处理器,被配置为执行计算机程序以实现:获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频;获取拍摄装置在采集目标视频中每一图像帧时的位姿信息;获取用户编辑的展示对象;确定展示对象在空间中的位置信息;根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息将展示对象投影到每一图像帧上以获取目标合成视频。
另外,根据本申请提供的上述技术方案中的图像的处理装置,还可以具有如下附加技术特征:
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:获取每一图像帧中的特征点在图像帧中的位置信息;根据特征点在图像帧中的位置信息确定拍摄装置在采集每一图像帧时的位姿信息。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:获取拍摄装置在采集每一图像帧时的初始位姿信息,其中,初始位姿信息是可移动平台配置的传感器采集得到的;根据特征点在图像帧中的位置信息和每一图像帧对应的初始位姿信息确定拍摄装置在采集每一图像帧时的位姿信息。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:获取拍摄装置在采集每一图像帧时的初始位姿信息,其中,初始位姿信息是可移动平台配置的传感器采集得到的;根据展示对象在空间中的位置信息和每一图像帧对应的初始位姿信息将展示对象投影到每一图像帧上以获取预览合成视频。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息确定展示对象在每一图像帧中的投影位置和投影姿态;根据展示对象在每一图像帧中的投影位置和投影姿态将展示对象投影到每一图像帧中以获取目标合成视频。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现: 获取用户编辑的展示对象的位置调整信息和/或姿态调整信息;根据展示对象在空间中的位置信息、每一图像帧对应的位姿信息和展示对象的位置调整信息和/或姿态调整信息确定展示对象在每一图像帧中的投影位置和投影姿态。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:检测用户的展示对象编辑操作,根据检测到的编辑操作确定用户编辑的展示对象。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:控制交互装置显示展示对象编辑界面;检测用户对显示展示对象编辑界面的交互装置的展示对象编辑操作。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:获取可移动平台在空间中移动时拍摄装置采集的初始视频;检测用户的视频选取操作,根据检测到的视频选取操作从初始视频中确定目标视频。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:控制交互装置显示视频选取界面;检测用户对显示视频选取界面的交互装置的视频选取操作。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:获取用户在目标视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置;根据像素点或者像素点区域在目标图像帧中的位置确定展示对象在空间中的位置信息。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:从目标视频中确定目标子视频;获取用户在目标子视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:响应于用户在目标视频中的目标子视频之外的图像帧中选中像素点或者像素点区域的操作,输出选中无效的提示信息。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现: 输出第一提示信息,其中,第一提示信息用于指示用户在目标子视频中的目标图像帧中选中像素点或者像素点区域。
在本申请的一种实施例中,目标子视频包括目标视频中拍摄装置的运动状态满足预设的运动条件时拍摄装置采集到的视频。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:从目标视频中确定多个连续图像帧,其中,多个连续图像帧相邻的图像帧之间特征点的平均移动量之和大于或等于预设的距离阈值,多个连续图像帧的视差大于或等于预设的视差阈值;将多个连续图像帧确定为目标子视频。
在本申请的一种实施例中,多个连续图像帧的数量大于或等于预设的图像数量阈值。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:确定像素点或者像素点区域指示的空间中物体是否为静止物体;当物体为静止物体时,根据像素点或者像素点区域在目标图像帧中的位置确定用户编辑的展示对象在空间中的位置信息。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:确定像素点或者像素点区域指示的空间中物体是否为静止物体;当物体不为静止运动时,输出第二提示信息,其中,第二提示信息用于提示用户像素点或者像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:确定像素点区域是否满足预设的纹理条件;当满足预设的纹理条件时,根据像素点或者像素点区域在目标图像帧中的位置确定用户编辑的展示对象在空间中的位置信息。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:确定像素点区域是否满足预设的纹理条件;当不满足预设的纹理条件时,输出第三提示信息,其中,第三提示信息用于提示用户像素点或者像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现: 根据像素点区域在目标图像帧中的位置确定像素点区域中的特征点;获取像素点区域中的特征点在目标图像帧中的位置;根据特征点在目标图像帧中的位置确定像素点区域的几何中心像素点的对应的空间点在空间中的位置信息。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:根据特征点在目标图像帧中的位置确定像素点区域的几何中心像素点的对应的空间点在目标视频至少一帧参考图像帧的光流向量;根据光流向量确定几何中心像素点的对应的空间点在至少一帧参考图像帧中的位置;根据在至少一帧参考图像帧中的位置和特征点在目标图像帧中的位置确定几何中心像素点的对应的空间点在空间中的位置信息。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:获取目标图像帧中的目标特征点对应的空间点在空间中的位置;根据目标特征点对应的空间点在空间中的位置拟合目标平面;根据像素点在目标图像帧中的位置和拟合目标平面确定展示对象在空间中的位置信息。
在本申请的一种实施例中,目标特征点与像素点之间的像素距离小于或等于预设的像素距离阈值。
在本申请的一种实施例中,目标视频是可移动平台对空间中的目标对象进行跟随时由拍摄装置拍摄获取的,处理器还被配置为执行计算机程序以实现:获取拍摄装置的跟随对象在空间中的位置信息;根据跟随对象在空间中的位置信息确定展示对象在空间中的位置信息。
在本申请的一种实施例中,目标视频是可移动平台对空间中的目标对象进行环绕运动时由拍摄装置拍摄获取的,处理器还被配置为执行计算机程序以实现:获取拍摄装置在空间中的环绕对象的位置信息;根据环绕对象在空间中的位置信息确定展示对象在空间中的位置信息。
在本申请的一种实施例中,展示对象包括数字、字母、符号、文字和物体标识中的至少一种。
在本申请的一种实施例中,展示对象为三维模型。
在本申请的一种实施例中,处理器还被配置为执行计算机程序以实现:播放或存储或者运行社交应用程序分享目标合成视频。
在本申请的一种实施例中,可移动平台包括图像的处理装置,处理器还被配置为执行计算机程序以实现:将目标合成视频发送给可移动平台的控制终端以使控制终端播放或存储或者运行社交应用程序分享目标合成视频。
在本申请的一种实施例中,可移动平台包括无人飞行器。
根据本申请的第三方面,提供了一种可移动平台,包括如上述部分技术方案的图像的处理装置。
根据本申请的第四方面,提供了一种可移动平台的控制终端,包括如上述部分技术方案的图像的处理装置。
根据本申请的第五方面,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现如上述任一技术方案的图像的处理方法的步骤。
综上,本申请提出一种图像的处理方案,针对于可移动平台,通过在移动过程中获取可移动平台的拍摄装置的位姿信息,结合视频的图像信息,计算出视频中场景的三维信息,使得处理视频更加快速便捷,用户只需要输入要生成的展示对象,如字幕文字,并点击想要摆放的位置,即可自动渲染制作插入了展示对象的特效视频,如3D字幕效果视频。
本申请的附加方面和优点将在下面的描述部分中变得明显,或通过本申请的实践了解到。
附图说明
本申请的上述和/或附加的方面和优点从结合下面附图对实施例的描述中将变得明显和容易理解,其中:
图1示出了根据本申请的一个实施例的图像的处理方法的示意流程图;
图2示出了根据本申请的一个实施例的获取位姿信息的方法的示意流程图;
图3示出了根据本申请的一个实施例的确定展示对象在空间中的位置信息的方法的示意流程图;
图4示出了根据本申请的一个实施例的获取目标合成视频的方法的示意流程图;
图5示出了根据本申请的一个实施例的三维模型线框图;
图6示出了根据本申请的一个实施例的三维模型消隐图;
图7示出了根据本申请的另一个实施例的图像的处理方法的示意流程图;
图8示出了根据本申请的再一个实施例的图像的处理方法的示意流程图;
图9示出了根据本申请的一个实施例的确定目标子视频的方法的示意流程图;
图10示出了根据本申请的一个实施例的确定目标子视频的策略示意图;
图11示出了根据本申请的一个实施例的计算兴趣点的方法的示意流程图;
图12示出了根据本申请的另一个实施例的计算兴趣点的方法的示意流程图;
图13示出了根据本申请的又一个实施例的图像的处理方法的示意流程图;
图14示出了根据本申请的一个实施例的图像的处理装置的示意框图;
图15示出了根据本申请的一个实施例的可移动平台的示意框图;
图16示出了根据本申请的一个实施例的可移动平台的控制终端的示意框图。
具体实施方式
为了能够更清楚地理解本申请的上述目的、特征和优点,下面结合附图和具体实施方式对本申请进行进一步的详细描述。需要说明的是,在不冲突的情况下,本申请的实施例及实施例中的特征可以相互组合。
在下面的描述中阐述了很多具体细节以便于充分理解本申请,但是,本申请还可以采用其他不同于在此描述的其他方式来实施,因此,本申请 的保护范围并不受下面公开的具体实施例的限制。
本申请第一方面的实施例提供了一种图像的处理方法,应用于图像的处理装置。图像的处理装置可单独设置在可移动平台上,也可单独设置在可移动平台的控制终端上,还可部分设置在可移动平台上,部分设置在可移动平台的控制终端上。可移动平台例如可为无人飞行器,还可为其他带有多摄像头的载具,例如无人驾驶的汽车。控制终端可为任何能与可移动平台交互的终端设备,例如可为遥控设备,也可为智能设备(经APP(Application,应用程序)实现交互),如智能手机、智能平板、智能眼镜(如VR(Virtual Reality,虚拟现实)眼镜、AR(Augmented Reality,增强现实)眼镜),还可将可移动平台的SD卡插入电脑,此时控制终端为电脑。
在描述本申请实施例提供的图像的处理方法之前,先介绍一般相机模型,由此可实现三维的世界坐标系与二维的齐次图像坐标系的坐标转换:
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$$
其中:
[u,v,1] T表示齐次图像坐标系(Homogeneous image coordinates)中的二维点。
[x w,y w,z w,1] T表示世界坐标系(World coordinates)中的三维点。
矩阵R为旋转矩阵(Rotation Matrix),矩阵T为位移矩阵(Translation Matrix),或者可以写成矩阵t,R和T为相机的外参(Extrinsic Matrix),表达的是三维空间中,世界坐标系到相机坐标系的旋转与位移变换,合起来成为相机位姿(camera pose)。
矩阵K称为相机校准矩阵(Camera calibration matrix),即每个相机的内参(Intrinsic Parameters),表达的是三维的相机坐标系到二维的齐次图像坐标系的转换。
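下面给出上述一般相机模型的一个示意性实现（仅作说明用途，并非本申请的限定性实现，其中的函数名与变量名均为本文举例时的假设），演示如何利用内参K与外参R、T把世界坐标系中的三维点投影为图像帧中的像素坐标：

```python
import numpy as np

def project_point(P_w, K, R, T):
    """示意性代码：P_w为世界坐标系下的三维点(3,)；K为3x3内参矩阵；
    R为3x3旋转矩阵；T为平移向量(3,)。"""
    P_c = R @ P_w + T                   # 世界坐标系 -> 相机坐标系（外参变换）
    p = K @ P_c                         # 相机坐标系 -> 齐次图像坐标（内参变换）
    return np.array([p[0] / p[2], p[1] / p[2]])   # 除以深度得到像素坐标 (u, v)
```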
图1示出了根据本申请的一个实施例的图像的处理方法的示意流程图。如图1所示,该图像的处理方法包括:
步骤110,获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频。
该步骤作为初始步骤,通过获取可移动平台的拍摄装置移动采集的目标视频,将该目标视频作为处理的对象,执行具体的处理操作。
具体地,该方法还包括:获取可移动平台在空间中移动时拍摄装置采集的初始视频。相应地,当图像的处理装置至少部分设置在可移动平台的控制终端上时,步骤110可包括:检测用户的视频选取操作,根据检测到的视频选取操作从初始视频中确定目标视频。通过控制终端检测用户针对初始视频做出的视频选取操作,可对初始视频进行选取编辑。此时目标视频为从初始视频中选取的一部分,可以是用户期望保留的一部分,也可以是用户选择的需要进行后续处理的部分,例如需要插入展示对象的部分,既提升了视频制作的灵活性,又减少了不必要的计算量。
具体地,检测用户的视频选取操作,包括:控制交互装置显示视频选取界面;检测用户对显示视频选取界面的交互装置的视频选取操作。通过控制交互装置显示视频选取界面,可提供明确的界面以供用户操作,并利用交互装置准确检测该操作,保证获取到准确的目标视频。交互装置例如可为触摸显示屏。
步骤120,获取拍摄装置在采集目标视频中每一图像帧时的位姿信息。
采集目标视频的过程中,拍摄装置随可移动平台在空间中移动。通过获取采集每一帧图像时,拍摄装置对应的位姿信息(作为拍摄装置的外参,包括旋转信息和位移信息),可结合已知的拍摄装置的内参,实现世界坐标系与齐次图像坐标系之间的转换,从而确定空间中的实体在目标视频中每一帧图像内的视图,以便执行后续的图像处理操作。
具体地,图2示出了根据本申请的一个实施例的获取位姿信息的方法的示意流程图。如图2所示,图1中的步骤120包括:
步骤122,获取每一图像帧中的特征点在图像帧中的位置信息。
步骤124,根据特征点在图像帧中的位置信息确定拍摄装置在采集每一图像帧时的位姿信息。
基于一般相机模型,针对一个特征点,可建立其在世界坐标系下的三 维位置信息和其在齐次图像坐标系下的二维位置信息(即图像帧中的位置信息)的转换方程。其中,利用图像识别可确定每一图像帧中的多个特征点在图像帧中的位置信息(已知量),而相邻两个图像帧往往具有大量重合的特征点,不同图像帧中的同一特征点在世界坐标系下的三维位置信息(未知量)相同,一个图像帧也具有唯一的位姿信息(未知量),通过在目标视频中逐帧全图提取特征点,并做特征点跟踪匹配,可获得多组转换方程,联立求解即可得到每一图像帧对应的位姿信息以及各个特征点在世界坐标系下的三维位置信息。
具体计算时,若在跟踪特征点时分析出错的图像帧,可予以删除,以优化计算结果。此外,一个目标视频中的图像帧的数量是巨大的,若以第一帧图像帧为基础进行逐帧跟踪,则可能出现较大偏差。为此,可在逐帧跟踪时将出现明显变化的图像帧标记为关键帧,再基于关键帧跟踪后续的非关键帧,并在出现新的关键帧时,以新的关键帧为基础跟踪其后的非关键帧,从而提高计算准确度。实际计算时,用于计算的目标视频中存在5个以上的关键帧,就可进行计算。
具体地,该方法还包括:获取拍摄装置在采集每一图像帧时的初始位姿信息,其中,初始位姿信息是可移动平台配置的传感器采集得到的。步骤124具体包括:根据特征点在图像帧中的位置信息和每一图像帧对应的初始位姿信息确定拍摄装置在采集每一图像帧时的位姿信息。
利用可移动平台上配置的传感器,例如无人飞行器的惯性导航系统以及云台增稳系统中的传感器,可以在采集目标视频的过程中获取到一些先验信息,包括无人飞行器位姿数据、云台位姿数据、一键拍摄所用的轨迹数据,由此可得到拍摄装置精度较低的初始位姿信息,利用初始位姿信息作为联立求解方程组时位姿信息的迭代初始值,可减少迭代次数,加快算法收敛时间,同时减少初始值选取不当造成的出错的概率,有助于缩短目标视频的后期处理时间,即使使用智能手机也能在目标视频中插入展示对象,制作目标合成视频。
该计算过程具体例如可为:
(1)在飞行录像时,通过无人飞行器的IMU(Inertial Measurement Unit, 惯性测量单元)惯性导航系统以及云台增稳系统,记录下拍摄装置粗略的初始位姿信息
$\left\{\tilde{R}_i, \tilde{t}_i\right\}$，
以及拍摄装置的内参K。
(2)选取关键帧keyframe,针对一系列图像帧,先提取特征点feature(比如Harris Corner detection algorithms),并做多帧图像帧之间特征点的跟踪匹配(例如可采用KLT特征跟踪算法(Kanade-Lucas-Tomasi feature tracker)),以便计算其光流向量,然后运行BA光束平差算法(Bundle Adjustment)计算这些特征点的三维坐标以及精准的拍摄装置位姿信息
$\left\{R_i, t_i\right\}$。
计算公式如下：
$$\left\{R_i, t_i, P\right\}^{*} = \arg\min_{R_i,\, t_i,\, P} \sum_{i} \left\| \pi\!\left(R_i P_i + t_i\right) - p_i \right\|^{2}$$
这里的i表示关键帧序列。其中，投影变换过程为：
$$z_c \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = K\left(R P_i + t\right)$$
简写为p′=π(RP_i+t)，π代表投影函数。
P_i为某个特征点的三维坐标（即该特征点对应的空间点在空间中的位置信息），p_i是此特征点在第i帧图像帧上的像素坐标（即该特征点在第i帧图像帧中的位置信息），$(R_i, t_i)$表示当前帧相对于前一帧的旋转平移变换，arg代表优化的参数（目标）是$R_i$、$t_i$和P。
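为帮助理解，下面给出上述重投影误差项的一个示意性构造（仅为假设性示例，非本申请的限定实现；实际的BA求解可交由通用的最小二乘或图优化库完成，其中的函数名、参数均为举例假设）：

```python
import numpy as np
import cv2

def reprojection_residuals(rvec, tvec, points_3d, points_2d, K):
    """rvec/tvec: 某一帧的位姿（旋转向量与平移向量）；
    points_3d: (N, 3) 特征点三维坐标P；points_2d: (N, 2) 该帧中的像素观测p。"""
    R, _ = cv2.Rodrigues(rvec)                  # 旋转向量 -> 旋转矩阵
    P_c = (R @ points_3d.T).T + tvec            # 变换到相机坐标系
    proj = (K @ P_c.T).T                        # 投影函数 π 的线性部分
    proj = proj[:, :2] / proj[:, 2:3]           # 除以深度得到像素坐标
    return (proj - points_2d).ravel()           # 残差 π(RP+t) - p，BA即最小化其平方和
```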
这里选取关键帧的准则是:
a)当前图像帧与前一个关键帧的距离足够大（粗略的平移 $\|\tilde{t}\|$ 大于某个阈值）；
b)或者当前图像帧与前一个关键帧的旋转足够大（粗略的旋转 $\tilde{R}$ 对应的旋转角大于某个阈值）；
c)或者跟踪匹配成功的特征点总数太少了（不同图像帧间匹配成功的特征点总数小于某个阈值）；
d)或者特征点在不同图像区域的数目总数太少了（同一图像帧上特征点数量太少了）。下面给出一个满足上述准则的示意性判断函数。
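作为示意（其中的阈值与函数名均为本文举例时的假设，并非本申请的限定），上述准则可以写成如下的判断函数：

```python
def is_keyframe(trans_norm, rot_angle, n_matched, n_features,
                trans_th=1.0, rot_th=0.3, match_th=80, feat_th=60):
    """trans_norm: 与前一个关键帧之间的粗略平移量；rot_angle: 粗略旋转角；
    n_matched: 与前一个关键帧跟踪匹配成功的特征点总数；n_features: 当前帧的特征点总数。"""
    if trans_norm > trans_th:      # a) 平移足够大
        return True
    if rot_angle > rot_th:         # b) 旋转足够大
        return True
    if n_matched < match_th:       # c) 跟踪匹配成功的特征点太少
        return True
    if n_features < feat_th:       # d) 当前帧特征点数量太少
        return True
    return False
```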
(3)利用关键帧,去除不可靠的特征点,筛选出可靠的特征点。策略如下:
遍历所有特征点，判断其中最大的重投影误差 $\max_i \left\| \pi\!\left(R_i P + t_i\right) - p_i \right\|$ 是否足够小（小于某个阈值），若是，则判断在关键帧中出现的次数是否足够多（大于某个阈值，比如在80%的关键帧中都跟踪匹配成功）。
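下面给出该筛选策略的一个示意性实现（阈值为举例假设，并非本申请的限定）：

```python
import numpy as np

def select_reliable_features(reproj_errors, observed_mask, err_th=2.0, obs_ratio_th=0.8):
    """reproj_errors: (N, M) 每个特征点在每个关键帧上的重投影误差；
    observed_mask: (N, M) 布尔矩阵，表示特征点是否在该关键帧中跟踪匹配成功。"""
    errs = np.where(observed_mask, reproj_errors, 0.0)
    max_err = errs.max(axis=1)                                 # 最大重投影误差
    obs_ratio = observed_mask.mean(axis=1)                     # 在关键帧中出现的比例
    return (max_err < err_th) & (obs_ratio >= obs_ratio_th)    # 可靠特征点的掩码
```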
(4)计算所有图像帧的位姿信息
上一步中仅仅计算了关键帧的拍摄装置的位姿信息,一般来说,关键帧只占全部图像帧的十分之一,所以处理速度很快。但是为了要能渲染展示对象,这里需要计算所有图像帧的拍摄装置的位姿信息。这里可以在之前的基础上,进行并行化处理:
$$\left\{R_j, t_j\right\}^{*} = \arg\min_{R_j,\, t_j} \sum_{j} \left\| \pi\!\left(R_j P + t_j\right) - p_j \right\|^{2}$$
这里的j表示非关键帧序列,同样也是用BA光束平差算法计算出对应的拍摄装置的位姿信息。
由此一来,便得到了所有图像帧的特征点的三维坐标,以及相邻图像帧之间的拍摄装置位姿关系。
步骤130,获取用户编辑的展示对象。
当处理操作具体为在目标视频中插入展示对象时,首先需获取用户预插入的展示对象。
具体地,当图像的处理装置至少部分设置在可移动平台的控制终端上时,步骤130包括:检测用户的展示对象编辑操作,根据检测到的编辑操作确定用户编辑的展示对象。使得控制终端可通过检测用户的展示对象编辑操作,准确获取用户编辑的展示对象。
其中,检测用户的展示对象编辑操作,包括:控制交互装置显示展示对象编辑界面;检测用户对显示展示对象编辑界面的交互装置的展示对象编辑操作。通过控制交互装置显示展示对象编辑界面,可提供明确的界面以供用户操作,并利用交互装置准确检测该操作,保证获取到准确的展示对象。交互装置例如可为触摸显示屏。
在一些实施例中,展示对象包括数字、字母、符号、文字和物体标识中的至少一种,以满足用户丰富的展示需求。相应地,展示对象编辑界面可设文本输入框以供用户利用输入法输入数字、字母,并可配置字体库,用户也可载入新的字体库或删除已有字体库。展示对象编辑界面还可展示 符号和物体标识的集合以供用户选择。此外,展示对象也可由用户自行绘制,可以是以绘制的方式输入数字、字母、符号、文字和物体标识外,也可以是绘制任意图形。
在一些实施例中,展示对象为三维模型,以满足丰富的展示需求。其具体处理方式将在下文详述。
步骤140,确定展示对象在空间中的位置信息。
除获取展示对象外,还需确定展示对象在空间中的位置信息,以便在目标视频的相应位置予以恰当展示。需说明的是,空间中的位置信息可以是世界坐标系下的位置信息,也可以是结合每一图像帧对应的位姿信息以及世界坐标系下的位置信息得到的相机坐标系下的位置信息。
可以理解的是,可先执行步骤130,也可先执行步骤140,对于二者的执行顺序,本申请不予限制。
具体地,图3示出了根据本申请的一个实施例的确定展示对象在空间中的位置信息的方法的示意流程图。如图3所示,图1中的步骤140包括:
步骤142,获取用户在目标视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置。
目标视频包含多个图像帧,用户可选中其中一个图像帧作为目标图像帧,并选中目标图像帧中的一个像素点或像素点区域(即选择兴趣点,选中像素点区域ROI(Region of Interest,兴趣区域)时以像素点区域的中心点作为兴趣点),以便指示展示对象的插入位置。
对于像素点,既可由用户随意选取,也可采用在图像帧中显示特征点,并令特征点处于可选择的状态的方案,使得用户可直接选择已经识别出的特征点作为兴趣点,以简化后续计算。
对于像素点区域,可利用像素点区域中的参考点,例如左上角处的像素点和右下角处的像素点,来代表像素点区域,用户例如可通过同时或先后选中这两个像素点来框选像素点区域,例如还可通过先选中一个像素点再滑动至另一个像素点来选中像素点区域。
进一步地,对于框选出像素点区域的情况,还可通过SLIC(Simple  Linear Iterative Clustering,简单的线性迭代聚类)、Graph-based、NCut(Normalized Cut,归一化割法)、Turbopixel、Quick-shift、Graph-cut a、Graph-cut b等算法,在图像帧中生成超像素(即具有相似纹理、颜色、亮度等特征的相邻像素点构成的有一定视觉意义的不规则像素块)。处于框内的超像素被选中,处于框外的超像素被排除,处于框边界上的像素点,则可设定当所属超像素有一定比例(例如50%)以上的部分在框内,则算作选中,否则算作未选中,所有选中的超像素就构成了像素点区域。
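对于超像素细化框选区域的做法，下面给出一个示意性实现（以SLIC为例，其中的参数与比例阈值均为举例假设，并非本申请的限定）：

```python
import numpy as np
from skimage.segmentation import slic

def refine_roi_with_superpixels(image, box, n_segments=400, overlap_ratio=0.5):
    """image: HxWx3 的目标图像帧；box: 用户框选的矩形 (x0, y0, x1, y1)。"""
    labels = slic(image, n_segments=n_segments, compactness=10.0)
    x0, y0, x1, y1 = box
    in_box = np.zeros(labels.shape, dtype=bool)
    in_box[y0:y1, x0:x1] = True
    roi_mask = np.zeros_like(in_box)
    for lab in np.unique(labels):
        sp = labels == lab
        if in_box[sp].mean() > overlap_ratio:    # 超像素落在框内的比例超过阈值则计入
            roi_mask |= sp
    return roi_mask                              # 所有选中的超像素构成像素点区域
```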
步骤144,根据像素点或者像素点区域在目标图像帧中的位置确定展示对象在空间中的位置信息。
选中的兴趣点在目标图像帧中的位置是明确的,但仍需计算其在目标图像帧之外的其他图像帧中的位置,或者说需要计算其在空间中的位置。该计算过程类似于每一图像帧对应的位姿信息的计算过程,即通过跟踪匹配特征点,联立求解方程组实现,此时可将选中的兴趣点也作为一个特征点,建立转换方程。不同之处在于,一者,计算位姿信息时需全图提取特征点,而计算兴趣点时只需提取选中的像素点附近的特征点或选中的像素点区域中的特征点,以提高计算精度;二者,计算位姿信息逐帧跟踪特征点时,内存中只需保存提取的特征点以及当前正在处理的图像帧,无需保存所有图像帧,而计算兴趣点时,由于用户可能调整兴趣点,故需从头到尾保存图像帧。对于第二点区别的情况,当存在内存限制时,可对目标视频做降频处理,比如手机视频是30Hz,即1秒30张图,可等间隔抽取其中5张。此外,计算兴趣点时还可沿目标视频时间轴的正向和反向各进行一次特征点跟踪,以得到精确的计算结果。后文将对计算兴趣点的方案做进一步描述。
步骤150,根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息将展示对象投影到每一图像帧上以获取目标合成视频。
展示对象在空间中的位置信息反映了展示对象的绝对位置,每一图像帧对应的位姿信息则反映了拍摄装置的拍摄视角,结合二者即可将展示对象投影到图像帧中,得到合成图像帧,全部合成图像帧按序组合就形成了目标合成视频,至此完成了将展示对象插入目标视频中的处理。
具体地,图4示出了根据本申请的一个实施例的获取目标合成视频的方法的示意流程图。如图4所示,图1中的步骤150具体包括:
步骤152,根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息确定展示对象在每一图像帧中的投影位置和投影姿态。
展示对象本身并非点,因而具有一定形状,展示对象在空间中的位置信息例如可为展示对象上的一个参考点在空间中的位置。以展示对象在空间中的位置信息是世界坐标系下的位置信息为例,结合每一图像帧的位姿信息和拍摄装置的内参,可以将展示对象在空间中的位置信息转换为每一图像帧中的位置信息,作为展示对象在每一图像帧中的投影位置。再利用每一图像帧对应的位姿信息确定展示对象的朝向,也可理解为对整个展示对象进行坐标变换,即可得到展示对象在每一图像帧中的投影姿态。
步骤154,根据展示对象在每一图像帧中的投影位置和投影姿态将展示对象投影到每一图像帧中以获取目标合成视频。
按照确定的投影姿态将展示对象置于相应图像帧中的相应投影位置,即可完成展示对象的投影,得到合成图像帧,进而组合得到目标合成视频。
进一步地,本申请的方法还包括:获取用户编辑的展示对象的位置调整信息和/或姿态调整信息。已经选定的兴趣点可作为放置展示对象的初始位置,通过获取位置调整信息,可进一步基于兴趣点调整展示对象的位置,此时不必对新的位置重新进行迭代运算,既有助于降低计算量,又可以初选的兴趣点作为桥梁,解决用户实际期望插入展示对象的位置无法计算或无法准确计算的问题,提升了方案的灵活性,可满足丰富的图像处理需求。此外,展示对象默认在图像帧中正面摆放,通过获取姿态调整信息,可调整展示对象的旋转角度,进而改变姿态,通过少量的计算就可满足用户丰富的展示需求。相应地,步骤152包括:根据展示对象在空间中的位置信息、每一图像帧对应的位姿信息和展示对象的位置调整信息和/或姿态调整信息确定展示对象在每一图像帧中的投影位置和投影姿态。
以展示对象是三维模型为例,投影过程具体为:
先导入三维模型(如图5所示的线框图),结合拍摄装置的位姿信息,使用z-buffer算法进行消隐(计算线框图中哪些线是被遮挡的,不应该显 示,这里输出为如图6所示的消隐图),投影到图像帧上,加上色彩渲染得到真实感图形。
此时生成的真实感图形放置在初始位置(即兴趣点)上,并且在用户框选兴趣点的图像帧中是正面摆放的。用户可以根据需求输入位置调整信息,例如拖拽调整真实感图形的位置(即与兴趣点之间有平移变换t的调整),也可以输入姿态调整信息以旋转真实感图形的角度(即与兴趣点之间有旋转变换R)。上文中的位姿信息应该是基于第一帧图像帧的相机坐标系,这里结合用户调整真实感图形的旋转变换R与平移变换t,计算出每一帧图像帧相对于真实感图形的位置和姿态(简单的坐标系转换,得到真实感图形在图像帧中的朝向,利用z-buffer得到消隐图并渲染),同时利用相机模型投影关系,计算出真实感图形在每个图像帧中的二维位置信息。至此得到了真实感图形的放置位置以及朝向,完成了真实感图形的放置。
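下面给出该投影放置过程的一个简化示意（消隐与色彩渲染从略，函数与变量名为举例假设，并非本申请的限定实现）：

```python
import numpy as np

def place_and_project(vertices, anchor, R_adj, t_adj, R_cam, t_cam, K):
    """vertices: (N, 3) 三维模型顶点（模型自身坐标系）；anchor: 兴趣点的三维位置；
    R_adj/t_adj: 用户对展示对象的姿态/位置调整；R_cam/t_cam: 该图像帧对应的位姿。"""
    world_pts = (R_adj @ vertices.T).T + np.asarray(anchor) + np.asarray(t_adj)  # 放置到兴趣点附近
    cam_pts = (R_cam @ world_pts.T).T + np.asarray(t_cam)                        # 变换到该帧的相机坐标系
    proj = (K @ cam_pts.T).T
    return proj[:, :2] / proj[:, 2:3]            # 每个顶点在该图像帧中的二维位置
```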
至此,本申请如图1所示的一个实施例的图像的处理方法描述完毕。
虽然利用可移动平台的传感器采集到的粗略的初始位姿信息直接进行坐标变换,得到的合成视频会存在展示对象抖动的问题,效果欠佳,但计算速度快,可用于效果预览。图7示出了根据本申请的另一个实施例的图像的处理方法的示意流程图,以描述制作预览合成视频的方案。如图7所示,该图像的处理方法包括:
步骤210,获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频。
步骤220,获取拍摄装置在采集每一图像帧时的初始位姿信息,其中,初始位姿信息是可移动平台配置的传感器采集得到的。
步骤230,获取用户编辑的展示对象。
步骤240,确定展示对象在空间中的位置信息。
该步骤可先获取用户在目标图像帧中选中的像素点或像素点区域在目标图像帧中的位置,对于像素点区域,例如可选用其中心点处的像素点的位置,再利用初始位姿信息和拍摄装置内参进行坐标转换,得到该像素点或像素点区域在空间中的粗略位置,记为展示对象在空间中的预览位置信息。
步骤250,根据展示对象在空间中的位置信息和每一图像帧对应的初始位姿信息将展示对象投影到每一图像帧上以获取预览合成视频。
该步骤利用初始位姿信息代替位姿信息,先得到粗略的预览合成视频,以便预览合成效果。
步骤260,判断接收到的预览反馈信息是否为确认信息,若是,则转到步骤270,若否,则转到步骤230。
通过设置确认步骤,例如在操作界面上提供确认按钮和取消按钮以供用户选择,可获得预览反馈信息。若用户对预览合成视频满意,则可执行确认操作,生成的预览反馈信息为确认信息;若不满意,则用户可执行取消操作,生成的预览反馈信息为取消信息,此时返回步骤230,用户可继续编辑展示对象,并获得新的预览合成视频,如此循环,直到用户执行确认操作才执行后续处理步骤,可降低运算负荷,提升响应速度。
步骤270,获取拍摄装置在采集目标视频中每一图像帧时的位姿信息。
步骤280,根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息将展示对象投影到每一图像帧上以获取目标合成视频。
该实施例中的步骤210、步骤230、步骤270、步骤280可分别对应参照前述实施例中的步骤110、步骤130、步骤120、步骤150,在此不再赘述。
至此,本申请如图7所示的另一个实施例的图像的处理方法描述完毕。
由于本申请提出的方法在计算兴趣点时,需要目标在测量的短时间内静止不动,并且可以持续使用智能跟随算法追踪,所以需要做一些准入判定,对于可能测量不准或是不工作的目标提前给予警示,继续执行可能会失败。图8示出了根据本申请的再一个实施例的图像的处理方法的示意流程图,以描述选择兴趣点时的准入判定。如图8所示,该图像的处理方法包括:
步骤310,获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频。
步骤320,获取拍摄装置在采集目标视频中每一图像帧时的位姿信息。
步骤330,从目标视频中确定目标子视频。
其中,目标子视频包括目标视频中拍摄装置的运动状态满足预设的运动条件时拍摄装置采集到的视频。该步骤实现的是目标视频中可计算部分的筛选,以得到可以用于计算兴趣点的视频部分。具体地,预设的运动条件是指拍摄装置发生了位移,而非静止或仅在原地摇头。后续选择兴趣点时,需保证仅在目标子视频内的图像帧中选择,以此作为第一个准入判定条件。
具体地,图9示出了根据本申请的一个实施例的确定目标子视频的方法的示意流程图。如图9所示,图8中的步骤330包括:
步骤331,从目标视频中确定多个连续图像帧,其中,多个连续图像帧相邻的图像帧之间特征点的平均移动量之和大于或等于预设的距离阈值,多个连续图像帧的视差大于或等于预设的视差阈值。
步骤332,将多个连续图像帧确定为目标子视频。
目标子视频由多个连续图像帧构成,这多个连续图像需满足两个条件。第一个条件是,相邻的图像帧之间特征点的平均移动量之和大于或等于预设的距离阈值,以确保足够的移动量。第二个条件是,多个连续图像帧的视差大于或等于预设的视差阈值,可过滤拍摄装置在原地摇头导致的移动量。需说明的是,上述多个连续图像帧可具体包含多个连续片段,每个片段由多个连续的图像帧组成,也就是将上述多个连续图像帧划分为多段。特别地,当上述多个连续图像帧仅包含一个片段时,就相当于不对上述多个连续图像帧进行划分。相应地,上述第二个条件具体可以是这多个连续片段的视差之和大于或等于预设的视差阈值,也可以是每个片段的视差均大于或等于一个预设的阈值,这个阈值可小于或等于预设的视差阈值。对应于每个片段的视差均大于或等于一个预设的阈值的情况,与之类似,上述第一个条件也可以进一步要求每个片段中相邻的图像帧之间特征点的平均移动量之和大于或等于一个预设的阈值,这个阈值可小于或等于预设的距离阈值。
其中,多个连续图像帧的数量需大于或等于预设的图像数量阈值。由于多个连续图像帧具有足够大的移动量,因此连续图像帧的数量过少就意味着拍摄装置在较短的时间内发生了较大的移动,会造成连续观测到的特 征点数量较少而不便于计算。通过限定图像数量阈值,可确保在多个连续图像帧中能够连续观测到的特征点的数量足够多,保证兴趣点计算的准确度。
实际计算时,如图10所示,对于目标视频,可先逐帧进行特征点提取和跟踪,然后按特征点的平均移动量累计值划分片段,再将视差达到阈值(例如10个像素)的片段作为可用片段,未达到阈值的片段作为不可用片段,最后合并相邻的同类片段,成为部分,若一个部分中包含预定个数(例如5个)以上的可用片段,则该部分成为可计算部分,即为目标子视频,否则该部分为不可计算部分,以同时满足前述两个条件以及多个连续图像帧的数量要求。具体地,对于片段的划分,可计算前后两个图像帧的全图特征点的平均移动量,并逐帧计算累加,直到累计值大于一定的阈值,如20个像素。例如从1号图像帧一直累计到9号图像帧,特征点的平均移动量累计值为18像素,到10号图像帧就变成了21个像素,则1号图像帧至10号图像帧划为一个片段。此后可计算1号图像帧与10号图像帧的视差,即为该片段的视差。
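下面给出按平均移动量累计值划分片段的一个示意性实现（阈值为举例假设，并非本申请的限定）：

```python
def split_segments(avg_motions, motion_sum_th=20.0):
    """avg_motions: 列表，第i项为第i帧与第i+1帧之间全图特征点的平均移动量（像素）。
    返回若干 (起始帧, 结束帧) 片段，供后续按视差阈值判断是否为可用片段。"""
    segments, start, acc = [], 0, 0.0
    for i, m in enumerate(avg_motions):
        acc += m
        if acc >= motion_sum_th:             # 累计平均移动量达到阈值，切出一个片段
            segments.append((start, i + 1))
            start, acc = i + 1, 0.0
    return segments
```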
步骤340,获取用户编辑的展示对象。
步骤350,输出第一提示信息,其中,第一提示信息用于指示用户在目标子视频中的目标图像帧中选中像素点或者像素点区域。
对于第一个准入判定条件,通过输出第一提示信息,可主动提供可计算的目标子视频中的图像帧,以供用户方便准确地选择目标图像帧,例如可在显示时令目标子视频中的图像帧处于可选择状态,而灰化目标子视频之外的图像帧,或在语音播报可供选择的图像帧时仅播报目标子视频中的图像帧。
步骤360,响应于用户在目标视频中的目标子视频之外的图像帧中选中像素点或者像素点区域的操作,输出选中无效的提示信息。
对于第一个准入判定条件,当用户在目标子视频之外的图像帧中选择兴趣点时,通过输出选中无效的提示信息,可提醒用户修改兴趣点。可以理解的是,步骤350和步骤360从正反两方面起到了准入判定的作用,可以同时存在,也可以只保留其中一个。
步骤370,确定展示对象在空间中的位置信息。
基于确定好的目标子视频,步骤370具体包括:
步骤371,获取用户在目标子视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置。
通过第一层准入判定,可从目标子视频中选出目标图像帧,并选中兴趣点,可确保兴趣点的准确计算。
步骤372,确定像素点或者像素点区域指示的空间中物体是否为静止物体,若是,则转到步骤374,若否,则转到步骤373。
步骤373,输出第二提示信息,并返回步骤371,其中,第二提示信息用于提示用户像素点或者像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
步骤372和步骤373为第二层准入判定。由于测量时要求目标静止不动,可通过卷积神经网络CNN(Convolutional Neural Networks)判别选中的目标物体是否是潜在的运动物体(例如人、车、船、海浪),如果是潜在的运动物体,就需要输出第二提示信息以警告用户可能测量不准,要求重新选取兴趣点。当第二提示信息用于提示用户选中其他像素点或者像素点区域时,可提示用户选中静止物体上的特征点,有助于提高提示效率,降低计算量,提升计算准确度。
这里需要注意的是,如前所述,兴趣点只是放置展示对象的初始位置,可以再进一步基于兴趣点调整展示对象的位置。比如直接把兴趣点设置在海浪上,是会弹出警告,但可以先把兴趣点设置在海滩上,最后再调整展示对象到海面上即可。
步骤374,确定像素点区域是否满足预设的纹理条件,若是,则转到步骤376,若否,则转到步骤375。
对于选中像素点的情况,可将像素点周围一定尺寸范围内的区域作为像素点区域。
步骤375,输出第三提示信息,并返回步骤371,其中,第三提示信息用于提示用户像素点或者像素点区域不可选,或者用于提示用户选中其他 像素点或者像素点区域。
步骤374和步骤375为第三层准入判定。通过特征点提取,分析目标是否具有可追踪性,即分析目标是否有足够的纹理,可以通过提取目标区域(即选中的像素点区域)内的特征点,这里可以用HarrisCorner,HOG(Histogram of Oriented Gradient,方向梯度直方图)等特征提取方法,判定,如果特征点不够多,说明纹理太弱了,不具有跟踪性,也警示用户。当第三提示信息用于提示用户选中其他像素点或者像素点区域时,可提示用户选中满足纹理条件的像素点区域或该区域内的特征点,有助于提高提示效率,降低计算量,提升计算准确度。
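下面给出纹理条件判定的一个示意性实现（以角点数量为判据，其中的阈值与参数均为举例假设，并非本申请的限定）：

```python
import numpy as np
import cv2

def roi_has_texture(gray, roi_mask, min_corners=20):
    """gray: 目标图像帧的灰度图；roi_mask: 像素点区域的布尔掩码。"""
    mask = roi_mask.astype(np.uint8) * 255
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01,
                                      minDistance=5, mask=mask)
    n_corners = 0 if corners is None else len(corners)
    return n_corners >= min_corners          # 角点太少说明纹理太弱，不具有可跟踪性
```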
可以理解的是,第二层准入判定和第三层准入判定在执行时没有先后顺序要求。此处第三提示信息可以与第二提示信息相同,也可以不同,在此不做限定。具体地,可控制交互装置输出上述选中无效的提示信息、第一提示信息、第二提示信息、第三提示信息,交互装置例如可为触摸显示屏、智能语音交互装置。此外,若图像的处理装置设置在可移动平台的控制终端上,也可通过例如显示、语音播报等方式直接输出上述选中无效的提示信息、第一提示信息、第二提示信息、第三提示信息,若图像的处理装置设置在可移动平台上,也可通过例如点亮警示灯等方式直接输出上述选中无效的提示信息、第一提示信息、第二提示信息、第三提示信息。
步骤376,根据像素点或像素点区域在目标图像帧中的位置确定展示对象在空间中的位置信息。
该步骤可参考前述实施例中的步骤144,在目标子视频中进行兴趣点的计算。需说明的是,目标子视频是目标视频中可以用于计算兴趣点的部分,所以只能在目标子视频内选取兴趣点,但展示对象依然可以在不可计算的视频片段中出现,只要有兴趣点出现即可。比如一段不可计算的视频片段出现在了目标子视频后面,那么基于目标子视频调整的展示对象,也可以出现在不可计算部分的视频中。
步骤380,根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息将展示对象投影到每一图像帧上以获取目标合成视频。
该实施例中的步骤310、步骤320、步骤340、步骤380可分别对应参 照前述实施例中的步骤110、步骤120、步骤130、步骤150,在此不再赘述。
至此,本申请如图8所示的再一个实施例的图像的处理方法描述完毕。
接下来对计算兴趣点的方案(即前述实施例中的步骤144)进行描述。
图11示出了根据本申请的一个实施例的计算兴趣点的方法的示意流程图,针对的是选中像素点区域的情况。如图11所示,该计算兴趣点的方法包括:
步骤410,根据像素点区域在目标图像帧中的位置确定像素点区域中的特征点。
如前所述,在计算兴趣点时,仅提取选中的像素点区域内的特征点,以减少计算量、提高计算精度。
步骤420,获取像素点区域中的特征点在目标图像帧中的位置。
兴趣点具体是像素点区域的几何中心像素点,其在目标图像帧中的位置是已知的,但要计算其对应的空间点在空间中的位置信息,还需要知道兴趣点在其他图像帧中的位置。然而兴趣点大概率并非特征点,此时可利用像素点区域内的特征点来拟合估算几何中心像素点。通过获取提取的特征点在目标图像帧中的位置,可得到提取的特征点与兴趣点的位置关系,据此进行拟合估算,有助于提高计算准确度。
步骤430,根据特征点在目标图像帧中的位置确定像素点区域的几何中心像素点的对应的空间点在目标视频至少一帧参考图像帧的光流向量。
拟合计算时,具体采用了求取光流向量的方案。通过在目标视频的至少一帧参考图像帧进行特征点跟踪,例如可采用KLT特征跟踪算法计算特征点的光流向量,再结合特征点与兴趣点的位置关系,即可拟合计算出兴趣点在至少一帧参考图像帧的光流向量。具体地,可计算特征点的光流向量的加权平均数,作为兴趣点的光流向量,即
$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}$$
x_i为像素点区域内的特征点的光流向量，w_i是权重，w_i例如可根据特征点与几何中心像素点的二维图像位置关系来确定：
$$w_i = \exp\!\left(-\frac{d_i^{2}}{2\sigma^{2}}\right)$$
这就是一个简单的高斯分布，σ根据经验调节，是可调参数，d_i表示特征点i到几何中心像素点的距离：
$$d_i = \sqrt{\left(u_i - u_0\right)^{2} + \left(v_i - v_0\right)^{2}}$$
(u_i, v_i)表示特征点i在图像帧中的像素坐标，(u_0, v_0)是几何中心像素点在图像帧中的像素坐标。
步骤440,根据光流向量确定几何中心像素点的对应的空间点在至少一帧参考图像帧中的位置。
得到兴趣点在至少一帧参考图像帧的光流向量后,结合兴趣点在目标图像帧中的位置,即可得到兴趣点在至少一帧参考图像帧的位置。
步骤450,根据在至少一帧参考图像帧中的位置和特征点在目标图像帧中的位置确定几何中心像素点的对应的空间点在空间中的位置信息。
得到兴趣点在至少一帧参考图像帧中的位置后,就可将兴趣点也作为一个特征点,和其他特征点一起,进行坐标转换方程组的建立和求解,进而完成兴趣点计算。具体地,可采用BA光束平差算法计算。
可以理解的是,对于选中像素点的情况,可以将上述像素点区域的几何中心像素点替换为选中的像素点,将上述像素点区域中的特征点替换为选中的像素点附近一定范围内的特征点,同样可以完成计算。
图12示出了根据本申请的另一个实施例的计算兴趣点的方法的示意流程图,针对的是选中像素点的情况。如图12所示,该计算兴趣点的方法包括:
步骤510,获取目标图像帧中的目标特征点对应的空间点在空间中的位置。
其中,目标特征点与像素点之间的像素距离小于或等于预设的像素距离阈值,即目标特征点位于像素点附近。与前述实施例一样,像素点大概率并非特征点,因此需结合附近的特征点进行拟合估算。此处目标特征点例如可为计算拍摄装置的位姿信息时分析出的可靠特征点,以确保兴趣点计算的准确度。
步骤520,根据目标特征点对应的空间点在空间中的位置拟合目标平面。
当目标特征点为前述可靠特征点时,在计算拍摄装置的位姿信息时就已经得到了目标特征点对应的空间点在空间中的位置。当目标特征点不为 前述可靠特征点时,则需计算其对应的空间点在空间中的位置,计算方法仍为求解转换方程组,在此不再赘述。
步骤530,根据像素点在目标图像帧中的位置和拟合目标平面确定展示对象在空间中的位置信息。
由于目标特征点在选中的像素点附近，因而可认为像素点对应的空间点也处在拟合目标平面内。过像素点和拍摄装置的光心做一条连线，求出该连线与拟合目标平面的交点，可认为该交点就是像素点对应的空间点，由此可完成兴趣点计算，进而得到展示对象在空间中的位置信息。
具体地，用户输入要添加的展示对象后，于第i帧图像帧点击的像素点为(u,v)，此处大概率是没有对应的特征点的。这里例如找到最近的可靠特征点（经过计算位姿信息时筛选后的）记为feature_{i,click}，以及其对应的空间点的三维坐标P_{i,click}，并结合附近的三维点（三维位置在附近的特征点）P_k=(x_k, y_k, z_k)拟合出拟合目标平面(a,b,c,d)，然后通过插值计算出用户像素点对应的空间点的三维坐标。
其中平面拟合可用下面这个优化问题描述：
$$(a,b,c,d)^{*} = \arg\min_{a,b,c,d} \sum_{k} \operatorname{dist}\!\left(P_k,\ (a,b,c,d)\right)$$
其中
$$\operatorname{dist}\!\left(P_k,\ (a,b,c,d)\right) = \frac{\left|a x_k + b y_k + c z_k + d\right|}{\sqrt{a^{2} + b^{2} + c^{2}}}$$
表示的是三维点P_k到拟合目标平面的距离，当计算中使用的所有特征点的该距离之和为最小值时，得到的平面就为拟合目标平面，相当于是三维化的最小二乘法。可使用SVD（Singular Value Decomposition，奇异值分解）求得上式的最优解。
像素点与拟合目标平面的交点记为P_0(x,y,z)，满足
$$\begin{cases} a x + b y + c z + d = 0 \\[4pt] z \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} x \\ y \\ z \end{bmatrix} \end{cases}$$
求解线性方程组得到P_0(x,y,z)，
即为展示对象的中心。
在一些实施例中,目标视频是可移动平台对空间中的目标对象进行跟随时由拍摄装置拍摄获取的,确定展示对象在空间中的位置信息,包括:获取拍摄装置的跟随对象在空间中的位置信息;根据跟随对象在空间中的位置信息确定展示对象在空间中的位置信息。
在该实施例中,由于拍摄装置进行跟随拍摄时本身就需选取跟随对象,此时默认将跟随对象作为兴趣点或兴趣区域,直接基于跟随对象在空间中的位置信息确定展示对象在空间中的位置信息,例如可直接将跟随对象的位置作为展示对象的位置,也可基于跟随对象的位置调整展示对象的位置,有助于大幅减少计算量,降低计算负荷。
在一些实施例中,目标视频是可移动平台对空间中的目标对象进行环绕运动时由拍摄装置拍摄获取的,确定展示对象在空间中的位置信息,包括:获取拍摄装置在空间中的环绕对象的位置信息;根据环绕对象在空间中的位置信息确定展示对象在空间中的位置信息。
在该实施例中,由于拍摄装置进行环绕拍摄时本身就需选取环绕对象,此时默认将环绕对象作为兴趣点或兴趣区域,直接基于环绕对象在空间中的位置信息确定展示对象在空间中的位置信息,例如可直接将环绕对象的位置作为展示对象的位置,也可基于环绕对象的位置调整展示对象的位置,有助于大幅减少计算量,降低计算负荷。
至此,本申请实施例计算兴趣点的方案描述完毕。
此外,在一些实施例中,可移动平台的控制终端包括图像的处理装置,图像的处理方法还包括:播放或存储或者运行社交应用程序分享目标合成视频,以供用户观看、保存或分享目标合成视频。
在另一些实施例中,可移动平台包括图像的处理装置,方法还包括:将目标合成视频发送给可移动平台的控制终端以使控制终端播放或存储或者运行社交应用程序分享目标合成视频,以供用户观看、保存或分享目标合成视频。
可以理解的是,在合成目标合成视频前,可先逐帧播放完成投影的图 像帧,以供用户查看插入展示对象的效果,若用户确认效果,再合成并保存目标合成视频,若用户对效果不满意,可继续编辑展示对象、选取兴趣点或基于选择的兴趣点调整插入展示对象的位置。
综上,如图13所示,本申请实施例提供的图像的处理方法可简要概括如下:
(1)用户选择要编辑的视频(即初始视频),下载到智能设备的APP端,APP会自动下载对应的无人飞行器的惯性导航系统以及云台增稳系统提供的初始位姿信息,即AIS(Automatic Identification System,自动识别系统)文件,以及拍摄装置的内参K(背景知识中的矩阵K)。
(2)用户先对视频进行裁剪,选择想要的部分,APP将裁剪后得到的目标视频拆分为图像帧,根据筛选策略,筛选出可计算的视频片段,作为目标子视频。
(3)用户在目标子视频中的目标图像帧上,选择兴趣点(实际过程中,用户可框选一块区域ROI,区域中心点即为兴趣点)。其中智能跟随的视频(Tracking video),以及一键拍摄的视频(Quick shot video),默认兴趣点在选取的拍摄主体上(智能跟随的目标,或是一键拍摄环绕的主体)。同时用户输入需要显示的展示对象。
(4)通过视频对应的初始位姿数据(如果是一键拍摄的短片,还有初始的轨迹数据以及初始兴趣点在空间中的三维位置信息),针对全图提取特征点,并做特征点匹配,每一帧都做该操作,再通过BA光束平差算法,计算出拍摄装置精准的位姿信息,包含旋转矩阵R,与位移矩阵T。
(5)计算精准的兴趣点类似上一步计算位姿信息,一个区别在于位姿信息只需要计算一次,计算过的图像帧无需保存;但对于兴趣点,用户可能会随时调整,所以需要从头到尾保存图像帧。由于手机的内存限制,可做降频处理,比如手机视频是30Hz即1秒30张图,可间隔地抽取其中5张,该操作只在内存限制的情况下使用。另一个区别是针对兴趣点的计算,可只在框选区域ROI内提取特征点,并做跟踪匹配计算,计算出精准的兴趣点。
(6)在模型库中找到用户输入的展示对象(如文字)的三维模型,渲 染投影3D字幕,用户可以调整3D字幕的位置以及姿态(可以平移可以旋转),这里的调整是基于兴趣点的相对旋转以及位置变化,所以只有在有兴趣点出现的视频部分,才有3D字幕,如果没有兴趣点的视频部分也想要字幕,就需要重新选取兴趣点。
(7)用户确认效果后,将图像帧重新合成视频。
本申请第二方面的实施例提供了一种图像的处理装置,如前所述,图像的处理装置可单独设置在可移动平台上,也可单独设置在可移动平台的控制终端上,还可部分设置在可移动平台上,部分设置在可移动平台的控制终端上。可移动平台例如可为无人飞行器,还可为其他带有多摄像头的载具,例如无人驾驶的汽车。控制终端可为任何能与可移动平台交互的终端设备,例如可为遥控设备,也可为智能设备(经APP实现交互),如智能手机、智能平板、智能眼镜(如VR眼镜、AR眼镜),还可将可移动平台的SD卡插入电脑,此时控制终端为电脑。
图14示出了根据本申请的一个实施例的图像的处理装置的示意框图。如图14所示,该图像的处理装置100包括:存储器102,被配置为存储计算机程序;处理器104,被配置为执行计算机程序以实现:获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频;获取拍摄装置在采集目标视频中每一图像帧时的位姿信息;获取用户编辑的展示对象;确定展示对象在空间中的位置信息;根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息将展示对象投影到每一图像帧上以获取目标合成视频。
本申请实施例提供的图像的处理装置100,获取可移动平台的拍摄装置移动采集的目标视频以及采集过程中拍摄装置对应的位姿信息,并获取用户编辑的展示对象,可将该展示对象插入目标视频,实现特效视频的制作。具体而言,通过获取采集每一帧图像时,拍摄装置对应的位姿信息(作为拍摄装置的外参,包括旋转信息和位移信息),可结合已知的拍摄装置的内参,实现世界坐标系与齐次图像坐标系之间的转换,从而确定空间中的实体在目标视频中每一帧图像内的视图。此外,通过确定展示对象在空间中的位置信息,可明确展示对象的插入位置。展示对象在空间中的位置 信息反映了展示对象的绝对位置,每一图像帧对应的位姿信息则反映了拍摄装置的拍摄视角,结合二者即可将展示对象投影到图像帧中,得到合成图像帧,全部合成图像帧按序组合就形成了目标合成视频,至此完成了将展示对象插入目标视频中的处理。需说明的是,空间中的位置信息可以是世界坐标系下的位置信息,也可以是结合每一图像帧对应的位姿信息以及世界坐标系下的位置信息得到的相机坐标系下的位置信息。可以理解的是,可先获取用户编辑的展示对象,也可先确定展示对象在空间中的位置信息,对于二者的执行顺序,本申请不予限制。
具体地,存储器102可以包括用于数据或指令的大容量存储器。举例来说而非限制,存储器102可包括硬盘驱动器(Hard Disk Drive,HDD)、软盘驱动器、闪存、光盘、磁光盘、磁带或通用串行总线(Universal Serial Bus,USB)驱动器或者两个或更多个以上这些的组合。在合适的情况下,存储器102可包括可移除或不可移除(或固定)的介质。在合适的情况下,存储器102可在综合网关容灾设备的内部或外部。在特定实施例中,存储器102是非易失性固态存储器。在特定实施例中,存储器102包括只读存储器(ROM)。在合适的情况下,该ROM可以是掩模编程的ROM、可编程ROM(PROM)、可擦除PROM(EPROM)、电可擦除PROM(EEPROM)、电可改写ROM(EAROM)或闪存或者两个或更多个以上这些的组合。
上述处理器104可以包括中央处理器(CPU),或者特定集成电路(Application Specific Integrated Circuit,ASIC),或者可以被配置成实施本申请实施例的一个或多个集成电路。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:获取每一图像帧中的特征点在图像帧中的位置信息;根据特征点在图像帧中的位置信息确定拍摄装置在采集每一图像帧时的位姿信息。
在该实施例中,基于一般相机模型,针对一个特征点,可建立其在世界坐标系下的三维位置信息和其在齐次图像坐标系下的二维位置信息(即图像帧中的位置信息)的转换方程。其中,利用图像识别可确定每一图像帧中的多个特征点在图像帧中的位置信息(已知量),而相邻两个图像帧往往具有大量重合的特征点,不同图像帧中的同一特征点在世界坐标系下 的三维位置信息(未知量)相同,一个图像帧也具有唯一的位姿信息(未知量),通过在目标视频中逐帧全图提取特征点,并做特征点跟踪匹配,可获得多组转换方程,联立求解即可得到每一图像帧对应的位姿信息以及各个特征点在世界坐标系下的三维位置信息。
具体计算时,若在跟踪特征点时分析出错的图像帧,可予以删除,以优化计算结果。此外,一个目标视频中的图像帧的数量是巨大的,若以第一帧图像帧为基础进行逐帧跟踪,则可能出现较大偏差。为此,可在逐帧跟踪时将出现明显变化的图像帧标记为关键帧,再基于关键帧跟踪后续的非关键帧,并在出现新的关键帧时,以新的关键帧为基础跟踪其后的非关键帧,从而提高计算准确度。实际计算时,用于计算的目标视频中存在5个以上的关键帧,就可进行计算。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:获取拍摄装置在采集每一图像帧时的初始位姿信息,其中,初始位姿信息是可移动平台配置的传感器采集得到的;根据特征点在图像帧中的位置信息和每一图像帧对应的初始位姿信息确定拍摄装置在采集每一图像帧时的位姿信息。
在该实施例中,利用可移动平台上配置的传感器,例如无人飞行器的惯性导航系统以及云台增稳系统中的传感器,可以在采集目标视频的过程中获取到一些先验信息,包括无人飞行器位姿数据、云台位姿数据、一键拍摄所用的轨迹数据,由此可得到拍摄装置精度较低的初始位姿信息,利用初始位姿信息作为联立求解方程组时位姿信息的迭代初始值,可减少迭代次数,加快算法收敛时间,同时减少初始值选取不当造成的出错的概率,有助于缩短目标视频的后期处理时间,即使使用智能手机也能在目标视频中插入展示对象,制作目标合成视频。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:获取拍摄装置在采集每一图像帧时的初始位姿信息,其中,初始位姿信息是可移动平台配置的传感器采集得到的;根据展示对象在空间中的位置信息和每一图像帧对应的初始位姿信息将展示对象投影到每一图像帧上以获取预览合成视频。
在该实施例中,利用可移动平台的传感器采集到的粗略的初始位姿信息直接进行坐标变换,得到的合成视频会存在展示对象抖动的问题,效果欠佳,但计算速度快,可用于制作预览合成视频,便于快速实现效果预览。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:根据展示对象在空间中的位置信息和每一图像帧对应的位姿信息确定展示对象在每一图像帧中的投影位置和投影姿态;根据展示对象在每一图像帧中的投影位置和投影姿态将展示对象投影到每一图像帧中以获取目标合成视频。
在该实施例中,展示对象本身并非点,因而具有一定形状,展示对象在空间中的位置信息例如可为展示对象上的一个参考点在空间中的位置。以展示对象在空间中的位置信息是世界坐标系下的位置信息为例,结合每一图像帧的位姿信息和拍摄装置的内参,可以将展示对象在空间中的位置信息转换为每一图像帧中的位置信息,作为展示对象在每一图像帧中的投影位置。再利用每一图像帧对应的位姿信息确定展示对象的朝向,也可理解为对整个展示对象进行坐标变换,即可得到展示对象在每一图像帧中的投影姿态。按照确定的投影姿态将展示对象置于相应图像帧中的相应投影位置,即可完成展示对象的投影,得到合成图像帧,进而组合得到目标合成视频。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:获取用户编辑的展示对象的位置调整信息和/或姿态调整信息;根据展示对象在空间中的位置信息、每一图像帧对应的位姿信息和展示对象的位置调整信息和/或姿态调整信息确定展示对象在每一图像帧中的投影位置和投影姿态。
在该实施例中,处理器104还可获取用户编辑的展示对象的位置调整信息和/或姿态调整信息。对于已经确定的展示对象在空间中的位置,可将之作为放置展示对象的初始位置,通过获取位置调整信息,可进一步调整展示对象的位置,此时不必对新的位置重新进行运算,既有助于降低计算量,又可以初始位置作为桥梁,解决用户实际期望插入展示对象的位置无法计算或无法准确计算的问题,提升了方案的灵活性,可满足丰富的图像 处理需求。此外,展示对象默认在图像帧中正面摆放,通过获取姿态调整信息,可调整展示对象的旋转角度,进而改变姿态,通过少量的计算就可满足用户丰富的展示需求。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:检测用户的展示对象编辑操作,根据检测到的编辑操作确定用户编辑的展示对象。
在该实施例中,当图像的处理装置100至少部分设置在可移动平台的控制终端上时,展示对象具体是通过控制终端检测用户的展示对象编辑操作来确定的,可准确获取用户编辑的展示对象,以满足用户的展示需求。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:控制交互装置显示展示对象编辑界面;检测用户对显示展示对象编辑界面的交互装置的展示对象编辑操作。
在该实施例中,通过控制交互装置显示展示对象编辑界面,可提供明确的界面供用户操作,并利用交互装置准确检测该操作,保证获取到准确的展示对象。交互装置例如可为触摸显示屏。
在一些实施例中,展示对象包括数字、字母、符号、文字和物体标识中的至少一种,以满足用户丰富的展示需求。相应地,展示对象编辑界面可设文本输入框以供用户利用输入法输入数字、字母,并可配置字体库,用户也可载入新的字体库或删除已有字体库。展示对象编辑界面还可展示符号和物体标识的集合以供用户选择。此外,展示对象也可由用户自行绘制,可以是以绘制的方式输入数字、字母、符号、文字和物体标识外,也可以是绘制任意图形。
在一些实施例中,展示对象为三维模型,以满足丰富的展示需求。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:获取可移动平台在空间中移动时拍摄装置采集的初始视频;检测用户的视频选取操作,根据检测到的视频选取操作从初始视频中确定目标视频。
在该实施例中,当图像的处理装置100至少部分设置在可移动平台的控制终端上时,通过控制终端检测用户针对初始视频做出的视频选取操作,可对初始视频进行选取编辑。此时目标视频为从初始视频中选取的一部分, 可以是用户期望保留的一部分,也可以是用户选择的需要进行后续处理的部分,例如需要插入展示对象的部分,既提升了视频制作的灵活性,又减少了不必要的计算量。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:控制交互装置显示视频选取界面;检测用户对显示视频选取界面的交互装置的视频选取操作。
在该实施例中,通过控制交互装置显示视频选取界面,可提供明确的界面以供用户操作,并利用交互装置准确检测该操作,保证获取到准确的目标视频。交互装置例如可为触摸显示屏。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:获取用户在目标视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置;根据像素点或者像素点区域在目标图像帧中的位置确定展示对象在空间中的位置信息。
在该实施例中,目标视频包含多个图像帧,用户可选中其中一个图像帧作为目标图像帧,并选中目标图像帧中的一个像素点或像素点区域(即选择兴趣点,选中像素点区域ROI时以像素点区域的中心点作为兴趣点),以便指示展示对象的插入位置。
对于像素点,既可由用户随意选取,也可采用在图像帧中显示特征点,并令特征点处于可选择的状态的方案,使得用户可直接选择已经识别出的特征点作为兴趣点,以简化后续计算。
对于像素点区域,可利用像素点区域中的参考点,例如左上角处的像素点和右下角处的像素点,来代表像素点区域,用户例如可通过同时或先后选中这两个像素点来框选像素点区域,例如还可通过先选中一个像素点再滑动至另一个像素点来选中像素点区域。
进一步地,对于框选出像素点区域的情况,还可通过SLIC、Graph-based、NCut、Turbopixel、Quick-shift、Graph-cut a、Graph-cut b等算法,在图像帧中生成超像素(即具有相似纹理、颜色、亮度等特征的相邻像素点构成的有一定视觉意义的不规则像素块)。处于框内的超像素被 选中,处于框外的超像素被排除,处于框边界上的像素点,则可设定当所属超像素有一定比例(例如50%)以上的部分在框内,则算作选中,否则算作未选中,所有选中的超像素就构成了像素点区域。
此外,选中的兴趣点在目标图像帧中的位置是明确的,但仍需计算其在目标图像帧之外的其他图像帧中的位置,或者说需要计算其在空间中的位置。该计算过程类似于每一图像帧对应的位姿信息的计算过程,即通过跟踪匹配特征点,联立求解方程组实现,此时可将选中的兴趣点也作为一个特征点,建立转换方程。不同之处在于,一者,计算位姿信息时需全图提取特征点,而计算兴趣点时只需提取选中的像素点附近的特征点或选中的像素点区域中的特征点,以提高计算精度;二者,计算位姿信息逐帧跟踪特征点时,内存中只需保存提取的特征点以及当前正在处理的图像帧,无需保存所有图像帧,而计算兴趣点时,由于用户可能调整兴趣点,故需从头到尾保存图像帧。对于第二点区别的情况,当存在内存限制时,可对目标视频做降频处理,比如手机视频是30Hz,即1秒30张图,可等间隔抽取其中5张。此外,计算兴趣点时还可沿目标视频时间轴的正向和反向各进行一次特征点跟踪,以得到精确的计算结果。后文将对计算兴趣点的方案做进一步描述。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:从目标视频中确定目标子视频;获取用户在目标子视频中的目标图像帧中选中的像素点在目标图像帧中的位置或者在目标图像帧中选中的像素点区域在目标图像帧中的位置。
在该实施例中,通过从目标视频中确定目标子视频,可筛选出可用于计算兴趣点的视频部分,以供用户在该部分选择兴趣点,可确保兴趣点的准确计算。需说明的是,目标子视频是目标视频中可以用于计算兴趣点的部分,所以只能在目标子视频内选取兴趣点,但展示对象依然可以在不可计算的视频片段中出现,只要有兴趣点出现即可。比如一段不可计算的视频片段出现在了目标子视频后面,那么基于目标子视频调整的展示对象,也可以出现在不可计算部分的视频中。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:响 应于用户在目标视频中的目标子视频之外的图像帧中选中像素点或者像素点区域的操作,输出选中无效的提示信息。
在该实施例中,由于本申请提出的方案在计算兴趣点时,需要目标可以持续使用智能跟随算法追踪,所以需要做一些准入判定,对于可能测量不准或是不工作的目标提前给予警示,继续执行可能会失败。当用户在目标子视频之外的图像帧中选择兴趣点时,通过输出选中无效的提示信息,就可提醒用户修改兴趣点,以此作为兴趣点计算的一个准入判定条件。
具体地,可控制交互装置输出上述选中无效的提示信息,交互装置例如可为触摸显示屏、智能语音交互装置。此外,若图像的处理装置100设置在可移动平台的控制终端上,也可通过例如显示、语音播报等方式直接输出上述选中无效的提示信息,若图像的处理装置100设置在可移动平台上,也可通过例如点亮警示灯等方式直接输出上述选中无效的提示信息。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:输出第一提示信息,其中,第一提示信息用于指示用户在目标子视频中的目标图像帧中选中像素点或者像素点区域。
在该实施例中,通过输出第一提示信息,可主动提供可计算的目标子视频中的图像帧,以供用户方便准确地选择目标图像帧,例如可在显示时令目标子视频中的图像帧处于可选择状态,而灰化目标子视频之外的图像帧,或在语音播报可供选择的图像帧时仅播报目标子视频中的图像帧。
具体地,可控制交互装置输出上述第一提示信息,交互装置例如可为触摸显示屏、智能语音交互装置。此外,若图像的处理装置100设置在可移动平台的控制终端上,也可通过例如显示、语音播报等方式直接输出上述第一提示信息,若图像的处理装置100设置在可移动平台上,也可通过例如点亮警示灯等方式直接输出上述第一提示信息。
在一些实施例中,目标子视频包括目标视频中拍摄装置的运动状态满足预设的运动条件时拍摄装置采集到的视频。
在该实施例中,限定了目标子视频需满足的选取条件。预设的运动条件具体是指拍摄装置发生了位移,而非静止或仅在原地摇头,以确保准确计算兴趣点。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:从目标视频中确定多个连续图像帧,其中,多个连续图像帧相邻的图像帧之间特征点的平均移动量之和大于或等于预设的距离阈值,多个连续图像帧的视差大于或等于预设的视差阈值;将多个连续图像帧确定为目标子视频。
在该实施例中,目标子视频由多个连续图像帧构成,这多个连续图像需满足两个条件。第一个条件是,相邻的图像帧之间特征点的平均移动量之和大于或等于预设的距离阈值,以确保足够的移动量。第二个条件是,多个连续图像帧的视差大于或等于预设的视差阈值,可过滤拍摄装置在原地摇头导致的移动量。需说明的是,上述多个连续图像帧可具体包含多个连续片段,每个片段由多个连续的图像帧组成,也就是将上述多个连续图像帧划分为多段。特别地,当上述多个连续图像帧仅包含一个片段时,就相当于不对上述多个连续图像帧进行划分。相应地,上述第二个条件具体可以是这多个连续片段的视差之和大于或等于预设的视差阈值,也可以是每个片段的视差均大于或等于一个预设的阈值,这个阈值可小于或等于预设的视差阈值。对应于每个片段的视差均大于或等于一个预设的阈值的情况,与之类似,上述第一个条件也可以进一步要求每个片段中相邻的图像帧之间特征点的平均移动量之和大于或等于一个预设的阈值,这个阈值可小于或等于预设的距离阈值。
在一些实施例中,多个连续图像帧的数量大于或等于预设的图像数量阈值。
在该实施例中,对多个连续图像帧的数量进行了限定。由于多个连续图像帧具有足够大的移动量,因此连续图像帧的数量过少就意味着拍摄装置在较短的时间内发生了较大的移动,会造成连续观测到的特征点数量较少而不便于计算。通过限定图像数量阈值,可确保在多个连续图像帧中能够连续观测到的特征点的数量足够多,保证兴趣点计算的准确度。
实际计算时,如图10所示,对于目标视频,可先逐帧进行特征点提取和跟踪,然后按特征点的平均移动量累计值划分片段,再将视差达到阈值(例如10个像素)的片段作为可用片段,未达到阈值的片段作为不可用片段,最后合并相邻的同类片段,成为部分,若一个部分中包含预定个数(例 如5个)以上的可用片段,则该部分成为可计算部分,即为目标子视频,否则该部分为不可计算部分,以同时满足前述两个条件以及多个连续图像帧的数量要求。具体地,对于片段的划分,可计算前后两个图像帧的全图特征点的平均移动量,并逐帧计算累加,直到累计值大于一定的阈值,如20个像素。例如从1号图像帧一直累计到9号图像帧,特征点的平均移动量累计值为18像素,到10号图像帧就变成了21个像素,则1号图像帧至10号图像帧划为一个片段。此后可计算1号图像帧与10号图像帧的视差,即为该片段的视差。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:确定像素点或者像素点区域指示的空间中物体是否为静止物体;当物体为静止物体时,根据像素点或者像素点区域在目标图像帧中的位置确定用户编辑的展示对象在空间中的位置信息。
在该实施例中,由于本申请提出的方案在计算兴趣点时,需要目标在测量的短时间内静止不动,所以需要做一些准入判定,可通过卷积神经网络CNN判别选中的目标物体是否是潜在的运动物体(例如人、车、船、海浪),并在确定像素点或者像素点区域指示的空间中物体为静止物体时才执行计算,以确保计算结果准确。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:确定像素点或者像素点区域指示的空间中物体是否为静止物体;当物体不为静止运动时,输出第二提示信息,其中,第二提示信息用于提示用户像素点或者像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
在该实施例中,在像素点或者像素点区域指示的空间中物体不为静止物体时,通过输出第二提示信息,可提醒用户作出修改,以确保计算结果准确。当第二提示信息用于提示用户选中其他像素点或者像素点区域时,可提示用户选中静止物体上的特征点,有助于提高提示效率,降低计算量,提升计算准确度。这里需要注意的是,如前所述,兴趣点只是放置展示对象的初始位置,可以再进一步基于兴趣点调整展示对象的位置。比如直接把兴趣点设置在海浪上,是会弹出警告,但可以先把兴趣点设置在海滩上, 最后再调整展示对象到海面上即可。
具体地,可控制交互装置输出上述第二提示信息,交互装置例如可为触摸显示屏、智能语音交互装置。此外,若图像的处理装置100设置在可移动平台的控制终端上,也可通过例如显示、语音播报等方式直接输出上述第二提示信息,若图像的处理装置100设置在可移动平台上,也可通过例如点亮警示灯等方式直接输出上述第二提示信息。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:确定像素点区域是否满足预设的纹理条件;当满足预设的纹理条件时,根据像素点或者像素点区域在目标图像帧中的位置确定用户编辑的展示对象在空间中的位置信息。
在该实施例中,由于本申请提出的方案在计算兴趣点时,需要目标可以持续使用智能跟随算法追踪,所以需要选中的像素点区域具有足够多的特征点,即满足预设的纹理条件时才执行计算,以确保计算结果准确。对于选中像素点的情况,可将像素点周围一定尺寸范围内的区域作为像素点区域。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:确定像素点区域是否满足预设的纹理条件;当不满足预设的纹理条件时,输出第三提示信息,其中,第三提示信息用于提示用户像素点或者像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
在该实施例中,当选中的像素点区域不满足预设的纹理条件时,通过输出第三提示信息,可提醒用户作出修改,以确保计算结果准确。当第三提示信息用于提示用户选中其他像素点或者像素点区域时,可提示用户选中满足纹理条件的像素点区域或该区域内的特征点,有助于提高提示效率,降低计算量,提升计算准确度。
具体地,可控制交互装置输出上述第三提示信息,交互装置例如可为触摸显示屏、智能语音交互装置。此外,若图像的处理装置100设置在可移动平台的控制终端上,也可通过例如显示、语音播报等方式直接输出上述第三提示信息,若图像的处理装置100设置在可移动平台上,也可通过例如点亮警示灯等方式直接输出上述第三提示信息。
可以理解的是,处理器104可仅实现针对静止物体的检测,也可仅实现针对纹理条件的检测,还可既实现针对静止物体的检测,又实现针对纹理条件的检测。对于第三种情况,两种检测在执行时没有先后顺序要求,相应地,第二提示信息与第三提示信息可以相同,也可以不同,在此不做限定。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:根据像素点区域在目标图像帧中的位置确定像素点区域中的特征点;获取像素点区域中的特征点在目标图像帧中的位置;根据特征点在目标图像帧中的位置确定像素点区域的几何中心像素点的对应的空间点在空间中的位置信息。
在该实施例中,具体限定了选中像素点区域时的一个计算兴趣点的方案。如前所述,在计算兴趣点时,首先提取选中的像素点区域内的特征点,可减少计算量、提高计算精度。再者,兴趣点具体是像素点区域的几何中心像素点,其在目标图像帧中的位置是已知的,但要计算其对应的空间点在空间中的位置信息,还需要知道兴趣点在其他图像帧中的位置。然而兴趣点大概率并非特征点,此时可利用像素点区域内的特征点来拟合估算几何中心像素点。通过获取提取的特征点在目标图像帧中的位置,可得到提取的特征点与兴趣点的位置关系,据此进行拟合估算,有助于提高计算准确度。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:根据特征点在目标图像帧中的位置确定像素点区域的几何中心像素点的对应的空间点在目标视频至少一帧参考图像帧的光流向量;根据光流向量确定几何中心像素点的对应的空间点在至少一帧参考图像帧中的位置;根据在至少一帧参考图像帧中的位置和特征点在目标图像帧中的位置确定几何中心像素点的对应的空间点在空间中的位置信息。
在该实施例中,拟合计算时,具体采用了求取光流向量的方案。通过在目标视频的至少一帧参考图像帧进行特征点跟踪,例如可采用KLT特征跟踪算法计算特征点的光流向量,再结合特征点与兴趣点的位置关系,即可拟合计算出兴趣点在至少一帧参考图像帧的光流向量。具体地,可计算 特征点的光流向量的加权平均数,作为兴趣点的光流向量。得到兴趣点在至少一帧参考图像帧的光流向量后,结合兴趣点在目标图像帧中的位置,即可得到兴趣点在至少一帧参考图像帧的位置,此时可将兴趣点也作为一个特征点,和其他特征点一起,进行坐标转换方程组的建立和求解,进而完成兴趣点计算。具体地,可采用BA光束平差算法计算。
可以理解的是,对于选中像素点的情况,可以将上述像素点区域的几何中心像素点替换为选中的像素点,将上述像素点区域中的特征点替换为选中的像素点附近一定范围内的特征点,同样可以完成计算。
在一些实施例中,处理器104还被配置为执行计算机程序以实现:获取目标图像帧中的目标特征点对应的空间点在空间中的位置;根据目标特征点对应的空间点在空间中的位置拟合目标平面;根据像素点在目标图像帧中的位置和拟合目标平面确定展示对象在空间中的位置信息。
在该实施例中,具体限定了选中像素点时的一个计算兴趣点的方案。与前述实施例一样,像素点大概率并非特征点,因此需结合附近的特征点进行拟合估算。此处目标特征点例如可为计算拍摄装置的位姿信息时分析出的可靠特征点,以确保兴趣点计算的准确度。当目标特征点为前述可靠特征点时,在计算拍摄装置的位姿信息时就已经得到了目标特征点对应的空间点在空间中的位置。当目标特征点不为前述可靠特征点时,则需计算其对应的空间点在空间中的位置,计算方法仍为求解转换方程组,在此不再赘述。由于目标特征点在选中的像素点附近,因而可认为像素点对应的空间点也处在拟合目标平面内。过像素点和拍摄装置的光心做一条连线,该连线与拟合目标平面的交点是像素点与拟合目标平面的焦点,可认为该焦点就是像素点对应的空间点,由此可完成兴趣点计算,进而得到展示对象在空间中的位置信息。
在一些实施例中,目标特征点与像素点之间的像素距离小于或等于预设的像素距离阈值,即目标特征点位于像素点附近,以保证拟合计算的准确度。
在一些实施例中,目标视频是可移动平台对空间中的目标对象进行跟随时由拍摄装置拍摄获取的,处理器104还被配置为执行计算机程序以实 现:获取拍摄装置的跟随对象在空间中的位置信息;根据跟随对象在空间中的位置信息确定展示对象在空间中的位置信息。
在该实施例中,由于拍摄装置进行跟随拍摄时本身就需选取跟随对象,此时默认将跟随对象作为兴趣点或兴趣区域,直接基于跟随对象在空间中的位置信息确定展示对象在空间中的位置信息,例如可直接将跟随对象的位置作为展示对象的位置,也可基于跟随对象的位置调整展示对象的位置,有助于大幅减少计算量,降低计算负荷。
在一些实施例中,目标视频是可移动平台对空间中的目标对象进行环绕运动时由拍摄装置拍摄获取的,处理器104还被配置为执行计算机程序以实现:获取拍摄装置的环绕对象在空间中的位置信息;根据环绕对象在空间中的位置信息确定展示对象在空间中的位置信息。
在该实施例中,由于拍摄装置进行环绕拍摄时本身就需选取环绕对象,此时默认将环绕对象作为兴趣点或兴趣区域,直接基于环绕对象在空间中的位置信息确定展示对象在空间中的位置信息,例如可直接将环绕对象的位置作为展示对象的位置,也可基于环绕对象的位置调整展示对象的位置,有助于大幅减少计算量,降低计算负荷。
在一些实施例中,可移动平台的控制终端包括图像的处理装置100,处理器104还被配置为执行计算机程序以实现:播放或存储或者运行社交应用程序分享目标合成视频,以供用户观看、保存或分享目标合成视频。
在另一些实施例中,可移动平台包括图像的处理装置100,处理器104还被配置为执行计算机程序以实现:将目标合成视频发送给可移动平台的控制终端以使控制终端播放或存储或者运行社交应用程序分享目标合成视频,以供用户观看、保存或分享目标合成视频。
可以理解的是,在合成目标合成视频前,可先逐帧播放完成投影的图像帧,以供用户查看插入展示对象的效果,若用户确认效果,再合成并保存目标合成视频,若用户对效果不满意,可继续编辑展示对象、选取兴趣点或基于选择的兴趣点调整插入展示对象的位置。
如图15所示,本申请第三方面的实施例提供了一种可移动平台200,包括如上述部分实施例的图像的处理装置100,因而具有该图像的处理装 置100相应的技术效果,在此不再赘述。可移动平台200例如可为无人飞行器,还可为其他带有多摄像头的载具,例如无人驾驶的汽车。
如图16所示,本申请第四方面的实施例提供了一种可移动平台的控制终端300,包括如上述部分实施例的图像的处理装置100,因而具有该图像的处理装置100相应的技术效果,在此不再赘述。可移动平台的控制终端300可为任何能与可移动平台交互的终端设备,例如可为遥控设备,也可为智能设备(经APP实现交互),如智能手机、智能平板、智能眼镜(如VR眼镜、AR眼镜),还可将可移动平台的SD卡插入电脑,此时可移动平台的控制终端300为电脑。
本申请第五方面的实施例提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现如上述任一实施例的图像的处理方法的步骤,因而具有该图像的处理方法的全部技术效果,在此不再赘述。
计算机可读存储介质可以包括能够存储或传输信息的任何介质。计算机可读存储介质的例子包括电子电路、半导体存储器设备、ROM、闪存、可擦除ROM(EROM)、软盘、CD-ROM、光盘、硬盘、光纤介质、射频(RF)链路,等等。代码段可以经由诸如因特网、内联网等的计算机网络被下载。
在本申请中,术语“多个”则指两个或两个以上,除非另有明确的限定。术语“安装”、“相连”、“连接”、“固定”等术语均应做广义理解,例如,“连接”可以是固定连接,也可以是可拆卸连接,或一体地连接;“相连”可以是直接相连,也可以通过中间媒介间接相连。对于本领域的普通技术人员而言,可以根据具体情况理解上述术语在本申请中的具体含义。
在本说明书的描述中,术语“一个实施例”、“一些实施例”、“具体实施例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或特点包含于本申请的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不一定指的是相同的实施例或实例。而且,描述的具体特征、结构、材料或特点可以在任何的一个或多个实施例或示例中以合 适的方式结合。
以上所述仅为本申请的优选实施例而已,并不用于限制本申请,对于本领域的技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (67)

  1. 一种图像的处理方法,应用于图像的处理装置,其特征在于,包括:
    获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频;
    获取所述拍摄装置在采集所述目标视频中每一图像帧时的位姿信息;
    获取用户编辑的展示对象;
    确定所述展示对象在所述空间中的位置信息;
    根据所述展示对象在所述空间中的位置信息和所述每一图像帧对应的位姿信息将所述展示对象投影到所述每一图像帧上以获取目标合成视频。
  2. 根据权利要求1所述的方法,其特征在于,所述获取所述拍摄装置在采集所述目标视频中每一图像帧时的位姿信息,包括:
    获取所述每一图像帧中的特征点在所述图像帧中的位置信息;
    根据所述特征点在所述图像帧中的位置信息确定所述拍摄装置在采集所述每一图像帧时的位姿信息。
  3. 根据权利要求2所述的方法,其特征在于,所述方法还包括:获取所述拍摄装置在采集所述每一图像帧时的初始位姿信息,其中,所述初始位姿信息是可移动平台配置的传感器采集得到的;
    所述根据所述特征点在所述图像帧中的位置信息确定所述拍摄装置在采集所述每一图像帧时的位姿信息,包括:
    根据所述特征点在所述图像帧中的位置信息和所述每一图像帧对应的初始位姿信息确定所述拍摄装置在采集所述每一图像帧时的位姿信息。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述方法还包括:
    获取所述拍摄装置在采集所述每一图像帧时的初始位姿信息,其中,所述初始位姿信息是可移动平台配置的传感器采集得到的;
    根据所述展示对象在所述空间中的位置信息和所述每一图像帧对应的初始位姿信息将所述展示对象投影到所述每一图像帧上以获取预览合成视频。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述根据所述展示对象在所述空间中的位置信息和所述每一图像帧对应的位姿信息将所述展示对象投影到所述每一图像帧上以获取目标合成视频,包括:
    根据所述展示对象在所述空间中的位置信息和所述每一图像帧对应的位姿信息确定所述展示对象在所述每一图像帧中的投影位置和投影姿态;
    根据所述展示对象在所述每一图像帧中的投影位置和投影姿态将所述展示对象投影到所述每一图像帧中以获取目标合成视频。
  6. 根据权利要求5所述的方法,其特征在于,所述方法还包括:获取用户编辑的所述展示对象的位置调整信息和/或姿态调整信息;
    所述根据所述展示对象在所述空间中的位置信息和所述每一图像帧对应的位姿信息确定所述展示对象在所述每一图像帧中的投影位置和投影姿态,包括:
    根据所述展示对象在所述空间中的位置信息、所述每一图像帧对应的位姿信息和所述展示对象的所述位置调整信息和/或姿态调整信息确定所述展示对象在所述每一图像帧中的投影位置和投影姿态。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,所述获取用户编辑的展示对象,包括:
    检测用户的展示对象编辑操作,根据所述检测到的所述编辑操作确定所述用户编辑的展示对象。
  8. 根据权利要求7所述的方法,其特征在于,所述检测用户的展示对象编辑操作,包括:
    控制交互装置显示展示对象编辑界面;
    检测用户对显示所述展示对象编辑界面的交互装置的展示对象编辑操作。
  9. 根据权利要求1-8任一项所述的方法,其特征在于,所述方法还包括:获取所述可移动平台在空间中移动时所述拍摄装置采集的初始视频;
    所述获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频,包括:
    检测用户的视频选取操作,根据所述检测到的所述视频选取操作从所 述初始视频中确定所述目标视频。
  10. 根据权利要求9所述的方法,其特征在于,所述检测用户的视频选取操作,包括:
    控制交互装置显示视频选取界面;
    检测用户对显示所述视频选取界面的交互装置的视频选取操作。
  11. 根据权利要求1-10任一项所述的方法,其特征在于,所述确定所述展示对象在所述空间中的位置信息,包括:
    获取用户在所述目标视频中的目标图像帧中选中的像素点在所述目标图像帧中的位置或者在所述目标图像帧中选中的像素点区域在所述目标图像帧中的位置;
    根据所述像素点或者像素点区域在所述目标图像帧中的位置确定所述展示对象在所述空间中的位置信息。
  12. 根据权利要求11所述的方法,其特征在于,所述方法还包括:从所述目标视频中确定目标子视频;
    所述获取用户在所述目标视频中的目标图像帧中选中的像素点在所述目标图像帧中的位置或者在所述目标图像帧中选中的像素点区域在所述目标图像帧中的位置,包括:
    获取用户在所述目标子视频中的目标图像帧中选中的像素点在所述目标图像帧中的位置或者在所述目标图像帧中选中的像素点区域在所述目标图像帧中的位置。
  13. 根据权利要求12所述的方法,其特征在于,所述方法还包括:响应于用户在所述目标视频中的目标子视频之外的图像帧中选中像素点或者像素点区域的操作,输出选中无效的提示信息。
  14. 根据权利要求12或13所述的方法,其特征在于,所述方法还包括:输出第一提示信息,其中,所述第一提示信息用于指示用户在所述目标子视频中的目标图像帧中选中像素点或者像素点区域。
  15. 根据权利要求12-14任一项所述的方法,其特征在于,所述目标子视频包括所述目标视频中拍摄装置的运动状态满足预设的运动条件时拍摄装置采集到的视频。
  16. 根据权利要求12-14任一项所述的方法,其特征在于,所述从所述目标视频中确定目标子视频,包括:
    从所述目标视频中确定多个连续图像帧,其中,所述多个连续图像帧相邻的图像帧之间特征点的平均移动量之和大于或等于预设的距离阈值,所述多个连续图像帧的视差大于或等于预设的视差阈值;
    将所述多个连续图像帧确定为目标子视频。
  17. 根据权利要求16所述的方法,其特征在于,所述多个连续图像帧的数量大于或等于预设的图像数量阈值。
  18. 根据权利要求11-17任一项所述的方法,其特征在于,所述确定所述展示对象在所述空间中的位置信息,还包括:
    确定所述像素点或者所述像素点区域指示的空间中物体是否为静止物体;
    所述根据所述像素点或者像素点区域在所述目标图像帧中的位置确定所述用户编辑的展示对象在所述空间中的位置信息,包括:
    当所述物体为静止物体时,根据所述像素点或者像素点区域在所述目标图像帧中的位置确定所述用户编辑的展示对象在所述空间中的位置信息。
  19. 根据权利要求11-18任一项所述的方法,其特征在于,所述确定所述展示对象在所述空间中的位置信息,还包括:
    确定所述像素点或者所述像素点区域指示的空间中物体是否为静止物体;
    当所述物体不为静止运动时,输出第二提示信息,其中,所述第二提示信息用于提示用户所述像素点或者所述像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
  20. 根据权利要求11-19任一项所述的方法,其特征在于,所述确定所述展示对象在所述空间中的位置信息,还包括:
    确定所述像素点区域是否满足预设的纹理条件;
    所述根据所述像素点区域在所述目标图像帧中的位置确定所述用户编辑的展示对象在所述空间中的位置信息,包括:
    当满足预设的纹理条件时,根据所述像素点或者像素点区域在所述目标图像帧中的位置确定所述用户编辑的展示对象在所述空间中的位置信息。
  21. 根据权利要求11-20任一项所述的方法,其特征在于,所述确定所述展示对象在所述空间中的位置信息,还包括:
    确定所述像素点区域是否满足预设的纹理条件;
    当不满足预设的纹理条件时,输出第三提示信息,其中,所述第三提示信息用于提示用户所述像素点或者所述像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
  22. 根据权利要求11-19任一项所述的方法,其特征在于,所述根据所述像素点区域在所述目标图像帧中的位置确定所述展示对象在所述空间中的位置信息,包括:
    根据所述像素点区域在所述目标图像帧中的位置确定所述像素点区域中的特征点;
    获取所述像素点区域中的特征点在所述目标图像帧中的位置;
    根据所述特征点在所述目标图像帧中的位置确定所述像素点区域的几何中心像素点的对应的空间点在所述空间中的位置信息。
  23. 根据权利要求22所述的方法,其特征在于,所述根据所述特征点在所述目标图像帧中的位置确定所述像素点区域的几何中心像素点的对应的空间点在所述空间中的位置信息,包括:
    根据所述特征点在所述目标图像帧中的位置确定所述像素点区域的几何中心像素点的对应的空间点在所述目标视频至少一帧参考图像帧的光流向量;
    根据所述光流向量确定所述几何中心像素点的对应的空间点在所述至少一帧参考图像帧中的位置;
    根据所述在所述至少一帧参考图像帧中的位置和所述特征点在所述目标图像帧中的位置确定所述几何中心像素点的对应的空间点在所述空间中的位置信息。
  24. 根据权利要求11-19任一项所述的方法,其特征在于,所述根据 所述像素点在所述目标图像帧中的位置确定所述展示对象在所述空间中的位置信息,包括:
    获取所述目标图像帧中的目标特征点对应的空间点在所述空间中的位置;
    根据所述目标特征点对应的空间点在所述空间中的位置拟合目标平面;
    根据所述像素点在所述目标图像帧中的位置和所述拟合目标平面确定所述展示对象在所述空间中的位置信息。
  25. 根据权利要求24所述的方法,其特征在于,所述目标特征点与所述像素点之间的像素距离小于或等于预设的像素距离阈值。
  26. 根据权利要求1-25任一项所述的方法,其特征在于,所述目标视频是可移动平台对所述空间中的目标对象进行跟随时由所述拍摄装置拍摄获取的,所述确定所述展示对象在所述空间中的位置信息,包括:
    获取所述拍摄装置的跟随对象在所述空间中的位置信息;
    根据所述跟随对象在所述空间中的位置信息确定所述展示对象在所述空间中的位置信息。
  27. 根据权利要求1-26任一项所述的方法,其特征在于,所述目标视频是可移动平台对所述空间中的目标对象进行环绕运动时由所述拍摄装置拍摄获取的,所述确定所述展示对象在所述空间中的位置信息,包括:
    获取所述拍摄装置的环绕对象在所述空间中的位置信息;
    根据所述环绕对象在所述空间中的位置信息确定所述展示对象在所述空间中的位置信息。
  28. 根据权利要求1-27任一项所述的方法,其特征在于,所述展示对象包括数字、字母、符号、文字和物体标识中的至少一种。
  29. 根据权利要求1-28任一项所述的方法,其特征在于,所述展示对象为三维模型。
  30. 根据权利要求1-29任一项所述的方法,其特征在于,所述方法还包括:播放或存储或者运行社交应用程序分享所述目标合成视频。
  31. 根据权利要求1-29任一项所述的方法,其特征在于,所述可移动 平台包括所述图像的处理装置,所述方法还包括:将所述目标合成视频发送给所述可移动平台的控制终端以使所述控制终端播放或存储或者运行社交应用程序分享所述目标合成视频。
  32. 根据权利要求1-31任一项所述的方法,其特征在于,所述可移动平台包括无人飞行器。
  33. 一种图像的处理装置,其特征在于,包括:
    存储器,被配置为存储计算机程序;
    处理器,被配置为执行所述计算机程序以实现:
    获取可移动平台在空间中移动时可移动平台的拍摄装置采集到的目标视频;
    获取所述拍摄装置在采集所述目标视频中每一图像帧时的位姿信息;
    获取用户编辑的展示对象;
    确定所述展示对象在所述空间中的位置信息;
    根据所述展示对象在所述空间中的位置信息和所述每一图像帧对应的位姿信息将所述展示对象投影到所述每一图像帧上以获取目标合成视频。
  34. 根据权利要求33所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    获取所述每一图像帧中的特征点在所述图像帧中的位置信息;
    根据所述特征点在所述图像帧中的位置信息确定所述拍摄装置在采集所述每一图像帧时的位姿信息。
  35. 根据权利要求34所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    获取所述拍摄装置在采集所述每一图像帧时的初始位姿信息,其中,所述初始位姿信息是可移动平台配置的传感器采集得到的;
    根据所述特征点在所述图像帧中的位置信息和所述每一图像帧对应的初始位姿信息确定所述拍摄装置在采集所述每一图像帧时的位姿信息。
  36. 根据权利要求33-35任一项所述的装置,其特征在于,所述处理 器还被配置为执行所述计算机程序以实现:
    获取所述拍摄装置在采集所述每一图像帧时的初始位姿信息,其中,所述初始位姿信息是可移动平台配置的传感器采集得到的;
    根据所述展示对象在所述空间中的位置信息和所述每一图像帧对应的初始位姿信息将所述展示对象投影到所述每一图像帧上以获取预览合成视频。
  37. 根据权利要求33-36任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    根据所述展示对象在所述空间中的位置信息和所述每一图像帧对应的位姿信息确定所述展示对象在所述每一图像帧中的投影位置和投影姿态;
    根据所述展示对象在所述每一图像帧中的投影位置和投影姿态将所述展示对象投影到所述每一图像帧中以获取目标合成视频。
  38. 根据权利要求37所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    获取用户编辑的所述展示对象的位置调整信息和/或姿态调整信息;
    根据所述展示对象在所述空间中的位置信息、所述每一图像帧对应的位姿信息和所述展示对象的所述位置调整信息和/或姿态调整信息确定所述展示对象在所述每一图像帧中的投影位置和投影姿态。
  39. 根据权利要求33-38任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    检测用户的展示对象编辑操作,根据所述检测到的所述编辑操作确定所述用户编辑的展示对象。
  40. 根据权利要求39所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    控制交互装置显示展示对象编辑界面;
    检测用户对显示所述展示对象编辑界面的交互装置的展示对象编辑操作。
  41. 根据权利要求33-40任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    获取所述可移动平台在空间中移动时所述拍摄装置采集的初始视频;
    检测用户的视频选取操作,根据所述检测到的所述视频选取操作从所述初始视频中确定所述目标视频。
  42. 根据权利要求41所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    控制交互装置显示视频选取界面;
    检测用户对显示所述视频选取界面的交互装置的视频选取操作。
  43. 根据权利要求33-42任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    获取用户在所述目标视频中的目标图像帧中选中的像素点在所述目标图像帧中的位置或者在所述目标图像帧中选中的像素点区域在所述目标图像帧中的位置;
    根据所述像素点或者像素点区域在所述目标图像帧中的位置确定所述展示对象在所述空间中的位置信息。
  44. 根据权利要求43所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    从所述目标视频中确定目标子视频;
    获取用户在所述目标子视频中的目标图像帧中选中的像素点在所述目标图像帧中的位置或者在所述目标图像帧中选中的像素点区域在所述目标图像帧中的位置。
  45. 根据权利要求44所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:响应于用户在所述目标视频中的目标子视频之外的图像帧中选中像素点或者像素点区域的操作,输出选中无效的提示信息。
  46. 根据权利要求44或45所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:输出第一提示信息,其中,所述第一提示信息用于指示用户在所述目标子视频中的目标图像帧中选中像素点或者像素点区域。
  47. 根据权利要求44-46任一项所述的装置,其特征在于,所述目标 子视频包括所述目标视频中拍摄装置的运动状态满足预设的运动条件时拍摄装置采集到的视频。
  48. 根据权利要求44-46任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    从所述目标视频中确定多个连续图像帧,其中,所述多个连续图像帧相邻的图像帧之间特征点的平均移动量之和大于或等于预设的距离阈值,所述多个连续图像帧的视差大于或等于预设的视差阈值;
    将所述多个连续图像帧确定为目标子视频。
  49. 根据权利要求48所述的装置,其特征在于,所述多个连续图像帧的数量大于或等于预设的图像数量阈值。
  50. 根据权利要求43-49任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    确定所述像素点或者所述像素点区域指示的空间中物体是否为静止物体;
    当所述物体为静止物体时,根据所述像素点或者像素点区域在所述目标图像帧中的位置确定所述用户编辑的展示对象在所述空间中的位置信息。
  51. 根据权利要求43-50任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    确定所述像素点或者所述像素点区域指示的空间中物体是否为静止物体;
    当所述物体不为静止运动时,输出第二提示信息,其中,所述第二提示信息用于提示用户所述像素点或者所述像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
  52. 根据权利要求43-51任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    确定所述像素点区域是否满足预设的纹理条件;
    当满足预设的纹理条件时,根据所述像素点或者像素点区域在所述目标图像帧中的位置确定所述用户编辑的展示对象在所述空间中的位置信 息。
  53. 根据权利要求43-52任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    确定所述像素点区域是否满足预设的纹理条件;
    当不满足预设的纹理条件时,输出第三提示信息,其中,所述第三提示信息用于提示用户所述像素点或者所述像素点区域不可选,或者用于提示用户选中其他像素点或者像素点区域。
  54. 根据权利要求43-51任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    根据所述像素点区域在所述目标图像帧中的位置确定所述像素点区域中的特征点;
    获取所述像素点区域中的特征点在所述目标图像帧中的位置;
    根据所述特征点在所述目标图像帧中的位置确定所述像素点区域的几何中心像素点的对应的空间点在所述空间中的位置信息。
  55. 根据权利要求54所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    根据所述特征点在所述目标图像帧中的位置确定所述像素点区域的几何中心像素点的对应的空间点在所述目标视频至少一帧参考图像帧的光流向量;
    根据所述光流向量确定所述几何中心像素点的对应的空间点在所述至少一帧参考图像帧中的位置;
    根据所述在所述至少一帧参考图像帧中的位置和所述特征点在所述目标图像帧中的位置确定所述几何中心像素点的对应的空间点在所述空间中的位置信息。
  56. 根据权利要求43-52任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    获取所述目标图像帧中的目标特征点对应的空间点在所述空间中的位置;
    根据所述目标特征点对应的空间点在所述空间中的位置拟合目标平 面;
    根据所述像素点在所述目标图像帧中的位置和所述拟合目标平面确定所述展示对象在所述空间中的位置信息。
  57. 根据权利要求56所述的装置,其特征在于,所述目标特征点与所述像素点之间的像素距离小于或等于预设的像素距离阈值。
  58. 根据权利要求33-57任一项所述的装置,其特征在于,所述目标视频是可移动平台对所述空间中的目标对象进行跟随时由所述拍摄装置拍摄获取的,所述处理器还被配置为执行所述计算机程序以实现:
    获取所述拍摄装置的跟随对象在所述空间中的位置信息;
    根据所述跟随对象在所述空间中的位置信息确定所述展示对象在所述空间中的位置信息。
  59. 根据权利要求33-58任一项所述的装置,其特征在于,所述目标视频是可移动平台对所述空间中的目标对象进行环绕运动时由所述拍摄装置拍摄获取的,所述处理器还被配置为执行所述计算机程序以实现:
    获取所述拍摄装置的环绕对象在所述空间中的位置信息;
    根据所述环绕对象在所述空间中的位置信息确定所述展示对象在所述空间中的位置信息。
  60. 根据权利要求33-59任一项所述的装置,其特征在于,所述展示对象包括数字、字母、符号、文字和物体标识中的至少一种。
  61. 根据权利要求33-60任一项所述的装置,其特征在于,所述展示对象为三维模型。
  62. 根据权利要求33-61任一项所述的装置,其特征在于,所述处理器还被配置为执行所述计算机程序以实现:
    播放或存储或者运行社交应用程序分享所述目标合成视频。
  63. 根据权利要求33-61任一项所述的装置,其特征在于,所述可移动平台包括所述图像的处理装置,所述处理器还被配置为执行所述计算机程序以实现:
    将所述目标合成视频发送给所述可移动平台的控制终端以使所述控制终端播放或存储或者运行社交应用程序分享所述目标合成视频。
  64. 根据权利要求33-63任一项所述的装置,其特征在于,所述可移动平台包括无人飞行器。
  65. 一种可移动平台,其特征在于,包括如权利要求33-38任一项或权利要求43-61任一项或权利要求63或权利要求64所述的图像的处理装置。
  66. 一种可移动平台的控制终端,其特征在于,包括如权利要求33-62任一项或权利要求64所述的图像的处理装置。
  67. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1-32任一项所述的图像的处理方法的步骤。
PCT/CN2020/087404 2020-04-28 2020-04-28 图像的处理方法及装置、可移动平台及其控制终端、计算机可读存储介质 WO2021217398A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/087404 WO2021217398A1 (zh) 2020-04-28 2020-04-28 图像的处理方法及装置、可移动平台及其控制终端、计算机可读存储介质
CN202080030128.6A CN113853577A (zh) 2020-04-28 2020-04-28 图像的处理方法及装置、可移动平台及其控制终端、计算机可读存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/087404 WO2021217398A1 (zh) 2020-04-28 2020-04-28 图像的处理方法及装置、可移动平台及其控制终端、计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2021217398A1 true WO2021217398A1 (zh) 2021-11-04

Family

ID=78373264

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087404 WO2021217398A1 (zh) 2020-04-28 2020-04-28 图像的处理方法及装置、可移动平台及其控制终端、计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN113853577A (zh)
WO (1) WO2021217398A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363627A (zh) * 2021-12-20 2022-04-15 北京百度网讯科技有限公司 图像处理方法、装置及电子设备
CN117151140A (zh) * 2023-10-27 2023-12-01 安徽容知日新科技股份有限公司 目标物标识码的识别方法、装置及计算机可读存储介质
WO2024016924A1 (zh) * 2022-07-20 2024-01-25 北京字跳网络技术有限公司 视频处理方法、装置、电子设备及存储介质
CN117456428A (zh) * 2023-12-22 2024-01-26 杭州臻善信息技术有限公司 基于视频图像特征分析的垃圾投放行为检测方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549974B (zh) * 2022-01-26 2022-09-06 西宁城市职业技术学院 基于用户的多智能设备的交互方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105118061A (zh) * 2015-08-19 2015-12-02 刘朔 Method for registering a video stream to a scene in a three-dimensional geographic information space
CN107665506B (zh) * 2016-07-29 2021-06-01 成都理想境界科技有限公司 Method and system for realizing augmented reality
CN110047142A (zh) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 UAV three-dimensional map construction method and apparatus, computer device, and storage medium
CN110648398B (zh) * 2019-08-07 2020-09-11 武汉九州位讯科技有限公司 Method and system for real-time orthophoto generation from UAV aerial photography data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971400A (zh) * 2013-02-06 2014-08-06 阿里巴巴集团控股有限公司 Method and system for three-dimensional interaction based on identification codes
US20170242432A1 (en) * 2016-02-24 2017-08-24 Dronomy Ltd. Image processing for gesture-based control of an unmanned aerial vehicle
CN105979170A (zh) * 2016-06-24 2016-09-28 谭圆圆 Video production method and video production device
CN205726069U (zh) * 2016-06-24 2016-11-23 谭圆圆 Unmanned aerial vehicle control terminal and unmanned aerial vehicle
CN107391060A (zh) * 2017-04-21 2017-11-24 阿里巴巴集团控股有限公司 Image display method, apparatus, system, device, and readable medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363627A (zh) * 2021-12-20 2022-04-15 北京百度网讯科技有限公司 Image processing method and apparatus, and electronic device
CN114363627B (zh) * 2021-12-20 2024-01-19 北京百度网讯科技有限公司 Image processing method and apparatus, and electronic device
WO2024016924A1 (zh) * 2022-07-20 2024-01-25 北京字跳网络技术有限公司 Video processing method and apparatus, electronic device, and storage medium
CN117151140A (zh) * 2023-10-27 2023-12-01 安徽容知日新科技股份有限公司 Method and device for recognizing a target object identification code, and computer-readable storage medium
CN117151140B (zh) * 2023-10-27 2024-02-06 安徽容知日新科技股份有限公司 Method and device for recognizing a target object identification code, and computer-readable storage medium
CN117456428A (zh) * 2023-12-22 2024-01-26 杭州臻善信息技术有限公司 Garbage-dumping behavior detection method based on video image feature analysis
CN117456428B (zh) * 2023-12-22 2024-03-29 杭州臻善信息技术有限公司 Garbage-dumping behavior detection method based on video image feature analysis

Also Published As

Publication number Publication date
CN113853577A (zh) 2021-12-28

Similar Documents

Publication Publication Date Title
WO2021217398A1 (zh) Image processing method and device, movable platform and control terminal thereof, and computer-readable storage medium
US10430995B2 (en) System and method for infinite synthetic image generation from multi-directional structured image array
US11632533B2 (en) System and method for generating combined embedded multi-view interactive digital media representations
US11967162B2 (en) Method and apparatus for 3-D auto tagging
US11095837B2 (en) Three-dimensional stabilized 360-degree composite image capture
US10176592B2 (en) Multi-directional structured image array capture on a 2D graph
US10949978B2 (en) Automatic background replacement for single-image and multi-view captures
US11748907B2 (en) Object pose estimation in visual data
WO2020014909A1 (zh) Photographing method and apparatus, and unmanned aerial vehicle
EP3668093B1 (en) Method, system and apparatus for capture of image data for free viewpoint video
US11783443B2 (en) Extraction of standardized images from a single view or multi-view capture
US20210312702A1 (en) Damage detection from multi-view visual data
CN105282421B (zh) Dehazed image acquisition method, device, and terminal
US20200258309A1 (en) Live in-camera overlays
CN111141264B (zh) UAV-based urban three-dimensional surveying and mapping method and system
US20230410332A1 (en) Structuring visual data
WO2019104569A1 (zh) Focusing method, device, and readable storage medium
EP3629570A2 (en) Image capturing apparatus and image recording method
US11252398B2 (en) Creating cinematic video from multi-view capture data
WO2022047701A1 (zh) Image processing method and apparatus
CN116097308A (zh) Automatic photography composition recommendation
US20220254008A1 (en) Multi-view interactive digital media representation capture
US20230217001A1 (en) System and method for generating combined embedded multi-view interactive digital media representations
US20220254007A1 (en) Multi-view interactive digital media representation viewer
JP4392291B2 (ja) Image processing apparatus, image processing method, and image processing program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20933827

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20933827

Country of ref document: EP

Kind code of ref document: A1