WO2022027447A1 - Image processing method, camera and mobile terminal - Google Patents

Image processing method, camera and mobile terminal

Info

Publication number
WO2022027447A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
video
moving subject
frame
camera
Prior art date
Application number
PCT/CN2020/107433
Other languages
English (en)
French (fr)
Inventor
李广
朱传杰
李志强
李静
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/107433 (WO2022027447A1)
Priority to CN202080035108.8A (CN113841112A)
Publication of WO2022027447A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Definitions

  • the present application relates to the technical field of image processing, and in particular, to an image processing method, a camera, a mobile terminal, and a computer-readable storage medium.
  • Embodiments of the present application provide an image processing method, a camera, a mobile terminal, and a computer-readable storage medium, which can realize a separate video effect.
  • a first aspect of the embodiments of the present application provides an image processing method, including:
  • acquiring an avatar effect instruction;
  • processing, according to the avatar effect instruction, an original video in which a moving subject is captured, to obtain a target video, where the target video includes the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
  • a second aspect of the embodiments of the present application provides a camera, including: a processor and a memory for storing a computer program;
  • the processor implements the following steps when executing the computer program:
  • acquiring an avatar effect instruction;
  • processing, according to the avatar effect instruction, an original video in which a moving subject is captured, to obtain a target video, where the target video includes the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
  • a third aspect of the embodiments of the present application provides a mobile terminal, including: a processor and a memory for storing a computer program;
  • the processor implements the following steps when executing the computer program:
  • acquiring an avatar effect instruction;
  • processing, according to the avatar effect instruction, an original video in which a moving subject is captured, to obtain a target video, where the target video includes the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, any one of the image processing methods of the first aspect above is implemented.
  • The image processing method provided by the embodiments of the present application can, after acquiring the avatar effect instruction, process the original video in which a moving subject is captured, so that the moving subject in the video has at least one dynamic avatar, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
  • the embodiment of the present application provides a video avatar effect, which improves the interest of the user in making a video, and enables the user to make a creative video.
  • FIG. 1A is the Nth frame in the original video provided by the embodiment of the present application.
  • FIG. 1B is an effect diagram of the Nth frame shown in FIG. 1A after processing.
  • FIG. 2 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 is a structural diagram of a camera provided by an embodiment of the present application.
  • FIG. 4 is a structural diagram of a mobile terminal provided by an embodiment of the present application.
  • The embodiments of the present application provide an image processing method that can add an avatar effect to a moving subject in a video; that is, the moving subject can have at least one dynamic avatar corresponding to it, and the dynamic avatar can repeat the movement of the moving subject with a specified time delay.
  • Reference may be made to FIG. 1A and FIG. 1B: FIG. 1A is the Nth frame in the original video provided by an embodiment of the present application, and FIG. 1B is an effect diagram of that Nth frame after processing.
  • If the moving subject in the Nth frame of the original video is X, then in the Nth frame of the target video (that is, the video obtained after processing the original video) the moving subject X may have at least one avatar. As shown in FIG. 1B, there are two avatars X' and X"; the action performed by an avatar in the Nth frame is an action that the moving subject X has already performed. For example, in one example, the action performed by X' can be the action of the moving subject X 5 frames earlier, and the action performed by X" can be the action of the moving subject X 10 frames earlier.
  • It should be noted that FIG. 1A and FIG. 1B only show the effect on a single video frame before and after processing. When multiple video frames are played in succession, each avatar is not static but dynamically repeats the actions of the moving subject with a certain time delay; that is, each avatar can be a dynamic avatar.
  • FIG. 1A and FIG. 1B are only examples provided for the convenience of understanding.
  • In practical applications, the parameters of the avatar effect, such as the number of avatars, the time delay by which each avatar lags, and the transparency of the avatars, can be set by the user or left at the system's default values; the implementation of this part will be explained later.
  • the image processing method provided by the embodiments of the present application can realize the effect of avatar, improve the interest of the user in making videos, and enable the user to make creative videos.
  • FIG. 2 is a flowchart of an image processing method provided by an embodiment of the present application.
  • the method can be applied to cameras, mobile terminals, image processing equipment and other electronic equipment, and the method includes:
  • The avatar effect instruction can be triggered by the user.
  • In one example, the avatar effect instruction may be triggered by a button in the interactive interface; after the user taps the button, the avatar effect processing of the original video can be triggered.
  • In one example, the avatar effect instruction can also be triggered by a physical button.
  • Of course, the avatar effect instruction can also be triggered in other ways, such as by voice, touch gestures, and so on.
  • The avatar effect instruction may include one or more of the following pieces of information: the number of avatars, the avatar frame interval, and the avatar transparency.
  • The avatar frame interval can be the number of frames by which the actions of adjacent avatars differ. As mentioned earlier, this information can be set by the user, or the system's default parameters can be used. A hypothetical container for these parameters is sketched below.
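  • As an illustrative aside (not part of the patent text), the parameters carried by an avatar effect instruction could be grouped into a simple structure; the field names and default values below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AvatarEffectInstruction:
    """Hypothetical container for the avatar effect parameters described above."""
    num_avatars: int = 3       # number of avatars (分身个数)
    frame_interval: int = 3    # avatar frame interval, in frames (分身帧间隔)
    transparency: float = 0.5  # base avatar transparency (分身透明度), 0.0 = opaque

# Example: use user-supplied values, or fall back to the system defaults.
default_instruction = AvatarEffectInstruction()
user_instruction = AvatarEffectInstruction(num_avatars=2, frame_interval=5, transparency=0.75)
```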
  • After the avatar effect instruction is acquired, the original video in which the moving subject is captured can be processed.
  • In one implementation, processing the original video in which the moving subject is captured can include the following steps: obtaining a first video frame and a second video frame from the original video; mapping the first video frame to the space corresponding to the second video frame; and synthesizing a first target video frame according to the mapped first video frame and the second video frame.
  • The time corresponding to the first video frame is earlier than that of the second video frame. For example, if the second video frame is the ith frame, the frame number of the first video frame is less than i, such as i-3 or i-5.
  • the avatar effect can be achieved by fusing the moving subject in the first video frame into the second video frame, so that the moving subject in the second video frame has an avatar, and the avatar is the moving subject of the first video frame.
  • Considering that a photographer usually changes the shooting angle while filming a moving subject, the shooting angle corresponding to the first video frame and the shooting angle corresponding to the second video frame may differ. Therefore, when fusing the moving subject of the first video frame into the second video frame, in order to make the avatar effect more natural and realistic, the first video frame can first be mapped to the space corresponding to the second video frame, and the two can then be synthesized.
  • For example, suppose the moving subject in the original video is running, the moving subject in the first video frame is in mid-air, and the shooting angle of the first video frame corresponds to the front-left of the photographer, while the moving subject in the second video frame has just landed and the shooting angle of the second video frame corresponds to directly in front of the photographer. The first video frame can then be mapped, through a spatial transformation, to the shooting angle corresponding to directly in front, yielding the image of the moving subject in the first video frame as it would appear if shot from that angle (that is, the mapped first video frame). Since the shooting angles of the mapped first video frame and the second video frame now match, the avatar effect in the synthesized first target video frame is more natural and realistic.
  • the original video may be obtained by rotating the camera on the spot.
  • the so-called in situ means that the coordinates of the camera in the world coordinate system are roughly unchanged. For example, if the displacement of the camera in the world coordinate system is less than or equal to a preset threshold, the camera can be considered to be still in place.
  • When shooting, the camera can rotate arbitrarily in place, for example from left to right or from top to bottom, which is not limited in this application.
  • Because the original video is shot by rotating the camera in place, i.e. the camera's coordinates in the world coordinate system remain roughly unchanged, mapping the first video frame to the space corresponding to the second video frame involves only a two-dimensional spatial transformation: only the amount of rotation needs to be computed, and no three-dimensional modeling of the scene is required. This greatly reduces the computing resources needed for the avatar effect and speeds up the processing, so it can even be done in real time, which makes sharing videos much more convenient.
  • the original video may be captured in real time after obtaining the avatar effect instruction.
  • For example, a shooting mode with an avatar effect can be configured in the camera. The user can trigger this shooting mode by tapping or a similar operation, thereby issuing an avatar effect instruction, and the camera can enter the shooting mode after obtaining the instruction. Before shooting, the camera can prompt the user, by text, voice, or other means, to shoot in place.
  • The avatar effect processing can be performed by the camera on already-captured video frames while the original video is still being shot, or the camera can process the original video after the user has finished shooting it.
  • While the user is shooting the original video, the camera can also locate its position in the world coordinate system in real time; if it detects that its displacement exceeds a preset threshold, it can pause shooting and remind the user that the displacement is too large. A minimal sketch of such a check follows.
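  • The check below is only an illustration: the patent text merely requires comparing the camera's displacement in the world coordinate system with a preset threshold, and the threshold value and position source used here are hypothetical.

```python
import numpy as np

PRESET_THRESHOLD_M = 0.05  # hypothetical "in place" threshold, in metres

def camera_still_in_place(start_position, current_position,
                          threshold=PRESET_THRESHOLD_M):
    """Return True if the camera's displacement in the world coordinate system
    is less than or equal to the preset threshold (i.e. it is still 'in place')."""
    displacement = np.linalg.norm(np.asarray(current_position, dtype=float) -
                                  np.asarray(start_position, dtype=float))
    return displacement <= threshold

# Example: if this returns False while recording, shooting could be paused and
# the user reminded that the displacement is too large.
print(camera_still_in_place([0.0, 0.0, 0.0], [0.02, 0.0, 0.01]))  # True
```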
  • the original video may also be a segment selected by the user from the video material.
  • For example, the video shot by the user may include a segment of scenery and a segment of a person moving; the user can then cut out the segment of the person moving and add the avatar effect to that segment.
  • In one implementation, the camera can be mounted on a gimbal and configured with an algorithm for automatically following a target; when shooting a moving subject, the camera can then, under the control of the gimbal, automatically follow the moving subject while rotating in place.
  • the first video frame may be processed through a spatial transformation matrix.
  • the spatial transformation matrix can be determined in various ways.
  • the spatial transformation matrix can be a rotation matrix.
  • The rotation matrix can be calculated using the pose information of the camera, which can be obtained through the camera's inertial measurement unit (IMU). For example, the camera pose information corresponding to the first video frame and the camera pose information corresponding to the second video frame can be obtained, and the rotation matrix can be calculated from the difference between the two. A sketch of this calculation is given below.
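  • The following is a minimal sketch of how the relative rotation between the two frames could be derived from per-frame camera orientations (for example, orientations integrated from IMU readings). The quaternion representation and the SciPy API are assumptions chosen for illustration, not the patent's implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def relative_rotation(quat_first, quat_second):
    """Rotation taking the first frame's camera orientation to the second's.

    quat_first / quat_second: camera orientations as quaternions (x, y, z, w),
    e.g. obtained from the IMU at the capture times of the two frames.
    """
    r1 = R.from_quat(quat_first)
    r2 = R.from_quat(quat_second)
    # Difference of the two poses: R_rel = R2 * R1^-1 (a pure rotation, since
    # the camera is assumed to stay roughly in place while shooting).
    return (r2 * r1.inv()).as_matrix()

# Example usage with made-up orientations (10 deg and 25 deg yaw about the vertical axis).
R_rel = relative_rotation(
    R.from_euler("y", 10, degrees=True).as_quat(),
    R.from_euler("y", 25, degrees=True).as_quat(),
)
```

  • To actually warp pixels with this rotation it would further have to be combined with the camera intrinsics (for example as H = K·R_rel·K⁻¹), a detail the patent text leaves open.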
  • the spatial transformation matrix may also include a homography matrix.
  • The homography matrix can be calculated from the result of feature matching between the first video frame and the second video frame. Specifically, the feature matching may be performed on a designated area (designated content) in the video frames; in one example, the designated area may be, for example, the background area (scene area) other than the moving subject. By extracting feature points from the background area of the first video frame and from the background area of the second video frame, feature matching can be performed on the extracted feature points to obtain multiple matched feature pairs, and a homography matrix can be calculated from these feature pairs.
  • Further, considering that not all of the matched feature pairs are necessarily matched accurately, that is, some matched feature pairs may be unreliable or inaccurate, the multiple feature pairs can be screened to keep the correctly matched, credible feature pairs, and the homography matrix can then be calculated from the screened credible feature pairs. An illustrative sketch follows.
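  • A sketch of this feature-matching pipeline with OpenCV is shown below. ORB features and RANSAC-based screening are assumptions chosen for illustration; the patent does not prescribe a particular detector or screening method.

```python
import cv2
import numpy as np

def background_homography(frame_a, frame_b, bg_mask_a=None, bg_mask_b=None):
    """Estimate the homography mapping frame_a onto frame_b from background features.

    bg_mask_a / bg_mask_b: optional uint8 masks that are non-zero on the background
    (everything except the moving subject), so that feature points are only
    extracted from the designated background area.
    """
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(frame_a, bg_mask_a)
    kp_b, des_b = orb.detectAndCompute(frame_b, bg_mask_b)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects unreliable pairs and keeps the credible ones
    # (one common way to do the screening described above).
    H, inlier_mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 3.0)
    return H, inlier_mask
```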
  • the mapped first video frame may be synthesized with the second video frame to obtain the first target video frame.
  • the first target video frame may be a frame in the target video.
  • Considering that the spatial transformation applied to the first video frame is not absolutely accurate, that is, the calculated spatial relationship between the first video frame and the second video frame contains some error, directly synthesizing the entire mapped first video frame with the second video frame would make the synthesized first target video frame appear blurred, and the subject of the current frame would also become partially transparent. Therefore, in another implementation, the moving subject can be extracted from the mapped first video frame to obtain an avatar image, and the avatar image can then be synthesized with the second video frame.
  • In one implementation, subject segmentation can be performed on the first video frame to obtain the original mask corresponding to the moving subject; the original mask can be mapped, through the spatial transformation matrix, to the space corresponding to the second video frame to obtain a target mask; and the target mask can be used to process the mapped first video frame, for example by multiplying the target mask with the mapped first video frame, so that the moving subject is extracted from the mapped first video frame and the avatar image is obtained. A sketch of this step is given below.
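  • A minimal sketch of this mapping-and-extraction step is shown below, assuming a 3x3 homography as the spatial transformation matrix and a floating-point subject mask in [0, 1]; the function and parameter names are hypothetical.

```python
import cv2
import numpy as np

def extract_avatar(first_frame, first_mask, H, out_size):
    """Map the first video frame (and its subject mask) into the second frame's
    space and cut out the moving subject as the avatar image.

    first_mask: float32 mask in [0, 1], non-zero where the moving subject is.
    H: 3x3 spatial transformation matrix mapping frame 1 onto frame 2.
    out_size: (width, height) of the second video frame.
    """
    warped_frame = cv2.warpPerspective(first_frame, H, out_size)
    warped_mask = cv2.warpPerspective(first_mask, H, out_size)  # the "target mask"
    # Multiplying the target mask with the mapped frame keeps only the moving subject.
    avatar_image = warped_frame.astype(np.float32) * warped_mask[..., None]
    return avatar_image, warped_frame, warped_mask
```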
  • the portion of the target mask that overlaps with the moving subject in the second video frame may be further removed.
  • Specifically, subject segmentation can be performed on the second video frame to obtain the mask of the moving subject corresponding to the second video frame, and the part of the target mask that overlaps this mask can then be removed.
  • After the overlapping part has been removed from the target mask, the processed target mask can be used to process the mapped first video frame, so that in the finally synthesized first target video frame the moving subject does not overlap the avatar excessively.
  • the target mask can also be blurred.
  • Gaussian blur can be performed on the non-zero values in the target mask (that is, the area corresponding to the moving subject).
  • For example, the non-zero values of the target mask can be multiplied by 255 and then clipped to 255. Blurring the target mask makes the fusion of the extracted avatar image with the second video frame more natural, so that the avatar in the target video frame shows no obvious boundary or other image-processing artifacts and the avatar effect looks more realistic. A sketch of the overlap removal, blurring, and compositing steps is given below.
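  • The following is a minimal sketch of the overlap removal, blurring, and compositing described above, taking the warped first frame and the warped subject mask (the "target mask") from the previous sketch as inputs; the masks are assumed to be float values in [0, 1], and the kernel size and blending scheme are illustrative choices rather than values taken from the patent.

```python
import cv2
import numpy as np

def composite_avatar(second_frame, second_mask, warped_first_frame, target_mask,
                     blur_ksize=21):
    """Blend the moving subject of the (warped) first frame into the second frame.

    second_mask: float32 mask in [0, 1] of the moving subject in the second frame.
    target_mask: the warped subject mask of the first frame, same value range.
    """
    # Remove the part of the target mask that overlaps the current frame's subject,
    # so the avatar does not overlap the moving subject too much.
    mask = np.clip(target_mask - second_mask, 0.0, 1.0)
    # Soften the mask edges (Gaussian blur on the non-zero region) so the avatar
    # blends in without an obvious boundary.
    mask = cv2.GaussianBlur(mask, (blur_ksize, blur_ksize), 0)
    mask = np.clip(mask, 0.0, 1.0)[..., None]

    out = (second_frame.astype(np.float32) * (1.0 - mask)
           + warped_first_frame.astype(np.float32) * mask)
    return np.clip(out, 0, 255).astype(np.uint8)
```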
  • For realizing multiple avatars, in one implementation, FIR-style synthesis may be used.
  • In FIR-style synthesis there may be multiple first video frames; that is, "first video frame" may refer to any video frame whose corresponding time is earlier than that of the second video frame, and the first target video frame may be any frame of the target video in which avatars begin to appear.
  • FIR-style synthesis synthesizes every first video frame used for producing an avatar into the second video frame, so that the moving subject in the second video frame has multiple avatars.
  • For example, in one example, if the synthesized first target video frame is to show the moving subject with 3 avatars, the second video frame may be, for example, the 10th frame, and the first video frames may include, for example, the 1st, 4th, and 7th frames. To realize the 3 avatars of the moving subject, the 1st, 4th, and 7th frames can be synthesized into the 10th frame, so that the moving subject in the 10th frame has 3 avatars corresponding respectively to the moving subject in the 1st, 4th, and 7th frames.
  • It should be noted that since each avatar corresponds to one video frame of the original video, if K avatars are required, the frame number of the second video frame can be greater than K, so that there are at least K first video frames available for producing the avatars.
  • In the above example of synthesizing the 1st, 4th, and 7th frames into the 10th frame, the avatar frame interval is 3 frames.
  • The avatar frame interval can be used to represent the number of frames by which the actions of adjacent avatars differ.
  • For example, in the synthesized first target video frame, the avatar corresponding to the 7th frame lags the moving subject by 3 frames in action, the avatar corresponding to the 4th frame lags the avatar corresponding to the 7th frame by 3 frames, and the avatar corresponding to the 1st frame lags the avatar corresponding to the 4th frame by 3 frames.
  • The synthesized first target video frame corresponds to the frame number of the second video frame, that is, the first target video frame is the 10th frame of the target video.
  • For the 11th frame of the target video, if 3 avatars are still to be realized, the 2nd, 5th, and 8th frames of the original video can be synthesized into the 11th frame of the original video.
  • For the 13th frame of the target video, the 4th, 7th, and 10th frames of the original video can be synthesized into the 13th frame of the original video. Subsequent frames of the target video are synthesized in the same way, and details are not repeated here. The frame-index bookkeeping is sketched below.
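  • As an illustrative aside, the FIR-style bookkeeping above (which original frames feed a given target frame) can be expressed as simple index arithmetic; the function below uses the 1-based frame numbering of the examples in the text.

```python
def fir_sources(i, num_avatars, fs):
    """Indices of the original-video frames whose subjects become avatars in
    target frame i, using FIR-style synthesis with avatar frame interval fs.

    Returns fewer indices while i is too early to have num_avatars avatars.
    """
    sources = [i - k * fs for k in range(1, num_avatars + 1)]
    return [s for s in sources if s >= 1]

# Examples matching the text: 3 avatars, interval 3.
print(fir_sources(10, num_avatars=3, fs=3))  # [7, 4, 1]
print(fir_sources(13, num_avatars=3, fs=3))  # [10, 7, 4]
```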
  • With the FIR-style synthesis provided above, when K avatars are to be synthesized, K first video frames have to be synthesized into the second video frame, which requires a large amount of computation. The embodiments of the present application therefore provide another implementation that adopts IIR-style synthesis: a target video frame that has already been synthesized is used to synthesize subsequent target video frames, which greatly reduces the amount of computation.
  • For IIR-style synthesis, the first video frame mentioned above can be a single frame of the original video, and synthesizing the mapped first video frame into the second video frame yields a first target video frame with 1 avatar.
  • After the first target video frame has been synthesized, a third video frame can also be obtained from the original video; the time corresponding to the third video frame is later than the second video frame, and the frame intervals between the first video frame, the second video frame, and the third video frame are the same.
  • For example, if the first video frame is the 1st frame of the original video and the second video frame is the 4th frame of the original video, the acquired third video frame can be the 7th frame of the original video.
  • the synthesized first target video frame can be mapped to the space corresponding to the third video frame, and then the second target video frame is synthesized according to the mapped first target video frame and the third video frame. Since the first target video frame already includes the moving subject and one avatar corresponding to the moving subject, the synthesized second target video frame may include the moving subject and two avatars corresponding to the moving subject.
  • It can be understood that in IIR-style synthesis, the synthesized first target video frame has 1 avatar.
  • For example, if the avatar frame interval is set to 3, the 1st, 2nd, and 3rd frames of the target video have no avatar; the 4th frame of the target video has 1 avatar and is obtained by synthesizing the 1st and 4th frames of the original video; the 5th frame of the target video has 1 avatar and is obtained by synthesizing the 2nd and 5th frames of the original video; and the 6th frame of the target video has 1 avatar and is obtained by synthesizing the 3rd and 6th frames of the original video. The first target video frame can then be any one of the 4th, 5th, or 6th frames.
  • In IIR-style synthesis, the 7th frame of the target video can have 2 avatars and can be obtained by synthesizing the already-synthesized 4th frame of the target video with the 7th frame of the original video; the 8th frame of the target video can have 2 avatars and can be obtained by synthesizing the already-synthesized 5th frame of the target video with the 8th frame of the original video; the 9th frame of the target video can have 2 avatars and can be obtained by synthesizing the already-synthesized 6th frame of the target video with the 9th frame of the original video; the 10th frame of the target video can have 3 avatars and can be obtained by synthesizing the already-synthesized 7th frame of the target video with the 10th frame of the original video; and so on.
  • It can be seen that in IIR-style synthesis, when K avatars are to be synthesized, the already-synthesized target video frame having K-1 avatars can be synthesized with the corresponding video frame of the original video.
  • In other words, no matter how many avatars are synthesized, each target video frame is synthesized from only two video frames, which greatly reduces the amount of computation compared with FIR-style synthesis.
  • As for mapping the first target video frame to the space corresponding to the third video frame: since the first target video frame spatially corresponds to the second video frame of the original video, the spatial transformation matrix that maps the second video frame to the third video frame can be used to map the first target video frame.
  • The spatial transformation matrix for mapping the second video frame to the third video frame can be determined as described above.
  • For example, the rotation matrix can be calculated from the difference between the camera pose information of the second video frame and that of the third video frame, or feature matching can be performed on the second and third video frames to calculate a homography matrix.
  • the moving subject of the mapped first target video frame may also be extracted. Specifically, reference may be made to the embodiments provided below.
  • Suppose the first video frame is the (i-fs)th frame, the second video frame is the ith frame, and the third video frame is the (i+fs)th frame, where fs is the avatar frame interval. Subject segmentation can be performed on the first, second, and third video frames respectively to obtain the corresponding masks M(i-fs), M(i), and M(i+fs) (a mask can separate the moving subject from the video frame).
  • The spatial transformation matrix H(i) used to map the first video frame F(i-fs) to the second video frame F(i), and the spatial transformation matrix H(i+fs) used to map the second video frame F(i) to the third video frame F(i+fs), can be calculated; the specific calculation methods can refer to the relevant description above.
  • Through H(i), the mask M(i-fs) can be mapped to the space corresponding to the second video frame to obtain the target mask, and the part overlapping M(i) can be removed from the target mask to obtain the mask Mch(i-fs).
  • Gaussian blur can also be performed on the mask Mch(i-fs) to obtain the mask Mchb(i-fs).
  • Through H(i), the first video frame F(i-fs) can be mapped to the space corresponding to the second video frame F(i) to obtain the mapped first video frame Fch(i-fs).
  • Using the mask Mchb(i-fs), the moving subject can be extracted from the mapped first video frame Fch(i-fs); the extracted avatar image can then be synthesized with the second video frame F(i) to obtain the first target video frame Fc(i).
  • Further, the mask Mc(i) corresponding to the first target video frame can be calculated by the formula Mc(i) = M(i) + Mch(i-fs)./r. Since Mch(i-fs) corresponds to the moving subject of the first video frame and M(i) corresponds to the moving subject of the second video frame, Mch(i-fs)./r attenuates the moving subject of the first video frame, where r is an attenuation coefficient that can be set as required. For example, if r = 2 is set, then in the final effect, the more frames an avatar lags behind the moving subject in action, the more transparent it is; as shown in FIG. 1B, the transparency of the moving subject X is 0%, the transparency of the avatar X' can be 50%, and the transparency of the avatar X" can be 75%. Of course, if the avatars are to be opaque, r = 1 can be set, that is, no attenuation is performed.
  • The attenuated mask Mch(i-fs)./r can thus be combined with the mask M(i) corresponding to the moving subject of the second video frame to obtain the mask Mc(i) corresponding to the first target video frame.
  • The mask Mc(i) can extract the moving subject and the avatar in the first target video frame.
  • The pixel values in the mask Mc(i) can also be limited; for example, the pixel values in Mc(i) that are lower than a preset threshold can be set to 0 so that, in cooperation with the attenuation coefficient, the effect of limiting the number of avatars is achieved. Of course, there are other methods for limiting the number of avatars, which are not limited in this application.
  • the mask Mc(i) can be mapped by H(i+fs), and the part overlapping with M(i+fs) can be removed from the mapped mask Mc(i) to obtain Mch(i).
  • Gaussian blur can be performed on Mch(i) to obtain Mchb(i).
  • Through H(i+fs), the first target video frame Fc(i) can be mapped to the space corresponding to the third video frame F(i+fs) to obtain the mapped first target video frame Fch(i). Using the mask Mchb(i), the moving subject and the avatar can be extracted from Fch(i), and the extracted image can be synthesized with the third video frame F(i+fs) to obtain the second target video frame Fc(i+fs). Subsequent frames of the target video can be synthesized in the same way. An illustrative sketch of one such IIR step is given below.
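  • The following is a sketch of one IIR-style synthesis step consistent with the notation above (F, M, H, Mch, Mchb, Fc, Mc, and the attenuation coefficient r). The array layouts, the [0, 1] mask range, the small-value cutoff, and the exact point at which the attenuation is applied during blending are assumptions for illustration, not the patent's actual implementation; the blending choice below reproduces the 50% / 75% transparency example with r = 2.

```python
import cv2
import numpy as np

def iir_step(Fc_prev, Mc_prev, F_next, M_next, H_next,
             r=2.0, blur_ksize=21, min_alpha=0.1):
    """One IIR-style synthesis step.

    Fc_prev : previously synthesized target frame Fc(i)          (H x W x 3, uint8)
    Mc_prev : its subject-plus-avatar mask Mc(i), float32 in [0, 1]
    F_next  : original frame F(i+fs)
    M_next  : subject mask M(i+fs) of F_next, float32 in [0, 1]
    H_next  : spatial transform H(i+fs) mapping frame i onto frame i+fs
    r       : attenuation coefficient (r = 1 keeps avatars opaque)
    min_alpha : mask values below this are zeroed, limiting visible avatars
    Returns (Fc_next, Mc_next) = (Fc(i+fs), Mc(i+fs)).
    """
    h, w = F_next.shape[:2]

    # Map the previous target frame and its mask into the next frame's space.
    Fch = cv2.warpPerspective(Fc_prev, H_next, (w, h))
    Mch = cv2.warpPerspective(Mc_prev, H_next, (w, h))

    # Remove overlap with the next frame's own subject, then soften the edges.
    Mch = np.clip(Mch - M_next, 0.0, 1.0)
    Mchb = cv2.GaussianBlur(Mch, (blur_ksize, blur_ksize), 0)

    # Composite: content carried over from previous frames enters attenuated by 1/r,
    # so each avatar becomes more transparent the further it lags behind.
    alpha = np.clip(Mchb / r, 0.0, 1.0)[..., None]
    Fc_next = (F_next.astype(np.float32) * (1.0 - alpha)
               + Fch.astype(np.float32) * alpha)

    # Mask recurrence Mc(i+fs) = M(i+fs) + Mch(i)./r; zeroing small values,
    # together with r, bounds the number of avatars that remain visible.
    Mc_next = M_next + Mch / r
    Mc_next[Mc_next < min_alpha] = 0.0
    Mc_next = np.clip(Mc_next, 0.0, 1.0)

    return np.clip(Fc_next, 0, 255).astype(np.uint8), Mc_next
```

  • Calling this step repeatedly, each time feeding the returned Fc and Mc back in together with the next original frame and its mask, corresponds to the frame-by-frame IIR chain described in the preceding bullets.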
  • the interval of the avatar frames may be varied, that is, the avatar effect with unequal intervals may be realized.
  • For example, in the ith frame of the target video the moving subject can have three avatars: the first avatar can correspond to the (i-2)th frame of the original video (an interval of 2 frames from the moving subject), the second avatar can correspond to the (i-5)th frame of the original video (an interval of 3 frames from the first avatar), and the third avatar can correspond to the (i-9)th frame of the original video (an interval of 4 frames from the second avatar). The index arithmetic for this example is sketched below.
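  • As an illustrative aside, the unequal-interval variant simply changes the index arithmetic; the small sketch below reproduces the i-2 / i-5 / i-9 spacing described above.

```python
def unequal_interval_sources(i, intervals):
    """Original-frame indices for avatars when the avatar frame interval varies.

    intervals: gap (in frames) between the moving subject and the 1st avatar,
    then between successive avatars, e.g. [2, 3, 4] as in the example above.
    """
    sources, offset = [], 0
    for gap in intervals:
        offset += gap
        sources.append(i - offset)
    return sources

print(unequal_interval_sources(20, [2, 3, 4]))  # [18, 15, 11] -> frames i-2, i-5, i-9
```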
  • The image processing method provided by the embodiments of the present application can process a video so that the moving subject in the video has avatars, which improves the creativity of the video and the fun of video production.
  • Moreover, by constraining the user to shoot the original video in place, the amount of computation required to add the avatar effect to the original video can be greatly reduced, so that the avatar effect can be achieved without post-processing special-effects software such as AE; the user can apply the avatar effect to a video directly on an electronic device such as a camera or mobile terminal, which greatly facilitates making and sharing videos.
  • FIG. 3 is a structural diagram of a camera provided by an embodiment of the present application.
  • The camera may be a built-in camera of an electronic device such as a mobile phone, a camera mounted on a drone, or an action camera.
  • the camera may include a lens, an image sensor, a processor 310, and a memory 320 storing a computer program.
  • Lenses and image sensors can be used for video shooting.
  • the processor can be used to process the captured video, and when executing the computer program, it implements the following steps:
  • acquiring an avatar effect instruction;
  • processing, according to the avatar effect instruction, an original video in which a moving subject is captured, to obtain a target video, where the target video includes the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
  • Optionally, when processing the original video in which the moving subject is captured, the processor is configured to obtain a first video frame and a second video frame from the original video, where the time corresponding to the first video frame is earlier than that of the second video frame; map the first video frame to the space corresponding to the second video frame; and synthesize a first target video frame according to the mapped first video frame and the second video frame.
  • Optionally, when mapping the first video frame to the space corresponding to the second video frame, the processor is configured to perform spatial transformation on the first video frame using a spatial transformation matrix, so as to map the first video frame to the space corresponding to the second video frame.
  • Optionally, the camera further includes an inertial measurement unit (IMU); the spatial transformation matrix includes a rotation matrix, the rotation matrix is calculated based on the camera pose information corresponding to the first video frame and the camera pose information corresponding to the second video frame, and the camera pose information is obtained through the IMU.
  • Optionally, the spatial transformation matrix includes a homography matrix, and the processor is further configured to perform feature matching on the first video frame and the second video frame and calculate the homography matrix according to the matching result.
  • the matching result includes multiple feature pairs between the first video frame and the second video frame;
  • When calculating the homography matrix according to the matching result, the processor is configured to screen the multiple feature pairs and calculate the homography matrix according to the screened credible feature pairs.
  • Optionally, when performing feature matching on the first video frame and the second video frame, the processor is configured to extract feature points from designated areas of the first video frame and the second video frame respectively, and to perform feature matching on the extracted feature points.
  • the designated area includes a background area other than the moving subject.
  • Optionally, when synthesizing the first target video frame according to the mapped first video frame and the second video frame, the processor is configured to extract the moving subject from the mapped first video frame to obtain an avatar image, and to synthesize the target video frame corresponding to the second video frame according to the avatar image and the second video frame.
  • Optionally, when extracting the moving subject from the mapped first video frame, the processor is configured to process the mapped first video frame through the target mask corresponding to the moving subject.
  • Optionally, the processor is further configured to perform moving-subject segmentation on the first video frame to obtain an original mask corresponding to the moving subject, and to map the original mask to the space corresponding to the second video frame to obtain the target mask.
  • Optionally, the processor is further configured to, before processing the mapped first video frame through the target mask, remove from the target mask the part that overlaps the moving subject in the second video frame.
  • the processor is further configured to perform blurring processing on the target mask before processing the mapped first video frame through the target mask.
  • Optionally, the processor is further configured to obtain a third video frame from the original video, where the time corresponding to the third video frame is later than the second video frame and the frame intervals between the first video frame, the second video frame, and the third video frame are the same; to map the first target video frame to the space corresponding to the third video frame; and to synthesize a second target video frame according to the mapped first target video frame and the third video frame.
  • the original video is obtained by rotating and shooting the camera in situ.
  • the original video is obtained by the camera following the moving subject in situ to rotate and shoot.
  • different dynamic avatars have different transparency.
  • the number of frames that the dynamic avatar lags behind the moving subject is positively related to the transparency of the dynamic avatar.
  • the avatar effect instruction includes one or more of the following information: the number of avatars, the avatar frame interval, and the avatar transparency.
  • the avatar effect instruction is triggered by a user.
  • the original video is obtained by real-time shooting after the avatar effect instruction is obtained.
  • the processor is further configured to, when shooting the original video, determine whether the displacement of the camera in the world coordinate system is less than or equal to a preset threshold.
  • the original video is a segment selected by the user from the captured video.
  • The camera provided by the embodiments of the present application can process a video so that the moving subject in the video has avatars, which improves the creativity of the video and the fun of video production. Moreover, by constraining the user to shoot the original video in place, the amount of computation required to add the avatar effect to the original video can be greatly reduced, so that the avatar effect can be achieved without post-processing special-effects software such as AE, which greatly facilitates video production and sharing for the user. In one embodiment, an IIR-style synthesis method is also proposed, which further reduces the amount of computation required to realize multiple avatars and greatly lowers the hardware requirements for realizing the avatar effect.
  • FIG. 4 is a structural diagram of a mobile terminal provided by an embodiment of the present application.
  • the mobile terminal can be wired or wirelessly connected to the camera, obtain the original video captured by the camera from the camera, and perform the avatar effect processing on the original video.
  • the mobile terminal may be configured with a camera, and the original video may be a video captured by the mobile terminal.
  • the mobile terminal may include a processor 410 and a memory 420 storing computer programs;
  • the processor implements the following steps when executing the computer program:
  • acquiring an avatar effect instruction;
  • processing, according to the avatar effect instruction, an original video in which a moving subject is captured, to obtain a target video, where the target video includes the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
  • Optionally, when processing the original video in which the moving subject is captured, the processor is configured to obtain a first video frame and a second video frame from the original video, where the time corresponding to the first video frame is earlier than that of the second video frame; map the first video frame to the space corresponding to the second video frame; and synthesize a first target video frame according to the mapped first video frame and the second video frame.
  • Optionally, when mapping the first video frame to the space corresponding to the second video frame, the processor is configured to perform spatial transformation on the first video frame using a spatial transformation matrix, so as to map the first video frame to the space corresponding to the second video frame.
  • the spatial transformation matrix includes a rotation matrix, and the rotation matrix is calculated based on the camera pose information corresponding to the first video frame and the camera pose information corresponding to the second video frame.
  • Optionally, the spatial transformation matrix includes a homography matrix, and the processor is further configured to perform feature matching on the first video frame and the second video frame and calculate the homography matrix according to the matching result.
  • the matching result includes multiple feature pairs between the first video frame and the second video frame;
  • When calculating the homography matrix according to the matching result, the processor is configured to screen the multiple feature pairs and calculate the homography matrix according to the screened credible feature pairs.
  • Optionally, when performing feature matching on the first video frame and the second video frame, the processor is configured to extract feature points from designated areas of the first video frame and the second video frame respectively, and to perform feature matching on the extracted feature points.
  • the designated area includes a background area other than the moving subject.
  • Optionally, when synthesizing the first target video frame according to the mapped first video frame and the second video frame, the processor is configured to extract the moving subject from the mapped first video frame to obtain an avatar image, and to synthesize the target video frame corresponding to the second video frame according to the avatar image and the second video frame.
  • Optionally, when extracting the moving subject from the mapped first video frame, the processor is configured to process the mapped first video frame through the target mask corresponding to the moving subject.
  • Optionally, the processor is further configured to perform moving-subject segmentation on the first video frame to obtain an original mask corresponding to the moving subject, and to map the original mask to the space corresponding to the second video frame to obtain the target mask.
  • Optionally, the processor is further configured to, before processing the mapped first video frame through the target mask, remove from the target mask the part that overlaps the moving subject in the second video frame.
  • the processor is further configured to perform blurring processing on the target mask before processing the mapped first video frame by using the target mask.
  • Optionally, the processor is further configured to obtain a third video frame from the original video, where the time corresponding to the third video frame is later than the second video frame and the frame intervals between the first video frame, the second video frame, and the third video frame are the same; to map the first target video frame to the space corresponding to the third video frame; and to synthesize a second target video frame according to the mapped first target video frame and the third video frame.
  • the original video is obtained by rotating the camera on the spot.
  • the original video is obtained by the camera following the moving subject in situ to rotate and shoot.
  • different dynamic avatars have different transparency.
  • the number of frames that the dynamic avatar lags behind the moving subject is positively related to the transparency of the dynamic avatar.
  • the avatar effect instruction includes one or more of the following information: the number of avatars, the avatar frame interval, and the avatar transparency.
  • the avatar effect instruction is triggered by a user.
  • the mobile terminal is configured with a camera, and the original video is captured in real time by the camera after acquiring the avatar effect instruction.
  • the processor is further configured to, when shooting the original video, determine whether the displacement of the camera in the world coordinate system is less than or equal to a preset threshold.
  • the original video is a segment selected by the user from the captured video.
  • The mobile terminal provided by the embodiments of the present application can process a video so that the moving subject in the video has avatars, which improves the creativity of the video and the fun of video production. Moreover, by constraining the user to shoot the original video in place, the amount of computation required to add the avatar effect to the original video can be greatly reduced, so that the avatar effect can be achieved without post-processing special-effects software such as AE, which greatly facilitates video production and sharing for the user. In one embodiment, an IIR-style synthesis method is also proposed, which further reduces the amount of computation required to realize multiple avatars and greatly lowers the hardware requirements for realizing the avatar effect.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, any one of the image processing methods provided by the embodiments of the present application is implemented.
  • Embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having program code embodied therein.
  • Computer-usable storage media includes permanent and non-permanent, removable and non-removable media, and storage of information can be accomplished by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.

Abstract

The embodiments of the present application disclose an image processing method, including: acquiring an avatar effect instruction; and processing, according to the avatar effect instruction, an original video in which a moving subject is captured, to obtain a target video, where the target video includes the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay. The method disclosed in the embodiments of the present application realizes an avatar video effect, makes video creation more fun for the user, and enables the user to make creative videos.

Description

图像处理方法、相机及移动终端 技术领域
本申请涉及图像处理技术领域,尤其涉及一种图像处理方法、相机、移动终端及计算机可读存储介质。
背景技术
随着视频技术的发展,越来越多的电子设备具有拍摄视频的功能。通过拍摄视频,人们可以轻松的记录下所见所闻。而在拍摄视频后,为了增加视频内容的创意,人们可以在视频中增加各种效果。
发明内容
本申请实施例提供了一种图像处理方法、相机、移动终端及计算机可读存储介质,可以实现一种分身的视频效果。
本申请实施例第一方面提供了一种图像处理方法,包括:
获取分身效果指令;
根据所述分身效果指令,对拍摄有运动主体的原始视频进行处理,得到目标视频,所述目标视频包括所述运动主体及所述运动主体对应的至少一个动态分身,所述动态分身以指定时延重复所述运动主体的运动。
本申请实施例第二方面提供了一种相机,包括:处理器与存储计算机程序的存储器;
所述处理器在执行所述计算机程序时实现以下步骤:
获取分身效果指令;
根据所述分身效果指令,对拍摄有运动主体的原始视频进行处理,得到目标视频,所述目标视频包括所述运动主体及所述运动主体对应的至少一个动态分身,所述动态分身以指定时延重复所述运动主体的运动。
本申请实施例第三方面提供了一种移动终端,包括:处理器与存储计算机程序的存储器;
所述处理器在执行所述计算机程序时实现以下步骤:
获取分身效果指令;
根据所述分身效果指令,对拍摄有运动主体的原始视频进行处理,得到目标视频,所述目标视频包括所述运动主体及所述运动主体对应的至少一个动态分身,所述动态分身以指定时延重复所述运动主体的运动。
本申请实施例第四方面提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时实现上述第一方面中的任一种图像处理方法。
本申请实施例提供的图像处理方法,可以在获取分身效果指令后,对拍摄有运动主体的原始视频进行处理,使视频中的运动主体具有至少一个动态分身,并且动态分身可以以指定时延重复运动主体的运动。本申请实施例提供了一种视频的分身效果,提高了用户制作视频的趣味性,使用户可以制作富有创意的视频。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1A是本申请实施例提供的原始视频中的第N帧。
图1B是图1A所示的第N帧在处理后的效果图。
图2是本申请实施例提供的一种图像处理方法的流程图。
图3是本申请实施例提供的一种相机的结构图。
图4是本申请实施例提供的一种移动终端的结构图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
随着视频技术的发展,越来越多的电子设备具有拍摄视频的功能。通过拍摄视频,人们可以轻松的记录下所见所闻。而在拍摄视频后,为了增加视频内容的创意,人们可以在视频中增加各种效果。
本申请实施例提供了一种图像处理方法,该图像处理方法可以对视频中的运动主体增加分身效果,即可以使运动主体具有至少一个与该运动主体对应的动态分身,动态分身可以以指定时延重复该运动主体的运动。
可以参考图1A与图1B,图1A是本申请实施例提供的原始视频中的第N帧,图1B是该第N帧在处理后的效果图。若原始视频中第N帧的运动主体是X,则在目标视频(即对原始视频进行处理后得到的视频)的第N帧中,该运动主体X可以具有至少一个分身,如图1B中具有两个分身X’与X”,分身在第N帧中所做的动作是运动主体X曾经做过的动作,比如,在一个例子中,X’所做的动作可以是运动主体X在5帧前的动作,X”所做的动作可以是运动主体X在10帧前的动作。
需要注意的是,图1A和图1B所示的仅是一个视频帧在处理前后的效果,而当多个视频帧被连续播放时,从视频效果上,每一个分身并不是静态的,而是动态的以一定的时延在重复运动主体的动作,即每一个分身可以是动态分身。
还需注意的是,图1A与图1B仅是为方便理解而提供的示例,在实际应用时,分身的数量、分身落后的时延、分身的透明度等分身效果的参数都可以由用户自行设定或者使用系统的默认参数,该部分内容的实现将在后文中展开说明。
本申请实施例提供的图像处理方法,可以实现分身效果,提高了用户制作视频的趣味性,使用户可以制作富有创意的视频。
下面可以参见图2,图2是本申请实施例提供的一种图像处理方法的流程图。该方法可以应用于相机、移动终端、图像处理设备及其他的电子设备,该方法包括:
S210、获取分身效果指令。
S220、根据分身效果指令,对拍摄有运动主体的原始视频进行处理,得到具有分身效果的目标视频。
分身效果指令可以是由用户触发。在一个例子中,分身效果指令可以是交互界面中的一个按键,用户点击该按键后,可以触发对原始视频进行分身效果的处理。在一个例子中,分身效果指令也可以是一个实体按键。当然,分身效果指令也可以通过其他方式触发,比如可以通过语音、触摸手势等等。
分身效果指令中可以包括以下一种或多种信息:分身个数、分身帧间隔、分身透明度。分身帧间隔可以是相邻分身之间动作的相差帧数。如前所述,这些信息可以由 用户自行设置,也可以使用系统的默认参数。
在获取分身效果指令后,可以对拍摄有运动主体的原始视频进行处理,在一种实施方式中,对拍摄有运动主体的原始视频进行处理,可以包括以下步骤:
S221、从拍摄有运动主体的原始视频中获取第一视频帧和第二视频帧。
S222、将第一视频帧映射至第二视频帧对应的空间。
S223、根据映射后的第一视频帧与第二视频帧,合成第一目标视频帧。
其中,第一视频帧对应的时刻早于第二视频帧,比如,第二视频帧可以是第i帧,则第一视频帧对应的帧序号小于i,如i-3、i-5等。
分身效果的实现,可以通过将第一视频帧中的运动主体融合至第二视频帧中,从而使第二视频帧中的运动主体具有分身,该分身即为第一视频帧的运动主体。
考虑到在对运动主体进行拍摄的时候,拍摄者通常会改变拍摄的角度,换言之,第一视频帧对应的拍摄角度与第二视频帧对应的拍摄角度可能不同。那么,在将第一视频帧中的运动主体融合至第二视频帧中时,为了使分身效果更加自然、真实,可以将第一视频帧映射至第二视频帧对应的空间,再进行两者的合成。
举个例子,比如原始视频中的运动主体正在奔跑,其中,第一视频帧中的运动主体处在空中,第一视频帧的拍摄角度对应在拍摄者的左前方,若第二视频帧中的运动主体正好落地,第二视频帧中的拍摄角度对应拍摄者的正前方,则可以将第一视频帧通过空间变换等方式映射成对应正前方的拍摄角度,得到对第一视频帧中的运动主体、以正前方的角度进行拍摄所能得到的图像(即映射后的第一视频帧)。由于该映射后的第一视频帧与第二视频帧的拍摄角度匹配,因此合成出的第一目标视频帧中的分身效果更自然、真实。
在一种实施方式中,原始视频可以是通过相机在原地进行旋转拍摄得到。需要注意的是,所谓的原地是指相机在世界坐标系中的坐标大致不变,比如,若相机在世界坐标系中的位移量小于或等于预设阈值,则可以认为相机仍在原地。而在拍摄时,相机可以在原地任意的旋转,比如可以从左转到右,也可以从上转到下,本申请对此不作限制。
由于原始视频是通过相机在原地进行旋转拍摄得到的,即相机在世界坐标系上的坐标大致不变,因此在第一视频帧映射至第二视频帧对应的空间时,所涉及的只是二维的空间变换,即只需计算旋转量即可,无需对整个场景进行三维建模,从而大大降低实现分身效果所需的计算资源,分身效果的处理速度大幅度提升,可以做到实时处理,从而极大方便了用户进行视频分享。
在一种实施方式中,原始视频可以是在获取到分身效果指令之后实时拍摄得到的。比如,相机中可以配置有分身效果的拍摄模式,用户可以通过点击等操作触发该拍摄模式、发出分身效果指令,则相机在获取到分身效果指令后可以进入拍摄模式。在拍摄前,相机可以通过文字、语音等方式提示用户在原地进行拍摄。
对分身效果的处理,可以是相机一边进行原始视频的拍摄一边对已拍摄得到的视频帧进行分身效果的处理,也可以在用户完成原始视频的拍摄后,相机再对原始视频进行分身效果的处理。
在用户进行原始视频的拍摄时,相机还可以实时定位其在世界坐标系中的位置,若检测到相机的位移量超出预设阈值,可以暂停拍摄并向用户发出位移过大的提醒。
在一种实施方式中,原始视频也可以是用户从视频素材中选取的片段。比如,用户所拍摄的视频可以包括风景对应的片段以及人物运动对应的片段,则用户可以截选出人物运动对应的片段,对该人物运动对应的片段添加分身效果。
在一种实施方式中,相机可以装载在云台上且相机配置有自动跟随目标的算法,则在对运动主体进行拍摄时,相机在云台的控制下可以在原地自动跟随运动主体进行旋转拍摄。
在将第一视频帧映射至第二视频帧对应的空间时,具体的,可以通过空间变换矩阵对第一视频帧进行处理。
空间变换矩阵的确定可以有多种方式,在一种实施方式中,空间变换矩阵可以是旋转矩阵。旋转矩阵可以利用相机的位姿信息进行计算,相机的位姿信息可以通过相机的惯性测量单元IMU获取。比如,可以获取拍摄第一视频帧时对应的相机位姿信息,以及获取拍摄第二视频帧对应的相机位姿信息,根据第一视频帧与第二视频帧两者对应的相机位姿信息的差值,可以计算出旋转矩阵。
在另一种实施方式中,空间变换矩阵还可以包括单应性矩阵。单应性矩阵可以根据第一视频帧与第二视频帧的特征匹配结果计算得出。具体的,特征匹配可以针对视频帧中的指定区域(指定内容)进行,在一个例子中,该指定区域比如可以是除运动主体以外的背景区域(场景区域)。通过对第一视频帧的背景区域进行特征点提取,对第二视频帧的背景区域进行特征点提取,从而可以对提取出的特征点进行特征匹配,得到多个匹配出的多个特征对,根据这些特征对,可以计算单应性矩阵。
进一步的,考虑到匹配出的多个特征对并不一定均匹配准确,即有一些匹配出的特征对可能是不可信、不准确的,因此可以对多个特征对进行筛选,筛选出其中的匹配正确的可信特征对,再根据筛选出的可信特征对计算单应性矩阵。
在一种实施方式中,映射后的第一视频帧可以与第二视频帧合成,从而得到第一目标视频帧。第一目标视频帧可以是目标视频中的一帧。考虑到第一视频帧映射至第二视频帧对应的空间时,对第一视频帧的空间变换并不是绝对准确的,即计算出的第一视频帧与第二视频帧之间的空间关系有一定的误差,因此,若直接利用映射后的整个第一视频帧与第二视频帧进行合成,则合成得到的第一目标视频帧将出现画面模糊,且当前帧主体也会变得透明。所以,在另一种实施方式中,可以对映射后的第一视频帧进行运动主体的提取,提取出分身图像,再将该分身图像与第二视频帧进行合成。
对映射后的第一视频帧进行运动主体的提取有多种可行的方式。在一种实施方式中,可以通过对第一视频帧进行主体分割,得到运动主体对应的原始掩膜;可以通过空间变换矩阵将该原始掩膜映射至第二视频帧对应的空间,得到目标掩膜;该目标掩膜可以用于对映射后的第一视频帧进行处理,比如可以使该目标掩膜与映射后的第一视频帧相乘,则可以提取出映射后的第一视频帧中的运动主体,得到分身图像。
上述实施方式中,在得到目标掩膜之后,进一步的,还可以去除该目标掩膜中与第二视频帧中的运动主体重叠的部分。具体实现时,比如可以对第二视频帧进行主体分割,得到第二视频帧对应的运动主体的掩膜,则可以将目标掩膜中的与该第二视频帧对应的运动主体的掩膜重叠的部分去除。在对目标掩膜进行上述重叠部分的去除处理后,可以利用处理后的目标掩膜对映射后的第一视频帧进行处理,从而可以使最终合成得到的第一目标视频帧中,运动主体不会与分身有过多重叠。
在得到目标掩膜之后,还可以对目标掩膜进行模糊处理,具体的,可以对目标掩膜中的非0值(即运动主体对应的区域)进行高斯模糊,比如可以对目标掩膜的非0值乘以255再限制到255。通过对目标掩膜进行模糊处理,可以使提取得到的分身图像与第二视频帧的融合效果更加自然,在目标视频帧中的分身不会有明显的边界等图像处理痕迹,分身效果更加真实。
对于多分身效果的实现,在一种实施方式中,可以采用FIR式合成。采用FIR式合成时,前文所提及的第一视频帧可以有多个,即第一视频帧可以是指代对应的时刻早于第二视频帧的一类视频帧,第一目标视频帧可以是目标视频中开始有分身的任一帧。FIR式合成可以将用于制作分身的每个第一视频帧都合成到第二视频帧中,从而实现第二视频帧中的运动主体具有多个分身。比如,在一个例子中,若希望合成得到的第一目标视频帧中运动主体具有3个分身,第二视频帧比如可以是第10帧,第一视频帧比如可以包括第1帧、第4帧与第7帧,那么,在实现运动主体的3个分身时,可以将该第1帧、第4帧与第7帧合成到第10帧中,从而使第10帧的运动主体具有 3个分身,3个分身分别对应第1帧、第4帧与第7帧中的运动主体。
需要注意的是,由于一个分身对应原始视频中的一个视频帧,因此,若需要实现K个分身,则第二视频帧的帧序号可以大于K,从而可以有至少K个用于制作分身的第一视频帧。
上述将第1帧、第4帧与第7帧合成到第10帧的例子中,分身帧间隔为3帧。分身帧间隔可以用于表征相邻分身之间在动作上相差的帧数,比如在合成得到的第一目标视频帧中,第7帧对应的分身在动作上落后运动主体3帧,第4帧对应的分身落后第7帧对应的分身3帧,第1帧对应的分身落后第4帧对应的分身3帧。合成得到的第一目标视频帧与第二视频帧的帧序号对应,即第一目标视频帧为目标视频中的第10帧。而对于目标视频的第11帧,若仍然是实现3个分身,则可以将原始视频中的第2帧、第5帧与第8帧合成至原始视频中的第11帧得到。而对于目标视频帧的第13帧,可以将原始视频中的第4帧、第7帧、第10帧合成至原始视频的第13帧得到。目标视频后续的视频帧的合成思路相同,在此不再赘述。
上述所提供的FIR式合成方式,当需要合成K个分身时,需要将K个第一视频帧合成至第二视频帧,计算量很大。因此,本申请实施例提供另一种实施方式,可以采用IIR式合成,即可以利用已合成得到的目标视频帧来合成后续的目标视频帧,从而可以大大减少计算量。
对于IIR式合成,前文所提及的第一视频帧可以是原始视频中的一帧,将映射后的第一视频帧合成至第二视频帧中,可以得到具有1个分身的第一目标视频帧。在合成得到第一目标视频帧之后,还可以从原始视频中获取第三视频帧,第三视频帧对应的时刻晚于所述第二视频帧,且第一视频帧、第二视频帧与第三视频帧之间的帧间隔相同。比如第一视频帧是原始视频中的第1帧,第二视频帧是原始视频中的第4帧,则获取的第三视频帧可以是原始视频中的第7帧。
获取第三视频帧后,可以将已合成的第一目标视频帧映射至第三视频帧对应的空间,再根据映射后的第一目标视频帧与第三视频帧,合成第二目标视频帧。由于第一目标视频帧中已包括运动主体以及该运动主体对应的1个分身,因此合成的第二目标视频帧中可以包括运动主体以及该运动主体对应的2个分身。
可以理解的是,在IIR式合成中,合成得到的第一目标视频帧具有1个分身。举个例子,比如若设置分身帧间隔为3,则目标视频的第1帧、第2帧与第3帧均没有分身,目标视频的第4帧开始有1个分身,该第4帧是利用原始视频中的第1帧与第4帧合成得到的;目标视频的第5帧有1个分身,该第5帧是利用原始视频中的第2 帧与第5帧合成得到的;目标视频的第6帧有1个分身,该第6帧是利用原始视频中的第3帧与第6帧合成得到的,则第一目标视频帧可以是第4帧、第5帧或第6帧中的任一帧。
对于目标视频的第7帧,在IIR式合成中,该第7帧可以具有2个分身,其可以利用已合成的目标视频的第4帧与原始视频的第7帧合成得到;目标视频的第8帧可以具有2个分身,该第8帧可以利用已合成的目标视频的第5帧与原始视频的第8帧合成得到;目标视频的第9帧可以具有2个分身,该第9帧可以利用已合成的目标视频的第6帧与原始视频的第9帧合成得到;目标视频的第10帧可以具有3个分身,该第10帧可以利用已合成的目标视频的第7帧与原始视频的第10帧合成得到……以此类推。
可见,在IIR式合成中,当需要合成K个分身时,可以利用已合成的具有K-1个分身的目标视频帧与原始视频中的对应视频帧进行合成,换言之,无论合成多少个分身,每一目标视频帧的合成均只是两个视频帧的合成,相比FIR式合成而言计算量大大降低。
对于将第一目标视频帧映射至第三视频帧对应的空间,由于第一目标视频帧在空间上实际与原始视频的第二视频帧对应,因此,可以使用第二视频帧映射至第三视频帧所对应的空间变换矩阵来进行第一目标视频帧的映射。而对于第二视频帧映射至第三视频帧对应的空间变换矩阵,前文已有具体实现的说明,比如可以利用第二视频帧与第三视频帧之间的相机位姿信息的差值计算旋转矩阵,又或者可以对第二视频帧与第三视频帧进行特征匹配以计算单应性矩阵。
在一种实施方式中,也可以对映射后的第一目标视频帧的运动主体进行提取。具体的,可以参考下面提供的实施例。
若第一视频帧为第i-fs帧,第二视频帧为第i帧,第三视频帧为第i+fs帧,fs为分身帧间隔,可以对第一视频帧、第二视频帧与第三视频帧分别进行主体分割,得到各自对应的掩膜M(i-fs)、M(i)、M(i+fs)(掩膜可以分离出视频帧中的运动主体)。可以计算用于将第一视频帧F(i-fs)映射至第二视频帧F(i)的空间变换矩阵H(i),计算用于将第二视频帧F(i)映射至第三视频帧F(i+fs)的空间变换矩阵H(i+fs),具体的计算方式可以参考前文中的相关说明。
通过H(i)可以将掩膜M(i-fs)映射到第二视频帧对应的空间,得到目标掩膜,可以对目标掩膜除去与M(i)重叠的部分,得到掩膜Mch(i-fs)。对该掩膜Mch(i-fs),还可以进行高斯模糊,得到掩膜Mchb(i-fs)。
通过H(i)可以将第一视频帧F(i-fs)映射到第二视频帧F(i)对应的空间,得到映射后的第一视频帧Fch(i-fs)。利用掩膜Mchb(i-fs)对映射后的第一视频帧Fch(i-fs)进行运动主体的提取,提取出的分身图像可以与第二视频帧F(i)合成,从而可以得到第一目标视频帧Fc(i)。
进一步的,可以通过式子Mc(i)=M(i)+Mch(i-fs)./r,计算第一目标视频帧对应的掩膜Mc(i)。由于Mch(i-fs)对应第一视频帧的运动主体,M(i)对应第二视频帧的运动主体,因此,可以通过Mch(i-fs)./r实现对第一视频帧的运动主体进行衰减,其中r为衰减系数,其可以根据需求进行设置。比如可以设置r=2,则最终效果上,运动主体的分身在动作上落后的帧数越多,其对应的透明度将越大,如图1B所示,运动主体X的透明度为0%,分身X’的透明度可以为50%,分身X”的透明度可以为75%。当然,若想使各分身不透明,也可以设置r=1,即不进行衰减。
衰减后的掩膜Mch(i-fs)./r可以与第二视频帧的运动主体对应的掩膜M(i)结合,从而得到第一目标视频帧对应的掩膜Mc(i),该掩膜Mc(i)可以提取出第一目标视频帧中的运动主体与分身。
还可以通过对掩膜Mc(i)中的像素值进行限制,比如可以使掩膜Mc(i)中像素值低于预设阈值的部分为0,从而,在与衰减系数配合下,可以达到限制分身的数量的效果。当然,分身数量的限制还有其他的方法,本申请对此不做限制。
对掩膜Mc(i),可以通过H(i+fs)进行映射,对映射后的掩膜Mc(i)可以除去与M(i+fs)重叠的部分,得到Mch(i)。同样的,可以对Mch(i)进行高斯模糊,得到Mchb(i)。通过H(i+fs)可以将第一目标视频帧Fc(i)映射到第三视频帧F(i+fs)对应的空间,得到映射后的第一目标视频帧Fch(i)。利用掩膜Mchb(i)对映射后的第一目标视频帧Fch(i)进行运动主体以及分身的提取,提取出的分身图像可以与第三视频帧F(i+fs)合成,从而可以得到第二目标视频帧Fc(i+fs)。
对于目标视频后续的视频帧的合成,可以参考上述第二目标视频帧的合成方式,在此不再赘述。
在一种实施方式中,分身帧间隔可以是变化的,即可以实现非等间隔的分身效果。比如在目标视频的第i帧中,运动主体可以具有三个分身,第一个分身可以对应原始视频中的第i-2帧(与运动主体的间隔是2帧),第二个分身可以对应原始视频中的第i-5帧(与第一个分身的间隔是3帧),第三个分身可以对应原始视频中的第i-9帧(与第二个分身的间隔是4帧)。
以上为本申请实施例提供的图像处理方法的详细说明。
本申请实施例提供的图像处理方法,可以对视频进行处理,使视频中的运动主体具有分身,提高了视频的创意与视频制作的趣味性。并且,通过约束用户在原地进行原始视频的拍摄,可以大大降低对原始视频增加分身效果所需的计算量,从而无需使用AE等后处理特效软件也可以实现分身效果,使用户在相机、移动终端等电子设备上就可以对视频进行分身效果的处理,极大的方便了用户进行视频制作和分享。
下面可以参考图3,图3是本申请实施例提供的一种相机的结构图,该相机可以是手机等电子设备上的配置相机,也可以是无人机上搭载的相机,也可以是运动相机。该相机可以包括镜头、图像传感器、处理器310与存储计算机程序的存储器320。
镜头与图像传感器可以用于进行视频拍摄。
处理器可以用于对所拍摄的视频进行处理,其在执行所述计算机程序时实现以下步骤:
获取分身效果指令;
根据所述分身效果指令,对拍摄有运动主体的原始视频进行处理,得到目标视频,所述目标视频包括所述运动主体及所述运动主体对应的至少一个动态分身,所述动态分身以指定时延重复所述运动主体的运动。
可选的,所述处理器对拍摄有运动主体的原始视频进行处理时用于,从拍摄有运动主体的原始视频中获取第一视频帧和第二视频帧,其中,所述第一视频帧对应的时刻早于所述第二视频帧;将所述第一视频帧映射至所述第二视频帧对应的空间;根据映射后的所述第一视频帧与所述第二视频帧,合成第一目标视频帧。
可选的,所述处理器将所述第一视频帧映射至所述第二视频帧对应的空间时用于,通过空间变换矩阵对所述第一视频帧进行空间变换,以将所述第一视频帧映射至所述第二视频帧对应的空间。
可选的,还包括:惯性测量单元IMU;
所述空间变换矩阵包括旋转矩阵,所述旋转矩阵基于所述第一视频帧对应的相机位姿信息与所述第二视频帧对应的相机位姿信息计算得到,所述相机位姿信息通过所述IMU获取。
可选的,所述空间变换矩阵包括单应性矩阵,所述处理器还用于,对所述第一视频帧与所述第二视频帧进行特征匹配,并根据匹配结果计算所述单应性矩阵。
可选的,所述匹配结果包括所述第一视频帧与所述第二视频帧之间的多个特征对;
所述处理器根据匹配结果计算所述单应性矩阵时用于,对所述多个特征对进行筛选,并根据筛选出的可信特征对,计算所述单应性矩阵。
可选的,所述处理器对所述第一视频帧与所述第二视频帧进行特征匹配时用于,分别针对所述第一视频帧与所述第二视频帧的指定区域提取特征点,对提取出的特征点进行特征匹配。
可选的,所述指定区域包括除所述运动主体以外的背景区域。
可选的,所述处理器根据映射后的所述第一视频帧与所述第二视频帧,合成第一目标视频帧时用于,对映射后的所述第一视频帧中的运动主体进行提取,得到分身图像;根据所述分身图像与所述第二视频帧,合成所述第二视频帧对应的目标视频帧。
可选的,所述处理器对映射后的所述第一视频帧中的运动主体进行提取时用于,通过所述运动主体对应的目标掩膜对映射后的所述第一视频帧进行处理。
可选的,所述处理器还用于,对所述第一视频帧进行运动主体分割,得到所述运动主体对应的原始掩膜;将所述原始掩膜映射至所述第二视频帧对应的空间,得到所述目标掩膜。
可选的,所述处理器还用于,在通过所述目标掩膜对映射后的所述第一视频帧进行处理之前,去除所述目标掩膜中与所述第二视频帧中的运动主体重叠的部分。
可选的,所述处理器还用于,在通过所述目标掩膜对映射后的所述第一视频帧进行处理之前,对所述目标掩膜进行模糊处理。
可选的,所述处理器还用于,从所述原始视频中获取第三视频帧,所述第三视频帧对应的时刻晚于所述第二视频帧,且所述第一视频帧、所述第二视频帧与第三视频帧之间的帧间隔相同;将所述第一目标视频帧映射至所述第三视频帧对应的空间;根据映射后的所述第一目标视频帧与所述第三视频帧,合成第二目标视频帧。
可选的,所述原始视频是通过所述相机在原地进行旋转拍摄得到的。
可选的,所述原始视频是通过所述相机在原地跟随所述运动主体进行旋转拍摄得到的。
可选的,不同的所述动态分身具有不同的透明度。
可选的,所述动态分身落后所述运动主体的帧数与所述动态分身的透明度正相关。
可选的,所述分身效果指令包括以下一种或多种信息:分身个数、分身帧间隔、分身透明度。
可选的,所述分身效果指令由用户触发。
可选的,所述原始视频是在获取到所述分身效果指令之后实时拍摄得到的。
可选的,所述处理器还用于,在拍摄所述原始视频时,判断所述相机在世界坐标系中的位移量是否小于或等于预设阈值。
可选的,所述原始视频是用户从所拍摄视频中选取的片段。
以上所提供的各种实施方式的相机,其具体的实现可以参考前文中的相关说明,在此不再赘述。
本申请实施例提供的相机,可以对视频进行处理,使视频中的运动主体具有分身,提高了视频的创意与视频制作的趣味性。并且,通过约束用户在原地进行原始视频的拍摄,可以大大降低对原始视频增加分身效果所需的计算量,从而无需使用AE等后处理特效软件也可以实现分身效果,极大的方便了用户进行视频制作和分享。在一种实施方式中,还提出了IIR式的合成方式,进一步减少了实现多个分身所需的计算量,使分身效果实现所需的硬件条件大大降低。
本申请实施例还提供了一种移动终端,可以参见图4,图4是本申请实施例提供的一种移动终端的结构图。
在一种实施方式中,该移动终端可以与相机进行有线或无线连接,从相机处获取相机拍摄的原始视频,对原始视频进行分身效果的处理。在一种实施方式中,该移动终端可以自身配置了相机,原始视频可以是自身相机拍摄得到的视频。
移动终端可以包括处理器410与存储计算机程序的存储器420;
所述处理器在执行所述计算机程序时实现以下步骤:
获取分身效果指令;
根据所述分身效果指令,对拍摄有运动主体的原始视频进行处理,得到目标视频,所述目标视频包括所述运动主体及所述运动主体对应的至少一个动态分身,所述动态分身以指定时延重复所述运动主体的运动。
Optionally, when processing the original video in which the moving subject is captured, the processor is configured to: obtain a first video frame and a second video frame from the original video in which the moving subject is captured, wherein the moment corresponding to the first video frame is earlier than that of the second video frame; map the first video frame into a space corresponding to the second video frame; and synthesize a first target video frame from the mapped first video frame and the second video frame.
Optionally, when mapping the first video frame into the space corresponding to the second video frame, the processor is configured to spatially transform the first video frame by means of a spatial transformation matrix, so as to map the first video frame into the space corresponding to the second video frame.
Optionally, the spatial transformation matrix includes a rotation matrix, and the rotation matrix is computed from camera pose information corresponding to the first video frame and camera pose information corresponding to the second video frame.
Optionally, the spatial transformation matrix includes a homography matrix, and the processor is further configured to perform feature matching between the first video frame and the second video frame and compute the homography matrix from the matching result.
Optionally, the matching result includes a plurality of feature pairs between the first video frame and the second video frame;
when computing the homography matrix from the matching result, the processor is configured to filter the plurality of feature pairs and compute the homography matrix from the reliable feature pairs obtained by the filtering.
Optionally, when performing feature matching between the first video frame and the second video frame, the processor is configured to extract feature points from designated regions of the first video frame and the second video frame respectively, and perform feature matching on the extracted feature points.
Optionally, the designated region includes a background region excluding the moving subject.
Optionally, when synthesizing the first target video frame from the mapped first video frame and the second video frame, the processor is configured to: extract the moving subject from the mapped first video frame to obtain a clone image; and synthesize, from the clone image and the second video frame, the target video frame corresponding to the second video frame.
Optionally, when extracting the moving subject from the mapped first video frame, the processor is configured to process the mapped first video frame using a target mask corresponding to the moving subject.
Optionally, the processor is further configured to perform moving-subject segmentation on the first video frame to obtain an original mask corresponding to the moving subject, and map the original mask into the space corresponding to the second video frame to obtain the target mask.
Optionally, the processor is further configured to, before the mapped first video frame is processed using the target mask, remove from the target mask the part that overlaps the moving subject in the second video frame.
Optionally, the processor is further configured to, before the mapped first video frame is processed using the target mask, blur the target mask.
Optionally, the processor is further configured to: obtain a third video frame from the original video, wherein the moment corresponding to the third video frame is later than that of the second video frame, and the frame intervals between the first video frame, the second video frame, and the third video frame are the same; map the first target video frame into a space corresponding to the third video frame; and synthesize a second target video frame from the mapped first target video frame and the third video frame.
Optionally, the original video is captured by a camera rotating in place.
Optionally, the original video is captured by the camera rotating in place to follow the moving subject.
Optionally, different dynamic clones have different transparency.
Optionally, the number of frames by which a dynamic clone lags the moving subject is positively correlated with the transparency of that dynamic clone.
Optionally, the clone-effect instruction includes one or more of the following: a number of clones, a clone frame interval, and a clone transparency.
Optionally, the clone-effect instruction is triggered by a user.
Optionally, the mobile terminal is equipped with a camera, and the original video is captured in real time by the camera after the clone-effect instruction is obtained.
Optionally, the processor is further configured to determine, while the original video is being captured, whether the displacement of the camera in the world coordinate system is less than or equal to a preset threshold.
Optionally, the original video is a clip selected by the user from captured video.
For the specific implementation of the mobile terminals of the various implementations provided above, reference may be made to the relevant descriptions earlier in this document, which are not repeated here.
The mobile terminal provided by the embodiments of this application can process a video so that the moving subject in the video has clones, making videos more creative and video production more entertaining. Moreover, by constraining the user to shoot the original video from a fixed position, the computation required to add the clone effect to the original video can be greatly reduced, so the clone effect can be achieved without post-processing effects software such as AE, which greatly facilitates video creation and sharing. In one implementation, an IIR-style synthesis approach is further proposed, which further reduces the computation required to realize multiple clones and greatly lowers the hardware requirements for achieving the clone effect.
An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the image processing methods provided by the embodiments of this application.
Multiple implementations are provided above for each step. As to which implementation is adopted for a given step, a person skilled in the art may, provided there is no conflict or contradiction, freely choose or combine them according to the actual situation, thereby forming various different embodiments. Owing to space limitations, this document does not describe all of these different embodiments in detail, but it should be understood that they also fall within the scope disclosed by the embodiments of this application.
The embodiments of this application may take the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. The terms "comprise", "include", or any variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The methods, electronic devices, and the like provided by the embodiments of the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method and core idea of the present invention. Meanwhile, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (70)

  1. An image processing method, wherein the method comprises:
    obtaining a clone-effect instruction;
    processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video comprises the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay.
  2. The method according to claim 1, wherein the processing of the original video in which the moving subject is captured comprises:
    obtaining a first video frame and a second video frame from the original video in which the moving subject is captured, wherein the moment corresponding to the first video frame is earlier than that of the second video frame;
    mapping the first video frame into a space corresponding to the second video frame;
    synthesizing a first target video frame from the mapped first video frame and the second video frame.
  3. The method according to claim 2, wherein the mapping of the first video frame into the space corresponding to the second video frame comprises:
    spatially transforming the first video frame by means of a spatial transformation matrix, so as to map the first video frame into the space corresponding to the second video frame.
  4. The method according to claim 3, wherein the spatial transformation matrix comprises a rotation matrix, and the rotation matrix is computed from camera pose information corresponding to the first video frame and camera pose information corresponding to the second video frame.
  5. The method according to claim 3, wherein the spatial transformation matrix comprises a homography matrix, and the homography matrix is determined as follows:
    performing feature matching between the first video frame and the second video frame, and computing the homography matrix from the matching result.
  6. The method according to claim 5, wherein the matching result comprises a plurality of feature pairs between the first video frame and the second video frame;
    the computing of the homography matrix from the matching result comprises:
    filtering the plurality of feature pairs, and computing the homography matrix from the reliable feature pairs obtained by the filtering.
  7. The method according to claim 5, wherein the performing of feature matching between the first video frame and the second video frame comprises:
    extracting feature points from designated regions of the first video frame and the second video frame respectively, and performing feature matching on the extracted feature points.
  8. The method according to claim 7, wherein the designated region comprises a background region excluding the moving subject.
  9. The method according to claim 2, wherein the synthesizing of the first target video frame from the mapped first video frame and the second video frame comprises:
    extracting the moving subject from the mapped first video frame to obtain a clone image;
    synthesizing, from the clone image and the second video frame, the target video frame corresponding to the second video frame.
  10. The method according to claim 9, wherein the extracting of the moving subject from the mapped first video frame comprises:
    processing the mapped first video frame using a target mask corresponding to the moving subject.
  11. The method according to claim 10, wherein the target mask is obtained as follows:
    performing moving-subject segmentation on the first video frame to obtain an original mask corresponding to the moving subject;
    mapping the original mask into the space corresponding to the second video frame to obtain the target mask.
  12. The method according to claim 10, wherein before the mapped first video frame is processed using the target mask, the method further comprises:
    removing from the target mask the part that overlaps the moving subject in the second video frame.
  13. The method according to claim 10, wherein before the mapped first video frame is processed using the target mask, the method further comprises:
    blurring the target mask.
  14. The method according to claim 2, wherein the method further comprises:
    obtaining a third video frame from the original video, wherein the moment corresponding to the third video frame is later than that of the second video frame, and the frame intervals between the first video frame, the second video frame, and the third video frame are the same;
    mapping the first target video frame into a space corresponding to the third video frame;
    synthesizing a second target video frame from the mapped first target video frame and the third video frame.
  15. The method according to claim 1, wherein the original video is captured by a camera rotating in place.
  16. The method according to claim 15, wherein the original video is captured by the camera rotating in place to follow the moving subject.
  17. The method according to claim 1, wherein different dynamic clones have different transparency.
  18. The method according to claim 17, wherein the number of frames by which the dynamic clone lags the moving subject is positively correlated with the transparency of the dynamic clone.
  19. The method according to claim 1, wherein the clone-effect instruction comprises one or more of the following: a number of clones, a clone frame interval, and a clone transparency.
  20. The method according to claim 1, wherein the clone-effect instruction is triggered by a user.
  21. The method according to claim 1, wherein the original video is captured in real time after the clone-effect instruction is obtained.
  22. The method according to claim 21, wherein the method further comprises:
    determining, while the original video is being captured, whether a displacement of the camera in a world coordinate system is less than or equal to a preset threshold.
  23. The method according to claim 1, wherein the original video is a clip selected by a user from captured video.
  24. A camera, comprising a processor and a memory storing a computer program;
    wherein the processor, when executing the computer program, implements the following steps:
    obtaining a clone-effect instruction;
    processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video comprises the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay.
  25. The camera according to claim 24, wherein when processing the original video in which the moving subject is captured, the processor is configured to: obtain a first video frame and a second video frame from the original video in which the moving subject is captured, wherein the moment corresponding to the first video frame is earlier than that of the second video frame; map the first video frame into a space corresponding to the second video frame; and synthesize a first target video frame from the mapped first video frame and the second video frame.
  26. The camera according to claim 25, wherein when mapping the first video frame into the space corresponding to the second video frame, the processor is configured to spatially transform the first video frame by means of a spatial transformation matrix, so as to map the first video frame into the space corresponding to the second video frame.
  27. The camera according to claim 26, further comprising an inertial measurement unit (IMU);
    wherein the spatial transformation matrix comprises a rotation matrix, the rotation matrix is computed from camera pose information corresponding to the first video frame and camera pose information corresponding to the second video frame, and the camera pose information is obtained via the IMU.
  28. The camera according to claim 26, wherein the spatial transformation matrix comprises a homography matrix, and the processor is further configured to perform feature matching between the first video frame and the second video frame and compute the homography matrix from the matching result.
  29. The camera according to claim 28, wherein the matching result comprises a plurality of feature pairs between the first video frame and the second video frame;
    when computing the homography matrix from the matching result, the processor is configured to filter the plurality of feature pairs and compute the homography matrix from the reliable feature pairs obtained by the filtering.
  30. The camera according to claim 28, wherein when performing feature matching between the first video frame and the second video frame, the processor is configured to extract feature points from designated regions of the first video frame and the second video frame respectively, and perform feature matching on the extracted feature points.
  31. The camera according to claim 30, wherein the designated region comprises a background region excluding the moving subject.
  32. The camera according to claim 25, wherein when synthesizing the first target video frame from the mapped first video frame and the second video frame, the processor is configured to: extract the moving subject from the mapped first video frame to obtain a clone image; and synthesize, from the clone image and the second video frame, the target video frame corresponding to the second video frame.
  33. The camera according to claim 32, wherein when extracting the moving subject from the mapped first video frame, the processor is configured to process the mapped first video frame using a target mask corresponding to the moving subject.
  34. The camera according to claim 33, wherein the processor is further configured to perform moving-subject segmentation on the first video frame to obtain an original mask corresponding to the moving subject, and map the original mask into the space corresponding to the second video frame to obtain the target mask.
  35. The camera according to claim 33, wherein the processor is further configured to, before processing the mapped first video frame using the target mask, remove from the target mask the part that overlaps the moving subject in the second video frame.
  36. The camera according to claim 33, wherein the processor is further configured to, before processing the mapped first video frame using the target mask, blur the target mask.
  37. The camera according to claim 25, wherein the processor is further configured to: obtain a third video frame from the original video, wherein the moment corresponding to the third video frame is later than that of the second video frame, and the frame intervals between the first video frame, the second video frame, and the third video frame are the same; map the first target video frame into a space corresponding to the third video frame; and synthesize a second target video frame from the mapped first target video frame and the third video frame.
  38. The camera according to claim 24, wherein the original video is captured by the camera rotating in place.
  39. The camera according to claim 38, wherein the original video is captured by the camera rotating in place to follow the moving subject.
  40. The camera according to claim 24, wherein different dynamic clones have different transparency.
  41. The camera according to claim 40, wherein the number of frames by which the dynamic clone lags the moving subject is positively correlated with the transparency of the dynamic clone.
  42. The camera according to claim 24, wherein the clone-effect instruction comprises one or more of the following: a number of clones, a clone frame interval, and a clone transparency.
  43. The camera according to claim 24, wherein the clone-effect instruction is triggered by a user.
  44. The camera according to claim 24, wherein the original video is captured in real time after the clone-effect instruction is obtained.
  45. The camera according to claim 44, wherein the processor is further configured to determine, while the original video is being captured, whether a displacement of the camera in a world coordinate system is less than or equal to a preset threshold.
  46. The camera according to claim 24, wherein the original video is a clip selected by a user from captured video.
  47. A mobile terminal, comprising a processor and a memory storing a computer program;
    wherein the processor, when executing the computer program, implements the following steps:
    obtaining a clone-effect instruction;
    processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video comprises the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay.
  48. The mobile terminal according to claim 47, wherein when processing the original video in which the moving subject is captured, the processor is configured to: obtain a first video frame and a second video frame from the original video in which the moving subject is captured, wherein the moment corresponding to the first video frame is earlier than that of the second video frame; map the first video frame into a space corresponding to the second video frame; and synthesize a first target video frame from the mapped first video frame and the second video frame.
  49. The mobile terminal according to claim 48, wherein when mapping the first video frame into the space corresponding to the second video frame, the processor is configured to spatially transform the first video frame by means of a spatial transformation matrix, so as to map the first video frame into the space corresponding to the second video frame.
  50. The mobile terminal according to claim 49, wherein the spatial transformation matrix comprises a rotation matrix, and the rotation matrix is computed from camera pose information corresponding to the first video frame and camera pose information corresponding to the second video frame.
  51. The mobile terminal according to claim 49, wherein the spatial transformation matrix comprises a homography matrix, and the processor is further configured to perform feature matching between the first video frame and the second video frame and compute the homography matrix from the matching result.
  52. The mobile terminal according to claim 51, wherein the matching result comprises a plurality of feature pairs between the first video frame and the second video frame;
    when computing the homography matrix from the matching result, the processor is configured to filter the plurality of feature pairs and compute the homography matrix from the reliable feature pairs obtained by the filtering.
  53. The mobile terminal according to claim 51, wherein when performing feature matching between the first video frame and the second video frame, the processor is configured to extract feature points from designated regions of the first video frame and the second video frame respectively, and perform feature matching on the extracted feature points.
  54. The mobile terminal according to claim 53, wherein the designated region comprises a background region excluding the moving subject.
  55. The mobile terminal according to claim 48, wherein when synthesizing the first target video frame from the mapped first video frame and the second video frame, the processor is configured to: extract the moving subject from the mapped first video frame to obtain a clone image; and synthesize, from the clone image and the second video frame, the target video frame corresponding to the second video frame.
  56. The mobile terminal according to claim 55, wherein when extracting the moving subject from the mapped first video frame, the processor is configured to process the mapped first video frame using a target mask corresponding to the moving subject.
  57. The mobile terminal according to claim 56, wherein the processor is further configured to perform moving-subject segmentation on the first video frame to obtain an original mask corresponding to the moving subject, and map the original mask into the space corresponding to the second video frame to obtain the target mask.
  58. The mobile terminal according to claim 56, wherein the processor is further configured to, before processing the mapped first video frame using the target mask, remove from the target mask the part that overlaps the moving subject in the second video frame.
  59. The mobile terminal according to claim 56, wherein the processor is further configured to, before processing the mapped first video frame using the target mask, blur the target mask.
  60. The mobile terminal according to claim 48, wherein the processor is further configured to: obtain a third video frame from the original video, wherein the moment corresponding to the third video frame is later than that of the second video frame, and the frame intervals between the first video frame, the second video frame, and the third video frame are the same; map the first target video frame into a space corresponding to the third video frame; and synthesize a second target video frame from the mapped first target video frame and the third video frame.
  61. The mobile terminal according to claim 47, wherein the original video is captured by a camera rotating in place.
  62. The mobile terminal according to claim 61, wherein the original video is captured by the camera rotating in place to follow the moving subject.
  63. The mobile terminal according to claim 47, wherein different dynamic clones have different transparency.
  64. The mobile terminal according to claim 63, wherein the number of frames by which the dynamic clone lags the moving subject is positively correlated with the transparency of the dynamic clone.
  65. The mobile terminal according to claim 47, wherein the clone-effect instruction comprises one or more of the following: a number of clones, a clone frame interval, and a clone transparency.
  66. The mobile terminal according to claim 47, wherein the clone-effect instruction is triggered by a user.
  67. The mobile terminal according to claim 47, wherein the mobile terminal is equipped with a camera, and the original video is captured in real time by the camera after the clone-effect instruction is obtained.
  68. The mobile terminal according to claim 67, wherein the processor is further configured to determine, while the original video is being captured, whether a displacement of the camera in a world coordinate system is less than or equal to a preset threshold.
  69. The mobile terminal according to claim 47, wherein the original video is a clip selected by a user from captured video.
  70. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the image processing method according to any one of claims 1 to 23.
PCT/CN2020/107433 2020-08-06 2020-08-06 图像处理方法、相机及移动终端 WO2022027447A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/107433 WO2022027447A1 (zh) 2020-08-06 2020-08-06 图像处理方法、相机及移动终端
CN202080035108.8A CN113841112A (zh) 2020-08-06 2020-08-06 图像处理方法、相机及移动终端

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/107433 WO2022027447A1 (zh) 2020-08-06 2020-08-06 图像处理方法、相机及移动终端

Publications (1)

Publication Number Publication Date
WO2022027447A1 true WO2022027447A1 (zh) 2022-02-10

Family

ID=78963297

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/107433 WO2022027447A1 (zh) 2020-08-06 2020-08-06 图像处理方法、相机及移动终端

Country Status (2)

Country Link
CN (1) CN113841112A (zh)
WO (1) WO2022027447A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114302071B (zh) * 2021-12-28 2024-02-20 影石创新科技股份有限公司 视频处理方法、装置、存储介质及电子设备
CN114554280B (zh) * 2022-01-14 2024-03-19 影石创新科技股份有限公司 影分身视频的生成方法、生成装置、电子设备及存储介质


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105992068A (zh) * 2015-05-19 2016-10-05 乐视移动智能信息技术(北京)有限公司 一种视频文件预览方法及装置
CN111601033A (zh) * 2020-04-27 2020-08-28 北京小米松果电子有限公司 视频处理方法、装置及存储介质
CN113490050B (zh) * 2021-09-07 2021-12-17 北京市商汤科技开发有限公司 视频处理方法和装置、计算机可读存储介质及计算机设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110235998A1 (en) * 2010-03-25 2011-09-29 Disney Enterprises, Inc. Continuous freeze-frame video effect system and method
CN104125407A (zh) * 2014-08-13 2014-10-29 深圳市中兴移动通信有限公司 物体运动轨迹的拍摄方法和移动终端
CN106303291A (zh) * 2016-09-30 2017-01-04 努比亚技术有限公司 一种图片处理方法及终端
CN108259781A (zh) * 2017-12-27 2018-07-06 努比亚技术有限公司 视频合成方法、终端及计算机可读存储介质
CN110536087A (zh) * 2019-05-06 2019-12-03 珠海全志科技股份有限公司 电子设备及其运动轨迹照片合成方法、装置和嵌入式装置
CN111327840A (zh) * 2020-02-27 2020-06-23 努比亚技术有限公司 一种多帧特效视频获取方法、终端及计算机可读存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116229337A (zh) * 2023-05-10 2023-06-06 瀚博半导体(上海)有限公司 用于视频处理的方法、装置、系统、设备和介质
CN116229337B (zh) * 2023-05-10 2023-09-26 瀚博半导体(上海)有限公司 用于视频处理的方法、装置、系统、设备和介质
CN117152658A (zh) * 2023-05-10 2023-12-01 瀚博半导体(上海)有限公司 用于视频处理的方法、装置、系统、设备和介质

Also Published As

Publication number Publication date
CN113841112A (zh) 2021-12-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20948198

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20948198

Country of ref document: EP

Kind code of ref document: A1