WO2022012231A1 - Video generation method and apparatus, readable medium, and electronic device - Google Patents

Video generation method and apparatus, readable medium, and electronic device

Info

Publication number
WO2022012231A1
WO2022012231A1 (PCT/CN2021/099107)
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
target object
original image
sliding
Prior art date
Application number
PCT/CN2021/099107
Other languages
English (en)
French (fr)
Inventor
靳潇杰
沈晓辉
王妍
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 filed Critical 北京字节跳动网络技术有限公司
Priority to EP21841214.6A (EP4178194A4)
Publication of WO2022012231A1
Priority to US18/091,087 (US11836887B2)

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2621Cameras specially adapted for the electronic generation of special effects during image pickup, e.g. digital cameras, camcorders, video cameras having integrated special effects capability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/536Depth or shape recovery from perspective effects, e.g. by using vanishing points
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/69Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95Computational photography systems, e.g. light-field imaging systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95Computational photography systems, e.g. light-field imaging systems
    • H04N23/958Computational photography systems, e.g. light-field imaging systems for extended depth of field imaging
    • H04N23/959Computational photography systems, e.g. light-field imaging systems for extended depth of field imaging by adjusting depth of field during image capture, e.g. maximising or setting range based on scene characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20164Salient point detection; Corner detection

Definitions

  • the present disclosure relates to the field of computer technologies, and in particular, to a video generation method, apparatus, readable medium, and electronic device.
  • Sliding zoom, also known as the Hitchcock zoom, is a video shooting technique that changes the visual perspective relationship, compressing or enlarging the background space while the picture subject remains unchanged, so as to create a striking, cinematic lens feel.
  • In the prior art, for a shot video obtained without the Hitchcock zoom shooting technique, a Hitchcock zoom effect can be added by manually post-processing the footage: based on the already-shot video, the subject in the picture (for example, a person in the picture) is located manually, and the background other than the subject is then zoomed with keyframes. However, this manual processing is very complex, time-consuming, and inefficient.
  • In a first aspect, the present disclosure provides a video generation method, the method comprising: acquiring an original image corresponding to a target frame, and identifying a target object in the original image; performing sliding zoom processing on an initial background image other than the target object in the original image according to a sliding zoom strategy to obtain a target background image, wherein the sliding zoom strategy at least indicates a sliding direction and a zoom direction of the initial background image, the sliding direction being opposite to the zoom direction; superimposing the image of the target object on the target background image according to the position of the target object in the original image to obtain a target image corresponding to the target frame; and generating a target video based on the target image corresponding to the target frame.
  • In a second aspect, the present disclosure provides a video generation apparatus, the apparatus comprising:
  • a first acquisition module, configured to acquire the original image corresponding to the target frame and identify the target object in the original image;
  • a processing module, configured to perform sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image, wherein the sliding zoom strategy at least indicates the sliding direction and zoom direction of the initial background image, the sliding direction being opposite to the zoom direction;
  • a first generation module, configured to superimpose the image of the target object on the target background image according to the position of the target object in the original image to obtain the target image corresponding to the target frame;
  • a second generation module, configured to generate a target video based on the target image corresponding to the target frame.
  • In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements the steps of the method described in the first aspect of the present disclosure.
  • In a fourth aspect, the present disclosure provides an electronic device, comprising: a storage device on which a computer program is stored; and a processing device, configured to execute the computer program in the storage device to implement the steps of the method described in the first aspect of the present disclosure.
  • Through the above technical solution, the original image corresponding to the target frame is acquired, the target object in the original image is identified, sliding zoom processing is performed on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image, the image of the target object is superimposed on the target background image according to the position of the target object in the original image to obtain the target image corresponding to the target frame, and the target video is generated based on the target image corresponding to the target frame.
  • Thus, based on the image itself, a sliding zoom effect can be added to the image automatically, keeping the picture subject unchanged while changing the perspective relationship of the background, so that from a series of such images a video can be generated in which the subject is unchanged while the background is rapidly compressed or enlarged, i.e., a target video with a sliding zoom effect. In this way, a video with a sliding zoom effect can be obtained without a special shooting technique and without manual processing, and data processing efficiency is high.
  • FIG. 1 is a flowchart of a video generation method provided according to an embodiment of the present disclosure
  • FIGS. 2 and 3 are exemplary schematic diagrams of two frames of images in a target video;
  • FIG. 4 is an exemplary flowchart of the step, in the video generation method provided according to the present disclosure, of performing sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image;
  • FIG. 5 is a block diagram of a video generation apparatus provided according to an embodiment of the present disclosure.
  • FIG. 6 shows a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure.
  • the term "including" and variations thereof are open-ended inclusions, i.e., "including but not limited to".
  • the term “based on” is “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • In the field of audio/video processing, audio/video editing generally has a three-layer structure: the business layer (front end), the SDK layer (middle platform), and the algorithm layer (back end), where SDK is the abbreviation of Software Development Kit.
  • The business layer, i.e., the client, is responsible for receiving user operations; the SDK layer is responsible for data transfer, for example passing the data to be processed to the algorithm layer, obtaining the processing results of the algorithm layer, and further processing the data according to the obtained results.
  • For example, the SDK layer can be responsible for audio/video frame extraction, encoding/decoding, and transfer, and data processing strategies can also be set in the SDK layer; the algorithm layer is responsible for processing the data passed in by the SDK layer and outputting the obtained processing results to the SDK layer.
  • The method provided by the present disclosure is mainly applied to video generation scenarios (that is, generating a video with a sliding zoom effect); the related algorithms used in the present disclosure are integrated in the algorithm layer, the data processing steps of the method can be performed by the SDK layer (middle platform), and the final processing result (for example, the target video) can be displayed on the client.
  • FIG. 1 is a flowchart of a video generation method provided according to an embodiment of the present disclosure. As shown in FIG. 1, the method may include the following steps:
  • Step 11: acquire the original image corresponding to the target frame, and identify the target object in the original image;
  • Step 12: perform sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image;
  • Step 13: superimpose the image of the target object on the target background image according to the position of the target object in the original image to obtain the target image corresponding to the target frame;
  • Step 14: generate a target video based on the target image corresponding to the target frame.
  • Steps 11 to 13 describe the process of generating a target image by performing sliding zoom processing on one image (that is, the original image corresponding to a target frame). In practical applications, steps 11 to 13 are applied to multiple target frames to generate their respective target images, and the target video is composed from these target images according to the time order of their corresponding target frames.
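  • For illustration only, the following is a minimal Python/OpenCV sketch of this frame-by-frame flow for the single-image case, using a simple linear zoom schedule; the precomputed subject mask, the inpainting fill, and the center-based background scaling are stand-ins for the identification, background supplementation, and sliding zoom steps explained in the sections below, not the disclosed implementation itself.

```python
import cv2

def sliding_zoom_video(original, mask, out_path, n_frames=60, fps=30.0,
                       max_scale=1.6):
    """Simplified steps 11-14 for one original image (BGR, uint8) and a
    precomputed subject mask (uint8, 255 = target object): each output frame
    keeps the subject unchanged while the background is progressively scaled
    about the image center, which stands in for the perspective point."""
    h, w = original.shape[:2]
    # Initial background image: fill the area covered by the subject so the
    # background is complete (cf. the background supplementation section).
    background = cv2.inpaint(original, mask, 3, cv2.INPAINT_TELEA)
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    for i in range(n_frames):
        s = 1.0 + (max_scale - 1.0) * i / (n_frames - 1)   # zoom factor
        zoomed = cv2.resize(background, None, fx=s, fy=s)  # step 12 (simplified)
        y0 = (zoomed.shape[0] - h) // 2
        x0 = (zoomed.shape[1] - w) // 2
        frame = zoomed[y0:y0 + h, x0:x0 + w].copy()
        frame[mask > 0] = original[mask > 0]  # step 13: superimpose subject
        writer.write(frame)                   # step 14: accumulate frames
    writer.release()
```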
  • After the original image corresponding to the target frame is acquired, the target object in the original image is first identified, that is, the picture subject in the original image is identified. Here, what counts as the picture subject (i.e., the target object) may be preset. For example, if a person is set as the target object, identifying the target object in the original image is actually identifying the person in the original image. As another example, if the object occupying the largest proportion of the picture is set as the target object, identifying the target object in the original image is actually identifying the object that occupies the largest proportion of the picture in the original image.
  • As noted above, a video with a sliding zoom effect keeps the picture subject in the video unchanged while changing the perspective relationship of the background. Two kinds of processing are needed here: one is keeping the picture subject unchanged, and the other is changing the perspective relationship of the background.
  • In the first respect, keeping the picture subject unchanged requires the size and position of the target object to be the same in the target image corresponding to every target frame of the same target video. In other words, the target object has a desired position in the picture, and the target object in every image of the finally obtained target video with the sliding zoom effect should be at this desired position.
  • For example, the desired position can be selected manually, e.g., as the position of the center of the picture. As another example, the desired position can be determined from the original image corresponding to each target frame, e.g., the position of the target object in the original image corresponding to the earliest target frame is taken as the desired position of the target object throughout the target video.
  • In the second respect, the perspective relationship of the background needs to be changed: the initial background image other than the target object in the original image is subjected to sliding zoom processing according to the sliding zoom strategy to obtain the target background image.
  • Here, the sliding zoom strategy is used to change the perspective relationship of the initial background image in the original image, that is, zoom processing is performed while the picture is pushed forward or pulled back. When the original image is shot, it corresponds to a shooting position, i.e., a lens position; pushing the picture forward or pulling it back simulates the picture that could be captured by moving the lens forward or backward in the three-dimensional image space corresponding to the original image, which mainly changes the perspective relationship of the initial background image, while zooming enlarges or shrinks the picture.
  • The sliding zoom strategy at least indicates the sliding direction and the zoom direction of the initial background image, and the sliding direction and the zoom direction are opposite. Image processing that involves perspective relationships is generally performed relative to a perspective point.
  • The sliding direction can be toward the perspective point or away from it. Sliding toward the perspective point means the picture advances toward the target object (equivalent to moving the lens closer to the target object); sliding away from the perspective point means the picture recedes from the target object (equivalent to moving the lens away from it).
  • The zoom direction can likewise be toward or away from the perspective point: zooming toward the perspective point means the field of view narrows (the focal length increases), while zooming away from the perspective point means the field of view widens (the focal length decreases).
  • As noted, in the sliding zoom strategy the sliding direction and the zoom direction are opposite. For example, if the sliding direction is toward the perspective point, the zoom direction is away from the perspective point.
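  • The opposition between sliding and zooming can be made concrete with pinhole-camera arithmetic: a subject of width W at distance d from a lens of focal length f projects to a size proportional to f·W/d, so keeping f/d constant while the lens slides leaves the subject unchanged and alters only the background perspective. A small sketch under this pinhole assumption:

```python
def dolly_zoom_focal_length(f0, d0, d1):
    """Pinhole dolly-zoom relation: the subject's projected size is
    proportional to f / d, so when the subject distance changes from d0
    to d1, scaling the focal length by d1 / d0 keeps the subject size fixed."""
    return f0 * d1 / d0

# Sliding toward the subject (d shrinks) forces the focal length to shrink
# (zooming out): the sliding direction and zoom direction are opposite.
print(dolly_zoom_focal_length(50.0, 4.0, 2.0))  # 25.0: 50 mm -> 25 mm
```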
  • According to the sliding zoom strategy, sliding zoom processing can be performed on the initial background image other than the target object in the original image to obtain the target background image. Further, the target image corresponding to the target frame can be obtained by superimposing the image of the target object on the target background image according to the position of the target object in the original image.
  • As mentioned above, the target object has a desired position in the picture; therefore, if this desired position does not match the position of the target object in the original image (for example, the positions differ), the image of the target object needs further processing so that the processed image matches the desired position. For example, if the desired position is the center of the picture occupying 50% of it, and the target object in the original image is at the center of the picture but occupies only 25% of it, the image of the target object must be enlarged so that the processed image is at the center of the picture and occupies 50% of it.
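  • A minimal sketch of this matching-and-superimposing step, assuming the subject has already been segmented into an RGBA cutout whose alpha channel is its mask; the desired-position and coverage arguments are illustrative, not fixed by the disclosure:

```python
import cv2
import numpy as np

def paste_subject(background, subject_rgba, desired_center, desired_coverage):
    """Scale the subject cutout so it covers `desired_coverage` of the frame
    area, then alpha-blend it onto the background centered at
    `desired_center` (x, y) in pixels."""
    h, w = background.shape[:2]
    sh, sw = subject_rgba.shape[:2]
    scale = np.sqrt(desired_coverage * h * w / (sh * sw))
    subject = cv2.resize(subject_rgba,
                         (max(1, int(sw * scale)), max(1, int(sh * scale))))
    sh, sw = subject.shape[:2]
    x0, y0 = int(desired_center[0] - sw / 2), int(desired_center[1] - sh / 2)
    # Clip the paste region to the frame boundaries.
    x1, y1, x2, y2 = max(x0, 0), max(y0, 0), min(x0 + sw, w), min(y0 + sh, h)
    region = subject[y1 - y0:y2 - y0, x1 - x0:x2 - x0]
    alpha = region[..., 3:4].astype(np.float32) / 255.0
    out = background.copy()
    out[y1:y2, x1:x2] = (alpha * region[..., :3] +
                         (1 - alpha) * out[y1:y2, x1:x2]).astype(out.dtype)
    return out
```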
  • Following the above processing, the target video can be generated from the target images corresponding to the target frames. In practice there may be multiple target frames, that is, respective target images are generated for multiple target frames, so that the target video with the sliding zoom effect is generated from the target images corresponding to the target frames in the time order of the target frames. The position of the target object is the same in every frame of the target video, and every frame of the target video has the same size.
  • By way of example, FIG. 2 and FIG. 3 are schematic diagrams of the effect of the video generation method provided by the present disclosure. FIG. 2 is an earlier video frame in the target video, and FIG. 3 is a later one; in both figures, T is the target object and the dotted lines indicate the background.
  • As can be seen, the position and size of the target object do not change between FIG. 2 and FIG. 3, while compared with FIG. 2 the background in FIG. 3 is pulled closer: visually, FIG. 3 is closer to the background behind T, as if a sliding zoom toward the rear of the target object T had been performed on the basis of FIG. 2, changing the perspective relationship of the background. With FIG. 2 and FIG. 3 as reference, the other video frames in the target video behave similarly, so that the target video achieves a sliding zoom effect.
  • Through the above technical solution, the original image corresponding to the target frame is acquired, the target object in the original image is identified, sliding zoom processing is performed on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image, the image of the target object is superimposed on the target background image according to the position of the target object in the original image to obtain the target image corresponding to the target frame, and the target video is generated based on the target image corresponding to the target frame.
  • Thus, based on the image itself, a sliding zoom effect can be added to the image automatically, keeping the picture subject unchanged while changing the perspective relationship of the background, so that from a series of such images a video can be generated in which the subject is unchanged while the background is rapidly compressed or enlarged, i.e., a target video with a sliding zoom effect. In this way, a video with a sliding zoom effect can be obtained without a special shooting technique and without manual processing, and data processing efficiency is high.
  • In one possible implementation, acquiring the original image corresponding to the target frame may include the following step: acquiring the original image corresponding to the target frame from a to-be-processed media file, where the to-be-processed media file is an image or a video containing the target object.
  • This implementation amounts to post-processing a stored image or video, that is, performing post-processing based on an image or video that has already been shot to obtain a video with a sliding zoom effect.
  • If the to-be-processed media file is an image containing the target object, only one existing image is processed: the original image corresponding to every target frame is that to-be-processed media file, and every target frame obtains the same original image. The solution of the present disclosure then amounts to performing sliding zoom on the background based on a single image, generating multiple target images corresponding to different target frames, and synthesizing them into a target video.
  • If the to-be-processed media file is a video containing the target object, the time order of the target frames can follow the order of the video frames in the to-be-processed media file; for example, original images are obtained from the to-be-processed media file in forward (or reverse) order. In forward order, an earlier video frame in the to-be-processed media file corresponds to an earlier target frame; conversely, in reverse order, a later video frame in the to-be-processed media file corresponds to an earlier target frame.
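  • As a small sketch of this ordering choice, assuming the to-be-processed media file is a video on disk and OpenCV is used for decoding:

```python
import cv2

def original_images(path, reverse=False):
    """Decode all frames of the to-be-processed video; the target frames then
    take these frames in forward or reverse time order."""
    cap = cv2.VideoCapture(path)
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()
    return frames[::-1] if reverse else frames
```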
  • In another possible implementation, acquiring the original image corresponding to the target frame may include the following step: acquiring the original image corresponding to the target frame from an information stream collected in real time by an image collection device.
  • This implementation amounts to real-time processing of an information stream acquired in real time, that is, while an image or video is being shot, the original image of the target frame is acquired in real time and steps 11 to 13 are performed to obtain the target image corresponding to the target frame, so that the target video can be generated subsequently.
  • In one possible implementation, identifying the target object in the original image may include the following step: identifying the target object in the original image through a pre-trained target object recognition model.
  • The target object recognition model is used to recognize the target object in an image, for example, to recognize the outline of the target object, or to recognize the position corresponding to the target object in the image, and so on. The target object recognition model is equivalent to a classification model, that is, it identifies from the image the pixels belonging to the target object category.
  • By way of example, the target object classification model can be obtained as follows: acquire multiple sets of training data, where each set of training data includes a training image and label information indicating whether each pixel in the training image belongs to the target object; and train a neural network model on the multiple sets of training data to obtain the target object classification model.
  • In each training pass, a training image from one set of training data is used as the input data, and the label information of that training image is used as the ground-truth output. The actual output of the neural network model for the input training image is compared with the ground-truth output, and the comparison result (for example, a loss value computed from the two) is applied to the neural network model to adjust its internal parameters. This is repeated until the condition for stopping training is met (for example, the number of training passes reaches a certain count, or the loss value is smaller than a certain threshold, etc.), and the resulting model is used as the target object classification model.
  • Model training methods are common knowledge in the art; the above is only an illustrative example, and other feasible implementations are not enumerated one by one in the present disclosure.
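  • As an illustrative sketch of such per-pixel classification training, assuming a PyTorch setup; the toy network and the data loader interface here are placeholders, not the disclosed model:

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy network mapping an RGB image to a per-pixel logit indicating
    whether each pixel belongs to the target object category."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1))

    def forward(self, x):
        return self.net(x)

def train(model, loader, epochs=10, lr=1e-3):
    """`loader` yields (image, mask): image is float (B, 3, H, W); mask is
    float (B, 1, H, W), 1 where a pixel belongs to the target object."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()            # per-pixel binary classification
    for _ in range(epochs):
        for image, mask in loader:
            opt.zero_grad()
            loss = loss_fn(model(image), mask)  # actual vs. labeled output
            loss.backward()                     # apply the comparison result
            opt.step()                          # adjust internal parameters
    return model
```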
  • In another possible implementation, identifying the target object in the original image may include the following step: identifying the target object in the original image through historical position information corresponding to the target object, where the historical position information is obtained by performing motion tracking on images preceding the original image.
  • From the historical position information, the position of the target object in the original image can be inferred. For example, the movement trend of the target object can be determined from its historical position information; then, taking as the starting point the position of the target object in a reference image that precedes the original image and is closest to it in time, the change of the target object's position in the original image relative to the reference image is determined from the movement trend in the historical position information and the difference between the shooting times of the reference image and the original image, so that the position of the target object in the original image can be inferred and the target object in the original image identified.
  • In this way, the position of the target object in the original image can be determined directly through historical motion tracking of the target object so as to identify the target object in the original image, and the amount of data processing is small.
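  • A minimal sketch of this inference, assuming the historical position information is stored as (timestamp, x, y) observations and a constant movement trend between the last two observations:

```python
def predict_position(history, t_query):
    """Infer the target object's position at time `t_query` by linear
    extrapolation: start from the most recent reference position, estimate
    the movement trend (velocity) from the last two tracked observations,
    and advance it by the shooting-time difference."""
    (t0, x0, y0), (t1, x1, y1) = history[-2], history[-1]  # needs >= 2 points
    vx, vy = (x1 - x0) / (t1 - t0), (y1 - y0) / (t1 - t0)
    dt = t_query - t1
    return x1 + vx * dt, y1 + vy * dt
```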
  • The step, in step 12, of performing sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image is described in detail below.
  • In one possible implementation, step 12 may include the following steps, as shown in FIG. 4:
  • Step 41: determine the perspective point of the initial background image;
  • Step 42: determine the sliding direction, sliding distance, zoom direction, and zoom distance of the initial background image according to the identified initial position of the target object in the original image and the desired position of the target object in the picture;
  • Step 43: slide the picture of the initial background image by the sliding distance along the sliding direction, centered on the perspective point;
  • Step 44: zoom the initial background image by the zoom distance along the zoom direction, centered on the perspective point.
  • When processing images, especially image processing involving perspective relationships, a perspective point in the image is generally needed; therefore, the perspective point of the initial background image is determined first, in preparation for the subsequent sliding zoom processing.
  • In one possible implementation, step 41 may include the following step: determining the center point of the original image as the perspective point.
  • Since the picture subject is generally placed at the center of the picture when an image or video is shot, which conforms to ordinary one-point perspective, and the vanishing point of the background is generally also at the center of the picture, determining the image center point as the perspective point of the sliding zoom is a fast and relatively reliable way to determine the perspective point.
  • In another possible implementation, step 41 may include the following steps: acquiring depth information of the original image; determining the vanishing point position in the original image according to the depth information; and using the vanishing point position as the perspective point.
  • The depth information hidden in a two-dimensional image can help in understanding the three-dimensional structure of the image scene; recovering the depth information of a two-dimensional image is a well-known technique that is widely used in the field, for example via shading, illumination, geometric analysis, feature learning, and other means, and the specific methods are not described at length here.
  • Therefore, based on the original image, its depth information can be obtained, and the vanishing point position in the original image can then be determined according to the depth information; further, the vanishing point position can be used directly as the perspective point. The three-dimensional image space corresponding to the original image can be constructed based on the depth information of the original image, and the vanishing point position can then be obtained with vanishing point identification methods commonly used in the prior art.
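  • One crude way to realize this, assuming a monocular depth map has already been estimated for the original image (e.g., by an off-the-shelf depth estimator), is to take the farthest smoothed depth location as the vanishing point; this is only a sketch of the idea, not the method fixed by the disclosure:

```python
import cv2
import numpy as np

def vanishing_point_from_depth(depth):
    """Pick the location of maximum smoothed depth as the vanishing point.
    `depth` is a float32 HxW map where larger values mean farther away."""
    smoothed = cv2.GaussianBlur(depth, (31, 31), 0)  # suppress noisy maxima
    y, x = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return int(x), int(y)  # used as the perspective point
```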
  • To perform sliding zoom processing on the original image, not only the perspective point but also the sliding processing method and the zoom processing method need to be determined. The sliding processing method may include a sliding direction and a sliding distance, and the zoom processing method may include a zoom direction and a zoom distance. The meanings of the sliding direction and the zoom direction have been detailed above and are not repeated here. The sliding distance is the distance the lens should move forward or backward in the three-dimensional image space corresponding to the original image, and the zoom distance is the amount by which the focal length changes. Therefore, step 42 is performed next to determine the sliding direction, sliding distance, zoom direction, and zoom distance of the initial background image.
  • In one possible implementation, step 42 may include the following steps: determining the first coordinate at which the initial position is located in the three-dimensional image space corresponding to the original image; determining the second coordinate at which the desired position is located in the three-dimensional image space; and determining the sliding direction, the sliding distance, the zoom direction, and the zoom distance according to the first coordinate and the second coordinate.
  • As described above, the three-dimensional image space corresponding to the original image can be constructed based on the original image. One coordinate, i.e., the first coordinate, can be determined from the initial position of the target object in the original image, and another coordinate, i.e., the second coordinate, can be determined from the desired position of the target object.
  • The purpose of the sliding zoom is to place the target object at the desired position, while the background changes accordingly, that is, the picture that can be shot from the first coordinate changes to the picture that can be shot from the second coordinate. Thus, according to the first coordinate and the second coordinate, the sliding direction, the sliding distance, the zoom direction, and the zoom distance can be determined. For example, the sliding direction may be the direction toward the perspective point, the sliding distance the distance from the first coordinate to the second coordinate, the zoom direction the direction away from the perspective point, and the zoom distance the change in the lens focal length from the first coordinate to the second coordinate.
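  • A sketch of deriving the four quantities from the two coordinates, under one possible reading in which the coordinates are treated as lens positions along the viewing axis (z) toward the perspective point, reusing the pinhole dolly-zoom relation above; the direction labels are illustrative:

```python
import numpy as np

def sliding_zoom_parameters(first, second, f0):
    """Derive (slide_direction, slide_distance, zoom_direction, zoom_distance)
    from the first coordinate (initial position) and the second coordinate
    (desired position), each (x, y, z) with z the distance to the perspective
    point, and f0 the starting focal length."""
    first, second = np.asarray(first, float), np.asarray(second, float)
    slide_distance = float(np.linalg.norm(second - first))
    toward = second[2] < first[2]             # depth shrinks: sliding inward
    slide_direction = "toward_perspective_point" if toward else "away"
    f1 = f0 * second[2] / first[2]            # keeps the subject size fixed
    zoom_direction = "away" if toward else "toward_perspective_point"
    return slide_direction, slide_distance, zoom_direction, abs(f1 - f0)
```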
  • After these parameters are determined, sliding zoom processing can be performed on the initial background image on this basis, that is, steps 43 and 44 are performed. It should be noted that the present disclosure does not strictly limit the execution order of step 43 and step 44; the two may be executed simultaneously or sequentially.
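  • A sketch of steps 43 and 44 in image space, approximating both the picture slide and the picture zoom as scalings centered on the perspective point (their product being the net dolly-zoom warp on a flat background); the factor arguments are illustrative:

```python
import cv2
import numpy as np

def slide_then_zoom(background, perspective_point, slide_factor, zoom_factor):
    """Apply steps 43 and 44 about the perspective point: each step is modeled
    as a similarity transform centered on the point, e.g. slide_factor > 1
    pushes the picture in while zoom_factor < 1 zooms out (opposite
    directions), so only their product s acts on the flat background."""
    h, w = background.shape[:2]
    px, py = perspective_point
    s = slide_factor * zoom_factor
    # Affine matrix scaling by s around (px, py): x' = s*x + (1 - s)*px.
    M = np.float32([[s, 0, (1 - s) * px],
                    [0, s, (1 - s) * py]])
    return cv2.warpAffine(background, M, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REPLICATE)
```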
  • In one possible implementation, the method provided by the present disclosure may further include the following steps: removing the target object from the original image to obtain a first background image; acquiring a second background image located in the area covered by the target object in the original image; and supplementing the second background image into the first background image according to the position of the area covered by the target object to obtain the initial background image.
  • The target object is removed from the original image to obtain the first background image; since the target object covered part of the background, the first background image is partially missing, and the missing part can be supplemented. Accordingly, the second background image located in the area covered by the target object in the original image can be acquired, and the second background image can be supplemented into the first background image according to the position of the area covered by the target object to obtain the initial background image.
  • To acquire the second background image located in the area covered by the target object in the original image, various methods may be adopted. For example, some images for background supplementation may be preset, and the second background image is obtained directly from these images. As another example, a partial image may be extracted from the first background image as the second background image. As yet another example, based on the first background image, image filling can be performed on the missing part (for example, with an existing image restoration or image filling algorithm), and the filled image content can be used as the second background image.
  • In this way, the obtained initial background image is complete, and the target background image obtained after sliding zoom processing based on this initial background image will also be complete. Accordingly, the target image is also complete, with no missing background, which ensures the information integrity of the image.
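  • A minimal sketch of obtaining a complete initial background image, using OpenCV's standard inpainting as one of the filling options listed above:

```python
import cv2

def initial_background(original, subject_mask):
    """Removing the target object from the original image leaves the first
    background image with a hole where the object was; inpainting fills
    exactly those masked pixels from their surroundings, i.e. it supplements
    the second background image into the first at the covered area."""
    return cv2.inpaint(original, subject_mask, 5, cv2.INPAINT_NS)
```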
  • FIG. 5 is a block diagram of a video generating apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus 50 may include:
  • the first acquisition module 51 is used to acquire the original image corresponding to the target frame, and identify the target object in the original image;
  • the processing module 52 is configured to perform sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image, where the sliding zoom strategy at least indicates the sliding direction and zoom direction of the initial background image, the sliding direction being opposite to the zoom direction;
  • the first generation module 53 is used to superimpose the image of the target object on the target background image according to the position of the target object in the original image, to obtain the target image corresponding to the target frame;
  • the second generating module 54 is configured to generate a target video based on the target image corresponding to the target frame.
  • Optionally, the first acquisition module 51 includes:
  • a first identification sub-module for identifying the target object in the original image through a pre-trained target object recognition model
  • the second identification sub-module is configured to identify the target object in the original image through historical position information corresponding to the target object, where the historical position information is obtained by performing motion tracking on the image before the original image.
  • processing module 52 includes:
  • a first determination submodule used for determining the perspective point of the initial background image
  • the second determination sub-module is configured to determine the sliding direction, sliding distance, zoom direction, and zoom distance of the initial background image according to the identified initial position of the target object in the original image and the desired position of the target object in the picture;
  • the third determination sub-module is configured to slide the picture of the initial background image by the sliding distance along the sliding direction, centered on the perspective point;
  • the fourth determination sub-module is configured to zoom the initial background image by the zoom distance along the zoom direction, centered on the perspective point.
  • the first determination submodule is configured to determine the center point of the original image as the perspective point
  • the first determination submodule is used for: acquiring depth information of the original image; determining a vanishing point position in the original image according to the depth information; and using the vanishing point position as the perspective point.
  • Optionally, the second determination sub-module is configured to: determine the first coordinate at which the initial position is located in the three-dimensional image space corresponding to the original image; determine the second coordinate at which the desired position is located in the three-dimensional image space; and determine the sliding direction, the sliding distance, the zoom direction, and the zoom distance according to the first coordinate and the second coordinate.
  • the device further includes:
  • an image removal module for removing the target object from the original image to obtain a first background image
  • a second acquisition module configured to acquire a second background image located in the area covered by the target object in the original image
  • a supplementing module configured to supplement the second background image into the first background image according to the position of the area covered by the target object to obtain the initial background image.
  • Optionally, the first acquisition module 51 is configured to acquire the original image corresponding to the target frame from the to-be-processed media file, where the to-be-processed media file is an image or video containing the target object;
  • the first acquisition module 51 is configured to acquire the original image corresponding to the target frame from the information stream acquired by the image acquisition device in real time.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., in-vehicle navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • An electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604 .
  • The following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 608 including, for example, magnetic tape and hard disks; and a communication device 609.
  • Communication means 609 may allow electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 6 shows electronic device 600 having various means, it should be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
  • According to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • The client and server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • The above-mentioned computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire the original image corresponding to the target frame, and identify the target object in the original image; perform sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image, where the sliding zoom strategy at least indicates the sliding direction and zoom direction of the initial background image, the sliding direction being opposite to the zoom direction; superimpose the image of the target object on the target background image according to the position of the target object in the original image to obtain the target image corresponding to the target frame; and generate a target video based on the target image corresponding to the target frame.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or in combinations of dedicated hardware and computer instructions.
  • The modules involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not, in some cases, constitute a limitation on the module itself.
  • For example, the first acquisition module can also be described as "a module for acquiring the original image corresponding to the target frame and identifying the target object in the original image".
  • For example, and without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • According to one or more embodiments of the present disclosure, a video generation method is provided, comprising: acquiring the original image corresponding to the target frame, and identifying the target object in the original image; performing sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image, where the sliding zoom strategy at least indicates the sliding direction and zoom direction of the initial background image, the sliding direction being opposite to the zoom direction; superimposing the image of the target object on the target background image according to the position of the target object in the original image to obtain the target image corresponding to the target frame; and generating a target video based on the target image corresponding to the target frame.
  • According to one or more embodiments of the present disclosure, a video generation method is provided, wherein identifying the target object in the original image includes: identifying the target object in the original image through a pre-trained target object recognition model; or identifying the target object in the original image through historical position information corresponding to the target object, the historical position information being obtained by performing motion tracking on images preceding the original image.
  • According to one or more embodiments of the present disclosure, a video generation method is provided, wherein performing sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image includes: determining the perspective point of the initial background image; determining the sliding direction, sliding distance, zoom direction, and zoom distance of the initial background image according to the identified initial position of the target object in the original image and the desired position of the target object in the picture; sliding the picture of the initial background image by the sliding distance along the sliding direction, centered on the perspective point; and zooming the initial background image by the zoom distance along the zoom direction, centered on the perspective point.
  • According to one or more embodiments of the present disclosure, a video generation method is provided, wherein determining the perspective point of the initial background image comprises: determining the center point of the original image as the perspective point; or acquiring depth information of the original image, determining the vanishing point position in the original image according to the depth information, and using the vanishing point position as the perspective point.
  • According to one or more embodiments of the present disclosure, a video generation method is provided, wherein determining the sliding direction, sliding distance, zoom direction, and zoom distance of the initial background image according to the identified initial position of the target object in the original image and the desired position of the target object in the picture includes: determining the first coordinate at which the initial position is located in the three-dimensional image space corresponding to the original image; determining the second coordinate at which the desired position is located in the three-dimensional image space; and determining the sliding direction, the sliding distance, the zoom direction, and the zoom distance according to the first coordinate and the second coordinate.
  • According to one or more embodiments of the present disclosure, a video generation method is provided, the method further comprising: removing the target object from the original image to obtain a first background image; acquiring a second background image located in the area covered by the target object in the original image; and supplementing the second background image into the first background image according to the position of the area covered by the target object to obtain the initial background image.
  • According to one or more embodiments of the present disclosure, a video generation method is provided, wherein acquiring the original image corresponding to the target frame includes: acquiring the original image corresponding to the target frame from a to-be-processed media file, the to-be-processed media file being an image or video containing the target object; or acquiring the original image corresponding to the target frame from an information stream collected in real time by an image collection device.
  • According to one or more embodiments of the present disclosure, a video generation apparatus is provided, comprising:
  • a first acquisition module, configured to acquire the original image corresponding to the target frame and identify the target object in the original image;
  • a processing module, configured to perform sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image, where the sliding zoom strategy at least indicates the sliding direction and zoom direction of the initial background image, the sliding direction being opposite to the zoom direction;
  • a first generation module, configured to superimpose the image of the target object on the target background image according to the position of the target object in the original image to obtain the target image corresponding to the target frame;
  • a second generation module, configured to generate a target video based on the target image corresponding to the target frame.
  • According to one or more embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, where the program, when executed by a processing device, implements the steps of the video generation method described in any embodiment of the present disclosure.
  • According to one or more embodiments of the present disclosure, an electronic device is provided, comprising: a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement the steps of the video generation method described in any embodiment of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The present disclosure relates to a video generation method and apparatus, a readable medium, and an electronic device. The method includes: acquiring an original image corresponding to a target frame, and identifying a target object in the original image; performing sliding zoom processing on an initial background image other than the target object in the original image according to a sliding zoom strategy to obtain a target background image, where the sliding zoom strategy at least indicates a sliding direction and a zoom direction of the initial background image, the sliding direction being opposite to the zoom direction; superimposing the image of the target object on the target background image according to the position of the target object in the original image to obtain a target image corresponding to the target frame; and generating a target video based on the target image corresponding to the target frame. In this way, a video with a sliding zoom effect can be obtained without a special shooting technique and without manual processing, and data processing efficiency is high.

Description

Video generation method and apparatus, readable medium, and electronic device
This application claims priority to Chinese Patent Application No. 202010694518.5, titled "Video generation method and apparatus, readable medium, and electronic device", filed with the China National Intellectual Property Administration on July 17, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video generation method and apparatus, a readable medium, and an electronic device.
Background
Sliding zoom, also known as the Hitchcock zoom, is a video shooting technique that changes the visual perspective relationship, compressing or enlarging the background space while the picture subject remains unchanged, so as to create a striking, cinematic lens feel. In the prior art, for a shot video obtained without the Hitchcock zoom shooting technique, a Hitchcock zoom effect can be added by manually post-processing the footage: based on the already-shot video, the subject in the picture (for example, a person in the picture) is located manually, and the background other than the subject is then zoomed with keyframes. However, this manual processing is very complex, time-consuming, and inefficient.
Summary
This Summary is provided to introduce concepts in a brief form that are described in detail in the Detailed Description below. This Summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a video generation method, the method comprising:
acquiring an original image corresponding to a target frame, and identifying a target object in the original image;
performing sliding zoom processing on an initial background image other than the target object in the original image according to a sliding zoom strategy to obtain a target background image, wherein the sliding zoom strategy at least indicates a sliding direction and a zoom direction of the initial background image, the sliding direction being opposite to the zoom direction;
superimposing the image of the target object on the target background image according to the position of the target object in the original image to obtain a target image corresponding to the target frame;
generating a target video based on the target image corresponding to the target frame.
In a second aspect, the present disclosure provides a video generation apparatus, the apparatus comprising:
a first acquisition module, configured to acquire an original image corresponding to a target frame and identify a target object in the original image;
a processing module, configured to perform sliding zoom processing on an initial background image other than the target object in the original image according to a sliding zoom strategy to obtain a target background image, wherein the sliding zoom strategy at least indicates a sliding direction and a zoom direction of the initial background image, the sliding direction being opposite to the zoom direction;
a first generation module, configured to superimpose the image of the target object on the target background image according to the position of the target object in the original image to obtain a target image corresponding to the target frame;
a second generation module, configured to generate a target video based on the target image corresponding to the target frame.
In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements the steps of the method described in the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device, comprising:
a storage device on which a computer program is stored;
a processing device, configured to execute the computer program in the storage device to implement the steps of the method described in the first aspect of the present disclosure.
Through the above technical solution, the original image corresponding to the target frame is acquired, the target object in the original image is identified, sliding zoom processing is performed on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image, the image of the target object is superimposed on the target background image according to the position of the target object in the original image to obtain the target image corresponding to the target frame, and the target video is generated based on the target image corresponding to the target frame. Thus, based on the image itself, a sliding zoom effect can be added to the image automatically, keeping the picture subject unchanged while changing the perspective relationship of the background, so that from a series of such images a video can be generated in which the subject is unchanged while the background is rapidly compressed or enlarged, i.e., a target video with a sliding zoom effect. In this way, a video with a sliding zoom effect can be obtained without a special shooting technique and without manual processing, and data processing efficiency is high.
Other features and advantages of the present disclosure will be described in detail in the Detailed Description below.
Brief Description of the Drawings
FIG. 1 is a flowchart of a video generation method provided according to an embodiment of the present disclosure;
FIGS. 2 and 3 are exemplary schematic diagrams of two frames of images in a target video;
FIG. 4 is an exemplary flowchart of the step, in the video generation method provided according to the present disclosure, of performing sliding zoom processing on the initial background image other than the target object in the original image according to the sliding zoom strategy to obtain the target background image;
FIG. 5 is a block diagram of a video generation apparatus provided according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit performing the steps shown. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order of the functions performed by these apparatuses, modules, or units or their interdependence.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
In the field of audio/video processing, audio/video editing generally has a three-layer structure: the business layer (front end), the SDK layer (middle platform), and the algorithm layer (back end), where SDK is the abbreviation of Software Development Kit. The business layer, i.e., the client, is responsible for receiving user operations; the SDK layer is responsible for data transfer, for example passing the data to be processed to the algorithm layer to obtain the processing results of the algorithm layer, and further processing the data according to the obtained results; for example, the SDK layer can be responsible for audio/video frame extraction, encoding/decoding, and transfer, and data processing strategies can also be set in the SDK layer; the algorithm layer is responsible for processing the data passed in by the SDK layer and outputting the obtained processing results to the SDK layer.
The method provided by the present disclosure is mainly applied to video generation scenarios (i.e., generating a video with a sliding zoom effect); the related algorithms used in the present disclosure are integrated in the algorithm layer, the data processing steps of the method provided by the present disclosure can be performed by the SDK layer (middle platform), and the final processing result (for example, the target video) can be displayed on the client.
图1是根据本公开的一种实施方式提供的视频生成方法的流程图。如图1所示,该方法可以包括以下步骤:
在步骤11中,获取目标帧对应的原始图像,并识别出原始图像中的目标对象;
在步骤12中,根据滑动变焦策略对原始图像中除目标对象外的初始背景图像进行滑动变焦处理,得到目标背景图像;
在步骤13中,按照目标对象在原始图像中的位置,将目标对象的图像叠加在目标背景图像上,得到目标帧对应的目标图像;
在步骤14中,基于目标帧对应的目标图像,生成目标视频。
步骤11至步骤13描述的是针对某一个图像(也就是目标帧对应的原始图像)进行滑动变焦处理生成目标图像的过程,实际应用中,需要针对多个目标帧采用步骤11至步骤13的方式分别生成目标图像,并基于这些目标图像以及各自对应的目标帧的时间先后,共同构成目标视频。
在获取到目标帧对应的原始图像后,首先识别出原始图像中的目标对象, 也就是识别出原始图像中的画面主体。在这里,以什么为画面主体(即,目标对象)可以是预先设定的。例如,若设定人是目标对象,则识别原始图像中的目标对象实际上就是识别原始图像中的人。再例如,若设定占画面比例最大的对象为目标对象,则识别原始图像中的目标对象实际上就是识别原始图像中占据画面比例更高一些的对象。
如前文所述,带有滑动变焦效果的视频是保证视频中图像画面主体不变的同时改变背景的透视关系,在这里需要两方面的处理,一方面是保证视频中画面主体不变,另一方面则是改变背景的透视关系。
在第一方面中,保证视频中画面主体不变,就是需要同一目标视频中每一目标帧对应的目标图像的目标对象的大小、位置都相同,也就是说,目标对象在画面中会有一个期望位置,最终获得的带有滑动变焦效果的目标视频中每一图像中的目标对象应当处于这个期望位置。示例地,这个期望位置可以人为选定,例如,选定为画面中心的位置。再例如,这个期望位置可以依据每一目标帧对应的原始图像来确定,例如,将目标对象在时间最靠前的目标帧对应的原始图像中所处的位置作为目标对象在整个目标视频中的期望位置。
在第二方面中,需要改变背景的透视关系,需要根据滑动变焦策略对原始图像中除目标对象外的初始背景图像进行滑动变焦处理,得到目标背景图像。在这里,滑动变焦策略就用于改变原始图像中初始背景图像的透视关系,也就是在画面推进或后退的基础上进行变焦处理。拍摄原始图像时会对应一个拍摄位置,也就是镜头位置,画面推进或后退就是模拟在原始图像对应的三维图像空间中推进或后退镜头能够拍摄到的画面,它主要改变初始背景图像的透视关系,变焦则是对画面的放大或缩小。
其中,滑动变焦策略至少用于指示初始背景图像的滑动方向和变焦方向,且滑动方向和变焦方向相反。在涉及到有关透视关系的图像处理时,一般需要基于透视点进行。滑动方向可以为靠近透视点的方向或远离透视点的方向。若滑动方向为靠近透视点的方向,表示画面向目标对象推进(相当于镜头向靠近目标对象的方向移动);若滑动方向为远离透视点的方向,表示画面相比于目标对象后退(相当于镜头向远离目标对象的方向移动)。变焦方向可以为靠近透视点的方向或远离透视点的方向。若变焦方向为靠近透视点的方向,表示视 角变小(焦距增大);若变焦方向为远离透视点的方向,表示视角变大(焦距缩小)。如上所述,在滑动变焦策略中,滑动方向和变焦方向是相反的。举例来说,若滑动方向为靠近透视点的方向,则变焦方向为远离透视点的方向。
According to the slide-zoom strategy, slide-zoom processing can be performed on the initial background image of the original image other than the target object to obtain the target background image. Further, by superimposing the image of the target object on the target background image according to the position of the target object in the original image, the target image corresponding to the target frame is obtained.
As stated above, there is an expected position for the target object in the picture. Therefore, if this expected position does not match the position of the target object in the original image (e.g., the positions differ), the image of the target object needs further processing so that the processed image matches the expected position. For example, if the expected position is the center of the picture occupying 50% of it, and the target object in the original image is at the picture center but occupies only 25% of the picture, the image of the target object needs to be enlarged so that the processed image is at the picture center and occupies 50% of the picture.
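A minimal sketch of this rescaling, assuming the expected position is the picture center and positions are compared by the fraction of the picture area the subject occupies (both are assumptions of this sketch, not requirements of the disclosure):

import math
import cv2
import numpy as np

def fit_subject_to_center(subject, frame_h, frame_w, expected_frac=0.5):
    """Scale a cut-out subject image so it occupies expected_frac of the
    picture area and sits at the picture center, e.g. 25% rescaled to 50%.
    Assumes the rescaled subject still fits inside the picture."""
    sh, sw = subject.shape[:2]
    current_frac = (sh * sw) / (frame_h * frame_w)
    scale = math.sqrt(expected_frac / current_frac)  # area grows with scale squared
    new_w, new_h = int(sw * scale), int(sh * scale)
    resized = cv2.resize(subject, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    canvas = np.zeros((frame_h, frame_w) + subject.shape[2:], dtype=subject.dtype)
    y, x = (frame_h - new_h) // 2, (frame_w - new_w) // 2
    canvas[y:y + new_h, x:x + new_w] = resized
    return canvas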
With reference to the above processing, the target video can be generated based on the target images corresponding to the target frames. In practice there may be multiple target frames; that is, a target image is generated for each of them, and the slide-zoom target video is generated from the target images of all target frames according to the temporal order of the target frames. The target object has the same position in every frame of the target video, and every frame of the target video has the same size.
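Assembling the per-frame target images into the target video can be sketched, for example, with OpenCV's VideoWriter; the codec and frame rate below are illustrative choices, not part of the disclosure:

import cv2

def write_target_video(target_images, path, fps=30):
    """Step 14: write the target images, all of the same size and in the
    time order of their target frames, into one video file."""
    h, w = target_images[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for image in target_images:
        writer.write(image)
    writer.release()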
For example, Fig. 2 and Fig. 3 are schematic diagrams of the effect of the video generation method provided by the present disclosure, where Fig. 2 is an earlier video frame in the target video and Fig. 3 a later one; in both, T is the target object and the dashed part represents the background. As can be seen, the position and size of the target object do not change between Fig. 2 and Fig. 3, while the background in Fig. 3 is pulled closer compared with Fig. 2: visually, Fig. 3 is nearer to the background behind T, as if Fig. 3 were obtained from Fig. 2 by slide-zooming towards the space behind the target object T, causing a change in the perspective relationship of the background. With Fig. 2 and Fig. 3 as reference, the multiple video frames in the target video behave similarly, enabling the target video to achieve the slide-zoom effect.
Through the above technical solution, an original image corresponding to a target frame is acquired and a target object in the original image is identified; slide-zoom processing is performed, according to a slide-zoom strategy, on the initial background image of the original image other than the target object to obtain a target background image; the image of the target object is superimposed on the target background image according to the position of the target object in the original image, to obtain a target image corresponding to the target frame; and a target video is generated based on the target image corresponding to the target frame. In this way, a slide-zoom effect can be added to an image automatically based on the image itself, changing the perspective relationship of the background while keeping the main subject of the picture unchanged, so that a video in which the subject stays fixed while the background is rapidly compressed or enlarged, that is, a target video with a slide-zoom effect, can be generated from a series of such images. Thus, a video with a slide-zoom effect can be obtained without special shooting techniques or manual editing, and data processing efficiency is high.
To help those skilled in the art better understand the technical solutions provided by the embodiments of the present disclosure, the corresponding steps above are described in detail below.
First, the manner of acquiring the original image corresponding to the target frame in step 11 is described in detail.
In a possible implementation, acquiring the original image corresponding to the target frame may include the following step:
acquiring the original image corresponding to the target frame from a to-be-processed media file,
where the to-be-processed media file is an image or a video containing the target object.
This implementation amounts to post-processing of a stored image or video; that is, a video with a slide-zoom effect is obtained by post-processing an image or video that has already been shot.
If the to-be-processed media file is an image containing the target object, that is, only one existing image is processed, the original image corresponding to every target frame is that media file, and every target frame obtains the same original image. The solution of the present disclosure then amounts to slide-zooming the background of a single image to generate multiple target images corresponding to different target frames and synthesizing them into the target video.
If the to-be-processed media file is a video containing the target object, the temporal order of the target frames can follow the order of the video frames in the media file; for example, the original images are obtained from the media file sequentially in forward (or reverse) order. In forward order, earlier video frames in the media file correspond to earlier target frames; conversely, in reverse order, later video frames in the media file correspond to earlier target frames.
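Both cases (a single still image reused for every target frame, and video frames taken in forward or reverse order) can be sketched as follows; the helper name and the clamping of the index are choices of this sketch:

def originals_for_target_frames(media, num_target_frames, reverse=False):
    """Map each target frame to an original image. media is either a single
    still image or a list of decoded video frames."""
    if isinstance(media, list):                      # video: follow frame order
        ordered = list(reversed(media)) if reverse else list(media)
        return [ordered[min(i, len(ordered) - 1)] for i in range(num_target_frames)]
    return [media] * num_target_frames               # still image: reuse it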
In another possible implementation, acquiring the original image corresponding to the target frame may include the following step:
acquiring the original image corresponding to the target frame from an information stream collected in real time by an image acquisition apparatus.
This implementation amounts to real-time processing of an information stream acquired in real time; that is, during the shooting of an image or video, the original image of the target frame is acquired in real time and the operations of steps 11 to 13 are performed to obtain the target image corresponding to the target frame, for subsequent generation of the target video.
The implementation of identifying the target object in the original image in step 11 is described in detail below.
In a possible implementation, identifying the target object in the original image may include the following step:
identifying the target object in the original image by means of a pre-trained target object recognition model.
The target object recognition model is used to identify the target object in an image, for example, to identify the outline of the target object, or the position of the target object in the image, and so on. The target object recognition model is equivalent to a classification model, i.e. it identifies, from the image, the pixels belonging to the target object class. For example, the target object classification model can be obtained as follows:
acquiring multiple sets of training data, each set including a training image and label information indicating whether each pixel of the training image belongs to the target object;
training a neural network model using the multiple sets of training data to obtain the target object classification model.
In each round of training, a training image from one set of training data is used as input data and its label information as the ground-truth output; the actual output of the neural network model for the input training image is compared with this ground-truth output, and the comparison result (e.g., a loss value computed from the two) is fed back to the neural network model to adjust its internal parameters. This is repeated until the condition for stopping training is met (e.g., the number of training iterations reaches a certain count, or the loss value falls below a certain threshold), and the resulting model is taken as the target object classification model. Model training methods are common knowledge in the art; the above is only an example, and other feasible implementations are not enumerated here.
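As a non-authoritative illustration of the loop just described (the disclosure fixes no framework, architecture or loss), a per-pixel classifier could be trained along these lines in PyTorch, where model is any segmentation network and loader yields (image, mask) pairs:

import torch
import torch.nn as nn

def train_object_classifier(model, loader, epochs=10, lr=1e-3, loss_threshold=0.05):
    """Sketch of the described loop: compare the actual output with the
    labelled ground truth, feed the comparison back, and stop on an
    iteration budget or once the loss falls below a threshold."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()          # pixel-wise object / non-object
    for _ in range(epochs):                     # stop condition 1: iteration count
        for image, mask in loader:
            optimizer.zero_grad()
            logits = model(image)               # actual output for the input image
            loss = criterion(logits, mask)      # comparison with the ground truth
            loss.backward()                     # feed the comparison result back
            optimizer.step()                    # adjust internal parameters
            if loss.item() < loss_threshold:    # stop condition 2: small loss
                return model
    return model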
In another possible implementation, identifying the target object in the original image may include the following step:
identifying the target object in the original image by means of historical position information corresponding to the target object,
where the historical position information is obtained by performing motion tracking on the images preceding the original image.
From the historical position information, the position of the target object in the original image can be inferred. For example, the motion trend of the target object can be determined from its historical position information. Then, taking as the starting point the position of the target object in a reference image that precedes the original image and is closest to it in time, the position change of the target object in the original image relative to the reference image can be determined from the motion trend in the historical position information and the difference between the shooting times of the reference image and the original image, so that the position of the target object in the original image can be inferred and the target object identified.
In this way, the position of the target object in the original image can be determined directly from the historical motion tracking of the target object so as to identify the target object, with a small amount of data processing.
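A minimal sketch of this inference, assuming the historical position information is a list of (timestamp, position) pairs whose last entry is the reference image closest in time to the original image; linear extrapolation is one simple model of the motion trend, not the only one the disclosure permits:

import numpy as np

def predict_position(history, t_now):
    """Extrapolate the target object's position in the original image from
    at least two tracked observations preceding it."""
    (t0, p0), (t1, p1) = history[-2], history[-1]
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    velocity = (p1 - p0) / (t1 - t0)        # motion trend from past tracking
    return p1 + velocity * (t_now - t1)     # position change over the time gap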
Step 12, performing slide-zoom processing on the initial background image of the original image other than the target object according to the slide-zoom strategy to obtain the target background image, is described in detail below.
In a possible implementation, step 12 may include the following steps, as shown in Fig. 4.
In step 41, a perspective point of the initial background image is determined.
In step 42, the sliding direction, sliding distance, zooming direction and zooming distance of the initial background image are determined according to the identified initial position of the target object in the original image and the expected position of the target object in the picture.
In step 43, the initial background image is slid by the sliding distance along the sliding direction, centered on the perspective point.
In step 44, the initial background image is zoomed by the zooming distance along the zooming direction, centered on the perspective point.
When processing images, especially image processing concerning perspective relationships, it is generally necessary to rely on a perspective point in the image. Therefore, the perspective point of the initial background image needs to be determined first, in preparation for the subsequent slide-zoom processing.
In a possible implementation, step 41 may include the following step:
determining the center point of the original image as the perspective point.
Since the subject is generally placed at the center of the picture during image or video shooting, which conforms to the common one-point perspective rule, and the vanishing point of the background also generally lies at the picture center, determining the image center point as the perspective point for the slide-zoom is a fast and fairly reliable way of determining the perspective point.
In another possible implementation, step 41 may include the following steps:
acquiring depth information of the original image;
determining a vanishing point position in the original image according to the depth information;
taking the vanishing point position as the perspective point.
The depth information hidden in a two-dimensional image helps in understanding the three-dimensional structure of the image scene. Understanding the depth information of two-dimensional images is a well-known technique widely applied in the art, for example via shading, illumination, geometric analysis, feature learning and so on; the specific methods are not described at length here.
Therefore, depth information of the original image can be acquired based on the original image, the vanishing point position in the original image can then be determined from the depth information, and the vanishing point position can be used directly as the perspective point. The three-dimensional image space corresponding to the original image can be constructed based on the depth information of the original image, and the vanishing point position can be obtained by vanishing point identification methods commonly used in the prior art.
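One crude heuristic sketch of this step is to take the deepest region of the depth map as the perspective point; real vanishing-point detectors based on line geometry or learned estimators would be more robust, so this only illustrates the idea of deriving the perspective point from depth information:

import cv2
import numpy as np

def perspective_point_from_depth(depth_map):
    """Return the (x, y) of the deepest region of a single-channel depth
    map as a stand-in for the vanishing point."""
    blurred = cv2.GaussianBlur(depth_map, (21, 21), 0)   # suppress depth outliers
    y, x = np.unravel_index(np.argmax(blurred), blurred.shape)
    return float(x), float(y)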
To perform slide-zoom processing on the original image, not only the perspective point but also the sliding manner and the zooming manner need to be determined. The sliding manner may include a sliding direction and a sliding distance, and the zooming manner may include a zooming direction and a zooming distance. The meanings of the sliding direction and the zooming direction have been set out in detail above and are not repeated here. The sliding distance refers to the distance by which the lens should be pushed forward or pulled back in the three-dimensional image space corresponding to the original image; the zooming distance refers to the distance by which the focal length changes. Therefore, step 42 needs to be performed next to determine the sliding direction, sliding distance, zooming direction and zooming distance of the initial background image.
In a possible implementation, step 42 may include the following steps:
determining a first coordinate of the initial position in the three-dimensional image space corresponding to the original image;
determining a second coordinate of the expected position in the three-dimensional image space;
determining the sliding direction, the sliding distance, the zooming direction and the zooming distance according to the first coordinate and the second coordinate.
As stated above, a three-dimensional image space corresponding to the original image can be constructed from the original image. In this three-dimensional space, a coordinate, the first coordinate, can be determined from the initial position of the target object in the original image, and another coordinate, the second coordinate, can be determined from the expected position of the target object. The purpose of the slide-zoom is to bring the target object to the expected position; in this process, owing to the gap between the initial position and the expected position of the target object, the background may change accordingly, that is, from the picture that could be captured at the first coordinate to the picture that could be captured at the second coordinate. Therefore, the sliding direction, sliding distance, zooming direction and zooming distance can be determined from the first coordinate and the second coordinate. For example, if the second coordinate is closer to the perspective point than the first coordinate, the sliding direction should be towards the perspective point, the sliding distance is the distance to be covered from the first coordinate to the second coordinate, the zooming direction is away from the perspective point, and the zooming distance is the change in the focal length of the lens from the first coordinate to the second coordinate.
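The example above might be computed as follows; the +1/-1 direction convention and the focal-length model (focal length proportional to the distance between the lens position and the perspective point, which keeps the subject's on-screen size constant) are assumptions of this sketch rather than requirements of the disclosure:

import numpy as np

def focal_length_for(position, perspective_point, k=1.0):
    # assumed dolly-zoom relation: focal length grows linearly with the
    # distance from the lens position to the perspective point
    return k * np.linalg.norm(np.asarray(position, dtype=float)
                              - np.asarray(perspective_point, dtype=float))

def slide_zoom_params(first_coord, second_coord, perspective_point):
    """Derive the sliding direction/distance and zooming direction/distance
    from the first and second coordinates in the 3D image space."""
    p1, p2, vp = (np.asarray(c, dtype=float)
                  for c in (first_coord, second_coord, perspective_point))
    slide_distance = np.linalg.norm(p2 - p1)     # distance to be covered
    towards = np.linalg.norm(p2 - vp) < np.linalg.norm(p1 - vp)
    slide_direction = 1 if towards else -1       # +1: towards the perspective point
    zoom_direction = -slide_direction            # always opposite to the slide
    zoom_distance = abs(focal_length_for(p2, vp) - focal_length_for(p1, vp))
    return slide_direction, slide_distance, zoom_direction, zoom_distance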
After the sliding direction, sliding distance, zooming direction and zooming distance are determined in step 42, slide-zoom processing can be performed on the initial background image accordingly, that is, steps 43 and 44 are executed. The present disclosure does not strictly limit the execution order of steps 43 and 44: they may be executed simultaneously or one after the other.
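In a 2D implementation both steps can reduce to scaling the background about the perspective point, which also makes the order immaterial, consistent with the above. A sketch under that assumption follows; mapping the 3D slide and the focal-length change to two image-space scale factors is a simplification made by this sketch, not something the disclosure mandates:

import cv2

def slide_then_zoom(background, perspective_point, slide_scale, zoom_scale):
    """Steps 43 and 44 as successive scalings about the perspective point,
    e.g. slide_scale > 1 (push in) combined with zoom_scale < 1 (zoom out)."""
    h, w = background.shape[:2]
    for scale in (slide_scale, zoom_scale):
        matrix = cv2.getRotationMatrix2D(perspective_point, 0.0, scale)
        background = cv2.warpAffine(background, matrix, (w, h))
    return background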
Since slide-zoom processing of an image involves changes to the picture, missing background may occur. Therefore, background completion can be performed to some extent, and the image obtained after completion is used as the initial background image for the slide-zoom processing, so that the resulting target background image has no missing background and the resulting target image is complete. On this basis, the method provided by the present disclosure may further include the following steps:
removing the target object from the original image to obtain a first background image;
acquiring a second background image located in the region covered by the target object in the original image;
supplementing the second background image into the first background image according to the position of the region covered by the target object, to obtain the initial background image.
First, the target object is removed from the original image to obtain the first background image. At this point the first background image is partially missing; to ensure the completeness of the background of the finally generated target image, the missing part can be supplemented.
Therefore, the second background image located in the region covered by the target object in the original image can be acquired and supplemented into the first background image according to the position of that region, to obtain the initial background image.
Various methods can be used to acquire the second background image located in the region covered by the target object in the original image. For example, some images for background supplementation may be preset, and the second background image acquired directly from them. As another example, a part of the first background image may be extracted as the second background image. As yet another example, the missing part of the first background image may be filled in (e.g., using existing image inpainting or image filling algorithms), and the filled-in content taken as the second background image.
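The third option, filling the missing region from its surroundings, can be sketched with a standard inpainting routine; the radius and algorithm flag below are illustrative choices:

import cv2

def complete_background(first_background, subject_mask):
    """Fill the region the target object covered. first_background is an
    8-bit image; subject_mask is a uint8 mask that is non-zero exactly
    where the object was removed. The result serves as the initial
    background image."""
    return cv2.inpaint(first_background, subject_mask, 3, cv2.INPAINT_TELEA)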
In this way, the resulting initial background image is complete, the target background image obtained by slide-zoom processing based on it will also be complete, and thus the target image obtained by overlaying the image of the target object on the target background image is complete, without missing background, guaranteeing the information integrity of the image.
Fig. 5 is a block diagram of a video generation apparatus according to an embodiment of the present disclosure. As shown in Fig. 5, the apparatus 50 may include:
a first acquisition module 51, configured to acquire an original image corresponding to a target frame and identify a target object in the original image;
a processing module 52, configured to perform, according to a slide-zoom strategy, slide-zoom processing on an initial background image of the original image other than the target object, to obtain a target background image, wherein the slide-zoom strategy is at least used to indicate a sliding direction and a zooming direction of the initial background image, the sliding direction being opposite to the zooming direction;
a first generation module 53, configured to superimpose an image of the target object on the target background image according to the position of the target object in the original image, to obtain a target image corresponding to the target frame;
a second generation module 54, configured to generate a target video based on the target image corresponding to the target frame.
Optionally, the first acquisition module 51 includes:
a first identification submodule, configured to identify the target object in the original image by means of a pre-trained target object recognition model;
or,
a second identification submodule, configured to identify the target object in the original image by means of historical position information corresponding to the target object, the historical position information being obtained by performing motion tracking on the images preceding the original image.
Optionally, the processing module 52 includes:
a first determination submodule, configured to determine a perspective point of the initial background image;
a second determination submodule, configured to determine the sliding direction, a sliding distance, the zooming direction and a zooming distance of the initial background image according to the identified initial position of the target object in the original image and the expected position of the target object in the picture;
a third determination submodule, configured to slide the initial background image by the sliding distance along the sliding direction, centered on the perspective point;
a fourth determination submodule, configured to zoom the initial background image by the zooming distance along the zooming direction, centered on the perspective point.
Optionally, the first determination submodule is configured to determine the center point of the original image as the perspective point;
or,
the first determination submodule is configured to: acquire depth information of the original image; determine a vanishing point position in the original image according to the depth information; and take the vanishing point position as the perspective point.
Optionally, the second determination submodule is configured to: determine a first coordinate of the initial position in the three-dimensional image space corresponding to the original image; determine a second coordinate of the expected position in the three-dimensional image space; and determine the sliding direction, the sliding distance, the zooming direction and the zooming distance according to the first coordinate and the second coordinate.
Optionally, the apparatus further includes:
an image removal module, configured to remove the target object from the original image to obtain a first background image;
a second acquisition module, configured to acquire a second background image located in the region covered by the target object in the original image;
a supplementation module, configured to supplement the second background image into the first background image according to the position of the region covered by the target object, to obtain the initial background image.
Optionally, the first acquisition module 51 is configured to acquire the original image corresponding to the target frame from a to-be-processed media file, the to-be-processed media file being an image or a video containing the target object;
or,
the first acquisition module 51 is configured to acquire the original image corresponding to the target frame from an information stream collected in real time by an image acquisition apparatus.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
Referring now to Fig. 6, a schematic structural diagram of an electronic device 600 suitable for implementing embodiments of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g., vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 6 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the electronic device 600 are also stored in the RAM 603. The processing apparatus 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following apparatuses may be connected to the I/O interface 605: input apparatuses 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output apparatuses 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage apparatuses 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows an electronic device 600 with various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication apparatus 609, or installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the above computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: electric wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet) and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an original image corresponding to a target frame and identify a target object in the original image; perform, according to a slide-zoom strategy, slide-zoom processing on an initial background image of the original image other than the target object to obtain a target background image, wherein the slide-zoom strategy is at least used to indicate a sliding direction and a zooming direction of the initial background image, the sliding direction being opposite to the zooming direction; superimpose an image of the target object on the target background image according to the position of the target object in the original image to obtain a target image corresponding to the target frame; and generate a target video based on the target image corresponding to the target frame.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, the first acquisition module may also be described as "a module that acquires an original image corresponding to a target frame and identifies a target object in the original image".
The functions described herein above may be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, a video generation method is provided, the method comprising:
acquiring an original image corresponding to a target frame, and identifying a target object in the original image;
performing, according to a slide-zoom strategy, slide-zoom processing on an initial background image of the original image other than the target object, to obtain a target background image, wherein the slide-zoom strategy is at least used to indicate a sliding direction and a zooming direction of the initial background image, the sliding direction being opposite to the zooming direction;
superimposing an image of the target object on the target background image according to the position of the target object in the original image, to obtain a target image corresponding to the target frame;
generating a target video based on the target image corresponding to the target frame.
According to one or more embodiments of the present disclosure, a video generation method is provided, wherein identifying the target object in the original image comprises:
identifying the target object in the original image by means of a pre-trained target object recognition model;
or,
identifying the target object in the original image by means of historical position information corresponding to the target object, the historical position information being obtained by performing motion tracking on the images preceding the original image.
According to one or more embodiments of the present disclosure, a video generation method is provided, wherein performing, according to a slide-zoom strategy, slide-zoom processing on the initial background image of the original image other than the target object to obtain a target background image comprises:
determining a perspective point of the initial background image;
determining the sliding direction, a sliding distance, the zooming direction and a zooming distance of the initial background image according to the identified initial position of the target object in the original image and the expected position of the target object in the picture;
sliding the initial background image by the sliding distance along the sliding direction, centered on the perspective point;
zooming the initial background image by the zooming distance along the zooming direction, centered on the perspective point.
According to one or more embodiments of the present disclosure, a video generation method is provided, wherein determining the perspective point of the initial background image comprises:
determining the center point of the original image as the perspective point;
or,
determining the perspective point of the initial background image comprises:
acquiring depth information of the original image;
determining a vanishing point position in the original image according to the depth information;
taking the vanishing point position as the perspective point.
According to one or more embodiments of the present disclosure, a video generation method is provided, wherein determining the sliding direction, the sliding distance, the zooming direction and the zooming distance of the initial background image according to the identified initial position of the target object in the original image and the expected position of the target object in the picture comprises:
determining a first coordinate of the initial position in the three-dimensional image space corresponding to the original image;
determining a second coordinate of the expected position in the three-dimensional image space;
determining the sliding direction, the sliding distance, the zooming direction and the zooming distance according to the first coordinate and the second coordinate.
According to one or more embodiments of the present disclosure, a video generation method is provided, the method further comprising:
removing the target object from the original image to obtain a first background image;
acquiring a second background image located in the region covered by the target object in the original image;
supplementing the second background image into the first background image according to the position of the region covered by the target object, to obtain the initial background image.
According to one or more embodiments of the present disclosure, a video generation method is provided, wherein acquiring the original image corresponding to the target frame comprises:
acquiring the original image corresponding to the target frame from a to-be-processed media file, the to-be-processed media file being an image or a video containing the target object;
or,
acquiring the original image corresponding to the target frame from an information stream collected in real time by an image acquisition apparatus.
According to one or more embodiments of the present disclosure, a video generation apparatus is provided, the apparatus comprising:
a first acquisition module, configured to acquire an original image corresponding to a target frame and identify a target object in the original image;
a processing module, configured to perform, according to a slide-zoom strategy, slide-zoom processing on an initial background image of the original image other than the target object, to obtain a target background image, wherein the slide-zoom strategy is at least used to indicate a sliding direction and a zooming direction of the initial background image, the sliding direction being opposite to the zooming direction;
a first generation module, configured to superimpose an image of the target object on the target background image according to the position of the target object in the original image, to obtain a target image corresponding to the target frame;
a second generation module, configured to generate a target video based on the target image corresponding to the target frame.
According to one or more embodiments of the present disclosure, a computer-readable medium is provided, storing a computer program which, when executed by a processing apparatus, implements the steps of the video generation method described in any embodiment of the present disclosure.
According to one or more embodiments of the present disclosure, an electronic device is provided, comprising:
a storage apparatus storing a computer program;
a processing apparatus, configured to execute the computer program in the storage apparatus to implement the steps of the video generation method described in any embodiment of the present disclosure.
The above description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by substituting the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be executed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments individually or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above; rather, the specific features and actions described above are merely example forms of implementing the claims. With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments relating to the method and will not be elaborated here.

Claims (12)

  1. A video generation method, characterized in that the method comprises:
    acquiring an original image corresponding to a target frame, and identifying a target object in the original image;
    performing, according to a slide-zoom strategy, slide-zoom processing on an initial background image of the original image other than the target object, to obtain a target background image, wherein the slide-zoom strategy is at least used to indicate a sliding direction and a zooming direction of the initial background image, the sliding direction being opposite to the zooming direction;
    superimposing an image of the target object on the target background image according to the position of the target object in the original image, to obtain a target image corresponding to the target frame;
    generating a target video based on the target image corresponding to the target frame.
  2. The method according to claim 1, characterized in that identifying the target object in the original image comprises:
    identifying the target object in the original image by means of a pre-trained target object recognition model;
    or,
    identifying the target object in the original image by means of historical position information corresponding to the target object, the historical position information being obtained by performing motion tracking on the images preceding the original image.
  3. The method according to claim 1, characterized in that performing, according to a slide-zoom strategy, slide-zoom processing on the initial background image of the original image other than the target object to obtain a target background image comprises:
    determining a perspective point of the initial background image;
    determining the sliding direction, a sliding distance, the zooming direction and a zooming distance of the initial background image according to the identified initial position of the target object in the original image and the expected position of the target object in the picture;
    sliding the initial background image by the sliding distance along the sliding direction, centered on the perspective point;
    zooming the initial background image by the zooming distance along the zooming direction, centered on the perspective point.
  4. The method according to claim 3, characterized in that determining the perspective point of the initial background image comprises:
    determining the center point of the original image as the perspective point;
    or,
    determining the perspective point of the initial background image comprises:
    acquiring depth information of the original image;
    determining a vanishing point position in the original image according to the depth information;
    taking the vanishing point position as the perspective point.
  5. The method according to claim 3, characterized in that determining the sliding direction, the sliding distance, the zooming direction and the zooming distance of the initial background image according to the identified initial position of the target object in the original image and the expected position of the target object in the picture comprises:
    determining a first coordinate of the initial position in the three-dimensional image space corresponding to the original image;
    determining a second coordinate of the expected position in the three-dimensional image space;
    determining the sliding direction, the sliding distance, the zooming direction and the zooming distance according to the first coordinate and the second coordinate.
  6. The method according to any one of claims 3-5, characterized in that the method further comprises:
    if the position of the target object in the original image does not match the expected position of the target object in the picture, processing the original image so that the position of the target object in the image obtained after the processing matches the expected position,
    and recording the position of the target object in the image obtained after the processing as the initial position.
  7. The method according to claim 1, characterized in that the method further comprises:
    removing the target object from the original image to obtain a first background image;
    acquiring a second background image located in the region covered by the target object in the original image;
    supplementing the second background image into the first background image according to the position of the region covered by the target object, to obtain the initial background image.
  8. The method according to claim 7, characterized in that acquiring the second background image located in the region covered by the target object in the original image comprises:
    obtaining the second background image from preset background supplementation images;
    or, extracting a part of the first background image and taking the extracted image as the second background image;
    or, filling in the missing part of the first background image and taking the filled-in image as the second background image.
  9. The method according to claim 1, characterized in that acquiring the original image corresponding to the target frame comprises:
    acquiring the original image corresponding to the target frame from a to-be-processed media file, the to-be-processed media file being an image or a video containing the target object;
    or,
    acquiring the original image corresponding to the target frame from an information stream collected in real time by an image acquisition apparatus.
  10. A video generation apparatus, characterized in that the apparatus comprises:
    a first acquisition module, configured to acquire an original image corresponding to a target frame and identify a target object in the original image;
    a processing module, configured to perform, according to a slide-zoom strategy, slide-zoom processing on an initial background image of the original image other than the target object, to obtain a target background image, wherein the slide-zoom strategy is at least used to indicate a sliding direction and a zooming direction of the initial background image, the sliding direction being opposite to the zooming direction;
    a first generation module, configured to superimpose an image of the target object on the target background image according to the position of the target object in the original image, to obtain a target image corresponding to the target frame;
    a second generation module, configured to generate a target video based on the target image corresponding to the target frame.
  11. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-9.
  12. An electronic device, characterized by comprising:
    a storage apparatus storing a computer program;
    a processing apparatus, configured to execute the computer program in the storage apparatus to implement the steps of the method according to any one of claims 1-9.
PCT/CN2021/099107 2020-07-17 2021-06-09 Video generation method and apparatus, readable medium and electronic device WO2022012231A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21841214.6A EP4178194A4 (en) 2020-07-17 2021-06-09 VIDEO PRODUCTION METHOD AND APPARATUS, AND READABLE MEDIUM AND ELECTRONIC DEVICE
US18/091,087 US11836887B2 (en) 2020-07-17 2022-12-29 Video generation method and apparatus, and readable medium and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010694518.5 2020-07-17
CN202010694518.5A 2020-07-17 2020-07-17 Video generation method and apparatus, readable medium and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/091,087 Continuation US11836887B2 (en) 2020-07-17 2022-12-29 Video generation method and apparatus, and readable medium and electronic device

Publications (1)

Publication Number Publication Date
WO2022012231A1 true WO2022012231A1 (zh) 2022-01-20

Family

ID=79326761

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099107 2020-07-17 2021-06-09 Video generation method and apparatus, readable medium and electronic device WO2022012231A1 (zh)

Country Status (4)

Country Link
US (1) US11836887B2 (zh)
EP (1) EP4178194A4 (zh)
CN (1) CN113949808B (zh)
WO (1) WO2022012231A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4037307A1 (en) * 2021-01-27 2022-08-03 Beijing Xiaomi Mobile Software Co., Ltd. Image processing method and apparatus, electronic device, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697517A (zh) * 2020-12-28 2022-07-01 Beijing Xiaomi Mobile Software Co., Ltd. Video processing method and apparatus, terminal device and storage medium
CN114584709B (zh) * 2022-03-03 2024-02-09 Beijing Zitiao Network Technology Co., Ltd. Method, apparatus, device and storage medium for generating zoom special effect
CN114710619A (zh) * 2022-03-24 2022-07-05 Vivo Mobile Communication Co., Ltd. Shooting method, shooting apparatus, electronic device and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104980651A (zh) * 2014-04-04 2015-10-14 Canon Inc. Image processing apparatus and control method
JP2017143354A (ja) * 2016-02-08 2017-08-17 Canon Inc. Image processing apparatus and image processing method
CN109379537A (zh) * 2018-12-30 2019-02-22 Beijing Megvii Technology Co., Ltd. Slide-zoom effect implementation method and apparatus, electronic device and computer-readable storage medium
CN111083380A (zh) * 2019-12-31 2020-04-28 Vivo Mobile Communication Co., Ltd. Video processing method, electronic device and storage medium
CN112532808A (zh) * 2020-11-24 2021-03-19 Vivo Mobile Communication Co., Ltd. Image processing method and apparatus, and electronic device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9025051B2 (en) * 2013-02-28 2015-05-05 Nokia Technologies Oy Method and apparatus for automatically rendering dolly zoom effect
US9275284B2 (en) * 2014-04-30 2016-03-01 Sony Corporation Method and apparatus for extraction of static scene photo from sequence of images
US10757319B1 (en) * 2017-06-15 2020-08-25 Snap Inc. Scaled perspective zoom on resource constrained devices
JP6696092B2 (ja) * 2018-07-13 2020-05-20 SZ DJI Technology Co., Ltd. Control device, mobile object, control method, and program
CN110262737A (zh) * 2019-06-25 2019-09-20 Vivo Mobile Communication Co., Ltd. Video data processing method and terminal
CN110363146A (zh) * 2019-07-16 2019-10-22 Hangzhou Ruiqi Software Co., Ltd. Object recognition method and apparatus, electronic device and storage medium
US11423510B2 (en) * 2019-10-28 2022-08-23 Samsung Electronics Co., Ltd System and method for providing dolly zoom view synthesis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4178194A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4037307A1 (en) * 2021-01-27 2022-08-03 Beijing Xiaomi Mobile Software Co., Ltd. Image processing method and apparatus, electronic device, and storage medium
US11800041B2 (en) 2021-01-27 2023-10-24 Beijing Xiaomi Mobile Software Co., Ltd. Image processing method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
EP4178194A4 (en) 2024-04-03
US11836887B2 (en) 2023-12-05
US20230153941A1 (en) 2023-05-18
EP4178194A1 (en) 2023-05-10
CN113949808B (zh) 2022-12-27
CN113949808A (zh) 2022-01-18

Similar Documents

Publication Publication Date Title
WO2022012231A1 (zh) Video generation method and apparatus, readable medium and electronic device
CN111368685B (zh) Key point identification method and apparatus, readable medium and electronic device
CN111292420B (zh) Method and apparatus for constructing a map
WO2022100735A1 (zh) Video processing method and apparatus, electronic device and storage medium
TW202040986A (zh) Video image processing method and apparatus
CN105701762B (zh) Picture processing method and electronic device
CN111246196B (zh) Video processing method and apparatus, electronic device and computer-readable storage medium
WO2022205755A1 (zh) Texture generation method, apparatus, device and storage medium
CN116934577A (zh) Style image generation method, apparatus, device and medium
CN112907628A (zh) Video target tracking method and apparatus, storage medium and electronic device
CN114694136A (zh) Article display method, apparatus, device and medium
JP2023526899A (ja) Method, device, medium and program product for generating an image inpainting model
CN112785669A (zh) Virtual avatar synthesis method, apparatus, device and storage medium
CN114881901A (zh) Video synthesis method, apparatus, device, medium and product
CN112714263B (zh) Video generation method and apparatus, device and storage medium
WO2022071875A1 (zh) Method, apparatus, device and storage medium for converting a picture into a video
CN108765549A (zh) Artificial-intelligence-based product three-dimensional display method and apparatus
CN110619602B (zh) Image generation method and apparatus, electronic device and storage medium
CN109889736B (zh) Image acquisition method, apparatus and device based on dual cameras and multiple cameras
WO2023088029A1 (zh) Cover generation method and apparatus, device and medium
CN113905177B (zh) Video generation method and apparatus, device and storage medium
CN112492230B (zh) Video processing method and apparatus, readable medium and electronic device
KR20220080696A (ko) Depth estimation method, device, electronic equipment and computer-readable storage medium
CN112070903A (zh) Virtual object display method and apparatus, electronic device and computer storage medium
WO2023056833A1 (zh) Background image generation and image fusion method and apparatus, electronic device and readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21841214

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021841214

Country of ref document: EP

Effective date: 20230131

NENP Non-entry into the national phase

Ref country code: DE