WO2023246844A1 - Video processing method, apparatus, device and medium - Google Patents

Video processing method, apparatus, device and medium

Info

Publication number
WO2023246844A1
WO2023246844A1 (PCT/CN2023/101608)
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
frame
video
main object
Prior art date
Application number
PCT/CN2023/101608
Other languages
English (en)
French (fr)
Inventor
陈璐双
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司 filed Critical 北京字跳网络技术有限公司
Publication of WO2023246844A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/387Composing, repositioning or otherwise geometrically modifying originals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95Computational photography systems, e.g. light-field imaging systems
    • H04N23/951Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • The present disclosure relates to the technical field of video processing, and in particular to a video processing method, apparatus, device and medium.
  • In the field of video creation, creators usually shoot videos according to their needs. Different shooting methods produce different video effects. In some cases, creators need to shoot videos in which the main object is clear, the background is blurred and shaky, and the footage has a stuttering feel. Achieving this effect mostly requires professional shooting tools for slow-shutter shooting and/or shooting with moving lenses, and also requires the video creator to have excellent shooting skills and a suitable shooting scene.
  • Embodiments of the present disclosure provide a video processing method, the method including: obtaining multiple image groups based on a video frame sequence of an initial video; performing motion blur processing based on each frame image in a target image group, and fusing the images obtained by the motion blur processing of each frame image to obtain a motion blur image corresponding to the target image group, where each image group in the multiple image groups serves as the target image group; determining a main object area and a background area corresponding to the target image group based on a specified frame image in the target image group; fusing the motion blur image and the specified frame image according to the main object area and the background area to obtain a target fusion image, where the image part of the target fusion image in the main object area is the image part of the specified frame image in the main object area, and the image part of the target fusion image in the background area is the image part of the motion blur image in the background area; and generating a target video based on the target fusion images corresponding to the multiple image groups, where the playback order of the target fusion images in the target video is the same as the playback order of the multiple image groups in the initial video.
  • In some embodiments, the step of performing motion blur processing based on each frame image in the target image group and fusing the resulting images includes: using an optical flow interpolation algorithm to insert a specified number of intermediate frame images between adjacent frame images in the target image group; taking all frame images in the target image group after interpolation as the images obtained by the motion blur processing of each frame image; and averagely fusing the images obtained by the motion blur processing of each frame image.
  • In some embodiments, the step of using an optical flow interpolation algorithm to insert a specified number of intermediate frame images between adjacent frame images in the target image group includes: obtaining bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group; and inserting a specified number of intermediate frame images between the adjacent frame images according to the bidirectional motion vectors of the pixel blocks and a block motion compensation algorithm.
  • In some embodiments, the step of obtaining bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group includes: obtaining the bidirectional motion vectors based on an improved DIS optical flow algorithm, where the bottom-layer image resolution of the image pyramid used by the improved DIS optical flow algorithm is smaller than that used by the original DIS optical flow algorithm, and/or the number of iterations used by the improved DIS optical flow algorithm is smaller than that used by the original DIS optical flow algorithm.
  • In some embodiments, the step of determining the main object area and the background area based on the specified frame image in the target image group includes: taking the image located at the middle position of the target image group as the specified frame image, processing the specified frame image with an object instance segmentation algorithm, and obtaining the main object area and background area corresponding to the target image group based on the processing result.
  • In some embodiments, the step of performing image fusion on the motion blur image and the specified frame image according to the main object area and the background area includes: obtaining a main object mask image according to the main object area and the background area; obtaining a weight coefficient corresponding to the main object mask image; adjusting the pixel values of the main object mask image based on the weight coefficient to obtain an adjusted main object mask image; and performing image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image.
  • In some embodiments, the step of obtaining the weight coefficient corresponding to the main object mask image includes: obtaining the global motion amplitude corresponding to each frame image in the target image group based on an optical flow method; and determining the weight coefficient corresponding to the main object mask image according to the global motion amplitude.
  • In some embodiments, the step of performing image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image includes fusing them with the following formula:
  • Merge_N' = β*mask_main*Pn + (1 - β*mask_main)*Merge_N
  • where β is the weight coefficient; mask_main is the main object mask image; β*mask_main is the adjusted main object mask image; Pn is the specified frame image; Merge_N is the motion blur image; and Merge_N' is the target fusion image.
  • In some embodiments, the step of obtaining multiple image groups based on the video frame sequence of the initial video includes: dividing the video frame sequence of the initial video at specified intervals to obtain multiple image groups, where there are a preset number of overlapping frame images between two adjacent image groups.
  • Embodiments of the present disclosure also provide a video processing device, including: an image group acquisition module, used to obtain multiple image groups based on the video frame sequence of the initial video; a blur processing module, used to perform motion blur processing based on each frame image in the target image group and fuse the images obtained by the motion blur processing of each frame image to obtain a motion blur image corresponding to the target image group, where each image group in the plurality of image groups serves as the target image group; a region determination module, used to determine the main object area and background area corresponding to the target image group based on the designated frame image in the target image group; a fusion module, used to fuse the motion blur image and the designated frame image according to the main object area and the background area to obtain a target fusion image, where the image part of the target fusion image in the main object area is the image part of the designated frame image in the main object area, and the image part of the target fusion image in the background area is the image part of the motion blur image in the background area; and a video generation module, used to generate a target video based on the target fusion images corresponding to the plurality of image groups, where the playback order of the target fusion images corresponding to the plurality of image groups in the target video is the same as the playback order of the plurality of image groups in the initial video.
  • Embodiments of the present disclosure also provide an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor is configured to read the executable instructions from the memory and execute them to implement the video processing method provided by embodiments of the present disclosure.
  • Embodiments of the present disclosure also provide a computer-readable storage medium storing a computer program which, when run by a processor, causes the processor to execute the video processing method provided by the embodiments of the present disclosure.
  • An embodiment of the present disclosure also provides a computer program, including: instructions. When executed by a processor, the instructions cause the processor to execute the video processing method provided by the embodiment of the present disclosure.
  • Figure 1 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure
  • Figure 2 is a schematic diagram of frame interpolation between adjacent frame images provided by an embodiment of the present disclosure
  • Figure 3 is a schematic structural diagram of a video processing device provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Embodiments of the present disclosure provide a video processing method, device, equipment and medium, which can use software processing to turn a normally shot video into a video with a clear main subject, a blurred, shaky background and a stuttering feel, as detailed below.
  • Figure 1 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure.
  • the method can be executed by a video processing device, where the device can be implemented using software and/or hardware, and can generally be integrated in electronic equipment.
  • the method mainly includes the following steps S102 to S110.
  • Step S102: Obtain multiple image groups based on the video frame sequence of the initial video.
  • The initial video can be a video shot without restrictions on shooting tools, shooting skills or shooting scenes; for example, it can be a video the user shoots in any scene with just a mobile phone.
  • the initial video can be a video shot in real time by the user, or a pre-shot video uploaded by the user.
  • the video frame sequence of the initial video can be divided according to specified intervals to obtain multiple image groups.
  • Embodiments of the present disclosure do not limit the slicing method.
  • The slicing method can be, for example, even slicing (that is, equally spaced slicing), uneven slicing, or cross slicing (adjacent image groups obtained by cross slicing have overlapping frame images).
  • The specified interval can be a count interval, so each image group can contain the same number of frame images, namely N frames. The value of N can be set flexibly according to requirements; for example, it can be determined with reference to the frame rate of the initial video and the actual frame rate of the required video, e.g., N can be the ratio of the two, rounded to the nearest integer if the ratio is not an integer.
  • In some embodiments, the frame images of two adjacent image groups are completely different. In other embodiments, some frame images of two adjacent image groups are the same, that is, some frame images overlap; in other words, there are a preset number of overlapping frame images between two adjacent image groups. In this way, a reasonable number of image groups is ensured (that is, the frame rate of the subsequently generated video is ensured) while the image fusion effect of each image group during subsequent processing is preserved.
  • an exemplary description is given below.
  • Suppose the original frame rate of the initial video is X fps. To generate a continuously stuttering video, frames can be processed in groups of N so that each group of N frames is later fused into one frame. For example, if the actual frame rate of the required video is between 10 fps and 15 fps, N = X/10 can be chosen, that is, X/10 original frames are fused into one frame.
  • If the original frame rate is 30 fps, 3 original frames are fused into one frame; if the original frame rate is 60 fps, 6 original frames are fused into one frame. These are only examples of choosing N and should not be regarded as limiting.
  • Suppose Pi is the i-th frame image of the video frame sequence to be processed. In some embodiments, P1 to P6 are used as one image group, P7 to P12 as one image group, P12 to P17 as one image group, and so on.
  • With this grouping, the number of image groups is usually small and the frame rate of the finally generated video is low, making the stutter too obvious; if instead the number of frames per image group is reduced, for example using P1 to P3 as one image group and P4 to P6 as another, fusing only 3 frames at a time produces only a small degree of motion smear, making obvious flow effects difficult to observe.
  • To achieve a better fusion effect, embodiments of the present disclosure can reuse image frames.
  • Each group of 6 frames is still selected for processing, but adjacent image groups share overlapping frames: P1 to P6 are used as one image group, P4 to P9 as one image group, P7 to P12 as one image group, P10 to P15 as one image group, and so on. That is to say, any two adjacent image groups share 3 overlapping frame images. A sketch of this grouping follows below.
  • By reusing frames in this way, the number of image groups can be doubled while each image group still contains 6 frames. This both ensures a reasonable number of image groups and preserves the fusion effect of the multi-frame images in each image group during subsequent processing; that is, it enhances the blurred, flowing feel of the overall picture while maintaining the frame rate of the generated video.
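  • As an illustrative sketch (not part of the original disclosure) of the overlapping grouping described above, the following Python snippet slices a frame sequence into groups of N = 6 with a stride of 3, so that adjacent groups share 3 frames (P1-P6, P4-P9, P7-P12, ...); the group size and stride are the example values from the text and would be configurable in practice:

```python
def split_into_groups(frames, n=6, stride=3):
    """Slice a frame sequence into overlapping image groups.

    With n=6 and stride=3, adjacent groups share 3 frames, doubling the
    number of groups compared with non-overlapping slicing.
    """
    return [frames[i:i + n] for i in range(0, len(frames) - n + 1, stride)]

# Example: for a 30 fps initial video and a target of roughly 10 fps,
# N = 30 / 10 = 3 without overlap, or N = 6 with stride 3 as above.
```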
  • Each image group is regarded as a target image group, that is, for each image group, the following steps S104 to S108 are respectively executed.
  • Step S104 Perform motion blur processing based on each frame image in the target image group, and fuse the images obtained by the motion blur processing of each frame image to obtain a motion blur image corresponding to the target image group.
  • Motion blur is a post-processing technique that captures the motion effect of a subject (an object, animal, person, etc.). It mainly simulates the camera technique of exposing while the subject is moving; for example, it simulates the indirect exposure function of photographing moving objects, so that the image gains a dynamic effect, such as the effect of objects sweeping past or moving. For example, the motion blur may be applied along a specified direction.
  • motion blur processing is performed based on each frame image in the target image group, and all images obtained by the motion blur processing are fused.
  • the image obtained by motion blur processing may not only include the processed original frame images in the target image group, but may also include additional frame images inserted on the basis of the original frame images during the motion blur processing.
  • the motion blur image corresponding to the target image group can be obtained.
  • the motion blurred image has a blurry and shaky effect.
  • Step S106 Determine the main object area and background area corresponding to the target image group based on the specified frame image in the target image group.
  • the embodiment of the present disclosure does not limit the type of the subject object.
  • the subject object may be a person, an animal, or an object such as a vehicle.
  • the embodiment of the present disclosure proposes an object protection strategy.
  • The specified frame image can be selected from the target image group; for example, it can be the frame at the middle position of the target image group. By performing object segmentation on the specified frame image, the main object area and background area corresponding to the target image group can be obtained from the segmentation result.
  • Through the main object area and background area, the main object can subsequently be protected.
  • For example, object segmentation can be performed on the specified frame image (if the main object is a person, portrait segmentation is performed) to obtain the main object area and background area in the specified frame image, which are used as the main object area and background area corresponding to the target image group.
  • The background area is the area other than the main object area.
  • steps S104 and S106 have no sequential relationship and can be executed in parallel.
  • Step S108 fuse the motion blur image and the specified frame image according to the main object area and the background area to obtain a target fusion image.
  • the image part of the target fusion image in the main object area is the image part of the specified frame image in the main object area
  • the image part of the target fusion image in the background area is the image part of the motion blur image in the background area. That is, the main object area in the target fusion image is based on the pixel composition of the main object area in the specified frame image, and the background area in the target fusion image is based on the pixel composition of the background area in the motion blur image.
  • the target fusion image has both a blurry and shaky background image and a relatively clear main object.
  • a specific method can be used to distinguish the main object area and the background area.
  • a main object mask image can be generated based on the main object area and the background area, and the main object mask image can be identified with different pixel values for different areas.
  • Illustratively, the pixel values of the background area in the main object mask image are all 0, and the pixel values of the main object area are all 1; the motion blur image and the specified frame image are then fused based on the main object mask image to obtain a target fusion image that combines the clear main object of the specified frame image with the blurred, shaky background of the motion blur image.
  • Step S110 Generate a target video based on the target fusion images corresponding to the multiple image groups.
  • the playback order of the target fusion images corresponding to the multiple image groups in the target video is the same as the playback order of the multiple image groups in the initial video.
  • Each image group performs the above steps S104 to S108 respectively as a target image group, so each image group corresponds to a target fusion image.
  • All target fusion images are arranged according to the corresponding position relationship of multiple image groups in the video frame sequence of the initial video.
  • Each target fusion image is regarded as a frame that constitutes the target video.
  • Multiple target fusion images arranged in order constitute the video frame sequence of the target video; that is, the video frame sequence composed of the target fusion images is the target video.
  • the number of video frames contained in the target video is less than the number of video frames in the initial video.
  • Each frame of the target video is a fusion of multiple frames of the initial video after processing such as motion blur and main object protection. Therefore, the target video can convey a certain sense of stutter, and the background of the picture is blurred and shaky while the main subject is clear.
  • The above technical solution provided by the embodiments of the present disclosure obtains multiple image groups from the video frame sequence of the initial video and takes each image group as the target image group to perform the following operations: perform motion blur processing based on each frame image in the target image group and fuse the images obtained by the motion blur processing to obtain the motion blur image corresponding to the target image group; determine the main object area and background area corresponding to the target image group based on the specified frame image in the target image group; then fuse the motion blur image and the specified frame image according to the main object area and background area to obtain the target fusion image; and finally generate the target video based on the target fusion images corresponding to the multiple image groups.
  • In this way, a software algorithm can process a normally shot video into a video with a clear main subject, a blurred background and a stuttering effect, so that users can quickly and conveniently obtain the above shooting effect without being limited by shooting tools, shooting skills or shooting scenes.
  • the steps of performing motion blur processing on each frame image in the target image group and fusing the images obtained by motion blur processing on each frame image can be performed with reference to the following steps A to B.
  • Step A: Use the optical flow interpolation algorithm to insert a specified number of intermediate frame images between adjacent frame images in the target image group, and take all frame images in the target image group after interpolation as the images obtained by the motion blur processing of each frame image in the target image group.
  • Optical flow is the "instantaneous speed" of the pixel movement of a spatially moving object on the observation imaging plane.
  • the study of optical flow uses the temporal changes and correlations of pixel intensity data in an image sequence to determine the "movement" of the respective pixel positions.
  • In other words, in an optical flow algorithm, pixels in one image are matched with pixels in another image; the matching reveals how the pixels "move" or "flow" from one image to the other.
  • After each pixel is matched, the intermediate view between the two images can be interpolated by moving the pixels locally.
  • In some embodiments, to save computing power and improve processing efficiency, sparse optical flow interpolation may be used for frame interpolation.
  • the frame image is divided into pixel blocks of a specified size (such as 16*16), and matching between pixel blocks and motion vector calculation are performed in units of pixel blocks.
  • the motion vectors corresponding to all pixels belonging to the same pixel block are the same, and the motion vectors between different pixel blocks may be the same or different. In this way, computing power can be greatly saved.
  • Whether on a server or on a mobile terminal, the above method can be used directly for video processing. Based on this, in some implementation examples, the above step A may be performed with reference to the following steps A1 to A2.
  • Step A1 Obtain the bidirectional motion vector of the pixel block between adjacent frame images in the target image group.
  • Bidirectional motion vectors include forward motion vectors and reverse motion vectors.
  • For example, if the adjacent frame images are the previous frame image Fa and the following frame image Fb, then taking Fa as the reference, the pixel blocks in Fa are matched with the pixel blocks in Fb, and forward motion vectors are computed in the direction from Fa to Fb; taking Fb as the reference, the pixel blocks in Fb are matched with the pixel blocks in Fa, and reverse motion vectors are computed in the direction from Fb to Fa.
  • Bidirectional motion vectors can reasonably and reliably characterize the optical-flow motion trend of pixel blocks between images.
  • bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group can be obtained based on the improved DIS optical flow algorithm.
  • the bottom image resolution of the image pyramid used by the improved DIS optical flow algorithm is smaller than the bottom image resolution of the image pyramid used by the original DIS optical flow algorithm.
  • the number of iterations used by the improved DIS optical flow algorithm is smaller than the number of iterations used by the original DIS optical flow algorithm.
  • For example, the bottom-layer image resolution of the image pyramid used by the original DIS optical flow algorithm is the original image resolution, while the bottom-layer image resolution of the image pyramid used by the improved DIS optical flow algorithm is 1/4 of the original image resolution;
  • the original DIS optical flow algorithm uses 12 iterations, while the improved DIS optical flow algorithm uses 5 iterations.
  • the DIS optical flow algorithm is the abbreviation of Dense Inverse Search-based method.
  • the original DIS optical flow algorithm is a dense optical flow algorithm.
  • In embodiments of the present disclosure, to save computing power, improvements are made on the basis of the original DIS optical flow algorithm. The DIS algorithm scales the image to different scales to construct an image pyramid; then, starting from the lowest-resolution layer, it estimates optical flow layer by layer, and the flow estimated at each layer initializes the estimate for the next layer, so that motions of different amplitudes can be estimated accurately.
  • the DIS optical flow algorithm is improved to reduce the bottom image resolution of the image pyramid (that is, the highest resolution). For example, set the highest resolution to 1/4 of the original image. In addition, at the highest resolution, there is no need for a densification step, and sparse optical flow is finally obtained.
  • Since the embodiment of the present disclosure only needs to obtain sparse optical flow and does not require high accuracy, a smaller number of iterations suffices when solving with gradient descent; therefore, the 12 iterations of the original DIS optical flow algorithm are reduced to 5 iterations. With these modifications, the improved DIS optical flow algorithm can be used to quickly obtain the bidirectional motion vectors of pixel blocks between adjacent frame images, as sketched below.
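  • A minimal sketch of this configuration using OpenCV's DIS optical flow implementation is shown below; mapping the patent's "1/4 bottom-layer resolution" to setFinestScale(2) and the iteration count to setGradientDescentIterations(5) is an assumption, and the per-block averaging illustrates how a dense field can be reduced to one shared vector per 16*16 pixel block:

```python
import cv2
import numpy as np

def make_improved_dis():
    # Assumed parameter mapping: pyramid level 2 corresponds to 1/4 of the
    # original resolution; gradient descent runs 5 iterations instead of 12.
    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
    dis.setFinestScale(2)
    dis.setGradientDescentIterations(5)
    return dis

def bidirectional_block_flow(fa, fb, block=16):
    """Compute forward (Fa->Fb) and reverse (Fb->Fa) flow, then average it
    over 16x16 blocks so that all pixels of a block share one motion vector."""
    ga = cv2.cvtColor(fa, cv2.COLOR_BGR2GRAY)
    gb = cv2.cvtColor(fb, cv2.COLOR_BGR2GRAY)
    dis = make_improved_dis()
    f_ab = dis.calc(ga, gb, None)   # forward motion field, shape (H, W, 2)
    f_ba = dis.calc(gb, ga, None)   # reverse motion field
    h, w = ga.shape
    hb, wb = h // block, w // block
    blk = lambda f: f[:hb * block, :wb * block].reshape(
        hb, block, wb, block, 2).mean(axis=(1, 3))
    return blk(f_ab), blk(f_ba)     # per-block bidirectional motion vectors
```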
  • Step A2 Insert a specified number of intermediate frame images between adjacent frame images based on the bidirectional motion vector of the pixel block and the block motion compensation algorithm.
  • An intermediate frame image is an image inserted between adjacent frame images.
  • Motion compensation is a method of describing the difference between adjacent frames, for example, describing how each pixel block in the previous frame image gradually moves to a certain position in the subsequent frame image.
  • In the block motion compensation algorithm (also called block-wise motion compensation), each frame image is divided into a number of pixel blocks; based on a pixel block in the original frame image and its corresponding motion vector, the block's position in an intermediate frame image can be predicted.
  • For example, once the bidirectional motion vectors between the pixel blocks of adjacent frame images are known, the pixel blocks can be sampled equidistantly M times along the motion path; each sampling inserts one frame, and the sampling count M characterizes the fineness of image fusion: the larger M is, the more natural the fusion; the smaller M is, the coarser the fusion, with obvious overlapping traces.
  • Frame interpolation is performed through block motion compensation to obtain blur-effect images between adjacent frames.
  • For ease of understanding, refer to the frame-interpolation schematic of Figure 2, in which Fa and Fb are adjacent frames.
  • For any pixel block block_i in frame Fa, the corresponding block_i0 and block_iM are found in the preceding and following frames; through the pixel block's bidirectional motion vectors (forward motion vector F_ab, reverse motion vector F_ba), the corresponding motion path is sampled equidistantly M times, and one frame is inserted per sample.
  • Each pixel is copied and superimposed along the motion path of the pixel block to which it belongs, thereby creating a genuinely smooth motion blur effect.
  • In this way, multiple intermediate frame images, all of which are blurred images, can be inserted between adjacent frame images through multiple sampling; a sketch follows below.
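  • The following is a hedged sketch of the block motion compensation interpolation described above, using only the forward vectors for brevity (a fuller version would also splat along the reverse path from Fb); the splat-and-normalize scheme is an assumed concrete realization of copying and superimposing each pixel along its block's motion path:

```python
import numpy as np

def insert_intermediate_frames(fa, blk_ab, M=4, block=16):
    """Sample each block's motion path M times and paste the block at each
    sampled position, accumulating and normalizing overlaps."""
    h, w = fa.shape[:2]
    mids = []
    for m in range(1, M + 1):
        t = m / (M + 1.0)                       # position along the motion path
        acc = np.zeros((h, w, 3), np.float32)
        cnt = np.zeros((h, w, 1), np.float32)
        for by in range(blk_ab.shape[0]):
            for bx in range(blk_ab.shape[1]):
                y0, x0 = by * block, bx * block
                dx, dy = blk_ab[by, bx] * t     # scaled forward motion vector
                y1 = int(np.clip(round(y0 + dy), 0, h - block))
                x1 = int(np.clip(round(x0 + dx), 0, w - block))
                acc[y1:y1 + block, x1:x1 + block] += fa[y0:y0 + block,
                                                        x0:x0 + block]
                cnt[y1:y1 + block, x1:x1 + block] += 1
        mids.append((acc / np.maximum(cnt, 1)).astype(np.uint8))
    return mids
```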
  • Step B: Averagely fuse the images obtained by the motion blur processing of each frame image.
  • the motion blurred image corresponding to the target image group can be obtained by averaging the pixel values of all images obtained by motion blur processing (original adjacent image frames and inserted intermediate frame images).
  • the final motion blur image can simulate the indirect exposure function of shooting moving objects in photography, so that the image produces a dynamic effect of motion blur.
  • the processing method based on pixel blocks can also reduce the required computing power while ensuring the image fusion effect, effectively improve the overall algorithm performance, and ensure the feasibility of mobile terminal implementation.
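  • As a sketch, the average fusion of step B is a pixel-wise mean over the original frames and the inserted intermediate frames, which imitates the accumulation of light during a long exposure:

```python
import numpy as np

def motion_blur_image(frames):
    """Averagely fuse all images produced by the motion blur processing."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    return stack.mean(axis=0).astype(np.uint8)
```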
  • motion blur images can be generated based on the frame images in each image group.
  • the blur degree of the motion blur image is usually proportional to the degree of motion. The faster the motion, the longer the smear.
  • The implementation principle and achievable effect of the above algorithm are consistent with the principle of a real slow shutter and the degree of blur it produces. They therefore share a problem: when the user wants the background to be motion-blurred while the main object stays relatively clear, neither the above blur processing algorithm nor real shooting can avoid the subject blur caused by subject movement or camera shake.
  • the main object in the motion blur image obtained through the above motion blur processing method provided by the embodiment of the present disclosure is also blurred, making it difficult to clearly present it to the user.
  • Embodiments of the present disclosure therefore propose an object protection strategy: object segmentation can be performed on the specified frame image in the target image group to obtain the main object area and background area corresponding to the target image group, and object protection is then performed through these areas. For example, the specified frame image can be the image located at the middle of the target image group, which helps make subsequent fusion more natural.
  • In some embodiments, the image located in the middle of the target image group is used as the designated frame image, an object instance segmentation algorithm is used to process the designated frame image, and the main object area and background area corresponding to the target image group are obtained based on the processing results.
  • For example, a main object mask image can be obtained based on the main object area and the background area.
  • In some embodiments, the designated frame image may contain at least one object, and the main object mask may be determined from the at least one object mask.
  • For example, the main object mask is the object mask closest to the center of the image, thereby obtaining the main object mask image.
  • In some embodiments, the main object mask image may also be obtained with reference to the following steps 1 to 4.
  • Step 1: Perform image erosion on the object segmentation result (Alpha segmentation map) of the specified frame image to reduce the connectivity between multiple objects.
  • Step 2: Binarize the eroded image, then perform connected-area detection to find the large connected area closest to the center of the image as the main object.
  • Step 3: Dilate the selected connected area and map it back to the original Alpha segmentation map to obtain the main object mask.
  • Step 4: Optimize the main object mask, for example by applying mean blur and edge smoothing, to obtain the main object mask image.
  • the main object mask image can be obtained, so that the main object mask image can be subsequently used to protect the main object.
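  • A possible concrete form of steps 1 to 4, assuming an 8-bit Alpha segmentation map and illustrative kernel sizes (the source does not specify them), is sketched below with OpenCV morphology and connected-component analysis:

```python
import cv2
import numpy as np

def main_object_mask(alpha, k=5):
    """Erode to break connectivity, keep the large connected area nearest the
    image center, dilate back, map onto the Alpha map, and smooth the edges."""
    kernel = np.ones((k, k), np.uint8)
    eroded = cv2.erode(alpha, kernel)
    _, binary = cv2.threshold(eroded, 127, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    h, w = alpha.shape
    center = np.array([w / 2.0, h / 2.0])
    best, best_score = 0, -1.0
    for i in range(1, n):                        # label 0 is the background
        area = stats[i, cv2.CC_STAT_AREA]
        dist = np.linalg.norm(centroids[i] - center)
        score = area / (dist + 1.0)              # favor large areas near the center
        if score > best_score:
            best, best_score = i, score
    mask = np.where(labels == best, 255, 0).astype(np.uint8)
    mask = cv2.dilate(mask, kernel)              # restore the eroded extent
    mask = cv2.blur(np.minimum(mask, alpha), (k, k))   # map back, smooth edges
    return mask.astype(np.float32) / 255.0       # float mask in [0, 1]
```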
  • The two processes of acquiring the motion blur image corresponding to the target image group and acquiring the main object mask image have no order of precedence and can be executed in parallel.
  • the target fusion image corresponding to the target image group can be obtained based on the motion blur image, the main object mask image and the specified frame image.
  • embodiments of the present disclosure can also control the degree of subject object protection. For example, when the global motion amplitude is large, the main object will not be particularly clear to avoid a sense of violation. Based on this, the steps of image fusion of the motion blur image and the specified frame image according to the main object mask image can refer to the following steps (1) to (3).
  • In step (1), the weight coefficient corresponding to the main object mask image is obtained.
  • The weight coefficient is related to the degree of protection of the main object: the greater the weight coefficient, the higher the degree of protection and the clearer the main object.
  • the global motion amplitude corresponding to each frame image in the target image group can be obtained based on the optical flow method; the weight coefficient corresponding to the main object mask image is determined based on the global motion amplitude.
  • the sparse optical flow method can be used to determine the motion information of pixel blocks, thereby obtaining the global motion amplitude corresponding to each frame image in the target image group.
  • For example, the global motion amplitude is negatively correlated with the weight coefficient: the greater the global motion amplitude, that is, the faster the motion, the smaller the weight coefficient and the lower the clarity of the main object (though still higher than the clarity of the blurred background; the main object simply is not rendered perfectly sharp).
  • embodiments of the present disclosure can adjust the degree of object protection according to the global motion amplitude caused by lens movement.
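  • The source does not give a concrete formula for deriving the weight coefficient from the global motion amplitude, so the mapping below is purely illustrative: it averages the per-block flow magnitudes and decreases β linearly as motion grows, with assumed bounds beta_min and beta_max and an assumed normalization scale:

```python
import numpy as np

def weight_coefficient(block_flows, beta_max=1.0, beta_min=0.3, scale=20.0):
    """Map global motion amplitude to a weight coefficient (larger motion ->
    smaller beta -> weaker protection of the main object)."""
    amp = float(np.mean([np.linalg.norm(f, axis=-1).mean() for f in block_flows]))
    t = min(amp / scale, 1.0)        # normalized motion amplitude in [0, 1]
    return beta_max - t * (beta_max - beta_min)
```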
  • step (2) the pixel value of the main object mask image is adjusted based on the weight coefficient to obtain an adjusted main object mask image.
  • the weight coefficient may be multiplied by the pixel value of the subject object mask image to obtain an adjusted subject object mask image.
  • In step (3), image fusion is performed on the motion blur image and the specified frame image based on the adjusted main object mask image, for example using the following formula:
  • Merge_N' = β*mask_main*Pn + (1 - β*mask_main)*Merge_N
  • where β is the weight coefficient; mask_main is the main object mask image; β*mask_main is the adjusted main object mask image; Pn is the specified frame image; Merge_N is the motion blur image; and Merge_N' is the target fusion image.
  • In this way, the resulting target fusion image has a blurred background but a relatively clear main object, and the degree of clarity can be adjusted via the weight coefficient; the weight coefficient can be determined from the global motion amplitude caused by lens movement, so that the clarity of the main object tracks the global motion amplitude and the picture effect is more realistic and natural.
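  • The fusion formula above translates directly into array arithmetic; the sketch below assumes a float main-object mask in [0, 1] and 8-bit BGR images:

```python
import numpy as np

def fuse(merge_n, pn, mask_main, beta):
    """Merge_N' = beta*mask_main*Pn + (1 - beta*mask_main)*Merge_N."""
    m = (beta * mask_main)[..., None]            # adjusted main object mask
    out = m * pn.astype(np.float32) + (1.0 - m) * merge_n.astype(np.float32)
    return out.astype(np.uint8)
```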
  • For each image group, the corresponding target fusion image can be obtained using the above method, and all target fusion images arranged in order form the required target video. Because multiple frames of the initial video are fused into one frame of the target video, the frame rate is reduced, giving the user a sense of stutter. In summary, users need not be limited by shooting tools, shooting skills or shooting scenes: through the above video processing method provided by the embodiments of the present disclosure, using only software algorithms, a normally shot video can be conveniently and quickly converted into a target video with a clear main object, a blurred background and a stuttering feel.
  • The above target video has a unique style: it presents the user with a picture that has a sense of movement and stutter while the main object remains clear, so it can better highlight the main object.
  • the above video effects can reflect the inner consciousness of the subject to a certain extent and have strong appeal.
  • using methods such as sparse optical flow algorithm to perform motion blur during the processing can effectively reduce computing power, improve overall algorithm performance, and ensure the feasibility of mobile terminal implementation. Therefore, it can be implemented both on the server side and on the mobile terminal, and has a wider scope of application.
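  • Putting the sketches together, a hypothetical end-to-end driver (function names come from the sketches above, and alpha_segmenter stands in for any object instance segmentation model, which is not specified in the source) could look like:

```python
def process_video(frames, alpha_segmenter, n=6, stride=3, M=4):
    """Group frames, build a motion blur image per group, protect the main
    object of the middle (designated) frame, and fuse into target frames."""
    out = []
    for group in split_into_groups(frames, n, stride):
        mid = group[len(group) // 2]             # designated frame image
        blurred, flows = [group[0]], []
        for fa, fb in zip(group, group[1:]):
            blk_ab, blk_ba = bidirectional_block_flow(fa, fb)
            flows.append(blk_ab)
            blurred += insert_intermediate_frames(fa, blk_ab, M) + [fb]
        merge_n = motion_blur_image(blurred)     # fused motion blur image
        mask = main_object_mask(alpha_segmenter(mid))
        beta = weight_coefficient(flows)
        out.append(fuse(merge_n, mid, mask, beta))
    return out                                   # frame sequence of the target video
```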
  • FIG. 3 is a schematic structural diagram of a video processing device provided by an embodiment of the present disclosure.
  • The device can be implemented by software and/or hardware, and can generally be integrated in electronic equipment.
  • As shown in Figure 3, the video processing device includes: an image group acquisition module 302, configured to obtain multiple image groups based on the video frame sequence of the initial video;
  • the blur processing module 304 is configured to perform motion blur processing based on each frame image in the target image group, and fuse the images obtained by the motion blur processing of each frame image to obtain a motion blur image corresponding to the target image group;
  • Each image group in the plurality of image groups is the target image group;
  • the area determination module 306 is configured to determine the main object area and the background area corresponding to the target image group based on the specified frame image in the target image group;
  • the fusion module 308 is configured to fuse the motion blur image and the specified frame image according to the main object area and the background area to obtain a target fusion image;
  • the image part of the target fusion image in the main object area is the image part of the designated frame image in the main object area, and the image part of the target fusion image in the background area is the image part of the motion blur image in the background area;
  • the video generation module 310 is configured to generate a target video based on the target fusion images corresponding to the plurality of image groups; the playback order of the target fusion images corresponding to the plurality of image groups in the target video is the same as the playback order of the plurality of image groups in the initial video.
  • In some embodiments, the blur processing module 304 is configured to: use an optical flow interpolation algorithm to insert a specified number of intermediate frame images between adjacent frame images in the target image group; take all frame images in the target image group after interpolation as the images obtained by the motion blur processing of each frame image in the target image group; and averagely fuse the images obtained by the motion blur processing of each frame image.
  • In some embodiments, the blur processing module 304 is configured to: obtain bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group; and insert a specified number of intermediate frame images between adjacent frame images based on the bidirectional motion vectors of the pixel blocks and the block motion compensation algorithm.
  • In some embodiments, the blur processing module 304 is configured to: obtain bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group based on an improved DIS optical flow algorithm; the bottom-layer image resolution of the image pyramid used by the improved DIS optical flow algorithm is smaller than that used by the original DIS optical flow algorithm, and/or the number of iterations used by the improved DIS optical flow algorithm is smaller than the number of iterations used by the original DIS optical flow algorithm.
  • In some embodiments, the area determination module 306 is configured to: use the image located in the middle of the target image group as the designated frame image, use an object instance segmentation algorithm to process the designated frame image, and obtain the main object area and background area corresponding to the target image group based on the processing result.
  • In some embodiments, the fusion module 308 is configured to: obtain a main object mask image according to the main object area and the background area; obtain a weight coefficient corresponding to the main object mask image; adjust the pixel values of the main object mask image based on the weight coefficient to obtain an adjusted main object mask image; and perform image fusion on the motion blur image and the designated frame image based on the adjusted main object mask image.
  • In some embodiments, the fusion module 308 is configured to: obtain the global motion amplitude corresponding to each frame image in the target image group based on the optical flow method; and determine the weight coefficient corresponding to the main object mask image according to the global motion amplitude.
  • In some embodiments, the fusion module 308 is configured to perform image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image using the following formula:
  • Merge_N' = β*mask_main*Pn + (1 - β*mask_main)*Merge_N
  • where β is the weight coefficient; mask_main is the main object mask image; β*mask_main is the adjusted main object mask image; Pn is the designated frame image; Merge_N is the motion blur image; and Merge_N' is the target fusion image.
  • In some embodiments, the image group acquisition module 302 is configured to: divide the video frame sequence of the initial video at specified intervals to obtain multiple image groups; there are a preset number of overlapping frame images between two adjacent image groups.
  • the video processing device provided by the embodiments of the present disclosure can execute the video processing method provided by any embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 4 , electronic device 400 includes one or more processors 401 and memory 402 .
  • the processor 401 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 400 to perform desired functions.
  • Memory 402 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 401 may execute the program instructions to implement the video processing method of the embodiments of the present disclosure described above and/or other desired function.
  • Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.
  • the electronic device 400 may also include an input device 403 and an output device 404, with these components interconnected by a bus system and/or other forms of connection mechanisms (not shown).
  • the input device 403 may also include, for example, a keyboard, a mouse, and the like.
  • the output device 404 can output various information to the outside, including determined distance information, direction information, etc.
  • the output device 404 may include, for example, a display, a speaker, a printer, a communication network and remote output devices connected thereto, and the like.
  • the electronic device 400 may also include any other appropriate components depending on the specific application.
  • embodiments of the present disclosure may also be computer program products, which include computer program instructions that, when run by a processor, cause the processor to execute the video provided by the embodiments of the present disclosure. Approach.
  • The computer program product may include program code, written in any combination of one or more programming languages, for performing the operations of the embodiments of the present disclosure.
  • the programming languages include object-oriented programming languages, such as Java, C++, etc., and also include conventional procedural programming languages, such as "C" language or similar programming languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • Embodiments of the present disclosure may also be a computer-readable storage medium having computer program instructions stored thereon which, when run by a processor, cause the processor to perform the video processing method provided by the embodiments of the present disclosure.
  • the computer-readable storage medium may be any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may include, for example, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any combination thereof. Examples of readable storage media (a non-exhaustive list) include: an electrical connection having one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • An embodiment of the present disclosure also provides a computer program product, which includes a computer program/instructions.
  • An embodiment of the present disclosure also provides a computer program, which includes instructions. When executed by a processor, the instructions cause the processor to execute the video processing method provided by the embodiment of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure relate to a video processing method, apparatus, device and medium. The method includes: obtaining multiple image groups based on a video frame sequence of an initial video; performing motion blur processing based on each frame image in a target image group, and fusing the images obtained by the motion blur processing of each frame image to obtain a motion blur image corresponding to the target image group; determining a main object area and a background area corresponding to the target image group based on a specified frame image in the target image group, where each image group in the multiple image groups serves as the target image group; fusing the motion blur image and the specified frame image according to the main object area and the background area to obtain a target fusion image; and generating a target video based on the target fusion images corresponding to the multiple image groups.

Description

Video processing method, apparatus, device and medium
Cross-Reference to Related Applications
This application is based on CN application No. 202210705983.3, filed on June 21, 2022, and claims its priority; the disclosure of that CN application is hereby incorporated into this application in its entirety.
Technical Field
The present disclosure relates to the technical field of video processing, and in particular to a video processing method, apparatus, device and medium.
Background
In the field of video creation, creators usually shoot videos according to their needs. Different shooting methods produce different video effects. In some situations, creators need to shoot a video effect in which the main object is clear, the background is blurred and shaky, and the footage carries a stuttering feel. Such an effect mostly requires professional shooting tools for slow-shutter shooting and/or shooting with moving lenses, and also requires the video creator to have excellent shooting skills and a suitable shooting scene.
Summary
Embodiments of the present disclosure provide a video processing method, the method including: obtaining multiple image groups based on a video frame sequence of an initial video; performing motion blur processing based on each frame image in a target image group, and fusing the images obtained by the motion blur processing of each frame image to obtain a motion blur image corresponding to the target image group, where each image group in the multiple image groups serves as the target image group; determining a main object area and a background area corresponding to the target image group based on a specified frame image in the target image group; fusing the motion blur image and the specified frame image according to the main object area and the background area to obtain a target fusion image, where the image part of the target fusion image in the main object area is the image part of the specified frame image in the main object area, and the image part of the target fusion image in the background area is the image part of the motion blur image in the background area; and generating a target video based on the target fusion images corresponding to the multiple image groups, where the playback order of the target fusion images corresponding to the multiple image groups in the target video is the same as the playback order of the multiple image groups in the initial video.
In some embodiments, the step of performing motion blur processing based on each frame image in the target image group and fusing the images obtained by the motion blur processing includes: using an optical flow interpolation algorithm to insert a specified number of intermediate frame images between adjacent frame images in the target image group, and taking all frame images in the target image group after interpolation as the images obtained by the motion blur processing of each frame image in the target image group; and averagely fusing the images obtained by the motion blur processing of each frame image.
In some embodiments, the step of using an optical flow interpolation algorithm to insert a specified number of intermediate frame images between adjacent frame images in the target image group includes: obtaining bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group; and inserting a specified number of intermediate frame images between the adjacent frame images according to the bidirectional motion vectors of the pixel blocks and a block motion compensation algorithm.
In some embodiments, the step of obtaining bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group includes: obtaining the bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group based on an improved DIS optical flow algorithm, where the bottom-layer image resolution of the image pyramid used by the improved DIS optical flow algorithm is smaller than that used by the original DIS optical flow algorithm, and/or the number of iterations used by the improved DIS optical flow algorithm is smaller than that used by the original DIS optical flow algorithm.
In some embodiments, the step of determining the main object area and the background area based on the specified frame image in the target image group includes: taking the image located at the middle position of the target image group as the specified frame image, processing the specified frame image with an object instance segmentation algorithm, and obtaining the main object area and the background area corresponding to the target image group based on the processing result.
In some embodiments, the step of performing image fusion on the motion blur image and the specified frame image according to the main object area and the background area includes: obtaining a main object mask image according to the main object area and the background area; obtaining a weight coefficient corresponding to the main object mask image; adjusting the pixel values of the main object mask image based on the weight coefficient to obtain an adjusted main object mask image; and performing image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image.
In some embodiments, the step of obtaining the weight coefficient corresponding to the main object mask image includes: obtaining the global motion amplitude corresponding to each frame image in the target image group based on an optical flow method; and determining the weight coefficient corresponding to the main object mask image according to the global motion amplitude.
In some embodiments, the step of performing image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image includes fusing them with the following formula:
Merge_N' = β*mask_main*Pn + (1 - β*mask_main)*Merge_N
where β is the weight coefficient; mask_main is the main object mask image; β*mask_main is the adjusted main object mask image; Pn is the specified frame image; Merge_N is the motion blur image; and Merge_N' is the target fusion image.
In some embodiments, the step of obtaining multiple image groups based on the video frame sequence of the initial video includes: dividing the video frame sequence of the initial video at specified intervals to obtain multiple image groups, where two adjacent image groups have a preset number of overlapping frame images.
Embodiments of the present disclosure also provide a video processing apparatus, including: an image group acquisition module configured to obtain multiple image groups based on a video frame sequence of an initial video; a blur processing module configured to perform motion blur processing based on each frame image in a target image group and fuse the images obtained by the motion blur processing of each frame image to obtain a motion blur image corresponding to the target image group, where each image group in the multiple image groups serves as the target image group; a region determination module configured to determine a main object area and a background area corresponding to the target image group based on a specified frame image in the target image group; a fusion module configured to fuse the motion blur image and the specified frame image according to the main object area and the background area to obtain a target fusion image, where the image part of the target fusion image in the main object area is the image part of the specified frame image in the main object area, and the image part of the target fusion image in the background area is the image part of the motion blur image in the background area; and a video generation module configured to generate a target video based on the target fusion images corresponding to the multiple image groups, where the playback order of the target fusion images corresponding to the multiple image groups in the target video is the same as the playback order of the multiple image groups in the initial video.
Embodiments of the present disclosure also provide an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor is configured to read the executable instructions from the memory and execute them to implement the video processing method provided by embodiments of the present disclosure.
Embodiments of the present disclosure also provide a computer-readable storage medium storing a computer program which, when run by a processor, causes the processor to execute the video processing method provided by embodiments of the present disclosure.
Embodiments of the present disclosure also provide a computer program, including instructions which, when executed by a processor, cause the processor to execute the video processing method provided by embodiments of the present disclosure.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand through the following description.
Brief Description of the Drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
In order to explain the technical solutions in the embodiments of the present disclosure or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Figure 1 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure;
Figure 2 is a schematic diagram of frame interpolation between adjacent frame images provided by an embodiment of the present disclosure;
Figure 3 is a schematic structural diagram of a video processing apparatus provided by an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to understand the above objects, features and advantages of the present disclosure more clearly, the solutions of the present disclosure are further described below. It should be noted that, where there is no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways different from those described here; obviously, the embodiments in the specification are only some, not all, of the embodiments of the present disclosure.
As mentioned above, obtaining a video effect with a clear main object, a blurred, shaky background and a stuttering feel mostly requires professional shooting tools for slow-shutter shooting and/or shooting with moving lenses, as well as excellent shooting skills and a suitable shooting scene. Most video creators cannot satisfy these conditions, so shooting the above effect is quite difficult.
To obtain such an effect, a professional stabilizer with a tripod is usually needed for slow-shutter shooting; the slow shutter produces the blurred background and motion smear that create the hazy feel of the picture, and during shooting the slow shutter must be adjusted professionally and the exposure set reasonably to achieve the desired result. In addition, this effect places high demands on the shooting scene, e.g., it must be shot at night or in dim light; otherwise the picture is easily overexposed when light is sufficient.
In related technologies, motion smear is mostly produced by controlling the shooting frame rate and the exposure time, but this approach is limited by the shooting scene: it can only be used in dim scenes and is not applicable to all scenes. Moreover, the main object in the video cannot be protected; the whole picture is smeared, making it difficult to keep the main object clear. In addition, individual users mostly lack professional stabilizers, and the subject often becomes blurred because of handheld shake.
To improve on the above problems, embodiments of the present disclosure provide a video processing method, apparatus, device and medium, which can use software processing to turn a normally shot video into a video with a clear main subject, a blurred, shaky background and a stuttering feel, as described in detail below.
Figure 1 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure. The method may be executed by a video processing apparatus, which may be implemented in software and/or hardware and may generally be integrated into an electronic device. As shown in Figure 1, the method mainly includes the following steps S102 to S110.
Step S102: Obtain multiple image groups based on the video frame sequence of the initial video.
The initial video may be a video shot without restrictions on shooting tools, shooting skills or shooting scenes; for example, it may be a video the user shoots in any scene with just a mobile phone. The initial video may be a video shot by the user in real time, or a pre-shot video uploaded by the user.
In some embodiments, the video frame sequence of the initial video may be divided at specified intervals to obtain multiple image groups. Embodiments of the present disclosure do not limit the division method, which may be, for example, even division (i.e., equally spaced division), uneven division, or cross division (adjacent image groups obtained by cross division have overlapping frame images). The specified interval may be a count interval, so each image group may contain the same number of frame images, namely N frames. N can be set flexibly according to requirements; for example, it may be determined with reference to the frame rate of the initial video and the actual frame rate of the required video, e.g., N may be the ratio of the two, rounded to the nearest integer if the ratio is not an integer. In some embodiments, the frame images of two adjacent image groups are completely different; in other embodiments, some frame images of two adjacent image groups are the same, i.e., two adjacent image groups have a preset number of overlapping frame images. This both ensures a reasonable number of image groups (i.e., a reasonable frame rate of the subsequently generated video) and preserves the image fusion effect of each image group in subsequent processing. For ease of understanding, an example follows.
Suppose the original frame rate of the initial video is X fps. To generate a continuously stuttering video, frames can be processed in groups of N so that each group of N frames is later fused into one frame. For example, if the actual frame rate of the required video is between 10 fps and 15 fps, N = X/10 can be chosen, i.e., X/10 original frames are fused into one frame: at 30 fps, 3 original frames are fused into one; at 60 fps, 6 original frames are fused into one. These are only examples of choosing N and should not be regarded as limiting. For the video frame sequence to be processed, suppose Pi is the i-th frame image. In some embodiments, P1 to P6 form one image group, P7 to P12 another, P12 to P17 another, and so on; this yields relatively few image groups and a low final frame rate, making the stutter too obvious. If instead the number of frames per group is reduced, e.g., P1 to P3 as one group and P4 to P6 as another, fusing only 3 frames at a time produces little motion smear and obvious flow effects are hard to observe. To achieve better fusion, embodiments of the present disclosure can reuse image frames: each group still contains 6 frames, but adjacent groups overlap, i.e., P1 to P6, P4 to P9, P7 to P12, P10 to P15, and so on, so that any two adjacent image groups share 3 frames. This frame reuse doubles the number of image groups while keeping 6 frames per group, ensuring both a reasonable number of image groups and the fusion effect of the multi-frame images in each group, i.e., enhancing the blurred, flowing feel of the overall picture while maintaining the frame rate of the generated video.
Each image group is taken as the target image group in turn; that is, the following steps S104 to S108 are executed for each image group.
Step S104: Perform motion blur processing based on each frame image in the target image group, and fuse the images obtained by the motion blur processing of each frame image to obtain a motion blur image corresponding to the target image group.
Motion blur processing is a post-processing technique that captures the effect of a moving subject (an object, animal, person, etc.), mainly simulating the camera technique of exposing while the subject moves. For example, it simulates the indirect exposure function of photographing moving objects, so that the image gains a dynamic effect, such as the effect of objects sweeping past or moving. For example, the motion blur is applied along a specified direction.
In embodiments of the present disclosure, motion blur processing is performed based on each frame image in the target image group, and all images obtained by the motion blur processing are fused. For example, the images obtained by the motion blur processing may include not only the processed original frame images in the target image group but also additional frame images inserted on the basis of the original frames during the processing. After all images are fused, the motion blur image corresponding to the target image group is obtained; this image has a blurred, shaky picture effect.
Step S106: Determine the main object area and the background area corresponding to the target image group based on the specified frame image in the target image group.
Embodiments of the present disclosure do not limit the type of the main object; it may be, for example, a person, an animal, or an object such as a vehicle.
So that the main object can appear relatively clearly in the picture, embodiments of the present disclosure propose an object protection strategy. For example, a specified frame image can be selected from the target image group, e.g., the frame at its middle position; by performing object segmentation on the specified frame image, the main object area and the background area corresponding to the target image group can be obtained from the segmentation result, and the main object can subsequently be protected through these areas. For example, object segmentation is performed on the specified frame image (portrait segmentation if the main object is a person) to obtain its main object area and background area, which serve as the main object area and background area of the target image group; the background area is the area other than the main object area.
It should be noted that steps S104 and S106 have no sequential relationship and can be executed in parallel.
Step S108: Fuse the motion blur image and the specified frame image according to the main object area and the background area to obtain a target fusion image.
The image part of the target fusion image in the main object area is the image part of the specified frame image in the main object area, and the image part of the target fusion image in the background area is the image part of the motion blur image in the background area. That is, the main object area of the target fusion image is composed of the pixels of the main object area in the specified frame image, and the background area of the target fusion image is composed of the pixels of the background area in the motion blur image. In this way, the target fusion image has both a blurred, shaky background and a relatively clear main object.
For example, after the specified frame image is segmented into the main object area and the background area, a specific method can be used to distinguish the two areas. For instance, a main object mask image can be generated from them, marking different areas with different pixel values; illustratively, the pixel values of the background area in the main object mask image are all 0 and those of the main object area are all 1. The motion blur image and the specified frame image are then fused based on the main object mask image to obtain a target fusion image that combines the clear main object of the specified frame image with the blurred, shaky background of the motion blur image.
Step S110: Generate a target video based on the target fusion images corresponding to the multiple image groups; the playback order of the target fusion images in the target video is the same as the playback order of the multiple image groups in the initial video.
Each image group has executed the above steps S104 to S108 as the target image group, so each image group corresponds to one target fusion image. All target fusion images are arranged according to the positions of the multiple image groups in the video frame sequence of the initial video; each target fusion image serves as one frame of the target video, and the ordered target fusion images constitute the video frame sequence of the target video, i.e., the target video. The target video contains fewer video frames than the initial video, and every frame of the target video is a fusion of multiple frames of the initial video after motion blur, main object protection and other processing. Therefore, the target video conveys a certain stuttering feel, with a blurred, shaky background but a clear main subject.
In this way, a software algorithm can process a normally shot video into a video with a clear main subject, a blurred background and a stuttering effect, so users can obtain this shooting effect conveniently and quickly without being limited by shooting tools, skills or scenes.
The above technical solution provided by embodiments of the present disclosure obtains multiple image groups from the video frame sequence of the initial video and takes each as the target image group to: perform motion blur processing based on each frame image in the target image group and fuse the resulting images to obtain the corresponding motion blur image; determine the main object area and background area from the specified frame image; fuse the motion blur image and the specified frame image according to those areas to obtain the target fusion image; and finally generate the target video from the target fusion images of the multiple image groups. Thus, users can conveniently and quickly obtain the above video effect with only software processing.
In some embodiments, the step of performing motion blur processing based on each frame image in the target image group and fusing the images obtained from the motion blur processing may be performed with reference to the following steps A and B.
Step A: use an optical flow interpolation algorithm to insert a specified number of intermediate frame images between every pair of adjacent frame images in the target image group, and take all frame images in the interpolated target image group as the images obtained by subjecting the frame images of the target image group to motion blur processing.
Optical flow is the "instantaneous velocity" of the pixel motion of a spatially moving object on the observation imaging plane. Optical flow research uses the temporal variation and correlation of pixel intensity data in an image sequence to determine the "motion" of each pixel position. In other words, an optical flow algorithm matches the pixels of one image against those of another; the matching reveals how pixels "move" or "flow" from one image to the other. Once every pixel has been matched, intermediate views between the two images can be interpolated by locally shifting pixels. In some embodiments, to save computing power and improve processing efficiency, frame interpolation may be performed by sparse optical flow interpolation. For example, each frame image is divided into pixel blocks of a specified size (e.g., 16*16), and block matching and motion vector computation are performed in units of pixel blocks. All pixels within the same pixel block share the same motion vector, while motion vectors of different pixel blocks may or may not be the same. This greatly reduces the required computing power, so that both servers and mobile terminals can directly adopt this approach for video processing. On this basis, in some examples, step A may be performed with reference to the following steps A1 to A2.
Step A1: obtain the bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group.
For example, the bidirectional motion vectors include a forward motion vector and a backward motion vector. Suppose the adjacent frame images are a preceding frame image Fa and a succeeding frame image Fb. Taking Fa as the reference, the pixel blocks of Fa are matched against those of Fb, and the forward motion vector is computed in the direction from Fa to Fb. Taking Fb as the reference, the pixel blocks of Fb are matched against those of Fa, and the backward motion vector is computed in the direction from Fb to Fa. The bidirectional motion vectors characterize the optical-flow motion trend of a pixel block between the images in a sound and reliable manner.
In some embodiments, the bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group may be obtained based on an improved DIS optical flow algorithm.
For example, the bottom-layer image resolution of the image pyramid used by the improved DIS optical flow algorithm is lower than that of the image pyramid used by the original DIS optical flow algorithm.
For example, the number of iterations used by the improved DIS optical flow algorithm is smaller than that used by the original DIS optical flow algorithm. Illustratively, the bottom-layer image resolution of the pyramid in the original DIS optical flow algorithm is the full image resolution, whereas in the improved algorithm it is 1/4 of the full resolution; the original DIS optical flow algorithm uses 12 iterations, whereas the improved algorithm uses 5.
DIS optical flow is short for the Dense Inverse Search-based method. The original DIS optical flow algorithm is a dense optical flow algorithm; in the embodiments of the present disclosure it is improved in order to save computing power. For example, the DIS algorithm scales the image to different sizes to build an image pyramid, then estimates optical flow layer by layer starting from the lowest-resolution layer, with each layer's estimate initializing the next layer's estimate, so as to accurately estimate motions of different magnitudes. In the embodiments of the present disclosure, however, only a sparse optical flow is needed (i.e., all pixels in a pixel block share one flow vector rather than each pixel computing its own; the flow represents the motion vector). The DIS optical flow algorithm is therefore improved by lowering the bottom-layer (i.e., highest) resolution of the image pyramid; illustratively, the highest resolution is set to 1/4 of the original image. Moreover, no densification step is needed at the highest resolution, so a sparse flow is obtained directly. In addition, since only a sparse flow is needed and high precision is not required, a small number of iterations suffices for the gradient descent solver, so the original 12 iterations of the DIS optical flow algorithm are reduced to 5. With these improvements, the improved DIS optical flow algorithm can quickly obtain the bidirectional motion vectors of pixel blocks between adjacent frame images.
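As an illustrative sketch only: OpenCV ships a DIS optical flow implementation whose finest pyramid scale and gradient-descent iterations can be restricted in the spirit of the improvement described above. The parameter values below mirror the examples in this disclosure, but the actual improved algorithm may differ:

```python
import cv2
import numpy as np

def block_bidirectional_flow(fa, fb, block=16):
    """Estimate per-block bidirectional motion vectors with a cheapened DIS flow.

    Sketch: OpenCV's DIS implementation stands in for the improved algorithm
    described in the disclosure; fa/fb are BGR frames of equal size.
    """
    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
    dis.setFinestScale(2)                       # stop 2 pyramid levels early (~1/4 resolution)
    dis.setGradientDescentIterations(5)         # 5 iterations instead of the usual 12
    dis.setVariationalRefinementIterations(0)   # skip the dense refinement step

    ga = cv2.cvtColor(fa, cv2.COLOR_BGR2GRAY)
    gb = cv2.cvtColor(fb, cv2.COLOR_BGR2GRAY)
    fwd = dis.calc(ga, gb, None)                # forward flow F_ab
    bwd = dis.calc(gb, ga, None)                # backward flow F_ba

    h, w = ga.shape
    # Reduce the dense result to one shared vector per 16x16 block (sparse flow).
    fwd_blocks = fwd[:h - h % block, :w - w % block].reshape(
        h // block, block, w // block, block, 2).mean(axis=(1, 3))
    bwd_blocks = bwd[:h - h % block, :w - w % block].reshape(
        h // block, block, w // block, block, 2).mean(axis=(1, 3))
    return fwd_blocks, bwd_blocks
```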
Step A2: insert a specified number of intermediate frame images between the adjacent frame images according to the bidirectional motion vectors of the pixel blocks and a block motion compensation algorithm. An intermediate frame image is an image inserted between adjacent frame images.
Motion compensation is a way of describing the difference between adjacent frames, for example describing how each pixel block of the preceding frame image gradually moves to a certain position in the succeeding frame image. In a block motion compensation algorithm (also called block-wise motion compensation), each frame image is divided into pixel blocks, and the position of a pixel block in an intermediate frame image can be predicted from the pixel block in the original frame image and its motion vector. For example, once the bidirectional motion vectors between the pixel blocks of adjacent frame images are known, the pixel blocks of the adjacent frame images may each be sampled M times at equal distances along the motion path, with each sampling inserting one frame. The sample count M characterizes the fineness of the image fusion: the larger M is, the more natural the fusion; the smaller M is, the coarser the fusion, with obvious overlap traces likely to appear. Frame interpolation via block motion compensation thus yields blur-effect images between adjacent frames. For ease of understanding, see the schematic diagram of frame interpolation between adjacent frame images shown in FIG. 2. Fa and Fb are adjacent frames; for any pixel block block_i in frame Fa, the corresponding block_i0 and block_iM are found in the preceding and succeeding frames, and, via the block's bidirectional motion vectors (forward motion vector F_ab and backward motion vector F_ba), M equidistant samples are taken along the respective motion paths, each sample inserting one frame. Illustratively, FIG. 2 shows the pixel block positions at the j-th and k-th samplings: the pixel block corresponding to the j-th sampling is block_ij, and that corresponding to the k-th sampling is block_ik. As shown in FIG. 2, each pixel is copied and superimposed along the motion path of the pixel block to which it belongs, producing a realistic, smooth motion blur effect. In this way, through multiple samplings, multiple intermediate frame images, all of which are blurred images, can be inserted between adjacent frame images.
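A simplified block motion compensation sketch follows. Its assumptions are illustrative, not taken from the disclosure: blocks move linearly along their motion vectors, overlapping contributions are averaged, uncovered pixels fall back to Fa, and occlusion handling is omitted; fa/fb and the per-block vectors come from the previous sketch:

```python
import numpy as np

def interpolate_frames(fa, fb, fwd_blocks, bwd_blocks, num_samples, block=16):
    """Insert num_samples intermediate frames between adjacent frames fa and fb
    via block motion compensation (equidistant sampling along the motion path)."""
    h, w = fa.shape[:2]
    mids = []
    for s in range(1, num_samples + 1):
        t = s / (num_samples + 1)  # equidistant sample position on the motion path
        acc = np.zeros((h, w, 3), np.float32)
        cnt = np.zeros((h, w, 1), np.float32)
        for by in range(fwd_blocks.shape[0]):
            for bx in range(fwd_blocks.shape[1]):
                y, x = by * block, bx * block
                # Shift the block from Fa forward by t and from Fb backward by (1 - t),
                # then superimpose both motion-compensated predictions.
                for src, (dx, dy) in ((fa, fwd_blocks[by, bx] * t),
                                      (fb, bwd_blocks[by, bx] * (1 - t))):
                    yd = int(np.clip(np.rint(y + dy), 0, h - block))
                    xd = int(np.clip(np.rint(x + dx), 0, w - block))
                    acc[yd:yd + block, xd:xd + block] += src[y:y + block, x:x + block]
                    cnt[yd:yd + block, xd:xd + block] += 1.0
        mid = np.where(cnt > 0, acc / np.maximum(cnt, 1.0), fa.astype(np.float32))
        mids.append(mid.astype(np.uint8))
    return mids
```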
Step B: fuse the images obtained by subjecting the frame images to motion blur processing by averaging.
Averaging the pixel values of all images obtained from the motion blur processing (the original adjacent frame images and the inserted intermediate frame images) yields the motion blur image corresponding to the target image group. In this way, the final motion blur image simulates the indirect exposure of shooting a moving object with a camera, giving the image a dynamic, swaying effect. Moreover, the block-based processing reduces the required computing power while maintaining the image fusion effect, effectively improving overall algorithm performance and ensuring feasibility on mobile terminals.
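The averaging itself is a direct pixel-wise mean, for example:

```python
import numpy as np

def average_fuse(images):
    """Pixel-wise average of all motion-blur-processed images
    (the original adjacent frames plus the inserted intermediate frames)."""
    stack = np.stack([img.astype(np.float32) for img in images])
    return stack.mean(axis=0).astype(np.uint8)

# e.g., for one pair of adjacent frames: merge_n = average_fuse([fa] + mids + [fb])
```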
In this way, a motion blur image can be generated from the frame images of each image group. The degree of blur of a motion blur image is generally proportional to the degree of motion: the faster the motion, the longer the trails. The principle and achievable effect of the above algorithm are consistent with those of a real slow shutter and the blur it produces. They therefore share one problem: when the user wants the background motion-blurred but the main object relatively clear, neither the above blur algorithm nor real shooting can avoid the blurring of the main object caused by the object's own motion or by shaking of the shooting device. In other words, the main object in the motion blur image obtained through the above motion blur processing is also blurred and is difficult to present clearly to the user. To improve on this, the embodiments of the present disclosure propose an object protection strategy: object segmentation may be performed based on the specified frame image in the target image group to obtain the main object region and the background region corresponding to the target image group, and the main object is then protected through these regions. For example, the image at the middle position of the target image group may be chosen as the specified frame image, which helps make the subsequent fusion more natural.
In some embodiments, the image at the middle position of the target image group is taken as the specified frame image, an object instance segmentation algorithm is applied to the specified frame image, and the main object region and the background region corresponding to the target image group are obtained from the processing result. For example, a main object mask image can be obtained from the main object region and the background region. In some embodiments, the specified frame image may contain at least one object, in which case the main object mask may be determined from among the at least one object mask: the main object mask is the object mask closest to the image center, from which the main object mask image is obtained.
In some embodiments of determining the main object mask image, reference may also be made to the following steps 1 to 4 (a code sketch follows these steps).
Step 1: perform image erosion on the object segmentation result (the alpha matte) of the specified frame image to reduce the connectivity between multiple objects.
Step 2: binarize the eroded image and then perform connected-region detection, so as to find the large connected region closest to the image center as the main object.
Step 3: perform a dilation operation on the selected connected region and map it back onto the original alpha matte to obtain the main object mask.
Step 4: optimize the main object mask; illustratively, apply mean blur and edge smoothing to obtain the main object mask image.
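The four steps can be sketched with standard morphology operations. Kernel sizes, the binarization threshold and the minimum-area criterion below are illustrative assumptions; alpha is assumed to be a single-channel 8-bit matte:

```python
import cv2
import numpy as np

def main_object_mask(alpha, kernel_size=5):
    """Extract the main object mask image from an alpha matte (steps 1-4)."""
    k = np.ones((kernel_size, kernel_size), np.uint8)
    eroded = cv2.erode(alpha, k)                                     # step 1: reduce connectivity
    _, binary = cv2.threshold(eroded, 127, 255, cv2.THRESH_BINARY)   # step 2: binarize
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    h, w = alpha.shape[:2]
    center = np.array([w / 2, h / 2])
    # Pick the large component whose centroid is closest to the image center
    # (label 0 is the background; tiny specks are ignored).
    best, best_dist = 0, np.inf
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < 0.01 * h * w:
            continue
        d = np.linalg.norm(centroids[i] - center)
        if d < best_dist:
            best, best_dist = i, d
    if best == 0:                           # no sufficiently large component found
        return alpha
    region = np.where(labels == best, 255, 0).astype(np.uint8)
    region = cv2.dilate(region, k)                                   # step 3: dilate
    mask = cv2.bitwise_and(alpha, alpha, mask=region)                # map back onto the matte
    return cv2.blur(mask, (kernel_size, kernel_size))                # step 4: mean blur / smooth edges
```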
In this way, the main object mask image can be obtained, so that the main object can subsequently be protected by using the main object mask image.
It should be noted that the two processes of obtaining the motion blur image and the main object mask image corresponding to the target image group are not ordered and may be executed in parallel.
After the motion blur image and the main object mask image are obtained as above, in some embodiments the target fused image corresponding to the target image group can be obtained from the motion blur image, the main object mask image and the specified frame image.
To make the frame pictures of the resulting target video more realistic, the embodiments of the present disclosure may also control the degree of main object protection. For instance, when the global motion magnitude is large, the main object is not made especially clear, so as to avoid a jarring effect. On this basis, the step of fusing the motion blur image and the specified frame image according to the main object mask image may be performed with reference to the following steps (1) to (3).
In step (1), the weight coefficient corresponding to the main object mask image is obtained. The weight coefficient is related to the degree of protection of the main object: the larger the weight coefficient, the higher the degree of protection and the clearer the main object.
In some examples, the global motion magnitude corresponding to each frame image in the target image group may be obtained based on an optical flow method, and the weight coefficient corresponding to the main object mask image is determined from the global motion magnitude. The embodiments of the present disclosure do not limit the optical flow method; for example, a sparse optical flow method may be used to determine the motion information of pixel blocks and thus obtain the global motion magnitude corresponding to each frame image in the target image group. The global motion magnitude is negatively correlated with the weight coefficient: the larger the global motion magnitude, i.e., the faster the motion, the smaller the weight coefficient and the relatively lower the clarity of the main object (still higher than that of the blurred background; the main object merely stops being especially sharp). In summary, the embodiments of the present disclosure can adjust the degree of object protection according to the global motion magnitude caused by camera movement (a sketch of this mapping follows).
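As a minimal sketch of the negative correlation, assuming a linear mapping; the linear form and all bounds below are illustrative assumptions, as the disclosure does not specify the exact mapping:

```python
def mask_weight(global_motion, beta_max=1.0, beta_min=0.3, motion_cap=20.0):
    """Map the global motion magnitude to the mask weight coefficient beta.

    Negative correlation: the faster the global motion, the smaller beta,
    i.e., the weaker the subject protection. global_motion could be, for
    example, the mean magnitude of the block motion vectors:
        float(np.linalg.norm(fwd_blocks, axis=-1).mean())
    """
    ratio = min(max(global_motion, 0.0) / motion_cap, 1.0)
    return beta_max - (beta_max - beta_min) * ratio
```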
In step (2), the pixel values of the main object mask image are adjusted based on the weight coefficient to obtain an adjusted main object mask image. In some examples, the weight coefficient may be multiplied by the pixel values of the main object mask image to obtain the adjusted main object mask image.
In step (3), image fusion is performed on the motion blur image and the specified frame image based on the adjusted main object mask image. Illustratively, the following formula may be used to fuse the motion blur image and the specified frame image:
Merge_N’=β*mask_main*Pn+(1-β*mask_main)*Merge_N
where β is the weight coefficient; mask_main is the main object mask image; β*mask_main is the adjusted main object mask image; Pn is the specified frame image; Merge_N is the motion blur image; and Merge_N' is the target fused image.
With image fusion based on the above formula, the resulting target fused image has a blurred, swaying background but a relatively clear main object, and the degree of clarity can be regulated via the weight coefficient. The weight coefficient in turn can be determined from the global motion magnitude caused by camera movement, so that the clarity of the main object correlates with the global motion magnitude and the picture looks more real and natural.
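The fusion formula translates directly into code (sketch; mask_main is assumed here to be a single-channel float mask in [0, 1], matching the 0/1 mask values described above):

```python
import numpy as np

def fuse_target_image(merge_n, pn, mask_main, beta):
    """Merge_N' = beta*mask_main*Pn + (1 - beta*mask_main)*Merge_N."""
    m = (beta * mask_main.astype(np.float32))[..., None]  # broadcast over color channels
    out = m * pn.astype(np.float32) + (1.0 - m) * merge_n.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```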
After target fused images have been obtained in the above manner for all image groups produced by splitting the video frame sequence of the initial video (even split, uneven split, interleaved split, etc.; the splitting manner is not limited), all target fused images are arranged in order to form the desired target video. Since multiple frames of the initial video are fused into one frame of the target video, the frame rate is reduced, giving the user a laggy feel. In summary, without being restricted by shooting tools, skills or scenes, and merely through the video processing method provided by the embodiments of the present disclosure, a software algorithm can conveniently and quickly convert a user's normally shot video into a target video with a clear main object, a swaying background and a laggy feel. The target video has a distinctive style, presenting the user with a picture full of motion and lag in which the main object nonetheless remains clear, thereby highlighting the main object well. Taking a person as the main object, this video effect can to some extent express the inner consciousness of the main figure and is highly evocative. In addition, using approaches such as the sparse optical flow algorithm for motion blur during processing effectively reduces the required computing power, improves overall algorithm performance and ensures feasibility on mobile terminals; the method can thus be implemented both on servers and on mobile terminals, giving it a wide range of applications.
Corresponding to the foregoing video processing method, an embodiment of the present disclosure provides a video processing apparatus. FIG. 3 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure. The apparatus may be implemented in software and/or hardware and may generally be integrated into an electronic device such as that shown in FIG. 4.
The video processing apparatus includes: an image group acquisition module 302, configured to obtain multiple image groups based on the video frame sequence of an initial video;
a blur processing module 304, configured to perform motion blur processing based on each frame image in a target image group, and fuse the images obtained by subjecting the frame images to the motion blur processing, to obtain a motion blur image corresponding to the target image group, where each of the multiple image groups serves as the target image group;
a region determination module 306, configured to determine, based on a specified frame image in the target image group, the main object region and the background region corresponding to the target image group;
a fusion module 308, configured to fuse the motion blur image and the specified frame image according to the main object region and the background region, to obtain a target fused image, where the image portion of the target fused image in the main object region is the image portion of the specified frame image in the main object region, and the image portion of the target fused image in the background region is the image portion of the motion blur image in the background region; and
a video generation module 310, configured to generate a target video based on the target fused images respectively corresponding to the multiple image groups, where the playback order of the target fused images in the target video is the same as the playback order of the multiple image groups in the initial video.
Through the above apparatus, a software algorithm alone can process a normally shot video into a video with a clear main portrait, a swaying background and a laggy feel, allowing users to obtain this shooting effect conveniently and quickly without being restricted by shooting tools, skills or scenes.
In some embodiments, the blur processing module 304 is configured to: use an optical flow interpolation algorithm to insert a specified number of intermediate frame images between every pair of adjacent frame images in the target image group, take all frame images in the interpolated target image group as the images obtained by subjecting the frame images of the target image group to motion blur processing, and fuse these images by averaging.
In some embodiments, the blur processing module 304 is configured to: obtain the bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group, and insert a specified number of intermediate frame images between the adjacent frame images according to the bidirectional motion vectors of the pixel blocks and a block motion compensation algorithm.
In some embodiments, the blur processing module 304 is configured to: obtain the bidirectional motion vectors of pixel blocks between adjacent frame images in the target image group based on an improved DIS optical flow algorithm, where the bottom-layer image resolution of the image pyramid used by the improved DIS optical flow algorithm is lower than that of the image pyramid used by the original DIS optical flow algorithm, and/or the number of iterations used by the improved DIS optical flow algorithm is smaller than that used by the original DIS optical flow algorithm.
In some embodiments, the region determination module 306 is configured to: take the image at the middle position of the target image group as the specified frame image, process the specified frame image by using an object instance segmentation algorithm, and obtain the main object region and the background region corresponding to the target image group based on the processing result.
In some embodiments, the fusion module 308 is configured to: obtain a main object mask image according to the main object region and the background region; obtain the weight coefficient corresponding to the main object mask image; adjust the pixel values of the main object mask image based on the weight coefficient to obtain an adjusted main object mask image; and perform image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image.
In some embodiments, the fusion module 308 is configured to: obtain, based on an optical flow method, the global motion magnitude corresponding to each frame image in the target image group, and determine the weight coefficient corresponding to the main object mask image according to the global motion magnitude.
In some embodiments, the fusion module 308 is configured to perform image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image by using the following formula:
Merge_N’=β*mask_main*Pn+(1-β*mask_main)*Merge_N
where β is the weight coefficient; mask_main is the main object mask image; β*mask_main is the adjusted main object mask image; Pn is the specified frame image; Merge_N is the motion blur image; and Merge_N’ is the target fused image.
In some embodiments, the image group acquisition module 302 is configured to: split the video frame sequence of the initial video at a specified interval to obtain the multiple image groups, where two adjacent image groups share a preset number of overlapping frame images.
The video processing apparatus provided by the embodiments of the present disclosure can execute the video processing method provided by any embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the executed method.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the working process of the apparatus embodiment described above, reference may be made to the corresponding process in the method embodiment, which is not repeated here.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the above video processing method. FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in FIG. 4, the electronic device 400 includes one or more processors 401 and a memory 402.
The processor 401 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 400 to perform desired functions.
The memory 402 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 401 may run the program instructions to implement the video processing method of the embodiments of the present disclosure described above and/or other desired functions. Various contents such as input signals, signal components and noise components may also be stored in the computer-readable storage medium.
In some examples, the electronic device 400 may further include an input apparatus 403 and an output apparatus 404, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
In addition, the input apparatus 403 may include, for example, a keyboard, a mouse, and the like.
The output apparatus 404 may output various information to the outside, including determined distance information, direction information and the like. The output apparatus 404 may include, for example, a display, a speaker, a printer, a communication network and the remote output devices connected thereto, and the like.
Of course, for simplicity, only some of the components of the electronic device 400 relevant to the present disclosure are shown in FIG. 4, and components such as buses and input/output interfaces are omitted. In addition, the electronic device 400 may include any other appropriate components depending on the specific application.
In addition to the above method and device, an embodiment of the present disclosure may also be a computer program product, which includes computer program instructions that, when run by a processor, cause the processor to execute the video processing method provided by the embodiments of the present disclosure.
The computer program product may write program code for performing the operations of the embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, an embodiment of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when run by a processor, cause the processor to execute the video processing method provided by the embodiments of the present disclosure.
The computer-readable storage medium may adopt any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared or semiconductor systems, apparatuses or devices, or any combination thereof. Examples (a non-exhaustive list) of the readable storage medium include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
An embodiment of the present disclosure further provides a computer program product including a computer program/instructions which, when executed by a processor, implement the video processing method in the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a computer program including instructions which, when executed by a processor, cause the processor to execute the video processing method provided by the embodiments of the present disclosure.
It should be noted that, in this document, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device including the element.
The above are only specific implementations of the present disclosure, enabling those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure will not be limited to the embodiments described herein, but shall conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

  1. A video processing method, comprising:
    obtaining multiple image groups based on a video frame sequence of an initial video;
    performing motion blur processing based on each frame image in a target image group, and fusing images obtained by subjecting the frame images to the motion blur processing, to obtain a motion blur image corresponding to the target image group, wherein each of the multiple image groups serves as the target image group;
    determining, based on a specified frame image in the target image group, a main object region and a background region corresponding to the target image group;
    fusing the motion blur image and the specified frame image according to the main object region and the background region, to obtain a target fused image, wherein an image portion of the target fused image in the main object region is an image portion of the specified frame image in the main object region, and an image portion of the target fused image in the background region is an image portion of the motion blur image in the background region; and
    generating a target video based on target fused images respectively corresponding to the multiple image groups, wherein a playback order of the target fused images in the target video is the same as a playback order of the multiple image groups in the initial video.
  2. The video processing method according to claim 1, wherein the step of performing motion blur processing based on each frame image in the target image group and fusing the images obtained by subjecting the frame images to the motion blur processing comprises:
    using an optical flow interpolation algorithm to insert a specified number of intermediate frame images between every pair of adjacent frame images in the target image group, and taking all frame images in the interpolated target image group as the images obtained by subjecting the frame images of the target image group to the motion blur processing; and
    fusing the images obtained by subjecting the frame images to the motion blur processing by averaging.
  3. The video processing method according to claim 2, wherein the step of using an optical flow interpolation algorithm to insert a specified number of intermediate frame images between adjacent frame images in the target image group comprises:
    obtaining bidirectional motion vectors of pixel blocks between the adjacent frame images in the target image group; and
    inserting a specified number of intermediate frame images between the adjacent frame images according to the bidirectional motion vectors of the pixel blocks and a block motion compensation algorithm.
  4. The video processing method according to claim 3, wherein the step of obtaining bidirectional motion vectors of pixel blocks between the adjacent frame images in the target image group comprises:
    obtaining the bidirectional motion vectors of the pixel blocks between the adjacent frame images in the target image group based on an improved Dense Inverse Search (DIS) optical flow algorithm;
    wherein a bottom-layer image resolution of an image pyramid used by the improved DIS optical flow algorithm is lower than that of the image pyramid used by the original DIS optical flow algorithm, and/or the number of iterations used by the improved DIS optical flow algorithm is smaller than that used by the original DIS optical flow algorithm.
  5. The video processing method according to any one of claims 1-4, wherein the step of determining, based on the specified frame image in the target image group, the main object region and the background region comprises:
    taking an image at a middle position of the target image group as the specified frame image;
    processing the specified frame image by using an object instance segmentation algorithm; and
    obtaining the main object region and the background region corresponding to the target image group based on a processing result.
  6. The video processing method according to any one of claims 1-4, wherein the step of performing image fusion on the motion blur image and the specified frame image according to the main object region and the background region comprises:
    obtaining a main object mask image according to the main object region and the background region;
    obtaining a weight coefficient corresponding to the main object mask image;
    adjusting pixel values of the main object mask image based on the weight coefficient, to obtain an adjusted main object mask image; and
    performing image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image.
  7. The video processing method according to claim 6, wherein the step of obtaining the weight coefficient corresponding to the main object mask image comprises:
    obtaining, based on an optical flow method, a global motion magnitude corresponding to each frame image in the target image group; and
    determining the weight coefficient corresponding to the main object mask image according to the global motion magnitude.
  8. The video processing method according to claim 7, wherein the global motion magnitude is negatively correlated with the weight coefficient.
  9. The video processing method according to claim 6, wherein the step of performing image fusion on the motion blur image and the specified frame image based on the adjusted main object mask image comprises:
    performing image fusion on the motion blur image and the specified frame image by using the following formula:
    Merge_N’=β*mask_main*Pn+(1-β*mask_main)*Merge_N,
    where β is the weight coefficient, mask_main is the main object mask image, β*mask_main is the adjusted main object mask image, Pn is the specified frame image, Merge_N is the motion blur image, and Merge_N’ is the target fused image.
  10. The video processing method according to any one of claims 1-9, wherein the step of obtaining multiple image groups based on the video frame sequence of the initial video comprises:
    splitting the video frame sequence of the initial video at a specified interval to obtain the multiple image groups, wherein two adjacent image groups share a preset number of overlapping frame images.
  11. A video processing apparatus, comprising:
    an image group acquisition module, configured to obtain multiple image groups based on a video frame sequence of an initial video;
    a blur processing module, configured to perform motion blur processing based on each frame image in a target image group, and fuse images obtained by subjecting the frame images to the motion blur processing, to obtain a motion blur image corresponding to the target image group, wherein each of the multiple image groups serves as the target image group;
    a region determination module, configured to determine, based on a specified frame image in the target image group, a main object region and a background region corresponding to the target image group;
    a fusion module, configured to fuse the motion blur image and the specified frame image according to the main object region and the background region, to obtain a target fused image, wherein an image portion of the target fused image in the main object region is an image portion of the specified frame image in the main object region, and an image portion of the target fused image in the background region is an image portion of the motion blur image in the background region; and
    a video generation module, configured to generate a target video based on target fused images respectively corresponding to the multiple image groups, wherein a playback order of the target fused images in the target video is the same as a playback order of the multiple image groups in the initial video.
  12. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the video processing method according to any one of claims 1-10.
  13. A computer-readable storage medium storing a computer program which, when run by a processor, causes the processor to execute the video processing method according to any one of claims 1-10.
  14. A computer program, comprising:
    instructions which, when executed by a processor, cause the processor to execute the video processing method according to any one of claims 1-10.
PCT/CN2023/101608 2022-06-21 2023-06-21 Video processing method, apparatus, device and medium WO2023246844A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210705983.3 2022-06-21
CN202210705983.3A CN117336422A (zh) 2022-06-21 Video processing method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
WO2023246844A1 true WO2023246844A1 (zh) 2023-12-28

Family

ID=89277884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101608 WO2023246844A1 (zh) 2022-06-21 2023-06-21 视频处理方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN117336422A (zh)
WO (1) WO2023246844A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160205291A1 (en) * 2015-01-09 2016-07-14 PathPartner Technology Consulting Pvt. Ltd. System and Method for Minimizing Motion Artifacts During the Fusion of an Image Bracket Based On Preview Frame Analysis
CN111292337A (zh) * 2020-01-21 2020-06-16 广州虎牙科技有限公司 Image background replacement method, apparatus, device and storage medium
CN113313788A (zh) * 2020-02-26 2021-08-27 北京小米移动软件有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN114245035A (zh) * 2021-12-17 2022-03-25 深圳市慧鲤科技有限公司 Video generation method and apparatus, device, and medium
CN114419073A (zh) * 2022-03-09 2022-04-29 荣耀终端有限公司 Motion blur generation method and apparatus, and terminal device

Also Published As

Publication number Publication date
CN117336422A (zh) 2024-01-02

Similar Documents

Publication Publication Date Title
KR102642993B1 (ko) Night scene shooting method, apparatus, electronic device, and storage medium
CN109218628B (zh) Image processing method and apparatus, electronic device, and storage medium
WO2021073331A1 (zh) Zoom blurred image acquisition method and apparatus based on a terminal device
US10600157B2 (en) Motion blur simulation
US9591237B2 (en) Automated generation of panning shots
Zhang et al. Gradient-directed composition of multi-exposure images
Nayar et al. Motion-based motion deblurring
US8228400B2 (en) Generation of simulated long exposure images in response to multiple short exposures
US20190089910A1 (en) Dynamic generation of image of a scene based on removal of undesired object present in the scene
WO2020038087A1 (zh) Shooting control method and apparatus in super night scene mode, and electronic device
CN111028190A (zh) Image processing method and apparatus, storage medium, and electronic device
CN107113381A (zh) Tolerant video stitching with spatiotemporal local deformation and seam finding
WO2013151873A1 (en) Joint video stabilization and rolling shutter correction on a generic platform
KR101831516B1 (ko) Method and apparatus for generating an image using multiple stickers
US11812154B2 (en) Method, apparatus and system for video processing
CN112822412A (zh) Exposure method and electronic device
CN110958363B (zh) Image processing method and apparatus, computer-readable medium, and electronic device
US9686470B2 (en) Scene stability detection
CN113014817A (zh) Method and apparatus for acquiring high-definition high-frame-rate video, and electronic device
WO2023246844A1 (zh) Video processing method, apparatus, device and medium
JP2019110430A (ja) Control device, imaging device, control method, and program
JP2019047336A (ja) Image processing device, imaging device, image processing method, and program
CN115278047A (zh) Shooting method and apparatus, electronic device, and storage medium
CN114463213A (zh) Video processing method, video processing apparatus, terminal, and storage medium
Lai et al. Correcting face distortion in wide-angle videos

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23826489

Country of ref document: EP

Kind code of ref document: A1