WO2021233032A1 - Video processing method, video processing apparatus, and electronic device - Google Patents

Video processing method, video processing apparatus, and electronic device

Info

Publication number
WO2021233032A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
video
shake
original
interpolated
Prior art date
Application number
PCT/CN2021/087795
Other languages
English (en)
French (fr)
Inventor
张弓
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Publication of WO2021233032A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/587: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136: Incoming video signal characteristics or properties
    • H04N 19/137: Motion inside a coding unit, e.g. average field, frame or block difference
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/142: Detection of scene cut or scene change
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular to a video processing method, a video processing device, and an electronic device.
  • Mobile camera devices are used in more and more fields because of their small size and convenient portability.
  • because they move easily, these mobile camera devices are susceptible to the surrounding environment when shooting. For example, in handheld shooting, unstable hand movement easily causes video shake; likewise, vibration while a car is driving easily causes shake in the video captured by a car camera.
  • the purpose of the present disclosure is to provide a video processing method, a video processing device, and an electronic device, so as to improve the visual coherence of the anti-shake video at least to a certain extent.
  • a video processing method is provided, including: acquiring an original video and the motion data corresponding to the capture device when the original video was captured, and performing frame interpolation on the original video to obtain the interpolated video corresponding to the original video; performing anti-shake repair on the video frames in the interpolated video according to the motion data to obtain the anti-shake video frames corresponding to the interpolated video; and generating the anti-shake video corresponding to the original video from the anti-shake video frames.
  • a video processing device is provided, including: a video frame interpolation module, used to acquire the original video and the motion data corresponding to the capture device when the original video was captured, and to perform frame interpolation on the original video to obtain the interpolated video corresponding to the original video; an anti-shake processing module, used to perform anti-shake repair on the video frames in the interpolated video according to the motion data to obtain the anti-shake video frames corresponding to the interpolated video; and a video generation module, used to generate the anti-shake video corresponding to the original video from the anti-shake video frames.
  • an electronic device is provided, including: a processor; and a memory configured to store one or more programs which, when executed by one or more processors, cause the one or more processors to implement the above video processing method.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture to which embodiments of the present disclosure can be applied;
  • Fig. 2 shows a schematic diagram of an electronic device to which embodiments of the present disclosure can be applied;
  • Fig. 3 schematically shows a flowchart of a video processing method in an exemplary embodiment of the present disclosure;
  • Fig. 4 schematically shows a flowchart of a method for performing frame interpolation on an original video in an exemplary embodiment of the present disclosure;
  • Fig. 5 schematically shows a schematic diagram of frame interpolation processing in an exemplary embodiment of the present disclosure;
  • Fig. 6 schematically shows a schematic diagram of determining a motion vector based on motion estimation in an exemplary embodiment of the present disclosure;
  • Fig. 7 schematically shows a schematic diagram of a corrected motion vector in an exemplary embodiment of the present disclosure;
  • Fig. 8 schematically shows a schematic diagram of frame interpolation based on motion compensation in an exemplary embodiment of the present disclosure;
  • Fig. 9 schematically shows a flowchart of another video processing method in an exemplary embodiment of the present disclosure;
  • Fig. 10 schematically shows a schematic diagram of the composition of a video processing device in an exemplary embodiment of the present disclosure.
  • FIG. 1 shows a schematic diagram of a system architecture of an exemplary application environment of a video processing method and device to which the embodiments of the present disclosure can be applied.
  • the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104 and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the terminal devices 101, 102, 103 may be various electronic devices with image processing functions, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and so on. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • the server 105 may be a server cluster composed of multiple servers.
  • the video processing methods provided by the embodiments of the present disclosure are generally executed by the terminal devices 101, 102, and 103, and accordingly the video processing apparatus is generally also provided in the terminal devices 101, 102, and 103. However, as those skilled in the art will readily understand, the video processing method provided by the embodiments of the present disclosure can also be executed by the server 105, in which case the video processing device can also be provided in the server 105; this exemplary embodiment imposes no special limitation on this.
  • the terminal devices 101, 102, 103, etc. serve both as capture devices that collect the original video and the corresponding motion data and as the execution subject of the video processing method, performing video processing on the collected original video and motion data to obtain the anti-shake video; in another exemplary embodiment, the terminal devices 101, 102, 103, etc. can serve as capture devices and send the collected original video and motion data to other terminal devices 101, 102, 103, etc., or to the server 105, for video processing to obtain the anti-shake video.
  • Exemplary embodiments of the present disclosure provide an electronic device for implementing a video processing method, which may be the terminal device 101, 102, 103 or the server 105 in FIG. 1.
  • the electronic device at least includes a processor and a memory, the memory is used to store executable instructions of the processor, and the processor is configured to execute the video processing method by executing the executable instructions.
  • Fig. 2 shows a schematic diagram of an electronic device suitable for implementing exemplary embodiments of the present disclosure.
  • the electronic device 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, Battery 242, antenna 1, antenna 2, mobile communication module 250, wireless communication module 260, audio module 270, speaker 271, receiver 272, microphone 273, earphone interface 274, sensor module 280, display screen 290, camera module 291, indicator 292, a motor 293, a button 294, a subscriber identification module (SIM) card interface 295, and so on.
  • the sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, a magnetic sensor 2805, an acceleration sensor 2806, and the like.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 200.
  • the electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 210 may include one or more processing units.
  • the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • a memory may also be provided in the processor 210 for storing instructions and data.
  • the memory can store instructions for implementing six modular functions: detection instructions, connection instructions, information management instructions, analysis instructions, data transmission instructions, and notification instructions, and the processor 210 controls the execution.
  • the memory in the processor 210 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 210. If the processor 210 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 210 is reduced, and the efficiency of the system is improved.
  • the wireless communication function of the electronic device 200 can be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, and the baseband processor.
  • the camera module 291 is used to capture still images or videos.
  • an object generates an optical image through the lens, which is projected onto the photosensitive element.
  • the photosensitive element may be a Charge Coupled Device (CCD) or a Complementary Metal-Oxide-Semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 200 may include 1 or N camera modules 291, where N is a positive integer greater than 1; if the electronic device 200 includes N cameras, one of the N cameras is the main camera.
  • the digital signal processor is used to process digital signals; besides digital image signals, it can also process other digital signals. For example, when the electronic device 200 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency-point energy.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 200 may support one or more video codecs. In this way, the electronic device 200 can play or record videos in multiple encoding formats, such as: Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • the NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer pattern between neurons in the human brain, it processes input information rapidly and can also learn continuously.
  • through the NPU, applications such as intelligent cognition of the electronic device 200 can be realized, for example image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 200.
  • the internal memory 221 may be used to store computer executable program code, and the executable program code includes instructions.
  • the internal memory 221 may include a storage program area and a storage data area.
  • the depth sensor 2801 is used to obtain depth information of the scene.
  • the depth sensor may be provided in the camera module 291.
  • the pressure sensor 2802 is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the gyro sensor 2803 may be used to determine the movement posture of the electronic device 200.
  • the gyroscope sensor can be used to collect the motion data corresponding to the original video while the capture device is capturing it; that is, the gyroscope sensor 2803 can determine the angular velocity of the electronic device 200 around three axes (i.e., the x, y, and z axes) and use that angular velocity as the motion data corresponding to the current video frame.
  • for example, when shooting video, the image mapping matrix corresponding to the camera can be used as the original coordinate mapping matrix corresponding to the first video frame; the offset of the original coordinate mapping matrix can then be calculated from the change between the gyroscope motion data of each frame and that of the first frame, yielding the original coordinate mapping matrix of every frame after the first and achieving anti-shake to a certain extent.
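As a hedged illustration of the offset computation above: for a purely rotating camera, the mapping matrix of frame i can be derived from the first frame's matrix through the rotation-only homography K·R_i·K^-1, where R_i integrates the gyroscope angular velocities and K is the camera intrinsic matrix. This classical formula, and every name below, are assumptions for illustration; the patent does not spell out the computation.

    import numpy as np

    def rotation_from_gyro(omega: np.ndarray, dt: float) -> np.ndarray:
        # Integrate one gyroscope sample (rad/s, shape (3,)) over dt seconds
        # into a rotation matrix via the Rodrigues formula.
        theta = np.linalg.norm(omega) * dt
        if theta < 1e-12:
            return np.eye(3)
        k = omega / np.linalg.norm(omega)
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

    def frame_mapping_matrix(K_intr, R_i, M_first):
        # Offset the first frame's 3x3 mapping matrix by the accumulated
        # rotation: apply the rotation-only homography K @ R @ K^-1.
        return K_intr @ R_i @ np.linalg.inv(K_intr) @ M_first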
  • the air pressure sensor 2804 is used to measure air pressure, and the electronic device 200 can calculate the altitude based on the air pressure value measured by the air pressure sensor 2804 to assist positioning and navigation.
  • the magnetic sensor 2805 includes a Hall sensor, and the electronic device 200 can use the magnetic sensor 2805 to detect the opening and closing of the flip holster.
  • the acceleration sensor 2806 can detect the acceleration of the electronic device 200 in various directions (generally three axes); when the electronic device 200 is stationary, it can detect the magnitude and direction of gravity, and it can also be used to identify the posture of the electronic device. Therefore, in some embodiments, it can likewise be used to collect the motion data corresponding to the original video while the capture device is capturing it.
  • the button 294 includes a power-on button, a volume button, and so on.
  • the motor 293 can generate vibration prompts.
  • the motor 293 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • the indicator 292 can be an indicator light, which can be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 295 is used to connect to the SIM card.
  • Fig. 3 shows the flow of a video processing method in this exemplary embodiment, including the following steps S310 to S330:
  • In step S310, the original video and the motion data corresponding to the capture device when the original video was captured are acquired, and frame interpolation is performed on the original video to obtain the interpolated video corresponding to the original video.
  • the motion data may be data obtained by devices such as the gyroscope and acceleration sensor mounted on the capture device while it shoots the original video, reflecting the motion state of the capture device, such as its pose and acceleration.
  • for example, when shooting video with a mobile phone, the motion data may be the phone's pose collected by its gyroscope or the angle at which the phone is held.
  • frame interpolation refers to inserting a series of intermediate frames between two original video frames of the original video according to certain rules. After the original video is obtained, two of its frames can be designated as an original video frame pair, and a certain number of intermediate frames inserted between them. Multiple original video frame pairs can also be extracted from the video at the same time, with interpolation performed for each pair. When interpolation ends, the original video frames and the interpolated video frames are sorted in chronological order to form the interpolated video corresponding to the original video.
  • performing frame interpolation on the original video may include: extracting at least one pair of original video frames from the original video, determining the interpolation time phase for each original video frame pair according to a preset interpolation rule, and interpolating each pair according to that time phase.
  • when extracting an original video frame pair from the original video, any rule may be used.
  • the extracted pair can be two adjacent original video frames in the original video, or any two original video frames that have no adjacency relationship in the original video.
  • the preset interpolation rule can be any interpolation rule; that is, the interpolation parameters, such as the interpolation time phase and the number of interpolated frames, can all be customized, which the present disclosure does not specifically limit.
  • the time phase refers to dividing the time interval between two original video frames into N equal parts, each part being one time phase. For example, if the time interval between the two original video frames is taken as 1, an interpolated frame at time phase 0.5 is equally distant in time from both original frames, while for an interpolated frame at time phase 0.3 the ratio of its time differences to the two original frames is 3:7.
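As a minimal sketch of this time-phase convention (the function and variable names are illustrative, not taken from the patent), the timestamp of an interpolated frame follows directly from the phase:

    def phase_to_timestamp(t0: float, t1: float, phase: float) -> float:
        # Map an interpolation time phase in [0, 1] between two original
        # frames at timestamps t0 and t1 to an absolute timestamp.
        # phase=0.5 lands midway between the frames; phase=0.3 splits the
        # interval 3:7, matching the examples above.
        return t0 + phase * (t1 - t0)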
  • the frame rate of the video can be increased, and the motion state of various characters, objects, and other objects in the video can be expressed in more detail.
  • the preset frame interpolation rules may at least include an equal-time-phase rule.
  • in that case, referring to Fig. 4, extracting at least one pair of original video frames from the original video, determining the interpolation time phase for each pair according to the preset interpolation rules, and interpolating each pair accordingly may include the following steps S410 to S440:
  • In step S410, at least one pair of original video frames is arbitrarily extracted from the original video; the pair includes a first original video frame and a second original video frame.
  • the pair may include two video frames: in chronological order, the earlier frame is taken as the first original video frame and the later frame as the second original video frame.
  • the extracted pair can be two adjacent original video frames in the original video, or any two original video frames without an adjacency relationship.
  • In step S420, the jitter degree value of the video segment corresponding to the original video frame pair is determined based on the motion data.
  • the corresponding video segment is the segment of the original video that starts at the first original video frame and ends at the second original video frame.
  • when large shake occurs during shooting, the degree of jitter can be determined from how much the motion data fluctuates, yielding the jitter degree value of the segment.
  • In step S430, when the jitter degree value is less than the preset jitter threshold, the interpolation time phase for the original video frame pair is determined according to any one of the preset interpolation rules, and the pair is interpolated according to that time phase to obtain the corresponding interpolated video frames.
  • when the jitter degree value is less than the preset jitter threshold, the current video segment can be judged to have a low degree of jitter, so the intermediate frames between the original video frame pair are relatively reliable; the interpolation time phase can therefore be determined directly from any interpolation rule, and the pair interpolated accordingly to obtain the corresponding interpolated frames.
  • In step S440, when the jitter degree value is greater than or equal to the preset jitter threshold, the interpolation time phase for the original video frame pair is determined according to the equal-time-phase rule, and the pair is interpolated accordingly to obtain the corresponding interpolated video frames.
  • when the jitter degree value is greater than or equal to the preset jitter threshold, the current video segment can be judged to have a large degree of jitter, so the intermediate frames between the pair have likely become distorted and unreliable due to the strong shake.
  • in that case, the interval between the original video frame pair can be divided equally according to the equal-time-phase rule, the interpolation time phases determined from the division result, and interpolation performed accordingly to obtain the corresponding interpolated video frames.
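Since the patent ties the jitter degree value only to how much the motion data fluctuates, the following sketch is an assumption rather than the claimed formula: it scores a segment by the spread of its gyroscope angular velocities, then applies the threshold logic of steps S430/S440.

    import numpy as np

    def jitter_degree(angular_velocities: np.ndarray) -> float:
        # angular_velocities: (T, 3) per-frame x/y/z angular velocity over the
        # segment; a larger spread is read here as stronger shake.
        return float(np.linalg.norm(np.std(angular_velocities, axis=0)))

    def choose_interpolation_rule(jitter_value: float, jitter_threshold: float) -> str:
        # Mirrors steps S430/S440: any preset rule below the threshold,
        # the equal-time-phase rule at or above it.
        return "any_preset_rule" if jitter_value < jitter_threshold else "equal_time_phase"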
  • specifically, when the jitter degree value of a video segment is large, the intermediate frames between the original video frame pair are likely distorted and unreliable and need to be repaired from the pair; for a better repair, the existing intermediate frames can be obtained, and the pair interpolated at equal time phases according to the number of intermediate frames, generating an equal number of interpolated video frames.
  • since the original intermediate frames are clearly distorted and unreliable, they can be replaced by the interpolated video frames, and the original intermediate frames deleted.
  • interpolating by the equal-time-phase rule replaces the distorted intermediate frames with the same number of interpolated frames, avoiding the break in motion continuity that the distorted frames would cause.
  • it should be noted that when the number of intermediate frames is N, the equal division points divide the time interval between the pair of original video frames into N+1 equal parts.
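In code, the equal division just described reduces to taking the interior points of an (N+1)-way split; a minimal sketch:

    def equal_time_phases(n_intermediate: int) -> list:
        # With N distorted intermediate frames to replace, the interval between
        # the original pair is split into N+1 equal parts and the N interior
        # division points become the interpolation phases.
        # e.g. n_intermediate=1 -> [0.5], as in the ball example below.
        return [k / (n_intermediate + 1) for k in range(1, n_intermediate + 1)]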
  • for example, suppose the ball is on the ground in the first frame and 1 meter above the ground in the third frame, while in the second frame the ball is out of the picture because of strong shake during shooting.
  • frames are then interpolated at equal time phases from the first frame and the third frame; that is, the time interval between them is divided into two equal parts, and the division point is the interpolation time phase of the second, interpolated frame.
  • the first frame and the third frame are interpolated according to that time phase to obtain a second, interpolated frame, which replaces the original second frame.
  • in the interpolated second frame, the ball may then be 0.5 meters above the ground, keeping the motion coherent.
  • the interpolated video may contain multiple interpolated video frames at the same time phase.
  • in that case, the multiple interpolated video frames of the same phase may first be fused, and the fused frame used as the interpolated video frame at that time phase.
  • when fusing, preset-weight fusion, adaptive-weight fusion, or other fusion methods can be used, which the present disclosure does not specifically limit; in addition, the fusion can be performed at pixel level, block level, or frame level, on which the present disclosure likewise places no special restriction.
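A minimal sketch of preset-weight, pixel-level fusion of several interpolated frames that share one time phase; the choice of weights and of pixel-level (rather than block- or frame-level) fusion are assumptions, since the disclosure deliberately leaves the fusion method open.

    import numpy as np

    def fuse_frames(frames: list, weights: list) -> np.ndarray:
        # frames: H x W x C arrays at one time phase; weights: one preset
        # weight per frame. Returns the weighted average as the fused frame.
        w = np.asarray(weights, dtype=np.float64)
        w = w / w.sum()  # normalize so overall brightness is preserved
        stacked = np.stack([f.astype(np.float64) for f in frames])
        fused = np.tensordot(w, stacked, axes=1)  # weighted sum over frames
        return fused.astype(frames[0].dtype)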
  • the original video includes 4 frames, which are original video frame 1 to original video frame 4.
  • the interpolated video frames 5-1 to 5-5 as shown in Figure 5 can be obtained.
  • the interpolated video frames 5-2 and 5-4 have the same time phase as the original video frame 2 and the original video frame 3, respectively, and the interpolated video frame 5-1 is located in the intermediate time phase between the original video frame 1 and the original video frame 2.
  • the interpolated video frame 5-3 is located in the intermediate time phase between the original video frame 2 and the original video frame 3
  • the interpolated video frame 5-5 is located in the intermediate time phase between the original video frame 3 and the original video frame 4.
  • in another case, the interpolated video frame 1-1 shown in Figure 5 can be obtained, which is located at the intermediate time phase between original video frame 1 and original video frame 2.
  • in yet another case, the interpolated video frames 3-1 to 3-3 shown in Figure 5 can be obtained.
  • the time phases of the interpolated video frames 3-1 to 3-3 divide the time between original video frame 3 and original video frame 4 equally.
  • the aforementioned frame interpolation may use a motion-estimation-and-motion-compensation (MEMC) method, an optical flow method, neural-network interpolation, or any other interpolation technique.
  • the aforementioned motion estimation motion compensation method may include the following steps:
  • a motion estimation method is used to determine the motion vector corresponding to the original video frame pair.
  • the two original video frames in the pair are taken as the current image and the reference image respectively; both images are divided into blocks of a preset size, and the divided images are traversed to search, for each block in the current image, for its matching block in the reference image, thereby determining the motion vector of each block of the current image relative to the reference image (the forward MV), as shown in Fig. 6.
  • similarly, the motion vector of each block of the reference image relative to the current image is determined as the backward MV.
  • a correction operation is then performed on the forward and backward MVs, where the correction includes at least one of, or a combination of, operations such as filtering and weighting, to finally determine the forward or backward MV of each block, as shown in Fig. 7.
  • the motion vector is then corrected by the interpolation time phase to obtain the mapping vector corresponding to the original video frame pair.
  • specifically, the final forward or backward MV of each block can be scaled according to the interpolation time phase to generate, for each interpolation block in the interpolated image, its mapping MV relative to the current image and the reference image, as shown in Fig. 8.
  • the original video frame pair is fused and interpolated based on the mapping vector to generate the corresponding interpolated video frame.
  • using the mapping MV, the corresponding blocks are found in the reference image and the current image, and weighted interpolation of the two blocks generates all the pixels of the interpolation block, finally producing the interpolated frame image, as shown in Fig. 8.
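The following is a deliberately simplified sketch of this MEMC pipeline for single-channel images: full-search block matching with a sum-of-absolute-differences (SAD) cost stands in for the motion search, the MV is scaled by the interpolation phase, and the matched blocks are blended. Block size, search range, the SAD cost, and the blending weights are all assumptions, and the MV filtering/weighting correction of Fig. 7 is omitted.

    import numpy as np

    def block_mv(cur, ref, by, bx, bs, search):
        # Full-search SAD matching of the bs x bs block of `cur` at (by, bx)
        # inside `ref`; returns the best (dy, dx) motion vector.
        block = cur[by:by + bs, bx:bx + bs].astype(np.int32)
        best_sad, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                    continue
                sad = np.abs(block - ref[y:y + bs, x:x + bs].astype(np.int32)).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
        return best_mv

    def memc_interpolate(cur, ref, phase=0.5, bs=16, search=8):
        # Blend each block of `cur` with its motion-compensated match in `ref`
        # at the given time phase (image margins not covered by a full block
        # are left unprocessed for brevity).
        h, w = cur.shape
        out = np.zeros_like(cur, dtype=np.float64)
        for by in range(0, h - bs + 1, bs):
            for bx in range(0, w - bs + 1, bs):
                dy, dx = block_mv(cur, ref, by, bx, bs, search)
                # Scale the MV by the phase (the "correction by the
                # interpolation time phase" above), clamped to the image.
                ry = int(np.clip(by + round(dy * phase), 0, h - bs))
                rx = int(np.clip(bx + round(dx * phase), 0, w - bs))
                out[by:by + bs, bx:bx + bs] = ((1 - phase) * cur[by:by + bs, bx:bx + bs]
                                               + phase * ref[ry:ry + bs, rx:rx + bs])
        return out.astype(cur.dtype)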
  • In step S320, anti-shake repair is performed on the video frames in the interpolated video according to the motion data to obtain the anti-shake video frames corresponding to the interpolated video.
  • the image mapping matrix corresponding to the motion data may be used as the original coordinate mapping matrix corresponding to the first video frame in the interpolated video.
  • the image mapping matrix is a mapping matrix between plane image coordinates and world coordinates generated by the capture device, and is usually a 3×3 matrix.
  • the offsets of the other video frames relative to the first video frame can be calculated from their corresponding motion data, and the original coordinate mapping matrix of the first frame offset accordingly to obtain the original coordinate mapping matrices corresponding to the other video frames in the interpolated video.
  • the original coordinate mapping matrices of the video frames in the interpolated video are then filtered in the time domain to obtain the corrected image mapping matrix corresponding to each frame, and each video frame is repaired by projection transformation according to its corrected image mapping matrix, yielding the anti-shake video frames corresponding to the interpolated video.
  • the filter coefficients can be set differently for different video capture environments.
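As a hedged sketch of this time-domain filtering (the actual filter and its coefficients are left open by the text, so the exponential moving average over the 3×3 matrices below is an assumption), the corrected image mapping matrices could be obtained as follows; the repair itself would then be a projection transform of each frame with its corrected matrix, e.g. cv2.warpPerspective in OpenCV.

    import numpy as np

    def temporally_filter_matrices(original_mats: list, alpha: float = 0.8) -> list:
        # original_mats: per-frame 3x3 original coordinate mapping matrices.
        # A larger alpha smooths more heavily; as noted above, the coefficient
        # could be tuned per capture environment.
        corrected, state = [], original_mats[0].astype(np.float64)
        for m in original_mats:
            state = alpha * state + (1 - alpha) * m.astype(np.float64)
            corrected.append(state.copy())
        return corrected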
  • each video frame in the interpolated video can be repaired, and all the repaired frames taken as the anti-shake video frames corresponding to the interpolated video.
  • alternatively, part of the interpolated video can first be selected as the frames to be repaired according to preset selection rules, those frames repaired according to their corrected image mapping matrices, and all the repaired frames used as the anti-shake video frames corresponding to the interpolated video.
  • by selectively repairing the interpolated video, the long repair time that would result from repairing every frame when the interpolated video has a high frame rate can be avoided.
  • for example, in a video segment with a large jitter degree value, the video frames may be interpolated frames obtained through the equal-time-phase rule of step S440.
  • interpolated frames obtained through the equal-time-phase rule can already make the motion in the video segment coherent, so time-domain filtering is unnecessary and the interpolated frames can be used directly as anti-shake video frames.
  • In step S330, the anti-shake video corresponding to the original video is generated from the anti-shake video frames.
  • the anti-shake video frames obtained after anti-shake restoration may be directly arranged in order to generate the anti-shake video corresponding to the original video. Since the anti-shake video frame is obtained through frame insertion and anti-shake processing, it can ensure a low degree of video jitter while improving the visual continuity of the video.
  • generating the anti-shake video corresponding to the original video from the anti-shake video frames may include: extracting target anti-shake frames from the anti-shake video frames according to a preset frame extraction rule, and outputting the target anti-shake frames to generate the anti-shake video corresponding to the original video.
  • the preset frame extraction rule may be a fixed interval set by the user; for example, extracting every other frame, so that the target anti-shake frames extracted from the anti-shake video are the 1st, 3rd, 5th, 7th, ... frames.
  • the preset frame extraction rule may further include an adaptive frame extraction rule.
  • the adaptive frame extraction rule may include at least one of the following: frame extraction according to the motion state of a first target object in the anti-shake video frames, frame extraction according to the stability of a second target object in the anti-shake video frames, and frame extraction according to the image quality of the anti-shake video frames.
  • the interpolated video frames may be of uneven quality, and even after anti-shake repair their image quality may still differ. Therefore, when multiple anti-shake video frames formed from interpolated frames share the same time phase, the quality parameters of the multiple anti-shake video frames can be determined from their corresponding confidence levels, and the frame with the best quality selected among them as the target anti-shake frame for that time phase.
  • the confidence level corresponding to the anti-shake video frame may be a confidence level parameter used when searching for a motion vector based on a motion estimation method during frame interpolation, and is used to indicate the confidence level of the interpolated video frame obtained by the frame interpolation.
  • in this way, anti-shake video frames of higher quality can be extracted as the target anti-shake frames according to their quality parameters, and the target anti-shake frames output to generate the anti-shake video.
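A minimal sketch of this selection step, assuming each candidate anti-shake frame carries the matching confidence recorded during motion estimation and that the quality parameter is taken directly from that confidence:

    def pick_target_frame(candidate_frames: list, confidences: list):
        # Among anti-shake frames sharing one time phase, keep the one whose
        # interpolation confidence (used here as the quality parameter) is highest.
        best = max(range(len(candidate_frames)), key=lambda i: confidences[i])
        return candidate_frames[best]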
  • frame extraction may be performed according to the motion state of the first target object in the anti-shake video frame.
  • the initial motion state and final motion state of the first target object in the anti-shake video frames can first be obtained; the intermediate motion state of the first target object at each time point is then determined from the initial and final motion states, and the target anti-shake frames are extracted from the anti-shake video frames according to the intermediate motion states.
  • in each target anti-shake frame, the motion state of the first target object matches the intermediate motion state at the corresponding time point; the first target object can be any moving person, animal, or object in the original video.
  • frame extraction may be performed according to the stability of the second target object in the anti-shake video frame.
  • the second target object may be an object that is usually static, such as the background of the video.
  • the stability parameter of an anti-shake video frame can be determined from the coincidence rate of the second target object in that frame with the second target object in the previous anti-shake video frame; the stability parameter indicates how stable the second target object is in each anti-shake video frame.
  • the anti-shake video frames with better stability are then extracted from the anti-shake video frames as the target anti-shake frames according to the stability parameter.
  • the previous anti-shake video frame refers to the frame immediately preceding a given anti-shake video frame in the anti-shake video.
  • when extracting the target anti-shake frames according to the stability parameter, whether an anti-shake video frame can serve as a target anti-shake frame can be determined by judging whether its stability parameter falls within a preset stability parameter threshold.
  • target frames may also be extracted through other screening methods based on the stability parameter, which the present disclosure does not specifically limit; for example, judging by the fluctuation range between the stability parameters of an anti-shake video frame and the previous one.
  • the stability parameters of the video background across the anti-shake video frames are thus used to extract the more stable target anti-shake frames, so that the background remains stable throughout the resulting frames, achieving the purpose of anti-shake.
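One plausible reading of the "coincidence rate" above, sketched under the assumptions that the second target object (e.g. the background) is available as a binary mask per frame and that the rate is an intersection-over-union; neither assumption comes from the patent itself.

    import numpy as np

    def stability_parameter(prev_mask: np.ndarray, cur_mask: np.ndarray) -> float:
        # Coincidence rate of the second target object between the previous
        # and current anti-shake frames, here as mask IoU in [0, 1]; higher
        # values indicate a steadier background.
        inter = np.logical_and(prev_mask, cur_mask).sum()
        union = np.logical_or(prev_mask, cur_mask).sum()
        return float(inter) / float(union) if union else 1.0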
  • referring to Fig. 9, step S910 is performed first: the original video is interpolated according to the preset interpolation rules to obtain the interpolated video. The gyroscope data corresponding to the original video, i.e. the motion data, is obtained in step S920. Step S930 then determines the original coordinate mapping matrix of the first frame from the first frame of the interpolated video and the corresponding gyroscope data, and determines the original coordinate mapping matrix of each subsequent frame from the first frame's matrix and the gyroscope data. In step S940, the original coordinate mapping matrix of each frame in the interpolated video is time-domain filtered to determine the corrected coordinate mapping matrix of each frame. Step S950 then performs anti-shake repair on each frame of the interpolated video according to the corrected coordinate mapping matrices to obtain the anti-shake video frames. Finally, in step S960, the target anti-shake frames are extracted from the anti-shake video frames and output to generate the anti-shake video corresponding to the original video.
  • step S910 may be performed first and then step S920 may be performed, or step S920 may be performed first and then step S910 may be performed, or step S910 and step S920 may be performed simultaneously.
  • in this way, anti-shake repair of the interpolated video is implemented; target anti-shake frames are then extracted from the repaired anti-shake video frames according to the preset frame extraction rules, and the motion state of each object in the target frames is thereby controlled, improving the visual coherence of motion.
  • because this exemplary embodiment interpolates frames first and then performs anti-shake repair, it can avoid the interpolation errors or inaccurate interpolation that texture loss from the anti-shake repair process would cause if repair were performed before interpolation.
  • moreover, this exemplary embodiment interpolates frames first and then extracts frames; therefore, when the number of interpolated frames and the number of extracted frames are set differently, frame-rate conversion of the original video can also be achieved.
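As a worked illustration of this frame-rate-conversion effect (the numbers are hypothetical, not from the patent): inserting one frame between every pair of 30 fps originals yields 60 fps, and then keeping two of every three frames yields 40 fps.

    def converted_fps(in_fps: float, frames_inserted_per_pair: int, keep_ratio: float) -> float:
        # Interpolation multiplies the rate by (inserted + 1); decimation then
        # scales it by the kept fraction. e.g. converted_fps(30, 1, 2/3) == 40.0
        return in_fps * (frames_inserted_per_pair + 1) * keep_ratio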
  • this example embodiment also provides a video processing device 1000, which includes a video frame interpolation module 1010, an anti-shake processing module 1020, and a video generation module 1030, wherein:
  • the video frame insertion module 1010 may be used to obtain the original video and the motion data corresponding to the collecting device when the original video is collected, and perform frame insertion processing on the original video to obtain the frame insertion video corresponding to the original video.
  • the anti-shake processing module 1020 may be used to perform anti-shake repair on the video frame in the inserted frame video according to the motion data, so as to obtain the anti-shake video frame corresponding to the inserted frame video.
  • the video generation module 1030 may be used to generate an anti-shake video corresponding to the original video according to the anti-shake video frame.
  • the video frame interpolation module 1010 may be used to extract at least one pair of original video frames from the original video, determine the interpolation time phase for each pair according to a preset interpolation rule, and interpolate each pair according to that time phase.
  • the video frame interpolation module 1010 can be used to arbitrarily extract at least one pair of original video frames from the original video, each pair including a first original video frame and a second original video frame; determine, based on the motion data, the jitter degree value of the video segment corresponding to the pair, where the segment starts at the first original video frame and ends at the second original video frame; when the jitter degree value is less than the preset jitter threshold, determine the interpolation time phase of the pair according to any one of the preset interpolation rules and interpolate the pair accordingly to obtain the corresponding interpolated video frames; and when the jitter degree value is greater than or equal to the preset jitter threshold, determine the interpolation time phase of the pair according to the equal-time-phase rule and interpolate the pair accordingly to obtain the corresponding interpolated video frames.
  • the video frame insertion module 1010 may be used to obtain intermediate frames between pairs of original video frames in the original video; according to the number of intermediate frames, the time interval between the pairs of original video frames is equally spaced to Determine the interpolated frame time phase; interpolate the original video frame pair according to the interpolated frame time phase, generate interpolated video frames equal to the number of intermediate frames, and delete the intermediate frames.
  • the video frame interpolation module 1010 may be used to determine the motion vector corresponding to the original video frame pair by means of motion estimation; correct the motion vector by the interpolation time phase to obtain the mapping vector corresponding to the original video frame pair; and, based on the mapping vector, fuse and interpolate the original video frame pair to generate the corresponding interpolated video frames.
  • the video frame interpolation module 1010 may be used, when multiple interpolated video frames of the same time phase are present in the interpolated video, to fuse the multiple interpolated video frames and use the resulting fused frame as the interpolated video frame corresponding to that time phase.
  • the anti-shake processing module 1020 may be used to read the image mapping matrix corresponding to the motion data and use it as the original coordinate mapping matrix corresponding to the first video frame in the interpolated video; generate, from the motion data and the first frame's original coordinate mapping matrix, the original coordinate mapping matrices corresponding to the other video frames in the interpolated video; filter the original coordinate mapping matrices of the video frames in the time domain to obtain the corrected image mapping matrix corresponding to each video frame; and repair each video frame based on its corrected image mapping matrix to obtain the anti-shake video frames corresponding to the interpolated video.
  • the anti-shake processing module 1020 may also be used to select video frames to be repaired from the interpolated video according to a preset selection rule, repair them according to the corrected image mapping matrices, and use the repaired frames as the anti-shake video frames.
  • the video generation module 1030 may be configured to extract a target anti-shake frame from the anti-shake video frame according to a preset frame extraction rule, and output the target anti-shake frame to generate an anti-shake video corresponding to the original video.
  • the video generation module 1030 may be used to extract frames according to the motion state of the first target object in the anti-shake video frames, according to the stability of the second target object in the anti-shake video frames, and according to the image quality of the anti-shake video frames.
  • the video generation module 1030 may be used to obtain the initial motion state and the final motion state of the first target object in the anti-shake video frames; determine, from the initial and final motion states, the intermediate motion state of the first target object at each time point; and extract the target anti-shake frames from the anti-shake video frames, where in each target anti-shake frame the motion state of the first target object matches the intermediate motion state at the corresponding time point.
  • the video generation module 1030 may be used to determine the stability parameter of each anti-shake video frame from the coincidence rate of the second target object in that frame with the second target object in the chronologically previous anti-shake video frame, and to extract the target anti-shake frames from the anti-shake video frames according to the stability parameter.
  • the video generation module 1030 may be used, when anti-shake video frames formed from multiple interpolated frames share the same time phase, to determine the quality parameters of the multiple anti-shake video frames from their corresponding confidence levels, and to select, according to the quality parameters, one of the multiple anti-shake video frames as the target anti-shake frame corresponding to that time phase.
  • Exemplary embodiments of the present disclosure also provide a computer-readable storage medium on which is stored a program product capable of implementing the above-mentioned method of this specification.
  • various aspects of the present disclosure can also be implemented as a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps described in the "Exemplary Methods" section of this specification according to various exemplary embodiments of the present disclosure, for example any one or more of the steps in Fig. 3, Fig. 4, and Fig. 9.
  • the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
  • program code for performing the operations of the present disclosure can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented languages such as Java and C++, as well as conventional procedural languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Systems (AREA)

Abstract

This application provides a video processing method, a video processing apparatus, and an electronic device, relating to the field of image processing technology. The method includes: acquiring an original video and the motion data corresponding to the capture device when the original video was captured, and performing frame interpolation on the original video to obtain the interpolated video corresponding to the original video; performing anti-shake repair on the video frames in the interpolated video according to the motion data to obtain the anti-shake video frames corresponding to the interpolated video; and generating the anti-shake video corresponding to the original video from the anti-shake video frames. By presenting the motion in the original video through an interpolated video that contains more video frames, this application improves the visual coherence of motion in the video through frame interpolation of the original video, and repairs the shake of the video frames in the interpolated video to a certain extent.

Description

Video processing method, video processing apparatus, and electronic device
Cross-Reference
The present disclosure claims priority to Chinese patent application No. 202010425185.6, filed on May 19, 2020 and entitled "Video processing method, video processing apparatus and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of image processing technology, and in particular to a video processing method, a video processing apparatus, and an electronic device.
Background
With the continuous improvement of people's living standards, various electronic camera devices are widely used in all aspects of life. Mobile camera devices are used in more and more fields because of their small size and portability; however, since they move easily, these devices are susceptible to the surrounding environment when shooting. For example, in handheld shooting, unstable hand movement easily causes video shake; likewise, vibration while a car is driving easily causes shake in the video captured by an in-car camera.
Summary
The purpose of the present disclosure is to provide a video processing method, a video processing apparatus, and an electronic device, so as to improve the visual coherence of anti-shake video at least to a certain extent.
According to a first aspect of the present disclosure, a video processing method is provided, including: acquiring an original video and the motion data corresponding to the capture device when the original video was captured, and performing frame interpolation on the original video to obtain the interpolated video corresponding to the original video; performing anti-shake repair on the video frames in the interpolated video according to the motion data to obtain the anti-shake video frames corresponding to the interpolated video; and generating the anti-shake video corresponding to the original video from the anti-shake video frames.
According to a second aspect of the present disclosure, a video processing apparatus is provided, including: a video frame interpolation module, configured to acquire the original video and the motion data corresponding to the capture device when the original video was captured, and to perform frame interpolation on the original video to obtain the interpolated video corresponding to the original video; an anti-shake processing module, configured to perform anti-shake repair on the video frames in the interpolated video according to the motion data to obtain the anti-shake video frames corresponding to the interpolated video; and a video generation module, configured to generate the anti-shake video corresponding to the original video from the anti-shake video frames.
According to a third aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory configured to store one or more programs which, when executed by one or more processors, cause the one or more processors to implement the above video processing method.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure. Obviously, the drawings described below are only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 shows a schematic diagram of an exemplary system architecture to which embodiments of the present disclosure can be applied;
Fig. 2 shows a schematic diagram of an electronic device to which embodiments of the present disclosure can be applied;
Fig. 3 schematically shows a flowchart of a video processing method in an exemplary embodiment of the present disclosure;
Fig. 4 schematically shows a flowchart of a method for performing frame interpolation on an original video in an exemplary embodiment of the present disclosure;
Fig. 5 schematically shows a schematic diagram of frame interpolation processing in an exemplary embodiment of the present disclosure;
Fig. 6 schematically shows a schematic diagram of determining a motion vector based on motion estimation in an exemplary embodiment of the present disclosure;
Fig. 7 schematically shows a schematic diagram of a corrected motion vector in an exemplary embodiment of the present disclosure;
Fig. 8 schematically shows a schematic diagram of frame interpolation based on motion compensation in an exemplary embodiment of the present disclosure;
Fig. 9 schematically shows a flowchart of another video processing method in an exemplary embodiment of the present disclosure;
Fig. 10 schematically shows a schematic diagram of the composition of a video processing apparatus in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities; these functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
Fig. 1 shows a schematic diagram of the system architecture of an exemplary application environment to which the video processing method and apparatus of embodiments of the present disclosure can be applied.
As shown in Fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables. The terminal devices 101, 102, 103 may be various electronic devices with image processing functions, including but not limited to desktop computers, portable computers, smartphones, and tablet computers. It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; there may be any number of them according to implementation needs. For example, the server 105 may be a server cluster composed of multiple servers.
The video processing method provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103, and accordingly the video processing apparatus is generally also provided in the terminal devices 101, 102, 103. However, those skilled in the art will readily understand that the video processing method provided by the embodiments of the present disclosure may also be executed by the server 105, in which case the video processing apparatus may also be provided in the server 105; this exemplary embodiment imposes no special limitation on this. For example, in one exemplary embodiment, the terminal devices 101, 102, 103, etc. serve both as capture devices that collect the original video and the corresponding motion data and as the execution subject of the video processing method, performing video processing on the collected original video and motion data to obtain the anti-shake video; in another exemplary embodiment, the terminal devices 101, 102, 103, etc. may serve as capture devices and send the collected original video and motion data to other terminal devices 101, 102, 103, etc., or to the server 105, for video processing to obtain the anti-shake video.
Exemplary embodiments of the present disclosure provide an electronic device for implementing the video processing method, which may be the terminal device 101, 102, 103 or the server 105 in Fig. 1. The electronic device includes at least a processor and a memory; the memory is used to store executable instructions of the processor, and the processor is configured to execute the video processing method by executing the executable instructions.
Fig. 2 shows a schematic diagram of an electronic device suitable for implementing exemplary embodiments of the present disclosure.
It should be noted that the electronic device 200 shown in Fig. 2 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 2, the electronic device 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, antenna 1, antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, buttons 294, and a subscriber identification module (SIM) card interface 295, among others. The sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, a magnetic sensor 2805, an acceleration sensor 2806, and the like.
It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the electronic device 200. In other embodiments of this application, the electronic device 200 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 210 may include one or more processing units; for example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be independent devices or may be integrated in one or more processors.
A memory may also be provided in the processor 210 for storing instructions and data. The memory can store instructions for implementing six modular functions: detection instructions, connection instructions, information management instructions, analysis instructions, data transmission instructions, and notification instructions, with execution controlled by the processor 210. In some embodiments, the memory in the processor 210 is a cache, which can hold instructions or data that the processor 210 has just used or uses cyclically. If the processor 210 needs the instruction or data again, it can be called directly from this memory, avoiding repeated accesses, reducing the waiting time of the processor 210, and thus improving system efficiency.
The wireless communication function of the electronic device 200 can be implemented by antenna 1, antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, the baseband processor, and the like.
The camera module 291 is used to capture still images or videos. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then passed to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts it into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 200 may include 1 or N camera modules 291, where N is a positive integer greater than 1; if the electronic device 200 includes N cameras, one of the N cameras is the main camera.
The digital signal processor is used to process digital signals; besides digital image signals, it can also process other digital signals. For example, when the electronic device 200 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency-point energy, among other operations.
The video codec is used to compress or decompress digital video. The electronic device 200 may support one or more video codecs, so that it can play or record videos in multiple encoding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer pattern between neurons in the human brain, it processes input information rapidly and can also learn continuously. Through the NPU, applications such as intelligent cognition of the electronic device 200 can be realized, for example image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capacity of the electronic device 200. The internal memory 221 may be used to store computer-executable program code, which includes instructions; it may include a program storage area and a data storage area.
The depth sensor 2801 is used to acquire depth information of a scene; in some embodiments, it may be arranged in the camera module 291. The pressure sensor 2802 is used to sense pressure signals and can convert them into electrical signals.
The gyroscope sensor 2803 may be used to determine the motion attitude of the electronic device 200. In some embodiments, the gyroscope sensor may be used to collect the motion data of the capture device while the original video is being captured; that is, the angular velocities of the electronic device 200 about three axes (i.e., the x, y and z axes) can be determined through the gyroscope sensor 2803 and taken as the motion data corresponding to the current video frame. Illustratively, when shooting video, the image mapping matrix corresponding to the camera may be taken as the original coordinate mapping matrix of the first video frame; the offset of the original coordinate matrix is then computed from the change between the gyroscope's motion data at each frame and that of the first frame, yielding the original coordinate matrix of every frame after the first and achieving anti-shake to a certain extent.
The barometric pressure sensor 2804 is used to measure air pressure; the electronic device 200 can compute altitude from the pressure value measured by the barometric pressure sensor 2804 to assist positioning and navigation. The magnetic sensor 2805 includes a Hall sensor; the electronic device 200 can use the magnetic sensor 2805 to detect the opening and closing of a flip cover.
The acceleration sensor 2806 can detect the magnitude of acceleration of the electronic device 200 in all directions (generally three axes), can detect the magnitude and direction of gravity when the electronic device 200 is stationary, and can also be used to identify the attitude of the electronic device. In some embodiments, it can therefore also be used to collect the motion data of the capture device while the original video is being captured.
The keys 294 include a power key, volume keys and so on. The motor 293 can generate vibration alerts and can be used for incoming-call vibration alerts as well as touch vibration feedback. The indicator 292 may be an indicator light used to indicate charging status and battery changes, and also to indicate messages, missed calls, notifications and the like. The SIM card interface 295 is used to connect a SIM card.
The video processing method and video processing apparatus of exemplary embodiments of the present disclosure are described in detail below.
FIG. 3 shows the flow of a video processing method in this exemplary embodiment, including the following steps S310 to S330:
In step S310, an original video and motion data of the capture device recorded while the original video was captured are acquired, and frame interpolation is performed on the original video to obtain an interpolated video corresponding to the original video.
Here, the motion data may be data reflecting the motion state of the capture device in its current state, such as its pose and acceleration, acquired by a gyroscope, acceleration sensor or similar device mounted on the capture device while it shoots the original video. For example, when shooting video with a mobile phone, the motion data may be the phone's pose or the angle at which it is held, as collected by the phone's gyroscope.
Frame interpolation refers to the process of inserting a series of intermediate frames between two original video frames of the original video according to certain rules. After the original video is acquired, two of its frames may be taken as original video frames, and a number of intermediate frames inserted between them. Moreover, multiple pairs of original video frames may be extracted from the video at the same time and interpolation performed for each pair. When interpolation is finished, the original video frames and the interpolated video frames are ordered chronologically to form the interpolated video corresponding to the original video.
In an exemplary embodiment, performing frame interpolation on the original video may include: extracting at least one pair of original video frames from the original video, determining, according to a preset interpolation rule, the interpolation time phases at which the pair is to be interpolated, and interpolating each pair of original video frames according to the interpolation time phases.
When extracting pairs of original video frames from the original video, any rule may be used; an extracted pair may consist of two adjacent original video frames, or of any two original video frames with no adjacency relationship. The preset interpolation rule used may likewise be arbitrary; that is, interpolation parameters such as the time phases and the number of inserted frames can all be customized, and the present disclosure imposes no particular limitation on this.
Here, a time phase is obtained by dividing the time interval between the two original video frames into N equal parts, each part being one time phase. For example, taking the time interval between the two original video frames as 1, an interpolated video frame at time phase 0.5 is equidistant in time from both original frames, while the two time differences between an interpolated video frame at time phase 0.3 and the two original frames are in the ratio 3:7, as the sketch below makes concrete.
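For illustration, the following minimal Python sketch maps chosen time phases to absolute timestamps; the function name interp_timestamps is illustrative and not part of the patent:

```python
def interp_timestamps(t0, t1, phases):
    """Map interpolation time phases in (0, 1) to absolute timestamps.

    t0, t1 -- timestamps of the first and second original video frame
    phases -- e.g. [0.5] for a midpoint frame, [0.3] for a 3:7 split
    """
    return [t0 + p * (t1 - t0) for p in phases]

# A frame at phase 0.5 is equidistant in time from both originals:
print(interp_timestamps(0.0, 1.0, [0.3, 0.5]))  # [0.3, 0.5]
```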
By interpolating at least one pair of original video frames of the original video, the frame rate of the video can be increased, so that the motion of the people, objects and other subjects in the video is rendered in finer detail.
Further, in another exemplary embodiment, special circumstances during shooting, such as a collision during handheld shooting, may cause strong video shake, and the strongly shaking video may then require specific treatment. The preset interpolation rules may therefore include at least an equal-time-phase rule. In this case, referring to FIG. 4, extracting at least one pair of original video frames from the original video, determining the interpolation time phases according to the preset interpolation rule and interpolating each pair accordingly may include the following steps S410 to S440:
In step S410, at least one pair of original video frames is extracted arbitrarily from the original video, the pair including a first original video frame and a second original video frame.
A pair of original video frames may contain two frames; in chronological order, the earlier frame is taken as the first original video frame and the later one as the second original video frame. When extracting pairs from the original video, any preset rule may likewise be used; an extracted pair may consist of two adjacent original video frames, or of any two original video frames with no adjacency relationship.
In step S420, a shake degree value of the video segment corresponding to the pair of original video frames is determined based on the motion data.
The pair of original video frames comprises the first and second original video frames, and the corresponding video segment is the segment of the original video starting at the first original video frame and ending at the second original video frame. When the captured video shakes strongly, the shake degree can be determined from how much the motion data fluctuates, yielding the shake degree value of the segment.
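The patent does not fix a formula for the shake degree value; one plausible realization, sketched below under the assumption that the motion data are per-frame three-axis gyroscope angular velocities, measures how much those readings fluctuate over the segment (shake_value and the standard-deviation measure are illustrative):

```python
import numpy as np

def shake_value(gyro, start, end):
    """Estimate how much a clip shakes from per-frame angular velocities.

    gyro        -- array of shape (num_frames, 3): x/y/z angular velocity
    start, end  -- frame indices of the first and second original frame
    """
    clip = np.asarray(gyro[start:end + 1])
    # Larger fluctuation of the angular velocity implies stronger shake.
    return float(np.linalg.norm(clip.std(axis=0)))

# Branch on a preset threshold, as in steps S430/S440 below:
# rule = "any" if shake_value(g, i, j) < THRESHOLD else "equal-phase"
```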
In step S430, when the shake degree value is smaller than a preset shake threshold, the interpolation time phases for the pair are determined according to any one of the preset interpolation rules, and the pair is interpolated according to those time phases to obtain the corresponding interpolated video frames.
In an exemplary embodiment, when the shake degree value is smaller than the preset shake threshold, it can be judged that the current video segment does not shake much, so the intermediate frames between the pair of original video frames are fairly reliable. The interpolation time phases can therefore be determined directly from any interpolation rule, and the pair interpolated accordingly to obtain the corresponding interpolated video frames.
In step S440, when the shake degree value is greater than or equal to the preset shake threshold, the interpolation time phases for the pair are determined according to the equal-time-phase rule, and the pair is interpolated according to those time phases to obtain the corresponding interpolated video frames.
In an exemplary embodiment, when the shake degree value is greater than or equal to the preset shake threshold, it can be judged that the current video segment shakes strongly, so the intermediate frames between the pair have very likely become distorted and unreliable because of the shake. In this case, the interval between the pair can be divided equally according to the equal-time-phase rule, the interpolation time phases determined from the equal division, and interpolation performed at those phases to obtain the corresponding interpolated video frames.
By treating video segments with different shake degrees differently, segments with strong shake can be repaired in a targeted manner, avoiding the problem of shooting shake making the motion of the people, objects and other subjects in the video discontinuous.
Specifically, when the shake degree value of a video segment is large, the intermediate frames between the pair of original video frames are very likely distorted and unreliable, and the corresponding intermediate frames need to be restored from the pair. For a better restoration, the intermediate frames existing between the pair can be obtained, and equal-time-phase interpolation performed on the pair according to the number of intermediate frames, so as to generate as many interpolated video frames as there are intermediate frames. Since the original intermediate frames are clearly distorted and unreliable, they can be replaced by the interpolated video frames and deleted. Interpolating by the equal-time-phase rule replaces the distorted intermediate frames with an equal number of interpolated frames, avoiding the break in motion continuity the distorted frames would cause.
Furthermore, when determining the interpolation time phases of the interpolated video frames from the number of intermediate frames, as many interpolation time phases as there are intermediate frames must be determined; this can be done with equal-division points equal in number to the intermediate frames, the resulting points being taken as the interpolation time phases of the interpolated video frames. Note that when the number of intermediate frames is N, the equal-division points divide the time interval between the pair of original video frames into N+1 equal parts.
For example, suppose there are 3 frames: in the first the ball is on the ground, in the third the ball is 1 meter above the ground, and in the second, because of strong shooting shake, the ball is not in the picture. Equal-time-phase interpolation can then be performed from the first and third frames: the time interval between them is divided into two equal parts, the division point being the interpolation time phase of the second interpolated frame; interpolating the first and third frames at this phase yields the second interpolated frame, which replaces the second frame above. In this interpolated video frame, the ball can be 0.5 meters above the ground.
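A minimal sketch of this equal-time-phase replacement, assuming frames is a Python list of frame images and interpolate_pair is some single-phase interpolator such as the MEMC procedure described later (both names are illustrative):

```python
def replace_middle_frames(frames, i, j, interpolate_pair):
    """Replace the unreliable frames between originals i and j.

    The N intermediate frames are deleted and re-synthesised at the N
    equal-division points of the interval, i.e. at phases k / (N + 1).
    """
    n_mid = j - i - 1
    phases = [k / (n_mid + 1) for k in range(1, n_mid + 1)]
    new_mid = [interpolate_pair(frames[i], frames[j], p) for p in phases]
    # Keep everything up to frame i, splice in the replacements, then
    # continue from frame j onward (the old middles are dropped).
    return frames[:i + 1] + new_mid + frames[j:]
```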
In an exemplary embodiment, after interpolation the interpolated video may contain multiple interpolated video frames at the same time phase. In that case, the multiple frames at the same phase can first be fused, and the fused frame taken as the interpolated video frame at that phase. Specifically, preset-weight fusion, adaptive-weight fusion or other fusion methods may be used, with no particular limitation imposed here; moreover, the fusion may be performed at the pixel level, block level or frame level, which is likewise not particularly limited in the present disclosure.
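As an illustration of the pixel-level, preset-weight variant, the sketch below averages numpy frames that share one time phase; the uniform weights are an assumption, and block- or frame-level fusion would operate analogously:

```python
import numpy as np

def fuse_same_phase(frames, weights=None):
    """Fuse several interpolated frames that share one time phase."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    if weights is None:                       # preset rule: uniform weights
        weights = np.full(len(frames), 1.0 / len(frames))
    # Weighted sum over the frame axis (pixel-level fusion).
    fused = np.tensordot(weights, stack, axes=1)
    return fused.astype(frames[0].dtype)
```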
The following specific examples illustrate the process of interpolating the original video according to interpolation time phases:
Referring to FIG. 5, the original video includes 4 frames, original video frame 1 to original video frame 4.
Example 1:
Set the number of inserted frames to 5 and perform equal-time-phase interpolation with original frames 1 and 4 as the pair; the interpolated video frames 5-1 to 5-5 shown in FIG. 5 are obtained. Interpolated frames 5-2 and 5-4 coincide in time phase with original frames 2 and 3 respectively; interpolated frame 5-1 lies at the middle time phase between original frames 1 and 2, interpolated frame 5-3 at the middle time phase between original frames 2 and 3, and interpolated frame 5-5 at the middle time phase between original frames 3 and 4.
Example 2:
Set the number of inserted frames to 1 and perform equal-time-phase interpolation with original frames 1 and 2 as the pair; the interpolated video frame 1-1 shown in FIG. 5 is obtained, lying at the middle time phase between original frames 1 and 2.
Example 3:
Set the number of inserted frames to 3 and perform equal-time-phase interpolation with original frames 3 and 4 as the pair; the interpolated video frames 3-1 to 3-3 shown in FIG. 5 are obtained, all lying between original frames 3 and 4, their time phases dividing the time between original frames 3 and 4 into equal parts.
In an exemplary embodiment, the interpolation described above may use motion estimation and motion compensation (MEMC), optical flow, neural-network interpolation, or any other interpolation technique.
For example, the motion estimation and motion compensation method may include the following steps:
First, the motion vectors corresponding to the pair of original video frames are determined by motion estimation.
The two frames of the pair are denoted the current image and the reference image. Both images are divided into blocks of a preset size and the blocks are traversed: for each block of the current image, its matching block is searched in the reference image, determining the motion vector of each block of the current image relative to the reference image (the forward MV); in the same way, the motion vector of each block of the reference image relative to the current image (the backward MV) is determined, as shown in FIG. 6.
The forward and backward MVs are then corrected, the correction comprising at least one of, or a combination of, operations such as filtering and weighting, which finally determines the forward or backward MV of each block, as shown in FIG. 7.
Second, the motion vectors are corrected by the interpolation time phase to obtain the mapping vectors corresponding to the pair of original video frames.
After the interpolation time phase has been determined according to the preset interpolation rule, the finally determined forward or backward MV of each block can be corrected by that phase, and the mapping MV of each interpolated block relative to the current image and the reference image is then generated in the interpolated image, as shown in FIG. 8.
Third, fusion interpolation is performed on the pair of original video frames based on the mapping vectors to generate the corresponding interpolated video frame.
Following each mapping MV, the corresponding blocks are found in the reference image and the current image, and a weighted interpolation of the two blocks generates all the pixels of the interpolated block, finally yielding the interpolated image, as shown in FIG. 8.
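The three steps can be condensed into the following simplified single-direction sketch, assuming grayscale numpy frames whose sides are multiples of the block size and an exhaustive SAD search; the forward/backward MV correction by filtering and weighting is omitted, and the names block_mvs and memc_interpolate are illustrative:

```python
import numpy as np

def block_mvs(cur, ref, bs=16, search=8):
    """Forward MVs: best SAD match of each block of cur inside ref."""
    h, w = cur.shape
    mvs = np.zeros((h // bs, w // bs, 2), dtype=int)
    for by in range(h // bs):
        for bx in range(w // bs):
            y, x = by * bs, bx * bs
            block = cur[y:y + bs, x:x + bs].astype(np.float32)
            best_sad, best = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - bs and 0 <= xx <= w - bs:
                        cand = ref[yy:yy + bs, xx:xx + bs].astype(np.float32)
                        sad = np.abs(block - cand).sum()
                        if best_sad is None or sad < best_sad:
                            best_sad, best = sad, (dy, dx)
            mvs[by, bx] = best
    return mvs

def memc_interpolate(cur, ref, phase, bs=16):
    """Blend cur (t=0) and ref (t=1) at the given time phase using
    phase-scaled ("mapping") motion vectors."""
    h, w = cur.shape
    out = np.zeros((h, w), dtype=np.float32)
    mvs = block_mvs(cur, ref, bs)
    for by in range(h // bs):
        for bx in range(w // bs):
            y, x = by * bs, bx * bs
            dy, dx = mvs[by, bx]
            # Sample cur a fraction `phase` back along the motion and ref
            # the remaining fraction forward, then weight by time proximity.
            cy = int(np.clip(y - dy * phase, 0, h - bs))
            cx = int(np.clip(x - dx * phase, 0, w - bs))
            ry = int(np.clip(y + dy * (1 - phase), 0, h - bs))
            rx = int(np.clip(x + dx * (1 - phase), 0, w - bs))
            out[y:y + bs, x:x + bs] = (
                (1 - phase) * cur[cy:cy + bs, cx:cx + bs]
                + phase * ref[ry:ry + bs, rx:rx + bs])
    return out.astype(cur.dtype)
```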
In step S320, anti-shake restoration is performed on the video frames in the interpolated video according to the motion data, to obtain the anti-shake video frames corresponding to the interpolated video.
In an exemplary embodiment, since the people, objects and other subjects in the first frame of the interpolated video are all in their initial state, the image mapping matrix corresponding to the motion data can be taken as the original coordinate mapping matrix of the first frame of the interpolated video. The image mapping matrix is the mapping matrix, generated by the capture device, between planar image coordinates and world coordinates, and is usually a 3×3 matrix.
After the original coordinate mapping matrix of the first frame is obtained, the offsets of the other frames relative to the first frame can be computed from their motion data, and the first frame's original coordinate mapping matrix offset by the computed amounts to obtain the original coordinate mapping matrices of the other frames of the interpolated video.
Then, temporal filtering is applied to the original coordinate mapping matrices of the frames of the interpolated video to obtain the corrected image mapping matrix of each frame, and the frames are restored by projective transformation according to the corrected image mapping matrices, yielding the anti-shake video frames corresponding to the interpolated video. When filtering the original coordinate mapping matrices of the frames, the filter coefficients can be set differently for different video capture environments.
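A minimal sketch of this matrix pipeline, assuming the original coordinate mapping matrices are 3×3 numpy homographies and substituting a plain moving average for the temporal filter, whose exact form and coefficients the patent leaves open (stabilize_frames and radius are illustrative names):

```python
import cv2
import numpy as np

def stabilize_frames(frames, raw_maps, radius=5):
    """Temporally smooth per-frame 3x3 coordinate mapping matrices and
    re-project each frame with its corrected matrix.

    raw_maps -- list of 3x3 original coordinate mapping matrices: the
                first derived from the device's image mapping matrix,
                the rest offset by the per-frame motion data
    radius   -- temporal window of the moving-average stand-in filter
    """
    maps = np.stack(raw_maps)                    # shape (n, 3, 3)
    out = []
    for i, frame in enumerate(frames):
        lo, hi = max(0, i - radius), min(len(frames), i + radius + 1)
        smooth = maps[lo:hi].mean(axis=0)        # corrected mapping matrix
        # Warp toward the smoothed camera path: the correction takes the
        # raw per-frame pose to the filtered pose.
        correction = smooth @ np.linalg.inv(maps[i])
        h, w = frame.shape[:2]
        out.append(cv2.warpPerspective(frame, correction, (w, h)))
    return out
```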
Thereafter, when restoring the frames by projective transformation according to the corrected image mapping matrices, every frame of the interpolated video may be restored, and all the restored frames taken as the anti-shake video frames corresponding to the interpolated video.
Further, a subset of the interpolated video may first be selected as frames to be restored according to a preset selection rule, the corresponding frames restored according to the corrected image mapping matrices, and all the restored frames taken as the anti-shake video frames of the interpolated video. Selectively restoring the interpolated video avoids, when its frame rate is high, the long restoration time that restoring every frame would entail.
It should be noted that, when interpolating the original video, the interpolated frames of a segment with a large shake degree value may have been produced by the equal-time-phase rule of step S440 above. In that case, the intermediate frames being distorted and unreliable, the interpolated frames obtained by the equal-time-phase rule already make the motion in the segment continuous, so the temporal filtering may be skipped and those interpolated frames taken directly as anti-shake video frames.
In step S330, the anti-shake video corresponding to the original video is generated from the anti-shake video frames.
In an exemplary embodiment, the anti-shake video frames obtained by anti-shake restoration can be arranged in order directly to generate the anti-shake video corresponding to the original video. Since the anti-shake video frames are obtained through interpolation followed by anti-shake processing, the visual continuity of motion in the video is improved while the shake of the video is kept low.
In an exemplary embodiment, generating the anti-shake video from the anti-shake video frames may include: extracting target anti-shake frames from the anti-shake video frames according to a preset extraction rule, and outputting the target anti-shake frames to generate the anti-shake video corresponding to the original video.
In an exemplary embodiment, the preset extraction rule may be a custom fixed frame count. For example, extracting one frame out of every two may be defined, in which case the target anti-shake frames extracted from the anti-shake video are frames 1, 3, 5, 7, and so on.
In an exemplary embodiment, the preset extraction rule may also include an adaptive extraction rule, which may include at least one of the following: extraction according to the motion state of a first target object in the anti-shake video frames, extraction according to the stability of a second target object in the anti-shake video frames, and extraction according to the image quality of the anti-shake video frames.
In an exemplary embodiment, because the interpolated frames produced when interpolating the original video may vary in quality, the image quality still differs even after anti-shake restoration. Therefore, when several interpolation-derived anti-shake video frames exist at the same time phase, the quality parameters of these frames can be determined from their corresponding confidences, and the frame of the best quality chosen among them as the target anti-shake frame for that time phase. The confidence of an anti-shake video frame may be the confidence parameter used when searching for motion vectors by motion estimation during interpolation, which expresses how trustworthy the interpolated frame is. To ensure a better anti-shake video, the higher-quality anti-shake frames are extracted as target anti-shake frames according to their quality parameters, and the target anti-shake frames are output to generate the anti-shake video.
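A small sketch of this quality-based selection, assuming the motion-estimation confidence of each candidate frame is available as a scalar (pick_by_quality is an illustrative name):

```python
def pick_by_quality(candidates, confidences):
    """Among frames sharing one time phase, keep the most trustworthy.

    confidences -- the motion-estimation confidence recorded for each
                   interpolated frame when its MVs were searched; it is
                   used directly as the quality parameter here
    """
    quality = [float(c) for c in confidences]
    best = max(range(len(candidates)), key=quality.__getitem__)
    return candidates[best]
```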
In an exemplary embodiment, since the change in a first target object's motion state over a short time is usually linear, frames can be extracted according to the motion state of the first target object in the anti-shake video frames. For example, the initial and final motion states of the first target object in the anti-shake video frames can first be obtained; its intermediate motion state at each time point is then determined from the initial and final states, and target anti-shake frames are extracted from the anti-shake video frames according to the intermediate motion states. In the extracted target anti-shake frames, the motion state of the first target object matches the intermediate motion state of the corresponding time point; the first target object may be any person, animal or object in motion in the original video.
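One way to realize this, sketched below under the assumption that the first target object's motion state is summarized by an (x, y) position per frame and that the expected intermediate states are linear interpolations of the first and last positions (pick_by_motion is an illustrative name):

```python
import numpy as np

def pick_by_motion(frames, positions, num_out):
    """Extract frames so the tracked object appears to move linearly.

    positions -- per-frame (x, y) of the first target object
    num_out   -- number of target anti-shake frames to keep
    """
    p0, p1 = np.asarray(positions[0]), np.asarray(positions[-1])
    picked = []
    for k in range(num_out):
        t = k / (num_out - 1) if num_out > 1 else 0.0
        expected = (1 - t) * p0 + t * p1     # intermediate motion state
        # Keep the frame whose measured state best matches the expectation.
        dists = [np.linalg.norm(np.asarray(p) - expected) for p in positions]
        picked.append(frames[int(np.argmin(dists))])
    return picked
```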
In an exemplary embodiment, frames can also be extracted according to the stability of a second target object in the anti-shake video frames. The second target object may be something usually static, such as the background of the video. Specifically, a stability parameter of an anti-shake video frame can be determined from the overlap ratio between the second target object in that frame and the second target object in the preceding anti-shake video frame; this parameter expresses how stable the second target object is across the anti-shake video frames. Frames of better stability are then extracted from the anti-shake video frames as target anti-shake frames according to the stability parameter. Here, the preceding anti-shake video frame is the frame immediately before the given frame in the chronological order of the anti-shake video.
Moreover, when extracting target anti-shake frames according to the stability parameter, whether a frame can serve as a target anti-shake frame can be decided by checking whether its stability parameter lies within a preset stability-parameter threshold. Other ways of screening the stability parameter may also decide whether a frame is extracted, on which the present disclosure imposes no particular limitation; for example, the decision may be based on how much the stability parameter fluctuates between the frame and the preceding one.
When extracting frames this way, the frames in which the video background is most stable, as measured by its stability parameter in each anti-shake frame, are taken as target anti-shake frames; in the resulting anti-shake frames the background stays steady throughout, achieving the anti-shake goal. A sketch of this screening follows.
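A minimal sketch of the stability screening, assuming a boolean mask of the second target object (for example the background) per frame and an intersection-over-union overlap ratio against the chronologically preceding frame; the threshold value and helper name are illustrative:

```python
import numpy as np

def pick_by_stability(frames, bg_masks, overlap_min=0.9):
    """Keep frames whose background overlaps the preceding frame enough.

    bg_masks -- per-frame boolean mask of the second target object
    """
    kept = []
    for i in range(1, len(frames)):
        inter = np.logical_and(bg_masks[i - 1], bg_masks[i]).sum()
        union = np.logical_or(bg_masks[i - 1], bg_masks[i]).sum()
        stability = inter / union if union else 0.0   # stability parameter
        if stability >= overlap_min:                  # threshold screening
            kept.append(frames[i])
    return kept
```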
It should be noted that, when extracting anti-shake frames, any combination of two or all three of the above methods may also be used together to achieve a better anti-shake effect.
Taking a gyroscope as the device collecting the motion data as an example, and referring to FIG. 9, the technical solution of an embodiment of the present disclosure is set out below:
Referring to FIG. 9, step S910 is executed first: the original video is interpolated according to the preset interpolation rule to obtain the interpolated video. The gyroscope data corresponding to the original video, i.e. the motion data, are acquired in step S920. Next, in step S930, the original coordinate mapping matrix of the first frame of the interpolated video is determined from that frame and its gyroscope data, and the original coordinate mapping matrix of every subsequent frame is determined from the first frame's matrix and the gyroscope data. In step S940, temporal filtering is applied to the original coordinate mapping matrix of each frame of the interpolated video to determine each frame's corrected coordinate mapping matrix; in step S950, each frame of the interpolated video is anti-shake restored according to its corrected coordinate mapping matrix, yielding the anti-shake video frames. Finally, in step S960, target anti-shake frames are extracted from the anti-shake video frames and output, generating the anti-shake video corresponding to the original video.
The present disclosure does not limit the order of steps S910 and S920: step S910 may be executed before step S920, step S920 before step S910, or the two may be executed simultaneously.
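Putting the stages together, the following sketch mirrors the flow of FIG. 9, reusing memc_interpolate and stabilize_frames from the sketches above; the glue choices (midpoint phases, identity matrices in place of gyroscope-derived offsets, every-other-frame extraction) are illustrative assumptions, not the patent's prescribed flow:

```python
import numpy as np

def process_video(frames, num_out):
    """End-to-end sketch of FIG. 9 under the stated assumptions."""
    # S910: interpolate each adjacent pair at the midpoint time phase.
    interped = []
    for a, b in zip(frames, frames[1:]):
        interped += [a, memc_interpolate(a, b, 0.5)]
    interped.append(frames[-1])
    # S920/S930: the per-frame original coordinate mapping matrices would
    # be built from the gyroscope deltas here; identity matrices stand in
    # for a camera that did not move.
    raw_maps = [np.eye(3) for _ in interped]
    # S940/S950: temporal filtering plus projective restoration.
    stabilized = stabilize_frames(interped, raw_maps)
    # S960: preset fixed-interval rule, keeping every other frame.
    return stabilized[::2][:num_out]
```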
In summary, in this exemplary embodiment, performing anti-shake restoration on the interpolated video after arbitrary or equal-time-phase interpolation of the original video achieves stabilization of the interpolated video; extracting target anti-shake frames from the interpolated and restored anti-shake frames then lets the motion state of the objects in the target frames be controlled by the preset extraction rule, improving the visual continuity of motion.
Moreover, because this exemplary embodiment interpolates first and performs anti-shake restoration afterwards, it avoids, compared with restoring first and interpolating afterwards, the interpolation errors or inaccuracies that the texture loss of the restoration process would cause. At the same time, because it interpolates first and extracts frames afterwards, setting different numbers of inserted and extracted frames can also achieve frame-rate conversion of the original video.
Note that the drawings above are merely schematic illustrations of the processing included in methods according to exemplary embodiments of the present disclosure and are not intended as limitations. It is easy to understand that the processing shown in the drawings does not indicate or limit its chronological order, and that the processing may, for example, be executed synchronously or asynchronously in multiple modules.
Further, referring to FIG. 10, this exemplary embodiment also provides a video processing apparatus 1000, including a video interpolation module 1010, an anti-shake processing module 1020 and a video generation module 1030, wherein:
the video interpolation module 1010 may be configured to acquire an original video and motion data of the capture device recorded while the original video was captured, and to interpolate the original video to obtain the interpolated video corresponding to the original video;
the anti-shake processing module 1020 may be configured to perform anti-shake restoration on the video frames of the interpolated video according to the motion data, to obtain the anti-shake video frames corresponding to the interpolated video;
the video generation module 1030 may be configured to generate, from the anti-shake video frames, the anti-shake video corresponding to the original video.
In an exemplary embodiment, the video interpolation module 1010 may be configured to extract at least one pair of original video frames from the original video, determine, according to a preset interpolation rule, the interpolation time phases at which the pair is to be interpolated, and interpolate each pair according to the interpolation time phases.
In an exemplary embodiment, the video interpolation module 1010 may be configured to extract arbitrarily at least one pair of original video frames from the original video, the pair including a first original video frame and a second original video frame; determine, based on the motion data, the shake degree value of the video segment corresponding to the pair, the segment starting at the first original video frame and ending at the second; when the shake degree value is smaller than a preset shake threshold, determine the interpolation time phases for the pair according to any one of the preset interpolation rules and interpolate the pair accordingly to obtain the corresponding interpolated video frames; and when the shake degree value is greater than or equal to the preset shake threshold, determine the interpolation time phases according to the equal-time-phase rule and interpolate the pair accordingly to obtain the corresponding interpolated video frames.
In an exemplary embodiment, the video interpolation module 1010 may be configured to obtain the intermediate frames between the pair of original video frames in the original video; divide the time interval between the pair into equal parts according to the number of intermediate frames to determine the interpolation time phases; and interpolate the pair according to the interpolation time phases, generating as many interpolated video frames as there are intermediate frames and deleting the intermediate frames.
In an exemplary embodiment, the video interpolation module 1010 may be configured to determine the motion vectors of the pair of original video frames by motion estimation; correct the motion vectors by the interpolation time phase to obtain the mapping vectors of the pair; and perform fusion interpolation on the pair based on the mapping vectors to generate the corresponding interpolated video frames.
In an exemplary embodiment, the video interpolation module 1010 may be configured to, when multiple interpolation-derived interpolated video frames exist at the same time phase in the interpolated video, fuse the multiple interpolated frames and take the single fused frame as the interpolated video frame corresponding to that time phase.
In an exemplary embodiment, the anti-shake processing module 1020 may be configured to read the image mapping matrix corresponding to the motion data and take it as the original coordinate mapping matrix of the first frame of the interpolated video; generate the original coordinate mapping matrices of the other frames of the interpolated video based on the motion data and the first frame's matrix; apply temporal filtering to the original coordinate mapping matrices of the frames of the interpolated video to obtain the corrected image mapping matrix of each frame; and restore the frames based on the corrected image mapping matrices to obtain the anti-shake video frames corresponding to the interpolated video.
In an exemplary embodiment, the anti-shake processing module 1020 may be configured to select frames to be restored from the video frames according to a preset selection rule, restore the corresponding frames using the corrected image mapping matrices, and take the restored frames as anti-shake video frames.
In an exemplary embodiment, the video generation module 1030 may be configured to extract target anti-shake frames from the anti-shake video frames according to a preset extraction rule and output the target anti-shake frames, generating the anti-shake video corresponding to the original video.
In an exemplary embodiment, the video generation module 1030 may be configured to extract frames according to the motion state of a first target object in the anti-shake video frames; according to the stability of a second target object in the anti-shake video frames; and according to the image quality of the anti-shake video frames.
In an exemplary embodiment, the video generation module 1030 may be configured to obtain the initial and final motion states of the first target object in the anti-shake video frames; determine the intermediate motion state of the first target object at each time point from the initial and final states; and extract target anti-shake frames from the anti-shake video frames, the motion state of the first target object in a target anti-shake frame matching the intermediate motion state of the corresponding time point.
In an exemplary embodiment, the video generation module 1030 may be configured to determine the stability parameter of an anti-shake video frame from the overlap ratio between the second target object in that frame and the second target object in the chronologically preceding anti-shake video frame, and to extract target anti-shake frames from the anti-shake video frames according to the stability parameter.
In an exemplary embodiment, the video generation module 1030 may be configured to, when multiple interpolation-derived anti-shake video frames exist at the same time phase, determine their quality parameters from the confidences corresponding to the frames, and to select, according to the quality parameters, one of them as the target anti-shake frame corresponding to that time phase.
The specific details of each module of the above apparatus have been described in detail in the method embodiments; details not disclosed here can be found in the method embodiments and are therefore not repeated.
Those skilled in the art will appreciate that aspects of the present disclosure can be implemented as a system, method or program product. Aspects of the present disclosure may therefore take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software, which may collectively be referred to herein as a "circuit", "module" or "system".
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium on which a program product capable of implementing the method described above in this specification is stored. In some possible embodiments, aspects of the present disclosure may also be implemented as a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to execute the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section above, for example any one or more of the steps in FIG. 3, FIG. 4 and FIG. 9.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code; such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wireline, optical cable, RF and the like, or any suitable combination of the above.
Furthermore, program code for carrying out the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented languages such as Java and C++, as well as conventional procedural languages such as the "C" language or similar. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example through the Internet using an Internet service provider).
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, the true scope and spirit of the present disclosure being indicated by the claims.
It should be understood that the present disclosure is not limited to the precise constructions already described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (20)

  1. A video processing method, characterized by comprising:
    acquiring an original video and motion data of a capture device recorded while the original video was captured, and performing frame interpolation on the original video to obtain an interpolated video corresponding to the original video;
    performing anti-shake restoration on video frames in the interpolated video according to the motion data, to obtain anti-shake video frames corresponding to the interpolated video;
    generating, from the anti-shake video frames, an anti-shake video corresponding to the original video.
  2. The method according to claim 1, wherein performing frame interpolation on the original video comprises:
    extracting at least one pair of original video frames from the original video, determining, according to a preset interpolation rule, an interpolation time phase at which the pair of original video frames is to be interpolated, and interpolating each pair of original video frames according to the interpolation time phase.
  3. The method according to claim 2, wherein the preset interpolation rule comprises at least an equal-time-phase rule;
    extracting at least one pair of original video frames from the original video, determining, according to the preset interpolation rule, the interpolation time phase at which the pair of original video frames is to be interpolated, and interpolating each pair of original video frames according to the interpolation time phase comprises:
    extracting arbitrarily at least one pair of original video frames from the original video, the pair comprising a first original video frame and a second original video frame;
    determining, based on the motion data, a shake degree value of a video segment corresponding to the pair of original video frames, the video segment starting at the first original video frame and ending at the second original video frame;
    when the shake degree value is smaller than a preset shake threshold, determining the interpolation time phase for the pair according to any one of the preset interpolation rules, and interpolating the pair according to the interpolation time phase to obtain corresponding interpolated video frames;
    when the shake degree value is greater than or equal to the preset shake threshold, determining the interpolation time phase for the pair according to the equal-time-phase rule, and interpolating the pair according to the interpolation time phase to obtain corresponding interpolated video frames.
  4. The method according to claim 3, wherein determining the interpolation time phase for the pair according to the equal-time-phase rule and interpolating the pair according to the interpolation time phase comprises:
    obtaining intermediate frames between the pair of original video frames in the original video;
    dividing the time interval between the pair of original video frames into equal parts according to the number of the intermediate frames, to determine the interpolation time phases;
    interpolating the pair of original video frames according to the interpolation time phases, generating interpolated video frames equal in number to the intermediate frames, and deleting the intermediate frames.
  5. The method according to any one of claims 2 to 4, wherein interpolating the pair of original video frames according to the interpolation time phase comprises:
    determining motion vectors corresponding to the pair of original video frames by motion estimation;
    correcting the motion vectors by the interpolation time phase, to obtain mapping vectors corresponding to the pair of original video frames;
    performing fusion interpolation on the pair of original video frames based on the mapping vectors, to generate corresponding interpolated video frames.
  6. The method according to claim 2, further comprising:
    when multiple interpolation-derived interpolated video frames exist at a same time phase in the interpolated video, fusing the multiple interpolated video frames, and taking the single fused interpolated video frame as the interpolated video frame corresponding to the time phase.
  7. The method according to claim 1, wherein performing anti-shake restoration on the video frames in the interpolated video according to the motion data to obtain the anti-shake video frames corresponding to the interpolated video comprises:
    reading an image mapping matrix corresponding to the motion data, and taking the image mapping matrix as an original coordinate mapping matrix corresponding to a first video frame of the interpolated video;
    generating original coordinate mapping matrices corresponding to other video frames of the interpolated video, based on the motion data and the original coordinate mapping matrix corresponding to the first video frame;
    filtering the original coordinate mapping matrices corresponding to the video frames of the interpolated video by temporal filtering, to obtain corrected image mapping matrices corresponding to the video frames;
    restoring the video frames based on the corrected image mapping matrices, to obtain the anti-shake video frames corresponding to the interpolated video.
  8. The method according to claim 7, wherein restoring the video frames based on the corrected image mapping matrices to obtain the anti-shake video frames corresponding to the interpolated video comprises:
    selecting video frames to be restored from the video frames according to a preset selection rule, restoring the corresponding video frames to be restored using the corrected image mapping matrices, and taking the restored video frames as the anti-shake video frames.
  9. The method according to claim 1, wherein generating the anti-shake video corresponding to the original video from the anti-shake video frames comprises:
    extracting target anti-shake frames from the anti-shake video frames according to a preset extraction rule, and outputting the target anti-shake frames to generate the anti-shake video corresponding to the original video.
  10. The method according to claim 9, wherein the preset extraction rule comprises an adaptive extraction rule; the adaptive extraction rule comprises at least one of the following rules:
    extracting frames according to a motion state of a first target object in the anti-shake video frames;
    extracting frames according to a stability of a second target object in the anti-shake video frames; and
    extracting frames according to an image quality of the anti-shake video frames.
  11. The method according to claim 10, wherein extracting frames according to the motion state of the first target object in the anti-shake video frames comprises:
    obtaining an initial motion state and a final motion state of the first target object in the anti-shake video frames;
    determining intermediate motion states of the first target object at respective time points according to the initial motion state and the final motion state;
    extracting target anti-shake frames from the anti-shake video frames, wherein in a target anti-shake frame the motion state of the first target object matches the intermediate motion state of the corresponding time point.
  12. The method according to claim 10, wherein extracting frames according to the stability of the second target object in the anti-shake video frames comprises:
    determining a stability parameter of an anti-shake video frame according to the overlap ratio between the second target object in the anti-shake video frame and the second target object in the chronologically preceding anti-shake video frame;
    extracting target anti-shake frames from the anti-shake video frames according to the stability parameter.
  13. The method according to claim 10, wherein extracting frames according to the image quality of the anti-shake video frames comprises:
    when multiple interpolation-derived anti-shake video frames exist at a same time phase in the anti-shake video frames, determining quality parameters of the multiple anti-shake video frames according to confidences corresponding to the multiple anti-shake video frames;
    determining, according to the quality parameters, one of the multiple anti-shake video frames as the target anti-shake frame corresponding to the time phase.
  14. A video processing apparatus, characterized by comprising:
    a video interpolation module configured to acquire an original video and motion data of a capture device recorded while the original video was captured, and to perform frame interpolation on the original video to obtain an interpolated video corresponding to the original video;
    an anti-shake processing module configured to perform anti-shake restoration on video frames in the interpolated video according to the motion data, to obtain anti-shake video frames corresponding to the interpolated video;
    a video generation module configured to generate, from the anti-shake video frames, an anti-shake video corresponding to the original video.
  15. The apparatus according to claim 14, wherein the video interpolation module is further configured to extract at least one pair of original video frames from the original video, determine, according to a preset interpolation rule, the interpolation time phase at which the pair of original video frames is to be interpolated, and interpolate each pair of original video frames according to the interpolation time phase.
  16. The apparatus according to claim 14, wherein the anti-shake processing module is further configured to read an image mapping matrix corresponding to the motion data and take the image mapping matrix as the original coordinate mapping matrix corresponding to the first video frame of the interpolated video; generate the original coordinate mapping matrices corresponding to the other video frames of the interpolated video based on the motion data and the original coordinate mapping matrix corresponding to the first video frame; filter the original coordinate mapping matrices corresponding to the video frames of the interpolated video by temporal filtering, to obtain the corrected image mapping matrices corresponding to the video frames; and restore the video frames based on the corrected image mapping matrices, to obtain the anti-shake video frames corresponding to the interpolated video.
  17. The apparatus according to claim 14, wherein the video generation module is further configured to extract target anti-shake frames from the anti-shake video frames according to a preset extraction rule and output the target anti-shake frames, generating the anti-shake video corresponding to the original video.
  18. The apparatus according to claim 17, wherein the preset extraction rule comprises an adaptive extraction rule; the adaptive extraction rule comprises at least one of the following rules:
    extracting frames according to a motion state of a first target object in the anti-shake video frames;
    extracting frames according to a stability of a second target object in the anti-shake video frames; and
    extracting frames according to an image quality of the anti-shake video frames.
  19. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 13.
  20. An electronic device, characterized by comprising:
    a processor; and a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the video processing method according to any one of claims 1 to 13.
PCT/CN2021/087795 2020-05-19 2021-04-16 Video processing method, video processing apparatus and electronic device WO2021233032A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010425185.6 2020-05-19
CN202010425185.6A CN111641835B (zh) 2020-05-19 2020-05-19 Video processing method, video processing apparatus and electronic device

Publications (1)

Publication Number Publication Date
WO2021233032A1 (zh)

Family

ID=72332090

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/087795 WO2021233032A1 (zh) 2020-05-19 2021-04-16 视频处理方法、视频处理装置和电子设备

Country Status (2)

Country Link
CN (1) CN111641835B (zh)
WO (1) WO2021233032A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111641835B (zh) 2020-05-19 2023-06-02 Oppo广东移动通信有限公司 Video processing method, video processing apparatus and electronic device
CN113837136B (zh) 2021-09-29 2022-12-23 深圳市慧鲤科技有限公司 Video frame interpolation method and apparatus, electronic device and storage medium
CN116055876A (zh) 2021-10-27 2023-05-02 北京字跳网络技术有限公司 Video processing method and apparatus, electronic device and storage medium
CN114745545B (zh) 2022-04-11 2024-07-09 北京字节跳动网络技术有限公司 Video frame interpolation method, apparatus, device and medium
CN114827663B (zh) 2022-04-12 2023-11-21 咪咕文化科技有限公司 Distributed live-streaming frame interpolation system and method
CN116866665B (zh) 2023-09-05 2023-11-14 中信建投证券股份有限公司 Video playback method and apparatus, electronic device and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546917B (zh) 2010-12-31 2014-10-22 联想移动通信科技有限公司 Mobile terminal with camera and video processing method thereof
US9131127B2 (en) 2013-02-08 2015-09-08 Ati Technologies, Ulc Method and apparatus for reconstructing motion compensated video frames
JP6090494B2 (ja) 2016-03-25 2017-03-08 カシオ計算機株式会社 Moving image capturing apparatus, moving image shake correction method, and program
CN107027029B (zh) 2017-03-01 2020-01-10 四川大学 Improved high-performance video coding method based on frame rate conversion
JP6995490B2 (ja) 2017-04-14 2022-01-14 キヤノン株式会社 Video playback apparatus, control method therefor, and program
WO2018223381A1 (zh) 2017-06-09 2018-12-13 厦门美图之家科技有限公司 Video anti-shake method and mobile device
CN110198412B (zh) 2019-05-31 2020-09-18 维沃移动通信有限公司 Video recording method and electronic device
CN110366003A (zh) 2019-06-24 2019-10-22 北京大米科技有限公司 Anti-shake processing method and apparatus for video data, electronic device and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1863273A (zh) 2005-05-13 2006-11-15 三洋电机株式会社 Image signal processing apparatus and method, and imaging apparatus having the same
US8130277B2 (en) 2008-02-20 2012-03-06 Aricent Group Method and system for intelligent and efficient camera motion estimation for video stabilization
CN102542529A (zh) 2008-08-04 2012-07-04 株式会社东芝 Image processor and image processing method
CN101867698A (zh) 2009-04-16 2010-10-20 索尼公司 Image processing apparatus, image processing method and recording medium
CN102761729A (zh) 2011-04-26 2012-10-31 索尼公司 Image processing apparatus and method, and program
CN104469086A (zh) 2014-12-19 2015-03-25 北京奇艺世纪科技有限公司 Video de-shake method and apparatus
CN111641835A (zh) 2020-05-19 2020-09-08 Oppo广东移动通信有限公司 Video processing method, video processing apparatus and electronic device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114640754A (zh) 2022-03-08 2022-06-17 京东科技信息技术有限公司 Video shake detection method, apparatus, computer device and storage medium
CN114494083A (zh) 2022-04-14 2022-05-13 杭州雄迈集成电路技术股份有限公司 Adaptive method and system for improving video clarity
CN114494083B (zh) 2022-04-14 2022-07-29 杭州雄迈集成电路技术股份有限公司 Adaptive method and system for improving video clarity
CN114913468A (zh) 2022-06-16 2022-08-16 阿里巴巴(中国)有限公司 Object restoration method, restoration evaluation method, electronic device and storage medium
CN115174995A (zh) 2022-07-04 2022-10-11 北京国盛华兴科技有限公司 Frame interpolation method and apparatus for video data
CN115242981A (zh) 2022-07-25 2022-10-25 维沃移动通信有限公司 Video playback method, video playback apparatus and electronic device
CN115242981B (zh) 2022-07-25 2024-06-25 维沃移动通信有限公司 Video playback method, video playback apparatus and electronic device
CN116112707A (zh) 2023-02-01 2023-05-12 上海哔哩哔哩科技有限公司 Video processing method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN111641835B (zh) 2023-06-02
CN111641835A (zh) 2020-09-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21809058; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21809058; Country of ref document: EP; Kind code of ref document: A1)