WO2022179554A1 - Video splicing method and apparatus, computer device and storage medium - Google Patents

Video splicing method and apparatus, computer device and storage medium

Info

Publication number
WO2022179554A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video frame
frame
overlapping
spliced
Application number
PCT/CN2022/077635
Other languages
English (en)
French (fr)
Inventor
谢朝毅
Original Assignee
影石创新科技股份有限公司
Application filed by 影石创新科技股份有限公司
Priority to EP22758918.1A (EP4300982A4)
Priority to JP2023550696A (JP2024506109A)
Publication of WO2022179554A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/14 Transformations for image registration, e.g. adjusting or mapping for alignment of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/38 Registration of image sequences
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/68 Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682 Vibration or motion blur correction
    • H04N23/683 Vibration or motion blur correction performed by a processor, e.g. controlling the readout of an image memory
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/698 Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a video splicing method, device, computer equipment and storage medium.
  • Video stitching technology can stitch together videos shot under different time conditions to form a complete video.
  • For example, a camera shoots a panoramic video of an object passing through an obstacle. When shooting the object passing through the obstacle, shooting of the first video is stopped after the object has passed the obstacle by a certain distance. The camera then moves around the obstacle, shoots a second video of the object passing through the obstacle from the other side of the obstacle, and the first video and the second video are stitched to form a complete panoramic video of the object passing through the obstacle.
  • Panoramic video is widely used in various fields due to its large viewing angle and high resolution. Therefore, video stitching technology is also widely used in various fields.
  • However, the current video splicing method has the problem of a poor video splicing effect.
  • A video splicing method is provided, comprising: acquiring a first video and a second video to be spliced, the first video preceding the second video; performing still frame detection on the first video or the second video to obtain a still frame sequence; obtaining a reference video frame based on the still frame sequence; performing an overlapping area search based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video; and splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
  • The step of splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain the spliced video includes: acquiring a spliced video frame position, obtaining the first spliced video frame corresponding to the spliced video frame position from the first overlapping video frame area, and obtaining the second spliced video frame corresponding to the spliced video frame position from the second overlapping video frame area; determining the spatial transformation relationship between the first spliced video frame and the second spliced video frame, and performing video frame alignment on the first video and the second video based on the spatial transformation relationship; and splicing the aligned first video and second video to obtain the spliced video, wherein, during splicing, the first overlapping video frame area and the second overlapping video frame area are fused to obtain fused video frames.
  • The spatial transformation relationship includes a horizontal transformation value, and determining the spatial transformation relationship between the first spliced video frame and the second spliced video frame and performing video frame alignment on the first video and the second video based on the spatial transformation relationship includes: acquiring a first feature point of the first spliced video frame and a second feature point of the second spliced video frame; determining the horizontal distance between the first feature point and the second feature point; and determining the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
  • The step of fusing the first overlapping video frame area and the second overlapping video frame area to obtain a fused video frame includes: acquiring a current video frame to be fused from the first overlapping video frame area; obtaining the current time difference between the current shooting time of the current video frame and the reference shooting time of the reference video frame; obtaining the current fusion weight corresponding to the current video frame based on the current time difference, wherein the current time difference is positively correlated with the current fusion weight; and fusing the current video frame with the video frame at the corresponding position of the second overlapping video frame area based on the current fusion weight to obtain the current fused video frame.
  • Obtaining the current fusion weight corresponding to the current video frame based on the current time difference includes: obtaining the overlapping area time length corresponding to the overlapping video frame area; and calculating the ratio of the current time difference to the overlapping area time length to obtain the current fusion weight.
  • Performing the overlapping area search based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video includes: comparing the reference video frame with each video frame in the first video to obtain a matching video frame in the first video that matches the reference video frame; taking the tail video frame area corresponding to the matching video frame as the first overlapping video frame area corresponding to the first video; and taking the reference video frame area where the reference video frame is located in the second video as the second overlapping video frame area corresponding to the second video, the reference video frame being the head video frame of the reference video frame area, and the reference video frame area matching the number of video frames in the tail video frame area.
  • Performing the overlapping area search based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video includes: acquiring a tail video frame sequence with a preset number of frames in the first video as the first overlapping video frame area corresponding to the first video; and obtaining, from the backward video frame sequence corresponding to the reference video frame, the matching video frame sequence that matches the tail video frame sequence, and using the matching video frame sequence as the second overlapping video frame area corresponding to the second video.
  • the performing still frame detection on the first video or the second video to obtain a still frame sequence includes: converting the first video or the second video into a plane video; Perform still frame detection to obtain the still frame sequence.
  • a video splicing device comprises: a first video and a second video acquisition module for acquiring a first video and a second video to be spliced, the first video is before the second video; still frame A sequence obtaining module is used to perform still frame detection on the first video or the second video to obtain a still frame sequence; a reference video frame obtaining module is used to obtain a reference video frame based on the still frame sequence; the first overlapping video A frame area and a second overlapping video frame area obtaining module, configured to perform a overlapping area search based on the reference video frame, to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video A video frame area; a spliced video obtaining module, configured to splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
  • The spliced video obtaining module is configured to obtain the spliced video frame position, obtain the first spliced video frame corresponding to the spliced video frame position from the first overlapping video frame area, and obtain the second spliced video frame corresponding to the spliced video frame position from the second overlapping video frame area; determine the spatial transformation relationship between the first spliced video frame and the second spliced video frame; and perform video frame alignment on the first video and the second video based on the spatial transformation relationship.
  • The spliced video obtaining module is configured to obtain the first feature point of the first spliced video frame and the second feature point of the second spliced video frame; determine the horizontal distance between the first feature point and the second feature point; and determine the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
  • The spliced video obtaining module is configured to obtain the current video frame to be fused from the first overlapping video frame area; obtain the current time difference between the current shooting time of the current video frame and the reference shooting time of the reference video frame; obtain the current fusion weight corresponding to the current video frame based on the current time difference, wherein the current time difference is positively correlated with the current fusion weight; and fuse the current video frame with the video frame at the corresponding position of the second overlapping video frame area based on the current fusion weight to obtain the current fused video frame.
  • the spliced video obtaining module is used to obtain the overlapping area time length corresponding to the overlapping video frame area; calculate the ratio of the current time difference to the overlapping area time length to obtain the current fusion weight.
  • The first overlapping video frame area and second overlapping video frame area obtaining module is configured to compare the reference video frame with each video frame in the first video to obtain the matching video frame in the first video that matches the reference video frame; take the tail video frame area corresponding to the matching video frame as the first overlapping video frame area corresponding to the first video; and take the reference video frame area where the reference video frame is located in the second video as the second overlapping video frame area corresponding to the second video, the reference video frame being the head video frame of the reference video frame area, and the reference video frame area matching the number of video frames in the tail video frame area.
  • The first overlapping video frame area and second overlapping video frame area obtaining module is configured to acquire a tail video frame sequence with a preset number of frames in the first video as the first overlapping video frame area corresponding to the first video; and to obtain, from the backward video frame sequence corresponding to the reference video frame, the matching video frame sequence that matches the tail video frame sequence and use the matching video frame sequence as the second overlapping video frame area corresponding to the second video.
  • the still frame sequence obtaining module is configured to convert the first video or the second video into a plane video; perform still frame detection on the plane video to obtain the still frame sequence.
  • A computer device is provided, comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program: acquiring a first video and a second video to be spliced, the first video preceding the second video; performing still frame detection on the first video or the second video to obtain a still frame sequence; obtaining a reference video frame based on the still frame sequence; performing an overlapping area search based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video; and splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
  • In this way, the terminal obtains the first video and the second video to be spliced, performs still frame detection on the first video or the second video to obtain a still frame sequence, and obtains a reference video frame based on the still frame sequence; the first video is captured before the second video.
  • An overlapping area search is performed based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video, and the first video and the second video are spliced based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
  • Because the first overlapping video frame area and the second overlapping video frame area are obtained and the two videos are spliced based on these overlapping areas, the splicing transition between the first video and the second video is natural and the video splicing effect is improved.
  • FIG. 1 is an application environment diagram of a video splicing method in one embodiment
  • FIG. 2 is a schematic flowchart of a video splicing method in one embodiment
  • FIG. 3 is a schematic flowchart of a video splicing method in another embodiment
  • FIG. 4 is a schematic flowchart of a video splicing method in another embodiment
  • FIG. 5 is a schematic flowchart of a step of fusing a first overlapping video frame region and the second overlapping video frame region to obtain a fused video frame in one embodiment
  • FIG. 6 is a schematic flowchart of a step of obtaining a current fusion weight corresponding to a current video frame based on a current time difference in one embodiment
  • FIG. 7 is a schematic flowchart of a video splicing method in another embodiment
  • FIG. 8 is a schematic flowchart of a video splicing method in another embodiment
  • FIG. 9 is a schematic diagram of a method for determining the position of a spliced video frame in one embodiment
  • FIG. 10 is a structural block diagram of a video splicing device in one embodiment
  • Figure 11 is a diagram of the internal structure of a computer device in one embodiment.
  • the video splicing method provided by the present application can be applied to the application environment shown in FIG. 1 , and is specifically applied to a video splicing system.
  • the video splicing system includes a video capture device 102 and a terminal 104, wherein the video capture device 102 communicates with the terminal 104 through a network.
  • Terminal 104 performs the video stitching method. Specifically, the video capture device 102 transmits to the terminal 104 two pieces of video to be spliced that were shot at the same position of the object to be photographed at different times, and the terminal 104 correspondingly obtains the first video and the second video to be spliced; the first video is the forward video of the second video.
  • After acquiring the first video and the second video, the terminal 104 performs still frame detection on the first video or the second video to obtain a still frame sequence; obtains a reference video frame based on the still frame sequence; performs an overlapping area search based on the reference video frame to respectively obtain a first overlapping video frame area in the first video and a second overlapping video frame area in the second video; and splices the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
  • The video capture device 102 may be, but is not limited to, various devices having a video capture function, and may be located outside the terminal 104 or inside the terminal 104. For example, various cameras, video cameras, video capture cards, etc. may be distributed outside the terminal 104.
  • The terminal 104 may be, but is not limited to, various cameras, personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • a video splicing method is provided, and the method is applied to the terminal in FIG. 1 as an example for description, including the following steps:
  • Step 202 Acquire a first video and a second video to be spliced, where the first video precedes the second video.
  • the forward video refers to a video obtained from the same shooting position as the second video before shooting the second video.
  • cameras take panoramic videos of objects passing through obstacles.
  • the first video is stopped after the object passes through the obstacle for a certain distance.
  • the camera then goes around the obstacle and takes a second video of the object crossing the obstacle from the other side of the obstacle.
  • the time of shooting the first video is regarded as the first time
  • the time of shooting the second video is regarded as the second time.
  • the first time is before the second time.
  • Since the two videos have the same shooting position, that is, a position at a certain distance from the obstacle, the first video is the forward video of the second video.
  • the first video and the second video to be spliced need to be acquired first.
  • Specifically, the terminal can collect video through a connected video collection device, and the collection device transmits the collected video to the terminal in real time; alternatively, when a video acquisition instruction is sent, the collection device transmits its locally stored video to the terminal. Correspondingly, the terminal can acquire the first video and the second video to be spliced.
  • the terminal collects the first video and the second video through an internal video collection module, and stores the collected video in the terminal memory.
  • When the terminal needs to splice the first video and the second video, it obtains the first video and the second video to be spliced from the memory.
  • Step 204 Perform still frame detection on the first video or the second video to obtain a still frame sequence.
  • A still frame refers to a video frame, among the video frames of the first video or the second video, in which the picture of the first video or the second video remains still.
  • the still frame sequence refers to a sequence composed of sequential still frames in the first video or the second video.
  • feature point extraction may be performed on the last video frame in the first video sequentially with multiple consecutive video frames before the last video frame, and feature point matching may be performed.
  • If the feature points match, the video frame sequence composed of these consecutive video frames is determined as the still frame sequence.
  • For example, the last video frame in the first video is taken as the first frame, and its feature points are extracted and matched against the n-2 consecutive frames before it; if they match, the video frame sequence consisting of the last n-1 frames of the first video is determined as a still frame sequence.
  • feature point extraction may be performed through the first video frame in the second video and successive video frames following the first video frame, and feature point matching may be performed.
  • If the feature points match, the video frame sequence composed of these consecutive video frames is determined as the still frame sequence.
  • For example, the first video frame in the second video is taken as the first frame, and its feature points are extracted and matched against the n-2 consecutive frames after it; if they match, the video frame sequence consisting of the first n-1 frames of the second video is determined as a still frame sequence.
  • the image corresponding to the video frame is first converted into a plane view image.
  • the plane view image may refer to a plane view with a field of view of 90 degrees seen in a certain direction of the panorama.
  • the panorama includes six planes, up, down, front, back, left, and right, and each plane is a plane view.
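  • As a minimal illustration of this conversion (an assumption about one workable implementation, not the exact procedure of this application), the front-facing 90-degree plane view can be rendered from an equirectangular panorama frame with OpenCV's remap; the function name front_plane_view and the face size are illustrative only:

```python
import cv2
import numpy as np

def front_plane_view(pano, face_size=512):
    """Render the front 90-degree-FOV plane view of an equirectangular panorama frame."""
    h, w = pano.shape[:2]
    # Normalized pixel grid of the output face in [-1, 1]
    u = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    xv, yv = np.meshgrid(u, u)
    # Rays through the front face of a unit cube (z = 1)
    lon = np.arctan2(xv, 1.0)                      # longitude of each ray
    lat = np.arctan2(yv, np.sqrt(xv ** 2 + 1.0))   # latitude of each ray
    # Map longitude/latitude back to equirectangular pixel coordinates
    map_x = ((lon / np.pi + 1.0) * 0.5 * w).astype(np.float32)
    map_y = ((lat / (np.pi / 2) + 1.0) * 0.5 * h).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)
```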
  • A certain video frame, whose status as a still frame is to be determined, is then determined as a still frame based on the matching result.
  • Specifically, an ORB (Oriented FAST and Rotated BRIEF) feature point detection method can be used to extract and match feature points in a video frame; alternatively, a SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) or LSD (Line Segment Detection) method may be used.
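  • A minimal sketch of the feature point matching just described, assuming ORB with brute-force Hamming matching on BGR frames; the 0.9 match-ratio threshold is an illustrative assumption rather than a value taken from this text:

```python
import cv2

def is_still_pair(frame_a, frame_b, ratio_thresh=0.9):
    """Decide whether two frames are 'still' relative to each other via ORB matching."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY), None)
    kp_b, des_b = orb.detectAndCompute(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY), None)
    if des_a is None or des_b is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    # Treat the pair as still when most keypoints of frame_a find a match in frame_b
    return len(matches) / max(len(kp_a), 1) >= ratio_thresh
```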
  • Step 206 obtaining a reference video frame based on the still frame sequence.
  • the reference video frame refers to a video frame that can be used as a reference, and a matching result between other video frames and the video frame can be obtained by using the video frame.
  • the forward video frame of the still frame sequence may be obtained from the first video or the backward video frame of the still frame sequence may be obtained from the second video.
  • The forward video frame or the backward video frame is considered to be a non-still video frame, and the non-still video frame can be used as the reference video frame.
  • the forward video frame refers to the first video frame before the still frame sequence in the first video
  • the backward video frame refers to the first video frame after the still frame sequence in the second video.
  • the OpenCV software library in the terminal is called to extract the forward video frame or the backward video frame.
  • OpenCV is a cross-platform computer vision and machine learning software library released under BSD licenses (the original BSD and FreeBSD licenses), which can be used to extract video frames.
  • Specifically, a CRC (Cyclic Redundancy Check) method can also be used to perform still frame detection on the first video or the second video: multiple threads are created to perform a CRC check on the video frames in the first video or the second video and obtain the CRC check value of each frame; through the CRC check values, the still frames in the first video or the second video can be obtained, the non-still video frame can be obtained from the still frames, and the non-still video frame is used as the reference video frame. It can be understood that other still frame detection methods may also be used to determine the still frames.
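  • A minimal single-threaded sketch of this CRC-based variant (the text describes doing the checks across multiple threads); frames whose CRC values repeat can be treated as still frames:

```python
import zlib
import cv2

def crc_of_frames(video_path):
    """Return the CRC32 checksum of every decoded frame of the video."""
    cap = cv2.VideoCapture(video_path)
    crcs = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        crcs.append(zlib.crc32(frame.tobytes()))
    cap.release()
    return crcs  # identical consecutive values indicate still frames
```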
  • Step 208 searching for overlapping regions based on the reference video frame, to obtain a first overlapping video frame region corresponding to the first video and a second overlapping video frame region corresponding to the second video.
  • The overlapping area refers to the video frame area obtained by shooting the same position in the two videos. For example, for two videos shot at different times from the same shooting position, the video frame area obtained by shooting that same position is the overlapping area.
  • the overlapping region can be determined by using the reference video frame.
  • the reference video frame may be matched with all frames in the first video, and in the first video, the video frame with the highest matching probability is used as the start frame of the first overlapping video frame region.
  • In the second video, the area composed of the video frames having the same positions as the video frames in the first overlapping video frame area is used to obtain the second overlapping video frame area corresponding to the second video.
  • For example, the matching video frame is denoted as the P frame, and in the first video the video frame portion corresponding to all the video frames after the P frame is taken as the overlapping area. In the second video, the video frame area that has the same position as the video frames after the P frame is the video frame area between the C frame and the F frame, and the video frame area between the C frame and the F frame is regarded as the second overlapping video frame area.
  • video frames with a preset number of frames after the reference video frame may be taken to perform matching of the corresponding video frames.
  • m frames after the reference video frame may be taken, and by comparing the last m frames in the first video, the m frames corresponding to the maximum number of feature point matching statistics are obtained as the first overlapping video frame area.
  • When m takes different values, the obtained feature point matching number statistics differ; a correspondence table between m and the feature point matching number statistics is formed, the maximum matching number statistic is found from this table, and the m corresponding to that maximum value is the number of video frames in the obtained overlapping area.
  • a second overlapping video frame region corresponding to the second video can be obtained.
  • Table 1 is a correspondence table between the preset number of frames and the statistical value of the number of matched feature points.
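  • A minimal sketch of building this correspondence table between m and the feature point match statistics and keeping the maximum; count_orb_matches is an assumed helper that returns the number of ORB matches between two frames (for example, a matcher built like the one in the earlier sketch):

```python
def best_overlap_length(first_video_frames, second_video_frames, ref_index, candidate_ms):
    """Return the candidate m whose frame-by-frame match count is largest."""
    scores = {}
    for m in candidate_ms:
        tail = first_video_frames[-m:]                       # last m frames of the first video
        head = second_video_frames[ref_index:ref_index + m]  # m frames after the reference frame
        scores[m] = sum(count_orb_matches(a, b) for a, b in zip(tail, head))
    return max(scores, key=scores.get)  # argmax over the m -> match-count table
```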
  • Step 210 splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
  • splicing refers to the process of combining two or more videos into a complete video.
  • Specifically, the first video and the second video are spliced in the first overlapping video frame area and the second overlapping video frame area to obtain the spliced video.
  • Specifically, image fusion is performed after aligning the first video and the second video in the first overlapping video frame area and the second overlapping video frame area, so that the first video and the second video are fused into a complete panoramic video.
  • Image fusion methods can use linear fusion, Poisson fusion, multi-scale fusion, weighted fusion or Laplacian pyramid fusion. It can be understood that each frame of video in the video can be regarded as a still image, and when performing video fusion, the fusion of multiple video frames aligned in the overlapping video frame region can be regarded as the fusion of multiple still images.
  • In this embodiment, weighted fusion is used to fuse the first video and the second video into a complete panoramic video.
  • The weight used in the weighted fusion of the first video and the second video may be determined by the current time difference between the current shooting time of the video frame corresponding to the current video frame in the overlapping area and the reference shooting time of the reference video frame. Assuming that Q represents the weight, t1 represents the current shooting time of the video frame corresponding to the current video frame, t2 represents the reference shooting time of the reference video frame, and t represents the total time corresponding to the video frames in the overlapping area, the weight can be calculated from the difference between the current shooting time of the video frame corresponding to the current video frame and the reference shooting time of the reference video frame, together with the total time corresponding to the video frames in the overlapping area.
  • I represents the fused video frame
  • I1 represents the current video frame of the first video in the overlapping area
  • I2 represents the current video frame of the second video in the overlapping area
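  • The exact blending expression is not reproduced in this text; one plausible form, consistent with Q growing as the overlapping area is traversed, is the linear fade I = (1 - Q) * I1 + Q * I2, sketched below with OpenCV:

```python
import cv2

def fuse_overlap_frames(I1, I2, Q):
    """Blend the aligned overlapping frames; the fade direction is an assumption."""
    # As Q increases through the overlapping area, the result moves from I1 towards I2.
    return cv2.addWeighted(I1, 1.0 - Q, I2, Q, 0.0)
```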
  • The first overlapping video frame area and the second overlapping video frame area are the areas in the first video and the second video, respectively, where the overlapping video frame content exists, and this area is the overlapping area of the first video and the second video; the terms first overlapping video frame area and second overlapping video frame area are used to distinguish the overlapping video frame areas corresponding to the overlapping area as it appears in the first video and in the second video.
  • In this way, the terminal obtains the first video and the second video to be spliced, performs still frame detection on the first video or the second video to obtain a still frame sequence, and obtains a reference video frame based on the still frame sequence; the first video is captured before the second video.
  • The overlapping area search is performed based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video, and the first video and the second video are spliced based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
  • Because the first overlapping video frame area and the second overlapping video frame area are obtained and the two videos are spliced based on these overlapping areas, the splicing transition between the first video and the second video is natural and the video splicing effect is improved.
  • splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area, and obtaining the spliced video includes:
  • Step 302 Obtain the position of the spliced video frame, obtain the first spliced video frame corresponding to the position of the spliced video frame from the first overlapping video frame area, and obtain the second spliced video frame corresponding to the position of the spliced video frame from the second overlapping video frame area.
  • the spliced video frame position refers to a video frame position capable of splicing the first video and the second video.
  • For example, if the spatial position of the 100th frame in the first video is frame S and the position of the 10th frame in the second video is also frame S, then the splicing position is frame S, and the corresponding 100th frame of the first video and 10th frame of the second video can be regarded as the spliced video frame position.
  • the splicing may be performed by acquiring the spliced video frame positions of the two videos.
  • Specifically, the center video frame of the overlapping area can be selected as the spliced video frame position; after the spliced video frame position is obtained, the first spliced video frame corresponding to that position in the first video and the second spliced video frame corresponding to that position in the second video can be determined.
  • The center video frame is the video frame located in the middle of the sequence of video frames. For example, if the video frame sequence contains 5 video frames at positions {1, 2, 3, 4, 5}, the video frame at position 3 is the video frame in the middle of the video frame sequence.
  • Alternatively, the numbers of matching points between the respective aligned video frames are obtained by comparing the aligned video frames.
  • the video frame with the largest number of matching points is used as the position of the spliced video frame.
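  • A minimal sketch of both options for choosing the spliced video frame position: the centre frame of the overlapping area, or the aligned frame pair with the most matching points (count_orb_matches is the same assumed helper as in the earlier sketch):

```python
def splice_position(first_overlap_frames, second_overlap_frames, use_center=False):
    """Return the index within the overlapping area used as the spliced video frame position."""
    if use_center:
        return len(first_overlap_frames) // 2
    scores = [count_orb_matches(a, b)
              for a, b in zip(first_overlap_frames, second_overlap_frames)]
    return scores.index(max(scores))  # frame pair with the largest number of matching points
```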
  • Step 304 Determine the spatial transformation relationship between the first spliced video frame and the second spliced video frame, and perform video frame alignment on the first video and the second video based on the spatial transformation relationship.
  • the spatial transformation relationship refers to the transformation relationship between the first spliced video frame and the second spliced video frame, such as rotation, translation, or zooming in and out.
  • There may be a certain angle between the first spliced video and the second spliced video due to the shooting angle and other reasons.
  • Correspondingly, there will also be a certain angle between the first spliced video frame and the second spliced video frame.
  • Therefore, the spatial transformation relationship between the spliced video frames must be determined, and the splicing of the first spliced video frame and the second spliced video frame can be completed only after the video frames of the first video and the second video are aligned based on that spatial transformation relationship.
  • Specifically, the images captured at different angles are all converted to the same viewing angle, and the spatial transformation relationship from video frame to video frame is obtained; based on the spatial transformation relationship, video frame alignment is performed on the first video and the second video.
  • Step 306 splicing video frames based on the aligned first video frame and the second video frame to obtain a spliced video, wherein during splicing, the first overlapping video frame area and the second overlapping video frame area are fused to obtain a fusion video frame.
  • The fused video frame refers to a video frame synthesized from the information, extracted from the first video frame and the second video frame, that can enhance the image quality.
  • In this embodiment, the spatial transformation relationship between the corresponding first spliced video frame and second spliced video frame is determined at the spliced video frame position, so that the first video and the second video are aligned frame by frame and then fused to obtain fused video frames; this allows the first video and the second video to be spliced accurately, so that the spliced first video and second video achieve a natural transition and the splicing effect is improved.
  • In one embodiment, the spatial transformation relationship includes a horizontal transformation value, and determining the spatial transformation relationship between the first spliced video frame and the second spliced video frame and performing video frame alignment on the first video and the second video based on the spatial transformation relationship includes:
  • Step 402 Obtain a first feature point of the first spliced video frame and a second feature point of the second spliced video frame.
  • the feature point refers to a point that can reflect the essential feature in each video frame image, and the target object in the image can be identified by the essential feature.
  • the distance between the two video frames can be calculated by the distance of the feature points in the two video frames.
  • the anti-shake panoramic video refers to the panoramic video after the panoramic video is subjected to anti-shake processing through the video data recorded by inertial sensors and accelerometers; the horizon in the anti-shake video is basically kept at the horizontal centerline position of the panoramic video frame.
  • the pitch angle and roll angle between the anti-shake panoramic video frames shot at the same position at all times are basically 0. It can be understood that in the anti-shake panoramic video, there is only one heading angle between the anti-shake video images shot at the same position at different times, that is, there is only horizontal translation.
  • the ORB feature point detection method or the SIFT feature point detection method may be used to directly extract the first feature point of the first spliced video frame and the second feature point of the second spliced video frame.
  • the panorama image corresponding to the panoramic video may be converted into a plane view, and then the ORB feature point detection method may be used to extract the first feature point of the first spliced video frame and the second feature point of the second spliced video frame .
  • the plane view may refer to a plane view with a field of view of 90 degrees seen in a certain direction of the panorama.
  • The panorama includes six planes, up, down, front, back, left, and right, and each plane is a plane view, for example, a front view, rear view, left view, right view, top view and bottom view.
  • Specifically, the panorama corresponding to the panoramic video can be converted into the bottom view among the plane views and then the feature points are extracted; the panorama is transformed through a rotation matrix to obtain the image transformation from the panorama image to the bottom-view image.
  • a panorama refers to an image whose viewing angle covers the horizon plus and minus 180 degrees and the vertical direction plus and minus 90 degrees; if the panorama is regarded as an image in the spatial state of a cube, it can be considered that the image completely includes up, down, front, back, left, and right.
  • Step 404 Determine the horizontal distance between the first feature point and the second feature point.
  • the horizontal distance refers to the difference between the coordinates of the first feature point in the horizontal direction and the coordinates of the second feature point in the horizontal direction.
  • the horizontal distance between the first feature point and the second feature point is represented as Δx
  • the coordinates of the first feature point in the horizontal direction are represented as x_p1
  • the coordinates of the second feature point in the horizontal direction are represented as x_p2
  • the horizontal distance Δx between the first feature point and the second feature point can then be calculated as Δx = x_p1 - x_p2
  • Step 406 Determine the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
  • the horizontal transformation value refers to the horizontal difference between the first spliced video frame and the second spliced video frame obtained by using the horizontal distance.
  • different horizontal transformation values can be obtained according to different value ranges of the horizontal distance.
  • the horizontal transformation value dx can be expressed as the formula:
  • the horizontal transformation value between the first spliced video frame and the second spliced video frame can be obtained by using the statistical value of the horizontal transformation value.
  • the average value of the horizontal transformation values may be used as the horizontal transformation value between the first spliced video frame and the second spliced video frame.
  • Specifically, the statistical value of the horizontal transformation values can be obtained by sorting the obtained horizontal transformation values, either from large to small or from small to large, and taking the horizontal transformation value located in the middle of the sorted list as the statistical value of the horizontal transformation values.
  • Statistics of horizontally transformed values can also be obtained by other methods. For example, the statistical value of the horizontally transformed values is obtained by calculating the average, weighted average, or mode of the respective horizontally transformed values.
  • In this embodiment, the first feature point of the first spliced video frame and the second feature point of the second spliced video frame are acquired, the horizontal distance between the first feature point and the second feature point is obtained, and the horizontal transformation value between the first spliced video frame and the second spliced video frame is determined from the horizontal distance; this achieves the purpose of accurately determining the spatial transformation relationship between the first spliced video frame and the second spliced video frame and improves the video splicing effect according to the accurate spatial transformation relationship.
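  • A minimal sketch of turning matched feature point coordinates into a single horizontal transformation value. The wrap-around correction over the panorama width is an assumption about how the different value ranges of the horizontal distance are handled, and taking the middle of the sorted values follows the median-style statistic described above:

```python
import numpy as np

def horizontal_transform_value(x_p1, x_p2, pano_width):
    """Estimate the horizontal transformation value dx from matched feature point x-coordinates."""
    dx = np.asarray(x_p1, dtype=np.float64) - np.asarray(x_p2, dtype=np.float64)
    # Assumed handling of the left/right seam of an equirectangular panorama
    dx = np.where(dx > pano_width / 2, dx - pano_width, dx)
    dx = np.where(dx < -pano_width / 2, dx + pano_width, dx)
    return float(np.median(dx))  # middle of the sorted horizontal transformation values
```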
  • the step of fusing the first overlapping video frame region and the second overlapping video frame region to obtain a fused video frame includes:
  • Step 502 Acquire the current video frame to be fused from the first overlapping video frame area.
  • the current video frame to be fused needs to be acquired first in the overlapping video frame area.
  • Specifically, the video is read through the OpenCV software library and each frame in the video is extracted, which can be implemented using the video capture structures in the OpenCV software library: the VideoCapture structure and the Mat structure are used to acquire the video, and the same structures can further be used to acquire the current video frame to be fused.
  • filename represents a video file
  • frame represents a certain video frame to be acquired.
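  • The text refers to the C++ VideoCapture/Mat interface; a minimal equivalent sketch with the Python binding of OpenCV, where filename is the video file and each returned element is one decoded video frame:

```python
import cv2

def read_frames(filename):
    """Read every frame of the video file into a list of decoded frames."""
    cap = cv2.VideoCapture(filename)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or read error
            break
        frames.append(frame)
    cap.release()
    return frames
```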
  • Step 504 Obtain the current time difference between the current shooting time of the video frame corresponding to the current video frame and the reference shooting time of the reference video frame.
  • After acquiring the current video frame to be fused from the first overlapping video frame area of the first video, because the shooting times differ, the current video frame has a corresponding shooting time and the reference video frame likewise has a corresponding shooting time; the current time difference can be obtained from the shooting time of the current video frame and the shooting time of the reference video frame.
  • The shooting time can be represented by a timestamp, which may be the frame number itself or the frame number converted to time using the frame rate; either representation uniquely determines the shooting time corresponding to the video frame.
  • the current time difference may be obtained by the difference between the current shooting time of the video frame corresponding to the current video frame and the reference shooting time of the reference video frame.
  • t1 represents the current shooting time of the video frame corresponding to the current video frame
  • t2 represents the reference shooting time of the reference video frame
  • Δt represents the current time difference between the current shooting time of the video frame corresponding to the current video frame and the reference shooting time of the reference video frame.
  • Step 506 Obtain the current fusion weight corresponding to the current video frame based on the current time difference, wherein the current time difference and the current fusion weight are positively correlated.
  • the fusion weight refers to the proportion corresponding to the current video frame in the image fusion process.
  • A positive correlation means that the current fusion weight and the current time difference increase or decrease together: when the current time difference increases, the current fusion weight also increases, and when the current time difference decreases, the current fusion weight also decreases.
  • the current fusion weight can be obtained according to the positive correlation between the current time difference and the current fusion weight.
  • Step 508 fuse the current video frame with the video frame at the corresponding position of the second overlapping video frame region based on the current fusion weight to obtain the current fused video frame.
  • the first video and the second video are fused by using the current fusion weight to obtain a spliced video with higher quality.
  • image fusion is performed using the current fusion weight, the video frame in the first overlapping video frame area corresponding to the first video, and the video frame in the second overlapping video frame area corresponding to the second video, to obtain a fused image.
  • The first overlapping video frame area and the second overlapping video frame area are the areas in the first video and the second video, respectively, where the overlapping video frame content exists, and this area is the overlapping area of the first video and the second video; the two terms are used to distinguish the areas corresponding to the overlapping area as it appears in the first video and in the second video.
  • In this embodiment, the current fusion weight is obtained through the current time difference, and based on the current fusion weight the current video frame and the video frame at the corresponding position of the second overlapping video frame area are fused to obtain the current fused video frame, which achieves the purpose of obtaining a complete video with a natural transition effect.
  • In one embodiment, obtaining the current fusion weight corresponding to the current video frame based on the current time difference includes:
  • Step 602 Obtain the overlapping area time length corresponding to the overlapping video frame area.
  • the time length of the overlapping area refers to the video time length corresponding to the video frames of the overlapping area. For example, if the video length of the overlapping area is 600 milliseconds, the time length of the overlapping area is 600 milliseconds.
  • One of the parameters in the calculation of the current fusion weight is the time length of the overlapping area, which can be determined by obtaining the overlapping area time length.
  • the total number of frames in the overlapping area and the video frame rate can be used to obtain the time length of the overlapping area according to the functional relationship between the total number of frames and the video frame rate.
  • b represents the total number of frames in the overlapping area
  • v represents the frame rate
  • t represents the time length of the overlapping area
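  • Putting these symbols together, the functional relationship is presumably the total number of overlapping frames divided by the frame rate:

```latex
t = \frac{b}{v}
```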
  • Step 604 Calculate the ratio of the current time difference to the time length of the overlapping area to obtain the current fusion weight.
  • t1-t2 represents the current time difference
  • t represents the time length of the overlapping area
  • Q represents the current fusion weight
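  • With these symbols, Step 604 computes the current fusion weight as the ratio of the current time difference to the overlapping area time length:

```latex
Q = \frac{t_1 - t_2}{t}
```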
  • In this embodiment, the current fusion weight can be obtained accurately through the ratio of the current time difference to the time length of the overlapping area, so that when the first video and the second video are spliced through the overlapping area, the current fusion weight can be used to fuse the video frames in the overlapping area and improve the video stitching effect.
  • the overlapping area search is performed based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video, including:
  • Step 702 Compare the reference video frame with each video frame in the first video to obtain a matching video frame in the first video that matches the reference video frame.
  • the matching video frame refers to a video frame in the first video that can satisfy the matching condition with the reference video frame.
  • the video frame with the largest number of matching points with the reference video frame among the video frames in the first video may be used as the matching video frame.
  • The reference video frame is in the second video; the backward video frame of the still frame sequence is considered to be the first non-still frame, which avoids video quality problems such as freezing or stillness.
  • the reference video frame is used as the video frame to be compared, and the matching video frame is obtained.
  • Specifically, the video frame in the first video that has the highest matching rate with the reference video frame in the second video is selected as the matching video frame.
  • The matching rate may be the ratio of the number of matched feature points to the total number of feature points. For example, if the number of feature points matched between the reference video frame and a certain video frame in the first video is 1000 and the total number of feature points is 1500, the matching rate is the ratio of 1000 to 1500, about 67%.
  • step 704 the tail video frame area corresponding to the matching video frame is used as the first overlapping video frame area corresponding to the first video.
  • the tail video frame area refers to the corresponding video frame area from the beginning of the matching video frame to the end video frame of the first video. For example, if the matching video frame is P, the tail video frame area is the video frame after the P frame in the first video.
  • the video frame after the matching video frame may be obtained in the first video as the first overlapping video frame region corresponding to the first video.
  • Step 706 The reference video frame area where the reference video frame is located in the second video is taken as the second overlapping video frame area corresponding to the second video; the reference video frame is the head video frame of the reference video frame area, and the reference video frame area matches the number of video frames in the tail video frame area.
  • the header video frame refers to the first video frame in the video frame area.
  • the reference video frame area is the second overlapping video frame area in the second video
  • the trailing video frame area is the first overlapping video frame area in the first video
  • It can be understood that, after the video fusion is realized, the first overlapping video frame area and the second overlapping video frame area will form one overlapping video frame area, and the video frames before this overlapping video frame area are the video frames of the first video.
  • the video frame after the overlapping video frame area is the video frame of the second video.
  • In this embodiment, the matching video frame can be obtained from the reference video frame, the first overlapping video frame area can be obtained from the matching video frame, and the second overlapping video frame area can be obtained correspondingly in the second video, so as to accurately determine the overlapping video frame area; video fusion can then be performed on the overlapping video frame area, realizing natural splicing of the first video and the second video in the overlapping video frame area.
  • in one embodiment, performing the overlapping area search based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video includes:
  • Step 802: acquire a tail video frame sequence with a preset number of frames in the first video as the first overlapping video frame area corresponding to the first video.
  • the preset number of frames refers to a predetermined number of video frames; it determines how many video frames the acquired tail video frame sequence contains. For example, if the preset number of frames is m, the acquired tail video frame sequence contains m video frames.
  • the preset number of frames may be determined empirically, through repeated trials.
  • alternatively, with the total number of video frames as a reference and according to empirical values, video frame regions corresponding to several candidate preset frame numbers may be set as candidates for the first overlapping video frame area corresponding to the first video.
  • Step 804 Obtain a matching video frame sequence matching the tail video frame sequence from the backward video frame sequence corresponding to the reference video frame, and use the matching video frame sequence as the second overlapping video frame region corresponding to the second video.
  • the backward video frame sequence refers to a sequence composed of video frames following the reference video frame.
  • the sequence may include some video frames following the reference video frame, or may include all video frames following the reference video frame.
  • video frames with a preset number of frames after the reference video frame may be taken as the matching video frame sequence. For example, the m frames after the reference video frame may be taken and compared with the last m frames of the first video, and the value of m that yields the largest feature point matching statistic determines the overlapping area. When m takes different values, different matching statistics are obtained, forming a correspondence table between the value m and the feature point matching statistic; the largest statistic is found in the table, and the m corresponding to that maximum is the number of video frames of the obtained overlapping area.
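  • a rough sketch of this m-value sweep is given below (an assumed helper based on the same ORB matching idea; the names and the candidate list are illustrative, not taken from the patent):

```python
import cv2

def count_matches(frame_a, frame_b, max_features=1500):
    """Number of matched ORB feature points between two frames."""
    orb = cv2.ORB_create(nfeatures=max_features)
    _, des_a = orb.detectAndCompute(frame_a, None)
    _, des_b = orb.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return len(matcher.match(des_a, des_b))

def choose_overlap_length(first_video, second_video, ref_index, candidates=(20, 30, 35, 40)):
    """Pick the preset frame count m whose tail / backward sequences match best.

    first_video, second_video: lists of frames; ref_index: index of the
    reference video frame in the second video (all names are illustrative).
    """
    stats = {}
    for m in candidates:
        tail = first_video[-m:]                            # last m frames of the first video
        backward = second_video[ref_index:ref_index + m]   # m frames from the reference frame on
        stats[m] = sum(count_matches(a, b) for a, b in zip(tail, backward))
    best_m = max(stats, key=stats.get)                     # m with the largest match statistic
    return best_m, stats
```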
  • in this embodiment, the first overlapping video frame area is obtained by acquiring a tail video frame sequence with a preset number of frames in the first video, and a matching video frame sequence that matches this tail video frame sequence is obtained and used as the second overlapping video frame area corresponding to the second video.
  • this accurately determines the overlapping video frame area, so that the splicing of the first video and the second video is completed within the overlapping video frame area, the two videos transition naturally during splicing, and the video splicing effect is improved.
  • in one embodiment, performing still frame detection on the first video or the second video to obtain a still frame sequence includes: converting the first video or the second video into a plane video.
  • the plane video is a video composed of individual plane view images.
  • the plane view image may refer to a plane view with a field of view of 90 degrees seen in a certain direction of the panorama.
  • the panorama includes six planes, up, down, front, back, left, and right, and each plane is a plane view.
  • the plane views include the up, down, front, back, left and right views.
  • the panorama corresponding to the panoramic video may be converted into the bottom view among the plane views of the plane video before feature points are extracted; the panorama is transformed through a rotation matrix to obtain the image transformation from the panoramic image to the bottom view image.
  • a panorama refers to an image whose viewing angle covers plus and minus 180 degrees around the horizon and plus and minus 90 degrees in the vertical direction; if the panorama is regarded as an image on the faces of a cube, the image can be considered to completely contain the six plane views: up, down, front, back, left and right.
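  • the conversion from an equirectangular panorama to a 90-degree plane view can be sketched roughly as follows; the coordinate conventions and the rotation handling are assumptions introduced for illustration, not the patent's exact transform:

```python
import numpy as np
import cv2

def panorama_to_plane_view(pano, rotation, out_size=512):
    """Render a 90-degree-FOV plane view (one cube face) from an
    equirectangular panorama.

    pano: H x W x 3 equirectangular image; rotation: 3x3 rotation matrix
    selecting the viewing direction (e.g. one tilting the optical axis
    straight down for the bottom view). Sketch under an assumed standard
    equirectangular mapping."""
    h, w = pano.shape[:2]
    # Pixel grid of the output face, mapped to [-1, 1] (tan of +/-45 degrees).
    xs = np.linspace(-1.0, 1.0, out_size)
    ys = np.linspace(-1.0, 1.0, out_size)
    gx, gy = np.meshgrid(xs, ys)
    dirs = np.stack([gx, gy, np.ones_like(gx)], axis=-1)        # rays through the face
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    dirs = dirs @ rotation.T                                    # rotate rays into panorama coordinates
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])                # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))           # latitude in [-pi/2, pi/2]
    map_x = ((lon / np.pi + 1.0) * 0.5 * (w - 1)).astype(np.float32)
    map_y = ((lat / (np.pi / 2) + 1.0) * 0.5 * (h - 1)).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```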
  • the still frame detection is performed on the plane video, and the still frame sequence is obtained.
  • the first plane video and the second plane video may be acquired respectively; feature points are extracted from the last video frame of the first plane video and, in turn, from each of the consecutive video frames before it, and feature point matching is performed between them.
  • when the matching results for the consecutive video frames before the last video frame all satisfy the threshold condition, the video frame sequence composed of those video frames is determined as the still frame sequence.
  • for example, the last plane video frame of the first plane video is denoted as frame 1; feature points are extracted and matched between it and the n-2 consecutive frames before it.
  • when all matching results satisfy the threshold condition, the video frame sequence composed of the last n-1 frames of the first plane video is determined as the still frame sequence.
  • similarly, feature point extraction may be performed on the first plane video frame of the second plane video and, in turn, on each of the multiple consecutive video frames after it, and feature point matching may be performed between them.
  • when the matching results for the consecutive video frames after the first video frame all satisfy the threshold condition, the video frame sequence composed of those video frames is determined as the still frame sequence.
  • for example, the first video frame of the second plane video is denoted as frame 1; feature points are extracted and matched between it and the n-2 consecutive frames after it.
  • when all matching results satisfy the threshold condition, the video frame sequence composed of the first n-1 frames of the second plane video is determined as the still frame sequence.
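  • a simplified still-frame check along these lines might look as follows; the displacement and match-count thresholds are illustrative stand-ins for the threshold condition described above:

```python
import cv2
import numpy as np

def is_still_pair(frame_a, frame_b, max_shift_ratio=0.02, min_match_ratio=0.1):
    """Decide whether two plane-view frames look static relative to each other.

    The thresholds are assumed values standing in for the patent's matching
    condition (small feature displacement, enough matched points)."""
    orb = cv2.ORB_create(nfeatures=1500)
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return False
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_a, des_b)
    if len(matches) < min_match_ratio * len(kp_a):
        return False
    width = frame_a.shape[1]
    shifts = [np.hypot(kp_a[m.queryIdx].pt[0] - kp_b[m.trainIdx].pt[0],
                       kp_a[m.queryIdx].pt[1] - kp_b[m.trainIdx].pt[1]) for m in matches]
    return np.median(shifts) < max_shift_ratio * width

def still_frames_at_tail(plane_frames, n):
    """If the last frame matches the n-2 frames before it, the last n-1 frames
    form the still-frame sequence (following the description above)."""
    last = plane_frames[-1]
    preceding = plane_frames[-(n - 1):-1]     # the n-2 frames before the last one
    if all(is_still_pair(last, f) for f in preceding):
        return plane_frames[-(n - 1):]        # last n-1 frames
    return []
```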
  • in one embodiment, as shown in FIG. 9, during video acquisition the camera is first placed at a position that the tail segment of the first video (the previous video) passed through; after the camera has remained still for a period of time, it is moved along the motion path of that tail segment to start capturing the second video.
  • both the first video and the second video therefore contain an overlapping area covering the tail segment. Assume that the video frame at which the second video starts to move is frame A and that the end position of the overlapping area is frame B.
  • the range of video frames of the overlapping area in the first video is likewise between frame A and frame B.
  • the video frame at the center position between frame A and frame B may be taken as the spliced video frame position, or another video frame between frame A and frame B may be taken as the spliced video frame position; the spliced video frame position is used to complete the splicing of the first video and the second video.
  • assuming the spliced video frame position is denoted as frame C, frame C may be taken several video frames after frame A; for example, frame C may be the video frame five frames after frame A.
  • in one embodiment, a panoramic video of an object passing through an obstacle, captured by an anti-shake camera, is taken as an example.
  • when the object is captured passing through the obstacle, shooting of the first video (the previous video) stops after the object has passed the obstacle by a certain distance.
  • the camera then goes around the obstacle and shoots, from the other side of the obstacle, the second video of the object passing through the obstacle; shooting of the second video starts at the position where the first video captured the object passing through the obstacle, and because there is a certain shooting delay, the second video appears still at its starting moment.
  • the second video then continues along the route through the obstacle captured in the first video, so that the two videos share the same shooting route, i.e. an overlapping path, and the overlapping path is used to connect the two videos.
  • for a panoramic video shot by an anti-shake camera, when the shooting positions are roughly equal, only a simple conversion relationship exists between the two videos, and this conversion relationship is a horizontal translation.
  • using the horizontal translation, the first video or the second video is moved so that the two videos are aligned within the overlapping path, and an image fusion method is used to fuse the video frame images, so that the first video and the second video transition naturally and a seamless connection between them is achieved.
  • a server is provided, and the server is configured to execute the steps in the foregoing method embodiments.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • although the steps in the flowcharts of FIGS. 2-8 are shown in sequence according to the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-8 may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily completed at the same moment and may be executed at different moments, and their execution order is not necessarily sequential, but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
  • a video splicing apparatus 1000 is provided, including: a first video and second video acquisition module 1002, a still frame sequence obtaining module 1004, a reference video frame obtaining module 1006, a first overlapping video frame area and second overlapping video frame area obtaining module 1008, and a spliced video obtaining module 1010, wherein: the first video and second video acquisition module 1002 is used to acquire the first video and the second video to be spliced,
  • the first video being before the second video;
  • the still frame sequence obtaining module 1004 is used to perform still frame detection on the first video or the second video to obtain a still frame sequence;
  • the reference video frame obtaining module 1006 is used to obtain a still frame sequence based on the frame sequence to obtain a reference video frame;
  • the first overlapping video frame area and the second overlapping video frame area obtaining module 1008 is used to search the overlapping area based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and The second overlapping video frame area corresponding to the second video;
  • the spliced video obtaining module 1010 is used for splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
  • the spliced video obtaining module 1010 is configured to obtain the spliced video frame position, obtain the first spliced video frame corresponding to the spliced video frame position from the first overlapping video frame area, and obtain the second spliced video frame corresponding to the spliced video frame position from the second overlapping video frame area;
  • determine the spatial transformation relationship between the first spliced video frame and the second spliced video frame, and perform video frame alignment on the first video and the second video based on the spatial transformation relationship;
  • and splice the aligned first video frames and second video frames to obtain the spliced video, wherein, during splicing, the first overlapping video frame area and the second overlapping video frame area are fused to obtain fused video frames.
  • the spliced video obtaining module 1010 is configured to obtain the first feature point of the first spliced video frame and the second feature point of the second spliced video frame; determine the horizontal distance between the first feature point and the second feature point; and determine the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
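  • since the published text gives the exact horizontal-transform formula only as an image, the sketch below uses a wrap-aware median of per-match horizontal distances as an illustrative stand-in statistic, not the patent's exact computation:

```python
import numpy as np

def horizontal_transform_value(first_pts, second_pts, pano_width):
    """Illustrative estimate of the horizontal translation between two spliced
    panorama frames from matched feature point coordinates.

    first_pts, second_pts: arrays of matched (x, y) coordinates."""
    dx = np.asarray(first_pts)[:, 0] - np.asarray(second_pts)[:, 0]
    # A panorama wraps around horizontally, so map each horizontal distance
    # into the range [-width/2, width/2) before aggregating.
    dx = (dx + pano_width / 2) % pano_width - pano_width / 2
    return float(np.median(dx))
```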
  • the spliced video obtaining module 1010 is configured to obtain the current video frame to be fused from the first overlapping video frame area; obtain the current time difference between the current shooting time of the current video frame and the reference shooting time of the reference video frame; obtain, based on the current time difference, the current fusion weight corresponding to the current video frame, where the current time difference is positively correlated with the current fusion weight; and fuse, based on the current fusion weight, the current video frame with the video frame at the corresponding position in the second overlapping video frame area to obtain the current fused video frame.
  • the spliced video obtaining module 1010 is configured to obtain the overlapping area time length corresponding to the overlapping video frame area; calculate the ratio of the current time difference to the overlapping area time length to obtain the current fusion weight.
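  • following the weight Q = (t1 - t2) / t and the blend I = I1 × Q + I2 × (1 - Q) given in the description, a minimal fusion sketch (variable names are illustrative) could be:

```python
import numpy as np

def fuse_overlap_frame(frame1, frame2, current_time, ref_time, overlap_duration):
    """Blend a frame of the first video (frame1) with the co-located frame of
    the second video (frame2) using the time-based weight Q = (t1 - t2) / t."""
    q = (current_time - ref_time) / overlap_duration   # ratio of time difference to overlap length
    q = float(np.clip(q, 0.0, 1.0))
    fused = frame1.astype(np.float32) * q + frame2.astype(np.float32) * (1.0 - q)
    return fused.astype(frame1.dtype)
```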
  • the first overlapping video frame area and second overlapping video frame area obtaining module 1008 is configured to compare the reference video frame with each video frame in the first video to obtain the matching video frame in the first video that matches the reference video frame;
  • use the tail video frame area corresponding to the matching video frame as the first overlapping video frame area corresponding to the first video;
  • and use the reference video frame area in which the reference video frame is located in the second video as the second overlapping video frame area corresponding to the second video,
  • where the reference video frame is the head video frame of the reference video frame area and the reference video frame area matches the tail video frame area in the number of video frames.
  • the first overlapping video frame area and second overlapping video frame area obtaining module 1008 is configured to obtain a tail video frame sequence with a preset number of frames in the first video as the first overlapping video frame area corresponding to the first video; and to obtain, from the backward video frame sequence corresponding to the reference video frame, a matching video frame sequence that matches the tail video frame sequence and use it as the second overlapping video frame area corresponding to the second video.
  • the still frame sequence obtaining module 1004 is configured to convert the first video or the second video into a planar video; perform still frame detection on the planar video to obtain a still frame sequence.
  • Each module in the above video splicing apparatus can be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 11 .
  • the computer equipment includes a processor, memory, a communication interface, a display screen, and an input device connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by WIFI, operator network, NFC (Near Field Communication) or other technologies.
  • the computer program when executed by the processor, implements a video stitching method.
  • the display screen of the computer equipment may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment may be a touch layer covered on the display screen, or a button, a trackball or a touchpad set on the shell of the computer equipment , or an external keyboard, trackpad, or mouse.
  • those skilled in the art can understand that FIG. 11 is only a block diagram of a partial structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, where a computer program is stored in the memory, and when the processor executes the computer program, the steps in the foregoing method embodiments are implemented.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps in the foregoing method embodiments.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

本申请涉及一种视频拼接方法、装置、计算机设备和存储介质。所述方法包括:获取待拼接的第一视频和第二视频,所述第一视频在所述第二视频之前;对所述第一视频或第二视频进行静止帧检测,得到静止帧序列;基于所述静止帧序列,得到参考视频帧;基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频。采用本方法能够提高视频拼接效果。

Description

视频拼接方法、装置、计算机设备和存储介质 技术领域
本申请涉及图像处理技术领域,特别是涉及一种视频拼接方法、装置、计算机设备和存储介质。
背景技术
随着图像处理技术的发展,出现了视频拼接技术,视频拼接技术可以把不同时间条件下拍摄的视频拼接在一起,构成一段完整的视频。例如,相机拍摄物体穿越障碍物的全景视频。当拍摄到物体穿越障碍物时,物体穿过障碍物一定距离后停止拍摄第一段视频。然后,相机绕过障碍物,从障碍物另一侧拍摄物体穿越障碍物的第二段视频,将第一段视频和第二段视频拼接形成一个物体穿过障碍物的完整全景视频。全景视频因大视角和高分辨率,被广泛应用到各个领域,因此,视频拼接技术也被广泛应用到各个领域。
技术问题
然而,目前的视频拼接方式,存在视频拼接效果差的问题。
技术解决方案
基于此,有必要针对上述技术问题,提供一种能够提高视频拼接效果的视频拼接方法、装置、计算机设备和存储介质。
一种视频拼接方法,所述方法包括:获取待拼接的第一视频和第二视频,所述第一视频在所述第二视频之前;对所述第一视频或第二视频进行静止帧检测,得到静止帧序列;基于所述静止帧序列,得到参考视频帧;基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频。
在其中一个实施例中,所述基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频包括:获取拼接视频帧位置,从所述第一重合视频帧区域获取所述拼接视频帧位置对应的第一拼接视频帧,从所述第二重合视频帧区域中获取所述拼接视频帧位置对应的第二拼接视频帧;确定所述第一拼接视频帧与所述第二拼接视频帧之间的空间变换关系,基于所述空间变换关系对所述第一视频以及所述第二视 频进行视频帧对齐;基于对齐之后的第一视频帧与所述第二视频帧进行视频帧拼接,得到拼接视频,其中,在拼接时,所述第一重合视频帧区域与所述第二重合视频帧区域进行融合得到融合视频帧。
在其中一个实施例中,所述空间变换关系包括水平变换值,所述确定所述第一拼接视频帧与所述第二拼接视频帧之间的空间变换关系,基于所述空间变换关系对所述第一视频以及所述第二视频进行视频帧对齐包括:获取所述第一拼接视频帧的第一特征点和所述第二拼接视频帧的第二特征点;确定所述第一特征点与所述第二特征点之间的水平距离;基于所述水平距离确定所述第一拼接视频帧与所述第二拼接视频帧之间的水平变换值。
在其中一个实施例中,所述第一重合视频帧区域与所述第二重合视频帧区域进行融合得到融合视频帧的步骤包括:从所述第一重合视频帧区域中获取待融合的当前视频帧;获取所述当前视频帧对应的视频帧当前拍摄时间与所述参考视频帧的参考拍摄时间之间的当前时间差异;基于当前时间差异得到当前视频帧对应的当前融合权重,其中,当前时间差异与当前融合权重成正相关关系;基于当前融合权重将当前视频帧与所述第二重合视频帧区域对应位置的视频帧进行融合,得到当前融合视频帧。
在其中一个实施例中,所述基于当前时间差异得到当前视频帧对应的当前融合权重包括:获取重合视频帧区域对应的重合区域时间长度;计算所述当前时间差异与所述重合区域时间长度的比值,得到当前融合权重。
在其中一个实施例中,所述基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域包括:将所述参考视频帧分别与所述第一视频中的各个视频帧进行对比,得到第一视频中与所述参考视频帧匹配的匹配视频帧;将所述匹配视频帧对应的尾部视频帧区域,作为所述第一视频对应的第一重合视频帧区域;将所述第二视频帧中所述参考视频帧所在的参考视频帧区域,作为所述第二视频对应的第二重合视频帧区域,所述参考视频帧为所述参考视频帧区域的头部视频帧,所述参考视频帧区域与所述尾部视频帧区域的视频帧数量匹配。
在其中一个实施例中,所述基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域包括:在所述第一视频中获取预设帧数的尾部视频帧序列,作为所述第一视频对应的第一重合视频帧区域;从所述参考视频帧对应的后向视频帧序列中,获取与所述尾部视频帧序列匹配的匹配视频帧序列,将匹配视频帧序列作为所述第二视频对应的第二重合视频帧区域。
在其中一个实施例中,所述对所述第一视频或第二视频进行静止帧检测,得到静止帧序列包括:将所述第一视频或者第二视频转换为平面视频;对所述平面视频进行静止帧检测, 得到所述静止帧序列。
一种视频拼接装置,所述装置包括:第一视频和第二视频获取模块,用于获取待拼接的第一视频和第二视频,所述第一视频在所述第二视频之前;静止帧序列得到模块,用于对所述第一视频或第二视频进行静止帧检测,得到静止帧序列;参考视频帧得到模块,用于基于所述静止帧序列,得到参考视频帧;第一重合视频帧区域以及第二重合视频帧区域得到模块,用于基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;拼接视频得到模块,用于基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频。
在其中一个实施例中,拼接视频得到模块用于获取拼接视频帧位置,从所述第一重合视频帧区域获取所述拼接视频帧位置对应的第一拼接视频帧,从所述第二重合视频帧区域中获取所述拼接视频帧位置对应的第二拼接视频帧;确定所述第一拼接视频帧与所述第二拼接视频帧之间的空间变换关系,基于所述空间变换关系对所述第一视频以及所述第二视频进行视频帧对齐;基于对齐之后的第一视频帧与所述第二视频帧进行视频帧拼接,得到拼接视频,其中,在拼接时,所述第一重合视频帧区域与所述第二重合视频帧区域进行融合得到融合视频帧。
在其中一个实施例中,拼接视频得到模块用于获取所述第一拼接视频帧的第一特征点和所述第二拼接视频帧的第二特征点;确定所述第一特征点与所述第二特征点之间的水平距离;基于所述水平距离确定所述第一拼接视频帧与所述第二拼接视频帧之间的水平变换值。
在其中一个实施例中,拼接视频得到模块用于从所述第一重合视频帧区域中获取待融合的当前视频帧;获取所述当前视频帧对应的视频帧当前拍摄时间与所述参考视频帧的参考拍摄时间之间的当前时间差异;基于当前时间差异得到当前视频帧对应的当前融合权重,其中,当前时间差异与当前融合权重成正相关关系;基于当前融合权重将当前视频帧与所述第二重合视频帧区域对应位置的视频帧进行融合,得到当前融合视频帧。
在其中一个实施例中,拼接视频得到模块用于获取重合视频帧区域对应的重合区域时间长度;计算所述当前时间差异与所述重合区域时间长度的比值,得到当前融合权重。
在其中一个实施例中,第一重合视频帧区域以及第二重合视频帧区域得到模块用于将所述参考视频帧分别与所述第一视频中的各个视频帧进行对比,得到第一视频中与所述参考视频帧匹配的匹配视频帧;将所述匹配视频帧对应的尾部视频帧区域,作为所述第一视频对应的第一重合视频帧区域;将所述第二视频帧中所述参考视频帧所在的参考视频帧区域,作为所述第二视频对应的第二重合视频帧区域,所述参考视频帧为所述参考视频帧区域的头部视 频帧,所述参考视频帧区域与所述尾部视频帧区域的视频帧数量匹配。
在其中一个实施例中,第一重合视频帧区域以及第二重合视频帧区域得到模块用于在所述第一视频中获取预设帧数的尾部视频帧序列,作为所述第一视频对应的第一重合视频帧区域;从所述参考视频帧对应的后向视频帧序列中,获取与所述尾部视频帧序列匹配的匹配视频帧序列,将匹配视频帧序列作为所述第二视频对应的第二重合视频帧区域。
在其中一个实施例中,静止帧序列得到模块用于将所述第一视频或者第二视频转换为平面视频;对所述平面视频进行静止帧检测,得到所述静止帧序列。
一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时实现以下步骤:获取待拼接的第一视频和第二视频,所述第一视频在所述第二视频之前;对所述第一视频或第二视频进行静止帧检测,得到静止帧序列;基于所述静止帧序列,得到参考视频帧;基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频。
一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现以下步骤:获取待拼接的第一视频和第二视频,所述第一视频在所述第二视频之前;对所述第一视频或第二视频进行静止帧检测,得到静止帧序列;基于所述静止帧序列,得到参考视频帧;基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频。
技术效果
上述视频拼接方法、装置、计算机设备和存储介质,终端获取待拼接的第一视频和第二视频,对第一视频或第二视频进行静止帧检测,得到静止帧序列;基于静止帧序列,得到参考视频帧;其中,第一视频是在第二视频拍摄之前拍摄得到。基于参考视频帧进行重合区域搜索,得到第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;基于第一重合视频帧区域以及第二重合视频帧区域将第一视频与第二视频进行拼接,得到拼接视频。通过确定参考视频帧,得到第一重合视频帧区域和第二重合区域,基于上述两个重合视频帧区域对第一视频和第二视频这两段视频进行拼接,使第一视频和第二视频能够实现自然拼接过渡,提高视频拼接效果。
附图说明
图1为一个实施例中视频拼接方法的应用环境图;
图2为一个实施例中视频拼接方法的流程示意图;
图3为另一个实施例中视频拼接方法的流程示意图;
图4为另一个实施例中视频拼接方法的流程示意图;
图5为一个实施例中第一重合视频帧区域与所述第二重合视频帧区域进行融合得到融合视频帧步骤的流程示意图;
图6为一个实施例中基于当前时间差异得到当前视频帧对应的当前融合权重步骤的流程示意图;
图7为另一个实施例中视频拼接方法的流程示意图;
图8为另一个实施例中视频拼接方法的流程示意图;
图9为一个实施例中确定拼接视频帧位置方法的示意图;
图10为一个实施例中视频拼接装置的结构框图;
图11为一个实施例中计算机设备的内部结构图。
本发明的实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请提供的视频拼接方法,可以应用于如图1所示的应用环境中,具体应用到一种视频拼接系统中。该视频拼接系统包括视频采集设备102与终端104,其中,视频采集设备102通过网络与终端104进行通信。终端104执行一种视频拼接方法。具体的,视频采集设备102通过在不同时刻在需要拍摄对象的相同位置拍摄的两段待拼接的视频传输给终端104,终端104相应地获取到待拼接的第一视频和第二视频,第一视频为第二视频的前向视频;终端104在获取到第一视频和第二视频后,对其中的对第一视频或第二视频进行静止帧检测,得到静止帧序列;基于静止帧序列,得到参考视频帧;基于该参考视频帧进行重合区域搜索,分别得到第一视频中的第一重合视频帧区域与第二视频中的第二重合视频帧区域;终端104基于第一重合视频帧区域和第二重合视频帧区域将第一视频与第二视频进行拼接,得到拼接视频。其中,视频采集设备102可以但不限于是各种有视频采集功能的设备,可以分布于终端104的外部,也可以分布于终端104的内部。例如:分布于终端104的外部的各种摄像头、各种相机、视频采集卡等。终端104可以但不限于是各种相机、个人计算机、笔记本电脑、智能 手机、平板电脑和便携式可穿戴设备。
在一个实施例中,如图2所示,提供了一种视频拼接方法,以该方法应用于图1中的终端为例进行说明,包括以下步骤:
步骤202,获取待拼接的第一视频和第二视频,第一视频在第二视频之前。
其中,前向视频是指在拍摄第二视频之前,与第二视频存在有相同拍摄位置得到的视频。例如,相机拍摄物体穿越障碍物的全景视频。当拍摄到物体穿越障碍物时,物体穿过障碍物一定距离后停止拍摄第一视频。然后,相机绕过障碍物,从障碍物另一侧拍摄物体穿越障碍物的第二视频。将拍摄第一视频的时间看作第一时间,将拍摄第二视频的时间看作第二时间。其中,第一时间在第二时间之前。同时,两段视频存在相同的拍摄位置,即与障碍物有一定距离的位置,则第一视频为第二视频的前向视频。
具体的,当需要将不同时刻拍摄的,存在相同位置的两段视频进行拼接时,需要首先获取到待拼接的第一视频和第二视频。
在一个实施例中,终端可以通过连接的视频采集设备进行视频的采集,采集设备将采集到的视频实时传输给终端;或者采集设备将采集到的视频暂存到采集设备本地,当接收到终端的视频获取指令时,将本地存储的视频传输给终端,相应的,终端能够获取到待拼接的第一视频和第二视频。
在一个实施例中,终端通过内部存在视频采集模块,对第一视频和第二视频进行采集,对采集到的视频存储到终端存储器中,当终端需要对第一视频和第二视频进行拼接时,从存储器中,获取待拼接的第一视频和第二视频。
步骤204,对第一视频或第二视频进行静止帧检测,得到静止帧序列。
其中,静止帧是指第一视频或第二视频中的各个视频帧中,存在使第一视频或第二视频画面静止的视频帧。静止帧序列是指第一视频或第二视频中有先后顺序的静止帧组成的序列。
具体的,静止帧存在于视频中时,会使视频呈现出来卡顿的情况,为了使视频拼接过程中不会受到静止帧的影响,需要对第一视频或第二视频的进行静止帧检测。
在一个实施例中,可以通过第一视频中的最后一个视频帧依次与最后一个视频帧前的连续的多个视频帧进行特征点提取,并进行特征点匹配。当最后一个视频帧前的连续的多个视频帧特征点匹配结果都满足阈值条件时,将该多个视频帧组成的视频帧序列确定为静止帧序列。例如,第一视频中的最后一个视频帧表示为第1帧,与最后一个视频帧前的连续n-2帧进行特征点提取和匹配,当匹配结果都满足阈值条件时,将第一视频中的后n-1帧组成的视频帧序列确定为静止帧序列。
在一个实施例中,可以通过第二视频中的第一个视频帧依次与第一个视频帧后的连续的 多个视频帧进行特征点提取,并进行特征点匹配。当第一个视频帧后的连续的多个视频帧特征点匹配结果都满足阈值条件时,将该多个视频帧组成的视频帧序列确定为静止帧序列。例如,第二视频中的第一个视频帧表示为第1帧,与第一个视频帧后的连续n-2帧进行特征点提取和匹配,当匹配结果都满足阈值条件时,将第二视频中的前n-1帧组成的视频帧序列确定为静止帧序列。
在一个实施例中,在进行视频帧的特征点提取和特征点匹配时,先将视频帧对应的图像转换为平面视图图像。其中,平面视图图像可以是指全景图某个方向所看到的视场角为90度的平面图,比如全景图包括上下前后左右六个面,每个平面就是一个平面视图。例如,上视图、下视图、左视图、右视图、仰视图和底视图。进行第一视频中静止帧检测时,当第一视频中最后一个视频帧和该最后一个视频帧前的待确定静止帧的某一视频帧匹配到的特征点的位置处在上述平面视图图像宽度的1/10至1/60之间,且匹配到的特征点的总数处于第一视频中最后一个视频帧和倒数第二个视频帧之间特征点匹配总数的10%以上时,则将该待确定静止帧的某一视频帧确定为静止帧。进行第二视频中静止帧检测时,当第二视频中第一个视频帧和该第一个视频帧后的待确定静止帧的某一视频帧匹配到的特征点的位置处在上述平面视图图像宽度的1/10至1/60之间,且匹配到的特征点的总数处于第二视频中第一个视频帧和第二个视频帧之间特征点匹配总数的10%以上时,则将该待确定静止帧的某一视频帧确定为静止帧。
在一个实施例中,可以采用ORB(Oriented Fast and Rotated Brief)特征点检测的方法,对视频帧中的特征点进行提取和匹配。可以理解的,也可以采用其他的特征点检测方法进行视频帧中的特征点的提取和匹配。例如,SIFT(Scale-invariant feature transform),SUFT(Speeded Up Robust Features),LSD(Line Segment Detection)等。
步骤206,基于静止帧序列,得到参考视频帧。
其中,参考视频帧是指可以作为参考的视频帧,利用该视频帧可以得到其他视频帧与该视频帧的匹配结果。
具体的,在得到静止帧序列之后,可以在第一视频中获取该静止帧序列的前向视频帧或者可以在第二视频中获取该静止帧序列的后向视频帧,该前向视频帧或者后向视频帧认为是非静止视频帧,该非静止视频帧可以作为参考视频帧。前向视频帧是指第一视频中,静止帧序列前的第一个视频帧,后向视频帧是指第二视频中,静止帧序列后的第一个视频帧。在一个实施例中,调用终端中OpenCV软件库进行前向视频帧或者后向视频帧的提取。其中的OpenCV是一个基于BSD(original BSD license、FreeBSD license、Original BSD license)发行的跨平台计算机视觉和机器学习软件库,能够实现视频帧的提取。
在一个实施例中,可以通过CRC(Cyclic Redundancy Check)校验方法对第一视频或者第二视频进行静止帧的检测,通过创建多线程对第一视频或者第二视频中的视频帧进行CRC校验,得到每一帧的CRC校验值,通过CRC校验值可以得到第一视频或者第二视频中的静止帧,通过该静止帧得到非静止视频帧,将该非静止视频帧作为参考的视频帧。可以理解的,静止帧的确定也可以采用其他静止帧的检测方法。
步骤208,基于参考视频帧进行重合区域搜索,得到第一视频对应的第一重合视频帧区域以及第二视频对应的第二重合视频帧区域。
其中,重合区域是指两段视频中对相同位置进行拍摄所得到的视频帧区域,例如不同时间拍摄的两段视频,该两段视频存在相同的拍摄位置,该两段视频存在的共有的拍摄位置拍摄得到的视频帧区域为重合区域。
具体的,在获取到参考视频帧之后,可以通过该参考视频帧进行重合区域的确定。
在一个实施例中,可以通过该参考视频帧与第一视频中的所有帧进行匹配,在第一视频中,将匹配概率最高的视频帧作为第一重合视频帧区域的起始帧。相应的,在第二视频中,利用与第一重合视频帧区域中的视频帧有相同位置的视频帧对应的区域,得到第二视频对应的第二重合视频帧区域。例如,该帧表示为P帧,在第一视频中,取P帧之后的全部视频帧对应的视频帧部分作为重合区域。第二视频中,与P帧之后的全部视频帧有相同位置的视频帧区域为C帧到F帧之间的视频帧区域,则将C帧到F帧之间的视频帧区域作为第二重合视频帧区域。
在一个实施例中,可以取参考视频帧后预设帧数的视频帧,进行对应视频帧的匹配。例如,可以取参考视频帧后的m帧,通过对比第一视频中的后m帧,得到特征点匹配数量统计值最大时对应的m帧作为第一重合视频帧区域。当m取不同数值时,得到的特征点匹配数量统计值不同,形成m帧与特征点匹配数量统计值之间的对应表,从该表中查找匹配数量统计值最大值,该统计值最大值对应的m帧为得到的重合区域的视频帧数。相应的,可以得到第二视频对应的第二重合视频帧区域。如下表1中所示,为预设帧数与特征点匹配数量统计值之间的对应关系表。
表1.m值与特征点匹配数量统计值对应关系表
m值(预设帧数) 特征点匹配数量统计值
20 1000
30 1200
35 900
40 1100
由表1中可以查找到,当m取值为30时,对应的特征点匹配数量统计值为最大,则将第一视频中的后30帧作为第一重合视频帧区域。
步骤210,基于第一重合视频帧区域以及第二重合视频帧区域将第一视频与第二视频进行拼接,得到拼接视频。
其中,拼接是指将两段或者多段视频组合成一段完整视频的过程。
具体的,在确定第一重合视频帧区域和第二重合视频帧区域之后,将第一视频与第二视频在第一重合视频帧区域和第二重合视频帧区域对两段视频进行处理,得到拼接视频。
在一个实施例中,在第一重合视频帧区域以及第二重合视频帧区域对第一视频与第二视频对齐后进行图像融合,将第一视频和第二视频进行图像融合后形成一个完整的全景视频。图像融合方法可以采用线性融合、泊松融合、多尺度融合、加权融合或者拉普拉斯金字塔融合等。可以理解的,视频中的每一帧视频可以认为是静止图像,在进行视频融合时,在重合视频帧区域内对对齐的多个视频帧融合可以认为是对多个静态图像的融合。
在一个实施例中,对第一视频与第二视频进行加权融合的方法形成完整的全景视频。
在一个实施例中,对第一视频与第二视频进行加权融合的过程中,权重可以通过重合区域中的当前视频帧对应的视频帧当前拍摄时间与参考视频帧的参考拍摄时间之间的当前时间差异来确定。假设Q表示权重,t1表示当前视频帧对应的视频帧当前拍摄时间,t2表示参考视频帧的参考拍摄时间,t表示重合区域中的视频帧对应的总时间,则权重可以通过当前视频帧对应的视频帧当前拍摄时间与参考视频帧的参考拍摄时间的差值,以及,重合区域中的视频帧对应的总时间计算得到。权重Q可以表示为公式:Q=(t1-t2)/t,利用权重和第一视频对应的第一重合视频帧区域中的视频帧与第二视频对应的第二重合视频帧区域中的视频帧进行图像融合,得到融合后的拼接视频。假设,I表示融合后的视频帧,I1表示第一段视频在重合区域中的当前视频帧,I2表示第二段视频的在重合区域中的当前视频帧,则融合后的视频帧I可以表示为公式:I=I1×Q+I2×(1-Q)。可以理解的,第一重合视频帧区域和第二重合视频帧区域分别是重合视频帧区域存在于第一视频和第二视频中的区域,并且该区域是第一视频和第二视频的重合区域,使用第一重合视频帧区域和第二重合视频帧区域,以便于区分该重合区域出现在第一视频和第二视频中对应的重合视频帧区域。
上述视频拼接方法中,终端获取待拼接的第一视频和第二视频,对第一视频或第二视频进行静止帧检测,得到静止帧序列;基于静止帧序列,得到参考视频帧;其中,第一视频是在第二视频拍摄之前拍摄得到。基于参考视频帧进行重合区域搜索,得到第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;基于第一重合视频帧区域以及第二重合视频帧区域将第一视频与第二视频进行拼接,得到拼接视频。通过确定参考视 频帧,得到第一重合视频帧区域和第二重合区域,基于上述两个重合视频帧区域对第一视频和第二视频这两段视频进行拼接,使第一视频和第二视频能够实现自然拼接过渡,提高视频拼接效果。
在一个实施例中,如图3所示,基于第一重合视频帧区域以及第二重合视频帧区域将第一视频与第二视频进行拼接,得到拼接视频包括:
步骤302,获取拼接视频帧位置,从第一重合视频帧区域获取拼接视频帧位置对应的第一拼接视频帧,从第二重合视频帧区域中获取拼接视频帧位置对应的第二拼接视频帧。
其中,拼接视频帧位置是指能够使第一视频与第二视频进行拼接的视频帧位置。例如,在重合视频帧区域中,不同时刻拍摄的视频,第一段视频中第100帧拍摄的空间位置为S帧,第二段视频中第10帧所拍摄的位置也是S帧,则拼接位置就是S帧;对应的第一段视频中的第100帧图像和第二段视频的第10帧图像处可以认为是拼接视频帧位置。
具体的,在对第一视频与第二视频进行拼接时,可以通过获取到两段视频的拼接视频帧位置来进行拼接。
在一个实施例中,可以选择重合区域的中心视频帧作为拼接视频帧位置,在得到拼接视频帧位置后,可以确定该拼接视频帧位置在第一视频中对应的第一拼接视频帧和该拼接视频帧位置在第二视频中对应的第二拼接视频帧。中心视频帧是位于视频帧序列中间位置的视频帧。例如,视频帧序列排列有5个视频帧,视频帧位置分别为{1,2,3,4,5},则位于位置3处的视频帧为视频帧序列中间位置的视频帧。
在一个实施例中,可以选择在第一重合视频帧区域和第二重合视频帧区域进行重合区域中的视频帧对齐时,对各个对齐的视频帧计算得到各个对齐的视频帧之间的匹配点数量,将匹配点数量最多的视频帧作为拼接视频帧位置,在得到拼接视频帧位置后,可以确定该拼接视频帧位置在第一视频中对应的第一拼接视频帧和该拼接视频帧位置在第二视频中对应的第二拼接视频帧。
步骤304,确定第一拼接视频帧与第二拼接视频帧之间的空间变换关系,基于空间变换关系对第一视频以及第二视频进行视频帧对齐。
其中,空间变换关系是指第一拼接视频帧与第二拼接视频帧之间进行的旋转、平移或者放大缩小等的变换关系。
具体的,第一拼接视频与第二拼接视频之间由于拍摄角度等原因,可能存在有一定的角度。相应的,第一拼接视频帧与第二拼接视频帧之间也同样会存在一定的角度,要完成第一拼接视频帧与第二拼接视频帧的拼接,需要确定第一拼接视频帧与第二拼接视频帧之间的空间变换关系,基于空间变换关系对第一视频以及第二视频进行视频帧对齐后才能完成第一拼 接视频帧与第二拼接视频帧的拼接。
在一个实施例中,通过求出两幅图像之间的单应变换矩阵,将不同角度拍摄的图像都转换到同样的视角下,得到视频帧到视频帧的空间变换关系,基于空间变换关系对第一视频以及第二视频进行视频帧对齐。
步骤306,基于对齐之后的第一视频帧与第二视频帧进行视频帧拼接,得到拼接视频,其中,在拼接时,第一重合视频帧区域与第二重合视频帧区域进行融合得到融合视频帧。
其中,融合视频帧是指提取第一视频帧与第二视频帧对应的能够增强图像质量的信息,综合成的高质量的图像信息对应的视频帧。
本实施例中,通过获取拼接视频帧位置,在该拼接视频帧位置处对对应的第一拼接视频帧与第二拼接视频帧进行空间变换关系确定,使第一视频以及第二视频进行视频帧对齐后进行融合得到融合视频帧,能够实现准确地对第一视频和第二视频进行拼接,使拼接后的第一视频和第二视频实现自然过渡,提高拼接效果。
在一个实施例中,如图4所示,空间变换关系包括水平变换值,确定第一拼接视频帧与第二拼接视频帧之间的空间变换关系,基于空间变换关系对第一视频以及第二视频进行视频帧对齐包括:
步骤402,获取第一拼接视频帧的第一特征点和第二拼接视频帧的第二特征点。
其中,特征点是指能够体现每一视频帧图像中的本质特征,通过该本质特征能够识别图像中目标物体的点。可以通过两个视频帧中特征点的距离来进行计算两个视频帧的距离。
具体的,当第一视频和第二视频是防抖全景视频时,第一视频和第二视频之间的空间变换关系只存在水平变换值。其中,防抖全景视频是指全景视频通过惯性传感器和加速度计等记录的视频数据对全景视频进行防抖处理后的视频;防抖视频中的地平线基本保持在全景视频帧水平中线位置,在不同时刻拍摄相同位置的防抖全景视频帧之间俯仰角和滚转角基本为0。可以理解的,防抖全景视频是相同位置不同时刻拍摄的防抖视频图像之间只存在一个航向角,也即只存在水平方向的平移。
在一个实施例中,可以通过ORB特征点检测方法或者SIFT特征点检测方法,直接提取第一拼接视频帧的第一特征点和第二拼接视频帧的第二特征点。
在一个实施例中,可以先通过将全景视频对应的全景图转换成平面视图,再利用ORB特征点检测方法提取第一拼接视频帧的第一特征点和第二拼接视频帧的第二特征点。其中,平面视图可以是指全景图某个方向所看到的视场角为90度的平面图,比如全景图包括上下前后左右六个面,每个平面就是一个平面视图。例如,上视图、下视图、左视图、右视图、仰视图和底视图。
在一个实施例中,全景视频对应的全景图可以转换成平面视图视频中的底视图之后进行特征点的提取,通过旋转矩阵将全景图进行图像变换,得到全景图像到底视图图像的图像变换。其中,全景图是指图像视角涵盖地平线正负各180度,垂直方向正负各90度的图像;若将全景图看作在立方体的空间状态中的图像,可以认为该图像完全包含上下前后左右六个平面视图。
步骤404,确定第一特征点与第二特征点之间的水平距离。
其中,水平距离是指第一特征点水平方向的坐标与第二特征点水平方向的坐标之间的差值。例如,将第一特征点与第二特征点之间的水平距离表示为△x,第一特征点水平方向的坐标表示为x p1,第二特征点水平方向的坐标表示为x p2,则第一特征点与第二特征点之间的水平距离△x可以利用如下公式计算得到:
△x=x p1-x p2
具体的,在得到第一拼接视频帧的第一特征点和第二拼接视频帧的第二特征点之后,计算第一拼接视频帧的第一特征点和第二拼接视频帧的第二特征点之间的水平距离。
步骤406,基于水平距离确定第一拼接视频帧与第二拼接视频帧之间的水平变换值。
其中,水平变换值是指利用水平距离得到的第一拼接视频帧与第二拼接视频帧之间的水平差异。
具体的,可以根据水平距离的不同取值范围得到不同的水平变换值。
在一个实施例中,根据水平变换值与水平距离之间的正相关关系,利用不同的水平距离得到水平变换值。假设dx表示水平变换值,w表示全景视频帧的宽度,则水平变换值dx可以表示为公式:
Figure PCTCN2022077635-appb-000001
在一个实施例中,可以利用水平变换值的统计值得到第一拼接视频帧与第二拼接视频帧之间的水平变换值。例如,可以将水平变换值的平均值作为第一拼接视频帧与第二拼接视频帧之间的水平变换值。
在一个实施例中,水平变换值的统计值是通过将得到的各个水平变换值进行排序之后得到,可以将各个水平变换值从大到小进行排序或者从小到大进行排序,将位于排序中间位置的水平变换值作为水平变换值的统计值。也可以通过其他方法得到水平变换值的统计值。例如,通过计算各个水平变换值的平均值、加权平均值或者众数等得到水平变换值的统计值。
本实施例中,通过第一拼接视频帧的第一特征点和第二拼接视频帧的第二特征点的获取,得到第一特征点和第二特征点的水平距离,通过水平距离确定第一拼接视频帧与第二拼接视频帧之间的水平变换值,能够达到准确确定第一拼接视频帧与第二拼接视频帧之间的空间变换关系的目的,根据准确的空间变换关系进而提高视频拼接的效果。
在一个实施例中,如图5所示,第一重合视频帧区域与第二重合视频帧区域进行融合得到融合视频帧的步骤包括:
步骤502,从第一重合视频帧区域中获取待融合的当前视频帧。
具体的,在重合视频帧区域对第一视频和第二视频进行融合之前,需要首先在重合视频帧区域获取待融合的当前视频帧。
在一个实施例中,通过OpenCV软件库读取视频,并且提取视频中的每一帧,可以利用OpenCV软件库中的视频获取结构函数来实现。例如,利用视频获取结构函数VideoCapture和Mat来获取视频,进一步,可以利用上述视频获取结构函数获取到待融合的当前视频帧。例如,filename表示视频文件,frame表示需要获取的某一帧视频帧,利用上述视频获取结构函数从摄像头或者文件中抓取并返回一帧视频帧可以表示为:
VideoCapture cap(“filename”);
Mat frame;
cap>>frame;
步骤504,获取当前视频帧对应的视频帧当前拍摄时间与参考视频帧的参考拍摄时间之间的当前时间差异。
具体的,在第一视频中的第一重合视频帧区域中获取待融合的当前视频帧后,因为拍摄时间的不同,当前视频帧会有相对应的拍摄时间,同样的参考视频帧也会有相应的拍摄时间,通过当前视频帧的拍摄时间和参考视频帧的拍摄时间,可以得到当前时间差异。
在一个实施例中,拍摄时间可以利用时间戳表示,该时间戳可以用帧数来表示,也可以用帧数乘以帧率来表示,两种表示方式都可以唯一确定视频帧对应的拍摄时间。例如,利用帧数来表示时间戳,视频中第100帧图像的时间戳为100;利用帧数乘以帧率来表示时间戳,假设视频帧率为30帧每秒,则时间戳也可以为100/30=3.33秒。根据当前视频帧的时间戳和参考视频帧的时间戳之间的差异,得到当前时间差异。
在一个实施例中,可以通过当前视频帧对应的视频帧当前拍摄时间与参考视频帧的参考拍摄时间之间的差值,得到当前时间差异。假设t1表示当前视频帧对应的视频帧当前拍摄时间,t2表示参考视频帧的参考拍摄时间,△t表示当前视频帧对应的视频帧当前拍摄时间与参考视频帧的参考拍摄时间之间的当前时间差异,则当前时间差异△t可以表示为公式:△t=t1- t2。
步骤506,基于当前时间差异得到当前视频帧对应的当前融合权重,其中,当前时间差异与当前融合权重成正相关关系。
其中,融合权重是指在图像融合过程中当前视频帧所对应的比重。正相关关系是指当前融合权重和当前时间差异的增大或者减少的趋势相同;当前时间差异增大,当前融合权重也随着增大,当前时间差异减小,当前融合权重也随着减小。
具体的,在获取到当前时间差异之后,根据当前时间差异与当前融合权重的正相关关系,可以得到当前融合权重。
在一个实施例中,对第一视频与第二视频进行加权融合的过程中,权重可以通过重合区域中的当前视频帧对应的视频帧当前拍摄时间与参考视频帧的参考拍摄时间之间的当前时间差异来确定,假设Q表示权重,t1表示当前视频帧对应的视频帧当前拍摄时间,t2表示参考视频帧的参考拍摄时间,t表示重合区域中的视频帧对应的总时间,则权重可以通过当前视频帧对应的视频帧当前拍摄时间与参考视频帧的参考拍摄时间的差值,以及,重合区域中的视频帧对应的总时间计算得到,权重Q可以表示为公式:Q=(t1-t2)/t,权重Q随着当前时间差异t1-t2的增大而增大;同样的,权重Q随着当前时间差异t1-t2的减小而减小。
步骤508,基于当前融合权重将当前视频帧与第二重合视频帧区域对应位置的视频帧进行融合,得到当前融合视频帧。
具体的,在重合区域中得到当前融合权重后,利用当前融合权重将第一视频和第二视频进行融合,得到质量更高的拼接视频。
在一个实施例中,利用当前融合权重、第一视频对应的第一重合视频帧区域中的视频帧和第二视频对应的第二重合视频帧区域中的视频帧进行图像融合,得到融合后的拼接视频,假设,I表示融合后的视频帧,I1表示第一段视频在重合区域中的当前视频帧,I2表示第二段视频的在重合区域中的当前视频帧,则融合后的视频帧I可以表示为公式:I=I1×Q+I2×(1-Q)。可以理解的,第一重合视频帧区域和第二重合视频帧区域分别是重合视频帧区域存在于第一视频和第二视频中的区域,并且该区域是第一视频和第二视频的重合区域,使用第一重合视频帧区域和第二重合视频帧区域,以便于区分该重合区域出现在第一视频和第二视频中对应的区域。
本实施例中,通过获取到当前时间差异,通过该当前时间差异获取到当前融合权重,基于当前融合权重将当前视频帧与第二重合视频帧区域对应位置的视频帧进行融合,得到当前融合视频帧,能够达到得到自然过渡效果的完整视频的目的。
在一个实施例中,如图6所示,基于当前时间差异得到当前视频帧对应的当前融合权重 包括:
步骤602,获取重合视频帧区域对应的重合区域时间长度。
其中,重合区域时间长度是指重合区域的视频帧对应的视频时间长度。例如,重合区域的视频长度为600毫秒,则重合区域时间长度为600毫秒。
具体的,当前融合权重计算中的参数之一为重合区域时间长度,通过获取重合区域时间长度可以确定计算当前融合权重的其中一个参数。
在一个实施例中,可以通过重合区域中的帧总数,以及视频帧率,根据帧总数与视频帧率之间的函数关系,获取到重合区域时间长度。例如,b表示在重合区域中的帧总数,v表示帧率,t表示重合区域时间长度,则重合区域时间长度t可以表示为公式:t=b/v。
步骤604,计算当前时间差异与重合区域时间长度的比值,得到当前融合权重。
具体的,t1-t2表示当前时间差异,t表示重合区域时间长度,Q表示当前融合权重,则当前融合权重Q可以表示为公式:Q=(t1-t2)/t。
本实施例中,通过当前时间差异与重合区域时间长度的比值,能够达到准确得到当前融合权重的目的,以使第一视频和第二视频通过重合区域进行拼接时,能够利用当前融合权重对重合区域中的视频帧进行融合,提高视频拼接效果。
在一个实施例中,如图7所示,基于参考视频帧进行重合区域搜索,得到第一视频对应的第一重合视频帧区域以及第二视频对应的第二重合视频帧区域包括:
步骤702,将参考视频帧分别与第一视频中的各个视频帧进行对比,得到第一视频中与参考视频帧匹配的匹配视频帧。
其中,匹配视频帧是指第一视频中与参考视频帧能够满足匹配条件的视频帧。例如,可以将第一视频中的各个视频帧中与参考视频帧匹配点数量最多的视频帧作为匹配视频帧。
具体的,参考视频帧是处在第二视频中的,静止帧的后向视频帧,为了避免视频存在卡顿或者静止等视频质量的情况,认为参考频帧为第一个非静止帧,选择参考视频帧作为需要对比的视频帧,得到的匹配视频帧。
在一个实施例中,在第一视频中的各个视频帧中,选择与第二视频中作为参考视频帧的视频帧匹配率最高的视频帧作为匹配视频帧。匹配率可以是特征点匹配数目和特征点总数的比值。例如,参考视频帧与第一视频中的某一视频帧的特征点匹配数目为1000,特征点总数是1500,则匹配率为1000与1500的比值,即为67%。
步骤704,将匹配视频帧对应的尾部视频帧区域,作为第一视频对应的第一重合视频帧区域。
其中,尾部视频帧区域是指从匹配视频帧开始到第一视频结束视频帧之间对应的视频帧 区域。例如,匹配视频帧为P,则尾部视频帧区域为第一视频中P帧之后的视频帧。
具体的,在获取到匹配视频帧之后,可以在第一视频中得到匹配视频帧之后的视频帧作为第一视频对应的第一重合视频帧区域。
步骤706,将第二视频帧中参考视频帧所在的参考视频帧区域,作为第二视频对应的第二重合视频帧区域,参考视频帧为参考视频帧区域的头部视频帧,参考视频帧区域与尾部视频帧区域的视频帧数量匹配。
其中,头部视频帧是指在视频帧区域中的第一个视频帧。
具体的,在重合区域中既有第一视频中的视频帧,也有第二视频中的视频帧,要使得在重合区域中完成视频帧的对齐和融合,需要在重合区域中的视频帧数量相同。参考视频帧区域是处在第二视频中的第二重合视频帧区域,尾部视频帧区域是处在第一视频中的第一重合视频帧区域,在两个重合视频帧区域中存在相同数量的视频帧。可以理解的,第一重合视频帧区域和第二重合视频帧区域在实现视频融合后,会形成一个重合视频帧区域,在该重合视频帧区域前的视频帧为第一视频的视频帧,在该重合视频帧区域后的视频帧为第二视频的视频帧。
本实施例中,通过参考视频帧可以得到匹配视频帧,通过匹配视频帧得到第一重合视频帧区域,相应的在第二视频上得到第二重合视频帧区域,能够达到准确确定重合视频帧区域的目的,以使对重合视频帧区域进行视频融合,在重合视频帧区域实现第一视频和第二视频的自然拼接。
在一个实施例中,如图8所示,基于参考视频帧进行重合区域搜索,得到第一视频对应的第一重合视频帧区域以及第二视频对应的第二重合视频帧区域包括:
步骤802,在第一视频中获取预设帧数的尾部视频帧序列,作为第一视频对应的第一重合视频帧区域。
其中,预设帧数是指预先确定的视频帧帧数,通过该视频帧帧数可以确定获取到的尾部视频帧序列中的视频帧帧数。例如,预设帧数为m,则获取到的尾部视频帧序列中包含有m个视频帧。
具体的,在确定第一重合视频帧区域之前,可以通过预判的方式进行不断试验的方式进行预设帧数的确定。
在一个实施例中,可以以视频的总帧数为参考,根据经验值,设置多个预设帧数对应的视频帧区域作为第一视频对应的第一重合视频帧区域。
步骤804,从参考视频帧对应的后向视频帧序列中,获取与尾部视频帧序列匹配的匹配视频帧序列,将匹配视频帧序列作为第二视频对应的第二重合视频帧区域。
其中,后向视频帧序列是指参考视频帧之后的视频帧组成的序列。该序列可以包含参考视频帧之后的部分视频帧,也可以包括参考视频帧之后的全部视频帧。
具体的,在得到作为第一重合视频帧区域的尾部视频帧序列之后,需要在第二视频中找到对应的,与第一重合视频帧区域视频帧数目相同并且满足一定的匹配条件的视频帧序列,作为第二重合视频帧区域。
在一个实施例中,可以取参考视频帧后预设帧数的视频帧作为匹配视频帧序列。例如,可以取参考视频帧后的m帧,通过对比第一视频中的后m帧,得到特征点匹配数量统计值最大时对应的m帧作为重合区域。当m取不同数值时,得到的特征点匹配数量统计值不同,形成数值m与特征点匹配数量统计值之间的对应关系表,从该表中查找匹配数量统计值最大值,该最大值对应的m为得到的重合区域的视频帧数。
表2.m值与特征点匹配数量统计值对应关系表
m值(预设帧数) 特征点匹配数量统计值
20 1000
30 1200
35 900
40 1100
由表2中可以查找到,当m取值为30时,对应的特征点匹配数量统计值为最大,则将第一视频中的后30帧作为重合区域。
本实施例中,通过在第一视频中获取预设帧数的尾部视频帧序列得到第一重合视频帧区域,通过得到的第一重合视频帧区域中的视频帧获取与尾部视频帧序列匹配的匹配视频帧序列,将匹配视频帧序列作为第二视频对应的第二重合视频帧区域,能够达到准确确定重合视频帧区域的目的,以使得在该重合视频帧区域完成第一视频和第二视频的拼接,达到第一视频和第二视频在拼接过程中能够自然过渡,提高视频拼接的效果。
在一个实施例中,对第一视频或第二视频进行静止帧检测,得到静止帧序列包括:
将第一视频或者第二视频转换为平面视频。
其中,平面视频是由各个平面视图图像组成的视频。平面视图图像可以是指全景图某个方向所看到的视场角为90度的平面图,比如全景图包括上下前后左右六个面,每个平面就是一个平面视图。例如,平面视图包括上视图、下视图、左视图、右视图、仰视图和底视图。
在一个实施例中,全景视频对应的全景图可以转换成平面视频对应的平面视图中的底视图之后进行特征点的提取,通过旋转矩阵将全景图进行图像变换,得到全景图像到底视图图像的图像变换。其中,全景图是指图像视角涵盖地平线正负各180度,垂直方向正负各90度 的图像;若将全景图看作在立方体的空间状态中的图像,可以认为该图像完全包含上下前后左右六个平面视图。
对平面视频进行静止帧检测,得到静止帧序列。
在一个实施例中,可以分别获取第一平面视频和第二平面视频,通过第一平面视频中的最后一个视频帧依次与最后一个视频帧前的连续的多个视频帧进行特征点提取,并进行特征点匹配。当最后一个视频帧前的连续的多个视频帧特征点匹配结果都满足阈值条件时,将该多个视频帧组成的视频帧序列确定为静止帧序列。例如,第一平面视频中的最后一个平面视频帧表示为第1帧,与最后一个视频帧前的连续n-2帧进行特征点提取和匹配,当匹配结果都满足阈值条件时,将第一视频中的后n-1帧组成的视频帧序列确定为静止帧序列。
在一个实施例中,可以通过第二平面视频中的第一个平面视频帧依次与第一个视频帧后的连续的多个视频帧进行特征点提取,并进行特征点匹配。当第一个视频帧后的连续的多个视频帧特征点匹配结果都满足阈值条件时,将该多个视频帧组成的视频帧序列确定为静止帧序列。例如,第二平面视频中的第一个视频帧表示为第1帧,与第一个视频帧后的连续n-2帧进行特征点提取和匹配,当匹配结果都满足阈值条件时,将第二平面视频中的前n-1帧组成的视频帧序列确定为静止帧序列。
在一个实施例中,如图9所示,在视频获取过程中,先将相机放置在第一视频(前一段视频)中拍摄末尾段视频经过的某个位置,然后将相机静止一段时间后,沿着第一视频拍摄末尾段视频的运动路径移动相机开始拍摄第二视频(第二段视频)。第一视频和第二视频都存在拍摄末尾段视频的重合区域,假设第二视频中开始运动的视频帧为A帧处,重合区域的结束位置为B帧处。该重合区域在第一视频中视频帧的范围也为A帧到B帧之间。可以取A帧到B帧之间中心位置处的视频帧作为拼接视频帧位置,也可以取A帧到B帧之间的其他视频帧作为拼接视频帧位置,利用该拼接视频帧位置完成第一视频和第二视频的拼接。假设拼接视频帧位置表示为C帧,C帧可以取A帧之后的多个视频帧处的视频帧,例如C帧可以取A帧之后的5个视频帧处的视频帧。
在一个实施例中,以防抖相机拍摄物体穿越障碍物的全景视频为例进行阐述。当拍摄到物体穿越障碍物时,物体穿过障碍物一定距离后停止拍摄第一段视频(第一视频)。然后,相机绕过障碍物,从障碍物另一侧拍摄物体穿越障碍物的第二段视频(第二视频),第二段视频的拍摄是从第一段视频拍摄到的物体穿过障碍物处开始拍摄的,由于拍摄存在一定的时延,所以拍摄的第二段视频在起始时刻看起来静止。然后,第二段视频沿着第一段视频拍摄的穿越障碍物的路线继续拍摄,使得前后两段视频存在相同的拍摄路线即重合路径,该重合路径用于衔接前后两段视频。防抖相机拍摄的全景视频,在拍摄位置大致相等的情况下,前后两 段视频之间只是存在一个简单的转化关系,该转换关系为水平平移关系,利用该水平平移关系移动第一段视频或者第二段视频,使第一段视频和第二段视频在重合路径内完成对齐,利用图像融合方法完成视频帧图像的融合,使第一段视频和第二段视频完成自然过渡,从而实现第一段视频和第二段视频的无缝衔接。在一个实施例中,提供了一种服务器,该服务器用于执行上述各方法实施例中的步骤。该服务器可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
应该理解的是,虽然图2-8的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-8中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。
在一个实施例中,如图10所示,提供了一种视频拼接装置1000,包括:第一视频和第二视频获取模块1002、静止帧序列得到模块1004、参考视频帧得到模块1006、第一重合视频帧区域以及第二重合视频帧区域得到模块1008和拼接视频得到模块1010,其中:第一视频和第二视频获取模块1002,用于获取待拼接的第一视频和第二视频,第一视频在第二视频之前;静止帧序列得到模块1004,用于对对所述第一视频或第二视频进行静止帧检测,得到静止帧序列;参考视频帧得到模块1006,用于基于所述静止帧序列,得到参考视频帧;;第一重合视频帧区域以及第二重合视频帧区域得到模块1008,用于基于参考视频帧进行重合区域搜索,得到第一视频对应的第一重合视频帧区域以及第二视频对应的第二重合视频帧区域;拼接视频得到模块1010,用于基于第一重合视频帧区域以及第二重合视频帧区域将第一视频与第二视频进行拼接,得到拼接视频。
在一个实施例中,拼接视频得到模块1010用于获取拼接视频帧位置,从第一重合视频帧区域获取拼接视频帧位置对应的第一拼接视频帧,从第二重合视频帧区域中获取拼接视频帧位置对应的第二拼接视频帧;确定第一拼接视频帧与第二拼接视频帧之间的空间变换关系,基于空间变换关系对第一视频以及第二视频进行视频帧对齐;基于对齐之后的第一视频帧与第二视频帧进行视频帧拼接,得到拼接视频,其中,在拼接时,第一重合视频帧区域与第二重合视频帧区域进行融合得到融合视频帧。
在一个实施例中,拼接视频得到模块1010用于获取第一拼接视频帧的第一特征点和第二拼接视频帧的第二特征点;确定第一特征点与第二特征点之间的水平距离;基于水平距离确定第一拼接视频帧与第二拼接视频帧之间的水平变换值。
在一个实施例中,拼接视频得到模块1010用于从第一重合视频帧区域中获取待融合的当前视频帧;获取当前视频帧对应的视频帧当前拍摄时间与参考视频帧的参考拍摄时间之间的当前时间差异;基于当前时间差异得到当前视频帧对应的当前融合权重,其中,当前时间差异与当前融合权重成正相关关系;基于当前融合权重将当前视频帧与第二重合视频帧区域对应位置的视频帧进行融合,得到当前融合视频帧。
在一个实施例中,拼接视频得到模块1010用于获取重合视频帧区域对应的重合区域时间长度;计算当前时间差异与重合区域时间长度的比值,得到当前融合权重。
在一个实施例中,第一重合视频帧区域以及第二重合视频帧区域得到模块1008用于将参考视频帧分别与第一视频中的各个视频帧进行对比,得到第一视频中与参考视频帧匹配的匹配视频帧;将匹配视频帧对应的尾部视频帧区域,作为第一视频对应的第一重合视频帧区域;将第二视频帧中参考视频帧所在的参考视频帧区域,作为第二视频对应的第二重合视频帧区域,参考视频帧为参考视频帧区域的头部视频帧,参考视频帧区域与尾部视频帧区域的视频帧数量匹配。
在一个实施例中,第一重合视频帧区域以及第二重合视频帧区域得到模块1008用于在第一视频中获取预设帧数的尾部视频帧序列,作为第一视频对应的第一重合视频帧区域;从参考视频帧对应的后向视频帧序列中,获取与尾部视频帧序列匹配的匹配视频帧序列,将匹配视频帧序列作为第二视频对应的第二重合视频帧区域。
在一个实施例中,静止帧序列得到模块1004用于将第一视频或者第二视频转换为平面视频;对平面视频进行静止帧检测,得到静止帧序列。
关于视频拼接装置的具体限定可以参见上文中对于视频拼接方法的限定,在此不再赘述。上述视频拼接装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是终端,其内部结构图可以如图11所示。该计算机设备包括通过系统总线连接的处理器、存储器、通信接口、显示屏和输入装置。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机程序。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的通信接口用于与外部的终端进行有线或无线方式的通信,无线方式可通过WIFI、运营商网络、NFC(近场通信)或其他技术实现。该计算机程序被处理器执行时以实现一种视频拼接方法。该计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏,该计算机设备的输入 装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。
本领域技术人员可以理解,图11中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,在一个实施例中,还提供了一种计算机设备,包括存储器和处理器,存储器中存储有计算机程序,该处理器执行计算机程序时实现上述各方法实施例中的步骤。
在一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述各方法实施例中的步骤。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、磁带、软盘、闪存或光存储器等。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或外部高速缓冲存储器。作为说明而非局限,RAM可以是多种形式,比如静态随机存取存储器(Static Random Access Memory,SRAM)或动态随机存取存储器(Dynamic Random Access Memory,DRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (11)

  1. 一种视频拼接方法,其特征在于,所述方法包括:
    获取待拼接的第一视频和第二视频,所述第一视频在所述第二视频之前;
    对所述第一视频或第二视频进行静止帧检测,得到静止帧序列;
    基于所述静止帧序列,得到参考视频帧;基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;
    基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频。
  2. 根据权利要求1所述的方法,其特征在于,所述基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频包括:
    获取拼接视频帧位置,从所述第一重合视频帧区域获取所述拼接视频帧位置对应的第一拼接视频帧,从所述第二重合视频帧区域中获取所述拼接视频帧位置对应的第二拼接视频帧;
    确定所述第一拼接视频帧与所述第二拼接视频帧之间的空间变换关系,基于所述空间变换关系对所述第一视频以及所述第二视频进行视频帧对齐;
    基于对齐之后的第一视频帧与所述第二视频帧进行视频帧拼接,得到拼接视频,其中,在拼接时,所述第一重合视频帧区域与所述第二重合视频帧区域进行融合得到融合视频帧。
  3. 根据权利要求2所述的方法,其特征在于,所述空间变换关系包括水平变换值,所述确定所述第一拼接视频帧与所述第二拼接视频帧之间的空间变换关系,基于所述空间变换关系对所述第一视频以及所述第二视频进行视频帧对齐包括:
    获取所述第一拼接视频帧的第一特征点和所述第二拼接视频帧的第二特征点;
    确定所述第一特征点与所述第二特征点之间的水平距离;
    基于所述水平距离确定所述第一拼接视频帧与所述第二拼接视频帧之间的水平变换值。
  4. 根据权利要求2所述的方法,其特征在于,所述第一重合视频帧区域与所述第二重合视频帧区域进行融合得到融合视频帧的步骤包括:
    从所述第一重合视频帧区域中获取待融合的当前视频帧;
    获取所述当前视频帧对应的视频帧当前拍摄时间与所述参考视频帧的参考拍摄时间之间的当前时间差异;
    基于所述当前时间差异得到当前视频帧对应的当前融合权重,其中,当前时间差异与当前融合权重成正相关关系;
    基于当前融合权重将当前视频帧与所述第二重合视频帧区域对应位置的视频帧进行融合,得到当前融合视频帧。
  5. 根据权利要求4所述的方法,其特征在于,所述基于当前时间差异得到当前视频帧对应的当前融合权重包括:
    获取重合视频帧区域对应的重合区域时间长度;
    计算所述当前时间差异与所述重合区域时间长度的比值,得到当前融合权重。
  6. 根据权利要求1所述的方法,其特征在于,所述基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域包括:
    将所述参考视频帧分别与所述第一视频中的各个视频帧进行对比,得到第一视频中与所述参考视频帧匹配的匹配视频帧;
    将所述匹配视频帧对应的尾部视频帧区域,作为所述第一视频对应的第一重合视频帧区域;
    将所述第二视频帧中所述参考视频帧所在的参考视频帧区域,作为所述第二视频对应的第二重合视频帧区域,所述参考视频帧为所述参考视频帧区域的头部视频帧,所述参考视频帧区域与所述尾部视频帧区域的视频帧数量匹配。
  7. 根据权利要求1所述的方法,其特征在于,所述基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域包括:
    在所述第一视频中获取预设帧数的尾部视频帧序列,作为所述第一视频对应的第一重合视频帧区域;
    从所述参考视频帧对应的后向视频帧序列中,获取与所述尾部视频帧序列匹配的匹配视频帧序列,将匹配视频帧序列作为所述第二视频对应的第二重合视频帧区域。
  8. 根据权利要求1所述的方法,其特征在于,所述对所述第一视频或第二视频进行静止帧检测,得到静止帧序列包括:
    将所述第一视频或者第二视频转换为平面视频;
    对所述平面视频进行静止帧检测,得到所述静止帧序列。
  9. 一种视频拼接装置,其特征在于,所述装置包括:
    第一视频和第二视频获取模块,用于获取待拼接的第一视频和第二视频,所述第一视频在所述第二视频之前;
    静止帧序列得到模块,用于对所述第一视频或第二视频进行静止帧检测,得到静止帧序列;
    参考视频帧得到模块,用于基于所述静止帧序列,得到参考视频帧;
    第一重合视频帧区域以及第二重合视频帧区域得到模块,用于基于所述参考视频帧进行重合区域搜索,得到所述第一视频对应的第一重合视频帧区域以及所述第二视频对应的第二重合视频帧区域;
    拼接视频得到模块,用于基于所述第一重合视频帧区域以及第二重合视频帧区域将所述第一视频与所述第二视频进行拼接,得到拼接视频。
  10. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,其特征在于,所述处理器执行所述计算机程序时实现权利要求1至8中任一项所述的方法的步骤。
  11. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至8中任一项所述的方法的步骤。
PCT/CN2022/077635 2021-02-26 2022-02-24 视频拼接方法、装置、计算机设备和存储介质 WO2022179554A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22758918.1A EP4300982A4 (en) 2021-02-26 2022-02-24 VIDEO COLLECTION METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIUM
JP2023550696A JP2024506109A (ja) 2021-02-26 2022-02-24 ビデオつなぎ合わせ方法及び装置、コンピュータデバイス並びに記憶媒体

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110217098.6 2021-02-26
CN202110217098.6A CN114979758B (zh) 2021-02-26 2021-02-26 视频拼接方法、装置、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2022179554A1 true WO2022179554A1 (zh) 2022-09-01

Family

ID=82973207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077635 WO2022179554A1 (zh) 2021-02-26 2022-02-24 视频拼接方法、装置、计算机设备和存储介质

Country Status (4)

Country Link
EP (1) EP4300982A4 (zh)
JP (1) JP2024506109A (zh)
CN (1) CN114979758B (zh)
WO (1) WO2022179554A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101304490A (zh) * 2008-06-20 2008-11-12 北京六维世纪网络技术有限公司 一种拼合视频的方法和装置
US20160286138A1 (en) * 2015-03-27 2016-09-29 Electronics And Telecommunications Research Institute Apparatus and method for stitching panoramaic video
US20190028643A1 (en) * 2016-02-29 2019-01-24 Sony Corporation Image processing device, display device, reproduction control method, and image processing system
CN111294644A (zh) * 2018-12-07 2020-06-16 腾讯科技(深圳)有限公司 视频拼接方法、装置、电子设备及计算机存储介质

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101668130A (zh) * 2009-09-17 2010-03-10 深圳市启欣科技有限公司 一种电视拼接墙分割补偿系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101304490A (zh) * 2008-06-20 2008-11-12 北京六维世纪网络技术有限公司 一种拼合视频的方法和装置
US20160286138A1 (en) * 2015-03-27 2016-09-29 Electronics And Telecommunications Research Institute Apparatus and method for stitching panoramaic video
US20190028643A1 (en) * 2016-02-29 2019-01-24 Sony Corporation Image processing device, display device, reproduction control method, and image processing system
CN111294644A (zh) * 2018-12-07 2020-06-16 腾讯科技(深圳)有限公司 视频拼接方法、装置、电子设备及计算机存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4300982A4 *

Also Published As

Publication number Publication date
EP4300982A4 (en) 2024-07-03
CN114979758B (zh) 2023-03-21
JP2024506109A (ja) 2024-02-08
EP4300982A1 (en) 2024-01-03
CN114979758A (zh) 2022-08-30

Similar Documents

Publication Publication Date Title
AU2012219026B2 (en) Image quality assessment
CN111046752B (zh) 一种室内定位方法、计算机设备和存储介质
Meng et al. SkyStitch: A cooperative multi-UAV-based real-time video surveillance system with stitching
CN111915483B (zh) 图像拼接方法、装置、计算机设备和存储介质
CN108958469B (zh) 一种基于增强现实的在虚拟世界增加超链接的方法
CN108875507B (zh) 行人跟踪方法、设备、系统和计算机可读存储介质
CN111241872B (zh) 视频图像遮挡方法及装置
WO2020228680A1 (zh) 基于双相机图像的拼接方法、装置、电子设备
WO2022206680A1 (zh) 图像处理方法、装置、计算机设备和存储介质
CN112163503A (zh) 办案区人员无感轨迹生成方法、系统、存储介质及设备
EP3035242B1 (en) Method and electronic device for object tracking in a light-field capture
CN111047622A (zh) 视频中对象的匹配方法和装置、存储介质及电子装置
CN114092720A (zh) 目标跟踪方法、装置、计算机设备和存储介质
JP6396682B2 (ja) 監視カメラシステム
Wang et al. Ranking video salient object detection
JP2014228881A5 (zh)
CN114500873A (zh) 跟踪拍摄系统
Huang et al. Tracking multiple deformable objects in egocentric videos
CN113286084A (zh) 终端的图像采集方法及装置、存储介质、终端
WO2022179554A1 (zh) 视频拼接方法、装置、计算机设备和存储介质
US9392146B2 (en) Apparatus and method for extracting object
CN110930437B (zh) 目标跟踪方法和装置
JP5455101B2 (ja) 映像処理システムと映像処理方法、映像処理装置及びその制御方法と制御プログラム
CN112991175B (zh) 一种基于单ptz摄像头的全景图片生成方法及设备
CN113033350B (zh) 基于俯视图像的行人重识别方法、存储介质和电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22758918

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023550696

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 2022758918

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022758918

Country of ref document: EP

Effective date: 20230926