WO2022179554A1 - Video splicing method and apparatus, computer device, and storage medium - Google Patents
Video splicing method and apparatus, computer device, and storage medium
- Publication number
- WO2022179554A1 (PCT application PCT/CN2022/077635)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- video frame
- frame
- overlapping
- spliced
- Prior art date
Classifications
- H04N21/44016 — Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
- G06T3/14 — Transformations for image registration, e.g. adjusting or mapping for alignment of images
- G06T3/4038 — Image mosaicing, e.g. composing plane images from plane sub-images
- G06T7/33 — Determination of transform parameters for the alignment of images, i.e. image registration, using feature-based methods
- G06T7/38 — Registration of image sequences
- H04N21/44008 — Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N23/683 — Vibration or motion blur correction performed by a processor, e.g. controlling the readout of an image memory
- H04N23/698 — Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
- G06T2207/10016 — Video; Image sequence
- G06T2207/20221 — Image fusion; Image merging
Definitions
- The present application relates to the technical field of image processing, and in particular to a video splicing method and apparatus, a computer device, and a storage medium.
- Video stitching technology can stitch together videos shot under different time conditions to form a complete video.
- For example, cameras take panoramic videos of objects passing through obstacles. When shooting an object passing through an obstacle, shooting of the first video stops after the object has passed the obstacle by a certain distance. The camera is then moved around the obstacle, a second video of the object passing through the obstacle is shot from the other side, and the first video and the second video are stitched together to form a complete panoramic video of the object passing through the obstacle.
- Panoramic video is widely used in various fields due to its large viewing angle and high resolution. Therefore, video stitching technology is also widely used in various fields.
- However, current video splicing methods suffer from poor splicing quality.
- A video splicing method, comprising: acquiring a first video and a second video to be spliced, where the first video precedes the second video; performing still frame detection on the first video or the second video to obtain a still frame sequence; obtaining a reference video frame based on the still frame sequence; performing an overlapping area search based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video; and splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
- The step of splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain the spliced video includes: acquiring a spliced video frame position, obtaining a first spliced video frame corresponding to the spliced video frame position from the first overlapping video frame area, and obtaining a second spliced video frame corresponding to the spliced video frame position from the second overlapping video frame area; determining a spatial transformation relationship between the first spliced video frame and the second spliced video frame, and performing video frame alignment on the first video and the second video based on the spatial transformation relationship; and splicing the aligned first video frames with the second video frames to obtain the spliced video, wherein, during splicing, the first overlapping video frame area and the second overlapping video frame area are fused to obtain fused video frames.
- The spatial transformation relationship includes a horizontal transformation value. Performing video frame alignment on the first video and the second video includes: acquiring a first feature point of the first spliced video frame and a second feature point of the second spliced video frame; determining a horizontal distance between the first feature point and the second feature point; and determining the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
- The step of fusing the first overlapping video frame area and the second overlapping video frame area to obtain fused video frames includes: acquiring a current video frame to be fused from the first overlapping video frame area; obtaining a current time difference between the current shooting time of the video frame corresponding to the current video frame and the reference shooting time of the reference video frame; obtaining a current fusion weight corresponding to the current video frame based on the current time difference, wherein the current time difference is positively correlated with the current fusion weight; and fusing, based on the current fusion weight, the current video frame with the video frame at the corresponding position of the second overlapping video frame area to obtain a current fused video frame.
- Obtaining the current fusion weight corresponding to the current video frame based on the current time difference includes: obtaining the overlapping area time length corresponding to the overlapping video frame area; and calculating the ratio of the current time difference to the overlapping area time length to obtain the current fusion weight.
- Searching for the overlapping area based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video includes: comparing the reference video frame with each video frame in the first video to obtain a matching video frame in the first video that matches the reference video frame; taking the tail video frame area corresponding to the matching video frame as the first overlapping video frame area corresponding to the first video; and taking the reference video frame area where the reference video frame is located in the second video as the second overlapping video frame area corresponding to the second video, where the reference video frame is the head video frame of the reference video frame area, and the reference video frame area matches the number of video frames in the tail video frame area.
- Alternatively, searching for the overlapping area based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video includes: acquiring a tail video frame sequence with a preset number of frames in the first video as the first overlapping video frame area corresponding to the first video; and obtaining, from the backward video frame sequence corresponding to the reference video frame, a matching video frame sequence that matches the tail video frame sequence, and using the matching video frame sequence as the second overlapping video frame area corresponding to the second video.
- Performing still frame detection on the first video or the second video to obtain the still frame sequence includes: converting the first video or the second video into a plane video; and performing still frame detection on the plane video to obtain the still frame sequence.
- A video splicing apparatus comprises: a first video and second video acquisition module, used to acquire a first video and a second video to be spliced, where the first video precedes the second video; a still frame sequence obtaining module, used to perform still frame detection on the first video or the second video to obtain a still frame sequence; a reference video frame obtaining module, used to obtain a reference video frame based on the still frame sequence; a first overlapping video frame area and second overlapping video frame area obtaining module, configured to perform an overlapping area search based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video; and a spliced video obtaining module, configured to splice the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
- The spliced video obtaining module is configured to obtain the spliced video frame position, obtain the first spliced video frame corresponding to the spliced video frame position from the first overlapping video frame area, and obtain the second spliced video frame corresponding to the spliced video frame position from the second overlapping video frame area; and to determine the spatial transformation relationship between the first spliced video frame and the second spliced video frame and align the video frames based on the spatial transformation relationship.
- The spliced video obtaining module is used to obtain the first feature point of the first spliced video frame and the second feature point of the second spliced video frame; determine the horizontal distance between the first feature point and the second feature point; and determine the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
- The spliced video obtaining module is configured to obtain the current video frame to be fused from the first overlapping video frame area; obtain the current time difference between the current shooting time of the video frame corresponding to the current video frame and the reference shooting time of the reference video frame; obtain the current fusion weight corresponding to the current video frame based on the current time difference, where the current time difference is positively correlated with the current fusion weight; and fuse, based on the current fusion weight, the current video frame with the video frame at the corresponding position of the second overlapping video frame area to obtain the current fused video frame.
- The spliced video obtaining module is used to obtain the overlapping area time length corresponding to the overlapping video frame area, and to calculate the ratio of the current time difference to the overlapping area time length to obtain the current fusion weight.
- The first overlapping video frame area and second overlapping video frame area obtaining module is configured to compare the reference video frame with each video frame in the first video to obtain the matching video frame in the first video that matches the reference video frame; to take the tail video frame area corresponding to the matching video frame as the first overlapping video frame area corresponding to the first video; and to take the reference video frame area where the reference video frame is located in the second video as the second overlapping video frame area corresponding to the second video, where the reference video frame is the head video frame of the reference video frame area and the reference video frame area matches the number of video frames in the tail video frame area.
- The first overlapping video frame area and second overlapping video frame area obtaining module is used to obtain a tail video frame sequence with a preset number of frames in the first video as the first overlapping video frame area corresponding to the first video; and, from the backward video frame sequence corresponding to the reference video frame, to obtain the matching video frame sequence that matches the tail video frame sequence and use it as the second overlapping video frame area corresponding to the second video.
- the still frame sequence obtaining module is configured to convert the first video or the second video into a plane video; perform still frame detection on the plane video to obtain the still frame sequence.
- A computer device comprises a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements the following steps: acquiring a first video and a second video to be spliced, where the first video precedes the second video; performing still frame detection on the first video or the second video to obtain a still frame sequence; obtaining a reference video frame based on the still frame sequence; performing an overlapping area search based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video; and splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
- The terminal obtains the first video and the second video to be spliced, performs still frame detection on the first video or the second video to obtain a still frame sequence, and obtains a reference video frame based on the still frame sequence; the first video is captured before the second video.
- The overlapping area search is performed based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video; based on the first overlapping video frame area and the second overlapping video frame area, the first video and the second video are spliced to obtain a spliced video.
- Because the first overlapping video frame area and the second overlapping video frame area are obtained, and the two videos are spliced based on these overlapping video frame areas, the splice between the first video and the second video achieves a natural transition and the video splicing effect is improved.
- FIG. 1 is an application environment diagram of a video splicing method in one embodiment
- FIG. 2 is a schematic flowchart of a video splicing method in one embodiment
- FIG. 3 is a schematic flowchart of a video splicing method in another embodiment
- FIG. 4 is a schematic flowchart of a video splicing method in another embodiment
- FIG. 5 is a schematic flowchart of a step of fusing a first overlapping video frame region and the second overlapping video frame region to obtain a fused video frame in one embodiment
- FIG. 6 is a schematic flowchart of a step of obtaining a current fusion weight corresponding to a current video frame based on a current time difference in one embodiment
- FIG. 7 is a schematic flowchart of a video splicing method in another embodiment
- FIG. 8 is a schematic flowchart of a video splicing method in another embodiment
- FIG. 9 is a schematic diagram of a method for determining the position of a spliced video frame in one embodiment
- FIG. 10 is a structural block diagram of a video splicing device in one embodiment
- FIG. 11 is a diagram of the internal structure of a computer device in one embodiment.
- The video splicing method provided by the present application can be applied to the application environment shown in FIG. 1, and is specifically applied to a video splicing system.
- the video splicing system includes a video capture device 102 and a terminal 104, wherein the video capture device 102 communicates with the terminal 104 through a network.
- The terminal 104 performs the video splicing method. Specifically, the video capture device 102 transmits to the terminal 104 two videos to be spliced, shot at the same position of the object to be photographed at different times, and the terminal 104 correspondingly obtains the first video and the second video to be spliced; the first video is the forward video of the second video. After acquiring the first video and the second video, the terminal 104 performs still frame detection on the first video or the second video to obtain a still frame sequence; obtains a reference video frame based on the still frame sequence; performs an overlapping area search based on the reference video frame to obtain a first overlapping video frame area in the first video and a second overlapping video frame area in the second video; and splices the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
- The video capture device 102 may be, but is not limited to, any of various devices having a video capture function, and may be located outside the terminal 104 or inside the terminal 104, for example, various cameras and video capture cards located outside the terminal 104.
- The terminal 104 may be, but is not limited to, various cameras, personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
- In one embodiment, a video splicing method is provided. The method is described by taking its application to the terminal in FIG. 1 as an example, and includes the following steps:
- Step 202 Acquire a first video and a second video to be spliced, where the first video precedes the second video.
- the forward video refers to a video obtained from the same shooting position as the second video before shooting the second video.
- For example, cameras take panoramic videos of objects passing through obstacles. Shooting of the first video stops after the object has passed the obstacle by a certain distance. The camera then goes around the obstacle and shoots a second video of the object crossing the obstacle from the other side. The time of shooting the first video is regarded as the first time, and the time of shooting the second video is regarded as the second time; the first time is before the second time. Since the two videos have the same shooting position, that is, a position at a certain distance from the obstacle, the first video is the forward video of the second video.
- the first video and the second video to be spliced need to be acquired first.
- In one case, the terminal can collect video through a connected video capture device, and the capture device transmits the collected video to the terminal in real time; alternatively, when a video acquisition instruction is sent, the capture device transmits its locally stored video to the terminal. Correspondingly, the terminal acquires the first video and the second video to be spliced.
- In another case, the terminal collects the first video and the second video through an internal video capture module and stores them in the terminal memory. When the terminal needs to splice the first video and the second video, it obtains the first video and the second video to be spliced from the memory.
- Step 204 Perform still frame detection on the first video or the second video to obtain a still frame sequence.
- A still frame refers to a video frame, among the video frames of the first video or the second video, in which the picture remains still. The still frame sequence refers to a sequence composed of consecutive still frames in the first video or the second video.
- Specifically, feature point extraction may be performed on the last video frame in the first video and on multiple consecutive video frames before it, followed by feature point matching; if the frames match, the video frame sequence composed of those video frames is determined as the still frame sequence. For example, if the last video frame in the first video is denoted as the first frame and its feature points are extracted and matched against the consecutive n-2 frames before it, the video frame sequence consisting of the last n-1 frames of the first video is determined as a still frame sequence.
- Similarly, feature point extraction may be performed on the first video frame in the second video and on the consecutive video frames following it, followed by feature point matching; if the frames match, the video frame sequence composed of those video frames is determined as the still frame sequence. For example, if the first video frame in the second video is denoted as the first frame and its feature points are extracted and matched against the consecutive n-2 frames after it, the video frame sequence consisting of the first n-1 frames of the second video is determined as a still frame sequence.
- When extracting feature points, the image corresponding to the video frame is first converted into a plane view image. The plane view image may refer to a plane view with a field of view of 90 degrees seen in a certain direction of the panorama; the panorama includes six planes, up, down, front, back, left, and right, and each plane is a plane view. If the plane views match, the video frame whose still status is to be determined is determined as a still frame.
- An ORB (Oriented FAST and Rotated BRIEF) feature point detection method can be used to extract and match feature points in a video frame. Alternatively, SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), or LSD (Line Segment Detection) methods may be used, as in the sketch below.
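- As an illustration, the following sketch shows how still frame detection by feature point matching might look with OpenCV's ORB detector; the helper names `is_still_pair` and `trailing_still_sequence` and the 0.9 match-ratio threshold are assumptions for illustration, not values taken from this application.

```python
# A minimal sketch of still frame detection via ORB feature matching,
# assuming frames are already available as 8-bit grayscale numpy arrays.
# The match-ratio threshold is an illustrative assumption.
import cv2

def is_still_pair(frame_a, frame_b, match_ratio=0.9):
    """Return True if two frames are near-identical by feature matching."""
    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    # Treat the pair as "still" when most feature points match.
    return len(matches) >= match_ratio * min(len(kp_a), len(kp_b))

def trailing_still_sequence(frames):
    """Collect the consecutive frames at the end of the first video that
    match its last frame, i.e. the still frame sequence."""
    last = frames[-1]
    indices = [len(frames) - 1]
    for i in range(len(frames) - 2, -1, -1):
        if not is_still_pair(last, frames[i]):
            break
        indices.append(i)
    return sorted(indices)
```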
- Step 206 Obtain a reference video frame based on the still frame sequence.
- The reference video frame refers to a video frame that can serve as a reference; matching results between other video frames and this frame can be obtained by using it.
- Specifically, the forward video frame of the still frame sequence may be obtained from the first video, or the backward video frame of the still frame sequence may be obtained from the second video. The forward video frame or the backward video frame is considered a non-still video frame, and the non-still video frame can be used as the reference video frame. The forward video frame refers to the first video frame before the still frame sequence in the first video, and the backward video frame refers to the first video frame after the still frame sequence in the second video.
- The OpenCV software library in the terminal is called to extract the forward video frame or the backward video frame. OpenCV is a cross-platform computer vision and machine learning software library released under the BSD license, which can be used to extract video frames.
- A CRC (Cyclic Redundancy Check) method can also be used to perform still frame detection on the first video or the second video: multiple threads are created to compute a CRC check value for each video frame in the first video or the second video. Through the CRC check values, the still frames in the first video or the second video can be identified, the non-still video frame can be obtained from the still frames, and the non-still video frame is used as the reference video frame. It can be understood that other still frame detection methods may also be used to determine the still frames; a CRC sketch is given below.
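- A sketch of such a CRC-based check follows; it assumes frames are numpy arrays, and the thread pool and helper names are illustrative.

```python
# A sketch of CRC-based still frame detection: bit-identical frames yield
# identical CRC values, so runs of equal CRCs mark still frames.
import zlib
from concurrent.futures import ThreadPoolExecutor

def frame_crc(frame):
    # CRC over the frame's raw bytes identifies duplicate frames.
    return zlib.crc32(frame.tobytes())

def still_frame_indices(frames):
    # Multiple threads compute the CRC check value of each frame.
    with ThreadPoolExecutor() as pool:
        crcs = list(pool.map(frame_crc, frames))
    # A frame whose CRC equals its predecessor's is a candidate still frame.
    return [i for i in range(1, len(crcs)) if crcs[i] == crcs[i - 1]]
```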
- Step 208 Search for overlapping areas based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video.
- The overlapping area refers to the video frame area obtained by shooting the same position in the two videos. For example, for two videos shot at different times from the same shooting position, the video frame areas obtained by shooting that position constitute the overlapping area. Specifically, the overlapping area can be determined by using the reference video frame.
- The reference video frame may be matched against all frames in the first video, and the video frame in the first video with the highest matching probability is used as the start frame of the first overlapping video frame area. In the second video, the area corresponding to the video frames that have the same positions as the video frames in the first overlapping video frame area is used as the second overlapping video frame area corresponding to the second video.
- For example, suppose the start frame is denoted as frame P; in the first video, the video frame portion consisting of all video frames after frame P is taken as the overlapping area. In the second video, the video frame area that has the same positions as those video frames is the video frame area between frame C and frame F, and the video frame area between frame C and frame F is regarded as the second overlapping video frame area.
- Alternatively, video frames with a preset number of frames after the reference video frame may be taken to perform matching of the corresponding video frames. For example, m frames after the reference video frame may be taken and compared against the last m frames in the first video; the m frames corresponding to the maximum feature point match count are obtained as the first overlapping video frame area. When m takes different values, the resulting feature point match counts differ, forming a correspondence table between m and the feature point match count; the maximum match count is found from this table, and the m corresponding to that maximum is the number of video frames in the obtained overlapping area. Correspondingly, a second overlapping video frame area corresponding to the second video can be obtained.
- Table 1 shows the correspondence between the preset number of frames and the statistical value of the number of matched feature points; an overlap-search sketch is given below.
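- The overlap search itself might be sketched as follows; `find_matching_frame` is a hypothetical helper that matches the reference frame against every frame of the first video and returns the index with the most feature point matches.

```python
# A sketch of the overlapping area search: the frame of the first video
# with the highest feature-point match count against the reference frame
# is taken as the start of the first overlapping video frame area.
import cv2

def find_matching_frame(reference, frames):
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    kp_r, des_r = orb.detectAndCompute(reference, None)
    if des_r is None:
        return -1
    best_index, best_count = -1, 0
    for i, frame in enumerate(frames):
        kp_f, des_f = orb.detectAndCompute(frame, None)
        if des_f is None:
            continue
        count = len(matcher.match(des_r, des_f))
        if count > best_count:
            best_index, best_count = i, count
    # frames[best_index:] is then the first overlapping video frame area.
    return best_index
```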
- Step 210 Splice the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
- splicing refers to the process of combining two or more videos into a complete video.
- Specifically, the first video and the second video are spliced in the first overlapping video frame area and the second overlapping video frame area to obtain the spliced video.
- Image fusion is performed after aligning the first video and the second video in the first overlapping video frame area and the second overlapping video frame area, so that the first video and the second video form a complete panoramic video. Image fusion methods include linear fusion, Poisson fusion, multi-scale fusion, weighted fusion, and Laplacian pyramid fusion. It can be understood that each video frame can be regarded as a still image, so fusing the aligned video frames in the overlapping video frame area can be regarded as fusing multiple still images. In this embodiment, weighted fusion of the first video and the second video is used to form the complete panoramic video.
- The weight used in the weighted fusion of the first video and the second video may be determined by the current time difference between the current shooting time of the video frame corresponding to the current video frame in the overlapping area and the reference shooting time of the reference video frame. Let Q represent the weight, t1 the current shooting time of the video frame corresponding to the current video frame, t2 the reference shooting time of the reference video frame, and t the total time corresponding to the video frames in the overlapping area; the weight is then calculated from the difference between t1 and t2 and the total time t. In the fusion formula, I represents the fused video frame, I1 represents the current video frame of the first video in the overlapping area, and I2 represents the current video frame of the second video in the overlapping area (see the sketch below).
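- A sketch of the resulting per-frame fusion follows. It assumes a linear cross-fade of the form I = (1 - Q) × I1 + Q × I2 with Q = (t1 - t2) / t; the exact assignment of the weight between the two videos is our reading of the positive correlation described here, since the application's own formula is not reproduced.

```python
# A sketch of time-weighted fusion over the overlapping area, assuming the
# two frame lists are already aligned and of equal size and type.
import cv2

def fuse_overlap(frames1, frames2, fps):
    """frames1/frames2: aligned frames of the first/second overlapping areas."""
    t = len(frames1) / fps                # total time of the overlapping area
    fused = []
    for i, (f1, f2) in enumerate(zip(frames1, frames2)):
        dt = i / fps                      # current time difference t1 - t2
        q = dt / t                        # current fusion weight Q
        # Linear cross-fade: the second video's share grows through the overlap.
        fused.append(cv2.addWeighted(f1, 1.0 - q, f2, q, 0))
    return fused
```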
- The first overlapping video frame area and the second overlapping video frame area are the areas in the first video and the second video, respectively, in which the overlap exists; this area is the overlapping area of the first video and the second video. The terms first overlapping video frame area and second overlapping video frame area are used to distinguish the overlapping video frame areas as they appear in the first video and in the second video.
- In this embodiment, the terminal obtains the first video and the second video to be spliced, performs still frame detection on the first video or the second video to obtain a still frame sequence, and obtains a reference video frame based on the still frame sequence; the first video is captured before the second video. The overlapping area search is performed based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video, and the first video and the second video are spliced based on these areas to obtain a spliced video. Because the splicing is based on the two overlapping video frame areas, the splice between the first video and the second video achieves a natural transition and the video splicing effect is improved.
- In one embodiment, splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain the spliced video includes:
- Step 302 Obtain the position of the spliced video frame, obtain the first spliced video frame corresponding to the position of the spliced video frame from the first overlapping video frame area, and obtain the second spliced video frame corresponding to the position of the spliced video frame from the second overlapping video frame area.
- The spliced video frame position refers to a video frame position at which the first video and the second video can be spliced. For example, if the spatial position of the 100th frame in the first video is position S, and the spatial position of the 10th frame in the second video is also position S, then the splicing position is position S, and the corresponding 100th frame of the first video and 10th frame of the second video can be regarded as the spliced video frame position.
- Specifically, the splicing may be performed by acquiring the spliced video frame positions of the two videos. The center video frame of the overlapping area can be selected as the spliced video frame position; after the spliced video frame position is obtained, the first spliced video frame corresponding to that position in the first video and the second spliced video frame corresponding to that position in the second video can be determined. The center video frame is the video frame located in the middle of the video frame sequence. For example, if the video frame sequence contains 5 video frames at positions {1, 2, 3, 4, 5}, the video frame at position 3 is the video frame in the middle of the sequence.
- Alternatively, the number of matching points between each pair of aligned video frames is calculated, and the video frame pair with the largest number of matching points is used as the spliced video frame position.
- Step 304 Determine the spatial transformation relationship between the first spliced video frame and the second spliced video frame, and perform video frame alignment on the first video and the second video based on the spatial transformation relationship.
- The spatial transformation relationship refers to the transformation relationship between the first spliced video frame and the second spliced video frame, such as rotation, translation, or zooming in and out. Due to the shooting angle and other factors, there may be a certain angle between the first spliced video and the second spliced video, and therefore also between the first spliced video frame and the second spliced video frame. Only after the spatial transformation relationship between the spliced video frames has been determined and the video frames of the first video and the second video have been aligned based on it can the splicing of the first spliced video frame and the second spliced video frame be completed. Specifically, the images captured at different angles are all converted to the same viewing angle, the frame-to-frame spatial transformation relationship is obtained, and video frame alignment is performed on the first video and the second video based on that relationship, as in the sketch below.
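- Under the anti-shake assumption discussed later (only the heading angle differs between frames, i.e. pure horizontal translation), alignment of equirectangular panoramic frames might be sketched as a cyclic horizontal shift; the helper names are illustrative.

```python
# A sketch of horizontal video frame alignment for equirectangular panoramic
# frames, where a heading change is a cyclic shift along the width axis.
import numpy as np

def align_horizontal(frame, dx):
    """Cyclically shift a panoramic frame by dx pixels along the x axis."""
    return np.roll(frame, shift=int(round(dx)), axis=1)

def align_video(frames, dx):
    # dx is the horizontal transformation value estimated between the
    # first and second spliced video frames.
    return [align_horizontal(f, dx) for f in frames]
```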
- Step 306 Splice video frames based on the aligned first video frames and second video frames to obtain the spliced video, wherein, during splicing, the first overlapping video frame area and the second overlapping video frame area are fused to obtain fused video frames.
- A fused video frame is obtained by extracting, from the first video frame and the second video frame, the information that can enhance image quality, and synthesizing that high-quality image information into a single video frame.
- In this embodiment, the spatial transformation relationship between the corresponding first spliced video frame and second spliced video frame is determined at the spliced video frame position, so that video frame alignment is performed on the first video and the second video and fusion is performed to obtain fused video frames. This splices the first video and the second video accurately, so that the spliced videos achieve a natural transition and the splicing effect is improved.
- In one embodiment, the spatial transformation relationship includes a horizontal transformation value. Determining the spatial transformation relationship between the first spliced video frame and the second spliced video frame, and performing video frame alignment on the first video and the second video based on the spatial transformation relationship, includes:
- Step 402 Obtain a first feature point of the first spliced video frame and a second feature point of the second spliced video frame.
- A feature point refers to a point that reflects an essential feature in a video frame image; the target object in the image can be identified through such features. The distance between two video frames can be calculated from the distances between the feature points in the two video frames.
- An anti-shake panoramic video refers to a panoramic video that has been stabilized using data recorded by inertial sensors and accelerometers; the horizon in the anti-shake video is kept essentially at the horizontal centerline of the panoramic video frame. The pitch angle and roll angle between anti-shake panoramic video frames shot at the same position are essentially 0. It can be understood that, in an anti-shake panoramic video, only the heading angle differs between frames shot at the same position at different times, that is, there is only horizontal translation.
- the ORB feature point detection method or the SIFT feature point detection method may be used to directly extract the first feature point of the first spliced video frame and the second feature point of the second spliced video frame.
- Alternatively, the panorama image corresponding to the panoramic video may first be converted into a plane view, and then the ORB feature point detection method may be used to extract the first feature point of the first spliced video frame and the second feature point of the second spliced video frame. The plane view may refer to a plane view with a field of view of 90 degrees seen in a certain direction of the panorama; the panorama includes six planes, up, down, front, back, left, and right, and each plane is a plane view, for example the front view, rear view, left view, right view, top view, and bottom view.
- For example, the panorama corresponding to the panoramic video can be converted into a bottom view among the plane views, and feature points are then extracted; the panorama is transformed through a rotation matrix to obtain the image transformation from the panorama image to the bottom-view image. A panorama refers to an image whose viewing angle covers plus and minus 180 degrees horizontally and plus and minus 90 degrees vertically; if the panorama is regarded as an image on the faces of a cube, the image can be considered to completely include the up, down, front, back, left, and right views.
- Step 404 Determine the horizontal distance between the first feature point and the second feature point.
- the horizontal distance refers to the difference between the coordinates of the first feature point in the horizontal direction and the coordinates of the second feature point in the horizontal direction.
- Let the horizontal distance between the first feature point and the second feature point be denoted Δx, the horizontal coordinate of the first feature point be denoted x_p1, and the horizontal coordinate of the second feature point be denoted x_p2. The horizontal distance Δx between the first feature point and the second feature point can then be calculated by the following formula: Δx = x_p1 - x_p2.
- Step 406 Determine the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
- The horizontal transformation value refers to the horizontal difference between the first spliced video frame and the second spliced video frame obtained from the horizontal distance. Different horizontal transformation values are obtained depending on the value range in which the horizontal distance falls, and the horizontal transformation value dx can be expressed as a piecewise function of the horizontal distance.
- The horizontal transformation value between the first spliced video frame and the second spliced video frame can be obtained from a statistical value of the individual horizontal transformation values. For example, the average of the horizontal transformation values may be used as the horizontal transformation value between the first spliced video frame and the second spliced video frame. The statistical value may also be obtained by sorting the horizontal transformation values, from largest to smallest or from smallest to largest, and taking the value located in the middle of the sorted order. Other methods can also be used, such as the weighted average or the mode of the horizontal transformation values. A sketch follows below.
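- The estimation might be sketched as follows, using the median (the middle value after sorting) as the statistical value; the function name is illustrative.

```python
# A sketch of estimating the horizontal transformation value: match feature
# points between the two spliced frames, compute x_p1 - x_p2 per match, and
# take the median of those horizontal distances.
import cv2
import numpy as np

def horizontal_transform(frame1, frame2):
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(frame1, None)
    kp2, des2 = orb.detectAndCompute(frame2, None)
    if des1 is None or des2 is None:
        return 0.0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if not matches:
        return 0.0
    # Horizontal distance for each matched feature point pair.
    dxs = [kp1[m.queryIdx].pt[0] - kp2[m.trainIdx].pt[0] for m in matches]
    return float(np.median(dxs))  # statistical value of the transformation
```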
- In this embodiment, the first feature point of the first spliced video frame and the second feature point of the second spliced video frame are acquired, the horizontal distance between them is obtained, and the horizontal transformation value between the first spliced video frame and the second spliced video frame is determined from the horizontal distance. This achieves the purpose of accurately determining the spatial transformation relationship between the first spliced video frame and the second spliced video frame, and the accurate spatial transformation relationship improves the video splicing effect.
- the step of fusing the first overlapping video frame region and the second overlapping video frame region to obtain a fused video frame includes:
- Step 502 Acquire the current video frame to be fused from the first overlapping video frame area.
- Specifically, the current video frame to be fused needs to be acquired first from the first overlapping video frame area.
- The video is read through the OpenCV software library, and each frame in the video is extracted; this can be implemented using the video capture structures in the OpenCV software library. For example, the video capture structures VideoCapture and Mat are used to acquire the video, and the same structures can be used to acquire the current video frame to be fused, where filename represents the video file and frame represents the video frame to be acquired, as in the sketch below.
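- In the Python binding the same reading might be sketched as follows (frames are returned as numpy arrays rather than Mat objects):

```python
# A sketch of reading video frames with OpenCV's VideoCapture.
import cv2

def read_frames(filename):
    cap = cv2.VideoCapture(filename)   # filename represents the video file
    frames = []
    while True:
        ok, frame = cap.read()         # frame is one video frame acquired
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```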
- Step 504 Obtain the current time difference between the current shooting time of the video frame corresponding to the current video frame and the reference shooting time of the reference video frame.
- After the current video frame to be fused is acquired from the first overlapping video frame area of the first video, the current video frame has a corresponding shooting time (since the frames are shot at different times), and the reference video frame likewise has a corresponding shooting time; the current time difference can be obtained from these two shooting times. The shooting time can be represented by a timestamp, which can be expressed as the frame number, or as the frame number divided by the frame rate; either representation uniquely determines the shooting time corresponding to the video frame. Specifically, the current time difference may be obtained as the difference between the current shooting time of the video frame corresponding to the current video frame and the reference shooting time of the reference video frame.
- Let t1 represent the current shooting time of the video frame corresponding to the current video frame, and t2 represent the reference shooting time of the reference video frame; then Δt represents the current time difference between them, that is, Δt = t1 - t2.
- Step 506 Obtain the current fusion weight corresponding to the current video frame based on the current time difference, wherein the current time difference and the current fusion weight are positively correlated.
- The fusion weight refers to the proportion assigned to the current video frame in the image fusion process. A positive correlation means that the current fusion weight and the current time difference increase or decrease together: when the current time difference increases, the current fusion weight also increases, and when the current time difference decreases, the current fusion weight also decreases. Specifically, the current fusion weight can be obtained according to this positive correlation between the current time difference and the current fusion weight.
- Step 508 fuse the current video frame with the video frame at the corresponding position of the second overlapping video frame region based on the current fusion weight to obtain the current fused video frame.
- Specifically, the first video and the second video are fused using the current fusion weight to obtain a higher-quality spliced video: image fusion is performed, using the current fusion weight, on the video frame in the first overlapping video frame area of the first video and the video frame at the corresponding position in the second overlapping video frame area of the second video, to obtain the fused image. As before, the first and second overlapping video frame areas are the areas in the first and second videos where the overlap exists, and the two terms distinguish the overlapping video frame areas as they appear in each video.
- In this embodiment, the current fusion weight is obtained from the current time difference, and the current video frame is fused, based on the current fusion weight, with the video frame at the corresponding position in the second overlapping video frame area to obtain the current fused video frame, which achieves the purpose of obtaining a complete video with a natural transition effect.
- In one embodiment, obtaining the current fusion weight corresponding to the current video frame based on the current time difference includes:
- Step 602 Obtain the overlapping area time length corresponding to the overlapping video frame area.
- the time length of the overlapping area refers to the video time length corresponding to the video frames of the overlapping area. For example, if the video length of the overlapping area is 600 milliseconds, the time length of the overlapping area is 600 milliseconds.
- one of the parameters in the calculation of the current fusion weight is the time length of the overlapping area, and one of the parameters for calculating the current fusion weight can be determined by obtaining the time length of the overlapping area.
- Specifically, the time length of the overlapping area can be obtained from the total number of frames in the overlapping area and the video frame rate. Let b represent the total number of frames in the overlapping area, v represent the frame rate, and t represent the time length of the overlapping area; then t = b / v.
- Step 604 Calculate the ratio of the current time difference to the time length of the overlapping area to obtain the current fusion weight.
- Let t1 - t2 represent the current time difference, t represent the time length of the overlapping area, and Q represent the current fusion weight; then Q = (t1 - t2) / t, as in the sketch below.
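- Numerically, the two formulas combine as in the following sketch; the function name is illustrative.

```python
# A sketch combining t = b / v and Q = (t1 - t2) / t.
def fusion_weight(frame_index, total_frames, fps):
    t = total_frames / fps      # overlapping area time length
    dt = frame_index / fps      # current time difference t1 - t2
    return dt / t               # current fusion weight Q

# For example, a 600 ms overlap at 25 fps is 15 frames; the 5th frame
# gets Q = (5 / 25) / (15 / 25) = 1 / 3.
```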
- In this embodiment, the ratio of the current time difference to the overlapping area time length accurately yields the current fusion weight, so that when the first video and the second video are spliced through the overlapping area, the video frames in the overlapping area can be fused using the current fusion weight, improving the video splicing effect.
- the overlapping area search is performed based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video, including:
- Step 702 Compare the reference video frame with each video frame in the first video to obtain a matching video frame in the first video that matches the reference video frame.
- the matching video frame refers to a video frame in the first video that can satisfy the matching condition with the reference video frame.
- the video frame with the largest number of matching points with the reference video frame among the video frames in the first video may be used as the matching video frame.
- The reference video frame is in the second video; the backward video frame of the still frame sequence is taken as the first non-still frame, in order to avoid video quality problems such as freezing or stillness.
- The reference video frame is used as the video frame to be compared, and the video frame in the first video with the highest matching rate with the reference video frame is selected as the matching video frame.
- The matching rate may be the ratio of the number of matched feature points to the total number of feature points. For example, if the number of feature points matched between the reference video frame and a certain video frame in the first video is 1000, and the total number of feature points is 1500, the matching rate is 1000/1500, approximately 67%.
- Step 704 The tail video frame area corresponding to the matching video frame is used as the first overlapping video frame area corresponding to the first video.
- The tail video frame area refers to the video frame area from the matching video frame to the last video frame of the first video. For example, if the matching video frame is P, the tail video frame area consists of the video frames after frame P in the first video.
- Specifically, the video frames after the matching video frame in the first video may be taken as the first overlapping video frame area corresponding to the first video.
- Step 706 The reference video frame area where the reference video frame is located in the second video is taken as the second overlapping video frame area corresponding to the second video; the reference video frame is the head video frame of the reference video frame area, and the reference video frame area matches the number of video frames in the tail video frame area.
- The head video frame refers to the first video frame in a video frame area.
- the reference video frame area is the second overlapping video frame area in the second video
- the trailing video frame area is the first overlapping video frame area in the first video
- It can be understood that, after video fusion, the first overlapping video frame area and the second overlapping video frame area form a single overlapping video frame area. The video frames before this overlapping video frame area are those of the first video, and the video frames after it are those of the second video.
- In this embodiment, the matching video frame can be obtained from the reference video frame, the first overlapping video frame area can be obtained from the matching video frame, and the second overlapping video frame area can be obtained correspondingly in the second video, so that the overlapping video frame area is determined accurately.
- Video fusion can then be performed on the overlapping video frame area, realizing a natural splice of the first video and the second video within it; a sketch of the search follows below.
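- A minimal sketch of this search, reusing the matching_rate sketch above (an illustration under stated assumptions, not the prescribed implementation):

```python
def find_overlap_areas(first_frames: list, second_frames: list, ref_idx: int):
    """Locate the matching video frame in the first video and derive both
    overlapping areas (steps 702-706). ref_idx indexes the reference
    video frame inside the second video."""
    ref_frame = second_frames[ref_idx]
    rates = [matching_rate(ref_frame, f) for f in first_frames]
    match_idx = max(range(len(rates)), key=rates.__getitem__)
    first_overlap = first_frames[match_idx:]            # tail area of the first video
    second_overlap = second_frames[ref_idx:ref_idx + len(first_overlap)]
    return first_overlap, second_overlap
```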
- In another embodiment, performing the overlapping area search based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video includes:
- Step 802 Acquire a tail video frame sequence with a preset number of frames in the first video as a first overlapping video frame region corresponding to the first video.
- The preset number of frames refers to a predetermined number of video frames, which determines how many video frames the acquired tail video frame sequence contains. For example, if the preset number of frames is m, the acquired tail video frame sequence includes m video frames.
- The preset number of frames may be determined empirically through repeated testing.
- The total number of video frames may also be used as a reference: based on empirical values, video frame regions corresponding to several candidate preset frame counts may be tried as the first overlapping video frame region corresponding to the first video.
- Step 804 Obtain a matching video frame sequence matching the tail video frame sequence from the backward video frame sequence corresponding to the reference video frame, and use the matching video frame sequence as the second overlapping video frame region corresponding to the second video.
- the backward video frame sequence refers to a sequence composed of video frames following the reference video frame.
- the sequence may include some video frames following the reference video frame, or may include all video frames following the reference video frame.
- Video frames with the preset number of frames after the reference video frame may be taken as candidates. For example, take m frames after the reference video frame and compare them with the last m frames in the first video, counting the total number of feature point matches. Different values of m yield different match-count statistics, forming a correspondence table between the value m and the feature point match count statistic (see the table under Description below); the m corresponding to the maximum statistic in the table is the number of video frames in the obtained overlapping area. A sketch follows below.
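- A minimal sketch of building that table and picking the best m (count_feature_matches is a hypothetical pairwise match counter, e.g. the length of the match list in the matching_rate sketch above; the candidate values mirror the example table):

```python
def best_overlap_length(first_frames: list, second_frames: list,
                        ref_idx: int, candidates=(20, 30, 35, 40)):
    """For each candidate m, count feature matches between the last m
    frames of the first video and the m frames after the reference
    frame, then return the m with the maximum total."""
    table = {}
    for m in candidates:
        tail = first_frames[-m:]
        head = second_frames[ref_idx:ref_idx + m]
        table[m] = sum(count_feature_matches(a, b) for a, b in zip(tail, head))
    best_m = max(table, key=table.get)
    return best_m, table
```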
- In this embodiment, the first overlapping video frame region is obtained by taking a tail video frame sequence with a preset number of frames from the first video, and a matching video frame sequence that matches this tail sequence is then obtained and used as the second overlapping video frame area corresponding to the second video.
- This achieves the purpose of accurately determining the overlapping video frame area, so that the first video and the second video are spliced within it, transition naturally during the splicing process, and the video splicing effect is improved.
- performing still frame detection on the first video or the second video to obtain a still frame sequence includes:
- A plane video is a video composed of individual plane view images.
- the plane view image may refer to a plane view with a field of view of 90 degrees seen in a certain direction of the panorama.
- the panorama includes six planes, up, down, front, back, left, and right, and each plane is a plane view.
- The plane views thus include the front, rear, left, right, top, and bottom views.
- For example, the panorama corresponding to the panoramic video can be converted into the bottom view among the plane views of the plane video: the panorama is image-transformed through a rotation matrix to obtain the bottom view image, and feature points are then extracted from it.
- A panorama refers to an image whose viewing angle covers plus/minus 180 degrees horizontally and plus/minus 90 degrees vertically; if the panorama is regarded as an image wrapped onto a cube, it can be considered to completely cover the up, down, front, back, left, and right directions.
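- A minimal sketch of extracting such a bottom plane view from an equirectangular panorama (the 90-degree field of view and the OpenCV remap approach are illustrative choices, not mandated by the description):

```python
import numpy as np
import cv2

def panorama_bottom_view(pano: np.ndarray, size: int = 512) -> np.ndarray:
    """Project an equirectangular panorama onto the bottom cube face,
    i.e. a plane view with a 90-degree field of view looking down."""
    h, w = pano.shape[:2]
    half = np.tan(np.radians(90.0) / 2.0)      # = 1.0 for a 90-degree FOV
    u = np.linspace(-half, half, size)
    uu, vv = np.meshgrid(u, u)
    x, y, z = uu, -np.ones_like(uu), vv        # rays toward the nadir (-y)
    norm = np.sqrt(x * x + y * y + z * z)
    lon = np.arctan2(x / norm, z / norm)       # longitude in [-pi, pi]
    lat = np.arcsin(y / norm)                  # latitude in [-pi/2, 0]
    map_x = ((lon / (2 * np.pi) + 0.5) * w).astype(np.float32)
    map_y = ((0.5 - lat / np.pi) * h).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)
```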
- the still frame detection is performed on the plane video, and the still frame sequence is obtained.
- Specifically, the first plane video and the second plane video may be acquired respectively; feature points are extracted from the last video frame of the first plane video and, in sequence, from the consecutive video frames before it, and feature point matching is performed.
- When the matching results satisfy the threshold condition, the video frame sequence composed of these video frames is determined as the still frame sequence.
- For example, denote the last video frame in the first plane video as the first frame; feature points are extracted from it and matched against the n-2 consecutive frames before it. If the matching results meet the threshold condition, the video frame sequence consisting of the last n-1 frames in the first plane video is determined as a still frame sequence.
- Similarly, feature point extraction may be performed on the first video frame in the second plane video and, in sequence, on the multiple consecutive video frames after it, followed by feature point matching.
- When the matching results satisfy the threshold condition, the video frame sequence composed of these video frames is determined as the still frame sequence.
- For example, denote the first video frame in the second plane video as the first frame; feature points are extracted from it and matched against the n-2 consecutive frames after it. If the matching results meet the threshold condition, the video frame sequence composed of the first n-1 frames in the second plane video is determined as the still frame sequence.
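- A minimal sketch of this detection (the 0.9 threshold is an assumed value; matching_rate is the sketch given earlier):

```python
def detect_still_frames(frames: list, from_tail: bool, threshold: float = 0.9) -> list:
    """Walk outward from the anchor frame (last frame of the first plane
    video, or first frame of the second) and collect consecutive frames
    whose matching rate with the anchor stays above the threshold."""
    seq = frames[::-1] if from_tail else frames
    anchor, still = seq[0], [seq[0]]
    for frame in seq[1:]:
        if matching_rate(anchor, frame) < threshold:
            break
        still.append(frame)
    return still[::-1] if from_tail else still
```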
- In an application scenario, the camera is first placed at a position that the tail segment of the first video (the earlier video) passes through, kept still for a period of time, and then moved along the motion path of that tail segment to start capturing the second video (the later video).
- Both the first video and the second video therefore contain an overlapping area along this shared tail path. Suppose the video frame at which the second video starts to move is frame A, and the end position of the overlapping area is frame B.
- Correspondingly, the overlapping video frames of the first video also lie between frame A and frame B.
- The video frame at the central position between frame A and frame B can be taken as the spliced video frame position, or another video frame between frame A and frame B can be used; the spliced video frame position is then used to complete the splicing of the first video and the second video.
- If the spliced video frame position is denoted frame C, frame C may be several video frames after frame A; for example, frame C may be the video frame five frames after frame A.
- In the following, a panoramic video of an object passing through an obstacle, shot by an anti-shake camera, is taken as an example for description.
- The camera first shoots the first video (the earlier video) of the object approaching the obstacle.
- The camera then bypasses the obstacle and shoots the second video (the later video) of the object passing through the obstacle from the other side, starting from the position where the shooting of the first video ended.
- Because there is a certain delay before shooting starts, the second video appears still at its starting moment.
- The second video continues to be shot along the route through the obstacle that was shot in the first video, so that the two videos share the same shooting route, that is, an overlapping path; this overlapping path is used to connect the two videos.
- In this scenario, the transformation relationship between the two videos is a horizontal translation relationship.
- The first video and the second video are aligned along the overlapping path, and an image fusion method is used to fuse the video frame images, so that the first video and the second video transition naturally and the splicing of the first video and the second video is realized.
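- A minimal sketch of this alignment and fusion, assuming the horizontal translation dx has already been estimated (e.g. as the median horizontal offset between matched feature points), that second_frames begins at the reference video frame, and using a frame-index-based weight in place of shooting-time differences; np.roll wraps pixels around the frame edge, which suits panoramic frames:

```python
import numpy as np

def splice_videos(first_frames: list, second_frames: list,
                  n_overlap: int, dx: int) -> list:
    """Horizontally shift the second video by dx pixels, then cross-fade
    the two videos over their n_overlap shared frames."""
    aligned = [np.roll(f, dx, axis=1) for f in second_frames]
    out = list(first_frames[:-n_overlap])
    for i in range(n_overlap):
        q = (i + 1) / n_overlap        # fusion weight grows through the overlap
        a = first_frames[len(first_frames) - n_overlap + i].astype(np.float32)
        b = aligned[i].astype(np.float32)
        out.append(((1.0 - q) * a + q * b).astype(np.uint8))
    out.extend(aligned[n_overlap:])
    return out
```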
- a server is provided, and the server is configured to execute the steps in the foregoing method embodiments.
- the server can be implemented as an independent server or a server cluster composed of multiple servers.
- Although the steps in the flowcharts of FIGS. 2-8 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-8 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily executed and completed at the same time but may be executed at different times, and their execution order is not necessarily sequential: they may be performed in turn, or alternately with other steps or with at least part of the sub-steps or stages within other steps.
- In one embodiment, a video splicing apparatus 1000 is provided, including: a first video and second video acquisition module 1002, a still frame sequence obtaining module 1004, a reference video frame obtaining module 1006, a first overlapping video frame area and second overlapping video frame area obtaining module 1008, and a spliced video obtaining module 1010, wherein:
- the first video and second video acquisition module 1002 is used to acquire the first video and the second video to be spliced, the first video being before the second video;
- the still frame sequence obtaining module 1004 is used to perform still frame detection on the first video or the second video to obtain a still frame sequence;
- the reference video frame obtaining module 1006 is used to obtain a reference video frame based on the still frame sequence;
- the first overlapping video frame area and the second overlapping video frame area obtaining module 1008 is used to search the overlapping area based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and The second overlapping video frame area corresponding to the second video;
- the spliced video obtaining module 1010 is used for splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
- In one embodiment, the spliced video obtaining module 1010 is configured to: obtain a spliced video frame position; obtain the first spliced video frame corresponding to the spliced video frame position from the first overlapping video frame area, and obtain the second spliced video frame corresponding to the spliced video frame position from the second overlapping video frame area;
- determine the spatial transformation relationship between the first spliced video frame and the second spliced video frame, and perform video frame alignment on the first video and the second video based on the spatial transformation relationship;
- and splice the aligned first video frames and second video frames to obtain the spliced video, wherein, during splicing, the first overlapping video frame area and the second overlapping video frame area are fused to obtain fused video frames.
- In one embodiment, the spliced video obtaining module 1010 is configured to obtain the first feature points of the first spliced video frame and the second feature points of the second spliced video frame; determine the horizontal distance between the first feature points and the second feature points; and determine the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
- In one embodiment, the spliced video obtaining module 1010 is used to: obtain the current video frame to be fused from the first overlapping video frame area; obtain the current time difference between the current shooting time of the current video frame and the reference shooting time of the reference video frame; obtain the current fusion weight corresponding to the current video frame based on the current time difference, the current time difference being positively correlated with the current fusion weight; and fuse, based on the current fusion weight, the current video frame with the video frame at the corresponding position in the second overlapping video frame area to obtain the current fused video frame.
- the spliced video obtaining module 1010 is configured to obtain the overlapping area time length corresponding to the overlapping video frame area; calculate the ratio of the current time difference to the overlapping area time length to obtain the current fusion weight.
- In one embodiment, the first overlapping video frame area and second overlapping video frame area obtaining module 1008 is configured to: compare the reference video frame with each video frame in the first video respectively, to obtain a matching video frame in the first video that matches the reference video frame;
- take the tail video frame area corresponding to the matching video frame as the first overlapping video frame area corresponding to the first video;
- and take the reference video frame area in which the reference video frame is located in the second video as the second overlapping video frame area corresponding to the second video,
- wherein the reference video frame is the head video frame of the reference video frame area, and the reference video frame area matches the tail video frame area in the number of video frames.
- In one embodiment, the first overlapping video frame area and second overlapping video frame area obtaining module 1008 is configured to obtain a tail video frame sequence with a preset number of frames from the first video as the first overlapping video frame area corresponding to the first video; and to obtain, from the backward video frame sequence corresponding to the reference video frame, a matching video frame sequence that matches the tail video frame sequence, taking the matching video frame sequence as the second overlapping video frame area corresponding to the second video.
- the still frame sequence obtaining module 1004 is configured to convert the first video or the second video into a planar video; perform still frame detection on the planar video to obtain a still frame sequence.
- Each module in the above video splicing apparatus can be implemented in whole or in part by software, hardware and combinations thereof.
- the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
- a computer device is provided, and the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 11 .
- The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus.
- the processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium, an internal memory.
- the nonvolatile storage medium stores an operating system and a computer program.
- the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
- the communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by WIFI, operator network, NFC (Near Field Communication) or other technologies.
- the computer program when executed by the processor, implements a video stitching method.
- The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a button, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, trackpad, or mouse.
- FIG. 11 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
- a computer device including a memory and a processor, where a computer program is stored in the memory, and when the processor executes the computer program, the steps in the foregoing method embodiments are implemented.
- a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps in the foregoing method embodiments.
- Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like.
- Volatile memory may include random access memory (RAM) or external cache memory.
- the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
Description
The correspondence table between the value m (the preset number of frames) and the feature point match count statistic; the maximum statistic (1200, at m = 30) determines the number of video frames in the overlapping area:

m value (preset frame count) | Feature point match count statistic |
---|---|
20 | 1000 |
30 | 1200 |
35 | 900 |
40 | 1100 |
Claims (11)
- A video splicing method, characterized in that the method comprises: acquiring a first video and a second video to be spliced, the first video being before the second video; performing still frame detection on the first video or the second video to obtain a still frame sequence; obtaining a reference video frame based on the still frame sequence; performing an overlapping area search based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video; and splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
- The method according to claim 1, characterized in that splicing the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain the spliced video comprises: acquiring a spliced video frame position, acquiring a first spliced video frame corresponding to the spliced video frame position from the first overlapping video frame area, and acquiring a second spliced video frame corresponding to the spliced video frame position from the second overlapping video frame area; determining a spatial transformation relationship between the first spliced video frame and the second spliced video frame, and performing video frame alignment on the first video and the second video based on the spatial transformation relationship; and performing video frame splicing based on the aligned first video frames and second video frames to obtain the spliced video, wherein, during splicing, the first overlapping video frame area and the second overlapping video frame area are fused to obtain fused video frames.
- The method according to claim 2, characterized in that the spatial transformation relationship comprises a horizontal transformation value, and determining the spatial transformation relationship between the first spliced video frame and the second spliced video frame and performing video frame alignment on the first video and the second video based on the spatial transformation relationship comprises: acquiring first feature points of the first spliced video frame and second feature points of the second spliced video frame; determining a horizontal distance between the first feature points and the second feature points; and determining the horizontal transformation value between the first spliced video frame and the second spliced video frame based on the horizontal distance.
- The method according to claim 2, characterized in that the step of fusing the first overlapping video frame area and the second overlapping video frame area to obtain the fused video frames comprises: acquiring a current video frame to be fused from the first overlapping video frame area; acquiring a current time difference between a current shooting time of the current video frame and a reference shooting time of the reference video frame; obtaining a current fusion weight corresponding to the current video frame based on the current time difference, wherein the current time difference is positively correlated with the current fusion weight; and fusing, based on the current fusion weight, the current video frame with a video frame at a corresponding position in the second overlapping video frame area to obtain a current fused video frame.
- The method according to claim 4, characterized in that obtaining the current fusion weight corresponding to the current video frame based on the current time difference comprises: acquiring an overlapping area time length corresponding to the overlapping video frame area; and calculating a ratio of the current time difference to the overlapping area time length to obtain the current fusion weight.
- The method according to claim 1, characterized in that performing the overlapping area search based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video comprises: comparing the reference video frame with each video frame in the first video respectively to obtain a matching video frame in the first video that matches the reference video frame; taking a tail video frame area corresponding to the matching video frame as the first overlapping video frame area corresponding to the first video; and taking a reference video frame area in which the reference video frame is located in the second video as the second overlapping video frame area corresponding to the second video, wherein the reference video frame is the head video frame of the reference video frame area, and the reference video frame area matches the tail video frame area in the number of video frames.
- The method according to claim 1, characterized in that performing the overlapping area search based on the reference video frame to obtain the first overlapping video frame area corresponding to the first video and the second overlapping video frame area corresponding to the second video comprises: acquiring a tail video frame sequence with a preset number of frames from the first video as the first overlapping video frame area corresponding to the first video; and acquiring, from a backward video frame sequence corresponding to the reference video frame, a matching video frame sequence that matches the tail video frame sequence, and taking the matching video frame sequence as the second overlapping video frame area corresponding to the second video.
- The method according to claim 1, characterized in that performing still frame detection on the first video or the second video to obtain the still frame sequence comprises: converting the first video or the second video into a plane video; and performing still frame detection on the plane video to obtain the still frame sequence.
- A video splicing apparatus, characterized in that the apparatus comprises: a first video and second video acquisition module, configured to acquire a first video and a second video to be spliced, the first video being before the second video; a still frame sequence obtaining module, configured to perform still frame detection on the first video or the second video to obtain a still frame sequence; a reference video frame obtaining module, configured to obtain a reference video frame based on the still frame sequence; a first overlapping video frame area and second overlapping video frame area obtaining module, configured to perform an overlapping area search based on the reference video frame to obtain a first overlapping video frame area corresponding to the first video and a second overlapping video frame area corresponding to the second video; and a spliced video obtaining module, configured to splice the first video and the second video based on the first overlapping video frame area and the second overlapping video frame area to obtain a spliced video.
- A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.
- A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22758918.1A EP4300982A4 (en) | 2021-02-26 | 2022-02-24 | VIDEO COLLECTION METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIUM |
JP2023550696A JP2024506109A (ja) | 2021-02-26 | 2022-02-24 | Video splicing method and apparatus, computer device, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110217098.6 | 2021-02-26 | ||
CN202110217098.6A CN114979758B (zh) | 2021-02-26 | 2021-02-26 | Video splicing method and apparatus, computer device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022179554A1 (zh) | 2022-09-01 |
Family
ID=82973207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/077635 WO2022179554A1 (zh) | Video splicing method and apparatus, computer device and storage medium | 2021-02-26 | 2022-02-24 |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP4300982A4 (zh) |
JP (1) | JP2024506109A (zh) |
CN (1) | CN114979758B (zh) |
WO (1) | WO2022179554A1 (zh) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101668130A (zh) * | 2009-09-17 | 2010-03-10 | 深圳市启欣科技有限公司 | Television splicing wall segmentation compensation system |
- 2021
- 2021-02-26: CN CN202110217098.6A → CN114979758B (zh), status: Active
- 2022
- 2022-02-24: JP JP2023550696A → JP2024506109A (ja), status: Pending
- 2022-02-24: WO PCT/CN2022/077635 → WO2022179554A1 (zh), status: Application Filing
- 2022-02-24: EP EP22758918.1A → EP4300982A4 (en), status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101304490A (zh) * | 2008-06-20 | 2008-11-12 | 北京六维世纪网络技术有限公司 | Method and device for splicing videos |
US20160286138A1 (en) * | 2015-03-27 | 2016-09-29 | Electronics And Telecommunications Research Institute | Apparatus and method for stitching panoramaic video |
US20190028643A1 (en) * | 2016-02-29 | 2019-01-24 | Sony Corporation | Image processing device, display device, reproduction control method, and image processing system |
CN111294644A (zh) * | 2018-12-07 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Video splicing method and apparatus, electronic device, and computer storage medium |
Non-Patent Citations (1)
Title |
---|
See also references of EP4300982A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP4300982A4 (en) | 2024-07-03 |
CN114979758B (zh) | 2023-03-21 |
JP2024506109A (ja) | 2024-02-08 |
EP4300982A1 (en) | 2024-01-03 |
CN114979758A (zh) | 2022-08-30 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22758918; Country of ref document: EP; Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 2023550696; Country of ref document: JP |
WWE | Wipo information: entry into national phase | Ref document number: 2022758918; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2022758918; Country of ref document: EP; Effective date: 20230926 |