CN113099266A - Video fusion method, system, medium and device based on unmanned aerial vehicle POS data - Google Patents

Video fusion method, system, medium and device based on unmanned aerial vehicle POS data

Info

Publication number
CN113099266A
CN113099266A (application number CN202110361880.5A; granted as CN113099266B)
Authority
CN
China
Prior art keywords
video
image data
data
spliced
pos
Prior art date
Legal status
Granted
Application number
CN202110361880.5A
Other languages
Chinese (zh)
Other versions
CN113099266B (en)
Inventor
曾靖杰
刘盛中
侯永顺
Current Assignee
Yuncong Technology Group Co Ltd
Original Assignee
Yuncong Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Yuncong Technology Group Co Ltd filed Critical Yuncong Technology Group Co Ltd
Priority to CN202110361880.5A
Publication of CN113099266A
Application granted
Publication of CN113099266B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25808Management of client data
    • H04N21/25841Management of client data involving the geographical location of the client
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508Management of client data or end-user data
    • H04N21/4524Management of client data or end-user data involving the geographical location of the client
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Abstract

The invention belongs to the technical field of video data processing, and particularly relates to a video fusion method, system, medium and device based on unmanned aerial vehicle (drone) POS data. The invention aims to solve the problem that a large amount of capital and labor cost is consumed in post-production editing, caused by the drone's large shooting time span and the changes in its shooting position and angle at different moments. To this end, before video fusion, each second POS data of the video to be spliced is matched against the first POS data of the initial video at a selected first video frame position, so that a second video frame position is determined and the initial video and the video to be spliced, shot at different times, are spliced at video frame positions showing the same scene. In this way, videos shot by the drone can be spliced or fused more efficiently and accurately.

Description

Video fusion method, system, medium and device based on unmanned aerial vehicle POS data
Technical Field
The invention belongs to the technical field of video data processing, and particularly relates to a video fusion method, a system, a medium and a device based on unmanned aerial vehicle POS data.
Background
Nowadays, unmanned aerial vehicle (drone) photography plays a non-negligible role in film and television production. During shooting, a drone is often used to capture segments of the same scene at different points in time, which are later cut and spliced in post-production. Because the shooting time span is large and the drone's shooting position and angle change between moments, post-production staff must perform a large amount of editing, so the editing and production process consumes considerable capital and labor cost.
Accordingly, there is a need in the art for a method, system, medium, and apparatus for video fusion based on drone POS data that addresses the foregoing problems.
Disclosure of Invention
To solve, or at least partially solve, the problem that post-production video editing consumes a large amount of capital and labor cost, because the drone's shooting time span is large and its shooting position and angle change at different moments so that post-production staff must perform a large amount of editing, the invention provides a video fusion method, system, medium and device based on unmanned aerial vehicle POS data.
In a first aspect, the invention provides a video fusion method based on unmanned aerial vehicle POS data, which comprises the following steps: acquiring first POS data and first image data of an initial video at a selected first video frame position; matching each second POS data of the video to be spliced with the first POS data respectively, and determining the video frame position where the successfully matched second POS data is located as a second video frame position; acquiring second image data of the video to be spliced at the position of the second video frame; and fusing the initial video and the video to be spliced based on the first image data and the second image data.
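For orientation, the four steps of the first aspect can be outlined in code. The sketch below is not taken from the patent; the object attributes (pos_data, frames) and the helper names (match_pos, fuse_at) are hypothetical placeholders for the operations described above and detailed later in the embodiments.

```python
# Hedged overview of the four claimed steps; all attribute and helper names
# are hypothetical placeholders, not part of the patent text.

def fuse_videos(initial_video, video_to_stitch, first_frame_idx):
    # Step 1: first POS data and first image data at the selected first video frame position.
    first_pos = initial_video.pos_data[first_frame_idx]
    first_image = initial_video.frames[first_frame_idx]

    # Step 2: match every second POS record of the video to be spliced against the
    # first POS data; the successfully matched record gives the second video frame position.
    second_frame_idx = match_pos(video_to_stitch.pos_data, first_pos)

    # Step 3: second image data of the video to be spliced at the second video frame position.
    second_image = video_to_stitch.frames[second_frame_idx]

    # Step 4: fuse the initial video and the video to be spliced based on the two image data.
    return fuse_at(initial_video, video_to_stitch,
                   first_frame_idx, second_frame_idx,
                   first_image, second_image)
```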
As a preferred technical solution of the above video fusion method provided by the present invention, the step of "fusing the initial video and the video to be stitched based on the first image data and the second image data" includes: extracting a plurality of first feature points in the first image data and a plurality of second feature points in the second image data through a SURF algorithm, and determining a matching relationship between the plurality of first feature points and the plurality of second feature points; determining a matching model between the first image data and the second image data through a RANSAC algorithm based on a matching relationship between a plurality of first feature points and a plurality of second feature points; preprocessing the second image data based on the matching model to obtain preprocessed second image data matched with the size and shape of the first image data; and fusing the initial video and the video to be spliced based on the first image data and the preprocessed second image data.
As a preferred technical solution of the above video fusion method provided by the present invention, the step of fusing the initial video and the video to be stitched based on the first image data and the preprocessed second image data includes fusing the first image data and the preprocessed second image data by formulas (1) and (2):
I(x, y) = I1(x, y), if (x, y) lies in the region covered only by the first image data;
I(x, y) = ω1·I1(x, y) + ω2·I2(x, y), if (x, y) lies in the overlapping region of the two images;   (1)
I(x, y) = I2(x, y), if (x, y) lies in the region covered only by the preprocessed second image data.

In formula (1), the weights ω1 and ω2 are given by formula (2):

ω1 = (x2 - x) / (x2 - x1),  ω2 = (x - x1) / (x2 - x1),  ω1 + ω2 = 1,   (2)

wherein I1(x, y) represents the pixel position matrix of the first image data, I2(x, y) represents the pixel position matrix of the preprocessed second image data, I(x, y) represents the pixel position matrix of the fused image obtained by fusing the first image data and the preprocessed second image data, x1 represents the coordinate on the X-axis of the left boundary of the overlapping region of the first image data and the preprocessed second image data, and x2 represents the coordinate on the X-axis of the right boundary of that overlapping region.
As a preferred embodiment of the above video fusion method provided by the present invention, the step of "extracting a plurality of first feature points in the first image data and a plurality of second feature points in the second image data by a SURF algorithm, and determining a matching relationship between the plurality of first feature points in the first image data and the plurality of second feature points in the second image data" includes: generating a plurality of first interest points of the first image data and a plurality of second interest points of the second image data through a Hessian matrix; based on a scale space filter, respectively screening a plurality of first feature points from the first interest points and screening a plurality of second feature points from the second interest points; determining the main direction of each first characteristic point by counting Haar wavelet characteristics in the circular neighborhood of the first characteristic points, and determining the main direction of each second characteristic point by counting Haar wavelet characteristics in the circular neighborhood of the second characteristic points; generating a first descriptor of each first feature point according to the main direction of each first feature point, and generating a second descriptor of each second feature point according to the main direction of each second feature point; calculating Euclidean distances between each first descriptor and each second descriptor; and determining a matching relationship between the plurality of first feature points and the plurality of second feature points based on the first descriptor and the second descriptor of which the Euclidean distance meets a preset distance threshold.
As a preferable technical solution of the above video fusion method provided by the present invention, the video fusion method further includes: acquiring first POS data and first image data of the initial video at a selected continuous plurality of first video frame positions; respectively matching second POS data of each continuous multiple video frame positions of the video to be spliced with the first POS data, and determining the continuous multiple video frame positions corresponding to the successfully matched second POS data as the continuous multiple second video frame positions; acquiring second image data of the video to be spliced at the positions of the continuous second video frames; and fusing the initial video and the video to be spliced based on the first image data and the second image data.
As a preferred technical solution of the above video fusion method provided by the present invention, after the step of "fusing the initial video and the video to be stitched based on the first image data and the second image data", the video fusion method further includes: judging whether a new video to be spliced still exists or not; if a new video to be spliced exists, taking the video formed after fusion as an initial video and continuously executing the video fusion method in any technical scheme; and if no new video to be spliced exists, outputting the fused video serving as the final fused video.
In a second aspect, the present invention provides a video fusion system based on POS data of an unmanned aerial vehicle, where the video fusion system includes: the acquisition module is used for acquiring first POS data and first image data of an initial video at a selected first video frame position; the POS data matching module is used for respectively matching each second POS data of the video to be spliced with the first POS data and determining the video frame position where the successfully matched second POS data is located as a second video frame position; the acquisition module is further used for acquiring second image data of the video to be spliced at the position of the second video frame; and the fusion module is used for fusing the initial video and the video to be spliced based on the first image data and the second image data.
As a preferable technical solution of the above video fusion system provided by the present invention, the fusion module further includes: the feature matching module is used for extracting a plurality of first feature points in the first image data and a plurality of second feature points in the second image data through a SURF algorithm and determining a matching relation between the plurality of first feature points and the plurality of second feature points; a matching model determining module, configured to determine a matching model between the first image data and the second image data through a RANSAC algorithm based on a matching relationship between the plurality of first feature points and the plurality of second feature points; the preprocessing module is used for preprocessing the second image data based on the matching model to obtain preprocessed second image data matched with the size and the shape of the first image data; the fusion module is further configured to fuse the initial video and the video to be stitched based on the first image data and the preprocessed second image data.
As a preferred technical solution of the above video fusion system provided by the present invention, the fusion module fuses the first image data and the preprocessed second image data through formulas (1) and (2):
I(x, y) = I1(x, y), if (x, y) lies in the region covered only by the first image data;
I(x, y) = ω1·I1(x, y) + ω2·I2(x, y), if (x, y) lies in the overlapping region of the two images;   (1)
I(x, y) = I2(x, y), if (x, y) lies in the region covered only by the preprocessed second image data.

In formula (1), the weights ω1 and ω2 are given by formula (2):

ω1 = (x2 - x) / (x2 - x1),  ω2 = (x - x1) / (x2 - x1),  ω1 + ω2 = 1,   (2)

wherein I1(x, y) represents the pixel position matrix of the first image data, I2(x, y) represents the pixel position matrix of the preprocessed second image data, I(x, y) represents the pixel position matrix of the fused image obtained by fusing the first image data and the preprocessed second image data, x1 represents the coordinate on the X-axis of the left boundary of the overlapping region of the first image data and the preprocessed second image data, and x2 represents the coordinate on the X-axis of the right boundary of that overlapping region.
As a preferred technical solution of the video fusion system provided by the present invention, the feature matching module is specifically configured to: generating a plurality of first interest points of the first image data and a plurality of second interest points of the second image data through a Hessian matrix; based on a scale space filter, respectively screening a plurality of first feature points from the first interest points and screening a plurality of second feature points from the second interest points; determining the main direction of each first characteristic point by counting Haar wavelet characteristics in the circular neighborhood of the first characteristic points, and determining the main direction of each second characteristic point by counting Haar wavelet characteristics in the circular neighborhood of the second characteristic points; generating a first descriptor of each first feature point according to the main direction of each first feature point, and generating a second descriptor of each second feature point according to the main direction of each second feature point; calculating Euclidean distances between each first descriptor and each second descriptor; and determining a matching relationship between the plurality of first feature points and the plurality of second feature points based on the first descriptor and the second descriptor of which the Euclidean distance meets a preset distance threshold.
As a preferred technical solution of the above video fusion system provided by the present invention, the obtaining module is further configured to obtain first POS data and first image data of the initial video at selected consecutive multiple first video frame positions; the POS data matching module is further used for respectively matching second POS data of each continuous multiple video frame positions of the video to be spliced with the first POS data, and determining the continuous multiple video frame positions corresponding to the successfully matched second POS data as continuous multiple second video frame positions; the acquisition module is further used for acquiring second image data of the video to be spliced at the positions of the continuous second video frames; the fusion module is further configured to fuse the initial video and the video to be stitched based on the first image data and the second image data.
As a preferred technical solution of the above video fusion system provided by the present invention, the video fusion system further includes a judging module, after the step of "fusing the initial video and the video to be stitched based on the first image data and the second image data", the judging module is configured to: judging whether a new video to be spliced still exists or not; if a new video to be spliced exists, the video fusion system takes the video formed after fusion as an initial video to execute the video fusion operation in any one of the technical schemes; and if no new video to be spliced exists, the video fusion system outputs the fused video as the final fused video.
In a third aspect, the present invention further provides a computer-readable storage medium, where multiple program codes are stored, where the program codes are adapted to be loaded and executed by a processor to perform the video fusion method according to any one of the foregoing first aspects.
In a fourth aspect, the present invention further provides a video fusion apparatus based on POS data of an unmanned aerial vehicle, including a processor and a memory, where the memory stores a plurality of program codes, and the program codes are suitable for being loaded and executed by the processor to perform the video fusion method according to any one of the foregoing first aspects.
According to the video fusion method, system, medium and device based on drone POS data provided by the invention, each second POS data of the video to be spliced is matched against the first POS data of the initial video at the selected first video frame position, so that the second video frame position is determined and the initial video and the video to be spliced, shot at different times, are spliced at video frame positions showing the same scene; the initial video and the video to be spliced are then fused based on the first image data of the initial video at the selected first video frame position and the second image data of the video to be spliced at the second video frame position. In this way, by using the drone POS data provided by the flight-path information to assist and correct the splicing of videos from different time periods, and by performing POS data matching before image data matching, videos shot by the drone can be spliced or fused more efficiently and accurately; multiple aerial video segments can be seamlessly combined into a single video with natural, gradual transitions and no obvious seams; and automated processing of multiple groups of aerial videos shot at different points in time can be realized.
In addition, the video fusion method, system, medium and device based on the unmanned aerial vehicle POS data further determine the matching relationship between a plurality of first characteristic points in the first image data and a plurality of second characteristic points in the second image data through the SURF algorithm, and determine the matching model between the first image data and the second image data through the RANSAC algorithm. Therefore, the size and the shape of the second image data of the video to be spliced can be adjusted more accurately and conveniently, so that the preprocessed second image data is matched with the first image data of the initial video, and the final video fusion is realized. Therefore, the method is more beneficial to automatic processing in the fusion process of a plurality of videos and obtains the fusion video with natural gradual change.
Drawings
Specific implementations of the present embodiment are described below with reference to the accompanying drawings, in which:
fig. 1 is a schematic main flow diagram of a video fusion method based on unmanned aerial vehicle POS data according to this embodiment;
fig. 2 is a detailed flowchart of the video fusion method based on the POS data of the unmanned aerial vehicle according to the embodiment;
fig. 3 is a schematic diagram of a hardware structure of a first terminal device provided in this embodiment;
fig. 4 is a schematic diagram of a hardware structure of a second terminal device provided in this embodiment.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like.
The term "a and/or B" denotes all possible combinations of a and B, such as a alone, B alone or a and B. The term "at least one A or B" or "at least one of A and B" means similar to "A and/or B" and may include only A, only B, or both A and B. The singular forms "a", "an" and "the" may include the plural forms as well. Of course, the above alternative embodiments, and the alternative embodiments and the preferred embodiments can also be used in a cross-matching manner, so that a new embodiment is combined to be suitable for a more specific application scenario.
To solve, or at least partially solve, the problem that post-production video editing consumes a large amount of capital and labor cost, because the drone's shooting time span is large and its shooting position and angle change at different moments so that post-production staff must perform a large amount of editing, this embodiment provides a video fusion method, system, medium and device based on unmanned aerial vehicle POS data.
First aspect
The embodiment provides a video fusion method based on unmanned aerial vehicle POS data, and as shown in fig. 1 and fig. 2, the video fusion method includes:
and S1, acquiring first POS data and first image data of the initial video at the selected first video frame position.
Illustratively, a first video frame position may be selected in the segment near the end of the initial video. It can be understood that "first POS data" and "second POS data" merely distinguish the POS data of different videos; the POS data of the present embodiment includes the GPS data and IMU data recorded at the moment the drone captures each frame. The GPS data is a three-dimensional coordinate consisting of longitude, latitude and flying height. The IMU data is flight attitude data consisting of a course angle, a pitch angle and a roll angle.
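For concreteness, one possible in-memory representation of such a POS record (GPS coordinates plus IMU attitude) is sketched below; the class and field names are assumptions made for illustration, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class PosRecord:
    # GPS data: three-dimensional coordinate recorded at the moment of shooting.
    longitude: float
    latitude: float
    altitude: float   # flying height
    # IMU data: flight attitude recorded at the same moment.
    heading: float    # course angle
    pitch: float
    roll: float
```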
And S2, matching each second POS data of the video to be spliced with the first POS data respectively, and determining the video frame position where the successfully matched second POS data is located as a second video frame position.
For example, the second POS data of the video to be spliced can be matched against the first POS data frame by frame, starting from the beginning of the video to be spliced, so that the end of the initial video is spliced with the beginning of the video to be spliced. The aim is to ensure that, after the content of the initial video has played, the fused video continues naturally into the content of the video to be spliced with a smooth, gradual transition. The video frame position of the video to be spliced at which the data difference between the second POS data and the first POS data satisfies a set threshold requirement may be determined as the second video frame position.
And S3, acquiring second image data of the video to be spliced at the position of the second video frame.
It can be understood that, on the basis that the first POS data of the first video frame position of the initial video is successfully matched with the second POS data of the second video frame position of the video to be spliced, the second image data of the second video frame position may be directly matched and spliced with the first image data. In addition, there may be a plurality of second video frame positions that satisfy the requirement in the video to be stitched, and at this time, one video frame position in the video to be stitched, which has the highest matching degree with the first POS data in the first video frame position, may be selected as the second video frame position.
And S4, fusing the initial video and the video to be spliced based on the first image data and the second image data.
Illustratively, the end or beginning of the initial video and the beginning or end of the video to be spliced contain footage shot by the drone of the same scene at different points in time. The video fusion method based on the drone POS data aims at splicing or fusing these two videos, as well as any other videos whose scenes need to be spliced. In this embodiment, each second POS data of the video to be stitched is matched against the first POS data in order to search the video to be stitched for a video frame whose shooting scene is the same as that of the first video frame position of the initial video, and to ensure that the first POS data of the first video frame position is the same as or similar to the second POS data of the successfully matched second video frame position; the matching process may be implemented by requiring that the difference between the first POS data and the second POS data satisfies a set threshold.
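A minimal sketch of this threshold-based POS matching follows, assuming the PosRecord structure sketched earlier; the threshold values and the combined difference score are illustrative choices, since the patent only requires that the difference satisfy a set threshold and that the best-matching position be selected.

```python
def pos_difference(pos, ref):
    """Illustrative per-frame difference between two PosRecord objects."""
    gps_diff = ((pos.longitude - ref.longitude) ** 2 +
                (pos.latitude - ref.latitude) ** 2 +
                (pos.altitude - ref.altitude) ** 2) ** 0.5
    att_diff = max(abs(pos.heading - ref.heading),
                   abs(pos.pitch - ref.pitch),
                   abs(pos.roll - ref.roll))
    return gps_diff, att_diff

def match_pos(candidate_pos_list, first_pos,
              gps_threshold=5.0, attitude_threshold=10.0):
    """Return the frame index whose POS record differs least from first_pos,
    among candidates whose differences stay under the (illustrative) thresholds."""
    best_idx, best_score = None, float("inf")
    for idx, pos in enumerate(candidate_pos_list):
        gps_diff, att_diff = pos_difference(pos, first_pos)
        if gps_diff <= gps_threshold and att_diff <= attitude_threshold:
            score = gps_diff + att_diff   # smaller score = higher matching degree
            if score < best_score:
                best_idx, best_score = idx, score
    return best_idx
```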
It can be understood that, in the video fusion method based on drone POS data provided in this embodiment, each second POS data of the video to be spliced is matched against the first POS data of the initial video at the selected first video frame position so as to determine the second video frame position, the initial video and the video to be spliced, shot at different times, are spliced at video frame positions showing the same scene, and the two videos are then fused based on the first image data of the initial video at the selected first video frame position and the second image data of the video to be spliced at the second video frame position. In this way, by using the drone POS data provided by the flight-path information to assist and correct the splicing of videos from different time periods, and by performing POS data matching before image data matching, videos shot by the drone can be spliced or fused more efficiently and accurately; multiple aerial video segments can be seamlessly combined into a single video with natural, gradual transitions and no obvious seams; and automated processing of multiple groups of aerial videos shot at different points in time can be realized.
As a preferred implementation manner of the video fusion method provided in this embodiment, the step S4 specifically includes:
s41, extracting a plurality of first feature points in the first image data and a plurality of second feature points in the second image data by SURF algorithm, and determining a matching relationship between the plurality of first feature points and the plurality of second feature points.
It is understood that the first feature point and the second feature point can be used as position points when the initial video is fused with the video to be spliced.
And S42, determining a matching model between the first image data and the second image data through a RANSAC algorithm based on the matching relation between the plurality of first characteristic points and the plurality of second characteristic points. In this way, more accurate matching between the first image data and the second image data can be achieved.
Exemplarily, if a first feature point is represented as X1 and its corresponding second feature point is represented as X2, then there exists a fundamental matrix F such that

X2ᵀ · F · X1 = 0,

and F is the matching model between the first image data and the second image data. The fundamental matrix F maps any point p1 in the first image data to the epipolar line L2 of its corresponding point on the second image data, i.e. L2 = F · p1. The fundamental matrix F is a 3 × 3 matrix with a total of 9 elements. Because the homogeneous coordinates used in X2ᵀ · F · X1 = 0 are defined only up to a constant scale factor, the matrix F has only 8 unknown elements; that is, the fundamental matrix can be solved with only 8 pairs of matched first feature points and second feature points.
The basic assumption of the RANSAC algorithm is that samples contain correct data (inliers, data that can be described by a model) and also contain abnormal data (outliers, data that is far from a normal range and cannot adapt to a mathematical model), that is, data sets contain noise. These outlier data may be due to erroneous measurements, erroneous assumptions, erroneous calculations, etc. RANSAC also assumes that, given a correct set of data, there is a way to calculate the model parameters that fit into the data.
The RANSAC flow for rejecting erroneous matches comprises the following steps: 1) select 8 point pairs from the matched point pairs and estimate a fundamental matrix F using the normalized 8-point method; 2) compute the distance d_p from each remaining point pair to its corresponding epipolar line; if d_p is less than or equal to the preset distance threshold d, the pair is an inlier, otherwise it is an outlier, and record the number of inliers satisfying the condition as num; 3) repeat the above operations k times, or stop early if the proportion of inliers num obtained in some iteration reaches 95% or more. The fundamental matrix with the largest num is selected as the final result F.
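The patent describes its own normalized 8-point RANSAC loop; in practice this estimation is often delegated to OpenCV's implementation, as in the sketch below (an assumption made for brevity; the distance threshold and confidence values are illustrative, and cv2.findFundamentalMat needs at least 8 matched point pairs).

```python
import cv2
import numpy as np

def estimate_matching_model(pts1, pts2, dist_threshold=1.0, confidence=0.95):
    """Estimate the fundamental matrix F with RANSAC from matched first/second
    feature point coordinates (each an N x 2 array, N >= 8)."""
    pts1 = np.asarray(pts1, dtype=np.float32)
    pts2 = np.asarray(pts2, dtype=np.float32)
    # RANSAC repeatedly samples minimal point sets, scores the remaining pairs by
    # their distance to the corresponding epipolar lines, and keeps the F with most inliers.
    F, inlier_mask = cv2.findFundamentalMat(
        pts1, pts2, cv2.FM_RANSAC, dist_threshold, confidence)
    if F is None:
        return None, None
    return F, inlier_mask.ravel().astype(bool)
```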
S43, preprocessing the second image data based on the matching model to obtain preprocessed second image data matching the size and shape of the first image data.
It is understood that, after the matching model is obtained, the second image data may be stretched, translated, scaled, and the like according to the matching model, so that the plurality of first feature points in the first image data and the corresponding plurality of second feature points in the second image data can be overlapped.
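One possible realization of this preprocessing step is sketched below. It uses a homography estimated from the same matched points to perform the stretching, translation and scaling; this is an assumption made for illustration, since the patent does not name the warping model used.

```python
import cv2
import numpy as np

def preprocess_second_image(first_image, second_image, pts1, pts2):
    """Warp the second image so its matched feature points line up with the
    corresponding feature points of the first image (homography is an assumption)."""
    H, _ = cv2.findHomography(np.asarray(pts2, np.float32),
                              np.asarray(pts1, np.float32), cv2.RANSAC, 3.0)
    h, w = first_image.shape[:2]
    # Resample the second image onto the first image's coordinate frame and size,
    # so its size and shape match the first image data.
    return cv2.warpPerspective(second_image, H, (w, h))
```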
And S44, fusing the initial video and the video to be spliced based on the first image data and the preprocessed second image data.
As a preferred implementation of the above video fusion method provided in this embodiment, step S44 includes fusing the first image data with the pre-processed second image data through formulas (1) and (2):
I(x, y) = I1(x, y), if (x, y) lies in the region covered only by the first image data;
I(x, y) = ω1·I1(x, y) + ω2·I2(x, y), if (x, y) lies in the overlapping region of the two images;   (1)
I(x, y) = I2(x, y), if (x, y) lies in the region covered only by the preprocessed second image data.

In formula (1), the weights ω1 and ω2 are given by formula (2):

ω1 = (x2 - x) / (x2 - x1),  ω2 = (x - x1) / (x2 - x1),  ω1 + ω2 = 1,   (2)

wherein I1(x, y) represents the pixel position matrix of the first image data, I2(x, y) represents the pixel position matrix of the preprocessed second image data, I(x, y) represents the pixel position matrix of the fused image obtained by fusing the first image data and the preprocessed second image data, x1 represents the coordinate on the X-axis of the left boundary of the overlapping region of the first image data and the preprocessed second image data, and x2 represents the coordinate on the X-axis of the right boundary of that overlapping region.
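A minimal NumPy sketch of equations (1) and (2), assuming the two images are already aligned on a common canvas of identical size with three colour channels:

```python
import numpy as np

def blend_overlap(img1, img2, x1, x2):
    """Fuse two aligned H x W x 3 images: columns left of x1 come from img1,
    columns right of x2 from img2, and columns in [x1, x2) are linearly blended
    with weights that fall from 1 to 0 for img1 and rise from 0 to 1 for img2."""
    img1 = img1.astype(np.float32)
    img2 = img2.astype(np.float32)
    fused = img1.copy()
    fused[:, x2:] = img2[:, x2:]
    xs = np.arange(x1, x2)
    w1 = ((x2 - xs) / float(x2 - x1))[None, :, None]   # weight of the first image
    w2 = ((xs - x1) / float(x2 - x1))[None, :, None]   # weight of the second image
    fused[:, x1:x2] = w1 * img1[:, x1:x2] + w2 * img2[:, x1:x2]
    return fused.astype(np.uint8)
```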
As a preferred implementation manner of the video fusion method provided in this embodiment, step S41 of this embodiment specifically includes:
s411, generating a plurality of first interest points of the first image data and a plurality of second interest points of the second image data through a Hessian matrix.
It is understood that, in this step, the Hessian matrices of the first image data and the second image data need to be constructed respectively to generate all the interest points used for feature extraction. The Hessian matrix is constructed to generate stable edge points (abrupt-change points) of the image, a function similar to Canny or Laplacian edge detection. When the discriminant of the Hessian matrix reaches a local maximum, the current point is judged to be brighter or darker than the other points in its surrounding neighborhood, and the position of the interest point is thereby located.
S412, based on the scale space filter, respectively screening a plurality of first feature points from the plurality of first interest points and screening a plurality of second feature points from the plurality of second interest points.
And S413, determining the main direction of each first characteristic point by counting Haar wavelet characteristics in the circular neighborhood of the plurality of first characteristic points, and determining the main direction of each second characteristic point by counting Haar wavelet characteristics in the circular neighborhood of the plurality of second characteristic points.
And S414, generating a first descriptor of each first feature point according to the main direction of each first feature point, and generating a second descriptor of each second feature point according to the main direction of each second feature point.
And S415, calculating Euclidean distances between the first descriptors and the second descriptors. The Euclidean distance may be used to indicate the matching degree between the first feature point and the second feature point.
And S416, determining the matching relation between the plurality of first characteristic points and the plurality of second characteristic points based on the first descriptor and the second descriptor of which the Euclidean distance meets the preset distance threshold. The matching relationship is a one-to-one correspondence relationship between the first characteristic points and the second characteristic points.
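Steps S411 to S416 correspond closely to what OpenCV's SURF implementation performs internally (Hessian-based interest points, scale-space filtering, Haar-wavelet orientation, descriptors), so a hedged sketch can simply rely on it; note that SURF requires an opencv-contrib build with the non-free modules enabled, and the Hessian and distance thresholds below are illustrative values.

```python
import cv2

def match_surf_features(first_image, second_image, distance_threshold=0.3):
    """Detect SURF keypoints and descriptors in both images, then keep pairs
    whose descriptor Euclidean distance is below a preset threshold."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp1, desc1 = surf.detectAndCompute(first_image, None)   # first feature points
    kp2, desc2 = surf.detectAndCompute(second_image, None)  # second feature points

    matcher = cv2.BFMatcher(cv2.NORM_L2)                    # Euclidean distance
    matches = matcher.match(desc1, desc2)
    good = [m for m in matches if m.distance < distance_threshold]

    pts1 = [kp1[m.queryIdx].pt for m in good]               # matched first feature points
    pts2 = [kp2[m.trainIdx].pt for m in good]               # matched second feature points
    return pts1, pts2
```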
As a preferred implementation of the video fusion method provided in this embodiment, the video fusion method may further include:
s101, first POS data and first image data of the initial video at the selected continuous multiple first video frame positions are obtained.
It can be understood that selecting a plurality of consecutive first video frames enables finding a more suitable first video frame when the initial video is matched with the video to be spliced, so that the matching is more accurate and more efficient.
S102, matching second POS data of each continuous video frame position of the video to be spliced with the first POS data, and determining the continuous video frame positions corresponding to the successfully matched second POS data as the continuous second video frame positions.
It should be noted that the first POS data of a plurality of consecutive first video frames at the end of the initial video may be sequentially matched with the second POS data of each of a plurality of consecutive video frame positions of the video to be stitched from the beginning of the video to be stitched. Thus, the speed and accuracy of matching the first POS data and the second POS data can be improved.
S103, second image data of the video to be spliced at the positions of a plurality of continuous second video frames are obtained.
And S104, fusing the initial video and the video to be spliced based on the first image data and the second image data.
It should be noted that, the first image data of a plurality of consecutive first video frames and the second image data of a plurality of consecutive second video frames may be respectively matched to find the first image data and the second image data with the highest matching degree for subsequent splicing.
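A sketch of this windowed matching follows; the window length and the reuse of the pos_difference helper from the earlier sketch are illustrative assumptions.

```python
def match_consecutive_frames(initial_pos, candidate_pos, window=5):
    """Slide a window of `window` consecutive POS records of the video to be
    spliced against the last `window` records of the initial video, and return
    the start index with the smallest accumulated POS difference."""
    reference = initial_pos[-window:]
    best_start, best_score = None, float("inf")
    for start in range(len(candidate_pos) - window + 1):
        score = 0.0
        for ref, cand in zip(reference, candidate_pos[start:start + window]):
            gps_diff, att_diff = pos_difference(cand, ref)
            score += gps_diff + att_diff
        if score < best_score:
            best_start, best_score = start, score
    return best_start
```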
As a preferred implementation manner of the above video fusion method provided in this embodiment, after step S4, the video fusion method further includes:
s51, judging whether a new video to be spliced exists or not;
s521, if a new video to be spliced exists, taking the video formed after fusion as an initial video and continuing to execute the video fusion method of any one of the above embodiments;
and S522, if no new video to be spliced exists, outputting the fused video serving as a final fused video.
It should be noted that, the output final fused video may be stored and output according to a unified video coding mode.
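The loop of steps S51 to S522 can be sketched as follows, reusing the hypothetical fuse_videos helper from the overview sketch; the choice of the last frame of the current fused video as the next first video frame position is an illustrative assumption.

```python
def fuse_all(initial_video, videos_to_stitch):
    """Iteratively fuse each new video to be spliced into the running result;
    when no new video remains, the result is output as the final fused video."""
    fused = initial_video
    for nxt in videos_to_stitch:                      # S51: a new video to be spliced exists
        fused = fuse_videos(fused, nxt,
                            first_frame_idx=len(fused.frames) - 1)  # S521
    return fused                                      # S522: final fused video
```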
It should be noted that although the detailed steps of the video fusion method based on the POS data of the unmanned aerial vehicle according to the embodiment are described in detail above, on the premise of not departing from the basic principle of the embodiment, a person skilled in the art may combine, split, and exchange the order of the above steps, and the implementation paradigm after such modification does not change the basic concept of the embodiment, and therefore, the implementation paradigm also falls within the protection scope of the embodiment.
Second aspect of the invention
It should be noted that the video fusion system based on the POS data of the unmanned aerial vehicle provided in this embodiment corresponds to the video fusion method based on the POS data of the unmanned aerial vehicle in the first aspect, so that details of the system in this embodiment are not repeated, and for the description of the system, refer to the contents in the first aspect.
This embodiment provides a video fusion system based on unmanned aerial vehicle POS data, this video fusion system includes: the acquisition module is used for acquiring first POS data and first image data of an initial video at a selected first video frame position; the POS data matching module is used for respectively matching each second POS data of the video to be spliced with the first POS data and determining the video frame position where the successfully matched second POS data is located as a second video frame position; the acquisition module is also used for acquiring second image data of the video to be spliced at a second video frame position; and the fusion module is used for fusing the initial video and the video to be spliced based on the first image data and the second image data.
As a preferred implementation manner of the above video fusion system provided in this embodiment, the fusion module further includes: the characteristic matching module is used for extracting a plurality of first characteristic points in the first image data and a plurality of second characteristic points in the second image data through an SURF algorithm and determining the matching relation between the plurality of first characteristic points and the plurality of second characteristic points; the matching model determining module is used for determining a matching model between the first image data and the second image data through a RANSAC algorithm based on the matching relation between the plurality of first characteristic points and the plurality of second characteristic points; the preprocessing module is used for preprocessing the second image data based on the matching model to obtain preprocessed second image data matched with the size and shape of the first image data; the fusion module is further used for fusing the initial video and the video to be spliced based on the first image data and the preprocessed second image data.
As a preferred implementation of the above video fusion system provided in this embodiment, the fusion module fuses the first image data and the preprocessed second image data through formulas (1) and (2):
I(x, y) = I1(x, y), if (x, y) lies in the region covered only by the first image data;
I(x, y) = ω1·I1(x, y) + ω2·I2(x, y), if (x, y) lies in the overlapping region of the two images;   (1)
I(x, y) = I2(x, y), if (x, y) lies in the region covered only by the preprocessed second image data.

In formula (1), the weights ω1 and ω2 are given by formula (2):

ω1 = (x2 - x) / (x2 - x1),  ω2 = (x - x1) / (x2 - x1),  ω1 + ω2 = 1,   (2)

wherein I1(x, y) represents the pixel position matrix of the first image data, I2(x, y) represents the pixel position matrix of the preprocessed second image data, I(x, y) represents the pixel position matrix of the fused image obtained by fusing the first image data and the preprocessed second image data, x1 represents the coordinate on the X-axis of the left boundary of the overlapping region of the first image data and the preprocessed second image data, and x2 represents the coordinate on the X-axis of the right boundary of that overlapping region.
As a preferred implementation manner of the video fusion system provided in this embodiment, the feature matching module is specifically configured to: generating a plurality of first interest points of the first image data and a plurality of second interest points of the second image data through a Hessian matrix; based on a scale space filter, respectively screening a plurality of first feature points from a plurality of first interest points and screening a plurality of second feature points from a plurality of second interest points; determining the main direction of each first characteristic point by counting Haar wavelet characteristics in a circular neighborhood of the plurality of first characteristic points, and determining the main direction of each second characteristic point by counting Haar wavelet characteristics in a circular neighborhood of the plurality of second characteristic points; generating a first descriptor of each first characteristic point according to the main direction of each first characteristic point, and generating a second descriptor of each second characteristic point according to the main direction of each second characteristic point; calculating Euclidean distances between the first descriptors and the second descriptors; and determining the matching relation between the plurality of first characteristic points and the plurality of second characteristic points based on the first descriptor and the second descriptor of which the Euclidean distance meets the preset distance threshold.
As a preferred implementation manner of the video fusion system provided in this embodiment, the obtaining module is further configured to obtain first POS data and first image data of the initial video at the selected consecutive multiple first video frame positions; the POS data matching module is also used for respectively matching second POS data of each continuous multiple video frame positions of the video to be spliced with the first POS data, and determining the continuous multiple video frame positions corresponding to the successfully matched second POS data as continuous multiple second video frame positions; the acquisition module is also used for acquiring second image data of the video to be spliced at the positions of a plurality of continuous second video frames; the fusion module is further used for fusing the initial video and the video to be spliced based on the first image data and the second image data.
As a preferred implementation manner of the above video fusion system provided in this embodiment, the video fusion system further includes a determining module, and after the step of "fusing the initial video and the video to be stitched based on the first image data and the second image data", the determining module is configured to: judging whether a new video to be spliced still exists or not; if a new video to be spliced exists, the video fusion system takes the video formed after fusion as an initial video to execute the video fusion operation in any one of the above embodiments; and if no new video to be spliced exists, the video fusion system outputs the fused video as the final fused video.
It should be noted that, the video fusion system based on the POS data of the unmanned aerial vehicle provided in the foregoing embodiment is exemplified by only the division of the above functional modules (such as the acquisition module, the POS data matching module, the fusion module, and the like), and in practical application, the functional modules may be completed by different functional modules according to needs, that is, the functional modules in the embodiment of the present invention are further decomposed or combined, for example, the functional modules in the foregoing embodiment may be combined into one functional module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the function modules related to the embodiment of the present invention are only for distinguishing and are not to be construed as an improper limitation to the embodiment.
Third aspect of the invention
The present embodiment also provides a computer-readable storage medium, having stored therein a plurality of program codes, which are adapted to be loaded and executed by a processor to perform the video fusion method in any of the foregoing embodiments of the first aspect.
The storage medium includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to perform some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Fourth aspect of the invention
The embodiment also provides a video fusion device based on POS data of the unmanned aerial vehicle, which includes a processor and a memory, where the memory stores a plurality of program codes, and the program codes are suitable for being loaded and executed by the processor to perform the video fusion method in any one of the foregoing first aspect embodiments.
Fifth aspect of the invention
The embodiment further explains the implementation of the video fusion method based on the unmanned aerial vehicle POS data, which is mainly applied to a scene of a terminal device. The hardware structure of the terminal device is shown in fig. 3. The terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like. In this embodiment, the processor of the terminal device includes a function for executing each module of the speech recognition apparatus in each device, and specific functions and technical effects may refer to the above embodiments, which are not described herein again.
Fig. 4 is a schematic hardware structure diagram of a terminal device according to another embodiment of the present application. Fig. 4 is a specific embodiment of fig. 3 in an implementation process. As shown in fig. 4, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the video fusion method based on the drone POS data of the first aspect and as in fig. 1 and 2. The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing module 1200. The terminal device may further include: a communication module 1203, a power module 1204, a multimedia module 1205, a voice module 1206, input/output interfaces 1207, and/or a sensor module 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing module 1200 generally controls the overall operation of the terminal device. The processing module 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the method shown in fig. 1 described above. Further, the processing module 1200 may include one or more modules that facilitate interaction between the processing module 1200 and other components. For example, the processing module 1200 may include a multimedia module to facilitate interaction between the multimedia module 1205 and the processing module 1200. A power module 1204 provides power to the various components of the terminal device. The power module 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal devices. The multimedia module 1205 includes a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. The voice module 1206 is configured to output and/or input a voice signal. For example, the voice module 1206 includes a Microphone (MIC) configured to receive an external voice signal when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication module 1203. In some embodiments, the voice module 1206 further includes a speaker for outputting voice signals.
Input/output interface 1207 provides an interface between processing module 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor module 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor module 1208 may detect an open/closed status of the terminal device, relative positioning of components, presence or absence of user contact with the terminal device. The sensor module 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor module 1208 may also include a camera or the like.
The communication module 1203 is configured to facilitate communication between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication module 1203, the voice module 1206, the input/output interface 1207, and the sensor module 1208 in the embodiment of fig. 4 may be implemented as the input device in the embodiment of fig. 3.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features from different embodiments are meant to fall within the scope of the invention and to form further embodiments. For example, in the claims of the present invention, any of the claimed embodiments may be used in any combination.
The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the drawings; however, those skilled in the art will readily understand that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of the related technical features may be made without departing from the principle of the present invention, and the technical solutions after such changes or substitutions fall within the protection scope of the present invention.

Claims (14)

1. A video fusion method based on unmanned aerial vehicle POS data, characterized by comprising the following steps:
acquiring first POS data and first image data of an initial video at a selected first video frame position;
matching each second POS data of the video to be spliced with the first POS data respectively, and determining the video frame position where the successfully matched second POS data is located as a second video frame position;
acquiring second image data of the video to be spliced at the position of the second video frame;
and fusing the initial video and the video to be spliced based on the first image data and the second image data.
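By way of illustration only (not part of the claims), the POS matching step of claim 1 can be sketched as follows in Python; the POS record fields, the Euclidean distance measure, and the matching threshold are assumptions introduced for the example, not taken from the patent text.

```python
# Illustrative sketch only; field names, the distance measure, and the
# threshold are assumptions, not part of the claimed method.
from dataclasses import dataclass
import math

@dataclass
class PosRecord:
    frame_index: int
    lon: float
    lat: float
    alt: float

def pos_distance(a: PosRecord, b: PosRecord) -> float:
    # Euclidean distance in a local coordinate frame (simplifying assumption).
    return math.sqrt((a.lon - b.lon) ** 2 + (a.lat - b.lat) ** 2 + (a.alt - b.alt) ** 2)

def find_second_frame_position(first_pos: PosRecord,
                               second_pos_list: list[PosRecord],
                               threshold: float = 1e-5) -> int | None:
    """Return the frame position of the video to be spliced whose POS data
    matches the first POS data, or None if no record is close enough."""
    best = min(second_pos_list, key=lambda p: pos_distance(first_pos, p))
    return best.frame_index if pos_distance(first_pos, best) <= threshold else None
```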
2. The video fusion method according to claim 1, wherein the step of fusing the initial video and the video to be stitched based on the first image data and the second image data comprises:
extracting a plurality of first feature points in the first image data and a plurality of second feature points in the second image data through a SURF algorithm, and determining a matching relationship between the plurality of first feature points and the plurality of second feature points;
determining a matching model between the first image data and the second image data through a RANSAC algorithm based on a matching relationship between a plurality of first feature points and a plurality of second feature points;
preprocessing the second image data based on the matching model to obtain preprocessed second image data matched with the size and shape of the first image data;
and fusing the initial video and the video to be spliced based on the first image data and the preprocessed second image data.
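As an illustrative sketch only (not part of the claims), the SURF/RANSAC pipeline of claim 2 can be approximated with OpenCV, assuming the opencv-contrib build that exposes SURF; here the estimated homography plays the role of the claimed matching model, and the Lowe ratio test is an added assumption.

```python
# Sketch under assumptions: requires opencv-contrib-python with the non-free
# xfeatures2d module enabled; the homography H stands in for the matching model.
import cv2
import numpy as np

def build_matching_model(first_img: np.ndarray, second_img: np.ndarray) -> np.ndarray:
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp1, des1 = surf.detectAndCompute(first_img, None)   # first feature points
    kp2, des2 = surf.detectAndCompute(second_img, None)  # second feature points

    # Match descriptors by Euclidean (L2) distance and keep good pairs.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des2, des1, k=2)
    good = [m for m, n in knn if m.distance < 0.7 * n.distance]  # ratio test (assumption)

    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC estimates the matching model between the two images.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

def preprocess_second_image(second_img: np.ndarray, H: np.ndarray,
                            out_size: tuple[int, int]) -> np.ndarray:
    # Warp the second image so its size and shape match the first image.
    return cv2.warpPerspective(second_img, H, out_size)
```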
3. The video fusion method according to claim 2, wherein the step of fusing the initial video and the video to be stitched based on the first image data and the preprocessed second image data comprises fusing the first image data and the preprocessed second image data by equations (1) and (2):

I(x, y) = I1(x, y),                        (x, y) in the first image data only;
I(x, y) = d1·I1(x, y) + d2·I2(x, y),       (x, y) in the overlapping region;
I(x, y) = I2(x, y),                        (x, y) in the preprocessed second image data only;      (1)

in equation (1),

d1 = (x2 − x) / (x2 − x1),  d2 = (x − x1) / (x2 − x1),  d1 + d2 = 1;      (2)

wherein I1(x, y) represents the pixel position matrix of the first image data, I2(x, y) represents the pixel position matrix of the preprocessed second image data, I(x, y) represents the pixel position matrix of the fused image obtained by fusing the first image data and the preprocessed second image data, x1 represents the coordinate on the X-axis of the left boundary of the overlapping region of the first image data and the preprocessed second image data, x2 represents the coordinate on the X-axis of the right boundary of the overlapping region, and d1 and d2 are the fusion weights applied in the overlapping region as defined in equation (2).
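The following sketch (illustrative only, not part of the claims) implements the gradual-fade reading of equations (1) and (2) above, assuming the two images have already been aligned to the same shape and that x1 and x2 are known column indices of the overlapping region.

```python
# Weighted-average fusion of the overlap, following equations (1) and (2);
# the alignment of the two inputs is assumed to have been done beforehand.
import numpy as np

def fuse_overlap(first: np.ndarray, second: np.ndarray, x1: int, x2: int) -> np.ndarray:
    """Blend two aligned images of identical shape over the columns [x1, x2)."""
    out = first.astype(np.float64)
    sec = second.astype(np.float64)
    out[:, x2:] = sec[:, x2:]                 # right of the overlap: second image only
    for x in range(x1, x2):
        d1 = (x2 - x) / (x2 - x1)             # weight of the first image, equation (2)
        d2 = (x - x1) / (x2 - x1)             # weight of the second image, equation (2)
        out[:, x] = d1 * out[:, x] + d2 * sec[:, x]
    return out.astype(first.dtype)
```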
4. The video fusion method according to claim 2, wherein the step of extracting a plurality of first feature points in the first image data and a plurality of second feature points in the second image data by a SURF algorithm and determining a matching relationship between the plurality of first feature points in the first image data and the plurality of second feature points in the second image data comprises:
generating a plurality of first interest points of the first image data and a plurality of second interest points of the second image data through a Hessian matrix;
based on a scale space filter, respectively screening a plurality of first feature points from the first interest points and screening a plurality of second feature points from the second interest points;
determining the main direction of each first feature point by performing statistics on the Haar wavelet features in the circular neighborhood of the first feature points, and determining the main direction of each second feature point by performing statistics on the Haar wavelet features in the circular neighborhood of the second feature points;
generating a first descriptor of each first feature point according to the main direction of each first feature point, and generating a second descriptor of each second feature point according to the main direction of each second feature point;
calculating the Euclidean distance between each first descriptor and each second descriptor;
and determining a matching relationship between the plurality of first feature points and the plurality of second feature points based on the first descriptor and the second descriptor of which the Euclidean distance meets a preset distance threshold.
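As a minimal illustration (not part of the claims) of the final two steps of claim 4, the sketch below matches descriptors by Euclidean distance against a preset threshold; the descriptor dimensionality and the threshold value are assumptions chosen for the example.

```python
# Threshold-based descriptor matching sketch; descriptors are assumed to be
# SURF-style float vectors and max_distance is a hypothetical preset threshold.
import numpy as np

def match_descriptors(first_desc: np.ndarray, second_desc: np.ndarray,
                      max_distance: float = 0.3) -> list[tuple[int, int]]:
    """Return index pairs (i, j) whose Euclidean distance meets the preset threshold."""
    matches = []
    for i, d1 in enumerate(first_desc):
        dists = np.linalg.norm(second_desc - d1, axis=1)  # Euclidean distances to all second descriptors
        j = int(np.argmin(dists))
        if dists[j] <= max_distance:
            matches.append((i, j))
    return matches
```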
5. The video fusion method of claim 1, further comprising:
acquiring first POS data and first image data of the initial video at a plurality of selected consecutive first video frame positions;
matching the second POS data at each of a plurality of consecutive video frame positions of the video to be spliced with the first POS data respectively, and determining the plurality of consecutive video frame positions corresponding to the successfully matched second POS data as a plurality of consecutive second video frame positions;
acquiring second image data of the video to be spliced at the plurality of consecutive second video frame positions;
and fusing the initial video and the video to be spliced based on the first image data and the second image data.
6. The video fusion method according to any one of claims 1 to 5, wherein after the step of "fusing the initial video and the video to be stitched based on the first image data and the second image data", the video fusion method further comprises:
judging whether a new video to be spliced still exists or not;
if a new video to be spliced still exists, taking the video formed after the fusion as an initial video and continuing to execute the video fusion method of any one of claims 1 to 5;
and if no new video to be spliced exists, outputting the fused video serving as the final fused video.
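For illustration only (not part of the claims), the loop of claim 6 can be sketched as follows, where fuse_pair is a hypothetical placeholder for the per-pair fusion of claims 1 to 5.

```python
def fuse_all(initial_video, videos_to_stitch, fuse_pair):
    """Iteratively fuse videos; fuse_pair(initial, to_stitch) is a hypothetical
    callable standing in for the fusion of claims 1 to 5."""
    fused = initial_video
    for video in videos_to_stitch:       # a new video to be spliced still exists
        fused = fuse_pair(fused, video)  # the fused video becomes the new initial video
    return fused                         # no new video left: output the final fused video
```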
7. A video fusion system based on unmanned aerial vehicle POS data, characterized in that the video fusion system comprises:
the acquisition module is used for acquiring first POS data and first image data of an initial video at a selected first video frame position;
the POS data matching module is used for respectively matching each second POS data of the video to be spliced with the first POS data and determining the video frame position where the successfully matched second POS data is located as a second video frame position;
the acquisition module is further used for acquiring second image data of the video to be spliced at the position of the second video frame;
and the fusion module is used for fusing the initial video and the video to be spliced based on the first image data and the second image data.
8. The video fusion system of claim 7, wherein the fusion module further comprises:
the feature matching module is used for extracting a plurality of first feature points in the first image data and a plurality of second feature points in the second image data through a SURF algorithm and determining a matching relation between the plurality of first feature points and the plurality of second feature points;
a matching model determining module, configured to determine a matching model between the first image data and the second image data through a RANSAC algorithm based on a matching relationship between the plurality of first feature points and the plurality of second feature points;
the preprocessing module is used for preprocessing the second image data based on the matching model to obtain preprocessed second image data matched with the size and the shape of the first image data;
the fusion module is further configured to fuse the initial video and the video to be stitched based on the first image data and the preprocessed second image data.
9. The video fusion system of claim 8, wherein the fusion module fuses the first image data with the preprocessed second image data by equations (1) and (2):

I(x, y) = I1(x, y),                        (x, y) in the first image data only;
I(x, y) = d1·I1(x, y) + d2·I2(x, y),       (x, y) in the overlapping region;
I(x, y) = I2(x, y),                        (x, y) in the preprocessed second image data only;      (1)

in equation (1),

d1 = (x2 − x) / (x2 − x1),  d2 = (x − x1) / (x2 − x1),  d1 + d2 = 1;      (2)

wherein I1(x, y) represents the pixel position matrix of the first image data, I2(x, y) represents the pixel position matrix of the preprocessed second image data, I(x, y) represents the pixel position matrix of the fused image obtained by fusing the first image data and the preprocessed second image data, x1 represents the coordinate on the X-axis of the left boundary of the overlapping region of the first image data and the preprocessed second image data, x2 represents the coordinate on the X-axis of the right boundary of the overlapping region, and d1 and d2 are the fusion weights applied in the overlapping region as defined in equation (2).
10. The video fusion system of claim 8, wherein the feature matching module is specifically configured to:
generating a plurality of first interest points of the first image data and a plurality of second interest points of the second image data through a Hessian matrix;
based on a scale space filter, respectively screening a plurality of first feature points from the first interest points and screening a plurality of second feature points from the second interest points;
determining the main direction of each first feature point by performing statistics on the Haar wavelet features in the circular neighborhood of the first feature points, and determining the main direction of each second feature point by performing statistics on the Haar wavelet features in the circular neighborhood of the second feature points;
generating a first descriptor of each first feature point according to the main direction of each first feature point, and generating a second descriptor of each second feature point according to the main direction of each second feature point;
calculating the Euclidean distance between each first descriptor and each second descriptor;
and determining a matching relationship between the plurality of first feature points and the plurality of second feature points based on the first descriptor and the second descriptor of which the Euclidean distance meets a preset distance threshold.
11. The video fusion system of claim 7, wherein:
the acquisition module is further used for acquiring first POS data and first image data of the initial video at a plurality of selected consecutive first video frame positions;
the POS data matching module is further used for matching the second POS data at each of a plurality of consecutive video frame positions of the video to be spliced with the first POS data respectively, and determining the plurality of consecutive video frame positions corresponding to the successfully matched second POS data as a plurality of consecutive second video frame positions;
the acquisition module is further used for acquiring second image data of the video to be spliced at the plurality of consecutive second video frame positions;
the fusion module is further configured to fuse the initial video and the video to be stitched based on the first image data and the second image data.
12. The video fusion system according to any one of claims 7 to 11, further comprising a determination module configured to, after the initial video and the video to be stitched have been fused based on the first image data and the second image data:
judging whether a new video to be spliced still exists or not;
if a new video to be spliced still exists, the video fusion system takes the video formed after fusion as an initial video to execute the video fusion operation of any one of claims 7 to 11;
and if no new video to be spliced exists, the video fusion system outputs the fused video as the final fused video.
13. A computer readable storage medium having stored therein a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by a processor to perform the video fusion method according to any one of claims 1 to 6.
14. A video fusion device based on unmanned aerial vehicle POS data, comprising a processor and a memory, the memory having stored therein a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by the processor to perform the video fusion method according to any one of claims 1 to 6.
CN202110361880.5A 2021-04-02 2021-04-02 Video fusion method, system, medium and device based on unmanned aerial vehicle POS data Active CN113099266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110361880.5A CN113099266B (en) 2021-04-02 2021-04-02 Video fusion method, system, medium and device based on unmanned aerial vehicle POS data

Publications (2)

Publication Number Publication Date
CN113099266A true CN113099266A (en) 2021-07-09
CN113099266B CN113099266B (en) 2023-05-26

Family

ID=76673583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110361880.5A Active CN113099266B (en) 2021-04-02 2021-04-02 Video fusion method, system, medium and device based on unmanned aerial vehicle POS data

Country Status (1)

Country Link
CN (1) CN113099266B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104282005A (en) * 2014-09-19 2015-01-14 天津航天中为数据系统科技有限公司 Video image stitching method and device
CN105678719A (en) * 2014-11-20 2016-06-15 深圳英飞拓科技股份有限公司 Panoramic stitching seam smoothing method and panoramic stitching seam smoothing device
CN105493512A (en) * 2014-12-14 2016-04-13 深圳市大疆创新科技有限公司 Video processing method, video processing device and display device
CN105100640A (en) * 2015-01-23 2015-11-25 武汉智源泉信息科技有限公司 Local registration parallel video stitching method and local registration parallel video stitching system
US10013763B1 (en) * 2015-09-28 2018-07-03 Amazon Technologies, Inc. Increasing field of view using multiple devices
CN106067948A (en) * 2016-07-27 2016-11-02 杨珊珊 Unmanned plane and take photo by plane material processing equipment, automatic integrated system and integration method
US20180103197A1 (en) * 2016-10-06 2018-04-12 Gopro, Inc. Automatic Generation of Video Using Location-Based Metadata Generated from Wireless Beacons
CN108702464A (en) * 2017-10-16 2018-10-23 深圳市大疆创新科技有限公司 A kind of method for processing video frequency, control terminal and movable equipment
CN107808362A (en) * 2017-11-15 2018-03-16 北京工业大学 A kind of image split-joint method combined based on unmanned plane POS information with image SURF features
WO2019140621A1 (en) * 2018-01-19 2019-07-25 深圳市大疆创新科技有限公司 Video processing method and terminal device
CN110612721A (en) * 2018-01-19 2019-12-24 深圳市大疆创新科技有限公司 Video processing method and terminal equipment
CN112166599A (en) * 2019-09-26 2021-01-01 深圳市大疆创新科技有限公司 Video editing method and terminal equipment
CN110992261A (en) * 2019-11-15 2020-04-10 国网福建省电力有限公司漳州供电公司 Method for quickly splicing images of unmanned aerial vehicle of power transmission line
CN112544071A (en) * 2020-07-27 2021-03-23 华为技术有限公司 Video splicing method, device and system

Also Published As

Publication number Publication date
CN113099266B (en) 2023-05-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant