CN115619959B - Comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by unmanned aerial vehicle - Google Patents

Comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by unmanned aerial vehicle

Info

Publication number
CN115619959B
CN115619959B (application CN202211631541.5A)
Authority
CN
China
Prior art keywords
key frame
flight
key
representing
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211631541.5A
Other languages
Chinese (zh)
Other versions
CN115619959A (en)
Inventor
朱义勇
梁亢聘
李金玖
袁渊
袁现旺
慕昊润
张洪碧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202211631541.5A priority Critical patent/CN115619959B/en
Publication of CN115619959A publication Critical patent/CN115619959A/en
Application granted granted Critical
Publication of CN115619959B publication Critical patent/CN115619959B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05Geographic models
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B64AIRCRAFT; AVIATION; COSMONAUTICS
    • B64CAEROPLANES; HELICOPTERS
    • B64C39/00Aircraft not otherwise provided for
    • B64C39/02Aircraft not otherwise provided for characterised by special use
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/08Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The comprehensive environment three-dimensional modeling method comprises: obtaining an optimal route covering the area to be modeled from the optimization result of a comprehensive preference function that considers constraint conditions such as various costs, and flying the unmanned aerial vehicle along the optimal route to acquire comprehensive environment video data; measuring the image frames of the comprehensive environment video data with the Bhattacharyya distance, by the method of calculating image-frame similarity, to extract a key frame set; extracting several groups of adjacent key frame combinations from the key frame set, computing their degree of overlap with an improved SIFT algorithm, and re-extracting and supplementing frames for combinations that do not meet the accuracy requirement, to obtain a screened and supplemented key frame set; and embedding geographic information data into the screened and supplemented key frame set and preprocessing it, to construct a comprehensive environment three-dimensional model of the area to be modeled. The method ensures stable and efficient unmanned aerial vehicle operation and reduces the operators' burden, and it delivers a terrain three-dimensional model carrying geographic information data, making it convenient and practical.

Description

Comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by unmanned aerial vehicle
Technical Field
The invention relates to the technical field of unmanned aerial vehicle remote sensing, in particular to a comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by an unmanned aerial vehicle.
Background
It is common practice in many industry sectors to use drones to take pictures or videos to view a particular area. For portability, most current operation scenes use micro unmanned aerial vehicles carrying a single lens. A single lens can only acquire a two-dimensional plane image or a two-dimensional video, so viewing the terrain comprehensively and assessing the situation later requires repeatedly switching among many pictures or continuously replaying video, which causes problems such as long time consumption, inconvenient operation and difficult plotting.
Given that a single lens can only obtain a two-dimensional plane image or a two-dimensional video, three-dimensional modeling can overcome this inconvenience. However, the solutions for constructing three-dimensional models that are popular in the market have the following defects. First, the operation flow is complicated: image control points need to be selected and marked in advance during field work, and the whole flow is not fully suited to scenes with different environments. Second, in unmanned aerial vehicle flight operations the acquired data consist mainly of photos; if the photos are too few, information may be missed or lost, and if they are too many, subsequent three-dimensional modeling is greatly burdened, and the trade-off between the two is difficult to determine. Third, field operations take a long time: the flight speed of the unmanned aerial vehicle is limited by factors such as the optical flow phenomenon and sensor sensitivity, a low-altitude, low-speed unmanned aerial vehicle has difficulty working stably under certain severe conditions, and long airborne times are sometimes not permitted by meteorological conditions, battery capacity and scene sensitivity.
Disclosure of Invention
In order to overcome at least one of the technical defects of the prior art mentioned above, the invention provides a comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by an unmanned aerial vehicle, which speeds up generation of the three-dimensional model, updates the three-dimensional model as quickly as possible, and dispels the information fog in application scenes such as mountain forests and urban environments, facilitating operators' decision making.
In order to achieve the purpose, the invention provides a comprehensive environment three-dimensional modeling method for extracting key frames based on videos collected by an unmanned aerial vehicle, which comprises the following steps:
acquiring an optimal route covering the area to be modeled based on the optimization result of a comprehensive preference function considering one or more constraints among route length cost, dead time cost, special operation risk zone cost and no-fly zone cost, and flying the unmanned aerial vehicle along the optimal route to acquire comprehensive environment video data;
measuring the image frames of the comprehensive environment video data with the Bhattacharyya distance, by the method of calculating image-frame similarity, so as to extract a key frame set;
extracting several groups of adjacent key frame combinations from the key frame set, computing their degree of overlap with an improved SIFT algorithm, and re-extracting and supplementing frames for frame combinations that do not meet the accuracy requirement, to obtain a screened and supplemented key frame set;
and embedding geographic information data into the screened and supplemented key frame set and preprocessing it, so as to construct a comprehensive environment three-dimensional model of the area to be modeled.
Further, the expression of the comprehensive preference function is:

F = Σ_{i=1..n} Σ_{j=1..m} λ_j · c_{ij}

where F represents the comprehensive preference value of the route planning; m represents the number of constraints; n represents the number of flight legs of the unmanned aerial vehicle; c_{ij} represents the cost of the j-th constraint during flight of the i-th leg; and λ_j represents the preference coefficient, i.e. the preference given to the j-th constraint in a particular route planning.
Further, the expression of the route length cost is:

c_{i1} = l_i / L

where c_{i1} represents the route length cost during flight of the i-th leg; l_i represents the path length of the i-th leg; and L represents the straight-line distance of the entire route, covering all legs, in the most ideal case.
Further, the expression of the dead time cost is:

c_{i2} = t_i / T

where c_{i2} represents the dead time cost during flight of the i-th leg; t_i represents the time taken to fly the i-th leg; and T represents the time needed by the unmanned aerial vehicle to fly the straight-line route at constant speed in the most ideal case.
Further, the expression of the special operation risk zone cost is:

c_{i3} = (R_3 / r_{i3})² · (l_i / L)  if r_{i3} < R_3;  c_{i3} = 0  otherwise

where c_{i3} represents the special operation risk zone cost during flight of the i-th leg; R_3 represents the threat radius of the risk zone; r_{i3} represents the shortest distance between the path of the i-th leg and the center of the risk zone; l_i represents the path length of the i-th leg; and L represents the straight-line distance of the entire route, covering all legs, in the most ideal case.
Further, the expression of the no-fly zone cost is:

c_{i4} = ∞  if r_{i4} ≤ R_4;  c_{i4} = 0  if r_{i4} > R_4

where c_{i4} represents the no-fly zone cost during flight of the i-th leg; r_{i4} represents the shortest distance between the path of the i-th leg and the control point of the no-fly zone; and R_4 represents the no-fly radius of the no-fly zone.
Further, measuring the image frames of the comprehensive environment video data with the Bhattacharyya distance, by the method of calculating image-frame similarity, to extract the key frame set specifically includes:

extracting one image frame as a candidate key frame every preset number of image frames;

calculating, by the formula

D_B(p, q) = −ln( BC(p, q) ),  BC(p, q) = Σ_x √( p(x) · q(x) )

the Bhattacharyya distance between the extracted candidate key frame and the previous key frame, where the first key frame is initialized with the first acquired image frame; p and q represent the two image frames; p(x) and q(x) represent the gray values of the two image frames at x; D_B(p, q) represents the Bhattacharyya distance of the two image frames; and BC(p, q) is the Bhattacharyya coefficient;

judging whether the Bhattacharyya coefficient BC(p, q) exceeds a preset Bhattacharyya coefficient threshold; if it does not, the candidate key frame is judged to be a new key frame;

continuing to extract one image frame as the next candidate key frame every preset number of image frames, judging whether it is a new key frame by the above Bhattacharyya coefficient threshold judgment, and repeating in sequence until all image frames in the comprehensive environment video data have been traversed and judged for one round, to obtain the key frame set.
Further, the improved SIFT algorithm comprises:

when determining the orientation, setting the Gaussian weighting parameter to a preset multiple of the key point scale, then computing the histogram of n orientations to obtain the histogram statistics h_1, h_2, …, h_n; taking the largest of them, h_max, as the orientation of the key point, ignoring the selection of an auxiliary orientation, and recording the histogram sequence so it can be reused in the feature vector generation calculation.
Further, embedding geographic information data into the screened and supplemented key frame set specifically includes:

a program extracts the geographic information data stored in the ROM of the unmanned aerial vehicle; the key frames in the screened and supplemented key frame set are then matched one-to-one with the geographic information data by comparing the key frames' timestamps, and the geographic information data are added to the exif metadata of each key frame using the API provided by exiv2.
Further, the preprocessing specifically comprises:

changing the color space of the key frames in the screened and supplemented key frame set, converting the RGB space into the HSV space, and performing histogram equalization on the value (gray) component V;

calculating the probability of each gray level by the formula

p(r_k) = n_k / n,  k = 0, 1, …, L−1

where n represents the total number of pixels, r_k represents the k-th gray level, and n_k represents the number of pixels at the k-th gray level;

transforming all pixels by the transformation formula

s_k = T(r_k) = Σ_{j=0..k} p(r_j)

where s_k represents the transformed pixel value.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) In application, the unmanned aerial vehicle of the method acquires data by shooting video, so a higher flight speed can be used to sweep over the surveyed region; this effectively reduces the unmanned aerial vehicle's dead time and operating time, keeps it within the safe working time allowed by its limited battery capacity, and reduces the operators' burden and the demands on their piloting skills. The improved workflow abandons highly specialized steps such as marking image control points, and recording video places relatively low demands on the route, which simplifies operator training and keeps the complexity of the whole workflow low. The method obtains the optimal route covering the area to be modeled from the optimization result of a comprehensive preference function considering one or more constraints among route length cost, dead time cost, special operation risk zone cost and no-fly zone cost, so the unmanned aerial vehicle can collect video data of the area to be modeled comprehensively while the route remains most economical.
(2) The method measures the image frames of the comprehensive environment video data with the Bhattacharyya distance, by the method of calculating image-frame similarity in the video stream, and takes the extracted key frame set as the data source for model synthesis; this greatly reduces the data volume, greatly relieves the computational pressure of model synthesis, and makes dynamic updating of the synthesized environment three-dimensional model possible as application requirements evolve.
(3) The method extracts several groups of adjacent key frame combinations from the key frame set, computes their degree of overlap with an improved SIFT algorithm, and re-extracts and supplements frames for frame combinations that do not meet the accuracy requirement, thereby obtaining the screened and supplemented key frame set; this performs one round of inspection and quality evaluation on the key frame set and improves the accuracy of key frame extraction.
(4) The method embeds geographic information data into the screened key frame set. Compared with traditional information carriers, the three-dimensional model constructed by the method has the distinct advantages of carrying accurate geographic position information and supporting calculations directly on the model, which improves the model's secondary development potential.
(5) The method preprocesses the screened key frames with an image enhancement algorithm based on histogram equalization, enhancing image contrast and definition; experiments prove that the number of matched key points increases significantly, which helps improve the precision and efficiency of subsequent modeling.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a comprehensive environment three-dimensional modeling method for extracting key frames based on videos collected by an unmanned aerial vehicle according to an embodiment of the present invention;
FIG. 2 is a schematic view of a route that overall uses a "well"-shaped (grid) route supplemented with surrounding routes according to an embodiment of the present invention; in FIG. 2, 1 and 2 are no-fly zones, 3 and 4 are risk zones, 5 and 6 are dead-time-limited zones, the long "well"-shaped chain indicated by 7 is the "well"-shaped route, and the several surrounding short chains indicated by 8 are the surrounding routes;
FIG. 3 is a schematic view of a "well" pattern of the flight path provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a surrounding route according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The terms "comprises" or "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The comprehensive environment three-dimensional modeling method for extracting key frames based on videos collected by an unmanned aerial vehicle mainly involves the following design aspects. First, an optimized data acquisition flow for the unmanned aerial vehicle: data acquisition is primarily by video shooting, while the unmanned aerial vehicle's dead time is reduced as much as possible by raising the flight speed, simplifying the route, and similar methods. Second, an algorithm for key frame identification and extraction, used to keep the extracted key frame set as small as possible and reduce computation time while still meeting the overlap requirement of modeling. Third, a sampling inspection method for the frame set: the key frame set generated by the extraction algorithm needs one round of inspection and checking, and missing image information is supplemented promptly when a problem is found. Fourth, embedding geographic information into the key frame set, which ensures the generated model carries accurate geographic information data, guarantees the model's functionality and extensibility, and makes subsequent secondary development possible. More specific schemes are set forth below.
As shown in fig. 1, in an embodiment, a method for three-dimensional modeling of an integrated environment based on extraction of keyframes from videos captured by an unmanned aerial vehicle mainly includes the following steps:
step 1, obtaining an optimal air route covering an area needing modeling based on an optimization result of a comprehensive preference function considering one or more constraint conditions including air route length cost, dead time cost, special operation risk area cost and no-fly area cost, and enabling an unmanned aerial vehicle to sail along the optimal air route to obtain comprehensive environment video data.
This step optimizes the comprehensive environment data acquisition process. When the unmanned aerial vehicle operates, the sensitivity of each factor changes with the comprehensive environment, so operators must obtain as much high-value information as possible within the unmanned aerial vehicle's limited dead time. The goal of the optimization is to plan a route covering the area to be modeled under constraints such as environment, battery capacity and dead time, comprehensively considered; this is a typical multi-objective optimization problem.
First, the constraints mentioned above, which have different physical meanings, are converted into dimensionless satisfaction indicators of the same order of magnitude, which serve as parameters of a global preference function of the following form:

F = Σ_{i=1..n} Σ_{j=1..m} λ_j · c_{ij}    (1)

where F is the comprehensive preference value of the route planning; m is the number of constraints; n is the number of flight legs of the unmanned aerial vehicle; c_{ij} is the cost of the j-th constraint during flight of the i-th leg; and λ_j is a preference coefficient set to adapt to various application scenarios, representing the preference given to the j-th constraint in a particular route planning task. This form improves on the original approach of averaging the constraint preference functions and taking the logarithm; with the added preference coefficients, the result of the formula can be adjusted to match the emphasis of the operational requirement. Experiments show that the following 4 constraints need to be considered in route planning: route length cost, dead time cost, special operation risk zone cost, and no-fly zone cost.
The first is the route length cost. The route length is the length of the actual flight track from takeoff to return. In the ideal case a straight-line path is undoubtedly the most economical and practical solution, but, constrained by various factors, the route inevitably bends and folds, so this cost is the ratio of the bent route to the straight-line distance, computed as:

c_{i1} = l_i / L    (2)

where c_{i1} is the route length cost during flight of the i-th leg; l_i is the path length of the i-th leg; and L is the straight-line distance of the entire route, covering all legs, in the most ideal case.
The second is the dead time cost. The dead time of a small unmanned aerial vehicle is limited, and improving data acquisition efficiency as much as possible under the constraint of limited operating time is an important research direction. Based on this consideration, this part of the cost function is computed as:

c_{i2} = t_i / T    (3)

where c_{i2} is the dead time cost during flight of the i-th leg; t_i is the dead time (the time taken to fly) of the i-th leg; and T is the time needed by the unmanned aerial vehicle to fly the straight-line route at constant speed in the most ideal case; the preferred optimal constant cruising speed is 20 km/h.
The third is the risk cost in various special operations (the special operation risk zone cost). In some special applications this problem must be considered; the ground threat is taken to be inversely proportional to the square of the distance of the unmanned aerial vehicle from the center of the risk zone, so this part of the cost is computed as:

c_{i3} = (R_3 / r_{i3})² · (l_i / L),  if r_{i3} < R_3    (4)

c_{i3} = 0,  if r_{i3} ≥ R_3    (5)

where c_{i3} is the special operation risk zone cost during flight of the i-th leg; R_3 is the threat radius of the risk zone; r_{i3} is the shortest distance between the path of the i-th leg and the center of the risk zone; l_i is the path length of the i-th leg; and L is the straight-line distance of the entire route, covering all legs, in the most ideal case.
The fourth is the cost of terrain and pre-designated no-fly zones (the no-fly zone cost). Limited by the performance of a micro unmanned aerial vehicle, its flight altitude and speed are affected by factors such as air pressure, humidity and wind speed, so locally extreme regions of the natural environment are unsuitable for micro unmanned aerial vehicle operation. Meanwhile, owing to factors such as pre-made plans, some regions are designated no-fly zones, which the unmanned aerial vehicle must not enter. Considering these factors together, this part of the cost is computed as:

c_{i4} = ∞,  if r_{i4} ≤ R_4;  c_{i4} = 0,  if r_{i4} > R_4    (6)

where c_{i4} is the no-fly zone cost during flight of the i-th leg; R_4 is the no-fly radius of the no-fly zone or of a region whose terrain environment is so severe that the unmanned aerial vehicle should not fly there; and r_{i4} is the shortest distance between the path of the i-th leg and the control point of the no-fly zone.
In formula (1), the design of λ follows the operators' preference and pre-made plans, and can be adjusted adaptively to the performance of the unmanned aerial vehicle, the environment, and other factors. As a suggested design principle for this embodiment, the constraints can be divided into six levels according to their importance, from extremely important down to not of concern, each level corresponding to a different preference coefficient, with larger coefficients assigned to the more important constraints (7). Given the geographic environment and the degree of importance placed on each factor, the operator can weigh the environment of the area to be mapped and make trade-offs among the factors.
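For illustration only, the following minimal Python sketch shows how the four per-leg costs and the comprehensive preference value of formula (1) could be combined. The function names, the piecewise forms of the risk and no-fly costs, and the coefficient values are assumptions of this sketch, not a definitive implementation of the patented method.

import math

def leg_costs(l_i, t_i, L, T, r3, R3, r4, R4):
    """Return the four per-leg costs c_i1..c_i4 (assumed forms, formulas (2)-(6))."""
    c1 = l_i / L                                            # route length cost, (2)
    c2 = t_i / T                                            # dead time cost, (3)
    c3 = (R3 / r3) ** 2 * (l_i / L) if r3 < R3 else 0.0     # risk zone cost, (4)/(5)
    c4 = math.inf if r4 <= R4 else 0.0                      # no-fly zone: hard constraint, (6)
    return [c1, c2, c3, c4]

def preference(legs, lam):
    """Comprehensive preference value F = sum_i sum_j lam_j * c_ij, formula (1)."""
    return sum(lam[j] * c[j] for c in legs for j in range(len(lam)))

# Example: two legs, preference coefficients chosen by the operator (assumed values)
lam = [1.0, 0.8, 0.4, 1.0]
legs = [leg_costs(1200, 240, 2000, 180, 500, 300, 900, 400),
        leg_costs(1500, 300, 2000, 180, 250, 300, 1200, 400)]
print(preference(legs, lam))

A candidate route that crosses a no-fly zone thus receives an infinite preference value and is rejected outright, while the other three costs trade off smoothly under the operator-chosen coefficients.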
Through the above, the comprehensive environment can be segmented to form a multi-level, optional, highly operable route planning base map. On this basis, considering the overlap requirement of three-dimensional modeling, two route modes, the surrounding route and the "well"-shaped route, are combined to perform global coverage reconnaissance of the area to be surveyed. The overall route is shown in FIG. 2: first the "well"-shaped route covers the whole area to be surveyed (its schematic is shown in FIG. 3); if a region unsuitable for flight is met, the route simply bypasses it; then the surrounding route (its schematic is shown in FIG. 4; the surrounding track is an arc centered on a waypoint of the "well"-shaped route with a certain distance as radius) performs supplementary shooting of that region. Although some accuracy is sacrificed, this route planning method can in fact meet basic modeling requirements, and it is highly operable, especially in time-critical scenarios.
Step 2: measure the image frames of the comprehensive environment video data with the Bhattacharyya distance, by the method of calculating image-frame similarity, so as to extract the key frame set.
This step identifies and extracts the key frames in the video stream information. The comprehensive environment video data acquired by the method inevitably contain excessive redundant information; performing the three-dimensional modeling operation directly on them would put excessive pressure on the processor, hence a key frame acquisition flow that eliminates the redundant data and keeps only the data supporting modeling.
This step is implemented with a key frame extraction algorithm based on video sampling and inter-image similarity. Two factors are considered together. First, the video stream is collected by a 30 FPS to 60 FPS camera, and comparing frames one by one would make the algorithm too inefficient, wasting too much time in the extraction step, so sampling is used to compress the computation in advance; experiments verify that, with reasonably chosen parameters, sampling does not affect the final result. Second, key frame selection must satisfy the overlap requirement of three-dimensional modeling, but computing image overlap directly is expensive, so this step approximates it by roughly estimating the overlap through image similarity calculation.
Experiments show that measuring two images with the Bhattacharyya distance fits the overlap calculation requirement well; this measure mainly reflects the similarity of the images' pixel-value distributions and can suppress the influence of small noise and optical flow on the images. It is computed as:

D_B(p, q) = −ln( BC(p, q) )    (8)

where the Bhattacharyya coefficient BC(p, q) is computed as:

BC(p, q) = Σ_x √( p(x) · q(x) )    (9)

where D_B(p, q) is the Bhattacharyya distance of the two image frames; BC(p, q) is the Bhattacharyya coefficient; p and q represent the two image frames; and p(x) and q(x) represent the gray values of the two image frames at x.

As formula (9) shows, a larger Bhattacharyya coefficient means greater overlap between the two samples; when the Bhattacharyya distance between an image frame and the previous key frame reaches a certain threshold (equivalently, when the Bhattacharyya coefficient of the two image frames does not exceed a preset threshold), that frame is determined to be the next key frame. The whole extraction process is as follows:
(1) Every 10 frames, one image frame is extracted as a candidate key frame.

(2) The Bhattacharyya distance between the extracted candidate key frame and the previous key frame is calculated by formula (8), where the first key frame is initialized with the first acquired image frame.

(3) A Bhattacharyya coefficient threshold θ related to the Bhattacharyya distance is set; if BC(p, q) ≤ θ, a new key frame is determined.

(4) Steps (1) to (3) are repeated until the whole video stream has been traversed.
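As an illustrative sketch only (not the patent's code), the Python/OpenCV loop below follows this process; the 10-frame sampling interval matches step (1), while the histogram size and threshold are assumed values. Note that OpenCV's cv2.HISTCMP_BHATTACHARYYA returns the Hellinger distance √(1 − BC) rather than −ln BC, so the threshold must be interpreted on that scale: a large distance means a small coefficient, i.e. low overlap.

import cv2

def extract_keyframes(video_path, step=10, dist_thresh=0.35):
    """Sample every `step` frames; keep a frame as a new key frame when its
    histogram distance to the last key frame exceeds `dist_thresh`."""
    cap = cv2.VideoCapture(video_path)
    keyframes, last_hist = [], None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
            cv2.normalize(hist, hist, 1.0, 0.0, cv2.NORM_L1)
            if last_hist is None:
                keyframes.append(frame)   # first frame initializes the key frame set
                last_hist = hist
            else:
                # Hellinger distance sqrt(1 - BC); grows as the overlap shrinks
                d = cv2.compareHist(last_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
                if d > dist_thresh:       # low overlap -> judged a new key frame
                    keyframes.append(frame)
                    last_hist = hist
        idx += 1
    cap.release()
    return keyframes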
Step 3: extract several groups of adjacent key frame combinations from the key frame set, compute their degree of overlap with the improved SIFT algorithm, and re-extract and supplement frames for frame combinations that do not meet the accuracy requirement, to obtain the screened and supplemented key frame set.
This step screens and supplements the key frames. The key frame set extracted by the image similarity method inevitably contains accuracy errors, so the set must be checked and evaluated once. To improve accuracy, this step extracts several groups of adjacent key frame combinations from the key frame set and then computes their degree of overlap with the SIFT algorithm; frame combinations that do not meet the requirement are re-extracted and re-supplemented.
The key to this step is applying and improving the traditional SIFT algorithm. The traditional SIFT algorithm has high complexity, long image processing time and low overall efficiency, so it needs a certain degree of simplification; the RANSAC algorithm can then compensate for the precision lost by the simplification.
The traditional SIFT algorithm mainly comprises the following five steps: first, building the scale space, generating a difference-of-Gaussian scale space by convolving the image with difference-of-Gaussian kernels of different scales; second, detecting extrema in the scale space, precisely localizing the extremum points, and removing key points with low contrast and unstable edge response points; third, assigning key point orientations, determining each key point's orientation from the gradient orientation distribution of its neighborhood pixels so the key points are rotation invariant; fourth, generating feature point descriptors, describing each key point with 4 × 4 = 16 seed points to finally form a 128-dimensional feature vector; fifth, feature matching, computing Euclidean distances from the feature vectors generated in the fourth step and finding matching feature points by comparison. Experimental analysis shows that generating the feature point descriptors in the fourth step and matching the features in the fifth step take the longest, so these two key links need to be optimized.
First, when determining a key point's orientation, any direction holding 80% of the main peak's energy is taken as the key point's auxiliary orientation; but since the feature point coordinates of the auxiliary orientation and the main orientation are the same, this inflates the number of matches, so the auxiliary orientation should be abandoned. Second, in the feature vector generation process, the key point neighborhood histogram is generated twice (once for orientation assignment and once for feature vector generation), which can be optimized.
To solve these two problems: first, when determining the orientation, the Gaussian weighting parameter is set to 1.5 times the key point scale σ; then the histogram of n orientations is computed to obtain the histogram statistics h_1, h_2, …, h_n, and the largest of them, h_max, is taken as the orientation of the key point, the selection of an auxiliary orientation being ignored; the histogram sequence is recorded and applied to the feature vector generation calculation. With this optimization, fewer key points are generated, but the retained key points can still be matched and an accurate overlap result for the images can be obtained.
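Purely as an illustration of this modified orientation assignment, the numpy sketch below computes a single dominant orientation from a Gaussian-weighted gradient histogram. The 36-bin histogram, the patch handling and the helper names are assumptions of this sketch; a real implementation would sit inside a complete SIFT pipeline.

import numpy as np

def dominant_orientation(mag, ang, sigma, n_bins=36):
    """Assign a single key point orientation from a gradient patch.

    mag, ang: gradient magnitude and angle (radians) in a square patch
              centred on the key point; sigma: key point scale.
    Returns (orientation, histogram); no auxiliary orientation is kept."""
    h, w = mag.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Gaussian weighting with parameter 1.5 * sigma (the modified setting)
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * (1.5 * sigma) ** 2))
    bins = ((ang % (2 * np.pi)) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=(mag * g).ravel(), minlength=n_bins)
    # Take only the largest bin; the 80%-of-peak auxiliary orientation is dropped
    k = int(np.argmax(hist))
    return 2 * np.pi * (k + 0.5) / n_bins, hist  # histogram reused downstream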
Step 4: embed geographic information data into the screened and supplemented key frame set and preprocess it, so as to construct the comprehensive environment three-dimensional model of the area to be modeled.
For a three-dimensional model, carrying geographic position information is a crucial feature, but the key frames extracted by the method lack the related data in their exif metadata, so the synthesized three-dimensional model would not support later function expansion.
To solve this problem, a program can extract the geographic information data stored in the ROM of the unmanned aerial vehicle, match the extracted key frames one-to-one with the geographic information data by comparing the key frames' timestamps, and add the data to the exif metadata of each frame using the API provided by exiv2, which improves the model's secondary development potential.
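The patent names the exiv2 API; as a hedged Python illustration, the sketch below uses the piexif library instead (a substitution, not the tool named in the patent), and the DMS conversion and file names are assumptions of this sketch.

import piexif

def to_dms(deg):
    """Decimal degrees -> EXIF rational ((d,1),(m,1),(s*100,100))."""
    d = int(deg)
    m = int((deg - d) * 60)
    s = round(((deg - d) * 60 - m) * 60 * 100)
    return ((d, 1), (m, 1), (s, 100))

def embed_gps(jpeg_path, lat, lon, alt_m):
    """Write latitude/longitude/altitude into the key frame's EXIF GPS IFD."""
    exif = piexif.load(jpeg_path)
    exif["GPS"] = {
        piexif.GPSIFD.GPSLatitudeRef: b"N" if lat >= 0 else b"S",
        piexif.GPSIFD.GPSLatitude: to_dms(abs(lat)),
        piexif.GPSIFD.GPSLongitudeRef: b"E" if lon >= 0 else b"W",
        piexif.GPSIFD.GPSLongitude: to_dms(abs(lon)),
        piexif.GPSIFD.GPSAltitude: (int(alt_m * 100), 100),
    }
    piexif.insert(piexif.dump(exif), jpeg_path)

# e.g. embed_gps("keyframe_0001.jpg", 28.2282, 112.9388, 120.0)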
During aerial photography the unmanned aerial vehicle is affected by factors such as sun angle and environmental cover, so the acquired images differ in gray scale and brightness, which adversely affects image feature detection. In this embodiment, an image enhancement algorithm based on histogram equalization preprocesses the screened key frames; the algorithm's steps are as follows:
(1) Change the color space of the key frames in the screened and supplemented key frame set, converting the RGB space into the HSV space, and perform histogram equalization on the value (gray) component V.
(2) Calculate the probability of each gray level by the formula:

p(r_k) = n_k / n,  k = 0, 1, …, L−1    (10)

where n represents the total number of pixels, r_k represents the k-th gray level, and n_k represents the number of pixels at the k-th gray level.
(3) Transform all pixels by the transformation formula:

s_k = T(r_k) = Σ_{j=0..k} p(r_j)    (11)

where s_k represents the transformed pixel value.
After preprocessing by this method, the contrast and definition of the image frames are enhanced; experiments prove that the number of matched key points increases significantly, which helps improve the precision and efficiency of subsequent modeling.
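As a minimal illustration, the OpenCV sketch below performs this V-channel equalization; cv2.equalizeHist internally applies the cumulative transform of formulas (10) and (11), and the file names in the usage line are placeholders.

import cv2

def enhance_keyframe(img_bgr):
    """Equalize the V (value/gray) channel in HSV space, per formulas (10)-(11)."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    v_eq = cv2.equalizeHist(v)  # histogram equalization of the V component
    return cv2.cvtColor(cv2.merge((h, s, v_eq)), cv2.COLOR_HSV2BGR)

# e.g. cv2.imwrite("frame_eq.jpg", enhance_keyframe(cv2.imread("frame.jpg")))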
Once the data are sufficiently prepared, the model can be constructed with mature tools such as Photoscan and ContextCapture; the OSGB format is generally chosen for the generated three-dimensional model, so that it adapts to various display platforms and favors extensible development.
Through the above four steps, the scheme achieves an effect similar to traditional oblique photography modeling while greatly reducing the amount of data processed; it meets the timeliness requirements of various operating environments while keeping the model intuitive, three-dimensional and vivid, and it also offers some inspiration and reference for traditional oblique photography.
The technical route of the method is a brand-new research direction arising from the intersection of surveying and mapping with disciplines such as unmanned aerial vehicles and image processing; its application goal is to provide comprehensive environment information services for terrain-based situation assessment and operation planning. In an emergency, it can serve as a situation display platform for a specific area; in normal times, it can provide information support for relevant business departments to view the terrain in all directions and to think, discuss and deduce. It vividly and stereoscopically presents the real geographic environment to the personnel concerned, and provides functions such as plotting and dynamic demonstration, thereby accelerating the decision-making process.
In conclusion, the invention optimizes the data acquisition strategy on the basis of the traditional oblique photography workflow, replacing photo shooting with video recording. This improvement lowers the demands the whole workflow places on route planning, reduces the unmanned aerial vehicle's dead time, suits the many grassroots departments and individuals whose unmanned aerial vehicles are small with limited battery capacity, and, by acquiring information as video, greatly reduces the technical requirements on operators; the whole solution is widely adaptable and practical.
The invention can quickly and efficiently identify and extract the key frames in a video stream. A full comprehensive environment video contains excessive redundant data, so the designed key frame identification algorithm is the key to overall efficiency: it removes most redundant information while meeting the modeling overlap requirement, providing data support for later modeling. From the user's perspective, this scheme makes full use of the simple video function of a micro unmanned aerial vehicle (good portability, takeoff weight about 1.5 kg) and extracts key frames from the video it collects to realize three-dimensional modeling of the comprehensive environment, aiming to overcome the shortcomings of directly using drone-collected pictures or video to view terrain in all directions and assess the situation; it strengthens the operators' interactive experience with the terrain environment and improves decision-making efficiency.
The invention checks the extracted key frame set: the set is checked by sampling an appropriate range, accurate overlap calculation is performed on the extracted samples, and if an image frame that fails the accuracy requirement is found, the missing part is supplemented.
The invention embeds geographic information data into the screened key frame set. Compared with traditional information carriers, the constructed three-dimensional model has the distinct advantages of carrying accurate geographic position information and supporting calculation work directly on the model. However, the directly extracted image frames lack geographic information, so a dedicated auxiliary program is needed to embed the calibrated geographic information data to meet various application requirements.
The above description is merely an exemplary embodiment of the present disclosure, and the scope of the present disclosure is not limited thereto. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
All possible combinations of the technical features in the above embodiments may not be described for the sake of brevity, but should be considered as being within the scope of the present disclosure as long as there is no contradiction between the combinations of the technical features.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A comprehensive environment three-dimensional modeling method for extracting key frames based on videos collected by an unmanned aerial vehicle, characterized by comprising the following steps:
acquiring an optimal route covering the area to be modeled based on the optimization result of a comprehensive preference function considering one or more constraints among route length cost, dead time cost, special operation risk zone cost and no-fly zone cost, and flying the unmanned aerial vehicle along the optimal route to acquire comprehensive environment video data;
measuring the image frames of the comprehensive environment video data with the Bhattacharyya distance, by the method of calculating image-frame similarity, so as to extract a key frame set;
extracting several groups of adjacent key frame combinations from the key frame set, computing their degree of overlap with an improved SIFT algorithm, and re-extracting and supplementing frames for frame combinations that do not meet the accuracy requirement, to obtain a screened and supplemented key frame set;
embedding geographic information data into the screened and supplemented key frame set and preprocessing it, so as to construct a comprehensive environment three-dimensional model of the area to be modeled;
wherein the improved SIFT algorithm comprises:

when determining the orientation, setting the Gaussian weighting parameter to a preset multiple of the key point scale, then computing the histogram of n orientations to obtain the histogram statistics h_1, h_2, …, h_n; taking the largest of them, h_max, as the orientation of the key point, ignoring the selection of an auxiliary orientation, and recording the histogram sequence so it can be reused in the feature vector generation calculation.
2. The modeling method of claim 1, wherein the comprehensive preference function has the expression:

F = Σ_{i=1..n} Σ_{j=1..m} λ_j · c_{ij}

where F represents the comprehensive preference value of the route planning; m represents the number of constraints; n represents the number of flight legs of the unmanned aerial vehicle; c_{ij} represents the cost of the j-th constraint during flight of the i-th leg; and λ_j represents the preference coefficient, i.e. the preference given to the j-th constraint in a particular route planning.
3. A modeling method according to claim 2, in which the route length cost is expressed as:

c_{i1} = l_i / L

where c_{i1} represents the route length cost during flight of the i-th leg; l_i represents the path length of the i-th leg; and L represents the straight-line distance of the entire route, covering all legs, in the most ideal case.
4. A modeling method according to claim 2, wherein the dead time cost is expressed as:

c_{i2} = t_i / T

where c_{i2} represents the dead time cost during flight of the i-th leg; t_i represents the time taken to fly the i-th leg; and T represents the time needed by the unmanned aerial vehicle to fly the straight-line route at constant speed in the most ideal case.
5. The modeling method of claim 2, wherein the special operation risk zone cost is expressed as:

c_{i3} = (R_3 / r_{i3})² · (l_i / L)  if r_{i3} < R_3;  c_{i3} = 0  otherwise

where c_{i3} represents the special operation risk zone cost during flight of the i-th leg; R_3 represents the threat radius of the risk zone; r_{i3} represents the shortest distance between the path of the i-th leg and the center of the risk zone; l_i represents the path length of the i-th leg; and L represents the straight-line distance of the entire route, covering all legs, in the most ideal case.
6. A modeling method in accordance with claim 2, wherein the no-fly zone cost is expressed as:

c_{i4} = ∞  if r_{i4} ≤ R_4;  c_{i4} = 0  if r_{i4} > R_4

where c_{i4} represents the no-fly zone cost during flight of the i-th leg; r_{i4} represents the shortest distance between the path of the i-th leg and the control point of the no-fly zone; and R_4 represents the no-fly radius of the no-fly zone.
7. The modeling method of claim 1, wherein measuring the image frames of the comprehensive environment video data with the Bhattacharyya distance, by the method of calculating image-frame similarity, to extract the key frame set specifically comprises:

extracting one image frame as a candidate key frame every preset number of image frames;

calculating, by the formula

D_B(p, q) = −ln( BC(p, q) ),  BC(p, q) = Σ_x √( p(x) · q(x) )

the Bhattacharyya distance between the extracted candidate key frame and the previous key frame, where the first key frame is initialized with the first acquired image frame; p and q represent the two image frames; p(x) and q(x) represent the gray values of the two image frames at x; D_B(p, q) represents the Bhattacharyya distance of the two image frames; and BC(p, q) is the Bhattacharyya coefficient;

judging whether the Bhattacharyya coefficient BC(p, q) exceeds a preset Bhattacharyya coefficient threshold; if it does not, the candidate key frame is judged to be a new key frame;

continuing to extract one image frame as the next candidate key frame every preset number of image frames, judging whether it is a new key frame by the above Bhattacharyya coefficient threshold judgment, and repeating in sequence until all image frames in the comprehensive environment video data have been traversed and judged for one round, to obtain the key frame set.
8. The modeling method of claim 1, wherein embedding geographic information data into the screened and supplemented key frame set specifically comprises:

a program extracts the geographic information data stored in the ROM of the unmanned aerial vehicle; the key frames in the screened and supplemented key frame set are then matched one-to-one with the geographic information data by comparing the key frames' timestamps, and the geographic information data are added to the exif metadata of each key frame using the API provided by exiv2.
9. The modeling method of claim 1, wherein the preprocessing specifically comprises:

changing the color space of the key frames in the screened and supplemented key frame set, converting the RGB space into the HSV space, and performing histogram equalization on the value (gray) component V;

calculating the probability of each gray level by the formula

p(r_k) = n_k / n,  k = 0, 1, …, L−1

where n represents the total number of pixels, r_k represents the k-th gray level, and n_k represents the number of pixels at the k-th gray level;

transforming all pixels by the transformation formula

s_k = T(r_k) = Σ_{j=0..k} p(r_j)

where s_k represents the transformed pixel value.
CN202211631541.5A 2022-12-19 2022-12-19 Comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by unmanned aerial vehicle Active CN115619959B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211631541.5A CN115619959B (en) 2022-12-19 2022-12-19 Comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211631541.5A CN115619959B (en) 2022-12-19 2022-12-19 Comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by unmanned aerial vehicle

Publications (2)

Publication Number Publication Date
CN115619959A CN115619959A (en) 2023-01-17
CN115619959B true CN115619959B (en) 2023-04-07

Family

ID=84880255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211631541.5A Active CN115619959B (en) 2022-12-19 2022-12-19 Comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN115619959B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719144B (en) * 2009-11-04 2013-04-24 中国科学院声学研究所 Method for segmenting and indexing scenes by combining captions and video image information
CN104463962B (en) * 2014-12-09 2017-02-22 合肥工业大学 Three-dimensional scene reconstruction method based on GPS information video
CN104899861B (en) * 2015-04-01 2017-10-27 华北电力大学(保定) The automatic searching method of key frame in a kind of intravascular ultrasound video
ES2848851T3 (en) * 2018-04-30 2021-08-12 Tata Consultancy Services Ltd Procedure and system for the construction of images based on the stitching of frames in an indoor environment

Also Published As

Publication number Publication date
CN115619959A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
Wegner et al. Cataloging public objects using aerial and street-level images-urban trees
CN110956651B (en) Terrain semantic perception method based on fusion of vision and vibrotactile sense
Lin et al. Use of UAV oblique imaging for the detection of individual trees in residential environments
CN111540048B (en) Fine live-action three-dimensional modeling method based on space-ground fusion
CN109520500B (en) Accurate positioning and street view library acquisition method based on terminal shooting image matching
CN109883401B (en) Method and system for measuring visual field of city mountain watching
CN106373088B (en) The quick joining method of low Duplication aerial image is tilted greatly
KR102200299B1 (en) A system implementing management solution of road facility based on 3D-VR multi-sensor system and a method thereof
Barazzetti et al. True-orthophoto generation from UAV images: Implementation of a combined photogrammetric and computer vision approach
WO2004095374A1 (en) Video object recognition device and recognition method, video annotation giving device and giving method, and program
CN115439424A (en) Intelligent detection method for aerial video image of unmanned aerial vehicle
CN108320304A (en) A kind of automatic edit methods and system of unmanned plane video media
Poterek et al. Deep learning for automatic colorization of legacy grayscale aerial photographs
CN112991487A (en) System for multithreading real-time construction of orthoimage semantic map
CN113340312A (en) AR indoor live-action navigation method and system
CN116030194A (en) Air-ground collaborative live-action three-dimensional modeling optimization method based on target detection avoidance
Shin et al. True orthoimage generation using airborne lidar data with generative adversarial network-based deep learning model
CN117315146B (en) Reconstruction method and storage method of three-dimensional model based on trans-scale multi-source data
CN109883400A (en) Fixed station Automatic Targets and space-location method based on YOLO-SITCOL
CN115619959B (en) Comprehensive environment three-dimensional modeling method for extracting key frames based on videos acquired by unmanned aerial vehicle
Maurer et al. Automated inspection of power line corridors to measure vegetation undercut using UAV-based images
CN115937673B (en) Geographic element rapid change discovery method based on mobile terminal photo
CN116229001A (en) Urban three-dimensional digital map generation method and system based on spatial entropy
CN113362265B (en) Low-cost rapid geographical splicing method for orthographic images of unmanned aerial vehicle
KR102587445B1 (en) 3d mapping method with time series information using drone

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant