CN117095317A - Unmanned aerial vehicle three-dimensional image entity identification and time positioning method - Google Patents

Unmanned aerial vehicle three-dimensional image entity identification and time positioning method

Info

Publication number
CN117095317A
CN117095317A (application CN202311352027.2A)
Authority
CN
China
Prior art keywords
entity
time period
resnet
time
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311352027.2A
Other languages
Chinese (zh)
Other versions
CN117095317B (en)
Inventor
周皓然
叶绍泽
陆国锋
陈康
袁杰遵
余齐
张举冠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Senge Data Technology Co ltd
Original Assignee
Shenzhen Senge Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Senge Data Technology Co ltd filed Critical Shenzhen Senge Data Technology Co ltd
Priority to CN202311352027.2A priority Critical patent/CN117095317B/en
Publication of CN117095317A publication Critical patent/CN117095317A/en
Application granted granted Critical
Publication of CN117095317B publication Critical patent/CN117095317B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an unmanned aerial vehicle three-dimensional image entity identification and time positioning method, belonging to the technical field of geographic information and comprising the following steps: S10: acquiring time periods that may contain an entity from the unmanned aerial vehicle video, and constructing a two-class training data set; S20: training a ResNet model on the data set to obtain a binary classifier ResNet_1; S30: recommending entity time periods using the features extracted by the binary classifier ResNet_1; S40: refining the boundaries of the recommended entity time periods; S50: constructing a (K+1)-class training data set comprising K classes of entity frames and background frames; S60: training a ResNet model on the data set constructed in step S50 to obtain a (K+1)-class classifier ResNet_2; S70: extracting features of specific entity time periods using the (K+1)-class classifier ResNet_2; S80: classifying the specific entity time periods using a (K+1)-class SVM classifier. The beneficial effects of the application are as follows: the method classifies entire video segments rather than single frames, reduces the influence of single-frame identification errors, and makes the identification of artificial geographic entities more accurate.

Description

Unmanned aerial vehicle three-dimensional image entity identification and time positioning method
Technical Field
The application relates to the technical field of geographic information, in particular to an unmanned aerial vehicle three-dimensional image entity identification and time positioning method.
Background
Artificial geographic entities in real-scene three-dimensional modeling are geographic entities constructed or modified by humans, such as water conservancy works, transportation infrastructure, buildings and site facilities. In real-scene three-dimensional modeling, unmanned aerial vehicles are commonly used to acquire data on artificial geographic entities through oblique photography; the artificial geographic entities are then located in time within the acquired data, the corresponding video segments are extracted, and three-dimensional modeling of the artificial geographic entities is carried out around these video segments. The identification of artificial geographic entities and their time positioning in video are therefore of vital importance. With the continuous progress of geographic information science and the development of spatial data acquisition technology, real-scene three-dimensional technology has become an important means of acquiring urban and natural resource spatial data. A real-scene three-dimensional model can realize a full, multi-scale, multi-source and multi-type three-dimensional visual expression of the real world, plays an important role in the construction of real-scene three-dimensional China, and provides powerful support for the construction of smart cities.
In recent years, unmanned aerial vehicle technology has been widely applied and rapidly developed, and plays an important role in fields such as land exploration, geographic information acquisition and environment monitoring. An unmanned aerial vehicle equipped with a high-resolution camera or sensor performs aerial photography and acquires three-dimensional image information of the ground surface, which has become an important way of acquiring geographic information. Three-dimensional images captured by unmanned aerial vehicles, satellites and similar platforms are essentially complex image data containing a large amount of spatial geographic information, from which useful information can be extracted through specific preprocessing and analysis. Artificial geographic entity identification is the step in this preprocessing and analysis in which geographic entities in the images are intelligently identified and classified by computer algorithms and pattern recognition techniques. How to extract geographic entity information rapidly and accurately from a large number of complex unmanned aerial vehicle images and to position it precisely in time remains an important research topic: the three-dimensional image data collected by unmanned aerial vehicles is huge and complex, and effective data processing and analysis means are required to extract useful information from it. Traditional manual identification and processing methods are time-consuming and labor-intensive, and can hardly meet modern efficiency requirements.
Disclosure of Invention
In order to overcome the defects of the prior art, the application provides an unmanned aerial vehicle three-dimensional image entity identification and time positioning method that classifies entire video segments, reduces the influence of single-frame identification errors, and makes the identification of artificial geographic entities more accurate.
The technical scheme adopted to solve the technical problem is as follows: in an unmanned aerial vehicle three-dimensional image entity identification and time positioning method, the improvement comprises the following steps:
s10: acquiring a time period possibly containing an entity from the unmanned aerial vehicle video, and constructing a two-class training data set;
s20: training a ResNet model on the data set to obtain a two-classifier ResNet_1;
s30: recommending the entity time period by utilizing the characteristics extracted by the two classifiers ResNet_1;
s40: refining the boundary of the recommended entity time period;
s50: constructing a K+1 class training data set, wherein the data set comprises a K class entity frame and a background frame;
s60: training the ResNet model on the data set constructed in the step S50 to obtain a K+1 classifier ResNet_2;
s70: extracting specific entity time period characteristics by using a K+1 classifier ResNet_2;
s80: and classifying the specific entity time period by using a K+1 classifier SVM.
Further, in step S10, the data set includes two classes, entity frames and background frames, and at this stage the entity frames do not distinguish specific building entities, traffic entities or the like.
Further, in step S20, the classifier ResNet_1 classifies a video frame as a background frame or an entity frame.
Further, step S30 includes the following steps:
S301: taking each frame of the video as an initial time period and forming a time period set;
S302: extracting features from the video frames using the binary classifier ResNet_1;
S303: recommending entity time periods based on feature similarity.
Further, step S40 includes the following steps:
S401: extracting features of the recommended entity time periods;
S402: computing the mean of the feature set of each recommended entity time period and applying L2 normalization to obtain its feature expression;
S403: training a fully connected neural network;
S404: taking the feature expression as the input of the neural network and outputting the confidence score of the entity time period and the offsets of the time period boundaries;
S405: removing redundant entity time periods using temporal non-maximum suppression.
Further, in step S403, the fully connected neural network is trained simultaneously as a classifier and a boundary regressor by a multi-task learning method, and the loss function used in training consists of two parts: a Softmax cross entropy loss function for the classification task and a loss function for the boundary offset regression task of entity time periods.
Further, in step S405, the temporal non-maximum suppression measures the degree of overlap of two time periods by calculating a time overlap ratio, expressed as:
tIoU = |time period 1 ∩ time period 2| / |time period 1 ∪ time period 2|
wherein time period 1 and time period 2 are the two overlapping time periods, the numerator is the number of frames in their intersection and the denominator is the number of frames in their union.
Further, the Softmax cross entropy loss function is defined as:
L_cls = -(1/N) · Σ_{i=1}^{N} y_i′ · log(y_i)
where N is the number of samples, y_i′ is the expected classification result of the i-th sample, and y_i is the Softmax score of the i-th sample actually output by the neural network;
L_reg is the loss function of the boundary offset regression task for entity time periods, defined as:
L_reg = (1/N_p) · Σ_{i=1}^{N} label_i · (O_{s,i} + O_{e,i})
where label_i is the label value of sample i (1 for positive samples, 0 for negative samples), N_p is the number of positive samples, O_{s,i} is the offset of the first frame of entity time period i in the video, and O_{e,i} is the offset of the last frame of entity time period i in the video.
Further, step S70 includes the following steps:
S701: extracting features from the video frames using the (K+1)-class classifier ResNet_2;
S702: computing the feature expressions of the recommended specific entity time periods.
The beneficial effects of the application are as follows: the method classifies entire video segments rather than single frames, reduces the influence of single-frame identification errors, and makes the identification of artificial geographic entities more accurate.
Drawings
FIG. 1 is a flow chart of a method for three-dimensional image entity identification and time positioning of an unmanned aerial vehicle according to the present application;
FIG. 2 is a diagram of an example boundary refinement of a time period of an artificial geographic entity of the present application;
FIG. 3 is a diagram of a fully connected neural network of the present application;
fig. 4 is an exemplary diagram of the time overlap ratio tIoU according to the present application.
Detailed Description
The application will be further described with reference to the drawings and examples.
The conception, specific structure and technical effects of the present application will be clearly and completely described below with reference to the embodiments and the drawings, so that the objects, features and effects of the application can be fully understood. It is apparent that the described embodiments are only some embodiments of the present application rather than all of them, and other embodiments obtained by those skilled in the art without inventive effort based on the embodiments of the present application fall within the scope of protection of the present application. In addition, the coupling/connection relationships mentioned in this patent do not refer solely to direct connections between components, but mean that a better coupling structure can be formed by adding or omitting auxiliary coupling components according to the specific implementation. The technical features of the application can be combined with each other provided that they do not contradict or conflict.
Referring to fig. 1 to 4, the application provides an unmanned aerial vehicle three-dimensional image entity identification and time positioning method, which in this embodiment comprises the following steps:
s10: acquiring a time period possibly containing an entity from the unmanned aerial vehicle video, and constructing a two-class training data set;
s20: training a ResNet model on the data set to obtain a two-classifier ResNet_1;
s30: recommending the entity time period by utilizing the characteristics extracted by the two classifiers ResNet_1;
s40: refining the boundary of the recommended entity time period;
s50: constructing a K+1 class training data set, wherein the data set comprises a K class entity frame and a background frame;
s60: training the ResNet model on the data set constructed in the step S50 to obtain a K+1 classifier ResNet_2;
s70: extracting specific entity time period characteristics by using a K+1 classifier ResNet_2;
s80: and classifying the specific entity time period by using a K+1 classifier SVM.
Through this series of operations, the method first recommends artificial geographic entity time periods with the binary classifier ResNet_1 and then classifies the artificial geographic entity time periods with the (K+1)-class classifier. The method exploits the similarity of adjacent frames and classifies entire video segments, which reduces the influence of single-frame identification errors. By handling both time period recommendation and time period classification from the frame level up to the video segment level, the method identifies artificial geographic entities from the local to the global scale, making their identification more accurate.
Further, time periods that may contain artificial geographic entities are recommended from the unmanned aerial vehicle video. Because two entities may overlap in time in the video, the recommended time periods may also overlap in time. A two-class training data set is constructed from video frames; in step S10, the data set includes two classes, entity frames and background (non-entity) frames, and at this stage the entity frames do not distinguish specific building entities, traffic entities or the like. In this process the model learns the data features of the training set and forms a judgment criterion. In step S20, the classifier ResNet_1 classifies a video frame as a background (non-entity) frame or an entity frame. For an unmanned aerial vehicle video of length N frames, the frame set is F = {f_1, …, f_N}. Each frame is classified with the trained classifier ResNet_1, yielding the classification result set C = {c_1, …, c_N} and the classification score set S = {s_1, …, s_N}; at the same time, the Pool5 feature of each frame is extracted, giving the feature set P = {p_1, …, p_N}.
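As an illustration of steps S10 to S20 and of the per-frame outputs described above, the following Python sketch shows how a binary frame classifier and its pooled features might be obtained from a standard torchvision ResNet. The class name FrameClassifier, the choice of ResNet-50 as the backbone and the way the three outputs are returned are assumptions made for illustration; the patent does not specify these details.

```python
import torch
import torch.nn as nn
from torchvision import models

class FrameClassifier(nn.Module):
    """Binary frame classifier (entity frame vs. background frame), a stand-in for ResNet_1.

    The backbone's global-average-pooled output plays the role of the Pool5
    feature referenced in the text.
    """

    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = models.resnet50(weights=None)  # pretrained weights could be used instead
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # everything up to global pooling
        self.fc = nn.Linear(backbone.fc.in_features, num_classes)

    def forward(self, x: torch.Tensor):
        pool5 = self.features(x).flatten(1)      # per-frame feature p_i
        logits = self.fc(pool5)
        scores = torch.softmax(logits, dim=1)    # classification scores s_i
        labels = scores.argmax(dim=1)            # classification results c_i
        return labels, scores, pool5
```

Running such a classifier over all N frames of a video yields the sets C, S and P that the time period recommendation algorithm below consumes.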
In the method, each frame of the video is taken as an initial time period, forming a time period set. The two temporally adjacent time periods that are most similar are then repeatedly selected and merged in order to recommend time periods containing artificial geographic entities, and time period recommendation is completed when only one time period containing entity frames remains in the set. Time period recommendation is performed in the Pool5 feature space extracted by the trained binary classifier ResNet_1, and the similarity of two time periods is computed with the L2 distance. The method thereby makes full use of the discrimination ability of ResNet_1. The video segment merging, video segment retention and stopping criteria used throughout the time period recommendation process are as follows:
(1) Video segment merging criterion: in each merge, at least one of the two adjacent time periods being merged contains entity frames.
(2) Video segment retention criterion: if the proportion of entity frames in a video segment is below a threshold θ (typically 0.5), the segment is not recommended and is not taken as an entity time period.
(3) Stopping criterion: merging stops when only one time period containing entity frames remains.
The specific algorithm for time period recommendation is as follows (the symbols F, C, S and P are the frame, classification result, classification score and Pool5 feature sets defined above, and θ is the retention threshold):
Algorithm: time period recommendation
Input: video frame set F = {f_1, …, f_N};
classification result set of the video frames C = {c_1, …, c_N};
classification score set of the video frames S = {s_1, …, s_N};
Pool5 feature set of the video frames P = {p_1, …, p_N}.
Steps:
take the frame set F as the initial time period set T;
initialize the time period similarity set D;
initialize the time period recommendation set PC;
foreach pair of adjacent time periods (t_i, t_j) in T
do
compute the similarity d(t_i, t_j) as the L2 distance between their Pool5 features and add it to D;
end foreach
while at least one time period contains entity frames // stopping criterion
do
select the most similar pair of adjacent time periods (t_i, t_j) in which t_i or t_j contains entity frames; // merging criterion
merge t_i and t_j into a new time period t_m;
take the mean of the features of t_i and t_j as the feature of t_m;
take the mean of the scores of t_i and t_j as the score of t_m;
compute the similarity between t_m and its adjacent time periods;
update the similarity set D;
update the score set;
update the feature set;
update the time period set T;
if the proportion of entity frames in the video segment t_m is greater than θ // retention criterion
then add t_m to the recommendation set PC;
end if
end while
Output: the time period recommendation set PC of artificial geographic entities.
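A minimal Python sketch of this merging procedure is given below, assuming the per-frame outputs of ResNet_1 are already available as NumPy arrays. The function name, the dictionary-based representation of a time period and the reading of the stopping criterion as "more than one entity-containing time period remains" are illustrative assumptions rather than details fixed by the patent.

```python
import numpy as np

def recommend_time_periods(frames_pool5, frame_labels, frame_scores, theta=0.5):
    """Hierarchically merge adjacent time periods and recommend entity time periods.

    frames_pool5 : (N, D) array of per-frame Pool5 features from ResNet_1
    frame_labels : (N,) array, 1 = entity frame, 0 = background frame
    frame_scores : (N,) array of entity-class classification scores
    theta        : retention threshold on the proportion of entity frames
    """
    # Each initial time period is a single frame, carrying its feature and score.
    periods = [{"frames": [i],
                "feat": frames_pool5[i].astype(float),
                "score": float(frame_scores[i])}
               for i in range(len(frame_labels))]
    recommended = []

    def has_entity(p):
        return any(frame_labels[i] == 1 for i in p["frames"])

    # Stopping criterion: continue while more than one time period still contains entity frames.
    while sum(has_entity(p) for p in periods) > 1:
        # Merging criterion: among adjacent pairs where at least one side contains
        # entity frames, pick the most similar pair (smallest L2 feature distance).
        best, best_dist = None, np.inf
        for k in range(len(periods) - 1):
            if not (has_entity(periods[k]) or has_entity(periods[k + 1])):
                continue
            d = np.linalg.norm(periods[k]["feat"] - periods[k + 1]["feat"])
            if d < best_dist:
                best, best_dist = k, d
        if best is None:
            break
        a, b = periods[best], periods[best + 1]
        merged = {"frames": a["frames"] + b["frames"],
                  "feat": (a["feat"] + b["feat"]) / 2.0,      # mean of the two features
                  "score": (a["score"] + b["score"]) / 2.0}   # mean of the two scores
        periods[best:best + 2] = [merged]
        # Retention criterion: keep segments whose entity-frame ratio exceeds theta.
        if np.mean([frame_labels[i] for i in merged["frames"]]) > theta:
            recommended.append((merged["frames"][0], merged["frames"][-1]))
    return recommended
```

Each recommended tuple gives the indices of the first and last frame of a candidate artificial geographic entity time period, which is what the boundary refinement stage below takes as input.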
Further, step S30 includes the following steps:
S301: taking each frame of the video as an initial time period and forming a time period set;
S302: extracting features from the video frames using the binary classifier ResNet_1;
S303: recommending entity time periods based on feature similarity.
Still further, step S40 includes the following steps:
S401: extracting features of the recommended entity time periods;
S402: computing the mean of the feature set of each recommended entity time period and applying L2 normalization to obtain its feature expression;
S403: training a fully connected neural network;
S404: taking the feature expression as the input of the neural network and outputting the confidence score of the entity time period and the offsets of the time period boundaries;
S405: removing redundant entity time periods using temporal non-maximum suppression.
Referring to fig. 2, in step S40, for the entity time periods recommended by the time period recommendation algorithm, the method constructs a multi-task fully connected neural network that classifies each recommended time period and at the same time refines its boundary.
For a recommended entity time period pc ∈ PC, let its frame set be F_pc = {f_s, …, f_e} and the corresponding Pool5 feature set be P_pc = {p_s, …, p_e}, where s and e are the indices in the video of the first and last frames of the time period pc. The feature expression of the entity time period pc is
R = L2norm(mean(P_pc)),
that is, the mean of the feature set P_pc followed by an L2 normalization operation. The feedforward neural network takes the feature expression R of a recommended entity time period as input and outputs its confidence score as an entity time period together with the offsets of its time boundaries. The boundary offsets, using the L1 distance, are defined as
O_s = |s_p - s_g|, O_e = |e_p - e_g|,
where s_p and e_p are the indices in the video of the first and last frames of the entity time period, and s_g and e_g are the indices in the video of the first and last frames of the ground-truth time period matched to it.
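The two quantities above can be computed directly from the per-frame features; the short sketch below assumes the Pool5 features of a video are stored as a NumPy array, with all names chosen only for illustration.

```python
import numpy as np

def period_feature(pool5_feats, s, e):
    """Feature expression R of a time period: mean of its Pool5 features, L2-normalized."""
    mean_feat = pool5_feats[s:e + 1].mean(axis=0)
    return mean_feat / (np.linalg.norm(mean_feat) + 1e-12)  # small epsilon guards against a zero vector

def boundary_offsets(s_p, e_p, s_g, e_g):
    """L1 boundary offsets between a recommended period (s_p, e_p) and its matched ground truth (s_g, e_g)."""
    return abs(s_p - s_g), abs(e_p - e_g)
```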
In step S403, the fully connected neural network is trained simultaneously as a classifier and a boundary regressor by a multi-task learning method, and the loss function used in training consists of two parts: a Softmax cross entropy loss function for the classification task and a loss function for the boundary offset regression task of entity time periods.
Because the fully connected neural network is trained simultaneously as a classifier and a boundary regressor through multi-task learning, the loss function is the combination of these two parts:
L = L_cls + L_reg
the Softmax cross entropy loss function is defined as:
where N is the number of samples, y i ' is the classification result expected for the ith sample, y i Is the Softmax score of the i-th sample actually output by the neural network;
L reg is a loss function in the boundary-offset regression task for an entity period, defined as:
wherein the label value is a sample label value, the positive sample is 1, the negative sample is 0, and N p Is the number of positive samples, O s,i For the deviation of the first frame in the video for the physical period i, O e,i Deviations of the last frame in the video for the physical time period.
The structure of the feedforward neural network adopted by the method is shown in fig. 3. The first layer is a fully connected layer with 1024 neurons and ReLU activation, followed by a dropout layer with a dropout rate of 0.4 and a fully connected layer of 4 neurons without ReLU activation. The last layer is a multi-task layer: during training it uses the two loss functions described above, and during testing it outputs the Softmax score of the entity time period and the offsets of its boundaries. With this arrangement, the fully connected neural network can fine-tune the boundary while identifying the entity time period, so the entity boundary is determined more accurately, the precision of the entity time period boundary is greatly improved, and the entity features in the video can be identified and extracted more effectively.
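The following PyTorch sketch shows a network of this shape together with a two-part multi-task loss in the spirit of L_cls and L_reg above. The class name, the split of the four output neurons into two classification logits and two boundary offsets, the unweighted sum of the two losses and the form of the regression target are assumptions made for illustration; the patent does not fix these details.

```python
import torch
import torch.nn as nn

class BoundaryRefineNet(nn.Module):
    """Multi-task fully connected network: 1024-unit FC with ReLU, dropout 0.4,
    then a 4-unit FC layer with no activation. The 4 outputs are assumed to be
    2 classification logits plus 2 predicted boundary offsets (start, end)."""

    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        self.fc1 = nn.Linear(feat_dim, 1024)
        self.relu = nn.ReLU()
        self.drop = nn.Dropout(p=0.4)
        self.fc2 = nn.Linear(1024, 4)

    def forward(self, r: torch.Tensor):
        h = self.drop(self.relu(self.fc1(r)))
        out = self.fc2(h)
        cls_logits = out[:, :2]   # entity time period vs. background
        offsets = out[:, 2:]      # predicted (O_s, O_e)
        return cls_logits, offsets

def multitask_loss(cls_logits, offsets, labels, target_offsets):
    """Two-part loss: Softmax cross entropy for classification plus an L1
    regression loss on the boundary offsets, averaged over positive samples only."""
    l_cls = nn.functional.cross_entropy(cls_logits, labels)
    pos = labels == 1
    if pos.any():
        l_reg = (offsets[pos] - target_offsets[pos]).abs().sum(dim=1).mean()
    else:
        l_reg = offsets.sum() * 0.0  # no positive samples in the batch
    return l_cls + l_reg
```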
Further, in step S405, the temporal non-maximum suppression measures the degree of overlap of two time periods by calculating the time overlap ratio:
tIoU = |time period 1 ∩ time period 2| / |time period 1 ∪ time period 2|
the cleaning of entity time periods with repeated or excessively high overlapping degree is a key step, so that redundancy can be eliminated, and the final precision is improved. Non-maximum suppression (NMS) is a common method for eliminating redundant overlap areas in target detection results. The non-maximum suppression is performed in the spatial dimension and is mainly used for object detection tasks in the image. However, in processing video or some tasks involving the time dimension, the non-maximum suppression of the application space may not be accurate enough and thus may be extended to processing in the time domain, which is the time domain non-maximum suppression. The recommended redundant entity time period is removed by changing the overlap ratio calculation IoU to a time overlap ratio tIoU, extending the spatial Non-maximum suppression (NMS: non-Maximum Suppression) to the time domain.
Time domain non-maximum suppression in the method uses a time overlap ratio tIoU with a threshold of 0.3, wherein time period 1 and time period 2 are overlapping time periods. And then sequencing all the entity time periods (generally sequencing according to confidence scores), selecting one interval with the highest confidence score from the intervals, calculating tIoU with all other intervals, and deleting all intervals with the tIoU exceeding the threshold. This process is repeated until all intervals have been processed. Thus, the intervals with the highest confidence scores and no overlap (or the overlapping degree is lower than a certain threshold value) can be reserved, and the final result can effectively eliminate redundant entity time periods.
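A minimal sketch of this temporal non-maximum suppression is shown below, assuming each recommended time period is represented as a tuple (start_frame, end_frame, score); the helper names are illustrative.

```python
def t_iou(p, q):
    """Time overlap ratio of two periods given as (start, end, ...) with inclusive frame indices."""
    inter = max(0, min(p[1], q[1]) - max(p[0], q[0]) + 1)
    union = (p[1] - p[0] + 1) + (q[1] - q[0] + 1) - inter
    return inter / union

def temporal_nms(periods, threshold=0.3):
    """Keep the highest-scoring periods and drop any period whose tIoU with a kept one exceeds the threshold."""
    periods = sorted(periods, key=lambda x: x[2], reverse=True)  # sort by confidence score
    kept = []
    while periods:
        best = periods.pop(0)
        kept.append(best)
        periods = [p for p in periods if t_iou(best, p) <= threshold]
    return kept
```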
Further, the artificial geographic entity time periods are classified into specific artificial geographic entity classes or background (non-entity). A (K+1)-class training data set is constructed, comprising K classes of entity frames and background (non-entity) frames, and a ResNet neural network is trained on this data set to obtain a (K+1)-class classifier, called ResNet_2, which classifies a video frame as a specific entity frame or a background (non-entity) frame.
The trained ResNet_2 is used to extract the Pool5 feature of every frame in each artificial geographic entity time period recommended in the first stage. For a recommended entity time period pc ∈ PC, let its frame set be F_pc = {f_s, …, f_e} and the corresponding Pool5 feature set extracted by ResNet_2 be P′_pc = {p′_s, …, p′_e}, where s and e are the indices in the video of the first and last frames of the time period pc. The feature expression of the entity time period pc at this stage is R′ = L2norm(mean(P′_pc)).
Because the (K+1)-class classifier ResNet_2 classifies individual video frames, it cannot classify video segments. ResNet_2 is therefore used only to extract the features of each frame and to compute the feature expression R′ of the entity time period. The SVM then classifies the entity time period, taking the feature expression R′ of the recommended entity time period as input. Unlike the frame-level training data of ResNet_2, the (K+1)-class training data set of the SVM is a time period data set comprising K classes of entity time periods and background (non-entity) time periods.
Still further, step S70 includes the following steps:
S701: extracting features from the video frames using the (K+1)-class classifier ResNet_2;
S702: computing the feature expressions of the recommended specific entity time periods.
The training data used by ResNet_2 is organized by frames, each frame being an independent sample, whereas the training data of the SVM is organized by time periods, comprising K classes of entity time periods and background (non-entity) time periods. The reason is that in video processing a single frame and a sequence of consecutive frames (i.e. a time period) provide different information: in particular, the occurrence of an action or event can only be represented completely and clearly in a sequence of consecutive frames. Combining the two levels therefore allows the video data to be analyzed and processed better and further improves the performance of the model.
While the preferred embodiment of the present application has been illustrated and described, the present application is not limited to the embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application, and these equivalent modifications or substitutions are included in the scope of the present application as defined in the appended claims.

Claims (9)

1. An unmanned aerial vehicle three-dimensional image entity identification and time positioning method, characterized by comprising the following steps:
S10: acquiring time periods that may contain an entity from the unmanned aerial vehicle video, and constructing a two-class training data set;
S20: training a ResNet model on the data set to obtain a binary classifier ResNet_1;
S30: recommending entity time periods using the features extracted by the binary classifier ResNet_1;
S40: refining the boundaries of the recommended entity time periods;
S50: constructing a (K+1)-class training data set comprising K classes of entity frames and background frames;
S60: training a ResNet model on the data set constructed in step S50 to obtain a (K+1)-class classifier ResNet_2;
S70: extracting features of specific entity time periods using the (K+1)-class classifier ResNet_2;
S80: classifying the specific entity time periods using a (K+1)-class SVM classifier.
2. The unmanned aerial vehicle three-dimensional image entity identification and time positioning method according to claim 1, wherein in step S10 the data set includes two classes, entity frames and background frames, and the entity frames do not distinguish specific building entities, traffic entities or the like.
3. The unmanned aerial vehicle three-dimensional image entity identification and time positioning method according to claim 1, wherein in step S20 the classifier ResNet_1 classifies a video frame as a background frame or an entity frame.
4. The unmanned aerial vehicle three-dimensional image entity identification and time positioning method according to claim 1, wherein step S30 comprises the following steps:
S301: taking each frame of the video as an initial time period and forming a time period set;
S302: extracting features from the video frames using the binary classifier ResNet_1;
S303: recommending entity time periods based on feature similarity.
5. The unmanned aerial vehicle three-dimensional image entity identification and time positioning method according to claim 1, wherein step S40 comprises the following steps:
S401: extracting features of the recommended entity time periods;
S402: computing the mean of the feature set of each recommended entity time period and applying L2 normalization to obtain its feature expression;
S403: training a fully connected neural network;
S404: taking the feature expression as the input of the neural network and outputting the confidence score of the entity time period and the offsets of the time period boundaries;
S405: removing redundant entity time periods using temporal non-maximum suppression.
6. The unmanned aerial vehicle three-dimensional image entity identification and time positioning method according to claim 5, wherein in step S403 the fully connected neural network is trained simultaneously as a classifier and a boundary regressor by a multi-task learning method, and the loss function used in training consists of two parts: a Softmax cross entropy loss function for the classification task and a loss function for the boundary offset regression task of entity time periods.
7. The unmanned aerial vehicle three-dimensional image entity identification and time positioning method according to claim 5, wherein in step S405 the temporal non-maximum suppression measures the degree of overlap of two time periods by calculating a time overlap ratio, expressed as:
tIoU = |time period 1 ∩ time period 2| / |time period 1 ∪ time period 2|
wherein time period 1 and time period 2 are the two overlapping time periods.
8. The unmanned aerial vehicle three-dimensional image entity identification and time positioning method according to claim 6, wherein the Softmax cross entropy loss function is defined as:
L_cls = -(1/N) · Σ_{i=1}^{N} y_i′ · log(y_i)
where N is the number of samples, y_i′ is the expected classification result of the i-th sample, and y_i is the Softmax score of the i-th sample actually output by the neural network;
L_reg is the loss function of the boundary offset regression task for entity time periods, defined as:
L_reg = (1/N_p) · Σ_{i=1}^{N} label_i · (O_{s,i} + O_{e,i})
where label_i is the label value of sample i (1 for positive samples, 0 for negative samples), N_p is the number of positive samples, O_{s,i} is the offset of the first frame of entity time period i in the video, and O_{e,i} is the offset of the last frame of entity time period i in the video.
9. The unmanned aerial vehicle three-dimensional image entity identification and time positioning method according to claim 1, wherein step S70 comprises the following steps:
S701: extracting features from the video frames using the (K+1)-class classifier ResNet_2;
S702: computing the feature expressions of the recommended specific entity time periods.
CN202311352027.2A 2023-10-19 2023-10-19 Unmanned aerial vehicle three-dimensional image entity identification and time positioning method Active CN117095317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311352027.2A CN117095317B (en) 2023-10-19 2023-10-19 Unmanned aerial vehicle three-dimensional image entity identification and time positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311352027.2A CN117095317B (en) 2023-10-19 2023-10-19 Unmanned aerial vehicle three-dimensional image entity identification and time positioning method

Publications (2)

Publication Number Publication Date
CN117095317A 2023-11-21
CN117095317B (en) 2024-06-25

Family

ID=88783730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311352027.2A Active CN117095317B (en) 2023-10-19 2023-10-19 Unmanned aerial vehicle three-dimensional image entity identification and time positioning method

Country Status (1)

Country Link
CN (1) CN117095317B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914702A (en) * 2013-01-02 2014-07-09 国际商业机器公司 System and method for boosting object detection performance in videos
CN107430687A (en) * 2015-05-14 2017-12-01 谷歌公司 The segmentation of the time based on entity of video flowing
CN109614956A (en) * 2018-12-29 2019-04-12 上海依图网络科技有限公司 The recognition methods of object and device in a kind of video
US20190325275A1 (en) * 2018-04-19 2019-10-24 Adobe Inc. Active learning method for temporal action localization in untrimmed videos
US20210195286A1 (en) * 2019-12-19 2021-06-24 Sling Media Pvt Ltd Method and system for analyzing live broadcast video content with a machine learning model implementing deep neural networks to quantify screen time of displayed brands to the viewer
US20220262116A1 (en) * 2021-02-12 2022-08-18 Comcast Cable Communications, Llc Methods, Systems, And Apparatuses For Improved Video Frame Analysis And Classification
CN115705706A (en) * 2021-08-13 2023-02-17 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN115858863A (en) * 2022-12-22 2023-03-28 广州启生信息技术有限公司 Method and device for labeling video label
US20230206632A1 (en) * 2021-12-23 2023-06-29 Yahoo Ad Tech Llc Computerized system and method for fine-grained video frame classification and content creation therefrom
CN116644755A (en) * 2023-07-27 2023-08-25 中国科学技术大学 Multi-task learning-based few-sample named entity recognition method, device and medium
CN116824463A (en) * 2023-08-31 2023-09-29 江西啄木蜂科技有限公司 Video key frame extraction method, computer readable storage medium and electronic device

Also Published As

Publication number Publication date
CN117095317B (en) 2024-06-25

Similar Documents

Publication Publication Date Title
CN110414368B (en) Unsupervised pedestrian re-identification method based on knowledge distillation
CN112308158B (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN110321910B (en) Point cloud-oriented feature extraction method, device and equipment
CN105869173B (en) A kind of stereoscopic vision conspicuousness detection method
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
CN107145862B (en) Multi-feature matching multi-target tracking method based on Hough forest
CN109871875B (en) Building change detection method based on deep learning
CN106257496B (en) Mass network text and non-textual image classification method
CN111476161A (en) Somatosensory dynamic gesture recognition method fusing image and physiological signal dual channels
WO2022062419A1 (en) Target re-identification method and system based on non-supervised pyramid similarity learning
CN112990282B (en) Classification method and device for fine-granularity small sample images
CN111259735B (en) Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network
CN113157678B (en) Multi-source heterogeneous data association method
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
CN104680193B (en) Online objective classification method and system based on quick similitude network integration algorithm
CN112528058B (en) Fine-grained image classification method based on image attribute active learning
CN111626357B (en) Image identification method based on neural network model
CN114612450B (en) Image detection segmentation method and system based on data augmentation machine vision and electronic equipment
Ibrahem et al. Real-time weakly supervised object detection using center-of-features localization
CN114548256A (en) Small sample rare bird identification method based on comparative learning
Yang et al. C-RPNs: Promoting object detection in real world via a cascade structure of Region Proposal Networks
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN116468935A (en) Multi-core convolutional network-based stepwise classification and identification method for traffic signs
Al-Obodi et al. A Saudi Sign Language recognition system based on convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant