CN113947714A - Multi-mode collaborative optimization method and system for video monitoring and remote sensing - Google Patents

Multi-mode collaborative optimization method and system for video monitoring and remote sensing

Info

Publication number
CN113947714A
Authority
CN
China
Prior art keywords
classifier
remote sensing
monitoring
confidence
preset
Prior art date
Legal status
Granted
Application number
CN202111154171.6A
Other languages
Chinese (zh)
Other versions
CN113947714B (en)
Inventor
李晓威
陈升敬
刘晓建
Current Assignee
Guangzhou Fuan Electronic Technology Co ltd
Original Assignee
Guangzhou Fuan Electronic Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Fuan Electronic Technology Co ltd filed Critical Guangzhou Fuan Electronic Technology Co ltd
Priority to CN202111154171.6A priority Critical patent/CN113947714B/en
Publication of CN113947714A publication Critical patent/CN113947714A/en
Application granted granted Critical
Publication of CN113947714B publication Critical patent/CN113947714B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention relates to a multimodal collaborative optimization method and system for video monitoring and remote sensing, wherein the method comprises the following steps: constructing a first classifier and a second classifier; constructing a mapping relation between the picture coordinates of a monitoring camera and remote sensing longitude and latitude coordinates according to state parameters of a preset monitoring camera; carrying out target recognition on the same object, extracting the recognition results of the first classifier and the second classifier, and carrying out data fusion; performing curriculum learning on the classifiers to continuously optimize them; and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifiers for recognition and fusion to obtain a target recognition result. The method combines video monitoring and remote sensing for collaborative optimization, trains a new model on the basis of an existing machine learning model, and fuses results across modalities to realize online learning, thereby improving recognition accuracy.

Description

Multi-mode collaborative optimization method and system for video monitoring and remote sensing
Technical Field
The invention relates to the technical field of target identification, in particular to a multi-mode collaborative optimization method and system for video monitoring and remote sensing.
Background
Target identification technology has wide application prospects in fields such as video monitoring, robotics and intelligent transportation. However, because target recognition involves the calculation and analysis of large amounts of data and is subject to interference from environmental factors such as external lighting and viewing angle, traditional recognition algorithms cannot extract the optimal characteristics of the image, and the recognition rate is limited.
Target detection and identification is one of the important applications of remote sensing technology and is of great significance in fields such as ocean monitoring, geological survey and urban planning. Existing remote sensing image target detection methods adopt one-step target detection algorithms or rotation-transformation data-enhancement detection methods, which are costly and cannot meet the requirement of real-time detection of remote sensing images while ensuring high target detection precision.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a multi-mode collaborative optimization method and system for video monitoring and remote sensing.
In order to achieve the purpose, the invention provides the following scheme:
a multimode collaborative optimization method for video monitoring and remote sensing comprises the following steps:
constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
constructing a mapping relation between picture coordinates and remote sensing longitude and latitude coordinates of a monitoring camera according to state parameters of a preset monitoring camera;
identifying objects in the monitoring picture by using the first classifier to obtain a first confidence set of all objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence set of all objects in the remote sensing image;
respectively utilizing the first classifier and the second classifier to recognize preset recognition objects to obtain recognition results, and carrying out data fusion and marking on the recognition results according to the first confidence degree set, the second confidence degree set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for recognition and fusion to obtain a target recognition result.
Preferably, the training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier includes:
dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
based on a multi-scale training method and a data processing method, training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
and constructing the first classifier and the second classifier by using the trained YoloV3 model.
Preferably, the constructing a mapping relationship between the picture coordinates of the monitoring camera and the remote sensing longitude and latitude coordinates according to the state parameters of the preset monitoring camera includes:
acquiring state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the true north direction of geography, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
calibrating the vertical projection position of the monitoring camera on the horizontal plane;
calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on a horizontal plane in a visual range of the monitoring camera based on the haversine formula;
calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
calculating the included angle between the connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the straight-line horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
calculating the picture coordinates of the arbitrary position in the monitoring camera according to the included angle between the center line of the monitoring camera and the vertical line, the included angle between the projection of the center line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the straight-line horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera.
Preferably, the recognizing a preset recognized object by using the first classifier and the second classifier respectively to obtain a recognition result, and performing data fusion and labeling on the recognition result according to the first confidence set, the second confidence set and the mapping relationship to obtain a monitoring picture with a pseudo tag added and a remote sensing image with a pseudo tag added, includes:
determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and performing data fusion according to the first classification result and the second classification result to obtain the identification result, and performing data marking according to the identification result, the first confidence set, the second confidence set and the mapping relation to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added.
Preferably, the data fusion is performed according to the first classification result and the second classification result to obtain the identification result, and data labeling is performed according to the identification result, the first confidence set, the second confidence set and the mapping relationship to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added, includes:
If the first classification result comprises the preset target recognition object and the second classification result does not comprise the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the first confidence degree set, judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in the remote sensing image according to the mapping relation and the target recognition frame, and marking the corresponding region as a first recognition result;
if the first classification result does not include the preset target recognition object and the second classification result includes the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the second confidence coefficient set, and judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in a monitoring picture according to the mapping relation and the target recognition frame, and marking the corresponding region as a second recognition result;
if the first classification result comprises the preset target recognition object and the second classification result comprises the preset target recognition object, extracting confidence degrees of the preset target recognition object in the first confidence degree set and the second confidence degree set;
if the types of the target objects recognized in the first classification result and the second classification result are the same, judging whether the confidence degree of the preset target recognition object in the first confidence degree set is greater than or equal to a preset confidence degree threshold or whether the confidence degree of the preset target recognition object in the second confidence degree set is greater than or equal to a preset confidence degree threshold, and if so, determining the first classification result and the second classification result as correct results;
if the types of the target objects identified in the first classification result and the second classification result are different, screening out the maximum value of the confidence degrees of the preset target identification objects in the first confidence degree set and the confidence degrees of the preset target identification objects in the second confidence degree set, and judging whether the maximum value is greater than or equal to a preset confidence degree threshold value, if so, correcting the type of the target object corresponding to the other confidence degree by using the type of the target object corresponding to the maximum value to obtain a correction result;
and obtaining the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added according to the correction result.
Preferably, the obtaining a training set according to the monitoring picture after adding the pseudo tag and the remote sensing image after adding the pseudo tag, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier includes:
respectively calculating the average confidence of all objects in each frame of monitoring picture image and each remote sensing image;
when the number of the monitoring pictures and the number of the remote sensing images respectively processed by the first classifier and the second classifier and subjected to data fusion reach a preset training number, respectively sequencing the monitoring pictures and the remote sensing images from high to low according to the value of the average confidence coefficient to obtain a sequencing image set;
obtaining the training set according to the sequencing image set, the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label;
training the first classifier and the second classifier according to the monitoring picture image and the remote sensing image in the training set respectively to obtain a first classifier and a second classifier which are trained for the first time;
respectively inputting the multi-frame monitoring picture and the multiple remote sensing images into the first classifier and the second classifier of the primary training for target recognition to obtain the confidence coefficient of a target object in the monitoring picture image and the confidence coefficient of the target object in the remote sensing images;
counting the total number of all identified objects in a plurality of monitoring pictures and a plurality of remote sensing images;
counting the number of objects whose confidence in the monitoring picture is greater than or equal to the preset confidence threshold, whose confidence in the remote sensing image is greater than or equal to the preset confidence threshold, and whose identified target object types are the same;
calculating the ratio of this number to the total number of all identified objects;
and judging whether the ratio is smaller than a preset precision threshold value, if so, continuing training the first classifier and the second classifier, and if not, determining the first classifier and the second classifier as the optimized classifier.
A multimodal collaborative optimization system for video surveillance and remote sensing, comprising:
the data set construction module is used for constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
the classifier building module is used for training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
the mapping relation construction module is used for constructing a mapping relation between the picture coordinate of the monitoring camera and the remote sensing longitude and latitude coordinate according to the state parameter of the preset monitoring camera;
the confidence coefficient acquisition module is used for identifying objects in the monitoring picture by using the first classifier to obtain a first confidence coefficient set of all the objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence coefficient set of all the objects in the remote sensing image;
the fusion module is used for respectively utilizing the first classifier and the second classifier to identify preset identification objects to obtain identification results, and carrying out data fusion and marking on the identification results according to the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
the optimization module is used for acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and the identification module is used for inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for identification and fusion to obtain a target identification result.
Preferably, the classifier building module specifically includes:
the dividing unit is used for dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
the training unit is used for training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set based on a multi-scale training method and a data processing method, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
a building unit, configured to build the first classifier and the second classifier by using the trained YoloV3 model.
Preferably, the mapping relationship building module specifically includes:
the information acquisition unit is used for acquiring the state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the true north direction of geography, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
the calibration unit is used for calibrating the vertical projection position of the monitoring camera on the horizontal plane;
the first calculation unit is used for calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on a horizontal plane in a visual range of the monitoring camera based on the haversine formula;
the second calculation unit is used for calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
the third calculating unit is used for calculating an included angle between a connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the linear horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
the fourth calculating unit is used for calculating the picture coordinates of the arbitrary position in the monitoring camera according to the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the straight-line horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera.
Preferably, the fusion module specifically includes:
the target object determining unit is used for determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
the identification unit is used for correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and the fusion marking unit is used for carrying out data fusion according to the first classification result and the second classification result to obtain the identification result, and carrying out data marking according to the identification result, the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a multimode collaborative optimization method and a multimode collaborative optimization system for video monitoring and remote sensing, wherein the method comprises the following steps: constructing a monitoring picture target detection data set and a remote sensing image target detection data set; training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier; constructing a mapping relation between picture coordinates and remote sensing longitude and latitude coordinates of a monitoring camera according to state parameters of a preset monitoring camera; identifying objects in the monitoring picture by using the first classifier to obtain a first confidence set of all objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence set of all objects in the remote sensing image; respectively identifying preset identification objects by using the first classifier and the second classifier to obtain identification results, and performing data fusion and marking on the identification results according to the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label; acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier; and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for recognition and fusion to obtain a target recognition result. The method combines video monitoring and remote sensing, and collaborative optimization, trains a new model on the basis of the existing machine learning model, and realizes result fusion in multi-modal recognition to realize online learning, thereby improving the recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required to be used in the embodiments will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.
FIG. 1 is a method flow diagram of a multi-modal collaborative optimization method in an embodiment provided by the present invention;
FIG. 2 is a schematic diagram of an application process in an embodiment provided by the present invention;
fig. 3 is a first schematic diagram of a method for calculating a mapping relationship between picture coordinates and longitude and latitude coordinates of a monitoring camera in an embodiment of the present invention;
fig. 4 is a second schematic diagram of a method for calculating a mapping relationship between picture coordinates and longitude and latitude coordinates of a monitoring camera in an embodiment of the present invention;
fig. 5 is a schematic diagram of a position of an object to be detected in a screen in an embodiment of the present invention;
fig. 6 is a module connection diagram of the multimodal collaborative optimization system in an embodiment provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, the inclusion of a series of steps, processes, methods, etc. is not limited to the steps shown, but may alternatively include other steps not shown, or may alternatively include other steps inherent to such processes, methods, articles, or apparatus.
The invention aims to provide a multi-mode collaborative optimization method and system for video monitoring and remote sensing, which can improve the identification accuracy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
Fig. 1 and fig. 2 are a method flowchart and an application process schematic diagram of a multimodal collaborative optimization method in an embodiment provided by the present invention, and as shown in fig. 1 and fig. 2, the present invention provides a multimodal collaborative optimization method for video monitoring and remote sensing, which includes:
step 100: constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
step 200: training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a first classifier and a second classifier which are trained;
step 300: constructing a mapping relation between picture coordinates and remote sensing longitude and latitude coordinates of a monitoring camera according to state parameters of a preset monitoring camera;
step 400: identifying objects in the monitoring picture by using the first classifier to obtain a first confidence set of all objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence set of all objects in the remote sensing image;
step 500: respectively utilizing the first classifier and the second classifier to identify a preset identification object to obtain an identification result, and carrying out data fusion and marking on the identification result according to the first confidence set, the second confidence set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
step 600: acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
step 700: and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for recognition and fusion to obtain a target recognition result.
Preferably, the training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier includes:
dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
based on a multi-scale training method and a data processing method, training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
and constructing the first classifier and the second classifier by using the trained YoloV3 model.
Optionally, in this embodiment, a classifier is first constructed based on the YoloV3 neural network, and a monitoring picture target detection data set and a remote sensing image target detection data set are constructed. Both data sets are handled as follows: 70% of the data is randomly set as the training set and the remaining 30% as the validation set. The YoloV3 model is trained on the training set until it converges; to further improve the detection effect, a multi-scale training method and a data enhancement method are used, both of which greatly improve the performance and generalization capability of the trained model.
Specifically, two classifiers, classifier 1 and classifier 2, are constructed by training YoloV3 and using the trained YoloV3 neural network, where classifier 1 is used for monitoring picture target recognition and classifier 2 for remote sensing image target recognition.
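For illustration, a minimal Python sketch of this data preparation and dual-classifier setup; the `YoloV3Detector` wrapper named in the comments is a hypothetical stand-in, not a real library API:

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=0):
    """Randomly assign 70% of the labelled samples to the training set
    and the remaining 30% to the validation set, per the embodiment."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

if __name__ == "__main__":
    # Toy placeholder file lists; in practice these are annotated images.
    monitor_ds = [f"monitor_{k}.jpg" for k in range(100)]
    remote_ds = [f"remote_{k}.tif" for k in range(100)]
    m_train, m_val = split_dataset(monitor_ds)
    r_train, r_val = split_dataset(remote_ds)
    print(len(m_train), len(m_val), len(r_train), len(r_val))  # 70 30 70 30
    # Hypothetical YoloV3 wrapper (assumption, not a real library API):
    # classifier1 = YoloV3Detector(multi_scale=True, augment=True)  # monitoring pictures
    # classifier1.fit(m_train); classifier1.evaluate(m_val)
    # classifier2 = YoloV3Detector(multi_scale=True, augment=True)  # remote sensing images
    # classifier2.fit(r_train); classifier2.evaluate(r_val)
```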
Preferably, the constructing a mapping relationship between the picture coordinates of the monitoring camera and the remote sensing longitude and latitude coordinates according to the state parameters of the preset monitoring camera includes:
acquiring state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the true north direction of geography, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
calibrating the vertical projection position of the monitoring camera on the horizontal plane;
calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on a horizontal plane in a visual range of the monitoring camera based on the haversine formula;
calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
calculating the included angle between the connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the straight-line horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
calculating the picture coordinates of the arbitrary position in the monitoring camera according to the included angle between the center line of the monitoring camera and the vertical line, the included angle between the projection of the center line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the straight-line horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera.
Referring to fig. 3 to 5, in this embodiment, after the classifier is constructed, parameter acquisition is performed and the mapping relationship is established from the acquired parameters. The letters in the drawings have the following meanings: N denotes the geographic true north direction; the height of the monitoring camera above the horizontal plane is H; the included angle between the center line of the monitoring camera and the vertical line is θ; the included angle between the projection of the center line on the horizontal plane and geographic true north is β; the horizontal field angle of the camera is ω_x; the vertical field angle is ω_y; and the image resolution parameter information of the camera is X × Y (X is the pixel width of the image, Y the pixel height);
suppose that: the coordinate of the center of the picture of the monitoring camera is (0,0), the vertical projection position of the position O of the monitoring camera on the horizontal plane is O', and the longitude and latitude are (lambda)0,ψ0) Aiming at any position A on the horizontal plane within the visual range of the monitoring cameraiLatitude and longitude coordinates (λ)i,ψi) Can be converted into the picture coordinate (x) of the monitoring camera in the following wayi,yi);
1) According to the Haversine (half-versed-sine) formula, calculate the straight-line horizontal distance d_i (in m) between O', the vertical projection of the camera position on the horizontal plane, and an arbitrary position A_i on the horizontal plane within the camera's visual range, together with the longitude horizontal distance s_i (in m) between O' and A_i:

$$a = \sin^2\left(\frac{\psi_i - \psi_0}{2}\right) + \cos\psi_0 \cos\psi_i \sin^2\left(\frac{\lambda_i - \lambda_0}{2}\right)$$

$$d_i = 2r \arcsin\left(\sqrt{a}\right)$$

$$b = \cos\psi_0 \cos\psi_i \sin^2\left(\frac{\lambda_i - \lambda_0}{2}\right)$$

$$s_i = 2r \arcsin\left(\sqrt{b}\right)$$

wherein: a and b are intermediate variables, O'(λ_0, ψ_0) is the vertical projection of the monitoring camera on the horizontal plane, A_i(λ_i, ψ_i) is an arbitrary position on the horizontal plane within the camera's visual range, and r is the radius of the earth in m;
2): calculation of O' and A from 1)iAngle beta between the connecting line of (A) and the true north direction of geographyi
Figure BDA0003288004740000135
3): calculation of O and A from 2)iAngle theta between the connecting line and the vertical linei
Figure BDA0003288004740000136
H is the height of the monitoring camera from the horizontal plane, and the unit is m;
4) Calculate the picture coordinates (x_i, y_i) of A_i in the monitoring camera:

$$x_i = \frac{X}{\omega_x}\left(\beta_i - \beta\right)$$

$$y_i = \frac{Y}{\omega_y}\left(\theta_i - \theta\right)$$

wherein X is the pixel width of the image and Y its pixel height, both obtained from the camera image resolution X × Y; θ is the included angle between the center line of the monitoring camera and the vertical line; β is the included angle between the projection of the center line on the horizontal plane and geographic true north; ω_x is the horizontal field angle and ω_y the vertical field angle of the monitoring camera.
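For illustration, a minimal Python sketch of the longitude/latitude-to-picture-coordinate mapping under the formulas above; the quadrant-free β_i and the linear angle-to-pixel projection are simplifying assumptions:

```python
import math

R_EARTH = 6_371_000.0  # earth radius r, in metres

def haversine(lam1, psi1, lam2, psi2):
    """Great-circle distance (m) between (longitude lam, latitude psi)
    points given in degrees, via the Haversine formula."""
    p1, p2 = math.radians(psi1), math.radians(psi2)
    dpsi = math.radians(psi2 - psi1)
    dlam = math.radians(lam2 - lam1)
    a = math.sin(dpsi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * R_EARTH * math.asin(math.sqrt(a))

def latlon_to_picture(lam0, psi0, lam_i, psi_i,
                      H, theta, beta, omega_x, omega_y, X, Y):
    """Map a ground position A_i(lam_i, psi_i) to the picture coordinates
    (x_i, y_i) of a camera whose vertical ground projection is O'(lam0, psi0).
    theta, beta, omega_x, omega_y are in radians; the picture centre is (0, 0)."""
    d_i = haversine(lam0, psi0, lam_i, psi_i)   # straight-line horizontal distance
    # Longitude-only horizontal distance s_i (intermediate variable b):
    b = (math.cos(math.radians(psi0)) * math.cos(math.radians(psi_i))
         * math.sin(math.radians(lam_i - lam0) / 2) ** 2)
    s_i = 2 * R_EARTH * math.asin(math.sqrt(b))
    beta_i = math.asin(min(1.0, s_i / d_i)) if d_i > 0 else 0.0  # angle from true north
    theta_i = math.atan2(d_i, H)                # sight-line angle from the vertical
    x_i = (beta_i - beta) / omega_x * X         # linear angle-to-pixel model (assumption)
    y_i = (theta_i - theta) / omega_y * Y
    return x_i, y_i
```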
Preferably, the recognizing a preset recognized object by using the first classifier and the second classifier respectively to obtain a recognition result, and performing data fusion and labeling on the recognition result according to the first confidence set, the second confidence set and the mapping relationship to obtain a monitoring picture with a pseudo tag added and a remote sensing image with a pseudo tag added, includes:
determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and performing data fusion according to the first classification result and the second classification result to obtain the identification result, and performing data marking according to the identification result, the first confidence set, the second confidence set and the mapping relation to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added.
Preferably, the data fusion is performed according to the first classification result and the second classification result to obtain the identification result, and data labeling is performed according to the identification result, the first confidence set, the second confidence set and the mapping relationship to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added, includes:
If the first classification result comprises the preset target recognition object and the second classification result does not comprise the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the first confidence degree set, judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in the remote sensing image according to the mapping relation and the target recognition frame, and marking the corresponding region as a first recognition result;
if the first classification result does not include the preset target recognition object and the second classification result includes the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the second confidence coefficient set, and judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in a monitoring picture according to the mapping relation and the target recognition frame, and marking the corresponding region as a second recognition result;
if the first classification result comprises the preset target recognition object and the second classification result comprises the preset target recognition object, extracting confidence degrees of the preset target recognition object in the first confidence degree set and the second confidence degree set;
if the types of the target objects recognized in the first classification result and the second classification result are the same, judging whether the confidence degree of the preset target recognition object in the first confidence degree set is greater than or equal to a preset confidence degree threshold or whether the confidence degree of the preset target recognition object in the second confidence degree set is greater than or equal to a preset confidence degree threshold, and if so, determining the first classification result and the second classification result as correct results;
if the types of the target objects identified in the first classification result and the second classification result are different, screening out the maximum value of the confidence degrees of the preset target identification objects in the first confidence degree set and the confidence degrees of the preset target identification objects in the second confidence degree set, and judging whether the maximum value is greater than or equal to a preset confidence degree threshold value, if so, correcting the type of the target object corresponding to the other confidence degree by using the type of the target object corresponding to the maximum value to obtain a correction result;
and obtaining the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added according to the correction result.
Specifically, in this embodiment, after the mapping relationship establishing process is performed, the same object is subjected to target recognition, and the recognition results in the classifier 1 and the classifier 2 are extracted for data fusion.
Furthermore, according to the mapping relation between the picture coordinate of the monitoring camera and the remote sensing longitude and latitude coordinate, the mutual correspondence of the same object in the monitoring picture and the remote sensing image can be realized. The object in the monitoring picture and the object in the remote sensing image are subjected to target recognition by utilizing the constructed classifier, wherein the classifier 1 recognizes the object in the monitoring picture, the classifier 2 recognizes the object in the remote sensing image, a confidence coefficient set A of all objects in the monitoring picture and a confidence coefficient set B of all objects in the remote sensing image are respectively obtained, a confidence coefficient threshold is set to be Q, and the following three conditions can occur in the recognition of the same target object: (1) identifying a target object in the monitoring picture, wherein the target object is not identified in the remote sensing image; (2) a target object is not identified in the monitoring picture, and the target object is identified in the remote sensing image; (3) and identifying a target object in the monitoring picture and the remote sensing image. And respectively carrying out data fusion on the results of the three conditions, wherein the specific processing process is as follows:
1) and identifying a target object in the monitoring picture, wherein the target object is not identified in the remote sensing image.
Extract the confidence F_1 of the target object from the confidence set A of all objects in the monitoring picture. If F_1 < Q, classifier 1 has identified in error; if F_1 ≥ Q, classifier 1 has identified correctly. When classifier 1 correctly identifies the target object, its identification frame is acquired; since the picture coordinates of the upper-left and lower-right points of the identification frame are known at identification time, the corresponding region of the frame in the remote sensing image is obtained according to the mapping relation established in step 2, and that region is marked as the target object identified by classifier 1.
2) The target object is not identified in the monitoring picture, and the target object is identified in the remote sensing image.
Extract the confidence F_2 of the target object from the confidence set B of all objects in the remote sensing image. If F_2 < Q, classifier 2 has identified in error; if F_2 ≥ Q, classifier 2 has identified correctly. When classifier 2 correctly identifies the target object, its identification frame is acquired; the longitude and latitude information of the upper-left and lower-right points of the frame is obtained from the remote sensing image, the corresponding region of the frame in the monitoring picture is obtained according to the mapping relation established in step 2, and that region is marked as the target object identified by classifier 2.
3) And identifying a target object in the monitoring picture and the remote sensing image.
Extract the confidences F_1 and F_2 of the target object from the confidence set A of all objects in the monitoring picture and the confidence set B of all objects in the remote sensing image. If the target object types identified by classifier 1 and classifier 2 are the same, check whether F_1 ≥ Q or F_2 ≥ Q; if at least one condition holds, classifier 1 and classifier 2 have identified correctly, otherwise both have identified in error. If the types identified by the two classifiers differ, take the maximum of the two confidences, F_max = max(F_1, F_2). When F_max < Q, classifier 1 and classifier 2 have identified in error; when F_max ≥ Q, the target object type corresponding to F_max is taken as the reference and used to correct the type corresponding to the other confidence. The correctly identified and corrected target objects are retained in the monitoring picture and the remote sensing image, the corrections being recorded by adding pseudo labels to the target objects.
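For illustration, a condensed Python sketch of this three-case fusion rule; the `map_to_remote` and `map_to_monitor` callables stand in for the step-2 coordinate mapping and are assumptions:

```python
def fuse_same_object(det1, det2, Q, map_to_remote, map_to_monitor):
    """Fuse the two classifiers' results for one physical object.
    det1 (classifier 1, monitoring picture) and det2 (classifier 2, remote
    sensing image) are dicts {'cls': str, 'conf': float, 'box': tuple} or None.
    Returns (pseudo_label_monitor, pseudo_label_remote), or None on rejection."""
    if det1 is not None and det2 is None:          # case 1: monitor picture only
        if det1['conf'] >= Q:
            return det1, {'cls': det1['cls'], 'box': map_to_remote(det1['box'])}
    elif det1 is None and det2 is not None:        # case 2: remote image only
        if det2['conf'] >= Q:
            return {'cls': det2['cls'], 'box': map_to_monitor(det2['box'])}, det2
    elif det1 is not None and det2 is not None:    # case 3: seen in both
        if det1['cls'] == det2['cls']:
            if det1['conf'] >= Q or det2['conf'] >= Q:
                return det1, det2                  # both identifications accepted
        else:
            f_max = max(det1['conf'], det2['conf'])
            if f_max >= Q:                         # higher-confidence type corrects the other
                cls = det1['cls'] if det1['conf'] >= det2['conf'] else det2['cls']
                return {**det1, 'cls': cls}, {**det2, 'cls': cls}
    return None                                    # identification rejected as an error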
Preferably, the obtaining a training set according to the monitoring picture after adding the pseudo tag and the remote sensing image after adding the pseudo tag, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier includes:
respectively calculating the average confidence of all objects in each frame of monitoring picture image and each remote sensing image;
when the number of the monitoring pictures and the number of the remote sensing images respectively processed by the first classifier and the second classifier and subjected to data fusion reach a preset training number, respectively sequencing the monitoring pictures and the remote sensing images from high to low according to the value of the average confidence coefficient to obtain a sequencing image set;
obtaining the training set according to the sequencing image set, the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label;
training the first classifier and the second classifier according to the monitoring picture image and the remote sensing image in the training set respectively to obtain a first classifier and a second classifier which are trained for the first time;
respectively inputting the multi-frame monitoring picture and the multiple remote sensing images into the first classifier and the second classifier of the primary training for target recognition to obtain the confidence coefficient of a target object in the monitoring picture image and the confidence coefficient of the target object in the remote sensing images;
counting the total number of all identified objects in a plurality of monitoring pictures and a plurality of remote sensing images;
counting the number of objects whose confidence in the monitoring picture is greater than or equal to the preset confidence threshold, whose confidence in the remote sensing image is greater than or equal to the preset confidence threshold, and whose identified target object types are the same;
calculating the ratio of this number to the total number of all identified objects;
and judging whether the ratio is smaller than a preset precision threshold value, if so, continuing training the first classifier and the second classifier, and if not, determining the first classifier and the second classifier as the optimized classifier.
Specifically, after data fusion is performed, the classifiers undergo curriculum learning so that they are continuously optimized, improving the accuracy with which the classifiers identify target objects.
Further, the average confidence of all objects in each frame of the monitoring picture is calculated:

$$\bar{Z}_a = \frac{1}{n}\sum_{i=1}^{n} Z_i$$

wherein a indexes the a-th frame of the monitoring picture, Z_i is the confidence of the i-th object, and n is the number of objects in the frame.
Similarly, the average confidence of all objects in each remote sensing image is calculated:

$$\bar{P}_b = \frac{1}{m}\sum_{j=1}^{m} P_j$$

wherein b indexes the b-th remote sensing image, P_j is the confidence of the j-th object, and m is the number of objects in the image.
A parameter num is set. When the number of monitoring picture images and remote sensing images that have been processed by the classifiers and data fusion reaches num, the images are sorted from high to low by average confidence, and the sorted, pseudo-labelled monitoring picture images and remote sensing images are input into classifier 1 and classifier 2 respectively, so that learning starts from simple samples and gradually transitions to difficult samples, continuously optimizing the classifiers and making the training process more stable.
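For illustration, a minimal Python sketch of this curriculum ordering, assuming each pseudo-labelled image carries the confidences of its objects:

```python
def average_confidence(object_confidences):
    """Mean confidence over all objects in one frame or image."""
    return sum(object_confidences) / len(object_confidences) if object_confidences else 0.0

def curriculum_order(pseudo_labelled, num):
    """Once `num` fused, pseudo-labelled images have accumulated, order them
    from the highest average confidence (easy samples) to the lowest (hard
    samples). `pseudo_labelled` is a list of (image, [confidences]) pairs."""
    if len(pseudo_labelled) < num:
        return None                      # keep accumulating
    return sorted(pseudo_labelled,
                  key=lambda pair: average_confidence(pair[1]),
                  reverse=True)
```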
Specifically, in the embodiment, after the classifier is optimized in a curriculum-based learning manner, the optimized classifier is further used for accurately identifying the target object.
Further, the continuous learning and optimization in step 4 yields classifier 1 and classifier 2 with higher precision. Input s frames of monitoring picture images and s remote sensing images, and perform target recognition on them with the optimized classifier 1 and classifier 2 respectively, obtaining the confidence F_1 of each target object in the monitoring picture images and the confidence F_2 of the target object in the remote sensing images. At the same time, count the total number N of all identified objects in the s monitoring frames and the s remote sensing images, set a parameter K, and count the number Ω of objects satisfying all three conditions: F_1 ≥ Q, F_2 ≥ Q, and the same target object type. Then calculate the ratio T of Ω to all recognized objects:

T = Ω / (2N);

When T < K, continue optimizing the classifiers; otherwise, the classifiers are considered able to accurately identify the target object. The target object type corresponding to the maximum confidence is taken as the result after data fusion, realizing accurate identification of the target object.
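For illustration, a minimal Python sketch of this stopping test, implementing T = Ω/(2N) as written; the cross-modal pairing of detections is assumed to come from the coordinate mapping:

```python
def precision_ratio(paired_detections, total_objects, Q):
    """paired_detections: (F1, type1, F2, type2) per physical object, from the
    optimized classifier 1 and classifier 2 over s monitoring frames and
    s remote sensing images. total_objects is N, the count of all identified
    objects. Omega counts objects with F1 >= Q, F2 >= Q and matching types;
    returns T = Omega / (2 * N) per the embodiment."""
    omega = sum(1 for f1, t1, f2, t2 in paired_detections
                if f1 >= Q and f2 >= Q and t1 == t2)
    return omega / (2 * total_objects) if total_objects else 0.0

# Optimization continues while T < K, e.g.:
#   while precision_ratio(dets, N, Q) < K:
#       ...continue curriculum training of classifier 1 and classifier 2...
```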
Fig. 6 is a module connection diagram of the multi-modal collaborative optimization system in an embodiment provided by the present invention. As shown in Fig. 6, the present invention further provides a multi-modal collaborative optimization system for video monitoring and remote sensing, including:
the data set construction module is used for constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
the classifier building module is used for training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
the mapping relation construction module is used for constructing a mapping relation between the picture coordinate of the monitoring camera and the remote sensing longitude and latitude coordinate according to the state parameter of the preset monitoring camera;
the confidence coefficient acquisition module is used for identifying objects in the monitoring picture by using the first classifier to obtain a first confidence coefficient set of all the objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence coefficient set of all the objects in the remote sensing image;
the fusion module is used for respectively utilizing the first classifier and the second classifier to identify preset identification objects to obtain identification results, and carrying out data fusion and marking on the identification results according to the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
the optimization module is used for acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and the identification module is used for inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for identification and fusion to obtain a target identification result.
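Purely as an orientation aid for Fig. 6 (the patent defines the modules functionally, not as code), the module connections might be wired up as in the following hypothetical skeleton:

```python
class MultiModalCoOptimizationSystem:
    """Hypothetical skeleton mirroring the module connections of Fig. 6."""

    def __init__(self, classifier1, classifier2, mapping, fusion, optimizer):
        self.classifier1 = classifier1   # first classifier (monitoring pictures)
        self.classifier2 = classifier2   # second classifier (remote sensing images)
        self.mapping = mapping           # picture-coordinate <-> lon/lat mapping
        self.fusion = fusion             # data fusion and pseudo-labeling module
        self.optimizer = optimizer       # curriculum-learning optimization module

    def recognize(self, monitoring_picture, remote_sensing_image):
        # Confidence acquisition module: detect objects in both modalities.
        result1 = self.classifier1(monitoring_picture)
        result2 = self.classifier2(remote_sensing_image)
        # Fusion module: reconcile the two results via the coordinate mapping.
        return self.fusion(result1, result2, self.mapping)
```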
Preferably, the classifier building module specifically includes:
the dividing unit is used for dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
the training unit is used for training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set based on a multi-scale training method and a data processing method, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
a building unit, configured to build the first classifier and the second classifier by using the trained YoloV3 model.
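The dividing unit's split by a preset proportion could look like the following sketch; the 0.8 ratio and the random shuffle are illustrative assumptions, not values fixed by the patent:

```python
import random

def split_dataset(samples: list, train_ratio: float = 0.8, seed: int = 42):
    """Divide a target-detection dataset into a training set and a
    validation set according to a preset proportion."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```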
Preferably, the mapping relationship building module specifically includes:
the information acquisition unit is used for acquiring the state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
the calibration unit is used for calibrating the vertical projection position of the monitoring camera on the horizontal plane;
the first calculation unit is used for calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on the horizontal plane in the visual range of the monitoring camera based on the haversine formula;
the second calculation unit is used for calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
the third calculating unit is used for calculating an included angle between a connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the linear horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
the fourth calculating unit is used for calculating the picture coordinates of the arbitrary position in the monitoring camera picture according to the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the linear horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera picture.
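As a hedged illustration of the geometric units above (the haversine formula itself is standard; the exact angle conventions are one plausible reading of the description, not the patent's normative definition):

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Straight-line horizontal (great-circle) distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def first_angle_deg(d_line: float, d_lon: float) -> float:
    """Angle between the projection-to-target line and geographic true
    north, from the straight-line and longitude-direction distances."""
    if d_line == 0:
        return 0.0
    return math.degrees(math.asin(min(1.0, d_lon / d_line)))

def second_angle_deg(d_line: float, camera_height: float) -> float:
    """Angle between the camera-to-target line and the vertical."""
    return math.degrees(math.atan2(d_line, camera_height))
```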
Preferably, the fusion module specifically includes:
the target object determining unit is used for determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
the identification unit is used for correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and the fusion marking unit is used for carrying out data fusion according to the first classification result and the second classification result to obtain the identification result, and carrying out data marking according to the identification result, the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label.
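One possible reading of the fusion marking unit's decision rules (detailed in claim 5) is sketched below; the `(object_type, confidence)` tuples and the handling of sub-threshold cases are assumptions of this sketch:

```python
from typing import Optional, Tuple

Detection = Optional[Tuple[str, float]]  # (object_type, confidence) or None

def fuse_detections(res1: Detection, res2: Detection, threshold: float) -> Detection:
    """Decision-level fusion of one target across the two modalities."""
    if res1 and not res2:
        # Only the monitoring picture sees the target: accept if confident,
        # then mark the corresponding region in the remote sensing image.
        return res1 if res1[1] >= threshold else None
    if res2 and not res1:
        # Only the remote sensing image sees the target.
        return res2 if res2[1] >= threshold else None
    if res1 and res2:
        if res1[0] == res2[0]:
            # Same type in both modalities: correct if either side is confident.
            return res1 if max(res1[1], res2[1]) >= threshold else None
        # Different types: the higher-confidence type corrects the other.
        best = max(res1, res2, key=lambda r: r[1])
        return best if best[1] >= threshold else None
    return None
```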
The invention has the following beneficial effects:
(1) The classifier of the invention continuously obtains new samples while executing the classification task, thereby continuously training and improving itself and raising its precision.
(2) The optimized classifier is used to identify the target object, and the recognition results of video monitoring and remote sensing are fused, which improves the recognition accuracy.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are mutually referred to. For the system disclosed by the embodiment, the description is simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there are variations in the specific implementation and application scope. In view of the above, the present description should not be construed as limiting the invention.

Claims (10)

1. A multimode collaborative optimization method for video monitoring and remote sensing is characterized by comprising the following steps:
constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
constructing a mapping relation between picture coordinates and remote sensing longitude and latitude coordinates of a monitoring camera according to state parameters of a preset monitoring camera;
identifying objects in the monitoring picture by using the first classifier to obtain a first confidence set of all objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence set of all objects in the remote sensing image;
respectively utilizing the first classifier and the second classifier to identify a preset identification object to obtain an identification result, and carrying out data fusion and marking on the identification result according to the first confidence set, the second confidence set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for recognition and fusion to obtain a target recognition result.
2. The multimodal collaborative optimization method for video surveillance and remote sensing according to claim 1, wherein the training of a machine learning model according to the surveillance image target detection dataset and the remote sensing image target detection dataset to obtain a trained first classifier and a trained second classifier comprises:
dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
based on a multi-scale training method and a data processing method, training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
and constructing the first classifier and the second classifier by using the trained YoloV3 model.
3. The multimodal collaborative optimization method for video monitoring and remote sensing according to claim 1, wherein the constructing a mapping relationship between picture coordinates of a monitoring camera and remote sensing longitude and latitude coordinates according to state parameters of a preset monitoring camera comprises:
acquiring state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
calibrating the vertical projection position of the monitoring camera on the horizontal plane;
calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on the horizontal plane in the visual range of the monitoring camera based on the haversine formula;
calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
calculating an included angle between a connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the linear horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
calculating the picture coordinates of the arbitrary position in the monitoring camera picture according to the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the linear horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera picture.
4. The multimodal collaborative optimization method for video monitoring and remote sensing according to claim 1, wherein the steps of respectively identifying a preset identification object by using the first classifier and the second classifier to obtain an identification result, and performing data fusion and labeling on the identification result according to the first confidence set, the second confidence set and the mapping relationship to obtain a monitoring picture with a pseudo label added and a remote sensing image with a pseudo label added comprise:
determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and performing data fusion according to the first classification result and the second classification result to obtain the identification result, and performing data marking according to the identification result, the first confidence set, the second confidence set and the mapping relation to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added.
5. The multi-modal collaborative optimization method for video monitoring and remote sensing according to claim 4, wherein the data fusion is performed according to the first classification result and the second classification result to obtain the recognition result, and the data labeling is performed according to the recognition result, the first confidence set, the second confidence set and the mapping relationship to obtain the pseudo-tag added monitoring picture and the pseudo-tag added remote sensing image, including:
If the first classification result comprises the preset target recognition object and the second classification result does not comprise the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the first confidence coefficient set, judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in the remote sensing image according to the mapping relation and the target recognition frame, and marking the corresponding region as a first recognition result;
if the first classification result does not include the preset target recognition object and the second classification result includes the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the second confidence coefficient set, and judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in a monitoring picture according to the mapping relation and the target recognition frame, and marking the corresponding region as a second recognition result;
if the first classification result comprises the preset target recognition object and the second classification result comprises the preset target recognition object, extracting confidence degrees of the preset target recognition object in the first confidence degree set and the second confidence degree set;
if the types of the target objects identified in the first classification result and the second classification result are the same, judging whether the confidence of the preset target identification object in the first confidence set is greater than or equal to a preset confidence threshold or whether the confidence of the preset target identification object in the second confidence set is greater than or equal to a preset confidence threshold, and if so, determining the first classification result and the second classification result as correct results;
if the types of the target objects identified in the first classification result and the second classification result are different, screening out the maximum value of the confidence degrees of the preset target identification objects in the first confidence degree set and the confidence degrees of the preset target identification objects in the second confidence degree set, and judging whether the maximum value is greater than or equal to a preset confidence degree threshold value, if so, correcting the type of the target object corresponding to the other confidence degree by using the type of the target object corresponding to the maximum value to obtain a correction result;
and obtaining the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added according to the correction result.
6. The multimodal collaborative optimization method for video monitoring and remote sensing according to claim 5, wherein the obtaining of a training set according to the monitoring picture after adding the pseudo label and the remote sensing image after adding the pseudo label, and training the first classifier and the second classifier respectively according to the training set to obtain the optimized classifier comprises:
respectively calculating the average confidence of all objects in each frame of monitoring picture image and each remote sensing image;
when the number of the monitoring pictures and the number of the remote sensing images respectively processed by the first classifier and the second classifier and subjected to data fusion reach a preset training number, respectively sequencing the monitoring pictures and the remote sensing images from high to low according to the value of the average confidence coefficient to obtain a sequencing image set;
obtaining the training set according to the sequencing image set, the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label;
training the first classifier and the second classifier according to the monitoring picture image and the remote sensing image in the training set respectively to obtain a first classifier and a second classifier which are trained for the first time;
respectively inputting the multi-frame monitoring picture and the multiple remote sensing images into the first classifier and the second classifier of the primary training for target recognition to obtain the confidence coefficient of a target object in the monitoring picture image and the confidence coefficient of the target object in the remote sensing images;
counting the total number of all identified objects in a plurality of monitoring pictures and a plurality of remote sensing images;
counting the number of objects for which the confidence in the monitoring picture is greater than or equal to a preset confidence threshold, the confidence in the remote sensing image is greater than or equal to the preset confidence threshold, and the target object types are the same;
calculating the ratio of the number of objects satisfying these conditions to the total number of all identified objects;
and judging whether the ratio is smaller than a preset precision threshold value, if so, continuing training the first classifier and the second classifier, and if not, determining the first classifier and the second classifier as the optimized classifier.
7. A multimodal collaborative optimization system for video surveillance and remote sensing, comprising:
the data set construction module is used for constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
the classifier building module is used for training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
the mapping relation construction module is used for constructing a mapping relation between the picture coordinate of the monitoring camera and the remote sensing longitude and latitude coordinate according to the state parameter of the preset monitoring camera;
the confidence coefficient acquisition module is used for identifying the objects in the monitoring picture by using the first classifier to obtain a first confidence coefficient set of all the objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence coefficient set of all the objects in the remote sensing image;
the fusion module is used for respectively utilizing the first classifier and the second classifier to identify a preset identification object to obtain an identification result, and performing data fusion and marking on the identification result according to the first confidence set, the second confidence set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
the optimization module is used for acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and the identification module is used for inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for identification and fusion to obtain a target identification result.
8. The multimodal collaborative optimization system for video surveillance and remote sensing according to claim 7, wherein the classifier building module specifically comprises:
the dividing unit is used for dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
the training unit is used for training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set based on a multi-scale training method and a data processing method, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
a building unit, configured to build the first classifier and the second classifier by using the trained YoloV3 model.
9. The multimodal collaborative optimization system for video surveillance and remote sensing according to claim 7, wherein the mapping relationship construction module specifically includes:
the information acquisition unit is used for acquiring the state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
the calibration unit is used for calibrating the vertical projection position of the monitoring camera on the horizontal plane;
the first calculation unit is used for calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on the horizontal plane in the visual range of the monitoring camera based on the haversine formula;
the second calculation unit is used for calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
the third calculating unit is used for calculating an included angle between a connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the linear horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
the fourth calculation unit is used for calculating the picture coordinates of the arbitrary position in the monitoring camera picture according to the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the linear horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera picture.
10. The multimodal collaborative optimization system for video surveillance and remote sensing according to claim 7, wherein the fusion module specifically comprises:
the target object determining unit is used for determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
the identification unit is used for correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and the fusion marking unit is used for carrying out data fusion according to the first classification result and the second classification result to obtain the identification result, and carrying out data marking according to the identification result, the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label.
CN202111154171.6A 2021-09-29 2021-09-29 Multi-mode collaborative optimization method and system for video monitoring and remote sensing Active CN113947714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111154171.6A CN113947714B (en) 2021-09-29 2021-09-29 Multi-mode collaborative optimization method and system for video monitoring and remote sensing

Publications (2)

Publication Number Publication Date
CN113947714A true CN113947714A (en) 2022-01-18
CN113947714B (en) 2022-09-13

Family

ID=79328908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111154171.6A Active CN113947714B (en) 2021-09-29 2021-09-29 Multi-mode collaborative optimization method and system for video monitoring and remote sensing

Country Status (1)

Country Link
CN (1) CN113947714B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630825A (en) * 2023-06-09 2023-08-22 北京佳格天地科技有限公司 Satellite remote sensing data and monitoring video fusion method and system
WO2023185074A1 (en) * 2022-04-02 2023-10-05 深圳先进技术研究院 Group behavior recognition method based on complementary spatio-temporal information modeling

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104660993A (en) * 2015-02-09 2015-05-27 武汉理工大学 Intelligent maritime affair monitoring method and system based on AIS (automatic identification system) and CCTV (closed circuit television)
CN108197562A (en) * 2017-12-29 2018-06-22 江苏省新通智能交通科技发展有限公司 A kind of AIS information visualization methods and realization system based on video technique
CN108810473A (en) * 2018-06-15 2018-11-13 高新兴科技集团股份有限公司 A kind of method and system for realizing GPS mapping camera views coordinates on a mobile platform
CN109460740A (en) * 2018-11-15 2019-03-12 上海埃威航空电子有限公司 The watercraft identification recognition methods merged based on AIS with video data
CN109948523A (en) * 2019-03-18 2019-06-28 中国汽车工程研究院股份有限公司 A kind of object recognition methods and its application based on video Yu millimetre-wave radar data fusion
CN112418028A (en) * 2020-11-11 2021-02-26 上海交通大学 Satellite image ship identification and segmentation method based on deep learning
CN112598733A (en) * 2020-12-10 2021-04-02 广州市赋安电子科技有限公司 Ship detection method based on multi-mode data fusion compensation adaptive optimization
CN112687127A (en) * 2020-12-18 2021-04-20 华南理工大学 Ship positioning and snapshot method based on AIS and image analysis assistance

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
白玉 等: "基于可见光图像和红外图像决策级融合的目标检测算法", 《空军工程大学学报(自然科学版)》 *

Also Published As

Publication number Publication date
CN113947714B (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN108898047B (en) Pedestrian detection method and system based on blocking and shielding perception
CN113947714B (en) Multi-mode collaborative optimization method and system for video monitoring and remote sensing
CN111912416B (en) Method, device and equipment for positioning equipment
CN107014294A (en) A kind of contact net geometric parameter detection method and system based on infrared image
CN108388871B (en) Vehicle detection method based on vehicle body regression
CN109191255B (en) Commodity alignment method based on unsupervised feature point detection
CN114155527A (en) Scene text recognition method and device
CN107909053B (en) Face detection method based on hierarchical learning cascade convolution neural network
CN110751076A (en) Vehicle detection method
CN103065163A (en) Rapid target detection and recognition system and method based on static picture
CN110309828B (en) Inclined license plate correction method
CN111582270A (en) Identification tracking method based on high-precision bridge region visual target feature points
CN105868776A (en) Transformer equipment recognition method and device based on image processing technology
CN112784494B (en) Training method of false positive recognition model, target recognition method and device
CN114332435A (en) Image labeling method and device based on three-dimensional reconstruction
CN109784257B (en) Transformer thermometer detection and identification method
CN111652048A (en) A deep learning based 1: n face comparison method
CN106897683A (en) The ground object detecting method and system of a kind of remote sensing images
CN112232272B (en) Pedestrian recognition method by fusing laser and visual image sensor
CN113591705B (en) Inspection robot instrument identification system and method and storage medium
CN110378337A (en) Metal cutting tool drawing identification information vision input method and system
CN115457130A (en) Electric vehicle charging port detection and positioning method based on depth key point regression
CN116188755A (en) Instrument angle correction and reading recognition device based on deep learning
CN114927236A (en) Detection method and system for multiple target images
CN113673534B (en) RGB-D image fruit detection method based on FASTER RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510600 room 1301, No. 68, yueken Road, Wushan street, Tianhe District, Guangzhou City, Guangdong Province (Location: room 1301-1) (office only)

Applicant after: Guangzhou Fu'an Digital Technology Co.,Ltd.

Address before: 510600 1501, No. 68, yueken Road, Tianhe District, Guangzhou, Guangdong

Applicant before: GUANGZHOU FUAN ELECTRONIC TECHNOLOGY Co.,Ltd.

GR01 Patent grant