CN113947714A - Multi-mode collaborative optimization method and system for video monitoring and remote sensing - Google Patents

Multi-mode collaborative optimization method and system for video monitoring and remote sensing

Info

Publication number
CN113947714A
Authority
CN
China
Prior art keywords
classifier
remote sensing
monitoring
confidence
preset
Prior art date
Legal status
Granted
Application number
CN202111154171.6A
Other languages
Chinese (zh)
Other versions
CN113947714B (en)
Inventor
李晓威
陈升敬
刘晓建
Current Assignee
Guangzhou Fuan Electronic Technology Co ltd
Original Assignee
Guangzhou Fuan Electronic Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Fuan Electronic Technology Co ltd filed Critical Guangzhou Fuan Electronic Technology Co ltd
Priority to CN202111154171.6A priority Critical patent/CN113947714B/en
Publication of CN113947714A publication Critical patent/CN113947714A/en
Application granted granted Critical
Publication of CN113947714B publication Critical patent/CN113947714B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention relates to a multimodal collaborative optimization method and system for video monitoring and remote sensing, wherein the method comprises the following steps: constructing a first classifier and a second classifier; constructing a mapping relation between the picture coordinates of a monitoring camera and remote sensing longitude and latitude coordinates according to state parameters of a preset monitoring camera; carrying out target recognition on the same object, extracting the recognition results of the first classifier and the second classifier, and carrying out data fusion; performing curriculum learning on the classifiers to continuously optimize them; and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifiers for recognition and fusion to obtain a target recognition result. The method combines video monitoring and remote sensing for collaborative optimization, trains a new model on the basis of an existing machine learning model, and fuses results across modalities to realize online learning, thereby improving recognition accuracy.

Description

Multi-mode collaborative optimization method and system for video monitoring and remote sensing
Technical Field
The invention relates to the technical field of target identification, in particular to a multi-mode collaborative optimization method and system for video monitoring and remote sensing.
Background
Target identification technology has wide application prospects in fields such as video monitoring, robotics and intelligent transportation. However, because target recognition involves the calculation and analysis of large amounts of data and is subject to interference from environmental factors such as external lighting and viewing angle, traditional recognition algorithms cannot extract the optimal characteristics of the image, and the recognition rate is limited.
Target detection and identification is one of the important applications of remote sensing technology and is of great significance in fields such as ocean monitoring, geological survey and urban planning. Existing remote sensing image target detection methods adopt one-step target detection algorithms or rotation-transformation data-enhancement detection methods, which are costly and cannot meet the requirement of real-time detection of remote sensing images while ensuring high target detection precision.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a multi-mode collaborative optimization method and system for video monitoring and remote sensing.
In order to achieve the purpose, the invention provides the following scheme:
a multimode collaborative optimization method for video monitoring and remote sensing comprises the following steps:
constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
constructing a mapping relation between picture coordinates and remote sensing longitude and latitude coordinates of a monitoring camera according to state parameters of a preset monitoring camera;
identifying objects in the monitoring picture by using the first classifier to obtain a first confidence set of all objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence set of all objects in the remote sensing image;
respectively utilizing the first classifier and the second classifier to recognize preset recognition objects to obtain recognition results, and carrying out data fusion and marking on the recognition results according to the first confidence degree set, the second confidence degree set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for recognition and fusion to obtain a target recognition result.
Preferably, the training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier includes:
dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
based on a multi-scale training method and a data processing method, training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
and constructing the first classifier and the second classifier by using the trained YoloV3 model.
Preferably, the constructing a mapping relationship between the picture coordinates of the monitoring camera and the remote sensing longitude and latitude coordinates according to the state parameters of the preset monitoring camera includes:
acquiring state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the true north direction of geography, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
calibrating the vertical projection position of the monitoring camera on the horizontal plane;
calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on a horizontal plane in a visual range of the monitoring camera based on the haversine formula;
calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
calculating the included angle between the connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the straight-line horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
calculating the picture coordinates of the arbitrary position in the monitoring camera according to the included angle between the center line of the monitoring camera and the vertical line, the included angle between the projection of the center line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the straight-line horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera.
Preferably, the recognizing a preset recognized object by using the first classifier and the second classifier respectively to obtain a recognition result, and performing data fusion and labeling on the recognition result according to the first confidence set, the second confidence set and the mapping relationship to obtain a monitoring picture with a pseudo tag added and a remote sensing image with a pseudo tag added, includes:
determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and performing data fusion according to the first classification result and the second classification result to obtain the identification result, and performing data marking according to the identification result, the first confidence set, the second confidence set and the mapping relation to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added.
Preferably, the data fusion is performed according to the first classification result and the second classification result to obtain the identification result, and data labeling is performed according to the identification result, the first confidence set, the second confidence set and the mapping relationship to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added, includes:
If the first classification result comprises the preset target recognition object and the second classification result does not comprise the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the first confidence degree set, judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in the remote sensing image according to the mapping relation and the target recognition frame, and marking the corresponding region as a first recognition result;
if the first classification result does not include the preset target recognition object and the second classification result includes the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the second confidence coefficient set, and judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in a monitoring picture according to the mapping relation and the target recognition frame, and marking the corresponding region as a second recognition result;
if the first classification result comprises the preset target recognition object and the second classification result comprises the preset target recognition object, extracting confidence degrees of the preset target recognition object in the first confidence degree set and the second confidence degree set;
if the types of the target objects recognized in the first classification result and the second classification result are the same, judging whether the confidence degree of the preset target recognition object in the first confidence degree set is greater than or equal to a preset confidence degree threshold or whether the confidence degree of the preset target recognition object in the second confidence degree set is greater than or equal to a preset confidence degree threshold, and if so, determining the first classification result and the second classification result as correct results;
if the types of the target objects identified in the first classification result and the second classification result are different, screening out the maximum value of the confidence degrees of the preset target identification objects in the first confidence degree set and the confidence degrees of the preset target identification objects in the second confidence degree set, and judging whether the maximum value is greater than or equal to a preset confidence degree threshold value, if so, correcting the type of the target object corresponding to the other confidence degree by using the type of the target object corresponding to the maximum value to obtain a correction result;
and obtaining the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added according to the correction result.
Preferably, the obtaining a training set according to the monitoring picture after adding the pseudo tag and the remote sensing image after adding the pseudo tag, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier includes:
respectively calculating the average confidence of all objects in each frame of monitoring picture image and each remote sensing image;
when the number of the monitoring pictures and the number of the remote sensing images respectively processed by the first classifier and the second classifier and subjected to data fusion reach a preset training number, respectively sequencing the monitoring pictures and the remote sensing images from high to low according to the value of the average confidence coefficient to obtain a sequencing image set;
obtaining the training set according to the sequencing image set, the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label;
training the first classifier and the second classifier according to the monitoring picture image and the remote sensing image in the training set respectively to obtain a first classifier and a second classifier which are trained for the first time;
respectively inputting the multi-frame monitoring picture and the multiple remote sensing images into the first classifier and the second classifier of the primary training for target recognition to obtain the confidence coefficient of a target object in the monitoring picture image and the confidence coefficient of the target object in the remote sensing images;
counting the total number of all identified objects in a plurality of monitoring pictures and a plurality of remote sensing images;
counting the number of objects whose confidence in the monitoring picture is greater than or equal to the preset confidence threshold, whose confidence in the remote sensing image is greater than or equal to the preset confidence threshold, and whose identified target object types are the same;
calculating the ratio of this number to the total number of all identified objects;
and judging whether the ratio is smaller than a preset precision threshold value, if so, continuing training the first classifier and the second classifier, and if not, determining the first classifier and the second classifier as the optimized classifier.
A multimodal collaborative optimization system for video surveillance and remote sensing, comprising:
the data set construction module is used for constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
the classifier building module is used for training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
the mapping relation construction module is used for constructing a mapping relation between the picture coordinate of the monitoring camera and the remote sensing longitude and latitude coordinate according to the state parameter of the preset monitoring camera;
the confidence coefficient acquisition module is used for identifying objects in the monitoring picture by using the first classifier to obtain a first confidence coefficient set of all the objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence coefficient set of all the objects in the remote sensing image;
the fusion module is used for respectively utilizing the first classifier and the second classifier to identify preset identification objects to obtain identification results, and carrying out data fusion and marking on the identification results according to the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
the optimization module is used for acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and the identification module is used for inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for identification and fusion to obtain a target identification result.
Preferably, the classifier building module specifically includes:
the dividing unit is used for dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
the training unit is used for training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set based on a multi-scale training method and a data processing method, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
a building unit, configured to build the first classifier and the second classifier by using the trained YoloV3 model.
Preferably, the mapping relationship building module specifically includes:
the information acquisition unit is used for acquiring the state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the true north direction of geography, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
the calibration unit is used for calibrating the vertical projection position of the monitoring camera on the horizontal plane;
the first calculation unit is used for calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on a horizontal plane in a visual range of the monitoring camera based on the haversine formula;
the second calculation unit is used for calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
the third calculating unit is used for calculating an included angle between a connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the linear horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
the fourth calculating unit is used for calculating the picture coordinates of the arbitrary position in the monitoring camera according to the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the straight-line horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera.
Preferably, the fusion module specifically includes:
the target object determining unit is used for determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
the identification unit is used for correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and the fusion marking unit is used for carrying out data fusion according to the first classification result and the second classification result to obtain the identification result, and carrying out data marking according to the identification result, the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a multimode collaborative optimization method and a multimode collaborative optimization system for video monitoring and remote sensing, wherein the method comprises the following steps: constructing a monitoring picture target detection data set and a remote sensing image target detection data set; training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier; constructing a mapping relation between picture coordinates and remote sensing longitude and latitude coordinates of a monitoring camera according to state parameters of a preset monitoring camera; identifying objects in the monitoring picture by using the first classifier to obtain a first confidence set of all objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence set of all objects in the remote sensing image; respectively identifying preset identification objects by using the first classifier and the second classifier to obtain identification results, and performing data fusion and marking on the identification results according to the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label; acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier; and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for recognition and fusion to obtain a target recognition result. The method combines video monitoring and remote sensing, and collaborative optimization, trains a new model on the basis of the existing machine learning model, and realizes result fusion in multi-modal recognition to realize online learning, thereby improving the recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required to be used in the embodiments will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.
FIG. 1 is a method flow diagram of a multi-modal collaborative optimization method in an embodiment provided by the present invention;
FIG. 2 is a schematic diagram of an application process in an embodiment provided by the present invention;
fig. 3 is a first schematic diagram of a method for calculating a mapping relationship between picture coordinates and longitude and latitude coordinates of a monitoring camera in an embodiment of the present invention;
fig. 4 is a second schematic diagram of a method for calculating a mapping relationship between picture coordinates and longitude and latitude coordinates of a monitoring camera in an embodiment of the present invention;
fig. 5 is a schematic diagram of a position of an object to be detected in a screen in an embodiment of the present invention;
fig. 6 is a module connection diagram of the multimodal collaborative optimization system in an embodiment provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, the inclusion of a series of steps, processes, methods, etc. is not limited to the steps shown, but may alternatively include other steps not shown, or may alternatively include other steps inherent to such processes, methods, articles, or apparatus.
The invention aims to provide a multi-mode collaborative optimization method and system for video monitoring and remote sensing, which can improve the identification accuracy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
Fig. 1 and fig. 2 are a method flowchart and an application process schematic diagram of a multimodal collaborative optimization method in an embodiment provided by the present invention, and as shown in fig. 1 and fig. 2, the present invention provides a multimodal collaborative optimization method for video monitoring and remote sensing, which includes:
step 100: constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
step 200: training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a first classifier and a second classifier which are trained;
step 300: constructing a mapping relation between picture coordinates and remote sensing longitude and latitude coordinates of a monitoring camera according to state parameters of a preset monitoring camera;
step 400: identifying objects in the monitoring picture by using the first classifier to obtain a first confidence set of all objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence set of all objects in the remote sensing image;
step 500: respectively utilizing the first classifier and the second classifier to identify a preset identification object to obtain an identification result, and carrying out data fusion and marking on the identification result according to the first confidence set, the second confidence set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
step 600: acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
step 700: and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for recognition and fusion to obtain a target recognition result.
Preferably, the training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier includes:
dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
based on a multi-scale training method and a data processing method, training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
and constructing the first classifier and the second classifier by using the trained YoloV3 model.
Optionally, in this embodiment, a classifier is first constructed based on the YoloV3 neural network, and a monitoring picture target detection data set and a remote sensing image target detection data set are constructed. Both data sets are handled as follows: 70% of the data is randomly set as the training set and the remaining 30% as the validation set. The YoloV3 model is trained on the training set until it converges; to further improve the detection effect, a multi-scale training method and a data enhancement method are used, both of which greatly improve the performance and generalization capability of the trained model.
Specifically, two classifiers, classifier 1 and classifier 2, are constructed by training YoloV3 and using the trained YoloV3 neural network, where classifier 1 is used for monitoring picture target recognition and classifier 2 for remote sensing image target recognition.
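For illustration, a minimal Python sketch of this data preparation and dual-classifier setup; the `YoloV3Detector` wrapper named in the comments is a hypothetical stand-in, not a real library API:

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=0):
    """Randomly assign 70% of the labelled samples to the training set
    and the remaining 30% to the validation set, per the embodiment."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

if __name__ == "__main__":
    # Toy placeholder file lists; in practice these are annotated images.
    monitor_ds = [f"monitor_{k}.jpg" for k in range(100)]
    remote_ds = [f"remote_{k}.tif" for k in range(100)]
    m_train, m_val = split_dataset(monitor_ds)
    r_train, r_val = split_dataset(remote_ds)
    print(len(m_train), len(m_val), len(r_train), len(r_val))  # 70 30 70 30
    # Hypothetical YoloV3 wrapper (assumption, not a real library API):
    # classifier1 = YoloV3Detector(multi_scale=True, augment=True)  # monitoring pictures
    # classifier1.fit(m_train); classifier1.evaluate(m_val)
    # classifier2 = YoloV3Detector(multi_scale=True, augment=True)  # remote sensing images
    # classifier2.fit(r_train); classifier2.evaluate(r_val)
```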
Preferably, the constructing a mapping relationship between the picture coordinates of the monitoring camera and the remote sensing longitude and latitude coordinates according to the state parameters of the preset monitoring camera includes:
acquiring state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the true north direction of geography, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
calibrating the vertical projection position of the monitoring camera on the horizontal plane;
calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on a horizontal plane in a visual range of the monitoring camera based on the haversine formula;
calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
calculating the included angle between the connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the straight-line horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
calculating the picture coordinates of the arbitrary position in the monitoring camera according to the included angle between the center line of the monitoring camera and the vertical line, the included angle between the projection of the center line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the straight-line horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera.
Referring to fig. 3 to 5, in this embodiment, after the classifier is constructed, parameter acquisition is performed and the mapping relationship is established from the acquired parameters. The letters in the drawings have the following meanings: N denotes the geographic true north direction; the height of the monitoring camera above the horizontal plane is H; the included angle between the center line of the monitoring camera and the vertical line is θ; the included angle between the projection of the center line on the horizontal plane and geographic true north is β; the horizontal field angle of the camera is ω_x; the vertical field angle is ω_y; and the image resolution parameter information of the camera is X × Y (X is the pixel width of the image, Y the pixel height);
suppose that: the coordinate of the center of the picture of the monitoring camera is (0,0), the vertical projection position of the position O of the monitoring camera on the horizontal plane is O', and the longitude and latitude are (lambda)0,ψ0) Aiming at any position A on the horizontal plane within the visual range of the monitoring cameraiLatitude and longitude coordinates (λ)i,ψi) Can be converted into the picture coordinate (x) of the monitoring camera in the following wayi,yi);
1) According to the Haversine (half-versed-sine) formula, calculate the straight-line horizontal distance d_i (in m) between O', the vertical projection of the camera position on the horizontal plane, and an arbitrary position A_i on the horizontal plane within the camera's visual range, together with the longitude horizontal distance s_i (in m) between O' and A_i:

$$a = \sin^2\left(\frac{\psi_i - \psi_0}{2}\right) + \cos\psi_0 \cos\psi_i \sin^2\left(\frac{\lambda_i - \lambda_0}{2}\right)$$

$$d_i = 2r \arcsin\left(\sqrt{a}\right)$$

$$b = \cos\psi_0 \cos\psi_i \sin^2\left(\frac{\lambda_i - \lambda_0}{2}\right)$$

$$s_i = 2r \arcsin\left(\sqrt{b}\right)$$

wherein: a and b are intermediate variables, O'(λ_0, ψ_0) is the vertical projection of the monitoring camera on the horizontal plane, A_i(λ_i, ψ_i) is an arbitrary position on the horizontal plane within the camera's visual range, and r is the radius of the earth in m;
2): calculation of O' and A from 1)iAngle beta between the connecting line of (A) and the true north direction of geographyi
Figure BDA0003288004740000135
3): calculation of O and A from 2)iAngle theta between the connecting line and the vertical linei
Figure BDA0003288004740000136
H is the height of the monitoring camera from the horizontal plane, and the unit is m;
4) Calculate the picture coordinates (x_i, y_i) of A_i in the monitoring camera:

$$x_i = \frac{X}{\omega_x}\left(\beta_i - \beta\right)$$

$$y_i = \frac{Y}{\omega_y}\left(\theta_i - \theta\right)$$

wherein X is the pixel width of the image and Y its pixel height, both obtained from the camera image resolution X × Y; θ is the included angle between the center line of the monitoring camera and the vertical line; β is the included angle between the projection of the center line on the horizontal plane and geographic true north; ω_x is the horizontal field angle and ω_y the vertical field angle of the monitoring camera.
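For illustration, a minimal Python sketch of the longitude/latitude-to-picture-coordinate mapping under the formulas above; the quadrant-free β_i and the linear angle-to-pixel projection are simplifying assumptions:

```python
import math

R_EARTH = 6_371_000.0  # earth radius r, in metres

def haversine(lam1, psi1, lam2, psi2):
    """Great-circle distance (m) between (longitude lam, latitude psi)
    points given in degrees, via the Haversine formula."""
    p1, p2 = math.radians(psi1), math.radians(psi2)
    dpsi = math.radians(psi2 - psi1)
    dlam = math.radians(lam2 - lam1)
    a = math.sin(dpsi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * R_EARTH * math.asin(math.sqrt(a))

def latlon_to_picture(lam0, psi0, lam_i, psi_i,
                      H, theta, beta, omega_x, omega_y, X, Y):
    """Map a ground position A_i(lam_i, psi_i) to the picture coordinates
    (x_i, y_i) of a camera whose vertical ground projection is O'(lam0, psi0).
    theta, beta, omega_x, omega_y are in radians; the picture centre is (0, 0)."""
    d_i = haversine(lam0, psi0, lam_i, psi_i)   # straight-line horizontal distance
    # Longitude-only horizontal distance s_i (intermediate variable b):
    b = (math.cos(math.radians(psi0)) * math.cos(math.radians(psi_i))
         * math.sin(math.radians(lam_i - lam0) / 2) ** 2)
    s_i = 2 * R_EARTH * math.asin(math.sqrt(b))
    beta_i = math.asin(min(1.0, s_i / d_i)) if d_i > 0 else 0.0  # angle from true north
    theta_i = math.atan2(d_i, H)                # sight-line angle from the vertical
    x_i = (beta_i - beta) / omega_x * X         # linear angle-to-pixel model (assumption)
    y_i = (theta_i - theta) / omega_y * Y
    return x_i, y_i
```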
Preferably, the recognizing a preset recognized object by using the first classifier and the second classifier respectively to obtain a recognition result, and performing data fusion and labeling on the recognition result according to the first confidence set, the second confidence set and the mapping relationship to obtain a monitoring picture with a pseudo tag added and a remote sensing image with a pseudo tag added, includes:
determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and performing data fusion according to the first classification result and the second classification result to obtain the identification result, and performing data marking according to the identification result, the first confidence set, the second confidence set and the mapping relation to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added.
Preferably, the data fusion is performed according to the first classification result and the second classification result to obtain the identification result, and data labeling is performed according to the identification result, the first confidence set, the second confidence set and the mapping relationship to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added, includes:
If the first classification result comprises the preset target recognition object and the second classification result does not comprise the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the first confidence degree set, judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in the remote sensing image according to the mapping relation and the target recognition frame, and marking the corresponding region as a first recognition result;
if the first classification result does not include the preset target recognition object and the second classification result includes the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the second confidence coefficient set, and judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in a monitoring picture according to the mapping relation and the target recognition frame, and marking the corresponding region as a second recognition result;
if the first classification result comprises the preset target recognition object and the second classification result comprises the preset target recognition object, extracting confidence degrees of the preset target recognition object in the first confidence degree set and the second confidence degree set;
if the types of the target objects recognized in the first classification result and the second classification result are the same, judging whether the confidence degree of the preset target recognition object in the first confidence degree set is greater than or equal to a preset confidence degree threshold or whether the confidence degree of the preset target recognition object in the second confidence degree set is greater than or equal to a preset confidence degree threshold, and if so, determining the first classification result and the second classification result as correct results;
if the types of the target objects identified in the first classification result and the second classification result are different, screening out the maximum value of the confidence degrees of the preset target identification objects in the first confidence degree set and the confidence degrees of the preset target identification objects in the second confidence degree set, and judging whether the maximum value is greater than or equal to a preset confidence degree threshold value, if so, correcting the type of the target object corresponding to the other confidence degree by using the type of the target object corresponding to the maximum value to obtain a correction result;
and obtaining the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added according to the correction result.
Specifically, in this embodiment, after the mapping relationship establishing process is performed, the same object is subjected to target recognition, and the recognition results in the classifier 1 and the classifier 2 are extracted for data fusion.
Furthermore, according to the mapping relation between the picture coordinate of the monitoring camera and the remote sensing longitude and latitude coordinate, the mutual correspondence of the same object in the monitoring picture and the remote sensing image can be realized. The object in the monitoring picture and the object in the remote sensing image are subjected to target recognition by utilizing the constructed classifier, wherein the classifier 1 recognizes the object in the monitoring picture, the classifier 2 recognizes the object in the remote sensing image, a confidence coefficient set A of all objects in the monitoring picture and a confidence coefficient set B of all objects in the remote sensing image are respectively obtained, a confidence coefficient threshold is set to be Q, and the following three conditions can occur in the recognition of the same target object: (1) identifying a target object in the monitoring picture, wherein the target object is not identified in the remote sensing image; (2) a target object is not identified in the monitoring picture, and the target object is identified in the remote sensing image; (3) and identifying a target object in the monitoring picture and the remote sensing image. And respectively carrying out data fusion on the results of the three conditions, wherein the specific processing process is as follows:
1) and identifying a target object in the monitoring picture, wherein the target object is not identified in the remote sensing image.
Extract the confidence F_1 of the target object from the confidence set A of all objects in the monitoring picture. If F_1 < Q, classifier 1 has identified in error; if F_1 ≥ Q, classifier 1 has identified correctly. When classifier 1 correctly identifies the target object, its identification frame is acquired; since the picture coordinates of the upper-left and lower-right points of the identification frame are known at identification time, the corresponding region of the frame in the remote sensing image is obtained according to the mapping relation established in step 2, and that region is marked as the target object identified by classifier 1.
2) The target object is not identified in the monitoring picture, and the target object is identified in the remote sensing image.
Extract the confidence F_2 of the target object from the confidence set B of all objects in the remote sensing image. If F_2 < Q, classifier 2 has identified in error; if F_2 ≥ Q, classifier 2 has identified correctly. When classifier 2 correctly identifies the target object, its identification frame is acquired; the longitude and latitude information of the upper-left and lower-right points of the frame is obtained from the remote sensing image, the corresponding region of the frame in the monitoring picture is obtained according to the mapping relation established in step 2, and that region is marked as the target object identified by classifier 2.
3) And identifying a target object in the monitoring picture and the remote sensing image.
Extract the confidences F_1 and F_2 of the target object from the confidence set A of all objects in the monitoring picture and the confidence set B of all objects in the remote sensing image. If the target object types identified by classifier 1 and classifier 2 are the same, check whether F_1 ≥ Q or F_2 ≥ Q; if at least one condition holds, classifier 1 and classifier 2 have identified correctly, otherwise both have identified in error. If the types identified by the two classifiers differ, take the maximum of the two confidences, F_max = max(F_1, F_2). When F_max < Q, classifier 1 and classifier 2 have identified in error; when F_max ≥ Q, the target object type corresponding to F_max is taken as the reference and used to correct the type corresponding to the other confidence. The correctly identified and corrected target objects are retained in the monitoring picture and the remote sensing image, the corrections being recorded by adding pseudo labels to the target objects.
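For illustration, a condensed Python sketch of this three-case fusion rule; the `map_to_remote` and `map_to_monitor` callables stand in for the step-2 coordinate mapping and are assumptions:

```python
def fuse_same_object(det1, det2, Q, map_to_remote, map_to_monitor):
    """Fuse the two classifiers' results for one physical object.
    det1 (classifier 1, monitoring picture) and det2 (classifier 2, remote
    sensing image) are dicts {'cls': str, 'conf': float, 'box': tuple} or None.
    Returns (pseudo_label_monitor, pseudo_label_remote), or None on rejection."""
    if det1 is not None and det2 is None:          # case 1: monitor picture only
        if det1['conf'] >= Q:
            return det1, {'cls': det1['cls'], 'box': map_to_remote(det1['box'])}
    elif det1 is None and det2 is not None:        # case 2: remote image only
        if det2['conf'] >= Q:
            return {'cls': det2['cls'], 'box': map_to_monitor(det2['box'])}, det2
    elif det1 is not None and det2 is not None:    # case 3: seen in both
        if det1['cls'] == det2['cls']:
            if det1['conf'] >= Q or det2['conf'] >= Q:
                return det1, det2                  # both identifications accepted
        else:
            f_max = max(det1['conf'], det2['conf'])
            if f_max >= Q:                         # higher-confidence type corrects the other
                cls = det1['cls'] if det1['conf'] >= det2['conf'] else det2['cls']
                return {**det1, 'cls': cls}, {**det2, 'cls': cls}
    return None                                    # identification rejected as an error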
Preferably, the obtaining a training set according to the monitoring picture after adding the pseudo tag and the remote sensing image after adding the pseudo tag, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier includes:
respectively calculating the average confidence of all objects in each frame of monitoring picture image and each remote sensing image;
when the number of the monitoring pictures and the number of the remote sensing images respectively processed by the first classifier and the second classifier and subjected to data fusion reach a preset training number, respectively sequencing the monitoring pictures and the remote sensing images from high to low according to the value of the average confidence coefficient to obtain a sequencing image set;
obtaining the training set according to the sequencing image set, the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label;
training the first classifier and the second classifier according to the monitoring picture image and the remote sensing image in the training set respectively to obtain a first classifier and a second classifier which are trained for the first time;
respectively inputting the multi-frame monitoring picture and the multiple remote sensing images into the first classifier and the second classifier of the primary training for target recognition to obtain the confidence coefficient of a target object in the monitoring picture image and the confidence coefficient of the target object in the remote sensing images;
counting the total number of all identified objects in a plurality of monitoring pictures and a plurality of remote sensing images;
counting the number of objects whose confidence in the monitoring picture is greater than or equal to the preset confidence threshold, whose confidence in the remote sensing image is greater than or equal to the preset confidence threshold, and whose identified target object types are the same;
calculating the ratio of this number to the total number of all identified objects;
and judging whether the ratio is smaller than a preset precision threshold value, if so, continuing training the first classifier and the second classifier, and if not, determining the first classifier and the second classifier as the optimized classifier.
Specifically, after data fusion is performed, the classifiers undergo curriculum learning so that they are continuously optimized, improving the accuracy with which the classifiers identify target objects.
Further, the average confidence of all objects in each frame of the monitoring picture is calculated:

$$\bar{Z}_a = \frac{1}{n}\sum_{i=1}^{n} Z_i$$

wherein a indexes the a-th frame of the monitoring picture, Z_i is the confidence of the i-th object, and n is the number of objects in the frame.
Similarly, the average confidence of all objects in each remote sensing image is calculated:

$$\bar{P}_b = \frac{1}{m}\sum_{j=1}^{m} P_j$$

wherein b indexes the b-th remote sensing image, P_j is the confidence of the j-th object, and m is the number of objects in the image.
A parameter num is set. When the number of monitoring picture images and remote sensing images that have been processed by the classifiers and data fusion reaches num, the images are sorted from high to low by average confidence, and the sorted, pseudo-labelled monitoring picture images and remote sensing images are input into classifier 1 and classifier 2 respectively, so that learning starts from simple samples and gradually transitions to difficult samples, continuously optimizing the classifiers and making the training process more stable.
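For illustration, a minimal Python sketch of this curriculum ordering, assuming each pseudo-labelled image carries the confidences of its objects:

```python
def average_confidence(object_confidences):
    """Mean confidence over all objects in one frame or image."""
    return sum(object_confidences) / len(object_confidences) if object_confidences else 0.0

def curriculum_order(pseudo_labelled, num):
    """Once `num` fused, pseudo-labelled images have accumulated, order them
    from the highest average confidence (easy samples) to the lowest (hard
    samples). `pseudo_labelled` is a list of (image, [confidences]) pairs."""
    if len(pseudo_labelled) < num:
        return None                      # keep accumulating
    return sorted(pseudo_labelled,
                  key=lambda pair: average_confidence(pair[1]),
                  reverse=True)
```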
Specifically, in the embodiment, after the classifier is optimized in a curriculum-based learning manner, the optimized classifier is further used for accurately identifying the target object.
Further, the continuous learning and optimization in step 4 yields classifier 1 and classifier 2 with higher precision. Input s frames of monitoring picture images and s remote sensing images, and perform target recognition on them with the optimized classifier 1 and classifier 2 respectively, obtaining the confidence F_1 of each target object in the monitoring picture images and the confidence F_2 of the target object in the remote sensing images. At the same time, count the total number N of all identified objects in the s monitoring frames and the s remote sensing images, set a parameter K, and count the number Ω of objects satisfying all three conditions: F_1 ≥ Q, F_2 ≥ Q, and the same target object type. Then calculate the ratio T of Ω to all recognized objects:

T = Ω / (2N);

When T < K, continue optimizing the classifiers; otherwise, the classifiers are considered able to accurately identify the target object. The target object type corresponding to the maximum confidence is taken as the result after data fusion, realizing accurate identification of the target object.
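For illustration, a minimal Python sketch of this stopping test, implementing T = Ω/(2N) as written; the cross-modal pairing of detections is assumed to come from the coordinate mapping:

```python
def precision_ratio(paired_detections, total_objects, Q):
    """paired_detections: (F1, type1, F2, type2) per physical object, from the
    optimized classifier 1 and classifier 2 over s monitoring frames and
    s remote sensing images. total_objects is N, the count of all identified
    objects. Omega counts objects with F1 >= Q, F2 >= Q and matching types;
    returns T = Omega / (2 * N) per the embodiment."""
    omega = sum(1 for f1, t1, f2, t2 in paired_detections
                if f1 >= Q and f2 >= Q and t1 == t2)
    return omega / (2 * total_objects) if total_objects else 0.0

# Optimization continues while T < K, e.g.:
#   while precision_ratio(dets, N, Q) < K:
#       ...continue curriculum training of classifier 1 and classifier 2...
```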
Fig. 6 is a module connection diagram of the multi-modal collaborative optimization system in an embodiment provided by the present invention. As shown in Fig. 6, the present invention further provides a multi-modal collaborative optimization system for video monitoring and remote sensing, including:
the data set construction module is used for constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
the classifier building module is used for training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
the mapping relation construction module is used for constructing a mapping relation between the picture coordinate of the monitoring camera and the remote sensing longitude and latitude coordinate according to the state parameter of the preset monitoring camera;
the confidence coefficient acquisition module is used for identifying objects in the monitoring picture by using the first classifier to obtain a first confidence coefficient set of all the objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence coefficient set of all the objects in the remote sensing image;
the fusion module is used for respectively utilizing the first classifier and the second classifier to identify preset identification objects to obtain identification results, and carrying out data fusion and marking on the identification results according to the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
the optimization module is used for acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and the identification module is used for inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for identification and fusion to obtain a target identification result.
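Purely as an orientation aid for Fig. 6 (the patent defines the modules functionally, not as code), the module connections might be wired up as in the following hypothetical skeleton:

```python
class MultiModalCoOptimizationSystem:
    """Hypothetical skeleton mirroring the module connections of Fig. 6."""

    def __init__(self, classifier1, classifier2, mapping, fusion, optimizer):
        self.classifier1 = classifier1   # first classifier (monitoring pictures)
        self.classifier2 = classifier2   # second classifier (remote sensing images)
        self.mapping = mapping           # picture-coordinate <-> lon/lat mapping
        self.fusion = fusion             # data fusion and pseudo-labeling module
        self.optimizer = optimizer       # curriculum-learning optimization module

    def recognize(self, monitoring_picture, remote_sensing_image):
        # Confidence acquisition module: detect objects in both modalities.
        result1 = self.classifier1(monitoring_picture)
        result2 = self.classifier2(remote_sensing_image)
        # Fusion module: reconcile the two results via the coordinate mapping.
        return self.fusion(result1, result2, self.mapping)
```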
Preferably, the classifier building module specifically includes:
the dividing unit is used for dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
the training unit is used for training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set based on a multi-scale training method and a data processing method, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
a building unit, configured to build the first classifier and the second classifier by using the trained YoloV3 model.
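The dividing unit's split by a preset proportion could look like the following sketch; the 0.8 ratio and the random shuffle are illustrative assumptions, not values fixed by the patent:

```python
import random

def split_dataset(samples: list, train_ratio: float = 0.8, seed: int = 42):
    """Divide a target-detection dataset into a training set and a
    validation set according to a preset proportion."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```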
Preferably, the mapping relationship building module specifically includes:
the information acquisition unit is used for acquiring the state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
the calibration unit is used for calibrating the vertical projection position of the monitoring camera on the horizontal plane;
the first calculation unit is used for calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on the horizontal plane in the visual range of the monitoring camera based on the haversine formula;
the second calculation unit is used for calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
the third calculating unit is used for calculating an included angle between a connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the linear horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
the fourth calculating unit is used for calculating the picture coordinates of the arbitrary position in the monitoring camera picture according to the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the linear horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera picture.
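As a hedged illustration of the geometric units above (the haversine formula itself is standard; the exact angle conventions are one plausible reading of the description, not the patent's normative definition):

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Straight-line horizontal (great-circle) distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def first_angle_deg(d_line: float, d_lon: float) -> float:
    """Angle between the projection-to-target line and geographic true
    north, from the straight-line and longitude-direction distances."""
    if d_line == 0:
        return 0.0
    return math.degrees(math.asin(min(1.0, d_lon / d_line)))

def second_angle_deg(d_line: float, camera_height: float) -> float:
    """Angle between the camera-to-target line and the vertical."""
    return math.degrees(math.atan2(d_line, camera_height))
```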
Preferably, the fusion module specifically includes:
the target object determining unit is used for determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
the identification unit is used for correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and the fusion marking unit is used for carrying out data fusion according to the first classification result and the second classification result to obtain the identification result, and carrying out data marking according to the identification result, the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label.
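One possible reading of the fusion marking unit's decision rules (detailed in claim 5) is sketched below; the `(object_type, confidence)` tuples and the handling of sub-threshold cases are assumptions of this sketch:

```python
from typing import Optional, Tuple

Detection = Optional[Tuple[str, float]]  # (object_type, confidence) or None

def fuse_detections(res1: Detection, res2: Detection, threshold: float) -> Detection:
    """Decision-level fusion of one target across the two modalities."""
    if res1 and not res2:
        # Only the monitoring picture sees the target: accept if confident,
        # then mark the corresponding region in the remote sensing image.
        return res1 if res1[1] >= threshold else None
    if res2 and not res1:
        # Only the remote sensing image sees the target.
        return res2 if res2[1] >= threshold else None
    if res1 and res2:
        if res1[0] == res2[0]:
            # Same type in both modalities: correct if either side is confident.
            return res1 if max(res1[1], res2[1]) >= threshold else None
        # Different types: the higher-confidence type corrects the other.
        best = max(res1, res2, key=lambda r: r[1])
        return best if best[1] >= threshold else None
    return None
```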
The invention has the following beneficial effects:
(1) The classifier of the invention continuously obtains new samples while executing the classification task, thereby continuously training and improving itself and raising its precision.
(2) The optimized classifier is used to identify the target object, and the recognition results of video monitoring and remote sensing are fused, which improves the recognition accuracy.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are mutually referred to. For the system disclosed by the embodiment, the description is simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there are variations in the specific implementation and application scope. In view of the above, the present description should not be construed as limiting the invention.

Claims (10)

1. A multimode collaborative optimization method for video monitoring and remote sensing is characterized by comprising the following steps:
constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
constructing a mapping relation between picture coordinates and remote sensing longitude and latitude coordinates of a monitoring camera according to state parameters of a preset monitoring camera;
identifying objects in the monitoring picture by using the first classifier to obtain a first confidence set of all objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence set of all objects in the remote sensing image;
respectively utilizing the first classifier and the second classifier to identify a preset identification object to obtain an identification result, and carrying out data fusion and marking on the identification result according to the first confidence set, the second confidence set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for recognition and fusion to obtain a target recognition result.
2. The multimodal collaborative optimization method for video surveillance and remote sensing according to claim 1, wherein the training of a machine learning model according to the surveillance image target detection dataset and the remote sensing image target detection dataset to obtain a trained first classifier and a trained second classifier comprises:
dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
based on a multi-scale training method and a data processing method, training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
and constructing the first classifier and the second classifier by using the trained YoloV3 model.
3. The multimodal collaborative optimization method for video monitoring and remote sensing according to claim 1, wherein the constructing a mapping relationship between picture coordinates of a monitoring camera and remote sensing longitude and latitude coordinates according to state parameters of a preset monitoring camera comprises:
acquiring state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
calibrating the vertical projection position of the monitoring camera on the horizontal plane;
calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on the horizontal plane in the visual range of the monitoring camera based on the haversine formula;
calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
calculating an included angle between a connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the linear horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
calculating the picture coordinates of the arbitrary position in the monitoring camera picture according to the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the linear horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera picture.
4. The multimodal collaborative optimization method for video monitoring and remote sensing according to claim 1, wherein the steps of respectively identifying a preset identification object by using the first classifier and the second classifier to obtain an identification result, and performing data fusion and labeling on the identification result according to the first confidence set, the second confidence set and the mapping relationship to obtain a monitoring picture with a pseudo label added and a remote sensing image with a pseudo label added comprise:
determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and performing data fusion according to the first classification result and the second classification result to obtain the identification result, and performing data marking according to the identification result, the first confidence set, the second confidence set and the mapping relation to obtain the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added.
5. The multi-modal collaborative optimization method for video monitoring and remote sensing according to claim 4, wherein the data fusion is performed according to the first classification result and the second classification result to obtain the recognition result, and the data labeling is performed according to the recognition result, the first confidence set, the second confidence set and the mapping relationship to obtain the pseudo-tag added monitoring picture and the pseudo-tag added remote sensing image, including:
If the first classification result comprises the preset target recognition object and the second classification result does not comprise the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the first confidence coefficient set, judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in the remote sensing image according to the mapping relation and the target recognition frame, and marking the corresponding region as a first recognition result;
if the first classification result does not include the preset target recognition object and the second classification result includes the preset target recognition object, extracting the confidence coefficient of the preset target recognition object in the second confidence coefficient set, and judging whether the confidence coefficient of the preset target recognition object is greater than or equal to a preset confidence coefficient threshold value, if so, acquiring a target recognition frame of the preset target recognition object, obtaining a corresponding region in a monitoring picture according to the mapping relation and the target recognition frame, and marking the corresponding region as a second recognition result;
if the first classification result comprises the preset target recognition object and the second classification result comprises the preset target recognition object, extracting confidence degrees of the preset target recognition object in the first confidence degree set and the second confidence degree set;
if the types of the target objects identified in the first classification result and the second classification result are the same, judging whether the confidence of the preset target identification object in the first confidence set is greater than or equal to a preset confidence threshold or whether the confidence of the preset target identification object in the second confidence set is greater than or equal to a preset confidence threshold, and if so, determining the first classification result and the second classification result as correct results;
if the types of the target objects identified in the first classification result and the second classification result are different, screening out the maximum value of the confidence degrees of the preset target identification objects in the first confidence degree set and the confidence degrees of the preset target identification objects in the second confidence degree set, and judging whether the maximum value is greater than or equal to a preset confidence degree threshold value, if so, correcting the type of the target object corresponding to the other confidence degree by using the type of the target object corresponding to the maximum value to obtain a correction result;
and obtaining the monitoring picture after the pseudo label is added and the remote sensing image after the pseudo label is added according to the correction result.
6. The multimodal collaborative optimization method for video monitoring and remote sensing according to claim 5, wherein the obtaining of a training set according to the monitoring picture after adding the pseudo label and the remote sensing image after adding the pseudo label, and training the first classifier and the second classifier respectively according to the training set to obtain the optimized classifier comprises:
respectively calculating the average confidence of all objects in each frame of monitoring picture image and each remote sensing image;
when the number of the monitoring pictures and the number of the remote sensing images respectively processed by the first classifier and the second classifier and subjected to data fusion reach a preset training number, respectively sequencing the monitoring pictures and the remote sensing images from high to low according to the value of the average confidence coefficient to obtain a sequencing image set;
obtaining the training set according to the sequencing image set, the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label;
training the first classifier and the second classifier according to the monitoring picture image and the remote sensing image in the training set respectively to obtain a first classifier and a second classifier which are trained for the first time;
respectively inputting the multi-frame monitoring picture and the multiple remote sensing images into the first classifier and the second classifier of the primary training for target recognition to obtain the confidence coefficient of a target object in the monitoring picture image and the confidence coefficient of the target object in the remote sensing images;
counting the total number of all identified objects in a plurality of monitoring pictures and a plurality of remote sensing images;
counting the number of objects for which the confidence in the monitoring picture is greater than or equal to a preset confidence threshold, the confidence in the remote sensing image is greater than or equal to the preset confidence threshold, and the target object types are the same;
calculating the ratio of the number of objects satisfying these conditions to the total number of all identified objects;
and judging whether the ratio is smaller than a preset precision threshold value, if so, continuing training the first classifier and the second classifier, and if not, determining the first classifier and the second classifier as the optimized classifier.
7. A multimodal collaborative optimization system for video surveillance and remote sensing, comprising:
the data set construction module is used for constructing a monitoring picture target detection data set and a remote sensing image target detection data set;
the classifier building module is used for training a machine learning model according to the monitoring picture target detection data set and the remote sensing image target detection data set to obtain a trained first classifier and a trained second classifier;
the mapping relation construction module is used for constructing a mapping relation between the picture coordinate of the monitoring camera and the remote sensing longitude and latitude coordinate according to the state parameter of the preset monitoring camera;
the confidence coefficient acquisition module is used for identifying the objects in the monitoring picture by using the first classifier to obtain a first confidence coefficient set of all the objects in the monitoring picture, and identifying the objects in the remote sensing image by using the second classifier to obtain a second confidence coefficient set of all the objects in the remote sensing image;
the fusion module is used for respectively utilizing the first classifier and the second classifier to identify a preset identification object to obtain an identification result, and performing data fusion and marking on the identification result according to the first confidence set, the second confidence set and the mapping relation to obtain a monitoring picture added with a pseudo label and a remote sensing image added with the pseudo label;
the optimization module is used for acquiring a training set according to the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label, and respectively training the first classifier and the second classifier according to the training set to obtain an optimized classifier;
and the identification module is used for inputting the monitoring picture to be detected and the remote sensing image to be detected into the optimized classifier for identification and fusion to obtain a target identification result.
8. The multimodal collaborative optimization system for video surveillance and remote sensing according to claim 7, wherein the classifier building module specifically comprises:
the dividing unit is used for dividing the monitoring picture target detection data set and the remote sensing image target detection data set according to a preset proportion to obtain a monitoring picture training set, a monitoring picture verification set, a remote sensing image training set and a remote sensing image verification set;
the training unit is used for training a YoloV3 model according to the monitoring picture training set and the remote sensing image training set based on a multi-scale training method and a data processing method, and evaluating the trained YoloV3 model according to the monitoring picture verification set and the remote sensing image verification set to obtain a trained YoloV3 model;
a building unit, configured to build the first classifier and the second classifier by using the trained YoloV3 model.
9. The multimodal collaborative optimization system for video surveillance and remote sensing according to claim 7, wherein the mapping relationship construction module specifically includes:
the information acquisition unit is used for acquiring the state parameters of the preset monitoring camera; the state parameters include: the height of the monitoring camera from the horizontal plane, the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera;
the calibration unit is used for calibrating the vertical projection position of the monitoring camera on the horizontal plane;
the first calculation unit is used for calculating a straight-line horizontal distance and a longitude horizontal distance between the vertical projection position and any position on the horizontal plane in the visual range of the monitoring camera based on the haversine formula;
the second calculation unit is used for calculating an included angle between a connecting line of the vertical projection position and the arbitrary position and the geographical true north direction according to the straight line horizontal distance and the longitude horizontal distance, and recording the included angle as a first included angle;
the third calculating unit is used for calculating an included angle between a connecting line of the position of the monitoring camera and the arbitrary position and a vertical line according to the linear horizontal distance and the height of the monitoring camera from the horizontal plane, and recording the included angle as a second included angle;
the fourth calculation unit is used for calculating the picture coordinates of the arbitrary position in the monitoring camera picture according to the included angle between the central line of the monitoring camera and the vertical line, the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the geographic true north direction, the horizontal field angle of the monitoring camera, the vertical field angle of the monitoring camera and the image resolution parameter information of the monitoring camera; the mapping relation comprises the linear horizontal distance, the longitude horizontal distance, the first included angle, the second included angle and the picture coordinates of the arbitrary position in the monitoring camera picture.
10. The multimodal collaborative optimization system for video surveillance and remote sensing according to claim 7, wherein the fusion module specifically comprises:
the target object determining unit is used for determining the preset identification object according to the monitoring picture target detection data set and the remote sensing image target detection data set; the preset recognition object is recognized by a classifier to obtain a preset target recognition object;
the identification unit is used for correspondingly inputting the monitoring picture target detection data set and the remote sensing image target detection data set to the first classifier and the second classifier respectively for identification to obtain a first classification result and a second classification result;
and the fusion marking unit is used for carrying out data fusion according to the first classification result and the second classification result to obtain the identification result, and carrying out data marking according to the identification result, the first confidence coefficient set, the second confidence coefficient set and the mapping relation to obtain the monitoring picture added with the pseudo label and the remote sensing image added with the pseudo label.
CN202111154171.6A 2021-09-29 2021-09-29 Multi-mode collaborative optimization method and system for video monitoring and remote sensing Active CN113947714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111154171.6A CN113947714B (en) 2021-09-29 2021-09-29 Multi-mode collaborative optimization method and system for video monitoring and remote sensing

Publications (2)

Publication Number Publication Date
CN113947714A true CN113947714A (en) 2022-01-18
CN113947714B (en) 2022-09-13

Family

ID=79328908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111154171.6A Active CN113947714B (en) 2021-09-29 2021-09-29 Multi-mode collaborative optimization method and system for video monitoring and remote sensing

Country Status (1)

Country Link
CN (1) CN113947714B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630825A (en) * 2023-06-09 2023-08-22 北京佳格天地科技有限公司 Satellite remote sensing data and monitoring video fusion method and system
WO2023185074A1 (en) * 2022-04-02 2023-10-05 深圳先进技术研究院 Group behavior recognition method based on complementary spatio-temporal information modeling

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104660993A (en) * 2015-02-09 2015-05-27 武汉理工大学 Intelligent maritime affair monitoring method and system based on AIS (automatic identification system) and CCTV (closed circuit television)
CN108197562A (en) * 2017-12-29 2018-06-22 江苏省新通智能交通科技发展有限公司 A kind of AIS information visualization methods and realization system based on video technique
CN108810473A (en) * 2018-06-15 2018-11-13 高新兴科技集团股份有限公司 A kind of method and system for realizing GPS mapping camera views coordinates on a mobile platform
CN109460740A (en) * 2018-11-15 2019-03-12 上海埃威航空电子有限公司 The watercraft identification recognition methods merged based on AIS with video data
CN109948523A (en) * 2019-03-18 2019-06-28 中国汽车工程研究院股份有限公司 A kind of object recognition methods and its application based on video Yu millimetre-wave radar data fusion
CN112418028A (en) * 2020-11-11 2021-02-26 上海交通大学 Satellite image ship identification and segmentation method based on deep learning
CN112598733A (en) * 2020-12-10 2021-04-02 广州市赋安电子科技有限公司 Ship detection method based on multi-mode data fusion compensation adaptive optimization
CN112687127A (en) * 2020-12-18 2021-04-20 华南理工大学 Ship positioning and snapshot method based on AIS and image analysis assistance

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
白玉 等: "基于可见光图像和红外图像决策级融合的目标检测算法", 《空军工程大学学报(自然科学版)》 *

Also Published As

Publication number Publication date
CN113947714B (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN108898047B (en) Pedestrian detection method and system based on blocking and shielding perception
CN113947714B (en) Multi-mode collaborative optimization method and system for video monitoring and remote sensing
CN111912416B (en) Method, device and equipment for positioning equipment
CN107014294A (en) A kind of contact net geometric parameter detection method and system based on infrared image
CN108388871B (en) Vehicle detection method based on vehicle body regression
CN109191255B (en) Commodity alignment method based on unsupervised feature point detection
CN114155527A (en) Scene text recognition method and device
CN107909053B (en) Face detection method based on hierarchical learning cascade convolution neural network
CN110751076A (en) Vehicle detection method
CN103065163A (en) Rapid target detection and recognition system and method based on static picture
CN110309828B (en) Inclined license plate correction method
CN111582270A (en) Identification tracking method based on high-precision bridge region visual target feature points
CN105868776A (en) Transformer equipment recognition method and device based on image processing technology
CN112784494B (en) Training method of false positive recognition model, target recognition method and device
CN114332435A (en) Image labeling method and device based on three-dimensional reconstruction
CN109784257B (en) Transformer thermometer detection and identification method
CN111652048A (en) A deep learning based 1: n face comparison method
CN106897683A (en) The ground object detecting method and system of a kind of remote sensing images
CN112232272B (en) Pedestrian recognition method by fusing laser and visual image sensor
CN113591705B (en) Inspection robot instrument identification system and method and storage medium
CN110378337A (en) Metal cutting tool drawing identification information vision input method and system
CN115457130A (en) Electric vehicle charging port detection and positioning method based on depth key point regression
CN116188755A (en) Instrument angle correction and reading recognition device based on deep learning
CN114927236A (en) Detection method and system for multiple target images
CN113673534B (en) RGB-D image fruit detection method based on FASTER RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510600 room 1301, No. 68, yueken Road, Wushan street, Tianhe District, Guangzhou City, Guangdong Province (Location: room 1301-1) (office only)

Applicant after: Guangzhou Fu'an Digital Technology Co.,Ltd.

Address before: 510600 1501, No. 68, yueken Road, Tianhe District, Guangzhou, Guangdong

Applicant before: GUANGZHOU FUAN ELECTRONIC TECHNOLOGY Co.,Ltd.

GR01 Patent grant