CN112084860A - Target object detection method and device and thermal power plant detection method and device

Info

Publication number
CN112084860A
CN112084860A (application CN202010783297.9A)
Authority
CN
China
Prior art keywords
target
feature
candidate region
candidate
target object
Prior art date
Legal status
Pending
Application number
CN202010783297.9A
Other languages
Chinese (zh)
Inventor
昝露洋
Current Assignee
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202010783297.9A
Publication of CN112084860A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/21355Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis nonlinear criteria, e.g. embedding a manifold in a Euclidean space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/194Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The embodiment of the application provides a target object detection method and device and a thermal power plant detection method and device, relating to the technical field of image processing. The method extracts a candidate region from a target image using an object detection model; extracts, in the candidate region, an object feature of a target object and relationship features between the target object and at least one associated object inside the target object; fuses the object feature with the at least one relationship feature to obtain a fused feature of the candidate region; and identifies the target object and the at least one associated object in the candidate region using the object detection model based on the fused feature. The technical scheme provided by the embodiment of the application improves the accuracy of target object detection.

Description

Target object detection method and device and thermal power plant detection method and device
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a target object detection method and device and a thermal power plant detection method and device.
Background
Target object detection is an important research problem in computer vision and a common technique in artificial intelligence. Generally, an object detection model based on a deep convolutional neural network learns the features of a target object, identifies the target object in an image, and determines its position information. Target object detection is needed in many fields, including face detection, vehicle detection, and the detection of important ground objects in remote sensing images.
Current object detection models learn the features of a single object and, at detection time, judge according to the feature similarity between the object to be detected and the learned object. For ground-object detection in remote sensing images, given the complexity of such images, many different ground objects may share the same or similar features, so a large number of false detections may occur in practice, which degrades detection accuracy.
Disclosure of Invention
The embodiment of the application provides a method and a device for target object detection and for thermal power plant detection, which are used to solve the problem that target object detection in the prior art produces a large number of false detections.
In a first aspect, an embodiment of the present application provides a target object detection method, including:
extracting a candidate region in the target image by using an object detection model;
extracting object characteristics of a target object in the candidate region and relation characteristics of the target object and at least one associated object in the target object respectively;
fusing the object feature and at least one relation feature to obtain a fused feature of the candidate region;
identifying the target object and the at least one associated object in the candidate region using the object detection model based on the fused features.
In a second aspect, an embodiment of the present application provides a data processing method, including:
determining a sample image and a training label in the sample image, wherein the training label comprises position information of a target object in the sample image and position information of at least one associated object in the target object;
training an object detection model by using the sample image and the training label; wherein the object detection model is used to identify the target object and the at least one associated object in a sample image.
In a third aspect, an embodiment of the present application provides a detection method for a thermal power plant, including:
extracting a candidate region in the remote sensing image by using the object detection model;
extracting object characteristics of a thermal power plant in the candidate area and relation characteristics of the thermal power plant and at least one production device in the thermal power plant;
fusing the object features and the relation features to obtain fused features of the candidate regions;
identifying the thermal power plant and the at least one production facility in the candidate area using the object detection model based on the fusion features.
In a fourth aspect, an embodiment of the present application provides a detection data processing method for a thermal power plant, including:
determining a sample image and a training label in the sample image, wherein the training label comprises position information of a thermal power plant in the sample image and position information of at least one production device in the thermal power plant;
training an object detection model by using the sample image and the training label; wherein the object detection model is used to identify the thermal power plant and the at least one production facility in a sample image.
In a fifth aspect, an embodiment of the present application provides a target object detection apparatus, including:
the first extraction module is used for extracting a candidate region in the target image by using the object detection model;
the second extraction module is used for extracting object features of target objects in the candidate areas and relation features of the target objects and at least one associated object in the target objects respectively;
the fusion module is used for fusing the object characteristic and at least one relation characteristic to obtain a fusion characteristic of the candidate region;
a detection module for identifying the target object and the at least one associated object in the candidate region using the object detection model based on the fusion feature.
In a sixth aspect, an embodiment of the present application provides a data processing apparatus, including:
a determining module, configured to determine a sample image and a training label in the sample image, where the training label includes position information of a target object in the sample image and position information of at least one associated object in the target object;
the training module is used for training an object detection model by utilizing the sample image and the training label; wherein the object detection model is used to identify the target object and the at least one associated object in a sample image.
These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In the embodiment of the application, a candidate region in a target image is extracted using an object detection model; an object feature of a target object and relationship features between the target object and at least one associated object inside the target object are extracted in the candidate region; the object feature and the at least one relationship feature are fused to obtain a fused feature of the candidate region; and the target object and the at least one associated object are identified in the candidate region using the object detection model based on the fused feature. By obtaining the relationship features between the target object and at least one associated object inside it and fusing them with the target object's own feature, an association constraint between the target object and the at least one associated object is established. The object detection model therefore learns ground-object correlation features in addition to the target object's own features, which reduces false detections of irrelevant objects and improves the detection accuracy for the target object and the at least one associated object.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in their description are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating an embodiment of a target object detection method proposed in the present application;
FIG. 2 is a flow chart illustrating one embodiment of a data processing method presented herein;
FIG. 3 is a flow chart illustrating an embodiment of a thermal power plant detection method proposed by the present application;
FIG. 4 is a flow chart illustrating an embodiment of a thermal power plant test data processing method proposed in the present application;
FIG. 5 is a schematic structural diagram illustrating an embodiment of a target object detection apparatus according to the present application;
fig. 6 is a schematic structural diagram of an embodiment of a data processing apparatus according to the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Some of the flows described in this specification and in the figures above include operations that appear in a particular order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as 101 and 102 merely distinguish the operations and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and these operations may be executed sequentially or in parallel. Note that the designations "first", "second", and so on herein distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they require that the "first" and "second" items be of different types.
The technical scheme of the embodiments of the application can be applied to any scenario involving target detection. As described in the background, an object detection model based on a deep convolutional neural network learns the features of a target object, identifies the target object in an image according to the feature similarity between the object to be detected and the learned object, and determines the position information of the target object.
The object detection model needs to be trained by learning the characteristics of a target object, so that an image area where the object to be detected may exist can be determined according to an input image to be detected during detection, the object to be detected is identified according to the characteristic similarity of the object to be detected and the learned target object in the image area, and the specific position information of the object to be detected is determined to complete object detection.
Detection accuracy with current object detection models can be low, particularly for ground-object detection in remote sensing images. The inventor found in research that, given the complexity of remote sensing images, different objects may have the same or similar features; because the object detection model learns only the features of a single target object, feature-similarity judgments produce a large number of false detections on unrelated objects whose features resemble the target object, which degrades detection accuracy.
While working out the technical scheme of the application, the inventor found that, with current object detection models, some non-target objects with similar features interfere with the detection of the target object. Taking the detection of thermal power plants in remote sensing images as an example, surrounding ground objects such as steel plants and smelting plants appear similar to thermal power plants in remote sensing images, which may interfere with detection and cause false detections. Meanwhile, ground objects such as cooling towers inside a thermal power plant, being production equipment of the plant, have a definite influence on its detection. The inventor therefore considered that the feature correlation between internal production equipment and the thermal power plant (for example, between a cooling tower and the plant) could be used to establish constraints between the two sets of object features, so that the object detection model learns not only the object features of the thermal power plant but also the relationship features between the internal production equipment and the plant, making the detections of the plant and of its internal equipment mutually supporting. This improves the detection accuracy for thermal power plants and reduces the false detection rate. On this basis, the inventor proposes the technical scheme of the present application.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a flowchart of an embodiment of a target object detection method proposed in the present application, which may include the following steps:
101: candidate regions in the target image are extracted using an object detection model.
The target image may be an image containing a target object and may include a remote sensing image. The candidate region is a region in which the target object may exist, and may be extracted from the target image by using an object detection model.
102: and extracting object characteristics of the target object in the candidate region and relationship characteristics of the target object and at least one associated object in the target object respectively.
The relationship feature between the target object and an associated object inside it is a feature characterizing how the features of that associated object influence the features of the target object. An associated object in the target object is an object located inside the target object's structure that forms one of its components.
In real life, the structure of a part of a target object is complex, and there may be associated objects belonging to the components of the target object inside the structure of the target object. When the object detection model is used for detection, the characteristics of the associated objects in the target object structure influence the characteristic detection of the target object, so that the object characteristics of the target object in the candidate region can be extracted, and meanwhile, the relationship characteristics of the target object and the associated objects in the target object can be extracted.
Within the target object there may be a plurality of objects belonging to its components, i.e., the target object may contain a plurality of associated objects. During feature extraction, the relationship feature between the target object and each associated object may be extracted separately, yielding a plurality of relationship features between the target object and the at least one associated object.
103: and fusing the object characteristic and at least one relation characteristic to obtain a fused characteristic of the candidate region.
The fusion feature of the candidate region includes an object feature of the target object and a relationship feature between the target object and an associated object in the target object.
104: identifying the target object and the at least one associated object in the candidate region using the object detection model based on the fused features.
Based on the object characteristics of the target object and the relation characteristics of the target object and the associated objects in the target object, which are contained in the fusion characteristics, association constraints of the target object and the associated objects in the target object are established, so that the object detection model can more accurately identify the target object in the candidate region and also identify at least one associated object in the target object.
In this embodiment, the relationship features between the target object and at least one associated object inside it are extracted and fused with the target object's own feature, establishing an association constraint between them. The object detection model can then detect the target object from its object feature together with the relationship features, and detect the at least one associated object at the same time, reducing false detections of non-target objects and improving the detection accuracy for the target object, the at least one associated object, and the model as a whole.
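Steps 101-104 can be summarized in a short sketch. This is only an illustrative outline in Python, assuming PyTorch-style tensors; the helper names (`extract_candidate_regions`, `extract_object_and_relation_features`, `classify_and_localize`) are hypothetical placeholders, not an API defined by this application.

```python
import torch

def detect_target_objects(model, target_image: torch.Tensor):
    # Step 101: extract candidate regions from the target image.
    candidate_regions = model.extract_candidate_regions(target_image)  # hypothetical
    detections = []
    for region in candidate_regions:
        # Step 102: object feature of the target object, plus one relationship
        # feature per associated object inside the target object.
        f_obj, f_rels = model.extract_object_and_relation_features(region)  # hypothetical
        # Step 103: fuse the object feature with the relationship features.
        fused = f_obj + torch.stack(f_rels).sum(dim=0)
        # Step 104: identify the target object and its associated objects.
        detections.append(model.classify_and_localize(region, fused))  # hypothetical
    return detections
```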
In some embodiments, the method for extracting a candidate region in a target image by using an object detection model may include:
extracting a characteristic diagram of a target image by using an object detection model;
and extracting a candidate region from the feature map.
A feature map is a feature image containing various features of an image, such as texture, shape, and semantic features. The object detection model first extracts a feature map from the target image so that the target object can be detected from it. A convolution operation on the generated feature map yields a plurality of candidate regions, completing candidate region extraction so that the object detection model can detect the target object from the candidate regions.
In practical applications, the target image has a certain complexity: besides the target object, it contains many non-target objects irrelevant to detection, and the texture or shape features of some non-target objects resemble those of the target object, which easily interferes with detection. Therefore, to reduce false detections of non-target objects, in some embodiments a method for extracting a candidate region in a target image using an object detection model may include:
extracting a plurality of regions to be selected in a target image by using an object detection model;
determining the candidate region from the plurality of regions to be selected.
When the object detection model extracts candidate regions from the target image, a plurality of regions to be selected can first be extracted, and the candidate regions meeting the requirements are then screened out from them, as sketched below.
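One common way to screen the regions to be selected down to candidate regions is score thresholding followed by non-maximum suppression. A minimal sketch using torchvision's `nms` (the application does not prescribe this particular screening rule; the thresholds are illustrative):

```python
import torch
from torchvision.ops import nms

def screen_candidate_regions(boxes: torch.Tensor,   # (N, 4) regions to be selected, (x1, y1, x2, y2)
                             scores: torch.Tensor,  # (N,) objectness scores
                             score_thresh: float = 0.5,
                             iou_thresh: float = 0.7,
                             top_k: int = 300) -> torch.Tensor:
    keep = scores >= score_thresh               # drop low-confidence regions
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)       # suppress heavily overlapping regions
    return boxes[keep[:top_k]]                  # candidate regions meeting the requirements
```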
Based on the extracted candidate region, in some embodiments, the method of extracting the object feature of the target object and the relationship feature of the target object and at least one associated object in the target object in the candidate region may include:
extracting object features of target objects in the candidate region and object features of at least one associated object in the target objects; the object features comprise spectral features;
and carrying out weighted summation on the spectral characteristics of the at least one associated object, and carrying out linear mapping through a value vector matrix to obtain the relation characteristics of the target object and at least one associated object in the target object.
For a single object, the object features may include spectral features, denoted $f_A$. For a group of objects, the spectral feature of the target object in the candidate region may be denoted $f_A^n$, and the spectral feature of at least one associated object in the target object may be denoted $f_A^m$.
The spectral characteristics of the target object in the candidate area and the spectral characteristics of at least one associated object in the target object are extracted by using an object detection model, the spectral characteristics of the at least one associated object are subjected to weighted summation, and the summation result is subjected to linear mapping through a value vector matrix, so that the relationship characteristics of the target object and the at least one associated object in the target object can be obtained.
The relationship feature may be denoted $f_R(n)$ and calculated by the following formula:

$$f_R(n) = \sum_m \omega_{mn} \cdot \left(W_V \cdot f_A^m\right)$$

where $\omega_{mn}$ is the relationship weight, indicating the degree of influence of at least one associated object in the target object on the target object, $f_A^m$ is the spectral feature of the at least one associated object, and $W_V$ is the value vector matrix.
In reality, owing to the complexity of the target object, a number of different associated objects may exist inside its structure, each influencing the target object to a different degree. Therefore, to accurately capture how the target object is influenced by its associated objects and to compute the relationship features correctly, the relationship weights between the target object and the different associated objects can be calculated; the spectral features of the at least one associated object other than the target object are weighted and summed according to these relationship weights, and the weighted sum is then linearly mapped to obtain the relationship feature between the target object and the at least one associated object.
$W_V$ is the value vector matrix that linearly maps the weighted-sum result; it is a parameter that can be learned by the object detection model.
The relationship weight may be calculated according to the object feature, and in some embodiments, the object feature may further include a geometric feature;
the above method for performing weighted summation on the spectral features of the at least one associated object and performing linear mapping through the value vector matrix to obtain the relationship features between the target object and the at least one associated object in the target object may include:
performing dot product operation on the spectral characteristics of the target object and the spectral characteristics of the at least one associated object to obtain spectral weights;
mapping the geometric characteristics of the target object and the geometric characteristics of the at least one associated object to a high-dimensional space, and obtaining geometric weight through geometric vector matrix linear mapping;
and normalizing the spectral weight and the geometric weight to obtain a relation weight, performing weighted summation on the spectral characteristics of the at least one associated object according to the relation weight, and performing linear mapping through a value vector matrix to obtain the relation characteristics of the target object and the at least one associated object in the target object.
In addition to spectral features, the object features include geometric features $f_G$, which may be represented by the coordinates of the object's circumscribed rectangle.
The relationship weight $\omega_{mn}$ is composed of a spectral weight $\omega_A^{mn}$ and a geometric weight $\omega_G^{mn}$, and may be calculated by the following formula:

$$\omega_{mn} = \frac{\omega_G^{mn} \cdot \exp\!\left(\omega_A^{mn}\right)}{\sum_k \omega_G^{kn} \cdot \exp\!\left(\omega_A^{kn}\right)}$$
where $\omega_A^{mn}$ denotes the spectral weight and $\omega_G^{mn}$ the geometric weight. The spectral weight is obtained by a dot product of the spectral feature of the target object and the spectral feature of the at least one associated object, and may be calculated by the following formula:

$$\omega_A^{mn} = \frac{\left\langle W_K\, f_A^m,\ W_Q\, f_A^n \right\rangle}{\sqrt{d_k}}$$

where $f_A^n$ denotes the spectral feature of the target object, $f_A^m$ the spectral feature of the at least one associated object, and $d_k$ the feature dimension. $W_K$ and $W_Q$ are the linearly mapping key vector matrix and query vector matrix, respectively, both parameters that can be learned by the object detection model.
The geometric weight $\omega_G^{mn}$ is obtained by mapping the geometric features of the target object and of the at least one associated object to a high-dimensional space and linearly mapping the result with a geometric vector matrix, and may be calculated by the following formula:

$$\omega_G^{mn} = \max\left\{0,\ W_G \cdot \varepsilon_G\!\left(f_G^m, f_G^n\right)\right\}$$

where $f_G^n$ denotes the geometric feature of the target object, $f_G^m$ the geometric feature of the at least one associated object, and $\varepsilon_G$ the high-dimensional mapping. $W_G$ is a geometric vector parameter matrix that can be learned by training the object detection model.
From the spectral weight and the geometric weight thus obtained, the relationship weight is computed by normalization; the spectral features of the at least one associated object are weighted and summed with this relationship weight, and the result is linearly mapped by the value vector matrix, yielding the relationship feature between the target object and the at least one associated object.
By calculating the relationship weights between the target object in the candidate region and at least one associated object inside it, and from them the relationship features, an accurate similarity relationship between the target object and the at least one associated object is established. On top of detecting the target object from its own object features, the object detection model can use the relationship features to reduce false detections of non-target objects, further improving its accuracy.
Based on the extracted object features and the relationship features, in some embodiments, the method for obtaining the fused features of the candidate region by fusing the object features and at least one relationship feature may include:
and adding the object characteristic and at least one relation characteristic to obtain the fusion characteristic of the candidate region.
The object feature and the relationship features can be fused using the following formula:

$$f_A^n \leftarrow f_A^n + \sum_{r=1}^{N_r} f_R^r(n)$$

where $f_R^r(n)$ denotes the $r$-th relationship feature of object $n$, and $N_r$ is the number of relationship features, a hyperparameter of the object detection model that can be determined through training.
The relationship features between the target object and the at least one associated object are thus fused into the original object feature of the target, so that the object detection network can detect the target object and the at least one associated object in the candidate region based on the fused feature.
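The weight and fusion formulas above closely mirror the object-relation attention of Hu et al., "Relation Networks for Object Detection" (CVPR 2018). Below is a minimal PyTorch sketch of one relationship head plus the residual fusion over $N_r$ heads; the sinusoidal form of the geometric embedding $\varepsilon_G$ and all dimension choices are assumptions, since the application does not spell them out.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationHead(nn.Module):
    """Computes f_R(n) = sum_m w_mn * (W_V f_A^m) for every object n."""
    def __init__(self, feat_dim: int = 1024, geo_dim: int = 64, key_dim: int = 64):
        super().__init__()
        self.key_dim, self.geo_dim = key_dim, geo_dim
        self.W_K = nn.Linear(feat_dim, key_dim, bias=False)   # key vector matrix
        self.W_Q = nn.Linear(feat_dim, key_dim, bias=False)   # query vector matrix
        self.W_V = nn.Linear(feat_dim, feat_dim, bias=False)  # value vector matrix
        self.W_G = nn.Linear(geo_dim, 1, bias=False)          # geometric vector matrix

    def eps_G(self, boxes: torch.Tensor) -> torch.Tensor:
        """High-dimensional embedding of pairwise box geometry (assumed sinusoidal)."""
        x = (boxes[:, 0] + boxes[:, 2]) / 2
        y = (boxes[:, 1] + boxes[:, 3]) / 2
        w = (boxes[:, 2] - boxes[:, 0]).clamp(min=1e-3)
        h = (boxes[:, 3] - boxes[:, 1]).clamp(min=1e-3)
        dx = torch.log((x[None, :] - x[:, None]).abs().clamp(min=1e-3) / w[:, None])
        dy = torch.log((y[None, :] - y[:, None]).abs().clamp(min=1e-3) / h[:, None])
        dw = torch.log(w[None, :] / w[:, None])
        dh = torch.log(h[None, :] / h[:, None])
        rel = torch.stack([dx, dy, dw, dh], dim=-1)                   # (N, N, 4)
        freq = torch.arange(self.geo_dim // 8, device=boxes.device, dtype=boxes.dtype)
        freq = 1000.0 ** (8.0 * freq / self.geo_dim)
        emb = rel.unsqueeze(-1) / freq                                # (N, N, 4, geo_dim/8)
        return torch.cat([emb.sin(), emb.cos()], dim=-1).flatten(-2)  # (N, N, geo_dim)

    def forward(self, f_A: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # Spectral weight w_A[m, n] = <W_K f_A^m, W_Q f_A^n> / sqrt(d_k)
        w_A = self.W_K(f_A) @ self.W_Q(f_A).T / math.sqrt(self.key_dim)
        # Geometric weight w_G[m, n] = max(0, W_G . eps_G(f_G^m, f_G^n))
        w_G = F.relu(self.W_G(self.eps_G(boxes))).squeeze(-1)
        # Normalized relationship weight w_mn = w_G exp(w_A) / sum_k w_G exp(w_A)
        w = w_G * torch.exp(w_A)
        w = w / w.sum(dim=0, keepdim=True).clamp(min=1e-6)
        # f_R(n) = sum_m w_mn * (W_V f_A^m)
        return w.T @ self.W_V(f_A)

class RelationFusion(nn.Module):
    """Residual fusion f_A^n <- f_A^n + sum_{r=1}^{Nr} f_R^r(n)."""
    def __init__(self, feat_dim: int = 1024, num_relations: int = 16):
        super().__init__()
        self.heads = nn.ModuleList(RelationHead(feat_dim) for _ in range(num_relations))

    def forward(self, f_A: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        return f_A + sum(head(f_A, boxes) for head in self.heads)

# Usage: 8 objects in a candidate region, 1024-d spectral features.
f_A = torch.randn(8, 1024)
xy = torch.rand(8, 2) * 100
boxes = torch.cat([xy, xy + torch.rand(8, 2) * 50 + 1], dim=1)  # (x1, y1, x2, y2)
fused = RelationFusion()(f_A, boxes)                            # (8, 1024)
```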
In some embodiments, the object detection model may include a feature map extraction network, a candidate area extraction network, a feature extraction network, and a detection network;
the method for extracting the candidate region in the target image by using the object detection model may include:
extracting a feature map of the target image by using the feature map extraction network;
and extracting candidate regions from the feature map by using the candidate region extraction network.
In practical applications, the Faster R-CNN model may be chosen as the object detection model. The target image has a certain complexity: it contains shallow information such as geometric and texture information as well as deep information such as semantic information. The feature map extraction network extracts image information level by level, producing feature maps at different levels that range from shallow texture features to deep semantic features, so that the subsequent network can complete the detection task on this basis. For example, on top of the Faster R-CNN model, a ResNet50 network can be selected to extract feature maps at different levels.
As the feature map extraction network goes deeper, the deep information represented by the feature maps, such as semantics, is gradually strengthened, while shallow information such as geometry and texture is gradually weakened. A target object that is large and has a complex internal structure needs deep semantic features for classification and recognition, but a deep semantic feature map carries only weak shallow features such as geometry, so the exact position of the target object cannot be localized accurately. The associated objects contained in the target object, being small and simple in structure, need shallow features such as geometry and texture for accurate localization; shallow feature maps, however, classify poorly and contain little semantic information, so missed detections easily occur.
To address these problems, a feature pyramid network (FPN) structure may be introduced on top of the ResNet50 network. The FPN fuses shallow and deep feature maps, giving the shallow feature maps more semantic information while giving the deep feature maps geometric information such as position. Thus, in the Faster R-CNN model, a multi-level feature map of the target image can be extracted by a feature map extraction network composed of a ResNet50 network and an FPN structure.
From the extracted feature map, a candidate region extraction network, the RPN (Region Proposal Network), performs convolution operations on the feature map to obtain the candidate regions of the target image; that is, in the Faster R-CNN model, the RPN is selected as the candidate region extraction network.
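For illustration, torchvision (assumed here; version ≥ 0.13) packages exactly this stack, a ResNet50+FPN feature map extraction network with an RPN, behind a single constructor:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Three classes assumed: background, the target object (e.g. thermal power
# plant), and one associated object class (e.g. cooling tower).
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=3)
model.eval()

image = torch.rand(3, 800, 800)   # a dummy target image
with torch.no_grad():
    # The ResNet50+FPN backbone produces multi-level feature maps, the RPN
    # proposes candidate regions, and the ROI heads classify and regress them.
    pred = model([image])[0]
print(pred["boxes"].shape, pred["labels"], pred["scores"])
```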
The method for extracting object features of a target object and relationship features of the target object and at least one associated object in the target object respectively in the candidate region based on the extracted candidate region, and fusing the object features and the at least one relationship features to obtain a fused feature of the candidate region may include:
extracting object characteristics of a target object in the candidate area and relationship characteristics of the target object and at least one associated object in the target object respectively by using the characteristic extraction network; and fusing the object characteristic and at least one relation characteristic to obtain a fused characteristic of the candidate region.
The feature extraction network in the object detection model extracts the object feature of the target object and the object features of the at least one associated object in the candidate region via ROI-Align (a region feature aggregation method), computes the relationship weights with the relationship weight formula, and then computes the relationship features from those weights. The extracted object feature of the original target object and the computed relationship features are fused to obtain the fused feature of the candidate region.
The method for identifying the target object and the at least one associated object in the candidate region using the object detection model based on the fusion feature may include:
identifying, with the detection network, the target object and the at least one associated object in the candidate region based on the fused feature.
The detection network may be two fully connected branch networks, including a class prediction branch and a bounding box regression branch. Therefore, based on the obtained fusion features, the target object and the at least one associated object in the candidate region can be detected, the target object and the at least one associated object in the candidate region are identified, and the position information of the target object and the position information of the at least one associated object are obtained.
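A sketch of this stage with torchvision's `roi_align` and two fully connected branches; the feature map size, stride, and layer widths are assumptions, and in the design described above the relation-fusion step would sit between the shared layers and the two branches:

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

feat_map = torch.randn(1, 256, 50, 50)          # one FPN level (stride 16 assumed)
boxes = torch.tensor([[64., 64., 256., 256.],   # candidate regions in image coords
                      [128., 96., 224., 192.]])

# Region feature aggregation (ROI-Align): fixed 7x7 features per region.
region_feats = roi_align(feat_map, [boxes], output_size=(7, 7),
                         spatial_scale=1.0 / 16, sampling_ratio=2)

num_classes = 3                                  # background + target + associated object
shared = nn.Sequential(nn.Flatten(),
                       nn.Linear(256 * 7 * 7, 1024), nn.ReLU(),
                       nn.Linear(1024, 1024), nn.ReLU())
cls_branch = nn.Linear(1024, num_classes)        # class prediction branch
box_branch = nn.Linear(1024, num_classes * 4)    # bounding box regression branch

h = shared(region_feats)                         # fused features would be formed here
scores, deltas = cls_branch(h), box_branch(h)    # (2, 3) and (2, 12)
```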
In some embodiments, the object detection model may be obtained in advance based on a sample image and training labels in the sample image. Wherein the training labels comprise position information of target objects in the sample image and position information of at least one associated object in the target objects.
Training the object detection model involves data processing based on a sample image and the training labels in it, so that the model learns the object features of the relevant objects; the network design then models this feature information, so that at detection time the model can judge by the feature similarity between the object to be detected and the learned target object, recognizing the object's category and detecting its position information.
In the following, analysis of model training is performed from the aspect of data processing, and fig. 2 is a flowchart of an embodiment of a data processing method proposed in the present application, which may include the following steps:
201: determining a sample image and a training label in the sample image, the training label including position information of a target object in the sample image and position information of at least one associated object in the target object.
202: and training an object detection model by using the sample image and the training label.
Wherein the object detection model is used to identify the target object and the at least one associated object in a sample image.
Determining the sample image and the training labels requires analyzing the features of the target object and of at least one associated object inside it: analyzing the target object's features yields its specific information, including size and internal composition; fully recognizing the associated constraints between the target object and the at least one associated object yields information such as the size and structure of the at least one associated object; a sample image dataset meeting the requirements is then produced.
In this embodiment, by analyzing the object features of the target object and the at least one associated object, the associated constraints between them are fully recognized and information such as their sizes and structures is obtained; the sample image and training labels are determined accordingly, the object detection model is trained, and its detection accuracy is improved.
In some embodiments, using the sample images and the training labels, a method of training a subject detection model may include:
inputting a sample image into an object detection model, and extracting a candidate region in the sample image;
extracting object characteristics of a target object in the candidate region and relation characteristics of the target object and at least one associated object in the target object respectively;
fusing the object feature and at least one relation feature to obtain a fused feature of the candidate region;
and training the object detection model by using the fusion features of the candidate regions and the training labels.
From the prepared sample image dataset, a sample image is input into the object detection model and candidate regions are extracted from it. In each candidate region, the target object's features and the relationship features between the target object and at least one associated object inside it are extracted; the original object feature and the at least one relationship feature are fused to obtain the fused feature of the candidate region; and the object detection model is trained using the fused features and the training labels.
Training the object detection model with the sample images and training labels makes it learn not only the object features of the target object but also the relationship features between the target object and at least one associated object inside it. This reduces false detections of non-target objects at detection time, improves the detection accuracy for the target object, and detects the at least one associated object at the same time, improving the accuracy of the object detection model.
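A hedged sketch of one training step, using torchvision's stock Faster R-CNN losses for brevity (the model described here would additionally carry the relation-fusion module in its ROI head); the boxes and labels are made-up placeholders for the training label:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights=None, num_classes=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

sample_image = torch.rand(3, 800, 800)
training_label = {
    # Position of the target object (class 1) and of one associated
    # object inside it (class 2), as corner coordinates.
    "boxes": torch.tensor([[100., 120., 600., 700.],
                           [250., 300., 330., 420.]]),
    "labels": torch.tensor([1, 2]),
}

model.train()
loss_dict = model([sample_image], [training_label])  # RPN + ROI-head losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```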
In some embodiments, the object detection model may include a feature map extraction network, a candidate area extraction network, a feature extraction network, and a detection network;
training the feature map extraction network by using a sample image and a feature map corresponding to the sample image;
training the candidate region extraction network by using the feature map and the candidate regions marked in advance in the feature map;
training the feature extraction network by using the candidate region and the fusion feature corresponding to the candidate region;
and training the detection network by using the fusion characteristics and the training labels.
In practical applications, from an input sample image, the feature map extraction network can be trained with the sample image and its corresponding feature map so that it extracts accurate feature maps. The candidate region extraction network is then trained with the feature map and the candidate regions pre-labeled in it, so that it extracts correct candidate regions. The feature extraction network can be trained with the candidate regions and their corresponding fused features to obtain accurate fused features. Finally, the detection network is trained with the fused features and the training labels, improving its accuracy in detecting the target object and the at least one associated object and in determining their position information.
In practical application, there are various implementation manners for training the candidate region extraction network by using the feature map and the candidate regions labeled in advance in the feature map.
As an implementation, the method may include:
extracting a plurality of regions to be selected from the feature map by using the candidate region extraction network;
and training the candidate region extraction network by using the plurality of candidate regions and the pre-marked candidate regions.
A plurality of regions to be selected are extracted from the feature map using the candidate region extraction network and compared with the pre-labeled candidate regions; the regions to be selected that correspond to pre-labeled candidate regions are taken as candidate regions, the corresponding parameters of the candidate region extraction network are adjusted, and the network is thereby trained to extract correct candidate regions.
As another implementation, the method may further include:
selecting a region to be selected from the plurality of regions to be selected that does not include the candidate region as a negative sample region;
reverse training the candidate region extraction network with the negative sample region.
Based on the extracted negative sample regions, the candidate region extraction network can learn to resist the non-target objects they contain. Training the network in this reverse (negative) manner lets it avoid extracting such regions in practical applications, improving the accuracy of candidate region extraction.
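One plausible realization of the negative sample selection, under the assumption that "does not include the candidate region" means near-zero overlap with every pre-labeled candidate region:

```python
import torch
from torchvision.ops import box_iou

def select_negative_regions(proposals: torch.Tensor,  # (N, 4) regions to be selected
                            labeled: torch.Tensor,    # (M, 4) pre-labeled candidate regions
                            iou_thresh: float = 0.1) -> torch.Tensor:
    # A region that barely overlaps any labeled candidate region contains
    # no candidate region, so it is treated as a negative sample region
    # and fed to the candidate region extraction network as background.
    iou = box_iou(proposals, labeled)                 # (N, M) pairwise IoU
    return proposals[iou.max(dim=1).values < iou_thresh]
```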
In practical applications the sample image has a certain complexity and contains many irrelevant non-target objects. These objects interfere with the detection of the target object and raise the model's false detection rate, so the accuracy of candidate region selection is very important. In some embodiments, the method for training the candidate region extraction network using the plurality of regions to be selected and the pre-labeled candidate regions may include:
extracting object features of a class target object in each region to be selected and object features of at least one class associated object in the class target object, where the class target objects include the target object, and the at least one class associated object includes the at least one associated object in the target object;
calculating the relation weight of a class target object in each to-be-selected area and at least one class associated object in the class target objects;
and training the candidate region extraction network by using the respective relation weights of the plurality of candidate regions and the pre-marked candidate regions.
The object features of the class target object in each region to be selected and of at least one class associated object inside it are extracted, and their relationship weights are calculated with the relationship weight formula; the calculated relationship weights are added to give the relationship weight of the region to be selected. Using the relationship weights of the regions to be selected and the pre-labeled candidate regions, the weight distribution is optimized to highlight target objects and, via back-propagation, to resist irrelevant objects, training the candidate region network to extract more accurate candidate regions.
In some embodiments, the method for training the candidate region extraction network by using the relationship weights of the multiple candidate regions and the pre-labeled candidate regions may include:
generating a density center according to the plurality of regions to be selected;
and determining the candidate area within a first distance range from the density center as a candidate area.
In practical applications, some of the regions to be selected extracted by the candidate region extraction network cluster within a certain range, indicating that the class target object in the sample image lies within that range. The center of this cluster of regions is taken as the density center, and the regions to be selected within a first distance range of the density center are determined to be candidate regions; the first distance range can be set according to the specific situation. Regions to be selected far from the density center are discarded, which reduces false detections and improves the accuracy of the candidate region extraction network.
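A minimal sketch of the density-center rule; taking the mean of the region centers as the density center is one simple choice, since the text does not fix the estimator:

```python
import torch

def filter_by_density_center(boxes: torch.Tensor,  # (N, 4) regions to be selected
                             max_dist: float) -> torch.Tensor:
    centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                           (boxes[:, 1] + boxes[:, 3]) / 2], dim=1)
    density_center = centers.mean(dim=0)             # center of the cluster
    dist = (centers - density_center).norm(dim=1)
    # Keep regions within the first distance range; discard far-away ones.
    return boxes[dist <= max_dist]
```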
A large-scale set of sample images is very important for training the object detection model and for its detection accuracy. Since in practice some sample images are complex to acquire and difficult to interpret, the number of sample images is limited, which hampers the model's data processing. Therefore, in some embodiments, before determining the sample image, the data processing method further includes:
acquiring an image to be processed;
and performing data enhancement on the image to be processed to obtain a sample image.
Data enhancement increases sample diversity; for example, it may include color/brightness/saturation transformation, random inversion, random cropping, and random mirroring. Applying data enhancement to the image to be processed yields the sample image, increases sample diversity, avoids the model overfitting that results from having fewer samples than model parameters, and facilitates the model's data processing.
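The enhancement operations named above map directly onto torchvision transforms; the parameter values below are illustrative only, and for detection training the geometric transforms must be applied to the label boxes as well:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3),                    # color/brightness/saturation
    transforms.RandomHorizontalFlip(p=0.5),                    # random mirroring
    transforms.RandomVerticalFlip(p=0.5),                      # random inversion
    transforms.RandomResizedCrop(size=800, scale=(0.6, 1.0)),  # random cropping
])
```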
In one practical application, the technical scheme of the application is suitable for detecting thermal power plants in remote sensing images. A remote sensing image is a film or photograph recording the electromagnetic radiation of various ground objects, mainly divided into aerial and satellite photographs. In the remote sensing field, training sample images for object detection are scarce, because remote sensing images are costly to acquire and remote sensing targets are difficult to interpret. A thermal power plant, being closely tied to power development and socio-economic development and widely distributed, can therefore be chosen as the detection and training sample object of the object detection model. The technical scheme of the application is described below taking thermal power plant detection as an example. Fig. 3 is a flowchart of an embodiment of the thermal power plant detection method provided by the application, which may include the following steps:
301: and extracting a candidate region in the remote sensing image by using the object detection model.
302: and extracting object characteristics of a thermal power plant in the candidate area and relation characteristics of the thermal power plant and at least one production device in the thermal power plant.
A thermal power plant is a plant for producing electric energy or heat energy by using combustible materials as fuel, and mainly comprises a cooling tower, a generator set, an open-air coal pile and the like. In the remote sensing image, the thermal power plant is a complex containing various land features, and also comprises a large number of unrelated land features such as buildings, bare land and the like, and the characteristics of the thermal power plant are similar to those of a steel plant and a smelting plant.
Considering that the cooling tower is an important cooling device inside the thermal power plant and has obvious characteristics in the remote sensing image, the cooling tower can be selected as at least one relevant object of the thermal power plant.
303: and fusing the object characteristic and the relation characteristic to obtain a fused characteristic of the candidate region.
304: identifying the thermal power plant and the at least one production facility in the candidate area using the object detection model based on the fusion features.
The object features of the thermal power plant and its relationship features with the internal cooling tower are extracted and then fused. Using the ground-object correlation between the plant and its cooling tower, the object detection model is applied to detect thermal power plants and cooling towers nationwide, detecting the internal cooling tower together with the plant. This avoids the large number of false detections caused by the complexity of remote sensing images, reduces the model's false detection rate, and improves the detection accuracy for the thermal power plant and the internal cooling tower.
In practice, given the complexity of remote sensing images, thermal power plant samples are few and overfitting is likely, so ResNet50 can be chosen as the feature map extraction network in the object detection model: it has relatively few parameters and modest data requirements, which suits the detection of thermal power plants and cooling towers in remote sensing images.
The feature map extraction network ResNet50 generates the feature map of the remote sensing image so that the candidate region extraction network RPN can generate a plurality of candidate regions from it. The feature extraction network then obtains, in each candidate region, the object feature of the thermal power plant and its relationship feature with the cooling tower, and fuses the relationship feature with the plant's object feature to obtain the fused feature; based on the fused feature, the detection network identifies the thermal power plant and the cooling tower in the candidate region, yielding the position information of both. This improves the detection accuracy of the thermal power plant and of the cooling tower, lowers the model's false detection rate, and raises its overall accuracy.
The object detection model in the embodiment shown in fig. 3 may be obtained by pre-training according to the data processing method shown in fig. 4, and the method may include the following steps:
401: determining a sample image and a training label in the sample image, wherein the training label comprises position information of a thermal power plant in the sample image and position information of at least one production device in the thermal power plant.
In practical application, when selecting sample images of a thermal power plant, the large footprint of the plant must be considered: the image should contain all structures of the plant while keeping its texture clear and its features distinct. GF-1 satellite data and Google Earth level-16 imagery, both with a resolution of about 2 meters, may therefore be selected. Multi-period GF-1 images and multi-temporal Google Earth data covering various illumination conditions, satellite imaging attitudes, atmospheric conditions, and so on may also be adopted, so as to increase the diversity of the thermal power plant samples as much as possible.
With the RS-Label annotation tool, the thermal power plant and the cooling tower are labeled with their minimum bounding rectangles to produce the training label of the sample image.
402: training an object detection model with the sample image and the training label.
Wherein the object detection model is used to identify the thermal power plant and the at least one production facility in a sample image.
Considering the limited number of thermal power plant samples, data enhancement may be employed to increase sample diversity, including color/brightness/saturation transformation, random flipping, random cropping, random mirroring, and similar operations (a sketch follows below). Moreover, an operating thermal power plant releases large amounts of smoke from its cooling towers, which can cover detail features of the plant in the remote sensing image, and the model then easily produces false detections on partially covered, unrelated ground objects; the sample images may therefore be defogged to reduce such false detections.
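A minimal torchvision sketch of such an augmentation pipeline is shown below. The parameter values, the crop size of 512 pixels, and the file name are illustrative assumptions only; in a detection setting the bounding-box labels must be transformed consistently with the image, which is omitted here, and the patent does not specify a particular defogging algorithm, so that step is left as a comment.

from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),  # color/brightness/saturation
    transforms.RandomHorizontalFlip(p=0.5),                                # random mirroring
    transforms.RandomVerticalFlip(p=0.5),                                  # random flipping
    transforms.RandomResizedCrop(512, scale=(0.8, 1.0)),                   # random cropping
    transforms.ToTensor(),
])

# defogging (e.g. a dark-channel-prior method) would be applied to the raw
# image before augmentation; the patent does not fix the algorithm
sample = augment(Image.open("plant_sample.png").convert("RGB"))  # hypothetical file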
To train the object detection model with the sample images and training labels of the thermal power plant: input a sample image into the object detection model and extract candidate regions in it; extract the object features of the thermal power plant in each candidate region together with the relation features between the plant and its internal cooling tower; fuse the object features with the at least one relation feature to obtain the fused features of the candidate region; and train the object detection model with the fused features of the candidate regions and the training labels (a minimal training step is sketched below). The trained model can then detect the cooling tower by exploiting the ground-object correlation between plant and tower, which raises the detection precision of both the thermal power plant and the cooling tower, reduces false detections on unrelated ground objects, and enables high-precision detection of thermal power plants and cooling towers over large areas.
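As a sketch of one training iteration under the assumptions above (the PlantDetector sketch, a cross-entropy classification loss, and a smooth-L1 box-regression loss; none of these loss choices are fixed by the patent):

import torch
import torch.nn.functional as F

def train_step(model, optimizer, image, proposals, cls_targets, box_targets):
    # One hypothetical iteration: forward pass through the sketched detector,
    # classification over {background, plant, cooling tower}, and box
    # regression on the positive proposals only.
    model.train()
    cls_logits, box_deltas = model(image, proposals)
    loss_cls = F.cross_entropy(cls_logits, cls_targets)
    pos = cls_targets > 0                                   # positive = plant or cooling tower
    loss_box = F.smooth_l1_loss(box_deltas[pos], box_targets[pos]) if pos.any() else 0.0
    loss = loss_cls + loss_box
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)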
In practical application, the object detection model, with its built-in feature extraction network capable of extracting the relation features between a target object and at least one associated object inside it, can detect target objects and their associated objects over large areas, for example thermal power plants and cooling towers nationwide. When performing such large-area detection, in order to analyze the working mechanism of the object detection model and increase its interpretability, the method may further comprise the following step:
visually displaying the relation weights between the target object and the at least one associated object inside the target object.
Visual display here refers to converting data into images by computer graphics or image processing techniques, presenting them on a screen, and supporting interactive processing. Visually displaying the relation weights extracted by the feature extraction network between the target object and its at least one associated object exposes the working mechanism of the model, increases its interpretability, and supports applying the object detection model containing the feature extraction network to large-area target detection.
The visual analysis may include global analysis and local analysis. For global analysis, the candidate regions and the relation weights may be displayed in overlay: the candidate region extraction network extracts the candidate regions, the corresponding relation weight computed by the feature extraction network is obtained for each region, each region is filled according to its relation weight, and the regions are overlaid on the feature map. The differing brightness on the feature map then reveals the different relation weights assigned to different candidate regions, clearly displaying the relation features computed by the feature extraction network and the learned ground-object relation structure, and improving the interpretability of the object detection model.
For local analysis, given a specific candidate region, the other candidate regions it activates are displayed according to the magnitude of the relation weights computed by the feature extraction network; for example, the activation relations between a thermal power plant and a cooling tower, between a cooling tower and a thermal power plant, or between two cooling towers can be shown (a sketch covering both analyses follows below). This likewise displays the relation features computed by the feature extraction network and the learned ground-object relation structure, and improves the interpretability of the object detection model.
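The following matplotlib sketch illustrates both analyses. The function name, the (N, N) relation-weight matrix convention (weights[i, j] being the weight that region j receives from query region i), and the yellow shading are presentation choices assumed here, not prescribed by the patent.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def show_relation_weights(image, boxes, weights, query=None, top_k=5):
    # Global analysis (query=None): shade every candidate region by the total
    # relation weight it receives. Local analysis: for one query region, show
    # only the top_k regions it activates most strongly.
    fig, ax = plt.subplots()
    ax.imshow(image)
    weights = np.asarray(weights, dtype=float)
    score = weights.sum(axis=0) if query is None else weights[query]
    idx = np.arange(len(boxes)) if query is None else np.argsort(score)[-top_k:]
    s = (score - score.min()) / (np.ptp(score) + 1e-8)   # normalize weights for display
    for i in idx:
        x0, y0, x1, y1 = boxes[i]
        ax.add_patch(patches.Rectangle(
            (x0, y0), x1 - x0, y1 - y0,
            facecolor=(1.0, 1.0, 0.0, 0.2 + 0.6 * s[i]),  # brighter fill = larger weight
            edgecolor="yellow"))
    plt.show()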
Fig. 5 is a schematic structural diagram of an embodiment of a target object detection apparatus proposed in the present application, which may include the following components:
a first extraction module 501, configured to extract a candidate region in a target image by using an object detection model;
a second extracting module 502, configured to extract object features of a target object in the candidate region and relationship features between the target object and at least one associated object in the target object;
a fusion module 503, configured to fuse the object feature and at least one relationship feature to obtain a fusion feature of the candidate region;
a detection module 504 configured to identify the target object and the at least one associated object in the candidate region using the object detection model based on the fusion feature.
In some embodiments, the first extraction module may include:
a first extraction unit, which is used for extracting a feature map of a target image by using an object detection model;
and the second extraction unit is used for extracting the candidate region from the feature map.
In certain embodiments, the object features may include spectral features;
the second extraction module may include:
a third extraction unit, configured to extract an object feature of a target object in the candidate region and an object feature of at least one associated object in the target object;
a calculation unit, configured to perform a weighted summation of the spectral features of the at least one associated object and a linear mapping through a value vector matrix, to obtain the relation features between the target object and the at least one associated object in the target object.
In some embodiments, the object features may also include geometric features;
the calculation unit may include:
the first calculating subunit is configured to perform a dot product operation on the spectral feature of the target object and the spectral feature of the at least one associated object to obtain a spectral weight;
the second calculation subunit is used for mapping the geometric characteristics of the target object and the geometric characteristics of the at least one associated object to a high-dimensional space and obtaining geometric weight through geometric vector matrix linear mapping;
a third calculation subunit, configured to normalize the spectral weights and the geometric weights into relation weights, perform a weighted summation of the spectral features of the at least one associated object according to the relation weights, and apply a linear mapping through a value vector matrix, to obtain the relation features between the target object and the at least one associated object in the target object.
In some embodiments, the fusion module may include:
a fusion unit, configured to sum the object feature and the at least one relation feature to obtain the fused feature of the candidate region (the relation computation and this fusion are sketched together below).
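The following PyTorch sketch shows one way these subunits could compute relation features and fuse them, modeled on the Relation Networks paper cited by this application. The layer dimensions, the log-scaled pairwise geometry encoding, and the joint softmax normalization of spectral and geometric weights are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationModule(nn.Module):
    # Hypothetical sketch of the calculation and fusion units: dot-product
    # spectral weights, geometric weights from a higher-dimensional embedding
    # of pairwise box geometry, normalization into relation weights, weighted
    # summation, and linear mapping through a value vector matrix.
    def __init__(self, feat_dim=1024, key_dim=64, geo_dim=64):
        super().__init__()
        self.wq = nn.Linear(feat_dim, key_dim)   # query projection of spectral features
        self.wk = nn.Linear(feat_dim, key_dim)   # key projection
        self.wv = nn.Linear(feat_dim, feat_dim)  # value vector matrix
        self.geo_embed = nn.Linear(4, geo_dim)   # map box geometry to a higher-dimensional space
        self.geo_weight = nn.Linear(geo_dim, 1)  # geometric-vector linear mapping

    def forward(self, feats, boxes):
        # feats: (N, feat_dim) spectral features; boxes: (N, 4) as [x0, y0, x1, y1]
        q, k = self.wq(feats), self.wk(feats)
        spec_w = q @ k.t() / q.shape[1] ** 0.5                   # dot-product spectral weights
        ctr = (boxes[:, :2] + boxes[:, 2:]) / 2
        wh = (boxes[:, 2:] - boxes[:, :2]).clamp(min=1e-3)
        dxy = (ctr[:, None, :] - ctr[None, :, :]).abs() / wh[:, None, :]
        dwh = wh[:, None, :] / wh[None, :, :]
        geo = torch.log(torch.cat([dxy.clamp(min=1e-3), dwh], dim=-1))  # (N, N, 4) pairwise geometry
        geo_w = F.relu(self.geo_weight(F.relu(self.geo_embed(geo)))).squeeze(-1)
        # normalize spectral and geometric weights jointly into relation weights
        rel_w = F.softmax(spec_w + torch.log(geo_w.clamp(min=1e-6)), dim=1)
        return rel_w @ self.wv(feats)                            # weighted sum, then value mapping

With object features feats and candidate boxes boxes, the fusion unit's output would then be feats + RelationModule(...)(feats, boxes), i.e. fusion by element-wise summation.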
In some embodiments, the object detection model includes a feature map extraction network, a candidate area extraction network, a feature extraction network, and a detection network;
the first extraction module may be to: extracting a feature map of the target image by using the feature map extraction network; extracting candidate regions from the feature map by using the candidate region extraction network;
the fusion module may be configured to: extracting object characteristics of a target object in the candidate area and relationship characteristics of the target object and at least one associated object in the target object respectively by using the characteristic extraction network; fusing the object feature and at least one relation feature to obtain a fused feature of the candidate region;
the detection module is used for: identifying, with the detection network, the target object and the at least one associated object in the candidate region based on the fused feature.
In some embodiments, the first extraction module may further comprise:
the fourth extraction unit is used for extracting a plurality of regions to be selected in the target image by using the object detection model;
a determining unit, configured to determine a candidate region from the plurality of regions to be selected.
The target object detection apparatus shown in fig. 5 can execute the target object detection method of the embodiment shown in fig. 1; its implementation principle and technical effect are not repeated here. The specific manner in which each module and unit of the apparatus operates has been described in detail in the method embodiments and is likewise not repeated.
Fig. 6 is a schematic structural diagram of an embodiment of a data processing apparatus proposed in the present application, which may include the following components:
a determining module 601, configured to determine a sample image and a training label in the sample image, where the training label includes position information of a target object in the sample image and position information of at least one associated object in the target object;
a training module 602, configured to train an object detection model using the sample image and the training labels.
Wherein the object detection model is used to identify the target object and the at least one associated object in a sample image.
In some embodiments, the training module may be to:
inputting a sample image into an object detection model, and extracting a candidate region in the sample image;
extracting object characteristics of a target object in the candidate region and relation characteristics of the target object and at least one associated object in the target object respectively;
fusing the object feature and at least one relation feature to obtain a fused feature of the candidate region;
and training the object detection model by using the fusion features of the candidate regions and the training labels.
In some embodiments, the object detection model includes a feature map extraction network, a candidate area extraction network, a feature extraction network, and a detection network;
the training module may include:
the first training unit is used for training the feature map extraction network by utilizing a sample image and a feature map corresponding to the sample image;
the second training unit is used for training the candidate region extraction network by using the feature map and the candidate regions marked in advance in the feature map;
a third training unit, configured to train the feature extraction network by using the candidate region and the fusion feature corresponding to the candidate region;
and the fourth training unit is used for training the detection network by utilizing the fusion features and the training labels.
In some embodiments, the second training unit may comprise:
a first training subunit, configured to extract a plurality of regions to be selected from the feature map with the candidate region extraction network, and to train the candidate region extraction network with the plurality of regions to be selected and the pre-marked candidate regions.
In some embodiments, the second training unit may further comprise:
a second training subunit, configured to set regions to be selected, among the plurality of regions to be selected, that do not contain a pre-marked candidate region as negative sample regions, and to reverse-train the candidate region extraction network with the negative sample regions.
In some embodiments, the first training subunit may be to:
extracting the object features of the target-class object in each region to be selected and the object features of at least one associated-class object within it, where the target-class objects include the target object, and the at least one associated-class object includes the at least one associated object in the target object;
calculating the relation weights between the target-class object in each region to be selected and its at least one associated-class object;
and training the candidate region extraction network by using the respective relation weights of the plurality of regions to be selected and the pre-marked candidate regions.
In some embodiments, the first training subunit may be further operable to:
generating a density center from the plurality of regions to be selected;
and determining the regions to be selected that lie within a first distance of the density center as candidate regions (sketched below).
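A small numpy sketch of this selection step follows. The patent does not specify how the density center is estimated, so the mean of the region centers is assumed here purely for illustration (a kernel-density mode would be an alternative), and the function name and distance value are hypothetical.

import numpy as np

def select_by_density(regions, first_distance):
    # regions: (N, 4) regions to be selected, as [x0, y0, x1, y1]
    centers = (regions[:, :2] + regions[:, 2:]) / 2.0
    density_center = centers.mean(axis=0)              # assumed density-center estimator
    dist = np.linalg.norm(centers - density_center, axis=1)
    return regions[dist <= first_distance]             # candidate regions near the center

# e.g. candidates = select_by_density(np.asarray(proposals), first_distance=200.0)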
In some embodiments, the apparatus may further comprise:
the acquisition module is used for acquiring an image to be processed;
and the enhancement module is used for carrying out data enhancement on the image to be processed to obtain a sample image.
The data processing apparatus shown in fig. 6 can execute the data processing method of the embodiment shown in fig. 2; its implementation principle and technical effect are not repeated here. The specific manner in which each module and unit of the apparatus operates has been described in detail in the method embodiments and is likewise not repeated.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (20)

1. A target object detection method, comprising:
extracting a candidate region in the target image by using an object detection model;
extracting object characteristics of a target object in the candidate region and relation characteristics of the target object and at least one associated object in the target object respectively;
fusing the object feature and at least one relation feature to obtain a fused feature of the candidate region;
identifying the target object and the at least one associated object in the candidate region using the object detection model based on the fused features.
2. The method of claim 1, wherein extracting candidate regions in the target image using the object detection model comprises:
extracting a characteristic diagram of a target image by using an object detection model;
and extracting a candidate region from the feature map.
3. The method of claim 1, wherein the extracting object features of target objects in the candidate region and relationship features of the target objects and at least one associated object of the target objects respectively comprises:
extracting object features of target objects in the candidate region and object features of at least one associated object in the target objects; the object features comprise spectral features;
and carrying out weighted summation on the spectral characteristics of the at least one associated object, and carrying out linear mapping through a value vector matrix to obtain the relation characteristics of the target object and at least one associated object in the target object.
4. The method of claim 3, wherein the object features further comprise geometric features;
the weighted summation of the spectral features of the at least one associated object and the linear mapping of the value vector matrix to obtain the relationship features of the target object and the at least one associated object of the target objects comprise:
performing dot product operation on the spectral characteristics of the target object and the spectral characteristics of the at least one associated object to obtain spectral weights;
mapping the geometric characteristics of the target object and the geometric characteristics of the at least one associated object to a high-dimensional space, and obtaining geometric weight through geometric vector matrix linear mapping;
and normalizing the spectral weight and the geometric weight to obtain a relation weight, performing weighted summation on the spectral characteristics of the at least one associated object according to the relation weight, and performing linear mapping through a value vector matrix to obtain the relation characteristics of the target object and the at least one associated object in the target object.
5. The method according to claim 1, wherein the fusing the object feature with at least one relation feature to obtain a fused feature of the candidate region comprises:
and adding the object characteristic and at least one relation characteristic to obtain the fusion characteristic of the candidate region.
6. The method of claim 2, wherein the object detection model comprises a feature map extraction network, a candidate area extraction network, a feature extraction network, and a detection network;
the extracting the candidate region in the target image by using the object detection model comprises:
extracting a feature map of the target image by using the feature map extraction network;
extracting candidate regions from the feature map by using the candidate region extraction network;
the extracting of the object feature of the target object in the candidate region and the relationship feature of the target object and at least one associated object in the target object, respectively, and the fusing of the object feature and the at least one relationship feature to obtain the fused feature of the candidate region includes:
extracting object characteristics of a target object in the candidate area and relationship characteristics of the target object and at least one associated object in the target object respectively by using the characteristic extraction network; fusing the object feature and at least one relation feature to obtain a fused feature of the candidate region;
the identifying the target object and the at least one associated object in the candidate region using the object detection model based on the fused feature comprises:
identifying, with the detection network, the target object and the at least one associated object in the candidate region based on the fused feature.
7. The method of claim 1, wherein the object detection model is obtained in advance based on a sample image and training labels in the sample image, the training labels comprising position information of target objects in the sample image and position information of at least one associated object in the target objects.
8. The method of claim 1, wherein extracting candidate regions in the target image using the object detection model comprises:
extracting a plurality of regions to be selected in a target image by using an object detection model;
and determining a candidate region from the plurality of regions to be selected.
9. A data processing method, comprising:
determining a sample image and a training label in the sample image, wherein the training label comprises position information of a target object in the sample image and position information of at least one associated object in the target object;
training an object detection model by using the sample image and the training label; wherein the object detection model is used to identify the target object and the at least one associated object in a sample image.
10. The method of claim 9, wherein training a subject detection model using the sample images and the training labels comprises:
inputting a sample image into an object detection model, and extracting a candidate region in the sample image;
extracting object characteristics of a target object in the candidate region and relation characteristics of the target object and at least one associated object in the target object respectively;
fusing the object feature and at least one relation feature to obtain a fused feature of the candidate region;
and training the object detection model by using the fusion features of the candidate regions and the training labels.
11. The method of claim 9, wherein the object detection model comprises a feature map extraction network, a candidate area extraction network, a feature extraction network, and a detection network;
training the feature map extraction network by using a sample image and a feature map corresponding to the sample image;
training the candidate region extraction network by using the feature map and the candidate regions marked in advance in the feature map;
training the feature extraction network by using the candidate region and the fusion feature corresponding to the candidate region;
and training the detection network by using the fusion characteristics and the training labels.
12. The method of claim 11, wherein the training the candidate region extraction network using the feature map and the pre-labeled candidate regions in the feature map comprises:
extracting a plurality of regions to be selected from the feature map by using the candidate region extraction network;
and training the candidate region extraction network by using the plurality of regions to be selected and the pre-marked candidate regions.
13. The method of claim 12, further comprising:
setting a region to be selected, among the plurality of regions to be selected, that does not contain the candidate region as a negative sample region;
reverse training the candidate region extraction network with the negative sample region.
14. The method of claim 12, wherein training the candidate region extraction network using the plurality of regions to be selected and the pre-marked candidate regions comprises:
extracting the object features of the target-class object in each region to be selected and the object features of at least one associated-class object within it, where the target-class objects include the target object, and the at least one associated-class object includes the at least one associated object in the target object;
calculating the relation weights between the target-class object in each region to be selected and its at least one associated-class object;
and training the candidate region extraction network by using the respective relation weights of the plurality of regions to be selected and the pre-marked candidate regions.
15. The method of claim 12, wherein training the candidate region extraction network using the plurality of regions to be selected and the pre-marked candidate regions comprises:
generating a density center according to the plurality of regions to be selected;
and determining the regions to be selected within a first distance range of the density center as candidate regions.
16. The method of claim 9, further comprising:
acquiring an image to be processed;
and performing data enhancement on the image to be processed to obtain a sample image.
17. A method for detecting a thermal power plant, comprising:
extracting a candidate region in the remote sensing image by using the object detection model;
extracting object characteristics of a thermal power plant in the candidate area and relation characteristics of the thermal power plant and at least one production device in the thermal power plant;
fusing the object features and the relation features to obtain fused features of the candidate regions;
identifying the thermal power plant and the at least one production facility in the candidate area using the object detection model based on the fusion features.
18. A detection data processing method for a thermal power plant, comprising:
determining a sample image and a training label in the sample image, wherein the training label comprises position information of a thermal power plant in the sample image and position information of at least one production device in the thermal power plant;
training an object detection model by using the sample image and the training label; wherein the object detection model is used to identify the thermal power plant and the at least one production facility in a sample image.
19. A target object detection apparatus, comprising:
the first extraction module is used for extracting a candidate region in the target image by using the object detection model;
the second extraction module is used for extracting object features of target objects in the candidate areas and relation features of the target objects and at least one associated object in the target objects respectively;
the fusion module is used for fusing the object characteristic and at least one relation characteristic to obtain a fusion characteristic of the candidate region;
a detection module for identifying the target object and the at least one associated object in the candidate region using the object detection model based on the fusion feature.
20. A data processing apparatus, comprising:
a determining module, configured to determine a sample image and a training label in the sample image, where the training label includes position information of a target object in the sample image and position information of at least one associated object in the target object;
the training module is used for training an object detection model by utilizing the sample image and the training label; wherein the object detection model is used to identify the target object and the at least one associated object in a sample image.
CN202010783297.9A 2020-08-06 2020-08-06 Target object detection method and device and thermal power plant detection method and device Pending CN112084860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010783297.9A CN112084860A (en) 2020-08-06 2020-08-06 Target object detection method and device and thermal power plant detection method and device


Publications (1)

Publication Number Publication Date
CN112084860A true CN112084860A (en) 2020-12-15

Family

ID=73735414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010783297.9A Pending CN112084860A (en) 2020-08-06 2020-08-06 Target object detection method and device and thermal power plant detection method and device

Country Status (1)

Country Link
CN (1) CN112084860A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304873A (en) * 2018-01-30 2018-07-20 深圳市国脉畅行科技股份有限公司 Object detection method based on high-resolution optical satellite remote-sensing image and its system
CN108470179A (en) * 2018-03-29 2018-08-31 百度在线网络技术(北京)有限公司 Method and apparatus for detecting object
CN108875819A (en) * 2018-06-08 2018-11-23 浙江大学 A kind of object and component associated detecting method based on shot and long term memory network
CN111079674A (en) * 2019-12-22 2020-04-28 东北师范大学 Target detection method based on global and local information fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN HU et al.: "Relation Networks for Object Detection", arXiv:1711.11575v2 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022170742A1 (en) * 2021-02-10 2022-08-18 北京优幕科技有限责任公司 Target detection method and apparatus, electronic device and storage medium
CN115082779A (en) * 2022-05-05 2022-09-20 生态环境部卫星环境应用中心 Coal-electricity enterprise detection method and device based on remote sensing image
CN115082779B (en) * 2022-05-05 2022-11-15 生态环境部卫星环境应用中心 Coal-electricity enterprise detection method and device based on remote sensing image
CN115100431A (en) * 2022-07-26 2022-09-23 北京百度网讯科技有限公司 Target detection method, neural network, and training method, device, and medium thereof
CN115100431B (en) * 2022-07-26 2023-08-08 北京百度网讯科技有限公司 Target detection method, neural network, training method, training device and training medium thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination