CN110738132B - Target detection quality blind evaluation method with discriminant perception capability - Google Patents

Target detection quality blind evaluation method with discriminant perception capability

Info

Publication number
CN110738132B
CN110738132B (application CN201910896907.3A)
Authority
CN
China
Prior art keywords
target
area
discriminant
training
image
Prior art date
Legal status
Active
Application number
CN201910896907.3A
Other languages
Chinese (zh)
Other versions
CN110738132A (en)
Inventor
李坤乾
亓琦
杨华
宋大雷
Current Assignee
Ocean University of China
Original Assignee
Ocean University of China
Priority date
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN201910896907.3A
Publication of CN110738132A
Application granted
Publication of CN110738132B
Legal status: Active (current)

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F18/00 Pattern recognition
                    • G06F18/20 Analysing
                        • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
                        • G06F18/24 Classification techniques
                            • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/04 Architecture, e.g. interconnection topology
                            • G06N3/045 Combinations of networks
                        • G06N3/08 Learning methods
                            • G06N3/084 Backpropagation, e.g. using gradient descent
            • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V10/00 Arrangements for image or video recognition or understanding
                    • G06V10/20 Image preprocessing
                        • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
                • G06V20/00 Scenes; Scene-specific elements
                    • G06V20/10 Terrestrial scenes
                • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
                    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
                • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
                    • Y02P90/80 Management or planning
                        • Y02P90/82 Energy audits or management systems therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a blind evaluation method for target detection quality with discriminative perception capability, relating to the technical field of computer vision. The method comprises a deep training network module, a target detection quality evaluation module, and the construction of training samples. The training samples are constructed by collecting public data sets into a sample set in which each sample carries the Intersection-over-Union (IoU) index between a target region and the ground-truth region together with a quality evaluation index for the discriminative information richness of the target region. The deep training network module sets the loss function of a convolutional neural network and adjusts the network's output parameters and forward propagation function; training the adjusted network yields a discriminative perception model. Loading this model, the evaluation module performs the target detection quality evaluation task, predicting for a given target region both its IoU with the ground-truth region and the richness of the discriminative information it contains. The method effectively perceives how rich the discriminative information about target attributes in a detection region is, achieves more comprehensive quality evaluation, and has better generalization ability.

Description

Target detection quality blind evaluation method with discriminant perception capability
Technical Field
The invention relates to the technical field of computer vision, and in particular to a blind evaluation method for target detection quality with discriminative-information perception capability.
Background
In the field of computer vision, target detection has long been a research hotspot, and related research results are widely applied in industry. Target detection quality evaluation means quantitatively scoring a target detection result; the higher the score, the better the result. Blind evaluation of target detection quality means that a computer automatically scores a given detection result without manual intervention and without ground-truth annotation. In existing methods, users manually design quality evaluation indexes for detection results based on observation. Such methods are usually highly subjective, and because of the diversity of target patterns it is difficult to obtain evaluation indexes and methods with broad applicability. Accurately evaluating target detection quality without a ground-truth reference therefore has important application prospects: it can be widely applied to other computer vision tasks built on detection results, can optimize the performance of subsequent vision algorithms, and can markedly improve the execution efficiency of subsequent vision tasks.
Most current methods for evaluating target-region quality design scoring schemes around specific image cues. For example:
(1) Bogdan Alexe et al. propose an objectness measure for distinguishing object windows from background windows. A Bayesian model combines four objectness cues (multi-scale saliency, color contrast, edge density, and superpixel straddling) to score the quality of a target region, achieving performance better than any single cue alone. The trained model can distinguish objects with well-defined spatial boundaries (such as cows and telephones) from amorphous background (such as grass and roads). The method is not tied to a particular class of image objects and can be applied to arbitrary objects.
(2) Esa Rahtu et al. present a cascaded ranking model based on three target-related features: superpixel boundaries, boundary edges, and window symmetry. The core of most object detection methods is a discriminant function that separates windows containing objects of interest from windows that do not. When a detection system is deployed on large-scale data in a real application scenario, this discriminant function can become the main computational bottleneck of the system. The method adopts this general setting and learns category-independent cascade features of the target region.
(3) Ian Endres et al. combine boundary and shape cues to generate diverse, category-independent region proposals, and propose a target-region evaluation method based on a diversity reward. The method produces a small, diverse set of region proposals intended to cover all target objects in an image; every stage of the pipeline is evaluated thoroughly, and the approach is shown to generalize well to data sets with a variety of object categories. It can also rank target regions containing objects of unknown classes, so when images of a new object class arrive it can be applied in an active-learning framework to learn untrained object classes.
(4) Ren et al. propose the Region Proposal Network (RPN) in Faster R-CNN, an end-to-end design that trains the candidate-box generator and evaluator simultaneously. More specifically, the input image is converted by a series of convolution and pooling operations into a multi-channel feature map, where each feature vector in the map corresponds to 9 anchor windows of different sizes. Each feature vector is then mapped through two fully connected layers to 9 two-dimensional objectness scores and 9 four-dimensional proposal coordinates, where the objectness score measures the likelihood that a proposal contains an object.
(5) Wu et al. propose a generic region-proposal evaluation model trained with a lazy-learning strategy, which estimates the quality of each target region without manually annotated ground-truth regions. The method provides a uniform sampling strategy for collecting bounding boxes covering a target, so that the boxes follow a uniform Intersection-over-Union (IoU) distribution independent of the region-generation process.
With the rapid development of machine learning applications, researchers have found that pre-trained models provide rich, reusable information. Unlike common approaches that use a pre-trained model merely as a feature extractor, Deep Descriptor Transforming (DDT) reveals that rich usable information resides in the convolutional layers: convolutional activations can act as a detector for the common object in an image co-localization problem, and evaluating the correlations between descriptors yields category-consistent regions, so that objects of the same category can be accurately localized across a set of images. DDT generalizes well to unknown classes and is robust to noisy data. Machine learning methods represented by deep learning have been applied to quality evaluation problems; however, these methods adopt only the IoU between the target region and the ground-truth region as the evaluation criterion, which is far from comprehensive. In view of this, the present invention aims to provide a blind evaluation method for target detection quality with discriminative perception capability, which can effectively perceive the richness of target-attribute discriminative information in a detection region and achieve more comprehensive quality evaluation.
Disclosure of Invention
To effectively perceive the richness of discriminative information about target attributes in a detection region, achieve more comprehensive quality evaluation, and obtain better generalization ability, the invention provides a blind evaluation method for target detection quality with discriminative perception capability. The specific technical scheme is as follows.
A blind evaluation method for target detection quality with discriminative perception capability comprises a deep training network module, a target detection quality evaluation module, and the construction of training samples. The training samples are constructed by collecting public data sets into a sample set whose quality evaluation indexes are the IoU index and the discriminative information richness of the target region. The deep training network module sets the loss function of the deep training network and adjusts the network's output parameters and forward propagation function to obtain a discriminative perception model. The target detection quality evaluation module loads the discriminative perception model trained with the adjusted network, performs the task of evaluating target detection quality, and predicts, for a given target region, the IoU with the ground-truth region and the richness of discriminative information within the region.
Preferably, the step of constructing the training samples comprises:
Step S101: collect public data sets, obtain ground-truth regions from the image annotation files, partition image regions by their IoU with the target ground-truth region using {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} as the standard values, and generate target-region coordinates with the corresponding IoU values;
Step S102: generate the discriminative-information-richness quality evaluation index for the partitioned image regions;
Step S103: generate a training sample set, a validation sample set, and a test sample set, where each sample contains image-region coordinates, the corresponding IoU index, and the discriminative-information-richness quality evaluation index.
Further preferably, the steps of the deep training network module are:
Step S201: select the image-classification convolutional neural network AlexNet as the base network structure;
Step S202: set the forward propagation function of the deep training network so that the region coordinates in an input sample are mapped onto the feature map extracted from the original image, and the features of that feature-map region are learned;
Step S203: set a mean-squared-error loss function that compares the input sample's ground-truth values with the network's actual outputs to obtain a loss value, and back-propagate to adjust the network parameters based on this loss;
Step S204: modify the output dimension of the network's fully connected layer to 2, representing respectively the IoU between a given target region of the image under test and the ground-truth region, and the discriminative-information-richness quality evaluation index;
Step S205: obtain the discriminative perception model once training on the training sample set is complete.
Also preferably, the target detection quality evaluation module comprises:
Step S301: input the validation sample set to evaluate the discriminative perception model, and further optimize the deep training network according to the outputs to generate the target detection quality evaluation model;
Step S302: load the trained target detection quality evaluation model;
Step S303: input an unannotated image together with target-region coordinates covering some area of that image into the target detection quality evaluation model, which then outputs two values representing, respectively, the discriminative information richness of the target region and the expected IoU between the target region and the ground-truth region.
Further preferably, five target regions are generated for each IoU value: regions covering the upper-left, upper-right, lower-left, and lower-right portions of the original image's ground-truth region, plus a target region covering the complete ground-truth region.
Further preferably, the discriminative-information-richness quality evaluation index is computed as follows: (a) extract the deep feature map of the input image with a classification-pretrained model, compute the covariance matrix of the deep features over all positions, and solve for its eigenvalues and eigenvectors; (b) take the two eigenvectors with the largest eigenvalues as projection directions, compute the correlation between the feature at each position of the deep feature map and these eigenvectors, and generate an energy heat map; (c) map the target region onto the resulting two-dimensional matrix and compute the energy density of the mapped region, which characterizes the region's discriminative information richness and serves as the discriminative-information-richness quality score.
The method has the advantages that, by using the deep training network module, the target detection quality evaluation module, and the constructed training samples, the richness of target-attribute discriminative information in a detection region can be effectively perceived; in addition, the IoU index between the target region and the ground-truth region is integrated, enabling more comprehensive quality evaluation of the detection region with better generalization ability.
Drawings
To illustrate the embodiments or technical solutions of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a blind evaluation method for target detection quality with discriminative perception capability;
fig. 2 is a diagram of a convolutional neural network structure.
Detailed Description
With reference to Fig. 1, a blind evaluation method for target detection quality with target-region discriminative perception capability according to an embodiment of the present invention is as follows.
Existing quality assessment algorithms for target regions are typically designed so that each quality metric is tied to its own specific region-proposal generation algorithm. Because they rely on different objectness cues and benchmarks, their evaluation results tend to be inconsistent and inaccurate when the same quality metric is applied to bounding boxes generated by different proposal algorithms.
Automatically evaluating target detection performance without a ground-truth reference is a new kind of computer vision algorithm that can be widely applied to other vision tasks built on detection results. It can effectively screen and retain the higher-quality detection results, optimize the performance of subsequent vision algorithms, and markedly improve the execution efficiency of subsequent vision tasks.
When the images to be processed in a detection or recognition task belong to several categories, the performance of region-generation algorithms in practice often varies with the object category, since each algorithm describes object characteristics from different angles and cues. Assuming that a unique optimal target-region generation algorithm exists for each category, a unified evaluation index is needed to assess the target regions produced by different algorithms, so that the optimal generation algorithm can be selected for each target category.
Evaluating a target region by the Intersection-over-Union (IoU) index alone cannot perceive the richness of the discriminative information the region contains. For example, for an image containing a person, the bounding box enclosing the whole body is the ground-truth region. Now suppose there are two target regions covering the upper body and the lower body respectively, each with half the area of the ground-truth region. An existing target-region evaluation algorithm gives them approximately equal scores because their IoU values are the same. In fact, however, the upper-body region (particularly the face) provides more discriminative information than the lower-body region, so the former should receive a better quality prediction.
To overcome this deficiency, the proposed method perceives the discriminative information richness of the target region by introducing a discriminative-information-richness index (DS) on top of IoU as the basic evaluation index. This index effectively detects the discriminative information of a target region, allowing regions rich in discriminative information to be selected.
A blind evaluation method for target detection quality with discriminative perception capability comprises a deep training network module, a target detection quality evaluation module, and the construction of training samples. The training samples are constructed by collecting public data sets into a sample set containing the IoU index and the target-region discriminative-information-richness quality evaluation index. The deep training network module sets the loss function of the deep training network and adjusts the network's output parameters and forward propagation function to obtain a discriminative perception model. The target detection quality evaluation module loads the discriminative perception model trained with the adjusted network, performs the task of evaluating target detection quality, and predicts, for a given target region, the IoU with the ground-truth region and the richness of discriminative information within the region.
Further, the overall framework of the method is shown in Fig. 1; the overall technical flow comprises constructing training samples, the deep training network module, and the target detection quality evaluation module. Specifically, the training-sample construction module collects public data sets and generates a sample set containing IoU and DS quality evaluation values. In the deep training network module, a loss function is designed for the task, the number of network output parameters is adjusted, and the network's forward propagation function is modified to match the sample data format generated earlier; training the adjusted deep network yields the discriminative perception model. In the target detection quality evaluation module, this model is applied to the quality evaluation task, effectively predicting the IoU between a given target region and the ground-truth region together with the richness of discriminative information inside the region.
Wherein the step of constructing the training sample comprises:
Step S101: collect public data sets, obtain ground-truth regions from the image annotation files, partition image regions by their IoU with the target ground-truth region using {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} as the standard values, and generate target-region coordinates with the corresponding IoU values. For each IoU value, five target regions are generated (covering the upper-left, upper-right, lower-left, and lower-right portions of the original image's ground-truth region, plus a region covering the complete ground-truth region), i.e., 45 samples per image; the coordinates of each generated region are recorded in the coordinate system of the original image. A code sketch of one possible region-generation scheme is given after step S103.
Step S102: generate the discriminative-information-richness quality evaluation index for the partitioned image regions;
Step S103: generate a training sample set, a validation sample set, and a test sample set, where each sample contains image-region coordinates, the corresponding IoU index, and the discriminative-information-richness quality evaluation index.
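The patent does not pin down exactly how boxes with a prescribed IoU are constructed, so the following is a minimal sketch under one plausible scheme: the four corner regions are crops lying entirely inside the ground-truth box (for which IoU reduces to the area ratio), and the full-coverage region is an enlarged concentric box containing the ground-truth box. The function names (`iou`, `regions_for_iou`) and the construction itself are illustrative assumptions, not the patented procedure; clipping to image bounds is omitted.

```python
import numpy as np

def iou(a, b):
    # Intersection-over-Union of two boxes in (x1, y1, x2, y2) form.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def regions_for_iou(gt, t):
    # Five candidate boxes whose IoU with ground-truth box `gt` equals t:
    # corner-anchored interior crops (upper-left/right, lower-left/right)
    # plus one enlarged concentric box fully covering gt.
    x1, y1, x2, y2 = gt
    w, h = x2 - x1, y2 - y1
    s = np.sqrt(t)                     # interior crop: IoU = crop_area / gt_area = s^2
    cw, ch = w * s, h * s
    corners = [
        (x1, y1, x1 + cw, y1 + ch),    # upper-left
        (x2 - cw, y1, x2, y1 + ch),    # upper-right
        (x1, y2 - ch, x1 + cw, y2),    # lower-left
        (x2 - cw, y2 - ch, x2, y2),    # lower-right
    ]
    g = 1.0 / np.sqrt(t)               # covering box: IoU = gt_area / box_area = 1/g^2
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    cover = (cx - w * g / 2, cy - h * g / 2, cx + w * g / 2, cy + h * g / 2)
    return corners + [cover]

gt = (50, 40, 250, 200)
samples = [(box, t) for t in [0.1 * k for k in range(1, 10)]
           for box in regions_for_iou(gt, t)]
assert len(samples) == 45                                   # 9 IoU values x 5 regions
assert all(abs(iou(box, gt) - t) < 1e-6 for box, t in samples)
```

For each of the 9 standard IoU values this yields 5 boxes, i.e. the 45 samples per image described in step S101.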
When generating the discriminative-information-richness index DS for the partitioned regions, applying the Deep Descriptor Transforming (DDT) algorithm to a single image shows that many regions with high response values correspond to highly discriminative local target regions, which reflect the target attributes to a large extent. On this basis, a DDT-based method for evaluating the discriminative information richness of a target region is designed. The index is computed in the following steps (a hedged code sketch follows step (c)):
(a) extract the deep feature map of the input image with a classification-pretrained model, compute the covariance matrix of the deep features over all positions, and solve for its eigenvalues and eigenvectors;
(b) take the two eigenvectors with the largest eigenvalues as projection directions, compute the correlation between the feature at each position of the deep feature map and these eigenvectors, and generate an energy heat map;
(c) map the target region onto the resulting two-dimensional matrix and compute the energy density of the mapped region, which characterizes the region's discriminative information richness and serves as the discriminative-information-richness quality score.
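A minimal PyTorch sketch of steps (a) to (c), assuming the AlexNet convolutional trunk from torchvision as the classification-pretrained model. The patent does not specify how the "energy" is derived from the projections; this sketch takes the sum of squared projections onto the top two eigenvectors at each position, and the mean energy inside the mapped box as the density. `discriminative_score` and all parameter choices are illustrative assumptions, not the patented computation.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Deep features from the conv trunk of a classification-pretrained model.
backbone = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).features.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def discriminative_score(img: Image.Image, box_xyxy, top_k: int = 2) -> float:
    # DS index of `box_xyxy` (pixel coords on the original image).
    W, H = img.size
    with torch.no_grad():
        fmap = backbone(preprocess(img).unsqueeze(0))[0]         # (C, h, w)
    C, h, w = fmap.shape
    feats = fmap.reshape(C, -1)                  # one C-dim descriptor per position
    centered = feats - feats.mean(dim=1, keepdim=True)
    cov = centered @ centered.T / centered.shape[1]              # (a) covariance over positions
    eigvals, eigvecs = torch.linalg.eigh(cov)                    # ascending eigenvalues
    top = eigvecs[:, -top_k:]                                    # (b) top-2 eigenvectors
    energy = (top.T @ centered).pow(2).sum(dim=0).reshape(h, w)  # energy heat map
    # (c) map the box into heat-map coordinates and take the mean energy inside.
    x1 = int(box_xyxy[0] / W * w); x2 = max(x1 + 1, int(box_xyxy[2] / W * w))
    y1 = int(box_xyxy[1] / H * h); y2 = max(y1 + 1, int(box_xyxy[3] / H * h))
    region = energy[y1:y2, x1:x2]
    return (region.sum() / region.numel()).item()                # energy density = DS score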
The steps of the deep training network module are as follows (a training sketch in code is given after step S205):
Step S201: select the image-classification convolutional neural network AlexNet as the base network structure;
Step S202: set the forward propagation function of the deep training network so that the region coordinates in an input sample are mapped onto the feature map extracted from the original image, and the features of that feature-map region are learned;
Step S203: set a mean-squared-error loss function that compares the input sample's ground-truth values with the network's actual outputs to obtain a loss value, and back-propagate to adjust the network parameters based on this loss. The mean squared error (MSE) measures how far predictions deviate from the data; the smaller the MSE, the more accurately the prediction model describes the experimental data.
Step S204: modify the output dimension of the network's fully connected layer to 2, representing respectively the IoU between a given target region of the image under test and the ground-truth region, and the discriminative-information-richness quality evaluation index;
Step S205: obtain the discriminative perception model once training on the training sample set is complete.
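A sketch of how steps S201 to S204 could fit together in PyTorch: the AlexNet convolutional trunk (S201), ROI pooling as one concrete reading of mapping sample box coordinates onto the feature map (S202), an MSE loss with backpropagation (S203), and a fully connected head with 2 outputs (S204). The head sizes, the choice of `roi_pool`, the optimizer, and the name `QualityNet` are assumptions for illustration, not the patented architecture.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import roi_pool

class QualityNet(nn.Module):
    # AlexNet trunk + ROI pooling + 2-dim regression head (IoU, DS).
    def __init__(self):
        super().__init__()
        alexnet = torchvision.models.alexnet(
            weights=torchvision.models.AlexNet_Weights.IMAGENET1K_V1)
        self.trunk = alexnet.features            # S201: AlexNet as base network
        self.head = nn.Sequential(               # S204: FC head with 2 outputs
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 2),                  # [IoU, DS]
        )

    def forward(self, images, boxes):
        # S202: project sample box coordinates onto the extracted feature map
        # and learn from the pooled region features.
        fmap = self.trunk(images)                            # (N, 256, h, w)
        scale = fmap.shape[-1] / images.shape[-1]            # image -> feature-map coords
        rois = roi_pool(fmap, boxes, output_size=(6, 6), spatial_scale=scale)
        return self.head(rois)

model = QualityNet()
criterion = nn.MSELoss()                          # S203: mean-squared-error loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

images = torch.randn(2, 3, 224, 224)              # dummy batch for illustration
boxes = [torch.tensor([[30., 20., 180., 160.]]),  # one box per image, (x1,y1,x2,y2)
         torch.tensor([[10., 50., 200., 210.]])]
targets = torch.tensor([[0.7, 0.45], [0.3, 0.12]])  # hypothetical (IoU, DS) labels

pred = model(images, boxes)
loss = criterion(pred, targets)
loss.backward()                                   # S203: backpropagate and update
optimizer.step()
```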
The steps of the target detection quality evaluation module are as follows (an inference sketch in code is given after step S303):
Step S301: input the validation sample set to evaluate the discriminative perception model, and further optimize the deep training network according to the outputs to generate the target detection quality evaluation model;
Step S302: load the trained target detection quality evaluation model;
Step S303: input an unannotated image together with target-region coordinates covering some area of that image into the target detection quality evaluation model, which then outputs two values representing, respectively, the discriminative information richness of the target region and the expected IoU between the target region and the ground-truth region.
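Continuing the sketch above, inference on an unannotated image (steps S302 and S303) might look as follows; the checkpoint name, image path, and box coordinates are placeholders, and rescaling the box along with the image resize is an implementation assumption.

```python
import torch
import torchvision.transforms as T
from PIL import Image

preprocess = T.Compose([
    T.Resize((224, 224)), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

model = QualityNet()                                    # sketch class from above
model.load_state_dict(torch.load("quality_net.pth"))    # S302: load trained weights
model.eval()

img = Image.open("unlabeled.jpg").convert("RGB")
sx, sy = 224 / img.width, 224 / img.height              # box must follow the resize
box = torch.tensor([[40.0 * sx, 30.0 * sy, 200.0 * sx, 180.0 * sy]])

with torch.no_grad():                                    # S303: two outputs per region
    pred_iou, pred_ds = model(preprocess(img).unsqueeze(0), [box])[0]
print(f"expected IoU: {pred_iou:.3f}, discriminative richness (DS): {pred_ds:.3f}")
```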
By using the deep training network module, the target detection quality evaluation module, and the constructed training samples, the method provided by the invention effectively perceives the richness of target-attribute discriminative information in a detection region; in addition, it integrates the IoU index between the target region and the ground-truth region, achieving more comprehensive quality evaluation of the detection region with better generalization ability.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.

Claims (4)

1. A blind evaluation method for target detection quality with discriminant perception capability, characterized by comprising a deep training network module, a target detection quality evaluation module, and the construction of training samples, wherein:
the training samples are constructed by collecting public data sets into a sample set containing the IoU index and the target-region discriminative-information-richness quality evaluation index;
the deep training network module sets the loss function of the deep training network and adjusts the network's output parameters and forward propagation function to obtain a discriminative perception model;
the target detection quality evaluation module loads the discriminative perception model trained with the adjusted deep training network, performs the task of evaluating target detection quality, and detects the IoU between a given target region and the ground-truth region as well as the richness of discriminative information within the target region;
the step of constructing the training samples comprises:
step S101: collecting public data sets, obtaining ground-truth regions from the image annotation files, partitioning image regions by their IoU with the target ground-truth region using {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} as the standard values, and generating target-region coordinates with the corresponding IoU values;
step S102: generating the discriminative-information-richness quality evaluation index for the partitioned image regions;
step S103: generating a training sample set, a validation sample set, and a test sample set, where each sample contains image-region coordinates, the corresponding IoU index, and the discriminative-information-richness quality evaluation index;
the discriminative-information-richness quality evaluation index is computed as follows: (a) extracting the deep feature map of the input image with a classification-pretrained model, computing the covariance matrix of the deep features over all positions, and solving for its eigenvalues and eigenvectors; (b) taking the two eigenvectors with the largest eigenvalues as projection directions, computing the correlation between the feature at each position of the deep feature map and these eigenvectors, and generating an energy heat map; (c) mapping the target region onto the resulting two-dimensional matrix and computing the energy density of the mapped region, which characterizes the region's discriminative information richness and serves as the discriminative-information-richness quality score.
2. The blind evaluation method for target detection quality with discriminant perception capability according to claim 1, wherein the steps of the deep training network module comprise:
step S201: selecting the image-classification convolutional neural network AlexNet as the base network structure;
step S202: setting the forward propagation function of the deep training network so that the region coordinates in an input sample are mapped onto the feature map extracted from the original image, and the features of that feature-map region are learned;
step S203: setting a mean-squared-error loss function that compares the input sample's ground-truth values with the network's actual outputs to obtain a loss value, and back-propagating to adjust the network parameters based on this loss;
step S204: modifying the output dimension of the network's fully connected layer to 2, representing respectively the IoU between a given target region of the image under test and the ground-truth region, and the discriminative-information-richness quality evaluation index;
step S205: obtaining the discriminative perception model once training on the training sample set is complete.
3. The blind evaluation method for target detection quality with discriminant perception capability according to claim 1, wherein the target detection quality evaluation module comprises:
step S301: inputting the validation sample set to evaluate the discriminative perception model, and further optimizing the deep training network according to the outputs to generate the target detection quality evaluation model;
step S302: loading the trained target detection quality evaluation model;
step S303: inputting an unannotated image together with target-region coordinates covering some area of that image into the target detection quality evaluation model, which then outputs two values representing, respectively, the discriminative information richness of the target region and the expected IoU between the target region and the ground-truth region.
4. The blind evaluation method for target detection quality with discriminant perception capability according to claim 1, wherein five target regions are generated for each of the IoU values with the target ground-truth region: regions covering the upper-left, upper-right, lower-left, and lower-right portions of the original image's ground-truth region, plus a target region covering the complete ground-truth region.
CN201910896907.3A 2019-09-23 2019-09-23 Target detection quality blind evaluation method with discriminant perception capability Active CN110738132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910896907.3A CN110738132B (en) 2019-09-23 2019-09-23 Target detection quality blind evaluation method with discriminant perception capability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910896907.3A CN110738132B (en) 2019-09-23 2019-09-23 Target detection quality blind evaluation method with discriminant perception capability

Publications (2)

Publication Number Publication Date
CN110738132A (en) 2020-01-31
CN110738132B (en) 2022-06-03

Family

ID=69269362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910896907.3A Active CN110738132B (en) 2019-09-23 2019-09-23 Target detection quality blind evaluation method with discriminant perception capability

Country Status (1)

Country Link
CN (1) CN110738132B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325338B (en) * 2020-02-12 2023-05-05 暗物智能科技(广州)有限公司 Neural network structure evaluation model construction and neural network structure searching method
CN113642521B (en) * 2021-09-01 2024-02-09 东软睿驰汽车技术(沈阳)有限公司 Traffic light identification quality evaluation method and device and electronic equipment
CN113743332B (en) * 2021-09-08 2022-03-25 中国科学院自动化研究所 Image quality evaluation method and system based on universal vision pre-training model
CN116523566B (en) * 2023-06-30 2024-01-02 和元达信息科技有限公司 Pseudo-heat identification method and system based on Internet advertisement delivery


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127250A (en) * 2016-06-24 2016-11-16 深圳市唯特视科技有限公司 A kind of face method for evaluating quality based on three dimensional point cloud
CN106709568A (en) * 2016-12-16 2017-05-24 北京工业大学 RGB-D image object detection and semantic segmentation method based on deep convolution network
CN108648188A (en) * 2018-05-15 2018-10-12 南京邮电大学 A kind of non-reference picture quality appraisement method based on generation confrontation network
CN110189291A (en) * 2019-04-09 2019-08-30 浙江大学 A kind of general non-reference picture quality appraisement method based on multitask convolutional neural networks
CN110188833A (en) * 2019-06-04 2019-08-30 北京字节跳动网络技术有限公司 Method and apparatus for training pattern

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Weilong Hou et al.; "Blind Image Quality Assessment via Deep Learning"; IEEE Transactions on Neural Networks and Learning Systems; 2015-06-30; pp. 1275-1286 *
Qingbo Wu et al.; "Generic Proposal Evaluator: A Lazy Learning Strategy Toward Blind Proposal Quality Assessment"; IEEE Transactions on Intelligent Transportation Systems; 2018-01-31; pp. 306-319 *
Tarang Shan; "Measuring Object Detection models - mAP - What is Mean Average Precision?"; Zhihu column, https://zhuanlan.zhihu.com/p/37910324; 2018-01-27; pp. 1-12 *
Xiu-Shen Wei et al.; "Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming"; arXiv:1707.06397v1; 2017-07-20; pp. 1-16 *

Also Published As

Publication number Publication date
CN110738132A (en) 2020-01-31

Similar Documents

Publication Publication Date Title
CN110738132B (en) Target detection quality blind evaluation method with discriminant perception capability
CN107808143B (en) Dynamic gesture recognition method based on computer vision
CN110021425B (en) Comparison detector, construction method thereof and cervical cancer cell detection method
CN108052966A (en) Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique
CN112150821A (en) Lightweight vehicle detection model construction method, system and device
CN109284779A (en) Object detecting method based on the full convolutional network of depth
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN104820841B (en) Hyperspectral classification method based on low order mutual information and spectrum context waveband selection
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN112949408B (en) Real-time identification method and system for target fish passing through fish channel
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN111563408B (en) High-resolution image landslide automatic detection method with multi-level perception characteristics and progressive self-learning
CN112215217B (en) Digital image recognition method and device for simulating doctor to read film
CN114092697B (en) Building facade semantic segmentation method with attention fused with global and local depth features
CN109948637A (en) Object test equipment, method for checking object and computer-readable medium
CN112085840A (en) Semantic segmentation method, device, equipment and computer readable storage medium
CN113569724B (en) Road extraction method and system based on attention mechanism and dilation convolution
CN113033315A (en) Rare earth mining high-resolution image identification and positioning method
CN113033516A (en) Object identification statistical method and device, electronic equipment and storage medium
CN102509119B (en) Method for processing image scene hierarchy and object occlusion based on classifier
CN104616005A (en) Domain-self-adaptive facial expression analysis method
CN110853021B (en) Construction of detection classification model of pathological squamous epithelial cells
CN111967399A (en) Improved fast RCNN behavior identification method
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN114510594A (en) Traditional pattern subgraph retrieval method based on self-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant