CN112348013A

CN112348013A - Target detection method, target detection device, computer equipment and readable storage medium

Info

Publication number: CN112348013A
Application number: CN202011163151.0A
Authority: CN
Inventors: 肖尧
Original assignee: Shanghai Eye Control Technology Co Ltd
Current assignee: Shanghai Eye Control Technology Co Ltd
Priority date: 2020-10-27
Filing date: 2020-10-27
Publication date: 2021-02-09

Abstract

The application relates to a target detection method, a target detection device, a computer device and a readable storage medium. The method comprises the following steps: acquiring a target image, and acquiring a plurality of super pixel sets according to the target image; the target image comprises at least one target object, and the plurality of super-pixel sets are obtained by fusing a plurality of super-pixels corresponding to the target image; performing iterative fusion on the plurality of super-pixel sets, acquiring set distances between corresponding adjacent super-pixel sets based on the minimum super-pixel distance between the adjacent super-pixel sets in the iterative process, and fusing each adjacent super-pixel set according to each set distance; the set distance is used for representing the feature similarity between adjacent super-pixel sets; determining a plurality of target candidate frames corresponding to the target object from a plurality of initial candidate frames obtained by iterative fusion; the plurality of target candidate frames are used for determining a detection position frame of the target object in the target image. By adopting the method, the fusion accuracy of the super-pixel set and the target detection accuracy can be improved.

Description

Target detection method, target detection device, computer equipment and readable storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a target detection method, an apparatus, a computer device, and a readable storage medium.

Background

The target detection is a hot direction of computer vision and digital image processing, is widely applied to various fields of robot navigation, intelligent video monitoring, industrial detection, aerospace and the like, reduces the consumption of human capital through the computer vision, and has important practical significance.

A candidate frame corresponding to the target object is extracted from the image, and essential in the target detection process, a position frame of the target object in the image can be determined according to a plurality of candidate frames corresponding to the target object. In a traditional target detection method, after a superpixel set is obtained based on an image, average features in the superpixel set are used as features for representing the set, and set distances of adjacent superpixel sets are calculated by adopting the average features so as to perform fusion of the superpixel sets. For example, taking color characteristics as an example, for each super-pixel set, the average color in the super-pixel set is taken as the characteristic of the super-pixel set.

However, in the conventional target detection method, the set distance of the adjacent superpixel sets is calculated through the average feature of each superpixel set, and the fusion accuracy of the superpixel sets is often poor, so that the target detection accuracy is poor.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a target detection method, an apparatus, a computer device and a readable storage medium, which can improve the fusion accuracy of a super-pixel set, thereby improving the accuracy of target detection.

In a first aspect, an embodiment of the present application provides a target detection method, where the method includes:

acquiring a target image, and acquiring a plurality of super pixel sets according to the target image; the target image comprises at least one target object, and the plurality of super-pixel sets are obtained by fusing a plurality of super-pixels corresponding to the target image;

performing iterative fusion on the plurality of super-pixel sets, acquiring set distances between corresponding adjacent super-pixel sets based on the minimum super-pixel distance between the adjacent super-pixel sets in each iterative process, and fusing each adjacent super-pixel set according to each set distance; the set distance is used for characterizing feature similarity between the adjacent super-pixel sets;

determining a plurality of target candidate frames corresponding to the target object from a plurality of initial candidate frames obtained by iterative fusion; the plurality of target candidate frames are used for determining a detection position frame of the target object in the target image.

In one embodiment, the obtaining a set distance between corresponding neighboring superpixel sets based on a minimum superpixel distance between the neighboring superpixel sets comprises:

acquiring the maximum superpixel distance between adjacent superpixel sets;

acquiring high-complexity distances between corresponding adjacent superpixel sets based on the minimum superpixel distance between the adjacent superpixel sets, and acquiring low-complexity distances between the corresponding adjacent superpixel sets based on the maximum superpixel distance between the adjacent superpixel sets;

acquiring a set distance between corresponding adjacent superpixel sets according to a low complexity distance, a high complexity distance and preset weight constraint parameters between the adjacent superpixel sets;

wherein the low complexity distance is used for characterizing feature similarity between adjacent superpixel sets under the condition that the feature complexity of the superpixel sets is low; the high complexity distance is used for characterizing feature similarity between adjacent superpixel sets under the condition that the feature complexity of the superpixel sets is high complexity.

In one embodiment, the method further comprises:

acquiring a color and material characteristic distance between super pixels in a plurality of super pixels corresponding to the target image; the color and material characteristic distance is used for representing the characteristic similarity of the color characteristic and the material characteristic between a corresponding pair of super pixels;

and determining the minimum color and material characteristic distance in the adjacent super-pixel sets as the minimum super-pixel distance between the corresponding adjacent super-pixel sets, and determining the maximum color and material characteristic distance in the adjacent super-pixel sets as the maximum super-pixel distance between the corresponding adjacent super-pixel sets.

In one embodiment, the obtaining high complexity distances between corresponding neighboring superpixel sets based on a minimum superpixel distance between neighboring superpixel sets, and obtaining low complexity distances between corresponding neighboring superpixel sets based on a maximum superpixel distance between neighboring superpixel sets comprises:

obtaining a graph feature distance and an edge loss distance between adjacent super-pixel sets; the graph characteristic distance represents the distance between the super pixels with the nearest distance in the adjacent super pixel sets, and the edge loss distance is obtained based on the edge response graph corresponding to the target image;

calculating high complexity distances between corresponding adjacent superpixel sets according to the minimum superpixel distance between the adjacent superpixel sets and the image characteristic distance;

and calculating the low-complexity distance between the corresponding adjacent superpixel sets according to the maximum superpixel distance between the adjacent superpixel sets, the graph characteristic distance and the edge loss distance.

In one embodiment, the obtaining of the map feature distance and the edge loss distance between adjacent super-pixel sets comprises:

calculating edge loss values between a pair of superpixels in adjacent superpixel sets according to the edge response graph corresponding to the target image, and calculating edge loss distances between the corresponding adjacent superpixel sets according to the edge loss values corresponding to the adjacent superpixel sets;

and taking each super pixel corresponding to the target image as a node, constructing a connection graph, acquiring a super pixel graph distance between adjacent super pixels by adopting a shortest path algorithm, and calculating a graph characteristic distance between corresponding adjacent super pixel sets according to each super pixel graph distance corresponding to the adjacent super pixel sets.

In one embodiment, the acquiring a plurality of super-pixel sets according to the target image includes:

performing superpixel segmentation on the target image to obtain a plurality of superpixels corresponding to the target image;

calculating the characteristic distance of color and material, the distance of a superpixel graph and an edge loss value between adjacent superpixels;

and fusing the plurality of super pixels according to the color and material characteristic distance, the super pixel graph distance and the edge loss value between the adjacent super pixels by adopting a greedy algorithm to obtain the plurality of super pixel sets.

In one embodiment, the determining a plurality of target candidate boxes corresponding to the target object from a plurality of initial candidate boxes obtained by iterative fusion includes:

calculating an evaluation score corresponding to each initial candidate frame according to an edge loss value corresponding to a super pixel included in each initial candidate frame;

sorting the evaluation scores corresponding to the initial candidate boxes according to the order of the scores, and determining a plurality of target evaluation scores from the sorting result;

and determining the initial candidate frame corresponding to each target evaluation score as the target candidate frame.

In a second aspect, an embodiment of the present application provides an object detection apparatus, including:

the first acquisition module is used for acquiring a target image and acquiring a plurality of super pixel sets according to the target image; the target image comprises at least one target object, and the plurality of super-pixel sets are obtained by fusing a plurality of super-pixels corresponding to the target image;

the fusion module is used for performing iterative fusion on the plurality of super-pixel sets, acquiring set distances between corresponding adjacent super-pixel sets based on the minimum super-pixel distance between the adjacent super-pixel sets in each iterative process, and fusing each adjacent super-pixel set according to each set distance; the set distance is used for characterizing feature similarity between the adjacent super-pixel sets;

the first determining module is used for determining a plurality of target candidate frames corresponding to the target object from a plurality of initial candidate frames obtained by iterative fusion; the plurality of target candidate frames are used for determining a detection position frame of the target object in the target image.

In a third aspect, an embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method according to the first aspect when executing the computer program.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method according to the first aspect as described above.

The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:

in the conventional technology, the average feature of a superpixel set is used as the feature for characterizing the set, and if the complexity of a detected object is high, for example, a superpixel set includes a plurality of regions with distinct colors, then the use of the average color of the superpixel set is a property that cannot accurately characterize the region of the superpixel set, that is, cannot accurately characterize the actual feature of the superpixel set. Therefore, the set distance between the adjacent superpixel sets is calculated based on the average color of the adjacent superpixel sets, and the obtained set distance may be far different from the actual feature similarity between the adjacent superpixel sets, which may cause the false fusion of the adjacent superpixel sets, and the fusion accuracy is poor, thereby also affecting the accuracy of target detection.

In the embodiment of the application, a target image is obtained, a plurality of super-pixel sets are obtained according to the target image, then the super-pixel sets are subjected to iterative fusion, in each iterative process, a set distance between corresponding adjacent super-pixel sets is obtained based on the minimum super-pixel distance between the adjacent super-pixel sets, and each adjacent super-pixel set is fused according to each set distance, wherein the set distance is used for representing the feature similarity between the adjacent super-pixel sets; determining a plurality of target candidate frames corresponding to the target object from a plurality of initial candidate frames obtained by iterative fusion; the plurality of target candidate frames are used for determining a detection position frame of the target object in the target image. Therefore, the minimum superpixel distance between the adjacent superpixel sets represents the feature similarity of the most similar part between the adjacent superpixel sets, if the minimum superpixel distance is larger, no similar part exists between the adjacent superpixel sets, and if the minimum superpixel distance is smaller, a similar part exists between the adjacent superpixel sets.

Drawings

FIG. 1 is a schematic flow chart of a method for object detection in one embodiment;

FIG. 2 is a schematic diagram of a partial refinement of step S200 in another embodiment;

FIG. 3 is a schematic diagram illustrating a process of acquiring a minimum superpixel distance and a maximum superpixel distance between neighboring superpixel sets by a computer device according to another embodiment;

FIG. 4 is a diagram illustrating a detailed step of step S202 in another embodiment;

FIG. 5 is a schematic diagram of a partial refinement of step S100 in another embodiment;

FIG. 6 is a schematic flow chart diagram of a target detection method in another embodiment;

FIG. 7 is a block diagram of an embodiment of an object detection device;

FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The target detection method, the target detection device, the computer equipment and the readable storage medium provided by the embodiment of the application aim at solving the technical problems that in the prior art, the average characteristic of a superpixel set is taken as the characteristic for representing the set, the set distance of adjacent superpixel sets is calculated by adopting the average characteristic to fuse the superpixel sets, and the target detection accuracy is poor due to poor fusion accuracy of the superpixel sets. The technical solution of the present application will be specifically described below by way of examples with reference to the accompanying drawings. The following specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

It should be noted that, in the object detection method provided in the embodiment of the present application, an execution subject may be an object detection apparatus, and the object detection apparatus may be implemented as part or all of a computer device by software, hardware, or a combination of software and hardware, and the computer device may be a server. In the following method embodiments, the execution subject is a computer device as an example. It can be understood that the target detection method provided by the following method embodiments may also be applied to a terminal, may also be applied to a system including the terminal and a server, and is implemented through interaction between the terminal and the server.

In one embodiment, as shown in fig. 1, there is provided a target detection method comprising the steps of:

step S100, acquiring a target image, and acquiring a plurality of super pixel sets according to the target image.

In this embodiment of the application, the target image may be an image to be detected, where the target image includes at least one target object, and the target object is an object to be detected. The computer device may obtain the target image, which may be a frame of image extracted from video data, such as from monitoring video data, or an image collected by the computer device through an image collecting component, and is not limited specifically herein.

The computer equipment acquires a plurality of super pixel sets according to the target image, wherein the super pixel sets are obtained by fusing a plurality of super pixels corresponding to the target image through the computer equipment. As an embodiment, the computer device may perform superpixel segmentation on the target image by using a superpixel segmentation algorithm to obtain a plurality of superpixels, and then fuse the plurality of superpixels to obtain a plurality of superpixel sets, where the superpixel segmentation algorithm may be, for example, a slic (simple Linear Iterative cluster) algorithm.

Step S200, performing iterative fusion on a plurality of super-pixel sets, acquiring set distances between corresponding adjacent super-pixel sets based on the minimum super-pixel distance between the adjacent super-pixel sets in each iterative process, and fusing each adjacent super-pixel set according to each set distance.

In this embodiment, the computer device may calculate a superpixel distance for each two of a plurality of superpixels obtained by segmenting the target image, where the superpixel distance between a pair of superpixels may be a characteristic distance between the pair of superpixels, and the characteristic distance may be, for example, a color and material characteristic distance, that is, a characteristic similarity between a color characteristic and a material characteristic.

The computer device obtains a minimum superpixel distance between adjacent superpixel sets in each iterative fusion process, wherein the minimum superpixel distance is a superpixel distance between a pair of most similar superpixels between the adjacent superpixel sets. For example, i is the set of super-pixels S_mJ is a set S of superpixels_nAnd the superpixel distance between the superpixel i and the superpixel j is S_mAnd S_nThe computer device determines the superpixel distance between the superpixel i and the superpixel j as the adjacent superpixel set S_mAnd S_nThe minimum superpixel distance in between.

In one possible implementation, the computer device may obtain a set distance between the adjacent super-pixel sets by combining the minimum super-pixel distance between the adjacent super-pixel sets and an image feature distance between the adjacent super-pixel sets, where the image feature distance may be obtained by the computer device by using each super-pixel corresponding to the target image as a node to construct a connection graph, obtaining a super-pixel image distance between the adjacent super-pixels by using a shortest path algorithm, and determining a super-pixel image distance between the super-pixels with the most similarity between the two sets as the image feature distance between the adjacent super-pixel sets.

In another possible implementation, the computer device may further obtain a set distance between corresponding neighboring super-pixel sets by combining a minimum super-pixel distance between neighboring super-pixel sets and a maximum super-pixel distance between neighboring super-pixel sets, where the maximum super-pixel distance between neighboring super-pixel sets characterizes a feature similarity of a least similar portion between neighboring super-pixel sets, and if the maximum super-pixel distance is smaller, the computer device characterizes that there is no large fluctuation inside the neighboring super-pixel sets, that is, the similarity is higher. Alternatively, the computer device may also determine a minimum superpixel distance between neighboring superpixel sets as a set distance between neighboring superpixel sets, and so on. The manner in which the computer device obtains the set distance between corresponding neighboring superpixel sets based on the minimum superpixel distance between neighboring superpixel sets is not particularly limited herein.

The set distance between adjacent superpixel sets is used to characterize the feature similarity between adjacent superpixel sets. The computer device fuses the adjacent superpixel sets according to the set distances, for example, a greedy algorithm may be adopted to fuse the most similar pair of superpixel sets, that is, the adjacent superpixel set with the minimum set distance is fused, and the fused region is output as an initial candidate box.

Step S300, determining a plurality of target candidate frames corresponding to the target object from a plurality of initial candidate frames obtained by iterative fusion.

The computer device determines a plurality of target candidate frames corresponding to the target object from a plurality of initial candidate frames obtained by iterative fusion, and may calculate evaluation scores for the plurality of initial candidate frames respectively, sort the plurality of initial candidate frames in a descending order of the scores, select a preset number of initial candidate frames ranked in the top, and determine the plurality of initial candidate frames corresponding to the target object.

Wherein a plurality of target candidate frames are used for determining the detected position frame of the target object in the target image, i.e. the computer device may determine the exact position of the target object in the target image based on the plurality of target candidate frames.

In the embodiment of the application, a target image is obtained, a plurality of super-pixel sets are obtained according to the target image, then the super-pixel sets are subjected to iterative fusion, in each iterative process, a set distance between corresponding adjacent super-pixel sets is obtained based on the minimum super-pixel distance between the adjacent super-pixel sets, and each adjacent super-pixel set is fused according to each set distance, wherein the set distance is used for representing the feature similarity between the adjacent super-pixel sets; determining a plurality of target candidate frames corresponding to the target object from a plurality of initial candidate frames obtained by iterative fusion; the plurality of target candidate frames are used for determining a detection position frame of the target object in the target image. Therefore, the minimum superpixel distance between the adjacent superpixel sets represents the feature similarity of the most similar parts between the adjacent superpixel sets, if the minimum superpixel distance is larger, no similar part exists between the adjacent superpixel sets, and if the minimum superpixel distance is smaller, a similar part exists between the adjacent superpixel sets, so that the minimum superpixel distance can accurately measure whether the actual similar parts exist between the adjacent superpixel sets; according to the method and the device, for the super-pixel sets with higher complexity, the set distance is obtained based on the minimum super-pixel distance between the adjacent super-pixel sets, namely the actual feature similarity between the adjacent super-pixel sets is accurately measured based on the minimum super-pixel distance, the defect caused by the fact that the set distance between the adjacent super-pixel sets is calculated by average color for the super-pixel sets with higher complexity is avoided, the accuracy of the obtained feature similarity between the adjacent super-pixel sets can be improved, the super-pixel sets are favorably accurately fused, and the accuracy of target detection is further improved.

In one embodiment, based on the embodiment shown in fig. 1, the present embodiment relates to a process of how a computer device obtains a set distance between corresponding neighboring superpixel sets based on a minimum superpixel distance between the neighboring superpixel sets. Referring to fig. 2, the process includes step S201, step S202, and step S203:

step S201, a maximum superpixel distance between adjacent superpixel sets is obtained.

In the embodiment of the application, the computer device obtains the maximum superpixel distance between adjacent superpixel sets, which is similar to the minimum superpixel distance, where the maximum superpixel distance may be the superpixel distance between a pair of least similar superpixels between adjacent superpixel sets, and the superpixel distance may be a color and material characteristic distance.

As an embodiment, set of neighboring superpixels S_mAnd S_nMaximum superpixel distance D between_maxThe (m, n) computer device may be determined using equation 1 as follows:

D_max(m,n)＝max{d_ct(i,j)|i∈S_m,j∈S_nformula 1

Wherein i is the super-pixel set S_mJ is a set S of superpixels_nAny one of the super pixels in (b), d_ct(i, j) is the superpixel distance between superpixel i and superpixel j, i.e. the color texture feature distance.

Likewise, a set S of neighboring superpixels_mAnd S_nMinimum superpixel distance D between_min(m, n) can be determined using the following equation 2:

D_min(m,n)＝min{d_ct(i,j)|i∈S_m,j∈S_nequation 2

In this embodiment, as an implementation manner, the computer device may execute step S401 and step S402 shown in fig. 3 to obtain a minimum superpixel distance and a maximum superpixel distance between adjacent superpixel sets:

step S401, obtaining a color and material characteristic distance between each super pixel in a plurality of super pixels corresponding to the target image.

In the embodiment of the application, the computer device performs calculation of a superpixel distance on a plurality of superpixels obtained by segmenting a target image, wherein the superpixel distance is a color and material characteristic distance. The color and material characteristic distance is used for representing the characteristic similarity of the color characteristic and the material characteristic between the corresponding pair of super pixels.

As an embodiment, for color features, the computer device may build a normalized one-dimensional color histogram h for each superpixel_cEach color channel is divided into 20 color grades, namely each color channel is divided into 20 grids, the computer device counts the grid in which the pixel value of each pixel in the superpixel falls, a 60-dimensional vector is obtained after counting, and the number of the corresponding pixel values is represented in each grid. For the material characteristics, the computer device can calculate the Gaussian gradients in eight directions for each superpixel to obtain a normalized material histogram h_tFor example, the computer device calculates the gradient of each pixel in the superpixel by using the first direction, constructs a histogram, and then combines the histograms of the eight directions together to obtain a normalized material histogram, wherein the eight directions may be the directions of the rest eight lattices in the nine-square with the superpixel as the center relative to the center.

After the color characteristics and the material characteristics of each super pixel are obtained through calculation by computer equipment, the color and material characteristic distance d between the super pixel i and the super pixel j is calculated by adopting the following formula 3_ct(i,j)：

d_ct(i,j)＝||h_c(i)-h_c(j)||+||h_t(i)-h_t(j) Equation 3

Wherein h is_c(i) Is the color of the super pixel iColor characteristics, h_c(j) Is a color feature of a super-pixel j, h_t(i) Is a material characteristic of the super pixel i, h_t(j) Is a material characteristic of the super pixel j.

Thus, the computer device obtains a color material characteristic distance between a pair of super pixels.

Step S402, determining the minimum color and material characteristic distance in the adjacent superpixel sets as the minimum superpixel distance between the corresponding adjacent superpixel sets, and determining the maximum color and material characteristic distance in the adjacent superpixel sets as the maximum superpixel distance between the corresponding adjacent superpixel sets.

As shown in equations 1 and 2, for a set of neighboring superpixels, the computer device determines a minimum color material characteristic distance between the set of neighboring superpixels as a minimum superpixel distance between the set of neighboring superpixels, and the computer device determines a maximum color material characteristic distance between the set of neighboring superpixels as a maximum superpixel distance between the set of neighboring superpixels.

The maximum superpixel distance and the minimum superpixel distance can be used to measure the distance between two superpixel sets in the case of low complexity and high complexity, respectively. If the maximum superpixel distance is small, the similarity of the least similar parts in the two superpixel sets is high, no large fluctuation exists in the two superpixel sets, and the method is suitable for being used in the low-complexity situation. Conversely, if the minimum superpixel distance is small, at least one pair of superpixels in the two superpixel sets are very similar, which is more suitable for high complexity.

Step S202, high complexity distance between corresponding adjacent superpixel sets is obtained based on minimum superpixel distance between adjacent superpixel sets, and low complexity distance between corresponding adjacent superpixel sets is obtained based on maximum superpixel distance between adjacent superpixel sets.

The low complexity distance is used for representing the feature similarity between adjacent superpixel sets under the condition that the feature complexity of the superpixel sets is low; the high complexity distance is used to characterize the feature similarity between neighboring superpixel sets where the feature complexity of a superpixel set is high.

In one possible implementation of step S202, as shown in fig. 4, step S202 may include step S2021, step S2022, and step S2023:

step S2021, map feature distances and edge loss distances between adjacent superpixel sets are obtained.

The image characteristic distance represents the distance between the super pixels with the nearest distance in the adjacent super pixel set, and the edge loss distance is obtained based on the edge response image corresponding to the target image.

As an embodiment, the computer device may calculate an edge loss value between a pair of superpixels in the neighboring superpixel sets according to the edge response map corresponding to the target image, and calculate an edge loss distance between the corresponding neighboring superpixel sets according to each edge loss value corresponding to the neighboring superpixel sets.

Edge loss measures the strength of the edge response at the intersection of two regions. For the target image, the computer device firstly generates an Edge response image E by adopting a Structure Edge algorithm, and then the Edge loss value between the super pixel i and the super pixel j is the Edge response accumulated on the boundary pixel and then the boundary length is normalized. Assume that the set of boundary pixels is l_i,jThen the edge loss value d between superpixel i and superpixel j_e(i, j) is as defined

Formula 4:

where p is the index of the pixel.

The super pixel set S_mAnd S_nEdge loss distance D therebetween_e(m, n) is shown in equation 5:

thus, the computer device calculates the edge loss distance between adjacent superpixel sets.

The computer device takes each super-pixel corresponding to the target image as a node, constructs a connection graph, namely, an edge is connected between adjacent super-pixels, the weight of the edge is set to be 1, and a shortest path algorithm is adopted to obtain the super-pixel graph distance between the adjacent super-pixels, for example, the computer device adopts a Floyd-Warshall algorithm to calculate the super-pixel graph distance d between the adjacent super-pixels i and j_g(i,j)。

The computer device calculates the graph characteristic distance between the corresponding adjacent superpixel sets according to the superpixel graph distance corresponding to the adjacent superpixel sets, as an implementation mode, the adjacent superpixel sets S_mAnd S_nGraph characteristic distance D between_g(m, n) may be two adjacent sets S of superpixels_mAnd S_nThe graph feature distance between the most similar superpixels in between. Thus, the computer device obtains the map feature distances between adjacent sets of superpixels.

Step S2022, calculating a high complexity distance between corresponding neighboring superpixel sets according to the minimum superpixel distance between neighboring superpixel sets and the graph feature distance.

Based on the complexity of the different superpixel sets, the computer device calculates high complexity distances and low complexity distances between corresponding neighboring superpixel sets. As an embodiment, the computer device may calculate the set S of neighboring superpixels using equation 6 below_mAnd S_nHigh complexity distance D between_H：

D_H＝D_min(m,n)+bD_g(m, n) formula 6

Wherein D is_g(m, n) as a spatial constraint term, 0<b<1, thereby weakening the spatial constraint relationship.

Step S2023, calculating a low complexity distance between corresponding neighboring superpixel sets according to a maximum superpixel distance between neighboring superpixel sets, a graph feature distance, and an edge loss distance.

As an embodiment, calculateThe computer device can calculate the neighboring super-pixel set S by using the following formula 7_mAnd S_nLow complexity distance D between_L：

D_L(m,n)＝D_max(m,n)+D_e(m,n)+D_g(m, n) formula 7

Wherein D is_g(m, n) as a spatial constraint term. At low complexity, it is preferable to combine adjacent sets of superpixels first.

Step S203, acquiring a set distance between corresponding adjacent superpixel sets according to the low complexity distance and the high complexity distance between the adjacent superpixel sets and a preset weight constraint parameter.

As an embodiment, the computer device may calculate the set S of neighboring superpixels using equation 8 as follows_mAnd S_nSet distance D between_A：

D_A＝ρ_m,nD_L+(1-ρ_m,n)D_H+ηD_sEquation 8

Where ρ is_m,nAnd η D_sAre all weight constraint parameters. Eta may be set to 2, D_sThe area constraint term can ensure that small sets of superpixels are fused first for the area constraint term. For example, using r to represent the relative area of the superpixel set (i.e., the ratio of pixels in the set to the total pixels of the target image), then:

D_s(m,n)＝r_m+r_nequation 9

Complexity factor ρ_m,nCharacterizing a set S of superpixels_mAnd S_nLevel of complexity of (1), let T_m,T_nIs a set of pixels S_mAnd S_nThe number of super pixels, T is the total number of super pixels, then rho_m,nComprises the following steps:

where α represents a complexity level, λ represents a boundary between low complexity and high complexity, λ may be 5, for example, and the parameter σ may affect smoothness of complexity change, and σ may be set to 0.1.

Therefore, the computer equipment acquires the set distance between the corresponding adjacent superpixel sets according to the low complexity distance and the high complexity distance between the adjacent superpixel sets and the preset weight constraint parameter.

In the embodiment of the application, the adaptive complexity distance of the adjacent superpixel sets can be realized through the low complexity distance and the high complexity distance, namely, the calculation of the set distance is realized, therefore, even for the sets of the superpixel sets with different complexities, the accurate set distance of the adaptive complexity between the adjacent superpixel sets can be obtained through the formula 8, so that the fusion accuracy of the superpixel sets can be improved under the condition that the complexities of the superpixel sets are different, the fusion strategy is selected in a self-adaptive mode by considering the target complexity, the fusion route is more reasonable, the generated target candidate frame is more accurate and diversified, and the accuracy of target detection is improved.

In the embodiment of the application, through the distance measurement of the self-adaptive complexity, parameters can be self-adaptively adjusted according to the complexity of the super-pixel set, namely the complexity of the target object, so that the super-pixel set with low complexity and high complexity uses different distance similarities, namely the set distance, and further the strategy and standard of super-pixel set fusion are changed, and the quality of the target candidate frame is improved.

In one embodiment, based on the embodiment shown in fig. 1, the present embodiment relates to a process of how a computer device obtains a plurality of super pixel sets according to a target image. As shown in fig. 5, the process may include step S101, step S102, and step S103:

step S101, performing superpixel segmentation on the target image to obtain a plurality of superpixels corresponding to the target image.

Step S102, calculating the color and material characteristic distance between adjacent superpixels, the superpixel graph distance and the edge loss value.

And S103, fusing the multiple superpixels according to the color and material characteristic distance, the superpixel graph distance and the edge loss value between every two adjacent superpixels by adopting a greedy algorithm to obtain multiple superpixel sets.

In this embodiment, the computer device may perform superpixel segmentation on the target image by using a superpixel segmentation algorithm to obtain a plurality of superpixels, then fuse the plurality of superpixels, and obtain a plurality of superpixel sets after the fusion, where the superpixel segmentation algorithm may be, for example, an slic (simple Linear Iterative cluster) algorithm.

The computer equipment adopts the formula 3 in the embodiment to calculate the characteristic distance of the color and the material between the adjacent super pixels; the computer device takes each super pixel corresponding to the target image as a node, constructs a connection graph, namely, an edge is connected between adjacent super pixels, the weight of the edge is set to be 1, and a shortest path algorithm is adopted to obtain the super pixel graph distance between the adjacent super pixels; the computer device calculates the edge loss value between adjacent superpixels using equation 4 in the above embodiment. For a detailed implementation, please refer to the above embodiments, which are not described herein again.

The computer equipment adopts a greedy algorithm, according to color and material characteristic distance between each adjacent superpixel, superpixel graph distance and edge loss value, fuse a plurality of superpixels, for example, select the most similar superpixel to fuse, obtain a plurality of superpixel sets, therefore, combine color and material characteristic distance between the adjacent superpixels, superpixel graph distance and edge loss value to fuse a plurality of superpixels, compare in adopting the mode of Euclidean distance to fuse a plurality of superpixels among the conventional art, the accuracy that superpixel fuses can be promoted to this embodiment, provide reliable basis for the fusion of superpixel sets.

In one embodiment, on the basis of the embodiment shown in fig. 1, this embodiment relates to a process of how a computer device determines a plurality of target candidate boxes corresponding to a target object from a plurality of initial candidate boxes obtained by iterative fusion. Referring to fig. 6, step 300 may include step S301, step S302, and step S303:

step S301, calculating the evaluation score corresponding to each initial candidate frame according to the edge loss value corresponding to the superpixel included in each initial candidate frame.

Iterative fusion results in a frame of the initial candidate frames that contains a large amount of redundancy, such as a single superpixel or background region. To remove these redundant boxes that are unlikely to contain the object, the computer device calculates an edge response-based score, i.e., an evaluation score, for each initial candidate box, and if the edge response at the periphery of the initial candidate box is large, indicating that the initial candidate box has a distinct surrounding edge, it is more likely to contain the foreground object.

In this embodiment of the present application, for the initial candidate box p, the computer device may calculate an evaluation score (p) corresponding to the initial candidate box using the following formula 12:

where k is set to less than 1 to favor larger windows, the initial candidate frame p is formed by the set of superpixels S_pIn the composition, the edge response is calculated using the edge loss of the region boundary, and the evaluation score is obtained by summing and normalizing the edge losses on the peripheral boundary.

Step S302, the evaluation scores corresponding to the initial candidate boxes are sorted according to the order of the scores, and a plurality of target evaluation scores are determined from the sorting result.

The computer device sorts the evaluation scores corresponding to the initial candidate boxes in the order of the scores, for example, the evaluation scores corresponding to the initial candidate boxes may be sorted from large to small, and 1000 target evaluation scores ranked at the top, i.e., with higher scores, may be determined from the sorted results.

Step S303, determining the initial candidate frame corresponding to each target evaluation score as a target candidate frame.

And the computer equipment determines the initial candidate frame corresponding to each target evaluation score as a target candidate frame, and the plurality of target candidate frames are used for determining the detection position frame of the target object in the target image by the computer equipment.

According to the new sorting method designed by the embodiment of the application, the edge response of the peripheral part of the initial candidate frame is considered, the normalized edge response is calculated to serve as the evaluation score, the initial candidate frame more likely to contain objects is arranged in front, a large number of redundant frames are eliminated, and the calculation speed of computer equipment and the quality of the target candidate frame are improved.

Tests are carried out on a PASCAL VOC2012 data set, the threshold value of the IOU is set to be 0.5, and MABO, AUC and recall rate are used as indexes, experiments show that when 2000 candidate boxes are output by the target detection method in the embodiment of the application, MABO reaches 0.84, AUC reaches 0.647, and the recall rate reaches 96%, so that the quality of the target candidate boxes and the accuracy of target detection are greatly improved.

It should be understood that although the various steps in the flow charts of fig. 1-6 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least some of the steps in fig. 1-6 may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or at least some of the other steps.

In one embodiment, as shown in fig. 7, there is provided an object detection apparatus including:

a first obtaining module 10, configured to obtain a target image, and obtain a plurality of superpixel sets according to the target image; the target image comprises at least one target object, and the plurality of super-pixel sets are obtained by fusing a plurality of super-pixels corresponding to the target image;

a fusion module 20, configured to perform iterative fusion on the plurality of superpixel sets, obtain a set distance between corresponding neighboring superpixel sets based on a minimum superpixel distance between neighboring superpixel sets in each iteration process, and fuse each neighboring superpixel set according to each set distance; the set distance is used for characterizing feature similarity between the adjacent super-pixel sets;

a first determining module 30, configured to determine, from a plurality of initial candidate frames obtained through iterative fusion, a plurality of target candidate frames corresponding to the target object; the plurality of target candidate frames are used for determining a detection position frame of the target object in the target image.

Optionally, the fusion module 20 includes:

a first acquisition unit for acquiring a maximum superpixel distance between adjacent superpixel sets;

a second obtaining unit, configured to obtain a high-complexity distance between corresponding neighboring superpixel sets based on a minimum superpixel distance between neighboring superpixel sets, and obtain a low-complexity distance between corresponding neighboring superpixel sets based on a maximum superpixel distance between neighboring superpixel sets;

the third acquisition unit is used for acquiring the set distance between corresponding adjacent superpixel sets according to the low complexity distance and the high complexity distance between the adjacent superpixel sets and a preset weight constraint parameter;

Optionally, the second obtaining unit is specifically configured to obtain a graph feature distance and an edge loss distance between adjacent super-pixel sets; the graph characteristic distance represents the distance between the super pixels with the nearest distance in the adjacent super pixel sets, and the edge loss distance is obtained based on the edge response graph corresponding to the target image; calculating high complexity distances between corresponding adjacent superpixel sets according to the minimum superpixel distance between the adjacent superpixel sets and the image characteristic distance; and calculating the low-complexity distance between the corresponding adjacent superpixel sets according to the maximum superpixel distance between the adjacent superpixel sets, the graph characteristic distance and the edge loss distance.

Optionally, the apparatus further comprises:

the second acquisition module is used for acquiring the color and material characteristic distance between the super pixels in the plurality of super pixels corresponding to the target image; the color and material characteristic distance is used for representing the characteristic similarity of the color characteristic and the material characteristic between a corresponding pair of super pixels;

and the second determining module is used for determining the minimum color and material characteristic distance in the adjacent superpixel sets as the minimum superpixel distance between the corresponding adjacent superpixel sets, and determining the maximum color and material characteristic distance in the adjacent superpixel sets as the maximum superpixel distance between the corresponding adjacent superpixel sets.

Optionally, the second obtaining unit is specifically configured to calculate an edge loss value between a pair of super pixels in adjacent super pixel sets according to the edge response graph corresponding to the target image, and calculate an edge loss distance between corresponding adjacent super pixel sets according to each edge loss value corresponding to the adjacent super pixel sets; and taking each super pixel corresponding to the target image as a node, constructing a connection graph, acquiring a super pixel graph distance between adjacent super pixels by adopting a shortest path algorithm, and calculating a graph characteristic distance between corresponding adjacent super pixel sets according to each super pixel graph distance corresponding to the adjacent super pixel sets.

Optionally, the first obtaining module 10 includes:

the segmentation unit is used for performing superpixel segmentation on the target image to obtain the plurality of superpixels corresponding to the target image;

the first calculation unit is used for calculating the color and material characteristic distance, the superpixel graph distance and the edge loss value between adjacent superpixels;

and the fusion unit is used for fusing the plurality of super pixels according to the color material characteristic distance, the super pixel graph distance and the edge loss value between the adjacent super pixels by adopting a greedy algorithm to obtain the plurality of super pixel sets.

Optionally, the first determining module 30 includes:

a second calculating unit, configured to calculate an evaluation score corresponding to each of the initial candidate frames according to an edge loss value corresponding to a super pixel included in each of the initial candidate frames;

the sorting unit is used for sorting the evaluation scores corresponding to the initial candidate frames according to the order of the scores and determining a plurality of target evaluation scores from the sorting result;

and the determining unit is used for determining the initial candidate frame corresponding to each target evaluation score as the target candidate frame.

For specific limitations of the target detection device, reference may be made to the above limitations of the target detection method, which are not described herein again. The modules in the target detection device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data of the target detection method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object detection.

Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

In one embodiment, the processor, when executing the computer program, further performs the steps of:

acquiring the maximum superpixel distance between adjacent superpixel sets;

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed further performs the steps of:

acquiring the maximum superpixel distance between adjacent superpixel sets;

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical storage, or the like. Volatile Memory can include Random Access Memory (RAM) or external cache Memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of object detection, the method comprising:

2. The method of claim 1, wherein obtaining a set distance between corresponding neighboring superpixel sets based on a minimum superpixel distance between the neighboring superpixel sets comprises:

acquiring the maximum superpixel distance between adjacent superpixel sets;

3. The method of claim 2, further comprising:

4. The method of claim 2, wherein obtaining high complexity distances between corresponding neighboring superpixel sets based on a minimum superpixel distance between neighboring superpixel sets, and obtaining low complexity distances between corresponding neighboring superpixel sets based on a maximum superpixel distance between neighboring superpixel sets comprises:

5. The method of claim 4, wherein obtaining a map feature distance and an edge loss distance between neighboring superpixel sets comprises:

6. The method of claim 1, wherein said obtaining a plurality of sets of superpixels from said target image comprises:

7. The method of claim 1, wherein determining a plurality of target candidate boxes corresponding to the target object from a plurality of initial candidate boxes obtained from iterative fusion comprises:

8. An object detection apparatus, characterized in that the apparatus comprises:

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.