WO2019080743A1 - Target detection method and apparatus, and computer device - Google Patents

Target detection method and apparatus, and computer device

Info

Publication number
WO2019080743A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
vertex
confidence
map
preset
Prior art date
Application number
PCT/CN2018/110394
Other languages
French (fr)
Chinese (zh)
Inventor
宋涛 (Song Tao)
谢迪 (Xie Di)
浦世亮 (Pu Shiliang)
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority to EP18871198.0A (granted as EP3702957B1)
Priority to US16/758,443 (granted as US11288548B2)
Publication of WO2019080743A1

Classifications

    • G06V 10/255 — Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06F 18/00 — Pattern recognition
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217 — Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/22 — Matching criteria, e.g. proximity measures
    • G06N 3/04 — Neural networks: Architecture, e.g. interconnection topology
    • G06N 3/045 — Neural networks: Combinations of networks
    • G06N 3/08 — Neural networks: Learning methods
    • G06V 10/24 — Aligning, centring, orientation detection or correction of the image
    • G06V 10/426 — Global feature extraction for representing the structure or shape of an object: Graphical representations
    • G06V 10/764 — Recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/82 — Recognition using pattern recognition or machine learning: neural networks
    • G06V 20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • The present application relates to the field of machine vision technology, and in particular to a target detection method, apparatus, and computer device.
  • Target detection can be defined as determining whether a specified target is present in an input image or video and, if so, outputting the position information of the specified target in that image or video.
  • Commonly used target detection methods mainly include the background difference method, the frame difference method, the optical flow method, template matching, and machine-learning-based methods.
  • The first four are conventional image-based detection methods, which are susceptible to changes in illumination, color, and pose.
  • The machine-learning-based target detection method, which learns the different variations of the specified target from a sample set, has better robustness.
  • In such a method, a training sample set is first constructed, and a convolutional neural network model is obtained by training on the training sample set.
  • During target detection, the image to be detected is input into the trained convolutional neural network model to obtain candidate frames and confidence levels corresponding to the specified target; non-maximum suppression and threshold screening are then performed to determine the specified target in the image to be detected.
  • In some scenarios, however, the distribution of targets is relatively dense. For example, in crowded scenes, pedestrian targets may overlap, so the candidate frames obtained by the machine-learning-based method may overlap one another. Non-maximum suppression of overlapping candidate frames may then discard the candidate frame corresponding to a real specified target, resulting in missed detections and a certain detection error.
  • The purpose of the embodiments of the present application is to provide a target detection method, apparatus, and computer device to improve the accuracy of target detection.
  • The specific technical solutions are as follows:
  • An embodiment of the present application provides a target detection method, where the method includes:
  • matching the upper and lower vertices, and determining the connection with the largest associated field value as the specified target.
  • An embodiment of the present application provides a target detecting apparatus, where the apparatus includes:
  • a first acquiring module configured to acquire an image to be detected collected by an image collector;
  • a first generating module configured to input the image to be detected into a trained full convolutional neural network, and to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map;
  • a target determining module configured to determine, by applying a preset target determining method to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, at least one upper vertex target and at least one lower vertex target in the image to be detected;
  • a first calculating module configured to map each upper vertex target and each lower vertex target to the target upper and lower vertex associated field map, and to calculate, for a first vertex target, the associated field value of the connection between the first vertex target and each second vertex target, wherein, if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target;
  • a matching module configured to determine, according to the associated field values of the connections between the first vertex target and each second vertex target, by matching the upper and lower vertices, the connection with the largest associated field value as the specified target.
  • An embodiment of the present application provides a storage medium storing executable code, where the executable code, when executed, performs the target detection method provided by the first aspect of the embodiments of the present application.
  • An embodiment of the present application provides an application program for performing the target detection method provided by the first aspect of the embodiments of the present application.
  • An embodiment of the present application provides a computer device, including an image collector, a processor, and a storage medium, where:
  • the image collector is configured to collect an image to be detected;
  • the storage medium is configured to store executable code; and
  • the processor, when executing the executable code stored on the storage medium, implements the target detection method provided by the first aspect.
  • In the embodiments of the present application, the acquired image to be detected is input into the trained full convolutional neural network, and the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map are generated. The upper vertex targets and lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively. The upper vertex targets and lower vertex targets are then mapped to the target upper and lower vertex associated field map, and the associated field value of the connection between each first vertex target and each second vertex target is calculated. Finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
  • In this way, the upper and lower vertices of the specified target can be extracted, connections between upper vertices and lower vertices can be established by mapping, and the matched upper-lower vertex connections can be taken as the specified targets.
  • Because the specified target is represented by a line, the problem of overlapping candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
  • FIG. 1 is a schematic flowchart of a target detecting method according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a full convolutional neural network according to an embodiment of the present application.
  • FIG. 3 is another schematic flowchart of a target detecting method according to an embodiment of the present application.
  • FIG. 4 shows a true value map of the target upper vertex confidence, a true value map of the target lower vertex confidence, and a true value map of the target upper and lower vertex associated fields, extracted from an image to be detected according to an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of another full convolutional neural network according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a pedestrian detection result according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a target detecting apparatus according to an embodiment of the present application.
  • FIG. 8 is another schematic structural diagram of an object detecting apparatus according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • The embodiments of the present application provide a target detection method, apparatus, and computer device.
  • An execution subject of the target detection method provided by an embodiment of the present application may be a computer device equipped with a core processing chip, such as a camera having image processing capability, an image processor, or the like.
  • The target detection method provided by an embodiment of the present application may be implemented by at least one of software, a hardware circuit, and a logic circuit disposed in the execution subject.
  • A target detection method provided by an embodiment of the present application may include the following steps:
  • The image collector may be a still camera or a video camera, although the image collector is not limited thereto. If the image collector is a video camera, the camera captures video over a period of time, and the image to be detected may be any frame of that video.
  • The full convolutional neural network has the ability to automatically extract features of the upper vertex target and the lower vertex target of the specified target, and the network parameters of the full convolutional neural network can be obtained through sample training. Therefore, the trained full convolutional neural network can quickly recognize the upper vertex target and the lower vertex target of the specified target.
  • The full convolutional neural network is composed of a plurality of convolution layers and a plurality of downsampling layers. The acquired image to be detected is input into the full convolutional neural network, which performs feature extraction on the upper vertex target and the lower vertex target of the specified target in the image to be detected, yielding the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map of the image to be detected.
  • The target upper vertex confidence distribution map and the target lower vertex confidence distribution map can be understood as distribution maps of the probability that a detected target is the upper vertex or the lower vertex of the specified target.
  • Taking pedestrian detection as an example, the target upper vertex confidence distribution map is a probability distribution map of the detected target being the top of a pedestrian's head,
  • and the target lower vertex confidence distribution map is a probability distribution map of the detected target being a pedestrian's feet.
  • Each pixel in the target upper and lower vertex associated field map represents the degree of association, at that position, with the upper vertex target or the lower vertex target of the specified target.
  • The parameters in the target upper vertex confidence distribution map and the target lower vertex confidence distribution map may be the specific probabilities that each target in an identified region is the upper vertex target or the lower vertex target of the specified target, where the identified region is a region related to the position and size of the target; in general, the area of the region may be greater than or equal to the actual size of the target.
  • Alternatively, the pixel value of each pixel may be used to represent the magnitude of the probability: the larger the pixel values in a region, the greater the probability that the target in that region is the upper vertex target or the lower vertex target of the specified target.
  • Of course, the specific parameters of the target upper vertex confidence distribution map and the target lower vertex confidence distribution map are not limited thereto.
  • The full convolutional neural network may include: a convolution layer, a downsampling layer, and a deconvolution layer.
  • A full convolutional neural network typically includes at least one convolution layer and at least one downsampling layer, while the deconvolution layer is optional. In order to make the resolution of the obtained feature map the same as that of the input image to be detected, and thereby avoid an extra step of converting between image scales, which facilitates feature extraction, a deconvolution layer can be placed after the last convolution layer.
  • S102 can be implemented by the following steps.
  • In the first step, the image to be detected is input into the trained full convolutional neural network, and the features of the image to be detected are extracted through a network structure in which convolution layers and downsampling layers are interleaved.
  • In the second step, the features are upsampled by the deconvolution layer to the same resolution as the image to be detected, and the upsampled result is obtained.
  • Specifically, the image to be detected is input into the trained full convolutional neural network, and features from lower layers to higher layers are extracted in sequence by a series of alternately arranged convolution layers and downsampling layers;
  • the deconvolution layer is then connected to upsample the features to the size of the input image to be detected.
  • In the third step, the result obtained in the second step is operated on by a 1×1 convolution layer to obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map, each with the same resolution as the image to be detected.
  • The result of the upsampling can be operated on by a convolution layer.
  • The convolution kernel of this convolution layer may be of size 1×1, 3×3, or 5×5; however, in order to accurately extract the feature of a single pixel, the convolution kernel size may be chosen as 1×1.
  • Through the operation of this convolution layer, the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map can be obtained.
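As a rough illustration of this last step, a 1×1 convolution is simply a per-pixel linear map over channels, so the three output heads can be sketched with a single matrix product. The channel count, random weights, and map sizes below are illustrative assumptions, not values from the application:

```python
import numpy as np

# Illustrative sketch (not the application's actual network): a 1x1 convolution
# applies the same linear map at every pixel, i.e. one matmul over channels.
def conv1x1(features, weights, bias):
    """features: (H, W, C_in); weights: (C_in, C_out); bias: (C_out,)."""
    return features @ weights + bias

# Assumed sizes: an 8x8 upsampled feature map with 16 channels.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))

# Three heads, each at the input resolution: upper-vertex confidence,
# lower-vertex confidence, and the upper/lower-vertex associated field.
upper_conf = conv1x1(features, rng.standard_normal((16, 1)), np.zeros(1))
lower_conf = conv1x1(features, rng.standard_normal((16, 1)), np.zeros(1))
assoc_field = conv1x1(features, rng.standard_normal((16, 1)), np.zeros(1))

print(upper_conf.shape, lower_conf.shape, assoc_field.shape)  # (8, 8, 1) each
```

The key property this demonstrates is that a 1×1 kernel leaves the spatial resolution unchanged, so all three maps match the resolution of the (upsampled) input.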
  • S103 Determine, by using a preset target determining method, at least one upper vertex target and at least one lower vertex target in the image to be detected, respectively, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map.
  • The target upper vertex confidence distribution map of the image to be detected obtained by the full convolutional neural network includes the probability that the target in each identified region is the upper vertex target of the specified target, and the target lower vertex confidence distribution map includes the probability that the target in each identified region is the lower vertex target of the specified target. All targets may include targets other than upper vertex targets and lower vertex targets. Therefore, the preset target determining method needs to be applied separately to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map of the image to be detected, so as to determine the upper vertex targets of the specified target in the image to be detected from the target upper vertex confidence distribution map, and the lower vertex targets from the target lower vertex confidence distribution map.
  • The preset target determining method may be setting a threshold: if a probability in the target upper vertex confidence distribution map is greater than the threshold, the region corresponding to that probability is determined to be an upper vertex target, and if a probability in the target lower vertex confidence distribution map is greater than the threshold, the region corresponding to that probability is determined to be a lower vertex target. Alternatively, the method may be based on the pixel value of each pixel: if every pixel value in a region is greater than a preset pixel value, the region is determined to be an upper vertex target or a lower vertex target. Alternatively, if the confidence of each pixel in a region is greater than a preset confidence threshold, or if the average confidence of the pixels in a region is greater than a preset confidence threshold, the region is determined to be an upper vertex target or a lower vertex target.
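A minimal sketch of the average-confidence variant of the preset target determining method described above; the threshold value, region shapes, and function name are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch: a region counts as a vertex target if the mean
# confidence of its pixels exceeds a preset threshold (0.7 is an assumption).
def is_vertex_target(conf_map, region, threshold=0.7):
    """conf_map: 2-D confidence map; region: (row_slice, col_slice)."""
    return float(conf_map[region].mean()) > threshold

conf = np.zeros((6, 6))
conf[1:3, 1:3] = 0.9                       # a small high-confidence blob
blob = (slice(1, 3), slice(1, 3))
empty = (slice(4, 6), slice(4, 6))
print(is_vertex_target(conf, blob))        # True
print(is_vertex_target(conf, empty))       # False
```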
  • S103 can be implemented by the following steps.
  • In the first step, a non-maximum suppression method is applied to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map to determine the position of the center point of at least one detection target.
  • On a confidence distribution map, a confidence maximum point characterizes the position of a detection target's center point, and a spatially aggregated non-zero region indicates the area where a detection target is located. Non-maximum suppression is performed on the target upper vertex confidence distribution map and the target lower vertex confidence distribution map: by suppressing elements that are not maxima, the maximum value in each region is found, and the position of each detection target's center point can thereby be obtained.
  • The formation of such a region is related to the confidence of each pixel.
  • The region may deviate from the actual detection target, but the confidence maximum point characterizes the center point of the detection target; after the position of the center point is determined, a detection target can be determined within a certain neighborhood of that center point, so determining the position of the center point can improve the accuracy of target detection.
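The non-maximum suppression step above can be sketched as keeping only pixels that are strict maxima of their local neighbourhood. The 3×3 window, the confidence floor, and the function name below are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch: find confidence maximum points (detection-target
# centres) by suppressing every pixel that is not the strict maximum of
# its 3x3 neighbourhood, or that falls below a confidence floor.
def local_maxima(conf, floor=0.5):
    H, W = conf.shape
    padded = np.pad(conf, 1, constant_values=-np.inf)
    peaks = []
    for r in range(H):
        for c in range(W):
            window = padded[r:r + 3, c:c + 3]   # 3x3 neighbourhood of (r, c)
            is_strict_max = (conf[r, c] == window.max()
                             and (window == conf[r, c]).sum() == 1)
            if conf[r, c] >= floor and is_strict_max:
                peaks.append((r, c))
    return peaks

conf = np.zeros((5, 5))
conf[1, 1] = 0.9          # centre of one detection target
conf[3, 3] = 0.8          # centre of another
print(local_maxima(conf))  # [(1, 1), (3, 3)]
```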
  • In the second step, the confidence of all pixels in the neighborhood of each detection target's center point is obtained.
  • The size of the neighborhood can be determined according to a statistical analysis of the upper vertex size and the lower vertex size of the specified target; for example, for a pedestrian target, it can be obtained by averaging actual human head radii, or the neighborhood value can be determined by assuming it obeys a preset distribution.
  • The neighborhood size of the lower vertex target can be set to be the same as that of the upper vertex target; of course, the neighborhood sizes of the upper and lower vertices can also differ, and the neighborhood size of the lower vertex can be determined according to the size of the lower vertex of the actual specified target. The larger the confidence of all pixels in the neighborhood of a detection target's center point, the greater the probability that the detection target is an upper vertex target or a lower vertex target; therefore, in this embodiment, the confidence of all pixels in the neighborhood is obtained.
  • In the third step, a detection target whose pixels in the neighborhood of its center point all have confidence greater than a preset confidence threshold in the target upper vertex confidence distribution map is determined to be an upper vertex target, and a detection target whose pixels in the neighborhood of its center point all have confidence greater than the preset confidence threshold in the target lower vertex confidence distribution map is determined to be a lower vertex target.
  • Specifically, a confidence threshold is preset. If, in the target upper vertex confidence distribution map, the confidence of all pixels in the neighborhood of a detection target's center point is greater than the preset confidence threshold, the detection target may be determined to be an upper vertex target of the image to be detected; if, in the target lower vertex confidence distribution map, the confidence of all pixels in the neighborhood of a detection target's center point is greater than the preset confidence threshold, the detection target may be determined to be a lower vertex target of the image to be detected.
  • The preset confidence threshold may be set according to experimental data or requirements.
  • For example, the preset confidence threshold may be set to 0.7: if the confidence of all pixels in the neighborhood of a detection target in the target upper vertex confidence distribution map is greater than 0.7, the detection target is determined to be an upper vertex target; if the confidence of all pixels in the neighborhood of a detection target in the target lower vertex confidence distribution map is greater than 0.7, the detection target may be determined to be a lower vertex target.
  • Of course, the preset confidence threshold may also be set to 0.85, 0.9, or other values, which is not limited herein. In this embodiment, since the confidence of all pixels in the neighborhood of the detection target's center point must be greater than the preset confidence threshold, the accuracy of target detection is further ensured.
  • S104 Map each upper vertex target and each lower vertex target to the target upper and lower vertex associated field map, and calculate, for a first vertex target, the associated field value of the connection between the first vertex target and each second vertex target.
  • The first vertex target and the second vertex target are each any upper vertex target or any lower vertex target: if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target.
  • The obtained upper vertex targets and lower vertex targets may be mapped to the target upper and lower vertex associated field map obtained in S102. Because each pixel in the target upper and lower vertex associated field map represents the degree of association, at that position, with the upper vertex target or lower vertex target of the specified target, by connecting each upper vertex target with each lower vertex target, the association degree value of each pixel along the connection between the two vertices can be obtained.
  • The sum of these association degree values may be defined as the associated field value of the connection, or the mean of the association degree values along the connection between the two vertices may be defined as the associated field value of the connection.
  • The larger the associated field value, the higher the degree of association between the connected upper and lower vertices, that is, the greater the probability that the connection is a specified target.
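The associated field value of a connection, defined above as the sum or mean of association values along the line between an upper vertex and a lower vertex, might be sketched as follows; the sampling density, field contents, and function name are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch: the associated field value of an upper/lower-vertex
# connection, taken here as the MEAN field value sampled along the segment
# (the sum is the other option mentioned in the text).
def line_field_value(field, p_upper, p_lower, n_samples=16):
    rows = np.linspace(p_upper[0], p_lower[0], n_samples)
    cols = np.linspace(p_upper[1], p_lower[1], n_samples)
    samples = field[rows.round().astype(int), cols.round().astype(int)]
    return float(samples.mean())

field = np.zeros((10, 10))
field[2:8, 4] = 1.0                               # vertical stripe of high association
good = line_field_value(field, (2, 4), (7, 4))    # connection along the stripe
bad = line_field_value(field, (2, 0), (7, 0))     # connection through empty space
print(good, bad)  # 1.0 0.0
```

A connection whose line runs along a true target accumulates high association values, while a spurious pairing does not, which is exactly what the matching step exploits.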
  • S105 Determine, according to the associated field values of the connections between the first vertex target and each second vertex target, by matching the upper and lower vertices, the connection with the largest associated field value as the specified target.
  • For a first vertex target, the connection with the largest associated field value can be determined as the specified target, and in general only one second vertex target is connected with a first vertex target to form a specified target. Therefore, by matching the upper and lower vertices, the connection with the largest associated field value can be determined for each first vertex target as a specified target.
  • For example, suppose five upper vertex targets and four lower vertex targets are determined through S103. For the first upper vertex target, the associated field value of its connection with the first lower vertex target is the largest, so that connection is determined to be a specified target. For the second upper vertex target, the associated field value of its connection with the third lower vertex target is the largest, so that connection is determined to be a specified target. For the third upper vertex target, the associated field value of its connection with the second lower vertex target is the largest, so that connection is determined to be a specified target. For the fifth upper vertex target, the associated field value of its connection with the fourth lower vertex target is the largest, so that connection is determined to be a specified target. The associated field values of the connections between the fourth upper vertex target and each lower vertex target are all smaller than those of the other connections; therefore, the fourth upper vertex target may be a misidentified upper vertex target, and it is discarded.
  • The matching of upper and lower vertices may adopt a classic bipartite graph matching method, that is, the Hungarian algorithm, thereby achieving one-to-one matching between upper vertex targets and lower vertex targets. Of course, other methods that can achieve one-to-one matching between targets are also applicable to this embodiment, and are not enumerated herein.
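The text names the Hungarian algorithm for this one-to-one matching. As an illustrative stand-in (not the application's implementation), the sketch below finds the same optimal assignment by brute-force search over permutations, which is only practical for small numbers of vertices; the matrix values are made up:

```python
from itertools import permutations

# Illustrative sketch: one-to-one matching of upper and lower vertex targets
# that maximises the total associated field value. For small instances a
# brute-force search over permutations yields the same optimum as the
# Hungarian algorithm. Unmatchable extra vertices are simply left unpaired,
# mirroring the discarded fifth-vertex example in the text.
def match_vertices(assoc):
    """assoc[i][j]: associated field value of upper vertex i with lower vertex j."""
    n_upper, n_lower = len(assoc), len(assoc[0])
    best, best_pairs = float("-inf"), []
    for perm in permutations(range(n_lower), min(n_upper, n_lower)):
        pairs = list(enumerate(perm))
        total = sum(assoc[i][j] for i, j in pairs)
        if total > best:
            best, best_pairs = total, pairs
    return best_pairs

assoc = [[0.9, 0.1, 0.2],
         [0.2, 0.1, 0.8],
         [0.1, 0.7, 0.3]]
print(match_vertices(assoc))  # [(0, 0), (1, 2), (2, 1)]
```

In practice the Hungarian algorithm (polynomial time) would replace the exponential search, but the returned assignment is the same.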
  • The calculated connection with the largest associated field value may still be a false detection.
  • Therefore, a preset associated field threshold may be set to determine whether the largest associated field value of a connection is greater than the preset associated field threshold: if it is greater, the connection is an accurate specified target; if it is not greater, the connection is a false detection and the detection result is discarded. After the specified targets are determined, it is possible to determine whether a specified target exists in the image to be detected and to determine the accurate position information of the specified target.
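The threshold screening of matched connections described above might be sketched as follows; the threshold value, data layout, and function name are illustrative assumptions:

```python
# Illustrative sketch: after matching, a connection is kept as a detected
# target only if its associated field value exceeds a preset associated field
# threshold; otherwise it is treated as a false detection and discarded.
# The threshold value 0.5 is an assumption, not from the application.
def filter_connections(matched, threshold=0.5):
    """matched: list of (upper_idx, lower_idx, field_value) tuples."""
    return [(u, l) for u, l, v in matched if v > threshold]

matched = [(0, 0, 0.9), (1, 2, 0.8), (3, 1, 0.2)]  # last pair is a weak match
print(filter_connections(matched))  # [(0, 0), (1, 2)]
```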
  • In the embodiments of the present application, the acquired image to be detected is input into the trained full convolutional neural network, and the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map are generated.
  • The upper vertex targets and lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively, and the upper vertex targets and lower vertex targets are mapped to the target upper and lower vertex associated field map to calculate the associated field value of the connection between each first vertex target and each second vertex target.
  • Finally, the upper and lower vertices are matched to determine the connection with the largest associated field value as the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• In addition, since the detected specified target is the connection between an upper vertex target and a lower vertex target, the connection can clearly and intuitively reflect the posture information of the specified target (for example, forward tilt, backward tilt, leaning over, etc.), which is beneficial to subsequent analysis.
• Moreover, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of the target are then accurately located and matched, and the matched upper and lower vertices are taken as the specified target detection result, which has the advantages of strong robustness and high detection accuracy for the specified target.
  • the embodiment of the present application further provides a target detection method, where the target detection method may include the following steps:
• A full convolutional neural network needs to be constructed before target detection can be performed with it. Since the network parameters of the full convolutional neural network are obtained by training, the training process can be understood as the process of learning the upper vertex target and the lower vertex target of the specified target. Preset training set sample images need to be constructed for the features of various specified targets, each image containing the upper vertex target and lower vertex target features of different specified targets, and the confidence distributions of the upper vertex target and the lower vertex target can be preset.
• The preset distribution law is the probability distribution obeyed by the confidences of the upper vertex target and the lower vertex target of the specified target. Generally, the confidences of the upper and lower vertex targets obey a circular Gaussian distribution, although this embodiment is not limited thereto.
• The center position of the upper edge of each specified target in the calibrated image is P_up, and the center position of the lower edge is P_down. Assuming that the confidences of the upper vertex target and the lower vertex target obey the circular Gaussian distribution N, the target upper vertex confidence truth map and the target lower vertex confidence truth map of the preset training set sample image are obtained according to formulas (1) and (2).
• where p represents the coordinates of any pixel position on the confidence distribution truth map;
• up denotes the upper vertex target of the specified target;
• D_up(p) represents the confidence of the upper vertex target at position p on the target upper vertex confidence truth map;
• n_ped represents the total number of specified targets in the training set sample image;
• P_up represents the coordinate position of the upper vertex target of each specified target in the calibrated training set sample image;
• σ_up represents the variance of the circular Gaussian distribution N obeyed by the upper vertex target;
• down denotes the lower vertex target of the specified target;
• D_down(p) represents the confidence of the lower vertex target at position p on the target lower vertex confidence truth map;
• P_down represents the coordinate position of the lower vertex target of each specified target in the calibrated training set sample image.
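Formulas (1) and (2) themselves are not reproduced in this text. A form consistent with the symbol definitions above assigns, at each pixel p, the maximum over the n_ped calibrated vertex positions of a circular Gaussian centered at P_up (or P_down); the sketch below assumes that form, and the grid size, centers, and spread parameter are illustrative.

```python
import math

def vertex_confidence_truth_map(centers, sigma, height, width):
    """Truth map D(p): at each pixel p, the max over all calibrated vertex
    positions P of a circular Gaussian exp(-|p - P|^2 / (2 sigma^2)).
    Taking the max (rather than the sum) keeps the peak confidence at 1.0.
    """
    d = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for (cx, cy) in centers:
                g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                             / (2 * sigma ** 2))
                d[y][x] = max(d[y][x], g)
    return d

# Two calibrated upper vertex targets on a small illustrative grid.
D_up = vertex_confidence_truth_map(centers=[(2, 2), (7, 3)], sigma=1.0,
                                   height=6, width=10)
```

Each calibrated vertex position produces a bright spot of confidence 1.0 that decays with distance, matching the bright points visible in the truth maps of FIG. 4.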
• Generally, the associated field of the line between the upper vertex target and the lower vertex target is set to a unit vector, that is, a vector with amplitude equal to 1 and direction along the line, although the embodiment is not limited thereto. According to the line connecting the upper and lower edge center positions of each specified target, and formulas (3) and (4), the target upper and lower vertex associated field truth map of the preset training set sample image can be generated.
• Formula (4) indicates that the associated field on the line connecting the upper vertex target and the lower vertex target of a specified target is a unit vector v with amplitude equal to 1 and direction along the line.
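The associated field described by formulas (3) and (4) can be sketched as follows, under the assumption that pixels on (or within a small tolerance of) the segment from the upper vertex to the lower vertex carry the unit vector along the line, and all other pixels carry the zero vector; the coordinates and tolerance are illustrative.

```python
import math

def associated_field_truth_map(p_up, p_down, height, width, tol=0.5):
    """For each pixel, store the unit vector v along the line from p_up to
    p_down if the pixel lies on the segment (within tol), else (0, 0)."""
    ux, uy = p_up
    dx, dy = p_down[0] - ux, p_down[1] - uy
    length = math.hypot(dx, dy)
    v = (dx / length, dy / length)            # unit vector, |v| = 1
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # projection parameter of the pixel onto the segment
            t = ((x - ux) * dx + (y - uy) * dy) / (length ** 2)
            if 0.0 <= t <= 1.0:
                px, py = ux + t * dx, uy + t * dy
                if math.hypot(x - px, y - py) <= tol:
                    field[y][x] = v
    return field

# Vertical pedestrian: upper vertex at (3, 1), lower vertex at (3, 6).
A = associated_field_truth_map((3, 1), (3, 6), height=8, width=8)
```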
• Taking pedestrian detection as an example, the generated target upper vertex confidence truth map, target lower vertex confidence truth map, and target upper and lower vertex associated field truth map of the preset training set sample image are shown in FIG. 4. In the target upper vertex confidence truth map, each bright point corresponds to the upper vertex target of a specified target in the preset training set sample image; in the target lower vertex confidence truth map, each bright point corresponds to the lower vertex target of a specified target; and in the target upper and lower vertex associated field truth map, each line is the connection between the upper vertex target and the lower vertex target of a specified target.
  • S304 Input the preset training set sample image into the initial full convolutional neural network, and obtain a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex correlation field map of the preset training set sample image.
• The network parameters of the initial full convolutional neural network are preset values. The initial full convolutional neural network can output the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map of the preset training set sample image. The target upper vertex confidence distribution map is compared with the target upper vertex confidence truth map, the target lower vertex confidence distribution map is compared with the target lower vertex confidence truth map, and the target upper and lower vertex associated field map is compared with the target upper and lower vertex associated field truth map. The network parameters are continuously trained and updated so that the target upper vertex confidence distribution map output by the full convolutional neural network approaches the target upper vertex confidence truth map, the target lower vertex confidence distribution map approaches the target lower vertex confidence truth map, and the target upper and lower vertex associated field map approaches the target upper and lower vertex associated field truth map. When they are close enough, the full convolutional neural network is determined to be a trained full convolutional neural network that can perform target detection.
  • the full convolutional neural network may include: a convolution layer, a downsampling layer, and a deconvolution layer.
• The full convolutional neural network usually includes at least one convolution layer and at least one downsampling layer, while the deconvolution layer is optional. In order to make the resolution of the obtained feature map the same as the resolution of the input preset training set sample image, which facilitates the calculation of the confidence, a deconvolution layer can be set after the last convolution layer.
  • the step of obtaining the target vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field maps of the preset training set sample image may be implemented by the following steps.
  • the preset training set sample image is input into the initial full convolutional neural network, and the features of the preset training set sample image are extracted through the network structure in which the convolution layer and the downsampling layer are arranged.
  • the feature is upsampled to the same resolution as the preset training set sample image by the deconvolution layer, and the upsampled result is obtained.
• The preset training set sample image is input into the initial full convolutional neural network, as shown in FIG. 5, and features from lower layers to upper layers are extracted in sequence by a series of alternately arranged convolution layers and downsampling layers. A deconvolution layer is then connected to upsample the features to the size of the input preset training set sample image.
• In the third step, the result obtained in the second step is processed by a 1×1 convolution layer, and the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map are obtained at the same resolution as the preset training set sample image. That is, the upsampled result can finally be processed by one convolution layer.
• The convolution kernel size of this convolution layer may be selected from 1×1, 3×3, or 5×5, but in order to accurately extract the features of a single pixel, a convolution kernel size of 1×1 may be selected for the convolution layer. The target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map can be obtained through the operation of this convolution layer.
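A 1×1 convolution is simply a per-pixel linear combination of the input channels, which is why it preserves resolution and can directly produce the per-pixel confidence and field maps. A minimal sketch, with illustrative weights:

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution: at every pixel, a dot product across the
    input channels plus a bias. Resolution is unchanged, so the output
    stays aligned with the upsampled input image.

    feature_map[c][y][x]: C input channels of size H x W.
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    out = [[bias for _ in range(width)] for _ in range(height)]
    for c in range(channels):
        w = weights[c]
        for y in range(height):
            for x in range(width):
                out[y][x] += w * feature_map[c][y][x]
    return out

# Two-channel 2x2 feature map reduced to one output map.
features = [
    [[1.0, 0.0], [0.0, 1.0]],   # channel 0
    [[0.5, 0.5], [0.5, 0.5]],   # channel 1
]
confidence = conv1x1(features, weights=[2.0, -1.0], bias=0.1)
```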
• S305: Calculate a first average error between the target upper vertex confidence distribution map of the preset training set sample image and the target upper vertex confidence truth map, a second average error between the target lower vertex confidence distribution map and the target lower vertex confidence truth map, and a third average error between the target upper and lower vertex associated field map and the target upper and lower vertex associated field truth map.
• If the first average error, the second average error, or the third average error is greater than the preset error threshold, the network parameters are updated according to the first average error, the second average error, the third average error, and the preset gradient operation strategy.
  • the full convolutional neural network can be trained by the classical back propagation algorithm.
  • the preset gradient operation strategy can be the ordinary gradient descent method or the stochastic gradient descent method.
• The gradient descent method uses the negative gradient direction as the search direction; the closer it gets to the target value, the smaller the step size and the slower the progress. Since the stochastic gradient descent method uses only one sample per iteration, its iteration speed is much higher than that of gradient descent. Therefore, in order to improve operation efficiency, this embodiment may use the stochastic gradient descent method to update the network parameters.
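A minimal sketch of the stochastic gradient descent update described above, using one sample per iteration on a toy one-parameter model; the learning rate, model, and data stream are illustrative assumptions, not the patent's network.

```python
def sgd_step(theta, grad, lr=0.1):
    """Move each network parameter along the negative gradient direction."""
    return [t - lr * g for t, g in zip(theta, grad)]

# Toy one-parameter model y = theta * x with squared error on one sample.
def grad_single_sample(theta, x, y):
    pred = theta[0] * x
    return [2.0 * (pred - y) * x]   # d/dtheta of (pred - y)^2

theta = [0.0]
for x, y in [(1.0, 2.0), (2.0, 4.0), (1.0, 2.0)] * 20:   # sample stream
    theta = sgd_step(theta, grad_single_sample(theta, x, y))
# theta converges toward 2.0, the slope that fits every sample.
```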
• The first average error between the target upper vertex confidence distribution map output by the full convolutional neural network for the preset training set sample image and the target upper vertex confidence truth map, the second average error between the target lower vertex confidence distribution map and the target lower vertex confidence truth map, and the third average error between the target upper and lower vertex associated field map and the target upper and lower vertex associated field truth map are calculated, and the network parameters of the full convolutional neural network are iteratively updated until the average errors no longer decrease.
• The network parameters of the full convolutional neural network include the convolution kernel parameters and the bias parameters of the convolution layers.
• where L_D(θ) represents the first average error or the second average error;
• θ represents the network parameters of the full convolutional neural network;
• N represents the number of preset training set sample images;
• F_D(X_i; θ) represents the target upper vertex confidence distribution map or the target lower vertex confidence distribution map output by the full convolutional neural network;
• X_i represents the i-th image input to the network, where i is the image number;
• D_i represents the target upper vertex confidence truth map or the target lower vertex confidence truth map obtained by formulas (1) and (2);
• L_A(θ) represents the third average error;
• F_A(X_i; θ) represents the target upper and lower vertex associated field map output by the full convolutional neural network;
• A_i represents the target upper and lower vertex associated field truth map obtained by formulas (3) and (4);
• the balance parameter of the two errors usually takes a value of 1.0.
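Assuming, consistently with the symbol list above, that L_D(θ) and L_A(θ) are mean squared differences over the N sample images between the network output maps and the corresponding truth maps, and that the balance parameter weights L_A in the total loss, the computation can be sketched as follows (the exact formulas of the patent are not reproduced in this text, so this is an illustrative form):

```python
def average_map_error(outputs, truths):
    """Mean over N sample images of the per-pixel squared differences
    between the network output map F(X_i; theta) and the truth map."""
    n = len(outputs)
    total = 0.0
    for out, truth in zip(outputs, truths):
        total += sum((o - t) ** 2
                     for row_o, row_t in zip(out, truth)
                     for o, t in zip(row_o, row_t))
    return total / n

# N = 2 tiny 1x2 "maps"; balance parameter between L_D and L_A set to 1.0.
L_D = average_map_error([[[0.9, 0.1]], [[0.2, 0.8]]],
                        [[[1.0, 0.0]], [[0.0, 1.0]]])
L_A = 0.0                      # placeholder third error for the sketch
loss = L_D + 1.0 * L_A
```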
  • S309 Determine, by using a preset target determining method, at least one upper vertex target and at least one lower vertex target in the image to be detected, respectively, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map.
• S311: According to the associated field values between the first vertex target and each second vertex target connection, determine, by matching the upper and lower vertices, that the connection with the largest associated field value is the specified target.
  • S307 to S311 are the same as the steps of the embodiment shown in FIG. 1, and have the same or similar beneficial effects, and are not described herein again.
• It can be seen that in the scheme provided by this embodiment, the acquired image to be detected is input into the trained full convolutional neural network to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map; the upper vertex targets and the lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; each upper vertex target and each lower vertex target are mapped to the target upper and lower vertex associated field map, and the associated field value between each first vertex target and second vertex target connection is calculated; finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• In addition, since the detected specified target is the connection between an upper vertex target and a lower vertex target, the connection can clearly and intuitively reflect the posture information of the specified target (for example, forward tilt, backward tilt, leaning over, etc.), which is beneficial to subsequent analysis.
• Moreover, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of the target are then accurately located and matched, and the matched upper and lower vertices are taken as the specified target detection result, which has the advantages of strong robustness and high detection accuracy for the specified target. At the same time, it is not necessary to preset anchor frames of certain scales and aspect ratios as reference frames, so the target detection performance of the algorithm does not depend on the selection of anchor frames, and the scale and aspect ratio of the target are handled adaptively.
• In addition, the preset training set sample images are set for the upper vertex targets and lower vertex targets of specified targets with different features, and after training and iteration on the preset training set sample images, the obtained full convolutional neural network has strong generalization ability, avoids a complicated classifier cascade mode, and has a simpler structure.
  • the target detection method provided by the embodiment of the present application is introduced in conjunction with a specific application example for detecting a pedestrian target.
• The image to be detected is collected by a monitoring device and input into the trained full convolutional neural network to obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map. For the target upper vertex confidence distribution map and the target lower vertex confidence distribution map of the image to be detected, non-maximum suppression is used to determine the position of the center point of each detection target, and the detection targets whose pixel confidences in the neighborhood of the center point are greater than the preset confidence threshold are determined to be the pedestrian head vertex targets and the center position targets between the pedestrian's feet. The pedestrian head vertex targets and the center position targets between the pedestrian's feet are mapped to the above target upper and lower vertex associated field map, and the correlation degree value between each pedestrian head vertex target and each center position target between the pedestrian's feet is obtained. From these correlation degree values, the mean correlation degree value between each pedestrian head vertex target and each center position target between the pedestrian's feet can be obtained, and through judgment and matching of the mean values, the detection result shown in FIG. 6 is determined, in which each line is a pedestrian target.
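The non-maximum suppression step described above can be sketched as follows: a pixel is kept as a detection center if its confidence exceeds the preset confidence threshold and is not smaller than any of its 8 neighbors. The map values and the threshold 0.5 are illustrative.

```python
def nms_peaks(conf, threshold=0.5):
    """Return (x, y) centers that exceed the threshold and are not smaller
    than any of their 8 neighbors (non-maximum suppression)."""
    h, w = len(conf), len(conf[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = conf[y][x]
            if v <= threshold:
                continue
            neighbors = [conf[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))
                         if (ny, nx) != (y, x)]
            if all(v >= nv for nv in neighbors):
                peaks.append((x, y))
    return peaks

# One strong head-vertex peak and one sub-threshold response.
conf_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.1, 0.4],
]
centers = nms_peaks(conf_map)
```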
• It can be seen that in the present scheme, the acquired image to be detected is input into the trained full convolutional neural network to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map; the upper vertex targets and the lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; each upper vertex target and each lower vertex target are mapped to the target upper and lower vertex associated field map, and the associated field value between each first vertex target and second vertex target connection is calculated; finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• In addition, since the detected specified target is the connection between an upper vertex target and a lower vertex target, the connection can clearly and intuitively reflect the posture information of the specified target (for example, forward tilt, backward tilt, leaning over, etc.), which is beneficial to subsequent analysis.
  • the embodiment of the present application provides a target detecting device.
  • the target detecting device includes:
  • the first acquiring module 710 is configured to acquire an image to be detected collected by the image collector
  • a first generating module 720 configured to input the image to be detected into the trained full convolutional neural network, generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target Upper and lower vertices associated with the field map;
  • the target determining module 730 is configured to determine at least one upper vertex target and at least one of the to-be-detected images by using a preset target determining method for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively. a lower vertex target;
  • a first calculation module 740 configured to calculate the first vertex target and each second vertex for the first vertex target by mapping each upper vertex target and each lower vertex target to the target upper and lower vertex associated field maps An associated field value between the target connections, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, if the first vertex target is any lower vertex target And the second vertex target is any upper vertex target;
  • the matching module 750 is configured to determine, according to the associated field value between the first vertex target and each second vertex target connection, by matching the upper and lower vertices, determining that the connection with the largest associated field value is the specified target.
• It can be seen that in the scheme provided by this embodiment, the acquired image to be detected is input into the trained full convolutional neural network to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map; the upper vertex targets and the lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; each upper vertex target and each lower vertex target are mapped to the target upper and lower vertex associated field map, and the associated field value between each first vertex target and second vertex target connection is calculated; finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• In addition, since the detected specified target is the connection between an upper vertex target and a lower vertex target, the connection can clearly and intuitively reflect the posture information of the specified target (for example, forward tilt, backward tilt, leaning over, etc.), which is beneficial to subsequent analysis.
• Moreover, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of the target are then accurately located and matched, and the matched upper and lower vertices are taken as the specified target detection result, which has the advantages of strong robustness and high detection accuracy for the specified target.
  • the target determining module 730 is specifically configured to:
  • Determining a position of a center point of the at least one detection target by using a non-maximum value suppression method for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map;
• determining that a detection target whose pixel confidences in the target upper vertex confidence distribution map are greater than the preset confidence threshold is an upper vertex target, and that a detection target whose pixel confidences in the target lower vertex confidence distribution map are greater than the preset confidence threshold is a lower vertex target.
  • the first calculating module 740 is specifically configured to:
• for the first vertex target, calculate the associated field value between the first vertex target and each second vertex target connection by mapping each upper vertex target and each lower vertex target to the target upper and lower vertex associated field map.
  • the matching module 750 is specifically configured to:
• determine the connection corresponding to the largest associated field value to be the specified target.
  • the matching module 750 is further configured to:
• determine the connection corresponding to the largest associated field value to be the specified target.
• It should be noted that the target detecting apparatus in the embodiment of the present application is the apparatus applying the embodiment of the target detecting method shown in FIG. 1. All the embodiments of the target detecting method are applicable to the apparatus, and can all achieve the same or similar beneficial effects.
  • the embodiment of the present application further provides a target detecting device.
  • the target detecting device may include:
  • the first obtaining module 810 is configured to acquire an image to be detected collected by the image collector
  • the second obtaining module 820 is configured to acquire a preset training set sample image, and a line connecting the upper edge center position, the lower edge center position, and the center position of the upper and lower edges of each specified target in the preset training set sample image;
  • a second generating module 830 configured to generate, according to a preset distribution law, an upper edge center position and a lower edge center position of each specified target, a target upper vertex confidence true value map and a target lower vertex of the preset training set sample image Confidence truth map;
  • the third generation module 840 is configured to generate a true value map of the target upper and lower vertex associated fields of the preset training set sample image according to the connection of the top and bottom edge center positions of the specified targets;
• the extraction module 850 is configured to input the preset training set sample image into the initial full convolutional neural network, and obtain a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map of the preset training set sample image, wherein the network parameters of the initial full convolutional neural network are preset values;
  • a second calculating module 860 configured to calculate a first average error of a target upper vertex confidence distribution map of the preset training set sample image and a target upper vertex confidence truth value map, and a target of the preset training set sample image a second average error of the lower vertex confidence distribution map and the target lower vertex confidence truth map, and a third average of the target upper and lower vertex associated field maps of the preset training set sample image and the target upper and lower vertex associated field truth maps error;
• the loop module 870 is configured to: if the first average error, the second average error, or the third average error is greater than a preset error threshold, update the network parameters according to the first average error, the second average error, the third average error, and the preset gradient operation strategy, obtain an updated full convolutional neural network, and calculate the first average error, the second average error, and the third average error obtained by the updated full convolutional neural network;
  • a first generating module 880 configured to input the image to be detected into the trained full convolutional neural network, generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target Upper and lower vertices associated with the field map;
  • a target determining module 890 configured to determine at least one upper vertex target and at least one of the to-be-detected images by using a preset target determining method for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively a lower vertex target;
  • a first calculating module 8100 configured to calculate the first vertex target and each second vertex for the first vertex target by mapping each upper vertex target and each lower vertex target to the target upper and lower vertex associated field maps An associated field value between the target connections, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, if the first vertex target is any lower vertex target And the second vertex target is any upper vertex target;
  • the matching module 8110 is configured to determine, according to the associated field value between the first vertex target and each second vertex target connection, by matching the upper and lower vertices, the connection with the largest associated field value is the specified target.
• It can be seen that in the scheme provided by this embodiment, the acquired image to be detected is input into the trained full convolutional neural network to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map; the upper vertex targets and the lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; each upper vertex target and each lower vertex target are mapped to the target upper and lower vertex associated field map, and the associated field value between each first vertex target and second vertex target connection is calculated; finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• because a specified target is the line connecting its upper vertex target and lower vertex target, the line can clearly reflect the posture of the specified target (for example, leaning forward, leaning backward, or bending over), which benefits subsequent processing.
• highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of the target are then accurately located and matched, and the matched upper-lower vertex pairs are taken as the detection result of the specified target, which is more robust and achieves higher detection accuracy. At the same time, there is no need to preset anchor boxes of particular scales and aspect ratios as reference boxes, so the detection performance of the algorithm does not depend on the choice of anchor boxes and the method adapts to the scale and aspect ratio of the target.
• the preset training set sample images cover upper vertex targets and lower vertex targets of specified targets with different characteristics, and the full convolutional neural network obtained by training and iterating on these sample images has strong generalization ability, avoids a complicated classifier cascade, and has a simpler structure.
  • the full convolutional neural network includes: a convolution layer, a downsampling layer, and a deconvolution layer;
  • the extraction module 850 can be specifically configured to:
• the result is processed by a 1×1 convolution layer to obtain a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map with the same resolution as the preset training set sample images.
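To see why the deconvolution layer plus the final 1×1 convolution yield output maps at the same resolution as the input sample image, the resolution bookkeeping can be sketched as follows. The layer schedule and strides here are illustrative assumptions, not the network of the embodiment.

```python
# Hypothetical layer schedule: strides are illustrative, not from the patent.
def out_hw(h, w, layers):
    """Track feature-map resolution through conv / downsample / deconv layers.
    'conv' is a stride-1 'same' convolution (including the final 1x1 conv),
    'down' divides the resolution by its stride, 'deconv' multiplies it."""
    for kind, s in layers:
        if kind == "down":
            h, w = h // s, w // s
        elif kind == "deconv":
            h, w = h * s, w * s
        # 'conv' (stride-1, 'same' padding) leaves h, w unchanged
    return h, w

layers = [("conv", 1), ("down", 2), ("conv", 1), ("down", 2),
          ("conv", 1), ("down", 2), ("deconv", 8), ("conv", 1)]  # last: 1x1 conv
print(out_hw(512, 640, layers))  # three 2x downsamples undone by one 8x deconv
```

With three 2× downsampling layers, a single 8× deconvolution restores the original 512×640 resolution, and the 1×1 convolution preserves it while producing the three output maps.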
• the object detecting apparatus in the embodiment of the present application is the apparatus to which the embodiment of the object detecting method shown in FIG. 3 is applied; all embodiments of the target detecting method are applicable to this apparatus and can achieve the same or similar beneficial effects.
• the embodiment of the present application provides a storage medium for storing executable code which, when run, performs all the steps of the target detection method provided by the embodiments of the present application.
• the storage medium stores executable code that, at runtime, performs the target detection method provided by the embodiments of the present application, and thus can achieve the following: using the trained full convolutional neural network, the upper vertex and the lower vertex of a specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the successfully matched upper-lower vertex lines are then taken as specified targets. Since a specified target is represented by a line, overlapping candidate boxes cannot occur; even if specified targets are densely distributed, the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, so the specified targets can be clearly distinguished by the lines connecting their upper and lower vertices, improving the accuracy of target detection.
  • the embodiment of the present application provides an application program for performing all the steps of the target detection method provided by the embodiment of the present application.
• at runtime, the application program performs the target detection method provided by the embodiments of the present application, and thus can achieve the following: using the trained full convolutional neural network, the upper vertex and the lower vertex of a specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the successfully matched upper-lower vertex lines are then taken as specified targets. Since a specified target is represented by a line, overlapping candidate boxes cannot occur; even if specified targets are densely distributed, the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, so the specified targets can be clearly distinguished by the lines connecting their upper and lower vertices, improving the accuracy of target detection.
  • the embodiment of the present application further provides a computer device, as shown in FIG. 9, including an image collector 901, a processor 902, and a storage medium 903, where
  • An image collector 901, configured to collect an image to be detected
  • the processor 902 is configured to implement all the steps of the target detection method provided by the embodiments of the present application when the executable code stored on the storage medium 903 is executed.
  • the image collector 901, the processor 902, and the storage medium 903 can perform data transmission by means of a wired connection or a wireless connection, and the computer device can communicate with other devices through a wired communication interface or a wireless communication interface.
  • the storage medium may include a RAM (Random Access Memory), and may also include an NVM (Non-volatile Memory), such as at least one disk storage. Alternatively, the storage medium may also be at least one storage device located remotely from the aforementioned processor.
• the processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; or a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • the image collector may be a camera for shooting a monitoring area for video capture or picture capture.
• by reading the executable code stored in the storage medium and running it, the processor of the computer device can, using the trained full convolutional neural network, extract the upper vertex and the lower vertex of a specified target, establish the connection between the upper vertex and the lower vertex by mapping, and take the successfully matched upper-lower vertex lines as specified targets; a specified target is represented by a line, so overlapping candidate boxes cannot occur, and even if specified targets are densely distributed, the upper and lower vertices of each can be accurately located by the full convolutional neural network, so the specified targets can be clearly distinguished by the lines connecting their upper and lower vertices, improving the accuracy of target detection.
• for the apparatus embodiment, the description is relatively simple, and for relevant parts reference may be made to the description of the method embodiment.

Abstract

Embodiments of the present application provide a target detection method and apparatus, and a computer device. The target detection method comprises: obtaining an image to be detected, which is collected by means of an image collector; inputting the image to be detected into a full convolutional neural network obtained by means of training, so as to obtain a target upper-vertex confidence distribution diagram, a target lower-vertex confidence distribution diagram and a target upper-vertex and lower-vertex associative field diagram of the image to be detected; for the target upper-vertex confidence distribution diagram and the target lower-vertex confidence distribution diagram, respectively determining upper-vertex targets and lower-vertex targets in the image to be detected by using a preset target determining method; for a first vertex target, computing associative field values of connection lines between the first vertex target and second vertex targets by mapping the upper-vertex targets and the lower-vertex targets to the target upper-vertex and lower-vertex associative field diagram; and based on the associative field values and by matching upper vertexes with lower vertexes, determining the connection line with the maximum associative field value as a specified target. By means of the solution, the accuracy of target detection can be improved.

Description

一种目标检测方法、装置及计算机设备Target detection method, device and computer equipment
本申请要求于2017年10月23日提交中国专利局、申请号为201711004621.7发明名称为“一种目标检测方法、装置及计算机设备”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。This application claims priority to Chinese patent application No. 201711004621.7, entitled "Target Detection Method, Apparatus, and Computer Device", filed with the Chinese Patent Office on October 23, 2017, the entire contents of which are incorporated herein by reference.
技术领域Technical field
本申请涉及机器视觉技术领域,特别涉及一种目标检测方法、装置及计算机设备。The present application relates to the field of machine vision technology, and in particular, to a target detection method, device and computer device.
背景技术Background technique
随着社会的不断进步，视频监控系统的应用范围越来越广泛，智能监控作为视频监控技术的一个研究热点，近年来，在一些特定的场合，例如银行、车站、商场等公共场合逐渐普及。目标检测作为智能监控的一环节，有着非常重要的意义，目标检测可以定义为：判断输入图像或视频中是否存在指定目标，如果存在指定目标，则输出指定目标在图像或者视频中的位置信息。目前，常用的目标检测方法主要有背景差法、帧差法、光流法、模板匹配和基于机器学习的方法。前四种目标检测方法都是常规基于图像处理的目标检测方法，易受到光照变化、色彩和姿态等影响。而基于机器学习的目标检测方法，从样本集中学习指定目标的不同变化，具有较好的鲁棒性。With the continuous progress of society, video surveillance systems are applied ever more widely. Intelligent surveillance, a research hotspot of video surveillance technology, has in recent years gradually become common in certain settings, for example public places such as banks, stations, and shopping malls. As a part of intelligent surveillance, target detection is of great importance. Target detection can be defined as: determining whether a specified target exists in an input image or video and, if so, outputting the position information of the specified target in the image or video. At present, commonly used target detection methods mainly include the background subtraction method, the frame difference method, the optical flow method, template matching, and machine learning-based methods. The first four are conventional image-processing-based target detection methods and are susceptible to illumination changes, color, posture, and the like. Machine learning-based target detection methods, which learn the different variations of a specified target from a sample set, have better robustness.
相关的基于机器学习的目标检测方法中，首先构建训练样本集，通过对训练样本集进行训练，得到一个卷积神经网络模型。在进行目标检测时，将待检测的图片输入训练好的卷积神经网络模型，可以得到指定目标所对应的候选框和置信度，然后进行非极大值抑制和阈值筛选，确定待检测的图片中的指定目标。In a related machine learning-based target detection method, a training sample set is first constructed, and a convolutional neural network model is obtained by training on the training sample set. When performing target detection, the image to be detected is input into the trained convolutional neural network model to obtain candidate boxes and confidences corresponding to specified targets, and then non-maximum suppression and threshold screening are performed to determine the specified targets in the image to be detected.
但是，在一些特殊的场景下，目标的分布较为密集，例如，在人群密集的场景下，行人目标会出现拥挤的情况，这样，使得在利用上述基于机器学习的目标检测方法中，所得到的候选框之间存在重叠的情况，对相互重叠的候选框进行非极大值抑制，可能会舍弃掉真实的指定目标对应的候选框，导致漏检部分目标，具有一定的检测误差。However, in some special scenarios the targets are densely distributed; for example, in crowded scenes pedestrian targets may be crowded together, so that with the above machine learning-based target detection method the obtained candidate boxes overlap one another. Performing non-maximum suppression on the mutually overlapping candidate boxes may discard the candidate box corresponding to a real specified target, causing some targets to be missed and introducing a certain detection error.
发明内容Summary of the invention
本申请实施例的目的在于提供一种目标检测方法、装置及计算机设备,以提高目标检测的准确度。具体技术方案如下:The purpose of the embodiments of the present application is to provide a target detection method, apparatus, and computer device to improve the accuracy of target detection. The specific technical solutions are as follows:
第一方面,本申请实施例提供了一种目标检测方法,所述方法包括:In a first aspect, an embodiment of the present application provides a target detection method, where the method includes:
获取通过图像采集器采集的待检测图像;Obtaining an image to be detected collected by the image collector;
将所述待检测图像输入经训练得到的全卷积神经网络,生成所述待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图,以及目标上下顶点关联场图;And inputting the image to be detected into the trained full convolutional neural network, generating a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex correlation field map;
分别针对所述目标上顶点置信度分布图及所述目标下顶点置信度分布图,采用预设目标确定方法,确定所述待检测图像中至少一个上顶点目标及至少一个下顶点目标;Determining, by using a preset target determining method, at least one upper vertex target and at least one lower vertex target in the image to be detected, respectively, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map;
通过将各上顶点目标及各下顶点目标映射至所述目标上下顶点关联场图中，针对第一顶点目标，分别计算所述第一顶点目标与各第二顶点目标连线间的关联场值，其中，若所述第一顶点目标为任一上顶点目标，则所述第二顶点目标为任一下顶点目标，若所述第一顶点目标为任一下顶点目标，则所述第二顶点目标为任一上顶点目标；mapping each upper vertex target and each lower vertex target onto the target upper and lower vertex associated field map and, for a first vertex target, calculating the associated field value of the line connecting the first vertex target to each second vertex target, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target;
基于所述第一顶点目标与各第二顶点目标连线间的关联场值,通过对上下顶点进行匹配,确定关联场值最大的连线为指定目标。Based on the associated field value between the first vertex target and each second vertex target connection, the upper and lower vertices are matched to determine that the connection with the largest associated field value is the specified target.
第二方面,本申请实施例提供了一种目标检测装置,所述装置包括:In a second aspect, an embodiment of the present application provides a target detecting apparatus, where the apparatus includes:
第一获取模块,用于获取通过图像采集器采集的待检测图像;a first acquiring module, configured to acquire an image to be detected collected by the image collector;
第一生成模块,用于将所述待检测图像输入经训练得到的全卷积神经网络,生成所述待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图,以及目标上下顶点关联场图;a first generating module, configured to input the image to be detected into the trained full convolutional neural network, generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower Vertex associated field map;
目标确定模块,用于分别针对所述目标上顶点置信度分布图及所述目标下顶点置信度分布图,采用预设目标确定方法,确定所述待检测图像中至少一个上顶点目标及至少一个下顶点目标;a target determining module, configured to determine at least one upper vertex target and at least one of the to-be-detected images by using a preset target determining method for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively Lower vertex target;
第一计算模块，用于通过将各上顶点目标及各下顶点目标映射至所述目标上下顶点关联场图中，针对第一顶点目标，分别计算所述第一顶点目标与各第二顶点目标连线间的关联场值，其中，若所述第一顶点目标为任一上顶点目标，则所述第二顶点目标为任一下顶点目标，若所述第一顶点目标为任一下顶点目标，则所述第二顶点目标为任一上顶点目标；a first calculating module, configured to map each upper vertex target and each lower vertex target onto the target upper and lower vertex associated field map and, for a first vertex target, calculate the associated field value of the line connecting the first vertex target to each second vertex target, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target;
匹配模块,用于基于所述第一顶点目标与各第二顶点目标连线间的关联场值,通过对上下顶点进行匹配,确定关联场值最大的连线为指定目标。And a matching module, configured to determine, according to the associated field value between the first vertex target and each second vertex target connection, by matching the upper and lower vertices, determining that the connection with the largest associated field value is the specified target.
第三方面,本申请实施例提供了一种存储介质,用于存储可执行代码,所述可执行代码用于在运行时执行:本申请实施例第一方面所提供的目标检测方法。In a third aspect, the embodiment of the present application provides a storage medium for storing executable code, where the executable code is used to execute at a runtime: the target detection method provided by the first aspect of the embodiment of the present application.
第四方面,本申请实施例提供了一种应用程序,用于在运行时执行:本申请实施例第一方面所提供的目标检测方法。In a fourth aspect, an embodiment of the present application provides an application for performing the target detection method provided by the first aspect of the embodiment of the present application.
第五方面,本申请实施例提供了一种计算机设备,包括图像采集器、处理器和存储介质,其中,In a fifth aspect, an embodiment of the present application provides a computer device, including an image collector, a processor, and a storage medium, where
所述图像采集器,用于采集待检测图像;The image collector is configured to collect an image to be detected;
所述存储介质,用于存放可执行代码;The storage medium is configured to store executable code;
所述处理器,用于执行所述存储介质上所存放的可执行代码时,实现如第一方面所提供的目标检测方法。The processor, when used to execute executable code stored on the storage medium, implements the object detection method provided by the first aspect.
综上可见，本申请实施例提供的方案中，通过将获取的待检测图像输入经训练得到的全卷积神经网络，生成待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图，分别根据目标上顶点置信度分布图和目标下顶点置信度分布图，确定待检测图像中的上顶点目标和下顶点目标，再通过将上顶点目标和下顶点目标映射至目标上下顶点关联场图，计算得到针对第一顶点目标、与各第二顶点目标连线间的关联场值，最后，基于各关联场值，通过对上下顶点进行匹配，确定关联场值最大的连线为指定目标。采用经训练得到的全卷积神经网络，能够提取到指定目标的上顶点和下顶点，并且通过映射建立上顶点与下顶点的连接，再通过匹配，将匹配成功的上下顶点连线作为指定目标，指定目标用连线表示，排除了候选框出现重叠的情况发生，即使指定目标分布密集，由于指定目标的上下顶点可以通过全卷积神经网络准确定位，则可以用上下顶点的连线清晰区分各指定目标，提高了目标检测的准确度。In summary, in the solution provided by the embodiments of the present application, the acquired image to be detected is input into the trained full convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map of the image to be detected; the upper vertex targets and lower vertex targets in the image to be detected are determined from the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; the upper vertex targets and lower vertex targets are then mapped onto the target upper and lower vertex associated field map, and for each first vertex target the associated field value of the line connecting it to each second vertex target is calculated; finally, based on these associated field values, the upper and lower vertices are matched and the line with the largest associated field value is determined as a specified target. Using the trained full convolutional neural network, the upper vertex and the lower vertex of a specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the successfully matched upper-lower vertex lines are then taken as specified targets. Since a specified target is represented by a line, overlapping candidate boxes cannot occur; even if specified targets are densely distributed, the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, so the specified targets can be clearly distinguished by the lines connecting their upper and lower vertices, which improves the accuracy of target detection.
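As a concrete illustration of the vertex-matching step summarized above, the following sketch samples the associated field along each candidate upper-lower line and keeps, per upper vertex, the line with the largest value. This is an assumed greedy scheme with hypothetical names and threshold; the embodiment only requires that the line with the largest associated field value be selected.

```python
import numpy as np

def line_field_value(field, p_up, p_down, n_samples=10):
    """Average the associated-field values sampled along the segment
    joining an upper-vertex candidate and a lower-vertex candidate."""
    (r0, c0), (r1, c1) = p_up, p_down
    ts = np.linspace(0.0, 1.0, n_samples)
    rows = np.round(r0 + ts * (r1 - r0)).astype(int)
    cols = np.round(c0 + ts * (c1 - c0)).astype(int)
    return float(field[rows, cols].mean())

def match_vertices(field, ups, downs, thresh=0.5):
    """Greedy matching: each upper vertex keeps the lower vertex whose
    connecting line has the largest associated field value."""
    pairs = []
    for u in ups:
        scores = [line_field_value(field, u, d) for d in downs]
        best = int(np.argmax(scores))
        if scores[best] > thresh:
            pairs.append((u, downs[best]))
    return pairs

# Toy field: strong association only along column 5.
field = np.zeros((20, 20))
field[:, 5] = 1.0
pairs = match_vertices(field, ups=[(2, 5)], downs=[(15, 5), (15, 12)])
print(pairs)  # the vertical line lying along the field wins
```

The line from (2, 5) to (15, 12) leaves the high-association column almost immediately, so its sampled average is low and the vertical line is selected.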
附图说明DRAWINGS
为了更清楚地说明本申请实施例和现有技术的技术方案,下面对实施例和现有技术中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present application and the technical solutions of the prior art, the following description of the embodiments and the drawings used in the prior art will be briefly introduced. Obviously, the drawings in the following description are only Some embodiments of the application may also be used to obtain other figures from those of ordinary skill in the art without departing from the scope of the invention.
图1为本申请实施例的目标检测方法的一种流程示意图;1 is a schematic flowchart of a target detecting method according to an embodiment of the present application;
图2为本申请实施例的全卷积神经网络的结构示意图;2 is a schematic structural diagram of a full convolutional neural network according to an embodiment of the present application;
图3为本申请实施例的目标检测方法的另一种流程示意图;FIG. 3 is another schematic flowchart of a target detecting method according to an embodiment of the present application; FIG.
图4为本申请实施例的对待检测图像进行提取得到的目标上顶点置信度真值图、目标下顶点置信度真值图及目标上下顶点关联场真值图;4 is a true value map of a target upper vertex confidence obtained by extracting a to-be-detected image according to an embodiment of the present application, a true value map of the target lower vertex confidence level, and a true value map of the target upper and lower vertex associated fields;
图5为本申请实施例的另一种全卷积神经网络的结构示意图;FIG. 5 is a schematic structural diagram of another full convolutional neural network according to an embodiment of the present application; FIG.
图6为本申请实施例的行人检测结果示意图;6 is a schematic diagram of a pedestrian detection result according to an embodiment of the present application;
图7为本申请实施例的目标检测装置的一种结构示意图;FIG. 7 is a schematic structural diagram of a target detecting apparatus according to an embodiment of the present application;
图8为本申请实施例的目标检测装置的另一种结构示意图;FIG. 8 is another schematic structural diagram of an object detecting apparatus according to an embodiment of the present application; FIG.
图9为本申请实施例的计算机设备的结构示意图。FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
具体实施方式Detailed ways
为使本申请的目的、技术方案、及优点更加清楚明白,以下参照附图并举实施例,对本申请进一步详细说明。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application will be further described in detail below with reference to the accompanying drawings. It is apparent that the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without departing from the inventive scope are the scope of the present application.
下面通过具体实施例,对本申请进行详细的说明。The present application will be described in detail below through specific embodiments.
为了提高目标检测的准确度,本申请实施例提供了一种目标检测方法、装置及计算机设备。In order to improve the accuracy of the target detection, the embodiment of the present application provides a target detection method, device, and computer device.
下面,首先对本申请实施例所提供的一种目标检测方法进行介绍。In the following, a target detection method provided by an embodiment of the present application is first introduced.
本申请实施例所提供的一种目标检测方法的执行主体可以为一种搭载有核心处理芯片的计算机设备,该计算机设备可以为具有图像处理能力的摄像机、图像处理器等。实现本申请实施例所提供的一种目标检测方法的方式可以为设置于执行主体中的软件、硬件电路和逻辑电路中的至少一种。An execution subject of an object detection method provided by an embodiment of the present application may be a computer device equipped with a core processing chip, and the computer device may be a camera having an image processing capability, an image processor, or the like. A manner of implementing a target detection method provided by an embodiment of the present application may be at least one of software, a hardware circuit, and a logic circuit disposed in an execution body.
如图1所示,本申请实施例所提供的一种目标检测方法,可以包括如下步骤:As shown in FIG. 1 , an object detection method provided by an embodiment of the present application may include the following steps:
S101,获取通过图像采集器采集的待检测图像。S101. Acquire an image to be detected collected by an image collector.
其中,图像采集器可以为摄像机或者照相机,当然,图像采集器不仅限于此。如果图像采集器为摄像机,摄像机拍摄的是一段时间内的视频,待检测图像可以为该视频中的任一帧图像。The image collector may be a camera or a camera. Of course, the image collector is not limited thereto. If the image collector is a camera, the camera captures a video for a period of time, and the image to be detected can be any frame image in the video.
S102,将待检测图像输入经训练得到的全卷积神经网络,生成待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图,以及目标上下顶点关联场图。S102. Input the image to be detected into the trained full convolutional neural network, generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex correlation field map.
由于全卷积神经网络具有自动提取指定目标的上顶点目标、下顶点目标特征的能力,且全卷积神经网络的网络参数可以是通过样本训练的过程得到的。因此,利用训练得到的全卷积神经网络可以保证对指定目标的上顶点目标和下顶点目标快速识别。如图2所示,本申请实施例中,全卷积神经网络由多个卷积层和多个降采样层相间排列构成,将获取到的待检测图像输入该全卷积神经网络,通过全卷积神经网络对待检测图像中指定目标的上顶点目标、下顶点目标进行特征提取,可得到待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图。Since the full convolutional neural network has the ability to automatically extract the upper vertex target and the lower vertex target feature of the specified target, and the network parameters of the full convolutional neural network can be obtained through the process of sample training. Therefore, the fully convolutional neural network obtained by training can ensure the fast recognition of the upper vertex target and the lower vertex target of the specified target. As shown in FIG. 2, in the embodiment of the present application, the full convolutional neural network is composed of a plurality of convolution layers and a plurality of downsampling layers, and the acquired image to be detected is input into the full convolutional neural network. The convolutional neural network performs feature extraction on the upper vertex target and the lower vertex target of the specified target in the detected image, and can obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field map of the image to be detected. .
目标上顶点置信度分布图、目标下顶点置信度分布图可以理解为：所检测的目标为指定目标的上顶点的概率和下顶点的概率的分布图。例如，如果指定目标为行人，则目标上顶点置信度分布图为所检测的目标为人头顶点的概率分布图；目标下顶点置信度分布图为所检测的目标为行人双脚的概率分布图。目标上下顶点关联场图上每一像素点表征该位置存在指定目标的上顶点目标或者下顶点目标的关联程度值。目标上顶点置信度分布图及目标下顶点置信度分布图中的参数，可以为具体每个识别的区域内的目标为指定目标的上顶点目标和下顶点目标的具体概率值，其中，识别的区域是与目标的位置及大小相关的区域，通常情况下该区域的面积可以大于或等于目标的实际大小；也可以用像素点的像素值代表概率的大小，该区域中每个像素点的像素值越大，则该区域内的目标为指定目标的上顶点目标或者下顶点目标的概率也越大，当然，本申请实施例中目标上顶点置信度分布图和目标下顶点置信度分布图的具体参数不仅限于此。The target upper vertex confidence distribution map and the target lower vertex confidence distribution map can be understood as distribution maps of the probability that a detected target is the upper vertex or the lower vertex of a specified target. For example, if the specified target is a pedestrian, the target upper vertex confidence distribution map is the probability distribution map of the detected target being the top of a head, and the target lower vertex confidence distribution map is the probability distribution map of the detected target being a pedestrian's feet. Each pixel in the target upper and lower vertex associated field map represents the degree of association of an upper vertex target or a lower vertex target of a specified target existing at that position. The parameters in the target upper vertex confidence distribution map and the target lower vertex confidence distribution map may be, for each identified region, the specific probability values of the target in that region being the upper vertex target or the lower vertex target of a specified target, where an identified region is a region related to the position and size of the target, and its area is usually greater than or equal to the actual size of the target; alternatively, pixel values may represent the probability, so that the larger the pixel values in a region, the greater the probability that the target in the region is the upper vertex target or the lower vertex target of a specified target. Of course, the specific parameters of the target upper vertex confidence distribution map and the target lower vertex confidence distribution map in the embodiments of the present application are not limited thereto.
可选的,全卷积神经网络可以包括:卷积层、降采样层及反卷积层。全卷积神经网络往往包括至少一个卷积层和至少一个降采样层,反卷积层为一个可选层,为了使得到的特征图的分辨率与输入的待检测图像的分辨率相同,以减少图像压缩比例的换算的步骤,便于进行特征提取,在最后一个卷积层之后,可以设置一反卷积层。Optionally, the full convolutional neural network may include: a convolution layer, a downsampling layer, and a deconvolution layer. The full convolutional neural network often includes at least one convolution layer and at least one downsampling layer, and the deconvolution layer is an optional layer, in order to make the resolution of the obtained feature image the same as the resolution of the input image to be detected, The step of reducing the conversion of the image compression ratio facilitates feature extraction. After the last convolution layer, a deconvolution layer can be set.
可选的,S102可以通过如下步骤实现。Optionally, S102 can be implemented by the following steps.
第一步,将待检测图像输入训练得到的全卷积神经网络,经卷积层和降采样层相间排列的网络结构,提取待检测图像的特征。In the first step, the image to be detected is input into the trained full convolutional neural network, and the features of the image to be detected are extracted through a network structure in which the convolution layer and the downsampling layer are arranged.
第二步,通过反卷积层将特征上采样至分辨率与待检测图像的分辨率相同,得到上采样后的结果。In the second step, the feature is upsampled to the same resolution as the image to be detected by the deconvolution layer, and the upsampled result is obtained.
将待检测图像输入经训练得到的全卷积神经网络,利用一系列卷积层和降采样层依次提取由低层到高层的特征,该一系列卷积层和降采样层是相间排列的。然后连接反卷积层将特征上采样至输入的待检测图像大小。The image to be detected is input into the trained full convolutional neural network, and the features from the lower layer to the upper layer are sequentially extracted by a series of convolutional layers and downsampling layers, and the series of convolutional layers and downsampled layers are arranged in phase. The deconvolution layer is then connected to upsample the feature to the size of the image to be detected of the input.
第三步，利用1×1卷积层对第二步得到的结果进行运算，得到与待检测图像同等分辨率的目标上顶点置信度分布图、目标下顶点置信度分布图及目标上下顶点关联场图。In the third step, the result obtained in the second step is processed by a 1×1 convolution layer to obtain a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map with the same resolution as the image to be detected.
为了保证目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图与待检测图像有同等分辨率，最后可以通过一卷积层对上采样后的结果进行运算，该卷积层的卷积核尺寸可以选择1×1、3×3或5×5等尺寸的卷积核，但是，为了精确提取一个像素点的特征，可以选定该卷积层的卷积核尺寸为1×1，则通过该卷积层的运算可得到目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图。In order to ensure that the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map have the same resolution as the image to be detected, the upsampled result can finally be processed by a convolution layer; the kernel size of this convolution layer may be chosen as 1×1, 3×3, 5×5, or the like, but in order to precisely extract the feature of a single pixel, a kernel size of 1×1 may be selected, in which case the operation of this convolution layer yields the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map.
S103,分别针对目标上顶点置信度分布图及目标下顶点置信度分布图,采用预设目标确定方法,确定待检测图像中至少一个上顶点目标及至少一个下顶点目标。S103: Determine, by using a preset target determining method, at least one upper vertex target and at least one lower vertex target in the image to be detected, respectively, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map.
The target upper-vertex confidence map of the image to be detected, obtained through the fully convolutional neural network, contains for each identified region the probability that its target is the upper-vertex target of a specified target, and the lower-vertex confidence map likewise contains the probability that each identified region's target is the lower-vertex target of a specified target. Among all these targets there may be targets other than upper-vertex and lower-vertex targets. Therefore, a preset target determination method must be applied separately to the two maps: accurate upper-vertex targets of the specified targets in the image to be detected are determined from the target upper-vertex confidence map, and accurate lower-vertex targets are determined from the target lower-vertex confidence map. The preset target determination method may be setting a threshold: if a probability in the target upper-vertex confidence map is greater than the threshold, the region corresponding to that probability is determined to be an upper-vertex target, and if a probability in the target lower-vertex confidence map is greater than the threshold, the corresponding region is determined to be a lower-vertex target. Alternatively, the method may rely on pixel values: if every pixel value in a region is greater than a preset pixel value, the region is determined to be an upper-vertex or lower-vertex target. The method may also determine a region to be an upper-vertex or lower-vertex target if the confidence of every pixel in it is greater than a preset confidence threshold, or if the average confidence of its pixels is greater than a preset confidence threshold. Of course, the specific manner of determining upper-vertex and lower-vertex targets is not limited to these; for ease of implementation, thresholding may be adopted.
Optionally, S103 may be implemented by the following steps.
In the first step, a non-maximum suppression method is applied to the target upper-vertex confidence map and the target lower-vertex confidence map respectively to determine the position of the center point of at least one detection target.
In the target upper-vertex and lower-vertex confidence maps of the image to be detected, the local confidence maxima indicate the positions of the center points of the detection targets, while spatially clustered non-zero points on a confidence map indicate the region occupied by a detection target. Applying non-maximum suppression to each of the two maps suppresses elements that are not local maxima and searches for the maximum within each region, so the position of each detection target's center point can be obtained. The formation of such a region is related to the confidence of each pixel; because of factors such as two targets being too close together or background objects, the region may deviate from the actual detection target. However, the confidence maximum still marks the detection target's center point, and once the center point position is determined, a detection target can be identified within a certain neighborhood of it. Determining the center point position therefore improves the accuracy of target detection.
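The non-maximum suppression step above can be sketched in numpy: a pixel survives only if it equals the maximum of its local window and exceeds a small floor, leaving one peak per clustered region. The window size, floor value, and function name are illustrative, not taken from the embodiment.

```python
import numpy as np

def nms_peaks(conf_map, window=3, floor=0.1):
    """Find local maxima of a confidence map by non-maximum suppression.

    A pixel is kept as a candidate center point only if its confidence
    equals the maximum of its (window x window) neighbourhood and exceeds
    `floor`; all non-maximum elements are suppressed, leaving one peak per
    spatially clustered non-zero region.
    """
    h, w = conf_map.shape
    r = window // 2
    # Pad with zeros so border pixels also have a full neighbourhood.
    padded = np.pad(conf_map, r, mode="constant")
    peaks = []
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            v = conf_map[y, x]
            if v >= floor and v == patch.max():
                peaks.append((y, x))
    return peaks
```

For example, a map with a single bright blob yields exactly one center point at the blob's maximum, even though many surrounding pixels are non-zero.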
In the second step, the confidences of all pixels in the neighborhood of each detection target's center point are obtained.
Since the neighborhood of a detection target's center point can be taken as one detection target, the neighborhood size can be determined by statistical analysis of the upper-vertex and lower-vertex sizes of the specified target. For a pedestrian target, for example, the neighborhood size of the upper-vertex target may be an average computed from actual head radii, or a value obeying a preset distribution. The neighborhood size of the lower-vertex target may be set equal to that of the upper vertex; of course, the two neighborhood sizes may differ, and the lower-vertex neighborhood size may also be determined from the actual lower-vertex size of the specified target. The greater the confidences of all pixels in the neighborhood of a detection target's center point, the greater the probability that the detection target is an upper-vertex or lower-vertex target; therefore, in this embodiment, the confidences of all pixels in the neighborhood need to be obtained.
In the third step, a detection target in the target upper-vertex confidence map whose every pixel confidence is greater than a preset confidence threshold is determined to be an upper-vertex target, and a detection target in the target lower-vertex confidence map whose every pixel confidence is greater than the preset confidence threshold is determined to be a lower-vertex target.
The greater the confidences of all pixels in the neighborhood of a detection target's center point, the greater the probability that the detection target is the upper-vertex or lower-vertex target of a specified target. Therefore, in this embodiment, a preset confidence threshold is set in advance. If the confidences of all pixels in the neighborhood of a detection target's center point in the target upper-vertex confidence map are greater than the preset confidence threshold, that detection target can be determined to be an upper-vertex target of the image to be detected; if the confidences of all pixels in the neighborhood of a detection target's center point in the target lower-vertex confidence map are greater than the preset confidence threshold, that detection target can be determined to be a lower-vertex target. The preset confidence threshold may be set according to experimental data or requirements. For example, with the threshold set to 0.7, a detection target is determined to be an upper-vertex target if all pixel confidences in the neighborhood of its center point in the target upper-vertex confidence map exceed 0.7, and a lower-vertex target if all pixel confidences in the neighborhood of its center point in the target lower-vertex confidence map exceed 0.7. As further examples, the preset confidence threshold may be set to 0.85, 0.9, or another value, which is not limited here. Because this embodiment requires the confidences of all pixels in the neighborhood of the center point to exceed the preset confidence threshold, the accuracy of target detection is further ensured.
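The acceptance test described above — keep a candidate center only if every pixel in its neighborhood clears the preset confidence threshold — can be sketched as follows. The square-neighborhood shape, radius, and 0.7 default are illustrative assumptions.

```python
import numpy as np

def accept_vertex(conf_map, center, radius=2, threshold=0.7):
    """Accept a candidate center as an upper/lower vertex target only if
    every pixel in its (2*radius+1)^2 neighbourhood (clipped at the image
    border) has confidence strictly greater than `threshold`."""
    y, x = center
    h, w = conf_map.shape
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    patch = conf_map[y0:y1, x0:x1]
    return bool((patch > threshold).all())
```

A single low-confidence pixel inside the neighborhood is enough to reject the candidate, which is exactly what makes the check stricter than thresholding the center pixel alone.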
S104: Map each upper-vertex target and each lower-vertex target into the target upper-lower-vertex association field map, and, for a first vertex target, calculate the association field value of the line between the first vertex target and each second vertex target.
Here, the first vertex target and the second vertex target are each any upper-vertex target or any lower-vertex target: if the first vertex target is an upper-vertex target, the second vertex target is a lower-vertex target, and if the first vertex target is a lower-vertex target, the second vertex target is an upper-vertex target. After the upper-vertex and lower-vertex targets of the specified targets in the scene are determined, they can be mapped into the target upper-lower-vertex association field map obtained in S102. Since each pixel of the association field map represents the degree of association, at that position, with an upper-vertex or lower-vertex target of a specified target, connecting each upper-vertex target with each lower-vertex target yields the sum of the association degree values along each line between a pair of connected upper and lower vertices. This sum can be defined as the association field value of the line; alternatively, the mean of the association degree values of the two connected vertices can be defined as the association field value. For the first vertex target, among its lines to the second vertex targets, a larger association field value indicates a higher degree of association between the line's upper and lower vertices, that is, a larger probability that the line is a specified target.
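The scoring of one candidate line can be sketched by sampling the association field along the segment joining an upper and a lower vertex and accumulating how well the field agrees with the segment's direction (here via a dot product with the segment's unit vector, one common choice for vector-valued association fields). The sample count and the dot-product formulation are assumptions for illustration.

```python
import numpy as np

def line_field_value(field, p_up, p_down, n_samples=10, reduce="mean"):
    """Association field value of the line from p_up to p_down.

    `field` has shape (H, W, 2): one 2-D vector per pixel, in (y, x)
    order. The score averages (or sums) the dot product between the field
    sampled along the segment and the segment's own unit direction, so a
    line lying on a true target accumulates the largest value.
    """
    p_up = np.asarray(p_up, dtype=float)
    p_down = np.asarray(p_down, dtype=float)
    seg = p_down - p_up
    norm = np.linalg.norm(seg)
    if norm == 0:
        return 0.0
    direction = seg / norm
    total = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        y, x = np.rint(p_up + t * seg).astype(int)
        total += float(np.dot(field[y, x], direction))
    return total / n_samples if reduce == "mean" else total
```

A field that everywhere points straight down scores 1.0 for a vertical line and 0 for the same line measured against a perpendicular field, matching the intuition that the largest association field value marks the most likely pairing.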
S105: Based on the association field values of the lines between the first vertex target and the second vertex targets, match the upper and lower vertices and determine the line with the largest association field value to be a specified target.
For the first vertex target, among its lines to the second vertex targets, a larger association field value means a larger probability that the line is a specified target, so the line with the largest association field value can be determined to be the specified target. Moreover, in general only one second vertex target connects with the first vertex target to form a specified target, so by matching the upper and lower vertices, the line with the largest association field value can be determined for each first vertex target. For example, suppose five upper-vertex targets and four lower-vertex targets are determined by S103. For the first upper-vertex target, the line to the first lower-vertex target has the largest association field value, so that line is determined to be a specified target; for the second upper-vertex target, the line to the third lower-vertex target has the largest association field value, so that line is determined to be a specified target; for the third upper-vertex target, the line to the second lower-vertex target has the largest association field value, so that line is determined to be a specified target; and for the fifth upper-vertex target, the line to the fourth lower-vertex target has the largest association field value, so that line is determined to be a specified target. Since every line from the fourth upper vertex to a lower vertex has an association field value smaller than those of the other lines, the fourth upper vertex can be judged a possibly misidentified upper-vertex target and is discarded. Optionally, the matching of upper and lower vertices may use the classic bipartite graph matching method, the Hungarian algorithm, to achieve one-to-one matching between upper-vertex and lower-vertex targets. Of course, any method that achieves one-to-one matching between targets is applicable to this embodiment, and the methods are not enumerated here one by one.
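The one-to-one matching step can be sketched as choosing the assignment of upper to lower vertices that maximizes the total association field value. For clarity this sketch enumerates assignments by brute force; in practice the Hungarian algorithm mentioned above (e.g., `scipy.optimize.linear_sum_assignment`) produces the same optimum in polynomial time. The score-matrix layout is an assumption.

```python
from itertools import permutations

def match_vertices(score):
    """Optimal one-to-one matching of upper to lower vertex targets.

    `score[i][j]` is the association field value of the line joining
    upper vertex i and lower vertex j. Returns (pairs, total) where pairs
    is a list of (upper, lower) indices. When the counts differ, the
    surplus vertices are left unmatched, mirroring the discard step for
    misidentified vertices described above.
    """
    n_up, n_down = len(score), len(score[0])
    best_pairs, best_total = [], float("-inf")
    if n_up <= n_down:
        for perm in permutations(range(n_down), n_up):
            total = sum(score[i][j] for i, j in enumerate(perm))
            if total > best_total:
                best_total, best_pairs = total, list(enumerate(perm))
    else:
        for perm in permutations(range(n_up), n_down):
            total = sum(score[i][j] for j, i in enumerate(perm))
            if total > best_total:
                best_total = total
                best_pairs = [(i, j) for j, i in enumerate(perm)]
    return best_pairs, best_total
```

Brute force is factorial in the number of vertices and only suitable for illustrating the objective; the bipartite-matching formulation is what makes the step tractable for crowded scenes.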
Because the line with the largest computed association field value may still be a false detection, a preset association field threshold can be set to further improve detection accuracy: if the largest association field value of a line is greater than this preset association field threshold, the line is an accurate specified target; if not, the line is a false detection and the result is discarded. Once the specified targets are determined, it can be established whether a specified target exists in the image to be detected, and the accurate position information of the specified target can be determined.
By applying this embodiment, the acquired image to be detected is input into the trained fully convolutional neural network to generate the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map of the image. The upper-vertex and lower-vertex targets in the image are determined from the two confidence maps respectively; these targets are then mapped into the association field map to compute, for each first vertex target, the association field value of its line to each second vertex target; finally, based on these association field values, the upper and lower vertices are matched and the line with the largest association field value is determined to be a specified target. With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, the connection between upper and lower vertices is established through mapping, and the successfully matched upper-lower vertex lines serve as the specified targets. Because a specified target is represented by a line, overlapping candidate boxes are excluded: even when specified targets are densely distributed, their upper and lower vertices can be accurately located by the fully convolutional neural network, so the lines between upper and lower vertices clearly distinguish the specified targets and improve detection accuracy. Moreover, since each detected specified target is the line between an upper-vertex target and a lower-vertex target, the line can finely and clearly reflect the posture information of the specified target (for example, leaning forward, leaning backward, or bending over), which benefits subsequent applications such as target behavior analysis. In this embodiment, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of targets are then accurately located and matched, and the successfully matched vertex pairs serve as the specified-target detection results, giving better robustness and higher detection accuracy for specified targets. At the same time, detection does not require presetting anchor boxes of certain scales and aspect ratios as reference boxes, so the target detection performance of the algorithm does not depend on the choice of anchor boxes, and the scale and aspect-ratio problems of targets are solved adaptively.
Based on the embodiment shown in FIG. 1, as shown in FIG. 3, an embodiment of the present application further provides a target detection method, which may include the following steps:
S301: Acquire a preset training set sample image, together with the upper-edge center position, the lower-edge center position, and the line connecting the upper and lower edge center positions of each specified target in the preset training set sample image.
In this embodiment, before the fully convolutional neural network can be run, it must be constructed. Since the network parameters of the fully convolutional neural network are obtained through training, the training process can be understood as learning the upper-vertex and lower-vertex targets of the specified targets. Preset training set sample images need to be constructed for the features of various specified targets, each image corresponding to the upper-vertex and lower-vertex target features of different specified targets. The confidences of the upper-vertex and lower-vertex targets can be preset to obey a circular Gaussian distribution, so it is necessary to obtain the upper-edge center position of each specified target (for example, the head vertex position of a pedestrian target) and the lower-edge center position (for example, the center position between the feet of a pedestrian target), as well as the line connecting the two; these upper and lower edge center positions can be annotated.
S302: Generate a target upper-vertex confidence truth map and a target lower-vertex confidence truth map of the preset training set sample image according to a preset distribution law and the upper-edge and lower-edge center positions of each specified target.
Here, the preset distribution law is the probability distribution obeyed by the confidences of the upper-vertex and lower-vertex targets of the specified targets. In general, the confidences of the upper and lower vertex targets obey a circular Gaussian distribution, although this embodiment is not limited thereto. Suppose that in an annotated image the upper-edge center position of each specified target is P_up and the lower-edge center position is P_down, and that the confidences of the upper-vertex and lower-vertex targets obey a circular Gaussian distribution N. Then the target upper-vertex confidence truth map and the target lower-vertex confidence truth map of the preset training set sample image are obtained according to formulas (1) and (2).
$$D_{up}(p) = \max_{i \in \{1,\dots,n_{ped}\}} N\!\left(p;\, P_{up}^{i}, \sigma_{up}\right), \qquad D_{down}(p) = \max_{i \in \{1,\dots,n_{ped}\}} N\!\left(p;\, P_{down}^{i}, \sigma_{down}\right) \tag{1}$$

$$N\!\left(p;\, P, \sigma\right) = \exp\!\left(-\frac{\lVert p - P \rVert_2^2}{2\sigma^2}\right) \tag{2}$$
where p denotes the position coordinates of any pixel on a confidence truth map; up denotes the upper-vertex target of a specified target; D_up(p) denotes the confidence of the upper-vertex target at position p on the target upper-vertex confidence truth map; n_ped denotes the total number of specified targets in the training set sample image; P_up denotes the annotated coordinate position of the upper-vertex target of each specified target in the training set sample image; σ_up denotes the variance of the circular Gaussian distribution N obeyed by the upper-vertex target; down denotes the lower-vertex target of a specified target; D_down(p) denotes the confidence of the lower-vertex target at position p on the target lower-vertex confidence truth map; P_down denotes the annotated coordinate position of the lower-vertex target of each specified target in the training set sample image; and σ_down denotes the variance of the circular Gaussian distribution N obeyed by the lower-vertex target. Formula (2) is a standard Gaussian distribution, ensuring that the annotated upper-vertex and lower-vertex positions of each specified target have the highest confidence of 1.0, with confidence decreasing in a Gaussian fashion to 0 in all directions.
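A minimal numpy sketch of formulas (1) and (2): each annotated vertex position contributes a circular Gaussian peaking at 1.0, and overlapping targets are combined with a pixel-wise maximum so the peak stays exactly 1.0 (the combination rule across targets is an assumption here, as is the function name and grid convention).

```python
import numpy as np

def vertex_confidence_truth(shape, centers, sigma):
    """Ground-truth vertex confidence map per formulas (1)-(2).

    Each annotated vertex position (cy, cx) contributes a circular
    Gaussian exp(-||p - P||^2 / (2*sigma^2)) that peaks at 1.0; where
    several targets overlap, the pixel-wise maximum is kept so the
    annotated positions keep the highest confidence 1.0, decaying
    toward 0 in all directions.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    conf = np.zeros(shape)
    for cy, cx in centers:
        d2 = (ys - cy) ** 2 + (xs - cx) ** 2
        conf = np.maximum(conf, np.exp(-d2 / (2.0 * sigma ** 2)))
    return conf
```

The same routine serves for both the upper-vertex and lower-vertex truth maps; only the annotated centers and the variance change.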
S303: Generate a target upper-lower-vertex association field truth map of the preset training set sample image according to the line connecting the upper and lower edge center positions of each specified target.
In general, for a specified target, the association field on the line between the upper-vertex target and the lower-vertex target obeys a unit vector v whose magnitude equals 1 and whose direction is along the line; of course, this embodiment is not limited thereto. According to the line connecting the upper and lower edge center positions of each specified target, and formulas (3) and (4), the target upper-lower-vertex association field truth map of the preset training set sample image can be generated.
$$A(p) = \begin{cases} \vec{v}, & \text{if } p \text{ lies on the line from } P_{up}^{i} \text{ to } P_{down}^{i} \text{ for some } i \in \{1,\dots,n_{ped}\} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

$$\vec{v} = \frac{P_{down} - P_{up}}{\lVert P_{down} - P_{up} \rVert_2} \tag{4}$$
where p denotes the position coordinates of any pixel on the target upper-lower-vertex association field truth map; A(p) denotes the association field value at position p on that truth map; and n_ped denotes the total number of specified targets in the training set sample image. Formula (4) expresses that the association field on the line between the upper-vertex target and the lower-vertex target of a specified target is the unit vector v with magnitude equal to 1 along the direction of the line.
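Formulas (3) and (4) can be sketched in numpy as follows. A small distance tolerance `radius` is assumed to decide which pixels count as "lying on the line" (the exact line rasterization is not specified above), and the O(H·W) per-target scan is for clarity, not efficiency.

```python
import numpy as np

def association_field_truth(shape, pairs, radius=1.0):
    """Ground-truth upper/lower vertex association field per (3)-(4).

    For every target, pixels within `radius` of the segment from its
    upper vertex P_up to its lower vertex P_down receive the unit vector
    v = (P_down - P_up) / ||P_down - P_up||_2; all other pixels stay
    (0, 0). Coordinates are (y, x).
    """
    h, w = shape
    field = np.zeros((h, w, 2))
    for p_up, p_down in pairs:
        p_up = np.asarray(p_up, float)
        p_down = np.asarray(p_down, float)
        seg = p_down - p_up
        length = np.linalg.norm(seg)
        if length == 0:
            continue
        v = seg / length  # formula (4): unit vector along the line
        for y in range(h):
            for x in range(w):
                rel = np.array([y, x], float) - p_up
                t = np.clip(np.dot(rel, v), 0.0, length)
                dist = np.linalg.norm(rel - t * v)  # distance to segment
                if dist <= radius:
                    field[y, x] = v  # formula (3): v on the line, 0 off it
    return field
```

For a vertical head-to-feet line, every pixel on the segment carries the downward unit vector and everything else stays zero, matching the truth-map example of FIG. 4.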
The generation of the target upper-vertex confidence truth map, the target lower-vertex confidence truth map, and the target upper-lower-vertex association field truth map of a preset training set sample image is shown in FIG. 4, taking pedestrian detection as an example. In the target upper-vertex confidence truth map, each bright spot corresponds to the upper-vertex target of one specified target in the preset training set sample image; in the target lower-vertex confidence truth map, each bright spot corresponds to the lower-vertex target of one specified target; and in the target upper-lower-vertex association field truth map, each line is the connection between the upper-vertex target and the lower-vertex target of one specified target.
S304: Input the preset training set sample image into an initial fully convolutional neural network to obtain the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map of the preset training set sample image.
Here, the network parameters of the initial fully convolutional neural network are preset values. Through the initial fully convolutional neural network, the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map of the preset training set sample image can be obtained. The target upper-vertex confidence map is compared with the target upper-vertex confidence truth map, the target lower-vertex confidence map with the target lower-vertex confidence truth map, and the target upper-lower-vertex association field map with the target upper-lower-vertex association field truth map. Through continuous training, learning, and updating of the network parameters, the three outputs of the fully convolutional neural network are made to approach their respective truth maps; when they are sufficiently close, the fully convolutional neural network is determined to be the trained fully convolutional neural network usable for target detection.
Optionally, the fully convolutional neural network may include convolutional layers, downsampling layers, and a deconvolution layer.
A fully convolutional neural network usually includes at least one convolutional layer and at least one downsampling layer, while the deconvolution layer is optional. To make the resolution of the resulting feature maps the same as that of the input preset training set sample image, thereby omitting the step of converting by the image compression ratio and facilitating the confidence computation, a deconvolution layer can be placed after the last convolutional layer.
Optionally, the step of computing the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map of the preset training set sample image may be implemented by the following steps.
In the first step, the preset training set sample image is input into the initial fully convolutional neural network, and the features of the preset training set sample image are extracted through a network structure in which convolutional layers and downsampling layers are alternately arranged.
In the second step, the features are upsampled through the deconvolution layer to the same resolution as the preset training set sample image, yielding the upsampled result.
The preset training set sample image is input into the initial fully convolutional neural network; as shown in FIG. 5, a series of alternately arranged convolutional layers and downsampling layers successively extract features from low level to high level. A deconvolution layer is then connected to upsample the features to the size of the input preset training set sample image.
In the third step, a 1×1 convolutional layer is applied to the result of the second step to obtain the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map at the same resolution as the preset training set sample image.
为了保证目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图与预设训练集样本图像有同等分辨率,最后可以通过一卷积层对上采样后的结果进行运算,该卷积层的卷积核尺寸可以选择1×1、3×3或5×5等尺寸的卷积核,但是,为了精确提取一个像素点的特征,可以选定该卷积层的卷积核尺寸为1×1,则通过该卷积层的运算可得到目标上顶点置 信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图。In order to ensure that the vertex confidence distribution map on the target, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field map have the same resolution as the preset training set sample image, the result of the upsampling can be finally performed by a roll of layers. For calculation, the convolution kernel size of the convolution layer may be selected from a convolution kernel of size 1×1, 3×3, or 5×5, but in order to accurately extract features of one pixel, the convolution layer may be selected. When the convolution kernel size is 1×1, the convolutional layer on the target can be obtained by the operation of the convolution layer, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field map.
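A 1×1 convolution is simply a per-pixel linear projection across feature channels, so the output heads described above can be sketched with a single einsum. The channel counts below (one channel for each confidence map and two for the components of the association field) are illustrative assumptions, not values taken from this application:

```python
import numpy as np

def conv1x1(features, weights, bias):
    """Per-pixel linear projection: out[c,y,x] = sum_k W[c,k] * features[k,y,x] + b[c]."""
    return np.einsum('ck,kyx->cyx', weights, features) + bias[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 32, 32))   # upsampled feature maps, shape (C, H, W)

# hypothetical head layout: 1 channel per confidence map, 2 for the association field
w = rng.standard_normal((4, 64)) * 0.01
b = np.zeros(4)
out = conv1x1(feat, w, b)

upper_conf  = out[0]     # target upper vertex confidence distribution map
lower_conf  = out[1]     # target lower vertex confidence distribution map
assoc_field = out[2:4]   # 2-channel target upper and lower vertex association field map

print(upper_conf.shape, assoc_field.shape)  # (32, 32) (2, 32, 32)
```

Because the kernel covers exactly one pixel, the output at each location depends only on the feature vector at that location, which is why the 1×1 choice "extracts the feature of a single pixel precisely".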
S305. Compute the first average error between the target upper vertex confidence distribution map of the preset training set sample images and the target upper vertex confidence ground-truth map, the second average error between the target lower vertex confidence distribution map and the target lower vertex confidence ground-truth map, and the third average error between the target upper and lower vertex association field map and the target upper and lower vertex association field ground-truth map.
S306. If the first average error, the second average error, or the third average error is greater than a preset error threshold, update the network parameters according to the first average error, the second average error, the third average error, and a preset gradient computation strategy to obtain an updated fully convolutional neural network; compute the first, second, and third average errors produced by the updated network, and repeat until the first, second, and third average errors are all less than or equal to the preset error threshold, at which point the corresponding network is determined to be the trained fully convolutional neural network.
The fully convolutional neural network can be trained with the classical back-propagation algorithm. The preset gradient computation strategy may be ordinary (batch) gradient descent or stochastic gradient descent. Gradient descent searches along the negative gradient direction; the closer to the target value, the smaller the step and the slower the progress. Because stochastic gradient descent uses only one sample per update, each iteration is much faster than batch gradient descent; therefore, to improve computational efficiency, this embodiment may use stochastic gradient descent to update the network parameters. During training, compute the first average error between the target upper vertex confidence distribution map output by the network for the preset training set sample images and the target upper vertex confidence ground-truth map, the second average error between the target lower vertex confidence distribution map and the target lower vertex confidence ground-truth map, and the third average error between the target upper and lower vertex association field map and the target upper and lower vertex association field ground-truth map, as in equations (5) and (6); update the network parameters of the fully convolutional neural network with these average errors, and iterate the above process until the average error no longer decreases. The network parameters of the fully convolutional neural network include the convolution kernel parameters and bias parameters of the convolutional layers.
L_D(θ) = (1/N) Σ_{i=1…N} ‖F_D(X_i; θ) − D_i‖²,  L_A(θ) = (1/N) Σ_{i=1…N} ‖F_A(X_i; θ) − A_i‖²    (5)
L(θ) = L_D(θ) + λL_A(θ)    (6)
where L_D(θ) denotes the first average error or the second average error; θ denotes the network parameters of the fully convolutional neural network; N denotes the number of preset training set sample images; F_D(X_i; θ) denotes the target upper vertex confidence distribution map or the target lower vertex confidence distribution map output by the network; X_i denotes the input image with index i; i denotes the image index; D_i denotes the target upper vertex confidence ground-truth map or the target lower vertex confidence ground-truth map obtained via equations (1) and (2); L_A(θ) denotes the third average error; F_A(X_i; θ) denotes the target upper and lower vertex association field map output by the network; A_i denotes the target upper and lower vertex association field ground-truth map obtained via equations (3) and (4); and λ is a balance parameter between the two errors, typically set to 1.0.
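A minimal sketch of the loss in equations (5) and (6), assuming a mean squared error over N sample images as written above; the map shapes and the value λ = 1.0 follow the text, while all data here is illustrative:

```python
import numpy as np

def average_error(pred_maps, truth_maps):
    """L_D(θ) or L_A(θ): mean over N images of the squared L2 distance between maps."""
    n = len(pred_maps)
    return sum(np.sum((p - t) ** 2) for p, t in zip(pred_maps, truth_maps)) / n

rng = np.random.default_rng(1)
N, H, W = 4, 16, 16
pred_conf   = [rng.standard_normal((H, W)) for _ in range(N)]     # F_D(X_i; θ)
truth_conf  = [rng.standard_normal((H, W)) for _ in range(N)]     # D_i
pred_assoc  = [rng.standard_normal((2, H, W)) for _ in range(N)]  # F_A(X_i; θ)
truth_assoc = [rng.standard_normal((2, H, W)) for _ in range(N)]  # A_i

lam = 1.0                                    # balance parameter λ
L_D = average_error(pred_conf, truth_conf)   # confidence term of equation (5)
L_A = average_error(pred_assoc, truth_assoc) # association field term of equation (5)
L   = L_D + lam * L_A                        # equation (6)
```

The gradients of L with respect to θ drive the parameter updates of step S306.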
S307. Acquire the image to be detected captured by an image collector.
S308. Input the image to be detected into the trained fully convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image to be detected.
S309. For the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, determine at least one upper vertex target and at least one lower vertex target in the image to be detected using a preset target determination method.
S310. Map each upper vertex target and each lower vertex target into the target upper and lower vertex association field map, and, for a first vertex target, compute the association field value of the line connecting the first vertex target to each second vertex target.
S311. Based on the association field values of the lines connecting the first vertex target to the second vertex targets, match the upper and lower vertices and determine the line with the largest association field value as a specified target.
S307 to S311 are the same as the steps of the embodiment shown in FIG. 1, achieve the same or similar beneficial effects, and are not repeated here.
Applying this embodiment, the acquired image to be detected is input into the trained fully convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image. The upper vertex targets and lower vertex targets in the image to be detected are determined from the two confidence distribution maps respectively; the vertex targets are then mapped into the association field map to compute, for each first vertex target, the association field value of its line to each second vertex target; finally, based on these association field values, the upper and lower vertices are matched and the line with the largest association field value is determined to be a specified target. With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, the connection between the upper and lower vertex established by mapping, and the successfully matched pair of vertices taken as the specified target. Because a specified target is represented by a line, overlapping candidate boxes cannot occur: even when specified targets are densely distributed, their upper and lower vertices can be located accurately by the network, so the connecting lines clearly distinguish the individual specified targets, improving the accuracy of target detection. Moreover, since the detected specified target is the line connecting the upper vertex target and the lower vertex target, this line reflects the posture information of the specified target (for example, leaning forward, leaning back, or bending over) in a fine and clear way, which benefits subsequent applications such as target behavior analysis. In this embodiment, features with high discriminability are extracted layer by layer through convolution and mapping, the upper and lower vertices of targets are then accurately located and matched, and the successfully matched upper and lower vertices are taken as the specified-target detection results, which offers good robustness and high detection accuracy. At the same time, detection does not require anchor boxes of preset scales and aspect ratios as reference boxes, so the detection performance of the algorithm does not depend on anchor box selection, and the scale and aspect-ratio problem of targets is solved adaptively. During training of the fully convolutional neural network, preset training set sample images are set for the upper vertex targets and lower vertex targets of specified targets with different characteristics; through training and iteration on these sample images, the resulting fully convolutional neural network has strong generalization ability, avoids complicated classifier cascade schemes, and has a simpler structure.
The target detection method provided by the embodiments of the present application is described below with a specific application example of detecting pedestrian targets.
In a street scene, an image to be detected is captured by a monitoring device and input into the trained fully convolutional neural network to obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image. For the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, non-maximum suppression is applied to determine the position of the center point of each detected target; when the confidence of the pixels in the neighborhood of a center point is greater than a preset confidence threshold, a pedestrian head-top vertex target or a target at the center position between the pedestrian's feet is determined.
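The center-point search above can be sketched as a local-maximum filter over a confidence map: a pixel is kept when it is the maximum of its neighborhood and its confidence exceeds the threshold. The 3×3 neighborhood and the threshold value here are illustrative assumptions:

```python
import numpy as np

def nms_peaks(conf, threshold, radius=1):
    """Return (row, col) positions that are local maxima of their (2r+1)x(2r+1)
    neighborhood and whose confidence exceeds the threshold."""
    H, W = conf.shape
    peaks = []
    for y in range(H):
        for x in range(W):
            v = conf[y, x]
            if v <= threshold:
                continue
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            if v == conf[y0:y1, x0:x1].max():   # non-maximum suppression
                peaks.append((y, x))
    return peaks

conf = np.zeros((8, 8))
conf[2, 3] = 0.9   # one head-top vertex
conf[6, 5] = 0.8   # another head-top vertex
conf[6, 6] = 0.3   # suppressed: below threshold and below its local maximum

print(nms_peaks(conf, 0.5))  # [(2, 3), (6, 5)]
```

The same routine is run once on the upper vertex map and once on the lower vertex map to obtain the two candidate sets.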
Then the pedestrian head-top vertex targets and the targets at the center positions between pedestrians' feet are mapped into the target upper and lower vertex association field map obtained above, yielding the association degree value between each head-top vertex target and each feet-center target. From these association degree values, the mean association degree value between each head-top vertex target and each feet-center target can be obtained; through comparison and matching of these means, the detection result shown in FIG. 6 is determined, where each line is one pedestrian target.
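Pairing head-top targets with feet-center targets amounts to matching on a score matrix of mean association values. A minimal greedy sketch, which repeatedly takes the highest remaining score, is shown below; the preset bipartite graph matching method of the application may differ, and the score values are illustrative:

```python
def greedy_match(scores):
    """scores[i][j]: mean association value between upper vertex i and lower vertex j.
    Greedily pair the highest-scoring remaining (i, j) until one side is exhausted."""
    pairs, used_i, used_j = [], set(), set()
    candidates = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,
    )
    for s, i, j in candidates:
        if i not in used_i and j not in used_j:
            pairs.append((i, j, s))
            used_i.add(i)
            used_j.add(j)
    return pairs

# two head-top targets, two feet-center targets;
# head 0 belongs with feet 0 and head 1 with feet 1
scores = [[0.9, 0.2],
          [0.1, 0.8]]
print(greedy_match(scores))  # [(0, 0, 0.9), (1, 1, 0.8)]
```

Each returned pair corresponds to one connecting line, i.e. one pedestrian target in FIG. 6.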
Compared with the related art, this solution inputs the acquired image to be detected into the trained fully convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image; determines the upper vertex targets and lower vertex targets in the image from the two confidence distribution maps respectively; maps them into the association field map to compute, for each first vertex target, the association field value of its line to each second vertex target; and finally matches the upper and lower vertices based on these values, determining the line with the largest association field value as a specified target. With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, the connection between the upper and lower vertex established by mapping, and the successfully matched pair of vertices taken as the specified target. Because a specified target is represented by a line, overlapping candidate boxes cannot occur: even when specified targets are densely distributed, their upper and lower vertices can be located accurately by the network, so the connecting lines clearly distinguish the individual specified targets, improving the accuracy of target detection. Moreover, since the detected specified target is the line connecting the upper vertex target and the lower vertex target, this line reflects the posture information of the specified target (for example, leaning forward, leaning back, or bending over) in a fine and clear way, which benefits subsequent applications such as target behavior analysis.
Corresponding to the above method embodiments, an embodiment of the present application provides a target detection apparatus. As shown in FIG. 7, the target detection apparatus includes:
a first acquisition module 710, configured to acquire an image to be detected captured by an image collector;
a first generation module 720, configured to input the image to be detected into the trained fully convolutional neural network, and generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image to be detected;
a target determination module 730, configured to determine, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, at least one upper vertex target and at least one lower vertex target in the image to be detected using a preset target determination method;
a first calculation module 740, configured to map each upper vertex target and each lower vertex target into the target upper and lower vertex association field map, and compute, for a first vertex target, the association field value of the line connecting the first vertex target to each second vertex target, where if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target; and
a matching module 750, configured to match the upper and lower vertices based on the association field values of the lines connecting the first vertex target to the second vertex targets, and determine the line with the largest association field value as a specified target.
Applying this embodiment, the acquired image to be detected is input into the trained fully convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image. The upper vertex targets and lower vertex targets in the image are determined from the two confidence distribution maps respectively; the vertex targets are then mapped into the association field map to compute, for each first vertex target, the association field value of its line to each second vertex target; finally, based on these association field values, the upper and lower vertices are matched and the line with the largest association field value is determined to be a specified target. With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, the connection between the upper and lower vertex established by mapping, and the successfully matched pair of vertices taken as the specified target. Because a specified target is represented by a line, overlapping candidate boxes cannot occur: even when specified targets are densely distributed, their upper and lower vertices can be located accurately by the network, so the connecting lines clearly distinguish the individual specified targets, improving the accuracy of target detection. Moreover, since the detected specified target is the line connecting the upper vertex target and the lower vertex target, this line reflects the posture information of the specified target (for example, leaning forward, leaning back, or bending over) in a fine and clear way, which benefits subsequent applications such as target behavior analysis. In this embodiment, features with high discriminability are extracted layer by layer through convolution and mapping, the upper and lower vertices of targets are then accurately located and matched, and the successfully matched upper and lower vertices are taken as the specified-target detection results, which offers good robustness and high detection accuracy. At the same time, detection does not require anchor boxes of preset scales and aspect ratios as reference boxes, so the detection performance of the algorithm does not depend on anchor box selection, and the scale and aspect-ratio problem of targets is solved adaptively.
Optionally, the target determination module 730 may be specifically configured to:
apply a non-maximum suppression method to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively to determine the position of the center point of at least one detected target;
obtain the confidence of all pixels in the neighborhood of the center point of each detected target; and
determine as an upper vertex target each detected target in the target upper vertex confidence distribution map whose neighborhood pixels all have confidence greater than a preset confidence threshold, and as a lower vertex target each detected target in the target lower vertex confidence distribution map whose neighborhood pixels all have confidence greater than the preset confidence threshold.
Optionally, the first calculation module 740 may be specifically configured to:
map each upper vertex target and each lower vertex target into the target upper and lower vertex association field map to obtain the association degree value between each upper vertex target and each lower vertex target;
for a first vertex target, draw the line connecting the first vertex target to each second vertex target; and
compute, from the association degree values between the first vertex target and each second vertex target, the mean of the association degree values as the association field value of the line connecting the first vertex target to that second vertex target.
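One way to realize the mean computation is to sample the association field at points along the segment between the two vertices and average the sampled values. The linear sampling scheme and nearest-pixel lookup below are illustrative assumptions:

```python
import numpy as np

def line_association_value(field, p_upper, p_lower, num_samples=10):
    """Average the association field over points sampled on the segment
    from p_upper (y, x) to p_lower (y, x), using nearest-pixel lookup."""
    (y0, x0), (y1, x1) = p_upper, p_lower
    vals = []
    for t in np.linspace(0.0, 1.0, num_samples):
        y = int(round(y0 + t * (y1 - y0)))
        x = int(round(x0 + t * (x1 - x0)))
        vals.append(field[y, x])
    return float(np.mean(vals))

# toy association field: high values only on the column joining (1, 4) and (7, 4)
field = np.zeros((9, 9))
field[1:8, 4] = 1.0

print(line_association_value(field, (1, 4), (7, 4)))  # 1.0 for the true pair
print(line_association_value(field, (1, 4), (7, 0)))  # low for a wrong pair
```

A line connecting a true upper/lower vertex pair runs through the high-valued region of the field, so its mean is large; a line between mismatched vertices mostly crosses low-valued pixels.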
Optionally, the matching module 750 may be specifically configured to:
based on the association field values of the lines connecting the first vertex target to the second vertex targets, select the largest association field value from the association field values using a preset bipartite graph matching method; and
determine the line corresponding to the largest association field value as a specified target.
Optionally, the matching module 750 may be further configured to:
obtain a preset association field threshold;
determine whether the largest association field value is greater than the preset association field threshold; and
if it is greater, perform the step of determining the line corresponding to the largest association field value as a specified target.
It should be noted that the target detection apparatus of this embodiment of the present application is an apparatus applying the embodiment of the target detection method shown in FIG. 1; all embodiments of the above target detection method are applicable to this apparatus and achieve the same or similar beneficial effects, which are not repeated here.
Based on the embodiment shown in FIG. 7, an embodiment of the present application further provides a target detection apparatus. As shown in FIG. 8, the target detection apparatus may include:
a first acquisition module 810, configured to acquire an image to be detected captured by an image collector;
a second acquisition module 820, configured to acquire preset training set sample images, together with the upper edge center position, the lower edge center position, and the line connecting the upper and lower edge center positions of each specified target in the preset training set sample images;
a second generation module 830, configured to generate the target upper vertex confidence ground-truth map and the target lower vertex confidence ground-truth map of the preset training set sample images according to a preset distribution law and the upper edge center position and lower edge center position of each specified target;
a third generation module 840, configured to generate the target upper and lower vertex association field ground-truth map of the preset training set sample images according to the lines connecting the upper and lower edge center positions of the specified targets;
an extraction module 850, configured to input the preset training set sample images into an initial fully convolutional neural network to obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the sample images, where the network parameters of the initial fully convolutional neural network are preset values;
a second calculation module 860, configured to compute the first average error between the target upper vertex confidence distribution map of the preset training set sample images and the target upper vertex confidence ground-truth map, the second average error between the target lower vertex confidence distribution map of the sample images and the target lower vertex confidence ground-truth map, and the third average error between the target upper and lower vertex association field map of the sample images and the target upper and lower vertex association field ground-truth map;
a loop module 870, configured to: if the first average error, the second average error, or the third average error is greater than a preset error threshold, update the network parameters according to the first average error, the second average error, the third average error, and a preset gradient computation strategy to obtain an updated fully convolutional neural network; compute the first, second, and third average errors produced by the updated network; and repeat until the first, second, and third average errors are all less than or equal to the preset error threshold, determining the corresponding network to be the trained fully convolutional neural network;
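The stopping rule of the loop module can be sketched as a gradient-descent loop on a toy least-squares problem that stops once the average error falls to the threshold. The linear model, learning rate, and threshold below are all illustrative assumptions, not the network of this application:

```python
import numpy as np

def train_until_threshold(X, Y, theta, lr=0.1, err_threshold=1e-3, max_iters=10_000):
    """Toy stand-in for the loop module: linear model Y ≈ X @ theta trained by
    gradient descent, iterating until the average error drops to the threshold."""
    n = len(X)
    for _ in range(max_iters):
        residual = X @ theta - Y
        avg_error = np.mean(residual ** 2)   # stand-in for the average errors
        if avg_error <= err_threshold:
            break                            # errors within threshold: training done
        grad = 2.0 / n * X.T @ residual      # preset gradient computation strategy
        theta = theta - lr * grad            # update the network parameters
    return theta, avg_error

rng = np.random.default_rng(2)
X = rng.standard_normal((32, 3))
true_theta = np.array([1.0, -2.0, 0.5])
Y = X @ true_theta
theta, err = train_until_threshold(X, Y, theta=np.zeros(3))
print(err <= 1e-3)  # True
```

In the apparatus, the error terms are the three average errors of module 860 and the parameters are the convolution kernels and biases, but the check-update-repeat structure is the same.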
a first generation module 880, configured to input the image to be detected into the trained fully convolutional neural network, and generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image to be detected;
a target determination module 890, configured to determine, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, at least one upper vertex target and at least one lower vertex target in the image to be detected using a preset target determination method;
a first calculation module 8100, configured to map each upper vertex target and each lower vertex target into the target upper and lower vertex association field map, and compute, for a first vertex target, the association field value of the line connecting the first vertex target to each second vertex target, where if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target; and
a matching module 8110, configured to match the upper and lower vertices based on the association field values of the lines connecting the first vertex target to the second vertex targets, and determine the line with the largest association field value as a specified target.
应用本实施例,通过将获取的待检测图像输入经训练得到的全卷积神经网络,生成待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图,分别根据目标上顶点置信度分布图和目标下顶点置信度分布图,确定待检测图像中的上顶点目标和下顶点目标,再通过将上顶点目标和下顶点目标映射至目标上下顶点关联场图,计算得到针对第一顶点目标、与各第二顶点目标连线间的关联场值,最后,基于各关联场值,通过对上下顶点进行匹配,确定关联场值最大的连线为指定目标。采用经训练得到的全卷积神经网络,能够提取到指定目标的上顶点和下顶点,并且通过映射建立上顶点与下顶点的连接,再通过匹配,将匹配成功的上下顶点连线作为指定目标,指定目标用连线表示,排除了候选框出现重叠的情况发生,即使指定目标分布密集,由于指定目标的上下顶点可以通过全卷积神经网络准确定位,则可以用上下顶点的连线清晰区分各指定目标,提高了目标检测的准确度。并且,由于检测的指定目标为上顶点目标与下顶点目标的连线,通过该连线可以精细明了的反映指定目标的姿态信息(例如,前倾、后仰、 俯身等),有利于后续关于目标行为分析等应用。通过本实施例,通过卷积和映射逐层提取具有高区分度的特征,然后对目标上下顶点准确定位和匹配,将匹配成功的上下顶点作为指定目标检测结果,具有鲁棒性较佳、指定目标检测准确率较高的优点,同时,检测中不需要预先设定一定尺度和高宽比例的锚点框作为基准框,因而算法目标检测的性能不依赖于锚点框的选择,自适应地解决了目标的尺度和高宽比问题。在全卷积神经网络的训练过程中,针对具有不同特征的指定目标的上顶点目标和下顶点目标,设定了预设训练集样本图像,通过对预设训练集样本图像的训练、迭代,得到的全卷积神经网络具有较强的泛化能力,避免了复杂的分类器级联模式,结构更为简单。Applying the embodiment, the acquired image to be detected is input into the trained full convolutional neural network, and the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field map are generated. Determining the upper vertex target and the lower vertex target in the image to be detected according to the vertex confidence distribution map and the target lower vertex confidence distribution map, respectively, and mapping the upper vertex target and the lower vertex target to the target upper and lower vertex associated fields by the upper vertex target and the lower vertex target respectively The figure calculates the associated field value between the first vertex target and the second vertex target connection. Finally, based on the associated field values, the upper and lower vertices are matched to determine the connection with the largest associated field value as the specified target. . 
With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, connections between upper and lower vertices are established through the mapping, and a successfully matched upper-lower vertex connection is taken as a specified target. Because each specified target is represented by a line segment rather than a bounding box, overlapping candidate boxes are avoided. Even when specified targets are densely distributed, their upper and lower vertices can be accurately located by the network, so the targets can be clearly distinguished by their vertex connections, improving the accuracy of target detection. Moreover, since a detected target is the connection between an upper vertex target and a lower vertex target, that connection directly reflects the posture of the target (for example, leaning forward, leaning back, or bending over), which benefits subsequent applications such as target behavior analysis. In this embodiment, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices are accurately located and matched, and the successfully matched vertex pairs are output as detection results, yielding good robustness and high detection accuracy. In addition, no anchor boxes of preset scales and aspect ratios are required as reference boxes, so detection performance does not depend on anchor-box selection, and the scale and aspect-ratio problems of targets are handled adaptively.
During training of the fully convolutional neural network, preset training-set sample images are provided for the upper vertex targets and lower vertex targets of specified targets with different characteristics. Through training and iteration on these sample images, the resulting network has strong generalization ability, avoids complicated classifier cascades, and has a simpler structure.
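The training loop summarized above (three average errors compared against a preset error threshold, with a gradient update until all three fall at or below it) can be sketched with a toy model. The scalar "network parameters", the mean-absolute-error measure, and the MSE gradient step are assumptions for illustration; the actual embodiment updates the parameters of a fully convolutional network with a preset gradient computation strategy.

```python
import numpy as np

def average_errors(preds, truths):
    """First/second/third average errors between the predicted maps and
    the ground-truth maps (mean absolute error per map)."""
    return [float(np.mean(np.abs(p - t))) for p, t in zip(preds, truths)]

def train_until_converged(params, truths, lr=0.5, err_threshold=1e-3,
                          max_iters=200):
    """Toy stand-in for the described loop: predict three constant maps
    from three scalar parameters, compare against the ground-truth maps,
    and apply a gradient step until all three average errors fall to the
    threshold or below."""
    params = list(params)
    for _ in range(max_iters):
        preds = [np.full_like(t, p) for p, t in zip(params, truths)]
        errs = average_errors(preds, truths)
        if all(e <= err_threshold for e in errs):
            break  # network parameters accepted as trained
        # "preset gradient strategy": MSE gradient w.r.t. each scalar
        params = [p - lr * float(np.mean(p - t))
                  for p, t in zip(params, truths)]
    return params, errs
```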
Optionally, the fully convolutional neural network includes a convolution layer, a downsampling layer, and a deconvolution layer.
The extraction module 850 may specifically be configured to:
input the preset training-set sample image into the initial fully convolutional neural network, and extract features of the sample image through a network structure of alternating convolution and downsampling layers;
upsample the features through the deconvolution layer to the same resolution as the preset training-set sample image, obtaining an upsampled result; and
apply a 1×1 convolution layer to the result to obtain a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map at the same resolution as the preset training-set sample image.
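The resolution bookkeeping implied by this structure can be checked with a small sketch. The number of conv/downsampling blocks (three) and the output channel layout (one upper-vertex confidence channel, one lower-vertex confidence channel, and two association field channels) are assumptions for illustration; the sketch only shows that stride-2 downsampling followed by matching stride-2 deconvolution returns maps at the input resolution (exact when the dimensions are divisible by 2 to the power of n_blocks).

```python
def output_shapes(h, w, n_blocks=3):
    """Track the feature-map resolution through the described pipeline:
    alternating conv + downsampling halves the resolution n_blocks times,
    the deconvolution layers double it back, and a final 1x1 convolution
    keeps the resolution while mapping the channels to the three outputs
    (upper-vertex confidence, lower-vertex confidence, 2-channel field)."""
    fh, fw = h, w
    for _ in range(n_blocks):      # conv keeps size; downsampling halves it
        fh, fw = fh // 2, fw // 2
    for _ in range(n_blocks):      # stride-2 deconvolution doubles it back
        fh, fw = fh * 2, fw * 2
    # 1x1 convolution: resolution unchanged, channel count -> 1 + 1 + 2
    return (fh, fw), 4
```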
It should be noted that the target detection apparatus of this embodiment of the present application applies the target detection method embodiment shown in FIG. 3; all embodiments of the target detection method are therefore applicable to the apparatus, and all achieve the same or similar beneficial effects.
In addition, corresponding to the target detection method provided by the foregoing embodiments, an embodiment of the present application provides a storage medium for storing executable code which, when executed, performs all the steps of the target detection method provided by the embodiments of the present application.
In this embodiment, the storage medium stores executable code that performs the target detection method at runtime, and therefore achieves the following: with the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted; connections between the upper and lower vertices are established through mapping; a successfully matched upper-lower vertex connection is taken as a specified target represented by a line segment, which avoids overlapping candidate boxes. Even when specified targets are densely distributed, their upper and lower vertices can be accurately located by the network, so the targets can be clearly distinguished by their vertex connections, improving the accuracy of target detection.
In addition, corresponding to the target detection method provided by the foregoing embodiments, an embodiment of the present application provides an application program which, when executed, performs all the steps of the target detection method provided by the embodiments of the present application.
In this embodiment, the application program performs the target detection method at runtime and therefore achieves the same effects: the vertices of specified targets are extracted and matched with the trained fully convolutional neural network, targets are represented by vertex connections rather than candidate boxes so that overlapping boxes are avoided, and even densely distributed targets can be clearly distinguished, improving the accuracy of target detection.
In addition, an embodiment of the present application further provides a computer device, as shown in FIG. 9, including an image collector 901, a processor 902, and a storage medium 903, wherein:
the image collector 901 is configured to collect an image to be detected;
the storage medium 903 is configured to store executable code; and
the processor 902 is configured to implement all the steps of the target detection method provided by the embodiments of the present application when executing the executable code stored on the storage medium 903.
Data may be transmitted among the image collector 901, the processor 902, and the storage medium 903 through a wired or wireless connection, and the computer device may communicate with other devices through a wired or wireless communication interface.
The storage medium may include RAM (Random Access Memory) or NVM (Non-volatile Memory), for example at least one disk memory. Optionally, the storage medium may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The image collector may be a camera configured to capture video or pictures of a monitored area.
In this embodiment, by reading and running the executable code stored in the storage medium, the processor of the computer device achieves the following: with the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted; connections between the upper and lower vertices are established through mapping; a successfully matched upper-lower vertex connection is taken as a specified target represented by a line segment, which avoids overlapping candidate boxes. Even when specified targets are densely distributed, their upper and lower vertices can be accurately located by the network, so the targets can be clearly distinguished by their vertex connections, improving the accuracy of target detection.
For the storage medium, application program, and computer device embodiments, since the methods involved are substantially similar to the foregoing method embodiments, the description is relatively brief; for relevant details, refer to the description of the method embodiments.
It should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises that element.
The embodiments in this specification are described in a related manner; for identical or similar parts, reference may be made between the embodiments, and each embodiment focuses on its differences from the others. In particular, for the apparatus, storage medium, application program, and computer device embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; for relevant details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (17)

1. A target detection method, wherein the method comprises:
    acquiring an image to be detected collected by an image collector;
    inputting the image to be detected into a trained fully convolutional neural network to generate a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map of the image to be detected;
    determining at least one upper vertex target and at least one lower vertex target in the image to be detected by applying a preset target determination method to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map, respectively;
    mapping each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map and, for a first vertex target, calculating the association field value of the connection between the first vertex target and each second vertex target, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target; and
    based on the association field values of the connections between the first vertex target and each second vertex target, matching the upper and lower vertices and determining the connection with the largest association field value as a specified target.
2. The method according to claim 1, wherein training the fully convolutional neural network comprises:
    acquiring preset training-set sample images and, for each specified target in the preset training-set sample images, an upper-edge center position, a lower-edge center position, and a line connecting the upper and lower edge center positions;
    generating a target upper-vertex confidence ground-truth map and a target lower-vertex confidence ground-truth map of the preset training-set sample images according to a preset distribution law and the upper-edge center position and lower-edge center position of each specified target;
    generating a target upper-lower-vertex association field ground-truth map of the preset training-set sample images according to the lines connecting the upper and lower edge center positions of each specified target;
    inputting the preset training-set sample images into an initial fully convolutional neural network to obtain a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map of the preset training-set sample images, wherein the network parameters of the initial fully convolutional neural network are preset values;
    calculating a first average error between the target upper-vertex confidence distribution map and the target upper-vertex confidence ground-truth map, a second average error between the target lower-vertex confidence distribution map and the target lower-vertex confidence ground-truth map, and a third average error between the target upper-lower-vertex association field map and the target upper-lower-vertex association field ground-truth map; and
    if the first average error, the second average error, or the third average error is greater than a preset error threshold, updating the network parameters according to the first average error, the second average error, the third average error, and a preset gradient computation strategy to obtain an updated fully convolutional neural network, and recalculating the first, second, and third average errors with the updated fully convolutional neural network, until the first, second, and third average errors are all less than or equal to the preset error threshold, whereupon the corresponding fully convolutional neural network is determined to be the trained fully convolutional neural network.
3. The method according to claim 2, wherein the fully convolutional neural network comprises a convolution layer, a downsampling layer, and a deconvolution layer; and
    inputting the preset training-set sample images into the initial fully convolutional neural network to obtain the target upper-vertex confidence distribution map, the target lower-vertex confidence distribution map, and the target upper-lower-vertex association field map of the preset training-set sample images comprises:
    inputting the preset training-set sample images into the initial fully convolutional neural network, and extracting features of the sample images through a network structure of alternating convolution and downsampling layers;
    upsampling the features through the deconvolution layer to the same resolution as the preset training-set sample images to obtain an upsampled result; and
    applying a 1×1 convolution layer to the result to obtain the target upper-vertex confidence distribution map, the target lower-vertex confidence distribution map, and the target upper-lower-vertex association field map at the same resolution as the preset training-set sample images.
4. The method according to claim 1, wherein determining at least one upper vertex target and at least one lower vertex target in the image to be detected by applying the preset target determination method to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map respectively comprises:
    determining the position of the center point of at least one detection target by applying non-maximum suppression to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map, respectively;
    acquiring the confidence of all pixels in a neighborhood of the center point of each detection target; and
    determining a detection target in the target upper-vertex confidence distribution map in which every pixel has a confidence greater than a preset confidence threshold as an upper vertex target, and a detection target in the target lower-vertex confidence distribution map in which every pixel has a confidence greater than the preset confidence threshold as a lower vertex target.
5. The method according to claim 1, wherein mapping each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map and, for a first vertex target, calculating the association field value of the connection between the first vertex target and each second vertex target comprises:
    mapping each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map to obtain association degree values between each upper vertex target and each lower vertex target;
    for the first vertex target, connecting the first vertex target with each second vertex target; and
    calculating, according to the association degree values of the first vertex target and each second vertex target, the mean of those association degree values as the association field value of the connection between the first vertex target and each second vertex target.
6. The method according to claim 1, wherein matching the upper and lower vertices based on the association field values of the connections between the first vertex target and each second vertex target and determining the connection with the largest association field value as the specified target comprises:
    selecting the largest association field value from the association field values using a preset bipartite graph matching method, based on the association field values of the connections between the first vertex target and each second vertex target; and
    determining the connection corresponding to the largest association field value as the specified target.
7. The method according to claim 6, wherein after selecting the largest association field value from the association field values using the preset bipartite graph matching method, the method further comprises:
    acquiring a preset association field threshold;
    judging whether the largest association field value is greater than the preset association field threshold; and
    if it is greater, performing the step of determining the connection corresponding to the largest association field value as the specified target.
8. A target detection apparatus, wherein the apparatus comprises:
    a first acquisition module, configured to acquire an image to be detected collected by an image collector;
    a first generation module, configured to input the image to be detected into a trained fully convolutional neural network to generate a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map of the image to be detected;
    a target determination module, configured to determine at least one upper vertex target and at least one lower vertex target in the image to be detected by applying a preset target determination method to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map, respectively;
    a first calculation module, configured to map each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map and, for a first vertex target, calculate the association field value of the connection between the first vertex target and each second vertex target, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target; and
    a matching module, configured to match the upper and lower vertices based on the association field values of the connections between the first vertex target and each second vertex target, and determine the connection with the largest association field value as a specified target.
9. The apparatus according to claim 8, wherein the apparatus further comprises:
    a second acquisition module, configured to acquire preset training-set sample images and, for each specified target in the preset training-set sample images, an upper-edge center position, a lower-edge center position, and a line connecting the upper and lower edge center positions;
    a second generation module, configured to generate a target upper-vertex confidence ground-truth map and a target lower-vertex confidence ground-truth map of the preset training-set sample images according to a preset distribution law and the upper-edge center position and lower-edge center position of each specified target;
    a third generation module, configured to generate a target upper-lower-vertex association field ground-truth map of the preset training-set sample images according to the lines connecting the upper and lower edge center positions of each specified target;
    an extraction module, configured to input the preset training-set sample images into an initial fully convolutional neural network to obtain a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map of the preset training-set sample images, wherein the network parameters of the initial fully convolutional neural network are preset values;
    a second calculation module, configured to calculate a first average error between the target upper-vertex confidence distribution map and the target upper-vertex confidence ground-truth map, a second average error between the target lower-vertex confidence distribution map and the target lower-vertex confidence ground-truth map, and a third average error between the target upper-lower-vertex association field map and the target upper-lower-vertex association field ground-truth map; and
    a loop module, configured to: if the first average error, the second average error, or the third average error is greater than a preset error threshold, update the network parameters according to the first average error, the second average error, the third average error, and a preset gradient computation strategy to obtain an updated fully convolutional neural network, and recalculate the first, second, and third average errors with the updated fully convolutional neural network, until the first, second, and third average errors are all less than or equal to the preset error threshold, whereupon the corresponding fully convolutional neural network is determined to be the trained fully convolutional neural network.
10. The apparatus according to claim 9, wherein the fully convolutional neural network comprises a convolution layer, a downsampling layer, and a deconvolution layer; and
    the extraction module is specifically configured to:
    input the preset training-set sample images into the initial fully convolutional neural network, and extract features of the sample images through a network structure of alternating convolution and downsampling layers;
    upsample the features through the deconvolution layer to the same resolution as the preset training-set sample images to obtain an upsampled result; and
    apply a 1×1 convolution layer to the result to obtain a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map at the same resolution as the preset training-set sample images.
11. The apparatus according to claim 8, wherein the target determination module is specifically configured to:
    determine the position of the center point of at least one detection target by applying non-maximum suppression to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map, respectively;
    acquire the confidence of all pixels in a neighborhood of the center point of each detection target; and
    determine a detection target in the target upper-vertex confidence distribution map in which every pixel has a confidence greater than a preset confidence threshold as an upper vertex target, and a detection target in the target lower-vertex confidence distribution map in which every pixel has a confidence greater than the preset confidence threshold as a lower vertex target.
12. The apparatus according to claim 8, wherein the first calculation module is specifically configured to:
    map each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map to obtain association degree values between each upper vertex target and each lower vertex target;
    for the first vertex target, connect the first vertex target with each second vertex target; and
    calculate, according to the association degree values of the first vertex target and each second vertex target, the mean of those association degree values as the association field value of the connection between the first vertex target and each second vertex target.
  13. The apparatus according to claim 8, wherein the matching module is specifically configured to:
    select, based on the association field values of the connecting lines between the first vertex target and the second vertex targets, the largest association field value from among them using a preset bipartite graph matching method; and
    determine the connecting line corresponding to the largest association field value to be a specified target.
  14. The apparatus according to claim 13, wherein the matching module is further configured to:
    obtain a preset association field threshold;
    judge whether the largest association field value is greater than the preset association field threshold; and
    if it is, perform the step of determining the connecting line corresponding to the largest association field value to be the specified target.
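Claims 13-14 together pick, via bipartite matching, the pairings with the largest association field values and keep a pairing only if its value exceeds a preset threshold. A hedged sketch using the Hungarian algorithm as one possible "preset bipartite graph matching method" (the patent does not specify which; names and the threshold value are assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_vertices(assoc, field_thresh=0.3):
    """Pair upper-vertex targets (rows) with lower-vertex targets
    (columns) so that the total association field value is maximized,
    then discard pairs whose value does not exceed the preset threshold."""
    rows, cols = linear_sum_assignment(assoc, maximize=True)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if assoc[r, c] > field_thresh]
```

Each surviving (upper, lower) pair corresponds to one specified target, i.e. one detected object spanning an upper and a lower vertex.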
  15. A storage medium storing executable code, wherein the executable code, when run, performs the target detection method according to any one of claims 1-7.
  16. An application program which, when run, performs the target detection method according to any one of claims 1-7.
  17. A computer device, comprising an image collector, a processor, and a storage medium, wherein:
    the image collector is configured to capture an image to be detected;
    the storage medium is configured to store executable code; and
    the processor, when executing the executable code stored on the storage medium, implements the target detection method according to any one of claims 1-7.
PCT/CN2018/110394 2017-10-23 2018-10-16 Target detection method and apparatus, and computer device WO2019080743A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18871198.0A EP3702957B1 (en) 2017-10-23 2018-10-16 Target detection method and apparatus, and computer device
US16/758,443 US11288548B2 (en) 2017-10-23 2018-10-16 Target detection method and apparatus, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711004621.7A CN109697441B (en) 2017-10-23 2017-10-23 Target detection method and device and computer equipment
CN201711004621.7 2017-10-23

Publications (1)

Publication Number Publication Date
WO2019080743A1 (en) 2019-05-02

Family

ID=66229354

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/110394 WO2019080743A1 (en) 2017-10-23 2018-10-16 Target detection method and apparatus, and computer device

Country Status (4)

Country Link
US (1) US11288548B2 (en)
EP (1) EP3702957B1 (en)
CN (1) CN109697441B (en)
WO (1) WO2019080743A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035260A * 2018-07-27 2018-12-18 京东方科技集团股份有限公司 Sky region segmentation method and device, and convolutional neural network
KR20200132569A (en) * 2019-05-17 2020-11-25 삼성전자주식회사 Device for automatically photographing a photo or a video with respect to a specific moment and method for operating the same
CN111652251B (en) * 2020-06-09 2023-06-27 星际空间(天津)科技发展有限公司 Remote sensing image building feature extraction model construction method, device and storage medium
CN111652250B (en) * 2020-06-09 2023-05-26 星际空间(天津)科技发展有限公司 Remote sensing image building extraction method and device based on polygons and storage medium
CN112364734B (en) * 2020-10-30 2023-02-21 福州大学 Abnormal dressing detection method based on yolov4 and CenterNet
CN112435295A (en) * 2020-11-12 2021-03-02 浙江大华技术股份有限公司 Blackbody position detection method, electronic device and computer-readable storage medium
CN112380973B (en) * 2020-11-12 2023-06-23 深兰科技(上海)有限公司 Traffic signal lamp identification method and system
CN113538490B (en) * 2021-07-20 2022-10-28 刘斌 Video stream processing method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
US8131011B2 (en) * 2006-09-25 2012-03-06 University Of Southern California Human detection and tracking system
CN102880877A (en) * 2012-09-28 2013-01-16 中科院成都信息技术有限公司 Target identification method based on contour features
US20140270367A1 (en) * 2013-03-14 2014-09-18 Nec Laboratories America, Inc. Selective Max-Pooling For Object Detection
CN106485230A * 2016-10-18 2017-03-08 中国科学院重庆绿色智能技术研究院 Neural-network-based face detection model training, face detection method, and system
CN106570453A (en) * 2015-10-09 2017-04-19 北京市商汤科技开发有限公司 Pedestrian detection method, device and system
CN106651955A (en) * 2016-10-10 2017-05-10 北京小米移动软件有限公司 Method and device for positioning object in picture

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US7450735B1 (en) * 2003-10-16 2008-11-11 University Of Central Florida Research Foundation, Inc. Tracking across multiple cameras with disjoint views
US9691163B2 (en) * 2013-01-07 2017-06-27 Wexenergy Innovations Llc System and method of measuring distances related to an object utilizing ancillary objects
US9280833B2 (en) * 2013-03-05 2016-03-08 International Business Machines Corporation Topology determination for non-overlapping camera network
JP6337811B2 (en) * 2015-03-17 2018-06-06 トヨタ自動車株式会社 Image processing apparatus and image processing method
US10192129B2 (en) * 2015-11-18 2019-01-29 Adobe Systems Incorporated Utilizing interactive deep learning to select objects in digital visual media
GB2553005B (en) * 2016-08-19 2022-04-13 Apical Ltd Method of line detection
CN106845374B (en) * 2017-01-06 2020-03-27 清华大学 Pedestrian detection method and detection device based on deep learning
CN107066990B (en) * 2017-05-04 2019-10-11 厦门美图之家科技有限公司 A kind of method for tracking target and mobile device


Non-Patent Citations (1)

Title
See also references of EP3702957A4

Also Published As

Publication number Publication date
CN109697441B (en) 2021-02-12
CN109697441A (en) 2019-04-30
EP3702957A1 (en) 2020-09-02
EP3702957A4 (en) 2020-12-30
US20200250487A1 (en) 2020-08-06
US11288548B2 (en) 2022-03-29
EP3702957B1 (en) 2023-07-19

Similar Documents

Publication Publication Date Title
WO2019080743A1 (en) Target detection method and apparatus, and computer device
WO2022002039A1 (en) Visual positioning method and device based on visual map
US10198823B1 (en) Segmentation of object image data from background image data
JP6942488B2 (en) Image processing equipment, image processing system, image processing method, and program
CN109815770B (en) Two-dimensional code detection method, device and system
CN107256377B (en) Method, device and system for detecting object in video
WO2020134528A1 (en) Target detection method and related product
US9928405B2 (en) System and method for detecting and tracking facial features in images
US9436999B2 (en) Automatic image orientation and straightening through image analysis
US9576367B2 (en) Object detection method and device
CN108986152B (en) Foreign matter detection method and device based on difference image
WO2018082308A1 (en) Image processing method and terminal
CN112084869A (en) Compact quadrilateral representation-based building target detection method
US11176425B2 (en) Joint detection and description systems and methods
CN111369495B (en) Panoramic image change detection method based on video
CN111160291B (en) Human eye detection method based on depth information and CNN
CN108875504B (en) Image detection method and image detection device based on neural network
WO2022002262A1 (en) Character sequence recognition method and apparatus based on computer vision, and device and medium
JP2020149641A (en) Object tracking device and object tracking method
JP2009064434A (en) Determination method, determination system and computer readable medium
KR20190080388A (en) Photo Horizon Correction Method based on convolutional neural network and residual network structure
CN109961103B (en) Training method of feature extraction model, and image feature extraction method and device
CN111177811A (en) Automatic fire point location layout method applied to cloud platform
CN105894505A (en) Quick pedestrian positioning method based on multi-camera geometrical constraint
WO2022174603A1 (en) Pose prediction method, pose prediction apparatus, and robot

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18871198

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018871198

Country of ref document: EP

Effective date: 20200525