WO2019080743A1 - Target detection method and apparatus, and computer device - Google Patents

Target detection method and apparatus, and computer device

Info

Publication number
WO2019080743A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
vertex
confidence
map
preset
Prior art date
Application number
PCT/CN2018/110394
Other languages
French (fr)
Chinese (zh)
Inventor
宋涛 (Song Tao)
谢迪 (Xie Di)
浦世亮 (Pu Shiliang)
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority to EP18871198.0A (granted as EP3702957B1)
Priority to US16/758,443 (granted as US11288548B2)
Publication of WO2019080743A1

Classifications

    • G06V 10/255 — Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06F 18/00 — Pattern recognition
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217 — Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/22 — Matching criteria, e.g. proximity measures
    • G06N 3/04 — Neural networks: Architecture, e.g. interconnection topology
    • G06N 3/045 — Neural networks: Combinations of networks
    • G06N 3/08 — Neural networks: Learning methods
    • G06V 10/24 — Aligning, centring, orientation detection or correction of the image
    • G06V 10/426 — Global feature extraction for representing the structure or shape of an object: Graphical representations
    • G06V 10/764 — Recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/82 — Recognition using pattern recognition or machine learning: neural networks
    • G06V 20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • The present application relates to the field of machine vision technology, and in particular to a target detection method, apparatus, and computer device.
  • Target detection can be defined as determining whether a specified target is present in an input image or video and, if so, outputting the position information of the specified target in that image or video.
  • Commonly used target detection methods mainly include the background difference method, the frame difference method, the optical flow method, template matching, and machine-learning-based methods.
  • The first four are conventional image-based detection methods, which are susceptible to changes in illumination, color, and pose.
  • The machine-learning-based target detection method, which learns the different variations of the specified target from a sample set, has better robustness.
  • In such a method, a training sample set is first constructed, and a convolutional neural network model is obtained by training on the training sample set.
  • During target detection, the image to be detected is input into the trained convolutional neural network model to obtain candidate frames and confidence levels corresponding to the specified target; non-maximum suppression and threshold screening are then performed to determine the specified target in the image to be detected.
  • In some scenarios, however, the distribution of targets is relatively dense. For example, in crowded scenes, pedestrian targets may overlap, so the candidate frames obtained by the machine-learning-based method may overlap one another. Non-maximum suppression of overlapping candidate frames may then discard the candidate frame corresponding to a real specified target, resulting in missed detections and a certain detection error.
  • The purpose of the embodiments of the present application is to provide a target detection method, apparatus, and computer device to improve the accuracy of target detection.
  • The specific technical solutions are as follows:
  • An embodiment of the present application provides a target detection method, where the method includes:
  • matching the upper and lower vertices, and determining the connection with the largest associated field value as the specified target.
  • An embodiment of the present application provides a target detecting apparatus, where the apparatus includes:
  • a first acquiring module configured to acquire an image to be detected collected by an image collector;
  • a first generating module configured to input the image to be detected into a trained full convolutional neural network, and to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map;
  • a target determining module configured to determine, by applying a preset target determining method to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, at least one upper vertex target and at least one lower vertex target in the image to be detected;
  • a first calculating module configured to map each upper vertex target and each lower vertex target to the target upper and lower vertex associated field map, and to calculate, for a first vertex target, the associated field value of the connection between the first vertex target and each second vertex target, wherein, if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target;
  • a matching module configured to determine, according to the associated field values of the connections between the first vertex target and each second vertex target, by matching the upper and lower vertices, the connection with the largest associated field value as the specified target.
  • An embodiment of the present application provides a storage medium storing executable code, where the executable code, when executed, performs the target detection method provided by the first aspect of the embodiments of the present application.
  • An embodiment of the present application provides an application program for performing the target detection method provided by the first aspect of the embodiments of the present application.
  • An embodiment of the present application provides a computer device, including an image collector, a processor, and a storage medium, where:
  • the image collector is configured to collect an image to be detected;
  • the storage medium is configured to store executable code; and
  • the processor, when executing the executable code stored on the storage medium, implements the target detection method provided by the first aspect.
  • In the embodiments of the present application, the acquired image to be detected is input into the trained full convolutional neural network, and the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map are generated. The upper vertex targets and lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively. The upper vertex targets and lower vertex targets are then mapped to the target upper and lower vertex associated field map, and the associated field value of the connection between each first vertex target and each second vertex target is calculated. Finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
  • In this way, the upper and lower vertices of the specified target can be extracted, connections between upper vertices and lower vertices can be established by mapping, and the matched upper-lower vertex connections can be taken as the specified targets.
  • Because the specified target is represented by a line, the problem of overlapping candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
  • FIG. 1 is a schematic flowchart of a target detecting method according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a full convolutional neural network according to an embodiment of the present application.
  • FIG. 3 is another schematic flowchart of a target detecting method according to an embodiment of the present application.
  • FIG. 4 shows a true value map of the target upper vertex confidence, a true value map of the target lower vertex confidence, and a true value map of the target upper and lower vertex associated fields, extracted from an image to be detected according to an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of another full convolutional neural network according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a pedestrian detection result according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a target detecting apparatus according to an embodiment of the present application.
  • FIG. 8 is another schematic structural diagram of an object detecting apparatus according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • The embodiments of the present application provide a target detection method, apparatus, and computer device.
  • An execution subject of the target detection method provided by an embodiment of the present application may be a computer device equipped with a core processing chip, such as a camera having image processing capability, an image processor, or the like.
  • The target detection method provided by an embodiment of the present application may be implemented by at least one of software, a hardware circuit, and a logic circuit disposed in the execution subject.
  • A target detection method provided by an embodiment of the present application may include the following steps:
  • The image collector may be a still camera or a video camera, although the image collector is not limited thereto. If the image collector is a video camera, the camera captures video over a period of time, and the image to be detected may be any frame of that video.
  • The full convolutional neural network has the ability to automatically extract features of the upper vertex target and the lower vertex target of the specified target, and the network parameters of the full convolutional neural network can be obtained through sample training. Therefore, the trained full convolutional neural network can quickly recognize the upper vertex target and the lower vertex target of the specified target.
  • The full convolutional neural network is composed of a plurality of convolution layers and a plurality of downsampling layers. The acquired image to be detected is input into the full convolutional neural network, which performs feature extraction on the upper vertex target and the lower vertex target of the specified target in the image to be detected, yielding the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map of the image to be detected.
  • The target upper vertex confidence distribution map and the target lower vertex confidence distribution map can be understood as distribution maps of the probability that a detected target is the upper vertex or the lower vertex of the specified target.
  • Taking pedestrian detection as an example, the target upper vertex confidence distribution map is a probability distribution map of the detected target being the top of a pedestrian's head,
  • and the target lower vertex confidence distribution map is a probability distribution map of the detected target being a pedestrian's feet.
  • Each pixel in the target upper and lower vertex associated field map represents the degree of association, at that position, with the upper vertex target or the lower vertex target of the specified target.
  • The parameters in the target upper vertex confidence distribution map and the target lower vertex confidence distribution map may be the specific probabilities that each target in an identified region is the upper vertex target or the lower vertex target of the specified target, where the identified region is a region related to the position and size of the target; in general, the area of the region may be greater than or equal to the actual size of the target.
  • Alternatively, the pixel value of each pixel may be used to represent the magnitude of the probability: the larger the pixel values in a region, the greater the probability that the target in that region is the upper vertex target or the lower vertex target of the specified target.
  • Of course, the specific parameters of the target upper vertex confidence distribution map and the target lower vertex confidence distribution map are not limited thereto.
  • The full convolutional neural network may include: a convolution layer, a downsampling layer, and a deconvolution layer.
  • A full convolutional neural network typically includes at least one convolution layer and at least one downsampling layer, while the deconvolution layer is optional. In order to make the resolution of the obtained feature map the same as that of the input image to be detected, and thereby avoid an extra step of converting between image scales, which facilitates feature extraction, a deconvolution layer can be placed after the last convolution layer.
  • S102 can be implemented by the following steps.
  • In the first step, the image to be detected is input into the trained full convolutional neural network, and the features of the image to be detected are extracted through a network structure in which convolution layers and downsampling layers are interleaved.
  • In the second step, the features are upsampled by the deconvolution layer to the same resolution as the image to be detected, and the upsampled result is obtained.
  • Specifically, the image to be detected is input into the trained full convolutional neural network, and features from lower layers to higher layers are extracted in sequence by a series of alternately arranged convolution layers and downsampling layers;
  • the deconvolution layer is then connected to upsample the features to the size of the input image to be detected.
  • In the third step, the result obtained in the second step is operated on by a 1×1 convolution layer to obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map, each with the same resolution as the image to be detected.
  • The result of the upsampling can be operated on by a convolution layer.
  • The convolution kernel of this convolution layer may be of size 1×1, 3×3, or 5×5; however, in order to accurately extract the feature of a single pixel, the convolution kernel size may be chosen as 1×1.
  • Through the operation of this convolution layer, the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map can be obtained.
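As a rough illustration of this last step, a 1×1 convolution is simply a per-pixel linear map over channels, so the three output heads can be sketched with a single matrix product. The channel count, random weights, and map sizes below are illustrative assumptions, not values from the application:

```python
import numpy as np

# Illustrative sketch (not the application's actual network): a 1x1 convolution
# applies the same linear map at every pixel, i.e. one matmul over channels.
def conv1x1(features, weights, bias):
    """features: (H, W, C_in); weights: (C_in, C_out); bias: (C_out,)."""
    return features @ weights + bias

# Assumed sizes: an 8x8 upsampled feature map with 16 channels.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))

# Three heads, each at the input resolution: upper-vertex confidence,
# lower-vertex confidence, and the upper/lower-vertex associated field.
upper_conf = conv1x1(features, rng.standard_normal((16, 1)), np.zeros(1))
lower_conf = conv1x1(features, rng.standard_normal((16, 1)), np.zeros(1))
assoc_field = conv1x1(features, rng.standard_normal((16, 1)), np.zeros(1))

print(upper_conf.shape, lower_conf.shape, assoc_field.shape)  # (8, 8, 1) each
```

The key property this demonstrates is that a 1×1 kernel leaves the spatial resolution unchanged, so all three maps match the resolution of the (upsampled) input.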
  • S103 Determine, by using a preset target determining method, at least one upper vertex target and at least one lower vertex target in the image to be detected, respectively, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map.
  • The target upper vertex confidence distribution map of the image to be detected obtained by the full convolutional neural network includes the probability that the target in each identified region is the upper vertex target of the specified target, and the target lower vertex confidence distribution map includes the probability that the target in each identified region is the lower vertex target of the specified target. All targets may include targets other than upper vertex targets and lower vertex targets. Therefore, the preset target determining method needs to be applied separately to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map of the image to be detected, so as to determine the upper vertex targets of the specified target in the image to be detected from the target upper vertex confidence distribution map, and the lower vertex targets from the target lower vertex confidence distribution map.
  • The preset target determining method may be setting a threshold: if a probability in the target upper vertex confidence distribution map is greater than the threshold, the region corresponding to that probability is determined to be an upper vertex target, and if a probability in the target lower vertex confidence distribution map is greater than the threshold, the region corresponding to that probability is determined to be a lower vertex target. Alternatively, the method may be based on the pixel value of each pixel: if every pixel value in a region is greater than a preset pixel value, the region is determined to be an upper vertex target or a lower vertex target. Alternatively, if the confidence of each pixel in a region is greater than a preset confidence threshold, or if the average confidence of the pixels in a region is greater than a preset confidence threshold, the region is determined to be an upper vertex target or a lower vertex target.
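A minimal sketch of the average-confidence variant of the preset target determining method described above; the threshold value, region shapes, and function name are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch: a region counts as a vertex target if the mean
# confidence of its pixels exceeds a preset threshold (0.7 is an assumption).
def is_vertex_target(conf_map, region, threshold=0.7):
    """conf_map: 2-D confidence map; region: (row_slice, col_slice)."""
    return float(conf_map[region].mean()) > threshold

conf = np.zeros((6, 6))
conf[1:3, 1:3] = 0.9                       # a small high-confidence blob
blob = (slice(1, 3), slice(1, 3))
empty = (slice(4, 6), slice(4, 6))
print(is_vertex_target(conf, blob))        # True
print(is_vertex_target(conf, empty))       # False
```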
  • S103 can be implemented by the following steps.
  • In the first step, a non-maximum suppression method is applied to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map to determine the position of the center point of at least one detection target.
  • On a confidence distribution map, a confidence maximum point characterizes the position of a detection target's center point, and a spatially aggregated non-zero region indicates the area where a detection target is located. Non-maximum suppression is performed on the target upper vertex confidence distribution map and the target lower vertex confidence distribution map: by suppressing elements that are not maxima, the maximum value in each region is found, and the position of each detection target's center point can thereby be obtained.
  • The formation of such a region is related to the confidence of each pixel.
  • The region may deviate from the actual detection target, but the confidence maximum point characterizes the center point of the detection target; after the position of the center point is determined, a detection target can be determined within a certain neighborhood of that center point, so determining the position of the center point can improve the accuracy of target detection.
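The non-maximum suppression step above can be sketched as keeping only pixels that are strict maxima of their local neighbourhood. The 3×3 window, the confidence floor, and the function name below are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch: find confidence maximum points (detection-target
# centres) by suppressing every pixel that is not the strict maximum of
# its 3x3 neighbourhood, or that falls below a confidence floor.
def local_maxima(conf, floor=0.5):
    H, W = conf.shape
    padded = np.pad(conf, 1, constant_values=-np.inf)
    peaks = []
    for r in range(H):
        for c in range(W):
            window = padded[r:r + 3, c:c + 3]   # 3x3 neighbourhood of (r, c)
            is_strict_max = (conf[r, c] == window.max()
                             and (window == conf[r, c]).sum() == 1)
            if conf[r, c] >= floor and is_strict_max:
                peaks.append((r, c))
    return peaks

conf = np.zeros((5, 5))
conf[1, 1] = 0.9          # centre of one detection target
conf[3, 3] = 0.8          # centre of another
print(local_maxima(conf))  # [(1, 1), (3, 3)]
```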
  • In the second step, the confidence of all pixels in the neighborhood of each detection target's center point is obtained.
  • The size of the neighborhood can be determined according to a statistical analysis of the upper vertex size and the lower vertex size of the specified target; for example, for a pedestrian target, it can be obtained by averaging actual human head radii, or the neighborhood value can be determined by assuming it obeys a preset distribution.
  • The neighborhood size of the lower vertex target can be set to be the same as that of the upper vertex target; of course, the neighborhood sizes of the upper and lower vertices can also differ, and the neighborhood size of the lower vertex can be determined according to the size of the lower vertex of the actual specified target. The larger the confidence of all pixels in the neighborhood of a detection target's center point, the greater the probability that the detection target is an upper vertex target or a lower vertex target; therefore, in this embodiment, the confidence of all pixels in the neighborhood is obtained.
  • In the third step, a detection target whose pixels in the neighborhood of its center point all have confidence greater than a preset confidence threshold in the target upper vertex confidence distribution map is determined to be an upper vertex target, and a detection target whose pixels in the neighborhood of its center point all have confidence greater than the preset confidence threshold in the target lower vertex confidence distribution map is determined to be a lower vertex target.
  • Specifically, a confidence threshold is preset. If, in the target upper vertex confidence distribution map, the confidence of all pixels in the neighborhood of a detection target's center point is greater than the preset confidence threshold, the detection target may be determined to be an upper vertex target of the image to be detected; if, in the target lower vertex confidence distribution map, the confidence of all pixels in the neighborhood of a detection target's center point is greater than the preset confidence threshold, the detection target may be determined to be a lower vertex target of the image to be detected.
  • The preset confidence threshold may be set according to experimental data or requirements.
  • For example, the preset confidence threshold may be set to 0.7: if the confidence of all pixels in the neighborhood of a detection target in the target upper vertex confidence distribution map is greater than 0.7, the detection target is determined to be an upper vertex target; if the confidence of all pixels in the neighborhood of a detection target in the target lower vertex confidence distribution map is greater than 0.7, the detection target may be determined to be a lower vertex target.
  • Of course, the preset confidence threshold may also be set to 0.85, 0.9, or other values, which is not limited herein. In this embodiment, since the confidence of all pixels in the neighborhood of the detection target's center point must be greater than the preset confidence threshold, the accuracy of target detection is further ensured.
  • S104 Map each upper vertex target and each lower vertex target to the target upper and lower vertex associated field map, and calculate, for a first vertex target, the associated field value of the connection between the first vertex target and each second vertex target.
  • The first vertex target and the second vertex target are each any upper vertex target or any lower vertex target: if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target.
  • The obtained upper vertex targets and lower vertex targets may be mapped to the target upper and lower vertex associated field map obtained in S102. Because each pixel in the target upper and lower vertex associated field map represents the degree of association, at that position, with the upper vertex target or lower vertex target of the specified target, by connecting each upper vertex target with each lower vertex target, the association degree value of each pixel along the connection between the two vertices can be obtained.
  • The sum of these association degree values may be defined as the associated field value of the connection, or the mean of the association degree values along the connection between the two vertices may be defined as the associated field value of the connection.
  • The larger the associated field value, the higher the degree of association between the connected upper and lower vertices, that is, the greater the probability that the connection is a specified target.
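The associated field value of a connection, defined above as the sum or mean of association values along the line between an upper vertex and a lower vertex, might be sketched as follows; the sampling density, field contents, and function name are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch: the associated field value of an upper/lower-vertex
# connection, taken here as the MEAN field value sampled along the segment
# (the sum is the other option mentioned in the text).
def line_field_value(field, p_upper, p_lower, n_samples=16):
    rows = np.linspace(p_upper[0], p_lower[0], n_samples)
    cols = np.linspace(p_upper[1], p_lower[1], n_samples)
    samples = field[rows.round().astype(int), cols.round().astype(int)]
    return float(samples.mean())

field = np.zeros((10, 10))
field[2:8, 4] = 1.0                               # vertical stripe of high association
good = line_field_value(field, (2, 4), (7, 4))    # connection along the stripe
bad = line_field_value(field, (2, 0), (7, 0))     # connection through empty space
print(good, bad)  # 1.0 0.0
```

A connection whose line runs along a true target accumulates high association values, while a spurious pairing does not, which is exactly what the matching step exploits.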
  • S105 Determine, according to the associated field values of the connections between the first vertex target and each second vertex target, by matching the upper and lower vertices, the connection with the largest associated field value as the specified target.
  • For a first vertex target, the connection with the largest associated field value can be determined as the specified target, and in general only one second vertex target is connected with a first vertex target to form a specified target. Therefore, by matching the upper and lower vertices, the connection with the largest associated field value can be determined for each first vertex target as a specified target.
  • For example, suppose five upper vertex targets and four lower vertex targets are determined through S103. For the first upper vertex target, the associated field value of its connection with the first lower vertex target is the largest, so that connection is determined to be a specified target. For the second upper vertex target, the associated field value of its connection with the third lower vertex target is the largest, so that connection is determined to be a specified target. For the third upper vertex target, the associated field value of its connection with the second lower vertex target is the largest, so that connection is determined to be a specified target. For the fifth upper vertex target, the associated field value of its connection with the fourth lower vertex target is the largest, so that connection is determined to be a specified target. The associated field values of the connections between the fourth upper vertex target and each lower vertex target are all smaller than those of the other connections; therefore, the fourth upper vertex target may be a misidentified upper vertex target, and it is discarded.
  • The matching of upper and lower vertices may adopt a classic bipartite graph matching method, that is, the Hungarian algorithm, thereby achieving one-to-one matching between upper vertex targets and lower vertex targets. Of course, other methods that can achieve one-to-one matching between targets are also applicable to this embodiment, and are not enumerated herein.
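The text names the Hungarian algorithm for this one-to-one matching. As an illustrative stand-in (not the application's implementation), the sketch below finds the same optimal assignment by brute-force search over permutations, which is only practical for small numbers of vertices; the matrix values are made up:

```python
from itertools import permutations

# Illustrative sketch: one-to-one matching of upper and lower vertex targets
# that maximises the total associated field value. For small instances a
# brute-force search over permutations yields the same optimum as the
# Hungarian algorithm. Unmatchable extra vertices are simply left unpaired,
# mirroring the discarded fifth-vertex example in the text.
def match_vertices(assoc):
    """assoc[i][j]: associated field value of upper vertex i with lower vertex j."""
    n_upper, n_lower = len(assoc), len(assoc[0])
    best, best_pairs = float("-inf"), []
    for perm in permutations(range(n_lower), min(n_upper, n_lower)):
        pairs = list(enumerate(perm))
        total = sum(assoc[i][j] for i, j in pairs)
        if total > best:
            best, best_pairs = total, pairs
    return best_pairs

assoc = [[0.9, 0.1, 0.2],
         [0.2, 0.1, 0.8],
         [0.1, 0.7, 0.3]]
print(match_vertices(assoc))  # [(0, 0), (1, 2), (2, 1)]
```

In practice the Hungarian algorithm (polynomial time) would replace the exponential search, but the returned assignment is the same.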
  • The calculated connection with the largest associated field value may still be a false detection.
  • Therefore, a preset associated field threshold may be set to determine whether the largest associated field value of a connection is greater than the preset associated field threshold: if it is greater, the connection is an accurate specified target; if it is not greater, the connection is a false detection and the detection result is discarded. After the specified targets are determined, it is possible to determine whether a specified target exists in the image to be detected and to determine the accurate position information of the specified target.
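The threshold screening of matched connections described above might be sketched as follows; the threshold value, data layout, and function name are illustrative assumptions:

```python
# Illustrative sketch: after matching, a connection is kept as a detected
# target only if its associated field value exceeds a preset associated field
# threshold; otherwise it is treated as a false detection and discarded.
# The threshold value 0.5 is an assumption, not from the application.
def filter_connections(matched, threshold=0.5):
    """matched: list of (upper_idx, lower_idx, field_value) tuples."""
    return [(u, l) for u, l, v in matched if v > threshold]

matched = [(0, 0, 0.9), (1, 2, 0.8), (3, 1, 0.2)]  # last pair is a weak match
print(filter_connections(matched))  # [(0, 0), (1, 2)]
```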
  • In the embodiments of the present application, the acquired image to be detected is input into the trained full convolutional neural network, and the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map are generated.
  • The upper vertex targets and lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively, and the upper vertex targets and lower vertex targets are mapped to the target upper and lower vertex associated field map to calculate the associated field value of the connection between each first vertex target and each second vertex target.
  • Finally, the upper and lower vertices are matched to determine the connection with the largest associated field value as the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• In addition, since the detected specified target is the connection between an upper vertex target and a lower vertex target, the connection can clearly and intuitively reflect the posture information of the specified target (for example, forward tilt, backward tilt, leaning over, etc.), which is beneficial to subsequent analysis.
• Moreover, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of the target are then accurately located and matched, and the matched upper and lower vertices are taken as the specified target detection result, which has the advantages of strong robustness and high detection accuracy for the specified target.
  • the embodiment of the present application further provides a target detection method, where the target detection method may include the following steps:
• A full convolutional neural network needs to be constructed before target detection can be performed with it. Since the network parameters of the full convolutional neural network are obtained by training, the training process can be understood as the process of learning the upper vertex target and the lower vertex target of the specified target. Preset training set sample images need to be constructed for the features of various specified targets, each image containing the upper vertex target and lower vertex target features of different specified targets, and the confidence distributions of the upper vertex target and the lower vertex target can be preset.
• The preset distribution law is the probability distribution obeyed by the confidences of the upper vertex target and the lower vertex target of the specified target. Generally, the confidences of the upper and lower vertex targets obey a circular Gaussian distribution, although this embodiment is not limited thereto.
• The center position of the upper edge of each specified target in the calibrated image is P_up, and the center position of the lower edge is P_down. Assuming that the confidences of the upper vertex target and the lower vertex target obey the circular Gaussian distribution N, the target upper vertex confidence truth map and the target lower vertex confidence truth map of the preset training set sample image are obtained according to formulas (1) and (2).
• where p represents the coordinates of any pixel position on the confidence distribution truth map;
• up denotes the upper vertex target of the specified target;
• D_up(p) represents the confidence of the upper vertex target at position p on the target upper vertex confidence truth map;
• n_ped represents the total number of specified targets in the training set sample image;
• P_up represents the coordinate position of the upper vertex target of each specified target in the calibrated training set sample image;
• σ_up represents the variance of the circular Gaussian distribution N obeyed by the upper vertex target;
• down denotes the lower vertex target of the specified target;
• D_down(p) represents the confidence of the lower vertex target at position p on the target lower vertex confidence truth map;
• P_down represents the coordinate position of the lower vertex target of each specified target in the calibrated training set sample image.
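Formulas (1) and (2) themselves are not reproduced in this text. A form consistent with the symbol definitions above assigns, at each pixel p, the maximum over the n_ped calibrated vertex positions of a circular Gaussian centered at P_up (or P_down); the sketch below assumes that form, and the grid size, centers, and spread parameter are illustrative.

```python
import math

def vertex_confidence_truth_map(centers, sigma, height, width):
    """Truth map D(p): at each pixel p, the max over all calibrated vertex
    positions P of a circular Gaussian exp(-|p - P|^2 / (2 sigma^2)).
    Taking the max (rather than the sum) keeps the peak confidence at 1.0.
    """
    d = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for (cx, cy) in centers:
                g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                             / (2 * sigma ** 2))
                d[y][x] = max(d[y][x], g)
    return d

# Two calibrated upper vertex targets on a small illustrative grid.
D_up = vertex_confidence_truth_map(centers=[(2, 2), (7, 3)], sigma=1.0,
                                   height=6, width=10)
```

Each calibrated vertex position produces a bright spot of confidence 1.0 that decays with distance, matching the bright points visible in the truth maps of FIG. 4.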
• Generally, the associated field of the line between the upper vertex target and the lower vertex target is set to a unit vector, that is, a vector with amplitude equal to 1 and direction along the line, although the embodiment is not limited thereto. According to the line connecting the upper and lower edge center positions of each specified target, and formulas (3) and (4), the target upper and lower vertex associated field truth map of the preset training set sample image can be generated.
• Formula (4) indicates that the associated field on the line connecting the upper vertex target and the lower vertex target of a specified target is a unit vector v with amplitude equal to 1 and direction along the line.
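The associated field described by formulas (3) and (4) can be sketched as follows, under the assumption that pixels on (or within a small tolerance of) the segment from the upper vertex to the lower vertex carry the unit vector along the line, and all other pixels carry the zero vector; the coordinates and tolerance are illustrative.

```python
import math

def associated_field_truth_map(p_up, p_down, height, width, tol=0.5):
    """For each pixel, store the unit vector v along the line from p_up to
    p_down if the pixel lies on the segment (within tol), else (0, 0)."""
    ux, uy = p_up
    dx, dy = p_down[0] - ux, p_down[1] - uy
    length = math.hypot(dx, dy)
    v = (dx / length, dy / length)            # unit vector, |v| = 1
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # projection parameter of the pixel onto the segment
            t = ((x - ux) * dx + (y - uy) * dy) / (length ** 2)
            if 0.0 <= t <= 1.0:
                px, py = ux + t * dx, uy + t * dy
                if math.hypot(x - px, y - py) <= tol:
                    field[y][x] = v
    return field

# Vertical pedestrian: upper vertex at (3, 1), lower vertex at (3, 6).
A = associated_field_truth_map((3, 1), (3, 6), height=8, width=8)
```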
• Taking pedestrian detection as an example, the generated target upper vertex confidence truth map, target lower vertex confidence truth map, and target upper and lower vertex associated field truth map of the preset training set sample image are shown in FIG. 4. In the target upper vertex confidence truth map, each bright point corresponds to the upper vertex target of a specified target in the preset training set sample image; in the target lower vertex confidence truth map, each bright point corresponds to the lower vertex target of a specified target; and in the target upper and lower vertex associated field truth map, each line is the connection between the upper vertex target and the lower vertex target of a specified target.
  • S304 Input the preset training set sample image into the initial full convolutional neural network, and obtain a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex correlation field map of the preset training set sample image.
• The network parameters of the initial full convolutional neural network are preset values. The initial full convolutional neural network can output the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map of the preset training set sample image. The target upper vertex confidence distribution map is compared with the target upper vertex confidence truth map, the target lower vertex confidence distribution map is compared with the target lower vertex confidence truth map, and the target upper and lower vertex associated field map is compared with the target upper and lower vertex associated field truth map. The network parameters are continuously trained and updated so that the target upper vertex confidence distribution map output by the full convolutional neural network approaches the target upper vertex confidence truth map, the target lower vertex confidence distribution map approaches the target lower vertex confidence truth map, and the target upper and lower vertex associated field map approaches the target upper and lower vertex associated field truth map. When they are close enough, the full convolutional neural network is determined to be a trained full convolutional neural network that can perform target detection.
  • the full convolutional neural network may include: a convolution layer, a downsampling layer, and a deconvolution layer.
• The full convolutional neural network usually includes at least one convolution layer and at least one downsampling layer, while the deconvolution layer is optional. In order to make the resolution of the obtained feature map the same as the resolution of the input preset training set sample image, which facilitates the calculation of the confidence, a deconvolution layer can be set after the last convolution layer.
  • the step of obtaining the target vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field maps of the preset training set sample image may be implemented by the following steps.
  • the preset training set sample image is input into the initial full convolutional neural network, and the features of the preset training set sample image are extracted through the network structure in which the convolution layer and the downsampling layer are arranged.
  • the feature is upsampled to the same resolution as the preset training set sample image by the deconvolution layer, and the upsampled result is obtained.
• The preset training set sample image is input into the initial full convolutional neural network, as shown in FIG. 5, and features from lower layers to upper layers are extracted in sequence by a series of alternately arranged convolution layers and downsampling layers. A deconvolution layer is then connected to upsample the features to the size of the input preset training set sample image.
• In the third step, the result obtained in the second step is processed by a 1×1 convolution layer, and the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map are obtained at the same resolution as the preset training set sample image. That is, the upsampled result can finally be processed by one convolution layer.
• The convolution kernel size of this convolution layer may be selected from 1×1, 3×3, or 5×5, but in order to accurately extract the features of a single pixel, a convolution kernel size of 1×1 may be selected for the convolution layer. The target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map can be obtained through the operation of this convolution layer.
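A 1×1 convolution is simply a per-pixel linear combination of the input channels, which is why it preserves resolution and can directly produce the per-pixel confidence and field maps. A minimal sketch, with illustrative weights:

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution: at every pixel, a dot product across the
    input channels plus a bias. Resolution is unchanged, so the output
    stays aligned with the upsampled input image.

    feature_map[c][y][x]: C input channels of size H x W.
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    out = [[bias for _ in range(width)] for _ in range(height)]
    for c in range(channels):
        w = weights[c]
        for y in range(height):
            for x in range(width):
                out[y][x] += w * feature_map[c][y][x]
    return out

# Two-channel 2x2 feature map reduced to one output map.
features = [
    [[1.0, 0.0], [0.0, 1.0]],   # channel 0
    [[0.5, 0.5], [0.5, 0.5]],   # channel 1
]
confidence = conv1x1(features, weights=[2.0, -1.0], bias=0.1)
```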
• S305: Calculate a first average error between the target upper vertex confidence distribution map of the preset training set sample image and the target upper vertex confidence truth map, a second average error between the target lower vertex confidence distribution map and the target lower vertex confidence truth map, and a third average error between the target upper and lower vertex associated field map and the target upper and lower vertex associated field truth map.
• If the first average error, the second average error, or the third average error is greater than the preset error threshold, the network parameters are updated according to the first average error, the second average error, the third average error, and the preset gradient operation strategy.
  • the full convolutional neural network can be trained by the classical back propagation algorithm.
  • the preset gradient operation strategy can be the ordinary gradient descent method or the stochastic gradient descent method.
• The gradient descent method uses the negative gradient direction as the search direction; the closer it gets to the target value, the smaller the step size and the slower the progress. Since the stochastic gradient descent method uses only one sample per iteration, its iteration speed is much higher than that of gradient descent. Therefore, in order to improve operation efficiency, this embodiment may use the stochastic gradient descent method to update the network parameters.
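A minimal sketch of the stochastic gradient descent update described above, using one sample per iteration on a toy one-parameter model; the learning rate, model, and data stream are illustrative assumptions, not the patent's network.

```python
def sgd_step(theta, grad, lr=0.1):
    """Move each network parameter along the negative gradient direction."""
    return [t - lr * g for t, g in zip(theta, grad)]

# Toy one-parameter model y = theta * x with squared error on one sample.
def grad_single_sample(theta, x, y):
    pred = theta[0] * x
    return [2.0 * (pred - y) * x]   # d/dtheta of (pred - y)^2

theta = [0.0]
for x, y in [(1.0, 2.0), (2.0, 4.0), (1.0, 2.0)] * 20:   # sample stream
    theta = sgd_step(theta, grad_single_sample(theta, x, y))
# theta converges toward 2.0, the slope that fits every sample.
```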
• The first average error between the target upper vertex confidence distribution map output by the full convolutional neural network for the preset training set sample image and the target upper vertex confidence truth map, the second average error between the target lower vertex confidence distribution map and the target lower vertex confidence truth map, and the third average error between the target upper and lower vertex associated field map and the target upper and lower vertex associated field truth map are calculated, and the network parameters of the full convolutional neural network are iteratively updated until the average errors no longer decrease.
• The network parameters of the full convolutional neural network include the convolution kernel parameters and the bias parameters of the convolution layers.
• where L_D(θ) represents the first average error or the second average error;
• θ represents the network parameters of the full convolutional neural network;
• N represents the number of preset training set sample images;
• F_D(X_i; θ) represents the target upper vertex confidence distribution map or the target lower vertex confidence distribution map output by the full convolutional neural network;
• X_i represents the i-th image input to the network, where i is the image number;
• D_i represents the target upper vertex confidence truth map or the target lower vertex confidence truth map obtained by formulas (1) and (2);
• L_A(θ) represents the third average error;
• F_A(X_i; θ) represents the target upper and lower vertex associated field map output by the full convolutional neural network;
• A_i represents the target upper and lower vertex associated field truth map obtained by formulas (3) and (4);
• the balance parameter of the two errors usually takes a value of 1.0.
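Assuming, consistently with the symbol list above, that L_D(θ) and L_A(θ) are mean squared differences over the N sample images between the network output maps and the corresponding truth maps, and that the balance parameter weights L_A in the total loss, the computation can be sketched as follows (the exact formulas of the patent are not reproduced in this text, so this is an illustrative form):

```python
def average_map_error(outputs, truths):
    """Mean over N sample images of the per-pixel squared differences
    between the network output map F(X_i; theta) and the truth map."""
    n = len(outputs)
    total = 0.0
    for out, truth in zip(outputs, truths):
        total += sum((o - t) ** 2
                     for row_o, row_t in zip(out, truth)
                     for o, t in zip(row_o, row_t))
    return total / n

# N = 2 tiny 1x2 "maps"; balance parameter between L_D and L_A set to 1.0.
L_D = average_map_error([[[0.9, 0.1]], [[0.2, 0.8]]],
                        [[[1.0, 0.0]], [[0.0, 1.0]]])
L_A = 0.0                      # placeholder third error for the sketch
loss = L_D + 1.0 * L_A
```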
  • S309 Determine, by using a preset target determining method, at least one upper vertex target and at least one lower vertex target in the image to be detected, respectively, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map.
• S311: According to the associated field values between the first vertex target and each second vertex target connection, determine, by matching the upper and lower vertices, that the connection with the largest associated field value is the specified target.
  • S307 to S311 are the same as the steps of the embodiment shown in FIG. 1, and have the same or similar beneficial effects, and are not described herein again.
• It can be seen that in the scheme provided by this embodiment, the acquired image to be detected is input into the trained full convolutional neural network to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map; the upper vertex targets and the lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; each upper vertex target and each lower vertex target are mapped to the target upper and lower vertex associated field map, and the associated field value between each first vertex target and second vertex target connection is calculated; finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• In addition, since the detected specified target is the connection between an upper vertex target and a lower vertex target, the connection can clearly and intuitively reflect the posture information of the specified target (for example, forward tilt, backward tilt, leaning over, etc.), which is beneficial to subsequent analysis.
• Moreover, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of the target are then accurately located and matched, and the matched upper and lower vertices are taken as the specified target detection result, which has the advantages of strong robustness and high detection accuracy for the specified target. At the same time, it is not necessary to preset anchor frames of certain scales and aspect ratios as reference frames, so the target detection performance of the algorithm does not depend on the selection of anchor frames, and the scale and aspect ratio of the target are handled adaptively.
• In addition, the preset training set sample images are set for the upper vertex targets and lower vertex targets of specified targets with different features, and after training and iteration on the preset training set sample images, the obtained full convolutional neural network has strong generalization ability, avoids a complicated classifier cascade mode, and has a simpler structure.
  • the target detection method provided by the embodiment of the present application is introduced in conjunction with a specific application example for detecting a pedestrian target.
• The image to be detected is collected by a monitoring device and input into the trained full convolutional neural network to obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map. For the target upper vertex confidence distribution map and the target lower vertex confidence distribution map of the image to be detected, non-maximum suppression is used to determine the position of the center point of each detection target, and the detection targets whose pixel confidences in the neighborhood of the center point are greater than the preset confidence threshold are determined to be the pedestrian head vertex targets and the center position targets between the pedestrian's feet. The pedestrian head vertex targets and the center position targets between the pedestrian's feet are mapped to the above target upper and lower vertex associated field map, and the correlation degree value between each pedestrian head vertex target and each center position target between the pedestrian's feet is obtained. From these correlation degree values, the mean correlation degree value between each pedestrian head vertex target and each center position target between the pedestrian's feet can be obtained, and through judgment and matching of the mean values, the detection result shown in FIG. 6 is determined, in which each line is a pedestrian target.
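The non-maximum suppression step described above can be sketched as follows: a pixel is kept as a detection center if its confidence exceeds the preset confidence threshold and is not smaller than any of its 8 neighbors. The map values and the threshold 0.5 are illustrative.

```python
def nms_peaks(conf, threshold=0.5):
    """Return (x, y) centers that exceed the threshold and are not smaller
    than any of their 8 neighbors (non-maximum suppression)."""
    h, w = len(conf), len(conf[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = conf[y][x]
            if v <= threshold:
                continue
            neighbors = [conf[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))
                         if (ny, nx) != (y, x)]
            if all(v >= nv for nv in neighbors):
                peaks.append((x, y))
    return peaks

# One strong head-vertex peak and one sub-threshold response.
conf_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.1, 0.4],
]
centers = nms_peaks(conf_map)
```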
• It can be seen that in the present scheme, the acquired image to be detected is input into the trained full convolutional neural network to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map; the upper vertex targets and the lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; each upper vertex target and each lower vertex target are mapped to the target upper and lower vertex associated field map, and the associated field value between each first vertex target and second vertex target connection is calculated; finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• In addition, since the detected specified target is the connection between an upper vertex target and a lower vertex target, the connection can clearly and intuitively reflect the posture information of the specified target (for example, forward tilt, backward tilt, leaning over, etc.), which is beneficial to subsequent analysis.
  • the embodiment of the present application provides a target detecting device.
  • the target detecting device includes:
  • the first acquiring module 710 is configured to acquire an image to be detected collected by the image collector
  • a first generating module 720 configured to input the image to be detected into the trained full convolutional neural network, generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target Upper and lower vertices associated with the field map;
  • the target determining module 730 is configured to determine at least one upper vertex target and at least one of the to-be-detected images by using a preset target determining method for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively. a lower vertex target;
  • a first calculation module 740 configured to calculate the first vertex target and each second vertex for the first vertex target by mapping each upper vertex target and each lower vertex target to the target upper and lower vertex associated field maps An associated field value between the target connections, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, if the first vertex target is any lower vertex target And the second vertex target is any upper vertex target;
  • the matching module 750 is configured to determine, according to the associated field value between the first vertex target and each second vertex target connection, by matching the upper and lower vertices, determining that the connection with the largest associated field value is the specified target.
• It can be seen that in the scheme provided by this embodiment, the acquired image to be detected is input into the trained full convolutional neural network to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map; the upper vertex targets and the lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; each upper vertex target and each lower vertex target are mapped to the target upper and lower vertex associated field map, and the associated field value between each first vertex target and second vertex target connection is calculated; finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• In addition, since the detected specified target is the connection between an upper vertex target and a lower vertex target, the connection can clearly and intuitively reflect the posture information of the specified target (for example, forward tilt, backward tilt, leaning over, etc.), which is beneficial to subsequent analysis.
• Moreover, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of the target are then accurately located and matched, and the matched upper and lower vertices are taken as the specified target detection result, which has the advantages of strong robustness and high detection accuracy for the specified target.
  • the target determining module 730 is specifically configured to:
  • Determining a position of a center point of the at least one detection target by using a non-maximum value suppression method for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map;
• determining that a detection target whose pixel confidences in the target upper vertex confidence distribution map are greater than the preset confidence threshold is an upper vertex target, and that a detection target whose pixel confidences in the target lower vertex confidence distribution map are greater than the preset confidence threshold is a lower vertex target.
  • the first calculating module 740 is specifically configured to:
• for the first vertex target, calculate the associated field value between the first vertex target and each second vertex target connection by mapping each upper vertex target and each lower vertex target to the target upper and lower vertex associated field map.
  • the matching module 750 is specifically configured to:
• determine the connection corresponding to the largest associated field value to be the specified target.
  • the matching module 750 is further configured to:
• determine the connection corresponding to the largest associated field value to be the specified target.
• It should be noted that the target detecting apparatus in the embodiment of the present application is the apparatus applying the embodiment of the target detecting method shown in FIG. 1. All the embodiments of the target detecting method are applicable to the apparatus, and can all achieve the same or similar beneficial effects.
  • the embodiment of the present application further provides a target detecting device.
  • the target detecting device may include:
  • the first obtaining module 810 is configured to acquire an image to be detected collected by the image collector
  • the second obtaining module 820 is configured to acquire a preset training set sample image, and a line connecting the upper edge center position, the lower edge center position, and the center position of the upper and lower edges of each specified target in the preset training set sample image;
  • a second generating module 830 configured to generate, according to a preset distribution law, an upper edge center position and a lower edge center position of each specified target, a target upper vertex confidence true value map and a target lower vertex of the preset training set sample image Confidence truth map;
  • the third generation module 840 is configured to generate a true value map of the target upper and lower vertex associated fields of the preset training set sample image according to the connection of the top and bottom edge center positions of the specified targets;
• the extraction module 850 is configured to input the preset training set sample image into the initial full convolutional neural network, and obtain a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map of the preset training set sample image, wherein the network parameters of the initial full convolutional neural network are preset values;
  • a second calculating module 860 configured to calculate a first average error of a target upper vertex confidence distribution map of the preset training set sample image and a target upper vertex confidence truth value map, and a target of the preset training set sample image a second average error of the lower vertex confidence distribution map and the target lower vertex confidence truth map, and a third average of the target upper and lower vertex associated field maps of the preset training set sample image and the target upper and lower vertex associated field truth maps error;
• the loop module 870 is configured to: if the first average error, the second average error, or the third average error is greater than a preset error threshold, update the network parameters according to the first average error, the second average error, the third average error, and the preset gradient operation strategy, obtain an updated full convolutional neural network, and calculate the first average error, the second average error, and the third average error obtained by the updated full convolutional neural network;
  • a first generating module 880 configured to input the image to be detected into the trained full convolutional neural network, generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target Upper and lower vertices associated with the field map;
  • a target determining module 890 configured to determine at least one upper vertex target and at least one of the to-be-detected images by using a preset target determining method for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively a lower vertex target;
  • a first calculating module 8100 configured to calculate the first vertex target and each second vertex for the first vertex target by mapping each upper vertex target and each lower vertex target to the target upper and lower vertex associated field maps An associated field value between the target connections, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, if the first vertex target is any lower vertex target And the second vertex target is any upper vertex target;
  • the matching module 8110 is configured to determine, according to the associated field value between the first vertex target and each second vertex target connection, by matching the upper and lower vertices, the connection with the largest associated field value is the specified target.
• It can be seen that in the scheme provided by this embodiment, the acquired image to be detected is input into the trained full convolutional neural network to generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map; the upper vertex targets and the lower vertex targets in the image to be detected are determined according to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; each upper vertex target and each lower vertex target are mapped to the target upper and lower vertex associated field map, and the associated field value between each first vertex target and second vertex target connection is calculated; finally, the upper and lower vertices are matched, and the connection with the largest associated field value is determined to be the specified target.
• In this way, the upper and lower vertices of the specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the matched upper and lower vertex connection is taken as the specified target.
• Because the specified target is represented by a line, the overlap of candidate frames is avoided. Even if the specified targets are densely distributed, since the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, the connections between the upper and lower vertices can clearly distinguish each specified target, which improves the accuracy of target detection.
• because a specified target is the line connecting its upper vertex target and lower vertex target, the line can clearly reflect the posture of the specified target (for example, leaning forward, leaning backward, or bending over), which benefits subsequent processing.
• highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of the target are then accurately located and matched, and the matched upper-lower vertex pairs are taken as the detection result of the specified target, which is more robust and achieves higher detection accuracy. At the same time, there is no need to preset anchor boxes of particular scales and aspect ratios as reference boxes, so the detection performance of the algorithm does not depend on the choice of anchor boxes and the method adapts to the scale and aspect ratio of the target.
• the preset training set sample images cover upper vertex targets and lower vertex targets of specified targets with different characteristics, and the full convolutional neural network obtained by training and iterating on these sample images has strong generalization ability, avoids a complicated classifier cascade, and has a simpler structure.
  • the full convolutional neural network includes: a convolution layer, a downsampling layer, and a deconvolution layer;
  • the extraction module 850 can be specifically configured to:
• the result is processed by a 1×1 convolution layer to obtain a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map with the same resolution as the preset training set sample images.
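To see why the deconvolution layer plus the final 1×1 convolution yield output maps at the same resolution as the input sample image, the resolution bookkeeping can be sketched as follows. The layer schedule and strides here are illustrative assumptions, not the network of the embodiment.

```python
# Hypothetical layer schedule: strides are illustrative, not from the patent.
def out_hw(h, w, layers):
    """Track feature-map resolution through conv / downsample / deconv layers.
    'conv' is a stride-1 'same' convolution (including the final 1x1 conv),
    'down' divides the resolution by its stride, 'deconv' multiplies it."""
    for kind, s in layers:
        if kind == "down":
            h, w = h // s, w // s
        elif kind == "deconv":
            h, w = h * s, w * s
        # 'conv' (stride-1, 'same' padding) leaves h, w unchanged
    return h, w

layers = [("conv", 1), ("down", 2), ("conv", 1), ("down", 2),
          ("conv", 1), ("down", 2), ("deconv", 8), ("conv", 1)]  # last: 1x1 conv
print(out_hw(512, 640, layers))  # three 2x downsamples undone by one 8x deconv
```

With three 2× downsampling layers, a single 8× deconvolution restores the original 512×640 resolution, and the 1×1 convolution preserves it while producing the three output maps.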
• the object detecting apparatus in the embodiment of the present application is the apparatus to which the embodiment of the object detecting method shown in FIG. 3 is applied; all embodiments of the target detecting method are applicable to this apparatus and can achieve the same or similar beneficial effects.
• the embodiment of the present application provides a storage medium for storing executable code which, when run, performs all the steps of the target detection method provided by the embodiments of the present application.
• the storage medium stores executable code that, at runtime, performs the target detection method provided by the embodiments of the present application, and thus can achieve the following: using the trained full convolutional neural network, the upper vertex and the lower vertex of a specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the successfully matched upper-lower vertex lines are then taken as specified targets. Since a specified target is represented by a line, overlapping candidate boxes cannot occur; even if specified targets are densely distributed, the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, so the specified targets can be clearly distinguished by the lines connecting their upper and lower vertices, improving the accuracy of target detection.
  • the embodiment of the present application provides an application program for performing all the steps of the target detection method provided by the embodiment of the present application.
• at runtime, the application program performs the target detection method provided by the embodiments of the present application, and thus can achieve the following: using the trained full convolutional neural network, the upper vertex and the lower vertex of a specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the successfully matched upper-lower vertex lines are then taken as specified targets. Since a specified target is represented by a line, overlapping candidate boxes cannot occur; even if specified targets are densely distributed, the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, so the specified targets can be clearly distinguished by the lines connecting their upper and lower vertices, improving the accuracy of target detection.
  • the embodiment of the present application further provides a computer device, as shown in FIG. 9, including an image collector 901, a processor 902, and a storage medium 903, where
  • An image collector 901, configured to collect an image to be detected
  • the processor 902 is configured to implement all the steps of the target detection method provided by the embodiments of the present application when the executable code stored on the storage medium 903 is executed.
  • the image collector 901, the processor 902, and the storage medium 903 can perform data transmission by means of a wired connection or a wireless connection, and the computer device can communicate with other devices through a wired communication interface or a wireless communication interface.
  • the storage medium may include a RAM (Random Access Memory), and may also include an NVM (Non-volatile Memory), such as at least one disk storage. Alternatively, the storage medium may also be at least one storage device located remotely from the aforementioned processor.
• the processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; or a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • the image collector may be a camera for shooting a monitoring area for video capture or picture capture.
• by reading the executable code stored in the storage medium and running it, the processor of the computer device can, using the trained full convolutional neural network, extract the upper vertex and the lower vertex of a specified target, establish the connection between the upper vertex and the lower vertex by mapping, and take the successfully matched upper-lower vertex lines as specified targets; a specified target is represented by a line, so overlapping candidate boxes cannot occur, and even if specified targets are densely distributed, the upper and lower vertices of each can be accurately located by the full convolutional neural network, so the specified targets can be clearly distinguished by the lines connecting their upper and lower vertices, improving the accuracy of target detection.
• for the apparatus embodiment, the description is relatively simple, and for relevant parts reference may be made to the description of the method embodiment.

Abstract

Embodiments of the present application provide a target detection method and apparatus, and a computer device. The target detection method comprises: obtaining an image to be detected, which is collected by means of an image collector; inputting the image to be detected into a full convolutional neural network obtained by means of training, so as to obtain a target upper-vertex confidence distribution diagram, a target lower-vertex confidence distribution diagram and a target upper-vertex and lower-vertex associative field diagram of the image to be detected; for the target upper-vertex confidence distribution diagram and the target lower-vertex confidence distribution diagram, respectively determining upper-vertex targets and lower-vertex targets in the image to be detected by using a preset target determining method; for a first vertex target, computing associative field values of connection lines between the first vertex target and second vertex targets by mapping the upper-vertex targets and the lower-vertex targets to the target upper-vertex and lower-vertex associative field diagram; and based on the associative field values and by matching upper vertexes with lower vertexes, determining the connection line with the maximum associative field value as a specified target. By means of the solution, the accuracy of target detection can be improved.

Description

一种目标检测方法、装置及计算机设备Target detection method, device and computer equipment
本申请要求于2017年10月23日提交中国专利局、申请号为201711004621.7发明名称为“一种目标检测方法、装置及计算机设备”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。This application claims priority to Chinese patent application No. 201711004621.7, entitled "Target Detection Method, Apparatus, and Computer Device", filed with the Chinese Patent Office on October 23, 2017, the entire contents of which are incorporated herein by reference.
技术领域Technical field
本申请涉及机器视觉技术领域,特别涉及一种目标检测方法、装置及计算机设备。The present application relates to the field of machine vision technology, and in particular, to a target detection method, device and computer device.
背景技术Background technique
随着社会的不断进步，视频监控系统的应用范围越来越广泛，智能监控作为视频监控技术的一个研究热点，近年来，在一些特定的场合，例如银行、车站、商场等公共场合逐渐普及。目标检测作为智能监控的一环节，有着非常重要的意义，目标检测可以定义为：判断输入图像或视频中是否存在指定目标，如果存在指定目标，则输出指定目标在图像或者视频中的位置信息。目前，常用的目标检测方法主要有背景差法、帧差法、光流法、模板匹配和基于机器学习的方法。前四种目标检测方法都是常规基于图像处理的目标检测方法，易受到光照变化、色彩和姿态等影响。而基于机器学习的目标检测方法，从样本集中学习指定目标的不同变化，具有较好的鲁棒性。With the continuous progress of society, video surveillance systems are applied ever more widely. Intelligent surveillance, a research hotspot of video surveillance technology, has in recent years gradually become common in certain settings, for example public places such as banks, stations, and shopping malls. As a part of intelligent surveillance, target detection is of great importance. Target detection can be defined as: determining whether a specified target exists in an input image or video and, if so, outputting the position information of the specified target in the image or video. At present, commonly used target detection methods mainly include the background subtraction method, the frame difference method, the optical flow method, template matching, and machine learning-based methods. The first four are conventional image-processing-based target detection methods and are susceptible to illumination changes, color, posture, and the like. Machine learning-based target detection methods, which learn the different variations of a specified target from a sample set, have better robustness.
相关的基于机器学习的目标检测方法中，首先构建训练样本集，通过对训练样本集进行训练，得到一个卷积神经网络模型。在进行目标检测时，将待检测的图片输入训练好的卷积神经网络模型，可以得到指定目标所对应的候选框和置信度，然后进行非极大值抑制和阈值筛选，确定待检测的图片中的指定目标。In a related machine learning-based target detection method, a training sample set is first constructed, and a convolutional neural network model is obtained by training on the training sample set. When performing target detection, the image to be detected is input into the trained convolutional neural network model to obtain candidate boxes and confidences corresponding to specified targets, and then non-maximum suppression and threshold screening are performed to determine the specified targets in the image to be detected.
但是，在一些特殊的场景下，目标的分布较为密集，例如，在人群密集的场景下，行人目标会出现拥挤的情况，这样，使得在利用上述基于机器学习的目标检测方法中，所得到的候选框之间存在重叠的情况，对相互重叠的候选框进行非极大值抑制，可能会舍弃掉真实的指定目标对应的候选框，导致漏检部分目标，具有一定的检测误差。However, in some special scenarios the targets are densely distributed; for example, in crowded scenes pedestrian targets may be crowded together, so that with the above machine learning-based target detection method the obtained candidate boxes overlap one another. Performing non-maximum suppression on the mutually overlapping candidate boxes may discard the candidate box corresponding to a real specified target, causing some targets to be missed and introducing a certain detection error.
发明内容Summary of the invention
本申请实施例的目的在于提供一种目标检测方法、装置及计算机设备,以提高目标检测的准确度。具体技术方案如下:The purpose of the embodiments of the present application is to provide a target detection method, apparatus, and computer device to improve the accuracy of target detection. The specific technical solutions are as follows:
第一方面,本申请实施例提供了一种目标检测方法,所述方法包括:In a first aspect, an embodiment of the present application provides a target detection method, where the method includes:
获取通过图像采集器采集的待检测图像;Obtaining an image to be detected collected by the image collector;
将所述待检测图像输入经训练得到的全卷积神经网络,生成所述待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图,以及目标上下顶点关联场图;And inputting the image to be detected into the trained full convolutional neural network, generating a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex correlation field map;
分别针对所述目标上顶点置信度分布图及所述目标下顶点置信度分布图,采用预设目标确定方法,确定所述待检测图像中至少一个上顶点目标及至少一个下顶点目标;Determining, by using a preset target determining method, at least one upper vertex target and at least one lower vertex target in the image to be detected, respectively, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map;
通过将各上顶点目标及各下顶点目标映射至所述目标上下顶点关联场图中，针对第一顶点目标，分别计算所述第一顶点目标与各第二顶点目标连线间的关联场值，其中，若所述第一顶点目标为任一上顶点目标，则所述第二顶点目标为任一下顶点目标，若所述第一顶点目标为任一下顶点目标，则所述第二顶点目标为任一上顶点目标；mapping each upper vertex target and each lower vertex target onto the target upper and lower vertex associated field map and, for a first vertex target, calculating the associated field value of the line connecting the first vertex target to each second vertex target, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target;
基于所述第一顶点目标与各第二顶点目标连线间的关联场值,通过对上下顶点进行匹配,确定关联场值最大的连线为指定目标。Based on the associated field value between the first vertex target and each second vertex target connection, the upper and lower vertices are matched to determine that the connection with the largest associated field value is the specified target.
第二方面,本申请实施例提供了一种目标检测装置,所述装置包括:In a second aspect, an embodiment of the present application provides a target detecting apparatus, where the apparatus includes:
第一获取模块,用于获取通过图像采集器采集的待检测图像;a first acquiring module, configured to acquire an image to be detected collected by the image collector;
第一生成模块,用于将所述待检测图像输入经训练得到的全卷积神经网络,生成所述待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图,以及目标上下顶点关联场图;a first generating module, configured to input the image to be detected into the trained full convolutional neural network, generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower Vertex associated field map;
目标确定模块,用于分别针对所述目标上顶点置信度分布图及所述目标下顶点置信度分布图,采用预设目标确定方法,确定所述待检测图像中至少一个上顶点目标及至少一个下顶点目标;a target determining module, configured to determine at least one upper vertex target and at least one of the to-be-detected images by using a preset target determining method for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively Lower vertex target;
第一计算模块，用于通过将各上顶点目标及各下顶点目标映射至所述目标上下顶点关联场图中，针对第一顶点目标，分别计算所述第一顶点目标与各第二顶点目标连线间的关联场值，其中，若所述第一顶点目标为任一上顶点目标，则所述第二顶点目标为任一下顶点目标，若所述第一顶点目标为任一下顶点目标，则所述第二顶点目标为任一上顶点目标；a first calculating module, configured to map each upper vertex target and each lower vertex target onto the target upper and lower vertex associated field map and, for a first vertex target, calculate the associated field value of the line connecting the first vertex target to each second vertex target, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target;
匹配模块,用于基于所述第一顶点目标与各第二顶点目标连线间的关联场值,通过对上下顶点进行匹配,确定关联场值最大的连线为指定目标。And a matching module, configured to determine, according to the associated field value between the first vertex target and each second vertex target connection, by matching the upper and lower vertices, determining that the connection with the largest associated field value is the specified target.
第三方面,本申请实施例提供了一种存储介质,用于存储可执行代码,所述可执行代码用于在运行时执行:本申请实施例第一方面所提供的目标检测方法。In a third aspect, the embodiment of the present application provides a storage medium for storing executable code, where the executable code is used to execute at a runtime: the target detection method provided by the first aspect of the embodiment of the present application.
第四方面,本申请实施例提供了一种应用程序,用于在运行时执行:本申请实施例第一方面所提供的目标检测方法。In a fourth aspect, an embodiment of the present application provides an application for performing the target detection method provided by the first aspect of the embodiment of the present application.
第五方面,本申请实施例提供了一种计算机设备,包括图像采集器、处理器和存储介质,其中,In a fifth aspect, an embodiment of the present application provides a computer device, including an image collector, a processor, and a storage medium, where
所述图像采集器,用于采集待检测图像;The image collector is configured to collect an image to be detected;
所述存储介质,用于存放可执行代码;The storage medium is configured to store executable code;
所述处理器,用于执行所述存储介质上所存放的可执行代码时,实现如第一方面所提供的目标检测方法。The processor, when used to execute executable code stored on the storage medium, implements the object detection method provided by the first aspect.
综上可见，本申请实施例提供的方案中，通过将获取的待检测图像输入经训练得到的全卷积神经网络，生成待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图，分别根据目标上顶点置信度分布图和目标下顶点置信度分布图，确定待检测图像中的上顶点目标和下顶点目标，再通过将上顶点目标和下顶点目标映射至目标上下顶点关联场图，计算得到针对第一顶点目标、与各第二顶点目标连线间的关联场值，最后，基于各关联场值，通过对上下顶点进行匹配，确定关联场值最大的连线为指定目标。采用经训练得到的全卷积神经网络，能够提取到指定目标的上顶点和下顶点，并且通过映射建立上顶点与下顶点的连接，再通过匹配，将匹配成功的上下顶点连线作为指定目标，指定目标用连线表示，排除了候选框出现重叠的情况发生，即使指定目标分布密集，由于指定目标的上下顶点可以通过全卷积神经网络准确定位，则可以用上下顶点的连线清晰区分各指定目标，提高了目标检测的准确度。In summary, in the solution provided by the embodiments of the present application, the acquired image to be detected is input into the trained full convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map of the image to be detected; the upper vertex targets and lower vertex targets in the image to be detected are determined from the target upper vertex confidence distribution map and the target lower vertex confidence distribution map, respectively; the upper vertex targets and lower vertex targets are then mapped onto the target upper and lower vertex associated field map, and for each first vertex target the associated field value of the line connecting it to each second vertex target is calculated; finally, based on these associated field values, the upper and lower vertices are matched and the line with the largest associated field value is determined as a specified target. Using the trained full convolutional neural network, the upper vertex and the lower vertex of a specified target can be extracted, the connection between the upper vertex and the lower vertex is established by mapping, and the successfully matched upper-lower vertex lines are then taken as specified targets. Since a specified target is represented by a line, overlapping candidate boxes cannot occur; even if specified targets are densely distributed, the upper and lower vertices of each specified target can be accurately located by the full convolutional neural network, so the specified targets can be clearly distinguished by the lines connecting their upper and lower vertices, which improves the accuracy of target detection.
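As a concrete illustration of the vertex-matching step summarized above, the following sketch samples the associated field along each candidate upper-lower line and keeps, per upper vertex, the line with the largest value. This is an assumed greedy scheme with hypothetical names and threshold; the embodiment only requires that the line with the largest associated field value be selected.

```python
import numpy as np

def line_field_value(field, p_up, p_down, n_samples=10):
    """Average the associated-field values sampled along the segment
    joining an upper-vertex candidate and a lower-vertex candidate."""
    (r0, c0), (r1, c1) = p_up, p_down
    ts = np.linspace(0.0, 1.0, n_samples)
    rows = np.round(r0 + ts * (r1 - r0)).astype(int)
    cols = np.round(c0 + ts * (c1 - c0)).astype(int)
    return float(field[rows, cols].mean())

def match_vertices(field, ups, downs, thresh=0.5):
    """Greedy matching: each upper vertex keeps the lower vertex whose
    connecting line has the largest associated field value."""
    pairs = []
    for u in ups:
        scores = [line_field_value(field, u, d) for d in downs]
        best = int(np.argmax(scores))
        if scores[best] > thresh:
            pairs.append((u, downs[best]))
    return pairs

# Toy field: strong association only along column 5.
field = np.zeros((20, 20))
field[:, 5] = 1.0
pairs = match_vertices(field, ups=[(2, 5)], downs=[(15, 5), (15, 12)])
print(pairs)  # the vertical line lying along the field wins
```

The line from (2, 5) to (15, 12) leaves the high-association column almost immediately, so its sampled average is low and the vertical line is selected.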
附图说明DRAWINGS
为了更清楚地说明本申请实施例和现有技术的技术方案,下面对实施例和现有技术中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present application and the technical solutions of the prior art, the following description of the embodiments and the drawings used in the prior art will be briefly introduced. Obviously, the drawings in the following description are only Some embodiments of the application may also be used to obtain other figures from those of ordinary skill in the art without departing from the scope of the invention.
图1为本申请实施例的目标检测方法的一种流程示意图;1 is a schematic flowchart of a target detecting method according to an embodiment of the present application;
图2为本申请实施例的全卷积神经网络的结构示意图;2 is a schematic structural diagram of a full convolutional neural network according to an embodiment of the present application;
图3为本申请实施例的目标检测方法的另一种流程示意图;FIG. 3 is another schematic flowchart of a target detecting method according to an embodiment of the present application; FIG.
图4为本申请实施例的对待检测图像进行提取得到的目标上顶点置信度真值图、目标下顶点置信度真值图及目标上下顶点关联场真值图;4 is a true value map of a target upper vertex confidence obtained by extracting a to-be-detected image according to an embodiment of the present application, a true value map of the target lower vertex confidence level, and a true value map of the target upper and lower vertex associated fields;
图5为本申请实施例的另一种全卷积神经网络的结构示意图;FIG. 5 is a schematic structural diagram of another full convolutional neural network according to an embodiment of the present application; FIG.
图6为本申请实施例的行人检测结果示意图;6 is a schematic diagram of a pedestrian detection result according to an embodiment of the present application;
图7为本申请实施例的目标检测装置的一种结构示意图;FIG. 7 is a schematic structural diagram of a target detecting apparatus according to an embodiment of the present application;
图8为本申请实施例的目标检测装置的另一种结构示意图;FIG. 8 is another schematic structural diagram of an object detecting apparatus according to an embodiment of the present application; FIG.
图9为本申请实施例的计算机设备的结构示意图。FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
具体实施方式Detailed ways
为使本申请的目的、技术方案、及优点更加清楚明白,以下参照附图并举实施例,对本申请进一步详细说明。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application will be further described in detail below with reference to the accompanying drawings. It is apparent that the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without departing from the inventive scope are the scope of the present application.
下面通过具体实施例,对本申请进行详细的说明。The present application will be described in detail below through specific embodiments.
为了提高目标检测的准确度,本申请实施例提供了一种目标检测方法、装置及计算机设备。In order to improve the accuracy of the target detection, the embodiment of the present application provides a target detection method, device, and computer device.
下面,首先对本申请实施例所提供的一种目标检测方法进行介绍。In the following, a target detection method provided by an embodiment of the present application is first introduced.
本申请实施例所提供的一种目标检测方法的执行主体可以为一种搭载有核心处理芯片的计算机设备,该计算机设备可以为具有图像处理能力的摄像机、图像处理器等。实现本申请实施例所提供的一种目标检测方法的方式可以为设置于执行主体中的软件、硬件电路和逻辑电路中的至少一种。An execution subject of an object detection method provided by an embodiment of the present application may be a computer device equipped with a core processing chip, and the computer device may be a camera having an image processing capability, an image processor, or the like. A manner of implementing a target detection method provided by an embodiment of the present application may be at least one of software, a hardware circuit, and a logic circuit disposed in an execution body.
如图1所示,本申请实施例所提供的一种目标检测方法,可以包括如下步骤:As shown in FIG. 1 , an object detection method provided by an embodiment of the present application may include the following steps:
S101,获取通过图像采集器采集的待检测图像。S101. Acquire an image to be detected collected by an image collector.
其中,图像采集器可以为摄像机或者照相机,当然,图像采集器不仅限于此。如果图像采集器为摄像机,摄像机拍摄的是一段时间内的视频,待检测图像可以为该视频中的任一帧图像。The image collector may be a camera or a camera. Of course, the image collector is not limited thereto. If the image collector is a camera, the camera captures a video for a period of time, and the image to be detected can be any frame image in the video.
S102,将待检测图像输入经训练得到的全卷积神经网络,生成待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图,以及目标上下顶点关联场图。S102. Input the image to be detected into the trained full convolutional neural network, generate a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex correlation field map.
由于全卷积神经网络具有自动提取指定目标的上顶点目标、下顶点目标特征的能力,且全卷积神经网络的网络参数可以是通过样本训练的过程得到的。因此,利用训练得到的全卷积神经网络可以保证对指定目标的上顶点目标和下顶点目标快速识别。如图2所示,本申请实施例中,全卷积神经网络由多个卷积层和多个降采样层相间排列构成,将获取到的待检测图像输入该全卷积神经网络,通过全卷积神经网络对待检测图像中指定目标的上顶点目标、下顶点目标进行特征提取,可得到待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图。Since the full convolutional neural network has the ability to automatically extract the upper vertex target and the lower vertex target feature of the specified target, and the network parameters of the full convolutional neural network can be obtained through the process of sample training. Therefore, the fully convolutional neural network obtained by training can ensure the fast recognition of the upper vertex target and the lower vertex target of the specified target. As shown in FIG. 2, in the embodiment of the present application, the full convolutional neural network is composed of a plurality of convolution layers and a plurality of downsampling layers, and the acquired image to be detected is input into the full convolutional neural network. The convolutional neural network performs feature extraction on the upper vertex target and the lower vertex target of the specified target in the detected image, and can obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field map of the image to be detected. .
目标上顶点置信度分布图、目标下顶点置信度分布图可以理解为：所检测的目标为指定目标的上顶点的概率和下顶点的概率的分布图。例如，如果指定目标为行人，则目标上顶点置信度分布图为所检测的目标为人头顶点的概率分布图；目标下顶点置信度分布图为所检测的目标为行人双脚的概率分布图。目标上下顶点关联场图上每一像素点表征该位置存在指定目标的上顶点目标或者下顶点目标的关联程度值。目标上顶点置信度分布图及目标下顶点置信度分布图中的参数，可以为具体每个识别的区域内的目标为指定目标的上顶点目标和下顶点目标的具体概率值，其中，识别的区域是与目标的位置及大小相关的区域，通常情况下该区域的面积可以大于或等于目标的实际大小；也可以用像素点的像素值代表概率的大小，该区域中每个像素点的像素值越大，则该区域内的目标为指定目标的上顶点目标或者下顶点目标的概率也越大，当然，本申请实施例中目标上顶点置信度分布图和目标下顶点置信度分布图的具体参数不仅限于此。The target upper vertex confidence distribution map and the target lower vertex confidence distribution map can be understood as distribution maps of the probability that a detected target is the upper vertex or the lower vertex of a specified target. For example, if the specified target is a pedestrian, the target upper vertex confidence distribution map is the probability distribution map of the detected target being the top of a head, and the target lower vertex confidence distribution map is the probability distribution map of the detected target being a pedestrian's feet. Each pixel in the target upper and lower vertex associated field map represents the degree of association of an upper vertex target or a lower vertex target of a specified target existing at that position. The parameters in the target upper vertex confidence distribution map and the target lower vertex confidence distribution map may be, for each identified region, the specific probability values of the target in that region being the upper vertex target or the lower vertex target of a specified target, where an identified region is a region related to the position and size of the target, and its area is usually greater than or equal to the actual size of the target; alternatively, pixel values may represent the probability, so that the larger the pixel values in a region, the greater the probability that the target in the region is the upper vertex target or the lower vertex target of a specified target. Of course, the specific parameters of the target upper vertex confidence distribution map and the target lower vertex confidence distribution map in the embodiments of the present application are not limited thereto.
可选的,全卷积神经网络可以包括:卷积层、降采样层及反卷积层。全卷积神经网络往往包括至少一个卷积层和至少一个降采样层,反卷积层为一个可选层,为了使得到的特征图的分辨率与输入的待检测图像的分辨率相同,以减少图像压缩比例的换算的步骤,便于进行特征提取,在最后一个卷积层之后,可以设置一反卷积层。Optionally, the full convolutional neural network may include: a convolution layer, a downsampling layer, and a deconvolution layer. The full convolutional neural network often includes at least one convolution layer and at least one downsampling layer, and the deconvolution layer is an optional layer, in order to make the resolution of the obtained feature image the same as the resolution of the input image to be detected, The step of reducing the conversion of the image compression ratio facilitates feature extraction. After the last convolution layer, a deconvolution layer can be set.
可选的,S102可以通过如下步骤实现。Optionally, S102 can be implemented by the following steps.
第一步,将待检测图像输入训练得到的全卷积神经网络,经卷积层和降采样层相间排列的网络结构,提取待检测图像的特征。In the first step, the image to be detected is input into the trained full convolutional neural network, and the features of the image to be detected are extracted through a network structure in which the convolution layer and the downsampling layer are arranged.
第二步,通过反卷积层将特征上采样至分辨率与待检测图像的分辨率相同,得到上采样后的结果。In the second step, the feature is upsampled to the same resolution as the image to be detected by the deconvolution layer, and the upsampled result is obtained.
将待检测图像输入经训练得到的全卷积神经网络,利用一系列卷积层和降采样层依次提取由低层到高层的特征,该一系列卷积层和降采样层是相间排列的。然后连接反卷积层将特征上采样至输入的待检测图像大小。The image to be detected is input into the trained full convolutional neural network, and the features from the lower layer to the upper layer are sequentially extracted by a series of convolutional layers and downsampling layers, and the series of convolutional layers and downsampled layers are arranged in phase. The deconvolution layer is then connected to upsample the feature to the size of the image to be detected of the input.
第三步，利用1×1卷积层对第二步得到的结果进行运算，得到与待检测图像同等分辨率的目标上顶点置信度分布图、目标下顶点置信度分布图及目标上下顶点关联场图。In the third step, the result obtained in the second step is processed by a 1×1 convolution layer to obtain a target upper vertex confidence distribution map, a target lower vertex confidence distribution map, and a target upper and lower vertex associated field map with the same resolution as the image to be detected.
为了保证目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图与待检测图像有同等分辨率，最后可以通过一卷积层对上采样后的结果进行运算，该卷积层的卷积核尺寸可以选择1×1、3×3或5×5等尺寸的卷积核，但是，为了精确提取一个像素点的特征，可以选定该卷积层的卷积核尺寸为1×1，则通过该卷积层的运算可得到目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图。In order to ensure that the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map have the same resolution as the image to be detected, the upsampled result can finally be processed by a convolution layer; the kernel size of this convolution layer may be chosen as 1×1, 3×3, 5×5, or the like, but in order to precisely extract the feature of a single pixel, a kernel size of 1×1 may be selected, in which case the operation of this convolution layer yields the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex associated field map.
S103,分别针对目标上顶点置信度分布图及目标下顶点置信度分布图,采用预设目标确定方法,确定待检测图像中至少一个上顶点目标及至少一个下顶点目标。S103: Determine, by using a preset target determining method, at least one upper vertex target and at least one lower vertex target in the image to be detected, respectively, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map.
The target upper-vertex confidence map of the image to be detected, obtained through the fully convolutional neural network, contains for each identified region the probability that its target is the upper-vertex target of a specified target, and the lower-vertex confidence map likewise contains the probability that each identified region's target is the lower-vertex target of a specified target. Among all these targets there may be targets other than upper-vertex and lower-vertex targets. Therefore, a preset target determination method must be applied separately to the two maps: accurate upper-vertex targets of the specified targets in the image to be detected are determined from the target upper-vertex confidence map, and accurate lower-vertex targets are determined from the target lower-vertex confidence map. The preset target determination method may be setting a threshold: if a probability in the target upper-vertex confidence map is greater than the threshold, the region corresponding to that probability is determined to be an upper-vertex target, and if a probability in the target lower-vertex confidence map is greater than the threshold, the corresponding region is determined to be a lower-vertex target. Alternatively, the method may rely on pixel values: if every pixel value in a region is greater than a preset pixel value, the region is determined to be an upper-vertex or lower-vertex target. The method may also determine a region to be an upper-vertex or lower-vertex target if the confidence of every pixel in it is greater than a preset confidence threshold, or if the average confidence of its pixels is greater than a preset confidence threshold. Of course, the specific manner of determining upper-vertex and lower-vertex targets is not limited to these; for ease of implementation, thresholding may be adopted.
Optionally, S103 may be implemented by the following steps.
In the first step, a non-maximum suppression method is applied to the target upper-vertex confidence map and the target lower-vertex confidence map respectively to determine the position of the center point of at least one detection target.
In the target upper-vertex and lower-vertex confidence maps of the image to be detected, the local confidence maxima indicate the positions of the center points of the detection targets, while spatially clustered non-zero points on a confidence map indicate the region occupied by a detection target. Applying non-maximum suppression to each of the two maps suppresses elements that are not local maxima and searches for the maximum within each region, so the position of each detection target's center point can be obtained. The formation of such a region is related to the confidence of each pixel; because of factors such as two targets being too close together or background objects, the region may deviate from the actual detection target. However, the confidence maximum still marks the detection target's center point, and once the center point position is determined, a detection target can be identified within a certain neighborhood of it. Determining the center point position therefore improves the accuracy of target detection.
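The non-maximum suppression step above can be sketched in numpy: a pixel survives only if it equals the maximum of its local window and exceeds a small floor, leaving one peak per clustered region. The window size, floor value, and function name are illustrative, not taken from the embodiment.

```python
import numpy as np

def nms_peaks(conf_map, window=3, floor=0.1):
    """Find local maxima of a confidence map by non-maximum suppression.

    A pixel is kept as a candidate center point only if its confidence
    equals the maximum of its (window x window) neighbourhood and exceeds
    `floor`; all non-maximum elements are suppressed, leaving one peak per
    spatially clustered non-zero region.
    """
    h, w = conf_map.shape
    r = window // 2
    # Pad with zeros so border pixels also have a full neighbourhood.
    padded = np.pad(conf_map, r, mode="constant")
    peaks = []
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            v = conf_map[y, x]
            if v >= floor and v == patch.max():
                peaks.append((y, x))
    return peaks
```

For example, a map with a single bright blob yields exactly one center point at the blob's maximum, even though many surrounding pixels are non-zero.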
In the second step, the confidences of all pixels in the neighborhood of each detection target's center point are obtained.
Since the neighborhood of a detection target's center point can be taken as one detection target, the neighborhood size can be determined by statistical analysis of the upper-vertex and lower-vertex sizes of the specified target. For a pedestrian target, for example, the neighborhood size of the upper-vertex target may be an average computed from actual head radii, or a value obeying a preset distribution. The neighborhood size of the lower-vertex target may be set equal to that of the upper vertex; of course, the two neighborhood sizes may differ, and the lower-vertex neighborhood size may also be determined from the actual lower-vertex size of the specified target. The greater the confidences of all pixels in the neighborhood of a detection target's center point, the greater the probability that the detection target is an upper-vertex or lower-vertex target; therefore, in this embodiment, the confidences of all pixels in the neighborhood need to be obtained.
In the third step, a detection target in the target upper-vertex confidence map whose every pixel confidence is greater than a preset confidence threshold is determined to be an upper-vertex target, and a detection target in the target lower-vertex confidence map whose every pixel confidence is greater than the preset confidence threshold is determined to be a lower-vertex target.
The greater the confidences of all pixels in the neighborhood of a detection target's center point, the greater the probability that the detection target is the upper-vertex or lower-vertex target of a specified target. Therefore, in this embodiment, a preset confidence threshold is set in advance. If the confidences of all pixels in the neighborhood of a detection target's center point in the target upper-vertex confidence map are greater than the preset confidence threshold, that detection target can be determined to be an upper-vertex target of the image to be detected; if the confidences of all pixels in the neighborhood of a detection target's center point in the target lower-vertex confidence map are greater than the preset confidence threshold, that detection target can be determined to be a lower-vertex target. The preset confidence threshold may be set according to experimental data or requirements. For example, with the threshold set to 0.7, a detection target is determined to be an upper-vertex target if all pixel confidences in the neighborhood of its center point in the target upper-vertex confidence map exceed 0.7, and a lower-vertex target if all pixel confidences in the neighborhood of its center point in the target lower-vertex confidence map exceed 0.7. As further examples, the preset confidence threshold may be set to 0.85, 0.9, or another value, which is not limited here. Because this embodiment requires the confidences of all pixels in the neighborhood of the center point to exceed the preset confidence threshold, the accuracy of target detection is further ensured.
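The acceptance test described above — keep a candidate center only if every pixel in its neighborhood clears the preset confidence threshold — can be sketched as follows. The square-neighborhood shape, radius, and 0.7 default are illustrative assumptions.

```python
import numpy as np

def accept_vertex(conf_map, center, radius=2, threshold=0.7):
    """Accept a candidate center as an upper/lower vertex target only if
    every pixel in its (2*radius+1)^2 neighbourhood (clipped at the image
    border) has confidence strictly greater than `threshold`."""
    y, x = center
    h, w = conf_map.shape
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    patch = conf_map[y0:y1, x0:x1]
    return bool((patch > threshold).all())
```

A single low-confidence pixel inside the neighborhood is enough to reject the candidate, which is exactly what makes the check stricter than thresholding the center pixel alone.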
S104: Map each upper-vertex target and each lower-vertex target into the target upper-lower-vertex association field map, and, for a first vertex target, calculate the association field value of the line between the first vertex target and each second vertex target.
Here, the first vertex target and the second vertex target are each any upper-vertex target or any lower-vertex target: if the first vertex target is an upper-vertex target, the second vertex target is a lower-vertex target, and if the first vertex target is a lower-vertex target, the second vertex target is an upper-vertex target. After the upper-vertex and lower-vertex targets of the specified targets in the scene are determined, they can be mapped into the target upper-lower-vertex association field map obtained in S102. Since each pixel of the association field map represents the degree of association, at that position, with an upper-vertex or lower-vertex target of a specified target, connecting each upper-vertex target with each lower-vertex target yields the sum of the association degree values along each line between a pair of connected upper and lower vertices. This sum can be defined as the association field value of the line; alternatively, the mean of the association degree values of the two connected vertices can be defined as the association field value. For the first vertex target, among its lines to the second vertex targets, a larger association field value indicates a higher degree of association between the line's upper and lower vertices, that is, a larger probability that the line is a specified target.
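The scoring of one candidate line can be sketched by sampling the association field along the segment joining an upper and a lower vertex and accumulating how well the field agrees with the segment's direction (here via a dot product with the segment's unit vector, one common choice for vector-valued association fields). The sample count and the dot-product formulation are assumptions for illustration.

```python
import numpy as np

def line_field_value(field, p_up, p_down, n_samples=10, reduce="mean"):
    """Association field value of the line from p_up to p_down.

    `field` has shape (H, W, 2): one 2-D vector per pixel, in (y, x)
    order. The score averages (or sums) the dot product between the field
    sampled along the segment and the segment's own unit direction, so a
    line lying on a true target accumulates the largest value.
    """
    p_up = np.asarray(p_up, dtype=float)
    p_down = np.asarray(p_down, dtype=float)
    seg = p_down - p_up
    norm = np.linalg.norm(seg)
    if norm == 0:
        return 0.0
    direction = seg / norm
    total = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        y, x = np.rint(p_up + t * seg).astype(int)
        total += float(np.dot(field[y, x], direction))
    return total / n_samples if reduce == "mean" else total
```

A field that everywhere points straight down scores 1.0 for a vertical line and 0 for the same line measured against a perpendicular field, matching the intuition that the largest association field value marks the most likely pairing.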
S105: Based on the association field values of the lines between the first vertex target and the second vertex targets, match the upper and lower vertices and determine the line with the largest association field value to be a specified target.
For the first vertex target, among its lines to the second vertex targets, a larger association field value means a larger probability that the line is a specified target, so the line with the largest association field value can be determined to be the specified target. Moreover, in general only one second vertex target connects with the first vertex target to form a specified target, so by matching the upper and lower vertices, the line with the largest association field value can be determined for each first vertex target. For example, suppose five upper-vertex targets and four lower-vertex targets are determined by S103. For the first upper-vertex target, the line to the first lower-vertex target has the largest association field value, so that line is determined to be a specified target; for the second upper-vertex target, the line to the third lower-vertex target has the largest association field value, so that line is determined to be a specified target; for the third upper-vertex target, the line to the second lower-vertex target has the largest association field value, so that line is determined to be a specified target; and for the fifth upper-vertex target, the line to the fourth lower-vertex target has the largest association field value, so that line is determined to be a specified target. Since every line from the fourth upper vertex to a lower vertex has an association field value smaller than those of the other lines, the fourth upper vertex can be judged a possibly misidentified upper-vertex target and is discarded. Optionally, the matching of upper and lower vertices may use the classic bipartite graph matching method, the Hungarian algorithm, to achieve one-to-one matching between upper-vertex and lower-vertex targets. Of course, any method that achieves one-to-one matching between targets is applicable to this embodiment, and the methods are not enumerated here one by one.
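The one-to-one matching step can be sketched as choosing the assignment of upper to lower vertices that maximizes the total association field value. For clarity this sketch enumerates assignments by brute force; in practice the Hungarian algorithm mentioned above (e.g., `scipy.optimize.linear_sum_assignment`) produces the same optimum in polynomial time. The score-matrix layout is an assumption.

```python
from itertools import permutations

def match_vertices(score):
    """Optimal one-to-one matching of upper to lower vertex targets.

    `score[i][j]` is the association field value of the line joining
    upper vertex i and lower vertex j. Returns (pairs, total) where pairs
    is a list of (upper, lower) indices. When the counts differ, the
    surplus vertices are left unmatched, mirroring the discard step for
    misidentified vertices described above.
    """
    n_up, n_down = len(score), len(score[0])
    best_pairs, best_total = [], float("-inf")
    if n_up <= n_down:
        for perm in permutations(range(n_down), n_up):
            total = sum(score[i][j] for i, j in enumerate(perm))
            if total > best_total:
                best_total, best_pairs = total, list(enumerate(perm))
    else:
        for perm in permutations(range(n_up), n_down):
            total = sum(score[i][j] for j, i in enumerate(perm))
            if total > best_total:
                best_total = total
                best_pairs = [(i, j) for j, i in enumerate(perm)]
    return best_pairs, best_total
```

Brute force is factorial in the number of vertices and only suitable for illustrating the objective; the bipartite-matching formulation is what makes the step tractable for crowded scenes.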
Because the line with the largest computed association field value may still be a false detection, a preset association field threshold can be set to further improve detection accuracy: if the largest association field value of a line is greater than this preset association field threshold, the line is an accurate specified target; if not, the line is a false detection and the result is discarded. Once the specified targets are determined, it can be established whether a specified target exists in the image to be detected, and the accurate position information of the specified target can be determined.
By applying this embodiment, the acquired image to be detected is input into the trained fully convolutional neural network to generate the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map of the image. The upper-vertex and lower-vertex targets in the image are determined from the two confidence maps respectively; these targets are then mapped into the association field map to compute, for each first vertex target, the association field value of its line to each second vertex target; finally, based on these association field values, the upper and lower vertices are matched and the line with the largest association field value is determined to be a specified target. With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, the connection between upper and lower vertices is established through mapping, and the successfully matched upper-lower vertex lines serve as the specified targets. Because a specified target is represented by a line, overlapping candidate boxes are excluded: even when specified targets are densely distributed, their upper and lower vertices can be accurately located by the fully convolutional neural network, so the lines between upper and lower vertices clearly distinguish the specified targets and improve detection accuracy. Moreover, since each detected specified target is the line between an upper-vertex target and a lower-vertex target, the line can finely and clearly reflect the posture information of the specified target (for example, leaning forward, leaning backward, or bending over), which benefits subsequent applications such as target behavior analysis. In this embodiment, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices of targets are then accurately located and matched, and the successfully matched vertex pairs serve as the specified-target detection results, giving better robustness and higher detection accuracy for specified targets. At the same time, detection does not require presetting anchor boxes of certain scales and aspect ratios as reference boxes, so the target detection performance of the algorithm does not depend on the choice of anchor boxes, and the scale and aspect-ratio problems of targets are solved adaptively.
Based on the embodiment shown in FIG. 1, as shown in FIG. 3, an embodiment of the present application further provides a target detection method, which may include the following steps:
S301: Acquire a preset training set sample image, together with the upper-edge center position, the lower-edge center position, and the line connecting the upper and lower edge center positions of each specified target in the preset training set sample image.
In this embodiment, before the fully convolutional neural network can be run, it must be constructed. Since the network parameters of the fully convolutional neural network are obtained through training, the training process can be understood as learning the upper-vertex and lower-vertex targets of the specified targets. Preset training set sample images need to be constructed for the features of various specified targets, each image corresponding to the upper-vertex and lower-vertex target features of different specified targets. The confidences of the upper-vertex and lower-vertex targets can be preset to obey a circular Gaussian distribution, so it is necessary to obtain the upper-edge center position of each specified target (for example, the head vertex position of a pedestrian target) and the lower-edge center position (for example, the center position between the feet of a pedestrian target), as well as the line connecting the two; these upper and lower edge center positions can be annotated.
S302: Generate a target upper-vertex confidence truth map and a target lower-vertex confidence truth map of the preset training set sample image according to a preset distribution law and the upper-edge and lower-edge center positions of each specified target.
Here, the preset distribution law is the probability distribution obeyed by the confidences of the upper-vertex and lower-vertex targets of the specified targets. In general, the confidences of the upper and lower vertex targets obey a circular Gaussian distribution, although this embodiment is not limited thereto. Suppose that in an annotated image the upper-edge center position of each specified target is P_up and the lower-edge center position is P_down, and that the confidences of the upper-vertex and lower-vertex targets obey a circular Gaussian distribution N. Then the target upper-vertex confidence truth map and the target lower-vertex confidence truth map of the preset training set sample image are obtained according to formulas (1) and (2).
$$D_{up}(p) = \max_{i \in \{1,\dots,n_{ped}\}} N\!\left(p;\, P_{up}^{i}, \sigma_{up}\right), \qquad D_{down}(p) = \max_{i \in \{1,\dots,n_{ped}\}} N\!\left(p;\, P_{down}^{i}, \sigma_{down}\right) \tag{1}$$

$$N\!\left(p;\, P, \sigma\right) = \exp\!\left(-\frac{\lVert p - P \rVert_2^2}{2\sigma^2}\right) \tag{2}$$
where p denotes the position coordinates of any pixel on a confidence truth map; up denotes the upper-vertex target of a specified target; D_up(p) denotes the confidence of the upper-vertex target at position p on the target upper-vertex confidence truth map; n_ped denotes the total number of specified targets in the training set sample image; P_up denotes the annotated coordinate position of the upper-vertex target of each specified target in the training set sample image; σ_up denotes the variance of the circular Gaussian distribution N obeyed by the upper-vertex target; down denotes the lower-vertex target of a specified target; D_down(p) denotes the confidence of the lower-vertex target at position p on the target lower-vertex confidence truth map; P_down denotes the annotated coordinate position of the lower-vertex target of each specified target in the training set sample image; and σ_down denotes the variance of the circular Gaussian distribution N obeyed by the lower-vertex target. Formula (2) is a standard Gaussian distribution, ensuring that the annotated upper-vertex and lower-vertex positions of each specified target have the highest confidence of 1.0, with confidence decreasing in a Gaussian fashion to 0 in all directions.
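A minimal numpy sketch of formulas (1) and (2): each annotated vertex position contributes a circular Gaussian peaking at 1.0, and overlapping targets are combined with a pixel-wise maximum so the peak stays exactly 1.0 (the combination rule across targets is an assumption here, as is the function name and grid convention).

```python
import numpy as np

def vertex_confidence_truth(shape, centers, sigma):
    """Ground-truth vertex confidence map per formulas (1)-(2).

    Each annotated vertex position (cy, cx) contributes a circular
    Gaussian exp(-||p - P||^2 / (2*sigma^2)) that peaks at 1.0; where
    several targets overlap, the pixel-wise maximum is kept so the
    annotated positions keep the highest confidence 1.0, decaying
    toward 0 in all directions.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    conf = np.zeros(shape)
    for cy, cx in centers:
        d2 = (ys - cy) ** 2 + (xs - cx) ** 2
        conf = np.maximum(conf, np.exp(-d2 / (2.0 * sigma ** 2)))
    return conf
```

The same routine serves for both the upper-vertex and lower-vertex truth maps; only the annotated centers and the variance change.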
S303: Generate a target upper-lower-vertex association field truth map of the preset training set sample image according to the line connecting the upper and lower edge center positions of each specified target.
In general, for a specified target, the association field on the line between the upper-vertex target and the lower-vertex target obeys a unit vector v whose magnitude equals 1 and whose direction is along the line; of course, this embodiment is not limited thereto. According to the line connecting the upper and lower edge center positions of each specified target, and formulas (3) and (4), the target upper-lower-vertex association field truth map of the preset training set sample image can be generated.
$$A(p) = \begin{cases} \vec{v}, & \text{if } p \text{ lies on the line from } P_{up}^{i} \text{ to } P_{down}^{i} \text{ for some } i \in \{1,\dots,n_{ped}\} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

$$\vec{v} = \frac{P_{down} - P_{up}}{\lVert P_{down} - P_{up} \rVert_2} \tag{4}$$
where p denotes the position coordinates of any pixel on the target upper-lower-vertex association field truth map; A(p) denotes the association field value at position p on that truth map; and n_ped denotes the total number of specified targets in the training set sample image. Formula (4) expresses that the association field on the line between the upper-vertex target and the lower-vertex target of a specified target is the unit vector v with magnitude equal to 1 along the direction of the line.
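Formulas (3) and (4) can be sketched in numpy as follows. A small distance tolerance `radius` is assumed to decide which pixels count as "lying on the line" (the exact line rasterization is not specified above), and the O(H·W) per-target scan is for clarity, not efficiency.

```python
import numpy as np

def association_field_truth(shape, pairs, radius=1.0):
    """Ground-truth upper/lower vertex association field per (3)-(4).

    For every target, pixels within `radius` of the segment from its
    upper vertex P_up to its lower vertex P_down receive the unit vector
    v = (P_down - P_up) / ||P_down - P_up||_2; all other pixels stay
    (0, 0). Coordinates are (y, x).
    """
    h, w = shape
    field = np.zeros((h, w, 2))
    for p_up, p_down in pairs:
        p_up = np.asarray(p_up, float)
        p_down = np.asarray(p_down, float)
        seg = p_down - p_up
        length = np.linalg.norm(seg)
        if length == 0:
            continue
        v = seg / length  # formula (4): unit vector along the line
        for y in range(h):
            for x in range(w):
                rel = np.array([y, x], float) - p_up
                t = np.clip(np.dot(rel, v), 0.0, length)
                dist = np.linalg.norm(rel - t * v)  # distance to segment
                if dist <= radius:
                    field[y, x] = v  # formula (3): v on the line, 0 off it
    return field
```

For a vertical head-to-feet line, every pixel on the segment carries the downward unit vector and everything else stays zero, matching the truth-map example of FIG. 4.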
The generation of the target upper-vertex confidence truth map, the target lower-vertex confidence truth map, and the target upper-lower-vertex association field truth map of a preset training set sample image is shown in FIG. 4, taking pedestrian detection as an example. In the target upper-vertex confidence truth map, each bright spot corresponds to the upper-vertex target of one specified target in the preset training set sample image; in the target lower-vertex confidence truth map, each bright spot corresponds to the lower-vertex target of one specified target; and in the target upper-lower-vertex association field truth map, each line is the connection between the upper-vertex target and the lower-vertex target of one specified target.
S304: Input the preset training set sample image into an initial fully convolutional neural network to obtain the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map of the preset training set sample image.
Here, the network parameters of the initial fully convolutional neural network are preset values. Through the initial fully convolutional neural network, the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map of the preset training set sample image can be obtained. The target upper-vertex confidence map is compared with the target upper-vertex confidence truth map, the target lower-vertex confidence map with the target lower-vertex confidence truth map, and the target upper-lower-vertex association field map with the target upper-lower-vertex association field truth map. Through continuous training, learning, and updating of the network parameters, the three outputs of the fully convolutional neural network are made to approach their respective truth maps; when they are sufficiently close, the fully convolutional neural network is determined to be the trained fully convolutional neural network usable for target detection.
Optionally, the fully convolutional neural network may include convolutional layers, downsampling layers, and a deconvolution layer.
A fully convolutional neural network usually includes at least one convolutional layer and at least one downsampling layer, while the deconvolution layer is optional. To make the resolution of the resulting feature maps the same as that of the input preset training set sample image, thereby omitting the step of converting by the image compression ratio and facilitating the confidence computation, a deconvolution layer can be placed after the last convolutional layer.
Optionally, the step of computing the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map of the preset training set sample image may be implemented by the following steps.
In the first step, the preset training set sample image is input into the initial fully convolutional neural network, and the features of the preset training set sample image are extracted through a network structure in which convolutional layers and downsampling layers are alternately arranged.
In the second step, the features are upsampled through the deconvolution layer to the same resolution as the preset training set sample image, yielding the upsampled result.
The preset training set sample image is input into the initial fully convolutional neural network; as shown in FIG. 5, a series of alternately arranged convolutional layers and downsampling layers successively extract features from low level to high level. A deconvolution layer is then connected to upsample the features to the size of the input preset training set sample image.
In the third step, a 1×1 convolutional layer is applied to the result of the second step to obtain the target upper-vertex confidence map, the target lower-vertex confidence map, and the target upper-lower-vertex association field map at the same resolution as the preset training set sample image.
为了保证目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图与预设训练集样本图像有同等分辨率,最后可以通过一卷积层对上采样后的结果进行运算,该卷积层的卷积核尺寸可以选择1×1、3×3或5×5等尺寸的卷积核,但是,为了精确提取一个像素点的特征,可以选定该卷积层的卷积核尺寸为1×1,则通过该卷积层的运算可得到目标上顶点置 信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图。In order to ensure that the vertex confidence distribution map on the target, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field map have the same resolution as the preset training set sample image, the result of the upsampling can be finally performed by a roll of layers. For calculation, the convolution kernel size of the convolution layer may be selected from a convolution kernel of size 1×1, 3×3, or 5×5, but in order to accurately extract features of one pixel, the convolution layer may be selected. When the convolution kernel size is 1×1, the convolutional layer on the target can be obtained by the operation of the convolution layer, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field map.
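A 1×1 convolution is simply a per-pixel linear projection across feature channels, so the output heads described above can be sketched with a single einsum. The channel counts below (one channel for each confidence map and two for the components of the association field) are illustrative assumptions, not values taken from this application:

```python
import numpy as np

def conv1x1(features, weights, bias):
    """Per-pixel linear projection: out[c,y,x] = sum_k W[c,k] * features[k,y,x] + b[c]."""
    return np.einsum('ck,kyx->cyx', weights, features) + bias[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 32, 32))   # upsampled feature maps, shape (C, H, W)

# hypothetical head layout: 1 channel per confidence map, 2 for the association field
w = rng.standard_normal((4, 64)) * 0.01
b = np.zeros(4)
out = conv1x1(feat, w, b)

upper_conf  = out[0]     # target upper vertex confidence distribution map
lower_conf  = out[1]     # target lower vertex confidence distribution map
assoc_field = out[2:4]   # 2-channel target upper and lower vertex association field map

print(upper_conf.shape, assoc_field.shape)  # (32, 32) (2, 32, 32)
```

Because the kernel covers exactly one pixel, the output at each location depends only on the feature vector at that location, which is why the 1×1 choice "extracts the feature of a single pixel precisely".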
S305. Compute the first average error between the target upper vertex confidence distribution map of the preset training set sample images and the target upper vertex confidence ground-truth map, the second average error between the target lower vertex confidence distribution map and the target lower vertex confidence ground-truth map, and the third average error between the target upper and lower vertex association field map and the target upper and lower vertex association field ground-truth map.
S306. If the first average error, the second average error, or the third average error is greater than a preset error threshold, update the network parameters according to the first average error, the second average error, the third average error, and a preset gradient computation strategy to obtain an updated fully convolutional neural network; compute the first, second, and third average errors produced by the updated network, and repeat until the first, second, and third average errors are all less than or equal to the preset error threshold, at which point the corresponding network is determined to be the trained fully convolutional neural network.
The fully convolutional neural network can be trained with the classical back-propagation algorithm. The preset gradient computation strategy may be ordinary (batch) gradient descent or stochastic gradient descent. Gradient descent searches along the negative gradient direction; the closer to the target value, the smaller the step and the slower the progress. Because stochastic gradient descent uses only one sample per update, each iteration is much faster than batch gradient descent; therefore, to improve computational efficiency, this embodiment may use stochastic gradient descent to update the network parameters. During training, compute the first average error between the target upper vertex confidence distribution map output by the network for the preset training set sample images and the target upper vertex confidence ground-truth map, the second average error between the target lower vertex confidence distribution map and the target lower vertex confidence ground-truth map, and the third average error between the target upper and lower vertex association field map and the target upper and lower vertex association field ground-truth map, as in equations (5) and (6); update the network parameters of the fully convolutional neural network with these average errors, and iterate the above process until the average error no longer decreases. The network parameters of the fully convolutional neural network include the convolution kernel parameters and bias parameters of the convolutional layers.
L_D(θ) = (1/N) Σ_{i=1…N} ‖F_D(X_i; θ) − D_i‖²,  L_A(θ) = (1/N) Σ_{i=1…N} ‖F_A(X_i; θ) − A_i‖²    (5)
L(θ) = L_D(θ) + λL_A(θ)    (6)
where L_D(θ) denotes the first average error or the second average error; θ denotes the network parameters of the fully convolutional neural network; N denotes the number of preset training set sample images; F_D(X_i; θ) denotes the target upper vertex confidence distribution map or the target lower vertex confidence distribution map output by the network; X_i denotes the input image with index i; i denotes the image index; D_i denotes the target upper vertex confidence ground-truth map or the target lower vertex confidence ground-truth map obtained via equations (1) and (2); L_A(θ) denotes the third average error; F_A(X_i; θ) denotes the target upper and lower vertex association field map output by the network; A_i denotes the target upper and lower vertex association field ground-truth map obtained via equations (3) and (4); and λ is a balance parameter between the two errors, typically set to 1.0.
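A minimal sketch of the loss in equations (5) and (6), assuming a mean squared error over N sample images as written above; the map shapes and the value λ = 1.0 follow the text, while all data here is illustrative:

```python
import numpy as np

def average_error(pred_maps, truth_maps):
    """L_D(θ) or L_A(θ): mean over N images of the squared L2 distance between maps."""
    n = len(pred_maps)
    return sum(np.sum((p - t) ** 2) for p, t in zip(pred_maps, truth_maps)) / n

rng = np.random.default_rng(1)
N, H, W = 4, 16, 16
pred_conf   = [rng.standard_normal((H, W)) for _ in range(N)]     # F_D(X_i; θ)
truth_conf  = [rng.standard_normal((H, W)) for _ in range(N)]     # D_i
pred_assoc  = [rng.standard_normal((2, H, W)) for _ in range(N)]  # F_A(X_i; θ)
truth_assoc = [rng.standard_normal((2, H, W)) for _ in range(N)]  # A_i

lam = 1.0                                    # balance parameter λ
L_D = average_error(pred_conf, truth_conf)   # confidence term of equation (5)
L_A = average_error(pred_assoc, truth_assoc) # association field term of equation (5)
L   = L_D + lam * L_A                        # equation (6)
```

The gradients of L with respect to θ drive the parameter updates of step S306.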
S307. Acquire the image to be detected captured by an image collector.
S308. Input the image to be detected into the trained fully convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image to be detected.
S309. For the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, determine at least one upper vertex target and at least one lower vertex target in the image to be detected using a preset target determination method.
S310. Map each upper vertex target and each lower vertex target into the target upper and lower vertex association field map, and, for a first vertex target, compute the association field value of the line connecting the first vertex target to each second vertex target.
S311. Based on the association field values of the lines connecting the first vertex target to the second vertex targets, match the upper and lower vertices and determine the line with the largest association field value as a specified target.
S307 to S311 are the same as the steps of the embodiment shown in FIG. 1, achieve the same or similar beneficial effects, and are not repeated here.
Applying this embodiment, the acquired image to be detected is input into the trained fully convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image. The upper vertex targets and lower vertex targets in the image to be detected are determined from the two confidence distribution maps respectively; the vertex targets are then mapped into the association field map to compute, for each first vertex target, the association field value of its line to each second vertex target; finally, based on these association field values, the upper and lower vertices are matched and the line with the largest association field value is determined to be a specified target. With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, the connection between the upper and lower vertex established by mapping, and the successfully matched pair of vertices taken as the specified target. Because a specified target is represented by a line, overlapping candidate boxes cannot occur: even when specified targets are densely distributed, their upper and lower vertices can be located accurately by the network, so the connecting lines clearly distinguish the individual specified targets, improving the accuracy of target detection. Moreover, since the detected specified target is the line connecting the upper vertex target and the lower vertex target, this line reflects the posture information of the specified target (for example, leaning forward, leaning back, or bending over) in a fine and clear way, which benefits subsequent applications such as target behavior analysis. In this embodiment, features with high discriminability are extracted layer by layer through convolution and mapping, the upper and lower vertices of targets are then accurately located and matched, and the successfully matched upper and lower vertices are taken as the specified-target detection results, which offers good robustness and high detection accuracy. At the same time, detection does not require anchor boxes of preset scales and aspect ratios as reference boxes, so the detection performance of the algorithm does not depend on anchor box selection, and the scale and aspect-ratio problem of targets is solved adaptively. During training of the fully convolutional neural network, preset training set sample images are set for the upper vertex targets and lower vertex targets of specified targets with different characteristics; through training and iteration on these sample images, the resulting fully convolutional neural network has strong generalization ability, avoids complicated classifier cascade schemes, and has a simpler structure.
The target detection method provided by the embodiments of the present application is described below with a specific application example of detecting pedestrian targets.
In a street scene, an image to be detected is captured by a monitoring device and input into the trained fully convolutional neural network to obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image. For the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, non-maximum suppression is applied to determine the position of the center point of each detected target; when the confidence of the pixels in the neighborhood of a center point is greater than a preset confidence threshold, a pedestrian head-top vertex target or a target at the center position between the pedestrian's feet is determined.
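The center-point search above can be sketched as a local-maximum filter over a confidence map: a pixel is kept when it is the maximum of its neighborhood and its confidence exceeds the threshold. The 3×3 neighborhood and the threshold value here are illustrative assumptions:

```python
import numpy as np

def nms_peaks(conf, threshold, radius=1):
    """Return (row, col) positions that are local maxima of their (2r+1)x(2r+1)
    neighborhood and whose confidence exceeds the threshold."""
    H, W = conf.shape
    peaks = []
    for y in range(H):
        for x in range(W):
            v = conf[y, x]
            if v <= threshold:
                continue
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            if v == conf[y0:y1, x0:x1].max():   # non-maximum suppression
                peaks.append((y, x))
    return peaks

conf = np.zeros((8, 8))
conf[2, 3] = 0.9   # one head-top vertex
conf[6, 5] = 0.8   # another head-top vertex
conf[6, 6] = 0.3   # suppressed: below threshold and below its local maximum

print(nms_peaks(conf, 0.5))  # [(2, 3), (6, 5)]
```

The same routine is run once on the upper vertex map and once on the lower vertex map to obtain the two candidate sets.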
Then the pedestrian head-top vertex targets and the targets at the center positions between pedestrians' feet are mapped into the target upper and lower vertex association field map obtained above, yielding the association degree value between each head-top vertex target and each feet-center target. From these association degree values, the mean association degree value between each head-top vertex target and each feet-center target can be obtained; through comparison and matching of these means, the detection result shown in FIG. 6 is determined, where each line is one pedestrian target.
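Pairing head-top targets with feet-center targets amounts to matching on a score matrix of mean association values. A minimal greedy sketch, which repeatedly takes the highest remaining score, is shown below; the preset bipartite graph matching method of the application may differ, and the score values are illustrative:

```python
def greedy_match(scores):
    """scores[i][j]: mean association value between upper vertex i and lower vertex j.
    Greedily pair the highest-scoring remaining (i, j) until one side is exhausted."""
    pairs, used_i, used_j = [], set(), set()
    candidates = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,
    )
    for s, i, j in candidates:
        if i not in used_i and j not in used_j:
            pairs.append((i, j, s))
            used_i.add(i)
            used_j.add(j)
    return pairs

# two head-top targets, two feet-center targets;
# head 0 belongs with feet 0 and head 1 with feet 1
scores = [[0.9, 0.2],
          [0.1, 0.8]]
print(greedy_match(scores))  # [(0, 0, 0.9), (1, 1, 0.8)]
```

Each returned pair corresponds to one connecting line, i.e. one pedestrian target in FIG. 6.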
Compared with the related art, this solution inputs the acquired image to be detected into the trained fully convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image; determines the upper vertex targets and lower vertex targets in the image from the two confidence distribution maps respectively; maps them into the association field map to compute, for each first vertex target, the association field value of its line to each second vertex target; and finally matches the upper and lower vertices based on these values, determining the line with the largest association field value as a specified target. With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, the connection between the upper and lower vertex established by mapping, and the successfully matched pair of vertices taken as the specified target. Because a specified target is represented by a line, overlapping candidate boxes cannot occur: even when specified targets are densely distributed, their upper and lower vertices can be located accurately by the network, so the connecting lines clearly distinguish the individual specified targets, improving the accuracy of target detection. Moreover, since the detected specified target is the line connecting the upper vertex target and the lower vertex target, this line reflects the posture information of the specified target (for example, leaning forward, leaning back, or bending over) in a fine and clear way, which benefits subsequent applications such as target behavior analysis.
Corresponding to the above method embodiments, an embodiment of the present application provides a target detection apparatus. As shown in FIG. 7, the target detection apparatus includes:
a first acquisition module 710, configured to acquire an image to be detected captured by an image collector;
a first generation module 720, configured to input the image to be detected into the trained fully convolutional neural network, and generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image to be detected;
a target determination module 730, configured to determine, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, at least one upper vertex target and at least one lower vertex target in the image to be detected using a preset target determination method;
a first calculation module 740, configured to map each upper vertex target and each lower vertex target into the target upper and lower vertex association field map, and compute, for a first vertex target, the association field value of the line connecting the first vertex target to each second vertex target, where if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target; and
a matching module 750, configured to match the upper and lower vertices based on the association field values of the lines connecting the first vertex target to the second vertex targets, and determine the line with the largest association field value as a specified target.
Applying this embodiment, the acquired image to be detected is input into the trained fully convolutional neural network to generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image. The upper vertex targets and lower vertex targets in the image are determined from the two confidence distribution maps respectively; the vertex targets are then mapped into the association field map to compute, for each first vertex target, the association field value of its line to each second vertex target; finally, based on these association field values, the upper and lower vertices are matched and the line with the largest association field value is determined to be a specified target. With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, the connection between the upper and lower vertex established by mapping, and the successfully matched pair of vertices taken as the specified target. Because a specified target is represented by a line, overlapping candidate boxes cannot occur: even when specified targets are densely distributed, their upper and lower vertices can be located accurately by the network, so the connecting lines clearly distinguish the individual specified targets, improving the accuracy of target detection. Moreover, since the detected specified target is the line connecting the upper vertex target and the lower vertex target, this line reflects the posture information of the specified target (for example, leaning forward, leaning back, or bending over) in a fine and clear way, which benefits subsequent applications such as target behavior analysis. In this embodiment, features with high discriminability are extracted layer by layer through convolution and mapping, the upper and lower vertices of targets are then accurately located and matched, and the successfully matched upper and lower vertices are taken as the specified-target detection results, which offers good robustness and high detection accuracy. At the same time, detection does not require anchor boxes of preset scales and aspect ratios as reference boxes, so the detection performance of the algorithm does not depend on anchor box selection, and the scale and aspect-ratio problem of targets is solved adaptively.
Optionally, the target determination module 730 may be specifically configured to:
apply a non-maximum suppression method to the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively to determine the position of the center point of at least one detected target;
obtain the confidence of all pixels in the neighborhood of the center point of each detected target; and
determine as an upper vertex target each detected target in the target upper vertex confidence distribution map whose neighborhood pixels all have confidence greater than a preset confidence threshold, and as a lower vertex target each detected target in the target lower vertex confidence distribution map whose neighborhood pixels all have confidence greater than the preset confidence threshold.
Optionally, the first calculation module 740 may be specifically configured to:
map each upper vertex target and each lower vertex target into the target upper and lower vertex association field map to obtain the association degree value between each upper vertex target and each lower vertex target;
for a first vertex target, draw the line connecting the first vertex target to each second vertex target; and
compute, from the association degree values between the first vertex target and each second vertex target, the mean of the association degree values as the association field value of the line connecting the first vertex target to that second vertex target.
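One way to realize the mean computation is to sample the association field at points along the segment between the two vertices and average the sampled values. The linear sampling scheme and nearest-pixel lookup below are illustrative assumptions:

```python
import numpy as np

def line_association_value(field, p_upper, p_lower, num_samples=10):
    """Average the association field over points sampled on the segment
    from p_upper (y, x) to p_lower (y, x), using nearest-pixel lookup."""
    (y0, x0), (y1, x1) = p_upper, p_lower
    vals = []
    for t in np.linspace(0.0, 1.0, num_samples):
        y = int(round(y0 + t * (y1 - y0)))
        x = int(round(x0 + t * (x1 - x0)))
        vals.append(field[y, x])
    return float(np.mean(vals))

# toy association field: high values only on the column joining (1, 4) and (7, 4)
field = np.zeros((9, 9))
field[1:8, 4] = 1.0

print(line_association_value(field, (1, 4), (7, 4)))  # 1.0 for the true pair
print(line_association_value(field, (1, 4), (7, 0)))  # low for a wrong pair
```

A line connecting a true upper/lower vertex pair runs through the high-valued region of the field, so its mean is large; a line between mismatched vertices mostly crosses low-valued pixels.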
Optionally, the matching module 750 may be specifically configured to:
based on the association field values of the lines connecting the first vertex target to the second vertex targets, select the largest association field value from the association field values using a preset bipartite graph matching method; and
determine the line corresponding to the largest association field value as a specified target.
Optionally, the matching module 750 may be further configured to:
obtain a preset association field threshold;
determine whether the largest association field value is greater than the preset association field threshold; and
if it is greater, perform the step of determining the line corresponding to the largest association field value as a specified target.
It should be noted that the target detection apparatus of this embodiment of the present application is an apparatus applying the embodiment of the target detection method shown in FIG. 1; all embodiments of the above target detection method are applicable to this apparatus and achieve the same or similar beneficial effects, which are not repeated here.
Based on the embodiment shown in FIG. 7, an embodiment of the present application further provides a target detection apparatus. As shown in FIG. 8, the target detection apparatus may include:
a first acquisition module 810, configured to acquire an image to be detected captured by an image collector;
a second acquisition module 820, configured to acquire preset training set sample images, together with the upper edge center position, the lower edge center position, and the line connecting the upper and lower edge center positions of each specified target in the preset training set sample images;
a second generation module 830, configured to generate the target upper vertex confidence ground-truth map and the target lower vertex confidence ground-truth map of the preset training set sample images according to a preset distribution law and the upper edge center position and lower edge center position of each specified target;
a third generation module 840, configured to generate the target upper and lower vertex association field ground-truth map of the preset training set sample images according to the lines connecting the upper and lower edge center positions of the specified targets;
an extraction module 850, configured to input the preset training set sample images into an initial fully convolutional neural network to obtain the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the sample images, where the network parameters of the initial fully convolutional neural network are preset values;
a second calculation module 860, configured to compute the first average error between the target upper vertex confidence distribution map of the preset training set sample images and the target upper vertex confidence ground-truth map, the second average error between the target lower vertex confidence distribution map of the sample images and the target lower vertex confidence ground-truth map, and the third average error between the target upper and lower vertex association field map of the sample images and the target upper and lower vertex association field ground-truth map;
a loop module 870, configured to: if the first average error, the second average error, or the third average error is greater than a preset error threshold, update the network parameters according to the first average error, the second average error, the third average error, and a preset gradient computation strategy to obtain an updated fully convolutional neural network; compute the first, second, and third average errors produced by the updated network; and repeat until the first, second, and third average errors are all less than or equal to the preset error threshold, determining the corresponding network to be the trained fully convolutional neural network;
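The stopping rule of the loop module can be sketched as a gradient-descent loop on a toy least-squares problem that stops once the average error falls to the threshold. The linear model, learning rate, and threshold below are all illustrative assumptions, not the network of this application:

```python
import numpy as np

def train_until_threshold(X, Y, theta, lr=0.1, err_threshold=1e-3, max_iters=10_000):
    """Toy stand-in for the loop module: linear model Y ≈ X @ theta trained by
    gradient descent, iterating until the average error drops to the threshold."""
    n = len(X)
    for _ in range(max_iters):
        residual = X @ theta - Y
        avg_error = np.mean(residual ** 2)   # stand-in for the average errors
        if avg_error <= err_threshold:
            break                            # errors within threshold: training done
        grad = 2.0 / n * X.T @ residual      # preset gradient computation strategy
        theta = theta - lr * grad            # update the network parameters
    return theta, avg_error

rng = np.random.default_rng(2)
X = rng.standard_normal((32, 3))
true_theta = np.array([1.0, -2.0, 0.5])
Y = X @ true_theta
theta, err = train_until_threshold(X, Y, theta=np.zeros(3))
print(err <= 1e-3)  # True
```

In the apparatus, the error terms are the three average errors of module 860 and the parameters are the convolution kernels and biases, but the check-update-repeat structure is the same.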
a first generation module 880, configured to input the image to be detected into the trained fully convolutional neural network, and generate the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex association field map of the image to be detected;
a target determination module 890, configured to determine, for the target upper vertex confidence distribution map and the target lower vertex confidence distribution map respectively, at least one upper vertex target and at least one lower vertex target in the image to be detected using a preset target determination method;
a first calculation module 8100, configured to map each upper vertex target and each lower vertex target into the target upper and lower vertex association field map, and compute, for a first vertex target, the association field value of the line connecting the first vertex target to each second vertex target, where if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target; and
a matching module 8110, configured to match the upper and lower vertices based on the association field values of the lines connecting the first vertex target to the second vertex targets, and determine the line with the largest association field value as a specified target.
应用本实施例,通过将获取的待检测图像输入经训练得到的全卷积神经网络,生成待检测图像的目标上顶点置信度分布图、目标下顶点置信度分布图和目标上下顶点关联场图,分别根据目标上顶点置信度分布图和目标下顶点置信度分布图,确定待检测图像中的上顶点目标和下顶点目标,再通过将上顶点目标和下顶点目标映射至目标上下顶点关联场图,计算得到针对第一顶点目标、与各第二顶点目标连线间的关联场值,最后,基于各关联场值,通过对上下顶点进行匹配,确定关联场值最大的连线为指定目标。采用经训练得到的全卷积神经网络,能够提取到指定目标的上顶点和下顶点,并且通过映射建立上顶点与下顶点的连接,再通过匹配,将匹配成功的上下顶点连线作为指定目标,指定目标用连线表示,排除了候选框出现重叠的情况发生,即使指定目标分布密集,由于指定目标的上下顶点可以通过全卷积神经网络准确定位,则可以用上下顶点的连线清晰区分各指定目标,提高了目标检测的准确度。并且,由于检测的指定目标为上顶点目标与下顶点目标的连线,通过该连线可以精细明了的反映指定目标的姿态信息(例如,前倾、后仰、 俯身等),有利于后续关于目标行为分析等应用。通过本实施例,通过卷积和映射逐层提取具有高区分度的特征,然后对目标上下顶点准确定位和匹配,将匹配成功的上下顶点作为指定目标检测结果,具有鲁棒性较佳、指定目标检测准确率较高的优点,同时,检测中不需要预先设定一定尺度和高宽比例的锚点框作为基准框,因而算法目标检测的性能不依赖于锚点框的选择,自适应地解决了目标的尺度和高宽比问题。在全卷积神经网络的训练过程中,针对具有不同特征的指定目标的上顶点目标和下顶点目标,设定了预设训练集样本图像,通过对预设训练集样本图像的训练、迭代,得到的全卷积神经网络具有较强的泛化能力,避免了复杂的分类器级联模式,结构更为简单。Applying the embodiment, the acquired image to be detected is input into the trained full convolutional neural network, and the target upper vertex confidence distribution map, the target lower vertex confidence distribution map, and the target upper and lower vertex correlation field map are generated. Determining the upper vertex target and the lower vertex target in the image to be detected according to the vertex confidence distribution map and the target lower vertex confidence distribution map, respectively, and mapping the upper vertex target and the lower vertex target to the target upper and lower vertex associated fields by the upper vertex target and the lower vertex target respectively The figure calculates the associated field value between the first vertex target and the second vertex target connection. Finally, based on the associated field values, the upper and lower vertices are matched to determine the connection with the largest associated field value as the specified target. . 
With the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted, connections between upper and lower vertices are established through the mapping, and a successfully matched upper-lower vertex connection is taken as a specified target. Because each specified target is represented by a line segment rather than a bounding box, overlapping candidate boxes are avoided. Even when specified targets are densely distributed, their upper and lower vertices can be accurately located by the network, so the targets can be clearly distinguished by their vertex connections, improving the accuracy of target detection. Moreover, since a detected target is the connection between an upper vertex target and a lower vertex target, that connection directly reflects the posture of the target (for example, leaning forward, leaning back, or bending over), which benefits subsequent applications such as target behavior analysis. In this embodiment, highly discriminative features are extracted layer by layer through convolution and mapping, the upper and lower vertices are accurately located and matched, and the successfully matched vertex pairs are output as detection results, yielding good robustness and high detection accuracy. In addition, no anchor boxes of preset scales and aspect ratios are required as reference boxes, so detection performance does not depend on anchor-box selection, and the scale and aspect-ratio problems of targets are handled adaptively.
During training of the fully convolutional neural network, preset training-set sample images are provided for the upper vertex targets and lower vertex targets of specified targets with different characteristics. Through training and iteration on these sample images, the resulting network has strong generalization ability, avoids complicated classifier cascades, and has a simpler structure.
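The training loop summarized above (three average errors compared against a preset error threshold, with a gradient update until all three fall at or below it) can be sketched with a toy model. The scalar "network parameters", the mean-absolute-error measure, and the MSE gradient step are assumptions for illustration; the actual embodiment updates the parameters of a fully convolutional network with a preset gradient computation strategy.

```python
import numpy as np

def average_errors(preds, truths):
    """First/second/third average errors between the predicted maps and
    the ground-truth maps (mean absolute error per map)."""
    return [float(np.mean(np.abs(p - t))) for p, t in zip(preds, truths)]

def train_until_converged(params, truths, lr=0.5, err_threshold=1e-3,
                          max_iters=200):
    """Toy stand-in for the described loop: predict three constant maps
    from three scalar parameters, compare against the ground-truth maps,
    and apply a gradient step until all three average errors fall to the
    threshold or below."""
    params = list(params)
    for _ in range(max_iters):
        preds = [np.full_like(t, p) for p, t in zip(params, truths)]
        errs = average_errors(preds, truths)
        if all(e <= err_threshold for e in errs):
            break  # network parameters accepted as trained
        # "preset gradient strategy": MSE gradient w.r.t. each scalar
        params = [p - lr * float(np.mean(p - t))
                  for p, t in zip(params, truths)]
    return params, errs
```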
Optionally, the fully convolutional neural network includes a convolution layer, a downsampling layer, and a deconvolution layer.
The extraction module 850 may specifically be configured to:
input the preset training-set sample image into the initial fully convolutional neural network, and extract features of the sample image through a network structure of alternating convolution and downsampling layers;
upsample the features through the deconvolution layer to the same resolution as the preset training-set sample image, obtaining an upsampled result; and
apply a 1×1 convolution layer to the result to obtain a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map at the same resolution as the preset training-set sample image.
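The resolution bookkeeping implied by this structure can be checked with a small sketch. The number of conv/downsampling blocks (three) and the output channel layout (one upper-vertex confidence channel, one lower-vertex confidence channel, and two association field channels) are assumptions for illustration; the sketch only shows that stride-2 downsampling followed by matching stride-2 deconvolution returns maps at the input resolution (exact when the dimensions are divisible by 2 to the power of n_blocks).

```python
def output_shapes(h, w, n_blocks=3):
    """Track the feature-map resolution through the described pipeline:
    alternating conv + downsampling halves the resolution n_blocks times,
    the deconvolution layers double it back, and a final 1x1 convolution
    keeps the resolution while mapping the channels to the three outputs
    (upper-vertex confidence, lower-vertex confidence, 2-channel field)."""
    fh, fw = h, w
    for _ in range(n_blocks):      # conv keeps size; downsampling halves it
        fh, fw = fh // 2, fw // 2
    for _ in range(n_blocks):      # stride-2 deconvolution doubles it back
        fh, fw = fh * 2, fw * 2
    # 1x1 convolution: resolution unchanged, channel count -> 1 + 1 + 2
    return (fh, fw), 4
```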
It should be noted that the target detection apparatus of this embodiment of the present application applies the target detection method embodiment shown in FIG. 3; all embodiments of the target detection method are therefore applicable to the apparatus, and all achieve the same or similar beneficial effects.
In addition, corresponding to the target detection method provided by the foregoing embodiments, an embodiment of the present application provides a storage medium for storing executable code which, when executed, performs all the steps of the target detection method provided by the embodiments of the present application.
In this embodiment, the storage medium stores executable code that performs the target detection method at runtime, and therefore achieves the following: with the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted; connections between the upper and lower vertices are established through mapping; a successfully matched upper-lower vertex connection is taken as a specified target represented by a line segment, which avoids overlapping candidate boxes. Even when specified targets are densely distributed, their upper and lower vertices can be accurately located by the network, so the targets can be clearly distinguished by their vertex connections, improving the accuracy of target detection.
In addition, corresponding to the target detection method provided by the foregoing embodiments, an embodiment of the present application provides an application program which, when executed, performs all the steps of the target detection method provided by the embodiments of the present application.
In this embodiment, the application program performs the target detection method at runtime and therefore achieves the same effects: the vertices of specified targets are extracted and matched with the trained fully convolutional neural network, targets are represented by vertex connections rather than candidate boxes so that overlapping boxes are avoided, and even densely distributed targets can be clearly distinguished, improving the accuracy of target detection.
In addition, an embodiment of the present application further provides a computer device, as shown in FIG. 9, including an image collector 901, a processor 902, and a storage medium 903, wherein:
the image collector 901 is configured to collect an image to be detected;
the storage medium 903 is configured to store executable code; and
the processor 902 is configured to implement all the steps of the target detection method provided by the embodiments of the present application when executing the executable code stored on the storage medium 903.
Data may be transmitted among the image collector 901, the processor 902, and the storage medium 903 through a wired or wireless connection, and the computer device may communicate with other devices through a wired or wireless communication interface.
The storage medium may include RAM (Random Access Memory) or NVM (Non-volatile Memory), for example at least one disk memory. Optionally, the storage medium may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The image collector may be a camera configured to capture video or pictures of a monitored area.
In this embodiment, by reading and running the executable code stored in the storage medium, the processor of the computer device achieves the following: with the trained fully convolutional neural network, the upper and lower vertices of a specified target can be extracted; connections between the upper and lower vertices are established through mapping; a successfully matched upper-lower vertex connection is taken as a specified target represented by a line segment, which avoids overlapping candidate boxes. Even when specified targets are densely distributed, their upper and lower vertices can be accurately located by the network, so the targets can be clearly distinguished by their vertex connections, improving the accuracy of target detection.
For the storage medium, application program, and computer device embodiments, since the methods involved are substantially similar to the foregoing method embodiments, the description is relatively brief; for relevant details, refer to the description of the method embodiments.
It should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises that element.
The embodiments in this specification are described in a related manner; for identical or similar parts, reference may be made between the embodiments, and each embodiment focuses on its differences from the others. In particular, for the apparatus, storage medium, application program, and computer device embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; for relevant details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (17)

1. A target detection method, wherein the method comprises:
    acquiring an image to be detected collected by an image collector;
    inputting the image to be detected into a trained fully convolutional neural network to generate a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map of the image to be detected;
    determining at least one upper vertex target and at least one lower vertex target in the image to be detected by applying a preset target determination method to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map, respectively;
    mapping each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map and, for a first vertex target, calculating the association field value of the connection between the first vertex target and each second vertex target, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target; and
    based on the association field values of the connections between the first vertex target and each second vertex target, matching the upper and lower vertices and determining the connection with the largest association field value as a specified target.
2. The method according to claim 1, wherein training the fully convolutional neural network comprises:
    acquiring preset training-set sample images and, for each specified target in the preset training-set sample images, an upper-edge center position, a lower-edge center position, and a line connecting the upper and lower edge center positions;
    generating a target upper-vertex confidence ground-truth map and a target lower-vertex confidence ground-truth map of the preset training-set sample images according to a preset distribution law and the upper-edge center position and lower-edge center position of each specified target;
    generating a target upper-lower-vertex association field ground-truth map of the preset training-set sample images according to the lines connecting the upper and lower edge center positions of each specified target;
    inputting the preset training-set sample images into an initial fully convolutional neural network to obtain a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map of the preset training-set sample images, wherein the network parameters of the initial fully convolutional neural network are preset values;
    calculating a first average error between the target upper-vertex confidence distribution map and the target upper-vertex confidence ground-truth map, a second average error between the target lower-vertex confidence distribution map and the target lower-vertex confidence ground-truth map, and a third average error between the target upper-lower-vertex association field map and the target upper-lower-vertex association field ground-truth map; and
    if the first average error, the second average error, or the third average error is greater than a preset error threshold, updating the network parameters according to the first average error, the second average error, the third average error, and a preset gradient computation strategy to obtain an updated fully convolutional neural network, and recalculating the first, second, and third average errors with the updated fully convolutional neural network, until the first, second, and third average errors are all less than or equal to the preset error threshold, whereupon the corresponding fully convolutional neural network is determined to be the trained fully convolutional neural network.
3. The method according to claim 2, wherein the fully convolutional neural network comprises a convolution layer, a downsampling layer, and a deconvolution layer; and
    inputting the preset training-set sample images into the initial fully convolutional neural network to obtain the target upper-vertex confidence distribution map, the target lower-vertex confidence distribution map, and the target upper-lower-vertex association field map of the preset training-set sample images comprises:
    inputting the preset training-set sample images into the initial fully convolutional neural network, and extracting features of the sample images through a network structure of alternating convolution and downsampling layers;
    upsampling the features through the deconvolution layer to the same resolution as the preset training-set sample images to obtain an upsampled result; and
    applying a 1×1 convolution layer to the result to obtain the target upper-vertex confidence distribution map, the target lower-vertex confidence distribution map, and the target upper-lower-vertex association field map at the same resolution as the preset training-set sample images.
4. The method according to claim 1, wherein determining at least one upper vertex target and at least one lower vertex target in the image to be detected by applying the preset target determination method to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map respectively comprises:
    determining the position of the center point of at least one detection target by applying non-maximum suppression to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map, respectively;
    acquiring the confidence of all pixels in a neighborhood of the center point of each detection target; and
    determining a detection target in the target upper-vertex confidence distribution map in which every pixel has a confidence greater than a preset confidence threshold as an upper vertex target, and a detection target in the target lower-vertex confidence distribution map in which every pixel has a confidence greater than the preset confidence threshold as a lower vertex target.
5. The method according to claim 1, wherein mapping each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map and, for a first vertex target, calculating the association field value of the connection between the first vertex target and each second vertex target comprises:
    mapping each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map to obtain association degree values between each upper vertex target and each lower vertex target;
    for the first vertex target, connecting the first vertex target with each second vertex target; and
    calculating, according to the association degree values of the first vertex target and each second vertex target, the mean of those association degree values as the association field value of the connection between the first vertex target and each second vertex target.
6. The method according to claim 1, wherein matching the upper and lower vertices based on the association field values of the connections between the first vertex target and each second vertex target and determining the connection with the largest association field value as the specified target comprises:
    selecting the largest association field value from the association field values using a preset bipartite graph matching method, based on the association field values of the connections between the first vertex target and each second vertex target; and
    determining the connection corresponding to the largest association field value as the specified target.
7. The method according to claim 6, wherein after selecting the largest association field value from the association field values using the preset bipartite graph matching method, the method further comprises:
    acquiring a preset association field threshold;
    judging whether the largest association field value is greater than the preset association field threshold; and
    if it is greater, performing the step of determining the connection corresponding to the largest association field value as the specified target.
8. A target detection apparatus, wherein the apparatus comprises:
    a first acquisition module, configured to acquire an image to be detected collected by an image collector;
    a first generation module, configured to input the image to be detected into a trained fully convolutional neural network to generate a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map of the image to be detected;
    a target determination module, configured to determine at least one upper vertex target and at least one lower vertex target in the image to be detected by applying a preset target determination method to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map, respectively;
    a first calculation module, configured to map each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map and, for a first vertex target, calculate the association field value of the connection between the first vertex target and each second vertex target, wherein if the first vertex target is any upper vertex target, the second vertex target is any lower vertex target, and if the first vertex target is any lower vertex target, the second vertex target is any upper vertex target; and
    a matching module, configured to match the upper and lower vertices based on the association field values of the connections between the first vertex target and each second vertex target, and determine the connection with the largest association field value as a specified target.
9. The apparatus according to claim 8, wherein the apparatus further comprises:
    a second acquisition module, configured to acquire preset training-set sample images and, for each specified target in the preset training-set sample images, an upper-edge center position, a lower-edge center position, and a line connecting the upper and lower edge center positions;
    a second generation module, configured to generate a target upper-vertex confidence ground-truth map and a target lower-vertex confidence ground-truth map of the preset training-set sample images according to a preset distribution law and the upper-edge center position and lower-edge center position of each specified target;
    a third generation module, configured to generate a target upper-lower-vertex association field ground-truth map of the preset training-set sample images according to the lines connecting the upper and lower edge center positions of each specified target;
    an extraction module, configured to input the preset training-set sample images into an initial fully convolutional neural network to obtain a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map of the preset training-set sample images, wherein the network parameters of the initial fully convolutional neural network are preset values;
    a second calculation module, configured to calculate a first average error between the target upper-vertex confidence distribution map and the target upper-vertex confidence ground-truth map, a second average error between the target lower-vertex confidence distribution map and the target lower-vertex confidence ground-truth map, and a third average error between the target upper-lower-vertex association field map and the target upper-lower-vertex association field ground-truth map; and
    a loop module, configured to: if the first average error, the second average error, or the third average error is greater than a preset error threshold, update the network parameters according to the first average error, the second average error, the third average error, and a preset gradient computation strategy to obtain an updated fully convolutional neural network, and recalculate the first, second, and third average errors with the updated fully convolutional neural network, until the first, second, and third average errors are all less than or equal to the preset error threshold, whereupon the corresponding fully convolutional neural network is determined to be the trained fully convolutional neural network.
10. The apparatus according to claim 9, wherein the fully convolutional neural network comprises a convolution layer, a downsampling layer, and a deconvolution layer; and
    the extraction module is specifically configured to:
    input the preset training-set sample images into the initial fully convolutional neural network, and extract features of the sample images through a network structure of alternating convolution and downsampling layers;
    upsample the features through the deconvolution layer to the same resolution as the preset training-set sample images to obtain an upsampled result; and
    apply a 1×1 convolution layer to the result to obtain a target upper-vertex confidence distribution map, a target lower-vertex confidence distribution map, and a target upper-lower-vertex association field map at the same resolution as the preset training-set sample images.
11. The apparatus according to claim 8, wherein the target determination module is specifically configured to:
    determine the position of the center point of at least one detection target by applying non-maximum suppression to the target upper-vertex confidence distribution map and the target lower-vertex confidence distribution map, respectively;
    acquire the confidence of all pixels in a neighborhood of the center point of each detection target; and
    determine a detection target in the target upper-vertex confidence distribution map in which every pixel has a confidence greater than a preset confidence threshold as an upper vertex target, and a detection target in the target lower-vertex confidence distribution map in which every pixel has a confidence greater than the preset confidence threshold as a lower vertex target.
12. The apparatus according to claim 8, wherein the first calculation module is specifically configured to:
    map each upper vertex target and each lower vertex target onto the target upper-lower-vertex association field map to obtain association degree values between each upper vertex target and each lower vertex target;
    for the first vertex target, connect the first vertex target with each second vertex target; and
    calculate, according to the association degree values of the first vertex target and each second vertex target, the mean of those association degree values as the association field value of the connection between the first vertex target and each second vertex target.
  13. The apparatus according to claim 8, wherein the matching module is specifically configured to:
    select, based on the association field values of the connecting lines between the first vertex target and the second vertex targets, the largest association field value from among them using a preset bipartite graph matching method; and
    determine the connecting line corresponding to the largest association field value to be a specified target.
  14. The apparatus according to claim 13, wherein the matching module is further configured to:
    obtain a preset association field threshold;
    judge whether the largest association field value is greater than the preset association field threshold; and
    if it is, perform the step of determining the connecting line corresponding to the largest association field value to be the specified target.
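Claims 13-14 together pick, via bipartite matching, the pairings with the largest association field values and keep a pairing only if its value exceeds a preset threshold. A hedged sketch using the Hungarian algorithm as one possible "preset bipartite graph matching method" (the patent does not specify which; names and the threshold value are assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_vertices(assoc, field_thresh=0.3):
    """Pair upper-vertex targets (rows) with lower-vertex targets
    (columns) so that the total association field value is maximized,
    then discard pairs whose value does not exceed the preset threshold."""
    rows, cols = linear_sum_assignment(assoc, maximize=True)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if assoc[r, c] > field_thresh]
```

Each surviving (upper, lower) pair corresponds to one specified target, i.e. one detected object spanning an upper and a lower vertex.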
  15. A storage medium storing executable code, wherein the executable code, when run, performs the target detection method according to any one of claims 1-7.
  16. An application program which, when run, performs the target detection method according to any one of claims 1-7.
  17. A computer device, comprising an image collector, a processor, and a storage medium, wherein:
    the image collector is configured to capture an image to be detected;
    the storage medium is configured to store executable code; and
    the processor, when executing the executable code stored on the storage medium, implements the target detection method according to any one of claims 1-7.
PCT/CN2018/110394 2017-10-23 2018-10-16 Target detection method and apparatus, and computer device WO2019080743A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18871198.0A EP3702957B1 (en) 2017-10-23 2018-10-16 Target detection method and apparatus, and computer device
US16/758,443 US11288548B2 (en) 2017-10-23 2018-10-16 Target detection method and apparatus, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711004621.7A CN109697441B (en) 2017-10-23 2017-10-23 Target detection method and device and computer equipment
CN201711004621.7 2017-10-23

Publications (1)

Publication Number Publication Date
WO2019080743A1 (en) 2019-05-02

Family

ID=66229354

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/110394 WO2019080743A1 (en) 2017-10-23 2018-10-16 Target detection method and apparatus, and computer device

Country Status (4)

Country Link
US (1) US11288548B2 (en)
EP (1) EP3702957B1 (en)
CN (1) CN109697441B (en)
WO (1) WO2019080743A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035260A * 2018-07-27 2018-12-18 京东方科技集团股份有限公司 Sky region segmentation method and device, and convolutional neural network
KR20200132569A (en) * 2019-05-17 2020-11-25 삼성전자주식회사 Device for automatically photographing a photo or a video with respect to a specific moment and method for operating the same
CN111652251B (en) * 2020-06-09 2023-06-27 星际空间(天津)科技发展有限公司 Remote sensing image building feature extraction model construction method, device and storage medium
CN111652250B (en) * 2020-06-09 2023-05-26 星际空间(天津)科技发展有限公司 Remote sensing image building extraction method and device based on polygons and storage medium
CN112364734B (en) * 2020-10-30 2023-02-21 福州大学 Abnormal dressing detection method based on yolov4 and CenterNet
CN112435295A (en) * 2020-11-12 2021-03-02 浙江大华技术股份有限公司 Blackbody position detection method, electronic device and computer-readable storage medium
CN112380973B (en) * 2020-11-12 2023-06-23 深兰科技(上海)有限公司 Traffic signal lamp identification method and system
CN113538490B (en) * 2021-07-20 2022-10-28 刘斌 Video stream processing method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
US8131011B2 (en) * 2006-09-25 2012-03-06 University Of Southern California Human detection and tracking system
CN102880877A (en) * 2012-09-28 2013-01-16 中科院成都信息技术有限公司 Target identification method based on contour features
US20140270367A1 (en) * 2013-03-14 2014-09-18 Nec Laboratories America, Inc. Selective Max-Pooling For Object Detection
CN106485230A * 2016-10-18 2017-03-08 中国科学院重庆绿色智能技术研究院 Neural-network-based face detection model training, face detection method, and system
CN106570453A (en) * 2015-10-09 2017-04-19 北京市商汤科技开发有限公司 Pedestrian detection method, device and system
CN106651955A (en) * 2016-10-10 2017-05-10 北京小米移动软件有限公司 Method and device for positioning object in picture

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US7450735B1 (en) * 2003-10-16 2008-11-11 University Of Central Florida Research Foundation, Inc. Tracking across multiple cameras with disjoint views
US9691163B2 (en) * 2013-01-07 2017-06-27 Wexenergy Innovations Llc System and method of measuring distances related to an object utilizing ancillary objects
US9280833B2 (en) * 2013-03-05 2016-03-08 International Business Machines Corporation Topology determination for non-overlapping camera network
JP6337811B2 (en) * 2015-03-17 2018-06-06 トヨタ自動車株式会社 Image processing apparatus and image processing method
US10192129B2 (en) * 2015-11-18 2019-01-29 Adobe Systems Incorporated Utilizing interactive deep learning to select objects in digital visual media
GB2553005B (en) * 2016-08-19 2022-04-13 Apical Ltd Method of line detection
CN106845374B (en) * 2017-01-06 2020-03-27 清华大学 Pedestrian detection method and detection device based on deep learning
CN107066990B (en) * 2017-05-04 2019-10-11 厦门美图之家科技有限公司 A kind of method for tracking target and mobile device


Non-Patent Citations (1)

Title
See also references of EP3702957A4

Also Published As

Publication number Publication date
CN109697441B (en) 2021-02-12
CN109697441A (en) 2019-04-30
EP3702957A1 (en) 2020-09-02
EP3702957A4 (en) 2020-12-30
US20200250487A1 (en) 2020-08-06
US11288548B2 (en) 2022-03-29
EP3702957B1 (en) 2023-07-19

Similar Documents

Publication Publication Date Title
WO2019080743A1 (en) Target detection method and apparatus, and computer device
WO2022002039A1 (en) Visual positioning method and device based on visual map
US10198823B1 (en) Segmentation of object image data from background image data
JP6942488B2 (en) Image processing equipment, image processing system, image processing method, and program
CN109815770B (en) Two-dimensional code detection method, device and system
CN107256377B (en) Method, device and system for detecting object in video
WO2020134528A1 (en) Target detection method and related product
US9928405B2 (en) System and method for detecting and tracking facial features in images
US9436999B2 (en) Automatic image orientation and straightening through image analysis
US9576367B2 (en) Object detection method and device
CN108986152B (en) Foreign matter detection method and device based on difference image
WO2018082308A1 (en) Image processing method and terminal
CN112084869A (en) Compact quadrilateral representation-based building target detection method
US11176425B2 (en) Joint detection and description systems and methods
CN111369495B (en) Panoramic image change detection method based on video
CN111160291B (en) Human eye detection method based on depth information and CNN
CN108875504B (en) Image detection method and image detection device based on neural network
WO2022002262A1 (en) Character sequence recognition method and apparatus based on computer vision, and device and medium
JP2020149641A (en) Object tracking device and object tracking method
JP2009064434A (en) Determination method, determination system and computer readable medium
KR20190080388A (en) Photo Horizon Correction Method based on convolutional neural network and residual network structure
CN109961103B (en) Training method of feature extraction model, and image feature extraction method and device
CN111177811A (en) Automatic fire point location layout method applied to cloud platform
CN105894505A (en) Quick pedestrian positioning method based on multi-camera geometrical constraint
WO2022174603A1 (en) Pose prediction method, pose prediction apparatus, and robot

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18871198

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018871198

Country of ref document: EP

Effective date: 20200525