WO2015078185A1 - Convolutional neural network and target object detection method based on same - Google Patents

Convolutional neural network and target object detection method based on same

Info

Publication number
WO2015078185A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
detection area
sub
parts
map
Prior art date
Application number
PCT/CN2014/081676
Other languages
French (fr)
Chinese (zh)
Inventor
欧阳万里
许春景
刘健庄
王晓刚
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2015078185A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7747Organisation of the process, e.g. bagging or boosting

Definitions

  • the present invention relates to data communication technologies, and more particularly to a convolutional neural network and a target object detection method based on a convolutional neural network.
  • Object detection is one of the basic problems in machine vision. After detecting an object, it is convenient to store, analyze, 3D model, identify, track and search the object.
  • a common object detection task is pedestrian detection;
  • the purpose of pedestrian detection is to find the position and the area occupied by pedestrians in an image.
  • the main difficulties in pedestrian detection are variations in clothing, lighting, background, body deformation and occlusion.
  • in pedestrian detection, it is first necessary to extract features that distinguish pedestrians from non-pedestrians.
  • commonly used features are Haar-like features and the Histogram of Oriented Gradients (HOG).
  • since the movement of the pedestrian's body (such as the head, torso and legs) deforms the pedestrian's visual information, deformable models have been proposed to deal with the deformation caused by pedestrian movement.
  • to cope with the loss of visual information caused by occlusion, many occlusion-handling methods locate the occluded parts of the pedestrian in the picture so as to avoid using the occluded image information when judging whether a pedestrian is present in a given rectangular frame.
  • the classifier is used to determine if a pedestrian is present in a given rectangle.
  • the pedestrian detection method of prior art 1 mainly includes the following steps: 1. In the first stage, an input image is convolved, and the convolution result is downsampled to obtain the output of the first stage; 2. Convolution and downsampling are continued on the output of the first stage to obtain the output of the upper row in the second stage; 3. The output of the first stage is downsampled through a branch to obtain the output of the lower row in the second stage; 4. Classification is performed according to the output of the second stage.
  • in this method, mainly feature extraction is learned; each step has no clear target for its processing result, so the output is unpredictable, and pedestrian body movement and occlusion are not modeled; when the pedestrian image exhibits deformation and occlusion, the performance is poor.
  • FIG. 2 is a schematic diagram of the pedestrian detection method of prior art 2, which divides a pedestrian into a root node consisting of a template of the whole pedestrian and child nodes consisting of pedestrian body parts (such as the head, the upper half of the legs, or the lower half of the legs).
  • the child node has a deformation constraint with the root node, for example, the head cannot be too far away from the body.
  • this prior art pedestrian detection method includes the following steps: 1. Feature extraction is performed on an input image to obtain feature maps at two different resolutions; 2. The low-resolution feature map is matched with the filter template serving as the root node to obtain a matched response; 3. The high-resolution feature map is matched with the filter templates serving as child nodes to obtain matched responses.
  • the model in Figure 2 has 5 child nodes, so there are 5 child-node filter templates, and 5 matched responses are obtained;
  • the response of each child node is corrected through its deformation constraint with the root node to obtain a corrected response, and an overall response for whether a pedestrian is present is then obtained from the responses of the child nodes and the response of the root node; prior art 2 can model the deformation of object parts and is more robust to body movement, but it matches templates using hand-defined features, cannot learn features automatically, and cannot handle occlusion (a toy sketch of this root-and-parts matching follows).
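The root-and-parts matching described above can be illustrated with a small numpy sketch. This is a hedged toy example, not the exact algorithm of prior art 2: the quadratic deformation cost, the anchor positions and every function and variable name are assumptions introduced purely for illustration.

```python
import numpy as np

def corrected_part_response(part_resp, anchor, w):
    """Penalize a child-node response map by its displacement from the root anchor."""
    h, wd = part_resp.shape
    ys, xs = np.mgrid[0:h, 0:wd]
    penalty = w[0] * (xs - anchor[0]) ** 2 + w[1] * (ys - anchor[1]) ** 2
    return part_resp - penalty          # deformation-corrected response

def overall_response(root_resp, part_resps, anchors, weights):
    """Combine the root response with the best corrected response of each part."""
    score = root_resp.max()
    for resp, anc, w in zip(part_resps, anchors, weights):
        score += corrected_part_response(resp, anc, w).max()
    return score
```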
  • Embodiments of the present invention provide a convolutional neural network and a target object detection method based on a convolutional neural network, which are capable of processing deformation and occlusion of a target object.
  • a first aspect of the present invention provides a method for detecting a target object based on a convolutional neural network, the convolutional neural network comprising: a feature extraction layer, a part detection layer, a deformation processing layer, an occlusion processing layer, and a classifier;
  • the feature extraction layer extracts the pixel values of the detection area in the image, preprocesses them, and performs feature extraction on the preprocessed data to obtain a feature map of the detection area;
  • the part detecting layer detects the feature map of the detection area with M filters and outputs response maps corresponding to M parts of the detection area; each filter is used to detect one part, and each part corresponds to one response map;
  • the deformation processing layer determines the deformations of the M parts according to the response maps corresponding to the M parts, and determines the score maps of the M parts according to the deformations of the M parts;
  • the occlusion processing layer determines the occlusion corresponding to the M parts according to the score maps of the M parts;
  • the classifier determines, according to the output result of the occlusion processing layer, whether there is a target object in the detection area.
  • the feature extraction layer extracting the pixel values of the detection area in the image and preprocessing the pixel values in the detection area includes: the feature extraction layer extracts the pixel values of the detection area in the image and converts them into data of three channels, the three channels being a first channel, a second channel, and a third channel;
  • the output data of the first channel corresponds to the Y-channel data of the YUV pixel values in the detection area;
  • the second channel is configured to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels;
  • taking the maximum value at each position across the three first edge maps forms a second edge map; the three first edge maps and the second edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the second edge map is used as the output data of the second channel;
  • the third channel is configured to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels, and to generate a third edge map;
  • the data at every position of the third edge map is 0; the three first edge maps and the third edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the third edge map is used as the output data of the third channel (a preprocessing sketch is given below).
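A minimal sketch of the three-channel preprocessing described above, using OpenCV and numpy. The gradient-magnitude form of the Sobel edge operator, the resize interpolation, the 2 x 2 tiling used for the "mosaic", and the assumption of even image dimensions are all choices made for illustration; the text does not fix them.

```python
import cv2
import numpy as np

def sobel_edges(plane):
    gx = cv2.Sobel(plane, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(plane, cv2.CV_32F, 0, 1)
    return cv2.magnitude(gx, gy)

def three_channel_input(det_bgr):
    """det_bgr: detection area as a BGR image with even height and width."""
    h, w = det_bgr.shape[:2]
    yuv = cv2.cvtColor(det_bgr, cv2.COLOR_BGR2YUV).astype(np.float32)

    # Channel 1: the Y plane of the detection area.
    ch1 = yuv[:, :, 0]

    # Quarter-area (half in each dimension) YUV copy, edge-filtered per plane.
    small = cv2.resize(det_bgr, (w // 2, h // 2))
    small_yuv = cv2.cvtColor(small, cv2.COLOR_BGR2YUV).astype(np.float32)
    edges = [sobel_edges(small_yuv[:, :, c]) for c in range(3)]   # Y, U, V edge maps

    # Channel 2: the three edge maps plus their position-wise maximum, tiled 2 x 2.
    edge_max = np.maximum.reduce(edges)
    ch2 = np.block([[edges[0], edges[1]], [edges[2], edge_max]])

    # Channel 3: the three edge maps plus an all-zero map, tiled 2 x 2.
    ch3 = np.block([[edges[0], edges[1]], [edges[2], np.zeros_like(edges[0])]])

    return np.stack([ch1, ch2, ch3], axis=-1)                     # h x w x 3 input
```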
  • the part detecting layer includes three sub-layers, namely a first sub-layer, a second sub-layer, and a third sub-layer; the first sub-layer of the part detecting layer includes M1 filters, the second sub-layer includes M2 filters, and the third sub-layer includes M3 filters, where M1+M2+M3=M;
  • the M1 filters of the first sub-layer of the part detecting layer respectively detect M1 parts in the detection area to obtain M1 response maps;
  • the M2 filters of the second sub-layer respectively detect M2 parts in the detection area to obtain M2 response maps;
  • the M3 filters of the third sub-layer respectively detect M3 parts in the detection area to obtain M3 response maps.
  • the deformation processing layer determining the deformations of the M parts according to the response maps corresponding to the M parts and determining the score maps of the M parts according to the deformations of the M parts includes:
  • the deformation processing layer obtains the deformation score map of the p-th part from the response maps corresponding to the M parts according to formula (1):
  • B_p = M_p + Σ_{n=1}^{N} c_{n,p} D_{n,p}   (1)
  • where B_p denotes the deformation score map of the p-th part, 1 ≤ p ≤ M; M_p denotes the response map corresponding to the p-th part; N denotes the number of constraint conditions of the p-th part; D_{n,p} denotes the score map corresponding to the n-th constraint condition; and c_{n,p}, 1 ≤ n ≤ N, denotes the weight corresponding to the n-th constraint condition;
  • the deformation processing layer then determines the score of the p-th part from the deformation score map according to formula (2):
  • s_p = max_{(x,y)} B_p(x,y)   (2), where B_p(x,y) denotes the value of B_p at position (x, y) (a numpy sketch of formulas (1) and (2) is given below).
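The two formulas can be written as a short numpy sketch. The variable names follow the text (M_p, D_{n,p}, c_{n,p}); treating the N constraint score maps as one precomputed (N, H, W) array is an assumption made for compactness.

```python
import numpy as np

def deformation_layer(M_p, D_p, c_p):
    """M_p: (H, W) response map of part p.
    D_p: (N, H, W) score maps of the N constraint conditions of part p.
    c_p: (N,) weights of the constraint conditions."""
    B_p = M_p + np.tensordot(c_p, D_p, axes=1)             # formula (1)
    s_p = B_p.max()                                         # formula (2)
    best_pos = np.unravel_index(B_p.argmax(), B_p.shape)    # best (y, x) for part p
    return B_p, s_p, best_pos
```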
  • the occlusion processing layer includes three sub-layers, respectively a first sub-layer, a second sub-layer, and a third sub-layer, and the occlusion processing layer determining the occlusion corresponding to the M parts according to the score maps of the M parts includes:
  • the occlusion processing layer determines the score maps and the visibilities of the M parts on the sub-layers of the occlusion processing layer;
  • the first sub-layer, the second sub-layer, and the third sub-layer of the occlusion processing layer calculate the visibility of each part according to formulas (3), (4), and (5), respectively.
  • a second aspect of the present invention provides a convolutional neural network, including:
  • a feature extraction layer configured to preprocess a pixel value of the detection area according to a pixel value of the detection area in the extracted image, and perform feature extraction on the preprocessed image to obtain a feature map of the detection area;
  • a part detecting layer configured to detect the feature map of the detection area with M filters and output response maps corresponding to M parts of the detection area; each filter is used to detect one part, and each part corresponds to one response map;
  • a deformation processing layer configured to determine the deformations of the M parts according to the response maps corresponding to the M parts, and to determine the score maps of the M parts according to the deformations of the M parts;
  • an occlusion processing layer configured to determine the occlusion corresponding to the M parts according to the score maps of the M parts;
  • a classifier configured to determine, according to an output result of the occlusion processing layer, whether there is a target object in the detection area.
  • the feature extraction layer includes three channels, which are a first channel, a second channel, and a third channel, respectively;
  • the output data of the first channel corresponds to Y channel data of a YUV pixel value in the detection area
  • the second channel is configured to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels; the maximum value at each position across the three first edge maps forms a second edge map; the three first edge maps and the second edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the second edge map serves as the output data of the second channel;
  • the third channel is configured to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels, and to generate a third edge map whose data is 0 at every position; the three first edge maps and the third edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the third edge map serves as the output data of the third channel.
  • the part detecting layer includes three sub-layers, namely a first sub-layer, a second sub-layer, and a third sub-layer; the first sub-layer of the part detecting layer includes M1 filters, the second sub-layer includes M2 filters, and the third sub-layer includes M3 filters, where M1+M2+M3=M;
  • the first sub-layer of the part detecting layer is configured to detect M1 parts in the detection area with its M1 filters, obtaining M1 response maps;
  • the second sub-layer of the part detecting layer is configured to detect M2 parts in the detection area with its M2 filters, obtaining M2 response maps;
  • the third sub-layer of the part detecting layer is configured to detect M3 parts in the detection area with its M3 filters, obtaining M3 response maps.
  • the deformation processing layer is specifically configured to:
  • the deformation processing layer obtains the deformation score map of the p-th part from the response maps corresponding to the M parts according to formula (1):
  • B_p = M_p + Σ_{n=1}^{N} c_{n,p} D_{n,p}   (1)
  • where B_p denotes the deformation score map of the p-th part, 1 ≤ p ≤ M; M_p denotes the response map corresponding to the p-th part; N denotes the number of constraint conditions of the p-th part; D_{n,p} denotes the score map corresponding to the n-th constraint condition; and c_{n,p}, 1 ≤ n ≤ N, denotes the weight corresponding to the n-th constraint condition;
  • the deformation processing layer then determines the score of the p-th part from the deformation score map according to formula (2):
  • s_p = max_{(x,y)} B_p(x,y)   (2), where B_p(x,y) denotes the value of B_p at position (x, y).
  • the occlusion processing layer includes three sub-layers, which are a first sub-layer, a second sub-layer, and a third sub-layer;
  • the first sub-layer, the second sub-layer, and the third sub-layer of the occlusion processing layer calculate the visibility of each part according to formulas (3), (4), and (5), respectively.
  • the convolutional neural network and the convolutional-neural-network-based object detection method in the embodiments of the present invention form a unified, jointly optimized model that integrates feature extraction, part detection, deformation processing, occlusion processing, and classifier learning.
  • the convolutional neural network can learn the deformation of the target object, the deformation learning and the occlusion processing interact, and this interaction improves the ability of the classifier to distinguish the target object from non-target objects according to the learned features.
  • FIG. 1 is a schematic diagram of a pedestrian detection method according to prior art 1;
  • FIG. 2 is a schematic diagram of a pedestrian detection method according to prior art 2;
  • FIG. 3 is a flow chart of an embodiment of a target object detection method based on a convolutional neural network according to the present invention;
  • FIG. 4 is a schematic view of the filters for detecting various parts of the body according to the present invention;
  • FIG. 5 is a schematic diagram of the detection results of the part detection layer;
  • FIG. 6 is a schematic diagram of the operation flow of the deformation processing layer;
  • FIG. 7 is a schematic view of the processing procedure of the occlusion processing layer;
  • FIG. 8 is a schematic diagram of the detection results of a target object according to the present invention;
  • FIG. 9 is a schematic view of the overall model of the present invention;
  • FIG. 10 is a schematic structural view of an embodiment of a convolutional neural network according to the present invention;
  • FIG. 11 is a schematic structural diagram of still another embodiment of a convolutional neural network according to the present invention.
  • a convolutional neural network includes: a feature extraction layer, a part detection layer, a deformation processing layer, an occlusion processing layer, and a classifier.
  • the method in this embodiment may include: Step 101: The feature extraction layer extracts the pixel values of the detection area in the image, preprocesses them, and performs feature extraction on the preprocessed data to obtain a feature map of the detection area.
  • detecting the target object only refers to detecting whether there is a target object in the detection area
  • the detection area may be an arbitrarily set area; for example, an image is divided into rectangular frames, and each rectangular frame serves as a detection area.
  • the target object can be a pedestrian, an automobile, an animal, etc.
  • the image is preprocessed to eliminate some interference factors; any existing method may be adopted for the preprocessing, such as grayscale transformation, histogram correction, or image smoothing.
  • the feature extraction layer extracts the pixel value of the detection area in the image, and converts the pixel value of the detection area into three channels of data, and the three channels are the first channel, the second channel, and the third channel, respectively.
  • the data for each channel is acquired independently as an input part of the entire model.
  • the output data of the first channel corresponds to the data of the Y channel of the YUV pixel value in the detection area.
  • the second channel is used to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels; the maximum values at each position across the three first edge maps form a second edge map; the three first edge maps and the second edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the second edge map is used as the output data of the second channel.
  • the third channel is used to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels, and to generate a third edge map whose data is 0 at every position; the three first edge maps and the third edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the third edge map is used as the output data of the third channel;
  • the output data of the first channel, the second channel, and the third channel are used as the preprocessed pixel values; feature extraction is then performed on the preprocessed data to obtain the feature map of the detection area, and the feature extraction layer may extract the feature map of the detection area by means of the histogram of oriented gradients (HOG), SIFT, Gabor, LBP, and the like.
  • Step 102: The part detecting layer detects the feature map of the detection area with the M filters and outputs response maps corresponding to M parts of the detection area; each filter is used to detect one part, and each part corresponds to one response map.
  • the part detection layer can be regarded as a downsampling layer of the convolutional neural network system; the M filters detect the feature map of the detection area separately and obtain part-level features that are more detailed than the feature map.
  • the part detecting layer includes three sub-layers, namely a first sub-layer, a second sub-layer, and a third sub-layer; the first sub-layer of the part detecting layer includes M1 filters, the second sub-layer includes M2 filters, and the third sub-layer includes M3 filters, where M1+M2+M3=M.
  • the sizes of the corresponding filters may be fixed, or the size of each filter may be different in this embodiment; the present invention does not limit this.
  • the M1 filters of the first sub-layer of the part detecting layer respectively detect M1 parts in the detection area, obtaining M1 response maps;
  • the M2 filters of the second sub-layer respectively detect M2 parts in the detection area, obtaining M2 response maps;
  • the M3 filters of the third sub-layer respectively detect M3 parts in the detection area, obtaining M3 response maps.
  • in this embodiment, there are a total of 20 filters.
  • the filters of the sub-layers are related to each other: the filters of the first sub-layer are smaller, the filters of the second sub-layer are larger than those of the first sub-layer, and the filters of the third sub-layer are larger than those of the first sub-layer; a filter of the second sub-layer can be obtained by combining filters of the first sub-layer according to certain rules, and a filter of the third sub-layer can be obtained by combining filters of the second sub-layer according to certain rules, as shown in FIG. 4.
  • FIG. 4 is a schematic diagram of the filters for detecting various parts of the body according to the present invention: the first filter and the second filter of the first sub-layer combine to form the first filter of the second sub-layer, and the first filter and the third filter of the first sub-layer combine to form the second filter of the second sub-layer; however, some filters cannot be combined, for example the first filter and the fifth filter of the first sub-layer.
  • the parameters of each filter are obtained when training the convolutional network; in this step, each filter is simply convolved with the feature map to obtain 20 response maps; each filter outputs one response map, and each response map corresponds to a part of the target object, giving the position of each part of the target object.
  • FIG. 5 is a schematic diagram showing the detection results of the part detection layer (a filtering sketch is given below).
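The filtering step can be sketched as follows: each learned filter is cross-correlated with the feature map to produce one response map per part. The use of single-channel 2-D maps, "valid" correlation, and the particular filter sizes and per-sub-layer counts below are placeholders, not the trained filters of the patent.

```python
import numpy as np
from scipy.signal import correlate2d

def part_detection_layer(feature_map, filters):
    """feature_map: (H, W) array; filters: list of 2-D part filters."""
    return [correlate2d(feature_map, f, mode="valid") for f in filters]

# Example: 20 part filters split across three sub-layers of increasing size
# (the 6/7/7 split and the kernel sizes are arbitrary placeholders).
rng = np.random.default_rng(0)
filters = ([rng.standard_normal((5, 5)) for _ in range(6)] +
           [rng.standard_normal((7, 7)) for _ in range(7)] +
           [rng.standard_normal((9, 9)) for _ in range(7)])
response_maps = part_detection_layer(rng.standard_normal((19, 15)), filters)
```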
  • Step 103: The deformation processing layer determines the deformations of the M parts according to the response maps corresponding to the M parts, and determines the score maps of the M parts according to the deformations of the M parts.
  • the part detecting layer can detect the parts of the target object appearing in the detection area; in an actual image, the target object is deformed by the movement of its parts; for example, the movement of the pedestrian's body (such as the head, body, and legs) causes deformation of the pedestrian's visual information.
  • the deformation processing layer learns the correlations between the deformations of the various parts of the target object; it extracts, from the M part detection response maps, the M part positions and scores that best fit the human body, thereby capturing the associations between the parts.
  • the deformation processing layer determines the deformations of the M parts according to the response maps corresponding to the M parts, and determines the score maps of the M parts according to the deformations of the M parts; specifically:
  • the deformation processing layer obtains the deformation score map of the p-th part according to formula (1): B_p = M_p + Σ_{n=1}^{N} c_{n,p} D_{n,p}   (1)
  • B_p denotes the deformation score map of the p-th part, 1 ≤ p ≤ M;
  • M_p represents the response map corresponding to the p-th part;
  • N represents the number of constraint conditions of the p-th part;
  • D_{n,p} represents the score map corresponding to the n-th constraint condition;
  • c_{n,p}, 1 ≤ n ≤ N, represents the weight corresponding to the n-th constraint condition; each constraint condition corresponds to one deformation; taking the first part as the human head as an example, head movement usually has four deformations: turning left, turning right, tilting down, and tilting up.
  • each constraint corresponds to one weight, and the weight is used to indicate the probability of each deformation of the head.
  • the deformation score map of each part is calculated by formula (1); the deformation processing layer then determines the score map of the p-th part from the deformation score map according to formula (2): s_p = max_{(x,y)} B_p(x,y)   (2).
  • FIG. 6 is a schematic diagram of the operation flow of the deformation processing layer.
  • M_p represents the response map corresponding to the p-th part; D_{1,p} represents the score map of the first constraint condition of the p-th part, D_{2,p} that of the second, D_{3,p} that of the third, and D_{4,p} that of the fourth;
  • c_{1,p} represents the weight corresponding to the first constraint condition, c_{2,p} that of the second, c_{3,p} that of the third, and c_{4,p} that of the fourth;
  • the constraint-condition score maps are weighted by these weights and summed with the response map corresponding to the p-th part to obtain the deformation score map of the p-th part, and the coordinates (x, y) of the maximum value in the deformation score map are then taken as the best position of the p-th part.
  • Step 104 The occlusion processing layer determines the occlusion corresponding to the M parts according to the score map of the M parts.
  • the occlusion processing layer includes three sub-layers, namely a first sub-layer, a second sub-layer, and a third sub-layer; the occlusion processing layer determining the occlusion corresponding to the M parts according to the score maps of the M parts is specifically:
  • the occlusion processing layer determines the score maps and the visibilities of the M parts on its sub-layers; the first sub-layer, the second sub-layer, and the third sub-layer of the occlusion processing layer calculate the visibility of each part according to formulas (3), (4), and (5), respectively.
  • the visibility is computed with the sigmoid function σ(t) = (1 + exp(-t))^(-1); h_p^l indicates the visibility of the p-th part on the l-th sub-layer of the occlusion processing layer; W^l denotes the transfer matrix between h^l and h^(l+1), and w_{*,j}^l denotes its j-th column; the linear classifier parameters act on the hidden variables h^3; X^T represents the transpose of a matrix X; and ỹ represents the output of the convolutional neural network.
  • each part may have multiple parent nodes and child nodes; the visibility of each part is related to the visibility of other parts in the same layer that share the same parent node, and the visibility of the next layer is related to the visibility of several parts of the previous layer.
  • FIG. 7 is a schematic view of the processing procedure of the occlusion processing layer; the visibility of the first two parts of the first layer is strongly correlated with the visibility of a part of the second layer because, structurally, those two parts can be combined to obtain that part of the second layer; that is, when the two parts of the previous layer have higher visibility in the image (the matching degree of the parts is relatively high), the visibility of the part of the next layer combined from them is also relatively high.
  • the visibility of a part of the second layer is also related to the score of the part itself, which is intuitive: when the matching score of a part is relatively high, its visibility is naturally higher. All parameters of the occlusion processing layer are learned by the back-propagation algorithm (a sketch of this visibility computation is given below).
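Since formulas (3)-(5) are not reproduced legibly in the text, the sketch below only follows the verbal description: first-sub-layer visibilities depend on the part scores, later sub-layers combine the previous sub-layer's visibilities through a transfer matrix with the parts' own scores, and a linear classifier on the last sub-layer gives the network output. The sigmoid form matches the (1 + exp(-t))^(-1) expression quoted above; every variable name here (g, c, W, w_cls) is an assumption.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))            # (1 + exp(-t))^-1

def visibility_first_layer(s1, g1, c1):
    """s1: scores of the first sub-layer's parts; g1, c1: per-part scale and offset."""
    return sigmoid(g1 * s1 + c1)

def visibility_next_layer(h_prev, W, s, g, c):
    """h_prev: previous sub-layer visibilities; W: transfer matrix to this sub-layer."""
    return sigmoid(W.T @ h_prev + g * s + c)

def network_output(h_last, w_cls, b_cls):
    """Linear classifier on the last sub-layer's visibilities, squashed to (0, 1)."""
    return sigmoid(h_last @ w_cls + b_cls)
```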
  • Step 105 The classifier determines whether there is a target object in the detection area according to an output result of the occlusion processing layer.
  • the occlusion processing layer determines the occlusion degree of each part according to the score map of each part, and the occlusion degree is embodied by visibility.
  • the classifier determines whether there is a target object in the detection area according to the output result of the occlusion processing layer, and outputs the detection result.
  • FIG. 8 is a schematic view showing the detection result of the target object of the present invention.
  • the method provided in this embodiment uses a unified convolutional neural network model that jointly optimizes feature extraction, part detection, deformation processing, occlusion processing, and classifier learning; the deformation processing layer enables the convolutional neural network to learn the deformation of the target object, and the deformation learning and the occlusion processing interact, which enhances the ability of the classifier to distinguish between pedestrians and non-pedestrians based on the learned features.
  • before adopting the convolutional-neural-network-based target object detection method provided in the first embodiment, it is first necessary to pre-train the convolutional neural network to obtain the parameters of each layer of the convolutional neural network.
  • all of the parameters, including image features, deformation parameters, and visibility relationships, can be learned through a unified architecture.
  • a multi-stage training strategy is adopted.
  • first, a supervised learning method is used to learn a convolutional network with only one layer.
  • a Gabor filter is used as the initial value of the filter.
  • a second layer is then added and the two-layer network is learned, with the previously learned one-layer network used as the initial value.
  • all parameters are learned using the back-propagation method.
  • the prediction error updates all parameters by the back-propagation method, where the gradient propagated to the score s is given by the chain rule:
  • ∂L/∂s = (∂L/∂h) · (∂h/∂s)
  • the loss function can take many forms; for example, for a squared error loss function, the expression is L = ½ (y − ỹ)²,
  • where y represents the actual label of the training sample and ỹ represents the output obtained by the convolutional neural network of the present invention; if the value of the loss function does not satisfy the preset condition, the parameters continue to be trained until the loss function satisfies the preset condition (a sketch of one training step is given below).
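A minimal sketch of the training step described above: a squared error loss between the network output and the training label, with one gradient-descent update computed by back-propagation. Restricting the update to the final classifier parameters and the choice of learning rate are simplifications made here; in the patent all layer parameters are updated.

```python
import numpy as np

def train_step(h, w_cls, b_cls, y_true, lr=0.01):
    """One back-propagation update of the final classifier on hidden values h."""
    y_pred = 1.0 / (1.0 + np.exp(-(h @ w_cls + b_cls)))   # sigmoid output
    loss = 0.5 * (y_true - y_pred) ** 2                    # squared error loss
    dL_dy = -(y_true - y_pred)                             # dL/dy
    dL_dt = dL_dy * y_pred * (1.0 - y_pred)                # chain rule through sigmoid
    w_cls = w_cls - lr * dL_dt * h                         # dL/dw = dL/dt * h
    b_cls = b_cls - lr * dL_dt
    return loss, w_cls, b_cls
```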
  • FIG. 9 shows the overall model of the present invention.
  • first, an image of size 84 × 72 is input.
  • the input image consists of 3 channels.
  • the first layer performs a convolution on the input image.
  • the size of the convolution sliding window is 9 × 9.
  • after filtering, 64 maps with 76 pixels along one dimension are obtained.
  • the maps are then averaged, each pixel with its four surrounding adjacent pixels, to obtain 64 maps of size 19 × 15, and the feature map is then extracted from the 19 × 15 images;
  • these processes are completed by the feature extraction layer; the part detection layer then performs the second-layer convolution operation on the extracted feature map.
  • the 20 filters are used to filter the feature map to obtain 20 part response maps; the deformation processing layer then determines the score maps of the 20 parts according to the 20 response maps; finally, the occlusion processing layer determines the occlusion corresponding to the 20 parts according to the score maps of the 20 parts and obtains the visibility of the 20 parts, and the classifier determines from the visibility of the 20 parts whether there is a target object in the detection area (a small size sanity check is given below).
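A small sanity check of the sizes quoted in this walkthrough, assuming "valid" convolution with stride 1 and non-overlapping pooling; the pooling factor of 4 is an assumption chosen to reproduce the quoted 19-pixel height, and the partially garbled widths in the source are left aside.

```python
def conv_out(size, kernel):
    return size - kernel + 1          # valid convolution, stride 1

def pool_out(size, factor):
    return size // factor             # non-overlapping pooling

h = conv_out(84, 9)                   # 84-pixel-high input, 9 x 9 filters -> 76
print(h, pool_out(h, 4))              # 76 -> 19, consistent with the 19 x 15 maps
```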
  • the convolutional neural network provided by this embodiment includes: a feature extraction layer 21, a part detection layer 22, a deformation processing layer 23, an occlusion processing layer 24, and a classifier 25.
  • the feature extraction layer 21 is configured to extract the pixel values of the detection area in the image, preprocess them, and perform feature extraction on the preprocessed data to obtain a feature map of the detection area;
  • the part detection layer 22 is configured to detect the feature map of the detection area with M filters and output response maps corresponding to M parts of the detection area; each filter is used to detect one part, and each part corresponds to one response map;
  • the deformation processing layer 23 is configured to determine the deformations of the M parts according to the response maps corresponding to the M parts, and to determine the score maps of the M parts according to the deformations of the M parts;
  • the occlusion processing layer 24 is configured to determine the occlusion corresponding to the M parts according to the score maps of the M parts;
  • the classifier 25 is configured to determine, according to the output result of the occlusion processing layer, whether there is a target object in the detection area.
  • the feature extraction layer 21 may include three channels, which are respectively a first channel, a second channel, and a third channel; wherein, the output data of the first channel corresponds to the Y channel data of the YUV pixel value in the detection area;
  • the second channel is configured to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels; the maximum value at each position across the three first edge maps forms a second edge map; the three first edge maps and the second edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the second edge map is used as the output data of the second channel;
  • the third channel is configured to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels, and to generate a third edge map whose data is 0 at every position; the three first edge maps and the third edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the third edge map is used as the output data of the third channel.
  • the deformation processing layer 23 is specifically configured to obtain the deformation score map of the p-th part from the response maps corresponding to the M parts according to formula (1): B_p = M_p + Σ_{n=1}^{N} c_{n,p} D_{n,p}   (1)
  • B_p indicates the deformation score map of the p-th part;
  • 1 ≤ p ≤ M, and M_p represents the response map corresponding to the p-th part;
  • N represents the number of constraint conditions of the p-th part;
  • D_{n,p} represents the score map corresponding to the n-th constraint condition;
  • c_{n,p}, 1 ≤ n ≤ N, represents the weight corresponding to the n-th constraint condition;
  • the deformation processing layer 23 then determines the score of the p-th part according to formula (2): s_p = max_{(x,y)} B_p(x,y)   (2), where B_p(x,y) represents the value of B_p at position (x, y).
  • the occlusion processing layer 24 includes three sub-layers, respectively:
  • the first sub-layer, the second sub-layer, and the third sub-layer; the sub-layers of the occlusion processing layer calculate the visibility of each part according to formulas (3), (4), and (5), respectively;
  • in these formulas, W^l denotes the weight matrix of the l-th sub-layer and c^l its offset; h_p^l indicates the visibility of the p-th part on the l-th sub-layer of the occlusion processing layer; σ(t) = (1 + exp(-t))^(-1) is the sigmoid function; and W^l also serves as the transfer matrix between h^l and h^(l+1).
  • the convolutional neural network 300 of this embodiment includes: a processor 31 and a memory 32, and the processor 31 and the memory 32 are connected by a bus.
  • the memory 32 stores execution instructions.
  • the processor 31 communicates with the memory 32, and the processor 31 executes the instructions to cause the convolutional neural network 300 to perform the convolutional-neural-network-based target object detection method of the present invention.
  • in this embodiment, the feature extraction layer, the part detection layer, the deformation processing layer, the occlusion processing layer, and the classifier of the convolutional neural network may be implemented by the processor 31, and the functions of the layers are performed by the processor 31; specifically:
  • the processor 31 controls the feature extraction layer to extract the pixel values of the detection area in the image, preprocess them, and perform feature extraction on the preprocessed data to obtain a feature map of the detection area;
  • the processor 31 controls the part detection layer to detect the feature map of the detection area with the M filters and output response maps corresponding to the M parts of the detection area; each filter is used to detect one part, and each part corresponds to one response map;
  • the processor 31 controls the deformation processing layer to determine the deformations of the M parts according to the response maps corresponding to the M parts, and to determine the score maps of the M parts according to the deformations of the M parts;
  • the processor 31 controls the occlusion processing layer to determine the occlusion corresponding to the M parts according to the score maps of the M parts;
  • the processor 31 controls the classifier to determine, according to the output result of the occlusion processing layer, whether there is a target object in the detection area.
  • the feature extraction layer includes three channels, which are a first channel, a second channel, and a third channel, respectively.
  • the output data of the first channel corresponds to the Y channel data of the YUV pixel value in the detection area
  • the second channel is used to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels; the maximum values at each position across the three first edge maps form a second edge map;
  • the three first edge maps and the second edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the second edge map is used as the output data of the second channel;
  • the third channel is used to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels, and to generate a third edge map whose data is 0 at every position;
  • the three first edge maps and the third edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the third edge map is used as the output data of the third channel.
  • the part detecting layer comprises three sub-layers, namely a first sub-layer, a second sub-layer, and a third sub-layer; the first sub-layer of the part detecting layer comprises M1 filters, the second sub-layer comprises M2 filters, and the third sub-layer comprises M3 filters, where M1+M2+M3=M;
  • the M1 filters of the first sub-layer of the part detection layer respectively detect M1 parts in the detection area, obtaining M1 response maps; the M2 filters of the second sub-layer respectively detect M2 parts in the detection area, obtaining M2 response maps; the M3 filters of the third sub-layer respectively detect M3 parts in the detection area, obtaining M3 response maps.
  • the deformation processing layer determines the deformations of the M parts according to the response maps corresponding to the M parts, and determines the score maps of the M parts according to the deformations of the M parts; specifically:
  • the deformation processing layer obtains the deformation score map of the p-th part from the response maps corresponding to the M parts according to formula (1): B_p = M_p + Σ_{n=1}^{N} c_{n,p} D_{n,p}   (1)
  • where B_p indicates the deformation score map of the p-th part, 1 ≤ p ≤ M; M_p represents the response map corresponding to the p-th part; N represents the number of constraint conditions of the p-th part; D_{n,p} represents the score map corresponding to the n-th constraint condition; and c_{n,p}, 1 ≤ n ≤ N, indicates the weight corresponding to the n-th constraint condition;
  • the deformation processing layer then determines the score of the p-th part from the deformation score map according to formula (2): s_p = max_{(x,y)} B_p(x,y)   (2).
  • the occlusion processing layer includes three sub-layers, namely a first sub-layer, a second sub-layer, and a third sub-layer; the occlusion processing layer determining the occlusion corresponding to the M parts according to the score maps of the M parts includes:
  • the occlusion processing layer determines the score maps and the visibilities of the M parts on its sub-layers; the first sub-layer, the second sub-layer, and the third sub-layer of the occlusion processing layer calculate the visibility of each part according to formulas (3), (4), and (5), respectively;
  • in these formulas, s_p^l denotes the score of the p-th part on the l-th sub-layer, W^l denotes the corresponding weight matrix and c^l its offset, h_p^l indicates the visibility of the p-th part on the l-th sub-layer of the occlusion processing layer, σ(t) = (1 + exp(-t))^(-1) is the sigmoid function, X^T represents the transpose of a matrix X,
  • and ỹ represents the output of the convolutional neural network.
  • the convolutional neural network provided in this embodiment is used to implement the technical solution provided by the method embodiment shown in FIG. 3. The specific implementation manner and technical effects are similar, and details are not described herein again.
  • the aforementioned program can be stored in a computer readable storage medium.
  • the program when executed, performs the steps including the above-described method embodiments; and the foregoing storage medium includes: a medium that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Neurology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

A convolutional neural network and a target object detection method based on same. The convolutional neural network comprises: a feature extraction layer (21), a part detection layer (22), a deformation processing layer (23), an occlusion processing layer (24), and a classifier (25). The convolutional neural network jointly optimizes feature extraction, part detection, deformation processing, occlusion processing, and classifier learning; it is able to learn the deformation of a target object via the deformation processing layer, and the deformation learning and the occlusion processing interact, so that this interaction increases the ability of the classifier to distinguish the target object from non-target objects according to the learned features.

Description

Convolutional neural network and target object detection method based on a convolutional neural network

Technical Field

The present invention relates to data communication technologies, and more particularly to a convolutional neural network and a target object detection method based on a convolutional neural network.

Background Art
Object detection is one of the basic problems in machine vision; after an object is detected, it can conveniently be stored, analyzed, 3D-modeled, identified, tracked, and searched. A common object detection task is pedestrian detection, whose purpose is to find the position and the area occupied by pedestrians in an image; the main difficulties in pedestrian detection are variations in clothing, lighting, background, body deformation and occlusion. In pedestrian detection, first, it is necessary to extract features that distinguish pedestrians from non-pedestrians; commonly used features are Haar-like features and the Histogram of Oriented Gradients (HOG). Second, since the movement of the pedestrian's body (such as the head, torso and legs) deforms the pedestrian's visual information, deformable models have been proposed to deal with the deformation caused by pedestrian body movement. Third, to cope with the loss of visual information caused by occlusion, many occlusion-handling methods locate the occluded parts of the pedestrian in the picture so as to avoid using the occluded image information when judging whether a pedestrian is present in a given rectangular frame. Finally, a classifier is used to determine whether a pedestrian is present in the given rectangular frame.
FIG. 1 is a schematic diagram of the pedestrian detection method of prior art 1. As shown in FIG. 1, the method mainly includes the following steps: 1. In the first stage, an input image is convolved, and the convolution result is downsampled to obtain the output of the first stage; 2. Convolution and downsampling are continued on the output of the first stage to obtain the output of the upper row in the second stage; 3. The output of the first stage is downsampled through a branch to obtain the output of the lower row in the second stage; 4. Classification is performed according to the output of the second stage. In this method, mainly feature extraction is learned; each step has no clear target for its processing result, so the output is unpredictable, and pedestrian body movement and occlusion are not modeled. When the pedestrian image exhibits deformation and occlusion, the performance is poor. FIG. 2 is a schematic diagram of the pedestrian detection method of prior art 2, which divides a pedestrian into a root node consisting of a template of the whole pedestrian and child nodes consisting of pedestrian body parts (such as the head, the upper half of the legs, or the lower half of the legs). A child node has a deformation constraint with the root node; for example, the head cannot be too far away from the body. As shown in FIG. 2, this prior art pedestrian detection method includes the following steps: 1. Feature extraction is performed on an input image to obtain feature maps at two different resolutions; 2. The low-resolution feature map is matched with the filter template serving as the root node to obtain a matched response; 3. The high-resolution feature map is matched with the filter templates serving as child nodes to obtain matched responses. The model in FIG. 2 has 5 child nodes, so there are 5 child-node filter templates and 5 matched responses are obtained;
4. The response of each child node is corrected through its deformation constraint with the root node to obtain a corrected response;
5. An overall response for whether a pedestrian is present is obtained from the responses of the child nodes and the response of the root node. Prior art 2 can model the deformation of object parts and is more robust to body movement, but it matches templates against the object's feature maps using hand-defined features, cannot learn features automatically, and cannot handle occlusion.

SUMMARY OF THE INVENTION

Embodiments of the present invention provide a convolutional neural network and a target object detection method based on a convolutional neural network, which are capable of handling the deformation and occlusion of a target object.
A first aspect of the present invention provides a target object detection method based on a convolutional neural network, the convolutional neural network comprising: a feature extraction layer, a part detection layer, a deformation processing layer, an occlusion processing layer, and a classifier;

the feature extraction layer extracts the pixel values of a detection area in an image, preprocesses them, and performs feature extraction on the preprocessed data to obtain a feature map of the detection area;

the part detection layer detects the feature map of the detection area with M filters and outputs response maps corresponding to M parts of the detection area; each filter is used to detect one part, and each part corresponds to one response map;

the deformation processing layer determines the deformations of the M parts according to the response maps corresponding to the M parts, and determines the score maps of the M parts according to the deformations of the M parts; the occlusion processing layer determines the occlusion corresponding to the M parts according to the score maps of the M parts;

the classifier determines, according to the output result of the occlusion processing layer, whether there is a target object in the detection area.
In a first possible implementation of the first aspect of the present invention, the feature extraction layer extracting the pixel values of the detection area in the image and preprocessing the pixel values in the detection area includes: the feature extraction layer extracts the pixel values of the detection area in the image and converts them into data of three channels, the three channels being a first channel, a second channel, and a third channel;

the output data of the first channel corresponds to the Y-channel data of the YUV pixel values in the detection area;

the second channel is used to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels; the maximum value at each position across the three first edge maps forms a second edge map; the three first edge maps and the second edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the second edge map is used as the output data of the second channel;

the third channel is used to reduce the size of the detection area to a quarter of the original size, convert the reduced detection area into YUV format, and filter the detection area converted to YUV format with a Sobel edge operator, obtaining a first edge map of the detection area on each of the Y, U, and V channels, and to generate a third edge map whose data is 0 at every position; the three first edge maps and the third edge map are the same size, each a quarter of the detection area, and the mosaic of the three first edge maps and the third edge map is used as the output data of the third channel.
In a second possible implementation of the first aspect, the part detection layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer; the first sub-layer of the part detection layer comprises M1 filters, the second sub-layer comprises M2 filters and the third sub-layer comprises M3 filters, where M1+M2+M3=M. The M1 filters of the first sub-layer of the part detection layer detect M1 parts in the detection area and produce M1 response maps; the M2 filters of the second sub-layer detect M2 parts in the detection area and produce M2 response maps;
the M3 filters of the third sub-layer detect M3 parts in the detection area and produce M3 response maps.
In a third possible implementation of the first aspect, the deformation processing layer determining the deformations of the M parts from the corresponding response maps and determining the score maps of the M parts from those deformations comprises:
the deformation processing layer obtains the deformation score map of the p-th part from the response maps of the M parts according to formula (1):

$$B_p = M_p + \sum_{n=1}^{N} c_{n,p} D_{n,p} \qquad (1)$$

where $B_p$ is the deformation score map of the p-th part, $1 \le p \le M$, $M_p$ is the response map of the p-th part, $N$ is the number of constraints on the p-th part, $D_{n,p}$ is the score map of the n-th constraint, $1 \le n \le N$, and $c_{n,p}$ is the weight of the n-th constraint;
the deformation processing layer then determines the score of the p-th part from its deformation score map according to formula (2):

$$s_p = \max_{(x,y)} B_p^{(x,y)} \qquad (2)$$

where $B_p^{(x,y)}$ is the value of $B_p$ at position $(x, y)$.
In a fourth possible implementation of the first aspect, the occlusion processing layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer, and the occlusion processing layer determining the occlusion corresponding to the M parts from the score maps of the M parts comprises:
the occlusion processing layer determines the scores and the visibilities of the M parts on the sub-layers of the occlusion processing layer;
the first sub-layer, the second sub-layer and the third sub-layer of the occlusion processing layer compute the visibility of each part according to formulas (3), (4) and (5) respectively:

$$h_p^1 = \sigma\!\left(g_p^1 s_p^1 + c_p^1\right) \qquad (3)$$
$$h_p^{l+1} = \sigma\!\left((\mathbf{h}^l)^{T}\mathbf{w}_{*,p}^{l} + g_p^{l+1} s_p^{l+1} + c_p^{l+1}\right),\quad l = 1, 2 \qquad (4)$$
$$\tilde{y} = \sigma\!\left((\mathbf{h}^3)^{T}\mathbf{w}_{cls} + b\right) \qquad (5)$$

where $s_p^l$ is the score of the p-th part on the l-th sub-layer of the occlusion processing layer, $g_p^l$ is the weight of $s_p^l$, $c_p^l$ is the bias of $s_p^l$, $h_p^l$ is the visibility of the p-th part on the l-th sub-layer, $\sigma(t) = (1 + \exp(-t))^{-1}$, $\mathbf{W}^l$ is the transfer matrix between $\mathbf{h}^l$ and $\mathbf{h}^{l+1}$, $\mathbf{w}_{*,j}^{l}$ is the j-th column of $\mathbf{W}^l$, $\mathbf{w}_{cls}$ is the parameter of the linear classifier on the hidden variables $\mathbf{h}^3$, $\mathbf{x}^{T}$ is the transpose of $\mathbf{x}$, and $\tilde{y}$ is the output of the convolutional neural network.
A second aspect of the present invention provides a convolutional neural network, comprising:
a feature extraction layer, configured to extract the pixel values of a detection area in an image, pre-process the pixel values of the detection area, and perform feature extraction on the pre-processed image to obtain a feature map of the detection area;
a part detection layer, configured to detect the feature map of the detection area with M filters and output response maps corresponding to M parts of the detection area, where each filter detects one part and each part corresponds to one response map;
a deformation processing layer, configured to determine the deformation of each of the M parts from the corresponding response maps and to determine score maps of the M parts from the deformations of the M parts; an occlusion processing layer, configured to determine the occlusion corresponding to the M parts from the score maps of the M parts;
a classifier, configured to determine, from the output of the occlusion processing layer, whether there is a target object in the detection area.
In a first possible implementation of the second aspect, the feature extraction layer comprises three channels, namely a first channel, a second channel and a third channel.
The output data of the first channel is the Y-channel data of the YUV pixel values of the detection area.
The second channel is configured to reduce the detection area to one quarter of its original size, convert the reduced detection area into YUV format and filter it with the Sobel edge operator, yielding a first edge map for each of the Y, U and V channels; the maximum value at each position over the three first edge maps forms a second edge map; the three first edge maps and the second edge map have the same size, each one quarter of the detection area, and the concatenation of the three first edge maps and the second edge map is the output data of the second channel.
The third channel is configured to reduce the detection area to one quarter of its original size, convert the reduced detection area into YUV format and filter it with the Sobel edge operator, yielding a first edge map for each of the Y, U and V channels, and to generate a third edge map whose value is 0 at every position; the three first edge maps and the third edge map have the same size, each one quarter of the detection area, and the concatenation of the three first edge maps and the third edge map is the output data of the third channel.
In a second possible implementation of the second aspect, the part detection layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer; the first sub-layer of the part detection layer comprises M1 filters, the second sub-layer comprises M2 filters and the third sub-layer comprises M3 filters, where M1+M2+M3=M. The first sub-layer of the part detection layer is configured to detect M1 parts in the detection area with its M1 filters and produce M1 response maps;
the second sub-layer of the part detection layer is configured to detect M2 parts in the detection area with its M2 filters and produce M2 response maps;
the third sub-layer of the part detection layer is configured to detect M3 parts in the detection area with its M3 filters and produce M3 response maps.
In a third possible implementation of the second aspect, the deformation processing layer is specifically configured to:
obtain the deformation score map of the p-th part from the response maps of the M parts according to formula (1):

$$B_p = M_p + \sum_{n=1}^{N} c_{n,p} D_{n,p} \qquad (1)$$

where $B_p$ is the deformation score map of the p-th part, $1 \le p \le M$, $M_p$ is the response map of the p-th part, $N$ is the number of constraints on the p-th part, $D_{n,p}$ is the score map of the n-th constraint, $1 \le n \le N$, and $c_{n,p}$ is the weight of the n-th constraint;
and determine the score of the p-th part from the deformation score map according to formula (2):

$$s_p = \max_{(x,y)} B_p^{(x,y)} \qquad (2)$$

where $B_p^{(x,y)}$ is the value of $B_p$ at position $(x, y)$.
In a fourth possible implementation of the second aspect, the occlusion processing layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer;
the first sub-layer, the second sub-layer and the third sub-layer of the occlusion processing layer compute the visibility of each part according to formulas (3), (4) and (5) respectively:

$$h_p^1 = \sigma\!\left(g_p^1 s_p^1 + c_p^1\right) \qquad (3)$$
$$h_p^{l+1} = \sigma\!\left((\mathbf{h}^l)^{T}\mathbf{w}_{*,p}^{l} + g_p^{l+1} s_p^{l+1} + c_p^{l+1}\right),\quad l = 1, 2 \qquad (4)$$
$$\tilde{y} = \sigma\!\left((\mathbf{h}^3)^{T}\mathbf{w}_{cls} + b\right) \qquad (5)$$

where $s_p^l$ is the score of the p-th part on the l-th sub-layer of the occlusion processing layer, $g_p^l$ is the weight of $s_p^l$, $c_p^l$ is the bias of $s_p^l$, $h_p^l$ is the visibility of the p-th part on the l-th sub-layer, $\sigma(t) = (1 + \exp(-t))^{-1}$, $\mathbf{W}^l$ is the transfer matrix between $\mathbf{h}^l$ and $\mathbf{h}^{l+1}$, $\mathbf{w}_{*,j}^{l}$ is the j-th column of $\mathbf{W}^l$, $\mathbf{w}_{cls}$ is the parameter of the linear classifier on the hidden variables $\mathbf{h}^3$, $\mathbf{x}^{T}$ is the transpose of $\mathbf{x}$, and $\tilde{y}$ is the output of the convolutional neural network.
With the convolutional neural network and the convolutional-neural-network-based target object detection method of the embodiments of the present invention, feature extraction, part detection, deformation handling, occlusion handling and classifier learning are jointly optimized in a single unified convolutional neural network model. The deformation processing layer enables the network to learn the deformation of the target object, and deformation learning interacts with occlusion handling; this interaction improves the ability of the classifier to distinguish target objects from non-target objects using the learned features.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the pedestrian detection method of prior art 1;
Fig. 2 is a schematic diagram of the pedestrian detection method of prior art 2;
Fig. 3 is a flowchart of an embodiment of the target object detection method based on a convolutional neural network according to the present invention;
Fig. 4 is a schematic diagram of the filters for detecting various body parts according to the present invention;
Fig. 5 is a schematic diagram of the results obtained by the part detection layer;
Fig. 6 is a schematic diagram of the operation flow of the deformation processing layer;
Fig. 7 is a schematic diagram of the processing performed by the occlusion processing layer;
Fig. 8 is a schematic diagram of target object detection results according to the present invention;
Fig. 9 is a schematic diagram of the overall model of the present invention;
Fig. 10 is a schematic structural diagram of an embodiment of the convolutional neural network according to the present invention;
Fig. 11 is a schematic structural diagram of another embodiment of the convolutional neural network according to the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 3 is a flowchart of an embodiment of the target object detection method based on a convolutional neural network according to the present invention. In this embodiment the convolutional neural network comprises a feature extraction layer, a part detection layer, a deformation processing layer, an occlusion processing layer and a classifier. As shown in Fig. 3, the method of this embodiment may comprise the following steps.
Step 101: the feature extraction layer extracts the pixel values of the detection area in the image, pre-processes the pixel values of the detection area, and performs feature extraction on the pre-processed image to obtain a feature map of the detection area.
In this embodiment, detecting the target object means only determining whether a target object is present in the detection area. The detection area may be any designated region; for example, an image may be divided into two rectangular boxes, each rectangular box serving as a detection area. The target object may be a pedestrian, a vehicle, an animal, and so on. Before features are extracted from the image in the detection area, the image is pre-processed to remove interfering factors; any existing pre-processing method may be used, such as gray-scale transformation, histogram correction, or image smoothing and denoising.
In this embodiment, the feature extraction layer extracts the pixel values of the detection area in the image and converts them into data of three channels, namely a first channel, a second channel and a third channel. The data of each channel is obtained independently and forms the input of the whole model.
Specifically, the output data of the first channel is the Y-channel data of the YUV pixel values of the detection area.
The second channel reduces the detection area to one quarter of its original size, converts the reduced detection area into YUV format, and filters it with the Sobel edge operator, yielding a first edge map for each of the Y, U and V channels. Taking the maximum value at each position over the three first edge maps gives a second edge map. The three first edge maps and the second edge map have the same size, one quarter of the detection area, and their concatenation is the output data of the second channel.
The third channel likewise reduces the detection area to one quarter of its original size, converts it into YUV format, and filters it with the Sobel edge operator, yielding a first edge map for each of the Y, U and V channels; a third edge map whose value is 0 at every position is generated. The three first edge maps and the third edge map have the same size, one quarter of the detection area, and their concatenation is the output data of the third channel.
The output data of the first, second and third channels is taken as the pre-processed pixel values, and feature extraction is then performed on the pre-processed image to obtain the feature map of the detection area. The feature extraction layer may extract the feature map using, for example, histograms of oriented gradients (HOG), SIFT, Gabor or LBP features. A minimal sketch of this pre-processing is given below.
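The following is a minimal NumPy sketch of the three-channel pre-processing described above. The helper names (`to_yuv`, `sobel_edges`, `build_channels`), the RGB-to-YUV coefficients, the side-by-side concatenation of the edge maps and the choice of halving each spatial dimension to obtain a quarter-size area are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def to_yuv(rgb):
    # Standard RGB -> YUV conversion (BT.601 weights, assumed here).
    m = np.array([[0.299, 0.587, 0.114],
                  [-0.147, -0.289, 0.436],
                  [0.615, -0.515, -0.100]])
    return rgb @ m.T

def downsample_quarter(img):
    # Halve each spatial dimension, so the area becomes one quarter of the original.
    return img[::2, ::2]

def sobel_edges(channel):
    # Gradient magnitude from the two Sobel kernels, computed with a naive loop.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = channel.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = channel[i:i + 3, j:j + 3]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

def build_channels(rgb):
    yuv = to_yuv(rgb.astype(float))
    ch1 = yuv[..., 0]                        # first channel: Y data at full size
    small = downsample_quarter(yuv)          # quarter-area YUV image
    edges = [sobel_edges(small[..., c]) for c in range(3)]  # Y, U, V edge maps
    ch2 = np.concatenate(edges + [np.maximum.reduce(edges)], axis=1)  # plus element-wise max map
    ch3 = np.concatenate(edges + [np.zeros_like(edges[0])], axis=1)   # plus all-zero map
    return ch1, ch2, ch3
```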
Step 102: the part detection layer detects the feature map of the detection area with M filters and outputs response maps corresponding to M parts of the detection area, where each filter detects one part and each part corresponds to one response map.
The part detection layer can be regarded as a down-sampling layer of the convolutional neural network system; by filtering the feature map of the detection area with M filters it yields part and shape information that is more explicit than the feature map itself. In this embodiment the part detection layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer; the first sub-layer comprises M1 filters, the second sub-layer comprises M2 filters and the third sub-layer comprises M3 filters, where M1, M2 and M3 are positive integers greater than 1 and M1+M2+M3=M. For an ordinary convolutional layer the filter size is fixed, but in pedestrian detection the body parts differ in size, so in this embodiment the filters may have different sizes; the present invention does not restrict this.
The M1 filters of the first sub-layer detect M1 parts in the detection area and produce M1 response maps; the M2 filters of the second sub-layer detect M2 parts in the detection area and produce M2 response maps; the M3 filters of the third sub-layer detect M3 parts in the detection area and produce M3 response maps.
This is illustrated with a concrete example, with a minimal code sketch of the filtering step following it. Assume M1 is 6, M2 is 7 and M3 is 7, i.e. the first sub-layer has 6 filters, the second sub-layer has 7 filters and the third sub-layer has 7 filters, 20 filters in total. In this embodiment the filters of the sub-layers are related to one another: the filters of the first sub-layer are the smallest, the filters of the second sub-layer are larger than those of the first sub-layer, and the filters of the third sub-layer are larger than those of the first sub-layer. A filter of the second sub-layer can be composed from filters of the first sub-layer according to certain rules, and a filter of the third sub-layer can be composed from filters of the second sub-layer according to certain rules. As shown in Fig. 4, which illustrates the filters for detecting the various body parts, the first filter of the second sub-layer is obtained by combining the first and second filters of the first sub-layer, and the second filter of the second sub-layer is obtained by combining the first and third filters of the first sub-layer; some filters cannot be combined, for example the first and fifth filters of the first sub-layer. The parameters of all filters are obtained when the convolutional network is trained; in this step each filter only has to be convolved with the processed image, giving 20 response maps. Each filter outputs one response map, each response map corresponds to a part of the target object, and the positions of the parts of the target object are thereby obtained. Fig. 5 shows the results obtained by the part detection layer.
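A minimal sketch of the part detection step under the example above (20 filters, one response map each). The function names, the use of plain valid cross-correlation and the random shapes are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

def cross_correlate(feature_map, kernel):
    # Valid 2-D cross-correlation of a single-channel feature map with one filter.
    h, w = feature_map.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

def part_detection(feature_map, filters):
    # One response map per filter; filters may have different sizes (e.g. 6 + 7 + 7 = 20).
    return [cross_correlate(feature_map, f) for f in filters]

# Usage with random data, just to show the shapes involved.
feature_map = np.random.rand(19, 15)
filters = [np.random.rand(3, 3) for _ in range(6)] \
        + [np.random.rand(5, 3) for _ in range(7)] \
        + [np.random.rand(7, 5) for _ in range(7)]
response_maps = part_detection(feature_map, filters)
print(len(response_maps))   # 20
```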
Step 103: the deformation processing layer determines the deformation of each of the M parts from the corresponding response maps and determines score maps of the M parts from the deformations of the M parts.
The part detection layer can find some parts of the target object that appear in the detection area, but in a real image the parts of the target object deform as it moves; for example, the motion of a pedestrian's body (head, torso, legs) deforms the pedestrian's visual appearance. The purpose of the deformation processing layer is to learn the relations between the parts of the target object under deformation: from the M part detection response maps it extracts the M part positions that best fit the human body, together with their scores, and thereby captures the relations between the parts.
The deformation processing layer determines the deformations of the M parts from the corresponding response maps and determines the score maps of the M parts from those deformations, specifically as follows.
First, the deformation processing layer obtains the deformation score map of each of the M parts from the corresponding response maps according to formula (1):

$$B_p = M_p + \sum_{n=1}^{N} c_{n,p} D_{n,p} \qquad (1)$$

where $B_p$ is the deformation score map of the p-th part, $1 \le p \le M$, $M_p$ is the response map of the p-th part, $N$ is the number of constraints on the p-th part, $D_{n,p}$ is the score map of the n-th constraint, $1 \le n \le N$, and $c_{n,p}$ is the weight of the n-th constraint. Each constraint corresponds to one deformation. Taking the p-th part to be the human head as an example, the head usually has four deformations, turning left, turning right, moving down and moving up; each constraint has a weight, and the weight represents the probability of the corresponding deformation of the head.
After the deformation score map of each part has been computed with formula (1), the deformation processing layer determines the score of the p-th part from its deformation score map according to formula (2):

$$s_p = \max_{(x,y)} B_p^{(x,y)} \qquad (2)$$

where $B_p^{(x,y)}$ is the value of $B_p$ at position $(x, y)$. Formula (2) takes the maximum of the deformation score map of the p-th part, and the position of that maximum is the position of the p-th part; the position of the p-th part can therefore be expressed as

$$(x_p, y_p) = \arg\max_{(x,y)} B_p^{(x,y)}.$$

Fig. 6 illustrates the operation flow of the deformation processing layer. In the figure, $M_p$ is the response map of the p-th part, $D_{1,p}$, $D_{2,p}$, $D_{3,p}$ and $D_{4,p}$ are the first to fourth constraints of the p-th part, and $c_{1,p}$, $c_{2,p}$, $c_{3,p}$ and $c_{4,p}$ are the weights of those constraints. The constraints and the response map of the p-th part are combined in a weighted sum to obtain the deformation score map $B_p$ of the p-th part, and the coordinates $(x, y)$ of the maximum value of the deformation score map are taken as the best position of the p-th part.
Step 104: the occlusion processing layer determines the occlusion corresponding to the M parts from the score maps of the M parts.
The deformation processing layer provides the scores of all parts, $s = \{s_1, \dots, s_M\}$, and the occlusion corresponding to each part is determined from these scores. In this embodiment the occlusion processing layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer, and the occlusion processing layer determines the occlusion corresponding to the M parts from their score maps as follows.
The occlusion processing layer determines the scores and the visibilities of the M parts on its sub-layers; the first sub-layer, the second sub-layer and the third sub-layer compute the visibility of each part according to formulas (3), (4) and (5) respectively:

$$h_p^1 = \sigma\!\left(g_p^1 s_p^1 + c_p^1\right) \qquad (3)$$
$$h_p^{l+1} = \sigma\!\left((\mathbf{h}^l)^{T}\mathbf{w}_{*,p}^{l} + g_p^{l+1} s_p^{l+1} + c_p^{l+1}\right),\quad l = 1, 2 \qquad (4)$$
$$\tilde{y} = \sigma\!\left((\mathbf{h}^3)^{T}\mathbf{w}_{cls} + b\right) \qquad (5)$$

where $s_p^l$ is the score of the p-th part on the l-th sub-layer of the occlusion processing layer, $g_p^l$ is the weight of $s_p^l$, $c_p^l$ is the bias of $s_p^l$, $h_p^l$ is the visibility of the p-th part on the l-th sub-layer, $\sigma(t) = (1 + \exp(-t))^{-1}$ is the sigmoid function, $\mathbf{W}^l$ is the transfer matrix between $\mathbf{h}^l$ and $\mathbf{h}^{l+1}$, $\mathbf{w}_{*,j}^{l}$ is the j-th column of $\mathbf{W}^l$, $\mathbf{w}_{cls}$ is the parameter of the linear classifier on the hidden variables $\mathbf{h}^3$, $\mathbf{x}^{T}$ is the transpose of $\mathbf{x}$, and $\tilde{y}$ is the output of the convolutional neural network.
In this embodiment only the hidden variables of adjacent sub-layers are connected to each other; each part may have several parent and child nodes, the visibility of a part is correlated with the visibilities of other parts of the same sub-layer, expressed by their sharing the same parent node, and the visibility of a part in a later sub-layer is related to the visibilities of several parts of the previous sub-layer. As shown in Fig. 7, which illustrates the processing performed by the occlusion processing layer, the visibilities of the first two parts of the first sub-layer are strongly correlated with the visibility of a part of the second sub-layer, because structurally those two parts combine into that part of the second sub-layer: when the two parts of the previous sub-layer are clearly visible in the image (their part matching scores are high), the part of the next sub-layer composed from them is also likely to be visible. Besides the parts of the previous sub-layer, the visibility of a part of the second sub-layer also depends on that part's own score; intuitively, when the matching score of a part is high, its visibility is naturally high. All parameters of the occlusion processing layer are learned with the back-propagation algorithm.
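The following is a minimal NumPy sketch of the deformation scoring of step 103 (formulas (1) and (2)) and of the visibility propagation of step 104 (formulas (3) to (5)). The array shapes, the split of the 20 parts into sub-layers of sizes 6, 7 and 7, and all parameter values are illustrative assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def part_score(response_map, constraint_maps, weights):
    # Formula (1): B_p = M_p + sum_n c_{n,p} D_{n,p}; formula (2): s_p = max B_p.
    b = response_map + sum(c * d for c, d in zip(weights, constraint_maps))
    pos = np.unravel_index(np.argmax(b), b.shape)   # best position of the p-th part
    return b.max(), pos

def visibility(scores, g, c, W, w_cls, b_cls):
    # scores: three 1-D arrays with the part scores s^1, s^2, s^3 of the sub-layers.
    h = sigmoid(g[0] * scores[0] + c[0])                          # formula (3)
    for l in range(2):                                            # formula (4), l = 1, 2
        h = sigmoid(W[l].T @ h + g[l + 1] * scores[l + 1] + c[l + 1])
    return sigmoid(h @ w_cls + b_cls)                             # formula (5): output y~

# Illustrative shapes: 6, 7 and 7 parts in the three sub-layers.
sizes = [6, 7, 7]
scores = [np.random.rand(n) for n in sizes]
g = [np.random.rand(n) for n in sizes]
c = [np.random.rand(n) for n in sizes]
W = [np.random.rand(sizes[0], sizes[1]), np.random.rand(sizes[1], sizes[2])]
w_cls, b_cls = np.random.rand(sizes[2]), 0.0
print(visibility(scores, g, c, W, w_cls, b_cls))
```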
Step 105: the classifier determines, from the output of the occlusion processing layer, whether there is a target object in the detection area.
The occlusion processing layer determines the degree of occlusion of each part from the score map of that part, and the degree of occlusion is expressed through the visibility. The classifier determines from the output of the occlusion processing layer whether there is a target object in the detection area and outputs the detection result. Fig. 8 shows target object detection results obtained with the present invention.
In the method provided by this embodiment, feature extraction, part detection, deformation handling, occlusion handling and classifier learning are jointly optimized in a single unified convolutional neural network model. The deformation processing layer enables the convolutional neural network to learn the deformation of the target object, and deformation learning interacts with occlusion handling; this interaction improves the ability of the classifier to distinguish pedestrians from non-pedestrians using the learned features.
Before the target object detection method based on a convolutional neural network provided in Embodiment 1 is used, the convolutional neural network first needs to be pre-trained to obtain the parameters of all its layers. In the present invention all parameters, including the image features, the deformation parameters and the visibility relations, can be learned within a single unified architecture. To train such a multi-level network, a multi-stage training strategy is adopted: first a convolutional network with only one layer is learned in a supervised manner, using Gabor filters as the initial values of the filters; once this one-layer network has been learned, a second layer is added and the two-layer network is learned, with the previously learned one-layer network used as the initial value. Throughout the learning process all parameters are learned by back-propagation. A sketch of such a Gabor initialization is given below.
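A minimal sketch of generating a Gabor filter bank to initialize the first convolutional layer, as described above. The filter size, the chosen orientations and scales, and the particular Gabor parameterization are illustrative assumptions, not values specified by the patent.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
    # Real part of a Gabor filter: a Gaussian envelope times a cosine carrier.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / lam + psi)

def gabor_filter_bank(size=9, n_orientations=8, scales=(2.0, 4.0)):
    # Initial values for the first-layer filters (e.g. 9x9, matching the first convolution).
    bank = [gabor_kernel(size, np.pi * k / n_orientations, sigma=s, lam=2 * s)
            for s in scales for k in range(n_orientations)]
    return np.stack(bank)

filters = gabor_filter_bank()
print(filters.shape)   # (16, 9, 9) with the illustrative settings above
```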
After the parameters have been obtained by one round of pre-training, the learned parameters can be further adjusted. Taking the adjustment of the parameters of the occlusion estimation layer as an example, the prediction error is used to update all parameters by back-propagation; the gradient of the loss with respect to the part scores $s$ is obtained by the chain rule, propagating $\partial L / \partial \tilde{y}$ back through the visibility variables $\mathbf{h}^3$, $\mathbf{h}^2$ and $\mathbf{h}^1$ (the explicit expressions follow directly from formulas (3) to (5)), where $\odot$ denotes the Hadamard product, whose operation is $(a \odot b)_{ij} = a_{ij} b_{ij}$, and $L$ denotes the loss function.
The loss function can take several forms. For example, for the sum-of-squares error loss it is

$$L = \left\lVert y^{gnd} - \tilde{y} \right\rVert^{2} / 2,$$

and for the logarithmic error loss it is

$$L = y^{gnd} \log \tilde{y} + \left(1 - y^{gnd}\right) \log\left(1 - \tilde{y}\right),$$

where $y^{gnd}$ is the ground-truth result of the training sample and $\tilde{y}$ is the output obtained with the convolutional neural network of the present invention. If the value of the loss function does not satisfy the preset condition, the parameters continue to be trained until the loss function satisfies the preset condition.
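A small sketch of the two loss functions given above, with the network outputs and the ground-truth labels as plain NumPy arrays; the batch averaging and the numerical guard are illustrative assumptions.

```python
import numpy as np

def squared_error_loss(y_gnd, y_pred):
    # L = ||y_gnd - y~||^2 / 2, averaged over the batch.
    return 0.5 * np.mean((y_gnd - y_pred) ** 2)

def log_error_loss(y_gnd, y_pred, eps=1e-12):
    # L = y_gnd * log(y~) + (1 - y_gnd) * log(1 - y~), as written in the text;
    # eps guards against log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return np.mean(y_gnd * np.log(y_pred) + (1.0 - y_gnd) * np.log(1.0 - y_pred))

y_gnd = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(squared_error_loss(y_gnd, y_pred), log_error_loss(y_gnd, y_pred))
```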
Building on Embodiment 1, Embodiment 2 of the present invention explains the method of Embodiment 1 in detail with a concrete example. Fig. 9 is a schematic diagram of the overall model of the present invention. As shown in Fig. 9, an image of size 84x72 consisting of 3 channels is first input and a first convolution is applied to it with a local sliding window of size 9x9, giving a filtered image of 64 channels of size 76x24; each pixel is then averaged with its four neighbouring pixels, giving a 64-channel image of size 19x15, and the feature map of this 19x15 image is extracted. These operations are performed by the feature extraction layer. The part detection layer then applies a second convolution to the extracted feature map; specifically, the image is filtered with 20 filters to obtain 20 part response maps. The deformation processing layer then determines the scores of the 20 parts from their response maps, and finally the occlusion processing layer determines the occlusion corresponding to the 20 parts from their scores, obtaining the visibilities of the 20 parts, from which it is determined whether there is a target object in the detection area.
Fig. 10 is a schematic structural diagram of an embodiment of the convolutional neural network according to the present invention. As shown in Fig. 10, the convolutional neural network provided in this embodiment comprises a feature extraction layer 21, a part detection layer 22, a deformation processing layer 23, an occlusion processing layer 24 and a classifier 25.
The feature extraction layer 21 is configured to extract the pixel values of a detection area in an image, pre-process the pixel values of the detection area, and perform feature extraction on the pre-processed image to obtain a feature map of the detection area. The part detection layer 22 is configured to detect the feature map of the detection area with M filters and output response maps corresponding to M parts of the detection area, where each filter detects one part and each part corresponds to one response map.
The deformation processing layer 23 is configured to determine the deformation of each of the M parts from the corresponding response maps and to determine score maps of the M parts from the deformations of the M parts.
The occlusion processing layer 24 is configured to determine the occlusion corresponding to the M parts from the score maps of the M parts.
The classifier 25 is configured to determine, from the output of the occlusion processing layer, whether there is a target object in the detection area.
In this embodiment the feature extraction layer 21 may comprise three channels, namely a first channel, a second channel and a third channel, where the output data of the first channel is the Y-channel data of the YUV pixel values of the detection area.
The second channel is configured to reduce the detection area to one quarter of its original size, convert the reduced detection area into YUV format and filter it with the Sobel edge operator, yielding a first edge map for each of the Y, U and V channels; the maximum value at each position over the three first edge maps forms a second edge map; the three first edge maps and the second edge map have the same size, one quarter of the detection area, and their concatenation is the output data of the second channel.
The third channel is configured to reduce the detection area to one quarter of its original size, convert the reduced detection area into YUV format and filter it with the Sobel edge operator, yielding a first edge map for each of the Y, U and V channels, and to generate a third edge map whose value is 0 at every position; the three first edge maps and the third edge map have the same size, one quarter of the detection area, and their concatenation is the output data of the third channel.
The part detection layer 22 comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer; the first sub-layer comprises M1 filters, the second sub-layer comprises M2 filters and the third sub-layer comprises M3 filters, where M1+M2+M3=M. The first sub-layer of the part detection layer is configured to detect M1 parts in the detection area with its M1 filters and produce M1 response maps; the second sub-layer is configured to detect M2 parts in the detection area with its M2 filters and produce M2 response maps; the third sub-layer is configured to detect M3 parts in the detection area with its M3 filters and produce M3 response maps.
The deformation processing layer 23 is specifically configured to obtain the deformation score map of the p-th part from the response maps of the M parts according to formula (1):

$$B_p = M_p + \sum_{n=1}^{N} c_{n,p} D_{n,p} \qquad (1)$$

where $B_p$ is the deformation score map of the p-th part, $1 \le p \le M$, $M_p$ is the response map of the p-th part, $N$ is the number of constraints on the p-th part, $D_{n,p}$ is the score map of the n-th constraint, $1 \le n \le N$, and $c_{n,p}$ is the weight of the n-th constraint;
and to determine the score of the p-th part from the deformation score map according to formula (2):

$$s_p = \max_{(x,y)} B_p^{(x,y)} \qquad (2)$$

where $B_p^{(x,y)}$ is the value of $B_p$ at position $(x, y)$.
The occlusion processing layer 24 comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer; the first sub-layer, the second sub-layer and the third sub-layer of the occlusion processing layer compute the visibility of each part according to formulas (3), (4) and (5) respectively:

$$h_p^1 = \sigma\!\left(g_p^1 s_p^1 + c_p^1\right) \qquad (3)$$
$$h_p^{l+1} = \sigma\!\left((\mathbf{h}^l)^{T}\mathbf{w}_{*,p}^{l} + g_p^{l+1} s_p^{l+1} + c_p^{l+1}\right),\quad l = 1, 2 \qquad (4)$$
$$\tilde{y} = \sigma\!\left((\mathbf{h}^3)^{T}\mathbf{w}_{cls} + b\right) \qquad (5)$$

where $s_p^l$ is the score of the p-th part on the l-th sub-layer of the occlusion processing layer, $g_p^l$ is the weight of $s_p^l$, $c_p^l$ is the bias of $s_p^l$, $h_p^l$ is the visibility of the p-th part on the l-th sub-layer, $\sigma(t) = (1 + \exp(-t))^{-1}$, $\mathbf{W}^l$ is the transfer matrix between $\mathbf{h}^l$ and $\mathbf{h}^{l+1}$, $\mathbf{w}_{*,j}^{l}$ is the j-th column of $\mathbf{W}^l$, $\mathbf{w}_{cls}$ is the parameter of the linear classifier on the hidden variables $\mathbf{h}^3$, $\mathbf{x}^{T}$ is the transpose of $\mathbf{x}$, and $\tilde{y}$ is the output of the convolutional neural network.
The convolutional neural network provided in this embodiment is used to carry out the technical solution of the method embodiment shown in Fig. 3; its implementation and technical effects are similar and are not repeated here.
Fig. 11 is a schematic structural diagram of another embodiment of the convolutional neural network according to the present invention. As shown in Fig. 11, the convolutional neural network 300 of this embodiment comprises a processor 31 and a memory 32 connected by a bus. The memory 32 stores execution instructions; when the convolutional neural network system 300 runs, the processor 31 communicates with the memory 32 and executes the instructions, so that the convolutional neural network 300 performs the target object detection method based on a convolutional neural network system provided by the present invention. In this embodiment the feature extraction layer, the part detection layer, the deformation processing layer, the occlusion processing layer and the classifier of the convolutional neural network may all be implemented by the processor 31, which performs the functions of each layer. Specifically:
the processor 31 controls the feature extraction layer to extract the pixel values of the detection area in the image, pre-process the pixel values of the detection area and perform feature extraction on the pre-processed image to obtain a feature map of the detection area;
the processor 31 controls the part detection layer to detect the feature map of the detection area with M filters and output response maps corresponding to M parts of the detection area, where each filter detects one part and each part corresponds to one response map;
the processor 31 controls the deformation processing layer to determine the deformation of each of the M parts from the corresponding response maps and to determine score maps of the M parts from the deformations of the M parts;
the processor 31 controls the occlusion processing layer to determine the occlusion corresponding to the M parts from the score maps of the M parts;
the processor 31 controls the classifier to determine, from the output of the occlusion processing layer, whether there is a target object in the detection area.
In this embodiment the feature extraction layer comprises three channels, namely a first channel, a second channel and a third channel.
The output data of the first channel is the Y-channel data of the YUV pixel values of the detection area.
The second channel reduces the detection area to one quarter of its original size, converts the reduced detection area into YUV format and filters it with the Sobel edge operator, yielding a first edge map for each of the Y, U and V channels; the maximum value at each position over the three first edge maps forms a second edge map; the three first edge maps and the second edge map have the same size, one quarter of the detection area, and their concatenation is the output data of the second channel.
The third channel reduces the detection area to one quarter of its original size, converts the reduced detection area into YUV format and filters it with the Sobel edge operator, yielding a first edge map for each of the Y, U and V channels, and generates a third edge map whose value is 0 at every position; the three first edge maps and the third edge map have the same size, one quarter of the detection area, and their concatenation is the output data of the third channel.
The part detection layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer; the first sub-layer comprises M1 filters, the second sub-layer comprises M2 filters and the third sub-layer comprises M3 filters, where M1+M2+M3=M. The M1 filters of the first sub-layer detect M1 parts in the detection area and produce M1 response maps; the M2 filters of the second sub-layer detect M2 parts in the detection area and produce M2 response maps; the M3 filters of the third sub-layer detect M3 parts in the detection area and produce M3 response maps.
本实施例中, 形变处理层根据 M个部位对应的响应图分别确定 M个 部位的形变, 并根据 M个部位的形变确定 M个部位的得分图, 具体为: 形变处理层根据 M个部位对应的响应图, 分别按照公式 (1 ) 得到第 P个部位的形变得分图: In this embodiment, the deformation processing layer determines the deformation of the M parts according to the response maps corresponding to the M parts, and determines the score maps of the M parts according to the deformation of the M parts, specifically: the deformation processing layer corresponds to the M parts. The response graph, according to the formula (1), obtains the shape of the Pth part into a subgraph:
Figure imgf000020_0001
其中, 表示第 ρ个部分的形变得分图, l≤p≤M, Mp表示第 p个部 分对应的响应图, N表示第 p个部位的限制条件, D",p表示第 n个限制条 件对应的得分图, 1≤^≤ 0^表示第 n个限制条件对应的权重;
Figure imgf000020_0001
Wherein, the shape indicating the ρth portion becomes a partial graph, l≤p≤M, M p represents a response map corresponding to the pth portion, N represents a restriction condition of the pth portion, D ",p represents an nth constraint condition Corresponding score graph, 1 ≤ ^ ≤ 0^ indicates the weight corresponding to the nth constraint condition;
形变处理层根据形变得分图, 按照公式 (2 ) 确定第 P部位的得分图: The deformation processing layer is divided into graphs according to the shape, and the score map of the Pth portion is determined according to the formula (2):
=maxB( , ( 2 ) 其中, β )表示 (x, y)位置上 的值。  =maxB( , ( 2 ) where β ) represents the value at the position of (x, y).
In this embodiment, the occlusion processing layer includes three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer, and the occlusion processing layer determining the occlusion corresponding to the M parts according to the score maps of the M parts includes:

the occlusion processing layer determines the score maps and the visibility of the M parts on the sub-layers of the occlusion processing layer; the first sub-layer, the second sub-layer and the third sub-layer of the occlusion processing layer compute the visibility of each part according to formulas (3), (4) and (5), respectively:
h_p^1 = \sigma(c_p^1 s_p^1 + g_p^1)          (3)

h_p^{l+1} = \sigma((h^l)^T w_{*,p}^l + c_p^{l+1} s_p^{l+1} + g_p^{l+1}),  l = 1, 2          (4)

\tilde{y} = \sigma((h^3)^T w_{cls} + b)          (5)

where s_p^l denotes the score map of the p-th part on the l-th layer of the occlusion processing layer, c_p^l denotes the weight of s_p^l, g_p^l denotes the bias of s_p^l, h_p^1 denotes the visibility of the p-th part on the first layer of the occlusion processing layer, \sigma(t) = (1 + exp(-t))^{-1}, h_p^l denotes the visibility of the p-th part on the l-th sub-layer of the occlusion processing layer, W^l denotes the transfer matrix between h^l and h^{l+1}, w_{*,p}^l denotes the p-th column of W^l, w_{cls} denotes the parameters of the linear classifier on the hidden variables h^3, X^T denotes the transpose of a matrix X, and \tilde{y} denotes the output result of the convolutional neural network.

The convolutional neural network provided in this embodiment is used to perform the technical solution of the method embodiment shown in FIG. 3; the specific implementation and technical effects are similar and are not described here again.
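As an illustrative aside, the visibility chain of formulas (3) to (5) above could be sketched as follows; the vectorized layout (stacking all parts of one sub-layer into a vector) and the variable names are assumptions made for the example:

import numpy as np

def sigmoid(t):
    # sigma(t) = (1 + exp(-t))^(-1)
    return 1.0 / (1.0 + np.exp(-t))

def occlusion_visibility(s, c, g, W, w_cls, b):
    # s, c, g: per-layer score, weight and bias vectors (lists of length 3);
    # W: transfer matrices between consecutive layers; w_cls, b: classifier.
    h = sigmoid(c[0] * s[0] + g[0])                                  # formula (3)
    for l in range(2):                                               # l = 1, 2
        h = sigmoid(W[l].T @ h + c[l + 1] * s[l + 1] + g[l + 1])     # formula (4)
    return sigmoid(h @ w_cls + b)                                    # formula (5)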
A person of ordinary skill in the art will appreciate that all or part of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium. When the program is executed, the steps of the foregoing method embodiments are performed; and the foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely intended to illustrate the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A target object detection method based on a convolutional neural network, characterized in that the convolutional neural network comprises: a feature extraction layer, a part detection layer, a deformation processing layer, an occlusion processing layer and a classifier;

the feature extraction layer extracts pixel values of a detection area in an image, preprocesses the pixel values of the detection area, and performs feature extraction on the preprocessed image to obtain a feature map of the detection area;

the part detection layer detects the feature map of the detection area with M filters and outputs response maps corresponding to M parts of the detection area, each filter being used to detect one part and each part corresponding to one response map;

the deformation processing layer determines the deformation of each of the M parts according to the response maps corresponding to the M parts, and determines score maps of the M parts according to the deformation of the M parts; the occlusion processing layer determines the occlusion corresponding to the M parts according to the score maps of the M parts;

the classifier determines, according to an output result of the occlusion processing layer, whether a target object is present in the detection area.
2. The method according to claim 1, characterized in that the feature extraction layer extracting pixel values of the detection area in the image and preprocessing the pixel values in the detection area comprises: the feature extraction layer extracting the pixel values of the detection area in the image and converting the pixel values of the detection area into data of three channels, the three channels being a first channel, a second channel and a third channel;

wherein the output data of the first channel corresponds to Y-channel data of the YUV pixel values in the detection area;

the second channel is used to reduce the detection area to a quarter of its original size and convert the reduced detection area into YUV format; the converted detection area is filtered with the Sobel edge operator to obtain a first edge map of the detection area on each of the Y, U and V channels, each of the Y, U and V channels corresponding to one first edge map; the maximum value at each position across the three first edge maps is taken to form a second edge map; the three first edge maps and the second edge map are of the same size, each being a quarter of the size of the detection area; and a concatenation of the three first edge maps and the second edge map is used as the output data of the second channel;

the third channel is used to reduce the detection area to a quarter of its original size and convert the reduced detection area into YUV format; the converted detection area is filtered with the Sobel edge operator to obtain a first edge map of the detection area on each of the Y, U and V channels, each of the Y, U and V channels corresponding to one first edge map; a third edge map whose data at every position is 0 is generated; the three first edge maps and the third edge map are of the same size, each being a quarter of the size of the detection area; and a concatenation of the three first edge maps and the third edge map is used as the output data of the third channel.
3. The method according to claim 2, characterized in that the part detection layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer; the first sub-layer of the part detection layer comprises M1 filters, the second sub-layer of the part detection layer comprises M2 filters, and the third sub-layer of the part detection layer comprises M3 filters, wherein M1+M2+M3=M;

the M1 filters of the first sub-layer of the part detection layer respectively detect M1 parts in the detection area to obtain M1 response maps;

the M2 filters of the second sub-layer of the part detection layer respectively detect M2 parts in the detection area to obtain M2 response maps;

the M3 filters of the third sub-layer of the part detection layer respectively detect M3 parts in the detection area to obtain M3 response maps.
4. The method according to claim 1, characterized in that the deformation processing layer determining the deformation of the M parts according to the response maps corresponding to the M parts and determining the score maps of the M parts according to the deformation of the M parts comprises:

the deformation processing layer obtaining, from the response maps corresponding to the M parts, the deformation score map of the p-th part according to formula (1):

B_p = M_p + \sum_{n=1}^{N} c_{n,p} D_{n,p}          (1)

where B_p denotes the deformation score map of the p-th part, 1 ≤ p ≤ M, M_p denotes the response map corresponding to the p-th part, N denotes the number of constraint conditions of the p-th part, D_{n,p} denotes the score map corresponding to the n-th constraint condition, 1 ≤ n ≤ N, and c_{n,p} denotes the weight corresponding to the n-th constraint condition; and

the deformation processing layer determining, from the deformation score map, the score map of the p-th part according to formula (2):

s_p = \max_{(x,y)} B_p(x,y)          (2)

where B_p(x,y) denotes the value of B_p at position (x, y).
5. The method according to claim 1, characterized in that the occlusion processing layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer, and the occlusion processing layer determining the occlusion corresponding to the M parts according to the score maps of the M parts comprises:

the occlusion processing layer determining score maps and visibility of the M parts on the sub-layers of the occlusion processing layer;

the first sub-layer, the second sub-layer and the third sub-layer of the occlusion processing layer computing the visibility of each part according to formulas (3), (4) and (5), respectively:

h_p^1 = \sigma(c_p^1 s_p^1 + g_p^1)          (3)

h_p^{l+1} = \sigma((h^l)^T w_{*,p}^l + c_p^{l+1} s_p^{l+1} + g_p^{l+1}),  l = 1, 2          (4)

\tilde{y} = \sigma((h^3)^T w_{cls} + b)          (5)

where s_p^l denotes the score map of the p-th part on the l-th layer of the occlusion processing layer, c_p^l denotes the weight of s_p^l, g_p^l denotes the bias of s_p^l, h_p^1 denotes the visibility of the p-th part on the first layer of the occlusion processing layer, \sigma(t) = (1 + exp(-t))^{-1}, h_p^l denotes the visibility of the p-th part on the l-th sub-layer of the occlusion processing layer, W^l denotes the transfer matrix between h^l and h^{l+1}, w_{*,p}^l denotes the p-th column of W^l, w_{cls} denotes the parameters of the linear classifier on the hidden variables h^3, X^T denotes the transpose of a matrix X, and \tilde{y} denotes the output result of the convolutional neural network.
6. A convolutional neural network, characterized by comprising:

a feature extraction layer, configured to extract pixel values of a detection area in an image, preprocess the pixel values of the detection area, and perform feature extraction on the preprocessed image to obtain a feature map of the detection area;

a part detection layer, configured to detect the feature map of the detection area with M filters and output response maps corresponding to M parts of the detection area, each filter being used to detect one part and each part corresponding to one response map;

a deformation processing layer, configured to determine the deformation of each of the M parts according to the response maps corresponding to the M parts, and determine score maps of the M parts according to the deformation of the M parts; an occlusion processing layer, configured to determine the occlusion corresponding to the M parts according to the score maps of the M parts; and

a classifier, configured to determine, according to an output result of the occlusion processing layer, whether a target object is present in the detection area.
7. The convolutional neural network according to claim 6, characterized in that the feature extraction layer comprises three channels, namely a first channel, a second channel and a third channel;

wherein the output data of the first channel corresponds to Y-channel data of the YUV pixel values in the detection area;

the second channel is configured to reduce the detection area to a quarter of its original size and convert the reduced detection area into YUV format; the converted detection area is filtered with the Sobel edge operator to obtain a first edge map of the detection area on each of the Y, U and V channels, each of the Y, U and V channels corresponding to one first edge map; the maximum value at each position across the three first edge maps is taken to form a second edge map; the three first edge maps and the second edge map are of the same size, each being a quarter of the size of the detection area; and a concatenation of the three first edge maps and the second edge map is used as the output data of the second channel;

the third channel is configured to reduce the detection area to a quarter of its original size and convert the reduced detection area into YUV format; the converted detection area is filtered with the Sobel edge operator to obtain a first edge map of the detection area on each of the Y, U and V channels, each of the Y, U and V channels corresponding to one first edge map; a third edge map whose data at every position is 0 is generated; the three first edge maps and the third edge map are of the same size, each being a quarter of the size of the detection area; and a concatenation of the three first edge maps and the third edge map is used as the output data of the third channel.
8. The convolutional neural network according to claim 7, characterized in that the part detection layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer; the first sub-layer of the part detection layer comprises M1 filters, the second sub-layer of the part detection layer comprises M2 filters, and the third sub-layer of the part detection layer comprises M3 filters, wherein M1+M2+M3=M;

the first sub-layer of the part detection layer is configured to respectively detect M1 parts in the detection area with the M1 filters to obtain M1 response maps;

the second sub-layer of the part detection layer is configured to respectively detect M2 parts in the detection area with the M2 filters to obtain M2 response maps;

the third sub-layer of the part detection layer is configured to respectively detect M3 parts in the detection area with the M3 filters to obtain M3 response maps.
9. The convolutional neural network according to claim 8, characterized in that the deformation processing layer is specifically configured to:

obtain, from the response maps corresponding to the M parts, the deformation score map of the p-th part according to formula (1):

B_p = M_p + \sum_{n=1}^{N} c_{n,p} D_{n,p}          (1)

where B_p denotes the deformation score map of the p-th part, 1 ≤ p ≤ M, M_p denotes the response map corresponding to the p-th part, N denotes the number of constraint conditions of the p-th part, D_{n,p} denotes the score map corresponding to the n-th constraint condition, 1 ≤ n ≤ N, and c_{n,p} denotes the weight corresponding to the n-th constraint condition; and

determine, from the deformation score map, the score map of the p-th part according to formula (2):

s_p = \max_{(x,y)} B_p(x,y)          (2)

where B_p(x,y) denotes the value of B_p at position (x, y).
10. The convolutional neural network according to claim 8, characterized in that the occlusion processing layer comprises three sub-layers, namely a first sub-layer, a second sub-layer and a third sub-layer;

the first sub-layer, the second sub-layer and the third sub-layer of the occlusion processing layer compute the visibility of each part according to formulas (3), (4) and (5), respectively:

h_p^1 = \sigma(c_p^1 s_p^1 + g_p^1)          (3)

h_p^{l+1} = \sigma((h^l)^T w_{*,p}^l + c_p^{l+1} s_p^{l+1} + g_p^{l+1}),  l = 1, 2          (4)

\tilde{y} = \sigma((h^3)^T w_{cls} + b)          (5)

where s_p^l denotes the score map of the p-th part on the l-th layer of the occlusion processing layer, c_p^l denotes the weight of s_p^l, g_p^l denotes the bias of s_p^l, h_p^1 denotes the visibility of the p-th part on the first layer of the occlusion processing layer, \sigma(t) = (1 + exp(-t))^{-1}, h_p^l denotes the visibility of the p-th part on the l-th sub-layer of the occlusion processing layer, W^l denotes the transfer matrix between h^l and h^{l+1}, w_{*,p}^l denotes the p-th column of W^l, w_{cls} denotes the parameters of the linear classifier on the hidden variables h^3, X^T denotes the transpose of a matrix X, and \tilde{y} denotes the output result of the convolutional neural network.
PCT/CN2014/081676 2013-11-29 2014-07-04 Convolutional neural network and target object detection method based on same WO2015078185A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310633797.4 2013-11-29
CN201310633797.4A CN104680508B (en) 2013-11-29 2013-11-29 Convolutional neural networks and the target object detection method based on convolutional neural networks

Publications (1)

Publication Number Publication Date
WO2015078185A1 true WO2015078185A1 (en) 2015-06-04

Family

ID=53198302

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/081676 WO2015078185A1 (en) 2013-11-29 2014-07-04 Convolutional neural network and target object detection method based on same

Country Status (2)

Country Link
CN (1) CN104680508B (en)
WO (1) WO2015078185A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017015887A1 (en) * 2015-07-29 2017-02-02 Nokia Technologies Oy Object detection with neural network
CN107122798A (en) * 2017-04-17 2017-09-01 深圳市淘米科技有限公司 Chin-up count detection method and device based on depth convolutional network
CN107423306A (en) * 2016-05-24 2017-12-01 华为技术有限公司 A kind of image search method and device
CN108121986A (en) * 2017-12-29 2018-06-05 深圳云天励飞技术有限公司 Object detection method and device, computer installation and computer readable storage medium
CN108320026A (en) * 2017-05-16 2018-07-24 腾讯科技(深圳)有限公司 Machine learning model training method and device
CN108629226A (en) * 2017-03-15 2018-10-09 纵目科技(上海)股份有限公司 A kind of vehicle checking method and system based on image layered technology
CN111950727A (en) * 2020-08-06 2020-11-17 中科智云科技有限公司 Neural network training and testing method and device for image data
EP3745347A4 (en) * 2018-01-26 2021-12-15 BOE Technology Group Co., Ltd. Image processing method, processing apparatus and processing device
CN114224354A (en) * 2021-11-15 2022-03-25 吉林大学 Arrhythmia classification method, device and readable storage medium

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573731B (en) * 2015-02-06 2018-03-23 厦门大学 Fast target detection method based on convolutional neural networks
WO2017015947A1 (en) 2015-07-30 2017-02-02 Xiaogang Wang A system and a method for object tracking
WO2017151926A1 (en) 2016-03-03 2017-09-08 Google Inc. Deep machine learning methods and apparatus for robotic grasping
CN109074513B (en) 2016-03-03 2020-02-18 谷歌有限责任公司 Deep machine learning method and device for robot gripping
CN105976400B (en) * 2016-05-10 2017-06-30 北京旷视科技有限公司 Method for tracking target and device based on neural network model
CN106127204B (en) * 2016-06-30 2019-08-09 华南理工大学 A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks
CN106295678B (en) 2016-07-27 2020-03-06 北京旷视科技有限公司 Neural network training and constructing method and device and target detection method and device
CN106529569B (en) * 2016-10-11 2019-10-18 北京航空航天大学 Threedimensional model triangular facet feature learning classification method and device based on deep learning
CN106548207B (en) * 2016-11-03 2018-11-30 北京图森未来科技有限公司 A kind of image processing method neural network based and device
CN106778773B (en) * 2016-11-23 2020-06-02 北京小米移动软件有限公司 Method and device for positioning target object in picture
CN106599832A (en) * 2016-12-09 2017-04-26 重庆邮电大学 Method for detecting and recognizing various types of obstacles based on convolution neural network
CN106803247B (en) * 2016-12-13 2021-01-22 上海交通大学 Microangioma image identification method based on multistage screening convolutional neural network
CN106845338B (en) * 2016-12-13 2019-12-20 深圳市智美达科技股份有限公司 Pedestrian detection method and system in video stream
CN108229509B (en) * 2016-12-16 2021-02-26 北京市商汤科技开发有限公司 Method and device for identifying object class and electronic equipment
US10157441B2 (en) 2016-12-27 2018-12-18 Automotive Research & Testing Center Hierarchical system for detecting object with parallel architecture and hierarchical method thereof
CN106845415B (en) * 2017-01-23 2020-06-23 中国石油大学(华东) Pedestrian fine identification method and device based on deep learning
CN109118459B (en) * 2017-06-23 2022-07-19 南开大学 Image salient object detection method and device
CN107609586A (en) * 2017-09-08 2018-01-19 深圳市唯特视科技有限公司 A kind of visual characteristic learning method based on self-supervision
US10664728B2 (en) 2017-12-30 2020-05-26 Wipro Limited Method and device for detecting objects from scene images by using dynamic knowledge base
US10650211B2 (en) 2018-03-28 2020-05-12 Datalogic IP Tech, S.r.l. Artificial intelligence-based machine readable symbol reader
CN109190455B (en) * 2018-07-18 2021-08-13 东南大学 Black smoke vehicle identification method based on Gaussian mixture and autoregressive moving average model
CN109101926A (en) * 2018-08-14 2018-12-28 河南工业大学 Aerial target detection method based on convolutional neural networks
CN109297975A (en) * 2018-08-16 2019-02-01 奇酷互联网络科技(深圳)有限公司 Mobile terminal and detection method, storage device
CN109102543B (en) * 2018-08-17 2021-04-02 深圳蓝胖子机器智能有限公司 Object positioning method, device and storage medium based on image segmentation
CN109284606B (en) * 2018-09-04 2019-08-27 中国人民解放军陆军工程大学 Data flow anomaly detection system based on empirical features and convolutional neural networks
CN110119682A (en) * 2019-04-04 2019-08-13 北京理工雷科电子信息技术有限公司 A kind of infrared remote sensing Image Fire point recognition methods
CN110610475B (en) * 2019-07-07 2021-09-03 河北工业大学 Visual defect detection method of deep convolutional neural network
US11568251B1 (en) * 2020-06-05 2023-01-31 Ambarella International Lp Dynamic quantization for models run on edge devices
CN111931703B (en) * 2020-09-14 2021-01-05 中国科学院自动化研究所 Object detection method based on human-object interaction weak supervision label
CN112488074A (en) * 2020-12-21 2021-03-12 哈尔滨理工大学 Guide area dense crowd counting method based on convolutional neural network


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6038337A (en) * 1996-03-29 2000-03-14 Nec Research Institute, Inc. Method and apparatus for object recognition
CN102034079B (en) * 2009-09-24 2012-11-28 汉王科技股份有限公司 Method and system for identifying faces shaded by eyeglasses
CN101957682B (en) * 2010-09-16 2012-07-18 南京航空航天大学 Method for implementing load identification interactive whiteboard
CN102169544A (en) * 2011-04-18 2011-08-31 苏州市慧视通讯科技有限公司 Face-shielding detecting method based on multi-feature fusion
CN102663409B (en) * 2012-02-28 2015-04-22 西安电子科技大学 Pedestrian tracking method based on HOG-LBP
CN103279759B (en) * 2013-06-09 2016-06-01 大连理工大学 A kind of vehicle front trafficability analytical procedure based on convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5274714A (en) * 1990-06-04 1993-12-28 Neuristics, Inc. Method and apparatus for determining and organizing feature vectors for neural network recognition
WO2009041350A1 (en) * 2007-09-26 2009-04-02 Canon Kabushiki Kaisha Calculation processing apparatus and method
CN101763641A (en) * 2009-12-29 2010-06-30 电子科技大学 Method for detecting contour of image target object by simulated vision mechanism
US20110182469A1 (en) * 2010-01-28 2011-07-28 Nec Laboratories America, Inc. 3d convolutional neural networks for automatic human action recognition
US20110222724A1 (en) * 2010-03-15 2011-09-15 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OUYANG, WANLI ET AL.: "Modeling Mutual Visibility Relationship in Pedestrian Detection", 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 28 June 2013 (2013-06-28), pages 3224 - 3227 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017015887A1 (en) * 2015-07-29 2017-02-02 Nokia Technologies Oy Object detection with neural network
US10614339B2 (en) 2015-07-29 2020-04-07 Nokia Technologies Oy Object detection with neural network
CN107423306B (en) * 2016-05-24 2021-01-29 华为技术有限公司 Image retrieval method and device
CN107423306A (en) * 2016-05-24 2017-12-01 华为技术有限公司 A kind of image search method and device
CN108629226A (en) * 2017-03-15 2018-10-09 纵目科技(上海)股份有限公司 A kind of vehicle checking method and system based on image layered technology
CN108629226B (en) * 2017-03-15 2021-10-22 纵目科技(上海)股份有限公司 Vehicle detection method and system based on image layering technology
CN107122798A (en) * 2017-04-17 2017-09-01 深圳市淘米科技有限公司 Chin-up count detection method and device based on depth convolutional network
CN108320026A (en) * 2017-05-16 2018-07-24 腾讯科技(深圳)有限公司 Machine learning model training method and device
CN108320026B (en) * 2017-05-16 2022-02-11 腾讯科技(深圳)有限公司 Machine learning model training method and device
CN108121986A (en) * 2017-12-29 2018-06-05 深圳云天励飞技术有限公司 Object detection method and device, computer installation and computer readable storage medium
CN108121986B (en) * 2017-12-29 2019-12-17 深圳云天励飞技术有限公司 Object detection method and device, computer device and computer readable storage medium
EP3745347A4 (en) * 2018-01-26 2021-12-15 BOE Technology Group Co., Ltd. Image processing method, processing apparatus and processing device
CN111950727A (en) * 2020-08-06 2020-11-17 中科智云科技有限公司 Neural network training and testing method and device for image data
CN114224354A (en) * 2021-11-15 2022-03-25 吉林大学 Arrhythmia classification method, device and readable storage medium
CN114224354B (en) * 2021-11-15 2024-01-30 吉林大学 Arrhythmia classification method, arrhythmia classification device, and readable storage medium

Also Published As

Publication number Publication date
CN104680508B (en) 2018-07-03
CN104680508A (en) 2015-06-03

Similar Documents

Publication Publication Date Title
WO2015078185A1 (en) Convolutional neural network and target object detection method based on same
Lian et al. Attention guided U-Net for accurate iris segmentation
Seo et al. Attentive semantic alignment with offset-aware correlation kernels
Wen et al. Deep color guided coarse-to-fine convolutional network cascade for depth image super-resolution
US11315266B2 (en) Self-supervised depth estimation method and system
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
Davis et al. A two-stage template approach to person detection in thermal imagery
Li et al. Robust visual tracking based on convolutional features with illumination and occlusion handing
CN103279936B (en) Human face fake photo based on portrait is synthesized and modification method automatically
EP2590111B1 (en) Face recognition apparatus and method for controlling the same
JP5505409B2 (en) Feature point generation system, feature point generation method, and feature point generation program
CN106981077A (en) Infrared image and visible light image registration method based on DCE and LSS
JP2018022360A (en) Image analysis device, image analysis method and program
WO2021069945A1 (en) Method for recognizing activities using separate spatial and temporal attention weights
CN103295010A (en) Illumination normalization method for processing face images
KR20220023323A (en) Automatic multi-organ and tumor contouring system based on artificial intelligence for radiation treatment planning
Sun et al. Super resolution reconstruction of images based on interpolation and full convolutional neural network and application in medical fields
JP2012128638A (en) Image processing device, alignment method and program
CN110111368B (en) Human body posture recognition-based similar moving target detection and tracking method
CN111914749A (en) Lane line recognition method and system based on neural network
Rigamonti et al. Filter learning for linear structure segmentation
Han et al. Locally adaptive contrast enhancement using convolutional neural network
JP2023082065A (en) Method of discriminating objet in image having biometric characteristics of user to verify id of the user by separating portion of image with biometric characteristic from other portion
JP2019220174A (en) Image processing using artificial neural network
CN112926500B (en) Pedestrian detection method combining head and overall information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14866185

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14866185

Country of ref document: EP

Kind code of ref document: A1