CN107767416A

CN107767416A - Method for recognizing pedestrian orientation in low-resolution images

Info

Publication number: CN107767416A
Application number: CN201710791630.9A
Authority: CN
Inventors: 张见威; 温春霞
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2017-09-05
Filing date: 2017-09-05
Publication date: 2018-03-06
Anticipated expiration: 2037-09-05
Also published as: CN107767416B

Abstract

The invention provides a method for recognizing pedestrian orientation in low-resolution images, comprising the following steps: (1) build a pedestrian orientation dataset with labels and generate three kinds of training samples: pedestrian head, leg, and whole body; (2) augment the training dataset by applying mirroring and multi-scale sampling to the images; (3) train with the Caffe framework to obtain the corresponding neural network models; (4) use the neural network models trained in (3) to extract the three-channel features of a pedestrian image; (5) classify with a Softmax classifier to obtain the final orientation recognition result. The invention provides a method for generating a pedestrian orientation image dataset in which operations such as mirroring and multi-scale sampling effectively enlarge a small dataset and reduce overfitting. The convolutional neural network models trained by deep learning to recognize pedestrian orientation extract more discriminative features, and weighting the three-channel features of the pedestrian image makes the orientation classification more accurate and reasonable.

Description

Method for recognizing pedestrian orientation in low-resolution images

Technical field

The present invention relates to the field of image processing, and in particular to a method for recognizing pedestrian orientation in low-resolution images.

Background art

In research fields such as image recognition, segmentation, and reconstruction, the orientation of an object in an image serves as an auxiliary feature and is an important basis for locating key points, describing pose, and judging motion trends. At present, object orientation is obtained mainly from multi-frame image information, motion trajectory information, or camera calibration information, and comparatively little research addresses orientation recognition of objects in a single frame. Tulsiani et al., in "Pose Induction for Novel Object Categories" (IEEE International Conference on Computer Vision, 2015: 64-72), propose an induction-based method that uses a small seed set of labeled object classes to predict the orientation of unlabeled object classes. The method is a pose estimation system based on a CNN (Convolutional Neural Network) that exploits the similarity between objects of different classes to implicitly learn a general representation that may be useful for predicting the orientation of objects of different classes. The "generalized classifier" trained in this way exploits the strengths of convolution in a CNN at extracting diverse features and handling image translation, and it achieves good results on some datasets. However, it depends heavily on the labeled seed set, and it cannot infer orientation well for images that differ substantially from those in the labeled seed set.

In practical application scenarios, because of environmental factors such as camera placement and illumination, pedestrian images typically have diverse viewing angles, severe occlusion, and low resolution, which makes recognizing pedestrian orientation in low-resolution images harder than recognizing object orientation in normal-resolution images. Consequently, the pedestrian image processing methods that currently use pedestrian orientation as an auxiliary feature usually need camera calibration information, pedestrian motion trajectory information, and the like to determine the orientation. The image data actually available, however, often lacks the corresponding calibration and spatio-temporal information, which severely limits these methods in practice. The present invention therefore proposes a method for recognizing pedestrian orientation in low-resolution images that recognizes the orientation of a pedestrian from a single frame.

Summary of the invention

The main object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a method for recognizing pedestrian orientation in low-resolution images. The convolutional neural network models trained by deep learning to recognize pedestrian orientation extract more discriminative features, and weighting the three-channel features of the pedestrian image makes the classification results more accurate and reasonable.

To achieve the above object, the present invention adopts the following technical solution:

The method of the present invention for recognizing pedestrian orientation in low-resolution images comprises the following steps:

(1) build a pedestrian orientation dataset with labels, and generate three kinds of training samples: pedestrian head, leg, and whole body;

(2) augment the training dataset by applying mirroring and multi-scale sampling to the images;

(3) feed the augmented head, leg, and whole-body training samples, together with their orientation labels, separately into the deep learning framework Caffe for training, obtaining the corresponding neural network models;

(4) for an image whose pedestrian orientation is to be recognized, extract the three-channel features of the pedestrian image with the neural network models trained in step (3);

(5) weight the features of the different channels according to prior knowledge of pedestrian image structure, and classify with a Softmax classifier to obtain the final orientation recognition result.

As a preferred technical solution, in step (1), the pedestrian orientation dataset and labels are built as follows:

Four pedestrian orientations are defined in the image: front, back, left, and right. Each pedestrian image is assigned one orientation label. According to prior knowledge of pedestrian image structure and the recognition target, the head, trunk, and leg sub-images are cropped in proportion. The head and leg sub-images are selected and given the same orientation label as the original pedestrian image, producing three kinds of training samples: pedestrian head, leg, and whole body.

As a preferred technical solution, in step (2), the training dataset is augmented by mirroring as follows:

Each image in the dataset is mirrored horizontally, that is, the pixel values on its left and right sides are swapped to obtain the mirror image of the original. For an original image G of size M×N, the pixel value of its mirror image G' at position (i, j) is:

$$G'[i, j] = G[N - i + 1,\ j]$$

For pedestrian images labeled front or back, the mirror image is given the same orientation label as the original. For pedestrian images labeled left or right, the mirror image is given the label right or left, respectively. All mirror images are added to the dataset.

As a preferred technical solution, in step (2), the training dataset is augmented by multi-scale sampling as follows:

Multi-scale linear spatial filtering is applied to the images in the dataset: within the neighborhood of each pixel (x, y) of the pedestrian image, every neighborhood pixel is multiplied by its corresponding coefficient, and the products are summed to give the response at (x, y). The matrix formed by these coefficients is the filter. Adjusting the filter size and values yields filtering results at different scales, which achieves multi-scale sampling. The resulting multi-scale sampled images are given the same orientation label as the original image and added to the dataset.
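The linear spatial filtering described above can be sketched as follows. This is a minimal illustration on a grayscale image stored as a list of pixel rows. The function name `filter_image` is hypothetical, and replicating border pixels is an assumption, since the text does not say how image borders are handled.

```python
def filter_image(image, kernel):
    """Linear spatial filtering of a grayscale image (list of pixel rows).

    Each output pixel is the sum of the neighborhood pixels multiplied by
    the corresponding kernel coefficients.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    off_r, off_c = k_rows // 2, k_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for u in range(k_rows):
                for v in range(k_cols):
                    # Assumption: coordinates are clamped, so border pixels
                    # are replicated outside the image.
                    yy = min(max(y + u - off_r, 0), rows - 1)
                    xx = min(max(x + v - off_c, 0), cols - 1)
                    acc += image[yy][xx] * kernel[u][v]
            out[y][x] = acc
    return out
```

Changing the size and values of `kernel` changes the scale of the result, which is the sense in which the text speaks of multi-scale sampling.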

As a preferred technical solution, in step (2), the specific operations for augmenting the training dataset by multi-scale sampling are as follows:

Filtering results at different scales are obtained by adjusting the filter size and values, which achieves multi-scale sampling. The filter f is built from motion blur: a motion blur with blur length len and blur angle θ represents a movement of len pixels in the direction at angle θ to the horizontal. The filter f is constructed as follows:

(2-1) construct a two-dimensional matrix f whose size is just large enough to hold a line segment l of length len and slope tan θ; assuming the matrix size is a×b, then a = len*cos θ + 1 and b = len*sin θ;

(2-2) for each position (i, j) in matrix f, compute the minimum distance N_D between that position and line segment l:

$$N_D = j\cos\theta - i\sin\theta$$

(2-3) compute the coefficient at (i, j) from the minimum distance N_D:

$$f(i, j) = \max(1 - N_D,\ 0)$$

(2-4) normalize f:

$$f(i, j) = \frac{f(i, j)}{\sum_{i, j} f(i, j)}$$

Using several pixel sizes as the blur length and several angles, different filters are built and applied to the images. The resulting multi-scale sampled images are added to the dataset, each with the same label as its original image.

As a preferred technical solution, in step (4), the three-channel features of the pedestrian image are extracted as follows:

The features of the pedestrian head, leg, and whole-body images are defined as the pedestrian three-channel features. The cropped head, leg, and whole-body images of the pedestrian whose orientation is to be detected are fed into the convolutional neural networks trained in step (3) for head, leg, and whole-body images, respectively, and the recognition features of the three channels (head, leg, and whole body) are extracted.

As a preferred technical solution, in step (5), the features of the different channels are weighted by the following formula:

$$f(X) = \omega_1 f(\mathrm{head}) + \omega_2 f(\mathrm{body}) + \omega_3 f(\mathrm{leg})$$

where f(head), f(body), and f(leg) are the pedestrian head, whole-body, and leg image features extracted by the corresponding neural network models; ω_i, i = 1, 2, 3, are the weight coefficients; and f(X) is the final feature used to determine pedestrian orientation.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

(1) The invention provides a method for generating a pedestrian orientation image dataset. Operations such as mirroring and multi-scale sampling effectively enlarge a small dataset, reduce overfitting, and improve robustness when processing low-resolution images.

(2) The invention proposes pedestrian image three-channel features. Extracting important structural features of the pedestrian (head, leg, and whole body) with the trained convolutional neural network models yields more of the information that determines pedestrian orientation in the image, and provides necessary and sufficient information for recognizing and classifying pedestrian orientation in low-resolution images.

(3) The convolutional neural networks trained by deep learning to recognize pedestrian orientation extract more discriminative features, and weighting the different features makes the classification results more accurate and reasonable.

(4) By recognizing pedestrian orientation, the invention can better handle viewpoint changes in pedestrian images, which helps higher-level applications such as pedestrian pose recognition and re-identification.

Brief description of the drawings

Fig. 1 is the overall flowchart of the main steps of the disclosed method for recognizing pedestrian orientation in low-resolution images.

Fig. 2 is the network structure of deep learning framework used in the present invention.

Detailed description

The present invention is described in further detail below with reference to an embodiment and the accompanying drawings, but embodiments of the invention are not limited thereto.

Embodiment

The flow of the invention is shown in Fig. 1. The technical solution is as follows: first build a pedestrian orientation image dataset with labels and generate three kinds of training samples (pedestrian head, leg, and whole body); augment the training dataset by mirroring and multi-scale sampling; then train on the three kinds of samples separately with the deep learning framework Caffe; use the three trained convolutional neural networks to extract the head, leg, and whole-body three-channel features of the pedestrian image to be examined; weight the features of the different channels according to prior knowledge of pedestrian image structure; and classify with a Softmax classifier.

This involves the following main technical points:

1. Building the pedestrian orientation dataset and labels

The invention defines four pedestrian orientations in an image: front, back, left, and right. The training set contains 300 pedestrian images per orientation, 1200 images in total. The test set contains 50 pedestrian images per orientation, 200 images in total. Each pedestrian image is assigned the corresponding orientation label. According to prior knowledge of pedestrian image structure and the recognition target, the head, trunk, and leg sub-images are cropped in the ratio 1:2:3. The head and leg sub-images are given the same label as the original pedestrian image, producing three kinds of training samples: pedestrian head, leg, and whole body.
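The 1:2:3 cropping can be sketched as follows. This is a minimal illustration that assumes the ratio is applied to the image height from top to bottom (head, then trunk, then leg) and that an image is a list of pixel rows. The function names `split_pedestrian` and `make_samples` are hypothetical.

```python
def split_pedestrian(image, ratio=(1, 2, 3)):
    """Split a pedestrian image (list of pixel rows) into head, trunk and
    leg sub-images whose heights follow `ratio` from top to bottom."""
    height = len(image)
    total = sum(ratio)
    head_end = round(height * ratio[0] / total)
    trunk_end = round(height * (ratio[0] + ratio[1]) / total)
    return image[:head_end], image[head_end:trunk_end], image[trunk_end:]


def make_samples(image, label):
    """Return the three kinds of training samples for one pedestrian image.

    The head and leg sub-images inherit the label of the whole image;
    the trunk sub-image is cropped but not used as a training sample.
    """
    head, _trunk, leg = split_pedestrian(image)
    return {"head": (head, label), "leg": (leg, label), "whole": (image, label)}
```

For an image 12 rows high, this gives a head of 2 rows, a trunk of 4 rows, and a leg region of 6 rows.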

2. Mirroring

Each image in the dataset is mirrored horizontally. Given an original image G of size M×N, the pixel value of the mirror image G' at position (i, j) is computed as:

$$G'[i, j] = G[N - i + 1,\ j]$$

For pedestrian images labeled front or back, the mirror image is given the same label as the original. For pedestrian images labeled left or right, the mirror image is given the label right or left, respectively. All mirror images are added to the dataset, which doubles the training dataset to 2400 images.
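Mirroring with label swapping can be sketched as follows. This is a minimal illustration that assumes images are lists of pixel rows and labels are the strings "front", "back", "left", and "right". The names `MIRRORED_LABEL`, `mirror_image`, and `augment_with_mirrors` are hypothetical.

```python
# Label assigned to the mirror image for each original label.
MIRRORED_LABEL = {"front": "front", "back": "back", "left": "right", "right": "left"}


def mirror_image(image):
    """Horizontal mirror: every row is reversed, so the pixel at horizontal
    position i (1-based, N columns) comes from position N - i + 1."""
    return [row[::-1] for row in image]


def augment_with_mirrors(dataset):
    """Return the dataset followed by the mirror of every (image, label) pair."""
    mirrored = [(mirror_image(img), MIRRORED_LABEL[label]) for img, label in dataset]
    return dataset + mirrored
```

Applied to the 1200 training images of the embodiment, `augment_with_mirrors` would return 2400 labeled images.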

3. Multi-scale sampling

Filtering results at different scales are obtained by adjusting the filter size and values, which achieves multi-scale sampling. In this embodiment, to reflect the relative motion between pedestrian and camera, the filter f is built from motion blur. A motion blur with blur length len and blur angle θ represents a movement of len pixels in the direction at angle θ to the horizontal. The filter f is constructed as follows:

(1) construct a two-dimensional matrix f whose size is just large enough to hold a line segment l of length len and slope tan θ; assuming the matrix size is a×b, then a = len*cos θ + 1 and b = len*sin θ;

(2) for each position (i, j) in matrix f, compute the minimum distance N_D between that position and line segment l:

$$N_D = j\cos\theta - i\sin\theta$$

(3) compute the coefficient at (i, j) from the minimum distance N_D:

$$f(i, j) = \max(1 - N_D,\ 0)$$

(4) normalize f:

$$f(i, j) = \frac{f(i, j)}{\sum_{i, j} f(i, j)}$$

With a blur length of 9 pixels and the five angles -10°, -5°, 0°, 5°, and 10°, five different filters are built and applied to the images. The resulting multi-scale sampled images are added to the dataset, each with the same label as its original image.

After the multi-scale sampled images are added, the training set grows sixfold, from 2400 to 14400 images, forming the final training dataset.
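The motion-blur filter construction can be sketched as follows. This is a minimal illustration rather than the exact procedure of the embodiment: it assumes an odd-sized grid centered on the line segment, takes the absolute value of the point-to-line distance so that the coefficients are symmetric about the line, and measures positive angles toward increasing row index. The function name `motion_blur_kernel` is hypothetical.

```python
import math


def motion_blur_kernel(length, angle_deg):
    """Build a normalized motion-blur kernel for the given blur length
    (in pixels) and blur angle (in degrees from the horizontal)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half = (length - 1) / 2
    # Assumption: the grid is centered on the segment; the small epsilon
    # keeps rounding noise from adding an extra row or column.
    half_w = max(math.ceil(half * abs(cos_t) - 1e-9), 0)
    half_h = max(math.ceil(half * abs(sin_t) - 1e-9), 0)
    kernel = []
    for y in range(-half_h, half_h + 1):
        row = []
        for x in range(-half_w, half_w + 1):
            # Distance from (x, y) to the line through the center at angle theta.
            dist = abs(y * cos_t - x * sin_t)
            row.append(max(1.0 - dist, 0.0))
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


# Settings of the embodiment: blur length 9, five angles.
ANGLES = (-10, -5, 0, 5, 10)
KERNELS = [motion_blur_kernel(9, angle) for angle in ANGLES]

# 2400 mirrored images plus one filtered copy per angle.
AUGMENTED_SIZE = 2400 * (1 + len(ANGLES))
```

At an angle of 0° the kernel reduces to a single row of nine equal coefficients, and `AUGMENTED_SIZE` reproduces the 14400 images stated in the text.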

4. Training the convolutional neural networks

The invention trains convolutional neural networks with the deep learning framework Caffe to extract pedestrian orientation features. Caffe is a clear and efficient deep learning framework that provides the layer types needed for user-defined models and is easy to extend to new tasks and settings. Caffe has five main components: Solver, Net, Layer, Blob, and Proto. The Solver is responsible for training the deep network, and each Solver contains one training network object and one test network object. A Net is composed of several Layers, and a Layer defines details such as convolution and pooling. The input and output of each Layer are stored in Blob matrix structures. Proto is a data interchange format used to define, store, and read the structure of network models. The implementation details of training the convolutional neural networks for pedestrian orientation recognition with Caffe are as follows:

(1) First, the images of the three sub-image sets in the dataset (pedestrian head, leg, and whole body) are converted to lmdb format, and images of different orientations are shuffled. The mean image of each sub-image set is then computed, and the labeled sub-image set together with its mean image is used as the Caffe input for training.

(2) As shown in Fig. 2, the network uses the structure provided in Caffe for training on ImageNet, comprising five convolutional layers and three fully connected layers. The first convolutional layer contains 96 kernels of size 11*11*3. The second convolutional layer takes the output of the first as its input and convolves it with 256 kernels of size 5*5*48. The outputs of these two convolutional layers both undergo max pooling and LRN normalization. The following three convolutional layers, which are not followed by pooling or normalization, contain 384 kernels of size 3*3*256, 384 kernels of size 3*3*192, and 256 kernels of size 3*3*192, respectively. The three fully connected layers have 4096, 4096, and 4 neurons, respectively.

(3) During training, the optimal parameters are solved with stochastic gradient descent. A test pass is run every 500 training iterations, the learning rate is 0.001, the maximum number of iterations is 20000, and training is performed on a GPU.
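The three training details above can be summarized in a short sketch. It covers shuffling and the mean image from (1), the layer sizes from (2), and the solver values from (3), and it omits the lmdb conversion. Several items are assumptions: the layer names other than fc7 follow the naming of the Caffe reference ImageNet model, the file name `train_val.prototxt` is hypothetical, and `lr_policy: "fixed"` is assumed because the text gives only a single learning rate. The solver keys are fields of Caffe's solver definition, while the helper function names are hypothetical.

```python
import random

# Layer sizes as stated in the text: (name, kernel count, kernel shape).
CONV_LAYERS = [
    ("conv1", 96, (11, 11, 3)),
    ("conv2", 256, (5, 5, 48)),
    ("conv3", 384, (3, 3, 256)),
    ("conv4", 384, (3, 3, 192)),
    ("conv5", 256, (3, 3, 192)),
]
FC_LAYERS = [("fc6", 4096), ("fc7", 4096), ("fc8", 4)]

# Solver values from the text, plus two labeled assumptions.
SOLVER_SETTINGS = {
    "net": '"train_val.prototxt"',  # hypothetical file name
    "type": '"SGD"',                # stochastic gradient descent
    "base_lr": 0.001,
    "lr_policy": '"fixed"',         # assumption: no schedule is given
    "test_interval": 500,
    "max_iter": 20000,
    "solver_mode": "GPU",
}


def shuffle_samples(samples, seed=0):
    """Return the (image, label) pairs in a reproducible random order."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def mean_image(images):
    """Pixel-wise average of equally sized grayscale images."""
    count = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / count for c in range(cols)]
            for r in range(rows)]


def conv_weight_count(num_kernels, kernel_shape):
    """Number of weights in a convolutional layer, biases excluded."""
    height, width, depth = kernel_shape
    return num_kernels * height * width * depth


def render_solver(settings):
    """Render the settings as `key: value` lines in solver text format."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())
```

A real solver file would also need a test batch count (`test_iter`), which the text does not state, so it is left out here.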

5. Extracting features

The cropped head, leg, and whole-body images of the pedestrian whose orientation is to be detected are fed into the convolutional neural networks trained in step (3) for head, leg, and whole-body images, respectively, and the 4096-dimensional output vector of the fc7 layer of each network is taken as the feature of that channel.

6. Weighted classification

The features of the different channels are weighted by the following formula:

$$f(X) = \omega_1 f(\mathrm{head}) + \omega_2 f(\mathrm{body}) + \omega_3 f(\mathrm{leg})$$

where f(head), f(body), and f(leg) are the pedestrian head, whole-body, and leg features extracted by the convolutional neural networks trained in step (3); ω_i, i = 1, 2, 3, are the weight coefficients, set according to prior knowledge of pedestrian image structure to ω₁ = 1/6, ω₂ = 1/2, and ω₃ = 1; and f(X) is the final feature used to determine pedestrian orientation.
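The weighted combination can be sketched as follows. This is a minimal illustration that assumes each channel feature is a list of floats of equal length (4096 values in the embodiment) and uses the weights stated in the text. The names `WEIGHTS` and `fuse_features` are hypothetical.

```python
# Weights as stated in the text: head 1/6, whole body 1/2, leg 1.
WEIGHTS = {"head": 1 / 6, "body": 1 / 2, "leg": 1.0}


def fuse_features(head, body, leg, weights=WEIGHTS):
    """Element-wise weighted sum of the head, whole-body and leg features."""
    return [
        weights["head"] * h + weights["body"] * b + weights["leg"] * l
        for h, b, l in zip(head, body, leg)
    ]
```

For example, `fuse_features([6.0, 0.0], [2.0, 4.0], [1.0, 1.0])` combines three two-dimensional toy features into one fused feature of the same length.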

The invention classifies with a Softmax classifier, which is suited to multi-class problems in which the class label can take more than two values. In the pedestrian orientation recognition task, the class label y⁽ⁱ⁾ ∈ {0, 1, 2, 3} represents {front, back, left, right}. In Softmax regression, the probability that the input feature f(X) is classified as class j (j = 0, 1, 2, 3) is:

$$P\left(y^{(i)} = j \mid f(X); \theta\right) = \frac{e^{\theta_j^{T} f(X)}}{\sum_{l=1}^{k} e^{\theta_l^{T} f(X)}}$$

where θ_l, l = 1, ..., k, are the parameters of the classifier model obtained when training the convolutional neural networks. The class with the largest probability is the recognition result.
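The classification step can be sketched as follows. This is a minimal illustration of standard Softmax regression. It assumes one parameter vector per class, stored as a list of floats, with classes ordered front, back, left, right. The names `LABELS`, `softmax_probabilities`, and `classify` are hypothetical.

```python
import math

LABELS = ("front", "back", "left", "right")


def softmax_probabilities(feature, thetas):
    """Return the probability of each class for one feature vector.

    `thetas` holds one parameter vector per class; the score of a class
    is the dot product of its parameter vector with the feature.
    """
    scores = [sum(t * x for t, x in zip(theta, feature)) for theta in thetas]
    peak = max(scores)  # subtracted for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify(feature, thetas):
    """Return the label of the class with the largest probability."""
    probs = softmax_probabilities(feature, thetas)
    return LABELS[probs.index(max(probs))]
```

Subtracting the largest score before exponentiating leaves the probabilities unchanged and avoids overflow for large feature values.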

The embodiment described above is a preferred embodiment of the invention, but embodiments of the invention are not limited by it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the invention is an equivalent replacement and falls within the scope of protection of the invention.

Claims

1. A method for recognizing pedestrian orientation in low-resolution images, characterized in that it comprises the following steps:

(1) build a pedestrian orientation dataset with labels, and generate three kinds of training samples: pedestrian head, leg, and whole body;

(2) augment the training dataset by applying mirroring and multi-scale sampling to the images;

(3) feed the augmented head, leg, and whole-body training samples, together with their orientation labels, separately into the deep learning framework Caffe for training, obtaining the corresponding neural network models;

(4) for an image whose pedestrian orientation is to be recognized, extract the three-channel features of the pedestrian image with the neural network models trained in step (3);

(5) weight the features of the different channels according to prior knowledge of pedestrian image structure, and classify with a Softmax classifier to obtain the final orientation recognition result.

2. The method for recognizing pedestrian orientation in low-resolution images according to claim 1, characterized in that, in step (1), the pedestrian orientation dataset and labels are built as follows:

Four pedestrian orientations are defined in the image: front, back, left, and right. Each pedestrian image is assigned one orientation label. According to prior knowledge of pedestrian image structure and the recognition target, the head, trunk, and leg sub-images are cropped in proportion. The head and leg sub-images are selected and given the same orientation label as the original pedestrian image, producing three kinds of training samples: pedestrian head, leg, and whole body.

3. The method for recognizing pedestrian orientation in low-resolution images according to claim 1, characterized in that, in step (2), the training dataset is augmented by mirroring as follows:

Each image in the dataset is mirrored horizontally, that is, the pixel values on its left and right sides are swapped to obtain the mirror image of the original. For an original image G of size M×N, the pixel value of its mirror image G' at position (i, j) is:

$$G'[i, j] = G[N - i + 1,\ j]$$

For pedestrian images labeled front or back, the mirror image is given the same orientation label as the original. For pedestrian images labeled left or right, the mirror image is given the label right or left, respectively. All mirror images are added to the dataset.

4. The method for recognizing pedestrian orientation in low-resolution images according to claim 1, characterized in that, in step (2), the training dataset is augmented by multi-scale sampling as follows:

Multi-scale linear spatial filtering is applied to the images in the dataset: within the neighborhood of each pixel (x, y) of the pedestrian image, every neighborhood pixel is multiplied by its corresponding coefficient, and the products are summed to give the response at (x, y). The matrix formed by these coefficients is the filter. Adjusting the filter size and values yields filtering results at different scales, which achieves multi-scale sampling. The resulting multi-scale sampled images are given the same orientation label as the original image and added to the dataset.

5. The method for recognizing pedestrian orientation in low-resolution images according to claim 4, characterized in that, in step (2), the specific operations for augmenting the training dataset by multi-scale sampling are as follows:

Filtering results at different scales are obtained by adjusting the filter size and values, which achieves multi-scale sampling. The filter f is built from motion blur: a motion blur with blur length len and blur angle θ represents a movement of len pixels in the direction at angle θ to the horizontal. The filter f is constructed as follows:

(2-1) construct a two-dimensional matrix f whose size is just large enough to hold a line segment l of length len and slope tan θ; assuming the matrix size is a×b, then a = len*cos θ + 1 and b = len*sin θ;

(2-2) for each position (i, j) in matrix f, compute the minimum distance N_D between that position and line segment l:

$$N_D = j\cos\theta - i\sin\theta$$

(2-3) compute the coefficient at (i, j) from the minimum distance N_D:

$$f(i, j) = \max(1 - N_D,\ 0)$$

(2-4) normalize f:

$$f(i, j) = \frac{f(i, j)}{\sum_{i, j} f(i, j)}$$

Using several pixel sizes as the blur length and several angles, different filters are built and applied to the images. The resulting multi-scale sampled images are added to the dataset, each with the same label as its original image.

6. The method for recognizing pedestrian orientation in low-resolution images according to claim 1, characterized in that, in step (4), the three-channel features of the pedestrian image are extracted as follows:

The features of the pedestrian head, leg, and whole-body images are defined as the pedestrian three-channel features. The cropped head, leg, and whole-body images of the pedestrian whose orientation is to be detected are fed into the convolutional neural networks trained in step (3) for head, leg, and whole-body images, respectively, and the recognition features of the three channels (head, leg, and whole body) are extracted.

7. The method for recognizing pedestrian orientation in low-resolution images according to claim 1, characterized in that, in step (5), the features of the different channels are weighted by the following formula:

$$f(X) = \omega_1 f(\mathrm{head}) + \omega_2 f(\mathrm{body}) + \omega_3 f(\mathrm{leg})$$

where f(head), f(body), and f(leg) are the pedestrian head, whole-body, and leg image features extracted by the corresponding neural network models; ω_i, i = 1, 2, 3, are the weight coefficients; and f(X) is the final feature used to determine pedestrian orientation.