CN110046599A

CN110046599A - Intelligent control method based on depth integration neural network pedestrian weight identification technology

Info

Publication number: CN110046599A
Application number: CN201910330924.0A
Authority: CN
Inventors: 梁子; 华如照; 张越; 迟剑宁; 王文浩
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2019-04-23
Filing date: 2019-04-23
Publication date: 2019-07-23

Abstract

The present invention provides a kind of intelligent control method based on depth integration neural network pedestrian weight identification technology.After the method for the present invention carries out the pretreatment of color enhancement to the image of acquisition, traditional-handwork feature is extracted to it and depth residual error convolutional neural networks extract image, the traditional-handwork feature after dimensionality reduction and the depth characteristic for training intact neural network to extract are merged later, complete the identification to target pedestrian.Compared with prior art, the present invention accuracy of identification greatly improves, the pedestrian's weight recognizer for including in the present invention promotes recognition success rate to 81.74%, so that the technology completely can be practical.The process that pedestrian identifies again is to be automatically performed, and will not lose details because of personnel's fatigue.

Description

Intelligent control method based on depth integration neural network pedestrian weight identification technology

Technical field

The present invention relates to intelligent monitoring technology fields, more particularly to one kind to be identified again based on depth integration neural network pedestrian The intelligent control method of technology.

Background technique

Pedestrian identify again also referred to as pedestrian identify again, be judged using computer vision technique be in image or video sequence It is no that there are the technologies of specific pedestrian.Due to the difference between different picture pick-up devices, pedestrian has both characteristic rigid and flexible, appearance Vulnerable to dress, scale, block, posture and visual angle etc. influence, therefore, pedestrian identify again as one in computer vision field both With researching value extremely challenging heat subject again.

Currently, there are mainly two types of the research methods that pedestrian identifies again: method based on manual feature and being based on depth characteristic Method.Method based on manual feature will usually find a kind of good feature of robustness to solve illumination and visual angle change band The influence come, this method are mostly combined with metric learning；And the method based on depth characteristic is then obtained from training data emphatically To the adaptive structure for being matched with label, to realize the identification end-to-end to target pedestrian.In recent years, convolutional neural networks mention The depth characteristic taken has been proved to have good robustness.From pedestrian is introduced within 2014 again behind identification field, based on volume The depth characteristic study of product neural network and depth distance measurement become the mainstream that pedestrian identifies again.Hereafter pedestrian identifies calculation again Fado is to have carried out the improvement in structure to it, including Weihua Chen in proposition in 2017 based on triplet loss's Intersect neural network, the singular value neural network (SVDNet) based on singular value decomposition that Yifan Sun was proposed in 2017 and SpindleNet based on human synovial structure feature.These algorithms have original performance, but still have in robustness It is to be hoisted.2016, Zheng Weishi proposed to be identified again in such a way that manual feature and depth characteristic merge.The algorithmic derivation Loss function also demonstrates constraint of the traditional-handwork feature to neural network parameter to the back-propagation process of each parameter, But the neural network is difficult to train in practical applications, convergence rate it is difficult to ensure that, and by hand feature and depth characteristic It couples very faint.

Summary of the invention

According to technical problem set forth above, and provide a kind of based on depth integration neural network pedestrian's again identification technology Intelligent monitor system.The present invention sets about in terms of manual feature and depth characteristic two, to effectively improve matching rate.The present invention adopts Technological means is as follows:

As shown in Figure 1, a kind of intelligent control method based on depth integration neural network pedestrian weight identification technology, including such as Lower step:

S1, image is obtained frame by frame from monitor video；

S2, the pretreatment that color enhancement is carried out to image send it to the local maximum for extracting manual feature later In the neural network of structure and extraction depth characteristic；

S3A, traditional-handwork feature is extracted, specifically: feature vector is extracted using local maxima pond algorithm, using default The grid of scale traverses whole picture with a fixed step size, and to the image zooming-out color characteristic and textural characteristics in each grid, Then dimension-reduction treatment is carried out to entire feature vector；

S3B, image is extracted using depth residual error convolutional neural networks and Gauss pond；

S4, the traditional-handwork feature after dimensionality reduction and the depth characteristic for training intact neural network to extract are melted It closes, completes the identification to target pedestrian；

S5, the pedestrian recognized is tracked, and repeats above-mentioned stream when tracking creditability is reduced to some threshold value Cheng Jinhang identifies again, meanwhile, the mark for carrying out the preset style to the result of identification is shown.

Further, the pretreatment of the color enhancement specifically: use three respectively using multi-scale image enhancing algorithm The Gaussian parameter of kind different scale carries out color enhancement to obtained image.

Further, in the extraction of the traditional-handwork feature, the extraction of color characteristic include: will carry out it is pretreated Image is converted into hsv color space by RGB, and obtains the color histogram of image on this basis；The extraction of textural characteristics has Body is to be extracted using three value pattern-coding technologies of Scale invariant, the dimension-reduction treatment specifically: using Gauss pond, by data Stripping and slicing is carried out with default size, the moment of the orign and central moment for obtaining stripping and slicing data respectively later are to indicate data in block, thus real Now to the dimensionality reduction of entire feature vector.

Further, the S3B specifically: dynamic fuzzy neural network cuts off pretreated image at random, later It readjusts to pre-set dimension, and is passed in deep neural network ResNet50, input picture passes sequentially through convolutional layer, part The processing of layer, the linear activation primitive of amendment, maximum pond is normalized, enters convolution module later and carries out dimension-reduction treatment, every process One dimensionality reduction module, the size of image can all reduce 2*2.

Further, the S4 specifically: be separately connected the defeated of two vectors using the full connection fused layer of 4096 dimensions Out, the operations such as regularization, batch normalization, nonlinear activation are carried out later, classifier are finally coupled to, using softmax letter Number receives result and provides predicted value, to complete the identification to target pedestrian.

Further, the step S4 specifically comprises the following steps:

S41, the traditional-handwork feature after dimensionality reduction and the depth characteristic for training intact neural network to extract are carried out Fusion, total character representation after fusion are as follows:

F_z=[F_h, F_d]

Wherein, F_dFor the depth characteristic vector obtained in the ResNet50 model, F_hFor by after the dimensionality reduction of Gauss pond Traditional-handwork feature；

S42, traditional-handwork feature is connected in splicing layer, after being sequentially connected batch normalization, nonlinear activation function The attended operation for carrying out classifier, is embodied as:

H () represents activation primitive in formula,And b_fIndicate the weight coefficient and migration vector of articulamentum,

μ_zAnd σ_zThe characteristic for representing mean value and variance, is connected to classifier later, is used herein as the conduct point of softmax structure One layer of class, specifically:

Wherein, x indicates that the vector that the output of one layer of neural network is constituted, θ are the parameter vector obtained by training, point Female presence is in order to which prediction output is normalized.

Using cross entropy loss function, it is J that note, which intersects entropy loss, then has

Wherein, p_kIndicate that the corresponding softmax output valve of k-th of output (also is understood as neural network prediction as k knot The probability of fruit).

Further, the training of model is based on gradient and freezes coaching method and be trained, and specifically comprises the following steps:

Training ResNet50 depth residual error neural network and the full articulamentum of 2048 dimensions being correspondingly connected with point on data set Class device；

After training to model convergence, the parameter of trained ResNet50 is imported in depth integration neural network, weight New connection new full link sort and softmax network, then be trained on identical data set；It, will in training Lower learning rate, emphasis training fusion ring is arranged in the network parameter for the extraction depth characteristic crossed using pre-training model initialization The weight parameter of section and classifier, is finally completed the training to entire converged network.

Further, the data set is specially that Market1501 pedestrian identifies data set again.

The invention has the following advantages that

1, compared with prior art, the present invention accuracy of identification greatly improves, by depth characteristic ResNet50 and manual feature Its accuracy of depth integration neural network after LOMO is combined improves 30% using ResNet50 than simple, the present invention In include pedestrian's weight recognizer recognition success rate is promoted to 81.74% so that the technology completely can be practical.

2, the present invention using gradient freeze the training method time reduce 10 interphases so that fused neural network in training Temporal disadvantage is eased, and improves convergent speed on the basis of guaranteeing accuracy rate.

3, the present invention only need on the basis of artificial monitoring installation one can mouse-based operation program, user is without being entangled with The algorithm details being related to inside technology is identified and is tracked automatically to pedestrian, thus is used relatively simple.

4, the degree of automation of security field is improved.Conventional security monitoring needs a large amount of manpower to carry out naked eyes investigation, Certain personnel that solve a case even will intently, and this technology can be by this process automation, and all processes are automatic complete At, will not because personnel fatigue and lose details.

The present invention can be widely popularized in intelligent monitoring technology field based on the above reasons.

Detailed description of the invention

In order to more clearly explain the embodiment of the invention or the technical proposal in the existing technology, to embodiment or will show below There is attached drawing needed in technical description to do simply to introduce, it should be apparent that, the accompanying drawings in the following description is this hair Bright some embodiments for those of ordinary skill in the art without any creative labor, can be with It obtains other drawings based on these drawings.

Fig. 1 is a kind of intelligent control method process based on depth integration neural network pedestrian weight identification technology of the present invention Figure.

Fig. 2 is LOMO of the embodiment of the present invention and the specific design flow diagram of ResNet50.

Fig. 3 is depth integration of embodiment of the present invention detail figure.

Fig. 4 is that gradient of the embodiment of the present invention freezes coaching method superiority schematic diagram.

Fig. 5 is the comparison diagram of the embodiment of the present invention and other methods detection.

Specific embodiment

In order to make the object, technical scheme and advantages of the embodiment of the invention clearer, below in conjunction with the embodiment of the present invention In attached drawing, technical scheme in the embodiment of the invention is clearly and completely described, it is clear that described embodiment is A part of the embodiment of the present invention, instead of all the embodiments.Based on the embodiments of the present invention, those of ordinary skill in the art Every other embodiment obtained without making creative work, shall fall within the protection scope of the present invention.

As shown in Figure 1, present embodiments providing a kind of intelligence based on depth integration neural network pedestrian weight identification technology Monitoring method includes the following steps:

S1, it obtains image frame by frame from monitor video, specifically, reads the video flowing in camera and obtain letter frame by frame Breath；Realize that obtain the image tensor of each frame, data are stayed for the reading to video flowing using the Python encapsulation of opencv It is standby to use.

S2, the pretreatment that color enhancement is carried out to image send it to the local maximum for extracting manual feature later In the neural network of structure and extraction depth characteristic；The pretreatment of the color enhancement specifically: enhanced using multi-scale image Algorithm carries out color enhancement to obtained image using the Gaussian parameter of three kinds of different scales respectively.

S3A, traditional-handwork feature is extracted, specifically: feature vector is extracted using local maxima pond algorithm, using default The grid of scale traverses whole picture with a fixed step size, and to the image zooming-out color characteristic and textural characteristics in each grid, Then dimension-reduction treatment is carried out to entire feature vector；In the extraction of the traditional-handwork feature, the extraction of color characteristic include: by It carried out pretreated image and hsv color space is converted by RGB, and obtain the color histogram of image on this basis；Line The extraction for managing feature is specially to be extracted using three value pattern-coding technologies of Scale invariant, the dimension-reduction treatment specifically: use Data are carried out stripping and slicing with default size by Gauss pond, and the moment of the orign and central moment for obtaining stripping and slicing data respectively later are to indicate Data in block, to realize the dimensionality reduction to entire feature vector.

S3B, image is extracted using depth residual error convolutional neural networks and Gauss pond；Specifically, dynamic fuzzy Neural network cuts off pretreated image at random, readjusts later to pre-set dimension, and is passed to deep neural network In ResNet50, input picture passes sequentially through convolutional layer, part normalization layer, the place for correcting linear activation primitive, maximum pond Reason enters convolution module later and carries out dimension-reduction treatment, and every to pass through a dimensionality reduction module, the size of image can all reduce 2*2.

As shown in Figure 2 and Figure 3, S4, by after dimensionality reduction traditional-handwork feature and training intact neural network extract Depth characteristic is merged, and the identification to target pedestrian is completed；Specifically, connected respectively using the full connection fused layer of 4096 dimensions The output of two vectors is connect, the operations such as regularization, batch normalization, nonlinear activation is carried out later, is finally coupled to classifier, Result is received using softmax function and provides predicted value, to complete the identification to target pedestrian.

S5, the pedestrian recognized is tracked, and repeats above-mentioned stream when tracking creditability is reduced to some threshold value Cheng Jinhang identifies again, meanwhile, the mark for carrying out the preset style to the result of identification is shown, specifically, will be identified using ssd algorithm Obtained pedestrian tracks, and is just known again using above-mentioned process again when tracking creditability is reduced to some threshold value Not, the result identified during this will be outlined by rectangle frame to be come.User interface is write using pyQt5, and is translated into operation System executable file (such as exe file), it is user-friendly.

The step S4 specifically comprises the following steps:

F_z=[F_h, F_d]

As preferred embodiment, the training of model is freezed coaching method based on gradient and is trained, and specifically includes as follows Step:

Wherein, the data set is specially that Market1501 pedestrian identifies data set again.The model of this paper is use Market1501 data set is trained (train), test (test) and verifying (validation).Market1501 data set It is data set to be identified by the pedestrian that Liang Zheng is issued again in 2015, at present and largest data set.The data set It finds a view in supermarket, Tsinghua University, totally 38195 pedestrian's pictures that 6 different cameras take are related to including 1501 rows altogether People, wherein training image has 12936, and from 751 pedestrians, test image shares 19732, from other 750 Pedestrian.

Embodiment 1:

S3B, for an original image having a size of 128*64, DFNN first by its it is random cut off, readjust later Size is to 256*128 and is passed in deep neural network ResNet50.Input picture passes sequentially through convolutional layer, part normalization The processing such as layer, the linear activation primitive of amendment, maximum pond, enter 16 convolution modules later.Although these convolution modules are extracted Feature become increasingly complex, but DFFN simultaneously can the size to feature carry out keep and dimensionality reduction, convolution module (conv Block) it is used to carry out dimensionality reduction to data.It is also 3 below identical with this dimensionality reduction module, every to pass through a dimensionality reduction module, The size of image can all reduce 2*2.Therefore, it needs to input before input and output are connected by adder and utilizes convolutional layer Reduce the processing of size, to guarantee normally to be added.

After the pretreatment by multi-scale image enhancing algorithm, a part can be represented as local scale for S3A, image Constant background differential encoding, another part can be converted into the color characteristic of HSV space, later using both LOMO algorithm abstractions Feature Descriptor, obtain the manual feature vector of 70770 dimensions.

Manual characteristic dimension is 70770, if being directly accessed full articulamentum, parameter amount can be very huge.Even if adopting Take traditional dimensionality reduction means, it is also difficult to guarantee accurately to characterize entire traditional characteristic.Therefore one rationally may be used data The characterization leaned on is very necessary.Maximum point is utilized in view of using in the LOMO algorithm that uses when extracting traditional characteristic It measures to characterize the means of the distribution of all components, thus herein no longer using maximum pond, but to the mould of certain length Block uses average pond.In order to which further more reliable guarantee main information is not lost, the classifier of characteristic is assumed herein One-dimensional gaussian profile is closed, thus borrows expectation and two data of variance to characterize local distribution, 70 data are divided into 1 group, are asked Its Gaussian parameter is taken, is linked together later, the feature vector of 2022 dimensions has just been obtained, while guaranteeing precision as far as possible Complete dimensionality reduction.

S4, the traditional-handwork feature after dimensionality reduction and the depth characteristic for training intact neural network to extract are melted It closes, is connected in the splicing layer of whole 2022+2048=4070 dimension, realize the strong confrontation and close coupling of two features.

Freeze to obtain final model after coaching method is trained based on gradient, it is tested, so that it is excellent to assess its More property, specifically, pedestrian identify that common appraisal procedure is CMC (Accumulative Match Characteristic) again. The appraisal procedure thinks that pedestrian's weight identification mission is target image and the sequencing of similarity task to selected digital image.CMC (k) is It is the picture that similarity degree is k in weight identification mission, computer is just correctly obtained the frequency quilt of conclusion in first time identification Referred to as Rank1, Rank1 can be used to measure the accuracy rate that pedestrian identifies again.

In multi-tag image classification, it can not be used simply using Average Accuracy used in single labeling MAP (mean Average Precision) appraisal procedure.The appraisal procedure is to seek putting down by calculating accuracy rate and recall rate Equal accuracy rate (AP, Average Precision), acquires mAP value again later.General mAP numerical value outline is lower than Rank1 numerical value.

As shown in figure 5, in each row, the image on the left side is sample, all recognition results arrange in descending order, red block Indicate erroneous matching, blue box indicates correct matching.That is in the DFNN of the application, correct matching is indicated in addition to the 4th.And SOMAnet only has the 2nd, 7 expressions correctly, other equal mistakes.For second group of example, in the DFNN of the application, the 2nd, 4,7,10 Indicate correct, other indicate mistake, and SOMAnet only has the 6th identification correctly, other identify mistake.Utilize what is introduced Totally 12936 pictures are trained the 751 pedestrian ID given in Market1501 data set, utilize additional 750 later Test by totally 19732 pictures by pedestrian ID.

In training process, detailed process is as follows for the training method that gradient is freezed:

1) training ResNet50 depth residual error neural network and 2048 dimensions being correspondingly connected on Market1501 data set Spend full articulamentum classifier；

2) parameter that trained ResNet50 is imported in depth integration neural network reconnects new full connection point Class and softmax network, then be trained on identical data set.

As shown in figure 4, when only using ResNet50 network training, ResNet50 e-learning rate, depth integration nerve The learning rate of the full link sort device of network is disposed as 0.01.When to deep neural network training, the learning rate of ResNet50 It is arranged to lower 0.001.Optimization method in this model is based on the modified small lot gradient descent method of momentum, momentum (momentum) numerical value is 0.9, and weight delay is 0.0005.After the completion of ResNet training, it is tested, Zhi Hou After being trained to deep neural network, and second is carried out and has tested.Experiments have shown that obtained using ResNet50 merely Rank1 is 51%, and the use of the Rank1 that depth integration neural network is tested is 82%, and 31% is improved from precision.

It is compared by the application depth integration neural network algorithm (DFNN) with pedestrian's weight recognizer in recent years, The performance of this algorithm is verified, test result is as shown in table 1.

Table 1

As shown in table 1, the Rank1 of DFNN is 81.74%, is higher than any other algorithm in table.DFNN model is better than LOMO+ The classic algorithms such as XQDA, because being extracted depth characteristic using ResNet50.In addition, DFNN model ratio SpindleNet or The models such as Triplet CNN will be got well, because having imported manual feature in a model to constrain deep neural network.As can be seen that There is recognition accuracy after depth integration is obviously improved.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention., rather than its limitations.To the greatest extent Present invention has been described in detail with reference to the aforementioned embodiments for pipe, those skilled in the art should understand that it is still It is possible to modify the technical solutions described in the foregoing embodiments, or some or all of the technical features is carried out Equivalent replacement, and these are modified or replaceed, it does not separate the essence of the corresponding technical solution various embodiments of the present invention technical side The range of case.

Claims

1. a kind of intelligent control method based on depth integration neural network pedestrian weight identification technology, which is characterized in that including such as Lower step:

S1, image is obtained frame by frame from monitor video；

S2, the pretreatment that color enhancement is carried out to image send it to the local maximum structure for extracting manual feature later In the neural network of extraction depth characteristic；

S3A, traditional-handwork feature is extracted, specifically: feature vector is extracted using local maxima pond algorithm, uses default scale Grid whole picture is traversed with a fixed step size, and to the image zooming-out color characteristic and textural characteristics in each grid, then Dimension-reduction treatment is carried out to entire feature vector；

S4, the traditional-handwork feature after dimensionality reduction and the depth characteristic for training intact neural network to extract are merged, Complete the identification to target pedestrian；

S5, the pedestrian recognized is tracked, and repeat when tracking creditability is reduced to some threshold value above-mentioned process into It is capable to identify again, meanwhile, the mark for carrying out the preset style to the result of identification is shown.

2. the intelligent control method according to claim 1 based on depth integration neural network pedestrian weight identification technology, It is characterized in that, the pretreatment of the color enhancement specifically: use three kinds of different rulers respectively using multi-scale image enhancing algorithm The Gaussian parameter of degree carries out color enhancement to obtained image.

3. the intelligent control method according to claim 2 based on depth integration neural network pedestrian weight identification technology, Be characterized in that, in the extraction of the traditional-handwork feature, the extraction of color characteristic include: will carry out pretreated image by RGB is converted into hsv color space, and obtains the color histogram of image on this basis；The extraction of textural characteristics is specially to make It is extracted with three value pattern-coding technologies of Scale invariant, the dimension-reduction treatment specifically: using Gauss pond, by data with default Size carries out stripping and slicing, and the moment of the orign and central moment for obtaining stripping and slicing data respectively later are to indicate data in block, to realize to whole The dimensionality reduction of a feature vector.

4. the intelligent control method according to claim 2 based on depth integration neural network pedestrian weight identification technology, It is characterized in that, the S3B specifically: dynamic fuzzy neural network cuts off pretreated image at random, readjusts later It to pre-set dimension, and is passed in deep neural network ResNet50, input picture passes sequentially through convolutional layer, part normalization The processing of layer, the linear activation primitive of amendment, maximum pond enters convolution module later and carries out dimension-reduction treatment, every by a drop Module is tieed up, the size of image can all reduce 2*2.

5. the intelligence prison according to any one of claims 1 to 4 based on depth integration neural network pedestrian weight identification technology Prosecutor method, which is characterized in that the S4 specifically: be separately connected the defeated of two vectors using the full connection fused layer of 4096 dimensions Out, the operations such as regularization, batch normalization, nonlinear activation are carried out later, classifier are finally coupled to, using softmax letter Number receives result and provides predicted value, to complete the identification to target pedestrian.

6. the intelligent control method according to claim 5 based on depth integration neural network pedestrian weight identification technology, It is characterized in that, the step S4 specifically comprises the following steps:

S41, the traditional-handwork feature after dimensionality reduction and the depth characteristic for training intact neural network to extract are merged, Total character representation after fusion are as follows:

F_z=[F_h, F_d]

Wherein, F_dFor the depth characteristic vector obtained in the ResNet50 model, F_hFor by the tradition after the dimensionality reduction of Gauss pond Manual feature；

S42, traditional-handwork feature is connected in splicing layer, is carried out after being sequentially connected batch normalization, nonlinear activation function The attended operation of classifier, is embodied as:

μ_zAnd σ_zThe characteristic for representing mean value and variance, is connected to classifier later, is used herein as softmax structure as classification One layer, specifically:

Wherein, x indicates that the vector that the output of one layer of neural network is constituted, θ are the parameter vector obtained by training, denominator In the presence of be in order to prediction output be normalized.

Wherein, p_kIndicate that (also be understood as neural network prediction is k result to the corresponding softmax output valve of k-th of output Probability).

7. the intelligent control method according to claim 1 based on depth integration neural network pedestrian weight identification technology, It is characterized in that, the training of model is freezed coaching method based on gradient and is trained, and specifically comprises the following steps:

Training ResNet50 depth residual error neural network and the full articulamentum classifier of 2048 dimensions being correspondingly connected on data set；

After training to model convergence, the parameter of trained ResNet50 is imported in depth integration neural network, is connected again New full link sort and softmax network are connect, then is trained on identical data set；In training, will utilize The network parameter of the extraction depth characteristic that pre-training model initialization is crossed is arranged lower learning rate, emphasis training fusion link and The weight parameter of classifier is finally completed the training to entire converged network.

8. the intelligent control method according to claim 7 based on depth integration neural network pedestrian weight identification technology, It is characterized in that, the data set is specially that Market1501 pedestrian identifies data set again.