CN109359515A - Method and device for identifying attribute features of a target object - Google Patents
Method and device for identifying attribute features of a target object
- Publication number
- CN109359515A CN109359515A CN201811003925.6A CN201811003925A CN109359515A CN 109359515 A CN109359515 A CN 109359515A CN 201811003925 A CN201811003925 A CN 201811003925A CN 109359515 A CN109359515 A CN 109359515A
- Authority
- CN
- China
- Prior art keywords
- target object
- to be identified
- attribute feature
- ratio
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The invention discloses a method, apparatus, system and computer program product for identifying the attribute features of a target object. The method includes: using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network; training the deep convolutional neural network with the training data; during training, dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data; and decoding video stream data to be detected and selecting images from it, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
Description
Technical field
This application relates to the field of information processing, and in particular to a method, apparatus, system and computer program product for identifying the attribute features of a target object.
Background art
Video surveillance is widely used in many fields and plays an important role in industries such as intelligent transportation, banking, safe cities and public safety, bringing convenience and security to many aspects of daily life. At the same time, however, the massive volume of surveillance video makes fast retrieval of useful information very difficult and consumes a great deal of manpower and material resources. Effectively identifying the attribute features of target objects not only improves the working efficiency of surveillance personnel, but is also of great significance for video retrieval and the analysis of target-object behaviour.
The main difficulty in attribute recognition of a target object is that, when the ratio of positive to negative samples of its attribute features is severely imbalanced, a loss function that uses a fixed positive-to-negative ratio leads to low recognition accuracy.
Summary of the invention
The method for identifying the attribute features of a target object provided by this application solves the problem that, during recognition, when the ratio of positive to negative samples of the attribute features of the target object is severely imbalanced, a loss function using a fixed positive-to-negative ratio leads to low recognition accuracy.
This application provides a method for identifying the attribute features of a target object, comprising:
using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network;
training the deep convolutional neural network with the training data, and dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges;
acquiring image data containing the target object to be identified, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
Preferably, the target object is a pedestrian, and the acquired image data set containing different attribute feature labels includes:
image data of pedestrians; and
the different attribute feature labels of the pedestrians.
Preferably, the different attribute feature labels of the pedestrian include:
gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
Preferably, training the deep convolutional neural network with the training data comprises:
processing the training data to a preset size using a normalization method, so that all images in the image data set are converted to images of the same size; and training the deep convolutional neural network with the images of the same size.
Preferably, when the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum or converges, the gap between the output value and the true value is reduced.
Preferably, dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data comprises a dynamic loss function in which the per-attribute weight w_l is set by cases:
when the ratio of the number of positive samples to the number of negative samples is less than 1:3, w_l is given by a first formula (not reproduced in this excerpt);
when the ratio is greater than 3:1, w_l is given by a second formula (not reproduced in this excerpt);
for any other ratio, w_l = 1;
where E is the total loss function of the whole network, x_i is a sample, x_il is the confidence that sample x_i has attribute l, y_il is the true label indicating whether sample x_i has attribute l, n is the number of samples, w_l is the weight used when computing the loss, N_l^+ and N_l^- are the numbers of positive and negative samples of attribute l, and a ∈ (0, 1] is drawn at random at each iteration.
Preferably, acquiring the image data containing the target object to be identified comprises:
decoding video stream data to obtain video data, selecting multiple images from the video data at a predetermined frame rate, and selecting from those images the image data of the target object to be identified that contains the different attribute features.
Preferably, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network comprises:
passing the target object to be identified through the input layer, convolutional layers, pooling layers and fully connected layers of the deep convolutional neural network to identify the attribute features.
Preferably, the method further includes determining the parameter information of the deep convolutional neural network, specifically:
determining the number of convolutional layers, the kernel size of each convolutional layer, the number of pooling layers, the size of each pooling layer, the number of fully connected layers, and the size of each fully connected layer.
Preferably, outputting the multiple attribute features of the target object to be identified comprises:
outputting, at the output layer of the deep convolutional neural network, the multiple different attribute features of the identified pedestrian image data: gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
This application also provides a computer program product comprising a program executable by a processor, wherein the program, when executed by the processor, implements the following steps:
using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network;
training the deep convolutional neural network with the training data, and dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges;
acquiring image data containing the target object to be identified, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
This application also provides a system for identifying the attribute features of a target object, the system comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the method of any one of claims 1 to 6.
This application also provides a device for identifying the attribute features of a target object, the device comprising:
a training data acquisition unit for using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network;
a training unit for training the deep convolutional neural network with the training data, dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges; and
an output unit for acquiring image data containing the target object to be identified, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
Preferably, the target object is a pedestrian, and the training data acquisition unit comprises:
an acquisition subunit for acquiring the image data of pedestrians; and
a label acquisition subunit for acquiring the attribute feature labels of the pedestrians.
Preferably, the label acquisition subunit comprises:
a label feature unit for acquiring the different attribute feature labels of the pedestrian, wherein the different attribute feature labels of the pedestrian include: gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
Preferably, the training unit comprises:
a preprocessing subunit for preprocessing the training data, converting all images in the image data set to images of the same size; and
a training subunit for training the deep convolutional neural network with the images of the same size.
Preferably, the preprocessing subunit comprises:
a method processing subunit for processing the training data to a preset size using a normalization method, obtaining images of the same size.
Preferably, the training unit further comprises:
a design subunit for designing the dynamic loss function as follows:
when the ratio of the number of positive samples to the number of negative samples is less than 1:3, w_l is given by a first formula (not reproduced in this excerpt);
when the ratio is greater than 3:1, w_l is given by a second formula (not reproduced in this excerpt);
for any other ratio, w_l = 1;
where E is the total loss function of the whole network, x_i is a sample, x_il is the confidence that sample x_i has attribute l, y_il is the true label indicating whether sample x_i has attribute l, n is the number of samples, w_l is the weight used when computing the loss, N_l^+ and N_l^- are the numbers of positive and negative samples of attribute l, and a ∈ (0, 1] is drawn at random at each iteration.
Preferably, the output unit further comprises:
a recognition subunit for passing the target object to be identified through the input layer, convolutional layers, pooling layers and fully connected layers of the deep convolutional neural network to identify the attribute features.
Preferably, the recognition subunit further comprises:
a determination subunit for determining the number of convolutional layers, the kernel size of each convolutional layer, the number of pooling layers, the size of each pooling layer, the number of fully connected layers, and the size of each fully connected layer.
Preferably, the output unit further comprises:
an output subunit for outputting, at the output layer of the deep convolutional neural network, the multiple different attribute features of the identified pedestrian image data: gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
The method for identifying the attribute features of a target object provided by this application normalizes the images, uses a deep convolutional neural network, and dynamically adjusts the loss function according to the positive-to-negative sample ratio of the attribute features of the training data. This solves the problem that, when the ratio of positive to negative samples of the attribute features of the target object is severely imbalanced, a loss function using a fixed positive-to-negative ratio leads to low recognition accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of a method for identifying the attribute features of a target object provided by an embodiment of this application; and
Fig. 2 is a schematic structural diagram of a device for identifying the attribute features of a target object provided by an embodiment of this application.
Specific embodiment
Exemplary embodiments of the present invention are now described with reference to the drawings. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here; these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the scope of the invention to persons of ordinary skill in the art. The terms used in the illustrative embodiments shown in the drawings do not limit the invention. In the drawings, identical units/elements use identical reference numerals.
Unless otherwise indicated, terms used here (including scientific and technical terms) have the meanings commonly understood by persons of ordinary skill in the art. It will further be understood that terms defined in commonly used dictionaries should be understood to have meanings consistent with their context in the related field, and should not be interpreted in an idealized or overly formal sense.
Fig. 1 is a flow chart of a method for identifying the attribute features of a target object provided by an embodiment of this application; the method is described in detail below with reference to Fig. 1.
The method for identifying the attribute features of a target object provided by an embodiment of this application comprises: using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network; training the deep convolutional neural network with the training data, and dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges; acquiring image data containing the target object to be identified, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
In the above method, an image data set containing different attribute feature labels is first used to train a deep convolutional neural network. During training, the ratio of positive to negative samples of the multiple attribute features of the training data is severely imbalanced. To overcome this problem, the prior art adjusts the loss function with a fixed positive-to-negative sample ratio; this application instead adjusts the loss function dynamically according to the positive-to-negative sample ratio, which improves recognition accuracy.
Step S101: use an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network.
This application mainly identifies the attribute features of target objects in video surveillance images. Video surveillance is widely used in many fields and plays an important role in industries such as intelligent transportation, banking, safe cities and public safety, bringing convenience and security to many aspects of daily life. However, the massive volume of surveillance data makes fast retrieval of useful information very difficult: manual search would consume a great deal of manpower and material resources and would be inefficient. This application therefore designs and trains a deep convolutional neural network to identify the attribute features of target objects.
To train a deep convolutional neural network, a training data set must first be obtained. The training data is ultimately image data, which may also be obtained by decoding video data. This application takes a pedestrian as an example of a target object, but persons of ordinary skill in the art will appreciate that the target object can be any reasonable object, for example a car, a train, or an animal. The attribute features of the training data have many possibilities. For example, if the deep convolutional neural network is to identify pedestrians, pedestrian image data must be acquired as training data, along with the different attribute feature labels of the pedestrians. Typical pedestrian attribute feature labels include: gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
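As an illustration, the 8 binary attribute labels above can be encoded as a multi-hot target vector per image. This is only a sketch: the attribute names, their order, and the function name are assumptions for illustration, not part of the application.

```python
# Hypothetical encoding of the 8 pedestrian attribute labels described above.
# Attribute names and their ordering are illustrative assumptions.
ATTRIBUTES = [
    "male", "hat", "glasses", "single_shoulder_bag",
    "backpack", "handbag", "carrying_object", "phone_call",
]

def encode_labels(tags):
    """Map the set of attributes present in an image to a 0/1 target vector."""
    return [1 if attr in tags else 0 for attr in ATTRIBUTES]

print(encode_labels({"hat", "backpack"}))  # -> [0, 1, 0, 0, 1, 0, 0, 0]
```

Each component of the vector is then one output of the multi-label network, so the positive/negative imbalance discussed below is counted per component.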
Step S102: train the deep convolutional neural network with the training data, dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges.
The previous step obtained the training data set of the deep convolutional neural network: pedestrian image data with different attribute feature labels, used to train the network. For a deep convolutional neural network that identifies pedestrian attribute features, the acquired training data set contains pedestrian image data with a variety of attribute feature labels, such as gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a handbag, whether carrying an object, and whether making a phone call. The pedestrian image data for each label also needs further refinement. For example, for "wearing a hat" there are many styles of hat, brimmed and brimless, in various colours, so the single label "whether wearing a hat" covers many different images. Likewise, the other attribute features, once refined, cover many different images, so pedestrian image data containing these attribute feature labels should be acquired as comprehensively as possible. These images may also differ in size, pixel values, colour values and other image attributes. Choosing such varied image data to train the deep convolutional neural network helps the accuracy of the training result.
Since the acquired pedestrian images are varied, their sizes may be inconsistent. To speed up the convergence of network training, and also to improve the network's training capability and training speed, the pedestrian image data needs to be preprocessed before training. Specifically, the acquired pedestrian images are processed with a normalization method: each image is resized to a preset image size, so that all images are the same size after processing.
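A minimal sketch of this normalization step, assuming nearest-neighbour resizing and scaling of pixel values to [0, 1]. The application does not fix the exact resizing or scaling method; the 113-pixel target size is taken from the embodiment described later.

```python
import numpy as np

def normalize_image(img: np.ndarray, size: int = 113) -> np.ndarray:
    """Resize an image array to size x size (nearest-neighbour, for brevity)
    and scale pixel values from [0, 255] to [0, 1]."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

# After this step every image in a batch shares one shape:
batch = [normalize_image(np.random.randint(0, 256, (240, 320, 3), np.uint8))
         for _ in range(4)]
```

In practice a library resize (e.g. bilinear) would be used; the point here is only that all inputs end up the same size before training.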
A deep convolutional neural network includes a basic but important function, the loss function, whose role is to measure the quality of the model's predictions. Likewise, the better the loss function is tuned, the more accurately the deep convolutional neural network can be optimized, so the quality of the loss function design affects the accuracy of the network. To improve accuracy through the loss function: when the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum or converges, the gap between the output value and the true value is reduced, improving the accuracy of the output.
Next, the images processed by the normalization method are used to train the deep convolutional neural network.
The pedestrian data used to train the network carries 8 attribute feature labels. In general, the number of positive samples is smaller than the number of negative samples. For example, for a training image of a male pedestrian who wears glasses and carries a backpack, the labels for wearing a hat, carrying a single-shoulder bag, carrying a handbag, carrying an object, and so on are all negative. It is normal for a pedestrian image to carry 3 positive labels out of the 8, so the number of negative samples is generally larger than the number of positive samples, and the sample sizes are imbalanced. This can be compensated through the loss function. The common practice is to adjust the loss function with a fixed positive-to-negative sample ratio. But the positive-to-negative ratio of the attribute features of the training data varies with the training data, so it is usually not fixed, and adjusting the loss function with a fixed ratio is clearly unreasonable and degrades the accuracy of the loss function. The method provided by this application instead adjusts the loss function dynamically according to the positive-to-negative sample ratio of the attribute features of the training data, so that the ratio and the loss function are matched; tuning the loss function in this way improves the accuracy of the network's output layer.
During training, the deep convolutional neural network identifies multiple attribute features of a pedestrian, and the sample distributions of the attributes are not uniform but extremely imbalanced. For example, the positive samples of "making a phone call" and "carrying an object" are far fewer than the negative samples of "not making a phone call" and "not carrying an object", so training is biased toward recognizing negative samples. To recognize positive samples better, the weight of positive samples in the loss function needs to be increased. A fixed weight parameter is not necessarily optimal for every batch (the number of images fed into the network at each training iteration). To keep the positive and negative samples of each batch balanced during training, a dynamic parameter replaces the preset parameter, making the algorithm more adaptive. The dynamic loss function provided by this application is:
when the ratio of the number of positive samples to the number of negative samples is less than 1:3, w_l is given by a first formula (not reproduced in this excerpt);
when the ratio is greater than 3:1, w_l is given by a second formula (not reproduced in this excerpt);
for any other ratio, w_l = 1;
where E is the total loss function of the whole network, x_i is a sample, x_il is the confidence that sample x_i has attribute l, y_il is the true label indicating whether sample x_i has attribute l, n is the number of samples, w_l is the weight used when computing the loss, N_l^+ and N_l^- are the numbers of positive and negative samples of attribute l, and a ∈ (0, 1] is drawn at random at each iteration.
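The weight formulas for the two imbalanced cases appear only as images in the original, so the sketch below substitutes a plausible inverse-frequency weight scaled by the random factor a; only the three-case structure, the w_l = 1 default, and the weighted cross-entropy form of E are taken from the text. All function names are illustrative.

```python
import math
import random

def dynamic_weight(n_pos, n_neg, a):
    """Per-attribute weight w_l. The two imbalanced-case formulas are not
    reproduced in the text; an inverse-frequency weight scaled by the random
    a in (0, 1] is an assumed stand-in, not the patent's exact formula."""
    if 3 * n_pos < n_neg:          # positives scarce: ratio < 1:3
        return a * n_neg / max(n_pos, 1)
    if n_pos > 3 * n_neg:          # positives dominant: ratio > 3:1
        return a * n_pos / max(n_neg, 1)
    return 1.0                     # roughly balanced: w_l = 1

def weighted_loss(conf, labels, weights):
    """E = -(1/n) * sum_i sum_l w_l * [y_il*log(s) + (1-y_il)*log(1-s)],
    with s = sigmoid(x_il), the confidence that sample i has attribute l."""
    n = len(conf)
    total = 0.0
    for xi, yi in zip(conf, labels):
        for x, y, w in zip(xi, yi, weights):
            s = 1.0 / (1.0 + math.exp(-x))
            total -= w * (y * math.log(s) + (1 - y) * math.log(1 - s))
    return total / n

a = random.uniform(1e-6, 1.0)      # a is redrawn at each training iteration
w = [dynamic_weight(5, 95, a), dynamic_weight(50, 50, a)]
```

Under this stand-in, an attribute with 5 positives among 100 samples has its loss term scaled up by a factor of a·19, while a balanced attribute keeps weight 1, counteracting the bias toward negative samples described above.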
Step S103: acquire image data containing the target object to be identified, identify the attribute features of the target object to be identified with the trained deep convolutional neural network, and output the multiple attribute features of the target object to be identified.
This step identifies the target object to be identified with the deep convolutional neural network and outputs the multiple attribute features of the target object to be identified at the output layer.
The recognition input of the deep convolutional neural network is image data, so the target object to be identified is image data. It may also arrive as video stream data, which needs to be decoded; the images obtained after decoding are then selected and identified as the target object to be identified. In this application, the content identified is mainly video surveillance from industries such as intelligent transportation, banking, safe cities and public safety, so the video stream data is first decoded, and the decoded images, once selected, are fed into the deep convolutional neural network for recognition.
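Decoding is typically done with a library such as OpenCV (`cv2.VideoCapture`); the frame-thinning logic for the "predetermined frame rate" can be sketched independently of the decoder. The function name and the assumption of a constant source frame rate are illustrative:

```python
def sample_frame_indices(total_frames, source_fps, target_fps):
    """Indices of the decoded frames to keep when thinning a stream from
    source_fps down to the predetermined target_fps."""
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))

# e.g. keep 5 frames per second from a 25 fps stream:
keep = sample_frame_indices(total_frames=100, source_fps=25.0, target_fps=5.0)
```

Only the kept frames are then cropped/selected for the target object and passed to the network, which bounds the recognition workload regardless of the stream's native frame rate.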
Before the deep convolutional neural network is used, it must be designed. A deep convolutional neural network generally comprises: an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. The parameter information of the deep convolutional neural network is then designed, specifically: the number of convolutional layers, the kernel size of each convolutional layer, the number of pooling layers, the size of each pooling layer, the number of fully connected layers, and the size of each fully connected layer. The size of the convolution kernels must be set first: when the kernel is large in pixels, fewer feature maps are obtained and the computation is smaller, but the recognition capability is slightly worse. Identifying the attribute features of the target object involves at least one round of convolution and pooling: the convolution operation obtains the feature maps of the target object to be identified, and the pooling operation reduces the computational complexity of the convolutional layers.
More specifically and preferably, in the present application the structure of the deep convolutional neural network may be based on the ImageNet-2010 network structure, whose top-5 error rate is 15.3%. The completed deep convolutional neural network specifically includes: 1 input layer, 5 convolutional layers with a stride of 1, 2 pooling layers, 2 fully connected layers and 1 multi-label output layer. The input image size is 113x113 pixels, i.e. the image size produced by the normalization processing of the previous step. The convolution kernel sizes are, in order, 11x11, 5x5, 3x3, 3x3 and 3x3 pixels; with a stride of 1, the input image is sampled with these 5 kernels in turn, and the numbers of feature maps obtained after sampling are 96, 256, 384, 384 and 512, respectively.
Activation functions play a very important role in neural networks, because they prevent the network from reducing to a simple linear combination. When convolving an image, a weight is assigned to each pixel, which is plainly a linear operation; since the input data is not necessarily linearly separable, a non-linear factor is introduced to solve problems that a linear model cannot. An activation function is therefore added after the output of each layer. The activation function may be sigmoid, tanh, ReLU, and so on. The sigmoid function suffers from very small gradients when saturated, and its output is not centered on 0; tanh likewise suffers from very small gradients when saturated. ReLU, a newer activation function, solves the gradient-vanishing problem to a very large degree, so ReLU is preferably chosen as the activation function.
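The saturation behavior described above can be checked numerically. The following sketch is illustrative only and not part of the patent; it compares the derivatives of the three candidate activation functions at increasingly large inputs:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    # derivative of sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

def d_relu(x):
    # derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

# At saturated inputs the sigmoid and tanh gradients all but vanish,
# while the ReLU gradient stays at 1 -- the "gradient dissipation"
# argument for preferring ReLU made in the text above.
for x in (0.0, 5.0, 10.0):
    print(x, d_sigmoid(x), d_tanh(x), d_relu(x))
```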
After the feature maps are obtained, they may be pooled. Pooling layers are inserted between successive convolutional layers to compress the amount of data and parameters and to reduce overfitting. In this application, two pooling layers are designed in total: the first pooling layer follows the second convolutional layer, and the second pooling layer follows the fifth convolutional layer. Max pooling over 2x2 pixels is selected; by eliminating non-maximum values, max pooling reduces the computational complexity of the upper layers. Omitting the pooling layers would not affect the final output, but the amount of computation of the whole deep convolutional neural network would increase, which would inevitably affect the performance of the whole system. The preferred scheme of the present application is therefore to pool the feature maps after convolution. The pooling layers are located between the convolutional layers and the fully connected layers. After the feature maps have been pooled, the fully connected layers integrate the features of the feature maps produced by the successive convolutional and pooling layers, and the resulting image features are then used for image classification.
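The layer sizes described above can be walked through with the standard output-size formula. This sketch is illustrative and not part of the patent; in particular, the padding of each convolution is not specified in the text, so zero padding is assumed here:

```python
def out_size(size, kernel, stride=1, pad=0):
    # Spatial size after a convolution or pooling layer:
    #   out = floor((in + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

size = 113  # normalized input image, 113x113 pixels
kernels = [11, 5, 3, 3, 3]           # the five convolution kernels, stride 1
feature_maps = [96, 256, 384, 384, 512]
pool_after = {2, 5}                  # 2x2 max pooling after conv 2 and conv 5

for i, k in enumerate(kernels, start=1):
    size = out_size(size, k, stride=1, pad=0)  # padding is an assumption
    print(f"conv{i}: {feature_maps[i - 1]} maps of {size}x{size}")
    if i in pool_after:
        size = out_size(size, 2, stride=2)
        print(f"pool:  {size}x{size}")
```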
In full connection processing, each node of a fully connected layer is connected to all nodes of the previous layer, so as to synthesize the features extracted earlier. In this application, the pooled feature images undergo full connection processing, thereby obtaining the 8 attribute features of the target object to be identified, i.e. the pedestrian.
The fully connected layers act as the "classifier" of the entire convolutional neural network. Whereas operations such as the convolutional layers, pooling layers and activation function layers map the raw data into a hidden-layer feature space, the fully connected layers map the learned "distributed feature representation" into the sample label space. In actual use, a fully connected layer can be implemented by a convolution operation: a fully connected layer whose preceding layer is also fully connected can be converted into a convolution with a 1x1 kernel, while a fully connected layer whose preceding layer is a convolutional layer can be converted into a global convolution with an hxw kernel, where h and w are respectively the height and width of the preceding layer's convolution result. After pooling, the sizes of the fully connected layers are 2048 and 4096, respectively.
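The equivalence stated above between a fully connected layer and a 1x1 convolution can be checked on toy numbers. This is an illustrative sketch, not part of the patent, using made-up weights:

```python
def dense(x, w):
    # Fully connected layer: one dot product per row of the weight matrix.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def conv1x1(channels, w):
    # A 1x1 convolution over c input channels at a single spatial
    # position is the same per-output-channel dot product.
    return [sum(row[c] * channels[c] for c in range(len(channels)))
            for row in w]

x = [1.0, 2.0, 3.0]            # 3 input channels at one position
w = [[0.5, 0.5, 0.0],          # weights for 2 output channels
     [1.0, -1.0, 1.0]]
assert dense(x, w) == conv1x1(x, w)  # identical results
print(dense(x, w))
```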
Next, the target object to be identified is identified by the deep convolutional neural network, which outputs multiple attribute features of the target object to be identified. Specifically, multiple labels are used to output, at the output layer of the deep convolutional neural network, the gender of the identified pedestrian image data and whether the pedestrian wears a hat, wears glasses, carries a shoulder bag, carries a backpack, carries a handbag, is carrying something, and is making a phone call.
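The multi-label output step can be sketched as follows. This is illustrative only and not the patent's implementation: the attribute names, the sigmoid squashing and the 0.5 decision threshold are assumptions, since the patent only states that 8 attribute labels are emitted at the output layer.

```python
import math

# Hypothetical names for the 8 pedestrian attributes listed in the text.
ATTRIBUTES = ["gender", "hat", "glasses", "shoulder_bag",
              "backpack", "handbag", "carrying", "phoning"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_multilabel(logits, threshold=0.5):
    """Turn 8 raw outputs of a multi-label layer into independent
    yes/no attribute decisions (one sigmoid + threshold per label)."""
    return {name: sigmoid(z) >= threshold
            for name, z in zip(ATTRIBUTES, logits)}

# Hypothetical raw network outputs for one pedestrian crop:
print(decode_multilabel([2.1, -1.3, 0.4, -0.2, 3.0, -2.5, 0.9, -0.1]))
```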
In addition, the present application also provides a computer program product comprising a processor-executable program, characterized in that, when executed by a processor, the program performs the following steps:
acquiring image data containing different attribute feature labels as the training data of a deep convolutional neural network;
using the acquired image data set containing different attribute feature labels as the training data of the deep convolutional neural network;
training the deep convolutional neural network using the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state;
during the training of the deep convolutional neural network, dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data; and
decoding video stream data to be detected and selecting images therefrom to obtain image data containing a target object to be identified, identifying the attribute features of the target object to be identified using the trained deep convolutional neural network, and outputting multiple attribute features of the target object to be identified.
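The "decode and select images" step can be sketched as plain frame-index arithmetic. This is an illustrative sketch, not the patent's implementation: the source frame rate and sampling rate below are assumptions, and actual decoding would use a video library.

```python
def select_frame_indices(total_frames, source_fps, sample_fps):
    """Indices of the frames to keep when sampling a decoded video
    stream at a predetermined rate (sample_fps) lower than the
    stream's own frame rate (source_fps)."""
    step = max(1, round(source_fps / sample_fps))
    return list(range(0, total_frames, step))

# e.g. a 2-second clip at 25 fps, sampled at 5 images per second:
print(select_frame_indices(50, 25, 5))  # keeps every 5th frame
```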
The present application simultaneously provides a system for identifying attribute features of a target object, characterized in that the system comprises:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 6.
Fig. 2 is a schematic structural diagram of a device for identifying attribute features of a target object provided by an embodiment of the present application. Corresponding to the method for identifying attribute features of a target object provided by the present application, the present application provides a device 200 for identifying attribute features of a target object, the device comprising:
a training data acquiring unit 201, configured to use an acquired image data set containing different attribute feature labels as the training data of a deep convolutional neural network;
a training unit 202, configured to train the deep convolutional neural network using the training data, dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state; and
an output unit 203, configured to obtain image data containing a target object to be identified, identify the attribute features of the target object to be identified using the trained deep convolutional neural network, and output multiple attribute features of the target object to be identified.
Optionally, the adjustment unit further includes:
a design unit for designing the dynamic loss function loss as follows:
when the ratio of the number of positive samples to the number of negative samples is less than 1/3:
when the ratio of the number of positive samples to the number of negative samples is greater than 3:
when the ratio of the number of positive samples to the number of negative samples is any other ratio:
wl=1
wherein E is the total loss function of the whole network, xi is a sample with a confidence level for each attribute l, yil is the true label characterizing whether sample xi has attribute l, n denotes the number of samples, wl is the weight used when computing the loss function, computed from the number of positive samples of attribute l and the number of negative samples of attribute l, and a ∈ (0,1] is obtained at random at each iteration.
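The weight formulas themselves appear as images in the original publication and are not reproduced here. The sketch below only illustrates the described three-way case split (ratio below 1/3, above 3, otherwise); the up-/down-weighting rules for the two imbalanced cases are placeholder assumptions, not the patent's formulas, which involve the positive count, the negative count and the random factor a.

```python
import random

def attribute_weight(pos_count, neg_count, a=None):
    """Three-way case split for the per-attribute loss weight w_l,
    driven by the positive/negative sample ratio. The weight values
    for the imbalanced branches are illustrative placeholders."""
    if a is None:
        a = random.uniform(1e-9, 1.0)  # a in (0, 1], redrawn each iteration
    ratio = pos_count / neg_count
    if ratio < 1 / 3:        # positives are scarce: up-weight them
        return 1.0 + a * neg_count / pos_count
    if ratio > 3:            # negatives are scarce: up-weight them
        return 1.0 + a * pos_count / neg_count
    return 1.0               # roughly balanced: w_l = 1

print(attribute_weight(100, 100, a=0.5))  # balanced case
print(attribute_weight(10, 100, a=0.5))   # scarce positives
```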
Claims (10)
1. A method for identifying attribute features of a target object, characterized by comprising:
using an acquired image data set containing different attribute feature labels as the training data of a deep convolutional neural network;
training the deep convolutional neural network using the training data, and dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state; and
obtaining image data containing a target object to be identified, identifying the attribute features of the target object to be identified using the trained deep convolutional neural network, and outputting multiple attribute features of the target object to be identified.
2. The method according to claim 1, wherein the target object is a pedestrian, and the acquired image data set containing different attribute feature labels includes:
the image data of pedestrians; and
the different attribute feature labels of the pedestrians.
3. The method according to claim 2, characterized in that the different attribute feature labels of the pedestrian include:
gender, whether a hat is worn, whether glasses are worn, whether a shoulder bag is carried, whether a backpack is carried, whether a handbag is carried, whether something is being carried, and whether a phone call is being made;
wherein outputting multiple attribute features of the target object to be identified comprises:
outputting, at the output layer of the deep convolutional neural network, the multiple different attribute features of the identified pedestrian image data, namely the gender and whether a hat is worn, whether glasses are worn, whether a shoulder bag is carried, whether a backpack is carried, whether a handbag is carried, whether something is being carried, and whether a phone call is being made.
4. The method according to claim 1, wherein training the deep convolutional neural network using the training data comprises:
processing the training data to a preset size using a normalization method, so that all image data in the image data set are converted to images of identical size; and training the deep convolutional neural network using the image data of identical size.
5. The method according to claim 1, wherein dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data comprises:
designing the dynamic loss function as follows:
when the ratio of the number of positive samples to the number of negative samples is less than 1/3:
when the ratio of the number of positive samples to the number of negative samples is greater than 3:
when the ratio of the number of positive samples to the number of negative samples is any other ratio:
wl=1
wherein E is the total loss function of the whole network, xi is a sample with a confidence level for each attribute l, yil is the true label characterizing whether sample xi has attribute l, n denotes the number of samples, wl is the weight used when computing the loss function, computed from the number of positive samples of attribute l and the number of negative samples of attribute l, and a ∈ (0,1] is obtained at random at each iteration.
6. The method according to claim 1, wherein obtaining the image data containing the target object to be identified comprises:
decoding video stream data to obtain video data, selecting multiple image data from the video data at a predetermined frame rate, and selecting, from the multiple image data, the image data of the target object to be identified that contains different attribute features.
7. A computer program product comprising a processor-executable program, characterized in that, when executed by a processor, the program performs the following steps:
using an acquired image data set containing different attribute feature labels as the training data of a deep convolutional neural network;
training the deep convolutional neural network using the training data, and dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state; and
obtaining image data containing a target object to be identified, identifying the attribute features of the target object to be identified using the trained deep convolutional neural network, and outputting multiple attribute features of the target object to be identified.
8. A system for identifying attribute features of a target object, characterized in that the system comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 6.
9. A device for identifying attribute features of a target object, the device comprising:
a training data acquiring unit, configured to use an acquired image data set containing different attribute feature labels as the training data of a deep convolutional neural network;
a training unit, configured to train the deep convolutional neural network using the training data, dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state; and
an output unit, configured to obtain image data containing a target object to be identified, identify the attribute features of the target object to be identified using the trained deep convolutional neural network, and output multiple attribute features of the target object to be identified.
10. The device according to claim 9, characterized in that the adjustment unit further includes:
a design unit for designing the dynamic loss function loss as follows:
when the ratio of the number of positive samples to the number of negative samples is less than 1/3:
when the ratio of the number of positive samples to the number of negative samples is greater than 3:
when the ratio of the number of positive samples to the number of negative samples is any other ratio:
wl=1
wherein E is the total loss function of the whole network, xi is a sample with a confidence level for each attribute l, yil is the true label characterizing whether sample xi has attribute l, n denotes the number of samples, wl is the weight used when computing the loss function, computed from the number of positive samples of attribute l and the number of negative samples of attribute l, and a ∈ (0,1] is obtained at random at each iteration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811003925.6A CN109359515A (en) | 2018-08-30 | 2018-08-30 | A kind of method and device that the attributive character for target object is identified |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109359515A true CN109359515A (en) | 2019-02-19 |
Family
ID=65350299
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811003925.6A Pending CN109359515A (en) | 2018-08-30 | 2018-08-30 | A kind of method and device that the attributive character for target object is identified |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109359515A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245564A (en) * | 2019-05-14 | 2019-09-17 | 平安科技(深圳)有限公司 | A kind of pedestrian detection method, system and terminal device |
CN110288082A (en) * | 2019-06-05 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Convolutional neural networks model training method, device and computer readable storage medium |
CN110457984A (en) * | 2019-05-21 | 2019-11-15 | 电子科技大学 | Pedestrian's attribute recognition approach under monitoring scene based on ResNet-50 |
CN110516602A (en) * | 2019-08-28 | 2019-11-29 | 杭州律橙电子科技有限公司 | A kind of public traffice passenger flow statistical method based on monocular camera and depth learning technology |
CN110598716A (en) * | 2019-09-09 | 2019-12-20 | 北京文安智能技术股份有限公司 | Personnel attribute identification method, device and system |
CN110688888A (en) * | 2019-08-02 | 2020-01-14 | 浙江省北大信息技术高等研究院 | Pedestrian attribute identification method and system based on deep learning |
CN110874577A (en) * | 2019-11-15 | 2020-03-10 | 杭州东信北邮信息技术有限公司 | Automatic verification method of certificate photo based on deep learning |
CN111160411A (en) * | 2019-12-11 | 2020-05-15 | 东软集团股份有限公司 | Classification model training method, image processing method, device, medium, and apparatus |
CN111178403A (en) * | 2019-12-16 | 2020-05-19 | 北京迈格威科技有限公司 | Method and device for training attribute recognition model, electronic equipment and storage medium |
CN111814846A (en) * | 2020-06-19 | 2020-10-23 | 浙江大华技术股份有限公司 | Training method and recognition method of attribute recognition model and related equipment |
CN111931799A (en) * | 2019-05-13 | 2020-11-13 | 百度在线网络技术(北京)有限公司 | Image recognition method and device |
CN112417205A (en) * | 2019-08-20 | 2021-02-26 | 富士通株式会社 | Target retrieval device and method and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104992142A (en) * | 2015-06-03 | 2015-10-21 | 江苏大学 | Pedestrian recognition method based on combination of depth learning and property learning |
CN106529442A (en) * | 2016-10-26 | 2017-03-22 | 清华大学 | Pedestrian identification method and apparatus |
CN107506786A (en) * | 2017-07-21 | 2017-12-22 | 华中科技大学 | A kind of attributive classification recognition methods based on deep learning |
CN107633223A (en) * | 2017-09-15 | 2018-01-26 | 深圳市唯特视科技有限公司 | A kind of video human attribute recognition approach based on deep layer confrontation network |
CN107766850A (en) * | 2017-11-30 | 2018-03-06 | 电子科技大学 | Based on the face identification method for combining face character information |
CN107832581A (en) * | 2017-12-15 | 2018-03-23 | 百度在线网络技术(北京)有限公司 | Trend prediction method and device |
CN107862300A (en) * | 2017-11-29 | 2018-03-30 | 东华大学 | A kind of descending humanized recognition methods of monitoring scene based on convolutional neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190219 |