CN109359515A - Method and device for identifying attribute features of a target object - Google Patents
Method and device for identifying attribute features of a target object
- Publication number
- CN109359515A CN109359515A CN201811003925.6A CN201811003925A CN109359515A CN 109359515 A CN109359515 A CN 109359515A CN 201811003925 A CN201811003925 A CN 201811003925A CN 109359515 A CN109359515 A CN 109359515A
- Authority
- CN
- China
- Prior art keywords
- target object
- to be identified
- attribute feature
- ratio
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The invention discloses a method, apparatus, system and computer program product for identifying the attribute features of a target object. The method includes: using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network; training the deep convolutional neural network with the training data; during training, dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data; and decoding video stream data to be detected and selecting images from it, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
Description
Technical field
This application relates to the field of information processing, and in particular to a method, apparatus, system and computer program product for identifying the attribute features of a target object.
Background art
Video surveillance is widely used in many fields and plays an important role in industries such as intelligent transportation, banking, safe cities and public safety, bringing convenience and security to many aspects of daily life. At the same time, however, the massive volume of surveillance video makes fast retrieval of useful information very difficult and consumes a great deal of manpower and material resources. Effectively identifying the attribute features of target objects not only improves the working efficiency of surveillance personnel, but is also of great significance for video retrieval and the analysis of target-object behaviour.
The main difficulty in attribute recognition of a target object is that, when the ratio of positive to negative samples of its attribute features is severely imbalanced, a loss function that uses a fixed positive-to-negative ratio leads to low recognition accuracy.
Summary of the invention
The method for identifying the attribute features of a target object provided by this application solves the problem that, during recognition, when the ratio of positive to negative samples of the attribute features of the target object is severely imbalanced, a loss function using a fixed positive-to-negative ratio leads to low recognition accuracy.
This application provides a method for identifying the attribute features of a target object, comprising:
using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network;
training the deep convolutional neural network with the training data, and dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges;
acquiring image data containing the target object to be identified, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
Preferably, the target object is a pedestrian, and the acquired image data set containing different attribute feature labels includes:
image data of pedestrians; and
the different attribute feature labels of the pedestrians.
Preferably, the different attribute feature labels of the pedestrian include:
gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
Preferably, training the deep convolutional neural network with the training data comprises:
processing the training data to a preset size using a normalization method, so that all images in the image data set are converted to images of the same size; and training the deep convolutional neural network with the images of the same size.
Preferably, when the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum or converges, the gap between the output value and the true value is reduced.
Preferably, dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data comprises a dynamic loss function in which the per-attribute weight w_l is set by cases:
when the ratio of the number of positive samples to the number of negative samples is less than 1:3, w_l is given by a first formula (not reproduced in this excerpt);
when the ratio is greater than 3:1, w_l is given by a second formula (not reproduced in this excerpt);
for any other ratio, w_l = 1;
where E is the total loss function of the whole network, x_i is a sample, x_il is the confidence that sample x_i has attribute l, y_il is the true label indicating whether sample x_i has attribute l, n is the number of samples, w_l is the weight used when computing the loss, N_l^+ and N_l^- are the numbers of positive and negative samples of attribute l, and a ∈ (0, 1] is drawn at random at each iteration.
Preferably, acquiring the image data containing the target object to be identified comprises:
decoding video stream data to obtain video data, selecting multiple images from the video data at a predetermined frame rate, and selecting from those images the image data of the target object to be identified that contains the different attribute features.
Preferably, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network comprises:
passing the target object to be identified through the input layer, convolutional layers, pooling layers and fully connected layers of the deep convolutional neural network to identify the attribute features.
Preferably, the method further includes determining the parameter information of the deep convolutional neural network, specifically:
determining the number of convolutional layers, the kernel size of each convolutional layer, the number of pooling layers, the size of each pooling layer, the number of fully connected layers, and the size of each fully connected layer.
Preferably, outputting the multiple attribute features of the target object to be identified comprises:
outputting, at the output layer of the deep convolutional neural network, the multiple different attribute features of the identified pedestrian image data: gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
This application also provides a computer program product comprising a program executable by a processor, wherein the program, when executed by the processor, implements the following steps:
using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network;
training the deep convolutional neural network with the training data, and dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges;
acquiring image data containing the target object to be identified, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
This application also provides a system for identifying the attribute features of a target object, the system comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the method of any one of claims 1 to 6.
This application also provides a device for identifying the attribute features of a target object, the device comprising:
a training data acquisition unit for using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network;
a training unit for training the deep convolutional neural network with the training data, dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges; and
an output unit for acquiring image data containing the target object to be identified, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
Preferably, the target object is a pedestrian, and the training data acquisition unit comprises:
an acquisition subunit for acquiring the image data of pedestrians; and
a label acquisition subunit for acquiring the attribute feature labels of the pedestrians.
Preferably, the label acquisition subunit comprises:
a label feature unit for acquiring the different attribute feature labels of the pedestrian, wherein the different attribute feature labels of the pedestrian include: gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
Preferably, the training unit comprises:
a preprocessing subunit for preprocessing the training data, converting all images in the image data set to images of the same size; and
a training subunit for training the deep convolutional neural network with the images of the same size.
Preferably, the preprocessing subunit comprises:
a method processing subunit for processing the training data to a preset size using a normalization method, obtaining images of the same size.
Preferably, the training unit further comprises:
a design subunit for designing the dynamic loss function as follows:
when the ratio of the number of positive samples to the number of negative samples is less than 1:3, w_l is given by a first formula (not reproduced in this excerpt);
when the ratio is greater than 3:1, w_l is given by a second formula (not reproduced in this excerpt);
for any other ratio, w_l = 1;
where E is the total loss function of the whole network, x_i is a sample, x_il is the confidence that sample x_i has attribute l, y_il is the true label indicating whether sample x_i has attribute l, n is the number of samples, w_l is the weight used when computing the loss, N_l^+ and N_l^- are the numbers of positive and negative samples of attribute l, and a ∈ (0, 1] is drawn at random at each iteration.
Preferably, the output unit further comprises:
a recognition subunit for passing the target object to be identified through the input layer, convolutional layers, pooling layers and fully connected layers of the deep convolutional neural network to identify the attribute features.
Preferably, the recognition subunit further comprises:
a determination subunit for determining the number of convolutional layers, the kernel size of each convolutional layer, the number of pooling layers, the size of each pooling layer, the number of fully connected layers, and the size of each fully connected layer.
Preferably, the output unit further comprises:
an output subunit for outputting, at the output layer of the deep convolutional neural network, the multiple different attribute features of the identified pedestrian image data: gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
The method for identifying the attribute features of a target object provided by this application normalizes the images, uses a deep convolutional neural network, and dynamically adjusts the loss function according to the positive-to-negative sample ratio of the attribute features of the training data. This solves the problem that, when the ratio of positive to negative samples of the attribute features of the target object is severely imbalanced, a loss function using a fixed positive-to-negative ratio leads to low recognition accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of a method for identifying the attribute features of a target object provided by an embodiment of this application; and
Fig. 2 is a schematic structural diagram of a device for identifying the attribute features of a target object provided by an embodiment of this application.
Specific embodiment
Exemplary embodiments of the present invention are now described with reference to the drawings. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here; these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the scope of the invention to persons of ordinary skill in the art. The terms used in the illustrative embodiments shown in the drawings do not limit the invention. In the drawings, identical units/elements use identical reference numerals.
Unless otherwise indicated, terms used here (including scientific and technical terms) have the meanings commonly understood by persons of ordinary skill in the art. It will further be understood that terms defined in commonly used dictionaries should be understood to have meanings consistent with their context in the related field, and should not be interpreted in an idealized or overly formal sense.
Fig. 1 is a flow chart of a method for identifying the attribute features of a target object provided by an embodiment of this application; the method is described in detail below with reference to Fig. 1.
The method for identifying the attribute features of a target object provided by an embodiment of this application comprises: using an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network; training the deep convolutional neural network with the training data, and dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges; acquiring image data containing the target object to be identified, identifying the attribute features of the target object to be identified with the trained deep convolutional neural network, and outputting the multiple attribute features of the target object to be identified.
In the above method, an image data set containing different attribute feature labels is first used to train a deep convolutional neural network. During training, the ratio of positive to negative samples of the multiple attribute features of the training data is severely imbalanced. To overcome this problem, the prior art adjusts the loss function with a fixed positive-to-negative sample ratio; this application instead adjusts the loss function dynamically according to the positive-to-negative sample ratio, which improves recognition accuracy.
Step S101: use an acquired image data set containing different attribute feature labels as training data for a deep convolutional neural network.
This application mainly identifies the attribute features of target objects in video surveillance images. Video surveillance is widely used in many fields and plays an important role in industries such as intelligent transportation, banking, safe cities and public safety, bringing convenience and security to many aspects of daily life. However, the massive volume of surveillance data makes fast retrieval of useful information very difficult: manual search would consume a great deal of manpower and material resources and would be inefficient. This application therefore designs and trains a deep convolutional neural network to identify the attribute features of target objects.
To train a deep convolutional neural network, a training data set must first be obtained. The training data is ultimately image data, which may also be obtained by decoding video data. This application takes a pedestrian as an example of a target object, but persons of ordinary skill in the art will appreciate that the target object can be any reasonable object, for example a car, a train, or an animal. The attribute features of the training data have many possibilities. For example, if the deep convolutional neural network is to identify pedestrians, pedestrian image data must be acquired as training data, along with the different attribute feature labels of the pedestrians. Typical pedestrian attribute feature labels include: gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a backpack, whether carrying a handbag, whether carrying an object, and whether making a phone call.
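As an illustration, the 8 binary attribute labels above can be encoded as a multi-hot target vector per image. This is only a sketch: the attribute names, their order, and the function name are assumptions for illustration, not part of the application.

```python
# Hypothetical encoding of the 8 pedestrian attribute labels described above.
# Attribute names and their ordering are illustrative assumptions.
ATTRIBUTES = [
    "male", "hat", "glasses", "single_shoulder_bag",
    "backpack", "handbag", "carrying_object", "phone_call",
]

def encode_labels(tags):
    """Map the set of attributes present in an image to a 0/1 target vector."""
    return [1 if attr in tags else 0 for attr in ATTRIBUTES]

print(encode_labels({"hat", "backpack"}))  # -> [0, 1, 0, 0, 1, 0, 0, 0]
```

Each component of the vector is then one output of the multi-label network, so the positive/negative imbalance discussed below is counted per component.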
Step S102: train the deep convolutional neural network with the training data, dynamically adjusting the loss function according to the ratio of positive to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features reaches a minimum or converges.
The previous step obtained the training data set of the deep convolutional neural network: pedestrian image data with different attribute feature labels, used to train the network. For a deep convolutional neural network that identifies pedestrian attribute features, the acquired training data set contains pedestrian image data with a variety of attribute feature labels, such as gender, whether wearing a hat, whether wearing glasses, whether carrying a single-shoulder bag, whether carrying a handbag, whether carrying an object, and whether making a phone call. The pedestrian image data for each label also needs further refinement. For example, for "wearing a hat" there are many styles of hat, brimmed and brimless, in various colours, so the single label "whether wearing a hat" covers many different images. Likewise, the other attribute features, once refined, cover many different images, so pedestrian image data containing these attribute feature labels should be acquired as comprehensively as possible. These images may also differ in size, pixel values, colour values and other image attributes. Choosing such varied image data to train the deep convolutional neural network helps the accuracy of the training result.
Since the acquired pedestrian images are varied, their sizes may be inconsistent. To speed up the convergence of network training, and also to improve the network's training capability and training speed, the pedestrian image data needs to be preprocessed before training. Specifically, the acquired pedestrian images are processed with a normalization method: each image is resized to a preset image size, so that all images are the same size after processing.
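A minimal sketch of this normalization step, assuming nearest-neighbour resizing and scaling of pixel values to [0, 1]. The application does not fix the exact resizing or scaling method; the 113-pixel target size is taken from the embodiment described later.

```python
import numpy as np

def normalize_image(img: np.ndarray, size: int = 113) -> np.ndarray:
    """Resize an image array to size x size (nearest-neighbour, for brevity)
    and scale pixel values from [0, 255] to [0, 1]."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

# After this step every image in a batch shares one shape:
batch = [normalize_image(np.random.randint(0, 256, (240, 320, 3), np.uint8))
         for _ in range(4)]
```

In practice a library resize (e.g. bilinear) would be used; the point here is only that all inputs end up the same size before training.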
A deep convolutional neural network includes a basic but important function, the loss function, whose role is to measure the quality of the model's predictions. Likewise, the better the loss function is tuned, the more accurately the deep convolutional neural network can be optimized, so the quality of the loss function design affects the accuracy of the network. To improve accuracy through the loss function: when the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum or converges, the gap between the output value and the true value is reduced, improving the accuracy of the output.
Next, the images processed by the normalization method are used to train the deep convolutional neural network.
The pedestrian data used to train the network carries 8 attribute feature labels. In general, the number of positive samples is smaller than the number of negative samples. For example, for a training image of a male pedestrian who wears glasses and carries a backpack, the labels for wearing a hat, carrying a single-shoulder bag, carrying a handbag, carrying an object, and so on are all negative. It is normal for a pedestrian image to carry 3 positive labels out of the 8, so the number of negative samples is generally larger than the number of positive samples, and the sample sizes are imbalanced. This can be compensated through the loss function. The common practice is to adjust the loss function with a fixed positive-to-negative sample ratio. But the positive-to-negative ratio of the attribute features of the training data varies with the training data, so it is usually not fixed, and adjusting the loss function with a fixed ratio is clearly unreasonable and degrades the accuracy of the loss function. The method provided by this application instead adjusts the loss function dynamically according to the positive-to-negative sample ratio of the attribute features of the training data, so that the ratio and the loss function are matched; tuning the loss function in this way improves the accuracy of the network's output layer.
During training, the deep convolutional neural network identifies multiple attribute features of a pedestrian, and the sample distributions of the attributes are not uniform but extremely imbalanced. For example, the positive samples of "making a phone call" and "carrying an object" are far fewer than the negative samples of "not making a phone call" and "not carrying an object", so training is biased toward recognizing negative samples. To recognize positive samples better, the weight of positive samples in the loss function needs to be increased. A fixed weight parameter is not necessarily optimal for every batch (the number of images fed into the network at each training iteration). To keep the positive and negative samples of each batch balanced during training, a dynamic parameter replaces the preset parameter, making the algorithm more adaptive. The dynamic loss function provided by this application is:
when the ratio of the number of positive samples to the number of negative samples is less than 1:3, w_l is given by a first formula (not reproduced in this excerpt);
when the ratio is greater than 3:1, w_l is given by a second formula (not reproduced in this excerpt);
for any other ratio, w_l = 1;
where E is the total loss function of the whole network, x_i is a sample, x_il is the confidence that sample x_i has attribute l, y_il is the true label indicating whether sample x_i has attribute l, n is the number of samples, w_l is the weight used when computing the loss, N_l^+ and N_l^- are the numbers of positive and negative samples of attribute l, and a ∈ (0, 1] is drawn at random at each iteration.
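The weight formulas for the two imbalanced cases appear only as images in the original, so the sketch below substitutes a plausible inverse-frequency weight scaled by the random factor a; only the three-case structure, the w_l = 1 default, and the weighted cross-entropy form of E are taken from the text. All function names are illustrative.

```python
import math
import random

def dynamic_weight(n_pos, n_neg, a):
    """Per-attribute weight w_l. The two imbalanced-case formulas are not
    reproduced in the text; an inverse-frequency weight scaled by the random
    a in (0, 1] is an assumed stand-in, not the patent's exact formula."""
    if 3 * n_pos < n_neg:          # positives scarce: ratio < 1:3
        return a * n_neg / max(n_pos, 1)
    if n_pos > 3 * n_neg:          # positives dominant: ratio > 3:1
        return a * n_pos / max(n_neg, 1)
    return 1.0                     # roughly balanced: w_l = 1

def weighted_loss(conf, labels, weights):
    """E = -(1/n) * sum_i sum_l w_l * [y_il*log(s) + (1-y_il)*log(1-s)],
    with s = sigmoid(x_il), the confidence that sample i has attribute l."""
    n = len(conf)
    total = 0.0
    for xi, yi in zip(conf, labels):
        for x, y, w in zip(xi, yi, weights):
            s = 1.0 / (1.0 + math.exp(-x))
            total -= w * (y * math.log(s) + (1 - y) * math.log(1 - s))
    return total / n

a = random.uniform(1e-6, 1.0)      # a is redrawn at each training iteration
w = [dynamic_weight(5, 95, a), dynamic_weight(50, 50, a)]
```

Under this stand-in, an attribute with 5 positives among 100 samples has its loss term scaled up by a factor of a·19, while a balanced attribute keeps weight 1, counteracting the bias toward negative samples described above.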
Step S103: acquire image data containing the target object to be identified, identify the attribute features of the target object to be identified with the trained deep convolutional neural network, and output the multiple attribute features of the target object to be identified.
This step identifies the target object to be identified with the deep convolutional neural network and outputs the multiple attribute features of the target object to be identified at the output layer.
The recognition input of the deep convolutional neural network is image data, so the target object to be identified is image data. It may also arrive as video stream data, which needs to be decoded; the images obtained after decoding are then selected and identified as the target object to be identified. In this application, the content identified is mainly video surveillance from industries such as intelligent transportation, banking, safe cities and public safety, so the video stream data is first decoded, and the decoded images, once selected, are fed into the deep convolutional neural network for recognition.
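Decoding is typically done with a library such as OpenCV (`cv2.VideoCapture`); the frame-thinning logic for the "predetermined frame rate" can be sketched independently of the decoder. The function name and the assumption of a constant source frame rate are illustrative:

```python
def sample_frame_indices(total_frames, source_fps, target_fps):
    """Indices of the decoded frames to keep when thinning a stream from
    source_fps down to the predetermined target_fps."""
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))

# e.g. keep 5 frames per second from a 25 fps stream:
keep = sample_frame_indices(total_frames=100, source_fps=25.0, target_fps=5.0)
```

Only the kept frames are then cropped/selected for the target object and passed to the network, which bounds the recognition workload regardless of the stream's native frame rate.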
Before the deep convolutional neural network is used, it must be designed. A deep convolutional neural network generally comprises: an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. The parameter information of the deep convolutional neural network is then designed, specifically: the number of convolutional layers, the kernel size of each convolutional layer, the number of pooling layers, the size of each pooling layer, the number of fully connected layers, and the size of each fully connected layer. The size of the convolution kernels must be set first: when the kernel is large in pixels, fewer feature maps are obtained and the computation is smaller, but the recognition capability is slightly worse. Identifying the attribute features of the target object involves at least one round of convolution and pooling: the convolution operation obtains the feature maps of the target object to be identified, and the pooling operation reduces the computational complexity of the convolutional layers.
More specifically and preferably, in the present application the structure of the deep convolutional neural network may be based on the ImageNet-2010 network structure, whose top-5 error rate is 15.3%. The completed deep convolutional neural network specifically includes: 1 input layer, 5 convolutional layers with a stride of 1, 2 pooling layers, 2 fully connected layers and 1 multi-label output layer. The input image size is 113x113 pixels, i.e. the image size produced by the normalization processing of the previous step. The convolution kernel sizes are, in order, 11x11, 5x5, 3x3, 3x3 and 3x3 pixels; with a stride of 1, the input image is sampled with these 5 kernels in turn, and the numbers of feature maps obtained after sampling are 96, 256, 384, 384 and 512, respectively.
Activation functions play a very important role in neural networks, because they prevent the network from reducing to a simple linear combination. When convolving an image, a weight is assigned to each pixel, which is plainly a linear operation; since the input data is not necessarily linearly separable, a non-linear factor is introduced to solve problems that a linear model cannot. An activation function is therefore added after the output of each layer. The activation function may be sigmoid, tanh, ReLU, and so on. The sigmoid function suffers from very small gradients when saturated, and its output is not centered on 0; tanh likewise suffers from very small gradients when saturated. ReLU, a newer activation function, solves the gradient-vanishing problem to a very large degree, so ReLU is preferably chosen as the activation function.
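The saturation behavior described above can be checked numerically. The following sketch is illustrative only and not part of the patent; it compares the derivatives of the three candidate activation functions at increasingly large inputs:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    # derivative of sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

def d_relu(x):
    # derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

# At saturated inputs the sigmoid and tanh gradients all but vanish,
# while the ReLU gradient stays at 1 -- the "gradient dissipation"
# argument for preferring ReLU made in the text above.
for x in (0.0, 5.0, 10.0):
    print(x, d_sigmoid(x), d_tanh(x), d_relu(x))
```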
After the feature maps are obtained, they may be pooled. Pooling layers are inserted between successive convolutional layers to compress the amount of data and parameters and to reduce overfitting. In this application, two pooling layers are designed in total: the first pooling layer follows the second convolutional layer, and the second pooling layer follows the fifth convolutional layer. Max pooling over 2x2 pixels is selected; by eliminating non-maximum values, max pooling reduces the computational complexity of the upper layers. Omitting the pooling layers would not affect the final output, but the amount of computation of the whole deep convolutional neural network would increase, which would inevitably affect the performance of the whole system. The preferred scheme of the present application is therefore to pool the feature maps after convolution. The pooling layers are located between the convolutional layers and the fully connected layers. After the feature maps have been pooled, the fully connected layers integrate the features of the feature maps produced by the successive convolutional and pooling layers, and the resulting image features are then used for image classification.
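The layer sizes described above can be walked through with the standard output-size formula. This sketch is illustrative and not part of the patent; in particular, the padding of each convolution is not specified in the text, so zero padding is assumed here:

```python
def out_size(size, kernel, stride=1, pad=0):
    # Spatial size after a convolution or pooling layer:
    #   out = floor((in + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

size = 113  # normalized input image, 113x113 pixels
kernels = [11, 5, 3, 3, 3]           # the five convolution kernels, stride 1
feature_maps = [96, 256, 384, 384, 512]
pool_after = {2, 5}                  # 2x2 max pooling after conv 2 and conv 5

for i, k in enumerate(kernels, start=1):
    size = out_size(size, k, stride=1, pad=0)  # padding is an assumption
    print(f"conv{i}: {feature_maps[i - 1]} maps of {size}x{size}")
    if i in pool_after:
        size = out_size(size, 2, stride=2)
        print(f"pool:  {size}x{size}")
```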
In full connection processing, each node of a fully connected layer is connected to all nodes of the previous layer, so as to synthesize the features extracted earlier. In this application, the pooled feature images undergo full connection processing, thereby obtaining the 8 attribute features of the target object to be identified, i.e. the pedestrian.
The fully connected layers act as the "classifier" of the entire convolutional neural network. Whereas operations such as the convolutional layers, pooling layers and activation function layers map the raw data into a hidden-layer feature space, the fully connected layers map the learned "distributed feature representation" into the sample label space. In actual use, a fully connected layer can be implemented by a convolution operation: a fully connected layer whose preceding layer is also fully connected can be converted into a convolution with a 1x1 kernel, while a fully connected layer whose preceding layer is a convolutional layer can be converted into a global convolution with an hxw kernel, where h and w are respectively the height and width of the preceding layer's convolution result. After pooling, the sizes of the fully connected layers are 2048 and 4096, respectively.
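The equivalence stated above between a fully connected layer and a 1x1 convolution can be checked on toy numbers. This is an illustrative sketch, not part of the patent, using made-up weights:

```python
def dense(x, w):
    # Fully connected layer: one dot product per row of the weight matrix.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def conv1x1(channels, w):
    # A 1x1 convolution over c input channels at a single spatial
    # position is the same per-output-channel dot product.
    return [sum(row[c] * channels[c] for c in range(len(channels)))
            for row in w]

x = [1.0, 2.0, 3.0]            # 3 input channels at one position
w = [[0.5, 0.5, 0.0],          # weights for 2 output channels
     [1.0, -1.0, 1.0]]
assert dense(x, w) == conv1x1(x, w)  # identical results
print(dense(x, w))
```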
Next, the target object to be identified is identified by the deep convolutional neural network, which outputs multiple attribute features of the target object to be identified. Specifically, multiple labels are used to output, at the output layer of the deep convolutional neural network, the gender of the identified pedestrian image data and whether the pedestrian wears a hat, wears glasses, carries a shoulder bag, carries a backpack, carries a handbag, is carrying something, and is making a phone call.
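The multi-label output step can be sketched as follows. This is illustrative only and not the patent's implementation: the attribute names, the sigmoid squashing and the 0.5 decision threshold are assumptions, since the patent only states that 8 attribute labels are emitted at the output layer.

```python
import math

# Hypothetical names for the 8 pedestrian attributes listed in the text.
ATTRIBUTES = ["gender", "hat", "glasses", "shoulder_bag",
              "backpack", "handbag", "carrying", "phoning"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_multilabel(logits, threshold=0.5):
    """Turn 8 raw outputs of a multi-label layer into independent
    yes/no attribute decisions (one sigmoid + threshold per label)."""
    return {name: sigmoid(z) >= threshold
            for name, z in zip(ATTRIBUTES, logits)}

# Hypothetical raw network outputs for one pedestrian crop:
print(decode_multilabel([2.1, -1.3, 0.4, -0.2, 3.0, -2.5, 0.9, -0.1]))
```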
In addition, the present application also provides a computer program product comprising a processor-executable program, characterized in that, when executed by a processor, the program performs the following steps:
acquiring image data containing different attribute feature labels as the training data of a deep convolutional neural network;
using the acquired image data set containing different attribute feature labels as the training data of the deep convolutional neural network;
training the deep convolutional neural network using the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state;
during the training of the deep convolutional neural network, dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data; and
decoding video stream data to be detected and selecting images therefrom to obtain image data containing a target object to be identified, identifying the attribute features of the target object to be identified using the trained deep convolutional neural network, and outputting multiple attribute features of the target object to be identified.
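The "decode and select images" step can be sketched as plain frame-index arithmetic. This is an illustrative sketch, not the patent's implementation: the source frame rate and sampling rate below are assumptions, and actual decoding would use a video library.

```python
def select_frame_indices(total_frames, source_fps, sample_fps):
    """Indices of the frames to keep when sampling a decoded video
    stream at a predetermined rate (sample_fps) lower than the
    stream's own frame rate (source_fps)."""
    step = max(1, round(source_fps / sample_fps))
    return list(range(0, total_frames, step))

# e.g. a 2-second clip at 25 fps, sampled at 5 images per second:
print(select_frame_indices(50, 25, 5))  # keeps every 5th frame
```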
The present application simultaneously provides a system for identifying attribute features of a target object, characterized in that the system comprises:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 6.
Fig. 2 is a schematic structural diagram of a device for identifying attribute features of a target object provided by an embodiment of the present application. Corresponding to the method for identifying attribute features of a target object provided by the present application, the present application provides a device 200 for identifying attribute features of a target object, the device comprising:
a training data acquiring unit 201, configured to use an acquired image data set containing different attribute feature labels as the training data of a deep convolutional neural network;
a training unit 202, configured to train the deep convolutional neural network using the training data, dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state; and
an output unit 203, configured to obtain image data containing a target object to be identified, identify the attribute features of the target object to be identified using the trained deep convolutional neural network, and output multiple attribute features of the target object to be identified.
Optionally, the adjustment unit further includes:
a design unit for designing the dynamic loss function loss as follows:
when the ratio of the number of positive samples to the number of negative samples is less than 1/3:
when the ratio of the number of positive samples to the number of negative samples is greater than 3:
when the ratio of the number of positive samples to the number of negative samples is any other ratio:
wl=1
wherein E is the total loss function of the whole network, xi is a sample with a confidence level for each attribute l, yil is the true label characterizing whether sample xi has attribute l, n denotes the number of samples, wl is the weight used when computing the loss function, computed from the number of positive samples of attribute l and the number of negative samples of attribute l, and a ∈ (0,1] is obtained at random at each iteration.
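The weight formulas themselves appear as images in the original publication and are not reproduced here. The sketch below only illustrates the described three-way case split (ratio below 1/3, above 3, otherwise); the up-/down-weighting rules for the two imbalanced cases are placeholder assumptions, not the patent's formulas, which involve the positive count, the negative count and the random factor a.

```python
import random

def attribute_weight(pos_count, neg_count, a=None):
    """Three-way case split for the per-attribute loss weight w_l,
    driven by the positive/negative sample ratio. The weight values
    for the imbalanced branches are illustrative placeholders."""
    if a is None:
        a = random.uniform(1e-9, 1.0)  # a in (0, 1], redrawn each iteration
    ratio = pos_count / neg_count
    if ratio < 1 / 3:        # positives are scarce: up-weight them
        return 1.0 + a * neg_count / pos_count
    if ratio > 3:            # negatives are scarce: up-weight them
        return 1.0 + a * pos_count / neg_count
    return 1.0               # roughly balanced: w_l = 1

print(attribute_weight(100, 100, a=0.5))  # balanced case
print(attribute_weight(10, 100, a=0.5))   # scarce positives
```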
Claims (10)
1. A method for identifying attribute features of a target object, characterized by comprising:
using an acquired image data set containing different attribute feature labels as the training data of a deep convolutional neural network;
training the deep convolutional neural network using the training data, and dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state; and
obtaining image data containing a target object to be identified, identifying the attribute features of the target object to be identified using the trained deep convolutional neural network, and outputting multiple attribute features of the target object to be identified.
2. The method according to claim 1, wherein the target object is a pedestrian, and the acquired image data set containing different attribute feature labels includes:
the image data of pedestrians; and
the different attribute feature labels of the pedestrians.
3. The method according to claim 2, characterized in that the different attribute feature labels of the pedestrian include:
gender, whether a hat is worn, whether glasses are worn, whether a shoulder bag is carried, whether a backpack is carried, whether a handbag is carried, whether something is being carried, and whether a phone call is being made;
wherein outputting multiple attribute features of the target object to be identified comprises:
outputting, at the output layer of the deep convolutional neural network, the multiple different attribute features of the identified pedestrian image data, namely the gender and whether a hat is worn, whether glasses are worn, whether a shoulder bag is carried, whether a backpack is carried, whether a handbag is carried, whether something is being carried, and whether a phone call is being made.
4. The method according to claim 1, wherein training the deep convolutional neural network using the training data comprises:
processing the training data to a preset size using a normalization method, so that all image data in the image data set are converted to images of identical size; and training the deep convolutional neural network using the image data of identical size.
5. The method according to claim 1, wherein dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data comprises:
designing the dynamic loss function as follows:
when the ratio of the number of positive samples to the number of negative samples is less than 1/3:
when the ratio of the number of positive samples to the number of negative samples is greater than 3:
when the ratio of the number of positive samples to the number of negative samples is any other ratio:
wl=1
wherein E is the total loss function of the whole network, xi is a sample with a confidence level for each attribute l, yil is the true label characterizing whether sample xi has attribute l, n denotes the number of samples, wl is the weight used when computing the loss function, computed from the number of positive samples of attribute l and the number of negative samples of attribute l, and a ∈ (0,1] is obtained at random at each iteration.
6. The method according to claim 1, wherein obtaining the image data containing the target object to be identified comprises:
decoding video stream data to obtain video data, selecting multiple image data from the video data at a predetermined frame rate, and selecting, from the multiple image data, the image data of the target object to be identified that contains different attribute features.
7. A computer program product comprising a processor-executable program, characterized in that, when executed by a processor, the program performs the following steps:
using an acquired image data set containing different attribute feature labels as the training data of a deep convolutional neural network;
training the deep convolutional neural network using the training data, and dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state; and
obtaining image data containing a target object to be identified, identifying the attribute features of the target object to be identified using the trained deep convolutional neural network, and outputting multiple attribute features of the target object to be identified.
8. A system for identifying attribute features of a target object, characterized in that the system comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 6.
9. A device for identifying attribute features of a target object, the device comprising:
a training data acquiring unit, configured to use an acquired image data set containing different attribute feature labels as the training data of a deep convolutional neural network;
a training unit, configured to train the deep convolutional neural network using the training data, dynamically adjusting the loss function according to the ratio of positive samples to negative samples of the multiple attribute features of the training data, so that the weighted sum of the loss functions of the multiple attribute features of the training data reaches a minimum state or a convergent state; and
an output unit, configured to obtain image data containing a target object to be identified, identify the attribute features of the target object to be identified using the trained deep convolutional neural network, and output multiple attribute features of the target object to be identified.
10. The device according to claim 9, characterized in that the adjustment unit further includes:
a design unit for designing the dynamic loss function loss as follows:
when the ratio of the number of positive samples to the number of negative samples is less than 1/3:
when the ratio of the number of positive samples to the number of negative samples is greater than 3:
when the ratio of the number of positive samples to the number of negative samples is any other ratio:
wl=1
wherein E is the total loss function of the whole network, xi is a sample with a confidence level for each attribute l, yil is the true label characterizing whether sample xi has attribute l, n denotes the number of samples, wl is the weight used when computing the loss function, computed from the number of positive samples of attribute l and the number of negative samples of attribute l, and a ∈ (0,1] is obtained at random at each iteration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811003925.6A CN109359515A (en) | 2018-08-30 | 2018-08-30 | A kind of method and device that the attributive character for target object is identified |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109359515A true CN109359515A (en) | 2019-02-19 |
Family
ID=65350299
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811003925.6A Pending CN109359515A (en) | 2018-08-30 | 2018-08-30 | A kind of method and device that the attributive character for target object is identified |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109359515A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245564A (en) * | 2019-05-14 | 2019-09-17 | 平安科技(深圳)有限公司 | A kind of pedestrian detection method, system and terminal device |
CN110288082A (en) * | 2019-06-05 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Convolutional neural networks model training method, device and computer readable storage medium |
CN110457984A (en) * | 2019-05-21 | 2019-11-15 | 电子科技大学 | Pedestrian's attribute recognition approach under monitoring scene based on ResNet-50 |
CN110516602A (en) * | 2019-08-28 | 2019-11-29 | 杭州律橙电子科技有限公司 | A kind of public traffice passenger flow statistical method based on monocular camera and depth learning technology |
CN110598716A (en) * | 2019-09-09 | 2019-12-20 | 北京文安智能技术股份有限公司 | Personnel attribute identification method, device and system |
CN110688888A (en) * | 2019-08-02 | 2020-01-14 | 浙江省北大信息技术高等研究院 | Pedestrian attribute identification method and system based on deep learning |
CN110874577A (en) * | 2019-11-15 | 2020-03-10 | 杭州东信北邮信息技术有限公司 | Automatic verification method of certificate photo based on deep learning |
CN111160411A (en) * | 2019-12-11 | 2020-05-15 | 东软集团股份有限公司 | Classification model training method, image processing method, device, medium, and apparatus |
CN111178403A (en) * | 2019-12-16 | 2020-05-19 | 北京迈格威科技有限公司 | Method and device for training attribute recognition model, electronic equipment and storage medium |
CN111814846A (en) * | 2020-06-19 | 2020-10-23 | 浙江大华技术股份有限公司 | Training method and recognition method of attribute recognition model and related equipment |
CN111931799A (en) * | 2019-05-13 | 2020-11-13 | 百度在线网络技术(北京)有限公司 | Image recognition method and device |
CN112417205A (en) * | 2019-08-20 | 2021-02-26 | 富士通株式会社 | Target retrieval device and method and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104992142A (en) * | 2015-06-03 | 2015-10-21 | 江苏大学 | Pedestrian recognition method based on combination of depth learning and property learning |
CN106529442A (en) * | 2016-10-26 | 2017-03-22 | 清华大学 | Pedestrian identification method and apparatus |
CN107506786A (en) * | 2017-07-21 | 2017-12-22 | 华中科技大学 | A kind of attributive classification recognition methods based on deep learning |
CN107633223A (en) * | 2017-09-15 | 2018-01-26 | 深圳市唯特视科技有限公司 | A kind of video human attribute recognition approach based on deep layer confrontation network |
CN107766850A (en) * | 2017-11-30 | 2018-03-06 | 电子科技大学 | Based on the face identification method for combining face character information |
CN107832581A (en) * | 2017-12-15 | 2018-03-23 | 百度在线网络技术(北京)有限公司 | Trend prediction method and device |
CN107862300A (en) * | 2017-11-29 | 2018-03-30 | 东华大学 | A kind of descending humanized recognition methods of monitoring scene based on convolutional neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190219 |