CN112926427A - Target user dressing attribute identification method and device - Google Patents

Target user dressing attribute identification method and device

Info

Publication number
CN112926427A
CN112926427A
Authority
CN
China
Prior art keywords
network
target user
white
attribute value
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110187498.7A
Other languages
Chinese (zh)
Inventor
廖丹萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Smart Video Security Innovation Center Co Ltd
Original Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Smart Video Security Innovation Center Co Ltd filed Critical Zhejiang Smart Video Security Innovation Center Co Ltd
Priority to CN202110187498.7A priority Critical patent/CN112926427A/en
Publication of CN112926427A publication Critical patent/CN112926427A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying the dressing attributes of a target user, which comprises the following steps: cropping a human body image corresponding to a human body region out of a monitoring image; inputting the human body image into a network model, in which a feature extraction network extracts head features and body features of the human body image, the head features are input into a first identification network and a second identification network, and the body features are input into a third identification network, wherein the first identification network identifies whether the target user wears a mask according to the head features, the second identification network identifies whether the target user wears a white hat according to the head features, and the third identification network identifies whether the target user wears a white gown according to the body features; and acquiring the mask identification result, white hat identification result and white gown identification result output by the network model. The dressing attributes of a target user (such as a delicatessen operator) are monitored intelligently from the real-time surveillance picture, so that supervision can be carried out continuously over long periods and labor costs are saved.

Description

Target user dressing attribute identification method and device
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for identifying the dressing attribute of a target user.
Background
The safety of ready-to-eat food such as cooked food and bean products sold in farmers' markets has long been an important concern of governments and the public. Cooked food stores in farmers' markets are required to strictly follow the 'three-white' principle, that is, store operators must wear white gowns, white hats and white masks while working, so as to guarantee the hygiene and safety of the cooked food.
In order to standardize market operation and guarantee food safety, market supervision authorities generally send personnel to patrol and inspect delicatessens and penalize non-compliant stores. This approach is time-consuming and labor-intensive, provides supervision only at the moment of inspection, and cannot keep delicatessens under supervision over long periods.
Disclosure of Invention
To overcome the defects of the prior art, the present invention provides a method and a device for identifying the dressing attributes of a target user. This object is achieved by the following technical solution.
The invention provides a method for identifying the dressing attribute of a target user, which comprises the following steps:
cutting out a human body image corresponding to the human body area from the monitoring image;
inputting the human body image into a trained network model, extracting head characteristics and body characteristics of the human body image by a characteristic extraction network in the network model, inputting the head characteristics into a first identification network and a second identification network in the network model, and inputting the body characteristics into a third identification network in the network model, wherein the first identification network identifies whether a target user wears a mask according to the head characteristics, the second identification network identifies whether the target user wears a white hat according to the head characteristics, and the third identification network identifies whether the target user wears a white gown according to the body characteristics;
and acquiring a mask identification result, a white hat identification result and a white gown identification result which are output by the network model.
A second aspect of the present invention provides an apparatus for identifying a target user dressing attribute, the apparatus comprising:
the cutting module is used for cutting a human body image corresponding to the human body area from the monitoring image;
the recognition module is used for inputting the human body image into a trained network model, extracting head characteristics and body characteristics of the human body image by a characteristic extraction network in the network model, inputting the head characteristics into a first recognition network and a second recognition network in the network model, and inputting the body characteristics into a third recognition network in the network model, wherein the first recognition network recognizes whether a target user wears a mask according to the head characteristics, the second recognition network recognizes whether the target user wears a white hat according to the head characteristics, and the third recognition network recognizes whether the target user wears a white gown according to the body characteristics;
and the result acquisition module is used for acquiring the mask identification result, the white hat identification result and the white gown identification result which are output by the network model.
Based on the method and the device for identifying the target user dressing attribute in the first aspect and the second aspect, the method and the device have the following beneficial effects:
the dressing property of a target user (such as a delicatessen operator) is intelligently monitored by utilizing a real-time monitoring picture, so that the intelligent monitoring system can be used for continuously monitoring for a long time, and a large amount of labor cost is saved. In the process of dressing identification of a human body image cut out from a monitoring picture by using a network model, in order to obtain a more accurate identification result, the head characteristic and the body characteristic of the human body are distinguished and respectively identified, when a white hat and a mask are identified, only the head characteristic is used, when a white jacket is identified, only the body characteristic is used, and the interference of other region characteristics is avoided.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating an embodiment of a target user dressing attribute identification method in accordance with an illustrative embodiment of the present invention;
FIG. 2 is a schematic diagram of a network model architecture according to an exemplary embodiment of the present invention;
FIG. 3 is a flowchart illustrating an embodiment of a model training method according to an exemplary embodiment of the present invention;
fig. 4 is a schematic structural diagram illustrating a target user dressing attribute identification apparatus according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
Fig. 1 is a flowchart illustrating an embodiment of a method for identifying the dressing attributes of a target user according to an exemplary embodiment of the present invention. The target user may be a delicatessen operator, or any other person whose dressing needs to be supervised. As shown in fig. 1, the method for identifying the dressing attributes of the target user includes the following steps:
step 101: and cutting out a human body image corresponding to the human body area from the monitoring image.
In some embodiments, a monitoring image of the target scene captured by a camera is acquired, the human body region in the monitoring image is detected, and the human body image corresponding to the human body region is cropped out of the monitoring image to serve as the recognition data source, thereby removing irrelevant background elements.
The target scene is a scene that needs to be supervised. For example, when the target scene is a delicatessen, the camera monitors the operator in the store in real time, so a single-frame monitoring image can be extracted from the surveillance video captured by the camera for human body detection; if a human body region is detected, it is cropped out of the frame to serve as the recognition data source.
It can be understood that any existing human body detection technology, such as a human body detection model or a human body detection algorithm, can be used to detect the human body region; the present invention places no particular limitation on this, as long as the human body detection function is realized.
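As an illustration of step 101, the following sketch shows how human regions could be detected and cropped from a single surveillance frame. The patent does not prescribe a particular detector; the COCO-pretrained Faster R-CNN from torchvision, the 0.7 score threshold and the helper name crop_human_images are assumptions made only for this example.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    # Any existing human detector can be used here; for illustration a COCO-pretrained
    # Faster R-CNN from torchvision is assumed, in which category id 1 is "person".
    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    detector.eval()

    def crop_human_images(frame: Image.Image, score_thresh: float = 0.7):
        """Detect human regions in one surveillance frame and crop them out."""
        with torch.no_grad():
            pred = detector([to_tensor(frame)])[0]
        crops = []
        for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
            if label.item() == 1 and score.item() >= score_thresh:
                x1, y1, x2, y2 = (int(v) for v in box.tolist())
                crops.append(frame.crop((x1, y1, x2, y2)))
        return crops
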
Step 102: inputting a human body image into a trained network model, extracting head characteristics and body characteristics of the human body image by a characteristic extraction network in the network model, inputting the head characteristics into a first identification network and a second identification network in the network model, and inputting the body characteristics into a third identification network in the network model, wherein the first identification network identifies whether a target user wears a mask according to the head characteristics, the second identification network identifies whether the target user wears a white hat according to the head characteristics, and the third identification network identifies whether the target user wears a white gown according to the body characteristics.
Fig. 2 shows the structure of the network model; the recognition process of the network model is described in detail below with reference to fig. 2:
1. Processing flow of the feature extraction network on the input human body image
First, the global body features of the human body image are extracted by a feature extraction module in the feature extraction network; the global body features are then split according to a preset body proportion distribution, and the head features and body features obtained by splitting are input into a global average pooling layer, which performs global average pooling on the head features and the body features respectively.
It should be noted that, because body proportions and postures vary considerably from person to person, separating the head from the body with a single fixed dividing line would make the part segmentation inaccurate. Therefore, this embodiment reserves a small overlap between the head features and the body features so as to preserve the integrity of the features of each relevant body part as much as possible; that is, the head feature and the body feature obtained by splitting share an overlapping region.
For example, assuming the output feature map of the feature extraction module is 7 × 7 × 2048, the upper 2 × 7 × 2048 portion of the output feature map is taken as the head feature and input into the global average pooling layer, which maps the 2 × 7 × 2048 feature to a 1 × 1 × 2048-dimensional head feature; the lower 6 × 7 × 2048 portion is taken as the body feature and input into the global average pooling layer, which maps the 6 × 7 × 2048 feature to a 1 × 1 × 2048-dimensional body feature.
It can be seen that the overlap between the head feature and the body feature is row 2 of the 7 × 7 feature map.
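A minimal sketch of this feature extraction network is given below, assuming PyTorch and a ResNet-50 backbone; the backbone choice, the 224 × 224 input size and the class name HeadBodyFeatureExtractor are illustrative assumptions, since the patent only requires a feature extraction module whose output map can be split with the 2-row/6-row overlap described above.

    import torch
    import torch.nn as nn
    import torchvision

    class HeadBodyFeatureExtractor(nn.Module):
        # Sketch of the feature extraction network: a CNN backbone whose 7x7x2048 output
        # map is split row-wise into an overlapping head part (upper 2 rows) and body part
        # (lower 6 rows, sharing one row with the head), each reduced to a 2048-d vector
        # by global average pooling. A ResNet-50 backbone is assumed purely for illustration.
        def __init__(self):
            super().__init__()
            resnet = torchvision.models.resnet50(weights="DEFAULT")
            # keep everything up to, but excluding, ResNet's own pooling and fc layers
            self.backbone = nn.Sequential(*list(resnet.children())[:-2])
            self.gap = nn.AdaptiveAvgPool2d(1)

        def forward(self, x):                      # x: (N, 3, 224, 224)
            fmap = self.backbone(x)                # (N, 2048, 7, 7)
            head = fmap[:, :, 0:2, :]              # upper 2 rows -> head region
            body = fmap[:, :, 1:7, :]              # lower 6 rows -> body region (row index 1 overlaps)
            head_feat = self.gap(head).flatten(1)  # (N, 2048)
            body_feat = self.gap(body).flatten(1)  # (N, 2048)
            return head_feat, body_feat
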
2. Flow of the first identification network identifying whether a mask is worn according to the head characteristics
The classification layer in the first identification network calculates, according to the head characteristics, a first attribute value for the target user wearing a mask, a second attribute value for the target user not wearing a mask and a third attribute value for the mask being unrecognizable, and outputs them to the softmax layer; the softmax layer converts the first attribute value, the second attribute value and the third attribute value into a probability distribution, which is taken as the mask identification result.
The softmax layer performs a normalization operation on the input attribute values to convert each attribute value into a probability, and the probabilities sum to 1; that is, the probability of wearing a mask, the probability of not wearing a mask and the probability of the mask being unrecognizable add up to 1.
It should be noted that when the target user faces the camera, whether a mask is worn can be determined from the monitoring image collected by the camera, whereas when the target user faces away from the camera this cannot be determined; the category "mask unrecognizable" is therefore added to the first identification network.
3. The second identification network identifies whether the target user wears a white hat according to the head characteristics:
and the classification layer in the second recognition network calculates a fourth attribute value of the target user wearing the white hat and a fifth attribute value of the target user not wearing the white hat according to the head characteristics and outputs the fourth attribute value and the fifth attribute value to the softmax layer in the second recognition network, and the softmax layer converts the fourth attribute value and the fifth attribute value into probability distribution and takes the probability distribution as a white hat recognition result.
Based on the same principle, the softmax layer performs a normalization operation on the input attribute values to convert each attribute value into a probability, and the probabilities sum to 1; that is, the probability of wearing a white hat and the probability of not wearing a white hat add up to 1.
4. The third identification network identifies whether the target user wears a white gown according to the body characteristics
The classification layer in the third identification network calculates a sixth attribute value of the target user wearing the white gown and a seventh attribute value of the target user not wearing the white gown according to the body characteristics and outputs the sixth attribute value and the seventh attribute value to the softmax layer in the third identification network, and the softmax layer converts the sixth attribute value and the seventh attribute value into a probability distribution and takes the probability distribution as the white gown identification result.
The sum of the probability of wearing the white gown and the probability of not wearing the white gown is 1.
Based on the above description, the final output of the network model is a dressing attribute probability vector of length 7, consisting of the probability of wearing a mask, the probability of not wearing a mask, the probability that the mask cannot be recognized, the probability of wearing a white hat, the probability of not wearing a white hat, the probability of wearing a white gown and the probability of not wearing a white gown.
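The sketch below assembles the three recognition heads on top of the feature extraction network from the earlier sketch. The layer sizes follow the description above (a 3-way mask head, a 2-way white-hat head and a 2-way white-gown head, each with its own softmax); the class name DressingAttributeModel and the ordering of the concatenated outputs, which is chosen here to match the label vector of step 301 below, are assumptions made for this example.

    import torch
    import torch.nn as nn

    class DressingAttributeModel(nn.Module):
        # Sketch of the overall network model of Fig. 2: the shared feature extraction
        # network (HeadBodyFeatureExtractor from the sketch above) feeds a 2-way white-hat
        # head and a 3-way mask head from the head feature, and a 2-way white-gown head
        # from the body feature; each head is a classification (fully connected) layer
        # followed by its own softmax, and the three probability groups are concatenated
        # into the length-7 output, assumed here to be ordered [white hat, no white hat,
        # mask, no mask, mask unrecognizable, white gown, no white gown].
        def __init__(self, feat_dim: int = 2048):
            super().__init__()
            self.features = HeadBodyFeatureExtractor()
            self.hat_head = nn.Linear(feat_dim, 2)   # wearing / not wearing a white hat
            self.mask_head = nn.Linear(feat_dim, 3)  # wearing / not wearing / unrecognizable
            self.gown_head = nn.Linear(feat_dim, 2)  # wearing / not wearing a white gown

        def forward(self, x):
            head_feat, body_feat = self.features(x)
            hat_prob = torch.softmax(self.hat_head(head_feat), dim=1)    # sums to 1
            mask_prob = torch.softmax(self.mask_head(head_feat), dim=1)  # sums to 1
            gown_prob = torch.softmax(self.gown_head(body_feat), dim=1)  # sums to 1
            return torch.cat([hat_prob, mask_prob, gown_prob], dim=1)    # (N, 7)
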
Step 103: acquiring a mask identification result, a white hat identification result and a white gown identification result which are output by the network model.
It should be noted that, for the mask-wearing type, the attribute corresponding to the maximum probability in the mask identification result may be taken as the classification attribute; for the white-hat-wearing type, the attribute corresponding to the maximum probability in the white hat identification result may be taken as the classification attribute; and for the white-gown-wearing type, the attribute corresponding to the maximum probability in the white gown identification result may be taken as the classification attribute.
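As a simple illustration of this per-group argmax, the helper below (a hypothetical function name, not part of the patent) decodes the length-7 probability vector produced by the model sketch above.

    def decode_attributes(probs):
        # probs: length-7 probability vector from the model sketch above, assumed to be
        # ordered [white hat, no white hat, mask, no mask, mask unrecognizable,
        # white gown, no white gown]; the highest-probability attribute in each group
        # is taken as the classification result.
        hat = ["wearing white hat", "not wearing white hat"][int(probs[0:2].argmax())]
        mask = ["wearing mask", "not wearing mask", "mask unrecognizable"][int(probs[2:5].argmax())]
        gown = ["wearing white gown", "not wearing white gown"][int(probs[5:7].argmax())]
        return hat, mask, gown
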
This completes the identification process shown in fig. 1. The dressing attributes of the target user (such as a delicatessen operator) are monitored intelligently from the real-time surveillance picture, so that monitoring can continue over long periods and a large amount of labor cost is saved. When the network model performs dressing recognition on a human body image cropped from the surveillance picture, the head features and the body features of the human body are separated and recognized respectively in order to obtain more accurate recognition results: only the head features are used when recognizing the white hat and the mask, and only the body features are used when recognizing the white gown, which avoids interference from the features of other regions.
Fig. 3 is a flowchart of an embodiment of a model training method according to an exemplary embodiment of the present invention, where the training method of the present embodiment is used for training the network model shown in fig. 2, and as shown in fig. 3, the model training method includes the following steps:
step 301: acquiring a plurality of training images containing target users, and establishing label vectors for each frame of training image, wherein the label vectors comprise 7 attribute components of a white hat, a non-white hat, a mask, a non-mask, an unrecognizable mask, a white gown and a non-white gown.
In this embodiment, when creating the label vector for a training image, 0 indicates that the training image does not have the corresponding attribute component and 1 indicates that it does. For example, when the operator in the training image wears a white gown, wears a white hat and does not wear a mask, the corresponding label vector is [1, 0, 0, 1, 0, 1, 0].
That is, within each type of dressing attribute exactly one component is 1 and the rest are 0: the operator can only be in one of the two states of wearing or not wearing a white gown, and the mask and white hat attributes are handled in the same way.
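A small sketch of this label construction follows; the helper name make_label and its string arguments are illustrative only.

    def make_label(hat: str, mask: str, gown: str):
        # Component order: [white hat, no white hat, mask, no mask, mask unrecognizable,
        # white gown, no white gown]; exactly one component per group is set to 1.
        label = [0] * 7
        label[0 + ["white hat", "no white hat"].index(hat)] = 1
        label[2 + ["mask", "no mask", "unrecognizable"].index(mask)] = 1
        label[5 + ["white gown", "no white gown"].index(gown)] = 1
        return label

    # e.g. an operator wearing a white gown and a white hat but no mask:
    # make_label("white hat", "no mask", "white gown")  ->  [1, 0, 0, 1, 0, 1, 0]
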
Step 302: training a pre-constructed network model by using the multi-frame training images, calculating, during training, the difference loss between the mask recognition result, white hat recognition result and white gown recognition result output by the network model and the corresponding label vectors, and optimizing the parameters of the network model according to the difference loss by a gradient descent method.
As described in step 102, the present invention performs the softmax normalization operation on each type of dressing attribute separately. Therefore, during training, the cross-entropy loss can be calculated between the softmax-normalized attribute components of each type of dressing attribute and the ground-truth label vector. The cross-entropy loss is defined as follows:
Loss = -∑_{i=1}^{7} y_i · log(z_i)
where y_i is the label value of the i-th attribute component, z_i is the probability value of the i-th attribute component predicted by the network model, and i ranges from 1 to 7.
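In code, under the same assumptions as the earlier sketches (PyTorch, model outputs already softmax-normalised per group), this loss could look like the following; the epsilon term is an added numerical safeguard, not something stated in the patent.

    import torch

    def dressing_attribute_loss(pred_probs, labels):
        # Cross-entropy between the length-7 softmax outputs z and the label vector y,
        # summed over the 7 attribute components:  Loss = -sum_i y_i * log(z_i)
        # pred_probs: (N, 7) probabilities, already softmax-normalised per attribute group
        # labels:     (N, 7) 0/1 label vectors built as in step 301
        eps = 1e-8                                            # numerical safety for log(0)
        return -(labels * torch.log(pred_probs + eps)).sum(dim=1).mean()
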
Specifically, for the network model structure constructed as shown in fig. 2, the hyper-parameters of network training, including the batch size, the learning rate and the like, may be set according to the size of the data set composed of the training images and whether a pre-trained model is used during training. The output error of the network model is calculated with the loss function so that the network output is as close to the label vector as possible, thereby minimizing the classification error and enabling the network to extract discriminative features from the image.
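Putting the pieces together, a minimal training loop might look as follows; the optimiser (SGD with momentum), learning rate, epoch count and loader interface are example choices, since the patent leaves these hyper-parameters to be set per data set.

    import torch

    # Example hyper-parameters only; the patent leaves batch size, learning rate and the
    # optimiser to be chosen according to the data set and whether a pre-trained model is used.
    model = DressingAttributeModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    def train(loader, epochs: int = 20):
        # loader yields (images, labels): images (N, 3, 224, 224) tensors, labels (N, 7)
        # tensors built with make_label from the sketch above.
        model.train()
        for _ in range(epochs):
            for images, labels in loader:
                optimizer.zero_grad()
                probs = model(images)
                loss = dressing_attribute_loss(probs, labels.float())
                loss.backward()          # gradient of the difference loss
                optimizer.step()         # gradient-descent parameter update
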
Thus, the training process shown in fig. 3 is completed; with the network model obtained through this training process, whether the target user in the monitoring image wears a white hat, wears a mask and wears a white gown can be accurately identified.
Corresponding to the embodiment of the target user dressing attribute identification method, the invention also provides an embodiment of a target user dressing attribute identification device.
Fig. 4 is a schematic structural diagram of an embodiment of a target user dressing attribute identification apparatus according to an exemplary embodiment of the present invention. As shown in fig. 4, the target user dressing attribute identification apparatus includes:
a cutting module 410, configured to cut a human body image corresponding to a human body region from the monitoring image;
the recognition module 420 is configured to input the human body image into a trained network model, extract head features and body features of the human body image through a feature extraction network in the network model, input the head features into a first recognition network and a second recognition network in the network model, and input the body features into a third recognition network in the network model, where the first recognition network recognizes whether a target user wears a mask according to the head features, the second recognition network recognizes whether the target user wears a white hat according to the head features, and the third recognition network recognizes whether the target user wears a white gown according to the body features;
and a result obtaining module 430, configured to obtain the mask recognition result, the white hat recognition result and the white gown recognition result output by the network model.
In an optional implementation manner, the cropping module 410 is specifically configured to obtain a monitoring image of a target scene acquired by a camera; detecting a human body region in the monitoring image; and cutting out the human body image corresponding to the human body area from the monitoring image.
In an optional implementation manner, the identification module 420 is specifically configured to, in the process of extracting the head feature and the body feature of the human body image by using a feature extraction module in a feature extraction network, extract the global body feature of the human body image by using the feature extraction module in the feature extraction network, segment the global body feature according to a preset body proportion distribution, and input the head feature and the body feature obtained by segmentation into a global average pooling layer in the feature extraction network; and the global average pooling layer is used for respectively carrying out global average pooling on the head features and the body features.
In an optional implementation manner, the identification module 420 is specifically configured to, in a process that a first identification network identifies whether a target user wears a mask according to head features, calculate, by a classification layer in the first identification network, a first attribute value of the mask worn by the target user, a second attribute value of the mask not worn by the target user, and a third attribute value of the mask which cannot be identified according to the head features, and output the first attribute value, the second attribute value, and the third attribute value to a softmax layer in the first identification network; and the softmax layer converts the first attribute value, the second attribute value and the third attribute value into probability distribution and takes the probability distribution as a mask identification result.
In an optional implementation manner, the identifying module 420 is specifically configured to, in the process that the second identifying network identifies whether the target user wears a white hat according to the head feature, calculate, by the classification layer in the second identifying network, a fourth attribute value of the target user wearing the white hat and a fifth attribute value of the target user not wearing the white hat according to the head feature, and output the fourth attribute value and the fifth attribute value to a softmax layer in the second identifying network; and the softmax layer converts the fourth attribute value and the fifth attribute value into probability distribution and takes the probability distribution as a white hat identification result.
In an optional implementation manner, the identification module 420 is specifically configured to, in a process that a third identification network identifies whether a target user wears a white gown according to body characteristics, calculate, by a classification layer in the third identification network, a sixth attribute value of the target user wearing a white gown and a seventh attribute value of the target user not wearing a white gown according to the body characteristics, and output the sixth attribute value and the seventh attribute value to a softmax layer in the third identification network; and the softmax layer converts the sixth attribute value and the seventh attribute value into probability distribution and takes the probability distribution as a recognition result of the white gown.
In an alternative implementation, the apparatus further comprises (not shown in fig. 4):
the model training module is used for acquiring a plurality of training images containing a target user and establishing a label vector for each training image, wherein the label vector comprises 7 attribute components of a white hat, a non-white hat, a mask, a non-mask, an unrecognizable mask, a white coat and a non-white coat; training a pre-constructed network model by using the multi-frame training image; in the training process, the mask recognition result, the white hat recognition result and the white coat recognition result output by the network model and the corresponding label vectors are used for calculating the difference loss, and the parameters of the network model are optimized according to the difference loss by adopting a gradient descent method.
In an optional implementation manner, the model training module is specifically configured to, in a process of obtaining multiple frames of training images including a target user, extract frames from a surveillance video of a target scene acquired by a camera to obtain multiple frames of surveillance images; and detecting a human body area in each monitoring image, and cutting out a human body image corresponding to the human body area from the monitoring image as a training image containing a target user.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for identifying target user dressing attributes, the method comprising:
cutting out a human body image corresponding to the human body area from the monitoring image;
inputting the human body image into a trained network model, extracting head characteristics and body characteristics of the human body image by a characteristic extraction network in the network model, inputting the head characteristics into a first identification network and a second identification network in the network model, and inputting the body characteristics into a third identification network in the network model, wherein the first identification network identifies whether a target user wears a mask according to the head characteristics, the second identification network identifies whether the target user wears a white hat according to the head characteristics, and the third identification network identifies whether the target user wears a white gown according to the body characteristics;
and acquiring a mask identification result, a white hat identification result and a white gown identification result which are output by the network model.
2. The method according to claim 1, wherein the cropping of the human body image corresponding to the human body region from the monitoring image comprises:
acquiring a monitoring image of a target scene acquired by a camera;
detecting a human body region in the monitoring image;
and cutting out the human body image corresponding to the human body area from the monitoring image.
3. The method of claim 1, wherein the feature extraction network extracts head features and body features of the human body image, and comprises:
extracting global body characteristics of the human body image through a characteristic extraction module in the characteristic extraction network, segmenting the global body characteristics according to preset body proportion distribution, and inputting head characteristics and body characteristics obtained through segmentation into a global average pooling layer in the characteristic extraction network;
and the global average pooling layer is used for respectively carrying out global average pooling on the head features and the body features.
4. The method of claim 1, wherein the first identification network identifies whether the mask is worn by the target user based on the head characteristics, comprising:
the classification layer in the first identification network calculates a first attribute value of a mask worn by a target user, a second attribute value of the mask not worn by the target user and a third attribute value of the mask which cannot be identified according to the head features and outputs the first attribute value, the second attribute value and the third attribute value to a softmax layer in the first identification network;
and the softmax layer converts the first attribute value, the second attribute value and the third attribute value into probability distribution and takes the probability distribution as a mask identification result.
5. The method of claim 1, wherein the second recognition network recognizes whether the target user wears a white hat according to head features, comprising:
the classification layer in the second recognition network calculates a fourth attribute value of the target user wearing a white hat and a fifth attribute value of the target user not wearing the white hat according to the head features and outputs the fourth attribute value and the fifth attribute value to the softmax layer in the second recognition network;
and the softmax layer converts the fourth attribute value and the fifth attribute value into probability distribution and takes the probability distribution as a white hat identification result.
6. The method of claim 1, wherein the third recognition network recognizing whether the target user wears a white gown based on body characteristics comprises:
the classification layer in the third recognition network calculates a sixth attribute value of the target user with the Chinese gown threaded and a seventh attribute value of the Chinese gown not threaded according to the body characteristics and outputs the sixth attribute value and the seventh attribute value to the softmax layer in the third recognition network;
and the softmax layer converts the sixth attribute value and the seventh attribute value into probability distribution and takes the probability distribution as a recognition result of the white gown.
7. The method of claim 1, wherein the training process of the network model comprises:
acquiring a plurality of training images containing a target user, and establishing a label vector for each training image, wherein the label vector comprises 7 attribute components of a white hat, a non-white hat, a mask, a non-mask, an unrecognizable mask, a white gown and a non-white gown;
training a pre-constructed network model by using the multi-frame training image;
in the training process, the mask recognition result, the white hat recognition result and the white gown recognition result output by the network model and the corresponding label vectors are used for calculating the difference loss, and the parameters of the network model are optimized according to the difference loss by adopting a gradient descent method.
8. The method of claim 7, wherein the obtaining a plurality of frames of training images containing a target user comprises:
extracting frames from a monitoring video of a target scene collected by a camera to obtain a plurality of frames of monitoring images;
and detecting a human body area in each monitoring image, and cutting out a human body image corresponding to the human body area from the monitoring image as a training image containing a target user.
9. An apparatus for identifying target user dressing attributes, the apparatus comprising:
the cutting module is used for cutting a human body image corresponding to the human body area from the monitoring image;
the recognition module is used for inputting the human body image into a trained network model, extracting head characteristics and body characteristics of the human body image by a characteristic extraction network in the network model, inputting the head characteristics into a first recognition network and a second recognition network in the network model, and inputting the body characteristics into a third recognition network in the network model, wherein the first recognition network recognizes whether a target user wears a mask according to the head characteristics, the second recognition network recognizes whether the target user wears a white hat according to the head characteristics, and the third recognition network recognizes whether the target user wears a white gown according to the body characteristics;
and the result acquisition module is used for acquiring the mask identification result, the white hat identification result and the white gown identification result which are output by the network model.
10. The apparatus of claim 9, wherein the apparatus comprises:
the model training module is used for acquiring a plurality of training images containing a target user and establishing a label vector for each training image, wherein the label vector comprises 7 attribute components of a white hat, a non-white hat, a mask, a non-mask, an unrecognizable mask, a white coat and a non-white coat; training a pre-constructed network model by using the multi-frame training image; in the training process, the mask recognition result, the white hat recognition result and the white coat recognition result output by the network model and the corresponding label vectors are used for calculating the difference loss, and the parameters of the network model are optimized according to the difference loss by adopting a gradient descent method.
CN202110187498.7A 2021-02-18 2021-02-18 Target user dressing attribute identification method and device Pending CN112926427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110187498.7A CN112926427A (en) 2021-02-18 2021-02-18 Target user dressing attribute identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110187498.7A CN112926427A (en) 2021-02-18 2021-02-18 Target user dressing attribute identification method and device

Publications (1)

Publication Number Publication Date
CN112926427A true CN112926427A (en) 2021-06-08

Family

ID=76171498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110187498.7A Pending CN112926427A (en) 2021-02-18 2021-02-18 Target user dressing attribute identification method and device

Country Status (1)

Country Link
CN (1) CN112926427A (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250874A (en) * 2016-08-16 2016-12-21 东方网力科技股份有限公司 A kind of dress ornament and the recognition methods of carry-on articles and device
US20200272902A1 (en) * 2017-09-04 2020-08-27 Huawei Technologies Co., Ltd. Pedestrian attribute identification and positioning method and convolutional neural network system
CN109614925A (en) * 2017-12-07 2019-04-12 深圳市商汤科技有限公司 Dress ornament attribute recognition approach and device, electronic equipment, storage medium
WO2020040391A1 (en) * 2018-08-24 2020-02-27 전북대학교산학협력단 Combined deep layer network-based system for pedestrian recognition and attribute extraction
CN109784140A (en) * 2018-11-19 2019-05-21 深圳市华尊科技股份有限公司 Driver attributes' recognition methods and Related product
CN109800665A (en) * 2018-12-28 2019-05-24 广州粤建三和软件股份有限公司 A kind of Human bodys' response method, system and storage medium
CN110188701A (en) * 2019-05-31 2019-08-30 上海媒智科技有限公司 Dress ornament recognition methods, system and terminal based on the prediction of human body key node
CN111062429A (en) * 2019-12-12 2020-04-24 上海点泽智能科技有限公司 Chef cap and mask wearing detection method based on deep learning
CN111414812A (en) * 2020-03-03 2020-07-14 平安科技(深圳)有限公司 Human body attribute identification method, system, computer device and storage medium
CN111639544A (en) * 2020-05-07 2020-09-08 齐齐哈尔大学 Expression recognition method based on multi-branch cross-connection convolutional neural network
CN111753795A (en) * 2020-06-30 2020-10-09 北京爱奇艺科技有限公司 Action recognition method and device, electronic equipment and storage medium
CN111860253A (en) * 2020-07-10 2020-10-30 东莞正扬电子机械有限公司 Multitask attribute identification method, multitask attribute identification device, multitask attribute identification medium and multitask attribute identification equipment for driving scene
CN112084913A (en) * 2020-08-15 2020-12-15 电子科技大学 End-to-end human body detection and attribute identification method
CN112149514A (en) * 2020-08-28 2020-12-29 中国地质大学(武汉) Method and system for detecting safety dressing of construction worker
CN112149512A (en) * 2020-08-28 2020-12-29 成都飞机工业(集团)有限责任公司 Helmet wearing identification method based on two-stage deep learning
CN112052819A (en) * 2020-09-15 2020-12-08 浙江智慧视频安防创新中心有限公司 Pedestrian re-identification method, device, equipment and storage medium
CN112016527A (en) * 2020-10-19 2020-12-01 成都大熊猫繁育研究基地 Panda behavior recognition method, system, terminal and medium based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郎波; 张娜; 段新新: "基于融合机制的多模型神经网络人物群体分类模型" (Multi-model neural network person group classification model based on a fusion mechanism), 计算机系统应用 (Computer Systems & Applications), no. 08, pages 127-134 *

Similar Documents

Publication Publication Date Title
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
Rahmad et al. Comparison of Viola-Jones Haar Cascade classifier and histogram of oriented gradients (HOG) for face detection
CN111052126B (en) Pedestrian attribute identification and positioning method and convolutional neural network system
CN106778609A (en) A kind of electric power construction field personnel uniform wears recognition methods
Raut et al. Plant disease detection in image processing using MATLAB
CN109522853B (en) Face datection and searching method towards monitor video
US8379920B2 (en) Real-time clothing recognition in surveillance videos
Gowsikhaa et al. Suspicious Human Activity Detection from Surveillance Videos.
CN106128022B (en) A kind of wisdom gold eyeball identification violent action alarm method
CN108053427A (en) A kind of modified multi-object tracking method, system and device based on KCF and Kalman
WO2016190814A1 (en) Method and system for facial recognition
US20110142335A1 (en) Image Comparison System and Method
US8855363B2 (en) Efficient method for tracking people
CN106682578B (en) Weak light face recognition method based on blink detection
CN111814638B (en) Security scene flame detection method based on deep learning
CN109271884A (en) Face character recognition methods, device, terminal device and storage medium
CN108052859A (en) A kind of anomaly detection method, system and device based on cluster Optical-flow Feature
JP2017111660A (en) Video pattern learning device, method and program
CN109558810A (en) Divided based on position and merges target person recognition methods
US20100111375A1 (en) Method for Determining Atributes of Faces in Images
CN114937232B (en) Wearing detection method, system and equipment for medical waste treatment personnel protective appliance
CN110378179A (en) Subway based on infrared thermal imaging is stolen a ride behavioral value method and system
CN107085729B (en) Bayesian inference-based personnel detection result correction method
CN107533547B (en) Product indexing method and system
CN110443179A (en) It leaves the post detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination