CN111950496B - Mask person identity recognition method - Google Patents
- Publication number
- CN111950496B (application CN202010843398A)
- Authority
- CN
- China
- Prior art keywords
- image
- features
- mask
- person
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The application discloses a masked-person identity recognition method, comprising the following steps: inputting a to-be-identified masked-person image into a preset region segmentation network for segmentation processing to obtain a masked-person region image; inputting the masked-person region image into a preset encoding-decoding model for feature separation, deleting the extracted appearance features, and outputting the typical features and pose features of the masked-person region image; averaging the typical features of the masked-person region image to obtain static gait features, and inputting the pose features into a preset LSTM network for processing to obtain dynamic gait features; and inputting the static gait features and the dynamic gait features into a preset classifier for classification and recognition to obtain an identity recognition result for the to-be-identified masked-person image. This solves the technical problem that existing face recognition methods struggle to identify the identity of a masked person.
Description
Technical Field
The application relates to the technical field of identity recognition, and in particular to a masked-person identity recognition method.
Background
In the prior art, identity recognition is usually performed through face recognition. However, when facing a masked person whose facial features are blocked by sunglasses, a hat, a mask, or the like, face recognition methods cannot accurately extract the person's facial features, so the person's identity cannot be recognized. Identification is even more difficult when the capture device has low resolution.
Disclosure of Invention
The application provides a masked-person identity recognition method, which is used to solve the technical problem that existing face recognition methods struggle to identify the identity of a masked person.
In view of this, a first aspect of the present application provides a masked-person identity recognition method, including:
inputting a to-be-identified masked-person image into a preset region segmentation network for segmentation processing to obtain a masked-person region image;
inputting the masked-person region image into a preset encoding-decoding model for feature separation, deleting the extracted appearance features, and outputting typical features and pose features of the masked-person region image;
averaging the typical features of the masked-person region image to obtain static gait features, and inputting the pose features into a preset LSTM network for processing to obtain dynamic gait features;
and inputting the static gait features and the dynamic gait features into a preset classifier for classification and recognition to obtain an identity recognition result for the to-be-identified masked-person image.
Optionally, before the step of inputting the to-be-identified masked-person image into the preset region segmentation network for segmentation processing to obtain the masked-person region image, the method further includes:
inputting the to-be-identified masked-person image into a preset super-resolution network for processing, and outputting a super-resolution masked-person image;
correspondingly, the step of inputting the to-be-identified masked-person image into the preset region segmentation network for segmentation processing to obtain the masked-person region image includes:
inputting the super-resolution masked-person image into the preset region segmentation network for segmentation processing to obtain the masked-person region image.
Optionally, the preset super-resolution network includes a first convolution module and a second convolution module;
the first convolution module includes six convolution layers and one sub-pixel convolution layer, and is used to increase the pixel resolution of the to-be-identified masked-person image in the length and width directions;
the second convolution module includes four convolution layers and one sub-pixel convolution layer, and is used to increase the pixel resolution of the to-be-identified masked-person image in the height direction.
Optionally, the configuration process of the preset encoding-decoding model includes:
framing the acquired video data to obtain training sample images;
sequentially inputting the training sample images into an encoding-decoding network, so that an encoder in the encoding-decoding network encodes each training sample image and outputs its appearance features, pose features and typical features, and a decoder in the encoding-decoding network reconstructs an image from the features output by the encoder and outputs the reconstructed image, wherein the training sample images are masked-person region images obtained by segmenting to-be-trained masked-person images;
based on the reconstructed images and the training sample images, separating the non-pose features of the training sample images through a cross-reconstruction loss function, separating the pose features of the training sample images through a pose similarity loss function, and separating the typical features of the training sample images from the non-pose features through a canonical consistency loss function, wherein the non-pose features comprise the appearance features and the typical features.
Optionally, the cross-reconstruction loss function is:

$$\mathcal{L}_{\text{cross}} = \left\| D\!\left(f_a^{t_1}, f_c^{t_1}, f_p^{t_2}\right) - X^{t_2} \right\|_2^2$$

wherein $t_1, t_2$ are different moments in the same video, $f_a$ is the appearance feature, $f_p$ is the pose feature, $f_c$ is the typical feature, $X^{t_2}$ is the training sample image at moment $t_2$, and $D(\cdot)$ is the decoding function.
Optionally, the pose similarity loss function is:

$$\mathcal{L}_{\text{pose-sim}} = \left\| \frac{1}{n_1}\sum_{t=1}^{n_1} f_p^{c_1,t} - \frac{1}{n_2}\sum_{t=1}^{n_2} f_p^{c_2,t} \right\|_2^2$$

wherein $n_1$ is the number of video frames in video scene $c_1$ and $n_2$ is the number of video frames in video scene $c_2$.
Optionally, the canonical consistency loss function is:

$$\mathcal{L}_{\text{cano}} = \mathbb{E}_{i,j}\left[ \left\| f_c^{c_1,t_i} - f_c^{c_1,t_j} \right\|_2^2 + \left\| f_c^{c_1,t_i} - f_c^{c_2,t_i} \right\|_2^2 \right]$$
optionally, the preset area dividing network is a trained Mask R-CNN network.
From the above technical solutions, the application has the following advantages:

The application provides a masked-person identity recognition method, comprising the following steps: inputting a to-be-identified masked-person image into a preset region segmentation network for segmentation processing to obtain a masked-person region image; inputting the masked-person region image into a preset encoding-decoding model for feature separation, deleting the extracted appearance features, and outputting the typical features and pose features of the masked-person region image; averaging the typical features of the masked-person region image to obtain static gait features, and inputting the pose features into a preset LSTM network for processing to obtain dynamic gait features; and inputting the static gait features and the dynamic gait features into a preset classifier for classification and recognition to obtain an identity recognition result for the to-be-identified masked-person image.

According to the masked-person identity recognition method of the application, the masked-person region and the background region are segmented by the preset region segmentation network to obtain the masked-person region image, reducing interference from the background and other factors. The typical features and pose features of the masked person are extracted to obtain gait features, and identity is recognized from the gait features; this avoids the problem that face recognition methods cannot accurately extract the facial features of a masked person and therefore cannot recognize the identity. In addition, a masked person may dress differently in different scenes, so the appearance features extracted by a convolutional neural network would differ across scenes, and letting these varying appearance features participate in identification would degrade the recognition result. To avoid the influence of such appearance changes, the appearance features are separated out and deleted by the preset encoding-decoding model, which improves the accuracy of masked-person identity recognition and thereby solves the technical problem that existing face recognition methods struggle to identify the identity of a masked person.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flowchart of a masked-person identity recognition method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a masked-person identity recognition method according to another embodiment of the present application;
FIG. 3 is a schematic structural diagram of a super-resolution network according to an embodiment of the present application.
Detailed Description
The application provides a masked-person identity recognition method, which is used to solve the technical problem that existing face recognition methods struggle to identify the identity of a masked person.
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
For ease of understanding, referring to FIG. 1, an embodiment of the masked-person identity recognition method provided by the present application includes:

Step 101, inputting the to-be-identified masked-person image into a preset region segmentation network for segmentation processing to obtain a masked-person region image.

If the to-be-identified masked-person image were input directly into the preset encoding-decoding model for feature extraction, unnecessary background features would be extracted and would affect the identity recognition result. Therefore, in this embodiment, the preset region segmentation network is used to segment the to-be-identified masked-person image and separate the masked-person region from it, which reduces unnecessary features and improves the quality of the effective feature information. The to-be-identified masked-person image can be obtained by framing a video stream captured by a monitoring device.
Step 102, inputting the masked-person region image into a preset encoding-decoding model for feature separation, deleting the extracted appearance features, and outputting the typical features and pose features of the masked-person region image.

In different scenes, the appearance of a masked person differs (for example, the clothing), and in that case a trained classifier may produce a wrong recognition result because of the change in appearance features. For example, the same masked person may dress differently in different places, and the trained classifier may then identify that one person as two different identities. Therefore, in this embodiment, the input masked-person region image is separated by the preset encoding-decoding model into typical features, appearance features and pose features, and the appearance features are deleted so that they do not participate in subsequent recognition. Typical features are body-shape features such as height and arm length; pose features are the representation, in a particular frame, of the person's gait information while moving.
Further, the preset encoding-decoding model includes an encoder and a decoder, and its configuration process specifically includes:

framing the acquired video data to obtain to-be-trained masked-person images, segmenting the to-be-trained masked-person images to obtain masked-person region images, and taking these region images as training sample images; sequentially inputting the training sample images into an encoding-decoding network, so that an encoder in the network encodes each training sample image and outputs its appearance features, pose features and typical features, and a decoder in the network reconstructs an image from the features output by the encoder and outputs the reconstructed image; and, based on the reconstructed images and the training sample images, separating the non-pose features of the training sample images through a cross-reconstruction loss function, separating the pose features through a pose similarity loss function, and separating the typical features from the non-pose features through a canonical consistency loss function, wherein the non-pose features comprise the appearance features and the typical features.
It should be noted that the encoding-decoding network is also a typical CNN structure comprising convolution layers and a max-pooling layer, where each convolution layer is followed by a ReLU activation function and the last layer uses a Sigmoid activation function so that values in $[0, 1]$ are output for the next operation. First, the encoder $\varepsilon$ encodes an input masked-person region image $X$; the encoded feature representation is:

$$f_a, f_p, f_c = \varepsilon(X)$$

wherein $f_a$ is the appearance feature, $f_p$ is the pose feature, $f_c$ is the typical feature, and $\varepsilon(\cdot)$ is the encoding function.

To fully extract the features of the masked-person region image, a reconstructed image $\hat{X}$ approximating the original image $X$ is produced by the decoder $D$, where the reconstructed image can be expressed as:

$$\hat{X} = D(f_a, f_p, f_c)$$

where $D(\cdot)$ is the decoding function.
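To make the encode/decode data flow above concrete, here is a minimal NumPy sketch. The linear maps, the 64×64 image size, and the 48/48/32 split between $f_a$, $f_p$ and $f_c$ are all illustrative assumptions; the patent's encoder and decoder are convolutional networks, not linear layers.

```python
import numpy as np

# Toy "encoder" epsilon: partitions one latent vector into the three
# parts named in the text (appearance f_a, pose f_p, typical f_c), and
# toy "decoder" D: maps them back to an image-sized reconstruction.
# All sizes below are made-up assumptions for illustration only.
FEAT = 128                      # total latent size (assumed)
SPLIT = (48, 48, 32)            # sizes of f_a, f_p, f_c (assumed)

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((FEAT, 64 * 64)) * 0.01   # linear stand-in for the CNN encoder
W_dec = rng.standard_normal((64 * 64, FEAT)) * 0.01   # linear stand-in for the CNN decoder

def encode(x):
    """epsilon(X): produce (f_a, f_p, f_c) from a 64x64 image."""
    z = W_enc @ x.ravel()
    a, p = SPLIT[0], SPLIT[0] + SPLIT[1]
    return z[:a], z[a:p], z[p:]

def decode(f_a, f_p, f_c):
    """D(.): reconstruct an image from the concatenated features."""
    return (W_dec @ np.concatenate([f_a, f_p, f_c])).reshape(64, 64)

x = rng.standard_normal((64, 64))      # stand-in masked-person region image
f_a, f_p, f_c = encode(x)
x_hat = decode(f_a, f_p, f_c)          # reconstructed image X-hat
```

The point of the sketch is only the shape of the data flow: one input image, three named feature vectors, one reconstruction.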
After the three kinds of features are fully learned, the appearance features, typical features and pose features of the masked person are separated by designing different loss functions, specifically as follows:
(1) Cross-reconstruction loss function:

$$\mathcal{L}_{\text{cross}} = \left\| D\!\left(f_a^{t_1}, f_c^{t_1}, f_p^{t_2}\right) - X^{t_2} \right\|_2^2$$

wherein $t_1, t_2$ are different moments in the same video and $c_1, c_2$ are different video scenes.

The cross-reconstruction loss function proposed by the application uses the appearance feature $f_a$ and typical feature $f_c$ at moment $t_1$ together with the pose feature $f_p$ at moment $t_2$ to reconstruct the image at moment $t_2$. Because $f_a$ and $f_c$ do not depend on pose, the $f_p$ of the current frame can be matched with the $f_a$ and $f_c$ of any other frame of the same video to reconstruct the same object. This forces $f_a$ and $f_c$ to stay similar across all frames of the video, i.e. to behave as constant factors, and thereby separates out the non-pose features ($f_a$ and $f_c$).
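The cross-reconstruction idea can be sketched numerically as follows. Only the loss computation mirrors the text (decode the frame at $t_2$ from $t_1$'s appearance and typical features plus $t_2$'s pose feature, then take the mean squared error); the `toy_decode` function and all feature sizes are stand-in assumptions, not the patent's CNN decoder.

```python
import numpy as np

def cross_reconstruction_loss(decode, f_a_t1, f_c_t1, f_p_t2, frame_t2):
    """Reconstruct the t2 frame from (f_a, f_c) of t1 and f_p of t2,
    then penalise the pixel-wise squared error against the real frame."""
    recon = decode(f_a_t1, f_c_t1, f_p_t2)
    return float(np.mean((recon - frame_t2) ** 2))

def toy_decode(f_a, f_c, f_p):
    # Stand-in decoder for demonstration only (assumption, not the
    # patent's network): an outer product plus a scalar offset.
    return np.outer(f_a, f_p) + f_c.mean()

rng = np.random.default_rng(1)
f_a1, f_c1 = rng.standard_normal(8), rng.standard_normal(4)   # features at t1
f_p2 = rng.standard_normal(8)                                 # pose at t2
frame2 = toy_decode(f_a1, f_c1, f_p2)   # a frame the toy decoder can hit exactly

loss = cross_reconstruction_loss(toy_decode, f_a1, f_c1, f_p2, frame2)
```

With a perfect reconstruction the loss is zero; any mismatch between the decoded frame and the real frame increases it.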
(2) Pose similarity loss function:

$$\mathcal{L}_{\text{pose-sim}} = \left\| \frac{1}{n_1}\sum_{t=1}^{n_1} f_p^{c_1,t} - \frac{1}{n_2}\sum_{t=1}^{n_2} f_p^{c_2,t} \right\|_2^2$$

wherein $n_1$ is the number of video frames in video scene $c_1$ and $n_2$ is the number of video frames in video scene $c_2$.

In different scenes, $f_p$ is subject to interference from $f_a$. To ensure that $f_p$ contains only pose information, the pose similarity loss function is proposed; it exploits the consistency of the same person's pose information across different scenes to delete the appearance components present in $f_p$.
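A minimal NumPy sketch of the pose-similarity term: average the per-frame pose features of the same walker in two scenes and penalise the distance between the two averages, so that $f_p$ cannot carry scene-specific appearance. The feature size and the synthetic data are illustrative assumptions.

```python
import numpy as np

def pose_similarity_loss(fp_scene1, fp_scene2):
    """fp_scene*: (n_frames, d) arrays of per-frame pose features for
    the same person in two different scenes."""
    mean1 = fp_scene1.mean(axis=0)   # (1/n1) * sum over the n1 frames of c1
    mean2 = fp_scene2.mean(axis=0)   # (1/n2) * sum over the n2 frames of c2
    return float(np.sum((mean1 - mean2) ** 2))

rng = np.random.default_rng(2)
base = rng.standard_normal(16)                        # shared gait signature
fp_c1 = base + 0.01 * rng.standard_normal((5, 16))    # n1 = 5 frames, scene c1
fp_c2 = base + 0.01 * rng.standard_normal((7, 16))    # n2 = 7 frames, scene c2
loss = pose_similarity_loss(fp_c1, fp_c2)
```

Note that the two scenes may have different frame counts; only the averages are compared.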
(3) Canonical consistency loss function:

$$\mathcal{L}_{\text{cano}} = \mathbb{E}_{i,j}\left[ \left\| f_c^{c_1,t_i} - f_c^{c_1,t_j} \right\|_2^2 + \left\| f_c^{c_1,t_i} - f_c^{c_2,t_i} \right\|_2^2 \right]$$

wherein $i, j \in [1, n_1]$. Each individual's typical features are invariant across moments and scenes; based on this property, the canonical consistency loss function separates the typical features from the non-pose features.
During feature separation, the appearance features are deleted, while the pose features and typical features of the masked person are kept for subsequent recognition.
Step 103, averaging the typical features of the masked-person region image to obtain static gait features, and inputting the pose features into a preset LSTM network for processing to obtain dynamic gait features.

In step 102, after the typical features and pose features of the person are obtained by separation, the typical features are averaged to obtain the person's static gait features, and the pose features are input into a multi-layer LSTM network designed with an incremental identity loss function for feature processing to obtain the dynamic gait features.
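Step 103 can be sketched as follows: the static gait feature is the per-sequence average of the typical features $f_c$, and the dynamic gait feature is the final hidden state of an LSTM run over the per-frame pose features $f_p$. The single-cell LSTM below uses random weights purely to show the data flow; the patent's multi-layer LSTM and its incremental identity loss are not reproduced, and all dimensions are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(seq, W, U, b, hidden):
    """Run one LSTM layer over seq (T, d_in); return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in seq:
        z = W @ x + U @ h + b                    # all four gates at once
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                        # cell state update
        h = o * np.tanh(c)                       # hidden state update
    return h

rng = np.random.default_rng(3)
T, d_fc, d_fp, hidden = 10, 32, 16, 24           # assumed sizes
f_c_seq = rng.standard_normal((T, d_fc))         # typical features per frame
f_p_seq = rng.standard_normal((T, d_fp))         # pose features per frame

static_gait = f_c_seq.mean(axis=0)               # averaging -> static gait feature

W = rng.standard_normal((4 * hidden, d_fp)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
dynamic_gait = lstm_last_hidden(f_p_seq, W, U, b, hidden)   # dynamic gait feature
```

The two resulting vectors are what the classifier in step 104 would consume.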
Step 104, inputting the static gait features and the dynamic gait features into a preset classifier for classification and recognition to obtain an identity recognition result for the to-be-identified masked-person image.

It should be noted that inputting both the static and dynamic gait features of the masked person into the preset classifier yields an identity recognition result that is more accurate and reliable than one obtained from either the static gait features alone or the dynamic gait features alone.
According to the masked-person identity recognition method of the application, the masked-person region and the background region are segmented by the preset region segmentation network to obtain the masked-person region image, reducing interference from the background and other factors. The typical features and pose features of the masked person are extracted to obtain gait features, and identity is recognized from the gait features; this avoids the problem that face recognition methods cannot accurately extract the facial features of a masked person and therefore cannot recognize the identity. In addition, a masked person may dress differently in different scenes, so the appearance features extracted by a convolutional neural network would differ across scenes, and letting these varying appearance features participate in identification would degrade the recognition result. To avoid the influence of such appearance changes, the appearance features are separated out and deleted by the preset encoding-decoding model, which improves the accuracy of masked-person identity recognition and thereby solves the technical problem that existing face recognition methods struggle to identify the identity of a masked person.
The above is one embodiment of the masked-person identity recognition method provided by the present application; another embodiment is described below.

For ease of understanding, referring to FIG. 2, another embodiment of the masked-person identity recognition method provided by the present application includes:
step 201, inputting the to-be-identified mask image into a preset super-resolution network for processing, and outputting the super-resolution mask image.
In consideration of the problem of monitoring equipment, the quality of the photographed images of the people with the face is poor, so that the extraction of effective characteristics is affected. According to the application, the super-resolution network is preset to process the to-be-identified mask image, and the super-resolution mask image is output, so that the quality of the to-be-identified mask image is improved.
Further, the preset super-resolution network in this embodiment includes a first convolution module and a second convolution module, as shown in FIG. 3. The first convolution module includes six convolution layers and one sub-pixel convolution layer (upsampling layer), and increases the pixel resolution of the to-be-identified masked-person image in the length and width directions; the second convolution module includes four convolution layers and one sub-pixel convolution layer, and increases the pixel resolution of the to-be-identified masked-person image in the height direction. The input to-be-identified masked-person image is handled with a patch-based segmentation method, and low-resolution patches of 7×7 pixels are input. Each convolution layer performs feature learning with a 3×3 convolution kernel and is followed by a rectified linear unit (ReLU); the features learned by each convolution layer are the input of the next layer. To obtain the spatial information among the features of the to-be-identified masked-person image, the first convolution layer of the first convolution module is connected to the third convolution layer by a short skip connection, and to the sixth convolution layer by a long skip connection. The activation maps produced by the sixth convolution layer are processed by the sub-pixel convolution layer, which outputs the enlarged activation maps, thereby increasing the pixel resolution of the to-be-identified masked-person image in the length and width directions.

The input of the second convolution module is the output of the first convolution module (the enlarged activation maps). The first convolution layer of the second convolution module is connected to the fourth convolution layer by a long skip connection, and the sub-pixel convolution layer of the second module further enlarges the activation maps output by the fourth convolution layer, increasing the pixel resolution of the to-be-identified masked-person image in the height direction and finally outputting the super-resolution masked-person image.

The sub-pixel convolution layers in the preset super-resolution network have scale factors for the two axis directions; with r = 2, a sub-pixel convolution layer requires 4 activation maps as input. The activation maps produced from the low-resolution masked-person image are rearranged pixel-by-pixel into the corresponding high-resolution output, achieving super-resolution along both axes and outputting the super-resolution masked-person image.
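The pixel rearrangement performed by a sub-pixel convolution layer can be sketched directly in NumPy for the r = 2 case described above: 4 low-resolution activation maps are interleaved into one map of doubled resolution on both axes. The implementation below is a generic pixel-shuffle sketch, not the patent's code.

```python
import numpy as np

def pixel_shuffle(act_maps, r=2):
    """Rearrange (r*r, H, W) activation maps into one (H*r, W*r) map.

    With r = 2 this consumes 4 activation maps and interleaves their
    pixels so that output[y*r + i, x*r + j] = act_maps[i*r + j, y, x],
    doubling the resolution on both axes.
    """
    c, h, w = act_maps.shape
    assert c == r * r, "need r*r activation maps"
    out = act_maps.reshape(r, r, h, w)        # split channel dim into r x r offsets
    out = out.transpose(2, 0, 3, 1)           # (h, r, w, r): group offsets with pixels
    return out.reshape(h * r, w * r)

# 4 tiny 3x3 activation maps -> one 6x6 high-resolution map
maps = np.arange(4 * 3 * 3).reshape(4, 3, 3).astype(float)
hi_res = pixel_shuffle(maps, r=2)
```

Each 2×2 block of the output is drawn from the same spatial position of the 4 input maps, which is what makes the operation a learnable alternative to interpolation-based upsampling.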
Step 202, inputting the super-resolution masked-person image into the preset region segmentation network for segmentation processing to obtain a masked-person region image.

The preset region segmentation network in this embodiment is preferably a trained Mask R-CNN network. The Mask R-CNN network performs feature extraction and region segmentation through a ResNeXt-101 + FPN backbone, and RoIAlign is adopted in place of the pooling-only operation: an interpolation step is introduced, with bilinear interpolation performed first and pooling performed afterwards, thereby avoiding the quantization misalignment caused by sampling through pooling alone.
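The interpolation step that distinguishes RoIAlign from quantized pooling can be illustrated with a small bilinear sampler: given a fractional coordinate inside a feature map, blend the four surrounding values instead of snapping to the nearest cell. The clamping of border indices below is an assumption of this sketch, not taken from the patent.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at fractional (y, x) by bilinear interpolation,
    the step RoIAlign performs before pooling each bin."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)   # clamp at the border (assumption)
    dy, dx = y - y0, x - x0
    top = img[y0, x0] * (1 - dx) + img[y0, x1] * dx   # blend along x, top row
    bot = img[y1, x0] * (1 - dx) + img[y1, x1] * dx   # blend along x, bottom row
    return top * (1 - dy) + bot * dy                  # blend along y

img = np.array([[0.0, 2.0],
                [4.0, 6.0]])
center = bilinear_sample(img, 0.5, 0.5)   # average of the four corners
```

Because the sample point never gets rounded to a grid cell, region features stay aligned with the box coordinates, which is exactly the misalignment that pooling alone introduces.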
And 203, inputting the mask area image into a preset encoding-decoding model to perform feature separation, deleting the extracted external features, and outputting typical features and gesture features of the mask area image.
And 204, carrying out averaging treatment on typical features of the area image of the mask person to obtain static gait features, and inputting the gesture features into a preset LSTM network for treatment to obtain dynamic gait features.
Step 205, inputting the static gait feature and the dynamic gait feature into a preset classifier for classification and identification, and obtaining an identity identification result of the to-be-identified mask image.
The details of steps 203 to 205 are identical to those of steps 102 to 104 and are not repeated here.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it; although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (7)
1. A mask person identity recognition method, comprising:
inputting the to-be-identified mask image into a preset region segmentation network for segmentation processing to obtain a mask region image;
inputting the mask area image into a preset encoding-decoding model for feature separation, deleting the extracted external features, and outputting typical features and gesture features of the mask area image;
averaging the typical features of the mask region image to obtain static gait features, and inputting the gesture features into a preset LSTM network for processing to obtain dynamic gait features;
inputting the static gait characteristics and the dynamic gait characteristics into a preset classifier for classification and identification to obtain an identification result of the to-be-identified mask image;
the configuration process of the preset encoding-decoding model comprises the following steps:
framing the acquired video data to obtain a training sample image;
sequentially inputting the training sample images to an encoding-decoding network, so that an encoder in the encoding-decoding network encodes the training sample images, outputting external features, gesture features and typical features of the training sample images, performing image reconstruction by a decoder in the encoding-decoding network based on the features output by the encoder, and outputting reconstructed images, wherein the training sample images are mask region images obtained by dividing mask images to be trained;
based on the reconstructed image and the training sample image, separating the non-gesture features of the training sample image through a cross reconstruction loss function, separating the gesture features of the training sample image through a pose similarity loss function, and separating the typical features of the training sample image from the non-gesture features through a canonical consistency loss function, wherein the non-gesture features comprise the external features and the typical features.
2. The mask person identity recognition method according to claim 1, wherein, before the step of inputting the to-be-identified mask image into the preset region segmentation network for segmentation processing to obtain the mask region image, the method further comprises:
inputting the to-be-identified mask image into a preset super-resolution network for processing, and outputting a super-resolution mask image;
correspondingly, the step of inputting the to-be-identified mask image into the preset region segmentation network for segmentation processing to obtain the mask region image comprises:
and inputting the super-resolution mask image into a preset region segmentation network for segmentation processing to obtain a mask region image.
3. The method of claim 2, wherein the preset super-resolution network comprises a first convolution module and a second convolution module;
the first convolution module comprises 6 convolution layers and one sub-pixel convolution layer, and is used for improving pixels in the length direction and the width direction of the to-be-identified mask image;
the second convolution module comprises 4 convolution layers and one sub-pixel convolution layer and is used for improving pixels in the height direction of the to-be-identified mask image.
4. The method of claim 1, wherein the cross reconstruction loss function is:

$$\mathcal{L}_{\text{cross-recon}} = \left\| D\!\left(f_a^{t_1}, f_c^{t_1}, f_p^{t_2}\right) - x^{t_2} \right\|_2^2$$

wherein $t_1, t_2$ are different moments within the same video, $f_a$ is the external feature, $f_p$ is the gesture feature, $f_c$ is the typical feature, $x^{t_2}$ is the training sample image at moment $t_2$, and $D(\cdot)$ is the decoding function.
5. The method of claim 4, wherein the pose similarity loss function is:

$$\mathcal{L}_{\text{pose-sim}} = \left\| \frac{1}{n_1}\sum_{t=1}^{n_1} f_p^{c_1,t} - \frac{1}{n_2}\sum_{t=1}^{n_2} f_p^{c_2,t} \right\|_2^2$$

wherein $n_1$ is the number of video frames under video scene $c_1$, and $n_2$ is the number of video frames under video scene $c_2$.
6. The method of claim 5, wherein the canonical consistency loss function:
7. The method of claim 1, wherein the preset region segmentation network is a trained Mask R-CNN network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010843398.0A CN111950496B (en) | 2020-08-20 | 2020-08-20 | Mask person identity recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111950496A CN111950496A (en) | 2020-11-17 |
CN111950496B true CN111950496B (en) | 2023-09-15 |
Family
ID=73358545
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010843398.0A Active CN111950496B (en) | 2020-08-20 | 2020-08-20 | Mask person identity recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111950496B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114494962A (en) * | 2022-01-24 | 2022-05-13 | 上海商汤智能科技有限公司 | Object identification method, network training method, device, equipment and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815874A (*) | 2019-01-17 | 2019-05-28 | 苏州科达科技股份有限公司 | Personnel identity recognition method, apparatus, device, and readable storage medium |
CN110084156A (*) | 2019-04-12 | 2019-08-02 | 中南大学 | Gait feature extraction method and gait-feature-based pedestrian identity recognition method |
CN110222634A (*) | 2019-06-04 | 2019-09-10 | 河海大学常州校区 | Human posture recognition method based on convolutional neural networks |
CN110991281A (*) | 2019-11-21 | 2020-04-10 | 电子科技大学 | Dynamic face recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||