CN116309643A - Face occlusion score determination method, electronic device and medium

Face occlusion score determination method, electronic device and medium

Info

Publication number
CN116309643A
Authority
CN
China
Prior art keywords: face, region, trained, occlusion, segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310298673.9A
Other languages
Chinese (zh)
Inventor
何方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yuncong Enterprise Development Co ltd
Original Assignee
Shanghai Yuncong Enterprise Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yuncong Enterprise Development Co ltd filed Critical Shanghai Yuncong Enterprise Development Co ltd
Priority to CN202310298673.9A priority Critical patent/CN116309643A/en
Publication of CN116309643A publication Critical patent/CN116309643A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/08 Learning methods
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 Image analysis
            • G06T 7/0002 Inspection of images, e.g. flaw detection
              • G06T 7/0012 Biomedical image inspection
            • G06T 7/10 Segmentation; Edge detection
              • G06T 7/11 Region-based segmentation
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20021 Dividing image into blocks, subimages or windows
              • G06T 2207/20081 Training; Learning
              • G06T 2207/20084 Artificial neural networks [ANN]
            • G06T 2207/30 Subject of image; Context of image processing
              • G06T 2207/30196 Human being; Person
                • G06T 2207/30201 Face
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00 Arrangements for image or video recognition or understanding
            • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V 10/764 using classification, e.g. of video objects
              • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06V 10/82 using neural networks
          • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/161 Detection; Localisation; Normalisation
                  • G06V 40/166 Detection; Localisation; Normalisation using acquisition arrangements

Abstract

The invention relates to computer vision, in particular to a face occlusion score determination method, an electronic device and a medium, and aims to solve the problem that existing methods cannot accurately reflect the degree of face occlusion. To this end, the invention inputs a face image into a trained face segmentation model to obtain an at least partial face region segmentation, together with either an occlusion region segmentation or a non-occlusion region segmentation within that face region; it then respectively counts the pixels of the at least partial face region segmentation and the pixels of the occlusion region segmentation (or of the non-occlusion region segmentation) within the at least partial face region; finally, the face occlusion score of the face image is determined from these pixel counts.

Description

Face occlusion score determination method, electronic device and medium
Technical Field
The invention relates to the technical field of computer vision, and in particular provides a face occlusion score determination method, an electronic device and a medium.
Background
In a face recognition system, face quality assessment is an indispensable preprocessing step: its purpose is to filter out poor-quality face data before recognition so as to improve recognition accuracy. Occlusion is one factor affecting face quality and can strongly degrade face recognition, so it needs to be estimated accurately. However, the occlusion score is not an intuitive target and a specific occlusion score value cannot be read off directly (for example, with the face occlusion score ranging over 0-1, 0 indicating no occlusion and 1 indicating complete occlusion); the occlusion score usually has to be constructed indirectly by other means.
Existing face occlusion estimation techniques either use a classification method to decide whether a face is an occluded face, or partition the face detection box region into sub-regions, estimate the occlusion degree of each sub-region, and obtain the occlusion degree of the whole face by weighting. In both cases, the way the face occlusion score is constructed cannot accurately reflect the degree of face occlusion: the confidence of the classification method only reflects how certain the classifier is that the face is (or is not) occluded, and the face detection box contains non-face areas whose occlusion state should not be a factor in judging whether the face is occluded, so the partition-based weighting is also biased. Occlusion scores estimated on this basis therefore carry additional error.
Accordingly, there is a need in the art for a new face occlusion score determination method to address the above-described problems.
Disclosure of Invention
The invention aims to solve the technical problem that existing face occlusion score construction methods cannot accurately reflect the degree of face occlusion, which degrades the user experience.
In order to achieve the above object, in a first aspect, the present invention provides a face occlusion score determination method, the method comprising the steps of:
acquiring a face image, and inputting the face image into a trained face segmentation model to obtain at least partial face region segmentation, and occlusion region segmentation within the at least partial face region or non-occlusion region segmentation within the at least partial face region;
respectively acquiring the number of pixels of the at least partial face region segmentation and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region;
and determining the face occlusion score of the face image based on the number of pixels of the at least partial face region segmentation and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region.
In an optional technical solution of the above face occlusion score determination method, the step of determining the face occlusion score of the face image based on the number of pixels of the at least partial face region segmentation and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region includes:
acquiring a first ratio of the number of pixels of the occlusion region segmentation within the at least partial face region to the number of pixels of the at least partial face region segmentation, and taking the first ratio as the face occlusion score of the face image;
or, acquiring a second ratio of the number of pixels of the non-occlusion region segmentation within the at least partial face region to the number of pixels of the at least partial face region segmentation, and taking the second ratio as the face occlusion score of the face image.
In an optional technical solution of the above face occlusion score determination method, the face segmentation model is trained based on at least the following steps:
acquiring a first face training image, labeling the first face training image, and taking the labeled first face training image as a training sample of the face segmentation model; constructing a neural network model, and taking the neural network model as the face segmentation model to be trained, wherein the face segmentation model to be trained comprises a backbone network to be trained, a neck network to be trained and a head network to be trained;
and inputting the training sample into the face segmentation model to be trained for training, so as to obtain the trained face segmentation model.
In an optional technical solution of the above face occlusion score determination method, the step of "labeling the first face training image" includes:
labeling at least a partial face region segmentation based on the first face training image, and occlusion region segmentation within at least a partial face region or non-occlusion region segmentation within at least a partial face region.
In an optional technical solution of the above face occlusion score determination method, the step of labeling at least partial face region segmentation, and occlusion region segmentation within the at least partial face region or non-occlusion region segmentation within the at least partial face region, includes:
acquiring a second face training image based on the first face training image, wherein the second face training image is an image comprising only the face detection frame region of the first face training image;
labeling the at least partial face region segmentation, and the occlusion region segmentation within the at least partial face region or the non-occlusion region segmentation within the at least partial face region, based on the second face training image and a preset labeling region requirement, wherein the preset labeling region requirement comprises labeling all of the face region or labeling part of the face region.
In an optional technical solution of the above face occlusion score determination method, the step of inputting the training sample into the face segmentation model to be trained for training, so as to obtain the trained face segmentation model, includes:
S1, inputting the training sample into the backbone network to be trained for feature extraction, so as to obtain a first feature map of the training sample;
S2, inputting the first feature map of the training sample into the neck network to be trained for feature fusion, so as to obtain a second feature map of the training sample;
S3, inputting the second feature map of the training sample into the head network to be trained for region segmentation, so as to obtain a region segmentation result of the training sample, wherein the region segmentation result is at pixel level or grid level;
S4, acquiring a loss function based on the region segmentation result of the training sample and the training sample itself, feeding the loss function back to step S1, and repeating steps S1-S4 until the loss function converges, wherein the loss function at least comprises a cross entropy loss function.
In an optional technical solution of the above face occlusion score determination method, the head network to be trained comprises a face region segmentation head network to be trained, and an occlusion region segmentation head network to be trained or a non-occlusion region segmentation head network to be trained (both operating within the face region), and the step of inputting the second feature map of the training sample into the head network to be trained for region segmentation to obtain the region segmentation result of the training sample includes:
inputting the second feature map of the training sample into the face region segmentation head network to be trained, so as to obtain the at least partial face region segmentation;
inputting the second feature map of the training sample into the occlusion region segmentation head network to be trained, so as to obtain the occlusion region segmentation within the at least partial face region;
or inputting the second feature map of the training sample into the non-occlusion region segmentation head network to be trained, so as to obtain the non-occlusion region segmentation within the at least partial face region.
In an optional technical solution of the above face occlusion score determination method, the method further includes:
constructing the backbone network to be trained based on at least one of ResNet, MobileNet and HRNet;
and constructing the neck network to be trained based on at least one of FPN, PANet and Bi-FPN.
In a second aspect, the present invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the face occlusion score determination method described above when executing the computer program.
In a third aspect, the present invention also provides a readable storage medium having stored therein a plurality of program codes adapted to be loaded and executed by a processor to perform the face occlusion score determination method of any of the above.
As can be appreciated by those skilled in the art, in the technical solution of the present invention, a face image is acquired and input into a trained face segmentation model to obtain the at least partial face region segmentation, and the occlusion region segmentation within the at least partial face region or the non-occlusion region segmentation within the at least partial face region; the number of pixels of the at least partial face region segmentation and the number of pixels of the occlusion region segmentation (or of the non-occlusion region segmentation) within the at least partial face region are acquired respectively; and the face occlusion score of the face image is determined from these pixel counts. This arrangement reflects the degree of face occlusion more accurately, thereby further improving the accuracy of face recognition and the user experience.
Further, the method further comprises: constructing the backbone network to be trained based on at least one of ResNet, MobileNet and HRNet; and constructing the neck network to be trained based on at least one of FPN, PANet and Bi-FPN. The face segmentation model can thus be built according to the user's actual requirements, so as to better balance accuracy against computation time and further improve the user experience.
Drawings
The present disclosure will become more readily understood with reference to the accompanying drawings. As will be readily appreciated by those skilled in the art: the drawings are for illustrative purposes only and are not intended to limit the scope of the present invention. Moreover, like numerals in the figures are used to designate like parts, wherein:
FIG. 1 is a flow chart of the main steps of a face occlusion score determination method according to an embodiment of the present invention;
FIG. 2 is a flow chart of the main steps of training a face segmentation model according to one embodiment of the present invention;
FIG. 3 is a schematic image of a labeled at least partial face region segmentation, according to one embodiment of the invention;
FIG. 4 is a schematic image of a labeled occlusion region segmentation within an at least partial face region, according to one embodiment of the invention;
FIG. 5 is a flow chart of the main steps of inputting training samples into a face segmentation model to be trained for training according to one embodiment of the present invention;
fig. 6 is a schematic diagram of a main structure of an electronic device for performing the face occlusion score determination method of the present invention.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module," "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, memory, or software components, such as program code, or a combination of software and hardware. The processor may be a central processor, a microprocessor, an image processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions. The processor may be implemented in software, hardware, or a combination of both. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, and the like. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone or A and B. The singular forms "a", "an" and "the" include plural referents.
As described in the background section, the present invention provides a face occlusion score determination method to address the problem that existing face occlusion score construction methods cannot accurately reflect the degree of face occlusion, which degrades the user experience.
Referring to fig. 1, fig. 1 is a schematic flow chart of the main steps of a face occlusion score determination method according to an embodiment of the present invention. As shown in fig. 1, the method comprises the following steps:
Step S101: acquiring a face image, and inputting the face image into a trained face segmentation model to obtain at least partial face region segmentation, and occlusion region segmentation within the at least partial face region or non-occlusion region segmentation within the at least partial face region.
Specifically, the face region generally includes the forehead, eyebrows, eyelids, canthi, eye sockets, nose bridge, nose wings, nose tip, nasolabial folds, cheeks, lips, upper jaw, lower jaw, and so on. In face recognition, recognition may be based on all of these face sub-regions or only on some of them; therefore, the trained face segmentation model may output at least partial face region segmentation, together with occlusion region segmentation or non-occlusion region segmentation within that at least partial face region. For example, since the forehead area above the eyebrows has essentially no effect on face recognition, the at least partial face region may be a face region that excludes the forehead and includes only the area below the eyebrows. This choice of the at least partial face region is merely illustrative and may be made according to actual needs in practical applications.
Step S102: respectively acquiring the number of pixels of the at least partial face region segmentation, and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region.
Step S103: determining the face occlusion score of the face image based on the number of pixels of the at least partial face region segmentation, and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region.
Through the above steps S101 to S103, the face image is acquired and input into the trained face segmentation model to obtain the at least partial face region segmentation, and the occlusion region segmentation or the non-occlusion region segmentation within the at least partial face region; the corresponding pixel counts are then acquired respectively; and the face occlusion score of the face image is determined from these pixel counts. This arrangement reflects the degree of face occlusion more accurately, thereby further improving the accuracy of face recognition and the user experience.
In some embodiments, determining the face occlusion score of the face image based on the number of pixels of the at least partial face region segmentation and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region comprises the following steps:
Step S1031: acquiring a first ratio of the number of pixels of the occlusion region segmentation within the at least partial face region to the number of pixels of the at least partial face region segmentation, and taking the first ratio as the face occlusion score of the face image.
Step S1032: or, acquiring a second ratio of the number of pixels of the non-occlusion region segmentation within the at least partial face region to the number of pixels of the at least partial face region segmentation, and taking the second ratio as the face occlusion score of the face image.
That is, the first ratio is obtained as first ratio = (number of pixels of the occlusion region segmentation within the at least partial face region) / (number of pixels of the at least partial face region segmentation), and this first ratio is used as the face occlusion score of the face image; or, the second ratio is obtained as second ratio = (number of pixels of the non-occlusion region segmentation within the at least partial face region) / (number of pixels of the at least partial face region segmentation), and this second ratio is used as the face occlusion score of the face image.
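As a minimal illustration of this computation (the function name, the NumPy mask representation and the empty-region handling are assumptions made for the sketch, not part of the patent), the two ratios could be computed from binary segmentation masks as follows:

```python
import numpy as np

def face_occlusion_score(face_mask, occluded_mask=None, visible_mask=None):
    """Compute the face occlusion score of a face image from binary masks.

    face_mask     : H x W array, 1 where a pixel belongs to the at least
                    partial face region segmentation, 0 elsewhere.
    occluded_mask : H x W array, 1 where a face-region pixel is occluded.
    visible_mask  : H x W array, 1 where a face-region pixel is not occluded.
    Exactly one of occluded_mask / visible_mask is expected, depending on
    which segmentation the trained model outputs.
    """
    face_pixels = int(np.count_nonzero(face_mask))
    if face_pixels == 0:
        raise ValueError("no face-region pixels were segmented")
    if occluded_mask is not None:
        # first ratio: occluded pixels / face-region pixels
        return np.count_nonzero(occluded_mask) / face_pixels
    # second ratio: non-occluded pixels / face-region pixels
    return np.count_nonzero(visible_mask) / face_pixels
```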
Referring to fig. 2, fig. 2 is a flowchart illustrating main steps for training a face segmentation model according to an embodiment of the present invention. As shown in fig. 2, the method trains the model based at least on the following steps:
step S201: and acquiring a first face training image, marking the first face training image, and taking the marked first face training image as a training sample of the face segmentation model.
In some embodiments, the first face training image includes a face detection frame region and a non-face detection frame region, and labeling the first face training image includes: labeling at least a partial face region segmentation based on the first face training image, and either an occlusion region segmentation within at least a partial face region or a non-occlusion region segmentation within at least a partial face region.
Specifically, the at least partial face region refers to some or all of the region belonging to the face in the first face training image, and it can be represented either by the set of points on the region boundary or by the set of points inside the region; the occlusion region within the at least partial face region refers to the part of that region which is occluded, represented in the same way; and the non-occlusion region within the at least partial face region refers to the part of that region which is not occluded, again represented in the same way. For example, the region segmentation of the first face training image may be performed at pixel level, that is, each pixel of the first face training image is assigned to the region it belongs to. This arrangement of the region segmentation is merely illustrative and may be chosen according to actual needs in practical applications.
In some embodiments, labeling the at least partial face region segmentation, and the occlusion region segmentation or the non-occlusion region segmentation within the at least partial face region, comprises: acquiring a second face training image based on the first face training image, wherein the second face training image is an image comprising only the face detection frame region of the first face training image; and labeling the at least partial face region segmentation, and the occlusion region segmentation or the non-occlusion region segmentation within the at least partial face region, based on the second face training image and a preset labeling region requirement, wherein the preset labeling region requirement comprises labeling all of the face region or labeling part of the face region.
Specifically, the first face training image contains both a face detection frame region and a non-face detection frame region; the non-face detection frame region may be, for example, the neck or shoulders of the person. Therefore, to label the first face training image more accurately, a second face training image containing only the face detection frame region of the first face training image can be obtained first, and the labeling is then performed on the second face training image. During labeling, the at least partial face region refers to some or all of the region belonging to the face in the second face training image; the occlusion region within it is the part that is occluded, and the non-occlusion region is the part that is not occluded; each region can be represented by the set of points on its boundary or the set of points inside it. If there is no occluded area within the at least partial face region, the occlusion region segmentation within the at least partial face region is an empty set.
Illustratively, the face detection frame region in the first face training image may be represented by the coordinates (x1, y1) of its upper-left corner and (x2, y2) of its lower-right corner: the upper-left corner of the first face training image is taken as the origin of coordinates, the positive x-axis points horizontally to the right and the positive y-axis points vertically downward, and under this coordinate system the upper-left corner of the face detection frame region is (x1, y1) and its lower-right corner is (x2, y2). This representation of the face detection frame region is merely illustrative and may be chosen according to actual needs in practical applications.
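For illustration, cropping the second face training image out of the first one under the coordinate convention above might look like this (a sketch; the helper name and the half-open slicing convention are assumptions):

```python
import numpy as np

def crop_face_box(first_image: np.ndarray, box) -> np.ndarray:
    """Return the second face training image: the face detection frame
    region cut out of the first face training image.

    first_image : H x W x C array, origin at the top-left corner, x to the
                  right, y downward (as described above).
    box         : (x1, y1, x2, y2) upper-left and lower-right corners of the
                  face detection frame region.
    """
    x1, y1, x2, y2 = box
    # rows are indexed by y, columns by x; half-open slicing is assumed
    return first_image[y1:y2, x1:x2]
```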
In some embodiments, the at least partial face region may be defined as required, i.e., it may include the whole face region or only part of it, so that either the whole face region or only part of it is labeled. For example, since the forehead area above the eyebrows has essentially no effect on face recognition, the at least partial face region may be defined as the face region below the eyebrows, excluding the forehead. Under this definition, the labeled at least partial face region segmentation based on the second face training image may be as shown in fig. 3, and the labeled occlusion region segmentation within the at least partial face region may be as shown in fig. 4. The above definition of the at least partial face region and the labeled segmentation images are merely illustrative and may be chosen according to actual needs in practical applications.
Step S202: and constructing a neural network model, and taking the neural network model as a face segmentation model to be trained, wherein the face segmentation model to be trained comprises a backbone network to be trained, a neck network to be trained and a head network to be trained.
Step S203: and inputting the training sample into the face segmentation model to be trained for training so as to obtain the face segmentation model which is trained.
S203 is further described below.
Referring to fig. 5, fig. 5 is a flowchart illustrating main steps of inputting training samples into a face segmentation model to be trained for training according to an embodiment of the present invention. As shown in fig. 5, in some embodiments, inputting a training sample into a face segmentation model to be trained for training, to obtain a trained face segmentation model includes the following steps:
step S2031: and inputting the training sample into a backbone network to be trained for feature extraction so as to obtain a first feature map of the training sample.
Step S2032: and inputting the first feature map of the training sample into a neck network to be trained for feature fusion so as to obtain a second feature map of the training sample.
Step S2033: and inputting the second feature map of the training sample into a head network to be trained for region segmentation to obtain a region segmentation result of the training sample, wherein the region segmentation result is a pixel level or a grid level.
Step S2034: acquiring a loss function based on the region segmentation result of the training sample and the training sample itself, feeding the loss function back to step S2031, and repeating steps S2031 to S2034 until the loss function converges, wherein the loss function at least comprises a cross entropy loss function.
Specifically, feature extraction is the foundation of computer vision tasks, and a good feature extraction network can significantly improve the performance of an algorithm; the network that extracts features from an image is called the backbone network. The receptive field is the size of the area on the original image that a pixel on the output feature map corresponds to. Earlier object detectors ran detection directly on the coarse feature map output by the backbone network, which has a large receptive field: this is friendly to large objects, but an overly large receptive field easily causes small objects to become "out of focus". To avoid this problem, feature maps at multiple scales need to be fused from bottom to top before detection, i.e., the neck network fuses the first feature maps extracted by the backbone network. After the neck network has fused the first feature maps, the head network performs detection and localization on the fused features, i.e., it realizes the region segmentation of the training sample.
Depending on how the backbone network, neck network and head network to be trained are structured, the region segmentation result can be at pixel level or at grid level. Pixel-level region segmentation assigns each pixel of the training sample to a region; grid-level region segmentation divides the training sample into a number of grid cells and assigns each cell to a region. Pixel-level segmentation is more accurate than grid-level segmentation but also more time-consuming, so the level of region segmentation to adopt can be chosen in practice by weighing accuracy against computation time.
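A rough sketch of one pass through steps S2031 to S2034 is given below, assuming the three sub-networks are PyTorch modules and the head produces a pixel-level class map (module names, the optimizer choice and the fixed epoch count are illustrative assumptions; the patent only requires iterating until the loss converges):

```python
import torch
import torch.nn.functional as F

def train_face_segmentation(backbone, neck, head, loader, epochs=10, lr=1e-3):
    """Steps S2031-S2034: extract features, fuse them, segment, and
    back-propagate a cross-entropy loss over repeated iterations."""
    params = list(backbone.parameters()) + list(neck.parameters()) + list(head.parameters())
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)

    for _ in range(epochs):
        for images, labels in loader:            # labels: per-pixel region indices
            feats = backbone(images)             # S2031: first feature map(s)
            fused = neck(feats)                  # S2032: second (fused) feature map
            logits = head(fused)                 # S2033: region segmentation logits
            # resize to the label resolution for pixel-level supervision
            logits = F.interpolate(logits, size=labels.shape[-2:],
                                   mode="bilinear", align_corners=False)
            loss = F.cross_entropy(logits, labels)   # S2034: cross entropy loss
            optimizer.zero_grad()
            loss.backward()                      # feed the loss back towards S2031
            optimizer.step()
```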
The loss function adjusts the weights of the face segmentation model to be trained by measuring, at each iteration, the difference between the region segmentation result and the labels of the training sample. The cross entropy loss function is computed as:

L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log\left(p_{ic}\right)

where N is the number of samples selected in one training batch, M is the number of classes, y_{ic} is 1 if the true class of sample i is c and 0 otherwise, and p_{ic} is the predicted probability that sample i belongs to class c.
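Written out directly, the formula above corresponds to the following sketch (in practice a library routine such as torch.nn.functional.cross_entropy would normally be used instead):

```python
import numpy as np

def cross_entropy_loss(p, y):
    """Cross entropy over N samples and M classes, matching the formula above.

    p : N x M array of predicted probabilities p_ic.
    y : N x M one-hot array, y_ic = 1 if the true class of sample i is c, else 0.
    """
    eps = 1e-12                                   # guard against log(0)
    return -np.mean(np.sum(y * np.log(p + eps), axis=1))
```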
In some embodiments, the head network to be trained comprises a face region segmentation head network to be trained, and an occlusion region segmentation head network to be trained or a non-occlusion region segmentation head network to be trained (both operating within the face region), and inputting the second feature map of the training sample into the head network to be trained for region segmentation to obtain the region segmentation result of the training sample comprises the following steps:
Step S20331: inputting the second feature map of the training sample into the face region segmentation head network to be trained, so as to obtain the at least partial face region segmentation.
Step S20332: inputting the second feature map of the training sample into the occlusion region segmentation head network to be trained, so as to obtain the occlusion region segmentation within the at least partial face region.
Step S20333: or, inputting the second feature map of the training sample into the non-occlusion region segmentation head network to be trained, so as to obtain the non-occlusion region segmentation within the at least partial face region.
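Concretely, the parallel head networks might each be a small convolutional branch applied to the same fused (second) feature map, roughly as sketched below (the layer sizes and the 256-channel input are assumptions, not taken from the patent):

```python
import torch.nn as nn

class SegmentationHead(nn.Module):
    """A minimal per-pixel binary segmentation head; one instance is used
    for the face region and another for the occlusion (or non-occlusion)
    region within the face region."""
    def __init__(self, in_channels: int = 256, hidden: int = 64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=1),   # one-channel logit map
        )

    def forward(self, fused_features):
        return self.block(fused_features)

face_head = SegmentationHead()        # at least partial face region
occlusion_head = SegmentationHead()   # occlusion region within the face region
```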
In some embodiments, the method further comprises: constructing the backbone network to be trained based on at least one of ResNet, MobileNet and HRNet; and constructing the neck network to be trained based on at least one of FPN, PANet and Bi-FPN.
Specifically, resNet (Residual Network) was proposed by He Kaiming et al of Microsoft laboratories in 2015, which is mainly characterized by having an ultra-deep Network structure, providing Residual structure modules, and using Batch Normalization (batch normalization) acceleration training. MobileNet (mobile network) is based on a streamlined architecture, using depth separable convolution to build lightweight depth neural networks for mobile and embedded vision applications; the network introduces two simple global hyper-parameters-width multiplier and resolution multiplier, which can effectively trade off between delay and accuracy. HRNet (High-Resolution Net) is proposed for 2D human body pose estimation tasks, and the network is primarily for pose estimation of a single individual (i.e., there should be only one human body target in the image of the input network).
FPN (Feature Pyramid Network) consists of a bottom-up path and a top-down path; the bottom-up path is an ordinary convolutional feature extraction network. Going from bottom to top, the spatial resolution decreases, more high-level structures are detected, and the semantic value of the network layers increases accordingly. PANet (Path Aggregation Network) strengthens the whole feature hierarchy with accurate low-level localization signals through bottom-up path augmentation, shortening the information path between low-level and top-level features; it also proposes adaptive feature pooling, which connects the feature grid with all feature levels so that the useful information in every feature level propagates directly to the following proposal sub-network. Bi-FPN (bidirectional feature pyramid network) introduces learnable weights to learn the importance of different input features while repeatedly applying top-down and bottom-up multi-scale feature fusion.
The backbone network to be trained comprises one of ResNet, MobileNet and HRNet, and the neck network to be trained comprises one of FPN, PANet and Bi-FPN. The face segmentation model can thus be built according to the user's actual requirements, so as to better balance accuracy against computation time and further improve the user experience.
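As an illustration of how such a configurable choice could be wired together with an off-the-shelf library (a sketch assuming a recent torchvision release; only the ResNet-18 plus FPN combination is shown, and MobileNet or HRNet backbones and PANet or Bi-FPN necks would be substituted analogously):

```python
import torch.nn as nn
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor
from torchvision.ops import FeaturePyramidNetwork

def build_backbone(name: str = "resnet18") -> nn.Module:
    """Return a backbone that outputs a dict of multi-scale feature maps."""
    if name == "resnet18":
        net = torchvision.models.resnet18(weights=None)
        # expose the four residual stages (output strides 4, 8, 16, 32)
        return create_feature_extractor(
            net, return_nodes={"layer1": "c2", "layer2": "c3",
                               "layer3": "c4", "layer4": "c5"})
    raise ValueError(f"unsupported backbone: {name}")

def build_neck(out_channels: int = 256) -> nn.Module:
    """An FPN neck matched to the ResNet-18 stage widths (64/128/256/512)."""
    return FeaturePyramidNetwork([64, 128, 256, 512], out_channels)
```

The dict of feature maps produced by this backbone can be passed directly to the FPN neck, whose highest-resolution output would then feed the segmentation heads; a lighter backbone or neck trades accuracy for speed, as discussed above.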
In some embodiments, the face image is the same size as the training samples. Specifically, the size of the image may be represented by (H, W), where H is the height of the image and W is the width of the image, and the face image and the training sample are represented by the same (H, W).
It should be noted that, the user images according to the embodiments of the present disclosure (including, but not limited to, the first face training image and the second face training image used for training, the face image in the actual environment, etc.) are all images authorized by the user or sufficiently authorized by the parties.
The actions such as image acquisition related in the embodiments of the present disclosure are performed after user and object authorization or after full authorization by each party.
It should be noted that, although the foregoing embodiments describe the steps in a specific order, it will be understood by those skilled in the art that, in order to achieve the effects of the present invention, the steps are not necessarily performed in such an order, and may be performed simultaneously (in parallel) or in other orders, and these variations are within the scope of the present invention.
The invention further provides electronic equipment.
Referring to fig. 6, fig. 6 is a schematic block diagram of a main structure of an electronic device for performing the face occlusion score determining method of the present invention. As shown in fig. 6, the present invention further provides an electronic device for executing the face occlusion score determining method of the present invention, where the electronic device includes: a processor 11, a memory 12 and a computer program 13 stored in the memory 12 and executable on the processor 11. The steps of the various method embodiments described above are implemented by the processor 11 when executing the computer program 13. Alternatively, the processor 11 implements the functions of the modules/units in the above-described embodiments when executing the computer program 13.
The processor 11 may be, for example, a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 12 may be an internal storage unit of the electronic device, for example, a hard disk or a memory of the electronic device; the memory 12 may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device. Further, the memory 12 may also include both internal storage units and external storage devices of the electronic device. The memory 12 is used for storing computer programs and other programs and data required by the electronic device, and the memory 12 may also be used for temporarily storing data that has been output or is to be output.
In some possible implementations, the electronic device may include multiple processors 11 and memories 12. The program for executing the face occlusion score determination method of the above method embodiment may be divided into multiple sub-programs, each of which may be loaded and executed by a processor 11 to execute different steps of the face occlusion score determination method of the above method embodiment. Specifically, the sub-programs may be stored in different memories 12, and each processor 11 may be configured to execute the programs in one or more memories 12, so that the processors 11 jointly implement the face occlusion score determination method of the above method embodiment, that is, each processor 11 executes different steps of the method to implement it jointly.
The plurality of processors 11 may be processors disposed on the same device, for example, the electronic device may be a high-performance device composed of a plurality of processors, and the plurality of processors 11 may be processors configured on the high-performance device. The plurality of processors 11 may be processors disposed on different devices, for example, the electronic device may be a server cluster, and the plurality of processors 11 may be processors on different servers in the server cluster.
The electronic device may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device may include, but is not limited to, a processor 11 and a memory 12. It will be appreciated by those skilled in the art that fig. 6 is merely an example of an electronic device and is not meant to be limiting, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., an electronic device may also include an input-output device, a network access device, a bus, etc.
Further, the invention also provides a computer readable storage medium. In one embodiment of the computer readable storage medium according to the present invention, the computer readable storage medium may be configured to store a program for performing the face occlusion score determination method of the above method embodiment, which may be loaded and executed by a processor to implement the above face occlusion score determination method. For convenience of explanation, only those portions relevant to the embodiments of the present invention are shown; for specific technical details which are not disclosed, please refer to the method portions of the embodiments of the present invention. The computer readable storage medium may be a storage device formed by various electronic devices; optionally, the computer readable storage medium in the embodiments of the present invention is a non-transitory computer readable storage medium.
Further, it should be understood that, since the respective modules are merely set to illustrate the functional units of the apparatus of the present invention, the physical devices corresponding to the modules may be the processor itself, or a part of software in the processor, a part of hardware, or a part of a combination of software and hardware. Accordingly, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the apparatus may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solution to deviate from the principle of the present invention, and therefore, the technical solution after splitting or combining falls within the protection scope of the present invention.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

Claims (10)

1. A method for determining a face occlusion score, the method comprising the steps of:
acquiring a face image, and inputting the face image into a trained face segmentation model to obtain at least partial face region segmentation, and occlusion region segmentation within the at least partial face region or non-occlusion region segmentation within the at least partial face region;
respectively acquiring the number of pixels of the at least partial face region segmentation and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region;
and determining the face occlusion score of the face image based on the number of pixels of the at least partial face region segmentation and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region.
2. The face occlusion score determination method according to claim 1, wherein the step of determining the face occlusion score of the face image based on the number of pixels of the at least partial face region segmentation and the number of pixels of the occlusion region segmentation within the at least partial face region or the number of pixels of the non-occlusion region segmentation within the at least partial face region includes:
acquiring a first ratio of the number of pixels of the occlusion region segmentation within the at least partial face region to the number of pixels of the at least partial face region segmentation, and taking the first ratio as the face occlusion score of the face image;
or, acquiring a second ratio of the number of pixels of the non-occlusion region segmentation within the at least partial face region to the number of pixels of the at least partial face region segmentation, and taking the second ratio as the face occlusion score of the face image.
3. The face occlusion score determination method of claim 1, wherein said method trains said model based on at least the steps of:
acquiring a first face training image, labeling the first face training image, and taking the labeled first face training image as a training sample of the face segmentation model; constructing a neural network model, and taking the neural network model as the face segmentation model to be trained, wherein the face segmentation model to be trained comprises a backbone network to be trained, a neck network to be trained and a head network to be trained;
and inputting the training sample into the face segmentation model to be trained for training so as to obtain a face segmentation model which is trained.
4. A face occlusion score determination method according to claim 3, wherein the first face training image comprises a face detection box area and a non-face detection box area, and the step of labeling the first face training image comprises:
labeling at least a partial face region segmentation based on the first face training image, and occlusion region segmentation within at least a partial face region or non-occlusion region segmentation within at least a partial face region.
5. The face occlusion score determination method of claim 4, wherein the step of labeling at least partial face region segmentation, and occlusion region segmentation within the at least partial face region or non-occlusion region segmentation within the at least partial face region, comprises:
acquiring a second face training image based on the first face training image, wherein the second face training image is an image comprising only the face detection frame region of the first face training image;
labeling the at least partial face region segmentation, and the occlusion region segmentation within the at least partial face region or the non-occlusion region segmentation within the at least partial face region, based on the second face training image and a preset labeling region requirement, wherein the preset labeling region requirement comprises labeling all of the face region or labeling part of the face region.
6. The face occlusion score determination method according to claim 3, wherein the step of inputting the training sample into the face segmentation model to be trained to obtain the trained face segmentation model comprises:
S1, inputting the training sample into the backbone network to be trained for feature extraction, so as to obtain a first feature map of the training sample;
S2, inputting the first feature map of the training sample into the neck network to be trained for feature fusion, so as to obtain a second feature map of the training sample;
S3, inputting the second feature map of the training sample into the head network to be trained for region segmentation, so as to obtain a region segmentation result of the training sample, wherein the region segmentation result is at pixel level or grid level;
S4, acquiring a loss function based on the region segmentation result of the training sample and the training sample itself, feeding the loss function back to step S1, and repeating steps S1-S4 until the loss function converges, wherein the loss function at least comprises a cross entropy loss function.
7. The face occlusion score determination method of claim 6, wherein the head network to be trained comprises a face region segmentation head network to be trained, and an occlusion region segmentation head network to be trained or a non-occlusion region segmentation head network to be trained (both operating within the face region), and the step of inputting the second feature map of the training sample into the head network to be trained for region segmentation to obtain the region segmentation result of the training sample comprises:
inputting the second feature map of the training sample into the face region segmentation head network to be trained, so as to obtain the at least partial face region segmentation;
inputting the second feature map of the training sample into the occlusion region segmentation head network to be trained, so as to obtain the occlusion region segmentation within the at least partial face region;
or inputting the second feature map of the training sample into the non-occlusion region segmentation head network to be trained, so as to obtain the non-occlusion region segmentation within the at least partial face region.
8. A face occlusion score determination method according to claim 3, further comprising:
constructing the backbone network to be trained based on at least one of ResNet, MobileNet and HRNet;
and constructing the neck network to be trained based on at least one of FPN, PANet and Bi-FPN.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the face occlusion score determination method of any of claims 1 to 8 when the computer program is executed.
10. A readable storage medium having stored therein a plurality of program codes, wherein the program codes are adapted to be loaded and executed by a processor to perform the face occlusion score determination method of any of claims 1 to 8.
CN202310298673.9A 2023-03-23 2023-03-23 Face shielding score determining method, electronic equipment and medium Pending CN116309643A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310298673.9A CN116309643A (en) 2023-03-23 2023-03-23 Face shielding score determining method, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310298673.9A CN116309643A (en) 2023-03-23 2023-03-23 Face shielding score determining method, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN116309643A true CN116309643A (en) 2023-06-23

Family

ID=86793987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310298673.9A Pending CN116309643A (en) 2023-03-23 2023-03-23 Face shielding score determining method, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN116309643A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883670A (en) * 2023-08-11 2023-10-13 智慧眼科技股份有限公司 Anti-shielding face image segmentation method


Similar Documents

Publication Publication Date Title
CN108121986B (en) Object detection method and device, computer device and computer readable storage medium
CN109934065B (en) Method and device for gesture recognition
CN112528831B (en) Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN110363817B (en) Target pose estimation method, electronic device, and medium
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
CN112328715B (en) Visual positioning method, training method of related model, related device and equipment
CN112446919A (en) Object pose estimation method and device, electronic equipment and computer storage medium
CN110991513A (en) Image target recognition system and method with human-like continuous learning capability
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN110222572A (en) Tracking, device, electronic equipment and storage medium
CN112633084A (en) Face frame determination method and device, terminal equipment and storage medium
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
CN111985458A (en) Method for detecting multiple targets, electronic equipment and storage medium
CN116309643A (en) Face shielding score determining method, electronic equipment and medium
CN115018999A (en) Multi-robot-cooperation dense point cloud map construction method and device
CN110796108A (en) Method, device and equipment for detecting face quality and storage medium
CN111353325A (en) Key point detection model training method and device
CN110598647B (en) Head posture recognition method based on image recognition
WO2023109086A1 (en) Character recognition method, apparatus and device, and storage medium
CN109241942B (en) Image processing method and device, face recognition equipment and storage medium
WO2020244076A1 (en) Face recognition method and apparatus, and electronic device and storage medium
CN111104911A (en) Pedestrian re-identification method and device based on big data training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination