CN116311419A - Face recognition depth camera and intelligent device - Google Patents
Face recognition depth camera and intelligent device
- Publication number
- CN116311419A (application CN202310042283.5A)
- Authority
- CN
- China
- Prior art keywords
- face
- sub
- image
- face recognition
- depth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
A face recognition depth camera comprises: an emitter for projecting structured light spots or flood light onto a target face; a receiver for receiving the structured-light or flood-light reflection signal to generate a first depth image; an RGB camera for acquiring a first RGB image of the target face; and a processor for aligning the first depth image with the first RGB image, acquiring a face region in each image, cropping at least two sub-regions within each face region, performing recognition on the sub-regions as a whole, and selecting different sub-regions according to the occlusion condition to complete face recognition; each sub-region contains at least one of the left eye, right eye, nose, and mouth. The invention enables a user to be recognized by the depth camera in a variety of states.
Description
Technical Field
The invention relates to the technical field of face recognition, in particular to a face recognition depth camera and intelligent equipment.
Background
Face recognition is used increasingly in daily life, and different scenarios place different requirements on it. Payment scenarios usually demand very high recognition accuracy to meet security requirements. Important authentication scenarios, such as access control and door locks, also require high recognition accuracy. Intelligent devices such as mobile phones, which are activated frequently in daily use and serve a relatively fixed set of users, instead require quick response and must take convenience of use into account.
One prior-art scheme provides a face recognition method comprising the following steps. Step 1: one frame of image data is taken from memory in turn and pre-processed. Step 2: face detection is performed on the pre-processed frame data. Step 3: scale conversion and grayscale normalization are applied to any face picture detected in the current frame. Step 4: using a recognition algorithm that combines DCT and MMSD, a two-dimensional DCT is applied to the face pictures of a pre-acquired face database, and features are extracted from the resulting transform-coefficient matrices to obtain an optimal feature-discrimination matrix W. Step 5: the face picture processed in step 3 is transformed by two-dimensional DCT, projected onto the optimal feature-discrimination matrix W of step 4, and matched using a nearest-neighbour classifier. This provides a face recognition method with accurate recognition.
Another prior-art scheme performs face recognition on the face region of an image with a deep-learning-based algorithm, and comprises: determining the structure of a convolutional neural network; summarizing one or more convolution results and/or input values under the same coordinate system according to that structure; combining the convolution results and/or input values in that coordinate system; and, from the combined result, calculating the quantized shift and offset used when the data are passed to the next convolutional layer. It quantizes a high-accuracy convolutional neural network such as a ResNet, GoogLeNet or Inception-ResNet model to a low-precision representation while keeping the original classification and detection accuracy, so that the network can be deployed on edge devices such as smartphones and smart cameras, improving face-detection performance at the edge.
The prior art thus improves images in a targeted way through different algorithms, but it does not make targeted improvements for the usage condition of frequent daily activation with a relatively fixed set of users.
Disclosure of Invention
Therefore, for the usage scenario of frequent daily activation with a relatively fixed set of users, the invention extracts features from at least two sub-regions and processes them according to the occlusion condition of the face, so that a user can be recognized by the depth camera in a variety of states.
In a first aspect, the present invention provides a face recognition depth camera, comprising:
an emitter for projecting structured light spots or flood light onto a target face;
a receiver for receiving the structured-light or flood-light reflection signal to generate a first depth image;
an RGB camera for acquiring a first RGB image of the target face;
a processor for aligning the first depth image with the first RGB image, acquiring a face region in each image, cropping at least two sub-regions within each face region, performing recognition on the sub-regions as a whole, and selecting different sub-regions according to the occlusion condition to complete face recognition; wherein each sub-region comprises at least one of the left eye, right eye, nose, and mouth.
In a second aspect, the present invention provides a face recognition depth camera, comprising:
an infrared camera for acquiring a first infrared image of a target face;
an RGB camera for acquiring a first RGB image of the target face;
a processor for obtaining a first depth image from the first infrared image and the first RGB image, aligning the first depth image with the first RGB image, acquiring a face region in each image, cropping at least two sub-regions within each face region, performing recognition on the sub-regions as a whole, and selecting different sub-regions according to the occlusion condition to complete face recognition; wherein each sub-region comprises at least one of the left eye, right eye, nose, and mouth.
Optionally, in the face recognition depth camera, the processor includes:
a cropping module for acquiring face regions on the first RGB image and the first depth image respectively and cropping at least two sub-regions within each face region; wherein each sub-region comprises at least one of the left eye, right eye, nose, and mouth, and the first RGB image and the first depth image are aligned at the pixel level;
an input module for inputting the sub-regions into their respective sub-networks and feeding the data of all or some of the sub-networks into a sub-network fusion layer according to the occlusion condition of the sub-regions;
and a recognition module for selecting, from the recognition result of the sub-network fusion layer, the person corresponding to the face whose similarity is above a threshold and is the highest, thereby completing face recognition.
Optionally, in the face recognition depth camera, the cropping module includes:
a detection unit for detecting the face position on the first RGB image and obtaining first face key points;
a normalization unit for normalizing the first RGB image and the first depth image simultaneously based on the face key points to obtain a second RGB image and a second depth image respectively;
an extraction unit for extracting second face key points on the second RGB image and obtaining the corresponding sub-regions;
and a correspondence unit for mapping the second face key points onto the second depth image to obtain third face key points of the second depth image.
Optionally, in the face recognition depth camera, the number of sub-networks is equal to the number of sub-regions, with a one-to-one correspondence between them.
Optionally, in the face recognition depth camera, each sub-network comprises a convolution layer, a batch normalization layer, a nonlinear activation layer, a pooling layer and a superposition layer.
Optionally, in the face recognition depth camera, the plurality of sub-regions do not overlap one another.
Optionally, in the face recognition depth camera, the second model and the third model are trained jointly.
Optionally, in the face recognition depth camera, the sub-network fusion layer selects different models for processing according to the different sub-network inputs.
In a third aspect, the present invention provides an intelligent device, which is characterized by comprising the aforementioned face recognition depth camera.
Compared with the prior art, the invention has the following beneficial effects:
the invention can obtain depth image data more stably by utilizing the active laser technology, thereby enhancing the robustness of data acquisition.
The invention uses partial image data of the human face to judge, can directly use the existing human face database without re-acquiring the image, and greatly reduces the cost of model training.
According to the invention, the network face features are extracted by utilizing the sub-network, global features of the whole face are not needed, the effect under the shielding scene can be better processed, the flexibility and the scene applicability are better, and the recognition effect under the scenes such as wearing mask, wearing sunglasses and the like can be improved.
According to the invention, different sub-networks are trained by adopting independent networks, so that the method has better adaptability to the human face, can obtain the best recognition effect for different areas, and is beneficial to improving the recognition effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art. Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a schematic structural diagram of a face recognition depth camera according to an embodiment of the present invention;
FIG. 2 is a schematic view of a face cut-out region according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another face recognition depth camera according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a processor according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a cropping module according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a sub-network structure according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a training phase configuration in accordance with an embodiment of the present invention;
FIG. 8 is a diagram of an inference phase configuration in an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an intelligent device in an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
The embodiment of the invention provides a face recognition depth camera, which aims to solve the problems in the prior art.
The following describes the technical scheme of the present invention and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
For the usage scenario of frequent daily activation with a relatively fixed set of users, the invention extracts features from at least two sub-regions and processes them according to the occlusion condition of the face, so that a user can be recognized by the depth camera in a variety of states.
Fig. 1 is a schematic structural diagram of a face recognition depth camera according to an embodiment of the present invention. As shown in fig. 1, a face recognition depth camera according to an embodiment of the present invention includes:
and an emitter 11 for emitting a structured light spot or flood light towards the target face.
Specifically, the emitter 11 is an infrared laser emitter, and may be used to individually project a structured light spot or floodlight, or may be used to implement switchable projection of the structured light spot and floodlight. The power of the transmitter 11 meets the requirements of the relevant industry standard. The structured light spots may be in various shapes, such as circles, squares, diamonds, hexagons, etc., as long as the corresponding functions can be satisfied, and the embodiment is not limited in particular form.
A receiver 12 for receiving the structured light spot or the floodlight reflected signal to generate a first depth image.
Specifically, the receiver 12 is matched to the transmitter 11. When the transmitter 11 projects only a structured light spot, the receiver 12 is a structured light receiver. When the transmitter 11 projects only floodlight, or alternatively projects structured light or floodlight, the receiver 12 is a TOF receiver. The receiver 12 operates synchronously with the transmitter 11, so that the receiver 12 receives information of the transmitter 11, and further obtains a reflected signal of the target face, and generates a first depth image.
An RGB camera 13, configured to acquire a first RGB image of the target face.
Specifically, the RGB camera 13 is exposed synchronously with the receiver 12 so that a target face image is obtained at the same moment. The field of view of the RGB camera 13 is the same as that of the receiver 12, and the first depth image has the same resolution as the first RGB image.
In some embodiments, a fill light is further included to supplement the RGB camera 13 when the ambient light is weak, so that the RGB camera 13 obtains a first RGB image in which the face remains visible.
A processor 14, configured to align the first depth image with the first RGB image, acquire a face region in each image, crop at least two sub-regions within each face region, perform recognition on the sub-regions as a whole, and select different sub-regions according to the occlusion condition to complete face recognition.
Specifically, the first depth image is aligned with the first RGB image at the pixel level. The two images may also be aligned by arranging the RGB camera 13 on the optical axis of the emitter 11 and the receiver 12; when the RGB camera has the same resolution as the receiver, the first depth image is then aligned with the first RGB image at the pixel level without the processor 14 having to perform a further alignment operation.
At least two sub-regions are cropped within the face region. As shown in fig. 2, each sub-region contains at least one of the left eye, right eye, nose, and mouth. In this embodiment, the key parts of the face are recognized through sub-regions rather than through key points alone. Meanwhile, the set of sub-regions does not necessarily cover every one of the left eye, right eye, nose, and mouth, but it covers at least two of them.
The occlusion condition of each sub-region is judged; sub-regions that are so heavily occluded that they cannot be recognized are removed, and recognition is performed as a whole only on the sub-regions that are not heavily occluded and remain recognizable. This adapts to the characteristics of the user, and a specific recognition model is selected accordingly to complete face recognition. The processor holds at least three face recognition models and selects among them according to the recognizable sub-regions. Taking two sub-regions A and B as an example, there are three combinations, A, B and A+B, and each combination corresponds to one face recognition model.
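As a non-limiting illustration of this selection logic, the following Python sketch assumes two sub-regions named A and B, a dictionary of pre-trained recognition models keyed by the combination of usable sub-regions, and an is_occluded() helper; these names are hypothetical and are not taken from the embodiment itself.

```python
# Illustrative sketch: selecting a recognition model according to which
# sub-regions survive the occlusion check. Model names and the is_occluded()
# helper are assumptions for this example.

def select_model(sub_regions, models, is_occluded):
    """sub_regions: dict like {"A": img_a, "B": img_b};
    models: dict keyed by frozenset of usable sub-region names."""
    usable = frozenset(name for name, img in sub_regions.items()
                       if not is_occluded(img))
    if not usable:
        return None, usable          # nothing identifiable in this frame
    return models[usable], usable    # e.g. models for {A}, {B}, {A, B}
```

With three or more sub-regions the same dictionary lookup covers every non-empty combination of recognizable sub-regions.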
In this embodiment, the same set of samples can be combined according to the sub-region data to train a plurality of different face recognition models, so that effective recognition under various occlusion conditions is achieved with the fewest training samples.
Fig. 3 is a schematic structural diagram of another face recognition depth camera according to an embodiment of the present invention. As shown in fig. 3, another face recognition depth camera according to an embodiment of the present invention includes:
the infrared camera 21 is configured to acquire a first infrared image of a target face.
Specifically, the infrared camera 21 can obtain an infrared image of the target face, so that living body detection can be realized, and infrared data can be obtained.
The RGB camera 22 is configured to acquire a first RGB image of the target face.
Specifically, the RGB camera 22 is exposed synchronously with the infrared camera 21 to obtain an RGB image and an infrared image at the same time. The RGB camera 22 and the infrared camera 21 form a binocular system, and depth data can be obtained by parallax calculation of the two.
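As an illustrative sketch only, assuming the RGB and infrared images are already rectified and using placeholder focal-length and baseline values, the parallax-to-depth computation could look as follows in Python with OpenCV; this is one possible realization, not the specific algorithm of the embodiment.

```python
import cv2
import numpy as np

# Sketch only: depth from RGB/IR parallax as described above.
# Focal length (pixels) and baseline (metres) below are placeholder values.
FOCAL_PX, BASELINE_M = 600.0, 0.05

def depth_from_pair(rgb_img, ir_img):
    left = cv2.cvtColor(rgb_img, cv2.COLOR_BGR2GRAY)
    right = ir_img if ir_img.ndim == 2 else cv2.cvtColor(ir_img, cv2.COLOR_BGR2GRAY)
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disp = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM scales by 16
    depth = np.zeros_like(disp)
    valid = disp > 0
    depth[valid] = FOCAL_PX * BASELINE_M / disp[valid]  # Z = f * B / d
    return depth
```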
A processor 23, configured to obtain a first depth image from the first infrared image and the first RGB image, align the first depth image with the first RGB image, acquire a face region in each image, crop at least two sub-regions within each face region, perform recognition on the sub-regions as a whole, and select different sub-regions according to the occlusion condition to complete face recognition.
Specifically, each sub-region comprises at least one of the left eye, right eye, nose, and mouth. Unlike the foregoing embodiment, the depth data in this embodiment are obtained by parallax calculation between the first RGB image and the first infrared image, and the resulting first depth image is aligned with the first RGB image.
In some embodiments, the exposure of the first RGB image is further compensated using the first infrared image, so that the target face in the first RGB image remains visible. Since the first infrared image and the first RGB image are captured at the same moment, many of their details are consistent; when the first RGB image is severely over- or under-exposed, the partially missing information of the first RGB image can be restored from the details of the first infrared image.
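One possible way to realize such compensation, sketched here purely as an assumption (the embodiment does not specify the algorithm), is to substitute rescaled infrared intensity wherever the RGB luminance is clipped; the thresholds and the linear rescaling are illustrative, and the infrared image is assumed to be a single-channel 8-bit image aligned with the RGB image.

```python
import cv2
import numpy as np

def restore_clipped_rgb(rgb, ir, low=5, high=250):
    """Fill severely under/over-exposed RGB pixels using the aligned IR image.
    Illustrative only; thresholds and the blending rule are assumptions."""
    ycrcb = cv2.cvtColor(rgb, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    y = ycrcb[:, :, 0]
    clipped = (y <= low) | (y >= high)
    ir_f = ir.astype(np.float32)
    ok = ~clipped
    # Rescale IR to the usable RGB luminance range before substitution.
    if ok.any() and ir_f[ok].std() > 1e-3:
        gain = y[ok].std() / ir_f[ok].std()
        offset = y[ok].mean() - gain * ir_f[ok].mean()
    else:
        gain, offset = 1.0, 0.0
    y[clipped] = np.clip(gain * ir_f[clipped] + offset, 0, 255)
    ycrcb[:, :, 0] = y
    return cv2.cvtColor(ycrcb.astype(np.uint8), cv2.COLOR_YCrCb2BGR)
```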
In this embodiment, the same set of samples can be combined according to the sub-region data to train a plurality of different face recognition models, so that effective recognition under various occlusion conditions is achieved with the fewest training samples.
Fig. 4 is a schematic diagram of a processor according to an embodiment of the invention. As shown in fig. 4, a processor in an embodiment of the present invention includes:
and the clipping module 31 is configured to obtain face regions on the first RGB image and the first depth image, and clip at least two sub-regions in the face regions, respectively.
Specifically, the sub-region includes at least one of a left eye, a right eye, a nose, and a mouth, and the first RGB image and the first depth image are aligned at a pixel level. Two or more of the left eye, right eye, nose, mouth may share a sub-region. For example, the left eye and nose may share a sub-region and the right eye and nose may share a sub-region. The shape of the sub-regions may be any shape, such as rectangular, triangular, pentagonal, hexagonal, circular, etc. The size of the sub-regions may be arbitrarily set, but is typically not smaller than the size of a single eye. If the two eyes are of different sizes, the sub-region needs to include the smallest eye in its entirety.
An input module 32, configured to input the sub-regions into their respective sub-networks and to feed the data of all or some of the sub-networks into the sub-network fusion layer according to the occlusion condition of the sub-regions.
Specifically, each sub-network is an independent network, and the number of sub-networks equals the number of sub-regions. The sub-networks are neural networks, such as convolutional neural networks, deep neural networks or generative adversarial networks. The sub-networks may use the same feature-extraction network or different ones; for example, when the left eye and the right eye occupy different sub-regions, those two sub-regions may use the same feature-extraction network, while the sub-region of the left eye and the sub-region of the nose use different feature-extraction networks.
The occlusion condition of a sub-region determines whether it is used for face recognition. When a sub-region is fully or mostly occluded, it is not used for recognition; when sub-regions are unoccluded or only slightly occluded, they jointly take part in face recognition.
In some embodiments, the occlusion condition is judged from the proportion of the sub-region that is occluded. With the sub-region fixed, the information it contains is also fixed, and when enough of that information is blocked the recognition accuracy of the deep-learning model drops sharply. When, under a given occlusion condition, the accuracy of the deep-learning model on, say, the nose region falls below a preset value, the nose region is judged to be fully or mostly occluded; otherwise it is judged to be unoccluded or only slightly occluded.
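A minimal sketch of this occlusion gate, assuming an occlusion mask per sub-region and an illustrative 50% threshold (the preset value itself is not specified in the text):

```python
import numpy as np

# Minimal sketch of the occlusion gate described above. The occlusion mask
# (1 = occluded pixel) and the 0.5 threshold are assumptions for illustration.

def subregion_usable(occlusion_mask, max_occluded_ratio=0.5):
    """Return True if the sub-region is un- or lightly occluded and can be
    fed to its sub-network; False if it should be dropped from fusion."""
    ratio = float(np.count_nonzero(occlusion_mask)) / occlusion_mask.size
    return ratio <= max_occluded_ratio
```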
A recognition module 33, configured to select, from the recognition result of the sub-network fusion layer, the person corresponding to the face whose similarity is above the threshold and is the highest, thereby completing face recognition.
Specifically, the calculation of the similarity includes the steps of:
step S31: scaling an image
In this step, the image is subjected to a scaling process so that the face region obtained in step S1 is kept in a preset size. The first RGB image and the first depth map are processed at the same time, and the processed image still has the characteristic of pixel level alignment. In some embodiments, the present process has been performed on the first RGB image and the first depth image in step S1, and the present step is skipped.
Step S32: graying treatment
In this step, the RGB image is processed into a gray scale image to reduce the amount of post-calculation. This step only requires RGB image processing and does not require depth image processing. In a subsequent step, the processed RGB image and depth image are processed simultaneously.
Step S33: calculating the average value
In this step, the average value of each row of pixel points in each sub-area is calculated in units of rows, and the average value of the row of pixel points is obtained. The image has at least two sub-regions, each having a plurality of lines, the step resulting in a plurality of averages. Each average value corresponds to a line of features.
Step S34: calculating variance
In the step, taking the subarea as an object, calculating variances of all the obtained average values, wherein the obtained variances are the characteristic values of the subarea. The variance records the primary information of the sub-region.
Step S35: comparing variances
In the step, the variance of each subarea is used for calculating the distance between the face and the characteristic value of all samples in the face library. Each subarea calculates the same subarea in the face library to obtain the variance value of each subarea, and averages the variance values of each subarea to obtain the final variance value.
Step S36: face recognition
In this step, according to the calculation result in the above step S35, the face picture closest to the current picture feature value and having a metric distance higher than the threshold value is selected from the face library by using the set threshold value.
Through processing the RGB image and the depth image, variance calculation and comparison are carried out by utilizing the subareas, and more accurate face recognition effect is obtained through joint calculation of the RGB image and the depth image. In this embodiment, only the areas in the sub-areas are compared, and no additional other area information is needed, so as to improve the focusing performance of the model and resist the influence of hairstyles, glasses, masks and the like.
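The following Python sketch walks through steps S31 to S36 under stated assumptions: the 112x112 target size, the layout of the face library and the distance threshold are placeholders, and the sub-region boxes are assumed to be given on the already scaled, pixel-aligned images.

```python
import cv2
import numpy as np

# Hedged sketch of steps S31-S36 above; sizes, thresholds and the face-library
# layout are assumptions for illustration.

def subregion_features(rgb, depth, boxes, size=(112, 112)):
    """boxes: {name: (x, y, w, h)} on the aligned, pixel-level-matched images."""
    rgb = cv2.resize(rgb, size)                          # S31: scale both images
    depth = cv2.resize(depth, size)
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)         # S32: grayscale the RGB only
    feats = {}
    for name, (x, y, w, h) in boxes.items():
        for tag, img in (("gray", gray), ("depth", depth)):
            roi = img[y:y + h, x:x + w].astype(np.float32)
            row_means = roi.mean(axis=1)                 # S33: per-row averages
            feats[(name, tag)] = row_means.var()         # S34: variance = feature value
    return feats

def recognize(feats, face_library, threshold):
    """S35/S36: average per-sub-region distances and pick the closest library entry.
    face_library maps a person id to a dict with the same (name, tag) keys."""
    best_id, best_dist = None, None
    for person_id, ref in face_library.items():
        dists = [abs(feats[k] - ref[k]) for k in feats]
        mean_dist = float(np.mean(dists))
        if best_dist is None or mean_dist < best_dist:
            best_id, best_dist = person_id, mean_dist
    return best_id if best_dist is not None and best_dist <= threshold else None
```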
Fig. 5 is a schematic structural diagram of a cropping module according to an embodiment of the invention. As shown in fig. 5, a cropping module in an embodiment of the present invention includes:
the detecting unit 311 is configured to detect a face position on the first RGB image, and obtain a first face key point.
Specifically, a face detection algorithm is used to detect the face on the first RGB image and obtain the face position. The first face key points are used to determine the position of the face and of its main parts, such as the contour, eyes, nose and mouth. Prior-art schemes use 29, 68 or more face key points to identify the basic features of a face. The key points in this embodiment are different: they do not carry the face recognition itself and only mark the positions of the face, so their number can be far smaller than in the prior art. For example, each main part of the face, such as an eye, the nose or the mouth, can be represented by two key points: a centre point and an edge point. Taking an eye as an example, the region where the eye lies can be obtained from the centre point of the eye and the key point at its outer corner.
And a normalization unit 312, configured to perform normalization processing on the first RGB image and the first depth image simultaneously based on the face key point, so as to obtain a second RGB image and a second depth image respectively.
Specifically, the image region determined by the key points is normalized, so that a normalized image is obtained and the subsequent processing is more uniform and efficient. Since the first RGB image and the first depth image are aligned at the pixel level, locating the face key points allows the two images to be processed simultaneously, giving the normalized images, namely the second RGB image and the second depth image, which are likewise aligned at the pixel level.
And an extracting unit 313, configured to extract a second face key point on the second RGB image, and obtain a corresponding sub-region.
Specifically, in this step a face detection algorithm is run on the second RGB image to obtain the second face key points. Because the second RGB image is the normalized first RGB image, the positions of the second face key points differ from those of the first face key points, and the corresponding sub-region positions can be obtained from the positions of the second face key points. Compared with cropping the sub-regions on the first RGB image and then normalizing them, obtaining the sub-region positions on the second RGB image requires only a small number of key points, which greatly reduces the amount of calculation. On the normalized image the sub-regions also have a better size and position range, which improves the precision of the data processing.
In some embodiments, the positions of the second key points are obtained by normalizing the first key points. Since the positions of the first key points are known when the first RGB image is normalized, applying the same normalization to the first key points directly yields the second key points, which simplifies the processing and improves efficiency.
And a correspondence unit 314, configured to correspond the second face key point to the second depth image, and obtain a third face key point of the second depth image.
Specifically, since the second RGB image and the second depth image are aligned at the pixel level, the positions of the third face key points are obtained directly from the positions of the second face key points. The third face key points differ from the second face key points only in lying on a different image; their positions and shapes are the same.
In this embodiment, key-point information is obtained by detection on the first RGB image and the first depth image, and normalization is then performed to obtain the second RGB image and the second depth image, so that the key-point localization can be reused to obtain the sub-regions. The sub-region positions are thus obtained with little processing, enabling fast calculation and improved efficiency.
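As an illustration of the two-key-point scheme described above (centre point plus outer-corner point), the following sketch derives an eye sub-region on the normalized second RGB image and reuses the same box on the second depth image; the margin factor is an assumption.

```python
import numpy as np

# Illustrative sketch of deriving an eye sub-region from two key points;
# the 1.2 margin factor is an assumption.

def eye_subregion(center, outer_corner, margin=1.2):
    """center, outer_corner: (x, y) on the normalized second RGB image.
    Returns an axis-aligned box (x, y, w, h) that fully contains the eye."""
    cx, cy = center
    r = margin * float(np.hypot(outer_corner[0] - cx, outer_corner[1] - cy))
    x, y = int(round(cx - r)), int(round(cy - r))
    return x, y, int(round(2 * r)), int(round(2 * r))

def map_to_depth(box):
    """The second RGB and second depth images are pixel-level aligned, so the
    same box indexes the depth sub-region directly (third key points coincide)."""
    return box
```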
Fig. 6 is a schematic diagram of a sub-network structure according to an embodiment of the present invention, illustrated with a convolutional neural network. The convolutional neural network shown in fig. 6 includes ten convolution layers, ten batch normalization layers, eleven nonlinear activation layers, four pooling layers, three superposition layers, a random discard layer, a classification layer and a loss calculation layer. The sub-region data pass in turn through the first convolution layer, first batch normalization layer, first nonlinear activation layer, first pooling layer, second convolution layer, second batch normalization layer, second nonlinear activation layer, third convolution layer, third batch normalization layer, third nonlinear activation layer, first superposition layer, fourth convolution layer, fourth batch normalization layer, fourth nonlinear activation layer, second pooling layer, fifth convolution layer, fifth batch normalization layer, fifth nonlinear activation layer, sixth convolution layer, sixth batch normalization layer, sixth nonlinear activation layer, second superposition layer, seventh convolution layer, seventh batch normalization layer, seventh nonlinear activation layer, third pooling layer, eighth convolution layer, eighth batch normalization layer, eighth nonlinear activation layer, ninth convolution layer, ninth batch normalization layer, ninth nonlinear activation layer, third superposition layer, tenth convolution layer, tenth batch normalization layer, tenth nonlinear activation layer, fourth pooling layer, fully connected layer, eleventh nonlinear activation layer, random discard layer, classification layer and loss calculation layer. The first superposition layer superimposes the output of the first pooling layer with that of the third nonlinear activation layer; the second superposition layer superimposes the fourth convolution layer with the sixth nonlinear activation layer; the third superposition layer superimposes the seventh convolution layer with the ninth nonlinear activation layer.
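A hedged PyTorch sketch of such a sub-network is given below. The layer counts follow the description above, but the channel width, kernel sizes, input channels, feature dimension and class count are assumptions, and the skip connections are grouped block-wise as a simplification of the three superposition layers.

```python
import torch
import torch.nn as nn

# Hedged sketch of the sub-network described above (ten convolution, ten
# batch-normalization, eleven activation, four pooling and three superposition
# layers). Channel counts, kernel sizes and identity count are assumptions.

def conv_bn_relu(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class SubNetwork(nn.Module):
    def __init__(self, in_ch=1, width=32, num_classes=1000, feat_dim=128):
        super().__init__()
        self.stage1 = nn.Sequential(conv_bn_relu(in_ch, width), nn.MaxPool2d(2))     # conv1 + pool1
        self.block1 = nn.Sequential(conv_bn_relu(width, width), conv_bn_relu(width, width))  # conv2-3
        self.stage2 = nn.Sequential(conv_bn_relu(width, width), nn.MaxPool2d(2))     # conv4 + pool2
        self.block2 = nn.Sequential(conv_bn_relu(width, width), conv_bn_relu(width, width))  # conv5-6
        self.stage3 = nn.Sequential(conv_bn_relu(width, width), nn.MaxPool2d(2))     # conv7 + pool3
        self.block3 = nn.Sequential(conv_bn_relu(width, width), conv_bn_relu(width, width))  # conv8-9
        self.stage4 = nn.Sequential(conv_bn_relu(width, width), nn.MaxPool2d(2))     # conv10 + pool4
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                nn.Linear(width, feat_dim), nn.ReLU(inplace=True),   # act11
                                nn.Dropout(0.5))                                     # random discard layer
        self.classifier = nn.Linear(feat_dim, num_classes)   # per-sub-network classification layer

    def forward(self, x):
        x = self.stage1(x)
        x = x + self.block1(x)        # first superposition layer
        x = self.stage2(x)
        x = x + self.block2(x)        # second superposition layer
        x = self.stage3(x)
        x = x + self.block3(x)        # third superposition layer
        x = self.stage4(x)
        feat = self.fc(x)             # feature passed to the sub-network fusion layer
        return feat, self.classifier(feat)
```

The returned feature vector is what would be fed to the sub-network fusion layer, while the per-sub-network classification output feeds the sub-network's own loss.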
In this embodiment, hierarchical processing of the sub-regions is realized through the plurality of convolution layers, and the fusion and processing of the data of the plurality of sub-regions are realized through the fully connected layer, the eleventh nonlinear activation layer, the random discard layer, the classification layer and the loss calculation layer, so that the final judgment is made more reliably.
Fig. 7 illustrates the training-phase structure in an embodiment of the present invention, using three sub-regions as an example. Sub-network 601, sub-network 602 and sub-network 603 are combined into a sub-network fusion layer 620, on top of which an integrated classifier 630 computes a loss function 640. The network is trained jointly: the losses from loss functions 611 to 613 of the individual sub-regions are back-propagated separately to update the weights of the corresponding sub-networks 601 to 603, while the loss obtained from loss function 640 is used only to update the weights of the integrated classifier. In the training process of this embodiment only the useful losses are used, which speeds up training while preserving the accuracy of the model.
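The following training-step sketch mirrors that scheme under assumptions: the fusion layer is modelled as simple feature concatenation, the integrated classifier as a linear layer, and the optimizers and learning rates are illustrative.

```python
import torch
import torch.nn as nn

# Hedged sketch of the joint-training scheme of Fig. 7: each sub-network is
# updated only by its own loss (611-613), while the fusion loss (640) updates
# only the integrated classifier. Optimizer settings are assumptions.

def train_step(subnets, fusion_classifier, batches, labels):
    """subnets: list of SubNetwork; batches: per-sub-region input tensors;
    fusion_classifier: e.g. nn.Linear(feat_dim * len(subnets), num_classes)."""
    opt_sub = torch.optim.SGD([p for n in subnets for p in n.parameters()], lr=0.01)
    opt_fuse = torch.optim.SGD(fusion_classifier.parameters(), lr=0.01)
    ce = nn.CrossEntropyLoss()

    feats, sub_loss = [], 0.0
    for net, x in zip(subnets, batches):
        feat, logits = net(x)
        sub_loss = sub_loss + ce(logits, labels)   # losses 611..613
        feats.append(feat)

    opt_sub.zero_grad()
    sub_loss.backward()                            # updates the sub-networks only
    opt_sub.step()

    fused = torch.cat([f.detach() for f in feats], dim=1)   # sub-network fusion layer (620)
    fuse_loss = ce(fusion_classifier(fused), labels)        # loss 640
    opt_fuse.zero_grad()
    fuse_loss.backward()                           # updates the integrated classifier only
    opt_fuse.step()
    return sub_loss.item(), fuse_loss.item()
```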
Fig. 8 illustrates the inference-phase structure in an embodiment of the present invention. Since the sub-networks remain unchanged, fig. 8 keeps the same layout as fig. 7 and is still illustrated with three sub-networks. Compared with the training phase, at inference every batch normalization layer is folded forward into its corresponding convolution layer, the random discard layer is removed, loss functions 611 to 613 are no longer used, and loss function 640 is simply replaced by a softmax layer that computes the probabilities finally output by the network.
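Folding a batch-normalization layer into its preceding convolution is a standard transformation; a sketch is shown below (the handling of a missing convolution bias is an assumption). At inference the network output would then simply be passed through torch.softmax instead of a loss layer.

```python
import torch
import torch.nn as nn

# Sketch of folding a BatchNorm2d layer into the preceding Conv2d for inference.

@torch.no_grad()
def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)          # gamma / sigma
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

# Example use at inference: probs = torch.softmax(logits, dim=1)
```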
Fig. 9 is a schematic structural diagram of an intelligent device in an embodiment of the present invention, illustrated with a robot. The robot shown in fig. 9 includes a depth camera 601, a robot body 602 and a display screen 603. The depth camera 601 is the face recognition depth camera described in any of the foregoing embodiments; it captures the scene in front of the robot so that three-dimensional information can be acquired. The robot body 602 may provide different functions for different robot types: a meal-delivery robot may have a tray, brackets and driving wheels, while a greeting robot may have driving wheels, a manikin and so on. The display screen 603 is used to display information for interaction with the user and may be one-way or two-way. Because a highly integrated, automatically switching depth camera is adopted, the robot can accommodate more sensors in the same size and thus perceive its environment better; alternatively, the robot can be made smaller to suit scenarios where size matters. The automatically switching depth camera can also use the monitored range to set different trigger distances, so that it starts automatically at short range and shuts down at long range, which saves energy.
For the usage scenario of frequent daily activation with a relatively fixed set of users, the invention extracts features from at least two sub-regions and processes them according to the occlusion condition of the face, so that a user can be recognized by the depth camera in a variety of states.
In this specification the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts reference may be made between the embodiments. The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.
Claims (10)
1. A face recognition depth camera, comprising:
an emitter for projecting structured light spots or flood light onto a target face;
a receiver for receiving the structured-light or flood-light reflection signal to generate a first depth image;
an RGB camera for acquiring a first RGB image of the target face;
a processor for aligning the first depth image with the first RGB image, acquiring a face region in each image, cropping at least two sub-regions within each face region, performing recognition on the sub-regions as a whole, and selecting different sub-regions according to the occlusion condition to complete face recognition; wherein each sub-region comprises at least one of the left eye, right eye, nose, and mouth.
2. A face recognition depth camera, comprising:
an infrared camera for acquiring a first infrared image of a target face;
an RGB camera for acquiring a first RGB image of the target face;
a processor for obtaining a first depth image from the first infrared image and the first RGB image, aligning the first depth image with the first RGB image, acquiring a face region in each image, cropping at least two sub-regions within each face region, performing recognition on the sub-regions as a whole, and selecting different sub-regions according to the occlusion condition to complete face recognition; wherein each sub-region comprises at least one of the left eye, right eye, nose, and mouth.
3. The face recognition depth camera of claim 1 or 2, wherein the processor comprises:
a cropping module for acquiring face regions on the first RGB image and the first depth image respectively and cropping at least two sub-regions within each face region; wherein each sub-region comprises at least one of the left eye, right eye, nose, and mouth, and the first RGB image and the first depth image are aligned at the pixel level;
an input module for inputting the sub-regions into their respective sub-networks and feeding the data of all or some of the sub-networks into a sub-network fusion layer according to the occlusion condition of the sub-regions;
and a recognition module for selecting, from the recognition result of the sub-network fusion layer, the person corresponding to the face whose similarity is above a threshold and is the highest, thereby completing face recognition.
4. The face recognition depth camera according to claim 3, wherein the cropping module comprises:
a detection unit for detecting the face position on the first RGB image and obtaining first face key points;
a normalization unit for normalizing the first RGB image and the first depth image simultaneously based on the face key points to obtain a second RGB image and a second depth image respectively;
an extraction unit for extracting second face key points on the second RGB image and obtaining the corresponding sub-regions;
and a correspondence unit for mapping the second face key points onto the second depth image to obtain third face key points of the second depth image.
5. The face recognition depth camera according to claim 3, wherein the number of sub-networks is equal to the number of sub-regions, with a one-to-one correspondence between them.
6. The face recognition depth camera according to claim 3, wherein each sub-network comprises a convolution layer, a batch normalization layer, a nonlinear activation layer, a pooling layer and a superposition layer.
7. The face recognition depth camera according to claim 3, wherein the plurality of sub-regions do not overlap one another.
8. The face recognition depth camera according to claim 3, wherein the second model and the third model are trained jointly.
9. The face recognition depth camera according to claim 3, wherein the sub-network fusion layer selects different models for processing according to the different sub-network inputs.
10. An intelligent device, characterized by comprising a face recognition depth camera as claimed in claim 1 or 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310042283.5A CN116311419A (en) | 2023-01-28 | 2023-01-28 | Face recognition depth camera and intelligent device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310042283.5A CN116311419A (en) | 2023-01-28 | 2023-01-28 | Face recognition depth camera and intelligent device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116311419A true CN116311419A (en) | 2023-06-23 |
Family
ID=86778743
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310042283.5A Pending CN116311419A (en) | 2023-01-28 | 2023-01-28 | Face recognition depth camera and intelligent device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116311419A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117746486A (en) * | 2023-12-29 | 2024-03-22 | 广州图语信息科技有限公司 | Face recognition method based on non-coding stripe structured light |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117746486A (en) * | 2023-12-29 | 2024-03-22 | 广州图语信息科技有限公司 | Face recognition method based on non-coding stripe structured light |
CN117746486B (en) * | 2023-12-29 | 2024-05-24 | 广州图语信息科技有限公司 | Face recognition method based on non-coding stripe structured light |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||