CN111967315A - Human body comprehensive information acquisition method based on face recognition and infrared detection - Google Patents

Human body comprehensive information acquisition method based on face recognition and infrared detection

Info

Publication number
CN111967315A
Authority
CN
China
Prior art keywords
layer
face
convolution
basic
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010660673.5A
Other languages
Chinese (zh)
Other versions
CN111967315B (en)
Inventor
谢巍
卢永辉
吴伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010660673.5A
Publication of CN111967315A
Application granted
Publication of CN111967315B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/162 Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body comprehensive information acquisition method based on face recognition and infrared detection. The method first acquires an RGB image and an infrared image of the detected persons through detection equipment; it then performs face detection on the RGB image to obtain all face position information in the RGB image. Based on the face position information, face recognition is performed on the face image at each face position in the RGB image to obtain a face recognition result at each face position, and temperature detection is performed at each face position in the infrared image to obtain a temperature detection result at each face position. Finally, the face recognition result and the temperature detection result at the same face position are combined into the comprehensive human body information. The invention completes face recognition and temperature detection of the detected persons simultaneously, making comprehensive human body information monitoring highly efficient.

Description

Human body comprehensive information acquisition method based on face recognition and infrared detection
Technical Field
The invention relates to the fields of digital image processing, pattern recognition and computer vision, and in particular to a human body comprehensive information acquisition method based on face recognition and infrared detection.
Background
In nature there exist various viruses, which are the root cause of the epidemic diseases that recur in human society; at best they disrupt daily life, and at worst they threaten human lives, thereby hindering social development. Because viruses are difficult to eliminate completely, health monitoring of the human body in daily life is very important. One of the main symptoms of epidemic diseases is fever. At present, human body temperature is usually measured manually, and abnormal temperatures are registered so that the person's physical condition can be tracked or further measures taken. The main tool for manual body-temperature measurement is the infrared camera. Measurement remains manual because an infrared camera cannot obtain the face position information in the current infrared image, while an RGB camera cannot obtain the temperature data of the detected object, so face recognition and temperature detection cannot be completed simultaneously for the detected person.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a human body comprehensive information acquisition method based on face recognition and infrared detection.
The purpose of the invention is realized by the following technical scheme: a human body comprehensive information acquisition method based on face recognition and infrared detection comprises the following steps:
S1, acquiring an RGB image and an infrared image of the detected person through detection equipment;
S2, carrying out face detection on the RGB image to acquire all face position information in the RGB image;
S3, based on the face position information, carrying out face recognition on the face image at each face position in the RGB image to obtain a face recognition result at each face position;
carrying out temperature detection on each face position on the infrared image to obtain a temperature detection result on each face position;
and S4, combining the face recognition result and the temperature detection result at the same face position to obtain the comprehensive human body information of each detected person.
Preferably, the RGB image collected in step S1 is a 3-channel color image and the infrared image is a single-channel grayscale image; the RGB image and the infrared image have the same width and height, and their pixel positions correspond one to one.
Preferably, in step S2 the face detection is performed by a first convolutional neural network, whose input is the RGB image and whose output is the position information of all faces in the current image. The position information of each face is represented as (x, y, w, h), where (x, y) are the coordinates of the face centre in the RGB image with the upper left corner of the RGB image as the origin, w is the width of the face region, and h is the height of the face region.
Further, the basic structural units of the first convolutional neural network are: the basic convolution layer, the residual module, the down-sampling layer, the RFB-c module, the up-sampling layer and the cascade structure;
the first convolutional neural network comprises a backbone network and 3 detection branch networks connected to the backbone network, wherein the backbone network comprises, connected in sequence: a basic convolution layer, a down-sampling layer, a residual module, a down-sampling layer, 2 cascaded residual modules, an RFB-c module, a down-sampling layer and a residual module;
the first detection branch network branches off from the first RFB-c module in the backbone network, the second detection branch network branches off from the second RFB-c module, and the third detection branch network branches off from the last layer of the backbone network, namely a residual module; each detection branch network contains only basic convolution layers. In every two adjacent detection branch networks, after both branches pass through their first basic convolution layer, the deeper branch additionally passes through a basic convolution layer and an up-sampling layer and is then merged into the shallower branch through a cascade structure.
Furthermore, in the first convolutional neural network adopted in step S2, the last basic convolution layer of each detection branch network contains only a convolution layer, without a batch normalization layer or a ReLU activation function; every other basic convolution layer in the first convolutional neural network comprises three parts connected in sequence: a convolution layer, a batch normalization layer and a ReLU activation function;
the residual module comprises 2 branches: one branch comprises, connected in sequence, a basic convolution layer with a 1x1 convolution kernel and a depthwise separable convolution layer with a 3x3 convolution kernel, while the other branch is a direct connection; the outputs of the two branches are merged through a cascade structure to give the output of the residual module;
the down-sampling layer is a basic convolution layer whose convolution stride is 2;
the up-sampling layer uses nearest-neighbour interpolation;
the cascade structure concatenates its inputs along the channel dimension.
Further, in the first convolutional neural network adopted in step S2, the RFB-c module comprises 5 branches:
the first branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 1;
the second branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 1×3 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 3;
the third branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 3×1 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 3;
the fourth branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 1×3 convolution kernel, a basic convolution layer with a 3×1 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 5;
the fifth branch is a direct connection;
the outputs of the first, second, third and fourth branches are stacked and passed through a 1×1 convolution layer, then merged with the fifth branch through a cascade structure, and finally passed through a ReLU activation function to give the output of the RFB-c module.
Preferably, in step S3 the face recognition is performed by a second convolutional neural network, and the recognition process is as follows:
the second convolutional neural network encodes the current face image in the RGB image and outputs an N-dimensional feature vector [p_0, p_1, …, p_{N-1}], i.e. maps the face into a Euclidean space; matching is then performed against a face library: for each entry in the face library, the second-order norm of the difference X between the stored feature vector and the N-dimensional feature vector [p_0, p_1, …, p_{N-1}] is used as the similarity measure, and the entry in the face library with the smallest similarity measure to the currently detected face image is the face recognition result of the currently detected face;
wherein the second-order norm \|X\|_2 is given by
\|X\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
where x_i denotes the i-th element of the vector X and n is the total number of elements in X.
Further, the basic structural units of the second convolutional neural network include: the basic convolution layer, the residual module, the down-sampling layer and the global average pooling layer;
the backbone network of the second convolutional neural network comprises, connected in sequence: a basic convolution layer, a down-sampling layer, a residual module, a down-sampling layer, 2 cascaded residual modules, a down-sampling layer and a global average pooling layer; the face recognition result of the currently detected face in the RGB image is output by the global average pooling layer.
Further, in the second convolutional neural network adopted in step S3, the basic convolution layer comprises three parts connected in sequence: a convolution layer, a batch normalization layer and a ReLU activation function;
the residual module comprises 2 branches: one branch comprises, connected in sequence, a basic convolution layer with a 1x1 convolution kernel and a depthwise separable convolution layer with a 3x3 convolution kernel, while the other branch is a direct connection; the outputs of the two branches are merged through a cascade structure to give the output of the residual module; the cascade structure concatenates its inputs along the channel dimension;
the down-sampling layer is a basic convolution layer whose convolution stride is 2;
the global average pooling layer sums and averages all elements on each input channel and takes the result as the value of that channel; the number of output channels of the global average pooling layer equals the number of input channels.
Preferably, the temperature detection process in step S3 is as follows:
based on the face position information, the corresponding infrared face region is located in the infrared image;
the region is then divided into P × Q grid cells; all temperatures within each cell are summed and the sum is taken as that cell's temperature value; the temperature values of all cells are traversed, and the highest value is taken as the temperature detection result for the currently detected person.
Compared with the prior art, the invention has the following advantages and effects:
(1) The human body comprehensive information acquisition method based on face recognition and infrared detection first acquires an RGB image and an infrared image of the detected persons through detection equipment; it then performs face detection on the RGB image to obtain all face position information in the RGB image. Based on the face position information, face recognition is performed on the face image at each face position in the RGB image to obtain a face recognition result at each face position, and temperature detection is performed at each face position in the infrared image to obtain a temperature detection result at each face position. Finally, the face recognition result and the temperature detection result at the same face position are combined into the comprehensive human body information of the detected person. The method therefore completes face recognition and temperature detection of the detected persons simultaneously; it makes comprehensive human body information monitoring convenient and efficient, and can further provide effective technical support for health monitoring and the early warning of epidemic diseases.
(2) The method incorporates artificial intelligence techniques: the first convolutional neural network performs face detection on the RGB image and quickly yields accurate face position information; the second convolutional neural network performs face recognition on the face image at each face position in the RGB image, so a face recognition result is obtained quickly and reliably; finally, the face position information puts the face recognition results and the temperature detection results in one-to-one correspondence, so the temperature detection result of each detected person can be tracked unambiguously.
(3) In the method, because the detection task and the recognition task differ greatly, it is inconvenient to complete both in the same network; separating them into two convolutional neural networks ensures the accuracy of detection and recognition simultaneously.
(4) In the method, the whole face library is traversed using the similarity measure, and the entry in the face library with the smallest similarity measure to the currently detected face image is taken as the face recognition result at the current face position, which gives high recognition accuracy and greatly improved efficiency.
Drawings
Fig. 1 is a flow chart of the human body comprehensive information acquisition method based on face recognition and infrared detection.
Fig. 2 is a schematic structural diagram of the first convolutional neural network employed in the method of Fig. 1.
Fig. 3 is a schematic structural diagram of the residual module in the first convolutional neural network of Fig. 2.
Fig. 4 is a schematic structural diagram of the RFB-c module in the first convolutional neural network of Fig. 2.
Fig. 5 is a schematic diagram of the second convolutional neural network employed in the method of Fig. 1.
Fig. 6 is a flow chart of temperature detection in the method of Fig. 1.
Fig. 7 is a schematic diagram of grid division in temperature detection.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The embodiment discloses a human body comprehensive information acquisition method based on face recognition and infrared detection, as shown in fig. 1, comprising the following steps:
and S1, acquiring the RGB image and the infrared image of the detected person through the detection equipment.
Here, the detection equipment has an infrared camera that acquires the infrared image and an RGB camera that acquires the RGB image, and both images of the detected person can be acquired simultaneously while the person stands under the detection equipment. The RGB image is a 3-channel color image and the infrared image is a single-channel grayscale image; the two images have the same width and height, and their pixel positions correspond one to one.
And S2, carrying out face detection on the RGB image, and acquiring all face position information in the RGB image.
In this embodiment, face detection is performed by a first convolutional neural network, whose input is the RGB image and whose output is the position information of all faces in the current RGB image. The position information of each face is represented as (x, y, w, h), where (x, y) are the coordinates of the face centre in the RGB image with the upper left corner of the RGB image as the origin, w is the width of the face region, and h is the height of the face region.
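For illustration, the following is a minimal Python sketch (the helper name and the NumPy representation are assumptions, not part of the patent) of how such a centre-format box can be turned into a crop of the face region:

```python
import numpy as np

def crop_face(rgb_image: np.ndarray, box) -> np.ndarray:
    """rgb_image: H x W x 3 array; box: (x, y, w, h) with (x, y) the face
    centre measured from the top-left corner of the image."""
    x, y, w, h = box
    height, width = rgb_image.shape[:2]
    left = max(int(round(x - w / 2)), 0)
    top = max(int(round(y - h / 2)), 0)
    right = min(int(round(x + w / 2)), width)
    bottom = min(int(round(y + h / 2)), height)
    return rgb_image[top:bottom, left:right]
```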
Fig. 2 shows the network structure of the first convolutional neural network, whose input size is 512 × 512 × 3. Its basic structural units are: the basic convolution layer (Basic conv), the residual module (Res), the down-sampling layer (Down sample), the RFB-c module, the up-sampling layer (Up sample) and the cascade structure. These basic units form the backbone network of the first convolutional neural network and the 3 detection branch networks connected to the backbone network.
Specifically, the backbone network comprises, connected in sequence: a basic convolution layer, a down-sampling layer, a residual module, a down-sampling layer, 2 cascaded residual modules, an RFB-c module, a down-sampling layer and a residual module;
the first detection branch network branches off from the first RFB-c module in the backbone network, the second detection branch network branches off from the second RFB-c module, and the third detection branch network branches off from the last layer of the backbone network, namely a residual module; each detection branch network contains only basic convolution layers, and the convolution kernel sizes used by the convolution layers in the detection branch networks are only 1x1 and 3x3.
In every two adjacent detection branch networks, after both branches pass through their first basic convolution layer, the deeper branch additionally passes through a basic convolution layer and an up-sampling layer and is then merged into the shallower branch through a cascade structure. That is, the output of the first basic convolution layer of the third detection branch network passes in sequence through a basic convolution layer and an up-sampling layer and is merged, through a cascade structure, with the output of the first basic convolution layer of the second detection branch network; the output of the first basic convolution layer of the second detection branch network further passes in sequence through a basic convolution layer and an up-sampling layer and is merged, through a cascade structure, with the output of the first basic convolution layer of the first detection branch network.
From shallow to deep, the feature_map sizes output by the 3 detection branch networks are 1/8, 1/16 and 1/32 of the input size respectively. Denote by map_size the feature_map size output by the current detection branch network, by batch_size the number of images fed into the network at a time, and by num_class the number of detected categories; then the shape of each detection branch network's output array is (batch_size, map_size, map_size, 3 × (5 + num_class)), where 5 covers the predicted position x, y, w, h plus a probability that the current prediction is not background.
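As a quick check on these shapes, a minimal Python sketch (the batch_size and num_class values are illustrative, not fixed by the patent):

```python
import numpy as np

batch_size, num_class = 4, 1      # illustrative values
input_size = 512                  # matches the 512 x 512 x 3 network input
for stride in (8, 16, 32):        # the three branches: 1/8, 1/16, 1/32
    map_size = input_size // stride
    out = np.zeros((batch_size, map_size, map_size, 3 * (5 + num_class)))
    # last axis per anchor: x, y, w, h, not-background probability, class scores
    print(stride, out.shape)      # e.g. 8 (4, 64, 64, 18)
```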
The last basic convolution layer of each detection branch network comprises only two parts connected in sequence: a convolution layer and a batch normalization layer, without a ReLU activation function; every other basic convolution layer in the first convolutional neural network comprises three parts connected in sequence: a convolution layer, a batch normalization layer and a ReLU activation function.
As shown in Fig. 3, the residual module comprises 2 branches. One branch comprises, connected in sequence, a basic convolution layer with a 1 × 1 convolution kernel (this basic convolution layer has a ReLU activation function) and a depthwise separable convolution layer with a 3 × 3 convolution kernel (depthwise conv, also with a ReLU activation function); the other branch is a direct connection, i.e. the input of the residual module is fed directly to the cascade structure while also passing through the basic convolution layer and the depthwise separable convolution layer to the cascade structure. The outputs of the two branches are merged through the cascade structure to give the output of the residual module.
The down-sampling layer is a basic convolution layer whose convolution stride is 2.
The up-sampling layer resizes the feature map using nearest-neighbour interpolation, as used for image resizing in digital image processing.
The cascade structure concatenates its inputs along the channel dimension.
As shown in Fig. 4, the RFB-c module comprises 5 branches:
the first branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 1;
the second branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 1×3 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 3;
the third branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 3×1 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 3;
the fourth branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 1×3 convolution kernel, a basic convolution layer with a 3×1 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 5;
the fifth branch is a direct connection;
the outputs of the first, second, third and fourth branches are stacked and passed through a 1×1 convolution layer, then merged with the fifth branch through a cascade structure, and finally passed through a ReLU activation function to give the output of the RFB-c module.
S3, based on the face position information, carrying out face recognition on the face image at each face position in the RGB image to obtain a face recognition result at each face position;
and carrying out temperature detection on each face position on the infrared image to obtain a temperature detection result on each face position.
The face recognition is carried out by the second convolutional neural network, and the recognition process is as follows:
the second convolutional neural network encodes the current face image in the RGB image and outputs an N-dimensional feature vector [p_0, p_1, …, p_{N-1}], i.e. maps the face into a Euclidean space; matching is then performed against a face library: for each entry in the face library, the second-order norm of the difference X between the stored feature vector and the N-dimensional feature vector [p_0, p_1, …, p_{N-1}] is used as the similarity measure, and the entry in the face library with the smallest similarity measure to the currently detected face image is the face recognition result at the current face position. In this embodiment N = 128, i.e. a 128-dimensional feature vector [p_0, p_1, …, p_127] is output.
The second-order norm \|X\|_2 is given by
\|X\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
where x_i denotes the i-th element of the vector X and n is the total number of elements in X.
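A minimal NumPy sketch of this matching step follows; the face_library mapping and the identity names are illustrative stand-ins for the patent's face library:

```python
import numpy as np

def match_face(query: np.ndarray, face_library: dict):
    """query: N-dimensional embedding of the detected face; face_library:
    mapping from identity to its stored embedding. Returns the identity whose
    embedding has the smallest second-order norm of the difference."""
    best_name, best_dist = None, float("inf")
    for name, stored in face_library.items():
        diff = stored - query                 # the difference vector X
        dist = np.sqrt(np.sum(diff ** 2))     # ||X||_2
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist

# Illustrative usage with random 128-dimensional embeddings:
rng = np.random.default_rng(0)
library = {"person_a": rng.normal(size=128), "person_b": rng.normal(size=128)}
print(match_face(library["person_a"] + 0.01, library))  # ('person_a', ...)
```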
As shown in Fig. 5, the second convolutional neural network has an input size of 224 × 224 × 3, and its basic structural units include: the basic convolution layer, the residual module, the down-sampling layer and the global average pooling layer (Global average pooling).
The backbone network of the second convolutional neural network comprises, connected in sequence: a basic convolution layer, a down-sampling layer, a residual module, a down-sampling layer, 2 cascaded residual modules, a down-sampling layer and a global average pooling layer; the face recognition result for the currently detected face in the RGB image is output by the global average pooling layer.
The structures of the basic convolution layer, the residual module and the down-sampling layer are the same as in the first convolutional neural network, namely:
the basic convolution layer comprises three parts connected in sequence: a convolution layer, a batch normalization layer and a ReLU activation function;
the residual module comprises 2 branches: one branch comprises, connected in sequence, a basic convolution layer with a 1x1 convolution kernel and a depthwise separable convolution layer with a 3x3 convolution kernel, while the other branch is a direct connection; the outputs of the two branches are merged through a cascade structure to give the output of the residual module; the cascade structure concatenates its inputs along the channel dimension;
the down-sampling layer is a basic convolution layer whose convolution stride is 2.
The global average pooling layer sums and averages all elements on each input channel and takes the result as the value of that channel; the number of output channels of the global average pooling layer equals the number of input channels.
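Putting these pieces together, the following self-contained PyTorch sketch of the recognition backbone is one plausible reading; the channel widths are assumptions (the patent does not give them), chosen so that the channel doubling at each concatenation yields the 128-dimensional embedding used in this embodiment:

```python
import torch
import torch.nn as nn

def basic_conv(c_in, c_out, k=3, stride=1):
    """Basic convolution layer: convolution -> batch normalization -> ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

class Residual(nn.Module):
    """1x1 basic conv -> 3x3 depthwise conv, concatenated with the identity."""
    def __init__(self, c):
        super().__init__()
        self.branch = nn.Sequential(
            basic_conv(c, c, k=1),
            nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True))

    def forward(self, x):
        return torch.cat([x, self.branch(x)], dim=1)  # doubles the channels

class FaceEncoder(nn.Module):
    """Backbone of Fig. 5: basic conv, down-sample, residual, down-sample,
    2 cascaded residuals, down-sample, global average pooling."""
    def __init__(self, width=16):
        super().__init__()
        self.features = nn.Sequential(
            basic_conv(3, width),
            basic_conv(width, width, stride=2),            # down-sampling
            Residual(width),                               # -> 2 * width
            basic_conv(2 * width, 2 * width, stride=2),
            Residual(2 * width),                           # -> 4 * width
            Residual(4 * width),                           # -> 8 * width
            basic_conv(8 * width, 8 * width, stride=2))
        self.gap = nn.AdaptiveAvgPool2d(1)                 # global avg pooling

    def forward(self, x):                  # x: (B, 3, 224, 224)
        return self.gap(self.features(x)).flatten(1)  # (B, 128) if width=16

emb = FaceEncoder()(torch.randn(2, 3, 224, 224))
print(emb.shape)  # torch.Size([2, 128])
```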
As shown in Fig. 6, the temperature detection process is as follows:
based on the face position information, the corresponding infrared face region is located in the infrared image;
the region is then divided into P × Q grid cells; all temperatures within each cell are summed and the sum is taken as that cell's temperature value; the temperature values of all cells are traversed, and the highest value is taken as the temperature detection result for the currently detected person. To obtain a good detection effect, the ratio of P to Q should be as close as possible to the aspect ratio of the target region. In this embodiment the region is divided into 8 × 8 cells, as shown in Fig. 7.
And S4, combining the face recognition result and the temperature detection result at the same face position to obtain the comprehensive human body information of each detected person.
Based on the collected comprehensive human body information, and combined with the health criteria provided by hospitals for abnormal temperatures, health monitoring of the detected persons can be completed; this helps to judge whether relevant preventive measures need to be taken, and further provides effective technical support for the early warning of epidemic diseases.
The techniques described herein may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the processing modules may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, micro-controllers, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, flows, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The invention has been described above in connection with the accompanying drawings, but the implementation of the invention is not limited to the above manner. Various insubstantial modifications made by adopting the method concept and technical scheme of the invention, as well as direct applications of the concept and scheme to other occasions without modification, all fall within the protection scope of the invention.

Claims (10)

1. A human body comprehensive information acquisition method based on face recognition and infrared detection is characterized by comprising the following steps:
S1, acquiring an RGB image and an infrared image of the detected person through detection equipment;
S2, carrying out face detection on the RGB image to acquire all face position information in the RGB image;
S3, based on the face position information, carrying out face recognition on the face image at each face position in the RGB image to obtain a face recognition result at each face position;
carrying out temperature detection on each face position on the infrared image to obtain a temperature detection result on each face position;
and S4, combining the face recognition result and the temperature detection result at the same face position to obtain the comprehensive human body information of each detected person.
2. The method for acquiring integrated human body information based on face recognition and infrared detection as claimed in claim 1, wherein the RGB image collected in step S1 is a 3-channel color image and the infrared image is a single-channel grayscale image; the RGB image and the infrared image have the same width and height, and their pixel positions correspond one to one.
3. The method for acquiring integrated human body information based on face recognition and infrared detection as claimed in claim 1, wherein in step S2 the face detection is performed by a first convolutional neural network, whose input is the RGB image and whose output is the position information of all faces in the current image; the position information of each face is represented as (x, y, w, h), where (x, y) are the coordinates of the face centre in the RGB image with the upper left corner of the RGB image as the origin, w is the width of the face region, and h is the height of the face region.
4. The human body comprehensive information acquisition method based on face recognition and infrared detection as claimed in claim 3, wherein the basic structural units of the first convolutional neural network are: the basic convolution layer, the residual module, the down-sampling layer, the RFB-c module, the up-sampling layer and the cascade structure;
the first convolutional neural network comprises a backbone network and 3 detection branch networks connected to the backbone network, wherein the backbone network comprises, connected in sequence: a basic convolution layer, a down-sampling layer, a residual module, a down-sampling layer, 2 cascaded residual modules, an RFB-c module, a down-sampling layer and a residual module;
the first detection branch network branches off from the first RFB-c module in the backbone network, the second detection branch network branches off from the second RFB-c module, and the third detection branch network branches off from the last layer of the backbone network, namely a residual module; each detection branch network contains only basic convolution layers; in every two adjacent detection branch networks, after both branches pass through their first basic convolution layer, the deeper branch additionally passes through a basic convolution layer and an up-sampling layer and is then merged into the shallower branch through a cascade structure.
5. The human body comprehensive information acquisition method based on face recognition and infrared detection as claimed in claim 4, wherein in the first convolutional neural network adopted in step S2, the last basic convolution layer of each detection branch network contains only a convolution layer, without a batch normalization layer or a ReLU activation function; every other basic convolution layer in the first convolutional neural network comprises three parts connected in sequence: a convolution layer, a batch normalization layer and a ReLU activation function;
the residual module comprises 2 branches: one branch comprises, connected in sequence, a basic convolution layer with a 1x1 convolution kernel and a depthwise separable convolution layer with a 3x3 convolution kernel, while the other branch is a direct connection; the outputs of the two branches are merged through a cascade structure to give the output of the residual module;
the down-sampling layer is a basic convolution layer whose convolution stride is 2;
the up-sampling layer uses nearest-neighbour interpolation;
the cascade structure concatenates its inputs along the channel dimension.
6. The method for acquiring integrated human body information based on face recognition and infrared detection as claimed in claim 4, wherein in the first convolutional neural network adopted in step S2, the RFB-c module comprises 5 branches:
the first branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 1;
the second branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 1×3 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 3;
the third branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 3×1 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 3;
the fourth branch comprises, connected in sequence, a basic convolution layer with a 1×1 convolution kernel, a basic convolution layer with a 1×3 convolution kernel, a basic convolution layer with a 3×1 convolution kernel, and a dilated convolution layer with a 3×3 convolution kernel and a dilation rate of 5;
the fifth branch is a direct connection;
the outputs of the first, second, third and fourth branches are stacked and passed through a 1×1 convolution layer, then merged with the fifth branch through a cascade structure, and finally passed through a ReLU activation function to give the output of the RFB-c module.
7. The method for acquiring integrated human body information based on face recognition and infrared detection as claimed in claim 1, wherein in step S3 the face recognition is performed by a second convolutional neural network, and the recognition process is as follows:
the second convolutional neural network encodes the current face image in the RGB image and outputs an N-dimensional feature vector [p_0, p_1, …, p_{N-1}], i.e. maps the face into a Euclidean space; matching is then performed against a face library: for each entry in the face library, the second-order norm of the difference X between the stored feature vector and the N-dimensional feature vector [p_0, p_1, …, p_{N-1}] is used as the similarity measure, and the entry in the face library with the smallest similarity measure to the currently detected face image is the face recognition result at the current face position;
wherein the second-order norm \|X\|_2 is given by
\|X\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
wherein x_i denotes the i-th element of the vector X and n is the total number of elements in X.
8. The human body comprehensive information acquisition method based on face recognition and infrared detection as claimed in claim 7, wherein the basic structural units of the second convolutional neural network include: the basic convolution layer, the residual module, the down-sampling layer and the global average pooling layer;
the backbone network of the second convolutional neural network comprises, connected in sequence: a basic convolution layer, a down-sampling layer, a residual module, a down-sampling layer, 2 cascaded residual modules, a down-sampling layer and a global average pooling layer; the face recognition result of the currently detected face in the RGB image is output by the global average pooling layer.
9. The method for acquiring integrated human body information based on face recognition and infrared detection according to claim 8, wherein in the second convolutional neural network adopted in step S3, the basic convolution layer comprises three parts connected in sequence: a convolution layer, a batch normalization layer and a ReLU activation function;
the residual module comprises 2 branches: one branch comprises, connected in sequence, a basic convolution layer with a 1x1 convolution kernel and a depthwise separable convolution layer with a 3x3 convolution kernel, while the other branch is a direct connection; the outputs of the two branches are merged through a cascade structure to give the output of the residual module; the cascade structure concatenates its inputs along the channel dimension;
the down-sampling layer is a basic convolution layer whose convolution stride is 2;
the global average pooling layer sums and averages all elements on each input channel and takes the result as the value of that channel; the number of output channels of the global average pooling layer equals the number of input channels.
10. The method for acquiring integrated human body information based on face recognition and infrared detection as claimed in claim 1, wherein the temperature detection process in step S3 is as follows:
based on the face position information, the corresponding infrared face region is located in the infrared image;
the region is then divided into P × Q grid cells; all temperatures within each cell are summed and the sum is taken as that cell's temperature value; the temperature values of all cells are traversed, and the highest value is taken as the temperature detection result for the currently detected person.
CN202010660673.5A 2020-07-10 2020-07-10 Human body comprehensive information acquisition method based on face recognition and infrared detection Active CN111967315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010660673.5A CN111967315B (en) 2020-07-10 2020-07-10 Human body comprehensive information acquisition method based on face recognition and infrared detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010660673.5A CN111967315B (en) 2020-07-10 2020-07-10 Human body comprehensive information acquisition method based on face recognition and infrared detection

Publications (2)

Publication Number Publication Date
CN111967315A (en) 2020-11-20
CN111967315B (en) 2023-08-22

Family

ID=73361643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010660673.5A Active CN111967315B (en) 2020-07-10 2020-07-10 Human body comprehensive information acquisition method based on face recognition and infrared detection

Country Status (1)

Country Link
CN (1) CN111967315B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018120740A1 (en) * 2016-12-29 2018-07-05 深圳光启合众科技有限公司 Picture classification method, device and robot
CN109472247A (en) * 2018-11-16 2019-03-15 西安电子科技大学 Face identification method based on the non-formula of deep learning
CN110110650A (en) * 2019-05-02 2019-08-09 西安电子科技大学 Face identification method in pedestrian

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
范雪, 杨鸿波, 李永: "Face image rectification algorithm based on deep learning" (基于深度学习的人脸图像扭正算法), 信息通信 (Information & Communication), no. 07, pages 11-15 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232309A (en) * 2020-12-08 2021-01-15 飞础科智慧科技(上海)有限公司 Method, electronic device and storage medium for thermographic face recognition
CN112232309B (en) * 2020-12-08 2021-03-09 飞础科智慧科技(上海)有限公司 Method, electronic device and storage medium for thermographic face recognition
CN113850243A (en) * 2021-11-29 2021-12-28 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Also Published As

Publication number Publication date
CN111967315B (en) 2023-08-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant