CN111209822A - Face detection method of thermal infrared image - Google Patents


Info

Publication number
CN111209822A
CN111209822A
Authority
CN
China
Prior art keywords
frame
thermal infrared
prediction
neural network
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911394420.1A
Other languages
Chinese (zh)
Inventor
张天序
郭诗嘉
李正涛
苏轩
郭婷
Current Assignee
Nanjing Huatu Information Technology Co ltd
Original Assignee
Nanjing Huatu Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Huatu Information Technology Co ltd filed Critical Nanjing Huatu Information Technology Co ltd
Priority to CN201911394420.1A
Publication of CN111209822A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face detection method for thermal infrared images, comprising the following steps: (1) acquiring positive samples, negative samples and a test set for training, and framing a face box as a calibration frame on each thermal infrared image of the positive samples; (2) generating training labels; (3) building a convolutional neural network, inputting the training set and the training labels into it for training, and optimizing it with a loss function to obtain the required trained model; (4) inputting a thermal infrared image from the test set and obtaining a face detection frame through the convolutional neural network. By training a convolutional neural network on thermal infrared images, the invention obtains a network that meets the requirements, achieving automatic detection in thermal infrared imagery, accurately framing the face region and reducing the detection error rate.

Description

Face detection method of thermal infrared image
Technical Field
The invention belongs to the technical field of biological feature recognition, and particularly relates to a face detection method.
Background
Face detection obtains the specific positions of all faces in a picture, usually represented by rectangular frames: the object inside a rectangular frame is a face, and the part outside it is background.
Visible-light face detection is widely applied at customs, in stations, and in attendance checking, automatic driving, suspect tracking and other fields. However, visible-light face detection cannot work without an external light source and cannot detect a face wearing a mask. Visible light also cannot be used for liveness detection: since the imaging does not establish that the subject is a real person, such methods are easily deceived by photographs and made-up faces, so the detection results are inaccurate and of limited use.
A thermal infrared image is formed by thermal radiation: based on differences in the infrared radiation of objects, an infrared thermal imager converts the infrared radiation naturally emitted from object surfaces into a visible image. Because different objects, or different parts of the same object, usually differ in thermal radiation characteristics such as temperature and emissivity, objects are distinguished in a thermal infrared image by their differing thermal radiation. Thermal infrared imaging can therefore easily solve liveness detection: the face is a high-temperature object compared with most other objects and appears white in the grayscale image, and because the capillary distribution differs across the facial organs, their thermal radiation differs and the facial features can be made out.
Active near-infrared face recognition is currently on the rise, but this technology requires an active light source and is limited to a distance of 50-100 cm. The active light source also produces obvious reflections on glasses, reducing eye-localization accuracy, and it degrades and attenuates after long use. At present there is no face detection method for thermal infrared images in China.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a face detection method for thermal infrared images, which can clearly frame the face position in the thermal infrared images without any light source, so as to meet the detection requirements for the thermal infrared images.
In order to achieve the above object, according to one aspect of the present invention, there is provided a face detection method for thermal infrared images, comprising the following steps:
(1) taking N thermal infrared images as positive samples and L thermal infrared images that show no face as negative samples to form a training set, and obtaining M thermal infrared images as a test set; framing a face box on each thermal infrared image of the positive samples as a calibration frame; the mark of each thermal infrared image in the positive samples is 1, and the mark of each thermal infrared image in the negative samples is 0;
(2) scaling down the coordinates of the center point of the calibration frame and its width and height values for each thermal infrared image, and storing the scaled center coordinates, the scaled width and height, and the mark of the image together in a separate txt file, giving N txt files in total;
in addition, storing the path of each thermal infrared image in the training set and the marks of all thermal infrared images in the negative samples in another txt file;
in this way, N+1 txt files are obtained as training labels;
(3) building a convolutional neural network, inputting the training set and training labels into it for training, and optimizing it with a loss function to obtain the required trained model of the convolutional neural network;
(4) inputting a thermal infrared image from the test set and obtaining a face detection frame through the convolutional neural network.
Preferably, in step (1), the thermal infrared images are collected with a thermal infrared imager under the following conditions: for each person, videos of the face are recorded with a medium-wave thermal infrared imager at several distances for several set durations; the videos are cut every set number of frames, a set number of photos is selected, and N thermal infrared images are then chosen as the training set.
Preferably, the training labels generated in step (2) are specifically as follows:
(2.1) storing the relative coordinates of the center point of the calibration frame:
centre_x = (x1 + x2) / (2w)
centre_y = (y1 + y2) / (2h)

wherein (x1, y1) and (x2, y2) are the coordinates of two diagonally opposite corners of the calibration frame, which together determine it; x1 and x2 are width (x) coordinates and y1 and y2 are height (y) coordinates in the x-y image coordinate system, with x1 > x2 and y1 > y2; centre_x and centre_y are the width and height coordinates of the center point of the calibration frame in the x-y image coordinate system; w is the width and h the height of the thermal infrared image containing the calibration frame;
(2.2) storing the size of the calibration frame relative to the thermal infrared image containing it:

frame_x = (x1 − x2) / w
frame_y = (y1 − y2) / h

wherein frame_x is the relative width and frame_y the relative height of the calibration frame.
The centre_x, centre_y, frame_x and frame_y values are stored in the same txt file as the mark of the corresponding thermal infrared image in the positive samples; the marks and centre_x, centre_y, frame_x, frame_y values of different positive-sample images are stored in different txt files.
Preferably, the convolutional neural network adopts a Darknet framework and a Yolo network, the Darknet framework is used for performing convolution, maximum pooling and normalization operations on the input thermal infrared image so as to obtain the weight of the convolutional neural network, and the Yolo network is used for processing the weight of the convolutional neural network so as to perform face determination and position regression.
Preferably, the size relationship between the calibration box and the prediction box constructed by the convolutional neural network is as follows:
a_x = d_x + Δ(m_x)
a_y = d_y + Δ(m_y)
a_w = p_w · e^(m_w)
a_h = p_h · e^(m_h)

wherein a_x and a_y are the width and height coordinates of the center of the calibration frame in the u-v image coordinate system, a_w and a_h are the width and height of the calibration frame, Δ(m_x) and Δ(m_y) are the width-direction and height-direction offsets from the center of the prediction box to the center of the calibration frame, d_x and d_y are the width and height coordinates of the center of the prediction box, p_w and p_h are the width and height of the prediction box, m_w and m_h are its width and height scaling ratios, and the Δ function is a sigmoid function.
Preferably, the convolutional neural network constructs six prediction boxes divided between two scales. Sorted by height from largest to smallest, they are prediction boxes I, II, III, IV, V and VI; the first scale is assigned prediction boxes I, III and IV, and the second scale prediction boxes II, IV and VI.
Preferably, in step (3), the loss function used to optimize the convolutional neural network is as follows:

loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)²]
     + λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]
     − Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ĉ_i ln c_i + (1 − ĉ_i) ln(1 − c_i)]
     − λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} [ĉ_i ln c_i + (1 − ĉ_i) ln(1 − c_i)]
     + Σ_{i=0}^{S²} 1_i^{obj} Σ_{c∈classes} (p_i(c) − p̂_i(c))²

where loss is the loss, S² is the number of grid cells of the convolutional neural network and B is the number of prediction boxes per cell; 1_{ij}^{obj} indicates whether the jth anchor box of the ith grid cell is responsible for a target (1 when responsible, 0 when not); 1_{ij}^{noobj} indicates that the jth prediction box of the ith grid cell is not responsible for a target (1 when there is no target, 0 when there is); λ_coord = 5 and λ_noobj = 0.5; x_i and y_i are the width and height coordinates of the center point of the ith prediction box, and x̂_i and ŷ_i those of the ith calibration frame; w_i and h_i are the width and height of the ith prediction box, and ŵ_i and ĥ_i those of the ith calibration frame; c_i is the confidence of the ith prediction box (1 if selected, 0 if not) and ĉ_i the confidence of the ith calibration frame (1 if selected, 0 if not); p_i is the classification probability of a face in the ith prediction box and p̂_i that in the ith calibration frame; c is the face/no-face class and classes is the set of those classes;
after the loss is obtained, the parameters are updated by stochastic gradient descent: the convolutional neural network continually selects the best parameters for the current objective and updates its parameters according to the loss, stopping the updates once the network reaches the required index.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
1) By training a convolutional neural network on thermal infrared images, the invention obtains a network that meets the requirements, achieving automatic detection in thermal infrared imagery, accurately framing the face region and reducing the error rate of face detection.
2) The invention performs face detection with thermal infrared technology and can clearly frame the face position in a thermal infrared image without any light source, meeting the detection requirements for thermal infrared images.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic flow chart of the present invention for obtaining training labels;
FIG. 3 is a flow chart of the loss calculation of the convolutional neural network in the present invention;
FIG. 4 is a thermal infrared image to be detected;
FIG. 5 is a schematic illustration of the thermal infrared image of FIG. 4 after detection;
FIG. 6 is a schematic diagram of three prediction boxes in a first scale;
FIG. 7 is a schematic diagram of three prediction boxes at a second scale;
fig. 8 is a schematic diagram of detection of two faces.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to the attached drawings, the method for detecting the human face of the thermal infrared image comprises the following steps:
(1) Take N thermal infrared images as positive samples and L thermal infrared images showing no face as negative samples to form a training set, and obtain M thermal infrared images as a test set.
To guarantee a sufficient number of thermal infrared images, sufficient experimental data must be collected. Specifically, a medium-wave thermal infrared imager (model TAURUS-110kM, IRCAM, Germany) can be used, with the following test conditions: faces are recorded at distances of 2 m, 3 m and 5 m from the camera, a video of a set duration is recorded for each person, and after each video is cut every set number of frames a set number of photos is selected. For example, 200 people can be filmed, keeping one frame out of every 50; the data include different postures, different scene backgrounds and scenes with an external light source, and extensive experiments guarantee the accuracy of the subsequent use of the face detection model. The thermal infrared images extracted from the videos are then screened and images that do not meet the training requirements are removed: filtering useless data out of the training set prevents the deep network from learning from it and corrupting the real parameters. For example, blurred images, which easily appear during posture changes, are generally removed when the pictures are cut. In this way 140,000 thermal infrared images can be obtained as the training set and M = 60,000 thermal infrared images as the test set; within the training set, N = 35,000 thermal infrared images are selected as positive samples and L = 105,000 as negative samples. The thermal infrared images in the positive samples show faces on which face boxes can be framed; the images in the negative samples show no face, e.g. only devices, clothing, walls, etc.
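As an illustrative sketch (not part of the patent), the frame-sampling and sample-split arithmetic described above can be checked in a few lines of Python; the function name, the clip length and the step value are assumptions chosen for illustration:

```python
def sampled_frame_indices(n_frames: int, step: int = 50) -> list:
    """Indices of the frames kept when one frame out of every `step` is cut from a video."""
    return list(range(0, n_frames, step))

# A 10-second clip at 25 fps yields 250 frames, of which 5 are kept at step=50.
kept = sampled_frame_indices(250, step=50)

# The sample split used above: positives plus negatives must equal the training set.
N_POS, L_NEG, TRAIN_TOTAL = 35_000, 105_000, 140_000
assert N_POS + L_NEG == TRAIN_TOTAL
```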
Then a face box is framed on each thermal infrared image of the positive samples as a calibration frame; the mark of each thermal infrared image in the positive samples is 1 and the mark of each thermal infrared image in the negative samples is 0.
(2) For each thermal infrared image, the coordinates of the center point of the calibration frame and its width and height are scaled down, and the scaled center coordinates, the scaled width and height, and the mark of the image are stored together in a separate txt file, giving N txt files in total;
in addition, the path of each thermal infrared image in the training set and the marks of all thermal infrared images in the negative samples are stored in another txt file;
in this way, a total of N+1 txt files are obtained as training labels, as follows:
(2.1) storing the relative coordinates of the center point of the calibration frame:
centre_x = (x1 + x2) / (2w)
centre_y = (y1 + y2) / (2h)

wherein (x1, y1) and (x2, y2) are the coordinates of two diagonally opposite corners of the calibration frame, which together determine it; x1 and x2 are width (x) coordinates and y1 and y2 are height (y) coordinates in the x-y image coordinate system, with x1 > x2 and y1 > y2; centre_x and centre_y are the width and height coordinates of the center point of the calibration frame in the x-y image coordinate system; w is the width and h the height of the thermal infrared image containing the calibration frame;
(2.2) storing the size of the calibration frame relative to the thermal infrared image containing it:

frame_x = (x1 − x2) / w
frame_y = (y1 − y2) / h

wherein frame_x is the relative width and frame_y the relative height of the calibration frame.
The centre_x, centre_y, frame_x and frame_y values are stored in the same txt file as the mark of the corresponding thermal infrared image in the positive samples; the marks and centre_x, centre_y, frame_x, frame_y values of different positive-sample images are stored in different txt files.
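The label computation of (2.1) and (2.2) can be sketched as follows. This is a minimal illustration assuming the relative-coordinate formulas centre_x = (x1 + x2)/(2w), centre_y = (y1 + y2)/(2h), frame_x = (x1 − x2)/w and frame_y = (y1 − y2)/h; the function name and the example corner values are chosen for illustration only:

```python
def make_label(x1, y1, x2, y2, w, h):
    """Relative center and relative size of a calibration frame.

    (x1, y1) and (x2, y2) are diagonally opposite corners with x1 > x2 and
    y1 > y2; w and h are the width and height of the thermal infrared image.
    """
    centre_x = (x1 + x2) / (2 * w)
    centre_y = (y1 + y2) / (2 * h)
    frame_x = (x1 - x2) / w
    frame_y = (y1 - y2) / h
    return centre_x, centre_y, frame_x, frame_y

# One txt line per positive-sample image: the mark followed by the four values.
mark = 1
values = make_label(300, 260, 180, 120, 640, 480)
line = f"{mark} " + " ".join(f"{v:.6f}" for v in values)
```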
The invention only needs to store the relative coordinate of the central point of the calibration frame and the relative size of the calibration frame, thereby saving the acquisition time of a large number of parameters.
(3) Building a convolutional neural network, inputting a training set and a training label into the convolutional neural network together for training, and optimizing the convolutional neural network by using a loss function so as to obtain a required training model of the convolutional neural network;
the convolutional neural network adopts a Darknet framework, and the Darknet framework is used for performing convolution, maximum pooling and normalization operations on an input thermal infrared image so as to obtain the weight of the convolutional neural network, specifically, the Darknet framework trains a 53-layer network and provides a 106-layer fully-convolutional bottom layer framework. In the forward propagation process, the size of the tensor is transformed by changing the step size of the convolution kernel, such as stride (2, 2), which is equivalent to reducing the side length of the image by half (i.e. reducing the area to 1/4). In the network, 5 times of reduction is needed, 1/2 which reduces the characteristic diagram to the original input size5I.e., 1/32. The input is 416x416 and the output is 13x13(416/32 ═ 13). The backpone would narrow the output profile to 1/32 at the input.
The convolutional neural network also adopts a Yolo network, which processes the network's weights to perform face determination and position regression. By designing a Fast Anchor (fast prediction-box) algorithm, six prediction boxes are constructed and divided between two scales. Sorted by height from largest to smallest they are prediction boxes I, II, III, IV, V and VI; the first scale is assigned prediction boxes I, III and IV, and the second scale prediction boxes II, IV and VI.
The size relationship between the calibration box and the prediction box constructed by the convolutional neural network is as follows:
a_x = d_x + Δ(m_x)
a_y = d_y + Δ(m_y)
a_w = p_w · e^(m_w)
a_h = p_h · e^(m_h)

wherein a_x and a_y are the width and height coordinates of the center of the calibration frame in the u-v image coordinate system, a_w and a_h are the width and height of the calibration frame, Δ(m_x) and Δ(m_y) are the width-direction and height-direction offsets from the center of the prediction box to the center of the calibration frame, d_x and d_y are the width and height coordinates of the center of the prediction box, p_w and p_h are the width and height of the prediction box, and m_w and m_h are its width and height scaling ratios. The Δ function is a sigmoid function; scaling the predicted quantities to within 0-1 helps the network converge quickly. When detecting whether a face is present, the aspect ratio is close to 1:1, so prediction boxes with a large width-height disparity do not appear.
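The four box equations above can be sketched as a small decoding function. This is an illustrative sketch only; the function name is an assumption, and the equations are applied exactly as written (sigmoid offsets for the center, exponential scaling for the size):

```python
import math

def sigmoid(z: float) -> float:
    """The Δ (sigmoid) function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(dx, dy, pw, ph, mx, my, mw, mh):
    """Apply a_x = d_x + Δ(m_x), a_y = d_y + Δ(m_y), a_w = p_w·e^(m_w), a_h = p_h·e^(m_h)."""
    ax = dx + sigmoid(mx)
    ay = dy + sigmoid(my)
    aw = pw * math.exp(mw)
    ah = ph * math.exp(mh)
    return ax, ay, aw, ah

# With zero raw outputs the center shifts by sigmoid(0) = 0.5 and the size is unchanged.
box = decode_box(3.0, 4.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0)  # (3.5, 4.5, 2.0, 2.0)
```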
The convolutional neural network is optimized with the following loss function:

loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)²]
     + λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]
     − Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ĉ_i ln c_i + (1 − ĉ_i) ln(1 − c_i)]
     − λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} [ĉ_i ln c_i + (1 − ĉ_i) ln(1 − c_i)]
     + Σ_{i=0}^{S²} 1_i^{obj} Σ_{c∈classes} (p_i(c) − p̂_i(c))²

In the above formula, the loss for w and h uses the total variance of the square roots, and the confidence loss uses binary cross entropy: row 1 is the sum of squared errors used as the position-prediction loss, row 2 uses the root total variance as the width-and-height loss, rows 3 and 4 use binary cross entropy as the confidence loss, and row 5 uses SSE as the class-probability loss.

where loss is the loss, S² is the number of grid cells of the convolutional neural network and B is the number of prediction boxes per cell; 1_{ij}^{obj} indicates whether the jth anchor box of the ith grid cell is responsible for a target (1 when responsible, 0 when not); 1_{ij}^{noobj} indicates that the jth prediction box of the ith grid cell is not responsible for a target (1 when there is no target, 0 when there is); λ_coord = 5 and λ_noobj = 0.5; x_i and y_i are the width and height coordinates of the center point of the ith prediction box, and x̂_i and ŷ_i those of the ith calibration frame; w_i and h_i are the width and height of the ith prediction box, and ŵ_i and ĥ_i those of the ith calibration frame; c_i is the confidence of the ith prediction box (1 if selected, 0 if not) and ĉ_i the confidence of the ith calibration frame (1 if selected, 0 if not); p_i is the classification probability of a face in the ith prediction box and p̂_i that in the ith calibration frame; c is the face/no-face class and classes is the set of those classes.
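As an illustrative sketch (not the patent's implementation), the contribution of a single box to the loss terms described above can be written out; the full loss sums this over all S²×B boxes, and all names here are assumptions:

```python
import math

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5

def bce(c: float, c_hat: float, eps: float = 1e-7) -> float:
    """Binary cross entropy used for the confidence terms."""
    c = min(max(c, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(c_hat * math.log(c) + (1.0 - c_hat) * math.log(1.0 - c))

def per_box_loss(pred, truth, responsible: bool) -> float:
    """Loss contribution of one prediction box.

    pred and truth are (x, y, w, h, confidence, p_face); `responsible`
    plays the role of the obj/noobj indicator for this box.
    """
    x, y, w, h, c, p = pred
    xh, yh, wh, hh, ch, ph = truth
    if responsible:
        coord = LAMBDA_COORD * ((x - xh) ** 2 + (y - yh) ** 2)
        size = LAMBDA_COORD * ((math.sqrt(w) - math.sqrt(wh)) ** 2
                               + (math.sqrt(h) - math.sqrt(hh)) ** 2)
        return coord + size + bce(c, ch) + (p - ph) ** 2
    return LAMBDA_NOOBJ * bce(c, ch)

# A perfect responsible prediction incurs an (almost) zero loss.
zero = per_box_loss((0.5, 0.5, 0.2, 0.3, 1.0, 1.0),
                    (0.5, 0.5, 0.2, 0.3, 1.0, 1.0), responsible=True)
```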
After the loss is obtained, the parameters are updated by stochastic gradient descent: the convolutional neural network continually selects the best parameters for the current objective and updates its parameters according to the loss so that its output matches the training labels, stopping the updates once the network reaches the required index.
(4) Input the thermal infrared image to be detected to obtain the face detection result. The invention can process a single image in 0.024 s, with high precision and an accuracy above 98.6%.
In addition, the coordinates mentioned in the invention are coordinates in the u-v image coordinate system; the width of a thermal infrared image or of a frame is its side length in the horizontal direction, and the height is its side length in the vertical direction.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A face detection method for thermal infrared images, characterized by comprising the following steps:
(1) taking N thermal infrared images as positive samples and L thermal infrared images that show no face as negative samples to form a training set, and obtaining M thermal infrared images as a test set; framing a face box on each thermal infrared image of the positive samples as a calibration frame; the mark of each thermal infrared image in the positive samples is 1, and the mark of each thermal infrared image in the negative samples is 0;
(2) scaling down the coordinates of the center point of the calibration frame and its width and height values for each thermal infrared image, and storing the scaled center coordinates, the scaled width and height, and the mark of the image together in a separate txt file, giving N txt files in total;
in addition, storing the path of each thermal infrared image in the training set and the marks of all thermal infrared images in the negative samples in another txt file;
in this way, N+1 txt files are obtained as training labels;
(3) building a convolutional neural network, inputting the training set and training labels into it for training, and optimizing it with a loss function to obtain the required trained model of the convolutional neural network;
(4) inputting a thermal infrared image from the test set and obtaining a face detection frame through the convolutional neural network.
2. The face detection method for thermal infrared images according to claim 1, characterized in that in step (1) the thermal infrared images are collected with a thermal infrared imager under the following conditions: for each person, videos of the face are recorded with a medium-wave thermal infrared imager at several distances for several set durations; the videos are cut every set number of frames, a set number of photos is selected, and the training set and test set are then obtained.
3. The method for detecting a human face in a thermal infrared image according to claim 1, wherein the training labels generated in step (2) are specifically as follows:
(2.1) storing the relative coordinates of the center point of the calibration frame:
centre_x = (x1 + x2) / (2w)
centre_y = (y1 + y2) / (2h)

wherein (x1, y1) and (x2, y2) are the coordinates of two diagonally opposite corners of the calibration frame, which together determine the frame; x1 and x2 are width coordinates in the x-y image coordinate system, y1 and y2 are height coordinates, and x1 > x2, y1 > y2; centre_x and centre_y represent the width and height coordinates of the center point of the calibration frame in the x-y image coordinate system, w represents the width of the thermal infrared image containing the calibration frame, and h represents its height;
(2.2) storing the width and height of the calibration frame relative to the thermal infrared image in which it is located:
frame_x = (x1 − x2) / w
frame_y = (y1 − y2) / h

wherein frame_x represents the relative width of the calibration frame and frame_y represents its relative height;
the above centre_x, centre_y, frame_x and frame_y values are stored in the same txt file as the mark of the corresponding thermal infrared image in the positive samples, and the mark and centre_x, centre_y, frame_x, frame_y of different positive-sample thermal infrared images are stored in different txt files.
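The relative label values of (2.1) and (2.2) can be computed directly from the two diagonal corners. This is a minimal sketch assuming the standard YOLO-style normalization (center and size divided by image size) consistent with the symbols defined in claim 3; the function name is hypothetical:

```python
def label_from_corners(x1, y1, x2, y2, w, h):
    """Normalized calibration-frame label from two diagonal corners.

    Per claim 3, x1 > x2 and y1 > y2; w and h are the width and height
    of the thermal infrared image containing the frame.
    """
    centre_x = (x1 + x2) / (2.0 * w)  # relative width coordinate of the center
    centre_y = (y1 + y2) / (2.0 * h)  # relative height coordinate of the center
    frame_x = (x1 - x2) / w           # relative width of the calibration frame
    frame_y = (y1 - y2) / h           # relative height of the calibration frame
    return centre_x, centre_y, frame_x, frame_y
```

All four outputs fall in [0, 1] for a frame inside the image, which is what the per-image txt files store.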
4. The method according to claim 1, wherein the convolutional neural network employs a Darknet framework and a Yolo network, the Darknet framework is used for performing convolution, max pooling and normalization on the input thermal infrared image to obtain weights of the convolutional neural network, and the Yolo network is used for processing the weights of the convolutional neural network to perform face determination and position regression.
5. The method for detecting a human face in a thermal infrared image according to claim 1, wherein the size relationship between the calibration frame and the prediction frame constructed by the convolutional neural network is as follows:

a_x = d_x + Δ(m_x)
a_y = d_y + Δ(m_y)
a_w = p_w · e^(m_w)
a_h = p_h · e^(m_h)

wherein a_x and a_y respectively represent the width and height coordinates of the center of the calibration frame in the u-v image coordinate system, and a_w and a_h represent the width and height of the calibration frame; Δ(m_x) and Δ(m_y) respectively represent the offsets in the width and height directions from the center of the calibration frame to the center of the prediction frame; d_x and d_y respectively represent the width and height coordinates of the center of the prediction frame; p_w and p_h respectively represent the width and height of the prediction frame; m_w and m_h are the width and height scaling ratios of the prediction frame; and the Δ function is the sigmoid function.
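The claim-5 relationship matches YOLO-style box decoding and can be sketched numerically as follows (the function name is hypothetical; Δ is taken as the sigmoid function, as stated):

```python
import math

def sigmoid(z):
    # the Δ function of claim 5
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(d_x, d_y, p_w, p_h, m_x, m_y, m_w, m_h):
    """Map prediction-frame parameters to the calibration-frame center and size."""
    a_x = d_x + sigmoid(m_x)   # center width coordinate: offset added via Δ
    a_y = d_y + sigmoid(m_y)   # center height coordinate
    a_w = p_w * math.exp(m_w)  # width: prior width scaled by e^(m_w)
    a_h = p_h * math.exp(m_h)  # height: prior height scaled by e^(m_h)
    return a_x, a_y, a_w, a_h
```

With zero offsets and zero scaling logits, the decoded box sits half a unit from (d_x, d_y) and keeps the prior size (p_w, p_h).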
6. The method according to claim 5, wherein six prediction frames are constructed by the convolutional neural network and divided between two scales; sorted from largest to smallest height, the six prediction frames are denoted prediction frame I through prediction frame VI, wherein the first scale is allocated prediction frames I, III and V, and the second scale is allocated prediction frames II, IV and VI.
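The scale assignment of claim 6 can be sketched as an alternating split. Note this assumes the "IV" listed for the first scale in the published text is a typo for "V" (as printed, frame IV would be assigned to both scales); the function name is hypothetical:

```python
def assign_anchors(anchors):
    """Split six prediction (anchor) boxes between two detection scales.

    anchors: list of (width, height) pairs. Boxes are sorted by height,
    largest first, then alternated: I, III, V to the first scale and
    II, IV, VI to the second.
    """
    ordered = sorted(anchors, key=lambda a: a[1], reverse=True)
    return ordered[0::2], ordered[1::2]
```
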
7. The method for detecting a human face in a thermal infrared image according to claim 1, wherein in step (3) the loss function used to optimize the convolutional neural network is as follows:
loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)²]
     + λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} (c_i − ĉ_i)²
     + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} (c_i − ĉ_i)²
     + Σ_{i=0}^{S²} 1_i^{obj} Σ_{c∈classes} (p_i(c) − p̂_i(c))²

wherein loss represents the loss; S² represents the number of grid cells of the convolutional neural network and B the number of prediction boxes per cell; 1_{ij}^{obj} indicates whether the j-th prediction box of the i-th grid cell is responsible for the target, taking the value 0 when it is not responsible and 1 when it is; 1_{ij}^{noobj} indicates that the j-th prediction box of the i-th grid cell is not responsible for any target, taking the value 1 when no target is assigned to it and 0 otherwise; λ_coord = 5 and λ_noobj = 0.5; x_i and y_i respectively represent the width and height coordinates of the center point of the i-th prediction box, and x̂_i and ŷ_i those of the i-th calibration frame; w_i and h_i respectively represent the width and height of the i-th prediction box, and ŵ_i and ĥ_i those of the i-th calibration frame; c_i represents the confidence of the i-th prediction box, with the value 1 for a selected prediction box and 0 otherwise, and ĉ_i represents the confidence of the i-th calibration frame, with the value 1 for a selected calibration frame and 0 otherwise; p_i represents the classification probability of a face in the i-th prediction box and p̂_i that in the i-th calibration frame; c denotes the class (face or no face) and classes denotes the set of these two classes;
and after the loss is obtained, the parameters are updated by stochastic gradient descent: the convolutional neural network repeatedly selects the best parameters for the current target, updates its parameters according to the loss, and stops updating once the required performance index is reached.
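The claim-7 loss can be sketched per box with NumPy. This is a simplified single-class illustration over flattened boxes; the array layout and names are assumptions, not the patent's code:

```python
import numpy as np

LAMBDA_COORD = 5.0   # λ_coord in claim 7
LAMBDA_NOOBJ = 0.5   # λ_noobj in claim 7

def yolo_loss(pred, target, obj_mask):
    """Simplified YOLO loss over flattened boxes.

    pred, target: (N, 6) arrays of [x, y, w, h, confidence, p_face];
    obj_mask: (N,) array, 1 where a box is responsible for a face, else 0.
    """
    noobj_mask = 1.0 - obj_mask
    # coordinate terms; square roots dampen the error on large boxes
    coord = LAMBDA_COORD * np.sum(
        obj_mask * ((pred[:, 0] - target[:, 0]) ** 2
                    + (pred[:, 1] - target[:, 1]) ** 2
                    + (np.sqrt(pred[:, 2]) - np.sqrt(target[:, 2])) ** 2
                    + (np.sqrt(pred[:, 3]) - np.sqrt(target[:, 3])) ** 2))
    # confidence terms, down-weighting boxes responsible for no object
    conf = (np.sum(obj_mask * (pred[:, 4] - target[:, 4]) ** 2)
            + LAMBDA_NOOBJ * np.sum(noobj_mask * (pred[:, 4] - target[:, 4]) ** 2))
    # classification term (face / no face collapses to one probability here)
    cls = np.sum(obj_mask * (pred[:, 5] - target[:, 5]) ** 2)
    return float(coord + conf + cls)
```

A perfect prediction gives zero loss, and a spurious confidence on an empty box contributes only λ_noobj times its squared error.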
CN201911394420.1A 2019-12-30 2019-12-30 Face detection method of thermal infrared image Pending CN111209822A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911394420.1A CN111209822A (en) 2019-12-30 2019-12-30 Face detection method of thermal infrared image


Publications (1)

Publication Number Publication Date
CN111209822A true CN111209822A (en) 2020-05-29

Family

ID=70786541



Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038474A (en) * 2017-12-28 2018-05-15 深圳云天励飞技术有限公司 Face detection method, convolutional neural network parameter training method, device and medium
CN108764057A (en) * 2018-05-03 2018-11-06 武汉高德智感科技有限公司 Far-infrared human face detection method and system based on deep learning
CN109902556A (en) * 2019-01-14 2019-06-18 平安科技(深圳)有限公司 Pedestrian detection method, system, computer equipment and computer-readable storage medium
CN110399905A (en) * 2019-07-03 2019-11-01 常州大学 Detection and description method of safety helmet wearing condition in construction scenes


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cai Chengtao (蔡成涛) et al.: "Ocean Buoy Target Detection Technology" (《海洋浮标目标探测技术》), Harbin Engineering University Press, pages: 51 - 53 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985374A (en) * 2020-08-12 2020-11-24 汉王科技股份有限公司 Face positioning method and device, electronic equipment and storage medium
CN111985374B (en) * 2020-08-12 2022-11-15 汉王科技股份有限公司 Face positioning method and device, electronic equipment and storage medium
CN112199993A (en) * 2020-09-01 2021-01-08 广西大学 Method for identifying transformer substation insulator infrared image detection model in any direction based on artificial intelligence
CN112199993B (en) * 2020-09-01 2022-08-09 广西大学 Method for identifying transformer substation insulator infrared image detection model in any direction based on artificial intelligence
CN112115838A (en) * 2020-09-11 2020-12-22 南京华图信息技术有限公司 Thermal infrared image spectrum fusion human face classification method
CN112115838B (en) * 2020-09-11 2024-04-05 南京华图信息技术有限公司 Face classification method based on thermal infrared image spectrum fusion
CN112232208A (en) * 2020-10-16 2021-01-15 蓝普金睛(北京)科技有限公司 Infrared human face temperature measurement system and method thereof
CN112529947A (en) * 2020-12-07 2021-03-19 北京市商汤科技开发有限公司 Calibration method and device, electronic equipment and storage medium
CN112926478A (en) * 2021-03-08 2021-06-08 新疆爱华盈通信息技术有限公司 Gender identification method, system, electronic device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination