CN114757979B - Kitchen smoking detection method and system with neural network and infrared image matched - Google Patents


Info

Publication number
CN114757979B
CN114757979B CN202210652952.6A
Authority
CN
China
Prior art keywords
infrared image
visible light
image
temperature
multiplied
Prior art date
Legal status
Active
Application number
CN202210652952.6A
Other languages
Chinese (zh)
Other versions
CN114757979A (en)
Inventor
谢红刚
侯凯元
连洪伟
林浩威
祝树新
徐俊杰
Current Assignee
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN202210652952.6A priority Critical patent/CN114757979B/en
Publication of CN114757979A publication Critical patent/CN114757979A/en
Application granted granted Critical
Publication of CN114757979B publication Critical patent/CN114757979B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06N3/045 Combinations of networks
    • G06T7/11 Region-based segmentation
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G06T2207/10048 Infrared image
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30196 Human being; Person

Abstract

The invention discloses a kitchen smoking detection method and system in which a neural network and infrared images cooperate. The method comprises the following steps. S100: acquire an infrared image and a visible light image of a target area. S200: mark the fixed high-temperature regions in the visible light image. S300: register the positions of the infrared image and the visible light image. S400: extract from the infrared image, as high-temperature candidate regions, the high-temperature regions that do not coincide with the fixed high-temperature regions. S500: detect the head region in the visible light image with a multitask convolutional neural network and deform it to obtain a cigarette position information candidate region. S600: according to the position registration relation between the infrared and visible light images, detect whether a high-temperature candidate region overlaps the cigarette position information candidate region; if so, judge that smoking behavior exists; otherwise, judge that it does not. The invention significantly improves the speed and accuracy of smoking detection in kitchen scenes.

Description

Kitchen smoking detection method and system with neural network and infrared image matched
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a kitchen smoking detection method and system based on neural network and infrared image matching.
Background
With the development of artificial intelligence, computer vision has been widely applied to smoking behavior detection. Common approaches include detection based on the smoke produced by a burning cigarette and detection based on gesture recognition. However, a kitchen is a complex environment with many high-temperature regions and high personnel mobility, so many factors interfere with detection; these methods therefore have clear limitations when applied in kitchens and yield high false detection rates.
Chinese patent application CN 113326754 A, entitled "Smoking behavior detection method based on convolutional neural network and related devices", discloses the following method: acquire smoking behavior detection data comprising multiple frames of face or head images to be recognized; preprocess these frames to obtain fixed-size images; input the fixed-size images into a preset convolutional neural network model to obtain a detection result for the face or head information to be recognized; and input the detection result into a preset classifier to obtain a smoking behavior judgment. This is a gesture-recognition-based detection method and is not suitable for the complex environment of a kitchen.
When a kitchen is well ventilated, its temperature is 20-40 °C, except in the gas stove and smoke generation areas, where it exceeds 1000 °C. The breathing zone during cooking is about 23 °C, while the center of a burning cigarette (the smoke point) reaches 800-900 °C and the burning edge of the rolling paper reaches 200-300 °C, so the temperature difference between a cigarette and its surroundings is large. Exploiting this difference, one known approach acquires an infrared image, detects smoke points in it, and judges that smoking behavior exists when a smoke point is found. However, infrared images typically suffer from poor resolution, low contrast, low signal-to-noise ratio and blurred visual effect, so a detection method based on infrared images alone can hardly yield effective, stable and accurate results.
Chinese patent application CN 114202646 A, "Infrared image smoking detection method and system based on deep learning", proposes combining an infrared image and a visible light image for smoking detection: the two images are fused into a dual-light image, a network model extracts smoke-point features from the fused image, and a person is judged to be smoking in any image containing a smoke point. This method supplements the infrared image with smoking behavior features from the visible light image to improve detection accuracy. However, because the kitchen scene environment is complex, smoking behavior features are extracted from the visible light image with low accuracy, so this infrared image smoking detection method is also not applicable to kitchen scenes.
Disclosure of Invention
The invention aims to provide a kitchen smoking detection method and system based on neural network and infrared image matching.
The invention provides a kitchen smoking detection method with a neural network and infrared image matched, which comprises the following steps:
s100: acquiring detection data including an infrared image and a visible light image of a target area;
s200: marking the fixed high-temperature region in the visible light image;
s300: carrying out position registration on the infrared image and the visible light image;
s400: extracting a high-temperature region which is not overlapped with the fixed high-temperature region from the infrared image as a high-temperature candidate region;
s500: detecting a head region from the visible light image by adopting a multitask convolutional neural network, and deforming the head region to obtain a candidate region of cigarette position information;
s600: detecting whether a high-temperature candidate area in the infrared image is overlapped with a cigarette position information candidate area in the visible light image or not according to the position registration relation of the infrared image and the visible light image, and if so, judging that smoking behavior exists; otherwise, judging that no smoking behavior exists.
In some embodiments, before the detection data is collected, the camera for collecting the infrared image and the camera for collecting the visible light image are in the same plane and have parallel optical axes.
In some embodiments, in step S200, the fixed high-temperature region is labeled by a salient object detection method or by manual labeling.
In some embodiments, in step S300, the infrared image and the visible light image are registered in position by using bilinear interpolation.
In some embodiments, step S400 specifically includes:
performing binarization segmentation on the infrared image; inputting the position information of the fixed high-temperature region into the binarized infrared image and setting the gray pixel values at the position of the fixed high-temperature region to 0, to obtain a binarized infrared image without the fixed high-temperature region; and dilating this image, the regions with non-zero gray values in the dilated binarized infrared image being the high-temperature candidate regions.
In some embodiments, step S500 further comprises:
s510: scaling the visible light image proportionally several times and constructing an image pyramid from the scaled images;
s520: with the ProNet network model, applying a 3 × 3 convolution and 2 × 2 pooling to the image pyramid, then two further 3 × 3 convolutions to the pooled output, and preliminarily detecting and outputting all head region windows present in the visible light image;
s530: with the RefineNet network model, applying a 3 × 3 convolution followed by 3 × 3 pooling to the head region windows output by the ProNet network model, repeating the 3 × 3 convolution and 3 × 3 pooling, applying 2 × 2 pooling to the output of the second 3 × 3 pooling, and outputting optimized head region windows after a fully connected layer;
s540: with the Output Network model, applying a 3 × 3 convolution and 3 × 3 pooling to the optimized head region windows, repeating the 3 × 3 convolution and 3 × 3 pooling, applying a 3 × 3 convolution and 2 × 2 pooling to the output of the second 3 × 3 pooling, and outputting the head region after a 2 × 2 convolution and a fully connected layer;
s550: reducing the head region to 1/2 of its original height, keeping only its lower half, then extending it to the left and right by 1/10 of its original width each; the resulting regions are the cigarette position information candidate regions.
In some embodiments, step S600 further comprises:
s610: acquiring the position information of the four vertices of the cigarette position information candidate region, denoted (x1, y1), (x2, y2), (x3, y3), (x4, y4);
s620: acquiring the pixel point position information in the high-temperature candidate region to obtain a position information data set;
s630: determining whether the data set contains a point (x, y) that simultaneously satisfies min(x1, ..., x4) ≤ x ≤ max(x1, ..., x4) and min(y1, ..., y4) ≤ y ≤ max(y1, ..., y4); if so, judging that overlapping exists and smoking behavior exists; otherwise, judging that no overlapping and no smoking behavior exist.
The invention further provides a kitchen smoking detection system with a neural network and infrared image matched, which comprises:
the first module is used for acquiring detection data, including an infrared image and a visible light image of a target area;
the second module is used for marking the fixed high-temperature region in the visible light image;
the third module is used for carrying out position registration on the infrared image and the visible light image;
a fourth module for extracting a high temperature region not coinciding with the fixed high temperature region from the infrared image as a high temperature candidate region;
the fifth module is used for detecting a head region from the visible light image by adopting a multitask convolutional neural network, and deforming the head region to obtain a candidate region of cigarette position information;
the sixth module is used for detecting whether the high-temperature candidate area in the infrared image is overlapped with the cigarette position information candidate area in the visible light image or not according to the position registration relation between the infrared image and the visible light image, and if the high-temperature candidate area is overlapped with the cigarette position information candidate area in the visible light image, judging that smoking behavior exists; otherwise, judging that no smoking behavior exists.
The invention has the following characteristics and beneficial effects:
due to the particularity of the kitchen scene, with its many high-temperature regions and high personnel flow, traditional smoking detection methods achieve low detection accuracy there. The present invention is better suited to kitchen scenes and can significantly improve both the speed and the accuracy of smoking detection in them.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow diagram of a multitasking convolutional neural network detecting a head region in an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments. It should be understood that the embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
A specific embodiment of the present invention will be provided in conjunction with fig. 1, with the following steps:
s100: detection data including infrared images and visible light images of the target area are acquired.
The infrared image is captured by a thermal imaging surveillance camera, and the visible light image by a visible light surveillance camera. Before the detection data are collected, the two cameras are finely adjusted so that they lie in the same plane with parallel optical axes; the infrared and visible light images of the kitchen target area are then captured and uploaded to a computer.
S200: and marking a fixed high-temperature area in the visible light image.
Because the kitchen is a fixed scene, its high-temperature regions are fixed, and so are the positions of the thermal imaging and visible light surveillance cameras. To improve detection efficiency and accuracy, the visible light image is therefore preprocessed: fixed high-temperature regions such as the cooking bench and the oven are marked in it. Specifically, they can be labeled intelligently with a salient object detection method or labeled manually.
S300: and carrying out position registration on the infrared image and the visible light image.
Because the infrared and visible light images differ in imaging mechanism and imaging resolution, and the optical axes of the two optical systems are not perfectly aligned, their pixel points must be registered. In this embodiment, bilinear interpolation is used to resize the infrared image to the same length and width as the visible light image, achieving position registration of the infrared and visible light images.
The specific implementation process of the step is as follows:
s310: computing the mapping relation between each interpolation point in the infrared image and its known neighborhood pixels.
The initial infrared image and the visible light image used in this embodiment have the same aspect ratio. For each interpolation point (i, j), the coordinate mapping to its four neighborhood pixels, i.e. the weights w1, w2, w3, w4, is computed as in formula (1):
x = i·w/W, y = j·h/H, u = x − int(x), v = y − int(y),
w1 = (1 − u)(1 − v), w2 = u(1 − v), w3 = (1 − u)v, w4 = u·v (1)
where w1, ..., w4 are the weights of the four neighborhood pixels, W and H are the length and width of the visible light image, w and h are the length and width of the infrared image, and int() denotes the rounding-down operation.
S320: using weights
Figure 321694DEST_PATH_IMAGE007
And performing interpolation to obtain the pixel value of the interpolation point.
The infrared image is interpolated by using the weight value obtained by the formula (1), and the interpolation formula is shown as the formula (2):
Figure 49259DEST_PATH_IMAGE010
(2)
in the formula (2), the reaction mixture is,
Figure 753648DEST_PATH_IMAGE011
the coordinates of the interpolation point are represented,
Figure 259058DEST_PATH_IMAGE012
respectively representing the pixel values of the four neighborhood pixel points,
Figure 193297DEST_PATH_IMAGE013
representing interpolation points
Figure 933457DEST_PATH_IMAGE011
The interpolated pixel values of (a).
S330: and repeating the substep S320, calculating the pixel values of all the interpolation points, thereby transforming the infrared image to be the same as the size of the visible light image, namely realizing the position registration of the infrared image and the visible light image.
Obtaining the mapping relation between the infrared image interpolation point and the four neighborhood pixel points through the formula (1), namely the weight
Figure 373361DEST_PATH_IMAGE005
(ii) a Then, the formula (2) is utilized to carry out four times of interpolation on the two directions of the horizontal coordinate and the vertical coordinate respectively, and the pixel value of the interpolation point is calculated
Figure 780946DEST_PATH_IMAGE013
And then repeating the steps to enable the length and width values of the infrared image and the visible light image to be the same, namely realizing the registration of pixel points in the infrared image and the visible light image.
S400: and extracting a high-temperature region which is not overlapped with the fixed high-temperature region from the infrared image to serve as a high-temperature candidate region, wherein smoke spots may be included in the high-temperature candidate region.
The specific implementation process of the step is as follows:
s410: the contrast of the infrared image is enhanced, specifically by histogram equalization, using the following formula:
t = ((L − 1)/N) · Σ_{j=0..s} n_j (3)
where t is the gray level after contrast enhancement, L is the highest gray level in the infrared image, N is the total number of pixels in the infrared image, s is the gray level before enhancement, and n_j is the total number of pixels with gray level j.
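Formula (3) corresponds to standard histogram equalization; a minimal NumPy sketch, assuming an 8-bit single-channel infrared image (the function name is illustrative):

```python
import numpy as np

def equalize(ir: np.ndarray, levels: int = 256) -> np.ndarray:
    """Histogram equalization per formula (3): each gray level s maps to
    t = (L - 1) / N * (cumulative count of pixels with level <= s)."""
    hist = np.bincount(ir.ravel(), minlength=levels)  # n_j for each level j
    cdf = np.cumsum(hist)                             # running pixel count
    n = ir.size                                       # N, total pixels
    lut = np.round((levels - 1) * cdf / n).astype(np.uint8)
    return lut[ir]                                    # apply the lookup table
```

This is equivalent in spirit to `cv2.equalizeHist` for 8-bit images.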
S420: and drawing a two-dimensional gray histogram curve.
The gray value level S after contrast enhancement is taken as an abscissa, and the total number of gray pixels of the S level
Figure 352849DEST_PATH_IMAGE019
Drawing a two-dimensional gray level histogram curve of the infrared image after enhancing the contrast as a vertical coordinate
Figure 407131DEST_PATH_IMAGE020
S430: and solving the gradient of the two-dimensional gray level histogram curve.
According to two-dimensional gray level histogram curve
Figure 119653DEST_PATH_IMAGE020
Gray scale value level S and total number of gray scale pixels of S level
Figure 56123DEST_PATH_IMAGE019
Calculating the gradient of two-dimensional gray histogram curve
Figure 743036DEST_PATH_IMAGE021
The calculation formula is shown as formula (4):
Figure 182982DEST_PATH_IMAGE022
(4)
in the formula (4):
Figure 800826DEST_PATH_IMAGE023
To represent
Figure 427855DEST_PATH_IMAGE020
The partial derivative of the signal with respect to S,
Figure 82521DEST_PATH_IMAGE024
represent
Figure 908133DEST_PATH_IMAGE020
To pair
Figure 726572DEST_PATH_IMAGE019
The partial derivative of (c).
S440: and solving the mean value of the gray gradient.
By two-dimensional grey-level histogram curve gradient
Figure 58807DEST_PATH_IMAGE025
Calculating the mean value of the gradient of the two-dimensional gray histogram curve
Figure 60042DEST_PATH_IMAGE026
Figure 943422DEST_PATH_IMAGE027
(5)
In equation (5), z is the order of the gradation gradient values, and P is the total number of gradation value levels present.
S450: and generating a binary threshold surface.
In the curve gradient of the two-dimensional gray level histogram, the mean value of the curve gradient of the two-dimensional gray level histogram is smaller than that of the curve gradient of the two-dimensional gray level histogram
Figure 360191DEST_PATH_IMAGE026
And deleting the curve gradients of the two-dimensional gray level histogram, and reordering the curve gradients of the two-dimensional gray level histogram after deletion to generate a binaryzation threshold surface.
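Steps S430-S450 can be sketched as follows. The patent describes the gradient and threshold-surface construction only loosely, so this sketch approximates the curve gradient with `np.gradient` over the histogram and simply keeps the gray levels whose gradient magnitude reaches the mean of formula (5), per S450; the function name and return value are assumptions:

```python
import numpy as np

def threshold_levels(hist: np.ndarray) -> np.ndarray:
    """Keep the gray levels whose histogram-curve gradient magnitude is at
    least the mean gradient; these levels form the basis of the
    binarization threshold surface."""
    grad = np.abs(np.gradient(hist.astype(np.float64)))  # |dH/ds| per level s
    mean_grad = grad.mean()                              # formula (5)
    return np.flatnonzero(grad >= mean_grad)             # surviving levels s
```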
S460: and carrying out binarization segmentation on the binarization threshold surface.
And combining the pixel points of the binarization threshold surface to generate a global threshold surface, and realizing binarization segmentation of the infrared image, namely generating the infrared image after binarization segmentation.
S470: and generating a binary infrared image without a fixed high-temperature area.
And inputting the position information of the fixed high-temperature area into the infrared image after the binarization segmentation, and setting the gray pixel value in the position of the fixed high-temperature area to be 0 to form the binarization infrared image without the fixed high-temperature area.
S480: and performing expansion processing on the binary infrared image to generate a high-temperature candidate region.
Since the high-temperature regions in the synthesized binary image are all small regions, it is necessary to perform dilation on the pixels in the small regions to generate high-temperature candidate regions. The calculation formula of the expansion is shown in equation (6), and then the region having the gradation value in the binarized infrared map after the expansion is output as the high-temperature candidate region.
Figure 961812DEST_PATH_IMAGE028
(6)
In the formula (6), the reaction mixture is,
Figure 829054DEST_PATH_IMAGE029
inputting a binary image; r is a structural element, and 9 pixels are arranged in a 3 × 3 square to form the structural element in this embodiment;
Figure 98099DEST_PATH_IMAGE030
for the sub-regions within the input image,
Figure 443628DEST_PATH_IMAGE031
are inner sub-regions of the structural element.
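Steps S470-S480 amount to masking out the fixed high-temperature region and applying a 3 × 3 square dilation (formula (6)). A dependency-free NumPy sketch follows; the function names are illustrative, and a production system would more likely call `cv2.dilate` or `scipy.ndimage.binary_dilation`:

```python
import numpy as np

def dilate3x3(binary: np.ndarray) -> np.ndarray:
    """3x3 square dilation per formula (6): a pixel is set in the output if
    any pixel of its 3x3 neighbourhood is set in the input."""
    padded = np.pad(binary.astype(bool), 1)
    out = np.zeros(binary.shape, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy: 1 + dy + binary.shape[0],
                          1 + dx: 1 + dx + binary.shape[1]]
    return out

def high_temp_candidates(binary_ir: np.ndarray, fixed_mask: np.ndarray) -> np.ndarray:
    """S470-S480: zero out the fixed high-temperature region (stove, oven),
    then dilate the remaining foreground."""
    cleaned = binary_ir.astype(bool) & ~fixed_mask.astype(bool)
    return dilate3x3(cleaned)
```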
S500: and detecting and extracting a head region from the visible light image by using a multitask convolutional neural network, wherein the head region possibly comprises smoke points, and the head region is used as a candidate region of the cigarette position information. The multitask convolutional neural network is trained in advance.
The step detects and extracts head position information from the visible light image by utilizing a multitask convolution neural network. In the present embodiment, the head position information detection based on deep learning is performed by cascading three convolutional neural networks.
The specific implementation of this step will be provided below in conjunction with the actual scheme of fig. 2:
s510: the visible light image is scaled proportionally several times, until its short side would fall below 12 pixels, and all scaled images are output to construct an image pyramid.
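Step S510 can be sketched as follows. The scale factor 0.709 is an assumption borrowed from common MTCNN implementations and is not stated in the patent, and nearest-neighbour resizing stands in for whatever interpolation the authors used:

```python
import numpy as np

def build_pyramid(img: np.ndarray, factor: float = 0.709, min_side: int = 12):
    """Repeatedly shrink the image until its short side would fall below
    min_side pixels, collecting every level into a pyramid."""
    pyramid, scale = [], 1.0
    h, w = img.shape[:2]
    while min(h, w) * scale >= min_side:
        nh, nw = int(h * scale), int(w * scale)
        # nearest-neighbour resize keeps the sketch dependency-free
        ys = (np.arange(nh) * h / nh).astype(int)
        xs = (np.arange(nw) * w / nw).astype(int)
        pyramid.append(img[ys][:, xs])
        scale *= factor
    return pyramid
```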
S520: and (5) ProNet detection processing.
Three 3 × 3 convolutions and one 2 × 2 pooling are applied to the image pyramid to obtain head region windows and their regression vectors, and the windows are calibrated by regression with those vectors, preliminarily identifying all head region windows present in the visible light image together with their regressions. Because the preliminary windows are coarse and prone to misjudgment, further regression by the subsequent networks is needed to reduce the probability of judgment errors.
S530: and (5) RefineNet detection processing.
Two 3 × 3 convolutions and one 2 × 2 convolution, two 3 × 3 poolings and one 2 × 2 pooling are applied to the head region windows obtained in step S520, and the regression of the input windows is continued to obtain optimized head region windows.
S540: and outputting the final Network model by the Output Network.
And performing convolution of three 3 multiplied by 3 convolution kernels and one 2 multiplied by 2 convolution kernel, pooling of two 3 multiplied by 3 convolution kernels and pooling of 2 multiplied by 2 on the optimized head region window, realizing finer identification of the head through more convolution and pooling operations, and generating a finally detected more accurate head region.
S550: and deforming the head area to obtain an accurate cigarette position information candidate area.
The average width of a human face is about 15 cm, and the average width of a cigarette is 8 cm. Since the burning end of a cigarette usually appears at or near the mouth, the head region detected in step S540 is deformed to improve the detection accuracy of cigarette smoke points: its height is reduced to 1/2 of the original, keeping only the lower half, and it is then extended to the left and right by 1/10 of the original width each; the result is output as the cigarette position information candidate region, shown as the rectangular frame region in the deformed head region image of fig. 2.
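The deformation of S550 reduces to simple box arithmetic. A sketch, assuming head boxes are (x1, y1, x2, y2) tuples with the origin at the top-left corner (the representation is an assumption, not from the patent):

```python
def cigarette_candidate(box):
    """S550: keep only the lower half of the detected head box and widen it
    by one tenth of the head width on each side."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - w / 10,   # extend left by 1/10 of the width
            y1 + h / 2,    # drop the upper half of the head
            x2 + w / 10,   # extend right by 1/10 of the width
            y2)
```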
S600: detecting whether a high-temperature candidate area in the infrared image is overlapped with a cigarette position information candidate area in the visible light image or not according to the position registration relation of the infrared image and the visible light image, and if so, judging that smoking behavior exists; otherwise, judging that no smoking behavior exists.
Because the cigarette position information candidate region is rectangular while the high-temperature candidate region is irregular, the invention provides a more accurate method for detecting the overlapping part in order to improve detection accuracy. The specific implementation process is as follows:
s610: acquiring the position information of the four vertices of the cigarette position information candidate region, denoted (x1, y1), (x2, y2), (x3, y3), (x4, y4).
s620: acquiring the pixel point position information in the high-temperature candidate region to obtain a position information data set.
s630: judging whether the data set contains a point (x, y) satisfying both min(x1, ..., x4) ≤ x ≤ max(x1, ..., x4) and min(y1, ..., y4) ≤ y ≤ max(y1, ..., y4); if so, overlapping exists and smoking behavior is judged to exist; otherwise, there is no overlap and no smoking behavior.
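Steps S610-S630 reduce to a point-in-rectangle test: smoking is flagged when any pixel of the high-temperature candidate region falls inside the axis-aligned cigarette candidate box. A sketch with illustrative names:

```python
def overlaps(candidate_box, hot_pixels):
    """Return True if any high-temperature pixel (x, y) lies inside the
    cigarette candidate box (x1, y1, x2, y2), per S610-S630."""
    x1, y1, x2, y2 = candidate_box
    return any(x1 <= x <= x2 and y1 <= y <= y2 for x, y in hot_pixels)
```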
Although the present invention has been described in detail with reference to specific embodiments thereof, it will be understood by those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A kitchen smoking detection method matched with a neural network and an infrared image is characterized by comprising the following steps:
s100: acquiring detection data including an infrared image and a visible light image of a target area;
s200: marking a fixed high-temperature region in the visible light image;
s300: carrying out position registration on the infrared image and the visible light image;
s400: extracting a high-temperature region which is not overlapped with the fixed high-temperature region from the infrared image as a high-temperature candidate region;
s500: detecting a head region from the visible light image by adopting a multitask convolutional neural network, and deforming the head region to obtain a candidate region of cigarette position information;
s600: detecting, according to the position registration relation between the infrared image and the visible light image, whether the high-temperature candidate region in the infrared image overlaps the cigarette position information candidate region in the visible light image; if so, judging that smoking behavior exists; otherwise, judging that no smoking behavior exists.
2. The kitchen smoking detection method based on matching of the neural network and the infrared image as claimed in claim 1, wherein:
before the detection data is collected, the camera for collecting the infrared image and the camera for collecting the visible light image are positioned in the same plane with their optical axes parallel.
3. The kitchen smoking detection method based on matching of the neural network and the infrared image as claimed in claim 1, wherein:
in step S200, a salient object detection method or manual labeling is used to mark the fixed high-temperature region.
4. The kitchen smoking detection method based on matching of the neural network and the infrared image as claimed in claim 1, wherein:
in step S300, a bilinear interpolation method is used to perform position registration on the infrared image and the visible light image.
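Claim 4 names bilinear interpolation for registering the two images. The interpolation step itself can be sketched as follows; the nested-list grayscale image representation and the function name are assumptions, and the coordinate mapping between the two cameras is outside this sketch:

```python
def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at fractional
    coordinates (x, y) by bilinear interpolation, as used when
    resampling the infrared image onto the visible-light pixel grid."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)  # clamp at the right/bottom edge
    y1 = min(y0 + 1, len(img) - 1)
    dx, dy = x - x0, y - y0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy
```

Sampling at the midpoint of a 2 × 2 block returns the average of its four pixel values.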
5. The kitchen smoking detection method based on matching of the neural network and the infrared image as claimed in claim 1, wherein:
step S400 specifically includes:
performing binarization segmentation on the infrared image; inputting the position information of the fixed high-temperature region into the binarized infrared image, and setting the gray value of the pixels at the position of the fixed high-temperature region to 0 to obtain a binarized infrared image without the fixed high-temperature region; and dilating the binarized infrared image without the fixed high-temperature region, wherein the region whose gray value is not 0 in the dilated binarized infrared image is the high-temperature candidate region.
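A pure-Python sketch of this pipeline on a small grid (the threshold value and the 3 × 3 dilation neighbourhood are assumptions; the patent specifies neither):

```python
def high_temp_candidates(ir, thresh, fixed_mask):
    """Binarize the infrared image, zero out the fixed high-temperature
    region, then dilate with a 3x3 neighbourhood; nonzero cells of the
    result form the high-temperature candidate region."""
    h, w = len(ir), len(ir[0])
    # Binarization, with pixels inside the fixed region forced to 0.
    binary = [[1 if ir[r][c] >= thresh and not fixed_mask[r][c] else 0
               for c in range(w)] for r in range(h)]
    # 3x3 morphological dilation.
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(any(
                binary[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))))
    return out
```

Masking out the fixed region before dilation ensures the stove and similar permanent heat sources never seed a candidate region.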
6. The kitchen smoking detection method based on matching of the neural network and the infrared image as claimed in claim 1, wherein:
the step S500 further includes:
S510: geometrically scaling the visible light image multiple times and using the scaled images to construct an image pyramid;
S520: performing a 3×3 convolution and 2×2 pooling on the image pyramid using the ProNet network model, performing two further 3×3 convolutions on the output of the 2×2 pooling, and preliminarily detecting and outputting all head region windows present in the visible light image;
S530: sequentially performing a 3×3 convolution and 3×3 pooling on the head region windows output by the ProNet network model using the RefineNet network model, then sequentially performing a further 3×3 convolution and 3×3 pooling, performing 2×2 pooling on the output of the second 3×3 pooling, and outputting optimized head region windows after a fully connected layer;
S540: performing a 3×3 convolution and 3×3 pooling on the optimized head region windows using the Output Network model, performing the 3×3 convolution and 3×3 pooling a second time, then performing a 3×3 convolution and 2×2 pooling on the output of the second 3×3 pooling, and outputting the head region after a 2×2 convolution and a fully connected layer;
S550: reducing the height of the head region to 1/2 of its original height so that only the lower half remains, then extending the region to the left and to the right by 1/10 of its original width each; the resulting regions are the cigarette position information candidate regions.
7. The kitchen smoking detection method based on matching of the neural network and the infrared image as claimed in claim 1, wherein:
step S600 further includes:
s610: acquiring the position information of the four vertices of the cigarette position information candidate region, the four vertices being denoted as (x1, y1), (x1, y2), (x2, y1) and (x2, y2);
S620: acquiring the pixel position information in the high-temperature candidate region to obtain a position information data set;
S630: determining whether there exists a point (x0, y0) in the position information data set simultaneously satisfying x1 ≤ x0 ≤ x2 and y1 ≤ y0 ≤ y2; if such a point exists, judging that overlap exists and that smoking behavior exists; otherwise, judging that there is no overlap and no smoking behavior.
8. A kitchen smoking detection system with a neural network matched with an infrared image is characterized by comprising:
the first module is used for acquiring detection data, including an infrared image and a visible light image of a target area;
the second module is used for marking a fixed high-temperature region in the visible light image;
the third module is used for carrying out position registration on the infrared image and the visible light image;
a fourth module for extracting a high temperature region not coinciding with the fixed high temperature region from the infrared image as a high temperature candidate region;
the fifth module is used for detecting a head region from the visible light image by adopting a multitask convolutional neural network, and deforming the head region to obtain a candidate region of cigarette position information;
the sixth module is used for detecting, according to the position registration relation between the infrared image and the visible light image, whether the high-temperature candidate region in the infrared image overlaps the cigarette position information candidate region in the visible light image, and if so, judging that smoking behavior exists; otherwise, judging that no smoking behavior exists.
CN202210652952.6A 2022-06-10 2022-06-10 Kitchen smoking detection method and system with neural network and infrared image matched Active CN114757979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210652952.6A CN114757979B (en) 2022-06-10 2022-06-10 Kitchen smoking detection method and system with neural network and infrared image matched

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210652952.6A CN114757979B (en) 2022-06-10 2022-06-10 Kitchen smoking detection method and system with neural network and infrared image matched

Publications (2)

Publication Number Publication Date
CN114757979A CN114757979A (en) 2022-07-15
CN114757979B true CN114757979B (en) 2022-08-23

Family

ID=82337190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210652952.6A Active CN114757979B (en) 2022-06-10 2022-06-10 Kitchen smoking detection method and system with neural network and infrared image matched

Country Status (1)

Country Link
CN (1) CN114757979B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115619756B (en) * 2022-10-31 2023-06-13 北京鹰之眼智能健康科技有限公司 Heart region identification method of human body infrared image

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009015519A (en) * 2007-07-03 2009-01-22 Omron Corp Smoker detection device, smoker alarm system, smoker monitoring server, unextinguished cigarette alarm device, smoker detection method and smoker detection program
JP2009023400A (en) * 2007-07-17 2009-02-05 Toyota Motor Corp Smoking detector
WO2021000664A1 (en) * 2019-07-03 2021-01-07 中国科学院自动化研究所 Method, system, and device for automatic calibration of differences in cross-modal target detection
CN112733950A (en) * 2021-01-18 2021-04-30 湖北工业大学 Power equipment fault diagnosis method based on combination of image fusion and target detection
CN113205075A (en) * 2021-05-31 2021-08-03 浙江大华技术股份有限公司 Method and device for detecting smoking behavior and readable storage medium
CN114202646A (en) * 2021-11-26 2022-03-18 深圳市朗驰欣创科技股份有限公司 Infrared image smoking detection method and system based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009015519A (en) * 2007-07-03 2009-01-22 Omron Corp Smoker detection device, smoker alarm system, smoker monitoring server, unextinguished cigarette alarm device, smoker detection method and smoker detection program
JP2009023400A (en) * 2007-07-17 2009-02-05 Toyota Motor Corp Smoking detector
WO2021000664A1 (en) * 2019-07-03 2021-01-07 中国科学院自动化研究所 Method, system, and device for automatic calibration of differences in cross-modal target detection
CN112733950A (en) * 2021-01-18 2021-04-30 湖北工业大学 Power equipment fault diagnosis method based on combination of image fusion and target detection
CN113205075A (en) * 2021-05-31 2021-08-03 浙江大华技术股份有限公司 Method and device for detecting smoking behavior and readable storage medium
CN114202646A (en) * 2021-11-26 2022-03-18 深圳市朗驰欣创科技股份有限公司 Infrared image smoking detection method and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Detection of fog and smoke particles with discrete near infrared light; Momma, E., et al.; Electronics & Communications in Japan; 20180702; full text *
High-precision detection of pedestrian facial temperature based on infrared thermal imaging; Yuan Haoqi, et al.; Infrared Technology; 20191231; Vol. 41, No. 12; full text *

Also Published As

Publication number Publication date
CN114757979A (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
CN111274976B (en) Lane detection method and system based on multi-level fusion of vision and laser radar
WO2018032660A1 (en) Moving target detection method and system
JP7272024B2 (en) Object tracking device, monitoring system and object tracking method
CN111199556B (en) Indoor pedestrian detection and tracking method based on camera
CN110084201B (en) Human body action recognition method based on convolutional neural network of specific target tracking in monitoring scene
CN112699801B (en) Fire identification method and system based on video image
CN111985314B (en) Smoke detection method based on ViBe and improved LBP
CN113608663B (en) Fingertip tracking method based on deep learning and K-curvature method
CN106407978B (en) Method for detecting salient object in unconstrained video by combining similarity degree
CN114757979B (en) Kitchen smoking detection method and system with neural network and infrared image matched
CN112084952B (en) Video point location tracking method based on self-supervision training
CN111739144A (en) Method and device for simultaneously positioning and mapping based on depth feature optical flow
CN110751635A (en) Oral cavity detection method based on interframe difference and HSV color space
CN115639248A (en) System and method for detecting quality of building outer wall
CN111461076A (en) Smoke detection method and smoke detection system combining frame difference method and neural network
CN111429485A (en) Cross-modal filtering tracking method based on self-adaptive regularization and high-reliability updating
TWI696958B (en) Image adaptive feature extraction method and its application
CN112749644B (en) Faster RCNN fire smoke detection method based on improved deformable convolution
CN116453029B (en) Building fire environment detection method based on image data
CN109409224B (en) Method for detecting flame in natural scene
CN113379772B (en) Mobile temperature measurement method based on background elimination and tracking algorithm in complex environment
CN116912670A (en) Deep sea fish identification method based on improved YOLO model
CN114998801A (en) Forest fire smoke video detection method based on contrast self-supervision learning network
CN114639013A (en) Remote sensing image airplane target detection and identification method based on improved Orient RCNN model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant