CN114842498A - Smoking behavior detection method and device

Info

Publication number: CN114842498A
Application number: CN202110144146.3A
Authority: CN
Inventors: 周光正
Original assignee: Beijing Jingdong Zhenshi Information Technology Co Ltd
Current assignee: Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date: 2021-02-02
Filing date: 2021-02-02
Publication date: 2022-08-02

Abstract

The invention discloses a smoking behavior detection method and device, and relates to the technical field of computers. One embodiment of the method comprises: determining a human body region in an image, and detecting whether a cigarette body exists in the human body region; if a cigarette body is detected, identifying the cigarette body position and the nose tip position of the human body and calculating the distance between them, and if the distance is less than or equal to a preset threshold, judging that smoking behavior exists; if no cigarette body is detected or the distance is greater than the preset threshold, identifying whether smoke exists in the image, and if so, judging that smoking behavior possibly exists, otherwise judging that no smoking behavior exists. This embodiment takes into account both the strict definition of smoking behavior and the accompanying smoke, and uses several deep learning techniques to establish strict algorithm logic, so as to judge smoking behavior accurately, avoid misjudgments and missed detections, and achieve the aim of effectively preventing fire.

Description

Smoking behavior detection method and device

Technical Field

The invention relates to the technical field of computers, in particular to a smoking behavior detection method and device.

Background

Smoking behavior generally refers to the situation in which a person holds a cigarette in the mouth and smokes continuously, which typically produces a continuous stream of smoke. At present there are two main approaches to smoking behavior detection: one detects smoke with a sensor; the other uses computer vision to analyze directly whether an image exhibits smoking characteristics, and mainly comprises machine learning methods and deep learning methods. In the process of implementing the invention, the inventor found that the prior art has at least the following problems:

1. The smoke produced by smoking is generally not very noticeable and disperses rapidly by diffusion. Sensors typically become sensitive only when the smoke concentration is sufficiently high, and they are not suitable for environments with a large amount of dust or with high temperature and humidity. Moreover, smoke can have many causes in practice, so even if smoke is detected it cannot be concluded that smoking behavior is present.

2. The hand-crafted features used by machine learning methods do not generalize well, which results in many false positives and missed detections. For deep learning methods, a real scene usually involves a large number of people who change continually, so sample collection is difficult. Moreover, such algorithms are generally not applicable to images of a person seen from the side: side-on smoking does not show a complete mouth region, so the mouth region cannot be used to determine effectively whether smoking behavior is present.

Disclosure of Invention

In view of this, embodiments of the present invention provide a smoking behavior detection method and apparatus, which can at least solve the problems in the prior art of poor generality and unsuitability for detecting smoking by a person seen from the side.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a smoking behavior detection method including:

determining a human body region in the image, and detecting whether a cigarette body exists in the human body region;

if a cigarette body is detected, identifying the cigarette body position and the nose tip position of the human body and calculating the distance between them, and if the distance is less than or equal to a preset threshold, judging that smoking behavior exists;

if no cigarette body is detected or the distance is greater than the preset threshold, identifying whether smoke exists in the image, and if so, judging that smoking behavior possibly exists, otherwise judging that no smoking behavior exists.

Optionally, the determining the human body region in the image further includes: for a single human body area, determining the width and the height of the single human body area, and taking the product of the width and an external expansion coefficient as an external expansion value in the horizontal direction; and respectively carrying out outward expansion on the single human body area to the left and the right in the horizontal direction based on the outward expansion value to obtain the single human body area after the outward expansion.

Optionally, after obtaining the expanded human body region, the method further includes: and correcting the boundary coordinates of the single body area subjected to the external expansion based on the resolution of the image in the horizontal direction and the vertical direction to obtain the single body area subjected to the external expansion and correction.

Optionally, the width and height are the width and height of a minimum regular rectangle capable of covering the single human body area.

Optionally, the detecting whether a cigarette body exists in each human body region includes: for a single human body region, extracting first image features of the single human body region by using a first convolution module, then detecting whether the first image features contain a cigarette body of a first size by using a first detection module, and if so, determining cigarette body position information; performing transition processing on the first image features by using a first transition module, extracting second image features from the transitioned first image features by using a second convolution module, detecting whether the second image features contain a cigarette body of a second size by using a second detection module, and if so, determining cigarette body position information; performing transition processing on the second image features by using a second transition module, extracting third image features from the transitioned second image features by using a third convolution module, detecting whether the third image features contain a cigarette body of a third size by using a third detection module, and if so, determining cigarette body position information; and summarizing the detection information to obtain the detection results and position information of cigarette bodies of different sizes in the single human body region.

Optionally, the method further includes: acquiring the sizes of all annotated anchor boxes, and clustering them to obtain the plurality of anchor box sizes with the largest clustering results; and assigning the plurality of anchor box sizes to the first detection module, the second detection module and the third detection module, and identifying the cigarette body region by fine-tuning the anchor box sizes.
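The anchor box clustering step can be illustrated with a minimal sketch. The text does not name the clustering algorithm, the distance measure, or how sizes are paired with detection modules, so everything below is an assumption made for illustration: plain k-means with Euclidean distance on (width, height) pairs, a deterministic initialization, and an equal split of the area-sorted centres. The names `cluster_anchor_sizes` and `split_across_modules` are hypothetical.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height) of an annotated box


def cluster_anchor_sizes(sizes: List[Size], k: int, iterations: int = 50) -> List[Size]:
    """Plain k-means on (width, height) pairs; returns k centres sorted by area."""
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    # Assumed deterministic initialization: evenly spaced samples of the
    # area-sorted sizes.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups: List[List[Size]] = [[] for _ in range(k)]
        for w, h in sizes:
            nearest = min(
                range(k),
                key=lambda c: (w - centres[c][0]) ** 2 + (h - centres[c][1]) ** 2,
            )
            groups[nearest].append((w, h))
        # Move each centre to the mean of its group; keep it if the group is empty.
        updated = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if updated == centres:
            break
        centres = updated
    return sorted(centres, key=lambda s: s[0] * s[1])


def split_across_modules(anchors: List[Size], modules: int = 3) -> List[List[Size]]:
    """Give each detection module an equal share of the area-sorted anchors."""
    per_module = len(anchors) // modules
    return [anchors[i * per_module:(i + 1) * per_module] for i in range(modules)]
```

Which share goes to the first, second or third detection module is left open here, because the text does not state which module handles which cigarette body size.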

Optionally, the method further includes: and detecting all human key points in the image by adopting a human key point algorithm, and clustering each key point to the corresponding individual region to determine the nose tip position of each human region.

Optionally, the distance between the cigarette body position and the human nose tip position is identified and calculated, and if the distance is smaller than or equal to a preset threshold, it is determined that smoking behavior exists, including: determining the coordinate value of the upper left corner of a single cigarette body area, and calculating the distance between the upper left corner and the nose tip of the human body in the vertical direction; and calculating the product of the width of the single human body area and a preset distance measurement coefficient, and if the distance is less than or equal to the product, judging that smoking behavior exists in the single cigarette body area.

Optionally, the identifying whether smoke exists in the image includes: identifying the probability of smoke existing in the image based on a multi-feature fusion smoke identification algorithm; the multi-feature fused smoke recognition algorithm comprises a deep learning algorithm and an image algorithm.

Optionally, the image algorithm includes a histogram of oriented gradients algorithm and a local binary pattern algorithm; identifying the probability that smoke exists in the image based on the multi-feature fusion smoke recognition algorithm comprises: extracting a feature map of the image by using the deep learning algorithm, and converting the feature map into a one-dimensional first vector with a first length through a flattening layer, wherein the first length is the product of the sizes of the feature map in its three dimensions; converting the first vector into a second vector with a second length through a first fully connected layer, and then converting the second vector into a third vector with a third length through a second fully connected layer; extracting a vector of the image by using the local binary pattern algorithm, and converting the vector into a fourth vector with a fourth length through a third fully connected layer; extracting a vector of the image by using the histogram of oriented gradients algorithm, and converting the vector into a fifth vector with a fifth length through a fourth fully connected layer; fusing the third vector, the fourth vector and the fifth vector to obtain a total vector, wherein the length of the total vector is the sum of the third length, the fourth length and the fifth length; converting the total vector into a seventh vector with a seventh length through a fifth fully connected layer, converting the seventh vector into an eighth vector with a length of 1 through a sixth fully connected layer, and taking the modulus of the eighth vector as the probability that smoke exists in the image.

To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided a smoking behavior detection apparatus including: a detection module, used for determining a human body region in the image and detecting whether a cigarette body exists in the human body region; a distance judgment module, used for identifying the cigarette body position and the nose tip position of the human body and calculating the distance between them if a cigarette body is detected, and judging that smoking behavior exists if the distance is less than or equal to a preset threshold; and a smoke recognition module, used for identifying whether smoke exists in the image if no cigarette body is detected or the distance is greater than the preset threshold, judging that smoking behavior possibly exists if smoke exists, and otherwise judging that no smoking behavior exists.
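The vector lengths in the fusion network can be traced with a small sketch in pure Python. It is not a trained model: the weights are random placeholders, the layer lengths (8, 4, 3, 5, 6) are arbitrary, and the ReLU and sigmoid activations are assumptions, since the text specifies only the layer sequence and the final modulus. The names `dense` and `fuse` are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def dense(x: Vector, out_len: int, rng: random.Random, relu: bool = True) -> Vector:
    """Fully connected layer with random placeholder weights (not trained)."""
    out = []
    for _ in range(out_len):
        # Dividing by len(x) keeps the magnitudes bounded for this illustration.
        value = sum(rng.uniform(-1.0, 1.0) * v for v in x) / len(x)
        out.append(max(0.0, value) if relu else value)
    return out


def fuse(feature_map, lbp_vec: Vector, hog_vec: Vector, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    # Flattening layer: first length = H * W * C of the feature map.
    first = [v for row in feature_map for cell in row for v in cell]
    second = dense(first, 8, rng)        # first fully connected layer
    third = dense(second, 4, rng)        # second fully connected layer
    fourth = dense(lbp_vec, 3, rng)      # third fully connected layer (LBP branch)
    fifth = dense(hog_vec, 5, rng)       # fourth fully connected layer (HOG branch)
    total = third + fourth + fifth       # fusion by concatenation: 4 + 3 + 5
    seventh = dense(total, 6, rng)       # fifth fully connected layer
    eighth = dense(seventh, 1, rng, relu=False)  # sixth layer, output length 1
    # Assumed sigmoid so that the modulus lies in [0, 1].
    probability = abs(1.0 / (1.0 + math.exp(-eighth[0])))
    return {
        "first_length": len(first),
        "total_length": len(total),
        "probability": probability,
    }
```

With a 2 x 2 x 2 feature map the first vector has length 8, and the total vector has length 4 + 3 + 5 = 12 regardless of the input sizes of the two texture branches.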

Optionally, the detection module is further configured to: for a single human body area, determining the width and the height of the single human body area, and taking the product of the width and an external expansion coefficient as an external expansion value in the horizontal direction; and respectively carrying out outward expansion on the single human body region in the left and right directions in the horizontal direction based on the outward expansion value to obtain an outward expanded single human body region.

Optionally, the detection module is configured to: and correcting the boundary coordinates of the single body area subjected to the external expansion based on the resolution of the image in the horizontal direction and the vertical direction to obtain the single body area subjected to the external expansion and correction.

Optionally, the detection module is configured to: for a single human body region, extracting first image features of the single human body region by using a first convolution module, then detecting whether the first image features contain a cigarette body of a first size by using a first detection module, and if so, determining cigarette body position information; performing transition processing on the first image features by using a first transition module, extracting second image features from the transitioned first image features by using a second convolution module, detecting whether the second image features contain a cigarette body of a second size by using a second detection module, and if so, determining cigarette body position information; performing transition processing on the second image features by using a second transition module, extracting third image features from the transitioned second image features by using a third convolution module, detecting whether the third image features contain a cigarette body of a third size by using a third detection module, and if so, determining cigarette body position information; and summarizing the detection information to obtain the detection results and position information of cigarette bodies of different sizes in the single human body region.

Optionally, the detection module is further configured to: acquiring the sizes of all annotated anchor boxes, and clustering them to obtain the plurality of anchor box sizes with the largest clustering results; and assigning the plurality of anchor box sizes to the first detection module, the second detection module and the third detection module, and identifying the cigarette body region by fine-tuning the anchor box sizes.

Optionally, the detection module is further configured to: and detecting all human key points in the image by adopting a human key point algorithm, and clustering each key point to the corresponding individual region to determine the nose tip position of each human region.

Optionally, the distance determining module is configured to: determining the coordinate value of the upper left corner of a single cigarette body area, and calculating the distance between the upper left corner and the nose tip of the human body in the vertical direction; and calculating the product of the width of the single human body area and a preset distance measurement coefficient, and if the distance is less than or equal to the product, judging that smoking behavior exists in the single cigarette body area.

Optionally, the smoke recognition module is configured to: identifying the probability of smoke existing in the image based on a multi-feature fusion smoke identification algorithm; the multi-feature fused smoke recognition algorithm comprises a deep learning algorithm and an image algorithm.

Optionally, the image algorithm includes a histogram of oriented gradients algorithm and a local binary pattern algorithm; the smoke recognition module is configured to: extracting a feature map of the image by using the deep learning algorithm, and converting the feature map into a one-dimensional first vector with a first length through a flattening layer, wherein the first length is the product of the sizes of the feature map in its three dimensions; converting the first vector into a second vector with a second length through a first fully connected layer, and then converting the second vector into a third vector with a third length through a second fully connected layer; extracting a vector of the image by using the local binary pattern algorithm, and converting the vector into a fourth vector with a fourth length through a third fully connected layer; extracting a vector of the image by using the histogram of oriented gradients algorithm, and converting the vector into a fifth vector with a fifth length through a fourth fully connected layer; fusing the third vector, the fourth vector and the fifth vector to obtain a total vector, wherein the length of the total vector is the sum of the third length, the fourth length and the fifth length; converting the total vector into a seventh vector with a seventh length through a fifth fully connected layer, converting the seventh vector into an eighth vector with a length of 1 through a sixth fully connected layer, and taking the modulus of the eighth vector as the probability that smoke exists in the image.

To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device for smoking behavior detection.

The electronic device of the embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the above smoking behavior detection methods.

To achieve the above object, according to a further aspect of the embodiments of the present invention, there is provided a computer readable medium having a computer program stored thereon, the program, when executed by a processor, implementing any of the smoking behavior detection methods described above.

According to the scheme provided by the invention, one embodiment of the invention has the following advantages or beneficial effects: based on a computer vision technology, a high-precision smoking behavior detection algorithm scheme suitable for front and side smoking is established, strict definition of smoking behavior and accompanying smoke phenomena are comprehensively considered, a plurality of algorithm ideas of deep learning are utilized, strict algorithm logic is established, the smoking behavior is accurately judged, various misjudgments and missed judgments are avoided, and the purpose of effectively preventing fire is achieved.

Further effects of the above non-conventional optional implementations are described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic main flow chart of a smoking behavior detection method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a human body region detection method according to an embodiment of the invention;

FIG. 3 is a schematic flow chart of a cigarette body detection method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a network architecture of a smoke detection algorithm;

FIG. 5 is a schematic flow chart of a smoke recognition method according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a network structure based on a multi-feature fusion algorithm;

FIG. 7(a) is a schematic flow chart of an algorithm for smoking behavior detection;

FIG. 7(b) is a schematic diagram of the algorithmic logic of the distance metric;

FIGS. 8(a) and 8(b) show two smoking behavior recognition results;

FIG. 9 is a schematic diagram of the main modules of a smoking behavior detection apparatus according to an embodiment of the present invention;

FIG. 10 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 11 is a schematic block diagram of a computer system suitable for use with a mobile device or server implementing an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Smoking may cause fires, especially in places that hold many flammable materials, such as the warehouses, stations and distribution centers of logistics enterprises. Fires often cause significant economic losses and can even endanger personal safety. To prevent fires, smoking is prohibited in many places. In reality, however, some people smoke in inappropriate environments regardless of the relevant laws and regulations. Automatically detecting smoking in important places, so that it can be discovered and stopped as early as possible, is therefore of great significance for safe production in enterprises. The existing computer vision techniques and their shortcomings are described in detail here:

Current detection schemes for smoking behavior mainly fall into two types. The first is the smoke detector, a widely used automatic smoke detection technology that senses smoke in the surrounding environment with a built-in sensor and raises an alarm in time. However, it becomes sensitive only when the smoke concentration is high enough, so it has little early-warning value for smoking behavior. The second analyzes directly whether an image exhibits smoking characteristics based on computer vision: the surveillance images captured by a camera are taken as the input of the relevant algorithm, and possible smoking behavior is detected non-intrusively.

The application of computer vision technology in the aspect of smoking behavior detection is mainly divided into a machine learning method and a deep learning method.

1) Machine learning methods are generally based on hand-designed image feature extraction algorithms, combined with certain detection logic, to judge whether an image exhibits smoking characteristics. However, these hand-crafted features do not generalize well, resulting in many false positives and false negatives.

2) Deep learning methods have developed rapidly in recent years, and excellent algorithms continue to appear. Current applications of deep learning to smoking detection mainly use classification algorithms, which perform binary classification on the whole image or on the region near the person's mouth and output whether or not smoking is present in the image. The main problem with classification algorithms is that the sample coverage must be wide enough to ensure that the model generalizes well; in addition, the samples generally need to contain both the smoking and non-smoking states of the person concerned, seen from the front.

Since a real-world scene usually involves a large number of people who change continually, sample collection is very difficult. Moreover, such algorithms are generally not applicable to images of a person seen from the side: side-on smoking does not show a complete mouth region, so the mouth region cannot be used to determine effectively whether smoking behavior is present. In conclusion, the recognition rate of the binary classification approach is not very high, and deep-learning-based smoking behavior detection algorithms still need further improvement.

Referring to fig. 1, a main flowchart of a smoking behavior detection method provided by an embodiment of the present invention is shown, including the following steps:

S101: determining a human body region in the image, and detecting whether a cigarette body exists in the human body region;

S102: if a cigarette body is detected, identifying the cigarette body position and the nose tip position of the human body and calculating the distance between them, and if the distance is less than or equal to a preset threshold, judging that smoking behavior exists;

S103: if no cigarette body is detected or the distance is greater than the preset threshold, identifying whether smoke exists in the image, and if so, judging that smoking behavior possibly exists, otherwise judging that no smoking behavior exists.
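The branching of steps S101 to S103 can be summarized in a short sketch. It assumes that the detectors have already run and that their outputs are passed in as plain values; the names `Verdict` and `judge_smoking` are hypothetical and do not come from the source.

```python
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    SMOKING = "smoking behavior exists"
    POSSIBLE = "smoking behavior possibly exists"
    NONE = "no smoking behavior"


def judge_smoking(
    cigarette_nose_distances: Sequence[float],
    distance_threshold: float,
    smoke_in_image: bool,
) -> Verdict:
    """Apply the S102/S103 branching to already-computed detector outputs.

    cigarette_nose_distances holds one distance per detected cigarette body;
    an empty sequence means that no cigarette body was detected in S101.
    """
    # S102: a cigarette body close enough to the nose tip is conclusive.
    if any(d <= distance_threshold for d in cigarette_nose_distances):
        return Verdict.SMOKING
    # S103: otherwise fall back to the smoke recognition result.
    return Verdict.POSSIBLE if smoke_in_image else Verdict.NONE
```

For example, `judge_smoking([], 10.0, True)` yields `Verdict.POSSIBLE`: no cigarette body was found, but smoke in the image still triggers an early warning.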

In the above embodiment, for step S101, after acquiring the image, the present solution detects the human body regions in the image with a human body detection algorithm (see the description of FIG. 2 below), detects whether a cigarette body exists in each human body region, and outputs the possible cigarette body positions (see the description of FIG. 3 below). If a human body region contains a cigarette body, the position of the person's mouth, that is, the region in direct contact with the cigarette body, would then need to be detected; however, because the mouth region generally covers a fairly large area, the nose tip, a single point adjacent to the mouth region, is used instead.

The nose tip is detected mainly with a human body key point algorithm, which can simultaneously detect key points including the nose tip, left eye, right eye, left ear, right ear, left shoulder and right shoulder. The algorithm comprises two stages, key point detection (covering the main human joints as well as key positions such as the nose tip) and key point clustering: all human body key points in the image are detected first, and the key points are then clustered to the corresponding individual regions. It should be noted that the human body key point algorithm does not use a common clustering algorithm; instead it applies some special strategies to assign each key point to the corresponding person's region.
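The text does not disclose the special strategies used to assign key points to individuals. Purely as an illustration of the second stage, the sketch below uses a much simpler stand-in that is an assumption, not the patented strategy: each detected nose tip is assigned to a body box that contains it, preferring the box whose horizontal centre is nearest. The name `assign_nose_tips` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def assign_nose_tips(boxes: List[Box], nose_tips: List[Point]) -> Dict[int, Optional[Point]]:
    """Map each body-box index to one nose tip lying inside it, or None."""
    assigned: Dict[int, Optional[Point]] = {i: None for i in range(len(boxes))}
    for x, y in nose_tips:
        best, best_dist = None, float("inf")
        for i, (x0, y0, x1, y1) in enumerate(boxes):
            if not (x0 <= x <= x1 and y0 <= y <= y1):
                continue  # the nose tip is outside this body box
            # Prefer the box whose horizontal centre is closest to the nose tip.
            dist = abs(x - (x0 + x1) / 2)
            if dist < best_dist:
                best, best_dist = i, dist
        # Keep the first nose tip assigned to a box; later ones are ignored.
        if best is not None and assigned[best] is None:
            assigned[best] = (x, y)
    return assigned
```

A body box that receives no nose tip maps to `None`, in which case the distance check cannot be applied to that region.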

In step S102, smoking behavior is determined from the distance between the nose tip and the cigarette body: if the distance is smaller than a preset threshold, it is determined that smoking behavior exists. For a human body region, suppose that the nose tip position output by the human body key point detection algorithm is $(q_x, q_y)$, the upper-left corner of the j-th cigarette body region output by the cigarette body detection algorithm is $(s[j][0], s[j][1])$, and its lower-right corner is $(s[j][2], s[j][3])$. Let the width and height of each human body region in the image, for example the k-th human body region, be $w_k$ and $h_k$ respectively ($w'_k$ and $h'_k$ after expansion and correction). The general decision criterion for smoking behavior is as follows:

$$|s[j][1] - q_y| < \beta w'_k$$

where β is a distance metric coefficient whose value is chosen according to the practical situation, generally between 0 and 1, for example 0.3. However, if a cigarette held in the hand, which is relatively far from the mouth or nose tip, is also to be covered, the value can be increased. Since the width of a human body region is generally smaller than its height, and the horizontal distance between the cigarette body and the nose tip is generally relatively small, the criterion mainly checks whether the vertical distance between the cigarette body and the nose tip is less than or equal to a preset threshold. On the other hand, so that the criterion applies to human body regions of any image resolution, a relative distance between the two is used, which is more general. Typically, the picture contains the entire human body in the horizontal direction but may contain only part of it in the vertical direction, such as the upper body; the criterion therefore takes the width $w'_k$ of the human body region as the reference, and the distance judgment needs to be performed for each human body region.
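The criterion can be written as a short function. This is a minimal sketch: the name `is_smoking` and the argument layout are hypothetical, and the comparison uses "less than or equal to", following the wording of the claims, whereas the formula above is written with a strict inequality.

```python
from typing import Sequence


def is_smoking(
    cigarette_box: Sequence[float],
    nose_tip: Sequence[float],
    body_width: float,
    beta: float = 0.3,
) -> bool:
    """Return True when the cigarette body is vertically close to the nose tip.

    cigarette_box is (x_min, y_min, x_max, y_max), i.e. s[j][0..3];
    nose_tip is (q_x, q_y); body_width is w'_k after expansion and correction.
    """
    # Only the vertical offset of the box's upper-left corner is compared.
    vertical_distance = abs(cigarette_box[1] - nose_tip[1])
    return vertical_distance <= beta * body_width
```

With a body region 100 pixels wide and β = 0.3, a cigarette body whose upper-left corner lies 15 pixels below the nose tip satisfies the criterion, while one 120 pixels below, for example held in a lowered hand, does not.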

For step S103, the distance metric is a relatively strict criterion for smoking behavior; in practice, an early warning is also needed for situations in which smoking behavior may exist, such as a person and smoke appearing in the image at the same time. Therefore, if a human body is detected in the image but, under the distance metric scheme, either no cigarette body is detected or the distance criterion is not satisfied, smoke recognition is further performed on the image.

The method combines the advantages of deep learning algorithms and traditional image algorithms, and uses a smoke recognition algorithm based on multi-feature fusion to recognize whether smoke exists in the image. Deep-learning-based image classification algorithms generally have a strong ability to extract deep image features, but these features are hard to interpret and the model is close to a black box; meanwhile, because smoke is diffuse and variable in form, deep learning models often generalize insufficiently. On the other hand, smoke regions usually present a distinctive texture, and traditional image algorithms, each of which extracts a particular kind of local image feature, have certain advantages in capturing smoke texture. Therefore, the multi-feature fusion model makes combined use of the different kinds of image feature information extracted by the deep learning algorithm and the traditional image algorithms, and can effectively improve the accuracy of smoke recognition.

The embodiment is based on a computer vision technology, and a high-precision smoking behavior detection algorithm scheme which is simultaneously suitable for front and side smoking is constructed. The scheme comprehensively considers the strict definition of the smoking behavior and the accompanying smoke phenomenon, and utilizes various algorithm ideas of deep learning to establish strict algorithm logic so as to accurately judge the smoking behavior, avoid various misjudgments and missed judgments and achieve the aim of effectively preventing fire.

Referring to fig. 2, a schematic flow chart of a human body region detection method according to an embodiment of the present invention is shown, including the following steps:

S201: detecting human bodies in the image and outputting each region in which a human body exists;

S202: for a single human body region, determining the width and the height of the single human body region, and taking the product of the width and an expansion coefficient as the expansion value in the horizontal direction;

S203: expanding the single human body region outward to the left and to the right in the horizontal direction based on the expansion value to obtain the expanded single human body region.

In the above embodiment, regarding step S201, the present embodiment mainly describes that human bodies in the image are detected by a human body detection algorithm, and each region where a human body exists is output, so that smoke body detection is limited to the human body region, so as to eliminate interference of irrelevant background regions in the image on smoke body detection.

In a real scenario, there may be many elongated objects like cigarettes. Human body detection essentially belongs to the field of target detection of computer vision, and human body regions in images can be labeled based on public data sets (such as COCO and the like) containing human body targets and personal collected data, so that a human body detection model is trained.

A human body detection algorithm usually outputs only the tight human body region in the picture, but in some cases, such as smoking seen from the side, the cigarette body may extend beyond the human body. To ensure that the output region does not lose a possible cigarette body, the region output by the algorithm is expanded to some extent to the left and to the right along the horizontal direction. When a person is smoking, the cigarette generally does not rise above the top of the head, so no expansion in the vertical direction is needed.

Suppose the width and height of the region occupied by the i-th person in the image are $w_i$ and $h_i$, the coordinates of the upper-left corner of the region are $(p[i][0], p[i][1])$, and the coordinates of the lower-right corner are $(p[i][2], p[i][3])$. For the expanded human body region, the minimum horizontal coordinate $p[i][0]$ and the maximum horizontal coordinate $p[i][2]$ are calculated as follows:

$$p[i][0] = p[i][0] - \alpha w_i$$

$$p[i][2] = p[i][2] + \alpha w_i$$

where $\alpha$ is the expansion coefficient, preferably about 0.1; all human body regions in the image may share one expansion coefficient. Note that, since a human body region in an ordinary image is not a regular rectangle, $w_i$ and $h_i$ here refer to the width and height of the smallest regular rectangle that covers the entire human body region.

However, a human body region in the image may be large, and expanding it may push its boundary coordinates outside the coordinate range of the image, so that the corresponding region cannot be output. The boundary coordinates of the expanded human body region therefore need to be corrected as follows:

$$p[i][0] = 0, \quad \text{if } p[i][0] < 0$$

$$p[i][2] = W, \quad \text{if } p[i][2] > W$$

where $W$ is the horizontal resolution of the image. This coordinate adjustment ensures that the final expanded human body region still falls within the original image.

The method provided by this embodiment expands the human body region output by the human body detection algorithm and corrects its boundary coordinates, so that interference from cigarette-like objects in the background with the subsequent cigarette body detection is eliminated.
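The expansion and boundary correction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `expand_body_region`, the tuple layout of the box and the default value of `alpha` (taken from the "about 0.1" suggestion in the text) are assumptions made for the example.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), i.e. p[i][0..3] in the text.
Box = Tuple[float, float, float, float]


def expand_body_region(box: Box, image_width: float, alpha: float = 0.1) -> Box:
    """Expand a body box horizontally by alpha * width, then clamp it to [0, W]."""
    x_min, y_min, x_max, y_max = box
    margin = alpha * (x_max - x_min)  # expansion value: alpha * w_i
    new_x_min = max(0.0, x_min - margin)  # p[i][0] = 0 if p[i][0] < 0
    new_x_max = min(float(image_width), x_max + margin)  # p[i][2] = W if p[i][2] > W
    # The vertical coordinates are left unchanged, as in the description.
    return (new_x_min, y_min, new_x_max, y_max)
```

For example, a box that already spans most of a 640-pixel-wide image is clamped to the image borders instead of producing out-of-range coordinates.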

Referring to fig. 3, a schematic flow chart of a smoke detection method according to an embodiment of the invention is shown, including the following steps:

S301: for a single human body region, extracting first image features of the region with a first convolution module, then detecting with a first detection module whether the first image features contain a cigarette body of a first size, and if so, determining the cigarette body position information;

S302: performing transition processing on the first image features with a first transition module, extracting second image features from the transitioned first image features with a second convolution module, then detecting with a second detection module whether the second image features contain a cigarette body of a second size, and if so, determining the cigarette body position information;

S303: performing transition processing on the second image features with a second transition module, extracting third image features from the transitioned second image features with a third convolution module, then detecting with a third detection module whether the third image features contain a cigarette body of a third size, and if so, determining the cigarette body position information;

S304: aggregating the detection information to obtain the detection results and position information of cigarette bodies of different sizes in the single human body region.

In the above embodiment, in step S301, the cigarette body in each corrected human body region is further identified by the cigarette body detection model. Because the detection target is a cigarette body within a human body region, the training pictures for the cigarette body detection model can either be processed by the human body detection algorithm first, with cigarette bodies then labeled in the output human body region pictures, or be labeled directly in the original pictures. In theory the former works better because it eliminates interference from irrelevant regions, so the former method is preferred.

In order to ensure that the algorithm can identify cigarette bodies with different sizes, the invention designs a multi-size detection algorithm scheme, and the specific network structure is shown in figure 4:

The Backbone network is mainly used to extract image features.

Conv 1a, Conv 1b and Conv 1c are three convolution modules that extract image features at different scales. Each scale uses 3 anchor boxes (the number is only an example), whose sizes are obtained by a clustering algorithm, so this design covers the sizes of most cigarette bodies.

Trans 1a and Trans 1b are the corresponding transition modules, which further process the image features extracted in the previous step, for example by changing the number of feature channels or by upsampling. Their output, the processed image features, serves as the input of the subsequent convolution module, which helps improve the overall prediction accuracy of the algorithm.

Conv 2a, Conv 2b and Conv 2c are three detection modules at different scales, used to detect cigarette bodies of different sizes and their positions.

Their input feature maps become successively smaller while the receptive field becomes progressively larger, so they are suited to detecting small, medium and large cigarette bodies respectively. By aggregating the results of the three detection modules (the detection results are aggregated, not the three feature maps), the sizes and positions of the cigarette bodies in the image are obtained and the final cigarette body detection conclusion is output. If the aggregated result contains no cigarette body, the image does not contain one.
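The aggregation of the three detection modules can be illustrated with a short sketch. The detection tuple layout, the head names and the confidence threshold `min_conf` are hypothetical choices for this example; the description does not specify how the results are merged beyond collecting them.

```python
from typing import Dict, List, Tuple

# Assumed detection layout: (x_min, y_min, x_max, y_max, confidence).
Detection = Tuple[float, float, float, float, float]


def summarize_detections(
    per_head: Dict[str, List[Detection]], min_conf: float = 0.5
) -> List[Detection]:
    """Merge the detections of all heads, keeping those above an assumed threshold."""
    merged = [d for dets in per_head.values() for d in dets if d[4] >= min_conf]
    merged.sort(key=lambda d: d[4], reverse=True)  # most confident first
    return merged


def contains_cigarette(per_head: Dict[str, List[Detection]]) -> bool:
    """An empty aggregated result means the region contains no cigarette body."""
    return len(summarize_detections(per_head)) > 0
```

Only the detection results are combined here; the feature maps of the three scales are never merged, which matches the note in the text.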

In addition, based on the sizes (the width and height of the rectangle) of all labeled boxes in the training data, the n most representative anchor box sizes are obtained with the K-Means clustering algorithm, for example n = 9. The n anchor boxes are distributed among the three detection layers, preferably evenly, so that each detection layer has n/3 anchor boxes.
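A minimal sketch of this anchor selection is given below. It uses plain Euclidean K-Means on (width, height) pairs with a deterministic initialisation; the distance measure, the initialisation and the function names are assumptions made for the example, since the text only names K-Means and the even split across three detection layers.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height) of a labeled box


def kmeans_anchor_sizes(sizes: List[Size], n: int = 9, iters: int = 50) -> List[Size]:
    """Cluster labeled box sizes into n anchor sizes, returned sorted by area."""
    if len(sizes) < n:
        raise ValueError("need at least n labeled boxes")
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    # Assumed initialisation: n samples evenly spaced over the area-sorted list.
    step = len(ordered) / n
    centers = [ordered[int(i * step)] for i in range(n)]
    for _ in range(iters):
        clusters: List[List[Size]] = [[] for _ in range(n)]
        for w, h in ordered:
            nearest = min(
                range(n),
                key=lambda k: (w - centers[k][0]) ** 2 + (h - centers[k][1]) ** 2,
            )
            clusters[nearest].append((w, h))
        new_centers = []
        for k, members in enumerate(clusters):
            if members:
                mean_w = sum(m[0] for m in members) / len(members)
                mean_h = sum(m[1] for m in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[k])  # keep the centre of an empty cluster
        if new_centers == centers:
            break  # assignments have converged
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


def assign_to_layers(anchors: List[Size], layers: int = 3) -> List[List[Size]]:
    """Split area-sorted anchors evenly: smallest anchors go to the first layer."""
    per_layer = len(anchors) // layers
    return [anchors[i * per_layer:(i + 1) * per_layer] for i in range(layers)]
```

With n = 9 and three layers, `assign_to_layers` returns three groups of three anchors, ordered from the smallest to the largest sizes.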

The loss function is basic to object detection algorithms in computer vision and quantifies the deviation of the prediction from the actual situation. It consists of the following terms:

$$Loss = L_{box} + L_{cls} + L_{obj}$$

where $L_{box}$ is the loss caused by deviation of the target box size and position, $L_{cls}$ is the loss caused by class prediction errors, and $L_{obj}$ is the loss caused by target confidence. As described above, the algorithm first sets 9 anchor box sizes, but actual cigarette body sizes will generally not equal these 9 sizes exactly. The preset sizes of the 9 anchor boxes are therefore fine-tuned to bring them as close as possible to the actual cigarette body sizes, thereby identifying the cigarette body region in the image.

According to the method provided by this embodiment, cigarette body detection uses a single-stage object detection algorithm: detection is treated as a regression problem over cigarette body position and class information, and the detection result is output directly by a neural network model, giving high computational efficiency and good real-time performance.

Referring to fig. 5, a schematic flow chart of a smoke recognition method according to an embodiment of the present invention is shown, including the following steps:

S501: extracting a feature map of the image with a deep learning algorithm, and converting the feature map through a flattening layer into a one-dimensional first vector of a first length, where the first length is the product of the sizes of the feature map in its three dimensions;

S502: converting the first vector into a second vector of a second length through a first fully connected layer, and then converting the second vector into a third vector of a third length through a second fully connected layer;

S503: extracting a vector of the image with a local binary pattern, and converting it into a fourth vector of a fourth length through a third fully connected layer;

S504: extracting a vector of the image with a histogram of oriented gradients algorithm, and converting it into a fifth vector of a fifth length through a fourth fully connected layer;

S505: fusing the third vector, the fourth vector and the fifth vector to obtain a total vector, whose length is the sum of the third length, the fourth length and the fifth length;

S506: converting the total vector into a seventh vector of a seventh length through a fifth fully connected layer, converting the seventh vector into an eighth vector of length 1 through a sixth fully connected layer, and taking the modulus of the eighth vector as the probability that smoke exists in the image.

In the above embodiment, as for step S501, the deep learning algorithm and the conventional algorithm in the multi-feature fusion algorithm may be selected according to the actual scene features, the algorithm accuracy requirement, the algorithm real-time requirement, and the like.

Deep learning algorithms include ResNet, VGG, GoogLeNet and the like; relevant traditional image algorithms include the LBP (Local Binary Pattern) algorithm, the HOG (Histogram of Oriented Gradients) algorithm and the like. The LBP algorithm is highly sensitive to the local texture features of each image region and offers gray-scale invariance, rotation invariance and similar advantages. The HOG algorithm describes contour features in the image by computing histograms of oriented gradients over local image regions.
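To make the LBP feature concrete, the following sketch computes the basic 3x3 LBP code and a 256-bin histogram for a grayscale image given as nested lists. It shows only the basic operator; the clockwise neighbour ordering and the `>=` comparison are conventions assumed for this example, and the rotation-invariant variant is not covered.

```python
from typing import List, Sequence


def lbp_code(patch: Sequence[Sequence[int]]) -> int:
    """Return the 8-bit LBP code of the centre pixel of a 3x3 grayscale patch."""
    center = patch[1][1]
    # Assumed ordering: clockwise, starting at the top-left neighbour.
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2], patch[1][2],
        patch[2][2], patch[2][1], patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:  # a neighbour at least as bright as the centre sets its bit
            code |= 1 << bit
    return code


def lbp_histogram(image: Sequence[Sequence[int]]) -> List[int]:
    """Histogram of LBP codes over all interior pixels; usable as a texture vector."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

Because each code depends only on comparisons with the centre pixel, adding a constant brightness offset to every pixel leaves the codes unchanged, which is the gray-scale invariance mentioned above.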

Fig. 6 shows the network structure of the multi-feature fusion algorithm, in which VGG is used as the deep learning algorithm and LBP and HOG are used as the traditional algorithms. Dense is a fully connected layer whose parameter is the number of neurons, corresponding to the dimension of the output vector; Flatten is a flattening layer that converts a multidimensional feature map into a one-dimensional feature vector; Concatenate is a concatenation layer that joins several feature vectors in order.

1) The VGG algorithm extracts a feature map of size (a, b, c) from the image, which a Flatten layer flattens into a one-dimensional feature vector of length $n_1$:

$$\text{Feature}(a, b, c) \rightarrow \text{Vector}(1, 1, abc) = \text{Vector}(1, 1, n_1), \quad n_1 = abc$$

2) The one-dimensional feature vector is converted to length $n_3$ through two fully connected layers, Dense($n_2$) and Dense($n_3$).

3) The image feature vectors extracted by the LBP algorithm and the HOG algorithm are converted into feature vectors of length $n_4$ and $n_5$ through the fully connected layers Dense($n_4$) and Dense($n_5$), respectively.

4) The feature vectors extracted by the three algorithms are fused through a Concatenate layer to obtain a total feature vector of length $n_6$:

$$\text{Vector}(1, 1, n_3) + \text{Vector}(1, 1, n_4) + \text{Vector}(1, 1, n_5) = \text{Vector}(1, 1, n_6), \quad n_6 = n_3 + n_4 + n_5$$

5) Finally, the fully connected layers Dense($n_7$) and Dense(1) produce a feature vector of length 1, whose modulus is the probability that smoke exists in the image, i.e. the recognition result.
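The flow of vector lengths through steps 1) to 5) can be traced with a small forward-pass sketch. It is not a trained model: the weights are random, the layer widths, the ReLU and sigmoid activations and the function names are all assumptions for illustration, and Dropout is omitted because it only applies during training.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def flatten(feature_map: List[List[List[float]]]) -> Vector:
    """Flatten an (a, b, c) nested list into a vector of length a*b*c."""
    return [v for plane in feature_map for row in plane for v in row]


def dense(x: Vector, units: int, rng: random.Random, activation: str = "relu") -> Vector:
    """Fully connected layer with random (untrained) weights and no bias."""
    out = []
    for _ in range(units):
        z = sum(rng.uniform(-0.1, 0.1) * v for v in x)
        if activation == "relu":
            out.append(max(0.0, z))
        elif activation == "sigmoid":
            out.append(1.0 / (1.0 + math.exp(-z)))
        else:
            raise ValueError("unknown activation: " + activation)
    return out


def smoke_probability(
    cnn_map: List[List[List[float]]], lbp_vec: Vector, hog_vec: Vector,
    n2: int = 32, n3: int = 16, n4: int = 16, n5: int = 16, n7: int = 8, seed: int = 0,
) -> Tuple[float, Dict[str, int]]:
    """Return (probability, traced lengths) for one image's three feature inputs."""
    rng = random.Random(seed)
    v1 = flatten(cnn_map)                     # length n1 = a*b*c
    v3 = dense(dense(v1, n2, rng), n3, rng)   # Dense(n2) then Dense(n3)
    v4 = dense(lbp_vec, n4, rng)              # LBP branch, Dense(n4)
    v5 = dense(hog_vec, n5, rng)              # HOG branch, Dense(n5)
    total = v3 + v4 + v5                      # Concatenate: n6 = n3 + n4 + n5
    v7 = dense(total, n7, rng)                # Dense(n7)
    v8 = dense(v7, 1, rng, activation="sigmoid")  # Dense(1), assumed sigmoid output
    return abs(v8[0]), {"n1": len(v1), "n6": len(total)}
```

The returned dictionary makes it easy to check that the flattened length equals the product of the three feature-map dimensions and that the fused length equals the sum of the three branch lengths.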

In addition, to prevent overfitting during network training, a Dropout layer is added after each of the two fully connected layers Dense($n_2$) and Dense($n_7$) to randomly deactivate some neurons. The loss function is the two-class (binary) cross-entropy, which in its standard form over $N$ samples is:

$$Loss = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right)\right]$$

where $y_i$ is the class label of sample i, which is 1 if the sample contains smoke and 0 otherwise, and $p(y_i)$ is the probability that sample i contains smoke.
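The two-class cross-entropy can be computed as in the sketch below, which assumes the standard form averaged over the samples. The function name and the clipping constant `eps`, used to avoid taking the logarithm of zero, are choices made for this example.

```python
import math
from typing import Sequence


def binary_cross_entropy(
    labels: Sequence[int], probs: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean binary cross-entropy; labels are 1 (smoke) or 0 (no smoke)."""
    if len(labels) != len(probs) or len(labels) == 0:
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)
```

A confident correct prediction gives a loss close to zero, while a confident wrong prediction is penalised heavily, which is what drives the network toward calibrated smoke probabilities.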

The method provided by this embodiment further reveals possible smoking behavior by recognizing smoke in the image, thereby reflecting smoking in the field comprehensively and avoiding missed reports.

Referring to fig. 7(a), a schematic flow chart of an algorithm for smoking behavior detection according to an embodiment of the present invention is shown, which specifically includes the following aspects:

A multi-task strategy is adopted, that is, distance measurement and smoke recognition are combined to jointly judge whether smoking behavior exists.

1) Distance measurement. Referring to fig. 7(b), the logic mainly comprises the following parts: a) human bodies in the image are detected by the human body detection algorithm, and each region containing a human body is output; b) for each human body region, the cigarette body detection model detects whether a cigarette body exists in the region and outputs the possible cigarette body positions, and if the region contains a cigarette body, the mouth region of the human body is further detected; c) finally, a conclusion on whether smoking behavior exists in the image is output according to the distance between the cigarette body position and the mouth/nose tip position.

2) If the distance measurement concludes that smoking behavior exists, the final result is output directly. Otherwise, when a human body has been detected but no cigarette body is detected or the distance criterion is not met, smoke in the image is further detected by smoke recognition; if smoke is found, a prompt is given that smoking behavior may exist.

3) The recognition conclusion of the whole algorithm for the input image is output according to the preceding results, namely "smoking behavior exists", "smoking behavior may exist" or "smoking behavior does not exist".
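The three-way decision logic can be summarised in a few lines. In this sketch the names `Conclusion` and `decide` are hypothetical, the distance is passed as `None` when no cigarette body was detected, and smoke recognition is passed as a callable so that it runs only when the distance criterion fails, mirroring the order of the flow above.

```python
from enum import Enum
from typing import Callable, Optional


class Conclusion(Enum):
    SMOKING = "smoking behavior exists"
    POSSIBLE = "smoking behavior may exist"
    NONE = "smoking behavior does not exist"


def decide(
    distance: Optional[float], threshold: float, smoke_check: Callable[[], bool]
) -> Conclusion:
    """Combine distance measurement with smoke recognition.

    distance is None when no cigarette body was detected in the body region.
    """
    if distance is not None and distance <= threshold:
        return Conclusion.SMOKING  # distance criterion met: output directly
    if smoke_check():  # fall back to smoke recognition only when needed
        return Conclusion.POSSIBLE
    return Conclusion.NONE
```

Passing the smoke recogniser lazily avoids running the heavier multi-feature fusion model on images that the distance criterion has already settled.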

The recognition conclusion for fig. 8(a) is "smoking behavior exists": the cigarette body is next to the person's mouth, which satisfies the "distance measurement" criterion. The recognition conclusion for fig. 8(b) is "smoking behavior may exist": the picture does not satisfy the "distance measurement" criterion, but the "multi-feature fusion smoke recognition" module concludes that smoke is present. Both test results are consistent with the actual situation. The overall algorithm scheme can therefore identify the various characteristics of smoking behavior in an image and provide timely, comprehensive information about smoking in the field environment. Both pictures were retrieved from the Internet.

Based on this algorithm scheme, corresponding picture data are collected to train the human body detection model, the cigarette body detection model and the human body key point detection model respectively, and smoke pictures and background pictures are collected to train the multi-feature fusion model for smoke recognition. Deep learning is usually computationally intensive. GPU (Graphics Processing Unit) devices from NVIDIA contain a large number of computing units, and CUDA-based GPU computing can effectively accelerate both model training and model-based prediction. One such GPU is the P40, which is based on the Pascal architecture, has 3840 CUDA cores and reaches a single-precision performance of 12 TFLOPS; it also has a large video memory (24 GB) and video memory bandwidth (346 GB/s), which benefits data transfer between the CPU and the GPU.

Addressing the problems of existing computer vision algorithms in smoking behavior detection, the method provided by the embodiment of the invention designs a complete smoking early-warning algorithm scheme based on the strict definition of smoking behavior and the smoke that may accompany it. Its advantages are as follows:

1) A general cigarette body detection that handles front-view and side-view smoking at the same time. The human body region output by the human body detection algorithm is expanded and its boundary coordinates are corrected, which eliminates interference from cigarette-like objects in the background while fully retaining cigarette bodies in all cases.

2) A strict and general smoking behavior criterion based on the distance measurement strategy. With the nose located by the human body key point detection algorithm and the cigarette body located by the object detection algorithm, the criterion computes only the vertical distance between the nose and the cigarette body and converts it into a relative distance using the width of the human body region, so it applies to human body regions of any resolution.

3) A multi-task algorithm scheme combining distance measurement and smoke recognition. On top of the strict distance-based judgment, possible smoking behavior is further revealed by recognizing smoke in the image, so that smoking in the field is reflected comprehensively and missed reports are avoided.

4) A smoke recognition algorithm based on multi-feature fusion. Combining the strengths of deep learning and of traditional image algorithms, the algorithm concatenates the feature vectors output by both and optimizes them jointly, which improves smoke recognition across a variety of smoking situations.

Referring to fig. 9, a schematic diagram of main modules of a smoking behavior detection apparatus 900 according to an embodiment of the present invention is shown, including:

a detection module 901, configured to determine a human body region in the image and detect whether a cigarette body exists in the human body region;

a distance judgment module 902, configured to, if a cigarette body is detected, identify the nose tip position of the human body and calculate the distance between the cigarette body position and the nose tip position, and judge that smoking behavior exists if the distance is less than or equal to a preset threshold;

a smoke recognition module 903, configured to recognize whether smoke exists in the image if no cigarette body is detected or the distance is greater than the preset threshold, to judge that smoking behavior may exist if smoke exists, and to judge that no smoking behavior exists otherwise.

In the implementation apparatus of the present invention, the detecting module 901 is further configured to:

for a single human body area, determining the width and the height of the single human body area, and taking the product of the width and an external expansion coefficient as an external expansion value in the horizontal direction;

and respectively carrying out outward expansion on the single human body area to the left and the right in the horizontal direction based on the outward expansion value to obtain the single human body area after the outward expansion.

In the implementation apparatus of the present invention, the detecting module 901 is configured to: and correcting the boundary coordinates of the single expanded human body area based on the resolution of the image in the horizontal direction and the vertical direction to obtain the single expanded and corrected human body area.

In the device for implementing the invention, the width and the height are the width and the height of a minimum regular rectangle capable of covering the single human body area.

In the implementation apparatus of the present invention, the detecting module 901 is configured to:

for a single human body region, extracting first image features of the region with a first convolution module, then detecting with a first detection module whether the first image features contain a cigarette body of a first size, and if so, determining the cigarette body position information;

performing transition processing on the first image features with a first transition module, extracting second image features from the transitioned first image features with a second convolution module, then detecting with a second detection module whether the second image features contain a cigarette body of a second size, and if so, determining the cigarette body position information;

performing transition processing on the second image features with a second transition module, extracting third image features from the transitioned second image features with a third convolution module, then detecting with a third detection module whether the third image features contain a cigarette body of a third size, and if so, determining the cigarette body position information;

and aggregating the detection information to obtain the detection results and position information of cigarette bodies of different sizes in the single human body region.

In the implementation apparatus of the present invention, the detecting module 901 is further configured to: acquiring the sizes of all marked anchor frames, and acquiring a plurality of anchor frame sizes with the largest clustering result in a clustering mode; and distributing the sizes of the plurality of anchor frames to the first detection module, the second detection module and the third detection module, and identifying the cigarette body area in a mode of finely adjusting the sizes of the anchor frames.

In the implementation apparatus of the present invention, the detecting module 901 is further configured to: detect all human body key points in the image with a human body key point algorithm, and group each key point with the corresponding individual to determine the nose tip position of each human body region.

In the device for implementing the present invention, the distance determining module 902 is configured to:

determining the coordinate value of the upper left corner of a single cigarette body area, and calculating the distance between the upper left corner and the nose tip of the human body in the vertical direction;

and calculating the product of the width of the single human body area and a preset distance measurement coefficient, and if the distance is less than or equal to the product, judging that smoking behavior exists in the single cigarette body area.
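The distance criterion applied by the distance determining module can be sketched as follows. The function name, the point layout and the symbol `beta` for the preset distance measurement coefficient are assumptions for this example; the text does not give a value for the coefficient, so it is a required argument here.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def meets_distance_criterion(
    cigarette_top_left: Point, nose_tip: Point, body_width: float, beta: float
) -> bool:
    """Check the vertical nose-to-cigarette distance against beta * body width."""
    vertical_distance = abs(cigarette_top_left[1] - nose_tip[1])
    # Scaling by the body width makes the threshold relative to the region size.
    return vertical_distance <= beta * body_width
```

Because the threshold scales with the width of the human body region, the same coefficient can be used for people who appear large or small in the image.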

In the implementation apparatus of the present invention, the smoke recognition module 903 is configured to: identifying the probability of smoke existing in the image based on a multi-feature fusion smoke identification algorithm; the multi-feature fused smoke recognition algorithm comprises a deep learning algorithm and an image algorithm.

In the implementation device of the invention, the image algorithm comprises a direction gradient histogram algorithm and a local binary pattern;

the smoke recognition module 903 is configured to: extracting a feature map of the image by using the deep learning algorithm, and converting the feature map into a one-dimensional first vector with a first length through a flattening layer; wherein the first length is a product of dimensions of the feature map in three dimensions;

converting the first vector into a second vector of a second length through a first fully connected layer, and then converting the second vector into a third vector of a third length through a second fully connected layer;

extracting a vector of the image by using the local binary pattern, and converting the vector into a fourth vector with a fourth length through a third full-connection layer;

extracting a vector of the image by using the direction gradient histogram algorithm, and converting the vector into a fifth vector with a fifth length through a fourth full-connection layer;

fusing the third vector, the fourth vector and the fifth vector to obtain a total vector; wherein the length of the total vector is the sum of the third length, the fourth length, and the fifth length;

converting the total vector into a seventh vector of a seventh length through a fifth fully connected layer, converting the seventh vector into an eighth vector of length 1 through a sixth fully connected layer, and taking the modulus of the eighth vector as the probability that smoke exists in the image.

In addition, the detailed implementation of the device in the embodiment of the present invention has been described in detail in the above method, so that the repeated description is not repeated here.

FIG. 10 illustrates an exemplary system architecture 1000 to which embodiments of the invention may be applied.

As shown in fig. 10, the system architecture 1000 may include terminal devices 1001, 1002, 1003, a network 1004, and a server 1005 (by way of example only). The network 1004 provides a medium for communication links between the terminal devices 1001, 1002, 1003 and the server 1005. The network 1004 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 1001, 1002, 1003 to interact with the server 1005 via the network 1004 to receive or transmit messages and the like. Various communication client applications may be installed on the terminal devices 1001, 1002, 1003.

The terminal devices 1001, 1002, 1003 may be various electronic devices having a display screen and supporting web browsing, and the server 1005 may be a server that provides various services.

It is to be noted that the method provided by the embodiment of the present invention is generally executed by the server 1005, and accordingly, the apparatus is generally disposed in the server 1005.

It should be understood that the number of terminal devices, networks, and servers in fig. 10 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 11, shown is a block diagram of a computer system 1100 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU) 1101, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The RAM 1103 also stores various programs and data necessary for the operation of the system 1100. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.

The following components are connected to the I/O interface 1105: an input portion 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read from it is installed into the storage section 1108 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 1101.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor comprising a detection module, a distance judgment module and a smoke identification module. The names of these modules do not, in some cases, limit the modules themselves; for example, the detection module may also be described as a "human body and cigarette body detection module".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise:

According to the technical solution of the embodiments of the present invention, a complete smoking early-warning algorithm scheme is designed, based on the strict definition of smoking behavior and the smoke that may accompany it, to address the problems of existing computer vision algorithms in smoking behavior detection.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A smoking behavior detection method is characterized by comprising the following steps:

2. The method of claim 1, wherein the determining a region of a human body in an image further comprises:

3. The method of claim 2, further comprising, after said obtaining the expanded body region:

and correcting the boundary coordinates of the expanded single body region based on the resolution of the image in the horizontal direction and the vertical direction, to obtain the expanded and corrected single body region.
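The boundary correction described above can be illustrated with a minimal sketch. It assumes boxes are given as (x1, y1, x2, y2) pixel coordinates and that correction means clipping to the valid pixel range; the function names `expand_box` and `clamp_box` and the symmetric expansion ratio are hypothetical and are not taken from the claims.

```python
def expand_box(box, ratio):
    """Expand an (x1, y1, x2, y2) box outward by `ratio` of its width and height.

    The symmetric ratio is an illustrative assumption, not the claimed rule.
    """
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * ratio
    dy = (y2 - y1) * ratio
    return (x1 - dx, y1 - dy, x2 + dx, y2 + dy)


def clamp_box(box, img_w, img_h):
    """Clip box coordinates to the image, i.e. x in [0, img_w - 1], y in [0, img_h - 1]."""
    x1, y1, x2, y2 = box
    x1 = max(0, min(x1, img_w - 1))
    x2 = max(0, min(x2, img_w - 1))
    y1 = max(0, min(y1, img_h - 1))
    y2 = max(0, min(y2, img_h - 1))
    return (x1, y1, x2, y2)


# A body box near the image border is expanded, then corrected against
# the horizontal and vertical resolution of a 240 x 320 image.
expanded = expand_box((100, 100, 200, 300), 0.5)
corrected = clamp_box(expanded, img_w=240, img_h=320)
```

Clipping keeps every corrected coordinate addressable in the image, so later cropping of the body region cannot index outside the frame.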

4. The method of claim 2, wherein the width and height are the width and height of the smallest upright (axis-aligned) rectangle that can cover the single body region.

5. The method of claim 1, wherein said detecting whether a cigarette body exists in each human body region comprises:

6. The method of claim 5, further comprising:

acquiring the sizes of all labeled anchor frames, and obtaining, by clustering, a plurality of anchor frame sizes that rank highest in the clustering result;

and distributing the plurality of anchor frame sizes among the first detection module, the second detection module and the third detection module, and identifying the cigarette body region by fine-tuning the anchor frame sizes.
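The anchor clustering and distribution steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses k-means with 1 - IoU as the distance (a common choice for anchor sizes, not confirmed by the claims), a deterministic initialisation, and an area-ordered split across the three detection modules. The names `iou_wh`, `kmeans_anchors` and `split_across_modules` are hypothetical.

```python
def iou_wh(a, b):
    """IoU of two boxes given only as (width, height), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iters=50):
    """Cluster (w, h) pairs; a size joins the center with which it has the highest IoU."""
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    # Deterministic start: evenly spaced samples from the area-sorted list.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in sizes:
            best = max(range(k), key=lambda j: iou_wh(s, centers[j]))
            groups[best].append(s)
        # New center is the mean size of the group; empty groups keep their center.
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


def split_across_modules(anchors, n_modules=3):
    """Give each detection module an equal, area-ordered share of the anchors."""
    per = len(anchors) // n_modules
    return [anchors[i * per:(i + 1) * per] for i in range(n_modules)]


labeled_sizes = [(10, 10), (12, 12), (11, 9), (50, 50), (52, 48), (100, 100), (98, 104)]
anchors = kmeans_anchors(labeled_sizes, k=3)
per_module = split_across_modules(anchors)
```

Which module receives the smallest anchors is a design choice the claims do not fix; the sketch simply hands them out in ascending order of area.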

7. The method of claim 1, further comprising:

and detecting all human key points in the image by adopting a human key point algorithm, and clustering each key point to the corresponding individual region to determine the nose tip position of each human region.
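As a rough illustration of attaching key points to body regions, the sketch below assigns each detected nose point to a body box that contains it. The tie-break (nearest box center when boxes overlap) and the name `assign_noses` are assumptions made for the example; the claims do not specify how the clustering is performed.

```python
def assign_noses(noses, boxes):
    """Return {box_index: (x, y)} for nose points that fall inside a body box.

    boxes are (x1, y1, x2, y2); a nose inside several overlapping boxes goes to
    the box whose center is closest (an illustrative tie-break).
    """
    assigned = {}
    for nx, ny in noses:
        inside = [i for i, (x1, y1, x2, y2) in enumerate(boxes)
                  if x1 <= nx <= x2 and y1 <= ny <= y2]
        if not inside:
            continue  # nose point belongs to no detected body region

        def center_dist_sq(i):
            x1, y1, x2, y2 = boxes[i]
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            return (nx - cx) ** 2 + (ny - cy) ** 2

        best = min(inside, key=center_dist_sq)
        assigned.setdefault(best, (nx, ny))  # keep the first nose found per region
    return assigned


body_boxes = [(0, 0, 100, 200), (150, 0, 250, 200)]
nose_points = [(50, 30), (200, 25), (400, 400)]
nose_by_region = assign_noses(nose_points, body_boxes)
```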

8. The method according to claim 6 or 7, wherein the identifying and calculating the distance between the cigarette body position and the nose tip position of the human body, and determining that smoking behavior exists if the distance is smaller than or equal to a preset threshold value, comprises:

9. The method of claim 1, wherein the identifying whether smoke is present in the image comprises:

identifying the probability that smoke is present in the image based on a multi-feature fusion smoke recognition algorithm, wherein the multi-feature fusion smoke recognition algorithm comprises a deep learning algorithm and an image algorithm.

10. The method of claim 9, wherein the image algorithm comprises a histogram of oriented gradients algorithm and a local binary pattern algorithm;

the identifying the probability that smoke is present in the image based on the multi-feature fusion smoke recognition algorithm comprises:

extracting a feature map of the image by using the deep learning algorithm, and converting the feature map into a one-dimensional first vector with a first length through a flattening layer; wherein the first length is a product of dimensions of the feature map in three dimensions;
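The flattening step above can be shown with a small worked example. It assumes the feature map is a nested list indexed as height x width x channels; the name `flatten_feature_map` and the 2 x 3 x 4 shape are illustrative only.

```python
def flatten_feature_map(fmap):
    """Flatten an H x W x C nested list into a one-dimensional list."""
    return [value for row in fmap for cell in row for value in cell]


# A toy 2 x 3 x 4 feature map; the first length is then 2 * 3 * 4.
height, width, channels = 2, 3, 4
feature_map = [[[float(h * 100 + w * 10 + c) for c in range(channels)]
                for w in range(width)]
               for h in range(height)]
first_vector = flatten_feature_map(feature_map)
first_length = height * width * channels
```

In a deep learning framework a flattening layer performs the same reshaping; the plain-list version is used here only to make the length relationship explicit.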

11. A smoking behavior detection device, comprising:

the detection module is used for determining a human body area in the image and detecting whether a cigarette body exists in the human body area;

the distance judgment module is used for identifying and calculating the distance between the cigarette body position and the human nose tip position if the cigarette body is detected, and judging that smoking behavior exists if the distance is smaller than or equal to a preset threshold value;

and the smoke identification module is used for identifying whether smoke exists in the image if no cigarette body is detected or the distance is larger than the preset threshold value, judging that smoking behavior possibly exists if smoke exists, and otherwise judging that no smoking behavior exists.
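The decision flow shared by the three modules above can be summarized in a short sketch. It assumes a Euclidean pixel distance and treats a missing nose tip the same way as a missing cigarette body; both are assumptions, and the function name `judge_smoking` and the result strings are hypothetical.

```python
import math


def judge_smoking(cigarette_pos, nose_pos, threshold, smoke_present):
    """Return 'smoking', 'possible smoking' or 'no smoking'.

    cigarette_pos / nose_pos: (x, y) tuples, or None when nothing was detected.
    threshold: preset distance threshold (same units as the coordinates).
    smoke_present: result of the smoke recognition step for the image.
    """
    if cigarette_pos is not None and nose_pos is not None:
        # Euclidean distance is an assumption; the claims only say "distance".
        distance = math.hypot(cigarette_pos[0] - nose_pos[0],
                              cigarette_pos[1] - nose_pos[1])
        if distance <= threshold:
            return "smoking"
    # No cigarette body detected, or it is too far from the nose tip:
    # fall back to whether smoke was recognized in the image.
    return "possible smoking" if smoke_present else "no smoking"


close = judge_smoking((10, 10), (13, 14), threshold=5, smoke_present=False)
far_with_smoke = judge_smoking((10, 10), (20, 20), threshold=5, smoke_present=True)
nothing = judge_smoking(None, (0, 0), threshold=5, smoke_present=False)
```

Checking the cigarette-to-nose distance first means the smoke check only decides the outcome when the stricter criterion fails.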

12. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.

13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-10.