CN112464797B - Smoking behavior detection method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112464797B
CN112464797B (application CN202011344496.6A)
Authority
CN
China
Prior art keywords
image
person
target
face
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011344496.6A
Other languages
Chinese (zh)
Other versions
CN112464797A (en)
Inventor
张发恩
葛振朋
陈锐桐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Qizhi Chengdu Technology Co ltd
Original Assignee
Innovation Qizhi Chengdu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Qizhi Chengdu Technology Co ltd filed Critical Innovation Qizhi Chengdu Technology Co ltd
Priority to CN202011344496.6A priority Critical patent/CN112464797B/en
Publication of CN112464797A publication Critical patent/CN112464797A/en
Application granted granted Critical
Publication of CN112464797B publication Critical patent/CN112464797B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06F 18/2411 - Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06N 3/045 - Neural networks: combinations of networks
    • G06N 3/08 - Neural networks: learning methods
    • G06V 40/166 - Face detection; localisation; normalisation using acquisition arrangements
    • G06V 40/168 - Face feature extraction; face representation
    • G06V 40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06V 2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Social Psychology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of deep learning and provides a smoking behavior detection method and device, a storage medium, and electronic equipment. The method comprises the following steps: performing skeletal point detection on a target image with a skeletal point detection model to obtain the skeletal point coordinates of each person in the image; inputting each person's skeletal point coordinates into a classification model to obtain a preliminary classification of whether that person is smoking; for each person preliminarily classified as smoking, extracting that person's face region from the target image to obtain a face image; and performing cigarette detection on the face image with a target detection model. The method cascades skeletal point detection, classification, and target detection: the skeletal point detection and classification models quickly screen out potential smokers, the face positions of these candidates are extracted, and fine-grained cigarette detection is then performed on their faces, achieving both real-time performance and detection accuracy.

Description

Smoking behavior detection method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of deep learning, in particular to a smoking behavior detection method and device, a storage medium and electronic equipment.
Background
Smoking in public places (such as subways, trains, and shopping malls) poses serious hazards to the health of smokers and the people around them and to environmental safety. The relevant authorities have therefore introduced strict management measures to deter smoking in public places, supported by corresponding detection means. Traditional smoking detection mostly relies on smoke sensors, but these are limited in where they can be applied and are unsuitable for relatively open settings.
Smoking detection based on video surveillance can be a good alternative in many situations. However, a cigarette is an extremely small target object, physically small and visually inconspicuous, which makes smoking detection difficult, particularly in low-resolution video, so the screening accuracy of such means for smoking behavior is low.
Disclosure of Invention
An objective of the embodiments of the present application is to provide a smoking behavior detection method, a device, a storage medium, and an electronic apparatus that address the technical problems described above.
In order to achieve the above purpose, the present application provides the following technical solutions:
In a first aspect, embodiments of the present application provide a smoking behavior detection method, including: performing skeletal point detection on a target image with a skeletal point detection model to obtain the skeletal point coordinates of each person in the target image; inputting each person's skeletal point coordinates into a classification model to obtain a preliminary classification of whether that person is smoking; for each person preliminarily classified as smoking, extracting that person's face region from the target image to obtain a face image; and performing cigarette detection on the face image with a target detection model to determine whether the person is smoking.
The method cascades skeletal point detection, classification, and target detection. The skeletal point detection and classification models quickly screen out potential smokers, whose face positions are then extracted, and fine-grained cigarette detection is performed on those faces. The overall pipeline is fast, and because a cigarette is a comparatively large target within a face image, it is easier to detect, so detection accuracy is high.
In an alternative embodiment, extracting the face region of the person from the target image includes: determining the coordinates of the person's facial skeletal points according to the skeletal point numbers attached to the person's skeletal point coordinates; and extracting the corresponding face region from the target image according to those facial skeletal point coordinates.
The face region can thus be extracted from the target image using the output of the skeletal point detection model, without running a separate face recognition step on the image.
In an alternative embodiment, the method further comprises: acquiring a plurality of training images; performing skeletal point detection on each training image with the skeletal point detection model to obtain the skeletal point coordinates of the person in each training image; and training a support vector machine on each training image's skeletal point coordinates and corresponding label to obtain the classification model.
Before the method is applied to smoking behavior detection, the support vector machine is trained to obtain a classification model that classifies skeletal point coordinates.
In an optional embodiment, extracting the face region of the person from the target image to obtain a face image includes: extracting the person's face region from the target image and expanding the region's width outward by a preset ratio to obtain a new face region; and obtaining the corresponding face image from the new face region.
To ensure that the face image contains the cigarette target whenever possible, the extracted face region is expanded outward; this handles the case where the cigarette falls outside the face region because the person is seen in profile.
In an alternative embodiment, performing cigarette detection with the target detection model includes: splitting the face image into an upper half and a lower half to obtain a lower-half image; and performing cigarette detection on the lower-half image with the target detection model.
When a person smokes, the cigarette is essentially at the mouth, and the mouth lies in the lower half of the face region. The upper half of the face image can therefore be discarded and cigarette detection performed only on the lower half, which speeds up detection without losing accuracy.
In an alternative embodiment, performing cigarette detection with the target detection model includes: acquiring a binary image of the face image; searching the upper half of the binary image for two connected regions satisfying preset conditions, namely that the difference between the areas of the two regions does not exceed a first threshold and that the tilt angle of the line joining the centroids of the two regions does not exceed a second threshold; determining the lowest point of each of the two regions, taking the area below the line joining the two lowest points as the target region, and taking the image corresponding to the target region as the image to be detected; and performing cigarette detection on the image to be detected with the target detection model.
Here the target region that may contain the cigarette is extracted by automatically locating the eye regions. Since the mouth necessarily lies below the eyes, the resulting target region necessarily contains the mouth; this discards a portion of the image above the eyes, cutting useless computation without mistakenly losing the mouth region. At the same time, the proportion of the cigarette in the image increases further, turning the detection of a large target in the original face image into the detection of a still larger one, so the target detection model learns the local features of the cigarette more easily and the detection accuracy further improves.
In a second aspect, embodiments of the present application provide a smoking behavior detection device, including: a skeletal point detection module for performing skeletal point detection on a target image with a skeletal point detection model to obtain the skeletal point coordinates of each person in the target image; a preliminary screening module for inputting each person's skeletal point coordinates into a classification model to obtain a preliminary classification of whether that person is smoking, and, for each person preliminarily classified as smoking, extracting that person's face region from the target image to obtain a face image; and a cigarette detection module for performing cigarette detection on the face image with a target detection model to determine whether the person is smoking.
In an alternative embodiment, the preliminary screening module is specifically configured to: and determining bone point coordinates corresponding to the facial bone points of the person according to the bone point numbers of the bone point coordinates corresponding to the person, and extracting a facial region from the target image according to the bone point coordinates of the facial bone points.
In a third aspect, embodiments of the present application provide a storage medium having stored thereon a computer program which, when executed by a processor, performs the method according to the first aspect or any alternative embodiment of the first aspect.
In a fourth aspect, embodiments of the present application provide an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate via the bus, and the machine-readable instructions, when executed by the processor, perform the method according to the first aspect or any alternative embodiment of the first aspect.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be considered limiting of its scope; a person skilled in the art may derive other related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a smoking behavior detection method provided in an embodiment of the present application;
FIG. 2 shows a flow chart of one way of further processing a facial image in accordance with an embodiment of the present application;
FIG. 3 illustrates a flow chart of training a classification model according to an embodiment of the present application;
fig. 4 shows a schematic diagram of a smoking behaviour detection device according to an embodiment of the present application;
fig. 5 shows a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that like reference numerals and letters denote like items in the following figures; once an item is defined in one figure, it need not be further defined or explained in subsequent figures. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises it. The terms "first," "second," "third," and the like are used merely to distinguish descriptions and are not to be construed as indicating or implying relative importance.
The embodiment of the application provides a smoking behavior detection method, which adopts a cascading model mode to filter images from coarse granularity to fine granularity, and can give consideration to real-time performance and accuracy of detection. Referring to fig. 1, the smoking behavior detection method provided in the embodiment of the present application includes the following steps:
step 110: a target image is acquired.
The target image may be a single image or one frame of a video. If the embodiment is used to detect smoking behavior in a video, the target image is one frame of that video. Detection may be performed on every frame, or the detection frequency may be set as needed, for example acquiring one target image from the video every 30 frames, or every 60 frames, and so on.
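The frame-sampling strategy above can be sketched in a few lines. This is a minimal illustration; the function name and stride values are our own, not the patent's:

```python
def sample_frame_indices(total_frames, stride=30):
    """Return the indices of the video frames on which to run the
    detection cascade, e.g. one frame out of every 30 or 60."""
    return list(range(0, total_frames, stride))

# e.g. a 300-frame clip at 30 fps, checked once per second
print(sample_frame_indices(300, stride=30))
```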
Step 120: and detecting the bone points of the target image by using the bone point detection model to obtain the coordinates of each bone point corresponding to each person in the target image.
The human skeleton points are large targets relative to cigarettes and are easy to detect. After the target image is obtained, the target image is input into a bone point detection model, bone key points of all people in the target image are detected by using the bone point detection model, and the bone point detection model outputs bone point coordinates corresponding to each person in the target image.
One coordinate system for the skeletal point coordinates is obtained by taking the top-left vertex of the smallest rectangle containing all skeletal point positions as the origin, with the image width along the x axis and the image height along the y axis; the skeletal point detection model outputs skeletal point coordinates in this system. Other coordinate systems may of course be used, for example taking the bottom-left vertex of the smallest rectangle as the origin.
The skeletal point detection model may be OpenPose, AlphaPose, HyperPose, or a similar model, all of which offer relatively high detection speed.
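The bounding-rectangle-relative coordinate system described above can be sketched as follows, assuming each person's keypoints arrive as a list of (x, y) pixel tuples (a simplification of what pose models actually emit):

```python
def to_local_coords(keypoints):
    """Shift a person's skeletal points into the coordinate system
    described above: the origin is the top-left vertex of the smallest
    rectangle enclosing all the points, x to the right, y downward."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    x0, y0 = min(xs), min(ys)
    return [(x - x0, y - y0) for x, y in keypoints]
```

Making each person's coordinates relative to their own bounding rectangle removes the person's absolute position in the frame, which helps the downstream classifier generalize.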
Step 130: and respectively inputting the coordinates of each bone point of each person into a classification model to obtain a preliminary classification result of whether each person in the target image smokes or not.
And inputting the coordinates of each bone point output by the bone point detection model into a classification model, and primarily judging whether the person in the target image is suspected to have smoking behaviors. For example, five persons are in total in the target image, the skeleton point detection model outputs the skeleton point coordinates corresponding to each person, and then the skeleton point coordinates of each person are respectively input into the classification model to obtain corresponding preliminary classification results, and five preliminary classification results are obtained in total. The input of the classification model is the coordinates of each bone point, the output is the preliminary classification result of the input coordinates of each bone point, and the preliminary classification result comprises: the corresponding person does not smoke and the corresponding person is smoking.
In step 130, the classification model exploits the fact that smoking is an action and is therefore strongly correlated with the positions of skeletal points: information such as a raised hand, or the relative position of the hand and the lips, can be recognized from the coordinates to judge whether a person is a suspected smoker.
It can be understood that skeletal point coordinates contain only positional information and are low-dimensional data, so the classification model can classify them very quickly. The preliminary classification results can be used to filter out the majority of people in the target image who show no smoking behavior.
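Before the coordinates reach the classifier, they must be arranged into a fixed-length feature vector. A minimal sketch, assuming an 18-point OpenPose-style skeleton; the point count and the -1 padding for undetected points are our assumptions, not the patent's:

```python
def keypoints_to_feature(keypoints, num_points=18):
    """Flatten one person's (x, y) skeletal points into the
    fixed-length vector a classifier such as an SVM expects;
    points the pose model failed to detect are padded with -1."""
    feature = []
    for i in range(num_points):
        if i < len(keypoints):
            feature.extend(keypoints[i])
        else:
            feature.extend((-1.0, -1.0))
    return feature
```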
Step 140: and extracting the face area of the person from the target image to obtain a face image for the person with smoking as the primary classification result.
Step 150: the face image is subjected to cigarette detection using a target detection model to determine whether the person smokes.
The preliminary classification determines which people in the target image may be smoking and filters out the non-smokers. For example, if the target image contains five people and two are classified as not smoking, those two are filtered out, and the face regions of the remaining three suspected smokers are extracted from the target image, yielding three face images.
If the preliminary classification finds no smoker, the target detection of steps 140-150 is not performed, and the method returns to step 110 to acquire the next target image from the video.
Specifically, after a suspected smoker is identified from the preliminary classification, the corresponding face region is extracted from the target image to obtain a face image, and cigarette detection is performed on the face image with the target detection model. If a cigarette target is detected in the face image, the person in the target image is confirmed as smoking; if not, the person is not detected as smoking.
Extraction of facial regions from the target image may be performed by means of the output of the bone-point detection model in step 120. Each skeletal point coordinate output by the skeletal point detection model has a corresponding skeletal point number, for example, a facial skeletal point corresponding to the number 1, a left shoulder skeletal point corresponding to the number 2, a left elbow skeletal point corresponding to the number 3, and a left wrist skeletal point corresponding to the number 4. After each bone point coordinate output by the bone point detection model is obtained, the bone point coordinate corresponding to the face bone point of the person is determined according to the bone point number of each bone point coordinate corresponding to the person, and the corresponding face region is extracted from the target image according to the bone point coordinate of the face bone point.
After the face region is extracted, a face image can be obtained from it. It will be appreciated that the face region may be extracted in the manner described above, based on the output of the skeletal point detection model, or the person's face region may be identified in other ways.
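The extraction step can be sketched as below. The keypoint ids follow an OpenPose BODY-25-style convention (nose, eyes, ears) and are an assumption for illustration; the patent's own numbering is not specified beyond the example above:

```python
def face_box(numbered_points, face_ids=(0, 14, 15, 16, 17), margin=20):
    """Bounding box (x1, y1, x2, y2) around the facial skeletal points,
    given a dict mapping skeletal point number to (x, y).
    face_ids is an assumed nose/eyes/ears numbering, not the patent's."""
    pts = [numbered_points[i] for i in face_ids if i in numbered_points]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)
```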
After the face region is extracted, it is expanded outward by a preset number of pixels or by a preset ratio to obtain the face image.
Expansion by a preset number of pixels extends the left and right sides of the face region outward by n pixels each; the top, bottom, left, and right may of course all be extended by n pixels. The new face region thus obtained yields the corresponding face image.
Expansion by a preset ratio extends the width of the face region outward by the preset ratio; the entire face region may of course also be expanded outward by the preset ratio. The new face region thus obtained yields the corresponding face image. The preset ratio may be, for example, 10%.
To ensure that the face image contains the cigarette target whenever possible, the extracted face region is expanded to the left and right; this handles the case where the cigarette falls outside the face region because the person is seen in profile.
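The expansion by a preset ratio might look like the following sketch (the clamping to image bounds is our addition, not stated in the text):

```python
def expand_box(box, ratio=0.10, img_w=None, img_h=None):
    """Widen a face box to the left and right by `ratio` of its width,
    as in the embodiment above (e.g. 10%), optionally clamping the
    result to the image borders."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * ratio
    x1, x2 = x1 - dx, x2 + dx
    if img_w is not None:
        x1, x2 = max(0, x1), min(img_w, x2)
    return (x1, y1, x2, y2)
```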
After the face image is obtained, the cigarette target in it is detected with the target detection model. Because target detection is performed only on the face image rather than the complete target image, resource consumption is greatly reduced: the computation is small and the operation fast. Meanwhile, although a cigarette is a small target relative to the whole target image, it occupies a much larger proportion of the face image, making this large-target detection; the target detection model learns the cigarette's features easily, so the detection result is more accurate.
When detecting on the face image with the target detection model, the proportion of the cigarette can be increased further by processing the face image in ways that include, but are not limited to, the following two:
(1) After the face image is obtained, it is split into an upper half and a lower half, and the lower-half image is kept. Cigarette detection is then performed on the lower-half image with the target detection model.
When a person smokes, the cigarette is essentially at the mouth, and the mouth usually lies in the lower half of the face region, so the upper half of the face image can be discarded and cigarette detection performed only on the lower half; this speeds up detection without losing accuracy.
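Keeping only the lower half is a one-line crop; a sketch assuming the face image is a NumPy array in row-major (height-first) order:

```python
import numpy as np

def lower_half(face_img):
    """Discard the upper half of a face crop: the mouth, and hence the
    cigarette, normally lies in the lower half."""
    h = face_img.shape[0]
    return face_img[h // 2:, ...]
```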
(2) And automatically dividing the image to be detected according to the position of the eyes.
Referring to fig. 2, the embodiment of the mode (2) includes the following steps:
step 210: a binary image of the face image is acquired.
And carrying out binarization processing on the face image to obtain a corresponding binary image.
Step 220: searching two connected areas meeting preset conditions in the upper half area of the binary image.
Search the upper half of the binary image for two connected regions satisfying the preset conditions, namely: the difference between the areas of the two regions is not greater than a first threshold, and the tilt angle formed by the line joining the centroids of the two regions is not greater than a second threshold. In one embodiment, if the binary image contains two connected regions of similar area whose centroid line has a small tilt angle from the horizontal, those two regions correspond to the eyes: a left-eye region and a right-eye region.
The second threshold may be a value in the range of 0-20 deg..
Step 230: and determining the lowest point of the two connected areas, taking the area below the connecting line of the two lowest points as a target area, and taking an image corresponding to the target area as an image to be detected.
After the two connected regions satisfying the preset conditions are found, the lowest point of each region is determined, along with the line joining the two lowest points. Since the person's mouth necessarily lies below this line, the area of the face image below the line is taken as the target region, and the portion of the face image corresponding to the target region is extracted as the image to be detected.
In one embodiment, the line is used as the upper edge of the minimum bounding rectangle of the target region, and the size of this rectangle is the size of the image to be detected. The part of the image to be detected corresponding to the target region is extracted directly from the face image; the part outside the target region is generated automatically with all pixel values set to zero. The irregular target region is thereby transformed into a rectangular image to be detected. It will be appreciated that since the part outside the target region has zero pixel values, it has no effect on the cigarette detection in step 240.
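Steps 210 to 230 can be sketched as follows. This is a simplified stand-in, using a plain flood fill instead of a library connected-components routine, and expressing the first threshold as a relative area difference (both our assumptions):

```python
import numpy as np

def find_eye_regions(binary, area_tol=0.3, max_tilt_deg=20.0):
    """Search the upper half of a binary face image for two connected
    regions of similar area whose centroid line is nearly horizontal
    (the candidate eye regions). Returns a pair of region dicts or None."""
    top = binary[: binary.shape[0] // 2]
    labels = np.zeros(top.shape, dtype=int)
    regions = []
    for sy, sx in zip(*np.nonzero(top)):
        if labels[sy, sx]:
            continue
        lab = len(regions) + 1
        labels[sy, sx] = lab
        stack, pixels = [(sy, sx)], []
        while stack:  # 4-connected flood fill
            y, x = stack.pop()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < top.shape[0] and 0 <= nx < top.shape[1]
                        and top[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = lab
                    stack.append((ny, nx))
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        regions.append({"area": len(pixels),
                        "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
                        "lowest": max(ys)})
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            a, b = regions[i], regions[j]
            # first threshold: areas must be similar
            if abs(a["area"] - b["area"]) > area_tol * max(a["area"], b["area"]):
                continue
            # second threshold: centroid line must be nearly horizontal
            dy = abs(a["centroid"][0] - b["centroid"][0])
            dx = abs(a["centroid"][1] - b["centroid"][1])
            if dx > 0 and np.degrees(np.arctan2(dy, dx)) <= max_tilt_deg:
                return a, b
    return None
```

The `lowest` values of a returned pair give the row below which the target region begins (step 230).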
Step 240: and detecting cigarettes by using the target detection model.
After the image to be detected is extracted, the image to be detected is output to a target detection model, and whether the cigarette target exists in the image to be detected is detected by using the target detection model.
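The final decision step reduces to checking the detector's output for a cigarette box; a sketch assuming detections arrive as (label, score, box) tuples, which is an illustrative format rather than any particular model's API:

```python
def cigarette_detected(detections, conf_thresh=0.5):
    """True if the target detection model reported at least one
    'cigarette' box at or above the confidence threshold."""
    return any(label == "cigarette" and score >= conf_thresh
               for label, score, _box in detections)
```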
In steps 210 to 240 above, the two connected regions corresponding to the eyes are first identified from the face image, the target region to be detected is then extracted based on the position of the eye regions, and finally target detection is performed on the target region.
In method (1) above, the lower-half image is detected uniformly regardless of the person's posture, but in some cases the mouth is not necessarily in the lower half of the face image; for example, when a person tilts the head back while smoking, the mouth may lie in the upper or middle region of the face image. Method (2) instead extracts the target region that may contain the cigarette by automatically identifying the eye regions. It can be understood that the mouth necessarily lies below the eyes, so the resulting target region necessarily contains the mouth; this discards a portion of the image above the eyes, cutting useless computation without mistakenly losing the mouth region. Meanwhile, although the cigarette already occupies a fairly large proportion of the face image, it clearly occupies an even larger proportion of the image to be detected, turning the original large-target detection into the detection of a still larger target, so the target detection model learns the local features of the cigarette more easily and accuracy further improves.
In this embodiment, the target detection model may be a model such as YOLOv3, FCOS, or Faster R-CNN.
The embodiment of the application provides a method that cascades a skeleton point detection model, a classification model, and a target detection model, satisfying the requirements of both detection speed and accuracy, and thus provides a new way to judge smoking behavior quickly and accurately. On the one hand, adding a classification model after the skeleton point detection model rapidly screens out potential smokers and extracts their face regions, which meets the real-time requirement of detection. On the other hand, performing fine-grained detection on the face images of potential smokers, rather than coarse-grained detection on the whole image, achieves higher detection precision; and because target detection is not performed on non-smokers, the computational load of target detection is also reduced.
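The three-stage cascade can be sketched as follows. All model interfaces here are assumptions, passed in as callables, since the patent does not fix any particular API.

```python
def detect_smoking_cascade(frame, get_keypoints, classify, crop_face, detect_cigarettes):
    """Sketch of the cascade: skeleton point detection -> classification
    screening -> fine-grained cigarette detection on the face image.
    get_keypoints returns per-person keypoint vectors; classify returns
    1 for a potential smoker; detect_cigarettes returns detected boxes."""
    results = []
    for kpts in get_keypoints(frame):        # stage 1: skeleton points per person
        if classify(kpts) != 1:              # stage 2: preliminary screening
            continue                         # non-smoker: skip target detection
        face = crop_face(frame, kpts)        # extract the face region
        results.append(len(detect_cigarettes(face)) > 0)  # stage 3: detect cigarettes
    return results
```

Only persons that pass the cheap classification stage reach the expensive target detection stage, which is where the speed benefit of the cascade comes from.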
Before applying the method to smoking behavior detection, the classification model is first trained. The classification model may be a support vector machine. Referring to fig. 3, the training process is as follows:
Step 310: acquire a plurality of training images.
Before training, a large number of training images are collected in advance, including images of smokers and images of non-smokers. Each training image is manually annotated with a corresponding label: 1 if the image shows a smoker, 0 otherwise. The training images of smokers serve as positive samples and those of non-smokers as negative samples, with the numbers of positive and negative samples kept as balanced as possible. A larger number of samples is preferable; for example, no fewer than 2000 of each.
Step 320: perform skeleton point detection on each training image using the skeleton point detection model to obtain the coordinates of each skeleton point of the person in each training image.
Each training image is input into the skeleton point detection model, which outputs the corresponding skeleton point coordinates. If an already-trained model from elsewhere is adopted, the skeleton point detection model need not be trained separately; otherwise, it may be trained together with the support vector machine of the next stage.
In this embodiment, the skeleton point detection model may be modified to detect only the skeleton points of a person's hands, arms, neck, and face, without detecting the skeleton points of the whole body.
Step 330: train a support vector machine using the skeleton point coordinates of each training image and the corresponding labels; the model obtained after training is the classification model.
The skeleton point coordinates of each training image, together with the corresponding labels, are fed into the support vector machine as training data until the training conditions are met, at which point training stops. The trained support vector machine is then used as the classification model.
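A minimal sketch of this training step, assuming scikit-learn's `SVC` as the support vector machine. The synthetic data, the number of keypoints, and the RBF kernel are all assumptions standing in for the real skeleton point coordinates and labels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, n_kpts = 200, 18                      # 18 keypoints per person is an assumption
X = rng.normal(size=(n, n_kpts * 2))     # (x, y) per keypoint, flattened per image
y = rng.integers(0, 2, size=n)           # label 1 = smoker, 0 = non-smoker

clf = SVC(kernel="rbf")                  # the classification model
clf.fit(X, y)
preds = clf.predict(X[:5])
```

In practice the feature vectors would come from the skeleton point detection model and the labels from the manual annotation of step 310; the trained `clf` is then used as the classification model of the cascade.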
In summary, the embodiment of the application adopts a cascade of skeleton point detection, classification, and target detection models: the skeleton point detection and classification models rapidly screen out potential smokers and extract their face positions, and fine-grained cigarette target detection is then performed on the faces, achieving both real-time performance and detection accuracy.
Based on the same inventive concept, an embodiment of the present application provides a smoking behavior detection device, please refer to fig. 4, which includes:
the bone point detection module 410 is configured to perform bone point detection on a target image by using a bone point detection model, so as to obtain coordinates of each bone point corresponding to each person in the target image;
the preliminary screening module 420 is configured to input the coordinates of the bone points of each person into a classification model, obtain a preliminary classification result of whether each person in the target image smokes, and extract a face area of the person from the target image for a person whose preliminary classification result is smoking, so as to obtain a face image;
a cigarette detection module 430 for detecting cigarettes from the facial image using a target detection model to determine whether the person smokes.
Optionally, the preliminary screening module 420 is specifically configured to: and determining bone point coordinates corresponding to the facial bone points of the person according to the bone point numbers of the bone point coordinates corresponding to the person, and extracting a facial region from the target image according to the bone point coordinates of the facial bone points.
Optionally, the device further includes a training module, and the training module includes:
a training image acquisition unit configured to acquire a plurality of training images;
the bone point detection unit is used for detecting bone points of each training image by utilizing a bone point detection model to obtain the coordinates of each bone point of the person in each training image;
and the support vector machine training unit is used for training a support vector machine by utilizing the coordinates of each bone point of each training image and the corresponding label so as to obtain the classification model.
Optionally, the preliminary screening module 420 is specifically configured to: extracting the face area of the person from the target image, and expanding the width of the face area outwards according to a preset proportion to obtain a new face area; and obtaining a corresponding face image according to the new face area.
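The width expansion performed by the preliminary screening module can be sketched as follows. The expansion ratio and the per-side expansion are assumptions; the patent only specifies expanding the width outwards by a preset proportion.

```python
def expand_face_region(x, y, w, h, img_w, ratio=0.2):
    """Expand the face region's width outwards by `ratio` on each side,
    clipped to the image boundary. Boxes are (x, y, w, h)."""
    dx = int(w * ratio)
    new_x = max(0, x - dx)
    new_w = min(img_w, x + w + dx) - new_x
    return new_x, y, new_w, h
```

Widening the crop helps keep a cigarette in view when it protrudes sideways beyond the tightly detected face box.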
Optionally, the cigarette detection module 430 includes: an image cutting unit for cutting the face image into an upper half and a lower half to obtain a lower half image; and the cigarette detection unit is used for detecting cigarettes on the lower half part image by using the target detection model.
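The halving step of the image cutting unit amounts to one line of array slicing (a minimal illustration, assuming a NumPy image whose first axis is vertical):

```python
import numpy as np

def lower_half(face_img):
    """Cut the face image into upper and lower halves and keep the lower
    half, which is then passed to the target detection model."""
    h = face_img.shape[0]
    return face_img[h // 2:]
```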
Optionally, the cigarette detection module 430 includes:
a binary image acquisition unit configured to acquire a binary image of the face image;
the connected region searching unit is used for searching the upper half region of the binary image for two connected regions meeting preset conditions, wherein the preset conditions are as follows: the difference between the areas of the two connected regions is not more than a first threshold, and the inclination angle of the line connecting the centroids of the two connected regions is not more than a second threshold;
the image to be detected determining unit is used for determining the lowest point of each of the two connected regions, taking the area below the line connecting the two lowest points as a target area, and taking the image corresponding to the target area as the image to be detected;
and the cigarette detection unit is used for detecting cigarettes from the image to be detected by utilizing the target detection model.
The smoking behavior detection device provided in the embodiments of the present application has been described in the foregoing method embodiments, and for brevity, reference may be made to the corresponding contents of the method embodiments where the device embodiment is not mentioned.
Fig. 5 shows one possible structure of an electronic device 500 provided in an embodiment of the present application. Referring to fig. 5, an electronic device 500 includes: processor 510, memory 520, and communication interface 530, which are interconnected and communicate with each other by a communication bus 540 and/or other forms of connection mechanisms (not shown).
The memory 520 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like. The processor 510 and other possible components may access the memory 520 to read and/or write data therein.
The processor 510 includes one or more processors (only one is shown), each of which may be an integrated circuit chip with signal processing capability. The processor 510 may be a general-purpose processor, including a central processing unit (CPU), a micro controller unit (MCU), a network processor (NP), or another conventional processor; it may also be a special-purpose processor, including a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. Moreover, when there are plural processors 510, some may be general-purpose processors and the others special-purpose processors.
The communication interface 530 includes one or more interfaces (only one is shown) that may be used to communicate directly or indirectly with other devices for data exchange. The communication interface 530 may include interfaces for wired and/or wireless communication.
One or more computer program instructions may be stored in the memory 520 that may be read and executed by the processor 510 to implement the smoking behavior detection methods and other desired functions provided by embodiments of the present application.
It is to be understood that the configuration shown in fig. 5 is merely illustrative, and that electronic device 500 may also include more or fewer components than those shown in fig. 5, or have a different configuration than that shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof. The electronic device 500 may be a physical device such as a PC, a notebook, a tablet, a cell phone, a server, an embedded device, etc., or may be a virtual device such as a virtual machine, a virtualized container, etc. The electronic device 500 is not limited to a single device, and may be a combination of a plurality of devices or a cluster of a large number of devices.
The embodiment of the application also provides a computer readable storage medium, and the computer readable storage medium stores computer program instructions, which when read and run by a processor of a computer, execute the smoking behavior detection method provided by the embodiment of the application. For example, a computer-readable storage medium may be implemented as memory 520 in electronic device 500 in fig. 5.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiment described above is only illustrative; for example, the division of the units is only one logical functional division, and other divisions are possible in practice. Furthermore, functional modules in the various embodiments of the present application may be integrated together to form a single part, each module may exist alone, or two or more modules may be integrated into a single part.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application, and various modifications and variations may be suggested to one skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (9)

1. A smoking behaviour detection method, comprising:
performing bone point detection on a target image by using a bone point detection model to obtain coordinates of each bone point corresponding to each person in the target image;
respectively inputting the bone point coordinates of each person into a classification model to obtain a preliminary classification result of whether each person in the target image smokes or not;
extracting the face area of the person from the target image to obtain a face image for the person with smoking as the primary classification result;
detecting cigarettes on the face image by using a target detection model to determine whether the person smokes or not;
the detecting the cigarette by using the target detection model comprises the following steps:
acquiring a binary image of the face image;
searching the upper half area of the binary image for two connected areas meeting preset conditions, wherein the preset conditions are as follows: the difference between the areas of the two connected areas is not more than a first threshold, and the inclination angle of the line connecting the centroids of the two connected areas is not more than a second threshold;
determining the lowest point of each of the two connected areas, taking the area below the line connecting the two lowest points as a target area, and taking the image corresponding to the target area as an image to be detected;
and detecting cigarettes from the image to be detected by using the target detection model.
2. The method of claim 1, wherein the extracting the facial region of the person from the target image comprises:
according to the bone point numbers of the bone point coordinates corresponding to the person, determining bone point coordinates corresponding to the face bone points of the person;
and extracting a corresponding face region from the target image according to the bone point coordinates of the face bone points.
3. The method according to claim 1, wherein the method further comprises:
acquiring a plurality of training images;
performing skeleton point detection on each training image by using a skeleton point detection model to obtain the coordinates of each skeleton point of the person in each training image;
and training a support vector machine by utilizing the coordinates of each bone point of each training image and the corresponding label to obtain the classification model.
4. The method according to claim 1 or 2, wherein the extracting the face region of the person from the target image to obtain a face image includes:
extracting the face area of the person from the target image, and expanding the width of the face area outwards according to a preset proportion to obtain a new face area;
and obtaining a corresponding face image according to the new face area.
5. The method of claim 1, wherein said using a target detection model for cigarette detection of said facial image comprises:
cutting the facial image into an upper half part and a lower half part to obtain a lower half part image;
and detecting cigarettes on the lower half image by using the target detection model.
6. A smoking behavior detection device, comprising:
the bone point detection module is used for detecting bone points of the target image by utilizing the bone point detection model, and obtaining coordinates of each bone point corresponding to each person in the target image;
the primary screening module is used for inputting the bone point coordinates of each person into a classification model respectively, obtaining a primary classification result of whether each person in the target image smokes or not, and extracting the face area of the person from the target image for the person with the primary classification result of smoking to obtain a face image;
a cigarette detection module for detecting cigarettes on the face image using a target detection model to determine whether the person smokes;
the cigarette detection module comprises:
a binary image acquisition unit configured to acquire a binary image of the face image;
the connected region searching unit is used for searching the upper half region of the binary image for two connected regions meeting preset conditions, wherein the preset conditions are as follows: the difference between the areas of the two connected regions is not more than a first threshold, and the inclination angle of the line connecting the centroids of the two connected regions is not more than a second threshold;
the image to be detected determining unit is used for determining the lowest point of each of the two connected regions, taking the area below the line connecting the two lowest points as a target area, and taking the image corresponding to the target area as the image to be detected;
and the cigarette detection unit is used for detecting cigarettes from the image to be detected by utilizing the target detection model.
7. The apparatus of claim 6, wherein the preliminary screening module is specifically configured to: and determining bone point coordinates corresponding to the facial bone points of the person according to the bone point numbers of the bone point coordinates corresponding to the person, and extracting a facial region from the target image according to the bone point coordinates of the facial bone points.
8. A storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of claims 1-5.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the method of any of claims 1-5.
CN202011344496.6A 2020-11-25 2020-11-25 Smoking behavior detection method and device, storage medium and electronic equipment Active CN112464797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011344496.6A CN112464797B (en) 2020-11-25 2020-11-25 Smoking behavior detection method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112464797A CN112464797A (en) 2021-03-09
CN112464797B true CN112464797B (en) 2024-04-02

Family

ID=74807933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011344496.6A Active CN112464797B (en) 2020-11-25 2020-11-25 Smoking behavior detection method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112464797B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966660A (en) * 2021-03-31 2021-06-15 东南大学 End-to-end face detection and identification method
CN113052140A (en) * 2021-04-25 2021-06-29 合肥中科类脑智能技术有限公司 Video-based substation personnel and vehicle violation detection method and system
CN113239873A (en) * 2021-06-01 2021-08-10 平安科技(深圳)有限公司 Smoking behavior recognition method, system, computer device and storage medium
CN117036327B (en) * 2023-08-22 2024-03-12 广州市疾病预防控制中心(广州市卫生检验中心、广州市食品安全风险监测与评估中心、广州医科大学公共卫生研究院) Protective article inspection method, system, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279752A (en) * 2013-06-19 2013-09-04 山东大学 Eye locating method based on improved Adaboost algorithm and human face geometrical characteristics
CN109685026A (en) * 2018-12-28 2019-04-26 南通大学 A kind of driver holds the method for real-time of mobile phone communication
CN110837815A (en) * 2019-11-15 2020-02-25 济宁学院 Driver state monitoring method based on convolutional neural network
CN111222493A (en) * 2020-01-20 2020-06-02 北京捷通华声科技股份有限公司 Video processing method and device
CN111680562A (en) * 2020-05-09 2020-09-18 北京中广上洋科技股份有限公司 Human body posture identification method and device based on skeleton key points, storage medium and terminal
WO2020200095A1 (en) * 2019-03-29 2020-10-08 北京市商汤科技开发有限公司 Action recognition method and apparatus, and electronic device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4613999B2 (en) * 2008-12-22 2011-01-19 株式会社デンソー Action estimation device, program
CN108230359B (en) * 2017-11-12 2021-01-26 北京市商汤科技开发有限公司 Object detection method and apparatus, training method, electronic device, program, and medium

Also Published As

Publication number Publication date
CN112464797A (en) 2021-03-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant