CN116486383A - Smoking behavior recognition method, smoking detection model, device, vehicle, and medium - Google Patents

Smoking behavior recognition method, smoking detection model, device, vehicle, and medium

Info

Publication number
CN116486383A
CN116486383A
Authority
CN
China
Prior art keywords
image
smoking
fusion
person
visible light
Prior art date
Legal status
Pending
Application number
CN202310145939.6A
Other languages
Chinese (zh)
Inventor
韩苹
许雪
王光甫
叶春雨
Current Assignee
Great Wall Motor Co Ltd
Original Assignee
Great Wall Motor Co Ltd
Priority date
Filing date
Publication date
Application filed by Great Wall Motor Co Ltd
Priority to CN202310145939.6A
Publication of CN116486383A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 - Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a smoking behavior recognition method, a smoking detection model, a device, a vehicle, and a medium, and relates to the technical field of image recognition. The method includes: determining a fusion weight for a person image of the driver's position in the vehicle cabin according to image display parameters; determining the actual distance between the person's mouth and a cigarette in the person image; generating a fused image feature from the fusion weight and the actual distance; and classifying the driver's smoking behavior according to the fused image feature to obtain a recognition result of whether smoking is present. The method improves the accuracy of detecting whether the driver is smoking and helps reduce the false detection rate of smoking behavior.

Description

Smoking behavior recognition method, smoking detection model, device, vehicle, and medium
Technical Field
The present disclosure relates to the field of image recognition technology, and in particular, to a smoking behavior recognition method, a smoking detection model, a device, a vehicle, and a computer-readable storage medium.
Background
Artificial intelligence (AI) visual recognition technology is applied across many industries. By linking a camera to an AI capability layer, potential safety hazards can be recognized automatically, and an audible and visual alarm can be triggered in real time to issue a prompt, realizing closed-loop management of the hazards.
With the rapid development of automobiles toward intelligence, the vehicle cabin has become the most important site of human-machine interaction in the automobile, and AI visual recognition technology has been used to monitor dangerous behavior in the cabin from the video stream of a single camera. Because the video stream used for this monitoring is collected by a single camera, the recognition rate of dangerous behavior is low in complex environmental scenes (for example, bright, dark, shadowed, or mottled lighting, or scenes in which reflective objects in the cabin cast light or light spots onto the driver's face), and false alarms of dangerous behavior easily occur.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides a smoking behavior recognition method, a smoking detection model, a device, a vehicle, and a computer-readable storage medium, which can improve the recognition rate of dangerous behavior in the vehicle cabin under complex environmental scenes and reduce the false alarm rate of dangerous behavior.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to an aspect of the present disclosure, there is provided a smoking behavior recognition method, including: determining a fusion weight for a person image according to image display parameters, wherein the person image is an image of the driver's position in the vehicle cabin; determining the actual distance between the person's mouth and a cigarette in the person image; generating a fused image feature from the fusion weight and the actual distance; and classifying the driver's smoking behavior according to the fused image feature to obtain a recognition result of whether smoking is present.
Optionally, the person image includes an infrared image and a visible light image, and before the step of determining the fusion weight of the person image according to the image display parameters, the smoking behavior recognition method further includes: acquiring a first video stream of the driver's position in the vehicle cabin and performing frame extraction on the first video stream to obtain the infrared image, where the first video stream is collected by an infrared camera; acquiring a second video stream of the overall environment inside the vehicle cabin and performing frame extraction on the second video stream to obtain a cabin environment image, where the second video stream is collected by a visible light camera; and cropping the image of the driver's position from the cabin environment image to obtain the visible light image.
Optionally, the image display parameters include image gray values, and the step of determining the fusion weight of the person image according to the image display parameters includes: determining a first image gray average and a first image gray standard deviation of the infrared image from the image gray values of the infrared image; determining a second image gray average and a second image gray standard deviation of the visible light image from the image gray values of the visible light image; and normalizing the first image gray average, the first image gray standard deviation, the second image gray average, and the second image gray standard deviation to obtain the fusion weight.
Optionally, the step of determining the actual distance between the person's mouth and the cigarette in the person image includes: identifying the center point of the person's mouth and the center point of the cigarette in the person image using a pre-trained target detection model; calculating the distance between the two center points to obtain the center point distance; and determining the center point distance as the actual distance.
Optionally, the step of classifying the driver's smoking behavior according to the fused image feature to obtain the recognition result of whether smoking is present includes: inputting the fused image feature into a pre-trained classification model to classify the driver's smoking behavior and obtain the recognition result.
Optionally, before the step of classifying the driver's smoking behavior according to the fused image feature, the smoking behavior recognition method further includes: acquiring infrared face images and visible light face images; determining the sample image gray average and sample image gray standard deviation of the infrared face image and of the visible light face image according to their image gray values, to obtain a sample fusion weight; determining the distance between the person's mouth and the cigarette in the infrared face image and in the visible light face image, to obtain a distance value pair; generating a sample fused image feature from the sample fusion weight and the distance value pair; and training the classification model with the sample fused image features.
According to another aspect of the present disclosure, there is provided a smoking detection model, including:
a fusion weight module, configured to determine the fusion weight of an infrared image and a visible light image according to image display parameters when a face is present in the infrared image and/or the visible light image;
a pre-trained target detection model, configured to determine the actual distance between the person's mouth and the cigarette in the infrared image and in the visible light image;
a feature fusion module, configured to generate a fused image feature from the fusion weight and the actual distances; and
a pre-trained classification model, configured to classify the smoking behavior of the person in the infrared image and the visible light image according to the fused image feature, obtaining a recognition result of whether smoking is present.
According to still another aspect of the present disclosure, there is provided a smoking behavior recognition apparatus, including:
a weight calculation module, configured to determine the fusion weight of a person image according to image display parameters;
a distance calculation module, configured to determine the actual distance between the person's mouth and the cigarette in the person image;
a feature fusion module, configured to generate a fused image feature from the fusion weight and the actual distance; and
a behavior recognition module, configured to classify the driver's smoking behavior according to the fused image feature, obtaining a recognition result of whether smoking is present.
According to yet another aspect of the present disclosure, there is provided a vehicle including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the smoking behavior recognition method described in the above embodiments when executing the computer program.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the smoking behavior recognition method described in the above embodiments.
The smoking behavior recognition method, the smoking detection model, the device, the vehicle and the computer readable storage medium provided by the embodiment of the disclosure have the following technical effects:
According to the above technical solutions, the fusion weight of the person image of the driver's position in the vehicle cabin is determined according to the image display parameters, the actual distance between the person's mouth and the cigarette in the person image is determined, a fused image feature is generated from the fusion weight and the actual distance, and the driver's smoking behavior is classified according to the fused image feature to obtain a recognition result of whether smoking is present.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 illustrates a block flow diagram of model training in an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram of a smoking behavior recognition method in an exemplary embodiment of the present disclosure;
FIG. 3 shows a schematic view of a cabin environment image;
FIG. 4 shows a schematic diagram of a first annotated image;
FIG. 5 shows a schematic flow chart of smoking behavior detection using a smoking detection model;
FIG. 6 shows a block diagram of a smoking detection model of the present disclosure;
FIG. 7 shows a process flow diagram of the smoking detection model of the present disclosure;
FIG. 8 is a schematic structural diagram of a smoking behavior recognition device provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a vehicle provided by an embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present disclosure more apparent, the embodiments of the present disclosure will be described in further detail below with reference to the accompanying drawings.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the disclosure as detailed in the appended claims.
It is noted that the above-described figures are merely schematic illustrations of processes involved in a method according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
With the rapid development of automobiles toward intelligence, the vehicle cabin has become the most important site of human-machine interaction in the automobile, and to ensure safety in the cabin, dangerous behavior (such as smoking) must be monitored. In the related art, a method based on AI visual recognition technology monitors dangerous behavior in the cabin from the video stream of a single camera. Because the video stream used for this monitoring is collected by a single camera, the recognition rate of dangerous behavior is low in complex environmental scenes (for example, bright, dark, shadowed, or mottled lighting, or scenes in which reflective objects in the cabin cast light or light spots onto the driver's face), and false alarms of dangerous behavior easily occur.
To address these problems of the related art, the present disclosure provides a smoking behavior recognition method, a smoking detection model, a device, a vehicle, and a computer-readable storage medium, so as to solve the low recognition rate of smoking behavior and the frequent false alarms of dangerous behavior in complex environmental scenes.
Smoking behavior in the vehicle cabin under complex environmental scenes can be recognized by a smoking detection model. FIG. 1 shows a block flow diagram of model training in an exemplary embodiment of the present disclosure. For recognition of smoking behavior, at least two models need to be trained: a face recognition model and a smoking detection model. The training process is as follows:
data acquisition, i.e. acquisition of OMS (Occupant Monitoring System ) data and DMS (Driver Monitoring System, driver monitoring system) data. The OMS data refer to video data of the whole environment inside the automobile cabin, and the OMS data are collected through an RGB (Red Green Blue) camera. The monitoring object of OMS system is the passenger, and the RGB camera sets up in a plurality of positions in the car cabin, and the RGB camera can shoot all positions in the car cabin of car, namely the RGB camera not only can shoot the passenger, can also shoot the driver. The DMS data refers to video data of the position of the driver in the cabin of the automobile, and the DMS data is collected by an IR (Infrared) camera. The monitoring object of the DMS system is a driver, the IR camera is arranged at the left A column position of the automobile, the IR camera mainly collects video information of the region where the driving seat is located, namely, under the condition that the driving seat sits on a person, the person on the driving seat can be shot by the IR camera. The IR camera is used for night vision monitoring, and the focal position of a lens of the common camera is changed under the condition of infrared light at night, so that an image is blurred, and the image can be clearly adjusted. The focus of the lens of the IR camera is consistent in both infrared and visible light.
OMS data and DMS data may be collected simultaneously. An exemplary collection process: drivers of different ages, skin colors, genders, builds, and heights sit in the driver's seat and simulate a normal driving scene, each adjusting the seat to a comfortable driving position; OMS data are then collected through the RGB camera and DMS data through the IR camera. After the OMS data and DMS data are collected, data processing is performed.
For data processing, a number of visible light images are extracted from the OMS data and a number of infrared images from the DMS data; these can be understood as sample images. Every visible light and infrared image must contain a human face; some of them contain cigarettes and the others do not.
After the sample images are obtained, they are annotated: the face frame and the cigarette frame in each sample image are labeled to obtain annotated sample images, which are then normalized to obtain normalized sample images. The face recognition model is trained with the annotated sample images, and the smoking detection model is trained with the normalized sample images. Once both models are trained, they can be used together to detect whether a person is smoking.
During model training, OMS data and DMS data are collected by simulating the driving scene in order to obtain visible light and infrared images of faces and of cigarettes. Such images can also be obtained in other scenes. For example, an IR camera placed at one position in a room collects video of people while smoking and while not smoking, from which infrared images of both states are extracted; an RGB camera at another position collects panoramic indoor video, from which visible light images of people while smoking and while not smoking are likewise obtained. In this way, visible light and infrared images of faces and cigarettes can be collected outside the driving scene as well.
The following is an embodiment of a smoking behavior recognition method provided by the present disclosure.
FIG. 2 is a flow chart illustrating a smoking behavior recognition method according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, the method is applied to a driving scenario, and the execution subject may be a controller of the vehicle or a vehicle-mounted terminal. The method includes the following steps:
S110: determine the fusion weight of the person image according to the image display parameters.
In an exemplary embodiment, the person image is an image of the driver's position in the vehicle cabin. While the driver is driving the vehicle, an image of the driver's position in the cabin is collected to obtain the person image. The person image is input into a pre-trained face recognition model, which recognizes whether a face is present; if a face is present in the person image, the fusion weight of the person image is calculated according to the image display parameters.
The FaceNet algorithm can be selected as the face recognition model, with Inception-ResNetV1 as its backbone network. The face recognition model can predict the face frame, mouth frame, and cigarette frame in an image, i.e., the predicted face frame, predicted mouth frame, and predicted cigarette frame. The face recognition model is trained with infrared sample images and visible light sample images, all of which contain human faces. Some of the infrared sample images contain a cigarette in addition to the person's mouth, while the others contain the mouth but no cigarette; likewise, some of the visible light sample images contain a cigarette in addition to the mouth, while the others do not.
Before training the face recognition model, the faces in the infrared and visible light sample images must be annotated. The annotated sample images are then input into the Inception-ResNetV1 backbone, which outputs face feature vectors, and the face recognition model is trained with these feature vectors until the model converges, completing the training.
The image display parameters include image gray values, i.e., the gray values of the pixels in the image, so the image display parameters can be represented by image gray values. The image gray average and image gray standard deviation of the person image are calculated and normalized to obtain a first value p corresponding to the gray average and a second value q corresponding to the gray standard deviation, both between 0 and 1. The fusion weight of the person image is represented by (p, q), which is thus related to the image gray values, with p + q = 1.
In one possible implementation, the person image includes an infrared image and a visible light image, and before the step of determining the fusion weight of the person image according to the image display parameters, the smoking behavior recognition method further includes:
acquiring a first video stream of the driver's position in the vehicle cabin and performing frame extraction on the first video stream to obtain the infrared image, where the first video stream is collected by an infrared camera;
acquiring a second video stream of the overall environment inside the vehicle cabin and performing frame extraction on the second video stream to obtain a cabin environment image, where the second video stream is collected by a visible light camera; and
cropping the image of the driver's position from the cabin environment image to obtain the visible light image.
The person image is acquired in a driving scene. A first video stream of the driver's position in the cabin is acquired; it corresponds to the DMS data, i.e., it is collected by the IR camera. Frames are extracted from the first video stream to obtain several first images to be processed, each of which is cropped to obtain first cropped images. Each first cropped image is then converted to grayscale, yielding the infrared images, all of the same size, for example 800 x 1280.
A second video stream of the overall environment inside the cabin is acquired; it corresponds to the OMS data, i.e., it is collected by the RGB camera. Frames are extracted from the second video stream to obtain several second images to be processed, each of which is cropped to obtain cabin environment images. FIG. 3 shows a schematic view of a cabin environment image: P1 denotes the cabin environment image, P2 denotes the image of the driver's position in the cabin, and the area of P1 outside P2 shows other things in the cabin, such as the rear seats and the front passenger seat.
From each cabin environment image, the image of the driver's position is cropped out, yielding several second preprocessed images, such as P2 in FIG. 3. Each second preprocessed image is then converted to grayscale, yielding the visible light images, all of the same size, for example 1920 x 1080.
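As an illustration of this preprocessing, the following is a minimal sketch assuming OpenCV; the file names, sampling step, crop box, and output sizes are hypothetical placeholders, not values from this disclosure.

```python
import cv2

def extract_gray_frames(video_path, crop_box=None, size=(800, 1280), step=10):
    """Extract every `step`-th frame, optionally crop it, resize, and grayscale."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            if crop_box is not None:  # e.g. the driver's position P2 in the OMS view
                x, y, w, h = crop_box
                frame = frame[y:y + h, x:x + w]
            frame = cv2.resize(frame, size)
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        idx += 1
    cap.release()
    return frames

# First video stream (DMS, IR camera) -> infrared images;
# second video stream (OMS, RGB camera) -> cropped visible light images.
ir_images = extract_gray_frames("dms_stream.mp4", size=(800, 1280))
rgb_images = extract_gray_frames("oms_stream.mp4", crop_box=(600, 200, 500, 600),
                                 size=(1920, 1080))
```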
Because the RGB camera and the IR camera are mounted at different positions, the video data they collect correspond to different viewpoints, so person images of the same driver captured from different positions are obtained.
After the infrared image and the visible light image are acquired, they are input in turn into the face recognition model; if the model recognizes a face in the infrared image and/or the visible light image, the fusion weight of the two images is calculated according to the image display parameters.
In one possible implementation, determining the fusion weight of the person image according to the image display parameters includes:
determining a first image gray average and a first image gray standard deviation of the infrared image from the image gray values of the infrared image;
determining a second image gray average and a second image gray standard deviation of the visible light image from the image gray values of the visible light image; and
normalizing the first image gray average, the first image gray standard deviation, the second image gray average, and the second image gray standard deviation to obtain the fusion weight.
The image gray average is calculated by formula (1) and the image gray standard deviation by formula (2):

$$\mathrm{arg} = \frac{1}{A \times B} \sum_{i=1}^{A} \sum_{j=1}^{B} x(i,j) \tag{1}$$

$$\sigma = \sqrt{\frac{1}{A \times B} \sum_{i=1}^{A} \sum_{j=1}^{B} \left( x(i,j) - \mathrm{arg} \right)^{2}} \tag{2}$$

where x(i, j) is the gray value of the pixel at (i, j) in the image, arg is the image gray average, σ is the image gray standard deviation, A is the width of the image, and B is the length of the image.
Based on formulas (1) and (2), the first image gray average and first image gray standard deviation of the infrared image are calculated from its image gray values and denoted ir_arg and ir_σ; together they represent the image weight value of the infrared image.
Likewise, the second image gray average and second image gray standard deviation of the visible light image are calculated from its image gray values and denoted rgb_arg and rgb_σ; together they represent the image weight value of the visible light image.
After the image weight values of the two images are obtained, they are linearly fused to obtain the fusion weight corresponding to the infrared and visible light images, which contains the image weight values of both. Because the two images are views of the same person acquired at different positions, that is, two angles or states of the same person, the resulting fusion weight is dynamic. The gray averages and standard deviations form the tuple (ir_arg, ir_σ, rgb_arg, rgb_σ), and normalizing this tuple yields the fusion weight (a, b, c, d), where ir_arg corresponds to a, ir_σ to b, rgb_arg to c, and rgb_σ to d, with a + b + c + d = 1.
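A minimal sketch of this computation follows. The disclosure does not spell out the exact normalization, so a simple sum-to-one scaling of the four statistics is assumed here.

```python
import numpy as np

def gray_stats(img):
    """Formulas (1) and (2): gray average and gray standard deviation."""
    img = img.astype(np.float64)
    return img.mean(), img.std()

def fusion_weight(ir_img, rgb_img):
    ir_arg, ir_sigma = gray_stats(ir_img)
    rgb_arg, rgb_sigma = gray_stats(rgb_img)
    v = np.array([ir_arg, ir_sigma, rgb_arg, rgb_sigma])
    return v / v.sum()  # (a, b, c, d), with a + b + c + d = 1

a, b, c, d = fusion_weight(ir_images[0], rgb_images[0])
```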
The image weight values of the infrared and visible light images are computed from image gray values, and fusing them into the fusion weight is equivalent to fusing the two images themselves, so their complementary advantages are realized and interference from light in complex environmental scenes on the face image can be avoided.
S120: determine the actual distance between the person's mouth and the cigarette in the person image.
After the person image is acquired, the actual distance between the person's mouth and the cigarette in it is calculated. This distance is a single value, denoted k, and forms part of the fused image feature generated below.
In one possible implementation, determining the actual distance between the person's mouth and the cigarette in the person image includes:
identifying the center point of the person's mouth and the center point of the cigarette in the person image using a pre-trained target detection model;
calculating the distance between the two center points to obtain the center point distance; and
determining the center point distance as the actual distance.
The YOLOv5 algorithm can be selected as the target detection model, which is pre-trained to identify the center points of the cigarette and the mouth in an image. Training the target detection model includes the following:
acquiring first annotated images carrying cigarette annotation frames and mouth annotation frames, and second annotated images carrying mouth annotation frames; and sampling the first and second annotated images to train the target detection model. The first annotated images are the images containing cigarettes obtained after classifying the person sample images, the second annotated images are the images without cigarettes obtained after the same classification, and every person sample image contains a human face and, specifically, the person's mouth.
The person sample images are classified into smoking and non-smoking images; the smoking images contain cigarettes and the non-smoking images do not. The cigarette and mouth in each smoking image are annotated to obtain the cigarette annotation frame and mouth annotation frame, generating a first annotated image that carries both, i.e., an annotated smoking image. FIG. 4 shows a schematic view of a first annotated image, where 100 denotes the mouth, 101 the mouth annotation frame, 200 the cigarette, and 201 the cigarette annotation frame. Similarly, the mouth in each non-smoking image is annotated to obtain the mouth annotation frame, generating a second annotated image, i.e., an annotated non-smoking image.
The first and second annotated images are acquired and used to iteratively train the target detection model until the model converges, completing the training.
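For reference, YOLOv5 reads one plain-text label file per image, one object per line, in the form `class x_center y_center width height` with coordinates normalized to the image size. Whether this disclosure used that exact format is not stated; the class ids (0 = mouth, 1 = cigarette) and numbers below are hypothetical:

```
0 0.512 0.631 0.110 0.058
1 0.548 0.655 0.090 0.032
```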
The target detection model obtains the center point of the person's mouth and the center point of the cigarette from the infrared image; both can be represented by pixel coordinates. The distance between the pixel coordinates of the mouth's center point and those of the cigarette's center point is calculated to obtain the center point distance, which is taken as the actual distance between the mouth and the cigarette in the infrared image. The same is done for the visible light image: the model obtains the two center points as pixel coordinates, their distance is calculated, and the result is taken as the actual distance between the mouth and the cigarette in the visible light image.
When the person image includes an infrared image and a visible light image, both are input into the target detection model. These are images in which the face recognition model has already identified a face; after recognizing the face, the face recognition model predicts the face frame and mouth frame in each image and, if a cigarette is present, its predicted cigarette frame as well.
Smoking postures vary, such as pinching the cigarette or holding it in the mouth, and the cross-section of the cigarette tip also varies, being round for some cigarettes and rectangular for others. Therefore, this embodiment judges whether the person is smoking by means of the actual distance between the person's mouth and the cigarette in the image, calculated as follows:
$$d = \frac{\sqrt{(x_m - x_s)^2 + (y_m - y_s)^2}}{l} \tag{3}$$

In formula (3), d is the distance, (x_m, y_m) are the coordinates of the center point of the predicted mouth frame in the image, (x_s, y_s) are the coordinates of the center point of the predicted cigarette frame, and l is the width of the predicted face frame in the image.
The infrared image and the visible light image are input into the target detection model, which outputs, for each, the pixel coordinates of the center point of the predicted mouth frame, the pixel coordinates of the center point of the predicted cigarette frame, and the width of the predicted face frame. Formula (3) is then used to calculate the actual distance between the mouth and the cigarette in the infrared image, denoted e, and in the visible light image, denoted f.
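A sketch of formula (3) follows; the center coordinates and face widths in the usage lines are made-up values for illustration.

```python
import math

def mouth_cigarette_distance(mouth_center, cig_center, face_width):
    """Euclidean mouth-to-cigarette distance, normalized by the predicted
    face-frame width (formula (3))."""
    xm, ym = mouth_center
    xs, ys = cig_center
    return math.hypot(xm - xs, ym - ys) / face_width

e = mouth_cigarette_distance((412, 590), (428, 612), 260)   # infrared image
f = mouth_cigarette_distance((980, 455), (1002, 470), 310)  # visible light image
```

Dividing by the face-frame width makes the feature scale-invariant, so a cigarette held near the mouth yields a small d regardless of how close the person sits to the camera.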
S130: generate the fused image feature from the fusion weight and the actual distance.
In the general case, where the person image is a single image, the calculated fusion weight is (p, q) and the actual distance is k; fusing them gives the fused image feature w1 = (p, q, k).
When the person image includes an infrared image and a visible light image, the calculated fusion weight is (a, b, c, d), and the actual distance between mouth and cigarette is e in the infrared image and f in the visible light image; fusing the weight with the two distances gives the fused image feature w2 = (a, b, c, d, e, f).
S140: classify the driver's smoking behavior according to the fused image feature to obtain the recognition result of whether smoking is present.
After the fused image feature is obtained, the driver's smoking behavior is classified according to it, yielding the recognition result of whether smoking is present. If the result shows that the driver is smoking, a prompt message is issued to raise an alarm about the driver's smoking behavior.
In one possible implementation, classifying the driver's smoking behavior according to the fused image feature to obtain the recognition result includes:
inputting the fused image feature into a pre-trained classification model, which classifies the driver's smoking behavior and outputs the recognition result of whether smoking is present.
The classification model outputs a probability value that the driver is smoking, and the recognition result indicates whether smoking behavior is present. For example, when the probability value is greater than or equal to a preset value, the driver is judged to be smoking; when the probability value is less than the preset value, the driver is judged not to be smoking.
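A minimal inference sketch, assuming the classification model is a scikit-learn random forest (the disclosure names only "a random forest algorithm"); the 0.5 default stands in for the preset value, and clf is a trained classifier (see the training sketch below):

```python
import numpy as np

def recognize_smoking(clf, fused_feature, preset=0.5):
    """fused_feature: the tuple (a, b, c, d, e, f) built in S110-S130."""
    prob = clf.predict_proba(np.array([fused_feature]))[0, 1]  # P(smoking)
    return prob >= preset, prob

is_smoking, prob = recognize_smoking(clf, (a, b, c, d, e, f))
if is_smoking:
    print(f"smoking behavior detected (p = {prob:.2f}), raising an alarm")
```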
In one possible implementation, before the step of classifying the driver's smoking behavior according to the fused image feature, the smoking behavior recognition method further includes the following training of the classification model:
acquiring infrared face images and visible light face images;
determining the sample image gray average and sample image gray standard deviation of the infrared face image and of the visible light face image according to their image gray values, to obtain the sample fusion weight;
determining the distance between the person's mouth and the cigarette in the infrared face image and in the visible light face image, to obtain a distance value pair;
generating the sample fused image feature from the sample fusion weight and the distance value pair; and
training the classification model with the sample fused image features.
Infrared face images shot by the IR camera and visible light face images shot by the RGB camera are acquired; all contain human faces. Some of the infrared face images contain a cigarette in addition to the person's mouth, while the others contain the mouth but no cigarette; the same holds for the visible light face images.
The sample image gray average and sample image gray standard deviation of the infrared face image and of the visible light face image are calculated according to formulas (1) and (2) above and then normalized to obtain the sample fusion weight. For example, denote the gray average and gray standard deviation of the infrared face image as ir1_arg and ir1_σ, and those of the visible light face image as rgb1_arg and rgb1_σ; normalizing (ir1_arg, ir1_σ, rgb1_arg, rgb1_σ) yields the sample fusion weight (a1, b1, c1, d1), with a1 + b1 + c1 + d1 = 1.
The distance between the person's mouth and the cigarette is calculated according to formula (3) for the infrared face image, denoted e1, and for the visible light face image, denoted f1, giving the distance value pair (e1, f1).
After the sample fusion weight and the distance value pair are obtained, they are fused to obtain the sample fused image feature, denoted W3, where W3 = (a1, b1, c1, d1, e1, f1).
The classification model is trained with the sample fused image features W3 until training is complete. The classification model uses a random forest algorithm.
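Under the same scikit-learn assumption, training reduces to fitting the forest on the stacked sample fused image features; `sample_fused_features` and `labels` are hypothetical names for the prepared training data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array(sample_fused_features)  # shape (n, 6): rows of (a1, b1, c1, d1, e1, f1)
y = np.array(labels)                 # 1 = smoking, 0 = not smoking

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)                        # the forest is built in a single fit pass
```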
In a complex environmental scene, light spots can be cast onto the face. For example, since the cross-section of a cigarette tip is round or rectangular, light falling on the face can produce spots of a similar shape, interfering with the recognition of smoking behavior and causing misrecognition. During training of the classification model, the infrared face image and the visible light face image are acquired at different positions, the former under infrared light and the latter under visible light. Using the sample fused image features generated from both as the input of the classification model is equivalent to using both images as input, so the complementary effect of the two can be fully exploited, and misrecognition caused by cigarette-like light spots on the face in complex environmental scenes is avoided.
According to the above technical solutions, the fusion weight of the person image of the driver's position in the vehicle cabin is determined according to the image display parameters, the actual distance between the person's mouth and the cigarette in the person image is determined, the fused image feature is generated from the fusion weight and the actual distance, and the driver's smoking behavior is classified according to the fused image feature to obtain the recognition result of whether smoking is present.
When the person image includes an infrared image and a visible light image, the fused image feature contains not only the fusion weight of the two images, which is related to the image gray values, but also the actual mouth-to-cigarette distance in each image. This enlarges the basis of data for recognizing the driver's smoking behavior and is equivalent to fusing the infrared and visible light images, directly correlating the complex environmental scene with the smoking behavior so that their relationship can be captured. Using fused image features that relate to smoking behavior for recognition makes the contributions of the infrared and visible light images to the smoking prediction directly observable, improves the accuracy of detecting whether the driver is smoking, and reduces false detections. Applied to smoking recognition in the vehicle cabin, the classification model raises the recognition rate of smoking behavior under complex environmental scenes and lowers the false alarm rate.
The following is an embodiment of the smoking detection model provided by the present disclosure.
After the smoking detection model and the face recognition model are trained, they cooperate to detect smoking behavior. FIG. 5 shows a schematic flow chart of smoking behavior detection using the smoking detection model. The detection flow is as follows:
An IR camera and an RGB camera mounted at different positions collect an infrared image and a visible light image of the region where the same person is located. The two region images are then preprocessed: the infrared image of the person is cropped from the infrared region image and converted to grayscale, and the visible light image of the person is cropped from the visible light region image and converted to grayscale. The face recognition model then checks whether a face is present in the two grayscale images; if a face is present in at least one of them, both images are input into the smoking detection model, which detects whether the person is smoking and outputs the detection result.
FIG. 6 shows a block diagram of a smoking detection model of the present disclosure. The smoking detection model 500 includes a fusion weight module 510, a pre-trained target detection model 520, a feature fusion module 530, and a pre-trained classification model 540; the classification model 540 uses a random forest algorithm.
The fusion weight module 510 is configured to determine the fusion weight of an infrared image and a visible light image according to image display parameters when a face is present in the infrared image and/or the visible light image; the target detection model 520 is configured to determine the actual distance between the person's mouth and the cigarette in each of the two images; the feature fusion module 530 is configured to generate the fused image feature from the fusion weight and the actual distances; and the classification model 540 is configured to classify the smoking behavior of the person in the two images according to the fused image feature, obtaining the recognition result of whether smoking is present.
In an exemplary embodiment, the smoking detection model 500 can be applied to different scenarios to detect a person's smoking behavior: for example, detecting the smoking behavior of drivers and passengers (collectively called drivers here) while a car is being driven, of passengers in elevators, of people in offices, and so on.
FIG. 7 shows a process flow diagram of the smoking detection model of the present disclosure. To detect smoking behavior with the smoking detection model 500, an infrared image and a visible light image containing the same person must be acquired, i.e., images of the same person shot by cameras from different positions. The face recognition model then checks whether a face is present in the infrared image and/or the visible light image; once it recognizes a face, it also predicts the face frame and mouth frame in each image and, if a cigarette is present, the predicted cigarette frame. If a face is present in at least one of the two images, the images are input together into the fusion weight module 510 and the target detection model 520.
The fusion weight module 510 obtains the image display parameters of the infrared image and the visible light image, namely their image gray values. Based on formulas (1) and (2), the image gray average and image gray standard deviation of the infrared image are calculated from its gray values and denoted ir2_arg and ir2_σ, and those of the visible light image are denoted rgb2_arg and rgb2_σ. These four values form the tuple (ir2_arg, ir2_σ, rgb2_arg, rgb2_σ), which is normalized to obtain the fusion weight (a2, b2, c2, d2), where a2 + b2 + c2 + d2 = 1. The fusion weight module 510 outputs the fusion weight, which is input into the feature fusion module 530.
The target detection model 520 obtains, from the infrared image, the pixel coordinates of the center point of the predicted mouth frame and of the center point of the predicted cigarette frame, and calculates, based on formula (3) above, the center point distance between the center of the person's mouth and the center of the cigarette in the infrared image; this center point distance, taken as the actual distance between the person's mouth and the cigarette in the infrared image, is denoted e2. Similarly, the target detection model 520 obtains, from the visible light image, the pixel coordinates of the center points of the predicted mouth frame and the predicted cigarette frame, calculates the center point distance between the center of the person's mouth and the center of the cigarette in the visible light image based on formula (3), and denotes this distance, the actual distance between the person's mouth and the cigarette in the visible light image, as f2. The target detection model 520 outputs the actual distance between the person's mouth and the cigarette in each of the infrared image and the visible light image, and the two actual distances are input to the feature fusion module 530.
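Formula (3) is not reproduced in this passage; the sketch below (an illustration, not part of the disclosure) assumes it is the standard Euclidean distance between the two box centers, with each predicted frame given as (x1, y1, x2, y2) pixel coordinates:

```python
import math

def box_center(box):
    # box = (x1, y1, x2, y2) in pixel coordinates (assumed layout).
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def mouth_cigarette_distance(mouth_box, cigarette_box):
    # Assumed Euclidean form of formula (3): the distance between the
    # center of the predicted mouth frame and the center of the predicted
    # cigarette frame, e.g. e2 for the infrared image or f2 for the
    # visible light image.
    mx, my = box_center(mouth_box)
    cx, cy = box_center(cigarette_box)
    return math.hypot(mx - cx, my - cy)
```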
The feature fusion module 530 fuses the input fusion weight and the two actual distances to obtain the fused image feature, denoted w4 = (a2, b2, c2, d2, e2, f2), and outputs it to the classification model 540. The classification model 540 classifies the smoking behavior of the same person in the infrared image and the visible light image according to the fused image feature, and obtains a recognition result of whether smoking exists.
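In code, the fusion step amounts to concatenating the four weights and the two distances into a six-dimensional vector. The sketch below is illustrative, and the generic predict() interface is an assumption, since the disclosure does not fix a classifier family:

```python
import numpy as np

def fused_image_feature(weights, e2, f2):
    # w4 = (a2, b2, c2, d2, e2, f2): the four fusion weights followed by
    # the two mouth-to-cigarette distances, shaped as one sample row.
    return np.concatenate([np.asarray(weights), [e2, f2]]).reshape(1, -1)

# Hypothetical usage with any binary classifier exposing predict():
#   w4 = fused_image_feature(weights, e2, f2)
#   is_smoking = bool(clf.predict(w4)[0])
```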
Because the infrared image and the visible light image are acquired from different positions, and the infrared image carries infrared-band information while the visible light image carries visible-band information, taking both as input to the smoking detection model fully exploits their complementary effect and avoids misidentifying smoking behavior caused by light spots resembling a lit cigarette being cast on the face in a complex environment.
According to this embodiment, by providing the smoking detection model, the fusion weight of the infrared image and the visible light image with respect to the image gray value, together with the distance between the person's mouth and the cigarette in each of the two images, can serve as the fused image feature associated with smoking behavior. Obtaining the fused image feature is equivalent to fusing the infrared image and the visible light image and directly correlating the complex environment with the smoking behavior, so the relationship between smoking behavior and a complex environment can be captured. Using this smoking-related fused image feature for recognition makes the tendency of the infrared and visible light images with respect to smoking prediction directly observable, improves the accuracy of detecting whether a person is smoking, and reduces false detections of smoking behavior.
The following are smoking behavior recognition device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the smoking behavior recognition device of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 8 is a schematic diagram showing the structure of a smoking behavior recognition device to which an embodiment of the present disclosure can be applied. Referring to fig. 8, the smoking behavior recognition device shown in the figure may be implemented by software, hardware, or a combination of both as all or part of a vehicle, or may be integrated into the vehicle or onto a server as a separate module.
The smoking behavior recognition device 800 according to the embodiment of the present disclosure includes:
a weight calculation module 810 for determining a fusion weight of the person image according to the image display parameters;
a distance calculation module 820 for determining an actual distance between the mouth of the person and the cigarette in the person image;
the feature fusion module 830 is configured to generate a fused image feature according to the fusion weight and the actual distance;
the behavior recognition module 840 is configured to classify the smoking behavior of the driver according to the fused image features, so as to obtain a recognition result of whether smoking exists.
In an exemplary embodiment, based on the foregoing, the person image includes an infrared image and a visible light image, and the smoking behavior recognition device 800 further includes:
the system comprises an infrared image acquisition unit, a first video stream acquisition unit and a second video stream acquisition unit, wherein the infrared image acquisition unit is used for acquiring a first video stream of a position of a driver in a vehicle cabin, performing frame extraction processing on the first video stream to obtain an infrared image, and acquiring the first video stream by an infrared camera;
a visible light image acquisition unit, configured to acquire a second video stream of the overall environment inside the vehicle cabin and perform frame extraction processing on the second video stream to obtain a vehicle cabin environment image, wherein the second video stream is acquired by a visible light camera; and to crop the image at the driver's position from the vehicle cabin environment image to obtain the visible light image.
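For concreteness, a minimal sketch of the frame-extraction and cropping steps is given below, using OpenCV; the frame index and the driver-seat ROI are placeholders determined by the camera installation, not values taken from this disclosure:

```python
import cv2

def sample_frame(video_path, frame_idx=0):
    # Frame extraction: grab one frame from the video stream.
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
    ok, frame = cap.read()
    cap.release()
    return frame if ok else None

def crop_driver_region(cabin_frame, roi):
    # roi = (x1, y1, x2, y2): the driver-seat region within the cabin
    # environment image, fixed by the camera installation (placeholder).
    x1, y1, x2, y2 = roi
    return cabin_frame[y1:y2, x1:x2]
```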
In an exemplary embodiment, based on the foregoing scheme, the image display parameter includes an image gray value, and the weight calculating module 810 includes:
the first calculation unit is used for determining a first image gray average value and a first image gray standard deviation of the infrared image according to the image gray value of the infrared image;
the second calculation unit is used for determining a second image gray average value and a second image gray standard deviation of the visible light image according to the image gray value of the visible light image;
And the fusion unit is used for carrying out normalization processing on the first image gray average value, the first image gray standard deviation, the second image gray average value and the second image gray standard deviation to obtain the fusion weight.
In an exemplary embodiment, based on the foregoing, the distance calculating module 820 includes:
a position acquisition unit for identifying a center point position of the mouth of the person and a center point position of the cigarette in the person image by using a pre-trained target detection model;
a third calculation unit, configured to calculate a distance between a center point position of the mouth of the person and a center point position of the cigarette, to obtain a center point distance;
and the distance determining unit is used for determining the center point distance as the actual distance.
In an exemplary embodiment, based on the foregoing scheme, the behavior recognition module 840 is specifically configured to input the fused image features into a pre-trained classification model, so as to classify the smoking behavior of the driver, and obtain the recognition result of whether there is smoking.
In an exemplary embodiment, based on the foregoing, the smoking behavior recognition device 800 further includes:
The sample face image acquisition unit is used for acquiring an infrared face image and a visible light face image;
the sample fusion weight calculation unit is used for determining the sample image gray average value and the sample image gray standard deviation of each of the infrared face image and the visible light face image according to their image gray values, so as to obtain the sample fusion weights;
a distance value pair calculation unit, configured to determine a distance between a mouth of a person and a cigarette in each of the infrared face image and the visible face image, to obtain a distance value pair;
the sample image feature fusion unit is used for generating sample fusion image features according to the sample fusion weight and the distance value pair;
and the classification model unit is used for training the classification model by adopting the sample fusion image features.
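To make the training step concrete, here is a hedged sketch: each sample fused image feature is a six-dimensional vector (the four sample fusion weights plus the distance value pair), and an SVM is used purely as an assumed stand-in, since the disclosure does not name a classifier family. The random arrays are placeholders that only make the sketch runnable; real rows would come from the units above:

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: rows are sample fused image features
# (four sample fusion weights plus the distance value pair); labels
# are 1 = smoking, 0 = not smoking.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = rng.integers(0, 2, size=200)

clf = SVC(kernel="rbf")  # assumed model family; the patent names none
clf.fit(X, y)
```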
It should be noted that when the smoking behavior recognition device provided in the foregoing embodiment performs the smoking behavior recognition method, the division into the above functional modules is only an example; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the smoking behavior recognition device and the smoking behavior recognition method provided in the foregoing embodiments belong to the same concept, so for details not disclosed in the device embodiments of the present disclosure, please refer to the embodiments of the smoking behavior recognition method of the present disclosure; the details are not repeated here.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods of the previous embodiments. The computer readable storage medium may include, among other things, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks, ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
The disclosed embodiments also provide a vehicle including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of any of the methods of the embodiments described above when it executes the program.
Fig. 9 schematically shows the structure of a vehicle. Referring to fig. 9, a vehicle 900 includes: a processor 901 and a memory 902.
In the embodiment of the disclosure, the processor 901 is the control center of the computer system, and may be the processor of a physical machine or of a virtual machine. The processor 901 may include one or more processing cores, such as a 4-core or an 8-core processor. The processor 901 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also referred to as a CPU (Central Processing Unit), while the coprocessor is a low-power processor for processing data in the standby state.
In the embodiment of the present disclosure, the processor 901 is specifically configured to: determine the fusion weight of the person image according to the image display parameters, wherein the person image is an image of the position of the driver in the cabin; determine the actual distance between the person's mouth and the cigarette in the person image; generate the fused image feature according to the fusion weight and the actual distance; and classify the smoking behavior of the driver according to the fused image feature to obtain a recognition result of whether smoking exists.

Further, the person image includes an infrared image and a visible light image, and the processor 901 is further configured to: acquire a first video stream of the position of the driver in the vehicle cabin and perform frame extraction processing on the first video stream to obtain the infrared image, wherein the first video stream is acquired by an infrared camera; acquire a second video stream of the overall environment inside the vehicle cabin and perform frame extraction processing on the second video stream to obtain a vehicle cabin environment image, wherein the second video stream is acquired by a visible light camera; and crop the image at the driver's position from the vehicle cabin environment image to obtain the visible light image.

Further, the image display parameter includes an image gray value, and the processor 901 is further configured to: determine a first image gray average value and a first image gray standard deviation of the infrared image according to the image gray values of the infrared image; determine a second image gray average value and a second image gray standard deviation of the visible light image according to the image gray values of the visible light image; and normalize the first image gray average value, the first image gray standard deviation, the second image gray average value, and the second image gray standard deviation to obtain the fusion weight.

Further, the processor 901 is further configured to: identify the center point position of the person's mouth and the center point position of the cigarette in the person image using a pre-trained target detection model; calculate the distance between the center point position of the person's mouth and the center point position of the cigarette to obtain the center point distance; and determine the center point distance as the actual distance.

Further, the processor 901 is further configured to input the fused image feature into a pre-trained classification model to classify the smoking behavior of the driver and obtain the recognition result of whether smoking exists.

Further, the processor 901 is further configured to: acquire an infrared face image and a visible light face image; determine the sample image gray average value and sample image gray standard deviation of each of the infrared face image and the visible light face image according to their image gray values to obtain sample fusion weights; determine the distance between the person's mouth and the cigarette in each of the infrared face image and the visible light face image to obtain a distance value pair; generate sample fused image features according to the sample fusion weights and the distance value pairs; and train the classification model using the sample fused image features.
The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments of the present disclosure, a non-transitory computer readable storage medium in memory 902 is used to store at least one instruction for execution by processor 901 to implement the methods in embodiments of the present disclosure.
In some embodiments, the vehicle 900 further includes: a peripheral interface 903, and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 903 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of a display 904, a camera 905, and an audio circuit 906.
The peripheral interface 903 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 901 and the memory 902. In some embodiments of the present disclosure, the processor 901, the memory 902, and the peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902, and the peripheral interface 903 may be implemented on a separate chip or circuit board. The embodiments of the present disclosure are not limited in this respect.
The display screen 904 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display 904 is a touch display, it can also collect touch signals at or above its surface, which may be input to the processor 901 as control signals for processing. In that case, the display 904 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments of the present disclosure, there may be one display 904, provided on the front panel of the vehicle 900; in other embodiments, there may be at least two displays 904, each disposed on a different surface of the vehicle 900 or in a folded design; in still other embodiments, the display 904 may be a flexible display disposed on a curved or folded surface of the vehicle 900. The display 904 may even be arranged in a non-rectangular, irregular pattern, i.e., a shaped screen. The display 904 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera 905 is used to capture images or video. Optionally, the camera 905 includes a front camera and a rear camera. Typically, the front camera is provided on a front panel of the vehicle 900 and the rear camera on a rear surface of the vehicle 900. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused for a background blurring function, the main camera and the wide-angle camera can be fused for panoramic shooting and VR (Virtual Reality) shooting, or other fused shooting functions can be realized. In some embodiments of the present disclosure, the camera 905 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuitry 906 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 901 for processing. For stereo acquisition or noise reduction, there may be multiple microphones disposed at different locations of the vehicle 900. The microphone may also be an array microphone or an omnidirectional pickup microphone.
The power supply 907 is used to power the various components in the vehicle 900. The power source 907 may be an alternating current, direct current, disposable battery, or rechargeable battery. When the power source 907 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
The block diagram of the vehicle 900 shown in the embodiments of the present disclosure does not limit the vehicle 900; the vehicle 900 may include more or fewer components than illustrated, may combine certain components, or may employ a different arrangement of components.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals involved in the embodiments of the present disclosure are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the object features, interactive behavior features, user information, and the like referred to in this specification are all acquired with sufficient authorization.
In the description of the present disclosure, it is to be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meaning of these terms in this disclosure will be understood by those of ordinary skill in the art in the specific context. Furthermore, in the description of the present disclosure, unless otherwise indicated, "a plurality" means two or more. "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects.
The foregoing is merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the disclosure, and the changes and substitutions are intended to be covered by the protection scope of the disclosure. Accordingly, equivalent variations from the claims of the present disclosure are intended to be covered by this disclosure.

Claims (10)

1. A smoking behavior recognition method, characterized in that the smoking behavior recognition method comprises:
determining the fusion weight of the person image according to the image display parameters; wherein the person image is an image of the position of the driver in the cabin;
determining the actual distance between the mouth of the person and the cigarette in the person image;
generating fusion image features according to the fusion weights and the actual distances;
and classifying the smoking behaviors of the driver according to the fusion image characteristics to obtain a recognition result of whether smoking exists.
2. The smoking behavior recognition method of claim 1, wherein the person image includes an infrared image and a visible light image;
before the step of determining the fusion weight of the person image according to the image display parameters, the smoking behavior recognition method further comprises the following steps:
acquiring a first video stream of the position of a driver in a vehicle cabin, and performing frame extraction processing on the first video stream to obtain the infrared image, wherein the first video stream is acquired by an infrared camera;
acquiring a second video stream of the whole environment inside the vehicle cabin, and performing frame extraction processing on the second video stream to obtain a vehicle cabin environment image, wherein the second video stream is acquired by a visible light camera;
and intercepting the image at the position from the vehicle cabin environment image to obtain the visible light image.
3. A smoking behaviour recognition method according to claim 2, wherein said image display parameter comprises an image grey scale value;
the step of determining the fusion weight of the character image according to the image display parameters comprises the following steps:
determining a first image gray average value and a first image gray standard deviation of the infrared image according to the image gray value of the infrared image;
determining a second image gray average value and a second image gray standard deviation of the visible light image according to the image gray value of the visible light image;
and carrying out normalization processing on the first image gray average value, the first image gray standard deviation, the second image gray average value and the second image gray standard deviation to obtain the fusion weight.
4. The smoking behavior recognition method of claim 1, wherein the step of determining an actual distance between the person's mouth and the cigarette in the person image comprises:
identifying the center point position of the mouth of the person and the center point position of the cigarette in the person image by adopting a pre-trained target detection model;
calculating the distance between the center point position of the mouth of the person and the center point position of the cigarette to obtain the center point distance;
The center point distance is determined as the actual distance.
5. The smoking behavior recognition method according to any one of claims 1 to 4, wherein the step of classifying the smoking behavior of the driver based on the fused image features to obtain a recognition result of whether there is smoking includes:
and inputting the fusion image characteristics into a pre-trained classification model to classify the smoking behaviors of the driver, so as to obtain the recognition result of whether smoking exists.
6. The smoking behavior recognition method of claim 5, wherein before the step of classifying the smoking behavior of the driver based on the fused image features to obtain a recognition result of whether there is smoking, the smoking behavior recognition method further comprises:
acquiring an infrared face image and a visible light face image;
determining the sample image gray average value and the sample image gray standard deviation of each of the infrared face image and the visible light face image according to the image gray values to obtain sample fusion weights;
determining the distance between the mouth of the person and the cigarette in each of the infrared face image and the visible light face image to obtain a distance value pair;
Generating sample fusion image features according to the sample fusion weight and the distance value pair;
and training the classification model by adopting the sample fusion image features.
7. A smoking detection model, the smoking detection model comprising:
the fusion weight module is used for determining fusion weights of the infrared image and the visible light image according to image display parameters under the condition that the face exists in the infrared image and/or the visible light image;
a pre-trained target detection model for determining an actual distance between a person's mouth and a cigarette in each of the infrared image and the visible light image;
the feature fusion module is used for generating fusion image features according to the fusion weights and the actual distances;
and the pre-trained classification model is used for classifying smoking behaviors of the people in the infrared image and the visible light image according to the fusion image characteristics to obtain a recognition result of whether smoking exists.
8. A smoking behaviour recognition device, the smoking behaviour recognition device comprising:
the weight calculation module is used for determining the fusion weight of the person image according to the image display parameters;
The distance calculation module is used for determining the actual distance between the mouth of the person and the cigarette in the person image;
the feature fusion module is used for generating fusion image features according to the fusion weights and the actual distances;
and the behavior recognition module is used for classifying the smoking behaviors of the driver according to the fused image characteristics to obtain a recognition result of whether smoking exists.
9. A vehicle comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the smoking behaviour recognition method according to any one of claims 1 to 6 when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements a smoking behaviour recognition method according to any one of claims 1 to 6.
Application CN202310145939.6A, priority date 2023-02-21, filing date 2023-02-21: Smoking behavior recognition method, smoking detection model, device, vehicle, and medium. Status: Pending. Publication: CN116486383A (en).

Priority Applications (1)

Application Number: CN202310145939.6A
Publication: CN116486383A (en)
Title: Smoking behavior recognition method, smoking detection model, device, vehicle, and medium

Applications Claiming Priority (1)

Application Number: CN202310145939.6A
Publication: CN116486383A (en)
Title: Smoking behavior recognition method, smoking detection model, device, vehicle, and medium

Publications (1)

Publication Number: CN116486383A
Publication Date: 2023-07-25

Family

ID: 87222080

Family Applications (1)

Application Number: CN202310145939.6A
Status: Pending
Publication: CN116486383A (en)
Title: Smoking behavior recognition method, smoking detection model, device, vehicle, and medium

Country Status (1)

Country: CN
Publication: CN116486383A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination