CN111611971A - Behavior detection method and system based on convolutional neural network

Info

Publication number
CN111611971A
CN111611971A
Authority
CN
China
Prior art keywords
image
predicted
key point
mouth
hand
Prior art date
Legal status
Granted
Application number
CN202010485168.1A
Other languages
Chinese (zh)
Other versions
CN111611971B (en)
Inventor
郁强
李圣权
李开民
曹喆
金仁杰
Current Assignee
CCI China Co Ltd
Original Assignee
CCI China Co Ltd
Priority date
Filing date
Publication date
Application filed by CCI China Co Ltd filed Critical CCI China Co Ltd
Priority to CN202010485168.1A
Publication of CN111611971A
Application granted
Publication of CN111611971B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a behavior detection method and system based on a convolutional neural network. The behavior detection method uses a convolutional neural network to detect specific human behaviors, including but not limited to eating, drinking, and smoking, in videos or dynamic images. It replaces manual supervision and has the advantages of a high detection rate and accurate detection precision.

Description

Behavior detection method and system based on convolutional neural network
Technical Field
The invention relates to the field of video processing, in particular to a behavior detection method and system based on a convolutional neural network.
Background
Deep learning is a new field of machine learning research. Its motivation is to build neural networks that simulate the analysis and learning mechanisms of the human brain, and to use such networks in place of human effort to analyze and process data. To extract useful information from images more accurately, most current deep learning techniques focus on analyzing and processing static images; comparatively little research applies deep learning to dynamic images or video.
However, the analysis of dynamic images or video in real life is of great research interest, especially for detecting specific behaviors such as eating, smoking, and drinking, which requires analyzing dynamic image or video information.
At present, some special places, such as public venues like subways, buses, movie theaters, and museums, have explicit no-eating and no-smoking rules, but actual compliance with these rules is not optimistic, and management units lack practical, effective means of supervising violators. This is mainly because such dynamic behaviors are instantaneous: they can be completed within minutes or even seconds. Public places usually have heavy foot traffic and a large area, so even dedicated monitoring personnel can hardly supervise violators of smoking bans comprehensively; human attention is limited, and dedicated human monitoring is time-consuming, labor-intensive, and ineffective.
Disclosure of Invention
The invention aims to provide a behavior detection method and a behavior detection system based on a convolutional neural network. The behavior detection method uses a convolutional neural network to detect specific human behaviors in videos or dynamic images, including but not limited to eating, drinking, and smoking. It replaces manual supervision and has the advantages of a high detection rate and accurate detection precision.
This technical solution provides a behavior detection method based on a convolutional neural network, comprising the following steps:
acquiring image data, wherein the image data includes at least a first image and a second image of the same detection object, the second image being acquired a fixed time period after the first image;
inputting the first image and the second image into a neural network model, and acquiring confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points in the first image and the second image, wherein the confidence map represents the accuracy of the predicted hand and mouth key points, and the affinity vector represents the association between the predicted hand key points and the predicted mouth key points;
analyzing the confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points through a greedy algorithm, and outputting the coordinate values of the predicted hand key points and the predicted mouth key points;
and acquiring the mouth-hand distance of the detection object in the first image and the second image according to the coordinate values of the predicted hand key points and the predicted mouth key points.
In other embodiments, the method further comprises:
the image data includes at least three consecutive images of the same detection object, with a fixed time interval between consecutive images; the consecutive images are input into the neural network model, the confidence map and affinity vector map of the hand key points and predicted mouth key points of each image are obtained, the mouth-hand distance of the detection object in each image is calculated from the confidence map and affinity vector map, and if the mouth-hand distances of adjacent images in the sequence alternately increase and decrease, it is judged that the detection object is smoking tobacco.
Further, edible products include food and tobacco, so the behavior detection method can be used to detect both the eating of edible products and the smoking of tobacco.
This technical solution also provides a behavior detection system based on a convolutional neural network, comprising:
an image acquisition unit, configured to acquire image data including at least two images of the same detection object;
a confidence unit, configured to acquire the confidence maps of the predicted hand key points and predicted mouth key points of each image;
an affinity unit, configured to acquire the affinity vector maps of the predicted hand key points and predicted mouth key points in each image;
an analysis unit, configured to analyze the confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points through a greedy algorithm and output the coordinate values of the predicted hand key points and the predicted mouth key points;
and a calculation unit, configured to acquire the mouth-hand distance of the detection object in each image according to the coordinate values of the predicted hand key points and the predicted mouth key points.
This solution provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above convolutional neural network-based behavior detection method when executing the program.
This solution provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the above convolutional neural network-based behavior detection method.
Drawings
FIG. 1 is a schematic structural diagram of the edible product detection model.
FIG. 2 is a schematic structural diagram of the hand and mouth key point detection model according to the present embodiment.
FIG. 3 is a schematic flow diagram of the behavior detection method based on the convolutional neural network according to the present embodiment.
FIG. 4 is a schematic diagram of the behavior detection system based on the convolutional neural network according to the present embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
A computer program can be applied to input data to perform the functions herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
Specifically, this technical solution provides a behavior detection method and system based on a convolutional neural network. The behavior detection method can be used to detect dynamic human behaviors such as eating, drinking, and smoking in videos or dynamic images, and is particularly applicable to monitoring and management in public places.
It should be noted that this solution uses the change in mouth-hand distance to determine a user's eating or smoking behavior: the position of the head (and thus the mouth) usually remains still while the user eats or smokes, so the mouth-hand distance directly reflects the dynamic motion of the hand.
In this solution, whether the detection object consumes an edible product is judged from the mouth-hand distances of the detection object across multiple images. Edible products include but are not limited to food and tobacco: food covers all finished products and raw materials for people to eat or drink, as well as articles traditionally regarded as both food and medicine; tobacco covers electronic cigarettes, tobacco pipes, and the like. When the edible product is food, the user eats it; when it is a beverage, the user drinks it; when it is tobacco, the user smokes it. Any of the eating, drinking, or smoking behaviors mentioned above can be detected by the convolutional neural network-based behavior detection method provided by this solution.
Specifically, the present invention provides a behavior detection method based on a convolutional neural network for detecting whether a detection object exhibits a specific behavior, such as eating food or smoking tobacco, comprising the following steps:
acquiring image data, wherein the image data includes at least a first image and a second image of the same detection object, the second image being acquired a fixed time period after the first image;
inputting the first image and the second image into a neural network model, and acquiring confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points in the first image and the second image, wherein the confidence map represents the accuracy of the predicted hand and mouth key points, and the affinity vector represents the association between the predicted hand key points and the predicted mouth key points;
analyzing the confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points through a greedy algorithm, and outputting the coordinate values of the predicted hand key points and the predicted mouth key points;
and acquiring the mouth-hand distance of the detection object in the first image and the second image according to the coordinate values of the predicted hand key points and the predicted mouth key points, as sketched below.
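For illustration, a minimal Python sketch of this last step (the function name and the plain 2-D Euclidean metric on pixel coordinates are assumptions; the patent only states that the mouth-hand distance is derived from the predicted key point coordinates):

```python
import math

def mouth_hand_distance(mouth_xy, hand_xy):
    """Euclidean distance between the predicted mouth key point and the
    predicted hand key point, both given as (x, y) coordinate tuples as
    output by the greedy parsing step."""
    return math.hypot(mouth_xy[0] - hand_xy[0], mouth_xy[1] - hand_xy[1])

# e.g. the distance in the first and second images of the same subject
d1 = mouth_hand_distance((412.0, 305.0), (420.0, 610.0))  # hand far from mouth
d2 = mouth_hand_distance((410.0, 302.0), (415.0, 330.0))  # hand near the mouth
```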
In some embodiments, the continuous image data may be a set of consecutive video frame images from surveillance video, a set of images shot continuously within a set time period, or a set of continuously acquired dynamic images.
Before the confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points in the first image and the second image are obtained, the first image and the second image are passed through a convolution module to obtain the corresponding feature maps.
In addition, in this solution, an edible product detection model may be used to detect whether an edible product appears in the image, and its output may be combined with the coordinate values of the hand key points to determine whether the detection object holds the edible product in hand.
Of course, the step of detecting whether the detection object's hand holds an edible product can be performed before or after the mouth-hand distance is obtained; and if continuous image data showing the object holding an edible product is selected manually, no deep learning model is needed to detect the edible product.
The edible product detection model can detect the category and coordinate values of edible products in the image data, while the neural network model described above obtains the coordinate values of the predicted hand key points. Whether the detection object's hand holds the edible product is judged from the coordinate information of the edible product and of the predicted hand key points: if the two sets of coordinates overlap or are close to each other, or the coordinate range of the edible product intersects the coordinates of the predicted hand key points, it is judged that the detection object's hand holds the edible product, as sketched below.
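A minimal sketch of that overlap test (the margin parameter and the point-in-box reading of "overlap or are close to each other" are assumptions; the patent fixes only the criterion, not its implementation):

```python
def hand_holds_edible(bbox, hand_xy, margin=20.0):
    """Judge whether the detection object's hand holds the edible product.

    bbox is the (c, x, y, w, h) output of the edible product detection
    model, with (x, y) the top-left vertex; hand_xy is the predicted hand
    key point. The key point must fall inside the bounding box, optionally
    enlarged by a pixel margin to cover the "close to each other" case.
    """
    _, x, y, w, h = bbox
    hx, hy = hand_xy
    return (x - margin) <= hx <= (x + w + margin) and \
           (y - margin) <= hy <= (y + h + margin)
```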
When judging whether a user eats food, only two spaced images are needed: if the detection object's hand holds food in both the first image and the second image, and the absolute difference between the mouth-hand distances of the first image and the second image is larger than a set first threshold, it is judged that the detection object is eating food. It is worth mentioning that the mouth-hand distance of the second image may be larger than that of the first image, in which case the user has finished eating and is taking the food away from the mouth; or it may be smaller, in which case the user is bringing food to the mouth to complete an eating motion.
It is also worth mentioning that the acquisition interval between the first image and the second image should not exceed 30 seconds, and the first threshold is set to no more than 0.5 meter.
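Under these constraints, the eating judgment reduces to a simple rule. In the sketch below, the function name and the assumption that distances are already calibrated to meters are illustrative; only the 30-second interval bound and the 0.5-meter threshold come from the text:

```python
MAX_INTERVAL_S = 30.0       # maximum acquisition interval between the images
EATING_THRESHOLD_M = 0.5    # first threshold, at most 0.5 m per the text

def is_eating(d_first, d_second, interval_s, threshold=EATING_THRESHOLD_M):
    """Two-image eating judgment: fires whichever direction the hand moved,
    i.e. food brought to the mouth or taken away from it."""
    if interval_s > MAX_INTERVAL_S:
        return False  # images too far apart to compare meaningfully
    return abs(d_first - d_second) > threshold
```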
For example, take user a eating bread as an example:
according to the behavior detection method based on the convolutional neural network, the first image and the second image containing the user A are obtained, the mouth-hand distance of the user A in the first image and the mouth-hand distance of the user A in the second image are respectively obtained, and if the difference value between the mouth-hand distance of the first image and the mouth-hand distance of the second image is larger than a set first threshold value, the user A is determined to move food to the mouth through the hand in the interval between the two images, namely the user A eats bread. (of course, in this scenario the bread on the user A's hand is detected by the food inspection model).
Smoking behavior is similar to but different from eating behavior: when smoking, the user's hand moves back and forth to the mouth repeatedly. Therefore, at least three images of the detection object need to be acquired, with a set time period between successive acquisitions; the mouth-hand distance of the detection object in each image is obtained, and the judgment is made according to the criterion that the mouth-hand distance alternately increases and decreases.
Correspondingly, this solution provides a behavior detection method based on a convolutional neural network for detecting tobacco smoking behavior, comprising the following steps:
acquiring image data, wherein the image data includes at least three consecutive images of the same detection object, with a fixed acquisition interval between consecutive images; inputting the consecutive images into the neural network model, acquiring the confidence map and affinity vector map of the hand key points and mouth key points of each image, and calculating the mouth-hand distance of the detection object in each image from the confidence maps and affinity vector maps through a greedy algorithm. If the mouth-hand distances in the consecutive images alternately increase and decrease, it is judged that the detection object is smoking tobacco, as sketched below.
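A minimal sketch of this alternating-distance criterion (the function name, the single shared threshold, and meter units are assumptions; as noted below, the text allows per-pair thresholds):

```python
def is_smoking(distances, threshold=0.3):
    """Smoking judgment over a sequence of at least three mouth-hand
    distances: adjacent differences must alternate in sign (the hand moves
    to and from the mouth) and each must exceed the threshold in absolute
    value."""
    if len(distances) < 3:
        return False
    diffs = [b - a for a, b in zip(distances, distances[1:])]
    if any(abs(d) <= threshold for d in diffs):
        return False  # some adjacent pair moved too little
    # strict alternation: consecutive differences have opposite signs
    return all(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:]))
```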
Of course, as in the steps described above, before the confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points are obtained, each image is passed through a convolution module to obtain the corresponding feature map.
Taking the acquisition of four images as an example, the behavior detection method based on the convolutional neural network provided by this solution comprises the following steps:
acquiring image data, wherein the image data includes at least a first image, a second image, a third image, and a fourth image of the same detection object, the second image being acquired a fixed time period after the first image, the third image a fixed time period after the second image, and the fourth image a fixed time period after the third image;
inputting the first, second, third, and fourth images into the neural network model to obtain the confidence maps and affinity vector maps of the hand key points and mouth key points in each image;
and obtaining, through a greedy algorithm, the mouth-hand distance of the detection object in the first, second, third, and fourth images according to the confidence maps and affinity vector maps.
If the mouth-hand distance alternates in size across the first to fourth images (for example, the distance in the second image is smaller than in the first, larger in the third than in the second, and smaller in the fourth than in the third; or the reverse pattern), and the absolute difference between the mouth-hand distances of each pair of adjacent images is larger than the corresponding set threshold (the thresholds for different adjacent pairs may differ), it is judged that the detection object is smoking tobacco.
It should be noted that the first, second, and third thresholds may be set to identical or different values, and the time intervals between acquisitions of consecutive images need not be identical. Preferably, each interval is no more than 10 seconds, and the judgment threshold for the absolute difference between mouth-hand distances of consecutive images is set to no more than 0.3 meter.
Take user A smoking tobacco as an example:
acquire the first, second, third, and fourth images of user A and obtain user A's mouth-hand distance in each image. If the absolute difference between the mouth-hand distances of the first and second images is larger than the set first threshold, the absolute difference between the mouth-hand distances of the second and third images is larger than the set second threshold, and the absolute difference between the mouth-hand distances of the third and fourth images is larger than the set third threshold, it is determined that user A brought a cigarette to the mouth during the interval between the first and second images, took it away from the mouth during the interval between the second and third images, and brought it close to the mouth again during the interval between the third and fourth images, and it is therefore judged that user A is smoking tobacco.
The construction and training process of the neural network model adopted by the scheme is as follows:
preparation of pedestrian hand and mouth key point detection data: marking key points of hands and mouths of pedestrians in the collected marking image data, and marking affinity vectors of the key points of the hands and affinity vectors of the key points of the mouths;
Pedestrian hand and mouth key point detection network structure design: the backbone network is composed of convolutional neural modules. Annotated image data is taken as input, and a feature map $F$ is obtained through convolution module A. The network then splits into two branches: branch 1 predicts the confidence maps of the hand key points and mouth key points, and branch 2 predicts the affinity vectors of the hand key points and of the mouth key points. Each branch is an iterative prediction architecture, and branch 1 together with branch 2 forms one stage. At each stage $k$ the network generates a set of detection confidence maps $S^k = \rho^k(\cdot)$ and a set of affinity vector fields $L^k = \phi^k(\cdot)$, where $\rho^1$ and $\phi^1$ denote the first-stage convolutional blocks, whose only input is the feature map $F$ produced by convolution module A:

$$S^1 = \rho^1(F), \qquad L^1 = \phi^1(F).$$

At each subsequent stage, the input is the prediction result of the previous stage together with the feature map $F$. Letting $\rho^k$ and $\phi^k$ denote the convolutional neural block structure of the $k$-th stage, their outputs are:

$$S^k = \rho^k\left(F, S^{k-1}, L^{k-1}\right), \qquad L^k = \phi^k\left(F, S^{k-1}, L^{k-1}\right), \quad k \ge 2;$$
analyzing the confidence maps of the hand and mouth key points through greedy inference, and learning the association between hand and mouth through Part Affinity Fields (PAFs, part affinity vector fields);
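Although the patent fixes no concrete layer configuration, the two-branch iterative structure can be illustrated with the following PyTorch sketch. All channel counts, kernel sizes, stage depth, and class names here are assumptions; only the overall dataflow (shared feature map F from convolution module A, per-stage branches rho and phi, and concatenation of the previous predictions with F) follows the description above:

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One iterative stage: branch 1 (rho) regresses key point confidence
    maps S, branch 2 (phi) regresses affinity vector fields L."""

    def __init__(self, in_ch, n_maps, n_pafs):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(128, out_ch, 1),
            )
        self.rho = branch(n_maps)   # rho^k -> S^k
        self.phi = branch(n_pafs)   # phi^k -> L^k

    def forward(self, x):
        return self.rho(x), self.phi(x)

class HandMouthNet(nn.Module):
    # 2 confidence maps (hand, mouth); 1 limb -> 2 PAF channels (x and y)
    def __init__(self, feat_ch=64, n_maps=2, n_pafs=2, n_stages=3):
        super().__init__()
        # convolution module A: produces the shared feature map F
        self.module_a = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.stages = nn.ModuleList(
            [Stage(feat_ch, n_maps, n_pafs)] +
            [Stage(feat_ch + n_maps + n_pafs, n_maps, n_pafs)
             for _ in range(n_stages - 1)]
        )

    def forward(self, img):
        f = self.module_a(img)
        outs, x = [], f
        for stage in self.stages:
            s, l = stage(x)
            outs.append((s, l))
            # next stage input: previous predictions concatenated with F
            x = torch.cat([f, s, l], dim=1)
        return outs  # per-stage (S^k, L^k) pairs, all supervised by the loss
```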
Training the pedestrian hand and mouth key point detection model: assign initialization values to the network parameters and set the maximum number of network iterations $M$; input the prepared training image data set into the network and train; if the loss value keeps decreasing, continue training until the final model is obtained after $M$ iterations; if the loss value stabilizes partway through, stop iterating to obtain the final model;
The loss function is:

$$f = \sum_{k=1}^{K}\left(f_S^{k} + f_L^{k}\right),$$

with two loss terms for each stage $k$:

$$f_S^{k} = \sum_{j=1}^{m}\sum_{p}\left\| S_j^{k}(p) - S_j^{*}(p) \right\|_2^2, \qquad f_L^{k} = \sum_{c=1}^{n}\sum_{p}\left\| L_c^{k}(p) - L_c^{*}(p) \right\|_2^2,$$

where $S_j^{*}$ denotes the manually labeled confidence maps of the hand and mouth key points, $L_c^{*}$ denotes the manually labeled affinity vectors of the hand key points and mouth key points, $m$ denotes the number of key points (hand key points and mouth key points), and $n$ denotes the number of limbs, i.e., hand and mouth, where one limb corresponds to two key points.
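Transcribed into code (a sketch assuming the predictions and labels are dense tensors of matching shape, with no labeling mask applied):

```python
import torch

def total_loss(preds, s_star, l_star):
    """Sum of the per-stage L2 losses f_S^k + f_L^k over all K stages.

    preds is a list of per-stage (S^k, L^k) prediction pairs, such as the
    output of the network sketch above; s_star and l_star are the manually
    labeled confidence maps and affinity vector fields."""
    f = torch.zeros(())
    for s_k, l_k in preds:
        f = f + ((s_k - s_star) ** 2).sum()   # f_S^k
        f = f + ((l_k - l_star) ** 2).sum()   # f_L^k
    return f
```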
The construction and training process of the edible product detection model is as follows (the edible products comprise food or tobacco):
Preparing data: label the annotation images, where the annotation information is the bounding box of the food or tobacco and its category, i.e., $(c_j, x_j, y_j, w_j, h_j)$, in which $c_j$ indicates the category of the bounding box (different categories of edible products correspond to different $c_j$ values), $x_j, y_j$ are the coordinates of the top-left vertex of the bounding box, and $w_j, h_j$ are the width and height of the bounding box; the labeled data samples are divided into a training set, a validation set, and a test set in the ratio 8:1:1;
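For illustration, the 8:1:1 split might be implemented as follows (the function name and the use of a fixed shuffling seed are assumptions):

```python
import random

def split_samples(samples, seed=0):
    """Shuffle the labeled samples and split them into training,
    validation, and test sets in the ratio 8:1:1."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```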
and (3) network structure design: the algorithm adopts a convolutional neural network with a multi-scale structure, a backbone network is composed of residual modules, network characteristic channel separation and channel shuffling are carried out, a top-down characteristic pyramid structure is adopted on the basis of the backbone network, top-down up-sampling operation is added, deep layer characteristics and shallow layer characteristic information fusion of a plurality of layers are constructed, so that better characteristics are obtained, candidate frames with different sizes are screened, and finally an optimal result is reserved;
the network employs a swish activation function,
Figure BDA0002518780270000112
training: setting the size of an input image as 416 x 416, setting the input minimum batch data value as 64, setting the learning rate as 10 < -3 >, and performing optimized learning by adopting an Adam gradient descent strategy;
and (3) testing a model: test data is input, and bounding box information (c, x, y, w, h) is output.
In the edible product detection model, the category label $c_j$ of the various foods is set to 1, and that of cigarettes or other tobacco is set to 2. From the output bounding box information, it can be determined whether an edible product is present, whether it is food or tobacco, and what its coordinates are.
In addition, this solution can perform behavior management on the basis of the convolutional neural network-based behavior detection method, comprising the following steps: load the detection frame of the detection object consuming an edible product into a pedestrian recognition model, acquire the identity information of the detection object, and record it as a task library event.
This solution additionally provides a behavior detection system based on a convolutional neural network, comprising:
an image acquisition unit, configured to acquire image data including at least two images of the same detection object;
a confidence unit, configured to acquire the confidence maps of the predicted hand key points and predicted mouth key points of each image;
an affinity unit, configured to acquire the affinity vector maps of the predicted hand key points and predicted mouth key points in each image;
an analysis unit, configured to analyze the confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points through a greedy algorithm and output the coordinate values of the predicted hand key points and the predicted mouth key points;
and a calculation unit, configured to acquire the mouth-hand distance of the detection object in each image according to the coordinate values of the predicted hand key points and the predicted mouth key points.
Certainly, the behavior detection system based on the convolutional neural network provided in this solution further includes a judging unit for judging the relationship between the mouth-hand distance of the detection object and the set threshold. The judgment details and data handling are as described above for the convolutional neural network-based behavior detection method and are not repeated here.
In addition, the behavior detection system includes an edible product detection unit that runs the edible product detection model to detect whether an edible product exists in the image data and to obtain its coordinate values; the judging unit then further judges whether the detection object's hand holds the edible product based on the coordinate values of the edible product and of the predicted hand key points. In this behavior detection system, edible products include food and tobacco.
The training and construction processes of the edible product detection model and the hand and mouth key point detection model are as described above and are not repeated here.
In addition, in some embodiments, the present solution provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-mentioned steps of the convolutional neural network-based behavior detection method when executing the program.
There is provided a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the above-mentioned steps of the convolutional neural network-based behavior detection method.
The present invention is not limited to the above preferred embodiments. Any other products in various forms may be derived by anyone in light of the present invention; however, any change in shape or structure whose technical solution is identical or similar to that of the present application falls within the protection scope of the present invention.

Claims (10)

1. A behavior detection method based on a convolutional neural network is characterized by comprising the following steps:
acquiring image data, wherein the image data includes at least a first image and a second image of the same detection object, the second image being acquired a fixed time period after the first image;
inputting the first image and the second image into a neural network model, and acquiring confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points in the first image and the second image, wherein the confidence map represents the accuracy of the predicted hand and mouth key points, and the affinity vector represents the association between the predicted hand key points and the predicted mouth key points;
analyzing the confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points through a greedy algorithm, and outputting the coordinate values of the predicted hand key points and the predicted mouth key points;
and acquiring the mouth-hand distance of the detection object in the first image and the second image according to the coordinate values of the predicted hand key points and the predicted mouth key points.
2. The convolutional neural network-based behavior detection method of claim 1, further comprising:
before obtaining the confidence maps and affinity vector maps of the predicted hand key points and predicted mouth key points in the first image and the second image, passing the first image and the second image through a convolution module to obtain the corresponding feature maps.
3. The convolutional neural network-based behavior detection method of claim 1, further comprising:
and when the absolute difference value between the mouth-hand distances of the first image and the second image is larger than a set first threshold value, judging that the detection object eats food.
4. The convolutional neural network-based behavior detection method of claim 1, further comprising:
the image data includes at least three consecutive images of the same detection object, with a fixed time interval between consecutive images; the consecutive images are input into a neural network model, the coordinate values of the predicted hand key points and predicted mouth key points of each image are acquired, and the mouth-hand distance of the detection object in each image is calculated according to the coordinate values.
5. The convolutional neural network-based behavior detection method of claim 4, further comprising:
and if the mouth-hand distances of the adjacent images in the continuous images change in alternate sizes, judging that the detection object sucks tobacco.
6. The convolutional neural network-based behavior detection method as claimed in any one of claims 1 to 5, wherein the image data is input into an edible product detection model to obtain bounding box information of the edible product, the bounding box information including the category and coordinate values of the edible product, and if the range formed by the coordinate values of the edible product intersects the coordinate values of a predicted hand key point, it is determined that the user holds the edible product.
7. A convolutional neural network-based behavior detection system, comprising:
an image acquisition unit configured to acquire image data including at least two images for a same detection object;
the confidence coefficient unit is used for acquiring a confidence coefficient map of the predicted hand key point and the predicted mouth key point of the image;
the affinity unit is used for acquiring an affinity vector diagram of the predicted hand key point and the predicted mouth key point in the image;
the analysis unit is used for analyzing the confidence coefficient map and the affinity vector map of the predicted hand key point and the predicted mouth key point through a greedy algorithm and outputting coordinate values of the predicted hand key point and the predicted mouth key point;
and the calculating unit is used for acquiring the mouth-hand distance of the detection object in the image according to the coordinate values of the predicted hand key point and the predicted mouth key point.
8. The convolutional neural network-based behavior detection system as claimed in claim 7, further comprising:
and the judging unit is used for judging the difference relation between the mouth-hand distance of the detection object and the set threshold value.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and running on the processor, characterized in that the computer program, when executed by the processor, implements the steps of the method according to any of claims 1-5.
10. A computer-readable storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, is adapted to carry out the steps of the method according to any of the claims 1-5.
CN202010485168.1A 2020-06-01 2020-06-01 Behavior detection method and system based on convolutional neural network Active CN111611971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010485168.1A CN111611971B (en) 2020-06-01 2020-06-01 Behavior detection method and system based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010485168.1A CN111611971B (en) 2020-06-01 2020-06-01 Behavior detection method and system based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN111611971A true CN111611971A (en) 2020-09-01
CN111611971B CN111611971B (en) 2023-06-30

Family

ID=72205090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010485168.1A Active CN111611971B (en) 2020-06-01 2020-06-01 Behavior detection method and system based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111611971B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170213014A1 (en) * 2016-01-22 2017-07-27 Covidien Lp System and method for detecting smoking behavior
US20170220772A1 (en) * 2016-01-28 2017-08-03 Savor Labs, Inc. Method and apparatus for tracking of food intake and other behaviors and providing relevant feedback
CN106530730A (en) * 2016-11-02 2017-03-22 重庆中科云丛科技有限公司 Traffic violation detection method and system
CN108609018A (en) * 2018-05-10 2018-10-02 郑州天迈科技股份有限公司 Forewarning Terminal, early warning system and parser for analyzing dangerous driving behavior
CN108734125A (en) * 2018-05-21 2018-11-02 杭州杰视科技有限公司 A kind of cigarette smoking recognition methods of open space
CN109522958A (en) * 2018-11-16 2019-03-26 中山大学 Based on the depth convolutional neural networks object detection method merged across scale feature
CN109543627A (en) * 2018-11-27 2019-03-29 西安电子科技大学 A kind of method, apparatus and computer equipment judging driving behavior classification
CN110425005A (en) * 2019-06-21 2019-11-08 中国矿业大学 The monitoring of transportation of belt below mine personnel's human-computer interaction behavior safety and method for early warning
CN110705383A (en) * 2019-09-09 2020-01-17 深圳市中电数通智慧安全科技股份有限公司 Smoking behavior detection method and device, terminal and readable storage medium
CN110688921A (en) * 2019-09-17 2020-01-14 东南大学 Method for detecting smoking behavior of driver based on human body action recognition technology
CN111027481A (en) * 2019-12-10 2020-04-17 浩云科技股份有限公司 Behavior analysis method and device based on human body key point detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QIANKUN TANG, JIE LI, ZHIPING SHI, YU HU: "LightDet: A Lightweight and Accurate Object Detection Network", 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
ZHE CAO, TOMAS SIMON, SHIH-EN WEI, YASER SHEIKH: "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2-5 *
王莹 (Wang Ying): "Driver Behavior Analysis Based on Facial Key Point Information: Fatigue and Smoking as Examples", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 2020 *

Also Published As

Publication number Publication date
CN111611971B (en) 2023-06-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant