CN111860448A - Hand washing action recognition method and system - Google Patents

Hand washing action recognition method and system

Info

Publication number
CN111860448A
CN111860448A (application CN202010764529.6A)
Authority
CN
China
Prior art keywords
image, detection, hand washing, sample, image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010764529.6A
Other languages
Chinese (zh)
Inventor
李江
李骊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN202010764529.6A priority Critical patent/CN111860448A/en
Publication of CN111860448A publication Critical patent/CN111860448A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention provides a hand washing action recognition method and system. The method comprises the following steps: in the pre-detection stage, acquiring image data of a current frame, the image data comprising a registered color image and a registered depth image; pre-detecting the image data; if the pre-detection is passed, using the depth image to perform background removal processing on the color image to obtain a foreground image; and using the foreground image to perform hand washing action recognition to obtain a recognition result corresponding to the image data, the recognition result including the recognized hand washing action category. As can be seen, in the embodiment of the present invention, hand washing action recognition is implemented in two steps: the first step performs the pre-detection, and the second step, action recognition, is performed only after the pre-detection is passed. In the second step, the registered depth image assists in background removal, filtering out much of the background information and increasing recognition robustness; the foreground image is then recognized to obtain the recognition result.

Description

Hand washing action recognition method and system
Technical Field
The invention relates to the field of computers, in particular to a hand washing action recognition method and system.
Background
Many industries impose requirements on hand washing procedures. Traditionally, compliance has relied on personal awareness and training, which lacks effective supervision and inevitably leads to oversights. Automated supervision based on computer vision combined with machine learning can therefore save labor and cost while ensuring that the hand washing steps are performed correctly and according to specification.
The prerequisite for automated supervision is the recognition of the hand washing action. How to recognize the hand washing action has therefore become a topic of active research.
Disclosure of Invention
In view of this, embodiments of the present invention provide a hand washing action recognition method and system to realize hand washing action recognition.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a hand washing action recognition method, comprising:
in the pre-detection stage, acquiring image data of a current frame; the image data comprises a registered color image and a registered depth image;
pre-detecting the image data;
if the pre-detection is passed, the depth image is used for carrying out background removal processing on the color image to obtain a foreground image;
using the foreground image to identify the hand washing action to obtain an identification result corresponding to the image data; the recognition result includes the recognized hand washing action category.
Optionally, the performing the pre-detection includes: performing flush detection using the image data; performing foam detection using the image data; and if no flushing state exists and no foam exists, determining that the pre-detection is passed.
Optionally, before the pre-detection stage, a sample preparation phase is further included; the sample preparation phase comprises: acquiring an image sample; the image samples comprise registered color image samples, depth image samples, and labels; the label is a first label, a second label or a third label; wherein the first label comprises: information characterizing the flushing state; the second label comprises: information characterizing the presence of foam; and the third label includes a hand washing action category; performing data enhancement on the registered color image samples and depth image samples to expand the number of image samples; and carrying out normalization processing on each image sample obtained after the data enhancement.
Optionally, the label is the first label, and the normalized image sample is a first target image sample; the label is the second label, and the normalized image sample is a second target image sample; the label is the third label, the color image sample in the normalized image sample is a target color image sample, and the depth image sample is a target depth image sample; the sample preparation phase further comprises: performing background removal processing on the corresponding target color image by using the target depth image to obtain a foreground image, and forming a third target image sample from the obtained foreground image, the target depth image and the third label.
Optionally, after the sample preparation phase and before the pre-detection phase, a training phase is further included; the flush detection is performed by a trained first machine learning model, the foam detection is performed by a trained second machine learning model, and the hand wash action recognition is performed by a trained third machine learning model; the training phase comprises: performing a plurality of iterative trainings on a first machine learning model based on the first target image sample to obtain the trained first machine learning model; performing a plurality of iterative trainings on a second machine learning model based on the second target image sample to obtain the trained second machine learning model; and performing multiple times of iterative training on a third machine learning model based on the third target image sample to obtain the trained third machine learning model.
Optionally, the using the registered depth image to perform background removal processing on the color image to obtain a foreground image includes: and aiming at any pixel point, if the depth value of the pixel point in the depth image is out of a preset range, setting the pixel value of the pixel point in the color image to be 0.
Optionally, the third machine learning model includes: a plurality of directly connected depth-separable convolutional layers; a full convolution layer; a global pooling layer; and a classification task layer.
Optionally, the method further includes: determining and outputting the current hand washing action category by using the identification result of the continuous multi-frame image data; the continuous multi-frame image data comprises the image data of the current frame and continuous N frames of image data before the image data of the current frame; n is a positive integer.
A hand washing action recognition system comprising:
the acquisition unit is used for acquiring the image data of the current frame in the pre-detection stage; the image data comprises a registered color image and depth image;
a pre-detection unit to: pre-detecting the image data;
a pre-processing unit for:
if the pre-detection is passed, the depth image is used for carrying out background removal processing on the color image to obtain a foreground image;
a hand washing action recognition unit for:
using the foreground image to identify the hand washing action to obtain an identification result corresponding to the image data; the recognition result includes the recognized hand washing action category.
As can be seen, in the embodiment of the present invention, hand washing action recognition is implemented in two steps: the first step performs the pre-detection, and the second step, action recognition, is performed only after the pre-detection is passed. In the second step, the registered depth image assists in background removal, filtering out much of the background information and increasing recognition robustness; the foreground image is then recognized to obtain the recognition result.
Drawings
Fig. 1 is an exemplary configuration of a hand washing action recognition system according to an embodiment of the present invention;
fig. 2 is an exemplary flow chart of a hand washing action recognition method according to an embodiment of the present invention;
FIG. 3 is another exemplary configuration of a hand washing action recognition system provided by an embodiment of the present invention;
fig. 4 is another exemplary flow chart of a hand washing action recognition method according to an embodiment of the present invention;
FIG. 5 is an exemplary flow of a sample preparation phase provided by embodiments of the present invention;
fig. 6 is an exemplary structure of a CNN model provided in an embodiment of the present invention.
Detailed Description
For reference and clarity, the terms and abbreviations used hereinafter are summarized as follows:
CNN: Convolutional Neural Network;
depth image: also called range image; an image in which the distance (depth) from the image collector to each point in the scene is taken as the pixel value;
3D: 3-Dimensional;
Loss Function: the function that measures training error;
SGD: Stochastic Gradient Descent;
NMS: Non-Maximum Suppression.
Most existing gesture recognition schemes on the market are static recognition schemes based purely on color images, in which deep-learning classification experiments are carried out on large numbers of collected images of different hand shapes. However, hand washing action recognition that relies only on color data has poor accuracy, is easily disturbed by illumination, background and other factors, and lacks robustness.
In view of the above, the present invention provides a hand washing action recognition method and system, so as to realize hand washing action recognition and solve the above problems.
Referring to fig. 1, an exemplary structure of the hand washing action recognition system includes: an acquisition unit 1, a pre-detection unit 2, a preprocessing unit 3 and a hand washing action recognition unit 4.
Furthermore, the system may further comprise an output unit 5 for outputting information for interaction with a person, such as the recognized action, prompt tones, alarms, and the like.
Wherein, the acquisition unit 1 includes an RGBD data module, where RGB refers to red, green and blue, and D refers to depth. The module includes a device that captures color images (RGB) (e.g., a color camera) and a device that captures depth images (e.g., a depth camera).
A depth camera is also known as a 3D camera. An ordinary color camera records all objects within its field of view as a picture (2D image), but the recorded data does not contain the distance of those objects from the camera. The data collected by a depth camera gives the exact distance from the camera to each point in the image, so combining this distance with the (x, y) coordinates of a pixel in the 2D image yields the three-dimensional spatial coordinates of each pixel.
The acquisition unit 1 may be deployed at the hand washing location. The deployment position and angle need to ensure that the color image and the depth image can be acquired simultaneously.
The acquisition unit 1 and the output unit 5 may be installed in the same apparatus.
As for the pre-detection unit 2, the pre-processing unit 3, and the hand washing action recognition unit 4, they may be installed in the same device as the acquisition unit 1, or may be deployed in an action recognition server and communicate via a network, or the pre-detection unit 2, the pre-processing unit 3, and the hand washing action recognition unit 4 may be separate servers.
Fig. 2 illustrates an exemplary flow of a hand washing action recognition method performed by the hand washing action recognition system described above, including:
S1: in the pre-detection stage, image data of the current frame is acquired.
The image data (RGBD data) includes registered color images and depth images.
The purpose of registration is to allow the depth image and the color image to be fused together, i.e., to convert the image coordinate system of the depth image into the image coordinate system of the color image.
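By way of illustration only, the following is a minimal sketch of such a registration (including the pixel-to-3D back-projection described above), assuming a pinhole camera model with known intrinsic matrices K_d and K_c for the depth and color cameras and a known rotation R and translation t between them; these names and interfaces are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def register_depth_to_color(depth, K_d, K_c, R, t):
    """Warp a depth image into the color camera's image coordinate system.

    depth : HxW array of depth values in meters (0 = invalid).
    K_d, K_c : 3x3 intrinsic matrices of the depth and color cameras.
    R, t : rotation (3x3) and translation (3,) from the depth to the color frame.
    """
    h, w = depth.shape
    registered = np.zeros((h, w), dtype=depth.dtype)
    fx, fy = K_d[0, 0], K_d[1, 1]
    cx, cy = K_d[0, 2], K_d[1, 2]
    v, u = np.nonzero(depth > 0)
    z = depth[v, u]
    # Back-project each depth pixel (u, v, z) to a 3D point in the depth frame.
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=0)
    # Transform into the color camera frame and project with its intrinsics.
    pts_c = R @ pts + t.reshape(3, 1)
    uvw = K_c @ pts_c
    uc = np.round(uvw[0] / uvw[2]).astype(int)
    vc = np.round(uvw[1] / uvw[2]).astype(int)
    ok = (uc >= 0) & (uc < w) & (vc >= 0) & (vc < h)
    registered[vc[ok], uc[ok]] = pts_c[2, ok]
    return registered
```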
Specifically, step S1 may be performed by the aforementioned acquisition unit 1.
As mentioned above, the acquiring unit 1 is disposed at the hand washing site, and may be an RGBD data module, which periodically and simultaneously acquires a color image and a depth image.
The registered color image and depth image acquired at any acquisition time may be referred to as a frame of image data. The image data acquired at the current time can be regarded as the image data of the current frame.
S2: the image data is pre-detected.
Step S2 may be performed by the aforementioned pre-detection unit 2.
The steps involved in the pre-detection may be designed as appropriate; for example, the pre-detection may include flush detection, foam detection, or both.
S3: and if the image passes the pre-detection, the depth image is used for carrying out background removal processing on the color image to obtain a foreground image.
Step S3 may be executed by the preprocessing unit 3 described above.
Specifically, for any pixel point, if the depth value of the pixel point in the depth image is out of the preset range, the pixel value of the pixel point in the color image is set to be 0.
The preset range may be determined according to the optimal recognition distance of the depth camera, and may be, for example, 50 cm to 1.2 m: pixels whose depth value is smaller than 50 cm or larger than 1.2 m are set to 0 in the color image. In this way most of the background in the image can be removed effectively, extracting the color hand image.
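A minimal sketch of this depth-threshold background removal follows, assuming the depth image is a registered H x W array in meters alongside an H x W x 3 color image; the 0.5 m and 1.2 m defaults follow the example range given above:

```python
import numpy as np

def remove_background(color, depth, near=0.5, far=1.2):
    """Zero out color pixels whose registered depth falls outside [near, far].

    color : HxWx3 uint8 color image.
    depth : HxW float array of registered depth values in meters.
    Returns the foreground image; the 0.5 m / 1.2 m defaults follow the
    example range given in the description.
    """
    mask = (depth >= near) & (depth <= far)
    foreground = color.copy()
    foreground[~mask] = 0  # out-of-range pixels (including depth 0) are cleared
    return foreground
```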
S4: and performing hand washing action recognition by using the foreground image to obtain a recognition result corresponding to the image data.
The recognition result includes recognized hand washing action categories, such as crossed finger rubbing, fist rotating rubbing, ten-finger gathering rubbing, and the like.
Different numbers or characters can be used to characterize different hand washing action categories, and those skilled in the art can flexibly design the hand washing action categories, which are not described in detail herein.
Step S4 may be performed by hand washing action recognition unit 4 as described above.
In addition, for image data that fails the pre-detection, the corresponding recognition result may be the pre-detection result.
The pre-detection result may include one or more of: a result characterizing the presence or absence of flushing, and a result characterizing the presence or absence of foam.
As can be seen, in the embodiment of the present invention, hand washing action recognition is implemented in two steps: the first step performs the pre-detection (pre-detection stage), and the second step, action recognition, is performed only after the pre-detection is passed. In the second step, the registered depth image assists in background removal, filtering out much of the background information and increasing recognition robustness; the foreground image is then recognized to obtain the recognition result.
In practice, one hand washing action category may include a series of different hand gestures; for example, the fist rotating-rubbing action may include handshake-like gestures and gestures in which the two hands are in different rotation states. A recognition result based on a single frame of image data may therefore cause misjudgment.
Therefore, referring to fig. 2, in other embodiments of the present invention, after step S4, the following steps may be further included:
s5: and determining and outputting the current hand washing action category by using the identification result of the continuous multi-frame image data.
Referring to fig. 3, the system may further include a post-processing unit 6, and the post-processing unit 6 may execute step S5 to output the current hand washing action category to the output unit 5.
The continuous multi-frame image data comprises image data of a current frame and continuous N frames of image data before the image data of the current frame; n is a positive integer.
That is, the current hand washing action type can be comprehensively considered and output in combination with the identification results (whether flush water exists, whether foam exists, and which action type) of the continuous multi-frame image data.
The value of N can be flexibly designed by those skilled in the art.
For example, when N = 29, the current hand washing action category may be determined using the recognition results of 30 consecutive frames.
More specifically, for 30 consecutive frames of image data, the number of occurrences of each recognition result (flushing, foam, and each recognized hand washing action category) may be counted. A result whose count exceeds a preset threshold (for example, 10 frames), or whose proportion of the 30 frames is the largest and exceeds 30%, may be taken as the hand washing action category and output to the output unit 5.
After output, the oldest of the 30 frames can be removed; for example, when the current frame is frame 31, the recognition results of frames 2-31 are used for the determination, and when the current frame is frame 32, the recognition results of frames 3-32 are used.
It is also possible to discard all 30 frames and accumulate another 30: for example, step S5 is performed once for frames 1-30 (the current frame being frame 30), once for frames 31-60, and so on, which is not described again.
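A minimal sketch of this sliding-window voting follows; the 30-frame window, 10-frame threshold and 30% proportion mirror the example above, while the per-frame interface (plain string labels) is an illustrative assumption:

```python
from collections import Counter, deque

class VotingPostProcessor:
    """Sliding-window vote over per-frame recognition results (a sketch).

    Keeps the last `window` results; when the window is full, outputs the
    most frequent result if its count exceeds `min_count` or its share of
    the window exceeds `min_ratio`, mirroring the 10-frame / 30% example.
    """

    def __init__(self, window=30, min_count=10, min_ratio=0.30):
        self.results = deque(maxlen=window)  # oldest frame drops out automatically
        self.window, self.min_count, self.min_ratio = window, min_count, min_ratio

    def push(self, result):
        """Add one frame's result ('flushing', 'foam', or an action label)."""
        self.results.append(result)
        if len(self.results) < self.window:
            return None  # not enough frames accumulated yet
        label, count = Counter(self.results).most_common(1)[0]
        if count > self.min_count or count / self.window > self.min_ratio:
            return label
        return None
```

Called once per frame with the latest recognition result, push() returns a category only when the window is full and one label dominates; otherwise it returns None.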
In other embodiments of the present invention, other logic may be used to determine whether the user has washed his or her hands carefully according to the prescribed gestures, including whether flushing occurred, whether liquid soap was used (foam detection), whether the hand movements meet the specification, and the like.
It should be noted that deep learning models based on RGBD data fusion (models trained on the color image and depth image together) already exist, but they have the following problems: under flushing conditions the depth values become 0, so model training cannot reach the higher recognition accuracy required; in addition, water and foam partially occlude the hands, which affects the hand recognition result to a great extent.
To solve the above problem, fig. 4 shows another exemplary flow of a hand washing action recognition method performed by the hand washing action recognition system, including:
S41: in the pre-detection stage, image data of the current frame is acquired.
S41 is the same as S1, and is not repeated here.
S42: flush detection is performed using image data.
Flush detection is performed by a trained first machine learning model.
Specifically, flush detection may be performed using the depth image alone, or using the depth image and the color image together.
S43: the image data is used for foam detection.
The foam detection is performed by a trained second machine learning model.
The flushing detection and the foam detection belong to the pre-detection. Therefore, the pre-detection unit 2 may further include: a trained first machine learning model and a trained second machine learning model.
S44: if no flushing is present and no foam is present, it is determined that the pre-detection is passed.
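A minimal sketch of this pre-detection gate follows, assuming the trained first and second machine learning models are available as callables that return True when flushing or foam, respectively, is detected; these interfaces are assumptions, not from the patent:

```python
def pre_detect(color, depth, flush_model, foam_model):
    """Gate of the pre-detection stage (a sketch; model APIs are assumed).

    flush_model and foam_model stand in for the trained first and second
    machine learning models; each is assumed to return True when flushing
    or foam is detected in the current frame.
    """
    flushing = flush_model(depth)           # flush detection from the depth image
    foam = foam_model(color)                # foam detection from the color image
    passed = (not flushing) and (not foam)  # pass only when neither is present
    return passed, flushing, foam
```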
S45 is similar to S3, and is not described herein.
S46: and performing hand washing action recognition by using the foreground image to obtain a recognition result corresponding to the image data.
Hand wash action recognition may be performed by a trained third machine learning model.
The first to third machine learning models may exemplarily be CNN models.
S47 is similar to S5, and is not described herein.
When flushing or foam is detected, hand washing action recognition cannot be performed reliably on such frames, nor is it necessary; therefore, in the embodiment of the invention, hand washing action recognition mainly targets the hand-rubbing actions performed after flushing or foaming. In this embodiment, the hand washing action is recognized only when it is determined that the current frame contains neither flushing nor foam.
Furthermore, for better hand washing action recognition, tests can be carried out when the acquisition unit is deployed.
The testing step may include:
step A: the user executes a preset action;
and B: the hand washing action recognition system recognizes and outputs the recognized hand washing action type, and can judge whether the recognized hand washing action type is consistent with an expected result.
That is, after deployment, a testing stage is entered: the user is prompted to perform some preset actions (for example, closing the two hands together, or turning on the faucet), an RGBD image is then acquired, and the image is recognized to check whether the recognition is correct. For how the recognition is performed, reference is made to the above description, which is not repeated here.
If the result is inconsistent with the expectation, the distance between the acquisition unit and the faucet, as well as the illumination intensity, chromaticity, saturation and the like of the current scene, can be adjusted on site.
After the adjustment is completed, the test is performed again; if the result is still inconsistent, adjustment and testing are repeated, and so on, which is not described in detail.
The first to third machine learning models described above all need to be trained.
Before training, the training samples need to be prepared.
Therefore, before the pre-detection stage, a sample preparation stage and a training stage are required. This is described below.
First, a sample preparation phase.
Referring to fig. 5, the sample preparation phase may illustratively include the following steps:
s51: an image sample is acquired.
The image samples may include color image samples, depth image samples, and labels that are registered.
In one example, the color image sample and the depth image sample in the image sample may be collected by the RGBD data module of the acquisition unit, and the labels are then added manually.
In one example, the labels for the first through third machine learning models may be a first label, a second label, and a third label, respectively.
Wherein the first label may include: information characterizing the flushing state;
the second label may include: information characterizing the presence of foam;
the third label may include a hand washing action category.
S52: data enhancement is performed on the registered color image samples and depth image samples to expand the number of image samples.
The data enhancement processing may include: rotating, translating or mirroring the image, adding random noise, and the like.
In this way, one image sample can be expanded into multiple samples; the label, of course, remains unchanged.
S53: and carrying out normalization processing on each image sample obtained after the data enhancement.
Specifically, the color image sample is normalized.
The normalization operation is conventional and is not described in detail.
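A minimal sketch of the enhancement and normalization steps follows, using only a few simple transforms for illustration; the transform set and magnitudes are assumptions, and the same geometric transform is applied to both images of a registered pair so that registration is preserved:

```python
import numpy as np

def augment_pair(color, depth, rng):
    """Apply one random geometric/noise transform to a registered pair (sketch).

    The same geometric transform must be applied to the color and the depth
    image so they stay registered; the label is unchanged. Only a few simple
    transforms are shown here as an illustration.
    """
    choice = rng.integers(4)
    if choice == 0:                       # horizontal mirror
        return np.fliplr(color).copy(), np.fliplr(depth).copy()
    if choice == 1:                       # small translation
        dy, dx = rng.integers(-10, 11, size=2)
        return (np.roll(color, (dy, dx), axis=(0, 1)),
                np.roll(depth, (dy, dx), axis=(0, 1)))
    if choice == 2:                       # 90-degree rotation
        return np.rot90(color).copy(), np.rot90(depth).copy()
    noisy = color.astype(np.float32) + rng.normal(0, 5, color.shape)  # random noise
    return np.clip(noisy, 0, 255).astype(color.dtype), depth

def normalize(color):
    """Scale color values to [0, 1] (one common normalization choice)."""
    return color.astype(np.float32) / 255.0
```

A generator such as rng = np.random.default_rng() would be passed in, and augment_pair would be called repeatedly on each sample to expand it into multiple copies with the label unchanged.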
For the sake of distinction, an image sample labeled with the first label and normalized may be referred to as a first target image sample; an image sample labeled with the second label and normalized is referred to as a second target image sample; and in an image sample labeled with the third label and normalized, the color image sample is referred to as a target color image sample and the depth image sample as a target depth image sample.
For the target color image samples and target depth image samples, the following operation may also be performed:
s54: and performing background removal processing on the corresponding target color image by using the target depth image to obtain a foreground image, wherein the obtained foreground image, the target depth image and the third label form a third target image sample.
For background removal, reference is made to the above description, and details are not repeated here.
It should be noted that the first to third target image samples may be divided into a training set and a test set; that is, the training set includes one portion of the first to third target image samples, and the test set includes the remaining portion.
Second, the training phase.
still referring to fig. 5, the training phase includes:
s55: performing multiple iterative training on the first machine learning model based on the first target image sample to obtain a trained first machine learning model;
specifically, the depth image in the first target image sample may be used to perform iterative training on the first machine learning model, and the depth image and the color image may also be used to perform iterative training on the first machine learning model simultaneously.
S56: performing multiple iterative training on the second machine learning model based on the second target image sample to obtain a trained second machine learning model;
s57: and performing multiple times of iterative training on the third machine learning model based on the third target image sample to obtain a trained third machine learning model.
Wherein each iterative training comprises:
the first/second/third machine learning model learns on consecutive multi-frame image samples in the training set to obtain a learned first/second/third machine learning model;
consecutive multi-frame image samples in the test set are then input into the learned first/second/third machine learning model, and parameter optimization is performed according to the recognition results output by the learned model and the labels of the image samples.
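A minimal PyTorch-style sketch of one such training procedure follows, using the SGD optimizer listed in the abbreviations above; the epoch count, learning rate, momentum and loader interfaces are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train_model(model, train_loader, test_loader, epochs=30, lr=0.01):
    """Iterative training loop (a sketch; the loaders are assumed to yield
    (image_batch, label_batch) pairs built from the target image samples)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:      # learn on training samples
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
        model.eval()
        correct = total = 0
        with torch.no_grad():                    # evaluate on the test set
            for images, labels in test_loader:
                pred = model(images).argmax(dim=1)
                correct += (pred == labels).sum().item()
                total += labels.numel()
        print(f"epoch {epoch}: test accuracy {correct / total:.3f}")
    return model
```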
The third machine learning model is described below.
In one example, the third machine learning model may be a CNN model (see fig. 6), which may include:
1, a plurality of directly connected depth separable convolutional layers (or convolution blocks, Cn in fig. 6), where n denotes the n-th depth separable convolutional layer; each block may specifically consist of:
DepthwiseConv + BN + ReLU + PointwiseConv + BN + ReLU, and is used to extract the features of the image data, i.e., the feature expression;
2, a full convolution layer C1x1 for compressing or expanding the number of channels;
3, a global pooling layer (GP) for reducing each feature map to a single feature value;
and 4, a classification task layer Cls, which outputs the probability of each class; the current data is recognized as the action of the class whose probability is the largest.
In a conventional CNN model, there are pooling downsampling layers between the depth separable convolutional layers. Pooling downsampling discards potentially relevant information in the original data, which affects recognition accuracy; in particular, in some complex gesture interaction scenes, two finely subdivided gesture actions may differ by only a few pixels, and misjudging those pixels causes recognition errors.
The CNN model of the present invention has no pooling downsampling layers, so the resolution remains unchanged. The strategy for expanding the receptive field instead relies on the convolution kernel size of each convolution block Cn, enlarged layer by layer from the shallow layers to the deep layers, or uses dilated (atrous) convolution to expand the receptive field (in short, holes are inserted into an ordinary convolution kernel, which enlarges the receptive field).
It should be noted that the convolution kernel size is generally an odd number; the larger the kernel, the more global the image features that can be extracted, but the greater the amount of computation, so sizes such as 3, 5 and 7 are typical.
In a convolutional neural network, the receptive field is defined as the size of the region of the input image onto which a pixel of the feature map output by a layer of the network is mapped.
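A minimal PyTorch sketch of a model along these lines follows: directly connected depthwise separable blocks with no pooling between them, a dilation schedule that grows layer by layer to expand the receptive field, a 1x1 full convolution layer, a global pooling layer and a classification layer. The channel widths, dilation schedule and 4-channel (foreground color plus depth) input are illustrative assumptions, not taken from the patent:

```python
import torch.nn as nn

def sep_block(cin, cout, dilation=1):
    """DepthwiseConv + BN + ReLU + PointwiseConv + BN + ReLU, stride 1
    (no pooling or downsampling, so the resolution is preserved)."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, kernel_size=3, padding=dilation,
                  dilation=dilation, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, kernel_size=1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class HandWashCNN(nn.Module):
    """Sketch of the described model; channel widths and the layer-by-layer
    dilation schedule are illustrative assumptions."""

    def __init__(self, num_classes, in_ch=4):   # e.g. RGB foreground + depth
        super().__init__()
        self.features = nn.Sequential(           # Cn blocks, dilation grows with depth
            sep_block(in_ch, 32, dilation=1),
            sep_block(32, 64, dilation=2),
            sep_block(64, 128, dilation=4),
        )
        self.compress = nn.Conv2d(128, 256, kernel_size=1)  # full conv layer C1x1
        self.gp = nn.AdaptiveAvgPool2d(1)        # global pooling layer GP
        self.cls = nn.Linear(256, num_classes)   # classification task layer Cls

    def forward(self, x):
        x = self.gp(self.compress(self.features(x)))
        return self.cls(x.flatten(1))
```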
A hand washing action recognition system is described below. Please refer to fig. 1, which exemplarily includes:
an obtaining unit 1, configured to obtain image data of a current frame in a pre-detection stage; the image data comprises a registered color image and depth image;
a pre-detection unit 2 for: pre-detecting image data;
a preprocessing unit 3 for:
if the pre-detection is passed, the depth image is used to perform background removal processing on the color image to obtain a foreground image;
a hand washing action recognition unit 4 for:
performing hand washing action recognition by using the foreground image to obtain a recognition result corresponding to the image data; the recognition result includes the recognized hand washing action category.
Furthermore, the system may further comprise an output unit 5 for outputting information for interaction with a person, such as the recognized action, prompt tones, alarms, and the like.
For details, please refer to the foregoing description, which is not repeated herein.
In one example, referring to fig. 3, the system may further include:
a post-processing unit 6 for:
determining and outputting the current hand washing action category by using the identification result of the continuous multi-frame image data;
the continuous multi-frame image data comprises image data of a current frame and continuous N frames of image data before the image data of the current frame; n is a positive integer.
For details, please refer to the foregoing description, which is not repeated herein.
In other embodiments of the present invention, in terms of performing the background removal process, the preprocessing units in all the embodiments described above may be specifically configured to:
and aiming at any pixel point, if the depth value of the pixel point in the depth image is out of the preset range, setting the pixel value of the pixel point in the color image as 0.
For details, please refer to the foregoing description, which is not repeated herein.
In other embodiments of the present invention, in terms of performing the pre-detection, the pre-detection units in all the embodiments are specifically configured to:
perform flush detection using the depth image;
perform foam detection using the color image;
and if no flushing is present and no foam is present, determine that the pre-detection is passed.
For details, please refer to the foregoing description, which is not repeated herein.
In one example, the pre-detection unit may further include: a trained first machine learning model and a trained second machine learning model.
Wherein the flush detection is performed by the trained first machine learning model, and the foam detection is performed by the trained second machine learning model.
In one example, the hand washing action recognition unit may include a trained third machine learning model. Hand wash action recognition is performed by a trained third machine learning model.
For details, please refer to the foregoing description, which is not repeated herein.
In other embodiments of the present invention, before the pre-detection stage, a sample preparation stage may be further included;
the system may further comprise: a sample acquisition unit for:
acquiring an image sample; the image samples comprise registered color image samples, depth image samples, and labels; the label is a first label, a second label or a third label; wherein the first label comprises: information characterizing the flushing state; the second label includes: information characterizing the presence of foam; the third label includes a hand washing action category;
performing data enhancement on the registered color image samples and depth image samples to expand the number of image samples;
and carrying out normalization processing on each image sample obtained after the data enhancement.
For details, please refer to the foregoing description, which is not repeated herein.
An image sample labeled with the first label and normalized may be referred to as a first target image sample; an image sample labeled with the second label and normalized is referred to as a second target image sample;
in an image sample labeled with the third label and normalized, the color image sample is referred to as a target color image sample and the depth image sample as a target depth image sample;
the sample acquiring unit is further configured to, in a sample preparation phase:
and performing background removal processing on the corresponding target color image by using the target depth image to obtain a foreground image, wherein the obtained foreground image, the target depth image and the third label form a third target image sample.
For details, please refer to the foregoing description, which is not repeated herein.
After the sample preparation stage and before the pre-detection stage, the method also comprises a training stage;
the system may further comprise a training unit for:
performing multiple iterative training on the first machine learning model based on the first target image sample to obtain a trained first machine learning model;
performing multiple iterative training on the second machine learning model based on the second target image sample to obtain a trained second machine learning model;
and performing multiple times of iterative training on the third machine learning model based on the third target image sample to obtain a trained third machine learning model.
For details, please refer to the foregoing description, which is not repeated herein.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is simple, and the description can be referred to the method part.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A hand washing action recognition method, comprising:
in the pre-detection stage, acquiring image data of a current frame; the image data comprises a registered color image and a registered depth image;
pre-detecting the image data;
if the pre-detection is passed, the depth image is used for carrying out background removal processing on the color image to obtain a foreground image;
using the foreground image to identify the hand washing action to obtain an identification result corresponding to the image data; the recognition result includes the recognized hand washing action category.
2. The method of claim 1, wherein said performing the pre-detection comprises:
flushing detection is performed using the image data;
performing foam detection using the image data;
and if no flushing state exists and no foam exists, determining that the pre-detection is passed.
3. The method of claim 2, further comprising, prior to the pre-detection phase, a sample preparation phase;
the sample preparation phase comprises:
acquiring an image sample; the image samples comprise registered color image samples, depth image samples, and labels; the label is a first label, a second label or a third label; wherein the first label comprises: information characterizing the flushing state; the second label includes: information characterizing the presence of foam; the third label includes a hand washing action category;
performing data enhancement on the registered color image samples and depth image samples to expand the number of image samples;
and carrying out normalization processing on each image sample obtained after the data enhancement.
4. The method of claim 3,
the label is the first label, and the normalized image sample is a first target image sample;
the label is the second label, and the normalized image sample is a second target image sample;
the label is the third label, the color image sample in the normalized image sample is a target color image sample, and the depth image sample is a target depth image sample;
the sample preparation phase further comprises:
and performing background removal processing on the corresponding target color image by using the target depth image to obtain a foreground image, and forming a third target image sample from the obtained foreground image, the target depth image and the third label.
5. The method of claim 4, wherein after the sample preparation phase, and before the pre-detection phase, further comprising a training phase;
the flush detection is performed by a trained first machine learning model, the foam detection is performed by a trained second machine learning model, and the hand wash action recognition is performed by a trained third machine learning model;
the training phase comprises:
performing a plurality of iterative trainings on a first machine learning model based on the first target image sample to obtain the trained first machine learning model;
performing a plurality of iterative trainings on a second machine learning model based on the second target image sample to obtain the trained second machine learning model;
and performing multiple times of iterative training on a third machine learning model based on the third target image sample to obtain the trained third machine learning model.
6. The method of claim 1, wherein the performing background removal processing on the color image using the registered depth image to obtain a foreground image comprises:
and aiming at any pixel point, if the depth value of the pixel point in the depth image is out of a preset range, setting the pixel value of the pixel point in the color image to be 0.
7. The method of claim 1, wherein the third machine learning model comprises:
a plurality of directly connected depth-separable convolutional layers;
a full convolution layer;
a global pooling layer;
and a classification task layer.
8. The method of claim 1, further comprising:
determining and outputting the current hand washing action category by using the identification result of the continuous multi-frame image data;
the continuous multi-frame image data comprises the image data of the current frame and continuous N frames of image data before the image data of the current frame; n is a positive integer.
9. A hand washing action recognition system, comprising:
the acquisition unit is used for acquiring the image data of the current frame in the pre-detection stage; the image data comprises a registered color image and depth image;
a pre-detection unit to: pre-detecting the image data;
a pre-processing unit for:
if the pre-detection is passed, the depth image is used for carrying out background removal processing on the color image to obtain a foreground image;
a hand washing action recognition unit for:
using the foreground image to identify the hand washing action to obtain an identification result corresponding to the image data; the recognition result includes the recognized hand washing action category.
CN202010764529.6A 2020-07-30 2020-07-30 Hand washing action recognition method and system Pending CN111860448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010764529.6A CN111860448A (en) 2020-07-30 2020-07-30 Hand washing action recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010764529.6A CN111860448A (en) 2020-07-30 2020-07-30 Hand washing action recognition method and system

Publications (1)

Publication Number Publication Date
CN111860448A (en) 2020-10-30

Family

ID=72954297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010764529.6A Pending CN111860448A (en) 2020-07-30 2020-07-30 Hand washing action recognition method and system

Country Status (1)

Country Link
CN (1) CN111860448A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034503A (en) * 2021-05-28 2021-06-25 博奥生物集团有限公司 High-flux automatic cup separating method, device and system
CN113240723A (en) * 2021-05-18 2021-08-10 中德(珠海)人工智能研究院有限公司 Monocular depth estimation method and device and depth evaluation equipment
CN116071687A (en) * 2023-03-06 2023-05-05 四川港通医疗设备集团股份有限公司 Hand cleanliness detection method and system
US11823438B2 (en) 2020-11-09 2023-11-21 Industrial Technology Research Institute Recognition system and image augmentation and training method thereof

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070064986A1 (en) * 2001-03-13 2007-03-22 Johnson Raymond C Touchless identification in system for monitoring hand washing or application of a disinfectant
US20090087028A1 (en) * 2006-05-04 2009-04-02 Gerard Lacey Hand Washing Monitoring System
CN103295016A (en) * 2013-06-26 2013-09-11 天津理工大学 Behavior recognition method based on depth and RGB information and multi-scale and multidirectional rank and level characteristics
CN104167016A (en) * 2014-06-16 2014-11-26 西安工业大学 Three-dimensional motion reconstruction method based on RGB color and depth image
CN108256421A (en) * 2017-12-05 2018-07-06 盈盛资讯科技有限公司 A kind of dynamic gesture sequence real-time identification method, system and device
CN109726668A (en) * 2018-12-25 2019-05-07 大连海事大学 It is based on computer vision to wash one's hands and disinfection process normalization automatic testing method
CN110263689A (en) * 2019-06-11 2019-09-20 深圳市第三人民医院 One kind is washed one's hands monitoring method and its system, hand washing device
CN110309813A (en) * 2019-07-10 2019-10-08 南京行者易智能交通科技有限公司 A kind of model training method, detection method, device, mobile end equipment and the server of the human eye state detection based on deep learning
KR20190133867A (en) * 2018-05-24 2019-12-04 (주)네모랩스 System for providing ar service and method for generating 360 angle rotatable image file thereof
CN110796018A (en) * 2019-09-30 2020-02-14 武汉科技大学 Hand motion recognition method based on depth image and color image

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070064986A1 (en) * 2001-03-13 2007-03-22 Johnson Raymond C Touchless identification in system for monitoring hand washing or application of a disinfectant
US20090087028A1 (en) * 2006-05-04 2009-04-02 Gerard Lacey Hand Washing Monitoring System
CN103295016A (en) * 2013-06-26 2013-09-11 天津理工大学 Behavior recognition method based on depth and RGB information and multi-scale and multidirectional rank and level characteristics
CN104167016A (en) * 2014-06-16 2014-11-26 西安工业大学 Three-dimensional motion reconstruction method based on RGB color and depth image
CN108256421A (en) * 2017-12-05 2018-07-06 盈盛资讯科技有限公司 A kind of dynamic gesture sequence real-time identification method, system and device
KR20190133867A (en) * 2018-05-24 2019-12-04 (주)네모랩스 System for providing ar service and method for generating 360 angle rotatable image file thereof
CN109726668A (en) * 2018-12-25 2019-05-07 大连海事大学 It is based on computer vision to wash one's hands and disinfection process normalization automatic testing method
CN110263689A (en) * 2019-06-11 2019-09-20 深圳市第三人民医院 One kind is washed one's hands monitoring method and its system, hand washing device
CN110309813A (en) * 2019-07-10 2019-10-08 南京行者易智能交通科技有限公司 A kind of model training method, detection method, device, mobile end equipment and the server of the human eye state detection based on deep learning
CN110796018A (en) * 2019-09-30 2020-02-14 武汉科技大学 Hand motion recognition method based on depth image and color image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
司阳; 任松; 肖秦琨; 马媛; 钟昆; 杨雪梦: "基于彩色-深度图像的手语识别算法" [Sign language recognition algorithm based on color-depth images], 科学技术与工程 [Science Technology and Engineering], no. 11, pages 109-114 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11823438B2 (en) 2020-11-09 2023-11-21 Industrial Technology Research Institute Recognition system and image augmentation and training method thereof
CN113240723A (en) * 2021-05-18 2021-08-10 中德(珠海)人工智能研究院有限公司 Monocular depth estimation method and device and depth evaluation equipment
CN113034503A (en) * 2021-05-28 2021-06-25 博奥生物集团有限公司 High-flux automatic cup separating method, device and system
CN116071687A (en) * 2023-03-06 2023-05-05 四川港通医疗设备集团股份有限公司 Hand cleanliness detection method and system
CN116071687B (en) * 2023-03-06 2023-06-06 四川港通医疗设备集团股份有限公司 Hand cleanliness detection method and system

Similar Documents

Publication Publication Date Title
CN111860448A (en) Hand washing action recognition method and system
JP6397144B2 (en) Business discovery from images
CN110060237B (en) Fault detection method, device, equipment and system
EP1693782B1 (en) Method for facial features detection
CN112052186B (en) Target detection method, device, equipment and storage medium
Singh et al. Currency recognition on mobile phones
CN109829467A (en) Image labeling method, electronic device and non-transient computer-readable storage medium
Mery et al. Student attendance system in crowded classrooms using a smartphone camera
CN107305635A (en) Object identifying method, object recognition equipment and classifier training method
CN107766864B (en) Method and device for extracting features and method and device for object recognition
CN109344864B (en) Image processing method and device for dense object
CN112633255B (en) Target detection method, device and equipment
CN111652145B (en) Formula detection method and device, electronic equipment and storage medium
CN109784322A (en) A kind of recognition methods of vin code, equipment and medium based on image procossing
CN114332911A (en) Head posture detection method and device and computer equipment
CN112541394A (en) Black eye and rhinitis identification method, system and computer medium
CN112686122B (en) Human body and shadow detection method and device, electronic equipment and storage medium
CN110188610A (en) A kind of emotional intensity estimation method and system based on deep learning
Kieu et al. Ocr accuracy prediction method based on blur estimation
CN110334703B (en) Ship detection and identification method in day and night image
CN116580232A (en) Automatic image labeling method and system and electronic equipment
Babu et al. A feature based approach for license plate-recognition of Indian number plates
CN106611417A (en) A method and device for classifying visual elements as a foreground or a background
CN115131355A (en) Intelligent method for detecting abnormality of waterproof cloth by using data of electronic equipment
CN112132822A (en) Suspicious illegal building detection algorithm based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination