CN114067442B - Hand washing action detection method, model training method and device and electronic equipment - Google Patents

Hand washing action detection method, model training method and device and electronic equipment

Info

Publication number
CN114067442B
CN114067442B
Authority
CN
China
Prior art keywords: time, action, spatiotemporal, hand washing, space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210051567.6A
Other languages
Chinese (zh)
Other versions
CN114067442A (en)
Inventor
周波
梁书玉
苗瑞
邹小刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Haiqing Zhiyuan Technology Co., Ltd.
Original Assignee
Shenzhen HQVT Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen HQVT Technology Co., Ltd.
Priority to CN202210051567.6A
Publication of CN114067442A
Application granted
Publication of CN114067442B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application provides a hand washing action detection method, a model training method and device, and an electronic device. The hand washing action detection method includes: outputting first prompt information, where the first prompt information is used for prompting the user which hand washing step to perform, and acquiring a real-time action video of the user washing hands; inputting the real-time action video into a spatiotemporal relationship detection model to obtain a real-time action spatiotemporal relationship graph corresponding to the real-time action video, where the real-time action spatiotemporal relationship graph represents the spatiotemporal relationship between the user's hand and a target object related to hand washing; and determining the detection result of the user's hand washing action, either correct or incorrect, according to the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph of the standard hand washing action video corresponding to the hand washing step. Detection of the hand washing action is thereby achieved.

Description

Hand washing action detection method, model training method and device and electronic equipment
Technical Field
The present disclosure relates to deep learning technologies, and in particular to a hand washing action detection method, a model training method, a hand washing action detection device, and an electronic device.
Background
In food processing, pharmaceutical production and medical care, workers are often required to wash their hands strictly and according to standard procedure to ensure hygienic safety. The common standard is the seven-step washing method, which requires hand washing to be completed through seven prescribed hand washing actions.
In practice, however, many people simply rinse their hands with water, and whether their hand washing actions are correct is unknown, so hygiene and safety cannot be guaranteed. A method for detecting hand washing actions is therefore needed.
Disclosure of Invention
The application provides a hand washing action detection method, a model training method and device, and an electronic device, which are used to realize detection of hand washing actions.
In a first aspect, the present application provides a hand washing action detection method, including:
outputting first prompt information, wherein the first prompt information is used for prompting the user which hand washing step to perform, and acquiring a real-time action video of the user washing hands;
inputting the real-time action video into a spatiotemporal relationship detection model to obtain a real-time action spatiotemporal relationship graph corresponding to the real-time action video, wherein the real-time action spatiotemporal relationship graph is used for representing the spatiotemporal relationship between the hand of the user and a target object when the user washes hands, and the target object is related to washing hands;
and determining the detection result of the hand washing action of the user according to the real-time action spatiotemporal relationship graph and a standard action spatiotemporal relationship graph of the standard hand washing action video corresponding to the hand washing step, wherein the detection result is either correct or incorrect.
In one embodiment, the inputting the real-time motion video into a spatiotemporal relationship detection model to obtain a real-time motion spatiotemporal relationship graph corresponding to the real-time motion video includes:
inputting the real-time action video into the spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the real-time action video through the spatiotemporal pipeline generation network in the spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through a residual-in-residual dense network in the spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship diagram through a Gaussian mixture layer in the spatiotemporal relationship detection model to obtain the real-time action spatiotemporal relationship diagram.
In one embodiment, the generating the spatiotemporal pipeline of all objects in the real-time motion video through the spatiotemporal pipeline generation network in the spatiotemporal relation detection model comprises:
and dividing the real-time action video into a plurality of video segments, generating a network through a space-time pipeline in the space-time relation detection model, and generating a space-time pipeline of an object in each video segment of the real-time action video.
In one embodiment, generating an initial spatiotemporal relationship graph through a residual-in-residual dense network in the spatiotemporal relationship detection model, with each spatiotemporal pipeline as a graph node, includes:
taking each spatiotemporal pipeline as a graph node through the residual-in-residual dense network in the spatiotemporal relationship detection model, and determining edges between the graph nodes according to the spatiotemporal interaction of the objects corresponding to the spatiotemporal pipelines, so as to generate the initial spatiotemporal relationship graph.
In one embodiment, the determining the detection result of the hand washing action of the user according to the real-time action spatiotemporal relationship diagram and a standard action spatiotemporal relationship diagram of a standard hand washing action video corresponding to the hand washing step includes:
determining the similarity of the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph;
if the similarity is greater than or equal to a similarity threshold, determining that the detection result is correct;
and if the similarity is smaller than the similarity threshold, determining that the detection result is incorrect.
In one embodiment, the method further comprises:
if the detection result is correct, outputting second prompt information, wherein the second prompt information is used for prompting the user to carry out hand washing action of the next step;
and if the detection result is wrong, outputting third prompt information, wherein the third prompt information is used for prompting the user to repeat the hand washing action of this step.
In a second aspect, the present application provides a training method for a spatiotemporal relationship detection model, including:
acquiring a sample hand washing action video, wherein the hand washing action in the sample hand washing action video is a standard action;
inputting the sample hand washing action video into an initial space-time relation detection model to obtain a sample action space-time relation graph; the sample action spatiotemporal relationship graph is used for representing the spatiotemporal relationship between hands and a target object in the sample hand washing action video, and the target object is related to hand washing;
and updating parameters of the initial spatiotemporal relationship detection model according to the sample action spatiotemporal relationship graph so as to obtain the spatiotemporal relationship detection model.
In one embodiment, the updating the parameters of the initial spatiotemporal relationship detection model according to the sample action spatiotemporal relationship graph includes:
and updating the parameters of the initial spatiotemporal relationship detection model according to the relative entropy between the feature distribution of the sample action spatiotemporal relationship graph and a Gaussian distribution.
In one embodiment, the inputting the sample hand-washing motion video into an initial spatiotemporal relationship detection model to obtain a sample motion spatiotemporal relationship graph includes:
inputting the sample hand washing action video into the initial spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the sample hand washing action video through the spatiotemporal pipeline generation network in the initial spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the initial spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship graph through a Gaussian mixture layer in the initial spatiotemporal relationship detection model to obtain the sample action spatiotemporal relationship graph.
In one embodiment, the method further comprises:
and inputting the standard hand washing action video into the space-time relation detection model to obtain a standard action space-time relation graph corresponding to the standard hand washing action video.
In a third aspect, the present application provides a hand washing action detection device, comprising:
the output module is used for outputting first prompt information, wherein the first prompt information is used for prompting the user which hand washing step to perform, and for acquiring a real-time action video of the user washing hands;
the input module is used for inputting the real-time action video into a space-time relation detection model to obtain a real-time action space-time relation graph corresponding to the real-time action video, the real-time action space-time relation graph is used for representing the space-time relation between the hand of the user and a target object when the user washes hands, and the target object is related to washing hands;
and the judging module is used for determining the detection result of the hand washing action of the user according to the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph of the standard hand washing action video corresponding to the hand washing step, wherein the detection result is either correct or incorrect.
In one embodiment, the input module is configured to:
inputting the real-time action video into the spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the real-time action video through the spatiotemporal pipeline generation network in the spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship diagram through a Gaussian mixture layer in the spatiotemporal relationship detection model to obtain the real-time action spatiotemporal relationship diagram.
In one embodiment, the input module is configured to:
dividing the real-time action video into a plurality of video segments, and generating a spatiotemporal pipeline of the objects in each video segment of the real-time action video through the spatiotemporal pipeline generation network in the spatiotemporal relationship detection model.
In one embodiment, the input module is configured to:
taking each spatiotemporal pipeline as a graph node through the residual-in-residual dense network in the spatiotemporal relationship detection model, and determining edges between the graph nodes according to the spatiotemporal interaction of the objects corresponding to the spatiotemporal pipelines, so as to generate the initial spatiotemporal relationship graph.
In one embodiment, the determining module is configured to:
determining the similarity of the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph;
if the similarity is greater than or equal to a similarity threshold, determining that the detection result is correct;
and if the similarity is smaller than the similarity threshold, determining that the detection result is incorrect.
In one embodiment, the output module is further configured to:
if the detection result is correct, outputting second prompt information, wherein the second prompt information is used for prompting the user to carry out hand washing action of the next step;
if the detection result is wrong, outputting third prompt information, wherein the third prompt information is used for prompting the user to repeat the hand washing action of this step.
In a fourth aspect, the present application provides a training apparatus for a spatiotemporal relationship detection model, including:
the acquisition module is used for acquiring a sample hand washing action video, and the hand washing action in the sample hand washing action video is a standard action;
the input module is used for inputting the sample hand washing action video into an initial space-time relation detection model to obtain a sample action space-time relation graph; the sample action spatiotemporal relationship graph is used for representing the spatiotemporal relationship between hands and a target object in the sample hand washing action video, and the target object is related to hand washing;
and the updating module is used for updating the parameters of the initial spatiotemporal relationship detection model according to the sample action spatiotemporal relationship graph so as to obtain the spatiotemporal relationship detection model.
In one embodiment, the update module is configured to:
and updating the parameters of the initial spatiotemporal relationship detection model according to the relative entropy between the feature distribution of the sample action spatiotemporal relationship graph and a Gaussian distribution.
In one embodiment, the input module is configured to:
inputting the sample hand washing action video into the initial spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the sample hand washing action video through the spatiotemporal pipeline generation network in the initial spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the initial spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship graph through a Gaussian mixture layer in the initial spatiotemporal relationship detection model to obtain the sample action spatiotemporal relationship graph.
In one embodiment, the apparatus further comprises:
and the standard module is used for inputting the standard hand washing action video into the space-time relation detection model to obtain a standard action space-time relation graph corresponding to the standard hand washing action video.
In a fifth aspect, the present application provides an electronic device comprising a memory and a processor, the memory and the processor being connected;
the memory is used for storing a computer program;
the processor is adapted to implement the method of the first or second aspect when the computer program is executed.
In a sixth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first or second aspect.
In a seventh aspect, the present application provides a computer program product comprising a computer program that, when executed by a processor, implements the method of the first or second aspect.
In summary, the application provides a hand washing action detection method, a model training method and device, and an electronic device, which together realize automatic detection of whether a user's hand washing actions are correct.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a hand washing action detection method according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of a spatiotemporal relationship detection model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a processing method of a spatiotemporal relationship detection model according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a training method of a spatiotemporal relationship detection model according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a hand washing motion detection device according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a training apparatus for a spatiotemporal relationship detection model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the embodiments of the application, a spatiotemporal relationship detection model processes the real-time action video of the user washing hands to obtain the real-time action spatiotemporal relationship graph between the hand and other target objects in the video; whether the user's hand washing action is correct is then determined from this real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph of the standard hand washing action video, thereby realizing detection of the hand washing action.
The hand washing action detection method provided by the present application is described in detail below with reference to specific embodiments. It should be understood that the following embodiments may be combined with one another, and that the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic flow chart of a hand washing motion detection method according to an embodiment of the present application. The execution subject of the method is a hand washing action detection device, and the device can be realized by software and/or hardware. As shown in fig. 1, the method includes:
s101, outputting first prompt information, wherein the first prompt information is used for prompting a user of a hand washing step to be performed, and acquiring a real-time action video of the user during hand washing.
The first prompt message may be in a voice form or a text form, and for example, the hand washing motion detection device has a voice broadcast device for broadcasting the voice form of the first prompt message, or the hand washing motion detection device has a user interface for displaying the text form of the first prompt message. Because the standard hand washing action comprises a plurality of steps, the user is required to wash hands one by one according to the standard in the embodiment of the application, the hand washing action detection device outputs the first prompt message to prompt the user of the step of washing hands to be carried out, so that the user can carry out the corresponding hand washing action according to the first prompt message, and the hand washing action detection device shoots and acquires the real-time action video of the user when washing hands through the shooting state.
S102, inputting the real-time action video into the space-time relation detection model to obtain a real-time action space-time relation graph corresponding to the real-time action video, wherein the real-time action space-time relation graph is used for representing the space-time relation between the hand of the user and a target object when the user washes hands, and the target object is related to the washing hands.
The input of the spatiotemporal relationship detection model is the real-time action video, and the output is the real-time action spatiotemporal relationship graph. The target object is an object related to hand washing detected in the real-time action video, such as soap, hand sanitizer, the faucet or the water flow, and the real-time action spatiotemporal relationship graph represents the relationship between the hand and the target object in both the time and space dimensions, that is, the real-time spatiotemporal relationship.
S103, determining the detection result of the hand washing action of the user according to the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph of the standard hand washing action video corresponding to the hand washing step, where the detection result is either correct or incorrect.
The standard hand washing action video corresponding to the hand washing step is a video in which the hand washing action of that step is performed correctly, and its standard action spatiotemporal relationship graph can be obtained in advance by inputting the standard hand washing action video into the spatiotemporal relationship detection model. Optionally, the similarity between the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph is determined; if the similarity is greater than or equal to a similarity threshold, the detection result is determined to be correct; if the similarity is smaller than the similarity threshold, the detection result is determined to be incorrect. For example, the similarity between the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph can be measured with the Structural Similarity (SSIM) index.
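As an illustration of the comparison in S103, the following minimal sketch assumes the two relationship graphs have already been rendered as equal-sized two-dimensional feature maps; the function name and the 0.8 threshold are illustrative assumptions, not values disclosed by the patent:

    import numpy as np
    from skimage.metrics import structural_similarity as ssim

    def detect_step(realtime_graph: np.ndarray, standard_graph: np.ndarray,
                    threshold: float = 0.8) -> bool:
        """Return True (correct) if the SSIM between the real-time and standard
        action spatiotemporal relationship graphs reaches the threshold."""
        # data_range must be given explicitly for floating-point inputs
        score = ssim(realtime_graph, standard_graph,
                     data_range=float(standard_graph.max() - standard_graph.min()))
        return score >= threshold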
According to the method, the real-time action spatiotemporal relationship graph of the real-time action video of the user during hand washing is extracted by adopting the spatiotemporal relationship detection model, whether the hand washing action is correct or not is judged based on the similarity of the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph, and accurate hand washing action detection is realized.
On the basis of the above embodiment, if the detection result is correct, second prompt information is output to prompt the user to perform the next hand washing step; if the detection result is wrong, third prompt information is output to prompt the user to repeat the hand washing action of this step. This continues until all hand washing steps are completed.
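The per-step control flow described above can be summarized in the following sketch; all of the callables (prompt, capture_video, detect_step) are hypothetical hooks standing in for the prompt output, the video capture and the S101-S103 detection of this embodiment:

    def supervise_hand_washing(steps, prompt, capture_video, detect_step):
        """Guide the user through each hand washing step, repeating a step
        until its action is detected as correct."""
        for i, step in enumerate(steps, start=1):
            prompt(f"Step {i}: {step}")                # first prompt information
            while True:
                video = capture_video()                # real-time action video
                if detect_step(video, step):
                    prompt("Correct, proceed to the next step")  # second prompt
                    break
                prompt("Incorrect, please repeat this step")     # third prompt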
The spatiotemporal relationship detection model in the embodiments of the application is described below. As shown in the model framework of Fig. 2, the spatiotemporal relationship detection model includes a spatiotemporal pipeline generation network (TPN), a residual-in-residual dense network (built from residual-in-residual dense blocks, RRDB), and a Gaussian mixture layer. There may be one or more spatiotemporal pipeline generation networks, each processing one video segment. The spatiotemporal pipeline generation network may be based on a pseudo-3D residual network (P3D ResNet).
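The three-stage structure of Fig. 2 can be sketched as the following skeleton; the three submodules are placeholders, since the patent does not disclose their exact layer configurations:

    import torch
    import torch.nn as nn

    class SpatiotemporalRelationModel(nn.Module):
        """TPN -> residual-in-residual dense graph builder -> Gaussian mixture layer."""
        def __init__(self, tube_net: nn.Module, graph_net: nn.Module,
                     gmm_layer: nn.Module):
            super().__init__()
            self.tube_net = tube_net    # spatiotemporal pipeline (tube) generation network
            self.graph_net = graph_net  # builds the initial spatiotemporal relationship graph
            self.gmm_layer = gmm_layer  # extracts the hand-washing-relevant subgraph

        def forward(self, clips: torch.Tensor):
            tubes = self.tube_net(clips)          # one tube per detected object
            full_graph = self.graph_net(tubes)    # initial spatiotemporal relationship graph
            return self.gmm_layer(full_graph)     # action spatiotemporal relationship graph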
The process in S102 of inputting the real-time action video into the spatiotemporal relationship detection model to obtain the real-time action spatiotemporal relationship graph is shown in Fig. 3 and described below.
S301, inputting the real-time action video into the spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the real-time action video through the spatiotemporal pipeline generation network in the model.
Optionally, the real-time action video is divided into a plurality of video segments, and the spatiotemporal pipeline generation network in the spatiotemporal relationship detection model generates a spatiotemporal pipeline for the objects in each video segment. The video segments are input into the spatiotemporal pipeline generation networks respectively, and each spatiotemporal pipeline generation network outputs the spatiotemporal pipelines of the objects in its video segment; the objects may include the hand, the target object related to hand washing, and other objects in the video segment.
The spatiotemporal pipeline generation network processes each video segment as follows: a three-dimensional motion detector in the network determines candidate boxes for the objects in the starting frame of the video segment (for example, boxes for the hands, the tap, the water flow, the hand sanitizer and possibly other objects), estimates the movement of each candidate box in the current frame, and then generates the corresponding candidate boxes in subsequent frames. After the candidate boxes of the objects have been determined in all frames, the candidate boxes of the same object in consecutive frames are connected to generate that object's spatiotemporal pipeline, yielding the object's features in the time and space dimensions.
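The linking step can be illustrated with a greedy IoU-based sketch; real tube proposal networks also use detection scores and learned motion estimates, which are omitted here as an assumption for brevity:

    def iou(a, b):
        """Intersection over union of two (x1, y1, x2, y2) boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def link_tubes(frames, iou_thr=0.5):
        """Greedily connect candidate boxes of the same object across
        consecutive frames into spatiotemporal tubes. `frames` is a list
        of per-frame lists of boxes."""
        tubes = []
        for boxes in frames:
            unmatched = list(boxes)
            for tube in tubes:
                best = max(unmatched, key=lambda b: iou(tube[-1], b), default=None)
                if best is not None and iou(tube[-1], best) >= iou_thr:
                    tube.append(best)
                    unmatched.remove(best)
            tubes.extend([b] for b in unmatched)   # unmatched boxes start new tubes
        return tubes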
S302, generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node.
The initial spatiotemporal relationship graph is obtained by mining the spatiotemporal interactions among all spatiotemporal pipelines: the residual-in-residual dense network in the spatiotemporal relationship detection model takes each spatiotemporal pipeline as a graph node, and determines the edges between graph nodes according to the spatiotemporal interaction of the objects corresponding to the spatiotemporal pipelines, so as to generate the initial spatiotemporal relationship graph.
Optionally, the residual-in-residual dense network takes all spatiotemporal pipelines in each video segment as graph nodes, uses the dense connections among the graph nodes as the edges of the graph, and extracts the feature maps corresponding to the graph nodes and edges to generate the initial spatiotemporal relationship graph. If the candidate boxes of the objects corresponding to two spatiotemporal pipelines overlap, a dense connection is determined to exist between the corresponding graph nodes. The graph nodes and edges in the initial spatiotemporal relationship graph characterize the spatiotemporal relationships among all objects in the real-time action video.
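Edge construction can then be sketched as follows, reusing the iou helper above and assuming every tube stores one box per frame so that same-frame boxes align under zip; the overlap threshold is an illustrative assumption:

    def build_initial_graph(tubes, overlap_thr=0.0):
        """One graph node per spatiotemporal tube; an undirected edge wherever
        two tubes' candidate boxes of the same frame overlap."""
        n = len(tubes)
        adj = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                for box_i, box_j in zip(tubes[i], tubes[j]):   # same-frame pairs
                    if iou(box_i, box_j) > overlap_thr:
                        adj[i][j] = adj[j][i] = 1              # dense connection
                        break
        return adj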
S303, performing feature extraction on the initial spatiotemporal relationship graph through the Gaussian mixture layer in the spatiotemporal relationship detection model to obtain the real-time action spatiotemporal relationship graph.
When the spatiotemporal pipeline generation network generates the spatiotemporal pipelines, it may also detect objects irrelevant to hand washing, so the graph nodes and edges in the initial spatiotemporal relationship graph may include parts irrelevant to hand washing. The Gaussian mixture layer therefore performs feature extraction on the initial spatiotemporal relationship graph to obtain the real-time action spatiotemporal relationship graph, which only includes the graph nodes and edges corresponding to the hand and the target objects related to hand washing. The initial spatiotemporal relationship graph may be regarded as the complete spatiotemporal relationship graph, and the real-time action spatiotemporal relationship graph as a subgraph of it.
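One plausible form of such a Gaussian mixture layer is sketched below: node features are scored under K learned Gaussian components, and only nodes whose posterior mass falls on the component treated as hand-washing-relevant are kept. The parameterization and the choice of component 0 are assumptions, not disclosed details:

    import math
    import torch
    import torch.nn as nn

    class GaussianMixtureLayer(nn.Module):
        def __init__(self, dim: int, k: int = 2):
            super().__init__()
            self.means = nn.Parameter(torch.randn(k, dim))
            self.log_std = nn.Parameter(torch.zeros(k, dim))
            self.logit_pi = nn.Parameter(torch.zeros(k))  # mixture weights

        def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
            x = node_feats.unsqueeze(1)                   # (N, 1, D)
            var = (2 * self.log_std).exp()
            # log N(x | mu_k, sigma_k) per node and component -> (N, K)
            log_prob = -0.5 * (((x - self.means) ** 2) / var
                               + 2 * self.log_std + math.log(2 * math.pi)).sum(-1)
            resp = (torch.log_softmax(self.logit_pi, 0) + log_prob).softmax(1)
            keep = resp[:, 0] > 0.5   # component 0 = hand-washing-relevant (assumed)
            return node_feats[keep]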
Through the spatiotemporal relationship detection model, the spatiotemporal relationship between the hand and the target object in the real-time action video of the user washing hands is accurately obtained in the time and space dimensions, so that whether the hand washing action is correct is determined based on the similarity between this graph and the standard action spatiotemporal relationship graph, improving detection accuracy.
The above embodiments describe the framework of the spatio-temporal relationship detection model and the processing procedure of the real-time motion video. The training of the spatio-temporal relationship detection model is described below.
Fig. 4 is a flowchart illustrating a training method of a spatiotemporal relationship detection model according to an embodiment of the present application. As shown in fig. 4, the method includes:
s401, a sample hand washing action video is obtained, and hand washing actions in the sample hand washing action video are standard actions.
The sample hand washing action videos are standard hand washing action videos. A database of standard hand washing action videos can be built in advance, and the videos in the database can be used as samples for model training; these videos may be captured from different viewing angles and at different distances.
S402, inputting the sample hand washing action video into an initial spatiotemporal relation detection model to obtain a sample action spatiotemporal relation graph; the sample action spatiotemporal relationship graph is used for representing the spatiotemporal relationship between hands and a target object in the sample hand washing action video, and the target object is related to hand washing.
The sample hand washing action video is input into the initial spatiotemporal relationship detection model, and the spatiotemporal pipeline generation network in the initial model generates the spatiotemporal pipelines of all objects in the sample hand washing action video; the residual-in-residual dense network in the initial model takes each spatiotemporal pipeline as a graph node to generate an initial spatiotemporal relationship graph; and the Gaussian mixture layer in the initial model performs feature extraction on the initial spatiotemporal relationship graph to obtain the sample action spatiotemporal relationship graph.
The processing of the sample hand washing action video by the spatiotemporal pipeline generation network, the residual-in-residual dense network and the Gaussian mixture layer is similar to the processing of the real-time action video in S301 to S303 of the foregoing embodiment, and is not repeated here.
S403, updating parameters of the initial spatiotemporal relationship detection model according to the sample action spatiotemporal relationship graph to obtain the spatiotemporal relationship detection model.
The optimization goal is to make the feature distribution of the output sample action spatiotemporal relationship graph conform to a Gaussian distribution. Optionally, the KL divergence, i.e. the relative entropy, between the feature distribution of the sample action spatiotemporal relationship graph and the Gaussian distribution is used as the loss, and the parameters of the initial spatiotemporal relationship detection model are updated according to this relative entropy, yielding the trained spatiotemporal relationship detection model.
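Under a diagonal-Gaussian reading of this objective, the loss has the familiar closed form used below; this is a sketch of one plausible implementation, not the patent's disclosed code:

    import torch

    def gaussian_kl_loss(features: torch.Tensor) -> torch.Tensor:
        """KL( N(mu, var) || N(0, I) ): relative entropy between the empirical
        feature distribution of the sample action spatiotemporal relationship
        graph and a standard Gaussian, summed over feature dimensions."""
        mu = features.mean(dim=0)
        var = features.var(dim=0, unbiased=False) + 1e-8
        return 0.5 * (var + mu ** 2 - 1.0 - var.log()).sum()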
Training the initial spatiotemporal relationship detection model in this way yields a trained spatiotemporal relationship detection model that can accurately detect the action spatiotemporal relationships in hand washing videos.
After model training is finished, the standard hand washing action video of each step is input into the spatiotemporal relationship detection model to obtain the standard action spatiotemporal relationship graph corresponding to that video. In subsequent application, the standard action spatiotemporal relationship graph is compared with the real-time action spatiotemporal relationship graph, and whether the real-time hand washing action is correct is determined by judging their similarity.
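Since the standard graphs do not change, they can be computed once and cached, as in this brief sketch (names are illustrative):

    import torch

    @torch.no_grad()
    def precompute_standard_graphs(model, standard_videos):
        """Run each step's standard hand washing video through the trained
        spatiotemporal relationship detection model and cache the resulting
        standard action spatiotemporal relationship graphs."""
        model.eval()
        return {step: model(video) for step, video in standard_videos.items()}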
Fig. 5 is a schematic structural diagram of a hand washing motion detection device according to an embodiment of the present application. As shown in fig. 5, hand washing action detection apparatus 500 includes:
the output module 501 is configured to output first prompt information, where the first prompt information is used to prompt a user of a hand washing step to be performed, and acquire a real-time action video of the user during hand washing;
an input module 502, configured to input the real-time action video into the spatiotemporal relationship detection model, so as to obtain a real-time action spatiotemporal relationship diagram corresponding to the real-time action video, where the real-time action spatiotemporal relationship diagram is used to represent a spatiotemporal relationship between a user's hand and a target object when the user washes hands, and the target object is related to washing hands;
the determining module 503 is configured to determine a detection result of the hand washing action of the user according to the real-time action spatiotemporal relationship diagram and the standard action spatiotemporal relationship diagram of the standard hand washing action video corresponding to the hand washing step, where the detection result includes a correct or an incorrect detection result.
In one embodiment, the input module 502 is configured to:
inputting the real-time action video into the spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the real-time action video through the spatiotemporal pipeline generation network in the spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial space-time relation graph through a Gaussian mixture layer in the space-time relation detection model to obtain a real-time action space-time relation graph.
In one embodiment, the input module 502 is configured to:
the real-time action video is divided into a plurality of video segments, and a network is generated through a space-time pipeline in a space-time relation detection model, so that a space-time pipeline of an object in each video segment of the real-time action video is generated.
In one embodiment, the input module 502 is configured to:
taking each spatiotemporal pipeline as a graph node through the residual-in-residual dense network in the spatiotemporal relationship detection model, and determining edges between the graph nodes according to the spatiotemporal interaction of the objects corresponding to the spatiotemporal pipelines, so as to generate the initial spatiotemporal relationship graph.
In one embodiment, the determining module 503 is configured to:
determining the similarity between the real-time action spatio-temporal relationship graph and the standard action spatio-temporal relationship graph;
if the similarity is greater than or equal to the similarity threshold, determining that the detection result is correct;
and if the similarity is smaller than the similarity threshold, determining that the detection result is incorrect.
In one embodiment, the output module 501 is further configured to:
if the detection result is correct, outputting second prompt information, wherein the second prompt information is used for prompting the user to carry out hand washing action of the next step;
and if the detection result is wrong, outputting third prompt information, wherein the third prompt information is used for prompting the user to repeat the hand washing action of this step.
The hand washing action detection device provided in the embodiments of the present application can be used to execute the hand washing action detection method in the foregoing embodiments; the implementation principles and technical effects are similar and are not repeated here.
Fig. 6 is a schematic structural diagram of a training apparatus for a spatiotemporal relationship detection model according to an embodiment of the present application. As shown in fig. 6, the training apparatus 600 for spatio-temporal relationship detection model includes:
the acquisition module 601 is configured to acquire a sample hand washing action video, where a hand washing action in the sample hand washing action video is a standard action;
an input module 602, configured to input the sample hand washing motion video into the initial spatiotemporal relationship detection model, so as to obtain a sample motion spatiotemporal relationship diagram; the sample action spatiotemporal relation graph is used for representing the spatiotemporal relation between hands and a target object in the sample hand washing action video, and the target object is related to hand washing;
and the updating module 603 is configured to update parameters of the initial spatio-temporal relationship detection model according to the sample motion spatio-temporal relationship diagram, so as to obtain a spatio-temporal relationship detection model.
In one embodiment, the update module 603 is configured to:
and updating the parameters of the initial spatiotemporal relationship detection model according to the relative entropy between the feature distribution of the sample action spatiotemporal relationship graph and a Gaussian distribution.
In one embodiment, the input module 602 is configured to:
inputting the sample hand washing action video into the initial spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the sample hand washing action video through the spatiotemporal pipeline generation network in the initial spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the initial spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship graph through a Gaussian mixture layer in the initial spatiotemporal relationship detection model to obtain a sample action spatiotemporal relationship graph.
In one embodiment, the apparatus further comprises:
and the standard module is used for inputting the standard hand washing action video into the space-time relation detection model to obtain a standard action space-time relation graph corresponding to the standard hand washing action video.
The training device of the spatio-temporal relationship detection model provided in the embodiment of the present application can be used for executing the training method of the spatio-temporal relationship detection model in the foregoing embodiments, and the implementation principle and technical effect thereof are similar, and are not described herein again.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device 700 comprises a memory 701 and a processor 702, the memory 701 and the processor 702 being connected by a bus 703.
The memory 701 is used to store computer programs.
The processor 702 is adapted to implement the method in any of the above embodiments when the computer program is executed.
The embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method in any of the embodiments is implemented.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method in any of the above embodiments is implemented.
Optionally, the Processor may be a Central Processing Unit (CPU), or may be another general-purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method embodiment disclosed in this application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A hand washing action detection method, comprising:
outputting first prompt information, wherein the first prompt information is used for prompting the user which hand washing step to perform, and acquiring a real-time action video of the user washing hands;
inputting the real-time action video into a spatiotemporal relationship detection model to obtain a real-time action spatiotemporal relationship graph corresponding to the real-time action video, wherein the real-time action spatiotemporal relationship graph is used for representing the spatiotemporal relationship between the hand of the user and a target object when the user washes hands, and the target object is related to washing hands;
determining a detection result of the hand washing action of the user according to the real-time action spatiotemporal relationship graph and a standard action spatiotemporal relationship graph of a standard hand washing action video corresponding to the hand washing step, wherein the detection result is either correct or incorrect;
the inputting the real-time action video into a spatiotemporal relationship detection model to obtain a real-time action spatiotemporal relationship diagram corresponding to the real-time action video comprises:
inputting the real-time action video into the spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the real-time action video through the spatiotemporal pipeline generation network in the spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship diagram through a Gaussian mixture layer in the spatiotemporal relationship detection model to obtain the real-time action spatiotemporal relationship diagram.
2. The method of claim 1, wherein generating spatiotemporal pipelines of all objects in the real-time motion video through a spatiotemporal pipeline generation network in the spatiotemporal relationship detection model comprises:
dividing the real-time action video into a plurality of video segments, and generating a spatiotemporal pipeline of the objects in each video segment of the real-time action video through the spatiotemporal pipeline generation network in the spatiotemporal relationship detection model.
3. The method according to claim 1, wherein generating an initial spatiotemporal relationship graph through a residual-in-residual dense network in the spatiotemporal relationship detection model, with each spatiotemporal pipeline as a graph node, comprises:
taking each spatiotemporal pipeline as a graph node through the residual-in-residual dense network in the spatiotemporal relationship detection model, and determining edges between the graph nodes according to the spatiotemporal interaction of the objects corresponding to the spatiotemporal pipelines, so as to generate the initial spatiotemporal relationship graph.
4. The method according to any one of claims 1 to 3, wherein the determining the detection result of the hand washing action of the user according to the real-time action spatiotemporal relationship diagram and a standard action spatiotemporal relationship diagram of a standard hand washing action video corresponding to the hand washing step comprises:
determining the similarity of the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph;
if the similarity is greater than or equal to a similarity threshold, determining that the detection result is correct;
and if the similarity is smaller than the similarity threshold, determining that the detection result is incorrect.
5. The method according to any one of claims 1-3, further comprising:
if the detection result is correct, outputting second prompt information, wherein the second prompt information is used for prompting the user to carry out hand washing action of the next step;
and if the detection result is wrong, outputting third prompt information, wherein the third prompt information is used for prompting the user to repeat the hand washing action of this step.
6. A training method of a space-time relationship detection model is characterized by comprising the following steps:
acquiring a sample hand washing action video, wherein the hand washing action in the sample hand washing action video is a standard action;
inputting the sample hand washing action video into an initial space-time relation detection model to obtain a sample action space-time relation graph; the sample action spatiotemporal relationship graph is used for representing the spatiotemporal relationship between hands and a target object in the sample hand washing action video, and the target object is related to hand washing;
updating parameters of the initial spatiotemporal relationship detection model according to the sample action spatiotemporal relationship graph to obtain the spatiotemporal relationship detection model;
wherein inputting the sample hand washing action video into the initial spatiotemporal relationship detection model to obtain the sample action spatiotemporal relationship graph comprises:
inputting the sample hand washing action video into the initial spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the sample hand washing action video through the spatiotemporal pipeline generation network in the initial spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the initial spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship graph through a Gaussian mixture layer in the initial spatiotemporal relationship detection model to obtain the sample action spatiotemporal relationship graph.
7. The method according to claim 6, wherein updating parameters of the initial spatiotemporal relationship detection model according to the sample action spatiotemporal relationship graph comprises:
and updating the parameters of the initial spatiotemporal relationship detection model according to the relative entropy between the feature distribution of the sample action spatiotemporal relationship graph and a Gaussian distribution.
8. The method of claim 6 or 7, further comprising:
and inputting the standard hand washing action video into the space-time relation detection model to obtain a standard action space-time relation graph corresponding to the standard hand washing action video.
9. A hand washing motion detection device, comprising:
the output module is used for outputting first prompt information, wherein the first prompt information is used for prompting the user which hand washing step to perform, and for acquiring a real-time action video of the user washing hands;
the input module is used for inputting the real-time action video into a space-time relation detection model to obtain a real-time action space-time relation graph corresponding to the real-time action video, the real-time action space-time relation graph is used for representing the space-time relation between the hand of the user and a target object when the user washes hands, and the target object is related to washing hands;
the judging module is used for determining the detection result of the hand washing action of the user according to the real-time action spatiotemporal relationship graph and the standard action spatiotemporal relationship graph of the standard hand washing action video corresponding to the hand washing step, wherein the detection result is either correct or incorrect;
the input module is used for:
inputting the real-time action video into the spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the real-time action video through the spatiotemporal pipeline generation network in the spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship diagram through a Gaussian mixture layer in the spatiotemporal relationship detection model to obtain the real-time action spatiotemporal relationship diagram.
10. A training device for a spatio-temporal relationship detection model is characterized by comprising:
the acquisition module is used for acquiring a sample hand washing action video, and the hand washing action in the sample hand washing action video is a standard action;
the input module is used for inputting the sample hand washing action video into an initial space-time relation detection model to obtain a sample action space-time relation graph; the sample action spatiotemporal relationship graph is used for representing the spatiotemporal relationship between hands and a target object in the sample hand washing action video, and the target object is related to hand washing;
the updating module is used for updating the parameters of the initial spatiotemporal relationship detection model according to the sample action spatiotemporal relationship graph so as to obtain the spatiotemporal relationship detection model;
the input module is used for:
inputting the sample hand washing action video into the initial spatiotemporal relationship detection model, and generating spatiotemporal pipelines of all objects in the sample hand washing action video through the spatiotemporal pipeline generation network in the initial spatiotemporal relationship detection model;
generating an initial spatiotemporal relationship graph through the residual-in-residual dense network in the initial spatiotemporal relationship detection model, with each spatiotemporal pipeline serving as a graph node;
and performing feature extraction on the initial spatiotemporal relationship graph through a Gaussian mixture layer in the initial spatiotemporal relationship detection model to obtain the sample action spatiotemporal relationship graph.
11. An electronic device comprising a memory and a processor, the memory and the processor being connected;
the memory is used for storing a computer program;
the processor is adapted to implement the method of any of claims 1-8 when the computer program is executed.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202210051567.6A 2022-01-18 2022-01-18 Hand washing action detection method, model training method and device and electronic equipment Active CN114067442B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051567.6A CN114067442B (en) 2022-01-18 2022-01-18 Hand washing action detection method, model training method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210051567.6A CN114067442B (en) 2022-01-18 2022-01-18 Hand washing action detection method, model training method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114067442A (en) 2022-02-18
CN114067442B (en) 2022-04-19

Family

ID=80231206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210051567.6A Active CN114067442B (en) 2022-01-18 2022-01-18 Hand washing action detection method, model training method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114067442B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114973424A (en) * 2022-08-01 2022-08-30 深圳市海清视讯科技有限公司 Feature extraction model training method, hand action recognition method, device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5454043A (en) * 1993-07-30 1995-09-26 Mitsubishi Electric Research Laboratories, Inc. Dynamic and static hand gesture recognition through low-level image analysis
CN112613356A (en) * 2020-12-07 2021-04-06 北京理工大学 Action detection method and device based on deep attention fusion network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765860B (en) * 2019-09-16 2023-06-23 平安科技(深圳)有限公司 Tumble judging method, tumble judging device, computer equipment and storage medium
CN111598081A (en) * 2020-04-09 2020-08-28 浙江工业大学 Automatic seven-step hand washing method operation normative detection method
CN112084851A (en) * 2020-08-04 2020-12-15 珠海格力电器股份有限公司 Hand hygiene effect detection method, device, equipment and medium
CN113033458B (en) * 2021-04-09 2023-11-07 京东科技控股股份有限公司 Action recognition method and device
CN113326835B (en) * 2021-08-04 2021-10-29 中国科学院深圳先进技术研究院 Action detection method and device, terminal equipment and storage medium
CN113591758A (en) * 2021-08-06 2021-11-02 全球能源互联网研究院有限公司 Human behavior recognition model training method and device and computer equipment
CN113869127A (en) * 2021-08-30 2021-12-31 浙江大华技术股份有限公司 Human behavior detection method, monitoring device, electronic device, and medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5454043A (en) * 1993-07-30 1995-09-26 Mitsubishi Electric Research Laboratories, Inc. Dynamic and static hand gesture recognition through low-level image analysis
CN112613356A (en) * 2020-12-07 2021-04-06 北京理工大学 Action detection method and device based on deep attention fusion network

Also Published As

Publication number Publication date
CN114067442A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN107766839B (en) Motion recognition method and device based on 3D convolutional neural network
CN107908803B (en) Question-answer interaction response method and device, storage medium and terminal
US11163991B2 (en) Method and apparatus for detecting body
CN108108821B (en) Model training method and device
CN110072142B (en) Video description generation method and device, video playing method and device and storage medium
US9928875B2 (en) Efficient video annotation with optical flow based estimation and suggestion
JP2019139211A (en) Voice wake-up method and device
JP2020091543A (en) Learning device, processing device, neural network, learning method, and program
CN108920665B (en) Recommendation scoring method and device based on network structure and comment text
CN114067442B (en) Hand washing action detection method, model training method and device and electronic equipment
US11275628B2 (en) Notification information output method, server and monitoring system
CN109089172B (en) Bullet screen display method and device and electronic equipment
WO2017197330A1 (en) Two-stage training of a spoken dialogue system
CN108628908B (en) Method, device and electronic equipment for classifying user question-answer boundaries
CN116884391B (en) Multimode fusion audio generation method and device based on diffusion model
KR101741248B1 (en) Method and apparatus for estimating causality among variables
CN108460364B (en) Method and apparatus for generating information
KR20180077865A (en) Online apparatus and method for Multiple Camera Multiple Target Tracking Based on Multiple Hypothesis Tracking
JP2019117556A (en) Information processing apparatus, information processing method and program
US11494704B2 (en) Information processing method and information processing system
CN110019730A (en) Automatic interaction system and intelligent terminal
CN113393325A (en) Transaction detection method, intelligent device and computer storage medium
CN115665369B (en) Video processing method, device, electronic equipment and storage medium
CN113836291B (en) Data processing method, device, equipment and storage medium
CN113628103B (en) High-granularity cartoon face generation method based on multistage loss and related components thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 518000 Guangdong Shenzhen Baoan District Xixiang street, Wutong Development Zone, Taihua Indus Industrial Park 8, 3 floor.

Patentee after: Shenzhen Haiqing Zhiyuan Technology Co.,Ltd.

Address before: 518000 Guangdong Shenzhen Baoan District Xixiang street, Wutong Development Zone, Taihua Indus Industrial Park 8, 3 floor.

Patentee before: SHENZHEN HIVT TECHNOLOGY Co.,Ltd.
