CN112464882B - Method, apparatus, medium, and device for recognizing continuous motion


Info

Publication number
CN112464882B
CN112464882B (application CN202011459110.6A)
Authority
CN
China
Prior art keywords
operator
frame
limb
operation object
limb key
Prior art date
Legal status
Active
Application number
CN202011459110.6A
Other languages
Chinese (zh)
Other versions
CN112464882A (en)
Inventor
梁帆 (Liang Fan)
Current Assignee
Guangdong Prophet Big Data Co., Ltd.
Original Assignee
Dongguan Prophet Big Data Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Dongguan Prophet Big Data Co., Ltd.
Priority to CN202011459110.6A
Publication of CN112464882A
Application granted
Publication of CN112464882B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61L METHODS OR APPARATUS FOR STERILISING MATERIALS OR OBJECTS IN GENERAL; DISINFECTION, STERILISATION OR DEODORISATION OF AIR; CHEMICAL ASPECTS OF BANDAGES, DRESSINGS, ABSORBENT PADS OR SURGICAL ARTICLES; MATERIALS FOR BANDAGES, DRESSINGS, ABSORBENT PADS OR SURGICAL ARTICLES
    • A61L 2/00 Methods or apparatus for disinfecting or sterilising materials or objects other than foodstuffs or contact lenses; Accessories therefor
    • A61L 2/24 Apparatus using programmed or automatic operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61L METHODS OR APPARATUS FOR STERILISING MATERIALS OR OBJECTS IN GENERAL; DISINFECTION, STERILISATION OR DEODORISATION OF AIR; CHEMICAL ASPECTS OF BANDAGES, DRESSINGS, ABSORBENT PADS OR SURGICAL ARTICLES; MATERIALS FOR BANDAGES, DRESSINGS, ABSORBENT PADS OR SURGICAL ARTICLES
    • A61L 2202/00 Aspects relating to methods or apparatus for disinfecting or sterilising materials or objects
    • A61L 2202/10 Apparatus features
    • A61L 2202/14 Means for controlling sterilisation processes, data processing, presentation and storage means, e.g. sensors, controllers, programs


Abstract

Embodiments of this specification disclose a method, an apparatus, and an electronic device for recognizing continuous actions. The method comprises: inputting a multi-frame image sequence into an operation object recognition model and an operator recognition model, respectively, to obtain an operation object detection frame and an operator detection frame; screening out the images of operators located within the working area of the operation object; inputting the screened operator images into a limb recognition model to obtain the limb key point coordinates of the operator; normalizing the limb key point coordinates in all the frame images; and calculating the similarity score between the normalized limb key point coordinates and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the corresponding action. The method and the apparatus improve the accuracy of monitoring operation conditions, overcome the inaccuracy and unreliability of manually recording how an operation object is used in the prior art, and reduce the labor and material costs of enterprises.

Description

Method, apparatus, medium, and device for recognizing continuous motion
Technical Field
The present disclosure relates to the field of computer vision, and in particular to a method for recognizing continuous motion, an apparatus for recognizing continuous motion, an electronic device, a computer-readable storage medium, and a computer program.
Background
Canteens in schools, enterprises, and factories serve large numbers of diners, and food safety touches every aspect of their operation. Besides ensuring that food materials come from reliable sources and that the processing is safe and hygienic, tableware must be disinfected: otherwise residual food scraps can produce harmful substances and breed harmful germs, threatening the health of the diners. Because correct use of a disinfection cabinet effectively disinfects and sterilizes tableware and reduces major food-poisoning accidents and food-borne diseases, studying and monitoring how disinfection cabinets are used has become increasingly important.
Traditional monitoring of disinfection cabinet usage relies on manual records of, for example, when the cabinet was used and by whom. Such written records have low authenticity and can be falsified, so cabinet usage cannot be supervised accurately and truthfully. How to monitor the use of disinfection cabinets in school, enterprise, and factory canteens efficiently and accurately has therefore become an urgent problem to be solved.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a method for recognizing continuous motion, an apparatus for recognizing continuous motion, an electronic device, a computer-readable storage medium, and a computer program that can recognize the continuous operations performed by an operator and determine, from the recognition result, whether the operator performed a given operation on an operation object. The usage of the operation object can thus be monitored efficiently and accurately, improving the efficiency of food-safety supervision and reducing labor and material costs.
To achieve the above object, in a first aspect, the present specification provides a method for recognizing continuous motion, the method comprising:
inputting a plurality of frames of images having a time-sequence relationship into an operation object recognition model and an operator recognition model, respectively, and obtaining an operation object detection frame and an operator detection frame in each frame image;
screening out the operator images located within the working area of the operation object according to the operation object detection frame and the operator detection frame;
inputting the screened operator images into a limb recognition model, and obtaining the limb key point coordinates of the operator in each frame image;
normalizing the limb key point coordinates of the operator in all the frame images;
and calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the action corresponding to the standard action set.
Optionally, the operation object is a disinfection cabinet, and the actions in the standard action sets include opening the disinfection cabinet door, closing the disinfection cabinet door, carrying tableware into the disinfection cabinet, and carrying tableware out of the disinfection cabinet.
Optionally, the screening out the operator image located in the working area of the operation object according to the operation object detection frame and the operator detection frame includes:
calculating the Euclidean distance between the central point of the operation object detection frame and the central point of the operator detection frame;
and screening out the images of the operators in the working area of the operation object when the Euclidean distance is smaller than a distance threshold value.
Optionally, the normalizing the limb key point coordinates of the operator in all the frame images includes:
smoothing the limb key point coordinates of the operator in each frame image and performing mean calculation to obtain the skeleton center of the operator;
calculating, in each frame image, the projection length from the skeleton center to the pre-trained straight line of the operation object surface, and obtaining the projection length set $\{l_j\}$ over all the frame images;
and performing length normalization and number normalization on the limb key point coordinates of the operator according to the limb key point coordinate set $\{p_j^i\}$ over all the frame images and the projection length set $\{l_j\}$, where $p_j^i$ denotes the coordinates of a limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image.
The length normalization formula is reproduced only as an image in the source; in it, $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value of the projection length set $\{l_j\}$.
The number normalization formula is likewise reproduced only as an image; in it, $int[x]$ is the integer part of $x$, with $x = b \times r$, $r$ denotes the number of limb key points, and $n_i$, the number of the $i$-th limb key point in the coordinate set $\{p_j^i\}$, is brought to the number of the $i$-th limb key point in the standard action set, yielding the normalized limb key point set.
Optionally, the calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the action corresponding to the standard action set, includes:
for all the frame images, sliding a time window of length $t$ along the time direction with a preset step (given only as a formula image in the source and involving $n$, the frame rate of the video to be detected), and calculating, within each time window, the similarity score between the normalized limb key point set and each standard action set, to obtain the similarity score result $s_g$ in each time window, where $g$ denotes the center time of the time window; the similarity formula is the Pearson correlation coefficient:

$$s_g = \frac{\sum_{j=1}^{m}(x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{m}(y_j - \bar{y})^2}}$$

where $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set with mean $\bar{y}$, and $X = \{x_j\}$ denotes the action set within a time window with mean $\bar{x}$. The action information is computed from the wrist, elbow, and shoulder key point coordinates of the operator (the defining formulas are reproduced only as images in the source); the equation $kx - y + b = 0$ is the equation of the straight line passing through the skeleton center of the operator and perpendicular to the operation object surface, with parameters $k$ and $b$, and $k'x - y + b' = 0$ is the equation of the straight line of the operation object surface, with parameters $k'$ and $b'$;
and composing the similarity score results $s_g$ in all time windows into a vector and multiplying it by the corresponding weights to obtain a total similarity score; when the total similarity score is greater than a score threshold, judging that the operator performed the action corresponding to the standard action set, and recording the time period during which the total similarity score is greater than the score threshold.
Optionally, after recording a time period in which the total similarity score is greater than the score threshold, the method further comprises:
judging whether the operator performed two or more actions in the same time period;
and if so, deleting the two or more actions detected in the same time period, to obtain the continuous actions in the video to be detected.
Optionally, the smoothing the limb key point coordinates of the operator in each frame image and performing mean calculation to obtain the skeleton center of the operator includes:
judging whether abnormal limb key points exist according to the coordinates of the limb key points;
when abnormal limb key points exist, performing mean calculation on each abnormal limb key point in the current frame image by using the coordinates of the limb key points of the corresponding part in a preset number of preceding and following frame images, to obtain the smoothed coordinates of each limb key point;
and calculating the mean of the smoothed coordinates of the limb key points to obtain the skeleton center of the operator.
In a second aspect, embodiments of the present specification provide a continuous motion recognition apparatus, the apparatus including:
the detection module is used for respectively inputting a plurality of frames of images with time sequence relation into the operation object identification model and the operator identification model and respectively obtaining an operation object detection frame and an operator detection frame in each frame of image;
the screening module is used for screening out the operator image in the working area of the operation object according to the operation object detection frame and the operator detection frame;
the limb identification module is used for inputting the screened operator images into a limb identification model and obtaining the limb key point coordinates of the operator in each frame of image;
the normalization module is used for normalizing the limb key point coordinates of the operator in all the frame images;
and the similarity module is used for calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the corresponding action in the standard action set.
In a third aspect, the present specification provides an electronic device comprising:
a memory for storing a computer program;
a processor configured to execute the computer program stored in the memory, wherein when the computer program is executed, the method for recognizing continuous motion according to any one of the first aspect is implemented.
In a fourth aspect, the present specification provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for recognizing continuous motion according to any one of the first aspect.
With the method for recognizing continuous motion, the apparatus for recognizing continuous motion, the electronic device, the computer-readable storage medium, and the computer program provided in one or more embodiments of this specification, the operation object detection frame and the operator detection frame in the images to be detected can be obtained by deep learning; the operator images located within the working area of the operation object are then screened out and input into a limb recognition model to obtain the coordinates of each limb key point; after the limb key point coordinates in the video to be detected are normalized, the similarity score between the normalized coordinates and each standard action set is calculated, and when the similarity score is greater than a score threshold it is judged that the operator performed the action corresponding to that standard action set. The recognition method disclosed in this specification improves the accuracy of monitoring the continuous operating actions an operator performs on an operation object, overcomes the inaccuracy and unreliability of relying on manual records in the prior art, improves the efficiency of food-safety supervision, and reduces the labor and material costs of enterprises.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some of the embodiments described in this specification; those skilled in the art can derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a flow chart illustrating an embodiment of a method for recognizing continuous motion provided herein;
FIG. 2 is a schematic illustration of operation object recognition in some embodiments provided herein;
FIG. 3 is a schematic illustration of operator identification in some embodiments provided herein;
FIG. 4 is a schematic illustration of limb identification in some embodiments provided herein;
FIG. 5 is a schematic diagram of 4 actions in a standard set of actions in some embodiments provided herein;
fig. 6 is a schematic structural diagram of an embodiment of a continuous motion recognition apparatus provided in this specification.
Detailed Description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in one or more embodiments of this specification are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only a part, not all, of the embodiments of the specification. All other embodiments obtained by a person skilled in the art based on one or more embodiments of this specification without creative effort shall fall within the protection scope of the embodiments of this specification.
The embodiments provided herein are applicable to electronic devices such as terminal devices, computer systems, and servers, which can operate in numerous general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, and data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a method for recognizing a continuous motion provided in this specification. Although the present specification provides the method steps or apparatus structures as shown in the following examples or figures, more or less steps or modules may be included in the method or apparatus structures based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure shown in the embodiments or the drawings of the present specification. When the method or the module structure is applied to a device, a server or an end product in practice, the method or the module structure shown in the embodiment or the drawings can be executed sequentially or in parallel (for example, a parallel processor or a multi-thread processing environment, or even an implementation environment including distributed processing and server clustering). In a specific embodiment, as shown in fig. 1, in an embodiment of the method for identifying a continuous motion provided in the present specification, the method may include the following steps:
and S200, respectively inputting a plurality of image sequences with time sequence relation into the operation object identification model and the operation worker identification model, and obtaining an operation object detection frame and an operation worker detection frame from a plurality of images.
In an optional example, the multiple frames of images to be detected having a time-sequence relationship may be input in real time into a pre-trained operation object recognition model for operation object recognition. These frames may be multiple consecutive video frames of a video, or multiple image blocks cut from consecutive video frames; since consecutive video frames have a time-sequence relationship, the image blocks sliced from them also have a time-sequence relationship. The size of the images to be detected should meet the operation object recognition model's preset requirement on input size; for example, the size may include, but is not limited to, 224 × 224. After the multiple frames are input into the operation object recognition model, an operation object detection frame, including its coordinate information, can be obtained in each frame image to be detected.
Meanwhile, the multiple frames of images to be detected with the time sequence relationship need to be respectively input into a pre-trained operator identification model for operator identification, so as to obtain an operator detection frame. The operator detection box may include coordinate information of the operator detection box and a time of the corresponding frame image.
It should be noted that the embodiments of the present invention do not limit the order in which the multi-frame images are input into the operation object recognition model and the operator recognition model. In some examples of the present invention, the multi-frame images may first be input into the pre-trained operation object recognition model for operation object recognition and then into the pre-trained operator recognition model for operator recognition; in other examples, they may first be input into the operator recognition model for operator recognition and then into the operation object recognition model for operation object recognition.
In this embodiment of the specification, the operation object recognition model and the operator recognition model first need to be trained; the topology of both models may be a convolutional neural network.
In an alternative example, the convolutional neural network may be a convolutional neural network with deep learning capability, including but not limited to a plurality of convolutional layers, and may further include pooling layers, fully connected layers, layers for performing classification operations, and the like. The convolutional neural network can realize deep learning; compared with other deep learning structures, deep convolutional neural networks perform outstandingly in image recognition.
Before the detection of the operation object is carried out on the image to be detected, the image classification task of the convolutional neural network can be trained by using a data set containing abundant operation object marking information as a training sample in advance, so that an operation object identification model with the operation object classification function is obtained.
Testing the multiple frames of images to be detected having a time-sequence relationship with the trained operation object recognition model yields the operation object confidence of each region in the images to be detected. The operation object confidence is the probability that the region's image is the operation object; comparing it with a preset operation object confidence threshold classifies the regions into operation object regions and non-operation-object regions, thereby obtaining the operation object detection frame and its coordinate information.
Similarly, before the operator detection is carried out on the image to be detected, the image classification task of the convolutional neural network can be trained by using the data set containing abundant operator marking information as a training sample in advance, so that the operator identification model with the operator classification function is obtained.
Testing the multiple frames of images to be detected having a time-sequence relationship with the trained operator recognition model yields the operator confidence of each region of the images to be detected. The operator confidence is the probability that the region's image is an operator; comparing it with a preset operator confidence threshold classifies the regions into operator regions and non-operator regions, thereby obtaining the coordinate information of the operator detection frame and the time of the corresponding frame.
It should be noted that the present invention does not restrict what the operation object and the operator are; any scenario in which an operator performs continuous operations on some object falls within its scope. In some examples, the operation object may be a disinfection cabinet in the canteen of an enterprise, school, or factory, and the operator may be a canteen worker who operates the cabinet; in other examples, the operation object may be an electronic product or clothing on an enterprise production line, and the operator may be a worker on that line.
In the embodiment of the present specification, an operation object is a disinfection cabinet in a canteen such as an enterprise, a school, and a factory, and an operator is an operator operating the disinfection cabinet. Referring to fig. 2, fig. 2 is a schematic diagram illustrating the operation object recognition performed in some embodiments provided in the present specification. Wherein, the image to be detected is input into the pre-trained operation object recognition model to obtain the operation object detection frame, i.e. the disinfection cabinet shown in fig. 2.
Referring to fig. 3, fig. 3 is a schematic diagram of performing operator identification in some embodiments provided in the present specification, in which an image to be detected is input into a pre-trained operator identification model, and acquisition time of an operator detection frame and a corresponding frame image is obtained, that is, the operator shown in fig. 3 is obtained.
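As a concrete illustration of this detection stage, the following minimal Python sketch runs the two recognition models over a timed frame sequence. It is a sketch under assumptions: the model wrappers, the Detection record, and all names here are invented for illustration and are not part of the patent; each model is assumed to return the best detection frame whose confidence exceeds its preset threshold, or None.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

BoxTuple = Tuple[float, float, float, float]  # (x, y, w, h): top-left + size

@dataclass
class Detection:
    x: float  # top-left x of the detection frame
    y: float  # top-left y of the detection frame
    w: float  # width of the detection frame
    h: float  # height of the detection frame
    t: float  # acquisition time of the corresponding frame image

def detect_objects_and_operators(
    frames: List[np.ndarray],
    timestamps: List[float],
    object_model: Callable[[np.ndarray], Optional[BoxTuple]],
    operator_model: Callable[[np.ndarray], Optional[BoxTuple]],
) -> List[Tuple[Optional[Detection], Optional[Detection]]]:
    """Run both recognition models on every frame of the timed sequence."""
    results = []
    for frame, t in zip(frames, timestamps):
        obj = object_model(frame)    # disinfection-cabinet detection frame
        op = operator_model(frame)   # operator detection frame
        results.append((Detection(*obj, t) if obj else None,
                        Detection(*op, t) if op else None))
    return results
```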
And S220, screening out the operator image in the working area of the operation object according to the operation object detection frame and the operator detection frame.
In this embodiment of the specification, to improve recognition efficiency, frames that do not meet the condition need to be removed from the multi-frame sequence of images to be detected. For example, when the operator is too far from the operation object, the operator is evidently not performing an operation on it, so such images can be removed, reducing the amount of data to process and improving recognition efficiency.
In some examples of the present invention, screening out the frames in which the operator is located within the working area of the operation object is implemented by the following steps:
s221, calculating the Euclidean distance between the central point of the operation object detection frame and the central point of the operator detection frame.
From the coordinate information of the operation object detection frame and of the operator detection frame obtained in the above steps, the center point coordinates of each detection frame can be calculated, and the Euclidean distance between the two centers then follows from those coordinates. Calculating the Euclidean distance between two points from their coordinates is common knowledge for those skilled in the art and is not described here.
The coordinates of the operation object detection frame are denoted as $(x, y, w, h)$, where $(x, y)$ is the top-left coordinate of the operation object detection frame and $(w, h)$ its width and height; the coordinates of the operator detection frame are denoted as $(x', y', w', h')$, where $(x', y')$ is the top-left coordinate of the operator detection frame and $(w', h')$ its width and height.
Whether the detected operator is in the neighborhood of the working area of the operation object $(x, y, w, h)$ is judged, and the coordinates of operators not in that neighborhood are deleted. The judgment formulas for the operator position coordinates $(x', y', w', h')$ are reproduced only as images in the source, where $d$ is the boundary threshold of the operation object; if the indicator $f_L$ is 0, the coordinates of the corresponding detection frame are deleted.
And S222, screening out the operator images located within the working area of the operation object when the Euclidean distance is smaller than a distance threshold.
In the invention, when the distance between the center point of the operation object detection frame and the center point of the operator detection frame is greater than the distance threshold, the corresponding operator detection frame coordinates can be deleted; the operator detection frames whose center-to-center distance is smaller than the threshold are retained.
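A short sketch of this distance filter follows, continuing the assumptions of the earlier sketch (`detections` is its return value); the 150-pixel threshold is an arbitrary illustrative value, not one given in the patent.

```python
import math

def center(x: float, y: float, w: float, h: float) -> Tuple[float, float]:
    """Center point of a detection frame given as top-left corner plus size."""
    return (x + w / 2.0, y + h / 2.0)

def within_work_area(obj: Detection, op: Detection, dist_threshold: float) -> bool:
    """True when the Euclidean distance between the two detection-frame
    centers is smaller than the distance threshold."""
    ax, ay = center(obj.x, obj.y, obj.w, obj.h)
    bx, by = center(op.x, op.y, op.w, op.h)
    return math.hypot(ax - bx, ay - by) < dist_threshold

# Keep only the frames whose operator is inside the cabinet's working area.
kept = [(obj, op) for obj, op in detections
        if obj and op and within_work_area(obj, op, dist_threshold=150.0)]
```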
And S240, inputting the screened operator image into a limb recognition model, and obtaining the limb key point coordinates of the operator in each frame of image.
In the invention, the image of the operator in the working area of the operation object can be input into a pre-trained limb recognition model for limb recognition, and the topological structure of the limb recognition model can be a convolutional neural network.
In an alternative example, the convolutional neural network may be a convolutional neural network with deep learning capability, including but not limited to a plurality of convolutional layers, and may further include pooling layers, fully connected layers, layers for performing classification operations, and the like. The convolutional neural network can realize deep learning; compared with other deep learning structures, deep convolutional neural networks perform outstandingly in image recognition.
In an optional example, the convolutional neural network can also be a lightweight convolutional neural network, so that the processing time is shortened, and the detection speed is increased.
In an alternative example, the convolutional neural network may also be several cascaded convolutional neural networks, thereby having better recognition performance.
Before limb detection is carried out, an image classification task of a convolutional neural network can be trained by using a data set containing abundant skeletal joint point labeling information as a training sample in advance, so that an operator identification model with a limb classification function is obtained. Wherein the skeletal joint points comprise at least: wrist, elbow, shoulder, etc.
The trained limb recognition model is used for testing the image to be tested, so that the limb key points of the operator, such as the wrist, the elbow, the shoulder and the like, in the image to be tested can be positioned, and the limb key points are connected in sequence to form the basic skeleton of the operator.
Referring to fig. 4, fig. 4 is a schematic diagram of limb recognition in some embodiments provided herein. The images meeting the condition are input into the pre-trained limb recognition model; the trained model tests these multi-frame images, locates the operator's wrist, elbow, shoulder, and other joint points, and connects them in sequence to form the basic skeleton of the operator.
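The fragment below sketches this step; the pose model interface, joint names, and edge list are hypothetical stand-ins (the patent names at least the wrist, elbow, and shoulder).

```python
# Joint pairs connected in sequence to form the basic skeleton
# (an assumed subset; the patent names at least wrist, elbow, shoulder).
SKELETON_EDGES = [
    ("left_wrist", "left_elbow"), ("left_elbow", "left_shoulder"),
    ("right_wrist", "right_elbow"), ("right_elbow", "right_shoulder"),
    ("left_shoulder", "right_shoulder"),
]

def extract_keypoints(operator_image, pose_model):
    """pose_model is assumed to map an image crop to {joint_name: (x, y)}."""
    joints = pose_model(operator_image)
    skeleton = [(joints[a], joints[b]) for a, b in SKELETON_EDGES
                if a in joints and b in joints]
    return joints, skeleton
```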
And S260, normalizing the limb key point coordinates of the operator in all the frame images.
In this step, the operations of S200 to S240 are performed on each frame image of the video to be detected, and the limb key point coordinates of the operator located within the working area of the operation object are obtained in each frame image, so that the limb key point coordinate set $\{p_j^i\}$ over the video to be detected is obtained.
One possible implementation of normalizing the limb key point coordinates of the operator in the video to be detected is realized by the following steps:
s261, smoothing the coordinates of the limb key points of the operator in each frame of image, and performing mean value calculation to obtain the skeleton center of the operator.
In some alternative examples, the bone center of the operator may be obtained by:
s2611, judging whether abnormal limb key points exist or not according to coordinates of the limb key points.
In this step, the distance between every two limb key points can be calculated, and when the distance is greater than a preset threshold, the abnormal limb key point of the limb key point is judged, and the abnormal limb key point needs to be smoothed.
And S2612, when abnormal limb key points exist, performing mean value calculation on the abnormal limb key points in the current frame image by using coordinates of the limb key points of corresponding parts in the front and rear preset frame images to obtain coordinates of each limb key point after smoothing processing.
In the step, when there is an abnormal bone key point, for the abnormal limb key point in the current frame image, the coordinates of the limb key point of the part in the two previous and next frame images can be used for mean value calculation to obtain the coordinates of each limb key point after smoothing processing.
And S2613, calculating an average value of coordinates of each limb key point after smoothing processing, and obtaining the skeleton center of the operator.
In this specification, after smoothing the skeletal joint points of each frame of image in the motion set, the skeletal joint point coordinates of the operator in each frame of image are added and averaged to obtain the skeletal center of the operator in each frame of image, so that the skeletal frame of the operator in each frame of image can be regarded as a point.
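A compact sketch of the smoothing and the skeleton-center computation follows. It implements one reading of the text, under assumptions: a key point is flagged abnormal when its distance to any other key point in the same frame exceeds the preset threshold, and it is replaced by the mean of the same key point in the two frames on each side; the array shapes and names are invented for the example.

```python
import numpy as np

def smooth_keypoints(seq: np.ndarray, dist_threshold: float,
                     half_window: int = 2) -> np.ndarray:
    """seq: array of shape (T, K, 2), T frame images x K limb key points.

    Abnormal key points are replaced by the mean of the corresponding key
    point in the preceding/following frames (two on each side here)."""
    seq = np.asarray(seq, dtype=float).copy()
    T, K, _ = seq.shape
    for t in range(T):
        for k in range(K):
            dists = np.linalg.norm(seq[t] - seq[t, k], axis=1)
            if np.any(dists > dist_threshold):  # abnormal limb key point
                lo, hi = max(0, t - half_window), min(T, t + half_window + 1)
                neighbours = [seq[u, k] for u in range(lo, hi) if u != t]
                if neighbours:
                    seq[t, k] = np.mean(neighbours, axis=0)
    return seq

def skeleton_center(frame_keypoints: np.ndarray) -> np.ndarray:
    """Skeleton center: mean of the smoothed key points of one frame image."""
    return np.mean(frame_keypoints, axis=0)
```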
S262, calculating, in each frame image, the projection length from the skeleton center to the pre-trained straight line of the operation object surface, and obtaining the projection length set $\{l_j\}$ over all the frame images.
The straight line of the operation object surface can be obtained by training on historical data; for example, when the operation object is a disinfection cabinet, the surface can be the face of the cabinet toward the operator. The projection length from the skeleton center of the operator to the straight line of the operation object surface is obtained in each frame image, yielding the projection length set $\{l_j\}$ over the video to be detected.
S263, performing length normalization and number normalization on the limb key point coordinates of the operator according to the limb key point coordinate set $\{p_j^i\}$ over all the frame images and the projection length set $\{l_j\}$, where $p_j^i$ denotes the coordinates of a limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image.
The length normalization formula is reproduced only as an image in the source; in it, $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value of the projection length set $\{l_j\}$.
The number normalization formula is likewise reproduced only as an image; in it, $int[x]$ is the integer part of $x$, with $x = b \times r$, $r$ denotes the number of limb key points, and $n_i$, the number of the $i$-th limb key point in the coordinate set $\{p_j^i\}$, is brought to the number of the $i$-th limb key point in the standard action set. After the length normalization and number normalization of the limb key point set, the normalized limb key point set is obtained.
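Because the normalization formulas survive only as images in the source, the sketch below implements one plausible reading of them: scaling every coordinate by $l_{std} / \max(l_j)$, and resampling each key point's time series to the sample count used in the standard action set. Both readings are assumptions for illustration, not the patent's exact formulas.

```python
import numpy as np

def length_normalize(keypoints: np.ndarray, proj_lengths, l_std: float) -> np.ndarray:
    """keypoints: (T, K, 2); proj_lengths: the projection length set {l_j}.

    Assumed form: scale all coordinates by l_std / l with l = max(l_j), so
    sequences filmed at different distances become comparable."""
    l = max(proj_lengths)
    return np.asarray(keypoints, dtype=float) * (l_std / l)

def number_normalize(track: np.ndarray, n_standard: int) -> np.ndarray:
    """Resample one key point's time series to the n_standard samples used in
    the standard action set (assumed int[x]-style index mapping)."""
    track = np.asarray(track, dtype=float)
    n_i = len(track)
    idx = np.minimum((np.arange(n_standard) * n_i) // n_standard, n_i - 1)
    return track[idx]
```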
And S280, calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the corresponding action in the standard action sets.
For example, suppose the standard action sets comprise four sets A1, A2, A3, and A4, where A1 is opening the cabinet door, A2 is carrying tableware out, A3 is carrying tableware in, and A4 is closing the cabinet door. The similarity scores between the normalized limb key point set and each of A1, A2, A3, and A4 are calculated, and which operation the operator performed is judged from those scores. For example, if the similarity score between the limb key point set of the video to be detected and standard action set A1 is greater than the score threshold, it is judged that the operator performed the operation of opening the cabinet door. Fig. 5 is a schematic diagram of the four standard actions in some embodiments provided herein, showing respectively opening the cabinet door, carrying tableware out, carrying tableware in, and closing the cabinet door.
In some alternative examples, one possible implementation of step S280 (calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the action corresponding to the standard action set) is as follows:
S281, for the multi-frame images of the video to be detected, sliding a time window of length $t$ along the time direction with a preset step (given only as a formula image in the source and involving $n$, the frame rate of the video to be detected), and calculating, within each time window, the similarity score between the normalized limb key point set and each standard action set, to obtain the similarity score result $s_g$ in each time window, where $g$ denotes the center time of the time window. The similarity formula is the Pearson correlation coefficient:

$$s_g = \frac{\sum_{j=1}^{m}(x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{m}(y_j - \bar{y})^2}}$$

where $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set with mean $\bar{y}$, and $X = \{x_j\}$ denotes the action set within a time window with mean $\bar{x}$. The action information is computed from the wrist, elbow, and shoulder key point coordinates of the operator (the defining formulas are reproduced only as images in the source); the equation $kx - y + b = 0$ is the equation of the straight line passing through the skeleton center of the operator and perpendicular to the operation object surface, with parameters $k$ and $b$, and $k'x - y + b' = 0$ is the equation of the straight line of the operation object surface, with parameters $k'$ and $b'$.
In some embodiments, the parameters of the linear equation passing through the center of the operator's bone and perpendicular to the surface of the object and the parameters of the linear equation at the surface of the object may be obtained after training from historical empirical data.
S282, composing the similarity score results $\{s_g\}$ in all time windows into a vector and multiplying it by the corresponding weights to obtain a total similarity score; when the total similarity score is greater than the score threshold, judging that the operator performed the action corresponding to the standard action set, and recording the time period during which the total similarity score is greater than the score threshold.
After the correlation between the motion in each time window and the standard action set is calculated with the Pearson correlation coefficient, the similarity scores in all time windows are composed into a vector $s = (s_1, \ldots, s_n)$, where $n$ denotes the total number of time windows; each element of the vector is then multiplied by the corresponding weight $u = (u_1, \ldots, u_n)^T$ to obtain the total similarity score, where $u_1, \ldots, u_n$ denote the weights corresponding to the respective time windows. When the total similarity score is greater than the score threshold, it is judged that the operator performed the operation corresponding to that action in the standard action set, and the time period during which the total similarity score is greater than the score threshold is recorded.
For example, if the similarity score between the limb key point coordinate set of the video to be detected and action A1 in the standard action sets is 0.82, the score with action A2 is 0.47, the score with action A3 is -0.23, the score with action A4 is -0.18, and the score threshold is 0.7, it is judged that the operator performed the operation corresponding to action A1.
In some alternative examples, if it is determined that the action is detected, the continuous-action time period during which the similarity score $s_g$ is greater than the similarity threshold is recorded as $t = (t_s, t_e)$, where $t_s$ denotes the start time and $t_e$ denotes the end time of that continuous-action period.
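The following sketch puts the sliding window, the Pearson scoring, and the weighted combination together for a single one-dimensional action signal. The window length, step, uniform weights, dummy data, and the 0.7 threshold are illustrative assumptions, and resampling each window to the template length $m$ is one way, assumed here, to make the two series comparable.

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length 1-D series."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum()) * np.sqrt((yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom else 0.0

def score_windows(signal, template, window_len: int, step: int):
    """Slide a window over the action signal with the given step and score
    each window against the m-frame standard-action template.
    Returns [(start_frame, score), ...]."""
    m = len(template)
    out = []
    for g in range(0, len(signal) - window_len + 1, step):
        window = np.asarray(signal[g:g + window_len], dtype=float)
        # resample the window to the template length m before comparing
        resampled = np.interp(np.linspace(0, window_len - 1, m),
                              np.arange(window_len), window)
        out.append((g, pearson(resampled, template)))
    return out

def total_similarity(scores, weights) -> float:
    """Weighted combination of per-window scores into the total score."""
    s = np.array([sc for _, sc in scores])
    return float(s @ np.asarray(weights, dtype=float)[:len(s)])

# usage sketch with dummy data standing in for real action information
signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 200))      # video to be detected
template_a1 = np.sin(np.linspace(0.0, 2.0 * np.pi, 40))  # standard action set A1
scores = score_windows(signal, template_a1, window_len=50, step=10)
weights = np.full(len(scores), 1.0 / len(scores))
if total_similarity(scores, weights) > 0.7:              # score threshold
    # record (t_s, t_e) as frame spans of windows above the threshold
    periods = [(g, g + 50) for g, sc in scores if sc > 0.7]
```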
In some optional examples, after the similarity judgment between each action set of the video to be detected and the standard action sets, the detection results and occurrence time periods of the four actions of opening the cabinet door, closing the cabinet door, carrying tableware into the cabinet, and carrying tableware out of the cabinet are obtained. A logic judgment is then applied to these detection results to check whether the operator performed two or more actions in the same time period, and unreasonable results are eliminated (for example, the two actions of opening the cabinet door and closing the cabinet door cannot occur simultaneously in the same time period), so that the final continuous-action detection result is obtained.
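A sketch of this logic judgment follows; the event representation is assumed, and overlapping detections of different actions are treated as mutually exclusive and dropped.

```python
def remove_conflicts(events):
    """events: list of (action_name, start, end), e.g. ('open_door', 12.0, 15.5).

    Drop mutually exclusive detections: if two or more different actions
    are detected over overlapping time periods (e.g. opening and closing
    the cabinet door at once), discard them, keeping only the
    non-conflicting results as the final continuous-action detections."""
    events = sorted(events, key=lambda e: e[1])
    keep = []
    for i, (name, start, end) in enumerate(events):
        conflicts = [o for j, o in enumerate(events)
                     if j != i and o[0] != name and o[1] < end and start < o[2]]
        if not conflicts:
            keep.append((name, start, end))
    return keep
```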
In the method for recognizing continuous actions provided in one or more embodiments of this specification, the operation object detection frame and the operator detection frame in the images to be detected can be obtained by deep learning; the operator images located within the working area of the operation object are then screened out and input into a limb recognition model to obtain the coordinates of each limb key point; after the limb key point coordinates in the video to be detected are normalized, the similarity score between the normalized coordinates and each standard action set is calculated, and when the similarity score is greater than a score threshold it is judged that the operator performed the action corresponding to that standard action set. The recognition method disclosed in this specification improves the accuracy of monitoring the continuous operating actions an operator performs on an operation object, can record the time period of the continuous operation, overcomes the inaccuracy and unreliability of relying on manual records in the prior art, improves the efficiency of food-safety supervision, and reduces the labor and material costs of enterprises.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. For details, reference may be made to the description of the related embodiments of the related processing, and details are not repeated herein.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the methods provided in the foregoing embodiments, one or more embodiments of the specification further provide a continuous motion recognition apparatus, please refer to fig. 6, where fig. 6 is a schematic structural diagram of an embodiment of the continuous motion recognition apparatus provided in the specification, and the apparatus may include a detection module 300, a screening module 320, a limb recognition module 340, a normalization module 360, and a similarity module 380.
The detection module 300 is configured to input multiple frames of images with a time sequence relationship into the operation object identification model and the operator identification model, and obtain the operation object detection frame and the operator detection frame in each frame of image.
The screening module 320 is configured to screen out an operator image located in a working area of an operation object according to the operation object detection frame and the operator detection frame.
The limb identification module 340 is configured to input the screened operator image into a limb identification model, and obtain a limb key point coordinate of the operator in each frame of image.
The normalization module 360 is configured to normalize the coordinates of the key points of the limbs of the operator in all the frame images.
The similarity module 380 is configured to calculate similarity scores between the limb key point coordinates of the operator after the normalization processing and each standard action set, and determine that the operator performs an action corresponding to the standard action set when the similarity score is greater than a score threshold.
In some optional examples, the screening module may include a calculating unit configured to calculate a euclidean distance between a center point of the operation object detection frame and a center point of the operator detection frame, and a screening unit configured to screen out an operator image located within a work area of an operation object when the euclidean distance is smaller than a distance threshold.
In some optional examples, the normalization module may include a smoothing unit, a projection length unit, and a normalization unit.
The smoothing unit is used for smoothing the coordinates of the limb key points of the operator in each frame of image and performing mean value calculation to obtain the skeleton center of the operator.
In some embodiments, the smoothing unit judges whether abnormal limb key points exist according to the coordinates of the limb key points; when abnormal limb key points exist, it performs mean calculation on each abnormal limb key point in the current frame image by using the coordinates of the limb key points of the corresponding part in a preset number of preceding and following frame images, to obtain the smoothed coordinates of each limb key point; and it calculates the mean of the smoothed coordinates to obtain the skeleton center of the operator.
The projection length unit is used for calculating the projection length from the bone center to a pre-trained straight line on the surface of an operation object in each frame of image, and acquiring a projection length set { l ] in a video to be measuredj}。
The standardization unit is used for performing length standardization and number standardization on the limb key point coordinates of the operator in the video to be tested, according to the limb key point coordinate set $\{p_j^i\}$ of the operator in the video and the projection length set $\{l_j\}$, where $p_j^i$ denotes the coordinates of the limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image.

The length standardization formula is as follows:

$p_j^{i\prime} = p_j^i \times \dfrac{l_{std}}{l}$

where $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value in the projection length set $\{l_j\}$.

The number standardization formula is as follows:

$q_r^i = p_{int[x]}^{i\prime}$, with $x = b \times r$ and $b = n_i / n_i'$

where $int[x]$ is $x$ rounded to an integer, $r$ denotes the serial number of the limb key point coordinate, $n_i$ is the number of coordinates in the coordinate set $\{p_j^i\}$ of the $i$-th limb key point, and $n_i'$ is the number of the $i$-th limb key point in the standard action set. Applying the length standardization and the number standardization to the limb key point set yields the standardized limb key point set $\{q_j^i\}$.
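The sketch below applies both standardization steps to one limb key point's coordinate sequence; the resampling ratio b = n_i / n_i' is an assumed reading of the garbled formula, and all names are illustrative:

```python
import numpy as np

def standardize_sequence(points, projection_lengths, l_std, n_target):
    """points: (n_i, 2) coordinates of one limb key point across frames.

    Length standardization rescales by l_std / max(projection_lengths);
    number standardization resamples the sequence to n_target entries by
    rounded-index lookup, so it matches the standard action set length.
    """
    l = max(projection_lengths)
    scaled = np.asarray(points, dtype=float) * (l_std / l)
    n_i = len(scaled)
    b = n_i / n_target  # assumed direction of x = b * r
    idx = np.minimum(np.round(b * np.arange(n_target)).astype(int), n_i - 1)
    return scaled[idx]
```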
The similarity module may include a similarity calculation unit and a comparison unit.
The similarity calculation unit is used for sliding a time window of length $t$ over the multi-frame images of the video to be detected along the time direction with a preset step length, and calculating the similarity score between the standardized limb key point set and each standard action set within each time window, obtaining a similarity score result $s_g$ for each time window, where $n$ denotes the frame rate of the video to be detected and $g$ denotes the center time of the time window. The similarity calculation formula is as follows:

$s_g = \dfrac{\sum_{j=1}^{m}(x_j - \bar{X})(y_j - \bar{Y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{X})^2 \sum_{j=1}^{m}(y_j - \bar{Y})^2}}$

where $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set, $\bar{Y}$ denotes the mean of the standard action information set, $X = \{x_j\}$ denotes the action set within the time window, and $\bar{X}$ denotes the mean of the action information set within the time window.

(The per-frame action features $x_j$ and $y_j$ are given by formula images not reproduced here; they are computed from the wrist, elbow and shoulder key points relative to two reference lines.)

Here $p_w$, $p_e$ and $p_s$ respectively denote the coordinates of the operator's wrist, elbow and shoulder key points; the equation $kx - y + b = 0$ denotes the straight line passing through the operator's bone center and perpendicular to the surface of the operation object, with $k$ and $b$ its parameters; and the equation $k'x - y + b' = 0$ denotes the straight line of the operation object surface, with $k'$ and $b'$ its parameters.
The comparison unit is used for forming the similarity score results $s_g$ over all time windows into a vector and multiplying it by corresponding weights to obtain a total similarity score, judging that the operator has performed the action corresponding to the standard action set when the total similarity score is greater than a score threshold, and recording the time period in which the total similarity score exceeds that threshold.
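A minimal sketch of the sliding-window scoring and the weighted total, assuming the per-window comparison is the correlation formula given above and that each window is reduced to a 1-D per-frame feature sequence (names and the feature reduction are assumptions):

```python
import numpy as np

def window_scores(features, standard, step):
    """features: 1-D per-frame action feature of the video to be detected;
    standard: 1-D per-frame feature of one standard action set (length m).

    Slides a window of length m with the given step and returns the
    correlation score s_g of each window against the standard sequence.
    """
    m = len(standard)
    y = np.asarray(standard, dtype=float)
    scores = []
    for start in range(0, len(features) - m + 1, step):
        x = np.asarray(features[start:start + m], dtype=float)
        xc, yc = x - x.mean(), y - y.mean()
        denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
        scores.append(float((xc * yc).sum() / denom) if denom else 0.0)
    return scores

def total_score(scores, weights):
    """Weighted combination of per-window scores into the total similarity."""
    return float(np.dot(scores, weights))
```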
In some optional examples, after the similarity judgment between each action set of the video to be detected and the standard action sets, detection results for four actions (opening the cabinet door, closing the cabinet door, moving items into the cabinet, and moving items out of the cabinet) are obtained together with the time periods in which they occur. A logic judgment is then performed on these detection results to decide whether the operator performed two or more actions in the same time period, and unreasonable results are eliminated (for example, opening and closing the cabinet door cannot occur in the same time period), yielding the final continuous-action detection result.
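One way to implement this logic judgment, assuming each detection carries an action name, a time span, and its total similarity score (resolving conflicts in favor of the higher score is an assumption):

```python
def eliminate_conflicts(detections):
    """detections: list of (action, start, end, score) tuples.

    When two detected actions overlap in time (e.g. opening and closing
    the cabinet door in the same period), keep only the higher-scoring one.
    """
    kept = []
    for det in sorted(detections, key=lambda d: -d[3]):  # best score first
        _, start, end, _ = det
        if not any(start < e and end > s for _, s, e, _ in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d[1])  # restore time order
```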
The device for recognizing continuous actions provided in one or more embodiments of the present specification may obtain the operation object detection frame and the operator detection frame in a scene to be detected by a deep learning method, screen out the multi-frame images that meet the condition, classify the recognized images by a clustering method to obtain at least one action set, compare each action set with the actions in the standard action sets to obtain similarity scores, and determine the operation performed by the operator according to those scores. The continuous-action recognition method disclosed in this specification helps improve the accuracy of monitoring how an operator uses the operation object, overcomes the inaccuracy and unreliability of manually recording operation object usage in the prior art, improves food safety supervision efficiency, and reduces enterprises' labor and material costs.
It should be noted that the above-mentioned continuous motion recognition apparatus may also include other embodiments according to the description of the method embodiment. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.
Correspondingly, the embodiment of the present specification also discloses an electronic device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method for identifying continuous actions in any of the above embodiments of the present specification are implemented.
Accordingly, the embodiments of the present specification also disclose a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method for recognizing continuous actions described in any of the above embodiments of the present specification.
The embodiments of the present description are not limited to implementations that comply with a standard data model/template or that are described in the embodiments of the present description. Implementations based on certain industry standards, or slightly modified using custom modes or the examples described above, may also achieve the same, equivalent, similar, or otherwise foreseeable effects. Embodiments applying such modified or transformed data acquisition, storage, judgment, and processing may still fall within the scope of the optional embodiments of this description.
The embodiments in the present specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described briefly because it is substantially similar to the method embodiment; for relevant points, reference may be made to the partial description of the method embodiment. In this specification, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. The schematic representations of these terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art may combine the various embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they do not contradict each other.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (8)

1. A method for recognizing a continuous motion, the method comprising:
respectively inputting a plurality of frames of images with time sequence relation into an operation object identification model and an operator identification model, and respectively obtaining an operation object detection frame and an operator detection frame in each frame of image;
screening out an operator image in a working area of the operation object according to the operation object detection frame and the operator detection frame;
inputting the screened operator images into a limb recognition model, and obtaining the limb key point coordinates of the operator in each frame of image;
standardizing the limb key point coordinates of the operator in all the frame images;
calculating similarity scores of the limb key point coordinates of the operator after the standardization processing and each standard action set, and judging that the operator executes corresponding actions in the standard action sets when the similarity scores are larger than a score threshold value;
wherein the normalizing the limb key point coordinates of the operator in all the frame images comprises:
smoothing the coordinates of the limb key points of the operator in each frame of image and performing mean value calculation to obtain the skeleton center of the operator;
calculating, in each frame image, the projection length from the skeleton center to the pre-trained straight line on the surface of the operation object, and obtaining the projection length set $\{l_j\}$ over all frame images;

performing length standardization and number standardization on the limb key point coordinates of the operator according to the limb key point coordinate set $\{p_j^i\}$ of the operator over all frame images and the projection length set $\{l_j\}$, wherein $p_j^i$ denotes the coordinates of the limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image;

wherein the length standardization formula is as follows:

$p_j^{i\prime} = p_j^i \times \dfrac{l_{std}}{l}$

wherein $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value in the projection length set $\{l_j\}$;

wherein the number standardization formula is as follows:

$q_r^i = p_{int[x]}^{i\prime}$, with $x = b \times r$ and $b = n_i / n_i'$

wherein $int[x]$ is $x$ rounded to an integer, $r$ denotes the serial number of the limb key point coordinate, $n_i$ is the number of coordinates in the coordinate set $\{p_j^i\}$ of the $i$-th limb key point, $n_i'$ is the number of the $i$-th limb key point in the standard action set, and $\{q_j^i\}$ denotes the standardized limb key point set obtained by the length standardization and the number standardization;
wherein the calculating of similarity scores between the standardized limb key point coordinates of the operator and each standard action set, and the judging, when a similarity score is greater than a score threshold, that the operator has performed the action corresponding to that standard action set, comprise:
for all frame images, sliding a time window of length $t$ along the time direction with a preset step length, and calculating the similarity score between the standardized limb key point set and each standard action set within each time window to obtain the similarity score result $s_g$ for each time window, wherein $n$ denotes the frame rate of the video to be detected, $g$ denotes the center time of the time window, and the similarity calculation formula is as follows:

$s_g = \dfrac{\sum_{j=1}^{m}(x_j - \bar{X})(y_j - \bar{Y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{X})^2 \sum_{j=1}^{m}(y_j - \bar{Y})^2}}$

wherein $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set, $\bar{Y}$ denotes the mean of the standard action information set, $X = \{x_j\}$ denotes the action set within the time window, and $\bar{X}$ denotes the mean of the action information set within the time window;

(the per-frame action features $x_j$ and $y_j$ are given by formula images not reproduced here; they are computed from the wrist, elbow and shoulder key points relative to two reference lines)

wherein $p_w$, $p_e$ and $p_s$ respectively denote the coordinates of the operator's wrist, elbow and shoulder key points, the equation $kx - y + b = 0$ denotes the straight line passing through the operator's bone center and perpendicular to the surface of the operation object, $k$ and $b$ being its parameters, and the equation $k'x - y + b' = 0$ denotes the straight line of the operation object surface, $k'$ and $b'$ being its parameters;

forming the similarity score results $s_g$ in all time windows into a vector and multiplying it by corresponding weights to obtain a total similarity score, judging that the operator has performed the action corresponding to the standard action set when the total similarity score is greater than the score threshold, and recording the time period in which the total similarity score is greater than the score threshold.
2. The method for recognizing a continuous motion according to claim 1, wherein the operation object is a disinfection cabinet, and the actions in the standard action sets include opening the door of the disinfection cabinet, closing the door of the disinfection cabinet, moving dishes into the disinfection cabinet, and moving dishes out of the disinfection cabinet.
3. The method for recognizing a continuous motion according to claim 1, wherein the screening out the operator image located in the work area of the operation object based on the operation object detection frame and the operator detection frame includes:
calculating the Euclidean distance between the central point of the operation object detection frame and the central point of the operator detection frame;
and screening out the images of the operators in the working area of the operation object when the Euclidean distance is smaller than a distance threshold value.
4. The method of continuous motion recognition as claimed in claim 1, wherein after recording a time period in which the overall similarity score is greater than a score threshold, the method further comprises:
judging whether the operator executes more than two actions in the same time period;
if so, deleting more than two actions executed in the same time period to obtain continuous actions in the video to be detected.
5. The method for recognizing continuous actions according to claim 1, wherein the step of smoothing the coordinates of the key points of the limbs of the operator in each frame of image and performing a mean calculation to obtain the bone center of the operator comprises:
judging whether abnormal limb key points exist or not according to the coordinates of the limb key points;
when abnormal limb key points exist, performing mean value calculation on the abnormal limb key points in the current frame image by using coordinates of the limb key points of corresponding parts in the front and rear preset frame images to obtain coordinates of each limb key point after smoothing treatment;
and calculating the average value of the coordinates of the key points of each limb after the smoothing treatment to obtain the bone center of the operator.
6. A continuous motion recognition device, the device comprising:
the detection module is used for respectively inputting a plurality of frames of images with time sequence relation into the operation object identification model and the operator identification model and respectively obtaining an operation object detection frame and an operator detection frame in each frame of image;
the screening module is used for screening out the operator image in the working area of the operation object according to the operation object detection frame and the operator detection frame;
the limb identification module is used for inputting the screened operator images into a limb identification model and obtaining the limb key point coordinates of the operator in each frame of image;
the standardization module is used for carrying out standardization processing on the limb key point coordinates of the operator in all the frame images;
the similarity module is used for calculating similarity scores of the limb key point coordinates of the operator after the standardization processing and each standard action set, and judging that the operator executes corresponding actions in the standard action sets when the similarity scores are larger than a score threshold;
the standardization module is used for standardizing the limb key point coordinates of the operator in all the frame images, and comprises the following steps:
smoothing the coordinates of the limb key points of the operator in each frame of image and performing mean value calculation to obtain the skeleton center of the operator;
calculating, in each frame image, the projection length from the skeleton center to the pre-trained straight line on the surface of the operation object, and obtaining the projection length set $\{l_j\}$ over all frame images;

performing length standardization and number standardization on the limb key point coordinates of the operator according to the limb key point coordinate set $\{p_j^i\}$ of the operator over all frame images and the projection length set $\{l_j\}$, wherein $p_j^i$ denotes the coordinates of the limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image;

wherein the length standardization formula is as follows:

$p_j^{i\prime} = p_j^i \times \dfrac{l_{std}}{l}$

wherein $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value in the projection length set $\{l_j\}$;

wherein the number standardization formula is as follows:

$q_r^i = p_{int[x]}^{i\prime}$, with $x = b \times r$ and $b = n_i / n_i'$

wherein $int[x]$ is $x$ rounded to an integer, $r$ denotes the serial number of the limb key point coordinate, $n_i$ is the number of coordinates in the coordinate set $\{p_j^i\}$ of the $i$-th limb key point, $n_i'$ is the number of the $i$-th limb key point in the standard action set, and $\{q_j^i\}$ denotes the standardized limb key point set obtained by the length standardization and the number standardization;
the similarity module is configured to calculate similarity scores between the limb key point coordinates of the operator after the normalization processing and each standard action set, and when the similarity score is greater than a score threshold, it is determined that the operator has performed an action corresponding to the standard action set, where the similarity score is greater than a score threshold, and the determining includes:
for all frame images, sliding a time window of length $t$ along the time direction with a preset step length, and calculating the similarity score between the standardized limb key point set and each standard action set within each time window to obtain the similarity score result $s_g$ for each time window, wherein $n$ denotes the frame rate of the video to be detected, $g$ denotes the center time of the time window, and the similarity calculation formula is as follows:

$s_g = \dfrac{\sum_{j=1}^{m}(x_j - \bar{X})(y_j - \bar{Y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{X})^2 \sum_{j=1}^{m}(y_j - \bar{Y})^2}}$

wherein $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set, $\bar{Y}$ denotes the mean of the standard action information set, $X = \{x_j\}$ denotes the action set within the time window, and $\bar{X}$ denotes the mean of the action information set within the time window;

(the per-frame action features $x_j$ and $y_j$ are given by formula images not reproduced here; they are computed from the wrist, elbow and shoulder key points relative to two reference lines)

wherein $p_w$, $p_e$ and $p_s$ respectively denote the coordinates of the operator's wrist, elbow and shoulder key points, the equation $kx - y + b = 0$ denotes the straight line passing through the operator's bone center and perpendicular to the surface of the operation object, $k$ and $b$ being its parameters, and the equation $k'x - y + b' = 0$ denotes the straight line of the operation object surface, $k'$ and $b'$ being its parameters;

forming the similarity score results $s_g$ in all time windows into a vector and multiplying it by corresponding weights to obtain a total similarity score, judging that the operator has performed the action corresponding to the standard action set when the total similarity score is greater than the score threshold, and recording the time period in which the total similarity score is greater than the score threshold.
7. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing the method for recognizing a continuous motion according to any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of continuous action recognition according to any one of the preceding claims 1 to 5.
CN202011459110.6A 2020-12-11 2020-12-11 Method, apparatus, medium, and device for recognizing continuous motion Active CN112464882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011459110.6A CN112464882B (en) 2020-12-11 2020-12-11 Method, apparatus, medium, and device for recognizing continuous motion

Publications (2)

Publication Number Publication Date
CN112464882A CN112464882A (en) 2021-03-09
CN112464882B (en) 2021-09-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Building 7, No. 124 Dongbao Road, Dongcheng Street, Dongguan City, Guangdong Province, 523128

Patentee after: Guangdong Prophet Big Data Co.,Ltd.

Country or region after: China

Address before: 523128 Room 401, building 6, No.5 Weifeng Road, Dongcheng Street, Dongguan City, Guangdong Province

Patentee before: Dongguan prophet big data Co.,Ltd.

Country or region before: China